mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
3687 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
78b842c995
|
fix(install): support non-sudo service-user installs on apt distros (#25814)
The Debian/Ubuntu branch of install_node_deps() ran 'npx playwright install --with-deps chromium' unconditionally. Playwright invokes sudo interactively to apt-install Chromium's system libraries, which blocks the installer for non-sudo users (systemd service accounts, unprivileged operator users) on an unsatisfiable password prompt. Changes: - install.sh: gate --with-deps behind a sudo capability check on the apt branch (matches the existing Arch/pacman branch pattern). Non-sudo users fall back to 'npx playwright install chromium' alone and the installer prints the exact 'sudo npx playwright install-deps chromium' command an administrator can run separately. - install.sh: add --skip-browser (alias --no-playwright) to skip the Playwright step entirely for headless installs that don't need browser automation. Mirrors the existing --no-venv / --skip-setup shape. - installation.md: add a 'Non-Sudo / System Service User Installs' section covering the admin/service-user split, the --skip-browser flag, and the ~/.local/bin PATH gotcha (the root cause of the 'No module named dotenv' error users hit when running the repo source 'hermes' script with system Python instead of the venv launcher). - test_install_sh_browser_install.py: regression coverage for the --skip-browser flag and the sudo-gate on the apt branch. Reported by @ssilver in Discord. |
||
|
|
26933c2f59 |
fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks
_make_stream_chunk built delta_kwargs with only `role`, so a reasoning-only chunk produced a SimpleNamespace without a `.content` attribute. Downstream consumers that read `delta.content` then raised AttributeError on Gemini 2.5 Flash, where the thinking delta arrives before any content delta. Seed `content`, `tool_calls`, `reasoning`, and `reasoning_content` as None up front, matching the pattern already used in gemini_native_adapter.py. Key-present arguments still override the defaults. Fixes #24974 References: Related open PR #24984 (luyao618) applies the same 1-line fix; this PR adds a regression test that #24984 omits Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
72b5dd8658
|
fix(update): refresh lazy-installed backends on hermes update (#25766)
Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional
backends moved to tools/lazy_deps.py and only install on first use.
hermes update runs uv pip install -e .[all] which doesn't touch any of
them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were
silently ignored on already-activated backends.
Two changes:
1. _is_satisfied() now parses the spec and checks the installed version
against the constraint via packaging.specifiers. Previously it
returned True the moment the package name was importable, which made
ensure() a name-presence gate rather than a version-pin gate.
2. New active_features() / refresh_active_features() pair: lists every
feature with at least one of its packages currently installed, then
re-runs ensure() on each. Refresh is invoked at the end of
_cmd_update_impl, right after the [all] install completes. Cold
backends (never activated) stay quiet — no churn for them.
Output during update is one summary block:
→ Refreshing 4 active lazy backend(s)...
↑ 1 refreshed: provider.anthropic
✓ 3 already current
or
⚠ memory.honcho failed to refresh: <pip stderr>
Failures never raise out of update — backends keep their previously-
installed version and we tell the user to rerun once upstream is fixed.
security.allow_lazy_installs=false is honored: features get marked
"skipped" with the reason shown.
Tests: 18 new unit tests covering version-aware satisfaction (exact pin,
range, extras blocks, missing package, malformed spec), active feature
discovery, and refresh status reporting. All 61 lazy_deps tests pass.
|
||
|
|
436a0a271e |
test(toolsets): lock web search into default platform coverage
Adds regression tests pinning web search into the WhatsApp and api-server
default platform-coverage toolsets. Pure test additions, no runtime change.
Salvage of the test-addition commit from #25692 by @wesleysimplicio.
(The AUTHOR_MAP fixup commit from the same PR landed separately as
|
||
|
|
f26098e22f | fix(gateway): enable text-intercept for multi-choice clarify fallback (#25567) | ||
|
|
1247ff2dca | fix: stop retrying initial MCP auth failures | ||
|
|
12f755c9eb
|
fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769)
Mirrors openclaw beta.8's app-server resilience fixes so a stuck codex subprocess can't burn the full turn deadline and so users get a `codex login` pointer instead of raw RPC errors when their token expires. - TurnResult.should_retire signals the caller to drop+respawn codex. - Deadline-hit path and dead-subprocess detection set should_retire so the next turn doesn't ride a CPU-spinning or auth-broken process. - Post-tool watchdog (post_tool_quiet_timeout=90s): if a tool item completes and codex goes silent past the threshold without further output or turn/completed, fast-fail instead of waiting the full 600s. Resets on any non-tool activity so normal think-after-tool flows are not affected. - <turn_aborted> and <turn_aborted/> in agent text are treated as terminal — some codex builds tear down a turn that way without emitting turn/completed. - _classify_oauth_failure() inspects RPC error message + stderr tail for invalid_grant / token refresh / 401 / etc. and rewrites user-facing errors to 'run codex login'. Conservative: generic failures still surface verbatim. Fires at turn/start failure, turn/completed failure, and dead-subprocess paths. - thread/start cross-fill: tolerate thread.id, thread.sessionId, top-level sessionId/threadId so future codex schema drift doesn't KeyError us at handshake. - run_agent.py: when run_turn returns should_retire=True OR raises, close + null self._codex_session so the next turn respawns. Tests: +30 cases across session + integration suites. tests/agent/transports/test_codex_app_server_session.py 50/50 pass tests/run_agent/test_codex_app_server_integration.py 27/27 pass Broader codex scope (transports + cli runtime/migration) 376/376 pass |
||
|
|
63991bbd97 | fix(memory): skip OpenViking upload symlinks | ||
|
|
26deeea830 |
fix(telegram): restore model-switch success path + author map
The cherry-picked PR over-indented the edit_message_text block for the mm: (model selected → switch) success path so the confirmation edit lived inside the preceding 'except Exception as exc' branch and only fired when the callback raised. Dedent the try/except back to 12-space indent so it runs after the callback succeeds, restoring the original flow that removes the inline buttons and shows the 'Switched to ...' confirmation. Add a regression test (test_model_selected_edits_message_on_success) that asserts edit_message_text is awaited and the result text is routed through format_message (MARKDOWN_V2 + backtick survival). Add phuongvm to scripts/release.py AUTHOR_MAP. |
||
|
|
a694040520 |
fix(telegram): escape dynamic markdown in callback flows
Use MarkdownV2 formatting for Telegram callback follow-ups and interactive prompts where dynamic names or user text can break legacy Markdown parsing. Add regression coverage for reload-mcp, model picker, approval callbacks, and update prompts. |
||
|
|
1dca6a6960 |
feat(discord): render clarify choices as buttons
Brings Discord to parity with Telegram on the clarify tool's interactive
UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach
a button view when choices are present.
- ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's
25-component view cap leaves one slot for Other) plus a final
'Other (type answer)' button.
- Numeric click -> tools.clarify_gateway.resolve_gateway_clarify(
clarify_id, choice_text) using the canonical choice text from the
gateway entry (falls back to the button label if the entry vanished).
- Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so
the gateway's text-intercept captures the next user message in this
session as the response.
- Auth via the shared _component_check_auth helper (same OR-semantics as
ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView).
- Open-ended (no choices) path renders the prompt as a plain embed and
relies on the existing text-intercept resolution.
- Single-use: first valid click disables every button and updates the
embed footer with who answered and what they chose.
No changes to BasePlatformAdapter.send_clarify or the gateway's
clarify_callback wiring -- the existing scaffolding already drives all
adapters; Discord just inherits the default text fallback today and gains
buttons by virtue of this override.
Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs
so tests can construct embedded views without monkey-patching per-test.
Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's
work onto current main's clarify infrastructure (clarify_id + entry-based
resolution shared with Telegram, instead of a parallel on_answer-closure
mechanism). The button view structure and UX shape are preserved.
Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py.
391/391 existing Discord gateway tests still pass.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
|
||
|
|
c75e1a03f9 |
fix(install): preserve pip entry point when re-running on symlinked install
setup_path() writes the user-facing hermes shim with `cat >`, which follows existing symlinks. Older installs created `$command_link_dir/hermes` as a symlink to `$HERMES_BIN` (`venv/bin/hermes`), so re-running install.sh stomped the pip entry point with a bash shim that exec'd itself in an infinite loop. `rm -f` the link target before writing so the shim lands at `$command_link_dir/hermes` and the venv entry point is left intact. Adds a regression test that reproduces the symlink-stomp end-to-end (creates the symlink, drives the real shim-write block from setup_path, asserts the venv pip script body survives and the shim is now a regular file). Both new assertions fail on origin/main and pass with the fix. Closes #21454. |
||
|
|
0f0e20ef81 |
test(novita): cache pricing, add provider test coverage, AUTHOR_MAP entry
Follow-up to Alex-wuhu's NovitaAI provider commit. Adds: - _pricing_cache hit/write in _fetch_novita_pricing (was missing — every pricing fetch was re-hitting the network), mirroring the fetch_ai_gateway_pricing pattern. force_refresh now also propagates from get_pricing_for_provider. - TestNovitaProvider in tests/hermes_cli/test_api_key_providers.py covering profile load, alias resolution, registry auto-registration, model list parity between main.py and models.py, _URL_TO_PROVIDER, _PROVIDER_PREFIXES, context_size in _CONTEXT_LENGTH_KEYS, pricing unit conversion, and pricing cache behavior. - AUTHOR_MAP entry for yanglongwei06@gmail.com → @Alex-yang00. |
||
|
|
8199ec3803 | fix(gateway): keep QQBot reconnect loop alive | ||
|
|
00ad3d3c9c | fix: show context compaction status | ||
|
|
bd33a48a58 | feat(whatsapp): surface quoted reply metadata | ||
|
|
31b4721791 |
fix: simplify ACP approval bridging
Previously ACP dangerous-command approvals mixed an invalid ACP payload shape with partial Hermes option mapping, and the callback plumbing was shared across worker threads. This commit uses ACP tool-call updates, preserves Hermes once/session/always semantics, and scopes approval callbacks to the current worker thread. - Build permission requests with `update_tool_call` and unique `perm-check-*` ids in `acp_adapter/permissions.py` - Keep ACP option mapping explicit and fail closed on unknown outcomes or request failures - Set approval callbacks inside the ACP executor worker and read them from thread-local state in `tools/terminal_tool.py` - Replace duplicated ACP bridge coverage with focused tests in `tests/acp/test_permissions.py` and add a thread-local callback test |
||
|
|
35ce94a2f8 |
fix(tests): correct skin engine test API call
The salvaged regression test called skin.get_spinner_list() which doesn't exist on SkinConfig. Replace with direct dict access on skin.spinner — same intent (verify default empty spinner is preserved when user override is invalid). |
||
|
|
5f234d4057 | fix(cli): harden skin yaml parsing for invalid section types | ||
|
|
8f19078c6a
|
feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging — the existing judge model just sees a richer prompt. Forms: /subgoal show current subgoals /subgoal <text> append a criterion /subgoal remove <n> drop subgoal n (1-based) /subgoal clear wipe all subgoals How it integrates: - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows. - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?' - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target. - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary — no race with the running turn. - Status line shows '... , N subgoals' when present. Surface: - hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave - hermes_cli/commands.py — /subgoal CommandDef - cli.py — _handle_subgoal_command - gateway/run.py — _handle_subgoal_command + mid-run dispatch - tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering) 77 goal-related tests passing across goals + cli + gateway + tui. * fix(goals): slash commands don't preempt the goal-continuation hook Two findings from live-testing /subgoal: 1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped — but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present. 2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.' Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited. |
||
|
|
8db544b4d0 | fix(clipboard): reject non-png clipboard images when png normalization fails | ||
|
|
c872f07c47 |
fix(tests): exercise profile-mode HERMES_HOME for honcho fallback
The cherry-picked tests from #6173 set HERMES_HOME outside Path.home()/.hermes, which forces get_default_hermes_root() down its Docker branch and returns HERMES_HOME directly — so _get_default_hermes_home() never resolves to the ~/.hermes directory the tests were trying to assert about. Rewire both tests to use the real profile layout (HERMES_HOME pointing at ~/.hermes/profiles/<name>) so _get_default_hermes_home() resolves back to ~/.hermes and the default-profile fallback is actually exercised. |
||
|
|
d18618f48f | fix(honcho): respect HOME-anchored default profile fallback | ||
|
|
4ca5e72444 |
fix(web): preserve top-level error envelope on unconfigured systems
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.
Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:
1. **search**: ``firecrawl.search()`` caught the
``ValueError("Web tools are not configured...")`` from
``_get_firecrawl_client()`` and returned it as
``{"success": False, "error": ...}``, losing the legacy
``{"error": "Error searching web: ..."}`` envelope that
``tool_error()`` emits on main. Models that special-case the
``error`` key still detect the failure, but the prefix is part of
the legacy contract some users rely on.
2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
``ValueError`` and wrapped it as a per-page error inside
``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
BEFORE dispatching, so its unconfigured response is
``{"success": False, "error": "web_crawl requires Firecrawl..."}``
at the top level. The PR's per-page burying hid the failure inside
``results[]`` where models that check ``result.get("error")`` would
miss it.
Fix:
- ``plugins/web/firecrawl/provider.py``: pull
``_get_firecrawl_client()`` outside the broad ``try`` in
``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
to the dispatcher's top-level exception handler. In-flight SDK
errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
``web_crawl_tool``. When the resolved crawl provider is
``is_available()==False``, short-circuit BEFORE dispatching with the
same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
(``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
future plugin work can't undo this.
Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
|
||
|
|
21e3a863bb |
feat(web): firecrawl plugin natively supports crawl; delete legacy inline path
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.
What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
- Accepts the same kwargs as the dispatcher passes to any crawl
provider (``instructions``, ``depth``, ``limit``); Firecrawl's
/crawl endpoint ignores ``instructions`` and ``depth`` so we log
and drop with a clear info message.
- Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
gateway event loop isn't blocked on a multi-page crawl.
- Preserves the response-shape normalization across pydantic /
typed-object / dict variants that the legacy inline code did.
- Preserves per-page website-policy re-check (catches blocked
redirects after the SDK returns).
- Returns the same {"results": [...]} shape so the dispatcher's
shared LLM-summarization post-processing path works unchanged.
- Sets supports_crawl() to True so the dispatcher routes through
the plugin instead of the legacy fallthrough.
2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
that used to run after "No registered provider supports crawl" —
~270 lines including:
- check_firecrawl_api_key gate + typed error
- inline SSRF + website-policy seed-URL gate (dispatcher already
does this)
- Firecrawl client setup with crawl_params
- 100+ lines of pydantic/dict/typed-object normalization
- Per-page LLM-processing loop (kept in the dispatcher's shared
post-processing path; that's where it always belonged)
- trimming + base64 image cleanup (still done in the dispatcher's
shared path)
Replaced with a single typed-error branch when no crawl-capable
provider is available: "web_crawl has no available backend. Set
FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
TAVILY_API_KEY for Tavily."
Test updates
------------
- tests/tools/test_website_policy.py:
- test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
gate still runs on web_tools.check_website_access (no change to
that patch), but the firecrawl client lockdown moved to the
plugin module — patch firecrawl_provider._get_firecrawl_client
instead of web_tools._get_firecrawl_client. The dispatcher
short-circuits before the plugin runs, so the test still passes.
- test_web_crawl_blocks_redirected_final_url: patch the per-page
policy gate at plugins.web.firecrawl.provider.check_website_access
(where it now runs) AND on web_tools (where the seed-URL gate
still runs). Patch firecrawl_provider._get_firecrawl_client for
the FakeCrawlClient injection. Both checks flow through the same
fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
- Update parametrized capability-flag spec: firecrawl supports_crawl
is now True.
- Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
verifies inspect.iscoroutinefunction(p.crawl) is True and that
the async crawl returns a per-page error dict (not a raise) when
FIRECRAWL_API_KEY is missing.
Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
name search extract crawl async-extract? async-crawl?
firecrawl True True True True True
tavily True True True False False
Both crawl-capable providers exercise the dispatcher's
inspect.iscoroutinefunction async-or-sync detection.
Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
+ new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
before this commit, 2227 before the migration started).
|
||
|
|
e8cee87e85 |
test(plugins): tests/plugins/web/ — coverage for the 7-plugin migration
Adds 44 focused tests under tests/plugins/web/ covering the surface that the PR #25182 web-provider migration introduced. Complements the existing tests/tools/ coverage which is dispatcher-centric; this file is plugin-centric and tests each plugin + the registry directly. Test classes (44 tests, ~1.1s on 4 workers) ------------------------------------------- TestBundledPluginsRegister (16 tests) - All seven plugins present in the registry after _ensure_plugins_discovered() - Per-plugin parametrized capability-flag assertions (brave-free / ddgs / searxng: search-only; exa / parallel / firecrawl: search + extract; tavily: search + extract + crawl) - Every plugin exposes name + display_name properties - Every plugin returns a picker-compatible get_setup_schema() dict TestIsAvailable (7 tests) - Each premium plugin reports is_available()==False when its env var is absent and True once set (brave-free / searxng / tavily / exa / parallel) - firecrawl recognizes either FIRECRAWL_API_KEY or FIRECRAWL_API_URL as a "configured" signal - ddgs is the always-on fallback and must not raise from is_available() TestRegistryResolution (4 tests) - Option B semantics validated end-to-end: 1. Explicit configured provider wins even when is_available()==False (dispatcher surfaces typed credential errors, no silent switch) 2. Unknown/typo name falls back to first available legacy-preference provider 3. Asking for extract via a search-only backend falls back to an extract-capable available provider (capability-incompatible branch in _resolve()) 4. No config + no credentials → None (or ddgs if installed) TestAsyncExtractDispatch (4 tests) - parallel + firecrawl extract() are coroutine functions (async path in dispatcher uses await) - exa + tavily extract() are sync (dispatcher wraps in asyncio.to_thread) TestErrorResponseShapes (7 tests) - Plugins return typed error dicts (success=False + "error" key) when credentials are missing, never raise - async extract() returns list of per-URL error dicts - tavily crawl() returns {"results": [{"error": ...}]} on missing credentials Design notes ------------ - All tests use real imports of plugin modules — no mocking of provider classes themselves — so they catch drift in the ABC, registry, and glue layer simultaneously. Per the hermes-agent-dev skill's E2E testing guidance. - The autouse _isolate_env fixture clears every web-provider env var before each test so is_available() reflects the test's setup. - Resolution tests use the lower-level _resolve() directly rather than rebuilding the HERMES_HOME config dance — same observable behavior, no sys.modules.pop side-effects that would break the ABC isinstance check inside ctx.register_web_search_provider(). |
||
|
|
39b4ebfcea |
refactor(web): delete legacy tools/web_providers/ directory + migrate ABC tests
Removes the legacy in-tree provider scaffolding that PR #25182 fully replaced with the plugin architecture: tools/web_providers/__init__.py (6 lines) tools/web_providers/base.py (89 lines — old ABCs) tools/web_providers/ARCHITECTURE.md (73 lines — old design doc) These were the staging-ground ABCs and provider modules that the plugin migration absorbed. All seven web providers now implement the single :class:`agent.web_search_provider.WebSearchProvider` ABC and live under ``plugins/web/<vendor>/``. Nothing else in the tree imports ``tools.web_providers`` — verified via grep before deletion. Test migration (tests/tools/test_web_providers.py) -------------------------------------------------- Rewrote ``TestWebProviderABCs`` to test the new unified ABC at :mod:`agent.web_search_provider`: - test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available`` - test_concrete_search_only_provider_works — exercise default ``supports_extract=False`` / ``supports_crawl=False`` flags - test_concrete_multi_capability_provider_works — exercise all three capabilities, async extract supported (declared sync here for simplicity; real plugins like parallel + firecrawl use async) - test_search_only_provider_skips_extract_and_crawl — verify ``supports_*()`` flags default to False so search-only providers don't have to implement extract() or crawl() The 9 other tests in the file (per-capability backend selection, DEFAULT_CONFIG merge, dispatcher routing) test public helpers in ``tools.web_tools`` that still exist and pass unchanged. agent/web_search_provider.py docstring updated to reflect that the legacy ABCs no longer exist; the response-shape contract is preserved bit-for-bit so external consumers see no behavioral change. Net diff -------- - tools/web_providers/ removed (-168 lines) - tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net, same coverage, new API) - agent/web_search_provider.py docstring (-3/+5 lines) Verified -------- - 173/173 targeted web tests pass - 12/12 ABC contract tests pass with the new interface - No remaining grep hits for ``tools.web_providers`` outside of intentional historical references in plugin docstrings. |
||
|
|
5e54330e27 |
fix(web): preserve firecrawl crawl + website-policy gate after migration
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:
1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).
Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.
2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.
Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.
Test updates (tests/tools/test_website_policy.py — 4 tests):
- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
Both: patch the gate at plugins.web.firecrawl.provider (where it
runs after migration) and force the firecrawl plugin to be the
active extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
Both: unchanged — the dispatcher-level gate at tools.web_tools.py
line 1651 still uses the imported `check_website_access` name and
the firecrawl-fallthrough path is exercised as before.
Verified: 22/22 tests/tools/test_website_policy.py pass.
|
||
|
|
b05253ceed |
refactor(web): dispatch all three tools through web_search_registry
Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.
Per-tool changes:
web_search_tool (sync)
Replace 5 backend branches (parallel, exa, registry-3-providers,
tavily, firecrawl-fallthrough) with a single registry path:
1. _get_search_backend() resolves the configured name
2. _wsp_get_provider(name) for explicit-config-wins semantics
3. get_active_search_provider() fallback for typo / unknown name
4. provider.search(query, limit) — sync for all 7 providers
web_extract_tool (async)
Replace 4 backend branches (parallel-async, exa-sync, tavily-sync,
search-only-error, firecrawl-perurl-loop) with:
1. Same provider resolution as search.
2. When configured backend IS registered but doesn't support
extract (search-only providers like brave-free), surface a
typed "search-only" error matching the legacy text — tests
assert that wording.
3. inspect.iscoroutinefunction(provider.extract) detects sync vs
async: parallel + firecrawl are async; exa + tavily are sync.
Sync extracts run in asyncio.to_thread() so we don't block.
web_crawl_tool (async)
Replace tavily-specific branch + search-only-error block with:
1. _wsp_get_provider(backend) — explicit config first
2. Search-only typed error when the configured name doesn't
support crawl (matches legacy phrasing)
3. get_active_crawl_provider() fallback otherwise
4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
5. Response post-processing (LLM summarization, trimming) stays
unchanged — it's not provider-specific.
When no plugin advertises supports_crawl, falls through to the
existing Firecrawl-via-web-summarize path below (unchanged).
Test updates (2 tests in tests/tools/test_web_tools_config.py):
- test_web_search_clamps_limit_before_backend_call:
patch("tools.web_tools._parallel_search") -> patch the registry
provider returned by agent.web_search_registry.get_provider
- test_search_error_response_does_not_expose_diagnostics:
patch("tools.web_tools._get_firecrawl_client") -> same pattern
Tests unchanged (still pass):
- All TestXBackendWiring classes (test _get_backend / _is_backend_available
config-resolution, independent of dispatch)
- All TestXSearchOnlyErrors classes (test the search-only error path
via web_extract_tool / web_crawl_tool — error text preserved)
- 141 passing web tests total, 0 regressions.
Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
- tools.web_tools._exa_search / _exa_extract / _parallel_search /
_parallel_extract / _tavily_request / _normalize_tavily_* /
_get_firecrawl_client / _extract_web_search_results /
_extract_scrape_payload / _to_plain_object / _normalize_result_list
are no longer called by the dispatchers, but still exist.
- The config-resolution layer (_get_backend, _is_backend_available,
_is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
use and must stay.
- The Firecrawl proxy and check_firecrawl_api_key are still imported
by integration tests and patched by unit tests — must stay (or be
re-exported from the plugin).
|
||
|
|
6b219f5af6 |
refactor(web): remove legacy in-tree provider modules
Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three
providers that moved to plugins/web/ in prior commits. tools/web_tools.py
no longer imports them (registry dispatch as of
|
||
|
|
efc32ab639 |
refactor(inventory): extract shared ConfigContext + build_models_payload
Three call-sites in the codebase each duplicated the same config-slice
+ list_authenticated_providers + post-processing pattern:
- hermes_cli/web_server.py /api/model/options
- tui_gateway/server.py model.options JSON-RPC
- tui_gateway/server.py model.save_key JSON-RPC
This consolidates them onto hermes_cli/inventory.py:
load_picker_context() -> ConfigContext
Replaces the 17-LOC config-slice (model.{default,name,provider,
base_url}, providers:, custom_providers:) every consumer did
inline.
ConfigContext.with_overrides(*, current_provider=, current_model=,
current_base_url=) -> ConfigContext
Truthy-only overlay for TUI agent-session state on top of disk
config. Empty getattr(agent, ...) attrs MUST NOT clobber disk.
build_models_payload(ctx, *, include_unconfigured, picker_hints,
canonical_order, max_models) -> dict
Single payload builder. Delegates curation to
list_authenticated_providers (does not call provider_model_ids
per row \u2014 that pulls non-agentic models). picker_hints +
canonical_order produce the TUI ModelPickerDialog shape;
defaults match the dashboard's existing /api/model/options
contract.
Two latent bugs fixed by consolidation:
1. The dashboard read cfg.get('custom_providers') directly, missing
the v12+ keyed providers: form. Now both surfaces go through
get_compatible_custom_providers().
2. The TUI's canonical-merge keyed on is_user_defined to decide order.
Section 3 of list_authenticated_providers sets is_user_defined=True
on rows from the providers: config dict even when the slug is
canonical \u2014 that silently demoted them to the picker tail.
_reorder_canonical now keys on slug membership instead.
Stats: +666 / -145 (net +521). Module 240 LOC; 18 behavior tests.
This PR replaces the rejected #23369 (which bundled the consolidation
with new scriptable CLI surfaces \u2014 hermes models list/status, hermes
providers list \u2014 and a JSON contract that have no external user
demand). Just the refactor; the CLI surface is deferred to a separate
PR gated on actual demand.
Refs #23359.
|
||
|
|
4ceab16893 |
fix(compression): keep default protect_first_n at 3 + align ABC
Follow-up on the salvaged feat commit: - Keep the constructor / config / yaml-example default at 3 so existing gateway and CLI users see no behavioural change. PR #13754 (which this builds on) had lowered the default to 2 to chase pre-feature parity in the system-prompt-present case, at the cost of quietly halving the protected head for the gateway path (which strips the system prompt before calling compress()). With the new "system prompt is implicit" semantics, default 3 gives every caller a stable head shape. - agent/context_engine.py: bring the ABC's protect_first_n docstring in line with the new semantics so plugin context engines interpret the config key the same way the built-in compressor does. - tests: adjust the default-value test (3, not 2) and a stale comment; per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is since those tests fix concrete head shapes. |
||
|
|
dee71a31e5 |
feat(compression): make protect_first_n configurable
The number of head messages preserved verbatim across context compactions was previously hardcoded to 3 in AIAgent.__init__. Expose it as `compression.protect_first_n` in config, matching the existing `protect_last_n` pattern. Motivation: users who rely on rolling compaction for long-running sessions had the opening user/assistant exchange pinned as head forever, which doesn't always match how they want the session framed after many compactions. Lowering to 1 preserves the system prompt + first non-system message; lowering to 0 preserves only the system prompt and lets the entire first exchange age out naturally through the summary. Semantics: `protect_first_n` counts non-system head messages protected **in addition to** the system prompt, which is always implicitly protected when present. Same meaning across both code paths: protect_first_n=0 → system prompt only (or nothing if no system message) protect_first_n=2 → system prompt + first 2 non-system messages (default) This unifies the CLI path (which reads messages with the system prompt at position 0) and the gateway path (where the gateway /compress handler strips the system prompt before calling compress() — see gateway/run.py L9150-9154 on the parent fork). Previously these two paths disagreed: CLI path: protect_first_n=1 → protect system prompt only Gateway path: protect_first_n=1 → protect first USER turn forever In practice on long-running gateway sessions the old semantics pinned whatever stale aside happened to be the first user message, reinserting it into every compaction summary indefinitely. Default chosen as 2 (not 3) so that the effective protected head count remains 3 messages in the common case — assuming a system prompt is present, default protection becomes system + 2 non-system = 3 total, matching the pre-feature behaviour where `protect_first_n` was hardcoded to protect 3 messages total. Sessions without a system prompt will see a small behaviour change (2 protected head messages instead of 3), but this is the rare path and the new semantics make the system-prompt-present case the well-defined one. Changes: - agent/context_compressor.py: redefine protect_first_n as the count of non-system head messages protected beyond the implicit system-prompt guarantee; both paths converge. Constructor default updated to 2. - hermes_cli/config.py: add `compression.protect_first_n` default (2), matching the new semantics. `show_config` label tweaked to 'Protect first: N non-system head messages' for clarity. - run_agent.py: read protect_first_n from config; 0 is now valid (system prompt is always implicitly protected). - cli-config.yaml.example: document the new key and rationale. - tests/agent/test_context_compressor.py: cover default, override, the end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour, the no-system-prompt (gateway) path, and the new shared-semantics regression test. Fixes #13751 Tested on Ubuntu 24.04. |
||
|
|
d863773c81 |
feat(discord): add thread_require_mention for multi-bot threads
By default, once Hermes participates in a Discord thread (auto-created on @mention or replied in once) it auto-responds to every subsequent message in that thread without requiring further @mentions. That's the right default for one-on-one conversations and isolated channel threads. But it's a confirmed footgun in multi-bot threads. When a user invokes one bot per turn — addressing Codex first, then Hermes — every other bot in the thread also fires on every message, burning credits and spamming the channel. Author has hit this personally in active multi-bot research-team threads. Add a new `discord.thread_require_mention` config key (env: `DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing behavior. When `true`, the in-thread mention shortcut is disabled and threads are gated the same way channels are. Explicit @mentions still pass through as expected. Mirrors the existing helper shape (config.extra > env > default) and the existing yaml→env bridge pattern used by `require_mention`. Changes: - gateway/platforms/discord.py: new `_discord_thread_require_mention()` helper; in_bot_thread shortcut now AND's with `not _discord_thread_require_mention()` - gateway/config.py: bridge `discord.thread_require_mention` from config.yaml to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing `require_mention` bridge two lines above) - hermes_cli/config.py: add `thread_require_mention: False` default to DEFAULT_CONFIG['discord'] - tests/gateway/test_discord_free_response.py: 4 new tests covering default behaviour (in-thread shortcut still works), enabled behaviour (mention required in threads), enabled+mentioned (mention still passes through), and yaml-via-config.extra path. Also clears DISCORD_* env vars in the `adapter` fixture so process-env state from the contributor's shell doesn't leak into per-test behaviour. - tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge (both the apply-from-yaml and env-precedence-over-yaml paths) - website/docs/user-guide/messaging/discord.md: document the new env var + config key with multi-bot rationale; cross-link from `auto_thread` section Tested on Ubuntu 24.04. |
||
|
|
d557544560 |
fix(discord): keep free-response channels inline
Free-response channels are intended as lightweight chat surfaces — the bot
responds to every message without requiring an @mention. But the auto-thread
gate only checked DISCORD_NO_THREAD_CHANNELS, not DISCORD_FREE_RESPONSE_CHANNELS,
so every message in a free-response channel still spawned a brand-new thread.
That turns a chat channel into a thread-spawning machine: 1 thread per message.
The user-facing docs at website/docs/user-guide/messaging/discord.md already
describe the intended behavior ("Free-response channels also skip auto-threading
— the bot replies inline rather than spinning off a new thread per message"),
so this is a code-vs-docs gap, not a design change.
Fix: OR is_free_channel into skip_thread alongside the existing no_thread_channels
check. One-line production change.
Regression test added at tests/gateway/test_discord_free_response.py:
test_discord_free_response_channel_skips_auto_thread asserts that a message
in a free-response channel never calls _auto_create_thread. Reverting the
one-line fix causes the test to fail with 'Expected mock to not have been
awaited. Awaited 1 times.' — i.e. the test demonstrates the bug concretely.
|
||
|
|
3633c8690b |
refactor(plugins): add apply_yaml_config_fn registry hook
Lets platform plugins own their YAML→env config bridge instead of forcing core gateway/config.py to know every platform's schema. The hook receives the full parsed config.yaml and the platform's own sub-dict, may mutate os.environ (env > YAML precedence preserved via the standard `not os.getenv(...)` guards), and may return a dict to merge into PlatformConfig.extra. It runs during load_gateway_config() after the existing generic shared-key loop and before _apply_env_overrides(), mirroring the env_enablement_fn dispatch pattern (#21306, #21331). Pure addition — no behavior change for existing platforms. Each of the eight platforms with hardcoded YAML→env blocks today (discord, telegram, whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in gateway/config.py) can migrate in independent follow-up PRs; the hardcoded blocks remain functional in the meantime, and their `not os.getenv(...)` guards make them no-ops for any env var the hook already set. Test coverage: 10 new tests in tests/gateway/test_platform_registry.py covering field default, callable acceptance, env mutation, extras merge, both signature args, exception swallowing, missing/non-dict sections, and env > YAML precedence. Refs #3823, #24356. Closes #24836. |
||
|
|
d5775fe988
|
feat(codex-runtime): skip unavailable plugins during migration (#25437)
Followup to PR #24182 — caught when scanning OpenClaw for recent codex fixes we hadn't considered. OpenClaw learned the hard way (#80815) that migrating plugins which codex itself reports as unavailable produces config that fails at activation time. Our /codex-runtime codex_app_server enable path queries codex's plugin/list and migrates everything where installed=true. We were trusting codex's installation state and ignoring its availability field. So a plugin that's installed=true but availability=UNAVAILABLE (broken local install) or REQUIRES_AUTH (OAuth expired or never completed) would get an [plugins."<n>@openai-curated"] entry in ~/.codex/config.toml — and the user's first codex turn after enabling the runtime would fail because codex refuses to activate it. Fix: filter on availability in _query_codex_plugins(). Only emit plugins where availability is empty (older codex versions without the field — preserve backward compat) or explicitly AVAILABLE. Tests: test_plugin_discovery_skips_unavailable_plugins — verifies 4 cases: - good-plugin (installed=True, availability=AVAILABLE) → migrated - broken-plugin (installed=True, availability=UNAVAILABLE) → skipped - auth-pending (installed=True, availability=REQUIRES_AUTH) → skipped - legacy-plugin (installed=True, no availability field) → migrated (older codex versions; preserve backward compat) Docs: Added bullet to 'What's NOT migrated' list in the docs page calling out the availability filter and why. Other OpenClaw codex PRs I reviewed but did NOT apply (with reasoning): - #81591 (load Codex for selectable models): we resolve runtime per-call already, no startup-time gating to fix - #81510 (cron compatibility): we documented cron as untested; their fix is for OpenClaw-specific cron orchestration shape - #81223 (rotate incompatible context-engine threads): we don't have a Lossless context engine equivalent - #80688 (constrain sandbox): we don't have an outer-sandbox concept - #80616 (release on turn_aborted): we already handle status= interrupted in turn/completed correctly - #80278 (expose activeModel in plugin SDK): not our surface - #80792 (default destructive_actions on): we don't expose that knob 56 codex-runtime migration tests still green (+1 new). |
||
|
|
8c6b0c9ecd |
test(memory): cover cache-parity + runtime whitelist on background review fork
- test_background_review_does_not_narrow_toolset_schema: review fork must NOT pass enabled_toolsets to AIAgent (full parent schema = matching Anthropic cache key on the 'tools' field). - test_background_review_installs_thread_local_whitelist: the runtime whitelist that replaces schema-level narrowing must contain memory + skills tools and exclude terminal / send_message / delegate_task / web_search / execute_code. - test_review_fork_inherits_parent_cached_system_prompt: new test for PR #17276's first root cause — the fork's _cached_system_prompt must equal the parent's byte-for-byte. - test_review_fork_pins_session_start_and_session_id: defensive belt-and- suspenders for the cached-prompt inheritance. Inverted the original test_background_review_agent_uses_restricted_toolsets (which asserted the schema-level narrowing) — that narrowing was the direct cause of #25322's cache miss, and the runtime whitelist replaces its safety claim without breaking cache parity. Refs #25322, #15204, PR #17276. |
||
|
|
3a30c605b3 |
feat(plugins): add thread-local tool whitelist to pre_tool_call gate
Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to hermes_cli/plugins.py. When set on the current thread, restricts which tools can pass through get_pre_tool_call_block_message; non-whitelisted tools are blocked with a configurable deny message. Mirrors the per-thread approval-callback pattern already used by set_approval_callback (tools/terminal_tool.py:190). Used by _spawn_background_review to deny non-memory/non-skill tools at runtime while inheriting the parent agent's full tools schema for prefix-cache parity (see follow-up commit). Tests cover allow / deny / clear / cross-thread isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
52521c937a | fix(install): skip browser download when system chromium exists | ||
|
|
7f08cb5941 |
fix(tts): align MiniMax TTS defaults with current API and add GroupId support
Follow-up on @pty819's t2a_v2 endpoint fix: - Default model: speech-02 -> speech-02-hd (bare 'speech-02' is not in the supported enum; t2a_v2 rejects it with 400). Official enum: speech-01-hd, speech-01-turbo, speech-02-hd, speech-02-turbo, speech-2.6-hd/turbo, speech-2.8-hd/turbo. - Default voice: female-shaonv -> English_expressive_narrator. The legacy speech-01-series short ID doesn't resolve cleanly on the speech-02+ models that are now the default. - Default base URL: api.minimaxi.com -> api.minimax.io (matches the canonical host in the published docs; api-uw.minimax.io is the reduced-latency alt). - Add GroupId support via tts.minimax.group_id config or MINIMAX_GROUP_ID env var. Some MiniMax accounts scope TTS requests by group; without it, requests 401. Only appended when not already in the user's base_url. Tests rewritten to cover both the default t2a_v2 path (hex-encoded audio in JSON, nested voice_setting/audio_setting) and the legacy text_to_speech path (raw audio bytes, flat payload). Adds coverage for GroupId config/env wiring and error surfacing. Also adds AUTHOR_MAP entry for pty819's GitHub-noreply email. |
||
|
|
6122a79aab
|
feat(slack): support !cmd as alternate prefix for slash commands in threads (#25355)
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-main (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
Slack platform-blocks native slash commands inside thread replies ("/queue
is not supported in threads. Sorry!") and there is no app-side setting to
re-enable them. As a workaround, rewrite a leading '!' to '/' for any known
gateway command before downstream processing — so '!queue', '!stop',
'!model gpt-5.4' etc. work inside Slack threads (and anywhere else).
Only the first token is checked against is_gateway_known_command(), so
casual messages like '!nice work' pass through to the agent unchanged.
Downstream pipeline (MessageType.COMMAND tagging, gateway dispatcher,
thread reply routing) is unchanged.
Adds 6 tests covering rewrite, args preservation, thread routing,
casual-message passthrough, '@bot' suffix, and plain '/' still-works.
|
||
|
|
3f13d78088
|
perf(tools): cache get_nous_auth_status() and load_env() to fix slow hermes tools menus (#25341)
`hermes tools` -> "All Platforms" took ~14s to render the checklist because building the toolset labels called `get_nous_auth_status()` ~31x transitively (`_toolset_has_keys` -> `_visible_providers` -> `get_nous_subscription_features` -> `managed_nous_tools_enabled`). Each call did a synchronous OAuth refresh POST to portal.nousresearch.com (~350ms even on the failure path), so one menu paint burned >13s of HTTP and 31 single-use Nous refresh tokens. Secondary hot spot: every `get_env_value()` re-read and re-sanitised the entire .env file. 116 reads with O(lines x known-keys) scanning added ~300ms of CPU per render. Fix is two process-level caches, both mtime-keyed so login/logout/edit invalidate naturally: * `hermes_cli/auth.py`: memoise `get_nous_auth_status()` for 15s keyed on auth.json mtime. Splits `_compute_nous_auth_status()` as the uncached impl. Adds `invalidate_nous_auth_status_cache()`. * `hermes_cli/config.py`: memoise `load_env()` keyed on .env (path, mtime, size). Adds `invalidate_env_cache()`, wired into `save_env_value`, `remove_env_value`, and the sanitize-on-load writer so writers don't return stale dicts on same-second writes. Before/after on Teknium's box (real HERMES_HOME, no Nous login): * "All Platforms" cold path: ~13,874ms -> ~691ms label-build * Warm re-open within the same process: ~122ms -> ~17ms Side benefit: stops burning a Nous refresh token on every menu paint, which was risking the portal's reuse-detection revocation logic. |
||
|
|
dd5a9502e3
|
fix(tools-config): write video_gen.provider on Reconfigure tool path (#25307)
`_reconfigure_provider()` handled `image_gen_plugin_name` in both branches (no-env-vars early return and post-env-vars) but never mirrored the same handling for `video_gen_plugin_name`. The first-time `_configure_provider()` path correctly routes to `_select_plugin_video_gen_provider()`; reconfigure forgot to. Repro: 1. Enable video_gen in `hermes tools` → Configure for All Platforms. 2. Go back into `hermes tools` → Reconfigure tool → Video Generation. 3. Pick xAI (with XAI_API_KEY already set). 4. Hit Enter at the "keep current key?" prompt. Expected: `video_gen.provider: xai` written to config.yaml. Actual: function returns silently; no `video_gen:` block ever written; `video_generate` tool fails with "No video generation backend is configured." Fix: add the missing `video_gen_plugin_name` branch in both code paths of `_reconfigure_provider()`, mirroring the existing `image_gen_plugin_name` handling and the first-time configure logic. Tests: `tests/hermes_cli/test_video_gen_picker.py` covers both branches (env-vars-set keep-current and no-env-vars paths). |
||
|
|
091d8e1030
|
feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182)
* feat(codex-runtime): scaffold optional codex app-server runtime
Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.
Lands in three pieces:
1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
for codex's app-server protocol (codex-rs/app-server). Spawn, init
handshake, request/response, notification queue, server-initiated
request queue (for approval round-trips), interrupt-friendly blocking
reads. Tested against real codex 0.130.0 binary end-to-end during
development.
2. hermes_cli/runtime_provider.py:
- Adds 'codex_app_server' to _VALID_API_MODES.
- Adds _maybe_apply_codex_app_server_runtime() helper, called at the
end of _resolve_runtime_from_pool_entry(). Inert unless
'model.openai_runtime: codex_app_server' is set in config.yaml AND
provider in {openai, openai-codex}. Other providers cannot be
rerouted (anthropic, openrouter, etc. preserved).
3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
covering api_mode registration, the rewriter helper (default-off,
case-insensitive, opt-in, non-eligible providers preserved), version
parser, missing-binary handling, error class. Does NOT require codex
CLI installed.
This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.
Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)
* feat(codex-runtime): add codex item projector for memory/skill review
The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.
Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
- userMessage → {role: user, content}
- agentMessage → {role: assistant, content: text}
- reasoning → stashed in next assistant's 'reasoning' field
- commandExecution → assistant tool_call(name='exec_command') + tool result
- fileChange → assistant tool_call(name='apply_patch') + tool result
- mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result
- dynamicToolCall → assistant tool_call(name=<tool>) + tool result
- plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls
Invariants preserved:
- Message role alternation never violated: each tool item produces at most
one assistant + one tool message in that order, correlated by call_id.
- Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
don't materialize messages — only item/completed does. Mirrors how
Hermes already only writes the assistant message after streaming ends.
- Tool call ids are deterministic (codex item id-based) so replays produce
identical messages and prefix caches stay valid (AGENTS.md pitfall #16).
- JSON args use sorted_keys for the same reason.
Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).
23 new tests, all green:
- Streaming deltas don't materialize (3 paths)
- Turn/thread frame events are silent
- commandExecution: 5 tests including non-zero exit annotation +
deterministic id stability across replays
- agentMessage + reasoning attachment + reasoning consumption
- fileChange: summary without inlined content
- mcpToolCall: namespaced naming + error surfacing
- userMessage: text fragments only (drops images/etc)
- opaque items: no fabricated tool_calls
- Helpers: deterministic id stability + sorted JSON args
- Role alternation invariant across all four tool-shaped item types
This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.
* feat(codex-runtime): add session adapter + approval bridge
The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.
The adapter has a single public per-turn method:
result = session.run_turn(user_input='...', turn_timeout=600)
# result.final_text → assistant text for the caller
# result.projected_messages → list ready to splice into AIAgent.messages
# result.tool_iterations → tick count for _iters_since_skill nudge
# result.interrupted → True on Ctrl+C / deadline / interrupt
# result.error → error string when the turn cannot complete
# result.turn_id, thread_id → for sessions DB / resume
Behavior:
- ensure_started() spawns codex, does the initialize handshake, and
issues thread/start with cwd + permissions profile. Idempotent.
- run_turn() blocks until turn/completed, drains server-initiated
requests (approvals) before reading notifications so codex never
deadlocks waiting for us, projects every item/completed via the
projector, and increments tool_iterations for the skill nudge gate.
- request_interrupt() is thread-safe (threading.Event); the next loop
iteration issues turn/interrupt and unwinds.
- turn_timeout deadlock guard issues turn/interrupt and records an
error if the turn never completes.
- close() escalates terminate → kill via the underlying client.
Approval bridge:
Codex emits server-initiated requests for execCommandApproval and
applyPatchApproval. The adapter translates Hermes' approval choice
vocabulary onto codex's decision vocabulary:
Hermes 'once' → codex 'approved'
Hermes 'session' or 'always' → codex 'approvedForSession'
Hermes 'deny' / anything else → codex 'denied'
Routing precedence:
1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
2. approval_callback wired by the CLI (defers to
tools.approval.prompt_dangerous_approval())
3. Fail-closed denial when neither is wired
Unknown server-request methods are answered with JSON-RPC error -32601
so codex doesn't hang waiting for us.
Permission profile mapping mirrors AGENTS.md:
Hermes 'auto' → codex 'workspace-write'
Hermes 'approval-required' → codex 'read-only-with-approval'
Hermes 'unrestricted/yolo' → codex 'full-access'
20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
- test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
- test_codex_event_projector.py: 23 (item taxonomy projections)
- test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)
Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.
Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.
* feat(codex-runtime): wire codex_app_server runtime into AIAgent
The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.
Three small surgical edits to run_agent.py (~105 LOC total):
1. Line ~1204 (constructor api_mode validation set):
Add 'codex_app_server' so an explicit api_mode='codex_app_server'
passed to AIAgent() isn't silently rewritten to 'chat_completions'.
2. Line ~12048 (run_conversation, just before the while loop):
Early-return to _run_codex_app_server_turn() when self.api_mode is
'codex_app_server'. Placed AFTER all standard pre-loop setup —
logging context, session DB, surrogate sanitization, _user_turn_count
and _turns_since_memory increments, _ext_prefetch_cache, memory
manager on_turn_start — so behavior outside the model-call loop is
identical between paths. Default Hermes flow is unchanged when the
flag is off.
3. End-of-class (line ~15497):
New method _run_codex_app_server_turn(). Lazy-instantiates one
CodexAppServerSession per AIAgent (reused across turns), runs the
turn, splices projected_messages into messages, increments
_iters_since_skill by tool_iterations (since the chat_completions
loop normally does that per iteration), fires
_spawn_background_review on the same cadence as the default path.
Counter accounting:
_turns_since_memory ← already incremented at run_conversation:11817
(gated on memory store configured) — codex
helper does NOT touch it (would double-count).
_user_turn_count ← already incremented at run_conversation:11793
— codex helper does NOT touch it.
_iters_since_skill ← incremented in the chat_completions loop per
tool iteration. Codex helper increments by
turn.tool_iterations since the loop is bypassed.
User message:
ALREADY appended to messages by run_conversation pre-loop (line 11823)
before the early-return reaches us. Helper does NOT append again.
Regression test test_user_message_not_duplicated guards this.
Approval callback wiring:
Lazy-fetches tools.terminal_tool._get_approval_callback at session
spawn time, passes to CodexAppServerSession. CLI threads with
prompt_toolkit get interactive approvals; gateway/cron contexts get
the codex-side fail-closed deny.
Error path:
Codex session exceptions become a 'partial' result with completed=False
and a final_response that explicitly tells the user how to switch back:
'Codex app-server turn failed: ... Fall back to default runtime with
/codex-runtime auto.' Same return-dict shape as the chat_completions
path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.
9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
- api_mode='codex_app_server' is accepted on AIAgent construction
- run_conversation returns the expected codex shape
(final_response, codex_thread_id, codex_turn_id, completed, partial)
- Projected messages are spliced into messages list
- _iters_since_skill ticks per tool iteration
- _user_turn_count delegated to standard flow (not double-counted)
- User message appears exactly once (regression guard)
- _spawn_background_review IS invoked (memory/skill review keeps working)
- chat.completions.create is NEVER called (loop fully bypassed)
- Session exception → partial result with /codex-runtime auto hint
- Interrupted turn → partial result with error preserved
Adjacent test runs confirm no regressions:
- tests/run_agent/test_memory_nudge_counter_hydration.py: green
- tests/run_agent/test_background_review.py: green
- tests/run_agent/test_fallback_model.py: green
- tests/agent/transports/: 249/249 green
Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.
* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)
User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.
Surface:
/codex-runtime — show current state + codex CLI status
/codex-runtime auto — Hermes default runtime
/codex-runtime codex_app_server — codex subprocess runtime
/codex-runtime on / off — synonyms
Files changed:
hermes_cli/codex_runtime_switch.py (new):
Pure-Python state machine shared by CLI and gateway. Parse args,
read/write model.openai_runtime in the config dict, gate enabling
behind a codex --version check (don't let users opt in to a runtime
they have no binary for; print npm install hint instead).
Returns a CodexRuntimeStatus dataclass that callers render however
suits their surface.
hermes_cli/commands.py:
Single CommandDef entry, no aliases (codex-runtime is its own thing).
cli.py:
Dispatch in process_command() + _handle_codex_runtime() handler that
delegates to the shared module and renders results via _cprint.
gateway/run.py:
Dispatch in _handle_message() + _handle_codex_runtime_command() that
returns a string (gateway sends as message). On a successful change
that requires a new session, _evict_cached_agent() forces the next
inbound message to construct a fresh AIAgent with the new api_mode —
avoids prompt-cache invalidation mid-session.
gateway/run.py running-agent guard:
/codex-runtime joins /model in the early-intercept block so a runtime
flip mid-turn can't split a turn across two transports.
Tests:
tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
state machine: arg parsing (10 cases incl. case-insensitive and
synonyms), reading current runtime (5 cases incl. malformed configs),
writing runtime (3 cases), apply() entry point covering read-only,
no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
and persist-failure paths (8 cases). All green.
Adjacent test suites confirm no regressions:
- tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
167/167 green
- tests/agent/transports/: 283/283 green when combined with prior commits
Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.
* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml
Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.
The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.
What translates (mapping verified against codex-rs/core/src/config/edit.rs):
Hermes mcp_servers.<n>.command/args/env → codex stdio transport
Hermes mcp_servers.<n>.url/headers → codex streamable_http transport
Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec
Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec
Hermes mcp_servers.<n>.cwd → codex stdio cwd
Hermes mcp_servers.<n>.enabled: false → codex enabled = false
What does NOT translate (warned + skipped per server):
Hermes-specific keys (sampling, etc.) — codex's MCP client has no
equivalent. Listed in the per-server skipped[] field of the report.
What's NOT migrated (intentional):
AGENTS.md — codex respects this file natively in its cwd. Hermes' own
AGENTS.md (project-level) is already in the worktree, so codex picks
it up without translation. No code needed.
Idempotency design:
All managed content lives between a 'managed by hermes-agent' marker
and the next non-mcp_servers section header. _strip_existing_managed_block
removes the prior managed region cleanly, preserving any user-added
codex config (model, providers.openai, sandbox profiles, etc.) above
or below.
Files added:
hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
helper. Public API: migrate(hermes_config, codex_home=None,
dry_run=False) returns MigrationReport with .migrated/.errors/
.skipped_keys_per_server. No external TOML dependency — minimal
formatter handles strings/numbers/booleans/lists/inline-tables.
tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
covering:
- per-server translation (12): stdio/http/sse, cwd, timeouts,
enabled flag, command+url precedence, sampling drop, unknown keys
- TOML formatter (8): types, escaping, inline tables, error case
- existing-block stripping (4): no marker, alone, with user content
above, with user content below
- end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
re-run, preserves user config, error reporting, invalid input,
summary formatting
Files changed:
hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
the codex_app_server enable branch. Migration failure logs a warning
in the result message but does NOT fail the runtime change. Disable
path (auto) explicitly skips migration.
tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
test_migration_failure_does_not_block_enable.
All 325 feature tests green:
- tests/agent/transports/: 249 (incl. 67 new)
- tests/run_agent/test_codex_app_server_integration.py: 9
- tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
- tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)
* perf(codex-runtime): cache codex --version check within apply()
Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.
Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.
Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.
Two regression-guard tests added:
- test_binary_check_cached_within_apply: enable path → call_count == 1
- test_binary_check_cached_on_read_only_call: state-report path → call_count == 1
Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.
* fix(codex-runtime): correct protocol field names found via live e2e test
Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.
Bug 1: thread/start.permissions wire format
Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
{"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.
Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.
Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.
Bug 2: server-request method names
Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
item/commandExecution/requestApproval
item/fileChange/requestApproval
item/permissions/requestApproval (new third method)
Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.
Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.
Bug 3: approval decision values
Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).
Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.
Live test verified after fixes:
$ hermes (with model.openai_runtime: codex_app_server)
> Run the shell command: echo hermes-codex-livetest > .../proof.txt
then read it back
Approval prompt fired with 'Codex requests exec in <cwd>'.
User chose 'Allow once'. Codex executed the command, wrote the file,
read it back. Final response: 'Read back from proof.txt:
hermes-codex-livetest'. File contents on disk match.
agent.log confirms:
codex app-server thread started: id=019e200e profile=workspace-write
cwd=/tmp/hermes-codex-livetest/workspace
All 20 session tests still green after wire-format updates.
* fix(codex-runtime): correct apply_patch approval params + ship docs
Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.
Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.
Live e2e validation across the path matrix:
✓ thread/start handshake
✓ turn/start with text input
✓ commandExecution items + projection
✓ item/commandExecution/requestApproval → Hermes UI → response
✓ Approve once → command runs
✓ Deny → command rejected, codex falls back to read-only message
✓ Multi-turn (codex remembers prior turn's results)
✓ apply_patch via Codex's fileChange path
✓ item/fileChange/requestApproval → Hermes UI
✓ MCP server migration loads inside spawned codex (verified via
'use the filesystem MCP tool' prompt)
✓ /codex-runtime auto → codex_app_server toggle cycle
✓ Disable doesn't trigger migration
✓ Enable with codex CLI present succeeds + migrates
✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
even if codex finishes before the interrupt lands)
Known live-validated limitations now documented in the docs page:
- delegate_task subagents unavailable on this runtime
- permission profile selection delegated to ~/.codex/config.toml
- apply_patch approval prompt has no inline changeset (codex protocol
doesn't expose it)
145/145 codex-runtime tests still green.
* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)
Major: migrate native Codex plugins (#7 in OpenClaw's PR list)
Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.
Implementation:
- hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
helper spawns 'codex app-server' briefly and walks plugin/list. Returns
(plugins, error) — failures are non-fatal so MCP migration still works.
- render_codex_toml_section() now takes plugins + permissions args.
- migrate() defaults: discover_plugins=True, default_permission_profile=
'workspace-write'. Explicit None on either disables that side.
- _strip_existing_managed_block() now also strips [plugins.*] and
[permissions]/[permissions.*] sections inside the managed block, so
re-runs replace plugins cleanly without touching codex's own config.
Quirk fixes:
#2 Default permissions profile written on enable.
Without this, Codex's read-only default kicks in and EVERY write
triggers an approval prompt. Now writes [permissions] default =
'workspace-write' so the runtime feels normal out of the box. Set
default_permission_profile=None to opt out.
#4 apply_patch approval prompt now shows what's changing.
Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
Session adapter now caches the fileChange item from item/started
notifications and looks it up by itemId when codex requests approval.
Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
'apply_patch (0 change(s))'.
Side benefit: also drains pending notifications BEFORE handling a
server request, so the projector and per-turn caches are up to date
when the approval decision fires. Bounded to 8 notifications per
loop iter to avoid starving codex's response.
#5/#10 Exec approval prompt never shows empty cwd.
When codex omits cwd in CommandExecutionRequestApprovalParams, fall
back to the session's cwd. If somehow neither is available, show
'<unknown>' explicitly instead of an empty string.
Also surfaces 'reason' from the approval params when codex provides
it — gives users more context on why codex wants to run something.
#11 Banner indicates the codex_app_server runtime when active.
New 'Runtime: codex app-server (terminal/file ops/MCP run inside
codex)' line appears in the welcome banner only when the runtime is
on. Default banner is unchanged.
Tests:
- 7 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery (mocked), failure handling, dry-run skip, opt-out
flag, idempotent re-runs, and permissions writing.
- 3 new tests in test_codex_app_server_session.py covering the
enriched approval prompts: cwd fallback, change summary on
apply_patch, fallback when no item/started cache exists.
- All 26 session tests + 46 migration tests green; 153 total in PR.
* feat(codex-runtime): hermes-tools MCP callback + native plugin migration
The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.
Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.
New module: agent/transports/hermes_tools_mcp_server.py
FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
through model_tools.handle_function_call() — same code path as the
Hermes default runtime. Run with:
python -m agent.transports.hermes_tools_mcp_server [--verbose]
Exposed: web_search, web_extract, browser_navigate / _click / _type /
_press / _snapshot / _scroll / _back / _get_images / _console /
_vision, vision_analyze, image_generate, skill_view, skills_list,
text_to_speech.
NOT exposed (deliberately):
- terminal/shell/read_file/write_file/patch — codex has built-ins
- delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
model_tools.py:493, require running AIAgent context. Documented
as a limitation and surfaced in the slash command output.
Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
- _query_codex_plugins() spawns 'codex app-server' briefly to walk
plugin/list and pull installed openai-curated plugins. Failures are
non-fatal — MCP migration still completes.
- render_codex_toml_section() now takes plugins + permissions args
AND wraps the managed block with a MIGRATION_END_MARKER comment so
the stripper can reliably find both ends, even when the block
contains top-level keys (default_permissions = ...).
- migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
default_permission_profile=':workspace' (built-in codex profile name
— must be prefixed with ':'). All three opt-out via explicit args.
- _build_hermes_tools_mcp_entry() builds the codex stdio entry with
HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
Hermes points the MCP subprocess at the same module layout.
Live-caught wire bugs fixed during this turn:
1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
for *user-defined* profiles with structured fields. Built-in
profile names start with ':' (':workspace', ':read-only',
':danger-no-sandbox'). Was emitting
which codex rejected with 'invalid type: string "X", expected
struct PermissionProfileToml'.
2. Built-in profile is , NOT . Codex
rejected with 'unknown built-in profile'.
3. Codex's MCP layer sends for
tool-call confirmation. We weren't handling it, so codex stalled
and returned 'MCP tool call was rejected'. Now: auto-accept for
our own hermes-tools server (user already opted in by enabling
the runtime), decline for third-party servers.
Quirk fixes shipped (from the limitations list):
#2 default permissions: workspace profile written on enable. No more
approval prompt on every write.
#4 apply_patch approval shows what's changing: cache fileChange
items from item/started, look up by itemId when codex sends
item/fileChange/requestApproval. Prompt: '1 add, 1 update:
/tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
#5/#10 exec approval cwd never empty: fall back to session cwd, then
'<unknown>'. Also surfaces 'reason' from codex when present.
#11 banner shows 'Runtime: codex app-server' line when active so
users understand why tool counts may not match what's reachable.
Tests:
- 5 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery, expose_hermes_tools entry generation, idempotent
re-runs, opt-out flag, permissions profile.
- 3 new tests in test_codex_app_server_session.py covering enriched
approval prompts (cwd fallback, fileChange summary).
- 2 new tests for mcpServer/elicitation/request handling (accept
hermes-tools, decline others).
- New test file test_hermes_tools_mcp_server.py covering module
surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
no agent-loop tools), and main() error paths.
- 166 codex-runtime tests total, all green.
Live e2e validated against codex 0.130.0 + ChatGPT subscription:
✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
registers hermes-tools, writes default_permissions = ':workspace'
✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
✓ Shell command runs without approval prompt (workspace profile works)
✓ Multi-turn — codex remembers prior turn's results
✓ apply_patch path via fileChange request approval
✓ web_search via hermes-tools MCP callback returns real Firecrawl
results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
✓ Disable cycle clean
Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
Full re-write covering native plugin migration, the hermes-tools
callback architecture, the prerequisites change ('codex login is
separate from hermes auth login codex'), the trade-off table now
reflecting which Hermes tools work via callback, and the limitations
list updated with what's actually unavailable on this runtime.
* feat(codex-runtime): pin user-config preservation invariant for quirk #6
Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.
This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.
Now explicitly tested:
- User MCP server above the managed block survives migration
- User MCP server below the managed block survives migration
- Both above + below survive a second re-migration
- User content (model, providers, sandbox, otel, etc.) outside our
region is left untouched
Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.
167 codex-runtime tests, all green.
* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find
Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.
Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).
Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:
1. Codex's built-in toolset (always on) — shell, apply_patch,
update_plan, view_image, web_search; covers everything terminal-
adjacent.
2. Native Codex plugins (auto-migrated from your codex plugin
install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
web_search/web_extract via Firecrawl, browser_*, vision_analyze,
image_generate, skill_view/skills_list, text_to_speech.
Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.
Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.
Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.
No code changes — purely docs clarification. 167 codex-runtime tests
still green.
* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade
Two real bugs in the self-improvement loop integration that the previous
test mocked away.
Bug 1: wrong call signature
The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
messages_snapshot=list (positional or keyword)
review_memory=bool (at least one trigger must be True)
review_skills=bool
So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.
Bug 2: review fork inherits codex_app_server api_mode
The review fork is constructed with:
api_mode = _parent_runtime.get('api_mode')
So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.
Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).
Also rewrote the codex helper's review wiring to match the
chat_completions path:
- Computes _should_review_memory in the pre-loop block (was already
being computed; now passed through to the helper as an arg).
- Computes _should_review_skills AFTER the codex turn returns +
counters tick (line ~15432 pattern in chat_completions).
- Calls _spawn_background_review(messages_snapshot=, review_memory=,
review_skills=) only when at least one trigger fires.
- Adds the external memory provider sync (_sync_external_memory_for_turn)
that the chat_completions path runs after every turn.
Tests:
Replaced the broken test_background_review_invoked (which only
asserted spawn.called) with three sharper tests:
- test_background_review_NOT_invoked_below_threshold:
single turn at default thresholds → no review fires (would have
caught the original 'every turn calls spawn with no args' bug)
- test_background_review_skill_trigger_fires_above_threshold:
10 tool_iterations at threshold=10 → review fires with
messages_snapshot=list, review_skills=True, counter resets
- test_background_review_signature_never_breaks: regression guard
asserting positional args are always empty and kwargs include
messages_snapshot
New TestReviewForkApiModeDowngrade class:
- test_codex_app_server_parent_downgrades_review_fork: drives the
real _spawn_background_review function (no mock at that level),
asserts the review_agent gets api_mode='codex_responses' when
the parent was codex_app_server.
Live-validated against real run_conversation:
- Counter ticked from 0 to 5 after a 5-tool-iteration turn
- _spawn_background_review fired exactly once with kwargs-only signature
- review_skills=True, review_memory=False
- messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
results + 1 final assistant + initial system/user)
- Counter reset to 0 after fire
170 codex-runtime tests, all green.
Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).
* feat(codex-runtime): expose kanban tools through Hermes MCP callback
Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.
That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.
Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.
Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.
Tools exposed:
Worker handoff (require HERMES_KANBAN_TASK):
kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
Read-only board queries:
kanban_show, kanban_list
Orchestrator (require HERMES_KANBAN_TASK unset):
kanban_create, kanban_unblock, kanban_link
Tests:
- test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
- test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link
Docs:
- New 'Workflow features' section in the docs page covering /goal,
kanban, and cron behavior on this runtime
- /goal: works fully via run_conversation feedback; only caveat is
approval-prompt noise on long writes-heavy goals (mitigated by
the default :workspace permission profile)
- Kanban: enumerated which tools are reachable via the callback and
why the env var propagates correctly through the codex subprocess
to the MCP server subprocess
- Cron: documented as 'not specifically tested' — same rules as the
CLI apply since cron runs through AIAgent.run_conversation
- Trade-offs table gained rows for /goal, kanban worker, kanban
orchestrator
172/172 codex-runtime tests green (+2 from kanban tests).
* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost
Three docs gaps caught during a final audit:
1. /codex-runtime was only in the feature docs page, not in the
slash-commands reference. Added rows to both the CLI section and
the Messaging section so users discover it where they'd look for
slash command syntax.
2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
CODEX_HOME lets users redirect Codex CLI's config dir (the migration
honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
propagates to the codex subprocess + the hermes-tools MCP subprocess
so kanban worker tools gate correctly — documented as 'don't set
manually' since it's an internal handoff.
3. Aux client behavior on this runtime. When openai_runtime=
codex_app_server is on with the openai-codex provider, every aux
task (title generation, context compression, vision auto-detect,
session search summarization, the background self-improvement review
fork) flows through the user's ChatGPT subscription by default.
This is true for the existing codex_responses path too, but it's
more visible / important here because users explicitly opted in for
subscription billing. Added a 'Auxiliary tasks and ChatGPT
subscription token cost' section to the docs page with a YAML
example showing how to override specific aux tasks to a cheaper
model (typically google/gemini-3-flash-preview via OpenRouter).
Also documents how the self-improvement review fork gets
auto-downgraded from codex_app_server to codex_responses by the
fix earlier in this PR.
No code changes — pure docs. 172 codex-runtime tests still green.
* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME
OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.
Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:
test_spawn_env_preserves_HOME — confirms parent HOME survives intact
in the subprocess env
test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
arg still isolates
codex state correctly
Docs additions:
'HOME environment variable passthrough' section — calls out the
contract explicitly: CODEX_HOME isolates codex's own state, HOME
stays user-real so gh/git/aws/npm/etc. find their normal config.
Cites openclaw#81562 as the cautionary tale.
'Multi-profile / multi-tenant setups' section — addresses the
related concern: profiles share ~/.codex/ by default. For users who
want per-profile codex isolation (separate auth, separate plugins),
documents the manual CODEX_HOME=<profile-scoped-dir> approach.
Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
would silently invalidate existing codex login state for anyone
upgrading to this PR with tokens already at ~/.codex/auth.json.
Opt-in is safer than surprising users.
174 codex-runtime tests (+2 from HOME guards), all green.
* fix(codex-runtime): TOML control-char escapes + atomic config.toml write
Two footguns caught in a final audit pass before merge.
Bug 1: TOML control characters not escaped
The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.
Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.
Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.
Bug 2: config.toml write wasn't atomic
If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.
Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.
Tests:
- test_string_with_newline_escaped — \n in value → \n in output
- test_string_with_tab_escaped — \t in value → \t in output
- test_string_with_other_controls_escaped — \r, \f, \b
- test_windows_path_escaped_correctly — backslash doubling
- test_atomic_write_no_temp_leak_on_success — no .config.toml.*
left over after a successful write
- test_atomic_write_cleanup_on_rename_failure — temp file removed
when Path.replace raises (simulated disk full)
180 codex-runtime tests, all green (+6 from this commit).
Footguns audited but NOT fixed (with rationale):
- Concurrent migrations race. Two Hermes processes hitting
/codex-runtime codex_app_server within seconds of each other could
cause one writer to lose entries. Low probability (you'd have to
enable from two surfaces simultaneously) and low impact (just re-run
migration). Adding fcntl/msvcrt locking is more code than it's
worth here. The atomic rename above means each individual write is
consistent — only the merge step is racy.
- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
check at runtime but don't reject too-new versions. Right call —
the protocol has been stable through 0.125 → 0.130. If OpenAI
breaks it later we'd see the error in test_codex_app_server_runtime
on CI before users hit it.
|
||
|
|
9d42c2c286
|
feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)
* feat(video_gen): unified video_generate tool with pluggable provider backends One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools. Surface (one schema, every backend): - operation: generate / edit / extend - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url) - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool Backends shipped: - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface) - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare Wiring: - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/ - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml - hermes_cli/plugins.py — PluginContext.register_video_gen_provider() - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model} - toolsets.py — new video_gen toolset - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface). Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): dynamic schema reflects active backend's capabilities Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model. The agent sees backend-specific guidance up-front: - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected' - 'fal-ai/veo3.1' (t2v): no image_url restriction shown - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls' - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools' This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill. Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically. Tested: - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing - 37/37 in the slice, 134/134 in the broader regression set * test(video_gen/xai): full surface integration tests + cleaner schema Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping. 15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope. Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads: Active backend: xAI . model: grok-imagine-video - operations supported by this backend: edit, extend, generate - modalities supported by this backend: image, reference_images, text - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16 - resolution choices: 480p, 720p - duration range: 1-15s - reference_image_urls: up to 7 images Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing Two design changes per Teknium: 1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema. 2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest. Catalog rewritten as families: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure. Schema changes: - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model. - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key. - success_response: add 'modality' field ('text' | 'image') so the agent and logs can see which endpoint was actually hit. Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads: Active backend: FAL . model: pixverse-v6 - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically - aspect_ratio choices: 16:9, 9:16, 1:1 - resolution choices: 360p, 540p, 720p, 1080p - duration range: 1-15s - audio: pass audio=true to enable native audio (pricing tier) - negative_prompt: supported Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard. Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers Catalog now covers everything Teknium specced from FAL: Cheap tier: ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video Premium tier: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video happy-horse fal-ai/happy-horse/text-to-video / image-to-video DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model. New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url. Per-family capability flags reflect each model's docs: - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults) - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green. Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available. * test(video_gen): tool-surface routing matrix — every model x modality End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape. Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free). Coverage: - All 6 FAL families x {text-only, text+image} = 12 cases - xAI x {text-only, text+image} = 2 cases - tool-level model= arg overrides config = 2 cases For each case, verifies: - result['success'] is True - result['modality'] matches input shape ('text' if no image_url, 'image' otherwise) - outbound endpoint URL matches the family's text_endpoint or image_endpoint - text-only payloads carry no image-shaped keys - text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI) All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage. * feat(video_gen): keep video_gen out of first-run setup, surface in status Two changes: 1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow — most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker. 2. The 'hermes setup' status panel learns about video_gen — but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired. Verified live: - Fresh install (no creds): zero video_gen mentions in wizard. - With FAL_KEY: status row appears with active backend name. - 160/160 in the setup + tools_config + video_gen test slice. Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier — long wait, paid per-second pricing. Default-off matches user intent better. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> |
||
|
|
cc64a04f61 |
test(gateway): make queued follow-up regression generic
Replace tenant-specific example text in the transcript offset regression with generic follow-up turns so the upstream test documents the bug without customer-specific wording. |
||
|
|
9a815b6c8c |
fix(gateway): preserve queued follow-up transcript history
Keep the outer history_offset when _run_agent drains queued follow-ups recursively so transcript persistence includes every queued turn in the chain instead of only the last one. |
||
|
|
e2b2d48610 |
fix(cli): preserve startup banner on terminal resize
Recover from SIGWINCH without clearing the physical screen or scrollback buffer. The startup banner and tool summary are printed before prompt_toolkit owns the live chrome, so they live in normal terminal scrollback. Calling erase_screen() + \x1b[3J] on every resize removed that UI permanently — _replay_output_history cannot reconstruct it because the banner was never added to _OUTPUT_HISTORY. Instead, just reset prompt_toolkit's renderer cache and invalidate so the next incremental redraw starts from a clean slate, then let the original on_resize handler recalculate layout for the new terminal size. This matches the behaviour of bash/zsh/fish on SIGWINCH. Fixes NousResearch/hermes-agent#22999 |
||
|
|
59da8ec4ec |
fix(tools): refuse skill_view name collisions instead of guessing
skill_view ran the direct-path strategy across every skill dir before the recursive strategy, so a top-level skill in an external dir could silently shadow a same-named nested local skill. /skills correctly listed the local version (deduped local-first by _find_all_skills) but skill_view loaded the external one — confusing, and a real bug class for users with skills.external_dirs registered alongside categorized local skills. Pick a louder fix than @polkn's PR #6136 proposed: collect every match across all dirs (direct path, recursive by parent dir name, legacy flat <name>.md), and if there's more than one, refuse with an error that surfaces every matching path plus a hint to load by the categorized form. Local-first precedence would have replaced silent external-shadowing with silent same-name collisions between two externals, or made an externally-shadowed-by-local skill unreachable by bare name with no signal. Refusing forces the user to disambiguate once and never wonder which skill ran. Recovery: pass the full categorized path ("foundations/runtime/explore-codebase" instead of "explore-codebase"), or rename one of the colliding skills. Co-authored-by: pol <pol.kuijken@gmail.com> |