mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
851 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0f0e20ef81 |
test(novita): cache pricing, add provider test coverage, AUTHOR_MAP entry
Follow-up to Alex-wuhu's NovitaAI provider commit. Adds: - _pricing_cache hit/write in _fetch_novita_pricing (was missing — every pricing fetch was re-hitting the network), mirroring the fetch_ai_gateway_pricing pattern. force_refresh now also propagates from get_pricing_for_provider. - TestNovitaProvider in tests/hermes_cli/test_api_key_providers.py covering profile load, alias resolution, registry auto-registration, model list parity between main.py and models.py, _URL_TO_PROVIDER, _PROVIDER_PREFIXES, context_size in _CONTEXT_LENGTH_KEYS, pricing unit conversion, and pricing cache behavior. - AUTHOR_MAP entry for yanglongwei06@gmail.com → @Alex-yang00. |
||
|
|
35ce94a2f8 |
fix(tests): correct skin engine test API call
The salvaged regression test called skin.get_spinner_list() which doesn't exist on SkinConfig. Replace with direct dict access on skin.spinner — same intent (verify default empty spinner is preserved when user override is invalid). |
||
|
|
5f234d4057 | fix(cli): harden skin yaml parsing for invalid section types | ||
|
|
8f19078c6a
|
feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging — the existing judge model just sees a richer prompt. Forms: /subgoal show current subgoals /subgoal <text> append a criterion /subgoal remove <n> drop subgoal n (1-based) /subgoal clear wipe all subgoals How it integrates: - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows. - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?' - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target. - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary — no race with the running turn. - Status line shows '... , N subgoals' when present. Surface: - hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave - hermes_cli/commands.py — /subgoal CommandDef - cli.py — _handle_subgoal_command - gateway/run.py — _handle_subgoal_command + mid-run dispatch - tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering) 77 goal-related tests passing across goals + cli + gateway + tui. * fix(goals): slash commands don't preempt the goal-continuation hook Two findings from live-testing /subgoal: 1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped — but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present. 2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.' Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited. |
||
|
|
efc32ab639 |
refactor(inventory): extract shared ConfigContext + build_models_payload
Three call-sites in the codebase each duplicated the same config-slice
+ list_authenticated_providers + post-processing pattern:
- hermes_cli/web_server.py /api/model/options
- tui_gateway/server.py model.options JSON-RPC
- tui_gateway/server.py model.save_key JSON-RPC
This consolidates them onto hermes_cli/inventory.py:
load_picker_context() -> ConfigContext
Replaces the 17-LOC config-slice (model.{default,name,provider,
base_url}, providers:, custom_providers:) every consumer did
inline.
ConfigContext.with_overrides(*, current_provider=, current_model=,
current_base_url=) -> ConfigContext
Truthy-only overlay for TUI agent-session state on top of disk
config. Empty getattr(agent, ...) attrs MUST NOT clobber disk.
build_models_payload(ctx, *, include_unconfigured, picker_hints,
canonical_order, max_models) -> dict
Single payload builder. Delegates curation to
list_authenticated_providers (does not call provider_model_ids
per row \u2014 that pulls non-agentic models). picker_hints +
canonical_order produce the TUI ModelPickerDialog shape;
defaults match the dashboard's existing /api/model/options
contract.
Two latent bugs fixed by consolidation:
1. The dashboard read cfg.get('custom_providers') directly, missing
the v12+ keyed providers: form. Now both surfaces go through
get_compatible_custom_providers().
2. The TUI's canonical-merge keyed on is_user_defined to decide order.
Section 3 of list_authenticated_providers sets is_user_defined=True
on rows from the providers: config dict even when the slug is
canonical \u2014 that silently demoted them to the picker tail.
_reorder_canonical now keys on slug membership instead.
Stats: +666 / -145 (net +521). Module 240 LOC; 18 behavior tests.
This PR replaces the rejected #23369 (which bundled the consolidation
with new scriptable CLI surfaces \u2014 hermes models list/status, hermes
providers list \u2014 and a JSON contract that have no external user
demand). Just the refactor; the CLI surface is deferred to a separate
PR gated on actual demand.
Refs #23359.
|
||
|
|
d5775fe988
|
feat(codex-runtime): skip unavailable plugins during migration (#25437)
Followup to PR #24182 — caught when scanning OpenClaw for recent codex fixes we hadn't considered. OpenClaw learned the hard way (#80815) that migrating plugins which codex itself reports as unavailable produces config that fails at activation time. Our /codex-runtime codex_app_server enable path queries codex's plugin/list and migrates everything where installed=true. We were trusting codex's installation state and ignoring its availability field. So a plugin that's installed=true but availability=UNAVAILABLE (broken local install) or REQUIRES_AUTH (OAuth expired or never completed) would get an [plugins."<n>@openai-curated"] entry in ~/.codex/config.toml — and the user's first codex turn after enabling the runtime would fail because codex refuses to activate it. Fix: filter on availability in _query_codex_plugins(). Only emit plugins where availability is empty (older codex versions without the field — preserve backward compat) or explicitly AVAILABLE. Tests: test_plugin_discovery_skips_unavailable_plugins — verifies 4 cases: - good-plugin (installed=True, availability=AVAILABLE) → migrated - broken-plugin (installed=True, availability=UNAVAILABLE) → skipped - auth-pending (installed=True, availability=REQUIRES_AUTH) → skipped - legacy-plugin (installed=True, no availability field) → migrated (older codex versions; preserve backward compat) Docs: Added bullet to 'What's NOT migrated' list in the docs page calling out the availability filter and why. Other OpenClaw codex PRs I reviewed but did NOT apply (with reasoning): - #81591 (load Codex for selectable models): we resolve runtime per-call already, no startup-time gating to fix - #81510 (cron compatibility): we documented cron as untested; their fix is for OpenClaw-specific cron orchestration shape - #81223 (rotate incompatible context-engine threads): we don't have a Lossless context engine equivalent - #80688 (constrain sandbox): we don't have an outer-sandbox concept - #80616 (release on turn_aborted): we already handle status= interrupted in turn/completed correctly - #80278 (expose activeModel in plugin SDK): not our surface - #80792 (default destructive_actions on): we don't expose that knob 56 codex-runtime migration tests still green (+1 new). |
||
|
|
3a30c605b3 |
feat(plugins): add thread-local tool whitelist to pre_tool_call gate
Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to hermes_cli/plugins.py. When set on the current thread, restricts which tools can pass through get_pre_tool_call_block_message; non-whitelisted tools are blocked with a configurable deny message. Mirrors the per-thread approval-callback pattern already used by set_approval_callback (tools/terminal_tool.py:190). Used by _spawn_background_review to deny non-memory/non-skill tools at runtime while inheriting the parent agent's full tools schema for prefix-cache parity (see follow-up commit). Tests cover allow / deny / clear / cross-thread isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3f13d78088
|
perf(tools): cache get_nous_auth_status() and load_env() to fix slow hermes tools menus (#25341)
`hermes tools` -> "All Platforms" took ~14s to render the checklist because building the toolset labels called `get_nous_auth_status()` ~31x transitively (`_toolset_has_keys` -> `_visible_providers` -> `get_nous_subscription_features` -> `managed_nous_tools_enabled`). Each call did a synchronous OAuth refresh POST to portal.nousresearch.com (~350ms even on the failure path), so one menu paint burned >13s of HTTP and 31 single-use Nous refresh tokens. Secondary hot spot: every `get_env_value()` re-read and re-sanitised the entire .env file. 116 reads with O(lines x known-keys) scanning added ~300ms of CPU per render. Fix is two process-level caches, both mtime-keyed so login/logout/edit invalidate naturally: * `hermes_cli/auth.py`: memoise `get_nous_auth_status()` for 15s keyed on auth.json mtime. Splits `_compute_nous_auth_status()` as the uncached impl. Adds `invalidate_nous_auth_status_cache()`. * `hermes_cli/config.py`: memoise `load_env()` keyed on .env (path, mtime, size). Adds `invalidate_env_cache()`, wired into `save_env_value`, `remove_env_value`, and the sanitize-on-load writer so writers don't return stale dicts on same-second writes. Before/after on Teknium's box (real HERMES_HOME, no Nous login): * "All Platforms" cold path: ~13,874ms -> ~691ms label-build * Warm re-open within the same process: ~122ms -> ~17ms Side benefit: stops burning a Nous refresh token on every menu paint, which was risking the portal's reuse-detection revocation logic. |
||
|
|
dd5a9502e3
|
fix(tools-config): write video_gen.provider on Reconfigure tool path (#25307)
`_reconfigure_provider()` handled `image_gen_plugin_name` in both branches (no-env-vars early return and post-env-vars) but never mirrored the same handling for `video_gen_plugin_name`. The first-time `_configure_provider()` path correctly routes to `_select_plugin_video_gen_provider()`; reconfigure forgot to. Repro: 1. Enable video_gen in `hermes tools` → Configure for All Platforms. 2. Go back into `hermes tools` → Reconfigure tool → Video Generation. 3. Pick xAI (with XAI_API_KEY already set). 4. Hit Enter at the "keep current key?" prompt. Expected: `video_gen.provider: xai` written to config.yaml. Actual: function returns silently; no `video_gen:` block ever written; `video_generate` tool fails with "No video generation backend is configured." Fix: add the missing `video_gen_plugin_name` branch in both code paths of `_reconfigure_provider()`, mirroring the existing `image_gen_plugin_name` handling and the first-time configure logic. Tests: `tests/hermes_cli/test_video_gen_picker.py` covers both branches (env-vars-set keep-current and no-env-vars paths). |
||
|
|
091d8e1030
|
feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182)
* feat(codex-runtime): scaffold optional codex app-server runtime
Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.
Lands in three pieces:
1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
for codex's app-server protocol (codex-rs/app-server). Spawn, init
handshake, request/response, notification queue, server-initiated
request queue (for approval round-trips), interrupt-friendly blocking
reads. Tested against real codex 0.130.0 binary end-to-end during
development.
2. hermes_cli/runtime_provider.py:
- Adds 'codex_app_server' to _VALID_API_MODES.
- Adds _maybe_apply_codex_app_server_runtime() helper, called at the
end of _resolve_runtime_from_pool_entry(). Inert unless
'model.openai_runtime: codex_app_server' is set in config.yaml AND
provider in {openai, openai-codex}. Other providers cannot be
rerouted (anthropic, openrouter, etc. preserved).
3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
covering api_mode registration, the rewriter helper (default-off,
case-insensitive, opt-in, non-eligible providers preserved), version
parser, missing-binary handling, error class. Does NOT require codex
CLI installed.
This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.
Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)
* feat(codex-runtime): add codex item projector for memory/skill review
The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.
Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
- userMessage → {role: user, content}
- agentMessage → {role: assistant, content: text}
- reasoning → stashed in next assistant's 'reasoning' field
- commandExecution → assistant tool_call(name='exec_command') + tool result
- fileChange → assistant tool_call(name='apply_patch') + tool result
- mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result
- dynamicToolCall → assistant tool_call(name=<tool>) + tool result
- plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls
Invariants preserved:
- Message role alternation never violated: each tool item produces at most
one assistant + one tool message in that order, correlated by call_id.
- Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
don't materialize messages — only item/completed does. Mirrors how
Hermes already only writes the assistant message after streaming ends.
- Tool call ids are deterministic (codex item id-based) so replays produce
identical messages and prefix caches stay valid (AGENTS.md pitfall #16).
- JSON args use sorted_keys for the same reason.
Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).
23 new tests, all green:
- Streaming deltas don't materialize (3 paths)
- Turn/thread frame events are silent
- commandExecution: 5 tests including non-zero exit annotation +
deterministic id stability across replays
- agentMessage + reasoning attachment + reasoning consumption
- fileChange: summary without inlined content
- mcpToolCall: namespaced naming + error surfacing
- userMessage: text fragments only (drops images/etc)
- opaque items: no fabricated tool_calls
- Helpers: deterministic id stability + sorted JSON args
- Role alternation invariant across all four tool-shaped item types
This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.
* feat(codex-runtime): add session adapter + approval bridge
The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.
The adapter has a single public per-turn method:
result = session.run_turn(user_input='...', turn_timeout=600)
# result.final_text → assistant text for the caller
# result.projected_messages → list ready to splice into AIAgent.messages
# result.tool_iterations → tick count for _iters_since_skill nudge
# result.interrupted → True on Ctrl+C / deadline / interrupt
# result.error → error string when the turn cannot complete
# result.turn_id, thread_id → for sessions DB / resume
Behavior:
- ensure_started() spawns codex, does the initialize handshake, and
issues thread/start with cwd + permissions profile. Idempotent.
- run_turn() blocks until turn/completed, drains server-initiated
requests (approvals) before reading notifications so codex never
deadlocks waiting for us, projects every item/completed via the
projector, and increments tool_iterations for the skill nudge gate.
- request_interrupt() is thread-safe (threading.Event); the next loop
iteration issues turn/interrupt and unwinds.
- turn_timeout deadlock guard issues turn/interrupt and records an
error if the turn never completes.
- close() escalates terminate → kill via the underlying client.
Approval bridge:
Codex emits server-initiated requests for execCommandApproval and
applyPatchApproval. The adapter translates Hermes' approval choice
vocabulary onto codex's decision vocabulary:
Hermes 'once' → codex 'approved'
Hermes 'session' or 'always' → codex 'approvedForSession'
Hermes 'deny' / anything else → codex 'denied'
Routing precedence:
1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
2. approval_callback wired by the CLI (defers to
tools.approval.prompt_dangerous_approval())
3. Fail-closed denial when neither is wired
Unknown server-request methods are answered with JSON-RPC error -32601
so codex doesn't hang waiting for us.
Permission profile mapping mirrors AGENTS.md:
Hermes 'auto' → codex 'workspace-write'
Hermes 'approval-required' → codex 'read-only-with-approval'
Hermes 'unrestricted/yolo' → codex 'full-access'
20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
- test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
- test_codex_event_projector.py: 23 (item taxonomy projections)
- test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)
Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.
Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.
* feat(codex-runtime): wire codex_app_server runtime into AIAgent
The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.
Three small surgical edits to run_agent.py (~105 LOC total):
1. Line ~1204 (constructor api_mode validation set):
Add 'codex_app_server' so an explicit api_mode='codex_app_server'
passed to AIAgent() isn't silently rewritten to 'chat_completions'.
2. Line ~12048 (run_conversation, just before the while loop):
Early-return to _run_codex_app_server_turn() when self.api_mode is
'codex_app_server'. Placed AFTER all standard pre-loop setup —
logging context, session DB, surrogate sanitization, _user_turn_count
and _turns_since_memory increments, _ext_prefetch_cache, memory
manager on_turn_start — so behavior outside the model-call loop is
identical between paths. Default Hermes flow is unchanged when the
flag is off.
3. End-of-class (line ~15497):
New method _run_codex_app_server_turn(). Lazy-instantiates one
CodexAppServerSession per AIAgent (reused across turns), runs the
turn, splices projected_messages into messages, increments
_iters_since_skill by tool_iterations (since the chat_completions
loop normally does that per iteration), fires
_spawn_background_review on the same cadence as the default path.
Counter accounting:
_turns_since_memory ← already incremented at run_conversation:11817
(gated on memory store configured) — codex
helper does NOT touch it (would double-count).
_user_turn_count ← already incremented at run_conversation:11793
— codex helper does NOT touch it.
_iters_since_skill ← incremented in the chat_completions loop per
tool iteration. Codex helper increments by
turn.tool_iterations since the loop is bypassed.
User message:
ALREADY appended to messages by run_conversation pre-loop (line 11823)
before the early-return reaches us. Helper does NOT append again.
Regression test test_user_message_not_duplicated guards this.
Approval callback wiring:
Lazy-fetches tools.terminal_tool._get_approval_callback at session
spawn time, passes to CodexAppServerSession. CLI threads with
prompt_toolkit get interactive approvals; gateway/cron contexts get
the codex-side fail-closed deny.
Error path:
Codex session exceptions become a 'partial' result with completed=False
and a final_response that explicitly tells the user how to switch back:
'Codex app-server turn failed: ... Fall back to default runtime with
/codex-runtime auto.' Same return-dict shape as the chat_completions
path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.
9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
- api_mode='codex_app_server' is accepted on AIAgent construction
- run_conversation returns the expected codex shape
(final_response, codex_thread_id, codex_turn_id, completed, partial)
- Projected messages are spliced into messages list
- _iters_since_skill ticks per tool iteration
- _user_turn_count delegated to standard flow (not double-counted)
- User message appears exactly once (regression guard)
- _spawn_background_review IS invoked (memory/skill review keeps working)
- chat.completions.create is NEVER called (loop fully bypassed)
- Session exception → partial result with /codex-runtime auto hint
- Interrupted turn → partial result with error preserved
Adjacent test runs confirm no regressions:
- tests/run_agent/test_memory_nudge_counter_hydration.py: green
- tests/run_agent/test_background_review.py: green
- tests/run_agent/test_fallback_model.py: green
- tests/agent/transports/: 249/249 green
Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.
* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)
User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.
Surface:
/codex-runtime — show current state + codex CLI status
/codex-runtime auto — Hermes default runtime
/codex-runtime codex_app_server — codex subprocess runtime
/codex-runtime on / off — synonyms
Files changed:
hermes_cli/codex_runtime_switch.py (new):
Pure-Python state machine shared by CLI and gateway. Parse args,
read/write model.openai_runtime in the config dict, gate enabling
behind a codex --version check (don't let users opt in to a runtime
they have no binary for; print npm install hint instead).
Returns a CodexRuntimeStatus dataclass that callers render however
suits their surface.
hermes_cli/commands.py:
Single CommandDef entry, no aliases (codex-runtime is its own thing).
cli.py:
Dispatch in process_command() + _handle_codex_runtime() handler that
delegates to the shared module and renders results via _cprint.
gateway/run.py:
Dispatch in _handle_message() + _handle_codex_runtime_command() that
returns a string (gateway sends as message). On a successful change
that requires a new session, _evict_cached_agent() forces the next
inbound message to construct a fresh AIAgent with the new api_mode —
avoids prompt-cache invalidation mid-session.
gateway/run.py running-agent guard:
/codex-runtime joins /model in the early-intercept block so a runtime
flip mid-turn can't split a turn across two transports.
Tests:
tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
state machine: arg parsing (10 cases incl. case-insensitive and
synonyms), reading current runtime (5 cases incl. malformed configs),
writing runtime (3 cases), apply() entry point covering read-only,
no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
and persist-failure paths (8 cases). All green.
Adjacent test suites confirm no regressions:
- tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
167/167 green
- tests/agent/transports/: 283/283 green when combined with prior commits
Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.
* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml
Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.
The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.
What translates (mapping verified against codex-rs/core/src/config/edit.rs):
Hermes mcp_servers.<n>.command/args/env → codex stdio transport
Hermes mcp_servers.<n>.url/headers → codex streamable_http transport
Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec
Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec
Hermes mcp_servers.<n>.cwd → codex stdio cwd
Hermes mcp_servers.<n>.enabled: false → codex enabled = false
What does NOT translate (warned + skipped per server):
Hermes-specific keys (sampling, etc.) — codex's MCP client has no
equivalent. Listed in the per-server skipped[] field of the report.
What's NOT migrated (intentional):
AGENTS.md — codex respects this file natively in its cwd. Hermes' own
AGENTS.md (project-level) is already in the worktree, so codex picks
it up without translation. No code needed.
Idempotency design:
All managed content lives between a 'managed by hermes-agent' marker
and the next non-mcp_servers section header. _strip_existing_managed_block
removes the prior managed region cleanly, preserving any user-added
codex config (model, providers.openai, sandbox profiles, etc.) above
or below.
Files added:
hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
helper. Public API: migrate(hermes_config, codex_home=None,
dry_run=False) returns MigrationReport with .migrated/.errors/
.skipped_keys_per_server. No external TOML dependency — minimal
formatter handles strings/numbers/booleans/lists/inline-tables.
tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
covering:
- per-server translation (12): stdio/http/sse, cwd, timeouts,
enabled flag, command+url precedence, sampling drop, unknown keys
- TOML formatter (8): types, escaping, inline tables, error case
- existing-block stripping (4): no marker, alone, with user content
above, with user content below
- end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
re-run, preserves user config, error reporting, invalid input,
summary formatting
Files changed:
hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
the codex_app_server enable branch. Migration failure logs a warning
in the result message but does NOT fail the runtime change. Disable
path (auto) explicitly skips migration.
tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
test_migration_failure_does_not_block_enable.
All 325 feature tests green:
- tests/agent/transports/: 249 (incl. 67 new)
- tests/run_agent/test_codex_app_server_integration.py: 9
- tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
- tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)
* perf(codex-runtime): cache codex --version check within apply()
Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.
Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.
Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.
Two regression-guard tests added:
- test_binary_check_cached_within_apply: enable path → call_count == 1
- test_binary_check_cached_on_read_only_call: state-report path → call_count == 1
Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.
* fix(codex-runtime): correct protocol field names found via live e2e test
Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.
Bug 1: thread/start.permissions wire format
Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
{"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.
Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.
Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.
Bug 2: server-request method names
Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
item/commandExecution/requestApproval
item/fileChange/requestApproval
item/permissions/requestApproval (new third method)
Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.
Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.
Bug 3: approval decision values
Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).
Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.
Live test verified after fixes:
$ hermes (with model.openai_runtime: codex_app_server)
> Run the shell command: echo hermes-codex-livetest > .../proof.txt
then read it back
Approval prompt fired with 'Codex requests exec in <cwd>'.
User chose 'Allow once'. Codex executed the command, wrote the file,
read it back. Final response: 'Read back from proof.txt:
hermes-codex-livetest'. File contents on disk match.
agent.log confirms:
codex app-server thread started: id=019e200e profile=workspace-write
cwd=/tmp/hermes-codex-livetest/workspace
All 20 session tests still green after wire-format updates.
* fix(codex-runtime): correct apply_patch approval params + ship docs
Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.
Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.
Live e2e validation across the path matrix:
✓ thread/start handshake
✓ turn/start with text input
✓ commandExecution items + projection
✓ item/commandExecution/requestApproval → Hermes UI → response
✓ Approve once → command runs
✓ Deny → command rejected, codex falls back to read-only message
✓ Multi-turn (codex remembers prior turn's results)
✓ apply_patch via Codex's fileChange path
✓ item/fileChange/requestApproval → Hermes UI
✓ MCP server migration loads inside spawned codex (verified via
'use the filesystem MCP tool' prompt)
✓ /codex-runtime auto → codex_app_server toggle cycle
✓ Disable doesn't trigger migration
✓ Enable with codex CLI present succeeds + migrates
✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
even if codex finishes before the interrupt lands)
Known live-validated limitations now documented in the docs page:
- delegate_task subagents unavailable on this runtime
- permission profile selection delegated to ~/.codex/config.toml
- apply_patch approval prompt has no inline changeset (codex protocol
doesn't expose it)
145/145 codex-runtime tests still green.
* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)
Major: migrate native Codex plugins (#7 in OpenClaw's PR list)
Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.
Implementation:
- hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
helper spawns 'codex app-server' briefly and walks plugin/list. Returns
(plugins, error) — failures are non-fatal so MCP migration still works.
- render_codex_toml_section() now takes plugins + permissions args.
- migrate() defaults: discover_plugins=True, default_permission_profile=
'workspace-write'. Explicit None on either disables that side.
- _strip_existing_managed_block() now also strips [plugins.*] and
[permissions]/[permissions.*] sections inside the managed block, so
re-runs replace plugins cleanly without touching codex's own config.
Quirk fixes:
#2 Default permissions profile written on enable.
Without this, Codex's read-only default kicks in and EVERY write
triggers an approval prompt. Now writes [permissions] default =
'workspace-write' so the runtime feels normal out of the box. Set
default_permission_profile=None to opt out.
#4 apply_patch approval prompt now shows what's changing.
Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
Session adapter now caches the fileChange item from item/started
notifications and looks it up by itemId when codex requests approval.
Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
'apply_patch (0 change(s))'.
Side benefit: also drains pending notifications BEFORE handling a
server request, so the projector and per-turn caches are up to date
when the approval decision fires. Bounded to 8 notifications per
loop iter to avoid starving codex's response.
#5/#10 Exec approval prompt never shows empty cwd.
When codex omits cwd in CommandExecutionRequestApprovalParams, fall
back to the session's cwd. If somehow neither is available, show
'<unknown>' explicitly instead of an empty string.
Also surfaces 'reason' from the approval params when codex provides
it — gives users more context on why codex wants to run something.
#11 Banner indicates the codex_app_server runtime when active.
New 'Runtime: codex app-server (terminal/file ops/MCP run inside
codex)' line appears in the welcome banner only when the runtime is
on. Default banner is unchanged.
Tests:
- 7 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery (mocked), failure handling, dry-run skip, opt-out
flag, idempotent re-runs, and permissions writing.
- 3 new tests in test_codex_app_server_session.py covering the
enriched approval prompts: cwd fallback, change summary on
apply_patch, fallback when no item/started cache exists.
- All 26 session tests + 46 migration tests green; 153 total in PR.
* feat(codex-runtime): hermes-tools MCP callback + native plugin migration
The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.
Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.
New module: agent/transports/hermes_tools_mcp_server.py
FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
through model_tools.handle_function_call() — same code path as the
Hermes default runtime. Run with:
python -m agent.transports.hermes_tools_mcp_server [--verbose]
Exposed: web_search, web_extract, browser_navigate / _click / _type /
_press / _snapshot / _scroll / _back / _get_images / _console /
_vision, vision_analyze, image_generate, skill_view, skills_list,
text_to_speech.
NOT exposed (deliberately):
- terminal/shell/read_file/write_file/patch — codex has built-ins
- delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
model_tools.py:493, require running AIAgent context. Documented
as a limitation and surfaced in the slash command output.
Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
- _query_codex_plugins() spawns 'codex app-server' briefly to walk
plugin/list and pull installed openai-curated plugins. Failures are
non-fatal — MCP migration still completes.
- render_codex_toml_section() now takes plugins + permissions args
AND wraps the managed block with a MIGRATION_END_MARKER comment so
the stripper can reliably find both ends, even when the block
contains top-level keys (default_permissions = ...).
- migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
default_permission_profile=':workspace' (built-in codex profile name
— must be prefixed with ':'). All three opt-out via explicit args.
- _build_hermes_tools_mcp_entry() builds the codex stdio entry with
HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
Hermes points the MCP subprocess at the same module layout.
Live-caught wire bugs fixed during this turn:
1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
for *user-defined* profiles with structured fields. Built-in
profile names start with ':' (':workspace', ':read-only',
':danger-no-sandbox'). Was emitting
which codex rejected with 'invalid type: string "X", expected
struct PermissionProfileToml'.
2. Built-in profile is , NOT . Codex
rejected with 'unknown built-in profile'.
3. Codex's MCP layer sends for
tool-call confirmation. We weren't handling it, so codex stalled
and returned 'MCP tool call was rejected'. Now: auto-accept for
our own hermes-tools server (user already opted in by enabling
the runtime), decline for third-party servers.
Quirk fixes shipped (from the limitations list):
#2 default permissions: workspace profile written on enable. No more
approval prompt on every write.
#4 apply_patch approval shows what's changing: cache fileChange
items from item/started, look up by itemId when codex sends
item/fileChange/requestApproval. Prompt: '1 add, 1 update:
/tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
#5/#10 exec approval cwd never empty: fall back to session cwd, then
'<unknown>'. Also surfaces 'reason' from codex when present.
#11 banner shows 'Runtime: codex app-server' line when active so
users understand why tool counts may not match what's reachable.
Tests:
- 5 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery, expose_hermes_tools entry generation, idempotent
re-runs, opt-out flag, permissions profile.
- 3 new tests in test_codex_app_server_session.py covering enriched
approval prompts (cwd fallback, fileChange summary).
- 2 new tests for mcpServer/elicitation/request handling (accept
hermes-tools, decline others).
- New test file test_hermes_tools_mcp_server.py covering module
surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
no agent-loop tools), and main() error paths.
- 166 codex-runtime tests total, all green.
Live e2e validated against codex 0.130.0 + ChatGPT subscription:
✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
registers hermes-tools, writes default_permissions = ':workspace'
✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
✓ Shell command runs without approval prompt (workspace profile works)
✓ Multi-turn — codex remembers prior turn's results
✓ apply_patch path via fileChange request approval
✓ web_search via hermes-tools MCP callback returns real Firecrawl
results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
✓ Disable cycle clean
Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
Full re-write covering native plugin migration, the hermes-tools
callback architecture, the prerequisites change ('codex login is
separate from hermes auth login codex'), the trade-off table now
reflecting which Hermes tools work via callback, and the limitations
list updated with what's actually unavailable on this runtime.
* feat(codex-runtime): pin user-config preservation invariant for quirk #6
Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.
This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.
Now explicitly tested:
- User MCP server above the managed block survives migration
- User MCP server below the managed block survives migration
- Both above + below survive a second re-migration
- User content (model, providers, sandbox, otel, etc.) outside our
region is left untouched
Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.
167 codex-runtime tests, all green.
* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find
Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.
Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).
Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:
1. Codex's built-in toolset (always on) — shell, apply_patch,
update_plan, view_image, web_search; covers everything terminal-
adjacent.
2. Native Codex plugins (auto-migrated from your codex plugin
install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
web_search/web_extract via Firecrawl, browser_*, vision_analyze,
image_generate, skill_view/skills_list, text_to_speech.
Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.
Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.
Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.
No code changes — purely docs clarification. 167 codex-runtime tests
still green.
* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade
Two real bugs in the self-improvement loop integration that the previous
test mocked away.
Bug 1: wrong call signature
The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
messages_snapshot=list (positional or keyword)
review_memory=bool (at least one trigger must be True)
review_skills=bool
So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.
Bug 2: review fork inherits codex_app_server api_mode
The review fork is constructed with:
api_mode = _parent_runtime.get('api_mode')
So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.
Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).
Also rewrote the codex helper's review wiring to match the
chat_completions path:
- Computes _should_review_memory in the pre-loop block (was already
being computed; now passed through to the helper as an arg).
- Computes _should_review_skills AFTER the codex turn returns +
counters tick (line ~15432 pattern in chat_completions).
- Calls _spawn_background_review(messages_snapshot=, review_memory=,
review_skills=) only when at least one trigger fires.
- Adds the external memory provider sync (_sync_external_memory_for_turn)
that the chat_completions path runs after every turn.
Tests:
Replaced the broken test_background_review_invoked (which only
asserted spawn.called) with three sharper tests:
- test_background_review_NOT_invoked_below_threshold:
single turn at default thresholds → no review fires (would have
caught the original 'every turn calls spawn with no args' bug)
- test_background_review_skill_trigger_fires_above_threshold:
10 tool_iterations at threshold=10 → review fires with
messages_snapshot=list, review_skills=True, counter resets
- test_background_review_signature_never_breaks: regression guard
asserting positional args are always empty and kwargs include
messages_snapshot
New TestReviewForkApiModeDowngrade class:
- test_codex_app_server_parent_downgrades_review_fork: drives the
real _spawn_background_review function (no mock at that level),
asserts the review_agent gets api_mode='codex_responses' when
the parent was codex_app_server.
Live-validated against real run_conversation:
- Counter ticked from 0 to 5 after a 5-tool-iteration turn
- _spawn_background_review fired exactly once with kwargs-only signature
- review_skills=True, review_memory=False
- messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
results + 1 final assistant + initial system/user)
- Counter reset to 0 after fire
170 codex-runtime tests, all green.
Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).
* feat(codex-runtime): expose kanban tools through Hermes MCP callback
Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.
That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.
Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.
Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.
Tools exposed:
Worker handoff (require HERMES_KANBAN_TASK):
kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
Read-only board queries:
kanban_show, kanban_list
Orchestrator (require HERMES_KANBAN_TASK unset):
kanban_create, kanban_unblock, kanban_link
Tests:
- test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
- test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link
Docs:
- New 'Workflow features' section in the docs page covering /goal,
kanban, and cron behavior on this runtime
- /goal: works fully via run_conversation feedback; only caveat is
approval-prompt noise on long writes-heavy goals (mitigated by
the default :workspace permission profile)
- Kanban: enumerated which tools are reachable via the callback and
why the env var propagates correctly through the codex subprocess
to the MCP server subprocess
- Cron: documented as 'not specifically tested' — same rules as the
CLI apply since cron runs through AIAgent.run_conversation
- Trade-offs table gained rows for /goal, kanban worker, kanban
orchestrator
172/172 codex-runtime tests green (+2 from kanban tests).
* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost
Three docs gaps caught during a final audit:
1. /codex-runtime was only in the feature docs page, not in the
slash-commands reference. Added rows to both the CLI section and
the Messaging section so users discover it where they'd look for
slash command syntax.
2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
CODEX_HOME lets users redirect Codex CLI's config dir (the migration
honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
propagates to the codex subprocess + the hermes-tools MCP subprocess
so kanban worker tools gate correctly — documented as 'don't set
manually' since it's an internal handoff.
3. Aux client behavior on this runtime. When openai_runtime=
codex_app_server is on with the openai-codex provider, every aux
task (title generation, context compression, vision auto-detect,
session search summarization, the background self-improvement review
fork) flows through the user's ChatGPT subscription by default.
This is true for the existing codex_responses path too, but it's
more visible / important here because users explicitly opted in for
subscription billing. Added a 'Auxiliary tasks and ChatGPT
subscription token cost' section to the docs page with a YAML
example showing how to override specific aux tasks to a cheaper
model (typically google/gemini-3-flash-preview via OpenRouter).
Also documents how the self-improvement review fork gets
auto-downgraded from codex_app_server to codex_responses by the
fix earlier in this PR.
No code changes — pure docs. 172 codex-runtime tests still green.
* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME
OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.
Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:
test_spawn_env_preserves_HOME — confirms parent HOME survives intact
in the subprocess env
test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
arg still isolates
codex state correctly
Docs additions:
'HOME environment variable passthrough' section — calls out the
contract explicitly: CODEX_HOME isolates codex's own state, HOME
stays user-real so gh/git/aws/npm/etc. find their normal config.
Cites openclaw#81562 as the cautionary tale.
'Multi-profile / multi-tenant setups' section — addresses the
related concern: profiles share ~/.codex/ by default. For users who
want per-profile codex isolation (separate auth, separate plugins),
documents the manual CODEX_HOME=<profile-scoped-dir> approach.
Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
would silently invalidate existing codex login state for anyone
upgrading to this PR with tokens already at ~/.codex/auth.json.
Opt-in is safer than surprising users.
174 codex-runtime tests (+2 from HOME guards), all green.
* fix(codex-runtime): TOML control-char escapes + atomic config.toml write
Two footguns caught in a final audit pass before merge.
Bug 1: TOML control characters not escaped
The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.
Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.
Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.
Bug 2: config.toml write wasn't atomic
If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.
Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.
Tests:
- test_string_with_newline_escaped — \n in value → \n in output
- test_string_with_tab_escaped — \t in value → \t in output
- test_string_with_other_controls_escaped — \r, \f, \b
- test_windows_path_escaped_correctly — backslash doubling
- test_atomic_write_no_temp_leak_on_success — no .config.toml.*
left over after a successful write
- test_atomic_write_cleanup_on_rename_failure — temp file removed
when Path.replace raises (simulated disk full)
180 codex-runtime tests, all green (+6 from this commit).
Footguns audited but NOT fixed (with rationale):
- Concurrent migrations race. Two Hermes processes hitting
/codex-runtime codex_app_server within seconds of each other could
cause one writer to lose entries. Low probability (you'd have to
enable from two surfaces simultaneously) and low impact (just re-run
migration). Adding fcntl/msvcrt locking is more code than it's
worth here. The atomic rename above means each individual write is
consistent — only the merge step is racy.
- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
check at runtime but don't reject too-new versions. Right call —
the protocol has been stable through 0.125 → 0.130. If OpenAI
breaks it later we'd see the error in test_codex_app_server_runtime
on CI before users hit it.
|
||
|
|
256bedb632
|
fix(setup): drop post-setup chat handoff (#25067)
Removes the 'Launch hermes chat now? (Y/n)' prompt at the end of hermes setup. The summary already prints 'Ready to go! → hermes' so the auto-launch was redundant, and on macOS 26+ it could crash in prompt_toolkit when setup was invoked from the curl install script with stdin redirected from /dev/tty (#5884, #6128). After setup, users run 'hermes' themselves like every other CLI tool. Same pattern applies to the Windows installer. Closes #6128 (narrower env-var-guarded fix superseded by removing the prompt outright). |
||
|
|
6f2d1c88b7 |
feat(custom): prompt and persist explicit api_mode for custom providers
Adds an explicit API compatibility mode prompt to the `hermes model -> custom`
flow so Codex-compatible third-party endpoints (and any other non-default
backend whose URL doesn't match the existing heuristics in
`_detect_api_mode_for_url`) can be selected explicitly instead of silently
falling back to chat_completions.
Choices: Auto-detect / chat_completions / codex_responses / anthropic_messages.
Persists `api_mode` to:
- `model.api_mode` (active session config)
- the matching `custom_providers[*]` entry (so re-activating the named
provider next time replays the same transport)
Salvaged from PR #6125 onto current main: kept the new prompt and the
`_save_custom_provider(api_mode=...)` plumbing; the named-custom flow
already extracts and applies `api_mode` from the saved entry on current
main so those changes are preserved as-is. Test fixtures updated for the
new prompt and the existing display-name prompt.
Co-authored-by: littlewwwhite <1095245867@qq.com>
|
||
|
|
a43d7e67b4 |
refactor(profiles): remove dead generate_bash_completion / generate_zsh_completion
These two functions in hermes_cli/profiles.py have no callers — the live
`hermes completion {bash,zsh}` command uses hermes_cli/completion.py's
generate_bash() / generate_zsh() instead. Multiple PRs (incl. #6141) tried
to fix the trailing-`_hermes "$@"` zsh bug here, only to discover the
patch never reached users. Delete the dead code so future contributors
patch the right file.
The actual user-facing fix lives in the preceding cherry-picked commits
to hermes_cli/completion.py.
|
||
|
|
6d30b4a7e3 | test(cli): strengthen zsh completion regression coverage | ||
|
|
8c4bec6155 | fix(cli): repair broken zsh completion generation | ||
|
|
9b2488af2a |
fix: include arg-taking commands in Telegram menu
Built-in commands with required args (e.g. /queue, /steer, /background) were excluded from Telegram setMyCommands output, making them invisible in the autocomplete menu. However, their handlers already return usage text when invoked without arguments, so hiding them hurts discoverability. This commit removes the _requires_argument filter for built-in commands (COMMAND_REGISTRY) while keeping it for plugin-registered slash commands, which may not provide a no-arg usage fallback. Closes #24312 |
||
|
|
3955aefced
|
fix(install): use --extra all not --all-extras; drop lazy-covered extras from [all] (#24515)
* fix(install): use `--extra all` not `--all-extras`; drop lazy-covered extras from [all]
Two coupled fixes for the Windows install hang where uv sync built
python-olm from sdist and failed on missing make.
# Root cause: --all-extras vs --extra all (credit: ethernet)
`uv sync --all-extras` installs every key in [project.optional-
dependencies], bypassing the curated [all] extra entirely. So even
when [all] excluded [matrix], [rl], [yc-bench], etc., the installer
pulled them anyway because they were still defined as extras. On
Windows that meant python-olm (no wheel, needs make to build from
sdist) and the install died there.
The right flag is `--extra all` — install just the [all] extra's
contents, respecting curation. Empirically verified via dry-run:
--all-extras: pulls python-olm, mautrix, ctranslate2, onnxruntime,
atroposlib, tinker, wandb, modal, daytona, vercel,
python-telegram-bot, discord.py, slack-bolt,
dingtalk-stream, lark-oapi, anthropic, boto3,
edge-tts, elevenlabs, exa-py, fal-client, faster-
whisper, firecrawl-py, honcho-ai, parallel-web
--extra all: pulls none of those — just [all]'s curated set
Dockerfile already uses `--extra all` (with comment explaining the
gotcha) — knowledge existed; the gap was install.sh / install.ps1 /
setup-hermes.sh.
Sites fixed: scripts/install.sh L1118, scripts/install.ps1 L809,
setup-hermes.sh L245.
# Companion fix: drop lazy-covered extras from [all]
`tools/lazy_deps.py` already covers anthropic, bedrock, exa,
firecrawl, parallel-web, fal, edge-tts, elevenlabs, modal, daytona,
vercel, all messaging platforms (telegram/discord/slack/matrix/
dingtalk/feishu), honcho, and faster-whisper. They were ALSO in
[all], which defeats the whole point of lazy-install — fresh
installs eager-pulled them and inherited whatever was broken
upstream (the matrix → python-olm → no Windows wheel chain being
the proximate symptom).
[all] now contains only what genuinely can't be lazy-installed:
cron, cli, dev, pty, mcp, homeassistant, sms, acp, google, web,
youtube. Same trim applied to [termux-all]. New regression test
asserts the contract: every extra in LAZY_DEPS must NOT also appear
in [all].
# Companion fix: surface uv progress + errors
setup-hermes.sh's hash-verified path swallowed uv's stderr to a
tempfile, identical to the install.sh bug fixed in PR #24504. Same
fix applied: stream stderr through directly so users see live
progress instead of staring at a frozen prompt.
# Files
- pyproject.toml: trim [all] and [termux-all] to non-lazy extras only.
- scripts/install.sh: --all-extras → --extra all; trim _ALL_EXTRAS /
_PYPI_EXTRAS to match.
- scripts/install.ps1: --all-extras → --extra all; trim $allExtras /
$pypiExtras to match.
- setup-hermes.sh: --all-extras → --extra all; stream stderr.
- tests/test_project_metadata.py: invert matrix-in-[all] assertion;
add lazy-coverage contract test.
- uv.lock: regenerated.
# Validation
5/5 metadata tests pass. 37/37 in update_autostash + tool_token_
estimation. `uv lock --check` passes. Empirical dry-run confirms
`--extra all` excludes python-olm + RL chain on the new lockfile.
* fix(install): parse [all] from pyproject.toml instead of mirroring it
ethernet's review point: the previous patch left two hand-mirrored
copies of [all]'s contents (in install.sh's $_ALL_EXTRAS and
install.ps1's $allExtras). That guarantees future drift the next
time pyproject.toml's [all] changes.
Now both scripts parse pyproject.toml at install time using stdlib
tomllib (Python 3.11+, which the bootstrap step already requires).
Single source of truth. The only purpose of the parsed list is to
build the 'Tier 2: [all] minus broken extras' fallback spec — so we
parse, filter against $brokenExtras, and rebuild the .[a,b,c] spec.
Also: removed redundant fallback tiers.
Before: Tier 1 [all]
Tier 2 [all] minus broken
Tier 3 PyPI-only extras (no git deps)
Tier 4 [web,mcp,cron,cli,messaging,dev]
Tier 5 .
After: Tier 1 [all]
Tier 2 [all] minus broken
Tier 3 .
Tier 3 (PyPI-only) and Tier 4 (dashboard+core) used to dodge the [rl]
git+sdist deps and the [matrix] python-olm build. Both are no longer
in [all] post-2026-05-12 lazy-install migration, so the carve-out
tiers had no remaining content. Tier 4 also referenced [messaging],
which is now lazy-installed — the hardcoded fallback was actually
inconsistent with the new policy.
Defensive fallback: if tomllib parse fails (corrupted pyproject,
unexpected schema), Tier 2 collapses to '.[all]' (same as Tier 1) so
the broken-extras path becomes a no-op rather than crashing.
* fix(gateway): hide Matrix from setup picker on Windows
Matrix is the one messaging platform that has no working install path
on Windows: [matrix] -> mautrix[encryption] -> python-olm, which has
Linux-only wheels and needs make + libolm to build from sdist. The
[all] cleanup in this PR keeps mautrix out of fresh installs, but a
user who picked Matrix in 'hermes setup gateway' would still walk
into the same sdist build failure when the wizard tried to install
the extra.
Hide the option at the picker so users never get the chance to try.
The gate lives in _all_platforms() — single source of truth for the
setup wizard, the curses gateway-config menu, and any future picker.
Adapter loading at runtime is intentionally NOT gated: users who
already have MATRIX_* env vars set (e.g. config copied from a Linux
install) keep working if they somehow have python-olm available.
This is the lowest-friction fix — picker visibility only.
Tests cover linux/darwin/win32 and verify other platforms aren't
collateral damage.
|
||
|
|
642768c5c7
|
Merge pull request #24161 from NousResearch/austin/fix/dashboard
fix(dashboard): UI polish — modals, layout, consistency |
||
|
|
c23a87bc16
|
union paid recs from nous portal with static list (#24509) | ||
|
|
fc3fd6bb6b |
fix(dashboard): UI polish — modals, layout, consistency, test fixes
Dashboard UX polish pass — consolidates create forms into modals triggered from the page header, fixes layout inconsistencies, adds scroll-to navigation for the Keys page, and aligns the TokenBar with the design system. Changes: - App.tsx: add padding to sidebar header - resolve-page-title.ts: add missing routes, better fallback title - en.ts: fix nav labels (Profiles was 'profiles : multi agents') - ModelsPage: two-col layout, auxiliary tasks modal, TokenBar redesign - ProfilesPage: create button in header, form in modal, Checkbox component - CronPage: create button in header, form in modal - EnvPage: scroll-to sub-nav in header, fix text overflow Modal and dialog standardization: - Replace all native confirm()/window.confirm() with ConfirmDialog (OAuthProvidersCard, PluginsPage, ModelsPage, ConfigPage) - Add useModalBehavior hook (Escape-to-close, scroll lock, focus restore) - Apply hook to ProfilesPage, CronPage, AuxiliaryTasksModal Component fixes (from PR review): - Checkbox: fix controlled/uncontrolled mismatch, add focus-visible ring - TokenBar: add rounded-full to legend dots, remove dead code CI/test fixes: - Fix TS unused imports (noUnusedLocals), type-narrow PickerTarget union - Add windows-footgun suppression on platform-guarded os.killpg - Fix 19 stale unit tests + 9 e2e tests broken by recent main changes - Restore minimal example-dashboard plugin for plugin auth test |
||
|
|
c1eb2dcda7
|
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
|
||
|
|
58e2109f10 |
fix(minimax): harden OAuth dashboard and runtime
Handle MiniMax OAuth expiry values consistently across CLI and dashboard flows, fix CLI status/add behavior, and force pooled OAuth runtime requests through Anthropic Messages. - web_server._minimax_poller: parse expired_in via the shared resolver so unix-ms absolute timestamps stop landing as TTL seconds and crashing with 'year 583911 is out of range' when a user connects MiniMax OAuth from the dashboard. - auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on the CLI login + refresh paths. - auth.get_auth_status: dispatch minimax-oauth to its dedicated status function instead of falling through. - auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now starts the device-code login flow and persists a pool entry with the access + refresh tokens, instead of requiring credentials to already exist. - runtime_provider._resolve_runtime_from_pool_entry: pin pooled minimax-oauth credentials to anthropic_messages so a stale model.api_mode: chat_completions can't send requests to /anthropic/chat/completions and trigger MiniMax nginx 404s. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e85592591e
|
fix(nous): surface Portal-flagged free models in picker even when curated list is stale (#24082)
Free-tier users were seeing 'No free models currently available.' in the
`hermes model` and post-login pickers even though qwen/qwen3.6-plus is
free on the Portal right now. Three independent breakages compounded:
1. The docs-hosted catalog manifest at website/static/api/model-catalog.json
was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users
fetching the manifest got a list that didn't include qwen/qwen3.6-plus.
2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip,
collapsing get_pricing_for_provider('nous') to {} and making every
curated model fall through the free-tier filter as 'paid'.
3. Even with healthy pricing, the picker only ever showed models from the
in-repo curated list intersected with live pricing — a Portal-flagged
free model not yet in the curated list could never appear.
Changes:
- hermes_cli/models.py: new union_with_portal_free_recommendations() that
augments the curated list with Portal freeRecommendedModels entries
(with synthetic free pricing so partition keeps them). The Portal's
/api/nous/recommended-models endpoint is now the source of truth for
free-tier surfacing — old Hermes builds will see new free models
without a CLI release.
- hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to
the public inference base URL when runtime cred resolution fails.
The /v1/models endpoint exposes pricing without auth, so silently
returning {} just because a refresh token expired was wrong.
- hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call
sites call union_with_portal_free_recommendations() before partition.
- tests/hermes_cli/test_models.py: 7 tests covering union behaviour
(prepend, dedup, end-to-end with stale pricing, empty/missing/error
payloads, invalid entries).
- tests/hermes_cli/test_model_catalog.py: drift guard
TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous']
or OPENROUTER_MODELS is edited without re-running
scripts/build_model_catalog.py. Verified empirically that removing a
manifest entry triggers an assertion with an actionable error message.
Validation:
- 133/133 targeted tests pass (test_models, test_model_catalog,
test_auth_nous_provider).
- Live E2E against the real Portal:
- Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no
qwen) → after union: ['qwen/qwen3.6-plus', ...] →
partition(free_tier=True): selectable=['qwen/qwen3.6-plus'].
- Simulated expired refresh token → anon fetch returns 403 pricing
entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}.
- ruff: clean.
|
||
|
|
ced1990c1c
|
feat(computer-use): refresh cua-driver on hermes update + add install --upgrade (#24063)
cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall. Two refresh points now: - `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call. - `hermes computer-use install --upgrade` for manual force-refresh. The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed. `hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing. |
||
|
|
05bad7b1e7 |
fix(dashboard): MiniMax 'Login' button launched Claude OAuth (#22832)
Fixes #22832. ## Root cause `hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by the catalog's `flow` field rather than provider id: if catalog_entry["flow"] == "pkce": return _start_anthropic_pkce() The catalog had two `flow: "pkce"` entries — `anthropic` and `minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's Keys tab unconditionally launched the Anthropic/Claude PKCE flow. ## Fix Three changes in `hermes_cli/web_server.py`: 1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to `flow: "device_code"`. From a UX perspective MiniMax is a verification-URI + user-code flow (open URL, enter code, backend polls) — same shape as Nous's device-code flow. The PKCE bit (verifier + challenge from `_minimax_pkce_pair`) is a security extension that doesn't change the operator experience; the existing dashboard modal already renders `device_code` correctly for this UX. 2. New MiniMax branch in `_start_device_code_flow`, mirroring the existing Nous branch but calling MiniMax-specific helpers (`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes verifier + state in the session for the poller to consume. Handles the overloaded `expired_in` field (could be unix-ms timestamp OR seconds-from-now duration) the same way `_minimax_poll_token` does. 3. New `_minimax_poller` background thread mirroring `_nous_poller`. Calls `_minimax_poll_token` → on success builds the same `auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and persists via `_minimax_save_auth_state` so the dashboard path leaves the system in the same state as `hermes auth add minimax-oauth`. Plus a dispatcher tightening to prevent regression: the `pkce` branch now requires `provider_id == "anthropic"`, so any future PKCE provider added without a proper start function gets a clean `400 Unsupported flow` rather than silently launching Anthropic OAuth. ## Test New `tests/hermes_cli/test_web_oauth_dispatch.py`: - Regression test asserting MiniMax start does NOT return claude.ai - Sanity test that Anthropic PKCE still works after the dispatcher tightening - Forward-looking test: a hypothetical pkce-flagged provider without an explicit branch is rejected cleanly rather than misrouted ## Limitations - The dashboard MiniMax path defaults to `region="global"`. CN-region operators can still use the CLI flow which supports `--region cn`. Adding a region toggle to the dashboard UI is a follow-up. |
||
|
|
c6ca11618a |
refactor(tui): simplify TUI build logic, remove stale staleness checks
The old mtime-tracking staleness machinery (_tui_build_needed, _hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding by comparing source timestamps to dist/entry.js. This was fragile and added ~100 lines of code. Replace with three clear paths: 1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build 2. --dev mode: tsx src/entry.tsx, no build, hot reload 3. Normal: always npm run build (esbuild is ~1s, correctness > caching) Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt bundle has no source code to hot-reload). |
||
|
|
3197b4de6d | Merge remote-tracking branch 'origin/main' into fix/bundle-size | ||
|
|
8e2eb4b511
|
fix(/model): surface Nous Portal models from remote catalog manifest (#23912)
The /model picker for Nous Portal users was returning the in-repo _PROVIDER_MODELS["nous"] snapshot — which only updates on Hermes releases — instead of the remote manifest published at https://hermes-agent.nousresearch.com/docs/api/model-catalog.json. OpenRouter already pulled from the manifest via fetch_openrouter_models; "nous" was the only curated provider where the existing manifest plumbing (get_curated_nous_model_ids → get_curated_nous_models) was defined but not wired into the picker pipeline. Switch the curated build in list_authenticated_providers to use it, with the same graceful fallback to the in-repo snapshot when the manifest is unreachable. Test: tests/hermes_cli/test_model_catalog.py exercises the picker with a patched manifest and asserts the manifest's nous list reaches list_picker_providers. Falls-back-to-static path was already covered by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog. |
||
|
|
283381b1ce |
fix(dashboard): validate dist exists when --skip-build is set
Follow-up to PR #23824. Adds two correctness fixes on top of the contributor's salvaged commit: 1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard` passes `fatal=True` and is the primary scenario this fallback is for (issue #23817 — Windows Scheduled Task at logon). The previous gate meant the fallback never fired in the case it was designed for. 2. `--skip-build` now verifies the dist actually exists before starting the server. Without this, a misconfigured pre-build would launch the dashboard pointing at a missing dist and silently serve 404s. We now exit 1 with a clear "pre-build first: cd web && npm run build" message, and on success print which dist directory is being used. Verified end-to-end on Linux: - build fails + stale dist (fatal=True) -> fallback fires - build fails + no dist (fatal=True) -> exit 1 with stderr surfaced - build fails + stale dist (fatal=False) -> fallback fires - --skip-build + missing dist -> exit 1 with clear guidance - --skip-build + valid dist -> 'Skipping web UI build...' |
||
|
|
a479ec01ed |
fix: make web UI build output decoding robust on Windows
On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning. This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows. The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics. |
||
|
|
3e7145e0bb
|
revert: roll back /goal checklist + /subgoal feature stack (#23813)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit |
||
|
|
641e40c4bd | fix(kanban): restore HERMES_KANBAN_BOARD after scoped slash override | ||
|
|
f6d4f3c37d | fix(kanban): route gateway create auto-subscribe to explicit board | ||
|
|
228a4d11ae
|
fix(config): warn loudly on YAML parse failure instead of silent default fallback (#23585)
A YAML parse error in ~/.hermes/config.yaml caused load_config() to print one line to stdout (Warning: Failed to load config: ...) and silently fall back to DEFAULT_CONFIG, dropping every user override (auxiliary providers, fallback chain, model settings). Users only noticed when downstream behavior misbehaved — see issue #23570 where a tab-indent error in the auxiliary section caused aux fallback to use OpenRouter (depleted) instead of the configured Codex/MiniMax chain. Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam, and re-warn after the user edits the file. Both call sites (raw read + merged load) route through the same helper. Refs #23570 |
||
|
|
3b122cc1ac
|
feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578)
Surface ready tasks that nobody claims within a threshold (default 30 min) regardless of why. One identity-agnostic signal that catches: - Operator typo'd the assignee - Profile was deleted, leaving its tasks stranded - External worker pool (Codex CLI lane, custom daemon) is down - Dispatcher misconfigured (wrong board / wrong HERMES_HOME) Today the dispatcher correctly skips these (no respawn loop, good) but nothing surfaces the fact that operator-actionable work is accumulating. The new `stranded_in_ready` rule does that without requiring a manual lane registry — it reads the most recent ready- transition event (`created` / `promoted` / `reclaimed` / `unblocked`) and fires when (now - last_ready_ts) > threshold. Severity escalates with age: warning at threshold, error at 2x, critical at 6x. The cli_hint and reassign actions point operators at the right next step. Out of scope deliberately: - Lane registry (#20157 closed) — this signal supersedes it. - Pushing the diagnostic into messaging gateways — diagnostics are pull-only via 'hermes kanban diagnostics' for now; gateway push is a separate UX decision. Tests: 10 new + 461 existing kanban tests pass. E2E verified end- to-end via 'hermes kanban diagnostics --json' against a 2h-old stranded task — surfaces as error severity with correct actions. |
||
|
|
a63a2b7c78
|
fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)
Live-tested on gemini-3-flash-preview the judge kept returning empty or non-JSON content, tripping the consecutive-parse-failures auto- pause. Free-form JSON output is hopeful; tool-call schemas are enforced server-side by virtually every modern provider. Two new tools the judge calls: - submit_checklist(items) — Phase A, decompose - update_checklist(updates, new_items, reason) — Phase B, evaluate Both phases now call the auxiliary client with tool_choice forcing the right tool. read_file remains for Phase B history inspection, with the loop exiting only when update_checklist is called or the read budget is exhausted (at which point read_file is dropped from the toolbox and update_checklist is forced). Robustness: - _call_judge_with_tool_choice falls back tool_choice forced→required→ auto if the provider rejects a particular shape. - If a fully-broken provider still returns content instead of a tool call, the legacy JSON-text parsers stay around as a last-ditch backstop so we never silently lose a checklist. - _normalize_update_args replaces the JSON parser for the apply layer; same 1-based→0-based conversion + terminal-status filter. Live verification: same fizzbuzz goal that was hitting 'judge model returned unparseable output 3 turns in a row' before now terminates in 2 turns, all 11 items marked completed with item-specific evidence, no auto-pause. Agent log shows 'produced 11 checklist items via tool call' instead of the JSON- parse path. Tests: 7 new cases for the tool-call path (Phase A success, Phase B update only, Phase B read_file→update, JSON-content backstop, empty-text item dropping, non-terminal status filter). |
||
|
|
4a080b1d5a
|
fix(goals): forward standing /goal state on auto-compression session rotation (#23530)
When run_agent's _compress_context fires mid-turn it ends the parent
session in SessionDB and creates a new continuation session with a
fresh session_id. The /goal state is keyed on session_id in
state_meta ("goal:<sid>"), so without forwarding the goal silently
disappears: _get_goal_manager() rebinds for the new session_id,
load_goal() returns None, mgr.is_active() is False, and the
continuation loop dies with no user-visible signal.
Fix: in the same SessionDB transaction block that creates the
continuation session, copy state_meta[goal:<old>] →
state_meta[goal:<new>] when present. No-op when the user has no
active goal. Logged at INFO so a stuck loop is debuggable.
Tests cover the round-trip via SessionDB and the no-op path.
Affects all three run-conversation surfaces (CLI, gateway, TUI
gateway) because _compress_context is the single rotation site.
|
||
|
|
ba5640fa11 | fix(gateway): route kanban notifications to creator profile | ||
|
|
404640a2b7
|
feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
user prompt to call out the 1-based scheme explicitly. New tests
cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
|
||
|
|
62cfe79e93 |
fix(tools): clarify kanban_complete phantom-card retry guidance
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.
Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.
Fixes #22923
|
||
|
|
88588b6159 |
fix(kanban): extend stale claim instead of killing live worker
Workers running slow models (e.g. kimi-k2.6) can spend longer than DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making no tool calls and therefore not heartbeating. release_stale_claims previously reclaimed these healthy workers, producing the spawn-then-immediately-reclaim loop reported in #23025. When a stale-by-TTL claim's host-local worker PID is still alive, extend the claim (emit a claim_extended event) rather than killing it. enforce_max_runtime / detect_crashed_workers remain the upper bounds for genuinely wedged or dead workers. Reclaim events now also record claim_expires, last_heartbeat_at, worker_pid, and host_local so operators can see why a worker was killed. |
||
|
|
787e3c368c |
test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests
Follow-up to the previous commit's notifier behavior change. Two test fixes: 1. `tests/gateway/test_kanban_notifier.py` gains `test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new contract directly: a task that crashes, gets reclaimed, and crashes again notifies the user BOTH times. Before #21398 the second crash silently dropped because the subscription was already deleted. 2. `tests/hermes_cli/test_kanban_notify.py:: test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]` is flipped. Those tests were added in the salvage of #22941 and asserted the OLD behavior (subscription deleted after gave_up / crashed / timed_out). They're now obsolete — the new contract is "subscription survives a non-final terminal event so retries reach the user." Updated docstring + asserts; the cursor-advance check is added to confirm the dedup mechanism still works. The `test_notifier_unsubs_after_completed_event` test stays untouched because `completed` IS still a terminal event that triggers unsub (the task hits `done` status, which is handled by the `task_terminal` branch in the notifier loop). |
||
|
|
861ce7c0b6 | fix: dedupe kanban notifier delivery claims | ||
|
|
00ce5f04d9 |
feat(session): make /handoff actually transfer the session live
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.
State machine on sessions table replaces the boolean flag:
None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.
CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.
Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:
1. Resolves the platform's home channel.
2. Asks the adapter for a fresh thread via the new
create_handoff_thread(parent_chat_id, name) capability so the
handed-off conversation gets its own scrollback. Adapters that
don't support threads (or fail) return None and the watcher
falls back to the home channel directly.
3. Constructs a SessionSource keyed as 'thread' when a thread was
created, 'dm' otherwise, then session_store.switch_session
re-binds the destination key to the CLI session_id. The full
role-aware transcript replays via load_transcript on the next
turn (no flat-text injection into context_prompt).
4. Forges a synthetic MessageEvent(internal=True) with the handoff
notice and dispatches through _handle_message; the agent runs
against the loaded transcript and adapter.send delivers the
reply.
5. Marks the row 'completed' on success, 'failed' (+error) on any
exception.
Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:
- Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
so DM topics (Bot API 9.4+) and forum supergroups both work.
- Discord (gateway/platforms/discord.py): parent.create_thread on
text channels with a seed-message + message.create_thread
fallback for permission edge cases. Skips DMs and other
non-thread-capable parents.
- Slack (gateway/platforms/slack.py): posts a seed message and
returns its ts as the thread anchor — Slack threads are
message-anchored.
In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.
CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).
Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.
Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
|
||
|
|
878611a79d |
feat(session): add /handoff command for cross-platform session transfer
Adds /handoff <platform> CLI command that queues the current session for resume on the configured home channel of any messaging platform. CLI side: - /handoff telegram — marks session in shared DB, sends summary to the Telegram home channel via send_message - /handoff discord — same for Discord - Supports telegram, discord, slack, whatsapp, signal, matrix Gateway side: - On new session creation, checks for pending handoffs for the incoming message's platform - If found, loads the CLI session's full conversation history and injects it into the context prompt as a handoff transcript - Agent continues the conversation seamlessly Files: - hermes_state.py: handoff_pending, handoff_platform columns + helpers - cli.py: _handle_handoff_command dispatch + handler - hermes_cli/commands.py: CommandDef entry - gateway/run.py: handoff detection in _handle_message_with_agent - tests/hermes_cli/test_session_handoff.py: 8 tests |
||
|
|
845be254ec | fix(kanban): cap dispatch by running workers | ||
|
|
1f5983c4c8 |
feat(kanban): aggregate all toolset-name typos in skills before raising
Follow-up to the previous commit's toolset-vs-skill validation.
The contributor's fix raises ValueError on the first toolset name found
in the skills list. That works for one mistake, but agents that confuse
skills with toolsets usually pass several at once
(`skills=["web", "browser", "terminal"]`) — and serial-correcting one
per failure round-trip wastes tokens. Collect all toolset-shaped
entries first, then raise once with the full list.
The error message is also slightly clearer:
'web', 'browser', 'terminal' are toolset names, not skill name(s).
Put toolsets in the assignee profile's `toolsets:` config instead of
per-task skills. Skills are named skill bundles (e.g. `kanban-worker`,
`blogwatcher`); toolsets are runtime capabilities (e.g. `web`,
`browser`, `terminal`).
vs. the previous "the assignee profile's toolsets" — explicitly naming
the YAML key (`toolsets:`) and giving concrete examples in both
categories closes the conceptual gap that produced the bug to begin
with.
Adds one regression test (test_create_task_skills_lists_all_toolset_typos)
covering the multi-name aggregation path. The single-typo test from
the original PR still passes (the loose `match="toolset name"` matches
both singular and plural forms).
|
||
|
|
673418dfa1 | fix(kanban): reject toolset names in task skills | ||
|
|
40a4bfa719 |
test(kanban): cover task_age safe-int guards + AUTHOR_MAP entry
Follow-up to the previous commit's safe-int task_age fix. The original PR shipped without test coverage. This commit adds: - test_safe_int_accepts_int_and_int_string — sanity for the well-typed path so the helper itself can't quietly start swallowing valid values. - test_safe_int_returns_none_on_corrupt_inputs — the failure modes (None, '%s', 'abc', '', '1.5', random objects). Covers both the ValueError and TypeError catch branches. - test_task_age_handles_corrupt_created_at — the headline regression: a task with created_at='%s' used to raise ValueError and turn GET /api/plugins/kanban/board into a 500. - test_task_age_handles_corrupt_started_and_completed — confirms the safe-int treatment is consistent across all three timestamp fields. - test_task_age_well_formed_task — regression that the safe path doesn't change observable output for normal data. - test_task_dict_survives_corrupt_created_at — defense in depth. Writes a corrupt row directly via SQL, reads it back through the ORM, and confirms task_age + the surrounding plugin_api guard degrade gracefully instead of crashing. Also adds the AUTHOR_MAP entry for the contributor's GitHub-noreply email so release notes credit @baocin (the commit was authored locally as `aoi <aoi@hino.local>` — re-attributed during salvage to the github noreply form). |
||
|
|
62b1c74cbc |
fix(kanban): correct dispatcher spawn module name + PATH-first lookup
Follow-up to the previous commit's contributor cherry-pick.
The cherry-picked change replaced the bare ``["hermes", ...]`` spawn with
``[sys.executable, "-m", "hermes", ...]``. The intent was right (avoid
PATH dependence — cron, systemd User= services, launchd jobs, and other
detached dispatcher invocations routinely run with a stripped $PATH that
doesn't include the venv's bin/, breaking the bare-shim spawn) but the
module name is wrong: there is no top-level ``hermes`` package. The
console-script entry point in pyproject.toml is
``hermes = "hermes_cli.main:main"``, and ``python -m hermes`` fails with
``No module named hermes``. The cherry-picked form would have replaced a
sometimes-broken spawn with an always-broken one.
This commit:
- Adds ``_resolve_hermes_argv()``, mirroring ``gateway.run._resolve_hermes_bin``.
Tries ``shutil.which("hermes")`` first (preferred — keeps existing ``ps``
output and log lines familiar in the common case) and falls back to
``[sys.executable, "-m", "hermes_cli.main"]`` when the shim is not on
PATH. The fallback goes through the running interpreter so it's
PATH-independent. Kept as a local helper rather than imported from
gateway because ``hermes_cli`` sits below ``gateway`` in the dependency
order.
- Switches the dispatcher's ``cmd`` list to use ``*_resolve_hermes_argv()``.
- Adds three regression tests:
* ``test_resolve_hermes_argv_prefers_path_shim`` — pins the PATH-first
branch so a future refactor doesn't silently flip the order.
* ``test_resolve_hermes_argv_falls_back_to_module_form_when_no_path_shim`` —
pins the correct module name (``hermes_cli.main``, NOT ``hermes``).
Direct regression guard for the form that shipped in the original PR.
* ``test_resolve_hermes_argv_module_actually_runs`` — runs the fallback
invocation as a real subprocess and asserts ``--version`` works, so
losing ``hermes_cli.main``'s ``__main__`` handling can't slip past the
string-match test.
Verified end-to-end: with the shim on PATH the resolver returns
``[/.../hermes]`` and ``--version`` works; with the shim removed the
resolver returns ``[python, -m, hermes_cli.main]`` and ``--version``
still works; the original PR's ``python -m hermes`` invocation fails as
expected (``No module named hermes``).
|