mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
1589 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b59ed9c6bc |
fix(discord): handle forwarded messages via message_snapshots
Discord introduced message_snapshots for forwarded messages — text and attachments live inside snap.content / snap.attachments rather than on the parent message. _handle_message wasn't reading them, so forwards showed up empty. Defensively extracts snapshot text (when raw_content is empty) and appends snapshot attachments to the working all_attachments list used for type detection and media routing. hasattr/getattr guards keep this safe on older discord.py installs without the field. Salvage of #25462 by @1RB (manually re-applied — original branch was stale against current main). |
||
|
|
8f19078c6a
|
feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging — the existing judge model just sees a richer prompt. Forms: /subgoal show current subgoals /subgoal <text> append a criterion /subgoal remove <n> drop subgoal n (1-based) /subgoal clear wipe all subgoals How it integrates: - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows. - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?' - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target. - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary — no race with the running turn. - Status line shows '... , N subgoals' when present. Surface: - hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave - hermes_cli/commands.py — /subgoal CommandDef - cli.py — _handle_subgoal_command - gateway/run.py — _handle_subgoal_command + mid-run dispatch - tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering) 77 goal-related tests passing across goals + cli + gateway + tui. * fix(goals): slash commands don't preempt the goal-continuation hook Two findings from live-testing /subgoal: 1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped — but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present. 2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.' Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited. |
||
|
|
d863773c81 |
feat(discord): add thread_require_mention for multi-bot threads
By default, once Hermes participates in a Discord thread (auto-created on @mention or replied in once) it auto-responds to every subsequent message in that thread without requiring further @mentions. That's the right default for one-on-one conversations and isolated channel threads. But it's a confirmed footgun in multi-bot threads. When a user invokes one bot per turn — addressing Codex first, then Hermes — every other bot in the thread also fires on every message, burning credits and spamming the channel. Author has hit this personally in active multi-bot research-team threads. Add a new `discord.thread_require_mention` config key (env: `DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing behavior. When `true`, the in-thread mention shortcut is disabled and threads are gated the same way channels are. Explicit @mentions still pass through as expected. Mirrors the existing helper shape (config.extra > env > default) and the existing yaml→env bridge pattern used by `require_mention`. Changes: - gateway/platforms/discord.py: new `_discord_thread_require_mention()` helper; in_bot_thread shortcut now AND's with `not _discord_thread_require_mention()` - gateway/config.py: bridge `discord.thread_require_mention` from config.yaml to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing `require_mention` bridge two lines above) - hermes_cli/config.py: add `thread_require_mention: False` default to DEFAULT_CONFIG['discord'] - tests/gateway/test_discord_free_response.py: 4 new tests covering default behaviour (in-thread shortcut still works), enabled behaviour (mention required in threads), enabled+mentioned (mention still passes through), and yaml-via-config.extra path. Also clears DISCORD_* env vars in the `adapter` fixture so process-env state from the contributor's shell doesn't leak into per-test behaviour. - tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge (both the apply-from-yaml and env-precedence-over-yaml paths) - website/docs/user-guide/messaging/discord.md: document the new env var + config key with multi-bot rationale; cross-link from `auto_thread` section Tested on Ubuntu 24.04. |
||
|
|
d557544560 |
fix(discord): keep free-response channels inline
Free-response channels are intended as lightweight chat surfaces — the bot
responds to every message without requiring an @mention. But the auto-thread
gate only checked DISCORD_NO_THREAD_CHANNELS, not DISCORD_FREE_RESPONSE_CHANNELS,
so every message in a free-response channel still spawned a brand-new thread.
That turns a chat channel into a thread-spawning machine: 1 thread per message.
The user-facing docs at website/docs/user-guide/messaging/discord.md already
describe the intended behavior ("Free-response channels also skip auto-threading
— the bot replies inline rather than spinning off a new thread per message"),
so this is a code-vs-docs gap, not a design change.
Fix: OR is_free_channel into skip_thread alongside the existing no_thread_channels
check. One-line production change.
Regression test added at tests/gateway/test_discord_free_response.py:
test_discord_free_response_channel_skips_auto_thread asserts that a message
in a free-response channel never calls _auto_create_thread. Reverting the
one-line fix causes the test to fail with 'Expected mock to not have been
awaited. Awaited 1 times.' — i.e. the test demonstrates the bug concretely.
|
||
|
|
3633c8690b |
refactor(plugins): add apply_yaml_config_fn registry hook
Lets platform plugins own their YAML→env config bridge instead of forcing core gateway/config.py to know every platform's schema. The hook receives the full parsed config.yaml and the platform's own sub-dict, may mutate os.environ (env > YAML precedence preserved via the standard `not os.getenv(...)` guards), and may return a dict to merge into PlatformConfig.extra. It runs during load_gateway_config() after the existing generic shared-key loop and before _apply_env_overrides(), mirroring the env_enablement_fn dispatch pattern (#21306, #21331). Pure addition — no behavior change for existing platforms. Each of the eight platforms with hardcoded YAML→env blocks today (discord, telegram, whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in gateway/config.py) can migrate in independent follow-up PRs; the hardcoded blocks remain functional in the meantime, and their `not os.getenv(...)` guards make them no-ops for any env var the hook already set. Test coverage: 10 new tests in tests/gateway/test_platform_registry.py covering field default, callable acceptance, env mutation, extras merge, both signature args, exception swallowing, missing/non-dict sections, and env > YAML precedence. Refs #3823, #24356. Closes #24836. |
||
|
|
d898e0eb7f
|
fix(gateway): complete lazy-install rebind for slack/feishu/matrix + add ensure_and_bind helper (#25038)
Fixes #25028. The lazy-install hooks added in #25014 installed packages correctly but failed to rebind module-level globals after install: - Slack: missing aiohttp rebind → NameError on file uploads - Feishu: none of the ~25 lark_oapi symbols rebound → TypeError on adapter instantiation - Matrix: mautrix.types enums stayed as stubs → mismatched values at runtime Introduces tools.lazy_deps.ensure_and_bind() — a DRY helper that combines ensure() + importer-callable + globals().update(). This eliminates the error-prone pattern of manually listing every global that needs updating after lazy-install. Each platform adapter now defines a single _import() function returning all bindings. Also fixes: pyproject.toml [slack] extra was missing aiohttp (needed by slack-bolt's async path). |
||
|
|
6122a79aab
|
feat(slack): support !cmd as alternate prefix for slash commands in threads (#25355)
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-main (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
Slack platform-blocks native slash commands inside thread replies ("/queue
is not supported in threads. Sorry!") and there is no app-side setting to
re-enable them. As a workaround, rewrite a leading '!' to '/' for any known
gateway command before downstream processing — so '!queue', '!stop',
'!model gpt-5.4' etc. work inside Slack threads (and anywhere else).
Only the first token is checked against is_gateway_known_command(), so
casual messages like '!nice work' pass through to the agent unchanged.
Downstream pipeline (MessageType.COMMAND tagging, gateway dispatcher,
thread reply routing) is unchanged.
Adds 6 tests covering rewrite, args preservation, thread routing,
casual-message passthrough, '@bot' suffix, and plain '/' still-works.
|
||
|
|
091d8e1030
|
feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182)
* feat(codex-runtime): scaffold optional codex app-server runtime
Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.
Lands in three pieces:
1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
for codex's app-server protocol (codex-rs/app-server). Spawn, init
handshake, request/response, notification queue, server-initiated
request queue (for approval round-trips), interrupt-friendly blocking
reads. Tested against real codex 0.130.0 binary end-to-end during
development.
2. hermes_cli/runtime_provider.py:
- Adds 'codex_app_server' to _VALID_API_MODES.
- Adds _maybe_apply_codex_app_server_runtime() helper, called at the
end of _resolve_runtime_from_pool_entry(). Inert unless
'model.openai_runtime: codex_app_server' is set in config.yaml AND
provider in {openai, openai-codex}. Other providers cannot be
rerouted (anthropic, openrouter, etc. preserved).
3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
covering api_mode registration, the rewriter helper (default-off,
case-insensitive, opt-in, non-eligible providers preserved), version
parser, missing-binary handling, error class. Does NOT require codex
CLI installed.
This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.
Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)
* feat(codex-runtime): add codex item projector for memory/skill review
The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.
Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
- userMessage → {role: user, content}
- agentMessage → {role: assistant, content: text}
- reasoning → stashed in next assistant's 'reasoning' field
- commandExecution → assistant tool_call(name='exec_command') + tool result
- fileChange → assistant tool_call(name='apply_patch') + tool result
- mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result
- dynamicToolCall → assistant tool_call(name=<tool>) + tool result
- plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls
Invariants preserved:
- Message role alternation never violated: each tool item produces at most
one assistant + one tool message in that order, correlated by call_id.
- Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
don't materialize messages — only item/completed does. Mirrors how
Hermes already only writes the assistant message after streaming ends.
- Tool call ids are deterministic (codex item id-based) so replays produce
identical messages and prefix caches stay valid (AGENTS.md pitfall #16).
- JSON args use sorted_keys for the same reason.
Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).
23 new tests, all green:
- Streaming deltas don't materialize (3 paths)
- Turn/thread frame events are silent
- commandExecution: 5 tests including non-zero exit annotation +
deterministic id stability across replays
- agentMessage + reasoning attachment + reasoning consumption
- fileChange: summary without inlined content
- mcpToolCall: namespaced naming + error surfacing
- userMessage: text fragments only (drops images/etc)
- opaque items: no fabricated tool_calls
- Helpers: deterministic id stability + sorted JSON args
- Role alternation invariant across all four tool-shaped item types
This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.
* feat(codex-runtime): add session adapter + approval bridge
The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.
The adapter has a single public per-turn method:
result = session.run_turn(user_input='...', turn_timeout=600)
# result.final_text → assistant text for the caller
# result.projected_messages → list ready to splice into AIAgent.messages
# result.tool_iterations → tick count for _iters_since_skill nudge
# result.interrupted → True on Ctrl+C / deadline / interrupt
# result.error → error string when the turn cannot complete
# result.turn_id, thread_id → for sessions DB / resume
Behavior:
- ensure_started() spawns codex, does the initialize handshake, and
issues thread/start with cwd + permissions profile. Idempotent.
- run_turn() blocks until turn/completed, drains server-initiated
requests (approvals) before reading notifications so codex never
deadlocks waiting for us, projects every item/completed via the
projector, and increments tool_iterations for the skill nudge gate.
- request_interrupt() is thread-safe (threading.Event); the next loop
iteration issues turn/interrupt and unwinds.
- turn_timeout deadlock guard issues turn/interrupt and records an
error if the turn never completes.
- close() escalates terminate → kill via the underlying client.
Approval bridge:
Codex emits server-initiated requests for execCommandApproval and
applyPatchApproval. The adapter translates Hermes' approval choice
vocabulary onto codex's decision vocabulary:
Hermes 'once' → codex 'approved'
Hermes 'session' or 'always' → codex 'approvedForSession'
Hermes 'deny' / anything else → codex 'denied'
Routing precedence:
1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
2. approval_callback wired by the CLI (defers to
tools.approval.prompt_dangerous_approval())
3. Fail-closed denial when neither is wired
Unknown server-request methods are answered with JSON-RPC error -32601
so codex doesn't hang waiting for us.
Permission profile mapping mirrors AGENTS.md:
Hermes 'auto' → codex 'workspace-write'
Hermes 'approval-required' → codex 'read-only-with-approval'
Hermes 'unrestricted/yolo' → codex 'full-access'
20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
- test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
- test_codex_event_projector.py: 23 (item taxonomy projections)
- test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)
Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.
Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.
* feat(codex-runtime): wire codex_app_server runtime into AIAgent
The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.
Three small surgical edits to run_agent.py (~105 LOC total):
1. Line ~1204 (constructor api_mode validation set):
Add 'codex_app_server' so an explicit api_mode='codex_app_server'
passed to AIAgent() isn't silently rewritten to 'chat_completions'.
2. Line ~12048 (run_conversation, just before the while loop):
Early-return to _run_codex_app_server_turn() when self.api_mode is
'codex_app_server'. Placed AFTER all standard pre-loop setup —
logging context, session DB, surrogate sanitization, _user_turn_count
and _turns_since_memory increments, _ext_prefetch_cache, memory
manager on_turn_start — so behavior outside the model-call loop is
identical between paths. Default Hermes flow is unchanged when the
flag is off.
3. End-of-class (line ~15497):
New method _run_codex_app_server_turn(). Lazy-instantiates one
CodexAppServerSession per AIAgent (reused across turns), runs the
turn, splices projected_messages into messages, increments
_iters_since_skill by tool_iterations (since the chat_completions
loop normally does that per iteration), fires
_spawn_background_review on the same cadence as the default path.
Counter accounting:
_turns_since_memory ← already incremented at run_conversation:11817
(gated on memory store configured) — codex
helper does NOT touch it (would double-count).
_user_turn_count ← already incremented at run_conversation:11793
— codex helper does NOT touch it.
_iters_since_skill ← incremented in the chat_completions loop per
tool iteration. Codex helper increments by
turn.tool_iterations since the loop is bypassed.
User message:
ALREADY appended to messages by run_conversation pre-loop (line 11823)
before the early-return reaches us. Helper does NOT append again.
Regression test test_user_message_not_duplicated guards this.
Approval callback wiring:
Lazy-fetches tools.terminal_tool._get_approval_callback at session
spawn time, passes to CodexAppServerSession. CLI threads with
prompt_toolkit get interactive approvals; gateway/cron contexts get
the codex-side fail-closed deny.
Error path:
Codex session exceptions become a 'partial' result with completed=False
and a final_response that explicitly tells the user how to switch back:
'Codex app-server turn failed: ... Fall back to default runtime with
/codex-runtime auto.' Same return-dict shape as the chat_completions
path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.
9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
- api_mode='codex_app_server' is accepted on AIAgent construction
- run_conversation returns the expected codex shape
(final_response, codex_thread_id, codex_turn_id, completed, partial)
- Projected messages are spliced into messages list
- _iters_since_skill ticks per tool iteration
- _user_turn_count delegated to standard flow (not double-counted)
- User message appears exactly once (regression guard)
- _spawn_background_review IS invoked (memory/skill review keeps working)
- chat.completions.create is NEVER called (loop fully bypassed)
- Session exception → partial result with /codex-runtime auto hint
- Interrupted turn → partial result with error preserved
Adjacent test runs confirm no regressions:
- tests/run_agent/test_memory_nudge_counter_hydration.py: green
- tests/run_agent/test_background_review.py: green
- tests/run_agent/test_fallback_model.py: green
- tests/agent/transports/: 249/249 green
Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.
* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)
User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.
Surface:
/codex-runtime — show current state + codex CLI status
/codex-runtime auto — Hermes default runtime
/codex-runtime codex_app_server — codex subprocess runtime
/codex-runtime on / off — synonyms
Files changed:
hermes_cli/codex_runtime_switch.py (new):
Pure-Python state machine shared by CLI and gateway. Parse args,
read/write model.openai_runtime in the config dict, gate enabling
behind a codex --version check (don't let users opt in to a runtime
they have no binary for; print npm install hint instead).
Returns a CodexRuntimeStatus dataclass that callers render however
suits their surface.
hermes_cli/commands.py:
Single CommandDef entry, no aliases (codex-runtime is its own thing).
cli.py:
Dispatch in process_command() + _handle_codex_runtime() handler that
delegates to the shared module and renders results via _cprint.
gateway/run.py:
Dispatch in _handle_message() + _handle_codex_runtime_command() that
returns a string (gateway sends as message). On a successful change
that requires a new session, _evict_cached_agent() forces the next
inbound message to construct a fresh AIAgent with the new api_mode —
avoids prompt-cache invalidation mid-session.
gateway/run.py running-agent guard:
/codex-runtime joins /model in the early-intercept block so a runtime
flip mid-turn can't split a turn across two transports.
Tests:
tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
state machine: arg parsing (10 cases incl. case-insensitive and
synonyms), reading current runtime (5 cases incl. malformed configs),
writing runtime (3 cases), apply() entry point covering read-only,
no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
and persist-failure paths (8 cases). All green.
Adjacent test suites confirm no regressions:
- tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
167/167 green
- tests/agent/transports/: 283/283 green when combined with prior commits
Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.
* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml
Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.
The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.
What translates (mapping verified against codex-rs/core/src/config/edit.rs):
Hermes mcp_servers.<n>.command/args/env → codex stdio transport
Hermes mcp_servers.<n>.url/headers → codex streamable_http transport
Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec
Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec
Hermes mcp_servers.<n>.cwd → codex stdio cwd
Hermes mcp_servers.<n>.enabled: false → codex enabled = false
What does NOT translate (warned + skipped per server):
Hermes-specific keys (sampling, etc.) — codex's MCP client has no
equivalent. Listed in the per-server skipped[] field of the report.
What's NOT migrated (intentional):
AGENTS.md — codex respects this file natively in its cwd. Hermes' own
AGENTS.md (project-level) is already in the worktree, so codex picks
it up without translation. No code needed.
Idempotency design:
All managed content lives between a 'managed by hermes-agent' marker
and the next non-mcp_servers section header. _strip_existing_managed_block
removes the prior managed region cleanly, preserving any user-added
codex config (model, providers.openai, sandbox profiles, etc.) above
or below.
Files added:
hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
helper. Public API: migrate(hermes_config, codex_home=None,
dry_run=False) returns MigrationReport with .migrated/.errors/
.skipped_keys_per_server. No external TOML dependency — minimal
formatter handles strings/numbers/booleans/lists/inline-tables.
tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
covering:
- per-server translation (12): stdio/http/sse, cwd, timeouts,
enabled flag, command+url precedence, sampling drop, unknown keys
- TOML formatter (8): types, escaping, inline tables, error case
- existing-block stripping (4): no marker, alone, with user content
above, with user content below
- end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
re-run, preserves user config, error reporting, invalid input,
summary formatting
Files changed:
hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
the codex_app_server enable branch. Migration failure logs a warning
in the result message but does NOT fail the runtime change. Disable
path (auto) explicitly skips migration.
tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
test_migration_failure_does_not_block_enable.
All 325 feature tests green:
- tests/agent/transports/: 249 (incl. 67 new)
- tests/run_agent/test_codex_app_server_integration.py: 9
- tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
- tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)
* perf(codex-runtime): cache codex --version check within apply()
Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.
Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.
Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.
Two regression-guard tests added:
- test_binary_check_cached_within_apply: enable path → call_count == 1
- test_binary_check_cached_on_read_only_call: state-report path → call_count == 1
Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.
* fix(codex-runtime): correct protocol field names found via live e2e test
Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.
Bug 1: thread/start.permissions wire format
Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
{"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.
Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.
Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.
Bug 2: server-request method names
Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
item/commandExecution/requestApproval
item/fileChange/requestApproval
item/permissions/requestApproval (new third method)
Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.
Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.
Bug 3: approval decision values
Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).
Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.
Live test verified after fixes:
$ hermes (with model.openai_runtime: codex_app_server)
> Run the shell command: echo hermes-codex-livetest > .../proof.txt
then read it back
Approval prompt fired with 'Codex requests exec in <cwd>'.
User chose 'Allow once'. Codex executed the command, wrote the file,
read it back. Final response: 'Read back from proof.txt:
hermes-codex-livetest'. File contents on disk match.
agent.log confirms:
codex app-server thread started: id=019e200e profile=workspace-write
cwd=/tmp/hermes-codex-livetest/workspace
All 20 session tests still green after wire-format updates.
* fix(codex-runtime): correct apply_patch approval params + ship docs
Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.
Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.
Live e2e validation across the path matrix:
✓ thread/start handshake
✓ turn/start with text input
✓ commandExecution items + projection
✓ item/commandExecution/requestApproval → Hermes UI → response
✓ Approve once → command runs
✓ Deny → command rejected, codex falls back to read-only message
✓ Multi-turn (codex remembers prior turn's results)
✓ apply_patch via Codex's fileChange path
✓ item/fileChange/requestApproval → Hermes UI
✓ MCP server migration loads inside spawned codex (verified via
'use the filesystem MCP tool' prompt)
✓ /codex-runtime auto → codex_app_server toggle cycle
✓ Disable doesn't trigger migration
✓ Enable with codex CLI present succeeds + migrates
✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
even if codex finishes before the interrupt lands)
Known live-validated limitations now documented in the docs page:
- delegate_task subagents unavailable on this runtime
- permission profile selection delegated to ~/.codex/config.toml
- apply_patch approval prompt has no inline changeset (codex protocol
doesn't expose it)
145/145 codex-runtime tests still green.
* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)
Major: migrate native Codex plugins (#7 in OpenClaw's PR list)
Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.
Implementation:
- hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
helper spawns 'codex app-server' briefly and walks plugin/list. Returns
(plugins, error) — failures are non-fatal so MCP migration still works.
- render_codex_toml_section() now takes plugins + permissions args.
- migrate() defaults: discover_plugins=True, default_permission_profile=
'workspace-write'. Explicit None on either disables that side.
- _strip_existing_managed_block() now also strips [plugins.*] and
[permissions]/[permissions.*] sections inside the managed block, so
re-runs replace plugins cleanly without touching codex's own config.
Quirk fixes:
#2 Default permissions profile written on enable.
Without this, Codex's read-only default kicks in and EVERY write
triggers an approval prompt. Now writes [permissions] default =
'workspace-write' so the runtime feels normal out of the box. Set
default_permission_profile=None to opt out.
#4 apply_patch approval prompt now shows what's changing.
Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
Session adapter now caches the fileChange item from item/started
notifications and looks it up by itemId when codex requests approval.
Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
'apply_patch (0 change(s))'.
Side benefit: also drains pending notifications BEFORE handling a
server request, so the projector and per-turn caches are up to date
when the approval decision fires. Bounded to 8 notifications per
loop iter to avoid starving codex's response.
#5/#10 Exec approval prompt never shows empty cwd.
When codex omits cwd in CommandExecutionRequestApprovalParams, fall
back to the session's cwd. If somehow neither is available, show
'<unknown>' explicitly instead of an empty string.
Also surfaces 'reason' from the approval params when codex provides
it — gives users more context on why codex wants to run something.
#11 Banner indicates the codex_app_server runtime when active.
New 'Runtime: codex app-server (terminal/file ops/MCP run inside
codex)' line appears in the welcome banner only when the runtime is
on. Default banner is unchanged.
Tests:
- 7 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery (mocked), failure handling, dry-run skip, opt-out
flag, idempotent re-runs, and permissions writing.
- 3 new tests in test_codex_app_server_session.py covering the
enriched approval prompts: cwd fallback, change summary on
apply_patch, fallback when no item/started cache exists.
- All 26 session tests + 46 migration tests green; 153 total in PR.
* feat(codex-runtime): hermes-tools MCP callback + native plugin migration
The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.
Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.
New module: agent/transports/hermes_tools_mcp_server.py
FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
through model_tools.handle_function_call() — same code path as the
Hermes default runtime. Run with:
python -m agent.transports.hermes_tools_mcp_server [--verbose]
Exposed: web_search, web_extract, browser_navigate / _click / _type /
_press / _snapshot / _scroll / _back / _get_images / _console /
_vision, vision_analyze, image_generate, skill_view, skills_list,
text_to_speech.
NOT exposed (deliberately):
- terminal/shell/read_file/write_file/patch — codex has built-ins
- delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
model_tools.py:493, require running AIAgent context. Documented
as a limitation and surfaced in the slash command output.
Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
- _query_codex_plugins() spawns 'codex app-server' briefly to walk
plugin/list and pull installed openai-curated plugins. Failures are
non-fatal — MCP migration still completes.
- render_codex_toml_section() now takes plugins + permissions args
AND wraps the managed block with a MIGRATION_END_MARKER comment so
the stripper can reliably find both ends, even when the block
contains top-level keys (default_permissions = ...).
- migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
default_permission_profile=':workspace' (built-in codex profile name
— must be prefixed with ':'). All three opt-out via explicit args.
- _build_hermes_tools_mcp_entry() builds the codex stdio entry with
HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
Hermes points the MCP subprocess at the same module layout.
Live-caught wire bugs fixed during this turn:
1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
for *user-defined* profiles with structured fields. Built-in
profile names start with ':' (':workspace', ':read-only',
':danger-no-sandbox'). Was emitting
which codex rejected with 'invalid type: string "X", expected
struct PermissionProfileToml'.
2. Built-in profile is , NOT . Codex
rejected with 'unknown built-in profile'.
3. Codex's MCP layer sends for
tool-call confirmation. We weren't handling it, so codex stalled
and returned 'MCP tool call was rejected'. Now: auto-accept for
our own hermes-tools server (user already opted in by enabling
the runtime), decline for third-party servers.
Quirk fixes shipped (from the limitations list):
#2 default permissions: workspace profile written on enable. No more
approval prompt on every write.
#4 apply_patch approval shows what's changing: cache fileChange
items from item/started, look up by itemId when codex sends
item/fileChange/requestApproval. Prompt: '1 add, 1 update:
/tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
#5/#10 exec approval cwd never empty: fall back to session cwd, then
'<unknown>'. Also surfaces 'reason' from codex when present.
#11 banner shows 'Runtime: codex app-server' line when active so
users understand why tool counts may not match what's reachable.
Tests:
- 5 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery, expose_hermes_tools entry generation, idempotent
re-runs, opt-out flag, permissions profile.
- 3 new tests in test_codex_app_server_session.py covering enriched
approval prompts (cwd fallback, fileChange summary).
- 2 new tests for mcpServer/elicitation/request handling (accept
hermes-tools, decline others).
- New test file test_hermes_tools_mcp_server.py covering module
surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
no agent-loop tools), and main() error paths.
- 166 codex-runtime tests total, all green.
Live e2e validated against codex 0.130.0 + ChatGPT subscription:
✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
registers hermes-tools, writes default_permissions = ':workspace'
✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
✓ Shell command runs without approval prompt (workspace profile works)
✓ Multi-turn — codex remembers prior turn's results
✓ apply_patch path via fileChange request approval
✓ web_search via hermes-tools MCP callback returns real Firecrawl
results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
✓ Disable cycle clean
Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
Full re-write covering native plugin migration, the hermes-tools
callback architecture, the prerequisites change ('codex login is
separate from hermes auth login codex'), the trade-off table now
reflecting which Hermes tools work via callback, and the limitations
list updated with what's actually unavailable on this runtime.
* feat(codex-runtime): pin user-config preservation invariant for quirk #6
Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.
This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.
Now explicitly tested:
- User MCP server above the managed block survives migration
- User MCP server below the managed block survives migration
- Both above + below survive a second re-migration
- User content (model, providers, sandbox, otel, etc.) outside our
region is left untouched
Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.
167 codex-runtime tests, all green.
* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find
Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.
Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).
Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:
1. Codex's built-in toolset (always on) — shell, apply_patch,
update_plan, view_image, web_search; covers everything terminal-
adjacent.
2. Native Codex plugins (auto-migrated from your codex plugin
install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
web_search/web_extract via Firecrawl, browser_*, vision_analyze,
image_generate, skill_view/skills_list, text_to_speech.
Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.
Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.
Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.
No code changes — purely docs clarification. 167 codex-runtime tests
still green.
* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade
Two real bugs in the self-improvement loop integration that the previous
test mocked away.
Bug 1: wrong call signature
The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
messages_snapshot=list (positional or keyword)
review_memory=bool (at least one trigger must be True)
review_skills=bool
So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.
Bug 2: review fork inherits codex_app_server api_mode
The review fork is constructed with:
api_mode = _parent_runtime.get('api_mode')
So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.
Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).
Also rewrote the codex helper's review wiring to match the
chat_completions path:
- Computes _should_review_memory in the pre-loop block (was already
being computed; now passed through to the helper as an arg).
- Computes _should_review_skills AFTER the codex turn returns +
counters tick (line ~15432 pattern in chat_completions).
- Calls _spawn_background_review(messages_snapshot=, review_memory=,
review_skills=) only when at least one trigger fires.
- Adds the external memory provider sync (_sync_external_memory_for_turn)
that the chat_completions path runs after every turn.
Tests:
Replaced the broken test_background_review_invoked (which only
asserted spawn.called) with three sharper tests:
- test_background_review_NOT_invoked_below_threshold:
single turn at default thresholds → no review fires (would have
caught the original 'every turn calls spawn with no args' bug)
- test_background_review_skill_trigger_fires_above_threshold:
10 tool_iterations at threshold=10 → review fires with
messages_snapshot=list, review_skills=True, counter resets
- test_background_review_signature_never_breaks: regression guard
asserting positional args are always empty and kwargs include
messages_snapshot
New TestReviewForkApiModeDowngrade class:
- test_codex_app_server_parent_downgrades_review_fork: drives the
real _spawn_background_review function (no mock at that level),
asserts the review_agent gets api_mode='codex_responses' when
the parent was codex_app_server.
Live-validated against real run_conversation:
- Counter ticked from 0 to 5 after a 5-tool-iteration turn
- _spawn_background_review fired exactly once with kwargs-only signature
- review_skills=True, review_memory=False
- messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
results + 1 final assistant + initial system/user)
- Counter reset to 0 after fire
170 codex-runtime tests, all green.
Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).
* feat(codex-runtime): expose kanban tools through Hermes MCP callback
Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.
That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.
Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.
Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.
Tools exposed:
Worker handoff (require HERMES_KANBAN_TASK):
kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
Read-only board queries:
kanban_show, kanban_list
Orchestrator (require HERMES_KANBAN_TASK unset):
kanban_create, kanban_unblock, kanban_link
Tests:
- test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
- test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link
Docs:
- New 'Workflow features' section in the docs page covering /goal,
kanban, and cron behavior on this runtime
- /goal: works fully via run_conversation feedback; only caveat is
approval-prompt noise on long writes-heavy goals (mitigated by
the default :workspace permission profile)
- Kanban: enumerated which tools are reachable via the callback and
why the env var propagates correctly through the codex subprocess
to the MCP server subprocess
- Cron: documented as 'not specifically tested' — same rules as the
CLI apply since cron runs through AIAgent.run_conversation
- Trade-offs table gained rows for /goal, kanban worker, kanban
orchestrator
172/172 codex-runtime tests green (+2 from kanban tests).
* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost
Three docs gaps caught during a final audit:
1. /codex-runtime was only in the feature docs page, not in the
slash-commands reference. Added rows to both the CLI section and
the Messaging section so users discover it where they'd look for
slash command syntax.
2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
CODEX_HOME lets users redirect Codex CLI's config dir (the migration
honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
propagates to the codex subprocess + the hermes-tools MCP subprocess
so kanban worker tools gate correctly — documented as 'don't set
manually' since it's an internal handoff.
3. Aux client behavior on this runtime. When openai_runtime=
codex_app_server is on with the openai-codex provider, every aux
task (title generation, context compression, vision auto-detect,
session search summarization, the background self-improvement review
fork) flows through the user's ChatGPT subscription by default.
This is true for the existing codex_responses path too, but it's
more visible / important here because users explicitly opted in for
subscription billing. Added a 'Auxiliary tasks and ChatGPT
subscription token cost' section to the docs page with a YAML
example showing how to override specific aux tasks to a cheaper
model (typically google/gemini-3-flash-preview via OpenRouter).
Also documents how the self-improvement review fork gets
auto-downgraded from codex_app_server to codex_responses by the
fix earlier in this PR.
No code changes — pure docs. 172 codex-runtime tests still green.
* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME
OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.
Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:
test_spawn_env_preserves_HOME — confirms parent HOME survives intact
in the subprocess env
test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
arg still isolates
codex state correctly
Docs additions:
'HOME environment variable passthrough' section — calls out the
contract explicitly: CODEX_HOME isolates codex's own state, HOME
stays user-real so gh/git/aws/npm/etc. find their normal config.
Cites openclaw#81562 as the cautionary tale.
'Multi-profile / multi-tenant setups' section — addresses the
related concern: profiles share ~/.codex/ by default. For users who
want per-profile codex isolation (separate auth, separate plugins),
documents the manual CODEX_HOME=<profile-scoped-dir> approach.
Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
would silently invalidate existing codex login state for anyone
upgrading to this PR with tokens already at ~/.codex/auth.json.
Opt-in is safer than surprising users.
174 codex-runtime tests (+2 from HOME guards), all green.
* fix(codex-runtime): TOML control-char escapes + atomic config.toml write
Two footguns caught in a final audit pass before merge.
Bug 1: TOML control characters not escaped
The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.
Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.
Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.
Bug 2: config.toml write wasn't atomic
If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.
Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.
Tests:
- test_string_with_newline_escaped — \n in value → \n in output
- test_string_with_tab_escaped — \t in value → \t in output
- test_string_with_other_controls_escaped — \r, \f, \b
- test_windows_path_escaped_correctly — backslash doubling
- test_atomic_write_no_temp_leak_on_success — no .config.toml.*
left over after a successful write
- test_atomic_write_cleanup_on_rename_failure — temp file removed
when Path.replace raises (simulated disk full)
180 codex-runtime tests, all green (+6 from this commit).
Footguns audited but NOT fixed (with rationale):
- Concurrent migrations race. Two Hermes processes hitting
/codex-runtime codex_app_server within seconds of each other could
cause one writer to lose entries. Low probability (you'd have to
enable from two surfaces simultaneously) and low impact (just re-run
migration). Adding fcntl/msvcrt locking is more code than it's
worth here. The atomic rename above means each individual write is
consistent — only the merge step is racy.
- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
check at runtime but don't reject too-new versions. Right call —
the protocol has been stable through 0.125 → 0.130. If OpenAI
breaks it later we'd see the error in test_codex_app_server_runtime
on CI before users hit it.
|
||
|
|
9a815b6c8c |
fix(gateway): preserve queued follow-up transcript history
Keep the outer history_offset when _run_agent drains queued follow-ups recursively so transcript persistence includes every queued turn in the chain instead of only the last one. |
||
|
|
5d90386baa
|
fix(gateway): add lazy_deps.ensure() to slack, matrix, dingtalk, feishu adapters (#25014)
Only Discord and Telegram had lazy-install hooks in their check_*_requirements() functions. The remaining four platforms that were moved to lazy_deps (Slack, Matrix, DingTalk, Feishu) would just return False immediately if their packages weren't pre-installed — no attempt to install them at runtime. This means even with the .venv permissions fix (#24841), these four platforms would still fail to load in Docker (or any fresh install) unless the user manually ran pip install. Add the same lazy_deps.ensure() pattern to all four, matching the existing Discord/Telegram implementation. |
||
|
|
1a4e8f7041 |
fix(gateway): make WhatsApp npm install timeout configurable
Default timeout raised from 60s to 300s (5 minutes) to accommodate slower systems like Unraid NAS. Configurable via WHATSAPP_NPM_INSTALL_TIMEOUT environment variable. |
||
|
|
e77fd75c44 |
fix(wecom): update connection status after WebSocket reconnection
The WeCom adapter's _listen_loop() automatically reconnects when the WebSocket drops, but it never called _mark_connected() after a successful reconnection. This left the runtime status file (gateway_state.json) stuck in "disconnected" even though the adapter was fully operational again. Add self._mark_connected() right after _open_connection() succeeds so that the dashboard and health probes report the correct state. Tested by forcing a WebSocket close via the heartbeat loop and verifying that the status file updated from "disconnected" back to "connected". |
||
|
|
e474130c48 |
fix(telegram): use thread fallback helper in slash-confirm result send
PR #23458 introduced _send_message_with_thread_fallback() and applied it to all control-style sends (send_update_prompt, send_approval_request, send_model_picker_prompt), but the slash-confirm result message in handle_callback_query still called self._bot.send_message directly. In supergroups with stale message_thread_id on the callback's parent message, this raises "Message thread not found" and silently swallows the result text. Replace with the helper so the same retry-without- thread-id logic applies. |
||
|
|
dd1d4e9c5d | fix(gateway): add chat_id to hook_ctx for message source tracking | ||
|
|
e71393237e |
fix(signal): handle group messages from linked devices in syncMessage path
Closes #23064 When Hermes connects to Signal via signal-cli in daemon mode (linked device setup), group messages sent from the user's phone were silently dropped. The syncMessage handler only processed events where destinationNumber equals the bot's own number (Note to Self). Group messages from linked devices carry a groupInfo.groupId instead of a destinationNumber. Extend the condition to also pass through sync messages that have a groupId, so group messages are promoted to dataMessage and reach the agent. |
||
|
|
a694a26330 |
docs(gateway): mention Weixin in gateway help and docstrings
Salvage of #21063 — adds 'Weixin, and more' to module-level docstrings in gateway/__init__.py, gateway/config.py, gateway/platforms/base.py and the 'hermes gateway' subparser description. Co-authored-by: wuwuzhijing <chuang.guo@hopechart.com> |
||
|
|
6f285efb80
|
fix(telegram): clear in-progress reaction on cancelled processing (#24628)
When the user runs /stop or a session is interrupted mid-flight, the 👀 in-progress reaction lingered on the user's message indefinitely. Without another agent run to swap it for 👍/👎, the eyes stayed there forever — visually misleading (looks like the agent is still working). Fix: on ProcessingOutcome.CANCELLED, call set_message_reaction with reaction=None to clear all reactions on the message. Documented Bot API semantics (equivalent to Bot API 10.0's deleteMessageReaction, but works on PTB 22.6 already without the version bump). Test changes: - Renamed test_on_processing_complete_cancelled_keeps_existing_reaction → test_on_processing_complete_cancelled_clears_reaction; updated assertion to expect set_message_reaction(reaction=None). - Added test_on_processing_complete_cancelled_skipped_when_disabled (TELEGRAM_REACTIONS=false short-circuits). - Added test_clear_reactions_handles_api_error_gracefully and test_clear_reactions_returns_false_without_bot to cover the new _clear_reactions helper. |
||
|
|
29d7c244c5
|
feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199)
The clarify tool returned 'not available in this execution context' for every gateway-mode agent because gateway/run.py never passed clarify_callback into the AIAgent constructor. Schema actively encouraged calling it; users never saw the question. Changes: - tools/clarify_gateway.py — new event-based primitive mirroring tools/approval.py: register/wait_for_response/resolve_gateway_clarify with per-session FIFO, threading.Event blocking with 1s heartbeat slices (so the inactivity watchdog keeps ticking), and clear_session for boundary cleanup. - gateway/platforms/base.py — abstract send_clarify with a numbered-text fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix, etc.) gets a working clarify out of the box. Plus an active-session bypass: when the agent is blocked on a text-awaiting clarify, the next non-command message routes inline to the runner's intercept instead of being queued + triggering an interrupt. Same shape as the /approve deadlock fix from PR #4926. - gateway/platforms/telegram.py — concrete send_clarify renders one inline button per choice plus '✏️ Other (type answer)'. cl: callback handler resolves numeric choices immediately, flips to text-capture mode for Other, with the same authorization guards as exec/slash approvals. - gateway/run.py — clarify_callback wired at the cached-agent per-turn callback assignment site (only the user-facing agent path; cron and hygiene-compress agents have no human attached). Bridges sync→async via run_coroutine_threadsafe, blocks with the configured timeout, and returns a '[user did not respond within Xm]' sentinel on timeout so the agent adapts rather than pinning the running-agent guard. Text- intercept added to _handle_message before slash-confirm intercept (skipping slash commands). clear_session called in the run's finally to cancel any orphan entries. - hermes_cli/config.py — agent.clarify_timeout default 600s. - website/docs/user-guide/messaging/telegram.md — Interactive Prompts section. Tests: - tests/tools/test_clarify_gateway.py (14 tests) — full primitive coverage: button resolve, open-ended auto-await, Other flip, timeout None, unknown-id idempotency, clear_session cancellation, FIFO ordering, register/unregister notify, config default. - tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render paths (multi-choice/open-ended/long-label/HTML-escape/not-connected), callback dispatch (numeric resolve/Other flip/already-resolved/ unauthorized/invalid-token), and base-adapter text fallback. Out of scope: bot-to-bot, guest mode, checklists, poll media, live photos. Closes #24191. |
||
|
|
f9559c39c4 |
fix(gateway): consult lock record argv when cmdline unreadable in scoped-lock stale check
PR #24500 introduced stale-lock detection that calls `_looks_like_gateway_process` to confirm a running PID is not an unrelated process that reused the slot. On Windows neither `/proc` nor `ps` is available, so `_read_process_cmdline` always returns `None` and `_looks_like_gateway_process` always returns `False` — causing every valid Windows gateway lock to be marked stale and immediately evicted. Fix: after `_looks_like_gateway_process` returns `False`, call `_read_process_cmdline` directly. If the result is non-`None` the live cmdline was readable and confirms the PID is foreign → stale. If it is `None` (cmdline unreadable, e.g. Windows without ps), fall back to `_record_looks_like_gateway` which validates the stored `argv` the gateway wrote into the lock file at startup. Both oracles must say "not a gateway" before the lock is evicted — the same two-oracle pattern already used in `get_running_pid` (line 941). Adds a regression test that simulates a Windows host where `_looks_like_gateway_process` returns `False` for every PID and `_read_process_cmdline` returns `None`, confirming the lock is kept when the record's argv identifies it as a gateway process. |
||
|
|
4bb0a82a2b | fix(gateway): enqueue SSE EOS sentinel on task completion | ||
|
|
653d304290 |
fix(gateway): detect stale scoped locks via cmdline when start_time is absent on macOS
On macOS (and Windows), /proc is unavailable so _get_process_start_time()
always returns None. When a gateway creates a scoped lock record with
start_time=None and then exits, macOS can reuse that PID for an unrelated
process. On restart, acquire_scoped_lock() sees:
1. os.kill(pid, 0) succeeds (PID is alive — but it's bluetoothuserd, not
the gateway)
2. existing.start_time is None and current_start is None, so the
start_time comparison is inconclusive
3. The lock is treated as active, blocking gateway startup with:
"Telegram bot token already in use (PID 873). Stop the other gateway
first."
Root cause: _read_process_cmdline() only reads /proc/<pid>/cmdline, which
doesn't exist on macOS. It always returns None, making
_looks_like_gateway_process() always return False, so the cmdline fallback
path in acquire_scoped_lock() was unreachable on macOS.
Fix (two parts):
1. _read_process_cmdline(): Add a ps(1) fallback for platforms without
/proc. When /proc/<pid>/cmdline doesn't exist, we now run
"ps -p <pid> -o command=" to retrieve the process command line. The
/proc path is tried first (preserving Linux performance); ps is only
invoked as a fallback.
2. acquire_scoped_lock(): When both the lock record's start_time and the
live process's start_time are None (the macOS case), fall back to
checking whether the live PID still looks like a Hermes gateway process
via _looks_like_gateway_process(). If it doesn't, the lock is stale.
Closes #16376
|
||
|
|
c1eb2dcda7
|
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
|
||
|
|
271883447e
|
feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847)
Set HERMES_SESSION_ID using the existing session_context.py ContextVar system for concurrency safety (multiple gateway sessions in one process won't cross-talk). Also writes os.environ as fallback for CLI mode. Touchpoints: - gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry - run_agent.py: Set both ContextVar and os.environ at init and on context-compression rotation - tools/environments/local.py: Bridge ContextVars into subprocess env in _make_run_env() (ContextVars don't propagate to child processes) - tests/run_agent/test_session_id_env.py: 3 tests covering env, provided ID, and ContextVar paths execute_code subprocess already passes HERMES_* prefixed vars through _scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_'). Primary use case: webhook-triggered agents that need to include a `--resume <session_id>` takeover command in their output. |
||
|
|
ce0f529cde
|
chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940)
C401: set(x for x in y) -> {x for x in y} (set comprehension)
C416: [(k,v) for k,v in d] -> list(d.items()) (unnecessary listcomp)
C408: tuple()/dict() -> ()/{} (unnecessary collection call)
PLR1722: exit() -> sys.exit() (adds import sys where needed)
21 instances fixed, 0 remaining. 19 files, +40/-36.
|
||
|
|
2ec8d2b42f
|
chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace with for all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero). |
||
|
|
657874460f
|
chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances — else: if: → elif: PLR1730 (if-stmt-min-max): 15 instances — if x<y: x=y → x=max(x,y) C420 (dict.fromkeys): 2 instances — dictcomp → dict.fromkeys PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix) C414 (unnecessary-list): 1 instance — sorted(list(x)) → sorted(x) 28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions. |
||
|
|
3e7145e0bb
|
revert: roll back /goal checklist + /subgoal feature stack (#23813)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit |
||
|
|
f6d4f3c37d | fix(kanban): route gateway create auto-subscribe to explicit board | ||
|
|
ace1c4ea8c |
fix(discord): typing indicator task not cleaned up after API error
When the Discord typing API call fails (rate limit, network error, 403), _typing_loop returns early but the stale task remains in _typing_tasks. Subsequent send_typing calls see the stale entry and skip, leaving no typing indicator for the rest of the agent invocation. Add finally block to _typing_loop to always remove the task from _typing_tasks on exit, whether from cancellation, error, or normal completion. This allows send_typing to create a fresh task. 3 new tests in test_discord_send.py: - Task removed after API error - Typing restartable after failure - stop_typing cleans up |
||
|
|
3af3c4eb8c |
fix(misc): three small defensive fixes from PR #1974
Salvages the three substantive low-severity fixes from Gutslabs' #1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> |
||
|
|
ac95b8cdbe |
perf(gateway): tune Telegram cadence + adaptive fast-path for short replies
Re-authored against current main from PR #10388 by @wilsen0. The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.
Tuning summary
==============
Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
total <= 320 cp -> min(cap, 0.18)
total <= 1024 cp -> min(cap, 0.24)
else -> cap
A single short reply now reaches the agent in ~180ms instead of
600ms. Tier constants compose with the configured cap via min()
so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()).
Rejects NaN / Inf, applies optional min/max bounds. Used for
text-batch + media-batch knobs. Prevents asyncio.sleep(NaN)
crashes when an operator typos an env var.
Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
a single source of truth. StreamConsumerConfig imports them
instead of duplicating the literals; the prior dual-source drift
is fixed.
Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside
Telegram's ~1 edit/s flood envelope the 'all' default would
accumulate edit pressure on busy chats; 'new' shows only the
leading bubble per tool batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.
Composition with native draft streaming (#23512)
================================================
The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based. The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport. No conflict.
Stale-base avoidance
====================
Re-authored from scratch rather than cherry-picked. Dropped from the
original branch:
- Unrelated
|
||
|
|
f6736ced81 |
fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages
1. Quick command exec ran in the gateway process's full environment without env sanitization or output redaction. A quick command like "env" or "printenv" would leak all API keys, OAuth tokens, and bot credentials to the messaging user. Fix: apply _sanitize_subprocess_env() before exec and redact_sensitive_text() on output before returning. 2. GatewayRunner._pending_messages was written on every interrupt (lines 1331-1334) but never read or consumed anywhere. The actual interrupt delivery uses adapter._pending_messages (a separate dict). Removed the write-only accumulation to prevent unbounded growth. |
||
|
|
bf1f40996f |
fix(telegram): split-and-deliver oversized edits instead of silent truncation
When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit,
the adapter caught the BadRequest, best-effort truncated the content
with '…', and returned SendResult(success=True). The stream consumer
believed the full edit was delivered and never recovered, silently
dropping everything past the truncation boundary on long replies.
Returning failure isn't safe either — the consumer's existing fallback
path can race against the next streaming tick, producing duplicate
sends or gaps. Instead, the adapter now SPLITS the oversized payload
across the existing message + new continuation messages, so the user
always gets the full reply in correct order.
How it works:
1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH,
call the new _edit_overflow_split helper directly — saves a doomed
round-trip + a Telegram error.
2. Reactive: if Telegram still returns 'message_too_long' after the
pre-flight (e.g. parse_mode formatting inflated the payload past
the limit via MarkdownV2 escapes), the same helper handles it.
3. _edit_overflow_split:
- Splits via truncate_message(len_fn=utf16_len) — same chunking the
non-streaming send() path uses; chunks get '(1/N)' suffixes.
- Edits the original message_id with chunk 1 (with parse_mode +
plain-fallback when finalize=True, mirroring the main edit path).
- Sends each remaining chunk via self._bot.send_message threaded as
a reply to the previous chunk so the user sees them as a
contiguous block. MarkdownV2-with-plain-fallback per chunk on
finalize.
- Returns SendResult(success=True, message_id=<last_chunk_id>,
continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the
stream consumer can keep editing the most recent visible message
and the gateway has full visibility into every message id.
SendResult contract extension:
Added optional continuation_message_ids: tuple = () field. When
empty (the common case), behavior is unchanged. When populated, the
caller knows the adapter delivered across multiple platform messages.
Stream consumer integration:
GatewayStreamConsumer._send_or_edit advances _message_id to the
last-continuation id when it sees continuation_message_ids on a
successful edit result, resets _last_sent_text (the new visible
message holds only the final chunk's text), and fires
on_new_message so tool-progress bubbles linearize below the new
continuation rather than the original. Mirrors the openclaw #32535
inter-tool-leak guard.
Composes with what just landed:
- PR #23455 (UTF-16 length-aware splitting in stream consumer)
prevents most overflows upstream by measuring text in UTF-16
codeunits before deciding to split. This PR is the safety net at
the adapter boundary.
- PR #23512 (native draft streaming, default for DM Telegram) routes
DM streaming through send_draft, which has its own contract
unaffected by this change. So this fix narrows in scope to the
edit-based path: groups, supergroups, forum topics, every
non-Telegram platform, and the per-response fallback after a
draft failure.
Salvage notes:
- Cherry-picked from PR #19537 by @kjames2001. Original PR returned
failure on overflow; this evolves to split-and-deliver so users
never lose content and the consumer state stays consistent.
- Dropped an unrelated model-picker hunk (line 2114-2117) that
silently killed the 'X more available — type /model <name>
directly' hint by hardcoding total=len(models). Not in scope.
- Restored the timeout-aware retryable=not is_timeout signal in
send()'s fallthrough catch block.
Closes #19537.
|
||
|
|
ba5640fa11 | fix(gateway): route kanban notifications to creator profile | ||
|
|
4ed293b38e |
feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+)
Adds Telegram's native streaming-draft API as a streaming transport so DM
replies render with smooth animated previews as tokens arrive, dropping
the per-edit jitter of the legacy editMessageText polling path.
Adapter contract (gateway/platforms/base.py):
- supports_draft_streaming(chat_type, metadata) -> bool. Default False.
Telegram returns True only for DMs and only when the bound python-
telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
- send_draft(chat_id, draft_id, content, metadata) -> SendResult.
Default raises NotImplementedError. Telegram delegates to PTB's
send_message_draft. Drafts have no message_id (Bot API contract);
SendResult.message_id is None on success.
Telegram adapter (gateway/platforms/telegram.py):
- supports_draft_streaming gates on chat_type='dm' AND PTB capability.
- send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads
message_thread_id through metadata, and routes failures back as
SendResult(success=False, error=...) so the consumer can fall back.
Stream consumer (gateway/stream_consumer.py):
- StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off')
and chat_type fields.
- run() resolves _use_draft_streaming once via a probe at the top of
the run, allocating a fresh class-wide draft_id_counter so each
response animates as its own preview (no animation collision across
consecutive responses to the same chat).
- _send_or_edit gains a pre-edit branch: when drafts are active AND
not finalizing AND no edit-path message_id is established, the
frame routes through _send_draft_frame instead of edit_message.
Drafts intentionally do NOT set _already_sent so the gateway's
final sendMessage path still fires — drafts have no message_id and
the user needs a real message in their chat history.
- _reset_segment_state bumps the draft_id when the consumer is in
draft mode so each text block after a tool boundary animates as a
fresh preview below the tool-progress bubble (avoids the inter-
tool-call leak openclaw documented in their #32535).
- Per-response fallback: any send_draft failure (transient network,
server reject, capability gap) flips _use_draft_streaming to False
for the rest of the run, gracefully returning to the edit path.
Gateway config (gateway/config.py):
- StreamingConfig.transport default flips edit -> auto. The auto path
is identical to edit on every chat type that doesn't currently
support drafts (groups, supergroups, forum topics, every non-
Telegram platform), so the default is backwards-compatible for
non-DM users.
Lifecycle model (Telegram Bot API 9.5):
1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
2. Repeated sendMessageDraft calls with the SAME draft_id animate
the preview as text grows.
3. Drafts have no message_id and cannot be edited or deleted.
4. When the response finishes the gateway's normal sendMessage path
delivers the final answer; the draft preview clears naturally on
the client and the user sees a real message in their history.
Inspired by PR #3412 by @NivOO5. Re-authored against current main
(stream_consumer.py is now ~4x larger than at #3412's branch base, with
new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the
original PR didn't account for) but the design call (DM-only, edit-
fallback, transport=auto|draft|edit|off) is faithful to the original
proposal, with two improvements baked in:
1. Per-response draft_id (monotonic counter, not a time hash) — no
collision risk across consecutive responses on the same chat.
2. Tool-boundary draft_id bump — prevents the inter-tool-call leak
openclaw hit during their rollout (their #32535).
Closes #21439 (duplicate feature request).
|
||
|
|
737314fe91 |
fix(telegram): normalize dm threads and retry control sends
Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id issue (#3206): 1. _build_message_event filters DM thread_ids: only preserve thread_id for real topic messages (is_topic_message=True). Telegram puts message_thread_id on every DM that is a reply, but reply-chain ids route to nonexistent threads on send. 2. _send_message_with_thread_fallback helper: control sends (send_update_prompt, send_exec_approval / send_slash_confirm, send_model_picker) retry once without message_thread_id when Telegram returns BadRequest 'Message thread not found'. Mirrors the pattern PR #3390 added for the streaming send path. Salvage notes: - Conflict 1 (line ~4099): merged the contributor's DM is_topic_message filter with the existing forum General-topic default from #22423, preserving both behaviors. - Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416) alongside the new helper. Tightened the helper's exception catch from bare 'Exception' to use the existing _is_bad_request_error + _is_thread_not_found_error helpers (line 484-496) for consistency with the streaming send path. - Widened the fix to send_update_prompt (was bare self._bot.send_message, same bug class). Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@ local commit author). |
||
|
|
404640a2b7
|
feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
user prompt to call out the 1-based scheme explicitly. New tests
cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
|
||
|
|
c0da5d09a6 |
fix: use UTF-16 length for Telegram stream consumer message splitting
The stream consumer measured message length using Python's len() (Unicode code points), but Telegram's actual limit is in UTF-16 code units. This caused messages with supplementary characters (emoji, CJK, etc.) to exceed Telegram's 4096-character limit, resulting in truncated messages with formatting artifacts. Changes: - Add message_len_fn property to BasePlatformAdapter (defaults to len) - Override in TelegramAdapter to return utf16_len - Stream consumer uses adapter.message_len_fn for: - safe_limit calculation - overflow detection - truncate_message calls - split point calculation (via _custom_unit_to_cp) - fallback final send chunking Fixes truncated messages with black square artifacts on Telegram when the model generates responses containing multi-byte Unicode characters. |
||
|
|
2f00559d9e |
fix(telegram): pass source.thread_id explicitly on auto-reset notice (carve-out of #7404)
The auto-reset notice ("◐ Session automatically reset…") was being sent
with metadata=getattr(event, 'metadata', None), which can drop or
mis-route in Telegram forum topics: the event's metadata isn't
guaranteed to carry the originating thread_id, so the notice could leak
into General or another topic.
Use the existing self._thread_metadata_for_source(source) helper, which
already handles thread_id construction plus the Telegram DM topic
reply-fallback shape used everywhere else in the gateway.
Carve-out of #7404. The PR's other hunk (line 7578, queued first
response) is already redundant on main — gateway/run.py:15782 has used
_status_thread_metadata since the _thread_metadata_for_source plumbing
landed.
Closes #7355 (path B; paths A and C closed via prior salvage merges).
|
||
|
|
e164a9c1ed |
fix(stream-consumer): preserve thread routing on overflow first-send path
When the first streamed message exceeds the platform length limit and gets split into chunks, _send_new_chunk was called with self._message_id (which is None on first send), dropping thread routing entirely. Fallback to self._initial_reply_to_id so overflow chunks land in the correct topic/thread. Also fix a fragile test assertion that could be silently skipped. |
||
|
|
ff14666cdc |
fix(gateway): stream consumer first message drops thread context
Cherry-picked from PR #13077 commits: - 5500c7d8 fix(gateway): stream consumer first message drops thread context - e84403b9 test(gateway): add regression tests for stream consumer thread routing Fixes: Streaming first message drops thread/topic context in Feishu group topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and _send_new_chunk. Also fixes Feishu _send_raw_message fallback path (reply -> create) to use receive_id_type='thread_id' so the new message lands in the correct topic instead of the main channel. Authored by hrygo via PR #13077 (re-attributed from the bot-authored salvage commit on the original branch). |
||
|
|
6636fecd47
|
fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420)
The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was
copying the cumulative _already_sent flag into _final_response_sent on the
done frame. _already_sent goes True on any successful prior edit (tool
progress) or on fallback-mode promotion when an edit fails — neither
proves the *current* chunked send delivered the final answer.
When the chunked send actually fails (network error, flood control), the
consumer would wrongly claim 'final delivered' and the gateway's
independent fallback delivery in run.py would be suppressed. User saw
only tool-progress bubbles and never got the answer.
Now we track per-chunk success locally: _send_new_chunk returns the new
message_id on success or returns the passed-in reply_to unchanged on
failure. If at least one returned id differs, chunks_delivered = True;
otherwise stays False, gateway fallback runs.
Adds two regression tests:
- test_split_overflow_failed_send_does_not_mark_final_sent — primes
_already_sent=True, then makes every send fail; asserts
_final_response_sent stays False.
- test_split_overflow_partial_send_marks_final_sent — happy path,
asserts _final_response_sent goes True.
Note: the companion bug at the CancelledError handler (issue cited
lines 417-418) was already fixed by
|
||
|
|
a96dd54872 |
fix: deduplicate kanban notifications for blocked/gave_up states
The kanban notifier was re-firing the same blocked/gave_up/crashed/timed_out notifications on every 5-second tick. Root cause: after delivering a terminal event, the notifier unsubscribed the subscription, deleting its cursor. If the unsub failed (WAL contention, transient error), the subscription survived with a stale cursor, and the next tick would re-deliver the same event. Even when the unsub succeeded, the subscription was gone. If the task later transitioned to a different state (e.g., blocked -> unblocked -> blocked again), a new subscription would start at cursor=0, re-delivering all past events. Fix: stop unsubscribing on terminal event kinds. Only remove the subscription when the task reaches a truly final status (done/archived). For blocked, gave_up, crashed, and timed_out, the subscription stays alive and the cursor mechanism deduplicates naturally -- events with id <= last_event_id are never re-fetched. This makes the dedup idempotent and eliminates the re-fire bug. The old concern about subscriptions leaking forever on blocked tasks is moot: blocked tasks will eventually be unblocked (transitioning to ready/running) or archived, at which point the subscription is cleaned up. |
||
|
|
ec1fad3449 |
fix(gateway): align fallback delete with sibling style + add regression tests
Follow-up to HuangYuChuh's #17384 cherry-pick: - Use defensive getattr+logger.debug for delete_message lookup, mirroring the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms that don't implement delete_message no longer raise AttributeError; the failure path now logs at debug for diagnosability instead of silently swallowing. - Add three regression tests in tests/gateway/test_stream_consumer.py: - delete_message awaited on happy-path exit with stale id - delete_message NOT awaited when no fallback chunks reached the user - no crash on adapters that lack delete_message (spec-restricted mock) |
||
|
|
4eb8479ebd |
fix(gateway): delete partial message after fallback send on flood control
When Telegram flood control triggers 3+ consecutive edit failures, the stream consumer enters fallback mode and sends the complete response as a new message. This leaves the user seeing two messages: a frozen partial (with cursor) and the full duplicate. After the fallback chunks are sent successfully, delete the original partial message so the user only sees one complete response. The delete is best-effort — if it fails (e.g. flood still active, missing permissions), the full answer is still delivered. Fixes #16668 |
||
|
|
9c68d12079 |
test(kanban): cover send-exception rewind + drop noisy success log to debug
Two follow-up improvements to the previous commit's notifier dedup work. 1. Add a regression test for the send-exception rewind path. The contributor's PR included a test for the adapter-disconnect path (test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where adapter is None at delivery time), but not for the "adapter is connected, send() raises" path that fires inside the inner try/except at gateway/run.py:4314. The new test (test_kanban_notifier_rewinds_claim_on_send_exception) uses a FailingAdapter that always raises and confirms (a) send was actually attempted, (b) the claim was rewound, (c) the next call to unseen_events_for_sub still returns the event for retry. 2. Drop the per-delivery success log from INFO to DEBUG. A busy board on a multi-platform gateway can produce hundreds of these per day; that's gateway.log noise that obscures real warnings. Failure paths stay at WARNING (where you'd want to look when something's wrong) so we don't lose visibility into transient send issues. |
||
|
|
861ce7c0b6 | fix: dedupe kanban notifier delivery claims | ||
|
|
00ce5f04d9 |
feat(session): make /handoff actually transfer the session live
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.
State machine on sessions table replaces the boolean flag:
None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.
CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.
Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:
1. Resolves the platform's home channel.
2. Asks the adapter for a fresh thread via the new
create_handoff_thread(parent_chat_id, name) capability so the
handed-off conversation gets its own scrollback. Adapters that
don't support threads (or fail) return None and the watcher
falls back to the home channel directly.
3. Constructs a SessionSource keyed as 'thread' when a thread was
created, 'dm' otherwise, then session_store.switch_session
re-binds the destination key to the CLI session_id. The full
role-aware transcript replays via load_transcript on the next
turn (no flat-text injection into context_prompt).
4. Forges a synthetic MessageEvent(internal=True) with the handoff
notice and dispatches through _handle_message; the agent runs
against the loaded transcript and adapter.send delivers the
reply.
5. Marks the row 'completed' on success, 'failed' (+error) on any
exception.
Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:
- Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
so DM topics (Bot API 9.4+) and forum supergroups both work.
- Discord (gateway/platforms/discord.py): parent.create_thread on
text channels with a seed-message + message.create_thread
fallback for permission edge cases. Skips DMs and other
non-thread-capable parents.
- Slack (gateway/platforms/slack.py): posts a seed message and
returns its ts as the thread anchor — Slack threads are
message-anchored.
In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.
CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).
Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.
Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
|
||
|
|
878611a79d |
feat(session): add /handoff command for cross-platform session transfer
Adds /handoff <platform> CLI command that queues the current session for resume on the configured home channel of any messaging platform. CLI side: - /handoff telegram — marks session in shared DB, sends summary to the Telegram home channel via send_message - /handoff discord — same for Discord - Supports telegram, discord, slack, whatsapp, signal, matrix Gateway side: - On new session creation, checks for pending handoffs for the incoming message's platform - If found, loads the CLI session's full conversation history and injects it into the context prompt as a handoff transcript - Agent continues the conversation seamlessly Files: - hermes_state.py: handoff_pending, handoff_platform columns + helpers - cli.py: _handle_handoff_command dispatch + handler - hermes_cli/commands.py: CommandDef entry - gateway/run.py: handoff detection in _handle_message_with_agent - tests/hermes_cli/test_session_handoff.py: 8 tests |
||
|
|
a282434301
|
feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373)
* feat(gateway): per-platform admin/user split for slash commands Adds an opt-in two-list access control on top of the existing per-platform `allow_from` allowlists, scoped to slash commands only: - allow_admin_from — full slash command access - user_allowed_commands — what non-admins may run - group_allow_admin_from — same, group/channel scope - group_user_allowed_commands When `allow_admin_from` is unset for a scope, gating is disabled and every allowed user keeps full access (backward compat). Plain chat is unaffected. `/help` and `/whoami` are always reachable so users can see what they can run. Gate runs at the slash command dispatch site in gateway/run.py and uses `is_gateway_known_command()`, so it covers built-in AND plugin-registered commands through the live registry without per-feature wiring. Adds `/whoami` showing platform, scope, tier, and runnable commands. Salvage of PR #4443's permission tier work, scoped down. The full tier system, tool filtering, audit log, usage tracking, rate limiting, `/promote` flow, and persistent SQLite stores are not included here — those can be re-expanded later if needed. Co-authored-by: ReqX <mike@grossmann.at> * fix(gateway): close running-agent fast-path bypass + add coverage and central docs The slash command access gate was only applied at the cold dispatch site (line ~5921). When an agent was already running, the running-agent fast-path block (line ~5574) dispatched /restart, /stop, /new, /steer, /model, /approve, /deny, /agents, /background, /kanban, /goal, /yolo, /verbose, /footer, /help, /commands, /profile, /update directly without going through the gate — letting non-admins bypass gating just because an agent happens to be busy. Refactored the gate into _check_slash_access() and called from BOTH paths. /status remains intentionally pre-gate so users can always see session state. Also added 18 more dispatch tests covering: - Running-agent fast-path: blocks non-admin, allows admin, /status always works - Alias canonicalization (gate uses canonical name, not user alias) - Unknown / unregistered commands pass through (don't false-positive) - DM admin scope-locked when group has its own admin list - Multi-platform isolation (Discord gated, Telegram unrestricted) Docs: added Slash Command Access Control section to the central messaging index page + /whoami row in the chat commands table. Co-authored-by: ReqX <mike@grossmann.at> --------- Co-authored-by: ReqX <mike@grossmann.at> |