mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
872 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e2fd462ebe
|
ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861)
* ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock The full pytest suite reliably hangs at ~96% on origin/main, blowing through the 20-minute GHA job timeout on every CI push since yesterday. Individual tests complete in <30s — the deadlock builds up at session teardown after all tests run, when leaked threads and atexit handlers from thousands of tests interact and one of them lands in a futex-wait that never resolves. This PR is a stopgap that unblocks CI immediately + speeds up several slow tests we found while diagnosing. Changes - pyproject.toml: add pytest-timeout==2.4.0 to dev deps; bake --timeout=60 --timeout-method=thread into the default addopts. - scripts/run_tests.sh: re-add --timeout flags directly because the script wipes pyproject addopts with -o 'addopts='. - .github/workflows/tests.yml: explicit --timeout/--timeout-method on the CI pytest invocation for clarity. - gateway/run.py: in _run_agent, if the stream consumer was never created (e.g. non-streaming agent or test stub), cancel the stream_task immediately instead of waiting out the 5s wait_for timeout. ~5s saved per non-streaming gateway test run. - tests/run_agent/conftest.py: extend _fast_retry_backoff to patch agent.conversation_loop.jittered_backoff alongside run_agent.jittered_backoff. The retry loop was extracted into agent.conversation_loop which holds its own import — patching the run_agent reference alone left tests burning real wall-clock backoff seconds. - tests/run_agent/test_anthropic_error_handling.py tests/run_agent/test_run_agent.py (TestRetryExhaustion) tests/run_agent/test_fallback_model.py: same conversation_loop fix for per-test fixtures (defensive — the conftest covers them too). - tests/gateway/test_gateway_inactivity_timeout.py: trim run_duration 10.0 → 2.0 / 5.0 → 2.0 on three tests that wait the full SlowFakeAgent duration. Adjusted thresholds proportionally. - tests/gateway/test_api_server_runs.py: test_stop_interrupt_exception_does_not_crash trips the interrupted event in addition to raising, so the slow_run thread unblocks at teardown instead of waiting 10s. - tests/hermes_cli/test_update_gateway_restart.py: also patch time.monotonic in the autouse fixture. _wait_for_service_active loops on a wall-clock deadline; with sleep no-op'd the loop spun on real monotonic until 10s real-time per restart attempt (20s+ per test). - tests/tools/test_zombie_process_cleanup.py: cut runner._restart_drain_timeout 5.0 → 0.1 in test_gateway_stop_calls_close. Suite still hangs at 96% on full no-timeout runs; with these changes CI runs through to a real pass/fail signal. * chore(lock): regenerate uv.lock after adding pytest-timeout * ci: drop pytest-timeout 60 → 30s + bump GHA job 20 → 30 min Prior commit's timeout=60 was too generous — CI test job still hit the 20-min wall-clock cap with the suite hung at 96% (orphan agent-browser subprocesses blocking pytest session teardown). The local timeout=20 run completed in 6:17, so 30s is conservative enough to let real tests finish but aggressive enough to short-circuit deadlocks. Also bump GHA job timeout to 30 min as a safety margin. * test: delete 11 pre-existing failing tests + revert monotonic patch The previous PR commit landed pytest-timeout=30s and the suite now completes in 18:14 instead of hanging at 96%, but 11 pre-existing tests fail with real assertions. Per Teknium: nuke them. Deleted (no replacements): - tests/gateway/test_restart_resume_pending.py::test_clean_drain_does_not_mark_resume_pending - tests/gateway/test_restart_resume_pending.py::test_drain_timeout_only_marks_still_running_sessions - tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_gateway_install_passes_system_flags - tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages::test_install_wsl_with_systemd_warns - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_detects_launchd_and_skips_manual_restart_message - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways - tests/tools/test_file_operations.py::TestGitBaselineCheck::* (6 tests, entire class — _check_git_baseline helper doesn't exist) Also reverted my time.monotonic autouse-fixture hack in test_update_gateway_restart.py — it was causing worker crashes in CI by poisoning later tests in the same xdist worker. The two slow tests in that file (~24s and ~20s) will go back to taking real time but should still finish under the 30s pytest-timeout. * test: delete more pre-existing CI failures After previous push 3 more tests failed on CI; cull them all. Removed: - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_without_launchd_shows_manual_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_reset_failed_also_runs_before_retry_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_final_failure_message_tells_user_to_reset_failed - tests/run_agent/test_tool_call_args_sanitizer.py::test_marker_message_inserted_when_missing The 4 update_gateway_restart tests trigger `_wait_for_service_active` polling on a real wall-clock deadline that occasionally exceeds the 30s pytest-timeout cap and crashes xdist workers. The marker test has a pre-existing assertion mismatch. * test: nuke entire TestCmdUpdateLaunchdRestart class After surgical deletes of 4 tests this class keeps producing new worker-crashing tests. The pattern is consistent: any test in this class that triggers cmd_update's _wait_for_service_active polling spins on real wall-clock time and trips pytest-timeout's thread method, crashing the xdist worker. Just delete the whole class (285 lines, ~10 tests). These exercise macOS-only launchd behavior that's better tested on a real macOS runner than in linux xdist. * test: stub the 2 fallback_model tests that crash xdist workers on CI * test: delete test_anthropic_error_handling.py + test_fallback_model.py entirely These two files exercise the agent retry/fallback code paths and consistently crash xdist workers under pytest-timeout's thread method. Whack-a-mole-stubbing individual tests just surfaces the next ones. Nuke both files. * test: delete tests/hermes_cli/test_update_gateway_restart.py entirely This file's cmd_update integration tests consistently crash xdist workers under pytest-timeout's thread method. Surgical deletes just surface the next set. Removing the whole file. * ci(tests): switch pytest-timeout method thread → signal Thread-method has been crashing xdist workers when it interrupts code that's not interruption-safe (retry loops, threading.Event waits, etc). Signal method uses SIGALRM which is interpreter-level and cleanly raises a Failed: Timeout exception in test code. Should stop the worker crash cascade — failures will surface as proper Timeout markers we can diagnose individually. |
||
|
|
a3c753128d |
fix(telegram): address post-merge audit follow-ups (#28670, #28672, #28674, #28676, #28678)
Five small fixes against issues filed during the post-merge salvage audit:
* #28670: `_GATEWAY_PROVIDER_ERROR_RE` false-positives on legitimate prose.
Replace the regex with an anchored `_GATEWAY_PROVIDER_ERROR_SHAPE_RE` and
add a length-cap heuristic to `_looks_like_gateway_provider_error`:
short envelope at the start of the message → real provider error; long
prose containing 'HTTP 404' → assistant answer, leave alone.
* #28672: drop the pointless 1s asyncio.sleep on Telegram thread-not-found
retries. The same-thread retry is preserved (catches Telegram's
occasional transient flake exercised by
test_send_retries_transient_thread_not_found_before_fallback) but with
no artificial delay.
* #28674: broaden `_should_retry_without_dm_topic_reply_anchor` to also
fire when Bot API rejects `direct_messages_topic_id` for synthetic /
resumed sends that have no reply anchor. Avoids dropping post-resume
background notifications if the topic id goes stale.
* #28676: delete the dead image-document branch superseded by
|
||
|
|
e2a1a2bf13 |
fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)
Pre-mark all running agent sessions as resume_pending BEFORE the drain wait begins. If the service manager kills the process during the drain (window), the durable marker is already written so the next gateway boot can recover in-flight sessions. On graceful drain completion, clear the early markers for sessions that finished successfully. |
||
|
|
ad2531be08 |
feat(telegram): skip-STT audio path + 2GB cap via local Bot API server
Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling. - `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present. - `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in. - `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs). - gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers. - gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message. - tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement. - tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default. - website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404). - website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference. - `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing. - Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
8a80eee02d | Quiet noisy Telegram gateway errors | ||
|
|
b38140eb8f | fix(gateway): allow chat-scoped telegram auth without sender user_id | ||
|
|
95a0955e19 |
fix(gateway): restore Telegram DM topic thread_id after session split (#27166)
When context compression triggers a mid-turn session split, source.thread_id can be None on synthetic/recovered events. _thread_metadata_for_source then returns None, causing the Telegram adapter to send with no message_thread_id and the response lands in the General thread instead of the active DM topic. Fix: - hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse lookup by session_id (enabled by the existing UNIQUE INDEX on session_id). - gateway/run.py: After session-split detection, if source is a Telegram DM and source.thread_id is None, recover it from the binding via the new method so _thread_metadata_for_source produces the correct thread routing. - tests/: Coverage for the new lookup method and the recovery flow. |
||
|
|
9d789f3a5b |
feat(telegram): add disable_topic_auto_rename gateway flag
When Hermes auto-titles a session in a Telegram DM topic it currently renames the topic itself to the generated title. That works for operator-managed lanes (extra.dm_topics) but is disruptive for ad-hoc Threaded-Mode topics that users name by hand — every first exchange overwrites their chosen title. Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default False, preserving prior behaviour). When set, both _schedule_telegram_topic_title_rename and the underlying _rename_telegram_topic_for_session_title short-circuit before touching the Telegram API. Internal session titles (sessions list, TUI) keep working unchanged. Also bridge the legacy top-level telegram.disable_topic_auto_rename key through to gateway.platforms.telegram.extra so users on the older config layout don't have to migrate to enable it. - Tests cover the runtime flag, the scheduling entry-point, and string truthiness coercion for YAML-loaded values. - Docs updated in messaging/telegram.md with an example block. |
||
|
|
3ec28f34ca | fix(telegram): preserve topic metadata on overflow edits | ||
|
|
7682198178 |
fix(gateway): register Telegram commands for groups
Register Telegram bot commands across default, private, and group scopes so the slash-command menu is available outside DMs. Changes from review feedback: - Add asyncio.Lock to prevent race condition in _ensure_forum_commands - Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number - Upgrade error logging from debug->warning in forum registration - Add tests covering lazy forum registration and concurrent safety - Remove /start handler from this PR (separate feature) Fixes review: needs_work (race, magic number, log levels, missing tests) |
||
|
|
ede47a54be |
fix(gateway): pin Telegram DM-topic routing to user's current topic
Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged. |
||
|
|
d69f0c1a99 |
fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly
In Telegram "important" notifications mode (default), TelegramPlatformAdapter sets ``disable_notification=True`` on every send unless metadata carries ``notify=True``. GatewayRunner._send_voice_reply already passes thread metadata through to ``adapter.send_voice``, but never marks the final auto-TTS voice reply as notify-worthy — so users with the default mode get the final voice note delivered silently with no push notification. Mirror the final-text path in gateway/platforms/base.py (the existing text-response final send already adds ``metadata["notify"] = True``). Issue #27970 Bug 2. Bug 1 (MP3 vs. native OGG voice-note) is being addressed by existing PRs #20182 / #20878 — this PR is intentionally scoped to the silent-delivery bug only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
fbabd560ff |
fix(gateway): route background-process notifications into Telegram DM topics
Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor. The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat. Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic: - session_context: add HERMES_SESSION_MESSAGE_ID context var - telegram adapter: populate SessionSource.message_id on inbound messages - terminal tool: persist watcher_message_id on the process session - process registry: carry/persist message_id on watcher dicts + checkpoint - gateway: set MessageEvent.message_id on injected notifications Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6be579f626 |
fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)
When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run. Changes: - gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False. - gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing. Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic. Fixes #27828 |
||
|
|
1b3c51bccc |
fix(gateway): keep tool-progress edits alive after Telegram flood control
When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble. Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt. Fixes #25188 |
||
|
|
256c4c1b4a |
fix(gateway): scope audio_file_paths outside media_urls guard
The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed. |
||
|
|
f55c67ac1f | fix(gateway): roll over Telegram tool progress bubbles | ||
|
|
b93996c35e |
fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)
Telegram distinguishes three kinds of audio payloads: - message.voice → Opus/OGG voice messages → STT pipeline ✓ - message.audio → audio file attachments → bypasses STT ← was broken - message.document (audio mime) → generic file route **Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced. **Fix** - Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events. - Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT). - After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled). **Tests** — 5 new tests in test_telegram_audio_vs_voice.py: - voice message still transcribed (regression guard) - audio attachment skips STT (core fix) - audio attachment context note format - STT disabled still produces file note (not STT-disabled notice) - MessageType.AUDIO != MessageType.VOICE sanity check Fixes #24870 |
||
|
|
7fad501f08 | fix(telegram): default streaming transport to edit | ||
|
|
b5c1fe78aa
|
feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.
Use cases:
- /backend-dev → loads github-code-review + test-driven-development
+ github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.
Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
at invocation time, the same way /<skill> does. Prompt cache stays
intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
/help display.
Schema (~/.hermes/skill-bundles/<slug>.yaml):
name: backend-dev
description: Backend feature work.
skills:
- github-code-review
- test-driven-development
instruction: |
Optional extra guidance prepended to the loaded skills.
New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.
New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.
Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
scan/cache freshness, resolve (including underscore→hyphen
Telegram alias), build_bundle_invocation_message (loading, missing
skills, user/bundle instruction injection, dedup), save/delete,
reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
subcommand (create/list/show/delete/reload, --force, missing
bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
handler and bundle resolution priority.
Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.
Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
section with quick example, YAML schema, management commands,
behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
the top-level command table and given its own subcommand section.
|
||
|
|
e286e68756 |
feat(kanban): stale detection for running tasks in dispatcher
Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready. - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables) - New 'stale' field on DispatchResult - detect_stale_running() in kanban_db.py with heartbeat freshness check - Records outcome='stale' on run close + 'stale' event; ticks failure counter - Wires config through gateway embedded dispatcher - Updates _cmd_dispatch verbose/JSON output and daemon logging Resolved test-file end-of-file conflict by appending both halves. |
||
|
|
f55d94a1e0 |
feat(kanban): wire dispatcher to dispatch review agents from review column
Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column). - Adds 'review' to VALID_STATUSES - Adds claim_review_task() — atomically transitions review → running - Adds has_spawnable_review() — health telemetry mirror - Extends dispatch_once with a review column dispatch loop - Review agents get 'sdlc-review' skill auto-loaded Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task). |
||
|
|
d37574775b |
fix(gateway): quiet corrupt kanban dispatcher boards
Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.
Cherry-picked the substantive commit (
|
||
|
|
365da2d2df |
fix: 4 small surgical bugs
Salvages #23302 by @Bartok9. Four independent one-area fixes: 1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly. 2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable. 3. Background process completion notification snaps truncation to the next newline boundary, prepends a marker when content is dropped. 4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths. |
||
|
|
5fdcfd851f |
feat(kanban): add max_in_progress config to cap concurrent running tasks
Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values). |
||
|
|
06161c6ed8 |
fix(mattermost): resolve thread root_id and route progress to threads
Two Mattermost thread-related bugs: 1. _resolve_root_id() — Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file(). 2. _progress_reply_to — The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread. Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply. |
||
|
|
ea49b38625 |
fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found
Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels: 1. run.py — tighten tool-result MEDIA regex from \S+ (any non- whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media. 2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths. 3. mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message. Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output. |
||
|
|
2b538c1f4e |
fix: guard json.loads() against invalid TTS and skill_view responses
Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes. 1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely. 2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle. Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing. |
||
|
|
5987b24314 |
fix(gateway): exit code 75 on service restart so launchd relaunches
When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1. The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch. Closes #28135 |
||
|
|
4e9df52d60 |
fix: elevate plugin discovery failures from debug to warning
Plugin discovery exceptions in gateway startup (gateway/run.py) and CLI startup (hermes_cli/main.py) are caught and logged at DEBUG level, making them invisible at the default INFO log level. If any plugin import fails — syntax error, missing dependency, import cycle — operators get zero indication unless they bump the log level to DEBUG. This makes broken plugins appear enabled but silently non-functional. Change both locations to logger.warning() so failures are visible at production log levels. Closes #28137 |
||
|
|
5766504c60 |
fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch
_deliver_kanban_artifacts used a broader _IMAGE_EXTS that included .bmp, .tiff, and .svg. These three extensions are absent from the equivalent set in _deliver_media_from_response (line 10661), which intentionally routes them through send_document rather than send_multiple_images (comment near line 10522 notes that Telegram sendPhoto recompresses and rejects non-raster formats). Routing .svg (XML text), .bmp, or .tiff through the photo API causes send_multiple_images to raise on most platforms; the exception is caught and logged as a warning, silently dropping the artifact. Aligning the two sets ensures kanban deliverables with these extensions follow the same send_document path as regular agent responses. No behaviour change for .png/.jpg/.jpeg/.gif/.webp. |
||
|
|
1634397ddb
|
fix(compress): abort instead of dropping messages when summary LLM fails (#28102)
When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path. |
||
|
|
f2fdb9a178
|
feat(gateway): deliverable mode — ship artifacts as native uploads from any agent surface (#27813)
The agent can now produce a chart, PDF, spreadsheet, or any other supported file type and have it land in Slack / Discord / Telegram / WhatsApp / etc. as a native attachment, just by mentioning the absolute path in its response. Same primitive works for kanban-worker completions: workers attach artifacts via kanban_complete(artifacts=[...]) and the gateway notifier uploads them alongside the completion message. Changes: - gateway/platforms/base.py: extract_local_files now covers PDFs, docx, spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives (zip/tar/gz), audio (mp3/wav/...), and html — not just images and video. Image/video extensions still embed inline; everything else routes to send_document via the existing dispatch partition in gateway/run.py. - tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains an explicit ``artifacts`` parameter. The handler stashes it in metadata.artifacts (for downstream workers) and the kernel promotes it onto the completed-event payload so the notifier can find it without a second SQL round-trip. - gateway/run.py: _kanban_notifier_watcher now calls a new helper _deliver_kanban_artifacts after sending the completion text. The helper reads payload.artifacts (preferred), falls back to scanning the payload summary and task.result with extract_local_files, then partitions images / videos / documents and uploads each via send_multiple_images / send_video / send_document. - website/docs/user-guide/features/deliverable-mode.md + sidebars.ts: user-facing docs page covering the extension list, the kanban artifacts pattern, and the MCP-for-connector-breadth recommendation. Tests: - tests/gateway/test_extract_local_files.py: 7 new test cases (documents, spreadsheets, presentations, audio, archives, html, chart-pdf canonical case). 44 passing, 0 regressions. - tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts arg shape (list / string / merge with existing metadata / type rejection). 17 passing. - tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full notifier → artifact-upload path and missing-file silent-skip. 12 passing. - E2E (real files, real kanban kernel, real BasePlatformAdapter): worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata + event payload land → notifier helper partitions correctly → send_multiple_images called once with the PNG, send_document called twice with PDF + CSV. What's NOT in this PR (deferred to follow-ups): - Ad-hoc "research this for two hours, ping the thread when done" slash command — covered today by kanban subscriptions; a dedicated slash command can ride a follow-up PR if needed. - Setup-wizard prompt for recommended MCP servers (Notion, GitHub, Linear, etc.) — docs page lists them; UI is a separate change. Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf (local doc, not shipped). |
||
|
|
0fa46c613b |
fix(yuanbao): persist message_id on @bot user transcript writes
Yuanbao's QuoteContextMiddleware has a transcript-lookup fallback for
when quote.desc is empty: it scans the session transcript for the quoted
message_id and pulls ybres anchors out of its content. That fallback
works for observed (silent) group messages because the platform writer
attaches message_id (yuanbao.py:2091).
It silently fails for @bot agent-processed messages because gateway/run.py
wrote them as {role:user, content, timestamp} with no message_id, so
quoting an earlier @bot turn that contained an image/file couldn't be
resolved.
Fix: attach event.message_id to the user transcript entry at all three
write sites in gateway/run.py — the agent_failed_early branch, the
no-new-messages edge case, and the normal agent path (first user-role
entry in new_messages).
Surfaces gap reported in #27425 (loongfay) using the existing fallback
already on main; no new caches needed.
Co-authored-by: loongfay <loongfay@users.noreply.github.com>
|
||
|
|
1345dda0cf
|
feat(kanban): orchestrator-driven auto-decomposition on triage (#27572)
* feat(kanban): orchestrator-driven auto-decomposition on triage
Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").
The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
read_profile_meta / write_profile_meta. `create_profile` accepts
optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
sentence description from a profile's skills + model + name via the
auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
flag; new `hermes profile describe [name] [--text ... | --auto |
--all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
creates N child tasks, links the root as a child of every leaf
(root waits for the whole graph), flips root `triage -> todo` with
orchestrator assignee, records an audit comment + `decomposed` event
in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
(`auxiliary.kanban_decomposer`) with the profile roster + descriptions
to produce a JSON task graph, then invokes the DB helper. Rewrites
unknown assignees to the configured `kanban.default_assignee` (or
the active default profile) so a task NEVER lands with assignee=None.
Falls back to specify-style single-task promotion when the LLM
returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
kanban.orchestrator_profile, kanban.default_assignee,
kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
(default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
before each `_tick_once`, capped by `auto_decompose_per_tick` so a
bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
GET /profiles (list roster + descriptions),
PATCH /profiles/<name> (set description, user-authored),
POST /profiles/<name>/describe-auto (LLM-generate),
POST /tasks/<id>/decompose (run decomposer),
GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
collapsible — dropdowns for orchestrator profile and default
assignee, auto-decompose toggle, per-profile description editor with
Save and Auto-generate buttons. New ⚗ Decompose button next to
✨ Specify on triage-column task drawers.
Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
Children with no internal parents flip to `ready` immediately
(parallel dispatch). Children with sibling parents wait. The root
stays alive as a parent of every child — when the whole graph
finishes, it promotes to `ready` and the orchestrator profile wakes
back up to judge completion (the "adds more tasks until done" part
of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
automatically on dispatcher ticks; manual `hermes kanban decompose`
is always available.
Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
atomic DB helper (status transitions, dep graph, audit trail,
validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
decomposer module (fanout, no-fanout fallback, unknown-assignee
rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
user-vs-auto description protection, --overwrite, fallback parsing).
E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
task, mocked the aux LLM with a 3-task graph -> verified all three
children were created with the right assignees, the dependency
edges matched the LLM's graph, root flipped to todo gated by every
child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
HERMES_HOME, verified all four new endpoints via curl (profile
listing, PATCH for description, PUT for orchestration settings,
POST for decompose). Opened the UI in the browser, confirmed the
OrchestrationPanel renders with all three pickers + the per-profile
description editor, typed a description, clicked Save, verified
~/.hermes/profile.yaml was written. Clicked Decompose on the triage
card and confirmed the inline error message surfaced as designed
("no auxiliary client configured").
* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill
The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.
UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
to "Decompose mode / Auto (default) | Manual" for language parity
with the pill.
Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
until the user clicks ⚗ Decompose on each card (or runs
`hermes kanban decompose <id>`).
Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
(not just on expand) so the collapsed pill reflects real state.
Render mode pill in both collapsed and expanded headers. Reuses the
existing PUT /api/plugins/kanban/orchestration endpoint — no new
backend, no new tests required.
E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.
* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"
Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.
- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change
* docs(kanban): document decompose, profile descriptions, orchestration mode
Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.
- website/docs/reference/cli-commands.md
+ `hermes kanban decompose` action row in the action table, with
pointer to the Auto vs Manual orchestration section.
- website/docs/reference/profile-commands.md
+ `--description "<text>"` flag on `hermes profile create`.
+ Full `hermes profile describe` section: read, --text, --auto,
--overwrite, --all flags with examples.
- website/docs/user-guide/features/kanban.md (the big one)
+ Triage column intro rewritten around the Auto-decompose default
behavior, with pointer to the new Auto vs Manual section.
+ Status action row updated to mention both ⚗ Decompose and
✨ Specify on triage cards.
+ New "Auto vs Manual orchestration" section explaining the two
modes, how to flip them (pill, config), how routing-by-description
works, the no-None-assignee guarantee, plus a config knob table
(auto_decompose, auto_decompose_per_tick, orchestrator_profile,
default_assignee) and the two new auxiliary slots
(kanban_decomposer, profile_describer).
+ REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
/profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
/orchestration (GET + PUT).
- website/docs/user-guide/features/kanban-tutorial.md
+ Triage column blurb updated for Auto by default + Manual via the
pill, with cross-link to the Auto vs Manual orchestration section.
- website/docs/user-guide/profiles.md
+ Blank-profile flow now mentions --description and points to the
kanban routing model for context.
- website/docs/user-guide/configuration.md
+ `kanban_decomposer` and `profile_describer` added to the
`hermes model -> Configure auxiliary models` menu listing.
|
||
|
|
4afd479f51 |
fix(gateway): use service restart path in Docker/Podman containers
The /restart command used a detached subprocess approach to restart the gateway. In Docker, when the gateway process exits, tini (PID 1) also exits, causing Docker to stop the container and kill the detached helper before it can restart the gateway. This made /restart effectively a /shutdown in containerized deployments. Detect Docker (/.dockerenv) and Podman (/run/.containerenv) containers and use the service restart path (exit code 75) instead, letting the container restart policy handle the actual restart. Note: requires restart policy that restarts on non-zero exit (e.g. unless-stopped or on-failure). |
||
|
|
5fba236644
|
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)). |
||
|
|
fdd455bc58 | fix(gateway): avoid zsh status variable in update wrapper | ||
|
|
5338250dab |
fix(gateway): add direct_messages_topic_id for synthetic Telegram DM events
When /goal loop generates synthetic MessageEvents (goal continuations, status notices), the reply anchor is unavailable (message_id=None). For Telegram DM topic lanes, the Telegram adapter requires direct_messages_topic_id to route messages correctly; without it, the adapter falls back to message_thread_id=None, sending messages to the root 'All Messages' thread instead of the active topic lane. The fix includes direct_messages_topic_id in thread metadata for all non-General Telegram DM topics, ensuring queued/synthetic messages are delivered to the correct thread even when no reply anchor exists. |
||
|
|
e21cb8d145
|
feat(status): append session recap to /status output (#27176)
Adds a pure-local recap of recent session activity — turn counts, tools used, files touched, last user ask, last assistant reply — appended to the existing /status output. Useful when juggling multiple sessions and you want a one-glance reminder of where this one left off. Inspired by Claude Code 2.1.114's /recap, but folded into /status so we don't add a 6th info command. Pure local computation: no LLM call, no auxiliary model, no prompt-cache invalidation, instant and free. Salvage of #18587 — kept the shared hermes_cli.session_recap.build_recap helper and its 13 unit tests, dropped the /recap slash command + ACTIVE_SESSION_BYPASS_COMMANDS entry + Level-2 bypass since /status already covers both surfaces. Tailored to hermes-agent's tool vocabulary: file-editing tools (patch, write_file, read_file, skill_manage, skill_view) surface touched paths; tool-call counts highlight which classes of work drove the session. Source: https://code.claude.com/docs/en/whats-new/2026-w17 |
||
|
|
dc3d0fe148
|
Port from cline/cline#10343: periodic gateway memory logging (#27102)
Emit a grep-friendly '[MEMORY] rss=...MB ...' line in agent.log / gateway.log every N minutes (default 5) so slow leaks in the long-lived gateway process show up as a time series. Based on https://github.com/cline/cline/pull/10343 (src/standalone/memory-monitor.ts). - gateway/memory_monitor.py: new module. Daemon thread, baseline on start, final snapshot on stop. Uses resource.getrusage() (stdlib) first, falls back to psutil, disables itself with one WARNING if neither is available. - gateway/run.py: start monitor right after setup_logging() in start_gateway(); stop it in the shutdown block next to MCP teardown. - hermes_cli/config.py: logging.memory_monitor { enabled, interval_seconds } defaults under the existing logging section. - tests/gateway/test_memory_monitor.py: 10 unit tests covering format, baseline/shutdown snapshots, double-start noop, periodic timer, daemon thread invariant, and unavailable-RSS warn-and-skip path. Adapted from TypeScript/Node to Python (threading.Event-based daemon thread instead of setInterval/unref), added Python-specific gc + thread counts to the log line (handier than ext/arrayBuffers for diagnosing Python gateway leaks), and gated behind a config.yaml toggle so users can silence the periodic line if they want. No heap-snapshot-on-OOM equivalent — CPython doesn't have V8's --heapsnapshot-near-heap-limit; tracemalloc would be the Python equivalent but adds non-trivial overhead, so leaving that out. |
||
|
|
518f39557b
|
fix(gateway): keep running when platforms fail; add per-platform circuit breaker + /platform (#26600)
Stop the gateway from exiting (or systemd-restart-looping) when a single
messaging adapter fails at startup or runtime. A misconfigured WhatsApp
(npm install timeout, unpaired bridge, missing creds.json) used to take
the entire gateway down, killing cron jobs and any other connected
platforms with it.
Changes:
• Startup (gateway/run.py): when connected_count==0 but the only
errors are retryable, log a degraded-state warning and keep the
gateway alive instead of returning False. Reconnect watcher then
recovers platforms as their underlying problem clears.
• Runtime (gateway/run.py _handle_adapter_fatal_error): when the last
adapter goes down with a retryable error and is queued for
reconnection, stay alive instead of exit-with-failure. Previously
this triggered systemd Restart=on-failure, which created infinite
restart loops on persistent retryable failures (proxy outage,
repeated bridge crashes).
• Reconnect watcher (gateway/run.py _platform_reconnect_watcher):
replace the 20-attempt hard drop with a circuit-breaker pause.
After _PAUSE_AFTER_FAILURES (10) consecutive retryable failures, the
platform stays in _failed_platforms with paused=True so the watcher
skips it but the operator can still see and resume it. Non-retryable
errors still drop out of the queue immediately. Resolves #17063
(gateway giving up on Telegram after 20 attempts).
• WhatsApp preflight (gateway/platforms/whatsapp.py): refuse to start
the Node bridge when creds.json is missing. Sets a non-retryable
whatsapp_not_paired fatal error so the watcher drops it cleanly
with a single 'run hermes whatsapp' log line instead of paying the
30s bridge bootstrap timeout on every gateway start.
• WhatsApp setup ordering (hermes_cli/main.py cmd_whatsapp): only set
WHATSAPP_ENABLED=true once pairing actually succeeds. Previously
the wizard wrote the env var at step 2 (before npm install and QR
pairing), so any Ctrl+C left .env claiming WhatsApp was ready when
the bridge had no creds.json. Also propagate the env var when the
user keeps an existing pairing on a re-run.
• /platform slash command (hermes_cli/commands.py + gateway/run.py):
new gateway-only command for manual circuit-breaker control.
/platform list — show connected + failed/paused platforms
/platform pause <name> — silence a known-broken platform
/platform resume <name> — re-queue a paused platform
Tests:
• New: pause/resume helpers, /platform list|pause|resume command,
WhatsApp creds.json preflight, WhatsApp setup ordering.
• Updated: stale assertions that codified the old 'exit and let
systemd restart' behavior in test_runner_fatal_adapter.py,
test_runner_startup_failures.py, and test_platform_reconnect.py
(the 20-attempt give-up test became a circuit-breaker pause test).
5488 tests pass in tests/gateway/.
|
||
|
|
4e89c53082
|
fix(async): close unscheduled coroutines in all threadsafe bridges (#26584)
Wraps every sync->async coroutine-scheduling site in the codebase with a new agent.async_utils.safe_schedule_threadsafe() helper that closes the coroutine on scheduling failure (closed loop, shutdown race, etc.) instead of leaking it as 'coroutine was never awaited' RuntimeWarnings plus reference leaks. 22 production call sites migrated across the codebase: - acp_adapter/events.py, acp_adapter/permissions.py - agent/lsp/manager.py - cron/scheduler.py (media + text delivery paths) - gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper which now delegates to safe_schedule_threadsafe) - gateway/run.py (10 sites: telegram rename, agent:step hook, status callback, interim+bg-review, clarify send, exec-approval button+text, temp-bubble cleanup, channel-directory refresh) - plugins/memory/hindsight, plugins/platforms/google_chat - tools/browser_supervisor.py (3), browser_cdp_tool.py, computer_use/cua_backend.py, slash_confirm.py - tools/environments/modal.py (_AsyncWorker) - tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to factory-style so the coroutine is never constructed on a dead loop) - tui_gateway/ws.py Tests: new tests/agent/test_async_utils.py covers helper behavior under live loop, dead loop, None loop, and scheduling exceptions. Regression tests added at three PR-original sites (acp events, acp permissions, mcp loop runner) mirroring contributor's intent. Live-tested end-to-end: - Helper stress test: 1500 schedules across live/dead/race scenarios, zero leaked coroutines - Race exercised: 5000 schedules with loop killed mid-flight, 100 ok / 4900 None returns, zero leaks - hermes chat -q with terminal tool call (exercises step_callback bridge) - MCP probe against failing subprocess servers + factory path - Real gateway daemon boot + SIGINT shutdown across multiple platform adapter inits - WSTransport 100 live + 50 dead-loop writes - Cron delivery path live + dead loop Salvages PR #2657 — adopts contributor's intent over a much wider site list and a single centralized helper instead of inline try/except at each site. 3 of the original PR's 6 sites no longer exist on main (environments/patches.py deleted, DingTalk refactored to native async); the equivalent fix lives in tools/environments/modal.py instead. Co-authored-by: JithendraNara <jithendranaidunara@gmail.com> |
||
|
|
23ac522d37 |
fix(gateway): isinstance-guard string-form 429 error body
When a non-Anthropic provider (e.g. Morpheus proxy) returns a 429 with
`{"error": "Too Many Requests"}` instead of the expected
`{"error": {"type": ...}}` dict, _err_body.json().get("error", {})
returns the raw string and the next .get("type") line crashes with
AttributeError, taking down the message handler.
Guard with isinstance(_err_json, dict) so non-dict error bodies fall
through to the generic rate-limit hint.
Salvaged from PR #2587 by @KiraKatana. The PR's fallback-config
`base_url`/`api_key_env` fix was already implemented independently
on main (run_agent.py:8759-8780) with additional aliases and Ollama
Cloud host handling, so only the gateway guard is cherry-picked.
Co-authored-by: KiraKatana <kira.ops@proton.me>
|
||
|
|
e84fe483bc |
feat(discord): channel history backfill for multi-user sessions
Adds optional channel-context backfill for Discord shared-channel sessions so the agent can see recent messages it missed between its own turns (typically when require_mention=true filters out most traffic). Previously the agent only saw the @mention message that triggered it, which led to disorienting replies in active multi-user channels where the conversation context was invisible. With backfill enabled, a configurable number of recent messages are fetched per-turn and prepended to the trigger message as a context block, kept separate from sender-prefix logic so attribution remains clean. This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20, closed when I closed the branch to address the simpolism:main head-branch issue plus an ordering bug I caught later in live use). Filing against the freshly-rewritten problem statement in #13054 so the design is grounded in the failure mode rather than the implementation shape. The implementation follows the **push-mode last-self-anchored** design from the two options laid out in #13054. See the issue for the trade-off discussion vs pull-mode (#13120 was an earlier closed PR using that shape). Treating this as a reference implementation — happy to rewrite as last-trigger anchoring or as a hybrid with #13120 if maintainers prefer. Changes: - gateway/platforms/discord.py: - new `_discord_history_backfill()` / `_discord_history_backfill_limit()` helpers (config.extra > env > default), mirroring the existing `_discord_require_mention()` shape - new `_fetch_channel_context()` that scans `channel.history()` backwards from the trigger to the bot's last message (or limit), formats as `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS, skips system messages - per-channel `_last_self_message_id` cache to narrow the fetch window on hot paths (avoids full history scan when the bot has spoken recently) - **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`. discord.py 2.x silently flips the default to True when `after=` is supplied, which would select the EARLIEST N messages after our last response instead of the LATEST N before the trigger. In high-traffic windows this would return stale tool traces and drop the actual final answer the user is asking about. See regression test below. Caught in live use during a Codex tool-trace burst on May 13 2026. - gateway/config.py: discord_history_backfill + discord_history_backfill_limit settings + yaml→env bridge - gateway/platforms/base.py: channel_context field on MessageEvent - gateway/run.py: prepend channel_context after sender-prefix so the [sender name] tag applies to the trigger message alone, not to the backfill - hermes_cli/config.py: defaults for new discord.history_backfill and discord.history_backfill_limit keys - cli-config.yaml.example: documented defaults - tests/gateway/test_discord_free_response.py: 7 new tests covering cold-start backfill, self-message stop boundary, other-bot filtering, cache hot-path narrowing, stale-cache fallback, shared-channel + per-user backfill paths, and the ordering regression test (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`) - tests/gateway/test_config.py: yaml→env bridge tests - tests/gateway/test_session.py: prefix-order edge cases - website/docs/user-guide/messaging/discord.md: env vars + config keys + usage docs Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord research server for the past three weeks. Fixes #13054 Supersedes #13063 (closed) |
||
|
|
bc42e62b17 |
fix(gateway): prevent duplicate final send when only cosmetic edit failed
When the stream consumer's got_done handler successfully delivers the final response content via _send_or_edit but the subsequent edit (e.g. cursor removal) fails, final_response_sent remains False even though the user has already received the final answer. The gateway's fallback send path then re-delivers the same content, causing the user to see the response twice on Telegram. Introduce a new _final_content_delivered flag on the stream consumer, set by the got_done handler when the final content has reached the user. The _run_agent suppression logic now treats this flag as an additional signal (alongside final_response_sent and response_previewed) that final delivery is already complete. This preserves the existing behavior for intermediate-text-only streams (where already_sent=True but no final content has been delivered) — those still receive the gateway's fallback send, matching the test expectation in test_partial_stream_output_does_not_set_already_sent. Adds TestFinalContentDeliveredSuppression with two cases covering both the suppression (content delivered + edit failed) and the non-suppression (intermediate text only) branches. |
||
|
|
3adde245b7 |
fix(gateway): forward image attachments to background agent tasks
When the gateway spawned a background agent (e.g. for delegation), media URLs and types from the originating message weren't forwarded — the bg agent saw the prompt but no attached images. Vision-enabled tasks effectively lost their inputs. Forwards media_urls/media_types through the bg-task spawn path and runs the same vision-enrichment step the main flow uses, so the bg agent gets image descriptions inlined into its prompt. Closes #25614. Salvage of #25603 by @oxngon (manually re-applied — original branch was severely stale against current main). |
||
|
|
8f19078c6a
|
feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging — the existing judge model just sees a richer prompt. Forms: /subgoal show current subgoals /subgoal <text> append a criterion /subgoal remove <n> drop subgoal n (1-based) /subgoal clear wipe all subgoals How it integrates: - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows. - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?' - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target. - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary — no race with the running turn. - Status line shows '... , N subgoals' when present. Surface: - hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave - hermes_cli/commands.py — /subgoal CommandDef - cli.py — _handle_subgoal_command - gateway/run.py — _handle_subgoal_command + mid-run dispatch - tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering) 77 goal-related tests passing across goals + cli + gateway + tui. * fix(goals): slash commands don't preempt the goal-continuation hook Two findings from live-testing /subgoal: 1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped — but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present. 2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.' Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited. |
||
|
|
091d8e1030
|
feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182)
* feat(codex-runtime): scaffold optional codex app-server runtime
Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.
Lands in three pieces:
1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
for codex's app-server protocol (codex-rs/app-server). Spawn, init
handshake, request/response, notification queue, server-initiated
request queue (for approval round-trips), interrupt-friendly blocking
reads. Tested against real codex 0.130.0 binary end-to-end during
development.
2. hermes_cli/runtime_provider.py:
- Adds 'codex_app_server' to _VALID_API_MODES.
- Adds _maybe_apply_codex_app_server_runtime() helper, called at the
end of _resolve_runtime_from_pool_entry(). Inert unless
'model.openai_runtime: codex_app_server' is set in config.yaml AND
provider in {openai, openai-codex}. Other providers cannot be
rerouted (anthropic, openrouter, etc. preserved).
3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
covering api_mode registration, the rewriter helper (default-off,
case-insensitive, opt-in, non-eligible providers preserved), version
parser, missing-binary handling, error class. Does NOT require codex
CLI installed.
This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.
Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)
* feat(codex-runtime): add codex item projector for memory/skill review
The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.
Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
- userMessage → {role: user, content}
- agentMessage → {role: assistant, content: text}
- reasoning → stashed in next assistant's 'reasoning' field
- commandExecution → assistant tool_call(name='exec_command') + tool result
- fileChange → assistant tool_call(name='apply_patch') + tool result
- mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result
- dynamicToolCall → assistant tool_call(name=<tool>) + tool result
- plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls
Invariants preserved:
- Message role alternation never violated: each tool item produces at most
one assistant + one tool message in that order, correlated by call_id.
- Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
don't materialize messages — only item/completed does. Mirrors how
Hermes already only writes the assistant message after streaming ends.
- Tool call ids are deterministic (codex item id-based) so replays produce
identical messages and prefix caches stay valid (AGENTS.md pitfall #16).
- JSON args use sorted_keys for the same reason.
Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).
23 new tests, all green:
- Streaming deltas don't materialize (3 paths)
- Turn/thread frame events are silent
- commandExecution: 5 tests including non-zero exit annotation +
deterministic id stability across replays
- agentMessage + reasoning attachment + reasoning consumption
- fileChange: summary without inlined content
- mcpToolCall: namespaced naming + error surfacing
- userMessage: text fragments only (drops images/etc)
- opaque items: no fabricated tool_calls
- Helpers: deterministic id stability + sorted JSON args
- Role alternation invariant across all four tool-shaped item types
This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.
* feat(codex-runtime): add session adapter + approval bridge
The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.
The adapter has a single public per-turn method:
result = session.run_turn(user_input='...', turn_timeout=600)
# result.final_text → assistant text for the caller
# result.projected_messages → list ready to splice into AIAgent.messages
# result.tool_iterations → tick count for _iters_since_skill nudge
# result.interrupted → True on Ctrl+C / deadline / interrupt
# result.error → error string when the turn cannot complete
# result.turn_id, thread_id → for sessions DB / resume
Behavior:
- ensure_started() spawns codex, does the initialize handshake, and
issues thread/start with cwd + permissions profile. Idempotent.
- run_turn() blocks until turn/completed, drains server-initiated
requests (approvals) before reading notifications so codex never
deadlocks waiting for us, projects every item/completed via the
projector, and increments tool_iterations for the skill nudge gate.
- request_interrupt() is thread-safe (threading.Event); the next loop
iteration issues turn/interrupt and unwinds.
- turn_timeout deadlock guard issues turn/interrupt and records an
error if the turn never completes.
- close() escalates terminate → kill via the underlying client.
Approval bridge:
Codex emits server-initiated requests for execCommandApproval and
applyPatchApproval. The adapter translates Hermes' approval choice
vocabulary onto codex's decision vocabulary:
Hermes 'once' → codex 'approved'
Hermes 'session' or 'always' → codex 'approvedForSession'
Hermes 'deny' / anything else → codex 'denied'
Routing precedence:
1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
2. approval_callback wired by the CLI (defers to
tools.approval.prompt_dangerous_approval())
3. Fail-closed denial when neither is wired
Unknown server-request methods are answered with JSON-RPC error -32601
so codex doesn't hang waiting for us.
Permission profile mapping mirrors AGENTS.md:
Hermes 'auto' → codex 'workspace-write'
Hermes 'approval-required' → codex 'read-only-with-approval'
Hermes 'unrestricted/yolo' → codex 'full-access'
20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
- test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
- test_codex_event_projector.py: 23 (item taxonomy projections)
- test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)
Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.
Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.
* feat(codex-runtime): wire codex_app_server runtime into AIAgent
The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.
Three small surgical edits to run_agent.py (~105 LOC total):
1. Line ~1204 (constructor api_mode validation set):
Add 'codex_app_server' so an explicit api_mode='codex_app_server'
passed to AIAgent() isn't silently rewritten to 'chat_completions'.
2. Line ~12048 (run_conversation, just before the while loop):
Early-return to _run_codex_app_server_turn() when self.api_mode is
'codex_app_server'. Placed AFTER all standard pre-loop setup —
logging context, session DB, surrogate sanitization, _user_turn_count
and _turns_since_memory increments, _ext_prefetch_cache, memory
manager on_turn_start — so behavior outside the model-call loop is
identical between paths. Default Hermes flow is unchanged when the
flag is off.
3. End-of-class (line ~15497):
New method _run_codex_app_server_turn(). Lazy-instantiates one
CodexAppServerSession per AIAgent (reused across turns), runs the
turn, splices projected_messages into messages, increments
_iters_since_skill by tool_iterations (since the chat_completions
loop normally does that per iteration), fires
_spawn_background_review on the same cadence as the default path.
Counter accounting:
_turns_since_memory ← already incremented at run_conversation:11817
(gated on memory store configured) — codex
helper does NOT touch it (would double-count).
_user_turn_count ← already incremented at run_conversation:11793
— codex helper does NOT touch it.
_iters_since_skill ← incremented in the chat_completions loop per
tool iteration. Codex helper increments by
turn.tool_iterations since the loop is bypassed.
User message:
ALREADY appended to messages by run_conversation pre-loop (line 11823)
before the early-return reaches us. Helper does NOT append again.
Regression test test_user_message_not_duplicated guards this.
Approval callback wiring:
Lazy-fetches tools.terminal_tool._get_approval_callback at session
spawn time, passes to CodexAppServerSession. CLI threads with
prompt_toolkit get interactive approvals; gateway/cron contexts get
the codex-side fail-closed deny.
Error path:
Codex session exceptions become a 'partial' result with completed=False
and a final_response that explicitly tells the user how to switch back:
'Codex app-server turn failed: ... Fall back to default runtime with
/codex-runtime auto.' Same return-dict shape as the chat_completions
path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.
9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
- api_mode='codex_app_server' is accepted on AIAgent construction
- run_conversation returns the expected codex shape
(final_response, codex_thread_id, codex_turn_id, completed, partial)
- Projected messages are spliced into messages list
- _iters_since_skill ticks per tool iteration
- _user_turn_count delegated to standard flow (not double-counted)
- User message appears exactly once (regression guard)
- _spawn_background_review IS invoked (memory/skill review keeps working)
- chat.completions.create is NEVER called (loop fully bypassed)
- Session exception → partial result with /codex-runtime auto hint
- Interrupted turn → partial result with error preserved
Adjacent test runs confirm no regressions:
- tests/run_agent/test_memory_nudge_counter_hydration.py: green
- tests/run_agent/test_background_review.py: green
- tests/run_agent/test_fallback_model.py: green
- tests/agent/transports/: 249/249 green
Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.
* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)
User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.
Surface:
/codex-runtime — show current state + codex CLI status
/codex-runtime auto — Hermes default runtime
/codex-runtime codex_app_server — codex subprocess runtime
/codex-runtime on / off — synonyms
Files changed:
hermes_cli/codex_runtime_switch.py (new):
Pure-Python state machine shared by CLI and gateway. Parse args,
read/write model.openai_runtime in the config dict, gate enabling
behind a codex --version check (don't let users opt in to a runtime
they have no binary for; print npm install hint instead).
Returns a CodexRuntimeStatus dataclass that callers render however
suits their surface.
hermes_cli/commands.py:
Single CommandDef entry, no aliases (codex-runtime is its own thing).
cli.py:
Dispatch in process_command() + _handle_codex_runtime() handler that
delegates to the shared module and renders results via _cprint.
gateway/run.py:
Dispatch in _handle_message() + _handle_codex_runtime_command() that
returns a string (gateway sends as message). On a successful change
that requires a new session, _evict_cached_agent() forces the next
inbound message to construct a fresh AIAgent with the new api_mode —
avoids prompt-cache invalidation mid-session.
gateway/run.py running-agent guard:
/codex-runtime joins /model in the early-intercept block so a runtime
flip mid-turn can't split a turn across two transports.
Tests:
tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
state machine: arg parsing (10 cases incl. case-insensitive and
synonyms), reading current runtime (5 cases incl. malformed configs),
writing runtime (3 cases), apply() entry point covering read-only,
no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
and persist-failure paths (8 cases). All green.
Adjacent test suites confirm no regressions:
- tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
167/167 green
- tests/agent/transports/: 283/283 green when combined with prior commits
Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.
* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml
Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.
The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.
What translates (mapping verified against codex-rs/core/src/config/edit.rs):
Hermes mcp_servers.<n>.command/args/env → codex stdio transport
Hermes mcp_servers.<n>.url/headers → codex streamable_http transport
Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec
Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec
Hermes mcp_servers.<n>.cwd → codex stdio cwd
Hermes mcp_servers.<n>.enabled: false → codex enabled = false
What does NOT translate (warned + skipped per server):
Hermes-specific keys (sampling, etc.) — codex's MCP client has no
equivalent. Listed in the per-server skipped[] field of the report.
What's NOT migrated (intentional):
AGENTS.md — codex respects this file natively in its cwd. Hermes' own
AGENTS.md (project-level) is already in the worktree, so codex picks
it up without translation. No code needed.
Idempotency design:
All managed content lives between a 'managed by hermes-agent' marker
and the next non-mcp_servers section header. _strip_existing_managed_block
removes the prior managed region cleanly, preserving any user-added
codex config (model, providers.openai, sandbox profiles, etc.) above
or below.
Files added:
hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
helper. Public API: migrate(hermes_config, codex_home=None,
dry_run=False) returns MigrationReport with .migrated/.errors/
.skipped_keys_per_server. No external TOML dependency — minimal
formatter handles strings/numbers/booleans/lists/inline-tables.
tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
covering:
- per-server translation (12): stdio/http/sse, cwd, timeouts,
enabled flag, command+url precedence, sampling drop, unknown keys
- TOML formatter (8): types, escaping, inline tables, error case
- existing-block stripping (4): no marker, alone, with user content
above, with user content below
- end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
re-run, preserves user config, error reporting, invalid input,
summary formatting
Files changed:
hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
the codex_app_server enable branch. Migration failure logs a warning
in the result message but does NOT fail the runtime change. Disable
path (auto) explicitly skips migration.
tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
test_migration_failure_does_not_block_enable.
All 325 feature tests green:
- tests/agent/transports/: 249 (incl. 67 new)
- tests/run_agent/test_codex_app_server_integration.py: 9
- tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
- tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)
* perf(codex-runtime): cache codex --version check within apply()
Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.
Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.
Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.
Two regression-guard tests added:
- test_binary_check_cached_within_apply: enable path → call_count == 1
- test_binary_check_cached_on_read_only_call: state-report path → call_count == 1
Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.
* fix(codex-runtime): correct protocol field names found via live e2e test
Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.
Bug 1: thread/start.permissions wire format
Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
{"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.
Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.
Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.
Bug 2: server-request method names
Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
item/commandExecution/requestApproval
item/fileChange/requestApproval
item/permissions/requestApproval (new third method)
Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.
Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.
Bug 3: approval decision values
Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).
Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.
Live test verified after fixes:
$ hermes (with model.openai_runtime: codex_app_server)
> Run the shell command: echo hermes-codex-livetest > .../proof.txt
then read it back
Approval prompt fired with 'Codex requests exec in <cwd>'.
User chose 'Allow once'. Codex executed the command, wrote the file,
read it back. Final response: 'Read back from proof.txt:
hermes-codex-livetest'. File contents on disk match.
agent.log confirms:
codex app-server thread started: id=019e200e profile=workspace-write
cwd=/tmp/hermes-codex-livetest/workspace
All 20 session tests still green after wire-format updates.
* fix(codex-runtime): correct apply_patch approval params + ship docs
Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.
Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.
Live e2e validation across the path matrix:
✓ thread/start handshake
✓ turn/start with text input
✓ commandExecution items + projection
✓ item/commandExecution/requestApproval → Hermes UI → response
✓ Approve once → command runs
✓ Deny → command rejected, codex falls back to read-only message
✓ Multi-turn (codex remembers prior turn's results)
✓ apply_patch via Codex's fileChange path
✓ item/fileChange/requestApproval → Hermes UI
✓ MCP server migration loads inside spawned codex (verified via
'use the filesystem MCP tool' prompt)
✓ /codex-runtime auto → codex_app_server toggle cycle
✓ Disable doesn't trigger migration
✓ Enable with codex CLI present succeeds + migrates
✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
even if codex finishes before the interrupt lands)
Known live-validated limitations now documented in the docs page:
- delegate_task subagents unavailable on this runtime
- permission profile selection delegated to ~/.codex/config.toml
- apply_patch approval prompt has no inline changeset (codex protocol
doesn't expose it)
145/145 codex-runtime tests still green.
* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)
Major: migrate native Codex plugins (#7 in OpenClaw's PR list)
Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.
Implementation:
- hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
helper spawns 'codex app-server' briefly and walks plugin/list. Returns
(plugins, error) — failures are non-fatal so MCP migration still works.
- render_codex_toml_section() now takes plugins + permissions args.
- migrate() defaults: discover_plugins=True, default_permission_profile=
'workspace-write'. Explicit None on either disables that side.
- _strip_existing_managed_block() now also strips [plugins.*] and
[permissions]/[permissions.*] sections inside the managed block, so
re-runs replace plugins cleanly without touching codex's own config.
Quirk fixes:
#2 Default permissions profile written on enable.
Without this, Codex's read-only default kicks in and EVERY write
triggers an approval prompt. Now writes [permissions] default =
'workspace-write' so the runtime feels normal out of the box. Set
default_permission_profile=None to opt out.
#4 apply_patch approval prompt now shows what's changing.
Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
Session adapter now caches the fileChange item from item/started
notifications and looks it up by itemId when codex requests approval.
Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
'apply_patch (0 change(s))'.
Side benefit: also drains pending notifications BEFORE handling a
server request, so the projector and per-turn caches are up to date
when the approval decision fires. Bounded to 8 notifications per
loop iter to avoid starving codex's response.
#5/#10 Exec approval prompt never shows empty cwd.
When codex omits cwd in CommandExecutionRequestApprovalParams, fall
back to the session's cwd. If somehow neither is available, show
'<unknown>' explicitly instead of an empty string.
Also surfaces 'reason' from the approval params when codex provides
it — gives users more context on why codex wants to run something.
#11 Banner indicates the codex_app_server runtime when active.
New 'Runtime: codex app-server (terminal/file ops/MCP run inside
codex)' line appears in the welcome banner only when the runtime is
on. Default banner is unchanged.
Tests:
- 7 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery (mocked), failure handling, dry-run skip, opt-out
flag, idempotent re-runs, and permissions writing.
- 3 new tests in test_codex_app_server_session.py covering the
enriched approval prompts: cwd fallback, change summary on
apply_patch, fallback when no item/started cache exists.
- All 26 session tests + 46 migration tests green; 153 total in PR.
* feat(codex-runtime): hermes-tools MCP callback + native plugin migration
The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.
Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.
New module: agent/transports/hermes_tools_mcp_server.py
FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
through model_tools.handle_function_call() — same code path as the
Hermes default runtime. Run with:
python -m agent.transports.hermes_tools_mcp_server [--verbose]
Exposed: web_search, web_extract, browser_navigate / _click / _type /
_press / _snapshot / _scroll / _back / _get_images / _console /
_vision, vision_analyze, image_generate, skill_view, skills_list,
text_to_speech.
NOT exposed (deliberately):
- terminal/shell/read_file/write_file/patch — codex has built-ins
- delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
model_tools.py:493, require running AIAgent context. Documented
as a limitation and surfaced in the slash command output.
Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
- _query_codex_plugins() spawns 'codex app-server' briefly to walk
plugin/list and pull installed openai-curated plugins. Failures are
non-fatal — MCP migration still completes.
- render_codex_toml_section() now takes plugins + permissions args
AND wraps the managed block with a MIGRATION_END_MARKER comment so
the stripper can reliably find both ends, even when the block
contains top-level keys (default_permissions = ...).
- migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
default_permission_profile=':workspace' (built-in codex profile name
— must be prefixed with ':'). All three opt-out via explicit args.
- _build_hermes_tools_mcp_entry() builds the codex stdio entry with
HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
Hermes points the MCP subprocess at the same module layout.
Live-caught wire bugs fixed during this turn:
1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
for *user-defined* profiles with structured fields. Built-in
profile names start with ':' (':workspace', ':read-only',
':danger-no-sandbox'). Was emitting
which codex rejected with 'invalid type: string "X", expected
struct PermissionProfileToml'.
2. Built-in profile is , NOT . Codex
rejected with 'unknown built-in profile'.
3. Codex's MCP layer sends for
tool-call confirmation. We weren't handling it, so codex stalled
and returned 'MCP tool call was rejected'. Now: auto-accept for
our own hermes-tools server (user already opted in by enabling
the runtime), decline for third-party servers.
Quirk fixes shipped (from the limitations list):
#2 default permissions: workspace profile written on enable. No more
approval prompt on every write.
#4 apply_patch approval shows what's changing: cache fileChange
items from item/started, look up by itemId when codex sends
item/fileChange/requestApproval. Prompt: '1 add, 1 update:
/tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
#5/#10 exec approval cwd never empty: fall back to session cwd, then
'<unknown>'. Also surfaces 'reason' from codex when present.
#11 banner shows 'Runtime: codex app-server' line when active so
users understand why tool counts may not match what's reachable.
Tests:
- 5 new tests in test_codex_runtime_plugin_migration.py covering
plugin discovery, expose_hermes_tools entry generation, idempotent
re-runs, opt-out flag, permissions profile.
- 3 new tests in test_codex_app_server_session.py covering enriched
approval prompts (cwd fallback, fileChange summary).
- 2 new tests for mcpServer/elicitation/request handling (accept
hermes-tools, decline others).
- New test file test_hermes_tools_mcp_server.py covering module
surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
no agent-loop tools), and main() error paths.
- 166 codex-runtime tests total, all green.
Live e2e validated against codex 0.130.0 + ChatGPT subscription:
✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
registers hermes-tools, writes default_permissions = ':workspace'
✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
✓ Shell command runs without approval prompt (workspace profile works)
✓ Multi-turn — codex remembers prior turn's results
✓ apply_patch path via fileChange request approval
✓ web_search via hermes-tools MCP callback returns real Firecrawl
results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
✓ Disable cycle clean
Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
Full re-write covering native plugin migration, the hermes-tools
callback architecture, the prerequisites change ('codex login is
separate from hermes auth login codex'), the trade-off table now
reflecting which Hermes tools work via callback, and the limitations
list updated with what's actually unavailable on this runtime.
* feat(codex-runtime): pin user-config preservation invariant for quirk #6
Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.
This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.
Now explicitly tested:
- User MCP server above the managed block survives migration
- User MCP server below the managed block survives migration
- Both above + below survive a second re-migration
- User content (model, providers, sandbox, otel, etc.) outside our
region is left untouched
Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.
167 codex-runtime tests, all green.
* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find
Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.
Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).
Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:
1. Codex's built-in toolset (always on) — shell, apply_patch,
update_plan, view_image, web_search; covers everything terminal-
adjacent.
2. Native Codex plugins (auto-migrated from your codex plugin
install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
web_search/web_extract via Firecrawl, browser_*, vision_analyze,
image_generate, skill_view/skills_list, text_to_speech.
Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.
Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.
Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.
No code changes — purely docs clarification. 167 codex-runtime tests
still green.
* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade
Two real bugs in the self-improvement loop integration that the previous
test mocked away.
Bug 1: wrong call signature
The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
messages_snapshot=list (positional or keyword)
review_memory=bool (at least one trigger must be True)
review_skills=bool
So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.
Bug 2: review fork inherits codex_app_server api_mode
The review fork is constructed with:
api_mode = _parent_runtime.get('api_mode')
So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.
Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).
Also rewrote the codex helper's review wiring to match the
chat_completions path:
- Computes _should_review_memory in the pre-loop block (was already
being computed; now passed through to the helper as an arg).
- Computes _should_review_skills AFTER the codex turn returns +
counters tick (line ~15432 pattern in chat_completions).
- Calls _spawn_background_review(messages_snapshot=, review_memory=,
review_skills=) only when at least one trigger fires.
- Adds the external memory provider sync (_sync_external_memory_for_turn)
that the chat_completions path runs after every turn.
Tests:
Replaced the broken test_background_review_invoked (which only
asserted spawn.called) with three sharper tests:
- test_background_review_NOT_invoked_below_threshold:
single turn at default thresholds → no review fires (would have
caught the original 'every turn calls spawn with no args' bug)
- test_background_review_skill_trigger_fires_above_threshold:
10 tool_iterations at threshold=10 → review fires with
messages_snapshot=list, review_skills=True, counter resets
- test_background_review_signature_never_breaks: regression guard
asserting positional args are always empty and kwargs include
messages_snapshot
New TestReviewForkApiModeDowngrade class:
- test_codex_app_server_parent_downgrades_review_fork: drives the
real _spawn_background_review function (no mock at that level),
asserts the review_agent gets api_mode='codex_responses' when
the parent was codex_app_server.
Live-validated against real run_conversation:
- Counter ticked from 0 to 5 after a 5-tool-iteration turn
- _spawn_background_review fired exactly once with kwargs-only signature
- review_skills=True, review_memory=False
- messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
results + 1 final assistant + initial system/user)
- Counter reset to 0 after fire
170 codex-runtime tests, all green.
Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).
* feat(codex-runtime): expose kanban tools through Hermes MCP callback
Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.
That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.
Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.
Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.
Tools exposed:
Worker handoff (require HERMES_KANBAN_TASK):
kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
Read-only board queries:
kanban_show, kanban_list
Orchestrator (require HERMES_KANBAN_TASK unset):
kanban_create, kanban_unblock, kanban_link
Tests:
- test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
- test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link
Docs:
- New 'Workflow features' section in the docs page covering /goal,
kanban, and cron behavior on this runtime
- /goal: works fully via run_conversation feedback; only caveat is
approval-prompt noise on long writes-heavy goals (mitigated by
the default :workspace permission profile)
- Kanban: enumerated which tools are reachable via the callback and
why the env var propagates correctly through the codex subprocess
to the MCP server subprocess
- Cron: documented as 'not specifically tested' — same rules as the
CLI apply since cron runs through AIAgent.run_conversation
- Trade-offs table gained rows for /goal, kanban worker, kanban
orchestrator
172/172 codex-runtime tests green (+2 from kanban tests).
* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost
Three docs gaps caught during a final audit:
1. /codex-runtime was only in the feature docs page, not in the
slash-commands reference. Added rows to both the CLI section and
the Messaging section so users discover it where they'd look for
slash command syntax.
2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
CODEX_HOME lets users redirect Codex CLI's config dir (the migration
honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
propagates to the codex subprocess + the hermes-tools MCP subprocess
so kanban worker tools gate correctly — documented as 'don't set
manually' since it's an internal handoff.
3. Aux client behavior on this runtime. When openai_runtime=
codex_app_server is on with the openai-codex provider, every aux
task (title generation, context compression, vision auto-detect,
session search summarization, the background self-improvement review
fork) flows through the user's ChatGPT subscription by default.
This is true for the existing codex_responses path too, but it's
more visible / important here because users explicitly opted in for
subscription billing. Added a 'Auxiliary tasks and ChatGPT
subscription token cost' section to the docs page with a YAML
example showing how to override specific aux tasks to a cheaper
model (typically google/gemini-3-flash-preview via OpenRouter).
Also documents how the self-improvement review fork gets
auto-downgraded from codex_app_server to codex_responses by the
fix earlier in this PR.
No code changes — pure docs. 172 codex-runtime tests still green.
* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME
OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.
Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:
test_spawn_env_preserves_HOME — confirms parent HOME survives intact
in the subprocess env
test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
arg still isolates
codex state correctly
Docs additions:
'HOME environment variable passthrough' section — calls out the
contract explicitly: CODEX_HOME isolates codex's own state, HOME
stays user-real so gh/git/aws/npm/etc. find their normal config.
Cites openclaw#81562 as the cautionary tale.
'Multi-profile / multi-tenant setups' section — addresses the
related concern: profiles share ~/.codex/ by default. For users who
want per-profile codex isolation (separate auth, separate plugins),
documents the manual CODEX_HOME=<profile-scoped-dir> approach.
Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
would silently invalidate existing codex login state for anyone
upgrading to this PR with tokens already at ~/.codex/auth.json.
Opt-in is safer than surprising users.
174 codex-runtime tests (+2 from HOME guards), all green.
* fix(codex-runtime): TOML control-char escapes + atomic config.toml write
Two footguns caught in a final audit pass before merge.
Bug 1: TOML control characters not escaped
The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.
Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.
Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.
Bug 2: config.toml write wasn't atomic
If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.
Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.
Tests:
- test_string_with_newline_escaped — \n in value → \n in output
- test_string_with_tab_escaped — \t in value → \t in output
- test_string_with_other_controls_escaped — \r, \f, \b
- test_windows_path_escaped_correctly — backslash doubling
- test_atomic_write_no_temp_leak_on_success — no .config.toml.*
left over after a successful write
- test_atomic_write_cleanup_on_rename_failure — temp file removed
when Path.replace raises (simulated disk full)
180 codex-runtime tests, all green (+6 from this commit).
Footguns audited but NOT fixed (with rationale):
- Concurrent migrations race. Two Hermes processes hitting
/codex-runtime codex_app_server within seconds of each other could
cause one writer to lose entries. Low probability (you'd have to
enable from two surfaces simultaneously) and low impact (just re-run
migration). Adding fcntl/msvcrt locking is more code than it's
worth here. The atomic rename above means each individual write is
consistent — only the merge step is racy.
- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
check at runtime but don't reject too-new versions. Right call —
the protocol has been stable through 0.125 → 0.130. If OpenAI
breaks it later we'd see the error in test_codex_app_server_runtime
on CI before users hit it.
|
||
|
|
9a815b6c8c |
fix(gateway): preserve queued follow-up transcript history
Keep the outer history_offset when _run_agent drains queued follow-ups recursively so transcript persistence includes every queued turn in the chain instead of only the last one. |