Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.
Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
The existing _is_resume_pending branch in _handle_message_with_agent
(run.py ~L13738) already injects a reason-aware recovery system note
on the next turn. With kyan's text in place the model saw two stacked
system notes. Now the event text is empty and the existing injection
path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method. The
filter is 8 lines inline in _schedule_resume_pending_sessions() —
one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set. That's the
reason SessionStore.suspend_recently_active() stamps on sessions
recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
Previously those sessions had to wait for a real user message to
auto-resume; now they continue automatically at startup like
drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
future reasons (e.g. 'manual_resume_request') can be opted in with
one line.
Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)
E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.
Co-authored-by: Kevin Yan <kevyan1998@gmail.com>
The kanban_heartbeat tool called heartbeat_worker but never
heartbeat_claim, so a worker that loops the tool while a single tool
call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got
reclaimed by release_stale_claims. The function name and
heartbeat_claim's own docstring imply otherwise:
"Workers that know they'll exceed 15 minutes should call this
every few minutes to keep ownership."
But there was no caller in the worker tool path. Workers couldn't
invoke heartbeat_claim themselves either — it isn't exposed as a tool.
Fix: _handle_heartbeat now calls heartbeat_claim first, reading
HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins
this in _default_spawn). Falls back to _claimer_id() for locally-
driven workers that didn't go through dispatcher spawn.
Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires
rewinds claim_expires into the past, calls the tool, and asserts the
new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to
fail against the unfixed code (claim_expires stays at the rewound
value).
Closes the root cause underlying the symptom in #21141 (15-min
respawns of long-running workers). #21141 separately addresses
post-reclaim cleanup; this fixes the upstream "shouldn't have been
reclaimed in the first place" half.
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.
Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.
Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
The rescan-on-platform-change fix landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.
- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
resolver consults get_session_env after HERMES_PLATFORM, and the
gateway sets that variable through set_session_vars (a ContextVar),
not os.environ. The test uses set_session_vars / clear_session_vars
to drive the actual gateway signal, and the disabled-skill stub reads
the same value via get_session_env. A regression that swapped
get_session_env for plain os.getenv would still pass an env-var-based
test but break concurrent gateway sessions, which is the bug the
ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
gateway session): the cached telegram view must be dropped and the
unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
scope must NOT rescan. The rescan trigger is change in scope, not
"always re-resolve" — a gateway serving many consecutive telegram
requests should pay the scan cost once, not per request.
The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.
All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.
In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).
The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:
FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
AssertionError: Expected 'kill' to not have been called. Called 1 times.
Calls: [call(12345, <Signals.SIGKILL: 9>)].
FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
AssertionError: Expected 'kill' to have been called once. Called 2 times.
Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].
FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
assert 2 == 1
manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]
In each test `os.kill` is mocked, so the simulated PID never actually
exits \u2014 the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.
Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.
Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.
Validation:
$ pytest tests/hermes_cli/test_update_gateway_restart.py -q
43 passed in 4.13s
No production code change. Fixes the three failures observed on `main`
(run 25250051126):
test_update_restarts_profile_manual_gateways
test_update_profile_manual_gateway_falls_back_to_sigterm
test_update_kills_manual_pid_but_not_service_pid
Refs: #17648 (post-update survivor sweep that the tests didn't model).
_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.
Mirrors the fix shipped for agent/google_oauth.py in #19673.
Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).
Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:
FAILED tests/hermes_cli/test_update_yes_flag.py
::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
`AssertionError: assert <MagicMock 'input'>.called is False`
FAILED tests/hermes_cli/test_update_yes_flag.py
::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
`AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`
Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session \u2014 skipping config migration prompt." branch
\u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:
with patch("hermes_cli.main.sys") as mock_sys:
mock_sys.stdin.isatty.return_value = True
mock_sys.stdout.isatty.return_value = True
The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.
Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* \u2014 every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY \u2192 skip restore prompt" branch for the same reason).
Validation:
$ pytest tests/hermes_cli/test_update_yes_flag.py -q
3 passed in 5.47s
No production code change. Fixes the two failures observed on `main`
(run 25250051126):
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`
Refs: #18261 (added the `--yes` flag + these tests).
The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:
`test_dockerfile_installs_tui_dependencies` required:
'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text
…but the lockfile is no longer COPIED individually \u2014 the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).
`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.
Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:
* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
`await import('@hermes/ink')` resolves at runtime).
This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.
Validation:
$ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
6 passed in 1.43s
No production code change. Fixes the two failures observed on `main`
(run 25250051126):
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`
Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.
Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.
- create_profile() gains a `no_skills` param, mutually exclusive with
`--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
`{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
re-seeds normally.
6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
Two follow-ups on top of helix4u's slash-command sync hardening:
- Only suppress exceptions that are actually Discord 429 rate limits
(discord.RateLimited, HTTPException with status 429, or a clearly
rate-limit-named duck type). Arbitrary failures that happen to expose
a retry_after attribute now re-raise to the outer handler instead of
silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
stops collecting ad-hoc runtime files.
Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
Previously, /personality in the TUI called _reset_session_agent() which
destroyed the agent, cleared conversation history, and effectively started
a new session. This made personality switching disruptive — users lost
their entire conversation context.
Now /personality updates the agent's ephemeral_system_prompt in-place and
injects a pivot marker into the conversation history. The marker tells
the model to adopt the new persona from that point forward, which is
necessary because LLMs tend to pattern-match their prior responses and
continue the established tone without an explicit signal.
Changes:
- tui_gateway/server.py: Rewrite _apply_personality_to_session to update
the agent in-place instead of resetting. Inject a user-role pivot
marker so the model actually switches style mid-conversation.
- ui-tui/src/app/slash/commands/session.ts: Update help text (no longer
mentions history reset).
- tests/test_tui_gateway_server.py: Update test to verify history is
preserved, pivot marker is injected, and ephemeral prompt is set.
The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.
- Convert _require_root_for_system_service to raise a typed
SystemScopeRequiresRootError (RuntimeError subclass) instead of
sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
restart|uninstall` without sudo) still exits 1 cleanly via a new
catch at the top of gateway_command, matching the existing
UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
_print_system_scope_remediation() helper. Both setup wizards
(hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
detect the dead-end before prompting and print actionable guidance:
either `sudo systemctl start <service>` this time, or uninstall the
system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
SystemScopeRequiresRootError and fall back to the remediation
helper if the pre-check is bypassed (race, etc.).
Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.
* fix(tui): restore classic CLI voice push-to-talk parity
(cherry picked from commit 93b9ae301b)
* fix(tui): harden voice push-to-talk stop flow
Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.
* fix(tui): preserve silent voice strike tracking
Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.
* fix(tui): clean up voice stop failure path
Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.
* fix(tui): report busy voice capture starts
Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.
* fix(tui): handle busy voice record responses
Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.
* fix(tui): clear voice recording on null response
Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.
* fix(tui): count silent manual voice stops
Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.
---------
Co-authored-by: Montbra <montbra@gmail.com>
Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.
Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.
The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.
Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
_global_auth_file_path() returns None.
Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.
Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).
Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.
- Expand migration comment to name the primary failure mode (missing
column OperationalError from #20842) ahead of the secondary SQLite
schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
explicitly asserts the exact #20842 crash no longer occurs and that
consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
asserts the migration is a no-op when both columns already exist,
preserving the existing counter value
Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.
Adds SearXNG as a free, self-hosted web search provider. SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.
## What this adds
- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
`WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key
## Config
```yaml
# Use SearXNG for search, any paid provider for extract
web:
search_backend: "searxng"
extract_backend: "firecrawl"
# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
backend: "searxng"
```
SearXNG is search-only — it does not implement WebExtractProvider. Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).
Closes#19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)
Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.
Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
DEFAULT_CONFIG (empty = inherit from web.backend)
Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
to before — _get_search_backend() falls through to _get_backend()
This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.
12 new tests, 49 existing tests pass with zero regressions.
Ref: #19198
OpenCode Go and OpenCode Zen are flat-namespace model resellers — their
/v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the
inference API rejects vendor-prefixed names with HTTP 401 'Model not
supported'. Two bugs fixed:
1. `switch_model` in hermes_cli/model_switch.py was silently switching the
user off opencode-go to native deepseek when they typed
`/model deepseek-v4-flash`. Step d found the model in opencode-go's live
catalog, but step e (detect_provider_for_model) still ran and matched
the bare name against deepseek's static catalog. Fix: track whether
the live catalog resolved it; skip step e when it did.
2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only
stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor
prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator
slugs into fallback_model configs) intact — causing HTTP 401s when
the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY
leading vendor prefix because their APIs are flat-namespace.
Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py
covering both normalization (prefix stripping, regression guards for
opencode-zen Claude hyphenation and openrouter vendor-prepending) and
switch_model (bare-name resolution on opencode-go's live catalog must
not trigger cross-provider hijack).
Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai
has no overlapping entry in a native provider's static catalog. Deepseek
and minimax failed because their v4/v2.7 names existed in the native
deepseek/minimax catalogs.
When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).
The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:
WARNING cli: process_loop unhandled error (msg may be lost):
[Errno 63] File name too long: "/goal Drive the space board..."
This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.
Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.
Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.
Reproducer (without the fix):
python -c "from cli import _detect_file_drop;
_detect_file_drop('/goal ' + 'a'*300)"
→ OSError: [Errno 63] File name too long
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
CPython's logging module is not reentrant-safe. `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.
When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.
Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path. This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself. Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.
Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.
Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
deliberately so real shutdown signals still propagate
Closes#13710 regression.
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
- hermes_cli/config.py: add tr to supported languages comment
- locales/en.yaml: add tr to locale file list comment
- tests/agent/test_i18n.py: add Turkish alias tests + explicit lang test
- website/docs/user-guide/configuration.md: add tr to supported values
The kanban-worker skill (built into the gateway dispatcher's spawn
prompt) instructs every worker to hand off via
``kanban_complete(summary=..., metadata=...)``. That writes the summary
onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the
latter is left NULL unless the caller passes ``result=`` explicitly.
Result: a glance at the dashboard or ``hermes kanban show <id>`` shows
a blank "Result:" section even when the worker did real work, which
on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a
task that had a 10-line completion summary on its run.
This patch surfaces the latest non-null run summary as
``latest_summary`` so the worker's actual handoff lands in front of
operators.
* New helpers ``kanban_db.latest_summary(conn, task_id)`` and
``kanban_db.latest_summaries(conn, task_ids)``. The batch variant
uses a single window-function SELECT so the dashboard board endpoint
doesn't pay an N+1 cost on multi-hundred-task boards.
* CLI ``hermes kanban show <id>`` prints a "Latest summary:" block
when ``tasks.result`` is empty but a run has produced a summary
(the existing "Result:" section still wins when populated, so the
back-compat path for hand-edited results is untouched). JSON output
gains a top-level ``latest_summary`` field.
* Dashboard ``/board`` and ``/tasks/{id}`` now include a
``latest_summary`` field on every task. Cards on /board carry a
200-character preview (cheap to render, plenty for "what did this
worker do?" at a glance); the drawer/detail endpoint returns the
full summary.
* Five new tests cover: empty-runs case, post-complete surface,
newest-of-multiple selection, empty-string skip, batch with
missing tasks + empty input.
Smoke-tested locally against the live profile DB on the three
acceptance-criterion targets (t_f08fef91 cron-hygiene-audit,
t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now
return their populated summaries via both ``latest_summary`` and
``latest_summaries``.
Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests
pass. No regression on tasks where ``tasks.result`` is explicitly
populated (the existing "Result:" branch is preserved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add parent dependency guard to _set_status_direct so dragging
a task to the ready column is rejected (409) when its parents
are not all done. Previously the guard only existed in
recompute_ready, allowing direct status writes via the
dashboard API to bypass the dependency engine.
Root cause: after reclaiming stale workers, both T3 and T4
were set to ready via dashboard status writes in quick
succession, causing the writer to be spawned while the analyst
was blocked — upstream work wasn't done yet.
After #19473 landed (enforce_max_runtime reads from task_runs.started_at
rather than tasks.started_at), a regression test added earlier still
only backdated the tasks column. Backdate both so the test is robust
regardless of which column the enforcer reads from.
Widens _verify_created_cards to also accept ids that are children of the
completing task in task_links. Previously we only accepted cards where
created_by matched the completing task's assignee, which was too strict
for legitimate orchestrator flows: a specifier creates a card (so
created_by=specifier, not worker), then a worker picks it up and passes
parents=[current_task] to kanban_create. The explicit link proves the
relationship and should be trusted.
Salvaged from #20022 @LeonSGP43 (full PR superseded by #20232 +
this patch; the linked-children relaxation was the portable
improvement).
Salvage follow-up for PR #20344:
- AUTHOR_MAP entry for rob-maron (required by CI)
- 17 parametrized tests covering _is_arcee_trinity_thinking,
_fixed_temperature_for_model Trinity override, and
_compression_threshold_for_model, including sibling-model negatives
(trinity-large-preview, trinity-mini) and the OpenRouter slug form.