Repeated quarantines of an unchanged corrupt kanban.db used to amplify
disk usage by N: the gateway dispatcher's 5-minute retry loop, multi-
profile fleets sharing one DB, and manual reopen attempts each produced
a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10
retries on a 100KB DB you had 11x the disk footprint of duplicate
corrupt data.
Derive the backup filename from a sha256 of the main DB instead of a
timestamp + collision counter. Same bytes → same filename → skip the
copy on retries. Different bytes (partial repair, further damage) →
different filename → preserve separately. Sidecar (-wal/-shm) backups
inherit the same content-addressed name.
Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop
the persistent JSON marker file, drop the atomic temp+fsync+rename
helper (shutil.copy2 is fine for a quarantine-only path), drop the
gateway-side WAL/SHM fingerprint extension (the existing
(path, mtime, size) tuple still gives the 5-minute retry semantics it
needs), and drop the gateway-side helper extraction. The backup file
existing IS the marker; no separate state needed.
Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup
proves 10 retries on the same corrupt bytes produce 1 backup (was 11),
and mutating the corrupt bytes produces a second backup with a
different fingerprint.
Refs #33529
Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>
`hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands *after* `pip install -e .` has already touched the install but *before* the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384).
Changes:
- New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update.
- `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract.
- Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step.
Tests:
- `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests.
- `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step.
- `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update.
Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed.
Fixes#33788.
Extends @liuhao1024's escape-normalized fix so the patch tool also
recovers when old_string carries a real tab byte and matches via the
`exact` strategy — which is the headline reproduction in the issue and
the most common case in practice (LLMs frequently get old_string right
because they re-read the file, but still serialize new_string's tabs as
two-character `\t`).
Instead of gating on the match strategy, decide per-sequence by looking
at the *matched region of the file*: only convert `\t` -> tab and
`\r` -> CR when the file region we're replacing actually contains the
corresponding control byte. That mirrors the region-based heuristic in
`_detect_escape_drift` and keeps legitimate writes of the literal
two-character string `"\t"` (e.g. patching `sep = "\t"` in Python
source) untouched — those files have a backslash+t in the matched
region, not a real tab, so new_string passes through verbatim. `\n` is
still excluded because newlines serialize correctly through JSON and
unescaping would corrupt source escape sequences far more often than
help.
E2E verified against the live `patch` tool: tab-indented file + literal
`\t` in new_string under both `exact` (Variant 1) and `escape_normalized`
(Variant 2) strategies now produces real tab bytes; a Python source line
containing `sep = "\t"` (legitimate literal backslash-t) survives a
patch unchanged.
Tests updated to cover both strategies and the legitimate-literal case,
and to assert that `\n` is intentionally preserved.
Refs #33733
When the patch tool matches via the escape_normalized strategy, old_string
contains literal \t, \n, \r sequences that get unescaped to match real
control characters in the file. However, new_string was written as-is,
leaving literal backslash sequences in the output.
Add _unescape_common_sequences() helper and apply it to new_string when
the matching strategy is escape_normalized. This ensures LLM-generated
tab/newline sequences become real bytes in the patched file.
Fixes#33733
Sessions now survive `hermes gateway stop` / `restart` on native Windows.
Previously the gateway died on schtasks `/End` + os.kill SIGTERM without
ever running the drain loop, so the v0.13.0 session-resume feature (#21192)
silently broke on Windows: `resume_pending=True` was never written, and
the next boot started with a blank conversation history (issue #33778).
Root cause is twofold and the reporter only identified half of it:
1. `hermes_cli/gateway_windows.py::stop()` did not write the
`planned_stop_marker` before signalling. The reporter caught this.
2. The bigger reason: `asyncio.add_signal_handler` raises
NotImplementedError for SIGTERM/SIGINT on Windows, so even if the
marker had been written, the gateway's existing SIGTERM handler
(which is what calls `runner.stop()` and the `mark_resume_pending`
loop) was never invoked. Writing the marker would have been
necessary-but-insufficient.
The fix has two parts:
* gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls
for the planned-stop marker file every 0.5s. When the marker appears
it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the
same shutdown path a real SIGTERM would have driven, including the
pre-drain `mark_resume_pending` writes (run.py:5977) and graceful
drain wait. The existing signal handler already accepts
`received_signal=None` and falls through to
`consume_planned_stop_marker_for_self()`, so no handler changes
needed. Runs on every platform as cheap belt-and-suspenders.
* hermes_cli/gateway_windows.py: `stop()` now writes the marker for
the running gateway PID and waits up to `agent.restart_drain_timeout`
(default 30s) for the PID to exit cleanly. On clean drain, the kill
sweep is non-forceful; on timeout, escalates to
`kill_gateway_processes(force=True)` which routes to taskkill /T /F
per `references/windows-native-support.md`.
Validation:
* 7 new tests in tests/gateway/test_planned_stop_watcher.py covering:
marker→handler dispatch, no-marker idle, already-draining skip,
not-yet-running skip, stop_event responsiveness, fire-once
semantics, error tolerance.
* 8 new tests in tests/hermes_cli/test_gateway_windows.py covering:
marker-before-kill ordering, clean-drain skips force-kill,
drain-timeout escalates to force=True, no-pid-skips-drain,
invalid-pid handling, fast-exit success, timeout failure,
marker-write-failure tolerance.
* E2E (Linux, detached orphan): write_planned_stop_marker(pid) +
`_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the
victim sees the marker and exits. Tested with a double-forked
subprocess so the test parent isn't holding it as a zombie.
* Targeted: tests/gateway/{restart_drain,restart_resume_pending,
signal,signal_format,status,shutdown_forensics,approve_deny_commands,
planned_stop_watcher} + tests/hermes_cli/{gateway_windows,
gateway_service} → 519/519.
What was wrong with the reporter's claim (for future archaeology): they
described the symptom as "no `resume_pending=True` written to
`sessions.json`" — but Hermes uses `state.db` (SQLite), not
`sessions.json`, and `mark_resume_pending` is called regardless of
the marker (the marker only affects exit code 0 vs 1 for systemd
revival semantics). The real session-loss path is the missing drain
on Windows, not a missing marker. Both halves are fixed here.
Closes#33778.
api_messages is built once before the retry loop while the primary provider
is active. When a mid-conversation fallback switches to a require-side thinking
provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary
(e.g. Codex) go out without reasoning_content and the new provider rejects the
request with HTTP 400 ("reasoning_content must be passed back").
Re-apply the echo-back pad against the current provider immediately before
building the request kwargs. Idempotent and a no-op unless the active provider
enforces echo-back, so it covers all fallback paths without affecting normal or
reject-side operation.
Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In GatewayStreamConsumer._run(), _final_content_delivered was set to True
based on the success of a mid-stream finalize edit, before the final
finalize edit was attempted. When the final edit later failed (Telegram
flood control, retry-after), _final_response_sent stayed False but
_final_content_delivered was already True, so gateway/run.py suppressed
its normal final send and the user saw a partial / fallback message
instead of the real answer.
Changes in gateway/stream_consumer.py:
- Remove the premature _final_content_delivered = True at the top of
the got_done block.
- Set _final_content_delivered = True only when the actual final send /
edit succeeds, in each finalize branch (no-finalize adapter,
_message_id finalize, no-_already_sent send).
- _send_fallback_final: don't set _final_response_sent = True when only
some chunks were delivered; the gateway should still attempt a
complete final send. Set _final_content_delivered = True alongside
_final_response_sent on the success path and short-text path.
- Cancellation handler: set _final_content_delivered = True alongside
_final_response_sent when the best-effort final edit succeeds.
Adds TestFinalContentDeliveredGuard with 3 regression tests covering
the core bug scenario, the happy path, and partial fallback.
Closes#33708Closes#25010
Refs #29200
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Three new tests in tests/hermes_cli/test_proxy.py:
- xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case.
Two-entry pool, 429 on first entry, must rotate to second entry
AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant
for rate limits).
- xai_adapter_retry_returns_none_on_429_when_pool_exhausted —
single-entry pool: 429 returns None so the rate-limit response
flows back to the client unchanged (existing behavior preserved).
- xai_adapter_retry_returns_none_for_unrelated_status — non-{401,
429} statuses must not trigger any retry path at all; guards
against the gate becoming too broad in future changes.
Each test asserts that refresh_xai_oauth_pure is never called on the
429 path — refresh is a 401-specific concern.
39/39 in tests/hermes_cli/test_proxy.py.
get_retry_credential only triggered on 401; a 429 Too Many Requests from
xAI was silently streamed back with no key rotation or back-off signal.
- server.py: widen retry gate from == 401 to in {401, 429}
- xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate
to stamp the 1-hour cooldown on the rate-limited key and return the
next available credential. Returns None if pool is exhausted.
Salvage follow-up on top of @vynxevainglory-ai's PR #29233. Keep the
column-body flex:1 + min-height:0 fix (tall columns scroll internally
now), but drop the flex-wrap: wrap part — instead just stop hiding
the existing horizontal scrollbar.
PR #523254b34 (sadiksaifi, May 18) deliberately moved the kanban board
from a wrapping grid to a single-row pinned-width flex so the board
stays as one stable horizontal row. The mistake in that PR was the
scrollbar-width: none + ::-webkit-scrollbar { display: none } pair,
which hid the affordance so columns past the viewport became visually
inaccessible. Fixing that hidden-scrollbar bug while keeping the
single-row design honors both contributors' intent.
Two CSS issues in the kanban dashboard:
1. Columns overflow horizontally with no way to reach them — the
original scrollbar-width: none hid the scrollbar entirely, and
even with a scrollbar, a wrapping layout is better UX for a board
with 8+ columns. Changed to flex-wrap: wrap and removed the
overflow-x: auto + hidden scrollbar rules. Columns now flow into
multiple rows (~3 per row on a typical viewport) instead of
running off-screen.
2. .hermes-kanban-column-body lacked flex: 1 and min-height: 0,
so the flex child's implicit min-height: auto prevented it from
shrinking below its content size. Columns with many cards pushed
past the parent max-height instead of scrolling internally.
Verified: 9 columns wrap into 3 rows, all visible without
horizontal scroll. Done column (53 tasks) scrolls vertically
within its column bounds.
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373
into a minimal generic host contract that external context engine plugins
(e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that
duplicated existing infrastructure or had marginal value.
Five concrete changes:
1. `_transition_context_engine_session()` on AIAgent — generic lifecycle
helper that fires on_session_end → on_session_reset → on_session_start
→ optional carry_over_new_session_context. Engines implement only the
hooks they need; missing hooks are skipped. Built-in compressor keeps
its existing reset-only behavior because callers default to no
metadata. `reset_session_state()` now optionally accepts
previous_messages / old_session_id / carry_over_context and delegates
to the transition helper when provided. (#16453)
2. `conversation_id` passed to `on_session_start()` — both the
agent-init call site and the compression-boundary call site now
forward `self._gateway_session_key` so plugin engines have a stable
conversation identity that survives session_id rotation (compression
splits, /new, resume). The key already existed on AIAgent; it just
wasn't reaching engines. (#16453)
3. Canonical cache buckets forwarded to engines — the usage dict passed
to `update_from_response()` now includes input_tokens, output_tokens,
cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of
the legacy prompt/completion/total keys. Engines can make decisions on
cache-hit ratios and reasoning costs instead of only aggregates. ABC
docstring updated. (#17453)
4. Plugin-registered context engines visible in the picker —
`_discover_context_engines()` in plugins_cmd.py now also includes
engines registered via `ctx.register_context_engine()` from plugin
manifests, deduplicating by name so repo-shipped descriptions win on
collision. (#16451)
5. `_EngineCollector.register_command()` — context engines using the
standard `register(ctx)` pattern can now expose slash commands (e.g.
`/lcm`). Routes to the global plugin command registry with the same
conflict-rejection policy regular plugins use (no shadowing built-ins,
no clobbering other plugins). Previously these calls hit a no-op and
the slash commands silently never appeared. (#17600)
Dropped from the original 5 PRs:
- Compression boundary signal (`boundary_reason="compression"`) from
#16453 — already on main at `agent/conversation_compression.py:412-424`,
landed via the bg-review extraction.
- `discover_plugins()` before fallback in run_agent.py from #16451 —
redundant: `get_plugin_context_engine()` already routes through
`_ensure_plugins_discovered()` which is idempotent.
- Runtime identity diagnostics method + helpers from #13373 (+251 LOC) —
operators can already read engine state via `engine.get_status()`;
the diagnostics view added marginal value relative to its surface area.
- The 553-LOC slash-command machinery from #17600 — replaced with a
20-LOC `register_command` method on the collector that reuses the
existing plugin command registry instead of building a parallel one.
Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs
~1,176 LOC across the original 5 PRs.
Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com>
Closes#16453.
Closes#17453.
Closes#16451.
Closes#17600.
Closes#13373.
Related: stephenschoettler/hermes-lcm#68.
* fix(skills): pull full ClawHub catalog into the skills index
The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.
End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.
Changes:
- `ClawHubSource.search(query="")` now routes through the existing
`_load_catalog_index()` paginating walker instead of the unpaginated
listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
regression hard-fails the CI build instead of silently shipping a
degenerate index.
Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
exercises the cursor-following path with three mocked pages (450
total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
pages before this commit landed, paginating well past the previous
50-page cap.
* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k
E2E walk against live ClawHub API hit my initial 250-page cap at 49,698
skills with cursor=yes still pending. The catalog is roughly 2.5x larger
than the docstring estimate.
- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
#33164 made _save_codex_tokens sync the singleton-seeded `device_code`
pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed
`manual:device_code` entries created by `hermes auth add openai-codex`
(the recommended workaround for users who hit #33000 before #33164
landed).
Every subsequent re-auth would refresh the device_code entry but leave
the manual:device_code entry holding the consumed refresh token plus
stale last_error_* markers — immediately recreating the 401
token_invalidated symptom on the next request, exactly as reported in
#33538.
Extend the refreshable source set to include `manual:device_code`.
Completing the device-code OAuth flow proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry. Keep
`manual:api_key` and other non-device-code manual sources untouched —
those represent independent credentials.
Closes#33538.
Kimi K2.6 is natively multimodal — flagged by Shengyuan from the Kimi
growth team. Replace the named-vendor example with a model-agnostic
phrasing so the row doesn't go stale as more vendors ship vision.
Adds first-class `client_cert` / `client_key` config keys so MCP servers
behind mTLS work without an external TLS-terminating proxy. Resolves
inbound community question (Jeremy W.).
Schema (per `mcp_servers.<name>`, HTTP/SSE only):
- `client_cert: "/path/to/combined.pem"` — single PEM with cert + key
- `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate
- `client_cert: [cert, key]` or `[cert, key, password]` — list form,
with optional passphrase for encrypted keys
Paths support `~` expansion. Missing files raise a server-scoped
`FileNotFoundError` at connect time rather than failing later with an
opaque TLS handshake error.
Wiring:
- New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned
`httpx.AsyncClient` alongside the existing `verify=` handling.
- SSE path: routed through an `httpx_client_factory` that wraps the
SDK's defaults (follow_redirects=True) and layers `verify` + `cert`
on top. The factory is only injected when needed, so the SDK's
built-in `create_mcp_http_client` keeps being used in the default
case.
- Deprecated mcp<1.24 path left untouched — that SDK's
`streamablehttp_client` signature doesn't expose `cert`, and adding
it would be dead code.
Also documents the previously-undocumented `ssl_verify` key (bool or
CA bundle path) in the MCP config reference.
Tests:
- `tests/tools/test_mcp_client_cert.py` (new, 19 tests):
- `_resolve_client_cert` helper: all three input forms, `~` expansion,
missing-file and validation errors.
- HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for
string and tuple forms; absent when unset; missing-file error
propagates.
- SSE transport: factory only injected when cert or non-default
verify is set; factory applies cert, custom CA bundle, and
preserves `follow_redirects=True` + forwarded headers/auth.
- Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py`
still pass.
Resolves the two Dependabot alerts currently open against the website
lockfile:
- serialize-javascript: pin to ^7.0.5 (was 6.0.2 — high-severity RCE
via RegExp.flags + Date.prototype.to*, plus medium-severity DoS)
- uuid: pin to ^14.0.0 (was 8.3.2 — medium buffer bounds check miss
in v3/v5/v6 when buf is provided)
Lockfile regenerated against current main (not the stale lockfile
from the original PR — several Dependabot bumps for mermaid,
webpack-dev-server, @babel/plugin-transform-modules-systemjs,
fast-uri, lodash-es+langium, lodash, follow-redirects, and dompurify
have landed since #30036 was opened, so the website portion was
re-applied surgically on top of those).
Salvaged the website half of PR #30036. The TUI test half landed
on main separately, so this PR is web-only.
* docs(voice): use `uv pip install faster-whisper` in STT install hints
Three runtime messages told users to `pip install faster-whisper`
(reported in #29782 for the gateway STT failure message under
Telegram-in-Docker, where the user hit `bash: pip: command not
found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv`
with a uv-managed venv that doesn't ship `pip` on PATH; users on
modern `uv tool install` / `uv venv` installs see the same problem.
The canonical install command in this repo is `uv pip install`
(see `tools/lazy_deps.py:509` `feature_install_command()`), which
works in Docker (uv image), in `uv tool install` venvs, and in
pip-based venvs that already have uv on PATH.
Changed three locations to match:
- `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice
reply when no STT provider is configured. Suggests
`uv pip install faster-whisper` and notes that
`pip install faster-whisper` also works if `pip` is on PATH.
- `tools/voice_mode.py` — `/voice` status line for missing STT.
- `cli.py` — Voice-mode startup error, "Option 1".
No behavior change beyond the user-facing text. No production
code path was touched.
* docs(voice): add pip fallback to cli + voice_mode STT hints
Copilot flagged that cli.py and tools/voice_mode.py recommend
`uv pip install faster-whisper` without a fallback for environments
where uv isn't on PATH. The gateway/run.py message already lists
`pip install faster-whisper` as an alternative; this commit aligns
the two remaining call sites to match.
Addresses inline Copilot review on #29800.
---------
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Two unrelated transient failures on PR #33661's initial CI run, both
pre-existing on main and recovered on rerun. Hardening:
1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for
resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning
tests intend to exercise only the warning-log path, but
_run_job_impl continues into provider resolution and MCP discovery
after the warning. Both can spawn subprocesses / hit the network and
pushed the test over its 30s budget under GHA load.
2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown
against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD
arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign
pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now
catches AssertionError/TimeoutExpired, force-kills, and always reaps
so no zombie escapes. Same hardening applied to the early-skip branch.
The regression-guard test
`test_cmd_update_on_git_install_does_not_print_docker_message` mocked
`is_managed` and `detect_install_method` but not `subprocess.run`, so
once `cmd_update(check=True)` decided this was a git install it shelled
out to a real `git fetch upstream` / `git fetch origin`. On CI runners
the worktree has no `upstream` remote configured and the fetch hung
past the 30s pytest-timeout — test (4) slice failed in #33659 CI.
Fix: stub `subprocess.run` with a successful CompletedProcess-shaped
object whose stdout is `"0\n"`, so:
- no real git command is ever invoked
- the rev-list parsing later in the flow (`int(stdout.strip())`)
succeeds rather than `ValueError`-ing through the test's
SystemExit catch
- the flow proceeds far enough to confirm the docker banner is
absent (the actual assertion)
Also broaden the except clause to `(SystemExit, Exception)`: the only
assertion in this test is the negative-banner check on captured stdout;
any further failure in the rest of the update flow is irrelevant to
that contract.
Verified locally: all 7 tests in
`tests/hermes_cli/test_cmd_update_docker.py` pass in 0.39s (previously
the regression-guard test alone consumed 30s+ and got SIGTERM'd).
Inside the published Docker image, `hermes update` was hitting the
".git missing → reinstall via curl" fallback:
✗ Not a git repository. Please reinstall:
curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash
That message is wrong on two counts:
1. It tells the user to run the host-side installer, which would
install a *new* Hermes on the host — not update the running
container.
2. It doesn't mention `docker pull` at all, leaving Docker users
to figure out the right action from scratch.
`hermes update --check` was worse: it bailed with "Not a git
repository — cannot check for updates." and nothing else.
Fix: detect the Docker install method (already stamped by
`docker/stage2-hook.sh` and surfaced by `detect_install_method()`)
in both update entry points and print a long-form message that
covers:
- The right command: `docker pull nousresearch/hermes-agent:latest`
- Restart guidance (`docker compose up -d --force-recreate` /
re-run `docker run`)
- How to verify the new version after restart
- Tag-pinning caveat (`:latest` doesn't move a pinned tag)
- Config persistence across upgrades (state under `HERMES_HOME` /
`/opt/data` is bind-mounted and survives)
- Fork escape hatch (build your own image with the repo's Dockerfile)
Exit code is 1 (matches `managed_error` semantic for "tried to
update but can't update this way").
Plumbing:
- hermes_cli/config.py: new `format_docker_update_message()` helper
sits next to the existing `_NIX_UPDATE_MSG` /
`format_managed_message()` family so the wording lives in one
place and both call sites (apply path + check path) consume it.
- hermes_cli/main.py:
* `cmd_update()`: bail right after the `is_managed()` gate, before
any of the apply-path branches.
* `_cmd_update_check()`: bail at the top of the function, before
the existing `method == "pip"` branch.
Neither path touches subprocess.run / git when method == "docker".
Coverage:
- 7 new tests in `tests/hermes_cli/test_cmd_update_docker.py`:
* `hermes update` in Docker → message + exit 1, no git calls
* `hermes update --check` (via cmd_update) → same
* `--yes` / `--force` don't bypass (intentional)
* `_cmd_update_check` called directly → bails too
* git/pip installs still take their normal paths (regression guards)
* `format_docker_update_message` content-lock test pinning the
five user-actionable bits the message must contain
- Existing test_cmd_update.py (21 tests) + test_managed_installs.py
(5 tests) still pass — no regression on the source-install path.
- Verified end-to-end in a real container: `docker run ... update`
and `docker run ... update --check` both render the message and
exit 1.
Snapshot review_agent._session_messages before teardown so close() can
clean per-session state without dropping the user-visible
self-improvement summary. Adds two regressions:
- bg-review summarizer receives captured review-agent tool messages
after review_agent.close() runs
- context-compressor protected-head handoff rehydration populates
_previous_summary and keeps the old handoff out of newly summarized
turns
Salvaged from PR #26039 onto current main after agent/background_review.py
extraction. Original commit 63eaf6055; bg-review test updated to patch
the module-level summarize_background_review_actions in
agent.background_review instead of the now-forwarder
AIAgent._summarize_background_review_actions.
`hermes dump` and the startup banner both call `git rev-parse HEAD` to
report the running commit, but `.dockerignore` line 2 excludes `.git` —
so inside the published image `hermes dump` shows
`version: ... [(unknown)]` and the banner drops its `· upstream <sha>`
suffix entirely. That makes support triage from container bug reports
impossible: we can't tell which commit the user is actually running.
Fix: thread the build-time SHA through as a Docker build-arg, write it
to `/opt/hermes/.hermes_build_sha` in the image, and have a new
`hermes_cli/build_info.get_build_sha()` read it as a fallback after the
existing live-git lookup fails. Output format is unchanged in both
callsites — same 8-char short SHA whether resolved live or baked.
Wiring:
- Dockerfile: `ARG HERMES_GIT_SHA=` + write-file step after the source
copy. Empty/missing arg → no file written → callers fall through to
live git (so local `docker build` without --build-arg is unchanged).
- docker-publish.yml: passes `HERMES_GIT_SHA=${{ github.sha }}` on all
four build-push-action steps (amd64/arm64, smoke-test + final push).
- dump.py:_get_git_commit() / banner.py:get_git_banner_state(): try
live git first, fall back to baked SHA, then to legacy `(unknown)`
/ None. Banner returns `upstream == local, ahead=0` because a built
image is by definition pinned to one commit.
Coverage:
- Unit tests cover build_info (file present/absent/empty/error,
truncation, whitespace), dump (live-git wins, both fallbacks,
identical output-format regression guard), and banner (no-repo +
baked, no-repo + no-sha, shallow-clone fallback).
- tests/docker/test_dump_build_sha.py is an integration regression
guard that runs against the real image, reads
`/opt/hermes/.hermes_build_sha`, and asserts `hermes dump` surfaces
its content (or stays at `(unknown)` if no file).
- Verified end-to-end: `docker build --build-arg HERMES_GIT_SHA=abc...`
→ `docker run ... dump` reports `[abc12345]`; without the build-arg
it reports `[(unknown)]` as before.
`sqlite3.Connection.__exit__` commits/rollbacks but does NOT close the
underlying FD. `with kb.connect() as conn:` in long-lived processes
(gateway `run_slash`, dashboard `decompose_task_endpoint`) therefore
leaks one FD to `kanban.db` per call. After enough operations the
gateway dies with `[Errno 24] Too many open files` (~4 days uptime
in the production report — #33159).
Fix: add a `connect_closing()` context manager in `hermes_cli/kanban_db`
that wraps `connect()` with a real `try/finally: conn.close()`. Switch
the 42 leak-prone call sites in `hermes_cli/kanban.py` (35),
`hermes_cli/kanban_decompose.py` (4), and `hermes_cli/kanban_specify.py`
(3) over to it.
`kanban.py` matters because `run_slash` (called from the gateway for
every `/kanban` slash command) parses argparse and dispatches to those
`_cmd_*` functions in-process — each one was leaking one FD per
invocation.
Tests inside `tests/` are untouched: short-lived processes where OS
cleanup masks the leak. Regression tests added in
`test_kanban_db.py` cover both happy-path and exception-path closure,
plus an explicit assertion that bare `with kb.connect()` still does
NOT close (documenting the upstream sqlite3 behaviour we're working
around).
Closes#33159.
fal announced Krea 2 day-0 as an official API partner on 2026-05-27.
Add both variants to the FAL_MODELS catalog so they appear in the
'hermes tools' model picker alongside flux-2, gpt-image, nano-banana,
etc. Users who already bill through FAL or Nous Portal subscription
can now use Krea without registering directly with Krea.
Model IDs (as listed in fal's launch announcement):
fal-ai/krea/v2/medium/text-to-image — $0.030 / image
fal-ai/krea/v2/large/text-to-image — $0.060 / image
Both share the same parameter schema:
- aspect_ratio (1:1, 4:3, 3:2, 16:9, 2.35:1, 4:5, 2:3, 9:16)
mapped from our 3 abstract ratios via size_style='aspect_ratio'
- creativity (raw|low|medium|high; default medium)
- seed (reproducibility)
- image_style_references (up to 10 per Krea's API spec)
No num_inference_steps / guidance_scale / num_images — Krea 2 does
not expose those, and the supports-set filter strips them defensively
if the agent ever passes them.
This is the FAL-routed variant. The separate native-Krea-API plugin
shipped in PR #33236 (plugins/image_gen/krea/) remains available for
users who want to bill directly through Krea's API with their own
key. Both routes converge on the same underlying model.
Nous Portal managed-FAL gateway: this commit makes the model IDs
known to the catalog and the picker. The Portal team will need to
allowlist these two endpoint slugs on the fal-queue origin server-side
for them to flow through the managed billing path.
* feat(web): add collapsible sidebar for the dashboard
The desktop sidebar can now be collapsed to an icon-only rail via a
toggle button in the sidebar header. State is persisted in
localStorage so it survives page reloads.
When collapsed (lg+ only):
- Sidebar shrinks from w-64 to w-14 with a smooth width transition
- Nav items show only their icon with a native title tooltip
- Brand text, plugin headings, system actions, theme/language
switchers, auth widget, and footer are hidden
- Mobile drawer behavior is unchanged (always full-width)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(web): align sidebar tooltips to sidebar edge consistently
Tooltip left position now uses the sidebar's right edge instead of the
anchor element's right edge, so narrow anchors (theme/language switchers)
align with full-width anchors (nav links, system actions).
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(web): add tooltip animations, restore theme label, rename Sessions tab
- Sidebar tooltips now animate in with a subtle 120ms ease-out slide;
subsequent tooltips within the same hover sequence appear instantly
(no delay/animation) following Emil Kowalski's tooltip pattern
- Restore theme name label when sidebar is expanded
- Rename Sessions segment tab to "History" across all 16 locales
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(web): smooth sidebar collapse animation
- Remove icon centering on collapse; icons stay left-aligned at px-5
so they don't jump during the width transition
- Text labels fade out with opacity transition instead of instant
display:none, clipped naturally by overflow-hidden
- Slow collapse duration from 450ms to 600ms for a more relaxed feel
- Gateway dot always rendered with opacity toggle so it doesn't
slide in from the right on collapse
- Pin gateway dot at fixed left offset (pl-[1.625rem]) to align
with nav icons
- Align header toggle button with justify-center when collapsed
- Bottom switchers use items-start when collapsed to prevent reflow
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Debian 13 ships only `python3` — there's no `/usr/bin/python` symlink. When
the agent emits bash commands using bare `python` (which models do frequently
from their training prior), every such call fails with:
/usr/bin/bash: python: command not found
Tool terminal returned error … exit_code 127
The agent then retries with different approaches, sessions take longer, and
agent.log fills with WARNING noise.
`python-is-python3` is the standard Debian package that drops a
`/usr/bin/python → python3` symlink. ~30 KB, zero behavior change for
anything calling `python3` directly; transparent fix for everything else.
Fixes#33178.
When operators ran `docker exec <c> hermes login` (or anything else
that wrote under $HERMES_HOME) they defaulted to root, leaving
/opt/data/auth.json root:root mode 0600. The supervised gateway
(UID 10000) then couldn't read its own credentials and returned
"Provider authentication failed: Hermes is not logged into Nous
Portal" on every Telegram/Discord/etc. message — even though
`docker exec <c> hermes chat -q ping` (also root) succeeded because
root could read its own root-owned file. _load_auth_store swallowed
PermissionError as a parse failure and copied the file aside as
auth.json.corrupt, making the diagnostic more misleading.
Fix: install a privilege-drop shim at /opt/hermes/bin/hermes,
prepended ahead of the venv on PATH. When invoked as root the shim
exec's the real venv binary via `s6-setuidgid hermes` — so any file
the docker-exec session writes is uid-aligned with the supervised
processes. Non-root callers (the supervised processes themselves,
`docker exec --user hermes`, kanban subagents, anything inside the
container that's not coming through docker-exec) hit a single exec
to the absolute venv path with no privilege change.
Recursion is impossible: the shim exec's the venv binary by
absolute path (/opt/hermes/.venv/bin/hermes), so the second hop
cannot re-enter the shim regardless of PATH state. No sentinel env
var needed (unlike #33583's gateway-run redirect which DOES need
HERMES_S6_SUPERVISED_CHILD because there's no absolute-path
equivalent for the s6 dispatch).
Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for
diagnostic sessions where the operator deliberately wants root.
Strict truthiness (1/true/yes case-insensitive); typos like `=0`
do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in
#33583.
If `s6-setuidgid` is missing (someone stripped s6-overlay in a
downstream fork), the shim exits 126 with a remediation message
pointing at `--user hermes` and the opt-out — never silently runs
as root.
Test plan:
- tests/docker/test_docker_exec_privilege_drop.py — 11 tests
- shim drops root to hermes uid (file ownership check)
- shim short-circuits for non-root docker exec
- HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root
- strict-truthiness parametrization (5 falsy values reject)
- main CMD path unaffected (recursion guard)
- E2E: every file written by docker-exec is readable by uid 10000
- Full tests/docker/ harness: 32/32 pass against fresh image build
- shellcheck --severity=error: clean
- hadolint: clean
- Manual: reproduced the original symptom (root-owned auth.json)
by bypassing the shim; confirmed default docker-exec produces
hermes-owned files; confirmed opt-out env keeps root semantics.
Known follow-up: this prevents NEW instances of the bug. Volumes
that already have root:root /opt/data/auth.json from a pre-shim
image need a one-time `chown hermes:hermes` before rebooting onto
the new image. A stage2-hook chown sweep can self-heal that, but
is deferred per scope decision.
Follow-up to #33583 (the gateway-run-supervised redirect).
Before this fix, the supervised gateway's stdout (most visibly the
"Hermes Gateway Starting…" rich-console banner) was swallowed by
`s6-log` into the rotated file at
`${HERMES_HOME}/logs/gateways/<profile>/current` and never reached
`docker logs`. Operational signal lived in two places:
* **docker logs** — saw stderr (Python `logging` defaults to
stderr), so warnings/errors were visible.
* **the rotated file** — saw stdout (rich banners, `print()`
output, third-party libs that wrote to fd 1).
This was surprising for users coming from the pre-s6 image, where
`docker run … gateway run` produced a single unified stream in
`docker logs`. They'd see partial output, conclude something was
broken, and dig around for the missing pieces.
Fix: add the `1` s6-log action directive before the file destination
so each line is forwarded to s6-log's stdout — which propagates up
the s6-supervise pipeline to /init's stdout = container stdout =
`docker logs`. The file destination is preserved as a second
destination, so the rotated log (with ISO 8601 timestamps) still
exists for `hermes logs` and for survival across container restarts.
Trade-off considered: timestamps. Putting `T` between `1` and the
file destination (not before `1`) means:
* docker logs sees raw lines — Python's logging formatter has its
own timestamps, and `docker logs --timestamps` adds another
layer when desired. No double-stamping in the common reading
path.
* The persisted file gets s6-log's ISO 8601 timestamp so even
output that lacked a Python-logger timestamp (rich banners,
third-party raw prints) is correlatable in `current`.
Verification:
* New unit-test assertion in `test_service_manager.py` locks the
`s6-log 1` directive into the rendered run-script. Mutation-
tested by reverting to the pre-fix script (no `1`); the assert
catches it cleanly.
* New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs`
builds the image, runs `docker run … gateway run`, and asserts
the unique `⚕` banner glyph reaches `docker logs`. Also verifies
the rotated file still contains the banner (no regression on
the existing file destination). Mutation-tested end-to-end: built
a deliberately-broken image without the `1` directive and the
test failed exactly as designed, citing the banner present in
`current` but absent from `docker logs`.
* `website/docs/user-guide/docker.md` gains a new `:::note Where
gateway logs go` admonition documenting both destinations and
the audit-log file at `${HERMES_HOME}/logs/container-boot.log`.
Existing functionality preserved: every other docker-harness test
still passes against the new image. Unit-test sweep across
`tests/hermes_cli/` (5561 tests) is green.
* fix(tui): suppress mouse-residue leaks during Python launcher startup
`hermes --tui …` spends ~100–300ms inside the Python launcher (lazy
imports, arg parsing, session resolution) before exec'ing the Node TUI
binary. During that window stdin is still in cooked + echo mode. If a
prior session left DEC mouse tracking asserted (or the user spammed
mouse movement while the previous session was opening), the terminal
keeps emitting `\\x1b[<…M` SGR motion reports that get echoed straight
back into the user's shell scrollback as literal `^[[<…M` text and
sit there above the TUI banner until the next clear.
The Node side already calls `resetTerminalModes()` in `entry.tsx`, but
by then the race is already lost — the bytes echoed during the Python
warmup window were committed to the scrollback before Node started.
Fix: write the mouse-tracking disable sequence at the very top of
`hermes_cli.main`, before every heavy import. The terminal stops
emitting motion events as soon as the bytes hit the wire (one TTY
round-trip), shrinking the race window from hundreds of milliseconds
to a few. `HERMES_TUI_NO_EARLY_DISABLE=1` opts out for diagnostics.
* test(tui): drop dead _reload_main, hoist import out of patch context
Addresses Copilot review on PR #31213.
The tests used to import `hermes_cli.main` inside the `patch("os.write")`
context, which Copilot pointed out is order-dependent: if the module
is already loaded (e.g. imported by a prior test in the same process),
the import is a no-op and the patch only sees the explicit
`_suppress_mouse_residue_early()` call. Either way the assertion can
flake when run alongside other tests.
Move the import to module scope — every subprocess gets a fresh
`hermes_cli.main`, whose module-level invocation is a no-op under
pytest argv. Tests then exercise `_suppress_mouse_residue_early()`
directly inside their own patch context. Also drop the unused
`_reload_main` helper.
* fix(tui): skip early mouse-disable when stdout is not a TTY
Addresses Copilot review on PR #31213.
`hermes --tui … >log` or CI capture pipes fd 1 away from the terminal.
The disable bytes can't reach the terminal in that case but would
still get written into the log file as raw CSI sequences. Guard with
`os.isatty(1)` inside the existing `try/except OSError` block so the
'never break startup' contract holds.
* docs(tui): rephrase 'raw cooked mode' as 'cooked + echo mode'
Copilot review nit on PR #31213 — the original wording was self-
contradictory. Pre-TUI stdin state is cooked + echo (kernel TTY
discipline still owns the line buffer and echoes input back). The
TUI switches it to raw mode later when Ink mounts.