* fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern
- BUG-009 (CRITICAL): freeze HERMES_YOLO_MODE at module import via
_YOLO_MODE_FROZEN; prevents skills/prompt-injection from calling
os.environ["HERMES_YOLO_MODE"]="true" at runtime to bypass all checks
- BUG-002 (HIGH): replace substring "APPROVE" in answer with exact
answer == "APPROVE" in _smart_approve; prompt already requests exactly
one word, substring match was exploitable via verbose LLM responses
- BUG-001 (MEDIUM): add logger.warning for every dangerous command that
auto-approves in non-interactive non-gateway context; makes silent
approvals visible in audit logs without breaking script behavior
- BUG-008 (LOW): expand curl/wget pipe pattern to cover | /bin/bash and
| bash -c variants, not just | sh / | bash
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(approval): add missing is_truthy_value import + fix yolo test patches
_YOLO_MODE_FROZEN uses is_truthy_value() from utils — import was missing.
Tests that set HERMES_YOLO_MODE via monkeypatch.setenv() no longer work
because the value is frozen at import time; update them to patch the
module-level flag directly via monkeypatch.setattr().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
register_env_passthrough() (the skill-declared path) filters out names in
_HERMES_PROVIDER_ENV_BLOCKLIST and logs a warning citing GHSA-rhgp-j443-p4rf.
_load_config_passthrough() (the config.yaml path) did not. Both feed the
same is_env_passthrough() allowlist that local.py and code_execution_tool.py
consult before stripping a variable from the child env.
A skill that wanted to leak ANTHROPIC_API_KEY or OPENAI_API_KEY into
execute_code could no longer self-register the name (the GHSA fix
blocks it), but the same outcome was still reachable by asking the
operator to add the name to terminal.env_passthrough in config.yaml,
or by any in-process actor with write access to ~/.hermes/config.yaml.
Apply the same _is_hermes_provider_credential filter inside
_load_config_passthrough, mirroring the skill-path warning so operators
see the same explanation. Non-Hermes API keys (TENOR_API_KEY,
NOTION_TOKEN, etc.) are unaffected since they are not in the blocklist.
* perf(bitwarden): persist secret-fetch cache across CLI invocations
Every `hermes` invocation paid a ~380ms tax for `bws secret list` to
Bitwarden Secrets Manager because the existing cache was in-process only.
Back-to-back `hermes chat -q`, gateway-spawned agents, and cron-launched
runs all re-fetched.
Adds a disk-persisted L2 cache at `<hermes_home>/cache/bws_cache.json`
(mode 0600, never contains the access token — only the SHA-256
fingerprint prefix). Same TTL as the in-process cache. Read on miss,
write on bws success, ignored on key mismatch / corruption / expiry.
Measured on a startup profile:
load_hermes_dotenv() cold: 372ms → warm (disk cache hit): 20ms
End-to-end `hermes --version` cold→warm: 666ms → ~295ms.
In a hermes-vs-codex benchmark across 11 single- and multi-turn tasks
(framework overhead = wall − llm − tool_exec, median over 3 trials):
cohort before after saved
single-turn (median) 2.96s 2.31s -0.65s
multi-turn (5-turn) 9.40s 8.95s -0.45s (≈0.3s/turn)
Hermes now wins head-to-head on 6/11 tasks vs codex (was 4/11 before).
The remaining ~0.6s single-turn delta is mostly Python's own import
cost in hermes_cli.main, which is a separate optimization.
* perf(cli): lazy-load model catalog + dedupe config.yaml reads at startup
Two import-time wins on top of the bws disk-cache fix:
1. Lazy-load `hermes_cli.models._PROVIDER_MODELS` via PEP 562
module-level `__getattr__`. The catalog is ~55ms of work that was
eagerly imported on every CLI invocation (line 4557 `if not
_is_termux_startup_environment(): from hermes_cli.models import
_PROVIDER_MODELS`). Audit showed every internal call site already
does its own function-local import; only test code reads
`hermes_cli.main._PROVIDER_MODELS` as a module attribute, and
__getattr__ keeps that working transparently. First access triggers
the import once and caches the result on the module via
`globals()[name] = ...`, so subsequent reads are dict lookups.
2. Dedupe the double config.yaml read in the top-of-module bootstrap.
Previously: one raw yaml.safe_load for the `security.redact_secrets`
bridge, then a separate full `load_config()` (with deep-merge) for
`network.force_ipv4`. Both keys come from the same file. Merged
into one raw yaml load.
Combined with the bws cache fix in the previous commit:
hermes --version wall time:
original (cold): 666 ms
after bws fix (warm): 295 ms
after lazy-load + dedupe: 228 ms (-67 ms additional, -66% from original)
Tests:
- tests/hermes_cli/test_api_key_providers.py: 173/173 pass
(lazy __getattr__ correctly handles
`from hermes_cli.main import _PROVIDER_MODELS`)
- tests/test_ipv4_preference.py + tests/hermes_cli/test_redact_config_bridge.py +
tests/agent/test_redact.py: 93/93 pass (dedupe preserves both bridges)
- tests/test_bitwarden_secrets.py + env_loader tests: 49/49 pass
V4A patch '*** Update File:', '*** Add File:', '*** Delete File:' headers
come from patch CONTENT, not the explicit `path=` argument. That makes
them attacker-influenceable through skill content, web extract output,
prompt injection, and other surfaces the agent processes. Headers like
'*** Update File: ../../../etc/shadow' would resolve relative to the
agent's cwd; in deployment configurations where that cwd is deep enough
to land outside Hermes' protected paths, the write could land somewhere
the agent operator did not intend.
Reject any V4A header containing a '..' path component before applying
the patch. The explicit `path=` argument on patch_tool is UNCHANGED —
the agent legitimately uses '..' there (e.g. `patch path='../other_module/x.py'`
from a worktree dir is normal cross-module editing).
Regression tests: V4A Update header with traversal rejected, V4A Add
header with traversal rejected, patch_v4a never invoked when rejection
fires.
Salvaged from PR #29395 by @waefrebeorn. The original PR added
has_traversal_component as a blanket reject on read_file_tool,
write_file_tool, patch_tool's explicit path, and search_tool — that
would break legitimate agent operation where '..' is normal. Also
dropped the over-eager skills_guard pattern additions
(pickle.loads/marshal.loads/ctypes.CDLL/importlib at high/critical
severity would false-positive on legit data-science and FFI skills).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Expand _MEMORY_THREAT_PATTERNS from 13 to 24 regex patterns and align
_INVISIBLE_CHARS with skills_guard.py (10 → 17 characters).
Key changes:
- Add multi-word bypass prevention (?:\w+\s+)* to injection patterns
- Add missing injection patterns: role_pretend, leak_system_prompt,
remove_filters, fake_update, translate_execute, html_comment_injection,
hidden_div
- Add exfiltration patterns: send_to_url, context_exfil
- Add persistence patterns: agent_config_mod, hermes_config_mod
(both require modification-verb prefix to avoid false positives on
mere mentions of config filenames)
- Add hardcoded secret detection pattern
- Add role_hijack precision fix: require article after "now" to avoid
blocking "you are now ready/connected/set up" etc.
- Expand invisible unicode set with directional isolates (U+2066-2069)
and invisible math operators (U+2062-2064)
Test coverage expanded from ~8 to ~30 scan tests including dedicated
false-positive regression tests for all precision-sensitive patterns.
Known limitations (deferred to follow-up PRs):
- prompt_builder.py and cronjob_tools.py still use older pattern sets
- No semantic/LLM-based scanning (regex-only approach)
- No cross-entry or cross-store analysis
show_snapshot.py unpickled a user-supplied path unconditionally. pickle.loads
is equivalent to arbitrary code execution, so a snapshot from an untrusted
source = RCE. Require an explicit --i-trust-this-file acknowledgement before
calling pickle.loads, and emit a stderr warning when proceeding.
Co-authored-by: Jiahui-Gu <jiahuigu@users.noreply.github.com>
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:
1. The non-stream stale-call detector estimated context tokens from
``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
their conversational load in ``input`` (with ``instructions`` and
``tools``), so every Codex turn logged ``context=~0 tokens`` and the
detector never applied its >50k / >100k tier bumps.
2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
main Codex path. The chat_completions path and the auxiliary Codex
adapter both forwarded it; the main path skipped it through three
places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
``_preflight_codex_api_kwargs``).
3. The streaming stale detector had the same payload-shape bug for
``codex_responses`` requests, which route through the non-streaming
detector (it's the path that emits the user-facing
"No response from provider for 300s (non-streaming, ...)" warning that
reporters keep pasting).
This commit:
- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
used by both the non-stream and stream detectors. Handles ``messages``
(Chat Completions), ``input + instructions + tools`` (Responses API),
bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
and ``_preflight_codex_api_kwargs`` (with guards against
zero/negative/inf/bool values), and wires
``_resolved_api_call_timeout()`` into the Codex branch of
``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
kick in faster when upstream stalls:
* base 300s -> 90s
* >50k 450s -> 150s
* >100k 600s -> 240s
These only apply when the user has *not* set
``providers.<id>.stale_timeout_seconds`` or
``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
context-tier scaling, transport timeout pass-through, and preflight
timeout pass-through / rejection of invalid values.
Closes#21444
Supersedes #21652#24126#31855
Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
Translates the full English docs corpus (335 files) into Simplified
Chinese under website/i18n/zh-Hans/. Combined with PR #31895 (cross-
locale link fix), the 简体中文 locale toggle now serves a complete
Chinese site with working cross-page navigation.
Pipeline:
- Claude Sonnet 4.6 via OpenRouter, 8-way concurrent
- Preserves frontmatter keys, code blocks, MDX/JSX, link URLs, brand
names, and technical jargon (prompt/token/hook/MCP/ACP/etc.)
- Translates only frontmatter title/description and prose
- Two largest files (configuration.md 93KB, research-paper-writing.md
107KB) retried with 64K max_tokens after initial fence-drift
- 3 manual post-fixes for MDX edge cases the model didn't escape:
< in optional-skills-catalog table, double-quotes in an alt= tag,
and a bare URL adjacent to a full-width period
Cost: ~$30 total (Sonnet 4.6 input $3/M + output $15/M).
Verified `npm run build` succeeds for both en and zh-Hans locales,
no double-prefixed /docs/zh-Hans/docs/ URLs in rendered output,
all in-page navigation resolves correctly.
Translations are machine-generated and may need human review on
specific pages — but they're an enormous improvement over the
previous state (3 zh-Hans pages out of 335).
When 'hermes chat --quiet --resume <id> -q "..."' is used, three status
messages were written to stdout via ChatConsole / _cprint:
- '↻ Resumed session <id> (N user messages, M total messages)'
- 'Session <id> found but has no messages. Starting fresh.'
- 'Session not found: <id>' / usage hint
This polluted the machine-readable stdout that automation wrappers capture
with $(...), making it impossible to cleanly separate the agent's answer
from the resume banner.
Fix: detect quiet mode via tool_progress_mode == 'off' and route the three
resume status messages to stderr (as plain text, matching the existing
stderr convention for session_id). Interactive mode is unchanged — it
still uses the Rich-rendered path through ChatConsole.
Surgical reapply of PR #11868. Original branch was stale against current
main; reapplied onto current cli.py by hand with original authorship
preserved via --author.
Follow-up to @someaka's fix.
Polish:
- Drop the redundant `_preflight_tokens >= threshold_tokens` clause.
`should_compress(tokens)` already short-circuits when tokens < threshold,
so the explicit comparison was dead code on the True branch.
Tests:
- Preflight: pin that should_compress() is called (anti-thrash has a vote).
Mocks should_compress to return False even with tokens past the raw
threshold and asserts no compression runs — exact bug shape from #29335.
- Gateway: AST scan of gateway/run.py asserts every
`session_entry.session_id = ...` assignment is followed by a
`session_store._save()` call within the same block. Three sites mutate
the session_id after compression; all three must persist or the next
turn loads the pre-compression transcript and re-loops. Empirically
verified the test catches the bug (drops the new _save() line → red).
AUTHOR_MAP:
- Map ed@bebop.crew -> someaka so the salvaged commit resolves to
@someaka in release notes.
Three compounding root causes:
A) run_conversation() result dict missing session_id — gateway's
dead-code guard at gateway/run.py:8700 never triggers
B) preflight compression bypasses should_compress() anti-thrashing —
re-triggers every turn when tool schemas dominate token budget
C) gateway updates session_entry.session_id in memory but doesn't
persist via session_store._save()
Fixes: #29335
Session IDs are profile-constrained, so the resume hint needs to
include the active profile for multi-profile users. Without this,
copying the hint from a non-default profile fails to resume the
correct session.
Before: hermes --resume 20260414_063228_c1240e
After: hermes --resume 20260414_063228_c1240e -p dev
Also includes -p on the resume-by-title hint. Skipped for
'default' and 'custom' profiles (no -p needed).
Surgical reapply of PR #9652. Original branch was stale against
current main (~6 months); reapplied onto current cli.py by hand
with original authorship preserved.
Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any
shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds,
SenseVoice, curl pipelines — become an STT backend with zero Python.
Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved
untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in this
PR.
Resolution order (mirrors TTS exactly):
1. Built-in (local, local_command, groq, openai, mistral, xai)
→ native handler. Always wins.
2. stt.providers.<name>: type: command → command-provider runner.
3. Plugin-registered TranscriptionProvider → plugin dispatch.
4. No match → 'No STT provider available'.
Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained;
added _resolve_command_stt_provider_config, _transcribe_command_stt,
and local helpers for template rendering, shell-quote context, and
process-tree termination. Helpers are documented as mirrors of their
tts_tool.py counterparts (kept local to avoid cross-tool private
import). Wire-in is one insertion point in transcribe_audio() after
the xai elif and before the plugin dispatcher. Plugin dispatcher
additionally defensively short-circuits when a same-name command
config exists (command-wins-over-plugin invariant).
- tests/tools/test_transcription_command_providers.py: 50 new tests
covering resolution (builtin precedence, type/command gating,
case-insensitive lookup, legacy stt.<name> back-compat), helpers
(timeout fallback, format validation, iter, has-any), template
rendering (shell-quote contexts, doubled-brace preservation),
end-to-end via _transcribe_command_stt (output_path read, stdout
fallback, timeout, nonzero exit envelope, model override,
language precedence), and dispatcher integration via the real
transcribe_audio() including command-wins-over-plugin and
builtin-shadow-rejection.
- tests/plugins/transcription/check_parity_vs_main.py: extended from
10 to 13 scenarios. New cases: command-provider-installed,
command-vs-plugin-same-name (verifies command wins precedence),
explicit-openai-with-command-shadow (verifies built-in wins).
Adds command_provider dispatch_kind detection via transcript prefix
(CMD: vs PLUGIN:) so command-provider scenarios can be distinguished
from plugin scenarios even when sharing a provider name.
- website/docs/user-guide/features/tts.md: new 'STT custom command
providers' section symmetric to the TTS section — example config,
placeholder grammar table (input_path / output_path / output_dir /
format / language / model), transcript-read-back semantics (file
first, then stdout fallback), optional keys table, behavior notes,
security note. Updated 'Python plugin providers (STT)' to include
the new 'When to pick which (STT)' decision table and updated
resolution-order section (now 4 layers instead of 3).
Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/
8623/8623 — zero regressions across 14,199 tests.
Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).
E2E live-verified in an isolated HERMES_HOME with a real .wav file:
command: → dispatched to stt.providers.my-fake-cli
plugin: → dispatched to registered TranscriptionProvider
command-wins-over-plugin: → command provider beats same-name plugin
builtin-wins-over-command: → built-in OpenAI handler fires;
stt.providers.openai: type: command
does NOT hijack it.
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
- CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged.
- Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.
The /resume usage hint shows '<session_id_or_title>' which a few users have
typed verbatim, including the angle brackets. Strip outer <>, [], "", and ''
from the argument before lookup so '/resume <abc123>' works the same as
'/resume abc123'. Mirrors the new bracket-stripping in the CLI handler.
Also let the gateway resolve a bare session ID. Previously the gateway only
called resolve_session_by_title, so '/resume <session_id>' always returned
'Session not found' even for valid IDs. Try get_session() first, fall back
to title resolution second.
Surgical reapply of PR #10215 (branch was based on a many-months-old main
and reverted ~3100 unrelated files; original commit by claw@openclaw.ai
preserved via --author).
PR #31416 (avoid persisting borrowed credential secrets) added
sanitize_borrowed_credential_payload, which strips access_token from
any auth.json pool entry whose (provider, source) isn't in the
_PERSISTABLE_PROVIDER_SOURCES allowlist.
(copilot, gh_cli) is borrowed (not in the allowlist), so the test
fixture's pre-seeded access_token now gets stripped at load_pool()
time, leaving the pool empty. resolve_target('1') then fails with
'No credential #1. Provider: copilot.'
Fix: align the test with the new contract. At runtime, copilot tokens
are hydrated by resolve_copilot_token() — mock that path so the pool
gets an entry the test can remove. The behavior under test
(suppression of gh_cli + env variants on remove) is unchanged.
CI repro on origin/main HEAD; reproduced locally with stock checkout.
Extend PR #31716 to plugin setup paths that were also using bare
getpass.getpass(): hindsight (4 sites), honcho, simplex, line. Same
mechanical swap onto hermes_cli.secret_prompt.masked_secret_prompt.
Two defense-in-depth fixes on cron output path handling:
1. cron/jobs.py:update_job() rejects mutation of the immutable 'id' field
(raises ValueError). Dashboard PUT /api/cron/jobs/{id} converts this to
HTTP 400. Without this, an attacker who can reach the update endpoint
could rename a job's id to '../escape' and move its output directory
outside OUTPUT_DIR.
2. cron/jobs.py:_job_output_dir() validates job IDs before composing
paths: rejects '.', '..', '/', '\\', absolute paths, and Windows drive
prefixes. Used by save_job_output() and remove_job() so legacy unsafe
IDs (from before this guard) fail closed rather than half-applying a
shutil.rmtree or output write outside the sandbox.
Tests:
- update_job rejects {'id': '../escape'} without renaming
- remove_job(legacy '../escape' id) raises ValueError without deleting
files outside OUTPUT_DIR or removing the job from the store
- save_job_output rejects '..', './escape', 'nested/escape',
absolute paths
- dashboard PUT /api/cron/jobs/{id} with {'id': '../escape'} returns
400, job list unchanged
Salvaged from PR #29826 by @zapabob. Simplified implementation:
- Dropped a 23-line _validate_job_output_id() helper using Path.parts
semantics. The inline check (path separators + dot-components +
is_absolute) is shorter and behaviorally identical.
- Dropped the secondary OUTPUT_DIR.resolve()/relative_to() check —
redundant once we reject any path separator at the input boundary.
- Dropped the _docs/2026-05-21_cron-output-path-hardening_codex.md
planning artifact (we don't check planning docs into the repo).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
The bug: cron/scheduler.py:_resolve_cron_enabled_toolsets returns an
LLM-supplied per-job enabled_toolsets verbatim. The disabled_toolsets
passed to AIAgent was a hardcoded [cronjob, messaging, clarify] that
ignored agent.disabled_toolsets from config.yaml. An LLM could call
cronjob(action='add', enabled_toolsets=['terminal','file'],
prompt='...') and the cron-spawned agent would receive terminal+file
even when the operator had globally disabled them.
Fix: new _resolve_cron_disabled_toolsets() helper that ALWAYS layers
agent.disabled_toolsets on top of the cron baseline. AIAgent's
disabled_toolsets takes precedence over enabled_toolsets, so this
stops the bypass regardless of what the per-job override contains.
This is the disabled-side fix. Three concurrent PRs (#25842, #25815,
#25780) proposed intersection-side variants on _resolve_cron_enabled_toolsets;
this fix is more robust because it stops the leak at the precedence
boundary AIAgent itself enforces, not at a layer above.
Regression test reproduces the issue's PoC exactly:
config.yaml has agent.disabled_toolsets=[terminal,file]; cron job has
enabled_toolsets=[web,terminal,file]; assertion: AIAgent receives
disabled_toolsets containing terminal AND file.
Salvaged from PR #25786 by @Schrotti77. Simplified the implementation:
dropped a 23-line _normalize_toolset_list() helper (handled str/tuple/
set/garbage input shapes) in favor of the existing convention
(agent_cfg.get('disabled_toolsets') or []) used elsewhere in the
codebase. YAML always parses these as lists; the elaborate normalizer
was theatre for shapes we never produce.
Closes#25752
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Follow-up on top of @jacevys' PR #21437 cherry-pick:
- _provider_model_ids() now also matches normalized == 'openai-api' for
the live /v1/models fetch path, so users see the full catalog instead
of just the curated list.
- Add gpt-5.5-pro and gpt-5.3-codex to the curated list for parity with
the existing 'openai' table (used as fallback when /v1/models fails).
- Add scripts/release.py AUTHOR_MAP entry for jacevys so CI doesn't
block the salvage PR.
The legacy runtime_calls[-1] == "anthropic" check in
test_model_switch_uses_requested_provider failed in CI under
specific test-shard scheduling with 'custom' == 'anthropic',
across multiple unrelated PRs on 2026-05-25. The May 23 pin
(commit 3127a41cb) monkeypatched parse_model_input + detect_provider_for_model
to remove the dependency on live _KNOWN_PROVIDER_NAMES module state but the
flake reappeared anyway — root cause still not reproducible locally even
under stress runs.
The other three assertions ("Provider: anthropic" in result,
state.agent.provider == "anthropic", state.agent.base_url ==
"https://anthropic.example/v1") already prove
fake_resolve_runtime_provider was called with requested="anthropic"
for the model-switch step — the agent's provider and base_url
come directly from that fake's return value. The tail-position
check was redundant and the only assertion that flaked.
Replaces runtime_calls[-1] == "anthropic" with
"anthropic" in runtime_calls so the plumbing path is still
covered without depending on call ordering.
The locale switcher appeared broken because hardcoded markdown links
(`](/docs/X)`) got double-prefixed by Docusaurus to `/docs/<locale>/docs/X`
(404) in non-English locales, and the MDX hero `<a href>` on the index
page escaped locale routing entirely.
Changes:
- Rewrite 922 `](/docs/X)` -> `](/X)` across 166 docs files (strip trailing
.md too). Docusaurus prepends locale + baseUrl itself.
- docs/index.md -> index.mdx; hero "Get Started" anchor -> Docusaurus
<Link> so it stays inside the active locale.
- Drop `ko` locale entirely from docusaurus.config.ts + delete i18n/ko/
(4 stale auto-translated kanban pages, <2% coverage, misleading).
Verified `npm run build` succeeds for both en and zh-Hans; `build/zh-Hans/
index.html` has no /docs/zh-Hans/docs/... double-prefixed paths.
PR2 will translate the 335 English docs into i18n/zh-Hans/.
#27385 reports that on macOS the browser sees the xAI 'authorization
received' success page but Hermes still raises xai_callback_timeout.
The loopback HTTP handler was silent — no log line on receipt, no log
line on wait timeout — so triaging the gap between 'browser saw
success' and 'CLI saw timeout' required either a code change or
guesswork.
Adds two INFO log lines:
- Per callback hit (handler): path, has_code, has_state, has_error,
truncated User-Agent. Booleans / fingerprints only — no actual
code/state strings leak.
- On wait timeout: report whether result.code or result.error was
populated at deadline. Distinguishes three failure modes:
1. No hit log + timeout log w/ has_code=False has_error=False
→ xAI's IDP never reached the loopback (firewall, port-binding,
IPv6/IPv4 mismatch, browser blocked private-network access).
2. Hit log w/ has_code=False has_error=False + timeout log
→ xAI hit the loopback without OAuth params (the bare-URL
case the handler already 400s on).
3. Hit log w/ has_code=True + timeout log w/ has_code=False
→ result_lock contention or race; would indicate a real bug.
133/133 in tests/hermes_cli/test_auth_xai_oauth_provider.py,
tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py, and
tests/run_agent/test_codex_xai_oauth_recovery.py.
Follow-up to @Strontvod's fix.
Tests:
- Five new tests in test_update_concurrent_quarantine.py cover the parent-
chain exclusion: the .exe launcher is excluded, an unrelated sibling
hermes.exe is still reported, multi-level ancestry is fully excluded,
PID cycles in the parent chain don't hang, and a partially-stubbed
psutil (no Process attribute) degrades gracefully instead of crashing.
- New _fake_psutil_with_parent_chain helper builds a fuller stand-in
(Process / NoSuchProcess / AccessDenied + process_iter) than the
process_iter-only SimpleNamespace the older tests use.
Hardening:
- Broaden the except in the parent-walk to bare Exception. The original
fix listed (NoSuchProcess, AccessDenied, ValueError), but those names
are evaluated lazily during exception matching — if psutil is a partial
stub without the attribute, the exception handler itself raises
AttributeError that escapes. The function is documented as 'never raises'
(the surrounding update flow depends on it), so the broader catch keeps
the contract regardless of how the dependency is shaped.
AUTHOR_MAP:
- Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit
resolves to @Strontvod in the release notes.
All 18 detect_concurrent + quarantine tests pass.
On Windows, the setuptools-generated hermes.exe launcher is a separate native
process that spawns python.exe (the interpreter running the update code).
os.getpid() returns the Python PID, but the launcher (which holds the file
lock) is the parent. Without walking the parent chain, every 'hermes update'
reports its own launcher as a concurrent instance - a false positive.
This patch builds an exclusion set containing the Python process and its
entire ancestor chain, so the running invocation never reports itself.
The new tests/docker/ suite (added by this PR) was being picked up by the
sharded pytest matrix in tests.yml, where its session-scoped `built_image`
fixture issued a 3-7min `docker build` under tests/docker/conftest.py's
180s pytest-timeout cap. Every test in the directory failed in fixture
setup across all 6 shards.
Fix the suite so it actually runs (not skips):
1. Wire the docker tests into docker-publish.yml's build-amd64 job, right
after the existing smoke test. The image is already loaded into the
local daemon as `nousresearch/hermes-agent:test`; set
HERMES_TEST_IMAGE to that and the fixture's pre-built-image branch
short-circuits the rebuild. 21 tests run in ~90s locally against a
prebuilt image, no rebuild cost on top of the existing build step.
2. Exclude tests/docker/ from scripts/run_tests_parallel.py's default
discovery so the sharded matrix in tests.yml stops trying to build
the image. Explicit positional paths (`pytest tests/docker/` or
`scripts/run_tests.sh tests/docker/`) still pick the suite up — the
skip rule honors directory-level user intent, matching the existing
per-file override pattern.
The dedicated docker-tests step runs on every PR that touches docker
code (the existing path filters on docker-publish.yml already cover
`tests/docker/**` via `**/*.py`), so the suite gates real changes.
(cherry picked from commit 4c481860ce)
After the supervise-perms fix lands, the s6 lifecycle actually works
for the hermes user — hermes -p <profile> gateway start now genuinely
brings the supervised gateway up rather than silently no-op'ing on
EACCES. That exposes a latent bug in this test's assertion: it
expected 'want up' to appear literally in s6-svstat output, but
s6-svstat elides redundancies — when the slot is currently up AND
s6 wants it up, the output is just 'up (pid N pgid N) X seconds';
the explicit 'want up' token only appears when current ≠ wanted
(e.g. 'down (exitcode 1) … , want up' on a crash-loop).
Add a small helper _svstat_wants_up() that reads the want-state
correctly across both spellings:
* 'up …' → wanted up (unless explicit 'want down')
* 'down …, want up' → wanted up explicitly
* 'down …' → wanted down
Both stop and start assertions now use the helper. Also rewords
the module docstring to acknowledge that the supervised process
may succeed OR crash-loop depending on environment, but the want-
state contract holds either way.
(cherry picked from commit 02c933aedc)
PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed
because the test hardcoded known_inits = ('tini', 'dumb-init',
'catatonit'). The PR replaced tini with s6-overlay's /init (which execs
s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name,
so the substring scan against ENTRYPOINT missed it.
Two-part fix:
1. Extend the accepted token list to include 's6-overlay', 's6-svscan',
and '/init'. The contract these tests enforce is behavioural ('some
PID-1 init reaps SIGCHLD'), so the names list is purely a recognition
table and any reaper-capable family should qualify.
2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the
sibling check) against comment-only matches. It was scanning the full
Dockerfile text and only passed because the word 'tini' is still in
a historical comment explaining why we used to use it. The next
person to clean up that comment would have silently broken the test.
New _instruction_text() helper joins only the parsed, non-comment
Dockerfile instructions so stale comments can't satisfy the check.
(cherry picked from commit ffc1bb6393)
Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.
The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.
The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.
The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.
New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
svc_dir/event/ hermes:hermes 03730
svc_dir/supervise/ hermes:hermes 0755
svc_dir/supervise/event/ hermes:hermes 03730
svc_dir/supervise/control hermes:hermes 0660 (FIFO)
svc_dir/log/event/ hermes:hermes 03730 (if log/ present)
svc_dir/log/supervise/ hermes:hermes 0755
svc_dir/log/supervise/event/ hermes:hermes 03730
svc_dir/log/supervise/control hermes:hermes 0660 (FIFO)
The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).
Wiring
------
1. S6ServiceManager.register_profile_gateway calls
_seed_supervise_skeleton(tmp_dir) just before publishing the
slot via Path.replace. Runtime-registered profile gateways are
set up by hermes.
2. container_boot._register_service does the same in the cont-init.d
reconciliation path so boot-time-restored profile slots inherit
the same layout.
3. New cont-init.d/015-supervise-perms script chowns the supervise/
and event/ trees for STATIC s6-rc services (dashboard,
main-hermes). These are spawned by s6-rc before cont-init.d
gets to run, so the EEXIST-trick doesn't apply; we chown the
already-existing tree instead. s6-supervise keeps using the
same files; it never re-asserts ownership on a running service.
The script skips s6-overlay internal services (s6rc-*,
s6-linux-*) so the supervision tree itself stays root-only.
015- slot is intentional: lex-sorts between 01-hermes-setup
and 02-reconcile-profiles in the container's C-locale, so
the chown finishes before the reconciler walks the scandir.
Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.
Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:
(a) /run/service/dashboard is a symlink to a TRANSIENT
/run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
mid-transaction; the touch landed in a soon-to-be-discarded
tmp.
(b) Even when written to the final /run/s6-rc/servicedirs/
location, the 'down' file is only consulted by s6-supervise
at slot startup. s6-rc's user-bundle explicitly transitions
'dashboard' to 'up' on every boot, overriding any down
marker.
The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.
03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).
Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.
References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
+ Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
semantics in finish scripts
(cherry picked from commit c41f908ad4)
Documents five approaches for adding tools beyond what the official
image ships with: npx/uvx for npm/Python tools, ad-hoc apt installs
that Hermes remembers, derived images for durability, sidecar
containers for multi-service stacks, and upstreaming via issue/PR
for broadly useful additions.
Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1
contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added
with #15012) hardcoded tini/dumb-init/catatonit as the only accepted
inits, so they failed after #30136 replaced tini with s6-overlay's
/init.
s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly
on SIGCHLD — same contract the test exists to enforce. Two updates:
* test_dockerfile_installs_an_init_for_zombie_reaping — accept
's6-overlay' as a known-installed marker (matches the
s6-overlay install layer in Ben's Dockerfile).
* test_dockerfile_entrypoint_routes_through_the_init — accept
'/init' as a known-routed marker (s6-overlay's PID-1 binary
lives at /init by convention).
Both assertions still fire if a future Dockerfile rewrite drops
the init entirely. Local: 7/7 pass.
Two CI follow-ups to @benbarclay's #30136 salvage:
1. scripts/run_tests_parallel.py — add 'docker' to _SKIP_PARTS so
the new tests/docker/ harness doesn't run in the regular test (N)
matrix. The harness builds the real Dockerfile in a session
fixture, which can exceed pytest-timeout's 180s ceiling on
ubuntu-latest where Docker IS available — it surfaced as 6
identical setup-timeout failures across slices 1–6 on the first
CI run.
The docker harness has its own dedicated runner via
.github/actions/hermes-smoke-test (added in #30136) plus the
docker-lint workflow. Same treatment as tests/integration/ and
tests/e2e/ — runs separately, not in the main shards.
2. hermes_cli/service_manager.py — pin encoding='utf-8' on the
/proc/1/comm read_text call. Ruff PLW1514 enforcement rolled in
between Ben's last push and the salvage; pure ruff-fix, no
behavior change.
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.