Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.
The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.
The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.
The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.
New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
svc_dir/event/ hermes:hermes 03730
svc_dir/supervise/ hermes:hermes 0755
svc_dir/supervise/event/ hermes:hermes 03730
svc_dir/supervise/control hermes:hermes 0660 (FIFO)
svc_dir/log/event/ hermes:hermes 03730 (if log/ present)
svc_dir/log/supervise/ hermes:hermes 0755
svc_dir/log/supervise/event/ hermes:hermes 03730
svc_dir/log/supervise/control hermes:hermes 0660 (FIFO)
The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).
Wiring
------
1. S6ServiceManager.register_profile_gateway calls
_seed_supervise_skeleton(tmp_dir) just before publishing the
slot via Path.replace. Runtime-registered profile gateways are
set up by hermes.
2. container_boot._register_service does the same in the cont-init.d
reconciliation path so boot-time-restored profile slots inherit
the same layout.
3. New cont-init.d/015-supervise-perms script chowns the supervise/
and event/ trees for STATIC s6-rc services (dashboard,
main-hermes). These are spawned by s6-rc before cont-init.d
gets to run, so the EEXIST-trick doesn't apply; we chown the
already-existing tree instead. s6-supervise keeps using the
same files; it never re-asserts ownership on a running service.
The script skips s6-overlay internal services (s6rc-*,
s6-linux-*) so the supervision tree itself stays root-only.
015- slot is intentional: lex-sorts between 01-hermes-setup
and 02-reconcile-profiles in the container's C-locale, so
the chown finishes before the reconciler walks the scandir.
Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.
Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:
(a) /run/service/dashboard is a symlink to a TRANSIENT
/run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
mid-transaction; the touch landed in a soon-to-be-discarded
tmp.
(b) Even when written to the final /run/s6-rc/servicedirs/
location, the 'down' file is only consulted by s6-supervise
at slot startup. s6-rc's user-bundle explicitly transitions
'dashboard' to 'up' on every boot, overriding any down
marker.
The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.
03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).
Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.
References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
+ Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
semantics in finish scripts
(cherry picked from commit c41f908ad4)
Follow-up to @benbarclay's Docker s6 PR (#30136). The Phase 4 hooks
`_maybe_register_gateway_service` and `_maybe_unregister_gateway_service`
were already documented as "no-op on host", but they reached that no-op
by:
1. importing `hermes_cli.service_manager`
2. calling `get_service_manager()` (which calls `detect_service_manager()`)
3. checking `mgr.supports_runtime_registration()` and returning False
If anything in step 1 or 2 raised an unexpected exception (e.g. a host
machine with a partial s6 install — `/proc/1/comm == s6-svscan` somehow,
but `/run/s6/basedir` absent, or vice versa), the `except Exception`
in the hook would print a confusing "⚠ Could not register s6 gateway
service: ..." warning on a non-container machine that has never touched
the container.
Reorder so `detect_service_manager() != "s6"` is checked FIRST, and
return silently for any detection failure. Host machines now:
- never import the s6 backend
- never call get_service_manager()
- never print an s6-shaped warning under any failure mode
E2E confirmed on host Linux (systemd):
`_maybe_register_gateway_service(...)` produces empty stdout,
detect_service_manager() returns "systemd".
Existing tests updated to patch `detect_service_manager` for the s6
call-through cases (they previously relied on get_service_manager
being the only gate, which is no longer true). Added one new test —
`test_register_silent_when_detect_throws` — asserting that a broken
detector cannot leak a warning to host users.
cc @benbarclay — visible behavior change vs. your branch is one
fewer code path on host. Test changes are minimal (one helper +
`_patch_detect_s6` opt-in per s6 test). Happy to revert if you
prefer the original shape.
PR #30136 review noted the asymmetry: `register_profile_gateway`
used tmp_dir + rename to publish a new service slot atomically,
but the boot-time reconciler wrote files into the slot directly.
Same underlying concern (a concurrent s6-svscan rescan could
observe a half-populated directory), different code path.
Rewrite `container_boot._register_service` to mirror the manager:
build everything in `<scandir>/gateway-<profile>.tmp/`, then
`Path.replace` into place. If a previous interrupted run left a
`.tmp` sibling, it's cleaned up before the new build starts. If
the target already exists, it's removed before the rename so
`Path.replace` doesn't error on a non-empty target (Linux `rename`
overwrites empty targets only).
Three new tests: atomic publication leaves no .tmp leftovers,
overwriting an existing slot still leaves no .tmp leftovers, and
a stale .tmp from an interrupted run is cleaned up automatically.
PR #30136 review noted: container-boot.log was append-only with no
rotation. On a long-lived container with frequent restarts and
many profiles it would grow unboundedly (~80 B per profile per
reconcile pass).
Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`,
≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the
current file is renamed to `container-boot.log.1` (replacing any
existing one) before new entries are appended. Worst case is two
files at ~512 KiB — well within visibility limits for grep/cat.
Rotation is intentionally simple (no logrotate or s6-log machinery
for one append-only file). Failures during rotation are logged via
the module logger and treated as non-fatal — we keep appending to
the existing file rather than dropping the reconcile entry. Three
new unit tests cover above-threshold rotation, below-threshold
non-rotation, and overwrite of an existing .1 file.
PR #30136 review caught: `_allocate_gateway_port()` in profiles.py
computed a SHA-256-derived port that was threaded through
`register_profile_gateway(profile, port=N)` →
`_render_run_script(profile, port, extra_env)` → and then **ignored**.
The rendered run script picked the bind port from the profile's
config.yaml (`[gateway] port = …`), never from the allocator. So
the entire allocator + parameter chain was dead code.
Remove:
* `hermes_cli.profiles._allocate_gateway_port` (deterministic
SHA-256 → [9200, 9800) — never used).
* `port` kwarg from `ServiceManager.register_profile_gateway`
(Protocol + Mixin + S6 implementation).
* `port` positional arg from `_render_run_script(profile, port,
extra_env)` — now `_render_run_script(profile, extra_env)`.
* The pass-through call in `profiles._maybe_register_gateway_service`.
config.yaml is now the single source of truth for gateway port
selection — matches reality and reduces the API surface. Three
explanatory comments in service_manager.py / profiles.py document
the retirement so future readers don't reach for the allocator and
find a ghost.
Tests: drop the three `_allocate_gateway_port` tests; update
fakes' signatures throughout test_service_manager.py and
test_profiles_s6_hooks.py to match the new no-port API.
PR #30136 review caught: `S6ServiceManager.start/stop/restart` called
`subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as
a raw `CalledProcessError` traceback. The two cases operators
actually hit are:
1. The service slot doesn't exist — most commonly because the user
typed a profile name wrong (`hermes -p typo gateway start`).
2. s6-svc itself fails — most commonly EACCES on the supervise
control FIFO when running unprivileged.
Both deserve named errors with actionable messages, not stacktraces.
Changes:
* Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`:
- `GatewayNotRegisteredError(profile)` — carries the unprefixed
profile name; message: `no such gateway 'typo': register it
with `hermes profile create typo` first, or pass an existing
profile name via `-p <name>``.
- `S6CommandError(service, action, returncode, stderr)` — carries
the s6-svc rc and stderr; message: `s6-svc start on
'gateway-coder' failed (rc=111): <stderr>`.
* Factor lifecycle dispatch through `_run_svc(flag, label, name)`:
pre-checks that the service directory exists (raises
GatewayNotRegisteredError before invoking s6-svc), then runs
s6-svc and translates any CalledProcessError into S6CommandError.
* `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway`
catches both errors and prints `✗ <message>` + `sys.exit(1)`
instead of letting the exception bubble. The dispatch path that
used to dump a traceback at the user now gives an actionable
one-liner.
Tests: 6 new tests for the error types and their CLI rendering;
existing lifecycle test pre-seeds the slot directory before calling
`mgr.start` etc.
PR #30136 review caught: `hermes gateway start` (no `-p`) inside
the container resolves `_profile_suffix() == ""` → service name
`gateway-default`, but no such slot was ever registered. The Phase 4
profile-create hook only fired on `hermes profile create <name>`,
and the root profile (which lives at the top of $HERMES_HOME, not
under `profiles/`) was never one of those. So bare `hermes gateway
start` landed on `s6-svc -u /run/service/gateway-default` →
uncaught `CalledProcessError` → traceback to the user.
Changes:
1. `reconcile_profile_gateways` now always registers a
`gateway-default` slot before iterating named profiles. Its
prior state is read from `$HERMES_HOME/gateway_state.json`
(sibling to the profile root, not under `profiles/`); stale
runtime files there are swept the same way. Auto-up only if the
prior state was `running` — same rule as named profiles.
2. `S6ServiceManager._render_run_script` special-cases
`profile == "default"` to emit `hermes gateway run` with NO
`-p` flag. Passing `-p default` would resolve to
`$HERMES_HOME/profiles/default/` — a different profile that
almost certainly doesn't exist. The empty profile-suffix
convention is the dispatcher's contract and the run script has
to match.
3. A user-created `profiles/default/` collides with the reserved
root-profile slot; the reconciler now skips it with a warning
rather than producing two registrations of the same service name.
Action-list ordering is stable: `default` first, then named
profiles in directory order. Boot-log readers can rely on this.
Tests: 8 new dedicated default-slot tests plus updates to every
existing test that asserted against the action list (via the new
`_named_actions` helper that drops the always-present default
entry).
PR #30136 review caught that `hermes gateway stop --all` and
`... restart --all` were broken under s6. The Phase 4 dispatcher was
gated on `not stop_all` (and the symmetric restart_all), so `--all`
fell through to `kill_gateway_processes(all_profiles=True)`. pkill
SIGTERMed every gateway, s6-supervise observed the crashes, and
restarted every gateway ~1s later — net effect: `--all` *kicked*
gateways instead of *stopping* them.
Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates
`mgr.list_profile_gateways()` and routes stop/restart through each
service slot. s6's `want up`/`want down` flips correctly, so a
stop persists. Partial failures are surfaced per-profile with a
running success count; the host pkill path is only reached when s6
isn't in play.
`start --all` isn't a CLI surface — the helper rejects it and
returns False (host code path can take over).
PR #30136 review surfaced two issues, both rooted in the same audit gap:
docker integration tests were running as root, not the unprivileged
`hermes` user (UID 10000) that the runtime actually uses via
`s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to
the s6 control surface worked as root in the tests but was inert in
production.
Fixes:
1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`,
which is root-only readable. For UID 10000 the symlink yields
PermissionError, `resolve()` silently returns the unresolved path,
and `exe.name == "exe"` — so detection always returned False, the
service-manager runtime-registration path was inert, and every
`hermes profile create` / `hermes -p X gateway start` silently
skipped the s6 hook. Replace with `/proc/1/comm` (world-readable)
+ `/run/s6/basedir` (s6-overlay-specific) — both required, fail
closed.
2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/`
{control,lock} to hermes so `s6-svscanctl -a/-an` works without
root. Previously the directory chown stopped at `/run/service`
and the FIFO inside stayed root-owned, so `register_profile_gateway`
from hermes failed at the rescan-trigger step with EACCES — the
wrapper in profiles.py caught the exception and printed a swallowed
warning, so profile creation appeared to succeed while the slot
was rolled back.
Audit changes to flush this class of bug next time:
- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py`
that default to `-u hermes`. The module docstring explains why and
flags `user="root"` as opt-in only for tests that explicitly need
root (none currently do).
- Refactor every `docker exec` call in tests/docker/ through the new
helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py,
test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states
(both signals present; comm wrong; basedir missing; PermissionError
on /proc/1/comm; missing /proc — non-Linux). The PermissionError
test is the explicit regression guard for the original bug.
Known follow-up: the per-service `supervise/control` FIFO inside each
`/run/service/gateway-<profile>/supervise/` is created root-owned by
s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc
-u/-d/-t` from the hermes user will get EACCES on those. The audit
under `-u hermes` will reveal this in lifecycle tests — surfacing the
issue cleanly so it can be fixed in a focused follow-up (likely via a
small SUID helper or a polling chown loop in cont-init.d). The
detection + svscanctl fixes here are independent and complete on
their own.
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a cont-
init.d-time reconciliation pass that survives `docker restart`.
Task 4.0 — container-boot reconciliation:
/run/service/ is tmpfs, so every `docker restart` wipes every
per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
invokes hermes_cli.container_boot.reconcile_profile_gateways() on
every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
gateway_state.json, recreates the s6 service slot, and auto-starts
only those whose last state was 'running'. Other states
(stopped, starting, startup_failed, missing) register the slot
in the down state — avoiding crash-loops across restarts for a
gateway that was broken last boot. Per-profile outcome is recorded
to $HERMES_HOME/logs/container-boot.log.
Implementation: hermes_cli/container_boot.py + 12 unit tests.
Profile-marker is SOUL.md, not config.yaml, because `hermes profile
create` only seeds SOUL.md by default (config.yaml comes from
`hermes setup`).
Task 4.1 / 4.2 — profile create/delete hooks:
hermes_cli/profiles.py::create_profile now calls
_maybe_register_gateway_service(<canon>) at the end, which routes
through ServiceManager.register_profile_gateway when running on s6
and no-ops on host backends. delete_profile mirrors with
_maybe_unregister_gateway_service. _allocate_gateway_port produces
a deterministic SHA-256-derived port in [9200, 9800).
Task 4.3 — gateway dispatch + remove rejection arms:
_dispatch_via_service_manager_if_s6(action) intercepts
start/stop/restart at the top of each subcommand and routes them
through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
`elif is_container():` rejection arms are kept as fallback for
pre-s6 containers / unsupported runtimes, but only ever fire when
detect_service_manager() != 's6'. install/uninstall under s6
print informational guidance pointing users at profile create/delete.
Removed the two xfail(strict=True) markers from
tests/docker/test_profile_gateway.py — both tests now pass strictly.
Task 4.4 — status reporting:
get_gateway_runtime_snapshot() reports
Manager: 's6 (container supervisor)' inside an s6 container instead
of 'docker (foreground)'.
Plan-vs-reality drift fixed in this commit:
- Plan's S6ServiceManager._render_run_script used
`gateway start --foreground --port {port}` — invented args; the
real CLI is `gateway run`. Switched accordingly. port arg
retained for API parity but now documented as 'currently ignored'.
- Plan's reconciler keyed on config.yaml; switched to SOUL.md
(config.yaml is created by hermes setup, not by hermes profile
create, so the original gate caught nothing).
- The plan's _dispatch helper used _profile_arg() which returns
'--profile <name>' (i.e. with the flag prefix). Switched to
_profile_suffix() which returns the bare name.
- Architecture B's docker exec doesn't get /command on PATH or
the venv on PATH; Dockerfile's runtime PATH now includes
/opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
without sourcing the venv.
- stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
boot, not just on the UID-remap path. Without this, files created
by docker-exec-as-root accumulate and the next reconciler run
fails with PermissionError reading SOUL.md.
Test harness:
19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
passing). 78 unit tests across service_manager + container_boot +
profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
pass cleanly.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Phase 3 of the s6-overlay supervision plan. Implements the runtime-
registration surface from D4 — only the s6 backend supports
register_profile_gateway / unregister_profile_gateway /
list_profile_gateways; host backends continue to raise
NotImplementedError. No caller yet (Phase 4 wires in the profile
create/delete hooks).
Key implementation notes:
- Service directory shape: /run/service/gateway-<profile>/{type,run,log/run}.
Atomic register: write to gateway-<profile>.tmp, fsync via
os.rename. Cleanup on rescan failure.
- Run script uses #!/command/with-contenv sh so HERMES_HOME and any
extra_env arrive at exec time. The hermes -p <profile> gateway
start --foreground --port <port> command is wrapped in
s6-setuidgid hermes for the per-service privilege drop (OQ2-A).
- Log script (OQ8-C): persists via s6-log to
${HERMES_HOME}/logs/gateways/<profile>/. CRITICAL — HERMES_HOME is
a runtime env-var expansion in the rendered script, NOT a Python
f-string substitution. Negative-asserted in
test_s6_register_creates_service_dir_and_triggers_scan so
regressions are caught.
- PATH gotcha: /command/ is only on PATH for processes spawned by
the supervision tree (services, cont-init.d). `docker exec` and
profile-create hooks don't get it. S6ServiceManager calls all
s6-* binaries via absolute path through the new _S6_BIN_DIR
constant so callers don't have to fix up env vars.
- validate_profile_name rejects path-traversal, leading-dash (s6
would parse as a flag), uppercase, whitespace, and names >251
chars (s6-svscan default name_max).
Test coverage:
- 13 new unit tests in tests/hermes_cli/test_service_manager.py
(kind detection, run-script content, env quoting, register
rollback on rescan failure, unregister idempotence, list filter,
lifecycle dispatch, svstat parsing). Total: 36 passing.
- 2 new in-container integration tests in
tests/docker/test_s6_profile_gateway_integration.py validating
end-to-end registration against a real s6 supervision tree.
Docker harness: 14 passed, 2 xfailed (Phase 4 target unchanged).
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Phase 1 of the s6-overlay supervision plan. Pure-refactor addition:
introduces the abstract interface (with runtime_checkable Protocol),
detect_service_manager(), validate_profile_name(), and thin
SystemdServiceManager / LaunchdServiceManager / WindowsServiceManager
wrappers around the existing systemd_* / launchd_* / gateway_windows.*
module-level functions. No host call site was modified — host code
continues to use the existing functions directly; the protocol is for
new backend-agnostic code (Phase 4 profile create/delete hooks and the
Phase 4 s6 dispatch path in 'hermes gateway start/stop/restart').
WindowsServiceManager.install() forwards the v3 kwargs (start_now,
start_on_login, elevated_handoff) added in PRs #28169-adjacent so
non-Windows callers — there aren't any today — can opt in.
The s6 backend lands in Phase 3; until then get_service_manager()
raises a clear error if invoked on a host that detects as 's6'.
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.
The hook covers cases the shell-template grammar can't reasonably
express:
- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows
None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.
## Resolution order
The dispatcher's resolution order is the load-bearing invariant:
1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
→ command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
→ plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
the `hermes tools` row list defensively.
Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
`_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
config defensively so a refactor of the caller can't silently break
the invariant.
## New files
- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
`list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
`voice_compatible` (all optional with sane defaults). Mirrors
`agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
`agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
## Modified files
- `hermes_cli/plugins.py` — `register_tts_provider()` method on
`PluginContext`. Matches the gating shape of
`register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
`_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
plugin rows into the Text-to-Speech picker category alongside the
10 hardcoded built-in rows.
## Tests
- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
`tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
built-in-always-wins, command-wins-over-plugin, plugin dispatch,
exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
picker surface, builtin shadowing defense, integration with
`_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
parity harness vs `origin/main`. The only intentional diff is
`fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification
- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
`text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
## Docs
- `website/docs/user-guide/features/tts.md` — new "Python plugin
providers" section with a decision table (command-provider vs
plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
mention both surfaces (command-provider primary, plugin for
SDK/streaming).
Closes#30398
The web dashboard's Anthropic OAuth helper wrote the credential file
straight to its final destination and relied on the process umask for
permissions. That left the dashboard-specific path weaker than the
existing auth writers, which already use owner-only permissions and
safer write semantics.
This change keeps the scope narrow: make the dashboard helper write via
a temp file + replace, chmod the final file to owner-only, and add a
focused regression test for both permission handling and atomic-write
behavior.
Constraint: Must preserve the existing dashboard OAuth flow and credential-pool side effects
Rejected: Broader auth-storage refactor | unnecessary scope for a single verified inconsistency
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep dashboard credential writes aligned with existing auth storage semantics; do not reintroduce direct write_text() here without matching chmod/atomic behavior
Tested: pytest -o addopts='' tests/hermes_cli/test_web_server_oauth_write.py tests/hermes_cli/test_web_server.py -q (78 passed)
Not-tested: Cross-platform permission semantics on Windows-managed filesystems
Board defaults represent persistent project checkouts. Scratch workspaces
are auto-deleted on completion and must stay under the per-board scratch
root that resolve_workspace() creates. Inheriting default_workdir for a
scratch task pointed the cleanup path at the user's source tree — the
data-loss vector documented in #28818.
The containment guard in _cleanup_workspace (just added) is the safety
rail. This commit prevents the bad state from being created in the first
place: only persistent kinds (dir/worktree) inherit board defaults.
Tests updated to cover the new semantics: scratch with default_workdir
set keeps workspace_path=None; dir/worktree still inherits the board
default.
Salvaged from PR #31315 by @leeseoki0 — prevention layer on top of the
#28819 containment fix by @briandevans.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Copilot review on PR #28819 flagged that `_is_managed_scratch_path` accepted
the entire `<kanban_home>/kanban` subtree as managed scratch storage. With
that, a task whose `workspace_kind='scratch'` and `workspace_path` was
mis-set to `<kanban_home>/kanban`, `.../kanban/logs`, or a board's
metadata directory (e.g. `.../kanban/boards/<slug>` without the
`workspaces/` child) would pass the containment guard and let task
completion `shutil.rmtree` Hermes' own DB, metadata, and log subtrees.
Tighten the guard:
* Allowed roots are now exclusively `workspaces/` directories — the
`HERMES_KANBAN_WORKSPACES_ROOT` override, `<kanban_home>/kanban/workspaces`,
and each `<kanban_home>/kanban/boards/<slug>/workspaces` discovered on
disk.
* Require strict descendancy: a path equal to a root itself is rejected
too, because deleting a workspaces root would wipe every task's scratch
dir at once.
Add a regression test covering the three Copilot-named attack paths
(kanban root, kanban/logs, board root without `workspaces/`) plus the
workspaces-root-itself case, and confirm the inner task-id dir still
matches.
A board's ``default_workdir`` (e.g. ``hermes kanban boards
set-default-workdir my-board /path/to/real/source``) is copied into
``tasks.workspace_path`` for tasks created without an explicit
``workspace_kind``. Those tasks default to ``workspace_kind='scratch'``,
so completion calls ``_cleanup_workspace`` and unconditionally runs
``shutil.rmtree(wp, ignore_errors=True)`` — deleting the user's real
source tree as if it were disposable scratch storage.
Add ``_is_managed_scratch_path()`` and gate ``_cleanup_workspace`` on
it: only delete paths under ``HERMES_KANBAN_WORKSPACES_ROOT`` (the
worker-side override the dispatcher injects) or under the active kanban
home's ``kanban/`` subtree (covering both the legacy default-board root
and per-board ``kanban/boards/<slug>/workspaces`` roots). Anything else
gets a warning log and is left alone, so a misconfigured
``default_workdir`` can no longer destroy user data on task completion.
Adds 'hermes security audit' — a one-shot vulnerability scan against
OSV.dev covering three surfaces a Hermes user actually controls:
1. The running Python's installed PyPI dists (importlib.metadata)
2. Plugin requirements.txt / pyproject.toml pins under ~/.hermes/plugins/
3. Pinned npx/uvx MCP servers in config.yaml
Zero new dependencies (stdlib urllib + importlib.metadata + tomllib +
concurrent.futures). No auth required for OSV's public batch API.
Flags: --json, --fail-on {low,moderate,high,critical} (default: critical),
--skip-venv, --skip-plugins, --skip-mcp
Output groups findings by source, sorts by severity descending, surfaces
fixed-versions inline. Exit 1 when any finding meets the --fail-on tier.
Deliberately out of scope: globally-installed pip/npm, editor/browser
extensions, daily background scans, auto-blocking of installs. The audit
is on-demand by design — daily scans become noise the user trains
themselves to ignore.
response_store.db (api server) holds conversation history including tool
payloads, prompts, and results. webhook_subscriptions.json holds per-route
HMAC secrets. Under a permissive umask (e.g. 0o022, default on most
distros) both files were created mode 0o644 — readable by other local
users on shared boxes.
- gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM
sidecars to 0o600 after __init__, then trusts the inode. (Original
contributor patch chmod'd after every _commit() — wasteful on a hot
api_server path; chmod-on-create is sufficient since SQLite preserves
mode bits across writes.)
- hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp
(which itself creates the file with 0o600), chmods the temp before the
atomic rename, and re-asserts 0o600 on the destination so an existing
permissive file from before this fix gets narrowed.
Tests cover (a) creation under permissive umask leaves 0o600 and (b) an
existing 0o644 webhook_subscriptions.json gets narrowed on next save.
Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply
on Windows.
Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py
side from chmod-on-every-commit to chmod-on-create.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Adds an --ids flag to 'hermes kanban promote' mirroring the existing
block/schedule convention, so the marquee use case from issue #28822
(promote all children of a closed organizational parent in one shot)
doesn't require a shell loop. Single-id JSON output stays a flat
object for back-compat; bulk emits a list. Dedupes positional + --ids
so the same id can't be promoted twice in one call. 5 new CLI-level
tests cover bulk happy path, partial-failure exit code, JSON shapes,
and dedup.
Also adds the thedavidmurray noreply-email -> github-login mapping in
scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP
contributor-credit check.
Adds `hermes kanban promote <task_id>` for manual lifecycle recovery
when an auto-promote daemon misses the parent-done transition (issue
#28822). Refuses promotion unless every parent dep is done/archived
(override with --force). Emits a `promoted_manual` audit event distinct
from the automatic `promoted` kind, so audit consumers can filter
human-driven from system-driven promotions. Supports --dry-run and
--json for orchestration. Does not mutate assignee/claim state — the
dispatcher picks the card up via its normal ready polling path.
Closes#28822.
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.
This change adds a generic plugin surface for auxiliary task
registration:
ctx.register_auxiliary_task(
key='memory_retain_filter',
display_name='Memory retain filter',
description='hindsight pre-retain dedup/extract',
defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
)
After registration, the task automatically:
- Appears in 'hermes model → Configure auxiliary models' picker via
a new _all_aux_tasks() merge of built-in + plugin tasks
- Has its provider/model/base_url/api_key bridged from config.yaml
to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
(gateway/run.py now uses a dynamic bridged-keys set instead of
a hardcoded per-task dict)
- Gets plugin-declared defaults (timeout, extra_body, etc.) layered
underneath user config so unconfigured plugin tasks still work
(agent/auxiliary_client._get_auxiliary_task_config)
- Resets to auto via 'Reset all to auto' alongside built-ins
Validation:
- Rejects shadowing of built-in keys (vision, compression, etc.)
- Rejects invalid key shapes (must match [A-Za-z0-9_]+)
- Rejects cross-plugin collisions (clear error)
- Allows same-plugin re-registration (idempotent updates)
Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.
Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.
Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.
Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.
Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
Null bytes in API key values (introduced by copy-paste) crash
os.environ[k] = v with ValueError: embedded null byte, preventing
hermes from starting at all.
First scratch workspace creation on an install now emits a one-shot
warning log + a 'tip_scratch_workspace' event on the task. Sentinel
file at ~/.hermes/kanban/.scratch_tip_shown silences subsequent
creations across the whole install.
Behavior unchanged — scratch is still ephemeral by design. This just
makes the design visible to new users (reported in user community:
'progress files vanished, no warning anywhere').
Docs (en + ko) updated to spell out 'Deleted when the task completes'
on the scratch bullet and 'Preserved on completion' on worktree/dir.
`_deliver_kanban_artifacts` routes candidates through
`BasePlatformAdapter.filter_local_delivery_paths` (added in 41d2c758c),
which rejects paths outside `MEDIA_DELIVERY_SAFE_ROOTS`. The two
artifact-delivery tests create fixtures under `tmp_path`, which lives
outside the cache roots — so under CI's hermetic HOME the filter
silently dropped both fake files and the assertions on
`images_uploaded` / `documents_uploaded` failed.
Fix: monkeypatch `HERMES_MEDIA_ALLOW_DIRS=str(tmp_path)` in both tests
so the safety filter accepts the fixtures. Production behaviour
unchanged; test-side fix only.
CI fail repro on origin/main: test (6) shard, both
test_notifier_uploads_artifacts_on_completion and
test_notifier_artifact_delivery_skips_missing_files.
PR #41d2c758c ("Fix unsafe gateway media path delivery") tightened
`validate_media_delivery_path` so that artifacts emitted by the agent
must live inside `MEDIA_DELIVERY_SAFE_ROOTS` (Hermes-managed cache
dirs) or an operator-allowlisted root via `HERMES_MEDIA_ALLOW_DIRS`.
Two kanban-notifier tests put their PDFs and PNGs under pytest's
`tmp_path`, which is correctly rejected by the new validator. They
started failing on main as soon as that PR landed:
FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_uploads_artifacts_on_completion
FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_artifact_delivery_skips_missing_files
Symptom in logs: "Skipping unsafe local file path outside allowed
roots". The validator is doing exactly what it should — the tests were
relying on the looser pre-fix behaviour.
Fix: add `HERMES_MEDIA_ALLOW_DIRS=tmp_path` to the `kanban_home`
fixture so artifacts under `tmp_path` are recognised as safe. This is
the same allowlist mechanism the operator-facing env var documents.
35 new tests across 5 classes covering every layer of the
GHSA-5qr3-c538-wm9j defence. Each class corresponds to one chokepoint
so a regression in any single layer is caught by the named class:
* ``TestProjectPluginsEnvGate`` (13 cases) — parametrised over both
the documented truthy values (``1`` / ``true`` / ``yes`` / ``on``
+ uppercase variants) and the previously-bypassing falsy strings
(``0`` / ``false`` / ``no`` / ``off`` / ``""`` / ``False``). The
falsy half is the direct env-bypass repro: pre-fix any non-empty
string enabled the project source.
* ``TestApiPathSanitizer`` (16 cases) — unit-level coverage of the
new ``_safe_plugin_api_relpath`` helper. Absolute paths
(``/etc/passwd``, ``/tmp/payload.py``, ``/usr/bin/python``),
``..``-traversal payloads (including nested ``subdir/../../..``),
and non-string / empty / whitespace-only values must all return
``None``. Safe relative paths (``api.py``, ``backend/routes.py``)
round-trip unchanged so legitimate plugins keep working.
* ``TestDiscoveryScrubsApiField`` (3 cases) — end-to-end through
``_discover_dashboard_plugins`` with a real manifest on disk.
Verifies that the cached plugin entry's ``_api_file`` is
scrubbed *at discovery time* (``None`` + ``has_api: False``) so
any downstream consumer can't be tricked into re-deriving the
unsafe path from cache.
* ``TestMountApiRoutesRefusesUntrusted`` (3 cases) — pokes
synthetic plugin entries with each refusal vector directly into
the cache and patches ``importlib.util.spec_from_file_location``
to assert it is *not* invoked for project-source / traversal
payloads, and *is* invoked normally for bundled / user plugins.
* ``TestEndToEndPocBlocked`` (1 case) — reproduces the original
advisory PoC: operator sets ``HERMES_ENABLE_PROJECT_PLUGINS=0``
believing project plugins are off, attacker plants a manifest in
CWD's ``.hermes/plugins/`` with ``api`` pointing at an absolute
payload path. Asserts that the importer is never called against
the payload path *and* that ``hermes_dashboard_plugin_evil`` is
not in ``sys.modules`` after the mount routine runs.
An autouse fixture busts ``_dashboard_plugins_cache`` before and
after each test so the production cache (populated by the
import-time ``_mount_plugin_api_routes()`` call) can't bleed in.
All 12 pre-existing dashboard-plugin tests in
``test_web_server.py`` still pass unchanged.
Follow-up to PR #28832 — the dashboard plugin routes now accept slashed
names like `observability/langfuse` and `image_gen/openai`, but
`_sanitize_plugin_name` still rejected forward slash and so dashboard
update + remove on those plugins fell through to '404 not found' even
though they exist on disk.
Adds an opt-in `allow_subdir=True` flag that:
- Permits internal forward slashes (category-namespaced plugin keys
emitted by `_discover_all_plugins`).
- Strips leading and trailing slashes.
- Still rejects `..` and backslash, and still asserts the resolved
target lives inside `plugins_dir`.
Opted in at the two read-paths that operate on installed plugins:
`_require_installed_plugin` (CLI update/remove) and
`_user_installed_plugin_dir` (dashboard update/remove). The install
path keeps the default (`allow_subdir=False`) because freshly-cloned
plugins always land top-level under `~/.hermes/plugins/<name>/`.
Adds 6 targeted unit tests covering the new flag's allow/reject matrix.
- test_browser_secret_exfil: mock _run_browser_command instead of
launching real Chrome (secret check is pre-launch, browser is
irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
yield the event loop before receive_text(). TestClient's sync mode
can race the broadcast handler otherwise, hanging the test.
@memosr's PR #27612 put the inference_base_url allowlist check only at the
Nous proxy adapter forward boundary. The poisoned URL, however, lands in
``auth.json`` upstream of that — at five refresh / agent-key-mint payload
read sites inside ``resolve_nous_runtime_credentials`` and
``_extend_state_from_refresh``. Without gating those sites, a single MITM
on a refresh response persists the attacker's URL across restarts, even
if the proxy adapter's defense-in-depth check would later catch it on
the way out.
Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network``
at all five Portal-network reads:
- hermes_cli/auth.py L4840 (refresh-only access-token path)
- hermes_cli/auth.py L4876 (mint payload path)
- hermes_cli/auth.py L5154 (terminal-runtime access-token refresh)
- hermes_cli/auth.py L5262 (cross-process serialized refresh)
- hermes_cli/auth.py L5317 (terminal-runtime mint payload)
The state-read path at L5025 (``state.get("inference_base_url")``) is
deliberately NOT gated — pre-existing state in ``auth.json`` is either
already validated (it came from one of the five network sites above) or
set by a trusted local actor (manual edit, ``_setup_nous_auth`` test
fixture, ``hermes login nous`` against a staging endpoint via the
documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file /
patch tampering with auth.json is independently blocked by PR #14157.
Adds tests/hermes_cli/test_nous_inference_url_validation.py covering:
- validator https + host + edge-case rules (12 cases)
- all 5 network call sites grep contracts (no _optional_base_url
regression possible without test failure)
- proxy adapter defense-in-depth check still present
- env override path NOT gated (documented dev/staging behaviour)
18 new tests, all 119 Nous-auth tests green.
Mirrors the architecture established by the web (#25182), browser
(#25214), and video_gen (#25126) plugin migrations:
* `tools/fal_common.py` — stateless atoms shared by both FAL-backed
plugins (image_gen + video_gen). Holds the lazy `fal_client` import
helper, `_ManagedFalSyncClient`, `_normalize_fal_queue_url_format`,
`_extract_http_status`. Stateful pieces (`fal_client` module global,
`_managed_fal_client*` cache, `_submit_fal_request`,
`_resolve_managed_fal_gateway`, `_get_managed_fal_client`)
intentionally stay on `tools.image_generation_tool` so the existing
`monkeypatch.setattr(image_tool, ...)` patch sites keep working
unchanged.
* `plugins/video_gen/fal/__init__.py` — drops its inline
`_load_fal_client` duplicate; consumes `tools.fal_common.import_fal_client`.
* `plugins/image_gen/fal/{plugin.yaml,__init__.py}` — new plugin.
`FalImageGenProvider` is a thin registration adapter that resolves
the legacy module via `import tools.image_generation_tool as _it`
and calls `_it.image_generate_tool` + `_it._resolve_fal_model` at
call time. The 18-model catalog, `_build_fal_payload`, managed-
gateway selection, and Clarity Upscaler chaining all remain in
`tools.image_generation_tool` as the single source of truth —
the plugin is a registration adapter, not a parallel implementation.
* `tools/image_generation_tool.py::_dispatch_to_plugin_provider` —
drops the `configured == "fal"` skip. Setting `image_gen.provider:
fal` now routes through the registry like any other provider; the
plugin re-enters this module's pipeline so behavior is identical.
Unset `image_gen.provider` still falls through to the in-tree
pipeline (preserves no-config-with-FAL_KEY UX from #15696).
* `hermes_cli/tools_config.py` — drops the hardcoded "FAL.ai" row from
`TOOL_CATEGORIES["image_gen"]["providers"]` (now injected by
`_plugin_image_gen_providers` like every other backend) and the
`getattr(provider, "name") == "fal"` skip that protected against
duplication with the hardcoded row. The "Nous Subscription" row
stays as a setup-flow entry — same shape browser kept "Nous
Subscription (Browser Use cloud)" after #25214.
* `tests/plugins/image_gen/test_fal_provider.py` — 14 cases covering
the ABC surface, call-time indirection (verifying
`monkeypatch.setattr(image_tool, "image_generate_tool", ...)` takes
effect through the plugin), response-shape stamping, exception
handling, and registry wiring.
* `tests/plugins/image_gen/check_parity_vs_main.py` — subprocess
harness mirroring `tests/plugins/browser/check_parity_vs_main.py`.
Pins one path to origin/main, one to the worktree; runs six
scenarios (unset, explicit-fal-no-creds, explicit-fal-with-creds,
explicit-fal-with-model, typo provider, managed-gateway-only) and
diffs the reduced shape `{dispatch_kind, provider_name, model}`
per scenario. The only acceptable diff is "legacy_fal → plugin
(fal)" for explicit-FAL paths — every other delta is flagged as
a regression.
* `tests/hermes_cli/test_image_gen_picker.py::test_fal_surfaced_alongside_other_plugins`
— flips the previous `test_fal_skipped_to_avoid_duplicate` to
match the new shape (FAL is a plugin now, no dedup needed).
Verified: 195/195 tests across
`tests/{tools/test_image_generation*,tools/test_managed_media_gateways,plugins/image_gen,plugins/video_gen,hermes_cli/test_image_gen_picker}.py`
pass on this branch with no test patches modified outside the picker
test that asserted the old skip behaviour.
Fixes#26241
curses.init_pair(N, 8, -1) uses extended color 8 ("bright black" /
dim gray) which does not exist on 8-color terminals (COLORS == 8,
valid range 0-7). This crashes the entire plugins UI, session
browser, and radio picker in Docker containers with:
curses.error: init_pair() : color number is greater than COLORS-1
Replace all 5 occurrences across plugins_cmd.py, main.py, and
curses_ui.py with min(8, curses.COLORS - 1), which falls back to
COLOR_WHITE (7) on 8-color terminals.
Closes#13688
* fix(tui): surface verbose tool details
Emit redacted structured verbose args/results to the TUI so /verbose verbose can show full tool detail without reopening stdout, and fail closed if redaction is unavailable.
Salvages #29011.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
* fix(tui): address verbose detail review
Label verbose tool failures as errors, cover forced verbose reasoning, and avoid new diff type warnings from the redaction regression tests.
* fix(tui): bound verbose tool payloads
Cap verbose tool detail text before emitting JSON-RPC events and preserve verbose results on inline diff completions.
* fix(tui): align termux argv test with gc flag
Update the stale TUI launch expectation so the Termux freshness path matches the current direct Node argv.
---------
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
The upstream cua-driver installer resolves the latest release and attempts
to download an architecture-specific asset. When the release only ships
arm64 builds (as of v0.1.6), the installer fails with a raw 404 on Intel
macOS with no clear path forward.
Add _check_cua_driver_asset_for_arch() that probes the GitHub Releases API
before running the installer. If the latest release has no x86_64/amd64
asset, print a clear warning and link to the upstream issue. On arm64 or
API failure, fail open and let the installer proceed as before.
Fixes#24530
The bundled-skill sync stamp added in the cherry-picked salvage commit
parsed .git/HEAD and looked for a loose ref file in the worktree gitdir
only, so two real cases hit the unresolved branch:
- repos after `git gc` where active refs live in packed-refs
- linked worktrees, whose branch ref lives in <commondir>/refs/heads/
(verified on the worktree this salvage was built in)
Both fell back to a constant-string fingerprint, so post-commit launches
would never re-run the real skill sync. Now we resolve packed-refs and
check both the worktree gitdir and the common dir for loose refs.
Adds three tests covering: packed-refs resolution, worktree common-dir
packed lookup, worktree common-dir loose lookup, and the explicit
'unresolved' marker (still stable + version-fallback-safe).
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.
This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.
Example config:
```yaml
custom_providers:
- name: gemma-local
base_url: http://localhost:8080/v1
model: google/gemma-4-31b-it
extra_body:
enable_thinking: true
reasoning_effort: high
```
For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:
```yaml
extra_body:
chat_template_kwargs:
enable_thinking: true
```
Changes:
- `hermes_cli/config.py`: preserve `extra_body` in normalized
`custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
as `request_overrides.extra_body` for named custom runtime resolution,
including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
entry by `base_url` (+ optional model) and merge its `extra_body` into
`AIAgent.request_overrides`, with caller-provided overrides winning on
conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
`extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
trailing-slash equivalence, fallback when no `model` field is set, and
caller-override merging precedence.
Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.
Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.
Closes#29022
* ci(tests): install ripgrep from prebuilt tarball instead of apt
apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.
- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
* fix(cli): compile syntax check to tempdir, not source __pycache__
`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:
1. Parallel test workers walking the same source tree (e.g. running
the suite under per-file process isolation) can race against each
other on the `__pycache__` write — manifests as flaky 'directory
not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
that the next interpreter run might pick up — fine when the
interpreter version matches, sketchy if it doesn't.
Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
* test(runner): per-file process isolation, drop manual state reset + xdist
Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.
Key changes:
* scripts/run_tests_parallel.py — new runner: discovers test files,
runs N in parallel via ThreadPoolExecutor, captures stdout per file,
treats exit code 5 (no tests collected) as pass, kills all children
on exit. Change from cpu_count to cpu_count*2. The runner is
I/O-bound (waiting on subprocess.communicate() from pytest children)
The parent process does almost no CPU work, so 2x oversubscription
keeps more pipes full. When a file fails, immediately show the last
30 lines of pytest output (stack traces + FAILED summary) plus a
ready-to-copy repro command:
python -m pytest tests/agent/test_auxiliary_client.py
* scripts/run_tests.sh — delegates to run_tests_parallel.py
* .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
* pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
* tests/conftest.py — remove ~200 lines of manual state-reset fixtures
* AGENTS.md — update Testing section for per-file design
* test(runner): speed gateway test antipattern scan up
* fix(test): web search provider plugin test missing xai
* fix(tests): make 14 test files pass under per-file subprocess isolation
Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:
Tool registry not populated:
- test_video_generation_tool_surface_matrix: add discover_builtin_tools()
- test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
registering all 8 bundled web providers, reset after each test
- test_website_policy: same provider registration pattern
- test_web_tools_tavily: same pattern across 3 dispatch test classes
- Also add is_safe_url/check_website_access mocks where SSRF check
blocks example.com (DNS resolution fails in isolated envs)
Stale check_fn cache:
- test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
in both kanban guidance tests (prior test cached False for kanban_show)
- test_discord_tool: cache invalidation in setup/teardown
- test_homeassistant_tool: invalidate_check_fn_cache() before registry queries
Module-level state pollution:
- test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
- test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
(ContextVar takes precedence over os.environ)
- test_dm_topics: overwrite sys.modules + separate telegram.constants mock
+ force-reimport of gateway.platforms.telegram
- test_terminal_tool_requirements: removed duplicate class declaration,
autouse _clear_caches fixture
* change(tests): run_tests.sh explicitly includes env vars
instead of manually dropping some vars, now we just only include some
* fix(tests): 5 more isolation/NixOS fixes
- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
(feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
/etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
(doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
profile deletion when copytree preserved read-only modes from
nix store
* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client
* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor
* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test
* fix: address PR #29016 review feedback
- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
shared helper (register_all_web_providers), replacing 8 copy-pasted
blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
fix worker count to match code (cpu_count*2)
* fix: eliminate race in stale-cache achievements test
The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.