Commit graph

4276 commits

Author SHA1 Message Date
Teknium
4fbdf0e893 test(cli,gateway): cover bracket-stripping and gateway session-ID lookup
- CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged.
- Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.
2026-05-25 01:33:32 -07:00
Teknium
920b350e57 test(auth): align copilot-remove test with borrowed-credential policy (#31416)
PR #31416 (avoid persisting borrowed credential secrets) added
sanitize_borrowed_credential_payload, which strips access_token from
any auth.json pool entry whose (provider, source) isn't in the
_PERSISTABLE_PROVIDER_SOURCES allowlist.

(copilot, gh_cli) is borrowed (not in the allowlist), so the test
fixture's pre-seeded access_token now gets stripped at load_pool()
time, leaving the pool empty. resolve_target('1') then fails with
'No credential #1. Provider: copilot.'

Fix: align the test with the new contract. At runtime, copilot tokens
are hydrated by resolve_copilot_token() — mock that path so the pool
gets an entry the test can remove. The behavior under test
(suppression of gh_cli + env variants on remove) is unchanged.

CI repro on origin/main HEAD; reproduced locally with stock checkout.
2026-05-25 01:23:31 -07:00
helix4u
ec4d6f1823 fix(cli): show masked feedback for secret prompts 2026-05-25 01:20:33 -07:00
Glen Workman
d952b377aa
fix: add cron API provenance logging (#24889)
Co-authored-by: sgtworkman <178342791+sgtworkman@users.noreply.github.com>
2026-05-25 01:15:56 -07:00
zapabob
2c3ca475c0 fix(cron): reject id mutation + validate output paths under OUTPUT_DIR
Two defense-in-depth fixes on cron output path handling:

1. cron/jobs.py:update_job() rejects mutation of the immutable 'id' field
   (raises ValueError). Dashboard PUT /api/cron/jobs/{id} converts this to
   HTTP 400. Without this, an attacker who can reach the update endpoint
   could rename a job's id to '../escape' and move its output directory
   outside OUTPUT_DIR.

2. cron/jobs.py:_job_output_dir() validates job IDs before composing
   paths: rejects '.', '..', '/', '\\', absolute paths, and Windows drive
   prefixes. Used by save_job_output() and remove_job() so legacy unsafe
   IDs (from before this guard) fail closed rather than half-applying a
   shutil.rmtree or output write outside the sandbox.

Tests:
  - update_job rejects {'id': '../escape'} without renaming
  - remove_job(legacy '../escape' id) raises ValueError without deleting
    files outside OUTPUT_DIR or removing the job from the store
  - save_job_output rejects '..', './escape', 'nested/escape',
    absolute paths
  - dashboard PUT /api/cron/jobs/{id} with {'id': '../escape'} returns
    400, job list unchanged

Salvaged from PR #29826 by @zapabob. Simplified implementation:
- Dropped a 23-line _validate_job_output_id() helper using Path.parts
  semantics. The inline check (path separators + dot-components +
  is_absolute) is shorter and behaviorally identical.
- Dropped the secondary OUTPUT_DIR.resolve()/relative_to() check —
  redundant once we reject any path separator at the input boundary.
- Dropped the _docs/2026-05-21_cron-output-path-hardening_codex.md
  planning artifact (we don't check planning docs into the repo).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 01:15:24 -07:00
Schrotti77
9863a07af6 fix(cron): layer agent.disabled_toolsets onto cron baseline (#25752)
The bug: cron/scheduler.py:_resolve_cron_enabled_toolsets returns an
LLM-supplied per-job enabled_toolsets verbatim. The disabled_toolsets
passed to AIAgent was a hardcoded [cronjob, messaging, clarify] that
ignored agent.disabled_toolsets from config.yaml. An LLM could call
cronjob(action='add', enabled_toolsets=['terminal','file'],
prompt='...') and the cron-spawned agent would receive terminal+file
even when the operator had globally disabled them.

Fix: new _resolve_cron_disabled_toolsets() helper that ALWAYS layers
agent.disabled_toolsets on top of the cron baseline. AIAgent's
disabled_toolsets takes precedence over enabled_toolsets, so this
stops the bypass regardless of what the per-job override contains.

This is the disabled-side fix. Three concurrent PRs (#25842, #25815,
#25780) proposed intersection-side variants on _resolve_cron_enabled_toolsets;
this fix is more robust because it stops the leak at the precedence
boundary AIAgent itself enforces, not at a layer above.

Regression test reproduces the issue's PoC exactly:
config.yaml has agent.disabled_toolsets=[terminal,file]; cron job has
enabled_toolsets=[web,terminal,file]; assertion: AIAgent receives
disabled_toolsets containing terminal AND file.

Salvaged from PR #25786 by @Schrotti77. Simplified the implementation:
dropped a 23-line _normalize_toolset_list() helper (handled str/tuple/
set/garbage input shapes) in favor of the existing convention
(agent_cfg.get('disabled_toolsets') or []) used elsewhere in the
codebase. YAML always parses these as lists; the elaborate normalizer
was theatre for shapes we never produce.

Closes #25752

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 01:09:54 -07:00
Hasan Ali
d7c5d5dee5
fix: avoid persisting borrowed credential secrets (#31416) 2026-05-25 00:32:08 -07:00
Teknium
2b768535c9 test(acp): drop flaky runtime_calls[-1] tail-position assertion
The legacy runtime_calls[-1] == "anthropic" check in
test_model_switch_uses_requested_provider failed in CI under
specific test-shard scheduling with 'custom' == 'anthropic',
across multiple unrelated PRs on 2026-05-25. The May 23 pin
(commit 3127a41cb) monkeypatched parse_model_input + detect_provider_for_model
to remove the dependency on live _KNOWN_PROVIDER_NAMES module state but the
flake reappeared anyway — root cause still not reproducible locally even
under stress runs.

The other three assertions ("Provider: anthropic" in result,
state.agent.provider == "anthropic", state.agent.base_url ==
"https://anthropic.example/v1") already prove
fake_resolve_runtime_provider was called with requested="anthropic"
for the model-switch step — the agent's provider and base_url
come directly from that fake's return value. The tail-position
check was redundant and the only assertion that flaked.

Replaces runtime_calls[-1] == "anthropic" with
"anthropic" in runtime_calls so the plumbing path is still
covered without depending on call ordering.
2026-05-24 23:23:12 -07:00
helix4u
3b839f4369 fix(context): align guidance with 64k minimum 2026-05-24 23:23:12 -07:00
Ben Barclay
7e165e843d
Merge pull request #31760 from NousResearch/hermes/hermes-bf5898da
feat(docker)!: s6-overlay container supervision (salvage of #30136)
2026-05-25 12:57:51 +10:00
Teknium
46f8948bad test+harden(cli): cover parent-chain walk in concurrent-instance detection
Follow-up to @Strontvod's fix.

Tests:
- Five new tests in test_update_concurrent_quarantine.py cover the parent-
  chain exclusion: the .exe launcher is excluded, an unrelated sibling
  hermes.exe is still reported, multi-level ancestry is fully excluded,
  PID cycles in the parent chain don't hang, and a partially-stubbed
  psutil (no Process attribute) degrades gracefully instead of crashing.
- New _fake_psutil_with_parent_chain helper builds a fuller stand-in
  (Process / NoSuchProcess / AccessDenied + process_iter) than the
  process_iter-only SimpleNamespace the older tests use.

Hardening:
- Broaden the except in the parent-walk to bare Exception. The original
  fix listed (NoSuchProcess, AccessDenied, ValueError), but those names
  are evaluated lazily during exception matching — if psutil is a partial
  stub without the attribute, the exception handler itself raises
  AttributeError that escapes. The function is documented as 'never raises'
  (the surrounding update flow depends on it), so the broader catch keeps
  the contract regardless of how the dependency is shaped.

AUTHOR_MAP:
- Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit
  resolves to @Strontvod in the release notes.

All 18 detect_concurrent + quarantine tests pass.
2026-05-24 19:51:46 -07:00
Ben
c524b8a4dc test(docker): fix svstat 'want up' assertion in profile-gateway lifecycle test
After the supervise-perms fix lands, the s6 lifecycle actually works
for the hermes user — hermes -p <profile> gateway start now genuinely
brings the supervised gateway up rather than silently no-op'ing on
EACCES. That exposes a latent bug in this test's assertion: it
expected 'want up' to appear literally in s6-svstat output, but
s6-svstat elides redundancies — when the slot is currently up AND
s6 wants it up, the output is just 'up (pid N pgid N) X seconds';
the explicit 'want up' token only appears when current ≠ wanted
(e.g. 'down (exitcode 1) … , want up' on a crash-loop).

Add a small helper _svstat_wants_up() that reads the want-state
correctly across both spellings:
  * 'up …'                       → wanted up (unless explicit 'want down')
  * 'down …, want up'            → wanted up explicitly
  * 'down …'                     → wanted down

Both stop and start assertions now use the helper. Also rewords
the module docstring to acknowledge that the supervised process
may succeed OR crash-loop depending on environment, but the want-
state contract holds either way.

(cherry picked from commit 02c933aedc)
2026-05-25 12:25:06 +10:00
Ben
7d54288d82 test(dockerfile): recognize s6-overlay/init as a valid PID-1; harden against historical-comment masquerade
PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed
because the test hardcoded known_inits = ('tini', 'dumb-init',
'catatonit'). The PR replaced tini with s6-overlay's /init (which execs
s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name,
so the substring scan against ENTRYPOINT missed it.

Two-part fix:

1. Extend the accepted token list to include 's6-overlay', 's6-svscan',
   and '/init'. The contract these tests enforce is behavioural ('some
   PID-1 init reaps SIGCHLD'), so the names list is purely a recognition
   table and any reaper-capable family should qualify.

2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the
   sibling check) against comment-only matches. It was scanning the full
   Dockerfile text and only passed because the word 'tini' is still in
   a historical comment explaining why we used to use it. The next
   person to clean up that comment would have silently broken the test.
   New _instruction_text() helper joins only the parsed, non-comment
   Dockerfile instructions so stale comments can't satisfy the check.

(cherry picked from commit ffc1bb6393)
2026-05-25 12:24:58 +10:00
Ben
4f416fc40c fix(docker): make s6 lifecycle work for the unprivileged hermes user
Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.

The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.

The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.

The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.

New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
  svc_dir/event/                       hermes:hermes 03730
  svc_dir/supervise/                   hermes:hermes 0755
  svc_dir/supervise/event/             hermes:hermes 03730
  svc_dir/supervise/control            hermes:hermes 0660 (FIFO)
  svc_dir/log/event/                   hermes:hermes 03730  (if log/ present)
  svc_dir/log/supervise/               hermes:hermes 0755
  svc_dir/log/supervise/event/         hermes:hermes 03730
  svc_dir/log/supervise/control        hermes:hermes 0660 (FIFO)

The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).

Wiring
------
1. S6ServiceManager.register_profile_gateway calls
   _seed_supervise_skeleton(tmp_dir) just before publishing the
   slot via Path.replace. Runtime-registered profile gateways are
   set up by hermes.

2. container_boot._register_service does the same in the cont-init.d
   reconciliation path so boot-time-restored profile slots inherit
   the same layout.

3. New cont-init.d/015-supervise-perms script chowns the supervise/
   and event/ trees for STATIC s6-rc services (dashboard,
   main-hermes). These are spawned by s6-rc before cont-init.d
   gets to run, so the EEXIST-trick doesn't apply; we chown the
   already-existing tree instead. s6-supervise keeps using the
   same files; it never re-asserts ownership on a running service.
   The script skips s6-overlay internal services (s6rc-*,
   s6-linux-*) so the supervision tree itself stays root-only.
   015- slot is intentional: lex-sorts between 01-hermes-setup
   and 02-reconcile-profiles in the container's C-locale, so
   the chown finishes before the reconciler walks the scandir.

Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.

Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:

  (a) /run/service/dashboard is a symlink to a TRANSIENT
      /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
      mid-transaction; the touch landed in a soon-to-be-discarded
      tmp.

  (b) Even when written to the final /run/s6-rc/servicedirs/
      location, the 'down' file is only consulted by s6-supervise
      at slot startup. s6-rc's user-bundle explicitly transitions
      'dashboard' to 'up' on every boot, overriding any down
      marker.

The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.

03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).

Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.

References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
  + Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
  http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
  community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
  semantics in finish scripts

(cherry picked from commit c41f908ad4)
2026-05-25 12:23:23 +10:00
teknium1
7f6f00f6ec
test(dockerfile): accept s6-overlay /init as a known PID-1 init
Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1
contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added
with #15012) hardcoded tini/dumb-init/catatonit as the only accepted
inits, so they failed after #30136 replaced tini with s6-overlay's
/init.

s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly
on SIGCHLD — same contract the test exists to enforce. Two updates:

  * test_dockerfile_installs_an_init_for_zombie_reaping — accept
    's6-overlay' as a known-installed marker (matches the
    s6-overlay install layer in Ben's Dockerfile).
  * test_dockerfile_entrypoint_routes_through_the_init — accept
    '/init' as a known-routed marker (s6-overlay's PID-1 binary
    lives at /init by convention).

Both assertions still fire if a future Dockerfile rewrite drops
the init entirely. Local: 7/7 pass.
2026-05-24 18:32:14 -07:00
teknium1
af144cd60d fix(model): include Premium+ in xAI OAuth label
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.
2026-05-24 18:12:16 -07:00
helix4u
4987fd2a59 fix(model): disambiguate xAI OAuth picker label 2026-05-24 18:12:16 -07:00
Teknium
031f9c9edc
fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them.  The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image.  Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.

Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal.  Mirrors the audio_cache pattern used by text_to_speech.

Wires save_url_image into both the xAI and OpenAI providers' URL
branches.  When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.

Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
  in-process HTTP server: bytes round-trip, content-type → extension,
  URL-suffix fallback, default-to-png, 404 propagation, empty-body
  refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
  test_successful_url_response (was asserting the bug), add
  test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.

160/160 in the broader image_gen test surface.
2026-05-24 18:10:47 -07:00
teknium1
a4092ab217
fix(profiles): short-circuit s6 hooks on host before importing service_manager
Follow-up to @benbarclay's Docker s6 PR (#30136). The Phase 4 hooks
`_maybe_register_gateway_service` and `_maybe_unregister_gateway_service`
were already documented as "no-op on host", but they reached that no-op
by:

  1. importing `hermes_cli.service_manager`
  2. calling `get_service_manager()` (which calls `detect_service_manager()`)
  3. checking `mgr.supports_runtime_registration()` and returning False

If anything in step 1 or 2 raised an unexpected exception (e.g. a host
machine with a partial s6 install — `/proc/1/comm == s6-svscan` somehow,
but `/run/s6/basedir` absent, or vice versa), the `except Exception`
in the hook would print a confusing "⚠ Could not register s6 gateway
service: ..." warning on a non-container machine that has never touched
the container.

Reorder so `detect_service_manager() != "s6"` is checked FIRST, and
return silently for any detection failure. Host machines now:

  - never import the s6 backend
  - never call get_service_manager()
  - never print an s6-shaped warning under any failure mode

E2E confirmed on host Linux (systemd):
  `_maybe_register_gateway_service(...)` produces empty stdout,
  detect_service_manager() returns "systemd".

Existing tests updated to patch `detect_service_manager` for the s6
call-through cases (they previously relied on get_service_manager
being the only gate, which is no longer true). Added one new test —
`test_register_silent_when_detect_throws` — asserting that a broken
detector cannot leak a warning to host users.

cc @benbarclay — visible behavior change vs. your branch is one
fewer code path on host. Test changes are minimal (one helper +
`_patch_detect_s6` opt-in per s6 test). Happy to revert if you
prefer the original shape.
2026-05-24 18:07:47 -07:00
kshitijk4poor
af973e4071 refactor(gateway): migrate Mattermost adapter to bundled plugin
Second migration of an existing built-in platform adapter after Discord
(PR #30591) — follows the same shape established by IRC / Teams / LINE /
Google Chat / SimpleX and the playbook in
`references/platform-plugin-migration.md`. Advances the umbrella refactor
in #3823.

Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/`
with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell,
`register(ctx)` entry point, **no back-compat shim** at the old import
path, and full parity for all five hooks Discord uses plus the
`apply_yaml_config_fn` hook (mattermost is the second consumer of #25443
after Discord):

* `standalone_sender_fn` — out-of-process cron delivery via Mattermost
  REST API. Picks up the thread_id + media_files capabilities the
  legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`).
* `setup_fn` — interactive `hermes setup gateway` wizard.
* `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys
  (`require_mention`, `free_response_channels`, `allowed_channels`) into
  `MATTERMOST_*` env vars (replaces the hardcoded block in
  `gateway/config.py`).
* `is_connected` — declares connection state from `MATTERMOST_TOKEN` +
  `MATTERMOST_URL`.
* `check_fn` — verifies aiohttp is installed and both required env vars
  are set.
* plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`,
  `max_message_length` (4000 — Mattermost practical limit), `emoji`,
  `required_env`, `install_hint`.

Files
-----
* `gateway/platforms/mattermost.py` (873 LOC) →
  `plugins/platforms/mattermost/adapter.py` (git rename, R071) +
  appended `register()` block, hook helpers, and `_standalone_send`
  with media upload + thread_id support.
* New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with
  `requires_env` / `optional_env` declarations covering MATTERMOST_URL,
  MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS,
  MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE,
  MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS,
  MATTERMOST_ALLOWED_CHANNELS.
* `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge
  (moved into plugin's `_apply_yaml_config`).
* `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` —
  replaced by the existing generic plugin-registry-first dispatch.
* `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) +
  `Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch
  already routes plugin platforms through `_send_via_adapter`, which
  hits the registry's `standalone_sender_fn`.
* `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced
  by the plugin's `interactive_setup`.
* `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry
  (3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in
  `_configure_platform`.
* Consumer rewrite: 5 test files (test_mattermost.py,
  test_media_download_retry.py, test_send_multiple_images.py,
  test_stream_consumer.py, test_ws_auth_retry.py) get
  `gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter`
  with the bulk-rewrite recipe from the platform-plugin-migration playbook.
  Single `mock.patch` string in test_stream_consumer.py also repointed.
* `tests/tools/test_send_message_missing_platforms.py`: thin
  `(token, extra, chat_id, message)` compat shim around the plugin's
  `_standalone_send(pconfig, …)` so existing test bodies continue to
  work without rewriting every signature.

Validation
----------
* Plugin discovery: mattermost registers from `plugins/platforms/mattermost/`
  alongside discord / teams / irc / line / google_chat / simplex.
  All 9 hooks present (setup_fn, standalone_sender_fn,
  apply_yaml_config_fn, is_connected, check_fn, allowed_users_env,
  allow_all_env, cron_deliver_env_var, max_message_length=4000).
* Mattermost-touching tests: 62/62 pass
  (`test_mattermost.py` + `test_send_message_missing_platforms.py`).
* Targeted selectors (mattermost or platform_registry or stream_consumer
  or ws_auth_retry or media_download_retry or send_multiple_images or
  send_message_tool or platform_connected): 433/433 pass.
* Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/
  tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py
  tests/integration/`): **6220/6220 pass in 47.8s, 0 failures**.
* Lint: ruff clean on all touched files.
* Git identity verified: kshitijk4poor.
* Rename detection: R071 (similarity dropped from a hypothetical R09x
  by the ~320-line appended register block — ~36% growth over the
  873-LoC base, vs Discord's 5101 LoC base which kept R091).

Closes part of #3823.
2026-05-24 18:05:33 -07:00
Ben
cd5b2c4123
test(docker): poll for boot-log signal instead of fixed sleeps
PR #30136 review item O6: test_container_restart.py used fixed
`time.sleep(8)` calls after `docker restart` to wait for the
cont-init reconciler to finish. Fixed sleeps are slow when the
event happens fast and false-fail when the event happens slow.

Replace with two polling helpers:

* `_wait_for_path(container, path, kind='f' | 'd', deadline_s=...)`
  — generic `test -f/-d` poller. Returns True on success, False on
  timeout; callers assert with a clear message.
* `_wait_for_reconcile_log_mention(container, profile, ...)` — the
  reconciler's per-profile log line is the canonical signal that
  the cont-init reconcile has finished for that profile. Poll on
  it instead of a sleep that hopes 8 seconds is enough.

The fixture-level setup wait is similarly migrated: it now polls
for `profile=default` in the boot log (every container always
gets a default-slot entry per item I1) and raises a clear timeout
error from the fixture if the container never finishes cont-init —
much better diagnostics than a mid-test KeyError.

The remaining `time.sleep()` calls are all internal interval_s
between probe attempts; no fixed wait points left.
2026-05-24 18:05:33 -07:00
Ben
d0b1ab48dc
fix(container_boot): publish reconciled service dirs atomically
PR #30136 review noted the asymmetry: `register_profile_gateway`
used tmp_dir + rename to publish a new service slot atomically,
but the boot-time reconciler wrote files into the slot directly.
Same underlying concern (a concurrent s6-svscan rescan could
observe a half-populated directory), different code path.

Rewrite `container_boot._register_service` to mirror the manager:
build everything in `<scandir>/gateway-<profile>.tmp/`, then
`Path.replace` into place. If a previous interrupted run left a
`.tmp` sibling, it's cleaned up before the new build starts. If
the target already exists, it's removed before the rename so
`Path.replace` doesn't error on a non-empty target (Linux `rename`
overwrites empty targets only).

Three new tests: atomic publication leaves no .tmp leftovers,
overwriting an existing slot still leaves no .tmp leftovers, and
a stale .tmp from an interrupted run is cleaned up automatically.
2026-05-24 18:05:33 -07:00
Ben
4443fb481d
fix(container_boot): rotate container-boot.log when it exceeds 256 KiB
PR #30136 review noted: container-boot.log was append-only with no
rotation. On a long-lived container with frequent restarts and
many profiles it would grow unboundedly (~80 B per profile per
reconcile pass).

Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`,
≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the
current file is renamed to `container-boot.log.1` (replacing any
existing one) before new entries are appended. Worst case is two
files at ~512 KiB — well within visibility limits for grep/cat.

Rotation is intentionally simple (no logrotate or s6-log machinery
for one append-only file). Failures during rotation are logged via
the module logger and treated as non-fatal — we keep appending to
the existing file rather than dropping the reconcile entry. Three
new unit tests cover above-threshold rotation, below-threshold
non-rotation, and overwrite of an existing .1 file.
2026-05-24 18:05:33 -07:00
Ben
d735b083e8
fix(service_manager): rip out dead port parameter
PR #30136 review caught: `_allocate_gateway_port()` in profiles.py
computed a SHA-256-derived port that was threaded through
`register_profile_gateway(profile, port=N)` →
`_render_run_script(profile, port, extra_env)` → and then **ignored**.
The rendered run script picked the bind port from the profile's
config.yaml (`[gateway] port = …`), never from the allocator. So
the entire allocator + parameter chain was dead code.

Remove:

* `hermes_cli.profiles._allocate_gateway_port` (deterministic
  SHA-256 → [9200, 9800) — never used).
* `port` kwarg from `ServiceManager.register_profile_gateway`
  (Protocol + Mixin + S6 implementation).
* `port` positional arg from `_render_run_script(profile, port,
  extra_env)` — now `_render_run_script(profile, extra_env)`.
* The pass-through call in `profiles._maybe_register_gateway_service`.

config.yaml is now the single source of truth for gateway port
selection — matches reality and reduces the API surface. Three
explanatory comments in service_manager.py / profiles.py document
the retirement so future readers don't reach for the allocator and
find a ghost.

Tests: drop the three `_allocate_gateway_port` tests; update
fakes' signatures throughout test_service_manager.py and
test_profiles_s6_hooks.py to match the new no-port API.
2026-05-24 18:05:33 -07:00
Ben
1dfabe47b3
fix(docker): dashboard slot stays 'down' when HERMES_DASHBOARD unset
PR #30136 review caught a false positive: when HERMES_DASHBOARD was
unset, the dashboard run script did `exec sleep infinity`, so
`s6-svstat /run/service/dashboard` reported the slot as 'up'.
`hermes doctor` and any other s6-svstat-based health check saw the
dashboard as supervised-running even though no dashboard process
existed.

Add cont-init.d/03-dashboard-toggle: writes a `down` marker file
into `/run/service/dashboard/` when HERMES_DASHBOARD is falsy,
removes any leftover marker when it's truthy. s6-supervise honors
`down` by not starting the service, so s6-svstat reports 'down' —
matching reality.

The run script's HERMES_DASHBOARD case-statement stays in place as
a belt-and-suspenders guard, so the two layers can never disagree.

Two new integration tests lock the behavior: slot reports down
when unset; slot reports up when set to 1.
2026-05-24 18:05:33 -07:00
Ben
b28b3f51d3
fix(service_manager): friendly errors for missing slots and s6-svc failures
PR #30136 review caught: `S6ServiceManager.start/stop/restart` called
`subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as
a raw `CalledProcessError` traceback. The two cases operators
actually hit are:

  1. The service slot doesn't exist — most commonly because the user
     typed a profile name wrong (`hermes -p typo gateway start`).
  2. s6-svc itself fails — most commonly EACCES on the supervise
     control FIFO when running unprivileged.

Both deserve named errors with actionable messages, not stacktraces.

Changes:

* Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`:
    - `GatewayNotRegisteredError(profile)` — carries the unprefixed
      profile name; message: `no such gateway 'typo': register it
      with `hermes profile create typo` first, or pass an existing
      profile name via `-p <name>``.
    - `S6CommandError(service, action, returncode, stderr)` — carries
      the s6-svc rc and stderr; message: `s6-svc start on
      'gateway-coder' failed (rc=111): <stderr>`.

* Factor lifecycle dispatch through `_run_svc(flag, label, name)`:
  pre-checks that the service directory exists (raises
  GatewayNotRegisteredError before invoking s6-svc), then runs
  s6-svc and translates any CalledProcessError into S6CommandError.

* `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway`
  catches both errors and prints `✗ <message>` + `sys.exit(1)`
  instead of letting the exception bubble. The dispatch path that
  used to dump a traceback at the user now gives an actionable
  one-liner.

Tests: 6 new tests for the error types and their CLI rendering;
existing lifecycle test pre-seeds the slot directory before calling
`mgr.start` etc.
2026-05-24 18:05:33 -07:00
Ben
b044c1ac29
fix(container_boot): always register gateway-default slot
PR #30136 review caught: `hermes gateway start` (no `-p`) inside
the container resolves `_profile_suffix() == ""` → service name
`gateway-default`, but no such slot was ever registered. The Phase 4
profile-create hook only fired on `hermes profile create <name>`,
and the root profile (which lives at the top of $HERMES_HOME, not
under `profiles/`) was never one of those. So bare `hermes gateway
start` landed on `s6-svc -u /run/service/gateway-default` →
uncaught `CalledProcessError` → traceback to the user.

Changes:

1. `reconcile_profile_gateways` now always registers a
   `gateway-default` slot before iterating named profiles. Its
   prior state is read from `$HERMES_HOME/gateway_state.json`
   (sibling to the profile root, not under `profiles/`); stale
   runtime files there are swept the same way. Auto-up only if the
   prior state was `running` — same rule as named profiles.

2. `S6ServiceManager._render_run_script` special-cases
   `profile == "default"` to emit `hermes gateway run` with NO
   `-p` flag. Passing `-p default` would resolve to
   `$HERMES_HOME/profiles/default/` — a different profile that
   almost certainly doesn't exist. The empty profile-suffix
   convention is the dispatcher's contract and the run script has
   to match.

3. A user-created `profiles/default/` collides with the reserved
   root-profile slot; the reconciler now skips it with a warning
   rather than producing two registrations of the same service name.

Action-list ordering is stable: `default` first, then named
profiles in directory order. Boot-log readers can rely on this.

Tests: 8 new dedicated default-slot tests plus updates to every
existing test that asserted against the action list (via the new
`_named_actions` helper that drops the always-present default
entry).
2026-05-24 18:05:33 -07:00
Ben
6dedaa4846
fix(gateway): route --all stop/restart through s6 under container
PR #30136 review caught that `hermes gateway stop --all` and
`... restart --all` were broken under s6. The Phase 4 dispatcher was
gated on `not stop_all` (and the symmetric restart_all), so `--all`
fell through to `kill_gateway_processes(all_profiles=True)`. pkill
SIGTERMed every gateway, s6-supervise observed the crashes, and
restarted every gateway ~1s later — net effect: `--all` *kicked*
gateways instead of *stopping* them.

Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates
`mgr.list_profile_gateways()` and routes stop/restart through each
service slot. s6's `want up`/`want down` flips correctly, so a
stop persists. Partial failures are surfaced per-profile with a
running success count; the host pkill path is only reached when s6
isn't in play.

`start --all` isn't a CLI surface — the helper rejects it and
returns False (host code path can take over).
2026-05-24 18:05:33 -07:00
Ben
fc39296e1f
fix(service_manager): s6 detection works for unprivileged hermes user
PR #30136 review surfaced two issues, both rooted in the same audit gap:
docker integration tests were running as root, not the unprivileged
`hermes` user (UID 10000) that the runtime actually uses via
`s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to
the s6 control surface worked as root in the tests but was inert in
production.

Fixes:

1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`,
   which is root-only readable. For UID 10000 the symlink yields
   PermissionError, `resolve()` silently returns the unresolved path,
   and `exe.name == "exe"` — so detection always returned False, the
   service-manager runtime-registration path was inert, and every
   `hermes profile create` / `hermes -p X gateway start` silently
   skipped the s6 hook. Replace with `/proc/1/comm` (world-readable)
   + `/run/s6/basedir` (s6-overlay-specific) — both required, fail
   closed.

2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/`
   {control,lock} to hermes so `s6-svscanctl -a/-an` works without
   root. Previously the directory chown stopped at `/run/service`
   and the FIFO inside stayed root-owned, so `register_profile_gateway`
   from hermes failed at the rescan-trigger step with EACCES — the
   wrapper in profiles.py caught the exception and printed a swallowed
   warning, so profile creation appeared to succeed while the slot
   was rolled back.

Audit changes to flush this class of bug next time:

- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py`
  that default to `-u hermes`. The module docstring explains why and
  flags `user="root"` as opt-in only for tests that explicitly need
  root (none currently do).
- Refactor every `docker exec` call in tests/docker/ through the new
  helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py,
  test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states
  (both signals present; comm wrong; basedir missing; PermissionError
  on /proc/1/comm; missing /proc — non-Linux). The PermissionError
  test is the explicit regression guard for the original bug.

Known follow-up: the per-service `supervise/control` FIFO inside each
`/run/service/gateway-<profile>/supervise/` is created root-owned by
s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc
-u/-d/-t` from the hermes user will get EACCES on those. The audit
under `-u hermes` will reveal this in lifecycle tests — surfacing the
issue cleanly so it can be fixed in a focused follow-up (likely via a
small SUID helper or a polling chown loop in cont-init.d). The
detection + svscanctl fixes here are independent and complete on
their own.
2026-05-24 18:05:33 -07:00
Ben
4b4c36cb61
feat(docker): remove gosu from bundled image; s6-setuidgid handles privilege drop
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.

Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated

Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring

tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.

Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
  per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
  break test discovery; pre-existing unrelated mock warning)

The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
2026-05-24 18:05:33 -07:00
Ben
2afefc501c
feat(docker): per-profile s6 supervision + container-restart reconciliation
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a cont-
init.d-time reconciliation pass that survives `docker restart`.

Task 4.0 — container-boot reconciliation:
  /run/service/ is tmpfs, so every `docker restart` wipes every
  per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
  invokes hermes_cli.container_boot.reconcile_profile_gateways() on
  every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
  gateway_state.json, recreates the s6 service slot, and auto-starts
  only those whose last state was 'running'. Other states
  (stopped, starting, startup_failed, missing) register the slot
  in the down state — avoiding crash-loops across restarts for a
  gateway that was broken last boot. Per-profile outcome is recorded
  to $HERMES_HOME/logs/container-boot.log.

  Implementation: hermes_cli/container_boot.py + 12 unit tests.
  Profile-marker is SOUL.md, not config.yaml, because `hermes profile
  create` only seeds SOUL.md by default (config.yaml comes from
  `hermes setup`).

Task 4.1 / 4.2 — profile create/delete hooks:
  hermes_cli/profiles.py::create_profile now calls
  _maybe_register_gateway_service(<canon>) at the end, which routes
  through ServiceManager.register_profile_gateway when running on s6
  and no-ops on host backends. delete_profile mirrors with
  _maybe_unregister_gateway_service. _allocate_gateway_port produces
  a deterministic SHA-256-derived port in [9200, 9800).

Task 4.3 — gateway dispatch + remove rejection arms:
  _dispatch_via_service_manager_if_s6(action) intercepts
  start/stop/restart at the top of each subcommand and routes them
  through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
  `elif is_container():` rejection arms are kept as fallback for
  pre-s6 containers / unsupported runtimes, but only ever fire when
  detect_service_manager() != 's6'. install/uninstall under s6
  print informational guidance pointing users at profile create/delete.

  Removed the two xfail(strict=True) markers from
  tests/docker/test_profile_gateway.py — both tests now pass strictly.

Task 4.4 — status reporting:
  get_gateway_runtime_snapshot() reports
  Manager: 's6 (container supervisor)' inside an s6 container instead
  of 'docker (foreground)'.

Plan-vs-reality drift fixed in this commit:
  - Plan's S6ServiceManager._render_run_script used
    `gateway start --foreground --port {port}` — invented args; the
    real CLI is `gateway run`. Switched accordingly. port arg
    retained for API parity but now documented as 'currently ignored'.
  - Plan's reconciler keyed on config.yaml; switched to SOUL.md
    (config.yaml is created by hermes setup, not by hermes profile
    create, so the original gate caught nothing).
  - The plan's _dispatch helper used _profile_arg() which returns
    '--profile <name>' (i.e. with the flag prefix). Switched to
    _profile_suffix() which returns the bare name.
  - Architecture B's docker exec doesn't get /command on PATH or
    the venv on PATH; Dockerfile's runtime PATH now includes
    /opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
    without sourcing the venv.
  - stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
    boot, not just on the UID-remap path. Without this, files created
    by docker-exec-as-root accumulate and the next reconciler run
    fails with PermissionError reading SOUL.md.

Test harness:
  19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
  passing). 78 unit tests across service_manager + container_boot +
  profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
  pass cleanly.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben
0abf661f71
feat(service_manager): add S6ServiceManager for runtime gateway supervision
Phase 3 of the s6-overlay supervision plan. Implements the runtime-
registration surface from D4 — only the s6 backend supports
register_profile_gateway / unregister_profile_gateway /
list_profile_gateways; host backends continue to raise
NotImplementedError. No caller yet (Phase 4 wires in the profile
create/delete hooks).

Key implementation notes:

  - Service directory shape: /run/service/gateway-<profile>/{type,run,log/run}.
    Atomic register: write to gateway-<profile>.tmp, fsync via
    os.rename. Cleanup on rescan failure.

  - Run script uses #!/command/with-contenv sh so HERMES_HOME and any
    extra_env arrive at exec time. The hermes -p <profile> gateway
    start --foreground --port <port> command is wrapped in
    s6-setuidgid hermes for the per-service privilege drop (OQ2-A).

  - Log script (OQ8-C): persists via s6-log to
    ${HERMES_HOME}/logs/gateways/<profile>/. CRITICAL — HERMES_HOME is
    a runtime env-var expansion in the rendered script, NOT a Python
    f-string substitution. Negative-asserted in
    test_s6_register_creates_service_dir_and_triggers_scan so
    regressions are caught.

  - PATH gotcha: /command/ is only on PATH for processes spawned by
    the supervision tree (services, cont-init.d). `docker exec` and
    profile-create hooks don't get it. S6ServiceManager calls all
    s6-* binaries via absolute path through the new _S6_BIN_DIR
    constant so callers don't have to fix up env vars.

  - validate_profile_name rejects path-traversal, leading-dash (s6
    would parse as a flag), uppercase, whitespace, and names >251
    chars (s6-svscan default name_max).

Test coverage:
  - 13 new unit tests in tests/hermes_cli/test_service_manager.py
    (kind detection, run-script content, env quoting, register
    rollback on rescan failure, unregister idempotence, list filter,
    lifecycle dispatch, svstat parsing). Total: 36 passing.
  - 2 new in-container integration tests in
    tests/docker/test_s6_profile_gateway_integration.py validating
    end-to-end registration against a real s6 supervision tree.

Docker harness: 14 passed, 2 xfailed (Phase 4 target unchanged).

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben
e0e9c895d3
feat(docker)!: replace tini with s6-overlay as PID 1
BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay)
instead of /usr/bin/tini. Main hermes runs as the container CMD with
TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc
service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the
ground is laid for per-profile gateway supervision (Phase 3+4).

All five pre-s6 docker run invocation patterns continue to work
identically — verified by the Phase 0 docker harness:

  docker run <image>                  → `hermes` with no args
  docker run <image> chat -q "..."    → `hermes chat -q ...` passthrough
  docker run <image> sleep infinity   → `sleep infinity` direct
  docker run <image> bash             → interactive bash
  docker run -it <image> --tui        → interactive Ink TUI

Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint
+ shellcheck pass cleanly.

Architecture pivot from plan v3 (documented in main-hermes/run header):
the plan called for main hermes to be an s6-supervised service, but
two real s6-overlay v3 mechanics blocked that — cont-init.d scripts
receive no arguments (CMD args are not visible to stage2-hook), and
`/run/s6/basedir/bin/halt` after writing the exit code did not
propagate the desired exit code (container exits 143). We use the
s6-overlay-native CMD pattern instead: main-wrapper.sh is the
container's main program (ENTRYPOINT prepends it so leading-dash
args like --version aren't intercepted by /init), exec's the final
program with stdin/stdout/stderr inherited, and the program's exit
code becomes the container exit code. main-hermes is now a no-op
`sleep infinity` slot kept for future supervised-gateway-container
modes. This trades "supervised restart of main hermes" for arg-
parity with the pre-s6 contract — main hermes was already unsupervised
under tini, so we lose nothing functional. Dashboard supervision is
the only new guarantee added by this phase.

Files added:
  docker/main-wrapper.sh           # arg routing + s6-setuidgid drop
  docker/stage2-hook.sh            # gosu-equivalent + chown + seed
  docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base}
  docker/s6-rc.d/dashboard/{type,run,dependencies.d/base}
  docker/s6-rc.d/user/contents.d/{main-hermes,dashboard}

Files changed:
  Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring
  docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat
  tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben
51914b0514
feat(service_manager): add ServiceManager protocol + host wrappers
Phase 1 of the s6-overlay supervision plan. Pure-refactor addition:
introduces the abstract interface (with runtime_checkable Protocol),
detect_service_manager(), validate_profile_name(), and thin
SystemdServiceManager / LaunchdServiceManager / WindowsServiceManager
wrappers around the existing systemd_* / launchd_* / gateway_windows.*
module-level functions. No host call site was modified — host code
continues to use the existing functions directly; the protocol is for
new backend-agnostic code (Phase 4 profile create/delete hooks and the
Phase 4 s6 dispatch path in 'hermes gateway start/stop/restart').

WindowsServiceManager.install() forwards the v3 kwargs (start_now,
start_on_login, elevated_handoff) added in PRs #28169-adjacent so
non-Windows callers — there aren't any today — can opt in.

The s6 backend lands in Phase 3; until then get_service_manager()
raises a clear error if invoked on a host that detects as 's6'.
2026-05-24 18:05:14 -07:00
Ben
440147ebea
test(docker): stabilize Phase 0 baseline harness
Two pre-existing baseline issues found while running the Phase 0 harness
against the tini image that need fixing before later phases can use the
harness as a behavior-parity oracle:

1. The autouse `_enforce_test_timeout` fixture in tests/conftest.py
   hard-coded a 30s SIGALRM, which preempted any `pytest.mark.timeout`
   marker (already honored by pytest-timeout). Honor the marker if
   present; fall back to 30s otherwise. Docker harness tests carry a
   180s marker applied at collection time in tests/docker/conftest.py.

2. test_dashboard_port_override polled via `ss -tlnp` / `netstat -tln`
   — neither is installed in the Hermes image, so the probe trivially
   failed even when the dashboard was bound. The dashboard also takes
   8-15s to bind on cold image; the 5s sleep was insufficient. Replace
   with a poll loop reading /proc/net/tcp directly (port 9120 = 0x23A0,
   state 0A = LISTEN). Bump probe deadline to 60s and switch
   test_dashboard_opt_in_starts to a similar poll for pgrep so we don't
   regress to the same race.

Result: 11 passed, 2 xfailed (Phase 4 target) on tini image. Harness
now ready to serve as Phase 2's behavior-parity oracle.
2026-05-24 18:05:14 -07:00
Ben
a18f69eb55
test(docker): apply 180s timeout to docker harness tests
The agent-test suite default is 30s; docker test_no_args (the dashboard
spin-up, the container restart) routinely take 60-90s. Without this
they intermittently fail in CI with TimeoutError.
2026-05-24 18:05:14 -07:00
Ben
6e6acdea2a
test(docker): lock baseline behavior for Phase 0 harness
Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the
user-visible behavior we must preserve through the Phase 2 init-
system swap:

- test_main_invocation.py (Task 0.2): docker run <image> with no
  args, chat subcommand passthrough, bare executable passthrough,
  bash pattern, exit-code propagation
- test_tui_passthrough.py (Task 0.3): TTY allocation via docker -t
  using the host's script(1) for a PTY
- test_dashboard.py (Task 0.4): HERMES_DASHBOARD=1 opt-in,
  HERMES_DASHBOARD_PORT override
- test_profile_gateway.py (Task 0.5): per-profile gateway
  start/stop and profile-delete-stops-gateway. Both marked
  xfail(strict=True) because the current tini image refuses
  gateway lifecycle commands inside the container; Phase 4
  Task 4.3 flips them to passing.
- test_zombie_reaping.py (Task 0.6): PID 1 reaps orphaned
  zombies. tini does this today; s6-overlay's /init must
  continue to.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:14 -07:00
Ben
08302135b6
test(docker): add conftest fixtures for docker harness
Task 0.1 of the s6-overlay supervision plan. Establishes the test
infrastructure for tests/docker/: skip-on-missing-Docker collection
hook, session-scoped image-build fixture (overridable via the
HERMES_TEST_IMAGE env var for faster local iteration), and a
container_name fixture that ensures cleanup on test exit.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:14 -07:00
kshitijk4poor
00ec0b617c feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably
express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
   → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
   → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
  names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
  the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
  `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
  config defensively so a refactor of the caller can't silently break
  the invariant.

## New files

- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
  `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
  `voice_compatible` (all optional with sane defaults). Mirrors
  `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
  with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
  `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py` — `register_tts_provider()` method on
  `PluginContext`. Matches the gating shape of
  `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
  `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
  the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
  plugin rows into the Text-to-Speech picker category alongside the
  10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
  lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
  test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
  `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
  circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
  built-in-always-wins, command-wins-over-plugin, plugin dispatch,
  exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
  picker surface, builtin shadowing defense, integration with
  `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
  tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
  parity harness vs `origin/main`. The only intentional diff is
  `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
  test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
  `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md` — new "Python plugin
  providers" section with a decision table (command-provider vs
  plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
  mention both surfaces (command-provider primary, plugin for
  SDK/streaming).

Closes #30398
2026-05-24 18:04:54 -07:00
Zyrix
782681f904
fix(google_chat): harden oauth credential persistence with atomic private writes (#24788) 2026-05-24 17:58:52 -07:00
vgocoder
dcc163ee28 fix(security): redact credentials before persistence in session capture
Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:

1. agent/chat_completion_helpers.py :: build_assistant_message
   - Redact assistant content before the message dict is constructed
     (catches PATs / API keys the model inlines into natural language)
   - Redact tool_call.function.arguments at the same site (catches secrets
     inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
   Tool execution uses the raw API response object, not this dict, so
   redacting the persisted shape is safe.

2. run_agent.py :: _save_session_log
   - Add _redact_message_content() static helper that handles both string
     content and OpenAI/Anthropic multimodal list-of-parts (image parts
     pass through untouched, only text/content fields are redacted)
   - Apply to every message + the cached system prompt before writing
     session_*.json

Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.

Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
  - api key in tool content
  - api key in user message
  - api key in system prompt
  - multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.

Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.

Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).

Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 17:58:25 -07:00
JunghwanNA
243ebc7a61 Protect dashboard OAuth credentials with the same file-safety guarantees as other auth paths
The web dashboard's Anthropic OAuth helper wrote the credential file
straight to its final destination and relied on the process umask for
permissions. That left the dashboard-specific path weaker than the
existing auth writers, which already use owner-only permissions and
safer write semantics.

This change keeps the scope narrow: make the dashboard helper write via
a temp file + replace, chmod the final file to owner-only, and add a
focused regression test for both permission handling and atomic-write
behavior.

Constraint: Must preserve the existing dashboard OAuth flow and credential-pool side effects
Rejected: Broader auth-storage refactor | unnecessary scope for a single verified inconsistency
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep dashboard credential writes aligned with existing auth storage semantics; do not reintroduce direct write_text() here without matching chmod/atomic behavior
Tested: pytest -o addopts='' tests/hermes_cli/test_web_server_oauth_write.py tests/hermes_cli/test_web_server.py -q (78 passed)
Not-tested: Cross-platform permission semantics on Windows-managed filesystems
2026-05-24 17:47:24 -07:00
kronexoi
4694524dee fix(security): restrict write access to Anthropic OAuth credential store 2026-05-24 17:47:24 -07:00
Guts
223a3971c0
fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)
_write_claude_code_credentials wrote ~/.claude/.credentials.json via
Path.write_text + replace + post-write chmod(0o600). Both the temp file
and the destination briefly inherited the process umask (commonly 0o644
= world-readable) between create/replace and chmod, exposing the OAuth
access/refresh tokens to other local users on multi-user hosts.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the temp file is created atomically at 0o600. After os.replace,
the destination inherits the temp's mode, so the post-write chmod is no
longer needed. The temp name also gains a per-process random suffix to
avoid collisions between concurrent writers and stale leftovers from a
crashed prior write.

Parent dir (~/.claude/) is owned by Claude Code itself and shared with
its native auth, so we deliberately don't tighten its mode here (unlike
the mcp_oauth fix which owns its own subtree under HERMES_HOME).

Mirrors the fix shipped for agent/google_oauth.py in #19673 and the
parallel fix for tools/mcp_oauth.py in #21148.

Adds a regression test in TestWriteClaudeCodeCredentials asserting the
resulting file mode is 0o600 (skipped on Windows where POSIX mode bits
aren't enforced).
2026-05-24 17:45:12 -07:00
Hinotobi
bba76f3dcd
fix(file-safety): deny reads of Google OAuth tokens (#30972) 2026-05-24 17:45:03 -07:00
Teknium
9c08070703 test(cli): update resume usage-hint assertion for numbered selection
PR #9020's salvage changed the /resume list footer from
'Use /resume <session id or title> to continue.' to
'Use /resume <number>, /resume <session id>, or /resume <session title> to continue.\n  Example: /resume 2'.

test_resume_without_target_lists_recent_sessions still pinned the old
string verbatim and failed in CI. Relax to substring assertions that
allow both the new numbered footer and any future tweaks while still
verifying the hint is shown.
2026-05-24 16:22:48 -07:00
Teknium
c043c86bd7 i18n+tests: add list_item_numbered, list_footer_numbered, out_of_range for 15 locales
The numbered /resume feature added new i18n keys to en.yaml; the catalog parity
tests require every locale to carry matching keys and placeholders, so add
translations to all 15 supported locales.

Also unblock tests/cli/test_cli_resume_command.py:
- _make_cli stub now sets self.resume_display = 'minimal' since
  _handle_resume_command (post-#31695) calls _display_resumed_history.
- mock_db.resolve_resume_session_id returns the input id (no compression
  chain) so HERMES_SESSION_ID is set to a real string, not a MagicMock.
2026-05-24 16:22:48 -07:00
daizhonggeng
fef733d56b feat: support numbered resume selection in cli and gateway 2026-05-24 16:22:48 -07:00
AhmetArif0
4f4e337c47 fix(file-safety): write-deny pairing/ directory to prevent approved-list injection
The gateway pairing directory (~/.hermes/pairing/) stores per-platform
access-control files (telegram-approved.json, discord-approved.json, etc.).
A prompt-injected agent using write_file could add arbitrary user IDs to an
approved file, granting persistent gateway access without going through the
pairing code flow — the same threat class that motivated protecting
webhook_subscriptions.json (#14157).

The pairing directory was not included in the original control-plane protection
because it postdates PR #14157. PR #30383 introduced the hashed-pending schema
and made the approved files the sole source of truth for gateway access, raising
the security sensitivity of the directory.

Apply the same mcp-tokens pattern: block writes to pairing/ and any path within
it, under both the active hermes_home and the root path (for profile-mode parity
with the fix in #30382).

Regression tests verify denial for pairing/telegram-approved.json,
pairing/discord-pending.json, and the directory itself, in both normal and
profile-mode layouts.
2026-05-24 16:15:33 -07:00
LeonSGP43
6c44d537cc fix(cli): show full session titles in /resume list 2026-05-24 16:13:23 -07:00