hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Ben	3b69bdb74e	test(docker): poll for boot-log signal instead of fixed sleeps PR #30136 review item O6: test_container_restart.py used fixed `time.sleep(8)` calls after `docker restart` to wait for the cont-init reconciler to finish. Fixed sleeps are slow when the event happens quickly and false-fail when the event happens slowly. Replace with two polling helpers: * `_wait_for_path(container, path, kind='f' | 'd', deadline_s=...)` — generic `test -f/-d` poller. Returns True on success, False on timeout; callers assert with a clear message. * `_wait_for_reconcile_log_mention(container, profile, ...)` — the reconciler's per-profile log line is the canonical signal that the cont-init reconcile has finished for that profile. Poll on it instead of a sleep that hopes 8 seconds is enough. The fixture-level setup wait is similarly migrated: it now polls for `profile=default` in the boot log (every container always gets a default-slot entry per item I1) and raises a clear timeout error from the fixture if the container never finishes cont-init — much better diagnostics than a mid-test KeyError. The remaining `time.sleep()` calls are all internal interval_s between probe attempts; no fixed wait points left.	2026-05-23 16:21:00 +10:00
Ben	e3050657aa	docs(docker): deprecation warning in entrypoint.sh shim PR #30136 review item O5: docker/entrypoint.sh is now a thin shim that forwards to stage2-hook.sh — the real ENTRYPOINT is /init plus main-wrapper.sh. External scripts that hard-coded entrypoint.sh as the container's ENTRYPOINT will see the cont-init bootstrap happen but the CMD will not be exec'd (because stage2-hook only handles bootstrap; main-wrapper.sh handles the CMD passthrough). Add a stderr warning explaining the new contract and pointing callers at the migration path (drop the --entrypoint override). The shim itself stays in place for one release cycle so the deprecation isn't a hard break — anyone still invoking it sees the warning in their logs and has time to migrate.	2026-05-23 16:18:59 +10:00
Ben	541b40532a	fix(container_boot): publish reconciled service dirs atomically PR #30136 review noted the asymmetry: `register_profile_gateway` used tmp_dir + rename to publish a new service slot atomically, but the boot-time reconciler wrote files into the slot directly. Same underlying concern (a concurrent s6-svscan rescan could observe a half-populated directory), different code path. Rewrite `container_boot._register_service` to mirror the manager: build everything in `<scandir>/gateway-<profile>.tmp/`, then `Path.replace` into place. If a previous interrupted run left a `.tmp` sibling, it's cleaned up before the new build starts. If the target already exists, it's removed before the rename so `Path.replace` doesn't error on a non-empty target (Linux `rename` overwrites empty targets only). Three new tests: atomic publication leaves no .tmp leftovers, overwriting an existing slot still leaves no .tmp leftovers, and a stale .tmp from an interrupted run is cleaned up automatically.	2026-05-23 15:34:51 +10:00
Ben	5b1fcdd16b	fix(container_boot): rotate container-boot.log when it exceeds 256 KiB PR #30136 review noted: container-boot.log was append-only with no rotation. On a long-lived container with frequent restarts and many profiles it would grow unboundedly (~80 B per profile per reconcile pass). Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`, ≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the current file is renamed to `container-boot.log.1` (replacing any existing one) before new entries are appended. Worst case is two files at ~512 KiB — well within visibility limits for grep/cat. Rotation is intentionally simple (no logrotate or s6-log machinery for one append-only file). Failures during rotation are logged via the module logger and treated as non-fatal — we keep appending to the existing file rather than dropping the reconcile entry. Three new unit tests cover above-threshold rotation, below-threshold non-rotation, and overwrite of an existing .1 file.	2026-05-23 15:33:11 +10:00
Ben	f83b9b96d1	docker: drop sh -c wrappers from stage2-hook.sh PR #30136 review caught: three `s6-setuidgid hermes sh -c "..."` invocations in stage2-hook.sh interpolated $HERMES_HOME into a nested shell context. Practically low-risk (a malicious HERMES_HOME already requires container-launch privileges) but the cleaner pattern is to invoke commands directly so the shell isn't a second interpreter. * `mkdir -p` of the data subdirs now runs directly via s6-setuidgid, one path per arg. * The .install_method stamp is written via `printf | tee` — also no shell wrapper. * The skills_sync invocation uses the venv's python by absolute path instead of sourcing activate inside a shell. skills_sync.py doesn't need anything from activate beyond sys.path, which the bin-stub python already provides. No behavior change. Just a smaller attack surface and a script that's easier to read.	2026-05-23 15:31:46 +10:00
Ben	8b6733ebe2	fix(service_manager): rip out dead port parameter PR #30136 review caught: `_allocate_gateway_port()` in profiles.py computed a SHA-256-derived port that was threaded through `register_profile_gateway(profile, port=N)` → `_render_run_script(profile, port, extra_env)` → and then ignored. The rendered run script picked the bind port from the profile's config.yaml (`[gateway] port = …`), never from the allocator. So the entire allocator + parameter chain was dead code. Remove: * `hermes_cli.profiles._allocate_gateway_port` (deterministic SHA-256 → [9200, 9800) — never used). * `port` kwarg from `ServiceManager.register_profile_gateway` (Protocol + Mixin + S6 implementation). * `port` positional arg from `_render_run_script(profile, port, extra_env)` — now `_render_run_script(profile, extra_env)`. * The pass-through call in `profiles._maybe_register_gateway_service`. config.yaml is now the single source of truth for gateway port selection — matches reality and reduces the API surface. Three explanatory comments in service_manager.py / profiles.py document the retirement so future readers don't reach for the allocator and find a ghost. Tests: drop the three `_allocate_gateway_port` tests; update fakes' signatures throughout test_service_manager.py and test_profiles_s6_hooks.py to match the new no-port API.	2026-05-23 15:30:15 +10:00
Ben	7b16e4448a	docs(compose): update entrypoint comment for s6-overlay PR #30136 review caught: docker-compose.yml still said "If you override entrypoint, keep /opt/hermes/docker/entrypoint.sh in the command chain." That was true under tini; under s6-overlay the entrypoint is /init plus main-wrapper.sh, and entrypoint.sh is now only a backward-compat shim. Replace with an accurate description: /init must remain first in the chain because it's PID 1 and runs the cont-init.d scripts (chown, profile reconcile, dashboard toggle) before any service starts.	2026-05-23 15:24:46 +10:00
Ben	9ba349b6e9	fix(docker): dashboard slot stays 'down' when HERMES_DASHBOARD unset PR #30136 review caught a false positive: when HERMES_DASHBOARD was unset, the dashboard run script did `exec sleep infinity`, so `s6-svstat /run/service/dashboard` reported the slot as 'up'. `hermes doctor` and any other s6-svstat-based health check saw the dashboard as supervised-running even though no dashboard process existed. Add cont-init.d/03-dashboard-toggle: writes a `down` marker file into `/run/service/dashboard/` when HERMES_DASHBOARD is falsy, removes any leftover marker when it's truthy. s6-supervise honors `down` by not starting the service, so s6-svstat reports 'down' — matching reality. The run script's HERMES_DASHBOARD case-statement stays in place as a belt-and-suspenders guard, so the two layers can never disagree. Two new integration tests lock the behavior: slot reports down when unset; slot reports up when set to 1.	2026-05-23 15:24:17 +10:00
Ben	1759c0f090	fix(service_manager): friendly errors for missing slots and s6-svc failures PR #30136 review caught: `S6ServiceManager.start/stop/restart` called `subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as a raw `CalledProcessError` traceback. The two cases operators actually hit are: 1. The service slot doesn't exist — most commonly because the user typed a profile name wrong (`hermes -p typo gateway start`). 2. s6-svc itself fails — most commonly EACCES on the supervise control FIFO when running unprivileged. Both deserve named errors with actionable messages, not stacktraces. Changes: * Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`: - `GatewayNotRegisteredError(profile)` — carries the unprefixed profile name; message: `no such gateway 'typo': register it with `hermes profile create typo` first, or pass an existing profile name via `-p <name>``. - `S6CommandError(service, action, returncode, stderr)` — carries the s6-svc rc and stderr; message: `s6-svc start on 'gateway-coder' failed (rc=111): <stderr>`. * Factor lifecycle dispatch through `_run_svc(flag, label, name)`: pre-checks that the service directory exists (raises GatewayNotRegisteredError before invoking s6-svc), then runs s6-svc and translates any CalledProcessError into S6CommandError. * `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway` catches both errors and prints `✗ <message>` + `sys.exit(1)` instead of letting the exception bubble. The dispatch path that used to dump a traceback at the user now gives an actionable one-liner. Tests: 6 new tests for the error types and their CLI rendering; existing lifecycle test pre-seeds the slot directory before calling `mgr.start` etc.	2026-05-23 15:20:41 +10:00
Ben	367c15b1dc	fix(container_boot): always register gateway-default slot PR #30136 review caught: `hermes gateway start` (no `-p`) inside the container resolves `_profile_suffix() == ""` → service name `gateway-default`, but no such slot was ever registered. The Phase 4 profile-create hook only fired on `hermes profile create <name>`, and the root profile (which lives at the top of $HERMES_HOME, not under `profiles/`) was never one of those. So bare `hermes gateway start` landed on `s6-svc -u /run/service/gateway-default` → uncaught `CalledProcessError` → traceback to the user. Changes: 1. `reconcile_profile_gateways` now always registers a `gateway-default` slot before iterating named profiles. Its prior state is read from `$HERMES_HOME/gateway_state.json` (sibling to the profile root, not under `profiles/`); stale runtime files there are swept the same way. Auto-up only if the prior state was `running` — same rule as named profiles. 2. `S6ServiceManager._render_run_script` special-cases `profile == "default"` to emit `hermes gateway run` with NO `-p` flag. Passing `-p default` would resolve to `$HERMES_HOME/profiles/default/` — a different profile that almost certainly doesn't exist. The empty profile-suffix convention is the dispatcher's contract and the run script has to match. 3. A user-created `profiles/default/` collides with the reserved root-profile slot; the reconciler now skips it with a warning rather than producing two registrations of the same service name. Action-list ordering is stable: `default` first, then named profiles in directory order. Boot-log readers can rely on this. Tests: 8 new dedicated default-slot tests plus updates to every existing test that asserted against the action list (via the new `_named_actions` helper that drops the always-present default entry).	2026-05-23 15:16:35 +10:00
Ben	04d1894f36	docs(docker): dashboard IS supervised — update note that contradicted the PR PR #30136 review caught that website/docs/user-guide/docker.md still said "The dashboard side-process is not supervised — if it crashes, it stays down until the container restarts." That was true under tini but is the opposite of the s6 behavior this PR ships and `test_dashboard_restarts_after_crash` proves. Replace with a description of what users actually see now: automatic restart by s6-overlay, new PID after a short backoff, logs via `docker logs`. The standalone-container caveat carries forward unchanged.	2026-05-23 15:08:48 +10:00
Ben	efd3569739	fix(gateway): route --all stop/restart through s6 under container PR #30136 review caught that `hermes gateway stop --all` and `... restart --all` were broken under s6. The Phase 4 dispatcher was gated on `not stop_all` (and the symmetric restart_all), so `--all` fell through to `kill_gateway_processes(all_profiles=True)`. pkill SIGTERMed every gateway, s6-supervise observed the crashes, and restarted every gateway ~1s later — net effect: `--all` kicked gateways instead of stopping them. Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates `mgr.list_profile_gateways()` and routes stop/restart through each service slot. s6's `want up`/`want down` flips correctly, so a stop persists. Partial failures are surfaced per-profile with a running success count; the host pkill path is only reached when s6 isn't in play. `start --all` isn't a CLI surface — the helper rejects it and returns False (host code path can take over).	2026-05-23 15:08:17 +10:00
Ben	8ae959adb6	fix(ci): drop --entrypoint override in hermes-smoke-test action PR #30136 review caught a silent regression: the smoke-test action overrode ENTRYPOINT to `/opt/hermes/docker/entrypoint.sh`, which the s6-overlay migration reduced to a shim that just `exec`s the stage2 hook. stage2-hook ignores its CMD args, prints "Setup complete", and exits 0 — so `hermes --help` and `hermes dashboard --help` never ran. The #9153 regression guard was a green-always no-op. Drop the override so the smoke test uses the image's real ENTRYPOINT chain (`/init` + `main-wrapper.sh`), which is the actual production startup path. `hermes --help` and `hermes dashboard --help` now run through the full supervision tree and exercise the real argv routing.	2026-05-23 15:00:43 +10:00
Ben	eb59d6f774	fix(docker): SHA256-verify s6-overlay tarballs PR #30136 review flagged the s6-overlay install as a supply-chain regression vs the gosu source it replaced — `tianon/gosu` was digest-pinned via `FROM ...@sha256:...`, but the three new ADD/curl downloads had no integrity check at all. Pin all three tarballs (noarch, symlinks-noarch, per-arch) to upstream-published SHA256s via ARGs. Verification happens via `sha256sum -c` against a single checksum file (avoids a piped-shell hadolint DL4006 warning under dash). To bump S6_OVERLAY_VERSION, fetch the four `.sha256` files from the new release and update the ARGs — documented inline. If upstream artifacts are tampered with mid-build, the build now fails loudly at the verification step instead of silently producing a tainted image.	2026-05-23 14:59:42 +10:00
Ben	928e52e574	fix(docker): support multi-arch s6-overlay install (amd64 + arm64) The Dockerfile only ADD'd `s6-overlay-x86_64.tar.xz`, so the `build-arm64` job in docker-publish.yml — which runs on `ubuntu-24.04-arm` and publishes by digest — produced an image whose `/init` couldn't exec on actual arm64 hosts. Apple Silicon and ARM server users were getting a broken container. Map BuildKit's `TARGETARCH` (`amd64` / `arm64`) to s6's kernel-arch naming (`x86_64` / `aarch64`) inside the RUN step and fetch the correct tarball via `curl` (`ADD`'s URL is evaluated at parse time, before TARGETARCH substitution, so dynamic arch selection requires RUN). The noarch + symlinks tarballs are architecture-independent and stay as ADDs. The audit case is now explicit: unsupported architectures fail loudly at build time rather than producing a silently-broken image.	2026-05-23 14:58:06 +10:00
Ben	2f8ceeab9a	fix(service_manager): s6 detection works for unprivileged hermes user PR #30136 review surfaced two issues, both rooted in the same audit gap: docker integration tests were running as root, not the unprivileged `hermes` user (UID 10000) that the runtime actually uses via `s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to the s6 control surface worked as root in the tests but was inert in production. Fixes: 1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`, which is root-only readable. For UID 10000 the symlink yields PermissionError, `resolve()` silently returns the unresolved path, and `exe.name == "exe"` — so detection always returned False, the service-manager runtime-registration path was inert, and every `hermes profile create` / `hermes -p X gateway start` silently skipped the s6 hook. Replace with `/proc/1/comm` (world-readable) + `/run/s6/basedir` (s6-overlay-specific) — both required, fail closed. 2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/` {control,lock} to hermes so `s6-svscanctl -a/-an` works without root. Previously the directory chown stopped at `/run/service` and the FIFO inside stayed root-owned, so `register_profile_gateway` from hermes failed at the rescan-trigger step with EACCES — the wrapper in profiles.py caught the exception and printed a swallowed warning, so profile creation appeared to succeed while the slot was rolled back. Audit changes to flush this class of bug next time: - Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py` that default to `-u hermes`. The module docstring explains why and flags `user="root"` as opt-in only for tests that explicitly need root (none currently do). - Refactor every `docker exec` call in tests/docker/ through the new helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py, test_container_restart.py, test_s6_profile_gateway_integration.py). 
- Add 5 unit tests covering `_s6_running` under various probe states (both signals present; comm wrong; basedir missing; PermissionError on /proc/1/comm; missing /proc — non-Linux). The PermissionError test is the explicit regression guard for the original bug. Known follow-up: the per-service `supervise/control` FIFO inside each `/run/service/gateway-<profile>/supervise/` is created root-owned by s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc -u/-d/-t` from the hermes user will get EACCES on those. The audit under `-u hermes` will reveal this in lifecycle tests — surfacing the issue cleanly so it can be fixed in a focused follow-up (likely via a small SUID helper or a polling chown loop in cont-init.d). The detection + svscanctl fixes here are independent and complete on their own.	2026-05-23 14:56:39 +10:00
Ben	a6f7171a5e	feat(docker): remove gosu from bundled image; s6-setuidgid handles privilege drop The s6-overlay migration replaced every runtime use of gosu with s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run scripts, and cont-init.d hooks), but the gosu binary itself was still being copied into the image from tianon/gosu, and several comments across the repo still pointed to it. Image changes: - Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage - Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer - Net: one fewer base-image pull, ~12-15 MB layer eliminated Documentation/comment refresh (no behavior change): - Dockerfile: update root-user rationale comment + cont-init.d comment - docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference - docker-compose.yml: update UID/GID remap comment - .hadolint.yaml: update DL3002 ignore rationale - website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now - hermes_cli/config.py: docker_run_as_host_user docstring tools/environments/docker.py runs arbitrary user images via the terminal backend, not the bundled Hermes image. It still needs SETUID/SETGID caps so user images that use gosu/su/s6-setuidgid all work. Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and updated comments to list s6-setuidgid alongside the others as examples. The matching test (test_security_args_include_setuid_setgid_for_gosu_drop → test_security_args_include_setuid_setgid_for_privdrop) was renamed and its docstring updated; behavior is unchanged. 
Verification: - hadolint clean against .hadolint.yaml - shellcheck clean against all docker/ shell scripts - Image rebuilt successfully (sha 1a090924ccea) - Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4 per-profile-gateway lifecycle + container-restart reconciliation) - tests/tools/test_docker_environment.py: 23 passed (rename did not break test discovery; pre-existing unrelated mock warning) The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md) intentionally retains its historical references to gosu — it describes the pre-s6 entrypoint as background for understanding the migration.	2026-05-22 11:47:42 +10:00
Ben	7d07dd60a8	docs(s6): document container supervision; doctor + skill + user-guide updates Phase 5 of the s6-overlay supervision plan. Documentation + small diagnostic cleanups; no behavior changes. website/docs/user-guide/docker.md: - Replace the old 'entrypoint script does the bootstrap' section with the s6-overlay boot flow (cont-init.d/01-hermes-setup, cont-init.d/02-reconcile-profiles, static main-hermes + dashboard services, ENTRYPOINT-as-main-program pattern). - Add a 'Per-profile gateway supervision' subsection covering the new lifecycle commands, restart semantics, log persistence, and 'Manager: s6 (container supervisor)' status reporting. - Add 'Breaking change vs. pre-s6 images' callout naming the /init ENTRYPOINT and pointing affected wrappers at the pin workaround. website/docs/user-guide/profiles.md: - Add a note under 'Persistent services' pointing container users at the docker.md section explaining s6 supervision inside the image. Host-side systemd/launchd documentation is unchanged. skills/software-development/hermes-s6-container-supervision/SKILL.md: - New maintainer skill covering the supervision-tree map, file layout, the Architecture B rationale (cont-init.d args + halt exit-code propagation), quick recipes, and the 8 pitfalls we hit while implementing the plan (PATH-without-/command, root-owned profile dirs, SOUL.md as marker, the '143' anti-pattern, etc.). hermes_cli/doctor.py: - _check_gateway_service_linger skips on s6 (the linger concept doesn't apply inside the container). - New _check_s6_supervision section reports main-hermes/dashboard state and per-profile-gateway count (registered vs supervised up), only inside the s6 container. Host doctor output unchanged. - External Tools / Docker check no longer emits a 'docker not found' warning inside the container; prints an explanatory info line instead. Still respects an explicit TERMINAL_ENV=docker (in case the user mounted /var/run/docker.sock). 
hermes_cli/gateway.py: - Document _container_systemd_operational more precisely: it's NOT for our Hermes Docker image (s6-overlay handles that via detect_service_manager() == 's6'). It still covers systemd-nspawn / k8s-with-systemd-init cases, so leaving it in place is correct; the docstring just makes that explicit. Test harness (verification, no test changes in this commit): 19 passed, 0 xfailed. 66 service-manager / container-boot / profiles-s6-hooks / gateway-s6-dispatch unit tests still green. 61 doctor tests still green. Hadolint + shellcheck clean. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:47:42 +10:00
Ben	57c6e29666	feat(docker): per-profile s6 supervision + container-restart reconciliation Phase 4 of the s6-overlay supervision plan. Activates the Phase 3 S6ServiceManager by hooking it into the profile lifecycle and the `hermes gateway start/stop/restart` dispatcher, and adds a cont-init.d-time reconciliation pass that survives `docker restart`. Task 4.0 — container-boot reconciliation: /run/service/ is tmpfs, so every `docker restart` wipes every per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles invokes hermes_cli.container_boot.reconcile_profile_gateways() on every boot, which walks $HERMES_HOME/profiles/<name>/, reads each gateway_state.json, recreates the s6 service slot, and auto-starts only those whose last state was 'running'. Other states (stopped, starting, startup_failed, missing) register the slot in the down state — avoiding crash-loops across restarts for a gateway that was broken last boot. Per-profile outcome is recorded to $HERMES_HOME/logs/container-boot.log. Implementation: hermes_cli/container_boot.py + 12 unit tests. Profile-marker is SOUL.md, not config.yaml, because `hermes profile create` only seeds SOUL.md by default (config.yaml comes from `hermes setup`). Task 4.1 / 4.2 — profile create/delete hooks: hermes_cli/profiles.py::create_profile now calls _maybe_register_gateway_service(<canon>) at the end, which routes through ServiceManager.register_profile_gateway when running on s6 and no-ops on host backends. delete_profile mirrors with _maybe_unregister_gateway_service. _allocate_gateway_port produces a deterministic SHA-256-derived port in [9200, 9800). Task 4.3 — gateway dispatch + remove rejection arms: _dispatch_via_service_manager_if_s6(action) intercepts start/stop/restart at the top of each subcommand and routes them through S6ServiceManager.{start,stop,restart}. 
The pre-Phase-4 `elif is_container():` rejection arms are kept as fallback for pre-s6 containers / unsupported runtimes, but only ever fire when detect_service_manager() != 's6'. install/uninstall under s6 print informational guidance pointing users at profile create/delete. Removed the two xfail(strict=True) markers from tests/docker/test_profile_gateway.py — both tests now pass strictly. Task 4.4 — status reporting: get_gateway_runtime_snapshot() reports Manager: 's6 (container supervisor)' inside an s6 container instead of 'docker (foreground)'. Plan-vs-reality drift fixed in this commit: - Plan's S6ServiceManager._render_run_script used `gateway start --foreground --port {port}` — invented args; the real CLI is `gateway run`. Switched accordingly. port arg retained for API parity but now documented as 'currently ignored'. - Plan's reconciler keyed on config.yaml; switched to SOUL.md (config.yaml is created by hermes setup, not by hermes profile create, so the original gate caught nothing). - The plan's _dispatch helper used _profile_arg() which returns '--profile <name>' (i.e. with the flag prefix). Switched to _profile_suffix() which returns the bare name. - Architecture B's docker exec doesn't get /command on PATH or the venv on PATH; Dockerfile's runtime PATH now includes /opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works without sourcing the venv. - stage2-hook now chowns $HERMES_HOME/profiles to hermes on every boot, not just on the UID-remap path. Without this, files created by docker-exec-as-root accumulate and the next reconciler run fails with PermissionError reading SOUL.md. Test harness: 19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to passing). 78 unit tests across service_manager + container_boot + profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck pass cleanly. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:47:42 +10:00
Ben	ad5fdab092	feat(service_manager): add S6ServiceManager for runtime gateway supervision Phase 3 of the s6-overlay supervision plan. Implements the runtime-registration surface from D4 — only the s6 backend supports register_profile_gateway / unregister_profile_gateway / list_profile_gateways; host backends continue to raise NotImplementedError. No caller yet (Phase 4 wires in the profile create/delete hooks). Key implementation notes: - Service directory shape: /run/service/gateway-<profile>/{type,run,log/run}. Atomic register: write to gateway-<profile>.tmp, fsync via os.rename. Cleanup on rescan failure. - Run script uses #!/command/with-contenv sh so HERMES_HOME and any extra_env arrive at exec time. The hermes -p <profile> gateway start --foreground --port <port> command is wrapped in s6-setuidgid hermes for the per-service privilege drop (OQ2-A). - Log script (OQ8-C): persists via s6-log to ${HERMES_HOME}/logs/gateways/<profile>/. CRITICAL — HERMES_HOME is a runtime env-var expansion in the rendered script, NOT a Python f-string substitution. Negative-asserted in test_s6_register_creates_service_dir_and_triggers_scan so regressions are caught. - PATH gotcha: /command/ is only on PATH for processes spawned by the supervision tree (services, cont-init.d). `docker exec` and profile-create hooks don't get it. S6ServiceManager calls all s6-* binaries via absolute path through the new _S6_BIN_DIR constant so callers don't have to fix up env vars. - validate_profile_name rejects path-traversal, leading-dash (s6 would parse as a flag), uppercase, whitespace, and names >251 chars (s6-svscan default name_max). Test coverage: - 13 new unit tests in tests/hermes_cli/test_service_manager.py (kind detection, run-script content, env quoting, register rollback on rescan failure, unregister idempotence, list filter, lifecycle dispatch, svstat parsing). Total: 36 passing. 
- 2 new in-container integration tests in tests/docker/test_s6_profile_gateway_integration.py validating end-to-end registration against a real s6 supervision tree. Docker harness: 14 passed, 2 xfailed (Phase 4 target unchanged). Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:47:41 +10:00
Ben	4826ea7b41	feat(docker)!: replace tini with s6-overlay as PID 1 BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay) instead of /usr/bin/tini. Main hermes runs as the container CMD with TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the ground is laid for per-profile gateway supervision (Phase 3+4). All five pre-s6 docker run invocation patterns continue to work identically — verified by the Phase 0 docker harness: docker run <image> → `hermes` with no args docker run <image> chat -q "..." → `hermes chat -q ...` passthrough docker run <image> sleep infinity → `sleep infinity` direct docker run <image> bash → interactive bash docker run -it <image> --tui → interactive Ink TUI Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint + shellcheck pass cleanly. Architecture pivot from plan v3 (documented in main-hermes/run header): the plan called for main hermes to be an s6-supervised service, but two real s6-overlay v3 mechanics blocked that — cont-init.d scripts receive no arguments (CMD args are not visible to stage2-hook), and `/run/s6/basedir/bin/halt` after writing the exit code did not propagate the desired exit code (container exits 143). We use the s6-overlay-native CMD pattern instead: main-wrapper.sh is the container's main program (ENTRYPOINT prepends it so leading-dash args like --version aren't intercepted by /init), exec's the final program with stdin/stdout/stderr inherited, and the program's exit code becomes the container exit code. main-hermes is now a no-op `sleep infinity` slot kept for future supervised-gateway-container modes. This trades "supervised restart of main hermes" for arg-parity with the pre-s6 contract — main hermes was already unsupervised under tini, so we lose nothing functional. Dashboard supervision is the only new guarantee added by this phase. 
Files added: docker/main-wrapper.sh # arg routing + s6-setuidgid drop docker/stage2-hook.sh # gosu-equivalent + chown + seed docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base} docker/s6-rc.d/dashboard/{type,run,dependencies.d/base} docker/s6-rc.d/user/contents.d/{main-hermes,dashboard} Files changed: Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:47:41 +10:00
Ben	cf6133495c	feat(service_manager): add ServiceManager protocol + host wrappers Phase 1 of the s6-overlay supervision plan. Pure-refactor addition: introduces the abstract interface (with runtime_checkable Protocol), detect_service_manager(), validate_profile_name(), and thin SystemdServiceManager / LaunchdServiceManager / WindowsServiceManager wrappers around the existing systemd_* / launchd_* / gateway_windows.* module-level functions. No host call site was modified — host code continues to use the existing functions directly; the protocol is for new backend-agnostic code (Phase 4 profile create/delete hooks and the Phase 4 s6 dispatch path in 'hermes gateway start/stop/restart'). WindowsServiceManager.install() forwards the v3 kwargs (start_now, start_on_login, elevated_handoff) added in PRs #28169-adjacent so non-Windows callers — there aren't any today — can opt in. The s6 backend lands in Phase 3; until then get_service_manager() raises a clear error if invoked on a host that detects as 's6'.	2026-05-22 11:47:41 +10:00
Ben	c6febe3765	ci(docker): add hadolint + shellcheck for container build inputs Phase 0.5 of the s6-overlay supervision plan. Catches Dockerfile and shell-script regressions that the behavioral docker-publish smoke test can't surface — unquoted variable expansions, silently-failing RUN commands, missing apt-get clean, etc. Both lint clean against the current (tini) Dockerfile + entrypoint.sh at the configured thresholds (hadolint: warning, shellcheck: error). Each ignore in .hadolint.yaml carries a one-line justification; the shellcheck severity floor is documented in the workflow file. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:47:41 +10:00
Ben	a957ef0834	test(docker): stabilize Phase 0 baseline harness Two pre-existing baseline issues found while running the Phase 0 harness against the tini image that need fixing before later phases can use the harness as a behavior-parity oracle: 1. The autouse `_enforce_test_timeout` fixture in tests/conftest.py hard-coded a 30s SIGALRM, which preempted any `pytest.mark.timeout` marker (already honored by pytest-timeout). Honor the marker if present; fall back to 30s otherwise. Docker harness tests carry a 180s marker applied at collection time in tests/docker/conftest.py. 2. test_dashboard_port_override polled via `ss -tlnp` / `netstat -tln` — neither is installed in the Hermes image, so the probe trivially failed even when the dashboard was bound. The dashboard also takes 8-15s to bind on cold image; the 5s sleep was insufficient. Replace with a poll loop reading /proc/net/tcp directly (port 9120 = 0x23A0, state 0A = LISTEN). Bump probe deadline to 60s and switch test_dashboard_opt_in_starts to a similar poll for pgrep so we don't regress to the same race. Result: 11 passed, 2 xfailed (Phase 4 target) on tini image. Harness now ready to serve as Phase 2's behavior-parity oracle.	2026-05-22 11:47:41 +10:00
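The `/proc/net/tcp` probe described in a957ef0834 can be sketched roughly as follows. This is a minimal illustration, not the harness code: the function names `port_is_listening` and `wait_for_listen`, the `read_table` callback, and the sample table are all hypothetical. Only the port/state encoding (port 9120 = 0x23A0, state 0A = LISTEN) and the 60s deadline come from the commit message.

```python
import time
from typing import Callable

# Hypothetical sample of /proc/net/tcp content, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:23A0 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:C350 0100007F:23A0 01 00000000:00000000 00:00000000 00000000     0        0 12346
"""


def port_is_listening(proc_net_tcp: str, port: int) -> bool:
    """Return True if some socket is in LISTEN state (0A) on the local port."""
    wanted = f"{port:04X}"  # ports are stored as 4 hex digits, e.g. 9120 -> 23A0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = fields[1].rpartition(":")[2]
        if local_port.upper() == wanted and fields[3].upper() == "0A":
            return True
    return False


def wait_for_listen(read_table: Callable[[], str], port: int,
                    deadline_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Poll until the port is listening or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while True:
        if port_is_listening(read_table(), port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In a real harness `read_table` would presumably run something like `docker exec <container> cat /proc/net/tcp`; passing it as a callback keeps the parsing testable without Docker.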
Ben	60d8e07ded	test(docker): apply 180s timeout to docker harness tests The agent-test suite default is 30s; docker tests such as test_no_args, the dashboard spin-up, and the container restart routinely take 60-90s. Without this marker they intermittently fail in CI with TimeoutError.	2026-05-22 11:46:52 +10:00
Ben	244d62ded3	test(docker): lock baseline behavior for Phase 0 harness Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the user-visible behavior we must preserve through the Phase 2 init- system swap: - test_main_invocation.py (Task 0.2): docker run <image> with no args, chat subcommand passthrough, bare executable passthrough, bash pattern, exit-code propagation - test_tui_passthrough.py (Task 0.3): TTY allocation via docker -t using the host's script(1) for a PTY - test_dashboard.py (Task 0.4): HERMES_DASHBOARD=1 opt-in, HERMES_DASHBOARD_PORT override - test_profile_gateway.py (Task 0.5): per-profile gateway start/stop and profile-delete-stops-gateway. Both marked xfail(strict=True) because the current tini image refuses gateway lifecycle commands inside the container; Phase 4 Task 4.3 flips them to passing. - test_zombie_reaping.py (Task 0.6): PID 1 reaps orphaned zombies. tini does this today; s6-overlay's /init must continue to. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:46:52 +10:00
Ben	705256aaa6	test(docker): add conftest fixtures for docker harness Task 0.1 of the s6-overlay supervision plan. Establishes the test infrastructure for tests/docker/: skip-on-missing-Docker collection hook, session-scoped image-build fixture (overridable via the HERMES_TEST_IMAGE env var for faster local iteration), and a container_name fixture that ensures cleanup on test exit. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-22 11:46:52 +10:00
Ben	ef536880a3	docs(plans): add s6-overlay supervision plan (v3) Replace tini with s6-overlay as PID 1 in the Hermes Docker image so that main hermes, the dashboard, and dynamically-created per-profile gateways all run as supervised services. Includes container-boot reconciliation (Task 4.0) so per-profile gateways survive docker restart. Plan history: - v1: 2026-05-07 — original design (subagent gateways scope) - v2: 2026-05-18 — re-validated, scope narrowed to per-profile gateways, WindowsServiceManager added to protocol - v3: 2026-05-21 — re-validated in docker_s6 worktree, install-method stamp preservation noted in Task 2.3, Task 4.0 added for container restart survival 12.5 engineering days estimated across 7 phases.	2026-05-22 11:46:52 +10:00
brooklyn!	a7cd254c29	feat(tui): mouse_tracking DEC mode presets (salvage of #26681 ) (#30084 ) * feat(tui): make display.mouse_tracking pick which DEC modes to enable Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006. Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row fire a clipboard probe that surfaces as "No image in clipboard" — sometimes dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too, since tmux's own scrollback is preempted by the alt-screen TUI. `display.mouse_tracking` (and `/mouse <preset>`) now accepts `off \| wheel \| buttons \| all` in addition to the legacy booleans. `wheel` is 1000+1006: scroll wheel + click only, no drag, no hover — the tmux-friendly subset. `buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.). * fix(tui): repaint + sync mouse mode when display.mouse_tracking changes Two interacting bugs left the TUI blank when `display.mouse_tracking` switched at runtime (config edit, /mouse <preset>): 1. AlternateScreen's effect re-runs on every `mouseTracking` change, tearing down and re-entering the alt screen. After re-entry, ink's frame buffers are reset by `resetFramesForAltScreen()` but nothing schedules the follow-up render — the alt screen sits blank until some other state change happens to trigger one. Add a `scheduleRender()` in `setAltScreenActive`'s active=true branch so the freshly-entered alt screen gets a full repaint immediately. 2. `setAltScreenActive` early-returns when `active` hasn't changed, which silently drops a `mouseTracking` change if the cleanup→setup pair somehow leaves `altScreenActive` already true. Call `setAltScreenMouseTracking` explicitly from the AlternateScreen effect so the in-memory mode and terminal DECSET sequence stay in sync regardless of how `setAltScreenActive` resolved (the call is a no-op when the mode is unchanged). 
* fix(tui): address copilot review #4341269705 - tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already centralizes the canonical preset set via its values; the separate constant added no behavior. - tests/test_tui_gateway_server.py: update the existing test_config_mouse_uses_documented_key_with_legacy_fallback to assert the new preset strings ('all'/'off' instead of 'on'/'off', display.mouse_tracking persisted as 'all' instead of True) and add test_config_mouse_accepts_preset_strings_and_aliases covering /mouse set with wheel/click/unknown (comment #3284802453). The on/off legacy config.set return shape was an implementation detail of the boolean flag, not a stable API — the slash command, gateway help text, and docs all advertise the preset values now. - ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the end of reenterAltScreen() (comment #3284802461). Mirrors the same fix in setAltScreenActive() from `ece0a2f4c` — without it, SIGCONT/resize self-heal/stdin-gap re-entry leaves the alt screen blank because every caller returns early after invoking us. * fix(tui): address copilot review #4341308478 round 2 - ui-tui/src/config/env.ts (comment #3284837577): the precedence comment was misleading. Actual behavior on origin/main is HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default > HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from main; the only change here was the wrong comment that claimed DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block to document the actual precedence ladder. - tui_gateway/server.py /mouse set (comment #3284837607): replaced 'str(value or "").strip().lower()' with the explicit None idiom already used for /indicator, so programmatic callers can pass 0 / False and have them route through _MOUSE_TRACKING_ALIASES → 'off' instead of collapsing to '' and triggering the toggle path. 
- ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx (comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before enableMouseTrackingFor(...) on mount. Otherwise selecting 'wheel'/'buttons' from a state where DEC 1003 was already asserted (crash, another app, debugger) would silently leave hover on. Also unconditionally DISABLE on unmount so a crash mid-mount can't leak DEC modes back to the host shell. * chore(release): map nat@nthrow.io to @nthrow for #26681 salvage * fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen Copilot review #4341356637 (comment #3284880417). The explicit setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true, mouseTracking) was defensive paranoia added in the previous fix commit that's not actually reachable in practice: - React's cleanup always runs before the next setup, so on any prop change (mouseTracking or writeRaw) the cleanup sets active=false first. Setup then sees active was false and applies the new mode via setAltScreenActive without early-returning. - On the impossible 'active stayed true' path, the writeRaw above has already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode) to the terminal, so the in-memory mode would lag but the visible state is already correct. Removing the redundant call means a single DEC sequence per mount. If the 'active stayed true' path ever manifests in practice, the right fix is in setAltScreenActive (track mode regardless of the active early-return), not here. * fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx Copilot review #4341379994 (comments #3284900825, #3284900840, #3284900852). 
Three remaining call sites in ink.tsx still re-enabled mouse tracking without first sending DISABLE_MOUSE_TRACKING: - handleResize alt-screen recovery (line ~577) - reassertTerminalModes stdin-gap re-assertion (line ~1351) - reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408) For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally- asserted DEC 1003 (other apps, prior crash, tmux state) still active and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING is idempotent and safe to send unconditionally — it resets all four modes. Matches the pattern already in setAltScreenMouseTracking and the AlternateScreen mount path. * fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen Copilot review #4341452823 (comment #3284959762). exitAlternateScreen() was the last call site in ink.tsx still re-enabling mouse tracking without DISABLE first. Editors (vim/nvim/less) and tmux can leave DEC 1003 hover asserted across the handoff back; without DISABLE, 'wheel'/'buttons' presets silently kept hover on after the editor quit. Now all five enableMouseTrackingFor() call sites in ink.tsx prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes, reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen. * fix(tui): add defensive default to enableMouseTrackingFor switch Copilot review #4341485231 (comment #3284979323). TS exhaustive switch returns string per the type system, but a JS caller / corrupted config / hot-reload-in-dev could reach the function with an unknown value at runtime. Without a default, that path returns undefined which then concatenates as the literal string 'undefined' into the terminal byte stream — visibly garbling output. Treat unknown as 'off' (no DEC sequences) so the worst case is silent input loss rather than a wrecked screen. --------- Co-authored-by: Nat Thrower <nat@nthrow.io>	2026-05-21 20:25:52 -05:00
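The preset-to-DEC-mode mapping and the "always DISABLE first" rule from a7cd254c29 can be sketched as below. The real code is TypeScript in ink.tsx; this is a Python illustration with hypothetical names (`PRESET_MODES`, `reassert_mouse_mode`), and the order of the emitted sequences is an assumption. The mode numbers per preset and the "unknown preset behaves like off" default come from the commit message; `ESC [ ? n h` / `ESC [ ? n l` are the standard DECSET/DECRST forms.

```python
ESC = "\x1b"
ALL_MODES = (1000, 1002, 1003, 1006)

# Preset -> DEC private modes, per the commit message.
PRESET_MODES = {
    "off": (),
    "wheel": (1000, 1006),             # scroll wheel + click, no drag, no hover
    "buttons": (1000, 1002, 1006),     # adds drag-to-select
    "all": (1000, 1002, 1003, 1006),   # adds any-motion (hover)
}


def disable_mouse_tracking() -> str:
    """DECRST for all four modes; safe to send unconditionally."""
    return "".join(f"{ESC}[?{mode}l" for mode in ALL_MODES)


def enable_mouse_tracking_for(preset: str) -> str:
    """DECSET for the preset's modes; unknown presets are treated as 'off'."""
    modes = PRESET_MODES.get(preset, ())
    return "".join(f"{ESC}[?{mode}h" for mode in modes)


def reassert_mouse_mode(preset: str) -> str:
    """Reset first so an externally asserted mode 1003 cannot leak through."""
    return disable_mouse_tracking() + enable_mouse_tracking_for(preset)
```

Returning an empty string for unknown values mirrors the defensive default described above: nothing like the literal text `undefined` can end up in the terminal byte stream.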
Ben Barclay	4d58e48cdb	Merge pull request #29387 from NousResearch/fix/no-docker-tag fix(ci): stop pushing per-commit SHA tags to Docker Hub	2026-05-22 10:38:32 +10:00
xxxigm	bec2250d2c	test(computer_use): end-to-end regression for capture routing (#24015 ) Add tests/tools/test_computer_use_capture_routing.py — 13 integration tests that drive _capture_response end-to-end with deterministic stubs for the routing helper, _run_async, vision_analyze_tool, and get_hermes_dir, so the full code path is exercised without a live cua-driver, real auxiliary client, or network access. Coverage: * TestCaptureResponseDefaultPath (3 cases) - SOM PNG capture returns the legacy multimodal envelope when the routing helper says 'native' (image/png MIME). - Same path returns image/jpeg MIME for JPEG payloads (cua-driver can return either). - AX-only mode never even consults the routing helper because no PNG is present. * TestCaptureResponseRoutedToAuxVision (5 cases) - SOM capture with routing on returns a JSON string with the vision_analysis embedded, the AX/SOM index preserved, and NO image_url parts. Verifies the aux call receives a path under the configured cache and a prompt that grounds itself against the AX summary. - Temp screenshot file is unlinked after _capture_response returns, including when the aux call raises (the finally block runs). - Empty / malformed aux analysis falls back to the multimodal envelope so the user always gets something useful. * TestRoutingDecisionWiring (4 cases) - Explicit auxiliary.vision in config flips routing on regardless of main-model vision capability. - Vision-capable main + native tool-result support keeps multimodal. - Config load failure fails open (returns False, multimodal path continues to work). - Helper exception is swallowed and routes to legacy behaviour. * TestBugReproductionAnchor (1 case) - directly pins the #24015 contract: when routing is on, the response must NEVER contain a 'data:image' or 'image_url' substring. That is exactly what tripped the reporter's HTTP 404 ('No endpoints found that support image input') on tencent/hy3-preview before the fix. 
Bug-reproduction proof: $ git checkout upstream/main -- tools/computer_use/tool.py $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py ============================== 13 failed in 1.29s ============================== $ # restore tool.py to this branch's HEAD $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py ============================== 13 passed in 1.04s ============================== Total branch coverage: 85 passed across test_computer_use.py, test_computer_use_vision_routing.py, test_computer_use_capture_routing.py	2026-05-21 17:38:19 -07:00
xxxigm	e02a7e5e1c	fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015 ) When the active main model has no vision capability — or when the user explicitly configured auxiliary.vision in config.yaml — sending the captured screenshot back to the main model in a multimodal tool-result envelope is the wrong move: it trips HTTP 404 / 400 at the provider boundary (e.g. 'No endpoints found that support image input') and the agent loop reports a hard tool failure for what should have been a simple capture. The reporter on #24015 hit this with: model: default: tencent/hy3-preview # no vision support provider: openrouter auxiliary: vision: provider: openrouter model: google/gemini-2.5-flash # explicitly configured …and observed: computer_use(action='capture', mode='som') → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404] 🔌 Provider: openrouter Model: tencent/hy3-preview 📝 Error: HTTP 404: No endpoints found that support image input Fix: in tools/computer_use/tool.py::_capture_response, after a screenshot is captured (modes 'som' / 'vision'), consult the routing helper introduced earlier in this branch. When it says 'route to aux', materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze on it (which honours auxiliary.vision via the standard async_call_llm task='vision' router), and return a text-only JSON tool result that embeds the analysis alongside the existing AX/SOM index. The main model never sees the pixels — it sees an actionable text description plus the same set-of-mark element index it normally uses. The two new helpers (_should_route_through_aux_vision, _route_capture_through_aux_vision) keep the policy and the IO separated so each can be tested in isolation. Both fail open: if the config import fails, if the aux call raises, or if the analysis is empty, we fall back to the existing multimodal envelope so the behaviour is at worst the pre-fix status quo. 
Temp screenshot files are cleaned up unconditionally in a finally block — even on aux call failure — to avoid leaving residue under cache/vision/. The end-to-end regression for #24015 is added in the next commit.	2026-05-21 17:38:19 -07:00
xxxigm	5ce5fe3181	test(computer_use): cover capture vision-routing helper Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests that pin the contract of the new vision-routing helper introduced in the previous commit: * TestExplicitAuxVisionOverride (12 cases): mirror the auxiliary.vision detection rules used by agent.image_routing so the capture path and the user-attached-image path agree on what counts as an explicit override (provider/model/base_url with non-blank, non-'auto' values). * TestRouteDecision (7 cases): pin the policy itself — explicit override always wins, vision-capable + native-tool-result keeps multimodal, everything else fails closed and routes to aux. * TestLookupHelpers (5 cases): defensive paths for the models.dev / tool-result-support lookups (blank inputs, exceptions, missing caps). * TestModuleSurface (4 cases): pin the public/__all__ surface and keep internal helpers addressable so the integration test in the next commit can monkeypatch them deterministically. Run with: scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py	2026-05-21 17:38:19 -07:00
xxxigm	531efe7208	fix(computer_use): add helper to decide capture vision routing Add tools/computer_use/vision_routing.py with should_route_capture_to_aux_vision(provider, model, cfg) — a small policy helper that decides whether a captured screenshot should be returned as a multimodal envelope (main model has native vision) or pre-analysed through the auxiliary.vision pipeline so the main model only sees text. The decision mirrors agent.image_routing.decide_image_input_mode for user-attached images, so the capture path and the user-turn path agree on what counts as an explicit aux vision override: * provider/model/base_url under auxiliary.vision => explicit override => route through aux vision * provider+model accepts multimodal tool results AND main model reports supports_vision=True => keep multimodal envelope * everything else (no tool-result image support, non-vision model, metadata lookup failure) => fail closed and route through aux No call sites are changed in this commit; the helper is added in isolation so the routing decision can be unit-tested before it is plumbed into _capture_response().	2026-05-21 17:38:19 -07:00
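The three-branch policy in 531efe7208 can be sketched as follows. This is not the actual `should_route_capture_to_aux_vision(provider, model, cfg)` implementation: the sketch takes the two capability flags as pre-computed arguments (an assumption, since the real helper looks them up from provider and model), and the names `has_explicit_aux_vision` and `should_route_to_aux` are hypothetical. The "non-blank, non-'auto'" rule for overrides is taken from the test description in 5ce5fe3181.

```python
from typing import Any, Mapping, Optional


def has_explicit_aux_vision(cfg: Mapping[str, Any]) -> bool:
    """True if auxiliary.vision sets provider, model, or base_url explicitly."""
    aux = cfg.get("auxiliary")
    vision = aux.get("vision") if isinstance(aux, Mapping) else None
    if not isinstance(vision, Mapping):
        return False
    for key in ("provider", "model", "base_url"):
        value = str(vision.get(key) or "").strip()
        if value and value.lower() != "auto":
            return True
    return False


def should_route_to_aux(cfg: Mapping[str, Any],
                        supports_vision: Optional[bool],
                        accepts_tool_images: Optional[bool]) -> bool:
    """Return True to pre-analyse the capture via auxiliary.vision."""
    if has_explicit_aux_vision(cfg):
        return True   # explicit override always wins
    if supports_vision is True and accepts_tool_images is True:
        return False  # keep the multimodal envelope
    return True       # unknown or unsupported: fail closed, route to aux
```

Passing `None` for either flag models a metadata lookup failure, which lands in the fail-closed branch.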
Teknium	2a474bcf72	fix(termux): resolve packed-refs and worktree refs in skill-sync fingerprint The bundled-skill sync stamp added in the cherry-picked salvage commit parsed .git/HEAD and looked for a loose ref file in the worktree gitdir only, so two real cases hit the unresolved branch: - repos after `git gc` where active refs live in packed-refs - linked worktrees, whose branch ref lives in <commondir>/refs/heads/ (verified on the worktree this salvage was built in) Both fell back to a constant-string fingerprint, so post-commit launches would never re-run the real skill sync. Now we resolve packed-refs and check both the worktree gitdir and the common dir for loose refs. Adds tests covering: packed-refs resolution, worktree common-dir packed lookup, worktree common-dir loose lookup, and the explicit 'unresolved' marker (still stable + version-fallback-safe).	2026-05-21 17:19:05 -07:00
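The lookup order described in 2a474bcf72 (loose ref in the worktree gitdir or common dir, then packed-refs) might look roughly like this. It is a sketch under assumptions, not the skill-sync code: `lookup_packed_ref` and `resolve_head` are hypothetical names, and returning `None` stands in for the commit's 'unresolved' marker. The `commondir` file and the `packed-refs` line format are standard git repository layout.

```python
from pathlib import Path
from typing import Optional


def lookup_packed_ref(packed_refs_text: str, ref: str) -> Optional[str]:
    """Find `ref` in packed-refs content; skip header and peeled (^) lines."""
    for line in packed_refs_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, _, name = line.partition(" ")
        if name == ref:
            return sha
    return None


def resolve_head(gitdir: Path) -> Optional[str]:
    """Resolve HEAD to a commit id, or None if it cannot be resolved."""
    head = (gitdir / "HEAD").read_text().strip()
    if not head.startswith("ref:"):
        return head  # detached HEAD already holds a commit id
    ref = head[len("ref:"):].strip()

    # Linked worktrees keep branch refs in the common dir, not their own gitdir.
    search_dirs = [gitdir]
    commondir_file = gitdir / "commondir"
    if commondir_file.is_file():
        search_dirs.append((gitdir / commondir_file.read_text().strip()).resolve())

    for directory in search_dirs:  # loose refs first
        loose = directory / ref
        if loose.is_file():
            return loose.read_text().strip()
    for directory in search_dirs:  # then packed-refs (e.g. after `git gc`)
        packed = directory / "packed-refs"
        if packed.is_file():
            sha = lookup_packed_ref(packed.read_text(), ref)
            if sha:
                return sha
    return None
```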
adybag14-cyber	6dbbf20ff4	perf(termux): speed up non-tui cli startup	2026-05-21 17:19:05 -07:00
briandevans	5aa4727f34	fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1) `CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back to the frontmost on-screen window when X matched no app — typically a menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than the requested app. The agent then received UI elements for the wrong app and clicked / typed into it. The root cause is a localized macOS app name mismatch: `list_windows` returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese system) but callers naturally pass the English name ("Calculator"). The substring filter doesn't match, and the code falls through to picking the frontmost window with no signal that the filter was effectively dropped. Fix: - `capture(app=…)`: when the filter matches nothing, return a `CaptureResult` with empty `app`/`elements` and a diagnostic `window_title` pointing the caller at `list_apps` and noting the localized-name convention. `_active_pid` / `_active_window_id` are left untouched so a subsequent action doesn't inadvertently hit the wrong process. - `focus_app(app=…)`: when the filter matches nothing, set `target = None` and let the existing `return ActionResult(ok=False, …, "No on-screen window found for app …")` path fire instead of falsely reporting success on the frontmost window. This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046; bugs 3 & 4 in #30032.	2026-05-21 17:15:35 -07:00
Bartok9	4cc18877c6	fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5) Bug 2 (capture_after=True loses app context): _maybe_follow_capture called backend.capture(mode='som') with no app=, causing cua-driver to capture the frontmost window instead of the app targeted by the preceding capture/focus_app. Fix: track _last_app on CuaDriverBackend and thread it through the follow-up capture call so the same app is re-captured regardless of which window has OS focus. Bug 5 (element labels stripped in capture results): _ELEMENT_LINE_RE matched the classic ' - [N] AXRole "label"' format but not the '[N] AXRole (order) id=Label' format introduced in cua-driver v0.1.6. All element labels were silently dropped as empty strings, making element identification impossible. Fix: extend regex to capture both group(3) (quoted label) and group(4) (id= label), and update _parse_elements_from_tree to use group(4) as fallback. Both old and new cua-driver output now produce populated UIElement.label values. focus_app() now also sets _last_app so that capture_after= on any subsequent action re-targets the focused app. 5 new regression tests added. Part of #24170 (bugs 1 and 3/4 addressed separately).	2026-05-21 14:19:09 -07:00
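A regex that accepts both element-line shapes named in 4cc18877c6 could look like the sketch below. The pattern is an assumption built only from the two examples in the commit message (` - [N] AXRole "label"` and `[N] AXRole (order) id=Label`); the actual `_ELEMENT_LINE_RE` and the full cua-driver output grammar are not shown in this log, and `parse_element_line` is a hypothetical name. The group numbering (3 = quoted label, 4 = `id=` label) follows the commit's description.

```python
import re
from typing import Optional, Tuple

# group(1) index, group(2) role, group(3) quoted label, group(4) id= label
ELEMENT_LINE_RE = re.compile(
    r'^\s*(?:-\s*)?'           # optional leading "- " bullet (classic format)
    r'\[(\d+)\]\s+(\w+)'       # [N] AXRole
    r'(?:\s+"([^"]*)")?'       # optional quoted label
    r'(?:\s+\([^)]*\))?'       # optional (order) marker
    r'(?:\s+id=(\S.*?))?'      # optional id=Label
    r'\s*$'
)


def parse_element_line(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (index, role, label), or None if the line is not an element."""
    match = ELEMENT_LINE_RE.match(line)
    if match is None:
        return None
    # Prefer the quoted label; fall back to the id= label, then to empty.
    label = match.group(3) or match.group(4) or ""
    return int(match.group(1)), match.group(2), label
```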
Teknium	3fde8c153d	fix(skills): prune dependency/venv dirs from all skill scanners (#30042 ) * fix(skills): skip dependency dirs in skill scan * fix(skills): widen sibling rglob scanners to use shared exclusion set Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS in the canonical walker (iter_skill_index_files), which fixes the user-visible discovery path. This commit sweeps the ~12 other rglob('SKILL.md') sites that did their own ad-hoc filtering — most only checked .git/.hub, some had no filter at all — so dependency dirs (.venv, node_modules, site-packages, etc.) cannot leak ghost skills through the secondary paths. Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates all 13 sites to use it. Removes 3 hardcoded duplicate filter sets. Sites touched: agent/curator_backup.py - skill backup file count gateway/run.py - disabled-skill response (2 sites) hermes_cli/dump.py - skill count in env dump hermes_cli/profile_describer.py- profile description (2 sites) hermes_cli/profile_distribution.py - profile install count hermes_cli/profiles.py - profile skill count hermes_cli/skills_hub.py - category detection tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper) tools/skill_usage.py - usage tracking + skill dir lookup (2 sites) tools/skills_hub.py - optional skills find + scan (2 sites) tools/skills_sync.py - bundled skills sync E2E verified with the exact reported shape (bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no sibling site picks up the ghost skill, all five legit-skill counts still return 1. * chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep --------- Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>	2026-05-21 14:18:02 -07:00
helix4u	3462b097e2	fix(voice): chunk oversized CLI recordings	2026-05-21 14:17:39 -07:00
Teknium	552e9c7881	feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035 ) * feat(secrets): Bitwarden Secrets Manager integration with lazy bws install Pull API keys from Bitwarden Secrets Manager at process startup instead of storing them all in plaintext in ~/.hermes/.env. One bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and rotating a credential becomes a single change in the Bitwarden web app. Bitwarden defaults to source of truth: secrets pulled from BSM overwrite any matching env vars on startup so rotations actually take effect. Set secrets.bitwarden.override_existing: false in config.yaml to invert. The bws binary is auto-downloaded into ~/.hermes/bin/bws on first use (pinned to v2.0.0, SHA-256 verified against the GitHub release checksum file). No apt, brew, or sudo required. New surfaces: hermes secrets bitwarden setup — interactive wizard hermes secrets bitwarden status — config + binary + token state hermes secrets bitwarden sync — dry-run fetch / --apply exports hermes secrets bitwarden disable — flip enabled: false hermes secrets bitwarden install — just download the binary Failures (missing binary, bad token, no network) never block Hermes startup — they emit a one-line warning to stderr and continue with whatever credentials .env already had. Docs: website/docs/user-guide/secrets/{index,bitwarden}.md Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws subprocess and HTTP downloads fully mocked) * chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid Generated for PR #30035 — Bitwarden Secrets Manager integration. Style picked via pick_pr_infographic_style.py rotation: layout: bento-grid style: retro-pop-grid aspect: 1:1 square Saved at infographic/bitwarden-secrets-manager/infographic.png	2026-05-21 14:10:34 -07:00
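The "SHA-256 verified against the GitHub release checksum file" step in 552e9c7881 can be illustrated with the sketch below. The checksum file layout (`<hex digest>  <filename>` per line, as produced by `sha256sum`) is an assumption, as are the function names and any asset filename passed in; the commit message does not show the real file format or downloader.

```python
import hashlib
import hmac
from typing import Optional


def expected_digest(checksum_text: str, filename: str) -> Optional[str]:
    """Find the digest listed for `filename` in a sha256sum-style listing."""
    for line in checksum_text.splitlines():
        parts = line.split()
        # A leading '*' marks binary mode in sha256sum output.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify_sha256(payload: bytes, checksum_text: str, filename: str) -> bool:
    """True only if the payload hashes to the digest listed for `filename`."""
    wanted = expected_digest(checksum_text, filename)
    if wanted is None:
        return False  # no listed digest: refuse rather than trust the download
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, wanted)
```

A caller would presumably write the binary to disk only after `verify_sha256` returns True, and otherwise emit the one-line warning the commit describes.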
liuhao1024	18cd1e5c72	fix(computer_use): correct type_text MCP tool name and implement drag action Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars' which does not exist in current cua-driver. Changed to 'type_text' which is the correct MCP tool name. Bug 4: The drag() method returned a hardcoded 'not supported' error even though cua-driver exposes a 'drag' MCP tool. Implemented proper drag dispatching with coordinate-based and element-based targeting. Added dispatch-level validation for drag to ensure from/to coordinates or elements are provided before calling any backend. Fixes #24170 (bugs 3 and 4)	2026-05-21 14:08:28 -07:00
github-actions[bot]	0ce12a9241	fix(nix): auto-refresh npm lockfile hashes Source: `56b79f12ac` Run: https://github.com/NousResearch/hermes-agent/actions/runs/26250404490	2026-05-21 20:11:48 +00:00
Teknium	56b79f12ac	fix(dashboard): remove country flags from language picker (#29997) Closes #29750. Reporter flagged that 繁體中文 displayed the TW flag instead of the PRC flag. Rather than picking a side, drop the language-flag pairings entirely: languages aren't countries (English ≠ GB, Portuguese ≠ PT, Mandarin variants ≠ any single jurisdiction), and endonyms are unambiguous. - LOCALE_META: strip flagCountryCode field - LanguageSwitcher: remove LocaleFlagIcon component + both call sites - main.tsx: drop flag-icons CSS import - package.json: uninstall flag-icons	2026-05-21 13:10:52 -07:00
teknium1	3d2f146460	fix(tui): also pass --expose-gc on the wheel-bundled launch path The original PR fixed the ext_dir and built-tui paths but missed the sibling pip-wheel path at line 1155. Without this, wheel installs would lose --expose-gc entirely (the env-var append at the call site was already removed). All three production node-launch sites now pass --expose-gc via argv consistently.	2026-05-21 13:10:34 -07:00
teknium1	2e3f576298	chore(release): map yichengqiao21 to YarrowQiao	2026-05-21 13:10:34 -07:00
YarrowQiao	2ea7cf287e	fix(tui): pass --expose-gc as node argv instead of NODE_OPTIONS Node refuses to start when NODE_OPTIONS contains --expose-gc: node: --expose-gc is not allowed in NODE_OPTIONS NODE_OPTIONS is restricted to a small allowlist of flags that are safe to inject via env (since any process able to set env vars on a node child could otherwise enable arbitrary capabilities). --expose-gc is not on that list and never has been -- it must be passed as a direct CLI flag. _launch_tui() was appending --expose-gc to NODE_OPTIONS before spawning the TUI's node process, which made `hermes --tui` fail to start on every modern node release. The intent (manual GC for long sessions to avoid fatal-OOM) is preserved by inserting --expose-gc directly into the node argv in _make_tui_argv() -- same effect, but actually allowed. --max-old-space-size=8192 stays in NODE_OPTIONS: it is allowlisted, and keeping it there means downstream node spawns inherit the same heap cap without having to re-thread the flag through every spawn site. The dev paths (`tsx src/entry.tsx` and `npm start` fallback) are left alone -- they don't accept node flags directly, and the production dist path is the one users actually hit via `hermes --tui`. Repro before fix: $ hermes --tui /usr/bin/node: --expose-gc is not allowed in NODE_OPTIONS	2026-05-21 13:10:34 -07:00
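The split described in 2ea7cf287e (`--expose-gc` in argv, `--max-old-space-size=8192` in `NODE_OPTIONS`) can be sketched like this. It is not the body of `_make_tui_argv()` or `_launch_tui()`: the function name `make_tui_launch`, its parameters, and the "don't override an existing heap cap" behaviour are assumptions for illustration.

```python
from typing import Dict, List, Mapping, Tuple

HEAP_FLAG = "--max-old-space-size=8192"  # allowlisted in NODE_OPTIONS


def make_tui_launch(node: str, entry: str,
                    base_env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for spawning the TUI's node process."""
    # --expose-gc is rejected in NODE_OPTIONS, so it goes directly into argv.
    argv = [node, "--expose-gc", entry]

    env = dict(base_env)
    # Simple whitespace split; quoted values in NODE_OPTIONS are not handled.
    options = env.get("NODE_OPTIONS", "").split()
    if not any(opt.startswith("--max-old-space-size") for opt in options):
        options.append(HEAP_FLAG)  # child node processes inherit the heap cap
    env["NODE_OPTIONS"] = " ".join(options)
    return argv, env
```

The result could be handed to `subprocess.Popen(argv, env=env)`; keeping the heap flag in the environment means downstream node spawns pick it up without extra plumbing, as the commit notes.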
helix4u	ba9964ff0d	fix(custom): pass custom provider extra body Allow custom OpenAI-compatible providers declared under `custom_providers:` to set provider-specific `extra_body` fields and have Hermes merge them into chat-completions requests when the matching custom endpoint is active. This is a manual per-provider override rather than a model-name heuristic. OpenAI-compatible Gemma thinking support is real, but the on-wire payload shape is backend-specific: some servers want top-level `enable_thinking`, while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`. A per-provider override is safer than picking one assumed payload. Example config: ```yaml custom_providers: - name: gemma-local base_url: http://localhost:8080/v1 model: google/gemma-4-31b-it extra_body: enable_thinking: true reasoning_effort: high ``` For vLLM Gemma or NIM-style endpoints, use the nested shape those servers expect: ```yaml extra_body: chat_template_kwargs: enable_thinking: true ``` Changes: - `hermes_cli/config.py`: preserve `extra_body` in normalized `custom_providers:` entries and allow it in the validated field set. - `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body` as `request_overrides.extra_body` for named custom runtime resolution, including credential-pool paths. - `agent/agent_init.py`: at agent init, locate the matching custom-provider entry by `base_url` (+ optional model) and merge its `extra_body` into `AIAgent.request_overrides`, with caller-provided overrides winning on conflicting top-level keys. - `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile behavior (Ollama `num_ctx`, `think=False` when reasoning disabled); user-configured `extra_body` flows through `request_overrides`. - `website/docs/integrations/providers.md`: document the explicit `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant. 
- Tests cover config normalization, runtime propagation, model matching, trailing-slash equivalence, fallback when no `model` field is set, and caller-override merging precedence. Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`: configured `extra_body` reaches `kwargs.extra_body` on the wire request, and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`) without clobber. Salvaged from #29022 onto current `main`. Cosmetic typing edit in `plugins/model-providers/custom/__init__.py` and a stale-base docs revert in `providers.md` were dropped during cherry-pick. Closes #29022	2026-05-21 07:48:53 -07:00
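The matching and merge rules from ba9964ff0d (match by `base_url` with trailing-slash equivalence and optional `model`, then merge with caller overrides winning on top-level keys) can be sketched as below. The function names and the plain-dict shapes are hypothetical; the real logic lives in `agent/agent_init.py` and may differ in detail.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def find_custom_provider(providers: Iterable[Mapping[str, Any]],
                         base_url: str,
                         model: Optional[str] = None) -> Optional[Mapping[str, Any]]:
    """Return the first entry whose base_url (and model, if set) matches."""
    wanted = base_url.rstrip("/")
    for entry in providers:
        if str(entry.get("base_url") or "").rstrip("/") != wanted:
            continue
        entry_model = entry.get("model")
        # An entry without a `model` field matches any model.
        if entry_model and model and entry_model != model:
            continue
        return entry
    return None


def merge_extra_body(provider_extra: Optional[Mapping[str, Any]],
                     caller_extra: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Shallow merge: caller-provided top-level keys override the provider's."""
    merged: Dict[str, Any] = dict(provider_extra or {})
    merged.update(caller_extra or {})
    return merged
```

Because the merge is shallow, a caller that supplies its own `chat_template_kwargs` replaces the provider's nested mapping wholesale rather than merging into it, which is one reading of "winning on conflicting top-level keys".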
ethernet	2fdefca570	Merge pull request #28269 from cresslank/chore/tui-remove-unused-babel-deps chore(tui): remove unused Babel build deps	2026-05-21 10:21:31 -04:00
ethernet	48be2e0e4d	test: use subprocesses for each test file (#29016) * ci(tests): install ripgrep from prebuilt tarball instead of apt apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu runners (the apt-get update against archive.ubuntu.com is the slow part; ripgrep itself is small). Switching to the upstream musl binary tarball cuts the step to a few seconds. - Pinned to ripgrep 15.1.0 with sha256 verification (same hash as published in the releases sha256 sidecar file). - Drops the `rg` binary into /usr/local/bin so it is on PATH for every subsequent step without GITHUB_PATH manipulation. - Applied to both the test and e2e jobs in tests.yml. * fix(cli): compile syntax check to tempdir, not source __pycache__ `_validate_critical_files_syntax` runs `py_compile.compile()` on each critical bootstrap file after a successful `git pull`. The default `py_compile` writes the resulting `.pyc` next to the source under `__pycache__/`, which causes two real problems: 1. Parallel test workers walking the same source tree (e.g. running the suite under per-file process isolation) can race against each other on the `__pycache__` write, which manifests as flaky 'directory not empty' errors during teardown. 2. In production, the post-pull syntax check leaves a `.pyc` behind that the next interpreter run might pick up, which is fine when the interpreter version matches and sketchy if it doesn't. Fix: write the compiled output to a `tempfile.TemporaryDirectory()` that's discarded on function exit. We only care about the compile-or-not signal, not the artifact. * test(runner): per-file process isolation, drop manual state reset + xdist Replace fragile manual _reset_module_state test fixtures with robust per-file subprocess isolation. Each test file runs in a fresh `python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist, no custom pytest plugin, no shared worker state. 
Key changes: * scripts/run_tests_parallel.py: new runner that discovers test files, runs N in parallel via ThreadPoolExecutor, captures stdout per file, treats exit code 5 (no tests collected) as pass, kills all children on exit. Change the worker count from cpu_count to cpu_count * 2. The runner is I/O-bound (waiting on subprocess.communicate() from pytest children). The parent process does almost no CPU work, so 2x oversubscription keeps more pipes full. When a file fails, immediately show the last 30 lines of pytest output (stack traces + FAILED summary) plus a ready-to-copy repro command: python -m pytest tests/agent/test_auxiliary_client.py * scripts/run_tests.sh: delegates to run_tests_parallel.py * .github/workflows/tests.yml: test step runs python scripts/run_tests_parallel.py * pyproject.toml: drop pytest-xdist, pytest-split; simplify addopts * tests/conftest.py: remove ~200 lines of manual state-reset fixtures * AGENTS.md: update Testing section for per-file design * test(runner): speed gateway test antipattern scan up * fix(test): web search provider plugin test missing xai * fix(tests): make 14 test files pass under per-file subprocess isolation Tests that relied on cross-file state pollution from xdist workers fail when run in isolation (per-file subprocess model). 
Root causes and fixes: Tool registry not populated: - test_video_generation_tool_surface_matrix: add discover_builtin_tools() - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures registering all 8 bundled web providers, reset after each test - test_website_policy: same provider registration pattern - test_web_tools_tavily: same pattern across 3 dispatch test classes - Also add is_safe_url/check_website_access mocks where SSRF check blocks example.com (DNS resolution fails in isolated envs) Stale check_fn cache: - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache() in both kanban guidance tests (prior test cached False for kanban_show) - test_discord_tool: cache invalidation in setup/teardown - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries Module-level state pollution: - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache - test_skill_commands: set_session_vars() instead of patch.dict(os.environ) (ContextVar takes precedence over os.environ) - test_dm_topics: overwrite sys.modules + separate telegram.constants mock + force-reimport of gateway.platforms.telegram - test_terminal_tool_requirements: removed duplicate class declaration, autouse _clear_caches fixture * change(tests): run_tests.sh explicitly includes env vars Instead of manually dropping some vars, we now pass through only an explicitly included set * fix(tests): 5 more isolation/NixOS fixes - test_approval_plugin_hooks: isolate HERMES_HOME so real user's command_allowlist doesn't short-circuit the approval path - test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum (feature not merged on this branch) - test_write_deny: test systemd prefix against tmp_path instead of /etc/systemd which resolves to /nix/store on NixOS - test_pty_bridge: use shutil.which('cat') instead of /bin/cat (doesn't exist on NixOS) - profiles.py: rmtree onexc handler chmod's parent dirs too, fixing profile deletion when copytree preserved read-only modes 
from nix store * fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client * fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor * fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test * fix: address PR #29016 review feedback - Remove tracked .pytest-cache/ artifact and add to .gitignore - Fix stale 'xdist worker' comment in conftest.py - Deduplicate web provider registration into tests/tools/conftest.py shared helper (register_all_web_providers), replacing 8 copy-pasted blocks across 6 test files - Update PR description: remove stale recovered-test-files claim, fix worker count to match code (cpu_count * 2) * fix: eliminate race in stale-cache achievements test The background scan thread could complete and overwrite _SNAPSHOT_CACHE before evaluate_all() returned the stale data; with only 10 fake sessions the scan finished instantly. Added scan_delay param to _FakeSessionDB and set it to 2s in the stale-cache test so the background thread can't win the race.	2026-05-21 16:40:04 +05:30
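The per-file isolation model described in 48be2e0e4d can be approximated with a short sketch. This is not the contents of `scripts/run_tests_parallel.py`: the function names `pytest_cmd`, `run_one`, and `run_files` and the injectable `build_cmd` parameter are hypothetical, and the sketch leaves out test-file discovery and the kill-children-on-exit handling the commit mentions.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

NO_TESTS_COLLECTED = 5  # pytest exit code when a file collects no tests


def pytest_cmd(path):
    # One fresh interpreter per test file, so no state leaks between files.
    return [sys.executable, "-m", "pytest", path]


def run_one(path, build_cmd=pytest_cmd):
    """Run one file in a subprocess; return (path, passed, captured output)."""
    proc = subprocess.run(
        build_cmd(path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    passed = proc.returncode in (0, NO_TESTS_COLLECTED)
    return path, passed, proc.stdout


def run_files(paths, build_cmd=pytest_cmd, workers=None):
    """Run files in parallel threads; return the list of failing paths."""
    # Threads only wait on child processes, so 2x oversubscription is cheap.
    workers = workers or (os.cpu_count() or 1) * 2
    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: run_one(p, build_cmd), paths)
        for path, passed, output in results:
            if not passed:
                # Show the tail of the output plus a copyable repro command.
                tail = "\n".join(output.splitlines()[-30:])
                print(f"FAILED {path}\n{tail}\nrepro: python -m pytest {path}")
                failures.append(path)
    return failures


if __name__ == "__main__":
    sys.exit(1 if run_files(sys.argv[1:]) else 0)
```

Threads are sufficient here because each worker spends its time blocked on a child process; the isolation itself comes from the subprocess boundary, not from the executor.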


9190 commits