hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-29 06:31:32 +00:00

Author	SHA1	Message	Date
Ben	3e33e14335	fix(docker): discover agent-browser Chromium binary at boot The image's Dockerfile runs npx playwright install chromium, which populates $PLAYWRIGHT_BROWSERS_PATH (=/opt/hermes/.playwright) with a `chromium_headless_shell-<build>/chrome-headless-shell-linux64/` tree. agent-browser (the runtime CLI Hermes spawns for the browser tool) doesn't recognise this layout in its own cache scan and fails with `Auto-launch failed: Chrome not found`, even though the binary is right there. Reproduction on current main: $ docker run --rm <image> sh -c 'npx -y agent-browser snapshot --url about:blank' ✗ Auto-launch failed: Chrome not found. Checked: - agent-browser cache: /tmp/.../.agent-browser/browsers - System Chrome installations - Puppeteer browser cache - Playwright browser cache Run `agent-browser install` to download Chrome, or use --executable-path. Fix: at boot, locate the binary under $PLAYWRIGHT_BROWSERS_PATH and export AGENT_BROWSER_EXECUTABLE_PATH via /run/s6/container_environment so the with-contenv shebang on main-wrapper.sh propagates it into the supervised `hermes` process and thence to agent-browser subprocesses. Filename-matched (chrome / chromium / chrome-headless-shell / chromium-browser), not path-matched: the chromium dir contains many shared libraries (libGLESv2.so, libEGL.so, ...) which inherit the executable bit from Playwright's tarball but are NOT browser binaries. Compare PR #18635's earlier `find | grep -Ei 'chrome|chromium'`, which would match the path .../chrome-headless-shell-linux64/libGLESv2.so and pick a .so as the browser binary. User overrides (e.g. `-e AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/...`) are respected: the discovery block is skipped when the env var is already set. Discovery is also skipped quietly when $PLAYWRIGHT_BROWSERS_PATH doesn't exist (e.g. custom builds that strip Playwright). 
This salvages PR #18635 by @jackey8616, who identified the bug and proposed the same env-var approach but in the now-deprecated docker/entrypoint.sh shim and with a path-match find command that selected .so files instead of the chrome binary. The fix retargets docker/stage2-hook.sh (the s6-overlay cont-init script where boot-time env setup belongs) with a corrected filename-match query. Fixes #15697 Closes #18635 Co-authored-by: Clooooode <12930377+jackey8616@users.noreply.github.com>	2026-05-27 20:43:27 +10:00
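The filename-versus-path distinction described in this commit can be illustrated with a short Python sketch. This is not the shell logic in docker/stage2-hook.sh: the names `find_browser_binary`, `path_match_candidates`, and `build_fake_playwright_tree`, as well as the build number `1234`, are hypothetical and exist only to show why matching on the basename skips executable `.so` files that a path-based match would accept.

```python
import os
import re
import tempfile

# Basenames treated as browser binaries, as listed in the commit message.
BROWSER_NAMES = {"chrome", "chromium", "chrome-headless-shell", "chromium-browser"}


def find_browser_binary(root):
    """Return the first executable file under root whose basename is a browser name."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if name in BROWSER_NAMES and os.access(path, os.X_OK):
                return path
    return None


def path_match_candidates(root):
    """Path-based matching (the flawed approach): any executable whose full path mentions chrome."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.access(path, os.X_OK) and re.search(r"chrome|chromium", path, re.I):
                hits.append(path)
    return hits


def build_fake_playwright_tree(root):
    """Create a hypothetical Playwright-style layout: one binary and one executable .so."""
    bin_dir = os.path.join(root, "chromium_headless_shell-1234", "chrome-headless-shell-linux64")
    os.makedirs(bin_dir)
    for name in ("libGLESv2.so", "chrome-headless-shell"):
        path = os.path.join(bin_dir, name)
        with open(path, "w") as fh:
            fh.write("")
        os.chmod(path, 0o755)  # both files carry the executable bit
    return bin_dir
```

Calling `find_browser_binary` on a directory produced by `build_fake_playwright_tree` returns the `chrome-headless-shell` file, while `path_match_candidates` also lists `libGLESv2.so` because its parent directory name contains "chrome".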
Ben	fb298a958c	fix(docker): mkdir HERMES_HOME as root in stage2 before chown / privilege drop (#18488) When HERMES_HOME points at a custom path whose parent directories only root can create (e.g. HERMES_HOME=/home/hermes/.hermes in a Compose file, or any path under a fresh `/` not pre-populated by the image), stage2-hook.sh fails on first boot: [stage2] Warning: chown failed (rootless container?) - continuing mkdir: cannot create directory '/custom': Permission denied mkdir: cannot create directory '/custom': Permission denied ... (one per s6-setuidgid hermes mkdir invocation) cont-init: info: /etc/cont-init.d/01-hermes-setup exited 1 The mkdirs fail because s6-setuidgid drops to hermes (UID 10000) before invoking mkdir -p, and the runtime user has no permission to create root-owned ancestor directories. 02-reconcile-profiles then crashes with FileNotFoundError, .install_method never lands, and the container limps on in a half-initialized state. Bootstrap HERMES_HOME with mkdir -p while still root, before the ownership normalization. Idempotent on the default /opt/data path (directory already exists from the Dockerfile RUN mkdir -p) and on any subsequent restart. (#18482) Retargeted from the original PR's docker/entrypoint.sh (now a deprecated shim) to docker/stage2-hook.sh where the related chown logic moved during the s6-overlay rework. Co-authored-by: wpengpeng168 <133926080+wpengpeng168@users.noreply.github.com>	2026-05-27 17:16:40 +10:00
Ben	22eb4d13f7	fix(docker): chown ui-tui and node_modules on UID remap so TUI esbuild works (#28851) When HERMES_UID remaps the hermes user from 10000 to another UID (e.g. matching the host user's UID for bind-mount ergonomics), the TUI launcher's esbuild step fails: ✘ [ERROR] Failed to write to output file: open /opt/hermes/ui-tui/dist/entry.js: permission denied TUI build failed. 
This is because the Dockerfile's build-time `chown -R hermes:hermes` on `/opt/hermes/{.venv,ui-tui,node_modules}` (line 154) wrote UID 10000, and stage2-hook.sh only re-chowned `.venv` on UID remap — leaving the TUI build trees still owned by the old UID. Extend the stage2 re-chown to include the same set as the build-time chown: `.venv`, `ui-tui`, `node_modules`. These are the runtime-writable trees under $INSTALL_DIR; everything else under /opt/hermes is read-only at runtime so keeping it root-owned is fine. Original fix targeted docker/entrypoint.sh which is now a deprecated shim; retargeted to docker/stage2-hook.sh where the .venv chown moved during the s6-overlay rework. Co-authored-by: Andreas Steffan <623481+deas@users.noreply.github.com>	2026-05-27 15:41:48 +10:00
Ben	9eadb6805c	fix(docker): targeted chown to preserve host file ownership in HERMES_HOME (#19795) Replaces the recursive chown of $HERMES_HOME in stage2-hook.sh with a targeted approach: chown the top-level dir (so hermes can create new subdirs) plus the specific hermes-owned subdirectories (cron/, sessions/, logs/, hooks/, memories/, skills/, skins/, plans/, workspace/, home/, profiles/), the same canonical list seeded by the s6-setuidgid mkdir -p block below. Avoids clobbering host-side file ownership when $HERMES_HOME is a bind mount that contains user-owned files not managed by hermes (issue #19788). Original fix targeted docker/entrypoint.sh which is now a deprecated shim; retargeted to docker/stage2-hook.sh where the recursive chown moved during the s6-overlay rework. Co-authored-by: Ptichalouf <1809721+ptichalouf@users.noreply.github.com>	2026-05-27 15:08:41 +10:00
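As a rough illustration of the targeted approach, the Python sketch below only computes which paths would be chowned. It does not change ownership, and it is not the shell code in stage2-hook.sh. The function name `chown_targets` is hypothetical, and the commit message does not say whether the listed subdirectories are chowned recursively, so the sketch leaves that question open.

```python
import os
import tempfile
from pathlib import Path

# The hermes-owned subdirectories named in the commit message.
HERMES_SUBDIRS = (
    "cron", "sessions", "logs", "hooks", "memories", "skills",
    "skins", "plans", "workspace", "home", "profiles",
)


def chown_targets(hermes_home):
    """Return the top-level dir plus the known hermes-owned subdirs that exist.

    Anything else under hermes_home (for example user files in a bind mount)
    is deliberately left out, so its ownership would stay untouched.
    """
    home = Path(hermes_home)
    targets = [home]
    targets += [home / name for name in HERMES_SUBDIRS if (home / name).is_dir()]
    return targets
```

With a home directory that contains `sessions/`, `logs/`, and an unrelated `photos/` directory, the result is the home directory itself followed by `sessions` and `logs`; `photos` never appears in the list.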
dusterbloom	79fc92e9cb	fix(security): tighten .env file permissions to 0600 at all creation sites .env holds API keys and secrets. Multiple creation sites used `cp` / `touch` / `shutil.copy2` which obey the process umask — commonly 0o022, leaving the file at 0o644 (world-readable). Apply chmod 0o600 explicitly at every site that creates or copies .env. Sites covered: - docker/stage2-hook.sh: after the seed_one '.env' call, applied unconditionally (not just on first-seed) so a host-mounted .env with loose perms gets tightened on every container restart - hermes_cli/doctor.py: 'hermes doctor --fix' touches an empty .env when missing - hermes_cli/profiles.py: 'hermes profile create --clone' copies .env from the source profile; shutil.copy2 preserves source mode, so a source .env at 0o644 was being cloned into 0o644 - setup-hermes.sh: in-tree setup script's cp .env.example .env path, plus the already-exists branch (mirror of install.sh which already chmods 600 unconditionally on line 1442) scripts/install.sh was NOT changed — it already chmod 600's the .env unconditionally after the create/already-exists branches (line 1442). Salvaged from PR #25726 by @dusterbloom. The docker/entrypoint.sh portion of the original PR was dropped because main switched to an s6-overlay shim — the .env creation logic moved to stage2-hook.sh, which is where the chmod now lives. Closes #25497 (subset — install.sh + setup-hermes.sh) and #8448 (subset — install.sh only) as superseded. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-25 03:40:47 -07:00
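The `shutil.copy2` behaviour mentioned for hermes_cli/profiles.py can be reproduced in a few lines. The sketch below is an illustration under assumptions, not the project's code: `copy_env_private` and `file_mode` are hypothetical helper names, and the example relies on POSIX permission bits.

```python
import os
import shutil
import stat
import tempfile


def file_mode(path):
    """Return only the permission bits of path (e.g. 0o644)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def copy_env_private(src, dst):
    """Copy an env file, then force owner-only read/write on the copy.

    shutil.copy2 carries the source mode over to the destination, so the
    explicit chmod is what guarantees 0o600 regardless of the source.
    """
    shutil.copy2(src, dst)
    os.chmod(dst, 0o600)
    return dst
```

Copying a source file whose mode is 0o644 with plain `shutil.copy2` yields another 0o644 file, whereas `copy_env_private` leaves the copy at 0o600.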
Ben	9914bfc594	docker: drop sh -c wrappers from stage2-hook.sh PR #30136 review caught: three `s6-setuidgid hermes sh -c "..."` invocations in stage2-hook.sh interpolated $HERMES_HOME into a nested shell context. Practically low-risk (a malicious HERMES_HOME already requires container-launch privileges) but the cleaner pattern is to invoke commands directly so the shell isn't a second interpreter. * `mkdir -p` of the data subdirs now runs directly via s6-setuidgid, one path per arg. * The .install_method stamp is written via `printf | tee`, also without a shell wrapper. * The skills_sync invocation uses the venv's python by absolute path instead of sourcing activate inside a shell. skills_sync.py doesn't need anything from activate beyond sys.path, which the bin-stub python already provides. No behavior change. Just a smaller attack surface and a script that's easier to read.	2026-05-24 18:05:33 -07:00
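The same principle, passing a value as a separate argument instead of interpolating it into a string that a shell re-parses, can be shown in Python. This is an analogy rather than the stage2-hook.sh change itself; `echo_via_argv` is a hypothetical helper and the hostile-looking value is invented for the demonstration.

```python
import subprocess
import sys


def echo_via_argv(value):
    """Hand value to a child process as one argv element.

    No shell is involved, so characters such as ';' or '$(...)' in value
    are never interpreted; the child receives the string verbatim.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A value like `/data; echo injected $(id)` comes back unchanged from `echo_via_argv`, which is the property the direct `s6-setuidgid hermes mkdir -p ...` form relies on.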
Ben	2afefc501c	feat(docker): per-profile s6 supervision + container-restart reconciliation Phase 4 of the s6-overlay supervision plan. Activates the Phase 3 S6ServiceManager by hooking it into the profile lifecycle and the `hermes gateway start/stop/restart` dispatcher, and adds a cont-init.d-time reconciliation pass that survives `docker restart`. Task 4.0 - container-boot reconciliation: /run/service/ is tmpfs, so every `docker restart` wipes every per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles invokes hermes_cli.container_boot.reconcile_profile_gateways() on every boot, which walks $HERMES_HOME/profiles/<name>/, reads each gateway_state.json, recreates the s6 service slot, and auto-starts only those whose last state was 'running'. Other states (stopped, starting, startup_failed, missing) register the slot in the down state, avoiding crash-loops across restarts for a gateway that was broken last boot. Per-profile outcome is recorded to $HERMES_HOME/logs/container-boot.log. Implementation: hermes_cli/container_boot.py + 12 unit tests. Profile-marker is SOUL.md, not config.yaml, because `hermes profile create` only seeds SOUL.md by default (config.yaml comes from `hermes setup`). Task 4.1 / 4.2 - profile create/delete hooks: hermes_cli/profiles.py::create_profile now calls _maybe_register_gateway_service(<canon>) at the end, which routes through ServiceManager.register_profile_gateway when running on s6 and no-ops on host backends. delete_profile mirrors with _maybe_unregister_gateway_service. _allocate_gateway_port produces a deterministic SHA-256-derived port in [9200, 9800). Task 4.3 - gateway dispatch + remove rejection arms: _dispatch_via_service_manager_if_s6(action) intercepts start/stop/restart at the top of each subcommand and routes them through S6ServiceManager.{start,stop,restart}. 
The pre-Phase-4 `elif is_container():` rejection arms are kept as fallback for pre-s6 containers / unsupported runtimes, but only ever fire when detect_service_manager() != 's6'. install/uninstall under s6 print informational guidance pointing users at profile create/delete. Removed the two xfail(strict=True) markers from tests/docker/test_profile_gateway.py — both tests now pass strictly. Task 4.4 — status reporting: get_gateway_runtime_snapshot() reports Manager: 's6 (container supervisor)' inside an s6 container instead of 'docker (foreground)'. Plan-vs-reality drift fixed in this commit: - Plan's S6ServiceManager._render_run_script used `gateway start --foreground --port {port}` — invented args; the real CLI is `gateway run`. Switched accordingly. port arg retained for API parity but now documented as 'currently ignored'. - Plan's reconciler keyed on config.yaml; switched to SOUL.md (config.yaml is created by hermes setup, not by hermes profile create, so the original gate caught nothing). - The plan's _dispatch helper used _profile_arg() which returns '--profile <name>' (i.e. with the flag prefix). Switched to _profile_suffix() which returns the bare name. - Architecture B's docker exec doesn't get /command on PATH or the venv on PATH; Dockerfile's runtime PATH now includes /opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works without sourcing the venv. - stage2-hook now chowns $HERMES_HOME/profiles to hermes on every boot, not just on the UID-remap path. Without this, files created by docker-exec-as-root accumulate and the next reconciler run fails with PermissionError reading SOUL.md. Test harness: 19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to passing). 78 unit tests across service_manager + container_boot + profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck pass cleanly. Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-24 18:05:33 -07:00
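Two ideas from this commit lend themselves to a small sketch: a deterministic SHA-256-derived port in [9200, 9800), and the rule that only a gateway whose last state was 'running' is auto-started. The commit message does not give the actual derivation used by `_allocate_gateway_port`, so the modulo mapping below is an assumption, and `allocate_port` and `should_autostart` are hypothetical names rather than the functions in hermes_cli.

```python
import hashlib

PORT_MIN = 9200
PORT_MAX = 9800  # exclusive upper bound, matching the [9200, 9800) range


def allocate_port(profile_name):
    """Map a profile name to a stable port in [PORT_MIN, PORT_MAX).

    Assumed scheme: take the first 8 bytes of the SHA-256 digest as an
    integer and reduce it modulo the size of the range.
    """
    digest = hashlib.sha256(profile_name.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:8], "big") % (PORT_MAX - PORT_MIN)
    return PORT_MIN + offset


def should_autostart(last_state):
    """Auto-start only when the recorded last state was 'running'.

    Every other value (stopped, starting, startup_failed, or a missing
    state passed as None) means the slot is registered but left down.
    """
    return last_state == "running"
```

The same profile name always yields the same port, which is what makes the allocation deterministic. With only 600 slots, two different names can hash to the same port; the commit message does not say how such collisions are handled.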
Ben	e0e9c895d3	feat(docker)!: replace tini with s6-overlay as PID 1 BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay) instead of /usr/bin/tini. Main hermes runs as the container CMD with TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the ground is laid for per-profile gateway supervision (Phase 3+4). All five pre-s6 docker run invocation patterns continue to work identically, as verified by the Phase 0 docker harness: docker run <image> → `hermes` with no args docker run <image> chat -q "..." → `hermes chat -q ...` passthrough docker run <image> sleep infinity → `sleep infinity` direct docker run <image> bash → interactive bash docker run -it <image> --tui → interactive Ink TUI Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint + shellcheck pass cleanly. Architecture pivot from plan v3 (documented in main-hermes/run header): the plan called for main hermes to be an s6-supervised service, but two real s6-overlay v3 mechanics blocked that: cont-init.d scripts receive no arguments (CMD args are not visible to stage2-hook), and `/run/s6/basedir/bin/halt` after writing the exit code did not propagate the desired exit code (container exits 143). We use the s6-overlay-native CMD pattern instead: main-wrapper.sh is the container's main program (ENTRYPOINT prepends it so leading-dash args like --version aren't intercepted by /init), exec's the final program with stdin/stdout/stderr inherited, and the program's exit code becomes the container exit code. main-hermes is now a no-op `sleep infinity` slot kept for future supervised-gateway-container modes. This trades "supervised restart of main hermes" for arg-parity with the pre-s6 contract; main hermes was already unsupervised under tini, so we lose nothing functional. Dashboard supervision is the only new guarantee added by this phase. 
Files added: docker/main-wrapper.sh # arg routing + s6-setuidgid drop docker/stage2-hook.sh # gosu-equivalent + chown + seed docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base} docker/s6-rc.d/dashboard/{type,run,dependencies.d/base} docker/s6-rc.d/user/contents.d/{main-hermes,dashboard} Files changed: Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md	2026-05-24 18:05:33 -07:00

8 commits