On hosts where the cgroup v2 cpu/memory/pids controllers are not delegated
to the docker/podman process (unprivileged Proxmox LXCs, some rootless and
nested setups), --pids-limit/--cpus/--memory cause every container start to
fail with OCI runtime error / exit 126, breaking terminal + execute_code.
- Add _cgroup_limits_available(image): one-shot, host-wide cached probe that
spawns a throwaway container from the sandbox image itself (sleep 0) with
all three flags together, mirroring the existing _storage_opt_supported
probe-and-degrade pattern.
- Remove --pids-limit from static _BASE_SECURITY_ARGS; apply it (default 256
via _DEFAULT_PIDS_LIMIT) in resource_args gated on the probe.
- Gate --cpus and --memory on the same probe.
Behavior unchanged on cgroup-capable hosts; graceful degradation with a
one-time warning where controllers aren't delegated.
Fixes#6568.
(cherry picked from commit c933880b7e)
Co-authored-by: angelos <angelos@oikos.lan.home.malaiwah.com>
* fix(daytona): quote single-upload mkdir parent path
The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.
Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.
Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.
* docs(infographic): daytona quote-sync fix
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.
- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
that proves the snapshot never tears (would still tear with $$).
Fix race condition in terminal environment snapshots that could corrupt
PATH with declare -x entries. When concurrent terminal calls share the
same snapshot file, the non-atomic 'export -p > snapshot.sh' write could
be read mid-write by another process, causing partial/corrupted env vars
to be sourced and mixed into PATH.
The fix uses atomic file replacement:
- Write to a temp file: export -p > snapshot.sh.tmp.303651
- Atomically replace: mv -f snapshot.sh.tmp.303651 snapshot.sh
On POSIX, mv within the same filesystem is atomic, so source() will
either see the old complete snapshot or the new complete one, never a
partial/truncated file.
Fixes#38249
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.
Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:
- Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
secrets -- stripped even for credential-inheriting children.
- Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
unless inherit_credentials=True. The opt-in is grep-able for audit.
Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.
This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.
Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
* fix(windows): stop terminal-window popups from background spawns
Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:
- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
subprocess allocate a fresh console when the launching process has
none (pythonw desktop backend / detached gateway) - even with output
captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
get a console regardless of how they're launched.
Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
exclusively owns an auto-allocated console detaches it at startup
(GetConsoleProcessList()==1 gate leaves shared interactive consoles
untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
tools repo-wide so the popup class can't regress.
* fix(windows): scope FreeConsole to background entry points; fix merge fallout
Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
(start_server, what the desktop spawns), cron standalone, tui_gateway entry,
slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
no-op without console / on POSIX, and asserts it's not an import side effect.
Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
every terminal init). _subprocess_compat.popen already hides the window, so
drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
case-variant test to drive that production path.
* test(claw): mock _subprocess_compat.run seam for Windows process scan
claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).
Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.
Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.
Also adds the AUTHOR_MAP entry for the salvaged contributor.
Closes#23473
preexec_fn=os.setsid runs Python code in the forked child before exec,
which is unsafe in multi-threaded processes (CPython docs). When the
Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs)
with active thread pools, the fork can SIGSEGV before the child execs.
Replace all preexec_fn usage with start_new_session=True, which provides
the same setsid/process-group semantics without running Python in the
fork. This is already the pattern used throughout hermes_cli/gateway.py
and hermes_cli/_subprocess_compat.py.
Fixes#46789
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:
1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
→ /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
drops the -c argument, so the command is swallowed (exit 0, no output).
Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.
Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.
Closes#42203. Supersedes #42290.
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.
Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.
Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.
Supersedes #44221 (Windows-enablement salvage of #30660).
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Plugins shelling out to bare `hermes` via the terminal tool hit
`command not found` (exit 127) when the gateway was launched without the
hermes install dir on PATH (systemd, service managers, cron, desktop
launchers) — even though `hermes` works in the user's own interactive
terminal, which sources the shell rc that exports that dir.
The terminal tool's subshell PATH was the agent process PATH plus a
static set of system dirs (_SANE_PATH); it never included wherever the
hermes console-script actually lives (~/.local/bin, the venv bin/Scripts,
pipx, nix). Resolve that dir once (which/argv0/sys.executable) and
prepend-if-missing it so bare `hermes` resolves regardless of launch
method.
When profile isolation activates ({HERMES_HOME}/home/ exists), child
processes receive HOME={HERMES_HOME}/home/ for tool config isolation
(git, ssh, gh). However, scripts using Path.home() to locate
~/.hermes/ would incorrectly resolve to the isolated profile home,
breaking helpers that rely on the real user home directory.
New get_real_home() helper in hermes_constants resolves the actual
user home independently of profile isolation. All four subprocess
spawners now inject HERMES_REAL_HOME alongside the profile HOME:
- tools/code_execution_tool.py (execute_code)
- tools/environments/local.py (terminal background, run_env)
- agent/copilot_acp_client.py (Copilot ACP)
Child scripts can now use:
Path(os.environ.get("HERMES_REAL_HOME", os.environ.get("HOME", "")))
to reliably find the real user home regardless of profile isolation.
Closes#25114
* fix(terminal): complete sane PATH entries on POSIX
Fixes macOS gateway/launchd terminal sessions whose PATH already
includes /usr/bin while omitting Apple Silicon Homebrew paths.
LocalEnvironment._make_run_env() now appends each missing _SANE_PATH
entry individually on POSIX, preserving caller precedence and avoiding
duplicate sane entries.
Root cause: the previous logic used /usr/bin as the sentinel for sane
PATH injection. macOS launchd commonly provides /usr/bin while leaving
out /opt/homebrew/bin and /opt/homebrew/sbin, so Homebrew-installed
CLIs stayed unavailable in terminal tool calls.
Salvaged from #35614 by @y0shua1ee. Fixes#35613.
Co-authored-by: y0shua1ee <104712437+y0shua1ee@users.noreply.github.com>
* test(terminal): harden sane PATH completion against dup/empty entries
Follow-up to the #35613 fix. Strengthens _append_missing_sane_path_entries:
- De-duplicate the caller-supplied PATH (first occurrence wins) so a PATH
that already contains duplicate entries is collapsed rather than carried
through. Previously only newly-appended sane entries were guarded against
duplication; pre-existing caller duplicates were preserved verbatim.
- Drop empty PATH entries (leading/trailing/double ':'), which POSIX shells
interpret as the current working directory — a mild foot-gun in a
default terminal environment.
Behaviour for well-formed PATHs (no duplicates, no empty entries) is
byte-identical to before; only malformed/duplicated inputs change.
Adds regression tests for: the literal macOS launchd PATH
(/usr/bin:/bin:/usr/sbin:/sbin), caller-duplicate collapsing with
order preservation, and empty-entry stripping.
* docs(terminal): clarify PATH normalisation semantics; drop dead set add
Addresses review findings on the sane-PATH completion follow-up:
- Sharpen the _append_missing_sane_path_entries docstring to state
explicitly that on POSIX the caller PATH is rewritten (empty entries
stripped, duplicates collapsed) rather than merely appended to, and
that well-formed PATHs remain byte-identical bar the appended sane
entries. This makes the intentional semantic change visible rather
than buried under "hardening".
- Document why _path_env_key is a deliberate second Windows guard
distinct from the helper's early return (key-casing selection vs
standalone safety), so neither is mistaken for redundant and removed.
- Drop the dead `seen.add(entry)` in the sane-entry loop: _SANE_PATH is
a static duplicate-free constant, so the membership check against the
caller entries is sufficient and `seen` is never read afterwards.
No behaviour change: verified byte-identical output across the launchd,
minimal, empty, duplicate, empty-entry and already-full cases, and
re-confirmed gh/brew resolve through the real LocalEnvironment.execute()
path under a launchd-style PATH. 133 targeted tests pass.
Intentionally NOT consolidating with tools/browser_tool._merge_browser_path:
it prepends (vs append), filters on os.path.isdir, uses os.pathsep, and
draws from a dynamic candidate set — a shared helper is a separate
refactor, out of scope for this bugfix.
---------
Co-authored-by: y0shua1ee <104712437+y0shua1ee@users.noreply.github.com>
When Hermes runs in TUI mode, the gateway child process communicates with
the Node.js parent over a JSON-RPC protocol on stdin. Subprocess calls that
inherit this stdin fd can trigger a race condition where the child's stdin
read returns EOF, causing the gateway to exit cleanly (exit code 0) mid-tool-
execution.
This is the same root cause as issue #14036 (byterover plugin) and PR #39257
(SSH environment backend). This commit applies the fix — stdin=subprocess.DEVNULL
— to all 85 subprocess.run() and subprocess.Popen() calls that execute inside
the TUI gateway child process.
Scope: TUI-context code only (agent/, tools/, plugins/, tui_gateway/server.py).
CLI code (cli.py, hermes_cli/), tests, scripts, and gateway process management
are excluded — they don't run inside the TUI child and inherit the terminal's
stdin, not the JSON-RPC pipe.
85 call sites across 28 files. All files pass syntax check.
Add HERMES_DASHBOARD_SESSION_TOKEN to the Hermes-managed subprocess environment blocklist so dashboard authorization material does not propagate into shell, PTY, or background process launches.
Extend the local environment blocklist regression coverage to prove the dashboard session token is stripped like other Hermes-managed secrets.
The previous catch-all except OSError would silently swallow real
errors (disk full, bad path, permission issues unrelated to symlink
privilege). Narrow the handler to winerror == 1314 — the specific
Windows error code for "A required privilege is not held by the
client" — and re-raise every other OSError so genuine failures are
not hidden.
On Windows, os.symlink() raises OSError (WinError 1314) unless the
process has Administrator rights or Developer Mode is enabled. The SSH
bulk-upload staging logic used symlinks to mirror the remote layout
before piping through tar; this caused all ssh_bulk_upload tests to
fail on Windows.
- ssh.py: wrap os.symlink() in try/except OSError and fall back to
shutil.copy2() so staging works on every platform. shutil was already
imported, no new dependency introduced.
- file_sync.py: replace str(Path(remote).parent) with
posixpath.dirname(remote) in unique_parent_dirs(). pathlib.Path uses
the host separator (\ on Windows), but these paths are sent to a
remote Linux host over SSH and must always use forward slashes.
- test_ssh_bulk_upload.py: make test_staging_symlinks_mirror_remote_layout
platform-agnostic — assert file existence and content instead of
os.path.islink() + os.readlink(), since the staged entry may be a
copy on Windows.
Salvage of #36631 (@annguyenNous), rebased onto current main with
regression tests added. Fixes#36266.
When a persistent Docker sandbox container is removed out-of-band (idle
reaper, `docker prune`, OOM kill, daemon restart), the gateway kept
issuing `docker exec` against the dead container ID, returning
"No such container" on every subsequent tool call — the agent was
permanently blocked until the gateway process restarted.
DockerEnvironment.execute() now detects the "No such container" /
"is not running" error after a non-zero exit (gated on
persist_across_processes) and calls _recreate_container(): it tries
label-based reuse first, falls back to a fresh container replaying the
same image + full all_run_args set, re-runs init_session(), and retries
the command once. A genuine non-zero exit is NOT misclassified as
container-gone.
Differs from #36631 as submitted: adds the tests the original lacked.
tests/tools/test_docker_environment.py covers _is_container_gone pattern
matching (incl. the negative/control case), the recover-and-retry path,
the persist_across_processes=False opt-out (no recovery), and the
ordinary-failure passthrough (no spurious recreation). _make_dummy_env
now forwards persist_across_processes.
Verified:
- Unit: 67/67 in test_docker_environment.py (4 new + existing).
- Live E2E against the real docker daemon: started a persistent
container, `docker rm -f`'d it out-of-band, and the next execute()
transparently recreated a fresh container and succeeded; a follow-up
command worked in the recovered container; a real `exit N` passed
through without triggering recovery.
Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>
When `docker run -d` fails after Docker has already created the container
object (e.g. exit 125 when the daemon isn't ready, or a timeout mid image
pull), the code raised before `self._container_id` was set — so the
container leaked permanently in "Created" state. Reported in #7439:
110+ orphaned containers accumulated over 3 days from hourly cron-
scheduled gateway sessions hitting a Docker Desktop startup race.
The orphan reaper added in #33645 (reap_orphan_containers) does NOT cover
this case: it filters `status=exited`, but a failed-create container is in
`Created` state, so it slips through and is never reaped.
Wrap the `docker run -d` call in try/except and `docker rm -f` the
container by its known name before re-raising.
Salvages #7440 by @Tranquil-Flow. Their branch predated the cross-process
reuse + labels rework on `main`, so a cherry-pick conflicted; reconstructed
the same intent (plus their two regression tests, adapted to mock the new
reuse `docker ps` probe) against current `main`.
Verified adversarially: reverted just the product change to origin/main's
`docker.py`, ran the two new tests -> both FAIL with
`assert 0 == 1 ("docker rm should be called once")`. With the fix applied,
both pass; full test_docker_environment.py is 65/65 green.
Closes#7440. Fixes#7439.
Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>
s6-overlay images (e.g. hermes-agent:latest) use /init as PID 1 and exec
/run/s6/basedir/bin/init during stage0 startup. The Docker terminal backend
unconditionally added Docker --init and mounted /run as noexec, which broke
those images in two ways: --init created a second competing PID-1 init, and
the noexec /run made s6 stage0 fail with "exec: /run/s6/basedir/bin/init:
Permission denied" (exit 126), so the container died and terminal commands
reported a generic "container is not running" error.
Detect images whose entrypoint is /init via 'docker image inspect' and, for
those images only, skip Docker --init and mount /run with exec. All other
images keep the hardened --init + noexec defaults. Detection is best-effort:
any inspect failure falls back to the safe defaults.
The docker_forward_env build loop only consulted the ~/.hermes/.env disk
fallback when a key was unset (value is None), not when it was present
but empty (""). A transient empty value in os.environ was therefore
forwarded into the sandbox container as `-e KEY=`, clobbering the correct
value on disk. Sandboxed workloads then read a zero-length secret and
failed auth (observed as intermittent Linear API 401s) with no gateway
restart and no .env rewrite.
Treat empty-string like unset (`if not value:` on the fallback) and never
forward a blank secret (`if value:` on the guard).
Fixes#35580
Background tasks on non-local backends (SSH/Docker/Modal/Daytona/Singularity)
go through `ProcessRegistry.spawn_via_env`, which builds a hand-crafted,
shell-safe wrapper:
mkdir -p T && ( nohup bash -lc CMD > LOG 2>&1; rc=$?; ... ) & echo $! > PID && cat PID
`BaseEnvironment.execute()` unconditionally ran `_rewrite_compound_background`
on every command, including this wrapper. The rewrite (meant to defuse the
`A && B &` subshell-wait trap for user commands) turns `( ... ) & echo $!` into
`{ ( ... ) & } echo $!` — note `} echo` with no separator, which is a bash
syntax error. The wrapper then never produces a PID, the redirected output file
is never created, and the agent sees an immediate exit code -1. This breaks
*every* background launch on a non-local backend (e.g. a simple
count-and-redirect script over SSH), not just edge cases.
Fix:
- Add `rewrite_compound_background: bool = True` to `BaseEnvironment.execute()`
(and the `BaseModalExecutionEnvironment` override, which accepts and ignores
it). Default preserves existing behavior; the user foreground terminal path
still rewrites.
- `spawn_via_env` passes `rewrite_compound_background=False` so its already
shell-safe wrapper is left intact.
- Treat a wrapper that produces no PID as a failed launch (mark the session
exited with a real exit code instead of exposing a fake running session), and
don't register/checkpoint a session that never started.
Verified empirically: with the rewrite skipped, the wrapper is valid bash,
launches the process, captures the PID, and writes the log/pid/exit files; the
old rewritten form fails `bash -n` with a syntax error.
Based on #33756 by @CharZhou (extracted from a multi-feature branch; the
unrelated image_gen / docker-media changes are not included here).
Co-authored-by: CharZhou <17255546+CharZhou@users.noreply.github.com>
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix: drain thread no longer crashes on fd-less stdout streams
The _wait_for_process drain thread called proc.stdout.fileno()
unconditionally. ProcessHandle implementations whose stdout is not
backed by a real OS fd (iterator-style in-memory streams, mock procs)
raised 'list_iterator' object has no attribute 'fileno' (or
'fileno() returned a non-integer' from select.select), killing the
daemon thread and silently losing all process output.
Resolve the fd defensively at the top of _drain; when stdout has no
usable integer fileno, fall back to draining it as an iterable (the
legacy 'for line in proc.stdout' contract). The real subprocess /
os.pipe-backed select() fast path is unchanged.
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.
Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.
Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.
Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
Salvages #24490 by @liuhao1024 against current main.
The Docker daemon will silently auto-create a directory at the host
path of any `-v <host>:<container>` bind mount when the host path
doesn't exist. In Docker-in-Docker setups (where the outer host's
real credential file isn't visible inside the agent's parent
container), this leaves a directory at the credential mount source —
and the inner `docker run` then refuses to mount a directory over a
file destination with exit 125.
Add defensive shape guards to all three mount loops in
DockerEnvironment.__init__:
* credentials (expected: file) — skip + warn on directory or missing
* skills (expected: dir) — skip + warn when not a directory
* cache (expected: dir) — skip + warn when not a directory
Failed mounts surface as WARN logs rather than crashing the container
start. Existing well-formed sources mount unchanged.
The original PR's branch was on a pre-container-reuse-rework base
(May 12) and conflicted with the post-May-28 driver work (label
tagging, container reuse, orphan reaper). Reconstructed the same
intent on current main; the three guard blocks slot cleanly into
`tools/environments/docker.py` around the existing mount loops.
Three new tests pinned in `tests/tools/test_docker_environment.py`:
directory-source skip, missing-source skip, valid-file mounts. Test-
first regression verification: reverted just the production code to
`origin/main` and confirmed the new tests fail with
`'deleted_token.json' is contained here: /root/.hermes/...` — the
fixed code makes them pass. Full file passes (54/54).
Closes#24490
Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.
This commit splits the cleanup paths cleanly:
* **Persist mode (default)** — cleanup() is a NO-OP for the
container. Container stays running, processes survive, next Hermes
process attaches via the existing label probe in ~ms instead of
waiting for docker start. Resource reclamation happens via the
orphan reaper at next startup (2 × lifetime_seconds threshold), which
covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
overrides persist mode and tears the container down unconditionally.
cleanup_vm(task_id) now defaults to force_remove=True since
it's the user-driven reset path (called from AIAgent.close(),
/reset-style flows, and the idle reaper's per-turn cleanup).
The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.
E2E verified against real Docker:
✓ Container is still running after cleanup()
✓ Background process (sleep loop) survived cleanup()
✓ Filesystem state preserved across cleanup()
✓ In-process container_id cleared (next __init__ will re-probe)
✓ Background process visible from reused env (no docker start happened)
✓ force_remove=True removed the container even in persist mode
✓ cleanup_vm() removed the container (defaults to force_remove=True)
Test changes:
* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
`test_cleanup_with_persist_is_noop_for_container` — asserts neither
stop nor rm runs in persist mode, and the in-process handle is
cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
— covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
`test_wait_for_cleanup_after_cleanup_returns_true` to pass
`force_remove=True` so they actually exercise the docker code path
(default no-op would trivially pass).
cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.
Refs #20561
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.
But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.
This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.
Implementation:
* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
``label=hermes-profile=<current>``. Per-container ``docker inspect``
parses ``State.FinishedAt`` (with nanosecond-precision trimming for
Python's microsecond-bound ``fromisoformat``); containers older than
the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
load-bearing — a running container may belong to a sibling Hermes
process whose reuse path will pick it up; killing it would crash the
sibling mid-command. Single-container failures are logged and the
sweep continues to the next candidate.
* New ``_maybe_reap_docker_orphans()`` helper in
``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
``env_type == "docker"``. Gated by:
- ``terminal.docker_orphan_reaper: true`` (default; opt-out for
operators running multiple Hermes processes in the same profile
who don't trust the conservative defaults)
- ``_docker_orphan_reaper_ran`` module flag with double-checked
locking — parallel subagents and RL rollouts don't trigger N
concurrent docker ps storms
- Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
(so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
setup)
- Profile scoping — a research profile NEVER reaps the default
profile's stragglers
- Exception swallow — a janitor failure must never block container
creation
* New config ``terminal.docker_orphan_reaper`` wired through all four
config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
tests/conftest.py) and pinned by
``test_docker_orphan_reaper_is_bridged_everywhere``.
Coverage:
* 9 new unit tests in test_docker_environment.py — happy path, recent-
container sparing, profile scoping, unparseable-timestamp safety,
docker-ps-failure handling, partial-failure continuation, nanosecond
timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
— once-per-process gate, disable-flag respected, lifetime doubling
with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.
Closes#20561 (combined with the two prior commits on this branch).
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).
Three root causes, all addressed by this commit:
1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
which raced with parent-process exit — when Python exited promptly the
detached shell child got killed mid-\`docker stop\`, leaving stopped
containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
(the bind-mount-persistence flag). Default config sets
\`container_persistent: true\`, so the default happy path skipped \`rm\`
entirely — even when the user explicitly didn't want cross-process
reuse, containers leaked.
Fix:
* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
true, init probes
\`docker ps -a --filter label=hermes-agent=1
--filter label=hermes-task-id=<task>
--filter label=hermes-profile=<profile>\`
and reuses a matching container (running → attach; stopped →
\`docker start\` → attach; \`docker start\` failure → fall through to a
fresh \`docker run\`). Multiple matches prefer the running one, with the
stragglers left for the orphan reaper (next commit) to clean up.
* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
\`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
whenever \`persist_across_processes\` is false, regardless of the
bind-mount-persistence setting. The leak class is gone in all
combinations.
* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
hook calls this on every active env, blocking up to 15s for the
cleanup thread before interpreter exit. Without this, \`hermes /quit\`
raced the daemon-thread teardown and dropped the stop/rm work.
* New config \`terminal.docker_persist_across_processes\` (default
\`true\` — restores the documented contract). Set \`false\` for hard
per-process isolation. Wired through all four config-bridge sites
(cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
list); regression-pinned by
\`test_docker_persist_across_processes_is_bridged_everywhere\` matching
the existing pattern for docker_run_as_host_user / docker_env.
Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.
Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.
Refs #20561
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.
This commit adds three labels at `docker run` time:
--label hermes-agent=1 # global sweep target
--label hermes-task-id=<sanitized> # per-task reuse key
--label hermes-profile=<sanitized> # per-profile isolation key
Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.
No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.
Refs #20561
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.
Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated
Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring
tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.
Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
break test discovery; pre-existing unrelated mock warning)
The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
Commits 8bf09455d (Grogger, explicit creationflags=) and 95683c028
(nekwo, **_popen_kwargs via windows_hide_flags()) landed 77 minutes
apart and both injected creationflags into the same subprocess.Popen
call. nekwo's commit correctly replaced the explicit line in
tools/process_registry.py but only added the kwargs spread in
tools/environments/local.py -- leaving creationflags specified twice.
Result on Windows: every LocalEnvironment.init_session() raised
"subprocess.Popen() got multiple values for keyword argument
'creationflags'" and fell back to bash -l per command (much slower --
bashrc runs on every shell invocation).
Drop the explicit line so **_popen_kwargs is the single source.
`_wait_for_process()` was sleeping for a fixed 200ms between polls of
the subprocess exit status. For commands that complete in <50ms (echo,
pwd, date, cat short files, write_file with small content, read_file
with small content), the agent was stuck waiting for the next 200ms
tick to notice the process had exited. That floor was the dominant
component of per-tool latency for typical short commands.
Replace with adaptive backoff: start at 5ms, multiply by 1.5 each
iteration up to 200ms. Fast commands (the common case) return in
~6ms; long-running commands (builds, tests, sleeps) reach the 200ms
steady-state poll rate within ~12 iterations (~150ms total) and pay
identical CPU after that.
Tool-call wall time (deterministic microbench of `echo first`):
before: median 200ms min 200ms max 200ms
after: median 5ms min 5ms max 7ms
saved: ~195ms per terminal tool call
End-to-end chat -q with 3 sequential terminal tool calls
(`echo first`, `echo second`, `echo third`):
before: median 5.73s, min 5.61s
after: median 4.64s, min 4.60s
saved: ~1100ms wall per turn
Live tmux session: a typical 'write file, read it back' turn now
displays each tool as 0.1s in the spinner (was 0.9s before). The
agent observes the subprocess exit ~200ms faster per call. For chat
workflows that do 4-8 terminal/file calls per turn this saves
800ms-1.5s of pure wall-clock waiting.
Why it's safe:
- Interrupt and timeout checks still fire on every iteration (no
longer rate-limited to 5/sec)
- Activity callback fires on the same 'due' schedule (`touch_activity_if_due`)
- DEBUG_INTERRUPT heartbeat is unchanged (30s)
- Steady-state poll rate for long-running commands matches the old
200ms within ~150ms of startup
Tests:
- tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes
(test_delegate.py::test_depth_limit, test_constants — pass in isolation)
- Live tmux: 2-turn conversation + multiple tool calls, no errors
Apply Windows CREATE_NO_WINDOW flags to foreground local terminal subprocesses and tracked background processes so Hermes operations do not flash or steal focus with extra console windows.
1. trajectory_compressor.py: yaml.safe_load() returns None on empty
files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
adding `or {}` fallback. (HIGH — blocks startup with empty config)
2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
try/except: cron/scheduler.py, hermes_cli/auth.py,
agent/shell_hooks.py, tools/skill_usage.py,
tools/environments/file_sync.py, tools/memory_tool.py. If unlock
raises OSError, fd.close() is skipped and the lock is held forever.
The msvcrt branches already had try/except; the fcntl branches did
not. Fix by wrapping in try/except (OSError, IOError): pass.
3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
followed by path.read_text() with no try/except. If file is deleted
between the check and the read, FileNotFoundError propagates. Fix
by using try/except FileNotFoundError.
4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
can leave truncated JSON on crash, causing JSONDecodeError on next
load. Fix by writing to tempfile + fsync + os.replace (atomic).
Add creationflags=CREATE_NO_WINDOW to every Windows Popen call
across the terminal, process registry, code execution, and kanban
worker subsystems. Prevents visible CMD windows from flashing on
the user's desktop during agent operation.
Also adds the _IS_WINDOWS module constant to kanban_db.py where
it was missing, for consistency with the other patched files.
5 Popen sites across 4 files:
- tools/environments/local.py (terminal foreground spawn)
- tools/process_registry.py (background process spawn)
- tools/code_execution_tool.py (sandbox + interpreter probe)
- hermes_cli/kanban_db.py (kanban worker spawn)
Two log-spam fixes surfaced by a Windows user (Git Bash + Python 3.11.9):
1. LocalEnvironment cwd warn spam
============================
Git Bash's `pwd -P` emits paths like `/c/Users/x`. The base-class
`_extract_cwd_from_output` was assigning this verbatim to `self.cwd`
without validation, then `_resolve_safe_cwd`'s `os.path.isdir(/c/...)`
returned False on Windows, triggering:
LocalEnvironment cwd '/c/Users/NVIDIA' is missing on disk;
falling back to '/' so terminal commands keep working.
...on every terminal call. The pre-existing Windows-path translation
inside `_run_bash` ran AFTER the safe-cwd check, so it could never
prevent the warning.
Fix:
- New `_msys_to_windows_path` helper (idempotent, no-op off Windows).
- `_resolve_safe_cwd` normalizes before `isdir`, so a valid MSYS path
is recognized as the real directory it points at.
- `LocalEnvironment._update_cwd` and a new override of
`_extract_cwd_from_output` translate + validate before mutating
`self.cwd`. Stale / non-existent marker paths roll back to the
previous cwd instead of clobbering it.
- The fallback warning still fires when the directory really is gone
(deletion-recovery scenario from #17558 still covered).
2. tirith spawn-failed warn spam
=============================
When tirith isn't installed (background install in flight, or marked
failed for the day) and the configured path stays as the bare string
`tirith`, every `subprocess.run([tirith_path, ...])` raises OSError
and logged:
tirith spawn failed: [WinError 2] The system cannot find the file specified
...on every command. fail_open=True means behaviour is correct, but
the log noise is severe.
Fix:
- `_warn_once(key, ...)` thread-safe dedupe helper.
- Three hot-path warnings (`tirith path resolved to None`,
`tirith spawn failed: ...`, `tirith timed out after Ns`) now log
once per (exception class, errno) / timeout-value / path-none key.
- Dedupe set is cleared on `_clear_install_failed` so a successful
install lets a subsequent failure surface again.
Tests
=====
- `tests/tools/test_local_env_windows_msys.py`: 12 tests covering the
MSYS→Windows translator, the resolve fast-path, update_cwd validation,
and extract_cwd_from_output rollback.
- `tests/tools/test_tirith_security.py`: 4 new dedupe tests (15 spawn
failures → 1 log line; distinct exc types → 2 lines; timeout dedupe;
path-None dedupe).
Targeted runs:
test_local_env_windows_msys.py 12 passed
test_local_env_cwd_recovery.py 7 passed (pre-existing, no regressions)
test_tirith_security.py 67 passed (63 pre-existing + 4 new)
test_base_environment + local_* 37 passed (no regressions)
test_local_env_blocklist + neighbours 114 passed
Reported via Hermes log capture: 19× cwd warnings + 15× tirith warnings
in a single short session.
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
Daytona ships breaking SDK changes on June 10, 2026 — `list()` returns
an iterator and the `page=` offset parameter is removed. We pin
daytona==0.155.0 so we're past the May 24 hard-cutoff, but the
legacy-sandbox resume path in DaytonaEnvironment still passes `page=1`
and reads `.items` off the result.
Switch to `next(iter(results), None)` against a single-result
`list(labels=..., limit=1)` call. Update tests to use `iter([...])`
and drop the `page=1` kwarg from list() assertions.
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
Set HERMES_SESSION_ID using the existing session_context.py ContextVar
system for concurrency safety (multiple gateway sessions in one process
won't cross-talk). Also writes os.environ as fallback for CLI mode.
Touchpoints:
- gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry
- run_agent.py: Set both ContextVar and os.environ at init and on
context-compression rotation
- tools/environments/local.py: Bridge ContextVars into subprocess env
in _make_run_env() (ContextVars don't propagate to child processes)
- tests/run_agent/test_session_id_env.py: 3 tests covering env, provided
ID, and ContextVar paths
execute_code subprocess already passes HERMES_* prefixed vars through
_scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_').
Primary use case: webhook-triggered agents that need to include a
`--resume <session_id>` takeover command in their output.
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).