OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token. The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.
Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run(). The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.
Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD. Without the guard:
✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
✓ Connected (1487ms) — 63 tools discovered
Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
DocuSeal), but only passes when POST returns 2xx + MCP content-type.
Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
(making it OAuth-aware), but is labeled duplicate of #37598 and may not
land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.
Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
_preflight_content_type raises NonMcpEndpointError for 200 text/html
(the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
_preflight_content_type when auth_type=="oauth", using class-level
monkeypatching so the gate is exercised without a live MCP transport.
23 passed tests/tools/test_mcp_preflight_content_type.py
When security.redact_secrets is on (default), read_file/search_files/cat
applied redact_sensitive_text(code_file=True) to file content, which still
ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.)
came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking
truncated key. When an agent read that and wrote it back to config, the masked
value replaced the real credential, silently breaking auth (401). Production
evidence: a config.yaml found containing the exact 13-char masked GitHub PAT.
The two community PRs (#35529, #35534) fixed the corruption by NOT redacting
prefixes for config reads — but that exposes the user's real keys to the agent
context, model, and logs (a security regression). This takes the safer route:
keep redacting, but for file content emit a NON-REUSABLE sentinel.
- New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor
label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis
wrapper is syntactically invalid as a token so it can't be mistaken for or
written back as a usable key).
- New `redact_sensitive_text(file_read=True)` routes prefix matches through it
(implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token`
still keeps head/tail (fine for logs, never written back).
- Wired the 3 file_tools.py call sites (read_file / search_files / cat) to
file_read=True.
Fixes both the corruption AND avoids the secret-exposure of the un-redact
approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default
mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass;
mutation-verified (reverting to the old mask fails the sentinel/leak tests).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com>
Closes#35519. Supersedes #35529, #35534.
* fix(security): redact secrets in background process + foreground env-dump output
Terminal-output redaction was incomplete (#43025):
- Gap 1: process(action=poll/log/wait) returned background stdout verbatim —
no redaction at all. A background printenv/server/test emitting a key leaked
raw to the model, session.db, and CLI display. Same for the gateway
background-process watcher's completion/progress notifications.
- Gap 2: the foreground terminal path hardcoded code_file=True, which skips the
ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv
leaked even there.
Adds agent.redact.redact_terminal_output(output, command) as the single policy
for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/
declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens;
other commands stay on code_file=True to avoid false positives on source dumps.
Wired into terminal_tool, process_registry (_handle_process boundary), and the
gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved.
* docs: add infographic for #43025 terminal-output redaction fix
The snapshot/vision guards re-check the page URL before returning content,
but browser_console(expression=...) -> _browser_eval returns arbitrary JS
results directly, leaving two same-class bypasses open:
1. Direct fetch: fetch('http://127.0.0.1/secret').then(r=>r.text()) reads
a private endpoint and returns the body — the page URL stays public so
the post-eval recheck never sees it.
2. Navigate-then-read: location.href='http://127.0.0.1/' then a later eval
reads document.body.innerText.
Guard _browser_eval on the same condition as navigate/snapshot/vision
(not local backend, not local sidecar, not allow_private_urls):
- pre-scan the expression for private/always-blocked URL literals
- re-check window.location.href after the eval at both success-return
sites (supervisor fast-path + subprocess fallback)
Probe failures fail-open (matching the snapshot/vision guards).
The private-network guard in browser_snapshot() and browser_vision()
blocked all private URLs, including those accessed via local sidecar
sessions (hybrid routing). Local sidecar sessions intentionally access
private URLs — the cloud provider never sees the URL in that case.
Add `_is_local_sidecar_key(effective_task_id)` check to both guards,
matching the existing pattern in browser_navigate().
Fixes#45101 review feedback from egilewski.
The SSRF bypass in #44731 was only patched for browser_snapshot(), but
browser_vision() exposes the same vulnerability — it takes a screenshot
and sends it to the vision model without checking if eval-driven
navigation moved the page to a private/internal URL.
Add the same current-page URL safety check to browser_vision() before
any screenshot is captured, encoded, or forwarded to the vision model.
This covers both the normal screenshot path and the Lightpanda Chrome
fallback path.
7 new tests: blocks private URL, allows public URL, skips in local
backend, skips when private URLs allowed, handles eval failure/empty/exception.
browser_snapshot() now checks the current page URL before returning
content. When browser_console() changes location.href to a private or
internal address (e.g., http://127.0.0.1:8080/), the snapshot returns
an error instead of exposing the private page content.
This closes the SSRF bypass where an attacker could:
1. Navigate to a public page
2. Use browser_console to eval location.href = 'http://127.0.0.1:port/'
3. Use browser_snapshot to read the private page content
The fix reuses the existing _is_safe_url() and _allow_private_urls()
infrastructure, and fails open if the URL check itself fails.
Fixes#44731
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.
- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
that proves the snapshot never tears (would still tear with $$).
Fix race condition in terminal environment snapshots that could corrupt
PATH with declare -x entries. When concurrent terminal calls share the
same snapshot file, the non-atomic 'export -p > snapshot.sh' write could
be read mid-write by another process, causing partial/corrupted env vars
to be sourced and mixed into PATH.
The fix uses atomic file replacement:
- Write to a temp file: export -p > snapshot.sh.tmp.303651
- Atomically replace: mv -f snapshot.sh.tmp.303651 snapshot.sh
On POSIX, mv within the same filesystem is atomic, so source() will
either see the old complete snapshot or the new complete one, never a
partial/truncated file.
Fixes#38249
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.
Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):
_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.
Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
the match can never span a line break. A real DSN password never contains
whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
sites (file_tools already did) so code-execution output isn't corrupted by
ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
private keys are still redacted.
Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.
Reported-by: koishi70
Reported-by: pfrenssen
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
Two follow-ups on top of the salvaged #46365 fix:
1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
(#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
Point the three sys.modules patches at the current module path so the
ephemeral-fallback tests actually exercise the injected fake adapter.
2. Harden the live-adapter lookup: split the gateway import guard from the
adapter lookup and log (instead of silently swallowing) when a runner
exists but adapters.get() raises. A silent fall-through there would
re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
to prevent (#46310). Documented that the live adapter is gateway-owned and
must not be disconnected, and why the ephemeral finally never touches it.
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.
The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.
Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.
Fixes#46310
Fixes#28126. sync_skills() was unconditionally writing bundled skills
into the local <profile_home>/skills/ tree even when the profile's
config.yaml delegated skill resolution to an external directory
via skills.external_dirs. The skill loader then saw two candidates
for the same name (local shadow + external canonical), refused to
resolve on collision, and every worker that auto-loaded such a skill
crashed with 'Unknown skill(s): <name>'.
Changes:
- _build_external_skill_index() indexes skills available in external
dirs (by directory name and frontmatter name)
- sync_skills() skips writing a bundled skill when it finds the same
name in the external index; records the hash in the manifest so
subsequent syncs treat it as already handled
- Self-healing: removes stale local shadows left by prior buggy syncs
(only when origin_hash == bundled_hash == user_hash, i.e. we wrote
it and user didn't touch it)
- New 'shadowed_by_external' key in sync_skills() return dict
3 new tests in TestExternalDirsIndexing (all passing).
All 48 tests in test_skills_sync.py pass.
Closes#28126
The curator's LLM consolidation pass could archive whole clusters of
active skills with zero verified consolidations (#29912): a bare prune
(skill_manage delete with absorbed_into empty/omitted) from the forked
review agent was accepted, removing the skill's name from lookup even
though counts.consolidated_this_run was 0.
- _delete_skill now fails closed during the curator/background-review
pass: a delete is only allowed when it declares a verified
consolidation (absorbed_into=<umbrella>, umbrella must exist). A prune
with no forwarding target is refused; the skill stays active. The
deterministic inactivity prune (archive_skill) is unaffected.
- A verified consolidation delete during the curator pass now routes
through the recoverable archive primitive instead of shutil.rmtree, so
a misjudged consolidation can be undone with hermes curator restore.
The usage record is kept (state=archived) rather than forgotten.
- Foreground, user-directed deletes keep their existing hard-delete
semantics.
Follow-up to the cherry-picked #29212 (#29177):
- Promote the 24h stale-process threshold to config.yaml
(session_reset.bg_process_max_age_hours) instead of a hardcoded
constant. 0 disables the cutoff (legacy: any live process blocks reset).
Wired through GatewayConfig.default_reset_policy in gateway/run.py.
- Bug 2: process(action=list) now resolves the gateway session_key from
the contextvar and surfaces session-scoped background processes (a
forgotten preview server under a different task), flagged
session_scoped — so the agent/user can discover and kill the blocker.
Previously the task-scoped list returned [] and the blocker was invisible.
- Tests: config round-trip for the new field, cross-task list visibility.
- Docs: messaging session-reset section.
Background processes (e.g. http.server preview) that Hermes starts and
forgets about previously blocked session idle/daily reset indefinitely.
The reset guard in session.py checked has_active_for_session() with no
max age — a 3-day-old preview server blocked reset the same as a task
started 30 seconds ago.
Changes:
- Add max_active_age parameter to has_active_for_session() in
process_registry.py. Processes older than this threshold are ignored.
- Add MAX_ACTIVE_PROCESS_AGE constant (24h / 86400s).
- Wire max_active_age into the gateway's session store callback in
run.py so stale processes no longer block session lifecycle.
- Add debug logging when reset is skipped due to active processes.
- Add 3 tests covering recent, stale, and legacy (None) max age.
Fixes#29177
The module-level import broke tests/tools/test_managed_browserbase_and_modal.py,
which loads browser_tool.py via spec_from_file_location against a stubbed
'tools' package that does not include tools.environments.local. Move the import
into a _build_browser_env() helper called at the two agent-browser spawn sites,
matching the lazy-import pattern already used by lazy_deps.py.
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.
Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:
- Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
secrets -- stripped even for credential-inheriting children.
- Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
unless inherit_credentials=True. The opt-in is grep-able for audit.
Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.
This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.
Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
The durable _last_known_cwd anchor is keyed by the shared 'default' container,
so a non-owning worktree session could inherit the owning session's cwd through
it — breaking the wrong-worktree-routing fix (test_file_tools_cwd_resolution::
test_resolution_routes_to_resolving_sessions_worktree).
Reorder _authoritative_workspace_root so the session-specific registered cwd
override (keyed by raw session id) is checked BEFORE the shared-container
_last_known_cwd fallback. A non-owning session now resolves into its own
registered worktree; the durable anchor only fills in when there's no
session-specific override (the #26211 single-session case). Adds a regression
test covering the owner-mirrors-then-other-session-resolves interaction.
Belt-and-suspenders on top of the cherry-picked cwd-preservation fix:
- Proactively mirror every live terminal cwd into _last_known_cwd on each
successful read, so the durable anchor survives even when the cleanup
thread pops both _file_ops_cache and _active_environments before
_get_file_ops' stale-cache save branch can fire.
- Fall back to _last_known_cwd in _authoritative_workspace_root. write_file_tool
resolves the path (via _resolve_path_for_task) BEFORE _get_file_ops rebuilds
the env, so restoring only the rebuilt env's cwd was insufficient — the
resolution that decides where the file lands runs first. This closes that gap.
The local env's persisted _cwd_file can't serve this role: it's keyed by a
random per-session uuid and deleted on cleanup (the same cleanup that triggers
the bug). The in-memory _last_known_cwd registry is the durable anchor instead.
Adds a real-IO E2E regression (TestSilentFileMisplacementE2E) exercising the
actual write_file_tool path after env cleanup.
Root cause: when the terminal environment (`_active_environments` entry) is
cleaned up and re-created during a long conversation, the new environment
always starts with the default config CWD (typically `~/.hermes/hermes-agent`)
instead of preserving the user's last-known working directory. Subsequent
relative-path writes (`write_file`, `execute_code`, shell commands) silently
land in the default CWD, making files appear to be "created but absent."
Fix: add `_last_known_cwd` dict that preserves the old environment's CWD
before the stale cache entry is invalidated. When a new environment is
created for the same task_id, we check `_last_known_cwd` first and use the
preserved CWD instead of the config default.
Changes:
- tools/file_tools.py: add `_last_known_cwd` dict, save CWD before stale
cache invalidation, restore CWD on env recreation
- tests/tools/test_file_tools.py: add `TestLastKnownCwd` with 2 tests
verifying CWD preservation and fallback behavior
Fixes#26211
A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements
running `docker version` with a 5s timeout, momentarily timing out under load)
would return False for a single get_tool_definitions() call. Because file
tools delegate their check_fn to the terminal check, that one flake silently
stripped read_file/write_file/patch/search_files AND terminal from whatever
agent was being constructed at that instant — most visibly a delegate_task
subagent, which then reported "Tool read_file does not exist". This explains
both the intermittent (~80% success) user-session failures and the
deterministic cron failures in #21658 / #5304.
The existing _check_fn TTL cache made this worse: it cached the transient
False for the full 30s window, poisoning every subagent spawned in that span.
Fix: remember the last time each check_fn returned True; when a fresh probe
fails within a short grace window of that success, treat it as a flake —
serve the last-good True and do NOT cache the failure (so the next call
re-probes). A failure with no recent success, or past the grace window, is
honored normally so a backend that genuinely went down stops advertising its
tools. Probe failures now log at WARNING regardless of quiet mode, making the
previously-silent tool loss diagnosable in subagent (quiet) sessions.
Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.
- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
docker/powershell/…); calls to those are flagged even when captured. Non-flashing
programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
_subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
* fix(windows): stop terminal-window popups from background spawns
Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:
- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
subprocess allocate a fresh console when the launching process has
none (pythonw desktop backend / detached gateway) - even with output
captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
get a console regardless of how they're launched.
Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
exclusively owns an auto-allocated console detaches it at startup
(GetConsoleProcessList()==1 gate leaves shared interactive consoles
untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
tools repo-wide so the popup class can't regress.
* fix(windows): scope FreeConsole to background entry points; fix merge fallout
Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
(start_server, what the desktop spawns), cron standalone, tui_gateway entry,
slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
no-op without console / on POSIX, and asserts it's not an import side effect.
Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
every terminal init). _subprocess_compat.popen already hides the window, so
drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
case-variant test to drive that production path.
* test(claw): mock _subprocess_compat.run seam for Windows process scan
claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
* fix(windows): stop subprocess console-window popups + add CI guard
The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.
- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
calls that can create a new console — output-redirection-aware (capture/
redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
windows_detach_popen_kwargs(); marked intentionally-visible launches
(editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.
* test: accept creationflags kwarg in psutil_android fake_subprocess_run
The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).
Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.
Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.
Also adds the AUTHOR_MAP entry for the salvaged contributor.
Closes#23473
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open. When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.
Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it. The default tirith_fail_open: true behaviour
is unchanged.
Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false → always allowed regardless of fail_open
Fixes#20733
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts:
# tests/tools/test_approval.py
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and out-number a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).
- _order_for_recall() stable-sorts FTS rows so interactive sources rank
above cron before lineage dedup; within each class BM25/recency order
is preserved. Cron is demoted, not excluded, so it still surfaces when
it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
in hand for the demotion pass.
Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.
- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
routes media_files through the whatsapp plugin's standalone_sender_fn, and
add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
chunk is media-only), then uploads each file via the bridge /send-media
endpoint with a mediaType derived from extension/is_voice/force_document, so
images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.
Closes#19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).
Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.
- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
instead of returning when the reconnect budget is exhausted, so the task
stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
parked/dead task and respawning a dead stdio subprocess) and returns a
clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
(stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling
Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.
Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().
Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.
Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).
Co-authored-by: season179 <season.saw@gmail.com>
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."
Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).
Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.
Closes#35927.
SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
The OpenAI SDK exposes client.base_url as an httpx.URL object, not str.
The isinstance(live_raw, str) guard made this branch dead code in
production. Use _normalized_runtime_url (which coerces via str()) so
the fallback actually fires.
When parent_agent.base_url still carries a stale OpenRouter URL but the
live OpenAI client already points at local Ollama, subagents were routing
API calls to OpenRouter and failing with HTTP 401. Prefer _client_kwargs
and the mounted client base_url when they disagree with the surface field.
preexec_fn=os.setsid runs Python code in the forked child before exec,
which is unsafe in multi-threaded processes (CPython docs). When the
Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs)
with active thread pools, the fork can SIGSEGV before the child execs.
Replace all preexec_fn usage with start_new_session=True, which provides
the same setsid/process-group semantics without running Python in the
fork. This is already the pattern used throughout hermes_cli/gateway.py
and hermes_cli/_subprocess_compat.py.
Fixes#46789
The autonomous self-improvement review fork could still write to a pinned
skill — only external/bundled/hub-installed/protected-builtin skills were
guarded. The curator skips pinned skills from every auto-transition; the
review fork is the same kind of no-user-present actor and must too.
Adds a pin check to _background_review_write_guard so background-origin
edit/patch/delete/write_file/remove_file on a pinned skill are refused.
Stricter than the foreground _pinned_guard (delete-only) by design: with
no user in the loop there is no one to consent to an edit.
Fixes#25839
The dangerous-command approval layer already blocks `hermes gateway
(stop|restart)`, `pkill/killall hermes|gateway`, and `kill ... $(pgrep ...)`.
A reporter noted on #33071 that the agent can still achieve the same
effect by driving launchd directly against the gateway's service label
(`launchctl stop ai.hermes.gateway`, `launchctl kickstart -k
system/ai.hermes.gateway`, etc.) or by substituting `pidof` for `pgrep`
in the kill-expansion form.
This widens the "Gateway lifecycle protection" block in
`tools/approval.py` to cover both vectors:
- `launchctl (stop|kickstart|bootout|unload|kill|disable|remove)`
scoped to commands that target a Hermes label (`hermes`,
`ai.hermes`). Read-only inspection (`launchctl print …`,
`launchctl list`) and operations against unrelated labels remain
unflagged.
- `kill ... $(pidof …)` and the backtick form, alongside the existing
`pgrep` expansion. `pidof` is the BSD/Linux equivalent and is
equally opaque to the `(pkill|killall) … hermes` name pattern.
Intentionally left out of scope: plain `kill -TERM <numeric_pid>` with
a PID looked up out-of-band. Catching that would require runtime PID
state and would break the existing
`TestPgrepKillExpansion::test_safe_kill_pid_not_flagged` contract,
which guarantees that a plain literal-PID `kill 12345` stays safe.
Natural-language skill search returned a short, arbitrary list and never
surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes:
1. The runtime index collapses every GitHub tap into source="github", so
there was no way to find or filter by provider at the CLI — the per-tap
identity only existed in the docs-site catalog.
2. HermesIndexSource.search matched only name/description/tags (not the
identifier or provider) and broke at the first `limit` hits in raw index
order, burying the most relevant skills. `search` also defaulted to
--limit 10 against an 86k-entry catalog.
Changes:
- GitHubSource stamps a per-tap provider label (extra.provider) on each
skill via github_provider_for(); source stays "github" so dedup/floor/
index-skip logic is untouched. Flows into the built index.
- HermesIndexSource.search now matches identifier + provider too, and
collect-then-ranks (exact > prefix > whole-word > substring) instead of
break-at-limit.
- --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax
provider filters for browse/search (narrows merged results by provider).
- search --limit default 10 -> 25; table Source column shows the provider
label for github skills.
Tested: 181 unit tests pass; E2E against the live runtime index confirms
'nvidia'/'cuda' searches now surface NVIDIA-provider skills and
--source nvidia narrows to exactly the NVIDIA catalog.
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:
1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
→ /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
drops the -c argument, so the command is swallowed (exit 0, no output).
Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.
Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.
Closes#42203. Supersedes #42290.
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.
Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.
Fixes#35075
Co-authored-by: rlaope <piyrw9754@gmail.com>
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.
Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
Fixes#36767.
Two complementary recoveries for the recurring "delete three cache files and
re-auth by hand" ritual when an MCP server's dynamically-registered OAuth
client goes dead server-side (IdP redeploy / DB wipe / rebrand):
- Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs
auth-flow responses and, on a 400/401 `invalid_client` from the discovered
token endpoint, backs up + deletes `<server>.client.json` and `.meta.json`
and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client
registration on the next flow. Conservative by construction: only
dynamically-registered (non config-supplied) clients, only the token
endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's
`invalid_client_metadata` does not trip it); best-effort so a miss never
breaks the live flow. Covers both code-exchange and refresh when the token
endpoint was discovered. Tokens are preserved.
- `hermes mcp reauth [<name>|--all]`: the reporter's primary symptom — the
IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK
only sees a callback timeout), so it cannot be auto-detected. The new
command re-auths one or ALL `auth: oauth` servers, serially: one browser
flow at a time, which also fixes the startup popup storm when several
servers are stale at once. Single-server reauth is factored out of
`mcp login` and shared.
Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint,
success-response, pre-registered, and invalid_client_metadata negative guards;
a bridge integration test driving the real async_auth_flow generator to prove
the detection hook preserves the bidirectional asend() forwarding contract;
reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh
122/122 green for the touched suites; check-windows-footguns.py and ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.
Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.