Sibling-site follow-up to the AGENTS.md token-lock fix (#50481). Platform
adapters migrated from gateway/platforms/<name>.py to
plugins/platforms/<name>/adapter.py; a handful (signal, weixin, bluebubbles,
qqbot, yuanbao, msgraph_webhook, webhook, api_server) still live in
gateway/platforms/.
- adding-platform-adapters.md: new-adapter creation path + reference-impl table
- gateway-internals.md: rewrite the adapter tree to reflect the actual split
- zh-Hans mirrors of both kept in parity
- scripts/release.py: add TutkuEroglu to AUTHOR_MAP (CI gate)
gateway/platforms/telegram.py no longer exists (adapters moved to
plugins/platforms/<name>/adapter.py) and telegram no longer uses the
scoped-lock pattern. Point the token-lock canonical-pattern reference to
plugins/platforms/irc/adapter.py, which acquires the lock in connect()
and releases it in disconnect() — and is already cited as a canonical
example in ADDING_A_PLATFORM.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
The welcome banner's 'Available Tools' merged in every toolset from the
global check_tool_availability() registry walk, regardless of whether it
was enabled for the current platform. On a Blank Slate CLI (file +
terminal only) that surfaced discord / feishu / kanban tools the agent
was never actually given — they are not in the agent's tool schema, but
the banner displayed them, making it look like they were exposed.
- Filter the unavailable-toolset merge to toolsets actually in
enabled_toolsets (a toolset that's enabled but has unmet deps still
legitimately shows as disabled/lazy).
- Gate the 'Available Skills' section on the skills toolset being
enabled — when it's off, the agent can't load any skill, so show
'Skills toolset disabled' instead of the on-disk catalog.
When enabled_toolsets is empty (older callers), behavior is unchanged.
Validation: blank-slate banner now shows only file + terminal and
'Skills toolset disabled'; a skills-enabled banner still lists the
catalog. Added regression tests; full banner suite green (15/15).
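
A minimal sketch of the filter described above, assuming the banner receives the unavailable-toolset names and the platform's enabled_toolsets as plain lists. The function name is hypothetical and not the banner's real API:

```python
def visible_unavailable_toolsets(unavailable, enabled_toolsets):
    """Return the unavailable toolsets the banner should still list."""
    if not enabled_toolsets:
        # Older callers pass nothing: keep the previous unfiltered behavior.
        return list(unavailable)
    enabled = set(enabled_toolsets)
    # Only toolsets the agent was actually given may appear, even as disabled.
    return [name for name in unavailable if name in enabled]
```

On a blank-slate CLI (file + terminal) this drops discord / feishu / kanban from the merge while still showing an enabled toolset whose deps are unmet.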
Live testing against a real SIGTERM-ignoring process TREE (parent + children,
the agent-browser daemon + renderer shape) revealed psutil.wait_procs's
gone/alive partition mis-handles a parent/child tree: it reaps via
Process.wait() and could mark targets gone/alive inconsistently across the
tree, leaving survivors un-killed (flaky — sometimes the parent lived,
sometimes a child). Replace it with: sleep out the grace window, then
directly re-probe every captured target (_proc_alive, treating zombies as
dead) and SIGKILL any that's still running. Add a multi-child-tree regression
test. 6/6 escalation tests green across repeated runs; the real-tree E2E now
kills the full tree 6/6 runs.
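
Rough shape of the sleep-then-re-probe escalation, as a sketch only. The signalling and liveness probes are injected as hypothetical callables (send_term, send_kill, is_alive) so the flow can be exercised without real processes; the real code works on a captured process tree:

```python
import time


def terminate_with_escalation(pids, send_term, send_kill, is_alive,
                              grace_seconds=2.0, sleep=time.sleep):
    """SIGTERM every captured pid, then SIGKILL whatever is still running.

    Returns the list of pids that had to be escalated.
    """
    for pid in pids:
        send_term(pid)
    if grace_seconds <= 0:
        return []  # a grace window of 0 disables escalation
    sleep(grace_seconds)  # sleep out the whole grace window
    # Re-probe each target directly instead of trusting a gone/alive partition.
    survivors = [pid for pid in pids if is_alive(pid)]
    for pid in survivors:
        send_kill(pid)
    return survivors
```

The is_alive probe is assumed to treat zombies as dead, matching the behavior described for _proc_alive.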
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).
Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.
Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.
Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
Surface dangerous host/deployment posture at gateway startup so operators get
the 'you're exposed' signal the June 2026 MCP-config persistence campaign
victims never had. Warn-only — never blocks startup, never raises.
Checks (each independently fail-safe):
- Running as root (POSIX uid 0)
- SSH daemon with PasswordAuthentication enabled (incl. the 'yes' default)
- Running in a container with no persistent volume mount over HERMES_HOME
- Network-accessible API server with no API_SERVER_KEY
New module hermes_cli/security_audit_startup.py; invoked once per process from
start_gateway() right after setup_logging(). Cross-platform (root/SSH checks
no-op on Windows). Idea: @Cthulhu.
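
A sketch of the warn-only, independently fail-safe loop, assuming each check is a zero-argument callable that returns a warning string or None. run_startup_audit and the check shape are illustrative, not the actual module's interface:

```python
import logging

logger = logging.getLogger(__name__)


def run_startup_audit(checks):
    """Run each (name, check) pair and collect warnings; never raises."""
    findings = []
    for name, check in checks:
        try:
            message = check()
        except Exception:
            # Fail-safe: a broken check must never block or crash startup.
            logger.debug("startup audit check %s failed", name, exc_info=True)
            continue
        if message:
            logger.warning("security posture: %s", message)
            findings.append((name, message))
    return findings
```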
The s6 dashboard entrypoint and docker integration tests relied on
HERMES_DASHBOARD_INSECURE=1 to bring up a 0.0.0.0 dashboard with no auth
provider. With --insecure now a no-op (auth gate mandatory on non-loopback
binds), that path fails closed.
- s6 dashboard/run: drop --insecure derivation; warn that the env is a no-op
and point operators at HERMES_DASHBOARD_BASIC_AUTH_* / OAuth.
- docker tests: supervision tests now register the bundled basic password
provider (HERMES_DASHBOARD_BASIC_AUTH_USERNAME/_PASSWORD) so the gate has a
provider and the dashboard binds. Rewrote the insecure-opt-out test to
assert fail-closed (dashboard does NOT serve) instead of gate-bypass.
- docs (en + zh-Hans): HERMES_DASHBOARD_INSECURE documented as deprecated
no-op; basic-auth is the zero-infra way to authenticate a containerized
public dashboard.
Remove the dashboard --insecure auth-bypass, add an MCP persistence guard +
IOC blocklist, and raise the API-server key entropy floor.
Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media
instance): scanners find exposed Hermes dashboards/API servers, drive the
root agent to plant a 'command: bash' MCP entry that appends an attacker SSH
key to authorized_keys, which cron + startup then re-execute every tick.
- dashboard: --insecure no longer disables the auth gate. should_require_auth
returns True for every non-loopback bind; a public bind ALWAYS requires an
auth provider (bundled password provider or OAuth). --insecure kept as a
warned no-op for backward compat. Fail-closed error now points at the
password provider, not at --insecure.
- mcp_security: validate_mcp_server_entry now also rejects shell payloads that
write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/
rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key +
source IPs) anywhere in command/args/env. Runs at save AND spawn time.
- api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars;
warn when a network-accessible API server runs an unsandboxed local backend.
Same library-code anti-pattern as the compressor fix: MiniSWERunner.__init__
called logging.basicConfig(), overriding the application's root logger config
every time a runner was instantiated. Moved the call into main() (the CLI
entry point) where it belongs; __init__ now only does getLogger(__name__).
Standalone verbose logging is preserved.
logging.basicConfig() in TrajectoryCompressor.__init__ overrides the
root logger configuration every time the class is instantiated. Library
code should use logging.getLogger(__name__) and let the application
entry point configure the root logger.
Fixes inconsistent log formatting when the compressor is used alongside
other logging configuration in the gateway.
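
The pattern in miniature, using a hypothetical stand-in class rather than the real TrajectoryCompressor or MiniSWERunner:

```python
import logging

# Module-level logger: no handlers attached, no root configuration touched.
logger = logging.getLogger(__name__)


class Compressor:
    """Hypothetical library class; instantiation leaves the root logger alone."""

    def __init__(self):
        # No logging.basicConfig() here.
        self.log = logger


def main():
    # Only the CLI entry point configures the root logger.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s %(name)s: %(message)s")
    Compressor().log.info("compressor ready")
```

Importing the module or constructing the class any number of times leaves whatever the application configured in place; running main() standalone still gets formatted output.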
* fix(agent): strip stale reasoning_content when falling back to a strict provider
A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content
on every assistant tool-call turn (a single space " " pad). api_messages is
built once under the primary; on a mid-session fallback to a strict
OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale
pads were replayed verbatim and rejected with HTTP 400/422:
    body.messages.2.assistant.reasoning_content: Extra inputs are not
    permitted (input: ' ')
reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never
reconciled history built under a reasoning primary against a strict fallback.
copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only
shapes to non-pad providers.
Fix both sites: when the active provider does not enforce echo-back, strip
reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when
switching TO a reasoning provider is preserved. Covers the Cerebras 400 from
#45655 and the DeepSeek->Mistral 422 fallback report.
Refs #45655.
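
Sketch of the reconciliation rule, assuming messages are plain dicts and the caller already knows whether the active provider enforces echo-back. reconcile_reasoning_content is a hypothetical helper, not one of the two patched functions:

```python
def reconcile_reasoning_content(messages, provider_enforces_echo):
    """Return a copy of the history that is valid for the active provider."""
    out = []
    for msg in messages:
        msg = dict(msg)  # never mutate the shared history
        if msg.get("role") == "assistant":
            if provider_enforces_echo:
                # Reasoning providers want the field on tool-call turns;
                # pad with a single space when nothing is recorded.
                if msg.get("tool_calls") and not msg.get("reasoning_content"):
                    msg["reasoning_content"] = " "
            else:
                # Strict providers reject the field whatever its value
                # (empty, space-pad, or real content), so drop it entirely.
                msg.pop("reasoning_content", None)
        out.append(msg)
    return out
```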
* test: update reasoning-replay tests for strict-provider stripping
test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was
implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning
provider so the precedence it checks is observable. Add a positive
strict-provider test asserting reasoning_content is stripped on replay.
Addresses reviewer feedback on #13377:
1. Restore all stripped docstrings (_load_config, _is_breaker_open,
sync_turn, register, _get_client, _read_filters, _write_filters,
_unwrap_results, save_config) and section dividers
2. Revert api_key to required:true in schema — self-hosted Mem0 also
requires auth by default; validation in _get_client() handles the
either/or logic separately from the schema
3. Confirm secret:true remains on api_key (already correct)
The mem0 plugin previously hardcoded api.mem0.ai as the endpoint.
This adds a `host` config key and MEM0_HOST env var so users can
point the plugin at a self-hosted Mem0 instance.
Changes:
- _load_config(): read MEM0_HOST env var
- is_available(): accept host OR api_key (self-hosted may not need a real key)
- get_config_schema(): add host field
- initialize(): read host from config
- _get_client(): pass host kwarg to MemoryClient when set
- system_prompt_block(): show target (cloud vs URL)
- README: document self-hosted setup
The PID-reuse guard (#43846) reads /proc/<pid>/stat field 22, which only
exists on Linux — on macOS/Windows it returned None and the guard silently
degraded to a bare liveness check (a no-op, safety-wise). Add a
psutil.create_time() fallback (psutil is a hard dep, cross-platform),
quantized to centiseconds for stable equality, so the recycled-PID guard
actually protects macOS/Windows too. /proc always wins first on Linux and
always misses on macOS/Windows, so the two sources never mix on one host and
same-source equality is all the guard needs.
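
Illustration of the quantization step only, assuming the fallback receives a float epoch timestamp (as psutil's create_time() returns) and rounds to the nearest centisecond. The helper name and the choice of rounding over truncation are assumptions:

```python
def quantize_start_time(create_time_seconds):
    """Quantize a float epoch start time to integer centiseconds."""
    return int(round(create_time_seconds * 100))
```

Storing an integer keeps the persisted value and the re-probed value comparable with plain equality instead of a float tolerance.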
The salvaged test spawned a listener subprocess that printed its port
immediately after bind() but BEFORE listen(), so under CI's loaded 8-worker
box the parent connected before the socket was listening -> ConnectionRefused
(flaked on test slice 2/6). Reorder the child to listen() then print the port,
and make the client connect with a short bounded retry to absorb scheduler
jitter. 15/15 green locally including direct hammering.
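
The two halves of the fix sketched in-process (the real test uses a subprocess; start_listener and connect_with_retry are illustrative names):

```python
import socket
import time


def start_listener():
    """Bind and listen BEFORE the port is announced to anyone."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    return server, server.getsockname()[1]


def connect_with_retry(host, port, attempts=20, delay=0.05):
    """Connect with a short bounded retry; attempts must be >= 1."""
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=1.0)
        except ConnectionRefusedError as exc:
            last_error = exc
            time.sleep(delay)  # absorb scheduler jitter, then try again
    raise last_error
```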
Follow-up to the salvaged #43846 commits: the WhatsApp adapter moved from
gateway/platforms/whatsapp.py to plugins/platforms/whatsapp/adapter.py since the
PR was authored. The cherry-pick brought _listener_pids_on_port's `re.finditer`
ss-fallback and the new test's import, but the new module location doesn't import
`re` (latent NameError on the lsof-absent fallback path) and the test imported the
old module path. Add `import re` to the adapter and repoint the test import.
This is the bug that was actually closing Firefox. `_kill_port_process`, run on
every bridge (re)start to free the port, used `lsof -ti :PORT` / `fuser PORT/tcp`
— both of which match a process whose socket merely *involves* that port number
in ANY state, including ESTABLISHED client connections. It then SIGTERMed every
match.
The bridge defaults to port 3000 — a ubiquitous local dev-server port. With a
browser tab open on localhost:3000, `lsof -ti :3000` returned Firefox's PID, so
each restart of the (crash-looping) WhatsApp bridge SIGTERMed Firefox, closing
the whole browser at irregular intervals with no crash and no coredump.
Proven live with the kernel `signal:signal_generate` tracepoint:
    hermes-gateway(3396516) -> sig=15 (code=0/SI_USER) -> comm=firefox pid=3371585
captured immediately after a gateway start, while Firefox held a socket on the
bridge port. Demonstrated over-match: `lsof -ti :8080` returns the listener AND
the gateway's own client connection; `lsof -ti tcp:8080 -sTCP:LISTEN` returns
only the listener.
Fix: `_listener_pids_on_port` resolves only LISTEN-state sockets
(`lsof -ti tcp:PORT -sTCP:LISTEN`, with an `ss -ltnp` fallback) and
`_kill_port_process` signals just those. A client whose connection happens to
involve the port number is never touched — which is also more correct, since a
client never blocks the new bridge from binding. Windows already filtered
LISTENING; the broad `fuser -k` path is removed.
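
A sketch of the `ss -ltnp` fallback parse, assuming the usual column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process). listener_pids_from_ss is a hypothetical helper that takes already-captured output, not the adapter's real function:

```python
import re


def listener_pids_from_ss(ss_output, port):
    """Return the PIDs of LISTEN-state sockets bound to the given port."""
    pids = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # header line or a non-listening socket
        # Compare the local port exactly so 3000 never matches 30001.
        if fields[3].rsplit(":", 1)[-1] != str(port):
            continue
        for match in re.finditer(r"pid=(\d+)", line):
            pids.add(int(match.group(1)))
    return pids
```

Only the local address column is inspected, so a client whose peer address happens to use the port is never returned.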
Adds TestKillPortProcess: real-socket tests proving a separate client process
is excluded from the listener lookup and survives port cleanup. 9 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`_kill_stale_bridge_by_pidfile` SIGTERMed the PID recorded in `bridge.pid`
after only a bare liveness check. Once the bridge exits and is reaped the
kernel recycles that PID onto an unrelated process; because the WhatsApp bridge
crash-loops ("Bridge process died (exit code 1)" repeating), this cleanup ran
on every restart and could SIGTERM a recycled PID that had landed on the user's
browser — closing Firefox at irregular intervals with no crash and no coredump
(a clean kill of a stranger).
Same PID-recycling class as the MCP reaper (7bd1f8a2d) and the process-registry
host-PID guard (e6a99cef2); this was the third, and most actively-fired, path.
Fix: `_write_bridge_pidfile` now also records the leader's kernel start time
(line 2). `_kill_stale_bridge_by_pidfile` re-validates identity via
`_bridge_pid_is_ours` before signalling — the (pid, start time) pair must match,
or for legacy single-line pidfiles the live cmdline must name `node` + this
session's unique path. A recycled PID (different start time / cmdline) is logged
and skipped, never signalled. Legacy pidfiles stay readable.
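
Sketch of the two-line pidfile and the identity decision. It assumes the start time is stored as an integer on line 2 and that the caller supplies the live process's start time and cmdline; both helper names are hypothetical:

```python
def parse_bridge_pidfile(text):
    """Return (pid, start_time); start_time is None for legacy one-line files."""
    lines = text.split()
    pid = int(lines[0])
    start_time = int(lines[1]) if len(lines) > 1 else None
    return pid, start_time


def bridge_pid_is_ours(recorded_start, live_start, live_cmdline, session_path):
    """Decide whether the live process is still the bridge we launched."""
    if recorded_start is not None:
        # New-style pidfile: the (pid, start time) pair must match exactly.
        return live_start is not None and live_start == recorded_start
    # Legacy pidfile: the cmdline must name node and this session's path.
    joined = " ".join(live_cmdline or [])
    return "node" in joined and session_path in joined
```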
Adds TestWhatsappBridgePidfile: real-process tests proving a genuine bridge is
reaped while a recycled PID (start-time mismatch, or non-bridge cmdline) is
spared. 7 new + 108 gateway/registry tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The background-process registry signalled host PIDs (recovery adoption,
detached-session kill, tree-kill) using a number captured at spawn, guarded
only by a bare liveness check. Once a session's process exits and is reaped the
kernel recycles that PID onto an unrelated process, so an alive-but-different
PID passed the check and got tree-killed.
Observed in the wild: a recycled background-session PID landed on Firefox's
session leader; a later kill/refresh walked its process tree and SIGTERMed
every tab — Firefox "closing" at irregular intervals with no crash/coredump.
This is the same PID/PGID-recycling class fixed for the MCP orphan reaper in
7bd1f8a2d, but the process_registry subsystem was never guarded — so the bug
persisted.
Fix: record each host process's kernel start time (/proc/<pid>/stat field 22)
at spawn, persist it in the checkpoint, and re-validate it before every signal
via `_host_pid_is_ours`. A PID whose start time no longer matches — or that is
gone — is never signalled:
- recover_from_checkpoint: a recycled PID is not adopted as a session.
- _refresh_detached_session: a recycled detached PID is marked exited.
- kill_process / _terminate_host_pid: refuse to tree-kill a stranger.
Legacy checkpoints and platforms without /proc (no baseline) degrade to the
prior best-effort liveness behaviour, so nothing else changes.
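
For reference, a sketch of reading field 22 from a /proc/<pid>/stat line and of the guard's decision table. The function names and the explicit `alive` argument are illustrative, not the registry's real signatures:

```python
def parse_stat_start_time(stat_line):
    """Return field 22 (starttime, in clock ticks) of a /proc/<pid>/stat line."""
    # comm (field 2) may contain spaces and parentheses, so split after the
    # LAST ')' instead of splitting the whole line on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[19])  # rest[0] is field 3, so field 22 is rest[19]


def host_pid_is_ours(recorded_start, live_start, alive):
    """True only when it is safe to signal the PID."""
    if not alive:
        return False  # gone: nothing to signal
    if recorded_start is None or live_start is None:
        return True  # no baseline: degrade to the liveness-only behaviour
    return recorded_start == live_start  # recycled PID -> mismatch -> refuse
```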
Adds TestPidReuseGuard: real-process tests proving a mismatched start time
refuses termination while a matching one still kills, plus recovery/refresh
recycling paths. 74 registry + 22 MCP-stability tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kanban-worker and kanban-orchestrator bundled skills existed only to
be force-loaded into dispatcher-spawned workers, gated by
environments:[kanban] so they wouldn't leak into normal CLI listings.
That gating was fragile (the leak that #50443 patched) and the
--skills auto-load was already best-effort — most workers ran without it
because the bundled skill isn't present in profile-scoped skills dirs.
Remove the skills entirely and promote their load-bearing content
(workspace kinds, deliverable artifacts, created-card integrity, profile
discovery) into KANBAN_GUIDANCE, which is already injected into every
kanban worker's system prompt. Net result: every worker reliably gets
the guidance, nothing can leak into a CLI/blank-slate session, and the
gating machinery is gone.
- agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE
- hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe
- hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card
- hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text
- delete skills/devops/kanban-{worker,orchestrator}
- docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references
- tests: update spawn-argv expectations; re-bound the guidance-size guard
Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).
On a Linux source install the in-app updater ran the full backend update +
desktop rebuild successfully but never restarted the app — it hung forever on
the applying overlay with no close button. Two causes:
- applyUpdatesPosixInApp() only handled the macOS .app bundle swap;
runningAppBundle() is null off macOS, so Linux fell through to
{ ok: true, backendUpdated: true } without ever relaunching.
- The renderer store had no terminal state for that result shape, so
$updateApply stayed { applying: true } and the overlay's close button
(hidden while applying) never appeared.
Fix (new electron/update-relaunch.cjs, pure + unit-tested):
- Decide the Linux outcome from whether the *running* binary is the one we
just rebuilt (execPath under release/<plat>-unpacked, path-segment-aware so
linux-unpacked-evil can't masquerade) and whether its chrome-sandbox helper
is launchable (root:root + setuid, or an --no-sandbox / ELECTRON_DISABLE_SANDBOX
opt-out):
    - relaunch: a detached watcher waits for this PID to exit (graceful, then
      SIGKILL), self-deletes, and re-execs the rebuilt binary with the original
      launch context (filtered args + HERMES_*/sandbox env + cwd) restored.
    - guiSkew: AppImage/.deb/.rpm/dev, where the backend updated but this GUI
      package was NOT changed; surface an honest closeable 'reinstall the
      desktop app' terminal state instead of lying that it loads next launch
      (#37541 skew).
    - manual: rebuilt binary but sandbox helper not launchable; keep the
      working window, don't quit into a dead app.
- store/updates.ts lands a terminal, closeable state for EVERY resolved apply
outcome (handedOff / guiSkew / manualRestart / updated-not-relaunched / error)
so the hang is impossible regardless of platform or result.
- New DesktopUpdateStage values (update/rebuild/done/guiSkew) + GuiSkewView so
progress reads correctly and the skew state is closeable. i18n in all four
locales (en/ja/zh/zh-hant) in parity.
- electron/update-relaunch.test.cjs (16 tests) + store outcome tests.
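
The path-segment-aware check, sketched in Python for illustration (the shipped code is the .cjs module above; is_rebuilt_binary and its parameters are hypothetical). Comparing whole path segments rather than string prefixes is what keeps linux-unpacked-evil from matching:

```python
from pathlib import PurePosixPath


def is_rebuilt_binary(exec_path, release_dir, platform_dir="linux-unpacked"):
    """True when exec_path sits under release_dir/<platform_dir>/."""
    try:
        relative = PurePosixPath(exec_path).relative_to(PurePosixPath(release_dir))
    except ValueError:
        return False  # not under the release directory at all
    # The first segment must equal the platform dir exactly, not start with it.
    return len(relative.parts) >= 2 and relative.parts[0] == platform_dir
```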
Salvaged from #45205 onto current main. Linux quit dwell uses the shared
UPDATE_HANDOFF_DWELL_MS (2.5s) from #50448 for consistency. Four-locale i18n
parity, AUTHOR_MAP entry, and the test wiring added on top.
Closes #45205.
* fix(desktop): filter undefined entries in AttachmentList to prevent refText crash on session switch
When switching sessions, the attachments array can contain stale/undefined
entries from the previous session's state. Accessing attachment.refText on
an undefined entry throws TypeError, breaking session switching entirely.
Fix: add .filter(Boolean) before .map() to skip undefined/null entries.
Fixes #49614
* fix(desktop): update I18nConfigClient usage in attachment test
The i18n config API changed from getLocale/saveLocale to
getConfig/saveConfig. Update the test fixture to match.
CI on the salvage caught two issues the stale PR base masked:
1. The model-setup flows were extracted from main.py into
hermes_cli/model_setup_flows.py after @pmos69 forked. The cherry-pick
re-introduced a stale _model_flow_custom into main.py (duplicating the
one main.py now imports) and put _model_flow_google_antigravity there too.
Move the antigravity flow into model_setup_flows.py alongside its siblings
and drop the stale _model_flow_custom dup. Fixes the getpass/stdin OSError
in tests/cli/test_cli_provider_resolution.py.
2. google-antigravity re-exposes Claude/Gemini/GPT-OSS models, so its catalog
was hijacking bare short aliases (`sonnet` -> google-antigravity instead of
anthropic) in detect_static_provider_for_model via dict insertion order.
Add _BORROWED_MODEL_PROVIDERS and defer those providers to a last-resort
pass so a model's native vendor always wins alias/direct-catalog detection.
Fixes tests/hermes_cli/test_models.py::test_short_alias_resolves_to_static_model.
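
Sketch of the deferral in item 2, assuming catalogs is a mapping of provider name to the set of model aliases it serves. detect_provider is a simplified stand-in for detect_static_provider_for_model:

```python
BORROWED_MODEL_PROVIDERS = {"google-antigravity"}


def detect_provider(alias, catalogs):
    """Resolve an alias, letting a model's native vendor win over re-exposers."""
    native = [p for p in catalogs if p not in BORROWED_MODEL_PROVIDERS]
    borrowed = [p for p in catalogs if p in BORROWED_MODEL_PROVIDERS]
    # Native providers first, borrowed ones only as a last-resort pass,
    # so dict insertion order can no longer hijack a short alias.
    for provider in native + borrowed:
        if alias in catalogs[provider]:
            return provider
    return None
```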
The salvaged PR wired auth.py / providers.py / runtime_provider.py for
google-antigravity but never registered a ProviderProfile, so the provider
was invisible to list_providers() / the model picker / alias resolution.
Register it in the gemini model-provider plugin (alongside gemini and
google-gemini-cli) with the antigravity-pa:// scheme and aliases. Also add
@pmos69 to release.py AUTHOR_MAP (CI gate).
Salvage follow-up on top of @pmos69's #29474. The PR resolved the
Antigravity OAuth client purely by discovering it from an installed `agy`
binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without
agy installed hit a hard 'client ID not available' error.
Antigravity's desktop OAuth client is a public, non-confidential installed-app
client (PKCE provides the security), baked into every copy of the Antigravity
CLI — same posture as the gemini-cli credentials Hermes already ships in
google_oauth.py. Bake it in as the final fallback (env -> discovery -> public
default) and add the public default Code Assist project as the discovery
fallback, matching the reference Antigravity flow. Now consumers can
authenticate directly without agy installed.
Pin the contract that ``_apply_env_overrides`` consults ``is_connected``
before the install-triggering ``check_fn``: an unconfigured platform is
skipped without calling ``check_fn`` (no lazy install), while a configured
platform still has ``check_fn`` run and is auto-enabled. The first assertion
fails on the pre-fix unconditional sweep.
For adapter plugins, ``PlatformEntry.check_fn`` doubles as a lazy installer:
calling it pip-installs the platform SDK as a side effect (see e.g.
``plugins/platforms/discord/adapter.py::check_discord_requirements``). The
enablement sweep in ``_apply_env_overrides`` called ``check_fn`` for every
registered plugin platform unconditionally, so a single
``load_gateway_config()`` — which the desktop/dashboard readiness probe
``GET /api/status`` awaits synchronously — pip-installed Discord, Telegram,
Slack, Feishu and Dingtalk even when the user configured none of them
(``platforms: none``). On a slow or restricted network the installs ran long
enough to block the event loop past the desktop's readiness timeouts, so the
app timed out, killed and re-spawned the backend, and boot-looped (stuck at
94%).
Consult the cheap ``is_connected`` credential check FIRST and only run the
install-triggering ``check_fn`` for platforms that are already enabled or
actually configured. Auto-enable-by-credentials is unchanged: a platform with
its token set still gets its SDK installed and enabled.
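
The ordering in sketch form. The dataclass below is a simplified stand-in for PlatformEntry and platforms_to_enable is a hypothetical name for the sweep:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlatformEntry:
    name: str
    is_connected: Callable[[], bool]  # cheap credential check
    check_fn: Callable[[], bool]      # may pip-install the SDK as a side effect


def platforms_to_enable(entries, already_enabled=()):
    enabled = []
    for entry in entries:
        if entry.name not in already_enabled and not entry.is_connected():
            continue  # unconfigured: never reach the install-triggering check
        if entry.check_fn():
            enabled.append(entry.name)
    return enabled
```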
The pop-out position is a bottom-right corner inset; the old clamp only floored
it and capped each inset by a flat constant, so dragging left/up (or restoring a
position saved on a larger/other monitor) could push the box's width/height past
the left/top edges and strand it off-screen — unrecoverable since the bad spot
persisted to localStorage.
Now the clamp bounds the WHOLE box (accounting for its measured width/height plus
an edge margin) on all four sides. Applied on drag (measured size), on load
(clamped in readPosition), and via a mount + window-resize reclamp so a shrunk
window or stale persisted value always pulls the box back into view.
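
The clamp arithmetic, sketched in Python for illustration (the component itself is not Python; the names and the 8px default margin are assumptions). Each inset is bounded below by the margin and above by the viewport size minus the box size minus the margin:

```python
def clamp_inset(inset, box_size, viewport_size, margin=8):
    """Clamp one bottom/right inset so the whole box stays on screen."""
    # Largest inset that still leaves the far edge of the box inside the view.
    max_inset = max(margin, viewport_size - box_size - margin)
    return min(max(inset, margin), max_inset)


def clamp_position(right, bottom, box_w, box_h, view_w, view_h, margin=8):
    return (
        clamp_inset(right, box_w, view_w, margin),
        clamp_inset(bottom, box_h, view_h, margin),
    )
```

A position saved on a larger monitor (right inset 2000 on a 1280px-wide window, 300px box) comes back as 972, which puts the box's left edge exactly one margin from the left side.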
Follow-up to #50238/#50381. The restart-loop is now SAFE (marker + launch
gate), but the trigger that lured users into relaunching mid-update remained:
on the in-app update hand-off the desktop window vanished almost immediately
(app.quit() 600ms after spawning the detached updater), before the updater's
own window appeared — a blank-screen gap that looks like a crash.
- Linger on the update overlay for UPDATE_HANDOFF_DWELL_MS (2.5s, was 600ms)
before quitting, on BOTH hand-off paths (in-app update + Windows bootstrap
recovery), so the message lands and bridges to the updater window.
- Strengthen the restart-stage copy and the overlay's applyingBody/applyingClose
to explicitly tell the user the window will reopen automatically and NOT to
reopen Hermes themselves while it updates. All four locales (en/ja/zh/zh-hant)
updated in parity.
Pure UX; does not touch the #50381 marker/gate mutual-exclusion safety net.
The browser orphan reaper reads a daemon PID from a `.pid` file in a
world-writable, predictably-named temp dir (`/tmp/agent-browser-h_*`) it
does not write itself, then tree-kills that PID via `_terminate_host_pid`
after only a liveness check. A same-user actor could plant a fake socket
dir whose `.pid` points at an arbitrary victim process, and OS PID reuse
after the real daemon exits could land the recorded PID on an unrelated
process — either way an arbitrary same-user process (and its whole tree)
gets SIGTERMed. Local DoS.
Add `_verify_reapable_browser_daemon()`, gated before the kill: via psutil
(a hard dep, fine cross-platform for the same-user processes the reaper can
signal) require both (1) identity — `agent-browser` in the process
name/cmdline — and (2) binding — the live process references *this* session's
socket dir in its cmdline or `AGENT_BROWSER_SOCKET_DIR`. The binding check is
the real spoof defense: a planted/recycled PID won't embed our exact socket
path. Fail-closed on any ambiguity (unreadable cmdline, no match), leaving the
process and its socket dir untouched for a later sweep.
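
The two checks as a pure function, assuming the caller has already read the process name, cmdline, and environment (via psutil in the real code). is_reapable_browser_daemon is an illustrative name and simplifies the actual helper:

```python
def is_reapable_browser_daemon(name, cmdline, environ, socket_dir):
    """True only when the live process is provably this session's daemon."""
    if not cmdline:
        return False  # unreadable cmdline: fail closed
    joined = " ".join(cmdline)
    # (1) identity: the process looks like agent-browser at all.
    identity = "agent-browser" in (name or "") or "agent-browser" in joined
    # (2) binding: it references THIS session's socket dir.
    env_dir = (environ or {}).get("AGENT_BROWSER_SOCKET_DIR")
    binding = socket_dir in joined or env_dir == socket_dir
    return identity and binding
```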
Builds on @sgaofen's fix in #14394 (cmdline identity check); rewritten to use
psutil instead of `/proc`+`ps` (cross-platform, Windows-covered) and to add
the session-socket-dir binding check for recycled-PID / spoof resistance.
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Follow-up to the salvaged #9560 fix:
- Replace the _TRAVERSAL_RE regex with an explicit _is_path_unsafe() helper
(drops the now-unused `import re`); catches a path separator ANYWHERE,
not just leading, so a non-leading Windows backslash can't slip through.
- Switch the per-entry skip in _ensure_loaded_locked from print() to
logger.warning to match the module's logging conventions.
- Add AUTHOR_MAP entry for the contributor.
- Add regression tests for the non-leading-separator case.
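
A sketch of what a regex-free helper along these lines could look like. This is an assumption about its shape, not the actual _is_path_unsafe body; it relies on the separator-anywhere rule to cover drive-letter paths such as C:/... and D:\...:

```python
def is_path_unsafe(value):
    """True when a session key/id could escape the sessions directory."""
    if ".." in value:
        return True  # parent-directory traversal
    # A separator ANYWHERE is rejected, not just a leading one.
    return "/" in value or "\\" in value
```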
Extends the CWE-22 path traversal guard to cover Windows absolute paths
of the form C:/... and D:\... — previously only leading / and \ were
checked, which missed drive-letter prefixes. Replaces the inline
startswith check with a compiled module-level regex (_TRAVERSAL_RE) that
covers all three attack patterns: .., leading /\, and leading X: drives.
Adds two regression tests for C:/windows/system32 and D:\\path\\to\\file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses PR #9560 review comments: applies the CWE-22 fix to current main
(post-PR #458 rebase) and adds the requested regression tests.
- SessionEntry.from_dict now raises ValueError for session_key or session_id
containing '..' or starting with '/' or '\' (directory traversal guard)
- SessionStore._ensure_loaded moves per-entry validation inside the loop so
one malicious/corrupt entry is skipped with a warning instead of aborting
the entire sessions.json load
- Adds TestSessionEntryFromDictTraversalValidation (5 cases) and
TestEnsureLoadedSkipsInvalidEntries covering the skip-not-abort behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Secret redaction only matched `Authorization: Bearer <token>`. Other auth
headers passed through verbatim into logs, tool output, and transcripts:
- `Authorization: Basic <base64>` — leaks base64(user:password)
- `Authorization: token <pat>` / any non-Bearer scheme
- `Proxy-Authorization: ...`
- `x-api-key: <key>` (Anthropic and many providers) and `api-key`,
`x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with
no known vendor prefix were caught by nothing
A logged request or an echoed `curl -H "x-api-key: ..."` command therefore
leaked live credentials.
Generalize the Authorization rule to mask the credential for any scheme (and
Proxy-Authorization) while preserving the header name and scheme word for
debuggability, and add an api-key header rule for the single-opaque-value
headers. Bearer behavior is unchanged; plain prose containing the word
"authorization" (no colon-delimited value) is left untouched.
Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key
headers, including inside a curl command.
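
Illustrative regexes for the two rules. The patterns, the header list, and the `***` mask are assumptions for this sketch, not the redactor's real rules:

```python
import re

# Any scheme on Authorization / Proxy-Authorization: keep name + scheme word.
_AUTH_RE = re.compile(
    r"""((?:proxy-)?authorization:\s*)([A-Za-z][\w-]*)([ \t]+)([^\s"']+)""",
    re.IGNORECASE,
)
# Single opaque value headers.
_API_KEY_RE = re.compile(
    r"""((?:x-api-key|api-key|x-goog-api-key|x-auth-token|x-access-token):\s*)([^\s"']+)""",
    re.IGNORECASE,
)


def redact_headers(text):
    """Mask credentials while preserving header names for debuggability."""
    text = _AUTH_RE.sub(r"\1\2\3***", text)
    return _API_KEY_RE.sub(r"\1***", text)
```

Both patterns require the colon after the header name, so prose that merely mentions "authorization" is left alone.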
Follow-up to the salvaged #25961 fix: regression tests asserting that
scope-bearing IPv6 addresses (fe80::1%eth0, ::1%lo) are blocked by
is_safe_url after the scope is stripped, that a still-unparseable address
fails closed, and that a scoped IPv4-mapped IMDS address is caught by the
always-blocked floor.
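
The behavior under test, sketched with the stdlib ipaddress module. is_blocked_address is a hypothetical simplification of the address half of is_safe_url, and the exact block list here is an assumption:

```python
import ipaddress

IMDS = ipaddress.ip_address("169.254.169.254")


def is_blocked_address(host):
    """True when the address must not be fetched; unparseable fails closed."""
    bare = host.strip("[]").split("%", 1)[0]  # drop the IPv6 scope (zone id)
    try:
        addr = ipaddress.ip_address(bare)
    except ValueError:
        return True  # still unparseable: fail closed
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return (addr == IMDS or addr.is_loopback
            or addr.is_link_local or addr.is_private)
```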