On macOS app.quit() closes windows but window-all-closed deliberately keeps
the process alive (Dock convention). Every detached hand-off (update swap,
relaunch, Windows bootstrap recovery, uninstall cleanup) waits for the
desktop PID to exit before replacing/removing the bundle — so the process
never dying means the script spins its full PID-wait and the user sees a
blank app, or an uninstall that appears to do nothing.
Add a module-level isQuittingForHandoff flag, set before every hand-off
app.quit(); window-all-closed then quits on all platforms when it's set.
Covers all five hand-off sites including the Linux relaunch path.
The desktop install step ran npm ci / npm run pack with no wall-clock cap, and
the sibling browser-tools / TUI / agent-browser dependency installs had the same
gap. The Electron binary (~150MB) is fetched from GitHub during the pack; on a
throttled or region-blocked link that download can *stall* rather than fail —
npm never errors and never exits, so the installer sits on "Build desktop app"
(step 9/11) indefinitely with only harmless 'npm warn deprecated' lines visible.
The existing self-heal escalation (cache purge -> dist restore -> npmmirror
fallback) only fires when pack returns non-zero, so a stall bypassed it.
- run_with_timeout (generalized from run_browser_install_with_timeout): GNU
timeout --foreground -k 10 (Ctrl+C-aware, #35166) / gtimeout for external
commands, else a pure-shell process-group watchdog so stock macOS (neither
binary present) is protected. Shell functions (_desktop_pack) always take the
pure-shell path — the timeout binary can't exec a function. Integer-normalized
budget + a boundary recheck so a command finishing in the final poll second
isn't mislabeled 124. The internal wait is guarded so set -e can't abort
mid-function before the real exit code is computed.
- Wrap the desktop npm ci/install (sharing ONE budget via a computed deadline so
a stall can't cost 2x DESKTOP_BUILD_TIMEOUT) + all three _desktop_pack attempts
(DESKTOP_BUILD_TIMEOUT, default 900s), and the browser-tools / TUI / agent-
browser registry installs (NODE_DEPS_TIMEOUT, default 600s).
A stall now converts to a bounded non-zero exit that feeds the existing mirror
self-heal instead of hanging the whole install.
* test(ci): raise per-file timeout 140s to 300s to stop false timeouts
The per-file parallel runner caps each test-file subprocess at a flat
wall-clock budget. Combined with per-test subprocess isolation (a fresh
Python process per test), a large-collection file pays N x (interpreter
startup + import) of overhead before any test logic runs. That overhead
dilates under load on shared CI runners, so a file that finishes in
~100s on a quiet box can blow the old 140s cap purely from scheduling
jitter, surfacing as a false 'no tests ran' timeout (rc=124) with zero
actual test failures.
Raise the default to 300s (5 min). The Docker build matrix jobs already
take 7-10 min, so this headroom costs nothing on total CI wall time
while still bounding a genuinely hung file.
* docs: add infographic for CI per-file timeout bump
Add two tests for the self-lock guard in _recover_from_interrupted_install:
one asserting it clears the marker and skips install when hermes.exe is a
process ancestor (breaking the #52378/#45542 loop), one asserting it falls
through to a normal recovery install when the shim is NOT an ancestor.
The guard's manual-recovery hint runs only inside the Windows branch, so
quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform
fallback hint at the end of the function is left POSIX-correct.
Map Icather in scripts/release.py AUTHOR_MAP for the salvage.
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.
Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Follow-up to the salvaged #37733 fix. The contributor centralized
redaction at _openai_error and the chat/responses failure paths, which
covers the OpenAI-compatible envelopes transitively. Two sibling classes
crossed the same authenticated HTTP boundary unredacted:
- 8x cron-management endpoints returning {"error": str(e)} on 500
- the session-chat SSE error event ({"message": str(exc)})
Route both through the same _redact_api_error_text(force=True) helper.
Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard
covering mask/force/limit/passthrough behavior.
Follow-up on #54032 for #35166:
- Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt
release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via
playwright_host_unrecognized(), instead of retrying on ANY install failure.
A network/disk/permission failure on a supported host now surfaces unchanged
rather than getting a mismatched-glibc build forced onto it.
- detect_os() now captures DISTRO_VERSION from os-release.
- Fold in the interruptibility fix (was PR #35304, self-closed): wrap the
download in 'timeout --foreground -k 10' (probed, with plain-timeout
fallback) so a terminal Ctrl+C reaches the child and a wedged download is
force-killed after the deadline.
- Add behavioral tests that source the helpers and assert the retry fires only
on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros,
native-success, operator-pinned override, or unsupported arch.
On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04,
Debian 14, and future distros), 'npx playwright install --with-deps chromium'
hangs uninterruptibly at 'Installing Playwright Chromium with system
dependencies' because Playwright's resolver maps the host to a platform with
no download build (#35166).
Wrap every installer Playwright call in run_playwright_install(), which tries
the native install first and, only if it fails or times out, retries once with
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build
(ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless
for unrecognized platforms (microsoft/playwright#33434).
Try-native-first (not a hardcoded distro/version table) is deliberate:
- Self-correcting — when Playwright already supports the host (e.g. Ubuntu
26.04 on Playwright >=1.61) the first attempt succeeds and the override is
never applied, so we never force a mismatched-glibc build onto a release
Playwright handles correctly (microsoft/playwright#35114).
- Zero-maintenance — new distro releases work the moment Playwright adds them.
- Covers Debian 14+ and future releases, not just Ubuntu 26.04.
An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied
to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback
build and skip the retry.
Refs #35166
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.
Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
the stub-creation point and stamp _content_filter_terminated on the stub
(single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
burning any continuation retries; roll partial content back to the last
clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
Plain network stalls are unaffected (only content_policy_blocked is tagged).
Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.
Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
The .integration.test.mjs greps bridge.js source text for the queue
wiring — a change-detector that breaks on any benign refactor of the
same code. The behavioral unit test (bridge.sendqueue.test.mjs) already
covers FIFO ordering, error isolation, timeout propagation, and
single-consumer concurrency, which is the contract that matters.
Concurrent sock.sendMessage() calls on a single Baileys socket can cause
the WhatsApp protocol-level routing to misdeliver messages — responses
intended for one chat appear in another.
Add a promise-based send queue that serialises all sendMessage() calls
across concurrent HTTP /send, /edit, and /send-media handlers so only
one send is in-flight at a time.
Includes unit tests for queue ordering, error isolation, timeout
propagation, and single-consumer concurrency semantics, plus an
integration check that the queue is wired into sendWithTimeout.
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.
Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.
- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
-k keeps its value/filters, '--' still works, positional path stays a root).
The plain-Linux overlay re-enable (#53185) left nativeOverlayWidth() at 0
for plain Linux, so the native min/max/close buttons painted on top of the
app's right-edge titlebar tools. Reserve the fallback width everywhere the
WCO overlay is painted (Windows, WSLg, plain Linux); macOS still reserves 0
since it uses traffic lights.
_restore_primary_runtime restored the construction-time api_key snapshot and
never consulted the credential pool. After the pool rotated away from a
revoked/exhausted entry mid-session, every new turn restored the dead key,
re-failed instantly, burned the remaining entries, and fell through to
cross-provider fallback.
After restoring the snapshot, re-select the pool's current best entry and
swap the live credential in via _swap_credential (which already rebuilds the
OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163
base_url / OAuth-detection fixes). Falls back to the snapshot key when the
pool is absent, empty, or the entry has no usable key.
Salvaged from #25206 onto current main: the original targeted the pre-refactor
monolithic method in run_agent.py; the logic now lives in
agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead
of re-inlining the client rebuild.
Fixes#25205
A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements
running `docker version` with a 5s timeout, momentarily timing out under load)
would return False for a single get_tool_definitions() call. Because file
tools delegate their check_fn to the terminal check, that one flake silently
stripped read_file/write_file/patch/search_files AND terminal from whatever
agent was being constructed at that instant — most visibly a delegate_task
subagent, which then reported "Tool read_file does not exist". This explains
both the intermittent (~80% success) user-session failures and the
deterministic cron failures in #21658 / #5304.
The existing _check_fn TTL cache made this worse: it cached the transient
False for the full 30s window, poisoning every subagent spawned in that span.
Fix: remember the last time each check_fn returned True; when a fresh probe
fails within a short grace window of that success, treat it as a flake —
serve the last-good True and do NOT cache the failure (so the next call
re-probes). A failure with no recent success, or past the grace window, is
honored normally so a backend that genuinely went down stops advertising its
tools. Probe failures now log at WARNING regardless of quiet mode, making the
previously-silent tool loss diagnosable in subagent (quiet) sessions.
Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>
A document attached alongside an image in the same Discord message was
swept into the vision pipeline and 400'd the whole turn ("Could not
process image"), and was simultaneously never surfaced to the agent as a
readable file. Restores the "any file type works" contract for mixed
messages and fixes the HTTP 400.
Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video
classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in
a PHOTO message landed in image_paths and poisoned the vision call. The
document context-note path was gated on message_type == DOCUMENT, so that
same doc never reached the agent at all. Now classification is
per-attachment (trust each attachment's own MIME; fall back to the
message-level type only when MIME is unknown), via shared _event_media_is_*
helpers used by both _build_media_placeholder and the main inbound loop.
The document note now fires for any non-image/audio/video attachment
regardless of message-level type.
Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic
400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now
transcodes those to PNG via Pillow before declaring media_type, skipping
cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow
can't rasterize it — so it's skipped rather than transcoded.
Closes#25935.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>
Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on
its own schedule, so by the time Hermes notices an expired token Claude
Code may have already rotated it. Re-read live credential sources first and
adopt a valid token rather than POSTing a possibly-stale refresh token.
Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle)
on top of the keychain/file reconciliation from PR #21112 (nodejun).
Adds AUTHOR_MAP entry for nodejun.
DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the
SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and
the value `claw migrate` emits. Previously _is_allowed_user did exact
ID matching only, so "*" matched no user and blocked every non-self
sender — a P1 with no workaround.
Three sites, all required for the fix to hold at runtime:
- _is_allowed_user: short-circuit when "*" is in the allowlist.
- connect(): exclude "*" from the intents.members trigger so the
wildcard does not request the privileged Server Members intent
(which can block the bot from coming online).
- _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands
in the username-resolution bucket, matches no member, and is silently
dropped from the set and env var on the first on_ready — quietly
undoing the fix.
Slash auth delegates to _is_allowed_user (auto-covered); component auth
already honors "*" on main.
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.
- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
docker/powershell/…); calls to those are flagged even when captured. Non-flashing
programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
_subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
* fix(windows): stop subprocess console-window popups + add CI guard
The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.
- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
calls that can create a new console — output-redirection-aware (capture/
redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
windows_detach_popen_kwargs(); marked intentionally-visible launches
(editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.
* test: accept creationflags kwarg in psutil_android fake_subprocess_run
The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).
Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.
Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.
Also adds the AUTHOR_MAP entry for the salvaged contributor.
Closes#23473
Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in
self._restart_task (a bare asyncio.create_task keeps only a weak reference,
so a still-pending task can be GC'd mid-flight) and explicitly skips it in
the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for
the contributor and a regression test that fails when the task is cancellable.
Refs #12875
When a Hermes-managed node/npm/npx shim exists but fails --version, redownload
the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall
back to system npm on PATH when a managed tree is present.
POSIX heal probes node, npm, and npx (npm can break while node still runs).
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.
Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.
Fixes#35075
Co-authored-by: rlaope <piyrw9754@gmail.com>
The salvaged #51875 added a background-review write guard in skill_manage
that refuses mutations to skills.external_dirs skills — but it only fires
when is_background_review() is true. The curator's LLM review fork ran with
the default _memory_write_origin='assistant_tool', so the guard never
triggered during the exact curation pass it exists to protect against
(GH-47688).
- Set _memory_write_origin='background_review' on the curator review fork so
turn_context binds it onto the write-origin ContextVar and the guard fires.
- Add a regression test asserting the fork runs under the background_review
origin (the invariant linking the fork to the guard).
- AUTHOR_MAP: map yu-xin-c for the salvaged commit.