hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
teknium1	86ec979f66	chore(release): map PRATHAMESH75 author for #37550 salvage	2026-06-28 02:05:50 -07:00
Coy Geek	d7a1052424	fix(env-passthrough): fail closed when provider blocklist import fails When tools.environments.local can't be imported (partial install, import-time error), _is_hermes_provider_credential() returned False — fail-open. A skill could then register a Hermes provider credential (ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets passthrough vars bypass the secret-substring net (rule 1), so the operator's real key would land in the execute_code child. Reopens the GHSA-rhgp-j443-p4rf bypass. Fail closed instead: on import failure, treat the name as a protected provider credential and refuse passthrough. Regression test exercises the full register -> scrub path under a simulated import failure. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-28 02:05:43 -07:00
teknium1	58c36b1798	fix(api-server): widen error redaction to cron-endpoint + SSE sites Follow-up to the salvaged #37733 fix. The contributor centralized redaction at _openai_error and the chat/responses failure paths, which covers the OpenAI-compatible envelopes transitively. Two sibling classes crossed the same authenticated HTTP boundary unredacted: - 8x cron-management endpoints returning {"error": str(e)} on 500 - the session-chat SSE error event ({"message": str(exc)}) Route both through the same _redact_api_error_text(force=True) helper. Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard covering mask/force/limit/passthrough behavior.	2026-06-28 02:05:38 -07:00
teknium1	c0b4a3438a	fix(install): scope Playwright override to too-new apt releases + keep step interruptible Follow-up on #54032 for #35166: - Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via playwright_host_unrecognized(), instead of retrying on ANY install failure. A network/disk/permission failure on a supported host now surfaces unchanged rather than getting a mismatched-glibc build forced onto it. - detect_os() now captures DISTRO_VERSION from os-release. - Fold in the interruptibility fix (was PR #35304, self-closed): wrap the download in 'timeout --foreground -k 10' (probed, with plain-timeout fallback) so a terminal Ctrl+C reaches the child and a wedged download is force-killed after the deadline. - Add behavioral tests that source the helpers and assert the retry fires only on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros, native-success, operator-pinned override, or unsupported arch.	2026-06-28 02:05:18 -07:00
kshitijk4poor	a28fe788a6	fix(install): retry Playwright install with platform override on unrecognized host (#35166) On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04, Debian 14, and future distros), 'npx playwright install --with-deps chromium' hangs uninterruptibly at 'Installing Playwright Chromium with system dependencies' because Playwright's resolver maps the host to a platform with no download build (#35166). Wrap every installer Playwright call in run_playwright_install(), which tries the native install first and, only if it fails or times out, retries once with PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build (ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless for unrecognized platforms (microsoft/playwright#33434). Try-native-first (not a hardcoded distro/version table) is deliberate: - Self-correcting — when Playwright already supports the host (e.g. Ubuntu 26.04 on Playwright >=1.61) the first attempt succeeds and the override is never applied, so we never force a mismatched-glibc build onto a release Playwright handles correctly (microsoft/playwright#35114). - Zero-maintenance — new distro releases work the moment Playwright adds them. - Covers Debian 14+ and future releases, not just Ubuntu 26.04. An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback build and skip the retry. Refs #35166	2026-06-28 02:05:18 -07:00
teknium1	578e3989d4	fix(agent): route content-filter stream stalls to fallback chain (#32421) When a provider's output-layer safety filter (MiniMax "output new_sensitive (1027)", Azure content_filter, etc.) kills a streaming response after deltas were already sent, interruptible_streaming_api_call swallows the raw error into a finish_reason=length partial-stream stub. The conversation loop then burned 3 continuation retries against the SAME primary — re-hitting the content-deterministic filter every time — and gave up with "Response remained truncated after 3 continuation attempts", never consulting fallback_providers. Builds on @595650661's classifier change (cherry-picked) so error_classifier recognizes the filter; then: - chat_completion_helpers: run the swallowed error through error_classifier at the stub-creation point and stamp _content_filter_terminated on the stub (single source of truth — no parallel pattern list). - conversation_loop: read the tag and activate the fallback chain BEFORE burning any continuation retries; roll partial content back to the last clean turn and re-issue against the new provider (restart_with_rebuilt_messages). Plain network stalls are unaffected (only content_policy_blocked is tagged). Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the same issue via the stub-tag and loop-escalation approaches respectively. Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback fires on the first stub and the fallback provider completes the turn.	2026-06-28 01:15:21 -07:00
teknium1	cb9f855c2b	test(whatsapp-bridge): drop structural send-queue integration test The .integration.test.mjs greps bridge.js source text for the queue wiring — a change-detector that breaks on any benign refactor of the same code. The behavioral unit test (bridge.sendqueue.test.mjs) already covers FIFO ordering, error isolation, timeout propagation, and single-consumer concurrency, which is the contract that matters.	2026-06-28 01:10:14 -07:00
Tranquil-Flow	c393a8e55f	fix(whatsapp-bridge): serialize sendMessage to prevent cross-chat contamination (#33360) Concurrent sock.sendMessage() calls on a single Baileys socket can cause the WhatsApp protocol-level routing to misdeliver messages — responses intended for one chat appear in another. Add a promise-based send queue that serialises all sendMessage() calls across concurrent HTTP /send, /edit, and /send-media handlers so only one send is in-flight at a time. Includes unit tests for queue ordering, error isolation, timeout propagation, and single-consumer concurrency semantics, plus an integration check that the queue is wired into sendWithTimeout.	2026-06-28 01:10:14 -07:00
teknium1	2e1b48ed31	chore: map kurlyk local email → skabartem for PR #32867 salvage	2026-06-28 01:08:04 -07:00
Teknium	2523917680	fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008) The parallel runner only forwarded pytest args after a literal '--', so a bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored out with 'unrecognized arguments'. This contradicted the docstring's promise that common pytest flags pass through, and forced a retry on every run that used pytest muscle-memory. Now any token starting with '-' that isn't one of the runner's own options (-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files, --include-integration) is routed to each per-file pytest invocation automatically. Value-taking flags given space-separated (-k expr, -m mark, -p plugin, -o name=val, etc.) keep their value instead of having it stolen by positional-path discovery. The explicit '--' separator still works and stacks with bare flags. - scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to pytest; value-flag lookahead; updated docstring. - scripts/run_tests.sh: usage comment reflects bare-flag passthrough. - tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs, -k keeps its value/filters, '--' still works, positional path stays a root).	2026-06-27 22:43:26 -07:00
Rafael Millan	54ea059919	fix: fall back to no-sandbox for desktop launch on restricted Linux hosts	2026-06-27 22:16:20 -07:00
teknium1	97640fd9ad	fix(desktop): reserve WCO width on plain Linux + author map The plain-Linux overlay re-enable (#53185) left nativeOverlayWidth() at 0 for plain Linux, so the native min/max/close buttons painted on top of the app's right-edge titlebar tools. Reserve the fallback width everywhere the WCO overlay is painted (Windows, WSLg, plain Linux); macOS still reserves 0 since it uses traffic lights.	2026-06-27 22:05:33 -07:00
teknium1	c72d68715f	chore(release): map salvaged contributor emails for #49129 and #51488	2026-06-27 21:23:25 -07:00
teknium1	2e7e600eaa	chore(release): map HexLab98 author for PR #53863 salvage	2026-06-27 21:22:49 -07:00
Jack Maloney	f0de4c6a47	fix(pool): re-select from credential pool on primary runtime restore _restore_primary_runtime restored the construction-time api_key snapshot and never consulted the credential pool. After the pool rotated away from a revoked/exhausted entry mid-session, every new turn restored the dead key, re-failed instantly, burned the remaining entries, and fell through to cross-provider fallback. After restoring the snapshot, re-select the pool's current best entry and swap the live credential in via _swap_credential (which already rebuilds the OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163 base_url / OAuth-detection fixes). Falls back to the snapshot key when the pool is absent, empty, or the entry has no usable key. Salvaged from #25206 onto current main: the original targeted the pre-refactor monolithic method in run_agent.py; the logic now lives in agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead of re-inlining the client rebuild. Fixes #25205	2026-06-27 20:04:45 -07:00
teknium1	926a1b915d	fix(tools): suppress transient check_fn flakes so subagents keep file/terminal tools A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements running `docker version` with a 5s timeout, momentarily timing out under load) would return False for a single get_tool_definitions() call. Because file tools delegate their check_fn to the terminal check, that one flake silently stripped read_file/write_file/patch/search_files AND terminal from whatever agent was being constructed at that instant — most visibly a delegate_task subagent, which then reported "Tool read_file does not exist". This explains both the intermittent (~80% success) user-session failures and the deterministic cron failures in #21658 / #5304. The existing _check_fn TTL cache made this worse: it cached the transient False for the full 30s window, poisoning every subagent spawned in that span. Fix: remember the last time each check_fn returned True; when a fresh probe fails within a short grace window of that success, treat it as a flake — serve the last-good True and do NOT cache the failure (so the next call re-probes). A failure with no recent success, or past the grace window, is honored normally so a backend that genuinely went down stops advertising its tools. Probe failures now log at WARNING regardless of quiet mode, making the previously-silent tool loss diagnosable in subagent (quiet) sessions. Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>	2026-06-27 19:29:00 -07:00
Shashwat Gokhe	505bc27d8d	fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats A document attached alongside an image in the same Discord message was swept into the vision pipeline and 400'd the whole turn ("Could not process image"), and was simultaneously never surfaced to the agent as a readable file. Restores the "any file type works" contract for mixed messages and fixes the HTTP 400. Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in a PHOTO message landed in image_paths and poisoned the vision call. The document context-note path was gated on message_type == DOCUMENT, so that same doc never reached the agent at all. Now classification is per-attachment (trust each attachment's own MIME; fall back to the message-level type only when MIME is unknown), via shared _event_media_is_* helpers used by both _build_media_placeholder and the main inbound loop. The document note now fires for any non-image/audio/video attachment regardless of message-level type. Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic 400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now transcodes those to PNG via Pillow before declaring media_type, skipping cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow can't rasterize it — so it's skipped rather than transcoded. Closes #25935. Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com> Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>	2026-06-27 19:26:04 -07:00
Chaz Dinkle	1dde7e2f2a	fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on its own schedule, so by the time Hermes notices an expired token Claude Code may have already rotated it. Re-read live credential sources first and adopt a valid token rather than POSTing a possibly-stale refresh token. Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle) on top of the keychain/file reconciliation from PR #21112 (nodejun). Adds AUTHOR_MAP entry for nodejun.	2026-06-27 19:14:43 -07:00
teknium1	6514be5a28	chore(release): add AUTHOR_MAP entry for linyubin (#50228 salvage)	2026-06-27 19:12:21 -07:00
bykim0119	851f75d4df	fix(discord): honor "*" wildcard in DISCORD_ALLOWED_USERS (#22334) DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and the value `claw migrate` emits. Previously _is_allowed_user did exact ID matching only, so "*" matched no user and blocked every non-self sender — a P1 with no workaround. Three sites, all required for the fix to hold at runtime: - _is_allowed_user: short-circuit when "*" is in the allowlist. - connect(): exclude "*" from the intents.members trigger so the wildcard does not request the privileged Server Members intent (which can block the bot from coming online). - _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands in the username-resolution bucket, matches no member, and is silently dropped from the set and env var on the first on_ready — quietly undoing the fix. Slash auth delegates to _is_allowed_user (auto-covered); component auth already honors "*" on main.	2026-06-27 19:11:30 -07:00
teknium1	ea8facee81	chore(release): add konsisumer to AUTHOR_MAP for PR #19608 salvage	2026-06-27 19:01:37 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/redirect/check_output exempt) and POSIX-only-program-aware (launchctl/systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Dale Nguyen	dbbf102b8e	fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env The Hermes gateway runs inside its own venv, so its process environment carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned subprocesses inheriting those markers. When the agent ran `uv sync`, `uv pip install`, `poetry install`, etc. in ANY other project directory, those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that project's dependencies into the Hermes venv path — wiping Hermes' own runtime deps (and, when the other project pinned a different Python, replacing the interpreter), bricking the gateway on the next restart (#23473). Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env` — via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays reachable because its bin dir is already first on PATH, so removing the active-environment markers is safe and only prevents the cross-project clobber. Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't reach the spawned subprocess) and unit coverage for both functions, plus a guard on the marker constant. Also adds the AUTHOR_MAP entry for the salvaged contributor. Closes #23473	2026-06-28 01:04:20 +05:30
teknium1	f2ca3e3d84	fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in self._restart_task (a bare asyncio.create_task keeps only a weak reference, so a still-pending task can be GC'd mid-flight) and explicitly skips it in the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for the contributor and a regression test that fails when the task is cancellable. Refs #12875	2026-06-27 03:57:31 -07:00
Teknium	d3db73210c	chore(release): map blaryx@gmail.com → Blaryxoff for PR #32602 salvage	2026-06-27 03:48:18 -07:00
teknium1	3cd4693494	chore: add DiamondEyesFox to AUTHOR_MAP for PR #53351 salvage	2026-06-27 03:04:26 -07:00
ethernet	dd0e4ab81a	change(ci): slice files in matrix job avoid duplicating work, avoid file discovery on each job	2026-06-26 19:15:18 -07:00
ethernet	1a75387fa8	change(ci): log json decode error in durations	2026-06-26 19:15:18 -07:00
ethernet	707ae6e623	change(tests): don't count with pytest collect it's way too slow. just grep files lol	2026-06-26 19:15:18 -07:00
ethernet	9a861cd0ab	change(tests): don't pass pytest args when counting tests	2026-06-26 19:15:18 -07:00
ethernet	fb1dd1bf91	change(ci): docker-publish.yml -> docker.yml	2026-06-26 19:15:18 -07:00
kshitijk4poor	5eb108f06c	chore: AUTHOR_MAP — yashiel@skyner.co.za → yashiels PR #53284 salvage (discord markdown table-to-bullet conversion; #21168)	2026-06-27 03:38:29 +05:30
xxxigm	65be0061e0	fix(hermes): heal broken managed Node tree instead of PATH fallback When a Hermes-managed node/npm/npx shim exists but fails --version, redownload the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall back to system npm on PATH when a managed tree is present. POSIX heal probes node, npm, and npx (npm can break while node still runs).	2026-06-26 20:10:20 +05:30
kshitijk4poor	05ba5f3962	chore: add Dr1985 to AUTHOR_MAP for launchd salvage (#42567)	2026-06-26 14:09:11 +05:30
teknium1	fbfccbb3ee	fix(security): align cron invisible-unicode set with install-time scanner The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17. The cron-local set was missing U+2062-U+2064 (invisible math operators) and U+2066-U+2069 (directional isolates), so a directive obfuscated with one of those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past the runtime cron gate while being caught at install time. Import the canonical set so the cron tripwire and install scanner can't drift apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged. Fixes #35075 Co-authored-by: rlaope <piyrw9754@gmail.com>	2026-06-26 01:11:11 -07:00
konsisumer	3cf900eb67	fix(install): discard managed lockfile churn before stashing	2026-06-25 23:49:11 -07:00
kshitijk4poor	fe255ab28b	chore: add Tranquil-Flow to AUTHOR_MAP for auxiliary base_url salvage (#52623)	2026-06-26 11:11:33 +05:30
teknium1	4d04c652f2	fix(curator): make external-skill write guard actually fire during curation The salvaged #51875 added a background-review write guard in skill_manage that refuses mutations to skills.external_dirs skills — but it only fires when is_background_review() is true. The curator's LLM review fork ran with the default _memory_write_origin='assistant_tool', so the guard never triggered during the exact curation pass it exists to protect against (GH-47688). - Set _memory_write_origin='background_review' on the curator review fork so turn_context binds it onto the write-origin ContextVar and the guard fires. - Add a regression test asserting the fork runs under the background_review origin (the invariant linking the fork to the guard). - AUTHOR_MAP: map yu-xin-c for the salvaged commit.	2026-06-25 22:03:02 -07:00
teknium1	eed9bbeb0a	chore(release): add rebel0789 to AUTHOR_MAP for salvaged PR #47308	2026-06-25 22:02:22 -07:00
teknium1	1abfa66ba6	chore(release): add DavidMetcalfe to AUTHOR_MAP for PR #52272 salvage	2026-06-25 19:00:48 -07:00
teknium1	e29823f1e8	chore(release): map agt-user noreply email for #48496 salvage	2026-06-25 18:50:11 -07:00
teknium1	6dfb8326f5	fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes Follow-up to the salvage of #45035 + #48682. The two PRs touched different functions (resolve_resume_session_id vs get_compression_tip) but #45035's descendant walk followed ANY parent_session_id child, so a delegate/subagent child could hijack the resume target. Apply the same _branched_from / _delegate_from / source!='tool' exclusion the rest of hermes_state.py uses, so the resume walk only follows genuine compression continuations. Also updates the unrealistic delegation test fixture to carry the real _delegate_from marker, and updates 3 list_sessions_rich test mocks for the order_by_last_active kwarg #48682 added. AUTHOR_MAP: map PINKIIILQWQ + ailang323 salvage authors.	2026-06-25 16:29:09 -07:00
teknium1	92b5987ca2	chore: add herbalizer404 + pyxl-dev to AUTHOR_MAP for auxiliary fallback salvage	2026-06-25 13:08:18 -07:00
kshitijk4poor	0654319644	chore(release): map srojk34 legacy prefix-less noreply in AUTHOR_MAP (#50098)	2026-06-25 12:56:05 -07:00
kshitij	d682f320b3	Merge pull request #52147 from NousResearch/salvage/29184-mcp-osv-nonblocking fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184)	2026-06-25 23:39:44 +05:30
qdaszx	6305ac0e4b	fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184) During stdio MCP server startup, _run_stdio (an async method) called the synchronous check_package_for_malware() inline. That makes a blocking urllib HTTPS POST to api.osv.dev whose own timeout doesn't reliably cover a stalled SSL handshake, so an intermittent network issue froze the entire asyncio event loop for up to ~120s — blowing past the TUI/gateway's 15s startup budget and showing "gateway startup timeout". Run the check via asyncio.to_thread (off the loop) AND bound it with asyncio.wait_for(timeout=_OSV_MALWARE_CHECK_TIMEOUT_S=12s). The malware check is fail-open, so on timeout we log and proceed rather than blocking startup. Salvaged from #29190 by @qdaszx (re-applied on current main — the call site moved since the PR was opened), combining the to_thread approach also proposed in #29192 by @ygd58. Two load-bearing tests: event-loop-not-blocked-during-check and timeout-fails-open — both mutation-verified to fail against the old inline blocking call. Closes #29184. Co-authored-by: ygd58 <buraysandro9@gmail.com>	2026-06-25 23:30:41 +05:30
Teknium	60a2feeebf	chore: add benbenlijie to AUTHOR_MAP for PR #47205 salvage	2026-06-25 00:17:17 -07:00
Brooklyn Nicholson	a5849917a8	test(pets): make slow pet generation suite opt-in The pet generation image-processing suite is deterministic but expensive enough to blow the per-file CI timeout on Linux (140s), and it is not relevant to the fast timeout PR's normal signal. Keep it available for manual validation, but do not run it by default. Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper now preserves that opt-in variable through its hermetic env.	2026-06-25 00:44:53 -05:00
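
Note on 6305ac0e4b: the commit message describes moving a blocking check off the event loop with asyncio.to_thread and bounding it with asyncio.wait_for, failing open on timeout. A minimal sketch of that pattern follows. It is not the repository's code: blocking_check, preflight, and the timeout values are hypothetical stand-ins chosen for illustration.

```python
import asyncio
import time


def blocking_check(delay_s: float) -> bool:
    """Hypothetical stand-in for a synchronous network probe that may stall."""
    time.sleep(delay_s)
    return True


async def preflight(delay_s: float, timeout_s: float) -> str:
    """Run the blocking probe in a worker thread and bound the wait.

    On timeout the caller proceeds (fail-open) instead of blocking startup.
    """
    try:
        ok = await asyncio.wait_for(
            asyncio.to_thread(blocking_check, delay_s),  # keeps the loop free
            timeout=timeout_s,
        )
    except asyncio.TimeoutError:
        # Fail-open: report the timeout and let startup continue.
        return "timeout-fail-open"
    return "clean" if ok else "flagged"


if __name__ == "__main__":
    print(asyncio.run(preflight(0.0, 1.0)))
    print(asyncio.run(preflight(0.3, 0.05)))
```

One caveat of this approach: the timeout stops the await, not the worker thread, which keeps running until the blocking call returns.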


1310 commits
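
Note on 926a1b915d: the commit message describes remembering the last successful check_fn result, serving it when a fresh probe fails within a short grace window, and not caching that failure. The sketch below illustrates the idea under stated assumptions. GraceCheck, its parameters, and the 60s/30s values are hypothetical and are not taken from the repository.

```python
import time
from typing import Callable, Dict, Tuple


class GraceCheck:
    """Hypothetical TTL cache that treats a failure shortly after a success as a flake."""

    def __init__(self, grace_s: float = 60.0, ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_s = grace_s
        self.ttl_s = ttl_s
        self.clock = clock
        self._cache: Dict[str, Tuple[float, bool]] = {}  # name -> (time, result)
        self._last_ok: Dict[str, float] = {}             # name -> last True time

    def check(self, name: str, probe: Callable[[], bool]) -> bool:
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self.ttl_s:
            return cached[1]
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        if ok:
            self._last_ok[name] = now
            self._cache[name] = (now, True)
            return True
        last = self._last_ok.get(name)
        if last is not None and now - last <= self.grace_s:
            # Likely a flake: serve last-good True and do not cache the
            # failure, so the next call probes again.
            self._cache.pop(name, None)
            return True
        # No recent success: honor the failure and cache it for the TTL.
        self._cache[name] = (now, False)
        return False
```

Passing the clock as a parameter keeps the grace-window logic testable without sleeping.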