The test set HERMES_YOLO_MODE=1 via monkeypatch.setenv, expecting
check_dangerous_command() to honor yolo and bypass cron_mode=deny. But
tools.approval._YOLO_MODE_FROZEN is intentionally frozen at module
import time (security: prevents prompt-injection runtime escalation).
When CI imports the module BEFORE the test sets the env, the frozen
value stays False and the yolo bypass never activates.
Local runs missed this because the conftest leaked a non-empty
HERMES_YOLO_MODE into the import-time env. CI's clean-env path exposed
the bug deterministically on test (3) / test (4) shards.
Fix: patch the module attribute directly via mock.patch.object so the
test simulates process-startup-with-yolo regardless of import order.
The behavior under test (yolo bypasses cron_mode=deny for non-hardline
commands) is unchanged; the security invariant (_YOLO_MODE_FROZEN can't
be set at runtime by skills) is preserved.
Reproduced locally with: env -i HOME=$HOME PATH=$PATH python3 -m pytest
tests/tools/test_cron_approval_mode.py -o 'addopts=' -v
Without the fix: 1 failed, 23 passed. With the fix: 24 passed.
* fix(transcription): reject symlinked audio inputs
Validation runs before provider selection, so rejecting symbolic-link paths there prevents supported-extension links from being treated as normal audio files. Use os.path.islink to avoid perturbing the existing Path.stat error path and to reject links before resolving targets.
Constraint: Keep validation platform-safe and avoid requiring symlink support where unavailable.
Rejected: Use Path.is_symlink | it consumes pathlib stat calls and broke the existing stat error regression.
Confidence: high
Scope-risk: narrow
Directive: Keep path hardening in _validate_audio_file before provider dispatch.
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases -q (5 passed)
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases tests/tools/test_transcription_tools.py::TestTranscribeAudioDispatch::test_invalid_file_short_circuits -q (6 passed)
Tested: source venv/bin/activate && python -m compileall tools/transcription_tools.py tests/tools/test_transcription_tools.py
Tested: git diff --check
Not-tested: Full tests/tools/test_transcription_tools.py under .[dev] only; existing faster_whisper optional dependency tests fail with ModuleNotFoundError.
* Keep transcription tests independent of optional whisper install
The transcription suite mocks faster-whisper directly, so a minimal test stub keeps the branch verifiable in environments where the optional package is not installed. This preserves the existing mock-based coverage without adding a dependency.
Constraint: faster-whisper is an optional local STT dependency and is absent from the current validation environment
Rejected: Install faster-whisper just for branch validation | would add heavyweight environment coupling outside the patch scope
Confidence: high
Scope-risk: narrow
Directive: Keep this as a test-only stub unless production import semantics change
Tested: pytest tests/tools/test_transcription_tools.py -q
---------
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* fix: reject read_file symlinks to blocking devices
The read_file guard already refused direct device paths such as /dev/zero, but a workspace symlink resolving to one of those devices could still reach the shell-backed read path and hang on wc/head/sed. Keep the literal alias check and add a resolved-path pass so local symlinks to blocked device/fd endpoints are rejected before I/O.
Constraint: Preserve literal /dev/stdin handling before terminal-specific realpath resolution
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py tests/tools/test_file_tools.py -q; python -m compileall tools/file_tools.py tests/tools/test_file_read_guards.py; git diff --check
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* Keep file guard tests off sensitive macOS temp paths
The branch now inherits a sensitive-path write guard from upstream main. On macOS, tempfile.mkdtemp() resolves under /private/var/folders, so the new write-path guard fired before the file read dedup assertions could exercise their intended behavior. The tests now create their scratch files inside the worktree temp checkout, outside those system-sensitive prefixes, without changing production behavior.
Constraint: Rebased branch must pass the expanded file read guard suite on macOS.
Rejected: Loosen the production sensitive-path prefix list | broader behavior change unrelated to this PR.
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py -q
---------
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
The read_file tool and terminal cat can access /proc/self/environ to
recover all process env vars including secrets stripped by the subprocess
blocklist. Output redaction partially mitigates (catches known-format
tokens) but misses custom/proprietary key formats, especially when
values are printed without their key names.
Add /proc/*/environ, /proc/*/cmdline, and /proc/*/maps to the blocked
device paths in _is_blocked_device():
- /proc/*/environ: leaks full process env (API keys, tokens)
- /proc/*/cmdline: leaks command-line args (may contain passwords)
- /proc/*/maps: leaks memory layout (ASLR bypass for exploitation)
Legitimate /proc reads (cpuinfo, meminfo, uptime, version) remain
accessible — the check only blocks per-pid pseudo-files with known
sensitive suffixes.
Complements PR #4432 (PID namespace isolation for child processes)
which prevents children from reading the parent's /proc, but does not
prevent the parent process itself from being read via file tools.
Partially addresses #4427
Changes:
tools/file_tools.py | +6
tests/tools/test_file_read_guards.py | +18 -1
Co-authored-by: dsr-restyn <dsr-restyn@users.noreply.github.com>
* fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern
- BUG-009 (CRITICAL): freeze HERMES_YOLO_MODE at module import via
_YOLO_MODE_FROZEN; prevents skills/prompt-injection from calling
os.environ["HERMES_YOLO_MODE"]="true" at runtime to bypass all checks
- BUG-002 (HIGH): replace substring "APPROVE" in answer with exact
answer == "APPROVE" in _smart_approve; prompt already requests exactly
one word, substring match was exploitable via verbose LLM responses
- BUG-001 (MEDIUM): add logger.warning for every dangerous command that
auto-approves in non-interactive non-gateway context; makes silent
approvals visible in audit logs without breaking script behavior
- BUG-008 (LOW): expand curl/wget pipe pattern to cover | /bin/bash and
| bash -c variants, not just | sh / | bash
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(approval): add missing is_truthy_value import + fix yolo test patches
_YOLO_MODE_FROZEN uses is_truthy_value() from utils — import was missing.
Tests that set HERMES_YOLO_MODE via monkeypatch.setenv() no longer work
because the value is frozen at import time; update them to patch the
module-level flag directly via monkeypatch.setattr().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
V4A patch '*** Update File:', '*** Add File:', '*** Delete File:' headers
come from patch CONTENT, not the explicit `path=` argument. That makes
them attacker-influenceable through skill content, web extract output,
prompt injection, and other surfaces the agent processes. Headers like
'*** Update File: ../../../etc/shadow' would resolve relative to the
agent's cwd; in deployment configurations where that cwd is deep enough
to land outside Hermes' protected paths, the write could land somewhere
the agent operator did not intend.
Reject any V4A header containing a '..' path component before applying
the patch. The explicit `path=` argument on patch_tool is UNCHANGED —
the agent legitimately uses '..' there (e.g. `patch path='../other_module/x.py'`
from a worktree dir is normal cross-module editing).
Regression tests: V4A Update header with traversal rejected, V4A Add
header with traversal rejected, patch_v4a never invoked when rejection
fires.
Salvaged from PR #29395 by @waefrebeorn. The original PR added
has_traversal_component as a blanket reject on read_file_tool,
write_file_tool, patch_tool's explicit path, and search_tool — that
would break legitimate agent operation where '..' is normal. Also
dropped the over-eager skills_guard pattern additions
(pickle.loads/marshal.loads/ctypes.CDLL/importlib at high/critical
severity would false-positive on legit data-science and FFI skills).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Expand _MEMORY_THREAT_PATTERNS from 13 to 24 regex patterns and align
_INVISIBLE_CHARS with skills_guard.py (10 → 17 characters).
Key changes:
- Add multi-word bypass prevention (?:\w+\s+)* to injection patterns
- Add missing injection patterns: role_pretend, leak_system_prompt,
remove_filters, fake_update, translate_execute, html_comment_injection,
hidden_div
- Add exfiltration patterns: send_to_url, context_exfil
- Add persistence patterns: agent_config_mod, hermes_config_mod
(both require modification-verb prefix to avoid false positives on
mere mentions of config filenames)
- Add hardcoded secret detection pattern
- Add role_hijack precision fix: require article after "now" to avoid
blocking "you are now ready/connected/set up" etc.
- Expand invisible unicode set with directional isolates (U+2066-2069)
and invisible math operators (U+2062-2064)
Test coverage expanded from ~8 to ~30 scan tests including dedicated
false-positive regression tests for all precision-sensitive patterns.
Known limitations (deferred to follow-up PRs):
- prompt_builder.py and cronjob_tools.py still use older pattern sets
- No semantic/LLM-based scanning (regex-only approach)
- No cross-entry or cross-store analysis
Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any
shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds,
SenseVoice, curl pipelines — become an STT backend with zero Python.
Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved
untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in this
PR.
Resolution order (mirrors TTS exactly):
1. Built-in (local, local_command, groq, openai, mistral, xai)
→ native handler. Always wins.
2. stt.providers.<name>: type: command → command-provider runner.
3. Plugin-registered TranscriptionProvider → plugin dispatch.
4. No match → 'No STT provider available'.
Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained;
added _resolve_command_stt_provider_config, _transcribe_command_stt,
and local helpers for template rendering, shell-quote context, and
process-tree termination. Helpers are documented as mirrors of their
tts_tool.py counterparts (kept local to avoid cross-tool private
import). Wire-in is one insertion point in transcribe_audio() after
the xai elif and before the plugin dispatcher. Plugin dispatcher
additionally defensively short-circuits when a same-name command
config exists (command-wins-over-plugin invariant).
- tests/tools/test_transcription_command_providers.py: 50 new tests
covering resolution (builtin precedence, type/command gating,
case-insensitive lookup, legacy stt.<name> back-compat), helpers
(timeout fallback, format validation, iter, has-any), template
rendering (shell-quote contexts, doubled-brace preservation),
end-to-end via _transcribe_command_stt (output_path read, stdout
fallback, timeout, nonzero exit envelope, model override,
language precedence), and dispatcher integration via the real
transcribe_audio() including command-wins-over-plugin and
builtin-shadow-rejection.
- tests/plugins/transcription/check_parity_vs_main.py: extended from
10 to 13 scenarios. New cases: command-provider-installed,
command-vs-plugin-same-name (verifies command wins precedence),
explicit-openai-with-command-shadow (verifies built-in wins).
Adds command_provider dispatch_kind detection via transcript prefix
(CMD: vs PLUGIN:) so command-provider scenarios can be distinguished
from plugin scenarios even when sharing a provider name.
- website/docs/user-guide/features/tts.md: new 'STT custom command
providers' section symmetric to the TTS section — example config,
placeholder grammar table (input_path / output_path / output_dir /
format / language / model), transcript-read-back semantics (file
first, then stdout fallback), optional keys table, behavior notes,
security note. Updated 'Python plugin providers (STT)' to include
the new 'When to pick which (STT)' decision table and updated
resolution-order section (now 4 layers instead of 3).
Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/
8623/8623 — zero regressions across 14,199 tests.
Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).
E2E live-verified in an isolated HERMES_HOME with a real .wav file:
command: → dispatched to stt.providers.my-fake-cli
plugin: → dispatched to registered TranscriptionProvider
command-wins-over-plugin: → command provider beats same-name plugin
builtin-wins-over-command: → built-in OpenAI handler fires;
stt.providers.openai: type: command
does NOT hijack it.
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed
because the test hardcoded known_inits = ('tini', 'dumb-init',
'catatonit'). The PR replaced tini with s6-overlay's /init (which execs
s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name,
so the substring scan against ENTRYPOINT missed it.
Two-part fix:
1. Extend the accepted token list to include 's6-overlay', 's6-svscan',
and '/init'. The contract these tests enforce is behavioural ('some
PID-1 init reaps SIGCHLD'), so the names list is purely a recognition
table and any reaper-capable family should qualify.
2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the
sibling check) against comment-only matches. It was scanning the full
Dockerfile text and only passed because the word 'tini' is still in
a historical comment explaining why we used to use it. The next
person to clean up that comment would have silently broken the test.
New _instruction_text() helper joins only the parsed, non-comment
Dockerfile instructions so stale comments can't satisfy the check.
(cherry picked from commit ffc1bb6393)
Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1
contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added
with #15012) hardcoded tini/dumb-init/catatonit as the only accepted
inits, so they failed after #30136 replaced tini with s6-overlay's
/init.
s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly
on SIGCHLD — same contract the test exists to enforce. Two updates:
* test_dockerfile_installs_an_init_for_zombie_reaping — accept
's6-overlay' as a known-installed marker (matches the
s6-overlay install layer in Ben's Dockerfile).
* test_dockerfile_entrypoint_routes_through_the_init — accept
'/init' as a known-routed marker (s6-overlay's PID-1 binary
lives at /init by convention).
Both assertions still fire if a future Dockerfile rewrite drops
the init entirely. Local: 7/7 pass.
Second migration of an existing built-in platform adapter after Discord
(PR #30591) — follows the same shape established by IRC / Teams / LINE /
Google Chat / SimpleX and the playbook in
`references/platform-plugin-migration.md`. Advances the umbrella refactor
in #3823.
Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/`
with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell,
`register(ctx)` entry point, **no back-compat shim** at the old import
path, and full parity for all five hooks Discord uses plus the
`apply_yaml_config_fn` hook (mattermost is the second consumer of #25443
after Discord):
* `standalone_sender_fn` — out-of-process cron delivery via Mattermost
REST API. Picks up the thread_id + media_files capabilities the
legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`).
* `setup_fn` — interactive `hermes setup gateway` wizard.
* `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys
(`require_mention`, `free_response_channels`, `allowed_channels`) into
`MATTERMOST_*` env vars (replaces the hardcoded block in
`gateway/config.py`).
* `is_connected` — declares connection state from `MATTERMOST_TOKEN` +
`MATTERMOST_URL`.
* `check_fn` — verifies aiohttp is installed and both required env vars
are set.
* plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`,
`max_message_length` (4000 — Mattermost practical limit), `emoji`,
`required_env`, `install_hint`.
Files
-----
* `gateway/platforms/mattermost.py` (873 LOC) →
`plugins/platforms/mattermost/adapter.py` (git rename, R071) +
appended `register()` block, hook helpers, and `_standalone_send`
with media upload + thread_id support.
* New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with
`requires_env` / `optional_env` declarations covering MATTERMOST_URL,
MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS,
MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE,
MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS,
MATTERMOST_ALLOWED_CHANNELS.
* `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge
(moved into plugin's `_apply_yaml_config`).
* `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` —
replaced by the existing generic plugin-registry-first dispatch.
* `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) +
`Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch
already routes plugin platforms through `_send_via_adapter`, which
hits the registry's `standalone_sender_fn`.
* `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced
by the plugin's `interactive_setup`.
* `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry
(3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in
`_configure_platform`.
* Consumer rewrite: 5 test files (test_mattermost.py,
test_media_download_retry.py, test_send_multiple_images.py,
test_stream_consumer.py, test_ws_auth_retry.py) get
`gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter`
with the bulk-rewrite recipe from the platform-plugin-migration playbook.
Single `mock.patch` string in test_stream_consumer.py also repointed.
* `tests/tools/test_send_message_missing_platforms.py`: thin
`(token, extra, chat_id, message)` compat shim around the plugin's
`_standalone_send(pconfig, …)` so existing test bodies continue to
work without rewriting every signature.
Validation
----------
* Plugin discovery: mattermost registers from `plugins/platforms/mattermost/`
alongside discord / teams / irc / line / google_chat / simplex.
All 9 hooks present (setup_fn, standalone_sender_fn,
apply_yaml_config_fn, is_connected, check_fn, allowed_users_env,
allow_all_env, cron_deliver_env_var, max_message_length=4000).
* Mattermost-touching tests: 62/62 pass
(`test_mattermost.py` + `test_send_message_missing_platforms.py`).
* Targeted selectors (mattermost or platform_registry or stream_consumer
or ws_auth_retry or media_download_retry or send_multiple_images or
send_message_tool or platform_connected): 433/433 pass.
* Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/
tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py
tests/integration/`): **6220/6220 pass in 47.8s, 0 failures**.
* Lint: ruff clean on all touched files.
* Git identity verified: kshitijk4poor.
* Rename detection: R071 (similarity dropped from a hypothetical R09x
by the ~320-line appended register block — ~36% growth over the
873-LoC base, vs Discord's 5101 LoC base which kept R091).
Closes part of #3823.
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.
Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated
Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring
tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.
Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
break test discovery; pre-existing unrelated mock warning)
The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.
The hook covers cases the shell-template grammar can't reasonably
express:
- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows
None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.
## Resolution order
The dispatcher's resolution order is the load-bearing invariant:
1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
→ command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
→ plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
the `hermes tools` row list defensively.
Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
`_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
config defensively so a refactor of the caller can't silently break
the invariant.
## New files
- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
`list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
`voice_compatible` (all optional with sane defaults). Mirrors
`agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
`agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
## Modified files
- `hermes_cli/plugins.py` — `register_tts_provider()` method on
`PluginContext`. Matches the gating shape of
`register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
`_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
plugin rows into the Text-to-Speech picker category alongside the
10 hardcoded built-in rows.
## Tests
- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
`tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
built-in-always-wins, command-wins-over-plugin, plugin dispatch,
exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
picker surface, builtin shadowing defense, integration with
`_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
parity harness vs `origin/main`. The only intentional diff is
`fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification
- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
`text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
## Docs
- `website/docs/user-guide/features/tts.md` — new "Python plugin
providers" section with a decision table (command-provider vs
plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
mention both surfaces (command-provider primary, plugin for
SDK/streaming).
Closes#30398
The gateway pairing directory (~/.hermes/pairing/) stores per-platform
access-control files (telegram-approved.json, discord-approved.json, etc.).
A prompt-injected agent using write_file could add arbitrary user IDs to an
approved file, granting persistent gateway access without going through the
pairing code flow — the same threat class that motivated protecting
webhook_subscriptions.json (#14157).
The pairing directory was not included in the original control-plane protection
because it postdates PR #14157. PR #30383 introduced the hashed-pending schema
and made the approved files the sole source of truth for gateway access, raising
the security sensitivity of the directory.
Apply the same mcp-tokens pattern: block writes to pairing/ and any path within
it, under both the active hermes_home and the root path (for profile-mode parity
with the fix in #30382).
Regression tests verify denial for pairing/telegram-approved.json,
pairing/discord-pending.json, and the directory itself, in both normal and
profile-mode layouts.
Four recent security PRs landed on main with stale/missing test updates,
breaking 4 test shards on every subsequent PR's CI run:
- test_discord_bot_auth_bypass.py (PR #30742c3caca658):
DISCORD_ALLOWED_ROLES no longer bypasses _is_user_authorized.
Inverted 3 tests to assert the new (correct) behavior: role config
alone does NOT authorize at the gateway layer.
- test_msgraph_webhook.py (PR #301694ca77f105):
adapter.is_connected is a @property, not a method. Test was calling
it with () after the connect() change; TypeError: 'bool' is not
callable. Removed the parens.
- test_feishu_approval_buttons.py (PR #30744bdb97b857):
Card-action callbacks now go through _allow_group_message
authorization. 3 tests in TestCardActionCallbackResponse didn't
populate adapter._allowed_group_users so the operator's open_id got
rejected. Added the allowlist setup to each test, matching the
existing pattern in test_returns_card_for_approve_action.
Also raise tolerance on test_wait_for_process_kills_subprocess_on_keyboardinterrupt:
the SIGTERM → 3s TimeoutStopSec → SIGKILL → reap chain can exceed 10s
under loaded xdist (40 workers). Bumped _wait_for_pgid_exit timeout
10→30s and worker join timeout 5→15s. Passes 100% in isolation
already; this just makes it tolerant of CI-host load.
Validation: 270/270 tests pass across the 5 affected files.
When the 'mcp' Python SDK isn't installed, _run_stdio leaked a bare
'NameError: name StdioServerParameters is not defined' because the
top-level 'from mcp import ...' fails inside try/except ImportError,
leaving the names unbound at module scope.
Mirror the _MCP_HTTP_AVAILABLE gate that _run_http already had: raise
a clear ImportError with install instructions instead.
Fixes#30904
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
`terminal(background=true)` without `notify_on_complete=true` or
`watch_patterns` runs the process SILENTLY — the agent has no way
to learn it finished short of calling `process(action='poll')`
explicitly. That's correct for genuine long-lived processes (servers,
watchers, daemons) but is a footgun for every bounded task (tests,
builds, deploys, CI pollers, batch jobs), which is the vast majority
of background uses.
Hit on May 23, 2026 (PR #31231 incident): agent launched a CI-watch
loop with `background=true` only. The poller ran fine, exited green
6 minutes later, agent never noticed. User had to surface 'we are
green CI, you can merge.' Memory and skill docs said *what* to do
(poll in background) but not *how* to receive the result. The
`notify_on_complete=true` flag exists and works, but is easy to
forget when bg seems sufficient on its own.
Two changes here, mutually reinforcing:
1. Runtime nudge: tool result for `background=true` w/o notify or
watch_patterns now includes a `hint` field explaining the silent-
process failure mode and pointing at the corrective flag. Agent
sees it on the same turn and self-corrects without needing the
user to surface anything. Cost for legitimate server cases is one
ignored read (~50 tokens); cost for forgot-notify cases is
prevented blindness (potentially many turns, or a user nudge).
False positives << false negatives.
2. Schema/description rewrite: top-level TERMINAL_TOOL_DESCRIPTION
and the `background` field description now lead with 'Almost
always pair with notify_on_complete=true' instead of presenting
it as one of two equally-likely patterns. The two legitimate
non-notify shapes (long-lived servers; watch_patterns mid-process
signals) are still documented, but as the minority case.
Tests cover all four shapes: bg-only emits hint, bg+notify doesn't,
bg+watch_patterns doesn't, foreground doesn't. 4 new tests; full
suite of background/process tests stays green (160/160 across the
relevant 6 test files).
The Windows branch of `_terminate_host_pid` early-returned after
`os.kill(pid, SIGTERM)` (which Python maps to `TerminateProcess` for
the target handle only), leaving descendant processes — e.g. Chromium
renderer/GPU/network helpers spawned by an `agent-browser` daemon —
running on Windows even after the preceding commit fixed POSIX.
The right Windows primitive is `taskkill /PID <pid> /T /F`:
`/T` walks the tree, `/F` force-terminates. Same approach
`gateway.status.terminate_pid(force=True)` already uses for the
gateway's own shutdown path; reuse the same shape here.
Why NOT extend the POSIX psutil tree-walk to Windows:
1. Windows doesn't maintain a Unix-style process tree. `psutil.
Process.children(recursive=True)` walks PPID links that go stale
when intermediate processes exit, so enumeration is best-effort
and silently misses orphaned descendants. The whole bug we're
fixing is orphaned descendants.
2. `psutil.Process.terminate()` on Windows is `TerminateProcess()`
for one handle — same single-PID scope as the existing
`os.kill`. The existing comment in `gateway/status.py::
terminate_pid` warns this explicitly: 'os.kill SIGTERM is not
equivalent to a tree-killing hard stop' on Windows.
3. Headless Chromium has no GUI window, so the softer
`taskkill /T` without `/F` (which sends WM_CLOSE) won't reach
it either. `/F` is required.
POSIX path is unchanged. The taskkill subprocess uses the same
`creationflags=windows_hide_flags()` pattern other Windows shellouts
in this codebase use. `FileNotFoundError` / `TimeoutExpired` /
`OSError` fall back to bare `os.kill(SIGTERM)` as cheap insurance.
Tests cover the Windows branch via the codebase's standard
`monkeypatch _IS_WINDOWS` pattern (`references/windows-native-
support.md`), plus POSIX tree-walk order, NoSuchProcess swallow,
and the OSError fallback path. 7 new tests, all green on Linux CI.
os.kill(pid, SIGTERM) only signals the parent, leaving Chromium child
processes (renderer, GPU, etc.) orphaned. Reuse the existing
ProcessRegistry._terminate_host_pid() helper which walks the process
tree leaf-up via psutil, terminating children before the parent.
Trim ~600 LOC off the original contribution while keeping the same
operator-facing surface and detection coverage.
- Collapse three entry points (file / dir / bundle) into one
ast_scan_path(path) that handles both files and directories.
- Drop AstFinding dataclass + severity field — replaced with plain
(file, line, pattern_id, description) tuples. Severity ordering was
display-only for a diagnostic that explicitly disclaims security
verdicts, so the field added bookkeeping without earning its place.
- Replace Rich-markup formatter with plain text grouped by file.
- Drop the 'inspect --ast-deep' surface — same scanner, same output as
'audit --deep', single CLI entry is enough. Operators audit after
install; pre-install inspection signal isn't worth the second surface.
- Trim test file to the cases that earn their place: bypass payload,
syntax error survival, RecursionError survival, false-positive guard
(importer lookalike), literal-arg false-positive guard, non-.py
ignored, directory recursion + cache-dir skipping, missing-path,
getattr/__dict__ detection, formatter empty + populated.
Net: tools/skills_ast_audit.py 353 -> 133 LOC,
tests/tools/test_skills_ast_audit.py 299 -> 103 LOC, full diff
+704/-12 -> +264/-6. No change to tools/skills_guard.py — Skills Guard
verdicts remain untouched per SECURITY.md §2.4.
Add opt-in AST diagnostics for skill review without making Skills Guard stricter by default.
- Add hermes skills inspect --ast-deep to scan fetched skill bundles before installation
- Add hermes skills audit --deep to scan already-installed hub skills
- Keep AST analysis in tools/skills_ast_audit.py, separate from tools/skills_guard.py
- Label output as diagnostic hints, not security verdicts
- Cover dynamic import/access patterns: importlib, __import__(computed), getattr(computed), and __dict__[computed]
This follows the maintainer guidance from closed PR #7436: useful AST-level analysis belongs in an opt-in diagnostic path, not in Skills Guard's default heuristic scan.
User incident (Slack, 2026-05-13): user walked away mid-conversation,
agent requested approval to run `rm -rf .git`, the prompt timed out
after the gateway_timeout (default 300s), and the agent removed the
.git folder on its own. Corroborated by an independent report from a
Telegram user.
The underlying code path was correct — `check_all_command_guards`
returns `approved=False` with a BLOCKED message on both timeout and
explicit deny, and `terminal_tool` surfaces that as `status=blocked`
to the agent. The bug is at the model-interface layer: the message
"BLOCKED: Command timed out. Do NOT retry this command." reads to
some models as "try a different command achieving the same outcome."
This commit changes only the model-facing message + the structured
return shape:
- Timeout message now explicitly names the three evasion paths the
agent must avoid: retry, rephrase, AND achieve the same outcome
via a different command. Ends with "Silence is not consent."
- Explicit deny gets the same shape minus the silence-is-not-consent
line (it WAS an explicit deny, not silence).
- New structured fields on the return dict: `outcome` ("timeout"
or "denied") and `user_consent` (always False on this branch)
so plugins, hooks, and audit pipelines don't have to string-parse
the message to distinguish the two cases.
The mechanism that should already have prevented the original incident
— timeout treated as deny, BLOCKED result, post hook fires with
`choice="timeout"` — is unchanged. This commit hardens only the
agent's reading of the result.
Tests:
- test_timeout_returns_approved_false_with_no_consent — pins the
return shape on the Slack-shaped notify_cb-registered path
- test_timeout_message_is_emphatic_against_retry_and_rephrase —
pins the exact phrases the message must contain
- test_explicit_deny_carries_same_no_consent_shape — same contract
on explicit /deny
- test_timeout_emits_post_hook_with_timeout_outcome — pins the
post_approval_response hook payload so audit plugins can act
329 approval tests passing (4 new + 325 existing).
Fixes#24912
Reproduction (production, 2026-05-14): two concurrent sessions on the
same agent. Session A patches MEMORY.md directly via the patch tool,
appending ~8KB of structured content (Vendor Master, Standing Orders,
Pin Board) — none of it through the memory tool, so no § delimiters.
Session B starts later with stale in-memory state (1 entry, ~331
chars). Session B calls memory(action=replace) on its one known
entry. The tool's _read_file parses A's content as a single 8KB
'entry' (no § splits), then replace truncates that entry to B's new
333-byte content. ~8KB of structured content silently destroyed.
The atomic-rename write path is fine in isolation. The bug is the
implicit contract: the tool assumes MEMORY.md is exclusively a
§-delimited list of small entries it wrote, but the v0.13 install
runbook itself uses 'cat >> MEMORY.md' for onboarding, the patch tool
edits the file directly, and operators do too.
Fix: a drift guard in MemoryStore._detect_external_drift that fires
on either signal:
1. Re-parse + re-serialize doesn't produce identical bytes
(catches oddly-encoded delimiters / partial writes).
2. Any single parsed entry exceeds the store's whole-file char
limit. The tool budgets the ENTIRE store against that limit
(2200 chars for memory, 1375 for user), so no tool-written
entry can legitimately be larger. An entry bigger than the
store limit means an external writer dropped free-form content
into what the tool will treat as one entry.
When drift fires, _reload_target writes a .bak.<ts> snapshot of the
on-disk file, then add/replace/remove refuse to flush. The original
file stays untouched. The error dict surfaces the .bak path AND a
remediation string ('integrate missing entries via memory(add=...)
one at a time, then rewrite the file clean') so the model can act on
it without escalating to the operator.
Tests:
- test_replace_refuses_on_drift, test_add_refuses_on_drift,
test_remove_refuses_on_drift — all three mutators refuse
- test_clean_file_does_not_trigger_drift — false-positive check
- test_error_message_points_at_remediation — error string shape
- test_drift_guard_also_protects_user_target — USER.md too
- test_drift_backup_filename_is_unique_per_invocation — bak.<ts>
naming pin
144 memory tests passing (was 137; +7).
Fixes#26045
Follow-up to @sprmn24's verdict-logic fix. The previous block-message
ended in 'Use --force to override' regardless of verdict — but as of
the --force fix above, dangerous community/trusted skills can't be
overridden by --force at all. The misleading hint sends users in a
loop. Replace it with a specific message that tells them what the
documented behavior actually is.
Adds two regression tests covering the dangerous-verdict message
shape and one that pins the existing --force hint for non-dangerous
blocks.
PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a
filename swap between two files in a bundle changes the digest. But it
only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py``
(the on-disk equivalent) still hashed file contents only.
These two functions need to stay symmetric: ``check_for_skill_updates``
compares the disk hash of an installed skill against the bundle hash
of the upstream copy. With the asymmetric fix, every clean install
showed as drifted because the digests no longer matched
(2 existing tests in ``test_skills_hub.py`` started failing as soon as
the contributor's change landed).
Apply the same ``rel_path + \x00 + content`` shape to the disk-side
function. Both functions now produce the same digest for the same skill
content laid out two ways. Documented the symmetry invariant in the
docstring so a future change to either function knows to touch both.
Also adds tests/tools/test_pr_6656_regressions.py with 10 regression
tests covering all three fixes salvaged in PR #6656:
- uninstall_skill path traversal (4 cases: parent segments, absolute
paths, symlink escape, legitimate skill)
- bundle_content_hash filename swap detection (4 cases: in-memory
swap, identity, disk-side swap, bundle↔disk symmetry)
- list_pending lock contract (2 cases: source-grep contract, smoke)
Also fixes AUTHOR_MAP entry for @aaronlab — their commit email
(1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub
login, so changelog @mentions would 404.
- test_browser_secret_exfil: mock _run_browser_command instead of
launching real Chrome (secret check is pre-launch, browser is
irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
yield the event loop before receive_text(). TestClient's sync mode
can race the broadcast handler otherwise, hanging the test.
First migration of an existing built-in platform adapter to the plugin
system established by IRC / Teams / LINE / Google Chat. Closes#24325;
advances the umbrella refactor in #3823.
Matches Teams' shape exactly — adapter under ``plugins/platforms/discord/``
with the standard ``__init__.py`` / ``adapter.py`` / ``plugin.yaml``
shell, ``register(ctx)`` entry point, **no back-compat shim** at the old
import path, and full parity for the four hooks Teams uses plus the
``apply_yaml_config_fn`` hook that landed in #25443 (the Discord plugin
is the first consumer of that hook):
* ``standalone_sender_fn`` — out-of-process cron delivery via REST API
* ``setup_fn`` — interactive ``hermes setup gateway`` wizard
* ``apply_yaml_config_fn`` — translate ``config.yaml`` ``discord:`` keys
into ``DISCORD_*`` env vars (replaces the hardcoded block in
``gateway/config.py``)
* ``is_connected`` — declares connection state from ``DISCORD_BOT_TOKEN``
* ``check_fn`` — lazy-installs ``discord.py`` on demand
* plus ``allowed_users_env``, ``allow_all_env``, ``cron_deliver_env_var``,
``max_message_length``, ``emoji``, ``required_env``, ``install_hint``
* ``gateway/platforms/discord.py`` (5,101 LOC) →
``plugins/platforms/discord/adapter.py`` (git rename, R090).
* New ``plugins/platforms/discord/{__init__.py, plugin.yaml}`` with
``requires_env`` / ``optional_env`` declarations.
* Append ``register(ctx)`` block + new hook implementations
(``_standalone_send``, ``interactive_setup``, ``_apply_yaml_config``,
``_clean_discord_user_ids``, ``_is_connected``, ``_build_adapter``,
plus helpers ``_DISCORD_CHANNEL_TYPE_PROBE_CACHE`` etc.) to the
adapter.
* Replace the ``Platform.DISCORD elif`` branch in
``GatewayRunner._create_adapter()`` (−9 LOC) with a generic post-creation
hook (+6 LOC) in the registry path: any plugin adapter that declares a
``gateway_runner`` attribute now gets it auto-injected. Webhook's
built-in branch is unchanged (it doesn't go through the registry path).
* Move ``_send_discord`` (190 LOC) and helpers
(``_DISCORD_CHANNEL_TYPE_PROBE_CACHE``, ``_remember_channel_is_forum``,
``_probe_is_forum_cached``, ``_derive_forum_thread_name``) from
``tools/send_message_tool.py`` into the plugin as ``_standalone_send``.
* Wire via ``standalone_sender_fn=_standalone_send`` (Teams pattern; same
gap fixed in #21804 for other plugin platforms).
* Replace the Discord ``elif`` in ``tools/send_message_tool.py``
``_send_to_platform`` with a 10-line registry-hook dispatch.
* Drop the ``DiscordAdapter`` import and the
``Platform.DISCORD: DiscordAdapter.MAX_MESSAGE_LENGTH`` ``_MAX_LENGTHS``
entry — the registry's ``max_message_length=2000`` covers it.
* Move ``_setup_discord`` and ``_clean_discord_user_ids`` (68 LOC) from
``hermes_cli/setup.py`` into the plugin as ``interactive_setup``.
* Wire via ``setup_fn=interactive_setup``. CLI helpers (``prompt``,
``print_info``, etc.) are lazy-imported so the plugin's module-load
surface stays minimal.
* Remove ``"discord": _s._setup_discord`` from
``hermes_cli/gateway.py::_builtin_setup_fn``.
* Remove the entire 32-line ``_PLATFORMS["discord"]`` static dict entry —
Discord's setup metadata is now discovered dynamically via
``_all_platforms()`` from the registry entry.
* Move the 59-line ``discord_cfg`` YAML→env bridge from
``gateway/config.py::load_gateway_config()`` into the plugin as
``_apply_yaml_config``. Covers ``require_mention``,
``thread_require_mention``, ``free_response_channels``, ``auto_thread``,
``reactions``, ``ignored_channels``, ``allowed_channels``,
``no_thread_channels``, ``allow_mentions.{everyone,roles,users,
replied_user}``, and ``reply_to_mode`` (including the YAML 1.1
``off``-as-False coercion and the ``extra.reply_to_mode`` fallback).
* Wire via ``apply_yaml_config_fn=_apply_yaml_config``.
* The hook runs BEFORE ``_apply_env_overrides`` and after the generic
shared-key loop, exactly as documented in
``website/docs/developer-guide/adding-platform-adapters.md``.
* Behavior is preserved exactly — every assignment still uses
``not os.getenv(...)`` guards so env vars take precedence over YAML.
All 78 references to the old import path are rewritten — no back-compat
shim:
* 51 ``from gateway.platforms.discord import X`` →
``from plugins.platforms.discord.adapter import X``
* 5 ``import gateway.platforms.discord as discord_platform`` →
``import plugins.platforms.discord.adapter as discord_platform``
* 1 ``from gateway.platforms import discord as discord_mod`` →
``from plugins.platforms.discord import adapter as discord_mod``
* 21 ``mock.patch("gateway.platforms.discord.X")`` strings →
``mock.patch("plugins.platforms.discord.adapter.X")``
* 1 docstring reference in ``hermes_cli/commands.py``
* 1 import in ``tools/send_message_tool.py`` (now removed entirely)
The import-safety test in ``tests/gateway/test_discord_imports.py`` is
updated to purge the new canonical module name from ``sys.modules``.
**38 files changed, +621 / −473** — net positive due to the YAML hook
implementation (89 new LOC in the plugin trading for 59 deleted in core),
but every line moved has a clear plugin home now. The git rename is
detected at R090 because the adapter gained ~340 LOC of moved-in hook
implementations (``_standalone_send`` + ``interactive_setup`` +
``_apply_yaml_config`` + helpers).
* All 568 Discord-specific tests pass across 25 ``test_discord_*.py``
files plus voice/send/text-batching/reload-skills/stream-consumer/
integration tests.
* All 147 tests in the YAML-touching subset
(``test_discord_reply_mode``, ``test_discord_free_response``,
``test_discord_allowed_channels``, ``test_discord_allowed_mentions``,
``test_discord_channel_controls``, ``test_discord_reactions``,
``test_discord_thread_persistence``, ``test_runtime_footer``) pass —
this is the strongest signal that the YAML→env hook behaves
identically to the legacy block.
* Broader gateway/cron/integration sweep (1297 tests) introduces zero
new failures vs ``main``. Pre-existing failures in
``tests/gateway/test_tts_media_routing.py`` and
``tests/e2e/test_platform_commands.py`` reproduce identically on the
unchanged ``main`` revision.
* Plugin discovery sanity check confirms Discord registers alongside the
other four platform plugins:
Registered platforms: ['discord', 'google_chat', 'irc', 'line', 'teams']
These Discord-shaped tendrils in core were **deliberately not moved** —
they are generic platform-registry concerns affecting every platform,
not Discord-specific:
* ``gateway/config.py:1205`` ``DISCORD_BOT_TOKEN → config.token`` env
enablement — same shape Telegram has. The existing
``env_enablement_fn`` registry hook only seeds ``extra``, not
``.token``, so it can't replace this without an adapter refactor to
read from ``extra["bot_token"]``.
* ``gateway/run.py`` voice-mode hooks
(``self.adapters.get(Platform.DISCORD)`` for
``start_voice_mode``/``stop_voice_mode``), role-based auth,
``DISCORD_ALLOW_BOTS`` branch in ``_is_user_authorized``,
``_UPDATE_ALLOWED_PLATFORMS`` frozenset, and the per-platform
allowlist maps — generic platform-registry concerns.
* ``Platform.DISCORD`` enum literal — stable identifier used as dict
keys throughout the codebase; removing it is a separate refactor with
no real benefit.
* ``tools/discord_tool.py`` and ``tools/environments/local.py`` —
first-class agent tools and env-passthrough config, neither is the
gateway adapter.
Each of these is worth its own scoping issue when the time comes.
PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME,
which is fine in non-profile mode but leaves a gap once a profile is
active: HERMES_HOME points at <root>/profiles/<name>, so the global
<root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json
+ <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR
#15981 closed via _hermes_root_path().
Apply the same widening pattern here. The control-file/mcp-tokens check
now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes
when they coincide in non-profile mode). Also tightens the mcp-tokens
check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep"
so writing the directory entry itself is blocked, not just files inside.
Regression tests cover both protections in a real profile-mode layout
(<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.
Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.
Fixes#14072
Move the autouse `_disable_lazy_stt_install` fixture out of the three
transcription test files and into `tests/tools/conftest.py` as a regular
(non-autouse) fixture. Each transcription test module opts in once at
the top via `pytestmark = pytest.mark.usefixtures(...)`.
Why: addresses three Copilot inline review comments on this PR that
flagged the verbatim duplication across files. Centralizing also keeps
the patch target in a single place, so a future rename of
`_try_lazy_install_stt` only updates one location.
Why opt-in (not autouse in conftest): other `tests/tools/` files do not
patch `_HAS_FASTER_WHISPER` and have no reason to bypass the runtime
lazy-install probe; making the fixture autouse globally would silently
mask any future test that wants to exercise the real lazy-install path.
`b5c6d9ac0` ("fix: wire STT lazy-install into transcription_tools.py")
added `_try_lazy_install_stt()`, which calls
`importlib.util.find_spec("faster_whisper")` after `ensure()` runs.
In the dev / CI environment `faster_whisper` is already installed, so
the probe returns truthy and `_get_provider()` returns "local" even
when the test has patched `_HAS_FASTER_WHISPER=False` to simulate
"not installed".
Add a per-file autouse fixture that patches `_try_lazy_install_stt`
to return False so the simulation stays accurate. The 16 baseline
failures across `test_transcription_tools.py`,
`test_transcription.py`, and `test_transcription_dotenv_fallback.py`
disappear; the production lazy-install path is unaffected at runtime.
_maybe_follow_capture() issued a follow-up screenshot unconditionally
when capture_after=True, even when res.ok=False. The model then received
a normal-looking screenshot alongside an error message, and in practice
it often ignored ok=False and proceeded as if the action had succeeded.
Fix: return _text_response(res) early when res.ok is False so the model
receives only the error and can decide how to recover.
Tests added:
- test_capture_after_skipped_when_action_failed: patches click to return
ok=False and asserts no capture call is issued.
- test_capture_after_fires_when_action_succeeds: ensures the happy path
still triggers the follow-up capture.
_dispatch() routes action="set_value" to backend.set_value(), but:
- ComputerUseBackend did not declare set_value as @abstractmethod, so
subclasses could silently omit it without a TypeError at class load time.
- _NoopBackend (the test/CI stub) had no set_value method at all, causing
AttributeError in any test that exercises the set_value action path.
Fix:
- Add set_value as @abstractmethod to ComputerUseBackend in backend.py.
- Add a recording stub in _NoopBackend in tool.py.
- Add two TestDispatch cases: one verifying the call reaches the backend,
one verifying the missing-value guard returns a clean error.
Four findings from Copilot's review on PR #22891, all in the AX
elements-array cap added by 22fa1ed:
1. The truncation note ("response truncated to N of M elements") was
appended unconditionally — including in the som/vision multimodal
path, whose response carries a screenshot rather than an `elements`
array. The note described a payload field that wasn't present.
Moved the note into the AX-text branch where the array actually
appears.
2. `_format_elements(cap.elements)` ran on the full untrimmed list with
its own `max_lines=40` cap, so a caller passing `max_elements=10`
would see summary lines referencing `#11..#40` even though the JSON
`elements` array only held #1..#10. Format on `visible_elements`
instead so the summary indices always exist in the response.
3. `_coerce_max_elements` enforced a lower bound but no upper bound,
so `max_elements=10_000_000` silently disabled the safeguard and
reintroduced the original context-blow-up. Added a hard cap
(`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.
4. The schema string said "Default 100" but the property carried no
`default` field, and claimed `max_elements` had no effect on som/
vision while the image-missing fallback path can still return an
elements array. Added `"default": 100`, `"maximum": 1000`, and
clarified the fallback-path wording.
Each finding gets a regression test:
- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound
Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`computer_use(action='capture', mode='ax')` returned the full AX element
list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains
UIs publish 500+ AX nodes (one reproduction in #22865 returned 597
elements against Obsidian), so a single capture could consume enough
context to trigger compression failures or render the session unusable.
The human-readable `_format_elements` summary is already capped at 40
lines, so the truncation gap was invisible to anyone reading the summary
output.
Add a `max_elements` argument to the tool schema, default 100, that
trims the AX `elements` array. When the cap fires, the response surfaces
`total_elements` and `truncated_elements` and appends a "raise
max_elements or pass app= to narrow" hint to the summary so the model
knows the JSON view is partial and can re-issue with a tighter scope.
Validation is centralized in `_coerce_max_elements`: missing /
non-integer / sub-1 inputs fall back to the default cap, so the
protection can never be silently disabled by a malformed tool-call
argument. The cap only affects AX-mode JSON; `mode='som'` and
`mode='vision'` keep returning a screenshot + image-aware summary
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add tests/tools/test_computer_use_capture_routing.py — 13 integration
tests that drive _capture_response end-to-end with deterministic stubs
for the routing helper, _run_async, vision_analyze_tool, and
get_hermes_dir, so the full code path is exercised without a live
cua-driver, real auxiliary client, or network access.
Coverage:
* TestCaptureResponseDefaultPath (3 cases)
- SOM PNG capture returns the legacy multimodal envelope when the
routing helper says 'native' (image/png MIME).
- Same path returns image/jpeg MIME for JPEG payloads (cua-driver
can return either).
- AX-only mode never even consults the routing helper because no
PNG is present.
* TestCaptureResponseRoutedToAuxVision (5 cases)
- SOM capture with routing on returns a JSON string with the
vision_analysis embedded, the AX/SOM index preserved, and NO
image_url parts. Verifies the aux call receives a path under
the configured cache and a prompt that grounds itself against
the AX summary.
- Temp screenshot file is unlinked after _capture_response returns,
including when the aux call raises (the finally block runs).
- Empty / malformed aux analysis falls back to the multimodal
envelope so the user always gets *something* useful.
* TestRoutingDecisionWiring (4 cases)
- Explicit auxiliary.vision in config flips routing on regardless of
main-model vision capability.
- Vision-capable main + native tool-result support keeps multimodal.
- Config load failure fails open (returns False, multimodal path
continues to work).
- Helper exception is swallowed and routes to legacy behaviour.
* TestBugReproductionAnchor (1 case) - directly pins the #24015
contract: when routing is on, the response must NEVER contain a
'data:image' or 'image_url' substring. That is exactly what tripped
the reporter's HTTP 404 ('No endpoints found that support image
input') on tencent/hy3-preview before the fix.
Bug-reproduction proof:
$ git checkout upstream/main -- tools/computer_use/tool.py
$ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
============================== 13 failed in 1.29s ==============================
$ # restore tool.py to this branch's HEAD
$ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
============================== 13 passed in 1.04s ==============================
Total branch coverage:
85 passed across test_computer_use.py, test_computer_use_vision_routing.py,
test_computer_use_capture_routing.py
Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests
that pin the contract of the new vision-routing helper introduced in
the previous commit:
* TestExplicitAuxVisionOverride (12 cases): mirror the
auxiliary.vision detection rules used by agent.image_routing so
the capture path and the user-attached-image path agree on what
counts as an explicit override (provider/model/base_url with
non-blank, non-'auto' values).
* TestRouteDecision (7 cases): pin the policy itself — explicit
override always wins, vision-capable + native-tool-result keeps
multimodal, everything else fails closed and routes to aux.
* TestLookupHelpers (5 cases): defensive paths for the models.dev /
tool-result-support lookups (blank inputs, exceptions, missing
caps).
* TestModuleSurface (4 cases): pin the public/__all__ surface and
keep internal helpers addressable so the integration test in the
next commit can monkeypatch them deterministically.
Run with:
scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.
The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.
Fix:
- `capture(app=…)`: when the filter matches nothing, return a
`CaptureResult` with empty `app`/`elements` and a diagnostic
`window_title` pointing the caller at `list_apps` and noting the
localized-name convention. `_active_pid` / `_active_window_id` are left
untouched so a subsequent action doesn't inadvertently hit the wrong
process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
and let the existing `return ActionResult(ok=False, …, "No on-screen
window found for app …")` path fire instead of falsely reporting success
on the frontmost window.
This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046;
bugs 3 & 4 in #30032.
Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.
Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic ' - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.
Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.
focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.
5 new regression tests added.
Part of #24170 (bugs 1 and 3/4 addressed separately).
* fix(skills): skip dependency dirs in skill scan
* fix(skills): widen sibling rglob scanners to use shared exclusion set
Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.
Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.
Sites touched:
agent/curator_backup.py - skill backup file count
gateway/run.py - disabled-skill response (2 sites)
hermes_cli/dump.py - skill count in env dump
hermes_cli/profile_describer.py- profile description (2 sites)
hermes_cli/profile_distribution.py - profile install count
hermes_cli/profiles.py - profile skill count
hermes_cli/skills_hub.py - category detection
tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper)
tools/skill_usage.py - usage tracking + skill dir lookup (2 sites)
tools/skills_hub.py - optional skills find + scan (2 sites)
tools/skills_sync.py - bundled skills sync
E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.
* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep
---------
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>