Closes#33368.
`_CodexCompletionsAdapter.create()` iterates `final.output` from the
Codex Responses stream. The event-driven consumer (introduced in #33042)
always sets `final.output` to a list, so this shape can't come from our
own code path. But:
- Mocked clients in tests can return a typed Response with `output=None`
- Third-party shims / compatibility layers that bypass the consumer can
do the same
- A future code path that wraps a different consumer could regress
The old code `getattr(final, "output", [])` returns `None` (not the
default `[]`) when the attribute EXISTS but is `None`. Iterating
`None` then raises `TypeError: 'NoneType' object is not iterable` —
the exact error logged by title-generation when this fires.
Fix: `getattr(final, "output", None) or []` — single-line defensive
coerce. Cheap; zero risk.
Regression test asserts the auxiliary path handles a final whose
`.output` is `None` (via monkey-patched consumer) without raising and
returns the expected chat.completions-shaped response.
Reporter: @pavegrid-1 (issue #33368).
Three additions on top of @Nami4D's salvage:
1. Gate the preflight slash-enum strip on the model name pattern
(grok-* / x-ai/grok-*). The original PR stripped slash-containing
enum values from every codex_responses request, but native Codex
(OpenAI) and GitHub Models DO accept slash enums — stripping them
there would silently degrade tool-schema constraints. xAI is the
only Responses-API surface that rejects the shape.
2. Resolve the merge conflict in agent/transports/codex.py by
preserving both the timeout-forwarding block that landed on main
between the PR's branch point and now AND the new service_tier
strip. Behavioural intent of both is preserved.
3. Six new tests in tests/agent/transports/test_codex_transport.py
covering:
- TestCodexTransportXaiServiceTierStrip (3 tests): xAI strips
service_tier from request_overrides; non-xAI codex_responses
and GitHub Models both KEEP service_tier (regression guards
so the strip stays xAI-only).
- TestPreflightSlashEnumStrip (3 tests): Grok and aggregator-
prefixed Grok model names both trigger the safety-net strip;
non-Grok models preserve slash enums as a regression guard
against the strip becoming too broad.
51/51 in tests/agent/transports/test_codex_transport.py.
Co-authored-by: Nami4D <hello@nami4d.tech>
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
* refactor(codex): drop SDK responses.stream() helper; consume events directly
The OpenAI Python SDK's high-level `client.responses.stream(...)` helper
does post-hoc typed reconstruction from the terminal
`response.completed.response.output` field. The chatgpt.com Codex
backend has been observed (today, gpt-5.5) to ship `response.output =
null` on terminal frames, which crashes the SDK with `TypeError:
'NoneType' object is not iterable` mid-iteration.
Carlton's #32963 patched the symptom by wrapping the helper in
try/except and recovering from the same per-event accumulator the SDK
was supposed to populate. This PR removes the helper from the call
path entirely: we now use `client.responses.create(stream=True)` (raw
AsyncIterable of SSE events) and assemble the final response object
ourselves from `response.output_item.done` events as they arrive. The
terminal event's `output` field is never read for content. Same
strategy OpenClaw uses for the same backend.
This makes Hermes structurally immune to the bug class, not patched.
The next time OpenAI ships a shape change to chatgpt.com's terminal
frame, our consumer keeps working because it doesn't read that frame
for content — only for usage/status/id.
Changes
- `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared
consumer; `run_codex_stream()` uses `responses.create(stream=True)`;
`run_codex_create_stream_fallback()` collapses into a thin alias
since the primary path now does what the fallback used to do.
- `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the
same consumer; old null-output recovery helpers deleted as
unreferenced.
- Tests migrated: fixtures that mocked `responses.stream` now mock
`responses.create` returning a raw iterable. New regression test
asserts the auxiliary path returns streamed items even when the
terminal event's `output` is literally `null`.
Validation
- Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex`
with `gpt-5.5` — response built correctly with `response.output=null`
on the terminal frame, all events consumed, usage/reasoning tokens
propagated.
- `tests/run_agent/test_run_agent_codex_responses.py` +
`tests/agent/test_auxiliary_client.py`: 242 passed.
* test+fix(codex): migrate streaming tests, raise on truncated streams
CI surfaced 10 test failures across tests/run_agent/test_streaming.py
and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had
their own `responses.stream(...)` mocks I missed in the first sweep.
agent/codex_runtime.py: _consume_codex_event_stream() now raises
"Codex Responses stream did not emit a terminal response" when the
stream ends without any terminal frame AND no usable content. This
preserves the signal callers used to get from the SDK's high-level
helper, which they distinguished from "completed with empty body"
in error handling.
Tests migrated:
- test_streaming.py: text-delta callback, activity-touch, and
remote-protocol-error tests all switch from mocking responses.stream
to responses.create returning an iterable of events.
- test_codex_xai_oauth_recovery.py: prelude-error tests are recast as
wire-error-event tests (the new path raises _StreamErrorEvent
directly when the wire emits type=error, which is strictly better
than the old two-phase "SDK RuntimeError → retry → fallback"). The
retry-on-transport-error test moves from responses.stream side-effect
to responses.create side-effect.
Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat()
through the full codex_responses path returns correctly, 319/319
targeted tests passing.
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144)
When an OpenAI-compatible Responses API surface accepts an initial
request but later rejects the replayed `codex_reasoning_items`
encrypted blob with HTTP 400 `invalid_encrypted_content`, the
session previously got stuck retrying the same poisoned payload.
Recovery: classify the error as a dedicated FailoverReason, and on the
first hit disable encrypted reasoning replay for the rest of the
session, strip cached items from message history, and retry once.
Changes:
* error_classifier: add FailoverReason.invalid_encrypted_content
branch in _classify_400 (before context_overflow so the messages
that mention 'encrypted content … could not be verified' don't trip
context heuristics), in _classify_by_error_code, and extend
_extract_error_code to peek inside wrapped JSON in error.message and
ignore the bare '400' as a code.
* agent_init: initialize `_codex_reasoning_replay_enabled = True` on
every agent.
* run_agent: add AIAgent._disable_codex_reasoning_replay() helper
that flips the flag and pops cached items.
* codex_responses_adapter: thread a `replay_encrypted_reasoning`
kwarg through _chat_messages_to_responses_input so that when the
flag is False we don't replay codex_reasoning_items.
* transports/codex.py: read `replay_encrypted_reasoning` from params,
thread it into the adapter, and gate the
`include=['reasoning.encrypted_content']` request hint on it.
* chat_completion_helpers: pass the agent's replay flag through to
the transport.
* conversation_loop: in the retry loop, add an
invalid_encrypted_content recovery branch that fires once per
session, only when api_mode == codex_responses, only when replay is
still enabled, and only when at least one assistant message in
history actually carries cached reasoning items (otherwise the 400
has nothing to do with our cache and the normal retry path handles
it).
Tests:
* test_error_classifier: new wrapped-JSON _extract_error_code case;
new TestClassifyApiError cases proving the 400 is retryable with
no fallback, that the broad message match doesn't catch a generic
'parsed' message, and that the error code match is
case-insensitive.
* test_run_agent_codex_responses: end-to-end test of the recovery
branch firing once and disabling replay, plus a sibling test that
proves the branch does *not* fire (and the flag stays True) when
history has no cached reasoning items.
Salvages PR #10144 onto the post-refactor module layout
(error_classifier / codex_responses_adapter / transports/codex /
conversation_loop / agent_init) since the original diff was written
against the pre-refactor monolithic run_agent.py.
* chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage
---------
Co-authored-by: victorGPT <wuxuebin1993@gmail.com>
SubdirectoryHintTracker was scanning directories outside the active
working directory, allowing files like ~/.codex/AGENTS.md or
~/.claude/CLAUDE.md to be loaded and injected into the agent context.
This causes cross-agent context contamination and instruction mixup.
Add _is_ancestor_or_same() helper and a path boundary check in
_is_valid_subdir(): only directories within the working directory tree
(i.e. path.is_relative_to(working_dir)) are allowed.
Also add exist_ok=True to mkdir() calls in new tests to prevent
pytest-xdist race conditions when workers share the same tmp_path parent.
Tests added:
- test_outside_working_dir_rejected: verifies sibling dirs are blocked
- test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked
- test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected
- test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace
When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude
Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY
to ~/.hermes/.env and zeros ANTHROPIC_TOKEN. That env-var pattern is the
user's explicit choice of auth method — API key, not OAuth.
But the anthropic credential pool's autodiscovery (_seed_from_singletons)
unconditionally read ~/.claude/.credentials.json from the Claude Code CLI
and any saved hermes_pkce creds, and added them to the SAME anthropic
pool as the user's API key. Two problems:
1. Even with the API key at higher priority, a 401/429 on the API key
would rotate the session onto an autodiscovered OAuth credential,
silently flipping the agent into the Claude Code masquerade
mid-conversation: 'You are Claude Code' system block, every tool
renamed to mcp_*, claude-cli User-Agent header.
2. Switching OAuth → API key at `hermes setup` cleared the env vars
but left previously-seeded OAuth entries dormant in auth.json,
where rotation could revive them.
The user picking the API-key path is explicitly opting OUT of the
masquerade. Mixing OAuth credentials into their pool defeats that
choice.
Fix: in `_seed_from_singletons` for provider='anthropic', detect the
API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and:
- Skip calling read_claude_code_credentials() and
read_hermes_oauth_credentials() entirely
- Prune any stale hermes_pkce / claude_code entries that may already
be in the on-disk pool
OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery
continues to fire as before.
Tests: 3 new regression tests (api-key skips autodiscovery, api-key
prunes stale entries, oauth path still autodiscovers). Full file 70/70.
Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:
1. tools/threat_patterns.py — single source of truth for injection/promptware
patterns. Replaces the duplicated pattern lists in prompt_builder.py and
memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
override, known framework names). Three scopes — 'all' (narrow, classic
injection), 'context' (adds promptware/role-play, broader detection),
'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).
2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
frozen system-prompt snapshot. Live state keeps the original so the
user can still inspect + remove via memory(action=read/remove). Scan is
deterministic from disk bytes — prefix-cache invariant holds.
3. make_tool_result_message() wraps results from high-risk tools
(web_extract, web_search, browser_*, mcp_*) in
<untrusted_tool_result source="...">...</untrusted_tool_result>
delimiters with framing prose telling the model the content is data,
not instructions. Architectural defense against indirect injection
from poisoned web pages, GitHub issues, MCP responses — does NOT
regex-scan tool results (pattern arms race + per-iteration latency).
Multimodal content lists pass through unwrapped to preserve adapter
compatibility.
Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.
Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
path, blocked from MEMORY.md snapshot, wrapped in delimiters when
arriving via web_extract. Legitimate 'you must follow conventions'
phrasing not flagged.
Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
block-with-placeholder — there's no warn mode that makes sense)
Closes#496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.
xAI retired grok-4-1-fast. hermes_cli/models.py already removed it from
the static fallback in an earlier commit, but the context-length
metadata, the tests pinning those values, and the provider doc still
referenced the retired ID. Clean those up so retired model names stop
appearing in user-facing output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path),
but the non-retryable-401 guidance branch only covered openai-codex and
xai-oauth. A Nous 401 fell through to the generic 'Your API key was
rejected... run hermes setup' message, which is wrong advice — the user
needs hermes auth add nous --type oauth, not an API key.
Also flag the case where the failing model slug ends in :free (OpenRouter
syntax) while provider is nous. Without that hint, users re-OAuth
successfully and then hit the same 401 on the next message because Nous
Portal doesn't carry the OpenRouter free-tier slug.
Reported by ashh — debug dump showed Nous device_code exhausted +
deepseek/deepseek-v4-flash:free as the model.
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override. The
expectation in that case is universal: 'use my main model for side
tasks too.'
Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend. That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.
The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):
1. caller-passed model (caller knew what they wanted)
2. provider's catalog default (cheap aux model, if registered)
3. user's main model from config.yaml
Behaviour by provider class:
- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
step 3 applies. Title gen runs on grok-4.3 / gpt-5.4 against the
user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
default wins at step 2, preserving the original 'cheap aux model'
behaviour. Anthropic users still get claude-haiku-4-5 for titles,
not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
callers) — caller wins at step 1, no surprise switching.
Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically. The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.
4 new tests in TestResolveProviderClientUniversalModelFallback:
- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks
365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.
Co-authored-by: wysie <wysie@users.noreply.github.com>
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode
where it accepts the connection but never emits a single stream event — the
socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP
status), and a fresh reconnect then succeeds in ~2s. Today the only guard is
the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival
connection is held for the full stale window (90-900s depending on context /
config) before the retry loop can reconnect — minutes of wasted wall time per
stall, at a rate of ~20% of calls during affected windows.
Add a TTFB watchdog scoped to the codex_responses path:
- codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on
*every* stream event (not just output-text deltas), so reasoning-only and
tool-call-only turns are not mistaken for a stall.
- interruptible_api_call resets that marker before the worker starts and, while
it is still None, kills the connection once elapsed exceeds the TTFB cutoff
(default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The
raised TimeoutError flows through the existing retry path unchanged.
Once any event has arrived the stream is healthy and only the existing
wall-clock stale timeout applies, so legitimate long generations are never
interrupted. Gated to codex_responses; the chat_completions non-stream,
anthropic and bedrock branches have no first-event signal and are untouched.
Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the
events-flowing pass-through, and the env-disable path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
get_read_block_error() only blocked internal Hermes cache files but
allowed reading project-local secret-bearing environment files (.env,
.env.production, .env.local, etc.) through both read_file and ACP
fs/read_text_file paths.
Add a basename deny set for common secret-bearing .env variants.
.env.example remains readable as documentation.
Fixes#20734
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:
1. The non-stream stale-call detector estimated context tokens from
``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
their conversational load in ``input`` (with ``instructions`` and
``tools``), so every Codex turn logged ``context=~0 tokens`` and the
detector never applied its >50k / >100k tier bumps.
2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
main Codex path. The chat_completions path and the auxiliary Codex
adapter both forwarded it; the main path skipped it through three
places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
``_preflight_codex_api_kwargs``).
3. The streaming stale detector had the same payload-shape bug for
``codex_responses`` requests, which route through the non-streaming
detector (it's the path that emits the user-facing
"No response from provider for 300s (non-streaming, ...)" warning that
reporters keep pasting).
This commit:
- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
used by both the non-stream and stream detectors. Handles ``messages``
(Chat Completions), ``input + instructions + tools`` (Responses API),
bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
and ``_preflight_codex_api_kwargs`` (with guards against
zero/negative/inf/bool values), and wires
``_resolved_api_call_timeout()`` into the Codex branch of
``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
kick in faster when upstream stalls:
* base 300s -> 90s
* >50k 450s -> 150s
* >100k 600s -> 240s
These only apply when the user has *not* set
``providers.<id>.stale_timeout_seconds`` or
``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
context-tier scaling, transport timeout pass-through, and preflight
timeout pass-through / rejection of invalid values.
Closes#21444
Supersedes #21652#24126#31855
Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them. The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image. Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.
Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal. Mirrors the audio_cache pattern used by text_to_speech.
Wires save_url_image into both the xAI and OpenAI providers' URL
branches. When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.
Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
in-process HTTP server: bytes round-trip, content-type → extension,
URL-suffix fallback, default-to-png, 404 propagation, empty-body
refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
test_successful_url_response (was asserting the bug), add
test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.
160/160 in the broader image_gen test surface.
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.
The hook covers cases the shell-template grammar can't reasonably
express:
- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows
None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.
## Resolution order
The dispatcher's resolution order is the load-bearing invariant:
1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
→ command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
→ plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
the `hermes tools` row list defensively.
Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
`_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
config defensively so a refactor of the caller can't silently break
the invariant.
## New files
- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
`list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
`voice_compatible` (all optional with sane defaults). Mirrors
`agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
`agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
## Modified files
- `hermes_cli/plugins.py` — `register_tts_provider()` method on
`PluginContext`. Matches the gating shape of
`register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
`_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
plugin rows into the Text-to-Speech picker category alongside the
10 hardcoded built-in rows.
## Tests
- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
`tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
built-in-always-wins, command-wins-over-plugin, plugin dispatch,
exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
picker surface, builtin shadowing defense, integration with
`_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
parity harness vs `origin/main`. The only intentional diff is
`fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification
- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
`text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
## Docs
- `website/docs/user-guide/features/tts.md` — new "Python plugin
providers" section with a decision table (command-provider vs
plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
mention both surfaces (command-provider primary, plugin for
SDK/streaming).
Closes#30398
_write_claude_code_credentials wrote ~/.claude/.credentials.json via
Path.write_text + replace + post-write chmod(0o600). Both the temp file
and the destination briefly inherited the process umask (commonly 0o644
= world-readable) between create/replace and chmod, exposing the OAuth
access/refresh tokens to other local users on multi-user hosts.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the temp file is created atomically at 0o600. After os.replace,
the destination inherits the temp's mode, so the post-write chmod is no
longer needed. The temp name also gains a per-process random suffix to
avoid collisions between concurrent writers and stale leftovers from a
crashed prior write.
Parent dir (~/.claude/) is owned by Claude Code itself and shared with
its native auth, so we deliberately don't tighten its mode here (unlike
the mcp_oauth fix which owns its own subtree under HERMES_HOME).
Mirrors the fix shipped for agent/google_oauth.py in #19673 and the
parallel fix for tools/mcp_oauth.py in #21148.
Adds a regression test in TestWriteClaudeCodeCredentials asserting the
resulting file mode is 0o600 (skipped on Windows where POSIX mode bits
aren't enforced).
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without
this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool
name in step 3, so a native MCP server tool registered as
'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The
incoming strip only removes ONE prefix, which still worked on first
call, but on subsequent calls the model pattern-matched the
single-prefixed form from message history and produced names that
stripped to 'composio_X' — registry miss, dispatch fail.
The history-rewrite block (#4) already has this guard. Apply the same
guard to the schema-rewrite block (#3) so round-trip is symmetric.
Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass.
Author map: hayka-pacha added for PR #25270 salvage attribution.
Refs GH-25255.
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response
unconditionally stripped the mcp_ prefix from ALL tool names starting
with mcp_. This broke Hermes-native MCP server tools (registered under
their full mcp_<server>_<tool> name in the registry) because the stripped
name doesn't match any registry entry.
Fix: check the tool registry before stripping. Only strip when:
- The stripped name EXISTS in the registry (OAuth-injected tool)
- The full name does NOT exist in the registry
This preserves backward compatibility for OAuth-injected tools while
protecting native MCP server tools from incorrect prefix removal.
7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp,
unknown tools, mixed responses, and dual-registration edge case.
Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com>
Standard OpenAI returns request-validation failures (unknown/
unsupported parameter, malformed request) as 4xx. Some
OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me
returns 502 for an unknown parameter.
The generic '5xx -> retryable server_error' rule then misfires: the
error is deterministic (every retry gets the identical rejection), so
the retry loop burns all 3 attempts, the transport-recovery path
resets the counter and burns 3 more, and the result is a request
flood against a request that can never succeed.
Fix: when a 500/502 body carries an unambiguous request-validation
signal — 'unknown parameter' / 'unsupported parameter' /
'invalid_request_error' in the message text, or invalid_request_error
/ unknown_parameter / unsupported_parameter as the structured error
code — classify as a non-retryable format_error so the loop fails
fast and falls back. Genuine 502 Bad Gateway with no such signal
stays retryable as before.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.
chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:
502: [ObjectParam] [input[617]._empty_recovery_synthetic]
[unknown_parameter] Unknown parameter:
'_empty_recovery_synthetic'
Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.
Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision
Fixes#31179. Three coupled fixes so a configured aux vision backend
actually serves vision tasks instead of silently routing images to the
user's main provider:
1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves
to `custom` + `https://api.openai.com/v1`. "openai" was not in
PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for
manual base_url), so the obvious config name silently failed to build a
client. User-supplied base_url is still preserved; only the provider
name normalises to `custom` so resolution doesn't hit the
PROVIDER_REGISTRY-only path.
2. agent/auxiliary_client.py: the vision auto-detect chain now skips the
user's main provider when models.dev reports `supports_vision=False`.
Without this guard, a misconfigured aux provider would fall back to
`auto`, which happily returned the main-provider client. The caller
would then send image content to e.g. api.deepseek.com with model
`gpt-4o-mini` and get a cryptic `unknown variant 'image_url',
expected 'text'` from the provider's parser.
3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements`
now mirrors the runtime fallback chain (explicit provider, then auto),
so `vision_analyze` shows up whenever vision is actually serviceable.
`browser_vision` gets a new `check_browser_vision_requirements` check_fn
that AND-gates browser + vision availability, so it doesn't get
advertised to the model when the call would fail at runtime.
Reproduction (config from the bug report):
model.provider: deepseek
model.default: deepseek-v4-pro
auxiliary.vision.provider: openai
auxiliary.vision.model: gpt-4o-mini
Before: resolve_vision_provider_client() returns None for the explicit
provider, fallback auto returns the deepseek client with model='gpt-4o-mini',
image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze
hidden from tool list; browser_vision exposed but fails at call time.
After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini.
vision_analyze and browser_vision both gate correctly on capability.
Tests: tests/agent/test_vision_routing_31179.py covers all three fixes
(12 cases including the user's exact scenario, base_url preservation,
text-only-main skip, capability-unknown permissive fallback, and tool
gating parity). Existing 382 tests across auxiliary/vision/image_routing
suites still pass.
* test(vision): use exact hostname check to silence CodeQL substring-sanitization alert
* fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL
The new `logger.debug(...)` added in the previous commit interpolated
both `main_provider` and `vision_model` (a public model slug \u2014 not
sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic
re-flagged it twice because the rule mis-detects multi-value
interpolations near tainted-via-config provider strings.
Drop the model from the log args (provider alone is enough to diagnose
the skip; the same sibling branch a few lines up already logs provider
only). Behavior unchanged; CodeQL false positive cleared.
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
The original PR #17194 description claimed test_display_tool_preview.py
but only ever shipped test_display_todo_progress.py. Add the missing
coverage for the failure-suffix path:
- _trim_error: whitespace strip, length cap, File-not-found path collapse
- _detect_tool_failure: terminal exit codes, memory full, structured
{error}/{message} extraction, malformed JSON, None result
- get_cute_tool_message E2E: read_file failure, terminal exit-only,
terminal stderr message, memory full, success path, no-result path
Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool
to reflect the new behavior: the generic '[error]' fallback in cli.py
has been removed; failure suffixes now come from the result-aware
_detect_tool_failure (e.g. '[exit 1]', '[File not found: x]').
Parse the todo_tool result summary to display completion progress in
CLI tool preview lines:
Read: ┊ 📋 plan 3/4 task(s) 0.5s
Update: ┊ 📋 plan update 3/4 ✓ 0.5s
Create: falls back to plain count when no completed tasks
Falls back gracefully to the existing 'N task(s)' format when the
result is missing, malformed, or has no completed items.
Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current
main.
Co-authored-by: Albert.Zhou <albert748@gmail.com>
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.
This change adds a generic plugin surface for auxiliary task
registration:
ctx.register_auxiliary_task(
key='memory_retain_filter',
display_name='Memory retain filter',
description='hindsight pre-retain dedup/extract',
defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
)
After registration, the task automatically:
- Appears in 'hermes model → Configure auxiliary models' picker via
a new _all_aux_tasks() merge of built-in + plugin tasks
- Has its provider/model/base_url/api_key bridged from config.yaml
to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
(gateway/run.py now uses a dynamic bridged-keys set instead of
a hardcoded per-task dict)
- Gets plugin-declared defaults (timeout, extra_body, etc.) layered
underneath user config so unconfigured plugin tasks still work
(agent/auxiliary_client._get_auxiliary_task_config)
- Resets to auto via 'Reset all to auto' alongside built-ins
Validation:
- Rejects shadowing of built-in keys (vision, compression, etc.)
- Rejects invalid key shapes (must match [A-Za-z0-9_]+)
- Rejects cross-plugin collisions (clear error)
- Allows same-plugin re-registration (idempotent updates)
Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.
Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.
Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.
Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.
Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:
- HERMES_HOME/.env (provider API keys)
- HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
- HERMES_HOME/mcp-tokens/ (OAuth token directory; dir
+ everything inside)
…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).
Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:
- Models that respect tool denials empirically tend to stop rather
than reach for the shell.
- The denial surfaces an audit trail when something tries to read
credentials — easier to spot in logs than a generic `cat`.
Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.
Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).
Co-authored-by: Tom Qiao <zqiao@microsoft.com>
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.
Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.
Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.
Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of
OpenAI-compatible servers) follow the OpenAI spec strictly and require
tool message `content` to be a string — they reject our list-type
content (text + image_url parts) with HTTP 400 'text is not set' /
'tool message content must be a string'.
Instead of an allowlist of known-good providers (maintenance burden,
guaranteed to miss aggregators like OpenRouter where the underlying
model determines support, not the aggregator name), this lands a
reactive recovery:
1. New `FailoverReason.multimodal_tool_content_unsupported` with a
small pattern list covering the common 400 wordings.
2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API
message list, downgrades any `role:tool` message whose content is
list-with-image to a plain text summary (preserves text parts) in
place, AND records the active (provider, model) in a session-scoped
`_no_list_tool_content_models` set.
3. `_tool_result_content_for_active_model` short-circuits to a text
summary when (provider, model) is in the cache — so after the first
400 + retry, subsequent screenshots in the same session skip the
round trip entirely.
4. Retry hook in `agent.conversation_loop` mirrors the existing
`image_too_large` recovery: detect the reason, run the helper,
retry once, fall through to the normal error path if no list-type
tool content was actually present.
Cache is transient (per-session) by design — next session retries in
case the provider added support, no persistent state to maintain.
Fixes#27344. Closes#27351 (allowlist approach superseded by reactive
recovery).
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.
Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).
Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.
Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.
Gate condition (matches the natural meaning of enabled_toolsets):
None → no filter, inject (backward compat)
contains 'memory' → user opted in, inject
otherwise (including []) → skip injection
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.
Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.
Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.
E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
Before: _find_all_skills() returned ~3 skills.
After: _find_all_skills() returns 84 skills including
github-pr-workflow, google-workspace, github-auth,
huggingface-hub. Apple-only skills remain excluded.
Non-Termux Linux/macOS/Windows behavior unchanged (verified).
Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
* fix(skills): skip dependency dirs in skill scan
* fix(skills): widen sibling rglob scanners to use shared exclusion set
Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.
Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.
Sites touched:
agent/curator_backup.py - skill backup file count
gateway/run.py - disabled-skill response (2 sites)
hermes_cli/dump.py - skill count in env dump
hermes_cli/profile_describer.py- profile description (2 sites)
hermes_cli/profile_distribution.py - profile install count
hermes_cli/profiles.py - profile skill count
hermes_cli/skills_hub.py - category detection
tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper)
tools/skill_usage.py - usage tracking + skill dir lookup (2 sites)
tools/skills_hub.py - optional skills find + scan (2 sites)
tools/skills_sync.py - bundled skills sync
E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.
* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep
---------
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.
This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.
Example config:
```yaml
custom_providers:
- name: gemma-local
base_url: http://localhost:8080/v1
model: google/gemma-4-31b-it
extra_body:
enable_thinking: true
reasoning_effort: high
```
For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:
```yaml
extra_body:
chat_template_kwargs:
enable_thinking: true
```
Changes:
- `hermes_cli/config.py`: preserve `extra_body` in normalized
`custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
as `request_overrides.extra_body` for named custom runtime resolution,
including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
entry by `base_url` (+ optional model) and merge its `extra_body` into
`AIAgent.request_overrides`, with caller-provided overrides winning on
conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
`extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
trailing-slash equivalence, fallback when no `model` field is set, and
caller-override merging precedence.
Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.
Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.
Closes#29022
* ci(tests): install ripgrep from prebuilt tarball instead of apt
apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.
- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
* fix(cli): compile syntax check to tempdir, not source __pycache__
`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:
1. Parallel test workers walking the same source tree (e.g. running
the suite under per-file process isolation) can race against each
other on the `__pycache__` write — manifests as flaky 'directory
not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
that the next interpreter run might pick up — fine when the
interpreter version matches, sketchy if it doesn't.
Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
* test(runner): per-file process isolation, drop manual state reset + xdist
Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.
Key changes:
* scripts/run_tests_parallel.py — new runner: discovers test files,
runs N in parallel via ThreadPoolExecutor, captures stdout per file,
treats exit code 5 (no tests collected) as pass, kills all children
on exit. Change from cpu_count to cpu_count*2. The runner is
I/O-bound (waiting on subprocess.communicate() from pytest children)
The parent process does almost no CPU work, so 2x oversubscription
keeps more pipes full. When a file fails, immediately show the last
30 lines of pytest output (stack traces + FAILED summary) plus a
ready-to-copy repro command:
python -m pytest tests/agent/test_auxiliary_client.py
* scripts/run_tests.sh — delegates to run_tests_parallel.py
* .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
* pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
* tests/conftest.py — remove ~200 lines of manual state-reset fixtures
* AGENTS.md — update Testing section for per-file design
* test(runner): speed gateway test antipattern scan up
* fix(test): web search provider plugin test missing xai
* fix(tests): make 14 test files pass under per-file subprocess isolation
Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:
Tool registry not populated:
- test_video_generation_tool_surface_matrix: add discover_builtin_tools()
- test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
registering all 8 bundled web providers, reset after each test
- test_website_policy: same provider registration pattern
- test_web_tools_tavily: same pattern across 3 dispatch test classes
- Also add is_safe_url/check_website_access mocks where SSRF check
blocks example.com (DNS resolution fails in isolated envs)
Stale check_fn cache:
- test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
in both kanban guidance tests (prior test cached False for kanban_show)
- test_discord_tool: cache invalidation in setup/teardown
- test_homeassistant_tool: invalidate_check_fn_cache() before registry queries
Module-level state pollution:
- test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
- test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
(ContextVar takes precedence over os.environ)
- test_dm_topics: overwrite sys.modules + separate telegram.constants mock
+ force-reimport of gateway.platforms.telegram
- test_terminal_tool_requirements: removed duplicate class declaration,
autouse _clear_caches fixture
* change(tests): run_tests.sh explicitly includes env vars
instead of manually dropping some vars, now we just only include some
* fix(tests): 5 more isolation/NixOS fixes
- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
(feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
/etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
(doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
profile deletion when copytree preserved read-only modes from
nix store
* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client
* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor
* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test
* fix: address PR #29016 review feedback
- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
shared helper (register_all_web_providers), replacing 8 copy-pasted
blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
fix worker count to match code (cpu_count*2)
* fix: eliminate race in stale-cache achievements test
The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
The contributor PR (#17936) only patched the strip path in
`_model_supports_vision()`. The auto-mode router in
`agent/image_routing._lookup_supports_vision` still only read models.dev,
so a custom-provider model declared as vision-capable would still get its
images routed through vision_analyze in the default `agent.image_input_mode:
auto` setting. Users had to set both `supports_vision: true` AND
`image_input_mode: native` to bypass the text pipeline.
Single-knob behavior now: `supports_vision: true` alone is enough in auto
mode. The strip path and the routing path consult the same resolver.
- Extract override resolution into `_supports_vision_override()` in
agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper
(DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted —
a common YAML mistake) no longer coerces to True via bool() truthiness.
Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1.
Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.
Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride,
TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing
contributor tests + image_routing + vision_native_fast_path +
native_image_buffer_isolation all green (92/92).