Salvages the transport-side fix from #32911 (@xxxigm). Closes#32892.
The openai SDK's responses.stream() / responses.parse() eagerly call
_make_tools(tools), which iterates tools without a None guard. Passing
tools=None raises TypeError: 'NoneType' object is not iterable before
any HTTP request is issued (openai==2.24.0).
PR #33042 already removed responses.stream() from our own Codex call
paths, so the specific iteration crash inside _make_tools is no longer
on the hot path. But the right API contract is to omit tools entirely
when there are no functions to expose — passing tools=None to the
backend is semantically wrong regardless of the SDK's iteration
behavior, and we'd hit it again on any future code path that hasn't
migrated off responses.stream().
This applies the transport-level part of @xxxigm's fix: move
'tools': response_tools into the if response_tools: branch so the
key is omitted when there are no tools, just like tool_choice and
parallel_tool_calls already are. Skips the run_agent.py-side
_strip_sdk_none_iterables helper from their PR — that path is now
obsolete because the SDK helper that needed defending is gone.
Tests
- tests/run_agent/test_codex_no_tools_nonetype.py: 6 tests trimmed
from @xxxigm's original 13-test file. Drops the obsolete tests for
_strip_sdk_none_iterables and _RecordingResponsesStream (helpers
that don't exist on main anymore), keeps the transport behavior
tests + the SDK contract sanity check that ensures we notice if
upstream ever fixes _make_tools(None).
- 6/6 passing locally.
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Salvages the intent of #33136 (@Brixyy) onto current main. The original PR
was written against the pre-refactor monolithic run_agent.py and added a
top-level _is_nonretryable_local_validation_error() helper. Both target
functions have since been extracted to agent/conversation_loop.py:2869,
so the salvage applies the equivalent guard inline at that canonical
location rather than reintroducing the helper.
## Why
After #33042 made our own Codex consumer structurally immune to NoneType
crashes, third-party shims, mocked clients, and any future code path that
hasn't migrated could still surface TypeError: 'NoneType' object is not
iterable as a wire-shape mismatch. The agent loop's classifier currently
treats ALL TypeError as a local programming bug and aborts non-retryable
— users on stale Telegram/gateway turns saw bare "Non-retryable error
(HTTP None)" with no recovery.
This is a provider/SDK shape mismatch, not a local programming bug. The
retry/fallback path should run, not be short-circuited.
## What
agent/conversation_loop.py: extend is_local_validation_error to exclude
TypeErrors whose message matches the NoneType-not-iterable shape (case-
insensitive, both "NoneType" and "not iterable" must appear).
tests/run_agent/test_jsondecodeerror_retryable.py:
- update the mirror predicate to match the production check
- add TestNoneTypeNotIterableIsRetryable class with 3 tests (the basic
shape, message variants, unrelated TypeErrors still abort)
- add TestAgentLoopSourceHasNoneTypeCarveOut to enforce the source-level
invariant matches the test mirror
## Validation
tests/run_agent/test_jsondecodeerror_retryable.py +
tests/run_agent/test_31273_402_not_retried.py → 14/14 passing
Co-authored-by: Brixyy <subrtt@gmail.com>
Closes#33368.
`_CodexCompletionsAdapter.create()` iterates `final.output` from the
Codex Responses stream. The event-driven consumer (introduced in #33042)
always sets `final.output` to a list, so this shape can't come from our
own code path. But:
- Mocked clients in tests can return a typed Response with `output=None`
- Third-party shims / compatibility layers that bypass the consumer can
do the same
- A future code path that wraps a different consumer could regress
The old code `getattr(final, "output", [])` returns `None` (not the
default `[]`) when the attribute EXISTS but is `None`. Iterating
`None` then raises `TypeError: 'NoneType' object is not iterable` —
the exact error logged by title-generation when this fires.
Fix: `getattr(final, "output", None) or []` — single-line defensive
coerce. Cheap; zero risk.
Regression test asserts the auxiliary path handles a final whose
`.output` is `None` (via monkey-patched consumer) without raising and
returns the expected chat.completions-shaped response.
Reporter: @pavegrid-1 (issue #33368).
Closes#33163.
When _try_activate_fallback() switches from one provider to another (e.g.
openai-codex → openrouter), the credential pool still belongs to the
primary provider. This causes two compounding bugs:
1. The pool retains the primary's base_url. Downstream pool recovery
(rate_limit / billing / auth) calls _swap_credential() with a primary
entry which overwrites the agent's base_url back to the primary's
endpoint. Every fallback request then 404s against the wrong host.
2. Pool recovery acting on errors from the FALLBACK provider mutates the
PRIMARY's pool state (#33088 reported a related corruption pattern),
exhausting/rotating entries that have nothing to do with the failure.
Two layered fixes:
a) try_activate_fallback (agent/chat_completion_helpers.py): on fallback
activation, clear agent._credential_pool when the fallback provider
doesn't match the pool's provider. Pool is preserved when the fallback
shares the pool's provider (e.g. multiple openrouter entries).
b) recover_with_credential_pool (agent/agent_runtime_helpers.py):
defensive guard rejects any pool mutation when agent.provider doesn't
match pool.provider. Defense-in-depth — should never fire after (a)
is in place, but covers any future path that attaches a stale pool.
Salvaged from @zccyman's PR #33217. The original PR was written against
the pre-refactor monolithic run_agent.py; both target functions have
since been extracted to module-level helpers. Behavior is identical —
the guards live in the canonical extracted locations.
Tests
- New tests/run_agent/test_fallback_credential_isolation.py (7 tests
covering: fallback clears mismatched pool, fallback preserves matching
pool, recovery rejects mismatched pool, recovery accepts matching
pool, 429-from-z.ai-doesn't-exhaust-codex-pool, _client_kwargs
base_url survives pool clear, _swap_credential doesn't restore
primary URL after fallback).
- Cross-verified: 77/77 passing across fallback isolation tests +
agent/test_credential_pool.py — no regression.
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Closes#33175.
switch_model() in agent/agent_runtime_helpers.py mutated agent.model and
agent.provider before rebuilding the client, with no try/except to restore
them on failure. If the rebuild raised (bad API key, network error,
build_anthropic_client failure, etc.) the agent was left with the new
model+provider name paired with the OLD client — producing HTTP 400s like
"claude-sonnet-4-6 is not supported on openai-codex" on the next turn.
Callers in cli.py, gateway/run.py, and tui_gateway/server.py already catch
the exception and warn the user, but the warning was misleading because
the swap had partially succeeded; the agent's state was torn.
Snapshot every mutated field before the swap, wrap the swap+rebuild block
in try/except, and restore the snapshot on failure before re-raising so
the caller's warning surfaces.
Reported by @amirariff91. Tests cover both branches (chat_completions and
anthropic_messages) and the cross-branch case (anthropic -> openai).
xAI's /v1/responses endpoint rejects service_tier with HTTP 400
"Argument not supported: service_tier" when users activate /fast mode.
Also add a safety-net strip_slash_enum call in _preflight_codex_api_kwargs
to catch any tool schemas that might slip through the caller-level
sanitization. xAI's Responses API grammar compiler rejects enum values
containing forward slashes (e.g. HuggingFace model IDs like
"Qwen/Qwen3.5-0.8B") with the opaque "Invalid arguments passed to the
model" error.
Fixes the root cause of "Invalid arguments passed to the model" errors
reported by xAI OAuth (SuperGrok) users.
reasoning.encrypted_content is sealed to the Responses endpoint that
minted it. When a session switches model providers mid-conversation —
say the user runs /model gpt-5.5 after several turns on grok-4.3, or
vice versa — the persisted codex_reasoning_items carry blobs the new
endpoint cannot decrypt, and every subsequent turn fails with HTTP 400
invalid_encrypted_content.
This is the cross-issuer prevention layer. Pairs with:
* PR #33035 — runtime recovery when the HTTP 400 fires anyway
* PR #33146 — prevention for transient rs_tmp_* items
Stamps each reasoning item with the issuer kind that minted it
(codex_backend / xai_responses / github_responses / other:<url>) at
normalize time, then drops items at replay time when the active
endpoint differs from the stamp. Unstamped (legacy) items pass
through for backwards compatibility.
Cherry-picked from @chaconne67's PR #31629. Conflict against current
main (#33035's replay_encrypted_reasoning parameter) resolved as
'keep both' — the two guards compose: replay_encrypted_reasoning=False
is the session-wide kill switch, current_issuer_kind is the per-item
filter that runs only when replay is still enabled.
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
* refactor(codex): drop SDK responses.stream() helper; consume events directly
The OpenAI Python SDK's high-level `client.responses.stream(...)` helper
does post-hoc typed reconstruction from the terminal
`response.completed.response.output` field. The chatgpt.com Codex
backend has been observed (today, gpt-5.5) to ship `response.output =
null` on terminal frames, which crashes the SDK with `TypeError:
'NoneType' object is not iterable` mid-iteration.
Carlton's #32963 patched the symptom by wrapping the helper in
try/except and recovering from the same per-event accumulator the SDK
was supposed to populate. This PR removes the helper from the call
path entirely: we now use `client.responses.create(stream=True)` (raw
AsyncIterable of SSE events) and assemble the final response object
ourselves from `response.output_item.done` events as they arrive. The
terminal event's `output` field is never read for content. Same
strategy OpenClaw uses for the same backend.
This makes Hermes structurally immune to the bug class, not patched.
The next time OpenAI ships a shape change to chatgpt.com's terminal
frame, our consumer keeps working because it doesn't read that frame
for content — only for usage/status/id.
Changes
- `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared
consumer; `run_codex_stream()` uses `responses.create(stream=True)`;
`run_codex_create_stream_fallback()` collapses into a thin alias
since the primary path now does what the fallback used to do.
- `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the
same consumer; old null-output recovery helpers deleted as
unreferenced.
- Tests migrated: fixtures that mocked `responses.stream` now mock
`responses.create` returning a raw iterable. New regression test
asserts the auxiliary path returns streamed items even when the
terminal event's `output` is literally `null`.
Validation
- Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex`
with `gpt-5.5` — response built correctly with `response.output=null`
on the terminal frame, all events consumed, usage/reasoning tokens
propagated.
- `tests/run_agent/test_run_agent_codex_responses.py` +
`tests/agent/test_auxiliary_client.py`: 242 passed.
* test+fix(codex): migrate streaming tests, raise on truncated streams
CI surfaced 10 test failures across tests/run_agent/test_streaming.py
and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had
their own `responses.stream(...)` mocks I missed in the first sweep.
agent/codex_runtime.py: _consume_codex_event_stream() now raises
"Codex Responses stream did not emit a terminal response" when the
stream ends without any terminal frame AND no usable content. This
preserves the signal callers used to get from the SDK's high-level
helper, which they distinguished from "completed with empty body"
in error handling.
Tests migrated:
- test_streaming.py: text-delta callback, activity-touch, and
remote-protocol-error tests all switch from mocking responses.stream
to responses.create returning an iterable of events.
- test_codex_xai_oauth_recovery.py: prelude-error tests are recast as
wire-error-event tests (the new path raises _StreamErrorEvent
directly when the wire emits type=error, which is strictly better
than the old two-phase "SDK RuntimeError → retry → fallback"). The
retry-on-transport-error test moves from responses.stream side-effect
to responses.create side-effect.
Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat()
through the full codex_responses path returns correctly, 319/319
targeted tests passing.
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144)
When an OpenAI-compatible Responses API surface accepts an initial
request but later rejects the replayed `codex_reasoning_items`
encrypted blob with HTTP 400 `invalid_encrypted_content`, the
session previously got stuck retrying the same poisoned payload.
Recovery: classify the error as a dedicated FailoverReason, and on the
first hit disable encrypted reasoning replay for the rest of the
session, strip cached items from message history, and retry once.
Changes:
* error_classifier: add FailoverReason.invalid_encrypted_content
branch in _classify_400 (before context_overflow so the messages
that mention 'encrypted content … could not be verified' don't trip
context heuristics), in _classify_by_error_code, and extend
_extract_error_code to peek inside wrapped JSON in error.message and
ignore the bare '400' as a code.
* agent_init: initialize `_codex_reasoning_replay_enabled = True` on
every agent.
* run_agent: add AIAgent._disable_codex_reasoning_replay() helper
that flips the flag and pops cached items.
* codex_responses_adapter: thread a `replay_encrypted_reasoning`
kwarg through _chat_messages_to_responses_input so that when the
flag is False we don't replay codex_reasoning_items.
* transports/codex.py: read `replay_encrypted_reasoning` from params,
thread it into the adapter, and gate the
`include=['reasoning.encrypted_content']` request hint on it.
* chat_completion_helpers: pass the agent's replay flag through to
the transport.
* conversation_loop: in the retry loop, add an
invalid_encrypted_content recovery branch that fires once per
session, only when api_mode == codex_responses, only when replay is
still enabled, and only when at least one assistant message in
history actually carries cached reasoning items (otherwise the 400
has nothing to do with our cache and the normal retry path handles
it).
Tests:
* test_error_classifier: new wrapped-JSON _extract_error_code case;
new TestClassifyApiError cases proving the 400 is retryable with
no fallback, that the broad message match doesn't catch a generic
'parsed' message, and that the error code match is
case-insensitive.
* test_run_agent_codex_responses: end-to-end test of the recovery
branch firing once and disabling replay, plus a sibling test that
proves the branch does *not* fire (and the flag stays True) when
history has no cached reasoning items.
Salvages PR #10144 onto the post-refactor module layout
(error_classifier / codex_responses_adapter / transports/codex /
conversation_loop / agent_init) since the original diff was written
against the pre-refactor monolithic run_agent.py.
* chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage
---------
Co-authored-by: victorGPT <wuxuebin1993@gmail.com>
'hermes login' was removed (the command now just prints a deprecation
message and exits). The bundled hermes-agent SKILL.md, in-code error
messages, the tip rotation, the proxy adapters, and the docs site
still pointed agents and users at the dead command — so models loading
the skill kept running 'hermes login --provider openai-codex' and
getting a dead-end print.
Replacements use the canonical 'hermes auth add <provider>' surface
(or bare 'hermes auth' for the interactive manager).
Files:
- skills/autonomous-ai-agents/hermes-agent/SKILL.md (+ regenerated docs page)
- hermes_cli/tips.py (tip rotation)
- agent/google_oauth.py (gemini-cli error message)
- agent/conversation_loop.py (nous re-auth troubleshooting line)
- agent/credential_sources.py (docstring)
- hermes_cli/proxy/cli.py + hermes_cli/proxy/adapters/nous_portal.py (proxy auth hints)
- tests/hermes_cli/test_proxy.py (updated assertions)
- website/docs/reference/faq.md, website/docs/user-guide/features/subscription-proxy.md
- zh-Hans i18n mirrors for the above
'hermes logout' is still a live command and is left untouched.
The 'hermes login' stub in hermes_cli/auth.py:login_command() and
the cli-commands.md 'Deprecated' rows are intentionally kept as
the discoverable deprecation surface.
SubdirectoryHintTracker was scanning directories outside the active
working directory, allowing files like ~/.codex/AGENTS.md or
~/.claude/CLAUDE.md to be loaded and injected into the agent context.
This causes cross-agent context contamination and instruction mixup.
Add _is_ancestor_or_same() helper and a path boundary check in
_is_valid_subdir(): only directories within the working directory tree
(i.e. path.is_relative_to(working_dir)) are allowed.
Also add exist_ok=True to mkdir() calls in new tests to prevent
pytest-xdist race conditions when workers share the same tmp_path parent.
Tests added:
- test_outside_working_dir_rejected: verifies sibling dirs are blocked
- test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked
- test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected
- test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace
When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude
Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY
to ~/.hermes/.env and zeros ANTHROPIC_TOKEN. That env-var pattern is the
user's explicit choice of auth method — API key, not OAuth.
But the anthropic credential pool's autodiscovery (_seed_from_singletons)
unconditionally read ~/.claude/.credentials.json from the Claude Code CLI
and any saved hermes_pkce creds, and added them to the SAME anthropic
pool as the user's API key. Two problems:
1. Even with the API key at higher priority, a 401/429 on the API key
would rotate the session onto an autodiscovered OAuth credential,
silently flipping the agent into the Claude Code masquerade
mid-conversation: 'You are Claude Code' system block, every tool
renamed to mcp_*, claude-cli User-Agent header.
2. Switching OAuth → API key at `hermes setup` cleared the env vars
but left previously-seeded OAuth entries dormant in auth.json,
where rotation could revive them.
The user picking the API-key path is explicitly opting OUT of the
masquerade. Mixing OAuth credentials into their pool defeats that
choice.
Fix: in `_seed_from_singletons` for provider='anthropic', detect the
API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and:
- Skip calling read_claude_code_credentials() and
read_hermes_oauth_credentials() entirely
- Prune any stale hermes_pkce / claude_code entries that may already
be in the on-disk pool
OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery
continues to fire as before.
Tests: 3 new regression tests (api-key skips autodiscovery, api-key
prunes stale entries, oauth path still autodiscovers). Full file 70/70.
The outer 'except Exception' guard in run_conversation() captures
exceptions raised inside the agent loop (during streaming, tool
dispatch, message construction, etc.) and prints a one-line summary
to the screen. The traceback was only logged at DEBUG, so it never
landed in errors.log (WARNING+) and was lost.
For intermittent failures — the most important kind to debug — users
saw 'Error during OpenAI-compatible API call #N: <message>' on
screen with no way to recover the call site. Switching to
logger.exception() emits the full traceback at ERROR so it goes to
both agent.log and errors.log automatically.
This is a pure logging change; control flow is unchanged.
Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:
1. tools/threat_patterns.py — single source of truth for injection/promptware
patterns. Replaces the duplicated pattern lists in prompt_builder.py and
memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
override, known framework names). Three scopes — 'all' (narrow, classic
injection), 'context' (adds promptware/role-play, broader detection),
'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).
2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
frozen system-prompt snapshot. Live state keeps the original so the
user can still inspect + remove via memory(action=read/remove). Scan is
deterministic from disk bytes — prefix-cache invariant holds.
3. make_tool_result_message() wraps results from high-risk tools
(web_extract, web_search, browser_*, mcp_*) in
<untrusted_tool_result source="...">...</untrusted_tool_result>
delimiters with framing prose telling the model the content is data,
not instructions. Architectural defense against indirect injection
from poisoned web pages, GitHub issues, MCP responses — does NOT
regex-scan tool results (pattern arms race + per-iteration latency).
Multimodal content lists pass through unwrapped to preserve adapter
compatibility.
Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.
Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
path, blocked from MEMORY.md snapshot, wrapped in delimiters when
arriving via web_extract. Legitimate 'you must follow conventions'
phrasing not flagged.
Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
block-with-placeholder — there's no warn mode that makes sense)
Closes#496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.
xAI retired grok-4-1-fast. hermes_cli/models.py already removed it from
the static fallback in an earlier commit, but the context-length
metadata, the tests pinning those values, and the provider doc still
referenced the retired ID. Clean those up so retired model names stop
appearing in user-facing output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After key #1 is marked exhausted the retry still called the API with key #1
due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials.
Fix: peek the pool and pass the active entry's key as explicit_api_key.
Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry
under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError;
extract_api_error_context parses "Resets in Xhr Ymin".
Closes#26145.
When the user interrupts the retry loop between two 429s (Ctrl-C in
interactive mode, /new, gateway disconnect), the local has_retried_429
flag dies with the recovery function. On the next user prompt the agent
restarts with has_retried_429=False, hits 429 on the exhausted credential,
sets the flag, returns 'retry once'. Repeat forever — the second 429 that
would trigger rotation is never reached, and healthy entries (priority>0
free/paid accounts) are never tried.
Fix: in recover_with_credential_pool's rate_limit branch, pre-check
pool.current().last_status before running the retry-once dance. If the
current entry is already STATUS_EXHAUSTED, rotate immediately. Uses
getattr() for the attribute read so existing tests with SimpleNamespace
mocks (which only set 'label') keep working.
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path),
but the non-retryable-401 guidance branch only covered openai-codex and
xai-oauth. A Nous 401 fell through to the generic 'Your API key was
rejected... run hermes setup' message, which is wrong advice — the user
needs hermes auth add nous --type oauth, not an API key.
Also flag the case where the failing model slug ends in :free (OpenRouter
syntax) while provider is nous. Without that hint, users re-OAuth
successfully and then hit the same 401 on the next message because Nous
Portal doesn't carry the OpenRouter free-tier slug.
Reported by ashh — debug dump showed Nous device_code exhausted +
deepseek/deepseek-v4-flash:free as the model.
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override. The
expectation in that case is universal: 'use my main model for side
tasks too.'
Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend. That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.
The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):
1. caller-passed model (caller knew what they wanted)
2. provider's catalog default (cheap aux model, if registered)
3. user's main model from config.yaml
Behaviour by provider class:
- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
step 3 applies. Title gen runs on grok-4.3 / gpt-5.4 against the
user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
default wins at step 2, preserving the original 'cheap aux model'
behaviour. Anthropic users still get claude-haiku-4-5 for titles,
not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
callers) — caller wins at step 1, no surprise switching.
Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically. The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.
4 new tests in TestResolveProviderClientUniversalModelFallback:
- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks
365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.
Co-authored-by: wysie <wysie@users.noreply.github.com>
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode
where it accepts the connection but never emits a single stream event — the
socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP
status), and a fresh reconnect then succeeds in ~2s. Today the only guard is
the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival
connection is held for the full stale window (90-900s depending on context /
config) before the retry loop can reconnect — minutes of wasted wall time per
stall, at a rate of ~20% of calls during affected windows.
Add a TTFB watchdog scoped to the codex_responses path:
- codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on
*every* stream event (not just output-text deltas), so reasoning-only and
tool-call-only turns are not mistaken for a stall.
- interruptible_api_call resets that marker before the worker starts and, while
it is still None, kills the connection once elapsed exceeds the TTFB cutoff
(default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The
raised TimeoutError flows through the existing retry path unchanged.
Once any event has arrived the stream is healthy and only the existing
wall-clock stale timeout applies, so legitimate long generations are never
interrupted. Gated to codex_responses; the chat_completions non-stream,
anthropic and bedrock branches have no first-event signal and are untouched.
Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the
events-flowing pass-through, and the env-disable path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998)
When a stream stalls mid-tool-call (e.g. a large write_file), the
partial-stream-stub recovery used finish_reason='stop' which caused the
conversation loop to treat the turn as complete, returning only the
warning text. When users said 'continue', the model retried the same
large tool call, hit the same stale timeout, and looped indefinitely.
Changes:
- chat_completion_helpers.py: change _stub_finish_reason from 'stop' to
'length' for mid-tool-call partials. The stub still has tool_calls=None
so no tool auto-executes — the model gets a fresh API call through the
existing length-continuation machinery (bounded to 3 retries).
Also attach _dropped_tool_names to the stub for downstream use.
- conversation_loop.py: add a third continuation prompt branch for
partial-stream-stubs with dropped tool calls. Instead of the generic
'continue where you left off' (which would retry the same large call),
tell the model to break the output into smaller tool calls (~8K
tokens each) to avoid stream timeouts.
- test_partial_stream_finish_reason.py: update existing test from
finish_reason='stop' to 'length', add _dropped_tool_names assertion,
add new test_dropped_tool_call_uses_chunking_prompt for the 3-way
prompt branching.
Safety: tool_calls=None is preserved on the stub, so the conversation
loop enters the text-continuation branch (line 1513), NOT the tool-call
execution branch (line 3246). No tool auto-executes. The model simply
gets another API call with targeted guidance.
* refactor: extract constants and continuation prompt helper
- Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID,
FINISH_REASON_LENGTH)
- Extract _get_continuation_prompt() in conversation_loop.py — DRYs the
3-way prompt branching and lets tests import the real function
- Trim verbose inline comments in chat_completion_helpers.py
- Tests import constants + helper instead of duplicating logic
---------
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.
This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend
via codex_responses api_mode — that substitutes the generic timeout
message with actionable text naming the gpt-5.4-codex workaround and
pointing at #21444 for symptom history.
Changes:
- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
Returns ``None`` for any request that does not match all three guards
(codex_responses api_mode, openai-codex provider or chatgpt.com Codex
base URL, gpt-5.5-family model name with word-boundary regex anchoring
to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
consults the hint via ``getattr(...)`` so the call site stays robust
if the helper is ever removed or stubbed in tests. Hint is appended to
both the ``_emit_status`` warning and the ``TimeoutError`` message so
the user sees it in their terminal AND it lands in any retry-loop
diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
gpt-5.5-codex SKU, model=None fallback to self.model) and negative
cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
non-codex api_mode, non-codex provider, empty/None model, unrelated
models on Codex).
Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.
Closes#22046
get_read_block_error() only blocked internal Hermes cache files but
allowed reading project-local secret-bearing environment files (.env,
.env.production, .env.local, etc.) through both read_file and ACP
fs/read_text_file paths.
Add a basename deny set for common secret-bearing .env variants.
.env.example remains readable as documentation.
Fixes#20734
* perf(bitwarden): persist secret-fetch cache across CLI invocations
Every `hermes` invocation paid a ~380ms tax for `bws secret list` to
Bitwarden Secrets Manager because the existing cache was in-process only.
Back-to-back `hermes chat -q`, gateway-spawned agents, and cron-launched
runs all re-fetched.
Adds a disk-persisted L2 cache at `<hermes_home>/cache/bws_cache.json`
(mode 0600, never contains the access token — only the SHA-256
fingerprint prefix). Same TTL as the in-process cache. Read on miss,
write on bws success, ignored on key mismatch / corruption / expiry.
Measured on a startup profile:
load_hermes_dotenv() cold: 372ms → warm (disk cache hit): 20ms
End-to-end `hermes --version` cold→warm: 666ms → ~295ms.
In a hermes-vs-codex benchmark across 11 single- and multi-turn tasks
(framework overhead = wall − llm − tool_exec, median over 3 trials):
cohort before after saved
single-turn (median) 2.96s 2.31s -0.65s
multi-turn (5-turn) 9.40s 8.95s -0.45s (≈0.3s/turn)
Hermes now wins head-to-head on 6/11 tasks vs codex (was 4/11 before).
The remaining ~0.6s single-turn delta is mostly Python's own import
cost in hermes_cli.main, which is a separate optimization.
* perf(cli): lazy-load model catalog + dedupe config.yaml reads at startup
Two import-time wins on top of the bws disk-cache fix:
1. Lazy-load `hermes_cli.models._PROVIDER_MODELS` via PEP 562
module-level `__getattr__`. The catalog is ~55ms of work that was
eagerly imported on every CLI invocation (line 4557 `if not
_is_termux_startup_environment(): from hermes_cli.models import
_PROVIDER_MODELS`). Audit showed every internal call site already
does its own function-local import; only test code reads
`hermes_cli.main._PROVIDER_MODELS` as a module attribute, and
__getattr__ keeps that working transparently. First access triggers
the import once and caches the result on the module via
`globals()[name] = ...`, so subsequent reads are dict lookups.
2. Dedupe the double config.yaml read in the top-of-module bootstrap.
Previously: one raw yaml.safe_load for the `security.redact_secrets`
bridge, then a separate full `load_config()` (with deep-merge) for
`network.force_ipv4`. Both keys come from the same file. Merged
into one raw yaml load.
Combined with the bws cache fix in the previous commit:
hermes --version wall time:
original (cold): 666 ms
after bws fix (warm): 295 ms
after lazy-load + dedupe: 228 ms (-67 ms additional, -66% from original)
Tests:
- tests/hermes_cli/test_api_key_providers.py: 173/173 pass
(lazy __getattr__ correctly handles
`from hermes_cli.main import _PROVIDER_MODELS`)
- tests/test_ipv4_preference.py + tests/hermes_cli/test_redact_config_bridge.py +
tests/agent/test_redact.py: 93/93 pass (dedupe preserves both bridges)
- tests/test_bitwarden_secrets.py + env_loader tests: 49/49 pass
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:
1. The non-stream stale-call detector estimated context tokens from
``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
their conversational load in ``input`` (with ``instructions`` and
``tools``), so every Codex turn logged ``context=~0 tokens`` and the
detector never applied its >50k / >100k tier bumps.
2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
main Codex path. The chat_completions path and the auxiliary Codex
adapter both forwarded it; the main path skipped it through three
places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
``_preflight_codex_api_kwargs``).
3. The streaming stale detector had the same payload-shape bug for
``codex_responses`` requests, which route through the non-streaming
detector (it's the path that emits the user-facing
"No response from provider for 300s (non-streaming, ...)" warning that
reporters keep pasting).
This commit:
- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
used by both the non-stream and stream detectors. Handles ``messages``
(Chat Completions), ``input + instructions + tools`` (Responses API),
bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
and ``_preflight_codex_api_kwargs`` (with guards against
zero/negative/inf/bool values), and wires
``_resolved_api_call_timeout()`` into the Codex branch of
``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
kick in faster when upstream stalls:
* base 300s -> 90s
* >50k 450s -> 150s
* >100k 600s -> 240s
These only apply when the user has *not* set
``providers.<id>.stale_timeout_seconds`` or
``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
context-tier scaling, transport timeout pass-through, and preflight
timeout pass-through / rejection of invalid values.
Closes#21444
Supersedes #21652#24126#31855
Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
Follow-up to @someaka's fix.
Polish:
- Drop the redundant `_preflight_tokens >= threshold_tokens` clause.
`should_compress(tokens)` already short-circuits when tokens < threshold,
so the explicit comparison was dead code on the True branch.
Tests:
- Preflight: pin that should_compress() is called (anti-thrash has a vote).
Mocks should_compress to return False even with tokens past the raw
threshold and asserts no compression runs — exact bug shape from #29335.
- Gateway: AST scan of gateway/run.py asserts every
`session_entry.session_id = ...` assignment is followed by a
`session_store._save()` call within the same block. Three sites mutate
the session_id after compression; all three must persist or the next
turn loads the pre-compression transcript and re-loops. Empirically
verified the test catches the bug (drops the new _save() line → red).
AUTHOR_MAP:
- Map ed@bebop.crew -> someaka so the salvaged commit resolves to
@someaka in release notes.
Three compounding root causes:
A) run_conversation() result dict missing session_id — gateway's
dead-code guard at gateway/run.py:8700 never triggers
B) preflight compression bypasses should_compress() anti-thrashing —
re-triggers every turn when tool schemas dominate token budget
C) gateway updates session_entry.session_id in memory but doesn't
persist via session_store._save()
Fixes: #29335
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them. The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image. Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.
Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal. Mirrors the audio_cache pattern used by text_to_speech.
Wires save_url_image into both the xAI and OpenAI providers' URL
branches. When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.
Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
in-process HTTP server: bytes round-trip, content-type → extension,
URL-suffix fallback, default-to-png, 404 propagation, empty-body
refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
test_successful_url_response (was asserting the bug), add
test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.
160/160 in the broader image_gen test surface.
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.
The hook covers cases the shell-template grammar can't reasonably
express:
- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows
None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.
## Resolution order
The dispatcher's resolution order is the load-bearing invariant:
1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
→ command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
→ plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
the `hermes tools` row list defensively.
Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
`_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
config defensively so a refactor of the caller can't silently break
the invariant.
## New files
- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
`list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
`voice_compatible` (all optional with sane defaults). Mirrors
`agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
`agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
## Modified files
- `hermes_cli/plugins.py` — `register_tts_provider()` method on
`PluginContext`. Matches the gating shape of
`register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
`_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
plugin rows into the Text-to-Speech picker category alongside the
10 hardcoded built-in rows.
## Tests
- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
`tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
built-in-always-wins, command-wins-over-plugin, plugin dispatch,
exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
picker surface, builtin shadowing defense, integration with
`_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
parity harness vs `origin/main`. The only intentional diff is
`fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification
- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
`text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
## Docs
- `website/docs/user-guide/features/tts.md` — new "Python plugin
providers" section with a decision table (command-provider vs
plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
mention both surfaces (command-provider primary, plugin for
SDK/streaming).
Closes#30398
Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:
1. agent/chat_completion_helpers.py :: build_assistant_message
- Redact assistant content before the message dict is constructed
(catches PATs / API keys the model inlines into natural language)
- Redact tool_call.function.arguments at the same site (catches secrets
inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
Tool execution uses the raw API response object, not this dict, so
redacting the persisted shape is safe.
2. run_agent.py :: _save_session_log
- Add _redact_message_content() static helper that handles both string
content and OpenAI/Anthropic multimodal list-of-parts (image parts
pass through untouched, only text/content fields are redacted)
- Apply to every message + the cached system prompt before writing
session_*.json
Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.
Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
- api key in tool content
- api key in user message
- api key in system prompt
- multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.
Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.
Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).
Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
_write_claude_code_credentials wrote ~/.claude/.credentials.json via
Path.write_text + replace + post-write chmod(0o600). Both the temp file
and the destination briefly inherited the process umask (commonly 0o644
= world-readable) between create/replace and chmod, exposing the OAuth
access/refresh tokens to other local users on multi-user hosts.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the temp file is created atomically at 0o600. After os.replace,
the destination inherits the temp's mode, so the post-write chmod is no
longer needed. The temp name also gains a per-process random suffix to
avoid collisions between concurrent writers and stale leftovers from a
crashed prior write.
Parent dir (~/.claude/) is owned by Claude Code itself and shared with
its native auth, so we deliberately don't tighten its mode here (unlike
the mcp_oauth fix which owns its own subtree under HERMES_HOME).
Mirrors the fix shipped for agent/google_oauth.py in #19673 and the
parallel fix for tools/mcp_oauth.py in #21148.
Adds a regression test in TestWriteClaudeCodeCredentials asserting the
resulting file mode is 0o600 (skipped on Windows where POSIX mode bits
aren't enforced).
The write denylist already protects SSH keys, AWS, GPG, npm, PyPI,
Docker, Azure, and GitHub CLI credentials. Two common credential
stores were missing:
~/.git-credentials stores plaintext git tokens in the format
https://username:token@github.com when using git credential-store.
It is directly analogous to ~/.netrc which was already protected.
~/.config/gcloud/ contains Google Cloud OAuth tokens and service
account credentials. It is directly analogous to ~/.aws/ which
was already protected.
Under prompt injection, an agent could be instructed to overwrite
these files, destroying credentials or planting malicious ones.
Verified before and after with is_write_denied() on both paths.
The gateway pairing directory (~/.hermes/pairing/) stores per-platform
access-control files (telegram-approved.json, discord-approved.json, etc.).
A prompt-injected agent using write_file could add arbitrary user IDs to an
approved file, granting persistent gateway access without going through the
pairing code flow — the same threat class that motivated protecting
webhook_subscriptions.json (#14157).
The pairing directory was not included in the original control-plane protection
because it postdates PR #14157. PR #30383 introduced the hashed-pending schema
and made the approved files the sole source of truth for gateway access, raising
the security sensitivity of the directory.
Apply the same mcp-tokens pattern: block writes to pairing/ and any path within
it, under both the active hermes_home and the root path (for profile-mode parity
with the fix in #30382).
Regression tests verify denial for pairing/telegram-approved.json,
pairing/discord-pending.json, and the directory itself, in both normal and
profile-mode layouts.
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without
this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool
name in step 3, so a native MCP server tool registered as
'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The
incoming strip only removes ONE prefix, which still worked on first
call, but on subsequent calls the model pattern-matched the
single-prefixed form from message history and produced names that
stripped to 'composio_X' — registry miss, dispatch fail.
The history-rewrite block (#4) already has this guard. Apply the same
guard to the schema-rewrite block (#3) so round-trip is symmetric.
Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass.
Author map: hayka-pacha added for PR #25270 salvage attribution.
Refs GH-25255.
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response
unconditionally stripped the mcp_ prefix from ALL tool names starting
with mcp_. This broke Hermes-native MCP server tools (registered under
their full mcp_<server>_<tool> name in the registry) because the stripped
name doesn't match any registry entry.
Fix: check the tool registry before stripping. Only strip when:
- The stripped name EXISTS in the registry (OAuth-injected tool)
- The full name does NOT exist in the registry
This preserves backward compatibility for OAuth-injected tools while
protecting native MCP server tools from incorrect prefix removal.
7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp,
unknown tools, mixed responses, and dual-registration edge case.
Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com>
Standard OpenAI returns request-validation failures (unknown/
unsupported parameter, malformed request) as 4xx. Some
OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me
returns 502 for an unknown parameter.
The generic '5xx -> retryable server_error' rule then misfires: the
error is deterministic (every retry gets the identical rejection), so
the retry loop burns all 3 attempts, the transport-recovery path
resets the counter and burns 3 more, and the result is a request
flood against a request that can never succeed.
Fix: when a 500/502 body carries an unambiguous request-validation
signal — 'unknown parameter' / 'unsupported parameter' /
'invalid_request_error' in the message text, or invalid_request_error
/ unknown_parameter / unsupported_parameter as the structured error
code — classify as a non-retryable format_error so the loop fails
fast and falls back. Genuine 502 Bad Gateway with no such signal
stays retryable as before.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.
chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:
502: [ObjectParam] [input[617]._empty_recovery_synthetic]
[unknown_parameter] Unknown parameter:
'_empty_recovery_synthetic'
Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.
Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.
Origin: local-author
Upstream-PR: none
Patch-State: local-only