Commit graph

1484 commits

Author SHA1 Message Date
Teknium
c1c179a239
fix(security): redact secrets in background process + foreground env-dump output (#43025) (#54149)
* fix(security): redact secrets in background process + foreground env-dump output

Terminal-output redaction was incomplete (#43025):

- Gap 1: process(action=poll/log/wait) returned background stdout verbatim —
  no redaction at all. A background printenv/server/test emitting a key leaked
  raw to the model, session.db, and CLI display. Same for the gateway
  background-process watcher's completion/progress notifications.
- Gap 2: the foreground terminal path hardcoded code_file=True, which skips the
  ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv
  leaked even there.

Adds agent.redact.redact_terminal_output(output, command) as the single policy
for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/
declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens;
other commands stay on code_file=True to avoid false positives on source dumps.
Wired into terminal_tool, process_registry (_handle_process boundary), and the
gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved.

* docs: add infographic for #43025 terminal-output redaction fix
2026-06-28 02:44:21 -07:00
teknium1
bbe1bf4045 fix(agent): stop redacting tool-call args in history; fix auth-header quote-eating
Two related redaction bugs from #43083:

1. build_assistant_message redacted tool-call arguments in-memory. That dict
   feeds both the replayed conversation history and state.db (which is itself
   replayed verbatim on session resume), so the model read back its own
   PGPASSWORD='***' psql call and copied the placeholder, breaking every
   credential-dependent command on the second turn. The masking gave no real
   protection either — the same secret still leaks through tool OUTPUT. Remove
   it. Keeping secrets out of the replayable store is a separate
   tokenization/vault concern (security.redact_secrets still governs
   storage-time redaction elsewhere).

2. _AUTH_HEADER_RE's greedy \S+ credential class ate a closing quote when the
   token sat flush against it (Authorization: Bearer sk-.."), turning value
   corruption into syntax corruption (unterminated quote -> shell EOF /
   SyntaxError). Exclude " and ' from the token class; real credentials never
   contain them.

Closes #43083.
2026-06-28 02:44:06 -07:00
teknium1
aa50c1ba5d fix(prompt): repair backend probe import (get_environment never existed)
The system-prompt backend probe imported a nonexistent symbol —
`from tools.environments import get_environment` — which always raised
ImportError: cannot import name 'get_environment'. The exception is caught
and only drops the live backend description to a static fallback, so it is
cosmetic, but it broke the live OS/user/cwd probe for every non-local
backend (docker/singularity/modal/daytona/ssh).

The real factory is `_create_environment` in tools.terminal_tool. Build the
environment the same way the live terminal path does (select backend image,
assemble ssh/container config from _get_env_config()), then run the probe.

Note: this does NOT affect tool loading — tool selection runs each tool's
check_fn and never consults this probe. Regression from #52147 (2026-06-25).

Closes #53667 (probe import); the 'cronjob-only' tool-collapse symptom is
not reproducible — tool selection has no probe dependency and memory's
check_fn is unconditionally True.
2026-06-28 02:41:31 -07:00
Teknium
674e16e7c6
fix(redact): stop DB-connstr redaction from corrupting code output (#33801) (#54061)
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):

_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.

Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
  the match can never span a line break. A real DSN password never contains
  whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
  expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
  not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
  sites (file_tools already did) so code-execution output isn't corrupted by
  ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
  private keys are still redacted.

Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.

Reported-by: koishi70
Reported-by: pfrenssen
2026-06-28 01:15:39 -07:00
teknium1
578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00
595650661
b8e2268628 fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.

Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.

Refs #32421
2026-06-28 01:15:21 -07:00
sweetcornna
2701ea2f0c fix(agent): reopen fallback chain after primary recovery 2026-06-28 00:57:42 -07:00
Gille
9229d0db17 fix(moa): preserve Nous provider identity for references 2026-06-28 00:47:15 -07:00
Teknium
7c38249c79
feat(moa): references see full tool state + fire on every user/tool response (#54016)
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.

Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
  rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
  reference gives an informed judgement on the real current state. Still emits
  zero tool-role messages and zero tool_calls arrays (strict providers reject
  those), and large tool results are previewed head+tail (4000-char budget).
  The required end-on-user shape is met by APPENDING a synthetic advisory user
  turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
  new tool result — instead of once per user turn. The state-sensitive advisory
  signature drives the cache: new tool result = miss (re-run), identical-state
  re-call = hit (no re-run, no re-emit).

The acting aggregator still receives the full, untrimmed transcript.
2026-06-28 00:30:11 -07:00
Teknium
1fa44180b0
fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
infinitycrew39
e860a40e14 fix(agent,gateway): surface partial-stream recovery and bound detached restart
Salvage of NousResearch/hermes-agent#41498 (0-CYBERDYNE-SYSTEMS-0).

- Leave response_previewed false on partial_stream_recovery so gateway
  fallback delivery can send the recovered fragment plus explanation.
- Always append the turn-completion explainer for partial_stream_recovery,
  not only for empty or very short fragments (#34452 gap).
- Launch the detached /restart helper before drain, idempotently, with a
  bounded wait of restart_drain_timeout + 5s.
2026-06-27 22:03:14 -07:00
HexLab98
073847c0f2 fix(auxiliary): use env-only proxy policy for OpenAI SDK clients (#53702)
Auxiliary clients now inject a keepalive httpx transport with explicit
HTTPS_PROXY/NO_PROXY resolution, matching the main agent. This avoids
macOS system proxy settings (which omit the ExceptionsList) breaking
vision and other auxiliary calls to internal provider endpoints.
2026-06-27 21:22:49 -07:00
Teknium
a8c862900b
fix(tui): sanitize replay history on WebUI/TUI session resume (#29086) (#53939)
A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill,
interrupt, or process restart before the tool result was written) persists a
dangling assistant(tool_calls) or interrupted assistant->tool tail. The
messaging gateway already strips these tails before replay (the #49201 fix),
but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight
in as the agent's conversation_history with no cleanup. The model re-issued
the unanswered call on every resume -- including after a full WebUI + Gateway
restart, since the poison lives in the SessionDB, not memory -- leaving the
session permanently 'thinking'. Only deleting the session recovered it.

- Extract the two strippers + helper from gateway/run.py into a shared
  agent/replay_cleanup.py (sanitize_replay_history wraps both).
- gateway/run.py re-exports under the historical private names; messaging
  behavior unchanged.
- Both TUI cold-resume sites now sanitize the model-fed history while leaving
  the display transcript untouched, so the user still sees their full history.

Verified E2E against a real SessionDB: dangling and interrupted tails are
stripped from the model feed, healthy mid-progress tool sequences are
preserved, and the display transcript is always the full raw history.
2026-06-27 20:56:49 -07:00
Teknium
d43e0cf304
fix(agent): config-driven intent-ack continuation for all api_modes (#27881) (#53943)
* fix(agent): config-driven intent-ack continuation for all api_modes (#27881)

The agent could end a turn after only stating intent ('I will run a health
check...') without executing the announced tool call, forcing the user to
re-prompt. A continuation guard that catches this and nudges the model to
proceed already existed but was hard-gated to the codex_responses api_mode,
so Gemini/Claude/OpenRouter turns never benefited.

- New agent.intent_ack_continuation config (default 'auto' = codex-only,
  byte-stable for existing conversations). 'true'/model-list opts every
  api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape.
- looks_like_codex_intermediate_ack gains require_workspace (default True).
  The opted-in path drops the codebase/filesystem requirement so general
  autonomous workflows (server ops, deploys, API calls) are caught, not just
  coding tasks. Future-ack + action-verb + short-content + no-prior-tool
  guards still apply; the 2-nudge-per-turn cap is unchanged.
- Resolution centralized in intent_ack_continuation_mode (off/codex_only/all).

* docs(infographic): intent-ack continuation (#27881)
2026-06-27 20:46:00 -07:00
teknium1
9c6229ce24 fix(security): centralize credential-safe subprocess env (#29157)
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.

Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:

  - Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
    secrets -- stripped even for credential-inheriting children.
  - Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
    unless inherit_credentials=True. The opt-in is grep-able for audit.

Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.

This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.

Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
2026-06-27 20:45:31 -07:00
Jack Maloney
f0de4c6a47 fix(pool): re-select from credential pool on primary runtime restore
_restore_primary_runtime restored the construction-time api_key snapshot and
never consulted the credential pool. After the pool rotated away from a
revoked/exhausted entry mid-session, every new turn restored the dead key,
re-failed instantly, burned the remaining entries, and fell through to
cross-provider fallback.

After restoring the snapshot, re-select the pool's current best entry and
swap the live credential in via _swap_credential (which already rebuilds the
OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163
base_url / OAuth-detection fixes). Falls back to the snapshot key when the
pool is absent, empty, or the entry has no usable key.

Salvaged from #25206 onto current main: the original targeted the pre-refactor
monolithic method in run_agent.py; the logic now lives in
agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead
of re-inlining the client rebuild.

Fixes #25205
2026-06-27 20:04:45 -07:00
EloquentBrush0x
a0b9663c7c fix(streaming): rebuild Anthropic client on stream cleanup instead of OpenAI client
interruptible_streaming_api_call() has three connection-pool cleanup
sites that called _replace_primary_openai_client() unconditionally.
For api_mode=anthropic_messages this has two consequences:

1. _replace_primary_openai_client() fails (OPENAI_API_KEY unset on
   Anthropic-only configs), so dead connections are never purged.
2. The stale-stream detector's outer-poll site (L1977) is the only
   mechanism that can interrupt the worker thread while it blocks in
   for event in stream:. Because the Anthropic client is never closed,
   the thread stays blocked until the 900 s httpx read-timeout fires,
   producing a visible 15-minute hang for Telegram/gateway users on
   claude-opus-4-7.

Fix: mirror the existing interrupt-path pattern (L1989-1997) at all
three cleanup sites — if api_mode == "anthropic_messages", call
_anthropic_client.close() + _rebuild_anthropic_client() instead of
_replace_primary_openai_client(). _rebuild_anthropic_client() handles
both direct Anthropic and Bedrock-hosted Claude correctly, unlike the
inline build_anthropic_client() calls in open PR #14430.

PR #14430 (open) covers only the outer stale-detector site (L1977).
PR #23678 (open) covers only the inner retry sites (L1774, L1833).
This PR covers all three sites and uses _rebuild_anthropic_client()
for Bedrock parity.

Fixes #28161
2026-06-27 19:37:33 -07:00
Shashwat Gokhe
505bc27d8d fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats
A document attached alongside an image in the same Discord message was
swept into the vision pipeline and 400'd the whole turn ("Could not
process image"), and was simultaneously never surfaced to the agent as a
readable file. Restores the "any file type works" contract for mixed
messages and fixes the HTTP 400.

Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video
classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in
a PHOTO message landed in image_paths and poisoned the vision call. The
document context-note path was gated on message_type == DOCUMENT, so that
same doc never reached the agent at all. Now classification is
per-attachment (trust each attachment's own MIME; fall back to the
message-level type only when MIME is unknown), via shared _event_media_is_*
helpers used by both _build_media_placeholder and the main inbound loop.
The document note now fires for any non-image/audio/video attachment
regardless of message-level type.

Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic
400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now
transcodes those to PNG via Pillow before declaring media_type, skipping
cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow
can't rasterize it — so it's skipped rather than transcoded.

Closes #25935.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>
2026-06-27 19:26:04 -07:00
teknium1
0c372274cd fix(agent): disable OpenAI SDK auto-retry that double-fires inside the rate-limit loop
Same bug class as the Anthropic fix (#26293): the OpenAI/aggregator client is
built without max_retries, so the SDK default of 2 applies. The SDK's own 1-2s
backoff ignores Retry-After and retries inside hermes's outer conversation loop,
burning request slots against a rate-limited bucket. Set max_retries=0 at the
single create_openai_client chokepoint (covers init, switch_model, recovery,
restore, request-scoped). auxiliary_client builds its own clients and is not
wrapped by the loop, so it keeps SDK retries.
2026-06-27 19:23:15 -07:00
konsisumer
1ab35ba25d fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s
The Anthropic SDK clients were built without max_retries, so the SDK
default (max_retries=2) retried 429/5xx with its own backoff that ignores
Retry-After — double-retrying inside hermes's outer loop and burning
request slots against a bucket that won't refill for minutes. Set
max_retries=0 on all Anthropic/AnthropicBedrock client constructions so
the outer conversation loop (which already honors Retry-After) owns retry.

Also raise the Retry-After cap in the conversation loop from 120s to 600s.
Anthropic Tier 1 input-token buckets reset in ~171s, so the 120s cap made
hermes retry before the reset window and re-trip the limit.

Refs #26293
2026-06-27 19:23:15 -07:00
LeonSGP43
32732a8f83 fix(agent): cap same-entry credential refreshes so fallback can activate (#26080)
A persistent upstream 401 on a single-entry OAuth pool (common for Claude
Max subscribers) made the credential-pool recovery spin forever:
try_refresh_current() re-mints a fresh token and reports success on every
401, so recover_with_credential_pool returned True and the retry loop
continue'd without ever incrementing retry_count or reaching the
auth-failover block. The configured fallback_model never activated and the
agent appeared to hang.

Cap consecutive successful same-entry refreshes (keyed by provider +
pool-entry id) at 2; once exceeded, treat the credential as unrecoverable
and return not-recovered so the loop falls through to
_try_activate_fallback. The 429/billing paths already rotate-or-fall-through
correctly (mark_exhausted_and_rotate returns None on a single entry), so
only the auth-refresh branch needed the cap.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 19:20:07 -07:00
Teknium
fae920642a
fix(agent): throttle cross-turn fallback-switch replay storm (#24996) (#53909)
When every provider in the fallback chain fails non-retryably back-to-back
(e.g. HTTP 400/402/429 across distinct providers), the within-turn walk is
already bounded — _fallback_index advances monotonically and the loop aborts
when the chain exhausts. The damaging mode is cross-turn: restore_primary_
runtime resets _fallback_index=0 every turn, so a client that re-submits
immediately replays the entire chain, re-marshaling the full (potentially
80k-token) context once per provider every turn with no throttle on the
non-rate-limit path. On constrained hosts this exhausts memory/swap.

Rate-limit/billing failures already arm a 60s cooldown via _rate_limited_until;
the gap was the non-rate-limit case. Now, when the chain exhausts on a non-
rate-limit failure with a non-empty chain, arm a short (5s) cooldown on the
same _rate_limited_until gate (max(), never shrinking an existing window).
The next turn's restore stays gated and does NOT reset the index, so the
chain isn't replayed until the cooldown clears. No new state, no thread sleep,
no false-trip on legitimately long chains (those walk normally within a turn).

Tests: tests/run_agent/test_24996_fallback_exhaustion_cooldown.py
2026-06-27 19:15:40 -07:00
Chaz Dinkle
1dde7e2f2a fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh
Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on
its own schedule, so by the time Hermes notices an expired token Claude
Code may have already rotated it. Re-read live credential sources first and
adopt a valid token rather than POSTing a possibly-stale refresh token.

Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle)
on top of the keychain/file reconciliation from PR #21112 (nodejun).
Adds AUTHOR_MAP entry for nodejun.
2026-06-27 19:14:43 -07:00
jun
5a5396aecb fix(anthropic): reconcile keychain/file credentials when one is expired
read_claude_code_credentials() previously returned the macOS Keychain
entry as soon as one existed, even if its OAuth token was already
expired. Callers then ran is_claude_code_token_valid() on the result
and got False, so resolve_anthropic_token() returned None — surfacing
the misleading 'No Anthropic credentials found' error even when
~/.claude/.credentials.json held a perfectly valid token.

Now reads both sources and prefers the non-expired one. When both are
valid (or both expired), prefers the later expiresAt so any subsequent
refresh uses the freshest refresh_token.

Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation
cases. The existing 'keychain wins' priority test still passes because
both fixtures share the same expiresAt and the tiebreaker is >=.
2026-06-27 19:14:43 -07:00
linyubin
c946e6709f fix(agent): activate fallback on persistent transport failures (#22277)
Eager fallback previously fired only on rate_limit/billing. A stale-
detector-killed hung stream classifies as FailoverReason.timeout
(retryable=True) and the retry loop re-hit the same dead primary until
the budget exhausted -- 3 x ~180-300s stale kills compounding into a
15+ min silent hang while the configured fallback chain sat idle.

Extend the existing eager-fallback gate to also cover timeout and
overloaded, but only after one real retry (retry_count >= 2) so genuine
transient hiccups still recover on the primary. Reuses the same
pool-recovery guard and state-reset as the rate_limit branch -- no new
config flag, no change to the rate-limit intent.

Salvaged from PR #50228 by @linyubin. Closes #22277.

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-27 19:12:21 -07:00
LeonSGP43
c56b39c11e fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted
_try_openrouter() returned (None, None) whenever an OpenRouter credential
pool existed but was exhausted (_select_pool_entry -> (True, None)), making
the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks
(compression, vision, web_extract) silently failed even with a valid env key.

Now the pool-present branch only returns early when it successfully builds a
client; an exhausted pool falls through to the env-var path. The final
failure (pool exhausted AND no env var) still marks the provider unhealthy.

Fixes #23452.

Co-authored-by: ambition0802 <noreply@github.com>
2026-06-27 19:09:27 -07:00
qWaitCrypto
46e18804ad fix(auxiliary): fall back on 401 auth errors in auto mode (#21165)
When the primary provider returns 401 and the auth-refresh path is
unavailable or fails, both call_llm() and async_call_llm() reached the
should_fallback gate without _is_auth_error in the condition, so the
auxiliary task (e.g. compression) was dropped silently — losing message
history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in
both sync and async paths, plus an 'auth error' reason branch.

Auth stays a non-capacity error: it falls back in auto mode via the
is_auto gate, but on an explicitly-configured provider it still respects
the user's choice and raises rather than silently switching providers.
2026-06-27 19:07:04 -07:00
Teknium
1a570dae00
fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901)
The agent's image-rejection fallback strips images and retries text-only when
a provider rejects image content, which is what lets the gateway drain its
queued messages. The fallback only fires on a hardcoded phrase list, and the
OpenRouter wording — HTTP 404 'No endpoints found that support image input' —
was missing. For OpenRouter-routed non-vision models the fallback never fired,
the retry loop re-sent the same rejected request until exhaustion, and every
subsequent message (including plain text) stayed queued behind the stuck turn.

Add the phrase to _IMAGE_REJECTION_PHRASES (the 404 already passes the 4xx
gate). Add a positive test and a guard test so the sibling OpenRouter
'no endpoints ... data policy / guardrail' 404s do NOT get their images
stripped.

Fixes #21160. Reported by @liu14goal14-ux; PR #21198 by @ygd58.
2026-06-27 19:07:02 -07:00
konsisumer
8b4c29f0f0 fix(auth): preserve concurrently-added credentials on pool rewrite 2026-06-27 19:01:37 -07:00
Teknium
d3d621f7c3
revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853)
* Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)"

This reverts commit 2ecca1e7d3.

* Revert "fix(windows): stop terminal-window popups from background spawns (#53810)"

This reverts commit 5db1430af9.

* Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)"

This reverts commit ef17cd204d.
2026-06-27 15:59:00 -07:00
Teknium
2ecca1e7d3
fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.

- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
  safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
  docker/powershell/…); calls to those are flagged even when captured. Non-flashing
  programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
  popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
  _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
  durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
2026-06-27 14:49:41 -07:00
Teknium
3ac96d3308
fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827)
On a MoA session, auxiliary tasks (title generation, compression, vision, …)
ran through _resolve_auto with provider='moa' / model='<preset>', which sent
the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client —
producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible
as the title-generation warning).

MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the
reference fan-out. _resolve_auto now resolves a 'moa' main provider to the
preset's aggregator slot (its acting model) and continues Step 1 with that real
provider+model, dropping the virtual moa://local base_url + placeholder key so
the aggregator resolves via its own provider credentials. Mirrors the MoA
context-length resolution.

Verified live: a MoA turn no longer emits the 'not a valid model ID' warning.
Test: tests/agent/test_auxiliary_main_first.py (19 pass).
2026-06-27 14:21:26 -07:00
Gille
e7bb67332d fix(moa): preserve Codex slot routing 2026-06-27 14:20:51 -07:00
Gille
66aeda3550 fix(moa): keep virtual provider on MoA client 2026-06-27 14:20:51 -07:00
Teknium
ef17cd204d
fix(windows): stop subprocess console-window popups + add CI guard (#53791)
* fix(windows): stop subprocess console-window popups + add CI guard

The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.

- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
  calls that can create a new console — output-redirection-aware (capture/
  redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
  systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
  burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
  windows_detach_popen_kwargs(); marked intentionally-visible launches
  (editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
  tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.

* test: accept creationflags kwarg in psutil_android fake_subprocess_run

The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
2026-06-27 13:03:51 -07:00
Teknium
3b44a3c8bb
feat(moa): show each reference model's output as a labelled block before the aggregator (#53793)
When a MoA preset is selected, each reference model's answer now renders in the
CLI as a thinking-style block labelled with its source model, BEFORE the
aggregator responds — so the mixture-of-agents process is visible instead of a
silent pause. The aggregator's response (and its tool actions) follow as normal.

Mechanism (shared seam, all surfaces):
- MoAChatCompletions/MoAClient take an optional reference_callback and emit
  'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating'
  (aggregator label) once. agent_init wires this to the agent's
  tool_progress_callback, which every surface already consumes — so the events
  reach CLI/TUI/desktop/gateway with no new plumbing.
- CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference
  i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_
  preview), and 'moa.aggregating' as a spinner transition. Display-only; never
  touches message history (cache-safe).

Turn-scoped reference cache: the agent loop calls the facade once per tool-loop
iteration, but the advisory message view is identical across iterations within a
turn, so references are now run AND displayed once per user turn (keyed by the
advisory view's signature) instead of re-running/re-spamming on every iteration.
This also cuts reference API cost from O(iterations) back to O(turns).

Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs):
reference blocks render once per turn, labelled by model, before the aggregator;
fresh blocks on each new turn; aggregator tool actions still execute.

Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive
the events via tool_progress_callback; their surface-specific renderers are a
separate change.
2026-06-27 12:45:23 -07:00
Teknium
227e6c0143
fix(moa): resolve context window from the aggregator, not the 256K default (#53780)
A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is
the virtual local endpoint, so get_model_context_length() missed every probe
and fell through to the 256K fallback — even when the aggregator is a 1M-context
model. The acting model in MoA IS the aggregator, so resolve the context window
from the aggregator slot's real provider+model.

- model_metadata.get_model_context_length: when provider=='moa', resolve the
  preset's aggregator slot through resolve_runtime_provider and recurse with the
  aggregator's real provider/model/base_url. Explicit model.context_length still
  wins (checked first); falls through to the generic default if resolution fails.

Tests: opus-gpt preset now reports 1M (the aggregator window), config override
still honored.
2026-06-27 12:08:09 -07:00
konsisumer
1b6ebb24c0 fix(agent): validate OpenRouter provider sort before request dispatch 2026-06-27 11:43:08 -07:00
Teknium
190e1ffac9
fix(redact): mask passwords in lowercase/dotted config keys (#53590)
The secret redactor only matched uppercase env-style keys ([A-Z0-9_]),
so config-file assignments like spring.datasource.password=secret,
app.api.key=xyz, and YAML password: secret leaked verbatim when the
agent ran cat/grep on application.properties or .env files (issue #16413).

Adds three case-insensitive config-key matchers that run only in a
config-file context, preserving the existing #4367 (lowercase code/prose)
and web-URL-passthrough carve-outs:
  - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config
  - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export)
  - _YAML_ASSIGN_RE: unquoted colon config (password: value)
Value capture stops at whitespace and '&' so form bodies stay pair-wise;
the '://' guard keeps intentional web-URL query-param passthrough intact.

Reported-by: Murtaza1211
2026-06-27 04:43:28 -07:00
Teknium
02b32e2d7c
fix(moa): call reference + aggregator models through their provider's real route (#53580)
MoA was calling reference and aggregator models through a bare
call_llm(provider=slot["provider"], model=slot["model"]) with a forced
temperature and a forced max_tokens (the preset's hardcoded 4096). That left
base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed
the API surface instead of using the provider's real runtime, and the 4096 cap
truncated long aggregator syntheses.

A MoA slot is just a model selection and must be called the same way any model
is called elsewhere. Each slot is now resolved through resolve_runtime_provider
(the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and
delegate_task all use) via a new _slot_runtime() helper, and the resolved
endpoint is passed into call_llm. So a reference/aggregator gets its provider's
actual API surface — MiniMax → anthropic_messages, GPT-5/o-series →
max_completion_tokens, custom endpoints → their base_url — identical to how that
model is handled as the acting model.

MoA also no longer imposes its own output cap: max_tokens defaults to None
(omitted → the model's real maximum) for references and is passed through from
the caller for the aggregator. The preset's hardcoded 4096 is gone. The
max_tokens preset config field is left in place (config/web/desktop unchanged);
it is simply no longer applied as a forced cap.

Tests: slots route through resolve_runtime_provider with resolved base_url/
api_key; resolution errors fall back to bare provider/model; neither call
carries an output cap even when the preset config still contains max_tokens.
2026-06-27 04:39:42 -07:00
herbalizer404
3fe16e3cd5 fix(fallback): attach credential pool after provider switch
When automatic fallback activates a provider that differs from the
primary, try_activate_fallback() cleared the primary's pool (to avoid
cross-provider base_url contamination, #33163) but never loaded the
fallback provider's own pool. The fallback then ran with no pool, so
rate_limit/billing/auth recovery couldn't rotate its credentials.

After clearing a mismatched pool, load_pool(fb_provider) and attach it
when it has credentials, so provider-specific rotation continues to
work on the fallback target.
2026-06-27 04:39:26 -07:00
Tranquil-Flow
635841d210 fix(agent): reload credential pool on switch_model provider change (#52727)
switch_model() swapped model/provider/base_url/api_key but never
refreshed agent._credential_pool, which stays bound to the original
provider. recover_with_credential_pool() then sees a pool.provider !=
agent.provider mismatch and short-circuits — so a 429/401 on the new
provider gets no rotation and falls through to fallback instead.

Reload load_pool(new_provider) inside switch_model when the provider
changes (or the pool is missing). The reload is inside the protected
swap block and the pool is added to the rollback snapshot, so a failed
client rebuild restores the original pool.

Fixes #16678, #52727.
2026-06-27 04:39:26 -07:00
teknium1
38e7bd8a08 fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit
Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status
path classified these unconditionally as rate_limit with
should_rotate_credential=True, so an overloaded provider exhausted the
credential pool after two errors — fatal for a single-key user, who has
nothing to rotate to.

The credential is valid; the server is just busy. Disambiguate the 429
body against a shared _OVERLOADED_PATTERNS list and route overload
language to FailoverReason.overloaded (retryable, no rotation), matching
the existing 503/529 path and the message-only path (#52890). Genuine
rate limits (no overload language) still rotate.

Extracted the inline overloaded tuple #52890 added into the shared
_OVERLOADED_PATTERNS constant so the status-code and message paths use
one list.

Closes #14038.
2026-06-27 04:16:54 -07:00
LeonSGP43
e7c013494d fix(agent): preserve nested API error bodies 2026-06-27 04:13:53 -07:00
Teknium
68a65ed7a1
fix(agent_init): correct misleading sub-64K context_length error message (#53569)
The error raised when a model's context window is below the 64K minimum
advertised "or set model.context_length in config.yaml to override" — but
the guard intentionally has no sub-64K escape hatch. Sub-64K models are
rejected by design (tool schemas + system prompt need the headroom).

The misleading clause invited a cluster of dup PRs (#11097, #11110, #8962,
#9142, #37548) all trying to wire an override that we don't want. Reword to
state the real options: pick a >=64K model, or — if your local server
under-reports its true window — declare the real value (which must itself
be >=64K). Guard behavior is unchanged.
2026-06-27 03:56:25 -07:00
Bartok9
45ce35ed72 fix(agent): classify message-only 'overloaded' as server overload
Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the
overloaded-classification fix, with a regression test that fails without it.
2026-06-27 03:52:52 -07:00
Teknium
ec769e49d2
fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564)
The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not
use markdown as it does not render' — factually wrong. Both adapters
actively convert markdown to native formatting:

- whatsapp_common.format_message(): **bold**, ~~strike~~, # headers,
  links, code blocks -> WhatsApp native syntax
- signal_format.markdown_to_signal(): same conversions via bodyRanges,
  plus '- item' / '* item' bullets -> '• ' Unicode bullets

The wrong hint made the agent strip bullets and bold the adapter would
have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud:
markdown is auto-converted, bullet lists work, tables are not supported.
Added a contract test asserting markdown-converting platforms never
forbid markdown in their hint.
2026-06-27 03:46:41 -07:00
Teknium
60f58a2b95
feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552)
The verify-on-stop guard fired too eagerly — including on doc/markdown/skill
edits with nothing to verify, where it pushed a pointless /tmp verification
script. Three changes:

1. Default OFF for new installs: agent.verify_on_stop defaults to false
   (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31.
2. One-time migration (v30 -> v31): existing installs are switched off once,
   but only when the value is missing or still the "auto" sentinel — an
   explicit true/false the user set is preserved.
3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose
   paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly
   enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on
   the code paths.

The legacy "auto" sentinel is still honored when set explicitly (ON for
interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env
override unchanged.
2026-06-27 03:23:22 -07:00
diamondeyesfox
8df231c941 fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
kshitijk4poor
cdb1dfbc49 fix: use os.pathsep, add tests, update tips for multi-root support
- Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and
  the Windows separator ';' work correctly.
- Add 9 tests covering multi-root behavior: writes inside first/second
  root, writes outside all roots, trailing/leading/double separators,
  all-separators edge case, static deny priority, duplicate dedup.
- Update hermes_cli/tips.py tip string to mention multiple paths.
- Update docs to mention os.pathsep / ; on Windows.

Follow-up for salvaged PR #49557.
2026-06-27 04:01:12 +05:30