13210 commits

Author SHA1 Message Date
teknium1
b2faeba182 fix(file-ops): make preserved cwd reachable at write-time resolution (#26211)
Belt-and-suspenders on top of the cherry-picked cwd-preservation fix:

- Proactively mirror every live terminal cwd into _last_known_cwd on each
  successful read, so the durable anchor survives even when the cleanup
  thread pops both _file_ops_cache and _active_environments before
  _get_file_ops' stale-cache save branch can fire.
- Fall back to _last_known_cwd in _authoritative_workspace_root. write_file_tool
  resolves the path (via _resolve_path_for_task) BEFORE _get_file_ops rebuilds
  the env, so restoring only the rebuilt env's cwd was insufficient — the
  resolution that decides where the file lands runs first. This closes that gap.

The local env's persisted _cwd_file can't serve this role: it's keyed by a
random per-session uuid and deleted on cleanup (the same cleanup that triggers
the bug). The in-memory _last_known_cwd registry is the durable anchor instead.

Adds a real-IO E2E regression (TestSilentFileMisplacementE2E) exercising the
actual write_file_tool path after env cleanup.
2026-06-27 19:29:06 -07:00
zccyman
adeba1d7a8 fix(file-ops): preserve CWD across terminal environment re-creation (#26211)
Root cause: when the terminal environment (`_active_environments` entry) is
cleaned up and re-created during a long conversation, the new environment
always starts with the default config CWD (typically `~/.hermes/hermes-agent`)
instead of preserving the user's last-known working directory. Subsequent
relative-path writes (`write_file`, `execute_code`, shell commands) silently
land in the default CWD, making files appear to be "created but absent."

Fix: add `_last_known_cwd` dict that preserves the old environment's CWD
before the stale cache entry is invalidated. When a new environment is
created for the same task_id, we check `_last_known_cwd` first and use the
preserved CWD instead of the config default.

Changes:
- tools/file_tools.py: add `_last_known_cwd` dict, save CWD before stale
  cache invalidation, restore CWD on env recreation
- tests/tools/test_file_tools.py: add `TestLastKnownCwd` with 2 tests
  verifying CWD preservation and fallback behavior

Fixes #26211
2026-06-27 19:29:06 -07:00
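The cwd-preservation idea in the two commits above can be sketched as a small in-memory registry keyed by task_id. This is a minimal illustration, not the code in tools/file_tools.py: `remember_cwd`, `resolve_start_cwd`, and `DEFAULT_CWD` are hypothetical names, and only `_last_known_cwd` comes from the commit text.

```python
import threading

# Name taken from the commit message; everything else here is hypothetical.
_last_known_cwd: dict[str, str] = {}
_lock = threading.Lock()
DEFAULT_CWD = "/default/workspace"  # assumed placeholder for the config default


def remember_cwd(task_id: str, cwd: str) -> None:
    """Mirror a live cwd into the registry so it survives env cleanup."""
    if cwd:
        with _lock:
            _last_known_cwd[task_id] = cwd


def resolve_start_cwd(task_id: str) -> str:
    """Prefer the preserved cwd for this task, else the config default."""
    with _lock:
        return _last_known_cwd.get(task_id, DEFAULT_CWD)
```

Calling `remember_cwd` on every successful read, as the follow-up commit describes, means the registry is already populated by the time cleanup runs.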
teknium1
926a1b915d fix(tools): suppress transient check_fn flakes so subagents keep file/terminal tools
A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements
running `docker version` with a 5s timeout, momentarily timing out under load)
would return False for a single get_tool_definitions() call. Because file
tools delegate their check_fn to the terminal check, that one flake silently
stripped read_file/write_file/patch/search_files AND terminal from whatever
agent was being constructed at that instant — most visibly a delegate_task
subagent, which then reported "Tool read_file does not exist". This explains
both the intermittent (~80% success) user-session failures and the
deterministic cron failures in #21658 / #5304.

The existing _check_fn TTL cache made this worse: it cached the transient
False for the full 30s window, poisoning every subagent spawned in that span.

Fix: remember the last time each check_fn returned True; when a fresh probe
fails within a short grace window of that success, treat it as a flake —
serve the last-good True and do NOT cache the failure (so the next call
re-probes). A failure with no recent success, or past the grace window, is
honored normally so a backend that genuinely went down stops advertising its
tools. Probe failures now log at WARNING regardless of quiet mode, making the
previously-silent tool loss diagnosable in subagent (quiet) sessions.

Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>
2026-06-27 19:29:00 -07:00
Shashwat Gokhe
505bc27d8d fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats
A document attached alongside an image in the same Discord message was
swept into the vision pipeline and 400'd the whole turn ("Could not
process image"), and was simultaneously never surfaced to the agent as a
readable file. Restores the "any file type works" contract for mixed
messages and fixes the HTTP 400.

Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video
classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in
a PHOTO message landed in image_paths and poisoned the vision call. The
document context-note path was gated on message_type == DOCUMENT, so that
same doc never reached the agent at all. Now classification is
per-attachment (trust each attachment's own MIME; fall back to the
message-level type only when MIME is unknown), via shared _event_media_is_*
helpers used by both _build_media_placeholder and the main inbound loop.
The document note now fires for any non-image/audio/video attachment
regardless of message-level type.

Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic
400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now
transcodes those to PNG via Pillow before declaring media_type, skipping
cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow
can't rasterize it — so it's skipped rather than transcoded.

Closes #25935.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>
2026-06-27 19:26:04 -07:00
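The per-attachment rule from Bug 1 above (trust each attachment's own MIME, fall back to the message-level type only when MIME is unknown) might look like the following. `classify_attachment` is a hypothetical helper, not one of the `_event_media_is_*` functions, and the VIDEO fallback entry is an assumption.

```python
from typing import Optional

# Message-level fallback, used only when the attachment has no MIME type.
# PHOTO/VOICE/AUDIO are named in the commit; VIDEO is assumed.
_MESSAGE_TYPE_FALLBACK = {
    "PHOTO": "image",
    "VOICE": "audio",
    "AUDIO": "audio",
    "VIDEO": "video",
}


def classify_attachment(mime: Optional[str], message_type: str) -> str:
    """Return 'image', 'audio', 'video', or 'document' for one attachment."""
    if mime:
        top_level = mime.split("/", 1)[0].lower()
        if top_level in ("image", "audio", "video"):
            return top_level
        return "document"
    return _MESSAGE_TYPE_FALLBACK.get(message_type, "document")
```

With this shape, a PDF sent alongside a photo is classified as a document even though the message-level type is PHOTO.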
teknium1
0c372274cd fix(agent): disable OpenAI SDK auto-retry that double-fires inside the rate-limit loop
Same bug class as the Anthropic fix (#26293): the OpenAI/aggregator client is
built without max_retries, so the SDK default of 2 applies. The SDK's own 1-2s
backoff ignores Retry-After and retries inside hermes's outer conversation loop,
burning request slots against a rate-limited bucket. Set max_retries=0 at the
single create_openai_client chokepoint (covers init, switch_model, recovery,
restore, request-scoped). auxiliary_client builds its own clients and is not
wrapped by the loop, so it keeps SDK retries.
2026-06-27 19:23:15 -07:00
konsisumer
1ab35ba25d fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s
The Anthropic SDK clients were built without max_retries, so the SDK
default (max_retries=2) retried 429/5xx with its own backoff that ignores
Retry-After — double-retrying inside hermes's outer loop and burning
request slots against a bucket that won't refill for minutes. Set
max_retries=0 on all Anthropic/AnthropicBedrock client constructions so
the outer conversation loop (which already honors Retry-After) owns retry.

Also raise the Retry-After cap in the conversation loop from 120s to 600s.
Anthropic Tier 1 input-token buckets reset in ~171s, so the 120s cap made
hermes retry before the reset window and re-trip the limit.

Refs #26293
2026-06-27 19:23:15 -07:00
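To illustrate why the cap matters in 1ab35ba25d, here is a minimal sketch of clamping a Retry-After value. `wait_seconds` and its default are hypothetical; only the 600s cap and the ~171s bucket reset come from the commit. The sketch handles the delay-seconds form only, not the HTTP-date form of the header.

```python
from typing import Optional

RETRY_AFTER_CAP_SECONDS = 600.0  # raised from 120s per the commit


def wait_seconds(retry_after: Optional[str], default: float = 1.0) -> float:
    """Clamp a Retry-After delay (in seconds) to the configured cap."""
    try:
        value = float(retry_after)
    except (TypeError, ValueError):
        # Missing or non-numeric header: fall back to a default delay.
        return default
    return max(0.0, min(value, RETRY_AFTER_CAP_SECONDS))
```

Under the old 120s cap a 171s hint would have been shortened to 120s, so the retry landed before the bucket reset; with 600s the full 171s is honored.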
LeonSGP43
32732a8f83 fix(agent): cap same-entry credential refreshes so fallback can activate (#26080)
A persistent upstream 401 on a single-entry OAuth pool (common for Claude
Max subscribers) made the credential-pool recovery spin forever:
try_refresh_current() re-mints a fresh token and reports success on every
401, so recover_with_credential_pool returned True and the retry loop
continue'd without ever incrementing retry_count or reaching the
auth-failover block. The configured fallback_model never activated and the
agent appeared to hang.

Cap consecutive successful same-entry refreshes (keyed by provider +
pool-entry id) at 2; once exceeded, treat the credential as unrecoverable
and return not-recovered so the loop falls through to
_try_activate_fallback. The 429/billing paths already rotate-or-fall-through
correctly (mark_exhausted_and_rotate returns None on a single entry), so
only the auth-refresh branch needed the cap.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 19:20:07 -07:00
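One way to picture the refresh cap in 32732a8f83, assuming a counter keyed by (provider, pool-entry id). All names below are hypothetical, and the exact off-by-one is a guess: this sketch allows two refreshes and refuses the third. Resetting the counter after a successful request is also an assumption drawn from the word "consecutive".

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple

MAX_SAME_ENTRY_REFRESHES = 2  # cap stated in the commit

_refresh_counts: DefaultDict[Tuple[str, str], int] = defaultdict(int)


def recover_after_401(provider: str, entry_id: str,
                      try_refresh: Callable[[], bool]) -> bool:
    """Return True if the caller should retry, False to fall through to fallback."""
    key = (provider, entry_id)
    if _refresh_counts[key] >= MAX_SAME_ENTRY_REFRESHES:
        return False  # credential treated as unrecoverable
    if try_refresh():
        _refresh_counts[key] += 1
        return True
    return False


def note_request_succeeded(provider: str, entry_id: str) -> None:
    """Reset the streak once a request actually succeeds (assumed behavior)."""
    _refresh_counts.pop((provider, entry_id), None)
```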
Teknium
fae920642a fix(agent): throttle cross-turn fallback-switch replay storm (#24996) (#53909)
When every provider in the fallback chain fails non-retryably back-to-back
(e.g. HTTP 400/402/429 across distinct providers), the within-turn walk is
already bounded — _fallback_index advances monotonically and the loop aborts
when the chain exhausts. The damaging mode is cross-turn:
restore_primary_runtime resets _fallback_index=0 every turn, so a client that re-submits
immediately replays the entire chain, re-marshaling the full (potentially
80k-token) context once per provider every turn with no throttle on the
non-rate-limit path. On constrained hosts this exhausts memory/swap.

Rate-limit/billing failures already arm a 60s cooldown via _rate_limited_until;
the gap was the non-rate-limit case. Now, when the chain exhausts on a non-
rate-limit failure with a non-empty chain, arm a short (5s) cooldown on the
same _rate_limited_until gate (max(), never shrinking an existing window).
The next turn's restore stays gated and does NOT reset the index, so the
chain isn't replayed until the cooldown clears. No new state, no thread sleep,
no false-trip on legitimately long chains (those walk normally within a turn).

Tests: tests/run_agent/test_24996_fallback_exhaustion_cooldown.py
2026-06-27 19:15:40 -07:00
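The cooldown gate in fae920642a can be sketched as follows. `FallbackGate` and its method names are hypothetical; the 60s and 5s windows, the `max()` rule, and the "restore stays gated" behavior are from the commit text.

```python
class FallbackGate:
    """Toy model of the _rate_limited_until gate and _fallback_index reset."""

    def __init__(self) -> None:
        self.rate_limited_until = 0.0
        self.fallback_index = 0

    def arm_cooldown(self, seconds: float, now: float) -> None:
        # max() means a short cooldown never shrinks a longer existing window.
        self.rate_limited_until = max(self.rate_limited_until, now + seconds)

    def restore_primary(self, now: float) -> bool:
        """Reset the chain index only once the cooldown has cleared."""
        if now < self.rate_limited_until:
            return False  # still gated: index is left alone, chain not replayed
        self.fallback_index = 0
        return True
```

Because the gate is a timestamp comparison, no thread sleeps and no extra state is needed beyond the one field.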
Chaz Dinkle
1dde7e2f2a fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh
Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on
its own schedule, so by the time Hermes notices an expired token Claude
Code may have already rotated it. Re-read live credential sources first and
adopt a valid token rather than POSTing a possibly-stale refresh token.

Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle)
on top of the keychain/file reconciliation from PR #21112 (nodejun).
Adds AUTHOR_MAP entry for nodejun.
2026-06-27 19:14:43 -07:00
jun
5a5396aecb fix(anthropic): reconcile keychain/file credentials when one is expired
read_claude_code_credentials() previously returned the macOS Keychain
entry as soon as one existed, even if its OAuth token was already
expired. Callers then ran is_claude_code_token_valid() on the result
and got False, so resolve_anthropic_token() returned None — surfacing
the misleading 'No Anthropic credentials found' error even when
~/.claude/.credentials.json held a perfectly valid token.

Now reads both sources and prefers the non-expired one. When both are
valid (or both expired), prefers the later expiresAt so any subsequent
refresh uses the freshest refresh_token.

Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation
cases. The existing 'keychain wins' priority test still passes because
both fixtures share the same expiresAt and the tiebreaker is >=.
2026-06-27 19:14:43 -07:00
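The four reconciliation cases in 5a5396aecb reduce to a short selection rule. The function below is a hypothetical sketch, not `read_claude_code_credentials()` itself; it assumes each source is a dict with an `expiresAt` field in the same unit as `now`.

```python
from typing import Optional


def pick_credentials(keychain: Optional[dict], file_creds: Optional[dict],
                     now: int) -> Optional[dict]:
    """Prefer the non-expired source; on a tie in validity, the later expiresAt."""
    if keychain is None or file_creds is None:
        return keychain if keychain is not None else file_creds
    keychain_valid = keychain["expiresAt"] > now
    file_valid = file_creds["expiresAt"] > now
    if keychain_valid != file_valid:
        return keychain if keychain_valid else file_creds
    # Both valid or both expired: later expiry wins; >= keeps keychain on a tie.
    if keychain["expiresAt"] >= file_creds["expiresAt"]:
        return keychain
    return file_creds
```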
Teknium
db16854f34 fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912)
When a Telegram attachment download/cache fails (typically a transient
httpx.ConnectError to Telegram's CDN), the except handler logged a warning
and fell through to handle_message() with empty media and no text — the user
thought the file was delivered, the agent saw a content-less turn with no
signal an attachment was attempted, and the only record was a buried log line.

Adds _surface_media_cache_failure(): replies to the user in Telegram so they
know to retry, and appends an agent-visible notice to event.text via the
existing _append_observed_note channel so the agent knows an attachment was
attempted and failed. No new event fields (structured-event refactor is out
of scope per #23045). Wired into all five cache-failure sites — photo, voice,
audio, video, document — since they shared the identical silent fall-through.

Bug 1 from #23045 (unsupported types routed as fake user messages) no longer
exists on main: the document handler now accepts any file type, so there is no
rejection branch to fix.

Closes #23045
2026-06-27 19:12:57 -07:00
teknium1
4133cd9fbf docs(infographic): eager fallback on persistent transport failures
2026-06-27 19:12:21 -07:00
teknium1
6514be5a28 chore(release): add AUTHOR_MAP entry for linyubin (#50228 salvage)
2026-06-27 19:12:21 -07:00
linyubin
c946e6709f fix(agent): activate fallback on persistent transport failures (#22277)
Eager fallback previously fired only on rate_limit/billing. A stale-
detector-killed hung stream classifies as FailoverReason.timeout
(retryable=True) and the retry loop re-hit the same dead primary until
the budget exhausted -- 3 x ~180-300s stale kills compounding into a
15+ min silent hang while the configured fallback chain sat idle.

Extend the existing eager-fallback gate to also cover timeout and
overloaded, but only after one real retry (retry_count >= 2) so genuine
transient hiccups still recover on the primary. Reuses the same
pool-recovery guard and state-reset as the rate_limit branch -- no new
config flag, no change to the rate-limit intent.

Salvaged from PR #50228 by @linyubin. Closes #22277.

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-27 19:12:21 -07:00
bykim0119
851f75d4df fix(discord): honor "*" wildcard in DISCORD_ALLOWED_USERS (#22334)
DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the
SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and
the value `claw migrate` emits. Previously _is_allowed_user did exact
ID matching only, so "*" matched no user and blocked every non-self
sender — a P1 with no workaround.

Three sites, all required for the fix to hold at runtime:
- _is_allowed_user: short-circuit when "*" is in the allowlist.
- connect(): exclude "*" from the intents.members trigger so the
  wildcard does not request the privileged Server Members intent
  (which can block the bot from coming online).
- _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands
  in the username-resolution bucket, matches no member, and is silently
  dropped from the set and env var on the first on_ready — quietly
  undoing the fix.

Slash auth delegates to _is_allowed_user (auto-covered); component auth
already honors "*" on main.
2026-06-27 19:11:30 -07:00
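A compact sketch of the wildcard handling in 851f75d4df. The helper names are hypothetical simplifications of `_is_allowed_user` and the `connect()` check, and the rule that only non-numeric entries trigger the members intent is an assumption about how usernames are distinguished from IDs.

```python
from typing import Set


def parse_allowlist(raw: str) -> Set[str]:
    """Split a comma-separated env value into a set of entries."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_allowed_user(user_id: int, allowed: Set[str]) -> bool:
    """Short-circuit on "*", otherwise require an exact ID match."""
    if "*" in allowed:
        return True
    return str(user_id) in allowed


def needs_members_intent(allowed: Set[str]) -> bool:
    """Only username entries need member lookup; "*" must not trigger it."""
    return any(entry != "*" and not entry.isdigit() for entry in allowed)
```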
Teknium
1207d81eed fix(gateway): unify outbound chat redaction onto authoritative redactor (#23810) (#53907)
The gateway banner promises 'chat responses are scrubbed before delivery',
but _redact_gateway_user_facing_secrets used a divergent 6-pattern subset that
leaked credential shapes the comprehensive agent.redact catches — notably the
GitHub fine-grained PAT (github_pat_...) and the Telegram bot-token shape
(bot<digits>:<token>), the gateway's own credential type.

_redact_gateway_user_facing_secrets now delegates to
agent.redact.redact_sensitive_text(force=True) — the same Tirith-grade redactor
already applied to logs, tool output, and approval-command prompts — so the
outbound LLM-response path (final_response -> _sanitize_gateway_final_response)
masks the full credential set. The narrow local pattern set is kept as a
fail-soft second pass. force=True honors redaction even when
security.redact_secrets is off, matching _redact_approval_command.

Test: regression guard parametrizing all 5 issue shapes x every chat surface;
asserts secret body never reaches the user and surrounding prose survives. The
existing bearer-token test's marker assertion is loosened from the literal
'[REDACTED]' to mask-agnostic (the redactor masks as '***'/partial) — it
asserts the security invariant, not the implementation's mask string.
2026-06-27 19:09:41 -07:00
LeonSGP43
c56b39c11e fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted
_try_openrouter() returned (None, None) whenever an OpenRouter credential
pool existed but was exhausted (_select_pool_entry -> (True, None)), making
the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks
(compression, vision, web_extract) silently failed even with a valid env key.

Now the pool-present branch only returns early when it successfully builds a
client; an exhausted pool falls through to the env-var path. The final
failure (pool exhausted AND no env var) still marks the provider unhealthy.

Fixes #23452.

Co-authored-by: ambition0802 <noreply@github.com>
2026-06-27 19:09:27 -07:00
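The control-flow change in c56b39c11e, reduced to its essentials. `resolve_openrouter_key` is a hypothetical stand-in for `_try_openrouter()`: it returns a key rather than a client, and takes the pool lookup result as plain arguments.

```python
import os
from typing import Mapping, Optional


def resolve_openrouter_key(pool_present: bool, pool_entry: Optional[str],
                           environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Use the pool entry when available; otherwise fall through to the env var."""
    environ = os.environ if environ is None else environ
    if pool_present and pool_entry is not None:
        return pool_entry
    # Pool missing or exhausted: the env-var path is now reachable.
    return environ.get("OPENROUTER_API_KEY") or None
```

Before the fix, the equivalent of the first branch returned early whenever a pool existed, even when it had no usable entry.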
qWaitCrypto
46e18804ad fix(auxiliary): fall back on 401 auth errors in auto mode (#21165)
When the primary provider returns 401 and the auth-refresh path is
unavailable or fails, both call_llm() and async_call_llm() reached the
should_fallback gate without _is_auth_error in the condition, so the
auxiliary task (e.g. compression) was dropped silently — losing message
history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in
both sync and async paths, plus an 'auth error' reason branch.

Auth stays a non-capacity error: it falls back in auto mode via the
is_auto gate, but on an explicitly-configured provider it still respects
the user's choice and raises rather than silently switching providers.
2026-06-27 19:07:04 -07:00
Teknium
1a570dae00 fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901)
The agent's image-rejection fallback strips images and retries text-only when
a provider rejects image content, which is what lets the gateway drain its
queued messages. The fallback only fires on a hardcoded phrase list, and the
OpenRouter wording — HTTP 404 'No endpoints found that support image input' —
was missing. For OpenRouter-routed non-vision models the fallback never fired,
the retry loop re-sent the same rejected request until exhaustion, and every
subsequent message (including plain text) stayed queued behind the stuck turn.

Add the phrase to _IMAGE_REJECTION_PHRASES (the 404 already passes the 4xx
gate). Add a positive test and a guard test so the sibling OpenRouter
'no endpoints ... data policy / guardrail' 404s do NOT get their images
stripped.

Fixes #21160. Reported by @liu14goal14-ux; PR #21198 by @ygd58.
2026-06-27 19:07:02 -07:00
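A simplified picture of the phrase gate in 1a570dae00. `_IMAGE_REJECTION_PHRASES` is named in the commit, but its other entries are not shown in the log, so this sketch contains only the newly added wording; `is_image_rejection` and the data-policy message in the usage note are hypothetical.

```python
# Only the phrase added by the commit; the real list has other entries.
_IMAGE_REJECTION_PHRASES = (
    "no endpoints found that support image input",
)


def is_image_rejection(status_code: int, message: str) -> bool:
    """True when a 4xx error message matches a known image-rejection phrase."""
    if not 400 <= status_code < 500:
        return False
    text = message.lower()
    return any(phrase in text for phrase in _IMAGE_REJECTION_PHRASES)
```

Matching on the full phrase rather than the "no endpoints" prefix is what keeps sibling 404s, for example a hypothetical "No endpoints found matching your data policy", from having their images stripped.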
Teknium
a94f657a50 fix(tui): route completion RPCs to the pool so they can't freeze the TUI (#53895)
complete.path and complete.slash ran inline on the tui_gateway stdin
reader thread. complete.path spawns git ls-files and fuzzy-ranks the
whole repo; complete.slash does first-call prompt_toolkit imports plus a
skill-dir scan. While either ran, prompt.submit / session.interrupt sat
unread in the stdin pipe, freezing the TUI until the 120s RPC timeout
fired — most reliably reproduced by typing @ on a large repo / WSL2 mount.

Add both to _LONG_HANDLERS so completion runs on the existing thread
pool (write_json is already _stdout_lock-guarded). Root-cause fix:
covers any slow completion, not just the bare-@ trigger.

Fixes #21123
2026-06-27 19:06:01 -07:00
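The routing idea in a94f657a50, sketched with the standard library. `_LONG_HANDLERS` and the two method names are from the commit; `dispatch`, the pool size, and the return convention are assumptions made for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Union

_LONG_HANDLERS = {"complete.path", "complete.slash"}
_pool = ThreadPoolExecutor(max_workers=4)  # assumed size


def dispatch(method: str, handler: Callable[[Any], Any],
             params: Any) -> Union[Future, Any]:
    """Run slow handlers on the pool so the reader thread stays free."""
    if method in _LONG_HANDLERS:
        return _pool.submit(handler, params)  # returns immediately
    return handler(params)  # fast handlers still run inline
```

Because `submit` returns at once, the stdin reader can go straight back to reading `prompt.submit` or `session.interrupt` while completion runs.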
teknium1
ccf526964a fix(gateway): bound adapter teardown awaits on the stop path (#14128)
The main stop loop in _stop_impl() awaited adapter.cancel_background_tasks()
and adapter.disconnect() with no timeout, for both the primary and the
secondary-profile (multiplex) adapter maps. A half-dead platform — a wedged
Feishu/Lark WebSocket thread blocked on network I/O is the reported case —
makes one of those awaits block forever, so the process never exits. systemd
then SIGKILLs it after TimeoutStopSec, skipping atexit PID-file cleanup, and
the next start dies with 'PID file race lost' and enters a restart loop.

The per-adapter timeout infra already existed on main
(_adapter_disconnect_timeout_secs / HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT,
default 5s) but was only wired into _safe_adapter_disconnect, which the
teardown path never calls.

Add _bounded_adapter_teardown(): wraps BOTH cancel_background_tasks() and
disconnect() in the existing timeout budget, logs and forces forward progress
on timeout, and never raises. Both teardown loops now route through it, so the
stop sequence always completes regardless of any adapter's internal behavior
and PID-file cleanup runs.

Original report + fix direction by @happy5318 (#14128, #14130); this widens it
to cover cancel_background_tasks(), the multiplex loop, and the config knob.

Co-authored-by: happy5318 <happy5318@users.noreply.github.com>
2026-06-27 19:05:04 -07:00
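The bounded-teardown pattern in ccf526964a maps naturally onto `asyncio.wait_for`. This is a sketch under assumptions, not `_bounded_adapter_teardown()` itself: the function name, the per-step (rather than shared) timeout, and the boolean return value are all hypothetical; the 5s default is from the commit.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def bounded_teardown(adapter, timeout: float = 5.0) -> bool:
    """Await both teardown steps under a timeout; log failures and never raise."""
    clean = True
    for step in ("cancel_background_tasks", "disconnect"):
        try:
            await asyncio.wait_for(getattr(adapter, step)(), timeout)
        except asyncio.TimeoutError:
            logger.warning("adapter %s timed out after %.1fs", step, timeout)
            clean = False
        except Exception:
            logger.exception("adapter %s failed", step)
            clean = False
    return clean
```

Even if the first step hangs, the second still runs and the caller always regains control, which is what lets PID-file cleanup proceed.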
Teknium
6717cfc805 docs(gateway): warn against custom ExecStopPost kill drop-in (restart loop) (#53903)
A user-added systemd drop-in like ExecStopPost=/bin/kill -9 $MAINPID fires
on every stop, including clean restarts — it SIGKILLs the freshly spawned
gateway before it stabilizes and Restart=always respawns it, producing an
infinite restart loop (issue #23272). The unit Hermes installs already shuts
down cleanly via KillMode=mixed + KillSignal=SIGTERM with Restart=always +
RestartForceExitStatus, so no extra kill is needed. Document this as a danger
callout in the gateway service-management section.
2026-06-27 19:04:29 -07:00
teknium1
ea8facee81 chore(release): add konsisumer to AUTHOR_MAP for PR #19608 salvage
2026-06-27 19:01:37 -07:00
konsisumer
8b4c29f0f0 fix(auth): preserve concurrently-added credentials on pool rewrite
2026-06-27 19:01:37 -07:00
Teknium
163cb24d45 feat(moa): render reference-model blocks in TUI and desktop, not just CLI (#53855)
The MoA reference-block display (each reference model's output shown as a
labelled thinking block before the aggregator responds) previously existed
only in the classic CLI. The facade already emits moa.reference / moa.aggregating
through tool_progress_callback; this wires the TUI and desktop consumers.

- tui_gateway/server.py: _on_tool_progress relays moa.reference (label / text /
  index / count) and moa.aggregating to the Ink/desktop client as their own
  events.
- ui-tui: gatewayTypes adds the two event shapes; createGatewayEventHandler
  routes them; turnController.recordMoaReference pushes a committed
  thinking-style segment tagged with the source model. Shown regardless of
  showReasoning — references ARE the mixture-of-agents process the user opted
  into, not ordinary reasoning. moa.aggregating is a status-only transition
  (no transcript entry).
- apps/desktop: use-message-stream appends each reference as a labelled
  reasoning chunk via the existing reasoning disclosure; GatewayEventPayload
  gains label/index/aggregator.

Tests: tui_gateway emit (3), Ink handler render + showReasoning-independence +
aggregating-no-segment (3). TUI typecheck/lint clean; desktop typecheck/lint
clean.
2026-06-27 18:46:20 -07:00
Teknium
d3d621f7c3 revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853)
* Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)"

This reverts commit 2ecca1e7d3.

* Revert "fix(windows): stop terminal-window popups from background spawns (#53810)"

This reverts commit 5db1430af9.

* Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)"

This reverts commit ef17cd204d.
2026-06-27 15:59:00 -07:00
Teknium
1d32e5d98c fix(gateway): relay _thinking bubbles when thinking_progress is on but tool_progress is off (#53849)
display.thinking_progress is documented as independent of tool_progress —
users can keep tool progress quiet while opting into mid-turn assistant
scratch-text bubbles. But two gates were keyed on tool_progress_enabled alone,
so with tool_progress:off the _thinking relay was silently dead even when
thinking_progress:true:

1. agent.tool_progress_callback was set to None unless tool_progress_enabled,
   so the callback that queues _thinking text never fired.
2. The send_progress_messages drain task was only started when
   tool_progress_enabled, so even queued messages had no consumer.

Both now gate on needs_progress_queue (tool_progress OR thinking_progress) —
the same condition that already decides whether to create the progress queue
at all. No effect when both are off (queue is None) or when tool_progress is
on (unchanged).

Tests: _thinking relays with thinking_progress:on/tool_progress:off, and is
suppressed when thinking_progress:off. Full progress-topics suite: 35 pass.
2026-06-27 15:48:20 -07:00
Teknium
2ecca1e7d3 fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.

- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
  safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
  docker/powershell/…); calls to those are flagged even when captured. Non-flashing
  programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
  popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
  _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
  durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
2026-06-27 14:49:41 -07:00
Teknium
3ac96d3308 fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827)
On a MoA session, auxiliary tasks (title generation, compression, vision, …)
ran through _resolve_auto with provider='moa' / model='<preset>', which sent
the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client —
producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible
as the title-generation warning).

MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the
reference fan-out. _resolve_auto now resolves a 'moa' main provider to the
preset's aggregator slot (its acting model) and continues Step 1 with that real
provider+model, dropping the virtual moa://local base_url + placeholder key so
the aggregator resolves via its own provider credentials. Mirrors the MoA
context-length resolution.

Verified live: a MoA turn no longer emits the 'not a valid model ID' warning.
Test: tests/agent/test_auxiliary_main_first.py (19 pass).
2026-06-27 14:21:26 -07:00
Gille
e7bb67332d fix(moa): preserve Codex slot routing
2026-06-27 14:20:51 -07:00
Gille
66aeda3550 fix(moa): keep virtual provider on MoA client
2026-06-27 14:20:51 -07:00
brooklyn!
5db1430af9 fix(windows): stop terminal-window popups from background spawns (#53810)
* fix(windows): stop terminal-window popups from background spawns

Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:

- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
  agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
  subprocess allocate a fresh console when the launching process has
  none (pythonw desktop backend / detached gateway) - even with output
  captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
  get a console regardless of how they're launched.

Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
  CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
  console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
  exclusively owns an auto-allocated console detaches it at startup
  (GetConsoleProcessList()==1 gate leaves shared interactive consoles
  untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
  tools repo-wide so the popup class can't regress.

* fix(windows): scope FreeConsole to background entry points; fix merge fallout

Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.

- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
  remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
  (start_server, what the desktop spawns), cron standalone, tui_gateway entry,
  slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
  no-op without console / on POSIX, and asserts it's not an import side effect.

Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
  every terminal init). _subprocess_compat.popen already hides the window, so
  drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
  primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
  case-variant test to drive that production path.

* test(claw): mock _subprocess_compat.run seam for Windows process scan

claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
2026-06-27 14:02:24 -07:00
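For reference, a hidden-spawn primitive of the kind 5db1430af9 describes can be approximated as below (per the log, this change was later rolled back by d3d621f7c3). `run_hidden` is a hypothetical name, not the `_subprocess_compat.run` wrapper; `subprocess.CREATE_NO_WINDOW` is a real constant that exists only on Windows builds of Python.

```python
import subprocess
import sys

_IS_WINDOWS = sys.platform == "win32"
# The constant is absent on POSIX, so fall back to 0 there.
_NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0) if _IS_WINDOWS else 0


def run_hidden(args, **kwargs):
    """subprocess.run that ORs CREATE_NO_WINDOW on Windows; a no-op elsewhere."""
    if _NO_WINDOW:
        kwargs["creationflags"] = kwargs.get("creationflags", 0) | _NO_WINDOW
    return subprocess.run(args, **kwargs)
```

OR-ing into any caller-supplied `creationflags` preserves other flags the call site may already set.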
Teknium
ef17cd204d fix(windows): stop subprocess console-window popups + add CI guard (#53791)
* fix(windows): stop subprocess console-window popups + add CI guard

The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.

- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
  calls that can create a new console — output-redirection-aware (capture/
  redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
  systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
  burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
  windows_detach_popen_kwargs(); marked intentionally-visible launches
  (editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
  tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.

* test: accept creationflags kwarg in psutil_android fake_subprocess_run

The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
2026-06-27 13:03:51 -07:00
Teknium
3b44a3c8bb feat(moa): show each reference model's output as a labelled block before the aggregator (#53793)
When a MoA preset is selected, each reference model's answer now renders in the
CLI as a thinking-style block labelled with its source model, BEFORE the
aggregator responds — so the mixture-of-agents process is visible instead of a
silent pause. The aggregator's response (and its tool actions) follow as normal.

Mechanism (shared seam, all surfaces):
- MoAChatCompletions/MoAClient take an optional reference_callback and emit
  'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating'
  (aggregator label) once. agent_init wires this to the agent's
  tool_progress_callback, which every surface already consumes — so the events
  reach CLI/TUI/desktop/gateway with no new plumbing.
- CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference
  i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_
  preview), and 'moa.aggregating' as a spinner transition. Display-only; never
  touches message history (cache-safe).

Turn-scoped reference cache: the agent loop calls the facade once per tool-loop
iteration, but the advisory message view is identical across iterations within a
turn, so references are now run AND displayed once per user turn (keyed by the
advisory view's signature) instead of re-running/re-spamming on every iteration.
This also cuts reference API cost from O(iterations) back to O(turns).
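
The turn-scoped cache can be sketched as follows. This is an illustration under assumptions, not the MoA facade itself: the class name, the JSON-plus-SHA-256 signature, and the `run_references` callable are all hypothetical.

```python
import hashlib
import json


class TurnScopedReferenceCache:
    """Run the reference models once per distinct advisory message view."""

    def __init__(self, run_references):
        # run_references: hypothetical callable(messages) -> list of outputs
        self._run_references = run_references
        self._signature = None
        self._outputs = None

    @staticmethod
    def _sign(messages):
        # Assumed signature scheme: a stable JSON dump hashed with SHA-256.
        payload = json.dumps(messages, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, messages):
        """Return (outputs, fresh); fresh is True only when references were re-run."""
        signature = self._sign(messages)
        if signature == self._signature:
            return self._outputs, False  # same turn: reuse, do not re-display
        self._outputs = self._run_references(messages)
        self._signature = signature
        return self._outputs, True
```

A caller would display the reference blocks only when `fresh` is true, which gives one render per user turn even if the tool loop iterates many times.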

Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs):
reference blocks render once per turn, labelled by model, before the aggregator;
fresh blocks on each new turn; aggregator tool actions still execute.

Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive
the events via tool_progress_callback; their surface-specific renderers are a
separate change.
2026-06-27 12:45:23 -07:00
Dale Nguyen
dbbf102b8e fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).

Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.
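
A minimal sketch of the stripping step. The constant name comes from the commit message; the helper name `strip_active_venv_markers` is hypothetical and the real functions in tools/environments/local.py may do more.

```python
import os

# Constant name from the commit message; contents assumed from its title.
_ACTIVE_VENV_MARKER_VARS = ("VIRTUAL_ENV", "CONDA_PREFIX")


def strip_active_venv_markers(env=None):
    """Return a copy of env (default: os.environ) without the venv markers."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k not in _ACTIVE_VENV_MARKER_VARS}
```

The result can be passed as `env=` to `subprocess.run`; PATH is left untouched, so a venv whose bin directory is already on PATH stays reachable.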

Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.

Also adds the AUTHOR_MAP entry for the salvaged contributor.

Closes #23473
2026-06-28 01:04:20 +05:30
Teknium
d470ed0c4c
fix(cli): commit tool scrollback lines in verbose mode (non-streaming/MoA) (#53785)
In the interactive CLI, the aggregator's tool calls under a MoA preset (or
any non-streaming model call, e.g. copilot-acp) appeared to overwrite each
other instead of building scrollable history. Each tool only updated the
transient spinner line; no committed scrollback line was printed.

Root cause: persistent tool lines in _on_tool_progress's tool.completed
branch were gated on tool_progress_mode in {all, new}, omitting 'verbose'.
Streaming models hid the bug because _on_tool_gen_start commits a 'preparing'
line per tool during streaming; non-streaming calls (MoA forces
_use_streaming=False) never emit that, so under 'verbose' there was no
committed line at all — only the self-overwriting spinner.

'verbose' is a strict superset of 'all', so it now commits the same scrollback
line. Verified live via interactive PTY on the MoA opus-gpt preset: three
terminal calls in turn 1 and two in turn 2 each render as separate persistent
lines.
2026-06-27 12:29:55 -07:00
Teknium
227e6c0143
fix(moa): resolve context window from the aggregator, not the 256K default (#53780)
A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is
the virtual local endpoint, so get_model_context_length() missed every probe
and fell through to the 256K fallback — even when the aggregator is a 1M-context
model. The acting model in MoA IS the aggregator, so resolve the context window
from the aggregator slot's real provider+model.

- model_metadata.get_model_context_length: when provider=='moa', resolve the
  preset's aggregator slot through resolve_runtime_provider and recurse with the
  aggregator's real provider/model/base_url. Explicit model.context_length still
  wins (checked first); falls through to the generic default if resolution fails.
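
The resolution order described above might be sketched like this. The lookup tables and the function name are hypothetical stand-ins for the real provider probes and preset config, and the 1M figure is illustrative only.

```python
DEFAULT_CONTEXT_LENGTH = 256_000

# Hypothetical data standing in for real provider metadata and MoA presets.
KNOWN_CONTEXT_LENGTHS = {("anthropic", "opus"): 1_000_000}
MOA_PRESETS = {
    "opus-gpt": {"aggregator": {"provider": "anthropic", "model": "opus"}},
}


def get_context_length(provider, model, explicit=None):
    """Resolve a context window; MoA sessions defer to the aggregator slot."""
    if explicit is not None:
        return explicit  # explicit config override is checked first
    if provider == "moa":
        slot = MOA_PRESETS.get(model, {}).get("aggregator")
        if slot is None or slot["provider"] == "moa":
            return DEFAULT_CONTEXT_LENGTH  # resolution failed: generic default
        # Recurse with the aggregator's real provider and model.
        return get_context_length(slot["provider"], slot["model"])
    return KNOWN_CONTEXT_LENGTHS.get((provider, model), DEFAULT_CONTEXT_LENGTH)
```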

Tests: opus-gpt preset now reports 1M (the aggregator window), config override
still honored.
2026-06-27 12:08:09 -07:00
ailthrim
25ec01f79f fix(desktop): don't purge Electron cache / mirror-retry after a late build failure
`hermes desktop` / `hermes update` recover from a corrupt Electron download by
purging the cached zip + re-downloading and retrying the pack, and then by
falling back to a public mirror. That recovery is only meaningful when the
packaged executable is MISSING — the signature of a partial/corrupt unpack.

A LATE failure such as macOS code signing (#40187) leaves
`Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place.
Re-downloading Electron can't repair a signing failure, so the purge +
slow mirror retry just grind through another identical failure before the
build finally errors out.

Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None`
so a build that already produced the executable fails fast instead of
triggering the destructive download recovery. The corrupt-download path
(executable missing) is unchanged.
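
The gate can be sketched with three injected callables. All names here are hypothetical; the real recovery code (`_electron_dist_ok`, `_redownload_electron_dist`) is not shown in this log.

```python
def build_with_recovery(pack, packaged_executable, redownload):
    """Retry a failed pack only when the packaged executable is missing.

    pack() -> bool, packaged_executable() -> path or None, redownload() -> None.
    """
    if pack():
        return True
    if packaged_executable() is not None:
        # Late failure (for example code signing): re-downloading cannot help.
        return False
    # Executable absent: treat as a corrupt download, purge and retry once.
    redownload()
    return pack()
```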

Salvage of #42782, re-applied onto current main (the surrounding recovery was
refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR
was opened). Adds a regression test asserting no purge / mirror retry runs when
the executable exists, and updates the existing retry/mirror tests to model the
corrupt-download case (executable absent) the recovery is actually for.

Related to #40187 (the residual cache-purge sub-issue; the signing failure
itself is fixed by #52591).
2026-06-28 00:29:34 +05:30
teknium1
1ef19bad90 fix(model): show MoA preset picker on selection and label MoA in the banner
Selecting 'Mixture of Agents' in the `hermes model` provider picker fell
through silently — select_provider_and_model had no moa branch, so it just
reprinted the current model/provider summary and exited. And the CLI session
banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'),
which is meaningless out of context.

- Add _model_flow_moa: always lists the available presets (even one), then
  prints the full reference-models + aggregator breakdown for the selection
  and persists model.provider=moa / model.default=<preset> (dropping stale
  base_url + endpoint creds, since moa is a virtual local provider).
- Wire the branch into select_provider_and_model.
- build_welcome_banner takes provider; when 'moa' it renders
  'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call
  sites pass self.provider.

Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker
persists the preset and clears stale base_url/api_key.
2026-06-27 11:45:07 -07:00
konsisumer
1b6ebb24c0 fix(agent): validate OpenRouter provider sort before request dispatch 2026-06-27 11:43:08 -07:00
Teknium
27322612b4
fix(update): route loud build/installer output to update.log instead of the terminal (#53616)
* fix(update): route loud build/installer output to update.log instead of the terminal

hermes update flooded the terminal with the full vite asset dump,
electron-builder logs, npm deprecation warnings from the desktop build,
and the cua-driver installer's 'Next steps' wall. All of that is
low-signal noise the user doesn't need on a successful update.

- Capture the desktop --build-only subprocess (vite + electron-builder)
  into ~/.hermes/logs/update.log; print a one-line status, and on
  failure surface the last 15 lines + a pointer to the full log.
- Capture the cua-driver installer's output when verbose=False (the
  hermes update refresh path); concise upgrade line is unchanged.
- Add _log_only_write() / _run_logged_subprocess() helpers that write to
  the update.log handle without echoing to the terminal.

The repo-root npm install keeps streaming (capture_output=False) — that
is the deliberate #18840 guard so a slow postinstall download doesn't
look hung. The desktop npm install is a separate Electron process with
no such progress concern and is captured.
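
A rough sketch of the capture-to-log pattern, assuming a writable log handle. The function name `run_logged` is hypothetical and is not the `_run_logged_subprocess()` helper itself.

```python
import subprocess


def run_logged(cmd, log_file, tail_lines=15):
    """Run cmd with its output written only to log_file.

    Returns (returncode, tail), where tail holds the last tail_lines lines of
    output on failure and is empty on success.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr into the same capture
        text=True,
    )
    log_file.write(result.stdout)
    tail = result.stdout.splitlines()[-tail_lines:] if result.returncode else []
    return result.returncode, tail
```

On failure the caller can print `tail` plus a pointer to the full log; on success it prints a one-line status and nothing else.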

* fix(update): persist full cua-driver installer output to update.log

The captured cua-driver installer output was only sent to logger.debug
(agent.log) on failure, so the 'Next steps' wall was lost from
update.log entirely on success. Write the full captured output straight
to the update.log handle (sys.stdout._log) on both success and failure,
matching the desktop-build capture, so update.log keeps the complete
record of everything an update did.
2026-06-27 11:43:01 -07:00
ethernet
f53b184c48 fix(ci): pass secrets down to docker workflows 2026-06-27 09:53:28 -07:00
Teknium
190e1ffac9
fix(redact): mask passwords in lowercase/dotted config keys (#53590)
The secret redactor only matched uppercase env-style keys ([A-Z0-9_]),
so config-file assignments like spring.datasource.password=secret,
app.api.key=xyz, and YAML password: secret leaked verbatim when the
agent ran cat/grep on application.properties or .env files (issue #16413).

Adds three case-insensitive config-key matchers that run only in a
config-file context, preserving the existing #4367 (lowercase code/prose)
and web-URL-passthrough carve-outs:
  - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config
  - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export)
  - _YAML_ASSIGN_RE: unquoted colon config (password: value)
Value capture stops at whitespace and '&' so form bodies stay pair-wise;
the '://' guard keeps intentional web-URL query-param passthrough intact.
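
As an illustration of the dotted-key case only, a simplified matcher might look like the sketch below. The pattern, the secret-word list, and the `***` mask are assumptions; it does not reproduce `_CFG_DOTTED_RE`, the config-file context check, or the '://' guard.

```python
import re

# Hypothetical secret-word list; the real matcher's vocabulary is not shown here.
_SECRET_WORDS = r"(?:password|passwd|secret|token|key)"

# A namespaced key (at least one dot) ending in a secret word, then '=' or ':',
# then a value that stops at whitespace or '&'.
_DOTTED_SECRET_RE = re.compile(
    rf"\b((?:[a-z0-9_-]+\.)+{_SECRET_WORDS})(\s*[=:]\s*)([^\s&]+)",
    re.IGNORECASE,
)


def redact_dotted(text):
    """Mask the value of dotted config keys whose last segment is a secret word."""
    return _DOTTED_SECRET_RE.sub(r"\1\2***", text)
```

Because the key must contain a dot, a bare lowercase `password = input()` in code or prose is left alone in this sketch.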

Reported-by: Murtaza1211
2026-06-27 04:43:28 -07:00
Teknium
917f6bdb00
fix(tools): let vision pick any provider+model, not just OpenRouter (#53606)
* fix(tools): let vision pick any provider+model, not just OpenRouter

hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY.
It now offers the same any-provider surface as the model command: Auto
(use main model / aggregator fallback), pick any authenticated provider +
model, or a custom OpenAI-compatible endpoint. Selections persist to
auxiliary.vision.{provider,model,base_url} — the keys the vision resolver
already reads. Custom endpoint pins provider=custom so base_url routes
correctly. Reconfigure path uses the same picker instead of re-prompting
for OPENROUTER_API_KEY.

* docs: add PR infographic for vision any-provider picker
2026-06-27 04:41:42 -07:00
Brandon Zarnitz
9c81c938d3 fix(approval): honour tirith_fail_open=false on Tirith ImportError (#20733)
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open.  When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.

Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it.  The default tirith_fail_open: true behaviour
is unchanged.
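
The fail-closed branch could be sketched as below. The function name, the `module_name` parameter, the default for `tirith_enabled`, and the `check_command` entry point are hypothetical; only the config keys and the "warn" action come from the commit message.

```python
import importlib


def tirith_check(command, security_cfg, module_name="tools.tirith_security"):
    """Return a Tirith-style result, failing closed when configured to."""
    try:
        tirith = importlib.import_module(module_name)
    except ImportError:
        enabled = security_cfg.get("tirith_enabled", True)  # default assumed
        fail_open = security_cfg.get("tirith_fail_open", True)
        if enabled and not fail_open:
            # Synthesise a warning so the normal approval path still runs.
            return {"action": "warn", "reason": "tirith unavailable"}
        return {"action": "allow"}
    return tirith.check_command(command)  # hypothetical entry point
```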

Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true  + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false           → always allowed regardless of fail_open

Fixes #20733

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 04:41:24 -07:00
Teknium
fe1c1c1121
fix(session_search): demote cron below interactive sessions in discover ranking (#53597)
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and outnumber a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).

- _order_for_recall() stable-sorts FTS rows so interactive sources rank
  above cron before lineage dedup; within each class BM25/recency order
  is preserved. Cron is demoted, not excluded, so it still surfaces when
  it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
  in hand for the demotion pass.
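
The demotion relies on a stable sort, which a short sketch makes concrete. The row shape (a dict with a `source` field) and the function name are assumptions; the real `_order_for_recall()` operates on FTS rows.

```python
def order_for_recall(rows):
    """Stable-sort rows so non-cron sources come first.

    sorted() is stable, so rows keep their incoming (BM25/recency) order
    within each class. The key is False for interactive rows and True for
    cron rows, and False sorts before True.
    """
    return sorted(rows, key=lambda row: row["source"] == "cron")
```

Cron rows are only pushed down, never dropped, so a query that matches nothing but cron sessions still returns them.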

Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
2026-06-27 04:41:22 -07:00
Teknium
cd592c105c
feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598)
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.

- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
  routes media_files through the whatsapp plugin's standalone_sender_fn, and
  add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
  chunk is media-only), then uploads each file via the bridge /send-media
  endpoint with a mediaType derived from extension/is_voice/force_document, so
  images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.
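
A possible shape for the extension classifier, as a sketch only. The extension sets and the precedence of `force_document` over `is_voice` are assumptions, not the adapter's actual tables.

```python
from pathlib import Path

# Assumed extension sets; the real classifier may cover more formats.
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}
_VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
_AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".m4a", ".wav"}


def bridge_media_type(path, is_voice=False, force_document=False):
    """Map a file path to image, video, audio, or document."""
    if force_document:
        return "document"
    if is_voice:
        return "audio"
    ext = Path(path).suffix.lower()
    if ext in _IMAGE_EXTS:
        return "image"
    if ext in _VIDEO_EXTS:
        return "video"
    if ext in _AUDIO_EXTS:
        return "audio"
    return "document"  # unknown types fall back to a plain document
```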

Closes #19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
2026-06-27 04:40:05 -07:00
Teknium
88c02469cc
fix(mcp): never permanently wedge the circuit breaker on a dead transport (#53599)
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).

Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.

- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
  instead of returning when the reconnect budget is exhausted, so the task
  stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
  long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
  parked/dead task and respawning a dead stdio subprocess) and returns a
  clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
  (stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling
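
The park-instead-of-return invariant can be sketched with asyncio events. This is a simplified model under assumptions: `connect` is a hypothetical coroutine that returns when a session ends and raises `ConnectionError` on failure, and tool deregistration, backoff, and shutdown while parked are omitted.

```python
import asyncio


async def run_server(connect, reconnect_event, shutdown_event, max_retries=3):
    """Keep a server task reconnectable until shutdown is requested."""
    retries = 0
    while not shutdown_event.is_set():
        try:
            await connect()
            retries = 0  # a successful (re)connect resets the budget
        except ConnectionError:
            retries += 1
        if retries >= max_retries:
            # Budget exhausted: park as a dormant listener instead of returning,
            # so a later half-open probe can still revive this task.
            await reconnect_event.wait()
            reconnect_event.clear()
            retries = 0
```

A probe that finds no live session would call `reconnect_event.set()` and report a "reconnecting" error rather than writing to the dead transport.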

Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.

Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-27 04:39:54 -07:00
r266-tech
dbc925b755 Guard oversized Telegram video downloads 2026-06-27 04:39:48 -07:00
Teknium
02b32e2d7c
fix(moa): call reference + aggregator models through their provider's real route (#53580)
MoA was calling reference and aggregator models through a bare
call_llm(provider=slot["provider"], model=slot["model"]) with a forced
temperature and a forced max_tokens (the preset's hardcoded 4096). That left
base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed
the API surface instead of using the provider's real runtime, and the 4096 cap
truncated long aggregator syntheses.

A MoA slot is just a model selection and must be called the same way any model
is called elsewhere. Each slot is now resolved through resolve_runtime_provider
(the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and
delegate_task all use) via a new _slot_runtime() helper, and the resolved
endpoint is passed into call_llm. So a reference/aggregator gets its provider's
actual API surface — MiniMax → anthropic_messages, GPT-5/o-series →
max_completion_tokens, custom endpoints → their base_url — identical to how that
model is handled as the acting model.

MoA also no longer imposes its own output cap: max_tokens defaults to None
(omitted → the model's real maximum) for references and is passed through from
the caller for the aggregator. The preset's hardcoded 4096 is gone. The
max_tokens preset config field is left in place (config/web/desktop unchanged);
it is simply no longer applied as a forced cap.
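
A sketch of how a slot's call arguments might be assembled. The helper name `slot_call_kwargs` and the injected `resolve` callable are hypothetical stand-ins for `_slot_runtime()` and `resolve_runtime_provider`, whose real signatures are not shown here.

```python
def slot_call_kwargs(slot, resolve, max_tokens=None):
    """Build call arguments for one MoA slot.

    resolve(provider) is assumed to return a dict that may contain base_url,
    api_key, and api_mode; any resolution error falls back to the bare
    provider/model pair.
    """
    kwargs = {"provider": slot["provider"], "model": slot["model"]}
    try:
        runtime = resolve(slot["provider"])
    except Exception:
        runtime = {}
    for key in ("base_url", "api_key", "api_mode"):
        if runtime.get(key):
            kwargs[key] = runtime[key]
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens  # omitted entirely when None
    return kwargs
```

Leaving `max_tokens` out of the call, rather than passing a fixed cap, is what lets the provider apply the model's own maximum.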

Tests: slots route through resolve_runtime_provider with resolved base_url/
api_key; resolution errors fall back to bare provider/model; neither call
carries an output cap even when the preset config still contains max_tokens.
2026-06-27 04:39:42 -07:00