Commit graph

1182 commits

Author SHA1 Message Date
m4dni5
d1f23bb2d5 fix: prevent TUI gateway stdin EOF crash across all TUI-context subprocess calls
When Hermes runs in TUI mode, the gateway child process communicates with
the Node.js parent over a JSON-RPC protocol on stdin. Subprocess calls that
inherit this stdin fd can trigger a race condition where the child's stdin
read returns EOF, causing the gateway to exit cleanly (exit code 0) mid-tool-
execution.

This is the same root cause as issue #14036 (byterover plugin) and PR #39257
(SSH environment backend). This commit applies the fix — stdin=subprocess.DEVNULL
— to all 85 subprocess.run() and subprocess.Popen() calls that execute inside
the TUI gateway child process.

Scope: TUI-context code only (agent/, tools/, plugins/, tui_gateway/server.py).
CLI code (cli.py, hermes_cli/), tests, scripts, and gateway process management
are excluded — they don't run inside the TUI child and inherit the terminal's
stdin, not the JSON-RPC pipe.

85 call sites across 28 files. All files pass syntax check.
2026-06-08 22:46:57 -07:00
teknium1
c78b3e1d3c fix(auth): add Codex OAuth accounts as distinct pool entries
hermes auth add openai-codex now creates an independent
manual:device_code pool entry per account instead of routing through
the singleton _save_codex_tokens save path, which collapsed every
added account into the latest login (the second add overwrote the
first account's singleton-mirrored device_code entry). This is the
add-path half of #39236; PR #39243 (already on this branch) fixes the
re-auth half.

manual:device_code entries refresh from their own token pair
(_sync_codex_entry_from_auth_store only adopts the singleton for
source=="device_code"), so they need no providers.openai-codex
shadow. Adding the first credential marks openai-codex active (the
singleton path did this implicitly) so the setup wizard's
get_active_provider() check still passes; subsequent adds leave the
active provider untouched.

Adds SOURCE_MANUAL_DEVICE_CODE constant and a regression test that two
distinct accounts keep distinct token pairs. Updates two existing add
tests to the pool-only behavior.

Co-authored-by: glesperance <info@glesperance.com>
2026-06-08 11:57:03 -07:00
Teknium
c9094f5e5f
fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314)
* fix(stream): don't report dropped mid-tool-call streams as output truncation

A streaming tool call whose SSE ends with no finish_reason (the upstream
delivers the tool name + opening '{' then closes the connection cleanly,
no terminator, no [DONE]) was stamped finish_reason='length' by the mock
builder. That routed it through the output-cap truncation path: 3 useless
max_tokens-boosted retries, then the misleading 'Response truncated due to
output length limit' error — even though the model never reported hitting
any cap.

Reproduced live on nvidia/nemotron-3-ultra:free via the Nous dedicated
endpoint, which stalls/drops during large tool-arg generation (50s-4m41s).

Now: when tool args are incomplete AND the provider sent no finish_reason,
tag the response as a partial-stream stub so the loop reports an honest
mid-tool-call drop and asks the model to chunk its output (existing
continuation machinery), instead of escalating output budget and lying.
A provider-reported finish_reason='length' still takes the real-truncation
path unchanged.

* test(stream): update truncated-tool-args test for drop-vs-cap split

test_truncated_tool_call_args_upgrade_finish_reason_to_length pinned the
old behaviour where ANY incomplete tool args → finish_reason='length' with
tool_calls preserved. That single-chunk-no-finish_reason scenario is exactly
the mid-tool-call stream drop now reclassified as a partial-stream stub.

Split into two tests matching the new contract:
- no finish_reason + incomplete args → PARTIAL_STREAM_STUB_ID, tool_calls=None,
  _dropped_tool_names set (the drop path)
- explicit finish_reason='length' + incomplete args → tool_calls preserved,
  'length' upgrade unchanged (the genuine output-cap path)
2026-06-08 11:56:10 -07:00
BarnacleBoy
550b72dd87 fix(cli): gate tool-rendering paths with tool_progress_mode, not quiet_mode
quiet_mode was being used to suppress tool-result display when
tool_progress_mode was 'off'. But quiet_mode also gates operational
status messages, so users with /verbose + tool-progress off lost all
status output.

Adds a dedicated tool_progress_mode attribute to AIAgent; the
tool_executor result-rendering path gates on tool_progress_mode != 'off'.
The CLI passes its tool_progress_mode through agent setup and the
tool-progress cycle command syncs it onto the live agent.

Fixes #33860.
2026-06-08 11:29:53 -07:00
teknium1
55b83c3d99 refactor(agent): extract run_conversation post-loop tail into finalize_turn (god-file Phase 1)
Lift the post-loop finalization tail out of run_conversation into
agent/turn_finalizer.py:finalize_turn. Behavior-neutral; run_conversation
4204 -> 3846 LOC, conversation_loop.py 4578 -> 4220.

The region (everything after the main tool-calling while loop): budget-exhaustion
summary, trajectory save, session persist, turn diagnostics, response transforms,
result-dict assembly, steer drain, and the memory/skill review trigger. Lifted
verbatim into a synchronous single-return free function; the 12 post-loop locals
it reads are passed as keyword args and the assembled result dict is returned to
run_conversation (which returns it to the caller). All agent.* side effects fire
exactly as before.

Imports: os + _summarize_user_message_for_log at module top; logger lazy from
agent.conversation_loop (preserves the gateway... err 'agent.conversation_loop'
logger name, no import cycle).

Validation: 1609/1609 tests/run_agent/ pass; live PTY agent turn PASS.
2026-06-08 09:42:23 -07:00
Teknium
399b8ee5f0
fix(anthropic): strip Responses-only kwargs before Messages SDK call (#31673) (#42155)
A Responses-API-shaped payload carrying instructions=/input=/store=/
parallel_tool_calls= can reach the native Anthropic messages.stream() /
messages.create() call under a rare api_mode-flip race (e.g. a concurrent
auxiliary vision call mutating a shared agent between the kwargs build and
the stream dispatch). The Anthropic SDK rejects these with a non-retryable
TypeError that kills the whole turn and propagates the entire fallback chain.

Add sanitize_anthropic_kwargs() at both Anthropic dispatch sites: it drops
the Responses-only keys in place and logs a WARNING (with #31673 breadcrumb)
when one is present, so the underlying race stays visible in the wild
instead of being silently papered over.
2026-06-08 09:36:38 -07:00
teknium1
dd0d1222a2 fix(agent): don't retry interrupt-induced transport errors (cascading-interrupt hang)
When agent.interrupt() fires during an active LLM call, the main poll loop
force-closes the worker-local httpx client to stop token generation. That
raises a transport error (RemoteProtocolError) on the worker thread — the
EXPECTED consequence of our own close, not a network bug.

The streaming retry loop misclassified it as a transient connection error
and retried; each doomed retry stalled for the full stream-stale timeout
(up to 300s). Because the gateway caches AIAgent instances per session, the
stale worker outlived the interrupted turn and raced the next turn's request
on shared client state — the root of the multi-minute cascading-interrupt
hang reported in the wild.

Fix: a request-local _request_cancelled token set by the poll loop right
before the force-close, in both interruptible_api_call (non-streaming) and
interruptible_streaming_api_call. The worker's exception handler checks the
token and exits cleanly — no retry, no fallback, no 'reconnecting' status —
instead of treating the forced error as transient. The token is request-
local (not agent._interrupt_requested, which is cleared at turn boundaries)
so a stale worker outliving its turn still recognizes its own forced close.

Original diagnosis and fix by @kristianvast (PR #6600), against the then-
inline methods in run_agent.py. Those were since extracted into
agent/chat_completion_helpers.py, so the fix is reapplied there.

Co-authored-by: Kristian Vastveit <kristianvast@users.noreply.github.com>
2026-06-08 02:19:13 -07:00
Teknium
aa6f2775fa
fix(memory): run end-of-turn sync off the turn thread (#41945)
A misconfigured/slow external memory provider could hold the agent in
the 'running' state for minutes after the final response was delivered.
MemoryManager.sync_all / queue_prefetch_all looped provider.sync_turn /
queue_prefetch INLINE on the turn-completion path; a provider making a
blocking network/daemon call (a broken Hindsight daemon was observed
blocking ~298s before failing) blocked run_conversation from returning.
Because every interface (CLI, TUI, gateway) marks the agent 'running'
until run_conversation returns, the agent stayed busy for the full block
and any follow-up message triggered an aggressive interrupt that dropped
the message.

Dispatch provider sync/prefetch to a lazily-created single-worker
background executor. sync_all / queue_prefetch_all return immediately;
work completes (or fails, logged) in the background. A single worker
serializes writes so turn N lands before turn N+1. flush_pending()
provides a barrier for session boundaries and deterministic tests.
shutdown_all() drains the executor with a bounded timeout so a wedged
provider can never hang teardown.

Builtin-only / no-provider sessions spawn no executor (zero new threads
in the common case).
2026-06-08 02:18:59 -07:00
teknium1
02a4d66951 fix(auxiliary): retry transient transport error once before fallback (#16587)
Some checks failed
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
Nix Lockfile Fix / auto-fix-main (push) Has been cancelled
Nix Lockfile Fix / fix (push) Has been cancelled
A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
2026-06-08 01:05:45 -07:00
teknium1
524453dab5 refactor(agent): consolidate inner-retry-loop recovery flags into TurnRetryState (god-file Phase 1b)
run_conversation's inner retry loop tracked recovery state in ~15 scattered
bare booleans (per-provider OAuth refresh guards, format-recovery guards,
restart signals). They are now fields on a single TurnRetryState dataclass the
loop mutates in place (_retry.<flag>), giving the recovery bookkeeping a named,
testable home.

Loop-control vars (retry_count, max_retries, max_compression_attempts) stay as
plain locals — they're while-mechanics, not recovery bookkeeping.

Behavior-neutral: pure local→attribute rewrite of 42 references; kwarg NAMES
preserved (e.g. has_retried_429=_retry.has_retried_429). Live simple + tool
turns OK.

Validation: tests/run_agent/ 1615 passed / 0 failed under per-file process
isolation; new test_turn_retry_state.py pins the field contract.
2026-06-07 22:42:05 -07:00
Rod Boev
648706936d test(gateway): add compression session_id rotation integration tests (#34089) 2026-06-07 22:39:51 -07:00
JimStenstrom
cb5c24e37d fix(agent): sync logging session context on compaction id rotation
When context compaction rotates agent.session_id, it updates the gateway/tools
session context (set_current_session_id -> HERMES_SESSION_ID env + ContextVar)
but never updates the separate logging session context. The [session_id] tag on
log lines comes from hermes_logging._session_context (set once per turn in
conversation_loop.py), so post-compaction log lines in the same turn carry the
STALE old id while the message/DB/gateway state carry the new one — breaking log
correlation exactly at the compaction boundary.

Call hermes_logging.set_session_context(agent.session_id) alongside the existing
set_current_session_id, guarded so a logging failure can't regress the routing
update. Logs-only; no runtime or caching impact.

Refs #34089
2026-06-07 22:30:02 -07:00
Teknium
8e223b36ed
fix(curator): protect load-bearing built-in skills from archival/consolidation (#41817)
The curator's idle-archival path (apply_automatic_transitions under
prune_builtins) could archive the bundled `plan` skill, killing the
/plan slash command silently — typing /plan then returned 'Unknown
command' with no signal that a skill had vanished. The archived skill's
hash stays in .bundled_manifest, so 'hermes update' wouldn't re-seed it.

Add PROTECTED_BUILTIN_SKILLS ({plan}) enforced at the master gate
is_curation_eligible() (covers archive_skill + the transition walk) and
in the candidate enumerator (so the LLM consolidation pass never sees
them). Immune to prune_builtins, pin state, and LLM judgment.
2026-06-07 22:23:29 -07:00
Martín Alcalá Rubí
132d6fe6d6 fix(volcengine): strip XML attribute fragments from tool_use.name (#33007)
VolcEngine's api/plan endpoint occasionally leaks raw XML attribute
fragments into tool_use.name when its protocol-translation layer
converts the model's native XML-style tool emission to Anthropic
Messages tool_use blocks, producing names like:

  terminal" parameter="command" string="true
  execute_code" parameter="code" string="true
  session_search" parameter="session_id" string="true

The corruption happens server-side at the provider, but it breaks
every tool call for affected users — no normalization rule in
repair_tool_call can rescue them, so each request runs through three
retries and then aborts as partial.

Add an early sanitizer in agent_runtime_helpers.repair_tool_call that
trims at the first ' " ', " ' ", '<', or '>' character (idx > 0
only) so the rest of the existing repair pipeline (lowercase /
snake_case / fuzzy match) can resolve the cleaned name normally.

Whitespace is deliberately NOT a separator — the legitimate
"write file" -> write_file repair path (covered by
test_space_to_underscore) must keep working.

Tests: 11 new regression cases in TestVolcEngineXmlPollution
covering all three observed polluted names, CamelCase + pollution
mix, single-quote variants, angle-bracket variants, clean-name
passthrough, and the whitespace-preservation guard. All 18 pre-
existing repair tests still pass (29 total in the file).
2026-06-07 22:22:01 -07:00
teknium1
f5bd09af4b refactor(acp): share interrupt-sentinel prefix, simplify guard
Replace the ACP-local prefix/suffix matcher + helper with a single
startswith() check against INTERRUPT_WAITING_FOR_MODEL_PREFIX, now
defined once in conversation_loop.py where the sentinel is produced.
Keeps the source of truth in one place so the guard cannot drift if
the status string changes. Net -17 LOC in server.py.

Also add lsaether to release.py AUTHOR_MAP.
2026-06-07 22:20:43 -07:00
Teknium
2789bf4e25 fix(auxiliary): route Codex Responses path through shared converter (#5709)
The auxiliary Codex adapter maintained its own chat->Responses conversion
loop that forwarded every non-system message's role verbatim into
Responses input[]. When flush_memories()/compression replayed session
history containing assistant tool_calls + role=tool results, those tool
messages leaked into the request and the Responses API rejected them with
HTTP 400: Invalid value: 'tool'.

Route _CodexCompletionsAdapter.create() through the same shared converter
the main agent transport uses (_chat_messages_to_responses_input), so tool
calls become function_call items and tool results become function_call_output
items with a valid call_id. Single conversion path means no future drift.

Also remove the now-dead _convert_content_for_responses() helper — its only
caller was the private conversion loop this change deletes.

Co-authored-by: ProgramCaiCai <techxacm@gmail.com>
2026-06-07 22:18:31 -07:00
teknium1
54870847cb refactor(agent): extract run_conversation prologue into agent/turn_context.py
Phase 1 of the god-file decomposition plan. run_conversation's ~470-line
once-per-turn setup block (stdio guarding, retry-counter resets, user-message
sanitization, todo/nudge hydration, system-prompt restore-or-build,
crash-resilience persistence, preflight compression, the pre_llm_call hook, and
external-memory prefetch) is moved verbatim into build_turn_context(), which
returns a TurnContext dataclass the loop unpacks.

Behavior-neutral move-and-name refactor: the builder mutates `agent` exactly as
the inline code did; only the locals the loop reads back are returned.

- run_conversation: 4602 -> 4217 LOC (-385)
- agent/conversation_loop.py: 4965 -> ~4580 LOC
- new agent/turn_context.py: focused, dependency-injected, unit-tested in isolation

Tests: tests/run_agent/ 1570 passed / 0 failed under per-file process isolation.
Relocation follow-ups: 413_compression mocks now patch both module references;
nudge/on_turn_start source-inspection guards point at the extracted module.
2026-06-07 22:17:35 -07:00
dusterbloom
cca3b77a4b fix(compression): clear _previous_summary on session end (defense-in-depth)
ContextCompressor inherited a no-op on_session_end() from ContextEngine, so
per-session iterative-summary state (_previous_summary) survived a real session
boundary on a reused compressor instance. Override it to clear the summary the
moment the owning session ends, complementing the point-of-use guard in
compress(). Closes the cross-session contamination path in #38788.

Co-authored-by: dusterbloom <32869278+dusterbloom@users.noreply.github.com>
2026-06-07 22:09:45 -07:00
Basil Al Shukaili
8513a6aec7 fix(compression): guard against cross-session stale _previous_summary contamination
When a cron or background session compacts, it sets _previous_summary for
iterative updates. If that session ends without /new or /reset (which calls
on_session_reset()), the stale summary survives on the ContextCompressor
instance. A subsequent live messaging session's compaction then injects it as
'PREVIOUS SUMMARY:' into the summarizer prompt — contaminating the live
session with unrelated content from the prior session.

Add an else guard in compress(): when no handoff summary is found in the
current messages but _previous_summary is non-empty, discard it so
_generate_summary() starts fresh instead of iteratively updating a stale
cross-session summary.

Fixes #38788
2026-06-07 22:09:45 -07:00
Teknium
a77bc2c08d
fix(compression): disable compression on background-review fork to prevent cross-turn stale-parent fork (#41708)
The per-session compression lock prevents same-window concurrent forks but
not cross-turn ones: the background-review fork shares the parent's
session_id, so if it won a compression race its new child session was never
adopted by the gateway (the fork is single-lifecycle). The next foreground
turn then started from the stale parent and compressed it again, leaving the
same parent with two sibling children.

Set review_agent.compression_enabled = False so the fork never triggers
compression. Both trigger sites in conversation_loop.py gate on
compression_enabled before calling _compress_context, so the fork can never
rotate the shared parent. Review needs full context anyway — compressing
would degrade the memory/skill summary.

The per-session lock is kept as defense-in-depth for any future shared-session
path. Adds a regression test that fails without the flag and passes with it.

Closes #38727
2026-06-07 22:06:48 -07:00
Teknium
48ae8029aa
fix(delegate): resolve custom-endpoint subagent pools by endpoint identity (#41730)
Subagents delegated to a custom endpoint were misrouted when the parent
ran on a different custom endpoint. Both runtimes collapse to
provider="custom", so _resolve_child_credential_pool() treated them as
interchangeable and handed the child the parent's pool. Leasing from it
then overwrote the child's delegated base_url with the parent's endpoint
via _swap_credential() — the child sent the delegated model name to the
wrong endpoint.

Custom runtimes now resolve by endpoint identity (the custom:<name> pool
key derived from base_url). The parent pool is reused only when both
parent and child resolve to the same custom endpoint; unregistered raw
endpoints return None so the child keeps its fixed delegated credential.
Non-custom provider paths are unchanged.

Fixes #7833.
2026-06-07 22:05:14 -07:00
islam666
2e61de0638 fix(model_metadata): consult DEFAULT_CONTEXT_LENGTHS before 256K fallback on custom endpoints
Problem: get_model_context_length() had an early return at the end of the
custom-endpoint probe branch (step 3) that returned DEFAULT_FALLBACK_CONTEXT
(256K) without ever consulting the hardcoded DEFAULT_CONTEXT_LENGTHS catalog
(step 8). Models served through a custom/proxied gateway (e.g. corporate
Anthropic proxy) that didn't expose Ollama or local-server endpoints would
hit this path and get capped at 256K, even when the model name clearly
matched a known entry in the catalog (e.g. claude-opus-4-8 → 1M).

Changes:
- agent/model_metadata.py: Before returning DEFAULT_FALLBACK_CONTEXT at the
  end of the custom-endpoint branch, consult DEFAULT_CONTEXT_LENGTHS using
  the same longest-key-first fuzzy matching as step 8. Only fall through
  to 256K if no catalog entry matches.
- tests/agent/test_model_metadata.py: Updated existing test and added new
  test covering the custom-endpoint → catalog fallback behavior.

Fixes #38865
2026-06-07 21:50:57 -07:00
islam666
41f0714287 fix(vision): honor custom_providers per-model supports_vision (#41036)
_supports_vision_override() in image_routing.py checked model.supports_vision
and providers.<name>.models, but not the legacy list-style custom_providers
config. A custom provider entry like:

  custom_providers:
    - name: my-provider
      models:
        my-model:
          supports_vision: true

was ignored, causing image_input_mode=auto to route through the auxiliary
vision_analyze path instead of natively attaching images.

Fix: added a lookup step for custom_providers list entries, matching by
provider name (including 'custom:<name>' variants at runtime).
providers.<name>.models still takes precedence over custom_providers.

13 new tests covering: true/false override, custom: prefix matching,
no-match fallback, non-dict entries, empty lists, models key missing.
2026-06-07 21:50:57 -07:00
islam666
b18490b890 fix(compaction): prevent infinite loop when transcript fits in tail budget
When summary_target_ratio is large (e.g. 0.45) and the context_length is
moderate (e.g. 96000), the soft_ceiling (token_budget * 1.5) can exceed
the total transcript size.  _find_tail_cut_by_tokens walks the entire
transcript without breaking early, and the resulting compress window is
either empty (compress_start >= compress_end) or a single message whose
summary-of-one overhead saves ~0 tokens.

Both outcomes cause a no-op compression that does not increment
_ineffective_compression_count, so should_compress() returns True on
every subsequent turn and the loop repeats endlessly.

Fix (two layers):
1. _find_tail_cut_by_tokens: when the backward walk consumed the entire
   transcript without breaking (cut_idx <= head_end and accumulated <=
   soft_ceiling), re-walk with the raw (non-inflated) token budget to
   find a meaningful cut that gives the summarizer a useful middle window.
2. compress(): when compress_start >= compress_end, increment
   _ineffective_compression_count and log a warning so the existing
   anti-thrashing guard in should_compress() can break the loop.

Fixes #40803
2026-06-07 21:50:57 -07:00
Teknium
b97cd81c78
refactor(insights): drop dead pricing/duration wrappers, call usage_pricing directly (#40618)
Salvaged from #40527; re-verified on main, tightened, tested.

Co-authored-by: HeLLGURD <HeLLGURD@users.noreply.github.com>
2026-06-07 18:33:20 -07:00
Teknium
cb3e41e2fd
feat(onboarding): opt-in structured profile-build path on first contact (#41114)
* feat(onboarding): opt-in structured profile-build path on first contact

On a user's very first gateway message, Hermes now optionally offers to
build a short profile of them — then, only with consent, gathers durable
facts and persists them to the user-profile memory store (memory tool,
target="user") so future sessions start already knowing who they are.

Inspired by Poke's zero-input onboarding, but consent-first by design:
- The agent OFFERS, never assumes. Declining stops it immediately.
- Before ANY external lookup it states what it will look up and asks.
- It never reads connected accounts (email/calendar) silently — the
  exact privacy concern that made naive implementations feel invasive.

Wiring reuses existing infrastructure end-to-end:
- gateway/run.py first-message hook (was a plain self-intro) now swaps in
  the profile-build directive when enabled and not yet offered.
- agent/onboarding.py gains profile_build_mode()/profile_build_directive()
  + PROFILE_BUILD_FLAG, latched once via the existing onboarding.seen
  mechanism so the offer fires at most once per install.
- config default onboarding.profile_build: "ask" (set "off" to disable).
  Added to an existing section, so no _config_version bump needed.

No new storage layer, no new injection path, no prompt-cache impact.

* fix(dashboard): fold onboarding into agent tab to avoid 1-field category

onboarding.profile_build is the only schema-surfaced onboarding field
(onboarding.seen is an internal latch dict), so the dashboard CONFIG_SCHEMA
single-field-category invariant rejected it. Merge onboarding -> agent like
the other small categories.
2026-06-07 08:36:48 -07:00
Teknium
d87f293972
feat(compression): temporal anchoring in compaction summaries (#41102)
Compaction summaries now receive the current date and instruct the
summarizer to rewrite completed actions as absolute, dated, past-tense
facts (e.g. "email John about the proposal" -> "Sent the proposal email
to John on 2026-06-07"). A resumed conversation no longer re-issues work
that already happened or treats a finished action as still pending.

The date is resolved via hermes_time.now() (date-only, user-configured
timezone) inside _generate_summary. The compaction summary is a
mid-conversation message that is never part of the cached prefix, so the
date does not affect prompt-cache stability. Date resolution is
best-effort: a clock failure omits the rule rather than blocking
compaction. The rule rides the shared template, so both first-compaction
and iterative-update prompts carry it.

Inspired by Poke's summarization (temporal anchoring + semantic
preservation).
2026-06-07 08:36:45 -07:00
Teknium
2912d94370
fix: guard int(os.getenv()) casts against malformed env vars (#40598)
A non-numeric value in env vars like HERMES_STREAM_RETRIES,
HERMES_KANBAN_SPECIFY_MAX_TOKENS, GOOGLE_CHAT_MAX_BYTES, IRC_PORT, etc.
raised ValueError at import/init and crashed startup. Parse them safely,
falling back to the default.

Unified onto the existing utils.env_int(key, default) helper for core/
hermes_cli/tools modules instead of the original PR's three duplicate
local helpers; plugins keep minimal inline guards (no core-utils import).
All existing max()/min()/`or extra.get()` wrappers preserved.

Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>
2026-06-07 06:14:24 -07:00
Teknium
1fb99b1f22
fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (#40589)
Two isolated reliability fixes:
- chat_completion_helpers: raise on a zero-chunk stream (no finish_reason,
  no content/reasoning/tool_calls) so retry handles it instead of
  fabricating a successful empty turn.
- model_metadata: parse the OpenRouter/Nous output-cap error phrasing
  ("maximum context length is N ... (A of text input, B of tool input,
  C in the output)") so parse_available_output_tokens_from_error returns
  a real cap and the caller stops looping on it.

Salvaged from #40405 (@ashishpatel26) — took the two stream/error-parsing
fixes. The PR also bundled compression-state changes (on_session_start
clearing _previous_summary; cron session-id prefix preservation, #38788);
those touch the compression hot path and are split out for separate review.

Co-authored-by: ashishpatel26 <ashishpatel26@users.noreply.github.com>
2026-06-07 03:52:09 -07:00
bmoore210
330ca4585b fix: harden gateway startup and turn persistence
Persist the inbound user turn before provider/tool execution so a crash
before run_conversation() (e.g. provider/httpx client init failure) keeps
the inbound message in the transcript. Repair stale/missing SSL_CERT_FILE
state on gateway startup, and avoid duplicate gateway fallback writes.
2026-06-07 02:15:23 -07:00
kshitijk4poor
ffe665277c fix(aux): honor model.default_headers on auxiliary client too (#40033)
The salvaged main-agent fix (sanidhyasin) applies model.default_headers
to the primary OpenAI client, but the auxiliary client (title generation,
context compression, vision routing) builds its own clients and did not
read the override. For a `provider: custom` endpoint behind a gateway/WAF
that rejects the OpenAI SDK's identifying headers, the main turn would
succeed while auxiliary calls to the same endpoint still failed with the
opaque 502/4xx from #40033.

Add agent.auxiliary_client._apply_user_default_headers() (user values win
over provider/SDK defaults; no-op when unconfigured) and apply it at every
OpenAI-wire client construction site:
- _try_custom_endpoint() — config-level `model.provider: custom`
- the named custom-provider branch (custom_providers/providers entries),
  including the anthropic-SDK-missing OpenAI-wire fallback
- the api-key-provider, async-conversion, and main resolve_provider_client
  fallback branches

To prevent the two clients ever drifting on precedence/value handling,
AIAgent._apply_user_default_headers (run_agent.py) now delegates the config
read + merge to this shared helper (run_agent already imports from
auxiliary_client). Native Anthropic/Bedrock branches are untouched (they
don't use the OpenAI wire).

8 new tests (helper semantics + config-level custom + named custom);
full aux + attribution header suites green (295).
2026-06-07 02:02:40 -07:00
Sanidhya Singh
a216ff839b fix(agent): honor model.default_headers for custom OpenAI-compatible providers (#40033)
Custom OpenAI-compatible endpoints sitting behind a gateway/WAF can reject
the OpenAI Python SDK's default identifying headers (User-Agent: OpenAI/Python,
X-Stainless-*) and return an opaque 502/4xx even though the same request body
succeeds under curl. There was no supported way to override those headers.

Add a model.default_headers config key whose values are merged onto the
OpenAI client's default_headers, taking precedence over provider- and
SDK-supplied defaults. Applied at client construction and on every credential
swap / client rebuild so the override survives reconnects. No-op for native
Anthropic / Bedrock modes and when unconfigured.
2026-06-07 02:02:40 -07:00
Teknium
3c8f1dee8d fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (#36718)
compress_context() sets last_prompt_tokens=-1 right after compression to
mark "no real API usage yet". The preflight display-seed used
`_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1
(truthy), so any positive rough estimate clobbered the sentinel with a
schema-inflated count — re-triggering compression on the next turn.
Treat any negative value as "no real data yet" and skip the seed.

Salvaged from #40246 as the minimal root-cause fix. The original also
added an `_awaiting_suppression_count` bounded-window state machine to
should_compress() across 3 files; left out here to keep blast radius
small — the sentinel guard alone fixes the re-fire. The suppression
window can be added separately if the usage=None-stub edge case warrants it.

Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
2026-06-07 01:56:51 -07:00
Teknium
0524c9b34e
feat(compression): raise compaction trigger to 85% for gpt-5.5 on Codex OAuth (#40957)
The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window
(verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses
is rejected with context_length_exceeded while ~250K succeeds; the same slug
exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the
default 50% trigger, auto-compaction fires at ~136K — half the usable window.

Raise the trigger to 85% (~231K) on this exact route only, gated by a new
compression.codex_gpt55_autoraise config flag (default true). When it fires,
emit a one-time notice (CLI inline print + gateway status_callback replay) with
the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's
global threshold.

- _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex
- _compression_threshold_for_model() now provider-aware + opt-out param
- config key + _config_version bump (27->28) for backfill
- docs + tests (40 cases in test_arcee_trinity_overrides.py)
2026-06-07 01:40:50 -07:00
Teknium
fe8920db18
fix(memory): reject memory tools that shadow core tool names (#40902)
A memory provider tool whose name collides with a built-in core tool
(e.g. clarify, delegate_task) was skipped from agent.tools at init but
lingered in MemoryManager._tool_to_provider, where the has_tool dispatch
branch could route a call to a tool that was never registered (#40466).

Block the collision at registration instead of patching dispatch:
- MemoryManager.add_provider rejects any tool whose name is in
  _HERMES_CORE_TOOLS (warn + skip), so it never enters the routing table.
- get_all_tool_schemas applies the same filter, so the manager never
  advertises a schema it would refuse to route.

Built-ins always win, matching the invariant used by the TTS/browser/
search provider registries. Makes the dispatch-hijack structurally
impossible regardless of branch ordering.

Closes #40466.
2026-06-06 18:44:09 -07:00
Teknium
8f7567c325
fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569)
Salvaged from #40381; cleaned up, re-verified against main, tests added.

Co-authored-by: zapabob <zapabob@users.noreply.github.com>
2026-06-06 18:33:44 -07:00
kshitij
d4a7bfd3aa
Merge pull request #29724 from bbednarski9/bbednarski/nmf-41B-nemoflow-plugin
feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay
2026-06-06 10:46:41 -07:00
Siddharth Balyan
fcb1944b4f
feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011)
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix Lockfile Fix / auto-fix-main (push) Waiting to run
Nix Lockfile Fix / fix (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
OSV-Scanner / Scan lockfiles (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
uv.lock check / uv lock --check (push) Waiting to run
* feat(tui): HERMES_DEV_CREDITS live-spend dev readout (L0 tracer for usage-aware credits)

L0 of the usage-aware-credits feature: a dev-only, env-gated tracer that
exercises the real header -> CreditsState -> TUI pipe end-to-end behind
HERMES_DEV_CREDITS, de-risking the L1/L5 build before the notice policy exists.

- agent/credits_tracker.py: CreditsState + parse_credits_headers (headers are
  strings -> paid_access via == "true", never bool(); retain-last-known; only
  subscription_micros may be negative; *_usd kept verbatim).
- run_agent.py: _capture_credits / get_credits_state / get_credits_spent_micros,
  session-start baseline latch, + dev-gated "credits" capture log.
- agent/chat_completion_helpers.py: capture on the streaming response.
- agent/agent_init.py: init _credits_state + _credits_session_start_micros.
- tui_gateway/server.py: _get_usage emits dev_credits_spent_micros only when flagged.
- ui-tui appChrome.tsx / types.ts: cents delta status segment + "(dev credits)" banner.

Off by default; silent for normal users. Validated live against staging
(capture log delta matches the TUI segment). Throwaway consumer (readout/log/
banner); credits_tracker + the capture plumbing are the real feature foundation.

* test(credits): lock parser under 9-state matrix + harden validation (L2)

Add tests/agent/test_credits_tracker.py with 92 tests covering the 9-state
matrix (healthy, sub_90pct, grant_exhausted, purchased_only, tool_pool_free,
depleted, debt, missing, no_org) plus validation edge cases: version strict==1
with warn-once latch for v>1, bool-string trap (paid_access/tool_pool_gated_off
== "true"/"false", never bool()), half-pair subscription limit treated as
both-absent while parse succeeds, USD regex ^-?\d+\.\d{2}$, non-int micros
→ None, negative non-subscription micros → None, as_of_ms junk → None, zero
limit ZeroDivision guard.

Harden agent/credits_tracker.py to match the spec:
- Add tool_pool_micros/tool_pool_gated_off/from_header fields to CreditsState
- Add depleted property (== not paid_access, never remaining==0)
- Change used_fraction guard to key off subscription_limit_micros (the actual
  denominator) not denominator_kind (metadata)
- Replace fail-soft _safe_int with a sentinel-returning variant; full validation
  now returns None on any malformed field rather than silently defaulting
- Add module-level warn-once latch for version > 1
- Add USD regex validation; add denominator_kind allow-list check
- Parse x-nous-tool-pool-* prefix headers (not x-nous-credits-tool-pool-*)

* feat(credits): notice spine — AgentNotice + notice_callback/notice_clear_callback + TUI binding (L1)

L1 of usage-aware credits: the driver-agnostic notice delivery spine that L4's
policy will fire through and L5's TUI render will consume.

- agent/credits_tracker.py: AgentNotice dataclass (text/level/kind/ttl_ms/key/id;
  kind defaults "sticky", kept TTL-expressive for a future config seam).
- run_agent.py: AIAgent gains notice_callback + notice_clear_callback slots and
  _emit_notice / _emit_notice_clear emitters (swallow all callback errors — a
  notice must never break the agent loop; no-op when unbound).
- agent/agent_init.py: thread both callbacks through init_agent.
- tui_gateway/server.py: bind both in _agent_cbs → notification.show / notification.clear
  WS events (snake_case payload, matching the existing gateway-event convention).
- ui-tui/src/gatewayTypes.ts: notification.show / notification.clear arms on GatewayEvent.
- tests/run_agent/test_notice_spine.py: 15 tests (emitter fire + fail-open + no-op,
  signature threading, TUI binding payload shape).

Messaging push is out of v1 (binds neither callback). CLI binding + the TUI render/
decode land with L4 (firing) and L5 (render) so turn-end flush is wired correctly.

* feat(credits): threshold reconciliation policy + tests (L4.1)

* feat(credits): wire threshold policy into capture + latch (L4.2)

After a fresh header parse, _capture_credits runs evaluate_credits_notices against
the agent's _credits_latch and emits the result — clears first, then shows (so a
recovered depletion clears before the "restored" success lands, and depleted wins
the latest-wins slot). Gated on a bound notice_callback: messaging (no callbacks)
still caches state for /usage but runs no policy. Parse stays fail-open (miss →
keep last-known); the eval/emit path warns on failure rather than swallowing, so a
depletion-notice bug can't vanish silently.

- run_agent.py: _capture_credits split into parse (swallow→miss) + policy (warn);
  latch lazy-guarded (object.__new__ safety).
- agent/agent_init.py: init agent._credits_latch = {"active": set(), "seen_below_90": False}.

* feat(tui): render credits notices in the status bar (L5, Strategy B)

The TUI now renders the notification.show / notification.clear gateway events the
agent emits — a level-colored notice overrides the status/verb slot when not busy.

- Notice state machine on turnController (pendingNotice + dedicated noticeTimer +
  show/clear/applyNotice/flushPendingNotice/clearNoticeState). createGatewayEventHandler
  decodes the events and delegates.
- Render priority busy > notice > status (appChrome StatusRule); notice text rendered
  verbatim (its glyph comes from the policy), shrinkable so it never clips model│ctx;
  dev-credits banner + Δ segment preserved. UiState.notice is snake_case (matches wire).
- Busy-wins: a notice arriving mid-turn is held and flushed at the THREE turn-end sites
  (recordMessageComplete / interruptTurn / recordError) — never idle(), which reset()
  also calls (would leak across sessions); reset() clears instead.
- Dedicated noticeTimer (never statusTimer); TTL starts on visibility with an id-guard;
  latest-wins cancels the prior timer; clear is key-matched (no-op on mismatch); a sticky
  survives a turn (flush no-ops with no pending); session reset clears (no cross-session leak).
- 20 tests (handler/turnController logic incl. R3-C2 timer isolation + render priority).

* feat(credits): cold-start seed for new Nous sessions (L3)

A genuinely-new Nous session has no inference header yet, so seed credits state from
the authoritative GET /api/oauth/account snapshot at session start (in the new-session
branch of _restore_or_build_system_prompt — inline, since the on_session_start plugin
hook gets no agent reference). The seed runs the shared notice policy, so a session that
opens already depleted warns IMMEDIATELY rather than only after the first turn.

- Maps the nested account fields (paid_service_access → paid_access; total_usable /
  subscription / purchased on paid_service_access_info; rollover on subscription), each
  None-guarded; float dollars → micros via round(d*1e6), *_usd left "" (render formats
  from micros — never synthesize a verbatim usd from a float).
- Magnitudes-only: no monthlyCredits on the endpoint → subscription_limit_* unset →
  used_fraction None → no warn90 from the seed (% only once a header lands, per D-E).
- Provider-guarded to Nous; fail-open (any error leaves _credits_state None, never
  blocks startup); paid_access unknown ⇒ True (never falsely depleted).
- run_agent.py: extracted the warm-path policy/emit block into a shared
  _emit_credits_notices() so capture and the seed fire notices identically.

* feat(credits): /usage Nous credits magnitudes view + recovery trigger (L6)

Add Nous credit dollar magnitudes to /usage (subscription / top-up / total
+ rollover + renewal + portal CTA), magnitudes-only per v1 (no % until the
account endpoint exposes a denominator). Reuses the existing account-usage
render machinery via a new pure build_nous_credits_snapshot() that maps a
NousPortalAccountInfo to an AccountUsageSnapshot; no nous branch is added to
fetch_account_usage (keeps the per-provider boundary intact).

CLI /usage also doubles as a depletion-recovery trigger: a force_fresh
account fetch, kept in a SEPARATE local so it never clobbers the
header-sourced agent._credits_state (which alone carries used_fraction). If
paid access recovered while credits.depleted is latched and a notice
consumer is bound, it reuses agent._emit_credits_notices() to clear it.
Gateway /usage displays magnitudes only — messaging binds no notice
consumer, so it performs no recovery emit.

Fail-open throughout: any portal hiccup leaves /usage unaffected.

* refactor(credits): dedupe HERMES_DEV_CREDITS flag parse via shared helpers

The dev-flag truthy check was inlined in three places. Replace with the shared
utils.is_truthy_value (run_agent.py, tui_gateway/server.py — also drops a
redundant inline `import os`) and a hoisted DEV_CREDITS_MODE export in
ui-tui/src/config/env.ts (consumed by appChrome, which also stops recomputing the
env check on every render). Behaviour-preserving; identical truthy set.

* fix(credits): cut dead /usage recovery trigger + bound portal fetches (L6 review)

Adversarial review found the /usage depletion-recovery trigger dead AND broken:
the CLI binds no notice_clear_callback, the TUI runs /usage in a separate
slash-worker subprocess (its own agent/latch), and the no-clobber rule made it
evaluate stale paid_access anyway. Recovery already happens on the next inference
(warm path), so the trigger was redundant — remove it and stop the depleted
notice over-promising.

- cli.py: remove the dead recovery block; bound the /usage portal fetch with a
  10s wall-clock timeout (ThreadPoolExecutor) like the per-provider fetch —
  urllib's per-socket timeout is not a wall-clock guarantee.
- agent/credits_tracker.py: reword the depleted CTA to "run /usage for balance"
  (no false recovery promise; /usage shows fresh magnitudes, sticky clears next turn).
- agent/conversation_loop.py: same wall-clock timeout on the cold-start seed fetch
  so a stalled portal can't hang session startup; tidy its time import.

* chore(credits): dev notice-state fixtures (HERMES_DEV_CREDITS_FIXTURE)

Throwaway dev scaffolding to exercise the notice pipeline without real spend or
Redis seeding. Set HERMES_DEV_CREDITS_FIXTURE to a state name (healthy / sub_90pct
/ grant_exhausted / depleted / clear) or a file path whose contents name a state
(re-read each turn → flip states live for recovery testing). _capture_credits
injects the chosen CreditsState instead of parsing real headers and runs the
shared notice policy. Deletable with the rest of the HERMES_DEV_CREDITS scaffolding.

* feat(credits): /usage monthly-grant % gauge

The portal /api/oauth/account subscription block now carries monthly_credits
(the per-period grant allowance, the % denominator). The consumer parsed
monthly_charge but dropped monthly_credits, so /usage stayed magnitudes-only.

Capture monthly_credits into NousPortalSubscriptionInfo + _subscription_from_payload.
build_nous_credits_snapshot emits a Subscription usage window (real % used, routed
through the existing render machinery) when monthly_credits is a finite positive
denominator and credits_remaining is finite and <= cap; otherwise it degrades to
magnitudes-only (older portals, rollover-over-cap, or non-finite payloads).

Guards (adversarial-review-driven): reject non-finite operands (json.loads parses
bare NaN/Infinity by default → would render $nan + a false 100% used), reject
bools, guard div-by-zero (cap>0), and suppress the gauge when remaining > cap
(rollover spanning the period makes the cap a nonsensical denominator → the
$X-of-$Y detail would read as a contradiction). Debt (remaining<0) clamps to 100%.

Money rule preserved: the ratio + magnitudes are computed from numeric float
account fields via display formatting, never by parsing a server *_usd string
(there are none on these dataclasses).

13 gauge tests added (tests/agent/test_nous_credits_gauge.py).

* fix(credits): show /usage Nous block whenever a Nous account is present

/usage runs in a slash-worker subprocess whose resolved inference provider is
often not "nous" even when the user has a Nous account, so gating the Nous
credits block on (provider == "nous") hid it entirely — the account data was
fully available but never rendered.

Gate instead on "a Nous account is logged in": a cheap local auth-state lookup
(get_provider_auth_state('nous') has an access_token) decides whether to attempt
the portal fetch, regardless of which provider inference runs on. In the gateway
the block is also lifted out of the 'if provider:' scope so a Nous-credentialled
user with another (or no) resident inference provider still sees their balance.
Fail-open and the per-fetch wall-clock timeout are preserved.

* fix(credits): show /usage Nous block when there's no live agent (TUI slash-worker)

In the TUI, /usage runs in a slash-worker subprocess that resumes the session
WITHOUT building an agent (self.agent is None), so _show_usage early-returned
"(._.) No active agent" before ever reaching the Nous credits block — which is
agent-independent (a portal fetch gated on Nous auth-state). Extract the block
into _print_nous_credits_block() and run it at the no-agent / no-calls
early-returns too (returns True if it printed, so the fallback message only
shows when there's genuinely nothing).

Verified live against staging: the block + monthly-grant gauge now render in the
slash-worker /usage path (previously hidden). The plain CLI REPL + messaging
paths are unchanged (they have a live agent).

* feat(credits): escalating 50/75/90 usage bands (single status line)

Replace the lone 90%-used warning with three escalating bands (50 info, 75 warn,
90 warn) shown as ONE status-bar line: it displays the highest band the
subscription grant has crossed, replaces the line as usage climbs, steps back
down on recovery, and clears below 50%. No stacking, no per-turn churn.

Bands live in a tunable CREDITS_USAGE_BANDS list; the policy derives everything
from it. Single notice key (credits.usage) with a usage_band latch field so the
notice only re-emits when the band actually changes. The crossing gate
(seen_below_90) is preserved so a fresh live session that opens mid-range stays
quiet until it has been observed below the lowest band (cold-start primes it when
it wants an open-high warning). Denominator math unchanged: % = subscription
grant burn (cap - grant_remaining)/cap, clamped [0,1]; top-up never moves the %.

Migrated test_credits_policy.py to the new key + added TestUsageBands (climb,
step-down, recovery-clear, idempotent, inclusive boundaries).

* feat(credits): hydrate notices at session OPEN via shared seed (TUI + first-turn)

Notices previously only fired inside a conversation turn (first message), so a
session that opened already depleted / past a usage band showed nothing at
'ready'. Extract the cold-start seed into a shared seed_credits_at_session_start()
and call it (a) in the TUI/desktop agent build right after the notice callback is
wired (fires at 'ready', before any message) and (b) as the first-turn fallback in
conversation_loop. Idempotent (skips once _credits_state exists) and fail-open.

The seed now maps monthly_credits -> subscription_limit_micros +
denominator_kind='subscription_cap', so used_fraction is computable at seed time
and usage-band warnings (not just depletion) hydrate on open. Primes the crossing
latch so a session opening already in a band warns immediately. Degrades to
depletion-only when monthly_credits is absent (older portals).

Adds test_credits_cold_start.py covering open-at-band, depletion, debt, no-cap
degradation, and the shared seed (fires/idempotent/skips-non-nous).

* feat(credits): /usage monthly-grant % gauge + fixture support + TUI surfacing

agent/account_usage.py: build_nous_credits_snapshot emits a subscription %% gauge
when the portal supplies a positive, finite monthly_credits denominator with
remaining <= cap (guards reject NaN/Infinity and rollover-over-cap, which would
render $nan or a contradictory $X-of-$Y); degrades to magnitudes-only otherwise.
Adds shared nous_credits_lines() (auth-gated, wall-clock-bounded portal fetch) so
the CLI and TUI /usage render the same block, and _snapshot_from_credits_state()
so HERMES_DEV_CREDITS_FIXTURE drives /usage offline too.

TUI: session.usage RPC carries credits_lines (agent-independent) and the /usage
panel renders them regardless of API-call count or resume state — previously the
TUI's separate /usage implementation only showed token counts.

Money rule preserved: %% and magnitudes come from numeric float account fields via
display formatting, never by parsing a server *_usd string.

* feat(credits): CLI REPL inline notices (parity with TUI)

The plain CLI agent bound no notice callbacks, so credit notices were TUI-only.
Bind notice_callback/notice_clear_callback on the CLI AIAgent; _on_notice renders
a single level-colored line above the prompt (error red / warn yellow / success
green / info dim) via _cprint, and seed credits at session open so a depletion or
usage-band warning shows before the first message — the same hydration the TUI
got. _on_notice_clear is a no-op (the REPL prints lines, no persistent slot).

* test(credits): add sub_50pct + sub_75pct dev fixtures for the new usage bands

The fixture set jumped 10%% -> 90%%; add sub_50pct (uf 0.5 -> band 50 info) and
sub_75pct (uf 0.75 -> band 75 warn) so the new escalating bands are exercisable
via HERMES_DEV_CREDITS_FIXTURE across all three surfaces (notice, session-open
seed, /usage gauge).

* fix(credits): usage-band notice clears on next prompt (not sticky-forever)

A 50/75/90 usage heads-up was sticky and camped the status bar indefinitely. Clear
the visible credits.usage notice when a new turn starts (startMessage), so it shows
until your next prompt then yields. The server latch is unchanged, so it won't
re-nag at the same band — it only re-shows when the band actually changes (climb)
or clears when usage drops below the lowest band. Depletion stays sticky.

* refactor(credits): consolidate the /usage credits block behind nous_credits_lines()

The CLI (_print_nous_credits_block) and the messaging gateway (_handle_usage_command)
each re-implemented the auth-gate + portal fetch + render, and both bypassed the
dev-fixture short-circuit that only the TUI honored — so /usage ignored
HERMES_DEV_CREDITS_FIXTURE on the CLI and in chat. Route both through the shared
agent.account_usage.nous_credits_lines() helper: one fetch/render path, one auth
gate, and the fixture works on every surface (~60 fewer duplicated lines).

The gateway usage test recorded only the last asyncio.to_thread call; /usage now
dispatches both the account fetch and the credits fetch, so it records every call
and matches the account fetch by its provider arg.

* fix(credits): keep the /usage gauge type-safe and log its fail-open path

_is_finite_num is now a TypeGuard[float], so the type checker narrows the gauge
operands (monthly_credits / credits_remaining) and the magnitudes passed to
_fmt_usd through it — no more None-operand warnings on the arithmetic. Add a debug
breadcrumb on the nous_credits_lines portal-fetch fail-open so a dead /usage block
is diagnosable in agent.log without a dev flag.

* fix(credits): harden the header tracker — prod-leak gate, hot-path probe, fire-and-forget seed

- Prod-leak guard: dev fixtures (HERMES_DEV_CREDITS_FIXTURE) now also require
  HERMES_DEV_CREDITS, so a stray fixture var can't surface fabricated balances on a
  real account. Matches the documented run workflow (both vars set together).
- Hot-path probe: parse_credits_headers checks for the version sentinel header
  before allocating a lowercased copy of the response headers — skips that work on
  every non-Nous API call. Behaviour-identical and still case-insensitive.
- Fire-and-forget seed: the real portal fetch in seed_credits_at_session_start now
  runs in a daemon thread, so a slow/unreachable portal never delays session "ready"
  (previously blocked up to 10s). The dev-fixture path stays synchronous; the thread
  re-checks idempotency before hydrating (a live header may land first).
- Diagnostics: debug breadcrumbs on the parse and seed fail-open paths so a crashed
  parser / dead seed is distinguishable from a legitimate no-headers miss.

Cold-start tests set HERMES_DEV_CREDITS alongside the fixture to match the gate.

* test(tui): fix env-timing in the StatusRule dev-credits assertion

DEV_CREDITS_MODE is read once at module load (config/env), so mutating
process.env.HERMES_DEV_CREDITS inside the test couldn't flip it — the dev-banner
assertion only passed if the env was exported before vitest started, and failed in a
normal run. Move that assertion to a sibling file that mocks config/env with
DEV_CREDITS_MODE: true (scoped, no module-reset / React-identity hazard).

* test(credits): cover the dev-fixture /usage render and usage-band clear-on-prompt

- _snapshot_from_credits_state (the offline /usage renderer) had no direct test:
  lock the gauge math, the verbatim *_usd magnitudes, the depletion line and the
  fixture marker, plus the no-cap (no gauge) and None-state cases.
- turnController.startMessage had no test for clearing the credits.usage notice on
  the next prompt while leaving credits.depleted sticky.

* feat(credits): deliver credit notices over messaging gateways

Bind notice_callback/notice_clear_callback on the per-turn gateway agent
so usage-band / depletion / restored notices reach Telegram/Discord/Slack/
etc. Previously the messaging gateway bound neither callback, so the agent's
_emit_credits_notices early-returned and a chat user crossing a band got
nothing unless they ran /usage manually.

- render_notice_line(): AgentNotice -> single plaintext line (level glyph +
  text), plaintext-only so it renders uniformly without per-platform escaping.
  Fail-soft on malformed/empty notices.
- Standalone push for every notice (messaging has no persistent status bar):
  route through the shared _deliver_platform_notice rail (honors private/
  public delivery + thread metadata), scheduled onto the gateway loop via
  safe_schedule_threadsafe from the agent's sync worker thread — same pattern
  as _status_callback_sync.
- The fired-once latch lives on the cached (reused-in-place) agent and
  persists across turns, so a band crosses once -> one push, no per-turn
  re-nag. Re-fires only after idle-eviction rebuilds the agent (a reminder).
- Recovery ('Credit access restored') rides the show path (emitted as a
  success notice, not a clear). notice_clear_callback is a no-op: a sent
  platform message can't be cleanly retracted.

Tests: render glyph/levels/fail-soft + public/private delivery seam through
_deliver_platform_notice + no-adapter no-op.

* fix(credits): don't double the glyph on messaging notices

render_notice_line prepended a per-level glyph, but the notice policy already
bakes the glyph into the text (and the TUI + CLI render it verbatim) — so every
credit notice over messaging came out doubled ("⚠ ⚠ Credits 90% used",
" ✕ Credit access paused"). Emit the text verbatim instead; drop the now-dead
level→glyph map.

The render tests fed glyph-less text (and the success case only checked
startswith), so the doubling slipped through. Rework them around the verbatim
contract and add an end-to-end regression that runs real evaluate_credits_notices
output through render_notice_line and asserts the line is returned unchanged.
2026-06-06 13:18:18 +05:30
Brooklyn Nicholson
0f45509daf fix(agent): make mid-turn /steer trusted, not read as injection
A steer rides inside a tool result (the only role-alternation-safe slot
mid-turn), so a bare "User guidance:" line reads as untrusted tool content —
well-behaved models refuse it as suspected prompt injection (observed live:
"I only follow instructions from you directly, not ones injected through
command results").

- Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker
  (prompt_builder.format_steer_marker), shared by both drain sites.
- Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this
  exact marker and trusts it as a genuine user message — while still ignoring
  lookalikes buried in tool/web/file output. Static text → byte-stable prompt,
  no prompt-cache regression; gated on the agent having tools.
- Desktop: steer ack is now an inline transcript note ( steered · …) instead
  of a toast.

Marker is intentionally static (not a per-session nonce) to honor the
byte-stable system-prompt caching policy; nonce hardening noted as follow-up.
2026-06-05 20:59:36 -05:00
Teknium
ec46f5912e
fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints (#39730)
* fix: respect disabled auto-compaction on context overflow

Port from anomalyco/opencode#30749.

When compression.enabled is false, NO automatic compaction trigger may
fire. The proactive token-threshold paths (preflight + post-response
should_compress gate) already honoured the setting, but the three
provider-overflow recovery paths in the agent loop — long-context-tier
429, 413 payload-too-large, and context-overflow — called
_compress_context() unconditionally, silently compressing and rotating
the session against the user's explicit choice.

Add a single guard at the top of the overflow-recovery dispatch: when
compression is disabled and the error is one of those three overflow
classes, surface a terminal error (compaction_disabled: True) telling the
user to /compress manually, /new, switch to a larger-context model, or
reduce attachments. Manual /compress (force=True) is unaffected — it never
enters this loop.

Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't
compress when disabled; control case still compresses when enabled).
Existing overflow-recovery tests updated to enable compaction explicitly
(they verify the recovery fires); fixture defaults flipped to True to
match production (compression.enabled defaults to True).

* fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints

Two distinct failures hit users on the gemini provider with only Google
AI Studio keys set.

1. Truncation loop: build_gemini_request() only set maxOutputTokens when
   max_tokens was non-None. Hermes passes None to mean "unlimited", but
   Gemini's native generateContent does NOT treat an absent maxOutputTokens
   as full budget — it applies a low internal default and stops early with
   finishReason=MAX_TOKENS, truncating tool calls. The agent then retries
   3x and refuses the incomplete call. Now default to the published 65,535
   ceiling (shared by all current Gemini text models) when max_tokens=None.

2. HTTP 400 on Gemini endpoint: the chat_completions transport assembles
   profile extra_body (Nous portal 'tags', reasoning, provider prefs) and
   sends it via the OpenAI client to whatever base_url is resolved. When a
   profile that emits extra_body (e.g. Nous) is active but the endpoint is a
   native Gemini base_url — typical when only Google creds exist and a
   fallback/aux call lands on Gemini — Google rejects the unknown 'tags'
   field with a non-retryable 400. Strip all non-thinking_config extra_body
   keys when the resolved endpoint is native Gemini.

Verified E2E against real transport code: tags stripped on native Gemini,
preserved on Nous and the /openai compat endpoint; maxOutputTokens=65535
on None, explicit values respected.
2026-06-05 03:53:59 -07:00
Evi Nova
4690bbc363
fix(local): recognize unqualified hostnames as local endpoints (#9248)
Docker Compose service names (e.g. ollama, litellm, hermes-litellm)
are unqualified hostnames with no dots. These are always local — they
resolve via Docker DNS, /etc/hosts, or mDNS. Without this fix, the
stale stream timeout fires on local LLM proxies, causing infinite
reconnect loops.

Closes #7905
2026-06-05 10:18:10 +10:00
teknium1
dd4ba4c2c4 fix(vision): cap pixel dimensions proactively at embed time + declare Pillow
Follow-up to the salvaged #37727. That PR fixed the reactive recovery path
(classifier + post-failure shrinker) but left the PROACTIVE embed-time guard
in vision_tools byte-only — a tall small-byte screenshot (e.g. 1200x12000 at
0.06 MB) still baked into immutable history un-resized, relying on a failed
round-trip to trigger reactive shrink.

- vision_tools: add _image_exceeds_dimension() + _EMBED_MAX_DIMENSION (7900px);
  the embed-time cap now fires on bytes OR pixels and passes max_dimension to
  the resizer, so tall small-byte images are shrunk before they're embedded.
- vision_tools: best-effort lazy-install of Pillow (tool.vision) in the resize
  ImportError fallback so the soft dep self-heals (respects allow_lazy_installs).
- error_classifier: add two more Anthropic dimension-cap wording variants.
- pyproject + lazy_deps: declare Pillow as the [vision] extra / tool.vision
  lazy dep (it was undeclared everywhere; without it ALL resize recovery no-ops).
- tests: cover _image_exceeds_dimension (tall/small/edge/no-Pillow/corrupt).

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-04 06:16:45 -07:00
kyssta-exe
6bdbe30763 fix(vision): guard image pixel dimensions, not just bytes (#37677)
Anthropic enforces two independent ceilings per image:
1. 5 MB encoded byte size
2. 8000 px longest side

Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB)
passes every byte check but fails the pixel check, returning a
non-retryable HTTP 400 that permanently bricks the conversation thread.

Fixes:
- error_classifier: add 'image dimensions exceed' pattern to
  _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large
  and triggers the shrink/retry path instead of falling through to
  non-retryable error.
- conversation_compression: check pixel dimensions (via Pillow) even
  when byte size is under the 4 MB target. If max(dims) > 8000, force
  shrink.
- vision_tools._resize_image_for_vision: add optional max_dimension param.
  When set, images exceeding the pixel cap are downscaled even if they're
  under the byte budget. The resize loop now checks both byte AND pixel
  limits before accepting a candidate.

Closes #37677
2026-06-04 06:16:45 -07:00
Teknium
38d3c49aaf
refactor(skills): clean up bundled skill set + add environments: relevance gate (#39028)
* refactor(skills): clean up bundled skill set + add environments: relevance gate

Bundled skills cleanup pass plus a new offer-time relevance gate.

Removals (redundant / dead):
- spotify (covered by the spotify plugin's 7 native tools)
- linear (covered by `hermes mcp install linear`)
- kanban-codex-lane, debugging-hermes-tui-commands
- empty category markers: diagramming, gifs, inference-sh,
  mlops/training, mlops/vector-databases
- domain (stale orphan dup of optional/research/domain-intel)

Bundled -> optional:
- baoyu-article-illustrator, baoyu-comic, creative-ideation, pixel-art
- dspy, subagent-driven-development
- minecraft-modpack-server, pokemon-player
- hermes-s6-container-supervision (-> optional/devops)

Consolidation:
- webhook-subscriptions + native-mcp folded into the hermes-agent skill
  as references/webhooks.md + references/native-mcp.md with SKILL.md pointers
- writing-plans merged into plan (v2.0.0); related_skills + prose refs updated

New: environments: frontmatter gate (agent/skill_utils.skill_matches_environment)
- Offer-time relevance filter (kanban / docker / s6), parallel to platforms:.
- Wired into the 3 OFFER surfaces only (prompt_builder skills index,
  skills_tool.list_skills, skill_commands slash discovery).
- Explicit loads (skill_view, --skills preload) intentionally BYPASS it, so
  load-bearing force-loads like the kanban dispatcher's `--skills kanban-worker`
  always resolve. Verified via E2E.
- kanban-orchestrator/kanban-worker tagged environments: [kanban];
  hermes-s6-container-supervision tagged environments: [s6] + platforms: [linux].

Validation: 8/8 E2E gating assertions (incl force-load invariant);
442 targeted tests green (agent, skills_tool, skill_commands, kanban worker).

* docs: regenerate skill catalogs + pages for the bundled cleanup

Regenerated per-skill doc pages, catalogs, and sidebar to match the skill
moves/removals in the parent commit. Moved skills' pages relocate
bundled -> optional (history preserved); removed skills' pages deleted;
edited skills' pages refreshed (hermes-agent now embeds the webhook +
native-mcp reference pointers). zh-Hans i18n mirror: stale bundled pages
and catalog rows for moved/removed skills pruned (new optional translations
land via the translation pipeline).

* test: drop regression test for removed kanban-codex-lane skill

The kanban-codex-lane skill was removed in the bundled-skills cleanup;
its dedicated regression test read the now-deleted SKILL.md and failed
with FileNotFoundError on CI shard 6.
2026-06-04 06:11:22 -07:00
Fearvox
3d1d0a49fe fix(minimax): align default_aux_model with M3 frontier on minimax + minimax-cn
The minimax / minimax-cn / minimax-oauth profiles still advertised
M2.7 (and M2.7-highspeed for OAuth) as their default_aux_model,
predating the M3 release (2026-06-01). The user-facing
_PROVIDER_MODELS['minimax'] catalog top entry is M3, and the
recommended config for a Token-Plan install now sets
model.default: MiniMax-M3, so the aux default was the only
remaining drift.

Updates:

  * minimax        default_aux_model: M2.7        -> M3
  * minimax-cn     default_aux_model: M2.7        -> M3
  * minimax-oauth  default_aux_model: M2.7-highspeed -> M2.7
                    (M3 is not on the OAuth / Coding Plan tier per
                    platform docs as of this PR; the highspeed
                    variant was the 2x-cost regression from #4082
                    that PR #6082 collapsed to plain M2.7 for
                    minimax / minimax-cn but missed OAuth)

  * agent/auxiliary_client.py: drop the three legacy
    _API_KEY_PROVIDER_AUX_MODELS_FALLBACK entries for the minimax
    family. _get_aux_model_for_provider() reads from
    ProviderProfile.default_aux_model first (line 250) and only
    falls back to the dict when the profile has no aux model or
    the profile import fails. With the profile now set, the dict
    entries are dead code and a drift hazard. Mirrors the deepseek
    cleanup in 773a0faca.

  * tests/agent/test_minimax_provider.py: update the existing
    TestMinimaxAuxModel assertions from MiniMax-M2.7 to MiniMax-M3
    (the intent — 'standard, not highspeed' — is unchanged; the
    pin value is).

  * tests/plugins/model_providers/test_minimax_profile.py: new
    file mirroring tests/plugins/model_providers/test_deepseek_profile.py.
    Pins each of the three profiles' default_aux_model and
    asserts _get_aux_model_for_provider() returns it. A second
    class guards against the highspeed regression coming back.

Refs:
  - Closes #36196 in spirit (M3 support — the catalog half of
    that issue is #36212; this PR covers the profile half)
  - Related: #4082 (M2.7-highspeed 2x-cost), #6082 (previous
    M2.7-highspeed -> M2.7 fix that missed OAuth + the
    auxiliary_client.py fallback dict)
  - Pattern: 773a0faca (same profile-layer fix for deepseek)
2026-06-04 05:53:35 -07:00
teknium
153fe28474 fix(vision): use MiniMax type="video" block (not input_video) + tests
The salvaged conversion emitted type:"input_video", which MiniMax M3 rejects
just like the original video_url block. Per MiniMax's Anthropic-compat docs,
the video content block is type:"video" with an image-style source (base64 or
url). Fixes the block type, converts URL-based videos too, and adds 4 video
conversion tests (none shipped with the original PR).
2026-06-04 05:38:11 -07:00
kyssta-exe
0b46c4163a fix(vision): convert video_url blocks to Anthropic input_video format for MiniMax providers
The video_analyze tool sends OpenAI-style 'video_url' content blocks, which
breaks Anthropic-protocol providers (minimax, minimax-cn). These providers
expect 'input_video' blocks with base64 data instead of data: URLs.

Extends _convert_openai_images_to_anthropic() to also handle video_url
blocks, converting them to Anthropic's input_video format when targeting
Anthropic-compatible endpoints.

Fixes #37219
2026-06-04 05:38:11 -07:00
AhmetArif0
9756dff5fd fix(model_metadata): drop stale ≤256,000 cache entries for Grok-4.3
The ``grok-4.3`` (1M context) catalog entry was added on 2026-05-15
(ce0e189d3).  Between 2026-04-10 (when ``grok-4`` at 256,000 was first
added by b57769718) and 2026-05-15, grok-4.3 slugs resolved via the
generic ``grok-4`` substring catch-all and that 256,000 value was
persisted to context_length_cache.yaml.  Users who first queried
grok-4.3 in that 35-day window are stuck at 256K forever — the cache
is read at step 1 before the hardcoded defaults in step 8, so the
correct 1M entry is never reached.

Mirror the existing Kimi/Codex/MiniMax-M3 stale-cache guards: add
_model_name_suggests_grok_4_3() and an elif branch that drops any
cached value ≤ 256,000 for a grok-4.3 slug so the next lookup falls
through to the 1M hardcoded default.

Adds 4 regression tests: helper unit test, stale-drop-and-re-resolve,
correct-cache-preserved, and no-clobber for plain grok-4 (256K correct).
2026-06-04 05:36:34 -07:00
Teknium
f99665f99a
feat(prompt): broaden Hermes self-knowledge pointer to docs + skill (#38538)
The HERMES_AGENT_HELP_GUIDANCE block (added #16535) only fired when the
user explicitly asked about configuring/setting up Hermes. Broaden it so
the agent treats the docs as a standing source of self-knowledge for any
Hermes-related help and for understanding its own features/tools, points
to the hermes-agent skill for additional guidance, and treats the docs as
the authoritative/latest source of truth when the two differ.

Static constant in the cache-safe stable tier — no prompt-cache impact.
2026-06-03 17:01:56 -07:00
Nate George
e8c3ac2f5c fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral)
Fireworks/Mistral reject HTTP 400 'Extra inputs are not permitted, field:
messages[N].tool_calls[M].extra_content' on any session whose history
contains prior Gemini tool calls. Gemini 3 thinking models attach
extra_content (thought_signature) to tool_calls; it survived to the wire
because the sanitize paths only stripped call_id/response_item_id.

Strip extra_content from the outgoing wire copy in both sanitize paths
(ChatCompletionsTransport.convert_messages + _sanitize_tool_calls_for_strict_api),
but gate it on the target model: keep extra_content for Gemini-family
targets (the thought_signature MUST be replayed or Gemini 400s), strip it
for everyone else — including non-Gemini models that inherit a stale Gemini
signature earlier in a mixed-provider session. Native Gemini is unaffected
(GeminiNativeClient bypasses these paths).

Original stored history is never mutated (only the per-call copy).

Fixes #17986.
2026-06-03 16:42:52 -07:00