hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-29 18:46:59 +00:00

Author	SHA1	Message	Date
brooklyn!	3e74f75e41	feat(agent): coding-context posture across CLI/TUI/desktop/ACP (#43316 ) * feat(agent): coding-context posture with per-model edit-format tuning Hermes detects when it's running in a coding context — an interactive surface (CLI, TUI, ACP, desktop) sitting in a code workspace (git repo or recognised project root) — and shifts into a coding posture. Outside that (chat platforms, non-workspaces) nothing changes. The posture is modelled as a frozen RuntimeMode selected from a small ContextProfile registry (coding/general). A profile is data: the toolset to collapse to, the operating brief to inject, and seams for model routing and memory. Every domain reads the same resolved object instead of re-probing git/config on its own: - System prompt — RuntimeMode.system_blocks(): an operating brief (gather context before editing, edit through tools not chat, verify with terminal, cap retry loops) plus a live git/workspace snapshot, built once and baked into the stable prompt tier so per-conversation caching is preserved. - Per-model edit-format tuning — the brief nudges each model family toward the patch mode it handles best: OpenAI/Codex toward mode='patch' (V4A multi-file diffs), Anthropic toward mode='replace' (string replacement). The model id rides on RuntimeMode; unknown families keep neutral wording. - Skill index — non-coding skill categories are pruned from the prompt's skill index (discovery-only; skills_list/skill_view still reach the full catalog, with a disclosure note). - Toolset — only under the opt-in 'focus' mode does the posture collapse to the coding toolset + enabled MCP servers; the default posture is prompt-only and never overrides configured toolsets. Activation via agent.coding_context: auto (default), focus, on, off. Subagents inherit the posture for free via toolset inheritance + the shared prompt builder. Detection is not memoized so a long-lived gateway/TUI process can't pin a stale posture across working directories. * feat(agent): cover new-file authoring in the coding edit-format nudge The per-model edit-format guidance only addressed editing existing code (patch mode='patch' vs 'replace'), but authoring a brand-new file — write_file, not patch — is a large fraction of real coding work and the nudge was silent on it. Surfaced when building a single-file artifact where the dominant operation was write_file and the steering offered no guidance. Both family lines now lead with "author new files with write_file; for edits to existing code prefer ...". Tests assert write_file appears in each family's brief; unknown families still get neutral wording. * docs(agent): correct memoization docstring + clarify TUI config-load asymmetry * feat(agent): sharpen the coding posture — verify-loop facts, wider edit steering, $HOME guard Tuning pass on the coding posture from dogfooding it as a harness: - Workspace snapshot now hands the model its verify loop up front: detected manifests + package manager (lockfile sniff), the exact verify commands (package.json scripts, Makefile targets, scripts/run_tests.sh, pytest config), and which context files (AGENTS.md / CLAUDE.md / .cursorrules) exist at the root. Marker-only (non-git) projects get the snapshot too instead of nothing. The "verify before claiming done" brief line was the highest-value piece in evals — this turns it from advice into an executable loop instead of making the model rediscover the test command every session. Still stat-cheap, size-guarded reads, built once at prompt time. - Edit-format steering covers the families Hermes actually serves: Gemini and open-weight coding models (DeepSeek, Qwen, Kimi, GLM, Grok, Hermes, Llama, Mistral, Devstral, MiniMax) steer to mode='replace' — their RL scaffolds use str_replace-style editors. Previously only GPT/Codex and Claude families got steering; the models Hermes users disproportionately run all fell to neutral. - Operating brief gains four behaviors elite harnesses encode: batch independent reads/searches in one turn; fix root causes and the bug class (sibling call paths), not the reported site; no drive-by refactors/renames/reformatting; never read, print, or commit secrets. Plus a patch-failure escalation ladder: after the same region fails twice, rewrite the enclosing function/file with write_file instead of a third patch attempt. - $HOME dotfiles guard: a git repo rooted exactly at the home directory (or a marker sitting in it, e.g. a global ~/AGENTS.md) is user config, not a code workspace — without the guard, every session anywhere under a dotfiles-managed home silently flipped to the coding posture. Real projects under such a home still detect via their own markers/repos; 'on' mode bypasses the guard.	2026-06-10 23:06:44 -05:00
teknium1	efcbbde48c	refactor: keep anthropic_content_blocks in-memory only (no state.db column) Drop the hermes_state.py column + persistence plumbing from the salvaged interleaved-thinking fix. The ordered-block channel covers the failure window in-memory (turn replayed within the live conversation loop). A session reloaded from disk after a crash falls back to reconstruction; if that replay 400s, the thinking-signature recovery (#43667) strips reasoning_details and retries — one degraded call in a rare resume path instead of a schema column. Replaces the DB-roundtrip test with a fallback-shape test.	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	7a1eed8268	fix(anthropic): redact replayed tool inputs and broaden thinking-replay 400 recovery Two additive hardening changes on the interleaved-thinking replay path introduced by this PR's anthropic_content_blocks channel. Both are scoped to that channel's blast radius; neither changes correct behavior. 1. Replay-time tool-input re-sourcing (credential safety). The ordered-block channel captures each tool_use `input` from the RAW API response in normalize_response, which is NOT credential-redacted. The parallel tool_calls[].function.arguments IS redacted at storage time (build_assistant_message, #19798). The verbatim-replay fast path in _convert_assistant_message replayed the raw block input, so a secret a model inlined into a tool call (e.g. an Authorization header value passed inside a terminal command) would ride back onto the wire even though it is redacted everywhere else in history. Re-source tool_use input from the redacted tool_calls map by sanitized id; interleave order (the reason this channel exists) is unaffected. Adapted from #36071, which re-sources tool inputs the same way on its replay path. 2. Broaden the thinking-replay 400 classifier (defense-in-depth). error_classifier only matched "signature" + "thinking", so the frozen-block variant — "thinking ... blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response." — carried no "signature" token and fell through to a non-retryable abort. The anthropic_content_blocks channel prevents the reorder that triggers this 400 at the source, but if any future mutator reintroduces it, the turn now self-heals via the existing strip-reasoning-and-retry recovery instead of crash-looping. A negative case ensures an unrelated "cannot be modified" 400 (no "thinking") is not swept in. Mirrors the classifier broadening in #36087 and #36071. Tests - tests/agent/test_anthropic_thinking_block_order.py: a replay test asserting an inlined secret is redacted on the wire while interleave order is preserved. - tests/agent/test_error_classifier.py: three cases — frozen-block 400 native and via OpenRouter route to thinking_signature/retryable; an unrelated "cannot be modified" 400 does not. Both grafts verified RED (tests fail with the change reverted) then GREEN. Full adapter, transport, classifier and output-field-leak suites pass. Co-authored-by: AlexanderBFoley <92330381+AlexanderBFoley@users.noreply.github.com>	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	529bb1c3d5	fix(anthropic): strip output-only SDK fields from replayed content blocks HTTP 400 "messages.N.content.M.text.parsed_output: Extra inputs are not permitted" on the native Anthropic transport. Anthropic SDK 0.87.0 response blocks carry output-only attributes the Messages input schema forbids: text blocks get `parsed_output` and `citations=None`, tool_use blocks get `caller`. normalize_response captured blocks verbatim via _to_plain_data and replayed them as request input on the next turn, so the forbidden fields leaked back -> 400. Like the earlier thinking-block bug, one poisoned turn wedges every subsequent request in the session (even the diagnostic turn), recoverable only by switching models or deleting the session. This is a defect in the anthropic_content_blocks channel added for the interleaved-thinking fix: it preserved block ORDER correctly but copied every SDK attribute, including output-only ones. Fix — whitelist input-permitted fields per block type at all three leak points: - agent/transports/anthropic.py normalize_response: sanitize at CAPTURE so the poison never persists to state.db (defence-in-depth). - agent/anthropic_adapter.py _sanitize_replay_block (new): whitelist used on the ordered-blocks replay path; also recovers already-poisoned stored sessions. - agent/anthropic_adapter.py _convert_content_part_to_anthropic: a stored `text` part is rebuilt from whitelisted fields instead of dict(part) verbatim (this was the exact content.N.text.parsed_output failure locus). Whitelist not blacklist, so future SDK output-only fields can't reintroduce it. Block order and thinking-block signatures are preserved (the reason the channel exists). Adds tests/agent/test_anthropic_output_field_leak.py; full adapter suite green (163 tests). Existing poisoned state.db rows scrubbed out-of-band.	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	aaccaada28	fix(anthropic): preserve interleaved thinking/tool_use block order on replay Interleaved-thinking turns (adaptive thinking, Claude 4.6+/Opus 4.8) emit content blocks like: thinking_1(signed) tool_use_1 thinking_2(signed) tool_use_2 Anthropic signs each thinking block against the turn content preceding it at its position. normalize_response split the turn into two parallel lists (reasoning_details + tool_calls), discarding cross-type order, and _convert_assistant_message rebuilt it as [all thinking][text][all tool_use]. That moved thinking_2 ahead of tool_use_1, invalidating its signature, so Anthropic rejected the latest assistant message with HTTP 400: messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. Observed repeatedly in agent.conversation_loop against api.anthropic.com / claude-opus-4-8, recurring across sessions on multi-thinking-block turns. Fix: carry a verbatim, order-preserving copy of the turn's content blocks (anthropic_content_blocks) end-to-end - capture in normalize_response, persist/restore through state.db, and replay unchanged for the latest assistant message. Gated to turns that actually interleave signed thinking with tool_use, so normal turns are unaffected. Adds 3 regression tests including a SQLite round-trip covering the crash-recovery reload path.	2026-06-10 20:45:16 -07:00
Matt Harris	e0e2571711	feat(web): Parallel-backed web search & extract — free Search MCP when keyless, v1 REST when keyed Make Parallel the web search/extract backend with a zero-setup free tier: - Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel becomes the default backend when no other web credentials are configured (ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing web_search/web_extract tools are the only tools registered. - Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints (client.search / client.extract with advanced_settings.full_content) — no beta. Bumps parallel-web 0.4.2 -> 0.6.0. - Attribution: on the free path only, results carry provider/attribution and the CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is unbranded. - Selection/registration: web tools register unconditionally (free MCP backstop) while check_web_api_key remains a real usability probe; explicit per-capability backends are honored (so misconfig surfaces) rather than masked by the fallback. Tested: live web_search/web_extract against search.parallel.ai in keyless and keyed modes; unit suites for the MCP client, backend selection, and display labeling; full agent run shows the "Parallel search" label on the free path.	2026-06-10 19:54:38 -07:00
xxxigm	615ad97928	fix(streaming): stop socket read timeout from preempting stale-stream detector (#43570 ) * fix(streaming): stop socket read timeout from preempting stale-stream detector The stale-stream detector is deliberately scaled to 180-300s so reasoning models (e.g. Opus) can pause mid-stream during extended thinking. But the httpx socket read timeout stayed at a flat 120s for cloud providers and fired first, tearing down healthy reasoning streams before the detector (which owns retry + diagnostics) could act. Symptom: every Copilot/Opus turn dies with ReadTimeout at a consistent ~125s and never completes. Floor the cloud socket read timeout at the stale-stream timeout so it can no longer fire before the detector. Local providers and explicit HERMES_STREAM_READ_TIMEOUT / request_timeout_seconds overrides are unchanged. * test(streaming): pin read-timeout >= stale-stream invariant for cloud reasoning streams Cover the contract that the httpx socket read timeout is never shorter than the stale-stream detector for cloud providers on the default: small contexts floor to 180s, >=50K to 240s, >=100K to 300s; explicit overrides win; local providers and the unresolved-value fallback are unaffected.	2026-06-10 20:21:38 -05:00
Ian Culling	86e10dd874	fix(agent): route 'thinking blocks cannot be modified' 400 to recovery Anthropic returns a 400 when the thinking/redacted_thinking blocks in the latest assistant message are mutated upstream: 'thinking or redacted_thinking blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.' The classifier's thinking_signature branch only matched on the substring 'signature', so this variant fell through to a non-retryable client error and hard-aborted the turn -- even though the existing strip-reasoning_details -and-retry recovery would have healed it. Broaden the 400 match to also catch 'cannot be modified' / 'must remain as they were' (still gated on 'thinking'), routing it to the same recovery. Adds a negative-case test so unrelated 'cannot be modified' 400s are not swept in. Defense-in-depth, orthogonal to the root-cause work in #35975 / #17861 (which prevent the block mutation in the first place). Only changes a terminal-failure into a one-shot recovery. Signed-off-by: Ian Culling <ian@culling.ca>	2026-06-10 12:39:44 -07:00
rob-maron	6110aed9be	Suppress "Credit access paused" notice on free models (#43669 ) * don't show credits message on free model * PR comments	2026-06-10 23:55:06 +05:30
Gille	47e77ae166	fix(curator): use shared atomic state writer	2026-06-10 03:04:54 -07:00
Robin Fernandes	af978ecb17	fix(model): require confirmation for expensive model selections Rebased onto current main and re-ported across the restructured surfaces: model flows now thread confirm_provider/base_url/api_key through hermes_cli/model_setup_flows.py, the Discord picker lives in plugins/platforms/discord/adapter.py, and the web dashboard picker applies chat-mode switches via config.set so the expensive-model confirmation can ride the response. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 00:24:06 -07:00
teknium	2ce3ae3d16	fix(error-classifier): don't misclassify unsupported-param 400s as context overflow A GPT-5 model rejecting max_tokens returns a 400 whose message contains the literal substring 'max_tokens' — one of the _CONTEXT_OVERFLOW_PATTERNS. The 400 path in _classify_400 checked overflow patterns before any request-validation check (which only existed on the 5xx path), so the parameter error was routed into the compression loop, re-sent with the same bad param, and ended in 'Cannot compress further' on a tiny context. Hoist a request-validation guard (unsupported/unknown parameter) above the context-overflow check in _classify_400. Deliberately excludes the generic invalid_request_error code, which OpenAI also stamps on real overflow 400s, so genuine overflows still compress. Pairs with the max_completion_tokens param fix that stops the bad request at the source. Also adds AUTHOR_MAP entry for the salvaged PR #13902 commit.	2026-06-09 23:22:10 -07:00
Xiangji	19c07c4037	fix(params): send max_completion_tokens for newer OpenAI families on custom endpoints Third-party OpenAI-compatible endpoints (self-hosted gateways, OpenRouter, Azure proxies) fronting gpt-4o / gpt-4.1 / gpt-5+ / o1-o4 models silently received max_tokens and 400'd with unsupported_parameter, because the three kwarg-selection sites only checked base_url_hostname(...) == "api.openai.com" and fell through to max_tokens on every other host. The constraint is enforced server-side by the model family, not by the URL, so name-based detection is required as a fallback. Changes: - utils.py: new shared helper model_forces_max_completion_tokens(model) that prefix-matches gpt-4o, gpt-4.1, gpt-5, o1, o3, o4 families on normalized (lowercased, vendor-prefix-stripped) names. - run_agent.py: _max_tokens_param ORs the helper into the URL check. - agent/auxiliary_client.py: - auxiliary_max_tokens_param gains an optional keyword-only model arg. - _build_call_kwargs inline branch applies the same check for both provider == "custom" and non-custom paths. Tests: - tests/test_model_forces_max_completion_tokens.py: 31 new cases covering positive families, negatives (classic gpt-4, claude, llama, mistral, qwen, deepseek), vendor prefixes, case-insensitivity, whitespace, None/empty, and substring-not-prefix guards. - tests/run_agent/test_run_agent.py::TestMaxTokensParam: 5 new model-based cases (custom + gpt-5.4, openrouter + gpt-4o-mini, custom + o1-preview, classic gpt-4-turbo keeps max_tokens, llama3 keeps max_tokens). - tests/agent/test_auxiliary_client.py::TestAuxiliaryMaxTokensParam: new class, 7 tests covering the URL x model matrix.	2026-06-09 23:22:10 -07:00
Brooklyn Nicholson	7ffc216bc0	fix(agent): make a binary @file: reference actionable instead of a dead end A binary @file: ref (PDF, docx, spreadsheet, …) expanded to a bare "binary files are not supported" warning with no content. The model saw a failure and gave up — e.g. a dropped PDF came back as a text note claiming the type was unsupported, even though the file was staged on disk right next to it. Inject an actionable content block instead: the path, mime type, size, and a nudge to use its tools to read/convert/view the file (and explicitly not to tell the user the type is unsupported). General across every binary type — not PDF-specific. The file already resolves where the agent's tools run (local cwd or the staged copy in a remote session workspace), so it can act on it directly.	2026-06-09 19:16:46 -05:00
Siddharth Balyan	1febb08240	fix(anthropic): default new Claude models to the modern thinking contract (#42991 ) New Anthropic models without a recognized version substring (claude-fable-5 and future named/numbered releases) were classified as legacy and routed down the manual-thinking path, which made OpenRouter emit thinking.type.disabled — a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400. Invert the brittle version-substring allowlists to default-to-modern (mirroring _get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/ no-sampling contract, with an explicit legacy list for older families. Non-Claude Anthropic-Messages models (minimax, qwen3, …) keep the manual path. - anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort / _forbids_sampling_params now default unknown Claude models to modern; legacy families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS. - openrouter profile: omit reasoning entirely (→ adaptive default) instead of forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy Anthropic + all non-Anthropic models still pass the disable form through. - model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out). Tests assert the invariant ("unknown Claude model -> modern contract; legacy stays manual; non-Claude unaffected"), not specific model names.	2026-06-09 23:37:23 +05:30
Teknium	967c325da8	fix(models): read OpenRouter live context_length before hardcoded catch-all (#42986 ) OpenRouter-routed slugs that are absent from models.dev (e.g. a freshly shipped anthropic/claude-fable-5) fell through to the generic DEFAULT_CONTEXT_LENGTHS["claude"]=200K entry and under-reported their real 1M window. The step-6 OpenRouter live-metadata fallback was gated on `not effective_provider`, but an OpenRouter selection sets effective_provider="openrouter" (inferred from the base URL), so that branch was dead code for every OR model. Add a dedicated step-5 OpenRouter branch that consults the live /models catalog (authoritative, refreshes as new slugs ship) before models.dev and the hardcoded family defaults — mirroring the existing Nous/Copilot/GMI branches. Keeps the Kimi-family 32k underreport guard. Per-model values are respected (claude-haiku-4.5 stays 200K), so it does not blanket-bump to 1M. Regression tests cover the fable-5 case, the genuinely-200k case, and the Kimi guard.	2026-06-09 10:49:32 -07:00
JP Lew	cb4cc08b0a	fix(codex): record app-server token usage in session accounting	2026-06-09 02:46:04 -07:00
Teknium	399b8ee5f0	fix(anthropic): strip Responses-only kwargs before Messages SDK call (#31673 ) (#42155 ) A Responses-API-shaped payload carrying instructions=/input=/store=/ parallel_tool_calls= can reach the native Anthropic messages.stream() / messages.create() call under a rare api_mode-flip race (e.g. a concurrent auxiliary vision call mutating a shared agent between the kwargs build and the stream dispatch). The Anthropic SDK rejects these with a non-retryable TypeError that kills the whole turn and propagates the entire fallback chain. Add sanitize_anthropic_kwargs() at both Anthropic dispatch sites: it drops the Responses-only keys in place and logs a WARNING (with #31673 breadcrumb) when one is present, so the underlying race stays visible in the wild instead of being silently papered over.	2026-06-08 09:36:38 -07:00
teknium1	dd0d1222a2	fix(agent): don't retry interrupt-induced transport errors (cascading-interrupt hang) When agent.interrupt() fires during an active LLM call, the main poll loop force-closes the worker-local httpx client to stop token generation. That raises a transport error (RemoteProtocolError) on the worker thread — the EXPECTED consequence of our own close, not a network bug. The streaming retry loop misclassified it as a transient connection error and retried; each doomed retry stalled for the full stream-stale timeout (up to 300s). Because the gateway caches AIAgent instances per session, the stale worker outlived the interrupted turn and raced the next turn's request on shared client state — the root of the multi-minute cascading-interrupt hang reported in the wild. Fix: a request-local _request_cancelled token set by the poll loop right before the force-close, in both interruptible_api_call (non-streaming) and interruptible_streaming_api_call. The worker's exception handler checks the token and exits cleanly — no retry, no fallback, no 'reconnecting' status — instead of treating the forced error as transient. The token is request- local (not agent._interrupt_requested, which is cleared at turn boundaries) so a stale worker outliving its turn still recognizes its own forced close. Original diagnosis and fix by @kristianvast (PR #6600), against the then- inline methods in run_agent.py. Those were since extracted into agent/chat_completion_helpers.py, so the fix is reapplied there. Co-authored-by: Kristian Vastveit <kristianvast@users.noreply.github.com>	2026-06-08 02:19:13 -07:00
Teknium	aa6f2775fa	fix(memory): run end-of-turn sync off the turn thread (#41945 ) A misconfigured/slow external memory provider could hold the agent in the 'running' state for minutes after the final response was delivered. MemoryManager.sync_all / queue_prefetch_all looped provider.sync_turn / queue_prefetch INLINE on the turn-completion path; a provider making a blocking network/daemon call (a broken Hindsight daemon was observed blocking ~298s before failing) blocked run_conversation from returning. Because every interface (CLI, TUI, gateway) marks the agent 'running' until run_conversation returns, the agent stayed busy for the full block and any follow-up message triggered an aggressive interrupt that dropped the message. Dispatch provider sync/prefetch to a lazily-created single-worker background executor. sync_all / queue_prefetch_all return immediately; work completes (or fails, logged) in the background. A single worker serializes writes so turn N lands before turn N+1. flush_pending() provides a barrier for session boundaries and deterministic tests. shutdown_all() drains the executor with a bounded timeout so a wedged provider can never hang teardown. Builtin-only / no-provider sessions spawn no executor (zero new threads in the common case).	2026-06-08 02:18:59 -07:00
teknium1	02a4d66951	fix(auxiliary): retry transient transport error once before fallback (#16587 ) Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details Nix Lockfile Fix / auto-fix-main (push) Has been cancelled Details Nix Lockfile Fix / fix (push) Has been cancelled Details A one-off transient transport failure (streaming-close / incomplete chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to provider/model fallback (or, for context compression, dropped the summary and entered cooldown), even when an immediate retry on the same provider would have succeeded. Add a single same-target retry at the top of call_llm() and async_call_llm() — before the existing except-chain — gated on a new _is_transient_transport_error() that reuses the canonical _is_connection_error() detector plus a 5xx/408 status check. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged. This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface, rather than each caller re-implementing it. The context compressor needs no change — it calls call_llm and inherits the retry; its existing fallback-to-main path (#18458) now composes naturally (retry the aux model once, then fall back to main only if the retry also fails). Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>	2026-06-08 01:05:45 -07:00
teknium1	524453dab5	refactor(agent): consolidate inner-retry-loop recovery flags into TurnRetryState (god-file Phase 1b) run_conversation's inner retry loop tracked recovery state in ~15 scattered bare booleans (per-provider OAuth refresh guards, format-recovery guards, restart signals). They are now fields on a single TurnRetryState dataclass the loop mutates in place (_retry.<flag>), giving the recovery bookkeeping a named, testable home. Loop-control vars (retry_count, max_retries, max_compression_attempts) stay as plain locals — they're while-mechanics, not recovery bookkeeping. Behavior-neutral: pure local→attribute rewrite of 42 references; kwarg NAMES preserved (e.g. has_retried_429=_retry.has_retried_429). Live simple + tool turns OK. Validation: tests/run_agent/ 1615 passed / 0 failed under per-file process isolation; new test_turn_retry_state.py pins the field contract.	2026-06-07 22:42:05 -07:00
JimStenstrom	cb5c24e37d	fix(agent): sync logging session context on compaction id rotation When context compaction rotates agent.session_id, it updates the gateway/tools session context (set_current_session_id -> HERMES_SESSION_ID env + ContextVar) but never updates the separate logging session context. The [session_id] tag on log lines comes from hermes_logging._session_context (set once per turn in conversation_loop.py), so post-compaction log lines in the same turn carry the STALE old id while the message/DB/gateway state carry the new one — breaking log correlation exactly at the compaction boundary. Call hermes_logging.set_session_context(agent.session_id) alongside the existing set_current_session_id, guarded so a logging failure can't regress the routing update. Logs-only; no runtime or caching impact. Refs #34089	2026-06-07 22:30:02 -07:00
Teknium	8e223b36ed	fix(curator): protect load-bearing built-in skills from archival/consolidation (#41817 ) The curator's idle-archival path (apply_automatic_transitions under prune_builtins) could archive the bundled `plan` skill, killing the /plan slash command silently — typing /plan then returned 'Unknown command' with no signal that a skill had vanished. The archived skill's hash stays in .bundled_manifest, so 'hermes update' wouldn't re-seed it. Add PROTECTED_BUILTIN_SKILLS ({plan}) enforced at the master gate is_curation_eligible() (covers archive_skill + the transition walk) and in the candidate enumerator (so the LLM consolidation pass never sees them). Immune to prune_builtins, pin state, and LLM judgment.	2026-06-07 22:23:29 -07:00
Teknium	2789bf4e25	fix(auxiliary): route Codex Responses path through shared converter (#5709 ) The auxiliary Codex adapter maintained its own chat->Responses conversion loop that forwarded every non-system message's role verbatim into Responses input[]. When flush_memories()/compression replayed session history containing assistant tool_calls + role=tool results, those tool messages leaked into the request and the Responses API rejected them with HTTP 400: Invalid value: 'tool'. Route _CodexCompletionsAdapter.create() through the same shared converter the main agent transport uses (_chat_messages_to_responses_input), so tool calls become function_call items and tool results become function_call_output items with a valid call_id. Single conversion path means no future drift. Also remove the now-dead _convert_content_for_responses() helper — its only caller was the private conversion loop this change deletes. Co-authored-by: ProgramCaiCai <techxacm@gmail.com>	2026-06-07 22:18:31 -07:00
teknium1	54870847cb	refactor(agent): extract run_conversation prologue into agent/turn_context.py Phase 1 of the god-file decomposition plan. run_conversation's ~470-line once-per-turn setup block (stdio guarding, retry-counter resets, user-message sanitization, todo/nudge hydration, system-prompt restore-or-build, crash-resilience persistence, preflight compression, the pre_llm_call hook, and external-memory prefetch) is moved verbatim into build_turn_context(), which returns a TurnContext dataclass the loop unpacks. Behavior-neutral move-and-name refactor: the builder mutates `agent` exactly as the inline code did; only the locals the loop reads back are returned. - run_conversation: 4602 -> 4217 LOC (-385) - agent/conversation_loop.py: 4965 -> ~4580 LOC - new agent/turn_context.py: focused, dependency-injected, unit-tested in isolation Tests: tests/run_agent/ 1570 passed / 0 failed under per-file process isolation. Relocation follow-ups: 413_compression mocks now patch both module references; nudge/on_turn_start source-inspection guards point at the extracted module.	2026-06-07 22:17:35 -07:00
Basil Al Shukaili	8513a6aec7	fix(compression): guard against cross-session stale _previous_summary contamination When a cron or background session compacts, it sets _previous_summary for iterative updates. If that session ends without /new or /reset (which calls on_session_reset()), the stale summary survives on the ContextCompressor instance. A subsequent live messaging session's compaction then injects it as 'PREVIOUS SUMMARY:' into the summarizer prompt — contaminating the live session with unrelated content from the prior session. Add an else guard in compress(): when no handoff summary is found in the current messages but _previous_summary is non-empty, discard it so _generate_summary() starts fresh instead of iteratively updating a stale cross-session summary. Fixes #38788	2026-06-07 22:09:45 -07:00
Teknium	a77bc2c08d	fix(compression): disable compression on background-review fork to prevent cross-turn stale-parent fork (#41708 ) The per-session compression lock prevents same-window concurrent forks but not cross-turn ones: the background-review fork shares the parent's session_id, so if it won a compression race its new child session was never adopted by the gateway (the fork is single-lifecycle). The next foreground turn then started from the stale parent and compressed it again, leaving the same parent with two sibling children. Set review_agent.compression_enabled = False so the fork never triggers compression. Both trigger sites in conversation_loop.py gate on compression_enabled before calling _compress_context, so the fork can never rotate the shared parent. Review needs full context anyway — compressing would degrade the memory/skill summary. The per-session lock is kept as defense-in-depth for any future shared-session path. Adds a regression test that fails without the flag and passes with it. Closes #38727	2026-06-07 22:06:48 -07:00
islam666	2e61de0638	fix(model_metadata): consult DEFAULT_CONTEXT_LENGTHS before 256K fallback on custom endpoints Problem: get_model_context_length() had an early return at the end of the custom-endpoint probe branch (step 3) that returned DEFAULT_FALLBACK_CONTEXT (256K) without ever consulting the hardcoded DEFAULT_CONTEXT_LENGTHS catalog (step 8). Models served through a custom/proxied gateway (e.g. corporate Anthropic proxy) that didn't expose Ollama or local-server endpoints would hit this path and get capped at 256K, even when the model name clearly matched a known entry in the catalog (e.g. claude-opus-4-8 → 1M). Changes: - agent/model_metadata.py: Before returning DEFAULT_FALLBACK_CONTEXT at the end of the custom-endpoint branch, consult DEFAULT_CONTEXT_LENGTHS using the same longest-key-first fuzzy matching as step 8. Only fall through to 256K if no catalog entry matches. - tests/agent/test_model_metadata.py: Updated existing test and added new test covering the custom-endpoint → catalog fallback behavior. Fixes #38865	2026-06-07 21:50:57 -07:00
islam666	41f0714287	fix(vision): honor custom_providers per-model supports_vision (#41036 ) _supports_vision_override() in image_routing.py checked model.supports_vision and providers.<name>.models, but not the legacy list-style custom_providers config. A custom provider entry like: custom_providers: - name: my-provider models: my-model: supports_vision: true was ignored, causing image_input_mode=auto to route through the auxiliary vision_analyze path instead of natively attaching images. Fix: added a lookup step for custom_providers list entries, matching by provider name (including 'custom:<name>' variants at runtime). providers.<name>.models still takes precedence over custom_providers. 13 new tests covering: true/false override, custom: prefix matching, no-match fallback, non-dict entries, empty lists, models key missing.	2026-06-07 21:50:57 -07:00
Teknium	b97cd81c78	refactor(insights): drop dead pricing/duration wrappers, call usage_pricing directly (#40618 ) Salvaged from #40527; re-verified on main, tightened, tested. Co-authored-by: HeLLGURD <HeLLGURD@users.noreply.github.com>	2026-06-07 18:33:20 -07:00
Teknium	cb3e41e2fd	feat(onboarding): opt-in structured profile-build path on first contact (#41114 ) * feat(onboarding): opt-in structured profile-build path on first contact On a user's very first gateway message, Hermes now optionally offers to build a short profile of them — then, only with consent, gathers durable facts and persists them to the user-profile memory store (memory tool, target="user") so future sessions start already knowing who they are. Inspired by Poke's zero-input onboarding, but consent-first by design: - The agent OFFERS, never assumes. Declining stops it immediately. - Before ANY external lookup it states what it will look up and asks. - It never reads connected accounts (email/calendar) silently — the exact privacy concern that made naive implementations feel invasive. Wiring reuses existing infrastructure end-to-end: - gateway/run.py first-message hook (was a plain self-intro) now swaps in the profile-build directive when enabled and not yet offered. - agent/onboarding.py gains profile_build_mode()/profile_build_directive() + PROFILE_BUILD_FLAG, latched once via the existing onboarding.seen mechanism so the offer fires at most once per install. - config default onboarding.profile_build: "ask" (set "off" to disable). Added to an existing section, so no _config_version bump needed. No new storage layer, no new injection path, no prompt-cache impact. * fix(dashboard): fold onboarding into agent tab to avoid 1-field category onboarding.profile_build is the only schema-surfaced onboarding field (onboarding.seen is an internal latch dict), so the dashboard CONFIG_SCHEMA single-field-category invariant rejected it. Merge onboarding -> agent like the other small categories.	2026-06-07 08:36:48 -07:00
Teknium	d87f293972	feat(compression): temporal anchoring in compaction summaries (#41102 ) Compaction summaries now receive the current date and instruct the summarizer to rewrite completed actions as absolute, dated, past-tense facts (e.g. "email John about the proposal" -> "Sent the proposal email to John on 2026-06-07"). A resumed conversation no longer re-issues work that already happened or treats a finished action as still pending. The date is resolved via hermes_time.now() (date-only, user-configured timezone) inside _generate_summary. The compaction summary is a mid-conversation message that is never part of the cached prefix, so the date does not affect prompt-cache stability. Date resolution is best-effort: a clock failure omits the rule rather than blocking compaction. The rule rides the shared template, so both first-compaction and iterative-update prompts carry it. Inspired by Poke's summarization (temporal anchoring + semantic preservation).	2026-06-07 08:36:45 -07:00
kshitijk4poor	ffe665277c	fix(aux): honor model.default_headers on auxiliary client too (#40033 ) The salvaged main-agent fix (sanidhyasin) applies model.default_headers to the primary OpenAI client, but the auxiliary client (title generation, context compression, vision routing) builds its own clients and did not read the override. For a `provider: custom` endpoint behind a gateway/WAF that rejects the OpenAI SDK's identifying headers, the main turn would succeed while auxiliary calls to the same endpoint still failed with the opaque 502/4xx from #40033. Add agent.auxiliary_client._apply_user_default_headers() (user values win over provider/SDK defaults; no-op when unconfigured) and apply it at every OpenAI-wire client construction site: - _try_custom_endpoint() — config-level `model.provider: custom` - the named custom-provider branch (custom_providers/providers entries), including the anthropic-SDK-missing OpenAI-wire fallback - the api-key-provider, async-conversion, and main resolve_provider_client fallback branches To prevent the two clients ever drifting on precedence/value handling, AIAgent._apply_user_default_headers (run_agent.py) now delegates the config read + merge to this shared helper (run_agent already imports from auxiliary_client). Native Anthropic/Bedrock branches are untouched (they don't use the OpenAI wire). 8 new tests (helper semantics + config-level custom + named custom); full aux + attribution header suites green (295).	2026-06-07 02:02:40 -07:00
Teknium	3c8f1dee8d	fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (#36718 ) compress_context() sets last_prompt_tokens=-1 right after compression to mark "no real API usage yet". The preflight display-seed used `_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1 (truthy), so any positive rough estimate clobbered the sentinel with a schema-inflated count — re-triggering compression on the next turn. Treat any negative value as "no real data yet" and skip the seed. Salvaged from #40246 as the minimal root-cause fix. The original also added an `_awaiting_suppression_count` bounded-window state machine to should_compress() across 3 files; left out here to keep blast radius small — the sentinel guard alone fixes the re-fire. The suppression window can be added separately if the usage=None-stub edge case warrants it. Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>	2026-06-07 01:56:51 -07:00
Teknium	0524c9b34e	feat(compression): raise compaction trigger to 85% for gpt-5.5 on Codex OAuth (#40957 ) The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window (verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses is rejected with context_length_exceeded while ~250K succeeds; the same slug exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the default 50% trigger, auto-compaction fires at ~136K — half the usable window. Raise the trigger to 85% (~231K) on this exact route only, gated by a new compression.codex_gpt55_autoraise config flag (default true). When it fires, emit a one-time notice (CLI inline print + gateway status_callback replay) with the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's global threshold. - _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex - _compression_threshold_for_model() now provider-aware + opt-out param - config key + _config_version bump (27->28) for backfill - docs + tests (40 cases in test_arcee_trinity_overrides.py)	2026-06-07 01:40:50 -07:00
Siddharth Balyan	fcb1944b4f	feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix Lockfile Fix / auto-fix-main (push) Waiting to run Details Nix Lockfile Fix / fix (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details * feat(tui): HERMES_DEV_CREDITS live-spend dev readout (L0 tracer for usage-aware credits) L0 of the usage-aware-credits feature: a dev-only, env-gated tracer that exercises the real header -> CreditsState -> TUI pipe end-to-end behind HERMES_DEV_CREDITS, de-risking the L1/L5 build before the notice policy exists. - agent/credits_tracker.py: CreditsState + parse_credits_headers (headers are strings -> paid_access via == "true", never bool(); retain-last-known; only subscription_micros may be negative; _usd kept verbatim). - run_agent.py: _capture_credits / get_credits_state / get_credits_spent_micros, session-start baseline latch, + dev-gated "credits" capture log. - agent/chat_completion_helpers.py: capture on the streaming response. - agent/agent_init.py: init _credits_state + _credits_session_start_micros. - tui_gateway/server.py: _get_usage emits dev_credits_spent_micros only when flagged. - ui-tui appChrome.tsx / types.ts: cents delta status segment + "(dev credits)" banner. Off by default; silent for normal users. Validated live against staging (capture log delta matches the TUI segment). Throwaway consumer (readout/log/ banner); credits_tracker + the capture plumbing are the real feature foundation. test(credits): lock parser under 9-state matrix + harden validation (L2) Add tests/agent/test_credits_tracker.py with 92 tests covering the 9-state matrix (healthy, sub_90pct, grant_exhausted, purchased_only, tool_pool_free, depleted, debt, missing, no_org) plus validation edge cases: version strict==1 with warn-once latch for v>1, bool-string trap (paid_access/tool_pool_gated_off == "true"/"false", never bool()), half-pair subscription limit treated as both-absent while parse succeeds, USD regex ^-?\d+\.\d{2}$, non-int micros → None, negative non-subscription micros → None, as_of_ms junk → None, zero limit ZeroDivision guard. Harden agent/credits_tracker.py to match the spec: - Add tool_pool_micros/tool_pool_gated_off/from_header fields to CreditsState - Add depleted property (== not paid_access, never remaining==0) - Change used_fraction guard to key off subscription_limit_micros (the actual denominator) not denominator_kind (metadata) - Replace fail-soft _safe_int with a sentinel-returning variant; full validation now returns None on any malformed field rather than silently defaulting - Add module-level warn-once latch for version > 1 - Add USD regex validation; add denominator_kind allow-list check - Parse x-nous-tool-pool-* prefix headers (not x-nous-credits-tool-pool-) feat(credits): notice spine — AgentNotice + notice_callback/notice_clear_callback + TUI binding (L1) L1 of usage-aware credits: the driver-agnostic notice delivery spine that L4's policy will fire through and L5's TUI render will consume. - agent/credits_tracker.py: AgentNotice dataclass (text/level/kind/ttl_ms/key/id; kind defaults "sticky", kept TTL-expressive for a future config seam). - run_agent.py: AIAgent gains notice_callback + notice_clear_callback slots and _emit_notice / _emit_notice_clear emitters (swallow all callback errors — a notice must never break the agent loop; no-op when unbound). - agent/agent_init.py: thread both callbacks through init_agent. - tui_gateway/server.py: bind both in _agent_cbs → notification.show / notification.clear WS events (snake_case payload, matching the existing gateway-event convention). - ui-tui/src/gatewayTypes.ts: notification.show / notification.clear arms on GatewayEvent. - tests/run_agent/test_notice_spine.py: 15 tests (emitter fire + fail-open + no-op, signature threading, TUI binding payload shape). Messaging push is out of v1 (binds neither callback). CLI binding + the TUI render/ decode land with L4 (firing) and L5 (render) so turn-end flush is wired correctly. * feat(credits): threshold reconciliation policy + tests (L4.1) * feat(credits): wire threshold policy into capture + latch (L4.2) After a fresh header parse, _capture_credits runs evaluate_credits_notices against the agent's _credits_latch and emits the result — clears first, then shows (so a recovered depletion clears before the "restored" success lands, and depleted wins the latest-wins slot). Gated on a bound notice_callback: messaging (no callbacks) still caches state for /usage but runs no policy. Parse stays fail-open (miss → keep last-known); the eval/emit path warns on failure rather than swallowing, so a depletion-notice bug can't vanish silently. - run_agent.py: _capture_credits split into parse (swallow→miss) + policy (warn); latch lazy-guarded (object.__new__ safety). - agent/agent_init.py: init agent._credits_latch = {"active": set(), "seen_below_90": False}. * feat(tui): render credits notices in the status bar (L5, Strategy B) The TUI now renders the notification.show / notification.clear gateway events the agent emits — a level-colored notice overrides the status/verb slot when not busy. - Notice state machine on turnController (pendingNotice + dedicated noticeTimer + show/clear/applyNotice/flushPendingNotice/clearNoticeState). createGatewayEventHandler decodes the events and delegates. - Render priority busy > notice > status (appChrome StatusRule); notice text rendered verbatim (its glyph comes from the policy), shrinkable so it never clips model│ctx; dev-credits banner + Δ segment preserved. UiState.notice is snake_case (matches wire). - Busy-wins: a notice arriving mid-turn is held and flushed at the THREE turn-end sites (recordMessageComplete / interruptTurn / recordError) — never idle(), which reset() also calls (would leak across sessions); reset() clears instead. - Dedicated noticeTimer (never statusTimer); TTL starts on visibility with an id-guard; latest-wins cancels the prior timer; clear is key-matched (no-op on mismatch); a sticky survives a turn (flush no-ops with no pending); session reset clears (no cross-session leak). - 20 tests (handler/turnController logic incl. R3-C2 timer isolation + render priority). * feat(credits): cold-start seed for new Nous sessions (L3) A genuinely-new Nous session has no inference header yet, so seed credits state from the authoritative GET /api/oauth/account snapshot at session start (in the new-session branch of _restore_or_build_system_prompt — inline, since the on_session_start plugin hook gets no agent reference). The seed runs the shared notice policy, so a session that opens already depleted warns IMMEDIATELY rather than only after the first turn. - Maps the nested account fields (paid_service_access → paid_access; total_usable / subscription / purchased on paid_service_access_info; rollover on subscription), each None-guarded; float dollars → micros via round(d1e6), _usd left "" (render formats from micros — never synthesize a verbatim usd from a float). - Magnitudes-only: no monthlyCredits on the endpoint → subscription_limit_* unset → used_fraction None → no warn90 from the seed (% only once a header lands, per D-E). - Provider-guarded to Nous; fail-open (any error leaves _credits_state None, never blocks startup); paid_access unknown ⇒ True (never falsely depleted). - run_agent.py: extracted the warm-path policy/emit block into a shared _emit_credits_notices() so capture and the seed fire notices identically. * feat(credits): /usage Nous credits magnitudes view + recovery trigger (L6) Add Nous credit dollar magnitudes to /usage (subscription / top-up / total + rollover + renewal + portal CTA), magnitudes-only per v1 (no % until the account endpoint exposes a denominator). Reuses the existing account-usage render machinery via a new pure build_nous_credits_snapshot() that maps a NousPortalAccountInfo to an AccountUsageSnapshot; no nous branch is added to fetch_account_usage (keeps the per-provider boundary intact). CLI /usage also doubles as a depletion-recovery trigger: a force_fresh account fetch, kept in a SEPARATE local so it never clobbers the header-sourced agent._credits_state (which alone carries used_fraction). If paid access recovered while credits.depleted is latched and a notice consumer is bound, it reuses agent._emit_credits_notices() to clear it. Gateway /usage displays magnitudes only — messaging binds no notice consumer, so it performs no recovery emit. Fail-open throughout: any portal hiccup leaves /usage unaffected. * refactor(credits): dedupe HERMES_DEV_CREDITS flag parse via shared helpers The dev-flag truthy check was inlined in three places. Replace with the shared utils.is_truthy_value (run_agent.py, tui_gateway/server.py — also drops a redundant inline `import os`) and a hoisted DEV_CREDITS_MODE export in ui-tui/src/config/env.ts (consumed by appChrome, which also stops recomputing the env check on every render). Behaviour-preserving; identical truthy set. * fix(credits): cut dead /usage recovery trigger + bound portal fetches (L6 review) Adversarial review found the /usage depletion-recovery trigger dead AND broken: the CLI binds no notice_clear_callback, the TUI runs /usage in a separate slash-worker subprocess (its own agent/latch), and the no-clobber rule made it evaluate stale paid_access anyway. Recovery already happens on the next inference (warm path), so the trigger was redundant — remove it and stop the depleted notice over-promising. - cli.py: remove the dead recovery block; bound the /usage portal fetch with a 10s wall-clock timeout (ThreadPoolExecutor) like the per-provider fetch — urllib's per-socket timeout is not a wall-clock guarantee. - agent/credits_tracker.py: reword the depleted CTA to "run /usage for balance" (no false recovery promise; /usage shows fresh magnitudes, sticky clears next turn). - agent/conversation_loop.py: same wall-clock timeout on the cold-start seed fetch so a stalled portal can't hang session startup; tidy its time import. * chore(credits): dev notice-state fixtures (HERMES_DEV_CREDITS_FIXTURE) Throwaway dev scaffolding to exercise the notice pipeline without real spend or Redis seeding. Set HERMES_DEV_CREDITS_FIXTURE to a state name (healthy / sub_90pct / grant_exhausted / depleted / clear) or a file path whose contents name a state (re-read each turn → flip states live for recovery testing). _capture_credits injects the chosen CreditsState instead of parsing real headers and runs the shared notice policy. Deletable with the rest of the HERMES_DEV_CREDITS scaffolding. * feat(credits): /usage monthly-grant % gauge The portal /api/oauth/account subscription block now carries monthly_credits (the per-period grant allowance, the % denominator). The consumer parsed monthly_charge but dropped monthly_credits, so /usage stayed magnitudes-only. Capture monthly_credits into NousPortalSubscriptionInfo + _subscription_from_payload. build_nous_credits_snapshot emits a Subscription usage window (real % used, routed through the existing render machinery) when monthly_credits is a finite positive denominator and credits_remaining is finite and <= cap; otherwise it degrades to magnitudes-only (older portals, rollover-over-cap, or non-finite payloads). Guards (adversarial-review-driven): reject non-finite operands (json.loads parses bare NaN/Infinity by default → would render $nan + a false 100% used), reject bools, guard div-by-zero (cap>0), and suppress the gauge when remaining > cap (rollover spanning the period makes the cap a nonsensical denominator → the $X-of-$Y detail would read as a contradiction). Debt (remaining<0) clamps to 100%. Money rule preserved: the ratio + magnitudes are computed from numeric float account fields via display formatting, never by parsing a server _usd string (there are none on these dataclasses). 13 gauge tests added (tests/agent/test_nous_credits_gauge.py). fix(credits): show /usage Nous block whenever a Nous account is present /usage runs in a slash-worker subprocess whose resolved inference provider is often not "nous" even when the user has a Nous account, so gating the Nous credits block on (provider == "nous") hid it entirely — the account data was fully available but never rendered. Gate instead on "a Nous account is logged in": a cheap local auth-state lookup (get_provider_auth_state('nous') has an access_token) decides whether to attempt the portal fetch, regardless of which provider inference runs on. In the gateway the block is also lifted out of the 'if provider:' scope so a Nous-credentialled user with another (or no) resident inference provider still sees their balance. Fail-open and the per-fetch wall-clock timeout are preserved. * fix(credits): show /usage Nous block when there's no live agent (TUI slash-worker) In the TUI, /usage runs in a slash-worker subprocess that resumes the session WITHOUT building an agent (self.agent is None), so _show_usage early-returned "(._.) No active agent" before ever reaching the Nous credits block — which is agent-independent (a portal fetch gated on Nous auth-state). Extract the block into _print_nous_credits_block() and run it at the no-agent / no-calls early-returns too (returns True if it printed, so the fallback message only shows when there's genuinely nothing). Verified live against staging: the block + monthly-grant gauge now render in the slash-worker /usage path (previously hidden). The plain CLI REPL + messaging paths are unchanged (they have a live agent). * feat(credits): escalating 50/75/90 usage bands (single status line) Replace the lone 90%-used warning with three escalating bands (50 info, 75 warn, 90 warn) shown as ONE status-bar line: it displays the highest band the subscription grant has crossed, replaces the line as usage climbs, steps back down on recovery, and clears below 50%. No stacking, no per-turn churn. Bands live in a tunable CREDITS_USAGE_BANDS list; the policy derives everything from it. Single notice key (credits.usage) with a usage_band latch field so the notice only re-emits when the band actually changes. The crossing gate (seen_below_90) is preserved so a fresh live session that opens mid-range stays quiet until it has been observed below the lowest band (cold-start primes it when it wants an open-high warning). Denominator math unchanged: % = subscription grant burn (cap - grant_remaining)/cap, clamped [0,1]; top-up never moves the %. Migrated test_credits_policy.py to the new key + added TestUsageBands (climb, step-down, recovery-clear, idempotent, inclusive boundaries). * feat(credits): hydrate notices at session OPEN via shared seed (TUI + first-turn) Notices previously only fired inside a conversation turn (first message), so a session that opened already depleted / past a usage band showed nothing at 'ready'. Extract the cold-start seed into a shared seed_credits_at_session_start() and call it (a) in the TUI/desktop agent build right after the notice callback is wired (fires at 'ready', before any message) and (b) as the first-turn fallback in conversation_loop. Idempotent (skips once _credits_state exists) and fail-open. The seed now maps monthly_credits -> subscription_limit_micros + denominator_kind='subscription_cap', so used_fraction is computable at seed time and usage-band warnings (not just depletion) hydrate on open. Primes the crossing latch so a session opening already in a band warns immediately. Degrades to depletion-only when monthly_credits is absent (older portals). Adds test_credits_cold_start.py covering open-at-band, depletion, debt, no-cap degradation, and the shared seed (fires/idempotent/skips-non-nous). * feat(credits): /usage monthly-grant % gauge + fixture support + TUI surfacing agent/account_usage.py: build_nous_credits_snapshot emits a subscription %% gauge when the portal supplies a positive, finite monthly_credits denominator with remaining <= cap (guards reject NaN/Infinity and rollover-over-cap, which would render $nan or a contradictory $X-of-$Y); degrades to magnitudes-only otherwise. Adds shared nous_credits_lines() (auth-gated, wall-clock-bounded portal fetch) so the CLI and TUI /usage render the same block, and _snapshot_from_credits_state() so HERMES_DEV_CREDITS_FIXTURE drives /usage offline too. TUI: session.usage RPC carries credits_lines (agent-independent) and the /usage panel renders them regardless of API-call count or resume state — previously the TUI's separate /usage implementation only showed token counts. Money rule preserved: %% and magnitudes come from numeric float account fields via display formatting, never by parsing a server _usd string. feat(credits): CLI REPL inline notices (parity with TUI) The plain CLI agent bound no notice callbacks, so credit notices were TUI-only. Bind notice_callback/notice_clear_callback on the CLI AIAgent; _on_notice renders a single level-colored line above the prompt (error red / warn yellow / success green / info dim) via _cprint, and seed credits at session open so a depletion or usage-band warning shows before the first message — the same hydration the TUI got. _on_notice_clear is a no-op (the REPL prints lines, no persistent slot). * test(credits): add sub_50pct + sub_75pct dev fixtures for the new usage bands The fixture set jumped 10%% -> 90%%; add sub_50pct (uf 0.5 -> band 50 info) and sub_75pct (uf 0.75 -> band 75 warn) so the new escalating bands are exercisable via HERMES_DEV_CREDITS_FIXTURE across all three surfaces (notice, session-open seed, /usage gauge). * fix(credits): usage-band notice clears on next prompt (not sticky-forever) A 50/75/90 usage heads-up was sticky and camped the status bar indefinitely. Clear the visible credits.usage notice when a new turn starts (startMessage), so it shows until your next prompt then yields. The server latch is unchanged, so it won't re-nag at the same band — it only re-shows when the band actually changes (climb) or clears when usage drops below the lowest band. Depletion stays sticky. * refactor(credits): consolidate the /usage credits block behind nous_credits_lines() The CLI (_print_nous_credits_block) and the messaging gateway (_handle_usage_command) each re-implemented the auth-gate + portal fetch + render, and both bypassed the dev-fixture short-circuit that only the TUI honored — so /usage ignored HERMES_DEV_CREDITS_FIXTURE on the CLI and in chat. Route both through the shared agent.account_usage.nous_credits_lines() helper: one fetch/render path, one auth gate, and the fixture works on every surface (~60 fewer duplicated lines). The gateway usage test recorded only the last asyncio.to_thread call; /usage now dispatches both the account fetch and the credits fetch, so it records every call and matches the account fetch by its provider arg. * fix(credits): keep the /usage gauge type-safe and log its fail-open path _is_finite_num is now a TypeGuard[float], so the type checker narrows the gauge operands (monthly_credits / credits_remaining) and the magnitudes passed to _fmt_usd through it — no more None-operand warnings on the arithmetic. Add a debug breadcrumb on the nous_credits_lines portal-fetch fail-open so a dead /usage block is diagnosable in agent.log without a dev flag. * fix(credits): harden the header tracker — prod-leak gate, hot-path probe, fire-and-forget seed - Prod-leak guard: dev fixtures (HERMES_DEV_CREDITS_FIXTURE) now also require HERMES_DEV_CREDITS, so a stray fixture var can't surface fabricated balances on a real account. Matches the documented run workflow (both vars set together). - Hot-path probe: parse_credits_headers checks for the version sentinel header before allocating a lowercased copy of the response headers — skips that work on every non-Nous API call. Behaviour-identical and still case-insensitive. - Fire-and-forget seed: the real portal fetch in seed_credits_at_session_start now runs in a daemon thread, so a slow/unreachable portal never delays session "ready" (previously blocked up to 10s). The dev-fixture path stays synchronous; the thread re-checks idempotency before hydrating (a live header may land first). - Diagnostics: debug breadcrumbs on the parse and seed fail-open paths so a crashed parser / dead seed is distinguishable from a legitimate no-headers miss. Cold-start tests set HERMES_DEV_CREDITS alongside the fixture to match the gate. * test(tui): fix env-timing in the StatusRule dev-credits assertion DEV_CREDITS_MODE is read once at module load (config/env), so mutating process.env.HERMES_DEV_CREDITS inside the test couldn't flip it — the dev-banner assertion only passed if the env was exported before vitest started, and failed in a normal run. Move that assertion to a sibling file that mocks config/env with DEV_CREDITS_MODE: true (scoped, no module-reset / React-identity hazard). * test(credits): cover the dev-fixture /usage render and usage-band clear-on-prompt - _snapshot_from_credits_state (the offline /usage renderer) had no direct test: lock the gauge math, the verbatim _usd magnitudes, the depletion line and the fixture marker, plus the no-cap (no gauge) and None-state cases. - turnController.startMessage had no test for clearing the credits.usage notice on the next prompt while leaving credits.depleted sticky. feat(credits): deliver credit notices over messaging gateways Bind notice_callback/notice_clear_callback on the per-turn gateway agent so usage-band / depletion / restored notices reach Telegram/Discord/Slack/ etc. Previously the messaging gateway bound neither callback, so the agent's _emit_credits_notices early-returned and a chat user crossing a band got nothing unless they ran /usage manually. - render_notice_line(): AgentNotice -> single plaintext line (level glyph + text), plaintext-only so it renders uniformly without per-platform escaping. Fail-soft on malformed/empty notices. - Standalone push for every notice (messaging has no persistent status bar): route through the shared _deliver_platform_notice rail (honors private/ public delivery + thread metadata), scheduled onto the gateway loop via safe_schedule_threadsafe from the agent's sync worker thread — same pattern as _status_callback_sync. - The fired-once latch lives on the cached (reused-in-place) agent and persists across turns, so a band crosses once -> one push, no per-turn re-nag. Re-fires only after idle-eviction rebuilds the agent (a reminder). - Recovery ('Credit access restored') rides the show path (emitted as a success notice, not a clear). notice_clear_callback is a no-op: a sent platform message can't be cleanly retracted. Tests: render glyph/levels/fail-soft + public/private delivery seam through _deliver_platform_notice + no-adapter no-op. * fix(credits): don't double the glyph on messaging notices render_notice_line prepended a per-level glyph, but the notice policy already bakes the glyph into the text (and the TUI + CLI render it verbatim) — so every credit notice over messaging came out doubled ("⚠ ⚠ Credits 90% used", "⛔ ✕ Credit access paused"). Emit the text verbatim instead; drop the now-dead level→glyph map. The render tests fed glyph-less text (and the success case only checked startswith), so the doubling slipped through. Rework them around the verbatim contract and add an end-to-end regression that runs real evaluate_credits_notices output through render_notice_line and asserts the line is returned unchanged.	2026-06-06 13:18:18 +05:30
Teknium	ec46f5912e	fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints (#39730 ) * fix: respect disabled auto-compaction on context overflow Port from anomalyco/opencode#30749. When compression.enabled is false, NO automatic compaction trigger may fire. The proactive token-threshold paths (preflight + post-response should_compress gate) already honoured the setting, but the three provider-overflow recovery paths in the agent loop — long-context-tier 429, 413 payload-too-large, and context-overflow — called _compress_context() unconditionally, silently compressing and rotating the session against the user's explicit choice. Add a single guard at the top of the overflow-recovery dispatch: when compression is disabled and the error is one of those three overflow classes, surface a terminal error (compaction_disabled: True) telling the user to /compress manually, /new, switch to a larger-context model, or reduce attachments. Manual /compress (force=True) is unaffected — it never enters this loop. Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't compress when disabled; control case still compresses when enabled). Existing overflow-recovery tests updated to enable compaction explicitly (they verify the recovery fires); fixture defaults flipped to True to match production (compression.enabled defaults to True). * fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints Two distinct failures hit users on the gemini provider with only Google AI Studio keys set. 1. Truncation loop: build_gemini_request() only set maxOutputTokens when max_tokens was non-None. Hermes passes None to mean "unlimited", but Gemini's native generateContent does NOT treat an absent maxOutputTokens as full budget — it applies a low internal default and stops early with finishReason=MAX_TOKENS, truncating tool calls. The agent then retries 3x and refuses the incomplete call. Now default to the published 65,535 ceiling (shared by all current Gemini text models) when max_tokens=None. 2. HTTP 400 on Gemini endpoint: the chat_completions transport assembles profile extra_body (Nous portal 'tags', reasoning, provider prefs) and sends it via the OpenAI client to whatever base_url is resolved. When a profile that emits extra_body (e.g. Nous) is active but the endpoint is a native Gemini base_url — typical when only Google creds exist and a fallback/aux call lands on Gemini — Google rejects the unknown 'tags' field with a non-retryable 400. Strip all non-thinking_config extra_body keys when the resolved endpoint is native Gemini. Verified E2E against real transport code: tags stripped on native Gemini, preserved on Nous and the /openai compat endpoint; maxOutputTokens=65535 on None, explicit values respected.	2026-06-05 03:53:59 -07:00
Evi Nova	4690bbc363	fix(local): recognize unqualified hostnames as local endpoints (#9248 ) Docker Compose service names (e.g. ollama, litellm, hermes-litellm) are unqualified hostnames with no dots. These are always local — they resolve via Docker DNS, /etc/hosts, or mDNS. Without this fix, the stale stream timeout fires on local LLM proxies, causing infinite reconnect loops. Closes #7905	2026-06-05 10:18:10 +10:00
Fearvox	fa8e2f935b	polish(minimax): address Copilot review comments on M3 default-aux fix Three Copilot inline review comments on #37664, two worth landing in a polish pass before merge: 1. auxiliary_client.py:270 — Copilot suggested keeping the minimax-* entries in _API_KEY_PROVIDER_AUX_MODELS_FALLBACK as a safety net for environments where the profile-based resolution can't import or run plugin discovery. Declined. The deepseek precedent (commit `773a0faca`) explicitly removed deepseek from the same dict for the same reason — the profile layer is the source of truth and the dict is a legacy pre-profiles-system fallback. We do not want to fragment the codebase by provider: either the profile layer is authoritative or the dict is. The minimax PR picks profile (matching deepseek) and the dict stays cleaned up. The risk Copilot raises is real but theoretical — plugin discovery runs at import time of the providers module, which is the first thing any modern Hermes entrypoint imports. 2. tests/agent/test_minimax_provider.py:162 — Copilot flagged that the test class relies on _get_aux_model_for_provider() resolving via provider profiles but doesn't explicitly trigger plugin discovery. Fixed. Added 'import model_tools # noqa: F401' at the top of both test_minimax_aux_is_standard and test_minimax_aux_not_highspeed. The fixtures in the parallel test_minimax_profile.py already did this; the legacy test in test_minimax_provider.py was order-dependent and would silently break if anyone reorganised the test ordering. Pinned the dependency explicitly so the test is order-independent. 3. tests/plugins/model_providers/test_minimax_profile.py:46 — Copilot flagged that the docstring referenced a hard-coded line number 'hermes_cli/models.py:298' that would go stale. Fixed. Replaced with the symbol reference 'hermes_cli.models._PROVIDER_MODELS[\'minimax\']' which is stable under file edits and grep-friendly. The new docstring also reads more naturally — readers don't have to look up 'what's at line 298' to follow the reasoning. All 221 minimax-related tests still pass.	2026-06-04 05:53:35 -07:00
Fearvox	3d1d0a49fe	fix(minimax): align default_aux_model with M3 frontier on minimax + minimax-cn The minimax / minimax-cn / minimax-oauth profiles still advertised M2.7 (and M2.7-highspeed for OAuth) as their default_aux_model, predating the M3 release (2026-06-01). The user-facing _PROVIDER_MODELS['minimax'] catalog top entry is M3, and the recommended config for a Token-Plan install now sets model.default: MiniMax-M3, so the aux default was the only remaining drift. Updates: * minimax default_aux_model: M2.7 -> M3 * minimax-cn default_aux_model: M2.7 -> M3 * minimax-oauth default_aux_model: M2.7-highspeed -> M2.7 (M3 is not on the OAuth / Coding Plan tier per platform docs as of this PR; the highspeed variant was the 2x-cost regression from #4082 that PR #6082 collapsed to plain M2.7 for minimax / minimax-cn but missed OAuth) * agent/auxiliary_client.py: drop the three legacy _API_KEY_PROVIDER_AUX_MODELS_FALLBACK entries for the minimax family. _get_aux_model_for_provider() reads from ProviderProfile.default_aux_model first (line 250) and only falls back to the dict when the profile has no aux model or the profile import fails. With the profile now set, the dict entries are dead code and a drift hazard. Mirrors the deepseek cleanup in `773a0faca`. * tests/agent/test_minimax_provider.py: update the existing TestMinimaxAuxModel assertions from MiniMax-M2.7 to MiniMax-M3 (the intent — 'standard, not highspeed' — is unchanged; the pin value is). * tests/plugins/model_providers/test_minimax_profile.py: new file mirroring tests/plugins/model_providers/test_deepseek_profile.py. Pins each of the three profiles' default_aux_model and asserts _get_aux_model_for_provider() returns it. A second class guards against the highspeed regression coming back. Refs: - Closes #36196 in spirit (M3 support — the catalog half of that issue is #36212; this PR covers the profile half) - Related: #4082 (M2.7-highspeed 2x-cost), #6082 (previous M2.7-highspeed -> M2.7 fix that missed OAuth + the auxiliary_client.py fallback dict) - Pattern: `773a0faca` (same profile-layer fix for deepseek)	2026-06-04 05:53:35 -07:00
teknium	153fe28474	fix(vision): use MiniMax type="video" block (not input_video) + tests The salvaged conversion emitted type:"input_video", which MiniMax M3 rejects just like the original video_url block. Per MiniMax's Anthropic-compat docs, the video content block is type:"video" with an image-style source (base64 or url). Fixes the block type, converts URL-based videos too, and adds 4 video conversion tests (none shipped with the original PR).	2026-06-04 05:38:11 -07:00
AhmetArif0	9756dff5fd	fix(model_metadata): drop stale ≤256,000 cache entries for Grok-4.3 The ``grok-4.3`` (1M context) catalog entry was added on 2026-05-15 (`ce0e189d3`). Between 2026-04-10 (when ``grok-4`` at 256,000 was first added by `b57769718`) and 2026-05-15, grok-4.3 slugs resolved via the generic ``grok-4`` substring catch-all and that 256,000 value was persisted to context_length_cache.yaml. Users who first queried grok-4.3 in that 35-day window are stuck at 256K forever — the cache is read at step 1 before the hardcoded defaults in step 8, so the correct 1M entry is never reached. Mirror the existing Kimi/Codex/MiniMax-M3 stale-cache guards: add _model_name_suggests_grok_4_3() and an elif branch that drops any cached value ≤ 256,000 for a grok-4.3 slug so the next lookup falls through to the 1M hardcoded default. Adds 4 regression tests: helper unit test, stale-drop-and-re-resolve, correct-cache-preserved, and no-clobber for plain grok-4 (256K correct).	2026-06-04 05:36:34 -07:00
Sol Aitken	de60bf40c6	fix(memory): register parent packages for user-installed provider imports User-installed memory providers load under the synthetic _hermes_user_memory.<name> package, but the loader never registered that parent namespace in sys.modules (it only registers "plugins" and "plugins.memory" for bundled providers). As a result any external provider using a relative import failed to load: from . import config ModuleNotFoundError: No module named '_hermes_user_memory' The same gap in discover_plugin_cli_commands() meant an external provider's cli.py with a relative import could never be discovered, so the documented "hermes <plugin>" CLI integration did not work for standalone plugins. Register the synthetic parent namespace before loading user-installed providers, mirror it for cli.py discovery (including the per-provider parent package, without executing the plugin's __init__.py), and make _load_provider_from_dir() reuse only modules actually loaded from disk so a parent shell registered by CLI discovery is never mistaken for the loaded provider. Regressions cover: a flat provider with a sibling relative import, a provider with its implementation in a nested subpackage (including a namespace intermediate directory), cli.py discovery with a relative import, and provider load after CLI discovery ran first. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 05:35:43 -07:00
Nate George	e8c3ac2f5c	fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral) Fireworks/Mistral reject HTTP 400 'Extra inputs are not permitted, field: messages[N].tool_calls[M].extra_content' on any session whose history contains prior Gemini tool calls. Gemini 3 thinking models attach extra_content (thought_signature) to tool_calls; it survived to the wire because the sanitize paths only stripped call_id/response_item_id. Strip extra_content from the outgoing wire copy in both sanitize paths (ChatCompletionsTransport.convert_messages + _sanitize_tool_calls_for_strict_api), but gate it on the target model: keep extra_content for Gemini-family targets (the thought_signature MUST be replayed or Gemini 400s), strip it for everyone else — including non-Gemini models that inherit a stale Gemini signature earlier in a mixed-provider session. Native Gemini is unaffected (GeminiNativeClient bypasses these paths). Original stored history is never mutated (only the per-call copy). Fixes #17986.	2026-06-03 16:42:52 -07:00
kshitijk4poor	da4f407e51	feat(cli): make `hermes portal` the human-readable Portal onboarding alias `hermes portal` (no subcommand) now runs the one-shot Nous Portal onboarding — OAuth login, switch provider to Nous, offer Tool Gateway — identical to `hermes setup --portal` and the human-readable alias for `hermes auth add nous --type oauth` (which still works). The prior status default moves to `hermes portal info`; `status` is kept as a hidden back-compat alias. `open`/`tools` subcommands are unchanged. User-facing hints and docs (status.py, conversation_loop 401 guidance, SystemPage, README, website docs + zh-Hans) now point at `hermes portal` / `hermes portal info`. `--manual-paste` references keep the explicit auth command since `hermes portal` does not expose that flag.	2026-06-04 01:19:28 +05:30
Siddharth Balyan	c349eca823	fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383 ) * fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix locales/ is a bare data dir (no __init__.py), invisible to packages.find and package-data. Sealed installs (pip wheel, Nix store venv) dropped it, so gateway/CLI commands rendered raw i18n keys like gateway.reset.header_default. - pyproject: [tool.setuptools.data-files] locales = ["locales/.yaml"] (wheel) - MANIFEST.in: graft locales (sdist) - agent/i18n._locales_dir: env override -> source -> sysconfig data scheme - nix/hermes-agent.nix: copy locales into the store + set HERMES_BUNDLED_LOCALES as defense-in-depth. The wheel's data-files already materialize into the uv2nix venv, so resolution works with no env var; the override pins the store path against a future uv2nix change that could drop data-files. - tests: metadata regression, wheel + sdist build-install smoke tests, and a bundled-locales flake check that verifies BOTH the wrapper override and the env-var-less data-files path. Smoke test wired into CI. Closes #23943, #27632, #35374. Supersedes #23966, #27716, #30261, #33841, #35429, #35494, #35735, #36697. test: cap locale e2e timeout, tighten catalog count guard The two wheel/sdist e2e tests inherit the global --timeout=30 from addopts; a cold-CI run (isolated build env + venv create + network pip install) can plausibly exceed it. Add @pytest.mark.timeout(300) so they don't ride the unit-test budget and flake intermittently. Also assert the shipped catalog count equals len(SUPPORTED_LANGUAGES) instead of a hardcoded >=16 floor, so the guard self-updates and trips on a single dropped catalog (not just a fully-empty graft).	2026-06-03 12:00:27 -07:00
Teknium	ab2472e692	fix(aux): self-heal Nous-routed calls when a pinned model leaves the catalog (#37732 ) A long-lived process (gateway, watcher) caches the Nous Portal's recommended-models payload and can pin a model for its whole lifetime. When that model is later dropped from the Nous -> OpenRouter catalog, every auxiliary call 404s with 'model does not exist in our configuration or OpenRouter catalog' until the process restarts. Now such a 404 force-refreshes the Portal recommendation and retries once with the current pick (or the gemini-3-flash-preview default). Scoped to Nous-routed calls only. - _is_model_not_found_error(): 404/400 'not found / does not exist / not a valid model' predicate, excludes billing keywords so it never overlaps _is_payment_error. - _refresh_nous_recommended_model(): force-refresh fetch, returns a model distinct from the one that failed, else the known-good default. - Wired into both call_llm and async_call_llm error chains.	2026-06-02 17:14:36 -07:00
brooklyn!	31c40c72c0	fix(desktop): stabilize project folder sessions (#37586 ) * fix(desktop): stabilize project folder sessions Keep desktop folder selection aligned with new sessions and scope TUI gateway cwd through session context so prompts and tools resolve against the selected workspace. * fix(desktop): address review feedback on folder sessions Snapshot sessions before iterating to avoid concurrent-mutation crashes, optional-chain the revealLogs catch, and read console-message args from the correct Electron event/messageDetails positions. * fix(desktop): address second review pass on folder sessions Sync the remembered workspace key with the cwd atom (clear on empty), only load tree children for real directory nodes, and throttle renderer auto-reloads so a deterministic startup crash can't loop forever. * fix(desktop): inherit parent workspace for ephemeral agent tasks Background and preview tasks use ephemeral ids absent from the session map, so pass the parent session cwd into the session context explicitly instead of clearing it back to the gateway launch dir. Also correct the set_session_vars docstring about clear_session_vars semantics. * fix(desktop): validate preview cwd before pinning session context A non-empty but non-existent client cwd would pin an unusable override and silently fall back to the launch dir. Validate once, reuse for both the session context and the terminal override, and fall back to the parent session workspace when invalid. * fix(desktop): harden preview cwd normalization and adopt normalized cwd Guard preview cwd normalization against malformed client paths so a bad input can't fail the whole restart, and adopt the backend's normalized config.get cwd in the no-active-session path so the persisted workspace stays consistent with what the agent uses.	2026-06-02 20:23:09 +00:00
Teknium	0269eca7e1	test(minimax): assert M3 stale-cache guard contract, not a brittle 1M literal (#37220 ) test_stale_m3_cache_dropped_and_reresolves_to_1m hardcoded assert ctx == 1_000_000. The test re-resolves M3 through the live models.dev registry (the seeded stale entry is dropped, so nothing short-circuits the lookup), and models.dev now reports MiniMax-M3 at 512,000 — a change-detector failure unrelated to any code change. The guard's actual contract is: a stale <=204,800 catch-all value for an M3 slug must be DROPPED and re-resolved to M3's real (large) context. Both sources satisfy that (hardcoded catalog 1,000,000; models.dev 512,000), so assert the invariant (ctx > 204,800, stale value gone) instead of a literal that external data can move. Renamed accordingly. 47/47 in test_minimax_provider.py pass.	2026-06-01 23:35:23 -07:00

1 2 3 4 5 ...

641 commits