hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-18 09:51:59 +00:00

Author	SHA1	Message	Date
Teknium	1d8b9e6458	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027) Auxiliary tasks (title_generation, vision, compression, web_extract, session_search) now pick the correct wire protocol based on the endpoint, not just on which resolve_provider_client branch built the client. Fixes 404s on Kimi Coding Plan and any other named provider whose endpoint speaks Anthropic Messages. Root cause: the 'api_key' branch of resolve_provider_client (and the Step 2 fallback chain inside _resolve_auto) always built a plain OpenAI client regardless of what the endpoint actually spoke. For provider=kimi-coding + model=kimi-for-coding, that meant: POST https://api.kimi.com/coding/v1/chat/completions { "model": "kimi-for-coding", ... } → 404 resource_not_found_error The /coding route only accepts the Anthropic Messages shape (the main agent already uses api_mode=anthropic_messages for it). Earlier fixes (#16819, `22ddac4b1`) patched the anonymous-custom, named-custom, and external-process branches — but the named api_key branch (kimi-coding, minimax, zai, future /anthropic providers) was the fourth sibling and never got the same treatment. Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a plain OpenAI client in AnthropicAuxiliaryClient when: - api_mode is explicitly 'anthropic_messages', OR - the URL ends in '/anthropic', OR - the host is api.kimi.com + path contains '/coding', OR - the host is api.anthropic.com. Wired into _wrap_if_needed (covers all resolve_provider_client branches that already go through it) and into the Step 2 api_key fallback chain inside _resolve_auto. Explicit api_mode still wins: passing api_mode='chat_completions' forces OpenAI wire, and already-wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass through unchanged. 
E2E verified: - resolve_provider_client('kimi-coding', 'kimi-for-coding') → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd) - _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient - resolve_provider_client('openrouter', ...) → plain OpenAI (no regression) - api_mode='chat_completions' override → plain OpenAI (explicit wins) Tests: - tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests covering URL detection, wrap decisions, and integration. - 204/205 existing auxiliary tests pass (1 pre-existing failure on main, unrelated to this change). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:14 -07:00
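The wrap conditions listed in 1d8b9e6458 can be sketched as a pure predicate. This is a minimal illustration under stated assumptions, not the actual `_maybe_wrap_anthropic()`: the name `wants_anthropic_messages` is hypothetical, and it returns a boolean instead of rewrapping a client.

```python
from typing import Optional
from urllib.parse import urlparse


def wants_anthropic_messages(base_url: str, api_mode: Optional[str] = None) -> bool:
    """Hypothetical predicate mirroring the wrap conditions in the commit above.

    Returns True when an auxiliary client for `base_url` should speak the
    Anthropic Messages wire format instead of OpenAI chat completions.
    """
    # An explicit api_mode always wins over URL heuristics.
    if api_mode == "anthropic_messages":
        return True
    if api_mode == "chat_completions":
        return False

    parsed = urlparse(base_url or "")
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/").lower()

    if path.endswith("/anthropic"):
        return True
    if host == "api.kimi.com" and "/coding" in path:
        return True
    return host == "api.anthropic.com"
```

Keeping the decision separate from client construction makes the URL rules easy to unit-test on their own.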
Teknium	391f1ca1f4	feat(aux): translate extra_body.reasoning into Codex Responses API (#17004) Auxiliary callers that configure reasoning via auxiliary.<task>.extra_body.reasoning were having that config silently dropped by the Codex Responses adapter — it only forwarded messages/model/tools through to responses.stream(), never translating chat.completions-shaped reasoning hints into the Responses API's top-level reasoning + include fields. Mirror the main-agent translation from agent/transports/codex.py: - extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"} - 'minimal' → 'low' clamp (Codex backend rejects 'minimal') - Always include ['reasoning.encrypted_content'] when reasoning is enabled - {'enabled': False} → omit reasoning and include entirely - Non-dict reasoning values are ignored defensively Reported by @OP (Apr 26 feedback bundle). ## Changes - agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads and translates extra_body.reasoning before calling responses.stream() - tests/agent/test_auxiliary_client.py: 9 new tests covering all effort levels, the minimal→low clamp, the disabled path, the no-op paths, and defensive handling of wrong-shape inputs Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:47:42 -07:00
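The translation rules in 391f1ca1f4 amount to a small mapping from `extra_body` to Responses API kwargs. The sketch below is hypothetical (the function name is invented, and treating a missing `effort` as a no-op is an assumption); it only illustrates the listed rules.

```python
from typing import Any, Dict, Optional


def translate_reasoning_for_responses(extra_body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical sketch: map chat.completions-style extra_body.reasoning
    onto top-level Responses API kwargs (reasoning + include)."""
    reasoning = (extra_body or {}).get("reasoning")
    if not isinstance(reasoning, dict):
        return {}  # non-dict reasoning values are ignored defensively
    if reasoning.get("enabled") is False:
        return {}  # disabled: omit reasoning and include entirely
    effort = reasoning.get("effort")
    if not effort:
        return {}  # assumption: nothing to translate without an effort level
    if effort == "minimal":
        effort = "low"  # the Codex backend rejects 'minimal'
    return {
        "reasoning": {"effort": effort, "summary": "auto"},
        "include": ["reasoning.encrypted_content"],
    }
```

The returned dict could then be merged into the kwargs handed to `responses.stream()`.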
Teknium	06164a7b28	fix(codex): resync pool entry from auth.json after reauth (#17001) When openai-codex tokens expire or the ChatGPT account hits a 429 window, the pool entry gets marked STATUS_EXHAUSTED with last_error_reset_at many hours in the future. If the user then runs `hermes model` / `hermes auth openai-codex` to reauth, fresh tokens land in ~/.hermes/auth.json but the pool entry stayed frozen behind its reset_at — every request kept failing with 'credential pool: no available entries (all exhausted or empty)' until the original window elapsed. _available_entries() already had auth.json/credentials-file resync branches for anthropic/claude_code and nous/device_code; openai-codex was missing. Added _sync_codex_entry_from_auth_store() mirroring the nous version (reads state["tokens"][{access,refresh}_token] + state["last_refresh"]) and wired it into the exhausted-entry resync loop. Also softens the 'codex CLI not found' doctor warning — native device-code OAuth does not require the Codex binary, only importing existing Codex CLI tokens does. Downgraded to an info line. Reported on Discord by p1aceho1der: Codex stalled indefinitely after a rate-limit reset, reauth didn't help, and doctor falsely warned that the codex CLI was required. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:43:09 -07:00
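A resync step of the kind described in 06164a7b28 could look roughly like this. The `state["tokens"]` and `state["last_refresh"]` paths come from the commit; the pool-entry field names (`status`, `last_error_reset_at` as a dict key, the `"ok"` value) are hypothetical stand-ins for whatever the real pool entry uses.

```python
from typing import Any, Dict


def sync_codex_entry_from_auth_store(entry: Dict[str, Any], state: Dict[str, Any]) -> bool:
    """Hypothetical sketch: copy fresh tokens from auth.json state into an
    exhausted pool entry. Returns True if the entry was updated."""
    tokens = state.get("tokens") or {}
    access = tokens.get("access_token")
    refresh = tokens.get("refresh_token")
    if not access:
        return False  # nothing usable on disk
    if access == entry.get("access_token") and refresh == entry.get("refresh_token"):
        return False  # same tokens as before; leave the exhausted status alone
    entry["access_token"] = access
    entry["refresh_token"] = refresh
    entry["last_refresh"] = state.get("last_refresh")
    # Fresh tokens from a reauth invalidate the stale exhaustion window.
    entry["status"] = "ok"
    entry["last_error_reset_at"] = None
    return True
```

Comparing against the stored tokens keeps the resync loop from resetting an entry that is still genuinely rate-limited.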
teknium1	529eb29b6a	fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel values. The salvaged bridge was forwarding Hermes' "minimal" effort to Flash verbatim, which is not a documented Gemini level and risks a 400 from the native adapter. Clamp minimal->low on Flash (matching how Pro already clamps minimal+low down), and funnel anything outside {low, medium, high} into medium to keep the request valid by construction. No behaviour change for the documented effort levels.	2026-04-28 05:38:23 -07:00
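The Flash clamp in 529eb29b6a reduces to a three-branch mapping. This is a hypothetical helper illustrating the described rule, not the bridge's actual code.

```python
from typing import Optional

_FLASH_LEVELS = {"low", "medium", "high"}


def flash_thinking_level(effort: Optional[str]) -> str:
    """Hypothetical sketch of the Gemini 3 Flash thinkingLevel clamp."""
    if effort == "minimal":
        return "low"  # 'minimal' is not a documented Flash level
    if effort in _FLASH_LEVELS:
        return str(effort)  # documented levels pass through unchanged
    return "medium"  # anything else is funneled into a valid default
```

Because every branch returns a member of `{low, medium, high}`, the request is valid by construction.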
Nanako0129	dbbe2d1973	fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes	2026-04-28 05:38:23 -07:00
Pony.Ma	02ae152222	fix(mcp): normalize nullable tool schemas	2026-04-28 04:58:03 -07:00
Ruda Porto Filgueiras	37551ee53e	test(bedrock): add model picker and region routing tests 25 new tests (all Bedrock API calls mocked, no real AWS creds needed): tests/hermes_cli/test_bedrock_model_picker.py (20 tests): - provider_model_ids("bedrock") uses live discovery, returns regional model IDs, falls back gracefully on empty/exception, resolves all bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery - list_authenticated_providers() section 2: bedrock appears with AWS creds, model list from discover_bedrock_models(), total_models matches, is_current flag works, absent creds hides bedrock, discovery failure does not crash, no duplicate entries - Region routing: botocore profile eu-central-1 yields eu.* model IDs end-to-end; env var takes priority over botocore profile - providers.py overlay: exists with correct transport/auth_type, label is non-empty, all aliases normalize to bedrock tests/agent/test_bedrock_adapter.py (5 tests): - resolve_bedrock_region() botocore profile fallback, botocore failure fallback, us-east-1 hard fallback (with botocore mocked)	2026-04-28 03:53:11 -07:00
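The region-routing tests in 37551ee53e describe a precedence chain: environment variable, then botocore profile, then a hard `us-east-1` fallback. A hypothetical sketch of that chain follows; which environment variables Hermes actually reads, and in what order, is an assumption here (`AWS_REGION` and `AWS_DEFAULT_REGION` are the standard AWS SDK names). The profile lookup is injectable so it can be tested without AWS credentials.

```python
import os
from typing import Callable, Optional


def _botocore_profile_region() -> Optional[str]:
    # Requires botocore; import or config errors are handled by the caller.
    import botocore.session

    return botocore.session.Session().get_config_variable("region")


def resolve_region_sketch(
    profile_lookup: Callable[[], Optional[str]] = _botocore_profile_region,
) -> str:
    """Hypothetical precedence: env var, then botocore profile, then us-east-1."""
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = os.environ.get(var)
        if value:
            return value  # an explicit env var beats the profile
    try:
        region = profile_lookup()
    except Exception:
        region = None  # botocore missing or misconfigured: fall through
    return region or "us-east-1"
```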
Teknium	023f5c74b1	fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957) * fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path OAuth requests now identify as Hermes on the wire. Removed: - "You are Claude Code, Anthropic's official CLI for Claude." system prompt prepend - Hermes Agent → Claude Code / Nous Research → Anthropic system-prompt substitutions - mcp_ tool-name prefix on outgoing tool schemas + message history - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path removed from AnthropicTransport.normalize_response, + all 5 call sites in run_agent.py and auxiliary_client.py) - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on the Messages API client Added: - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth requests carrying it with HTTP 400 'This authentication style is incompatible with the long context beta header.' Kept (auth plumbing, not identity spoofing): - _is_oauth_token classifier and is_oauth flag threading - Bearer vs x-api-key auth routing - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend requires these on the OAuth-gated Messages endpoint - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth creds to third parties; this is the only way the login flow works - claude-cli/<v> User-Agent on the OAuth token exchange + refresh endpoints at platform.claude.com/v1/oauth/token — bare requests get Cloudflare 1010 blocked Verified live against api.anthropic.com with a fresh sk-ant-oat01-* token: - claude-haiku-4-5 simple message: HTTP 200, 'OK' response - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool named 'terminal' (no mcp_ prefix) round-tripped correctly - Outgoing wire: no user-agent, no x-app, real Hermes identity in system prompt, real tool name in schema Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer needed since the mcp_ round-trip is gone). 
* fix(anthropic): resolve_anthropic_token() reads credential pool first Close the gap where ~/.hermes/auth.json → credential_pool.anthropic (where hermes login + dashboard PKCE flow write OAuth tokens) was not in resolve_anthropic_token()'s source list. Before: users who authed via hermes login got the token written into the pool, but legacy fallback code paths (auxiliary_client, models catalog fetch, explicit-runtime path) that call resolve_anthropic_token() saw None and raised 'No Anthropic credentials found' — even though the token was sitting in auth.json. New priority 1: pool.select() with env-sourced entries skipped. Skipping env:* entries preserves the existing env-var priority logic further down the chain (static env OAuth → refreshable Claude Code upgrade via _prefer_refreshable_claude_code_token). Surfaced while writing the hermes-agent-dev skill playbook for 'finding a live OAuth token for an E2E test'. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 03:51:17 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919) (#16927) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
crayfish-ai	f3371c39a4	fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen - auxiliary_client: apply _to_openai_base_url() to custom base_url (fixes /anthropic → /v1 rewrite missing for provider="custom") - auxiliary_client: use main_runtime.get("model") instead of _read_main_model() so auxiliary tasks follow system default model changes - title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title - cli.py / gateway/run.py: pass main_runtime to maybe_auto_title - tests: update mock assertions for new main_runtime parameter	2026-04-28 01:47:25 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
Teknium	a7cdd4133c	fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793) On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6 are capped at 200K context unless the request carries the `context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com) 1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate it as beta as of 2026-04. Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`) while silently sending a request without the beta — so Bedrock users saw a 200K ceiling with no error message, and no config knob unblocked it. Claude Code sends this header by default, which is why the same Bedrock credentials worked there. - Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved thinking and fine-grained tool streaming). - Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth endpoints — they host their own models, not Claude, so Anthropic beta headers are irrelevant and could risk rejection. - Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock client. Previously that constructor passed no betas at all, so native Anthropic had the 1M unlock via default_headers but Bedrock didn't. - Fast-mode per-request `extra_headers` already rebuilds from `_common_betas_for_base_url`, so it picks up the 1M beta automatically. Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while same credentials worked in Claude Code.	2026-04-27 20:41:36 -07:00
Teknium	6ea5699e3f	fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775) A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures. Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields: - _last_aux_model_failure_model: str | None - _last_aux_model_failure_error: str | None Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings. Surface at three places: - gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved) - gateway /compress command: ℹ line appended to the reply - CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.	2026-04-27 20:08:23 -07:00
Teknium	94b26f3ec9	fix(compression): retry summary on main model for unknown errors before giving up (#16774) The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.	2026-04-27 19:25:57 -07:00
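The two invariants guarding the retry-on-main path in 94b26f3ec9 can be modelled in a few lines. This hypothetical class is not the real ContextCompressor: it collapses the 404 fast path and the new unknown-error path into one `except` branch and assumes `call_llm(model, prompt)` either returns text or raises.

```python
from typing import Callable, Optional


class SummaryFallbackSketch:
    """Hypothetical model of the retry-on-main guard; not the real ContextCompressor."""

    def __init__(self, main_model: str, summary_model: str,
                 call_llm: Callable[[str, str], str]) -> None:
        self.main_model = main_model
        self.summary_model = summary_model
        self._summary_model_fallen_back = False
        self._call_llm = call_llm  # call_llm(model, prompt) -> summary text

    def generate_summary(self, prompt: str) -> Optional[str]:
        try:
            return self._call_llm(self.summary_model, prompt)
        except Exception:
            # Retry on main only if the aux model differs (no infinite loop)
            # and only once per compressor.
            if self.summary_model == self.main_model or self._summary_model_fallen_back:
                return None  # caller inserts the static placeholder instead
            self._summary_model_fallen_back = True
            self.summary_model = self.main_model
            try:
                return self._call_llm(self.main_model, prompt)
            except Exception:
                return None
```

With a failing aux model and a failing main model, the sketch makes exactly two calls before giving up, matching the "stops at 2 calls" test described above.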
iamagenius00	dfdc4276e8	fix(compression): notify gateway users when summary generation fails When auxiliary compression's summary LLM call fails (e.g. model 404, auxiliary model misconfigured), the compressor still drops the selected turns and inserts a static fallback placeholder — the dropped context is unrecoverable. Previously the only signal of this was a WARNING in agent.log. Gateway users (Telegram/Discord/etc.) had no way to know context was lost because the existing _emit_warning path requires a status_callback, and the gateway hygiene path uses a temporary _hyg_agent with quiet_mode=True and no callback wired up. Changes: - ContextCompressor: track _last_summary_fallback_used and _last_summary_dropped_count on each compress() call. Cleared at the start of compress() and on session reset. - gateway/run.py hygiene: after auto-compress, inspect the temp agent's compressor; if fallback was used, send a visible ⚠️ warning to the user via the platform adapter (TG/Discord/etc.) including dropped count and the underlying error. - gateway/run.py /compress: append the same warning to the manual compress reply so users running /compress see the failure too. Acceptance: - Summary success: no user-visible warning (unchanged). - Summary failure on gateway hygiene: user receives a TG/Discord message with dropped count + error + remediation hint. - Summary failure on /compress: warning appended to the command reply. - CLI status_callback / _emit_warning path is untouched. - Test coverage: two new tests verify the tracking fields are set on failure and cleared on subsequent success.	2026-04-27 19:18:13 -07:00
Erosika	49e3a1d8ee	style: trim verbose comment blocks added by previous commit	2026-04-27 12:37:33 -07:00
Erosika	e553f6f3e4	fix(memory): narrow scrub surface to known wrapper boundaries Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). 
Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on- violation, mid-stream newline preservation (plain + code fence).	2026-04-27 12:37:33 -07:00
Erosika	3b2edb347d	fix(gateway): scrub memory-context leaks from vision auto-analysis output fixes #5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)	2026-04-27 12:37:33 -07:00
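The scrubber tests listed in 3b2edb347d (split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) describe a stateful filter over streamed deltas. The class below is a hypothetical sketch of that idea, not the real StreamingContextScrubber; dropping an unterminated block at end of stream is an assumption.

```python
import re


class StreamingContextScrubberSketch:
    """Hypothetical stateful scrubber for <memory-context>...</memory-context>.

    Feed it streamed deltas; it returns only text outside fenced blocks,
    holding back a trailing fragment that might be the start of a tag.
    """

    OPEN = "<memory-context>"
    CLOSE = "</memory-context>"
    _OPEN_RE = re.compile(re.escape(OPEN), re.IGNORECASE)
    _CLOSE_RE = re.compile(re.escape(CLOSE), re.IGNORECASE)

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Clear state between turns so a stale open tag cannot leak over."""
        self._buf = ""
        self._inside = False

    @staticmethod
    def _partial_tag_len(text: str, tag: str) -> int:
        # Longest suffix of `text` that is a proper prefix of `tag` (case-insensitive).
        for n in range(min(len(tag) - 1, len(text)), 0, -1):
            if text[-n:].lower() == tag[:n]:
                return n
        return 0

    def feed(self, delta: str) -> str:
        self._buf += delta
        out = []
        while True:
            if self._inside:
                m = self._CLOSE_RE.search(self._buf)
                if m is None:
                    # Drop hidden text but keep a possible partial closing tag.
                    keep = self._partial_tag_len(self._buf, self.CLOSE)
                    self._buf = self._buf[len(self._buf) - keep:]
                    break
                self._buf = self._buf[m.end():]
                self._inside = False
            else:
                m = self._OPEN_RE.search(self._buf)
                if m is None:
                    keep = self._partial_tag_len(self._buf, self.OPEN)
                    cut = len(self._buf) - keep
                    out.append(self._buf[:cut])
                    self._buf = self._buf[cut:]
                    break
                out.append(self._buf[:m.start()])
                self._buf = self._buf[m.end():]
                self._inside = True
        return "".join(out)

    def flush(self) -> str:
        """End of stream: emit a held-back false-positive fragment; drop an unterminated block."""
        tail = "" if self._inside else self._buf
        self.reset()
        return tail
```

Holding back only the longest suffix that could still become a tag keeps latency low: ordinary text is emitted as soon as it arrives.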
kshitijk4poor	56724147ef	fix(providers/gmi): post-salvage review fixes - config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version is 22, so all users are past version 17 and would never be prompted for GMI_API_KEY on upgrade — consistent with how arcee was added) - auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview, stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview) - test_gmi_provider.py: fix malformed write_text() call in doctor test (was: write_text("GMI_API_KEY=* encoding="utf-8") → missing closing quote, wrote literal string 'GMI_API_KEY=* encoding=' to .env file) - test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions to match new cheaper default - docs/integrations/providers.md: add 'gmi' to inline 'Supported providers' fallback list (was only in the table, not the inline list at line ~1181) - docs/reference/cli-commands.md: add 'gmi' to --provider choices list	2026-04-27 11:17:59 -07:00
Isaac Huang	c53fcb0173	feat(providers): add GMI Cloud as a first-class API-key provider (#11955) Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>	2026-04-27 11:17:59 -07:00
hermes-agent-dhabibi	8402ba150e	fix(copilot): send vision header for Copilot vision requests Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route. Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
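Keeping cached text and vision clients separate, as 8402ba150e describes, mostly comes down to putting the vision flag in the cache key. In this hypothetical sketch a plain dict stands in for the client object; the provider id `"copilot"` and the header value `"true"` are assumptions, while the header name comes from the commit.

```python
from typing import Any, Dict, Tuple

# Hypothetical cache keyed by (provider, model, is_vision) so a text-only
# client is never reused for a vision request.
_CLIENT_CACHE: Dict[Tuple[str, str, bool], Dict[str, Any]] = {}


def get_aux_client_sketch(provider: str, model: str, is_vision: bool = False) -> Dict[str, Any]:
    key = (provider, model, is_vision)
    client = _CLIENT_CACHE.get(key)
    if client is None:
        headers: Dict[str, str] = {}
        if provider == "copilot" and is_vision:
            # Header name from the commit; the value "true" is an assumption.
            headers["Copilot-Vision-Request"] = "true"
        client = {"provider": provider, "model": model, "default_headers": headers}
        _CLIENT_CACHE[key] = client
    return client
```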
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). 
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). 
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop):
image b64   anthropic     openrouter/gpt5.5   codex-oauth/gpt5.5
0.19 MB     ✓             ✓                   ✓
12.37 MB    ✗ 400 (5MB)   ✓                   ✓
23.85 MB    ✗ 400 (5MB)   ✓                   ✓
49.46 MB    ✗ 413         ✓                   ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. 
Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
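The reactive design above (classify the provider error, with the image check ordered before context overflow, then shrink and retry at most once per turn) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hermes code: `FailoverReason` is cut down to three members, the phrase lists are assumed, and `send` / `shrink` are hypothetical callables standing in for the API call and `_try_shrink_image_parts_in_messages`.

```python
from enum import Enum
from typing import Callable, List


class FailoverReason(Enum):
    # Hypothetical subset of the classifier's reasons.
    image_too_large = "image_too_large"
    context_overflow = "context_overflow"
    unknown = "unknown"


# Assumed phrase lists; the real classifier's patterns may differ.
IMAGE_TOO_LARGE_PATTERNS = ("image exceeds", "image too large")
CONTEXT_OVERFLOW_PATTERNS = ("context length", "too many tokens", "exceeds")


def classify(message: str) -> FailoverReason:
    text = message.lower()
    # Checked first: "image exceeds 5 MB maximum" also contains "exceeds"
    # and would otherwise be bucketed as a context overflow.
    if any(p in text for p in IMAGE_TOO_LARGE_PATTERNS):
        return FailoverReason.image_too_large
    if any(p in text for p in CONTEXT_OVERFLOW_PATTERNS):
        return FailoverReason.context_overflow
    return FailoverReason.unknown


def send_with_shrink_retry(
    send: Callable[[List], str],
    shrink: Callable[[List], bool],
    messages: List,
) -> str:
    shrink_attempted = False  # plays the role of image_shrink_retry_attempted
    while True:
        try:
            return send(messages)
        except Exception as exc:
            reason = classify(str(exc))
            if reason is FailoverReason.image_too_large and not shrink_attempted:
                shrink_attempted = True
                # shrink() edits messages in place; retry only if it changed something.
                if shrink(messages):
                    continue
            raise
```

The flag makes the shrink fire at most once: a second image rejection after shrinking is re-raised instead of looping.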
Teknium	4a2ee6c162	fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure Closes #15775. Title generation swallowed exceptions at debug level and returned None, so a depleted auxiliary provider (e.g. OpenRouter 402) silently left sessions with NULL titles. Reporter observed 45 untitled sessions accumulated over 19 days with no user-visible indication. - agent/title_generator.py: accept optional failure_callback, bump log to WARNING, invoke callback on call_llm exception (swallowing callback errors so nothing can crash the fire-and-forget worker thread). - cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the callback so failures route through the existing user-visible warning channel. - tests: cover callback fires / errors are swallowed / no-callback legacy behavior / maybe_auto_title forwards kwarg to worker.	2026-04-26 21:49:34 -07:00
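The callback contract in this fix (log at WARNING, notify the caller, never let the callback crash the worker) can be sketched like this. `generate_title` and its parameters are hypothetical simplifications; the real function in agent/title_generator.py has a different signature.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def generate_title(
    call_llm: Callable[[str], str],
    first_message: str,
    failure_callback: Optional[Callable[[Exception], None]] = None,
) -> Optional[str]:
    try:
        return call_llm(first_message).strip() or None
    except Exception as exc:
        # WARNING rather than DEBUG, so a depleted provider shows up in logs.
        logger.warning("Title generation failed: %s", exc)
        if failure_callback is not None:
            try:
                failure_callback(exc)
            except Exception:
                # A faulty callback must never crash the fire-and-forget worker.
                logger.debug("failure_callback raised", exc_info=True)
        return None
```

Passing something like `agent._emit_auxiliary_failure` as `failure_callback` is what routes the error to the user-visible warning channel.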
briandevans	943465235e	fix(compressor): guard against bare-string items in multimodal content list raw_content from message["content"] can be a list that contains bare strings, not only dicts. The previous `p.get("text", "")` call raised AttributeError on string items, crashing context compression for any session that had a message with mixed content. Guard with isinstance checks: dict → .get("text"), str → len(p), fallback → len(str(p)). Adds a regression test covering the bare-string case that would have AttributeError'd on the pre-fix code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
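The isinstance guard described above, pulled out into a standalone helper for illustration. `content_char_count` is a hypothetical name (the commit applies the guard inline), and treating a `None` text value as empty is an extra assumption.

```python
from typing import Any


def content_char_count(raw_content: Any) -> int:
    """Character count for a message's content, tolerating mixed lists."""
    if isinstance(raw_content, list):
        total = 0
        for part in raw_content:
            if isinstance(part, dict):
                total += len(part.get("text") or "")  # image parts contribute 0
            elif isinstance(part, str):
                total += len(part)  # bare string item
            else:
                total += len(str(part))  # anything else: stringify
        return total
    return len(raw_content) if raw_content is not None else 0
```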
briandevans	cfc8befe65	fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens _find_tail_cut_by_tokens called len(content) to estimate message tokens. When content is a list of blocks (multimodal: text + image_url), len() returns block count (e.g. 2) rather than character count, so a message with 500 chars of text was counted as ~10 tokens instead of ~135. This caused the backward walk to exhaust all messages before hitting the budget ceiling; the head_end safeguard then forced cut = n - min_tail, shrinking the protected tail to the bare minimum and preventing effective compression of long multimodal conversations. Fix mirrors the existing pattern in _prune_old_tool_results (line 487): sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content) Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard (confirms the test fails with the bug), plain-string regression guard, and image-only block edge case. Fixes #16087. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
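A sketch of why the estimate matters for the backward walk. `estimate_tokens` and `find_tail_cut` are hypothetical, the ~4 characters per token ratio is an assumption, and the real `_find_tail_cut_by_tokens` also has a head_end safeguard that is omitted here.

```python
from typing import Any, Dict, List


def estimate_tokens(content: Any) -> int:
    # Sum text characters; len() of a [text, image_url] list would return 2.
    if isinstance(content, list):
        chars = sum(len(p.get("text") or "") for p in content if isinstance(p, dict))
    else:
        chars = len(content or "")
    return chars // 4  # assumed ~4 characters per token


def find_tail_cut(messages: List[Dict[str, Any]], budget: int, min_tail: int = 2) -> int:
    """Index where the protected tail starts, walking backward from the end."""
    n = len(messages)
    used, cut = 0, n
    for i in range(n - 1, -1, -1):
        cost = estimate_tokens(messages[i]["content"])
        if used + cost > budget:
            break
        used += cost
        cut = i
    # Keep at least min_tail messages in the tail even when the budget is tiny.
    return min(cut, max(n - min_tail, 0))
```

With the old `len(content)` estimate, each multimodal message above would cost almost nothing, so the walk would consume every message before reaching the budget.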
Teknium	6c87371815	fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327 ) Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes migration (especially migrations done via OpenClaw's own tool, which doesn't archive the source directory). 1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py: rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml (capital H — a directory that doesn't exist). Now case-preserving: "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem paths land on the real Hermes home). Regex logic unchanged — replacement function now checks if the matched text was all-lowercase and emits the replacement in the matching case. 2. agent/onboarding.py + cli.py: one-time startup banner the first time Hermes launches and finds ~/.openclaw/. Tells the user to run `hermes claw cleanup` to archive it, gated on the existing onboarding seen-flag framework (onboarding.seen.openclaw_residue_cleanup in config.yaml). Fires once per install; re-running requires wiping that flag or running cleanup directly. Tests: - 4 new TestDetectOpenclawResidue tests (present / absent / file-instead- of-dir / default-home smoke) - 2 TestOpenclawResidueHint tests (content check) - 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip) - test_rebrand_text_preserves_filesystem_path_casing regression test with 4 scenarios including the exact ~/.openclaw/config.yaml case - Existing test_rebrand_text_* tests updated to the new case-preserving contract (lowercase input → lowercase output) Co-authored-by: teknium1 <teknium@noreply.github.com>	2026-04-26 20:57:26 -07:00
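The case-preserving replacement can be expressed with a replacement function passed to `re.sub`. The single-word pattern here is an assumption for illustration; the actual `rebrand_text()` regex is not reproduced.

```python
import re

# Hypothetical pattern; the real rebrand_text() regex is more involved.
_BRAND = re.compile(r"openclaw", re.IGNORECASE)


def rebrand_text(text: str) -> str:
    def _replace(match: "re.Match[str]") -> str:
        # All-lowercase matches are usually paths or identifiers: keep lowercase
        # so ~/.openclaw maps to the real ~/.hermes directory.
        return "hermes" if match.group(0).islower() else "Hermes"

    return _BRAND.sub(_replace, text)
```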
Teknium	e19854d893	fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322 ) `_resolve_effective_accept()` used `return bool(cfg_val)` for the `hooks_auto_accept` config key. In Python, `bool("false")` is `True`, so a user setting `hooks_auto_accept: "false"` (quoted YAML string) in `config.yaml` would silently enable auto-approval of every shell hook, bypassing the consent prompt entirely. Replace the coercion with the same type-aware parsing already used for the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough, strings checked against {1,true,yes,on} case-insensitively, everything else (including "false", None, 0, ints) rejected. Add TestHooksAutoAcceptParsing guarding the regression across all four value shapes (bool, string-truthy, string-falsy, missing/None). Reported by @sprmn24 in #16244.	2026-04-26 20:48:35 -07:00
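The type-aware parsing described above, as a standalone sketch. `parse_hooks_auto_accept` is a hypothetical name, and stripping surrounding whitespace is an assumption.

```python
from typing import Any

_TRUTHY = {"1", "true", "yes", "on"}


def parse_hooks_auto_accept(value: Any) -> bool:
    if isinstance(value, bool):
        return value  # real YAML booleans pass through
    if isinstance(value, str):
        # Unlike bool("false"), which is True, only known truthy strings enable it.
        return value.strip().lower() in _TRUTHY
    return False  # None, ints and anything else are rejected
```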
Teknium	635253b918	feat(busy): add 'steer' as a third display.busy_input_mode option (#16279 ) Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt).	2026-04-26 18:21:29 -07:00
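The fallback chain for the new mode can be sketched as a small routing function. `route_busy_input` is hypothetical, and it assumes that `steer()` signals a rejected payload with a falsy return value.

```python
from typing import Any, Optional


def route_busy_input(agent: Optional[Any], text: str, mode: str, has_images: bool = False) -> str:
    """Return the path taken: 'steer', 'queue' or 'interrupt'."""
    if mode != "steer":
        return mode
    steer = getattr(agent, "steer", None)
    if callable(steer) and not has_images:
        try:
            if steer(text):  # assumed: falsy return means the payload was rejected
                return "steer"
        except Exception:
            pass
    # Missing agent, no steer(), attached images or a rejection all fall back
    # to queueing, so the message is never dropped.
    return "queue"
```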
Teknium	9a70260490	Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062) This reverts commit `ffd2621039`.	2026-04-26 06:31:37 -07:00
Teknium	ffd2621039	feat(onboarding): port first-touch hints to the TUI (#16054) PR #16046 added /busy and /verbose hints to the classic CLI and the gateway runner but skipped the Ink TUI (and therefore the dashboard /chat page, which embeds the TUI via PTY). This extends the same latch to the TUI with TUI-native wording. The TUI's busy-input model is not the /busy knob from the CLI — single Enter while busy auto-queues, double Enter on an empty line interrupts. The new busy-input hint teaches THAT gesture instead of telling the user to flip a config that does not apply. Changes: - agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui() - tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into _on_tool_complete for the 30s/tool_progress=all path. Same config.yaml latch so each hint fires at most once per install across CLI, gateway, and TUI combined. - ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event - ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys() - ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first busy enqueue - tests/agent/test_onboarding.py — +3 cases for TUI hint shape - tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim - website/docs/user-guide/tui.md — new 'Interrupting and queueing' section explaining the TUI's double-Enter model and the hints
Validation:
  scripts/run_tests.sh tests/agent/test_onboarding.py \
      tests/tui_gateway/test_protocol.py \
      tests/gateway/test_busy_session_ack.py   -> 66 passed
  npm --prefix ui-tui run type-check          -> clean
  npm --prefix ui-tui run lint                -> clean
  npm --prefix ui-tui run build               -> clean	2026-04-26 06:24:19 -07:00
Teknium	83c1c201f6	feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046 ) Instead of a blocking first-run questionnaire, show a one-time hint the first time the user hits each behavior fork: 1. First message while the agent is working — appends a hint to the busy-ack explaining the /busy queue vs /busy interrupt knob, phrased to match the mode that was just applied (don't tell a queue-mode user to switch to queue). 2. First tool that runs for >= 30s in the noisiest progress mode (tool_progress: all) — prints a hint about /verbose to cycle display modes (all -> new -> off -> verbose). Gated on /verbose actually being usable on the surface: always shown on CLI; on gateway only shown when display.tool_progress_command is enabled. Each hint is latched in config.yaml under onboarding.seen.<flag>, so it fires exactly once per install across CLI, gateway, and cron, then never again. Users can wipe the section to re-see hints. New: - agent/onboarding.py — is_seen / mark_seen / hint strings, shared by both CLI and gateway. - onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in load_cli_config defaults (cli.py). No _config_version bump — deep merge handles new keys. Wired: - gateway/run.py: _handle_active_session_busy_message appends the hint after building the ack. progress_callback tracks tool.completed duration and queues the tool-progress hint into the progress bubble. - cli.py: CLI input loop appends the busy-input hint on the first busy Enter; _on_tool_progress appends the tool-progress hint on the first >=30s tool completion. In-memory CLI_CONFIG is also updated so subsequent fires in the same process are suppressed immediately. All writes go through atomic_yaml_write and are wrapped in try/except so onboarding can never break the input/busy-ack paths.	2026-04-26 06:06:27 -07:00
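The once-per-install latch under `onboarding.seen.<flag>` can be sketched against a plain config dict. The function names echo `is_seen` / `mark_seen` from the commit, but these bodies and `claim_hint` are hypothetical, and persisting via `atomic_yaml_write` is left out.

```python
from typing import Any, Dict, Optional


def is_seen(config: Dict[str, Any], flag: str) -> bool:
    seen = (config.get("onboarding") or {}).get("seen") or {}
    return bool(seen.get(flag))


def mark_seen(config: Dict[str, Any], flag: str) -> None:
    # Tolerates a missing or null onboarding section in config.yaml.
    onboarding = config.get("onboarding") or {}
    seen = onboarding.get("seen") or {}
    seen[flag] = True
    onboarding["seen"] = seen
    config["onboarding"] = onboarding


def claim_hint(config: Dict[str, Any], flag: str, hint: str) -> Optional[str]:
    """Return the hint the first time a flag is claimed, then None forever."""
    if is_seen(config, flag):
        return None
    mark_seen(config, flag)  # real code would also persist config.yaml here
    return hint
```

Wiping the `onboarding` section resets every latch, which matches the "users can wipe the section to re-see hints" behaviour.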
Teknium	438db0c7b0	fix(cli): /model picker honors provider-specific context caps (#16030) `_apply_model_switch_result` (the interactive `/model` picker's confirmation path) printed `ModelInfo.context_window` straight from models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker showed 1M while the runtime (compressor, gateway `/model`, typed `/model <name>`) correctly used 272K — the classic 'sometimes 1M, sometimes 272K' mismatch on a single model. Both display paths now go through `resolve_display_context_length()`, matching the fix that `_handle_model_switch` received earlier. Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS (`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the 272K Codex cap is already enforced via the Codex-OAuth branch, so the fallback now reflects what every non-Codex probe-miss should see. Tests: adds `test_apply_model_switch_result_context.py` with three scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls back to ModelInfo). Updates the existing non-Codex fallback test to assert 1.05M (the correct value).

## Validation

| path | before | after |
|---------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex | 1,050,000 | 272,000 |
| picker -> gpt-5.5 on OpenAI | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000 | 272,000 |	2026-04-26 05:43:31 -07:00
zkl	2ccdadcca6	fix(deepseek): bump V4 family context window to 1M tokens #14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native provider but the context-window lookup still falls back to the existing "deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context window, so any caller relying on get_model_context_length() for pre-flight token budgeting (compression, context warnings) under-counts by ~8x. Add explicit lowercase entries for the four DeepSeek model ids that ship 1M context: - deepseek-v4-pro - deepseek-v4-flash - deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking) - deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking) Longest-key-first substring matching means these explicit entries also cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter and Nous Portal) without regressing the existing 128K fallback for older / unknown DeepSeek model ids on custom endpoints. Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing	2026-04-26 05:32:54 -07:00
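Longest-key-first substring matching, as described above, can be sketched as follows. The table and `lookup_context_length` are illustrative; writing "1M" as `1_000_000` is an assumption about the exact value stored.

```python
from typing import Dict, Optional

# Illustrative subset; 1_000_000 stands in for "1M" (exact value assumed).
CONTEXT_LENGTHS: Dict[str, int] = {
    "deepseek": 128_000,
    "deepseek-v4-pro": 1_000_000,
    "deepseek-v4-flash": 1_000_000,
    "deepseek-chat": 1_000_000,
    "deepseek-reasoner": 1_000_000,
}


def lookup_context_length(model: str, table: Dict[str, int] = CONTEXT_LENGTHS) -> Optional[int]:
    name = model.lower()
    # Longest key first, so "deepseek-v4-pro" beats the generic "deepseek" entry,
    # and vendor-prefixed ids like "deepseek/deepseek-v4-pro" still match.
    for key in sorted(table, key=len, reverse=True):
        if key in name:
            return table[key]
    return None
```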
Teknium	192e7eb21f	fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898 ) Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi, MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of those models recorded a cross-session file breaker that blocked EVERY model on Nous for the cooldown window -- even though the caller's own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro capacity error, restarted, switched to Kimi 2.6, and still got 'Nous Portal rate limit active -- resets in 46m 53s'. Nous already emits the full x-ratelimit-* header suite on every response (captured by rate_limit_tracker into agent._rate_limit_state). We now gate the breaker on that data: trip it only when either the 429's own headers or the last-known-good state show a bucket with remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s (healthy buckets everywhere, but upstream out of capacity) fall through to normal retry/fallback and the breaker is never written. Note: the in-memory 'restart TUI/gateway to clear' workaround circulated in Discord does NOT work -- the breaker is file-backed at ~/.hermes/rate_limits/nous.json. The workaround for users still affected by a bad state file is to delete it. Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).	2026-04-26 04:53:42 -07:00
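The gating rule (trip the file-backed breaker only when some bucket has `remaining == 0` and a reset window of at least 60 s) can be sketched as below. `Bucket` and `should_trip_breaker` are hypothetical; parsing the `x-ratelimit-*` headers into buckets is not shown.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

MIN_RESET_SECONDS = 60  # only long windows justify a cross-session breaker


@dataclass
class Bucket:
    remaining: int
    reset_seconds: float


def _has_exhausted_bucket(buckets: Optional[Mapping[str, Bucket]]) -> bool:
    return any(
        b.remaining == 0 and b.reset_seconds >= MIN_RESET_SECONDS
        for b in (buckets or {}).values()
    )


def should_trip_breaker(
    response_buckets: Optional[Mapping[str, Bucket]],
    last_known_buckets: Optional[Mapping[str, Bucket]],
) -> bool:
    # An upstream-capacity 429 arrives with healthy buckets everywhere, so it
    # returns False and falls through to the normal retry/fallback path.
    return _has_exhausted_bucket(response_buckets) or _has_exhausted_bucket(last_known_buckets)
```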
Teknium	125de02056	fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844) Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.

## What changed

New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.

`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).

Wired through five call sites that previously either duplicated the lookup or ignored it entirely:

- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)

## Context probe tiers

`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.

## Repro (from #15779)

```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```

`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".

## Tests

- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier	2026-04-25 18:47:53 -07:00
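A sketch of the per-model override lookup with trailing-slash-insensitive base-URL matching, run against the repro config above. `custom_provider_context_length` and `resolve_context_length` are hypothetical simplifications of `get_custom_provider_context_length()` and the step-0b wiring; every other probe step is omitted.

```python
from typing import Any, Dict, List, Optional

CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]


def custom_provider_context_length(
    custom_providers: Optional[List[Dict[str, Any]]], base_url: str, model: str
) -> Optional[int]:
    want = (base_url or "").rstrip("/")
    for entry in custom_providers or []:
        # ".../v1" and ".../v1/" are treated as the same endpoint.
        if (entry.get("base_url") or "").rstrip("/") != want:
            continue
        value = ((entry.get("models") or {}).get(model) or {}).get("context_length")
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


def resolve_context_length(custom_providers, base_url: str, model: str) -> int:
    # Only the override step and the final fallback are modelled here.
    override = custom_provider_context_length(custom_providers, base_url, model)
    return override if override is not None else DEFAULT_FALLBACK_CONTEXT
```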
nerijusas	81e01f6ee9	fix(agent): preserve Codex message items for replay	2026-04-25 18:22:06 -07:00
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. 
What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
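The read-time backcompat for the renamed flag can be sketched as a dataclass with a tolerant `from_dict()`. The `session_key` field is hypothetical; only the `expiry_finalized` / `memory_flushed` fallback reflects the commit.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SessionEntry:
    session_key: str  # hypothetical field, for illustration only
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
        # New key wins; fall back to the legacy name so existing
        # sessions.json files upgrade on read without a migration step.
        if "expiry_finalized" in data:
            finalized = data["expiry_finalized"]
        else:
            finalized = data.get("memory_flushed", False)
        return cls(session_key=data["session_key"], expiry_finalized=bool(finalized))

    def to_dict(self) -> Dict[str, Any]:
        # Only the new name is written back out.
        return {"session_key": self.session_key, "expiry_finalized": self.expiry_finalized}
```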
Teknium	3c1c65e754	fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633) Generalize the temperature-specific 400 retry that shipped in PR #15621 so the same reactive strategy covers any provider that rejects an arbitrary request parameter, not just temperature. - agent/auxiliary_client.py: * New _is_unsupported_parameter_error(exc, param): matches the same six phrasings the old temperature detector did plus 'unrecognized parameter' and 'invalid parameter', against any named param. * _is_unsupported_temperature_error is now a thin back-compat wrapper so existing imports and tests keep working. * The max_tokens → max_completion_tokens retry branch in call_llm and async_call_llm now (a) gates on 'max_tokens is not None' so we do not pop a key that was never set and silently substitute a None value on the retry, and (b) also matches the generic helper in addition to the legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking up phrasings like 'Unknown parameter: max_tokens' that previously slipped through. - tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering the generic detector across params, the back-compat wrapper, and the two hardenings to the max_tokens retry branch (None gate + generic phrasing). Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR also proposed the reactive temperature retry which landed independently via PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages the remaining hardening ideas onto current main.	2026-04-25 05:50:34 -07:00
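The generic detector, its back-compat wrapper, and the `None`-gated `max_tokens` retry can be sketched as follows. The phrase list is illustrative (the real six phrasings are not reproduced), and `max_tokens_retry_kwargs` is a hypothetical helper, not part of `call_llm`.

```python
from typing import Any, Dict, Optional

# Illustrative phrasings only; the real detector's list is not reproduced here.
_UNSUPPORTED_PHRASES = (
    "unsupported parameter",
    "unknown parameter",
    "not supported",
    "unrecognized parameter",
    "invalid parameter",
)


def is_unsupported_parameter_error(exc: BaseException, param: str) -> bool:
    text = str(exc).lower()
    return param.lower() in text and any(p in text for p in _UNSUPPORTED_PHRASES)


def is_unsupported_temperature_error(exc: BaseException) -> bool:
    # Thin back-compat wrapper over the generic detector.
    return is_unsupported_parameter_error(exc, "temperature")


def max_tokens_retry_kwargs(kwargs: Dict[str, Any], exc: BaseException) -> Optional[Dict[str, Any]]:
    """Kwargs for a max_completion_tokens retry, or None if no retry applies."""
    max_tokens = kwargs.get("max_tokens")
    # Gate on "is not None": never pop a key that was not set, and never
    # send a None value on the retry.
    if max_tokens is None or not is_unsupported_parameter_error(exc, "max_tokens"):
        return None
    retry = {k: v for k, v in kwargs.items() if k != "max_tokens"}
    retry["max_completion_tokens"] = max_tokens
    return retry
```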
Ash Rowan Vale 🌿	facea84559	fix(auxiliary): retry without temperature when any provider rejects it Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature' across all providers/models — not just Codex Responses. The same backend can accept temperature for some models and reject it for others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does not scale. call_llm / async_call_llm now detect the concrete 'unsupported parameter: temperature' 400 and transparently retry once without temperature. Kimi's server-managed omission and Opus 4.7+'s proactive strip stay in place — this is the safety net for everything else. Changes: - agent/auxiliary_client.py: add _is_unsupported_temperature_error helper; wire into both sync and async call_llm paths before the existing max_tokens/payment/auth retry ladder - tests/agent/test_unsupported_temperature_retry.py: 19 tests covering detector phrasings, sync + async retry, no-retry-without-temperature, and non-temperature 400s not triggering the retry Builds on PR #15620 (codex_responses fallback) which stripped temperature up front for that one api_mode. This PR closes the gap for every other provider/model combo via reactive retry. Credit: retry approach and detector originate from @BlueBirdBack's PR #15578. Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>	2026-04-25 05:27:17 -07:00
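The reactive temperature retry described above, as a minimal synchronous sketch. `call_with_temperature_retry` wraps a hypothetical `create` callable; the detector is reduced to a single phrasing, and the async path and the rest of the retry ladder are omitted.

```python
from typing import Any, Callable


def _is_unsupported_temperature(exc: BaseException) -> bool:
    text = str(exc).lower()
    return "temperature" in text and "unsupported parameter" in text


def call_with_temperature_retry(create: Callable[..., Any], **kwargs: Any) -> Any:
    try:
        return create(**kwargs)
    except Exception as exc:
        # Retry exactly once, and only if temperature was actually sent.
        if "temperature" not in kwargs or not _is_unsupported_temperature(exc):
            raise
        retry_kwargs = {k: v for k, v in kwargs.items() if k != "temperature"}
        return create(**retry_kwargs)
```

Because the retry reacts to the concrete 400 instead of consulting a model allow/deny list, it keeps working when the same endpoint accepts temperature for one model and rejects it for another.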
vominh1919	5401a0080d	fix: recalculate token budgets on model switch in ContextCompressor update_model() recalculated threshold_tokens but left tail_token_budget and max_summary_tokens at their __init__ values. When switching from a 200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K) instead of the intended ~10%. Adds budget recalculation in update_model() and 2 regression tests.	2026-04-25 15:07:56 +05:30
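One way to keep derived budgets from going stale is to compute all of them in a single method that `__init__` also calls. The class and its percentages are hypothetical (the 10% tail echoes the commit's "intended ~10%"); the real `ContextCompressor` ratios may differ.

```python
class CompressorBudgets:
    """Hypothetical ratios; the real ContextCompressor values may differ."""

    def __init__(self, context_length: int, threshold_pct: int = 50,
                 tail_pct: int = 10, summary_pct: int = 5) -> None:
        self.threshold_pct = threshold_pct
        self.tail_pct = tail_pct
        self.summary_pct = summary_pct
        self.update_model(context_length)

    def update_model(self, context_length: int) -> None:
        # Recompute every derived budget here; __init__ reuses this path,
        # so a model switch cannot leave one value behind.
        self.context_length = context_length
        self.threshold_tokens = context_length * self.threshold_pct // 100
        self.tail_token_budget = context_length * self.tail_pct // 100
        self.max_summary_tokens = context_length * self.summary_pct // 100
```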
helix4u	6a957a74bc	fix(memory): add write origin metadata	2026-04-24 14:37:55 -07:00
Andre Kurait	a9ccb03ccc	fix(bedrock): evict cached boto3 client on stale-connection errors ## Problem When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT timeout, VPN flap, server-side TCP RST, proxy idle cull), the next Converse call surfaces as one of: * botocore.exceptions.ConnectionClosedError / ReadTimeoutError / EndpointConnectionError / ConnectTimeoutError * urllib3.exceptions.ProtocolError * A bare AssertionError raised from inside urllib3 or botocore (internal connection-pool invariant check) The agent loop retries the request 3x, but the cached boto3 client in _bedrock_runtime_client_cache is reused across retries — so every attempt hits the same dead connection pool and fails identically. Only a process restart clears the cache and lets the user keep working. The bare-AssertionError variant is particularly user-hostile because str(AssertionError()) is an empty string, so the retry banner shows: ⚠️ API call failed: AssertionError 📝 Error: with no hint of what went wrong. ## Fix Add two helpers to agent/bedrock_adapter.py: * is_stale_connection_error(exc) — classifies exceptions that indicate dead-client/dead-socket state. Matches botocore ConnectionError + HTTPClientError subtrees, urllib3 ProtocolError / NewConnectionError, and AssertionError raised from a frame whose module name starts with urllib3., botocore., or boto3.. Application-level AssertionErrors are intentionally excluded. * invalidate_runtime_client(region) — per-region counterpart to the existing reset_client_cache(). Evicts a single cached client so the next call rebuilds it (and its connection pool). 
Wire both into the Converse call sites: * call_converse() / call_converse_stream() in bedrock_adapter.py (defense-in-depth for any future caller) * The two direct client.converse(**kwargs) / client.converse_stream(**kwargs) call sites in run_agent.py (the paths the agent loop actually uses) On a stale-connection exception, the client is evicted and the exception re-raised unchanged. The agent's existing retry loop then builds a fresh client on the next attempt and recovers without requiring a process restart. ## Tests tests/agent/test_bedrock_adapter.py gets three new classes (14 tests): * TestInvalidateRuntimeClient — per-region eviction correctness; non-cached region returns False. * TestIsStaleConnectionError — classifies botocore ConnectionClosedError / EndpointConnectionError / ReadTimeoutError, urllib3 ProtocolError, library-internal AssertionError (both urllib3.* and botocore.* frames), and correctly ignores application-level AssertionError and unrelated exceptions (ValueError, KeyError). * TestCallConverseInvalidatesOnStaleError — end-to-end: stale error evicts the cached client, non-stale error (validation) leaves it alone, successful call leaves it cached. All 116 tests in test_bedrock_adapter.py pass. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
Tranquil-Flow	7dc6eb9fbf	fix(agent): handle aws_sdk auth type in resolve_provider_client Bedrock's aws_sdk auth_type had no matching branch in resolve_provider_client(), causing it to fall through to the "unhandled auth_type" warning and return (None, None). This broke all auxiliary tasks (compression, memory, summarization) for Bedrock users — the main conversation loop worked fine, but background context management silently failed. Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via build_anthropic_bedrock_client(), using boto3's default credential chain (IAM roles, SSO, env vars, instance metadata). Default auxiliary model is Haiku for cost efficiency. Closes #13919	2026-04-24 07:26:07 -07:00
Andre Kurait	b290297d66	fix(bedrock): resolve context length via static table before custom-endpoint probe ## Problem `get_model_context_length()` in `agent/model_metadata.py` had a resolution order bug that caused every Bedrock model to fall back to the 128K default context length instead of reaching the static Bedrock table (200K for Claude, etc.). The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in `_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False. The resolution order then ran the custom-endpoint probe (step 2) before the Bedrock branch (step 4b), which: 1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`). 2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the bedrock-runtime URL (Bedrock doesn't serve this shape). 3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the "probe-down" branch — never reaching the Bedrock static table. Result: users on Bedrock saw 128K context for Claude models that actually support 200K on Bedrock, causing premature auto-compression. ## Fix Promote the Bedrock branch from step 4b to step 1b, so it runs before the custom-endpoint probe at step 2. The static table in `bedrock_adapter.py::get_bedrock_context_length()` is the authoritative source for Bedrock (the ListFoundationModels API doesn't expose context window sizes), so there's no reason to probe `/models` first. The original step 4b is replaced with a one-line breadcrumb comment pointing to the new location, to make the resolution-order docstring accurate. ## Changes - `agent/model_metadata.py` - Add step 1b: Bedrock static-table branch (unchanged predicate, moved). - Remove dead step 4b block, replace with breadcrumb comment. - Update resolution-order docstring to include step 1b. 
- `tests/agent/test_model_metadata.py` - New `TestBedrockContextResolution` class (3 tests): - `test_bedrock_provider_returns_static_table_before_probe`: confirms `provider="bedrock"` hits the static table and does NOT call `fetch_endpoint_model_metadata` (regression guard). - `test_bedrock_url_without_provider_hint`: confirms the `bedrock-runtime.*.amazonaws.com` host match works without an explicit `provider=` hint. - `test_non_bedrock_url_still_probes`: confirms the probe still fires for genuinely-custom endpoints (no over-reach). ## Testing pytest tests/agent/test_model_metadata.py -q # 83 passed in 1.95s (3 new + 80 existing) ## Risk Very low. - Predicate is identical to the original step 4b — no behaviour change for non-Bedrock paths. - Original step 4b was dead code for the user-facing case (always hit the 128K fallback first), so removing it cannot regress behaviour. - Bedrock path now short-circuits before any network I/O — faster too. - `ImportError` fall-through preserved so users without `boto3` installed are unaffected. ## Related - This is a prerequisite for accurate context-window accounting on Bedrock — the fix for #14710 (stale-connection client eviction) depends on correct context sizing to know when to compress. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
Qi Ke	f2fba4f9a1	fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295 ) Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7, us.anthropic.claude-sonnet-4-5-v1:0), not version separators. normalize_model_name() was unconditionally converting all dots to hyphens, producing invalid IDs that Bedrock rejects with HTTP 400/404. This affected both the main agent loop (partially mitigated by _anthropic_preserve_dots in run_agent.py) and all auxiliary client calls (compression, session_search, vision, etc.) which go through _AnthropicCompletionsAdapter and never pass preserve_dots=True. Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes (anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen conversion for these IDs regardless of the preserve_dots flag.	2026-04-24 07:26:07 -07:00
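The prefix check and its effect on dot handling, as a simplified sketch. The real `normalize_model_name()` does more than dot conversion; only the Bedrock skip is modelled here.

```python
_BEDROCK_PREFIXES = ("anthropic.", "us.", "eu.", "ap.", "jp.", "global.")


def is_bedrock_model_id(model: str) -> bool:
    # In Bedrock ids, dots are namespace separators, not version separators.
    return model.lower().startswith(_BEDROCK_PREFIXES)


def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
    # Simplified: the Bedrock check applies regardless of preserve_dots,
    # so auxiliary callers that never pass it are covered too.
    if preserve_dots or is_bedrock_model_id(model):
        return model
    return model.replace(".", "-")
```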
Wooseong Kim	be6b83562d	fix(aux): force anthropic oauth refresh after 401 Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 07:14:00 -07:00
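The commit message states only the intent, so the following is a generic retry-once pattern under stated assumptions rather than the change itself. `UnauthorizedError`, `send` and `get_token(force)` are hypothetical stand-ins for the SDK's 401 error, the request function and the OAuth refresher.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class UnauthorizedError(Exception):
    """Hypothetical stand-in for an HTTP 401 raised by the provider SDK."""


def call_with_forced_refresh(
    send: Callable[[str], T],
    get_token: Callable[[bool], str],
) -> T:
    """Send once; on 401, force an OAuth refresh and retry exactly once."""
    try:
        return send(get_token(False))
    except UnauthorizedError:
        # The cached token looked valid locally but the server rejected it,
        # so skip any "still fresh" check and refresh unconditionally.
        return send(get_token(True))
```

Retrying only once avoids looping forever when the refresh token itself has been revoked.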
5park1e	e1106772d9	fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain Bug 3 — Stale OAuth token not detected in 'hermes model': - _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats any non-empty token (including expired OAuth tokens) as valid. - Added existing_is_stale_oauth check: if the only credential is an OAuth token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale and force the re-auth menu instead of silently accepting a broken token. Bug 4 — macOS Keychain credentials never read: - Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the macOS Keychain under service 'Claude Code-credentials'. - Added _read_claude_code_credentials_from_keychain() using the 'security' CLI tool; read_claude_code_credentials() now tries Keychain first then falls back to JSON file. - Non-Darwin platforms return None from Keychain read immediately. Tests: - tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only guard, security command failures, JSON parsing, fallback priority. - tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases covering stale OAuth detection, API key passthrough, cc_creds fallback. Refs: #12905	2026-04-24 07:14:00 -07:00
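The Keychain-first lookup can be sketched with the macOS `security` CLI, whose `find-generic-password -s SERVICE -w` form prints only the stored secret. The function names and the 5-second timeout are assumptions, and the sketch assumes the Keychain entry holds the same JSON document as `~/.claude/.credentials.json`; only the service name comes from the commit.

```python
import json
import platform
import subprocess
from pathlib import Path
from typing import Callable, Optional

KEYCHAIN_SERVICE = "Claude Code-credentials"  # service name from the commit message


def read_credentials_from_keychain(service: str = KEYCHAIN_SERVICE) -> Optional[dict]:
    """Return the parsed Keychain entry, or None off macOS or on any failure."""
    if platform.system() != "Darwin":
        return None  # the Keychain only exists on macOS
    try:
        # `security find-generic-password -s SERVICE -w` prints just the secret.
        proc = subprocess.run(
            ["security", "find-generic-password", "-s", service, "-w"],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None  # entry missing or access denied
    try:
        data = json.loads(proc.stdout.strip())
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def read_credentials(
    json_path: Path,
    keychain_reader: Callable[[], Optional[dict]] = read_credentials_from_keychain,
) -> Optional[dict]:
    """Prefer the Keychain entry; fall back to the legacy JSON credentials file."""
    creds = keychain_reader()
    if creds is not None:
        return creds
    try:
        return json.loads(json_path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
```

A typical call would be `read_credentials(Path.home() / ".claude" / ".credentials.json")`; injecting `keychain_reader` lets the fallback order be tested on any platform.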
vlwkaos	f7f7588893	fix(agent): only set rate-limit cooldown when leaving primary; add tests	2026-04-24 05:35:43 -07:00
Teknium	ba44a3d256	fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) Two small fixes triggered by a support report where the user saw a cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML error page, not a real API error) on every gemini-2.5-pro request. The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but nothing in our output made that diagnosable: 1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but omitted Google entirely, so users had no way to verify from 'hermes dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY rows. 2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted an empty/whitespace api_key and stamped it into the x-goog-api-key header, which made Google's frontend return a generic HTML 400 long before the request reached the Generative Language backend. Now we raise RuntimeError at construction with an actionable message pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com. Added a regression test that covers '', ' ', and None.	2026-04-24 05:35:17 -07:00
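The fail-fast guard amounts to a constructor check. `GeminiClientSketch` is a hypothetical stand-in for `GeminiNativeClient`, and the error wording is illustrative.

```python
from typing import Optional


class GeminiClientSketch:
    """Hypothetical stand-in for GeminiNativeClient's constructor check."""

    def __init__(self, api_key: Optional[str]):
        # Reject None, "" and whitespace-only keys before any request is built,
        # so the user sees an actionable error instead of Google's HTML 400 page.
        if api_key is None or not api_key.strip():
            raise RuntimeError(
                "Gemini API key is empty: set GOOGLE_API_KEY or GEMINI_API_KEY "
                "(keys are issued at https://aistudio.google.com)."
            )
        self.headers = {"x-goog-api-key": api_key}
```

Raising at construction means the failure surfaces once, with a pointer to the fix, rather than on every request.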
konsisumer	785d168d50	fix(credential_pool): add Nous OAuth cross-process auth-store sync Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token via resolve_nous_runtime_credentials() write the rotated tokens to auth.json. The calling process's pool entry becomes stale, and the next refresh against the already-rotated token triggers a 'refresh token reuse' revocation on the Nous Portal. _sync_nous_entry_from_auth_store() reads auth.json under the same lock used by resolve_nous_runtime_credentials, and adopts the newer token pair before refreshing the pool entry. This complements #15111 (which preserved the obtained_at timestamps through seeding). Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py changes + the 3 Nous-specific regression tests. The PR also touched 10 unrelated files (Dockerfile, tips.py, various tool tests) which were dropped as scope creep. Regression tests: - test_sync_nous_entry_from_auth_store_adopts_newer_tokens - test_sync_nous_entry_noop_when_tokens_match - test_nous_exhausted_entry_recovers_via_auth_store_sync	2026-04-24 05:20:05 -07:00
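The adopt-if-newer step can be sketched as below. `PoolEntry`, the `"nous"` key in auth.json and the use of `obtained_at` to decide which pair is newer are assumptions for illustration. The commit only says the newer pair is adopted after reading auth.json under the lock shared with `resolve_nous_runtime_credentials()`, which the sketch models as an injected context manager.

```python
import contextlib
import json
from dataclasses import dataclass
from pathlib import Path
from typing import ContextManager, Optional


@dataclass
class PoolEntry:
    access_token: str
    refresh_token: str
    obtained_at: float  # epoch seconds when this token pair was issued


def sync_entry_from_auth_store(
    entry: PoolEntry,
    auth_path: Path,
    lock: Optional[ContextManager] = None,
) -> bool:
    """Adopt a newer token pair written by another process; True if adopted."""
    # In Hermes this read happens under the lock that
    # resolve_nous_runtime_credentials() uses; here the lock is injected.
    with lock if lock is not None else contextlib.nullcontext():
        try:
            stored = json.loads(auth_path.read_text())["nous"]  # hypothetical layout
        except (OSError, KeyError, TypeError, json.JSONDecodeError):
            return False
    if not isinstance(stored, dict):
        return False
    if not all(k in stored for k in ("access_token", "refresh_token")):
        return False
    if (stored["access_token"], stored["refresh_token"]) == (entry.access_token, entry.refresh_token):
        return False  # no-op: tokens already match
    if float(stored.get("obtained_at", 0)) <= entry.obtained_at:
        return False  # the store is not newer than our entry
    entry.access_token = stored["access_token"]
    entry.refresh_token = stored["refresh_token"]
    entry.obtained_at = float(stored["obtained_at"])
    return True
```

A `threading.Lock` is enough for an in-process test; real cross-process safety needs a file lock like the one the commit describes sharing.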

320 commits