Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers. Synthesizes
eight stale community PRs into one consolidated change.
Six fixes:
- URL detection: consolidate three inline `endswith("/anthropic")`
checks in runtime_provider.py into the shared _detect_api_mode_for_url
helper. Third-party /anthropic endpoints now auto-resolve to
api_mode=anthropic_messages via one code path instead of three.
- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
(__init__, switch_model, _try_refresh_anthropic_client_credentials,
_swap_credential, _try_activate_fallback) now gate on
`provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
Claude-Code identity injection on third-party endpoints. Previously
only 2 of 5 sites were guarded.
- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
`(should_cache, use_native_layout)` per endpoint (see the sketch after
this list). Replaces three
inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
call-site flag. Native Anthropic and third-party Anthropic gateways
both get the native cache_control layout; OpenRouter gets envelope
layout. Layout is persisted in `_primary_runtime` so fallback
restoration preserves the per-endpoint choice.
- Auxiliary client: `_try_custom_endpoint` honors
`api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
instead of silently downgrading to an OpenAI-wire client. Degrades
gracefully to OpenAI-wire when the anthropic SDK isn't installed.
- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
clears stale `api_key`/`api_mode` when switching to a built-in
provider, so a previous MiniMax custom endpoint's credentials can't
leak into a later OpenRouter session.
- Truncation continuation: length-continuation and tool-call-truncation
retry now cover `anthropic_messages` in addition to `chat_completions`
and `bedrock_converse`. Reuses the existing `_build_assistant_message`
path via `normalize_anthropic_response()` so the interim message
shape is byte-identical to the non-truncated path.
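A minimal sketch of the cache-policy helper, assuming the runtime provider
exposes the provider name and api_mode as attributes (the attribute names
below are illustrative, not the actual fields):

    def _anthropic_prompt_cache_policy(self) -> tuple[bool, bool]:
        # Returns (should_cache, use_native_layout) for the active endpoint.
        if self._api_mode == "anthropic_messages":
            # Native Anthropic and third-party Anthropic gateways share the
            # native cache_control layout.
            return True, True
        if self._provider == "openrouter":
            # OpenRouter caches too, but through the envelope layout.
            return True, False
        return False, False

Call sites unpack the tuple instead of re-deriving
native_anthropic=(api_mode=='anthropic_messages') inline, and the layout flag
is what gets persisted in `_primary_runtime`.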
Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).
Synthesized from (credits preserved via Co-authored-by trailers):
#7410 @nocoo — URL detection helper
#7393 @keyuyuan — OAuth 5-site guard
#7367 @n-WN — OAuth guard (narrower cousin, kept comment)
#8636 @sgaofen — caching helper + native-vs-proxy layout split
#10954 @Only-Code-A — caching on anthropic_messages+Claude
#7648 @zhongyueming1121 — aux client anthropic_messages branch
#6096 @hansnow — /model switch clears stale api_mode
#9691 @TroyMitchell911 — anthropic_messages truncation continuation
Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky),
#7242 (superseded by #9691, stale branch),
#8321 (targets smart_model_routing which was removed in #12732).
Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
Follow-up to #12144. That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6. Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:
HTTP 400 "invalid temperature: only 1 is allowed for this model"
Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py). So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.
Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
temperature=1.0 → 200 OK
temperature=0.6 → 400 "only 1 is allowed" ← #12144 default
temperature=None → 200 OK
Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.
Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty). _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults. Tests parametrize over
endpoint + model to lock both contracts.
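Roughly, the endpoint-aware lookup reads like the sketch below; the signature,
the Coding Plan fallback table name, and any map contents beyond kimi-k2.5 are
assumptions:

    _KIMI_PUBLIC_API_OVERRIDES = {
        "kimi-k2.5": 1.0,  # api.moonshot.ai/v1 only accepts temperature=1
    }

    def _fixed_temperature_for_model(model: str, base_url: str | None) -> float | None:
        bare = model.split("/")[-1].lower()  # tolerate vendor prefixes (moonshotai/...)
        if base_url and "api.moonshot.ai" in base_url:
            override = _KIMI_PUBLIC_API_OVERRIDES.get(bare)
            if override is not None:
                return override
        return _CODING_PLAN_FIXED_TEMPERATURES.get(bare)  # #12144 defaults (name assumed)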
Closes #9125.
- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks
- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks
The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:
- originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
the list, so the header had no mitigating effect on the 403 (the
account-id header alone may have been carrying the fix).
- account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).
Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.
Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:
- run_agent.py: AIAgent.__init__ (initial construction)
- run_agent.py: _apply_client_headers_for_base_url (credential rotation)
- agent/auxiliary_client.py: _try_codex (aux client)
- agent/auxiliary_client.py: resolve_provider_client raw_codex branch
Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.
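An approximate shape of the shared helper; the originator value and the JWT
claim name are assumptions consistent with the description above, not a
verbatim copy of the final code:

    import base64
    import json

    def _codex_cloudflare_headers(access_token: str | None) -> dict[str, str]:
        headers = {
            "originator": "codex_cli_rs",  # must match a Cloudflare-whitelisted value
            "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
        }
        account_id = None
        if isinstance(access_token, str) and access_token.count(".") == 2:
            try:
                payload = access_token.split(".")[1]
                payload += "=" * (-len(payload) % 4)  # re-pad the base64url segment
                claims = json.loads(base64.urlsafe_b64decode(payload))
                account_id = claims.get("chatgpt_account_id")  # claim name assumed
            except Exception:
                account_id = None  # malformed token: skip the header gracefully
        if account_id:
            headers["ChatGPT-Account-ID"] = account_id  # canonical casing per codex-rs
        return headers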
Tests in tests/agent/test_codex_cloudflare_headers.py cover:
- originator value, User-Agent shape, canonical header casing
- account-ID extraction from a real JWT fixture
- graceful handling of malformed / non-string / claim-missing tokens
- wiring at all four insertion points (primary init, rotation, both aux paths)
- non-chatgpt base URLs (openrouter) do NOT get codex headers
- switching away from chatgpt.com drops the headers
* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)
The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc. Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).
Match the whole kimi-k2.* family:
* kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
* all other kimi-k2.* -> 0.6 (non-thinking / instant mode)
Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.
* refactor(kimi): whitelist-match kimi coding models instead of prefix
Addresses review feedback on PR #12144.
- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
Moonshot's kimi-for-coding model list. The prefix match would have also
clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
separate non-coding K2 family with variable temperature (recommended 0.6
but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
(k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
share the fixed-temperature lock, so the preview-model mapping is no
longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
silently rewrites temperature.
- Update class docstring. Extend the negative test to parametrize over
kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
kimi-k2-experimental name — all must keep the caller's temperature.
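The whitelist lookup amounts to something like the following sketch; the
constant names are illustrative, the set contents follow the five-model list
above:

    _KIMI_CODING_THINKING = frozenset({"kimi-k2-thinking", "kimi-k2-thinking-turbo"})
    _KIMI_CODING_INSTANT = frozenset(
        {"kimi-k2.5", "kimi-k2-turbo-preview", "kimi-k2-0905-preview"}
    )

    def _kimi_fixed_temperature(model: str) -> float | None:
        bare = model.split("/")[-1].lower()  # accept moonshotai/kimi-k2.5 etc.
        if bare in _KIMI_CODING_THINKING:
            return 1.0
        if bare in _KIMI_CODING_INSTANT:
            return 0.6
        return None  # kimi-k2-instruct and future names keep the caller's temperature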
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash). Non-aggregator users already got their main
model. Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.
After: 'auto' means "use my main chat model" for every user,
regardless of provider type. Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers). Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.
Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model). Aggregator strict vision
backends are fallbacks, not the primary path.
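In pseudocode, the policy for both plain auxiliary tasks and vision reduces to
the sketch below; the helper names are illustrative:

    def resolve_auxiliary_client(task_cfg, main_provider, main_model):
        # 1. Explicit per-task config is a hard constraint, never overridden.
        if task_cfg.provider and task_cfg.provider != "auto":
            return build_client(task_cfg.provider, task_cfg.model)
        # 2. 'auto' means the user's main chat model, aggregator or not.
        client = build_client(main_provider, main_model)
        if client is not None:
            return client
        # 3. Only when the main provider has no working client, run the chain.
        for provider in ("openrouter", "nous", "custom", "codex", "api-key providers"):
            client = build_client(provider, None)
            if client is not None:
                return client
        return None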
Changes:
- agent/auxiliary_client.py: _resolve_auto() drops the
`_AGGREGATOR_PROVIDERS` guard. resolve_vision_provider_client()
auto branch unifies aggregator and exotic-provider paths —
everyone goes through resolve_provider_client() with main_model.
Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
the guard we just removed).
- hermes_cli/main.py: aux config menu copy updated to reflect
the new semantics ("'auto' means 'use my main model'").
- tests/agent/test_auxiliary_main_first.py: 12 regression tests
covering OpenRouter/Nous/DeepSeek main paths, runtime-override
wins, explicit-config wins, vision override preservation for
exotic providers, and fallback-chain activation when the main
provider has no working client.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Claude Opus 4.7 introduced several breaking API changes that the current
codebase only partially handled. This patch finishes the migration per the
official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide
Fixes NousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
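For item 5, the adapter-level translation is essentially a dict lookup along
these lines (only the two new entries are the point; the pre-existing rows are
shown for context and may not match the adapter verbatim):

    STOP_REASON_MAP = {
        "end_turn": "stop",
        "stop_sequence": "stop",
        "max_tokens": "length",
        "tool_use": "tool_calls",
        # New stop_reason values in the Opus 4.7 API surface:
        "refusal": "content_filter",
        "model_context_window_exceeded": "length",
    }

    def map_stop_reason(stop_reason: str | None) -> str:
        return STOP_REASON_MAP.get(stop_reason or "end_turn", "stop")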
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.
Cherry-picked from #10901 by @lrawnsley.
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.
New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files
run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
as rate-limited, skip the API call entirely. Try fallback provider
first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
(sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
know they can resume
auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
fallback chain. When rate-limited, returns (None, None) so the chain
skips to the next provider instead of piling more requests onto Nous.
This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)
Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
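The heart of the shared state file in agent/nous_rate_guard.py is an atomic
write plus an expiry check, roughly as sketched below; function names are
illustrative, while the path and the 5-minute default come from the
description above:

    import json, os, tempfile, time
    from pathlib import Path

    STATE_PATH = Path.home() / ".hermes" / "rate_limits" / "nous.json"

    def record_rate_limit(reset_epoch: float | None) -> None:
        reset = reset_epoch or (time.time() + 300)          # 5-minute default cooldown
        STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=STATE_PATH.parent)   # atomic: tempfile + rename
        with os.fdopen(fd, "w") as fh:
            json.dump({"reset_at": reset}, fh)
        os.replace(tmp, STATE_PATH)

    def is_rate_limited() -> bool:
        try:
            reset = json.loads(STATE_PATH.read_text())["reset_at"]
        except (OSError, ValueError, KeyError):
            return False
        if time.time() >= reset:
            STATE_PATH.unlink(missing_ok=True)              # auto-cleanup of expired state
            return False
        return True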
When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.
Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.
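A minimal sketch of the fail-fast check; the exact error wording differs:

    import os
    from urllib.parse import urlsplit

    _PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")

    def _validate_proxy_env_urls() -> None:
        for var in _PROXY_VARS:
            value = os.environ.get(var) or os.environ.get(var.lower())
            if not value:
                continue
            try:
                urlsplit(value).port  # raises ValueError on junk like '6153export'
            except ValueError as exc:
                raise RuntimeError(
                    f"{var} contains an invalid proxy URL ({value!r}): {exc}"
                ) from exc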
Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config. In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.
Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop. If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry. This bounds cache growth
to at most one entry per unique provider config.
Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.
Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).
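Conceptually the cache-hit path becomes the following; the names are
illustrative rather than the actual module symbols:

    import asyncio

    _CLIENT_CACHE: dict = {}         # keyed by provider config only, no loop id
    _CLIENT_CACHE_MAX_SIZE = 64      # FIFO safety belt

    def _get_cached_async_client(key, factory):
        loop = asyncio.get_running_loop()
        entry = _CLIENT_CACHE.get(key)
        if entry is not None:
            cached_loop, client = entry
            if cached_loop is loop and not cached_loop.is_closed():
                return client
            # Stale or closed loop: fall through and replace the entry in place.
        if key not in _CLIENT_CACHE and len(_CLIENT_CACHE) >= _CLIENT_CACHE_MAX_SIZE:
            _CLIENT_CACHE.pop(next(iter(_CLIENT_CACHE)))     # evict the oldest entry
        client = factory()
        _CLIENT_CACHE[key] = (loop, client)
        return client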
Closes #10200
- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
(vision_analyze, browser screenshots) route through 5v
Fixes #9888
- Use isinstance() with try/except import for CopilotACPClient check
in _to_async_client instead of fragile __class__.__name__ string check
- Restore accurate comment: GPT-5.x models *require* (not 'often require')
the Responses API on OpenAI/OpenRouter; ACP is the exception, not a
softening of the requirement
- Add inline comment explaining the ACP exclusion rationale
Cherry-picked from PR #7637 by hcshen0111.
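The check boils down to a guarded import plus isinstance(); the import path
below is an assumption:

    def _is_copilot_acp(client) -> bool:
        try:
            from agent.copilot_acp import CopilotACPClient  # import path assumed
        except ImportError:
            return False
        return isinstance(client, CopilotACPClient)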
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.
The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.
Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:
- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup
Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.
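The compatibility reader merges both schemas roughly as follows; the
normalization shown is a minimal guess (keyed by name, later schema wins on
duplicates):

    def get_compatible_custom_providers(config: dict) -> list[dict]:
        # Read custom endpoints from both the legacy list and the v12+ dict.
        merged: dict[str, dict] = {}
        for entry in config.get("custom_providers") or []:            # legacy (v11-) list
            if isinstance(entry, dict) and entry.get("name"):
                merged[entry["name"]] = dict(entry)
        for name, entry in (config.get("providers") or {}).items():   # v12+ dict
            if isinstance(entry, dict):
                merged[name] = {**merged.get(name, {}), **entry, "name": name}
        return list(merged.values())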
Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes #8776, salvaged from PR #8814
resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.
Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.
Fixes #8857
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.
What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths
Compression model/provider is now configured exclusively via:
auxiliary.compression.provider / auxiliary.compression.model
Closes #8923
auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.
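The shared pattern presumably looks like the sketch below; only the <thought>
variant is confirmed above, the other tag names are assumptions:

    import re

    # Paired blocks (with their content) and orphaned open/close tags.
    _THINK_BLOCK_RE = re.compile(
        r"<(think|thinking|thought)>.*?</\1>|</?(think|thinking|thought)>",
        re.DOTALL | re.IGNORECASE,
    )

    def _strip_think_blocks(text: str) -> str:
        return _THINK_BLOCK_RE.sub("", text)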
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
so context-length lookups use models.dev data instead of the 128k fallback.
Fixes #8161.
- Set api_mode from custom_providers entry when switching via hermes model,
and clear stale api_mode when the entry has none. Also extract api_mode
in _named_custom_provider_map(). Fixes #8181.
- Convert OpenAI image_url content blocks to Anthropic image blocks when
the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
containing /anthropic). Fixes #8147.
The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.
Flipped the resolution order in _resolve_task_provider_model():
1. Explicit args (always win)
2. config.yaml auxiliary.{task}.* (PRIMARY)
3. Env var overrides (backward-compat fallback only)
4. 'auto' (full auto-detection chain)
Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.
Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
Cherry-picked from PR #7702 by kshitijk4poor.
Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)
Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.
Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.
Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.
36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').
Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.
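A sketch of the shape check; the exception text is illustrative:

    def _validate_llm_response(response, task: str):
        # Fail fast when a provider or adapter returns something non-OpenAI-shaped.
        try:
            message = response.choices[0].message
        except (AttributeError, IndexError, TypeError):
            message = None
        if message is None:
            preview = repr(response)[:200]
            raise RuntimeError(
                f"auxiliary task {task!r} got a malformed LLM response "
                f"({type(response).__name__}): {preview}"
            )
        return response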
Closes #7264
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.
When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.
This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
fails through the same path)
Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
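The cache-hit guard mirrors the existing slug check, roughly:

    def _compat_model(provider: str, model: str | None, cached_default: str | None):
        # Drop OpenRouter-style 'vendor/model' slugs for non-OpenRouter providers,
        # the same rule resolve_provider_client() already applies on cache misses.
        if model and "/" in model and provider != "openrouter":
            return cached_default
        return model or cached_default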
Four fixes to auxiliary_client.py:
1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'),
connection/payment errors no longer silently fallback to cloud
providers. Local-only users (Ollama, vLLM) will no longer get
unexpected OpenRouter billing from auxiliary tasks.
2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends literal 'default' as
model name to APIs. Providers without a known aux model in
_API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
model_not_supported errors.
3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for
payment (402) and connection errors. Previously, async consumers
(session_search, web_tools, vision) got hard failures with no
recovery. Also fixes hardcoded 'openrouter' fallback to use the
full auto-detection chain.
4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses
it in log messages. Connection timeouts are no longer misleadingly
logged as 'payment error'.
Closes #7559. Closes #7512.
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or
when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses
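The wrap decision reduces to something like this; the codex model pattern and
the CodexAuxiliaryClient constructor shape are assumptions:

    import re

    def _needs_codex_wrap(api_mode, base_url, model) -> bool:
        if api_mode == "codex_responses":
            return True
        # Auto-detection: direct OpenAI endpoint plus a codex-family model name.
        return bool(
            base_url and "api.openai.com" in base_url
            and model and re.search(r"codex", model, re.IGNORECASE)
        )

    def _wrap_if_needed(client, api_mode, base_url, model):
        if _needs_codex_wrap(api_mode, base_url, model):
            return CodexAuxiliaryClient(client)  # constructor signature assumed
        return client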
Closes #6800
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:
model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.
Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic(). Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.
This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at
v1.30.0. The stale version string causes intermittent 403 errors from
Kimi's coding endpoint ('only available for Coding Agents').
Update all 8 occurrences across run_agent.py, auxiliary_client.py, and
doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.
Step 1 of _resolve_auto() explicitly excluded 'custom' providers,
forcing custom endpoint users through the fragile fallback chain
instead of using their known-working main model credentials.
This caused silent compression failures for users on local OpenAI-
compatible endpoints — the summary generation would fail, middle
turns would be silently dropped, and the agent would lose all
conversation context.
Remove 'custom' from the exclusion list so custom endpoint users
get the same main-model-first treatment as DeepSeek, Anthropic,
Gemini, and other direct providers.
When a Kimi coding endpoint is explicitly configured as a custom endpoint,
the custom-endpoint path in the auxiliary client creates a plain OpenAI
client without provider-specific headers. This means sync vision calls
use the generic User-Agent and get rejected by Kimi's coding endpoint
with a 403:
'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...'
The async converter already injects the Kimi User-Agent header, and the
auto-detected API-key provider path also injects it, but the explicit custom
endpoint shortcut was missing it entirely.
This patch adds the same User-Agent injection to the custom endpoint
branch, and updates all existing Kimi header sites to the same value for
consistency.
Fixes <issue number to be filled in>
Two bugs in the model fallback system:
1. Nous login leaves stale model in config (provider=nous, model=opus
from previous OpenRouter setup). Fixed by deferring the config.yaml
provider write until AFTER model selection completes, and passing the
selected model atomically via _update_config_for_provider's
default_model parameter. Previously, _update_config_for_provider was
called before model selection — if selection failed (free tier, no
models, exception), config stayed as nous+opus permanently.
2. Codex/stale providers in auxiliary fallback can't connect but block
the auto-detection chain. Added _is_connection_error() detection
(APIConnectionError, APITimeoutError, DNS failures, connection
refused) alongside the existing _is_payment_error() check in
call_llm(). When a provider endpoint is unreachable, the system now
falls back to the next available provider instead of crashing.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
- Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing
despite being referenced in code)
- Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS
(non-aggregator providers use their main model for aux tasks automatically)
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
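From the test list above, the expiry predicate presumably covers the listed
edge cases along these lines; the skew value is an assumption:

    import time

    _EXPIRY_SKEW_SECONDS = 300  # refresh shortly before recorded expiry (value assumed)

    def _qwen_access_token_is_expiring(expires_at) -> bool:
        # True when the cached token is expired, about to expire, or unusable.
        if expires_at is None:
            return True                # no expiry recorded: force a refresh
        try:
            expires_at = float(expires_at)
        except (TypeError, ValueError):
            return True                # non-numeric expiry: treat as expiring
        return time.time() >= expires_at - _EXPIRY_SKEW_SECONDS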