#23482 fixed cache poisoning in the sync path: when a Codex auxiliary
timeout closes the underlying OpenAI client, _evict_cached_client_instance
walks CodexAuxiliaryClient wrappers via their _real_client attribute and
drops the cache entry so the next aux call rebuilds.
The cache key includes async_mode (see _client_cache_key), so the sync and
async clients for the same provider live in two distinct entries pointing
at the same underlying transport. The fix walked the sync wrapper's
_real_client correctly but the async wrappers
(AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient,
AsyncGeminiNativeClient) never exposed _real_client at all, so the async
entry survived eviction and kept handing out the poisoned client.
Effect on async aux callers: one timeout now poisons every subsequent
async aux call (compression, vision, session_search, title_generation)
with 'Connection error' until gateway restart -- even while the sync
route recovered as designed in #23482.
Mirror the sync wrapper's _real_client onto each async wrapper so the
existing eviction helper finds them. Three changes, one per wrapper:
- AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client
(the underlying OpenAI client)
- AsyncAnthropicAuxiliaryClient: same shape
- AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's
native facade is itself the leaf; no OpenAI client beneath it)
Update _evict_cached_client_instance docstring to reflect that it now
covers both sync and async wrappers via the same attribute walk.
Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper
seeds both sync and async cache entries pointing at the same leaf and
asserts both are dropped on a single eviction call. Verified the test
fails without the wrapper changes ("async cache entry survived
eviction -- wrapper is missing _real_client") and passes with them.
Refs #23482, #23432
When an auxiliary provider returns HTTP 402 (credit / payment), every
subsequent compression / title-gen / session-search / vision call still
re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402
again, then falling back. On a long Discord/LCM session that meant dozens
of doomed 402s per minute (issue #23570).
Add a per-process unhealthy-provider cache with a 10 min TTL. When any
caller observes a payment error against a provider, the label is marked
unhealthy and skipped by:
* _resolve_auto Step-1 (main provider use-as-aux path)
* _resolve_auto Step-2 (aggregator/fallback chain)
* _try_payment_fallback (used by call_llm/acall_llm on first 402)
Skip-logs are throttled to once per minute per label so a bursty session
doesn't spam agent.log. Entries auto-expire so a topped-up account
recovers without manual intervention. The cache is in-process only by
design — multi-profile users with different keys per profile must each
hit the 402 once.
Refs #23570
A Codex auxiliary timeout closes the underlying OpenAI client (so the
streaming hang doesn't sit until the user kills the session), but the
cached wrapper kept pointing at the now-dead transport. Subsequent
auxiliary calls (compression retry, memory flush, background review,
title generation routed via provider: main) reused that closed client
and failed fast with 'Connection error' until the gateway restarted —
even though the main agent route was healthy the whole time.
Sync `_get_cached_client` had no liveness check (async did, via loop
identity), and the connection-error fallback in `call_llm` only fired
on the auto provider path, so an explicit provider — including the
common `auxiliary.compression.provider: main` shape — never evicted.
Three fixes:
* New `_evict_cached_client_instance(target)` helper that drops the
cache entry whose stored client is target (or wraps it via
`_real_client`, for `CodexAuxiliaryClient`).
* `_CodexCompletionsAdapter._close_client_on_timeout` evicts the
wrapper after closing the inner OpenAI client.
* `call_llm` and `async_call_llm` evict on `_is_connection_error`
before re-raising, regardless of whether the provider is auto.
Net effect: one timeout costs one summary attempt + the existing 30s
compressor cooldown; the next compaction rebuilds the client and
works. Non-connection errors (4xx/5xx) do not evict, so cache hits
stay stable.
Closes#23432
## Summary
- Forwards chat-completions `timeout` into the Codex Responses stream call.
- Adds total elapsed-time enforcement while the Responses stream is still yielding events.
- Closes the underlying client on timeout to unblock stalled streams, then raises `TimeoutError`.
- Adds focused tests for timeout forwarding and total timeout enforcement.
## Why
The Codex auxiliary adapter can be used by non-interactive auxiliary work such as context compression. If the stream keeps yielding progress-like events but never completes, SDK socket/read timeouts do not necessarily protect the full operation. This makes the CLI look stuck until the user force-interrupts the whole session.
This is a refreshed upstream-ready version of the earlier fork fix around `d3f08e9a0` / PR #3.
## Verification
- `python -m py_compile agent/auxiliary_client.py tests/agent/test_auxiliary_client.py`
- `python -m pytest -o addopts='' tests/agent/test_auxiliary_client.py::TestCodexAuxiliaryAdapterTimeout -q`
- `git diff --check`
When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.
Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.
Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
keywords, generic 429 without billing keywords, and OpenAI SDK
`RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".
This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.
Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
Copilot review on PR #17012 noted the docstring/comment lists `0`
among the falsy effort values that fall back to `medium`, but the
existing regression tests only cover `None` and `""`. Add the third
case to lock in the full contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auxiliary.<task>.extra_body.reasoning, but the new translation path in
_CodexCompletionsAdapter.create() reads the effort with
``reasoning_cfg.get("effort", "medium")``. That returns the configured
value verbatim when the key is present, so ``effort: null`` /
``effort: ""`` (both common YAML shapes) flow through as
``{"effort": null, "summary": "auto"}`` and Codex rejects the request
with "Invalid value for parameter ``reasoning.effort``".
agent/transports/codex.py::build_kwargs() — which the new adapter is
documented to mirror — uses a truthy check (``elif
reasoning_config.get("effort"):``) so the same falsy values keep the
"medium" default. Switch the auxiliary adapter to the same
``or "medium"`` truthy form so identical config produces identical
requests on both paths.
- [x] Two new regression tests cover ``effort: None`` and
``effort: ""`` and assert the request goes out as
``{"effort": "medium", "summary": "auto"}``.
- [x] Old behaviour fails the new tests (``{'effort': None} !=
{'effort': 'medium'}``); fixed behaviour passes all 11 tests in the
``TestCodexAdapterReasoningTranslation`` class.
- [x] Adjacent suites green: ``tests/agent/test_auxiliary_client.py``
(108 passed) and ``tests/agent/transports/test_codex_transport.py +
test_chat_completions.py`` (73 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_try_anthropic() lacked the explicit_api_key parameter added to
_try_openrouter() in #18768. When resolve_provider_client() is called
with provider="anthropic" and an explicit key (e.g. from a fallback_model
entry with api_key set), the key was silently ignored — _try_anthropic()
always fell back to resolve_anthropic_token(), so the fallback returned
None,None for users without a default Anthropic credential configured.
Fix: add explicit_api_key: str = None to _try_anthropic() and use
explicit_api_key or <pool/env fallback> in both the pool-present and
no-pool paths. Pass explicit_api_key=explicit_api_key at the call site
in resolve_provider_client(). Symmetric with the _try_openrouter() fix.
No behavior change when explicit_api_key is None.
When resolve_provider_client() passes explicit_api_key for OpenRouter auxiliary
tasks, _try_openrouter() now accepts and honors this parameter instead of
silently ignoring it and falling back to OPENROUTER_API_KEY env var.
Root cause: _try_openrouter() had no explicit_api_key parameter, so even
when callers wanted to pass a runtime credential pool key, it could not be used.
Fix:
- Add explicit_api_key: str = None parameter to _try_openrouter()
- Prioritize explicit_api_key over pool key and env var
- Update resolve_provider_client() call site to pass explicit_api_key
Regression coverage:
- Test that explicit_api_key is passed to OpenAI client when provided
- Test that fallback to OPENROUTER_API_KEY still works when explicit_api_key is None
Closes#18338
Providers like Google Vertex, Azure, and Amazon Bedrock reject API
requests with duplicate tool names (HTTP 400: 'Tool names must be
unique'). The upstream injection paths in run_agent.py already dedup
after PR #17335, but two API-boundary functions pass tools through
without checking:
- agent/auxiliary_client.py: _build_call_kwargs() (all non-Anthropic
providers in chat_completions mode)
- agent/anthropic_adapter.py: convert_tools_to_anthropic() (Anthropic
Messages API path)
Add defensive dedup guards at both sites. Duplicates are dropped with
a warning log, converting a hard 400 failure into a recoverable
condition. This is intentionally conservative — the root-cause dedup
in run_agent.py is the primary defense; these guards add resilience
against future injection-path regressions.
Includes 8 new tests covering unique passthrough, duplicate removal,
empty/None edge cases.
Closes#18478
The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks
(gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex)
because ChatGPT-account Codex gates which models it accepts via an
undocumented, shifting allow-list that OpenAI publishes no changelog
for. Any pinned default will keep going stale. Issue #17533 reports
the current breakage: every ChatGPT-account auxiliary fallback fails
with HTTP 400 "model is not supported" and the 60s pause loop degrades
long sessions.
Rather than reset the clock with another stale pin (PR #17544 proposes
gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex
fallback entirely:
- Delete `_CODEX_AUX_MODEL`.
- Drop `_try_codex` from `_get_provider_chain()` (the auto chain now
ends at api-key providers; 4 rungs instead of 5).
- Rename `_try_codex() -> _build_codex_client(model)` and require an
explicit model from the caller. No more guessing.
- `resolve_provider_client("openai-codex", model=None)` now warns and
returns (None, None) instead of silently guessing a stale model ID.
- Remove `_try_codex` from the `provider="custom"` fallback ladder
(same stale-constant trap).
- `_resolve_strict_vision_backend("openai-codex")` routes through
`resolve_provider_client` so the caller's explicit model is honored.
Codex-main users are unaffected: Step 1 of `_resolve_auto` already
uses `main_provider` + `main_model` directly and passes the user's
configured Codex model through `resolve_provider_client`, which never
touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`)
continue to work and are the supported way to route specific aux tasks
through Codex.
Users whose main provider fails with a payment/connection error and
who have ONLY ChatGPT-account Codex auth will now see the 60s pause
without a stale-model-rejection noise line in between -- same outcome,
cleaner failure.
Closes#17533. Supersedes #17544 (which resets the clock on the
same stale-constant problem).
* docs(anthropic): correct OAuth scope to Max plan + extra usage credits only
The previous docs pass (#17399) overstated what Anthropic OAuth works
with. In practice Hermes can only route against a Claude Max plan that
has purchased extra usage credits — the base Max allowance is not
consumed, and Claude Pro is not supported at all. Without Max + extra
credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token).
Updates the four pages touched in #17399:
- integrations/providers.md
- user-guide/features/credential-pools.md
- reference/environment-variables.md
- getting-started/quickstart.md
* fix(aux): skip kimi-coding in vision auto-detect (closes#17076)
Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no
image_in capability — Kimi's own docs confirm and suggest switching to
a vision-capable model. Vision lives on the separate Kimi Platform
(api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has
kimi-coding as main provider and auxiliary.vision.provider=auto,
resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient
wrapped around /coding which 404'd on every vision request.
Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn})
and gate the main-provider vision branch on membership. On a skip the
auto-detect falls through to OpenRouter → Nous like any other
main-provider-unavailable case.
Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are
unaffected — the skip only applies when the caller is in auto mode.
Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering
the skip path, CN variant, explicit-override passthrough, and a guard
against accidental skip-list widening.
Auxiliary callers that configure reasoning via
auxiliary.<task>.extra_body.reasoning were having that config silently
dropped by the Codex Responses adapter — it only forwarded
messages/model/tools through to responses.stream(), never translating
chat.completions-shaped reasoning hints into the Responses API's
top-level reasoning + include fields.
Mirror the main-agent translation from agent/transports/codex.py:
- extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"}
- 'minimal' → 'low' clamp (Codex backend rejects 'minimal')
- Always include ['reasoning.encrypted_content'] when reasoning is enabled
- {'enabled': False} → omit reasoning and include entirely
- Non-dict reasoning values are ignored defensively
Reported by @OP (Apr 26 feedback bundle).
## Changes
- agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads
and translates extra_body.reasoning before calling responses.stream()
- tests/agent/test_auxiliary_client.py: 9 new tests covering all effort
levels, the minimal→low clamp, the disabled path, the no-op paths,
and defensive handling of wrong-shape inputs
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.
Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.
Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
prefix check (covers all kimi-* models), _fixed_temperature_for_model()
returns sentinel for kimi models. _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
for kimi (sentinel mapped), direct client calls use kwargs dict to
conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
with 'temperature not in kwargs' assertions.
Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
Follow-up to #12144. That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6. Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:
HTTP 400 "invalid temperature: only 1 is allowed for this model"
Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py). So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.
Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
temperature=1.0 → 200 OK
temperature=0.6 → 400 "only 1 is allowed" ← #12144 default
temperature=None → 200 OK
Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.
Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty). _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults. Tests parametrize over
endpoint + model to lock both contracts.
Closes#9125.
* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)
The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc. Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).
Match the whole kimi-k2.* family:
* kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
* all other kimi-k2.* -> 0.6 (non-thinking / instant mode)
Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.
* refactor(kimi): whitelist-match kimi coding models instead of prefix
Addresses review feedback on PR #12144.
- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
Moonshot's kimi-for-coding model list. The prefix match would have also
clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
separate non-coding K2 family with variable temperature (recommended 0.6
but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
(k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
share the fixed-temperature lock, so the preview-model mapping is no
longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
silently rewrites temperature.
- Update class docstring. Extend the negative test to parametrize over
kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
kimi-k2-experimental name — all must keep the caller's temperature.
First pass of test-suite reduction to address flaky CI and bloat.
Removed tests that fall into these change-detector patterns:
1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
that call inspect.getsource() on production modules and grep for string
literals. Break on any refactor/rename even when behavior is correct.
2. Platform enum tautologies (every gateway/test_X.py): assertions like
`Platform.X.value == 'x'` duplicated across ~9 adapter test files.
3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
only verify a key exists in a dict. Data-layout tests, not behavior.
4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing
_fallback): tests that do parser.parse_args([...]) then assert args.field.
Tests Python's argparse, not our code.
5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
cmd_X, call plugins_command with matching action, assert mock called.
Tests the if/elif chain, not behavior.
6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
that mock the external API client, call our function, and assert exact
kwargs. Break on refactor even when behavior is preserved.
7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
tests): tests that patch own helper method, then assert it was called.
Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.
Net reduction: 169 tests removed. 38 empty classes cleaned up.
Collected before: 12,522 tests
Collected after: 12,353 tests
Salvaged from PR #10643 by kshitijk4poor, updated for current main.
Root causes fixed:
1. Telegram xdist mock pollution — new tests/gateway/conftest.py with shared
mock that runs at collection time (prevents ChatType=None caching)
2. VIRTUAL_ENV env var leak — monkeypatch.delenv in _detect_venv_dir tests
3. Copilot base_url missing — add fallback in _resolve_runtime_from_pool_entry
4. Stale vision model assertion — zai now uses glm-5v-turbo
5. Reasoning item id intentionally stripped — assert 'id' not in (store=False)
6. Context length warning unreachable — pass base_url to AIAgent in test
7. Kimi provider label updated — 'Kimi / Kimi Coding Plan' matches models.py
8. Google Workspace calendar tests — rewritten for current production code,
properly mock subprocess on api_module, removed stale +agenda assertions
9. Credential pool auto-seeding — mock _select_pool_entry / _resolve_auto /
_import_codex_cli_tokens to prevent real credentials from leaking into tests
Production fixes:
- Add clear_session_context() to hermes_logging.py (fixes 48 teardown errors)
- Add clear_session() to tools/approval.py (fixes 9 setup errors)
- Add SyncError M_UNKNOWN_TOKEN check to Matrix _sync_loop (bug fix)
- Fall back to inline api_key in named custom providers when key_env
is absent (runtime_provider.py)
Test fixes:
- test_memory_user_id: use builtin+external provider pair, fix honcho
peer_name override test to match production behavior
- test_display_config: remove TestHelpers for non-existent functions
- test_auxiliary_client: fix OAuth tokens to match _is_oauth_token
patterns, replace get_vision_auxiliary_client with resolve_vision_provider_client
- test_cli_interrupt_subagent: add missing _execution_thread_id attr
- test_compress_focus: add model/provider/api_key/base_url/api_mode
to mock compressor
- test_auth_provider_gate: add autouse fixture to clean Anthropic env
vars that leak from CI secrets
- test_opencode_go_in_model_list: accept both 'built-in' and 'hermes'
source (models.dev API unavailable in CI)
- test_email: verify email Platform enum membership instead of source
inspection (build_channel_directory now uses dynamic enum loop)
- test_feishu: add bot_added/bot_deleted handler mocks to _Builder
- test_ws_auth_retry: add AsyncMock for sync_store.get_next_batch,
add _pending_megolm and _joined_rooms to Matrix adapter mocks
- test_restart_drain: monkeypatch-delete INVOCATION_ID (systemd sets
this in CI, changing the restart call signature)
- test_session_hygiene: add user_id to SessionSource
- test_session_env: use relative baseline for contextvar clear check
(pytest-xdist workers share context)
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.
What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths
Compression model/provider is now configured exclusively via:
auxiliary.compression.provider / auxiliary.compression.model
Closes#8923
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
so context-length lookups use models.dev data instead of 128k fallback.
Fixes#8161.
- Set api_mode from custom_providers entry when switching via hermes model,
and clear stale api_mode when the entry has none. Also extract api_mode
in _named_custom_provider_map(). Fixes#8181.
- Convert OpenAI image_url content blocks to Anthropic image blocks when
the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
containing /anthropic). Fixes#8147.
Four fixes to auxiliary_client.py:
1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'),
connection/payment errors no longer silently fallback to cloud
providers. Local-only users (Ollama, vLLM) will no longer get
unexpected OpenRouter billing from auxiliary tasks.
2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends literal 'default' as
model name to APIs. Providers without a known aux model in
_API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
model_not_supported errors.
3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for
payment (402) and connection errors. Previously, async consumers
(session_search, web_tools, vision) got hard failures with no
recovery. Also fixes hardcoded 'openrouter' fallback to use the
full auto-detection chain.
4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses
it in log messages. Connection timeouts are no longer misleadingly
logged as 'payment error'.
Closes#7559Closes#7512
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or
when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
auxiliary:
compression:
model: gpt-5.3-codex
base_url: https://api.openai.com/v1
api_mode: codex_responses
Closes#6800
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:
model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.
Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic(). Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.
This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:
1. OpenRouter (known vision-capable default model)
2. Nous Portal (known vision-capable default model)
3. Active provider + model (whatever the user is running)
4. Stop
This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).
Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.
Based on PR #5376 by Mibay. Closes#5366.
Salvaged fixes from community PRs:
- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
key lookup (was checking top-level dict instead of store['providers']).
OAuth providers now correctly detected in /model picker.
Cherry-picked from PR #5911 by Xule Lin (linxule).
- fix(ollama): pass num_ctx to override 2048 default context window.
Ollama defaults to 2048 context regardless of model capabilities. Now
auto-detects from /api/show metadata and injects num_ctx into every
request. Config override via model.ollama_num_ctx. Fixes#2708.
Cherry-picked from PR #5929 by kshitij (kshitijk4poor).
- fix(aux): normalize provider aliases for vision/auxiliary routing.
Adds _normalize_aux_provider() with 17 aliases (google→gemini,
claude→anthropic, glm→zai, etc). Fixes vision routing failure when
provider is set to 'google' instead of 'gemini'.
Cherry-picked from PR #5793 by e11i (Elizabeth1979).
- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
but auxiliary client uses OpenAI SDK which appends /chat/completions →
404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
Inspired by PR #5786 by Lempkey.
Added debug logging to silent exception blocks across all fixes.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>