## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None). This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.
Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata). Default
auxiliary model is Haiku for cost efficiency.
Closes#13919
## Problem
`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).
The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:
1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
bedrock-runtime URL (Bedrock doesn't serve this shape).
3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
"probe-down" branch — never reaching the Bedrock static table.
Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.
## Fix
Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.
The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.
## Changes
- `agent/model_metadata.py`
- Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
- Remove dead step 4b block, replace with breadcrumb comment.
- Update resolution-order docstring to include step 1b.
- `tests/agent/test_model_metadata.py`
- New `TestBedrockContextResolution` class (3 tests):
- `test_bedrock_provider_returns_static_table_before_probe`:
confirms `provider="bedrock"` hits the static table and does NOT
call `fetch_endpoint_model_metadata` (regression guard).
- `test_bedrock_url_without_provider_hint`: confirms the
`bedrock-runtime.*.amazonaws.com` host match works without an
explicit `provider=` hint.
- `test_non_bedrock_url_still_probes`: confirms the probe still
fires for genuinely-custom endpoints (no over-reach).
## Testing
pytest tests/agent/test_model_metadata.py -q
# 83 passed in 1.95s (3 new + 80 existing)
## Risk
Very low.
- Predicate is identical to the original step 4b — no behaviour change
for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
installed are unaffected.
## Related
- This is a prerequisite for accurate context-window accounting on
Bedrock — the fix for #14710 (stale-connection client eviction)
depends on correct context sizing to know when to compress.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.
This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.
Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
and force the re-auth menu instead of silently accepting a broken token.
Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
CLI tool; read_claude_code_credentials() now tries Keychain first then
falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.
Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
covering stale OAuth detection, API key passthrough, cc_creds fallback.
Refs: #12905
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.
The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:
1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
omitted Google entirely, so users had no way to verify from 'hermes
dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
rows.
2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
an empty/whitespace api_key and stamped it into the x-goog-api-key
header, which made Google's frontend return a generic HTML 400 long
before the request reached the Generative Language backend. Now we
raise RuntimeError at construction with an actionable message
pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.
Added a regression test that covers '', ' ', and None.
Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token
via resolve_nous_runtime_credentials() write the rotated tokens to auth.json.
The calling process's pool entry becomes stale, and the next refresh against
the already-rotated token triggers a 'refresh token reuse' revocation on
the Nous Portal.
_sync_nous_entry_from_auth_store() reads auth.json under the same lock used
by resolve_nous_runtime_credentials, and adopts the newer token pair before
refreshing the pool entry. This complements #15111 (which preserved the
obtained_at timestamps through seeding).
Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py
changes + the 3 Nous-specific regression tests. The PR also touched 10
unrelated files (Dockerfile, tips.py, various tool tests) which were
dropped as scope creep.
Regression tests:
- test_sync_nous_entry_from_auth_store_adopts_newer_tokens
- test_sync_nous_entry_noop_when_tokens_match
- test_nous_exhausted_entry_recovers_via_auth_store_sync
The least_used strategy selected entries via min(request_count) but
never incremented the counter. All entries stayed at count=0, so the
strategy degenerated to fill_first behavior with no actual load balancing.
Now increments request_count after each selection and persists the update.
Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME.
Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback.
Refs #11068.
Two narrow fixes motivated by #15099.
1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
expires_in, and friends when seeding device_code pool entries from the
providers.nous singleton. Fresh credentials showed up with
obtained_at=None, which broke downstream freshness-sensitive consumers
(self-heal hooks, pool pruning by age) — they treated just-minted
credentials as older than they actually were and evicted them.
2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
'Refresh token reuse detected' in the error_description, rewrite the
message to explain the likely cause (an external process consumed the
rotated RT without persisting it back) and the mitigation. The generic
reuse message led users to report this as a Hermes persistence bug when
the actual trigger was typically a third-party monitoring script calling
/api/oauth/token directly. Non-reuse errors keep their original server
description untouched.
Closes#15099.
Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.
Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
when provider_id == 'gemini' and classifies the response as
free/paid/unknown via x-ratelimit-limit-requests-per-day header or
429 body containing 'free_tier'.
- Free -> print block message, refuse to save the provider, return.
- Paid -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.
Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
mentions 'free_tier', catching users who bypass setup by putting
GOOGLE_API_KEY directly in .env.
Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
PR #14935 added a Codex-aware context resolver but only new lookups
hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4
BEFORE that PR already had the wrong value (e.g. 1,050,000 from
models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the
cache-first lookup in get_model_context_length() returns it forever.
Symptom (reported in the wild by Ludwig, min heo, Gaoge on current
main at 6051fba9d, which is AFTER #14935):
* Startup banner shows context usage against 1M
* Compression fires late and then OpenAI hard-rejects with
'context length will be reduced from 1,050,000 to 128,000'
around the real 272k boundary.
Fix: when the step-1 cache returns a value for an openai-codex lookup,
check whether it's >= 400k. Codex OAuth caps every slug at 272k (live
probe values) so anything at or above 400k is definitionally a
pre-#14935 leftover. Drop that entry from the on-disk cache and fall
through to step 5, which runs the live /models probe and repersists
the correct value (or 272k from the hardcoded fallback if the probe
fails). Non-Codex providers and legitimately-cached Codex entries at
272k are untouched.
Changes:
- agent/model_metadata.py:
* _invalidate_cached_context_length() — drop a single entry from
context_length_cache.yaml and rewrite the file.
* Step-1 cache check in get_model_context_length() now gates
provider=='openai-codex' entries >= 400k through invalidation
instead of returning them.
Tests (3 new in TestCodexOAuthContextLength):
- stale 1.05M Codex entry is dropped from disk AND re-resolved
through the live probe to 272k; unrelated cache entries survive.
- fresh 272k Codex entry is respected (no probe call, no invalidation).
- non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter)
are unaffected — the guard is strictly scoped to openai-codex.
Full tests/agent/test_model_metadata.py: 88 passed.
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.
Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.
Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
(the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
finish_reason, tool_call.id, tool_call.type so the chat_completions
transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
after the scanner switched to agent.skill_utils.iter_skill_index_files
(os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
alias-args path doesn't trip AttributeError when fuzzy-matching leaks
a skill command across xdist test distribution.
Gemini's Schema validator requires every `enum` entry to be a string,
even when the parent `type` is integer/number/boolean. Discord's
`auto_archive_duration` parameter (`type: integer, enum: [60, 1440,
4320, 10080]`) tripped this on every request that shipped the full
tool catalog to generativelanguage.googleapis.com, surfacing as
`Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT)
Invalid value ... (TYPE_STRING), 60` and aborting the turn.
Sanitize by dropping the `enum` key when the declared type is numeric
or boolean and any entry is non-string. The `type` and `description`
survive, so the model still knows the allowed values; the tool handler
keeps its own runtime validation. Other providers (OpenAI,
OpenRouter, Anthropic) are unaffected — the sanitizer only runs for
native Gemini / cloudcode adapters.
Reported by @selfhostedsoul on Discord with hermes debug share.
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.
Two gaps caused this:
1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
providers-dict branch (new-style) returned only name/base_url/api_key/
model and dropped api_mode. The legacy custom_providers-list branch
already propagated it correctly. The dict branch now parses and
returns api_mode via _parse_api_mode() in both match paths.
2. agent/auxiliary_client.py::resolve_provider_client — the named
custom provider block at ~L1740 ignored custom_entry['api_mode']
and unconditionally built an OpenAI client (only wrapping for
Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
otherwise plain OpenAI. An explicit task-level api_mode override
still wins over the provider entry's declared api_mode.
Fixes#15033
Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering
- providers-dict preserves valid api_mode
- invalid api_mode values are dropped
- missing api_mode leaves the entry unchanged (no regression)
- resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
api_mode=anthropic_messages
- full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
with an auxiliary.<task> override
- providers without api_mode still use the OpenAI-wire path
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
TestResolveVerifyFallback to keep its "returns True" assertions
platform-independent.
Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.
Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
OpenRouter returns a 404 with the specific message
'No endpoints available matching your guardrail restrictions and data
policy. Configure: https://openrouter.ai/settings/privacy'
when a user's account-level privacy setting excludes the only endpoint
serving a model (e.g. DeepSeek V4 Pro, which today is hosted only by
DeepSeek's own endpoint that may log inputs).
Before this change we classified it as model_not_found, which was
misleading (the model exists) and triggered provider fallback (useless —
the same account setting applies to every OpenRouter call).
Now it classifies as a new FailoverReason.provider_policy_blocked with
retryable=False, should_fallback=False. The error body already contains
the fix URL, so the user still gets actionable guidance.
On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens,
but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev)
because openai-codex aliases to the openai entry there. At 1.05M the
compressor never fires and requests hard-fail with 'context window
exceeded' around the real 272k boundary.
Verified live against chatgpt.com/backend-api/codex/models:
gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex,
gpt-5.2, gpt-5.1-codex-max → context_window = 272000
Changes:
- agent/model_metadata.py:
* _fetch_codex_oauth_context_lengths() — probe the Codex /models
endpoint with the OAuth bearer token and read context_window per
slug (1h in-memory TTL).
* _resolve_codex_oauth_context_length() — prefer the live probe,
fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k).
* Wire into get_model_context_length() when provider=='openai-codex',
running BEFORE the models.dev lookup (which returns 1.05M). Result
persists via save_context_length() so subsequent lookups skip the
probe entirely.
* Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5
entry (400k was never right for Codex; it's the catch-all for
providers we can't probe live).
Tests (4 new in TestCodexOAuthContextLength):
- fallback table used when no token is available (no models.dev leakage)
- live probe overrides the fallback
- probe failure (non-200) falls back to hardcoded 272k
- non-codex providers (openrouter, direct openai) unaffected
Non-codex context resolution is unchanged — the Codex branch only fires
when provider=='openai-codex'.
Fixes a broader class of 'tools.function.parameters is not a valid
moonshot flavored json schema' errors on Nous / OpenRouter aggregators
routing to moonshotai/kimi-k2.6 with MCP tools loaded.
## Moonshot sanitizer (agent/moonshot_schema.py, new)
Model-name-routed (not base-URL-routed) so Nous / OpenRouter users are
covered alongside api.moonshot.ai. Applied in
ChatCompletionsTransport.build_kwargs when is_moonshot_model(model).
Two repairs:
1. Fill missing 'type' on every property / items / anyOf-child schema
node (structural walk — only schema-position dicts are touched, not
container maps like properties/$defs).
2. Strip 'type' at anyOf parents; Moonshot rejects it.
## MCP normalizer hardened (tools/mcp_tool.py)
Draft-07 $ref rewrite from PR #14802 now also does:
- coerce missing / null 'type' on object-shaped nodes (salvages #4897)
- prune 'required' arrays to names that exist in 'properties'
(salvages #4651; Gemini 400s on dangling required)
- apply recursively, not just top-level
These repairs are provider-agnostic so the same MCP schema is valid on
OpenAI, Anthropic, Gemini, and Moonshot in one pass.
## Crash fix: safe getattr for Tool.inputSchema
_convert_mcp_schema now uses getattr(t, 'inputSchema', None) so MCP
servers whose Tool objects omit the attribute entirely no longer abort
registration (salvages #3882).
## Validation
- tests/agent/test_moonshot_schema.py: 27 new tests (model detection,
missing-type fill, anyOf-parent strip, non-mutation, real-world MCP
shape)
- tests/tools/test_mcp_tool.py: 7 new tests (missing / null type,
required pruning, nested repair, safe getattr)
- tests/agent/transports/test_chat_completions.py: 2 new integration
tests (Moonshot route sanitizes, non-Moonshot route doesn't)
- Targeted suite: 49 passed
- E2E via execute_code with a realistic MCP tool carrying all three
Moonshot rejection modes + dangling required + draft-07 refs:
sanitizer produces a schema valid on Moonshot and Gemini
A test in tests/agent/test_credential_pool.py
(test_try_refresh_current_updates_only_current_entry) monkeypatched
refresh_codex_oauth_pure() to return the literal fixture strings
'access-new'/'refresh-new', then executed the real production code path
in agent/credential_pool.py::try_refresh_current which calls
_sync_device_code_entry_to_auth_store → _save_provider_state → writes
to `providers.openai-codex.tokens`. That writer resolves the target via
get_hermes_home()/auth.json. If the test ran with HERMES_HOME unset (direct
pytest invocation, IDE runner bypassing conftest discovery, or any other
sandbox escape), it would overwrite the real user's auth store with the
fixture strings.
Observed in the wild: Teknium's ~/.hermes/auth.json providers.openai-codex.tokens
held 'access-new'/'refresh-new' for five days. His CLI kept working because
the credential_pool entries still held real JWTs, but `hermes model`'s live
discovery path (which reads via resolve_codex_runtime_credentials →
_read_codex_tokens → providers.tokens) was silently 401-ing.
Fixes:
- Delete test_try_refresh_current_updates_only_current_entry. It was the
only test that exercised a writer hitting providers.openai-codex.tokens
with literal stub tokens. The entry-level rotation behavior it asserted
is still covered by test_mark_exhausted_and_rotate_persists_status above.
- Add a seat belt in hermes_cli.auth._auth_file_path(): if PYTEST_CURRENT_TEST
is set AND the resolved path equals the real ~/.hermes/auth.json, raise
with a clear message. In production (no PYTEST_CURRENT_TEST), a single
dict lookup. Any future test that forgets to monkeypatch HERMES_HOME
fails loudly instead of corrupting the user's credentials.
Validation:
- production (no PYTEST_CURRENT_TEST): returns real path, unchanged behavior
- pytest + HERMES_HOME unset (points at real home): raises with message
- pytest + HERMES_HOME=/tmp/...: returns tmp path, tests pass normally
Commit 43de1ca8 removed the _nr_to_assistant_message shim in favor of
duck-typed properties on the ToolCall dataclass. However, the
extra_content property (which carries the Gemini thought_signature) was
omitted from the ToolCall definition. This caused _build_assistant_message
to silently drop the signature via getattr(tc, 'extra_content', None)
returning None, leading to HTTP 400 errors on subsequent turns for all
Gemini 3 thinking models.
Add the extra_content property to ToolCall (matching the existing
call_id and response_item_id pattern) so the thought_signature round-trips
correctly through the transport → agent loop → API replay path.
Credit to @celttechie for identifying the root cause and providing the fix.
Closes#14488
## Merged
Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard.
### Changes
- Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144)
- Provider lists: xiaomi, opencode-go, setup wizard
- Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal)
- Config description updated for XIAOMI_API_KEY
- Tests updated for new vision model preference
### Verification
- 4322 tests passed, 0 new regressions
- Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass
- Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)
NormalizedResponse and ToolCall now have backward-compat properties
so the agent loop can read them directly without the shim:
ToolCall: .type, .function (returns self), .call_id, .response_item_id
NormalizedResponse: .reasoning_content, .reasoning_details,
.codex_reasoning_items
This eliminates the 35-line shim and its 4 call sites in run_agent.py.
Also changes flush_memories guard from hasattr(response, 'choices')
to self.api_mode in ('chat_completions', 'bedrock_converse') so it
works with raw boto3 dicts too.
WS1 items 3+4 of Cycle 2 (#14418).
3-layer chain (transport → v2 → v1) was collapsed to 2-layer in PR 7.
This collapses the remaining 2-layer (transport → v1 → NR mapping in
transport) to 1-layer: v1 now returns NormalizedResponse directly.
Before: adapter returns (SimpleNamespace, finish_reason) tuple,
transport unpacks and maps to NormalizedResponse (22 lines).
After: adapter returns NormalizedResponse, transport is a
1-line passthrough.
Also updates ToolCall construction — adapter now creates ToolCall
dataclass directly instead of SimpleNamespace(id, type, function).
WS1 item 1 of Cycle 2 (#14418).
* feat(agent): add PLATFORM_HINTS for matrix, mattermost, and feishu
These platform adapters fully support media delivery (send_image,
send_document, send_voice, send_video) but were missing from
PLATFORM_HINTS, leaving agents unaware of their platform context,
markdown rendering, and MEDIA: tag support.
Salvaged from PR #7370 by Rutimka — wecom excluded since main already
has a more detailed version.
Co-Authored-By: Marco Rutsch <marco@rutimka.de>
* test: add missing Markdown assertion for feishu platform hint
---------
Co-authored-by: Marco Rutsch <marco@rutimka.de>
Consolidate 4 per-transport lazy singleton helpers (_get_anthropic_transport,
_get_codex_transport, _get_chat_completions_transport, _get_bedrock_transport)
into one generic _get_transport(api_mode) with a shared dict cache.
Collapse the 65-line main normalize block (3 api_mode branches, each with
its own SimpleNamespace shim) into 7 lines: one _get_transport() call +
one _nr_to_assistant_message() shared shim. The shim extracts provider_data
fields (codex_reasoning_items, reasoning_details, call_id, response_item_id)
into the SimpleNamespace shape downstream code expects.
Wire chat_completions and bedrock_converse normalize through their transports
for the first time — these were previously falling into the raw
response.choices[0].message else branch.
Remove 8 dead codex adapter imports that have zero callers after PRs 1-6.
Transport lifecycle improvements:
- Eagerly warm transport cache at __init__ (surfaces import errors early)
- Invalidate transport cache on api_mode change (switch_model, fallback
activation, fallback restore, transport recovery) — prevents stale
transport after mid-session provider switch
run_agent.py: -32 net lines (11,988 -> 11,956).
PR 7 of the provider transport refactor.
Port from openclaw/openclaw#66664. The build_anthropic_kwargs call site
used 'max_tokens or _get_anthropic_max_output(model)', which correctly
falls back when max_tokens is 0 or None (falsy) but lets negative ints
(-1, -500), fractional floats (0.5, 8192.7), NaN, and infinity leak
through to the Anthropic API. Anthropic rejects these with HTTP 400
('max_tokens: must be greater than or equal to 1'), turning a local
config error into a surprise mid-conversation failure.
Add two resolver helpers matching OpenClaw's:
_resolve_positive_anthropic_max_tokens — returns int(value) only if
value is a finite positive number; excludes bools, strings, NaN,
infinity, sub-one positives (floor to 0).
_resolve_anthropic_messages_max_tokens — prefers a positive requested
value, else falls back to the model's output ceiling; raises
ValueError only if no positive budget can be resolved.
The context-window clamp at the call site (max_tokens > context_length)
is preserved unchanged — it handles oversized values; the new resolver
handles non-positive values. These concerns are now cleanly separated.
Tests: 17 new cases covering positive/zero/negative ints, fractional
floats (both >1 and <1), NaN, infinity, booleans, strings, None, and
integration via build_anthropic_kwargs.
Refs: openclaw/openclaw#66664
Mid-stream SSL alerts (bad_record_mac, tls_alert_internal_error, handshake
failures) previously fell through the classifier pipeline to the 'unknown'
bucket because:
- ssl.SSLError type names weren't in _TRANSPORT_ERROR_TYPES (the
isinstance(OSError) catch picks up some but not all SDK-wrapped forms)
- the message-pattern list had no SSL alert substrings
The 'unknown' bucket is still retryable, but: (a) logs tell the user
'unknown' instead of identifying the cause, (b) it bypasses the
transport-specific backoff/fallback logic, and (c) if the SSL error
happens on a large session with a generic 'connection closed' wrapper,
the existing disconnect-on-large-session heuristic would incorrectly
trigger context compression — expensive, and never fixes a transport
hiccup.
Changes:
- Add ssl.SSLError and its subclass type names to _TRANSPORT_ERROR_TYPES
- New _SSL_TRANSIENT_PATTERNS list (separate from _SERVER_DISCONNECT_PATTERNS
so SSL alerts route to timeout, not context_overflow+compress)
- New step 5 in the classifier pipeline: SSL pattern check runs BEFORE
the disconnect check to pre-empt the large-session-compress path
Patterns cover both space-separated ('ssl alert', 'bad record mac')
and underscore-separated ('ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC')
forms. This is load-bearing because OpenSSL 3.x changed the error-code
separator from underscore to slash (e.g. SSLV3_ALERT_BAD_RECORD_MAC →
SSL/TLS_ALERT_BAD_RECORD_MAC) and will likely churn again — matching on
stable alert reason substrings survives future format changes.
Tests (8 new):
- BAD_RECORD_MAC in Python ssl.c format
- OpenSSL 3.x underscore format
- TLSV1_ALERT_INTERNAL_ERROR
- ssl handshake failure
- [SSL: ...] prefix fallback
- Real ssl.SSLError instance
- REGRESSION GUARD: SSL on large session does NOT compress
- REGRESSION GUARD: plain disconnect on large session STILL compresses
Port from cline/cline#10266.
When OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline)
route Claude models, they sometimes surface the Anthropic-native cache
counters (`cache_read_input_tokens`, `cache_creation_input_tokens`) at
the top level of the `usage` object instead of nesting them inside
`prompt_tokens_details`. Our chat-completions branch of
`normalize_usage()` only read the nested `prompt_tokens_details` fields,
so those responses:
- reported `cache_write_tokens = 0` even when the model actually did a
prompt-cache write,
- reported only some of the cache-read tokens when the proxy exposed them
top-level only,
- overstated `input_tokens` by the missed cache-write amount, which in
turn made cost estimation and the status-bar cache-hit percentage wrong
for Claude traffic going through these gateways.
Now the chat-completions branch tries the OpenAI-standard
`prompt_tokens_details` first and falls back to the top-level
Anthropic-shape fields only if the nested values are absent/zero. The
Anthropic and Codex Responses branches are unchanged.
Regression guards added for three shapes: top-level write + nested read,
top-level-only, and both-present (nested wins).
`is_local_endpoint()` leaned on `ipaddress.is_private`, which classifies
RFC-1918 ranges and link-local as private but deliberately excludes the
RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its
mesh IPs. As a result, Ollama reached over Tailscale (e.g.
`http://100.77.243.5:11434`) was treated as remote and missed the
automatic stream-read / stale-stream timeout bumps, so cold model load
plus long prefill would trip the 300 s watchdog before the first token.
Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")`
(built once) and extend `is_local_endpoint()` to match the block both
via the parsed-`IPv4Address` path and the existing bare-string fallback
(for symmetry with the 10/172/192 checks). Also hoist the previously
function-local `import ipaddress` to module scope now that it's used by
the constant.
Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound,
representative host, MagicDNS anchor, upper bound) and a near-miss
negative set (just below 100.64.0.0, just above 100.127.255.255, well
outside the block, and first-octet-wrong).
Anthropic's API can legitimately return content=[] with stop_reason="end_turn"
when the model has nothing more to add after a turn that already delivered the
user-facing text alongside a trivial tool call (e.g. memory write). The transport
validator was treating that as an invalid response, triggering 3 retries that
each returned the same valid-but-empty response, then failing the run with
"Invalid API response after 3 retries."
The downstream normalizer already handles empty content correctly (empty loop
over response.content, content=None, finish_reason="stop"), so the only fix
needed is at the validator boundary.
Tests:
- Empty content + stop_reason="end_turn" → valid (the fix)
- Empty content + stop_reason="tool_use" → still invalid (regression guard)
- Empty content without stop_reason → still invalid (existing behavior preserved)
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.
This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.
Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.
Test updated to match the new behavior. 103 error_classifier tests pass.
- Add configurable retain_tags / retain_source / retain_user_prefix /
retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
chat_type, thread_id) through AIAgent and MemoryManager into
MemoryProvider.initialize kwargs so providers can scope and tag
retained memories.
- Hindsight attaches the new identity fields as retain metadata,
merges per-call tool tags with configured default tags, and uses
the configurable transcript labels for auto-retained turns.
Co-authored-by: Abner <abner.the.foreman@agentmail.to>
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:
- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
mapping, and StepFun region/model persistence
Based on #6005 by @hengm3467.
Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:
HTTP 400: thinking is enabled but reasoning_content is missing in
assistant tool call message at index N
Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.
build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).
Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
Fourth and final transport — completes the transport layer with all four
api_modes covered. Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.
Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction
Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response. Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it. Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.
Transport coverage — complete:
| api_mode | Transport | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport | ✅ | ✅ | ✅ |
| codex_responses | ResponsesApiTransport | ✅ | ✅ | ✅ |
| chat_completions | ChatCompletionsTransport | ✅ | ✅ | ✅ |
| bedrock_converse | BedrockTransport | ✅ | ✅ | ✅ |
17 new BedrockTransport tests pass. 117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass. Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.
Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
ToolCall.provider_data — the original shim stripped it, causing 400 errors
on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
chat_completions, response.choices[0].message is already the right shape
with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
and per-tool-call .extra_content from the OpenAI SDK.
run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.
39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().
Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)
Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.
Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.
24 new tests. 343 codex/responses/transport tests pass (0 failures).
PR 4 of the provider transport refactor.
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.
The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.
Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.
Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.