Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.
- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
(DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
_copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
(attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
and an OpenRouter negative control that must NOT pad. Proven to fail
2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.
Refs #17400.
Co-authored-by: season179 <season.saw@gmail.com>
Alongside the existing 'least recently used' section, surface two more
rankings so users can see which of their agent-created skills actually
get exercised:
- 'most used (top 5)' — sorted by use_count descending. Hidden when every
skill has use_count=0 (noise suppression on fresh installs).
- 'least used (top 5)' — sorted by use_count ascending. Always shown
when the catalog is non-empty.
use_count started tracking real agent skill activation in PR #17932
(bump_use wired into skill_view tool + slash invocation + --skill
preload), so these rankings are now meaningful.
Tests: 3 new in tests/hermes_cli/test_curator_status.py — happy path
with mixed use_counts, zero-use suppression of the most-used section,
and the no-skills clean-empty case.
Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used.\n\nAdds regression tests for activity derivation, automatic transitions, and CLI status output.
restore_skill() in tools/skill_usage.py used archive_root.iterdir(), which
only walked the top level of .archive/. Skills archived under nested layouts
(e.g. .archive/openclaw-imports/<skill>/ from older archive paths or
external imports) were invisible to both the exact-match and prefix-match
candidate scans, surfacing as a misleading "skill '<name>' not found in
archive" error even though the directory existed on disk.
Switch both candidate scans to archive_root.rglob('*') so the lookup
descends into category subdirectories.
Fixes#17942
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly. Pin each invariant so a
future refactor can't silently regress them:
1. Normal single-message path still releases _active_sessions[sk] and
_session_tasks[sk] through end-of-finally. The #17758 follow-up
moved _release_session_guard under
if current_task is self._session_tasks.get(session_key)
For the 99%-common case current_task IS the stored task, so the
guard must still fire. Test would fail if the conditional were
ever tightened in a way that dropped the normal path.
2. Drain-task cancellation releases the session. If the drain task
spawned by the in-band hand-off is cancelled mid-handler (e.g.
/stop fired while draining a follow-up), its own finally must
fire _release_session_guard. Without this a cancel would leave
the session permanently pinned busy.
3. Late-arrival drain still spawns when no in-band drain preceded
it. Pre-existing path, but the #17758 follow-up added a
re-queue branch that only fires when ownership was already
handed off. When no handoff happened the else branch must still
spawn a fresh drain task — otherwise a message arriving during
stop_typing gets silently dropped.
All three tests pass against current main. Zero production code
changes.
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping. That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.
Split the failure classification in ``GatewayRunner._run_agent``:
* Context-overflow (``compression_exhausted`` flag, explicit
context-length phrases, or generic 400 with a long history) → keep
the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
conversation survives a retry.
Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.
Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.
Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for #17767), and the extract target.
_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.
Fix:
- _set_nested now detects list nodes and navigates by numeric index,
and preserves both dicts AND lists at intermediate positions (scalars
are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
_set_nested instead -- single source of truth for the rules.
Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path
35/35 existing + new tests pass.
E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.
Closes#17876
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:
- DeepSeek: 'Tool names must be unique.'
- Xiaomi MiMo: 'tools contains duplicate names: lcm_expand'
- Moonshot/Kimi: 'function name lcm_grep is duplicated'
TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.
Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
but the FIRST uncached call stored and returned the SAME object.
run_agent.py mutates self.tools (memory + LCM context-engine schemas)
in-place, so the very first agent init in a Gateway process
poisoned the cache, and every subsequent init appended LCM schemas
again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
lcm_expand) had no dedup, unlike the memory-tools injection right
above it which already skips already-present names.
Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
computed list but return list(result) to the caller. Same pattern as
the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
schemas whose names are already present, mirroring the memory-tools
block. This also defends against plugin paths that may register the
same schemas via ctx.register_tool().
Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without
it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent
appending lcm_grep / lcm_expand to the returned list and asserts the
next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the
long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI
was fine.
5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().
This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.
Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.
Closes#17775
The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:
Unknown name "thinking_config": Cannot find field
`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.
The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.
Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.
Fixes#17426
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.
Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
GIFs peeled off and routed through send_animation (albums don't support
animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
over); URL images downloaded into BytesIO so they render inline; forum
channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
SMTP size governs); remote URLs remain linked in body (parity with
existing send_image)
All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.
Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.
Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.
Co-authored-by: Maxence Groine <maxence@groine.fr>
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.
Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.
Also implements batching + rate-limit handling in the `send_message`
tool.
New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
Merge resolved conflicts in web/src/{i18n/{en,zh,types}.ts,lib/api.ts}
by keeping both this branch's `profiles` additions and upstream's new
`models` page additions.
Copilot review feedback:
- Implement POST /api/profiles/{name}/open-terminal endpoint (already
present); align Windows branch to `cmd.exe /c start "" <cmd>` so it
matches the new test and spawns a fresh window instead of /k reusing
the parent console.
- Move backslash escaping out of the macOS AppleScript f-string
expression (Python <3.12 disallows backslashes inside f-string
expression parts).
- Patch `_get_wrapper_dir` via monkeypatch in
test_profiles_create_creates_wrapper_alias_when_safe so the test no
longer writes to the real `~/.local/bin`.
- Extend test_dashboard_browser_safe_imports to scan `.ts` files in
addition to `.tsx`.
- Switch upstream's new ModelsPage.tsx away from the `@nous-research/ui`
root barrel onto per-component subpaths to satisfy the stricter scan.
- Fix NouiTypography `leading-1.4` -> `leading-[1.4]` so Tailwind
actually emits the line-height for the `sm` variant.
- Guard ProfilesPage.openSoulEditor against out-of-order responses by
tracking the latest requested profile via a ref.
- Replace ProfilesPage's hand-rolled setup command with a fetch to
`/api/profiles/{name}/setup-command` so the copied command always
matches what the backend would actually run (handles wrapper-alias
collisions and reserved names correctly).
- Wire SOUL.md textarea label `htmlFor` -> textarea `id` so screen
readers and clicking the label work as expected.
The sandbox-side `_call()` in both the UDS and file-based transports was
not thread-safe, so scripts that call tools from multiple threads (e.g.
`ThreadPoolExecutor` over `terminal()`) inside a single `execute_code`
run could silently receive each other's responses.
Root cause:
* UDS transport — a single module-level `_sock` was shared across all
threads; the newline-framed protocol has no request-id; and the
server-side RPC loop handles one connection serially. With concurrent
callers, each thread would `sendall()` then race to `recv()` the next
newline-terminated response from the shared buffer, so responses got
delivered to the wrong caller.
* File transport — `_seq += 1` is a non-atomic read-modify-write, so
two threads could allocate the same sequence number and clobber each
other's request/response files.
Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering
send+recv), and guard `_seq` allocation with a lock in the file case.
No protocol change.
Regression tests cover both the generated-source level (lock is present
and used) and an end-to-end concurrency test: running a sandboxed
ThreadPoolExecutor of 10 `terminal()` calls against a slow mock
dispatcher, asserting every caller sees its own tagged response. The
test fails without the fix (10/10 mismatched, matching real-world
repro) and passes with it.
The v11→v12 migrate_config step writes the API mode for every entry
under the new transport: field (per the v12+ schema in
_normalize_custom_provider_entry). _get_named_custom_provider
read the legacy api_mode: spelling only, so for every migrated
config the lookup returned None for the api mode.
Downstream, _resolve_named_custom_runtime then falls back through
custom_provider.get("api_mode") or _detect_api_mode_for_url(base_url)
or "chat_completions". For loopback URLs (proxies, local servers)
or unknown hostnames, the URL detector returns None and the resolver
silently downgrades the configured codex_responses /
anthropic_messages transport to chat_completions. Requests
get sent to /v1/chat/completions instead of /v1/responses or
/v1/messages and the provider 404s — or worse, returns a usable
chat_completions response while skipping the model's reasoning /
caching surface.
Fix: read both field names — entry.get("api_mode") or
entry.get("transport") — at the two match-by-key + match-by-name
branches in _get_named_custom_provider. The runtime normaliser
_normalize_custom_provider_entry already accepts both spellings;
this lifts the same compat into the direct-dict reader so v12+
configs work without going through the shim.
Adds three regression tests under
tests/hermes_cli/test_user_providers_model_switch.py:
- transport field is read on the match-by-key branch
- legacy api_mode spelling still works for hand-edited configs
- transport is read on the match-by-display-name branch
run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].
Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.
Fixes#17855
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block. The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.
A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.
Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key]. When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it. This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.
Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).
Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same. Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges. Post-fix, the entry is preserved and the drain
task reuses the original Event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`. Each
chained follow-up added a frame to the call stack instead of starting
fresh. Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.
Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind. The new task takes
ownership via `_session_tasks[session_key]`.
The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.
Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry. Pre-fix: depths grow linearly `[1,2,3,…,12]`. Post-fix: all
depths are `1`.
`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.
What ships:
- New 'piper' built-in provider in tools/tts_tool.py
- Lazy import via _import_piper()
- Module-level voice cache keyed on (model_path, use_cuda) so switching
voices doesn't invalidate older cached voices
- _resolve_piper_voice_path() accepts either an absolute .onnx path or a
voice name (auto-downloaded on first use via 'python -m
piper.download_voices --download-dir <cache>')
- Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
get_hermes_dir)
- Optional SynthesisConfig knobs: length_scale, noise_scale,
noise_w_scale, volume, normalize_audio, use_cuda — passed through
only when configured, so older piper-tts versions aren't broken
- WAV output then ffmpeg conversion path (same as neutts/kittentts) so
Telegram voice bubbles work when ffmpeg is present
- Piper added to BUILTIN_TTS_PROVIDERS so a user's
tts.providers.piper.command cannot shadow the native provider
(regression test included)
- 'hermes tools' wizard entry
- Piper appears under Voice and TTS as local free, with
'pip install piper-tts' auto-install via post_setup handler
- Prints voice-catalog URL and default-voice info after install
- config.yaml defaults
- tts.piper.voice defaults to en_US-lessac-medium
- Commented advanced knobs for discoverability
- Docs
- New 'Piper (local, 44 languages)' section in features/tts.md
explaining install path, voice switching, pre-downloaded voices,
and advanced knobs
- Piper listed in the ten-provider table and ffmpeg table
- Custom-command-providers section updated to drop the Piper example
(now native) and add a piper-custom example for users with their own
trained .onnx models
- overview.md bumps provider count to ten
- Tests (tests/tools/test_tts_piper.py, 16 tests)
- Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
- _resolve_piper_voice_path across every branch: direct .onnx path,
cached voice name, fresh download with correct CLI args, download
failure, successful-exit-but-missing-files, empty voice to default
- _generate_piper_tts: loads voice once, reuses cache, voice-name
download wiring, advanced knobs flow through SynthesisConfig
- text_to_speech_tool end-to-end dispatch and missing-package error
- check_tts_requirements: piper availability toggles the return value
- Regression guard: piper cannot be shadowed by a command provider
with the same name
- Pre-existing test_tts_mistral test broadened to mock the new
piper/kittentts/command-provider checks (otherwise it false-passes
when piper is installed in the test venv)
E2E verification (live):
Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.
This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
Six tests in this file failed in CI (-n auto) after #17832 landed because
other tests on the same xdist worker reload hermes_cli.main:
tests/hermes_cli/test_env_loader.py:85-86
sys.modules.pop('hermes_cli.main', None)
importlib.import_module('hermes_cli.main')
tests/hermes_cli/test_skills_subparser.py:24-25
del sys.modules['hermes_cli.main']
When either ran first on a worker, our top-of-file
'from hermes_cli.main import _kill_stale_dashboard_processes' captured a
stale function object whose __globals__ points at the old module dict.
patch('hermes_cli.main._find_stale_dashboard_pids', ...) then patched the
new module, but the stale function resolved the dependency via its stale
__globals__, so every patch became a no-op: pids=[] → early return → no
signals, no output, assertions failed.
Fix: add an autouse fixture that rebinds the three module-level names to
whatever is currently live in sys.modules['hermes_cli.main'] before each
test runs. The pollutants in the other two files are load-bearing for
their own tests, so fixing it on the consumer side is correct.
Repro: pytest tests/hermes_cli/test_env_loader.py tests/hermes_cli/test_update_stale_dashboard.py
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
UserMessageChunk and AgentMessageChunk do not have a message_id field
in the ACP schema. Passing it silently dropped the kwarg (pydantic
does not raise on unknown init kwargs here) and the subsequent test
assertions on .message_id raised AttributeError. Strip the dead
plumbing (uuid import, message_id= kwarg on both chunk types, unused
session_id/index parameters) and remove the matching .message_id
asserts from the test.
Adds a deterministic pre-check on top of htsh's exception-based fallback:
before calling /content/abstract or /content/overview on a non-pseudo URI,
probe /api/v1/fs/stat. If the server says the URI is a file, route straight
to /content/read instead of eating a failing 500 round-trip.
This is the same idea pty819 and chennest independently landed in PRs
#12757 and #12937 — merged here on top of htsh's broader fix so we keep
pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding
the slow exception path on servers that return a raised 500 every time.
The exception fallback from #5886 stays in place for environments where
fs/stat is unavailable or returns an unfamiliar shape.
Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release
notes attribute them correctly.
OpenViking returns 500 for /content/abstract and /content/overview when URI points to mem_*.md files.
Add resilient fallback to /content/read for non-pseudo summary file URIs while preserving pseudo summary normalization.
Also add regression tests for fallback behavior.
`hermes dashboard` is a long-lived foreground server that users often
start and forget about, sometimes in a shell they've since closed. We
didn't have a way to stop it — users had to find the PID manually.
Adds two lifecycle flags that reuse the same detection + termination
path the post-`hermes update` cleanup (PR #17832) uses:
hermes dashboard --status
List running hermes dashboard processes with PID + cmdline.
Exit 0, informational.
hermes dashboard --stop
Terminate all running dashboards (3s grace then force-kill survivors).
Exit 0 if none remain, 1 if any couldn't be stopped.
Windows uses `taskkill /F` as before.
Both flags short-circuit before any fastapi/uvicorn import so they work
even on installations where the dashboard extras aren't installed —
useful when you're cleaning up after uninstalling.
The kill helper gained an optional `reason=...` param so the output
reads "(requested via --stop)" instead of the post-update-specific
"running backend no longer matches the updated frontend" wording.
E2E: `hermes dashboard --status` with nothing running prints the
empty message; with a fake `hermes dashboard ...` cmdline spawned via
`exec -a`, `--status` lists it, `--stop` terminates it (exit -15),
and a follow-up `--status` returns empty.
Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).
Config shape:
tts:
provider: piper-en
providers:
piper-en:
type: command
command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
output_format: wav
Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.
Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
under tts.providers.<name>.max_text_length.
Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.
Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.
E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.
Co-authored-by: Versun <me+github7604@versun.org>
`hermes update` previously just printed a warning when it detected a
running `hermes dashboard` process from the previous version, telling
the user to kill and restart it themselves. In practice dashboards get
started and forgotten, so the warning was routinely ignored and users
ended up with a silent frontend/backend mismatch (new JS bundle served
against the old in-memory Python backend, e.g. new auth headers the old
code doesn't recognise → every API call 401s).
The dashboard has no service manager, no PID file, and we don't record
the original launch args (--host, --port, --insecure, --tui, --no-open)
so we can't auto-restart it. But we CAN stop it, which is what the
user wants — the failure mode when the stale process is left alive is
worse than the dashboard just being down.
- POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors.
- Windows: `taskkill /PID <pid> /F`.
- Print each PID's outcome plus a one-line restart hint.
- Detection logic is unchanged (same ps / wmic scan, same guards
against the `pgrep -f` greedy-match trap from #16872 and the
#17049 wmic UnicodeDecodeError fix).
Also split the old monolithic `_warn_stale_dashboard_processes` into
`_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes`
(kill), keeping the old name as an alias so any external callers still
work.
E2E verified: spawned a fake `hermes dashboard` cmdline via
`exec -a 'hermes dashboard …' sleep 300`, ran
`_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15)
and that a post-scan returns an empty PID list.
Three narrow fixes targeting the remaining red checks after #17828:
1. ui-tui/src/app/slash/commands/ops.ts (Docker Build):
/reload-mcp's local params type annotated session_id: string
while ctx.sid is string | null. Widen to string | null —
matches every other rpc call site and the test harness which passes
{ session_id: null }. Fixes TS2322 on line 86. The rpc signature
itself is Record<string, unknown>, so this is purely a local
typing fix, no behavioral change.
2. tests/plugins/test_achievements_plugin.py (13 cascading test failures):
_install_fake_session_db did a raw sys.modules['hermes_state'] =
fake_module without restoration, leaking the fake across xdist
worker boundaries. Downstream tests doing from hermes_state import
SessionDB got a module whose SessionDB was lambda: fake_db
— 6 test_hermes_state.py tests failed with AttributeError: 'function'
object has no attribute '_sanitize_fts5_query' / _contains_cjk,
and 7 test_860_dedup.py tests failed with TypeError: got unexpected
keyword argument 'db_path' (real code calls SessionDB(db_path=...)).
Fix: stash monkeypatch on the plugin_api module object in the
fixture, and have the helper do monkeypatch.setitem(sys.modules,
'hermes_state', fake_module) for auto-restoration at test teardown.
3. tests/hermes_cli/test_web_server.py (WS race):
TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers hit the
30s test timeout on CI. websocket_connect returns after
ws.accept() — but /api/events registers the subscriber in
_event_channels on the NEXT await (inside _event_lock). A
publish immediately after connect could race ahead of registration
and be dropped, and the subsequent receive_text() blocked until
SIGALRM killed the test. Fix: poll _event_channels after the
subscriber connects, before publishing.
Validation:
scripts/run_tests.sh tests/plugins/test_achievements_plugin.py
tests/run_agent/test_860_dedup.py
tests/test_hermes_state.py
tests/hermes_cli/test_web_server.py 338 passed
cd ui-tui && npm run type-check clean
cd ui-tui && npm run build clean
Remaining red checks are pure infra (Nix ubuntu hits
TwirpErrorResponse ResourceExhausted on the GH Actions cache API; Nix
macos bounces between npm build openssl-legacy and cache rate-limits)
and cannot be fixed in the codebase.
Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.
- Add should_send_media_as_audio(platform, ext, is_voice) in
gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
_TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
end-to-end MEDIA routing via _process_message_background and
GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
fallback for FLAC/WAV.
Co-authored-by: Versun <me+github7604@versun.org>
Fixes the xdist collision that broke CI on PR #17764, and structurally
prevents future plugin-adapter tests from reintroducing it.
Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py
(already on main) both followed the same anti-pattern:
sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
from adapter import <Adapter>
Every platform plugin ships its own adapter.py, so the bare
'from adapter import ...' races for sys.modules['adapter']. Whichever test
collected first in a given xdist worker won; the other crashed at
collection with ImportError, and the polluted sys.path cascaded into 19
unrelated test failures across tools/, hermes_cli/, and run_agent/ in the
same worker.
Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper
load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py
via importlib.util under the unique module name plugin_adapter_<name>.
Zero sys.path mutation, no possibility of collision.
2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py:
migrated to the helper. All 'from adapter import ...' statements
(including the ones inside test methods) are replaced with module-level
attribute access on the loaded module.
3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans
every test_*.py under tests/gateway/ at session start and fails the
run with a pointer to the helper if any test uses sys.path.insert into
plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'.
Runs on the xdist controller only (skipped in workers). The next plugin
adapter test that tries to reintroduce this pattern gets rejected at
collection time with a clear remediation message.
4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to
AUTHOR_MAP so the check-attribution workflow passes.
Validation
----------
scripts/run_tests.sh tests/gateway/ 4194 passed
scripts/run_tests.sh tests/gateway/test_{teams,irc}* 72 passed (both orderings)
scripts/run_tests.sh <11 prev-failing test files> 398 passed
Guard triggers correctly on both Path-operator and string-literal forms
of the anti-pattern.
Hello! I am the maintainer of the microsoft-teams-apps Python SDK and
I built this Teams adapter to integrate Microsoft Teams into Hermes.
Adds a `plugins/platforms/teams` platform plugin using the new
PlatformRegistry system from #17751. The adapter self-registers via
`register(ctx)` — no hardcoding in run.py, toolsets.py, or any
other core file.
Key features:
- Supports personal DMs, group chats, and channel posts
- Adaptive Card approval prompts with in-place button replacement
(Allow Once / Allow Session / Always Allow / Deny)
- aiohttp webhook server bridged from the Teams SDK to avoid
the fastapi/uvicorn dependency
- ConversationReference caching for correct proactive sends in
non-DM chats
- `interactive_setup()` for `hermes gateway setup` integration
- `platform_hint` for LLM context (Teams markdown subset)
- 34 tests covering adapter init, send, message handling, and
plugin registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #17660 landed a sweep of CI fixes but left three loose ends:
1. tests/cli/test_cli_loading_indicator.py::test_reload_mcp_sets_busy_state_
and_prints_status — /reload-mcp gained a prompt-cache-invalidation
confirmation (commit 4d7fc0f37) that was never wired into this test.
The test exercises the loading-indicator path, so pre-approve via
config and go straight into _reload_mcp().
2. tools/mcp_tool.py _make_tool_handler — the added
getattr(server, '_rpc_lock', None) + 'skip the lock if missing'
branch is inconsistent with four sibling call sites that still
direct-access server._rpc_lock. The lock is guaranteed by
MCPServerTask.__init__; falling through to an unlocked
session.call_tool would silently serialize-strip RPCs if the guard
ever triggered. Restore direct access.
3. tui_gateway/server.py _messages_as_conversation — the helper
existed only to catch 'TypeError: include_ancestors unexpected'
from mocked SessionDBs that don't actually exist. The real
SessionDB.get_messages_as_conversation has accepted
include_ancestors since introduction, and every test FakeDB in
the repo already declares the kwarg. Remove the shim, inline the
two call sites.
* feat(plugins): bundle hermes-achievements, scan full session history
Ships @PCinkusz's hermes-achievements dashboard plugin (https://github.com/PCinkusz/hermes-achievements) as a bundled plugin at plugins/hermes-achievements/ and fixes a bug in the scan path that made the plugin only see the first 200 sessions — making lifetime badges (50k tool calls, 75k errors, etc.) unreachable on long-running installs.
Changes:
- plugins/hermes-achievements/: vendor v0.3.1 verbatim (manifest, dist/, plugin_api.py, tests, docs, README).
- plugins/hermes-achievements/dashboard/plugin_api.py:
* scan_sessions(): limit=None now scans ALL sessions via SQLite LIMIT -1. Previously capped at 200, so users with 8000+ sessions saw ~2% of their history.
* evaluate_all(): first-ever scans run in a background thread so the dashboard request path never blocks. Stale snapshots serve immediately while a background refresh runs. force=True still blocks synchronously for manual /rescan.
* _build_pending_snapshot(), _start_background_scan(), _run_scan_and_update_cache(): supporting plumbing + idempotent thread spawn.
- tests/plugins/test_achievements_plugin.py: new tests covering the 200-cap regression, the background-scan first-run flow, stale-serve-plus-background-refresh, forced sync rescan, and scan-thread idempotency.
- website/docs/user-guide/features/built-in-plugins.md: lists hermes-achievements in the bundled-plugins table and documents API endpoints, state files, and performance characteristics.
E2E validated against a real 8564-session ~6.4GB state.db:
* Cold scan: 13m 19s (one-time, backgrounded — UI never blocks)
* Warm rescan: 1.47s (8563/8564 sessions reused from checkpoint cache)
* 57/60 achievements unlocked, 3 discovered — aggregates like total_tool_calls=259958, total_errors=164213, skill_events=368243 correctly surface lifetime badges that the 200-cap made unreachable.
Original credit: @PCinkusz (MIT-licensed). Upstream repo remains the staging ground for new badges; this bundle keeps the dashboard feature parity with Hermes core changes.
* feat(achievements): publish partial snapshots during cold scan
Previously a cold scan on a large session DB (13min on 8564 sessions)
showed zero badges for the entire duration, then every badge at once
when the scan completed. A dashboard refresh mid-scan was indistinguishable
from a fresh install with no history.
Now the scanner publishes a partial snapshot to _SNAPSHOT_CACHE every
250 sessions, so each refresh during a cold scan surfaces more badges
incrementally.
Mechanism:
- scan_sessions() takes an optional progress_callback fired every
progress_every sessions with (sessions_so_far, scanned, total).
- _compute_from_scan() is extracted from compute_all() and gains an
is_partial flag that skips writing to state.json — we don't want
to record unlocked_at based on a half-complete aggregate that a
later session might rebalance.
- _run_scan_and_update_cache() installs a publisher callback that
builds a partial snapshot, marks it mode='in_progress', and writes
it to the cache with age=0 so the UI keeps polling /scan-status
and picks up the final snapshot when the scan completes.
- Manual /rescan (force=True) disables partial publishing — the
caller is blocking on the final result anyway.
E2E against real 8564-session state.db (polled cache every 10s):
t=10s: cache empty
t=20s: 250/8564 scanned, 35 unlocked, 25 discovered
t=40s: 500/8564 scanned, 42 unlocked, 18 discovered
t=60s: 1000/8564 scanned, 49 unlocked, 11 discovered
...
Tests: 9/9 pass (2 new — partial snapshot publication + no-persist-on-partial).
Upstream unittest suite: 10/10 pass.
* feat(achievements): in-progress scan banner with live % progress
Previously the dashboard showed zero badges silently during long cold
scans (13min on 8564 sessions). The backend was publishing partial
snapshots every 250 sessions, but the bundled UI didn't surface any
indicator that a scan was running — it just rendered the main page
with whatever counts were currently published and no way for the user
to know more progress was coming.
UI changes (dist/index.js, dist/style.css):
- Added a scan-in-progress banner rendered between the hero and stats
when scan_meta.mode is 'pending' or 'in_progress'. Shows:
BUILDING ACHIEVEMENT PROFILE…
Scanned 1,750 of 8,564 sessions · 20%. Badges unlock as more history streams in.
with a pulsing teal indicator and a filling teal/cyan progress bar.
Disappears the moment the backend flips to 'full' or 'incremental'.
- Added an auto-poller via useEffect — while scanInFlight is true the
page re-fetches /achievements every 4s WITHOUT toggling the loading
skeleton, so unlock counts tick up visibly without the user refreshing.
The effect cleans itself up when the scan finishes.
- Added refresh() (re-fetch, no loading flip) alongside the existing
load() (full reload, used by the Rescan button).
Attribution preserved:
- Added a header comment to index.js crediting @PCinkusz
(https://github.com/PCinkusz/hermes-achievements, MIT) as the
original author, noting the banner is a layered addition on top
of the original dist bundle.
- Matching header comment in style.css, flagging the new
.ha-scan-banner* rules as the local addition.
Live-verified end to end:
- Spun up `hermes dashboard --port 9229 --no-open` against a fresh
HERMES_HOME symlinked to the real 8564-session state.db.
- Opened /achievements in a browser, confirmed the banner renders with
live progress: 'Scanned 1,000 of 8,564 sessions · 11%' → updates to
'1,250 ... · 14%' → '1,750 ... · 20%' without user interaction,
matching the backend's partial publications.
- Stats row simultaneously climbed from 35 → 49 → 53 unlocked as
more history streamed in.
- Vision analysis of the rendered page confirms the banner styling
matches the rest of the dashboard (dark card bg, teal accent, same
small-caps typography, pulsing indicator reusing ha-pulse keyframes).
The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks
(gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex)
because ChatGPT-account Codex gates which models it accepts via an
undocumented, shifting allow-list that OpenAI publishes no changelog
for. Any pinned default will keep going stale. Issue #17533 reports
the current breakage: every ChatGPT-account auxiliary fallback fails
with HTTP 400 "model is not supported" and the 60s pause loop degrades
long sessions.
Rather than reset the clock with another stale pin (PR #17544 proposes
gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex
fallback entirely:
- Delete `_CODEX_AUX_MODEL`.
- Drop `_try_codex` from `_get_provider_chain()` (the auto chain now
ends at api-key providers; 4 rungs instead of 5).
- Rename `_try_codex() -> _build_codex_client(model)` and require an
explicit model from the caller. No more guessing.
- `resolve_provider_client("openai-codex", model=None)` now warns and
returns (None, None) instead of silently guessing a stale model ID.
- Remove `_try_codex` from the `provider="custom"` fallback ladder
(same stale-constant trap).
- `_resolve_strict_vision_backend("openai-codex")` routes through
`resolve_provider_client` so the caller's explicit model is honored.
Codex-main users are unaffected: Step 1 of `_resolve_auto` already
uses `main_provider` + `main_model` directly and passes the user's
configured Codex model through `resolve_provider_client`, which never
touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`)
continue to work and are the supported way to route specific aux tasks
through Codex.
Users whose main provider fails with a payment/connection error and
who have ONLY ChatGPT-account Codex auth will now see the 60s pause
without a stale-model-rejection noise line in between -- same outcome,
cleaner failure.
Closes#17533. Supersedes #17544 (which resets the clock on the
same stale-constant problem).