* fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models
Reasoning-mandatory Anthropic models (Claude 4.6+/fable/mythos-class) over
OpenRouter ignore reasoning.effort and use adaptive thinking. #42991 correctly
stopped Hermes from sending a reasoning field to them (it 400s), but put nothing
in its place — leaving agent.reasoning_effort a silent no-op on the OpenRouter
path: the model always ran at its adaptive default (high) regardless of config.
OpenRouter honors the requested effort on the top-level verbosity field instead
(maps to Anthropic output_config.effort). Route the existing
reasoning_config[effort] there for these models while still never emitting a
reasoning field, preserving the #42991 fix. No new config arg — the value the
user already sets via agent.reasoning_effort now flows to verbosity.
- low/medium/high/xhigh/max pass through verbatim (OpenRouter accepts the
extended scale for Claude; verified live HTTP 200 + monotonic token spend).
- effort unset/none/disabled omits verbosity so the model keeps its default.
- native Anthropic transport already correct; unchanged.
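A minimal sketch of the routing rules above. It assumes a hypothetical helper `build_effort_kwargs` that is handed an `adaptive` flag (the real profile detects adaptive models itself) and a `reasoning_config` dict shaped like `{"enabled": ..., "effort": ...}`. Treating both an explicit `enabled: False` and an effort of none/disabled as "omit verbosity" is an assumption about how "disabled" is represented.

```python
from typing import Any, Dict, Optional

# Effort values treated as "no preference": leave the adaptive default alone.
_NO_EFFORT = {None, "", "none", "disabled"}


def build_effort_kwargs(
    adaptive: bool, reasoning_config: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: extra top-level fields for one OpenRouter request."""
    cfg = reasoning_config or {}
    if not adaptive:
        # Legacy Anthropic / non-Anthropic models keep the reasoning field.
        return {"reasoning": cfg} if cfg else {}
    # Adaptive Anthropic models: never emit `reasoning` (preserves #42991).
    effort = cfg.get("effort")
    if cfg.get("enabled") is False or effort in _NO_EFFORT:
        return {}  # no verbosity -> model keeps its adaptive default
    # low/medium/high/xhigh/max are forwarded verbatim, with no validation.
    return {"verbosity": effort}
```

For example, `build_effort_kwargs(True, {"enabled": True, "effort": "xhigh"})` yields `{"verbosity": "xhigh"}`, while `build_effort_kwargs(True, None)` yields `{}`.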
Fixes #43432
* test(openrouter): cover real effort range (add minimal, frame max as passthrough)
Adversarial review noted the verbosity tests looped over 'max' — a value
parse_reasoning_effort can never produce — while omitting 'minimal', which it
can. Align the routing test with the real config range
(VALID_REASONING_EFFORTS = minimal/low/medium/high/xhigh) and keep a separate
value-agnostic passthrough test that documents why xhigh/max must survive
verbatim (TypedDict, no runtime literal validation; OpenRouter accepts the
extended scale for Claude).
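The split between the configurable range and the wire-level passthrough can be sketched as below. `parse_effort`, `verbosity_from`, and this `ReasoningConfig` shape are illustrative stand-ins, not the real `parse_reasoning_effort` or config types. The point is that a `Literal` annotation inside a `TypedDict` is enforced only by type checkers, so a hand-built config carrying `max` still reaches the wire unchanged.

```python
from typing import Literal, Optional, TypedDict

# Range quoted in the commit message; the real constant may be defined differently.
VALID_REASONING_EFFORTS = ("minimal", "low", "medium", "high", "xhigh")


class ReasoningConfig(TypedDict, total=False):
    """Illustrative shape; the Literal is checked statically, never at runtime."""
    enabled: bool
    effort: Literal["minimal", "low", "medium", "high", "xhigh"]


def parse_effort(raw: Optional[str]) -> Optional[str]:
    """Hypothetical config parser: anything outside the range is dropped."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return value if value in VALID_REASONING_EFFORTS else None


def verbosity_from(cfg: ReasoningConfig) -> Optional[str]:
    """Wire-side passthrough: whatever string is in cfg goes out unchanged."""
    return cfg.get("effort")


# Config side: 'max' can never come out of the parser.
print([parse_effort(v) for v in ("minimal", " XHigh ", "max")])  # ['minimal', 'xhigh', None]
# Wire side: a hand-built config carrying 'max' survives verbatim.
print(verbosity_from(ReasoningConfig(enabled=True, effort="max")))  # max
```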
* docs: explain reasoning_effort -> verbosity routing for adaptive Anthropic models
Document that reasoning_effort transparently maps to OpenRouter's verbosity
field for adaptive-thinking Anthropic models (Claude 4.6+/Fable/Mythos), where
reasoning.effort is ignored. Note xhigh is the configurable ceiling (max is wire-
only). Add verbosity as a top-level-kwarg example in the provider-plugin guide.
The previous fix (#42991) only omitted reasoning when it was being disabled.
But reasoning-mandatory Anthropic models (Claude 4.6+, fable) 400 with
thinking.type.disabled on EVERY tool-continuation turn even when reasoning is
enabled: chat_completions never replays signed thinking blocks, so the prior
assistant tool_call has no thinking, and OpenRouter resolves "reasoning
requested but history has none" by emitting thinking.type.disabled — which
these models reject. Result: first turn works, every turn after the first tool
call dies (HTTP 400, non-retryable).
OpenRouter ignores reasoning.effort for adaptive Anthropic models anyway (the
model self-decides), so the reasoning field is pointless for them on every turn
and harmful on tool-replay turns. Omit it entirely → adaptive default.
- openrouter profile: drop the reasoning field for reasoning-mandatory Anthropic
models regardless of enabled/disabled; legacy Anthropic + non-Anthropic models
unchanged.
- tests: assert omission across enabled/disabled/effort variants; parity tests
switched to a non-Anthropic reasoning model (deepseek) since Anthropic 4.6+ no
longer carries a reasoning field.
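A sketch of the resulting decision, assuming a hypothetical `build_reasoning_extra_body` that is told whether the model is reasoning-mandatory (detection is out of scope here). Mandatory Anthropic models never get a `reasoning` field; every other model passes its config through, including the `{"enabled": False}` disable form. An empty body collapses to `None`, matching the `extra_body=None` seen in the live check.

```python
from typing import Any, Dict, Optional


def build_reasoning_extra_body(
    reasoning_mandatory: bool, reasoning_config: Optional[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Hypothetical: reasoning-related extra_body for one chat_completions call."""
    body: Dict[str, Any] = {}
    if reasoning_config is not None and not reasoning_mandatory:
        # Legacy Anthropic and non-Anthropic models: forward as-is, including
        # the {"enabled": False} disable form.
        body["reasoning"] = dict(reasoning_config)
    # Reasoning-mandatory Anthropic models: no field at all, on every turn,
    # so a tool-replay turn without signed thinking blocks is not resolved
    # to thinking.type.disabled.
    return body or None
```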
Verified live end-to-end: a tool-replay turn on anthropic/claude-fable-5 with
reasoning enabled now builds extra_body=None and returns HTTP 200 (was 400).
New Anthropic models without a recognized version substring (claude-fable-5
and future named/numbered releases) were classified as legacy and routed down
the manual-thinking path, which made OpenRouter emit thinking.type.disabled —
a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400.
Invert the brittle version-substring allowlists to default-to-modern (mirroring
_get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/
no-sampling contract, with an explicit legacy list for older families. Non-Claude
Anthropic-Messages models (minimax, qwen3, …) keep the manual path.
- anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort /
_forbids_sampling_params now default unknown Claude models to modern; legacy
families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS.
- openrouter profile: omit reasoning entirely (→ adaptive default) instead of
forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy
Anthropic + all non-Anthropic models still pass the disable form through.
- model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out).
Tests assert the invariant ("unknown Claude model -> modern contract; legacy
stays manual; non-Claude unaffected"), not specific model names.
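The default-to-modern inversion can be sketched as follows. `LEGACY_SUBSTRINGS` is an illustrative placeholder for `_LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS` (the real list enumerates the older families and its entries may differ), and `supports_adaptive_thinking` is a simplified stand-in for `_supports_adaptive_thinking`. Note the asymmetry: unknown Claude names land on the modern contract, while non-Claude names never do.

```python
# Illustrative only: the real _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS
# enumerates the older Claude families and its entries may differ.
LEGACY_SUBSTRINGS = ("claude-instant", "claude-2", "claude-3")


def is_claude(model: str) -> bool:
    return "claude" in model.lower()


def supports_adaptive_thinking(model: str) -> bool:
    """Default-to-modern: any Claude model not on the legacy list is modern."""
    name = model.lower()
    if not is_claude(name):
        return False  # minimax, qwen3, ... keep the manual-thinking path
    return not any(sub in name for sub in LEGACY_SUBSTRINGS)
```

Tests written against this shape assert the invariant rather than enumerating model names: a made-up Claude name must come out modern, a legacy name manual, and a non-Claude name unaffected.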
OpenRouter supports a session_id field in extra_body that pins
multi-turn conversations to the same provider endpoint, enabling
prompt cache reuse across turns. The session_id was already threaded
through to build_extra_body() but never included in the returned dict; this change adds it.
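A minimal sketch of the intended behavior, assuming a simplified, hypothetical `build_extra_body` signature (the real function takes other arguments). The only behavior shown is that a non-empty `session_id` lands in the returned dict instead of being dropped.

```python
from typing import Any, Dict, Optional


def build_extra_body(
    session_id: Optional[str] = None, **fields: Any
) -> Dict[str, Any]:
    """Hypothetical, simplified signature: assemble OpenRouter extra_body."""
    body = {key: value for key, value in fields.items() if value is not None}
    if session_id:
        # Pins the conversation to one provider endpoint so later turns can
        # reuse the prompt cache built by earlier ones.
        body["session_id"] = session_id
    return body
```

With the OpenAI Python SDK, a dict like this is typically passed via the `extra_body=` argument of `client.chat.completions.create(...)`.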
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder; with the score omitted, we get the strongest. The score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
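The gating can be sketched as a small pure function. `pareto_plugins` is a hypothetical name; the plugin id, model slug, and [0.0, 1.0] bound come from this message. Any value that is unset, non-numeric, or out of range yields no plugin, and so does any model other than `openrouter/pareto-code`.

```python
from typing import Any, Dict, List, Optional

PARETO_MODEL = "openrouter/pareto-code"


def pareto_plugins(
    model: str, min_coding_score: Any
) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical: extra_body['plugins'] for the Pareto code router, or None."""
    if model != PARETO_MODEL:
        return None  # score is silently ignored for every other model
    try:
        score = float(min_coding_score)
    except (TypeError, ValueError):
        return None  # unset (None) or non-numeric, e.g. ""
    if not 0.0 <= score <= 1.0:
        return None  # out of range (NaN also fails this comparison)
    return [{"id": "pareto-router", "min_coding_score": score}]
```

Keeping this as a pure function of (model, score) is what makes the option safe to leave set globally: the emission decision never depends on anything else in the request.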
Pass session_id through to provider profile build_api_kwargs_extras so
the OpenRouter profile can attach an xAI cache-affinity header
(x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt
cache requires server affinity via this header — without it the cache
is poisoned and Grok prompt-cache hit rates drop dramatically on
multi-turn sessions.
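A sketch of the header selection, assuming a hypothetical helper that returns extra HTTP headers for one request. Only the `x-grok-conv-id` name and the `x-ai/grok-*` model prefix come from this message; how the profile actually attaches the header is not shown.

```python
from typing import Dict, Optional

GROK_PREFIX = "x-ai/grok-"


def grok_affinity_headers(model: str, session_id: Optional[str]) -> Dict[str, str]:
    """Hypothetical: extra HTTP headers for one request through OpenRouter."""
    if session_id and model.lower().startswith(GROK_PREFIX):
        # Same conversation id on every turn -> same xAI server -> cache hits.
        return {"x-grok-conv-id": session_id}
    return {}
```

With the OpenAI Python SDK, per-request headers like these can be supplied through the `extra_headers=` argument of `client.chat.completions.create(...)`.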
Carve-out of #22708 by Ninso112. The original PR bundled a /diff
slash command, a zsh completion fix (already on main via #22802),
and holographic memory null-guards. This salvage keeps just the
Grok header work — small, targeted, and well-tested. Other
contributors and changes preserved for separate review.
Closes #22705.
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.
- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
plugins (providers/ discovery owns them); standalone plugins with
register_provider+ProviderProfile in their __init__.py auto-coerce to
this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
provider-runtime.md, providers/README.md, plugins/model-providers/README.md
No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.
Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.