fix(provider): make config.yaml model.provider the single source of truth (#31222)

Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER
was leaking behavioral config into the .env surface, including from the gateway,
which bypassed config.yaml entirely.

Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs.
  Gateway now flows through resolve_runtime_provider() with no `requested` override,
  which reads model.provider from config.yaml first.
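
A minimal sketch of the lookup order this commit describes: an explicit per-invocation override wins, then `model.provider` from config.yaml, then the env var as an internal last resort, then `"auto"`. Only the env var name comes from the commit. `pick_provider` and its arguments are hypothetical, and the real `resolve_runtime_provider()` / `resolve_requested_provider()` may differ in detail, for example in how a configured `"auto"` interacts with the env var.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Optional

ENV_KEY = "HERMES_INFERENCE_PROVIDER"  # internal handoff variable named in the commit


def pick_provider(
    requested: Optional[str],
    config: Mapping[str, object],
    environ: Mapping[str, str],
) -> str:
    """Hypothetical sketch of provider precedence (not Hermes' actual code).

    1. An explicit per-invocation override (e.g. the --provider flag).
    2. model.provider from config.yaml, the single source of truth.
    3. The env var, kept only as an internal last resort.
    4. "auto".
    """
    if requested:
        return requested
    model = config.get("model")
    if isinstance(model, Mapping):
        provider = model.get("provider")
        if isinstance(provider, str) and provider:
            return provider
    fallback = environ.get(ENV_KEY)
    if fallback:
        return fallback
    return "auto"


# The gateway now passes no override, so config.yaml beats a stray env var.
cfg = {"model": {"provider": "openrouter"}}
print(pick_provider(None, cfg, {ENV_KEY: "xai"}))  # openrouter
print(pick_provider("anthropic", cfg, {}))          # anthropic
print(pick_provider(None, {}, {ENV_KEY: "xai"}))    # xai
print(pick_provider(None, {}, {}))                  # auto
```

In this sketch, a value exported in the gateway's shell only matters when config.yaml has no `model.provider` at all.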

Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and
  the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: set the env-var column for --provider to _(none)_
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace
  HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe
  provider-id references around model.provider in config.yaml and --provider

Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the
  env var as a last-resort behind config.yaml, which is what makes the TUI
  parent->child handoff work
This stays. We just stop documenting it as a user knob.
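
The parent-to-child handoff that keeps the env var alive can be sketched as below: the parent copies its environment, adds the provider override, and spawns the child, which reads the value back on startup. Only the variable name comes from the commit; `build_child_env`, `read_handed_off_provider` and `run_child` are hypothetical helpers, and the real hermes_cli/main.py and tui_gateway/server.py may do more (or differently) with the value.

```python
from __future__ import annotations

import os
import subprocess
import sys
from collections.abc import Mapping
from typing import Optional

ENV_KEY = "HERMES_INFERENCE_PROVIDER"  # the only name taken from the commit


def build_child_env(
    provider_override: Optional[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Parent side: copy the environment and add the override, if one was given."""
    env = dict(os.environ if base_env is None else base_env)
    if provider_override:
        env[ENV_KEY] = provider_override
    return env


def read_handed_off_provider(environ: Mapping[str, str]) -> Optional[str]:
    """Child side: read the handed-off provider on startup, if any."""
    value = environ.get(ENV_KEY, "").strip()
    return value or None


def run_child(provider_override: Optional[str]) -> str:
    """Spawn a throwaway Python child and return the provider value it sees."""
    child_code = f"import os; print(os.environ.get({ENV_KEY!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=build_child_env(provider_override),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child("xai-oauth"))  # xai-oauth
```

The env var here is purely a process-boundary transport; users set the provider in config.yaml or with `--provider`, never by exporting the variable themselves.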

Tests: tests/gateway/test_auth_fallback.py — simplify mock to fail on first
call, succeed on second; drop monkeypatch.setenv lines that no longer matter.
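
The simplified test rests on one pattern: the mocked resolver fails on its first call and succeeds on its second. A sketch with `unittest.mock`, using a hypothetical `resolve_with_fallback` helper and a local stand-in for `AuthError`; `explicit_api_key` / `explicit_base_url` are the kwarg names mentioned in the test comments, while the URL and key values are placeholders. The real gateway reads its fallback chain from config.yaml.

```python
from unittest.mock import MagicMock, call


class AuthError(Exception):
    """Local stand-in for hermes_cli.auth.AuthError (assumption, not the real class)."""


def resolve_with_fallback(resolve, fallback_kwargs):
    """Hypothetical gateway-style helper.

    Call the resolver with no `requested` override (so config.yaml decides);
    if that raises AuthError, retry once with explicit fallback credentials.
    """
    try:
        return resolve()
    except AuthError:
        return resolve(**fallback_kwargs)


def test_primary_auth_failure_uses_fallback():
    # A list side_effect: the first call raises, the second returns the dict.
    resolve = MagicMock(
        side_effect=[
            AuthError("Codex token refresh failed with status 401"),
            {"api_key": "fallback-key", "credential_pool": None},
        ]
    )
    fallback = {
        "explicit_api_key": "fallback-key",
        "explicit_base_url": "https://example.invalid/v1",
    }

    runtime = resolve_with_fallback(resolve, fallback)

    assert runtime["api_key"] == "fallback-key"
    # Calls are recorded even when they raise, so both attempts show up.
    assert resolve.call_args_list == [call(), call(**fallback)]


if __name__ == "__main__":
    test_primary_auth_failure_uses_fallback()
```

Keying the failure on call order rather than on the `requested` kwarg is what makes the mock still meaningful: the gateway no longer passes `requested`, so a mock that only raised when `requested` contained "codex" would never fail.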

Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying
issue but proposed aligning gateway *to* the env var rather than removing it).
Teknium 2026-05-23 18:18:41 -07:00 committed by GitHub
parent 7a4dc8e8d6
commit e42fcc5625
GPG key ID: B5690EEEBB952194
11 changed files with 25 additions and 22 deletions

@@ -39,7 +39,7 @@ model:
   # LM Studio is first-class and uses provider: "lmstudio".
   # It works with both no-auth and auth-enabled server modes.
   #
-  # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
+  # Can also be overridden for a single invocation with the --provider flag.
   provider: "auto"
   # API configuration (falls back to OPENROUTER_API_KEY env var)

@@ -962,6 +962,12 @@ _AGENT_PENDING_SENTINEL = object()
 def _resolve_runtime_agent_kwargs() -> dict:
     """Resolve provider credentials for gateway-created AIAgent instances.
+    Provider is read from ``config.yaml`` ``model.provider`` (the single
+    source of truth). ``resolve_runtime_provider()`` falls through to env
+    var lookups internally for legacy compatibility, but the gateway does
+    not consult environment variables for behavioral config; config.yaml
+    is authoritative.
     If the primary provider fails with an authentication error, attempt to
     resolve credentials using the fallback provider chain from config.yaml
     before giving up.
@@ -973,9 +979,7 @@ def _resolve_runtime_agent_kwargs() -> dict:
     from hermes_cli.auth import AuthError
     try:
-        runtime = resolve_runtime_provider(
-            requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
-        )
+        runtime = resolve_runtime_provider()
     except AuthError as auth_exc:
         # Primary provider auth failed (expired token, revoked key, etc.).
         # Try the fallback provider chain before raising.

@@ -129,7 +129,8 @@ def build_top_level_parser():
         default=None,
         help=(
             "Provider override for this invocation (e.g. openrouter, anthropic). "
-            "Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
+            "Applies to -z/--oneshot and --tui. The persistent provider lives in config.yaml "
+            "under model.provider — use `hermes setup` or edit the file to change it."
         ),
     )
     parser.add_argument(

@@ -17,7 +17,6 @@ Model / provider selection mirrors `hermes chat`:
 Env var fallbacks (used when the corresponding arg is not passed):
 - HERMES_INFERENCE_MODEL
-- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
 """
 from __future__ import annotations
@@ -135,9 +134,8 @@ def run_oneshot(
         prompt: The user message to send.
         model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
             env var, then config.yaml's model.default / model.model.
-        provider: Optional provider override. Falls back to
-            HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
-            then "auto".
+        provider: Optional provider override. Falls back to config.yaml's
+            model.provider, then "auto".
         toolsets: Optional comma-separated string or iterable of toolsets.
     Returns the exit code. Caller should sys.exit() with the return.

@@ -27,8 +27,11 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
         def _mock_resolve(**kwargs):
             call_count["n"] += 1
-            requested = kwargs.get("requested", "")
-            if requested and "codex" in str(requested).lower():
+            # First call = primary path (gateway reads model.provider from
+            # config.yaml internally; we simulate the auth failure here).
+            # Second call = fallback path with explicit_api_key + explicit_base_url
+            # supplied by gateway from fallback_model config.
+            if call_count["n"] == 1:
                 raise AuthError("Codex token refresh failed with status 401")
             return {
                 "api_key": "fallback-key",
@@ -40,8 +43,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
                 "credential_pool": None,
             }
-        monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
         with patch(
             "hermes_cli.runtime_provider.resolve_runtime_provider",
             side_effect=_mock_resolve,
@@ -62,7 +63,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
         config_path.write_text("model:\n provider: openai-codex\n")
         monkeypatch.setattr("gateway.run._hermes_home", tmp_path)
-        monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
         with patch(
             "hermes_cli.runtime_provider.resolve_runtime_provider",

@@ -116,7 +116,7 @@ When you add a plugin and it calls `register_provider()`, the following wire up
 8. `hermes setup` wizard delegates to `main.py` automatically
 9. `provider:model` alias syntax works
 10. Runtime resolver returns the correct `base_url` and `api_key`
-11. `HERMES_INFERENCE_PROVIDER` env-var override accepts the provider id
+11. `--provider <name>` CLI flag accepts the provider id
 12. Fallback model activation can switch into the provider cleanly
 User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override bundled plugins of the same name (last-writer-wins in `register_provider()`) — so third parties can monkey-patch or replace any built-in profile without editing the repo.

@@ -89,7 +89,7 @@ Full definition in `providers/base.py`. The most useful ones:
 | Field | Type | Purpose |
 |---|---|---|
-| `name` | str | Canonical id — matches `--provider` choices and `HERMES_INFERENCE_PROVIDER` |
+| `name` | str | Canonical id — matches `model.provider` in `config.yaml` and the `--provider` flag |
 | `aliases` | `tuple[str, ...]` | Alternative names resolved by `get_provider_profile()` (e.g. `grok` → `xai`) |
 | `api_mode` | str | `chat_completions` \| `codex_responses` \| `anthropic_messages` \| `bedrock_converse` |
 | `display_name` | str | Human label shown in `hermes model` picker |

@@ -157,10 +157,10 @@ The `minimax-oauth` provider does **not** use `MINIMAX_API_KEY` or `MINIMAX_BASE
 | `MINIMAX_API_KEY` | Used by `minimax` provider only — ignored for `minimax-oauth` |
 | `MINIMAX_CN_API_KEY` | Used by `minimax-cn` provider only — ignored for `minimax-oauth` |
-To force the `minimax-oauth` provider at runtime:
+To use `minimax-oauth` as the active provider, set `model.provider: minimax-oauth` in `config.yaml` (use `hermes setup` for the guided flow), or pass `--provider minimax-oauth` for a single invocation:
 ```bash
-HERMES_INFERENCE_PROVIDER=minimax-oauth hermes
+hermes --provider minimax-oauth
 ```
 ## Models

@@ -190,7 +190,8 @@ The chat catalog is derived live from the on-disk `models.dev` cache; new xAI re
 | Variable | Effect |
 |----------|--------|
 | `XAI_BASE_URL` | Override the default `https://api.x.ai/v1` endpoint (rarely needed). |
-| `HERMES_INFERENCE_PROVIDER` | Force the active provider at runtime, e.g. `HERMES_INFERENCE_PROVIDER=xai-oauth hermes`. |
+To select xAI as the active provider, set `model.provider: xai-oauth` in `config.yaml` (use `hermes setup` for the guided flow) or pass `--provider xai-oauth` for a single invocation.
 ## Troubleshooting

@@ -138,7 +138,7 @@ Per-run overrides (no mutation to `~/.hermes/config.yaml`):
 | Flag | Equivalent env var | Purpose |
 |---|---|---|
 | `-m` / `--model <model>` | `HERMES_INFERENCE_MODEL` | Override the model for this run |
-| `--provider <provider>` | `HERMES_INFERENCE_PROVIDER` | Override the provider for this run |
+| `--provider <provider>` | _(none)_ | Override the provider for this run |
 ```bash
 hermes -z "…" --provider openrouter --model openai/gpt-5.5

@@ -113,7 +113,6 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
 | Variable | Description |
 |----------|-------------|
-| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `custom`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `novita`, `gemini`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth` (browser OAuth login — no API key required; see [MiniMax OAuth guide](../guides/minimax-oauth.md)), `kilocode`, `xiaomi`, `arcee`, `gmi`, `stepfun`, `alibaba`, `alibaba-coding-plan` (alias `alibaba_coding`), `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `xai-oauth` (browser OAuth login for SuperGrok subscribers — no API key required; see [xAI Grok OAuth guide](../guides/xai-grok-oauth.md)), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `tencent-tokenhub` (default: `auto`) |
 | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
 | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
 | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
@@ -589,7 +588,7 @@ Advanced per-platform knobs for throttling the outbound message batcher. Most us
 | `HERMES_TUI_DIR` | Path to a prebuilt `ui-tui/` directory (must contain `dist/entry.js` and populated `node_modules`). Used by distros and Nix to skip the first-launch `npm install`. |
 | `HERMES_TUI_RESUME` | Resume a specific TUI session by ID on launch. When set, `hermes --tui` skips forging a fresh session and picks up the named session instead — useful for re-attaching after a disconnect or terminal crash. |
 | `HERMES_TUI_THEME` | Force the TUI color theme: `light`, `dark`, or a raw 6-character background hex (e.g. `ffffff` or `1a1a2e`). When unset, Hermes auto-detects using `COLORFGBG` and terminal background queries; this variable overrides detection on terminals (Ghostty, Warp, iTerm2, etc.) that don't set `COLORFGBG`. |
-| `HERMES_INFERENCE_MODEL` | Force the model for `hermes -z` / `hermes chat` without mutating `config.yaml`. Pairs with `HERMES_INFERENCE_PROVIDER`. Useful for scripted callers (sweeper, CI, batch runners) that need to override the default model per run. |
+| `HERMES_INFERENCE_MODEL` | Force the model for `hermes -z` / `hermes chat` without mutating `config.yaml`. Pairs with the `--provider` flag. Useful for scripted callers (sweeper, CI, batch runners) that need to override the default model per run. |
 ## Session Settings