| sidebar_position | title | description |
|---|---|---|
| 4 | Provider Runtime Resolution | How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime |
Provider Runtime Resolution
Hermes has a shared provider runtime resolver used across:
- CLI
- gateway
- cron jobs
- ACP
- auxiliary model calls
Primary implementation:
- hermes_cli/runtime_provider.py: credential resolution, _resolve_custom_runtime()
- hermes_cli/auth.py: provider registry, resolve_provider()
- hermes_cli/model_switch.py: shared /model switch pipeline (CLI + gateway)
- agent/auxiliary_client.py: auxiliary model routing
- providers/: ABC + registry entry points (ProviderProfile, register_provider, get_provider_profile, list_providers)
- plugins/model-providers/<name>/: per-provider plugins (bundled) that declare api_mode, base_url, env_vars, fallback_models and register themselves into the registry on first access. User plugins at $HERMES_HOME/plugins/model-providers/<name>/ override bundled ones of the same name.
get_provider_profile() in providers/ returns a ProviderProfile for a given provider id. runtime_provider.py calls this at resolution time to get the canonical base_url, env_vars priority list, api_mode, and fallback_models without needing to duplicate that data in multiple files. Adding a new plugin under plugins/model-providers/<your-provider>/ (or $HERMES_HOME/plugins/model-providers/<your-provider>/) that calls register_provider() is enough for runtime_provider.py to pick it up — no branch needed in the resolver itself.
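The registration flow described above can be pictured with a small sketch. The names ProviderProfile, register_provider, get_provider_profile, and list_providers come from this page, but the field layout, the dict-backed registry, and the "example-llm" provider are assumptions made for illustration, not the actual contents of providers/.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ProviderProfile:
    """Illustrative stand-in; the real class may carry more fields."""
    name: str
    api_mode: str
    base_url: str
    env_vars: Tuple[str, ...] = ()        # checked in priority order
    fallback_models: Tuple[str, ...] = ()


# Assumed storage: a plain module-level dict keyed by provider id.
_REGISTRY: Dict[str, ProviderProfile] = {}


def register_provider(profile: ProviderProfile) -> None:
    # A later registration replaces an earlier one, which is one way a
    # user plugin could override a bundled plugin of the same name.
    _REGISTRY[profile.name] = profile


def get_provider_profile(name: str) -> Optional[ProviderProfile]:
    return _REGISTRY.get(name)


def list_providers() -> List[str]:
    return sorted(_REGISTRY)


# Hypothetical plugin body for an imaginary provider id.
register_provider(ProviderProfile(
    name="example-llm",
    api_mode="chat_completions",
    base_url="https://api.example.invalid/v1",
    env_vars=("EXAMPLE_LLM_API_KEY",),
    fallback_models=("example-small",),
))
```

With a registry of this shape, a resolver only needs the provider id to look up base_url, env_vars, and api_mode, which is why no per-provider branch is required.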
If you are trying to add a new first-class inference provider, read Adding Providers and the Model Provider Plugin guide alongside this page.
Resolution precedence
At a high level, provider resolution uses:
- explicit CLI/runtime request
- config.yaml model/provider config
- environment variables
- provider-specific defaults or auto resolution
That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in hermes model.
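A minimal sketch of that precedence order, assuming a config shape of `model: {provider: ...}` and a hypothetical HERMES_PROVIDER environment variable (neither is confirmed by this page; the real resolver lives in hermes_cli/runtime_provider.py and handles far more cases):

```python
from typing import Mapping, Optional, Tuple


def pick_provider(
    requested: Optional[str],
    config: Mapping[str, object],
    env: Mapping[str, str],
    default: str = "auto",
) -> Tuple[str, str]:
    """Return (provider, source), checking sources in precedence order."""
    # 1. An explicit CLI/runtime request always wins.
    if requested:
        return requested, "explicit"
    # 2. The saved config is the source of truth for normal runs.
    model_cfg = config.get("model")
    if isinstance(model_cfg, dict) and model_cfg.get("provider"):
        return str(model_cfg["provider"]), "config"
    # 3. Environment variables (variable name is hypothetical).
    if env.get("HERMES_PROVIDER"):
        return env["HERMES_PROVIDER"], "env"
    # 4. Provider-specific defaults or auto resolution.
    return default, "default"
```

Because the config check comes before the environment check, a stale shell export cannot override a provider saved in config.yaml.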
Providers
Current provider families include (see plugins/model-providers/ for the complete bundled set):
- OpenRouter
- Nous Portal
- OpenAI Codex
- Copilot / Copilot ACP
- Anthropic (native)
- Google / Gemini (gemini, google-gemini-cli)
- Alibaba / DashScope (alibaba, alibaba-coding-plan)
- DeepSeek
- Z.AI
- Kimi / Moonshot (kimi-coding, kimi-coding-cn)
- MiniMax (minimax, minimax-cn, minimax-oauth)
- Kilo Code
- Hugging Face
- OpenCode Zen / OpenCode Go
- AWS Bedrock
- Azure Foundry
- NVIDIA NIM
- xAI (Grok)
- Arcee
- GMI Cloud
- StepFun
- Qwen OAuth
- Xiaomi
- Ollama Cloud
- LM Studio
- Tencent TokenHub
- Custom (provider: custom): first-class provider for any OpenAI-compatible endpoint
- Named custom providers (custom_providers list in config.yaml)
Output of runtime resolution
The runtime resolver returns data such as:
- provider
- api_mode
- base_url
- api_key
- source
- provider-specific metadata like expiry/refresh info
Why this matters
This resolver is the main reason Hermes can share auth/runtime logic between:
- hermes chat
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks
OpenRouter and custom OpenAI-compatible base URLs
Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. OPENROUTER_API_KEY and OPENAI_API_KEY).
Each provider's API key is scoped to its own base URL:
- OPENROUTER_API_KEY is only sent to openrouter.ai endpoints
- OPENAI_API_KEY is used for custom endpoints and as a fallback
Hermes also distinguishes between:
- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured
That distinction is especially important for:
- local model servers
- non-OpenRouter OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when OPENAI_BASE_URL is not exported in the current shell
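The key-scoping rule can be sketched as a host check on the base URL. This is an illustration under stated assumptions, not the Hermes implementation: the function name api_key_for is hypothetical, and matching on the exact host or its subdomains is one reasonable reading of "openrouter.ai endpoints".

```python
from typing import Mapping, Optional
from urllib.parse import urlparse


def api_key_for(base_url: str, env: Mapping[str, str]) -> Optional[str]:
    """Pick the key that is allowed to be sent to base_url."""
    host = (urlparse(base_url).hostname or "").lower()
    # Match the exact host or a true subdomain, never a lookalike prefix.
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        return env.get("OPENROUTER_API_KEY")
    # Any other host counts as a custom endpoint, so the OpenRouter key
    # is never returned on this path.
    return env.get("OPENAI_API_KEY")
```

Comparing the parsed hostname rather than doing a substring search on the URL keeps a host such as openrouter.ai.evil.example from receiving the OpenRouter key.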
Native Anthropic path
Anthropic is not just "via OpenRouter" anymore.
When provider resolution selects anthropic, Hermes uses:
- api_mode = anthropic_messages
- the native Anthropic Messages API
- agent/anthropic_adapter.py for translation
Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:
- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
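The retry-once-on-401 fallback in the last bullet follows a common pattern, sketched below. AuthError, call_with_one_auth_retry, and the two callables are hypothetical names for illustration; the real logic sits in the agent's retry loop and uses the provider SDK's own exception types.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical stand-in for an HTTP 401 from the provider."""


def call_with_one_auth_retry(
    build_client: Callable[[], object],
    send: Callable[[object], T],
) -> T:
    """Send a request, rebuilding the client once if auth is rejected."""
    client = build_client()
    try:
        return send(client)
    except AuthError:
        # Rebuild exactly once so freshly refreshed credentials are picked
        # up; a second 401 propagates to the caller unchanged.
        client = build_client()
        return send(client)
```

Limiting the rebuild to a single attempt avoids looping forever on credentials that are genuinely invalid.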
OpenAI Codex path
Codex uses a separate Responses API path:
- api_mode = codex_responses
- dedicated credential resolution and auth store support
Auxiliary model routing
Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- skills hub operations
- MCP helper operations
- memory flushes
can use their own provider/model routing rather than the main conversational model.
When an auxiliary task is configured with provider main, Hermes resolves that through the same shared runtime path as normal chat. In practice that means:
- env-driven custom endpoints still work
- custom endpoints saved via hermes model / config.yaml also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
Fallback models
Hermes supports a configured fallback provider chain — a list of (provider, model) entries tried in order when the primary model encounters errors. The legacy single-pair fallback_model dict is still accepted for back-compat (and migrated on first write).
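One way to picture the back-compat handling is a loader that normalizes both forms into a single list. The config keys fallback_providers and fallback_model and the provider/model entry keys appear on this page; the function name load_fallback_chain and the assumption that chain entries are dicts with those two keys are illustrative only.

```python
from typing import Any, Dict, List, Mapping


def load_fallback_chain(config: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Normalize fallback config into an ordered list of provider/model pairs."""
    raw = config.get("fallback_providers")
    if raw is None:
        # Legacy form: a single {"provider": ..., "model": ...} dict.
        legacy = config.get("fallback_model")
        raw = [legacy] if isinstance(legacy, dict) else []
    if not isinstance(raw, list):
        raw = []

    chain: List[Dict[str, str]] = []
    for entry in raw:
        if not isinstance(entry, dict):
            continue
        provider = str(entry.get("provider") or "").strip()
        model = str(entry.get("model") or "").strip()
        # Entries missing either key are dropped rather than half-applied.
        if provider and model:
            chain.append({"provider": provider, "model": model})
    return chain
```

An empty result means fallback is effectively disabled, which matches the rule that both keys must be non-empty.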
How it works internally
- Storage: AIAgent.__init__ stores the fallback_model dict and sets _fallback_activated = False.
- Trigger points: _try_activate_fallback() is called from three places in the main retry loop in run_agent.py:
  - After max retries on invalid API responses (None choices, missing content)
  - On non-retryable client errors (HTTP 401, 403, 404)
  - After max retries on transient errors (HTTP 429, 500, 502, 503)
- Activation flow (_try_activate_fallback):
  - Returns False immediately if already activated or not configured
  - Calls resolve_provider_client() from auxiliary_client.py to build a new client with proper auth
  - Determines api_mode: codex_responses for openai-codex, anthropic_messages for anthropic, chat_completions for everything else
  - Swaps in-place: self.model, self.provider, self.base_url, self.api_mode, self.client, self._client_kwargs
  - For anthropic fallback: builds a native Anthropic client instead of OpenAI-compatible
  - Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
  - Sets _fallback_activated = True, which prevents it from firing again
  - Resets retry count to 0 and continues the loop
- Config flow:
  - CLI: cli.py reads CLI_CONFIG["fallback_model"] → passes to AIAgent(fallback_model=...)
  - Gateway: gateway/run.py._load_fallback_model() reads config.yaml → passes to AIAgent
  - Validation: both provider and model keys must be non-empty, or fallback is disabled
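The activation flow above can be condensed into a sketch. AgentState is a hypothetical, trimmed-down stand-in for AIAgent: it keeps only the one-shot guard, the api_mode mapping, and the retry reset, and omits client construction, base_url handling, and prompt-caching re-evaluation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def api_mode_for(provider: str) -> str:
    """Map a provider id to its api_mode, per the rules listed above."""
    if provider == "openai-codex":
        return "codex_responses"
    if provider == "anthropic":
        return "anthropic_messages"
    return "chat_completions"


@dataclass
class AgentState:
    """Hypothetical subset of the fields that get swapped in place."""
    provider: str
    model: str
    api_mode: str
    fallback_model: Optional[Dict[str, str]] = None
    fallback_activated: bool = False
    retry_count: int = 0

    def try_activate_fallback(self) -> bool:
        fb = self.fallback_model
        # One-shot guard: bail out if already used or not configured.
        if self.fallback_activated or not fb:
            return False
        # Both keys must be non-empty, otherwise fallback is disabled.
        if not fb.get("provider") or not fb.get("model"):
            return False
        self.provider = fb["provider"]
        self.model = fb["model"]
        self.api_mode = api_mode_for(self.provider)
        self.fallback_activated = True
        self.retry_count = 0  # the retry loop starts over on the new model
        return True
```

A caller in the retry loop would invoke try_activate_fallback() at each trigger point and continue the loop only when it returns True.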
What does NOT support fallback
- Subagent delegation (tools/delegate_tool.py): subagents inherit the parent's provider but not the fallback config
- Auxiliary tasks: use their own independent provider auto-detection chain (see Auxiliary model routing above)
Cron jobs do support fallback: run_job() reads fallback_providers (or legacy fallback_model) from config.yaml and passes it to AIAgent(fallback_model=...), matching the gateway's _load_fallback_model() pattern. See Cron Internals.
Test coverage
See tests/test_fallback_model.py for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.