Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.
messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
per-profile .env, not 'hermes config set' (same pattern fixed in
api-server.md last round). Also bumped example ports to 8650+ to dodge the
default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
collision.
developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
platform tree updated (qqbot is a sub-package, not qqbot.py; added
yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
(always active)' was wrong — it's an empty extension point and
_register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
legacy single-pair model — corrected to the canonical list-form
fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
'git submodule update --init' is required if cloning without
--recurse-submodules.
guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
positional (not --schedule), the script-only flag is --no-agent (not
--script-only), and there's no --command flag. Replaced with a real example
that creates the script under ~/.hermes/scripts/ and uses the actual flags.
Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
fails because --region isn't registered on the auth-add argparse spec.
Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
table mention with a webhook-subscription pointer; also fixed the dead link
to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
one quoted string, so cmd.exe gets a single arg instead of the multi-token
command line it needs. Removed the surrounding quotes — argparse nargs='*'
collects each token correctly.
integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
refreshed. Fallback section rewritten to lead with the canonical
fallback_providers list form (was leading with the legacy fallback_model
single dict); supported-providers list extended to include azure-foundry,
alibaba-coding-plan, lmstudio.
index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
integrations/index.md ('19+') and undercounted — bumped to 20+ and added
Weixin/QQ Bot/Yuanbao/Google Chat to the list.
Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
209 lines
8.8 KiB
Markdown
---
sidebar_position: 4
title: "Provider Runtime Resolution"
description: "How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime"
---

# Provider Runtime Resolution

Hermes has a shared provider runtime resolver used across:

- CLI
- gateway
- cron jobs
- ACP
- auxiliary model calls

Primary implementation:

- `hermes_cli/runtime_provider.py`: credential resolution, `_resolve_custom_runtime()`
- `hermes_cli/auth.py`: provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py`: shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py`: auxiliary model routing
- `providers/`: ABC + registry entry points (`ProviderProfile`, `register_provider`, `get_provider_profile`, `list_providers`)
- `plugins/model-providers/<name>/`: per-provider plugins (bundled) that declare `api_mode`, `base_url`, `env_vars`, and `fallback_models`, and register themselves into the registry on first access. User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override bundled ones of the same name.

`get_provider_profile()` in `providers/` returns a `ProviderProfile` for a given provider id. `runtime_provider.py` calls this at resolution time to get the canonical `base_url`, `env_vars` priority list, `api_mode`, and `fallback_models`, so that data does not have to be duplicated across multiple files. Adding a new plugin under `plugins/model-providers/<your-provider>/` (or `$HERMES_HOME/plugins/model-providers/<your-provider>/`) that calls `register_provider()` is enough for `runtime_provider.py` to pick it up; no branch is needed in the resolver itself.

If you are adding a new first-class inference provider, read [Adding Providers](./adding-providers.md) and the [Model Provider Plugin guide](./model-provider-plugin.md) alongside this page.

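
The registry flow described above (plugins register a profile, the resolver looks it up by id, and a later registration of the same name wins) can be sketched with a toy registry. This is a minimal illustration only: the field layout of `ProviderProfile`, the function signatures, and the `example-provider` name and URLs are assumptions, not the actual definitions in `providers/`.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the real ProviderProfile; the actual fields
# and signatures in `providers/` may differ.
@dataclass(frozen=True)
class ProviderProfile:
    name: str
    api_mode: str
    base_url: str
    env_vars: tuple[str, ...] = ()
    fallback_models: tuple[str, ...] = ()


_REGISTRY: dict[str, ProviderProfile] = {}


def register_provider(profile: ProviderProfile) -> None:
    """Store a profile; a later registration with the same name replaces it."""
    _REGISTRY[profile.name] = profile


def get_provider_profile(name: str) -> ProviderProfile:
    """Look up a profile by provider id, raising KeyError if unknown."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown provider: {name!r}") from None


def list_providers() -> list[str]:
    """Return all registered provider ids in sorted order."""
    return sorted(_REGISTRY)


# A bundled plugin registers first (placeholder name and URL).
register_provider(ProviderProfile(
    name="example-provider",
    api_mode="chat_completions",
    base_url="https://api.example.invalid/v1",
    env_vars=("EXAMPLE_API_KEY",),
))

# A user plugin with the same name registers later and overrides it.
register_provider(ProviderProfile(
    name="example-provider",
    api_mode="chat_completions",
    base_url="http://localhost:8000/v1",
    env_vars=("EXAMPLE_API_KEY",),
))

profile = get_provider_profile("example-provider")
print(profile.base_url)
```

Because the resolver only ever asks the registry for a profile, nothing in this sketch needs a per-provider branch, which is the property the page describes.
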
## Resolution precedence

At a high level, provider resolution uses:

1. explicit CLI/runtime request
2. `config.yaml` model/provider config
3. environment variables
4. provider-specific defaults or auto resolution

That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`.

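
The four-step ordering above can be sketched as a small function. This is an illustration under stated assumptions, not the resolver in `hermes_cli/runtime_provider.py`: the `model.provider` config layout, the `EXAMPLE_PROVIDER` environment variable, and the `source` labels are hypothetical.

```python
from collections.abc import Mapping
from typing import Optional


def resolve_provider_name(
    requested: Optional[str],
    config: Mapping[str, object],
    env: Mapping[str, str],
    default: str = "auto",
) -> tuple[str, str]:
    """Return (provider, source) following the precedence list above."""
    # 1. An explicit CLI/runtime request always wins.
    if requested:
        return requested, "explicit"

    # 2. The saved config.yaml choice (hypothetical layout: model.provider).
    model_cfg = config.get("model")
    if isinstance(model_cfg, Mapping) and model_cfg.get("provider"):
        return str(model_cfg["provider"]), "config"

    # 3. Environment variables (EXAMPLE_PROVIDER is a made-up name).
    if env.get("EXAMPLE_PROVIDER"):
        return env["EXAMPLE_PROVIDER"], "env"

    # 4. Provider-specific defaults or auto resolution.
    return default, "default"


saved = {"model": {"provider": "anthropic"}}
stale_env = {"EXAMPLE_PROVIDER": "openrouter"}
print(resolve_provider_name(None, saved, stale_env))
```

In the usage line, the saved config choice is returned even though the environment names a different provider, which is the stale-export case the ordering is meant to guard against.
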
## Providers

Current provider families include (see `plugins/model-providers/` for the complete bundled set):

- AI Gateway (Vercel)
- OpenRouter
- Nous Portal
- OpenAI Codex
- Copilot / Copilot ACP
- Anthropic (native)
- Google / Gemini (`gemini`, `google-gemini-cli`)
- Alibaba / DashScope (`alibaba`, `alibaba-coding-plan`)
- DeepSeek
- Z.AI
- Kimi / Moonshot (`kimi-coding`, `kimi-coding-cn`)
- MiniMax (`minimax`, `minimax-cn`, `minimax-oauth`)
- Kilo Code
- Hugging Face
- OpenCode Zen / OpenCode Go
- AWS Bedrock
- Azure Foundry
- NVIDIA NIM
- xAI (Grok)
- Arcee
- GMI Cloud
- StepFun
- Qwen OAuth
- Xiaomi
- Ollama Cloud
- LM Studio
- Tencent TokenHub
- Custom (`provider: custom`): first-class provider for any OpenAI-compatible endpoint
- Named custom providers (`custom_providers` list in config.yaml)

## Output of runtime resolution

The runtime resolver returns data such as:

- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata like expiry/refresh info

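
To make the shape of that result concrete, here is a hedged sketch of a container with the fields listed above. Whether the real resolver returns a dict or an object is not stated on this page; the `ResolvedRuntime` class, the field types, and the sample values (including the `source` label and the `expires_at` key) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResolvedRuntime:
    """Hypothetical container mirroring the fields listed above."""
    provider: str
    api_mode: str
    base_url: str
    api_key: Optional[str]
    source: str
    # Provider-specific extras such as expiry/refresh info.
    metadata: dict[str, Any] = field(default_factory=dict)


# Placeholder values for a locally hosted OpenAI-compatible endpoint.
resolved = ResolvedRuntime(
    provider="custom",
    api_mode="chat_completions",
    base_url="http://localhost:8000/v1",
    api_key=None,
    source="config",
    metadata={"expires_at": None},
)
print(resolved.provider, resolved.api_mode, resolved.source)
```
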
## Why this matters

This resolver is the main reason Hermes can share auth/runtime logic between:

- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks

## AI Gateway

Set `AI_GATEWAY_API_KEY` in `~/.hermes/.env` and run with `--provider ai-gateway`. Hermes fetches available models from the gateway's `/models` endpoint, filtering to language models with tool-use support.

## OpenRouter, AI Gateway, and custom OpenAI-compatible base URLs

Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. `OPENROUTER_API_KEY`, `AI_GATEWAY_API_KEY`, and `OPENAI_API_KEY`).

Each provider's API key is scoped to its own base URL:

- `OPENROUTER_API_KEY` is only sent to `openrouter.ai` endpoints
- `AI_GATEWAY_API_KEY` is only sent to `ai-gateway.vercel.sh` endpoints
- `OPENAI_API_KEY` is used for custom endpoints and as a fallback

Hermes also distinguishes between:

- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured

That distinction is especially important for:

- local model servers
- non-OpenRouter/non-AI Gateway OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell

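
The key-scoping rules listed earlier in this section can be sketched as a host check on the base URL. This is a simplified illustration, not the logic in `hermes_cli/runtime_provider.py`: the `select_api_key` and `_host_matches` helpers are hypothetical, and matching subdomains of each provider host is an assumption.

```python
from collections.abc import Mapping
from typing import Optional
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def select_api_key(base_url: str, env: Mapping[str, str]) -> Optional[str]:
    """Pick the key that is allowed to be sent to base_url."""
    host = (urlparse(base_url).hostname or "").lower()
    if _host_matches(host, "openrouter.ai"):
        return env.get("OPENROUTER_API_KEY")
    if _host_matches(host, "ai-gateway.vercel.sh"):
        return env.get("AI_GATEWAY_API_KEY")
    # Custom endpoints (and the general fallback) use OPENAI_API_KEY,
    # so provider-specific keys never reach an unrelated host.
    return env.get("OPENAI_API_KEY")


env = {
    "OPENROUTER_API_KEY": "or-key",
    "AI_GATEWAY_API_KEY": "gw-key",
    "OPENAI_API_KEY": "oa-key",
}
print(select_api_key("https://openrouter.ai/api/v1", env))
print(select_api_key("http://localhost:11434/v1", env))
```

Comparing the parsed hostname rather than searching the URL string means a look-alike host such as `openrouter.ai.example.invalid` does not receive the OpenRouter key.
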
## Native Anthropic path

Anthropic is no longer available only via OpenRouter; Hermes also has a native path.

When provider resolution selects `anthropic`, Hermes uses:

- `api_mode = anthropic_messages`
- the native Anthropic Messages API
- `agent/anthropic_adapter.py` for translation

Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:

- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path

## OpenAI Codex path

Codex uses a separate Responses API path:

- `api_mode = codex_responses`
- dedicated credential resolution and auth store support

## Auxiliary model routing

Auxiliary tasks such as:

- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes

can use their own provider/model routing rather than the main conversational model.

When an auxiliary task is configured with provider `main`, Hermes resolves that through the same shared runtime path as normal chat. In practice that means:

- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback

## Fallback models

Hermes supports a configured fallback provider chain: a list of `(provider, model)` entries tried in order when the primary model encounters errors. The legacy single-pair `fallback_model` dict is still accepted for back-compat (and migrated on first write).

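
As a rough sketch of how the list form and the legacy dict can be treated uniformly, the helper below normalizes either shape into a list of entries. It is not the Hermes migration code: the `normalize_fallback_chain` name is hypothetical, the `fallback_providers` key is taken from the commit notes for this page, the model names are placeholders, and dropping entries with an empty `provider` or `model` is an assumption.

```python
from typing import Any


def normalize_fallback_chain(config: dict[str, Any]) -> list[dict[str, str]]:
    """Return fallback entries in list form, accepting the legacy dict."""
    entries = config.get("fallback_providers")
    if entries is None:
        # Legacy single-pair form: wrap the dict in a one-element list.
        legacy = config.get("fallback_model")
        entries = [legacy] if isinstance(legacy, dict) else []
    if not isinstance(entries, list):
        entries = []

    chain: list[dict[str, str]] = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        provider = str(entry.get("provider") or "").strip()
        model = str(entry.get("model") or "").strip()
        # Assumption: an entry needs both keys to be usable.
        if provider and model:
            chain.append({"provider": provider, "model": model})
    return chain


legacy_cfg = {"fallback_model": {"provider": "openrouter", "model": "example-model-a"}}
list_cfg = {
    "fallback_providers": [
        {"provider": "anthropic", "model": "example-model-b"},
        {"provider": "openrouter", "model": ""},  # dropped: empty model
    ]
}
print(normalize_fallback_chain(legacy_cfg))
print(normalize_fallback_chain(list_cfg))
```
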
### How it works internally

1. **Storage**: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.

2. **Trigger points**: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:
   - After max retries on invalid API responses (None choices, missing content)
   - On non-retryable client errors (HTTP 401, 403, 404)
   - After max retries on transient errors (HTTP 429, 500, 502, 503)

3. **Activation flow** (`_try_activate_fallback`):
   - Returns `False` immediately if already activated or not configured
   - Calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with proper auth
   - Determines `api_mode`: `codex_responses` for openai-codex, `anthropic_messages` for anthropic, `chat_completions` for everything else
   - Swaps in-place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
   - For anthropic fallback: builds a native Anthropic client instead of OpenAI-compatible
   - Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
   - Sets `_fallback_activated = True`, which prevents it from firing again
   - Resets retry count to 0 and continues the loop

4. **Config flow**:
   - CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` → passes to `AIAgent(fallback_model=...)`
   - Gateway: `gateway/run.py._load_fallback_model()` reads `config.yaml` → passes to `AIAgent`
   - Validation: both `provider` and `model` keys must be non-empty, or fallback is disabled

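
The trigger and activation steps above can be condensed into a small sketch. It models only the state changes (the one-shot guard, the `api_mode` choice, the in-place swap, and the retry reset) and deliberately omits client construction and prompt caching. The `AgentState` class, `should_try_fallback()`, and `api_mode_for()` are hypothetical names, not the code in `run_agent.py`, and the model names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

NON_RETRYABLE = {401, 403, 404}   # fall back immediately
TRANSIENT = {429, 500, 502, 503}  # fall back only after max retries


def should_try_fallback(status: int, attempt: int, max_retries: int) -> bool:
    """Decide whether an HTTP error should trigger the fallback."""
    if status in NON_RETRYABLE:
        return True
    if status in TRANSIENT:
        return attempt >= max_retries
    return False


def api_mode_for(provider: str) -> str:
    """Map a provider id to its API mode as described above."""
    if provider == "openai-codex":
        return "codex_responses"
    if provider == "anthropic":
        return "anthropic_messages"
    return "chat_completions"


@dataclass
class AgentState:
    provider: str
    model: str
    api_mode: str
    fallback: Optional[dict] = None
    fallback_activated: bool = False
    retry_count: int = 0

    def try_activate_fallback(self) -> bool:
        # One-shot: refuse if already activated or not configured.
        if self.fallback_activated or not self.fallback:
            return False
        provider = self.fallback.get("provider")
        model = self.fallback.get("model")
        if not provider or not model:
            return False
        # Swap in place, then reset the retry counter.
        self.provider, self.model = provider, model
        self.api_mode = api_mode_for(provider)
        self.fallback_activated = True
        self.retry_count = 0
        return True


state = AgentState(
    provider="openrouter",
    model="example-primary-model",
    api_mode="chat_completions",
    fallback={"provider": "anthropic", "model": "example-fallback-model"},
    retry_count=3,
)
if should_try_fallback(status=503, attempt=3, max_retries=3):
    print(state.try_activate_fallback(), state.provider, state.api_mode)
print(state.try_activate_fallback())
```

The second call returns `False` because the guard flag is already set, which mirrors the one-shot behavior described in the activation flow.
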
### What does NOT support fallback

- **Subagent delegation** (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- **Auxiliary tasks**: use their own independent provider auto-detection chain (see Auxiliary model routing above)

Cron jobs **do** support fallback: `run_job()` reads `fallback_providers` (or legacy `fallback_model`) from `config.yaml` and passes it to `AIAgent(fallback_model=...)`, matching the gateway's `_load_fallback_model()` pattern. See [Cron Internals](./cron-internals.md).

### Test coverage

See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.

## Related docs

- [Agent Loop Internals](./agent-loop.md)
- [ACP Internals](./acp-internals.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)