Adds 'hermes proxy start' — a local HTTP server that lets external apps (OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the user's real OAuth-resolved credentials to each forwarded request, refreshing them automatically; the client can send any bearer (it gets stripped). Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and registry in hermes_cli/proxy/adapters/ are designed for additional OAuth providers to plug in by name without server changes. Commands: hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645] hermes proxy status hermes proxy providers Allowed Portal paths: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Anything else returns 404 with a clear error pointing at the allowed list. aiohttp is gated like gateway/platforms/api_server.py (try-import, clean runtime error if missing). No new core dependency. Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess and verifies the upstream receives the right bearer with the client's header stripped.
6.1 KiB
| sidebar_position | title | description |
|---|---|---|
| 15 | Subscription Proxy | Use your Nous Portal subscription (or other OAuth provider) as an OpenAI-compatible endpoint for external apps |
Subscription Proxy
The subscription proxy is a local HTTP server that lets external apps — OpenViking, Karakeep, Open WebUI, anything that speaks OpenAI-compatible chat completions — use your Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the right credentials (refreshing them automatically) so the app never needs a static API key.
This is different from the API server:
| API server | Subscription proxy | |
|---|---|---|
| What it serves | Your agent (full toolset, memory, skills) | Raw model inference |
| Use case | "Use Hermes as a chat backend" | "Use my Portal sub from another app" |
| Auth | Your API_SERVER_KEY |
Any bearer (proxy attaches the real one) |
| Tool calls | Yes — the agent runs tools | No — passthrough only |
Use the API server when you want the agent as a backend. Use the proxy when you just want the model through your subscription.
Quick Start
1. Log into your provider (one-time)
hermes login nous
This opens your browser for the Nous Portal OAuth flow. Hermes stores
the refresh token in ~/.hermes/auth.json — the same place all Hermes
provider logins live.
2. Start the proxy
hermes proxy start
Starting Hermes proxy for Nous Portal
Listening on: http://127.0.0.1:8645/v1
Forwarding to: (resolved per-request from your subscription)
Use any bearer token in the client — the proxy attaches your real credential.
Leave this running in the foreground. Use tmux, nohup, or a systemd
unit if you want it to survive logout.
3. Point your app at it
Any OpenAI-compatible app config takes the same triple:
Base URL: http://127.0.0.1:8645/v1
API key: anything (e.g. "sk-unused")
Model: Hermes-4-70B # or Hermes-4.3-36B, Hermes-4-405B
The proxy ignores the Authorization header from your app and attaches
your real Portal credential to the upstream request. Refreshes happen
automatically when the bearer approaches expiry.
Available providers
hermes proxy providers
Currently shipped: nous (Nous Portal). More OAuth providers can be
added by implementing the UpstreamAdapter interface in
hermes_cli/proxy/adapters/.
Check status
hermes proxy status
Hermes proxy upstream adapters
[nous ] Nous Portal — ready (bearer expires 2026-05-15T06:43:21Z)
If you see not logged in, run hermes login nous. If you see
credentials need attention, your refresh token was revoked (rare —
happens if you signed out from the Portal web UI) — just re-run
hermes login nous.
Allowed paths
The proxy only forwards paths the upstream actually serves. For Nous Portal:
| Path | Purpose |
|---|---|
/v1/chat/completions |
Chat completions (streaming + non-streaming) |
/v1/completions |
Legacy text completions |
/v1/embeddings |
Embeddings |
/v1/models |
Model list |
Other paths (/v1/images/generations, /v1/audio/speech, etc.) return
404 with a clear error pointing at the allowed paths. This keeps stray
clients from leaking weird requests to the upstream.
Configuring OpenViking to use Portal
OpenViking is a context
database that needs an LLM provider for its VLM (vision/language model
used to extract memories) and embedding model. With the proxy, you can
point its vlm.api_base at your local proxy:
Edit ~/.openviking/ov.conf:
{
"vlm": {
"provider": "openai",
"model": "Hermes-4-70B",
"api_base": "http://127.0.0.1:8645/v1",
"api_key": "unused-proxy-attaches-real-creds"
}
}
Then start your proxy in a terminal alongside openviking-server:
# Terminal 1
hermes proxy start
# Terminal 2
openviking-server
OpenViking's VLM calls now flow through your Portal subscription. The
embedding model side still needs its own provider — Portal does serve
/v1/embeddings but the model selection depends on what your tier
supports; check portal.nousresearch.com/models.
Configuring Karakeep (or any bookmark/summarizer app)
Karakeep takes an OpenAI-compatible API for bookmark summarization. In its config:
# Karakeep .env
OPENAI_API_BASE_URL=http://127.0.0.1:8645/v1
OPENAI_API_KEY=any-non-empty-string
INFERENCE_TEXT_MODEL=Hermes-4-70B
Same pattern works for Open WebUI, LobeChat, NextChat, or any other OpenAI-compatible client.
Exposing on LAN
By default the proxy binds 127.0.0.1 (localhost only). To let other
machines on your network use it:
hermes proxy start --host 0.0.0.0 --port 8645
⚠ Be aware: anyone on your network can now use your Portal subscription. The proxy has no auth of its own — it accepts any bearer. Use a firewall, VPN, or reverse proxy with proper auth if you expose this beyond your trusted network.
Rate limits
Your Portal tier's RPM/TPM limits apply across the whole proxy. The proxy doesn't fan out or pool — it's a single bearer with your full subscription quota. Monitor usage at portal.nousresearch.com.
Architecture
The proxy is intentionally minimal. Per request:
- Receive
POST /v1/chat/completionsfrom your app - Look up the adapter's current credential (refresh if expiring)
- Forward the request body verbatim, with
Authorization: Bearer <minted-key> - Stream the response back unchanged (SSE preserved)
No transformation. No logging of request bodies. No agent loop. The proxy is a credential-attaching pass-through.
Future: more OAuth providers
The adapter system is pluggable. Adding a new provider (e.g.
HuggingFace, GitHub Copilot's chat endpoint, Anthropic via OAuth)
requires implementing UpstreamAdapter in
hermes_cli/proxy/adapters/<provider>.py and registering it in
adapters/__init__.py. Providers that aren't OpenAI-compatible at the
protocol level (Anthropic Messages API, for example) would need a
transformation layer, which is out of scope for the current shape.