feat(proxy): local OpenAI-compatible proxy for OAuth providers (#25969)

Adds 'hermes proxy start' — a local HTTP server that lets external apps (OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the user's real OAuth-resolved credentials to each forwarded request, refreshing them automatically; the client can send any bearer (it gets stripped). Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and registry in hermes_cli/proxy/adapters/ are designed for additional OAuth providers to plug in by name without server changes. Commands: hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645] hermes proxy status hermes proxy providers Allowed Portal paths: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Anything else returns 404 with a clear error pointing at the allowed list. aiohttp is gated like gateway/platforms/api_server.py (try-import, clean runtime error if missing). No new core dependency. Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess and verifies the upstream receives the right bearer with the client's header stripped.
2026-05-18 04:41:56 +00:00 · 2026-05-14 15:40:48 -07:00 · 2026-05-14 15:40:48 -07:00 · ccb5aae0d2
commit ccb5aae0d2
parent 34fc94d1f4
11 changed files with 1466 additions and 1 deletions
--- a/website/docs/user-guide/features/subscription-proxy.md
+++ b/website/docs/user-guide/features/subscription-proxy.md
@ -0,0 +1,203 @@
+---
+sidebar_position: 15
+title: "Subscription Proxy"
+description: "Use your Nous Portal subscription (or other OAuth provider) as an OpenAI-compatible endpoint for external apps"
+---
+
+# Subscription Proxy
+
+The subscription proxy is a local HTTP server that lets external apps —
+OpenViking, Karakeep, Open WebUI, anything that speaks OpenAI-compatible
+chat completions — use your Hermes-managed provider subscription as their
+LLM endpoint. The proxy attaches the right credentials (refreshing them
+automatically) so the app never needs a static API key.
+
+This is different from the [API server](./api-server.md):
+
+| | API server | Subscription proxy |
+|---|---|---|
+| What it serves | Your agent (full toolset, memory, skills) | Raw model inference |
+| Use case | "Use Hermes as a chat backend" | "Use my Portal sub from another app" |
+| Auth | Your `API_SERVER_KEY` | Any bearer (proxy attaches the real one) |
+| Tool calls | Yes — the agent runs tools | No — passthrough only |
+
+Use the API server when you want the **agent** as a backend. Use the
+proxy when you just want **the model** through your subscription.
+
+## Quick Start
+
+### 1. Log into your provider (one-time)
+
+```bash
+hermes login nous
+```
+
+This opens your browser for the Nous Portal OAuth flow. Hermes stores
+the refresh token in `~/.hermes/auth.json` — the same place all Hermes
+provider logins live.
+
+### 2. Start the proxy
+
+```bash
+hermes proxy start
+```
+
+```
+Starting Hermes proxy for Nous Portal
+  Listening on:  http://127.0.0.1:8645/v1
+  Forwarding to: (resolved per-request from your subscription)
+  Use any bearer token in the client — the proxy attaches your real credential.
+```
+
+Leave this running in the foreground. Use `tmux`, `nohup`, or a systemd
+unit if you want it to survive logout.
+
+### 3. Point your app at it
+
+Any OpenAI-compatible app config takes the same triple:
+
+```
+Base URL:   http://127.0.0.1:8645/v1
+API key:    anything (e.g. "sk-unused")
+Model:      Hermes-4-70B    # or Hermes-4.3-36B, Hermes-4-405B
+```
+
+The proxy ignores the `Authorization` header from your app and attaches
+your real Portal credential to the upstream request. Refreshes happen
+automatically when the bearer approaches expiry.
+
+## Available providers
+
+```bash
+hermes proxy providers
+```
+
+Currently shipped: `nous` (Nous Portal). More OAuth providers can be
+added by implementing the `UpstreamAdapter` interface in
+`hermes_cli/proxy/adapters/`.
+
+## Check status
+
+```bash
+hermes proxy status
+```
+
+```
+Hermes proxy upstream adapters
+
+  [nous    ] Nous Portal — ready (bearer expires 2026-05-15T06:43:21Z)
+```
+
+If you see `not logged in`, run `hermes login nous`. If you see
+`credentials need attention`, your refresh token was revoked (rare —
+happens if you signed out from the Portal web UI) — just re-run
+`hermes login nous`.
+
+## Allowed paths
+
+The proxy only forwards paths the upstream actually serves. For Nous
+Portal:
+
+| Path | Purpose |
+|------|---------|
+| `/v1/chat/completions` | Chat completions (streaming + non-streaming) |
+| `/v1/completions` | Legacy text completions |
+| `/v1/embeddings` | Embeddings |
+| `/v1/models` | Model list |
+
+Other paths (`/v1/images/generations`, `/v1/audio/speech`, etc.) return
+404 with a clear error pointing at the allowed paths. This keeps stray
+clients from leaking weird requests to the upstream.
+
+## Configuring OpenViking to use Portal
+
+[OpenViking](https://github.com/volcengine/OpenViking) is a context
+database that needs an LLM provider for its VLM (vision/language model
+used to extract memories) and embedding model. With the proxy, you can
+point its `vlm.api_base` at your local proxy:
+
+Edit `~/.openviking/ov.conf`:
+
+```json
+{
+  "vlm": {
+    "provider": "openai",
+    "model": "Hermes-4-70B",
+    "api_base": "http://127.0.0.1:8645/v1",
+    "api_key": "unused-proxy-attaches-real-creds"
+  }
+}
+```
+
+Then start your proxy in a terminal alongside `openviking-server`:
+
+```bash
+# Terminal 1
+hermes proxy start
+
+# Terminal 2
+openviking-server
+```
+
+OpenViking's VLM calls now flow through your Portal subscription. The
+embedding model side still needs its own provider — Portal does serve
+`/v1/embeddings` but the model selection depends on what your tier
+supports; check `portal.nousresearch.com/models`.
+
+## Configuring Karakeep (or any bookmark/summarizer app)
+
+[Karakeep](https://karakeep.app/) takes an OpenAI-compatible API for
+bookmark summarization. In its config:
+
+```bash
+# Karakeep .env
+OPENAI_API_BASE_URL=http://127.0.0.1:8645/v1
+OPENAI_API_KEY=any-non-empty-string
+INFERENCE_TEXT_MODEL=Hermes-4-70B
+```
+
+Same pattern works for Open WebUI, LobeChat, NextChat, or any other
+OpenAI-compatible client.
+
+## Exposing on LAN
+
+By default the proxy binds `127.0.0.1` (localhost only). To let other
+machines on your network use it:
+
+```bash
+hermes proxy start --host 0.0.0.0 --port 8645
+```
+
+⚠ **Be aware:** anyone on your network can now use your Portal
+subscription. The proxy has no auth of its own — it accepts any bearer.
+Use a firewall, VPN, or reverse proxy with proper auth if you expose
+this beyond your trusted network.
+
+## Rate limits
+
+Your Portal tier's RPM/TPM limits apply across the whole proxy. The
+proxy doesn't fan out or pool — it's a single bearer with your full
+subscription quota. Monitor usage at
+[portal.nousresearch.com](https://portal.nousresearch.com).
+
+## Architecture
+
+The proxy is intentionally minimal. Per request:
+
+1. Receive `POST /v1/chat/completions` from your app
+2. Look up the adapter's current credential (refresh if expiring)
+3. Forward the request body verbatim, with `Authorization: Bearer <minted-key>`
+4. Stream the response back unchanged (SSE preserved)
+
+No transformation. No logging of request bodies. No agent loop. The
+proxy is a credential-attaching pass-through.
+
+## Future: more OAuth providers
+
+The adapter system is pluggable. Adding a new provider (e.g.
+HuggingFace, GitHub Copilot's chat endpoint, Anthropic via OAuth)
+requires implementing `UpstreamAdapter` in
+`hermes_cli/proxy/adapters/<provider>.py` and registering it in
+`adapters/__init__.py`. Providers that aren't OpenAI-compatible at the
+protocol level (Anthropic Messages API, for example) would need a
+transformation layer, which is out of scope for the current shape.