docs: add proxy mode documentation

- Matrix docs: full Proxy Mode section with architecture diagram,
  step-by-step setup (host + Docker), docker-compose.yml/Dockerfile
  examples, configuration reference, and limitations notes
- API Server docs: add Proxy Mode section explaining the api_server
  serves as the backend for gateway proxy mode
- Environment variables reference: add GATEWAY_PROXY_URL and
  GATEWAY_PROXY_KEY entries
Teknium 2026-04-14 10:46:45 -07:00 committed by Teknium
parent 90c98345c9
commit 8bb5973950
3 changed files with 143 additions and 0 deletions


@@ -301,6 +301,8 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI
| `API_SERVER_PORT` | Port for the API server (default: `8642`) |
| `API_SERVER_HOST` | Host/bind address for the API server (default: `127.0.0.1`). Use `0.0.0.0` for network access — requires `API_SERVER_KEY` and a narrow `API_SERVER_CORS_ORIGINS` allowlist. |
| `API_SERVER_MODEL_NAME` | Model name advertised on `/v1/models`. Defaults to the profile name (or `hermes-agent` for the default profile). Useful for multi-user setups where frontends like Open WebUI need distinct model names per connection. |
| `GATEWAY_PROXY_URL` | URL of a remote Hermes API server to forward messages to ([proxy mode](/docs/user-guide/messaging/matrix#proxy-mode-e2ee-on-macos)). When set, the gateway handles platform I/O only — all agent work is delegated to the remote server. Also configurable via `gateway.proxy_url` in `config.yaml`. |
| `GATEWAY_PROXY_KEY` | Bearer token for authenticating with the remote API server in proxy mode. Must match `API_SERVER_KEY` on the remote host. |
| `MESSAGING_CWD` | Working directory for terminal commands in messaging mode (default: `~`) |
| `GATEWAY_ALLOWED_USERS` | Comma-separated user IDs allowed across all platforms |
| `GATEWAY_ALLOW_ALL_USERS` | Allow all users without allowlists (`true`/`false`, default: `false`) |
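Taken together, a minimal proxy-mode `.env` on the forwarding gateway might look like this (the URL and key below are illustrative placeholders for your own deployment):

```bash
# ~/.hermes/.env on the thin gateway (proxy side) — illustrative values
GATEWAY_PROXY_URL=http://192.168.1.100:8642
GATEWAY_PROXY_KEY=your-secret-key-here  # must match API_SERVER_KEY on the remote host
```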


@@ -278,3 +278,9 @@ In Open WebUI, add each as a separate connection. The model dropdown shows `alic
- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
- **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
- **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
## Proxy Mode
The API server also serves as the backend for **gateway proxy mode**. When another Hermes gateway instance is configured with `GATEWAY_PROXY_URL` pointing at this API server, it forwards all messages here instead of running its own agent. This enables split deployments — for example, a Docker container handling Matrix E2EE that relays to a host-side agent.
See [Matrix Proxy Mode](/docs/user-guide/messaging/matrix#proxy-mode-e2ee-on-macos) for the full setup guide.
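As a sketch, the request a proxying gateway sends is an ordinary OpenAI-style chat completion call against this server; the address and token below are placeholders, not real endpoints:

```shell
# Hypothetical host address and bearer token — substitute your own.
curl -sS http://192.168.1.100:8642/v1/chat/completions \
  -H "Authorization: Bearer your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}], "stream": true}'
```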


@@ -439,6 +439,141 @@ security breach). A new access token gets a new device ID with no stale key
history, so other clients trust it immediately.
:::
## Proxy Mode (E2EE on macOS)
Matrix E2EE requires `libolm`, which doesn't compile on macOS ARM64 (Apple Silicon). The `hermes-agent[matrix]` extra is gated to Linux only. If you're on macOS, proxy mode lets you run E2EE in a Docker container on a Linux VM while the actual agent runs natively on macOS with full access to your local files, memory, and skills.
### How It Works
```
macOS (Host):
└─ hermes gateway
├─ api_server adapter ← listens on 0.0.0.0:8642
├─ AIAgent ← single source of truth
├─ Sessions, memory, skills
└─ Local file access (Obsidian, projects, etc.)
Linux VM (Docker):
└─ hermes gateway (proxy mode)
├─ Matrix adapter ← E2EE decryption/encryption
└─ HTTP forward → macOS:8642/v1/chat/completions
(no LLM API keys, no agent, no inference)
```
The Docker container only handles Matrix protocol + E2EE. When a message arrives, it decrypts it and forwards the text to the host via a standard HTTP request. The host runs the agent, calls tools, generates a response, and streams it back. The container encrypts and sends the response to Matrix. All sessions are unified — CLI, Matrix, Telegram, and any other platform share the same memory and conversation history.
### Step 1: Configure the Host (macOS)
Enable the API server so the host accepts incoming requests from the Docker container.
Add to `~/.hermes/.env`:
```bash
API_SERVER_ENABLED=true
API_SERVER_KEY=your-secret-key-here
API_SERVER_HOST=0.0.0.0
```
- `API_SERVER_HOST=0.0.0.0` binds to all interfaces so the Docker container can reach it.
- `API_SERVER_KEY` is required for non-loopback binding. Pick a strong random string.
- The API server runs on port 8642 by default (change with `API_SERVER_PORT` if needed).
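One simple way to generate a strong key (assuming Python 3 is installed on the host) is a 32-byte random hex string:

```shell
# Prints a 64-character hex token suitable for API_SERVER_KEY
python3 -c 'import secrets; print(secrets.token_hex(32))'
```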
Start the gateway:
```bash
hermes gateway
```
You should see the API server start alongside any other platforms you have configured. Verify it's reachable from the VM:
```bash
# From the Linux VM
curl http://<mac-ip>:8642/health
```
### Step 2: Configure the Docker Container (Linux VM)
The container needs Matrix credentials and the proxy URL. It does NOT need LLM API keys.
**`docker-compose.yml`:**
```yaml
services:
hermes-matrix:
build: .
environment:
# Matrix credentials
MATRIX_HOMESERVER: "https://matrix.example.org"
MATRIX_ACCESS_TOKEN: "syt_..."
MATRIX_ALLOWED_USERS: "@you:matrix.example.org"
MATRIX_ENCRYPTION: "true"
MATRIX_DEVICE_ID: "HERMES_BOT"
# Proxy mode — forward to host agent
GATEWAY_PROXY_URL: "http://192.168.1.100:8642"
GATEWAY_PROXY_KEY: "your-secret-key-here"
volumes:
- ./matrix-store:/root/.hermes/platforms/matrix/store
```
**`Dockerfile`:**
```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y libolm-dev && rm -rf /var/lib/apt/lists/*
RUN pip install 'hermes-agent[matrix]'
CMD ["hermes", "gateway"]
```
That's the entire container. No API keys for OpenRouter, Anthropic, or any inference provider.
### Step 3: Start Both
1. Start the host gateway first:
```bash
hermes gateway
```
2. Start the Docker container:
```bash
docker compose up -d
```
3. Send a message in an encrypted Matrix room. The container decrypts it, forwards it to the host, and streams the response back.
### Configuration Reference
Proxy mode is configured on the **container side** (the thin gateway):
| Setting | Description |
|---------|-------------|
| `GATEWAY_PROXY_URL` | URL of the remote Hermes API server (e.g., `http://192.168.1.100:8642`) |
| `GATEWAY_PROXY_KEY` | Bearer token for authentication (must match `API_SERVER_KEY` on the host) |
| `gateway.proxy_url` | Same as `GATEWAY_PROXY_URL` but in `config.yaml` |
The host side needs:
| Setting | Description |
|---------|-------------|
| `API_SERVER_ENABLED` | Set to `true` |
| `API_SERVER_KEY` | Bearer token (shared with the container) |
| `API_SERVER_HOST` | Set to `0.0.0.0` for network access |
| `API_SERVER_PORT` | Port number (default: `8642`) |
### Works for Any Platform
Proxy mode is not limited to Matrix. Any platform adapter can use it — set `GATEWAY_PROXY_URL` on any gateway instance and it will forward to the remote agent instead of running one locally. This is useful for any deployment where the platform adapter needs to run in a different environment from the agent (network isolation, E2EE requirements, resource constraints).
:::tip
Session continuity is maintained via the `X-Hermes-Session-Id` header. The host's API server tracks sessions by this ID, so conversations persist across messages just like they would with a local agent.
:::
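For illustration, two direct API calls sharing one conversation could look like this; the session ID, address, and token are hypothetical placeholders:

```shell
SESSION="matrix-room-abc123"  # hypothetical session identifier
curl -sS http://192.168.1.100:8642/v1/chat/completions \
  -H "Authorization: Bearer your-secret-key-here" \
  -H "X-Hermes-Session-Id: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "remember the number 7"}]}'
# A follow-up request with the same X-Hermes-Session-Id continues the same conversation.
```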
:::note
**Limitations (v1):** Tool progress messages from the remote agent are not relayed back — the user sees the streamed final response only, not individual tool calls. Dangerous command approval prompts are handled on the host side, not relayed to the Matrix user. These can be addressed in future updates.
:::
### Sync issues / bot falls behind
**Cause**: Long-running tool executions can delay the sync loop, or the homeserver is slow.