Merge bc15db6231 into 05d8f11085
commit 09ea67c0ac
1 changed file with 133 additions and 0 deletions
@@ -243,6 +243,139 @@ When using Docker as the execution environment (not the methods above, but when
The same syncing happens for SSH and Modal backends — skills and credential files are uploaded via rsync or the Modal mount API before each command.
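For the SSH backend, the effect is roughly what you would get by rsyncing those directories yourself before each run; the sketch below is illustrative only (the remote host and paths are placeholders, not the exact command the agent issues):

```sh
# Illustrative sketch: push local skills and credential files to the remote
# execution host before a command runs (host and paths are placeholders).
rsync -az ~/.hermes/skills/ user@remote-host:~/.hermes/skills/
rsync -az ~/.hermes/credentials/ user@remote-host:~/.hermes/credentials/
```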
## Connecting to local inference servers (vLLM, Ollama, etc.)
When running Hermes in Docker and your inference server (vLLM, Ollama, text-generation-inference, etc.) is also running on the host or in another container, networking requires extra attention.
### Docker Compose (recommended)
Put both services on the same Docker network. This is the most reliable approach:
```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    container_name: vllm
    command: >
      --model Qwen/Qwen2.5-7B-Instruct
      --served-model-name my-model
      --host 0.0.0.0
      --port 8000
    ports:
      - "8000:8000"
    networks:
      - hermes-net
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    ports:
      - "8642:8642"
    volumes:
      - ~/.hermes:/opt/data
    networks:
      - hermes-net

networks:
  hermes-net:
    driver: bridge
```
Then in your `~/.hermes/config.yaml`, use the **container name** as the hostname:
```yaml
model:
  provider: custom
  model: my-model
  base_url: http://vllm:8000/v1
  api_key: "none"
```
:::tip Key points
- Use the **container name** (`vllm`) as the hostname — not `localhost` or `127.0.0.1`, which refer to the Hermes container itself.
- The `model` value must match the `--served-model-name` you passed to vLLM.
- Set `api_key` to any non-empty string (the client needs a value, but vLLM doesn't validate it unless the server is started with `--api-key`).
- Do **not** include a trailing slash in `base_url`.
:::
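With the Compose file above saved as `docker-compose.yml` and the config in place, starting everything is the usual Compose workflow (vLLM can take a while to download and load the model, so it's worth tailing its logs before pointing Hermes at it):

```sh
docker compose up -d
docker compose logs -f vllm   # wait until the model has finished loading
```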
### Standalone Docker run (no Compose)
If your inference server runs directly on the host (not in Docker), use `host.docker.internal` on macOS/Windows, or `--network host` on Linux:
**macOS / Windows:**
```sh
docker run -d \
  --name hermes \
  -v ~/.hermes:/opt/data \
  -p 8642:8642 \
  nousresearch/hermes-agent gateway run
```
```yaml
# config.yaml
model:
  provider: custom
  model: my-model
  base_url: http://host.docker.internal:8000/v1
  api_key: "none"
```
**Linux (host networking):**
```sh
docker run -d \
  --name hermes \
  --network host \
  -v ~/.hermes:/opt/data \
  nousresearch/hermes-agent gateway run
```
```yaml
# config.yaml
model:
  provider: custom
  model: my-model
  base_url: http://127.0.0.1:8000/v1
  api_key: "none"
```
:::warning
With `--network host`, the `-p` flag is ignored — all container ports are directly exposed on the host.
:::
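If you want to keep bridge networking on Linux (and the explicit `-p` mapping), an alternative sketch is to map `host.docker.internal` to the host gateway via `--add-host` (supported in Docker 20.10+); your `base_url` then uses `host.docker.internal` exactly as in the macOS/Windows example:

```sh
docker run -d \
  --name hermes \
  --add-host=host.docker.internal:host-gateway \
  -v ~/.hermes:/opt/data \
  -p 8642:8642 \
  nousresearch/hermes-agent gateway run
```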
### Verifying connectivity
From inside the Hermes container, confirm the inference server is reachable:
```sh
docker exec hermes curl -s http://vllm:8000/v1/models
```
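If you're using the standalone setup rather than Compose, run the same check against whichever hostname your config uses:

```sh
# macOS / Windows (bridge networking)
docker exec hermes curl -s http://host.docker.internal:8000/v1/models

# Linux with --network host
docker exec hermes curl -s http://127.0.0.1:8000/v1/models
```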
You should see a JSON response listing your served model. If this fails, check:

1. Both containers are on the same Docker network (`docker network inspect hermes-net`)
2. The inference server is listening on `0.0.0.0`, not `127.0.0.1`
3. The port number matches
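For reference, a healthy `/v1/models` response looks roughly like this (abridged and illustrative; exact fields vary by server and version):

```json
{
  "object": "list",
  "data": [
    {
      "id": "my-model",
      "object": "model",
      "owned_by": "vllm"
    }
  ]
}
```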
### Ollama
Ollama works the same way. If Ollama runs on the host, use `host.docker.internal:11434` (macOS/Windows) or `127.0.0.1:11434` (Linux with `--network host`). If Ollama runs in its own container on the same Docker network:
```yaml
model:
  provider: custom
  model: llama3
  base_url: http://ollama:11434/v1
  api_key: "none"
```
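If you want Ollama managed by the same Compose file as earlier, a minimal service entry might look like this sketch (the image tag and volume name are assumptions, and it merges into the existing `services:` and top-level sections); the model still needs to be pulled once, e.g. `docker exec ollama ollama pull llama3`:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama-models:/root/.ollama   # persist pulled models across restarts
    networks:
      - hermes-net

volumes:
  ollama-models:
```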
## Troubleshooting
### Container exits immediately