---
sidebar_position: 2
title: "Configuration"
description: "Configure Hermes Agent — config.yaml, providers, models, API keys, and more"
---

# Configuration

All settings are stored in the `~/.hermes/` directory for easy access.

## Directory Structure

```text
~/.hermes/
├── config.yaml   # Settings (model, terminal, TTS, compression, etc.)
├── .env          # API keys and secrets
├── auth.json     # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md       # Primary agent identity (slot #1 in system prompt)
├── memories/     # Persistent memory (MEMORY.md, USER.md)
├── skills/       # Agent-created skills (managed via skill_manage tool)
├── cron/         # Scheduled jobs
├── sessions/     # Gateway sessions
└── logs/         # Logs (errors.log, gateway.log — secrets auto-redacted)
```

## Managing Configuration

```bash
hermes config              # View current configuration
hermes config edit         # Open config.yaml in your editor
hermes config set KEY VAL  # Set a specific value
hermes config check        # Check for missing options (after updates)
hermes config migrate      # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...  # Saves to .env
```

:::tip
The `hermes config set` command automatically routes values to the right file — API keys are saved to `.env`, everything else to `config.yaml`.
:::

## Configuration Precedence

Settings are resolved in this order (highest priority first):

1. **CLI arguments** — e.g., `hermes chat --model anthropic/claude-sonnet-4` (per-invocation override)
2. **`~/.hermes/config.yaml`** — the primary config file for all non-secret settings
3. **`~/.hermes/.env`** — fallback for env vars; **required** for secrets (API keys, tokens, passwords)
4. **Built-in defaults** — hardcoded safe defaults when nothing else is set

:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
:::

## Environment Variable Substitution

You can reference environment variables in `config.yaml` using `${VAR_NAME}` syntax:

```yaml
auxiliary:
  vision:
    api_key: ${GOOGLE_API_KEY}
    base_url: ${CUSTOM_VISION_URL}
  delegation:
    api_key: ${DELEGATION_KEY}
```

Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.

For AI provider setup (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, etc.), see [AI Providers](/docs/integrations/providers).

### Provider Timeouts

You can set `providers.<name>.request_timeout_seconds` for a provider-wide request timeout, plus `providers.<name>.models.<model>.timeout_seconds` for a model-specific override. These apply to the primary turn client on every transport (OpenAI-wire, native Anthropic, Anthropic-compatible), to the fallback chain, to client rebuilds after credential rotation, and (for OpenAI-wire) to the per-request timeout kwarg, so the configured value wins over the legacy `HERMES_API_TIMEOUT` env var.

You can also set `providers.<name>.stale_timeout_seconds` for the non-streaming stale-call detector, plus `providers.<name>.models.<model>.stale_timeout_seconds` for a model-specific override. This wins over the legacy `HERMES_API_CALL_STALE_TIMEOUT` env var.
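For example, in `config.yaml` (a sketch; the provider name and model alias are placeholders for your own entries):

```yaml
providers:
  openrouter:                       # placeholder: any configured provider entry
    request_timeout_seconds: 900    # provider-wide request timeout
    stale_timeout_seconds: 240      # non-streaming stale-call detector
    models:
      slow-model:                   # placeholder: a model entry under this provider
        timeout_seconds: 1800       # model-specific request timeout override
        stale_timeout_seconds: 600  # model-specific stale detector override
```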
Leaving these unset keeps the legacy defaults (`HERMES_API_TIMEOUT=1800` seconds, `HERMES_API_CALL_STALE_TIMEOUT=300` seconds; native Anthropic defaults to 900 seconds). These settings are not currently wired up for AWS Bedrock (both the `bedrock_converse` and AnthropicBedrock SDK paths use boto3, which has its own timeout configuration). See the commented example in [`cli-config.yaml.example`](https://github.com/NousResearch/hermes-agent/blob/main/cli-config.yaml.example).

## Terminal Backend Configuration

Hermes supports six terminal backends. Each determines where the agent's shell commands actually execute: your local machine, a Docker container, a remote server via SSH, a Modal cloud sandbox, a Daytona workspace, or a Singularity/Apptainer container.

```yaml
terminal:
  backend: local        # local | docker | ssh | modal | daytona | singularity
  cwd: "."              # Working directory ("." = current dir for local, "/root" for containers)
  timeout: 180          # Per-command timeout in seconds
  env_passthrough: []   # Env var names to forward to sandboxed execution (terminal + execute_code)
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"  # Container image for Singularity backend
  modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"                 # Container image for Modal backend
  daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"               # Container image for Daytona backend
```

For cloud sandboxes such as Modal and Daytona, `container_persistent: true` means Hermes will try to preserve filesystem state across sandbox recreation. It does not promise that the same live sandbox, PID space, or background processes will still be running later.

### Backend Overview

| Backend | Where commands run | Isolation | Best for |
|---------|-------------------|-----------|----------|
| **local** | Your machine directly | None | Development, personal use |
| **docker** | Docker container | Full (namespaces, cap-drop) | Safe sandboxing, CI/CD |
| **ssh** | Remote server via SSH | Network boundary | Remote dev, powerful hardware |
| **modal** | Modal cloud sandbox | Full (cloud VM) | Ephemeral cloud compute, evals |
| **daytona** | Daytona workspace | Full (cloud container) | Managed cloud dev environments |
| **singularity** | Singularity/Apptainer container | Namespaces (--containall) | HPC clusters, shared machines |

### Local Backend

The default. Commands run directly on your machine with no isolation. No special setup required.

```yaml
terminal:
  backend: local
```

:::warning
The agent has the same filesystem access as your user account. Use `hermes tools` to disable tools you don't want, or switch to Docker for sandboxing.
:::

### Docker Backend

Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).

```yaml
terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false  # Mount launch dir into /workspace
  docker_forward_env:                   # Env vars to forward into container
    - "GITHUB_TOKEN"
  docker_volumes:                       # Host directory mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"        # :ro for read-only

  # Resource limits
  container_cpu: 1            # CPU cores (0 = unlimited)
  container_memory: 5120      # MB (0 = unlimited)
  container_disk: 51200       # MB (requires overlay2 on XFS+pquota)
  container_persistent: true  # Persist /workspace and /root across sessions
```

**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, the Docker Desktop app bundle).

**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call. The container survives across sessions, `/new`, `/reset`, and `delegate_task` subagents for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).

Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.

**Security hardening:**

- `--cap-drop ALL` with only `DAC_OVERRIDE`, `CHOWN`, `FOWNER` added back
- `--security-opt no-new-privileges`
- `--pids-limit 256`
- Size-limited tmpfs for `/tmp` (512MB), `/var/tmp` (256MB), `/run` (64MB)
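Put together, the launch is roughly equivalent to the following (a sketch assembled from the flags above, not the exact invocation Hermes uses):

```bash
docker run -d \
  --cap-drop ALL --cap-add DAC_OVERRIDE --cap-add CHOWN --cap-add FOWNER \
  --security-opt no-new-privileges \
  --pids-limit 256 \
  --tmpfs /tmp:size=512m --tmpfs /var/tmp:size=256m --tmpfs /run:size=64m \
  nikolaik/python-nodejs:python3.11-nodejs20 sleep 2h
```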
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables`, which are merged automatically.

### SSH Backend

Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.

```yaml
terminal:
  backend: ssh
  persistent_shell: true  # Keep a long-lived bash session (default: true)
```

**Required environment variables:**

```bash
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=ubuntu
```

**Optional:**

| Variable | Default | Description |
|----------|---------|-------------|
| `TERMINAL_SSH_PORT` | `22` | SSH port |
| `TERMINAL_SSH_KEY` | (system default) | Path to SSH private key |
| `TERMINAL_SSH_PERSISTENT` | `true` | Enable persistent shell |

**How it works:** Connects at init time with `BatchMode=yes` and `StrictHostKeyChecking=accept-new`. The persistent shell keeps a single `bash -l` process alive on the remote host, communicating via temporary files. Commands that need `stdin_data` or `sudo` automatically fall back to one-shot mode.
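A one-shot command in this setup looks roughly like the following (a sketch built from the options above; the ControlMaster settings and control path are assumptions based on the 5-minute keepalive):

```bash
ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new \
    -o ControlMaster=auto -o ControlPersist=5m \
    -o ControlPath=~/.ssh/hermes-%r@%h:%p \
    -p "${TERMINAL_SSH_PORT:-22}" "$TERMINAL_SSH_USER@$TERMINAL_SSH_HOST" 'uname -a'
```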
### Modal Backend

Runs commands in a [Modal](https://modal.com) cloud sandbox. Each task gets an isolated VM with configurable CPU, memory, and disk. The filesystem can be snapshotted and restored across sessions.

```yaml
terminal:
  backend: modal
  container_cpu: 1            # CPU cores
  container_memory: 5120      # MB (5GB)
  container_disk: 51200      # MB (50GB)
  container_persistent: true  # Snapshot/restore filesystem
```

**Required:** Either `MODAL_TOKEN_ID` + `MODAL_TOKEN_SECRET` environment variables, or a `~/.modal.toml` config file.

**Persistence:** When enabled, the sandbox filesystem is snapshotted on cleanup and restored on the next session. Snapshots are tracked in `~/.hermes/modal_snapshots.json`. This preserves filesystem state, not live processes, PID space, or background jobs.

**Credential files:** Automatically mounted from `~/.hermes/` (OAuth tokens, etc.) and synced before each command.

### Daytona Backend

Runs commands in a [Daytona](https://daytona.io) managed workspace. Supports stop/resume for persistence.

```yaml
terminal:
  backend: daytona
  container_cpu: 1            # CPU cores
  container_memory: 5120      # MB → converted to GiB
  container_disk: 10240       # MB → converted to GiB (max 10 GiB)
  container_persistent: true  # Stop/resume instead of delete
```

**Required:** `DAYTONA_API_KEY` environment variable.

**Persistence:** When enabled, sandboxes are stopped (not deleted) on cleanup and resumed on the next session. Sandbox names follow the pattern `hermes-{task_id}`.

**Disk limit:** Daytona enforces a 10 GiB maximum. Requests above this are capped with a warning.

### Singularity/Apptainer Backend

Runs commands in a [Singularity/Apptainer](https://apptainer.org) container. Designed for HPC clusters and shared machines where Docker isn't available.

```yaml
terminal:
  backend: singularity
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
  container_cpu: 1            # CPU cores
  container_memory: 5120      # MB
  container_persistent: true  # Writable overlay persists across sessions
```

**Requirements:** `apptainer` or `singularity` binary in `$PATH`.

**Image handling:** Docker URLs (`docker://...`) are automatically converted to SIF files and cached. Existing `.sif` files are used directly.

**Scratch directory:** Resolved in order: `TERMINAL_SCRATCH_DIR` → `TERMINAL_SANDBOX_DIR/singularity` → `/scratch/$USER/hermes-agent` (HPC convention) → `~/.hermes/sandboxes/singularity`.

**Isolation:** Uses `--containall --no-home` for full namespace isolation without mounting the host home directory.

### Common Terminal Backend Issues

If terminal commands fail immediately or the terminal tool is reported as disabled:

- **Local** — No special requirements. The safest default when getting started.
- **Docker** — Run `docker version` to verify Docker is working. If it fails, fix Docker or `hermes config set terminal.backend local`.
- **SSH** — Both `TERMINAL_SSH_HOST` and `TERMINAL_SSH_USER` must be set. Hermes logs a clear error if either is missing.
- **Modal** — Needs `MODAL_TOKEN_ID` env var or `~/.modal.toml`. Run `hermes doctor` to check.
- **Daytona** — Needs `DAYTONA_API_KEY`. The Daytona SDK handles server URL configuration.
- **Singularity** — Needs `apptainer` or `singularity` in `$PATH`. Common on HPC clusters.

When in doubt, set `terminal.backend` back to `local` and verify that commands run there first.

### Docker Volume Mounts

When using the Docker backend, `docker_volumes` lets you share host directories with the container. Each entry uses standard Docker `-v` syntax: `host_path:container_path[:options]`.

```yaml
terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"     # Read-write (default)
    - "/home/user/datasets:/data:ro"                # Read-only
    - "/home/user/.hermes/cache/documents:/output"  # Gateway-visible exports
```

This is useful for:

- **Providing files** to the agent (datasets, configs, reference code)
- **Receiving files** from the agent (generated code, reports, exports)
- **Shared workspaces** where both you and the agent access the same files

If you use a messaging gateway and want the agent to send generated files via `MEDIA:/...`, prefer a dedicated host-visible export mount such as `/home/user/.hermes/cache/documents:/output`:

- Write files inside Docker to `/output/...`
- Emit the **host path** in `MEDIA:`, for example: `MEDIA:/home/user/.hermes/cache/documents/report.txt`
- Do **not** emit `/workspace/...` or `/output/...` unless that exact path also exists for the gateway process on the host
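For example, with the `/output` mount above (paths are illustrative):

```bash
# Inside the container: write the artifact to the mounted export directory
cp /workspace/report.txt /output/report.txt

# In the reply, reference the host-side path the gateway can see:
#   MEDIA:/home/user/.hermes/cache/documents/report.txt
```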
:::warning
YAML duplicate keys silently override earlier ones. If you already have a `docker_volumes:` block, merge new mounts into the same list instead of adding another `docker_volumes:` key later in the file.
:::

Volumes can also be set via environment variable: `TERMINAL_DOCKER_VOLUMES='["/host:/container"]'` (JSON array).

### Docker Credential Forwarding

By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to `terminal.docker_forward_env`.

```yaml
terminal:
  backend: docker
  docker_forward_env:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
```

Hermes resolves each listed variable from your current shell first, then falls back to `~/.hermes/.env` if it was saved with `hermes config set`.

:::warning
Anything listed in `docker_forward_env` becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.
:::

### Optional: Mount the Launch Directory into `/workspace`

Docker sandboxes stay isolated by default. Hermes does **not** pass your current host working directory into the container unless you explicitly opt in.

Enable it in `config.yaml`:

```yaml
terminal:
  backend: docker
  docker_mount_cwd_to_workspace: true
```

When enabled:

- if you launch Hermes from `~/projects/my-app`, that host directory is bind-mounted to `/workspace`
- the Docker backend starts in `/workspace`
- file tools and terminal commands both see the same mounted project

When disabled, `/workspace` stays sandbox-owned unless you explicitly mount something via `docker_volumes`.

Security tradeoff:

- `false` preserves the sandbox boundary
- `true` gives the sandbox direct access to the directory you launched Hermes from

Use the opt-in only when you intentionally want the container to work on live host files.

### Persistent Shell

By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When **persistent shell** is enabled, a single long-lived bash process is kept alive across `execute()` calls so that state survives between commands.

This is most useful for the **SSH backend**, where it also eliminates per-command connection overhead. Persistent shell is **enabled by default for SSH** and disabled for the local backend.

```yaml
terminal:
  persistent_shell: true  # default — enables persistent shell for SSH
```

To disable:

```bash
hermes config set terminal.persistent_shell false
```

**What persists across commands:**

- Working directory (`cd /tmp` sticks for the next command)
- Exported environment variables (`export FOO=bar`)
- Shell variables (`MY_VAR=hello`)
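In practice (an illustrative pair of tool calls run back to back in the same session):

```bash
# First tool call: change directory and set state
cd /tmp && export FOO=bar

# Second tool call: the persistent shell still has both
pwd        # /tmp
echo $FOO  # bar
```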
**Precedence:**

| Level | Variable | Default |
|-------|----------|---------|
| Config | `terminal.persistent_shell` | `true` |
| SSH override | `TERMINAL_SSH_PERSISTENT` | follows config |
| Local override | `TERMINAL_LOCAL_PERSISTENT` | `false` |

Per-backend environment variables take the highest precedence. If you want persistent shell on the local backend too:

```bash
export TERMINAL_LOCAL_PERSISTENT=true
```

:::note
Commands that require `stdin_data` or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
:::

See [Code Execution](features/code-execution.md) and the [Terminal section of the README](features/tools.md) for details on each backend.

## Skill Settings

Skills can declare their own configuration settings via their SKILL.md frontmatter. These are non-secret values (paths, preferences, domain settings) stored under the `skills.config` namespace in `config.yaml`.

```yaml
skills:
  config:
    myplugin:
      path: ~/myplugin-data  # Example — each skill defines its own keys
```

**How skill settings work:**

- `hermes config migrate` scans all enabled skills, finds unconfigured settings, and offers to prompt you
- `hermes config show` displays all skill settings under "Skill Settings" with the skill they belong to
- When a skill loads, its resolved config values are injected into the skill context automatically

**Setting values manually:**

```bash
hermes config set skills.config.myplugin.path ~/myplugin-data
```

For details on declaring config settings in your own skills, see [Creating Skills — Config Settings](/docs/developer-guide/creating-skills#config-settings-configyaml).

## Memory Configuration

```yaml
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200  # ~800 tokens
  user_char_limit: 1375    # ~500 tokens
```

## File Read Safety

Controls how much content a single `read_file` call can return. Reads that exceed the limit are rejected with an error telling the agent to use `offset` and `limit` for a smaller range. This prevents a single read of a minified JS bundle or large data file from flooding the context window.

```yaml
file_read_max_chars: 100000  # default — ~25-35K tokens
```

Raise it if you're on a model with a large context window and frequently read big files. Lower it for small-context models to keep reads efficient:

```yaml
# Large context model (200K+)
file_read_max_chars: 200000

# Small local model (16K context)
file_read_max_chars: 30000
```

The agent also deduplicates file reads automatically — if the same file region is read twice and the file hasn't changed, a lightweight stub is returned instead of re-sending the content. This resets on context compression so the agent can re-read files after their content is summarized away.

## Tool Output Truncation Limits

Three related caps control how much raw output a tool can return before Hermes truncates it:

```yaml
tool_output:
  max_bytes: 50000       # terminal output cap (chars)
  max_lines: 2000        # read_file pagination cap
  max_line_length: 2000  # per-line cap in read_file's line-numbered view
```

- **`max_bytes`** — When a `terminal` command produces more than this many characters of combined stdout/stderr, Hermes keeps the first 40% and last 60% and inserts an `[OUTPUT TRUNCATED]` notice between them. Default `50000` (≈12-15K tokens across typical tokenisers).
- **`max_lines`** — Upper bound on the `limit` parameter of a single `read_file` call. Requests above this are clamped so a single read can't flood the context window. Default `2000`.
- **`max_line_length`** — Per-line cap applied when `read_file` emits the line-numbered view. Lines longer than this are truncated to this many chars followed by `... [truncated]`. Default `2000`.
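The `max_bytes` rule is easy to work out by hand: with the default of `50000`, an oversized output keeps roughly its first 20,000 and last 30,000 characters. An illustrative sketch of the same arithmetic (not Hermes's actual code):

```bash
MAX_BYTES=50000
head -c $((MAX_BYTES * 40 / 100)) output.log   # first 40% of the cap (20000 chars)
echo '[OUTPUT TRUNCATED]'
tail -c $((MAX_BYTES * 60 / 100)) output.log   # last 60% of the cap (30000 chars)
```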
Raise the limits on models with large context windows that can afford more raw output per call. Lower them for small-context models to keep tool results compact:

```yaml
# Large context model (200K+)
tool_output:
  max_bytes: 150000
  max_lines: 5000

# Small local model (16K context)
tool_output:
  max_bytes: 20000
  max_lines: 500
```

## Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

```yaml
worktree: true     # Always create a worktree (same as hermes -w)
# worktree: false  # Default — only when -w flag is passed
```

When enabled, each CLI session creates a fresh worktree under `.worktrees/` with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into worktrees via `.worktreeinclude` in your repo root:

```
# .worktreeinclude
.env
.venv/
node_modules/
```

## Context Compression

Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint. All compression settings live in `config.yaml` (no environment variables).

### Full reference

```yaml
compression:
  enabled: true       # Toggle compression on/off
  threshold: 0.50     # Compress at this % of context limit
  target_ratio: 0.20  # Fraction of threshold to preserve as recent tail
  protect_last_n: 20  # Min recent messages to keep uncompressed

# The summarization model/provider is configured under auxiliary:
auxiliary:
  compression:
    model: "google/gemini-3-flash-preview"  # Model for summarization
    provider: "auto"                        # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
    base_url: null                          # Custom OpenAI-compatible endpoint (overrides provider)
```

:::info Legacy config migration
Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
:::

### Common setups

**Default (auto-detect) — no configuration needed:**

```yaml
compression:
  enabled: true
  threshold: 0.50
```

Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash.

**Force a specific provider** (OAuth or API-key based):

```yaml
auxiliary:
  compression:
    provider: nous
    model: gemini-3-flash
```

Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.

**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):

```yaml
auxiliary:
  compression:
    model: glm-4.7
    base_url: https://api.z.ai/api/coding/paas/v4
```

Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.

### How the three knobs interact

| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |

:::warning Summary model context length requirement
The summary model **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
:::

## Context Engine

The context engine controls how conversations are managed when approaching the model's token limit. The built-in `compressor` engine uses lossy summarization (see [Context Compression](/docs/developer-guide/context-compression-and-caching)). Plugin engines can replace it with alternative strategies.

```yaml
context:
  engine: "compressor"  # default — built-in lossy summarization
```

To use a plugin engine (e.g., LCM for lossless context management):

```yaml
context:
  engine: "lcm"  # must match the plugin's name
```

Plugin engines are **never auto-activated** — you must explicitly set `context.engine` to the plugin name. Available engines can be browsed and selected via `hermes plugins` → Provider Plugins → Context Engine. See [Memory Providers](/docs/user-guide/features/memory-providers) for the analogous single-select system for memory plugins.

## Iteration Budget Pressure

When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:

| Threshold | Level | What the model sees |
|-----------|-------|---------------------|
| **70%** | Caution | `[BUDGET: 63/90. 27 iterations left. Start consolidating.]` |
| **90%** | Warning | `[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]` |

Warnings are injected into the last tool result's JSON (as a `_budget_warning` field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
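For instance, a tool result crossing the 70% threshold might come back looking like this (the surrounding result shape is illustrative; `_budget_warning` is the documented injected field):

```json
{
  "stdout": "tests passed: 42/42",
  "exit_code": 0,
  "_budget_warning": "[BUDGET: 63/90. 27 iterations left. Start consolidating.]"
}
```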
The iteration budget itself is set in `config.yaml`:

```yaml
agent:
  max_turns: 90  # Max iterations per conversation turn (default: 90)
```

Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.

When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.

### API Timeouts

Hermes has separate timeout layers for streaming, plus a stale detector for non-streaming calls. The stale detectors auto-adjust for local providers only when you leave them at their implicit defaults.

| Timeout | Default | Local providers | Config / env |
|---------|---------|----------------|--------------|
| Socket read timeout | 120s | Auto-raised to 1800s | `HERMES_STREAM_READ_TIMEOUT` |
| Stale stream detection | 180s | Auto-disabled | `HERMES_STREAM_STALE_TIMEOUT` |
| Stale non-stream detection | 300s | Auto-disabled when left implicit | `providers.<name>.stale_timeout_seconds` or `HERMES_API_CALL_STALE_TIMEOUT` |
| API call (non-streaming) | 1800s | Unchanged | `providers.<name>.request_timeout_seconds` / `timeout_seconds` or `HERMES_API_TIMEOUT` |

The **socket read timeout** controls how long httpx waits for the next chunk of data from the provider. Local LLMs can take minutes for prefill on large contexts before producing the first token, so Hermes raises this to 30 minutes when it detects a local endpoint. If you explicitly set `HERMES_STREAM_READ_TIMEOUT`, that value is always used regardless of endpoint detection.

The **stale stream detection** kills connections that receive SSE keep-alive pings but no actual content. This is disabled entirely for local providers, since they don't send keep-alive pings during prefill.

The **stale non-stream detection** kills non-streaming calls that produce no response for too long. By default Hermes disables this on local endpoints to avoid false positives during long prefills. If you explicitly set `providers.<name>.stale_timeout_seconds`, `providers.<name>.models.<model>.stale_timeout_seconds`, or `HERMES_API_CALL_STALE_TIMEOUT`, that explicit value is honored even on local endpoints.
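To override the layers explicitly via environment variables (values in seconds; names from the table above, values here illustrative):

```bash
export HERMES_STREAM_READ_TIMEOUT=600   # wait up to 10 min for the next streamed chunk
export HERMES_STREAM_STALE_TIMEOUT=300  # kill streams that stay content-free for 5 min
export HERMES_API_TIMEOUT=3600          # allow up to 1 h for non-streaming API calls
```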
## Context Pressure Warnings

Separate from iteration budget pressure, context pressure tracks how close the conversation is to the **compaction threshold** — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.

| Progress | Level | What happens |
|----------|-------|-------------|
| **≥ 60%** to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice |
| **≥ 85%** to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent |

In the CLI, context pressure appears as a progress bar in the tool output feed:

```
◐ context ████████████░░░░░░░░ 62% to compaction
  48k threshold (50%) · approaching compaction
```

On messaging platforms, a plain-text notification is sent:

```
◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).
```

If auto-compression is disabled, the warning tells you context may be truncated instead.

Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.

## Credential Pool Strategies

When you have multiple API keys or OAuth tokens for the same provider, configure the rotation strategy:

```yaml
credential_pool_strategies:
  openrouter: round_robin  # cycle through keys evenly
  anthropic: least_used    # always pick the least-used key
```

Options: `fill_first` (default), `round_robin`, `least_used`, `random`. See [Credential Pools](/docs/user-guide/features/credential-pools) for full documentation.

## Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via auto-detection — you don't need to configure anything.

### Video Tutorial