docs(hermes-agent skill): cover v0.13–v0.17 features, fix stale claims, tighten (#53566)

Refresh the hermes-agent skill against the last 5 major releases and the
current codebase, and cut verbose prose.

Coverage added (v0.13.0–v0.17.0):
- New gateway platforms: iMessage (Photon), Teams, LINE, SimpleX, ntfy,
  Google Chat, Raft, official WhatsApp Business Cloud API (now 20+).
- New surfaces section: desktop app, web dashboard admin panel,
  hermes proxy (OpenAI-compatible OAuth proxy), Automation Blueprints.
- delegate_task(background=true) async subagents; memory-tool atomic
  batch operations; session_search three-mode shape; x_search/video_analyze
  toolsets; image_gen image-to-image; xAI Grok via SuperGrok OAuth.
- display.interface (cli/tui), curator.consolidate opt-in, PyPI install.

Accuracy fixes:
- Adding-a-Tool is two files (auto-discovery), not three.
- Testing uses scripts/run_tests.sh (canonical runner), not bare pytest.
- Dropped change-detector test count and a dangling references/ pointer.
- Refreshed overview (Windows-native, 20+ providers, many surfaces).

Conciseness: trimmed over-explained Windows keybinding/sandbox/test prose
and deep prompt-builder internals to pointers.
This commit is contained in:
Teknium 2026-06-27 03:51:25 -07:00 committed by GitHub
parent d3db73210c
commit f67c0b3e60
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -1,7 +1,7 @@
---
name: hermes-agent
description: "Configure, extend, or contribute to Hermes Agent."
version: 2.2.0
version: 2.3.0
author: Hermes Agent + Teknium
license: MIT
platforms: [linux, macos, windows]
@ -14,13 +14,14 @@ metadata:
# Hermes Agent
Hermes Agent is an open-source AI agent framework by Nous Research that runs in your terminal, messaging platforms, and IDEs. It belongs to the same category as Claude Code (Anthropic), Codex (OpenAI), and OpenClaw — autonomous coding and task-execution agents that use tool calling to interact with your system. Hermes works with any LLM provider (OpenRouter, Anthropic, OpenAI, DeepSeek, local models, and 15+ others) and runs on Linux, macOS, and WSL.
Hermes Agent is an open-source AI agent framework by Nous Research that runs in your terminal, a native desktop app, messaging platforms, and IDEs. It's in the same category as Claude Code (Anthropic), Codex (OpenAI), and OpenClaw — autonomous coding and task-execution agents that use tool calling to interact with your system. Hermes works with any LLM provider (OpenRouter, Anthropic, OpenAI, Google, DeepSeek, xAI, local models, and 20+ others) and runs on Linux, macOS, Windows, and WSL.
What makes Hermes different:
- **Self-improving through skills** — Hermes learns from experience by saving reusable procedures as skills. When it solves a complex problem, discovers a workflow, or gets corrected, it can persist that knowledge as a skill document that loads into future sessions. Skills accumulate over time, making the agent better at your specific tasks and environment.
- **Persistent memory across sessions** — remembers who you are, your preferences, environment details, and lessons learned. Pluggable memory backends (built-in, Honcho, Mem0, and more) let you choose how memory works.
- **Multi-platform gateway** — the same agent runs on Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, and 10+ other platforms with full tool access, not just chat.
- **Multi-platform gateway** — the same agent runs on Telegram, Discord, Slack, WhatsApp, iMessage, Signal, Matrix, Teams, Email, and a dozen more platforms with full tool access, not just chat.
- **Many surfaces** — the same agent core drives the CLI, the Ink TUI, a native Electron desktop app, a web dashboard, and an ACP server for IDEs (VS Code / Zed / JetBrains).
- **Provider-agnostic** — swap models and providers mid-workflow without changing anything else. Credential pools rotate across multiple API keys automatically.
- **Profiles** — run multiple independent Hermes instances with isolated configs, sessions, skills, and memory.
- **Extensible** — plugins, MCP servers, custom tools, webhook triggers, cron scheduling, and the full Python ecosystem.
@ -44,23 +45,27 @@ Good verification targets:
## Quick Start
```bash
# Install
# Install (shell installer — sets up uv, Python, the venv, and the launcher)
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
# Interactive chat (default)
# Or via PyPI (ships the TUI bundle + shell launcher)
pip install hermes-agent # or: uv pip install hermes-agent
# Interactive chat (default surface; set display.interface: tui to launch the Ink TUI instead)
hermes
# Single query
hermes chat -q "What is the capital of France?"
# Setup wizard
# Setup wizard / pick model+provider / health check
hermes setup
# Change model/provider
hermes model
# Check health
hermes doctor
# Other surfaces
hermes desktop # launch the native desktop app (alias: hermes gui)
hermes dashboard # web admin panel + embedded chat
hermes proxy # OpenAI-compatible local proxy backed by your OAuth provider
```
---
@ -110,14 +115,12 @@ hermes config path Print config.yaml path
hermes config env-path Print .env path
hermes config check Check for missing/outdated config
hermes config migrate Update config with new options
hermes auth Interactive credential manager
hermes auth add PROVIDER Add OAuth or API-key credential (e.g. nous, openai-codex, qwen-oauth)
hermes auth list List stored credentials
hermes auth remove PROVIDER Remove a stored credential
hermes doctor [--fix] Check dependencies and config
hermes status [--all] Show component status
```
Credentials (OAuth + API keys, with pooling) are managed under `hermes auth` — see the Credentials & Pools section below.
### Tools & Skills
```
@ -165,7 +168,7 @@ hermes gateway status Check status
hermes gateway setup Configure platforms
```
Supported platforms: Telegram, Discord, Slack, WhatsApp, Signal, Email, SMS, Matrix, Mattermost, Home Assistant, DingTalk, Feishu, WeCom, BlueBubbles (iMessage), Weixin (WeChat), API Server, Webhooks. Open WebUI connects via the API Server adapter.
Supported platforms (20+): Telegram, Discord, Slack, WhatsApp (Baileys bridge + official Business Cloud API), iMessage (Photon — `hermes photon setup`, the BlueBubbles successor with no Mac relay), Signal, Email, SMS, Matrix, Mattermost, Microsoft Teams, LINE, SimpleX, ntfy, Google Chat, Home Assistant, DingTalk, Feishu, WeCom, Weixin (WeChat), Raft (agent network), API Server, Webhooks. Open WebUI connects via the API Server adapter. Most adapters ship under `plugins/platforms/`, so new ones drop in without touching core.
Platform docs: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/
@ -219,30 +222,42 @@ hermes profile export NAME Export to tar.gz
hermes profile import FILE Import from archive
```
### Credential Pools
### Credentials & Pools
```
hermes auth add Interactive credential wizard
hermes auth Interactive credential manager
hermes auth add [PROVIDER] Add OAuth or API-key credential
(e.g. nous, openai-codex, qwen-oauth, anthropic)
hermes auth list [PROVIDER] List pooled credentials
hermes auth remove P INDEX Remove by provider + index
hermes auth reset PROVIDER Clear exhaustion status
```
Multiple credentials per provider form a pool that rotates automatically and skips exhausted keys.
### Other
```
hermes insights [--days N] Usage analytics
hermes update Update to latest version
hermes desktop / gui Launch the native desktop app
hermes dashboard Web admin panel + embedded chat
hermes proxy OpenAI-compatible local proxy backed by an OAuth provider
hermes portal Quick setup / sign in via Nous Portal
hermes kanban <verb> Multi-agent work-queue board (init/create/list/show/assign/…)
hermes pairing list/approve/revoke DM authorization
hermes plugins list/install/remove Plugin management
hermes honcho setup/status Honcho memory integration (requires honcho plugin)
hermes secrets bitwarden … External secret store (Bitwarden Secrets Manager)
hermes memory setup/status/off Memory provider config
hermes send Send a one-off message through a gateway platform
hermes completion bash|zsh Shell completions
hermes acp ACP server (IDE integration)
hermes claw migrate Migrate from OpenClaw
hermes uninstall Uninstall Hermes
```
For the full, authoritative command list run `hermes --help` (and `hermes <command> --help`). Plugin- and provider-supplied subcommands (e.g. `hermes photon setup` for iMessage) only appear once their plugin is installed/active.
---
## Slash Commands (In-Session)
@ -321,6 +336,7 @@ The registry of record is `hermes_cli/commands.py` — every consumer
### Utility
```
/branch (/fork) Branch the current session
/handoff <platform> Hand the live session off to a messaging platform (CLI)
/fast Toggle priority/fast processing
/browser Open CDP browser connection
/history Show conversation history (CLI)
@ -373,13 +389,14 @@ Edit with `hermes config edit` or `hermes config set section.key value`.
| `agent` | `max_turns` (90), `tool_use_enforcement` |
| `terminal` | `backend` (local/docker/ssh/modal), `cwd`, `timeout` (180) |
| `compression` | `enabled`, `threshold` (0.50), `target_ratio` (0.20) |
| `display` | `skin`, `tool_progress`, `show_reasoning`, `show_cost` |
| `display` | `skin`, `interface` (cli/tui), `tool_progress`, `show_reasoning`, `show_cost`, `language` |
| `stt` | `enabled`, `provider` (local/groq/openai/mistral) |
| `tts` | `provider` (edge/elevenlabs/openai/minimax/mistral/neutts) |
| `memory` | `memory_enabled`, `user_profile_enabled`, `provider` |
| `security` | `tirith_enabled`, `website_blocklist` |
| `delegation` | `model`, `provider`, `base_url`, `api_key`, `max_iterations` (50), `reasoning_effort` |
| `checkpoints` | `enabled`, `max_snapshots` (50) |
| `curator` | `enabled`, `consolidate` (false — opt-in aux-model skill consolidation), `interval_hours`, `stale_after_days` |
Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/configuration
@ -426,8 +443,9 @@ Enable/disable via `hermes tools` (interactive) or `hermes tools enable/disable
| `file` | File read/write/search/patch |
| `code_execution` | Sandboxed Python execution |
| `vision` | Image analysis |
| `image_gen` | AI image generation |
| `video` | Video analysis and generation |
| `image_gen` | AI image generation and image-to-image editing |
| `video` | Video analysis (`video_analyze`) and generation |
| `x_search` | First-class X (Twitter) search (X OAuth or API key) |
| `tts` | Text-to-speech |
| `skills` | Skill browsing and management |
| `memory` | Persistent cross-session memory |
@ -686,16 +704,19 @@ here; full developer notes live in `AGENTS.md`, user-facing docs under
### Delegation (`delegate_task`)
Synchronous subagent spawn — the parent waits for the child's summary
before continuing its own loop. Isolated context + terminal session.
Spawn a subagent with an isolated context + terminal session.
- **Single:** `delegate_task(goal, context, toolsets)`.
- **Batch:** `delegate_task(tasks=[{goal, ...}, ...])` runs children in
parallel, capped by `delegation.max_concurrent_children` (default 3).
- **Background:** `delegate_task(background=true)` returns a handle
immediately and keeps the parent loop going; the child's result
re-enters the conversation as a new turn when it finishes.
- **Roles:** `leaf` (default; cannot re-delegate) vs `orchestrator`
(can spawn its own workers, bounded by `delegation.max_spawn_depth`).
- **Not durable.** If the parent is interrupted, the child is
cancelled. For work that must outlive the turn, use `cronjob` or
- **Not durable.** A backgrounded child is still process-local — if the
parent process exits, the child is lost. For work that must outlive
the process, use `cronjob` or
`terminal(background=True, notify_on_complete=True)`.
Config: `delegation.*` in `config.yaml`.
@ -734,6 +755,11 @@ so nothing is lost.
Bundled + hub-installed skills are off-limits. **Never deletes**
max destructive action is archive. Pinned skills are exempt from
every auto-transition and every LLM review pass.
- **Cost:** the deterministic inactivity/prune sweep runs for free. The
aux-model "consolidate overlapping skills into umbrellas" pass is
**off by default** — opt in with `curator.consolidate: true` or
`hermes curator run --consolidate`. Routine background curation costs
zero tokens.
- **Telemetry:** sidecar at `~/.hermes/skills/.usage.json` holds
per-skill `use_count`, `view_count`, `patch_count`,
`last_activity_at`, `state`, `pinned`.
@ -773,6 +799,39 @@ User docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban
---
## Surfaces & Other Capabilities
Beyond the CLI and gateway, a few things worth knowing about:
- **Desktop app** (`hermes desktop` / `hermes gui`) — native Electron app
for macOS/Linux/Windows: streaming chat, session list, drag-and-drop +
clipboard-paste files, Cmd+K palette, status-bar model picker,
rebindable shortcuts, native notifications, live subagent watch-windows,
VS Code Marketplace themes, and per-profile remote-gateway login (OAuth
or username/password) so a thin local GUI can drive a heavy remote agent.
- **Web dashboard** (`hermes dashboard`) — full admin panel: configure
every messaging channel, the MCP catalog, webhooks/hooks, memory, and a
complete profile builder (model + skills + MCPs) from the browser, plus
an embedded `hermes --tui` chat. Secured behind an OAuth/token gate.
- **OpenAI-compatible proxy** (`hermes proxy`) — exposes a
`http://localhost:port` OpenAI API backed by whichever OAuth provider
you're signed into (Claude Pro, ChatGPT Pro, SuperGrok). Point Codex
CLI, Aider, Cline, Continue, or any script at it — no API key.
- **Automation Blueprints** — pick a named automation and Hermes asks for
what it needs (no cron syntax). One definition renders as a dashboard
form, a slash command, an agent conversation, and a docs-catalog entry.
- **`memory` tool batch operations** — pass an `operations` array of
add/replace/remove edits applied atomically against the final character
budget, so a single call can free space and add entries even when an add
alone would overflow.
- **`session_search`** — FTS5-backed, no aux-LLM, effectively free. One
tool, three modes inferred from which args are set: discovery (`query`),
scroll (`session_id` + `around_message_id`), browse (no args).
- **xAI Grok via SuperGrok OAuth** — sign in with your xAI account (no API
key); includes Cursor's `grok-composer-2.5-fast` coding model.
---
## Windows-Specific Quirks
Hermes runs natively on Windows (PowerShell, cmd, Windows Terminal, git-bash
@ -783,54 +842,33 @@ rediscover them from scratch.
### Input / Keybindings
**Alt+Enter doesn't insert a newline.** Windows Terminal intercepts Alt+Enter
at the terminal layer to toggle fullscreen — the keystroke never reaches
prompt_toolkit. Use **Ctrl+Enter** instead. Windows Terminal delivers
Ctrl+Enter as LF (`c-j`), distinct from plain Enter (`c-m` / CR), and the
CLI binds `c-j` to newline insertion on `win32` only (see
`_bind_prompt_submit_keys` + the Windows-only `c-j` binding in `cli.py`).
Side effect: the raw Ctrl+J keystroke also inserts a newline on Windows —
unavoidable, because Windows Terminal collapses Ctrl+Enter and Ctrl+J to
the same keycode at the Win32 console API layer. No conflicting binding
existed for Ctrl+J on Windows, so this is a harmless side effect.
mintty / git-bash behaves the same (fullscreen on Alt+Enter) unless you
disable Alt+Fn shortcuts in Options → Keys. Easier to just use Ctrl+Enter.
**Diagnosing keybindings.** Run `python scripts/keystroke_diagnostic.py`
(repo root) to see exactly how prompt_toolkit identifies each keystroke
in the current terminal. Answers questions like "does Shift+Enter come
through as a distinct key?" (almost never — most terminals collapse it
to plain Enter) or "what byte sequence is my terminal sending for
Ctrl+Enter?" This is how the Ctrl+Enter = c-j fact was established.
**Alt+Enter doesn't insert a newline** — Windows Terminal (and mintty) grab it
for fullscreen before prompt_toolkit sees it. Use **Ctrl+Enter** instead (the
CLI binds it to newline on Windows; raw Ctrl+J does the same, harmlessly).
To inspect how your terminal reports a keystroke, run
`python scripts/keystroke_diagnostic.py` from the repo root.
### Config / Files
**HTTP 400 "No models provided" on first run.** `config.yaml` was saved
with a UTF-8 BOM (common when Windows apps write it). Re-save as UTF-8
without BOM. `hermes config edit` writes without BOM; manual edits in
Notepad are the usual culprit.
**HTTP 400 "No models provided" on first run** — `config.yaml` was saved with
a UTF-8 BOM (Notepad does this). Re-save as UTF-8 without BOM;
`hermes config edit` writes correctly.
### `execute_code` / Sandbox
**WinError 10106** ("The requested service provider could not be loaded
or initialized") from the sandbox child process — it can't create an
`AF_INET` socket, so the loopback-TCP RPC fallback fails before
`connect()`. Root cause is usually **not** a broken Winsock LSP; it's
Hermes's own env scrubber dropping `SYSTEMROOT` / `WINDIR` / `COMSPEC`
from the child env. Python's `socket` module needs `SYSTEMROOT` to locate
`mswsock.dll`. Fixed via the `_WINDOWS_ESSENTIAL_ENV_VARS` allowlist in
`tools/code_execution_tool.py`. If you still hit it, echo `os.environ`
inside an `execute_code` block to confirm `SYSTEMROOT` is set. Full
diagnostic recipe in `references/execute-code-sandbox-env-windows.md`.
**WinError 10106** from the sandbox child process — it can't create an
`AF_INET` socket. Root cause is usually Hermes's env scrubber dropping
`SYSTEMROOT`/`WINDIR`/`COMSPEC` (Python's `socket` needs `SYSTEMROOT` to find
`mswsock.dll`), not a broken Winsock LSP. The `_WINDOWS_ESSENTIAL_ENV_VARS`
allowlist in `tools/code_execution_tool.py` covers it; if you still hit it,
echo `os.environ` inside an `execute_code` block to confirm `SYSTEMROOT` is set.
### Testing / Contributing
### Testing on Windows
**`scripts/run_tests.sh` doesn't work as-is on Windows** — it looks for
POSIX venv layouts (`.venv/bin/activate`). The Hermes-installed venv at
`venv/Scripts/` has no pip or pytest either (stripped for install size).
Workaround: install `pytest + pytest-xdist + pyyaml` into a system Python
3.11 user site, then invoke pytest directly with `PYTHONPATH` set:
`scripts/run_tests.sh` is POSIX-only (expects `.venv/bin/activate`); the
Hermes-installed `venv/Scripts/` has no pip/pytest (stripped for size).
Install pytest into a system Python and run directly with `-n 0`
(`pyproject.toml`'s `addopts` already sets `-n`):
```bash
"/c/Program Files/Python311/python" -m pip install --user pytest pytest-xdist pyyaml
@ -838,24 +876,14 @@ export PYTHONPATH="$(pwd)"
"/c/Program Files/Python311/python" -m pytest tests/foo/test_bar.py -v --tb=short -n 0
```
Use `-n 0`, not `-n 4``pyproject.toml`'s default `addopts` already
includes `-n`, and the wrapper's CI-parity guarantees don't apply off POSIX.
**POSIX-only tests need skip guards.** Common markers already in the codebase:
- Symlinks — elevated privileges on Windows
- `0o600` file modes — POSIX mode bits not enforced on NTFS by default
- `signal.SIGALRM` — Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
- Winsock / Windows-specific regressions — `@pytest.mark.skipif(sys.platform != "win32", ...)`
Use the existing skip-pattern style (`sys.platform == "win32"` or
`sys.platform.startswith("win")`) to stay consistent with the rest of the
suite.
(POSIX-only tests need skip guards — see the cross-platform guard list in the
Contributor section below.)
### Path / Filesystem
**Line endings.** Git may warn `LF will be replaced by CRLF the next time
Git touches it`. Cosmetic — the repo's `.gitattributes` normalizes. Don't
let editors auto-convert committed POSIX-newline files to CRLF.
**Line endings.** Git may warn `LF will be replaced by CRLF`. Cosmetic — the
repo's `.gitattributes` normalizes. Don't let editors auto-convert committed
POSIX-newline files to CRLF.
**Forward slashes work almost everywhere.** `C:/Users/...` is accepted by
every Hermes tool and most Windows APIs. Prefer forward slashes in code
@ -961,13 +989,17 @@ hermes-agent/
├── gateway/ # Messaging gateway
│ └── platforms/ # Platform adapters (telegram, discord, etc.)
├── cron/ # Job scheduler
├── tests/ # ~3000 pytest tests
├── tests/ # Extensive pytest suite (run via scripts/run_tests.sh)
└── website/ # Docusaurus docs site
```
Config: `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys) — both under `$HERMES_HOME` when it is set.
### Adding a Tool (3 files)
### Adding a Tool
Two files. Auto-discovery imports any `tools/*.py` with a top-level
`registry.register()` call, but a tool is only *exposed* to an agent once
its name appears in a toolset.
**1. Create `tools/your_tool.py`:**
```python
@ -991,11 +1023,12 @@ registry.register(
)
```
**2. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
**2. Wire it into a toolset in `toolsets.py`** — add the name to
`_HERMES_CORE_TOOLS` (every platform) or to a specific toolset.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual list needed.
All handlers must return JSON strings. Use `get_hermes_home()` for paths, never hardcode `~/.hermes`.
All handlers must return JSON strings. Use `get_hermes_home()` for paths,
never hardcode `~/.hermes`. For custom/local-only tools, write a plugin in
`~/.hermes/plugins/` instead of editing core — see the developer docs.
### Adding a Slash Command
@ -1019,25 +1052,22 @@ run_conversation():
### Testing
```bash
python -m pytest tests/ -o 'addopts=' -q # Full suite
python -m pytest tests/tools/ -q # Specific area
```
- Tests auto-redirect `HERMES_HOME` to temp dirs — never touch real `~/.hermes/`
- Run full suite before pushing any change
- Use `-o 'addopts='` to clear any baked-in pytest flags
**Windows contributors:** `scripts/run_tests.sh` currently looks for POSIX venvs (`.venv/bin/activate` / `venv/bin/activate`) and will error out on Windows where the layout is `venv/Scripts/activate` + `python.exe`. The Hermes-installed venv at `venv/Scripts/` also has no `pip` or `pytest` — it's stripped for end-user install size. Workaround: install pytest + pytest-xdist + pyyaml into a system Python 3.11 user site (`/c/Program Files/Python311/python -m pip install --user pytest pytest-xdist pyyaml`), then run tests directly:
Use the canonical runner — it enforces CI-parity (hermetic env, unset
credentials, TZ=UTC, xdist workers, per-test subprocess isolation):
```bash
export PYTHONPATH="$(pwd)"
"/c/Program Files/Python311/python" -m pytest tests/tools/test_foo.py -v --tb=short -n 0
scripts/run_tests.sh # full suite
scripts/run_tests.sh tests/tools/ # one directory
scripts/run_tests.sh tests/tools/test_x.py # one file
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
```
Use `-n 0` (not `-n 4`) because `pyproject.toml`'s default `addopts` already includes `-n`, and the wrapper's CI-parity story doesn't apply off-POSIX.
- Tests auto-redirect `HERMES_HOME` to temp dirs — never touch real `~/.hermes/`.
- The script probes `.venv`, then `venv`, then the shared worktree venv.
- **Windows:** the wrapper is POSIX-only; see the **Windows-Specific Quirks**
section above for the direct-pytest workaround.
**Cross-platform test guards:** tests that use POSIX-only syscalls need a skip marker. Common ones already in the codebase:
**Cross-platform test guards:** tests using POSIX-only syscalls need a skip marker. Common ones already in the codebase:
- Symlink creation → `@pytest.mark.skipif(sys.platform == "win32", reason="Symlinks require elevated privileges on Windows")` (see `tests/cron/test_cron_script.py`)
- POSIX file modes (0o600, etc.) → `@pytest.mark.skipif(sys.platform.startswith("win"), reason="POSIX mode bits not enforced on Windows")` (see `tests/hermes_cli/test_auth_toctou_file_modes.py`)
- `signal.SIGALRM` → Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
@ -1053,18 +1083,14 @@ monkeypatch.setattr(platform, "release", lambda: "6.8.0-generic")
See `tests/agent/test_prompt_builder.py::TestEnvironmentHints` for a worked example.
### Extending the system prompt's execution-environment block
### System prompt's execution-environment block
Factual guidance about the host OS, user home, cwd, terminal backend, and shell (bash vs. PowerShell on Windows) is emitted from `agent/prompt_builder.py::build_environment_hints()`. This is also where the WSL hint and per-backend probe logic live. The convention:
- **Local terminal backend** → emit host info (OS, `$HOME`, cwd) + Windows-specific notes (hostname ≠ username, `terminal` uses bash not PowerShell).
- **Remote terminal backend** (anything in `_REMOTE_TERMINAL_BACKENDS`: `docker, singularity, modal, daytona, ssh, managed_modal`) → **suppress** host info entirely and describe only the backend. A live `uname`/`whoami`/`pwd` probe runs inside the backend via `tools.environments.get_environment(...).execute(...)`, cached per process in `_BACKEND_PROBE_CACHE`, with a static fallback if the probe times out.
- **Key fact for prompt authoring:** when `TERMINAL_ENV != "local"`, *every* file tool (`read_file`, `write_file`, `patch`, `search_files`) runs inside the backend container, not on the host. The system prompt must never describe the host in that case — the agent can't touch it.
Full design notes, the exact emitted strings, and testing pitfalls:
`references/prompt-builder-environment-hints.md`.
**Refactor-safety pattern (POSIX-equivalence guard):** when you extract inline logic into a helper that adds Windows/platform-specific behavior, keep a `_legacy_<name>` oracle function in the test file that's a verbatim copy of the old code, then parametrize-diff against it. Example: `tests/tools/test_code_execution_windows_env.py::TestPosixEquivalence`. This locks in the invariant that POSIX behavior is bit-for-bit identical and makes any future drift fail loudly with a clear diff.
Factual host/backend guidance (OS, `$HOME`, cwd, terminal backend, shell)
is emitted by `agent/prompt_builder.py::build_environment_hints()`. The key
invariant for prompt authors: with a **remote** terminal backend
(`docker, singularity, modal, daytona, ssh, managed_modal`), host info is
suppressed and *every* file tool runs inside the backend container — the
prompt must never describe the host the agent can't touch.
### Commit Conventions