* test: make test env hermetic; enforce CI parity via scripts/run_tests.sh
Fixes the recurring 'works locally, fails in CI' (and vice versa) class
of flakes by making tests hermetic and providing a canonical local runner
that matches CI's environment.
## Layer 1 — hermetic conftest.py (tests/conftest.py)
Autouse fixture now unsets every credential-shaped env var before every
test, so developer-local API keys can't leak into tests that assert
'auto-detect provider when key present'.
Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD,
_CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of
credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID,
FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that
change auto-detect behavior.
Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET,
HERMES_SESSION_*, etc.) that mutate agent behavior.
Also:
- Redirects HOME to a per-test tempdir (not just HERMES_HOME), so
code reading ~/.hermes/* directly can't touch the real dir.
- Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to
match CI's deterministic runtime.
The old _isolate_hermes_home fixture name is preserved as an alias so
any test that yields it explicitly still works.
## Layer 2 — scripts/run_tests.sh canonical runner
'Always use scripts/run_tests.sh, never call pytest directly' is the
new rule (documented in AGENTS.md). The script:
- Unsets all credential env vars (belt-and-suspenders for callers
who bypass conftest — e.g. IDE integrations)
- Pins TZ/LANG/PYTHONHASHSEED
- Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on
a 20-core workstation surfaces test-ordering flakes CI will never
see, causing the infamous 'passes in CI, fails locally' drift)
- Finds the venv in .venv, venv, or main checkout's venv
- Passes through arbitrary pytest args
Installs pytest-split on demand so the script can also be used to run
matrix-split subsets locally for debugging.
## Remove 3 module-level dotenv stubs that broke test isolation
tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a
module-level:
if 'dotenv' not in sys.modules:
fake_dotenv = types.ModuleType('dotenv')
fake_dotenv.load_dotenv = lambda *a, **kw: None
sys.modules['dotenv'] = fake_dotenv
This patches sys.modules['dotenv'] to a fake at import time with no
teardown. Under pytest-xdist LoadScheduling, whichever worker collected
one of these files first poisoned its sys.modules; subsequent tests in
the same worker that imported load_dotenv transitively (e.g.
test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and
saw their assertions fail.
dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml),
so the defensive stub was never needed. Removed.
## Validation
- tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4
failures in test_env_loader.py before this fix)
- tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py,
tests/test_hermes_logging.py combined: 123 passed (the caplog
regression tests from PR #11453 still pass)
- Local full run shows no F/E clusters in the 0-55% range that were
previously present before the conftest hardening
## Background
See AGENTS.md 'Testing' section for the full list of drift sources
this closes. Matrix split (closed as #11566) will be re-attempted
once this foundation lands — cross-test pollution was the root cause
of the shard-3 hang in that PR.
* fix(conftest): don't redirect HOME — it broke CI subprocesses
PR #11577's autouse fixture was setting HOME to a per-test tempdir.
CI started timing out at 97% complete with dozens of E/F markers and
orphan python processes at cleanup — tests (or transitive deps)
spawn subprocesses that expect a stable HOME, and the redirect broke
them in non-obvious ways.
Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift
fixes) are unchanged and still in place. HERMES_HOME redirection is
also unchanged — that's the canonical way to isolate tests from
~/.hermes/, not HOME.
Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"`
instead of `get_hermes_home()` is a bug to fix at the callsite, not
something to paper over in conftest.
22 KiB
Hermes Agent - Development Guide
Instructions for AI coding assistants and developers working on the hermes-agent codebase.
Development Environment
source venv/bin/activate # ALWAYS activate before running Python
Project Structure
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Auto context compression
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration (provider-aware context)
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI subcommands and setup
│ ├── main.py # Entry point — all `hermes` subcommands
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # Slash command definitions + SlashCommandCompleter
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
│ ├── setup.py # Interactive setup wizard
│ ├── skin_engine.py # Skin/theme engine — CLI visual customization
│ ├── skills_config.py # `hermes skills` — enable/disable skills per platform
│ ├── tools_config.py # `hermes tools` — enable/disable tools per platform
│ ├── skills_hub.py # `/skills` slash command (search, browse, install)
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # Shared /model switch pipeline (CLI + gateway)
│ └── auth.py # Provider credential resolution
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
│ ├── web_tools.py # Web search/extract (Parallel + Firecrawl)
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1050 lines)
│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging platform gateway
│ ├── run.py # Main loop, slash commands, message dispatch
│ ├── session.py # SessionStore — conversation persistence
│ └── platforms/ # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal, qqbot
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains integration)
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── environments/ # RL training environments (Atropos)
├── tests/ # Pytest suite (~3000 tests)
└── batch_runner.py # Parallel batch processing
User config: ~/.hermes/config.yaml (settings), ~/.hermes/.env (API keys)
File Dependency Chain
tools/registry.py (no deps — imported by all tool files)
↑
tools/*.py (each calls registry.register() at import time)
↑
model_tools.py (imports tools/registry + triggers tool discovery)
↑
run_agent.py, cli.py, batch_runner.py, environments/
AIAgent Class (run_agent.py)
class AIAgent:
def __init__(self,
model: str = "anthropic/claude-opus-4.6",
max_iterations: int = 90,
enabled_toolsets: list = None,
disabled_toolsets: list = None,
quiet_mode: bool = False,
save_trajectories: bool = False,
platform: str = None, # "cli", "telegram", etc.
session_id: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
# ... plus provider, api_mode, callbacks, routing params
): ...
def chat(self, message: str) -> str:
"""Simple interface — returns final response string."""
def run_conversation(self, user_message: str, system_message: str = None,
conversation_history: list = None, task_id: str = None) -> dict:
"""Full interface — returns dict with final_response + messages."""
Agent Loop
The core loop is inside run_conversation() — entirely synchronous:
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
if response.tool_calls:
for tool_call in response.tool_calls:
result = handle_function_call(tool_call.name, tool_call.args, task_id)
messages.append(tool_result_message(result))
api_call_count += 1
else:
return response.content
Messages follow OpenAI format: {"role": "system/user/assistant/tool", ...}. Reasoning content is stored in assistant_msg["reasoning"].
CLI Architecture (cli.py)
- Rich for banner/panels, prompt_toolkit for input with autocomplete
- KawaiiSpinner (
agent/display.py) — animated faces during API calls,┊activity feed for tool results load_cli_config()in cli.py merges hardcoded defaults + user config YAML- Skin engine (
hermes_cli/skin_engine.py) — data-driven CLI theming; initialized fromdisplay.skinconfig key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text process_command()is a method onHermesCLI— dispatches on canonical command name resolved viaresolve_command()from the central registry- Skill slash commands:
agent/skill_commands.pyscans~/.hermes/skills/, injects as user message (not system prompt) to preserve prompt caching
Slash Command Registry (hermes_cli/commands.py)
All slash commands are defined in a central COMMAND_REGISTRY list of CommandDef objects. Every downstream consumer derives from this registry automatically:
- CLI —
process_command()resolves aliases viaresolve_command(), dispatches on canonical name - Gateway —
GATEWAY_KNOWN_COMMANDSfrozenset for hook emission,resolve_command()for dispatch - Gateway help —
gateway_help_lines()generates/helpoutput - Telegram —
telegram_bot_commands()generates the BotCommand menu - Slack —
slack_subcommand_map()generates/hermessubcommand routing - Autocomplete —
COMMANDSflat dict feedsSlashCommandCompleter - CLI help —
COMMANDS_BY_CATEGORYdict feedsshow_help()
Adding a Slash Command
- Add a
CommandDefentry toCOMMAND_REGISTRYinhermes_cli/commands.py:
CommandDef("mycommand", "Description of what it does", "Session",
aliases=("mc",), args_hint="[arg]"),
- Add handler in
HermesCLI.process_command()incli.py:
elif canonical == "mycommand":
self._handle_mycommand(cmd_original)
- If the command is available in the gateway, add a handler in
gateway/run.py:
if canonical == "mycommand":
return await self._handle_mycommand(event)
- For persistent settings, use
save_config_value()incli.py
CommandDef fields:
name— canonical name without slash (e.g."background")description— human-readable descriptioncategory— one of"Session","Configuration","Tools & Skills","Info","Exit"aliases— tuple of alternative names (e.g.("bg",))args_hint— argument placeholder shown in help (e.g."<prompt>","[name]")cli_only— only available in the interactive CLIgateway_only— only available in messaging platformsgateway_config_gate— config dotpath (e.g."display.tool_progress_command"); when set on acli_onlycommand, the command becomes available in the gateway if the config value is truthy.GATEWAY_KNOWN_COMMANDSalways includes config-gated commands so the gateway can dispatch them; help/menus only show them when the gate is open.
Adding an alias requires only adding it to the aliases tuple on the existing CommandDef. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.
Adding New Tools
Requires changes in 2 files:
1. Create tools/your_tool.py:
import json, os
from tools.registry import registry
def check_requirements() -> bool:
return bool(os.getenv("EXAMPLE_API_KEY"))
def example_tool(param: str, task_id: str = None) -> str:
return json.dumps({"success": True, "data": "..."})
registry.register(
name="example_tool",
toolset="example",
schema={"name": "example_tool", "description": "...", "parameters": {...}},
handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
check_fn=check_requirements,
requires_env=["EXAMPLE_API_KEY"],
)
2. Add to toolsets.py — either _HERMES_CORE_TOOLS (all platforms) or a new toolset.
Auto-discovery: any tools/*.py file with a top-level registry.register() call is imported automatically — no manual import list to maintain.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
Path references in tool schemas: If the schema description mentions file paths (e.g. default output directories), use display_hermes_home() to make them profile-aware. The schema is generated at import time, which is after _apply_profile_override() sets HERMES_HOME.
State files: If a tool stores persistent state (caches, logs, checkpoints), use get_hermes_home() for the base directory — never Path.home() / ".hermes". This ensures each profile gets its own state.
Agent-level tools (todo, memory): intercepted by run_agent.py before handle_function_call(). See todo_tool.py for the pattern.
Adding Configuration
config.yaml options:
- Add to
DEFAULT_CONFIGinhermes_cli/config.py - Bump
_config_version(currently 5) to trigger migration for existing users
.env variables:
- Add to
OPTIONAL_ENV_VARSinhermes_cli/config.pywith metadata:
"NEW_API_KEY": {
"description": "What it's for",
"prompt": "Display name",
"url": "https://...",
"password": True,
"category": "tool", # provider, tool, messaging, setting
},
Config loaders (two separate systems):
| Loader | Used by | Location |
|---|---|---|
load_cli_config() |
CLI mode | cli.py |
load_config() |
hermes tools, hermes setup |
hermes_cli/config.py |
| Direct YAML load | Gateway | gateway/run.py |
Skin/Theme System
The skin engine (hermes_cli/skin_engine.py) provides data-driven CLI visual customization. Skins are pure data — no code changes needed to add a new skin.
Architecture
hermes_cli/skin_engine.py # SkinConfig dataclass, built-in skins, YAML loader
~/.hermes/skins/*.yaml # User-installed custom skins (drop-in)
init_skin_from_config()— called at CLI startup, readsdisplay.skinfrom configget_active_skin()— returns cachedSkinConfigfor the current skinset_active_skin(name)— switches skin at runtime (used by/skincommand)load_skin(name)— loads from user skins first, then built-ins, then falls back to default- Missing skin values inherit from the
defaultskin automatically
What skins customize
| Element | Skin Key | Used By |
|---|---|---|
| Banner panel border | colors.banner_border |
banner.py |
| Banner panel title | colors.banner_title |
banner.py |
| Banner section headers | colors.banner_accent |
banner.py |
| Banner dim text | colors.banner_dim |
banner.py |
| Banner body text | colors.banner_text |
banner.py |
| Response box border | colors.response_border |
cli.py |
| Spinner faces (waiting) | spinner.waiting_faces |
display.py |
| Spinner faces (thinking) | spinner.thinking_faces |
display.py |
| Spinner verbs | spinner.thinking_verbs |
display.py |
| Spinner wings (optional) | spinner.wings |
display.py |
| Tool output prefix | tool_prefix |
display.py |
| Per-tool emojis | tool_emojis |
display.py → get_tool_emoji() |
| Agent name | branding.agent_name |
banner.py, cli.py |
| Welcome message | branding.welcome |
cli.py |
| Response box label | branding.response_label |
cli.py |
| Prompt symbol | branding.prompt_symbol |
cli.py |
Built-in skins
default— Classic Hermes gold/kawaii (the current look)ares— Crimson/bronze war-god theme with custom spinner wingsmono— Clean grayscale monochromeslate— Cool blue developer-focused theme
Adding a built-in skin
Add to _BUILTIN_SKINS dict in hermes_cli/skin_engine.py:
"mytheme": {
"name": "mytheme",
"description": "Short description",
"colors": { ... },
"spinner": { ... },
"branding": { ... },
"tool_prefix": "┊",
},
User skins (YAML)
Users create ~/.hermes/skins/<name>.yaml:
name: cyberpunk
description: Neon-soaked terminal theme
colors:
banner_border: "#FF00FF"
banner_title: "#00FFFF"
banner_accent: "#FF1493"
spinner:
thinking_verbs: ["jacking in", "decrypting", "uploading"]
wings:
- ["⟨⚡", "⚡⟩"]
branding:
agent_name: "Cyber Agent"
response_label: " ⚡ Cyber "
tool_prefix: "▏"
Activate with /skin cyberpunk or display.skin: cyberpunk in config.yaml.
Important Policies
Prompt Caching Must Not Break
Hermes-Agent ensures caching remains valid throughout a conversation. Do NOT implement changes that would:
- Alter past context mid-conversation
- Change toolsets mid-conversation
- Reload memories or rebuild system prompts mid-conversation
Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
Working Directory Behavior
- CLI: Uses current directory (
.→os.getcwd()) - Messaging: Uses
MESSAGING_CWDenv var (default: home directory)
Background Process Notifications (Gateway)
When terminal(background=true, notify_on_complete=true) is used, the gateway runs a watcher that
detects process completion and triggers a new agent turn. Control verbosity of background process
messages with display.background_process_notifications
in config.yaml (or HERMES_BACKGROUND_NOTIFICATIONS env var):
all— running-output updates + final message (default)result— only the final completion messageerror— only the final message when exit code != 0off— no watcher messages at all
Profiles: Multi-Instance Support
Hermes supports profiles — multiple fully isolated instances, each with its own
HERMES_HOME directory (config, API keys, memory, sessions, skills, gateway, etc.).
The core mechanism: _apply_profile_override() in hermes_cli/main.py sets
HERMES_HOME before any module imports. All 119+ references to get_hermes_home()
automatically scope to the active profile.
Rules for profile-safe code
-
Use
get_hermes_home()for all HERMES_HOME paths. Import fromhermes_constants. NEVER hardcode~/.hermesorPath.home() / ".hermes"in code that reads/writes state.# GOOD from hermes_constants import get_hermes_home config_path = get_hermes_home() / "config.yaml" # BAD — breaks profiles config_path = Path.home() / ".hermes" / "config.yaml" -
Use
display_hermes_home()for user-facing messages. Import fromhermes_constants. This returns~/.hermesfor default or~/.hermes/profiles/<name>for profiles.# GOOD from hermes_constants import display_hermes_home print(f"Config saved to {display_hermes_home()}/config.yaml") # BAD — shows wrong path for profiles print("Config saved to ~/.hermes/config.yaml") -
Module-level constants are fine — they cache
get_hermes_home()at import time, which is AFTER_apply_profile_override()sets the env var. Just useget_hermes_home(), notPath.home() / ".hermes". -
Tests that mock
Path.home()must also setHERMES_HOME— since code now usesget_hermes_home()(reads env var), notPath.home() / ".hermes":with patch.object(Path, "home", return_value=tmp_path), \ patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}): ... -
Gateway platform adapters should use token locks — if the adapter connects with a unique credential (bot token, API key), call
acquire_scoped_lock()fromgateway.statusin theconnect()/start()method andrelease_scoped_lock()indisconnect()/stop(). This prevents two profiles from using the same credential. Seegateway/platforms/telegram.pyfor the canonical pattern. -
Profile operations are HOME-anchored, not HERMES_HOME-anchored —
_get_profiles_root()returnsPath.home() / ".hermes" / "profiles", NOTget_hermes_home() / "profiles". This is intentional — it letshermes -p coder profile listsee all profiles regardless of which one is active.
Known Pitfalls
DO NOT hardcode ~/.hermes paths
Use get_hermes_home() from hermes_constants for code paths. Use display_hermes_home()
for user-facing print/log messages. Hardcoding ~/.hermes breaks profiles — each profile
has its own HERMES_HOME directory. This was the source of 5 bugs fixed in PR #3575.
DO NOT use simple_term_menu for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use curses (stdlib) instead. See hermes_cli/tools_config.py for the pattern.
DO NOT use \033[K (ANSI erase-to-EOL) in spinner/display code
Leaks as literal ?[K text under prompt_toolkit's patch_stdout. Use space-padding: f"\r{line}{' ' * pad}".
_last_resolved_tool_names is a process-global in model_tools.py
_run_single_child() in delegate_tool.py saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
DO NOT hardcode cross-tool references in schema descriptions
Tool schema descriptions must not mention tools from other toolsets by name (e.g., browser_navigate saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in get_tool_definitions() in model_tools.py — see the browser_navigate / execute_code post-processing blocks for the pattern.
Tests must not write to ~/.hermes/
The _isolate_hermes_home autouse fixture in tests/conftest.py redirects HERMES_HOME to a temp dir. Never hardcode ~/.hermes/ paths in tests.
Profile tests: When testing profile features, also mock Path.home() so that
_get_profiles_root() and _get_default_hermes_home() resolve within the temp dir.
Use the pattern from tests/hermes_cli/test_profiles.py:
@pytest.fixture
def profile_env(tmp_path, monkeypatch):
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setattr(Path, "home", lambda: tmp_path)
monkeypatch.setenv("HERMES_HOME", str(home))
return home
Testing
ALWAYS use scripts/run_tests.sh — do not call pytest directly. The script enforces
hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
4 xdist workers matching GHA ubuntu-latest). Direct pytest on a 16+ core
developer machine with API keys set diverges from CI in ways that have caused
multiple "works locally, fails in CI" incidents (and the reverse).
scripts/run_tests.sh # full suite, CI-parity
scripts/run_tests.sh tests/gateway/ # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
Why the wrapper (and why the old "just call pytest" doesn't work)
Five real sources of local-vs-CI drift the script closes:
| Without wrapper | With wrapper | |
|---|---|---|
| Provider API keys | Whatever is in your env (auto-detects pool) | All *_API_KEY/*_TOKEN/etc. unset |
HOME / ~/.hermes/ |
Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
| xdist workers | -n auto = all cores (20+ on a workstation) |
-n 4 matching CI |
tests/conftest.py also enforces points 1-4 as an autouse fixture so ANY pytest
invocation (including IDE integrations) gets hermetic behavior — but the wrapper
is belt-and-suspenders.
Running without the wrapper (only if you must)
If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
pytest directly), at minimum activate the venv and pass -n 4:
source venv/bin/activate
python -m pytest tests/ -q -n 4
Worker count above 4 will surface test-ordering flakes that CI never sees.
Always run the full suite before pushing changes.