mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-10 08:32:09 +00:00

docs(agents): add Design Philosophy + Contribution Rubric to AGENTS.md (#42641 )

AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.

- 'What Hermes Is': framing + the two properties that drive design
  (prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
  what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
  to close on the three allowed reasons AND when NOT to close one. Taste-based
  'won't implement / out of scope' closes stay human-only by design.
  - 'What we want' calibrated against the last ~55 merges: fix real bugs well,
    expand reach at the edges (platforms/channels/providers/models/desktop —
    large features land routinely), refactor god-files into clean modules,
    keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
  - 'What we don't want': speculative hooks, .env-for-non-secrets, needless
    core tools, lazy-read escape hatches, feature-destroying fixes, ungated
    telemetry, change-detector tests, core-touching plugins.
  - 'Before you call it a bug — verify the premise (and when NOT to close)':
    distilled from real closes (#41741 intentional-design-not-a-gap, #41610
    wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission,
    #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
    legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
  > plugin > MCP server in the catalog > new core tool (last resort).

Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.

2026-06-09 21:31:07 -07:00

68 KiB

Raw Blame History

Hermes Agent - Development Guide

Instructions for AI coding assistants and developers working on the hermes-agent codebase.

Never give up on the right solution.

What Hermes Is

Hermes is a personal AI agent that runs the same agent core across a CLI, a messaging gateway (Telegram, Discord, Slack, and ~20 other platforms), a TUI, and an Electron desktop app. It learns across sessions (memory + skills), delegates to subagents, runs scheduled jobs, and drives a real terminal and browser. It is extended primarily through plugins and skills, not by growing the core.

Two properties shape almost every design decision and are the lens for reviewing any change:

Per-conversation prompt caching is sacred. A long-lived conversation reuses a cached prefix every turn. Anything that mutates past context, swaps toolsets, or rebuilds the system prompt mid-conversation invalidates that cache and multiplies the user's cost. We do not do it (the one exception is context compression).
The core is a narrow waist; capability lives at the edges. Every model tool we add is sent on every API call, so the bar for a new core tool is high. Most new capability should arrive as a CLI command + skill, a service-gated tool, or a plugin — not as core surface.

Contribution Rubric — What We Want / What We Don't

This is the project's intent layer. Use it two ways:

For humans and for your own work — what gets merged and what gets rejected, so a contribution aims at the target.
For automated review (the triage sweeper) — guidance on when a PR is safe to close on the three allowed reasons (implemented_on_main, cannot_reproduce, incoherent) and, just as important, when NOT to close one. Taste-based "we don't want this / out of scope" closes are NOT an automated decision — those stay with a human maintainer. The sweeper's job here is to recognize design intent and avoid wrongly closing a legitimate contribution, not to make the won't-implement call itself.

Read the balance right: Hermes ships a lot — most merges are bug fixes to real reported behavior, and the product surface (platforms, channels, providers, models, desktop/TUI features) expands aggressively and on purpose. The restraint below is aimed squarely at the core agent + the model tool schema, the one place where every addition is paid for on every API call. "Smallest footprint" governs how a capability is wired into the core, NOT whether the product is allowed to grow. We are expansive at the edges and conservative at the waist.

What we want

Fix real bugs, well. The bulk of what lands is fix(...) against an actual reported symptom. A good fix reproduces the symptom on current main, points to the exact line where it manifests, and fixes the whole bug class — sibling call paths included — not just the one site the reporter hit.
Expand reach at the edges. New platform adapters, channels, providers, models, and desktop/TUI/dashboard features are welcome and land routinely, including large ones (a new messaging channel, a session-cap feature, a Windows PTY bridge). Breadth in the product is a goal, not a footprint concern — as long as it integrates with the existing setup/config UX (hermes tools, hermes setup, auto-install) rather than bolting on a raw env var.
Refactor god-files into clean modules. Extracting a multi-thousand-line cluster out of cli.py / run_agent.py / gateway/run.py into a focused mixin or module is wanted work, even when the diff is huge and mechanical (large +N/-N refactors merge regularly). The "every line traces to the request" test applies to feature PRs; a declared refactor's request IS the extraction.
Keep the core narrow. New model tools are the expensive exception — every tool ships on every API call. Prefer, in order: extend existing code → CLI command + skill → service-gated tool (check_fn) → plugin → MCP server in the catalog → new core tool (last resort). See "The Footprint Ladder."
Extend, don't duplicate. Before adding a module/manager/hook, check whether existing infrastructure already covers the use case. When several PRs integrate the same category, design one shared interface instead of merging them one at a time (see the ABC + orchestrator note under the Footprint Ladder).
Behavior contracts over snapshots. Tests should assert how two pieces of data must relate (invariants), not freeze a current value (model lists, config version literals, enumeration counts). See "Don't write change-detector tests."
E2E validation, not just green unit mocks. For anything touching resolution chains, config propagation, security boundaries, remote backends, or file/network I/O, exercise the real path with real imports against a temp HERMES_HOME. Mocks hide integration bugs.
Cache-, alternation-, and invariant-safe. Preserve prompt caching, strict message role alternation (never two same-role messages in a row; never a synthetic user message injected mid-loop), and a system prompt that is byte-stable for the life of a conversation.
Contributor credit preserved. Salvage external work by cherry-picking (rebase-merge) so authorship survives in git history; don't reimplement from scratch when you can build on top.

What we don't want (rejected even when well-built)

Speculative infrastructure. Hooks, callbacks, or extension points with no concrete consumer. Adding a hook is easy; removing one after plugins depend on it is hard. A hook is NOT speculative if a contributor has a real, stated use case — even if the consumer ships separately.
New HERMES_* env vars for non-secret config. .env is for secrets only (API keys, tokens, passwords). All behavioral settings — timeouts, thresholds, feature flags, display prefs — go in config.yaml. Bridge to an internal env var if the mechanism needs one, but user-facing docs point to config.yaml. Reject PRs that tell users to "set X in your .env" unless X is a credential.
A new core tool when terminal + file already do the job, or when a skill would. If the only barrier is file visibility on a remote backend, fix the mount, not the toolset.
Lazy-reading escape hatches on instructional tools. No offset/limit pagination on tools that load content the agent must read fully (skills, prompts, playbooks). Models will read page 1 and skip the rest.
"Fixes" that destroy the feature they secure. A mitigation that kills the feature's purpose is the wrong mitigation. Read the original commit's intent (git log -p -S) before restricting behavior; find a fix that preserves the feature.
Outbound telemetry / usage attribution without opt-in gating. No new analytics, third-party identifier tagging, or attribution tags until a generic user-facing opt-in (config gate + setup prompt + hermes tools toggle) exists. Park behind a label, do not merge.
Change-detector tests, cache-breaking mid-conversation, dead code wired in without E2E proof, and plugins that touch core files. Plugins live in their own directory and work within the ABCs/hooks we provide; if a plugin needs more, widen the generic plugin surface, don't special-case it in core.

Before you call it a bug — verify the premise (and when NOT to close)

The most common reason a well-written PR gets closed is not code quality — it is that the change is built on a wrong premise, or it treats an intentional design as a gap. These patterns cut both ways: they tell a human reviewer what to scrutinize, and they tell the automated sweeper when a PR is NOT safe to close as implemented_on_main / cannot_reproduce (when in doubt, leave it open for a human). They are distilled from real closes.

"Intentional design, not a gap." A limitation that looks like an oversight is often deliberate. Before "fixing" a missing link or a restriction, ask whether the isolation IS the design. Example: profiles are independent islands on purpose — a PR adding live config inheritance from the default profile was closed because coupling profiles together is exactly what the design prevents (the copy-at-creation --clone path already covers the legitimate "start from my default" case). Read the original commit's intent (git log -p -S "<symbol>") before assuming something is unfinished.
"The premise doesn't hold against how X actually works." A PR's justification frequently rests on a wrong mental model of an existing mechanism. Trace the real code/runtime before accepting the rationale. Two real closes: a rate-limit "re-probe during cooldown" PR (the breaker only trips on a confirmed-empty account bucket, so re-probing just hammers a bucket we've already proven empty); a usage-accumulation fix whose new branch never executes at runtime because an earlier guard already popped the state it depended on. If you can't point to the exact line where the bug manifests AND show the fix changes that line's behavior, you haven't verified the premise.
"This fix was wrong — the absence/omission was deliberate." Adding the obvious-looking missing piece can break things the omission was protecting. Example: restoring "missing" __init__.py files made a test tree importable as a dotted package that shadowed the real plugin, deleting its register() at import time. The absence was load-bearing.
"Overreached / resurrected an approach we'd moved past." Scope creep that supersedes an agreed-on base, or revives a direction the maintainers deliberately closed, gets rejected even when the code works. Keep the change to the narrow piece that was actually agreed; offer the rest as a focused follow-up.

The throughline: verify the claim AND the intent against the codebase before writing or merging a fix. A confirmed reproduction on current main plus a line-level account of where the fix acts beats a plausible-sounding rationale every time. When in doubt about intent, it is cheaper to ask than to ship a fix that fights the design.

The Footprint Ladder (new capability decision)

Each rung adds more permanent surface than the one above. Choose the highest (least-footprint) rung that correctly solves the problem:

Extend existing code — the capability is a variation of something that already exists. Zero new surface.
CLI command + skill — manages config/state/infra expressible as shell commands. The agent runs hermes <subcommand> guided by a skill. Zero model-tool footprint. Default choice for subscriptions, scheduled tasks, service setup. Examples: hermes webhook, hermes cron, hermes tools.
Service-gated tool (check_fn) — needs structured params/returns AND only appears when a prerequisite is configured. Zero footprint otherwise. Examples: Home Assistant tools (gated on token), memory-provider tools.
Plugin — third-party/niche/user-specific capability that doesn't ship in core. Lives in ~/.hermes/plugins/ or a pip package, discovered at runtime.
MCP server (in the catalog) — if the capability genuinely needs to be a tool (structured I/O the agent invokes) but isn't core-fundamental, prefer building it as an MCP server and adding it to the MCP catalog over growing the core toolset. The agent connects to it through the built-in MCP client; zero permanent core-schema footprint, and it's reusable by any MCP host.
New core tool — only when the capability is fundamental, broadly useful to nearly every user, and unreachable via terminal + file (or an MCP server). Examples of correct core tools: terminal, read_file, web_search, browser_navigate.

When 3+ open PRs try to integrate the same category of thing (memory backends, providers, notifiers), don't merge them one at a time — design an ABC + orchestrator, wrap the existing built-in as the first provider, and turn the competing PRs into plugins against that interface.

Development Environment

# Prefer .venv; fall back to venv if that's what your checkout has.
source .venv/bin/activate   # or: source venv/bin/activate

scripts/run_tests.sh probes .venv first, then venv, then $HOME/.hermes/hermes-agent/venv (for worktrees that share a venv with the main checkout).

Project Structure

File counts shift constantly — don't treat the tree below as exhaustive. The canonical source is the filesystem. The notes call out the load-bearing entry points you'll actually edit.

hermes-agent/
├── run_agent.py          # AIAgent class — core conversation loop (~12k LOC)
├── model_tools.py        # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py                # HermesCLI class — interactive CLI orchestrator (~11k LOC)
├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
├── hermes_constants.py   # get_hermes_home(), display_hermes_home() — profile-aware paths
├── hermes_logging.py     # setup_logging() — agent.log / errors.log / gateway.log (profile-aware)
├── batch_runner.py       # Parallel batch processing
├── agent/                # Agent internals (provider adapters, memory, caching, compression, etc.)
├── hermes_cli/           # CLI subcommands, setup wizard, plugins loader, skin engine
├── tools/                # Tool implementations — auto-discovered via tools/registry.py
│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/              # Messaging gateway — run.py + session.py + platforms/
│   ├── platforms/        # Adapter per platform (telegram, discord, slack, whatsapp,
│   │                     #   homeassistant, signal, matrix, mattermost, email, sms,
│   │                     #   dingtalk, wecom, weixin, feishu, qqbot, bluebubbles,
│   │                     #   yuanbao, webhook, api_server, ...). See ADDING_A_PLATFORM.md.
│   └── builtin_hooks/    # Extension point for always-registered gateway hooks (none shipped)
├── plugins/              # Plugin system (see "Plugins" section below)
│   ├── memory/           # Memory-provider plugins (honcho, mem0, supermemory, ...)
│   ├── context_engine/   # Context-engine plugins
│   ├── model-providers/  # Inference backend plugins (openrouter, anthropic, gmi, ...)
│   ├── kanban/           # Multi-agent board dispatcher + worker plugin
│   ├── hermes-achievements/  # Gamified achievement tracking
│   ├── observability/    # Metrics / traces / logs plugin
│   ├── image_gen/        # Image-generation providers
│   └── <others>/         # disk-cleanup, google_meet, platforms, spotify,
│                         #   strike-freedom-cockpit, ...
├── optional-skills/      # Heavier/niche skills shipped but NOT active by default
├── skills/               # Built-in skills bundled with the repo
├── ui-tui/               # Ink (React) terminal UI — `hermes --tui`
│   └── src/              # entry.tsx, app.tsx, gatewayClient.ts + app/components/hooks/lib
├── tui_gateway/          # Python JSON-RPC backend for the TUI
├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
├── cron/                 # Scheduler — jobs.py, scheduler.py
├── scripts/              # run_tests.sh, release.py, auxiliary scripts
├── website/              # Docusaurus docs site
└── tests/                # Pytest suite (~17k tests across ~900 files as of May 2026)

User config: ~/.hermes/config.yaml (settings), ~/.hermes/.env (API keys only). Logs: ~/.hermes/logs/ — agent.log (INFO+), errors.log (WARNING+), gateway.log when running the gateway. Profile-aware via get_hermes_home(). Browse with hermes logs [--follow] [--level ...] [--session ...].

TypeScript Style

Applies to TypeScript across Hermes: desktop, TUI, website, and future TS packages.

Prefer small nanostores over component state when state is shared, reused, or read by distant UI.
Let each feature own its atoms. Chat state belongs near chat, shell state near shell, shared state in src/store.
Components that render from an atom should use useStore. Non-rendering actions should read with $atom.get().
Do not pass state through three components when the leaf can subscribe to the atom.
Keep persistence beside the atom that owns it.
Keep route roots thin. They compose routes and shell; they should not become controllers.
No monolithic hooks. A hook should own one narrow job.
Prefer colocated action modules over hidden god hooks.
If a callback is pure side effect, use the terse void form: onState={st => void setGatewayState(st)}.
Async UI handlers should make intent explicit: onClick={() => void save()}.
Prefer interfaces for public props and shared object shapes. Avoid type X = { ... } for object props.
Extend React primitives for props: React.ComponentProps<'button'>, React.ComponentProps<typeof Dialog>, Omit<...>, Pick<...>.
Table-driven beats condition ladders when mapping ids, routes, or views.
src/app owns routes, pages, and page-specific components.
src/store owns shared atoms.
src/lib owns shared pure helpers.

File Dependency Chain

tools/registry.py  (no deps — imported by all tool files)
       ↑
tools/*.py  (each calls registry.register() at import time)
       ↑
model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
run_agent.py, cli.py, batch_runner.py, environments/

AIAgent Class (run_agent.py)

The real AIAgent.__init__ takes ~60 parameters (credentials, routing, callbacks, session context, budget, credential pool, etc.). The signature below is the minimum subset you'll usually touch — read run_agent.py for the full list.

class AIAgent:
    def __init__(self,
        base_url: str = None,
        api_key: str = None,
        provider: str = None,
        api_mode: str = None,              # "chat_completions" | "codex_responses" | ...
        model: str = "",                   # empty → resolved from config/provider later
        max_iterations: int = 90,          # tool-calling iterations (shared with subagents)
        enabled_toolsets: list = None,
        disabled_toolsets: list = None,
        quiet_mode: bool = False,
        save_trajectories: bool = False,
        platform: str = None,              # "cli", "telegram", etc.
        session_id: str = None,
        skip_context_files: bool = False,
        skip_memory: bool = False,
        credential_pool=None,
        # ... plus callbacks, thread/user/chat IDs, iteration_budget, fallback_model,
        # checkpoints config, prefill_messages, service_tier, reasoning_config, etc.
    ): ...

    def chat(self, message: str) -> str:
        """Simple interface — returns final response string."""

    def run_conversation(self, user_message: str, system_message: str = None,
                         conversation_history: list = None, task_id: str = None) -> dict:
        """Full interface — returns dict with final_response + messages."""

Agent Loop

The core loop is inside run_conversation() — entirely synchronous, with interrupt checks, budget tracking, and a one-turn grace call:

while (api_call_count < self.max_iterations and self.iteration_budget.remaining > 0) \
        or self._budget_grace_call:
    if self._interrupt_requested: break
    response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
    if response.tool_calls:
        for tool_call in response.tool_calls:
            result = handle_function_call(tool_call.name, tool_call.args, task_id)
            messages.append(tool_result_message(result))
        api_call_count += 1
    else:
        return response.content

Messages follow OpenAI format: {"role": "system/user/assistant/tool", ...}. Reasoning content is stored in assistant_msg["reasoning"].

CLI Architecture (cli.py)

Rich for banner/panels, prompt_toolkit for input with autocomplete
KawaiiSpinner (agent/display.py) — animated faces during API calls, ┊ activity feed for tool results
load_cli_config() in cli.py merges hardcoded defaults + user config YAML
Skin engine (hermes_cli/skin_engine.py) — data-driven CLI theming; initialized from display.skin config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
process_command() is a method on HermesCLI — dispatches on canonical command name resolved via resolve_command() from the central registry
Skill slash commands: agent/skill_commands.py scans ~/.hermes/skills/, injects as user message (not system prompt) to preserve prompt caching

Slash Command Registry (`hermes_cli/commands.py`)

All slash commands are defined in a central COMMAND_REGISTRY list of CommandDef objects. Every downstream consumer derives from this registry automatically:

CLI — process_command() resolves aliases via resolve_command(), dispatches on canonical name
Gateway — GATEWAY_KNOWN_COMMANDS frozenset for hook emission, resolve_command() for dispatch
Gateway help — gateway_help_lines() generates /help output
Telegram — telegram_bot_commands() generates the BotCommand menu
Slack — slack_subcommand_map() generates /hermes subcommand routing
Autocomplete — COMMANDS flat dict feeds SlashCommandCompleter
CLI help — COMMANDS_BY_CATEGORY dict feeds show_help()

Adding a Slash Command

Add a CommandDef entry to COMMAND_REGISTRY in hermes_cli/commands.py:

CommandDef("mycommand", "Description of what it does", "Session",
           aliases=("mc",), args_hint="[arg]"),

Add handler in HermesCLI.process_command() in cli.py:

elif canonical == "mycommand":
    self._handle_mycommand(cmd_original)

If the command is available in the gateway, add a handler in gateway/run.py:

if canonical == "mycommand":
    return await self._handle_mycommand(event)

For persistent settings, use save_config_value() in cli.py

CommandDef fields:

name — canonical name without slash (e.g. "background")
description — human-readable description
category — one of "Session", "Configuration", "Tools & Skills", "Info", "Exit"
aliases — tuple of alternative names (e.g. ("bg",))
args_hint — argument placeholder shown in help (e.g. "<prompt>", "[name]")
cli_only — only available in the interactive CLI
gateway_only — only available in messaging platforms
gateway_config_gate — config dotpath (e.g. "display.tool_progress_command"); when set on a cli_only command, the command becomes available in the gateway if the config value is truthy. GATEWAY_KNOWN_COMMANDS always includes config-gated commands so the gateway can dispatch them; help/menus only show them when the gate is open.

Adding an alias requires only adding it to the aliases tuple on the existing CommandDef. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.

TUI Architecture (ui-tui + tui_gateway)

The TUI is a full replacement for the classic (prompt_toolkit) CLI, activated via hermes --tui or HERMES_TUI=1.

Process Model

hermes --tui
  └─ Node (Ink)  ──stdio JSON-RPC──  Python (tui_gateway)
       │                                  └─ AIAgent + tools + sessions
       └─ renders transcript, composer, prompts, activity

TypeScript owns the screen. Python owns sessions, tools, model calls, and slash command logic.

Transport

Newline-delimited JSON-RPC over stdio. Requests from Ink, events from Python. See tui_gateway/server.py for the full method/event catalog.

Key Surfaces

Surface	Ink component	Gateway method
Chat streaming	`app.tsx` + `messageLine.tsx`	`prompt.submit` → `message.delta/complete`
Tool activity	`thinking.tsx`	`tool.start/progress/complete`
Approvals	`prompts.tsx`	`approval.respond` ← `approval.request`
Clarify/sudo/secret	`prompts.tsx`, `maskedPrompt.tsx`	`clarify/sudo/secret.respond`
Session picker	`sessionPicker.tsx`	`session.list/resume`
Slash commands	Local handler + fallthrough	`slash.exec` → `_SlashWorker`, `command.dispatch`
Completions	`useCompletion` hook	`complete.slash`, `complete.path`
Theming	`theme.ts` + `branding.tsx`	`gateway.ready` with skin data

Slash Command Flow

Built-in client commands (/help, /quit, /clear, /resume, /copy, /paste, etc.) handled locally in app.tsx
Everything else → slash.exec (runs in persistent _SlashWorker subprocess) → command.dispatch fallback

Dev Commands

cd ui-tui
npm install       # first time
npm run dev       # watch mode (rebuilds hermes-ink + tsx --watch)
npm start         # production
npm run build     # full build (hermes-ink + tsc)
npm run type-check # typecheck only (tsc --noEmit)
npm run lint      # eslint
npm run fmt       # prettier
npm test          # vitest

TUI in the Dashboard (`hermes dashboard` → `/chat`)

The dashboard embeds the real hermes --tui — not a rewrite. See hermes_cli/pty_bridge.py + the @app.websocket("/api/pty") endpoint in hermes_cli/web_server.py.

Browser loads web/src/pages/ChatPage.tsx, which mounts xterm.js's Terminal with the WebGL renderer, @xterm/addon-fit for container-driven resize, and @xterm/addon-unicode11 for modern wide-character widths.
/api/pty?token=… upgrades to a WebSocket; auth uses the same ephemeral _SESSION_TOKEN as REST, via query param (browsers can't set Authorization on WS upgrade).
The server spawns whatever hermes --tui would spawn, through ptyprocess (POSIX PTY — WSL works, native Windows does not).
Frames: raw PTY bytes each direction; resize via \x1b[RESIZE:<cols>;<rows>] intercepted on the server and applied with TIOCSWINSZ.

Do not re-implement the primary chat experience in React. The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded hermes --tui — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.

Structured React UI around the TUI is allowed when it is not a second chat surface. Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. ChatSidebar, ModelPickerDialog, ToolCall) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.

Electron Desktop Chat App (`apps/desktop/`)

A separate chat surface from both the classic CLI and the dashboard's embedded TUI. It is an Electron + React + nanostore renderer (@assistant-ui/react) that talks to a tui_gateway backend over JSON-RPC (requestGateway(method, params)). It does NOT embed hermes --tui — it has its own composer, transcript, and slash-command pipeline. Route desktop bugs to the hermes-desktop-app-work skill, not hermes-dashboard-work.

Slash commands in the desktop app are curated client-side, then dispatched to the backend. The pipeline:

Backend already provides everything. tui_gateway/server.py commands.catalog (empty-query list) and complete.slash (typed-query completions) both include built-in commands, user quick_commands, AND skill-derived commands (scan_skill_commands() / get_skill_commands()). The desktop app does not need a new RPC to see skills.
The renderer curates via apps/desktop/src/lib/desktop-slash-commands.ts. This is the load-bearing file. It holds DESKTOP_COMMANDS (the ~19 built-ins shown in the palette) plus block-lists for terminal-only / messaging-only / picker-owned / settings-owned / advanced commands that should NOT clutter the desktop popover.
- isDesktopSlashCommand(name) — gates execution. Returns true for built-ins AND for any non-built-in (skill / quick command), so typed extension commands run.
- isDesktopSlashSuggestion(name) — gates discovery/completion. Used by BOTH completion paths in app/chat/composer/hooks/use-slash-completions.ts (empty-query catalog filter + typed-query complete.slash filter) and by filterDesktopCommandsCatalog.
- isDesktopSlashExtensionCommand(name) — true when the command is NOT a known Hermes built-in (i.e. a skill or user quick command). Both suggestion and catalog-filter paths allow extensions through so skill commands surface in the palette. (Added when fixing "skill commands missing from the desktop slash palette" — the curated allow-list was silently dropping every skill/quick command from completions even though they executed fine when typed.)
Dispatch lives in app/session/hooks/use-prompt-actions.ts (runSlash): built-ins that the desktop owns (/skin, /help, /new, …) are handled locally or via commands.catalog; everything else goes to slash.exec, falling back to command.dispatch (which the gateway resolves into skill / alias / exec directives). A skill command resolves to {type: "skill", message} and is submitted as a normal prompt.

Rule: the desktop slash palette's curation is about hiding noise (terminal-only / messaging-only built-ins), NOT about hiding user-activated extensions. Skill commands and quick_commands are extensions the backend surfaces — they belong in completions. If you tighten desktop-slash-commands.ts, keep isDesktopSlashExtensionCommand flowing into both the suggestion and catalog-filter paths. Tests: apps/desktop/src/lib/desktop-slash-commands.test.ts (run via the repo-root vitest, since apps/desktop resolves deps from the root workspace install).

Adding New Tools

Before adding any tool, settle the footprint question first (see "The Footprint Ladder" in the Contribution Rubric): most capabilities should NOT be core tools. For custom or local-only tools, do not edit Hermes core. Use the plugin route instead: create ~/.hermes/plugins/<name>/plugin.yaml and ~/.hermes/plugins/<name>/__init__.py, then register tools with ctx.register_tool(...). Plugin toolsets are discovered automatically and can be enabled or disabled without touching tools/ or toolsets.py.

Use the built-in route below only when the user is explicitly contributing a new core Hermes tool that should ship in the base system.

Built-in/core tools require changes in 2 files:

1. Create tools/your_tool.py:

import json, os
from tools.registry import registry

def check_requirements() -> bool:
    return bool(os.getenv("EXAMPLE_API_KEY"))

def example_tool(param: str, task_id: str = None) -> str:
    return json.dumps({"success": True, "data": "..."})

registry.register(
    name="example_tool",
    toolset="example",
    schema={"name": "example_tool", "description": "...", "parameters": {...}},
    handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
    check_fn=check_requirements,
    requires_env=["EXAMPLE_API_KEY"],
)

2. Add to toolsets.py — either _HERMES_CORE_TOOLS (all platforms) or a new toolset. This step is required: auto-discovery imports the tool and registers its schema, but the tool is only exposed to an agent if its name appears in a toolset. _HERMES_CORE_TOOLS is not dead code — it's the default bundle every platform's base toolset inherits from.

Auto-discovery: any tools/*.py file with a top-level registry.register() call is imported automatically — no manual import list to maintain. Wiring into a toolset is still a deliberate, manual step.

The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.

Path references in tool schemas: If the schema description mentions file paths (e.g. default output directories), use display_hermes_home() to make them profile-aware. The schema is generated at import time, which is after _apply_profile_override() sets HERMES_HOME.

State files: If a tool stores persistent state (caches, logs, checkpoints), use get_hermes_home() for the base directory — never Path.home() / ".hermes". This ensures each profile gets its own state.

Agent-level tools (todo, memory): intercepted by run_agent.py before handle_function_call(). See tools/todo_tool.py for the pattern.

Dependency Pinning Policy

All dependencies must have upper bounds to limit supply-chain attack surface. This policy was established after the litellm compromise (PR #2796, #2810) and reinforced after the Mini Shai-Hulud worm campaign (May 2026).

Source type	Treatment	Example
PyPI package	`>=floor,<next_major`	`"httpx>=0.28.1,<1"`
Git URL	Commit SHA	`git+https://...@<40-char-sha>`
GitHub Actions	Commit SHA + comment	`uses: actions/checkout@<sha> # v4`
CI-only pip	`==exact`	`pyyaml==6.0.2`

When adding a new dependency to pyproject.toml:

Pin to >=current_version,<next_major for post-1.0 (e.g. >=1.5.0,<2).
For pre-1.0 packages, use <0.(current_minor + 2) (e.g. >=0.29,<0.32).
Never commit a bare >=X.Y.Z without a ceiling — CI and reviewers will reject it.
Run uv lock to regenerate uv.lock with hashes.

Reference: #2810 (bounds pass), #9801 (SHA pinning + audit CI).

Adding Configuration

config.yaml options:

Add to DEFAULT_CONFIG in hermes_cli/config.py
Bump _config_version (check the current value at the top of DEFAULT_CONFIG) ONLY if you need to actively migrate/transform existing user config (renaming keys, changing structure). Adding a new key to an existing section is handled automatically by the deep-merge and does NOT require a version bump.

Top-level `config.yaml` sections (non-exhaustive):

model, agent, terminal, compression, display, stt, tts, memory, security, delegation, smart_model_routing, checkpoints, auxiliary, curator, skills, gateway, logging, cron, profiles, plugins, honcho.

auxiliary holds per-task overrides for side-LLM work (curator, vision, embedding, title generation, session_search, etc.) — each task can pin its own provider/model/base_url/max_tokens/reasoning_effort. See agent/auxiliary_client.py::_resolve_auto for resolution order.

curator holds the background skill-maintenance config — enabled, interval_hours, min_idle_hours, stale_after_days, archive_after_days, backup (nested).

.env variables (SECRETS ONLY — API keys, tokens, passwords):

Add to OPTIONAL_ENV_VARS in hermes_cli/config.py with metadata:

"NEW_API_KEY": {
    "description": "What it's for",
    "prompt": "Display name",
    "url": "https://...",
    "password": True,
    "category": "tool",  # provider, tool, messaging, setting
},

Non-secret settings (timeouts, thresholds, feature flags, paths, display preferences) belong in config.yaml, not .env. If internal code needs an env var mirror for backward compatibility, bridge it from config.yaml to the env var in code (see gateway_timeout, terminal.cwd → TERMINAL_CWD).

Config loaders (three paths — know which one you're in):

Loader	Used by	Location
`load_cli_config()`	CLI mode	`cli.py` — merges CLI-specific defaults + user YAML
`load_config()`	`hermes tools`, `hermes setup`, most CLI subcommands	`hermes_cli/config.py` — merges `DEFAULT_CONFIG` + user YAML
Direct YAML load	Gateway runtime	`gateway/run.py` + `gateway/config.py` — reads user YAML raw

If you add a new key and the CLI sees it but the gateway doesn't (or vice versa), you're on the wrong loader. Check DEFAULT_CONFIG coverage.

Working directory:

CLI — uses the process's current directory (os.getcwd()).
Messaging — uses terminal.cwd from config.yaml. The gateway bridges this to the TERMINAL_CWD env var for child tools. MESSAGING_CWD has been removed — the config loader prints a deprecation warning if it's set in .env. Same for TERMINAL_CWD in .env; the canonical setting is terminal.cwd in config.yaml.

Skin/Theme System

The skin engine (hermes_cli/skin_engine.py) provides data-driven CLI visual customization. Skins are pure data — no code changes needed to add a new skin.

Architecture

hermes_cli/skin_engine.py    # SkinConfig dataclass, built-in skins, YAML loader
~/.hermes/skins/*.yaml       # User-installed custom skins (drop-in)

init_skin_from_config() — called at CLI startup, reads display.skin from config
get_active_skin() — returns cached SkinConfig for the current skin
set_active_skin(name) — switches skin at runtime (used by /skin command)
load_skin(name) — loads from user skins first, then built-ins, then falls back to default
Missing skin values inherit from the default skin automatically

What skins customize

Element	Skin Key	Used By
Banner panel border	`colors.banner_border`	`banner.py`
Banner panel title	`colors.banner_title`	`banner.py`
Banner section headers	`colors.banner_accent`	`banner.py`
Banner dim text	`colors.banner_dim`	`banner.py`
Banner body text	`colors.banner_text`	`banner.py`
Response box border	`colors.response_border`	`cli.py`
Spinner faces (waiting)	`spinner.waiting_faces`	`display.py`
Spinner faces (thinking)	`spinner.thinking_faces`	`display.py`
Spinner verbs	`spinner.thinking_verbs`	`display.py`
Spinner wings (optional)	`spinner.wings`	`display.py`
Tool output prefix	`tool_prefix`	`display.py`
Per-tool emojis	`tool_emojis`	`display.py` → `get_tool_emoji()`
Agent name	`branding.agent_name`	`banner.py`, `cli.py`
Welcome message	`branding.welcome`	`cli.py`
Response box label	`branding.response_label`	`cli.py`
Prompt symbol	`branding.prompt_symbol`	`cli.py`

Built-in skins

default — Classic Hermes gold/kawaii (the current look)
ares — Crimson/bronze war-god theme with custom spinner wings
mono — Clean grayscale monochrome
slate — Cool blue developer-focused theme

Adding a built-in skin

Add to _BUILTIN_SKINS dict in hermes_cli/skin_engine.py:

"mytheme": {
    "name": "mytheme",
    "description": "Short description",
    "colors": { ... },
    "spinner": { ... },
    "branding": { ... },
    "tool_prefix": "┊",
},

User skins (YAML)

Users create ~/.hermes/skins/<name>.yaml:

name: cyberpunk
description: Neon-soaked terminal theme

colors:
  banner_border: "#FF00FF"
  banner_title: "#00FFFF"
  banner_accent: "#FF1493"

spinner:
  thinking_verbs: ["jacking in", "decrypting", "uploading"]
  wings:
    - ["⟨⚡", "⚡⟩"]

branding:
  agent_name: "Cyber Agent"
  response_label: " ⚡ Cyber "

tool_prefix: "▏"

Activate with /skin cyberpunk or display.skin: cyberpunk in config.yaml.

Plugins

Hermes has two plugin surfaces. Both live under plugins/ in the repo so repo-shipped plugins can be discovered alongside user-installed ones in ~/.hermes/plugins/ and pip-installed entry points.

General plugins (`hermes_cli/plugins.py` + `plugins/<name>/`)

PluginManager discovers plugins from ~/.hermes/plugins/, ./.hermes/plugins/, and pip entry points. Each plugin exposes a register(ctx) function that can:

Register Python-callback lifecycle hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end
Register new tools via ctx.register_tool(...)
Register CLI subcommands via ctx.register_cli_command(...) — the plugin's argparse tree is wired into hermes at startup so hermes <pluginname> <subcmd> works with no change to main.py

Hooks are invoked from model_tools.py (pre/post tool) and run_agent.py (lifecycle). Discovery timing pitfall: discover_plugins() only runs as a side effect of importing model_tools.py. Code paths that read plugin state without importing model_tools.py first must call discover_plugins() explicitly (it's idempotent).

Memory-provider plugins (`plugins/memory/<name>/`)

Separate discovery system for pluggable memory backends. Current built-in providers include honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb.

Each provider implements the MemoryProvider ABC (see agent/memory_provider.py) and is orchestrated by agent/memory_manager.py. Lifecycle hooks include sync_turn(turn_messages), prefetch(query), shutdown(), and optional post_setup(hermes_home, config) for setup-wizard integration.

CLI commands via plugins/memory/<name>/cli.py: if a memory plugin defines register_cli(subparser), discover_plugin_cli_commands() finds it at argparse setup time and wires it into hermes <plugin>. The framework only exposes CLI commands for the currently active memory provider (read from memory.provider in config.yaml), so disabled providers don't clutter hermes --help.

Rule (Teknium, May 2026): plugins MUST NOT modify core files (run_agent.py, cli.py, gateway/run.py, hermes_cli/main.py, etc.). If a plugin needs a capability the framework doesn't expose, expand the generic plugin surface (new hook, new ctx method) — never hardcode plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded honcho argparse from main.py for exactly this reason.

No new in-tree memory providers (policy, May 2026): the set of built-in memory providers under plugins/memory/ is closed. New memory backends must ship as standalone plugin repos that users install into ~/.hermes/plugins/ (or via pip entry points) — they implement the same MemoryProvider ABC, register through the same discovery path, and integrate via hermes memory setup / post_setup() without landing in this tree. PRs that add a new directory under plugins/memory/ will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.

Model-provider plugins (`plugins/model-providers/<name>/`)

Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …) ships as a plugin here. Each plugin's __init__.py calls providers.register_provider(ProviderProfile(...)) at module load. providers/__init__.py._discover_providers() is a lazy, separate discovery system — scanned on first get_provider_profile() or list_providers() call, NOT by the general PluginManager.

Scan order:

Bundled: <repo>/plugins/model-providers/<name>/
User: $HERMES_HOME/plugins/model-providers/<name>/
Legacy: <repo>/providers/<name>.py (back-compat)

User plugins of the same name override bundled ones — register_provider() is last-writer-wins. This lets third parties swap out any built-in profile without a repo patch.

The general PluginManager records kind: model-provider manifests but does NOT import them (would double-instantiate ProviderProfile). Plugins without an explicit kind: get auto-coerced via a source-text heuristic (register_provider + ProviderProfile in __init__.py).

Full authoring guide: website/docs/developer-guide/model-provider-plugin.md.

Dashboard / context-engine / image-gen plugin directories

plugins/context_engine/, plugins/image_gen/, etc. follow the same pattern (ABC + orchestrator + per-plugin directory). Context engines plug into agent/context_engine.py; image-gen providers into agent/image_gen_provider.py. Reference / docs-companion plugins (example-dashboard, strike-freedom-cockpit, plugin-llm-example, plugin-llm-async-example) live in the hermes-example-plugins companion repo, not in this tree.

Skills

Two parallel surfaces:

skills/ — built-in skills shipped and loadable by default. Organized by category directories (e.g. skills/github/, skills/mlops/).
optional-skills/ — heavier or niche skills shipped with the repo but NOT active by default. Installed explicitly via hermes skills install official/<category>/<skill>. Adapter lives in tools/skills_hub.py (OptionalSkillSource). Categories include autonomous-ai-agents, blockchain, communication, creative, devops, email, health, mcp, migration, mlops, productivity, research, security, web-development.

When reviewing skill PRs, check which directory they target — heavy-dep or niche skills belong in optional-skills/.

SKILL.md frontmatter

Standard fields: name, description, version, author, license, platforms (OS-gating list: [macos], [linux, macos], ...), metadata.hermes.tags, metadata.hermes.category, metadata.hermes.related_skills, metadata.hermes.config (config.yaml settings the skill needs — stored under skills.config.<key>, prompted during setup, injected at load time).

Top-level tags: and category: are also accepted and mirrored from metadata.hermes.* by the loader.

Skill authoring standards (HARDLINE)

Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.

description ≤ 60 characters, one sentence, ends with a period. Long descriptions bloat skill listings and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
```
import re, pathlib
m = re.search(r'^description: (.*)$',
              pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
              re.MULTILINE)
assert len(m.group(1)) <= 60, len(m.group(1))
```
Tools referenced in SKILL.md prose must be native Hermes tools or MCP servers the skill explicitly expects. When the skill needs a capability, point at the proper tool by name in backticks (`terminal`, `web_extract`, `read_file`, `patch`, `search_files`, `vision_analyze`, `browser_navigate`, `delegate_task`, etc.). Do NOT name shell utilities the agent already has wrapped — grep → search_files, cat/head/tail → read_file, sed/awk → patch, find/ls → search_files target='files'. If the skill depends on an MCP server, name the MCP server and document the expected setup in ## Prerequisites. Anything else (third-party CLIs, shell pipelines, etc.) is fair game inside script files but should not be the headline interaction surface in the prose.
platforms: gating audited against actual script imports. Skills that use POSIX-only primitives (fcntl, termios, os.setsid, os.kill(pid, 0) for liveness, /proc, /tmp hardcoded, signal.SIGKILL, bash heredocs, osascript, apt, systemctl) must declare their supported platforms. Default posture: try to fix it cross-platform first — tempfile.gettempdir, pathlib.Path, psutil.pid_exists, Python-level filtering instead of grep. Gate to a narrower set only when the dependency is genuinely platform-bound.
author credits the human contributor first. For external contributions, the contributor's real name + GitHub handle goes first; "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author (because they used Hermes to draft the skill), replace it with their actual name — credit the human, not the tool.
SKILL.md body uses the modern section order. # <Skill> Skill title, 2-3 sentence intro stating what it does and doesn't do, ## When to Use, ## Prerequisites, ## How to Run, ## Quick Reference, ## Procedure, ## Pitfalls, ## Verification. Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already in ## Prerequisites.
Scripts go in scripts/, references in references/, templates in templates/. Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference it from SKILL.md by path relative to the skill directory.
Tests live at tests/skills/test_<skill>_skill.py and use only stdlib + pytest + unittest.mock. No live network calls. Run via scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q.
.env.example additions are isolated to a clearly delimited block. Don't touch the surrounding file — contributor-supplied .env.example versions are usually stale and edits outside the skill's own block must be dropped during salvage.

The full salvage / modernization checklist for external skill PRs lives in the hermes-agent-dev skill at references/new-skill-pr-salvage.md — load it before polishing contributor skill PRs.

Toolsets

All toolsets are defined in toolsets.py as a single TOOLSETS dict. Each platform's adapter picks a base toolset (e.g. Telegram uses "messaging"); _HERMES_CORE_TOOLS is the default bundle most platforms inherit from.

Current toolset keys: browser, clarify, code_execution, cronjob, debugging, delegation, discord, discord_admin, feishu_doc, feishu_drive, file, homeassistant, image_gen, kanban, memory, messaging, moa, rl, safe, search, session_search, skills, spotify, terminal, todo, tts, video, vision, web, yuanbao.

Enable/disable per platform via hermes tools (the curses UI) or the tools.<platform>.enabled / tools.<platform>.disabled lists in config.yaml.

Delegation (`delegate_task`)

tools/delegate_tool.py spawns a subagent with an isolated context + terminal session. Synchronous: the parent waits for the child's summary before continuing its own loop — if the parent is interrupted, the child is cancelled.

Two shapes:

Single: pass goal (+ optional context, toolsets).
Batch (parallel): pass tasks: [...] — each gets its own subagent running concurrently. Concurrency is capped by delegation.max_concurrent_children (default 3).

Roles:

role="leaf" (default) — focused worker. Cannot call delegate_task, clarify, memory, send_message, execute_code.
role="orchestrator" — retains delegate_task so it can spawn its own workers. Gated by delegation.orchestrator_enabled (default true) and bounded by delegation.max_spawn_depth (default 2).

Key config knobs (under delegation: in config.yaml): max_concurrent_children, max_spawn_depth, child_timeout_seconds, orchestrator_enabled, subagent_auto_approve, inherit_mcp_toolsets, max_iterations.

Synchronicity rule: delegate_task is not durable. For long-running work that must outlive the current turn, use cronjob or terminal(background=True, notify_on_complete=True) instead.

Curator (skill lifecycle)

Background skill-maintenance system that tracks usage on agent-created skills and auto-archives stale ones. Users never lose skills; archives go to ~/.hermes/skills/.archive/ and are restorable.

Core: agent/curator.py (review loop, auto-transitions, LLM review prompt) + agent/curator_backup.py (pre-run tar.gz snapshots).
CLI: hermes_cli/curator.py wires hermes curator <verb> where verbs are: status, run, pause, resume, pin, unpin, archive, restore, prune, backup, rollback.
Telemetry: tools/skill_usage.py owns the sidecar ~/.hermes/skills/.usage.json — per-skill use_count, view_count, patch_count, last_activity_at, state (active / stale / archived), pinned.

Invariants:

Curator only touches skills with created_by: "agent" provenance — bundled + hub-installed skills are off-limits.
Never deletes; max destructive action is archive.
Pinned skills are exempt from every auto-transition and from the LLM review pass.
skill_manage(action="delete") refuses pinned skills; patch/edit/ write_file/remove_file go through so the agent can keep improving pinned skills.

Config section (curator: in config.yaml): enabled, interval_hours, min_idle_hours, stale_after_days, archive_after_days, backup.*.

Full user-facing docs: website/docs/user-guide/features/curator.md.

Cron (scheduled jobs)

cron/jobs.py (job store) + cron/scheduler.py (tick loop). Agents schedule jobs via the cronjob tool; users via hermes cron <verb> (list, add, edit, pause, resume, run, remove) or the /cron slash command.

Supported schedule formats:

Duration: "30m", "2h", "1d"
"every" phrase: "every 2h", "every monday 9am"
5-field cron expression: "0 9 * * *"
ISO timestamp (one-shot): "2026-06-01T09:00:00Z"

Per-job fields include skills (load specific skills), model / provider overrides, script (pre-run data-collection script whose stdout is injected into the prompt; no_agent=True turns the script into the entire job), context_from (chain job A's last output into job B's prompt), workdir (run in a specific directory with its AGENTS.md/CLAUDE.md loaded), and multi-platform delivery.

Hardening invariants:

3-minute hard interrupt on cron sessions — runaway agent loops cannot monopolize the scheduler.
Catchup window: half the job's period, clamped to 120s–2h.
Grace window: 120s for one-shot jobs whose fire time was missed.
File lock at ~/.hermes/cron/.tick.lock prevents duplicate ticks across processes.
Cron sessions pass skip_memory=True by default; memory providers intentionally do not run during cron.

Cron deliveries are not mirrored into the target gateway session — they land in their own cron session with a header/footer frame so the main conversation's message-role alternation stays intact.

Kanban (multi-agent work queue)

Durable SQLite-backed board that lets multiple profiles / workers collaborate on shared tasks. Users drive it via hermes kanban <verb>; workers spawned by the dispatcher drive it via a dedicated kanban_* toolset so their schema footprint is zero when they're not inside a kanban task.

CLI: hermes_cli/kanban.py wires hermes kanban with verbs init, create, list (alias ls), show, assign, link, unlink, comment, complete, block, unblock, archive, tail, plus less-commonly-used watch, stats, runs, log, assignees, heartbeat, notify-*, dispatch, daemon, gc.
Worker/orchestrator toolset: tools/kanban_tools.py exposes kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link; profiles that explicitly enable the kanban toolset outside a dispatcher-spawned task also get kanban_list and kanban_unblock for board routing.
Dispatcher: long-lived loop that (default every 60s) reclaims stale claims, promotes ready tasks, atomically claims, and spawns assigned profiles. Runs inside the gateway by default via kanban.dispatch_in_gateway: true.
Plugin assets: plugins/kanban/dashboard/ (web UI) + plugins/kanban/systemd/ (hermes-kanban-dispatcher.service for standalone dispatcher deployment).

Isolation model:

Board is the hard boundary — workers are spawned with HERMES_KANBAN_BOARD pinned in their env so they can't see other boards.
Tenant is a soft namespace within a board — one specialist fleet can serve multiple businesses with workspace-path + memory-key isolation.
After kanban.failure_limit consecutive non-success attempts on the same task (default: 2), the dispatcher auto-blocks it to prevent spin loops.

Full user-facing docs: website/docs/user-guide/features/kanban.md.

Important Policies

Prompt Caching Must Not Break

Hermes-Agent ensures caching remains valid throughout a conversation. Do NOT implement changes that would:

Alter past context mid-conversation
Change toolsets mid-conversation
Reload memories or rebuild system prompts mid-conversation

Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.

Slash commands that mutate system-prompt state (skills, tools, memory, etc.) must be cache-aware: default to deferred invalidation (change takes effect next session), with an opt-in --now flag for immediate invalidation. See /skills install --now for the canonical pattern.

Background Process Notifications (Gateway)

When terminal(background=true, notify_on_complete=true) is used, the gateway runs a watcher that detects process completion and triggers a new agent turn. Control verbosity of background process messages with display.background_process_notifications in config.yaml (or HERMES_BACKGROUND_NOTIFICATIONS env var):

all — running-output updates + final message (default)
result — only the final completion message
error — only the final message when exit code != 0
off — no watcher messages at all

Profiles: Multi-Instance Support

Hermes supports profiles — multiple fully isolated instances, each with its own HERMES_HOME directory (config, API keys, memory, sessions, skills, gateway, etc.).

The core mechanism: _apply_profile_override() in hermes_cli/main.py sets HERMES_HOME before any module imports. All get_hermes_home() references automatically scope to the active profile.

Rules for profile-safe code

Use get_hermes_home() for all HERMES_HOME paths. Import from hermes_constants. NEVER hardcode ~/.hermes or Path.home() / ".hermes" in code that reads/writes state.

# GOOD
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"

# BAD — breaks profiles
config_path = Path.home() / ".hermes" / "config.yaml"

Use display_hermes_home() for user-facing messages. Import from hermes_constants. This returns ~/.hermes for default or ~/.hermes/profiles/<name> for profiles.

# GOOD
from hermes_constants import display_hermes_home
print(f"Config saved to {display_hermes_home()}/config.yaml")

# BAD — shows wrong path for profiles
print("Config saved to ~/.hermes/config.yaml")

Module-level constants are fine — they cache get_hermes_home() at import time, which is AFTER _apply_profile_override() sets the env var. Just use get_hermes_home(), not Path.home() / ".hermes".

Tests that mock Path.home() must also set HERMES_HOME — since code now uses get_hermes_home() (reads env var), not Path.home() / ".hermes":

with patch.object(Path, "home", return_value=tmp_path), \
     patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
    ...

Gateway platform adapters should use token locks — if the adapter connects with a unique credential (bot token, API key), call acquire_scoped_lock() from gateway.status in the connect()/start() method and release_scoped_lock() in disconnect()/stop(). This prevents two profiles from using the same credential. See gateway/platforms/telegram.py for the canonical pattern.
Profile operations are HOME-anchored, not HERMES_HOME-anchored — _get_profiles_root() returns Path.home() / ".hermes" / "profiles", NOT get_hermes_home() / "profiles". This is intentional — it lets hermes -p coder profile list see all profiles regardless of which one is active.

Known Pitfalls

DO NOT hardcode `~/.hermes` paths

Use get_hermes_home() from hermes_constants for code paths. Use display_hermes_home() for user-facing print/log messages. Hardcoding ~/.hermes breaks profiles — each profile has its own HERMES_HOME directory. This was the source of 5 bugs fixed in PR #3575.

DO NOT introduce new `simple_term_menu` usage

Existing call sites in hermes_cli/main.py remain for legacy fallback only; the preferred UI is curses (stdlib) because simple_term_menu has ghost-duplication rendering bugs in tmux/iTerm2 with arrow keys. New interactive menus must use hermes_cli/curses_ui.py — see hermes_cli/tools_config.py for the canonical pattern.

DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code

Leaks as literal ?[K text under prompt_toolkit's patch_stdout. Use space-padding: f"\r{line}{' ' * pad}".

`_last_resolved_tool_names` is a process-global in `model_tools.py`

_run_single_child() in delegate_tool.py saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.

DO NOT hardcode cross-tool references in schema descriptions

Tool schema descriptions must not mention tools from other toolsets by name (e.g., browser_navigate saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in get_tool_definitions() in model_tools.py — see the browser_navigate / execute_code post-processing blocks for the pattern.

The gateway has TWO message guards — both must bypass approval/control commands

When an agent is running, messages pass through two sequential guards: (1) base adapter (gateway/platforms/base.py) queues messages in _pending_messages when session_key in self._active_sessions, and (2) gateway runner (gateway/run.py) intercepts /stop, /new, /queue, /status, /approve, /deny before they reach running_agent.interrupt(). Any new command that must reach the runner while the agent is blocked (e.g. approval prompts) MUST bypass BOTH guards and be dispatched inline, not via _process_message_background() (which races session lifecycle).

Squash merges from stale branches silently revert recent fixes

Before squash-merging a PR, ensure the branch is up to date with main (git fetch origin main && git reset --hard origin/main in the worktree, then re-apply the PR's commits). A stale branch's version of an unrelated file will silently overwrite recent fixes on main when squashed. Verify with git diff HEAD~1..HEAD after merging — unexpected deletions are a red flag.

Don't wire in dead code without E2E validation

Unused code that was never shipped was dead for a reason. Before wiring an unused module into a live code path, E2E test the real resolution chain with actual imports (not mocks) against a temp HERMES_HOME.

Tests must not write to `~/.hermes/`

The _isolate_hermes_home autouse fixture in tests/conftest.py redirects HERMES_HOME to a temp dir. Never hardcode ~/.hermes/ paths in tests.

Profile tests: When testing profile features, also mock Path.home() so that _get_profiles_root() and _get_default_hermes_home() resolve within the temp dir. Use the pattern from tests/hermes_cli/test_profiles.py:

@pytest.fixture
def profile_env(tmp_path, monkeypatch):
    home = tmp_path / ".hermes"
    home.mkdir()
    monkeypatch.setattr(Path, "home", lambda: tmp_path)
    monkeypatch.setenv("HERMES_HOME", str(home))
    return home

Testing

ALWAYS use scripts/run_tests.sh — do not call pytest directly. The script enforces hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8, -n auto xdist workers, in-tree subprocess-isolation plugin). Direct pytest on a 16+ core developer machine with API keys set diverges from CI in ways that have caused multiple "works locally, fails in CI" incidents (and the reverse).

scripts/run_tests.sh                                  # full suite, CI-parity
scripts/run_tests.sh tests/gateway/                   # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags
scripts/run_tests.sh --no-isolate tests/foo/          # disable subprocess isolation (faster, for debugging)

Subprocess-per-test isolation

Every test runs in a freshly-spawned Python subprocess via the in-tree plugin at tests/_isolate_plugin.py. This means module-level dicts/sets and ContextVars from one test cannot leak into the next — the historic _reset_module_state autouse fixture is gone.

Implementation notes:

The plugin uses multiprocessing.get_context("spawn"), which works on Linux, macOS, and Windows alike (POSIX fork is not used).
Per-test overhead is ~0.5–1.0s (Python startup + pytest collection). xdist parallelism amortizes this across cores; on a 20-core box the full suite finishes in roughly the same wall time as before, but flake-free.
isolate_timeout (configured in pyproject.toml) caps each test at 30s. Hangs are killed and surfaced as a failure report.
Pass --no-isolate to disable isolation — useful when debugging a single test interactively, or when you specifically want to verify state leakage.
The plugin disables itself in child processes (sentinel envvar HERMES_ISOLATE_CHILD=1), so there's no fork-bomb risk.

Why the wrapper (and why the old "just call pytest" doesn't work)

Five real sources of local-vs-CI drift the script closes:

	Without wrapper	With wrapper
Provider API keys	Whatever is in your env (auto-detects pool)	All `_API_KEY`/`_TOKEN`/etc. unset
HOME / `~/.hermes/`	Your real config+auth.json	Temp dir per test
Timezone	Local TZ (PDT etc.)	UTC
Locale	Whatever is set	C.UTF-8
xdist workers	`-n auto` = all cores	`-n auto` (safe — subprocess isolation prevents cross-worker flakes)

tests/conftest.py also enforces points 1-4 as an autouse fixture so ANY pytest invocation (including IDE integrations) gets hermetic behavior — but the wrapper is belt-and-suspenders.

Running without the wrapper (only if you must)

If you can't use the wrapper (e.g. inside an IDE that shells pytest directly), at minimum activate the venv. The isolation plugin loads automatically from addopts in pyproject.toml, so you get the same per-test process isolation either way.

source .venv/bin/activate   # or: source venv/bin/activate
python -m pytest tests/ -q

If you need to bypass isolation for fast feedback while debugging:

python -m pytest tests/agent/test_foo.py -q --no-isolate

Always run the full suite before pushing changes.

Don't write change-detector tests

A test is a change-detector if it fails whenever data that is expected to change gets updated — model catalogs, config version numbers, enumeration counts, hardcoded lists of provider models. These tests add no behavioral coverage; they just guarantee that routine source updates break CI and cost engineering time to "fix."

Do not write:

# catalog snapshot — breaks every model release
assert "gemini-2.5-pro" in _PROVIDER_MODELS["gemini"]
assert "MiniMax-M2.7" in models

# config version literal — breaks every schema bump
assert DEFAULT_CONFIG["_config_version"] == 21

# enumeration count — breaks every time a skill/provider is added
assert len(_PROVIDER_MODELS["huggingface"]) == 8

Do write:

# behavior: does the catalog plumbing work at all?
assert "gemini" in _PROVIDER_MODELS
assert len(_PROVIDER_MODELS["gemini"]) >= 1

# behavior: does migration bump the user's version to current latest?
assert raw["_config_version"] == DEFAULT_CONFIG["_config_version"]

# invariant: no plan-only model leaks into the legacy list
assert not (set(moonshot_models) & coding_plan_only_models)

# invariant: every model in the catalog has a context-length entry
for m in _PROVIDER_MODELS["huggingface"]:
    assert m.lower() in DEFAULT_CONTEXT_LENGTHS_LOWER

The rule: if the test reads like a snapshot of current data, delete it. If it reads like a contract about how two pieces of data must relate, keep it. When a PR adds a new provider/model and you want a test, make the test assert the relationship (e.g. "catalog entries all have context lengths"), not the specific names.

Reviewers should reject new change-detector tests; authors should convert them into invariants before re-requesting review.

68 KiB Raw Blame History Unescape Escape