mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

emozilla f49afd3122 feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard

Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.

Architecture:

    browser <Terminal> (xterm.js)
           │  onData ───► ws.send(keystrokes)
           │  onResize ► ws.send('\x1b[RESIZE:cols;rows]')
           │  write   ◄── ws.onmessage (PTY bytes)
           ▼
    FastAPI /api/pty (token-gated, loopback-only)
           ▼
    PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent

Components
----------

hermes_cli/pty_bridge.py
  Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
  master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
  inherently byte-oriented and UTF-8 boundaries may land mid-read),
  non-blocking select-based reads, TIOCSWINSZ resize, idempotent
  SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
  is WSL-supported only).

hermes_cli/web_server.py
  @app.websocket("/api/pty") endpoint gated by the existing
  _SESSION_TOKEN (via ?token= query param since browsers can't set
  Authorization on WS upgrades). Loopback-only enforcement. Reader task
  uses run_in_executor to pump PTY bytes without blocking the event
  loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
  before forwarding to the PTY. The endpoint resolves the TUI argv
  through a _resolve_chat_argv hook so tests can inject fake commands
  without building the real TUI.

Tests
-----

tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.

tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.

96 tests pass under scripts/run_tests.sh.

(cherry picked from commit 29b337bca7)

feat(web): add Chat tab with xterm.js terminal + Sessions resume button

(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
 against current main: BUILTIN_ROUTES table + plugin slot layout)

fix(tui): replace OSC 52 jargon in /copy confirmation

When the user ran /copy successfully, Ink confirmed with:

  sent OSC52 copy sequence (terminal support required)

That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.

Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:

  copied 1482 chars

(cherry picked from commit a0701b1d5a)

docs: document the dashboard Chat tab

AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'

website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).

(cherry picked from commit 2c2e32cc45)

feat(tui-gateway): transport-aware dispatch + WebSocket sidecar

Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.

- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
  + module-level `StdioTransport` fallback. The stdio stream resolves
  through a lambda so existing tests that monkey-patch `_real_stdout`
  keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
  endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
  - `write_json` routes via session transport (for async events) →
    contextvar transport (for in-request writes) → stdio fallback.
  - `dispatch(req, transport=None)` binds the transport for the request
    lifetime and propagates it to pool workers via `contextvars.copy_context`
    so async handlers don't lose their sink.
  - `_init_session` and the manual-session create path stash the
    request's transport so out-of-band events (subagent.complete, etc.)
    fan out to the right peer.

`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.

feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal

Composes the two transports into a single Chat tab:

  ┌─────────────────────────────────────────┬──────────────┐
  │  xterm.js / PTY  (emozilla #13379)      │ ChatSidebar  │
  │  the literal hermes --tui process       │  /api/ws     │
  └─────────────────────────────────────────┴──────────────┘
        terminal bytes                          structured events

The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:

  • model + provider badge with connection state (click → switch)
  • running tool-call list (driven by tool.start / tool.progress /
    tool.complete events)
  • model picker dialog (gateway-driven, reuses ModelPickerDialog)

The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.

- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
  Owns its GatewayClient, drives the model picker through
  `slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
  (`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
  guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.

Co-authored-by: emozilla <emozilla@nousresearch.com>

refactor(web): /clean pass on ChatSidebar + ChatPage lint debt

- ChatSidebar: lift gw out of useRef into a useMemo derived from a
  reconnect counter. React 19's react-hooks/refs and react-hooks/
  set-state-in-effect rules both fire when you touch a ref during
  render or call setState from inside a useEffect body. The
  counter-derived gw is the canonical pattern for "external resource
  that needs to be replaceable on user action" — re-creating the
  client comes from bumping `version`, the effect just wires + tears
  down. Drops the imperative `gwRef.current = …` reassign in
  reconnect, drops the truthy ref guard in JSX. modelLabel +
  banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
  so the effect body doesn't have to setState on first run. Drops
  the unused react-hooks/exhaustive-deps eslint-disable. Adds a
  scoped no-control-regex disable on the SGR mouse parser regex
  (the \\x1b is intentional for xterm escape sequences).

All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.

Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.

chore: uptick

fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary

The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.

Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.

feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast

The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.

Cross-process forwarding:

- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
  (event_publisher.py, sync websockets client). entry.py installs the
  tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
  every dispatcher emit to a back-WS without disturbing Ink's stdio
  handshake.

- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
  (subscriber) endpoints with a per-channel registry. /api/pty now
  accepts ?channel= and propagates the sidecar URL via env. start_server
  also stashes app.state.bound_port so the URL is constructable.

- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
  passes it to /api/pty and as a prop to ChatSidebar.

- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
  tool.start/progress/complete back into the ToolCall list. Restores
  the ToolCall primitive.

Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).

fix(web): address Copilot review on #14890

Five threads, all real:

- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
  the open handshake.  Server emits `gateway.ready` immediately after
  accept, so a listener attached after the open promise could race past
  the initial skin payload and lose it.

- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
  WS into the existing error banner.  4401/4403 (auth/loopback reject)
  surface as a "reload the page" message; mid-stream drops surface as
  "events feed disconnected" with the existing reconnect button.  Clean
  unmount closes (1000/1001) stay silent.

- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
  ptyprocess lives in the `pty` extra, not `web`.  Switch to
  `hermes-agent[web,pty]` in both prerequisite blocks.

- AGENTS.md: previous "never add a parallel React chat surface" guidance
  was overbroad and contradicted this PR's sidebar.  Tightened to forbid
  re-implementing the transcript/composer/PTY terminal while explicitly
  allowing structured supporting widgets (sidebar / model picker /
  inspectors), matching the actual architecture.

- web/package-lock.json: regenerated cleanly so the wterm sibling
  workspace paths (extraneous machine-local entries) stop polluting CI.

Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.

refactor(web): /clean pass on ChatSidebar events handler

Spotted in the round-2 review:

- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
  fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
  hit the "unexpected drop" branch.  Track `unmounting` in the effect
  scope and gate the banner through a `surface()` helper so cleanup
  closes stay silent.

- DRY the duplicated "events feed disconnected" string into a local
  const used by both the error and close handlers.

- Drop the `opened` flag (no longer needed once the unmount guard is
  the source of truth for "is this an expected close?").

2026-04-24 10:51:49 -04:00

35 KiB

Raw Blame History

Hermes Agent - Development Guide

Instructions for AI coding assistants and developers working on the hermes-agent codebase.

Development Environment

# Prefer .venv; fall back to venv if that's what your checkout has.
source .venv/bin/activate   # or: source venv/bin/activate

scripts/run_tests.sh probes .venv first, then venv, then $HOME/.hermes/hermes-agent/venv (for worktrees that share a venv with the main checkout).

Project Structure

File counts shift constantly — don't treat the tree below as exhaustive. The canonical source is the filesystem. The notes call out the load-bearing entry points you'll actually edit.

hermes-agent/
├── run_agent.py          # AIAgent class — core conversation loop (~12k LOC)
├── model_tools.py        # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py                # HermesCLI class — interactive CLI orchestrator (~11k LOC)
├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
├── hermes_constants.py   # get_hermes_home(), display_hermes_home() — profile-aware paths
├── hermes_logging.py     # setup_logging() — agent.log / errors.log / gateway.log (profile-aware)
├── batch_runner.py       # Parallel batch processing
├── agent/                # Agent internals (provider adapters, memory, caching, compression, etc.)
├── hermes_cli/           # CLI subcommands, setup wizard, plugins loader, skin engine
├── tools/                # Tool implementations — auto-discovered via tools/registry.py
│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/              # Messaging gateway — run.py + session.py + platforms/
│   ├── platforms/        # Adapter per platform (telegram, discord, slack, whatsapp,
│   │                     #   homeassistant, signal, matrix, mattermost, email, sms,
│   │                     #   dingtalk, wecom, weixin, feishu, qqbot, bluebubbles,
│   │                     #   webhook, api_server, ...). See ADDING_A_PLATFORM.md.
│   └── builtin_hooks/    # Always-registered gateway hooks (boot-md, ...)
├── plugins/              # Plugin system (see "Plugins" section below)
│   ├── memory/           # Memory-provider plugins (honcho, mem0, supermemory, ...)
│   ├── context_engine/   # Context-engine plugins
│   └── <others>/         # Dashboard, image-gen, disk-cleanup, examples, ...
├── optional-skills/      # Heavier/niche skills shipped but NOT active by default
├── skills/               # Built-in skills bundled with the repo
├── ui-tui/               # Ink (React) terminal UI — `hermes --tui`
│   └── src/              # entry.tsx, app.tsx, gatewayClient.ts + app/components/hooks/lib
├── tui_gateway/          # Python JSON-RPC backend for the TUI
├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
├── cron/                 # Scheduler — jobs.py, scheduler.py
├── environments/         # RL training environments (Atropos)
├── scripts/              # run_tests.sh, release.py, auxiliary scripts
├── website/              # Docusaurus docs site
└── tests/                # Pytest suite (~15k tests across ~700 files as of Apr 2026)

User config: ~/.hermes/config.yaml (settings), ~/.hermes/.env (API keys only). Logs: ~/.hermes/logs/ — agent.log (INFO+), errors.log (WARNING+), gateway.log when running the gateway. Profile-aware via get_hermes_home(). Browse with hermes logs [--follow] [--level ...] [--session ...].

File Dependency Chain

tools/registry.py  (no deps — imported by all tool files)
       ↑
tools/*.py  (each calls registry.register() at import time)
       ↑
model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
run_agent.py, cli.py, batch_runner.py, environments/

AIAgent Class (run_agent.py)

The real AIAgent.__init__ takes ~60 parameters (credentials, routing, callbacks, session context, budget, credential pool, etc.). The signature below is the minimum subset you'll usually touch — read run_agent.py for the full list.

class AIAgent:
    def __init__(self,
        base_url: str = None,
        api_key: str = None,
        provider: str = None,
        api_mode: str = None,              # "chat_completions" | "codex_responses" | ...
        model: str = "",                   # empty → resolved from config/provider later
        max_iterations: int = 90,          # tool-calling iterations (shared with subagents)
        enabled_toolsets: list = None,
        disabled_toolsets: list = None,
        quiet_mode: bool = False,
        save_trajectories: bool = False,
        platform: str = None,              # "cli", "telegram", etc.
        session_id: str = None,
        skip_context_files: bool = False,
        skip_memory: bool = False,
        credential_pool=None,
        # ... plus callbacks, thread/user/chat IDs, iteration_budget, fallback_model,
        # checkpoints config, prefill_messages, service_tier, reasoning_config, etc.
    ): ...

    def chat(self, message: str) -> str:
        """Simple interface — returns final response string."""

    def run_conversation(self, user_message: str, system_message: str = None,
                         conversation_history: list = None, task_id: str = None) -> dict:
        """Full interface — returns dict with final_response + messages."""

Agent Loop

The core loop is inside run_conversation() — entirely synchronous, with interrupt checks, budget tracking, and a one-turn grace call:

while (api_call_count < self.max_iterations and self.iteration_budget.remaining > 0) \
        or self._budget_grace_call:
    if self._interrupt_requested: break
    response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
    if response.tool_calls:
        for tool_call in response.tool_calls:
            result = handle_function_call(tool_call.name, tool_call.args, task_id)
            messages.append(tool_result_message(result))
        api_call_count += 1
    else:
        return response.content

Messages follow OpenAI format: {"role": "system/user/assistant/tool", ...}. Reasoning content is stored in assistant_msg["reasoning"].

CLI Architecture (cli.py)

Rich for banner/panels, prompt_toolkit for input with autocomplete
KawaiiSpinner (agent/display.py) — animated faces during API calls, ┊ activity feed for tool results
load_cli_config() in cli.py merges hardcoded defaults + user config YAML
Skin engine (hermes_cli/skin_engine.py) — data-driven CLI theming; initialized from display.skin config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
process_command() is a method on HermesCLI — dispatches on canonical command name resolved via resolve_command() from the central registry
Skill slash commands: agent/skill_commands.py scans ~/.hermes/skills/, injects as user message (not system prompt) to preserve prompt caching

Slash Command Registry (`hermes_cli/commands.py`)

All slash commands are defined in a central COMMAND_REGISTRY list of CommandDef objects. Every downstream consumer derives from this registry automatically:

CLI — process_command() resolves aliases via resolve_command(), dispatches on canonical name
Gateway — GATEWAY_KNOWN_COMMANDS frozenset for hook emission, resolve_command() for dispatch
Gateway help — gateway_help_lines() generates /help output
Telegram — telegram_bot_commands() generates the BotCommand menu
Slack — slack_subcommand_map() generates /hermes subcommand routing
Autocomplete — COMMANDS flat dict feeds SlashCommandCompleter
CLI help — COMMANDS_BY_CATEGORY dict feeds show_help()

Adding a Slash Command

Add a CommandDef entry to COMMAND_REGISTRY in hermes_cli/commands.py:

CommandDef("mycommand", "Description of what it does", "Session",
           aliases=("mc",), args_hint="[arg]"),

Add handler in HermesCLI.process_command() in cli.py:

elif canonical == "mycommand":
    self._handle_mycommand(cmd_original)

If the command is available in the gateway, add a handler in gateway/run.py:

if canonical == "mycommand":
    return await self._handle_mycommand(event)

For persistent settings, use save_config_value() in cli.py

CommandDef fields:

name — canonical name without slash (e.g. "background")
description — human-readable description
category — one of "Session", "Configuration", "Tools & Skills", "Info", "Exit"
aliases — tuple of alternative names (e.g. ("bg",))
args_hint — argument placeholder shown in help (e.g. "<prompt>", "[name]")
cli_only — only available in the interactive CLI
gateway_only — only available in messaging platforms
gateway_config_gate — config dotpath (e.g. "display.tool_progress_command"); when set on a cli_only command, the command becomes available in the gateway if the config value is truthy. GATEWAY_KNOWN_COMMANDS always includes config-gated commands so the gateway can dispatch them; help/menus only show them when the gate is open.

Adding an alias requires only adding it to the aliases tuple on the existing CommandDef. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.

TUI Architecture (ui-tui + tui_gateway)

The TUI is a full replacement for the classic (prompt_toolkit) CLI, activated via hermes --tui or HERMES_TUI=1.

Process Model

hermes --tui
  └─ Node (Ink)  ──stdio JSON-RPC──  Python (tui_gateway)
       │                                  └─ AIAgent + tools + sessions
       └─ renders transcript, composer, prompts, activity

TypeScript owns the screen. Python owns sessions, tools, model calls, and slash command logic.

Transport

Newline-delimited JSON-RPC over stdio. Requests from Ink, events from Python. See tui_gateway/server.py for the full method/event catalog.

Key Surfaces

Surface	Ink component	Gateway method
Chat streaming	`app.tsx` + `messageLine.tsx`	`prompt.submit` → `message.delta/complete`
Tool activity	`thinking.tsx`	`tool.start/progress/complete`
Approvals	`prompts.tsx`	`approval.respond` ← `approval.request`
Clarify/sudo/secret	`prompts.tsx`, `maskedPrompt.tsx`	`clarify/sudo/secret.respond`
Session picker	`sessionPicker.tsx`	`session.list/resume`
Slash commands	Local handler + fallthrough	`slash.exec` → `_SlashWorker`, `command.dispatch`
Completions	`useCompletion` hook	`complete.slash`, `complete.path`
Theming	`theme.ts` + `branding.tsx`	`gateway.ready` with skin data

Slash Command Flow

Built-in client commands (/help, /quit, /clear, /resume, /copy, /paste, etc.) handled locally in app.tsx
Everything else → slash.exec (runs in persistent _SlashWorker subprocess) → command.dispatch fallback

Dev Commands

cd ui-tui
npm install       # first time
npm run dev       # watch mode (rebuilds hermes-ink + tsx --watch)
npm start         # production
npm run build     # full build (hermes-ink + tsc)
npm run type-check # typecheck only (tsc --noEmit)
npm run lint      # eslint
npm run fmt       # prettier
npm test          # vitest

TUI in the Dashboard (`hermes dashboard` → `/chat`)

The dashboard embeds the real hermes --tui — not a rewrite. See hermes_cli/pty_bridge.py + the @app.websocket("/api/pty") endpoint in hermes_cli/web_server.py.

Browser loads web/src/pages/ChatPage.tsx, which mounts xterm.js's Terminal with the WebGL renderer, @xterm/addon-fit for container-driven resize, and @xterm/addon-unicode11 for modern wide-character widths.
/api/pty?token=… upgrades to a WebSocket; auth uses the same ephemeral _SESSION_TOKEN as REST, via query param (browsers can't set Authorization on WS upgrade).
The server spawns whatever hermes --tui would spawn, through ptyprocess (POSIX PTY — WSL works, native Windows does not).
Frames: raw PTY bytes each direction; resize via \x1b[RESIZE:<cols>;<rows>] intercepted on the server and applied with TIOCSWINSZ.

Do not re-implement the primary chat experience in React. The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded hermes --tui — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.

Structured React UI around the TUI is allowed when it is not a second chat surface. Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. ChatSidebar, ModelPickerDialog, ToolCall) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.

Adding New Tools

Requires changes in 2 files:

1. Create tools/your_tool.py:

import json, os
from tools.registry import registry

def check_requirements() -> bool:
    return bool(os.getenv("EXAMPLE_API_KEY"))

def example_tool(param: str, task_id: str = None) -> str:
    return json.dumps({"success": True, "data": "..."})

registry.register(
    name="example_tool",
    toolset="example",
    schema={"name": "example_tool", "description": "...", "parameters": {...}},
    handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
    check_fn=check_requirements,
    requires_env=["EXAMPLE_API_KEY"],
)

2. Add to toolsets.py — either _HERMES_CORE_TOOLS (all platforms) or a new toolset.

Auto-discovery: any tools/*.py file with a top-level registry.register() call is imported automatically — no manual import list to maintain.

The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.

Path references in tool schemas: If the schema description mentions file paths (e.g. default output directories), use display_hermes_home() to make them profile-aware. The schema is generated at import time, which is after _apply_profile_override() sets HERMES_HOME.

State files: If a tool stores persistent state (caches, logs, checkpoints), use get_hermes_home() for the base directory — never Path.home() / ".hermes". This ensures each profile gets its own state.

Agent-level tools (todo, memory): intercepted by run_agent.py before handle_function_call(). See tools/todo_tool.py for the pattern.

Adding Configuration

config.yaml options:

Add to DEFAULT_CONFIG in hermes_cli/config.py
Bump _config_version (check the current value at the top of DEFAULT_CONFIG) ONLY if you need to actively migrate/transform existing user config (renaming keys, changing structure). Adding a new key to an existing section is handled automatically by the deep-merge and does NOT require a version bump.

.env variables (SECRETS ONLY — API keys, tokens, passwords):

Add to OPTIONAL_ENV_VARS in hermes_cli/config.py with metadata:

"NEW_API_KEY": {
    "description": "What it's for",
    "prompt": "Display name",
    "url": "https://...",
    "password": True,
    "category": "tool",  # provider, tool, messaging, setting
},

Non-secret settings (timeouts, thresholds, feature flags, paths, display preferences) belong in config.yaml, not .env. If internal code needs an env var mirror for backward compatibility, bridge it from config.yaml to the env var in code (see gateway_timeout, terminal.cwd → TERMINAL_CWD).

Config loaders (three paths — know which one you're in):

Loader	Used by	Location
`load_cli_config()`	CLI mode	`cli.py` — merges CLI-specific defaults + user YAML
`load_config()`	`hermes tools`, `hermes setup`, most CLI subcommands	`hermes_cli/config.py` — merges `DEFAULT_CONFIG` + user YAML
Direct YAML load	Gateway runtime	`gateway/run.py` + `gateway/config.py` — reads user YAML raw

If you add a new key and the CLI sees it but the gateway doesn't (or vice versa), you're on the wrong loader. Check DEFAULT_CONFIG coverage.

Working directory:

CLI — uses the process's current directory (os.getcwd()).
Messaging — uses terminal.cwd from config.yaml. The gateway bridges this to the TERMINAL_CWD env var for child tools. MESSAGING_CWD has been removed — the config loader prints a deprecation warning if it's set in .env. Same for TERMINAL_CWD in .env; the canonical setting is terminal.cwd in config.yaml.

Skin/Theme System

The skin engine (hermes_cli/skin_engine.py) provides data-driven CLI visual customization. Skins are pure data — no code changes needed to add a new skin.

Architecture

hermes_cli/skin_engine.py    # SkinConfig dataclass, built-in skins, YAML loader
~/.hermes/skins/*.yaml       # User-installed custom skins (drop-in)

init_skin_from_config() — called at CLI startup, reads display.skin from config
get_active_skin() — returns cached SkinConfig for the current skin
set_active_skin(name) — switches skin at runtime (used by /skin command)
load_skin(name) — loads from user skins first, then built-ins, then falls back to default
Missing skin values inherit from the default skin automatically

What skins customize

Element	Skin Key	Used By
Banner panel border	`colors.banner_border`	`banner.py`
Banner panel title	`colors.banner_title`	`banner.py`
Banner section headers	`colors.banner_accent`	`banner.py`
Banner dim text	`colors.banner_dim`	`banner.py`
Banner body text	`colors.banner_text`	`banner.py`
Response box border	`colors.response_border`	`cli.py`
Spinner faces (waiting)	`spinner.waiting_faces`	`display.py`
Spinner faces (thinking)	`spinner.thinking_faces`	`display.py`
Spinner verbs	`spinner.thinking_verbs`	`display.py`
Spinner wings (optional)	`spinner.wings`	`display.py`
Tool output prefix	`tool_prefix`	`display.py`
Per-tool emojis	`tool_emojis`	`display.py` → `get_tool_emoji()`
Agent name	`branding.agent_name`	`banner.py`, `cli.py`
Welcome message	`branding.welcome`	`cli.py`
Response box label	`branding.response_label`	`cli.py`
Prompt symbol	`branding.prompt_symbol`	`cli.py`

Built-in skins

default — Classic Hermes gold/kawaii (the current look)
ares — Crimson/bronze war-god theme with custom spinner wings
mono — Clean grayscale monochrome
slate — Cool blue developer-focused theme

Adding a built-in skin

Add to _BUILTIN_SKINS dict in hermes_cli/skin_engine.py:

"mytheme": {
    "name": "mytheme",
    "description": "Short description",
    "colors": { ... },
    "spinner": { ... },
    "branding": { ... },
    "tool_prefix": "┊",
},

User skins (YAML)

Users create ~/.hermes/skins/<name>.yaml:

name: cyberpunk
description: Neon-soaked terminal theme

colors:
  banner_border: "#FF00FF"
  banner_title: "#00FFFF"
  banner_accent: "#FF1493"

spinner:
  thinking_verbs: ["jacking in", "decrypting", "uploading"]
  wings:
    - ["⟨⚡", "⚡⟩"]

branding:
  agent_name: "Cyber Agent"
  response_label: " ⚡ Cyber "

tool_prefix: "▏"

Activate with /skin cyberpunk or display.skin: cyberpunk in config.yaml.

Plugins

Hermes has two plugin surfaces. Both live under plugins/ in the repo so repo-shipped plugins can be discovered alongside user-installed ones in ~/.hermes/plugins/ and pip-installed entry points.

General plugins (`hermes_cli/plugins.py` + `plugins/<name>/`)

PluginManager discovers plugins from ~/.hermes/plugins/, ./.hermes/plugins/, and pip entry points. Each plugin exposes a register(ctx) function that can:

Register Python-callback lifecycle hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end
Register new tools via ctx.register_tool(...)
Register CLI subcommands via ctx.register_cli_command(...) — the plugin's argparse tree is wired into hermes at startup so hermes <pluginname> <subcmd> works with no change to main.py

Hooks are invoked from model_tools.py (pre/post tool) and run_agent.py (lifecycle). Discovery timing pitfall: discover_plugins() only runs as a side effect of importing model_tools.py. Code paths that read plugin state without importing model_tools.py first must call discover_plugins() explicitly (it's idempotent).

Memory-provider plugins (`plugins/memory/<name>/`)

Separate discovery system for pluggable memory backends. Current built-in providers include honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb.

Each provider implements the MemoryProvider ABC (see agent/memory_provider.py) and is orchestrated by agent/memory_manager.py. Lifecycle hooks include sync_turn(turn_messages), prefetch(query), shutdown(), and optional post_setup(hermes_home, config) for setup-wizard integration.

CLI commands via plugins/memory/<name>/cli.py: if a memory plugin defines register_cli(subparser), discover_plugin_cli_commands() finds it at argparse setup time and wires it into hermes <plugin>. The framework only exposes CLI commands for the currently active memory provider (read from memory.provider in config.yaml), so disabled providers don't clutter hermes --help.

Rule (Teknium, May 2026): plugins MUST NOT modify core files (run_agent.py, cli.py, gateway/run.py, hermes_cli/main.py, etc.). If a plugin needs a capability the framework doesn't expose, expand the generic plugin surface (new hook, new ctx method) — never hardcode plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded honcho argparse from main.py for exactly this reason.

Dashboard / context-engine / image-gen plugin directories

plugins/context_engine/, plugins/image_gen/, plugins/example-dashboard/, etc. follow the same pattern (ABC + orchestrator + per-plugin directory). Context engines plug into agent/context_engine.py; image-gen providers into agent/image_gen_provider.py.

Skills

Two parallel surfaces:

skills/ — built-in skills shipped and loadable by default. Organized by category directories (e.g. skills/github/, skills/mlops/).
optional-skills/ — heavier or niche skills shipped with the repo but NOT active by default. Installed explicitly via hermes skills install official/<category>/<skill>. Adapter lives in tools/skills_hub.py (OptionalSkillSource). Categories include autonomous-ai-agents, blockchain, communication, creative, devops, email, health, mcp, migration, mlops, productivity, research, security, web-development.

When reviewing skill PRs, check which directory they target — heavy-dep or niche skills belong in optional-skills/.

SKILL.md frontmatter

Standard fields: name, description, version, platforms (OS-gating list: [macos], [linux, macos], ...), metadata.hermes.tags, metadata.hermes.category, metadata.hermes.config (config.yaml settings the skill needs — stored under skills.config.<key>, prompted during setup, injected at load time).

Important Policies

Prompt Caching Must Not Break

Hermes-Agent ensures caching remains valid throughout a conversation. Do NOT implement changes that would:

Alter past context mid-conversation
Change toolsets mid-conversation
Reload memories or rebuild system prompts mid-conversation

Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.

Slash commands that mutate system-prompt state (skills, tools, memory, etc.) must be cache-aware: default to deferred invalidation (change takes effect next session), with an opt-in --now flag for immediate invalidation. See /skills install --now for the canonical pattern.

Background Process Notifications (Gateway)

When terminal(background=true, notify_on_complete=true) is used, the gateway runs a watcher that detects process completion and triggers a new agent turn. Control verbosity of background process messages with display.background_process_notifications in config.yaml (or HERMES_BACKGROUND_NOTIFICATIONS env var):

all — running-output updates + final message (default)
result — only the final completion message
error — only the final message when exit code != 0
off — no watcher messages at all

Profiles: Multi-Instance Support

Hermes supports profiles — multiple fully isolated instances, each with its own HERMES_HOME directory (config, API keys, memory, sessions, skills, gateway, etc.).

The core mechanism: _apply_profile_override() in hermes_cli/main.py sets HERMES_HOME before any module imports. All get_hermes_home() references automatically scope to the active profile.

Rules for profile-safe code

Use get_hermes_home() for all HERMES_HOME paths. Import from hermes_constants. NEVER hardcode ~/.hermes or Path.home() / ".hermes" in code that reads/writes state.

# GOOD
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"

# BAD — breaks profiles
config_path = Path.home() / ".hermes" / "config.yaml"

Use display_hermes_home() for user-facing messages. Import from hermes_constants. This returns ~/.hermes for default or ~/.hermes/profiles/<name> for profiles.

# GOOD
from hermes_constants import display_hermes_home
print(f"Config saved to {display_hermes_home()}/config.yaml")

# BAD — shows wrong path for profiles
print("Config saved to ~/.hermes/config.yaml")

Module-level constants are fine — they cache get_hermes_home() at import time, which is AFTER _apply_profile_override() sets the env var. Just use get_hermes_home(), not Path.home() / ".hermes".

Tests that mock Path.home() must also set HERMES_HOME — since code now uses get_hermes_home() (reads env var), not Path.home() / ".hermes":

with patch.object(Path, "home", return_value=tmp_path), \
     patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
    ...

Gateway platform adapters should use token locks — if the adapter connects with a unique credential (bot token, API key), call acquire_scoped_lock() from gateway.status in the connect()/start() method and release_scoped_lock() in disconnect()/stop(). This prevents two profiles from using the same credential. See gateway/platforms/telegram.py for the canonical pattern.
Profile operations are HOME-anchored, not HERMES_HOME-anchored — _get_profiles_root() returns Path.home() / ".hermes" / "profiles", NOT get_hermes_home() / "profiles". This is intentional — it lets hermes -p coder profile list see all profiles regardless of which one is active.

Known Pitfalls

DO NOT hardcode `~/.hermes` paths

Use get_hermes_home() from hermes_constants for code paths. Use display_hermes_home() for user-facing print/log messages. Hardcoding ~/.hermes breaks profiles — each profile has its own HERMES_HOME directory. This was the source of 5 bugs fixed in PR #3575.

DO NOT introduce new `simple_term_menu` usage

Existing call sites in hermes_cli/main.py remain for legacy fallback only; the preferred UI is curses (stdlib) because simple_term_menu has ghost-duplication rendering bugs in tmux/iTerm2 with arrow keys. New interactive menus must use hermes_cli/curses_ui.py — see hermes_cli/tools_config.py for the canonical pattern.

DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code

Leaks as literal ?[K text under prompt_toolkit's patch_stdout. Use space-padding: f"\r{line}{' ' * pad}".

`_last_resolved_tool_names` is a process-global in `model_tools.py`

_run_single_child() in delegate_tool.py saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.

DO NOT hardcode cross-tool references in schema descriptions

Tool schema descriptions must not mention tools from other toolsets by name (e.g., browser_navigate saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in get_tool_definitions() in model_tools.py — see the browser_navigate / execute_code post-processing blocks for the pattern.

The gateway has TWO message guards — both must bypass approval/control commands

When an agent is running, messages pass through two sequential guards: (1) base adapter (gateway/platforms/base.py) queues messages in _pending_messages when session_key in self._active_sessions, and (2) gateway runner (gateway/run.py) intercepts /stop, /new, /queue, /status, /approve, /deny before they reach running_agent.interrupt(). Any new command that must reach the runner while the agent is blocked (e.g. approval prompts) MUST bypass BOTH guards and be dispatched inline, not via _process_message_background() (which races session lifecycle).

Squash merges from stale branches silently revert recent fixes

Before squash-merging a PR, ensure the branch is up to date with main (git fetch origin main && git reset --hard origin/main in the worktree, then re-apply the PR's commits). A stale branch's version of an unrelated file will silently overwrite recent fixes on main when squashed. Verify with git diff HEAD~1..HEAD after merging — unexpected deletions are a red flag.

Don't wire in dead code without E2E validation

Unused code that was never shipped was dead for a reason. Before wiring an unused module into a live code path, E2E test the real resolution chain with actual imports (not mocks) against a temp HERMES_HOME.

Tests must not write to `~/.hermes/`

The _isolate_hermes_home autouse fixture in tests/conftest.py redirects HERMES_HOME to a temp dir. Never hardcode ~/.hermes/ paths in tests.

Profile tests: When testing profile features, also mock Path.home() so that _get_profiles_root() and _get_default_hermes_home() resolve within the temp dir. Use the pattern from tests/hermes_cli/test_profiles.py:

@pytest.fixture
def profile_env(tmp_path, monkeypatch):
    home = tmp_path / ".hermes"
    home.mkdir()
    monkeypatch.setattr(Path, "home", lambda: tmp_path)
    monkeypatch.setenv("HERMES_HOME", str(home))
    return home

Testing

ALWAYS use scripts/run_tests.sh — do not call pytest directly. The script enforces hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8, 4 xdist workers matching GHA ubuntu-latest). Direct pytest on a 16+ core developer machine with API keys set diverges from CI in ways that have caused multiple "works locally, fails in CI" incidents (and the reverse).

scripts/run_tests.sh                                  # full suite, CI-parity
scripts/run_tests.sh tests/gateway/                   # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags

Why the wrapper (and why the old "just call pytest" doesn't work)

Five real sources of local-vs-CI drift the script closes:

	Without wrapper	With wrapper
Provider API keys	Whatever is in your env (auto-detects pool)	All `_API_KEY`/`_TOKEN`/etc. unset
HOME / `~/.hermes/`	Your real config+auth.json	Temp dir per test
Timezone	Local TZ (PDT etc.)	UTC
Locale	Whatever is set	C.UTF-8
xdist workers	`-n auto` = all cores (20+ on a workstation)	`-n 4` matching CI

tests/conftest.py also enforces points 1-4 as an autouse fixture so ANY pytest invocation (including IDE integrations) gets hermetic behavior — but the wrapper is belt-and-suspenders.

Running without the wrapper (only if you must)

If you can't use the wrapper (e.g. on Windows or inside an IDE that shells pytest directly), at minimum activate the venv and pass -n 4:

source .venv/bin/activate   # or: source venv/bin/activate
python -m pytest tests/ -q -n 4

Worker count above 4 will surface test-ordering flakes that CI never sees.

Always run the full suite before pushing changes.

Don't write change-detector tests

A test is a change-detector if it fails whenever data that is expected to change gets updated — model catalogs, config version numbers, enumeration counts, hardcoded lists of provider models. These tests add no behavioral coverage; they just guarantee that routine source updates break CI and cost engineering time to "fix."

Do not write:

# catalog snapshot — breaks every model release
assert "gemini-2.5-pro" in _PROVIDER_MODELS["gemini"]
assert "MiniMax-M2.7" in models

# config version literal — breaks every schema bump
assert DEFAULT_CONFIG["_config_version"] == 21

# enumeration count — breaks every time a skill/provider is added
assert len(_PROVIDER_MODELS["huggingface"]) == 8

Do write:

# behavior: does the catalog plumbing work at all?
assert "gemini" in _PROVIDER_MODELS
assert len(_PROVIDER_MODELS["gemini"]) >= 1

# behavior: does migration bump the user's version to current latest?
assert raw["_config_version"] == DEFAULT_CONFIG["_config_version"]

# invariant: no plan-only model leaks into the legacy list
assert not (set(moonshot_models) & coding_plan_only_models)

# invariant: every model in the catalog has a context-length entry
for m in _PROVIDER_MODELS["huggingface"]:
    assert m.lower() in DEFAULT_CONTEXT_LENGTHS_LOWER

The rule: if the test reads like a snapshot of current data, delete it. If it reads like a contract about how two pieces of data must relate, keep it. When a PR adds a new provider/model and you want a test, make the test assert the relationship (e.g. "catalog entries all have context lengths"), not the specific names.

Reviewers should reject new change-detector tests; authors should convert them into invariants before re-requesting review.

35 KiB Raw Blame History