From 6b330522e1fb8950a6552e421d1fa9df4793b33f Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Tue, 9 Jun 2026 21:31:07 -0700 Subject: [PATCH] docs(agents): add Design Philosophy + Contribution Rubric to AGENTS.md (#42641) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (#41741 intentional-design-not-a-gap, #41610 wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission, #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them. --- AGENTS.md | 203 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 200 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 15cd7536ef1..8d9ef1621d8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,6 +4,201 @@ Instructions for AI coding assistants and developers working on the hermes-agent **Never give up on the right solution.** +## What Hermes Is + +Hermes is a personal AI agent that runs the same agent core across a CLI, a +messaging gateway (Telegram, Discord, Slack, and ~20 other platforms), a TUI, +and an Electron desktop app. It learns across sessions (memory + skills), +delegates to subagents, runs scheduled jobs, and drives a real terminal and +browser. It is extended primarily through **plugins and skills**, not by +growing the core. + +Two properties shape almost every design decision and are the lens for +reviewing any change: + +- **Per-conversation prompt caching is sacred.** A long-lived conversation + reuses a cached prefix every turn. Anything that mutates past context, + swaps toolsets, or rebuilds the system prompt mid-conversation invalidates + that cache and multiplies the user's cost. We do not do it (the one + exception is context compression). +- **The core is a narrow waist; capability lives at the edges.** Every model + tool we add is sent on every API call, so the bar for a new *core* tool is + high. Most new capability should arrive as a CLI command + skill, a + service-gated tool, or a plugin — not as core surface. + +## Contribution Rubric — What We Want / What We Don't + +This is the project's intent layer. Use it two ways: + +1. **For humans and for your own work** — what gets merged and what gets + rejected, so a contribution aims at the target. +2. **For automated review (the triage sweeper)** — guidance on when a PR is + safe to close on the three allowed reasons (`implemented_on_main`, + `cannot_reproduce`, `incoherent`) and, just as important, **when NOT to + close** one. Taste-based "we don't want this / out of scope" closes are NOT + an automated decision — those stay with a human maintainer. The sweeper's + job here is to recognize design intent and *avoid wrongly closing a + legitimate contribution*, not to make the won't-implement call itself. + +Read the balance right: Hermes ships a **lot** — most merges are bug fixes to +real reported behavior, and the product surface (platforms, channels, +providers, models, desktop/TUI features) expands aggressively and on purpose. +The restraint below is aimed squarely at the **core agent + the model tool +schema**, the one place where every addition is paid for on every API call. +"Smallest footprint" governs *how a capability is wired into the core*, NOT +whether the product is allowed to grow. We are expansive at the edges and +conservative at the waist. + +### What we want + +- **Fix real bugs, well.** The bulk of what lands is `fix(...)` against an + actual reported symptom. A good fix reproduces the symptom on current + `main`, points to the exact line where it manifests, and fixes the whole bug + class — sibling call paths included — not just the one site the reporter hit. +- **Expand reach at the edges.** New platform adapters, channels, providers, + models, and desktop/TUI/dashboard features are welcome and land routinely, + including large ones (a new messaging channel, a session-cap feature, a + Windows PTY bridge). Breadth in the product is a goal, not a footprint + concern — as long as it integrates with the existing setup/config UX + (`hermes tools`, `hermes setup`, auto-install) rather than bolting on a raw + env var. +- **Refactor god-files into clean modules.** Extracting a multi-thousand-line + cluster out of `cli.py` / `run_agent.py` / `gateway/run.py` into a focused + mixin or module is wanted work, even when the diff is huge and mechanical + (large `+N/-N` refactors merge regularly). The "every line traces to the + request" test applies to *feature* PRs; a declared refactor's request IS the + extraction. +- **Keep the core narrow.** New *model tools* are the expensive exception — + every tool ships on every API call. Prefer, in order: extend existing code → + CLI command + skill → service-gated tool (`check_fn`) → plugin → MCP server + in the catalog → new core tool (last resort). See "The Footprint Ladder." +- **Extend, don't duplicate.** Before adding a module/manager/hook, check + whether existing infrastructure already covers the use case. When several PRs + integrate the same *category*, design one shared interface instead of merging + them one at a time (see the ABC + orchestrator note under the Footprint + Ladder). +- **Behavior contracts over snapshots.** Tests should assert how two pieces of + data must relate (invariants), not freeze a current value (model lists, + config version literals, enumeration counts). See "Don't write + change-detector tests." +- **E2E validation, not just green unit mocks.** For anything touching + resolution chains, config propagation, security boundaries, remote + backends, or file/network I/O, exercise the real path with real imports + against a temp `HERMES_HOME`. Mocks hide integration bugs. +- **Cache-, alternation-, and invariant-safe.** Preserve prompt caching, strict + message role alternation (never two same-role messages in a row; never a + synthetic user message injected mid-loop), and a system prompt that is + byte-stable for the life of a conversation. +- **Contributor credit preserved.** Salvage external work by cherry-picking + (rebase-merge) so authorship survives in git history; don't reimplement from + scratch when you can build on top. + +### What we don't want (rejected even when well-built) + +- **Speculative infrastructure.** Hooks, callbacks, or extension points with no + concrete consumer. Adding a hook is easy; removing one after plugins depend + on it is hard. A hook is NOT speculative if a contributor has a real, stated + use case — even if the consumer ships separately. +- **New `HERMES_*` env vars for non-secret config.** `.env` is for secrets + only (API keys, tokens, passwords). All behavioral settings — timeouts, + thresholds, feature flags, display prefs — go in `config.yaml`. Bridge to an + internal env var if the mechanism needs one, but user-facing docs point to + `config.yaml`. Reject PRs that tell users to "set X in your .env" unless X + is a credential. +- **A new core tool when terminal + file already do the job, or when a skill + would.** If the only barrier is file visibility on a remote backend, fix the + mount, not the toolset. +- **Lazy-reading escape hatches on instructional tools.** No `offset`/`limit` + pagination on tools that load content the agent must read fully (skills, + prompts, playbooks). Models will read page 1 and skip the rest. +- **"Fixes" that destroy the feature they secure.** A mitigation that kills the + feature's purpose is the wrong mitigation. Read the original commit's intent + (`git log -p -S`) before restricting behavior; find a fix that preserves the + feature. +- **Outbound telemetry / usage attribution without opt-in gating.** No new + analytics, third-party identifier tagging, or attribution tags until a + generic user-facing opt-in (config gate + setup prompt + `hermes tools` + toggle) exists. Park behind a label, do not merge. +- **Change-detector tests, cache-breaking mid-conversation, dead code wired in + without E2E proof, and plugins that touch core files.** Plugins live in their + own directory and work within the ABCs/hooks we provide; if a plugin needs + more, widen the generic plugin surface, don't special-case it in core. + +### Before you call it a bug — verify the premise (and when NOT to close) + +The most common reason a well-written PR gets closed is not code quality — it +is that the change is built on a **wrong premise**, or it treats an +**intentional design as a gap**. These patterns cut both ways: they tell a +human reviewer what to scrutinize, and they tell the automated sweeper when a +PR is NOT safe to close as `implemented_on_main` / `cannot_reproduce` (when in +doubt, leave it open for a human). They are distilled from real closes. + +- **"Intentional design, not a gap."** A limitation that looks like an + oversight is often deliberate. Before "fixing" a missing link or a + restriction, ask whether the isolation IS the design. Example: profiles are + independent islands on purpose — a PR adding live config inheritance from the + default profile was closed because coupling profiles together is exactly what + the design prevents (the copy-at-creation `--clone` path already covers the + legitimate "start from my default" case). Read the original commit's intent + (`git log -p -S ""`) before assuming something is unfinished. +- **"The premise doesn't hold against how X actually works."** A PR's + justification frequently rests on a wrong mental model of an existing + mechanism. Trace the real code/runtime before accepting the rationale. Two + real closes: a rate-limit "re-probe during cooldown" PR (the breaker only + trips on a *confirmed-empty* account bucket, so re-probing just hammers a + bucket we've already proven empty); a usage-accumulation fix whose new branch + **never executes at runtime** because an earlier guard already popped the + state it depended on. If you can't point to the exact line where the bug + manifests AND show the fix changes that line's behavior, you haven't verified + the premise. +- **"This fix was wrong — the absence/omission was deliberate."** Adding the + obvious-looking missing piece can break things the omission was protecting. + Example: restoring "missing" `__init__.py` files made a test tree importable + as a dotted package that shadowed the real plugin, deleting its `register()` + at import time. The absence was load-bearing. +- **"Overreached / resurrected an approach we'd moved past."** Scope creep that + supersedes an agreed-on base, or revives a direction the maintainers + deliberately closed, gets rejected even when the code works. Keep the change + to the narrow piece that was actually agreed; offer the rest as a focused + follow-up. + +The throughline: **verify the claim AND the intent against the codebase before +writing or merging a fix.** A confirmed reproduction on current `main` plus a +line-level account of where the fix acts beats a plausible-sounding rationale +every time. When in doubt about intent, it is cheaper to ask than to ship a +fix that fights the design. + +### The Footprint Ladder (new capability decision) + +Each rung adds more permanent surface than the one above. Choose the highest +(least-footprint) rung that correctly solves the problem: + +1. **Extend existing code** — the capability is a variation of something that + already exists. Zero new surface. +2. **CLI command + skill** — manages config/state/infra expressible as shell + commands. The agent runs `hermes ` guided by a skill. Zero + model-tool footprint. Default choice for subscriptions, scheduled tasks, + service setup. Examples: `hermes webhook`, `hermes cron`, `hermes tools`. +3. **Service-gated tool (`check_fn`)** — needs structured params/returns AND + only appears when a prerequisite is configured. Zero footprint otherwise. + Examples: Home Assistant tools (gated on token), memory-provider tools. +4. **Plugin** — third-party/niche/user-specific capability that doesn't ship in + core. Lives in `~/.hermes/plugins/` or a pip package, discovered at runtime. +5. **MCP server (in the catalog)** — if the capability genuinely needs to be a + tool (structured I/O the agent invokes) but isn't core-fundamental, prefer + building it as an MCP server and adding it to the MCP catalog over growing + the core toolset. The agent connects to it through the built-in MCP client; + zero permanent core-schema footprint, and it's reusable by any MCP host. +6. **New core tool** — only when the capability is fundamental, broadly useful + to nearly every user, and unreachable via terminal + file (or an MCP server). + Examples of correct core tools: terminal, read_file, web_search, + browser_navigate. + +When 3+ open PRs try to integrate the same *category* of thing (memory +backends, providers, notifiers), don't merge them one at a time — design an +ABC + orchestrator, wrap the existing built-in as the first provider, and turn +the competing PRs into plugins against that interface. + ## Development Environment ```bash @@ -302,9 +497,11 @@ A **separate** chat surface from both the classic CLI and the dashboard's embedd ## Adding New Tools -For most custom or local-only tools, do **not** edit Hermes core. Use the plugin -route instead: create `~/.hermes/plugins//plugin.yaml` and -`~/.hermes/plugins//__init__.py`, then register tools with +Before adding any tool, settle the footprint question first (see "The +Footprint Ladder" in the Contribution Rubric): most capabilities should NOT +be core tools. For custom or local-only tools, do **not** edit Hermes core. +Use the plugin route instead: create `~/.hermes/plugins//plugin.yaml` +and `~/.hermes/plugins//__init__.py`, then register tools with `ctx.register_tool(...)`. Plugin toolsets are discovered automatically and can be enabled or disabled without touching `tools/` or `toolsets.py`.