mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-26 11:12:03 +00:00
* refactor(skills): clean up bundled skill set + add environments: relevance gate Bundled skills cleanup pass plus a new offer-time relevance gate. Removals (redundant / dead): - spotify (covered by the spotify plugin's 7 native tools) - linear (covered by `hermes mcp install linear`) - kanban-codex-lane, debugging-hermes-tui-commands - empty category markers: diagramming, gifs, inference-sh, mlops/training, mlops/vector-databases - domain (stale orphan dup of optional/research/domain-intel) Bundled -> optional: - baoyu-article-illustrator, baoyu-comic, creative-ideation, pixel-art - dspy, subagent-driven-development - minecraft-modpack-server, pokemon-player - hermes-s6-container-supervision (-> optional/devops) Consolidation: - webhook-subscriptions + native-mcp folded into the hermes-agent skill as references/webhooks.md + references/native-mcp.md with SKILL.md pointers - writing-plans merged into plan (v2.0.0); related_skills + prose refs updated New: environments: frontmatter gate (agent/skill_utils.skill_matches_environment) - Offer-time relevance filter (kanban / docker / s6), parallel to platforms:. - Wired into the 3 OFFER surfaces only (prompt_builder skills index, skills_tool.list_skills, skill_commands slash discovery). - Explicit loads (skill_view, --skills preload) intentionally BYPASS it, so load-bearing force-loads like the kanban dispatcher's `--skills kanban-worker` always resolve. Verified via E2E. - kanban-orchestrator/kanban-worker tagged environments: [kanban]; hermes-s6-container-supervision tagged environments: [s6] + platforms: [linux]. Validation: 8/8 E2E gating assertions (incl force-load invariant); 442 targeted tests green (agent, skills_tool, skill_commands, kanban worker). * docs: regenerate skill catalogs + pages for the bundled cleanup Regenerated per-skill doc pages, catalogs, and sidebar to match the skill moves/removals in the parent commit. Moved skills' pages relocate bundled -> optional (history preserved); removed skills' pages deleted; edited skills' pages refreshed (hermes-agent now embeds the webhook + native-mcp reference pointers). zh-Hans i18n mirror: stale bundled pages and catalog rows for moved/removed skills pruned (new optional translations land via the translation pipeline). * test: drop regression test for removed kanban-codex-lane skill The kanban-codex-lane skill was removed in the bundled-skills cleanup; its dedicated regression test read the now-deleted SKILL.md and failed with FileNotFoundError on CI shard 6.
53 lines
4.1 KiB
Markdown
53 lines
4.1 KiB
Markdown
# Context Budget Discipline
|
||
|
||
Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.
|
||
|
||
Adapted from the GSD (Get Shit Done) project's context-budget reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).
|
||
|
||
## Universal rules
|
||
|
||
Every workflow that spawns agents or reads significant content must follow these:
|
||
|
||
1. **Never read agent definition files.** `delegate_task` auto-loads them — you reading them too just doubles the cost.
|
||
2. **Never inline large files into subagent prompts.** Tell the agent to read the file from disk with `read_file` instead. The subagent gets full content; your context stays lean.
|
||
3. **Read depth scales with context window.** See the table below.
|
||
4. **Delegate heavy work to subagents.** The orchestrator routes; it doesn't execute.
|
||
5. **Proactively warn** the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue").
|
||
|
||
## Read depth by context window
|
||
|
||
Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.
|
||
|
||
| Context window | Subagent output reading | Summary files | Verification files | Plans for other phases |
|
||
|----------------|-------------------------|---------------|--------------------|-----------------------|
|
||
| < 500k (e.g. 200k) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
|
||
| >= 500k (1M models) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
|
||
|
||
"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.
|
||
|
||
## Four-tier degradation model
|
||
|
||
Monitor your context usage and shift behavior as you climb the tiers. The point is to notice *before* you hit the wall, not when responses start truncating.
|
||
|
||
| Tier | Usage | Behavior |
|
||
|------|-------|----------|
|
||
| **PEAK** | 0 – 30% | Full operations. Read bodies, spawn multiple agents in parallel, inline results freely. |
|
||
| **GOOD** | 30 – 50% | Normal operations. Prefer frontmatter reads. Delegate aggressively. |
|
||
| **DEGRADING** | 50 – 70% | Economize. Frontmatter-only reads, minimal inlining, **warn the user** about budget. |
|
||
| **POOR** | 70%+ | Emergency mode. **Checkpoint progress immediately.** No new reads unless critical. Finish the current task and stop cleanly. |
|
||
|
||
## Early warning signs (before panic thresholds fire)
|
||
|
||
Quality degrades *gradually* before hard limits hit. Watch for these:
|
||
|
||
- **Silent partial completion.** Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
|
||
- **Increasing vagueness.** Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
|
||
- **Skipped protocol steps.** Agent omits steps it would normally follow. If success criteria has 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."
|
||
|
||
When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.
|
||
|
||
## Fundamental limitation
|
||
|
||
When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.
|
||
|
||
**Mitigation:** in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
|