mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-01 01:51:44 +00:00

skills: adapt spike/sketch + 2 references from gsd-build/get-shit-done (MIT) (#17421 )

* skills: port spike, sketch, and gates/context-budget references from GSD

Adds two new lightweight standalone skills and two reference docs adapted
from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson). All ports
coexist cleanly with a full `npx get-shit-done-cc --hermes --global`
install — GSD lives under `skills/gsd-*/`, these ports live at their
natural Hermes category paths, zero name collisions.

New skills:
- skills/software-development/spike/ — Lightweight "spike an idea with
  throwaway experiments" workflow: decompose into Given/When/Then
  questions, research per-spike, build comparable variants, close with
  VALIDATED/PARTIAL/INVALIDATED verdict. Standalone alternative to the
  full `gsd-spike` (which requires `.planning/spikes/` state machinery
  and the rest of GSD).
- skills/creative/sketch/ — Lightweight "sketch 2-3 HTML design
  variants" workflow: intake (feel, references, core action), produce
  differentiated variants along a design axis, head-to-head comparison.
  Standalone alternative to the full `gsd-sketch`.

New references under subagent-driven-development/:
- references/context-budget-discipline.md — Four-tier context
  degradation model (PEAK/GOOD/DEGRADING/POOR at 0-30%/30-50%/50-70%/70%+)
  with read-depth rules that scale with context window size, plus early
  warning signs of silent degradation (silent partial completion,
  increasing vagueness, skipped protocol steps).
- references/gates-taxonomy.md — Four canonical gate types for
  validation checkpoints: Pre-flight (precondition block), Revision
  (bounded retry loop with stall detection), Escalation (pause for
  human decision), Abort (terminate to prevent damage). Each ships
  with behavior, recovery, and examples.

Collision guard: each port has explicit "If the user has the full GSD
system installed" guidance directing the agent to prefer `gsd-spike` /
`gsd-sketch` when the full workflow is available. Verified end-to-end
with 86 GSD skills + these 2 Hermes ports installed in the same
HERMES_HOME — 90 total skills, zero duplicate names, both
counterparts appear in the system prompt with distinct descriptions.

Attribution preserved in each SKILL.md footer per MIT notice
requirement. Full GSD system now installable via
`npx get-shit-done-cc --hermes --global` (gsd-build/get-shit-done#2845).

* skills/gsd-port: tighten descriptions, surface Hermes-native tools

Review feedback adjustments to the spike/sketch ports from the previous
commit on this branch:

- description lengths trimmed to <=60 chars with trigger-first phrasing
  (spike: 55 chars 'Throwaway experiments to validate an idea before build.';
   sketch: 55 chars 'Throwaway HTML mockups: 2-3 design variants to compare.')
- author field credits gsd-build/get-shit-done explicitly
- stale duplicate top-level `tags:` removed from sketch frontmatter
  (Hermes reads only metadata.hermes.tags — the top-level field was
  dead weight)
- spike research step now shows concrete Hermes tool calls
  (web_search, web_extract with real URLs, terminal for venv inspection)
  instead of just naming the tool names
- spike build step adds a worked tool-sequence example
  (terminal + write_file + terminal to run) and a delegate_task fan-out
  pattern for parallel comparison spikes (002a / 002b)
- sketch build step adds browser_navigate + browser_vision verification
  step — visual spot-check that catches layout bugs pure source
  inspection misses
- sketch Output section adds a worked tool-sequence example mirroring
  the spike pattern

Descriptions now lead with 'Throwaway' (the pattern-match word that
signals 'disposable / not production code') — gives the agent a clean
activation signal in the system-prompt skill index.

2026-04-29 06:10:05 -07:00

4.1 KiB

Raw Blame History

Context Budget Discipline

Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.

Universal rules

Every workflow that spawns agents or reads significant content must follow these:

Never read agent definition files. delegate_task auto-loads them — you reading them too just doubles the cost.
Never inline large files into subagent prompts. Tell the agent to read the file from disk with read_file instead. The subagent gets full content; your context stays lean.
Read depth scales with context window. See the table below.
Delegate heavy work to subagents. The orchestrator routes; it doesn't execute.
Proactively warn the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue").

Read depth by context window

Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.

Context window	Subagent output reading	Summary files	Verification files	Plans for other phases
< 500k (e.g. 200k)	Frontmatter only	Frontmatter only	Frontmatter only	Current phase only
>= 500k (1M models)	Full body permitted	Full body permitted	Full body permitted	Current phase only

"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.

Four-tier degradation model

Monitor your context usage and shift behavior as you climb the tiers. The point is to notice before you hit the wall, not when responses start truncating.

Tier	Usage	Behavior
PEAK	0 – 30%	Full operations. Read bodies, spawn multiple agents in parallel, inline results freely.
GOOD	30 – 50%	Normal operations. Prefer frontmatter reads. Delegate aggressively.
DEGRADING	50 – 70%	Economize. Frontmatter-only reads, minimal inlining, warn the user about budget.
POOR	70%+	Emergency mode. Checkpoint progress immediately. No new reads unless critical. Finish the current task and stop cleanly.

Early warning signs (before panic thresholds fire)

Quality degrades gradually before hard limits hit. Watch for these:

Silent partial completion. Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
Increasing vagueness. Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
Skipped protocol steps. Agent omits steps it would normally follow. If success criteria has 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."

When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.

Fundamental limitation

When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.

Mitigation: in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.

4.1 KiB Raw Blame History Unescape Escape