* skills: port spike, sketch, and gates/context-budget references from GSD

  Adds two new lightweight standalone skills and two reference docs adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson). All ports coexist cleanly with a full `npx get-shit-done-cc --hermes --global` install — GSD lives under `skills/gsd-*/`, these ports live at their natural Hermes category paths, zero name collisions.

  New skills:

  - skills/software-development/spike/ — Lightweight "spike an idea with throwaway experiments" workflow: decompose into Given/When/Then questions, research per spike, build comparable variants, close with a VALIDATED/PARTIAL/INVALIDATED verdict. Standalone alternative to the full `gsd-spike` (which requires `.planning/spikes/` state machinery and the rest of GSD).
  - skills/creative/sketch/ — Lightweight "sketch 2-3 HTML design variants" workflow: intake (feel, references, core action), produce differentiated variants along a design axis, head-to-head comparison. Standalone alternative to the full `gsd-sketch`.

  New references under subagent-driven-development/:

  - references/context-budget-discipline.md — Four-tier context degradation model (PEAK/GOOD/DEGRADING/POOR at 0-30%/30-50%/50-70%/70%+) with read-depth rules that scale with context window size, plus early warning signs of silent degradation (silent partial completion, increasing vagueness, skipped protocol steps).
  - references/gates-taxonomy.md — Four canonical gate types for validation checkpoints: Pre-flight (precondition block), Revision (bounded retry loop with stall detection), Escalation (pause for human decision), Abort (terminate to prevent damage). Each ships with behavior, recovery, and examples.

  Collision guard: each port has explicit "If the user has the full GSD system installed" guidance directing the agent to prefer `gsd-spike` / `gsd-sketch` when the full workflow is available.
  Verified end-to-end with 86 GSD skills + these 2 Hermes ports installed in the same HERMES_HOME — 90 total skills, zero duplicate names, both counterparts appear in the system prompt with distinct descriptions. Attribution preserved in each SKILL.md footer per the MIT notice requirement. Full GSD system now installable via `npx get-shit-done-cc --hermes --global` (gsd-build/get-shit-done#2845).

* skills/gsd-port: tighten descriptions, surface Hermes-native tools

  Review-feedback adjustments to the spike/sketch ports from the previous commit on this branch:

  - description lengths trimmed to <=60 chars with trigger-first phrasing (spike: 55 chars, 'Throwaway experiments to validate an idea before build.'; sketch: 55 chars, 'Throwaway HTML mockups: 2-3 design variants to compare.')
  - author field credits gsd-build/get-shit-done explicitly
  - stale duplicate top-level `tags:` removed from sketch frontmatter (Hermes reads only metadata.hermes.tags — the top-level field was dead weight)
  - spike research step now shows concrete Hermes tool calls (web_search, web_extract with real URLs, terminal for venv inspection) instead of just naming the tools
  - spike build step adds a worked tool-sequence example (terminal + write_file + terminal to run) and a delegate_task fan-out pattern for parallel comparison spikes (002a / 002b)
  - sketch build step adds a browser_navigate + browser_vision verification step — a visual spot-check that catches layout bugs pure source inspection misses
  - sketch Output section adds a worked tool-sequence example mirroring the spike pattern

  Descriptions now lead with 'Throwaway' (the pattern-match word that signals 'disposable / not production code'), which gives the agent a clean activation signal in the system-prompt skill index.
# Context Budget Discipline
Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.
Adapted from the GSD (Get Shit Done) project's context-budget reference — MIT © 2025 Lex Christopherson (gsd-build/get-shit-done).
## Universal rules
Every workflow that spawns agents or reads significant content must follow these:
- Never read agent definition files. `delegate_task` auto-loads them — you reading them too just doubles the cost.
- Never inline large files into subagent prompts. Tell the agent to read the file from disk with `read_file` instead. The subagent gets full content; your context stays lean.
- Read depth scales with context window. See the table below.
- Delegate heavy work to subagents. The orchestrator routes; it doesn't execute.
- Proactively warn the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue").
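The "never inline large files" rule can be made concrete with a tiny prompt builder. A minimal sketch: `build_subagent_prompt` is a hypothetical helper of ours, and `delegate_task` / `read_file` are only the tool names this document uses, not a real API surface.

```python
# Sketch of the "point, don't paste" rule: reference the artifact by
# path so the orchestrator never spends its own context inlining it.
# build_subagent_prompt is a hypothetical helper, not a real API.

def build_subagent_prompt(artifact_path: str, instructions: str) -> str:
    """Tell the subagent to pull the artifact from disk itself."""
    return (
        f"{instructions}\n\n"
        f"First read {artifact_path} from disk with read_file; "
        "its contents are deliberately not included in this prompt."
    )

prompt = build_subagent_prompt(
    ".planning/phase-2-plan.md",  # hypothetical artifact path
    "Implement step 3 of the plan.",
)
```

The orchestrator pays for a one-line path reference; the subagent pays for the full read in its own fresh context.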
## Read depth by context window
Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.
| Context window | Subagent output reading | Summary files | Verification files | Plans for other phases |
|---|---|---|---|---|
| < 500k (e.g. 200k) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
| >= 500k (1M models) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.
## Four-tier degradation model
Monitor your context usage and shift behavior as you climb the tiers. The point is to notice before you hit the wall, not when responses start truncating.
| Tier | Usage | Behavior |
|---|---|---|
| PEAK | 0 – 30% | Full operations. Read bodies, spawn multiple agents in parallel, inline results freely. |
| GOOD | 30 – 50% | Normal operations. Prefer frontmatter reads. Delegate aggressively. |
| DEGRADING | 50 – 70% | Economize. Frontmatter-only reads, minimal inlining, warn the user about budget. |
| POOR | 70%+ | Emergency mode. Checkpoint progress immediately. No new reads unless critical. Finish the current task and stop cleanly. |
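The tier table reduces to a threshold lookup. A minimal sketch mirroring the percentages above; the function name and token-count inputs are our own framing:

```python
# Map context usage to the four tiers defined in the table above.

def context_tier(used_tokens: int, window_tokens: int) -> str:
    usage = used_tokens / window_tokens
    if usage < 0.30:
        return "PEAK"       # full operations
    if usage < 0.50:
        return "GOOD"       # prefer frontmatter reads
    if usage < 0.70:
        return "DEGRADING"  # economize, warn the user
    return "POOR"           # checkpoint immediately, stop cleanly
```

Checking this once per loop iteration is cheap; the expensive failure is noticing the tier only after responses start degrading.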
## Early warning signs (before panic thresholds fire)
Quality degrades gradually before hard limits hit. Watch for these:
- Silent partial completion. Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
- Increasing vagueness. Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
- Skipped protocol steps. Agent omits steps it would normally follow. If the success criteria list has 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."
When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.
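The skipped-steps signal lends itself to a cheap automated check: which success criteria does the report never even mention? A sketch under the assumption that criteria are short phrases; substring matching is deliberately crude, so it flags candidates for a review pass rather than proving semantic coverage, and the helper name is ours:

```python
# Crude proxy for the skipped-steps warning sign: list the success
# criteria the report never mentions. A non-empty result means "go
# look", not "definitely incomplete".

def unmentioned_criteria(criteria: list[str], report: str) -> list[str]:
    lowered = report.lower()
    return [c for c in criteria if c.lower() not in lowered]

gaps = unmentioned_criteria(
    ["tests pass", "docs updated", "lint clean"],
    "Done: tests pass and lint clean.",
)
```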
## Fundamental limitation
When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.
Mitigation: in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
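The mitigation above can be baked into the delegation prompt itself. A minimal sketch: the helper name is ours, and no real `delegate_task` signature is implied.

```python
# Sketch of the must-have mitigation: append explicit assertions the
# subagent has to re-state as concrete facts before claiming done.

def with_must_haves(task: str, must_haves: list[str]) -> str:
    checks = "\n".join(f"- Confirm as a concrete fact: {m}" for m in must_haves)
    return (
        f"{task}\n\n"
        "Before reporting done, re-assert each of the following. "
        "Vague summaries do not count as confirmation:\n"
        f"{checks}"
    )

prompt = with_must_haves(
    "Add retry handling to the fetch client.",
    ["the new test exercises the retry path, not just the import"],
)
```

A subagent re-asserting "the test calls fetch() against a failing stub and sees two retries" is evidence; "tests were added appropriately" is the vagueness signal from the previous section.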