diff --git a/skills/creative/sketch/SKILL.md b/skills/creative/sketch/SKILL.md
new file mode 100644
index 0000000000..b84f143dd4
--- /dev/null
+++ b/skills/creative/sketch/SKILL.md
@@ -0,0 +1,217 @@
+---
+name: sketch
+description: "Throwaway HTML mockups: 2-3 design variants to compare."
+version: 1.0.0
+author: Hermes Agent (adapted from gsd-build/get-shit-done)
+license: MIT
+metadata:
+  hermes:
+    tags: [sketch, mockup, design, ui, prototype, html, variants, exploration, wireframe, comparison]
+    related_skills: [spike, claude-design, popular-web-designs, excalidraw]
+---
+
+# Sketch
+
+Use this skill when the user wants to **see a design direction before committing** to one — exploring a UI/UX idea as disposable HTML mockups. The point is to generate 2-3 interactive variants so the user can compare visual directions side-by-side, not to produce shippable code.
+
+Load this when the user says things like "sketch this screen", "show me what X could look like", "compare layout A vs B", "give me 2-3 takes on this UI", "let me see some variants", "mockup this before I build".
+
+## When NOT to use this
+
+- User wants a production component — use `claude-design` or build it properly
+- User wants a polished one-off HTML artifact (landing page, deck) — `claude-design`
+- User wants a diagram — `excalidraw`, `architecture-diagram`
+- The design is already locked — just build it
+
+## If the user has the full GSD system installed
+
+If `gsd-sketch` shows up as a sibling skill (installed via `npx get-shit-done-cc --hermes`), prefer **`gsd-sketch`** for the full workflow: persistent `.planning/sketches/` with MANIFEST, frontier mode analysis, consistency audits across past sketches, and integration with the rest of GSD. This skill is the lightweight standalone version — one-off sketching without the state machinery.
+
+## Core method
+
+```
+intake → variants → head-to-head → pick winner (or iterate)
+```
+
+### 1. Intake (skip if the user already gave you enough)
+
+Before generating variants, get three things — one question at a time, not all at once:
+
+1. **Feel.** "What should this feel like? Adjectives, emotions, a vibe." — *"calm, editorial, like Linear"* tells you more than *"minimal"*.
+2. **References.** "What apps, sites, or products capture the feel you're imagining?" — actual references beat abstract descriptions.
+3. **Core action.** "What's the single most important thing a user does on this screen?" — the variants should all serve this well; if they don't, they're just decoration.
+
+Reflect each answer briefly before the next question. If the user already gave you all three upfront, skip straight to variants.
+
+### 2. Variants (2-3, never 1, rarely 4+)
+
+Produce **2-3 variants** in one go. Each variant is a complete, standalone HTML file. Don't describe variants — build them. The point is comparison.
+
+Each variant should take a **different design stance**, not different pixel values. Good variant axes:
+
+- **Density:** compact / airy / ultra-dense (pick two contrasting poles)
+- **Emphasis:** content-first / action-first / tool-first
+- **Aesthetic:** editorial / utilitarian / playful
+- **Layout:** single-column / sidebar / split-pane
+- **Grounding:** card-based / bare-content / document-style
+
+Pick one axis and pull the variants apart along it. Two variants that differ only in accent color are wasted effort — the user can't distinguish them.
+
+**Variant naming:** describe the stance, not the number.
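+
+For example, three stances pulled apart along the emphasis axis, named so the stance is legible (hypothetical names):
+
+```
+001-content-first     # reading is the job; chrome recedes
+001-action-first      # the primary CTA dominates every view
+001-tool-first        # dense controls, power-user defaults
+```
+
+Variants of the same sketch share a number; on disk, each gets its own directory (here, three aesthetic-axis stances):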
+
+```
+sketches/
+├── 001-calm-editorial/
+│   ├── index.html
+│   └── README.md
+├── 001-utilitarian-dense/
+│   ├── index.html
+│   └── README.md
+└── 001-playful-split/
+    ├── index.html
+    └── README.md
+```
+
+### 3. Make them real HTML
+
+Each variant is a **single self-contained HTML file**:
+
+- Inline `<style>` and `<script>`: no build step, no external dependencies
+- Realistic copy and data, not lorem ipsum: the user is judging a feel, not a wireframe
+- Just enough script to clear the interactivity bar below
+
+```html
+<!doctype html>
+<html>
+<head>
+  <meta charset="utf-8">
+  <style>/* tokens + layout, all inline */</style>
+</head>
+<body>
+  <!-- realistic content, primary action wired up -->
+  <script>/* minimal state: toggles, filters, a toast */</script>
+</body>
+</html>
+```
+
+### 4. Variant README
+
+Each variant's `README.md` answers:
+
+```markdown
+## Variant: {stance name}
+
+### Design stance
+One sentence on the principle driving this variant.
+
+### Key choices
+- Layout: ...
+- Typography: ...
+- Color: ...
+- Interaction: ...
+
+### Trade-offs
+- Strong at: ...
+- Weak at: ...
+
+### Best for
+- The kind of user or use case this variant actually serves
+```
+
+### 5. Head-to-head
+
+After all variants are built, present them as a comparison. Don't just list — **opinionate**:
+
+```markdown
+## Three takes on the home screen
+
+| Dimension | Calm editorial | Utilitarian dense | Playful split |
+|-----------|----------------|-------------------|---------------|
+| Density | Low | High | Medium |
+| Primary action visibility | Low | High | Medium |
+| Scannability | High | Medium | Low |
+| Feel | Calm, trusted | Sharp, tool-like | Inviting, energetic |
+
+**My take:** Utilitarian dense for power users, calm editorial for content-forward audiences. Playful split is weakest — tries to do both and commits to neither.
+```
+
+Let the user pick a winner, or combine two into a hybrid, or ask for another round.
+
+## Theming (when the project has a visual identity)
+
+If the user has an existing theme (colors, fonts, tokens), put shared tokens in `sketches/themes/tokens.css` and `@import` them in each variant. Keep tokens minimal:
+
+```css
+/* sketches/themes/tokens.css */
+:root {
+  --color-bg: #fafafa;
+  --color-fg: #1a1a1a;
+  --color-accent: #0066ff;
+  --color-muted: #666;
+  --radius: 8px;
+  --font-display: "Inter", sans-serif;
+  --font-body: -apple-system, BlinkMacSystemFont, sans-serif;
+}
+```
+
+Don't over-tokenize a throwaway sketch — three colors and one font is usually enough.
+
+## Interactivity bar
+
+A sketch is interactive enough when the user can:
+
+1. **Click a primary action** and something visible happens (state change, modal, toast, navigation feint)
+2. **See one meaningful state transition** (filter a list, toggle a mode, open/close a panel)
+3. **Hover recognizable affordances** (buttons, rows, tabs)
+
+More than that is over-engineering a throwaway. Less than that is a screenshot.
+
+## Frontier mode (picking what to sketch next)
+
+If sketches already exist and the user says "what should I sketch next?":
+
+- **Consistency gaps** — two winning variants from different sketches made independent choices that haven't been composed together yet
+- **Unsketched screens** — referenced but never explored
+- **State coverage** — happy path sketched, but not empty / loading / error / 1000-items
+- **Responsive gaps** — validated at one viewport; does it hold at mobile / ultrawide?
+- **Interaction patterns** — static layouts exist; transitions, drag, scroll behavior don't
+
+Propose 2-4 named candidates. Let the user pick.
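+
+A proposal might look like this (screen names hypothetical):
+
+```markdown
+## Sketch next?
+
+1. **home-empty-state**: home is validated with rich data, never sketched empty
+2. **home-mobile**: the winning stance re-checked at a phone viewport
+3. **settings**: referenced from two sketches, never explored
+```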
+
+## Output
+
+- Create `sketches/` (or `.planning/sketches/` if the user is using GSD conventions) in the repo root
+- One subdir per variant: `NNN-stance-name/index.html` + `README.md`
+- Tell the user how to open them: `open sketches/001-calm-editorial/index.html` on macOS, `xdg-open` on Linux, `start` on Windows
+- Keep variants disposable — a sketch you felt the need to preserve should be promoted into real project code, not curated as an asset
+
+**Typical tool sequence for one variant:**
+
+```
+terminal("mkdir -p sketches/001-calm-editorial")
+write_file("sketches/001-calm-editorial/index.html", "...")
+write_file("sketches/001-calm-editorial/README.md", "## Variant: Calm editorial\n...")
+browser_navigate(url="file://$(pwd)/sketches/001-calm-editorial/index.html")
+browser_vision(question="How does this look? Any obvious layout issues?")
+```
+
+Repeat for each variant, then present the comparison table.
+
+## Attribution
+
+Adapted from the GSD (Get Shit Done) project's `/gsd-sketch` workflow — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)). The full GSD system ships persistent sketch state, theme/variant pattern references, and consistency-audit workflows; install with `npx get-shit-done-cc --hermes --global`.
diff --git a/skills/software-development/spike/SKILL.md b/skills/software-development/spike/SKILL.md
new file mode 100644
index 0000000000..79d66bda14
--- /dev/null
+++ b/skills/software-development/spike/SKILL.md
@@ -0,0 +1,196 @@
+---
+name: spike
+description: "Throwaway experiments to validate an idea before build."
+version: 1.0.0
+author: Hermes Agent (adapted from gsd-build/get-shit-done)
+license: MIT
+metadata:
+  hermes:
+    tags: [spike, prototype, experiment, feasibility, throwaway, exploration, research, planning, mvp, proof-of-concept]
+    related_skills: [sketch, writing-plans, subagent-driven-development, plan]
+---
+
+# Spike
+
+Use this skill when the user wants to **feel out an idea** before committing to a real build — validating feasibility, comparing approaches, or surfacing unknowns that no amount of research will answer. Spikes are disposable by design. Throw them away once they've earned their keep.
+
+Load this when the user says things like "let me try this", "I want to see if X works", "spike this out", "before I commit to Y", "quick prototype of Z", "is this even possible?", or "compare A vs B".
+
+## When NOT to use this
+
+- The answer is knowable from docs or reading code — just do research, don't build
+- The work is on the production path — use `writing-plans` / `plan` instead
+- The idea is already validated — jump straight to implementation
+
+## If the user has the full GSD system installed
+
+If `gsd-spike` shows up as a sibling skill (installed via `npx get-shit-done-cc --hermes`), prefer **`gsd-spike`** when the user wants the full GSD workflow: persistent `.planning/spikes/` state, MANIFEST tracking across sessions, Given/When/Then verdict format, and commit patterns that integrate with the rest of GSD. This skill is the lightweight standalone version for users who don't have (or don't want) the full system.
+
+## Core method
+
+Regardless of scale, every spike follows this loop:
+
+```
+decompose → research → build → verdict
+    ↑__________________________________________↓
+                iterate on findings
+```
+
+### 1. Decompose
+
+Break the user's idea into **2-5 independent feasibility questions**. Each question is one spike.
Present them as a table with Given/When/Then framing: + +| # | Spike | Validates (Given/When/Then) | Risk | +|---|-------|----------------------------|------| +| 001 | websocket-streaming | Given a WS connection, when LLM streams tokens, then client receives chunks < 100ms | High | +| 002a | pdf-parse-pdfjs | Given a multi-page PDF, when parsed with pdfjs, then structured text is extractable | Medium | +| 002b | pdf-parse-camelot | Given a multi-page PDF, when parsed with camelot, then structured text is extractable | Medium | + +**Spike types:** +- **standard** — one approach answering one question +- **comparison** — same question, different approaches (shared number, letter suffix `a`/`b`/`c`) + +**Good spike questions:** specific feasibility with observable output. +**Bad spike questions:** too broad, no observable output, or just "read the docs about X". + +**Order by risk.** The spike most likely to kill the idea runs first. No point prototyping the easy parts if the hard part doesn't work. + +**Skip decomposition** only if the user already knows exactly what they want to spike and says so. Then take their idea as a single spike. + +### 2. Align (for multi-spike ideas) + +Present the spike table. Ask: "Build all in this order, or adjust?" Let the user drop, reorder, or re-frame before you write any code. + +### 3. Research (per spike, before building) + +Spikes are not research-free — you research enough to pick the right approach, then you build. Per spike: + +1. **Brief it.** 2-3 sentences: what this spike is, why it matters, key risk. +2. **Surface competing approaches** if there's real choice: + + | Approach | Tool/Library | Pros | Cons | Status | + |----------|-------------|------|------|--------| + | ... | ... | ... | ... | maintained / abandoned / beta | + +3. **Pick one.** State why. If 2+ are credible, build quick variants within the spike. +4. **Skip research** for pure logic with no external dependencies. + +Use Hermes tools for the research step: + +- `web_search("python websocket streaming libraries 2025")` — find candidates +- `web_extract(urls=["https://websockets.readthedocs.io/..."])` — read the actual docs (returns markdown) +- `terminal("pip show websockets | grep Version")` — check what's installed in the project's venv + +For libraries without docs pages, clone and read their `README.md` / `examples/` via `read_file`. Context7 MCP (if the user has it configured) is also a good source — `mcp_*_resolve-library-id` then `mcp_*_query-docs`. + +### 4. Build + +One directory per spike. Keep it standalone. + +``` +spikes/ +├── 001-websocket-streaming/ +│ ├── README.md +│ └── main.py +├── 002a-pdf-parse-pdfjs/ +│ ├── README.md +│ └── parse.js +└── 002b-pdf-parse-camelot/ + ├── README.md + └── parse.py +``` + +**Bias toward something the user can interact with.** Spikes fail when the only output is a log line that says "it works." The user wants to *feel* the spike working. Default choices, in order of preference: + +1. A runnable CLI that takes input and prints observable output +2. A minimal HTML page that demonstrates the behavior +3. A small web server with one endpoint +4. A unit test that exercises the question with recognizable assertions + +**Depth over speed.** Never declare "it works" after one happy-path run. Test edge cases. Follow surprising findings. The verdict is only trustworthy when the investigation was honest. + +**Avoid** unless the spike specifically requires it: complex package management, build tools/bundlers, Docker, env files, config systems. 
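+
+For instance, a spike entry point can pin every input inline and still produce observable output. This is a sketch only: the stand-in source below would be replaced by the websocket library picked during research.
+
+```python
+# spikes/001-websocket-streaming/main.py (hypothetical sketch)
+# Question: do streamed chunks arrive < 100ms apart?
+import asyncio
+import time
+
+CHUNKS = ["Hello", " ", "world", "!"]    # hardcoded fixture; it's a spike
+BUDGET_MS = 100                          # hardcoded threshold
+
+async def stream():
+    # Stand-in source: swap in the real websocket client under test.
+    for chunk in CHUNKS:
+        await asyncio.sleep(0.02)
+        yield chunk
+
+async def main():
+    last, worst = time.monotonic(), 0.0
+    async for chunk in stream():
+        gap_ms = (time.monotonic() - last) * 1000
+        worst = max(worst, gap_ms)
+        print(f"chunk={chunk!r} gap={gap_ms:.1f}ms")  # observable output, not a bare "it works"
+        last = time.monotonic()
+    verdict = "VALIDATED" if worst < BUDGET_MS else "INVALIDATED"
+    print(f"worst gap {worst:.1f}ms -> {verdict}")
+
+asyncio.run(main())
+```
+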
+Hardcode everything — it's a spike.
+
+**Building one spike** — a typical tool sequence:
+
+```
+terminal("mkdir -p spikes/001-websocket-streaming")
+write_file("spikes/001-websocket-streaming/README.md", "# 001: websocket-streaming\n\n...")
+write_file("spikes/001-websocket-streaming/main.py", "...")
+terminal("cd spikes/001-websocket-streaming && python3 main.py")
+# Observe output, iterate.
+```
+
+**Parallel comparison spikes (002a / 002b) — delegate.** When two approaches can run in parallel and both need real engineering (not 10-line prototypes), fan out with `delegate_task`:
+
+```
+delegate_task(tasks=[
+  {"goal": "Build 002a-pdf-parse-pdfjs: ...", "toolsets": ["terminal", "file", "web"]},
+  {"goal": "Build 002b-pdf-parse-camelot: ...", "toolsets": ["terminal", "file", "web"]},
+])
+```
+
+Each subagent returns its own verdict; you write the head-to-head.
+
+### 5. Verdict
+
+Each spike's `README.md` closes with:
+
+```markdown
+## Verdict: VALIDATED | PARTIAL | INVALIDATED
+
+### What worked
+- ...
+
+### What didn't
+- ...
+
+### Surprises
+- ...
+
+### Recommendation for the real build
+- ...
```
+
+**VALIDATED** = the core question was answered yes, with evidence.
+**PARTIAL** = it works under constraints X, Y, Z — document them.
+**INVALIDATED** = doesn't work, and the verdict says why. An invalidated spike is still a successful spike.
+
+## Comparison spikes
+
+When two approaches answer the same question (002a / 002b), build them **back to back**, then do a head-to-head comparison at the end:
+
+```markdown
+## Head-to-head: pdfjs vs camelot
+
+| Dimension | pdfjs (002a) | camelot (002b) |
+|-----------|--------------|----------------|
+| Extraction quality | 9/10 structured | 7/10 table-only |
+| Setup complexity | npm install, 1 line | pip + ghostscript |
+| Perf on 100-page PDF | 3s | 18s |
+| Handles rotated text | no | yes |
+
+**Winner:** pdfjs for our use case. Camelot if we need table-first extraction later.
+```
+
+## Frontier mode (picking what to spike next)
+
+If spikes already exist and the user says "what should I spike next?", walk the existing directories and look for:
+
+- **Integration risks** — two validated spikes that touch the same resource but were tested independently
+- **Data handoffs** — spike A's output was assumed compatible with spike B's input; never proven
+- **Gaps in the vision** — capabilities assumed but unproven
+- **Alternative approaches** — different angles for PARTIAL or INVALIDATED spikes
+
+Propose 2-4 candidates as Given/When/Then. Let the user pick.
+
+## Output
+
+- Create `spikes/` (or `.planning/spikes/` if the user is using GSD conventions) in the repo root
+- One dir per spike: `NNN-descriptive-name/`
+- `README.md` per spike captures question, approach, results, verdict
+- Keep the code throwaway — a spike that takes 2 days to "clean up for production" was a bad spike
+
+## Attribution
+
+Adapted from the GSD (Get Shit Done) project's `/gsd-spike` workflow — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)). The full GSD system offers persistent spike state, MANIFEST tracking, and integration with a broader spec-driven development pipeline; install with `npx get-shit-done-cc --hermes --global`.
diff --git a/skills/software-development/subagent-driven-development/SKILL.md b/skills/software-development/subagent-driven-development/SKILL.md index 5d349c9720..23c5bf47da 100644 --- a/skills/software-development/subagent-driven-development/SKILL.md +++ b/skills/software-development/subagent-driven-development/SKILL.md @@ -340,3 +340,12 @@ Catch issues early ``` **Quality is not an accident. It's the result of systematic process.** + +## Further reading (load when relevant) + +When the orchestration involves significant context usage, long review loops, or complex validation checkpoints, load these references for the specific discipline: + +- **`references/context-budget-discipline.md`** — Four-tier context degradation model (PEAK / GOOD / DEGRADING / POOR), read-depth rules that scale with context window size, and early warning signs of silent degradation. Load when a run will clearly consume significant context (multi-phase plans, many subagents, large artifacts). +- **`references/gates-taxonomy.md`** — The four canonical gate types (Pre-flight, Revision, Escalation, Abort) with behavior, recovery, and examples. Load when designing or reviewing any workflow that has validation checkpoints — use the vocabulary explicitly so each gate has defined entry, failure behavior, and resumption rules. + +Both references adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson). diff --git a/skills/software-development/subagent-driven-development/references/context-budget-discipline.md b/skills/software-development/subagent-driven-development/references/context-budget-discipline.md new file mode 100644 index 0000000000..2728160c16 --- /dev/null +++ b/skills/software-development/subagent-driven-development/references/context-budget-discipline.md @@ -0,0 +1,53 @@ +# Context Budget Discipline + +Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors. + +Adapted from the GSD (Get Shit Done) project's context-budget reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)). + +## Universal rules + +Every workflow that spawns agents or reads significant content must follow these: + +1. **Never read agent definition files.** `delegate_task` auto-loads them — you reading them too just doubles the cost. +2. **Never inline large files into subagent prompts.** Tell the agent to read the file from disk with `read_file` instead. The subagent gets full content; your context stays lean. +3. **Read depth scales with context window.** See the table below. +4. **Delegate heavy work to subagents.** The orchestrator routes; it doesn't execute. +5. **Proactively warn** the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue"). + +## Read depth by context window + +Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness. + +| Context window | Subagent output reading | Summary files | Verification files | Plans for other phases | +|----------------|-------------------------|---------------|--------------------|-----------------------| +| < 500k (e.g. 
200k) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
+| >= 500k (1M models) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
+
+"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.
+
+## Four-tier degradation model
+
+Monitor your context usage and shift behavior as you climb the tiers. The point is to notice *before* you hit the wall, not when responses start truncating.
+
+| Tier | Usage | Behavior |
+|------|-------|----------|
+| **PEAK** | 0 – 30% | Full operations. Read bodies, spawn multiple agents in parallel, inline results freely. |
+| **GOOD** | 30 – 50% | Normal operations. Prefer frontmatter reads. Delegate aggressively. |
+| **DEGRADING** | 50 – 70% | Economize. Frontmatter-only reads, minimal inlining, **warn the user** about budget. |
+| **POOR** | 70%+ | Emergency mode. **Checkpoint progress immediately.** No new reads unless critical. Finish the current task and stop cleanly. |
+
+## Early warning signs (before panic thresholds fire)
+
+Quality degrades *gradually* before hard limits hit. Watch for these:
+
+- **Silent partial completion.** Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
+- **Increasing vagueness.** Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
+- **Skipped protocol steps.** Agent omits steps it would normally follow. If the success criteria list 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."
+
+When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.
+
+## Fundamental limitation
+
+When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.
+
+**Mitigation:** in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
diff --git a/skills/software-development/subagent-driven-development/references/gates-taxonomy.md b/skills/software-development/subagent-driven-development/references/gates-taxonomy.md
new file mode 100644
index 0000000000..206f71efc9
--- /dev/null
+++ b/skills/software-development/subagent-driven-development/references/gates-taxonomy.md
@@ -0,0 +1,93 @@
+# Gates Taxonomy
+
+Canonical gate types for validation checkpoints across any workflow that spawns subagents, runs review loops, or has human-approval pauses. Every validation checkpoint maps to one of these four types — naming them explicitly makes the workflow legible and prevents "what happens when this check fails?" confusion.
+
+Adapted from the GSD (Get Shit Done) project's gates reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).
+
+## The four gate types
+
+### 1.
Pre-flight gate + +**Purpose:** Validates preconditions before starting an operation. + +**Behavior:** Blocks entry if conditions unmet. No partial work created — bail before anything changes. + +**Recovery:** Fix the missing precondition, then retry. + +**Examples:** +- Implementation phase checks that the plan file exists before it starts writing code. +- Delegated subagent checks that required env vars are set before making API calls. +- Commit checks that tests passed before pushing. + +### 2. Revision gate + +**Purpose:** Evaluates output quality and routes to revision if insufficient. + +**Behavior:** Loops back to the producer with specific feedback. Bounded by an iteration cap (typically 3). + +**Recovery:** Producer addresses feedback; checker re-evaluates. The loop escalates early if issue count does not decrease between consecutive iterations (stall detection). After max iterations, escalates to the user unconditionally — never loop forever. + +**Examples:** +- Plan reviewer reads a draft plan, returns specific issues, planner revises, reviewer re-reads (max 3 cycles). +- Code reviewer checks subagent-produced code against must-haves; dispatches fixes back to the implementer if any must-have failed. +- Test coverage checker validates new tests exercise the new paths; if not, sends back to author. + +### 3. Escalation gate + +**Purpose:** Surfaces unresolvable issues to the human for a decision. + +**Behavior:** Pauses workflow, presents options, waits for human input. Never guesses, never picks a default. + +**Recovery:** Human chooses action; workflow resumes on the selected path. + +**Examples:** +- Revision loop exhausted after 3 iterations. +- Merge conflict during automated worktree cleanup. +- Ambiguous requirement — two reasonable interpretations and the choice changes the approach. +- Subagent reports "the plan says X but the codebase actually does Y" — human decides which is right. + +### 4. Abort gate + +**Purpose:** Terminates the operation to prevent damage or waste. + +**Behavior:** Stops immediately, preserves state (checkpoint current progress), reports the specific reason. + +**Recovery:** Human investigates root cause, fixes, restarts from checkpoint. + +**Examples:** +- Context window critically low during execution (POOR tier, >70%) — abort cleanly rather than produce truncated output. +- Critical dependency unavailable mid-run (network down, API key revoked). +- Unrecoverable filesystem state (disk full, permissions lost). +- Safety invariant violated (agent attempted an irreversible destructive action outside approved scope). + +## How to use this in a skill + +When you write an orchestration skill that has validation checkpoints, **name each checkpoint by its gate type explicitly** and answer three questions: + +1. **What condition triggers this gate?** (e.g., "plan file missing", "issue count didn't decrease", "context >70%") +2. **What happens when it fails?** (block / loop back / ask human / abort) +3. **Who resumes, and from where?** (fix precondition + retry, revise + re-check, human decision, restart from checkpoint) + +Answering these three up front means your skill never hits "what do we do now?" at runtime. + +## Example — a review loop with all four gate types + +``` +[Pre-flight] plan.md exists and is non-empty? → no: bail, ask user to write a plan first + ↓ yes +[Execute] subagent implements task + ↓ +[Revision] reviewer checks against must-haves → fail: loop back to subagent (max 3) + ↓ pass +[Pre-flight] tests pass? 
→ no: bail, report failing tests + ↓ yes +[Commit] + ↓ +(on revision loop exhaustion) +[Escalation] "3 review cycles failed to converge on issue X — pick: force-merge, rewrite task, abandon" + ↓ user picks +(on any tier-POOR context pressure during loop) +[Abort] "context at 73%, checkpointing and stopping" +``` + +The vocabulary is small on purpose. Every gate in every workflow should fit one of these four. If you find yourself inventing a fifth, it's probably a revision gate with extra branching, or an escalation gate in disguise.
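+
+As one concrete illustration, the revision-gate contract (bounded loop, stall detection, escalation on exhaustion) fits in a few lines of Python. This is a sketch with hypothetical names, not a prescribed implementation:
+
+```python
+# Hypothetical sketch of a revision gate: iteration cap, stall detection, escalation.
+MAX_ITERATIONS = 3
+
+def revision_gate(artifact, review, revise, escalate):
+    """review() returns a list of issues; revise() loops the artifact back to its
+    producer with feedback; escalate() surfaces an unconverged artifact to the human."""
+    prev = None
+    for _ in range(MAX_ITERATIONS):
+        issues = review(artifact)
+        if not issues:
+            return artifact                  # gate passes
+        if prev is not None and len(issues) >= prev:
+            break                            # stall: issue count stopped decreasing
+        prev = len(issues)
+        artifact = revise(artifact, issues)  # loop back with specific feedback
+    return escalate(artifact, issues)        # escalation gate; never loop forever
+
+# Toy run: a "plan" that clears review after two revisions.
+done = revision_gate(
+    {"sections": 1},
+    review=lambda p: ["too thin"] * (3 - p["sections"]),
+    revise=lambda p, issues: {"sections": p["sections"] + 1},
+    escalate=lambda p, issues: f"escalated: {issues}",
+)
+print(done)  # {'sections': 3}
+```
+
+The cap and the stall check are what make this a gate rather than an open-ended loop; the same shape holds for any producer/checker pair.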