mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-01 01:51:44 +00:00

skills: adapt spike/sketch + 2 references from gsd-build/get-shit-done (MIT) (#17421)

* skills: port spike, sketch, and gates/context-budget references from GSD

Adds two new lightweight standalone skills and two reference docs adapted
from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson). All ports
coexist cleanly with a full `npx get-shit-done-cc --hermes --global`
install — GSD lives under `skills/gsd-*/`, these ports live at their
natural Hermes category paths, zero name collisions.

New skills:
- skills/software-development/spike/ — Lightweight "spike an idea with
  throwaway experiments" workflow: decompose into Given/When/Then
  questions, research per-spike, build comparable variants, close with
  VALIDATED/PARTIAL/INVALIDATED verdict. Standalone alternative to the
  full `gsd-spike` (which requires `.planning/spikes/` state machinery
  and the rest of GSD).
- skills/creative/sketch/ — Lightweight "sketch 2-3 HTML design
  variants" workflow: intake (feel, references, core action), produce
  differentiated variants along a design axis, head-to-head comparison.
  Standalone alternative to the full `gsd-sketch`.

New references under subagent-driven-development/:
- references/context-budget-discipline.md — Four-tier context
  degradation model (PEAK/GOOD/DEGRADING/POOR at 0-30%/30-50%/50-70%/70%+)
  with read-depth rules that scale with context window size, plus early
  warning signs of silent degradation (silent partial completion,
  increasing vagueness, skipped protocol steps).
- references/gates-taxonomy.md — Four canonical gate types for
  validation checkpoints: Pre-flight (precondition block), Revision
  (bounded retry loop with stall detection), Escalation (pause for
  human decision), Abort (terminate to prevent damage). Each ships
  with behavior, recovery, and examples.

Collision guard: each port has explicit "If the user has the full GSD
system installed" guidance directing the agent to prefer `gsd-spike` /
`gsd-sketch` when the full workflow is available. Verified end-to-end
with 86 GSD skills + these 2 Hermes ports installed in the same
HERMES_HOME — 90 total skills, zero duplicate names, both
counterparts appear in the system prompt with distinct descriptions.

Attribution preserved in each SKILL.md footer per MIT notice
requirement. Full GSD system now installable via
`npx get-shit-done-cc --hermes --global` (gsd-build/get-shit-done#2845).

* skills/gsd-port: tighten descriptions, surface Hermes-native tools

Review feedback adjustments to the spike/sketch ports from the previous
commit on this branch:

- description lengths trimmed to <=60 chars with trigger-first phrasing
  (spike: 55 chars 'Throwaway experiments to validate an idea before build.';
   sketch: 55 chars 'Throwaway HTML mockups: 2-3 design variants to compare.')
- author field credits gsd-build/get-shit-done explicitly
- stale duplicate top-level `tags:` removed from sketch frontmatter
  (Hermes reads only metadata.hermes.tags — the top-level field was
  dead weight)
- spike research step now shows concrete Hermes tool calls
  (web_search, web_extract with real URLs, terminal for venv inspection)
  instead of just naming the tools
- spike build step adds a worked tool-sequence example
  (terminal + write_file + terminal to run) and a delegate_task fan-out
  pattern for parallel comparison spikes (002a / 002b)
- sketch build step adds browser_navigate + browser_vision verification
  step — visual spot-check that catches layout bugs pure source
  inspection misses
- sketch Output section adds a worked tool-sequence example mirroring
  the spike pattern

Descriptions now lead with 'Throwaway' (the pattern-match word that
signals 'disposable / not production code') — gives the agent a clean
activation signal in the system-prompt skill index.

2026-04-29 06:10:05 -07:00


Gates Taxonomy

Canonical gate types for validation checkpoints across any workflow that spawns subagents, runs review loops, or has human-approval pauses. Every validation checkpoint maps to one of these four types — naming them explicitly makes the workflow legible and prevents "what happens when this check fails?" confusion.

The four gate types

1. Pre-flight gate

Purpose: Validates preconditions before starting an operation.

Behavior: Blocks entry if conditions unmet. No partial work created — bail before anything changes.

Recovery: Fix the missing precondition, then retry.

Examples:

Implementation phase checks that the plan file exists before it starts writing code.
Delegated subagent checks that required env vars are set before making API calls.
Commit step checks that tests passed before pushing.
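
A minimal sketch of a pre-flight gate, assuming the preconditions are a plan file and a set of environment variables. The names `preflight` and `PreflightError` are hypothetical and are not part of Hermes or GSD. The point is only that every check runs before any work starts, so a failure leaves nothing to clean up.

```python
import os
from pathlib import Path


class PreflightError(Exception):
    """Raised when a precondition is unmet; nothing has been changed yet."""


def preflight(plan_path: Path, required_env: list[str]) -> None:
    """Check every precondition up front and bail before any work starts."""
    problems = []
    if not plan_path.is_file() or plan_path.stat().st_size == 0:
        problems.append(f"plan file missing or empty: {plan_path}")
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"env var not set: {name}")
    if problems:
        # Report all unmet preconditions at once so one fix-and-retry is enough.
        raise PreflightError("; ".join(problems))
```

Collecting all problems before raising is a design choice: the recovery path is "fix the missing precondition, then retry", and a single complete report avoids repeated retries.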

2. Revision gate

Purpose: Evaluates output quality and routes to revision if insufficient.

Behavior: Loops back to the producer with specific feedback. Bounded by an iteration cap (typically 3).

Recovery: Producer addresses feedback; checker re-evaluates. The loop escalates early if the issue count does not decrease between consecutive iterations (stall detection). After max iterations, it escalates to the user unconditionally; it must never loop forever.

Examples:

Plan reviewer reads a draft plan, returns specific issues, planner revises, reviewer re-reads (max 3 cycles).
Code reviewer checks subagent-produced code against must-haves; dispatches fixes back to the implementer if any must-have failed.
Test coverage checker validates new tests exercise the new paths; if not, sends back to author.
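
The bounded loop with stall detection can be sketched as follows. This is an illustration under assumptions, not the GSD implementation: `revision_gate`, `check`, and `revise` are hypothetical names, and "issues" are modelled as a plain list of strings.

```python
from typing import Callable

MAX_ITERATIONS = 3  # the "typically 3" cap from the text


def revision_gate(
    draft: str,
    check: Callable[[str], list[str]],        # checker: returns open issues
    revise: Callable[[str, list[str]], str],  # producer: applies the feedback
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, str, list[str]]:
    """Return (status, draft, issues); status is 'pass' or 'escalate'."""
    issues = check(draft)
    cycles = 0
    while issues:
        if cycles == max_iterations:
            return "escalate", draft, issues      # cap reached
        draft = revise(draft, issues)
        cycles += 1
        remaining = check(draft)
        # Stall detection: the issue count did not decrease after a revision.
        if remaining and len(remaining) >= len(issues):
            return "escalate", draft, remaining
        issues = remaining
    return "pass", draft, []
```

Both exits from the loop other than a pass return `"escalate"`, so the caller hands the remaining issues to an escalation gate instead of retrying.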

3. Escalation gate

Purpose: Surfaces unresolvable issues to the human for a decision.

Behavior: Pauses workflow, presents options, waits for human input. Never guesses, never picks a default.

Recovery: Human chooses action; workflow resumes on the selected path.

Examples:

Revision loop exhausted after 3 iterations.
Merge conflict during automated worktree cleanup.
Ambiguous requirement — two reasonable interpretations and the choice changes the approach.
Subagent reports "the plan says X but the codebase actually does Y" — human decides which is right.
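
A minimal sketch of the "never guesses, never picks a default" rule, assuming a text prompt is the channel to the human. `escalation_gate` and its `ask` parameter are hypothetical; `ask` defaults to the built-in `input` and can be swapped for any other way of reaching the user.

```python
from typing import Callable


def escalation_gate(
    summary: str,
    options: list[str],
    ask: Callable[[str], str] = input,
) -> str:
    """Block until the human picks one of the options; there is no default."""
    menu = "\n".join(f"  {i}. {opt}" for i, opt in enumerate(options, start=1))
    prompt = f"{summary}\n{menu}\nChoose 1-{len(options)}: "
    while True:
        answer = ask(prompt).strip()
        if answer.isdecimal() and 1 <= int(answer) <= len(options):
            return options[int(answer) - 1]
        # Empty or invalid input is never treated as a choice; ask again.
```

For example, `escalation_gate("Revision loop exhausted after 3 iterations.", ["force-merge", "rewrite task", "abandon"])` keeps prompting until the reply is 1, 2, or 3.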

4. Abort gate

Purpose: Terminates the operation to prevent damage or waste.

Behavior: Stops immediately, preserves state (checkpoint current progress), reports the specific reason.

Recovery: Human investigates root cause, fixes, restarts from checkpoint.

Examples:

Context window nearly exhausted during execution (POOR tier, usage above 70%): abort cleanly rather than produce truncated output.
Critical dependency unavailable mid-run (network down, API key revoked).
Unrecoverable filesystem state (disk full, permissions lost).
Safety invariant violated (agent attempted an irreversible destructive action outside approved scope).
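
The stop, checkpoint, and report sequence might look like this for the context-pressure case. Everything here is an assumption made for illustration: `abort_gate`, `Aborted`, the JSON checkpoint format, and the idea that context usage is available as a fraction between 0 and 1.

```python
import json
from pathlib import Path

POOR_THRESHOLD = 0.70  # POOR tier: context usage above 70%, per the text


class Aborted(Exception):
    """Carries the specific reason and where the checkpoint was written."""

    def __init__(self, reason: str, checkpoint: Path):
        super().__init__(reason)
        self.reason = reason
        self.checkpoint = checkpoint


def abort_gate(context_used: float, state: dict, checkpoint: Path) -> None:
    """Checkpoint current progress and stop if usage is in the POOR tier."""
    if context_used <= POOR_THRESHOLD:
        return
    # Preserve state first, then stop, so a restart can resume from here.
    checkpoint.write_text(json.dumps(state, indent=2))
    raise Aborted(
        f"context at {context_used:.0%}, checkpointing and stopping",
        checkpoint,
    )
```

Raising an exception rather than returning a flag means no caller can accidentally continue past the gate; the human restarts from `checkpoint` after investigating.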

How to use this in a skill

When you write an orchestration skill that has validation checkpoints, name each checkpoint by its gate type explicitly and answer three questions:

1. What condition triggers this gate? (e.g., "plan file missing", "issue count didn't decrease", "context >70%")
2. What happens when it fails? (block / loop back / ask human / abort)
3. Who resumes, and from where? (fix precondition + retry, revise + re-check, human decision, restart from checkpoint)

Answering these three up front means your skill never hits "what do we do now?" at runtime.
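
One way to keep those three answers next to each checkpoint is a small declarative record. This is a sketch only; `GateType`, `Gate`, and the example entries are hypothetical and are not a Hermes skill format.

```python
from dataclasses import dataclass
from enum import Enum


class GateType(Enum):
    PREFLIGHT = "pre-flight"
    REVISION = "revision"
    ESCALATION = "escalation"
    ABORT = "abort"


# Question 2 (what happens on failure) follows directly from the gate type.
ON_FAILURE = {
    GateType.PREFLIGHT: "block",
    GateType.REVISION: "loop back",
    GateType.ESCALATION: "ask human",
    GateType.ABORT: "abort",
}


@dataclass(frozen=True)
class Gate:
    name: str
    type: GateType
    trigger: str  # question 1: what condition triggers this gate?
    resume: str   # question 3: who resumes, and from where?

    @property
    def on_failure(self) -> str:
        return ON_FAILURE[self.type]


GATES = [
    Gate("plan-exists", GateType.PREFLIGHT, "plan file missing", "fix precondition + retry"),
    Gate("review", GateType.REVISION, "a must-have failed", "revise + re-check"),
    Gate("stalled", GateType.ESCALATION, "issue count didn't decrease", "human decision"),
    Gate("context", GateType.ABORT, "context >70%", "restart from checkpoint"),
]
```

Because `on_failure` is derived from the type, a checkpoint cannot be declared without committing to one of the four behaviors.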

Example — a review loop with all four gate types

[Pre-flight] plan.md exists and is non-empty?   → no: bail, ask user to write a plan first
                ↓ yes
[Execute]  subagent implements task
                ↓
[Revision] reviewer checks against must-haves  → fail: loop back to subagent (max 3)
                ↓ pass
[Pre-flight] tests pass?                       → no: bail, report failing tests
                ↓ yes
[Commit]
                ↓
(on revision loop exhaustion)
[Escalation] "3 review cycles failed to converge on issue X — pick: force-merge, rewrite task, abandon"
                ↓ user picks
(on any tier-POOR context pressure during loop)
[Abort] "context at 73%, checkpointing and stopping"
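
The same flow as a Python sketch. All names (`run_task` and its callable parameters) are hypothetical stand-ins for the subagent, reviewer, test runner, and user prompt. Each outcome is returned as a status string to keep the example short, and stall detection is omitted for the same reason.

```python
from typing import Callable


def run_task(
    plan_text: str,
    implement: Callable[[list[str]], str],       # subagent: feedback -> work
    review: Callable[[str], list[str]],          # reviewer: work -> failed must-haves
    tests_pass: Callable[[str], bool],
    ask_user: Callable[[str, list[str]], str],   # human picks one option
    context_used: Callable[[], float],           # fraction of context in use
    max_cycles: int = 3,
) -> str:
    # [Pre-flight] plan exists and is non-empty?
    if not plan_text.strip():
        return "bail: write a plan first"

    feedback: list[str] = []
    work = ""
    for _ in range(max_cycles):
        # [Abort] tier-POOR context pressure during the loop
        used = context_used()
        if used > 0.70:
            return f"abort: context at {used:.0%}, checkpointing and stopping"
        work = implement(feedback)   # [Execute]
        feedback = review(work)      # [Revision]
        if not feedback:
            break
    else:
        # [Escalation] the revision loop was exhausted without a pass
        choice = ask_user(
            f"{max_cycles} review cycles failed to converge",
            ["force-merge", "rewrite task", "abandon"],
        )
        return f"escalated: {choice}"

    # [Pre-flight] tests pass?
    if not tests_pass(work):
        return "bail: failing tests"
    return "commit"
```

The `for ... else` branch runs only when the loop finishes without `break`, which is exactly the "revision loop exhaustion" condition in the diagram.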

The vocabulary is small on purpose. Every gate in every workflow should fit one of these four. If you find yourself inventing a fifth, it's probably a revision gate with extra branching, or an escalation gate in disguise.
