feat(simplify-code): add risk-tiered application, Chesterton's Fence, slop + silent failure detection

Five targeted enhancements to the upstream simplify-code skill: 1. Risk-tiered application (SAFE/CAREFUL/RISKY) — safe changes auto-applied, careful changes verified per-file, risky changes flagged for human review. Prevents auto-applying N+1 restructures and public API renames. 2. Chesterton's Fence — before flagging anything for removal, reviewers run 'git blame' to understand why it exists. Low-confidence findings are escalated rather than guessed. 3. AI slop detection — Quality reviewer now catches: extra comments restating obvious code, unnecessary defensive null-checks on validated inputs, 'as any' casts, and patterns inconsistent with the rest of the file. 4. Silent failure detection — Efficiency reviewer now catches: empty catch blocks, ignored error returns, except:pass, .catch(()=>{}) with no handling, and error propagation gaps. 5. Structured reviewer output with confidence+risk tags — reviewers report in 'file:line → problem → fix | confidence: H/M/L | risk: SAFE/CAREFUL/RISKY' format, enabling the orchestrator to tier the application. Plus 3 new pitfalls: over-trusting dead code tools, public contract awareness, and preserving intentional error handling. Total: +45/-8 lines. Keeps the 212-line compact spirit. Ref: #379
2026-06-21 10:22:18 +00:00 · 2026-06-08 15:53:07 +01:00 · 2026-06-08 15:53:07 +01:00 · db744e7d1e
commit db744e7d1e
parent ba50e86563
1 changed files with 45 additions and 8 deletions
--- a/skills/software-development/simplify-code/SKILL.md
+++ b/skills/software-development/simplify-code/SKILL.md
@ -87,8 +87,20 @@ toolsets (so they can `git`, `read_file`, and `search_files`/grep).

 Tell each reviewer to:
 - Search the existing codebase for evidence (don't reason from the diff alone).
- Report findings as a concrete list: `file:line → problem → suggested fix`.
- Rank each finding `high` / `medium` / `low` confidence.
+- **Apply Chesterton's Fence:** before flagging anything for removal, run
+  `git blame` on the line to understand why it exists. If you can't determine
+  the original purpose, mark it `confidence: low` — don't guess.
+- Report findings as structured output with confidence and risk:
+  ```
+  file:line → problem → suggested fix | confidence: high/medium/low | risk: SAFE/CAREFUL/RISKY
+  ```
+  - **SAFE** = proven not to affect behavior (unused imports, commented-out
+    code, pass-through wrappers). Auto-apply these.
+  - **CAREFUL** = improves without changing semantics (rename local variable,
+    flatten nested ternary, extract helper). Apply with test verification.
+  - **RISKY** = may change behavior or breaks public contracts (N+1
+    restructuring, public API rename, memory lifecycle change). Flag for
+    human review — do NOT auto-apply.
 - Skip nits and style-only churn. Only flag things that materially improve
  the code.

@ -112,7 +124,11 @@ Pass these three goals (drop any the user's focus excludes):
 > blocks that should share an abstraction); leaky abstractions (exposing
 > internals, breaking an existing encapsulation boundary); stringly-typed
 > code (raw strings where a constant/enum/registry already exists — check the
-> canonical registries before flagging). For each, give the concrete refactor.
+> canonical registries before flagging); AI-generated slop patterns (extra
+> comments restating obvious code like `// increment counter` above `count++`;
+> unnecessary defensive null-checks on already-validated inputs; `as any`
+> casts that bypass the type system; patterns inconsistent with the rest of
+> the file). For each, give the concrete refactor.

 **Reviewer 3 — Efficiency**
 > Review this diff for efficiency problems. Look for: unnecessary work
@ -122,8 +138,10 @@ Pass these three goals (drop any the user's focus excludes):
 > TOCTOU anti-patterns (existence pre-checks before an op instead of doing
 > the op and handling the error); memory issues (unbounded growth, missing
 > cleanup, listener/handle leaks); overly broad reads (loading whole files
-> when a slice would do). For each, give the concrete fix and why it's faster
-> or lighter.
+> when a slice would do); silent failures (empty catch blocks, ignored error
+> returns, `except: pass`, `.catch(() => {})` with no handling, error
+> propagation gaps — these hide bugs and should at minimum log before
+> swallowing). For each, give the concrete fix and why it's faster or safer.

 ### Phase 3 — Aggregate and apply

@ -138,13 +156,22 @@ Wait for all three to return (batch mode returns them together).
   Don't apply a perf "fix" that hurts clarity unless the path is genuinely
   hot. When two suggestions are mutually exclusive and both defensible, pick
   the one that touches less code and note the alternative.
-4. **Apply** the surviving fixes directly with `patch` / `write_file` — unless
-   the user asked for a dry run, in which case present the list and ask first.
+4. **Apply in risk-tier order:**
+   - **SAFE first** (auto-apply): unused imports, commented-out code,
+     pass-through wrappers, redundant type assertions. Run tests after.
+   - **CAREFUL next** (apply with verification, one file at a time): rename
+     locals, flatten ternaries, extract helpers, consolidate dupes. Run tests
+     after each file. Revert any that break.
+   - **RISKY last** (flag for review — do NOT auto-apply): N+1 restructuring,
+     public API changes, concurrency fixes, error-handling changes. Present
+     each with risk description and test coverage status.
+   If the user opted for a dry run, present all three tiers and apply nothing.
 5. **Verify** you didn't break anything: run the project's targeted tests for
   the touched files (not the full suite), and re-run any linter/type check the
   repo uses. If a fix breaks a test, revert that one fix and report it.
 6. **Summarize** what you changed: a short list of applied fixes grouped by
-   reviewer category, plus any findings you deliberately skipped and why.
+   reviewer category and risk tier, plus any findings you deliberately skipped
+   and why.

 ## Pitfalls

@ -166,6 +193,16 @@ Wait for all three to return (batch mode returns them together).
 - **Large diffs blow context.** If the diff is huge, scope it down before
  delegating — three subagents each carrying a 5000-line diff is expensive and
  may truncate.
+- **Over-trusting dead code tools.** `knip`, `ts-prune`, and `depcheck` flag
+  exports that ARE used dynamically (string-based imports, reflection). Always
+  grep for the symbol name before removing — a clean tool report is not proof.
+- **Renaming without checking public contracts.** Export names, API route
+  paths, DB column names, and config keys are contracts — even if the name is
+  bad, renaming breaks consumers. Tag public-contract changes as RISKY; never
+  auto-rename them.
+- **Removing "unnecessary" error handling.** An empty catch block or ignored
+  error might be intentional — the error is expected and benign in that
+  context. Flag it, don't remove it; let the human decide.

 ## Related