mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-21 10:22:18 +00:00
feat(simplify-code): add risk-tiered application, Chesterton's Fence, slop + silent failure detection
Five targeted enhancements to the upstream simplify-code skill:
1. Risk-tiered application (SAFE/CAREFUL/RISKY) — safe changes auto-applied,
careful changes verified per-file, risky changes flagged for human review.
Prevents auto-applying N+1 restructures and public API renames.
2. Chesterton's Fence — before flagging anything for removal, reviewers run
'git blame' to understand why it exists. Low-confidence findings are
escalated rather than guessed.
3. AI slop detection — Quality reviewer now catches: extra comments restating
obvious code, unnecessary defensive null-checks on validated inputs, 'as any'
casts, and patterns inconsistent with the rest of the file.
4. Silent failure detection — Efficiency reviewer now catches: empty catch
blocks, ignored error returns, except:pass, .catch(()=>{}) with no handling,
and error propagation gaps.
5. Structured reviewer output with confidence+risk tags — reviewers report in
'file:line → problem → fix | confidence: H/M/L | risk: SAFE/CAREFUL/RISKY'
format, enabling the orchestrator to tier the application.
Plus 3 new pitfalls: over-trusting dead code tools, public contract awareness,
and preserving intentional error handling.
Total: +45/-8 lines. Keeps the 212-line compact spirit.
Ref: #379
This commit is contained in:
parent
ba50e86563
commit
db744e7d1e
1 changed files with 45 additions and 8 deletions
|
|
@ -87,8 +87,20 @@ toolsets (so they can `git`, `read_file`, and `search_files`/grep).
|
|||
|
||||
Tell each reviewer to:
|
||||
- Search the existing codebase for evidence (don't reason from the diff alone).
|
||||
- Report findings as a concrete list: `file:line → problem → suggested fix`.
|
||||
- Rank each finding `high` / `medium` / `low` confidence.
|
||||
- **Apply Chesterton's Fence:** before flagging anything for removal, run
|
||||
`git blame` on the line to understand why it exists. If you can't determine
|
||||
the original purpose, mark it `confidence: low` — don't guess.
|
||||
- Report findings as structured output with confidence and risk:
|
||||
```
|
||||
file:line → problem → suggested fix | confidence: high/medium/low | risk: SAFE/CAREFUL/RISKY
|
||||
```
|
||||
- **SAFE** = proven not to affect behavior (unused imports, commented-out
|
||||
code, pass-through wrappers). Auto-apply these.
|
||||
- **CAREFUL** = improves without changing semantics (rename local variable,
|
||||
flatten nested ternary, extract helper). Apply with test verification.
|
||||
- **RISKY** = may change behavior or breaks public contracts (N+1
|
||||
restructuring, public API rename, memory lifecycle change). Flag for
|
||||
human review — do NOT auto-apply.
|
||||
- Skip nits and style-only churn. Only flag things that materially improve
|
||||
the code.
|
||||
|
||||
|
|
@ -112,7 +124,11 @@ Pass these three goals (drop any the user's focus excludes):
|
|||
> blocks that should share an abstraction); leaky abstractions (exposing
|
||||
> internals, breaking an existing encapsulation boundary); stringly-typed
|
||||
> code (raw strings where a constant/enum/registry already exists — check the
|
||||
> canonical registries before flagging). For each, give the concrete refactor.
|
||||
> canonical registries before flagging); AI-generated slop patterns (extra
|
||||
> comments restating obvious code like `// increment counter` above `count++`;
|
||||
> unnecessary defensive null-checks on already-validated inputs; `as any`
|
||||
> casts that bypass the type system; patterns inconsistent with the rest of
|
||||
> the file). For each, give the concrete refactor.
|
||||
|
||||
**Reviewer 3 — Efficiency**
|
||||
> Review this diff for efficiency problems. Look for: unnecessary work
|
||||
|
|
@ -122,8 +138,10 @@ Pass these three goals (drop any the user's focus excludes):
|
|||
> TOCTOU anti-patterns (existence pre-checks before an op instead of doing
|
||||
> the op and handling the error); memory issues (unbounded growth, missing
|
||||
> cleanup, listener/handle leaks); overly broad reads (loading whole files
|
||||
> when a slice would do). For each, give the concrete fix and why it's faster
|
||||
> or lighter.
|
||||
> when a slice would do); silent failures (empty catch blocks, ignored error
|
||||
> returns, `except: pass`, `.catch(() => {})` with no handling, error
|
||||
> propagation gaps — these hide bugs and should at minimum log before
|
||||
> swallowing). For each, give the concrete fix and why it's faster or safer.
|
||||
|
||||
### Phase 3 — Aggregate and apply
|
||||
|
||||
|
|
@ -138,13 +156,22 @@ Wait for all three to return (batch mode returns them together).
|
|||
Don't apply a perf "fix" that hurts clarity unless the path is genuinely
|
||||
hot. When two suggestions are mutually exclusive and both defensible, pick
|
||||
the one that touches less code and note the alternative.
|
||||
4. **Apply** the surviving fixes directly with `patch` / `write_file` — unless
|
||||
the user asked for a dry run, in which case present the list and ask first.
|
||||
4. **Apply in risk-tier order:**
|
||||
- **SAFE first** (auto-apply): unused imports, commented-out code,
|
||||
pass-through wrappers, redundant type assertions. Run tests after.
|
||||
- **CAREFUL next** (apply with verification, one file at a time): rename
|
||||
locals, flatten ternaries, extract helpers, consolidate dupes. Run tests
|
||||
after each file. Revert any that break.
|
||||
- **RISKY last** (flag for review — do NOT auto-apply): N+1 restructuring,
|
||||
public API changes, concurrency fixes, error-handling changes. Present
|
||||
each with risk description and test coverage status.
|
||||
If the user opted for a dry run, present all three tiers and apply nothing.
|
||||
5. **Verify** you didn't break anything: run the project's targeted tests for
|
||||
the touched files (not the full suite), and re-run any linter/type check the
|
||||
repo uses. If a fix breaks a test, revert that one fix and report it.
|
||||
6. **Summarize** what you changed: a short list of applied fixes grouped by
|
||||
reviewer category, plus any findings you deliberately skipped and why.
|
||||
reviewer category and risk tier, plus any findings you deliberately skipped
|
||||
and why.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
|
|
@ -166,6 +193,16 @@ Wait for all three to return (batch mode returns them together).
|
|||
- **Large diffs blow context.** If the diff is huge, scope it down before
|
||||
delegating — three subagents each carrying a 5000-line diff is expensive and
|
||||
may truncate.
|
||||
- **Over-trusting dead code tools.** `knip`, `ts-prune`, and `depcheck` flag
|
||||
exports that ARE used dynamically (string-based imports, reflection). Always
|
||||
grep for the symbol name before removing — a clean tool report is not proof.
|
||||
- **Renaming without checking public contracts.** Export names, API route
|
||||
paths, DB column names, and config keys are contracts — even if the name is
|
||||
bad, renaming breaks consumers. Tag public-contract changes as RISKY; never
|
||||
auto-rename them.
|
||||
- **Removing "unnecessary" error handling.** An empty catch block or ignored
|
||||
error might be intentional — the error is expected and benign in that
|
||||
context. Flag it, don't remove it; let the human decide.
|
||||
|
||||
## Related
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue