Closes#6051.
Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").
Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.
Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
• Environment-dependent failures (missing binaries, fresh-install
errors, post-migration path mismatches, 'command not found',
unconfigured credentials, uninstalled packages)
• Negative claims about tools or features ("X does not work")
that harden into self-cited refusals
• Session-specific transient errors that resolved before the
conversation ended
• One-off task narratives ("summarize today's market", "analyze
this PR") — also addresses the #12812 / #4538 family
Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.
Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.
This rewrite changes the stance:
- **Active-update bias.** The reviewer now treats inaction as a missed
learning opportunity. 'Nothing to save.' remains an explicit escape
but is no longer framed as the most-common outcome.
- **User-preference corrections are first-class skill signals.** Style,
tone, format, legibility, verbosity complaints — and the actual
phrasings users use ('stop doing X', 'this is too verbose', 'I hate
when you Y', 'remember this') — now warrant patching the skill that
governs the task, not just writing to memory.
- **Loaded-skill-first preference order.** When a skill was loaded via
/skill-name or skill_view during the session, the reviewer patches
THAT one first. It was in play; it's the right place.
- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
create.** Support files are explicitly enumerated as three kinds:
* references/<topic>.md — session-specific detail OR condensed
knowledge banks (quoted research, API docs excerpts, domain notes)
* templates/<name>.<ext> — starter files to copy and modify
* scripts/<name>.<ext> — statically re-runnable actions
- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
numbers, error strings, codenames, library-alone names, or session
artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
only fits today's task, fall back to one of the patch/support-file
options.
- **Memory scope clarified.** 'who the user is and what the current
situation and state of your operations are' — MEMORY.md is
situational/state, USER.md is identity/preferences.
- **Curator handoff.** Reviewer flags overlap; the background curator
handles consolidation at scale. Single-session reviewer doesn't
attempt umbrella-rebalancing.
Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.
This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").
Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.
Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.
Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.