fix(review): tell background reviewer not to capture transient env failures as skills (#23004)

Closes #6051. Reported failure mode: agent migrated to WSL2, browser launch failed because Playwright wasn't installed yet. Background reviewer captured the failure as a durable skill (`browser-tool-launch-issue`) and the agent kept refusing the browser tool for weeks after Playwright was installed and verified working. Negative claims also propagated into unrelated skills ("browser tools do not work", "cannot use Y from execute_code"). Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both lean hard on "be active, save things, a pass that does nothing is a missed learning opportunity." Neither distinguished durable knowledge from transient environment state. The reviewer was doing what it was told. Fix at the write site — both prompts now carry a "Do NOT capture" section calling out: • Environment-dependent failures (missing binaries, fresh-install errors, post-migration path mismatches, 'command not found', unconfigured credentials, uninstalled packages) • Negative claims about tools or features ("X does not work") that harden into self-cited refusals • Session-specific transient errors that resolved before the conversation ended • One-off task narratives ("summarize today's market", "analyze this PR") — also addresses the #12812 / #4538 family Plus a positive-reframing line: when a tool fails because of setup state, capture the FIX (install command, config step, env var) under an existing setup/troubleshooting skill — never "this tool doesn't work" as a standalone constraint. Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py (2 new + all existing review-prompt assertions). Substring-based checks so future prompt edits don't false-fail.
2026-05-18 04:41:56 +00:00 · 2026-05-09 22:51:25 -07:00 · 2026-05-09 22:51:25 -07:00 · e5af1dd633
commit e5af1dd633
parent 126cbffb8a
2 changed files with 84 additions and 0 deletions
--- a/tests/run_agent/test_review_prompt_class_first.py
+++ b/tests/run_agent/test_review_prompt_class_first.py
@ -178,6 +178,50 @@ def test_combined_review_prompt_preserves_opt_out_clause():
    assert "Nothing to save." in prompt


+# ---------------------------------------------------------------------------
+# Anti-pattern guidance — see issue #6051. The reviewer was learning transient
+# environment failures (e.g. "browser tools do not work" from a fresh-install
+# Playwright miss) as durable skill rules, then citing them against itself for
+# weeks after the environment was fixed. Both review prompts must explicitly
+# tell the reviewer not to capture environment-dependent or negative-framing
+# content as skills.
+# ---------------------------------------------------------------------------
+
+
+def _assert_anti_pattern_guidance(prompt: str, label: str) -> None:
+    """Both review prompts must carry the same anti-pattern section."""
+    lower = prompt.lower()
+    assert "do not capture" in lower, (
+        f"{label}: must have an explicit 'Do NOT capture' section"
+    )
+    # Environment-dependent failures (the #6051 root cause)
+    assert any(k in lower for k in ("missing binar", "command not found", "uninstalled", "fresh-install")), (
+        f"{label}: must call out environment/setup failures as not-skill-worthy"
+    )
+    # Negative-framing avoidance
+    assert any(k in lower for k in ("negative claim", "do not work", "is broken")), (
+        f"{label}: must call out negative-claim phrasings as the failure mode"
+    )
+    # Positive reframing — "capture the fix, not the failure"
+    assert "capture the fix" in lower or "capture the fix " in lower, (
+        f"{label}: must redirect tool-failure capture toward the fix, not the constraint"
+    )
+    # One-off task narratives (#12812 family)
+    assert "one-off" in lower, (
+        f"{label}: must call out one-off task narratives as not-skill-worthy"
+    )
+
+
+def test_skill_review_prompt_has_anti_pattern_guidance():
+    """_SKILL_REVIEW_PROMPT must tell the reviewer NOT to capture transient env failures (#6051)."""
+    _assert_anti_pattern_guidance(AIAgent._SKILL_REVIEW_PROMPT, "_SKILL_REVIEW_PROMPT")
+
+
+def test_combined_review_prompt_has_anti_pattern_guidance():
+    """_COMBINED_REVIEW_PROMPT must carry the same guidance — same failure mode applies."""
+    _assert_anti_pattern_guidance(AIAgent._COMBINED_REVIEW_PROMPT, "_COMBINED_REVIEW_PROMPT")
+
+
 # ---------------------------------------------------------------------------
 # _MEMORY_REVIEW_PROMPT — unchanged, still memory-focused
 # ---------------------------------------------------------------------------