test(memory): cover cache-parity + runtime whitelist on background review fork

- test_background_review_does_not_narrow_toolset_schema: the review fork must NOT pass enabled_toolsets to AIAgent (the full parent schema gives a matching Anthropic cache key on the 'tools' field).
- test_background_review_installs_thread_local_whitelist: the runtime whitelist that replaces schema-level narrowing must contain the memory + skills tools and exclude terminal / send_message / delegate_task / web_search / execute_code.
- test_review_fork_inherits_parent_cached_system_prompt: new test for PR #17276's first root cause; the fork's _cached_system_prompt must equal the parent's byte-for-byte.
- test_review_fork_pins_session_start_and_session_id: defensive belt-and-suspenders for the cached-prompt inheritance.

Inverted the original test_background_review_agent_uses_restricted_toolsets, which asserted the schema-level narrowing. That narrowing was the direct cause of #25322's cache miss, and the runtime whitelist replaces its safety claim without breaking cache parity.

Refs #25322, #15204, PR #17276.
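The split described above (keep the full schema for cache parity, enforce safety with a runtime whitelist) can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the project's implementation: `tool_whitelist`, `dispatch_tool`, `AgentState`, `fork_for_review`, `run_in_other_thread`, and the tool names `memory` and `skills` are illustrative assumptions. Only the excluded tool names and the ideas of a cached system prompt and a pinned session_start/session_id come from the commit. It assumes the whitelist lives in `threading.local()`, so it restricts only the thread running the review fork.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass, replace
from datetime import datetime
from typing import FrozenSet, Iterator, Optional, Tuple

# Hypothetical per-thread storage; the real attribute names are not shown in this commit.
_local = threading.local()


@contextmanager
def tool_whitelist(allowed: FrozenSet[str]) -> Iterator[None]:
    """Install a tool whitelist for the current thread only; restore the previous one on exit."""
    previous: Optional[FrozenSet[str]] = getattr(_local, "allowed", None)
    _local.allowed = allowed
    try:
        yield
    finally:
        _local.allowed = previous


def dispatch_tool(name: str) -> str:
    """Hypothetical dispatcher: refuse calls outside this thread's whitelist, if one is set."""
    allowed = getattr(_local, "allowed", None)
    if allowed is not None and name not in allowed:
        return f"error: tool {name!r} is not permitted in this context"
    return f"ran {name}"


@dataclass(frozen=True)
class AgentState:
    tools: Tuple[str, ...]      # full tool schema, i.e. what goes into the 'tools' request field
    cached_system_prompt: str   # already-rendered system prompt reused across turns
    session_id: str
    session_start: datetime     # assumption: pinned because it may feed into the rendered prompt


def fork_for_review(parent: AgentState) -> AgentState:
    # replace() with no overrides returns a copy with identical field values, so the
    # fork sends the parent's exact tools and system prompt (same cache-relevant prefix).
    return replace(parent)


def run_in_other_thread(name: str) -> str:
    # Dispatch from a fresh thread, which has no whitelist installed.
    results = []
    worker = threading.Thread(target=lambda: results.append(dispatch_tool(name)))
    worker.start()
    worker.join()
    return results[0]


PARENT = AgentState(
    tools=("memory", "skills", "terminal", "send_message",
           "delegate_task", "web_search", "execute_code"),
    cached_system_prompt="test-cached-system-prompt",
    session_id="session-1",
    session_start=datetime(2026, 1, 1, 12, 0, 0),
)
REVIEW_ALLOWED = frozenset({"memory", "skills"})

if __name__ == "__main__":
    fork = fork_for_review(PARENT)
    assert fork.tools == PARENT.tools
    assert fork.cached_system_prompt == PARENT.cached_system_prompt
    with tool_whitelist(REVIEW_ALLOWED):
        print(dispatch_tool("memory"))
        print(dispatch_tool("terminal"))
        print(run_in_other_thread("terminal"))
```

Because the restriction is applied at dispatch time rather than by dropping tools from the schema, the fork's request still carries the parent's exact tool list while calls to excluded tools such as terminal are refused. A thread without the whitelist (here, the one started by run_in_other_thread) is unaffected.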
2026-05-23 05:31:23 +00:00 · 2026-05-13 22:06:31 -07:00 · 2026-05-13 22:06:31 -07:00 · 8c6b0c9ecd
commit 8c6b0c9ecd
parent 07349ce4df
3 changed files with 273 additions and 9 deletions
--- a/tests/run_agent/test_background_review.py
+++ b/tests/run_agent/test_background_review.py
@@ -20,6 +20,9 @@ def _bare_agent() -> AIAgent:
    agent._memory_store = object()
    agent._memory_enabled = True
    agent._user_profile_enabled = False
+    agent._cached_system_prompt = "test-cached-system-prompt"
+    import datetime as _dt
+    agent.session_start = _dt.datetime(2026, 1, 1, 12, 0, 0)
    agent._MEMORY_REVIEW_PROMPT = "review memory"
    agent._SKILL_REVIEW_PROMPT = "review skills"
    agent._COMBINED_REVIEW_PROMPT = "review both"