fix(agent): downgrade xhigh→max on Anthropic pre-4.7 adaptive models

Regression from #11161 (Claude Opus 4.7 migration, commit 0517ac3e). The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max" (the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s: BadRequestError [HTTP 400]: This model does not support effort level 'xhigh'. Supported levels: high, low, max, medium. Users who set reasoning_effort=xhigh as their default (xhigh is the recommended default for coding/agentic on 4.7 per the Anthropic migration guide) now 400 every request the moment they switch back to a 4.6 model via `/model` or config. Verified live against the Anthropic API on `anthropic==0.94.0`. Fix: make the mapping model-aware. Add `_supports_xhigh_effort()` predicate (matches 4-7/4.7 substrings, mirroring the existing `_supports_adaptive_thinking` / `_forbids_sampling_params` pattern). On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort those models accept, restoring pre-migration behavior). On 4.7+, keep xhigh as a distinct level. Per Anthropic's migration guide, xhigh is 4.7-only: https://platform.claude.com/docs/en/about-claude/models/migration-guide > Opus 4.7 effort levels: max, xhigh (new), high, medium, low. > Opus 4.6 effort levels: max, high, medium, low. SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[ "low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh). ## Test plan Verified live on macOS 15.5 / anthropic==0.94.0: claude-opus-4-6 + effort=xhigh → output_config.effort=max → 200 OK claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK claude-opus-4-6 + effort=max → output_config.effort=max → 200 OK claude-opus-4-7 + effort=max → output_config.effort=max → 200 OK `tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged test that asserted the broken behavior, added 1 for 4.7 preservation). Full adapter suite: 120 passed in 1.05s. Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed (2 pre-existing failures on clean upstream/main, unrelated). ## Platforms Tested on macOS 15.5. No platform-specific code paths touched.
2026-06-09 08:21:50 +00:00 · 2026-04-16 13:51:42 -05:00 · 2026-04-16 13:51:42 -05:00 · 63d06dd93d
commit 63d06dd93d
parent 37913d9109
2 changed files with 48 additions and 9 deletions
--- a/tests/agent/test_anthropic_adapter.py
+++ b/tests/agent/test_anthropic_adapter.py
@ -959,11 +959,13 @@ class TestBuildAnthropicKwargs:
        assert "temperature" not in kwargs
        assert kwargs["max_tokens"] == 4096

-    def test_reasoning_config_maps_xhigh_to_xhigh_effort_for_4_6_models(self):
-        # Opus 4.7 added "xhigh" as a distinct effort level (the recommended
-        # default for coding/agentic work). Earlier mapping aliased xhigh→max,
-        # which silently over-efforted every request. 2026-04-16 migration
-        # guide: xhigh and max are distinct levels.
+    def test_reasoning_config_downgrades_xhigh_to_max_for_4_6_models(self):
+        # Opus 4.7 added "xhigh" as a distinct effort level (low/medium/high/
+        # xhigh/max). Opus 4.6 only supports low/medium/high/max — sending
+        # "xhigh" there returns an API 400. Preserve the pre-migration
+        # behavior of aliasing xhigh→max on pre-4.7 adaptive models so users
+        # who prefer xhigh as their default don't 400 every request when
+        # switching back to 4.6.
        kwargs = build_anthropic_kwargs(
            model="claude-sonnet-4-6",
            messages=[{"role": "user", "content": "think harder"}],
@ -972,6 +974,19 @@ class TestBuildAnthropicKwargs:
            reasoning_config={"enabled": True, "effort": "xhigh"},
        )
        assert kwargs["thinking"] == {"type": "adaptive", "display": "summarized"}
+        assert kwargs["output_config"] == {"effort": "max"}
+
+    def test_reasoning_config_preserves_xhigh_for_4_7_models(self):
+        # On 4.7+ xhigh is a real level and the recommended default for
+        # coding/agentic work — keep it distinct from max.
+        kwargs = build_anthropic_kwargs(
+            model="claude-opus-4-7",
+            messages=[{"role": "user", "content": "think harder"}],
+            tools=None,
+            max_tokens=4096,
+            reasoning_config={"enabled": True, "effort": "xhigh"},
+        )
+        assert kwargs["thinking"] == {"type": "adaptive", "display": "summarized"}
        assert kwargs["output_config"] == {"effort": "xhigh"}

    def test_reasoning_config_maps_max_effort_for_4_7_models(self):