fix(deepseek): wire thinking-mode via DeepSeekProfile, not legacy fallback

The cherry-picked PR #15251 from @tw2818 correctly identified the root
cause of DeepSeek's HTTP 400 errors, but placed the fix in the legacy
fallback path of `build_kwargs`, which DeepSeek never reaches: DeepSeek
has a registered ProviderProfile and goes through
`_build_kwargs_from_profile` instead. The legacy-path block was
therefore dead code.

This commit moves the fix to where it actually fires:

- New `DeepSeekProfile` in `plugins/model-providers/deepseek/__init__.py`
  overrides `build_api_kwargs_extras` to emit DeepSeek's expected wire
  format (mirroring `KimiProfile`):

      {"reasoning_effort": "<low|medium|high|max>",
       "extra_body": {"thinking": {"type": "enabled" | "disabled"}}}

- Model gating: only `deepseek-v4-*` and `deepseek-reasoner` emit
  thinking control. `deepseek-chat` (V3) is untouched and keeps its
  current behavior.
- Effort mapping: low/medium/high pass through, xhigh/max → max, and
  unset → omitted (the DeepSeek server applies its own default).
- Revert the legacy-path additions from PR #15251. They were dead code,
  and the `_copy_reasoning_content_for_api` strip block in particular
  would have nullified the existing `reasoning_content` padding
  machinery (`_needs_deepseek_tool_reasoning` → space-pad on replay)
  that the active provider already relies on for replay correctness.
- Unit tests pin the wire-shape contract and the model-gating rules
  (26 tests, all passing). The existing transport and provider-profile
  suites (321 tests) continue to pass.
- AUTHOR_MAP: map twebefy@gmail.com → tw2818 for release-notes credit.

Closes #15700, #17212, #17825.

Co-authored-by: tw2818 <twebefy@gmail.com>
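For reviewers, the model gating and effort mapping described in this commit can be sketched as a standalone function. This is a hedged approximation, not the actual `DeepSeekProfile.build_api_kwargs_extras`: the helper name `deepseek_thinking_extras`, its signature, the `reasoning_config` keys (`enabled`, `effort`, borrowed from the removed legacy block), and the choice to omit `reasoning_effort` when thinking is disabled are all assumptions.

```python
from typing import Any, Optional

# Effort mapping from the commit message (hypothetical table name):
# low/medium/high pass through, xhigh/max collapse to "max".
_EFFORT_MAP = {
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def _supports_thinking_control(model: str) -> bool:
    """Model gating: only deepseek-v4-* and deepseek-reasoner get thinking control."""
    name = model.strip().lower()
    return name.startswith("deepseek-v4-") or name == "deepseek-reasoner"


def deepseek_thinking_extras(
    model: str, reasoning_config: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Hypothetical helper returning extra request kwargs for DeepSeek.

    Returns {} for models without thinking control (e.g. deepseek-chat),
    so their requests are left untouched.
    """
    if not _supports_thinking_control(model):
        return {}

    config = reasoning_config or {}
    # Assumption: thinking stays on unless the config explicitly disables it.
    enabled = config.get("enabled") is not False

    extras: dict[str, Any] = {
        "extra_body": {"thinking": {"type": "enabled" if enabled else "disabled"}}
    }

    # Unset or unrecognized effort is omitted so the server default applies.
    # Assumption: effort is only sent while thinking is enabled.
    effort = _EFFORT_MAP.get(str(config.get("effort") or "").strip().lower())
    if enabled and effort is not None:
        extras["reasoning_effort"] = effort
    return extras
```

Returning an empty dict for ungated models keeps the `deepseek-chat` path byte-for-byte identical to the previous behavior, which is the property the gating tests are meant to pin.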
2026-05-22 05:22:09 +00:00 · 2026-05-15 16:39:18 -07:00 · 2026-05-15 16:39:18 -07:00 · cd9470f416
commit cd9470f416
parent 068c24f8a4
5 changed files with 266 additions and 29 deletions
--- a/agent/transports/chat_completions.py
+++ b/agent/transports/chat_completions.py
@@ -189,7 +189,6 @@ class ChatCompletionsTransport(ProviderTransport):
            is_kimi: bool
            is_tokenhub: bool
            is_lmstudio: bool
-            is_deepseek: bool
            is_custom_provider: bool
            ollama_num_ctx: int | None
            # Provider routing
@@ -349,25 +348,6 @@ class ChatCompletionsTransport(ProviderTransport):
                "type": "enabled" if _kimi_thinking_enabled else "disabled",
            }

-        # DeepSeek extra_body.thinking + top-level reasoning_effort
-        is_deepseek = params.get("is_deepseek", False)
-        if is_deepseek:
-            _ds_thinking_enabled = True
-            if reasoning_config and isinstance(reasoning_config, dict):
-                if reasoning_config.get("enabled") is False:
-                    _ds_thinking_enabled = False
-            extra_body["thinking"] = {
-                "type": "enabled" if _ds_thinking_enabled else "disabled",
-            }
-            # DeepSeek effort: low/medium→high, high→high, xhigh/max→max
-            if _ds_thinking_enabled and reasoning_config:
-                _e = (reasoning_config.get("effort") or "").strip().lower()
-                if _e in ("xhigh", "max"):
-                    api_kwargs["reasoning_effort"] = "max"
-                elif _e in ("low", "medium", "high"):
-                    api_kwargs["reasoning_effort"] = _e
-            # If no effort configured, don't set it → DeepSeek defaults to high
-
        # Reasoning. LM Studio is handled above via top-level reasoning_effort,
        # so skip emitting extra_body.reasoning for it.
        if params.get("supports_reasoning", False) and not params.get("is_lmstudio", False):