fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072

The opencode-go relay defaults max_tokens to 262144 when the client sends none, but Xiaomi mimo-v2.5-pro supports at most 131072 completion tokens, so every request fails with a 400 ("max_tokens is too large: 262144") before the agent can do anything.

Add a get_max_tokens(model) hook on ProviderProfile (the default returns default_max_tokens) so that profiles fronting multiple upstreams can vary the cap per model. Route the chat_completions transport through the hook, and override it on OpenCodeGoProfile with mimo-v2.5-pro=131072.

Only mimo-v2.5-pro is capped; the other opencode-go models (kimi, glm, qwen, minimax, other mimo variants) are unchanged.
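
For reviewers, the intended lookup can be sketched as below. This is a minimal, standalone approximation, not the code in this commit: the ProviderProfile base here is a stand-in, the body of _flat_model_name() is a guess (the real helper is not shown in the diff), and build_chat_kwargs() is a hypothetical name for the place where the chat_completions transport consults the hook.

```python
from __future__ import annotations

from typing import Any


def _flat_model_name(model: str | None) -> str:
    # Assumed behaviour: the real helper is not part of this diff.
    # Here it drops any "vendor/" prefix and lowercases the rest.
    return (model or "").rsplit("/", 1)[-1].strip().lower()


class ProviderProfile:
    # Stand-in base class. None means "send no max_tokens and let the
    # upstream apply its own default".
    default_max_tokens: int | None = None

    def get_max_tokens(self, model: str | None) -> int | None:
        return self.default_max_tokens


class OpenCodeGoProfile(ProviderProfile):
    # Per-model completion-token caps, keyed by normalized model name.
    _MODEL_MAX_TOKENS: dict[str, int] = {"mimo-v2.5-pro": 131072}

    def get_max_tokens(self, model: str | None) -> int | None:
        cap = self._MODEL_MAX_TOKENS.get(_flat_model_name(model))
        if cap is not None:
            return cap
        return self.default_max_tokens


def build_chat_kwargs(profile: ProviderProfile, model: str) -> dict[str, Any]:
    # Hypothetical transport-side helper: only set max_tokens when the
    # profile supplies a value, so uncapped models keep relay defaults.
    kwargs: dict[str, Any] = {"model": model}
    max_tokens = profile.get_max_tokens(model)
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Under these assumptions, build_chat_kwargs(OpenCodeGoProfile(), "mimo-v2.5-pro") carries an explicit max_tokens of 131072, while a request for any other model omits the field and behaves as before.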
2026-07-19 15:18:03 +00:00 · 2026-05-28 20:35:04 -07:00 · 8cf6b3da9d
commit 8cf6b3da9d
parent bfecfabd0f
3 changed files with 35 additions and 2 deletions
--- a/plugins/model-providers/opencode-zen/init.py
+++ b/plugins/model-providers/opencode-zen/init.py
@@ -34,6 +34,21 @@ def _is_deepseek_thinking_model(model: str | None) -> bool:
 class OpenCodeGoProfile(ProviderProfile):
    """OpenCode Go - model-specific reasoning controls."""

+    # Per-model completion-token cap. The opencode-go relay's default is
+    # too large for mimo-v2.5-pro — it sends max_tokens=262144 but Xiaomi
+    # only supports 131072 completion tokens and 400s the request.
+    # Setting an explicit cap here prevents the relay default from being
+    # applied. Keys are normalized via _flat_model_name().
+    _MODEL_MAX_TOKENS: dict[str, int] = {
+        "mimo-v2.5-pro": 131072,
+    }
+
+    def get_max_tokens(self, model: str | None) -> int | None:
+        cap = self._MODEL_MAX_TOKENS.get(_flat_model_name(model))
+        if cap is not None:
+            return cap
+        return self.default_max_tokens
+
    def build_api_kwargs_extras(
        self, *, reasoning_config: dict | None = None, model: str | None = None, **context
    ) -> tuple[dict[str, Any], dict[str, Any]]: