fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072

The opencode-go relay defaults max_tokens to 262144 when the request omits it, but Xiaomi mimo-v2.5-pro supports at most 131072 completion tokens, so every request fails with a 400 ("max_tokens is too large: 262144") before the agent can do anything.

Add a get_max_tokens(model) hook on ProviderProfile (the default implementation returns default_max_tokens) so profiles fronting multiple upstreams can vary the cap per model. Wire the chat_completions transport through the hook, and override it on OpenCodeGoProfile with mimo-v2.5-pro=131072.

Only mimo-v2.5-pro is capped; other opencode-go models (kimi, glm, qwen, minimax, other mimo variants) are unchanged.
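
For readers without the other two changed files at hand, the hook described above might look roughly like the sketch below. It is an illustration under assumptions, not the code from this commit: the dataclass layout, the `_MODEL_MAX_TOKENS` table name, and exact-match lookup on the model string are all hypothetical. Only the method name `get_max_tokens(model)`, the `default_max_tokens` fallback, and the `mimo-v2.5-pro = 131072` cap come from the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderProfile:
    """Hypothetical stand-in for the real ProviderProfile."""

    default_max_tokens: Optional[int] = None

    def get_max_tokens(self, model: str) -> Optional[int]:
        # Base behaviour per the commit message: ignore the model and
        # return the profile-wide default.
        return self.default_max_tokens


# Assumed name and shape; only the mimo-v2.5-pro entry is from the commit.
_MODEL_MAX_TOKENS = {"mimo-v2.5-pro": 131072}


@dataclass
class OpenCodeGoProfile(ProviderProfile):
    def get_max_tokens(self, model: str) -> Optional[int]:
        # Capped models get their own limit; every other model falls
        # back to the profile default, so its behaviour is unchanged.
        return _MODEL_MAX_TOKENS.get(model, self.default_max_tokens)
```

With this shape, a profile whose `default_max_tokens` is unset still returns a falsy value for uncapped models, so the transport's `elif profile_max and max_tokens_fn` branch is skipped for them exactly as the old `profile.default_max_tokens` check was.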
2026-06-07 08:02:23 +00:00 · 2026-05-28 20:35:04 -07:00 · 8cf6b3da9d
commit 8cf6b3da9d
parent bfecfabd0f
3 changed files with 35 additions and 2 deletions
--- a/agent/transports/chat_completions.py
+++ b/agent/transports/chat_completions.py
@@ -476,13 +476,17 @@ class ChatCompletionsTransport(ProviderTransport):
        ephemeral = params.get("ephemeral_max_output_tokens")
        user_max = params.get("max_tokens")
        anthropic_max = params.get("anthropic_max_output")
+        # Per-model default cap — profiles override get_max_tokens() when
+        # they front several backends with different completion-token limits
+        # (e.g. opencode-go: mimo-v2.5-pro = 131072).
+        profile_max = profile.get_max_tokens(model)

        if ephemeral is not None and max_tokens_fn:
            api_kwargs.update(max_tokens_fn(ephemeral))
        elif user_max is not None and max_tokens_fn:
            api_kwargs.update(max_tokens_fn(user_max))
-        elif profile.default_max_tokens and max_tokens_fn:
-            api_kwargs.update(max_tokens_fn(profile.default_max_tokens))
+        elif profile_max and max_tokens_fn:
+            api_kwargs.update(max_tokens_fn(profile_max))
        elif anthropic_max is not None:
            api_kwargs["max_tokens"] = anthropic_max