fix(compression): remove hardcoded gemini-3-flash-preview as default summary model

Closes #2453

The DEFAULT_CONFIG was hardcoding google/gemini-3-flash-preview as the summary_model for context compression. This caused unexpected OpenRouter charges for users who configured a different provider/model, because the compression task would silently fall back to gemini via OpenRouter even when the user's main model was on a different provider.

Fix: change summary_model default to empty string. When empty, call_llm() resolves the model through the standard auto-detection chain (auxiliary.compression config -> env vars -> main provider), which correctly uses the user's configured provider and model.

Users who want a dedicated cheap model for compression can still explicitly set compression.summary_model in their config.yaml.
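The fallback order described above can be sketched roughly as follows. This is an illustration only, not the actual call_llm() implementation: the function name resolve_summary_model, the config keys auxiliary.compression.model and model, and the environment variable AUXILIARY_COMPRESSION_MODEL are all hypothetical stand-ins chosen for the example.

```python
import os
from typing import Any, Mapping, Optional


def resolve_summary_model(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the model used for context-compression summaries.

    Hypothetical sketch of the chain: explicit summary_model ->
    auxiliary.compression config -> env var -> main model.
    """
    env = os.environ if env is None else env

    candidates = [
        # 1. Explicit user override; an empty string means "not set".
        config.get("compression", {}).get("summary_model", ""),
        # 2. Auxiliary compression config (key name is an assumption).
        config.get("auxiliary", {}).get("compression", {}).get("model", ""),
        # 3. Environment variable (variable name is an assumption).
        env.get("AUXILIARY_COMPRESSION_MODEL", ""),
        # 4. The user's main model (key name is an assumption).
        config.get("model", ""),
    ]

    # Return the first non-empty candidate in priority order.
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError("no model configured for compression summaries")
```

With an empty summary_model and nothing else configured, the sketch falls through to the main model, so no request goes to a provider the user did not choose. Setting compression.summary_model explicitly still takes priority over everything else.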
2026-04-25 00:51:20 +00:00 · 2026-03-22 11:20:27 +00:00 · 0698ddb496
commit 0698ddb496
parent 0962cbb2e5
2 changed files with 4 additions and 3 deletions
--- a/cli.py
+++ b/cli.py
@@ -180,7 +180,7 @@ def load_cli_config() -> Dict[str, Any]:
        "compression": {
            "enabled": True,      # Auto-compress when approaching context limit
            "threshold": 0.50,    # Compress at 50% of model's context limit
-            "summary_model": "google/gemini-3-flash-preview",  # Fast/cheap model for summaries
+            "summary_model": "",  # Model for summaries (empty = use main model)
        },
        "smart_model_routing": {
            "enabled": False,