* fix: respect disabled auto-compaction on context overflow
Port from anomalyco/opencode#30749.
When compression.enabled is false, NO automatic compaction trigger may
fire. The proactive token-threshold paths (preflight + post-response
should_compress gate) already honoured the setting, but the three
provider-overflow recovery paths in the agent loop — long-context-tier
429, 413 payload-too-large, and context-overflow — called
_compress_context() unconditionally, silently compressing and rotating
the session against the user's explicit choice.
Add a single guard at the top of the overflow-recovery dispatch: when
compression is disabled and the error is one of those three overflow
classes, surface a terminal error (compaction_disabled: True) telling the
user to /compress manually, /new, switch to a larger-context model, or
reduce attachments. Manual /compress (force=True) is unaffected — it never
enters this loop.
Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't
compress when disabled; control case still compresses when enabled).
Existing overflow-recovery tests updated to enable compaction explicitly
(they verify the recovery fires); fixture defaults flipped to True to
match production (compression.enabled defaults to True).
* fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints
Two distinct failures hit users on the gemini provider with only Google
AI Studio keys set.
1. Truncation loop: build_gemini_request() only set maxOutputTokens when
max_tokens was non-None. Hermes passes None to mean "unlimited", but
Gemini's native generateContent does NOT treat an absent maxOutputTokens
as full budget — it applies a low internal default and stops early with
finishReason=MAX_TOKENS, truncating tool calls. The agent then retries
3x and refuses the incomplete call. Now default to the published 65,535
ceiling (shared by all current Gemini text models) when max_tokens=None.
2. HTTP 400 on Gemini endpoint: the chat_completions transport assembles
profile extra_body (Nous portal 'tags', reasoning, provider prefs) and
sends it via the OpenAI client to whatever base_url is resolved. When a
profile that emits extra_body (e.g. Nous) is active but the endpoint is a
native Gemini base_url — typical when only Google creds exist and a
fallback/aux call lands on Gemini — Google rejects the unknown 'tags'
field with a non-retryable 400. Strip all non-thinking_config extra_body
keys when the resolved endpoint is native Gemini.
Verified E2E against real transport code: tags stripped on native Gemini,
preserved on Nous and the /openai compat endpoint; maxOutputTokens=65535
on None, explicit values respected.
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.
The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:
1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
omitted Google entirely, so users had no way to verify from 'hermes
dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
rows.
2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
an empty/whitespace api_key and stamped it into the x-goog-api-key
header, which made Google's frontend return a generic HTML 400 long
before the request reached the Generative Language backend. Now we
raise RuntimeError at construction with an actionable message
pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.
Added a regression test that covers '', ' ', and None.
- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks
- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks