hermes-agent/agent
KUSH42 34d06a9802 fix(compaction): don't halve context_length on output-cap-too-large errors
When the API returns "max_tokens too large given prompt" (input tokens
fit within the context window, but input + requested output > window),
the old code routed it through the same handler as "prompt too long"
errors, calling get_next_probe_tier() and permanently halving
context_length. That made things worse: the window was fine; only the
requested output size needed trimming, and only for that one call.

Two distinct error classes are now handled separately:

  Prompt too long  — input itself exceeds context window.
    Fix: compress history + halve context_length (existing behaviour,
    unchanged).

  Output cap too large — input OK, but input + max_tokens > window.
    Fix: parse available_tokens from the error message, set a one-shot
    _ephemeral_max_output_tokens override for the retry, and leave
    context_length completely untouched.
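
In code, the split looks roughly like this (a hedged sketch:
parse_available_output_tokens_from_error() and get_next_probe_tier() are
the real names from this change; the agent object, compress_history(),
and the surrounding retry loop are simplified stand-ins):

    from agent.model_metadata import parse_available_output_tokens_from_error
    # get_next_probe_tier() is the pre-existing tier helper; import elided.

    OUTPUT_SAFETY_MARGIN = 64  # the safety margin mentioned under "Changes"

    def handle_context_length_error(agent, err: Exception) -> None:
        """Route the two error classes to their separate fixes (sketch)."""
        available = parse_available_output_tokens_from_error(str(err))
        if available is not None:
            # Output cap too large: trim only this call's output budget,
            # then retry. agent.context_length is deliberately untouched.
            agent._ephemeral_max_output_tokens = max(
                1, available - OUTPUT_SAFETY_MARGIN
            )
        else:
            # Prompt too long: existing behaviour, unchanged. Compress
            # history and step the context window down a tier.
            agent.context_length = get_next_probe_tier(agent.context_length)
            agent.compress_history()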

Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error(),
  which detects Anthropic's "available_tokens: N" error format and returns
  the available output budget, or None for all other error types (see the
  parser sketch after this list).
- run_agent.py: call the new parser first in the is_context_length_error
  block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
  safety margin) and break to retry without touching context_length.
  _build_api_kwargs consumes the ephemeral value exactly once, then clears
  it so subsequent calls use self.max_tokens normally (see the one-shot
  sketch after this list).
- agent/anthropic_adapter.py: expand the build_anthropic_kwargs docstring
  to clearly document the max_tokens (output cap) vs context_length (total
  window) distinction, a persistent source of confusion due to the
  OpenAI-inherited "max_tokens" name (sketched after this list).
- cli-config.yaml.example: add inline comments explaining both keys side
  by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
  of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
  the parser, build_anthropic_kwargs clamping, ephemeral one-shot
  consumption, and the invariant that context_length is never mutated
  on output-cap errors (the invariant is sketched after this list).
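
A plausible shape for the new parser, assuming the error message carries
the literal "available_tokens: N" fragment (the real implementation in
agent/model_metadata.py may differ in detail):

    import re
    from typing import Optional

    _AVAILABLE_TOKENS_RE = re.compile(r"available_tokens:\s*(\d+)")

    def parse_available_output_tokens_from_error(message: str) -> Optional[int]:
        """Return the available output budget, or None for all other
        error types."""
        match = _AVAILABLE_TOKENS_RE.search(message)
        return int(match.group(1)) if match else None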
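
The one-shot consumption in _build_api_kwargs, reduced to a minimal
stand-in class (the real method builds many more kwargs):

    class _AgentSketch:
        """Minimal stand-in; only the override bookkeeping is shown."""

        def __init__(self, max_tokens: int = 8192) -> None:
            self.max_tokens = max_tokens
            self._ephemeral_max_output_tokens: int | None = None

        def _build_api_kwargs(self) -> dict:
            kwargs = {"max_tokens": self.max_tokens}
            if self._ephemeral_max_output_tokens is not None:
                # Consume the override exactly once, then clear it so the
                # next call falls back to self.max_tokens as normal.
                kwargs["max_tokens"] = self._ephemeral_max_output_tokens
                self._ephemeral_max_output_tokens = None
            return kwargs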
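
The distinction the expanded docstring spells out, roughly (signature
abbreviated and wording assumed):

    def build_anthropic_kwargs(*args, **kwargs) -> dict:
        """Build kwargs for an Anthropic API call (docstring sketch).

        max_tokens     -- output cap only: the most tokens the model may
                          generate in this one response. The name is
                          inherited from OpenAI and does not bound input.
        context_length -- total window: input tokens plus generated
                          output must fit inside it.

        The API rejects a call when input + max_tokens > context_length
        even though the input alone fits: the output-cap-too-large case.
        """
        raise NotImplementedError  # body elided in this sketch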
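
And the invariant the tests pin down, reusing the hypothetical
handle_context_length_error() and _AgentSketch from the sketches above
(the real suite is far more thorough):

    def test_output_cap_error_never_mutates_context_length() -> None:
        agent = _AgentSketch(max_tokens=8192)
        agent.context_length = 200_000
        err = Exception("max_tokens too large given prompt; available_tokens: 1500")
        handle_context_length_error(agent, err)
        assert agent.context_length == 200_000  # window untouched
        assert agent._ephemeral_max_output_tokens == 1500 - 64  # one-shot set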
2026-04-09 11:27:41 -07:00
__init__.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
anthropic_adapter.py fix(compaction): don't halve context_length on output-cap-too-large errors 2026-04-09 11:27:41 -07:00
auxiliary_client.py fix(auxiliary_client): inject KimiCLI User-Agent for custom endpoint sync clients 2026-04-09 11:11:25 -07:00
builtin_memory_provider.py refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
context_compressor.py fix(compaction): token-budget primary tail protection 2026-04-08 23:54:23 -07:00
context_references.py refactor: replace inline HERMES_HOME re-implementations with get_hermes_home() 2026-04-07 10:40:34 -07:00
copilot_acp_client.py fix: bridge tool-calls in copilot-acp adapter 2026-04-06 01:47:57 -07:00
credential_pool.py fix(credential_pool): use _resolve_kimi_base_url when seeding kimi-coding pool 2026-04-09 11:11:25 -07:00
display.py refactor: remove 24 confirmed dead functions — 432 lines of unused code 2026-04-07 11:41:26 -07:00
error_classifier.py fix: prevent 400 format errors from triggering compression loop on Codex Responses API (#6751) 2026-04-09 11:11:34 -07:00
insights.py fix(insights): show cache tokens in overview so total adds up (#4428) 2026-04-01 03:06:47 -07:00
memory_manager.py refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
memory_provider.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
model_metadata.py fix(compaction): don't halve context_length on output-cap-too-large errors 2026-04-09 11:27:41 -07:00
models_dev.py feat(qwen): add Qwen OAuth provider with portal request support 2026-04-08 13:46:30 -07:00
prompt_builder.py feat(gateway): add BlueBubbles iMessage platform adapter (#6437) 2026-04-08 23:54:03 -07:00
prompt_caching.py fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter 2026-03-21 16:54:43 -07:00
rate_limit_tracker.py feat: capture provider rate limit headers and show in /usage (#6541) 2026-04-09 03:43:14 -07:00
redact.py fix: mem0 API v2 compat, prefetch context fencing, secret redaction (#5423) 2026-04-05 22:43:33 -07:00
retry_utils.py feat(agent): add jittered retry backoff 2026-04-08 00:41:36 -07:00
skill_commands.py feat(skills): add skill config interface + llm-wiki skill (#5635) 2026-04-06 13:49:13 -07:00
skill_utils.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
smart_model_routing.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-04-02 11:00:35 +11:00
subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
title_generator.py feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597) 2026-03-28 14:35:28 -07:00
trajectory.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
usage_pricing.py fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024) 2026-03-25 12:45:58 -07:00