hermes-agent/agent
Teknium 6f11ff53ad
fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426)
The Anthropic adapter defaulted to max_tokens=16384 when no explicit value
was configured. This severely limited thinking-enabled models, where thinking
tokens count toward max_tokens:

- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K

With extended thinking (adaptive or budget-based), the model could exhaust
the entire 16K on reasoning, leaving zero tokens for the actual response.
This caused two user-visible errors:
- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all

Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs
and Cline's model catalog) and use the model's actual output limit as the
default.  Unknown future models default to 128K (the current maximum).
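
A minimal sketch of the lookup described above, not the adapter's actual code. Only the name _ANTHROPIC_OUTPUT_LIMITS and the 128K/64K figures come from this commit message; the key strings, the exact token counts (128_000 and 64_000), the substring matching, and the helper name default_max_tokens are assumptions made for illustration.

```python
# Hypothetical sketch: key strings and exact token counts are assumptions.
_ANTHROPIC_OUTPUT_LIMITS = {
    "claude-opus-4-6": 128_000,    # "128K output" per the commit message
    "claude-sonnet-4-6": 64_000,   # "64K output" per the commit message
}

# Unknown future models fall back to the current maximum (128K).
_DEFAULT_OUTPUT_LIMIT = 128_000


def default_max_tokens(model: str) -> int:
    """Return the assumed native output limit for a model name."""
    name = model.lower()
    for key, limit in _ANTHROPIC_OUTPUT_LIMITS.items():
        # Substring match so that dated or provider-prefixed ids still resolve.
        if key in name:
            return limit
    return _DEFAULT_OUTPUT_LIMIT
```

With this shape, a thinking-enabled request that sets no explicit max_tokens would get the model's full output budget instead of a fixed 16K.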

Also adds context_length clamping: if the user configured a smaller context
window (e.g. custom endpoint), max_tokens is clamped to context_length - 1
to avoid exceeding the window.
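
The clamping rule above could look roughly like the following. This is a sketch under stated assumptions: the function name clamp_max_tokens, treating a missing context_length as "no clamp", and the floor of 1 token are all hypothetical and not taken from the adapter.

```python
from typing import Optional


def clamp_max_tokens(max_tokens: int, context_length: Optional[int]) -> int:
    """Keep max_tokens strictly below a user-configured context window."""
    if context_length is None:
        # Assumption: no configured window means nothing to clamp against.
        return max_tokens
    # Clamp to context_length - 1; the floor of 1 is an assumption that
    # guards against degenerate windows.
    return max(1, min(max_tokens, context_length - 1))
```

For example, a 128_000 default against a 32_768-token custom endpoint would be reduced to 32_767, while a 16_000 request against a 200_000-token window is left unchanged.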

Closes #2706
2026-03-27 13:02:52 -07:00
__init__.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
anthropic_adapter.py fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426) 2026-03-27 13:02:52 -07:00
auxiliary_client.py fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398) 2026-03-27 09:45:25 -07:00
context_compressor.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
context_references.py fix(context): restrict @ references to safe workspace paths (#2601) 2026-03-23 06:40:05 -07:00
copilot_acp_client.py fix(acp): preserve leading whitespace in streaming chunks 2026-03-20 09:38:13 -07:00
display.py fix(gateway): silence background agent terminal output (#3297) 2026-03-26 17:40:31 -07:00
insights.py chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119) 2026-03-25 19:47:58 -07:00
model_metadata.py feat: add Hugging Face as a first-class inference provider (#3419) 2026-03-27 12:41:59 -07:00
models_dev.py fix: 6 bugs in model metadata, reasoning detection, and delegate tool 2026-03-20 08:52:37 -07:00
prompt_builder.py perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366) (#3421) 2026-03-27 10:54:02 -07:00
prompt_caching.py fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter 2026-03-21 16:54:43 -07:00
redact.py fix(redact): safely handle non-string inputs 2026-03-21 16:55:02 -07:00
skill_commands.py fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897) 2026-03-18 03:17:37 -07:00
skill_utils.py perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366) (#3421) 2026-03-27 10:54:02 -07:00
smart_model_routing.py feat: integrate GitHub Copilot providers across Hermes 2026-03-17 23:40:22 -07:00
title_generator.py feat: auto-generate session titles after first exchange 2026-03-17 04:14:40 -07:00
trajectory.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
usage_pricing.py fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024) 2026-03-25 12:45:58 -07:00