hermes-agent/tests/agent
Teknium d76fa7fc37
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051)
* fix: detect context length for custom model endpoints via fuzzy matching + config override

Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:

1. Endpoint metadata lookup used exact match only — model name mismatches
   (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
   match even though only one model was loaded
3. No user escape hatch to manually set context length

Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
  use the only available model regardless of name; multi-model servers
  try substring matching in both directions
- Add model.context_length config override (highest priority) so users
  can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
  users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init

Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).

* fix: auto-detect local model name and context length for local servers

Cherry-picked from PR #2043 by sudoingX.

- Auto-detect model name from local server's /v1/models when only one
  model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
  training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init

Co-authored-by: sudo <sudoingx@users.noreply.github.com>

* fix: revert accidental summary_target_tokens change + add docs for context_length config

- Revert summary_target_tokens from 2500 back to 500 (accidental change
  during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
  explaining model.context_length config override

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
..
__init__.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_auxiliary_client.py feat: integrate GitHub Copilot providers across Hermes 2026-03-17 23:40:22 -07:00
test_context_compressor.py fix(context_compressor): prevent consecutive same-role messages after compression (#1743) 2026-03-17 05:18:52 -07:00
test_display_emoji.py feat(tools): centralize tool emoji metadata in registry + skin integration 2026-03-15 20:21:21 -07:00
test_model_metadata.py fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) 2026-03-19 06:01:16 -07:00
test_prompt_builder.py fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897) 2026-03-18 03:17:37 -07:00
test_prompt_caching.py fix(cache_control) treat empty text like None to avoid anthropic api cache_control error 2026-03-13 18:08:46 -07:00
test_redact.py feat: secure skill env setup on load (core #688) 2026-03-13 03:14:04 -07:00
test_skill_commands.py fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897) 2026-03-18 03:17:37 -07:00
test_smart_model_routing.py fix: hermes update causes dual gateways on macOS (launchd) (#1567) 2026-03-16 12:36:29 -07:00
test_subagent_progress.py fix(display): fix subagent progress tree-view visual nits 2026-02-28 23:29:49 -08:00
test_title_generator.py feat: auto-generate session titles after first exchange 2026-03-17 04:14:40 -07:00
test_usage_pricing.py feat: use endpoint metadata for custom model context and pricing (#1906) 2026-03-18 03:04:07 -07:00