hermes-agent/hermes_cli
Teknium d76fa7fc37
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051)
* fix: detect context length for custom model endpoints via fuzzy matching + config override

Custom model endpoints (neither OpenRouter nor a known provider) were silently
falling back to 2M tokens whenever the model name didn't exactly match the ID
the endpoint's /v1/models reported. This happened because:

1. Endpoint metadata lookup used exact match only — model name mismatches
   (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) still required an exact
   name match even though only one model was loaded
3. No user escape hatch to manually set context length

Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
  use the only available model regardless of name; multi-model servers
  try substring matching in both directions
- Add model.context_length config override (highest priority) so users
  can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to the 2M-token probe, telling
  users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init
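The fuzzy lookup described above can be sketched roughly as follows. This is a minimal illustration, not the hermes-agent code: the function name `match_endpoint_model`, the `served` dict shape (model ID mapped to metadata), and the case-insensitive comparison are all assumptions made for the sketch.

```python
from typing import Optional


def match_endpoint_model(requested: str, served: dict[str, dict]) -> Optional[dict]:
    """Hypothetical helper: find metadata for `requested` despite name mismatches.

    `served` is assumed to map model IDs reported by /v1/models to metadata dicts.
    """
    # Exact match first (the previous behaviour).
    if requested in served:
        return served[requested]
    # Single-model server: the only loaded model is the one in use, whatever its name.
    if len(served) == 1:
        return next(iter(served.values()))
    # Multi-model server: substring match in both directions (lowercased here by assumption).
    wanted = requested.lower()
    for model_id, meta in served.items():
        candidate = model_id.lower()
        if wanted in candidate or candidate in wanted:
            return meta
    return None
```

Note that 'qwen3.5:9b' is not a substring of 'qwen3.5-9b-q4_k_m.gguf' (nor the reverse), so the example from point 1 is resolved by the single-model rule, not by substring matching.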

Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).
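The priority order and the zero/None edge cases can be pictured as a small resolver. This is a hedged sketch: `resolve_context_length` and `FALLBACK_PROBE_TOKENS` are hypothetical names, "2M" is written as 2,000,000 although the real constant may differ, and the real code also consults other sources (e.g. known providers) that are omitted here.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Assumed value for the "2M" fallback probe; the actual constant may differ.
FALLBACK_PROBE_TOKENS = 2_000_000


def resolve_context_length(
    config_context_length: Optional[int],
    endpoint_context_length: Optional[int],
) -> int:
    # 1. model.context_length from config.yaml has the highest priority;
    #    0 and None are treated as "not set".
    if config_context_length:
        return int(config_context_length)
    # 2. Otherwise use whatever the endpoint metadata lookup found.
    if endpoint_context_length:
        return int(endpoint_context_length)
    # 3. Last resort: fall back to the probe and point the user at the override.
    logger.info(
        "Could not detect context length; falling back to %d tokens. "
        "Set model.context_length in config.yaml to override.",
        FALLBACK_PROBE_TOKENS,
    )
    return FALLBACK_PROBE_TOKENS
```

Treating 0 as unset (via a plain truthiness check) means an accidental `context_length: 0` in config.yaml cannot shrink the window to nothing.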

* fix: auto-detect local model name and context length for local servers

Cherry-picked from PR #2043 by sudoingX.

- Auto-detect model name from local server's /v1/models when only one
  model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
  training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- Add _auto_detect_local_model() to runtime_provider.py for CLI init
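The local-server detection steps above could look roughly like this when working on already-parsed JSON. This is an illustrative sketch only, not `_auto_detect_local_model()` itself: every function name here is hypothetical, the extra keys `context_length` and `max_model_len` are assumed, and the payload shapes (an OpenAI-style `data` list, a llama.cpp-style `meta` dict with `n_ctx_train`, and `/props` exposing `default_generation_settings.n_ctx`) are assumptions that may vary between server versions.

```python
from typing import Optional

# Keys checked for a context length; n_ctx/n_ctx_train come from the commit, the rest are assumed.
CONTEXT_KEYS = ("context_length", "max_model_len", "n_ctx", "n_ctx_train")


def detect_single_model(models_payload: dict) -> Optional[str]:
    # Only auto-detect when exactly one model is loaded.
    data = models_payload.get("data") or []
    return data[0].get("id") if len(data) == 1 else None


def find_context_length(entry: dict) -> Optional[int]:
    # Search the entry itself, then an assumed nested "meta" dict (llama.cpp style).
    for source in (entry, entry.get("meta") or {}):
        for key in CONTEXT_KEYS:
            value = source.get(key)
            if isinstance(value, int) and value > 0:
                return value
    return None


def allocated_context_from_props(props: dict) -> Optional[int]:
    # Actual allocated context (assumed shape of llama.cpp's /props response).
    value = (props.get("default_generation_settings") or {}).get("n_ctx")
    return value if isinstance(value, int) and value > 0 else None


def display_name(model_id: str) -> str:
    # Strip a trailing .gguf for banner/status-bar display.
    return model_id[: -len(".gguf")] if model_id.lower().endswith(".gguf") else model_id


def local_model_info(models_payload: dict, props: Optional[dict] = None):
    model_id = detect_single_model(models_payload)
    if model_id is None:
        return None, None
    # Prefer the allocated context from /props over training context from metadata.
    ctx = allocated_context_from_props(props) if props else None
    if ctx is None:
        ctx = find_context_length(models_payload["data"][0])
    return model_id, ctx
```

Preferring `/props` matters because a GGUF's training context is often far larger than the context the server actually allocated at launch.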

Co-authored-by: sudo <sudoingx@users.noreply.github.com>

* fix: revert accidental summary_target_tokens change + add docs for context_length config

- Revert summary_target_tokens from 2500 back to 500 (accidental change
  during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
  explaining model.context_length config override

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
__init__.py feat: integrate GitHub Copilot providers across Hermes 2026-03-17 23:40:22 -07:00
auth.py feat: proper Copilot auth with OAuth device code flow and token validation 2026-03-18 03:25:58 -07:00
banner.py fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) 2026-03-19 06:01:16 -07:00
callbacks.py refactor(cli): implement approval locking mechanism to serialize concurrent requests 2026-03-13 23:59:18 -07:00
checklist.py fix: skip hanging tests + add global test timeout 2026-03-12 01:23:28 -07:00
claw.py fix(claw): warn when API keys are skipped during OpenClaw migration (#1580) 2026-03-17 02:10:36 -07:00
clipboard.py fix: clean up empty file after failed wl-paste clipboard extraction 2026-03-11 02:56:19 -07:00
codex_models.py fix: add codex forward-compat model listing 2026-03-13 21:34:01 -07:00
colors.py Revert "feat(cli): skin-aware light/dark theme mode with terminal auto-detection" 2026-03-17 10:04:53 -07:00
commands.py fix(gateway): replace bare text approval with /approve and /deny commands (#2002) 2026-03-18 16:58:20 -07:00
config.py feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756) 2026-03-17 10:44:37 -07:00
copilot_auth.py feat: proper Copilot auth with OAuth device code flow and token validation 2026-03-18 03:25:58 -07:00
cron.py docs: clarify gateway service scopes (#1378) 2026-03-14 21:17:41 -07:00
curses_ui.py refactor: extract shared curses checklist, fix skill discovery perf 2026-03-11 03:06:15 -07:00
default_soul.py feat: seed a default global SOUL.md 2026-03-14 08:05:30 -07:00
doctor.py feat: add Kilo Code (kilocode) as first-class inference provider (#1666) 2026-03-17 02:40:34 -07:00
env_loader.py fix(config): reload .env over stale shell overrides 2026-03-15 06:46:28 -07:00
gateway.py fix(gateway): detect script-style gateway processes for --replace 2026-03-18 03:12:59 -07:00
main.py feat: proper Copilot auth with OAuth device code flow and token validation 2026-03-18 03:25:58 -07:00
models.py Merge origin/main, resolve conflicts (self._base_url_lower) 2026-03-18 04:09:00 -07:00
pairing.py Cleanup time! 2026-02-20 23:23:32 -08:00
plugins.py feat: first-class plugin architecture (#1555) 2026-03-16 07:17:36 -07:00
runtime_provider.py fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) 2026-03-19 06:01:16 -07:00
setup.py Merge origin/main, resolve conflicts (self._base_url_lower) 2026-03-18 04:09:00 -07:00
skills_config.py fix: wire email platform into toolset mappings + add documentation 2026-03-11 06:34:32 -07:00
skills_hub.py fix: add --yes flag to bypass confirmation in /skills install and uninstall (#1647) 2026-03-17 01:59:07 -07:00
skin_engine.py Revert "feat(cli): skin-aware light/dark theme mode with terminal auto-detection" 2026-03-17 10:04:53 -07:00
status.py feat(web): add Tavily as web search/extract/crawl backend (#1731) 2026-03-17 04:28:03 -07:00
tools_config.py feat(web): add Tavily as web search/extract/crawl backend (#1731) 2026-03-17 04:28:03 -07:00
uninstall.py feat(gateway): scope systemd service name to HERMES_HOME 2026-03-16 04:42:46 -07:00