mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-03 02:11:48 +00:00
When a user sets `model.context_length` in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's `num_ctx` parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config, causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:

- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to the config context_length (run_agent.py)

   When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix: it prevents Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites

   Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback:

   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)

   _normalize_root_model_keys() now migrates a root-level context_length into the model section, matching the existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` previously had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)

   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated.

Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).