mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-03 02:11:48 +00:00
When a user sets model.context_length in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config, causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:

- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)

   When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix: it prevents Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites

   Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback:

   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)

   _normalize_root_model_keys() now migrates root-level context_length into the model section, matching existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)

   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated.

Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).
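
The capping rule in change 1 and the root-level migration in change 3 can be sketched roughly as follows. This is a minimal illustration, not the code in run_agent.py or hermes_cli/config.py: `cap_num_ctx` and `normalize_root_context_length` are hypothetical helper names, the config is assumed to be a plain dict whose `model` entry (when present) is a mapping, and the choice that a value already under `model:` wins over a root-level one is an assumption the commit message does not state.

```python
from typing import Any, Optional


def cap_num_ctx(
    detected_num_ctx: int,
    config_context_length: Optional[int] = None,
    explicit_num_ctx: Optional[int] = None,
) -> int:
    """Pick the num_ctx value to send to Ollama (hypothetical helper)."""
    # An explicit ollama_num_ctx override is used as-is.
    if explicit_num_ctx is not None:
        return explicit_num_ctx
    # Otherwise cap the GGUF-detected value at the configured context_length.
    if config_context_length is not None:
        return min(detected_num_ctx, config_context_length)
    # Nothing configured: keep the auto-detected value.
    return detected_num_ctx


def normalize_root_context_length(config: dict[str, Any]) -> dict[str, Any]:
    """Move a root-level context_length under `model` (hypothetical helper)."""
    normalized = dict(config)
    # Assumption: `model`, when present, is a mapping rather than a bare string.
    model = dict(normalized.get("model") or {})
    if "context_length" in normalized:
        root_value = normalized.pop("context_length")
        # Assumption: a value already set under `model:` takes precedence.
        model.setdefault("context_length", root_value)
    if model:
        normalized["model"] = model
    return normalized


# GGUF metadata reports 256000, the user budgeted 65536.
print(cap_num_ctx(256000, config_context_length=65536))  # 65536
print(normalize_root_context_length({"context_length": 65536}))
```

With the example values from the commit message, the capped result is the user's 65536 rather than the 256000 read from GGUF metadata, which is the behavior the fix is meant to guarantee.
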

| | | |
|---|---|---|
| __init__.py | ||
| test_branch_command.py | ||
| test_busy_input_mode_command.py | ||
| test_cli_approval_ui.py | ||
| test_cli_background_tui_refresh.py | ||
| test_cli_bracketed_paste_sanitizer.py | ||
| test_cli_browser_connect.py | ||
| test_cli_context_warning.py | ||
| test_cli_copy_command.py | ||
| test_cli_extension_hooks.py | ||
| test_cli_external_editor.py | ||
| test_cli_file_drop.py | ||
| test_cli_force_redraw.py | ||
| test_cli_image_command.py | ||
| test_cli_init.py | ||
| test_cli_interrupt_subagent.py | ||
| test_cli_loading_indicator.py | ||
| test_cli_markdown_rendering.py | ||
| test_cli_mcp_config_watch.py | ||
| test_cli_new_session.py | ||
| test_cli_prefix_matching.py | ||
| test_cli_preloaded_skills.py | ||
| test_cli_provider_resolution.py | ||
| test_cli_reload_skills.py | ||
| test_cli_retry.py | ||
| test_cli_save_config_value.py | ||
| test_cli_secret_capture.py | ||
| test_cli_shutdown_memory_messages.py | ||
| test_cli_skin_integration.py | ||
| test_cli_status_bar.py | ||
| test_cli_status_command.py | ||
| test_cli_steer_busy_path.py | ||
| test_cli_terminal_response_sanitizer.py | ||
| test_cli_tools_command.py | ||
| test_cli_user_message_preview.py | ||
| test_compress_focus.py | ||
| test_cwd_env_respect.py | ||
| test_fast_command.py | ||
| test_gquota_command.py | ||
| test_manual_compress.py | ||
| test_personality_none.py | ||
| test_quick_commands.py | ||
| test_reasoning_command.py | ||
| test_resume_display.py | ||
| test_save_conversation_location.py | ||
| test_session_boundary_hooks.py | ||
| test_stream_delta_think_tag.py | ||
| test_surrogate_sanitization.py | ||
| test_tool_progress_scrollback.py | ||
| test_worktree.py | ||
| test_worktree_security.py | ||