When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).
Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.
Based on #17127 by @scotttrinh, cherry-picked onto current main.
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock