hermes-agent/tests/hermes_cli/test_picker_prewarm.py
Teknium 9ca11b35d5
perf(/model): prewarm picker provider-models cache in background (#39847)
* fix: respect disabled auto-compaction on context overflow

Port from anomalyco/opencode#30749.

When compression.enabled is false, NO automatic compaction trigger may
fire. The proactive token-threshold paths (preflight + post-response
should_compress gate) already honoured the setting, but the three
provider-overflow recovery paths in the agent loop — long-context-tier
429, 413 payload-too-large, and context-overflow — called
_compress_context() unconditionally, silently compressing and rotating
the session against the user's explicit choice.

Add a single guard at the top of the overflow-recovery dispatch: when
compression is disabled and the error is one of those three overflow
classes, surface a terminal error (compaction_disabled: True) telling the
user to /compress manually, /new, switch to a larger-context model, or
reduce attachments. Manual /compress (force=True) is unaffected — it never
enters this loop.
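
A minimal sketch of such a guard, for illustration only. The names `OVERFLOW_CLASSES`, `TerminalError`, and `overflow_guard` are hypothetical and are not the identifiers used in the agent loop; only the `compaction_disabled: True` flag and the three overflow classes come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three provider-overflow classes named above.
OVERFLOW_CLASSES = frozenset(
    {"long_context_tier_429", "payload_too_large_413", "context_overflow"}
)


@dataclass
class TerminalError:
    """Hypothetical terminal result surfaced to the user instead of compressing."""

    message: str
    compaction_disabled: bool = True


def overflow_guard(compression_enabled: bool, error_class: str) -> Optional[TerminalError]:
    """Return a terminal error when recovery would need compaction but it is disabled.

    Returns None when the normal recovery path may proceed.
    """
    if compression_enabled or error_class not in OVERFLOW_CLASSES:
        return None
    return TerminalError(
        message=(
            "Context overflow and auto-compaction is disabled. Run /compress "
            "manually, start /new, switch to a larger-context model, or "
            "reduce attachments."
        )
    )
```

Placing the check once, ahead of the dispatch, keeps the three recovery branches unchanged for the enabled case.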

Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't
compress when disabled; control case still compresses when enabled).
Existing overflow-recovery tests updated to enable compaction explicitly
(they verify the recovery fires); fixture defaults flipped to True to
match production (compression.enabled defaults to True).

* perf(/model): prewarm picker provider-models cache in background

The no-args /model picker calls list_authenticated_providers(), which
fetches each authenticated provider's live /v1/models list serially. On a
cold or stale (>1h TTL) cache that blocks ~1.5s on the user's critical path
the first time /model is opened in a session.
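
The cold-or-stale condition can be pictured as a simple freshness check. This is a sketch under assumptions: `CACHE_TTL_SECONDS` and `cache_is_fresh` are hypothetical names, and the real cache may record timestamps differently; only the 1h TTL comes from the text above.

```python
import time
from typing import Optional

CACHE_TTL_SECONDS = 3600.0  # the 1h TTL mentioned above


def cache_is_fresh(
    cached_at: Optional[float],
    now: Optional[float] = None,
    ttl: float = CACHE_TTL_SECONDS,
) -> bool:
    """Return True when a cache entry exists and is no older than the TTL.

    cached_at is the epoch time the entry was written, or None for a cold cache.
    """
    if cached_at is None:
        return False  # cold cache: nothing stored yet
    current = time.time() if now is None else now
    return (current - cached_at) <= ttl
```

When this kind of check fails for a provider, the picker has to refetch that provider's model list, which is the serial cost described above.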

Warm that exact path off-thread during the idle window right after the CLI
banner is shown: a once-per-process daemon thread runs
list_authenticated_providers() to populate provider_models_cache.json for
every authed provider. By the time the user types /model, the picker hits
the warm disk cache (~136ms vs ~1500ms).

Process-level Event guard (mirrors run_agent's _openrouter_prewarm_done)
ensures at most one thread per process; fully exception-isolated so an
offline/no-creds provider can never affect the session.
2026-06-05 06:55:09 -07:00

60 lines
2.3 KiB
Python

"""Tests for the /model picker background cache prewarm.
``prewarm_picker_cache_async()`` warms the provider-models disk cache off the
user's critical path so the first ``/model`` open in a session is fast instead
of blocking ~1-2s on serial /v1/models fetches. These pin the two contracts
that matter: it runs the warm path exactly once per process (no thread leak),
and it delegates to ``list_authenticated_providers`` to do the warming.
"""

from __future__ import annotations

from unittest.mock import patch

import hermes_cli.model_switch as ms


def _reset_guard():
    ms._picker_prewarm_done.clear()


def test_prewarm_runs_list_authenticated_providers_once():
    """First call spawns a thread that calls list_authenticated_providers;
    the warm side effect is delegated there (which disk-caches per provider)."""
    _reset_guard()
    with patch.object(ms, "list_authenticated_providers", return_value=[]) as mock_list:
        t = ms.prewarm_picker_cache_async()
        assert t is not None, "first call must spawn a prewarm thread"
        t.join(timeout=10)
        assert not t.is_alive(), "prewarm thread should finish promptly"
        mock_list.assert_called_once()
    _reset_guard()


def test_prewarm_guard_is_once_per_process():
    """The process-level Event guard must make repeat calls no-ops so a
    long-lived process never leaks one OS thread per call."""
    _reset_guard()
    with patch.object(ms, "list_authenticated_providers", return_value=[]):
        t1 = ms.prewarm_picker_cache_async()
        assert t1 is not None
        t1.join(timeout=10)
        # Subsequent calls return None (guard set), so no new thread is spawned.
        assert ms.prewarm_picker_cache_async() is None
        assert ms.prewarm_picker_cache_async() is None
    _reset_guard()


def test_prewarm_never_raises_on_failure():
    """A failing/offline provider path must be fully swallowed: the prewarm
    is best-effort and must never surface errors into the session."""
    _reset_guard()
    with patch.object(
        ms, "list_authenticated_providers", side_effect=RuntimeError("boom")
    ):
        t = ms.prewarm_picker_cache_async()
        assert t is not None
        # join must not raise; the worker swallows the exception internally.
        t.join(timeout=10)
        assert not t.is_alive()
    _reset_guard()