perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging

The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR #20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes #20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
This commit is contained in:
teknium1 2026-05-17 17:23:15 -07:00 committed by Teknium
parent 9b91377bec
commit 4a3f13b47b
4 changed files with 355 additions and 38 deletions

View file

@ -82,6 +82,108 @@ def _ra():
return run_agent
def _restore_or_build_system_prompt(agent, system_message, conversation_history):
"""Restore the cached system prompt from the session DB or build it fresh.
Mutates ``agent._cached_system_prompt`` and persists a freshly-built
prompt back to the session DB on first build. Extracted from
``run_conversation`` so the prefix-cache restore path can be tested in
isolation.
Three-way state distinction for the stored row, surfaced via logs so
silent prefix-cache misses are visible in ``agent.log``:
* ``missing`` no session row yet (legitimate first turn).
* ``null`` row exists, ``system_prompt`` column is NULL.
Legacy session predating system-prompt persistence, or a migration
leftover. Warns when ``conversation_history`` is non-empty.
* ``empty`` row exists, ``system_prompt`` column is the empty
string. Indicates a previous-turn write that ran but stored
nothing (silent persistence bug). Always warns.
* ``present`` row exists with a usable prompt reused verbatim.
Read or write failures against the session DB log at WARNING (not
DEBUG) so persistent issues (disk full, schema drift, lock contention)
surface without needing verbose mode. This used to be a debug-level
log that silently broke prefix-cache reuse on the gateway path
(which constructs a fresh ``AIAgent`` per turn and depends on this
DB roundtrip).
"""
stored_prompt = None
stored_state = "missing"
if conversation_history and agent._session_db:
try:
session_row = agent._session_db.get_session(agent.session_id)
if session_row is not None:
raw_prompt = session_row.get("system_prompt")
if raw_prompt is None:
stored_state = "null"
elif raw_prompt == "":
stored_state = "empty"
else:
stored_prompt = raw_prompt
stored_state = "present"
except Exception as exc:
logger.warning(
"Session DB get_session failed for system-prompt restore "
"(session=%s): %s. Falling back to fresh build — prefix "
"cache will miss for this turn.",
agent.session_id, exc,
)
if stored_prompt:
# Continuing session — reuse the exact system prompt from the
# previous turn so the Anthropic cache prefix matches.
agent._cached_system_prompt = stored_prompt
return
if conversation_history and stored_state in ("null", "empty"):
# Continuing session whose stored prompt is unusable. The
# previous turn's write either never happened or wrote an empty
# string — either way every turn now rebuilds and the prefix
# cache misses every time.
logger.warning(
"Stored system prompt for session %s is %s; rebuilding "
"from scratch this turn. Prefix cache will miss until "
"the rebuild persists. Investigate the previous turn's "
"update_system_prompt write path.",
agent.session_id, stored_state,
)
# First turn of a new session (or recovering from a broken stored
# prompt) — build from scratch.
agent._cached_system_prompt = agent._build_system_prompt(system_message)
# Plugin hook: on_session_start — fired once when a brand-new
# session is created (not on continuation). Plugins can use this
# to initialise session-scoped state (e.g. warm a memory cache).
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
"on_session_start",
session_id=agent.session_id,
model=agent.model,
platform=getattr(agent, "platform", None) or "",
)
except Exception as exc:
logger.warning("on_session_start hook failed: %s", exc)
# Persist the system prompt snapshot in SQLite. Failure here used
# to log at DEBUG, which silently broke prefix-cache reuse on the
# gateway path (fresh AIAgent per turn → reads from this row every
# subsequent turn).
if agent._session_db:
try:
agent._session_db.update_system_prompt(agent.session_id, agent._cached_system_prompt)
except Exception as exc:
logger.warning(
"Session DB update_system_prompt failed for session %s: "
"%s. Subsequent turns will rebuild the system prompt and "
"miss the prefix cache.",
agent.session_id, exc,
)
def run_conversation(
agent,
user_message: str,
@ -313,43 +415,7 @@ def run_conversation(
# producing a different system prompt and breaking the Anthropic
# prefix cache.
if agent._cached_system_prompt is None:
stored_prompt = None
if conversation_history and agent._session_db:
try:
session_row = agent._session_db.get_session(agent.session_id)
if session_row:
stored_prompt = session_row.get("system_prompt") or None
except Exception:
pass # Fall through to build fresh
if stored_prompt:
# Continuing session — reuse the exact system prompt from
# the previous turn so the Anthropic cache prefix matches.
agent._cached_system_prompt = stored_prompt
else:
# First turn of a new session — build from scratch.
agent._cached_system_prompt = agent._build_system_prompt(system_message)
# Plugin hook: on_session_start
# Fired once when a brand-new session is created (not on
# continuation). Plugins can use this to initialise
# session-scoped state (e.g. warm a memory cache).
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
"on_session_start",
session_id=agent.session_id,
model=agent.model,
platform=getattr(agent, "platform", None) or "",
)
except Exception as exc:
logger.warning("on_session_start hook failed: %s", exc)
# Store the system prompt snapshot in SQLite
if agent._session_db:
try:
agent._session_db.update_system_prompt(agent.session_id, agent._cached_system_prompt)
except Exception as e:
logger.debug("Session DB update_system_prompt failed: %s", e)
_restore_or_build_system_prompt(agent, system_message, conversation_history)
active_system_prompt = agent._cached_system_prompt