feat(memory): notify providers on mid-process session_id rotation (#17409)

Fixes #6672

Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.

MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
  parent_session_id='', reset=False, **kwargs) with no-op default for
  backward compat. reset=True signals /reset or /new — providers should
  flush accumulated per-session buffers. reset=False for /resume,
  /branch, compression where the logical conversation continues.
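
A minimal sketch of the hook's shape, assuming a simplified ABC. The
initialize() signature and the BufferingProvider class are hypothetical
and only illustrate how a provider might honor reset:

```python
from abc import ABC, abstractmethod


class MemoryProvider(ABC):
    """Illustrative stand-in for the real ABC; only the new hook matters here."""

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None:
        """Assumed signature; the real method may take other arguments."""

    def on_session_switch(
        self,
        new_session_id: str,
        *,
        parent_session_id: str = "",
        reset: bool = False,
        **kwargs,
    ) -> None:
        """No-op default, so providers that predate the hook keep working."""
        return None


class BufferingProvider(MemoryProvider):
    """Hypothetical provider that caches per-session state in initialize()."""

    def initialize(self, session_id: str, **kwargs) -> None:
        self.session_id = session_id
        self.parent_session_id = ""
        self.buffer = []

    def on_session_switch(self, new_session_id, *, parent_session_id="",
                          reset=False, **kwargs):
        self.session_id = new_session_id
        if parent_session_id:
            # Only overwrite lineage when the caller actually supplies it.
            self.parent_session_id = parent_session_id
        if reset:
            # /reset or /new: the logical conversation ended, drop the buffer.
            self.buffer.clear()
```

Extra keyword arguments such as reason are absorbed by **kwargs, so
call sites can pass them without breaking providers that ignore them.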

MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
  Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
  during shutdown paths.
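
The fan-out described above could look roughly like this. It is a
sketch under assumptions: the constructor, the _providers attribute,
the getattr() guard for legacy providers, and the log message are all
hypothetical, not the actual MemoryManager code:

```python
import logging

logger = logging.getLogger(__name__)


class MemoryManager:
    """Illustrative stand-in; only the session-switch fan-out is sketched."""

    def __init__(self, providers):
        self._providers = list(providers)

    def on_session_switch(self, new_session_id, *, parent_session_id="",
                          reset=False, **kwargs):
        # Empty or None id: do nothing, so shutdown paths cannot
        # overwrite provider state with a blank session.
        if not new_session_id:
            return
        for provider in self._providers:
            # Assumption: tolerate providers that lack the hook entirely.
            hook = getattr(provider, "on_session_switch", None)
            if hook is None:
                continue
            try:
                hook(
                    new_session_id,
                    parent_session_id=parent_session_id,
                    reset=reset,
                    **kwargs,
                )
            except Exception:
                # Isolate failures so one bad provider cannot block the rest.
                logger.warning(
                    "on_session_switch failed for %s",
                    type(provider).__name__,
                    exc_info=True,
                )
```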

run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
  into sync_all() and queue_prefetch_all(). Providers with defensive
  session_id updates in sync_turn (Hindsight already had this at
  plugins/memory/hindsight/__init__.py:1199) now actually receive the
  current id.
- Compression block at ~L8884 already notified the context engine of
  the rollover; now also calls
  _memory_manager.on_session_switch(reason='compression').
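
To illustrate why forwarding session_id matters, here is a sketch of a
provider with a defensive update in sync_turn. DefensiveProvider, the
sync_turn/sync_all argument lists, and the writes list are assumptions
made for this example; only the session_id keyword comes from the
commit:

```python
class DefensiveProvider:
    """Hypothetical provider that re-checks the session id on every sync."""

    def __init__(self, session_id):
        self._session_id = session_id
        self.writes = []

    def sync_turn(self, user_text, assistant_text, *, session_id=""):
        # The defensive update only helps if the caller passes the id.
        if session_id and session_id != self._session_id:
            self._session_id = session_id
        self.writes.append((self._session_id, user_text, assistant_text))


class MemoryManager:
    """Illustrative stand-in; only sync_all is sketched."""

    def __init__(self, providers):
        self._providers = list(providers)

    def sync_all(self, user_text, assistant_text, *, session_id=""):
        for provider in self._providers:
            provider.sync_turn(user_text, assistant_text, session_id=session_id)
```

When the caller omits session_id, the provider keeps writing under
whatever id it last saw, which is the stale-write failure the commit
describes.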

cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
  flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
  previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
  parent session_id already captured for the DB parent link.
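
The three CLI call sites differ only in the keyword arguments they
pass. A sketch of that mapping, where RESET_BY_REASON and
switch_kwargs() are hypothetical helpers (the real handlers may simply
inline the call):

```python
# reason -> reset flag, as listed for the CLI handlers above.
RESET_BY_REASON = {
    "new_session": True,   # fresh conversation: providers flush buffers
    "resume": False,       # logical conversation continues
    "branch": False,       # fork of an existing conversation
}


def switch_kwargs(reason, parent_session_id=""):
    """Build the keyword arguments for on_session_switch()."""
    kwargs = {"reset": RESET_BY_REASON[reason], "reason": reason}
    if parent_session_id:
        # Pass lineage only when a parent id is actually known.
        kwargs["parent_session_id"] = parent_session_id
    return kwargs
```

A /branch handler would then call something like
manager.on_session_switch(new_id, **switch_kwargs("branch", parent_id)).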

gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
  /branch and /reset. The next message rebuilds a fresh agent whose
  memory provider initialize() runs with the correct session_id. This
  matches the pattern the gateway already uses for provider state
  across session transitions.
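
The evict-and-rebuild pattern can be sketched as follows. Gateway,
chat_key, agent_factory, and both dictionaries are hypothetical names
for this illustration and do not mirror gateway/run.py:

```python
class Gateway:
    """Illustrative cache of one agent per chat; not the real gateway."""

    def __init__(self, agent_factory):
        self._agent_factory = agent_factory  # callable: session_id -> agent
        self._agents = {}                    # chat_key -> cached agent
        self._session_ids = {}               # chat_key -> active session_id

    def handle_resume(self, chat_key, target_session_id):
        self._session_ids[chat_key] = target_session_id
        # Evict instead of mutating: the next message builds a new agent,
        # so provider initialize() sees the resumed session_id.
        self._agents.pop(chat_key, None)

    def agent_for(self, chat_key):
        agent = self._agents.get(chat_key)
        if agent is None:
            agent = self._agent_factory(self._session_ids[chat_key])
            self._agents[chat_key] = agent
        return agent
```

Eviction trades a rebuild on the next message for not having to
reason about which cached fields are stale.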

Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
  updates _session_id, mints a fresh _document_id (prevents
  vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
  _turn_counter / _turn_index so in-flight batches don't flush under
  the new document id. parent_session_id only overwritten when provided
  (avoids clobbering on a bare switch).

Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
  default no-op, manager fan-out, failure isolation, empty-id no-op,
  session_id propagation through sync_all/queue_prefetch_all, Hindsight
  state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
  the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
  evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
  assertions to account for the session_id kwarg on sync_all and
  queue_prefetch_all.

E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash

Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
Teknium 2026-04-29 04:57:22 -07:00 committed by GitHub
parent d244596dba
commit 13683c0842
10 changed files with 543 additions and 2 deletions


@@ -1325,6 +1325,51 @@ class HindsightMemoryProvider(MemoryProvider):
        return tool_error(f"Unknown tool: {tool_name}")

    def on_session_switch(
        self,
        new_session_id: str,
        *,
        parent_session_id: str = "",
        reset: bool = False,
        **kwargs,
    ) -> None:
        """Refresh cached per-session state when the agent rotates session_id.

        Fires on /resume, /branch, /reset, /new, and context compression.
        Without this hook, initialize()-cached state (``_session_id``,
        ``_document_id``, ``_session_turns``, ``_turn_counter``) would keep
        pointing at the previous session and writes would land in the wrong
        document. See hermes-agent#6672.

        Always update ``_session_id`` so metadata and tags on subsequent
        retains reflect the active session. Always mint a fresh
        ``_document_id`` so the new session's retain doesn't overwrite the
        old session's document on vectorize-io/hindsight#1303. Always clear
        the accumulated batch buffers (``_session_turns``, ``_turn_counter``,
        ``_turn_index``), even for /resume and /branch: the new session's
        batching must start from zero so an in-flight retain doesn't flush
        under the wrong ``_document_id``.

        ``parent_session_id`` is recorded for lineage tags on future retains.
        ``reset`` is accepted but not needed for Hindsight's state model;
        buffer clearing is correct for every session switch, not only /reset.
        """
        new_id = str(new_session_id or "").strip()
        if not new_id:
            return
        if parent_session_id:
            self._parent_session_id = str(parent_session_id).strip()
        self._session_id = new_id
        start_ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        self._document_id = f"{self._session_id}-{start_ts}"
        self._session_turns = []
        self._turn_counter = 0
        self._turn_index = 0
        logger.debug(
            "Hindsight on_session_switch: new_session=%s parent=%s reset=%s doc=%s",
            self._session_id, self._parent_session_id, reset, self._document_id,
        )

    def shutdown(self) -> None:
        logger.debug("Hindsight shutdown: waiting for background threads")
        for t in (self._prefetch_thread, self._sync_thread):