feat: add persistent memory system + SQLite session store

Two-part implementation:

Part A - Curated Bounded Memory:
- New memory tool (tools/memory_tool.py) with MEMORY.md + USER.md stores
- Character-limited (2200/1375 chars), § delimited entries
- Frozen snapshot injected into system prompt at session start
- Model manages pruning via replace/remove with substring matching
- Usage indicator shown in system prompt header

Part B - SQLite Session Store:
- New hermes_state.py with SessionDB class, FTS5 full-text search
- Gateway session.py rewritten to dual-write SQLite + legacy JSONL
- Compression-triggered session splitting with parent_session_id chains
- New session_search tool with Gemini Flash summarization of matched sessions
- CLI session lifecycle (create on launch, close on exit)

Also:
- System prompt now cached per session, only rebuilt on compression
  (fixes prefix cache invalidation from date/time changes every turn)
- Config version bumped to 3, hermes doctor checks for new artifacts
- Disabled in batch_runner and RL environments
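
The bounded-memory behavior described in Part A (character limit, § delimited entries, replace/remove by substring match, usage indicator) can be illustrated with a minimal sketch. This is not the code in tools/memory_tool.py: the class name `BoundedStore`, its methods, the exact delimiter string, and the rule that a substring must match exactly one entry are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed separator; the commit message only says entries are "§ delimited".
DELIMITER = "\n§\n"


@dataclass
class BoundedStore:
    """Hypothetical character-bounded store of short text entries."""

    char_limit: int
    entries: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the text that would be injected into the system prompt."""
        return DELIMITER.join(self.entries)

    def usage(self) -> str:
        """Return a usage indicator such as '120/2200 chars (5%)'."""
        used = len(self.render())
        return f"{used}/{self.char_limit} chars ({100 * used // self.char_limit}%)"

    def _find(self, needle: str) -> int:
        # Assumption: a substring must identify exactly one entry.
        matches = [i for i, entry in enumerate(self.entries) if needle in entry]
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one entry containing {needle!r}, "
                f"found {len(matches)}"
            )
        return matches[0]

    def _commit(self, candidate: list[str]) -> None:
        # Reject any change that would push the rendered text over the limit,
        # so the caller has to consolidate or remove entries first.
        size = len(DELIMITER.join(candidate))
        if size > self.char_limit:
            raise ValueError(f"{size} chars exceeds limit of {self.char_limit}")
        self.entries = candidate

    def add(self, text: str) -> None:
        self._commit(self.entries + [text.strip()])

    def replace(self, needle: str, text: str) -> None:
        i = self._find(needle)
        self._commit(self.entries[:i] + [text.strip()] + self.entries[i + 1:])

    def remove(self, needle: str) -> None:
        del self.entries[self._find(needle)]


# Usage: 2200 is the MEMORY.md limit named in the commit message.
memory = BoundedStore(char_limit=2200)
memory.add("Project uses pytest; run with `pytest -q`.")
memory.add("Deploys happen from the main branch.")
memory.replace("pytest", "Project uses pytest with the -q flag.")
memory.remove("Deploys")
print(memory.usage())
```

Rejecting an over-limit write, rather than truncating silently, is one way to leave pruning decisions to the model, as the commit message describes.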
2026-07-20 15:33:54 +00:00 · 2026-02-19 00:57:31 -08:00 · 2026-02-19 00:57:31 -08:00 · 440c244cac
commit 440c244cac
parent 655303f2f1
19 changed files with 2397 additions and 327 deletions
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -142,6 +142,26 @@ compression:
  # This model compresses the middle turns into a concise summary
  summary_model: "google/gemini-3-flash-preview"

+# =============================================================================
+# Persistent Memory
+# =============================================================================
+# Bounded curated memory injected into the system prompt every session.
+# Two stores: MEMORY.md (agent's notes) and USER.md (user profile).
+# Character limits keep the memory small and focused. The agent manages
+# pruning -- when at the limit, it must consolidate or replace entries.
+# Disabled by default in batch_runner and RL environments.
+#
+memory:
+  # Agent's personal notes: environment facts, conventions, things learned
+  memory_enabled: true
+  
+  # User profile: preferences, communication style, expectations
+  user_profile_enabled: true
+  
+  # Character limits (~2.75 chars per token, model-independent)
+  memory_char_limit: 2200   # ~800 tokens
+  user_char_limit: 1375     # ~500 tokens
+
 # =============================================================================
 # Agent Behavior
 # =============================================================================
@@ -274,6 +294,8 @@ platform_toolsets:
 #   skills       - Load skill documents (skills_list, skill_view)
 #   moa          - Mixture of Agents reasoning (mixture_of_agents)
 #   todo         - Task planning and tracking for multi-step work
+#   memory       - Persistent memory across sessions (personal notes + user profile)
+#   session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
 #   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
 #   cronjob      - Schedule and manage automated tasks (CLI-only)
 #   rl           - RL training tools (Tinker-Atropos)