hermes-agent/plugins/memory
kshitij 2a7047c2ed
fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043)
SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl
byte-range locks that don't reliably work on network filesystems. Upstream
documents this explicitly:
  https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode

On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises
'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before
this change, every feature backed by state.db or kanban.db broke silently:
  - /resume, /title, /history, /branch returned 'Session database not
    available.' with no cause
  - gateway logged the init failure at DEBUG (invisible in errors.log)
  - kanban dispatcher crashed every 60s, driving the known migration race
    (duplicate column name: consecutive_failures, #21708 / #21374)

Changes:
  - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL
    and falls back to DELETE on SQLITE_PROTOCOL-style errors with one
    WARNING explaining why
  - hermes_state.get_last_init_error() + format_session_db_unavailable():
    capture the init failure cause and surface it in user-facing strings
    (with an NFS/SMB pointer for 'locking protocol')
  - hermes_cli/kanban_db.connect(): use the shared helper
  - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING
    (matches cli.py's existing correct behavior)
  - cli.py (4 sites) + gateway/run.py (5 sites): replace bare
    'Session database not available.' with format_session_db_unavailable()

Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new
test in tests/hermes_cli/test_kanban_db.py. Existing suites (state,
kanban, gateway, cli) remain green for all tests unrelated to pre-existing
failures on main.

Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home,
local_lock=none) reporting 'Session database not available.' on /resume;
'locking protocol' appears in 4 distinct log entries across backup,
kanban, TUI, and CLI paths in the same session.

closes #22032
2026-05-09 02:09:35 -07:00
..
byterover refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
hindsight feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
holographic fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043) 2026-05-09 02:09:35 -07:00
honcho fix(honcho): pass user_message as search_query in get_prefetch_context 2026-05-05 05:01:12 -07:00
mem0 refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
openviking fix: always send tenant headers in OpenViking _headers() when account/user are set 2026-05-09 01:53:19 -07:00
retaindb refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
supermemory feat(supermemory): add multi-container, search_mode, identity template, and env var override (#5933) 2026-04-07 14:03:46 -07:00
__init__.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00