hermes-agent/plugins
kshitij 2a7047c2ed
fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043)
SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl
byte-range locks that don't reliably work on network filesystems. Upstream
documents this explicitly:
  https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode

On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises
'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before
this change, every feature backed by state.db or kanban.db broke silently:
  - /resume, /title, /history, /branch returned 'Session database not
    available.' with no cause
  - gateway logged the init failure at DEBUG (invisible in errors.log)
  - kanban dispatcher crashed every 60s, driving the known migration race
    (duplicate column name: consecutive_failures, #21708 / #21374)

Changes:
  - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL
    and falls back to DELETE on SQLITE_PROTOCOL-style errors with one
    WARNING explaining why
  - hermes_state.get_last_init_error() + format_session_db_unavailable():
    capture the init failure cause and surface it in user-facing strings
    (with an NFS/SMB pointer for 'locking protocol')
  - hermes_cli/kanban_db.connect(): use the shared helper
  - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING
    (matches cli.py's existing correct behavior)
  - cli.py (4 sites) + gateway/run.py (5 sites): replace bare
    'Session database not available.' with format_session_db_unavailable()

Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new
test in tests/hermes_cli/test_kanban_db.py. Existing suites (state,
kanban, gateway, cli) remain green for all tests unrelated to pre-existing
failures on main.

Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home,
local_lock=none) reporting 'Session database not available.' on /resume;
'locking protocol' appears in 4 distinct log entries across backup,
kanban, TUI, and CLI paths in the same session.

closes #22032
2026-05-09 02:09:35 -07:00
..
context_engine feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
disk-cleanup feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
example-dashboard/dashboard feat(dashboard): page-scoped plugin slots for built-in pages (#15658) 2026-04-25 06:55:35 -07:00
google_meet feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
hermes-achievements feat(achievements): share card render on unlocked badges (#19657) 2026-05-04 04:47:53 -07:00
image_gen fix(image-gen): preserve xAI API error status 2026-05-04 04:43:07 -07:00
kanban feat(kanban): add tooltips and docs link across dashboard (#21541) 2026-05-07 16:13:27 -07:00
memory fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043) 2026-05-09 02:09:35 -07:00
model-providers refactor(gmi): move User-Agent to profile.default_headers 2026-05-08 03:22:11 -07:00
observability/langfuse feat(plugins): add bundled observability/langfuse plugin 2026-04-28 01:40:59 -07:00
platforms fix(google-chat): repair setup prompt imports 2026-05-08 16:24:01 -07:00
spotify refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174) 2026-04-24 07:06:11 -07:00
strike-freedom-cockpit feat(dashboard): reskin extension points for themes and plugins (#14776) 2026-04-23 15:31:01 -07:00
teams_pipeline feat(teams): add pipeline outbound delivery via existing adapter 2026-05-08 12:00:09 -07:00
__init__.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00