fix(matrix): replace pickle crypto store with SQLite, fix E2EE decryption (#7981)

Fixes #7952: Matrix E2EE was completely broken after the mautrix migration.

- Replace MemoryCryptoStore + pickle/HMAC persistence with mautrix's
  PgCryptoStore backed by SQLite via aiosqlite. Crypto state now
  persists reliably across restarts without fragile serialization.

- Add handle_sync() call on initial sync response so to-device events
  (queued Megolm key shares) are dispatched to OlmMachine instead of
  being silently dropped.

- Add _verify_device_keys_on_server() after loading crypto state.
  Detects missing keys (re-uploads), stale keys from migration
  (attempts re-upload), and corrupted state (refuses E2EE).

- Add _CryptoStateStore adapter wrapping MemoryStateStore to satisfy
  mautrix crypto's StateStore interface (is_encrypted,
  get_encryption_info, find_shared_rooms).

- Remove redundant share_keys() call from the sync loop; OlmMachine
  already handles this via its DEVICE_OTK_COUNT event handler.

- Fix a datetime-vs-float TypeError in session.py's suspend_recently_active()
  that crashed gateway startup.

- Add aiosqlite and asyncpg to the [matrix] extra in pyproject.toml.

- Update test mocks for PgCryptoStore/Database and add query_keys mock
  for key verification. 174 tests pass.

- Add E2EE upgrade/migration docs to Matrix user guide.
Siddharth Balyan 2026-04-11 18:54:46 -07:00 committed by GitHub
parent 27eeea0555
commit 50d86b3c71
GPG key ID: B5690EEEBB952194
5 changed files with 298 additions and 64 deletions


@@ -344,9 +344,79 @@ pip install 'hermes-agent[matrix]'
**Fix**:
1. Verify `libolm` is installed on your system (see the E2EE section above).
2. Make sure `MATRIX_ENCRYPTION=true` is set in your `.env`.
3. In your Matrix client (Element), go to the bot's profile -> Sessions -> verify/trust the bot's device.
4. If the bot just joined an encrypted room, it can only decrypt messages sent *after* it joined. Older messages are inaccessible.

### Upgrading from a previous version with E2EE

If you previously used Hermes with `MATRIX_ENCRYPTION=true` and are upgrading to
a version that uses the new SQLite-based crypto store, the bot's encryption
identity has changed. Your Matrix client (Element) may cache the old device keys
and refuse to share encryption sessions with the bot.

**Symptoms**: The bot connects and shows "E2EE enabled" in the logs, but all
messages show "could not decrypt event" and the bot never responds.

**What's happening**: The old encryption state (from the previous `matrix-nio` or
serialization-based `mautrix` backend) is incompatible with the new SQLite crypto
store. The bot creates a fresh encryption identity, but your Matrix client still
has the old keys cached and won't share the room's encryption session with a
device whose keys changed. This is a Matrix security feature: clients treat
changed identity keys for the same device as suspicious.

**Fix** (one-time migration):

1. **Generate a new access token** to get a fresh device ID. The simplest way is
   to log in again with the password login API. Because the request does not
   include a `device_id`, the homeserver generates a new one:

   ```bash
   curl -X POST https://your-server/_matrix/client/v3/login \
     -H "Content-Type: application/json" \
     -d '{
       "type": "m.login.password",
       "identifier": {"type": "m.id.user", "user": "@hermes:your-server.org"},
       "password": "your-password",
       "initial_device_display_name": "Hermes Agent"
     }'
   ```

   Copy the new `access_token` from the response and update `MATRIX_ACCESS_TOKEN` in `~/.hermes/.env`.

2. **Delete the old encryption state**:

   ```bash
   rm -f ~/.hermes/platforms/matrix/store/crypto.db
   rm -f ~/.hermes/platforms/matrix/store/crypto_store.*
   ```

3. **Force your Matrix client to rotate the encryption session**. In Element,
   open the DM room with the bot and type `/discardsession`. This forces Element
   to create a new encryption session and share it with the bot's new device.

4. **Restart the gateway**:

   ```bash
   hermes gateway run
   ```

5. **Send a new message**. The bot should decrypt it and respond normally.
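
If you would rather script the token update from step 1, here is a minimal sketch. It is not a Hermes command: `set_env_var` is a hypothetical bash helper that assumes `~/.hermes/.env` holds plain `KEY=value` lines (no `export` prefix, no quoting) and that the key contains no regex metacharacters. The commented usage additionally assumes `jq` is installed and that the step 1 JSON body has been saved to a hypothetical `login.json` file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hermes): set KEY=VALUE in an env file,
# replacing any existing KEY=... line or appending one if the key is absent.
set_env_var() {
  local key="$1" value="$2" file="$3" tmp rc=0
  touch "$file" || return 1
  # Temp file in the same directory so the final mv is an atomic rename.
  # mktemp creates it owner-readable only (0600), which suits a secrets file.
  tmp="$(mktemp "${file}.XXXXXX")" || return 1
  # Copy every line except an existing assignment of this key.
  # grep exits 1 when it selects no lines (fine) and 2 on a real error.
  grep -v "^${key}=" "$file" > "$tmp" || rc=$?
  if [ "$rc" -gt 1 ]; then
    rm -f "$tmp"
    return 1
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example usage (assumes jq is installed and the step 1 JSON body is saved
# in a hypothetical login.json):
#   token="$(curl -fsS -X POST https://your-server/_matrix/client/v3/login \
#     -H "Content-Type: application/json" -d @login.json | jq -r '.access_token')"
#   if [ -n "$token" ] && [ "$token" != "null" ]; then
#     set_env_var MATRIX_ACCESS_TOKEN "$token" ~/.hermes/.env
#   fi
```

Writing to a temporary file and renaming it over the original avoids leaving a half-written `.env` if the script is interrupted. One side effect of this approach is that the updated `MATRIX_ACCESS_TOKEN` line moves to the end of the file.
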

:::note
After migration, messages sent *before* the upgrade cannot be decrypted because the
old encryption keys are gone. This only affects the transition; new messages work
normally.
:::

:::tip
**New installations are not affected.** This migration is only needed if you had
a working E2EE setup with a previous version of Hermes and are upgrading.

**Why a new access token?** Each Matrix access token is bound to a specific device
ID. Reusing the same device ID with new encryption keys causes other Matrix
clients to distrust the device (they see changed identity keys as a potential
security breach). A new access token gets a new device ID with no stale key
history, so other clients have no conflicting keys on record and will share
encryption sessions with it.
:::
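
To confirm that the new token really maps to a new device, you can ask the homeserver through the standard client-server endpoint `GET /_matrix/client/v3/account/whoami`, which returns the token's `user_id` and, for tokens tied to a device, its `device_id`. The sketch below is a pair of hypothetical bash helpers (not part of Hermes); `your-server` is the same placeholder used in step 1, and the `sed` extraction assumes the `"device_id"` key and its value appear on the same line of the response.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hermes) for checking which device ID
# an access token is bound to.

# Print the device_id field from a whoami JSON response read on stdin.
# Plain sed avoids a jq dependency; it assumes the key and value share a line.
device_id_from_json() {
  sed -n 's/.*"device_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the homeserver which device the given token belongs to.
# Usage: bot_device_id https://your-server "$MATRIX_ACCESS_TOKEN"
bot_device_id() {
  local homeserver="$1" token="$2"
  curl -fsS -H "Authorization: Bearer ${token}" \
    "${homeserver}/_matrix/client/v3/account/whoami" | device_id_from_json
}
```

Running `bot_device_id` once with the old token and once with the new one should print two different device IDs; if they match, the token was not regenerated.
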
### Sync issues / bot falls behind
**Cause**: Long-running tool executions can delay the sync loop, or the homeserver is slow.