mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
fix(matrix): E2EE and migration bugfixes (#10860)
* - make buffered streaming - fix path naming to expand `~` for agent. - fix stripping of matrix ID to not remove other mentions / localports. * fix(matrix): register MembershipEventDispatcher for invite auto-join The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE events are only dispatched when MembershipEventDispatcher is registered on the client. Without it, _on_invite is dead code and the bot silently ignores all room invites. Closes #10094 Closes #10725 Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz) * fix(matrix): preserve _joined_rooms reference for CryptoStateStore connect() reassigned self._joined_rooms = set(...) after initial sync, orphaning the reference captured by _CryptoStateStore at init time. find_shared_rooms() returned [] forever, breaking Megolm session rotation on membership changes. Mutate in place with clear() + update() so the CryptoStateStore reference stays valid. Refs #8174, PR #8215 * fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race mautrix auto-registers DecryptionDispatcher when client.crypto is set. The adapter also registered _on_encrypted_event for the same event type. _on_encrypted_event had zero awaits and won the race to mark event IDs in the dedup set, causing _on_room_message to drop successfully decrypted events from DecryptionDispatcher. The retry loop masked this by re-decrypting every message ~4 seconds later. Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption; genuinely undecryptable events are logged by mautrix and retried on next key exchange. Refs #8174, PR #8215 * fix(matrix): re-verify device keys after share_keys() upload Matrix homeservers treat ed25519 identity keys as immutable per device. share_keys() can return 200 but silently ignore new keys if the device already exists with different identity keys. The bot would proceed with shared=True while peers encrypt to the old (unreachable) keys. Now re-queries the server after share_keys() and fails closed if keys don't match, with an actionable error message. Refs #8174, PR #8215 * fix(matrix): encrypt outbound attachments in E2EE rooms _upload_and_send() uploaded raw bytes and used the 'url' key for all rooms. In E2EE rooms, media must be encrypted client-side with encrypt_attachment(), the ciphertext uploaded, and the 'file' key (with key/iv/hashes) used instead of 'url'. Now detects encrypted rooms via state_store.is_encrypted() and branches to the encrypted upload path. Refs: PR #9822 (charles-brooks) * fix(matrix): add stop_typing to clear typing indicator after response The adapter set a 30-second typing timeout but never cleared it. The base class stop_typing() is a no-op, so the typing indicator lingered for up to 30 seconds after each response. Closes #6016 Refs: PR #6020 (r266-tech) * fix(matrix): cache all media types locally, not just photos/voice should_cache_locally only covered PHOTO, VOICE, and encrypted media. Unencrypted audio/video/documents in plaintext rooms were passed as MXC URLs that require authentication the agent doesn't have, resulting in 401 errors. Refs #3487, #3806 * fix(matrix): detect stale OTK conflict on startup and fail closed When crypto state is wiped but the same device ID is reused, the homeserver may still hold one-time keys signed with the previous identity key. Identity key re-upload succeeds but OTK uploads fail with "already exists" and a signature mismatch. Peers cannot establish new Olm sessions, so all new messages are undecryptable. Now proactively flushes OTKs via share_keys() during connect() and catches the "already exists" error with an actionable log message telling the operator to purge the device from the homeserver or generate a fresh device ID. Also documents the crypto store recovery procedure in the Matrix setup guide. Refs #8174 * docs(matrix): improve crypto recovery docs per review - Put easy path (fresh access token) first, manual purge second - URL-encode user ID in Synapse admin API example - Note that device deletion may invalidate the access token - Add "stop Synapse first" caveat for direct SQLite approach - Mention the fail-closed startup detection behavior - Add back-reference from upgrade section to OTK warning * refactor(matrix): cleanup from code review - Extract _extract_server_ed25519() and _reverify_keys_after_upload() to deduplicate the re-verification block (was copy-pasted in two places, three copies of ed25519 key extraction total) - Remove dead code: _pending_megolm, _retry_pending_decryptions, _MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after removing _on_encrypted_event - Remove tautological TestMediaCacheGate (tested its own predicate, not production code) - Remove dead TestMatrixMegolmEventHandling and TestMatrixRetryPendingDecryptions (tested removed methods) - Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator - Trim comment to just the "why"
This commit is contained in:
parent
387aa9afc9
commit
d38b73fa57
8 changed files with 833 additions and 358 deletions
|
|
@ -284,8 +284,40 @@ MATRIX_RECOVERY_KEY=EsT... your recovery key here
|
|||
|
||||
On each startup, if `MATRIX_RECOVERY_KEY` is set, Hermes imports cross-signing keys from the homeserver's secure secret storage and signs the current device. This is idempotent and safe to leave enabled permanently.
|
||||
|
||||
:::warning
|
||||
If you delete the `~/.hermes/platforms/matrix/store/` directory, the bot loses its encryption keys. You'll need to verify the device again in your Matrix client. Back up this directory if you want to preserve encrypted sessions.
|
||||
:::warning[Deleting the crypto store]
|
||||
If you delete `~/.hermes/platforms/matrix/store/crypto.db`, the bot loses its encryption identity. Simply restarting with the same device ID will **not** fully recover — the homeserver still holds one-time keys signed with the old identity key, and peers cannot establish new Olm sessions.
|
||||
|
||||
Hermes detects this condition on startup and refuses to enable E2EE, logging: `device XXXX has stale one-time keys on the server signed with a previous identity key`.
|
||||
|
||||
**Easiest recovery: generate a new access token** (which gets a fresh device ID with no stale key history). See the "Upgrading from a previous version with E2EE" section below. This is the most reliable path and avoids touching the homeserver database.
|
||||
|
||||
**Manual recovery** (advanced — keeps the same device ID):
|
||||
|
||||
1. Stop Synapse and delete the old device from its database:
|
||||
```bash
|
||||
sudo systemctl stop matrix-synapse
|
||||
sudo sqlite3 /var/lib/matrix-synapse/homeserver.db "
|
||||
DELETE FROM e2e_device_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
|
||||
DELETE FROM e2e_one_time_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
|
||||
DELETE FROM e2e_fallback_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
|
||||
DELETE FROM devices WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
|
||||
"
|
||||
sudo systemctl start matrix-synapse
|
||||
```
|
||||
Or via the Synapse admin API (note the URL-encoded user ID):
|
||||
```bash
|
||||
curl -X DELETE -H "Authorization: Bearer ADMIN_TOKEN" \
|
||||
'https://your-server/_synapse/admin/v2/users/%40hermes%3Ayour-server/devices/DEVICE_ID'
|
||||
```
|
||||
Note: deleting a device via the admin API may also invalidate the associated access token. You may need to generate a new token afterward.
|
||||
|
||||
2. Delete the local crypto store and restart Hermes:
|
||||
```bash
|
||||
rm -f ~/.hermes/platforms/matrix/store/crypto.db*
|
||||
# restart hermes
|
||||
```
|
||||
|
||||
Other Matrix clients (Element, matrix-commander) may cache the old device keys. After recovery, type `/discardsession` in Element to force a new encryption session with the bot.
|
||||
:::
|
||||
|
||||
:::info
|
||||
|
|
@ -361,6 +393,10 @@ pip install 'hermes-agent[matrix]'
|
|||
|
||||
### Upgrading from a previous version with E2EE
|
||||
|
||||
:::tip
|
||||
If you also manually deleted `crypto.db`, see the "Deleting the crypto store" warning in the E2EE section above — there are additional steps to clear stale one-time keys from the homeserver.
|
||||
:::
|
||||
|
||||
If you previously used Hermes with `MATRIX_ENCRYPTION=true` and are upgrading to
|
||||
a version that uses the new SQLite-based crypto store, the bot's encryption
|
||||
identity has changed. Your Matrix client (Element) may cache the old device keys
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue