# Session Lifecycle

> **Audience:** Gateway developers and maintainers

> **Source files:** `gateway/session.py` (~1444 lines), `gateway/run.py` (~16800 lines), `gateway/config.py`

> **Last updated:** 2026-06-16

## Overview

A **session** represents a continuous conversation between the agent and one or more users on a
messaging platform. The session lifecycle governs when conversations persist, when they reset,
how they survive gateway restarts, and how messages queue during concurrent operations.

The session system lives primarily in two modules:

- `gateway/session.py` — Data model (`SessionSource`, `SessionEntry`, `SessionContext`),
  key generation (`build_session_key`), and the main store (`SessionStore`).
- `gateway/run.py` — Gateway runner (`GatewayRunner`) that wires sessions into the message
  processing pipeline: session expiry watching, agent caching, restart recovery, and message
  queuing.

---

## 1. SessionSource — Message Origin Descriptor

`SessionSource` is a frozen record of *where a message came from*. It is attached to every
incoming `MessageEvent` and used for routing, isolation, and context injection.

### Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `platform` | `Platform` | *(required)* | Enum identifying the messaging platform (telegram, discord, slack, signal, whatsapp, matrix, local, etc.). |
| `chat_id` | `str` | *(required)* | Platform-level chat/group/channel identifier. Routed through the adapter's `chat_id_key` transform. |
| `chat_name` | `Optional[str]` | `None` | Human-readable name of the chat or group. |
| `chat_type` | `str` | `"dm"` | One of `"dm"`, `"group"`, `"channel"`, `"thread"`. Controls session key generation and isolation. |
| `user_id` | `Optional[str]` | `None` | Platform-specific user identifier. Used for authorization and per-user session isolation. |
| `user_name` | `Optional[str]` | `None` | Display name of the message author. Injected into the system prompt. |
| `thread_id` | `Optional[str]` | `None` | Forum topic / Discord thread / Slack thread identifier. Differentiates threaded conversations. |
| `chat_topic` | `Optional[str]` | `None` | Channel topic or description (Discord channel topic, Slack channel purpose). |
| `user_id_alt` | `Optional[str]` | `None` | Platform-specific stable alternative ID (Signal UUID, Feishu union_id). Used when `user_id` is ephemeral. |
| `chat_id_alt` | `Optional[str]` | `None` | Signal group internal ID — maps a Signal group V2 identifier to its canonical form. |
| `is_bot` | `bool` | `False` | True when the message author is a bot or webhook (e.g. Discord bots). |
| `guild_id` | `Optional[str]` | `None` | Discord guild / Slack workspace / Matrix server scope identifier. |
| `parent_chat_id` | `Optional[str]` | `None` | Parent channel when `chat_id` refers to a thread. |
| `message_id` | `Optional[str]` | `None` | ID of the triggering message. Used for pin/reply/react operations and Discord ID injection. |
| `role_authorized` | `bool` | `False` | True when the adapter granted access via a platform role rather than an individual user ID. |

### Key Methods

- **`description`** (property: `str`) — Human-readable summary, e.g. `"DM with Alice"` or
  `"group: My Group, thread: 12345"`.
- **`to_dict()` / `from_dict()`** — Serialization round-trip for persistence in `sessions.json`.

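
A minimal sketch of the record's shape and its two key methods follows. It assumes a plain string instead of the `Platform` enum and keeps only the fields `description` needs. The formatting rules are inferred from the two example strings above rather than copied from `gateway/session.py`, and `SourceSketch` is a hypothetical name.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSketch:
    """Hypothetical, trimmed-down stand-in for SessionSource."""

    platform: str                  # the real field is a Platform enum
    chat_id: str
    chat_type: str = "dm"          # "dm" | "group" | "channel" | "thread"
    chat_name: Optional[str] = None
    user_name: Optional[str] = None
    thread_id: Optional[str] = None

    @property
    def description(self) -> str:
        # DMs are described by the other party; everything else by chat name.
        if self.chat_type == "dm":
            return f"DM with {self.user_name or self.chat_id}"
        label = f"{self.chat_type}: {self.chat_name or self.chat_id}"
        if self.thread_id:
            label += f", thread: {self.thread_id}"
        return label

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SourceSketch":
        return cls(**data)
```

Because the dataclass is frozen, the same source object can be passed between the event, the store, and the context builder without any of them mutating it.
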
---

## 2. SessionEntry — Active Session Record

`SessionEntry` is the per-session metadata record stored in memory and persisted to
`{sessions_dir}/sessions.json`. Each entry maps a `session_key` to its current `session_id`.

### Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `session_key` | `str` | *(required)* | Deterministic key identifying the conversation lane (see §4). |
| `session_id` | `str` | *(required)* | Unique identifier for this specific conversation incarnation. Format: `YYYYMMDD_HHMMSS_<8hex>`. |
| `created_at` | `datetime` | *(required)* | When this session incarnation was created. |
| `updated_at` | `datetime` | *(required)* | Last activity timestamp. Used for idle timeout and expiry checks. |
| `origin` | `Optional[SessionSource]` | `None` | The source that created this session, used for delivery routing. |
| `display_name` | `Optional[str]` | `None` | Chat display name (sourced from `SessionSource.chat_name`). |
| `platform` | `Optional[Platform]` | `None` | Platform enum, persisted for expiry policy lookup across restarts. |
| `chat_type` | `str` | `"dm"` | Chat type, also persisted for policy lookup. |
| `input_tokens` | `int` | `0` | Cumulative LLM input (prompt) tokens consumed. |
| `output_tokens` | `int` | `0` | Cumulative LLM output (completion) tokens consumed. |
| `cache_read_tokens` | `int` | `0` | Cumulative prompt cache read tokens. |
| `cache_write_tokens` | `int` | `0` | Cumulative prompt cache write tokens. |
| `total_tokens` | `int` | `0` | Total token count across all turns. |
| `estimated_cost_usd` | `float` | `0.0` | Estimated cumulative USD cost. |
| `cost_status` | `str` | `"unknown"` | Cost tracking status label. |
| `last_prompt_tokens` | `int` | `0` | Last API-reported prompt token count. Used for accurate compression pre-check. |

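
The `session_id` format can be produced with the standard library alone. This sketch assumes the 8-hex suffix is random. Whether the gateway uses `secrets`, `uuid`, or another source is not stated here, and `new_session_id` is a hypothetical name.

```python
import re
import secrets
from datetime import datetime


def new_session_id(now: datetime) -> str:
    """Build an ID in the documented YYYYMMDD_HHMMSS_<8hex> shape."""
    # token_hex(4) yields 4 random bytes rendered as 8 lowercase hex characters.
    return f"{now:%Y%m%d_%H%M%S}_{secrets.token_hex(4)}"


# Validator for the same shape, useful when loading persisted entries.
SESSION_ID_RE = re.compile(r"^\d{8}_\d{6}_[0-9a-f]{8}$")
```

The fixed-width timestamp prefix makes IDs sort chronologically as plain strings. The random suffix keeps two sessions created in the same second from colliding.
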
### Boolean Flags (State Machine)

`SessionEntry` carries several boolean flags, plus companion fields that record reasons and
timestamps. Together they form a simple state machine governing session behavior on the next
access.

| Flag | Type | Default | Description |
|---|---|---|---|
| `was_auto_reset` | `bool` | `False` | Set when a session was auto-reset due to policy expiry (idle/daily). Consumed once to inject a context notice. |
| `auto_reset_reason` | `Optional[str]` | `None` | `"idle"` or `"daily"` — why the previous session was auto-reset. |
| `reset_had_activity` | `bool` | `False` | Whether the expired session had any messages (`total_tokens > 0`). |
| `is_fresh_reset` | `bool` | `False` | Set by explicit `/new` or `/reset`. Triggers topic/channel skill re-injection on first message. Distinguished from `was_auto_reset` to avoid misleading "session expired" notices. |
| `expiry_finalized` | `bool` | `False` | Set by background expiry watcher after invoking `on_session_finalize` hooks, cleaning tool resources, and evicting the cached agent. Prevents redundant finalization across restarts. |
| `suspended` | `bool` | `False` | Hard force-wipe signal. Set by `/stop` or stuck-loop escalation (3+ consecutive restart failures). On next `get_or_create_session()`, forces a new `session_id` regardless of `resume_pending`. |
| `resume_pending` | `bool` | `False` | Soft recovery marker. Set by `suspend_recently_active()` (crash recovery) or drain timeout. On next access, preserves the existing `session_id` so the user continues on the same transcript. Cleared after the next successful turn completes. |
| `resume_reason` | `Optional[str]` | `None` | Why resume was marked: `"restart_timeout"`, `"shutdown_timeout"`, `"restart_interrupted"`. |
| `last_resume_marked_at` | `Optional[datetime]` | `None` | Timestamp of the last resume-pending marking. |

### State Transition Logic (get_or_create_session)

```
      ┌──────────┐
      │ Incoming │
      │ Message  │
      └────┬─────┘
           │
           ▼
┌──────────────────────┐
│ session_key exists   │──── No ──► Create fresh SessionEntry
│ AND !force_new       │
└──────────┬───────────┘
           │ Yes
           ▼
┌──────────────────────┐
│ entry.suspended?     │──── Yes ──► Auto-reset: new session_id
└──────────┬───────────┘             (reason="suspended")
           │ No
           ▼
┌──────────────────────┐
│ entry.resume_pending?│──── Yes ──► Return existing entry
└──────────┬───────────┘             (preserve session_id; flag is
           │ No                       cleared on next successful turn)
           ▼
┌──────────────────────┐
│ Policy says reset?   │──── Yes ──► Auto-reset: new session_id
└──────────┬───────────┘             (reason="idle"/"daily")
           │ No
           ▼
┌──────────────────────┐
│ Return existing      │
│ entry, bump          │
│ updated_at           │
└──────────────────────┘
```

**Priority order in `get_or_create_session()`:**

1. `suspended=True` → always force-reset (hard wipe)
2. `resume_pending=True` → preserve `session_id` (soft recovery)
3. Policy expiry (idle/daily) → auto-reset
4. No trigger → return existing entry (bump `updated_at`)

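
The priority order can be expressed as a pure function over the entry's flags. This is a simplified sketch: `FlagsSketch` and `decide` are hypothetical names, and the policy check is passed in as a precomputed reason (`"idle"`, `"daily"`, or `None`). The SQLite bookkeeping that the real `get_or_create_session()` performs is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FlagsSketch:
    """Hypothetical subset of SessionEntry's state-machine flags."""

    suspended: bool = False
    resume_pending: bool = False


def decide(entry: Optional[FlagsSketch], force_new: bool,
           policy_reason: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (action, reason) following the documented priority order."""
    if entry is None or force_new:
        return ("create", None)          # no usable entry: fresh SessionEntry
    if entry.suspended:
        return ("reset", "suspended")    # 1. hard wipe beats everything
    if entry.resume_pending:
        return ("resume", None)          # 2. keep session_id; flag stays set
    if policy_reason is not None:
        return ("reset", policy_reason)  # 3. idle/daily expiry
    return ("reuse", None)               # 4. bump updated_at and continue
```

One consequence of this ordering is that `resume_pending` shields a session from policy expiry. A session interrupted overnight keeps its transcript even if the daily boundary has passed.
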
---

## 3. SessionStore — Storage and Operations

`SessionStore` is the main storage layer. It maintains an in-memory dict (`_entries`) persisted
to `sessions.json`, with SQLite (`SessionDB`) as the canonical store for session metadata and
message transcripts.

### Constructor

```python
SessionStore(sessions_dir: Path, config: GatewayConfig, has_active_processes_fn=None)
```

- `sessions_dir` — Directory where `sessions.json` lives.
- `config` — `GatewayConfig` instance for reset policy lookups.
- `has_active_processes_fn` — Optional callback that takes a `session_key` and reports whether
  background processes are still running for that session. Sessions with active processes are
  never expired or pruned.

### Operations (Methods)

| Method | Description |
|---|---|
| `get_or_create_session(source, force_new=False)` | Core entry point. Returns existing or creates new `SessionEntry`. Evaluates `suspended`, `resume_pending`, and reset policy. Creates/ends SQLite records. |
| `update_session(session_key, last_prompt_tokens=None)` | Lightweight metadata update after an interaction. Bumps `updated_at`, optionally records `last_prompt_tokens`. |
| `reset_session(session_key, display_name=None)` | Explicit reset (from `/new` or `/reset`). Creates new `session_id`, sets `is_fresh_reset=True`. Ends old SQLite session, creates new one. |
| `switch_session(session_key, target_session_id)` | Switch to a different existing session ID (from `/resume`). Ends current SQLite session, reopens target. |
| `suspend_session(session_key)` | Mark session as `suspended=True` (from `/stop`). Forces auto-reset on next access. |
| `mark_resume_pending(session_key, reason)` | Mark session as `resume_pending=True` (from drain timeout). Preserves session_id on next access. Will NOT override `suspended=True`. |
| `clear_resume_pending(session_key)` | Clear `resume_pending` after a successful resumed turn. Called from gateway after `run_conversation()` returns. |
| `suspend_recently_active(max_age_seconds=120)` | Crash recovery: mark recently-active sessions as `resume_pending=True`. Skips already-pending and already-suspended entries. Called on startup after unclean shutdown. |
| `prune_old_entries(max_age_days)` | Drop entries older than `max_age_days` (based on `updated_at`). Skips `suspended` entries and sessions with active processes. |
| `list_sessions(active_minutes=None)` | Return all sessions, optionally filtered by recent activity. Sorted by `updated_at` descending. |
| `lookup_by_session_id(session_id)` | Find the active `SessionEntry` for a persisted session ID. |
| `has_any_sessions()` | Check if any sessions have ever been created (uses SQLite for history, not just in-memory dict). |
| `append_to_transcript(session_id, message, skip_db=False)` | Append a message to SQLite transcript. `skip_db=True` prevents duplicate writes when the agent already persisted. |
| `rewrite_transcript(session_id, messages)` | Full replacement of session transcript (used by `/retry`, `/undo`, `/compress`). |
| `load_transcript(session_id)` | Load all messages from a session's SQLite transcript. |
| `rewind_session(session_id, n=1)` | Back up `n` user turns via soft-delete (keeps audit trail). Returns `{rewound_count, turns_undone, target_text}`. |

### Internal Helpers

- `_ensure_loaded()` / `_ensure_loaded_locked()` — Load `sessions.json` into `_entries` dict.
- `_save()` — Atomic write to `sessions.json` via temp file + `atomic_replace`.
- `_generate_session_key(source)` — Delegates to `build_session_key()` with config params.
- `_is_session_expired(entry)` — Policy check from entry alone (no source needed). Used by
  background expiry watcher.
- `_should_reset(entry, source)` — Policy check returning `"idle"`, `"daily"`, or `None`.

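
The temp-file-plus-rename pattern behind `_save()` can be sketched with the standard library. In this sketch `os.replace` stands in for the gateway's own `atomic_replace` helper, whose exact behavior is not described here, and `save_entries` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def save_entries(path: Path, entries: dict) -> None:
    """Write entries as JSON so readers never see a half-written file."""
    # Create the temp file in the same directory so the rename stays on one
    # filesystem, where os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".sessions.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(entries, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp_name, path)  # atomically swap in the new contents
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A crash mid-write leaves the previous `sessions.json` intact rather than a truncated file, because the real path only ever points at a fully written file.
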
### Storage Layout

```
{sessions_dir}/
  sessions.json        # In-memory _entries dict, persisted as JSON
                       # Maps session_key → SessionEntry (metadata only)
  {session_id}.jsonl   # (Legacy, removed in spec 002)
```

The canonical transcript store is SQLite via `SessionDB` (from `hermes_state`). The
`sessions.json` file persists the `session_key → session_id` mapping and entry metadata
(flags, timestamps, token counts). If SQLite is unavailable, the store falls back to
JSONL, but this is a degradation path.

---

## 4. SessionKey Generation Rules

Session keys are deterministic strings that identify a conversation lane. They are generated
by `build_session_key(source, group_sessions_per_user, thread_sessions_per_user)`.

### Key Format

```
agent:main:{platform}:{chat_type}[:{chat_id}][:{thread_id}][:{participant_id}]
```

### DM Rules

| Scenario | Key |
|---|---|
| DM with chat_id | `agent:main:telegram:dm:12345` |
| DM with chat_id + thread | `agent:main:telegram:dm:12345:thread_678` |
| DM without chat_id, with participant_id | `agent:main:signal:dm:user_abc` |
| DM without chat_id or participant_id | `agent:main:telegram:dm` |
| WhatsApp DM (canonicalized) | `agent:main:whatsapp:dm:{canonical_number}` |

- DMs always include `chat_id` when present, isolating each private conversation.
- `thread_id` further differentiates threaded DMs within the same DM chat.
- Without `chat_id`, the key falls back to `user_id_alt` or `user_id` as `participant_id`.
- Without any identifier, all DMs on that platform collapse to one shared session.

### Group/Channel Rules

| Scenario | Key |
|---|---|
| Group chat | `agent:main:telegram:group:-10012345` |
| Group chat, per-user isolation | `agent:main:telegram:group:-10012345:user_abc` |
| Thread in group, shared | `agent:main:discord:group:12345:thread_678` |
| Thread in group, per-user | `agent:main:discord:group:12345:thread_678:user_abc` |
| Channel | `agent:main:slack:channel:C12345` |
| WhatsApp group (canonicalized) | `agent:main:whatsapp:group:{canonical_id}:{participant}` |

- `chat_id` identifies the parent group/channel.
- `thread_id` differentiates threads within that parent.
- **Per-user isolation** (append `participant_id`) is controlled by:
  - `group_sessions_per_user` (default: `True`) — group/channel sessions are isolated.
  - `thread_sessions_per_user` (default: `False`) — threads are **shared** by default
    (Telegram forum topics, Discord threads, Slack threads all share one session per thread).
- `participant_id` = `user_id_alt` or `user_id` (in that priority).
- WhatsApp identifiers are canonicalized to handle JID/LID alias flips.

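
The rules in the two tables can be condensed into one function. This is a sketch under stated assumptions, not `build_session_key()` itself. `KeySource` and `build_key_sketch` are hypothetical, `platform` is a plain string, and WhatsApp canonicalization is omitted. The per-user switch reuses the thread-versus-group choice from `is_shared_multi_user_session()` in §5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeySource:
    """Hypothetical subset of SessionSource used for key building."""

    platform: str
    chat_type: str = "dm"
    chat_id: Optional[str] = None
    thread_id: Optional[str] = None
    user_id: Optional[str] = None
    user_id_alt: Optional[str] = None


def build_key_sketch(src: KeySource, group_sessions_per_user: bool = True,
                     thread_sessions_per_user: bool = False) -> str:
    parts = ["agent", "main", src.platform, src.chat_type]
    participant = src.user_id_alt or src.user_id  # stable alt ID wins
    if src.chat_type == "dm":
        if src.chat_id:
            parts.append(src.chat_id)
            if src.thread_id:
                parts.append(src.thread_id)
        elif participant:
            parts.append(participant)
        # With neither chat_id nor participant, all DMs share one key.
        return ":".join(parts)
    if src.chat_id:
        parts.append(src.chat_id)
    if src.thread_id:
        parts.append(src.thread_id)
    # Threads follow thread_sessions_per_user; plain groups/channels follow
    # group_sessions_per_user.
    per_user = thread_sessions_per_user if src.thread_id else group_sessions_per_user
    if per_user and participant:
        parts.append(participant)
    return ":".join(parts)
```
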
### Special Case: WhatsApp

WhatsApp phone numbers go through `canonical_whatsapp_identifier()`, which strips the
`@s.whatsapp.net` suffix and normalizes to E.164 format. This prevents session fragmentation
when the bridge returns different alias forms of the same phone number.

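
The following rough approximation of the two normalization steps described above is for illustration only. `canonical_whatsapp_identifier()` itself is not shown in this document. The hypothetical `canonical_wa_sketch` only strips the `@s.whatsapp.net` suffix and reduces the number to `+` plus digits; it does not attempt JID/LID alias resolution.

```python
import re

_JID_SUFFIX = "@s.whatsapp.net"


def canonical_wa_sketch(raw: str) -> str:
    """Approximate E.164 normalization for a WhatsApp phone identifier."""
    ident = raw.strip()
    if ident.endswith(_JID_SUFFIX):
        ident = ident[: -len(_JID_SUFFIX)]
    digits = re.sub(r"\D", "", ident)  # drop '+', spaces, dashes, parentheses
    # E.164 numbers are written as '+' followed by the digits.
    return f"+{digits}" if digits else ident
```

Two alias forms of the same number collapse to one string, so they would produce the same session key.
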
---

## 5. Multi-User Isolation Strategy

Multi-user isolation determines whether multiple users in the same chat share a conversation
or each get their own private session.

### Decision Logic (`is_shared_multi_user_session`)

```python
def is_shared_multi_user_session(source, *, group_sessions_per_user, thread_sessions_per_user) -> bool:
    if source.chat_type == "dm":
        return False                         # DMs are always private
    if source.thread_id:
        return not thread_sessions_per_user  # Threads: shared unless per-user
    return not group_sessions_per_user       # Groups/channels: isolated unless per-user is off
```

### Summary

| Chat Type | Default | Config Control |
|---|---|---|
| DM | Private (never shared) | N/A |
| Group/Channel | Per-user isolation | `group_sessions_per_user` (default: True) |
| Thread (forum, discord) | Shared (all participants see same context) | `thread_sessions_per_user` (default: False) |

### Impact on System Prompt

When `shared_multi_user_session=True`, the system prompt omits a fixed user name and instead
states: *"Multi-user {thread|session} — messages are prefixed with [sender name]. Multiple
users may participate."* Individual sender names are prefixed on each user message by the
gateway at runtime, preserving prompt caching (the system prompt doesn't change per-turn).

---

## 6. Reset Policy

Reset policies control when a session automatically loses context (gets a new `session_id`).

### Policy Modes (`SessionResetPolicy`)

| Mode | Behavior | Default Config |
|---|---|---|
| `"none"` | Never auto-reset. Context managed only by compression. | — |
| `"idle"` | Reset after N minutes of inactivity from `updated_at`. | `idle_minutes: 1440` (24h) |
| `"daily"` | Reset at a specific hour each day (local time). | `at_hour: 4` (4 AM) |
| `"both"` | Whichever triggers first — daily boundary OR idle timeout. | **(default)** |

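### Policy Evaluation

```python
from datetime import datetime, timedelta


def policy_reason(entry, policy, now: datetime):
    """Illustrative wrapper around the idle/daily checks; returns "idle", "daily", or None."""
    # Idle check: expire once idle_minutes have passed since the last activity
    idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
    if now > idle_deadline:
        return "idle"

    # Daily check: compare against the most recent at_hour boundary
    today_reset = now.replace(hour=policy.at_hour, minute=0, second=0, microsecond=0)
    if now.hour < policy.at_hour:
        today_reset -= timedelta(days=1)  # Reset hasn't happened yet today
    if entry.updated_at < today_reset:
        return "daily"
    return None
```
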
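
The four modes in the table decide which of those two checks run. The sketch below is an illustration under assumptions, not the gateway's code. `ResetPolicySketch` and `reset_reason` are hypothetical names, and datetimes are naive local times. The idle check runs first, so in `"both"` mode an entry past both limits reports `"idle"`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ResetPolicySketch:
    mode: str = "both"        # "none" | "idle" | "daily" | "both"
    at_hour: int = 4          # daily boundary, local time
    idle_minutes: int = 1440  # 24h


def reset_reason(updated_at: datetime, now: datetime,
                 policy: ResetPolicySketch) -> Optional[str]:
    if policy.mode in ("idle", "both"):
        if now > updated_at + timedelta(minutes=policy.idle_minutes):
            return "idle"
    if policy.mode in ("daily", "both"):
        boundary = now.replace(hour=policy.at_hour, minute=0, second=0, microsecond=0)
        if now.hour < policy.at_hour:
            boundary -= timedelta(days=1)  # today's boundary is still ahead
        if updated_at < boundary:
            return "daily"
    return None  # mode "none", or neither limit reached
```

As a worked example, take the defaults with `now` at 10:00 on 2026-06-16. An entry last updated at 03:30 the same day resets with `"daily"` because it predates the 04:00 boundary, even though it has been idle for only 6.5 hours. An entry updated at 09:00 does not reset at all.
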
### Per-Platform/Per-Type Policies

Reset policies are configurable per platform and session type via `config.get_reset_policy()`.
This allows different platforms to have different expiry rules (e.g., Telegram DMs reset
after 24h idle, but Slack groups persist indefinitely).

### Exclusions

Sessions with active background processes are **never** expired or reset. The
`has_active_processes_fn` callback checks for running processes when evaluating policies.

### Reset Effects

When a reset triggers:

1. Old session is ended in SQLite (with reason `"session_reset"`).
2. New `session_id` is generated (`YYYYMMDD_HHMMSS_<8hex>`).
3. New `SessionEntry` is created with `was_auto_reset=True` and the reset reason.
4. `reset_had_activity` is set if the old session had any turns (`total_tokens > 0`).
5. The old AIAgent cache entry is evicted on the next expiry watcher pass.
6. On the first message after reset, a context notice is injected: "Session expired due to inactivity / daily reset."

---

## 7. Restart Recovery Flow

The restart recovery system ensures that in-flight sessions are preserved across gateway
restarts, crashes, and drain timeouts. It is the solution to issue #7536.

### Startup Recovery Sequence

```
Gateway starts
      │
      ▼
┌───────────────────────────────┐
│ Check for .clean_shutdown     │── Exists? ──► Skip suspension (clean exit)
│ marker                        │
└───────────────────────────────┘
      │ Missing
      ▼
┌───────────────────────────────┐
│ session_store                 │── Marks sessions updated within
│ .suspend_recently_active()    │   last 120 seconds as resume_pending
└───────────────────────────────┘
      │
      ▼
┌───────────────────────────────┐
│ _suspend_stuck_loop_sessions()│── Suspends sessions that have been
│                               │   active across 3+ restarts
└───────────────────────────────┘
      │
      ▼
┌───────────────────────────────┐
│ Queue inbound messages while  │
│ startup restore runs          │
│ (_startup_restore_in_progress)│
└───────────────────────────────┘
      │
      ▼
┌───────────────────────────────┐
│ For each adapter, find        │
│ resume_pending sessions →     │
│ synthesize MessageEvent and   │
│ run _handle_message to let    │
│ the agent auto-continue       │
└───────────────────────────────┘
```

### suspend_recently_active(max_age_seconds=120)

Called on gateway startup when no `.clean_shutdown` marker exists (indicating a crash or
unexpected exit). For each session updated within the last 120 seconds:

- Sets `resume_pending=True`, `resume_reason="restart_interrupted"`,
  `last_resume_marked_at=now`.
- Skips entries already `resume_pending=True` (no double-mark).
- Skips entries explicitly `suspended=True` (hard wipe should stay).

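
The marking rules can be sketched over a plain dict of entries. `EntrySketch`, `mark_resume_pending_sketch`, and `suspend_recently_active_sketch` are hypothetical simplifications of the corresponding `SessionStore` behavior, with persistence, locking, and SQLite omitted. The marking helper also shows the documented refusal to override `suspended=True`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class EntrySketch:
    """Hypothetical subset of SessionEntry used for recovery marking."""

    updated_at: datetime
    suspended: bool = False
    resume_pending: bool = False
    resume_reason: Optional[str] = None
    last_resume_marked_at: Optional[datetime] = None


def mark_resume_pending_sketch(entry: EntrySketch, reason: str, now: datetime) -> bool:
    """Soft-mark an entry; a hard suspension is never downgraded."""
    if entry.suspended:
        return False
    entry.resume_pending = True
    entry.resume_reason = reason
    entry.last_resume_marked_at = now
    return True


def suspend_recently_active_sketch(entries: Dict[str, EntrySketch], now: datetime,
                                   max_age_seconds: int = 120) -> int:
    """Mark sessions active within max_age_seconds; return how many were marked."""
    cutoff = now - timedelta(seconds=max_age_seconds)
    marked = 0
    for entry in entries.values():
        if entry.resume_pending or entry.updated_at < cutoff:
            continue  # already pending, or too old to have been mid-turn
        if mark_resume_pending_sketch(entry, "restart_interrupted", now):
            marked += 1
    return marked
```
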
### Stuck-Loop Detection (`_suspend_stuck_loop_sessions`)

Counts consecutive restarts via a JSON file (`{HERMES_HOME}/restart_counts.json`). If a
session has been active across 3+ consecutive restarts, it's auto-suspended so the user
gets a clean slate.

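
The following sketch shows consecutive-restart counting. It assumes `restart_counts.json` holds a flat `{session_key: count}` object and that a session's streak resets whenever it was not active at a restart. Both the file layout and the reset rule are assumptions, and `update_restart_counts` is a hypothetical name, not the body of `_suspend_stuck_loop_sessions()`.

```python
import json
from pathlib import Path
from typing import Set

STUCK_THRESHOLD = 3  # "3+ consecutive restarts"


def update_restart_counts(path: Path, active_keys: Set[str]) -> Set[str]:
    """Record one restart; return the session keys that should be suspended."""
    counts = json.loads(path.read_text()) if path.exists() else {}
    # A session that was not active at this restart breaks its streak.
    counts = {key: n for key, n in counts.items() if key in active_keys}
    for key in active_keys:
        counts[key] = counts.get(key, 0) + 1
    to_suspend = {key for key, n in counts.items() if n >= STUCK_THRESHOLD}
    for key in to_suspend:
        del counts[key]  # suspension gives a clean slate, so drop the counter
    path.write_text(json.dumps(counts))
    return to_suspend
```
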
### Drain-Timeout Marking

On graceful shutdown/restart, the drain system calls `mark_resume_pending()` for any
session that was mid-turn when the drain timeout fired. Reasons:

- `"restart_timeout"` — killed during restart drain
- `"shutdown_timeout"` — killed during shutdown drain
- `"restart_interrupted"` — crash recovery (from `suspend_recently_active`)

All three reasons are in `_AUTO_RESUME_REASONS` and eligible for startup auto-resume.

### Auto-Resume on Next Access

When `get_or_create_session()` encounters `resume_pending=True`:

1. It returns the existing entry **without** creating a new `session_id`.
2. The existing transcript is loaded intact.
3. The marking is not cleared here. It survives until the next successful turn
   completes (`clear_resume_pending()` is called from the gateway after
   `run_conversation()` returns a real response).
4. If the resumed turn is interrupted again, the `resume_pending` flag remains set,
   and the next restart will retry. The stuck-loop counter handles terminal escalation
   (3 retries → suspended).

### Clean Shutdown Marker (`.clean_shutdown`)

Written at the end of a graceful shutdown. On next startup:

- If present: skip `suspend_recently_active()` entirely. Active agents were already
  drained, so no sessions are stuck.
- Then delete the marker.

This prevents unwanted auto-resets after `hermes update`, `hermes gateway restart`,
or `/restart`.

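
The marker protocol reduces to an existence check. In this sketch the marker is assumed to live directly in a `home` directory passed by the caller, since the document does not say where `.clean_shutdown` is stored. `write_clean_shutdown_marker` and `startup_recovery` are hypothetical names, and the recovery routine is passed in as a callback.

```python
from pathlib import Path
from typing import Callable


def write_clean_shutdown_marker(home: Path) -> None:
    """Last step of a graceful shutdown (sketch)."""
    (home / ".clean_shutdown").touch()


def startup_recovery(home: Path, suspend_recently_active: Callable[[], int]) -> bool:
    """Return True if crash recovery ran, False if the marker showed a clean exit."""
    marker = home / ".clean_shutdown"
    if marker.exists():
        marker.unlink()  # consume the marker so a later crash is still detected
        return False
    suspend_recently_active()
    return True
```

Deleting the marker on every startup matters. A stale marker left over from an earlier clean exit would otherwise hide a later crash.
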
---

## 8. Message Queuing Flow

The message queuing system handles two scenarios:

1. **Interrupt follow-ups** — When a user sends more messages while the agent is still
   processing, they go into a single pending slot per session, and a newer message
   overwrites an older one (burst collapse).
2. **`/queue` FIFO** — Explicit `/queue` commands that must each produce their own full
   agent turn, in order, without merging.

### Data Structures

```
adapter._pending_messages: Dict[session_key, MessageEvent]
    └── Single "next-up" slot per session. Overwritten on repeat sends
        (burst collapse). Shared with photo-burst follow-ups.

self._queued_events: Dict[session_key, List[MessageEvent]]
    └── Overflow buffer. Each /queue invocation appends here when the
        slot is occupied. Promoted one-at-a-time after each drain.
```

### Enqueue (`_enqueue_fifo`)

```
_enqueue_fifo(session_key, event, adapter)
      │
      ▼
┌────────────────────────────────────────┐
│ Is slot free?                          │
│ (session_key NOT in _pending_messages) │── Yes ──► Place event in slot
└────────────────────────────────────────┘
      │ No
      ▼
Append to _queued_events[session_key] (overflow tail)
```

### Dequeue / Promotion (`_promote_queued_event`)

Called at the drain site after the slot was consumed. If there's an overflow item:

- When `pending_event is None` (slot was empty), return overflow head as the new event.
- When `pending_event` exists, stage overflow head in the slot for the next recursion.
- If no adapter available, push back to `_queued_events` (don't silently drop).

### Queue Depth

`_queue_depth(session_key, adapter)` returns `len(overflow) + (1 if slot occupied else 0)`.

### Clearing

Queued events for a session are cleared on `/new` and `/reset` (via `_handle_reset_command`).

### FIFO Invariant

Each `/queue` invocation produces exactly one full agent turn, in FIFO order, with no
merging. The single-slot `_pending_messages` + overflow `_queued_events` design ensures
that repeated sends during an active turn don't cause out-of-order processing.

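
The following single-threaded sketch models the slot-plus-overflow design for `/queue` events. Plain dicts stand in for `adapter._pending_messages` and `self._queued_events`, and strings stand in for `MessageEvent`. `QueueSketch` and `take_next` are hypothetical names. The sketch omits the burst-collapse overwrite used by interrupt follow-ups and the adapter-unavailable push-back path.

```python
from collections import deque
from typing import Deque, Dict, Optional


class QueueSketch:
    """Single pending slot plus FIFO overflow, one pair per session_key."""

    def __init__(self) -> None:
        self.slot: Dict[str, str] = {}             # like adapter._pending_messages
        self.overflow: Dict[str, Deque[str]] = {}  # like self._queued_events

    def enqueue_fifo(self, key: str, event: str) -> None:
        if key not in self.slot:
            self.slot[key] = event                 # slot free: take it
        else:
            self.overflow.setdefault(key, deque()).append(event)  # overflow tail

    def depth(self, key: str) -> int:
        return len(self.overflow.get(key, ())) + (1 if key in self.slot else 0)

    def take_next(self, key: str) -> Optional[str]:
        """Consume the slot, then promote the overflow head (drain-site logic)."""
        pending = self.slot.pop(key, None)
        queue = self.overflow.get(key)
        if queue:
            head = queue.popleft()
            if pending is None:
                pending = head                     # slot was empty: run head now
            else:
                self.slot[key] = head              # stage head for the next turn
            if not queue:
                del self.overflow[key]
        return pending
```

Each call to `take_next` yields exactly one event, so three queued events produce three separate turns in the order they were sent.
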
---

## 9. Session Context Injection

`SessionContext` is built from a `SessionSource` and `GatewayConfig` and injected into the
agent's system prompt. It tells the agent:

- Where the current message came from
- What platforms are connected
- Where it can deliver scheduled task outputs
- Whether this is a shared multi-user session

### Construction (`build_session_context`)

```python
def build_session_context(source, config, session_entry=None) -> SessionContext: ...
```

1. Collects connected platforms from config.
2. Collects home channels for each platform.
3. Determines `shared_multi_user_session` via `is_shared_multi_user_session()`.
4. Attaches session metadata (key, id, timestamps) if `session_entry` is provided.

### PII Redaction (`build_session_context_prompt`)

The dynamic system prompt section (`## Current Session Context`) can optionally redact
personally identifiable information before sending to the LLM:

- User IDs → `user_<12hex>` (SHA-256 prefix)
- Chat IDs → `<platform>:<12hex>` or just `<12hex>`
- Platforms excluded from redaction: Discord (needs raw IDs for `@mentions`),
  and any plugin-registered platform not marked `pii_safe`.

Redaction applies only to the system prompt text. Routing, session keys, and adapter
operations always use the original values.

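
The redaction scheme can be illustrated with `hashlib`. This sketch assumes the hash input is the raw ID encoded as UTF-8 and that the 12 hex characters are the first 12 of the SHA-256 hex digest. `RAW_ID_PLATFORMS`, `redact_user_id`, and `redact_chat_id` are hypothetical. The `with_prefix` switch stands in for whatever decides between the two chat-ID forms, and the plugin `pii_safe` check is not modeled.

```python
import hashlib

# Discord keeps raw IDs so the agent can write @mentions.
RAW_ID_PLATFORMS = {"discord"}


def _digest12(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact_user_id(user_id: str, platform: str) -> str:
    if platform in RAW_ID_PLATFORMS:
        return user_id
    return f"user_{_digest12(user_id)}"


def redact_chat_id(chat_id: str, platform: str, with_prefix: bool = True) -> str:
    if platform in RAW_ID_PLATFORMS:
        return chat_id
    digest = _digest12(chat_id)
    return f"{platform}:{digest}" if with_prefix else digest
```

Because the digest is deterministic, the same user maps to the same token on every turn. Redaction therefore does not make the system prompt text vary between messages.
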
---

## 10. Background Expiry Watcher

The `_session_expiry_watcher` task runs in the gateway event loop every 300 seconds (5 min).

### Responsibilities

1. **Finalize expired sessions** — For each entry where `_is_session_expired()` returns
   True and `expiry_finalized` is False:
   - Invoke `on_session_finalize` plugin hooks (cleanup, notifications).
   - Clean up cached AIAgent resources (close tool resources, shut down memory provider).
   - Evict the cached agent entry.
   - Clear per-session overrides (`_session_model_overrides`, reasoning overrides, etc.).
   - Mark `expiry_finalized=True` and persist.

2. **Sweep idle cached agents** — Calls `_sweep_idle_cached_agents()` to evict agents that
   have been idle beyond `_AGENT_CACHE_IDLE_TTL_SECS` (3600s / 1h), regardless of session
   reset policy. This prevents unbounded memory growth in gateways with long-lived sessions.

3. **Prune stale entries** — Calls `session_store.prune_old_entries()` hourly based on
   `config.session_store_max_age_days`. Prevents `sessions.json` from growing unbounded.

### Failure Handling

- Per-session retry count: each failed finalize is retried up to 3 consecutive times.
- After 3 failures, the entry is force-marked `expiry_finalized=True` to prevent infinite
  retry loops.

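
One pass of the finalize step with its retry cap might look like this. It is a synchronous sketch under assumptions. `ExpiringEntry` and `finalize_pass` are hypothetical, and `finalize` stands in for the hooks, resource cleanup, cache eviction, and override clearing listed above. In the gateway this work runs inside `_session_expiry_watcher` every 300 seconds. A loop around this sketch could use `asyncio.sleep(300)` between passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

MAX_FINALIZE_FAILURES = 3


@dataclass
class ExpiringEntry:
    expired: bool                  # stands in for _is_session_expired(entry)
    expiry_finalized: bool = False


def finalize_pass(entries: Dict[str, ExpiringEntry],
                  finalize: Callable[[str], None],
                  failures: Dict[str, int]) -> None:
    """Finalize expired entries once; give up after 3 consecutive failures."""
    for key, entry in entries.items():
        if not entry.expired or entry.expiry_finalized:
            continue
        try:
            finalize(key)
        except Exception:
            failures[key] = failures.get(key, 0) + 1
            if failures[key] >= MAX_FINALIZE_FAILURES:
                entry.expiry_finalized = True  # force-mark to stop retrying
                failures.pop(key)
            continue
        failures.pop(key, None)  # success resets the consecutive-failure count
        entry.expiry_finalized = True
```
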
---

## 11. Agent Cache

The gateway maintains an LRU cache of `AIAgent` instances keyed by `session_key` to
preserve prompt caching across turns.

### Cache Properties

- **Max size:** 128 entries (`_AGENT_CACHE_MAX_SIZE`).
- **Eviction policy:** Least-recently-used (LRU via `OrderedDict`).
- **Idle TTL:** 3600s (1h) — enforced by `_session_expiry_watcher`.
- **Lock:** `_agent_cache_lock` (threading) for thread safety.

### Cache Lifecycle

```
Message arrives
      │
      ▼
get_or_create_session() → session_key obtained
      │
      ▼
Lookup _agent_cache[session_key]
      │
      ├── Hit  → move_to_end(), reuse AIAgent (preserves prompt cache)
      │
      └── Miss → create new AIAgent, store in cache
                 (if at capacity, popitem(last=False) evicts LRU entry)
      │
      ▼
run_conversation() → agent processes message
      │
      ▼
Session expiry watcher evicts agent when session finalizes
```

### Cleanup Flow

When a session expires:

1. `_cleanup_agent_resources(agent)` — shuts down memory provider, closes tool resources.
2. `_evict_cached_agent(key)` — removes from `_agent_cache` so the agent can be GC'd.

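
The cache properties above combine an `OrderedDict` for LRU order, a per-entry timestamp for the idle TTL, and a `threading.Lock`. This sketch is hypothetical: `AgentCacheSketch` and its `cleanup` and `clock` parameters are not the gateway's API, and agents are opaque objects. It runs `cleanup` outside the lock so slow teardown does not block other sessions, which is a choice of the sketch rather than something this document specifies.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, List


class AgentCacheSketch:
    """LRU cache of agents keyed by session_key, with an idle TTL."""

    def __init__(self, max_size: int = 128, idle_ttl: float = 3600.0,
                 cleanup: Callable[[Any], None] = lambda agent: None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._items: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (agent, last_used)
        self._lock = threading.Lock()
        self._max_size = max_size
        self._idle_ttl = idle_ttl
        self._cleanup = cleanup
        self._clock = clock

    def get_or_create(self, key: str, factory: Callable[[], Any]) -> Any:
        evicted: List[Any] = []
        with self._lock:
            if key in self._items:
                agent = self._items[key][0]
                self._items.move_to_end(key)  # hit: mark most recently used
            else:
                agent = factory()             # miss: build a new agent
                if len(self._items) >= self._max_size:
                    _, (old_agent, _) = self._items.popitem(last=False)  # drop LRU
                    evicted.append(old_agent)
            self._items[key] = (agent, self._clock())  # refresh last-used time
        for old_agent in evicted:
            self._cleanup(old_agent)
        return agent

    def evict(self, key: str) -> None:
        with self._lock:
            item = self._items.pop(key, None)
        if item is not None:
            self._cleanup(item[0])

    def sweep_idle(self) -> int:
        """Evict agents unused for longer than idle_ttl; return how many."""
        now = self._clock()
        with self._lock:
            stale = [k for k, (_, used) in self._items.items() if now - used > self._idle_ttl]
            agents = [self._items.pop(k)[0] for k in stale]
        for agent in agents:
            self._cleanup(agent)
        return len(agents)

    def __len__(self) -> int:
        return len(self._items)
```
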
---

## Appendix: Key Configuration

| Config Key | Type | Default | Description |
|---|---|---|---|
| `group_sessions_per_user` | `bool` | `true` | Isolate group/channel sessions per user |
| `thread_sessions_per_user` | `bool` | `false` | Isolate thread sessions per user |
| `session_store_max_age_days` | `int` | `0` | Prune sessions older than N days (0=disabled) |
| `agent.gateway_auto_continue_freshness` | `int` | `3600` | Seconds for resume freshness window |
| `agent.gateway_timeout` | `int` | `1800` | Agent turn timeout (30 min default) |

### Reset Policy (per-platform/type, in config.yaml)

```yaml
session_reset:
  mode: both          # none | idle | daily | both
  at_hour: 4          # daily reset hour (local time)
  idle_minutes: 1440  # idle timeout (24h)
  notify: true        # notify user on auto-reset
```

Platform-specific overrides can be set under `platforms.<name>.session_reset`.

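
The following sketch resolves the effective reset settings for one platform from a parsed config dict. It assumes overrides merge field by field on top of the global `session_reset` block rather than replacing it wholesale. It also does not model the per-chat-type overrides that `config.get_reset_policy()` supports. `resolve_session_reset` is a hypothetical name, and the defaults are taken from the YAML above.

```python
from typing import Any, Dict

DEFAULT_SESSION_RESET: Dict[str, Any] = {
    "mode": "both", "at_hour": 4, "idle_minutes": 1440, "notify": True,
}


def resolve_session_reset(config: Dict[str, Any], platform: str) -> Dict[str, Any]:
    """Defaults, then top-level session_reset, then platforms.<name>.session_reset."""
    policy = dict(DEFAULT_SESSION_RESET)
    policy.update(config.get("session_reset") or {})
    platform_cfg = (config.get("platforms") or {}).get(platform) or {}
    policy.update(platform_cfg.get("session_reset") or {})
    return policy
```
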