* fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction
The per-session AIAgent cache was unbounded. Each cached AIAgent holds
LLM clients, tool schemas, memory providers, and a conversation buffer.
In a long-lived gateway serving many chats/threads, cached agents
accumulated indefinitely — entries were only evicted on /new, /model,
or session reset.
Changes:
- Cache is now an OrderedDict so we can pop least-recently-used entries.
- _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64
when a new agent is inserted. LRU order is refreshed via move_to_end()
on cache hits.
- _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle
longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing
_session_expiry_watcher so no new background task is created.
- The expiry watcher now also pops the cache entry after calling
_cleanup_agent_resources on a flushed session — previously the agent
was shut down but its reference stayed in the cache dict.
- Evicted agents have _cleanup_agent_resources() called on a daemon
thread so the cache lock isn't held during slow teardown.
Both tuning constants live at module scope so tests can monkeypatch
them without touching class state.
Tests: 7 new cases in test_agent_cache.py covering LRU eviction,
move_to_end refresh, cleanup thread dispatch, idle TTL sweep,
defensive handling of agents without _last_activity_ts, and plain-dict
test fixture tolerance.
* tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128
* fix(gateway): never evict mid-turn agents; live spillover tests
The prior commit could tear down an active agent if its session_key
happened to be LRU when the cap was exceeded. AIAgent.close() kills
process_registry entries for the task, tears down the terminal
sandbox, closes the OpenAI client (sets self.client = None), and
cascades .close() into any active child subagents — all fatal if
the agent is still processing a turn.
Changes:
- _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at
GatewayRunner._running_agents and skip any entry whose AIAgent
instance is present (identity via id(), so MagicMock doesn't
confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated
as 'not active' since no real agent exists yet.
- Eviction only considers the LRU-excess window (first size-cap
entries). If an excess slot is held by a mid-turn agent, we skip
it WITHOUT compensating by evicting a newer entry. A freshly
inserted session (zero cache history) shouldn't be punished to
protect a long-lived one that happens to be busy.
- Cache may therefore stay transiently over cap when load spikes;
a WARNING is logged so operators can see it, and the next insert
re-runs the check after some turns have finished.
New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive):
- Active LRU entry is skipped; no newer entry compensated
- Mixed active/idle excess window: only idle slots go
- All-active cache: no eviction, WARNING logged, all clients intact
- _AGENT_PENDING_SENTINEL doesn't block other evictions
- Idle-TTL sweep skips active agents
- End-to-end: active agent's .client survives eviction attempt
- Live fill-to-cap with real AIAgents, then spillover
- Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown
- Live: 8 threads racing 160 inserts into CAP=16 — settles at 16
- Live: evicted session's next turn gets a fresh agent that works
30 tests pass (13 pre-existing + 17 new). Related gateway suites
(model switch, session reset, proxy, etc.) all green.
* fix(gateway): cache eviction preserves per-task state for session resume
The prior commits called AIAgent.close() on cache-evicted agents, which
tears down process_registry entries, terminal sandbox, and browser
daemon for that task_id — permanently. Fine for session-expiry (session
ended), wrong for cache eviction (session may resume).
Real-world scenario: a user leaves a Telegram session open for 2+ hours,
idle TTL evicts the cached AIAgent, user returns and sends a message.
Conversation history is preserved via SessionStore, but their terminal
sandbox (cwd, env vars, bg shells) and browser state were destroyed.
Fix: split the two cleanup modes.
close() Full teardown — session ended. Kills bg procs,
tears down terminal sandbox + browser daemon,
closes LLM client. Used by session-expiry,
/new, /reset (unchanged).
release_clients() Soft cleanup — session may resume. Closes
LLM client only. Leaves process_registry,
terminal sandbox, browser daemon intact
for the resuming agent to inherit via
shared task_id.
Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents)
now dispatches _release_evicted_agent_soft on the daemon thread instead
of _cleanup_agent_resources. All session-expiry call sites of
_cleanup_agent_resources are unchanged.
Tests (TestAgentCacheIdleResume, 5 new cases):
- release_clients does NOT call process_registry.kill_all
- release_clients does NOT call cleanup_vm / cleanup_browser
- release_clients DOES close the LLM client (agent.client is None after)
- close() vs release_clients() — semantic contract pinned
- Idle-evicted session's rebuild with same session_id gets same task_id
Updated test_cap_triggers_cleanup_thread to assert the soft path fires
and the hard path does NOT.
35 tests pass in test_agent_cache.py; 67 related tests green.
Adds a new DISCORD_ALLOWED_ROLES environment variable that allows filtering
bot interactions by Discord role ID. Uses OR semantics with the existing
DISCORD_ALLOWED_USERS - if a user matches either allowlist, they're permitted.
Changes:
- Parse DISCORD_ALLOWED_ROLES comma-separated role IDs on connect
- Enable members intent when roles are configured (needed for role lookup)
- Update _is_allowed_user() to accept optional author param for direct role check
- Fallback to scanning mutual guilds when author object lacks roles (DMs, voice)
- Fully backwards compatible: no behavior change when env var is unset
Fixes#4466.
Root cause: two sequential authorization gates both independently rejected
bot messages, making DISCORD_ALLOW_BOTS completely ineffective.
Gate 1 — `discord.py` `on_message`:
_is_allowed_user ran BEFORE the bot filter, so bot senders were dropped
before the DISCORD_ALLOW_BOTS policy was ever evaluated.
Gate 2 — `gateway/run.py` _is_user_authorized:
The gateway-level allowlist check rejected bot IDs with 'Unauthorized
user: <bot_id>' even if they passed Gate 1.
Fix:
gateway/platforms/discord.py — reorder on_message so DISCORD_ALLOW_BOTS
runs BEFORE _is_allowed_user. Bots permitted by the filter skip the
user allowlist; non-bots are still checked.
gateway/session.py — add is_bot: bool = False to SessionSource so the
gateway layer can distinguish bot senders.
gateway/platforms/base.py — expose is_bot parameter in build_source.
gateway/platforms/discord.py _handle_message — set is_bot=True when
building the SessionSource for bot authors.
gateway/run.py _is_user_authorized — when source.is_bot is True AND
DISCORD_ALLOW_BOTS is 'mentions' or 'all', return True early. Platform
filter already validated the message at on_message; don't re-reject.
Behavior matrix:
| Config | Before | After |
| DISCORD_ALLOW_BOTS=none (default) | Blocked | Blocked |
| DISCORD_ALLOW_BOTS=all | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions + @mention | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions, no mention | Blocked | Blocked |
| Human in DISCORD_ALLOWED_USERS | Allowed | Allowed |
| Human NOT in DISCORD_ALLOWED_USERS | Blocked | Blocked |
Co-authored-by: Hermes Maintainer <hermes@nousresearch.com>
The Weixin adapter's send() method previously split and delivered the
raw response text without first extracting MEDIA: tags or bare local
file paths. This meant images, documents, and voice files referenced
by the agent were silently dropped in normal (non-streaming,
non-background) conversations.
Changes:
- In WeixinAdapter.send(), call extract_media() and
extract_local_files() before formatting/splitting text.
- Deliver extracted files via send_image_file(), send_document(),
send_voice(), or send_video() prior to sending text chunks.
- Also fix two minor typing issues in gateway/run.py where
extract_media() tuples were not unpacked correctly in background
and /btw task handlers.
Fixes missing media delivery on Weixin personal accounts.
* - make buffered streaming
- fix path naming to expand `~` for agent.
- fix stripping of matrix ID to not remove other mentions / localports.
* fix(matrix): register MembershipEventDispatcher for invite auto-join
The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE
events are only dispatched when MembershipEventDispatcher is registered on the
client. Without it, _on_invite is dead code and the bot silently ignores all
room invites.
Closes#10094Closes#10725
Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz)
* fix(matrix): preserve _joined_rooms reference for CryptoStateStore
connect() reassigned self._joined_rooms = set(...) after initial sync,
orphaning the reference captured by _CryptoStateStore at init time.
find_shared_rooms() returned [] forever, breaking Megolm session rotation
on membership changes.
Mutate in place with clear() + update() so the CryptoStateStore reference
stays valid.
Refs #8174, PR #8215
* fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race
mautrix auto-registers DecryptionDispatcher when client.crypto is set.
The adapter also registered _on_encrypted_event for the same event type.
_on_encrypted_event had zero awaits and won the race to mark event IDs
in the dedup set, causing _on_room_message to drop successfully decrypted
events from DecryptionDispatcher. The retry loop masked this by re-decrypting
every message ~4 seconds later.
Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption;
genuinely undecryptable events are logged by mautrix and retried on next
key exchange.
Refs #8174, PR #8215
* fix(matrix): re-verify device keys after share_keys() upload
Matrix homeservers treat ed25519 identity keys as immutable per device.
share_keys() can return 200 but silently ignore new keys if the device
already exists with different identity keys. The bot would proceed with
shared=True while peers encrypt to the old (unreachable) keys.
Now re-queries the server after share_keys() and fails closed if keys
don't match, with an actionable error message.
Refs #8174, PR #8215
* fix(matrix): encrypt outbound attachments in E2EE rooms
_upload_and_send() uploaded raw bytes and used the 'url' key for all
rooms. In E2EE rooms, media must be encrypted client-side with
encrypt_attachment(), the ciphertext uploaded, and the 'file' key
(with key/iv/hashes) used instead of 'url'.
Now detects encrypted rooms via state_store.is_encrypted() and
branches to the encrypted upload path.
Refs: PR #9822 (charles-brooks)
* fix(matrix): add stop_typing to clear typing indicator after response
The adapter set a 30-second typing timeout but never cleared it.
The base class stop_typing() is a no-op, so the typing indicator
lingered for up to 30 seconds after each response.
Closes#6016
Refs: PR #6020 (r266-tech)
* fix(matrix): cache all media types locally, not just photos/voice
should_cache_locally only covered PHOTO, VOICE, and encrypted media.
Unencrypted audio/video/documents in plaintext rooms were passed as MXC
URLs that require authentication the agent doesn't have, resulting
in 401 errors.
Refs #3487, #3806
* fix(matrix): detect stale OTK conflict on startup and fail closed
When crypto state is wiped but the same device ID is reused, the
homeserver may still hold one-time keys signed with the previous
identity key. Identity key re-upload succeeds but OTK uploads fail
with "already exists" and a signature mismatch. Peers cannot
establish new Olm sessions, so all new messages are undecryptable.
Now proactively flushes OTKs via share_keys() during connect() and
catches the "already exists" error with an actionable log message
telling the operator to purge the device from the homeserver or
generate a fresh device ID.
Also documents the crypto store recovery procedure in the Matrix
setup guide.
Refs #8174
* docs(matrix): improve crypto recovery docs per review
- Put easy path (fresh access token) first, manual purge second
- URL-encode user ID in Synapse admin API example
- Note that device deletion may invalidate the access token
- Add "stop Synapse first" caveat for direct SQLite approach
- Mention the fail-closed startup detection behavior
- Add back-reference from upgrade section to OTK warning
* refactor(matrix): cleanup from code review
- Extract _extract_server_ed25519() and _reverify_keys_after_upload()
to deduplicate the re-verification block (was copy-pasted in two
places, three copies of ed25519 key extraction total)
- Remove dead code: _pending_megolm, _retry_pending_decryptions,
_MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after
removing _on_encrypted_event
- Remove tautological TestMediaCacheGate (tested its own predicate,
not production code)
- Remove dead TestMatrixMegolmEventHandling and
TestMatrixRetryPendingDecryptions (tested removed methods)
- Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator
- Trim comment to just the "why"
Users (Teknium) report missing debug reports before the 1-hour auto-delete
fires. 6 hours gives enough window for async bug-report triage without
leaving sensitive log data on public paste services indefinitely.
Applies to both the CLI (hermes debug share) and gateway (/debug) paths.
Initialize next_channel_prompt before the pending_event check and use
getattr with None default, matching the existing pattern for
next_source/next_message/next_message_id. Prevents AttributeError
when pending_event is None (interrupt path).
Cherry-picked from #10953 by @jackjin1997.
Switch from fragile Markdown V1 to HTML parse mode with html.escape()
for exec approval messages. Add fallback to text-based approval when
the formatted send fails.
Cherry-picked from #10999 by @danieldoderlein.
config.yaml terminal.cwd is now the single source of truth for working
directory. MESSAGING_CWD and TERMINAL_CWD in .env are deprecated with a
migration warning.
Changes:
1. config.py: Remove MESSAGING_CWD from OPTIONAL_ENV_VARS (setup wizard
no longer prompts for it). Add warn_deprecated_cwd_env_vars() that
prints a migration hint when deprecated env vars are detected.
2. gateway/run.py: Replace all MESSAGING_CWD reads with TERMINAL_CWD
(which is bridged from config.yaml terminal.cwd). MESSAGING_CWD is
still accepted as a backward-compat fallback with deprecation warning.
Config bridge skips cwd placeholder values so they don't clobber
the resolved TERMINAL_CWD.
3. cli.py: Guard against lazy-import clobbering — when cli.py is
imported lazily during gateway runtime (via delegate_tool), don't
let load_cli_config() overwrite an already-resolved TERMINAL_CWD
with os.getcwd() of the service's working directory. (#10817)
4. hermes_cli/main.py: Add 'hermes memory reset' command with
--target all/memory/user and --yes flags. Profile-scoped via
HERMES_HOME.
Migration path for users with .env settings:
Remove MESSAGING_CWD / TERMINAL_CWD from .env
Add to config.yaml:
terminal:
cwd: /your/project/path
Addresses: #10225, #4672, #10817, #7663
When the LLM returns an empty completion, gateway/run.py replaced
final_response with the literal string '(No response generated)'.
This defeated cron/scheduler.py's empty-response skip guard, causing
the placeholder to be delivered to home channels.
Changes:
- gateway/run.py: return empty string instead of placeholder when
there is no error and no response content
- cron/scheduler.py: defensively strip the placeholder text in case
any upstream path still produces it
FixesNousResearch/hermes-agent#9270
All 10 call sites in gateway/run.py and gateway/platforms/api_server.py
are inside async functions where a loop is guaranteed to be running.
get_event_loop() is deprecated since Python 3.10 — it can silently
create a new loop when none is running, masking bugs.
get_running_loop() raises RuntimeError instead, which is safer.
Surfaced during review of PRs #10533 and #10647.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Three targeted fixes for the 'agent stuck on terminal command' report:
1. **Concurrent tool wait loop now checks interrupts** (run_agent.py)
The sequential path checked _interrupt_requested before each tool call,
but the concurrent path's wait loop just blocked with 30s timeouts.
Now polls every 5s and cancels pending futures on interrupt, giving
already-running tools 3s to notice the per-thread interrupt signal.
2. **Cancelled concurrent tools get proper interrupt messages** (run_agent.py)
When a concurrent tool is cancelled or didn't return a result due to
interrupt, the tool result message says 'skipped due to user interrupt'
instead of a generic error.
3. **Typing indicator fires before follow-up turn** (gateway/run.py)
After an interrupt is acknowledged and the pending message dequeued,
the gateway now sends a typing indicator before starting the recursive
_run_agent call. This gives the user immediate visual feedback that
the system is processing their new message (closing the perceived
'dead air' gap between the interrupt ack and the response).
Reported by @_SushantSays.
Fixes 12 CI test failures:
1. test_cli_new_session (4): _FakeAgent missing commit_memory_session
attribute added in the memory provider refactoring. Added MagicMock.
2. test_run_progress_topics (1): already_sent detection only checked
stream consumer flags, missing the response_previewed path from
interim_assistant_callback. Restructured guard to check both paths.
3. test_timezone (1): HERMES_TIMEZONE leaked into child processes via
_SAFE_ENV_PREFIXES matching HERMES_*. The code correctly converts
it to TZ but didn't remove the original. Added child_env.pop().
4. test_session_env (1): contextvars baseline captured from a different
context couldn't be restored after clear. Changed assertion to verify
the test's value was removed rather than comparing to a fragile baseline.
5. test_discord_slash_commands (5): already fixed on current main.
Gateway executor work now inherits the active session contextvars via
copy_context() so background process watchers retain the correct
platform/chat/user/session metadata for routing completion events back
to the originating chat.
Cherry-picked from #10647 by @helix4u with:
- Use asyncio.get_running_loop() instead of deprecated get_event_loop()
- Strip trailing whitespace
- Add *args forwarding test
- Add exception propagation test
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.
When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'
Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.
The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.
Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.
Based on PR #10484 by Xowiek.
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.
Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.
The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes#10541
- Remove double str() normalization in _resolve_channel_prompt since
config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
_parse_session_key() blindly assigned parts[5] as thread_id for all
chat types. For group sessions with per-user isolation, parts[5] is
a user_id, not a thread_id. This could cause shutdown notifications
to route with incorrect thread metadata.
Only return thread_id for chat types where the 6th element is
unambiguous: dm and thread. For group/channel sessions, omit
thread_id since the suffix may be a user_id.
Based on the approach from PR #9938 by @Ruzzgar.
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery. The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.
Two changes:
1. gateway/run.py return path: skip already_sent suppression when the
final_response is "(empty)" or empty — the user needs to know the
agent finished even if streaming sent partial content earlier.
2. gateway/run.py response handler: convert the internal "(empty)"
sentinel to a user-friendly warning instead of delivering the raw
sentinel string.
Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
notices, and the delete subcommand
Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
Two gateway fixes:
1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)
Previously, is_duplicate() returned True for any previously seen ID
without checking its age — expired entries were only purged when cache
size exceeded max_size. On normal workloads that never overflow, message
IDs stayed deduplicated forever instead of expiring after the TTL.
Fix: check `now - timestamp < ttl` before returning True. Expired
entries are removed and treated as new messages.
2. Gateway --config flag now uses yaml.safe_load() (#10216)
The --config CLI flag in gateway/run.py main() used json.load() to
parse config files. YAML is the only documented config format and
every other config loader uses yaml.safe_load(). A YAML config file
passed via --config would crash with json.JSONDecodeError.
Closes#10306Closes#10216
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
- Populate watcher_* routing fields for watch-only processes (not just
notify_on_complete), so watch-pattern events carry direct metadata
instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
(empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
Three fixes for the duplicate reply bug affecting all gateway platforms:
1. base.py: Suppress stale response when the session was interrupted by a
new message that hasn't been consumed yet. Checks both interrupt_event
and _pending_messages to avoid false positives. (#8221, #2483)
2. run.py (return path): Remove response_previewed guard from already_sent
check. Stream consumer's already_sent alone is authoritative — if
content was delivered via streaming, the duplicate send must be
suppressed regardless of the agent's response_previewed flag. (#8375)
3. run.py (queued-message path): Same fix — already_sent without
response_previewed now correctly marks the first response as already
streamed, preventing re-send before processing the queued message.
The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.
Concepts from PR #8380 (konsisumer). Closes#8375, #8221, #2483.
Three independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix (uv doesn't set up traditional
venv activation). The generated ExecStart pointed to a Python
binary without site-packages, crashing the service on startup.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models (GLM-5, mimo-v2-pro) sometimes return empty
responses after tool calls instead of continuing to the next
step. The agent silently abandoned the remaining work with
'(empty)' or used prior-turn fallback text.
Fix: when the model returns empty after tool calls AND there's
no prior-turn content to fall back on, inject a one-time user
nudge message telling the model to process the tool results and
continue. The flag resets after each successful tool round so it
can fire again on later rounds.
Test plan: 97 gateway + CLI tests pass, 9 venv detection tests pass
When a user sends a message while the agent is executing a task on the
gateway, the agent is now interrupted immediately — not silently queued.
Previously, messages were stored in _pending_messages with zero feedback
to the user, potentially leaving them waiting 1+ hours.
Root cause: Level 1 guard (base.py) intercepted all messages for active
sessions and returned with no response. Level 2 (gateway/run.py) which
calls agent.interrupt() was never reached.
Fix: Expand _handle_active_session_busy_message to handle the normal
(non-draining) case:
1. Call running_agent.interrupt(text) to abort in-flight tool calls
and exit the agent loop at the next check point
2. Store the message as pending so it becomes the next turn once the
interrupted run returns
3. Send a brief ack: 'Interrupting current task (10 min elapsed,
iteration 21/60, running: terminal). I'll respond shortly.'
4. Debounce acks to once per 30s to avoid spam on rapid messages
Reported by @Lonely__MH.
When compression fails after max attempts, the agent returns
{completed: False, partial: True} but was missing the 'failed' flag.
The gateway's agent_failed_early guard checked for 'failed' AND
'not final_response', but _run_agent_blocking always converts errors
to final_response — making the guard dead code. This caused the
oversized session to persist, creating an infinite fail loop where
every subsequent message hits the same compression failure.
Changes:
- run_agent.py: add 'failed: True' and 'compression_exhausted: True'
to all 5 compression-exhaustion return paths
- gateway/run.py (_run_agent_blocking): forward 'failed' and
'compression_exhausted' flags through to the caller
- gateway/run.py (_handle_message_with_agent): fix agent_failed_early
to check bool(failed) without the broken 'not final_response' clause;
auto-reset the session when compression is exhausted so the next
message starts fresh
- Update tests to match new guard logic and add
TestCompressionExhaustedFlag test class
Closes#9893
* fix: hermes gateway restart waits for service to come back up (#8260)
Previously, systemd_restart() sent SIGUSR1 to the gateway, printed
'restart requested', and returned immediately. The gateway still
needed to drain active agents, exit with code 75, wait for systemd's
RestartSec=30, and start the new process. The user saw 'success' but
the gateway was actually down for 30-60 seconds.
Now the SIGUSR1 path blocks with progress feedback:
Phase 1 — wait for old process to die:
⏳ User service draining active work...
Polls os.kill(pid, 0) until ProcessLookupError (up to 90s)
Phase 2 — wait for new process to become active:
⏳ Waiting for hermes-gateway to restart...
Polls systemctl is-active + verifies new PID (up to 60s)
Success:
✓ User service restarted (PID 12345)
Timeout:
⚠ User service did not become active within 60s.
Check status: hermes gateway status
Check logs: journalctl --user -u hermes-gateway --since '2 min ago'
The reload-or-restart fallback path (line 1189) already blocks because
systemctl reload-or-restart is synchronous.
Test plan:
- Updated test to verify wait-for-restart behavior
- All 118 gateway CLI tests pass
* fix: add 402 billing error hint to gateway error handler (#5220)
The gateway's exception handler for agent errors had specific hints for
HTTP 401, 429, 529, 400, 500 — but not 402 (Payment Required / quota
exhausted). Users hitting billing limits from custom proxy providers
got a generic error with no guidance.
Added: 'Your API balance or quota is exhausted. Check your provider
dashboard.'
The underlying billing classification (error_classifier.py) already
correctly handles 402 as FailoverReason.billing with credential
rotation and fallback. The original issue (#5220) where 402 killed
the entire gateway was from an older version — on current main, 402
is excluded from the is_client_error abort path (line 9460) and goes
through the proper retry/fallback/fail flow. Combined with PR #9875
(auto-recover from unexpected SIGTERM), even edge cases where the
gateway dies are now survivable.
When a session gets stuck (hung terminal, runaway tool loop) and the
user restarts the gateway, the same session history loads and puts the
agent right back in the stuck state. The user is trapped in a loop:
restart → stuck → restart → stuck.
Fix: track restart-failure counts per session using a simple JSON file
(.restart_failure_counts). On each shutdown with active agents, the
counter increments for those sessions. On startup, if any session has
been active across 3+ consecutive restarts, it's auto-suspended —
giving the user a clean slate on their next message.
The counter resets to 0 when a session completes a turn successfully
(response delivered), so normal sessions that happen to be active
during planned restarts (/restart, hermes update) won't accumulate
false counts.
Implementation:
- _increment_restart_failure_counts(): called during stop() when
agents are active. Writes {session_key: count} to JSON file.
Sessions NOT active are dropped (loop broken).
- _suspend_stuck_loop_sessions(): called on startup. Reads the file,
suspends sessions at threshold (3), clears the file.
- _clear_restart_failure_count(): called after successful response
delivery. Removes the session from the counter file.
No SessionEntry schema changes. No database migration. Pure file-based
tracking that naturally cleans up.
Test plan:
- 9 new stuck-loop tests (increment, accumulate, threshold, clear,
suspend, file cleanup, edge cases)
- All 28 gateway lifecycle tests pass (restart drain + auto-continue
+ stuck loop)