The _watch_update_progress() poll loop never deleted .update_prompt.json
after forwarding the prompt to the user, causing the same prompt to be
re-sent every poll cycle (2s). Two fixes:
1. Delete .update_prompt.json after forwarding — the update process only
polls for .update_response, it doesn't need the prompt file to persist.
2. Guard re-sends with _update_prompt_pending check — belt-and-suspenders
to prevent duplicates even under race conditions.
Add regression test asserting the prompt is sent exactly once.
When a user configures a provider (e.g. `hermes auth add openai-codex`)
but never selects a model via `hermes model`, the gateway and CLI would
pass an empty model string to the API, causing:
'Codex Responses request model must be a non-empty string'
Now both gateway (_resolve_session_agent_runtime) and CLI
(_ensure_runtime_credentials) detect an empty model and fill it from
the provider's first catalog entry in _PROVIDER_MODELS. This covers
all providers that have a static model list (openai-codex, anthropic,
gemini, copilot, etc.).
The fix is conservative: it only triggers when model is truly empty
and a known provider was resolved. Explicit model choices are never
overridden.
When the gateway shuts down gracefully (hermes update, gateway restart,
/restart), it now writes a .clean_shutdown marker file. On the next
startup, if this marker exists, suspend_recently_active() is skipped
and the marker is cleaned up.
Previously, suspend_recently_active() fired on EVERY startup —
including planned restarts from hermes update or hermes gateway restart.
This caused users to lose their conversation history unexpectedly: the
session would be marked as suspended, and the next message would
trigger an auto-reset with a notification the user never asked for.
The original purpose of suspend_recently_active() is crash recovery —
preventing stuck sessions that were mid-processing when the gateway
died unexpectedly. Graceful shutdowns already drain active agents via
_drain_active_agents(), so there is no stuck-session risk. After a
crash (no marker written), suspension still fires as before.
Fixes the scenario where a user asks the agent to run hermes update,
the gateway restarts, and the user's next message gets an unwanted
'Session automatically reset' notification with their history cleared.
Follow-up for cherry-picked PR #8272:
- Add MATRIX_RECOVERY_KEY to module docstring header in matrix.py
- Register in OPTIONAL_ENV_VARS (config.py) with password=True, advanced=True
- Add to _NON_SETUP_ENV_VARS set
- Document cross-signing verification in matrix.md E2EE section
- Update migration guide with recovery key step (step 3)
- Add to environment-variables.md reference
After the PgCryptoStore migration in v0.8.0, the verify_with_recovery_key
call that previously ran after share_keys() was dropped. On any rotation
that uploads fresh device keys (fresh crypto.db, server had stale keys
from a prior install, etc.), the new device keys carry no valid self-
signing signature because the bot has no access to the self-signing
private key.
Peers like Element then refuse to share Megolm sessions with the
rotated device, so the bot silently stops decrypting incoming messages.
This restores the recovery-key bootstrap: on startup, if
MATRIX_RECOVERY_KEY is set, import the cross-signing private keys from
SSSS and sign_own_device(), producing a valid signature server-side.
Idempotent and gated on MATRIX_RECOVERY_KEY — no behavior change for
users who don't configure a recovery key.
Verified end-to-end by deleting crypto.db and restarting: the bot
rotates device identity keys, re-uploads, self-signs via recovery key,
and decrypts+replies to fresh messages from a paired Element client.
After /model switches the model (both picker and text paths), the cached
agent's config signature becomes stale — the agent was updated in-place
via switch_model() but the cache tuple's signature was never refreshed.
The next turn *should* detect the signature mismatch and create a fresh
agent, but this relies on the new model's signature differing from the
old one in _agent_config_signature().
Evicting the cached agent explicitly after storing the session override
is more defensive — the next turn is guaranteed to create a fresh agent
from the override without depending on signature mismatch detection.
Also adds debug logging at three key decision points so we can trace
exactly what happens when /model + /retry interact:
- _resolve_session_agent_runtime: which override path is taken (fast
with api_key vs fallback), or why no override was found
- _run_agent.run_sync: final resolved model/provider before agent
creation
Reported: /model switch to xiaomi/mimo-v2-pro followed by /retry still
used the old model (glm-5.1).
The monitor_for_interrupt() and backup interrupt checks were calling
get_pending_message() which pops the message from the adapter's queue.
This created a race condition: if the agent finished naturally before
checking _interrupt_requested, the pending message was permanently lost.
Timeline of the race:
1. Agent near completion, user sends message
2. Level 1 guard stores message in adapter._pending_messages, sets event
3. monitor_for_interrupt() detects event, POPS message, calls agent.interrupt()
4. Agent's run_conversation() was already returning (interrupted=False)
5. Post-run dequeue finds nothing (monitor already consumed it)
6. result.get('interrupted') is False so interrupt_message fallback doesn't fire
7. User message permanently lost — agent finishes without processing it
Fix: change all three interrupt detection sites (primary monitor + two
backup checks) from get_pending_message() (pop) to
_pending_messages.get() (peek). The message stays in the adapter's queue
until _dequeue_pending_event() consumes it in the post-run handler,
which runs regardless of whether the agent was interrupted or finished
naturally.
Reported by @_SushantSays — intermittent message loss during long
terminal command execution, persisting after the previous fix (73f970fa)
which addressed monitor task death but not this consumption race.
Add content-aware splitting to compact mode: short chat-like exchanges
(2-6 short lines without headings/lists/quotes) get separate message
bubbles for a natural chat feel, while structured content (tables,
headings with body, numbered lists) stays in a single message.
Cherry-picked from PR #7587 by bravohenry, adapted to the compact/legacy
split_per_line architecture from #7903.
When the agent calls process(action='wait') or process(action='poll')
and gets the exited status, the completion_queue notification is
redundant — the agent already has the output from the tool return.
Previously, the drain loops in CLI and gateway would still inject
the [SYSTEM: Background process completed] message, causing the
agent to receive the same information twice.
Fix: track session IDs in _completion_consumed set when wait/poll/log
returns an exited process. Drain loops in cli.py and gateway watcher
skip completion events for consumed sessions. Watch pattern events
are never suppressed (they have independent semantics).
Adds 4 tests covering wait/poll/log marking and running-process
negative case.
Add a 'tip of the day' feature that displays a random one-liner about
Hermes Agent features on every new session — CLI startup, /clear, /new,
and gateway /new across all messaging platforms.
- New hermes_cli/tips.py module with 210 curated tips covering slash
commands, keybindings, CLI flags, config options, tools, gateway
platforms, profiles, sessions, memory, skills, cron, voice, security,
and more
- CLI: tips display in skin-aware dim gold color after the welcome line
- Gateway: tips append to the /new and /reset response on all platforms
- Fully wrapped in try/except — tips are non-critical and never break
startup or reset
Display format (CLI):
✦ Tip: /btw <question> asks a quick side question without tools or history.
Display format (gateway):
✨ Session reset! Starting fresh.
✦ Tip: hermes -c resumes your most recent CLI session.
The interrupt mechanism for regular text messages (non-commands) during
active agent runs relied on a single async polling task
(monitor_for_interrupt) with no error handling. If this task died
silently due to an unhandled exception, stale adapter reference after
reconnect, or any other failure, user messages sent during agent
execution would be queued but never trigger an actual interrupt — the
agent would continue running until it finished naturally, then process
the queued message.
Three improvements:
1. Error handling in monitor_for_interrupt(): wrap the polling body in
try/except so transient errors are logged and retried instead of
silently killing the task.
2. Fresh adapter reference on each poll iteration: re-resolve
self.adapters.get(source.platform) every 200ms instead of capturing
the adapter once at task creation time. This prevents stale
references after adapter reconnects.
3. Backup interrupt check in the inactivity poll loop: both the
unlimited and timeout-enabled paths now check for pending interrupts
every 5 seconds (the existing poll interval). Uses a shared
_interrupt_detected asyncio.Event to avoid double-firing when the
primary monitor already handled the interrupt. Logs at INFO level
with monitor task state for debugging.
On servers with broken or unreachable IPv6, Python's socket.getaddrinfo
returns AAAA records first. urllib/httpx/requests all try IPv6 connections
first and hang for the full TCP timeout before falling back to IPv4. This
affects web_extract, web_search, the OpenAI SDK, and all HTTP tools.
Adds network.force_ipv4 config option (default: false) that monkey-patches
socket.getaddrinfo to resolve as AF_INET when the caller didn't specify a
family. Falls back to full resolution if no A record exists, so pure-IPv6
hosts still work.
Applied early at all three entry points (CLI, gateway, cron scheduler)
before any HTTP clients are created.
Reported by user @29n — Chinese Ubuntu server with unreachable IPv6 causing
timeouts on lobste.rs and other IPv6-enabled sites while Google/GitHub
worked fine (IPv4-only resolution).
The gateway startup path references RedactingFormatter without
importing it, causing a NameError crash when launched with a
verbosity flag (e.g. via launchd --replace).
Fixes#8044
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds an optional focus topic to /compress: `/compress database schema`
guides the summariser to preserve information related to the focus topic
(60-70% of summary budget) while compressing everything else more aggressively.
Inspired by Claude Code's /compact <focus>.
Changes:
- context_compressor.py: focus_topic parameter on _generate_summary() and
compress(); appends FOCUS TOPIC guidance block to the LLM prompt
- run_agent.py: focus_topic parameter on _compress_context(), passed through
to the compressor
- cli.py: _manual_compress() extracts focus topic from command string,
preserves existing manual_compression_feedback integration (no regression)
- gateway/run.py: _handle_compress_command() extracts focus from event args
and passes through — full gateway parity
- commands.py: args_hint="[focus topic]" on /compress CommandDef
Salvaged from PR #7459 (CLI /compress focus only — /context command deferred).
15 new tests across CLI, compressor, and gateway.
Fixes#7952 — Matrix E2EE completely broken after mautrix migration.
- Replace MemoryCryptoStore + pickle/HMAC persistence with mautrix's
PgCryptoStore backed by SQLite via aiosqlite. Crypto state now
persists reliably across restarts without fragile serialization.
- Add handle_sync() call on initial sync response so to-device events
(queued Megolm key shares) are dispatched to OlmMachine instead of
being silently dropped.
- Add _verify_device_keys_on_server() after loading crypto state.
Detects missing keys (re-uploads), stale keys from migration
(attempts re-upload), and corrupted state (refuses E2EE).
- Add _CryptoStateStore adapter wrapping MemoryStateStore to satisfy
mautrix crypto's StateStore interface (is_encrypted,
get_encryption_info, find_shared_rooms).
- Remove redundant share_keys() call from sync loop — OlmMachine
already handles this via DEVICE_OTK_COUNT event handler.
- Fix datetime vs float TypeError in session.py suspend_recently_active()
that crashed gateway startup.
- Add aiosqlite and asyncpg to [matrix] extra in pyproject.toml.
- Update test mocks for PgCryptoStore/Database and add query_keys mock
for key verification. 174 tests pass.
- Add E2EE upgrade/migration docs to Matrix user guide.
* feat: component-separated logging with session context and filtering
Phase 1 — Gateway log isolation:
- gateway.log now only receives records from gateway.* loggers
(platform adapters, session management, slash commands, delivery)
- agent.log remains the catch-all (all components)
- errors.log remains WARNING+ catch-all
- Moved gateway.log handler creation from gateway/run.py into
hermes_logging.setup_logging(mode='gateway') with _ComponentFilter
Phase 2 — Session ID injection:
- Added set_session_context(session_id) / clear_session_context() API
using threading.local() for per-thread session tracking
- _SessionFilter enriches every log record with session_tag attribute
- Log format: '2026-04-11 10:23:45 INFO [session_id] logger.name: msg'
- Session context set at start of run_conversation() in run_agent.py
- Thread-isolated: gateway conversations on different threads don't leak
Phase 3 — Component filtering in hermes logs:
- Added --component flag: hermes logs --component gateway|agent|tools|cli|cron
- COMPONENT_PREFIXES maps component names to logger name prefixes
- Works with all existing filters (--level, --session, --since, -f)
- Logger name extraction handles both old and new log formats
Files changed:
- hermes_logging.py: _SessionFilter, _ComponentFilter, COMPONENT_PREFIXES,
set/clear_session_context(), gateway.log creation in setup_logging()
- gateway/run.py: removed redundant gateway.log handler (now in hermes_logging)
- run_agent.py: set_session_context() at start of run_conversation()
- hermes_cli/logs.py: --component filter, logger name extraction
- hermes_cli/main.py: --component argument on logs subparser
Addresses community request for component-separated, filterable logging.
Zero changes to existing logger names — __name__ already provides hierarchy.
* fix: use LogRecord factory instead of per-handler _SessionFilter
The _SessionFilter approach required attaching a filter to every handler
we create. Any handler created outside our _add_rotating_handler (like
the gateway stderr handler, or third-party handlers) would crash with
KeyError: 'session_tag' if it used our format string.
Replace with logging.setLogRecordFactory() which injects session_tag
into every LogRecord at creation time — process-global, zero per-handler
wiring needed. The factory is installed at import time (before
setup_logging) so session_tag is available from the moment hermes_logging
is imported.
- Idempotent: marker attribute prevents double-wrapping on module reload
- Chains with existing factory: won't break third-party record factories
- Removes _SessionFilter from _add_rotating_handler and setup_verbose_logging
- Adds tests: record factory injection, idempotency, arbitrary handler compat
Add display.platforms section to config.yaml for per-platform overrides of
display settings (tool_progress, show_reasoning, streaming, tool_preview_length).
Each platform gets sensible built-in defaults based on capability tier:
- High (telegram, discord): tool_progress=all, streaming follows global
- Medium (slack, mattermost, matrix, feishu): tool_progress=new
- Low (signal, whatsapp, bluebubbles, wecom, etc.): tool_progress=off, streaming=false
- Minimal (email, sms, webhook, homeassistant): tool_progress=off, streaming=false
Example config:
display:
platforms:
telegram:
tool_progress: all
show_reasoning: true
slack:
tool_progress: off
Resolution order: platform override > global setting > built-in platform default.
Changes:
- New gateway/display_config.py: resolver module with tier-based platform defaults
- gateway/run.py: tool_progress, tool_preview_length, streaming, show_reasoning
all resolve per-platform via the new resolver
- /verbose command: now cycles tool_progress per-platform (saves to
display.platforms.<platform>.tool_progress instead of global)
- /reasoning show|hide: now saves show_reasoning per-platform
- Config version 15 -> 16: migrates tool_progress_overrides into display.platforms
- Backward compat: legacy tool_progress_overrides still read as fallback
- 27 new tests for resolver, normalization, migration, backward compat
- Updated verbose command tests for per-platform behavior
Addresses community request for per-channel verbosity control (Guillaume Meyer,
Nathan Danielsen) — high verbosity on backchannel Telegram, low on customer-facing
Slack, none on email.
Add display.interim_assistant_messages config (enabled by default) that
forwards completed assistant commentary between tool calls to the user
as separate chat messages. Models already emit useful status text like
'I'll inspect the repo first.' — this surfaces it on Telegram, Discord,
and other messaging platforms instead of swallowing it.
Independent from tool_progress and gateway streaming. Disabled for
webhooks. Uses GatewayStreamConsumer when available, falls back to
direct adapter send. Tracks response_previewed to prevent double-delivery
when interim message matches the final response.
Also fixes: cursor not stripped from fallback prefix in stream consumer
(affected continuation calculation on no-edit platforms like Signal).
Cherry-picked from PR #7885 by asheriif, default changed to enabled.
Fixes#5016
Complete the contextvars migration by adding HERMES_SESSION_KEY to the
unified _VAR_MAP in session_context.py. Without this, concurrent gateway
handlers race on os.environ["HERMES_SESSION_KEY"].
- Add _SESSION_KEY ContextVar to _VAR_MAP, set_session_vars(), clear_session_vars()
- Wire session_key through _set_session_env() from SessionContext
- Replace os.getenv fallback in tools/approval.py with get_session_env()
(function-level import to avoid cross-layer coupling)
- Keep os.environ set as CLI/cron fallback
Cherry-picked from PR #7878 by 0xbyt4.
Add a second WeCom integration mode for regular enterprise self-built
applications. Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges. The agent's reply is delivered proactively
via the message/send API.
Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful. The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.
Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health
Salvaged from PR #7774 by @chqchshj. Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.
When 'hermes claw migrate' copies Telegram/Discord/Slack bot tokens from
OpenClaw while the Hermes gateway is already polling with those same tokens,
the platforms conflict (e.g. Telegram 409). Add a pre-flight check that reads
gateway_state.json via get_running_pid() + read_runtime_status(), warns the
user, and lets them cancel or continue.
Also improve the Telegram polling conflict error message to mention OpenClaw
as a common cause and give the 'hermes start' restart command.
Refs #7907
In _run_agent(), the pending message handler references 'event' which
is not defined in that scope — it only exists in the caller. This
causes a NameError when sending the first response before processing a
queued follow-up message.
Replace getattr(event, 'metadata', None) with the established pattern
using source.thread_id, consistent with lines 2625, 2810, 3678, 4410, 4566
in the same file.
When sending multi-chunk responses, individual chunks can fail due to
transient iLink API errors. Previously a single failure would abort the
entire message. Now each chunk is retried with linear backoff before
giving up, and the same client_id is reused across retries for
server-side deduplication.
Configurable via config.yaml (platforms.weixin.extra) or env vars:
- send_chunk_delay_seconds (default 0.35s) — pacing between chunks
- send_chunk_retries (default 2) — max retry attempts per chunk
- send_chunk_retry_delay_seconds (default 1.0s) — base retry delay
Replaces the hardcoded 0.3s inter-chunk delay from #7903.
Salvaged from PR #7899 by @corazzione. Fixes#7836.
WeCom AI Bot sends file attachments with msgtype="appmsg", not
msgtype="file". Previously only file content was discarded while
the text title reached the agent.
Changes:
- _extract_text(): Extract appmsg title (filename) for display
- _extract_media(): Handle appmsg type with file/image content
Fixes#7750
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from PR #7747 with follow-up fixes:
- Narrowed suspend_all_active() to suspend_recently_active() — only
suspends sessions updated within the last 2 minutes (likely in-flight),
not all sessions which would unnecessarily reset idle users
- /stop with no running agent no longer suspends the session; only
actual force-stops mark the session for reset
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it
Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.
Fix:
1. Propagate user_id/user_name through the full watcher chain:
session_context.py → gateway/run.py → terminal_tool.py →
process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message() — silently drop
non-internal messages with no user identity instead of triggering
the pairing flow.
Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).
Closes#6341, #6485, #7643
Relates to #6516, #7392
The Weixin adapter was splitting responses at every top-level newline,
causing notification spam (up to 70 API calls for a single long markdown
response). This salvages the best aspects of six contributor PRs:
Compact mode (new default):
- Messages under the 4000-char limit stay as a single bubble even with
multiple lines, paragraphs, and code blocks
- Only oversized messages get split at logical markdown boundaries
- Inter-chunk delay (0.3s) between chunks prevents WeChat rate-limit drops
Legacy mode (opt-in):
- Set split_multiline_messages: true in platforms.weixin.extra config
- Or set WEIXIN_SPLIT_MULTILINE_MESSAGES=true env var
- Restores the old per-line splitting behavior
Salvaged from PRs #7797 (guantoubaozi), #7792 (luoxiao6645),
#7838 (qyx596), #7825 (weedge), #7784 (sherunlock03), #7773 (JnyRoad).
Core fix unanimous across all six; config toggle from #7838; inter-chunk
delay from #7825.
Matrix gateway: fix sync loop never dispatching events (#5819)
- _sync_loop() called client.sync() but never called handle_sync()
to dispatch events to registered callbacks — _on_room_message was
registered but never fired for new messages
- Store next_batch token from initial sync and pass as since= to
subsequent incremental syncs (was doing full initial sync every time)
- 17 comments, confirmed by multiple users on matrix.org
Feishu docs: add interactive card configuration for approvals (#6893)
- Error 200340 is a Feishu Developer Console configuration issue,
not a code bug — users need to enable Interactive Card capability
and configure Card Request URL
- Added required 3-step setup instructions to feishu.md
- Added troubleshooting entry for error 200340
- 17 comments from Feishu users
Copilot provider drift: detect GPT-5.x Responses API requirement (#3388)
- GPT-5.x models are rejected on /v1/chat/completions by both OpenAI
and OpenRouter (unsupported_api_for_model error)
- Added _model_requires_responses_api() to detect models needing
Responses API regardless of provider
- Applied in __init__ (covers OpenRouter primary users) and in
_try_activate_fallback() (covers Copilot->OpenRouter drift)
- Fixed stale comment claiming gateway creates fresh agents per message
(it caches them via _agent_cache since the caching was added)
- 7 comments, reported on Copilot+Telegram gateway
* fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21
MemoryCryptoStore.__init__() now requires account_id and pickle_key
positional arguments as of mautrix 0.21. The migration from matrix-nio
(commit 1850747) didn't account for this, causing E2EE initialization
to fail with:
MemoryCryptoStore.__init__() missing 2 required positional arguments:
'account_id' and 'pickle_key'
Pass self._user_id as account_id and derive pickle_key from the same
user_id:device_id pair already used for the on-disk HMAC signature.
Update the test stub to accept the new parameters.
Fixes#7803
* fix: use consistent fallback for pickle_key derivation
Address review: _pickle_key now uses _acct_id (which has the 'hermes'
fallback) instead of raw self._user_id, so both values stay consistent
when user_id is empty.
---------
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Telegram flood control during streaming caused messages to be cut off
mid-response. The old behavior permanently disabled edits after a single
flood-control failure, losing the remainder of the response.
Changes:
- Adaptive backoff: on flood-control edit failures, double the edit interval
instead of immediately disabling edits. Only permanently disable after 3
consecutive failures (_MAX_FLOOD_STRIKES).
- Cursor strip: when entering fallback mode, best-effort edit to remove the
cursor (▉) from the last visible message so it doesn't appear stuck.
- Fallback send retry: _send_fallback_final retries each chunk once on
flood-control failures (3s delay) before giving up.
- Default edit_interval increased from 0.3s to 1.0s. Telegram rate-limits
edits at ~1/s per message; 0.3s was virtually guaranteed to trigger flood
control on any non-trivial response.
- _send_or_edit returns bool so the overflow split loop knows not to
truncate accumulated text when an edit fails (prevents content loss).
Fixes: messages cutting/stopping mid-response on Telegram, especially
with streaming enabled.
* feat: add watch_patterns to background processes for output monitoring
Adds a new 'watch_patterns' parameter to terminal(background=true) that
lets the agent specify strings to watch for in process output. When a
matching line appears, a notification is queued and injected as a
synthetic message — triggering a new agent turn, similar to
notify_on_complete but mid-process.
Implementation:
- ProcessSession gets watch_patterns field + rate-limit state
- _check_watch_patterns() in ProcessRegistry scans new output chunks
from all three reader threads (local, PTY, env-poller)
- Rate limited: max 8 notifications per 10s window
- Sustained overload (45s) permanently disables watching for that process
- watch_queue alongside completion_queue, same consumption pattern
- CLI drains watch_queue in both idle loop and post-turn drain
- Gateway drains after agent runs via _inject_watch_notification()
- Checkpoint persistence + crash recovery includes watch_patterns
- Blocked in execute_code sandbox (like other bg params)
- 20 new tests covering matching, rate limiting, overload kill,
checkpoint persistence, schema, and handler passthrough
Usage:
terminal(
command='npm run dev',
background=true,
watch_patterns=['ERROR', 'WARN', 'listening on port']
)
* refactor: merge watch_queue into completion_queue
Unified queue with 'type' field distinguishing 'completion',
'watch_match', and 'watch_disabled' events. Extracted
_format_process_notification() in CLI and gateway to handle
all event types in a single drain loop. Removes duplication
across both CLI drain sites and the gateway.
When the stream consumer has sent at least one message (already_sent=True),
the gateway skips sending the final response to avoid duplicates. But this
also suppressed error messages when the agent failed mid-loop — rate limit
exhaustion, context overflow, compression failure, etc.
The user would see the last streamed content and then nothing: no error
message, no explanation. The agent appeared to 'stop responding.'
Fix: check the 'failed' flag at both the producer (_run_agent marks
already_sent) and consumer (_handle_message_with_agent checks it) sites.
Error messages are always delivered regardless of streaming state.
- Add agent.close() call to _finalize_shutdown_agents() to prevent
zombie processes (terminal sandboxes, browser daemons, httpx clients)
- Global cleanup (process_registry, environments, browsers) preserved
in _stop_impl() during conflict resolution
- Move /restart CommandDef from 'Info' to 'Session' category to match
/stop and /status