Combines detection from both PRs into _detect_openclaw_processes():
- Cross-platform process scan (pgrep/tasklist/PowerShell) from PR #8102
- systemd service check from PR #8555
- Returns list[str] with details about what's found
Fixes in cleanup warning (from PR #8555):
- print_warning -> print_error/print_info (print_warning not in import chain)
- Added isatty() guard for non-interactive sessions
- Removed duplicate _check_openclaw_running() in favor of shared function
Updated all tests to match new API.
- Use PowerShell to inspect node.exe command lines on Windows,
since tasklist output does not include them.
- Also check for dedicated openclaw.exe/clawd.exe processes.
- Skip the interactive prompt in non-interactive sessions so the
preview-only behavior is preserved.
- Update tests accordingly.
Relates to #7907
Add _is_openclaw_running() and _warn_if_openclaw_running() to detect
OpenClaw processes (via pgrep/tasklist) before hermes claw migrate.
Warns the user that messaging platforms only allow one active session
per bot token, and lets them cancel or continue.
Fixes#7907
Add a CI-built skills index served from the docs site. The index is
crawled daily by GitHub Actions, resolves all GitHub paths upfront, and
is cached locally by the client. When the index is available:
- Search uses the cached index (0 GitHub API calls, was 23+)
- Install uses resolved paths from index (6 API calls for file
downloads only, was 31-45 for discovery + downloads)
Total: 68 → 6 GitHub API calls for a typical search + install flow.
Unauthenticated users (60 req/hr) can now search and install without
hitting rate limits.
Components:
- scripts/build_skills_index.py: Crawl all sources (skills.sh, GitHub
taps, official, clawhub, lobehub), batch-resolve GitHub paths via
tree API, output JSON index
- tools/skills_hub.py: HermesIndexSource class — search/fetch/inspect
backed by the index, with lazy GitHubSource for file downloads
- parallel_search_sources() skips external API sources when index is
available (0 GitHub calls for search)
- .github/workflows/skills-index.yml: twice-daily CI build + deploy
- .github/workflows/deploy-site.yml: also builds index during docs deploy
Graceful degradation: when the index is unavailable (first run, network
down, stale), all methods return empty/None and downstream sources
handle the request via direct API as before.
Skills.sh installs hit the GitHub API 45 times per install because the
same repo tree was fetched 6 times redundantly. Combined with search
(23 API calls), this totals 68 — exceeding the unauthenticated rate
limit of 60 req/hr, causing 'Could not fetch' errors for users without
a GITHUB_TOKEN.
Changes:
- Add _get_repo_tree() cache to GitHubSource — repo info + recursive
tree fetched once per repo per source instance, eliminating 10
redundant API calls (6 tree + 4 candidate 404s)
- _download_directory_via_tree returns {} (not None) when cached tree
shows path doesn't exist, skipping unnecessary Contents API fallback
- _check_rate_limit_response() detects exhausted quota and sets
is_rate_limited flag
- do_install() shows actionable hint when rate limited: set
GITHUB_TOKEN or install gh CLI
Before: 45 API calls per install (68 total with search)
After: 31 API calls per install (54 total with search — under 60/hr)
Reported by community user from Vietnam (no GitHub auth configured).
Centralize container detection in hermes_constants.is_container() with
process-lifetime caching, matching existing is_wsl()/is_termux() patterns.
Dedup _is_inside_container() in config.py to delegate to the new function.
Add _run_systemctl() wrapper that converts FileNotFoundError to RuntimeError
for defense-in-depth — all 10 bare subprocess.run(_systemctl_cmd(...)) call
sites now route through it.
Make supports_systemd_services() return False in containers and when
systemctl binary is absent (shutil.which check).
Add Docker-specific guidance in gateway_command() for install/uninstall/start
subcommands — exit 0 with helpful instructions instead of crashing.
Make 'hermes status' show 'Manager: docker (foreground)' and 'hermes dump'
show 'running (docker, pid N)' inside containers.
Fix setup_gateway() to use supports_systemd instead of _is_linux for all
systemd-related branches, and show Docker restart policy instructions in
containers.
Replace inline /.dockerenv check in voice_mode.py with is_container().
Fixes#7420
Co-authored-by: teknium1 <teknium1@users.noreply.github.com>
The backup validation checked for 'hermes_state.db' and 'memory_store.db'
as telltale markers of a valid Hermes backup zip. Neither name exists in a
real Hermes installation — the actual database file is 'state.db'
(hermes_state.py: DEFAULT_DB_PATH = get_hermes_home() / 'state.db').
A fresh Hermes installation produces:
~/.hermes/state.db (actual name)
~/.hermes/config.yaml
~/.hermes/.env
Because the marker set never matched 'state.db', a backup zip containing
only 'state.db' plus 'config.yaml' would fail validation with:
'zip does not appear to be a Hermes backup'
and the import would exit with sys.exit(1), silently rejecting a valid backup.
Fix: replace the wrong marker names with the correct filename.
Adds TestValidateBackupZip with three cases:
- state.db is accepted as a valid marker
- old wrong names (hermes_state.db, memory_store.db) alone are rejected
- config.yaml continues to pass (existing behaviour preserved)
Three fixes for the (empty) response bug affecting open reasoning models:
1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
populate reasoning fields via OpenRouter, so the old 'not _has_structured'
guard on the retry path blocked retries for EVERY reasoning model after
the 2 prefill attempts. Now: 2 prefills + 3 retries = 6 total attempts
before (empty).
2. Reset prefill/retry counters on tool-call recovery — the counters
accumulated across the entire conversation, never resetting during
tool-calling turns. A model cycling empty→prefill→tools→empty burned
both prefill attempts and the third empty got zero recovery. Now
counters reset when prefill succeeds with tool calls.
3. Strip think blocks before _truly_empty check — inline <think> content
made the string non-empty, skipping both retry paths.
Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML. Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
Add agent.gateway_notify_interval config option (default 600s).
Set to 0 to disable periodic 'still working' notifications.
Bridged to HERMES_AGENT_NOTIFY_INTERVAL env var (same pattern as
gateway_timeout and gateway_timeout_warning).
The inactivity warning (gateway_timeout_warning) was already
configurable; this makes the wall-clock ping configurable too.
- Remove duplicate _setup_feishu() definition (old 3-line version left
behind by cherry-pick — Python picked the new one but dead code
remained)
- Remove misleading 'Disable direct messages' DM option — the Feishu
adapter has no DM policy mechanism, so 'disable' produced identical
env vars to 'pairing'. Users who chose 'disable' would still see
pairing prompts. Reduced to 3 options: pairing, allow-all, allowlist.
- Fix test_probe_returns_bot_info_on_success and
test_probe_returns_none_on_failure: patch FEISHU_AVAILABLE=True so
probe_bot() takes the SDK path when lark_oapi is not installed
Previously, all invalid API responses (choices=None) were diagnosed
as 'fast response often indicates rate limiting' regardless of actual
response time or error code. A 738s Cloudflare 524 timeout was labeled
as 'fast response' and 'possible rate limit'.
Now extracts the error code from response.error and classifies:
- 524: upstream provider timed out (Cloudflare)
- 504: upstream gateway timeout
- 429: rate limited by upstream provider
- 500/502: upstream server error
- 503/529: upstream provider overloaded
- Other codes: shown with code number
- No code + <10s: likely rate limited (timing heuristic)
- No code + >60s: likely upstream timeout
- No code + 10-60s: neutral response time
All downstream messages (retry status, final error, interrupt message)
now use the classified hint instead of generic rate-limit language.
Reported by community member Lumen Radley (MiMo provider timeouts).
auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.
Gemma 4 (26B/31B) uses <thought>...</thought> to wrap its reasoning
output. This tag was not included in the existing list of reasoning tag
variants stripped by _strip_think_blocks(), causing raw thinking blocks
to leak into the visible response.
Added a new re.sub() line for <thought> and extended the cleanup regex
to include 'thought' alongside the existing variants.
Fixes#6148
When a user closes a terminal tab, SIGHUP exits the main thread but
the non-daemon agent_thread kept the entire Python process alive —
stuck in the API call loop with no interrupt signal. Over many
conversations, these orphan processes accumulate and cause massive
swap usage (reported: 77GB on a 32GB M1 Pro).
Changes:
- Make agent_thread daemon=True so the process exits when the main
thread finishes its cleanup. Under normal operation this changes
nothing — the main thread already waits on agent_thread.is_alive().
- Interrupt the agent in the finally/exit path so the daemon thread
stops making API calls promptly rather than being killed mid-flight.
On macOS with uv-managed Python, stdin (fd 0) can be invalid or
unregisterable with the asyncio selector, causing:
KeyError: '0 is not registered'
during prompt_toolkit's app.run() → asyncio.run() → _add_reader(0).
Three-layer fix:
1. Pre-flight fstat(0) check before app.run() — detects broken stdin
early and prints actionable guidance instead of a raw traceback.
2. Catch KeyError/OSError around app.run() as fallback for edge cases
that slip past the fstat guard.
3. Extend asyncio exception handler to suppress selector registration
KeyErrors in async callbacks.
Fixes#6393
Fresh profiles (created without --clone) now:
- Auto-seed a default SOUL.md immediately, so users have a file to
customize right away instead of discovering it only after first use
- Print a clear warning that the profile has no API keys and will
inherit from the shell environment unless configured separately
- Show the SOUL.md path for personality customization
Previously, fresh profiles started with no SOUL.md (only seeded on
first use via ensure_hermes_home), no mention of credential isolation,
and no guidance about customizing personality. Users reported confusion
about profiles using the wrong model/plan tokens and SOUL.md not
being read — both traced to operational gaps in the creation UX.
Closes#8093 (investigated: code correctly loads SOUL.md from profile
HERMES_HOME; issue was operational, not a code bug).
The _watch_update_progress() poll loop never deleted .update_prompt.json
after forwarding the prompt to the user, causing the same prompt to be
re-sent every poll cycle (2s). Two fixes:
1. Delete .update_prompt.json after forwarding — the update process only
polls for .update_response, it doesn't need the prompt file to persist.
2. Guard re-sends with _update_prompt_pending check — belt-and-suspenders
to prevent duplicates even under race conditions.
Add regression test asserting the prompt is sent exactly once.
When a user configures a provider (e.g. `hermes auth add openai-codex`)
but never selects a model via `hermes model`, the gateway and CLI would
pass an empty model string to the API, causing:
'Codex Responses request model must be a non-empty string'
Now both gateway (_resolve_session_agent_runtime) and CLI
(_ensure_runtime_credentials) detect an empty model and fill it from
the provider's first catalog entry in _PROVIDER_MODELS. This covers
all providers that have a static model list (openai-codex, anthropic,
gemini, copilot, etc.).
The fix is conservative: it only triggers when model is truly empty
and a known provider was resolved. Explicit model choices are never
overridden.
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:
- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
are relevant'
This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
When the gateway shuts down gracefully (hermes update, gateway restart,
/restart), it now writes a .clean_shutdown marker file. On the next
startup, if this marker exists, suspend_recently_active() is skipped
and the marker is cleaned up.
Previously, suspend_recently_active() fired on EVERY startup —
including planned restarts from hermes update or hermes gateway restart.
This caused users to lose their conversation history unexpectedly: the
session would be marked as suspended, and the next message would
trigger an auto-reset with a notification the user never asked for.
The original purpose of suspend_recently_active() is crash recovery —
preventing stuck sessions that were mid-processing when the gateway
died unexpectedly. Graceful shutdowns already drain active agents via
_drain_active_agents(), so there is no stuck-session risk. After a
crash (no marker written), suspension still fires as before.
Fixes the scenario where a user asks the agent to run hermes update,
the gateway restarts, and the user's next message gets an unwanted
'Session automatically reset' notification with their history cleared.
When /update runs via Telegram, hermes update --gateway is spawned inside
the gateway's systemd cgroup. The update process itself calls
systemctl restart hermes-gateway, which tears down the cgroup with
KillMode=mixed — SIGKILL to all remaining processes. The wrapping bash
shell is killed before it can execute the exit-code epilogue, so
.update_exit_code is never created. The new gateway's update watcher
then polls for 30 minutes and sends a spurious timeout message.
Fix: write .update_exit_code from Python inside cmd_update() immediately
after the git pull + pip install succeed ("Update complete!"), before
attempting the gateway restart. The shell epilogue still writes it too
(idempotent overwrite), but now the marker exists even when the process
is killed mid-restart.
When running inside WSL (Windows Subsystem for Linux), inject a hint into
the system prompt explaining that the Windows host filesystem is mounted
at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows
paths (Desktop, Documents) to their /mnt/ equivalents without the user
needing to configure anything.
Uses the existing is_wsl() detection from hermes_constants (cached,
checks /proc/version for 'microsoft'). Adds build_environment_hints()
in prompt_builder.py — extensible for Termux, Docker, etc. later.
Closes the UX gap where WSL users had to manually explain path
translation to the agent every session.
Follow-up for cherry-picked PR #8272:
- Add MATRIX_RECOVERY_KEY to module docstring header in matrix.py
- Register in OPTIONAL_ENV_VARS (config.py) with password=True, advanced=True
- Add to _NON_SETUP_ENV_VARS set
- Document cross-signing verification in matrix.md E2EE section
- Update migration guide with recovery key step (step 3)
- Add to environment-variables.md reference
After the PgCryptoStore migration in v0.8.0, the verify_with_recovery_key
call that previously ran after share_keys() was dropped. On any rotation
that uploads fresh device keys (fresh crypto.db, server had stale keys
from a prior install, etc.), the new device keys carry no valid self-
signing signature because the bot has no access to the self-signing
private key.
Peers like Element then refuse to share Megolm sessions with the
rotated device, so the bot silently stops decrypting incoming messages.
This restores the recovery-key bootstrap: on startup, if
MATRIX_RECOVERY_KEY is set, import the cross-signing private keys from
SSSS and sign_own_device(), producing a valid signature server-side.
Idempotent and gated on MATRIX_RECOVERY_KEY — no behavior change for
users who don't configure a recovery key.
Verified end-to-end by deleting crypto.db and restarting: the bot
rotates device identity keys, re-uploads, self-signs via recovery key,
and decrypts+replies to fresh messages from a paired Element client.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When Hermes refreshes a Codex token, it consumed the old refresh_token
but never wrote the new pair back to ~/.codex/auth.json. This caused
Codex CLI and VS Code to fail with 'refresh_token_reused' on their
next refresh attempt.
This mirrors the existing Anthropic write-back pattern where refreshed
tokens are written to ~/.claude/.credentials.json via
_write_claude_code_credentials().
Changes:
- Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to
_write_claude_code_credentials in anthropic_adapter.py)
- Call it from _refresh_codex_auth_tokens() (non-pool refresh path)
- Call it from credential_pool._refresh_entry() (pool happy path + retry)
- Add tests for the new write-back behavior
- Update existing test docstring to clarify _save_codex_tokens vs
_write_codex_cli_tokens separation
Fixes refresh token conflict reported by @ec12edfae2cb221
After /model switches the model (both picker and text paths), the cached
agent's config signature becomes stale — the agent was updated in-place
via switch_model() but the cache tuple's signature was never refreshed.
The next turn *should* detect the signature mismatch and create a fresh
agent, but this relies on the new model's signature differing from the
old one in _agent_config_signature().
Evicting the cached agent explicitly after storing the session override
is more defensive — the next turn is guaranteed to create a fresh agent
from the override without depending on signature mismatch detection.
Also adds debug logging at three key decision points so we can trace
exactly what happens when /model + /retry interact:
- _resolve_session_agent_runtime: which override path is taken (fast
with api_key vs fallback), or why no override was found
- _run_agent.run_sync: final resolved model/provider before agent
creation
Reported: /model switch to xiaomi/mimo-v2-pro followed by /retry still
used the old model (glm-5.1).
The monitor_for_interrupt() and backup interrupt checks were calling
get_pending_message() which pops the message from the adapter's queue.
This created a race condition: if the agent finished naturally before
checking _interrupt_requested, the pending message was permanently lost.
Timeline of the race:
1. Agent near completion, user sends message
2. Level 1 guard stores message in adapter._pending_messages, sets event
3. monitor_for_interrupt() detects event, POPS message, calls agent.interrupt()
4. Agent's run_conversation() was already returning (interrupted=False)
5. Post-run dequeue finds nothing (monitor already consumed it)
6. result.get('interrupted') is False so interrupt_message fallback doesn't fire
7. User message permanently lost — agent finishes without processing it
Fix: change all three interrupt detection sites (primary monitor + two
backup checks) from get_pending_message() (pop) to
_pending_messages.get() (peek). The message stays in the adapter's queue
until _dequeue_pending_event() consumes it in the post-run handler,
which runs regardless of whether the agent was interrupted or finished
naturally.
Reported by @_SushantSays — intermittent message loss during long
terminal command execution, persisting after the previous fix (73f970fa)
which addressed monitor task death but not this consumption race.
Reject non-URL values (e.g. shell commands typed by mistake) in the
base URL prompt during provider setup. Previously any string was saved
as-is to .env, breaking connectivity when the garbage value was used
as the API endpoint.
Adds http:// / https:// prefix check with a clear error message.
The custom-endpoint flow already had this validation (line 1620);
this brings the generic API-key provider flow to parity.
Triggered by a user support case where 'nano ~/.hermes/.env' was
accidentally entered as GLM_BASE_URL during Z.AI setup.
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:
- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
are relevant'
This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
so context-length lookups use models.dev data instead of 128k fallback.
Fixes#8161.
- Set api_mode from custom_providers entry when switching via hermes model,
and clear stale api_mode when the entry has none. Also extract api_mode
in _named_custom_provider_map(). Fixes#8181.
- Convert OpenAI image_url content blocks to Anthropic image blocks when
the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
containing /anthropic). Fixes#8147.
* fix: list all available toolsets in delegate_task schema description
The delegate_task tool's toolsets parameter description only mentioned
'terminal', 'file', and 'web' as examples. Models (especially smaller
ones like Gemma) would substitute 'web' for 'browser' because they
didn't know 'browser' was a valid option.
Now dynamically builds the toolset list from the TOOLSETS dict at import
time, excluding blocked, composite, and platform-specific toolsets.
Auto-updates when new toolsets are added.
Reported by jeffutter on Discord.
* chore: exclude moa and rl from delegate_task toolset list
- Add gosu for runtime privilege dropping from root to hermes user
- Support HERMES_UID/HERMES_GID env vars for host mount permission matching
- Switch to debian:13.4-slim base image
- Use uv venv instead of pip install --break-system-packages
- Pin uv and gosu multi-stage images with SHA256 digests
- Set PLAYWRIGHT_BROWSERS_PATH to /opt/hermes/.playwright so build-time
chromium install survives the /opt/data volume mount
- Keep procps for container debugging
Based on work by m0n5t3r in PR #5811. Stripped to hardening-only
changes (non-root, virtualenv, slim base); matrix deps, fonts, xvfb,
and entrypoint playwright download deferred to follow-up.
Add content-aware splitting to compact mode: short chat-like exchanges
(2-6 short lines without headings/lists/quotes) get separate message
bubbles for a natural chat feel, while structured content (tables,
headings with body, numbered lists) stays in a single message.
Cherry-picked from PR #7587 by bravohenry, adapted to the compact/legacy
split_per_line architecture from #7903.
When the agent calls process(action='wait') or process(action='poll')
and gets the exited status, the completion_queue notification is
redundant — the agent already has the output from the tool return.
Previously, the drain loops in CLI and gateway would still inject
the [SYSTEM: Background process completed] message, causing the
agent to receive the same information twice.
Fix: track session IDs in _completion_consumed set when wait/poll/log
returns an exited process. Drain loops in cli.py and gateway watcher
skip completion events for consumed sessions. Watch pattern events
are never suppressed (they have independent semantics).
Adds 4 tests covering wait/poll/log marking and running-process
negative case.
Add a 'tip of the day' feature that displays a random one-liner about
Hermes Agent features on every new session — CLI startup, /clear, /new,
and gateway /new across all messaging platforms.
- New hermes_cli/tips.py module with 210 curated tips covering slash
commands, keybindings, CLI flags, config options, tools, gateway
platforms, profiles, sessions, memory, skills, cron, voice, security,
and more
- CLI: tips display in skin-aware dim gold color after the welcome line
- Gateway: tips append to the /new and /reset response on all platforms
- Fully wrapped in try/except — tips are non-critical and never break
startup or reset
Display format (CLI):
✦ Tip: /btw <question> asks a quick side question without tools or history.
Display format (gateway):
✨ Session reset! Starting fresh.
✦ Tip: hermes -c resumes your most recent CLI session.
Remove auto-archival from hermes claw migrate — not its
responsibility (hermes claw cleanup is still there for that).
Skip MESSAGING_CWD when it points inside the OpenClaw source
directory, which was the actual root cause of agent confusion
after migration. Use Path.is_relative_to() for robust path
containment check.
Salvaged from PR #8192 by opriz.
Co-authored-by: opriz <opriz@users.noreply.github.com>
- Add rebrand_text() that replaces OpenClaw, Open Claw, Open-Claw,
ClawdBot, and MoltBot with Hermes (case-insensitive, word-boundary)
- Apply rebranding to memory entries (MEMORY.md, USER.md, daily memory)
- Apply rebranding to SOUL.md and workspace instructions via new
transform parameter on copy_file()
- Fix moldbot -> moltbot typo across codebase (claw.py, migration
script, docs, tests)
- Add unit tests for rebrand_text and integration tests for memory
and soul migration rebranding
Users whose credentials exist only in external files — OpenAI Codex
OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials
in ~/.claude/.credentials.json — would not see those providers in the
/model picker, even though hermes auth and hermes model detected them.
Root cause: list_authenticated_providers() only checked the raw Hermes
auth store and env vars. External credential file fallbacks (Codex CLI
import, Claude Code file discovery) were never triggered.
Fix (three parts):
1. _seed_from_singletons() in credential_pool.py: openai-codex now
imports from ~/.codex/auth.json when the Hermes auth store is empty,
mirroring resolve_codex_runtime_credentials().
2. list_authenticated_providers() in model_switch.py: auth store + pool
checks now run for ALL providers (not just OAuth auth_type), catching
providers like anthropic that support both API key and OAuth.
3. list_authenticated_providers(): direct check for anthropic external
credential files (Claude Code, Hermes PKCE). The credential pool
intentionally gates anthropic behind is_provider_explicitly_configured()
to prevent auxiliary tasks from silently consuming tokens. The /model
picker bypasses this gate since it is discovery-oriented.
The interrupt mechanism for regular text messages (non-commands) during
active agent runs relied on a single async polling task
(monitor_for_interrupt) with no error handling. If this task died
silently due to an unhandled exception, stale adapter reference after
reconnect, or any other failure, user messages sent during agent
execution would be queued but never trigger an actual interrupt — the
agent would continue running until it finished naturally, then process
the queued message.
Three improvements:
1. Error handling in monitor_for_interrupt(): wrap the polling body in
try/except so transient errors are logged and retried instead of
silently killing the task.
2. Fresh adapter reference on each poll iteration: re-resolve
self.adapters.get(source.platform) every 200ms instead of capturing
the adapter once at task creation time. This prevents stale
references after adapter reconnects.
3. Backup interrupt check in the inactivity poll loop: both the
unlimited and timeout-enabled paths now check for pending interrupts
every 5 seconds (the existing poll interval). Uses a shared
_interrupt_detected asyncio.Event to avoid double-firing when the
primary monitor already handled the interrupt. Logs at INFO level
with monitor task state for debugging.
The TUI transition (4970705, f83e86d) replaced stacked per-tool history
lines with a single live-updating spinner widget. While the spinner
provides a nice live timer, it removed the scrollback history that
users relied on to see what the agent did during a session.
This restores stacked tool progress lines in 'all' and 'new' modes by
printing persistent scrollback lines via _cprint() when tools complete,
in addition to the existing live spinner display.
Behavior per mode:
- off: no scrollback lines, no spinner (unchanged)
- new: scrollback line on completion, skipping consecutive same-tool repeats
- all: scrollback line on every tool completion
- verbose: no scrollback (run_agent.py handles verbose output directly)
Implementation:
- Store function_args from tool.started events in _pending_tool_info
- On tool.completed, pop stored args and format via get_cute_tool_message()
- FIFO queue per function_name handles concurrent tool execution
- 'new' mode tracks _last_scrollback_tool for dedup
- State cleared at end of agent run
Reported by community user Mr.D — the stacked history provides
transparency into what the agent is doing, which builds trust.
Addresses user report from Discord about lost tool call visibility.