The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
_SLASH_WORKER_TIMEOUT_S and _pool used raw float()/int() on env vars
at module level. A non-numeric value (e.g. HERMES_TUI_SLASH_TIMEOUT_S=abc)
raises ValueError during import, preventing TUI gateway from starting
with no useful error message.
Wrap both parses in try/except with safe fallbacks:
- HERMES_TUI_SLASH_TIMEOUT_S: fallback to 45.0s
- HERMES_TUI_RPC_POOL_WORKERS: fallback to 4 workers
The Ink TUI (\`hermes --tui\` + dashboard \`/chat\`) had no wiring for the
background self-improvement review. When the review fired and patched
a skill or saved a memory entry, the change landed but the user had
no visual indication it happened — only the CLI had a print surface
for the '💾 Self-improvement review: …' line.
Changes:
- tui_gateway/server.py: in _init_session, attach
agent.background_review_callback to an _emit('review.summary',
sid, {text}) closure. Wrapped in try/except so agents with locked
attribute slots don't break session startup.
- ui-tui/src/app/createGatewayEventHandler.ts: handle 'review.summary'
by routing ev.payload.text through sys(…), matching the existing
'background.complete' pattern. Empty / whitespace payloads are
ignored so the transcript never gets a blank system line.
- ui-tui/src/gatewayTypes.ts: extend the GatewayEvent discriminated
union with { type: 'review.summary', payload?: { text?: string } }.
Gateway platforms (Telegram, Discord, Slack, …) already route the
review summary via background_review_callback → post-delivery queue
in gateway/run.py, so they pick up the new 'Self-improvement review:'
prefix from the companion run_agent change with no platform edits.
Tests:
- tests/tui_gateway/test_review_summary_callback.py (Python, 2 tests):
_init_session attaches a callback that emits the right event; the
callback path survives agents that can't accept the attribute.
- ui-tui/src/__tests__/createGatewayEventHandler.test.ts (vitest, 2
new cases): review.summary events feed sys(...) with the full text;
empty / missing payloads are no-ops.
- TypeScript type-check passes.
- tui_gateway suite: 64/64 pass.
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
PR #17660 landed a sweep of CI fixes but left three loose ends:
1. tests/cli/test_cli_loading_indicator.py::test_reload_mcp_sets_busy_state_
and_prints_status — /reload-mcp gained a prompt-cache-invalidation
confirmation (commit 4d7fc0f37) that was never wired into this test.
The test exercises the loading-indicator path, so pre-approve via
config and go straight into _reload_mcp().
2. tools/mcp_tool.py _make_tool_handler — the added
getattr(server, '_rpc_lock', None) + 'skip the lock if missing'
branch is inconsistent with four sibling call sites that still
direct-access server._rpc_lock. The lock is guaranteed by
MCPServerTask.__init__; falling through to an unlocked
session.call_tool would silently serialize-strip RPCs if the guard
ever triggered. Restore direct access.
3. tui_gateway/server.py _messages_as_conversation — the helper
existed only to catch 'TypeError: include_ancestors unexpected'
from mocked SessionDBs that don't actually exist. The real
SessionDB.get_messages_as_conversation has accepted
include_ancestors since introduction, and every test FakeDB in
the repo already declares the kwarg. Remove the shim, inline the
two call sites.
Reloading MCP servers rebuilds the tool set for the active session, which
invalidates the provider prompt cache (tool schemas are baked into the
system prompt). The next message re-sends full input tokens — can be
expensive on long-context or high-reasoning models.
To surface that cost, /reload-mcp now routes through a new slash-confirm
primitive with three options: Approve Once / Always Approve / Cancel.
'Always Approve' persists approvals.mcp_reload_confirm: false so future
reloads run silently.
Coverage:
* Classic CLI (cli.py) — interactive numbered prompt.
* TUI (tui_gateway + Ink ops.ts) — text warning on first call; `now` /
`always` args skip the gate; `always` also persists the opt-out.
* Messenger gateway — button UI on Telegram (inline keyboard), Discord
(discord.ui.View), Slack (Block Kit actions); text fallback on every
other platform via /approve /always /cancel replies intercepted in
gateway/run.py _handle_message.
* Config key: approvals.mcp_reload_confirm (default true).
* Auto-reload paths (CLI file watcher, TUI config-sync mtime poll) pass
confirm=true so they do NOT prompt.
Implementation:
* tools/slash_confirm.py — module-level pending-state store used by all
adapters and by the CLI prompt. Thread-safe register/resolve/clear.
* gateway/platforms/base.py — send_slash_confirm hook (default 'Not
supported' → text fallback).
* gateway/run.py — _request_slash_confirm helper + text intercept in
_handle_message (yields to in-progress tool-exec approvals so
dangerous-command /approve still unblocks the tool thread first).
Tests:
* tests/tools/test_slash_confirm.py — primitive lifecycle + async
resolution + double-click atomicity (16 tests).
* tests/hermes_cli/test_mcp_reload_confirm_gate.py — default-config
shape + deep-merge preserves user opt-out (5 tests).
Targeted runs (hermetic): 89 passed (slash-confirm, config gate,
existing agent cache, existing telegram approval buttons).
If a concurrent RPC mutates _sessions while session.delete is iterating
it (e.g. a parallel session.create on the thread pool), the bare except
swallowed the RuntimeError and let the delete proceed against a row
that may still be live. Snapshot via list(_sessions.values()) and
return an error when even that raises, instead of treating "couldn't
check" as "no active sessions."
Pressing `d` on the highlighted row in the resume picker prompts
`delete? y/n`; `y` deletes the session (DB row + on-disk transcript
files), anything else cancels. The active session is excluded from
deletion server-side.
Adds a new `session.delete` JSON-RPC handler that wraps
`SessionDB.delete_session`, forwarding the per-profile `sessions/`
directory so transcripts get cleaned up alongside the row.
* fix(tui): offload manual compaction RPC
Route TUI session compression through the existing long-handler pool so slow compaction does not block other gateway RPCs.
* fix(tui): show compaction progress immediately
Print a local status line before the compress RPC starts so slow manual compaction does not look like a no-op.
* feat(tui): rich /compress feedback parity with CLI
Show pre-compaction message count and rough token estimate immediately, emit a status update so the bottom bar reflects ongoing compaction, and report a multi-line summary (headline + token delta + optional note) using the shared summarize_manual_compression helper.
* fix(tui): show live compaction estimate in transcript
Mirror compression progress status into the transcript so users see the backend message count and token estimate while /compress is still running.
* fix(tui): single live compaction line with spinner glyph
Drop the redundant local "compressing context..." placeholder and prefix the live backend status line with a braille spinner glyph so /compress reads as a single in-progress row.
* fix(tui): address review nits on /compress feedback
Reuse the precomputed token estimate inside _compress_session_history so the gateway does not redo the O(n) work while holding history_lock, keep the status bar pinned during long manual compactions instead of auto-restoring after 4s, and drop the redundant noop bullet that doubled with the system role glyph.
* fix(tui): release history_lock during compaction LLM call
Move the snapshot/commit pattern into _compress_session_history so the lock is held only across the in-memory bookkeeping, not during agent._compress_context. Also emit a final neutral status update from session.compress so the pinned compressing indicator clears even on errors.
* fix(tui): rebuild prompt cleanly + sync session_key after compress
Pass system_message=None so AIAgent._compress_context rebuilds the system prompt without nesting the cached identity block. Reuse the handler's pre-snapshotted history inside _compress_session_history to avoid a second O(n) copy under the lock. After compaction, when AIAgent._compress_context rotates session_id, sync the gateway session_key, migrate approval notify + yolo state, restart the slash worker, and clear the stale pending title. Mirrors HermesCLI._manual_compress.
* Avoid /compress lock re-entry in slash side effects.
Stop pre-locking history before _compress_session_history in slash command mirroring, keep session-key sync parity with manual compression, and add a regression test that asserts /compress is invoked without holding history_lock.
* fix(tui): honor launch toolsets
Carry chat --toolsets through the TUI launcher so TUI sessions use the same per-session tool scope as the classic CLI.
* fix(tui): parse top-level toolsets flag
Allow top-level hermes --tui --toolsets to reach the implicit chat session, matching chat subcommand behavior.
* fix(tui): validate launch toolsets
Filter invalid HERMES_TUI_TOOLSETS entries and fall back to configured CLI toolsets when the override contains no valid toolsets.
* fix(tui): avoid config load for builtin toolsets
Honor built-in HERMES_TUI_TOOLSETS values before loading config and treat all/* as the all-toolsets sentinel.
* fix(cli): honor toolsets in oneshot mode
Forward top-level --toolsets into oneshot agent construction so the flag is not silently ignored outside the TUI path.
* fix(cli): validate oneshot toolsets
Reject invalid-only oneshot toolset overrides before output redirection and clarify TUI fallback warnings.
* fix(cli): preserve all-toolsets sentinel
Map explicit all/* oneshot toolset overrides to the all-toolsets sentinel and replace locals() checks in TUI toolset loading.
* fix(cli): warn on extra all-toolset entries
Warn when all/* toolset overrides include additional ignored entries so typos are still visible.
* fix(tui): honor plugin toolset overrides
Discover plugin toolsets before rejecting unresolved explicit toolset overrides and read raw config for MCP name validation.
* fix(tui): reuse toolset argument normalizer
Share top-level TUI toolset argument parsing with the oneshot path to avoid duplicate normalization logic.
* fix(cli): reject disabled mcp toolsets
Validate explicit toolset overrides against enabled MCP servers only and clarify top-level toolset flag help.
* fix(cli): distinguish disabled mcp from unknown toolsets
Report disabled MCP servers separately from unknown toolset entries and stub plugin discovery in invalid-name tests for determinism.
Finish the Copilot review cleanup for lazy prompt submission:
- prompt.submit now claims session.running before returning success, preserving
the existing RPC-level session busy error so the frontend can queue.
- agent-init timeout/failure now emits a normal error event instead of writing a
second JSON-RPC response for an already-settled request id.
Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Respond to Copilot's lazy-start review: session metadata/history/usage do not
need a constructed AIAgent, so keep them on the no-wait session path. This
preserves the deferred startup model and avoids blocking simple session RPCs on
agent initialization.
Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Classic CLI exposes ``/reload`` (re-reads ~/.hermes/.env into
``os.environ`` via ``hermes_cli.config.reload_env``) so newly added API
keys take effect without restarting the session. The TUI was missing
the parity command, so users had to Ctrl+C out and ``hermes --tui``
again whenever they added or rotated a credential.
Three small wires:
* New ``reload.env`` JSON-RPC method in ``tui_gateway/server.py`` that
delegates to ``hermes_cli.config.reload_env`` and returns the count
of vars updated.
* New ``/reload`` slash command in ``ui-tui/src/app/slash/commands/ops.ts``
matching the existing ``/reload-mcp`` pattern (native RPC, no slash
worker).
* Drop ``cli_only=True`` from the ``reload`` ``CommandDef`` in
``hermes_cli/commands.py`` so help/menus surface it in the TUI too.
``reload_env`` itself is environment-agnostic.
Same caveat as classic CLI: the *currently constructed* agent's
credential pool / provider routing does not auto-rebuild. Users who
want a brand-new credential resolution should follow with ``/new``.
Tests:
* New ``test_reload_env_rpc_calls_hermes_cli_reload_env`` confirms
RPC delegates and reports the count.
* New ``test_reload_env_rpc_surfaces_errors`` confirms exceptions are
rendered as JSON-RPC errors.
* ``createSlashHandler.test.ts`` slash-parity matrix extended with
``['/reload', 'reload.env', {}]`` so we can't regress the routing.
Validation:
scripts/run_tests.sh tests/test_tui_gateway_server.py — 92/92.
scripts/run_tests.sh tests/hermes_cli/test_commands.py — 128/128.
cd ui-tui && npm run type-check — clean; npm test --run — 390/390.
* Reject unsupported schemes (anything outside http/https/ws/wss) in
cli.py /browser connect before probing or persisting, matching the
gateway's existing 4015 path.
* Defend gateway browser.manage against `{"url": null}` and
non-string urls: empty/null falls back to DEFAULT_BROWSER_CDP_URL,
non-string returns a 4015 instead of slipping into the generic
5031 catch via TypeError on `"://" in url`.
* Add regression tests for both null-url fallback and non-string
rejection.
* Gate `browser.progress` emit on truthy `session_id`. The TUI
prints `messages` from the response when there's no session, so
emitting events too would double-render. Now: with a session →
events stream live; without one → bundled messages only.
* Resolve `system = platform.system()` once in `_browser_connect`
and thread it through `try_launch_chrome_debug` and
`_failure_messages` → `manual_chrome_debug_command`, so the
generated hint is consistent (and tests are deterministic) on
any host.
* Add `test_browser_manage_connect_no_session_skips_progress_events`
to lock in the gating behavior.
Fixes from Copilot's two passes on PR #17238:
* Validate parsed URL once: reject missing host, invalid port, and
unsupported scheme up front so malformed inputs (e.g. http://:9222
or http://localhost:abc) don't fall through to a generic 5031.
* Tighten _is_default_local_cdp to require a discovery-style path so
ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare
http://127.0.0.1:9222 (which would lose the path and break the
connect).
* Move browser.manage into _LONG_HANDLERS so the up-to-10s
launch-and-retry loop runs on the RPC pool instead of blocking the
main dispatcher.
* try_launch_chrome_debug uses Windows-appropriate detach kwargs
(creationflags=DETACHED_PROCESS|CREATE_NEW_PROCESS_GROUP) instead
of POSIX-only start_new_session=True.
* manual_chrome_debug_command uses subprocess.list2cmdline on
Windows so the printed instruction is cmd.exe-compatible.
* Mirror host/port validation in cli.py /browser connect so the
classic CLI never persists an invalid BROWSER_CDP_URL.
Split browser.manage into a small dispatcher with named connect/disconnect
helpers, fold _http_ok / _probe_urls / _normalize_cdp_url out of the nested
probe loop, collapse the failure-message scaffolding, and DRY the chrome
candidate path tables. Behaviour and event shape unchanged.
Emit browser.progress JSON-RPC notifications during the connect work and render them in the TUI as system transcript lines, so users see the same step-by-step status the base CLI prints instead of nothing for ~1m followed by a final result.
Return CLI-style browser connect status messages from the gateway and render them in the TUI so local Chrome launch attempts are visible instead of ending in a silent delayed failure.
Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.
Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.
Clean up the remaining review nits:
- let the deferred @hermes/ink import retry after a transient failure instead
of memoizing a rejected promise forever
- keep memory-monitor in-flight state inside a finally so future exceptions
cannot suppress that memory level indefinitely
- use read_raw_config for the TUI MCP cold-start probe instead of full
load_config()
- keep input.detect_drop for explicit relative path prefixes (./ and ../)
while preserving the no-RPC fast path for ordinary plain prompts
Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
A cleanup review found that adding prompt.submit to _LONG_HANDLERS made the RPC
pool own the full first-turn wait even though the handler itself already spawns
a turn thread. Keep prompt.submit inline and make it return immediately:
- look up the session without waiting
- kick the lazy agent build
- spawn a short waiter thread that blocks on agent_ready, then starts the
existing turn dispatcher
This keeps stdin dispatch responsive, avoids occupying a bounded pool worker for
a normal chat turn, and preserves the lazy-start hydration behavior.
Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Copilot correctly flagged two concurrency windows:
- memoryMonitor could re-enter while awaiting the lazy @hermes/ink import or
heap dump, producing duplicate imports/dumps under sustained pressure.
- _start_agent_build used a check-then-set guard without synchronization, so
concurrent agent-backed RPCs could start duplicate agent builders.
Fix both with single-flight guards: cache the dynamic import promise and track
per-level dump in-flight state in memoryMonitor, and protect the TUI agent build
flag with a per-session lock.
Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
The lazy startup panel could remain stuck on the placeholder when no first
prompt was submitted because agent construction only started from _sess(). Keep
session.create cheap, but schedule _start_agent_build shortly after returning
the placeholder so tools/skills hydrate automatically.
Also replace the ugly placeholder bar rows with compact unicode-animations
braille loaders for the tools and skills sections.
Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
Match classic CLI perceived startup behavior: show the TUI shell and composer
before constructing the full AIAgent. session.create now returns a lightweight
placeholder session with lazy=true and no longer starts _make_agent eagerly.
The first method that needs the agent triggers _start_agent_build() via _sess();
prompt.submit is routed through the RPC worker pool so that the initial wait for
agent construction does not block the stdio dispatcher.
The intro panel renders skeleton rows for tools/skills while the real
session.info payload is absent, then hydrates to the real tools/skills panel once
AIAgent initialization completes. Also skip the startup /voice status probe and
avoid the input.detect_drop RPC for ordinary plain-text prompts to keep early
startup/first-submit paths cheap.
Measurements on macOS Terminal.app:
- Previous full ready p50 after earlier PR commits: ~1537ms
- Lazy skeleton panel p50: ~794ms
- Original baseline full ready p50: ~1843ms
So the visible startup surface is now ~743ms faster than the prior PR state and
~1.05s faster than the original baseline. First prompt still pays the same agent
construction cost if it races the background/skeleton state, matching classic
CLI's deferred behavior.
Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Two targeted fixes on the critical path from `hermes --tui` launch to
`gateway.ready`:
1. **Defer `@hermes/ink` import in memoryMonitor.ts.** The static top-level
import dragged the full ~414KB Ink bundle (React + renderer + all
components/hooks) onto the critical path *before* `gw.start()` could
spawn the Python gateway — serialising ~155ms of Node work in front of
it on every launch. `evictInkCaches` only runs inside the 10-second
tick under heap pressure, so it moves to a lazy dynamic import. First
tick hits the ESM cache because the app entry has long since imported
`@hermes/ink`.
2. **Gate `tools.mcp_tool` import on config in tui_gateway/entry.py.**
Importing the module transitively pulls the MCP SDK + pydantic + httpx
+ jsonschema + starlette formparsers (~200ms). The overwhelming
majority of users have no `mcp_servers` configured, so this runs for
nothing. A cheap `load_config()` check (~25ms) skips the 200ms import
when no servers are declared, with a conservative fallback to the old
behaviour if the config probe itself fails.
## Measurements (macOS Terminal.app, Apple Silicon, n=12)
| Metric | Before (p50) | After (p50) | Δ |
|----------------------------|--------------|-------------|----------|
| Python gateway boot alone | 252–365ms | 105–151ms | −180ms |
| `hermes --tui` banner paint | 686ms | 665ms | −21ms |
| `hermes --tui` → ready | **1843ms** | **1655ms** | **−188ms (−10.2%)** |
| `hermes --tui` → ready p90 | 1932ms | 1778ms | −154ms |
| stdev (ready) | 126ms | 83ms | also more consistent |
## Tests
- `scripts/run_tests.sh tests/tui_gateway/ tests/tools/test_mcp_tool.py`:
195 passed. (The one pre-existing failure in
`test_session_resume_returns_hydrated_messages` reproduces on main —
unrelated, it's a mock-DB kwarg mismatch.)
- `ui-tui` vitest: 430 tests, all pass.
- `npm run type-check` in ui-tui: clean.
## Notes
- Node-side first paint ("banner") didn't move meaningfully because that
latency is dominated by Ink's render pipeline + React mount, not by
which imports load first.
- The win shows up entirely in the time from banner to `gateway.ready`
— exactly where we expected it, since both fixes shorten the Python
gateway's boot path or let it overlap more with Node startup.
- No user-visible behaviour change. Memory monitoring still fires every
10s; MCP still works when `mcp_servers` is configured.
* fix(tui): honor documented mouse_tracking config key
The TUI runtime was reading display.tui_mouse while docs and user-facing
examples pointed users at display.mouse_tracking. That made persistent
mouse-disable config look like a no-op for users trying to restore native
terminal selection/copy behavior on Linux/SSH/tmux terminals.
Use display.mouse_tracking as the canonical key, keep display.tui_mouse as
a legacy fallback, and have /mouse write the documented key. Both gateway
config.get and client-side config sync now share the same precedence: the
canonical key wins, then the legacy key, then default on.
* review(copilot): align mouse tracking config coercion
- Load gateway config once before deriving display.mouse_tracking state.
- Use key-presence precedence on the TUI client too, so canonical
mouse_tracking wins over legacy tui_mouse even when the value is null.
- Treat numeric 0 as disabled on both gateway and client, matching the
existing string "0" handling.
- Widen ConfigDisplayConfig mouse fields because config.get full returns raw
YAML, not normalized booleans.
* feat(tui): pluggable busy-indicator styles (kaomoji/emoji/unicode/ascii)
The status-bar `FaceTicker` rotated through wide-and-variable kaomoji
glyphs (`(。•́︿•̀。)`, `( ͡° ͜ʖ ͡°)`, …) every 2.5s. Real display widths range
from ~5 to ~16 columns, so the rest of the bar (cwd, ctx %, voice,
bg counter) shifted on every cycle. Padding the verb alone (#17116)
helped but didn't address the dominant jitter source — the glyph
itself.
Add four indicator styles, configurable + hot-swappable:
* `kaomoji` (default — preserves the existing vibe; verb is now
pad-stable so the only width churn left is the kaomoji itself).
* `emoji` — single 2-col emoji frame (`⚕ 🌀🤔✨🍵🔮`).
* `unicode` — `unicode-animations` braille spinner (1-col, smooth).
* `ascii` — `| / - \` (1-col, max compat).
Wires:
* `display.tui_status_indicator` in `DEFAULT_CONFIG` (default
`kaomoji`).
* New JSON-RPC `config.set/get indicator` keys, narrow allow-list.
* `applyDisplay` reads the field and patches `UiState.indicatorStyle`,
so the existing `mtime` poll picks up `~/.hermes/config.yaml` edits
within ~5s without a TUI restart.
* `/indicator [style]` slash command (alias `/indicator-style`,
subcommand completion `kaomoji|emoji|unicode|ascii`). Bare form
shows the current style; setter fires `config.set` and
optimistically `patchUiState({ indicatorStyle })` so the live TUI
swaps immediately, matching the `/skin` UX.
* `CommandDef("indicator", ..., subcommands=...)` so classic CLI
autocomplete + TUI `complete.slash` both surface it.
* `FaceTicker` decouples spinner cadence from verb cadence — the
glyph runs at the spinner's authored interval (or `FACE_TICK_MS`
for kaomoji), the verb stays on the original 2.5s cycle, and both
re-arm cleanly when style changes.
Tests:
* `normalizeIndicatorStyle` rejects unknown / non-string input.
* `applyDisplay → tui_status_indicator` covers fan-out + fallback.
* `/indicator <style>` hot-swaps `UiState.indicatorStyle` after a
successful `config.set`.
* `/indicator sparkle` rejects with the usage hint and never hits
the gateway.
* Slash-parity matrix gets `'/indicator'` → `config.get`.
Validation:
cd ui-tui && npm run type-check — clean; npm test --run — 398/398.
scripts/run_tests.sh tests/test_tui_gateway_server.py
tests/hermes_cli/test_commands.py — 220/220.
* chore(tui): drop /indicator-style alias to declutter autocomplete
* fix(tui): drop verb-width pad — /indicator handles glyph jitter directly
* fix(tui): unicode indicator style hides the verb (cleanest option)
* refactor(tui): single source of truth for INDICATOR_STYLES; cleaner error format
Round 1 Copilot review on PR #17150:
- Exported `INDICATOR_STYLES` const tuple from `interfaces.ts`;
`IndicatorStyle` union type is derived from it. `useConfigSync`
builds its validation Set from the tuple, and `session.ts` uses it
for both the usage hint and the runtime allow-list — adding/removing
a style now touches one line.
- Backend `config.set indicator` error message: switched
`sorted(allowed)` list repr to `pick one of ascii|emoji|kaomoji|unicode`
(matches the TUI usage hint), and reports the normalized `raw`
instead of the original `value`. Backend allowed tuple now has a
comment pointing back at `INDICATOR_STYLES` so the two stay aligned.
Note: kept the verb portion unpadded per design intent — fixed-width
padding was the exact UX the `/indicator` command was added to remove.
Stable width comes from the glyph; verbs cycling is part of the kawaii
aesthetic. Reply on the verb thread will explain.
* fix(tui): drop type collapse + gate verb timer + DEFAULT_INDICATOR_STYLE
Round 2 Copilot review on PR #17150:
- `tui_status_indicator?: 'ascii' | ... | string` collapses to `string`
in TS — consumers got no narrowing. Documented as plain `string` with
a comment about runtime validation via `normalizeIndicatorStyle`.
- `FaceTicker` always started a 2.5s verb interval, even for the
`unicode` style which hides the verb entirely. Now gated on
`showVerb` from `renderIndicator` — `unicode` stays calm.
Pre-emptive self-review (avoid round 3):
- Three call sites duplicated the literal `'kaomoji'` default
(uiStore, normalizeIndicatorStyle, slash command). Added
`DEFAULT_INDICATOR_STYLE` to interfaces.ts and threaded it through
so changing the default touches one line.
* fix(tui-gateway): normalize config.get indicator output to match TUI render
Round 4 Copilot review on PR #17150: `config.get` for `indicator`
returned the raw `display.tui_status_indicator` value without
validation, so a hand-edited config.yaml with stray casing or an
unknown style would leave `/indicator` printing one thing while
the TUI rendered the kaomoji default (frontend's
`normalizeIndicatorStyle` does this normalization on receive).
Lifted the allow-list to module scope as `_INDICATOR_STYLES` /
`_INDICATOR_DEFAULT`, reused by both `config.set` and `config.get`.
Comment notes the alignment with `INDICATOR_STYLES` /
`DEFAULT_INDICATOR_STYLE` in interfaces.ts so adding/removing a
style is a one-line change on each end.
Tests cover: known value verbatim, casing/whitespace normalize,
unknown→default, unset→default.
* fix(tui-gateway): preserve falsy-input diagnostics in config.set indicator error
Round 5 Copilot review on PR #17150: `raw = str(value or "").strip().lower()`
collapsed any falsy non-string (`0`, `False`, `[]`) to empty string,
so the error message read `unknown indicator: ` with nothing after —
losing the original input.
Switched to `("" if value is None else str(value)).strip().lower()`
so only `None` (the genuine 'no value' case) becomes blank. Used
`{raw!r}` in the error so the diagnostic is unambiguous (`'0'` vs `0`).
Tests:
- known-value happy path (`'EMOJI'` → `'emoji'`)
- falsy non-string inputs (`0` / `False` / `[]`) surface meaningfully
- `None` keeps the blank-repr error
* fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races
`tui_gateway` reports `tui_gateway_crash.log` traces where the main
thread sits in `sys.stdin` while a worker holds `_stdout_lock` mid-
flush, and SIGTERM then calls `sys.exit(0)` while the lock is still
held — the interpreter shutdown stalls behind the wedged write.
Two narrowly scoped hardenings:
**`tui_gateway/transport.py`**
* Move JSON serialisation outside the lock — long messages no longer
block sibling writers while we serialise.
* Treat `BrokenPipeError`, `ValueError` ("I/O on closed file") and
generic `OSError` from both `write` and `flush` as "peer is gone":
return `False` instead of bubbling, matching what `write_json`'s
callers in `entry.py` already expect.
* Split `flush` into its own try block so a stuck flush never strands
a partial write or holds the lock indefinitely on its way out.
* Optional `HERMES_TUI_GATEWAY_NO_FLUSH=1` env knob to skip explicit
`flush()` entirely on environments where a half-closed read pipe
produces an indefinite kernel-level block. Default unchanged.
**`tui_gateway/entry.py`**
* `_log_signal` now spawns a 1-second daemon timer that calls
`os._exit(0)` if the orderly `sys.exit(0)` path is itself stuck
behind a wedged worker. Atexit handlers run inside the grace
window when they can; the timer is the safety net so a deadlocked
flush no longer strands the gateway process.
Tests:
* `test_write_json_closed_stream_returns_false` — ValueError path.
* `test_write_json_oserror_on_flush_returns_false` — OSError on flush
must not strand the lock; the write portion still landed before the
flush failure.
* `test_write_json_no_flush_env_skips_flush` — env knob bypass.
Validation: `scripts/run_tests.sh tests/tui_gateway/test_protocol.py`
(42/42 pass; one pre-existing failure on
`test_session_resume_returns_hydrated_messages` is unrelated to this
change — same `include_ancestors` mock kwarg issue tracked elsewhere).
`scripts/run_tests.sh tests/test_tui_gateway_server.py` 90/90 pass.
* review(copilot): tighten transport hardening comments + test cleanup
* review(copilot): narrow exception capture, configurable grace, simpler no-flush test
* fix(tui-gateway): narrow ValueError to closed-stream; surface UnicodeEncodeError
Copilot review on PR #17118: `UnicodeEncodeError` is a ValueError
subclass, so a non-UTF-8 stdout (mismatched PYTHONIOENCODING / locale)
would have been silently swallowed as 'peer gone' under
`except ValueError`. That hides a real environment bug.
Now:
- UnicodeEncodeError → log with exc_info (warning) and drop the frame
- ValueError where str(e) contains 'closed file' → peer gone, return False
- Any other ValueError → log loudly, drop frame (defensive, but visible)
Same shape applied to flush. Adds two regression tests.
* fix(tui-gateway): reserve write() False for peer-gone; re-raise programming errors
Round 2 Copilot review on PR #17118: `Transport.write()` returning
`False` is documented as 'peer is gone', and `entry.py` reacts by
calling `sys.exit(0)`. But the implementation also returned False
for non-IO conditions (non-JSON-safe payloads, UnicodeEncodeError,
unrelated ValueErrors), so a programming error or local env bug would
present as a clean disconnect — exactly the diagnosis pain we wanted
to eliminate.
Now:
- `json.dumps` failure → re-raises (TypeError/ValueError surfaces in crash log)
- `BrokenPipeError` → False (peer gone)
- `ValueError('...closed file...')` → False (peer gone)
- `UnicodeEncodeError` and any other ValueError → re-raise
- `OSError` → False (existing IO-failure semantics, debug-logged)
Tests updated to assert the re-raise behaviour and added a
non-serializable-payload regression test.
* fix(tui-gateway): narrow OSError to peer-gone errnos; honest test naming
Round 3 Copilot review on PR #17118:
- Docstring claimed False = peer gone, but generic OSError on write/flush
also returned False — meaning ENOSPC/EACCES/EIO would silently exit.
Added `_PEER_GONE_ERRNOS = {EPIPE, ECONNRESET, EBADF, ESHUTDOWN, +WSA}`
and narrowed the OSError handlers; non-peer-gone errnos re-raise.
Docstring now lists OSError as peer-gone branch with the errno set.
- The `_DISABLE_FLUSH` test was named after the env var but actually
patched the module constant. Renamed it to reflect the contract being
tested (skips flush when constant is true) AND added a real
end-to-end test that sets the env var, reloads transport.py, and
asserts the constant flips. Cleanup reload restores defaults so
parallel tests stay isolated.
Self-review (avoid round 4):
- Verified TeeTransport's secondary-swallow stays intentional.
- _log_signal grace path already covered by separate tests.
* fix(tui): make /browser connect actually take effect on the live agent
Reports were that `/browser connect <url>` (and "changes to CDP url
don't get picked up") didn't propagate to the live agent in `--tui`,
forcing users to fall back to setting `browser.cdp_url` in
`config.yaml` and restarting. Tracing the path on current main shows
the protocol wiring is already correct — `/browser` is registered in
`ui-tui/src/app/slash/commands/ops.ts` and dispatches `browser.manage`
through the gateway RPC, NOT the slash worker (covered by the
`browser.manage` row in `slashParity.test.ts`). But three real gaps
left the experience flaky:
1. `cleanup_all_browsers()` ran AFTER `os.environ["BROWSER_CDP_URL"]`
was rewritten. `_ensure_cdp_supervisor(...)` reads the env to
resolve its target URL, so a tool call landing in that brief window
could re-attach the supervisor to the OLD CDP endpoint just before
we reaped sessions, leaving the agent talking to a dead URL.
Reorder to clean first, swap env, clean again so the supervisor
for the default task is definitively closed.
2. `browser.manage status` reported only the env var, ignoring
`browser.cdp_url` from config.yaml. `_get_cdp_override()` (the
resolver the agent itself uses) consults both — match it so
`/browser status` answers the same question the next
`browser_navigate` will see. Closes a stealth bug where users
saw "browser not connected" while their CDP URL was perfectly
set in config.yaml.
3. `/browser disconnect` only cleared `BROWSER_CDP_URL` and reaped
once, leaving the same swap window as connect. Symmetrical
double-cleanup here too.
Frontend (`ops.ts`):
* Echo "next browser tool call will use this CDP endpoint" on success
so users see immediate confirmation that the gateway accepted the
swap, even before any tool runs.
* Mention `browser.cdp_url` in `config.yaml` in the usage hint and
the not-connected status line. Persistent config is the correct
fix for some terminal-multiplexer / sub-agent flows where env
inheritance is unreliable; surfacing it makes that workaround
discoverable.
Tests (4 new, all hermetic):
* `status` returns the resolved URL when only `browser.cdp_url` is
set in config.yaml.
* `connect` writes env AND cleans before/after, in that order.
* `connect` against an unreachable endpoint does NOT mutate env or
reap.
* `disconnect` removes env and cleans twice.
Validation:
scripts/run_tests.sh tests/test_tui_gateway_server.py — 94/94 pass.
cd ui-tui && npm run type-check — clean; npm test --run — 389/389.
* review(copilot): always defer to _get_cdp_override; normalize bare host:port
* review(copilot): collapse discovery-style CDP paths so /json/version isn't duplicated
* fix(tui): /browser status must not perform CDP discovery I/O
Copilot review on PR #17120: previous version routed through
`tools.browser_tool._get_cdp_override`, which calls
`_resolve_cdp_override` and performs an HTTP probe to /json/version
with a multi-second timeout for discovery-style URLs. That blocks
the TUI on `/browser status` whenever the configured host is slow
or unreachable.
Status now reads env-then-config directly with no network I/O. The
WS normalization still happens in `browser_navigate` for actual
tool calls, so behaviour-on-call is unchanged.
* fix(tui): skip /json/version probe for concrete ws://devtools/browser endpoints
Round 2 Copilot review on PR #17120: hosted CDP providers (Browserbase,
browserless, etc.) return concrete `ws[s]://.../devtools/browser/<id>`
URLs which are already directly connectable but don't serve the HTTP
discovery path. The previous `/json/version` probe rejected these
valid endpoints with 'could not reach browser CDP'.
For `ws[s]://...` URLs whose path starts with `/devtools/browser/` we
now do a TCP-level reachability check (`socket.create_connection`)
instead of the HTTP probe. The actual CDP handshake happens on the
next `browser_navigate` call, so we still surface unreachable hosts
as 5031 errors — just without the false negatives.
Discovery-style URLs (`http://host:port[/json[/version]]`) keep the
HTTP probe path unchanged. Updated existing test + added two new
ones (TCP-only success, TCP unreachable → 5031).
* feat(tui): opt-in auto-resume of the most recent session
`hermes --tui` always forges a fresh session at startup unless the user
sets `HERMES_TUI_RESUME=<id>`. Disconnects, terminal-window crashes,
and accidental Ctrl+D therefore lose every piece of in-flight context
even though `state.db` still has the full history a `/resume` away.
Add an opt-in path that mirrors classic CLI's `hermes -c` muscle
memory: when `display.tui_auto_resume_recent: true` is set in
`~/.hermes/config.yaml`, the TUI looks up the most recent human-facing
session and resumes it instead of starting fresh. Default off so
existing users aren't surprised; explicit `HERMES_TUI_RESUME` always
wins.
Wires:
* New `session.most_recent` JSON-RPC in `tui_gateway/server.py` that
returns the first non-`tool` row from `list_sessions_rich`, or
`{"session_id": null}` when none. Uses the same deny-list as
`session.list` so sub-agent rows can't sneak in.
* `createGatewayEventHandler.handleReady` re-ordered: explicit
`STARTUP_RESUME_ID` first (unchanged), then conditional auto-resume
via `config.get full → display.tui_auto_resume_recent`, then the
legacy `newSession()` fallback. Failures of either RPC fall back
to `newSession()` so the path is always finite.
* Default `display.tui_auto_resume_recent: False` added to
`DEFAULT_CONFIG` in `hermes_cli/config.py` (no `_config_version`
bump per AGENTS.md — deep-merge handles the additive key).
Tests:
* 4 new vitest cases in `createGatewayEventHandler.test.ts` cover
every gate-and-fallback combination (env wins, config off, config
on with hit, config on with miss).
* 3 new pytest cases for `session.most_recent` (denied row skip,
tool-only → null, db-unavailable → null).
Validation:
scripts/run_tests.sh tests/test_tui_gateway_server.py — 93/93.
cd ui-tui && npm run type-check — clean; npm test --run — 393/393.
* review(copilot): fold session.most_recent errors into null + extend ConfigDisplayConfig
* review(copilot): cover RPC-rejection fallbacks in auto-resume tests
provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS
table containing only hardcoded us.* model IDs. Users configured for
non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no
models in /model and autocomplete.
Root causes fixed:
1. models.py: provider_model_ids() now calls discover_bedrock_models()
keyed by the resolved region before falling back to the static table.
A new bedrock_model_ids_or_none() helper in bedrock_adapter.py
consolidates the discover -> extract IDs -> fallback pattern used by
all three call sites.
2. providers.py: registers bedrock in HERMES_OVERLAYS with
transport=bedrock_converse and auth_type=aws_sdk so
get_provider("bedrock") and resolve_provider_full("bedrock") work.
3. model_switch.py: list_authenticated_providers() sections 2 and 3
detect AWS credentials via has_aws_credentials() for aws_sdk
overlays and use live discovery for the model list.
4. bedrock_adapter.py: resolve_bedrock_region() reads the configured
region from botocore.session before falling back to us-east-1,
covering users who set their region in ~/.aws/config via a named
profile rather than env vars.
5. tui_gateway/server.py: passes provider= to get_model_context_length()
so context window lookups work correctly for the Bedrock provider.
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).
The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread. A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.
Fix: remove the module-level call. Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:
- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools)
before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()
Closes#16856.
`/new` after `/model <custom-provider>:<model>` silently reverted to a
native provider whose static catalog happened to contain the same model
name (e.g. `deepseek-v4-pro` → native `deepseek` → 401).
Root cause at the `/model` writeback site: `HERMES_INFERENCE_PROVIDER`
was set unconditionally but `HERMES_TUI_PROVIDER` was only mirrored when
it was already set. On sessions launched without `--provider`,
`HERMES_TUI_PROVIDER` stayed unset, so `_resolve_startup_runtime()` on
`/new` skipped the explicit-provider early return and fell through to
`detect_static_provider_for_model()`.
Fix: set `HERMES_TUI_PROVIDER` unconditionally alongside
`HERMES_INFERENCE_PROVIDER` when `/model` lands. Keeps #15755's
invariant intact — `HERMES_TUI_PROVIDER` remains the canonical
"explicit this process" carrier, `HERMES_INFERENCE_PROVIDER` remains
ambient and does not short-circuit startup resolution.
Bug report and diagnosis: @Bartok9 in #16857 / #16873.
Fixes#16857
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.