- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.
Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.
Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.
- create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally
- add assertions that launch wiring uses a randomized session file path and removes it on exit
- use a grouped last_active join in search_sessions to avoid per-row correlated max lookups
- always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup
- order session listing by computed last_active in SessionDB so callers get MRU rows directly
- keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling
Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.
The auto-lowered-threshold warning only named the compression model,
making it confusing when the main and aux models are configured with
the same slug but end up with different resolved context lengths (e.g.
OpenRouter's stepfun/step-3.5-flash catalog value vs. a main-model
context_length override). Users couldn't tell whether the warning
reflected two different models or a context-resolution mismatch.
Now includes both 'model (provider)' labels. The aux provider falls
back to the client's base_url hostname when the configured provider
is 'auto', so users see where compression is actually being called.
Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.
Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.
Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
* fix: clean gateway auxiliary client caches on teardown
* fix(gateway): recover from stale pid files and close cron agents
Two issues were keeping the gateway from surviving long runs:
1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
refuses to unlink when the file's pid differs from our own. That
safety check exists for the --replace atexit handoff, but it also
applied to stale-record cleanup, so after a crashy exit the pid
file was orphaned: `write_pid_file()`'s O_EXCL create then failed
with `FileExistsError`, and systemd looped on "PID file race lost
to another gateway instance". Unlink unconditionally from this
helper since the caller has already verified the record is dead.
2. The cron scheduler never closed the ephemeral `AIAgent` it creates
per tick, and never swept the process-global auxiliary-client
cache. Over days of 10-minute ticks this leaked subprocesses and
async httpx transports until the gateway hit EMFILE. Release the
agent and call `cleanup_stale_async_clients()` in `run_job`'s
outer `finally`, matching the gateway's own per-turn cleanup.
* chore(release): map bloodcarter@gmail.com -> bloodcarter
---------
Co-authored-by: bloodcarter <bloodcarter@gmail.com>
When a paste takes longer than 500ms to process on the prompt_toolkit
event-loop thread, emit a logger.warning with elapsed time, byte size,
line count, and sys.platform. Gives us concrete repro data for the
recurring 'CLI freezes after paste on macOS' class of reports (issue
#16263, plus sibling reports across Claude Code / Cursor / Lightroom
against macOS Tahoe 26).
Pure diagnostic — no behavior change. Two time.perf_counter() calls
and one conditional per paste event. Log line only fires when the
handler is actually slow, so normal pastes add no log noise.
The backup takes a consistent snapshot of each .db via sqlite3.backup(),
so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the
fresh snapshot with stale sidecar state and produces a torn restore on
first open. Sidecars are transient and SQLite regenerates them on next
connection anyway.
This also trims multi-MB of junk from every zip — state.db-wal alone was
~9 MB here, doubled by the fact the WAL is the live write-ahead log, not
data.
PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor
workers didn't inherit the CLI's TLS approval callback). Two vectors
remained that could still land in the deadlocking input() fallback:
1. _spawn_background_review spawns a raw threading.Thread with no
approval callback installed, so any dangerous-command guard the
review agent trips falls back to input() -> deadlock against the
parent's prompt_toolkit TUI (same class as delegate_task subagents,
fixed in 023b1bff1 / #15491). Install a _bg_review_auto_deny
callback at thread start, clear on finally.
2. prompt_dangerous_approval's fallback unconditionally spawned a
daemon thread calling input() when approval_callback was None.
That fallback can never succeed under prompt_toolkit because the
user's Enter goes to pt's raw-mode stdin capture. Detect an active
pt Application via get_app_or_none() and fail closed (deny + log)
instead, so future threads that forget to install a callback
degrade gracefully instead of hanging 60s invisibly.
Regression guards:
- tests/run_agent/test_background_review.py verifies the review
worker thread sees a callable auto-deny callback mid-run and that
the slot is cleared in the finally block.
- tests/tools/test_approval.py TestFailClosedUnderPromptToolkit
verifies prompt_dangerous_approval returns 'deny' fast under a
mocked pt Application, and that a real callback still wins over
the guard.
When tools execute concurrently via ThreadPoolExecutor, worker threads
could not see the thread-local approval/sudo callbacks registered by
the CLI. This caused dangerous-command prompts to fall back to plain
input(), which deadlocks against prompt_toolkit's raw terminal mode.
Capture parent-thread callbacks before launching workers, register
them locally in each _run_tool thread, and clear them on exit.
Mirrors the existing fix pattern from cli.py run_agent() for the
main agent worker thread (GHSA-qg5c-hvr5-hjgr / #13617).
The background skill/memory review agent was created without toolset
restrictions, inheriting the full default tool set. This allowed it to
use terminal, send_message, delegate_task, and other tools outside its
intended scope, potentially performing unrelated side effects after
skill creation.
Restrict the review agent to only memory and skills toolsets by passing
enabled_toolsets=['memory', 'skills'] during AIAgent construction.
Fixes#15204
The gateway fix in the previous commit forwards _session_messages on
gateway session teardown. The CLI exit cleanup path had the same bug:
it read getattr(agent, 'conversation_history', None) or [] — but AIAgent
has no conversation_history attribute, so providers always received [].
Switch to _session_messages (same attribute the gateway now uses),
guarded by isinstance(..., list) to preserve the no-arg fallback for
MagicMock-based CLI test stubs.
Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring
the gateway suite).