The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.
Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).
Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
- Feature Nous Portal as the primary onboarding card (Recommended tag,
app logo, single pitch line); collapse other OAuth providers behind an
"Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
an empty credential or only implicit Bedrock/IAM, so fresh installs never
saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
via the /proc kernel marker.
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
* fix(model picker): unify /model and `hermes model` model lists, add disk cache
The /model slash picker and `hermes model` were drifting apart. /model
read the raw static `OPENROUTER_MODELS` list (31 entries, including 5
that fail at runtime — no tool-call support or absent from live catalog),
while `hermes model` ran the same list through the live OpenRouter
/v1/models tool-support filter and showed 26 valid entries. Same problem
existed for every other authed provider: /model used curated static
lists, `hermes model` used live /v1/models.
Unifies both surfaces on `provider_model_ids()` and adds a generic
disk-cached wrapper so the picker stays snappy.
Changes
- hermes_cli/models.py: new `cached_provider_model_ids()` —
~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries
keyed by credential fingerprint (env vars + OAuth file mtimes).
Stale-data-beats-no-data on transient failures. Pair with
`clear_provider_models_cache(provider=None)`.
- hermes_cli/models.py: `provider_model_ids("nous")` now falls back
to the docs-hosted manifest (not the in-repo snapshot) when the live
Portal /models call fails — preserves the model_catalog regression
guarantee while still going through the unified pathway.
- hermes_cli/model_switch.py: `list_authenticated_providers` routes
sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with
curated fallback when the live fetcher comes up empty.
- hermes_cli/model_switch.py: `parse_model_flags` extended to a
4-tuple, parses `--refresh`.
- cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking;
CLI + gateway wire `--refresh` to `clear_provider_models_cache()`.
- hermes_cli/main.py: `hermes model --refresh` argparse flag.
- hermes_cli/commands.py: `/model` args_hint advertises `--refresh`.
- tests/hermes_cli/test_inventory.py: refresh stale comment.
Live PTY parity verification
- /model → OpenRouter row: `(26 models)` (was 31, with broken entries)
- `hermes model` → OpenRouter: 26 models (unchanged)
- The 5 dropped entries: `pareto-code` (no tool-call support),
`gemini-3-pro-image-preview` (no tool-call support),
`elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone
from OpenRouter's live catalog).
Live PTY timing
- First /model open, empty cache: 4624 ms (full network round trip
across every authed provider)
- Second /model open, warm cache: 51 ms (90× faster)
- `/model --refresh` clears the disk cache and re-fetches.
Cache schema (~/.hermes/provider_models_cache.json, ~3 KB):
{ "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]},
... }
Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway —
5855/5855 pass.
* fix(model picker): use blake2b for cache fingerprint to silence CodeQL
py/weak-sensitive-data-hashing flagged the sha256 call in
_credential_fingerprint() as a high-severity alert because the input
includes env var values whose names contain *_API_KEY / *_TOKEN.
The hash is used solely as a cache-bust identity — never reversed, never
stored, collisions are harmless (worst case: cache miss → live re-fetch).
blake2b serves the same purpose and isn't flagged by this rule.
Functional behavior identical: 16-hex-char digest, cache hit/miss logic
unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.
The May 27 merge of origin/main into bb/gui re-introduced two callers of
_content_display_text (in _inflight_text and _history_to_messages) but
dropped the helper definition itself, leaving an unresolved reference.
NameError fires on every user message via _start_inflight_turn ->
_inflight_text, taking down both the TUI and the desktop (which share
this gateway backend) the moment input is dispatched.
Restores the helper verbatim from main (commit 36c99af37) -- pure
structured-content text extractor, no other dependencies.
Bb/gui had dropped the helper but the orchestrator code merged from main
still calls it (_inflight_text, _message_preview). Re-add the definition
verbatim from main so session.create / _start_inflight_turn don't crash
with NameError on first prompt submit.
Add a first-class active-session orchestrator for the Ink TUI:
- list, activate, close, and launch live process-local TUI sessions
- hydrate committed and in-flight output when switching sessions
- dispatch a new prompt session from the +new row with session-scoped model picks
- expose a clickable live-session count in the status chrome
- preserve stable row order while initially focusing the current session
- support mouse hit-testing for floating orchestrator overlays
- add backend and frontend regression coverage for the lifecycle and UI helpers
Bring 313 commits of upstream main into the bb/gui dashboard
refactor branch. Eight conflicts resolved by hand, the rest
auto-merged. One missing class (_StreamErrorEvent) restored from
main after the auto-merger dropped it.
Conflict resolutions:
apps/dashboard/README.md take HEAD: main's text described
the pre-rename web/ layout that
bb/gui refactored away.
apps/dashboard/package.json combine: keep HEAD's @hermes/shared
workspace dep, take main's
@nous-research/ui 0.16.0 bump.
apps/dashboard/package-lock.json regenerate via
npm install --package-lock-only.
Root lock also regenerated; only
dashboard and apps/desktop entries
moved (apps/desktop version 0.0.1 →
0.0.2 to match bb/gui's
package.json bump).
apps/dashboard/src/pages/ take main (4 hunks): text-xs
EnvPage.tsx replaces text-[0.65rem] per the
typography rule HEAD's own README
documents.
hermes_cli/gateway.py take main (2 hunks): Discord
setup metadata moved to plugin
(architectural migration); s6
service-manager dispatch helpers
additive.
hermes_cli/main.py combine (2 hunks): take main's
Termux-aware
_sync_bundled_skills_for_startup;
combine gui + portal subcommands
in the known-subcommand list.
hermes_cli/web_server.py mixed (10 hunks):
- take main on _PUBLIC_API_PATHS
(bb/gui's own test asserts the
rescan endpoint must require auth)
- combine WS helpers: keep HEAD's
_ws_client_label + main's
Host/Origin guard + composing
_ws_request_is_allowed
- take HEAD's debug-level broadcast
drop log (matches the comment
"subscriber went away mid-send")
- take main's _safe_plugin_api_relpath
GHSA-5qr3-c538-wm9j fix and the
paired discovery-time validation
- take main's {name:path} route
converter for plugin visibility
tui_gateway/server.py take main: PR #31379's verbose-
args gating supersedes HEAD's
unconditional args dump on
tool.start.
Post-merge restoration:
run_agent.py restored class _StreamErrorEvent
(40 lines, from origin/main:288).
Auto-merge silently dropped it,
breaking imports in
agent/codex_runtime.py and three
test files
(test_codex_xai_oauth_recovery.py,
test_streaming.py). Restored
verbatim from main.
Sanity checks:
* git diff --check / --cached --check: clean (no stray markers)
* ast.parse + import on all touched .py files: clean
* targeted pytest on resolved files: 756 passed, 1 pre-existing
Windows-curses failure unrelated to the merge
* full pytest_parallel run: 105 files / 391 failures vs baseline
98 files / 346. Differential vs origin/bb/gui shows all 11
"new" failure files come from main's added tests/code and
reproduce identically against origin/main on the same Windows
host (pure Windows path-separator / perms / git-bash issues
in upstream tests, not merge regressions). 4 baseline
failures fixed: 3 in test_codex_xai_oauth_recovery (the
_StreamErrorEvent restoration), 1 each in test_pairing,
test_runner_startup_failures, test_stream_consumer.
* sentinel-token sweep on main's eight largest commits:
every audited symbol present in the merged tree at expected
counts (TTSProvider 61, NtfyAdapter 29, S6ServiceManager 70,
install_bws 12, security_audit 16, register_image_gen_provider
23, list_profile_gateways 22, DISCORD_FREE_RESPONSE_CHANNELS
48, …).
* byte-diff sweep: 30/30 sampled main-only-modified files
byte-identical to origin/main; the four bb/gui-only files
that drifted (i18n/types.ts, i18n/ru.ts, ThemeSwitcher.tsx,
ToolCall.tsx) correctly absorbed main's web/ → apps/dashboard/
edits through git's rename detection (main's added lines all
present, removed lines all absent).
PR #6a1aa420e coupled `display.tool_progress: verbose` (a per-tool display
toggle for full args / results / think blocks) to `self.verbose` — which
controls root-logger DEBUG level. Result: setting tool_progress: verbose
in config silently flipped every module in the process to DEBUG and
flooded the terminal with internal logging, far beyond just full tool
calls.
The two concepts are separate:
- `tool_progress_mode == 'verbose'` → display behavior (tool rendering)
- `self.verbose` → logging behavior (root logger → DEBUG, line 9795)
This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback
plumbing but severs the verbose-display → debug-logging link.
Changes:
- cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no
longer auto-True when tool_progress_mode == 'verbose'.
- cli.py:_toggle_verbose — slash-cycle through tool progress modes no
longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`.
- cli.py:9355 — fix misleading label (drop 'and debug logs').
- tui_gateway/server.py:_make_agent — same decoupling on the TUI side
(verbose_logging no longer derived from tool_progress_mode).
- tests/cli/test_tool_progress_scrollback.py — invert the test that
asserted the broken coupling; add coverage for explicit `--verbose`
still enabling DEBUG independent of tool_progress.
Live verified:
- tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines
- --verbose flag explicit → 32 DEBUG/INFO log lines (as expected)
Two independent bugs caused the slash-command autocomplete to render
`/goal` as `/goa` (and `/gquota` as `/gquot` for that matter) in the TUI:
1. `tui_gateway/server.py` was forwarding `c.display` from
prompt_toolkit's `Completion` straight into the JSON-RPC payload.
prompt_toolkit normalizes `display=` into `FormattedText` (a `list`
subclass), so the wire format became `[["", "/goal"]]` instead of
the `string` that `CompletionItem.display` in the TUI declares.
`meta` already went through `to_plain_text` — `display` did not.
2. The dropdown row in `appOverlays.tsx` used `flexDirection="row"`
with the display `<Text>` and the (very long) meta `<Text>` as
siblings. When the meta overflows the row width, Ink/Yoga shrinks
the *first* column by one cell, lopping the trailing character off
the command name. `/goal` triggers it reliably because its meta
string is the longest of any built-in command (description +
embedded `[text | pause | resume | clear | status]` usage hint).
Wrapping the display column in `<Box flexShrink={0}>` keeps it at
its natural width and lets the meta wrap or truncate instead.
Bug 1: /voice off in TUI mode did not clear HERMES_VOICE_TTS,
leaving TTS stuck ON with no way to disable it (the voice.toggle
tts handler requires voice mode to be ON).
Bug 2: TUI status bar only showed 'voice on/off' without any
indication of whether TTS speech output is active, because the
frontend never tracked voiceTts state.
- tui_gateway/server.py: clear HERMES_VOICE_TTS when voice is turned off
- ui-tui/src/app/useMainApp.ts: add voiceTts state, thread setVoiceTts
through voice contexts, display [tts] in status bar
- ui-tui/src/app/slash/commands/session.ts: sync tts from voice.toggle response
- ui-tui/src/app/interfaces.ts: add setVoiceTts to all voice context interfaces
* fix(tui): surface verbose tool details
Emit redacted structured verbose args/results to the TUI so /verbose verbose can show full tool detail without reopening stdout, and fail closed if redaction is unavailable.
Salvages #29011.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
* fix(tui): address verbose detail review
Label verbose tool failures as errors, cover forced verbose reasoning, and avoid new diff type warnings from the redaction regression tests.
* fix(tui): bound verbose tool payloads
Cap verbose tool detail text before emitting JSON-RPC events and preserve verbose results on inline diff completions.
* fix(tui): align termux argv test with gc flag
Update the stale TUI launch expectation so the Termux freshness path matches the current direct Node argv.
---------
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
* feat(tui): make display.mouse_tracking pick which DEC modes to enable
Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006.
Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row
fire a clipboard probe that surfaces as "No image in clipboard" — sometimes
dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too,
since tmux's own scrollback is preempted by the alt-screen TUI.
`display.mouse_tracking` (and `/mouse <preset>`) now accepts `off | wheel |
buttons | all` in addition to the legacy booleans. `wheel` is 1000+1006:
scroll wheel + click only, no drag, no hover — the tmux-friendly subset.
`buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the
hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.).
* fix(tui): repaint + sync mouse mode when display.mouse_tracking changes
Two interacting bugs left the TUI blank when `display.mouse_tracking`
switched at runtime (config edit, /mouse <preset>):
1. AlternateScreen's effect re-runs on every `mouseTracking` change,
tearing down and re-entering the alt screen. After re-entry, ink's
frame buffers are reset by `resetFramesForAltScreen()` but nothing
schedules the follow-up render — the alt screen sits blank until
some other state change happens to trigger one. Add a
`scheduleRender()` in `setAltScreenActive`'s active=true branch so
the freshly-entered alt screen gets a full repaint immediately.
2. `setAltScreenActive` early-returns when `active` hasn't changed,
which silently drops a `mouseTracking` change if the cleanup→setup
pair somehow leaves `altScreenActive` already true. Call
`setAltScreenMouseTracking` explicitly from the AlternateScreen
effect so the in-memory mode and terminal DECSET sequence stay in
sync regardless of how `setAltScreenActive` resolved (the call is a
no-op when the mode is unchanged).
* fix(tui): address copilot review #4341269705
- tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES
frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already
centralizes the canonical preset set via its values; the separate
constant added no behavior.
- tests/test_tui_gateway_server.py: update the existing
test_config_mouse_uses_documented_key_with_legacy_fallback to assert
the new preset strings ('all'/'off' instead of 'on'/'off',
display.mouse_tracking persisted as 'all' instead of True) and add
test_config_mouse_accepts_preset_strings_and_aliases covering /mouse
set with wheel/click/unknown (comment #3284802453). The on/off legacy
config.set return shape was an implementation detail of the boolean
flag, not a stable API — the slash command, gateway help text, and
docs all advertise the preset values now.
- ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the
end of reenterAltScreen() (comment #3284802461). Mirrors the same fix
in setAltScreenActive() from ece0a2f4c — without it, SIGCONT/resize
self-heal/stdin-gap re-entry leaves the alt screen blank because
every caller returns early after invoking us.
* fix(tui): address copilot review #4341308478 round 2
- ui-tui/src/config/env.ts (comment #3284837577): the precedence
comment was misleading. Actual behavior on origin/main is
HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default >
HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from
main; the only change here was the wrong comment that claimed
DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block
to document the actual precedence ladder.
- tui_gateway/server.py /mouse set (comment #3284837607): replaced
'str(value or "").strip().lower()' with the explicit None idiom
already used for /indicator, so programmatic callers can pass 0 /
False and have them route through _MOUSE_TRACKING_ALIASES → 'off'
instead of collapsing to '' and triggering the toggle path.
- ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx
(comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before
enableMouseTrackingFor(...) on mount. Otherwise selecting
'wheel'/'buttons' from a state where DEC 1003 was already asserted
(crash, another app, debugger) would silently leave hover on. Also
unconditionally DISABLE on unmount so a crash mid-mount can't leak
DEC modes back to the host shell.
* chore(release): map nat@nthrow.io to @nthrow for #26681 salvage
* fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen
Copilot review #4341356637 (comment #3284880417). The explicit
setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true,
mouseTracking) was defensive paranoia added in the previous fix commit
that's not actually reachable in practice:
- React's cleanup always runs before the next setup, so on any prop
change (mouseTracking or writeRaw) the cleanup sets active=false
first. Setup then sees active was false and applies the new mode
via setAltScreenActive without early-returning.
- On the impossible 'active stayed true' path, the writeRaw above has
already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode)
to the terminal, so the in-memory mode would lag but the visible
state is already correct.
Removing the redundant call means a single DEC sequence per mount.
If the 'active stayed true' path ever manifests in practice, the
right fix is in setAltScreenActive (track mode regardless of the
active early-return), not here.
* fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx
Copilot review #4341379994 (comments #3284900825, #3284900840,
#3284900852). Three remaining call sites in ink.tsx still re-enabled
mouse tracking without first sending DISABLE_MOUSE_TRACKING:
- handleResize alt-screen recovery (line ~577)
- reassertTerminalModes stdin-gap re-assertion (line ~1351)
- reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408)
For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally-
asserted DEC 1003 (other apps, prior crash, tmux state) still active
and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING
is idempotent and safe to send unconditionally — it resets all four
modes. Matches the pattern already in setAltScreenMouseTracking and
the AlternateScreen mount path.
* fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen
Copilot review #4341452823 (comment #3284959762). exitAlternateScreen()
was the last call site in ink.tsx still re-enabling mouse tracking
without DISABLE first. Editors (vim/nvim/less) and tmux can leave
DEC 1003 hover asserted across the handoff back; without DISABLE,
'wheel'/'buttons' presets silently kept hover on after the editor
quit. Now all five enableMouseTrackingFor() call sites in ink.tsx
prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes,
reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen.
* fix(tui): add defensive default to enableMouseTrackingFor switch
Copilot review #4341485231 (comment #3284979323). TS exhaustive switch
returns string per the type system, but a JS caller / corrupted config
/ hot-reload-in-dev could reach the function with an unknown value at
runtime. Without a default, that path returns undefined which then
concatenates as the literal string 'undefined' into the terminal byte
stream — visibly garbling output. Treat unknown as 'off' (no DEC
sequences) so the worst case is silent input loss rather than a
wrecked screen.
---------
Co-authored-by: Nat Thrower <nat@nthrow.io>
Add browser CDP launch candidates for Chrome, Chromium, Brave, and Edge while preserving Chrome-first selection. Retry candidate launch failures instead of giving up after the first executable.
Update /browser CLI and TUI messaging, docs, and tool descriptions from Chrome-only wording to Chromium-family browser support. Add regression coverage for Brave/Edge paths, Chrome-first precedence, fallback launches, and CDP endpoint probing.
Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.
Use cases:
- /backend-dev → loads github-code-review + test-driven-development
+ github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.
Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
at invocation time, the same way /<skill> does. Prompt cache stays
intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
/help display.
Schema (~/.hermes/skill-bundles/<slug>.yaml):
name: backend-dev
description: Backend feature work.
skills:
- github-code-review
- test-driven-development
instruction: |
Optional extra guidance prepended to the loaded skills.
New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.
New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.
Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
scan/cache freshness, resolve (including underscore→hyphen
Telegram alias), build_bundle_invocation_message (loading, missing
skills, user/bundle instruction injection, dedup), save/delete,
reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
subcommand (create/list/show/delete/reload, --force, missing
bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
handler and bundle resolution priority.
Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.
Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
section with quick example, YAML schema, management commands,
behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
the top-level command table and given its own subcommand section.
* feat: add /update slash command to CLI and TUI
* test(cli): add Python tests for /update slash command
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cli): address Copilot review for /update slash command
Route classic CLI /update through prompt_toolkit modal confirmation and
defer relaunch to the main-thread cleanup path after app.exit(). Tighten
Y/n semantics, add Python wrapper and catalog coverage tests, and assert
/update stays visible in the TUI command catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cli): address review feedback on /update command
- Replace raw input() with _prompt_text_input_modal in _handle_update_command
to avoid EOF/hang/keystroke-leak races with prompt_toolkit's stdin ownership
- Fix confirmation logic: only proceed on recognized affirmative aliases
(y/yes/1/ok); cancel on everything else including empty string, typos,
and unrecognized input — matches all other [Y/n] prompts in the codebase
- Route relaunch through main-thread shutdown path: set _pending_relaunch
and return False from process_command so process_loop triggers app.exit();
run() then calls relaunch() after prompt_toolkit has restored terminal modes
and after cleanup — safe on both POSIX (execvp) and Windows (subprocess+exit)
- Fix misleading docstring in test_update_command.py: the Vitest only covers
the TypeScript slash handler that emits code 42, not the Python wrapper
branch that acts on it
- Rewrite tests to use SimpleNamespace pattern (like test_destructive_slash_confirm)
so _prompt_text_input_modal can be stubbed directly
- Add Python test for _launch_tui exit-code-42 → relaunch branch in main.py
Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
* fix(cli): polish test fixtures for /update command
- Remove unused _prompt_text_input from SimpleNamespace stub
- Use pytest.fail sentinel in managed-install guard test to catch unexpected modal invocations
Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
* chore: re-trigger CI after Copilot review fixes
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
Three call-sites in the codebase each duplicated the same config-slice
+ list_authenticated_providers + post-processing pattern:
- hermes_cli/web_server.py /api/model/options
- tui_gateway/server.py model.options JSON-RPC
- tui_gateway/server.py model.save_key JSON-RPC
This consolidates them onto hermes_cli/inventory.py:
load_picker_context() -> ConfigContext
Replaces the 17-LOC config-slice (model.{default,name,provider,
base_url}, providers:, custom_providers:) every consumer did
inline.
ConfigContext.with_overrides(*, current_provider=, current_model=,
current_base_url=) -> ConfigContext
Truthy-only overlay for TUI agent-session state on top of disk
config. Empty getattr(agent, ...) attrs MUST NOT clobber disk.
build_models_payload(ctx, *, include_unconfigured, picker_hints,
canonical_order, max_models) -> dict
Single payload builder. Delegates curation to
list_authenticated_providers (does not call provider_model_ids
per row \u2014 that pulls non-agentic models). picker_hints +
canonical_order produce the TUI ModelPickerDialog shape;
defaults match the dashboard's existing /api/model/options
contract.
Two latent bugs fixed by consolidation:
1. The dashboard read cfg.get('custom_providers') directly, missing
the v12+ keyed providers: form. Now both surfaces go through
get_compatible_custom_providers().
2. The TUI's canonical-merge keyed on is_user_defined to decide order.
Section 3 of list_authenticated_providers sets is_user_defined=True
on rows from the providers: config dict even when the slug is
canonical \u2014 that silently demoted them to the picker tail.
_reorder_canonical now keys on slug membership instead.
Stats: +666 / -145 (net +521). Module 240 LOC; 18 behavior tests.
This PR replaces the rejected #23369 (which bundled the consolidation
with new scriptable CLI surfaces \u2014 hermes models list/status, hermes
providers list \u2014 and a JSON contract that have no external user
demand). Just the refactor; the CLI surface is deferred to a separate
PR gated on actual demand.
Refs #23359.
_session_info() used os.getcwd() which reflects the gateway process
working directory, not the user's actual working directory. This caused
the TUI status line to display incorrect paths (e.g. D:\HermesWork
instead of D:\Hermes\HermesWork) after agent turns that changed the
process cwd.
Align with session.create which already correctly reads TERMINAL_CWD
env var set by the CLI launcher.
- chat-messages: match tool rows by overlapping query/context/preview values
so preview-first `tool.progress` rows reliably adopt later stable-id
`tool.start` payloads instead of spawning ghost rows or mis-merging
parallel same-name calls; preserve prior args/result across phases.
- tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`,
drop redundant `tool.started` re-emit from `tool.progress`.
- electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so
local backend edits actually run; split hardening helpers into
`electron/hardening.cjs` with tests.
- thread/tool UI: one-shot enter animation keyed by stable ids, braille
spinner for running rows, Cursor-like disclosure rows, drill-down +
duration/count formatting via new tool-fallback-model.
- composer: extract `text-utils`, drop liquid-glass overrides.
- right-rail: split preview-pane into preview-console / preview-file.
- runtime: incremental external-store runtime + runtime-readiness gate;
onboarding store + tests; route-resume hook test.
- regression tests for live tool reconciliation (parallel tools, id-less
progress, preview-first rows, structured args/results).
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"
This reverts commit a63a2b7c78.
* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"
This reverts commit 4a080b1d5a.
* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"
This reverts commit 404640a2b7.
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
user prompt to call out the 1-based scheme explicitly. New tests
cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.
Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs). That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.
After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly. Works identically on every platform
and every locale, no surprise behavior.
Mechanical sweep via:
ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' .
All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing
else changed. Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).
Scope notes:
- tests/ excluded: test fixtures can use locale encoding intentionally
(exercising edge cases). If we want to tighten tests later that's
a separate PR.
- plugins/ excluded: plugin-specific conventions may differ; plugin
authors own their code.
- optional-skills/ and skills/ excluded: skill scripts are user-authored
and we don't want to mass-edit them.
- website/ and tinker-atropos/ excluded: vendored / generated content.
46 files touched, 89 +/- lines (symmetric replacement). No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding.
Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it.
Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors.
- Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and
Haiku 4.5 with updated source URLs (platform.claude.com)
- Add _normalize_anthropic_model_name() to handle dot-notation variants
(e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups
- Fix silent token loss: ensure session row exists before UPDATE in both
run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent)
- Log token persistence failures at DEBUG level instead of swallowing
them silently — makes undercounted analytics diagnosable
- Surface reasoning tokens in CLI /usage and TUI usage panel
- Add 'reasoning' and 'cost_status' fields to TUI Usage type
When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.
Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.
Reported by @counterposition in PR #20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
Previously, /personality in the TUI called _reset_session_agent() which
destroyed the agent, cleared conversation history, and effectively started
a new session. This made personality switching disruptive — users lost
their entire conversation context.
Now /personality updates the agent's ephemeral_system_prompt in-place and
injects a pivot marker into the conversation history. The marker tells
the model to adopt the new persona from that point forward, which is
necessary because LLMs tend to pattern-match their prior responses and
continue the established tone without an explicit signal.
Changes:
- tui_gateway/server.py: Rewrite _apply_personality_to_session to update
the agent in-place instead of resetting. Inject a user-role pivot
marker so the model actually switches style mid-conversation.
- ui-tui/src/app/slash/commands/session.ts: Update help text (no longer
mentions history reset).
- tests/test_tui_gateway_server.py: Update test to verify history is
preserved, pivot marker is injected, and ephemeral prompt is set.
* fix(tui): restore classic CLI voice push-to-talk parity
(cherry picked from commit 93b9ae301b)
* fix(tui): harden voice push-to-talk stop flow
Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.
* fix(tui): preserve silent voice strike tracking
Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.
* fix(tui): clean up voice stop failure path
Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.
* fix(tui): report busy voice capture starts
Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.
* fix(tui): handle busy voice record responses
Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.
* fix(tui): clear voice recording on null response
Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.
* fix(tui): count silent manual voice stops
Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.
---------
Co-authored-by: Montbra <montbra@gmail.com>