Commit graph

156 commits

Author SHA1 Message Date
Teknium
222a3a9c19 test(cli): cover exit resume hint -p flag across profiles
5 tests: default/custom profiles emit no -p; named profile emits
-p <name> on both --resume and -c hints; lookup failure falls back
gracefully.
2026-05-25 01:41:54 -07:00
Teknium
4fbdf0e893 test(cli,gateway): cover bracket-stripping and gateway session-ID lookup
- CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged.
- Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.
2026-05-25 01:33:32 -07:00
helix4u
ec4d6f1823 fix(cli): show masked feedback for secret prompts 2026-05-25 01:20:33 -07:00
helix4u
3b839f4369 fix(context): align guidance with 64k minimum 2026-05-24 23:23:12 -07:00
Teknium
9c08070703 test(cli): update resume usage-hint assertion for numbered selection
PR #9020's salvage changed the /resume list footer from
'Use /resume <session id or title> to continue.' to
'Use /resume <number>, /resume <session id>, or /resume <session title> to continue.\n  Example: /resume 2'.

test_resume_without_target_lists_recent_sessions still pinned the old
string verbatim and failed in CI. Relax to substring assertions that
allow both the new numbered footer and any future tweaks while still
verifying the hint is shown.
2026-05-24 16:22:48 -07:00
Teknium
c043c86bd7 i18n+tests: add list_item_numbered, list_footer_numbered, out_of_range for 15 locales
The numbered /resume feature added new i18n keys to en.yaml; the catalog parity
tests require every locale to carry matching keys and placeholders, so add
translations to all 15 supported locales.

Also unblock tests/cli/test_cli_resume_command.py:
- _make_cli stub now sets self.resume_display = 'minimal' since
  _handle_resume_command (post-#31695) calls _display_resumed_history.
- mock_db.resolve_resume_session_id returns the input id (no compression
  chain) so HERMES_SESSION_ID is set to a real string, not a MagicMock.
2026-05-24 16:22:48 -07:00
daizhonggeng
fef733d56b feat: support numbered resume selection in cli and gateway 2026-05-24 16:22:48 -07:00
LeonSGP43
6c44d537cc fix(cli): show full session titles in /resume list 2026-05-24 16:13:23 -07:00
Teknium
8e68426981 fix(cli): add inline --yes/now skip for destructive slash commands (#30768)
Issue #30768 reports that on native Windows PowerShell the destructive-slash
confirmation modal renders but never registers keypresses, leaving the user
unable to confirm or cancel /reset, /new, /clear, or /undo. The modal works
on macOS, Linux, and WSL; PR #23907 (merged May 11) replaced the
daemon-thread input() pattern with a prompt_toolkit-native keybinding modal
but the win32 input pipeline apparently doesn't dispatch keys to the
filter-conditioned handlers. The modal investigation is ongoing.

This change ships the immediate escape hatch: append `now`, `--yes`, or `-y`
to any destructive slash command to bypass the modal and run the action
immediately. Works on every platform without touching the broken Windows
code path.

  /reset now            -> reset, no modal
  /new --yes my-session -> new session titled "my-session", no modal
  /clear -y             -> clear, no modal
  /undo -y              -> undo, no modal

The default behavior (modal prompts when approvals.destructive_slash_confirm
is True) is unchanged for users who don't pass a skip token.

Implementation:

- New classmethod HermesCLI._split_destructive_skip(text) -> (remainder, skip)
  parses a destructive-slash command string, strips the leading "/cmd" word
  and any recognized skip tokens (case-insensitive exact match, not substring),
  and reports whether a skip was requested.
- HermesCLI._confirm_destructive_slash gains an optional cmd_original= arg.
  When the arg contains a skip token, it returns "once" immediately —
  before the gate check and before any modal rendering.
- The /clear, /new, /undo handlers in process_command pass cmd_original
  through. /new additionally uses _split_destructive_skip to strip skip
  tokens from the remaining text before deriving the session title, so
  "/new now My Session" yields title="My Session" (not "now My Session").

Tests:

- 7 new unit tests in tests/cli/test_destructive_slash_confirm.py covering
  the helper (recognized tokens, command-word stripping, case-insensitive
  exact match, None/empty input) and the modal bypass (now and --yes both
  skip; no-skip-token still consults the modal).
- 3 new integration tests in tests/cli/test_destructive_slash_inline_skip_e2e.py
  driving HermesCLI.process_command end-to-end and asserting (a) new_session
  is invoked, (b) the modal is never reached, (c) the skip token does not
  leak into the session title, and (d) the no-skip-token path still reaches
  the modal as a sanity check that we haven't accidentally short-circuited
  the normal flow.

All 31 tests across the destructive-slash test surface pass.

Docs:

- website/docs/reference/slash-commands.md documents the new flags both in
  the destructive-commands table and the dedicated approval section, with a
  link back to issue #30768 explaining why the escape hatch exists.
2026-05-24 16:13:03 -07:00
Teknium
5dc10ec3ba test(cli): reconcile resume-recap tests with skip-tool-only default and compression-chain helper
- test_tool_calls_shown_as_summary: explicitly disable resume_skip_tool_only
  (#4434 made True the default; the legacy assertion relied on tool-only
  entries being rendered as a summary).
- test_tool_only_message_skipped_by_default: add coverage for the new
  default skip behavior.
- test_resume_command_*: mock_db.resolve_resume_session_id now returns the
  same id (no compression chain) so the post-#15000 redirect block doesn't
  shove a MagicMock into HERMES_SESSION_ID.
2026-05-24 15:36:37 -07:00
Samuel Zhang
961e34a1d3 fix: show recap after in-session resume 2026-05-24 15:36:37 -07:00
Teknium
c9b3eeabdc
fix(cli): decouple tool_progress=verbose from global DEBUG logging (#31379)
PR #6a1aa420e coupled `display.tool_progress: verbose` (a per-tool display
toggle for full args / results / think blocks) to `self.verbose` — which
controls root-logger DEBUG level. Result: setting tool_progress: verbose
in config silently flipped every module in the process to DEBUG and
flooded the terminal with internal logging, far beyond just full tool
calls.

The two concepts are separate:
- `tool_progress_mode == 'verbose'` → display behavior (tool rendering)
- `self.verbose` → logging behavior (root logger → DEBUG, line 9795)

This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback
plumbing but severs the verbose-display → debug-logging link.

Changes:
- cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no
  longer auto-True when tool_progress_mode == 'verbose'.
- cli.py:_toggle_verbose — slash-cycle through tool progress modes no
  longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`.
- cli.py:9355 — fix misleading label (drop 'and debug logs').
- tui_gateway/server.py:_make_agent — same decoupling on the TUI side
  (verbose_logging no longer derived from tool_progress_mode).
- tests/cli/test_tool_progress_scrollback.py — invert the test that
  asserted the broken coupling; add coverage for explicit `--verbose`
  still enabling DEBUG independent of tool_progress.

Live verified:
- tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines
- --verbose flag explicit → 32 DEBUG/INFO log lines (as expected)
2026-05-24 02:19:20 -07:00
novax635
421ab81052 fix(cli): reuse canonical root model key normalization in load_cli_config 2026-05-23 23:08:05 -07:00
Teknium
2b10024ee8 test(display): cover failure-suffix rendering + update scrollback test
The original PR #17194 description claimed test_display_tool_preview.py
but only ever shipped test_display_todo_progress.py. Add the missing
coverage for the failure-suffix path:

- _trim_error: whitespace strip, length cap, File-not-found path collapse
- _detect_tool_failure: terminal exit codes, memory full, structured
  {error}/{message} extraction, malformed JSON, None result
- get_cute_tool_message E2E: read_file failure, terminal exit-only,
  terminal stderr message, memory full, success path, no-result path

Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool
to reflect the new behavior: the generic '[error]' fallback in cli.py
has been removed; failure suffixes now come from the result-aware
_detect_tool_failure (e.g. '[exit 1]', '[File not found: x]').
2026-05-23 21:03:51 -07:00
honor2030
6a1aa420e7 Fix CLI verbose tool progress config fallback 2026-05-23 21:03:51 -07:00
novax635
86871ee25a fix(cli): synchronize HERMES_SESSION_ID across environment and contextvar during session switches 2026-05-23 17:46:55 -07:00
QuenVix
7245bc77eb fix(fallback): merge fallback_providers with legacy fallback_model configurations 2026-05-23 05:24:57 -07:00
yoniebans
cebd480818 refactor(session-log): drop branch/compress re-point of session_log_file
The attribute no longer exists; nothing to re-point.
2026-05-20 11:44:10 -07:00
H-Ali13381
6a6766fb89 test(cli): cover Brave binary CDP launch detection 2026-05-19 22:34:05 -07:00
H-Ali13381
697d38a3f4 feat: auto-launch Chromium-family browser for CDP
Add browser CDP launch candidates for Chrome, Chromium, Brave, and Edge while preserving Chrome-first selection. Retry candidate launch failures instead of giving up after the first executable.

Update /browser CLI and TUI messaging, docs, and tool descriptions from Chrome-only wording to Chromium-family browser support. Add regression coverage for Brave/Edge paths, Chrome-first precedence, fallback launches, and CDP endpoint probing.
2026-05-19 22:34:05 -07:00
墨綠BG
c9d5ef28bf 🐛 fix(cli): handle missing remote tracking refs 2026-05-19 14:50:42 -07:00
墨綠BG
28ab420302 🐛 fix(cli): handle no-remote worktree cleanup 2026-05-19 14:50:42 -07:00
felix-windsor
5d1f350784 fix(cli): preserve cron asterisks in strip mode 2026-05-18 20:08:36 -07:00
Austin Pickett
2ef501e1f5
feat(cli): add /update slash command to CLI and TUI (#23854)
* feat: add /update slash command to CLI and TUI

* test(cli): add Python tests for /update slash command

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): address Copilot review for /update slash command

Route classic CLI /update through prompt_toolkit modal confirmation and
defer relaunch to the main-thread cleanup path after app.exit(). Tighten
Y/n semantics, add Python wrapper and catalog coverage tests, and assert
/update stays visible in the TUI command catalog.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): address review feedback on /update command

- Replace raw input() with _prompt_text_input_modal in _handle_update_command
  to avoid EOF/hang/keystroke-leak races with prompt_toolkit's stdin ownership
- Fix confirmation logic: only proceed on recognized affirmative aliases
  (y/yes/1/ok); cancel on everything else including empty string, typos,
  and unrecognized input — matches all other [Y/n] prompts in the codebase
- Route relaunch through main-thread shutdown path: set _pending_relaunch
  and return False from process_command so process_loop triggers app.exit();
  run() then calls relaunch() after prompt_toolkit has restored terminal modes
  and after cleanup — safe on both POSIX (execvp) and Windows (subprocess+exit)
- Fix misleading docstring in test_update_command.py: the Vitest only covers
  the TypeScript slash handler that emits code 42, not the Python wrapper
  branch that acts on it
- Rewrite tests to use SimpleNamespace pattern (like test_destructive_slash_confirm)
  so _prompt_text_input_modal can be stubbed directly
- Add Python test for _launch_tui exit-code-42 → relaunch branch in main.py

Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb

Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>

* fix(cli): polish test fixtures for /update command

- Remove unused _prompt_text_input from SimpleNamespace stub
- Use pytest.fail sentinel in managed-install guard test to catch unexpected modal invocations

Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb

Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>

* chore: re-trigger CI after Copilot review fixes

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
2026-05-18 20:10:46 -04:00
kshitij
5fba236644
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.

All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.

Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
  gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
  tests/tools/: 18187 passed, 59 pre-existing failures (verified against
  origin/main with the same shape — identical failure count, identical
  category — all xdist test-order flakes unrelated to this change)

Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
2026-05-17 02:29:41 -07:00
Teknium
226cee43d9
feat(cli): show ▶ N indicator in status bar when /background tasks are running (#27175)
Surface live background-task count in the prompt_toolkit status bar so users
can see at a glance that a /background task exists and is running — no need
to ask the agent about it (the agent has no visibility into bg sessions by
design).

- _get_status_bar_snapshot now reports active_background_tasks from len()
  of the live _background_tasks dict (entries are removed in the task
  thread's finally block, so this reflects truly-running tasks)
- Indicator shown only on medium (<76) and wide (>=76) tiers; narrow (<52)
  stays minimal since it's already cramped
- No invalidate plumbing needed: status bar fragments are pulled via lambda
  on every redraw, and the bg thread already calls _app.invalidate() on exit

Refs #8568
2026-05-16 16:51:29 -07:00
Teknium
fc03c95da1
feat(cli): add /exit --delete flag to remove session on quit (#27101)
Port from google-gemini/gemini-cli#19332.

Users can now exit with '/exit --delete' (or '/quit --delete', '/exit -d')
to permanently remove the current session's SQLite history plus on-disk
transcripts (*.json / *.jsonl / request_dump_*) in one shot. Useful for
privacy-sensitive workflows and one-off interactions where leaving a
session recording behind is undesirable.

Implementation:
- New HermesCLI._delete_session_on_exit one-shot flag (defaults False).
- process_command() parses --delete / -d after /exit or /quit and arms
  the flag. Unknown args print a hint and keep the CLI running (prevents
  typos like '/exit -delete' from accidentally exiting).
- Shutdown path calls SessionDB.delete_session(session_id, sessions_dir=...)
  right after end_session() when the flag is set. That API already
  existed for 'hermes sessions delete' and handles both SQLite removal
  (orphaning child sessions so FK constraints hold) and on-disk file
  cleanup.
- /quit CommandDef now advertises '[--delete]' in args_hint so /help
  and CLI autocomplete surface it.

Tests: tests/cli/test_exit_delete_session.py (12 cases covering both
aliases, case insensitivity, whitespace, short form, unknown-arg
rejection, and registry metadata).

E2E-verified with isolated HERMES_HOME: session row deleted, all three
transcript/request-dump files removed, second delete_session call
correctly returns False.
2026-05-16 12:51:08 -07:00
Teknium
965ae7fa97
revert(cli): drop scrollback box width clamp (#25975), restore full-width borders (#26163)
#25975 (salvaging #24403) clamped decorative scrollback Panels and
streaming box rules to `max(32, min(width, 56))` as a defense against
terminal-emulator reflow when columns shrink. On any modern wide
terminal this made the response/reasoning borders look stubby — 56
cols inside a 200-col viewport.

#26137 (salvaging #25981, by @OutThisLife) landed a more fundamental
fix: prompt_toolkit's `_output_screen_diff` is monkey-patched so its
reserve-vertical-space cursor move no longer pushes chrome into
scrollback at all. With that in place, the clamp is no longer
load-bearing for the chrome-into-scrollback class of bugs — the
remaining risk is purely cosmetic reflow of *already stamped*
Panel borders during an aggressive column shrink, which we now
accept as a tradeoff for restoring proper full-width rendering.

Changes:
- `_scrollback_box_width()` returns `max(32, width)` (just the floor,
  no upper cap). All 10 call sites stay valid.
- Updated `test_scrollback_box_width_caps_to_resize_safe_value` to
  the new `test_scrollback_box_width_returns_viewport_width` asserting
  full-width passthrough above the 32-col floor.

Floor of 32 is kept so `'─' * (w - 2)` math stays positive on tiny
terminals.

Refs #18449 #19280 #22976 (the original reflow class) and #25975
(the clamp this reverts).
2026-05-14 23:30:16 -07:00
teknium1
cbd1f8e4be test(cli): cover light-mode detection + SkinConfig.get_color remap
Adds 16 unit tests covering the light/dark terminal detection path
introduced in the previous commit:

- Env override priority (HERMES_LIGHT, HERMES_TUI_LIGHT,
  HERMES_TUI_THEME, HERMES_TUI_BACKGROUND, COLORFGBG)
- Detection cache stickiness
- _maybe_remap_for_light_mode() no-op in dark mode
- Known dark-mode color remap (#FFF8DC -> #1A1A1A etc)
- Case-insensitive lookup
- Unknown color passthrough
- Status-bar paired colors (#C0C0C0, #888888, #555555, #8B8682) are
  intentionally NOT remapped — regression guard for the patch-11 fix,
  since remapping them would produce dark-on-dark on the status bar's
  navy bg
- SkinConfig.get_color() wrapper is installed and idempotent
- SkinConfig.get_color() does remap in light mode and passes through
  in dark mode

We don't try to fake an OSC 11 reply — that path is exercised
end-to-end in real Terminal.app; the env-override path covers the
algorithmic logic.
2026-05-14 23:23:32 -07:00
Phil Thomas
d6c488f2dc fix(cli): wire /sessions slash command in the classic CLI
The 'sessions' command has been registered in the central command
registry since #20805 (May 2025) and surfaces in /help and tab-completion,
but the classic CLI's process_command() never had an elif branch for it.
The canonical name fell through and printed 'Unknown command: sessions'.
The TUI side was wired up correctly via the SessionPicker overlay; only
the legacy CLI was missing the dispatch.

Adds _handle_sessions_command() which mirrors /resume's no-arg behavior
inline (the CLI has no overlay primitive equivalent to the TUI picker):

- /sessions and /sessions list  → print the recent-sessions table
- /sessions <id_or_title>       → delegates to _handle_resume_command

Includes regression tests covering the dispatcher wiring (the original
bug) plus the three handler branches.
2026-05-14 16:00:03 -07:00
Teknium
2844c888f1
fix(cli): clamp scrollback box widths + suppress status bar after resize (#25975)
When the terminal shrinks, already-printed box-drawing rules (response,
reasoning, streaming TTS, background-task Panels) reflow into multiple
narrower rows — visible as duplicated horizontal separators / ghost
lines in scrollback. Similarly, prompt_toolkit redraws a fresh status
bar on SIGWINCH on top of one the terminal just reflowed, producing
double-bar artifacts on column shrink.

Two surgical changes:

1. Decorative scrollback boxes now use a new
   `HermesCLI._scrollback_box_width()` helper that clamps to
   `max(32, min(width, 56))`. The live TUI footer is unaffected and still
   uses the full width. Covers: streaming response box (open + close),
   reasoning box (open + close, both streaming and post-stream paths),
   streaming-TTS box close, final-response Rich Panel, and the
   background-task Rich Panel.

2. `_recover_after_resize()` now also sets a new
   `_status_bar_suppressed_after_resize` flag so the dynamic status bar
   and both input separator rules stay hidden until the next user input.
   The flag is cleared in the process loop the moment the user submits
   their next prompt, restoring chrome cleanly.

Tests:
- New `test_input_rules_hide_after_resize_until_next_input` covers the
  flag's effect on rule heights.
- New `test_scrollback_box_width_caps_to_resize_safe_value` covers the
  helper at floor / cap / mid-range / overflow.
- Existing resize-recovery test extended to assert the flag flips.

Refs: #18449 #19280 #22976
Salvage of #24403.

Co-authored-by: Szymonclawd <szymonclawd@mac.home>
2026-05-14 15:22:44 -07:00
LeonSGP43
ac64d0c2ca fix: preserve ansi output history on resize replay 2026-05-14 15:14:29 -07:00
Xu Zhizhong
06c6c1f0f2 fix(cli): batch resize history replay 2026-05-14 15:11:51 -07:00
vominh1919
e2b2d48610 fix(cli): preserve startup banner on terminal resize
Recover from SIGWINCH without clearing the physical screen or scrollback
buffer. The startup banner and tool summary are printed before
prompt_toolkit owns the live chrome, so they live in normal terminal
scrollback. Calling erase_screen() + \x1b[3J] on every resize removed
that UI permanently — _replay_output_history cannot reconstruct it
because the banner was never added to _OUTPUT_HISTORY.

Instead, just reset prompt_toolkit's renderer cache and invalidate so
the next incremental redraw starts from a clean slate, then let the
original on_resize handler recalculate layout for the new terminal
size. This matches the behaviour of bash/zsh/fish on SIGWINCH.

Fixes NousResearch/hermes-agent#22999
2026-05-13 13:36:31 -07:00
littlewwwhite
6f2d1c88b7 feat(custom): prompt and persist explicit api_mode for custom providers
Adds an explicit API compatibility mode prompt to the `hermes model -> custom`
flow so Codex-compatible third-party endpoints (and any other non-default
backend whose URL doesn't match the existing heuristics in
`_detect_api_mode_for_url`) can be selected explicitly instead of silently
falling back to chat_completions.

Choices: Auto-detect / chat_completions / codex_responses / anthropic_messages.

Persists `api_mode` to:
  - `model.api_mode` (active session config)
  - the matching `custom_providers[*]` entry (so re-activating the named
    provider next time replays the same transport)

Salvaged from PR #6125 onto current main: kept the new prompt and the
`_save_custom_provider(api_mode=...)` plumbing; the named-custom flow
already extracts and applies `api_mode` from the saved entry on current
main so those changes are preserved as-is. Test fixtures updated for the
new prompt and the existing display-name prompt.

Co-authored-by: littlewwwhite <1095245867@qq.com>
2026-05-13 13:21:33 -07:00
helix4u
a34998ee2f fix(cli): parse positional insights days 2026-05-12 14:56:47 -07:00
Teknium
1d00716754
fix(cli,tui): align CJK / wide-char markdown tables (#23863)
CJK and emoji glyphs render as two terminal cells but JS String#length
and the model's own padding count them as one, so any markdown table
with Chinese / Japanese / Korean cells drifts right per row when a
real terminal renders it. Both surfaces fix this with a display-cell
width measurement (wcswidth on the Python side, stringWidth on the
TUI side).

Changes:
- agent/markdown_tables.py: new helper. realign_markdown_tables(text)
  detects markdown table blocks (header + |---| divider) and
  rewrites the row padding using wcwidth.wcswidth so every pipe and
  dash lines up across rows. No-op on text without tables.
- cli.py: hook the helper into _render_final_assistant_content for
  strip / render modes (raw passes through untouched), and into the
  streaming line emitter so live token-by-token rendering also
  produces aligned tables. A small two-buffer state machine in
  _emit_stream_text holds table rows until the block ends, then
  flushes them through the realigner so all rows pad to a single
  per-column width.
- ui-tui/src/components/markdown.tsx: renderTable now uses
  stringWidth (Bun.stringWidth fast path + East-Asian-width-aware
  fallback, already memoised in @hermes/ink) instead of UTF-16
  String#length for both column-width measurement and per-cell
  padding. Drops the comment that documented the bug as a deliberate
  limitation.

Validation:
- New tests/agent/test_markdown_tables.py (11): every rebuilt block
  shares pipe column offsets across rows for pure CJK, mixed
  CJK+emoji, ragged-row, and multi-table inputs.
- Updated tests/cli/test_cli_markdown_rendering.py: the existing
  strip-mode test asserted exact whitespace; rewritten to assert the
  alignment contract (cell content survives + every rendered row
  shares pipe offsets).
- New ui-tui markdown.test.ts case (1): rendered column-2 start
  offset is identical for the header + every body row, including
  the CJK row that drifted before the fix.
- Live: hermes chat -q with the user-reported screenshot prompt now
  produces a perfectly aligned table on the wire (header, divider,
  4 body rows including '通义千问', all pipes at identical columns).
2026-05-11 11:13:06 -07:00
zhengyuna
054f568578 fix: use TUI modal for slash confirmations 2026-05-11 10:02:03 -07:00
Teknium
3e7145e0bb
revert: roll back /goal checklist + /subgoal feature stack (#23813)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"

This reverts commit a63a2b7c78.

* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"

This reverts commit 4a080b1d5a.

* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"

This reverts commit 404640a2b7.
2026-05-11 07:06:27 -07:00
Teknium1
28b4fe6007 test: stabilize quick-command redaction test against xdist ordering
agent.redact._REDACT_ENABLED is snapshotted at import time from
HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker
can flip it, so test_exec_command_output_is_redacted was order-dependent.
Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.
2026-05-10 22:12:23 -07:00
0xbyt4
f6736ced81 fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages
1. Quick command exec ran in the gateway process's full environment
   without env sanitization or output redaction. A quick command like
   "env" or "printenv" would leak all API keys, OAuth tokens, and
   bot credentials to the messaging user.

   Fix: apply _sanitize_subprocess_env() before exec and
   redact_sensitive_text() on output before returning.

2. GatewayRunner._pending_messages was written on every interrupt
   (lines 1331-1334) but never read or consumed anywhere. The actual
   interrupt delivery uses adapter._pending_messages (a separate dict).
   Removed the write-only accumulation to prevent unbounded growth.
2026-05-10 22:12:23 -07:00
Teknium
404640a2b7
feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls

Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.

Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
  compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
  more items not fewer. Falls through to legacy freeform judge when
  decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
  a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
  A bounded read_file tool (max 5 calls per turn, restricted to that
  one file) lets the judge inspect history when the snippet is
  ambiguous. Stickiness in code: terminal items are frozen, only the
  user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
  reverts to old prompt when empty.
- Status line shows M/N done counts.

CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.

Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.

* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning

Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:

1. Off-by-one bug — the judge sees the checklist rendered with 1-based
   indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
   state.checklist as 0-based. Result: every judge update landed on
   the wrong item, evidence got attached to neighbouring rows, and
   the genuine 'first pending' item (usually #1) never got marked.
   Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
   user prompt to call out the 1-based scheme explicitly. New tests
   cover the parser conversion + an end-to-end fake-judge round-trip.

2. Conversation dump never happened — _extract_agent_messages tried
   common AIAgent attribute names (.messages, .conversation_history,
   etc.) but AIAgent doesn't expose the message list as an instance
   attribute; it lives inside run_conversation()'s scope. Result: the
   judge's read_file tool always saw history_path=unavailable. Fix:
   added an explicit messages= kwarg to evaluate_after_turn that all
   three call sites (CLI, gateway, TUI gateway) now pass directly.
   Agent-attribute extraction kept as back-compat fallback.

3. Prompt was too harsh on simple goals. The original 'be HARSH,
   default to leaving items pending' wording made the judge refuse
   to mark 'file exists' completed even after the agent ran ls,
   test -f, os.path.isfile, and find — burning the entire 8-turn
   budget on a fizzbuzz task. Softened to 'strict but not absurd'
   with explicit guidance on what counts as evidence and a directive
   not to require re-proving items already established earlier.

Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
2026-05-10 16:56:51 -07:00
Teknium
c5f1f863ac
fix(cli): drive _prompt_text_input directly when off main thread (#23454)
Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the
process_loop daemon thread.  prompt_toolkit.run_in_terminal returns a
coroutine that only the main-thread event loop can drive, so calling it
from a daemon thread orphans the coroutine — the input prompt never
renders and user keystrokes leak into the composer instead of the
confirmation prompt (issue #23185).

Mirror the thread-aware guard already in _run_curses_picker: when off the
main thread, fall back to a direct input() call.  Also wrap
run_in_terminal in try/except so WSL / Warp / other emulators that
silently drop the scheduled coroutine fall back to input() too.

Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main
thread (run_in_terminal path), daemon thread (direct input fallback),
no-app, run_in_terminal-raises, and EOF handling.
2026-05-10 16:16:10 -07:00
ming
85383c6363 fix(cli): preserve config comments on setting writes 2026-05-09 17:55:12 -07:00
Teknium
c7f0aab949
feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.

- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
  it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
  on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
  [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
  AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
  path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
  both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
  [0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
  tui_gateway/server.py: propagate the kwarg (mirrors providers_order
  plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
  unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md

Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
2026-05-09 14:47:00 -07:00
Teknium
70bc52e408
fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777)
Native Windows, WSL, SSH sessions, and Windows Terminal all send
Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on
every POSIX platform, so Ctrl+Enter submitted instead of inserting
a newline on those terminals. Reported in #22379.

Add _preserve_ctrl_enter_newline() predicate that detects the
environments where Ctrl+Enter must produce a newline (sys.platform
== 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION,
WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the
c-j-as-submit binding off in those environments and gate the
c-j-as-newline handler on. Local POSIX TTYs without those markers
(docker exec, plain ssh from a Mac) keep c-j as submit so plain
Enter still works on thin PTYs.

Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py
mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter
('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the
(Escape, ControlM) tuple Alt+Enter produces. This lets Kitty /
mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter
newline through the existing Alt+Enter handler.

9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix
to cover bare-local vs SSH branches.

Closes #22379.
2026-05-09 12:48:14 -07:00
Teknium
b9c001116e
feat: confirm prompt for destructive slash commands (#4069) (#22687)
/clear, /new, /reset, and /undo now ask the user to confirm before
discarding conversation state — three-option prompt routed through the
existing tools.slash_confirm primitive.

Native yes/no buttons render on Telegram, Discord, and Slack (their
adapters already implement send_slash_confirm); other platforms get a
text-fallback prompt and reply with /approve, /always, or /cancel.

The classic prompt_toolkit CLI uses the same three-option flow via the
established _prompt_text_input pattern (see _confirm_and_reload_mcp).
TUI keeps its existing modal overlay (#12312).

Gated by new config key approvals.destructive_slash_confirm (default
true). Picking 'Always Approve' flips the gate to false so subsequent
destructive commands run silently — matches the established
mcp_reload_confirm UX.

Out of scope: /cron remove (separate domain — scheduled jobs, not
session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM)
left unchanged; cosmetic unification can come later.

Closes #4069.
2026-05-09 11:04:46 -07:00
Syed Abdur Rehman Ali
f5b635f6ab feat(cli): recognise Shift+Enter as a newline key
Closes #5346.

Most terminals send the same byte sequence for `Enter` and `Shift+Enter`
by default, so the application can't tell them apart — this is a terminal
protocol limitation, not something Hermes can paper over. But terminals
that implement the Kitty keyboard protocol (Kitty / foot / WezTerm /
Ghostty by default; iTerm2 / Alacritty / VS Code terminal / Warp once the
protocol is enabled) DO emit a distinct sequence for `Shift+Enter`:

  - `\x1b[13;2u`     — Kitty / CSI-u, modifier=2
  - `\x1b[27;2;13~`  — xterm modifyOtherKeys=2

Stock prompt_toolkit doesn't have the CSI-u sequence in its
`ANSI_SEQUENCES` table at all, and it maps the modifyOtherKeys variant to
plain `Keys.ControlM` (Enter) — i.e. it strips the Shift modifier, which
is the bug users actually hit on iTerm2 and friends.

This PR adds `hermes_cli/pt_input_extras.install_shift_enter_alias()`,
called once at CLI startup from `cli.py`, which inserts/overwrites those
sequences in `ANSI_SEQUENCES` so they decode to `(Keys.Escape, Keys.ControlM)`
— the same key tuple `Alt+Enter` produces. The existing Alt+Enter newline
handler (`@kb.add('escape', 'enter')` in `cli.py`) then fires unchanged,
so there is no new keybinding to register and no behavioral change for
terminals that don't emit the distinct sequences.

Files
=====

* `hermes_cli/pt_input_extras.py` — new module hosting the helper. Lives
  outside `cli.py` so it's importable in tests without dragging in the
  full CLI runtime (which depends on `fire`, `rich`, etc.).
* `cli.py` — calls `install_shift_enter_alias()` once at module import.
  Wrapped in try/except so prompt_toolkit version drift can't break CLI
  startup.
* `tests/cli/test_cli_shift_enter_newline.py` — 6 tests:
  - registration of all three byte sequences
  - overwrite of stock prompt_toolkit's broken modifyOtherKeys mapping
  - idempotency
  - parser equivalence: CSI-u Shift+Enter == Alt+Enter
  - parser equivalence: modifyOtherKeys Shift+Enter == Alt+Enter
  - plain Enter remains a single key (submit), distinct from the two-key
    Alt+Enter / Shift+Enter tuple
* `website/docs/user-guide/cli.md` — keybinding table updated; new
  "Shift+Enter compatibility" subsection with a per-terminal status table
  noting macOS Terminal / stock Windows Terminal cannot distinguish the
  keystroke at the protocol level.
* `website/docs/getting-started/quickstart.md`,
  `website/docs/guides/tips.md` — short mention pointing readers at the
  full compatibility note in `cli.md`.

Tested
======

  pytest tests/cli/test_cli_shift_enter_newline.py        # 6 passed

Live-tested by triggering `\x1b[13;2u` against the running Vt100Parser
(see test). Not exercised in a real terminal end-to-end because that
requires a Kitty-protocol-capable host; the test exercises the parser
path that drives the live terminal too.
2026-05-08 16:26:51 -07:00
Teknium
d1838041e5 feat: Ctrl+Enter inserts newline on Windows Terminal
Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving
Windows users with no Enter-involving way to insert a newline in the Hermes
prompt. Fix it by reclaiming c-j on Windows only:

- _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where
  thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows
  plain Enter is always c-m, so c-j is free.
- Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends
  Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal
  settings changes required.
- Alt+Enter binding unchanged; still works on mac/Linux/WSL.
- Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler
  split into platform-aware assertions for POSIX vs win32.
- Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit
  even on POSIX) to point Windows users at Ctrl+Enter.

Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline,
since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No
conflicting Hermes binding existed for Ctrl+J, so this is a harmless side
effect.
2026-05-08 14:27:40 -07:00
Teknium
674fad1483
fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888)
Reported: Ctrl+C during an active /goal loop felt like it did nothing —
the agent would interrupt the current turn, then immediately queue another
continuation and keep going until the session ended or the 20-turn budget
ran out.

Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally:
block around self.chat(...) unconditionally. Whether the turn completed
normally, got interrupted, or returned an empty string, the judge ran on
whatever was in conversation_history and — because the judge is fail-open
— a "continue" verdict pushed another CONTINUATION_PROMPT onto
_pending_input. Ctrl+C was invisible to the hook.

Fix:
- chat() now captures result['interrupted'] onto self._last_turn_interrupted
  (resets to False at entry so early-returns don't leak prior state).
- _maybe_continue_goal_after_turn() checks the flag first: on interrupt,
  auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print
  a one-liner pointing the user at /goal resume or /goal clear. No judge
  call, no continuation enqueued.
- Also added an empty-response guard that mirrors gateway/run.py's
  _handle_message logic (empty reply → transient failure → skip judging
  so we don't trip the consecutive-parse-failures backstop unnecessarily).

The goal stays in the DB as paused, so /goal resume recovers it after
the user has sorted out whatever made them cancel. /goal clear still
works as before for a full stop.

Tests: tests/cli/test_cli_goal_interrupt.py covers:
  - interrupted turn pauses + doesn't queue + judge is NOT called
  - paused goal is resumable
  - empty / whitespace / missing assistant reply skips judging
  - healthy turn still enqueues continuation / marks done
  - chat() resets _last_turn_interrupted at entry (anti-leak guard)

All 55 existing goal tests still pass.
2026-05-08 06:53:13 -07:00