mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
691 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c1eb2dcda7
|
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
|
||
|
|
ea1d0462cf
|
fix(cli): vertical fallback for markdown tables wider than terminal (#23948)
Follow-up to #23863 (CJK table alignment). The realigner was correctly padding pipes to identical column offsets, but when a table's natural width exceeds terminal cells it produced lines that the terminal soft-wrapped mid-cell, destroying column alignment visually even though the bytes were perfectly padded. Reported as 'columns are not aligned' on tables containing one long row alongside several short rows. Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal fallback: when realign_markdown_tables is given an available_width budget and the rebuilt horizontal table exceeds it, render each body row as 'Header: value' lines separated by a thin ─ rule. Word-wraps oversize values at the budget with a 2-space continuation indent. - agent/markdown_tables.py: realign_markdown_tables(text, available_width=None); threshold check at the top of _render_block flips into a new _render_vertical fallback. Includes _wrap_to_width with hard-break for tokens longer than the budget. - cli.py: helper _terminal_width_for_streaming() returns shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell safety margin; passed to all three realign call sites (_render_final_assistant_content for strip+render Panel paths, and the streaming flushers in _emit_stream_text / _flush_stream). - tests/agent/test_markdown_tables.py: 4 new tests covering the overflow-vertical fallback for ASCII + CJK content, the 'fits → keep horizontal' case, and the long-cell wrap with indent. Live-verified: with COLUMNS=100, the user's reported 'long row in ASCII table' case now renders as vertical key-value rows that all fit the panel; the 6-column CJK comparison table still renders as an aligned horizontal table because it fits inside 100 cols. |
||
|
|
2ec8d2b42f
|
chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace with for all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero). |
||
|
|
1d00716754
|
fix(cli,tui): align CJK / wide-char markdown tables (#23863)
CJK and emoji glyphs render as two terminal cells but JS String#length and the model's own padding count them as one, so any markdown table with Chinese / Japanese / Korean cells drifts right per row when a real terminal renders it. Both surfaces fix this with a display-cell width measurement (wcswidth on the Python side, stringWidth on the TUI side). Changes: - agent/markdown_tables.py: new helper. realign_markdown_tables(text) detects markdown table blocks (header + |---| divider) and rewrites the row padding using wcwidth.wcswidth so every pipe and dash lines up across rows. No-op on text without tables. - cli.py: hook the helper into _render_final_assistant_content for strip / render modes (raw passes through untouched), and into the streaming line emitter so live token-by-token rendering also produces aligned tables. A small two-buffer state machine in _emit_stream_text holds table rows until the block ends, then flushes them through the realigner so all rows pad to a single per-column width. - ui-tui/src/components/markdown.tsx: renderTable now uses stringWidth (Bun.stringWidth fast path + East-Asian-width-aware fallback, already memoised in @hermes/ink) instead of UTF-16 String#length for both column-width measurement and per-cell padding. Drops the comment that documented the bug as a deliberate limitation. Validation: - New tests/agent/test_markdown_tables.py (11): every rebuilt block shares pipe column offsets across rows for pure CJK, mixed CJK+emoji, ragged-row, and multi-table inputs. - Updated tests/cli/test_cli_markdown_rendering.py: the existing strip-mode test asserted exact whitespace; rewritten to assert the alignment contract (cell content survives + every rendered row shares pipe offsets). - New ui-tui markdown.test.ts case (1): rendered column-2 start offset is identical for the header + every body row, including the CJK row that drifted before the fix. - Live: hermes chat -q with the user-reported screenshot prompt now produces a perfectly aligned table on the wire (header, divider, 4 body rows including '通义千问', all pipes at identical columns). |
||
|
|
657874460f
|
chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances — else: if: → elif: PLR1730 (if-stmt-min-max): 15 instances — if x<y: x=y → x=max(x,y) C420 (dict.fromkeys): 2 instances — dictcomp → dict.fromkeys PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix) C414 (unnecessary-list): 1 instance — sorted(list(x)) → sorted(x) 28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions. |
||
|
|
cc9e788c14 |
fix(cli): defensive _slash_confirm_state access + AUTHOR_MAP
- getattr(self, '_slash_confirm_state', None) at the two read sites that
trip object.__new__(HermesCLI) test fixtures (test_cli_external_editor,
test_cli_skin_integration)
- _build_tui_layout_children: make slash_confirm_widget keyword-only with
default None to avoid breaking subclassing extension hook for wrapper
CLIs (test_cli_extension_hooks)
- AUTHOR_MAP entry for zhengyn0001
Follow-up to the salvaged commit
|
||
|
|
054f568578 | fix: use TUI modal for slash confirmations | ||
|
|
3e7145e0bb
|
revert: roll back /goal checklist + /subgoal feature stack (#23813)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit |
||
|
|
ebf2ea584a |
feat(terminal,cli): docker_extra_args + display.timestamps
Two independent opt-in QoL toggles, both off by default. terminal.docker_extra_args: - List of extra flags appended verbatim to docker run after security defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or other docker run options not exposed by existing config keys. - Non-string entries are logged and skipped. - Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var. display.timestamps: - Appends [HH:MM] to user input bullet and the assistant response box header. Single hub in _format_submitted_user_message_preview() covers both single-line and multi-line user previews; assistant response label gets the timestamp at box-open time. Closes #1569 (timestamps). Co-authored-by: Mibayy <Mibayy@users.noreply.github.com> |
||
|
|
404640a2b7
|
feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
user prompt to call out the 1-based scheme explicitly. New tests
cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
|
||
|
|
c5f1f863ac
|
fix(cli): drive _prompt_text_input directly when off main thread (#23454)
Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the process_loop daemon thread. prompt_toolkit.run_in_terminal returns a coroutine that only the main-thread event loop can drive, so calling it from a daemon thread orphans the coroutine — the input prompt never renders and user keystrokes leak into the composer instead of the confirmation prompt (issue #23185). Mirror the thread-aware guard already in _run_curses_picker: when off the main thread, fall back to a direct input() call. Also wrap run_in_terminal in try/except so WSL / Warp / other emulators that silently drop the scheduled coroutine fall back to input() too. Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main thread (run_in_terminal path), daemon thread (direct input fallback), no-app, run_in_terminal-raises, and EOF handling. |
||
|
|
00ce5f04d9 |
feat(session): make /handoff actually transfer the session live
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.
State machine on sessions table replaces the boolean flag:
None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.
CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.
Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:
1. Resolves the platform's home channel.
2. Asks the adapter for a fresh thread via the new
create_handoff_thread(parent_chat_id, name) capability so the
handed-off conversation gets its own scrollback. Adapters that
don't support threads (or fail) return None and the watcher
falls back to the home channel directly.
3. Constructs a SessionSource keyed as 'thread' when a thread was
created, 'dm' otherwise, then session_store.switch_session
re-binds the destination key to the CLI session_id. The full
role-aware transcript replays via load_transcript on the next
turn (no flat-text injection into context_prompt).
4. Forges a synthetic MessageEvent(internal=True) with the handoff
notice and dispatches through _handle_message; the agent runs
against the loaded transcript and adapter.send delivers the
reply.
5. Marks the row 'completed' on success, 'failed' (+error) on any
exception.
Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:
- Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
so DM topics (Bot API 9.4+) and forum supergroups both work.
- Discord (gateway/platforms/discord.py): parent.create_thread on
text channels with a seed-message + message.create_thread
fallback for permission edge cases. Skips DMs and other
non-thread-capable parents.
- Slack (gateway/platforms/slack.py): posts a seed message and
returns its ts as the thread anchor — Slack threads are
message-anchored.
In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.
CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).
Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.
Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
|
||
|
|
878611a79d |
feat(session): add /handoff command for cross-platform session transfer
Adds /handoff <platform> CLI command that queues the current session for resume on the configured home channel of any messaging platform. CLI side: - /handoff telegram — marks session in shared DB, sends summary to the Telegram home channel via send_message - /handoff discord — same for Discord - Supports telegram, discord, slack, whatsapp, signal, matrix Gateway side: - On new session creation, checks for pending handoffs for the incoming message's platform - If found, loads the CLI session's full conversation history and injects it into the context prompt as a handoff transcript - Agent continues the conversation seamlessly Files: - hermes_state.py: handoff_pending, handoff_platform columns + helpers - cli.py: _handle_handoff_command dispatch + handler - hermes_cli/commands.py: CommandDef entry - gateway/run.py: handoff detection in _handle_message_with_agent - tests/hermes_cli/test_session_handoff.py: 8 tests |
||
|
|
68e44642c8
|
fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993)
Subagent stream drops were spamming the parent terminal with two lines
per blip ('Connection dropped...' + 'Reconnected...') while leaving zero
breadcrumb in agent.log to debug them.
Two underlying bugs, fixed together:
1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which
filters records before root-logger file handlers see them. The comment
claimed 'File handlers still capture everything' — that was wrong.
Removed in both run_agent.py and cli.py; console quietness already
comes from hermes_logging not installing a console StreamHandler in
non-verbose mode.
2. The stream-retry blocks emitted two _emit_status calls per drop
('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected —
resuming…') with no provider name, so multi-provider sessions had to
dig through agent.log to attribute a drop. Replaced both call sites
with a single _emit_stream_drop helper that emits ONE line naming the
provider and error class, and always writes a structured WARNING to
agent.log with subagent_id, depth, provider, base_url, error_type.
Net UX change: 6 lines per triple-subagent drop → 3 lines, each
naming the provider. agent.log now has a structured breadcrumb per
retry that didn't exist before.
Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py
covering the logger-level guard, structured WARNING content, single
status line per drop (no Reconnected follow-up), and provider naming.
|
||
|
|
b67ea7ff47
|
perf(cli): skip welcome banner on chat -q single-query mode (#22904)
`hermes chat -q "..."` printed the full welcome banner before running the query — kawaii ASCII logo, available toolsets list, available skills list, model name, session ID, working directory, update-available notice. Building it took ~420 ms on cold start (~200 ms version-update probe, the rest is toolset / skill enumeration plus Rich panel rendering). For a one-shot `-q` query the banner is noise: the user already picked the prompt, doesn't need a toolset reference, and gets the session ID + resume hint from `_print_exit_summary()` after the response prints. The fully-quiet `-Q` / `--quiet` machine-readable path was already banner-free; this brings the human-facing single-query path in line so all non-interactive invocations are fast. Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run percentiles, 9950X3D): median: 1.90 → 1.75 s (-150 ms) min: 1.80 → 1.73 s ( -70 ms) P25: 1.82 → 1.74 s ( -80 ms) Wider variance than expected; the banner cost overlaps with API latency on real `chat -q` runs. Min-time delta of 70 ms is the cleanest signal — that's the deterministic banner-build cost gone. The 150 ms median delta picks up cases where the version-update probe also finishes during the wait. Interactive mode (`hermes` with no `-q`) and the `--list-tools` / `--list-toolsets` one-shot listing commands still show the banner — those are the contexts where it's actually wanted. Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre- existing flakes in `test_cli_save_config_value.py` that fail with `No module named 'ruamel'` both with and without this change). |
||
|
|
85383c6363 | fix(cli): preserve config comments on setting writes | ||
|
|
116a1446a4 |
fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV
Problem: terminal.docker_env set in config.yaml was silently ignored.
Docker containers never received the user-specified env vars.
Root cause: docker_env was missing from all three config→env bridging
maps (cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync) and from the terminal_tool
_get_env_config() reader. _create_environment() consumed the key from
container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV
was never set.
Also extend the list-serialisation branches in cli.py and gateway/run.py
to handle dict values via json.dumps (lists already used json.dumps;
plain str() on a dict produces undecodable output).
Fix:
- cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings;
serialise dict values with json.dumps alongside existing list path
- gateway/run.py: same additions to _terminal_env_map and serialisation
- hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV"
to _config_to_env_sync so `hermes config set terminal.docker_env …`
persists to .env correctly
- tools/terminal_tool.py: add docker_env key to _get_env_config() reading
TERMINAL_DOCKER_ENV via _parse_env_var with default "{}"
Tests: add test_docker_env_is_bridged_everywhere to
tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on
origin/main, passes with fix.
Fixes #20537
|
||
|
|
c7f0aab949
|
feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
|
||
|
|
70bc52e408
|
fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777)
Native Windows, WSL, SSH sessions, and Windows Terminal all send Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on every POSIX platform, so Ctrl+Enter submitted instead of inserting a newline on those terminals. Reported in #22379. Add _preserve_ctrl_enter_newline() predicate that detects the environments where Ctrl+Enter must produce a newline (sys.platform == 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION, WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the c-j-as-submit binding off in those environments and gate the c-j-as-newline handler on. Local POSIX TTYs without those markers (docker exec, plain ssh from a Mac) keep c-j as submit so plain Enter still works on thin PTYs. Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter ('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the (Escape, ControlM) tuple Alt+Enter produces. This lets Kitty / mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter newline through the existing Alt+Enter handler. 9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix to cover bare-local vs SSH branches. Closes #22379. |
||
|
|
b9c001116e
|
feat: confirm prompt for destructive slash commands (#4069) (#22687)
/clear, /new, /reset, and /undo now ask the user to confirm before discarding conversation state — three-option prompt routed through the existing tools.slash_confirm primitive. Native yes/no buttons render on Telegram, Discord, and Slack (their adapters already implement send_slash_confirm); other platforms get a text-fallback prompt and reply with /approve, /always, or /cancel. The classic prompt_toolkit CLI uses the same three-option flow via the established _prompt_text_input pattern (see _confirm_and_reload_mcp). TUI keeps its existing modal overlay (#12312). Gated by new config key approvals.destructive_slash_confirm (default true). Picking 'Always Approve' flips the gate to false so subsequent destructive commands run silently — matches the established mcp_reload_confirm UX. Out of scope: /cron remove (separate domain — scheduled jobs, not session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM) left unchanged; cosmetic unification can come later. Closes #4069. |
||
|
|
2a7047c2ed
|
fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043)
SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl byte-range locks that don't reliably work on network filesystems. Upstream documents this explicitly: https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises 'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before this change, every feature backed by state.db or kanban.db broke silently: - /resume, /title, /history, /branch returned 'Session database not available.' with no cause - gateway logged the init failure at DEBUG (invisible in errors.log) - kanban dispatcher crashed every 60s, driving the known migration race (duplicate column name: consecutive_failures, #21708 / #21374) Changes: - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL and falls back to DELETE on SQLITE_PROTOCOL-style errors with one WARNING explaining why - hermes_state.get_last_init_error() + format_session_db_unavailable(): capture the init failure cause and surface it in user-facing strings (with an NFS/SMB pointer for 'locking protocol') - hermes_cli/kanban_db.connect(): use the shared helper - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING (matches cli.py's existing correct behavior) - cli.py (4 sites) + gateway/run.py (5 sites): replace bare 'Session database not available.' with format_session_db_unavailable() Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new test in tests/hermes_cli/test_kanban_db.py. Existing suites (state, kanban, gateway, cli) remain green for all tests unrelated to pre-existing failures on main. Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home, local_lock=none) reporting 'Session database not available.' on /resume; 'locking protocol' appears in 4 distinct log entries across backup, kanban, TUI, and CLI paths in the same session. closes #22032 |
||
|
|
f5b635f6ab |
feat(cli): recognise Shift+Enter as a newline key
Closes #5346. Most terminals send the same byte sequence for `Enter` and `Shift+Enter` by default, so the application can't tell them apart — this is a terminal protocol limitation, not something Hermes can paper over. But terminals that implement the Kitty keyboard protocol (Kitty / foot / WezTerm / Ghostty by default; iTerm2 / Alacritty / VS Code terminal / Warp once the protocol is enabled) DO emit a distinct sequence for `Shift+Enter`: - `\x1b[13;2u` — Kitty / CSI-u, modifier=2 - `\x1b[27;2;13~` — xterm modifyOtherKeys=2 Stock prompt_toolkit doesn't have the CSI-u sequence in its `ANSI_SEQUENCES` table at all, and it maps the modifyOtherKeys variant to plain `Keys.ControlM` (Enter) — i.e. it strips the Shift modifier, which is the bug users actually hit on iTerm2 and friends. This PR adds `hermes_cli/pt_input_extras.install_shift_enter_alias()`, called once at CLI startup from `cli.py`, which inserts/overwrites those sequences in `ANSI_SEQUENCES` so they decode to `(Keys.Escape, Keys.ControlM)` — the same key tuple `Alt+Enter` produces. The existing Alt+Enter newline handler (`@kb.add('escape', 'enter')` in `cli.py`) then fires unchanged, so there is no new keybinding to register and no behavioral change for terminals that don't emit the distinct sequences. Files ===== * `hermes_cli/pt_input_extras.py` — new module hosting the helper. Lives outside `cli.py` so it's importable in tests without dragging in the full CLI runtime (which depends on `fire`, `rich`, etc.). * `cli.py` — calls `install_shift_enter_alias()` once at module import. Wrapped in try/except so prompt_toolkit version drift can't break CLI startup. * `tests/cli/test_cli_shift_enter_newline.py` — 6 tests: - registration of all three byte sequences - overwrite of stock prompt_toolkit's broken modifyOtherKeys mapping - idempotency - parser equivalence: CSI-u Shift+Enter == Alt+Enter - parser equivalence: modifyOtherKeys Shift+Enter == Alt+Enter - plain Enter remains a single key (submit), distinct from the two-key Alt+Enter / Shift+Enter tuple * `website/docs/user-guide/cli.md` — keybinding table updated; new "Shift+Enter compatibility" subsection with a per-terminal status table noting macOS Terminal / stock Windows Terminal cannot distinguish the keystroke at the protocol level. * `website/docs/getting-started/quickstart.md`, `website/docs/guides/tips.md` — short mention pointing readers at the full compatibility note in `cli.md`. Tested ====== pytest tests/cli/test_cli_shift_enter_newline.py # 6 passed Live-tested by triggering `\x1b[13;2u` against the running Vt100Parser (see test). Not exercised in a real terminal end-to-end because that requires a Kitty-protocol-capable host; the test exercises the parser path that drives the live terminal too. |
||
|
|
26bac67ef9
|
fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes (#22091)
teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after a code update, on both his Windows machine AND his Linux workstation. The failure mode is real and affects every user who updates hermes by any path OTHER than a fully-successful ``hermes update``. ## What happens hermes_bootstrap.py is a top-level module registered via pyproject.toml's ``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work). It must be registered in the venv's editable-install .pth file before Python can find it as a bare ``import hermes_bootstrap``. ``hermes update`` handles this correctly: (1) git reset --hard, (2) clear __pycache__, (3) uv pip install -e . (re-registers the package including the new py-modules list), (4) restart. BUT if any step AFTER (1) fails — network blip during pip install, PEP 668 on a system Python, venv locked, uv not in PATH, a crash mid-update — the user is left with new code that references hermes_bootstrap and a venv that doesn't know about it. Every hermes invocation after that crashes with ModuleNotFoundError, including ``hermes update`` itself. No recovery path without manual `uv pip install -e .`. Also affects users who ``git pull`` the repo directly without running hermes update — relatively common for developers. ## Fix Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError across all 6 entry points (hermes_cli/main, run_agent, gateway/run, acp_adapter/entry, cli, batch_runner). On Windows, missing bootstrap means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode chars may fail to print) but NOT a crash. POSIX is unaffected either way since the bootstrap is a no-op there. Once hermes is running again, the user can ``hermes update`` to fully recover. ## Test update tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap scans for the first top-level import in each entry point and asserts it is hermes_bootstrap. Extended the check to accept a Try block whose body is a lone Import of hermes_bootstrap — that's the recovery-friendly form we just introduced. Verified behavior by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak`` and confirming ``python -c "import hermes_cli.main"`` succeeds. 82/82 tests pass (hermes_bootstrap + windows-native + windows-compat). |
||
|
|
0ba1e12abc |
fix(windows): browser tool + spurious SIGINT from subprocess spawning
Three related Windows-only fixes that together make the browser toolset
actually usable on Windows. Symptom chain: user invokes browser_navigate
-> tool returns {"success": false, "error": "Daemon process exited
during startup with no error output"} and the CLI exits mid-turn with
the session summary.
Root cause (3 layers):
1. tools/browser_tool.py::_find_agent_browser() resolved
node_modules/.bin/agent-browser to the extensionless POSIX shell
shim via Path.exists(). On Windows, CreateProcessW cannot execute
that script (WinError 193 "not a valid Win32 application"). Fix:
delegate to shutil.which with path=node_modules/.bin so PATHEXT
picks up agent-browser.CMD on Windows and the extensionless shim
stays correct on POSIX.
2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the
parent hermes.exe whenever a background thread spawns a .cmd
subprocess. Python 3.11's default SIGINT handler raises
KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's
app.run() -> cli.py::run()'s finally block calls _run_cleanup()
-> _emergency_cleanup_all_sessions -> spawns a concurrent
_run_browser_command("close", ...) on the same session the agent
thread just opened. Two agent-browser processes race on the same
--session name, the daemon startup loses, and the tool returns
the "Daemon process exited during startup" error. Fix: install a
Windows-only SIGINT handler that absorbs the signal silently.
Real user Ctrl+C still routes through prompt_toolkit's own c-c
keybinding at the TUI layer, which is how Claude Code handles the
same quirk (driving cancellation via the TUI key handler, not
signals).
3. In tools/browser_tool.py, both Popen sites now pass
creationflags=CREATE_NO_WINDOW | STARTF_USESTDHANDLES with
close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd
console flash; STARTF_USESTDHANDLES + close_fds ensures the child
inherits only our three chosen handles (DEVNULL stdin, temp-file
stdout/stderr) and no leaked parent console handles that could
confuse agent-browser's native daemon spawn. Notably we do NOT
add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag
interacts badly with asyncio's ProactorEventLoop and makes things
worse.
Verified end-to-end on Windows 10 / Windows Terminal / PowerShell:
browser_navigate to https://example.com returns
{"success": true, "title": "Example Domain"} and the CLI stays alive
for follow-up tool calls and assistant turns.
Refs: earlier Windows quirks commits
|
||
|
|
d1838041e5 |
feat: Ctrl+Enter inserts newline on Windows Terminal
Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving Windows users with no Enter-involving way to insert a newline in the Hermes prompt. Fix it by reclaiming c-j on Windows only: - _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows plain Enter is always c-m, so c-j is free. - Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal settings changes required. - Alt+Enter binding unchanged; still works on mac/Linux/WSL. - Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler split into platform-aware assertions for POSIX vs win32. - Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit even on POSIX) to point Windows users at Ctrl+Enter. Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline, since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No conflicting Hermes binding existed for Ctrl+J, so this is a harmless side effect. |
||
|
|
cbce5e93fc |
codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.
Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs). That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.
After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly. Works identically on every platform
and every locale, no surprise behavior.
Mechanical sweep via:
ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' .
All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing
else changed. Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).
Scope notes:
- tests/ excluded: test fixtures can use locale encoding intentionally
(exercising edge cases). If we want to tighten tests later that's
a separate PR.
- plugins/ excluded: plugin-specific conventions may differ; plugin
authors own their code.
- optional-skills/ and skills/ excluded: skill scripts are user-authored
and we don't want to mass-edit them.
- website/ and tinker-atropos/ excluded: vendored / generated content.
46 files touched, 89 +/- lines (symmetric replacement). No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
|
||
|
|
d94fb47717 |
hermes_bootstrap: Windows-only UTF-8 stdio shim for all entry points
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).
Problem: Python on Windows has two long-standing text-encoding pitfalls:
1. sys.stdout/stderr are bound to the console code page (cp1252 on
US-locale installs) — print('café') crashes with UnicodeEncodeError.
2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
PYTHONIOENCODING are set in their env — so any Python we spawn
(linters, sandbox children, delegation workers) hits the same bug.
Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:
- hermes_cli/main.py (hermes / hermes-agent console_script)
- run_agent.py (hermes-agent direct)
- acp_adapter/entry.py (hermes-acp)
- gateway/run.py (messaging gateway)
- batch_runner.py (parallel batch mode)
- cli.py (legacy direct-launch CLI)
On Windows, the bootstrap:
- os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode)
- os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
- sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')
Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.
On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected. POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.
setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.
What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init. A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.
Tests (17): 16 passed, 1 skipped on Windows.
- Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
- POSIX: complete no-op (verified on fake POSIX + skipped on real
POSIX since we don't have a Linux box in this session)
- Idempotence: multiple calls safe
- Graceful degradation: non-reconfigurable streams don't crash
- User opt-out: explicit PYTHONUTF8=0 is respected
- Load order: every entry point's FIRST top-level import is
hermes_bootstrap, enforced by an AST-level parametrized test
pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
|
||
|
|
e93bfc6c93 |
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
|
||
|
|
9de893e3b0 |
feat(windows): close native-Windows install gaps — crash-free startup, UTF-8 stdio, tzdata dep, docs
Native Windows (with Git for Windows installed) can now run the Hermes CLI and gateway end-to-end without crashing. install.ps1 already existed and the Git Bash terminal backend was already wired up — this PR fills the remaining gaps discovered by auditing every Windows-unsafe primitive (`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios` imports) and by comparing hermes against how Claude Code, OpenCode, Codex, and Cline handle native Windows. ## What changed ### UTF-8 stdio (new module) - `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point. Flips the console code page to CP_UTF8 (65001), reconfigures `sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8` for subprocesses. No-op on non-Windows. Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`. - Called early in `cli.py::main`, `hermes_cli/main.py::main`, and `gateway/run.py::main` so Unicode banners (box-drawing, geometric symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252 consoles. ### Crash sites fixed - `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)` which routes through `taskkill /T /F` on Windows. - `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also converted SIGTERM path to `terminate_pid()` and widened OSError catch on the intermediate `os.kill(pid, 0)` probe. - `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` → `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the pattern already used in `gateway/status.py`). ### OSError widening on `os.kill(pid, 0)` probes Windows raises `OSError` (WinError 87) for a gone PID instead of `ProcessLookupError`. Widened the catch at: - `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this, the loop busy-spins the full 10s every Windows gateway start) - `hermes_cli/gateway.py:228, 460, 940` - `hermes_cli/profiles.py:777` - `tools/process_registry.py::_is_host_pid_alive` - `tools/browser_tool.py:1170, 1206` ### Dashboard PTY graceful degradation `hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`, none of which exist on native Windows. Previously a Windows dashboard would crash on `import hermes_cli.web_server` because of a top-level import. Now: - `hermes_cli/web_server.py` wraps the pty_bridge import in `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`. - The `/api/pty` WebSocket handler returns a friendly "use WSL2 for this tab" message instead of exploding. - Every other dashboard feature (sessions, jobs, metrics, config editor) runs natively on Windows. ### Dependency - `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so Python's `zoneinfo` works on Windows (which has no IANA tzdata shipped with the OS). Credits @sprmn24 (PR #13182). ### Docs - README.md: removed "Native Windows is not supported"; added PowerShell one-liner and Git-for-Windows prerequisite note. - `website/docs/getting-started/installation.md`: new Windows section with capability matrix (everything native except the dashboard `/chat` PTY tab, which is WSL2-only). - `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as "WSL2 as an alternative to native" rather than "the only way". - `website/docs/developer-guide/contributing.md`: updated cross-platform guidance with the `signal.SIGKILL` / `OSError` rules we enforce now. - `website/docs/user-guide/features/web-dashboard.md`: acknowledged native Windows works for everything except the embedded PTY pane. ## Why this shape Pulled from a survey of how other agent codebases handle native Windows (Claude Code, OpenCode, Codex, Cline): - All four treat Git Bash as the canonical shell on Windows, same as hermes already does in `tools/environments/local.py::_find_bash()`. - None of them force `SetConsoleOutputCP` — but they don't have to, Node/Rust write UTF-16 to the Win32 console API. Python does not get that for free, so we flip CP_UTF8 via ctypes. - None of them ship PowerShell-as-primary-shell (Claude Code exposes PS as a secondary tool; scope creep for this PR). - All of them use `taskkill /T /F` for force-kill on Windows, which is exactly what `gateway.status.terminate_pid(force=True)` does. ## Non-goals (deliberate scope limits) - No PowerShell-as-a-second-shell tool — worth designing separately. - No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's the hardest design call and needs a separate doc. - No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld cluster) — will do as follow-up if users hit actual breakage; most modern code already specifies it. ## Validation - 28 new tests in `tests/tools/test_windows_native_support.py` — all platform-mocked, pass on Linux CI. Cover: - `configure_windows_stdio` idempotency, opt-out, env-preservation - `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback - `getattr(signal, "SIGKILL", …)` fallback shape - `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior) - Source-level checks that all entry points call `configure_windows_stdio` - pty_bridge import-guard present in `web_server.py` - README no longer says "not supported" - 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass. - `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed pre-existing on main by stash-test). - `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure). - `tests/tools/test_process_registry.py` + `test_browser_*` pass. - Manual smoke: `import hermes_cli.stdio; import gateway.run; import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True` on Linux (as expected). ## Files - New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py` - Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, `hermes_cli/profiles.py`, `hermes_cli/gateway.py`, `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`, `hermes_cli/web_server.py`, `tools/browser_tool.py`, `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4 docs pages. Credits to everyone whose prior PR work informed these fixes — see the co-author trailers. All of the PRs listed in `~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL` / UTF-8 stdio / tzdata / README patterns found the same issues; this PR consolidates them. Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com> Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com> Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com> Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com> Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com> Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com> Co-authored-by: sprmn24 <oncuevtv@gmail.com> Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com> Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com> |
||
|
|
850413f120 |
feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form. |
||
|
|
674fad1483
|
fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888)
Reported: Ctrl+C during an active /goal loop felt like it did nothing — the agent would interrupt the current turn, then immediately queue another continuation and keep going until the session ended or the 20-turn budget ran out. Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally: block around self.chat(...) unconditionally. Whether the turn completed normally, got interrupted, or returned an empty string, the judge ran on whatever was in conversation_history and — because the judge is fail-open — a "continue" verdict pushed another CONTINUATION_PROMPT onto _pending_input. Ctrl+C was invisible to the hook. Fix: - chat() now captures result['interrupted'] onto self._last_turn_interrupted (resets to False at entry so early-returns don't leak prior state). - _maybe_continue_goal_after_turn() checks the flag first: on interrupt, auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print a one-liner pointing the user at /goal resume or /goal clear. No judge call, no continuation enqueued. - Also added an empty-response guard that mirrors gateway/run.py's _handle_message logic (empty reply → transient failure → skip judging so we don't trip the consecutive-parse-failures backstop unnecessarily). The goal stays in the DB as paused, so /goal resume recovers it after the user has sorted out whatever made them cancel. /goal clear still works as before for a full stop. Tests: tests/cli/test_cli_goal_interrupt.py covers: - interrupted turn pauses + doesn't queue + judge is NOT called - paused goal is resumable - empty / whitespace / missing assistant reply skips judging - healthy turn still enqueues continuation / marks done - chat() resets _last_turn_interrupted at entry (anti-leak guard) All 55 existing goal tests still pass. |
||
|
|
7338e5d9ba
|
fix(model-switch): prevent stale Ollama credentials after provider switch (#21703)
When switching from a custom local provider (e.g. ollama-launch) to a
cloud provider, two bugs caused the CLI to misbehave:
1. _explicit_api_key/_explicit_base_url were only updated when the switch
result had non-empty values (guarded by `if result.api_key:` etc.).
If the previous provider set these to Ollama values ("ollama",
"http://127.0.0.1:11434/v1"), those stale values leaked into the next
turn's _ensure_runtime_credentials() call and were forwarded to the
new provider's API endpoint, causing authentication/routing failures.
Fix: unconditionally write result.api_key/base_url into the explicit
fields after every successful switch. An empty string is the correct
sentinel — it tells _ensure_runtime_credentials to re-resolve from the
auth store / config rather than forwarding a stale override.
2. In AIAgent.switch_model(), `self.base_url = base_url or self.base_url`
kept the old Ollama localhost URL whenever the incoming base_url was an
empty string. For providers that use a native SDK (not an OpenAI-compat
endpoint), the caller passes base_url="" and expects the agent to clear
the field — not silently inherit Ollama's address.
Fix: only update self.base_url when base_url is truthy.
3. _handle_model_picker_selection() was called from the prompt_toolkit
Enter key binding without any exception guard. Any unexpected error
in the model-selection code path propagated through prompt_toolkit's
key-binding dispatcher and caused the entire TUI to exit — which the
user sees as "the terminal exits when I switch providers".
Fix: wrap the call in try/except and close the picker on failure.
|
||
|
|
d87c7b99e2
|
fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455)
- Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and Haiku 4.5 with updated source URLs (platform.claude.com) - Add _normalize_anthropic_model_name() to handle dot-notation variants (e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups - Fix silent token loss: ensure session row exists before UPDATE in both run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent) - Log token persistence failures at DEBUG level instead of swallowing them silently — makes undercounted analytics diagnosable - Surface reasoning tokens in CLI /usage and TUI usage panel - Add 'reasoning' and 'cost_status' fields to TUI Usage type |
||
|
|
edbbc96b55 | fix(cli): replace get_event_loop() with get_running_loop() to silence RuntimeWarning in process_loop thread (#19285) | ||
|
|
6e46f99e7e
|
fix(tui): surface backend error as visible text when final_response is empty (#21245)
When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.
Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.
Reported by @counterposition in PR #20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
|
||
|
|
f5a232af84 |
refactor: replace 'cmp' text with 🗜️ emoji in status bar
Address review feedback to use the clamp emoji (��️) instead of the plain text 'cmp' prefix for the compression count indicator. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
103e11926f |
feat(cli): show context compression count in status bar
Display the number of context compressions in the CLI status bar when compressions > 0, helping users understand conversation compression pressure during long sessions. - Wide layout (>=76 cols): shows 'cmp N' between context percent and duration - Medium layout (52-75 cols): shows 'cmp N' between percent and duration - Narrow layout (<52 cols): omitted to save space - Color-coded: dim for 1-4, warn for 5-9, bad for 10+ - Hidden when zero to keep the bar clean for new sessions Closes #18564 |
||
|
|
fb1ce793e6
|
feat(security): enable secret redaction by default (#17691, #20785) (#21193)
Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation). |
||
|
|
5044e1cbf1
|
fix(cli): submit LF enter in thin PTYs (#20896) | ||
|
|
906881c38b |
fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands
When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).
The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:
WARNING cli: process_loop unhandled error (msg may be lost):
[Errno 63] File name too long: "/goal Drive the space board..."
This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.
Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.
Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.
Reproducer (without the fix):
python -c "from cli import _detect_file_drop;
_detect_file_drop('/goal ' + 'a'*300)"
→ OSError: [Errno 63] File name too long
|
||
|
|
a0fedfbb1b
|
feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
|
||
|
|
76074d9ee6 | fix(cli): recover classic CLI output after resize | ||
|
|
e45df2e81e | fix(ui): reduce status-line jitter while scrolling | ||
|
|
e70e49016f
|
fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)
CPython's logging module is not reentrant-safe. `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.
When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.
Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path. This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself. Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.
Observed in the wild: debug report on 0.12.0 (commit
|
||
|
|
395dbcc873 |
feat(browser): add Lightpanda engine support with automatic Chrome fallback
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
|
||
|
|
3b750715a3
|
fix: resolve lazy session creation regressions (#18370 fallout) (#20363)
Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths. |
||
|
|
e805380b82 | Discover plugin commands during CLI dispatch | ||
|
|
4e6f51167d | fix(cli): fall back on invalid HERMES_MAX_ITERATIONS | ||
|
|
aacf36e943 | fix(cli): persist manual compress handoff | ||
|
|
20428f5e60
|
fix(tui): respect voice.record_key config (supersedes #19028, #19339) (#19835)
* fix(tui): respect voice.record_key config instead of hardcoded Ctrl+B
Classic CLI loaded ``voice.record_key`` from config.yaml and bound the
prompt-toolkit handler dynamically (``cli.py`` paths). The new TUI hard-
coded ``Ctrl+B`` everywhere — ``isVoiceToggleKey`` (input handler),
``/voice status`` ("Record key: Ctrl+B"), and ``/voice on`` ("Ctrl+B to
start/stop recording"). A user who set ``voice.record_key: ctrl+o``
(or any other key) saw the documented config silently ignored — only
Ctrl+B worked, the displayed shortcut lied about it.
Wire the configured key end to end through the existing channels:
* **Backend** (``tui_gateway/server.py``): ``voice.toggle`` action=status
AND action=on/off responses now include ``record_key``, sourced from
``config.get('voice', {}).get('record_key', 'ctrl+b')``.
* **Backend types** (``ui-tui/src/gatewayTypes.ts``): ``ConfigFullResponse``
now exposes ``config.voice.record_key`` and ``VoiceToggleResponse``
carries ``record_key`` so the TUI can both bind and display it.
* **Frontend parser/formatter** (``ui-tui/src/lib/platform.ts``):
``parseVoiceRecordKey()`` accepts ``ctrl+b`` / ``alt+r`` / ``cmd+space``
and the common aliases (``option``, ``cmd``, ``win``, …); falls back to
the documented Ctrl+B for empty / multi-character / malformed input so
a typo never silently disables the shortcut. ``formatVoiceRecordKey()``
renders for status text. ``isVoiceToggleKey`` now takes a parsed
``ParsedVoiceRecordKey`` argument; the hardcoded ``ch === 'b'`` is
gone. Default arg keeps existing call sites back-compat.
* **Hydration** (``ui-tui/src/app/useConfigSync.ts``,
``useMainApp.ts``): startup ``config.get full`` already runs; extract
``cfg.voice.record_key`` from it, parse, push into a new
``voiceRecordKey`` state, and forward to the input handler ctx
(``InputHandlerContext.voice.recordKey``). Mtime-poll path also
re-applies the parsed key so a hand-edit of config.yaml takes effect
the next tick — matches existing behaviour for display options.
* **Input handler** (``ui-tui/src/app/useInputHandlers.ts``):
``isVoiceToggleKey(key, ch, voice.recordKey)`` so the configured
binding fires.
* **Slash command** (``ui-tui/src/app/slash/commands/session.ts``):
``/voice status`` and ``/voice on`` use ``formatVoiceRecordKey`` on
the response's ``record_key`` instead of the hardcoded label.
Tests:
* ``parseVoiceRecordKey`` covers ctrl/alt/cmd/super aliases, multi-char
rejection, and empty fallback.
* ``formatVoiceRecordKey`` covers the doc examples (``Ctrl+B``,
``Ctrl+O``, ``Alt+R``, ``Cmd+B``).
* ``isVoiceToggleKey`` regression: ``ctrl+o`` configured → only ``o``
matches, not ``b``; ``alt+r`` matches both alt-bit and meta-bit
encodings (terminal protocol parity); omitted-arg call still binds
Ctrl+B for back-compat.
Full TUI suite (555 tests) passes; ``tsc --noEmit`` clean.
Fixes #18994
Co-authored-by: asheriif <ahmedsherif95@gmail.com>
* fix(tui): support named-key tokens in voice.record_key (space, enter, …)
Reviewer caught that the round-1 parser in #18994 rejected every
multi-character token, so a config value like ``ctrl+space`` (which the
CLI happily binds via prompt_toolkit's ``c-space`` rewrite in
``cli.py``) silently fell back to the documented Ctrl+B default —
re-introducing the same false-shortcut bug the PR was meant to fix,
just at a different surface.
Add explicit named-key support that mirrors what the CLI accepts:
* ``space`` (alias: ``spc``) → matches ``ch === ' '``
* ``enter`` (alias: ``return``, ``ret``) → matches ``key.return``
* ``tab`` → matches ``key.tab``
* ``escape`` (alias: ``esc``) → matches ``key.escape``
* ``backspace`` (alias: ``bs``) → matches ``key.backspace``
* ``delete`` (alias: ``del``) → matches ``key.delete``
``ParsedVoiceRecordKey`` gains an optional ``named`` field; ``ch``
holds either a single char (back-compat) or the canonical named token,
and the runtime matcher dispatches on ``named`` before checking the
modifier shape. Aliases collapse to one canonical name so
``ctrl+esc`` and ``ctrl+escape`` behave identically.
Unrecognised multi-character tokens (e.g. ``ctrl+spcae`` typo, or
unsupported keys like ``ctrl+f5``) still fall back to the Ctrl+B
default rather than silently disabling the binding — keeps the "typo
never silently kills the shortcut" guarantee.
Tests:
* ``parseVoiceRecordKey`` parametrised over every named token + each
alias variant.
* New ``isVoiceToggleKey`` cases for space (ch-based match), enter
(``key.return``), tab, escape, backspace, delete, including
modifier-mismatch negatives.
* ``formatVoiceRecordKey`` renders named keys in title case
(``Ctrl+Space``, ``Ctrl+Enter``).
* Existing fall-back-to-Ctrl+B contract preserved for empty input
AND unrecognised multi-char tokens.
Full TUI suite: 559/559 pass; ``tsc --noEmit`` clean.
Refs #18994 (round-1 review feedback)
Co-authored-by: asheriif <ahmedsherif95@gmail.com>
* test(tui): assert voice.toggle returns configured record_key
Salvage the backend regression from #19339 — asserts ``voice.toggle``
action=on AND action=status responses carry the configured
``voice.record_key`` end-to-end through ``_load_cfg()``. Keeps the
CLI→TUI parity contract visible in the Python test suite alongside
the existing frontend parser/matcher/formatter coverage from #19028.
* fix(tui): address Copilot review on #19835 voice.record_key wiring
Five tightenings on the parser + matcher + hydration surface, all
caught by the Copilot review on the PR — each one turns a silent
false-fire or display/binding skew into a deterministic behaviour.
* **isVoiceToggleKey ctrl branch was too permissive for named keys.**
The doc-default macOS Cmd+B muscle-memory fallback
(``isActionMod(key)`` on top of ``key.ctrl``) fired for every
configured key, so bare Esc — which hermes-ink reports with
``key.meta`` on some macOS terminals — triggered ``ctrl+escape``,
and Alt+Space / Alt+Tab triggered ``ctrl+space`` / ``ctrl+tab``.
Gate the fallback to the literal ``ctrl+b`` binding so any custom
chord requires the real Ctrl bit.
* **Alt branch guarded against Ctrl/Cmd co-press.** Without this,
Ctrl+Alt+<letter> and Cmd+Alt+<letter> also fired ``alt+<letter>``.
* **Dropped the ``meta`` modifier variant and its alias.** In
hermes-ink ``key.meta`` is Alt on xterm-style terminals and Cmd on
legacy macOS ones, so a literal ``meta+b`` config displayed as
``Cmd+B`` while matching Alt+B — exactly the kind of false
shortcut the PR was meant to remove. ``cmd`` / ``command`` now
collapse onto ``super`` (kitty-style ``key.super``, with a macOS
``key.meta`` fallback) and render as ``Cmd+B``. Unknown modifier
tokens fall back to the documented Ctrl+B default rather than
silently coercing to Ctrl.
* **Slash-command display/binding skew.** ``/voice status`` and
``/voice on`` rendered from the fresh gateway ``record_key``
response, but ``useInputHandlers()`` still bound the old key
until the next 5s mtime poll. Thread ``setVoiceRecordKey``
through ``SlashHandlerContext.voice`` and push the parsed spec
into frontend state on every response so text and binding stay
consistent.
* **Test coverage for the two paths Copilot flagged.** Added
vitest coverage for (a) the three-case ``/voice`` slash output
in ``createSlashHandler.test.ts`` and (b) the
``applyDisplay → voice.record_key`` hydration + omit-setter
back-compat paths in ``useConfigSync.test.ts``. Plus regression
cases for every false-fire scenario above.
Suite: 575/575 green, tsc --noEmit clean.
* fix(tui): address Copilot round-2 review on #19835
Three tightenings on the surface introduced in the round-1 fix:
* **``/voice tts`` reset custom bindings to Ctrl+B.** The ``tts`` branch
of ``voice.toggle`` omitted ``record_key`` from its response, so the
frontend's ``r.record_key ?? 'ctrl+b'`` coerced a user's custom
binding back to the default on every TTS toggle. Two-sided fix:
the backend now includes ``record_key`` on the ``tts`` branch (parity
with ``status``/``on``/``off``), and the slash handler only pushes
frontend state when the response actually carries ``record_key`` —
belt-and-suspenders against any future branch forgetting to include
it.
* **``super+b`` / ``win+b`` / ``cmd+b`` displayed "Cmd+B" on Linux and
Windows.** ``formatVoiceRecordKey`` rendered ``mod === 'super'`` as
``Cmd`` universally, which told non-mac users the wrong modifier to
press even though ``isVoiceToggleKey`` matched the right event bits.
Gate the label to ``isMac`` so non-mac renders ``Super+B``.
* **``control+b`` / ``ctrl + b`` lost the macOS Cmd+B fallback.**
``_isDefaultVoiceKey`` keyed off ``parsed.raw`` — so
semantically-equal aliases of the documented default dropped into
the strict branch even though they bind Ctrl+B. Compare on the
parsed spec (mod + ch + named) instead.
Coverage added: Linux ``Super+B`` rendering (and macOS ``Cmd+B``),
``control+b`` / ``ctrl + b`` accepting the Cmd+B fallback on darwin,
``/voice tts`` without ``record_key`` not clobbering cached binding,
and a backend regression asserting every ``voice.toggle`` branch
carries the configured key.
Suite: 579/579 TUI vitest green, 2/2 backend voice tests green,
tsc --noEmit clean.
* fix(tui): address Copilot round-3 review on #19835
Three classes of robustness issue caught on the second pass — all
revolve around malformed YAML tipping ``parseVoiceRecordKey`` or
``_voice_record_key`` into a crash instead of the documented
fallback.
* **Parser crashed on non-string YAML scalars.** ``config.get full``
returns raw ``yaml.safe_load`` output, so ``voice.record_key: 1``
or ``voice.record_key: true`` in a hand-edited config would hit
``.trim()`` on a number/bool and throw, breaking startup and
every mtime re-apply. Accept ``unknown`` at the signature, guard
with ``typeof raw !== 'string'``, and fall back to the default.
* **Backend blew up on non-dict ``voice:``.** Same YAML hazard on
the gateway side: ``voice: true`` / ``voice: cmd+b`` left
``_load_cfg().get("voice")`` as a bool/str, so ``.get("record_key")``
raised AttributeError and took every ``voice.toggle`` branch down
with it. Centralised the lookup in a single
``_voice_record_key()`` helper that ``isinstance``-guards both
``voice`` and ``record_key`` and falls back to ``ctrl+b``.
* **Multi-modifier chords silently dropped extras.** The previous
validator only checked the first modifier token, so ``ctrl+alt+r``
silently parsed as ``ctrl+r`` and ``cmd+ctrl+b`` as ``super+b`` —
a typo bound a different shortcut than the user configured.
Reject multi-modifier spellings outright; the classic CLI only
supports single-modifier bindings via prompt_toolkit's ``c-x`` /
``a-x`` rewrite, so this matches CLI parity.
Coverage added:
* ``parseVoiceRecordKey`` fallback on ``1`` / ``true`` / ``null`` /
``undefined`` / ``{}``.
* ``parseVoiceRecordKey`` fallback on ``ctrl+alt+r`` /
``cmd+ctrl+b`` / ``alt+ctrl+space``.
* ``test_voice_toggle_handles_non_dict_voice_cfg`` exercises
every non-dict ``voice:`` shape (bool, str, None, int, list) and
asserts each falls back to ``record_key: 'ctrl+b'``.
Suite: 581/581 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.
* fix(tui): address Copilot round-4 review on #19835
Four final corners of the voice.record_key surface:
* **Bare-char configs silently coerced to ``ctrl+<key>``.** A config
like ``voice.record_key: o`` / ``space`` / ``escape`` fell through
to the default ``mod = 'ctrl'`` and silently bound Ctrl+O, while
the classic CLI's prompt_toolkit would bind the raw key (no
rewrite) — so the two runtimes silently disagreed on what "o"
means. Require an explicit modifier; bare-char configs fall back
to the documented Ctrl+B default.
* **Reserved ctrl+<letter> bindings would never fire.**
``useInputHandlers()`` intercepts ``ctrl+c`` (interrupt),
``ctrl+d`` (quit), and ``ctrl+l`` (clear screen) before the voice
check runs, so those configs would be advertised in /voice
status but the advertised shortcut never actually triggers
push-to-talk. Added ``_RESERVED_CTRL_CHARS`` at parse time so
the user gets the documented default instead of a dead shortcut.
(``alt+c``, ``cmd+l``, etc. are not intercepted and stay usable.)
* **``_load_cfg()`` root itself may be a non-dict.**
``_voice_record_key()`` isinstance-guarded the ``voice`` subkey
but not the root — a malformed config.yaml that collapsed to a
scalar/list at the top level (``config.yaml: true`` or ``[]``)
would still raise on ``.get("voice")``. Added the top-level
guard too so every malformed shape falls back to ``ctrl+b``.
* **Stale header comment on ``isVoiceToggleKey``.** The doc-comment
still claimed "On macOS we additionally accept the platform
action modifier (Cmd) for the configured letter" even though the
implementation gates the Cmd fallback to the documented default
only. Rewrote to match.
Coverage added:
* ``parseVoiceRecordKey`` fallback on bare chars (``o``, ``b``,
``space``, ``escape``).
* ``parseVoiceRecordKey`` fallback on ``ctrl+c`` / ``ctrl+d`` /
``ctrl+l``; positive case for ``alt+c`` / ``cmd+l`` still usable.
* Backend ``test_voice_toggle_handles_non_dict_voice_cfg`` now
exercises 5 non-dict shapes at the YAML root too.
Suite: 583/583 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.
* fix(tui): address Copilot round-5 review on #19835
Three follow-ups on the voice matcher's modifier + shift discipline:
* **``super`` branch falsely fired on Alt+<key> / bare Esc on macOS.**
``isVoiceToggleKey`` accepted ``isMac && key.meta`` as a Cmd
fallback for the ``super`` modifier — but hermes-ink sets
``key.meta`` for plain Alt/Option AND for bare Escape on some
macOS terminals. A ``cmd+b`` config silently fired on Alt+B;
``cmd+space`` on Alt+Space; ``cmd+escape`` on bare Esc. Drop the
fallback and require the literal ``key.super`` bit. Legacy-
terminal users who need Cmd should upgrade to a kitty-protocol
terminal or bind ``alt+X`` explicitly.
* **Shift bit was never checked.** The parser rejects multi-
modifier configs like ``ctrl+shift+tab``, but the runtime
matcher didn't check ``key.shift`` — so ``ctrl+tab`` also fired
on Ctrl+Shift+Tab and ``alt+enter`` on Alt+Shift+Enter.
Early-return on ``key.shift === true`` so the runtime only fires
the exact chord the user configured.
* **Test leaked ``HERMES_VOICE=1`` into later tests.**
``voice.toggle`` action=on writes to ``os.environ`` directly
(CLI parity, runtime-only flag); ``test_voice_toggle_returns_
configured_record_key`` dispatched action=on without letting
monkeypatch take ownership of the var first. Any later test
that read voice mode in the same Python process could inherit a
stale enabled state. Added ``monkeypatch.setenv("HERMES_VOICE",
"0")`` up front so monkeypatch restores the original value at
teardown.
Coverage added:
* ``cmd+b`` / ``cmd+space`` / ``cmd+escape`` do NOT fire on
``key.meta``-only events on darwin.
* ``ctrl+tab`` / ``alt+enter`` / ``ctrl+o`` reject matches when
``key.shift`` is held; sanity cases without Shift still fire.
Suite: 585/585 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.
* fix(tui): address Copilot round-6 review on #19835
Three classes of modifier-discipline tightening + one config-surface
honesty fix:
* **Default ``ctrl+b`` Cmd fallback leaked Alt+B.** The default's
macOS Cmd+B muscle-memory path used ``isActionMod(key)``, which
returns ``key.meta || key.super`` on darwin. hermes-ink also
reports plain Alt as ``key.meta``, so Alt+B silently fired the
default binding. Replaced with strict ``isMac && key.super ===
true`` — kitty-style Cmd+B still works, Alt+B correctly
rejected. Legacy-terminal mac users (Terminal.app without
CSI-u) now get raw Ctrl+B only; the documented default still
works everywhere.
* **ctrl / super branches accepted extra modifier bits.** The
parser rejects multi-modifier configs like ``ctrl+alt+o``, but
the runtime matcher was permissive — ``ctrl+o`` fired on
Ctrl+Alt+O / Ctrl+Cmd+O, and ``super+b`` fired on Cmd+Alt+B /
Ctrl+Cmd+B. Added strict ``!key.alt && !key.meta && key.super
!== true`` on ctrl, and ``!key.ctrl && !key.alt && !key.meta``
on super, so the runtime only fires the exact chord the parser
would let you configure.
* **Dropped ``cmd`` / ``command`` aliases.** They parsed to
``super`` and rendered as ``Cmd+X``, but legacy macOS terminals
report Cmd as ``key.meta`` (same signal as Alt), so a
``cmd+o`` config was advertised as working but never actually
fired on Terminal.app-without-CSI-u. That recreated the
"displayed shortcut does not work" problem this PR was meant to
remove. Users who want the platform action modifier spell it
``super`` / ``win`` — that matches the unambiguous ``key.super``
bit, and kitty-style macOS terminals render it as ``Cmd+X`` via
platform-aware formatter.
Coverage updated:
* Default ctrl+b no longer fires on Alt+B via ``key.meta`` leak;
raw Ctrl+B and kitty-style Cmd+B still fire.
* ``ctrl+o`` rejects Ctrl+Alt+O / Ctrl+Cmd+O / Ctrl+Meta+O chords.
* ``super+b`` rejects Cmd+Alt+B / Cmd+Meta+B / Ctrl+Cmd+B chords.
* ``cmd+b`` / ``command+b`` / ``meta+b`` all fall back to the
documented default at parse time (joined the ambiguous-mac-mod
rejection class).
* Round-2 expectations that asserted ``cmd+b`` parsed as super
and accepted ``key.meta`` on darwin updated to reflect the new
stricter contract.
Suite: 588/588 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.
* fix(tui): address Copilot follow-up on wire typing + escape precedence
Two follow-ups from the latest Copilot pass:
* **Config wire typing honesty (`gatewayTypes.ts`)**
`config.get full` forwards raw `yaml.safe_load()` output, so
`voice.record_key` can be any scalar/container when hand-edited.
Typing it as `string` suggests a normalized contract that the
backend does not guarantee and makes unsafe callers more likely.
Change `ConfigVoiceConfig.record_key` to `unknown` with an
explicit comment that callers must normalize at runtime.
* **Escape-based voice bindings were swallowed before voice check**
`useInputHandlers()` handled `key.escape` for queue-edit cancel and
selection clear before `isVoiceToggleKey(...)`, so configured
`ctrl+escape` / `alt+escape` / `super+escape` chords were advertised
but never toggled recording in those UI states.
Add an early escape+voice check before generic Esc handlers so
escape-based voice bindings win when configured, while plain Esc
behavior remains unchanged.
Also updated PR #19835 description text to remove stale cmd/command
alias claims and match the current parser contract.
* fix(tui): pass configured voice shortcut through TextInput layer
Thread the live parsed voiceRecordKey into TextInput so configured voice.record_key chords bubble to useInputHandlers instead of being consumed as editor input. This removes the last hardcoded Ctrl+B pass-through in the composer path while preserving existing global control chord behavior.
* fix(tui): require explicit alt bit for escape-based alt chords
Hermes-ink reports bare Escape as meta=true+escape=true on some terminals, so a configured alt+escape binding was firing on bare Esc. Require an explicit key.alt bit when the configured named key is escape so plain Esc stays plain Esc; kitty-style alt+escape still fires.
* fix(tui): harden voice.record + TextInput paste + super-mod reserved list
Three round-7 Copilot follow-ups on #19835:
- voice.record start handler used _load_cfg().get('voice', {}).get(...) without
shape checks, so malformed YAML (bool/scalar/list) returned 5025 instead of
using VAD defaults. Centralized _voice_cfg_dict() helper and type-guarded
silence_threshold/silence_duration with numeric fallbacks.
- TextInput pass-through check moved above paste/copy handling so configured
voice chords (ctrl+v / alt+v / cmd+v) beat the composer's paste/copy
defaults.
- parser now also rejects super+{c,d,l,v} — on macOS those are
copy/exit/clear/paste and would be advertised in /voice status but never
actually toggle recording.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(tui): round-8 Copilot review — allow ctrl+x, gate super reservations to macOS, preserve voice key on transient RPC failure
Three round-8 Copilot follow-ups on #19835:
- Revert ctrl+x addition to _RESERVED_CTRL_CHARS (landed via Copilot Autofix
commit
|