Previous pass assumed both skills would always be loaded together, so
each description pointed at the other ('use concept-diagrams instead').
That breaks when only one skill is active — the agent reads 'use the
other skill' and there is no other skill.
Now each skill's description and scope section is fully self-contained:
- States what it's best suited for
- Lists subjects where a more specialized skill (if available) would be
a better fit, naming them only as 'consider X if available'
- Explicitly offers itself as a general SVG diagram fallback when no
more specialized skill exists
An agent loading either skill alone gets unambiguous guidance; an
agent with both loaded still gets useful routing via the 'consider X
if available' hints and the related_skills metadata.
Both skills generate SVG system diagrams, but for very different subjects
and aesthetics. The old descriptions didn't make the split clear, so an
agent loading either one couldn't confidently pick.
Changes:
- Rewrote both frontmatter descriptions to state the scope up front plus
an explicit 'for X, use the other skill instead' pointer.
- Added a symmetric 'When to use this skill vs <other>' decision table
to the top of each SKILL.md body, so the guidance is visible whether
the agent is reading frontmatter or full content.
- Added architecture-diagram <-> concept-diagrams to each other's
related_skills metadata.
Rule of thumb baked into both skills:
software/cloud infra -> architecture-diagram
physical / scientific / educational -> concept-diagrams
Salvage of PR #11045 (original by v1k22). Changes on top of the
original commit:
- Rename 'architecture-visualization-svg-diagrams' -> 'concept-diagrams'
to differentiate from the existing architecture-diagram skill.
architecture-diagram stays as the dark-themed Cocoon-style option for
software/infra; concept-diagrams covers physics, chemistry, math,
engineering, physical objects, and educational visuals.
- Trigger description scoped to actual use cases; removed the 'always
use this skill' language and long phrase-capture list to stop
colliding with architecture-diagram, excalidraw, generative-widgets,
manim-video.
- Default output is now a standalone self-contained HTML file (works
offline, no server). The preview server is opt-in and no longer part
of the default workflow.
- When the server IS used: bind to 127.0.0.1 instead of 0.0.0.0 (was a
LAN exposure hazard on shared networks) and let the OS pick a free
ephemeral port instead of hard-coding 22223 (collision prone).
- Shrink SKILL.md from 1540 to 353 lines by extracting reusable
material into linked files:
- templates/template.html (host page with full CSS design system)
- references/physical-shape-cookbook.md
- references/infrastructure-patterns.md
- references/dashboard-patterns.md
All 15 examples kept intact.
- Add dhandhalyabhavik@gmail.com -> v1k22 to AUTHOR_MAP.
Preserves v1k22's authorship on the underlying commit.
- SKILL.md with full SVG design system (color palette, typography, spacing, dark mode)
- 15 example diagrams covering flowcharts, physical structures, chemistry, charts, floor plans, and more
- Supports 8 diagram types: flowchart, structural, API map, microservice, data flow, physical, infrastructure, UI mockups
- Auto-hosts diagrams on 0.0.0.0:22223 as interactive web pages
Inbound Feishu messages arriving during brief windows when the adapter
loop is unavailable (startup/restart transitions, network-flap reconnect)
were silently dropped with a WARNING log. This matches the symptom in
issue #5499 — and users have reported seeing only a subset of their
messages reach the agent.
Fix: queue pending events in a thread-safe list and spawn a single
drainer thread that replays them once the loop becomes ready. Covers
these scenarios:
* Queue events instead of dropping when loop is None/closed
* Single drainer handles the full queue (not thread-per-event)
* Thread-safe with threading.Lock on the queue and schedule flag
* Handles mid-drain bursts (new events arrive while drainer is working)
* Handles RuntimeError if loop closes between check and submit
* Depth cap (1000) prevents unbounded growth during extended outages
* Drops queue cleanly on disconnect rather than holding forever
* Safety timeout (120s) prevents infinite retention on broken adapters
Based on the approach proposed in #4789 by milkoor, rewritten for
thread-safety and correctness.
Test plan:
* 5 new unit tests (TestPendingInboundQueue) — all passing
* E2E test with real asyncio loop + fake WS thread: 10-event burst
before loop ready → all 10 delivered in order
* E2E concurrent burst test: 20 events queued, 20 more arrive during
drainer dispatch → all 40 delivered, no loss, no duplicates
* All 111 existing feishu tests pass
Related: #5499, #4789
Co-authored-by: milkoor <milkoor@users.noreply.github.com>
* feat(image_gen): multi-model FAL support with picker in hermes tools
Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.
Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)
Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
whitelist, and upscale flag. Three size families handled uniformly:
image_size_preset (flux family), aspect_ratio (nano-banana), and
gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
into model-specific payloads, merges defaults, applies caller overrides,
wires GPT quality_setting, then filters to the supports whitelist — so
models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
providers (Replicate, Stability, etc.); each provider entry tags itself
with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.
Config:
image_gen.model = fal-ai/flux-2/klein/9b (new)
image_gen.quality_setting = medium (new, GPT only)
image_gen.use_gateway = bool (existing)
Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.
Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.
Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.
Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).
* feat(image_gen): translate managed-gateway 4xx to actionable error
When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
1. The rejected model ID + HTTP status
2. Two remediation paths: set FAL_KEY for direct access, or
pick a different model via `hermes tools`
5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).
Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.
Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.
Docs: brief note in image-generation.md for Nous subscribers.
Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.
* feat(image_gen): pin GPT-Image quality to medium (no user choice)
Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:
1. Nous Portal billing complexity — the 22x cost spread between tiers
($0.009 low / $0.20 high) forces the gateway to meter per-tier per
user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
credit ~6x faster than `medium`.
This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:
- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
config.yaml, no code path reads it — always sends medium.
Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
(5 tests) — proves medium is baked in, config is ignored, flag is
removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
picker call fires when gpt-image is selected (no quality follow-up).
Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.
* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section
Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
The helper used ${var,,} (bash 4+ lowercase parameter expansion) and
[[ =~ ]], which fail on macOS default /bin/bash (3.2.57) with:
bash: ${default,,}: bad substitution
With 'set -e' at the top of the script, that aborts the whole
installer for macOS users who don't have a newer bash on PATH.
Replace the lowercase expansions with POSIX-style case patterns
(`[yY]|[yY][eE][sS]|...`) that behave identically and parse cleanly
on bash 3.2. Verified with a 15-case behavior test on both bash 3.2
and bash 5.2 — all pass.
Re-land of #10933, now guarded by the tests in #11266.
When a provider drops a TCP connection mid-stream, the socket can enter
CLOSE-WAIT and ''epoll_wait'' may never fire — no data or error signal
arrives, so the httpx read timeout never triggers and the agent hangs
indefinitely. The other defenses (''_force_close_tcp_sockets'', stale
stream detector) all ride on the socket layer reporting the dead
connection, which it never does without probes.
Inject ''SO_KEEPALIVE'' + ''TCP_KEEPIDLE''/''KEEPINTVL''/''KEEPCNT''
into the httpx transport. Kernel probes after 30s idle, retries every
10s, gives up after 3 → dead peer detected within ~60s instead of
hanging forever. Platform-aware: ''TCP_KEEPIDLE'' on Linux,
''TCP_KEEPALIVE'' on macOS. Silent no-op on Windows or anywhere
the socket options aren't available.
The original land (#10933) mutated ''client_kwargs'' in place when it
injected the ''httpx.Client''. Since callers pass ''self._client_kwargs''
by reference, the injected client leaked into the instance state. After
the first request, the OpenAI SDK closed its ''http_client'' — including
the injected one. The next ''_create_openai_client'' call re-read the
now-closed ''httpx.Client'' from ''self._client_kwargs'' and every
subsequent chat raised ''APIConnectionError'' with cause ''RuntimeError:
Cannot send a request, as the client has been closed'' (AlexKucera's
Discord report, 2026-04-16).
The defensive ''client_kwargs = dict(client_kwargs)'' copy already on
main (taeuk178's #10978) means this injection only lands in the
per-call local copy. Each ''_create_openai_client'' invocation gets
its OWN fresh ''httpx.Client'' whose lifetime is tied to the paired
''OpenAI'' client. When that ''OpenAI'' client is closed (rebuild,
teardown, credential rotation), its ''httpx.Client'' closes with it
and the next call constructs a fresh one — no stale closed transport
can be reused.
Full 4-test matrix all green (unit + live with real OpenRouter round
trips, HERMES_LIVE_TESTS=1):
tests/run_agent/test_create_openai_client_kwargs_isolation.py PASS
tests/run_agent/test_create_openai_client_reuse.py PASS (2)
tests/run_agent/test_sequential_chats_live.py PASS
Socket options verified on the live httpx transport:
_socket_options: [(1, 9, 1), (6, 4, 30), (6, 5, 10), (6, 6, 3)]
= (SO_KEEPALIVE=1, TCP_KEEPIDLE=30s, TCP_KEEPINTVL=10s, TCP_KEEPCNT=3)
Sequential-chat reproduction of the #10933 failure was explicitly
run against this patch — the defensive copy on main prevents the
closed transport from leaking back into ''self._client_kwargs'', so
every rebuild constructs a fresh transport.
Closes#10324
PR #4918 fixed the double-/v1 bug at fresh agent init by stripping the
trailing /v1 from OpenCode base URLs when api_mode is anthropic_messages
(so the Anthropic SDK's own /v1/messages doesn't land on /v1/v1/messages).
The same logic was missing from the /model mid-session switch path.
Repro: start a session on opencode-go with GLM-5 (or any chat_completions
model), then `/model minimax-m2.7`. switch_model() correctly sets
api_mode=anthropic_messages via opencode_model_api_mode(), but base_url
passes through as https://opencode.ai/zen/go/v1. The Anthropic SDK then
POSTs to https://opencode.ai/zen/go/v1/v1/messages, which returns the
OpenCode website 404 HTML page (title 'Not Found | opencode').
Same bug affects `/model claude-sonnet-4-6` on opencode-zen.
Verified upstream: POST /v1/messages returns clean JSON 401 with x-api-key
auth (route works), while POST /v1/v1/messages returns the exact HTML 404
users reported.
Fix mirrors runtime_provider.resolve_runtime_provider:
- hermes_cli/model_switch.py::switch_model() strips /v1 after the OpenCode
api_mode override when the resolved mode is anthropic_messages.
- run_agent.py::AIAgent.switch_model() applies the same strip as
defense-in-depth, so any direct caller can't reintroduce the double-/v1.
Tests: 9 new regression tests in tests/hermes_cli/test_model_switch_opencode_anthropic.py
covering minimax on opencode-go, claude on opencode-zen, chat_completions
(GLM/Kimi/Gemini) keeping /v1 intact, codex_responses (GPT) keeping /v1
intact, trailing-slash handling, and the agent-level defense-in-depth.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:
tools/environments/daytona.py
- PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
against the same sandbox don't collide on the remote temp path
- Move sync_back() inside the cleanup lock and after the _sandbox-None
guard, with its own try/except. Previously a no-op cleanup (sandbox
already cleared) still fired sync_back → 3-attempt retry storm against
a nil sandbox (~6s of sleep). Now short-circuits cleanly.
tools/environments/file_sync.py
- Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
tar larger than the limit. Protects against runaway sandboxes
producing arbitrary-size archives.
- Add 'nothing previously pushed' guard at the top of sync_back(). If
_pushed_hashes and _synced_files are both empty, the FileSyncManager
was never initialized from the host side — there is nothing coherent
to sync back. Skips the retry/backoff machinery on uninitialized
managers and eliminates test-suite slowdown from pre-existing cleanup
tests that don't mock the sync layer.
tests/tools/test_file_sync_back.py
- Update _make_manager helper to seed a _pushed_hashes entry by default
so sync_back() exercises its real path. A seed_pushed_state=False
opt-out is available for noop-path tests.
- Add TestSyncBackSizeCap with positive and negative coverage of the
new cap.
tests/tools/test_sync_back_backends.py
- Update Daytona bulk download test to assert the PID-suffixed path
pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR #8018 by @alt-glitch onto current main.
On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.
Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- SIGINT trap + defer (prevents partial writes on Ctrl-C)
- fcntl.flock serialization (concurrent gateway sandboxes)
- Last-write-wins conflict resolution with warning
- New remote files pulled back via _infer_host_path prefix matching
Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()
Fixes applied during salvage (vs original PR #8018):
| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |
Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).
Based on #8018 by @alt-glitch.
Sticky prompt:
The loop was skipping `first` (the first row in the viewport) when
looking for a user message scrolled above the top edge. If `first`
itself was a user row that had just ticked above the viewport, we'd
fall through the early-return guard (`role === 'user' && !above`),
then walk from `first - 1` backward — never rechecking `first`, never
finding anything, returning '' and leaving the sticky empty. This is
why it felt "stuck" at the start: one-turn sessions with the user row
exactly at/near the top never surfaced the breadcrumb.
Collapsed the two branches into one loop starting at `first`: nearest
user wins — still-on-screen → empty (redundant to echo), already
above → text. Same semantics, covers the gap.
Scrollbar:
`useSyncExternalStore` snapshot was `scrollTop:vp:scrollHeight` —
scrollHeight ticks up by ~1 row on every streamed chunk, forcing a
re-render per chunk. Quantized snapshot to the displayed values
(`thumbTop:thumbSize:vp`) so we only re-render when the bar actually
changes. Drops render count per turn by ~100x during streaming and
stops the "constantly resizes" flicker.
The gateway was gating `reasoning.delta` and `reasoning.available`
behind `_reasoning_visible(sid)` (true iff `display.show_reasoning:
true` or `tool_progress_mode: verbose`). With the default config,
neither was true — so reasoning events never reached the TUI,
`turn.reasoning` stayed empty, `reasoningTokens` stayed 0, and the
Thinking expander showed no token label for the whole turn. Tools
still reported tokens because `tool.start` had no such gate.
Then `message.complete` fired with `payload.reasoning` populated, the
TUI saved it into `msg.thinking`, and the finalized row's expander
sprouted "~36 tokens" post-hoc. That's the "tokens appear after the
turn" jank.
Remove the gate on emission. The TUI is responsible for whether to
display reasoning content (detailsMode + collapsed expander already
handle that). Token counting becomes continuous throughout the turn,
matching how tools work.
Also dropped the now-unused `_reasoning_visible` and
`_session_show_reasoning` helpers. `show_reasoning` config key stays
in place — it's still toggled via `/reasoning show|hide` and read
elsewhere for potential future TUI-side gating.
Two improvements:
1. The progress ToolTrail and the streaming MessageLine were two
sibling JSX blocks in appLayout with hand-rolled margin glue
between them. Extracted into `<StreamingAssistant>`, a single
component that owns both the trail and the streaming body plus
the 1-row gap between them. appLayout just hands it `progress`
and theme; the layout logic lives in one place, matching the
mental model that these two pieces are one live assistant turn.
2. Thinking token label was hidden when `reasoningTokens === 0` even
if the live reasoning text was already populated (the
scheduleReasoning timer hadn't ticked, or the model sent no
reasoning but the text was coming in via reasoning.delta).
Changed the tokenCount fallback from `reasoningTokens !==
undefined ? reasoningTokens : estimate` to `reasoningTokens > 0 ?
... : estimate` so the label appears the moment text exists.
`appLayout` was passing `busy={ui.busy && !progress.streaming}` into
ToolTrail, so the moment `message.delta` fired and streaming began,
the panel internally saw `busy=false`. With the prior fix in place
(hasThinking = !!cot || reasoningActive || busy), that flipped
hasThinking to false and the Thinking expander vanished mid-turn —
reappearing only after message.complete when the finalized row
rendered with its own internal expander.
The `!progress.streaming` override was a defensive guard against the
panel implying "still thinking" once the response text was streaming.
But that's already handled inside ToolTrail — `streaming` prop on the
Thinking component uses `busy && reasoningStreaming`, and
reasoningStreaming is already falsey once recordMessageDelta calls
endReasoningPhase.
Pass plain `busy={ui.busy}`. Panel stays up start-to-finish; handoff
to the finalized-message row is continuous.
Finalized assistant messages rendered the thinking/tools trail inside
MessageLine with marginBottom=1 before the response body — giving a
clean blank line above the text. The streaming path rendered the
progress ToolTrail and the streaming MessageLine as two separate
siblings with no margin between, so the in-progress response butted
right up against the thinking panel. That's the "newline appears
after it's done" jank.
Wrap the streaming MessageLine in a Box with marginTop=1 whenever the
progress area is visible above it. Same spacing as the finalized
version, continuous through the handoff.
Previously `hasThinking = !!cot || reasoningActive || (busy && !hasTools)`
so the moment a tool started streaming (`hasTools` → true) the expander
vanished mid-turn. If the model also produced no `reasoning.delta`
events (reasoning-less models, or reasoning arriving after tools), the
whole turn ran with no Thinking row — then `message.complete`
populated `msg.thinking` from the payload's post-hoc reasoning trace
and the expander suddenly appeared in the transcript AFTER the turn.
Drop the `!hasTools` restriction. The Thinking row now anchors for the
entire `busy` window; tools and thinking coexist as sibling sections
(they already did — the exclusion was a UX mistake). Reasoning-less
models show a dim empty header; streaming models show live content;
tool-interleaved turns keep the anchor visible throughout.
The status bar was showing stale lifecycle text ("running…") while the
face+verb stream flickered through the thinking panel as Python pushed
thinking.delta events. That's backwards — the face ticker is the
primary "I'm alive" signal, it belongs in the status bar; the thinking
panel is for substantive reasoning and tool activity.
Status bar now reads `ui.busy`: when true, renders a local `<FaceTicker>`
cycling FACES × VERBS on a 2.5s interval, unaffected by server events.
When false, the bar shows the actual status string (ready, starting
agent…, interrupted, etc.).
Side effect: `scheduleThinkingStatus` still patches `ui.status` with
Python's face text, but while busy the bar ignores that string and uses
the ticker instead. No server-side changes needed — Python keeps
emitting thinking.delta as a liveness heartbeat, the TUI just doesn't
let it fight the status bar.
"PT" was internal shorthand for prompt_toolkit that leaked into
AGENTS.md and the TUI post-mortem. Spell it out.
- AGENTS.md: "PT CLI" → "classic (prompt_toolkit) CLI"
- docs/plans/2026-04-01-ink-gateway-tui-migration-plan.md: both hits
"Ink" is the React reconciler — implementation detail, not branding.
Consistent naming: the classic CLI is the CLI, the new one is the TUI.
Updated docs: user-guide/tui.md, user-guide/cli.md cross-link, quickstart,
cli-commands reference, environment-variables reference.
Updated code: main.py --tui help text, server.py user-visible setup
error, AGENTS.md "TUI Architecture" section.
Kept "Ink" only where it is literally the library (hermes-ink internal
source comments, AGENTS.md tree note flagging ui-tui/ as a React/Ink dir).
New primary guide at `user-guide/tui.md` covering launch, requirements,
keybindings, slash commands, status line, configuration, sessions, and
the revert path. Matches the voice of `user-guide/cli.md`.
Cross-links:
- `user-guide/cli.md`: tip callout pointing readers at the Ink TUI
- `getting-started/quickstart.md`: shows both `hermes` and `hermes --tui`
under "Start Chatting" so first-run users know they have the choice
- `reference/environment-variables.md`: new "Interface" section with
`HERMES_TUI` and `HERMES_TUI_DIR`
- `reference/cli-commands.md`: `--tui` and `--dev` added to global options
Sidebar: `user-guide/tui` slotted right after `user-guide/cli`.
All 61 TUI-related tests green across 3 consecutive xdist runs.
tests/tui_gateway/test_protocol.py:
- rename `get_messages` → `get_messages_as_conversation` on mock DB (method
was renamed in the real backend, test was still stubbing the old name)
- update tool-message shape expectation: `{role, name, context}` matches
current `_history_to_messages` output, not the legacy `{role, text}`
tests/hermes_cli/test_tui_resume_flow.py:
- `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes
setup" before `_launch_tui` was ever reached; 3 tests stubbed
`_resolve_last_session` + `_launch_tui` but not the gate
- factored a `main_mod` fixture that stubs `_has_any_provider_configured`,
reused by all three tests
tests/test_tui_gateway_server.py:
- `test_config_set_personality_resets_history_and_returns_info` was flaky
under xdist because the real `_write_config_key` touches
`~/.hermes/config.yaml`, racing with any other worker that writes
config. Stub it in the test.
The Discord voice receive path skipped RFC 3550 §5.1 padding handling,
passing padding-contaminated payloads into DAVE E2EE decrypt and Opus
decode. Symptoms in live VC sessions: deaf inbound speech, intermittent
empty STT results, "corrupted stream" decode errors — especially on the
first reply after join.
When the P bit is set in the RTP header, the last payload byte holds the
count of trailing padding bytes (including itself) that must be removed.
Receive pipeline now follows the spec order:
1. RTP header parse
2. NaCl transport decrypt (aead_xchacha20_poly1305_rtpsize)
3. strip encrypted RTP extension data from start
4. strip RTP padding from end if P bit set ← was missing
5. DAVE inner media decrypt
6. Opus decode
Drops malformed packets where pad_len is 0 or exceeds payload length.
Adds 7 integration tests covering valid padded packets, the X+P combined
case, padding under DAVE passthrough, and three malformed-padding paths.
Closes#11267
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the dashboard connects to a remote gateway via GATEWAY_HEALTH_URL,
display the URL instead of the remote PID (which is meaningless locally).
Falls back to PID display for local gateways as before.
- Backend: expose gateway_health_url in /api/status response
- Frontend: prefer gateway_health_url over PID in gatewayValue()
- Add truncate + title tooltip for long URLs that overflow the card
- Add min-w-0/overflow-hidden on status cards for proper truncation
- Tests: verify gateway_health_url in remote and no-URL scenarios
Two new tests in tests/run_agent/ that pin the user-visible invariant
behind AlexKucera's Discord report (2026-04-16): no matter how a future
keepalive / transport fix for #10324 plumbs sockets in, sequential
chats on the same AIAgent instance must all succeed.
test_create_openai_client_reuse.py (no network, runs in CI):
- test_second_create_does_not_wrap_closed_transport_from_first
back-to-back _create_openai_client calls must not hand the same
http_client (after an SDK close) to the second construction
- test_replace_primary_openai_client_survives_repeated_rebuilds
three sequential rebuilds via the real _replace_primary_openai_client
entrypoint must each install a live client
test_sequential_chats_live.py (opt-in, HERMES_LIVE_TESTS=1):
- test_three_sequential_chats_across_client_rebuild
real OpenRouter round trips, with an explicit
_replace_primary_openai_client call between turns 2 and 3.
Error-sentinel detector treats 'API call failed after 3 retries'
replies as failures instead of letting them pass the naive
truthy check (which is how a first draft of this test missed
the bug it was meant to catch).
Validation:
clean main (post-revert, defensive copy present)
-> all 4 tests PASS
broken #10933 state (keepalive injection, no defensive copy)
-> all 4 tests FAIL with precise messages pointing at #10933
Companion to taeuk178's test_create_openai_client_kwargs_isolation.py,
which pins the syntactic 'don't mutate input dict' half of the same
contract. Together they catch both the specific mechanism of #10933
and any other reimplementation that breaks the sequential-call
invariant.
The language switcher displayed the *other* language's flag (clicking
the Chinese flag switched to Chinese). This is dissonant — a flag reads
as a state indicator first, so seeing the Chinese flag while the UI is
in English feels wrong. Users expect the flag to reflect the current
language, like every other status indicator.
Flips the flag and label ternaries so English shows UK + EN, Chinese
shows CN + 中文. Tooltip text ("Switch to Chinese" / "切换到英文") still
communicates the click action, which is where that belongs.
* fix(cli): stop approval panel from clipping approve/deny off-screen
The dangerous-command approval panel had an unbounded Window height with
choices at the bottom. When tirith findings produced long descriptions or
the terminal was compact, HSplit clipped the bottom of the widget — which
is exactly where approve/session/always/deny live. Users were asked to
decide on commands without being able to see the choices (and sometimes
the command itself was hidden too).
Fix: reorder the panel so title → command → choices render first, with
description last. Budget vertical rows so the mandatory content (command
and every choice) always fits, and truncate the description to whatever
row budget is left. Handle three edge cases:
- Long description in a normal terminal: description gets truncated at
the bottom with a '… (description truncated)' marker. Command and
all four choices always visible.
- Compact terminal (≤ ~14 rows): description dropped entirely. Command
and choices are the only content, no overflow.
- /view on a giant command: command gets truncated with a marker so
choices still render. Keeps at least 2 rows of command.
Same row-budgeting pattern applied to the clarify widget, which had the
identical structural bug (long question would push choices off-screen).
Adds regression tests covering all three scenarios.
* fix(cli): add compact chrome mode for approval/clarify panels on short terminals
Live PTY test at 100x14 rows revealed reserved_below=4 was too optimistic
— the spinner/tool-progress line, status bar, input area, separators, and
prompt symbol actually consume ~6 rows below the panel. At 14 rows, the
panel still got 'Deny' clipped off the bottom.
Fix: bump reserved_below to 6 (measured from live PTY output) and add a
compact-chrome mode that drops the blank separators between title/command
and command/choices when the full-chrome panel wouldn't fit. Chrome goes
from 5 rows to 3 rows in tight mode, keeping command + all 4 choices on
screen in terminals as small as ~13 rows.
Same compact-chrome pattern applied to the clarify widget.
Verified live in PTY hermes chat sessions at 100x14 (compact chrome
triggered, all choices visible) and 100x30 (full chrome with blanks, nice
spacing) by asking the agent to run 'rm -rf /tmp/sandbox'.
---------
Co-authored-by: Teknium <teknium@nousresearch.com>
Python (tui_gateway/server.py):
- hoist `_wait_agent` next to `_sess` so `_sess` no longer forward-refs
- simplify `_wait_agent`: `ready.wait()` already returns True when set,
no separate `.is_set()` check, collapse two returns into one expr
- factor `_sess_nowait` for handlers that don't need the agent (currently
`terminal.resize` + `input.detect_drop`) — DRY up the duplicated
`_sessions.get` + "session not found" dance
- inline `session = _sessions[sid]` in the session.create build thread so
agent/worker writes don't re-look-up the dict each time
- rename inline `ready_event` → `ready` (it's never ambiguous)
TS:
- `useSessionLifecycle.newSession`: hoist `r.info ?? null` into `info`
so it's one lookup, drop ceremonial `{ … }` blocks around single-line
bodies
- `createGatewayEventHandler.session.info`: wrap the case in a block,
hoist `ev.payload` into `info`, tighten comments
- `useMainApp` flush effect: collapse two guard returns into one
- `bootBanner.ts`: lift `TAGLINE` + `FALLBACK` to module constants, make
`GRADIENT` readonly, one-liner return via template literal
- `theme.ts`: group `selectionBg` inside the status* block (it's a UI
surface bg, same family), trim the comment
Users with 'commit.gpgsign = true' in their global git config got a
pinentry popup (or a failed commit) every time the agent took a
background filesystem snapshot — every write_file, patch, or diff
mid-session. With GPG_TTY unset, pinentry-qt/gtk would spawn a GUI
window, constantly interrupting the session.
The shadow repo is internal Hermes infrastructure. It must not
inherit user-level git settings (signing, hooks, aliases, credential
helpers, etc.) under any circumstance.
Fix is layered:
1. _git_env() sets GIT_CONFIG_GLOBAL=os.devnull,
GIT_CONFIG_SYSTEM=os.devnull, and GIT_CONFIG_NOSYSTEM=1. Shadow
git commands no longer see ~/.gitconfig or /etc/gitconfig at all
(uses os.devnull for Windows compat).
2. _init_shadow_repo() explicitly writes commit.gpgsign=false and
tag.gpgSign=false into the shadow's own config, so the repo is
correct even if inspected or run against directly without the
env vars, and for older git versions (<2.32) that predate
GIT_CONFIG_GLOBAL.
3. _take() passes --no-gpg-sign inline on the commit call. This
covers existing shadow repos created before this fix — they will
never re-run _init_shadow_repo (it is gated on HEAD not existing),
so they would miss layer 2. Layer 1 still protects them, but the
inline flag guarantees correctness at the commit call itself.
Existing checkpoints, rollback, list, diff, and restore all continue
to work — history is untouched. Users who had the bug stop getting
pinentry popups; users who didn't see no observable change.
Tests: 5 new regression tests in TestGpgAndGlobalConfigIsolation,
including a full E2E repro with fake HOME, global gpgsign=true, and
a deliberately broken GPG binary — checkpoint succeeds regardless.
* - make buffered streaming
- fix path naming to expand `~` for agent.
- fix stripping of matrix ID to not remove other mentions / localports.
* fix(matrix): register MembershipEventDispatcher for invite auto-join
The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE
events are only dispatched when MembershipEventDispatcher is registered on the
client. Without it, _on_invite is dead code and the bot silently ignores all
room invites.
Closes#10094Closes#10725
Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz)
* fix(matrix): preserve _joined_rooms reference for CryptoStateStore
connect() reassigned self._joined_rooms = set(...) after initial sync,
orphaning the reference captured by _CryptoStateStore at init time.
find_shared_rooms() returned [] forever, breaking Megolm session rotation
on membership changes.
Mutate in place with clear() + update() so the CryptoStateStore reference
stays valid.
Refs #8174, PR #8215
* fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race
mautrix auto-registers DecryptionDispatcher when client.crypto is set.
The adapter also registered _on_encrypted_event for the same event type.
_on_encrypted_event had zero awaits and won the race to mark event IDs
in the dedup set, causing _on_room_message to drop successfully decrypted
events from DecryptionDispatcher. The retry loop masked this by re-decrypting
every message ~4 seconds later.
Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption;
genuinely undecryptable events are logged by mautrix and retried on next
key exchange.
Refs #8174, PR #8215
* fix(matrix): re-verify device keys after share_keys() upload
Matrix homeservers treat ed25519 identity keys as immutable per device.
share_keys() can return 200 but silently ignore new keys if the device
already exists with different identity keys. The bot would proceed with
shared=True while peers encrypt to the old (unreachable) keys.
Now re-queries the server after share_keys() and fails closed if keys
don't match, with an actionable error message.
Refs #8174, PR #8215
* fix(matrix): encrypt outbound attachments in E2EE rooms
_upload_and_send() uploaded raw bytes and used the 'url' key for all
rooms. In E2EE rooms, media must be encrypted client-side with
encrypt_attachment(), the ciphertext uploaded, and the 'file' key
(with key/iv/hashes) used instead of 'url'.
Now detects encrypted rooms via state_store.is_encrypted() and
branches to the encrypted upload path.
Refs: PR #9822 (charles-brooks)
* fix(matrix): add stop_typing to clear typing indicator after response
The adapter set a 30-second typing timeout but never cleared it.
The base class stop_typing() is a no-op, so the typing indicator
lingered for up to 30 seconds after each response.
Closes#6016
Refs: PR #6020 (r266-tech)
* fix(matrix): cache all media types locally, not just photos/voice
should_cache_locally only covered PHOTO, VOICE, and encrypted media.
Unencrypted audio/video/documents in plaintext rooms were passed as MXC
URLs that require authentication the agent doesn't have, resulting
in 401 errors.
Refs #3487, #3806
* fix(matrix): detect stale OTK conflict on startup and fail closed
When crypto state is wiped but the same device ID is reused, the
homeserver may still hold one-time keys signed with the previous
identity key. Identity key re-upload succeeds but OTK uploads fail
with "already exists" and a signature mismatch. Peers cannot
establish new Olm sessions, so all new messages are undecryptable.
Now proactively flushes OTKs via share_keys() during connect() and
catches the "already exists" error with an actionable log message
telling the operator to purge the device from the homeserver or
generate a fresh device ID.
Also documents the crypto store recovery procedure in the Matrix
setup guide.
Refs #8174
* docs(matrix): improve crypto recovery docs per review
- Put easy path (fresh access token) first, manual purge second
- URL-encode user ID in Synapse admin API example
- Note that device deletion may invalidate the access token
- Add "stop Synapse first" caveat for direct SQLite approach
- Mention the fail-closed startup detection behavior
- Add back-reference from upgrade section to OTK warning
* refactor(matrix): cleanup from code review
- Extract _extract_server_ed25519() and _reverify_keys_after_upload()
to deduplicate the re-verification block (was copy-pasted in two
places, three copies of ed25519 key extraction total)
- Remove dead code: _pending_megolm, _retry_pending_decryptions,
_MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after
removing _on_encrypted_event
- Remove tautological TestMediaCacheGate (tested its own predicate,
not production code)
- Remove dead TestMatrixMegolmEventHandling and
TestMatrixRetryPendingDecryptions (tested removed methods)
- Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator
- Trim comment to just the "why"
The blocking gateway approval wait at tools/approval.py called
`entry.event.wait(timeout=...)` which never touched the agent's
activity tracker. When a user was slow to respond to a /approve prompt
(or the gateway_timeout config was set higher than the default 300s),
the agent thread sat silent long enough for the gateway's inactivity
watchdog (agent.gateway_timeout, default 1800s) to kill it — even
though the agent was doing exactly the right thing and the user was
the one causing the delay.
The fix polls the event in 1s slices and calls touch_activity_if_due
between slices, mirroring the _wait_for_process() pattern in
tools/environments/base.py that covers the subprocess-waiting side of
the same problem. At the default 10s heartbeat cadence, a 300s
approval wait now pings activity ~30 times, well under the 1800s
idle threshold.
Observed in community user logs: 12 repeated 'Agent idle 1800s,
last_activity=executing tool: terminal' events across April 12-14.
Companion to PR #10501 which covered streaming / concurrent-tool /
Modal-backend gaps but did not touch approval.py.
Test: tests/tools/test_approval_heartbeat.py — verifies (1) heartbeats
fire during the wait, (2) user responses are still near-instant, and
(3) the approval path stays functional when the heartbeat helper
can't be imported.
Users (Teknium) report missing debug reports before the 1-hour auto-delete
fires. 6 hours gives enough window for async bug-report triage without
leaving sensitive log data on public paste services indefinitely.
Applies to both the CLI (hermes debug share) and gateway (/debug) paths.
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:
- tools/tts_tool.py: new _generate_gemini_tts() calls the
generativelanguage REST endpoint with responseModalities=[AUDIO],
wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
then ffmpeg-converts to MP3 or Opus depending on output extension.
For .ogg output, libopus is forced explicitly so Telegram voice
bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
status display, and API key prompt branch (accepts existing
GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
overrides, snake_case vs camelCase inlineData handling, HTTP error
surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
gemini config block and ffmpeg notes.
Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
Selection was falling back to SGR-7 inverse (fg ↔ bg per cell), which
fragments over syntax-highlighted content — each amber/gold/dim/cornsilk
fg turned into a different bg stripe, producing the staircase look.
Now `useMainApp` calls `selection.setSelectionBgColor()` with a muted
navy (`#3a3a55`) on theme change. `setSelectionBg` in screen.ts replaces
just the bg cell-by-cell while preserving fg/bold/dim/italic, so the
highlight is one solid color across the whole drag range and the text
stays readable in its original color.
Skins can override via `selection_bg` in their color map.
Dynamic-importing @hermes/ink + App costs ~170ms on cold start — during
that window the terminal was blank. Now `entry.tsx` writes a raw-ANSI
banner to stdout immediately after the TTY check, using hardcoded
DEFAULT_THEME colors. Ink's `<AlternateScreen>` wipes the normal-screen
buffer when it mounts, so the boot banner is replaced seamlessly by the
real React render a moment later — no double-banner, no flash.
T=2ms banner visible (vs. ~170ms before)
T=~170ms React + Ink mounts
T=~200ms alt screen takes over, Banner component repaints
Palette drift between `bootBanner.ts` and the live theme is harmless —
the live render overrides after ~200ms. Narrow terminals (cols < 98)
fall back to the one-line "⚕ NOUS HERMES" marker.
Post-async-session.create, `session.create` returns in ~1ms with partial
info and the real agent fires `session.info` ~1s later. Previously the
status bar went straight to 'ready' right after the instant RPC return,
which was misleading — `prompt.submit` would block server-side waiting
for the agent to finish building.
Now:
- `newSession`: status = 'starting agent…' when info has no `version`,
else 'ready' (covers the fast resume path too)
- `session.info` event: flips status to 'ready' only if it was
'starting agent…', preserving running/interrupted/error states
Previously `session.create` blocked for ~1.2s on `_make_agent` (mostly
`run_agent` transitive imports + AIAgent constructor). The UI waited
through that whole window before sid became known and the banner/panel
could render.
Now `session.create` returns immediately with `{session_id, info:
{model, cwd, tools:{}, skills:{}}}` and spawns a background thread that
does the real `_make_agent` + `_init_session`. When the agent is live,
the thread emits `session.info` with the full payload.
Python side:
- `_sessions[sid]` gets a placeholder dict with `agent=None` and a
`threading.Event()` named `agent_ready`
- `_wait_agent(session, rid, timeout=30)` blocks until the event is set
(no-op when already set or absent, e.g. for `session.resume`)
- `_sess()` now calls `_wait_agent` — so every handler routed through it
(prompt.submit, session.usage, session.compress, session.branch,
rollback.*, tools.configure, etc.) automatically holds until the agent
is live, but only during the ~1s startup window
- `terminal.resize` and `input.detect_drop` bypass the wait via direct
dict lookup — they don't touch the agent and would otherwise block
the first post-startup RPCs unnecessarily
TS side:
- `session.info` event handler now patches the intro message's `info`
in-place so the seeded banner upgrades to the full session panel when
the agent finishes initializing
- `appLayout` gates `SessionPanel` on `info.version` being present
(only set by `_session_info(agent)`, not by the partial payload from
`session.create`) — so the panel only appears when real data arrives
Net effect on cold start:
T=~400ms banner paints (seeded intro)
T=~245ms ui.sid set (session.create responds in ~1ms after ready)
T=~1400ms session panel fills in (real session.info event)
Pre-session keystrokes queue as before (already handled by the flush
effect); `prompt.submit` will wait on `agent_ready` on the Python side
when the flush tries to send before the agent is live.