CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:
  Output.get() per-frame walk:             24% total
  └─ sliceAnsi(line, from, to) per write:  18% total
  stringWidth(line) chain (cached + JS):   14% total
All three were re-doing identical work every frame: same string → same
clipped slice → same width.
Fixes:
1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; the ASCII
   fast path skips the cache (an inline scan beats Map.get for short
   ASCII, the >90% case), and a charCodeAt scan of up to 64 chars is
   cheaper than the regex fallback.
2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
is pure and the same content reflows identically every frame.
3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
end-defined hot path used by Output.get().
4. Skip the slice entirely in Output.get() when the line already fits the
clip box (startsBefore=false && endsAfter=false). Most transcript lines
never exceed their container width, and tokenizing them just to slice
(line, 0, width) was pure overhead. This single fast-path drops
sliceAnsi from 18% → ~0% in the profile.
Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.
Validation (real-user CPU profile, page-up scroll on an 11k-line session):
Output.get() self-time: 24% → 0.3%
sliceAnsi total: 18% → not in top 25
stringWidth family: 14% → ~3%
idle: 60.7% → 77.3%
Frame timings (synthetic page-up profile harness):
dur p95: ~10ms → 4.87ms
dur p99: 25ms+ → 12.80ms
yoga p99: ~20ms → 1.87ms
The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
  sink. Ring buffer of the last 30 frame timestamps; fps = 29
  intervals / elapsed time. trackFrame is undefined when SHOW_FPS
  is off, so ink's onFrame short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
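The ring-buffer fps computation can be sketched like this. The inline `atom()` is a stand-in for nanostores' atom so the sketch is self-contained; the field names on `$fps` are assumptions, not the real fpsStore API.

```typescript
// Self-contained sketch of the fpsStore described above.
function atom<T>(initial: T) {
  let value = initial;
  const subs = new Set<(v: T) => void>();
  return {
    get: () => value,
    set(v: T) {
      value = v;
      for (const fn of subs) fn(v);
    },
    subscribe(fn: (v: T) => void) {
      subs.add(fn);
      return () => subs.delete(fn);
    },
  };
}

const RING = 30; // 30 stamps → 29 intervals, hence fps = 29 / elapsed
const $fps = atom({ fps: 0, lastMs: 0, frames: 0 });

const stamps = new Array<number>(RING);
let head = 0;
let frames = 0;

function trackFrame(now: number): void {
  const prev = stamps[(head + RING - 1) % RING];
  stamps[head] = now;
  head = (head + 1) % RING;
  frames++;
  if (frames < 2) return; // need two stamps for an interval
  // Oldest stamp: slot 0 until the buffer wraps, then the slot that
  // head will overwrite next.
  const oldest = stamps[frames < RING ? 0 : head];
  const intervals = Math.min(frames, RING) - 1;
  const elapsed = now - oldest;
  $fps.set({
    fps: elapsed > 0 ? (intervals * 1000) / elapsed : 0,
    lastMs: now - prev,
    frames,
  });
}
```

Feeding this from ink's onFrame keeps it measuring real render completions, not a timer tick.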
Usage:
HERMES_TUI_FPS=1 hermes --tui
# bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.
Result: definitively NO. Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height. Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.
Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):
  metric            fullscreen      inline          delta
  patches_total     28,864          1,111,574       +3751%
  writeBytes_total  42 KB           1.6 MB          +3881%
  fps_throughput    15.8 fps        1.75 fps        -89%
  frames            179             18              -90%
  gap_p50_ms        17 (~60fps)     726 (~1fps)     +4170%
  yoga_p99          34 ms           405 ms          +1083%
  renderer_p99      14 ms           169 ms          +1062%
  flickers          0               5 (offscreen)   —
This is actually the cleanest data we've gotten so far:
* AlternateScreen is LOAD-BEARING for perf — its viewport height
constraint is what lets useVirtualHistory's culling work. No
constraint → ScrollBox grows unbounded → every fiber mounts.
* The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
frames. Our terminal-write hypothesis from last session was
wrong: the bottleneck is React + Yoga, not the wire.
* Doing proper inline mode (non-virtualized transcript in
scrollback, composer pinned below) is not a flag flip — it's a
different UI architecture. Leaving this flag in so anyone
re-running the experiment gets the same numbers, but not
building the architecture until we're sure the perf win is
worth the UX loss (it probably isn't — the fullscreen + virt
path is the one we should optimize, not replace).
Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
User observation: "it doesn't scroll line by line/row by row."
They were right. Two places hardcoded big deltas:
1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
Each wheel event scrolled 6 rows. A mechanical wheel notch emits
3-5 events → 18-30 rows per click, which visually teleports past
content instead of smooth-scrolling it. Drop to 1. Trackpads
emit 50-100 events per flick — at step=1 that's still a fast flick
(a whole viewport in one flick) but each intermediate frame is
visible. Porting claude-code's wheel accel state machine is the
right next step if this feels sluggish on precision scrolls.
2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
Full-viewport jumps replace the entire screen — no visual
continuity, can't scan content — AND land right at Ink's fast-path
threshold (`delta < innerHeight`), which disqualifies the DECSTBM
blit on every press. Half-viewport keeps 50% continuity AND
drops well under the threshold. Two presses still cover the same
total distance.
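The two step changes can be sketched as below. `pageStep` and `hitsFastPath` are illustrative names; `hitsFastPath` paraphrases Ink's `delta < innerHeight` DECSTBM-eligibility condition rather than quoting Ink's source.

```typescript
// Sketch of the scroll-step changes described above.
const WHEEL_SCROLL_STEP = 1; // was 6: one row per wheel event

// Was `viewport - 2`: half-viewport keeps 50% visual continuity and
// lands well under the fast-path threshold.
function pageStep(viewportHeight: number): number {
  return Math.max(1, Math.floor(viewportHeight / 2));
}

// Paraphrase of the DECSTBM blit eligibility check.
function hitsFastPath(delta: number, innerHeight: number): boolean {
  return delta < innerHeight;
}
```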
Profiled against the 1106-msg session, holding the key at 30Hz for
6s:
wheel_up (step 6 → 1):
  frames        142 → 163          (+15%)
  throughput    10.7 → 15.8 fps    (+48%)
  patches tot   53,018 → 36,562    (-31%)
  gap p50       5ms → 16ms         (actual rendering ~60fps now)
  <16ms frames  93 → 76
  16-33ms       82 → 76
  hitches       3 → 1
pageUp (viewport-2 → viewport/2):
throughput 10.7 → 9.5 fps (same ballpark — smaller delta × same
event rate = less total scroll)
Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing. With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.
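The drain cap interaction can be sketched as follows. This paraphrases the described behavior, not Ink's actual implementation; `drainScroll` is an illustrative name.

```typescript
// Cap the per-frame applied delta at innerHeight - 1 so the DECSTBM
// blit stays eligible (delta < innerHeight); leftover scroll drains
// on later frames.
function drainScroll(
  pending: number,
  innerHeight: number,
): { apply: number; remaining: number } {
  const apply = Math.min(pending, innerHeight - 1);
  return { apply, remaining: pending - apply };
}
```

With step=1 wheel events and half-viewport page jumps, `pending` almost always fits under the cap, which is why fast-path hit rate rises and per-frame patch volume falls.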
Tests/type-check/build clean; 352 tests pass.
- providers.ts: drop the `dup` intermediate, fold the ternary inline
- paths.ts (fmtCwdBranch): inline `b` into the `tag` template
- prompts.tsx (ConfirmPrompt): hoist a single `lower = ch.toLowerCase()`,
collapse the three early-return branches into two, drop the
redundant bounds checks on arrow-key handlers (setSel is idempotent
at 0/1), inline the `confirmLabel`/`cancelLabel` defaults at the
use site
- modelPicker.tsx / config/env.ts / providers.test.ts: auto-formatter
reflows picked up by `npm run fix`
- useInputHandlers.ts: drop the stray blank line that was tripping
perfectionist/sort-imports (pre-existing lint error)
Prevents accidental session loss: the first press prints
"press /clear again within 3s to confirm"; a second press inside
the window actually starts a new session. Outside the window the
gate re-arms.
Opt out with HERMES_TUI_NO_CONFIRM=1 for scripted / muscle-memory
workflows.
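The gate can be sketched as below, assuming a 3s window; `makeClearGate` and `CONFIRM_WINDOW_MS` are illustrative names, and the clock is injected for testability.

```typescript
// Two-press confirm gate: first press arms, a second press inside
// the window fires, anything outside the window re-arms.
const CONFIRM_WINDOW_MS = 3_000;

function makeClearGate(now: () => number = Date.now) {
  let armedAt: number | null = null;
  return {
    /** Returns true when the clear should actually run. */
    press(): boolean {
      const t = now();
      if (armedAt !== null && t - armedAt <= CONFIRM_WINDOW_MS) {
        armedAt = null; // consumed: run the clear
        return true;
      }
      armedAt = t; // first press, or expired window: (re-)arm
      return false;
    },
  };
}
```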
Refs #4069.
Hoist turn state from a 286-line hook into $turnState atom + turnController
singleton. createGatewayEventHandler becomes a typed dispatch over the
controller; its ctx shrinks from 30 fields to 5. Event-handler refs and 16
threaded actions are gone.
Fold three createSlash*Handler factories into a data-driven SlashCommand[]
registry under slash/commands/{core,session,ops}.ts. Aliases are data;
findSlashCommand does name+alias lookup. Shared guarded/guardedErr combinator
in slash/guarded.ts.
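The registry shape can be sketched like this. The `SlashCommand` fields and the example commands are assumptions modeled on the description, not the actual Hermes definitions.

```typescript
// Data-driven slash-command registry: aliases are data, and lookup
// is a single name + alias scan.
interface SlashCommand {
  name: string;
  aliases?: string[];
  run: (args: string) => void;
}

const commands: SlashCommand[] = [
  { name: "clear", aliases: ["new"], run: () => {} },
  { name: "model", aliases: ["m"], run: () => {} },
];

function findSlashCommand(input: string): SlashCommand | undefined {
  return commands.find(
    (c) => c.name === input || c.aliases?.includes(input),
  );
}
```

Adding a command becomes a one-entry data change instead of a new handler factory.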
Split constants.ts + app/helpers.ts into config/ (timing/limits/env),
content/ (faces/placeholders/hotkeys/verbs/charms/fortunes), domain/ (roles/
details/messages/paths/slash/viewport/usage), protocol/ (interpolation/paste).
Type every RPC response in gatewayTypes.ts (26 new interfaces); drop all
`(r: any)` across slash + main app.
Shrink useMainApp from 1216 → 646 lines by extracting useSessionLifecycle,
useSubmission, useConfigSync. Add <Fg> themed primitive and strip ~50
`as any` color casts.
Tests: 50 passing. Build + type-check clean.