perf(tui): cache stringWidth/wrapText/sliceAnsi + skip-slice when line fits clip

CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:

  Output.get() per-frame walk:                 24% total
    └─ sliceAnsi(line, from, to) per write:    18% total
  stringWidth(line) chain (cached + JS):       14% total

All three were re-doing identical work every frame: same string → same
clipped slice → same width.

Fixes:

1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII
   fast-path skips the cache (inline scan beats Map.get for short ASCII,
   the >90% case). String.charCodeAt scan up to 64 chars is cheaper than
   the regex fallback.

2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text):
   wrapAnsi is pure and the same content reflows identically every frame.

3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
   end-defined hot path used by Output.get().

4. Skip the slice entirely in Output.get() when the line already fits the
   clip box (startsBefore=false && endsAfter=false). Most transcript lines
   never exceed their container width, and tokenizing them just to slice
   (line, 0, width) was pure overhead. This single fast-path drops
   sliceAnsi from 18% → ~0% in the profile.

Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16 lines
via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS window
still render in full so reading-zone behavior is unchanged.

Validation, real-user CPU profile, page-up scroll on 11k-line session:

  Output.get() self-time:   24%   → 0.3%
  sliceAnsi total:          18%   → not in top 25
  stringWidth family:       14%   → ~3%
  idle:                     60.7% → 77.3%

Frame timings (synthetic page-up profile harness):

  dur p95:    ~10ms → 4.87ms
  dur p99:    25ms+ → 12.80ms
  yoga p99:   ~20ms → 1.87ms

The remaining CPU in the profile is Yoga layoutNode + React commit, which
is the irreducible work for this UI tree size.
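
The skip-slice fast path from fix 4 can be sketched roughly as follows. This is a minimal illustration, not the actual Output.get() code: `Clip`, `clipLine`, `sliceColumns`, and `sliceCallCount` are hypothetical names, and the stand-in slicer assumes plain text with one column per character (no ANSI escapes, no wide glyphs), where the real path would go through sliceAnsi and stringWidth.

```typescript
// Hypothetical stand-ins for illustration; the real Output.get() and sliceAnsi
// are not reproduced here.
type Clip = { x1: number; x2: number } // visible columns [x1, x2)

let sliceCalls = 0

// Lets callers observe how often the slow path ran.
function sliceCallCount(): number {
  return sliceCalls
}

// Plain-text stand-in for sliceAnsi: assumes one column per UTF-16 code unit
// and no ANSI escapes, so a substring is a correct column slice.
function sliceColumns(line: string, from: number, to: number): string {
  sliceCalls++
  return line.slice(from, to)
}

// Clip a line written at column `x` against the clip box.
function clipLine(line: string, x: number, clip: Clip): string {
  const width = line.length // stand-in for stringWidth(line)
  const startsBefore = x < clip.x1
  const endsAfter = x + width > clip.x2

  // Fast path: the line already fits, so there is nothing to cut and the
  // slicer (and its tokenizing) is never invoked.
  if (!startsBefore && !endsAfter) {
    return line
  }

  // Slow path: translate clip-box columns into offsets within the line.
  const from = startsBefore ? clip.x1 - x : 0
  const to = endsAfter ? Math.max(from, clip.x2 - x) : width

  return sliceColumns(line, from, to)
}
```

With an 80-column clip box, `clipLine('hello', 0, { x1: 0, x2: 80 })` returns the input string untouched and leaves the slice counter at zero; only lines that actually cross a clip edge pay for slicing.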
2026-05-24 05:41:40 +00:00 · 2026-04-26 19:28:09 -05:00 · c370e2e1e5
commit c370e2e1e5
parent 85e9a23efb
14 changed files with 450 additions and 42 deletions
--- a/ui-tui/packages/hermes-ink/src/ink/stringWidth.ts
+++ b/ui-tui/packages/hermes-ink/src/ink/stringWidth.ts
@@ -270,6 +270,58 @@ const bunStringWidth = typeof Bun !== 'undefined' && typeof Bun.stringWidth ===

 const BUN_STRING_WIDTH_OPTS = { ambiguousIsNarrow: true } as const

-export const stringWidth: (str: string) => number = bunStringWidth
+const rawStringWidth: (str: string) => number = bunStringWidth
  ? str => bunStringWidth(str, BUN_STRING_WIDTH_OPTS)
  : stringWidthJavaScript
+
+// Memoize stringWidth — it's pure, hot (~100k calls/frame per the comment
+// above), and the underlying impl scans every grapheme + tests EMOJI_REGEX.
+// CPU profile (Apr 2026) showed stringWidth dominating at 21% of total
+// runtime during scroll. Cache is global (vs per-frame) since the same
+// strings recur across frames in a stable transcript.
+//
+// Pure-ASCII short-strings (the >90% common case) skip the cache: the inline
+// loop in stringWidthJavaScript is already faster than a Map.get for them.
+const widthCache = new Map<string, number>()
+const WIDTH_CACHE_LIMIT = 8192
+
+export const stringWidth: (str: string) => number = str => {
+  if (!str) {
+    return 0
+  }
+
+  // ASCII fast-path detection — for short ASCII, skip the cache.
+  if (str.length <= 64) {
+    let asciiOnly = true
+
+    for (let i = 0; i < str.length; i++) {
+      const code = str.charCodeAt(i)
+
+      if (code >= 127 || code === 0x1b) {
+        asciiOnly = false
+        break
+      }
+    }
+
+    if (asciiOnly) {
+      return rawStringWidth(str)
+    }
+  }
+
+  const cached = widthCache.get(str)
+
+  if (cached !== undefined) {
+    return cached
+  }
+
+  const w = rawStringWidth(str)
+
+  if (widthCache.size >= WIDTH_CACHE_LIMIT) {
+    // Drop oldest entry — Map iteration order is insertion order.
+    widthCache.delete(widthCache.keys().next().value!)
+  }
+
+  widthCache.set(str, w)
+
+  return w
+}
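
A note on the eviction policy in the hunk above: the lookup reads with `widthCache.get` and never re-inserts on a hit, so as shown the cache evicts in insertion order (oldest inserted first) rather than by recency of use. The sketch below illustrates the difference under stated assumptions. `BoundedCache` and its `refreshOnHit` flag are hypothetical and not part of this patch. With the flag off it mirrors the insertion-order eviction in the hunk; with the flag on, a hit moves the key to the newest position, which is the usual way to get LRU behavior out of a `Map`.

```typescript
// Hypothetical helper, not part of the patch: a bounded string-keyed cache
// built on Map's guaranteed insertion-order iteration.
class BoundedCache<V> {
  private readonly map = new Map<string, V>()
  private readonly limit: number
  private readonly refreshOnHit: boolean

  constructor(limit: number, refreshOnHit = false) {
    this.limit = limit
    this.refreshOnHit = refreshOnHit
  }

  get(key: string): V | undefined {
    const value = this.map.get(key)

    if (value !== undefined && this.refreshOnHit) {
      // Delete + set moves the key to the end of the iteration order,
      // marking it as most recently used.
      this.map.delete(key)
      this.map.set(key, value)
    }

    return value
  }

  set(key: string, value: V): void {
    if (!this.map.has(key) && this.map.size >= this.limit) {
      // The first key in iteration order is the oldest remaining entry.
      const oldest = this.map.keys().next()

      if (!oldest.done) {
        this.map.delete(oldest.value)
      }
    }

    this.map.set(key, value)
  }

  has(key: string): boolean {
    return this.map.has(key)
  }
}
```

For a transcript where the same strings recur every frame, the two policies likely behave similarly until the cache fills; the refresh costs an extra delete and set per hit, which may or may not be worth it on a path this hot.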