mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame` only. The resulting "at most one flush per frame" guarantee is fine for fast streams (>~80 tok/sec), where multiple tokens arrive within a single frame, but breaks down at typical LLM token rates (30-80 tok/sec), where tokens arrive more slowly than the rAF cadence and each one triggers its own React commit + Streamdown markdown re-parse.

Track `lastFlushAt` and require at least 33 ms between two flushes. React 18+ automatic batching already collapsed some of these commits, but not predictably; the floor makes it deterministic.

A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):

| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |

`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms, which is the expected signature of a deterministic floor.

Doesn't fully solve the user's perceived hitches: Streamdown's per-Block parse cost when the last block grows past ~2 k chars is still the elephant in the room. It does consistently shave the worst-case longtask and makes the streaming cadence visibly steadier.

Also threads a matching `flushMinMs` option through the synthetic stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs` so the harness can A/B both regimes without spending LLM credits.

See `scripts/profile-typing-lag.md` for the full investigation.
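The floor described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only `scheduleDeltaFlush`, `lastFlushAt` and `flushMinMs` come from the commit message, while `createDeltaFlusher`, its option names, and the choice to retry on the next frame when the floor has not yet elapsed are assumptions made for the example. The clock and frame scheduler are injected so the sketch also runs outside a browser.

```typescript
// Hypothetical sketch of rAF coalescing with a minimum spacing between flushes.
// createDeltaFlusher and its option names are illustrative, not from the repo.
type FrameScheduler = (cb: () => void) => void;

interface DeltaFlusherOptions {
  flushMinMs: number;            // minimum time between two flushes, e.g. 33
  flush: (text: string) => void; // consumer, e.g. a React state update
  now: () => number;             // clock in milliseconds
  raf: FrameScheduler;           // frame scheduler
}

function createDeltaFlusher(opts: DeltaFlusherOptions): (delta: string) => void {
  let pending = "";                            // deltas buffered since the last flush
  let scheduled = false;                       // true while a frame callback is queued
  let lastFlushAt = Number.NEGATIVE_INFINITY;  // so the first flush is never delayed

  const tick = (): void => {
    if (opts.now() - lastFlushAt < opts.flushMinMs) {
      // Floor not reached yet: keep buffering and retry on the next frame.
      opts.raf(tick);
      return;
    }
    scheduled = false;
    lastFlushAt = opts.now();
    const text = pending;
    pending = "";
    opts.flush(text);
  };

  return function scheduleDeltaFlush(delta: string): void {
    pending += delta;
    if (scheduled) return; // at most one queued callback at a time
    scheduled = true;
    opts.raf(tick);
  };
}
```

In a browser, `now` could be `() => performance.now()` and `raf` could wrap `requestAnimationFrame`. Retrying per frame means flushes land on the first frame at or after the floor; a timer-based wait would be another reasonable choice.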

- dashboard
- desktop
- shared