fix(tui): anchor splitReasoning unclosed-tag regex to start of input (#29426)

`splitReasoning()` strips paired `<think>…</think>` blocks first, then runs
an unclosed-trailing regex to catch reasoning that hasn't yet streamed its
closer. That second regex was unanchored and greedy:

    new RegExp(`<${tag}>([\\s\\S]*)$`, 'i')

So any literal `<think>` somewhere in prose — a model quoting the tag, a
code example, or a stream-mid-tag before the closer arrives — consumed
every paragraph after it to EOF. User-visible symptom: "TUI eats last
paragraph of output," both during streaming and on settled turns.

Real reasoning streams always lead the message (that's the only place an
unclosed opener can legitimately appear during streaming). Anchor the
regex to `^\s*` so mid-prose mentions of the tag are preserved.

Empirical repro before the fix:

    splitReasoning('final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.')
    → text: 'final answer paragraph one.'        ← paragraph two GONE

After:

    → text: 'final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.'

Updated the existing trailing-unclosed test to lead with `<think>` (the
real-world shape) and added a regression test pinning the mid-text case.

ui-tui type-check clean, 808/808 vitest pass.
This commit is contained in:
brooklyn! 2026-05-20 14:09:38 -05:00 committed by GitHub
parent eeb747de25
commit 88f5186d35
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 24 additions and 4 deletions

View file

@ -21,11 +21,26 @@ describe('splitReasoning', () => {
expect(text).toBe('body')
})
it('treats unclosed trailing <think>… as reasoning', () => {
const { reasoning, text } = splitReasoning('answer start <think>still deciding')
it('treats unclosed leading <think>… as reasoning (real reasoning-model stream)', () => {
const { reasoning, text } = splitReasoning('<think>still deciding')
expect(reasoning).toBe('still deciding')
expect(text).toBe('answer start')
expect(text).toBe('')
})
it('does not strip trailing prose after a stray mid-text <think> mention', () => {
// Regression for "TUI eats last paragraph of output": when the model
// emits a literal `<think>` somewhere in prose (quoted explanation, code
// example, partial stream-mid-tag), the trailing greedy unclosed-tag
// regex used to consume every paragraph after it. Real unclosed
// reasoning blocks always lead the message — anchor to ^ so prose
// mentions are preserved.
const { reasoning, text } = splitReasoning(
'final answer paragraph one.\n\n<think>internal note never closed\n\nfinal answer paragraph two.'
)
expect(reasoning).toBe('')
expect(text).toBe('final answer paragraph one.\n\n<think>internal note never closed\n\nfinal answer paragraph two.')
})
it('returns empty reasoning and untouched text when no tags present', () => {

View file

@ -21,7 +21,12 @@ export function splitReasoning(input: string): SplitReasoning {
return ''
})
const unclosed = new RegExp(`<${tag}>([\\s\\S]*)$`, 'i')
// Anchor to start-of-input so a literal `<think>` mid-prose (model quoting
// the word, code blocks containing the tag, etc.) doesn't eat every
// paragraph after it. Real unclosed reasoning blocks always lead the
// message — that's how reasoning models stream. See test
// "does not strip trailing prose after a stray mid-text <think> mention".
const unclosed = new RegExp(`^\\s*<${tag}>([\\s\\S]*)$`, 'i')
text = text.replace(unclosed, (_m, inner: string) => {
const trimmed = inner.trim()