hermes-agent/ui-tui/src/lib/reasoning.ts
brooklyn! 88f5186d35
fix(tui): anchor splitReasoning unclosed-tag regex to start of input (#29426)
`splitReasoning()` strips paired `<think>…</think>` blocks first, then runs
an unclosed-trailing regex to catch reasoning that hasn't yet streamed its
closer. That second regex was unanchored and greedy:

    new RegExp(`<${tag}>([\\s\\S]*)$`, 'i')

So any literal `<think>` somewhere in prose — a model quoting the tag, a
code example, or a stream-mid-tag before the closer arrives — consumed
every paragraph after it to EOF. User-visible symptom: "TUI eats last
paragraph of output," both during streaming and on settled turns.

Real reasoning streams always lead the message (that's the only place an
unclosed opener can legitimately appear during streaming). Anchor the
regex to `^\s*` so mid-prose mentions of the tag are preserved.

Empirical repro before the fix:

    splitReasoning('final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.')
    → text: 'final answer paragraph one.'        ← paragraph two GONE

After:

    → text: 'final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.'

Updated the existing trailing-unclosed test to lead with `<think>` (the
real-world shape) and added a regression test pinning the mid-text case.

ui-tui type-check clean, 808/808 vitest pass.
2026-05-20 14:09:38 -05:00

55 lines
1.4 KiB
TypeScript

const TAGS = ['think', 'reasoning', 'thinking', 'thought', 'REASONING_SCRATCHPAD'] as const
export interface SplitReasoning {
reasoning: string
text: string
}
export function splitReasoning(input: string): SplitReasoning {
let text = input
const reasoning: string[] = []
for (const tag of TAGS) {
const paired = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>\\s*`, 'gi')
text = text.replace(paired, (_m, inner: string) => {
const trimmed = inner.trim()
if (trimmed) {
reasoning.push(trimmed)
}
return ''
})
// Anchor to start-of-input so a literal `<think>` mid-prose (model quoting
// the word, code blocks containing the tag, etc.) doesn't eat every
// paragraph after it. Real unclosed reasoning blocks always lead the
// message — that's how reasoning models stream. See test
// "does not strip trailing prose after a stray mid-text <think> mention".
const unclosed = new RegExp(`^\\s*<${tag}>([\\s\\S]*)$`, 'i')
text = text.replace(unclosed, (_m, inner: string) => {
const trimmed = inner.trim()
if (trimmed) {
reasoning.push(trimmed)
}
return ''
})
}
return {
reasoning: reasoning.join('\n\n').trim(),
text: text.trim()
}
}
export const hasReasoningTag = (input: string) => {
for (const tag of TAGS) {
if (input.includes(`<${tag}>`)) {
return true
}
}
return false
}