fix(goals): auto-pause when judge model returns unparseable output

Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed open and continued on every such turn, burning the entire turn
budget with log lines like

  judge returned empty response
  judge reply was not JSON: "Let me analyze whether the goal..."

and /goal clear could not stop it mid-loop without /stop.

After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:

  ⏸ Goal paused — the judge model (3 turns) isn't returning the
  required JSON verdict. Route the judge to a stricter model in
  ~/.hermes/config.yaml:
    auxiliary:
      goal_judge:
        provider: openrouter
        model: google/gemini-3-flash-preview
  Then /goal resume to continue.

The counter resets on any turn that is not a parse failure (a usable
"done"/"continue" verdict, or an API error) and persists across
GoalManager reloads so cross-session resumes carry the correct state.

Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
Teknium 2026-05-07 17:19:47 -07:00
parent 03ddff8897
commit 307c85e5c1
4 changed files with 270 additions and 49 deletions

@@ -58,6 +58,7 @@ AUTHOR_MAP = {
"223003280+Abd0r@users.noreply.github.com": "Abd0r",
"abdielv@proton.me": "AJV20",
"mason@growagainorchids.com": "masonjames",
"ytchen0719@gmail.com": "liquidchen",
"am@studio1.tailb672fe.ts.net": "subtract0",
"axmaiqiu@gmail.com": "qWaitCrypto",
"159539633+MottledShadow@users.noreply.github.com": "MottledShadow",