feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232)

Workers completing a kanban task can now claim the ids of cards they
created via an optional ``created_cards`` field on ``kanban_complete``.
The kernel verifies each id exists and was created by the completing
worker's profile; any phantom id blocks the completion with a
``HallucinatedCardsError`` and records a
``completion_blocked_hallucination`` event on the task so the rejected
attempt is auditable. Successful completions also get a non-blocking
prose-scan pass over their ``summary`` + ``result`` that emits a
``suspected_hallucinated_references`` event for any ``t_<hex>``
reference that doesn't resolve.

Closes #20017.

Recovery UX (kernel + CLI + dashboard)
--------------------------------------

A structural gate alone isn't enough — operators also need to see and
act on stuck workers, especially when a profile's model is the root
cause. This PR ships the full loop:

* ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that
  releases an active worker claim immediately (unlike
  ``release_stale_claims`` which only acts after claim_expires has
  passed). Emits a ``reclaimed`` event with ``manual: True`` payload.
* ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` —
  switch a task to a different profile, optionally reclaiming a stuck
  running worker in the same call.
* ``hermes kanban reclaim <id> [--reason ...]`` and
  ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]``
  CLI subcommands wired through to the same helpers.
* ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and
  ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the
  dashboard plugin.

Dashboard surfacing
-------------------

* ⚠ **warning badge** on cards with active hallucination events.
* **attention strip** at the top of the board listing all flagged
  tasks; dismissible per session.
* **events callout** in the task drawer — hallucination events render
  with a red left border, amber icon, and phantom ids as styled chips.
* **recovery section** in the task drawer with three actions: Reclaim,
  Reassign (with profile picker + reclaim-first checkbox), and a
  copy-to-clipboard hint for ``hermes -p <profile> model`` since
  profile config lives on disk and can't be edited from the browser.
  Auto-opens when the task has warnings, collapsed otherwise.
  Keyed by task id so state doesn't leak between drawers.

Active-vs-stale rule: warnings clear when a clean ``completed`` or
``edited`` event supersedes the hallucination, so recovery is never
permanently stigmatising — the audit events persist for debugging but
the badge goes away once the worker succeeds.

Skill updates
-------------

* ``skills/devops/kanban-worker/SKILL.md`` documents the
  ``created_cards`` contract with good/bad examples.
* ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering
  stuck workers" section with the three actions and when to use each.

Tests
-----

* Kernel gate: verified-cards manifest, phantom rejection + audit
  event, cross-worker rejection, prose scan positive + negative.
* Recovery helpers: reclaim on running task, reclaim on non-running
  returns False, reassign refuses running without reclaim_first,
  reassign with reclaim_first succeeds on running.
* API endpoints: warnings field present on /board and /tasks/:id,
  warnings cleared after clean completion, reclaim 200 + 409 paths,
  reassign 200 + 409 + reclaim_first paths.
* CLI smoke: reclaim + reassign subcommands.

Live-verified end-to-end on a dashboard with seeded scenarios:
attention strip renders, badges land on the right cards, drawer
callout shows phantom chips, Reclaim on a running task flips status to
ready + emits manual reclaimed event + refreshes the drawer,
Reassign swaps the assignee and triggers board refresh.

359/359 kanban-suite tests pass
(test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).
This commit is contained in:
Teknium 2026-05-05 08:06:55 -07:00 committed by GitHub
parent 7de3c86c5a
commit de9238d37e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
11 changed files with 1791 additions and 17 deletions

View file

@ -176,6 +176,74 @@ def _run_dict(r: kanban_db.Run) -> dict[str, Any]:
}
# Hallucination-warning event kinds — see complete_task() in kanban_db.py.
# completion_blocked_hallucination: kernel rejected created_cards with
# phantom ids; task stays in prior state.
# suspected_hallucinated_references: prose scan found t_<hex> in summary
# that doesn't resolve; completion succeeded, advisory only.
_WARNING_EVENT_KINDS = (
"completion_blocked_hallucination",
"suspected_hallucinated_references",
)
def _compute_warnings_for_tasks(
conn: sqlite3.Connection,
task_ids: Optional[list[str]] = None,
) -> dict[str, dict]:
"""Return {task_id: {count, kinds, latest_at}} for tasks with
hallucination warnings that occurred AFTER the most recent clean
completion event (completed / edited). An empty dict means no tasks
on the board have active warnings.
``task_ids`` narrows the query; pass ``None`` to scan the whole DB
(matches board-level rollup). Used by both the /board aggregate and
per-task /tasks/:id endpoints.
"""
params: tuple = ()
if task_ids is not None:
if not task_ids:
return {}
placeholders = ",".join(["?"] * len(task_ids))
sql = (
"SELECT task_id, kind, created_at FROM task_events "
f"WHERE task_id IN ({placeholders}) AND kind IN "
"('completion_blocked_hallucination', "
" 'suspected_hallucinated_references', "
" 'completed', 'edited') "
"ORDER BY task_id, id"
)
params = tuple(task_ids)
else:
sql = (
"SELECT task_id, kind, created_at FROM task_events "
"WHERE kind IN "
"('completion_blocked_hallucination', "
" 'suspected_hallucinated_references', "
" 'completed', 'edited') "
"ORDER BY task_id, id"
)
out: dict[str, dict] = {}
for row in conn.execute(sql, params).fetchall():
tid = row["task_id"]
kind = row["kind"]
created_at = row["created_at"]
if kind in ("completed", "edited"):
# Clean event wipes prior warning counters; only events after
# this timestamp count.
out.pop(tid, None)
continue
bucket = out.setdefault(
tid, {"count": 0, "kinds": {}, "latest_at": 0}
)
bucket["count"] += 1
bucket["kinds"][kind] = bucket["kinds"].get(kind, 0) + 1
if created_at > bucket["latest_at"]:
bucket["latest_at"] = created_at
return out
def _links_for(conn: sqlite3.Connection, task_id: str) -> dict[str, list[str]]:
"""Return {'parents': [...], 'children': [...]} for a task."""
parents = [
@ -253,6 +321,11 @@ def get_board(
if row["cstatus"] == "done":
p["done"] += 1
# Hallucination-warning rollup for this board (all tasks).
# Delegated to _compute_warnings_for_tasks so the per-task
# /tasks/:id endpoint can reuse the same rule.
warnings_per_task = _compute_warnings_for_tasks(conn, task_ids=None)
latest_event_id = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
).fetchone()["m"]
@ -266,6 +339,9 @@ def get_board(
d["link_counts"] = link_counts.get(t.id, {"parents": 0, "children": 0})
d["comment_count"] = comment_counts.get(t.id, 0)
d["progress"] = progress.get(t.id) # None when the task has no children
w = warnings_per_task.get(t.id)
if w:
d["warnings"] = w
col = t.status if t.status in columns else "todo"
columns[col].append(d)
@ -313,8 +389,14 @@ def get_task(task_id: str, board: Optional[str] = Query(None)):
task = kanban_db.get_task(conn, task_id)
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
task_d = _task_dict(task)
# Attach warnings metadata so the drawer's Recovery section can
# auto-open when a hallucination is unresolved.
warnings = _compute_warnings_for_tasks(conn, task_ids=[task_id])
if warnings.get(task_id):
task_d["warnings"] = warnings[task_id]
return {
"task": _task_dict(task),
"task": task_d,
"comments": [_comment_dict(c) for c in kanban_db.list_comments(conn, task_id)],
"events": [_event_dict(e) for e in kanban_db.list_events(conn, task_id)],
"links": _links_for(conn, task_id),
@ -713,6 +795,85 @@ def bulk_update(payload: BulkTaskBody, board: Optional[str] = Query(None)):
conn.close()
# ---------------------------------------------------------------------------
# Recovery actions — reclaim a running claim, reassign to a new profile
# ---------------------------------------------------------------------------
class ReclaimBody(BaseModel):
reason: Optional[str] = None
@router.post("/tasks/{task_id}/reclaim")
def reclaim_task_endpoint(
task_id: str,
payload: ReclaimBody,
board: Optional[str] = Query(None),
):
"""Release an active worker claim on a running task.
Used by the dashboard recovery popover when an operator wants to
abort a stuck worker (e.g. one that keeps hallucinating card ids)
without waiting for the claim TTL. Maps 1:1 to
``hermes kanban reclaim <task_id> --reason ...``.
"""
board = _resolve_board(board)
conn = _conn(board=board)
try:
ok = kanban_db.reclaim_task(conn, task_id, reason=payload.reason)
if not ok:
raise HTTPException(
status_code=409,
detail=(
f"cannot reclaim {task_id}: not in a claimable state "
"(not running, or unknown id)"
),
)
return {"ok": True, "task_id": task_id}
finally:
conn.close()
class ReassignBody(BaseModel):
profile: Optional[str] = None # "" or None = unassign
reclaim_first: bool = False
reason: Optional[str] = None
@router.post("/tasks/{task_id}/reassign")
def reassign_task_endpoint(
task_id: str,
payload: ReassignBody,
board: Optional[str] = Query(None),
):
"""Reassign a task to a different profile, optionally reclaiming first.
Used by the dashboard recovery popover when an operator wants to
retry a task with a different worker profile (e.g. switch to a
smarter model after the assigned profile keeps hallucinating).
Maps 1:1 to ``hermes kanban reassign <task_id> <profile> [--reclaim]``.
"""
board = _resolve_board(board)
conn = _conn(board=board)
try:
ok = kanban_db.reassign_task(
conn, task_id,
payload.profile or None,
reclaim_first=bool(payload.reclaim_first),
reason=payload.reason,
)
if not ok:
raise HTTPException(
status_code=409,
detail=(
f"cannot reassign {task_id}: unknown id, or still "
"running (pass reclaim_first=true to release the claim first)"
),
)
return {"ok": True, "task_id": task_id, "assignee": payload.profile or None}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Plugin config (read dashboard.kanban.* defaults from config.yaml)
# ---------------------------------------------------------------------------