mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232)
Workers completing a kanban task can now claim the ids of cards they created via an optional ``created_cards`` field on ``kanban_complete``. The kernel verifies each id exists and was created by the completing worker's profile; any phantom id blocks the completion with a ``HallucinatedCardsError`` and records a ``completion_blocked_hallucination`` event on the task so the rejected attempt is auditable. Successful completions also get a non-blocking prose-scan pass over their ``summary`` + ``result`` that emits a ``suspected_hallucinated_references`` event for any ``t_<hex>`` reference that doesn't resolve. Closes #20017. Recovery UX (kernel + CLI + dashboard) -------------------------------------- A structural gate alone isn't enough — operators also need to see and act on stuck workers, especially when a profile's model is the root cause. This PR ships the full loop: * ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that releases an active worker claim immediately (unlike ``release_stale_claims`` which only acts after claim_expires has passed). Emits a ``reclaimed`` event with ``manual: True`` payload. * ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` — switch a task to a different profile, optionally reclaiming a stuck running worker in the same call. * ``hermes kanban reclaim <id> [--reason ...]`` and ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]`` CLI subcommands wired through to the same helpers. * ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the dashboard plugin. Dashboard surfacing ------------------- * ⚠ **warning badge** on cards with active hallucination events. * **attention strip** at the top of the board listing all flagged tasks; dismissible per session. * **events callout** in the task drawer — hallucination events render with a red left border, amber icon, and phantom ids as styled chips. * **recovery section** in the task drawer with three actions: Reclaim, Reassign (with profile picker + reclaim-first checkbox), and a copy-to-clipboard hint for ``hermes -p <profile> model`` since profile config lives on disk and can't be edited from the browser. Auto-opens when the task has warnings, collapsed otherwise. Keyed by task id so state doesn't leak between drawers. Active-vs-stale rule: warnings clear when a clean ``completed`` or ``edited`` event supersedes the hallucination, so recovery is never permanently stigmatising — the audit events persist for debugging but the badge goes away once the worker succeeds. Skill updates ------------- * ``skills/devops/kanban-worker/SKILL.md`` documents the ``created_cards`` contract with good/bad examples. * ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering stuck workers" section with the three actions and when to use each. Tests ----- * Kernel gate: verified-cards manifest, phantom rejection + audit event, cross-worker rejection, prose scan positive + negative. * Recovery helpers: reclaim on running task, reclaim on non-running returns False, reassign refuses running without reclaim_first, reassign with reclaim_first succeeds on running. * API endpoints: warnings field present on /board and /tasks/:id, warnings cleared after clean completion, reclaim 200 + 409 paths, reassign 200 + 409 + reclaim_first paths. * CLI smoke: reclaim + reassign subcommands. Live-verified end-to-end on a dashboard with seeded scenarios: attention strip renders, badges land on the right cards, drawer callout shows phantom chips, Reclaim on a running task flips status to ready + emits manual reclaimed event + refreshes the drawer, Reassign swaps the assignee and triggers board refresh. 359/359 kanban-suite tests pass (test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).
This commit is contained in:
parent
7de3c86c5a
commit
de9238d37e
11 changed files with 1791 additions and 17 deletions
|
|
@ -176,6 +176,74 @@ def _run_dict(r: kanban_db.Run) -> dict[str, Any]:
|
|||
}
|
||||
|
||||
|
||||
# Hallucination-warning event kinds — see complete_task() in kanban_db.py.
|
||||
# completion_blocked_hallucination: kernel rejected created_cards with
|
||||
# phantom ids; task stays in prior state.
|
||||
# suspected_hallucinated_references: prose scan found t_<hex> in summary
|
||||
# that doesn't resolve; completion succeeded, advisory only.
|
||||
_WARNING_EVENT_KINDS = (
|
||||
"completion_blocked_hallucination",
|
||||
"suspected_hallucinated_references",
|
||||
)
|
||||
|
||||
|
||||
def _compute_warnings_for_tasks(
|
||||
conn: sqlite3.Connection,
|
||||
task_ids: Optional[list[str]] = None,
|
||||
) -> dict[str, dict]:
|
||||
"""Return {task_id: {count, kinds, latest_at}} for tasks with
|
||||
hallucination warnings that occurred AFTER the most recent clean
|
||||
completion event (completed / edited). An empty dict means no tasks
|
||||
on the board have active warnings.
|
||||
|
||||
``task_ids`` narrows the query; pass ``None`` to scan the whole DB
|
||||
(matches board-level rollup). Used by both the /board aggregate and
|
||||
per-task /tasks/:id endpoints.
|
||||
"""
|
||||
params: tuple = ()
|
||||
if task_ids is not None:
|
||||
if not task_ids:
|
||||
return {}
|
||||
placeholders = ",".join(["?"] * len(task_ids))
|
||||
sql = (
|
||||
"SELECT task_id, kind, created_at FROM task_events "
|
||||
f"WHERE task_id IN ({placeholders}) AND kind IN "
|
||||
"('completion_blocked_hallucination', "
|
||||
" 'suspected_hallucinated_references', "
|
||||
" 'completed', 'edited') "
|
||||
"ORDER BY task_id, id"
|
||||
)
|
||||
params = tuple(task_ids)
|
||||
else:
|
||||
sql = (
|
||||
"SELECT task_id, kind, created_at FROM task_events "
|
||||
"WHERE kind IN "
|
||||
"('completion_blocked_hallucination', "
|
||||
" 'suspected_hallucinated_references', "
|
||||
" 'completed', 'edited') "
|
||||
"ORDER BY task_id, id"
|
||||
)
|
||||
|
||||
out: dict[str, dict] = {}
|
||||
for row in conn.execute(sql, params).fetchall():
|
||||
tid = row["task_id"]
|
||||
kind = row["kind"]
|
||||
created_at = row["created_at"]
|
||||
if kind in ("completed", "edited"):
|
||||
# Clean event wipes prior warning counters; only events after
|
||||
# this timestamp count.
|
||||
out.pop(tid, None)
|
||||
continue
|
||||
bucket = out.setdefault(
|
||||
tid, {"count": 0, "kinds": {}, "latest_at": 0}
|
||||
)
|
||||
bucket["count"] += 1
|
||||
bucket["kinds"][kind] = bucket["kinds"].get(kind, 0) + 1
|
||||
if created_at > bucket["latest_at"]:
|
||||
bucket["latest_at"] = created_at
|
||||
return out
|
||||
|
||||
|
||||
def _links_for(conn: sqlite3.Connection, task_id: str) -> dict[str, list[str]]:
|
||||
"""Return {'parents': [...], 'children': [...]} for a task."""
|
||||
parents = [
|
||||
|
|
@ -253,6 +321,11 @@ def get_board(
|
|||
if row["cstatus"] == "done":
|
||||
p["done"] += 1
|
||||
|
||||
# Hallucination-warning rollup for this board (all tasks).
|
||||
# Delegated to _compute_warnings_for_tasks so the per-task
|
||||
# /tasks/:id endpoint can reuse the same rule.
|
||||
warnings_per_task = _compute_warnings_for_tasks(conn, task_ids=None)
|
||||
|
||||
latest_event_id = conn.execute(
|
||||
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
|
||||
).fetchone()["m"]
|
||||
|
|
@ -266,6 +339,9 @@ def get_board(
|
|||
d["link_counts"] = link_counts.get(t.id, {"parents": 0, "children": 0})
|
||||
d["comment_count"] = comment_counts.get(t.id, 0)
|
||||
d["progress"] = progress.get(t.id) # None when the task has no children
|
||||
w = warnings_per_task.get(t.id)
|
||||
if w:
|
||||
d["warnings"] = w
|
||||
col = t.status if t.status in columns else "todo"
|
||||
columns[col].append(d)
|
||||
|
||||
|
|
@ -313,8 +389,14 @@ def get_task(task_id: str, board: Optional[str] = Query(None)):
|
|||
task = kanban_db.get_task(conn, task_id)
|
||||
if task is None:
|
||||
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
|
||||
task_d = _task_dict(task)
|
||||
# Attach warnings metadata so the drawer's Recovery section can
|
||||
# auto-open when a hallucination is unresolved.
|
||||
warnings = _compute_warnings_for_tasks(conn, task_ids=[task_id])
|
||||
if warnings.get(task_id):
|
||||
task_d["warnings"] = warnings[task_id]
|
||||
return {
|
||||
"task": _task_dict(task),
|
||||
"task": task_d,
|
||||
"comments": [_comment_dict(c) for c in kanban_db.list_comments(conn, task_id)],
|
||||
"events": [_event_dict(e) for e in kanban_db.list_events(conn, task_id)],
|
||||
"links": _links_for(conn, task_id),
|
||||
|
|
@ -713,6 +795,85 @@ def bulk_update(payload: BulkTaskBody, board: Optional[str] = Query(None)):
|
|||
conn.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Recovery actions — reclaim a running claim, reassign to a new profile
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class ReclaimBody(BaseModel):
|
||||
reason: Optional[str] = None
|
||||
|
||||
|
||||
@router.post("/tasks/{task_id}/reclaim")
|
||||
def reclaim_task_endpoint(
|
||||
task_id: str,
|
||||
payload: ReclaimBody,
|
||||
board: Optional[str] = Query(None),
|
||||
):
|
||||
"""Release an active worker claim on a running task.
|
||||
|
||||
Used by the dashboard recovery popover when an operator wants to
|
||||
abort a stuck worker (e.g. one that keeps hallucinating card ids)
|
||||
without waiting for the claim TTL. Maps 1:1 to
|
||||
``hermes kanban reclaim <task_id> --reason ...``.
|
||||
"""
|
||||
board = _resolve_board(board)
|
||||
conn = _conn(board=board)
|
||||
try:
|
||||
ok = kanban_db.reclaim_task(conn, task_id, reason=payload.reason)
|
||||
if not ok:
|
||||
raise HTTPException(
|
||||
status_code=409,
|
||||
detail=(
|
||||
f"cannot reclaim {task_id}: not in a claimable state "
|
||||
"(not running, or unknown id)"
|
||||
),
|
||||
)
|
||||
return {"ok": True, "task_id": task_id}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
class ReassignBody(BaseModel):
|
||||
profile: Optional[str] = None # "" or None = unassign
|
||||
reclaim_first: bool = False
|
||||
reason: Optional[str] = None
|
||||
|
||||
|
||||
@router.post("/tasks/{task_id}/reassign")
|
||||
def reassign_task_endpoint(
|
||||
task_id: str,
|
||||
payload: ReassignBody,
|
||||
board: Optional[str] = Query(None),
|
||||
):
|
||||
"""Reassign a task to a different profile, optionally reclaiming first.
|
||||
|
||||
Used by the dashboard recovery popover when an operator wants to
|
||||
retry a task with a different worker profile (e.g. switch to a
|
||||
smarter model after the assigned profile keeps hallucinating).
|
||||
Maps 1:1 to ``hermes kanban reassign <task_id> <profile> [--reclaim]``.
|
||||
"""
|
||||
board = _resolve_board(board)
|
||||
conn = _conn(board=board)
|
||||
try:
|
||||
ok = kanban_db.reassign_task(
|
||||
conn, task_id,
|
||||
payload.profile or None,
|
||||
reclaim_first=bool(payload.reclaim_first),
|
||||
reason=payload.reason,
|
||||
)
|
||||
if not ok:
|
||||
raise HTTPException(
|
||||
status_code=409,
|
||||
detail=(
|
||||
f"cannot reassign {task_id}: unknown id, or still "
|
||||
"running (pass reclaim_first=true to release the claim first)"
|
||||
),
|
||||
)
|
||||
return {"ok": True, "task_id": task_id, "assignee": payload.profile or None}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Plugin config (read dashboard.kanban.* defaults from config.yaml)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue