feat(kanban): generic diagnostics engine for task distress signals (#20332)

* feat(kanban): generic diagnostics engine for task distress signals

Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.

Closes the follow-up from the #20232 discussion.

New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.
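
As a rough illustration of the registry idea, here is a minimal sketch
of a rule list with per-rule error isolation and severity sorting. The
``Diagnostic`` fields, the ``spawn_failure_threshold`` config key, and
the ``compute_diagnostics`` name are assumptions made for this example;
the real module may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical shape; the real Diagnostic class may carry more fields.
@dataclass
class Diagnostic:
    kind: str
    severity: str  # "warning", "error", or "critical"
    title: str
    detail: str = ""
    data: Dict[str, Any] = field(default_factory=dict)
    actions: List[Dict[str, Any]] = field(default_factory=list)


# A rule is a pure function of (task, events, runs, now, config).
Rule = Callable[[dict, list, list, float, dict], List[Diagnostic]]

SEVERITY_RANK = {"critical": 3, "error": 2, "warning": 1}


def rule_repeated_spawn_failures(task, events, runs, now, config):
    # "spawn_failure_threshold" is an assumed config key; 3 matches the
    # threshold quoted in the commit message.
    threshold = config.get("spawn_failure_threshold", 3)
    failures = task.get("spawn_failures") or 0
    if failures < threshold:
        return []
    severity = "critical" if failures >= 2 * threshold else "error"
    return [Diagnostic(
        kind="repeated_spawn_failures",
        severity=severity,
        title=f"Agent spawn failed {failures}x",
    )]


# Adding a distress kind means appending one function here.
RULES: List[Rule] = [rule_repeated_spawn_failures]


def compute_diagnostics(task, events, runs, now, config, rules=None):
    found = []
    for rule in (RULES if rules is None else rules):
        try:
            found.extend(rule(task, events, runs, now, config))
        except Exception:
            # One broken rule must not hide the others.
            continue
    # Highest severity first; list.sort is stable within a severity.
    found.sort(key=lambda d: SEVERITY_RANK.get(d.severity, 0), reverse=True)
    return found
```

Because rules receive everything as arguments and return plain data,
they can be unit-tested without a database or a running dashboard.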

v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
  ``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
  ``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
  fires when ``tasks.spawn_failures >= 3``; suggests
  ``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
  ``crashed`` run outcomes with no successful completion between;
  suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
  state with no comments / unblock attempts; suggests commenting.
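
To make the ``repeated_crashes`` condition concrete, the sketch below
counts crashes since the last successful completion. It assumes run
records are ordered oldest first and expose an ``outcome`` key, that
only a ``completed`` outcome resets the streak, and that escalation to
critical happens at twice the threshold (borrowed from the spawn-failure
rule). The default threshold of 3 is also an assumption; the commit
message only says "N".

```python
def trailing_crash_streak(runs):
    """Count crashed runs since the last successful completion.

    ``runs`` is assumed to be ordered oldest first, each item a mapping
    with an ``outcome`` key.
    """
    streak = 0
    for run in reversed(runs):
        outcome = run.get("outcome")
        if outcome == "completed":
            break  # a success ends the streak
        if outcome == "crashed":
            streak += 1
    return streak


def crash_severity(runs, threshold=3):
    """Return None, "error", or "critical" for a run history."""
    streak = trailing_crash_streak(runs)
    if streak < threshold:
        return None
    # Assumed escalation point: twice the threshold.
    return "critical" if streak >= 2 * threshold else "error"
```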

Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.
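
One plausible way to assemble such an action list is sketched below.
The ``kind`` / ``label`` / ``suggested`` / ``payload`` fields mirror
what the dashboard code reads; the helper names and the label strings
are invented for illustration.

```python
def make_action(kind, label, suggested=False, **payload):
    """Build one structured action as a plain dict."""
    return {
        "kind": kind,
        "label": label,
        "suggested": suggested,
        "payload": payload,
    }


def generic_recovery_actions():
    """Fallback actions offered on every diagnostic kind."""
    return [
        make_action("reclaim", "Reclaim task"),
        make_action("reassign", "Reassign to another profile",
                    reclaim_first=True),
    ]


def spawn_failure_actions(profile):
    """Suggested CLI hint first, then the generic fallbacks."""
    command = f"hermes -p {profile} doctor"
    hint = make_action("cli_hint", f"Run {command}",
                       suggested=True, command=command)
    return [hint] + generic_recovery_actions()
```

Keeping actions as data lets the CLI and the dashboard render the same
list without sharing any rendering code.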

Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.
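
Since rules are stateless, auto-clearing can fall out of how events are
scanned rather than from deleting anything. The sketch below shows that
idea for blocked completions only. It assumes events arrive oldest
first with a ``kind`` key; the function name is hypothetical.

```python
BLOCKED_KIND = "completion_blocked_hallucination"
CLEARING_KINDS = {"completed", "edited"}


def active_blocked_completions(events):
    """Return blocked-completion events newer than the last clean event.

    The audit trail (``events``) is never modified; only the set of
    events that still counts as "active" shrinks.
    """
    active = []
    for ev in events:  # assumed oldest first
        kind = ev["kind"]
        if kind in CLEARING_KINDS:
            active = []  # a clean completion or edit resets the signal
        elif kind == BLOCKED_KIND:
            active.append(ev)
    return active
```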

API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
  ``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
  section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
  severity, sorted critical-first.
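
For orientation, the compact ``warnings`` summary could be derived from
the full diagnostics list roughly as follows. This is a sketch under
assumptions: diagnostics are treated as dicts, ``kinds`` is taken to map
each diagnostic kind to a count, and ``last_seen_at`` is a made-up
timestamp field.

```python
SEVERITY_RANK = {"warning": 1, "error": 2, "critical": 3}


def summarize(diagnostics):
    """Collapse a diagnostics list into the compact summary shape."""
    if not diagnostics:
        return None
    kinds = {}
    for d in diagnostics:
        kinds[d["kind"]] = kinds.get(d["kind"], 0) + 1
    highest = max(
        (d["severity"] for d in diagnostics),
        key=lambda s: SEVERITY_RANK.get(s, 0),
    )
    # "last_seen_at" is a hypothetical field name.
    latest = max((d.get("last_seen_at") or 0) for d in diagnostics)
    return {
        "count": len(diagnostics),
        "kinds": kinds,
        "latest_at": latest,
        "highest_severity": highest,
    }
```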

CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
  [--json]`` — fleet view or single-task view, matches dashboard rule
  output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
  the top with severity markers + suggested actions.

Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
  !!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
  diagnostics (not just hallucinations), severity-coloured, lists
  affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
  ``DiagnosticsSection`` rendering a card per active diagnostic:
  title + detail + structured data (task-id chips when payload keys
  look like id lists) + action buttons. Reassign profile picker is
  inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
  environments where writeText rejects.
* Three-rung severity palette: amber for warning, orange for error,
  red for critical. Uses CSS variables so theming is straightforward.

Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
  covering each rule's positive/negative/threshold paths, severity
  sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
  populated, severity-filtered), ``/board`` exposes both diagnostic
  list and compact summary with ``highest_severity``.
* Existing hallucination-specific test
  (``test_board_surfaces_warnings_field_for_hallucinated_completions``)
  updated to reflect the new contract: the warning summary keys by
  diagnostic kind (``hallucinated_cards``), not event kind.

379 kanban-suite tests pass (+16 net from this PR).

Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:

* Attention strip: shows ``!! 5 tasks need attention`` in the
  error-severity orange; Show expands to a list of 5 rows ordered
  critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
  render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
  diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
  ``broken-ml-worker → alice`` and the drawer refreshed with the
  new assignee + the same diagnostic still firing (correct:
  spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
  ``--severity error`` narrows to 3; ``kanban show <id>`` includes
  the Diagnostics block at the top with suggested action hint.

Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.
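
An external consumer that still matches on event kinds might bridge the
change with a small lookup like the one below. The two mappings follow
the rule list above; the helper name is hypothetical, and the check
works whether ``kinds`` is a mapping or a list.

```python
EVENT_KIND_TO_DIAGNOSTIC_KIND = {
    "completion_blocked_hallucination": "hallucinated_cards",
    "suspected_hallucinated_references": "prose_phantom_refs",
}


def has_kind(warnings, kind):
    """True if ``warnings`` reports ``kind`` under its old or new name."""
    if not warnings:
        return False
    kinds = warnings.get("kinds") or {}
    wanted = EVENT_KIND_TO_DIAGNOSTIC_KIND.get(kind, kind)
    return wanted in kinds
```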

* feat(kanban/diagnostics): lead titles with the actual error text

The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.

New titles:

  Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
  Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
  Agent crashed 3x: provider auth error: 401 Unauthorized
  Agent spawn failed 4x: insufficient_quota: You exceeded your current

Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title when no error was recorded stays honest ('no error recorded').
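
The truncation rules can be sketched as two small helpers. This assumes
the 160-character cap applies to the error's first line rather than to
the whole title, and uses three ASCII dots for the ellipsis; both are
guesses, as are the function names.

```python
TITLE_MAX = 160
DETAIL_MAX = 500


def error_title(prefix, error):
    """Lead the title with the first line of the recorded error."""
    text = (error or "").strip()
    if not text:
        return f"{prefix}: no error recorded"
    first_line = text.splitlines()[0].strip()
    return f"{prefix}: {first_line[:TITLE_MAX]}"


def error_detail(error):
    """Keep the error snippet, capped with an ellipsis when too long."""
    text = (error or "").strip()
    if len(text) <= DETAIL_MAX:
        return text
    return text[:DETAIL_MAX] + "..."
```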

Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).

Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
Teknium 2026-05-05 13:32:42 -07:00 committed by GitHub
parent ec7f2f249e
commit f67063ba81
GPG key ID: B5690EEEBB952194
7 changed files with 1895 additions and 289 deletions

@ -60,30 +60,19 @@
blocked: "Mark this task as blocked? The worker's claim is released.",
};
// Event kinds that indicate a hallucinated/phantom task-id reference
// in a completion. ``completion_blocked_hallucination`` is emitted when
// the kernel's ``created_cards`` gate rejects a completion; the task is
// left in its prior state and the worker can retry. ``suspected_
// hallucinated_references`` is the advisory prose-scan result — the
// completion succeeded but the summary text references task ids that
// do not resolve.
const HALLUCINATION_EVENT_KINDS = [
"completion_blocked_hallucination",
"suspected_hallucinated_references",
];
const HALLUCINATION_EVENT_LABELS = {
completion_blocked_hallucination: "Completion blocked — phantom card ids",
suspected_hallucinated_references: "Prose referenced phantom card ids",
// Diagnostic kind labels for the events-tab callout. Event kinds emitted
// by the kernel get a human-readable header when we detect them in the
// events list; add new entries here as new diagnostic event kinds land.
const DIAGNOSTIC_EVENT_LABELS = {
completion_blocked_hallucination: "⚠ Completion blocked — phantom card ids",
suspected_hallucinated_references: "⚠ Prose referenced phantom card ids",
};
function isHallucinationEvent(kind) {
return HALLUCINATION_EVENT_KINDS.indexOf(kind) !== -1;
function isDiagnosticEvent(kind) {
return Object.prototype.hasOwnProperty.call(DIAGNOSTIC_EVENT_LABELS, kind);
}
function phantomIdsFromEvent(ev) {
// Payload shapes:
// completion_blocked_hallucination: {phantom_cards, verified_cards, summary_preview}
// suspected_hallucinated_references: {phantom_refs, source}
if (!ev || !ev.payload) return [];
const p = ev.payload;
return p.phantom_cards || p.phantom_refs || [];
@ -725,24 +714,36 @@
}
// -------------------------------------------------------------------------
// Attention strip — surfaces tasks with active hallucination warnings.
// Renders a collapsed bar just below the board switcher; clicking expands
// a list of affected tasks with an "Open" button each. Dismissible per
// session via state flag; tasks re-appear on page reload if they still
// have warnings.
// Attention strip — surfaces every task with active diagnostics,
// severity-marked (warning/error/critical). Collapsed by default; click
// Show to expand into per-task rows with Open buttons. Dismissible
// per session via state flag.
// -------------------------------------------------------------------------
function collectWarningTasks(boardData) {
function collectDiagTasks(boardData) {
if (!boardData || !boardData.columns) return [];
const out = [];
for (const col of boardData.columns) {
for (const t of col.tasks || []) {
if (t.warnings && t.warnings.count > 0) out.push(t);
if (t.diagnostics && t.diagnostics.length > 0) out.push(t);
else if (t.warnings && t.warnings.count > 0) out.push(t);
}
}
// Sort: most recent warning first.
// Sort: highest severity first (critical > error > warning), then by
// most recent latest_at.
const sevIdx = function (s) {
if (s === "critical") return 3;
if (s === "error") return 2;
if (s === "warning") return 1;
return 0;
};
out.sort(function (a, b) {
return (b.warnings.latest_at || 0) - (a.warnings.latest_at || 0);
const aSev = sevIdx((a.warnings && a.warnings.highest_severity) || "warning");
const bSev = sevIdx((b.warnings && b.warnings.highest_severity) || "warning");
if (aSev !== bSev) return bSev - aSev;
const aLa = (a.warnings && a.warnings.latest_at) || 0;
const bLa = (b.warnings && b.warnings.latest_at) || 0;
return bLa - aLa;
});
return out;
}
@ -750,18 +751,31 @@
function AttentionStrip(props) {
const [expanded, setExpanded] = useState(false);
const [dismissed, setDismissed] = useState(false);
const warnTasks = useMemo(
function () { return collectWarningTasks(props.boardData); },
const diagTasks = useMemo(
function () { return collectDiagTasks(props.boardData); },
[props.boardData]
);
if (dismissed || warnTasks.length === 0) return null;
return h("div", { className: "hermes-kanban-attention" },
if (dismissed || diagTasks.length === 0) return null;
// Pick the highest severity present so we can colour the strip.
let topSev = "warning";
for (const t of diagTasks) {
const s = (t.warnings && t.warnings.highest_severity) || "warning";
if (s === "critical") { topSev = "critical"; break; }
if (s === "error" && topSev !== "critical") topSev = "error";
}
return h("div", {
className: cn(
"hermes-kanban-attention",
"hermes-kanban-attention--" + topSev,
),
},
h("div", { className: "hermes-kanban-attention-bar" },
h("span", { className: "hermes-kanban-attention-icon" }, "⚠"),
h("span", { className: "hermes-kanban-attention-icon" },
topSev === "critical" ? "!!!" : topSev === "error" ? "!!" : "⚠"),
h("span", { className: "hermes-kanban-attention-text" },
warnTasks.length === 1
? "1 task with hallucination warnings"
: `${warnTasks.length} tasks with hallucination warnings`,
diagTasks.length === 1
? "1 task needs attention"
: `${diagTasks.length} tasks need attention`,
),
h("button", {
className: "hermes-kanban-attention-toggle",
@ -773,19 +787,29 @@
onClick: function () { setDismissed(true); },
title: "Hide until next page reload",
type: "button",
}, ""),
}, "\u2715"),
),
expanded
? h("div", { className: "hermes-kanban-attention-list" },
warnTasks.map(function (t) {
return h("div", { key: t.id, className: "hermes-kanban-attention-row" },
diagTasks.map(function (t) {
const sev = (t.warnings && t.warnings.highest_severity) || "warning";
const kinds = t.warnings && t.warnings.kinds ? Object.keys(t.warnings.kinds) : [];
return h("div", {
key: t.id,
className: cn(
"hermes-kanban-attention-row",
"hermes-kanban-attention-row--" + sev,
),
},
h("span", { className: "hermes-kanban-attention-row-sev" },
sev === "critical" ? "!!!" : sev === "error" ? "!!" : "⚠"),
h("span", { className: "hermes-kanban-attention-row-id" }, t.id),
h("span", { className: "hermes-kanban-attention-row-title" },
t.title || "(untitled)"),
h("span", { className: "hermes-kanban-attention-row-meta" },
t.assignee ? "@" + t.assignee : "unassigned",
" · ",
`${t.warnings.count} event${t.warnings.count === 1 ? "" : "s"}`,
" \u00b7 ",
kinds.length > 0 ? kinds.join(", ") : "diagnostic",
),
h("button", {
className: "hermes-kanban-attention-row-btn",
@ -800,195 +824,266 @@
}
// -------------------------------------------------------------------------
// Recovery popover — operator actions for a task flagged with
// hallucination warnings. Three primary actions:
// 1. Reclaim — release a running worker's claim; task back to ready.
// 2. Reassign — switch the task to a different profile (with optional
// reclaim-first toggle for currently-running tasks).
// 3. Edit profile — copy the CLI hint for `hermes -p <name> model`
// (the dashboard can't edit profile config from the
// browser; it lives on the filesystem).
// Rendered from inside TaskDetail via a toggle button.
// Diagnostics section — generic renderer for a task's active distress
// signals. Each diagnostic carries its own title, detail, data payload,
// and a list of structured actions; the section renders them uniformly
// regardless of kind. Replaces the hallucination-specific
// ``RecoveryPopover`` from the previous iteration.
//
// Action kinds supported today:
// reclaim → POST /tasks/:id/reclaim
// reassign → POST /tasks/:id/reassign (with profile picker)
// unblock → PATCH /tasks/:id body: {status: "ready"}
// comment → scroll to the comment input at the bottom of the drawer
// cli_hint → copy payload.command to clipboard
// open_docs → open payload.url in a new tab
// Unknown kinds are rendered as a disabled informational row so the
// server can add new action kinds without breaking the UI.
// -------------------------------------------------------------------------
function RecoveryPopover(props) {
const t = props.task;
const board = props.boardSlug;
const assignees = props.assignees || [];
const [reason, setReason] = useState("");
const [newProfile, setNewProfile] = useState(t.assignee || "");
const [reclaimFirst, setReclaimFirst] = useState(t.status === "running");
function DiagnosticActionButton(props) {
const { action, onExec, busy, extra } = props;
const label = (action.suggested ? "\u2606 " : "") + action.label;
const cls = cn(
"hermes-kanban-diag-action-btn",
action.suggested ? "hermes-kanban-diag-action-btn--suggested" : "",
);
if (action.kind === "reclaim" || action.kind === "reassign" ||
action.kind === "unblock") {
return h("button", {
className: cls,
disabled: busy || (extra && extra.disabled),
onClick: function () { onExec(action); },
type: "button",
}, label);
}
if (action.kind === "cli_hint") {
return h("button", {
className: cls,
disabled: busy,
onClick: function () { onExec(action); },
type: "button",
title: "Copy command to clipboard",
}, (extra && extra.copied) ? "Copied" : label);
}
if (action.kind === "comment") {
return h("button", {
className: cls,
onClick: function () { onExec(action); },
type: "button",
}, label);
}
if (action.kind === "open_docs") {
return h("a", {
className: cls,
href: (action.payload && action.payload.url) || "#",
target: "_blank",
rel: "noreferrer",
}, label);
}
// Unknown kind — render informational, non-interactive.
return h("span", { className: cls + " hermes-kanban-diag-action-btn--unknown" },
label);
}
function DiagnosticCard(props) {
const { diag, task, boardSlug, assignees, onRefresh } = props;
const [busy, setBusy] = useState(false);
const [msg, setMsg] = useState(null);
const [copied, setCopied] = useState(false);
const [copiedKey, setCopiedKey] = useState(null);
const [reassignProfile, setReassignProfile] = useState(task.assignee || "");
const act = function (kind) {
const execAction = function (action) {
if (busy) return;
setBusy(true);
setMsg(null);
const urlBase = `${API}/tasks/${encodeURIComponent(t.id)}`;
const url = kind === "reclaim"
? withBoard(`${urlBase}/reclaim`, board)
: withBoard(`${urlBase}/reassign`, board);
const body = kind === "reclaim"
? { reason: reason || null }
: {
profile: newProfile || null,
reclaim_first: !!reclaimFirst,
reason: reason || null,
};
SDK.fetchJSON(url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
}).then(function () {
setMsg({ ok: true, text:
kind === "reclaim"
? `Reclaimed ${t.id}. Task back to ready.`
: `Reassigned ${t.id} to ${newProfile || "(unassigned)"}.`
});
if (props.onActionComplete) props.onActionComplete(kind);
}).catch(function (err) {
setMsg({ ok: false, text: `Failed: ${err.message || err}` });
}).then(function () {
setBusy(false);
});
};
const profileCmd = `hermes -p ${t.assignee || "<profile>"} model`;
const copyCmd = function () {
try {
navigator.clipboard.writeText(profileCmd).then(function () {
setCopied(true);
setTimeout(function () { setCopied(false); }, 2000);
});
} catch (_) {
window.prompt("Copy this command:", profileCmd);
if (action.kind === "cli_hint") {
const cmd = (action.payload && action.payload.command) || action.label;
const fallback = function () { window.prompt("Copy this command:", cmd); };
try {
const p = navigator.clipboard && navigator.clipboard.writeText(cmd);
if (p && p.then) {
p.then(function () {
setCopiedKey(action.label);
setTimeout(function () { setCopiedKey(null); }, 2000);
}).catch(fallback);
} else {
fallback();
}
} catch (_) {
fallback();
}
return;
}
if (action.kind === "comment") {
// Scroll the comment input into view; the drawer already has one
// at the bottom. Focus it so the operator can start typing.
const ta = document.querySelector(".hermes-kanban-drawer-comment-row input, .hermes-kanban-drawer-comment-row textarea");
if (ta) {
ta.scrollIntoView({ behavior: "smooth", block: "nearest" });
ta.focus();
}
return;
}
if (action.kind === "unblock") {
setBusy(true); setMsg(null);
const url = withBoard(`${API}/tasks/${encodeURIComponent(task.id)}`, boardSlug);
SDK.fetchJSON(url, {
method: "PATCH",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ status: "ready" }),
}).then(function () {
setMsg({ ok: true, text: `Unblocked ${task.id}. Task is ready for the next tick.` });
if (onRefresh) onRefresh();
}).catch(function (err) {
setMsg({ ok: false, text: `Unblock failed: ${err.message || err}` });
}).then(function () { setBusy(false); });
return;
}
if (action.kind === "reclaim") {
setBusy(true); setMsg(null);
const url = withBoard(`${API}/tasks/${encodeURIComponent(task.id)}/reclaim`, boardSlug);
SDK.fetchJSON(url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ reason: `recovery action for ${diag.kind}` }),
}).then(function () {
setMsg({ ok: true, text: `Reclaimed ${task.id}. Task is back to ready.` });
if (onRefresh) onRefresh();
}).catch(function (err) {
setMsg({ ok: false, text: `Reclaim failed: ${err.message || err}` });
}).then(function () { setBusy(false); });
return;
}
if (action.kind === "reassign") {
if (!reassignProfile) {
setMsg({ ok: false, text: "Pick a profile first." });
return;
}
setBusy(true); setMsg(null);
const url = withBoard(`${API}/tasks/${encodeURIComponent(task.id)}/reassign`, boardSlug);
const body = {
profile: reassignProfile || null,
reclaim_first: !!(action.payload && action.payload.reclaim_first),
reason: `recovery action for ${diag.kind}`,
};
SDK.fetchJSON(url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
}).then(function () {
setMsg({
ok: true,
text: `Reassigned ${task.id} to ${reassignProfile}.`,
});
if (onRefresh) onRefresh();
}).catch(function (err) {
setMsg({ ok: false, text: `Reassign failed: ${err.message || err}` });
}).then(function () { setBusy(false); });
return;
}
};
return h("div", { className: "hermes-kanban-recovery" },
h("div", { className: "hermes-kanban-recovery-title" },
"Recovery actions"),
h("div", { className: "hermes-kanban-recovery-hint" },
"Use these when a worker is stuck (crash loop, repeated hallucination, ",
"broken model). Events in this task's history are preserved as audit trail."),
// Pull out the reassign action so we can render its picker inline.
const reassignAction = (diag.actions || []).find(function (a) {
return a.kind === "reassign";
});
// Reason input (shared across actions)
h("div", { className: "hermes-kanban-recovery-section" },
h("label", { className: "hermes-kanban-recovery-label" },
"Reason (optional, logged on event)"),
h("input", {
type: "text",
className: "hermes-kanban-recovery-input",
value: reason,
onChange: function (e) { setReason(e.target.value); },
placeholder: "e.g. model hallucinating, switching to larger",
const sevClass = "hermes-kanban-diag--" + (diag.severity || "warning");
return h("div", { className: cn("hermes-kanban-diag", sevClass) },
h("div", { className: "hermes-kanban-diag-header" },
h("span", { className: "hermes-kanban-diag-sev" },
diag.severity === "critical" ? "!!!" :
diag.severity === "error" ? "!!" : "\u26a0"),
h("span", { className: "hermes-kanban-diag-title" },
diag.title),
),
h("div", { className: "hermes-kanban-diag-detail" },
diag.detail),
diag.data && Object.keys(diag.data).length > 0
? h("div", { className: "hermes-kanban-diag-data" },
Object.keys(diag.data).map(function (k) {
const v = diag.data[k];
if (Array.isArray(v) && v.length > 0 && typeof v[0] === "string" &&
v[0].indexOf("t_") === 0) {
// Task-id list — render as chips.
return h("div", { key: k, className: "hermes-kanban-diag-data-row" },
h("span", { className: "hermes-kanban-diag-data-key" }, k + ":"),
v.map(function (x) {
return h("code", {
key: x, className: "hermes-kanban-event-phantom-chip",
}, x);
}),
);
}
return h("div", { key: k, className: "hermes-kanban-diag-data-row" },
h("span", { className: "hermes-kanban-diag-data-key" }, k + ":"),
h("span", { className: "hermes-kanban-diag-data-val" },
Array.isArray(v) ? v.join(", ") : String(v)),
);
}),
)
: null,
// Inline reassign picker — only shown when the diagnostic offers
// a reassign action. Profile list comes from the board payload.
reassignAction
? h("div", { className: "hermes-kanban-diag-reassign-row" },
h("span", { className: "hermes-kanban-diag-reassign-label" },
"Reassign to:"),
h("select", {
className: "hermes-kanban-recovery-select",
value: reassignProfile,
onChange: function (e) { setReassignProfile(e.target.value); },
},
h("option", { value: "" }, "(unassigned)"),
(assignees || []).map(function (a) {
return h("option", { key: a, value: a }, a);
}),
),
)
: null,
h("div", { className: "hermes-kanban-diag-actions" },
(diag.actions || []).map(function (a, i) {
return h(DiagnosticActionButton, {
key: a.kind + i,
action: a,
onExec: execAction,
busy: busy,
extra: {
copied: copiedKey === a.label,
disabled: (a.kind === "reassign" && !reassignProfile),
},
});
}),
),
// Action 1: Reclaim
h("div", { className: "hermes-kanban-recovery-section" },
h("div", { className: "hermes-kanban-recovery-action-row" },
h("div", { className: "hermes-kanban-recovery-action-label" },
"1. Reclaim"),
h("div", { className: "hermes-kanban-recovery-action-desc" },
t.status === "running"
? "Abort the running worker and reset to ready."
: "Task is not running — nothing to reclaim."),
h("button", {
className: "hermes-kanban-recovery-btn",
disabled: busy || t.status !== "running",
onClick: function () { act("reclaim"); },
type: "button",
}, "Reclaim"),
),
),
// Action 2: Reassign
h("div", { className: "hermes-kanban-recovery-section" },
h("div", { className: "hermes-kanban-recovery-action-row" },
h("div", { className: "hermes-kanban-recovery-action-label" },
"2. Reassign"),
h("div", { className: "hermes-kanban-recovery-action-desc" },
"Switch to a different worker profile and retry."),
),
h("div", { className: "hermes-kanban-recovery-reassign-row" },
h("select", {
className: "hermes-kanban-recovery-select",
value: newProfile,
onChange: function (e) { setNewProfile(e.target.value); },
},
h("option", { value: "" }, "(unassigned)"),
assignees.map(function (a) {
return h("option", { key: a, value: a }, a);
}),
),
h("label", { className: "hermes-kanban-recovery-checkbox" },
h("input", {
type: "checkbox",
checked: reclaimFirst,
onChange: function (e) { setReclaimFirst(e.target.checked); },
}),
" Reclaim first",
),
h("button", {
className: "hermes-kanban-recovery-btn",
disabled: busy,
onClick: function () { act("reassign"); },
type: "button",
}, "Reassign"),
),
),
// Action 3: Edit profile model (CLI hint)
h("div", { className: "hermes-kanban-recovery-section" },
h("div", { className: "hermes-kanban-recovery-action-row" },
h("div", { className: "hermes-kanban-recovery-action-label" },
"3. Change profile model"),
h("div", { className: "hermes-kanban-recovery-action-desc" },
"Profile config lives on disk — change it from a terminal, ",
"then use Reclaim above to retry with the new model."),
),
h("div", { className: "hermes-kanban-recovery-cmd-row" },
h("code", { className: "hermes-kanban-recovery-cmd" }, profileCmd),
h("button", {
className: "hermes-kanban-recovery-btn",
onClick: copyCmd,
type: "button",
}, copied ? "Copied" : "Copy"),
),
),
msg
? h("div", {
className: cn(
"hermes-kanban-recovery-msg",
msg.ok ? "hermes-kanban-recovery-msg--ok" : "hermes-kanban-recovery-msg--err",
"hermes-kanban-diag-msg",
msg.ok ? "hermes-kanban-diag-msg--ok" : "hermes-kanban-diag-msg--err",
),
}, msg.text)
: null,
);
}
// Thin wrapper that toggles the RecoveryPopover visibility inside a
// task drawer. Auto-opens when the task has active hallucination
// warnings; operators can still collapse it. Always available via a
// header button for tasks without warnings, so reclaim/reassign is
// accessible for other stuck-worker scenarios too.
function RecoverySection(props) {
const [open, setOpen] = useState(!!props.hasWarnings);
// Re-open automatically if warnings appear while the drawer is open.
function DiagnosticsSection(props) {
const diags = props.diagnostics || [];
const hasOpenDiags = diags.length > 0;
const [open, setOpen] = useState(hasOpenDiags);
useEffect(function () {
if (props.hasWarnings) setOpen(true);
}, [props.hasWarnings]);
if (hasOpenDiags) setOpen(true);
}, [hasOpenDiags]);
if (!hasOpenDiags && !props.alwaysVisible) {
// Nothing active. Collapse the section entirely rather than showing
// an empty "Recovery" header — keeps clean tasks visually clean.
return null;
}
return h("div", { className: "hermes-kanban-section" },
h("div", { className: "hermes-kanban-section-head-row" },
h("span", { className: "hermes-kanban-section-head" },
props.hasWarnings
hasOpenDiags
? h("span", { className: "hermes-kanban-section-head-warning" },
"⚠ Recovery")
: "Recovery",
`\u26a0 Diagnostics (${diags.length})`)
: "Diagnostics",
),
h("button", {
className: "hermes-kanban-section-toggle",
@ -997,24 +1092,23 @@
}, open ? "Hide" : "Show"),
),
open
? h(RecoveryPopover, {
// Keyed by task id so React tears the popover down and
// remounts it when the drawer swaps to a different task —
// otherwise reason / newProfile / success toast from the
// previous task leak into the new one.
key: props.task.id,
task: props.task,
boardSlug: props.boardSlug,
assignees: props.assignees,
onActionComplete: function () {
if (props.onRefresh) props.onRefresh();
},
})
? h("div", { className: "hermes-kanban-diag-list" },
diags.map(function (d, i) {
return h(DiagnosticCard, {
key: props.task.id + ":" + d.kind + i,
diag: d,
task: props.task,
boardSlug: props.boardSlug,
assignees: props.assignees,
onRefresh: props.onRefresh,
});
}),
)
: null,
);
}
// -------------------------------------------------------------------------
// -------------------------------------------------------------------------
// Board switcher (multi-project)
// -------------------------------------------------------------------------
@ -1545,11 +1639,18 @@
h("span", { className: "hermes-kanban-card-id" }, t.id),
t.warnings && t.warnings.count > 0
? h("span", {
className: "hermes-kanban-warning-badge",
title: `${t.warnings.count} hallucination ` +
`event(s) since last clean completion. ` +
`Click to open for details.`,
}, "⚠")
className: cn(
"hermes-kanban-warning-badge",
"hermes-kanban-warning-badge--" + (t.warnings.highest_severity || "warning"),
),
title: (
`${t.warnings.count} active diagnostic` +
(t.warnings.count === 1 ? "" : "s") +
` (severity: ${t.warnings.highest_severity || "warning"}). ` +
`Click to open for details.`
),
}, t.warnings.highest_severity === "critical" ? "!!!" :
t.warnings.highest_severity === "error" ? "!!" : "⚠")
: null,
t.priority > 0
? h(Badge, { className: "hermes-kanban-priority" }, `P${t.priority}`)
@ -1945,11 +2046,11 @@
t.created_by ? h(MetaRow, { label: "Created by", value: t.created_by }) : null,
),
h(StatusActions, { task: t, onPatch: props.onPatch }),
h(RecoverySection, {
h(DiagnosticsSection, {
task: t,
boardSlug: props.boardSlug,
assignees: props.assignees,
hasWarnings: t.warnings && t.warnings.count > 0,
diagnostics: t.diagnostics || [],
onRefresh: props.onRefresh,
}),
h(HomeSubsSection, {
@ -1992,20 +2093,20 @@
h("div", { className: "hermes-kanban-section" },
h("div", { className: "hermes-kanban-section-head" }, `Events (${events.length})`),
events.slice().reverse().slice(0, 20).map(function (e) {
const isHall = isHallucinationEvent(e.kind);
const phantoms = isHall ? phantomIdsFromEvent(e) : [];
const isDiag = isDiagnosticEvent(e.kind);
const phantoms = isDiag ? phantomIdsFromEvent(e) : [];
return h("div", {
key: e.id,
className: cn(
"hermes-kanban-event",
isHall ? "hermes-kanban-event--hallucination" : "",
isDiag ? "hermes-kanban-event--hallucination" : "",
),
},
isHall
isDiag
? h("div", { className: "hermes-kanban-event-header" },
h("span", { className: "hermes-kanban-event-warning-icon" }, "⚠"),
h("span", { className: "hermes-kanban-event-warning-label" },
HALLUCINATION_EVENT_LABELS[e.kind] || e.kind),
DIAGNOSTIC_EVENT_LABELS[e.kind] || e.kind),
h("span", { className: "hermes-kanban-event-ago" },
timeAgo ? timeAgo(e.created_at) : ""),
)
@ -2014,7 +2115,7 @@
h("span", { className: "hermes-kanban-event-ago" },
timeAgo ? timeAgo(e.created_at) : ""),
),
isHall && phantoms.length > 0
isDiag && phantoms.length > 0
? h("div", { className: "hermes-kanban-event-phantom-row" },
h("span", { className: "hermes-kanban-event-phantom-label" },
"Phantom ids:"),
@ -2026,7 +2127,7 @@
}),
)
: null,
e.payload && !isHall
e.payload && !isDiag
? h("code", { className: "hermes-kanban-event-payload" },
JSON.stringify(e.payload))
: null,

@ -1100,3 +1100,173 @@
color: #ff8b8b;
border: 1px solid rgba(255, 107, 107, 0.3);
}
/* ---------------------------------------------------------------------- */
/* Diagnostics — generic, severity-coloured distress signals on tasks. */
/* Three rungs: warning (amber), error (orange), critical (red). */
/* ---------------------------------------------------------------------- */
/* Severity token variables so every diagnostic-coloured surface uses the */
/* same palette. */
.hermes-kanban-diag,
.hermes-kanban-attention,
.hermes-kanban-warning-badge,
.hermes-kanban-attention-row {
--hermes-diag-warning: #ff9e3b;
--hermes-diag-error: #ff6b3d;
--hermes-diag-critical: #ff4d4d;
}
/* Warning-badge severity variants (overrides the base colour). */
.hermes-kanban-warning-badge--warning { color: var(--hermes-diag-warning); }
.hermes-kanban-warning-badge--error { color: var(--hermes-diag-error); font-weight: 700; }
.hermes-kanban-warning-badge--critical { color: var(--hermes-diag-critical); font-weight: 700; }
/* Attention-strip severity variants. */
.hermes-kanban-attention--warning {
border-color: rgba(255, 158, 59, 0.35);
background: rgba(255, 158, 59, 0.06);
}
.hermes-kanban-attention--error {
border-color: rgba(255, 107, 61, 0.45);
background: rgba(255, 107, 61, 0.08);
}
.hermes-kanban-attention--critical {
border-color: rgba(255, 77, 77, 0.55);
background: rgba(255, 77, 77, 0.10);
}
.hermes-kanban-attention--error .hermes-kanban-attention-icon { color: var(--hermes-diag-error); }
.hermes-kanban-attention--critical .hermes-kanban-attention-icon { color: var(--hermes-diag-critical); }
/* Per-row severity marker in the expanded attention list. */
.hermes-kanban-attention-row-sev {
display: inline-block;
min-width: 1.5rem;
font-weight: 600;
}
.hermes-kanban-attention-row--warning .hermes-kanban-attention-row-sev { color: var(--hermes-diag-warning); }
.hermes-kanban-attention-row--error .hermes-kanban-attention-row-sev { color: var(--hermes-diag-error); font-weight: 700; }
.hermes-kanban-attention-row--critical .hermes-kanban-attention-row-sev { color: var(--hermes-diag-critical); font-weight: 700; }
/* Individual diagnostic card inside the drawer's Diagnostics section. */
.hermes-kanban-diag-list {
display: flex;
flex-direction: column;
gap: 0.6rem;
}
.hermes-kanban-diag {
border-left: 3px solid var(--hermes-diag-warning);
background: rgba(255, 158, 59, 0.05);
border-radius: 0.35rem;
padding: 0.6rem 0.75rem;
display: flex;
flex-direction: column;
gap: 0.4rem;
}
.hermes-kanban-diag--error {
border-left-color: var(--hermes-diag-error);
background: rgba(255, 107, 61, 0.06);
}
.hermes-kanban-diag--critical {
border-left-color: var(--hermes-diag-critical);
background: rgba(255, 77, 77, 0.07);
}
.hermes-kanban-diag-header {
display: flex;
align-items: center;
gap: 0.5rem;
}
.hermes-kanban-diag-sev {
font-weight: 700;
min-width: 1.5rem;
}
.hermes-kanban-diag--warning .hermes-kanban-diag-sev { color: var(--hermes-diag-warning); }
.hermes-kanban-diag--error .hermes-kanban-diag-sev { color: var(--hermes-diag-error); }
.hermes-kanban-diag--critical .hermes-kanban-diag-sev { color: var(--hermes-diag-critical); }
.hermes-kanban-diag-title {
font-weight: 600;
font-size: 0.875rem;
}
.hermes-kanban-diag-detail {
font-size: 0.8125rem;
color: var(--color-foreground, #ccc);
line-height: 1.4;
}
.hermes-kanban-diag-data {
display: flex;
flex-direction: column;
gap: 0.2rem;
font-size: 0.75rem;
}
.hermes-kanban-diag-data-row {
display: flex;
align-items: center;
gap: 0.35rem;
flex-wrap: wrap;
}
.hermes-kanban-diag-data-key {
color: var(--color-muted-foreground, #888);
font-weight: 500;
}
.hermes-kanban-diag-data-val {
font-family: ui-monospace, SFMono-Regular, monospace;
}
.hermes-kanban-diag-reassign-row {
display: flex;
align-items: center;
gap: 0.4rem;
font-size: 0.75rem;
}
.hermes-kanban-diag-reassign-label {
color: var(--color-muted-foreground, #888);
}
.hermes-kanban-diag-actions {
display: flex;
flex-wrap: wrap;
gap: 0.4rem;
margin-top: 0.1rem;
}
.hermes-kanban-diag-action-btn {
padding: 0.25rem 0.6rem;
font-size: 0.75rem;
background: rgba(0, 0, 0, 0.2);
border: 1px solid rgba(120, 120, 140, 0.3);
border-radius: 0.3rem;
color: inherit;
cursor: pointer;
text-decoration: none;
}
.hermes-kanban-diag-action-btn:hover:not(:disabled) {
background: rgba(0, 0, 0, 0.3);
}
.hermes-kanban-diag-action-btn:disabled {
opacity: 0.4;
cursor: not-allowed;
}
.hermes-kanban-diag-action-btn--suggested {
background: rgba(255, 158, 59, 0.15);
border-color: rgba(255, 158, 59, 0.4);
font-weight: 600;
}
.hermes-kanban-diag-action-btn--suggested:hover:not(:disabled) {
background: rgba(255, 158, 59, 0.25);
}
.hermes-kanban-diag-action-btn--unknown {
opacity: 0.6;
cursor: default;
}
.hermes-kanban-diag-msg {
font-size: 0.75rem;
padding: 0.35rem 0.5rem;
border-radius: 0.3rem;
}
.hermes-kanban-diag-msg--ok {
background: rgba(120, 200, 120, 0.12);
color: #6bc46b;
border: 1px solid rgba(120, 200, 120, 0.3);
}
.hermes-kanban-diag-msg--err {
background: rgba(255, 107, 61, 0.12);
color: #ff8b6b;
border: 1px solid rgba(255, 107, 61, 0.3);
}

@@ -187,63 +187,109 @@ _WARNING_EVENT_KINDS = (
)
def _compute_warnings_for_tasks(
def _compute_task_diagnostics(
conn: sqlite3.Connection,
task_ids: Optional[list[str]] = None,
) -> dict[str, dict]:
"""Return {task_id: {count, kinds, latest_at}} for tasks with
hallucination warnings that occurred AFTER the most recent clean
completion event (completed / edited). An empty dict means no tasks
on the board have active warnings.
) -> dict[str, list[dict]]:
"""Run the diagnostic rule engine against every task (or a subset)
and return ``{task_id: [diagnostic_dict, ...]}``.
``task_ids`` narrows the query; pass ``None`` to scan the whole DB
(matches board-level rollup). Used by both the /board aggregate and
per-task /tasks/:id endpoints.
Tasks with no active diagnostics are omitted from the result.
Uses ``hermes_cli.kanban_diagnostics``; see that module for the
rule definitions.
"""
params: tuple = ()
from hermes_cli import kanban_diagnostics as kd
# Build the candidate task list. We need each task's row + its
# events + its runs. Doing N separate queries works but scales
# poorly; do three aggregate queries instead.
if task_ids is not None:
if not task_ids:
return {}
placeholders = ",".join(["?"] * len(task_ids))
sql = (
"SELECT task_id, kind, created_at FROM task_events "
f"WHERE task_id IN ({placeholders}) AND kind IN "
"('completion_blocked_hallucination', "
" 'suspected_hallucinated_references', "
" 'completed', 'edited') "
"ORDER BY task_id, id"
)
params = tuple(task_ids)
rows = conn.execute(
f"SELECT * FROM tasks WHERE id IN ({placeholders})",
tuple(task_ids),
).fetchall()
else:
sql = (
"SELECT task_id, kind, created_at FROM task_events "
"WHERE kind IN "
"('completion_blocked_hallucination', "
" 'suspected_hallucinated_references', "
" 'completed', 'edited') "
"ORDER BY task_id, id"
)
rows = conn.execute(
"SELECT * FROM tasks WHERE status != 'archived'",
).fetchall()
out: dict[str, dict] = {}
for row in conn.execute(sql, params).fetchall():
tid = row["task_id"]
kind = row["kind"]
created_at = row["created_at"]
if kind in ("completed", "edited"):
# Clean event wipes prior warning counters; only events after
# this timestamp count.
out.pop(tid, None)
continue
bucket = out.setdefault(
tid, {"count": 0, "kinds": {}, "latest_at": 0}
if not rows:
return {}
# Index events + runs by task id. For very large boards this will
# slurp a lot — acceptable on the dashboard's typical working set
# (hundreds of tasks), but we can add pagination / filtering later
# if profiling shows it's a hotspot.
row_ids = [r["id"] for r in rows]
placeholders = ",".join(["?"] * len(row_ids))
events_by_task: dict[str, list] = {tid: [] for tid in row_ids}
for ev_row in conn.execute(
f"SELECT * FROM task_events WHERE task_id IN ({placeholders}) ORDER BY id",
tuple(row_ids),
).fetchall():
events_by_task.setdefault(ev_row["task_id"], []).append(ev_row)
runs_by_task: dict[str, list] = {tid: [] for tid in row_ids}
for run_row in conn.execute(
f"SELECT * FROM task_runs WHERE task_id IN ({placeholders}) ORDER BY id",
tuple(row_ids),
).fetchall():
runs_by_task.setdefault(run_row["task_id"], []).append(run_row)
out: dict[str, list[dict]] = {}
for r in rows:
tid = r["id"]
diags = kd.compute_task_diagnostics(
r,
events_by_task.get(tid, []),
runs_by_task.get(tid, []),
)
bucket["count"] += 1
bucket["kinds"][kind] = bucket["kinds"].get(kind, 0) + 1
if created_at > bucket["latest_at"]:
bucket["latest_at"] = created_at
if diags:
out[tid] = [d.to_dict() for d in diags]
return out
def _warnings_summary_from_diagnostics(
diagnostics: list[dict],
) -> Optional[dict]:
"""Compact summary for cards: {count, highest_severity, kinds,
latest_at}. Replaces the old hallucination-only ``warnings`` object:
same shape, plus ``highest_severity`` so the UI can color
badges per diagnostic severity.
Returns None when ``diagnostics`` is empty.
"""
if not diagnostics:
return None
from hermes_cli.kanban_diagnostics import SEVERITY_ORDER
kinds: dict[str, int] = {}
latest = 0
highest_idx = -1
highest_sev: Optional[str] = None
count = 0
for d in diagnostics:
kinds[d["kind"]] = kinds.get(d["kind"], 0) + d.get("count", 1)
count += d.get("count", 1)
la = d.get("last_seen_at") or 0
if la > latest:
latest = la
sev = d.get("severity")
if sev in SEVERITY_ORDER:
idx = SEVERITY_ORDER.index(sev)
if idx > highest_idx:
highest_idx = idx
highest_sev = sev
return {
"count": count,
"kinds": kinds,
"latest_at": latest,
"highest_severity": highest_sev,
}
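
A worked example may help show what the card-level summary looks like. The sketch below is a standalone copy of the rollup logic, not the module itself. It assumes ``SEVERITY_ORDER`` runs from least to most severe as ``("warning", "error", "critical")``; the real constant lives in ``hermes_cli.kanban_diagnostics`` and is not shown in this diff. The sample diagnostics are made up for illustration.

```python
from typing import Optional

# Assumption: ordering inferred from the warning / error / critical rungs;
# the real constant is defined in hermes_cli.kanban_diagnostics.
SEVERITY_ORDER = ("warning", "error", "critical")


def summarize(diagnostics: list[dict]) -> Optional[dict]:
    """Illustrative copy of the summary rollup for a single task."""
    if not diagnostics:
        return None
    kinds: dict[str, int] = {}
    count = 0
    latest = 0
    highest_idx = -1
    highest_sev: Optional[str] = None
    for d in diagnostics:
        n = d.get("count", 1)  # a diagnostic without a count counts once
        kinds[d["kind"]] = kinds.get(d["kind"], 0) + n
        count += n
        latest = max(latest, d.get("last_seen_at") or 0)
        sev = d.get("severity")
        if sev in SEVERITY_ORDER:
            idx = SEVERITY_ORDER.index(sev)
            if idx > highest_idx:
                highest_idx = idx
                highest_sev = sev
    return {
        "count": count,
        "kinds": kinds,
        "latest_at": latest,
        "highest_severity": highest_sev,
    }


# Hypothetical diagnostics for one task (values are invented).
sample = [
    {"kind": "repeated_crashes", "severity": "error", "count": 3,
     "last_seen_at": 1700000200},
    {"kind": "stuck_in_blocked", "severity": "warning",
     "last_seen_at": 1700000100},
]
summary = summarize(sample)
print(summary)
```

Under these assumptions the badge would report four occurrences across two kinds and take its colour from ``error``, the more severe of the two.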
def _links_for(conn: sqlite3.Connection, task_id: str) -> dict[str, list[str]]:
"""Return {'parents': [...], 'children': [...]} for a task."""
parents = [
@@ -321,10 +367,11 @@ def get_board(
if row["cstatus"] == "done":
p["done"] += 1
# Hallucination-warning rollup for this board (all tasks).
# Delegated to _compute_warnings_for_tasks so the per-task
# /tasks/:id endpoint can reuse the same rule.
warnings_per_task = _compute_warnings_for_tasks(conn, task_ids=None)
# Diagnostics rollup for this board — see kanban_diagnostics.
# We get the full structured list per task AND a compact
# summary for the card badge (so cards don't carry the detail
# text; the drawer fetches that via /tasks/:id or /diagnostics).
diagnostics_per_task = _compute_task_diagnostics(conn, task_ids=None)
latest_event_id = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
@@ -339,9 +386,13 @@ def get_board(
d["link_counts"] = link_counts.get(t.id, {"parents": 0, "children": 0})
d["comment_count"] = comment_counts.get(t.id, 0)
d["progress"] = progress.get(t.id) # None when the task has no children
w = warnings_per_task.get(t.id)
if w:
d["warnings"] = w
diags = diagnostics_per_task.get(t.id)
if diags:
# Full list goes into the payload so the drawer can render
# without a second round-trip. The board-level badge only
# needs the summary.
d["diagnostics"] = diags
d["warnings"] = _warnings_summary_from_diagnostics(diags)
col = t.status if t.status in columns else "todo"
columns[col].append(d)
@@ -390,11 +441,13 @@ def get_task(task_id: str, board: Optional[str] = Query(None)):
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
task_d = _task_dict(task)
# Attach warnings metadata so the drawer's Recovery section can
# auto-open when a hallucination is unresolved.
warnings = _compute_warnings_for_tasks(conn, task_ids=[task_id])
if warnings.get(task_id):
task_d["warnings"] = warnings[task_id]
# Attach diagnostics so the drawer's Diagnostics section can
# render recovery actions without a second round-trip.
diags = _compute_task_diagnostics(conn, task_ids=[task_id])
diag_list = diags.get(task_id) or []
if diag_list:
task_d["diagnostics"] = diag_list
task_d["warnings"] = _warnings_summary_from_diagnostics(diag_list)
return {
"task": task_d,
"comments": [_comment_dict(c) for c in kanban_db.list_comments(conn, task_id)],
@@ -795,6 +848,89 @@ def bulk_update(payload: BulkTaskBody, board: Optional[str] = Query(None)):
conn.close()
# ---------------------------------------------------------------------------
# Diagnostics — fleet-wide distress signals (hallucinations, crashes,
# spawn failures, stuck-blocked). See hermes_cli.kanban_diagnostics for
# the rule engine.
# ---------------------------------------------------------------------------
@router.get("/diagnostics")
def list_diagnostics(
board: Optional[str] = Query(None, description="Kanban board slug (omit for current)"),
severity: Optional[str] = Query(
None,
description="Filter by severity: warning|error|critical",
),
):
"""Return ``[{task_id, task_title, task_status, task_assignee,
diagnostics: [...]}, ...]`` for every task on the board with at
least one active diagnostic.
Severity-filterable so the UI can render "just the critical ones"
or the CLI can grep. Useful for the board-header attention strip
AND for ``hermes kanban diagnostics`` which shells to this
endpoint when the dashboard's running, or invokes the engine
directly when it isn't.
"""
board = _resolve_board(board)
conn = _conn(board=board)
try:
diags_by_task = _compute_task_diagnostics(conn, task_ids=None)
if not diags_by_task:
return {"diagnostics": [], "count": 0}
# Narrow by severity if asked.
if severity:
filtered: dict[str, list[dict]] = {}
for tid, dl in diags_by_task.items():
keep = [d for d in dl if d.get("severity") == severity]
if keep:
filtered[tid] = keep
diags_by_task = filtered
if not diags_by_task:
return {"diagnostics": [], "count": 0}
# Pull the task rows we need in one query so we can include
# titles/statuses without a per-task lookup.
ids = list(diags_by_task.keys())
placeholders = ",".join(["?"] * len(ids))
rows = {
r["id"]: r
for r in conn.execute(
f"SELECT id, title, status, assignee FROM tasks WHERE id IN ({placeholders})",
tuple(ids),
).fetchall()
}
out = []
for tid, dl in diags_by_task.items():
r = rows.get(tid)
out.append({
"task_id": tid,
"task_title": r["title"] if r else None,
"task_status": r["status"] if r else None,
"task_assignee": r["assignee"] if r else None,
"diagnostics": dl,
})
# Sort: highest severity first, then most recent.
from hermes_cli.kanban_diagnostics import SEVERITY_ORDER
sev_idx = {s: i for i, s in enumerate(SEVERITY_ORDER)}
def _sort_key(row):
top = row["diagnostics"][0]
return (
-sev_idx.get(top.get("severity"), -1),
-(top.get("last_seen_at") or 0),
)
out.sort(key=_sort_key)
return {
"diagnostics": out,
"count": sum(len(d["diagnostics"]) for d in out),
}
finally:
conn.close()
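
The severity filter and sort in the endpoint above can be sketched in isolation, without the database or FastAPI. This is a minimal illustration, again assuming ``SEVERITY_ORDER`` is ``("warning", "error", "critical")``; ``filter_and_sort`` and the sample data are hypothetical names and values. The sort key reads only the first diagnostic of each task, so it appears to rely on the engine returning each task's list with the most severe entry first.

```python
from typing import Optional

# Assumption: least to most severe; the real constant is in
# hermes_cli.kanban_diagnostics and is not shown in this diff.
SEVERITY_ORDER = ("warning", "error", "critical")
SEV_IDX = {s: i for i, s in enumerate(SEVERITY_ORDER)}


def filter_and_sort(
    diags_by_task: dict[str, list[dict]],
    severity: Optional[str] = None,
) -> list[dict]:
    """Hypothetical helper mirroring the endpoint's filter + sort steps."""
    if severity:
        filtered: dict[str, list[dict]] = {}
        for tid, dl in diags_by_task.items():
            keep = [d for d in dl if d.get("severity") == severity]
            if keep:  # drop tasks with nothing left after filtering
                filtered[tid] = keep
        diags_by_task = filtered

    rows = [{"task_id": tid, "diagnostics": dl}
            for tid, dl in diags_by_task.items()]

    def sort_key(row: dict) -> tuple:
        top = row["diagnostics"][0]  # only the first entry is inspected
        return (
            -SEV_IDX.get(top.get("severity"), -1),  # highest severity first
            -(top.get("last_seen_at") or 0),        # then most recent
        )

    rows.sort(key=sort_key)
    return rows


# Invented sample: two critical tasks and one warning-only task.
sample = {
    "t1": [{"kind": "stuck_in_blocked", "severity": "warning", "last_seen_at": 300}],
    "t2": [{"kind": "repeated_crashes", "severity": "critical", "last_seen_at": 100}],
    "t3": [{"kind": "repeated_spawn_failures", "severity": "critical", "last_seen_at": 200}],
}
all_ids = [r["task_id"] for r in filter_and_sort(sample)]
critical_ids = [r["task_id"] for r in filter_and_sort(sample, "critical")]
print(all_ids, critical_ids)
```

An unrecognised severity maps to index ``-1`` and therefore sorts after every known rung, which matches the ``sev_idx.get(..., -1)`` default in the endpoint.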
# ---------------------------------------------------------------------------
# Recovery actions — reclaim a running claim, reassign to a new profile
# ---------------------------------------------------------------------------