Workers completing a kanban task can now claim the ids of cards they created via an optional ``created_cards`` field on ``kanban_complete``. The kernel verifies that each id exists and was created by the completing worker's profile; any phantom id blocks the completion with a ``HallucinatedCardsError`` and records a ``completion_blocked_hallucination`` event on the task so the rejected attempt is auditable. Successful completions also get a non-blocking prose-scan pass over their ``summary`` + ``result`` that emits a ``suspected_hallucinated_references`` event for any ``t_<hex>`` reference that doesn't resolve.

Closes #20017.

Recovery UX (kernel + CLI + dashboard)
--------------------------------------

A structural gate alone isn't enough: operators also need to see and act on stuck workers, especially when a profile's model is the root cause. This PR ships the full loop:

* ``kanban_db.reclaim_task(task_id)``: operator-driven reclaim that releases an active worker claim immediately (unlike ``release_stale_claims``, which only acts after ``claim_expires`` has passed). Emits a ``reclaimed`` event with a ``manual: True`` payload.
* ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)``: switches a task to a different profile, optionally reclaiming a stuck running worker in the same call.
* ``hermes kanban reclaim <id> [--reason ...]`` and ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]`` CLI subcommands, wired through to the same helpers.
* ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the dashboard plugin.

Dashboard surfacing
-------------------

* ⚠ **warning badge** on cards with active hallucination events.
* **attention strip** at the top of the board listing all flagged tasks; dismissible per session.
* **events callout** in the task drawer: hallucination events render with a red left border, an amber icon, and phantom ids as styled chips.
* **recovery section** in the task drawer with three actions: Reclaim, Reassign (with a profile picker and a reclaim-first checkbox), and a copy-to-clipboard hint for ``hermes -p <profile> model``, since profile config lives on disk and can't be edited from the browser. The section auto-opens when the task has warnings and is collapsed otherwise; it is keyed by task id so state doesn't leak between drawers.

Active-vs-stale rule: warnings clear when a clean ``completed`` or ``edited`` event supersedes the hallucination, so recovery is never permanently stigmatising. The audit events persist for debugging, but the badge goes away once the worker succeeds.

Skill updates
-------------

* ``skills/devops/kanban-worker/SKILL.md`` documents the ``created_cards`` contract with good/bad examples.
* ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering stuck workers" section with the three actions and when to use each.

Tests
-----

* Kernel gate: verified-cards manifest, phantom rejection + audit event, cross-worker rejection, prose scan positive + negative.
* Recovery helpers: reclaim on a running task, reclaim on a non-running task returns False, reassign refuses a running task without ``reclaim_first``, reassign with ``reclaim_first`` succeeds on a running task.
* API endpoints: ``warnings`` field present on ``/board`` and ``/tasks/:id``, warnings cleared after a clean completion, reclaim 200 + 409 paths, reassign 200 + 409 + ``reclaim_first`` paths.
* CLI smoke: reclaim + reassign subcommands.

Live-verified end-to-end on a dashboard with seeded scenarios: the attention strip renders, badges land on the right cards, the drawer callout shows phantom chips, Reclaim on a running task flips its status to ready, emits a manual ``reclaimed`` event and refreshes the drawer, and Reassign swaps the assignee and triggers a board refresh. 359/359 kanban-suite tests pass (``test_kanban_{db,cli,boards,core_functionality}`` + dashboard + tools).
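The active-vs-stale rule described above can be read as a small function over a task's event log. This is an illustrative sketch, not the kernel's implementation: it assumes events are ``(kind, payload)`` tuples in oldest-first order, treats any ``completed`` or ``edited`` event as clean, and assumes a completion's own prose-scan warning is logged after its ``completed`` event. The name ``active_warnings`` and the payload shape are hypothetical; only the event kinds come from this PR.

```python
# Hypothetical sketch of the active-vs-stale rule; not the kernel's code.
HALLUCINATION_KINDS = {
    "completion_blocked_hallucination",
    "suspected_hallucinated_references",
}
CLEARING_KINDS = {"completed", "edited"}


def active_warnings(events):
    """Return hallucination events not yet superseded by a clean event.

    `events` is assumed to be a list of (kind, payload) tuples, oldest first.
    The input list is never modified, so the audit trail stays intact.
    """
    active = []
    for kind, payload in events:
        if kind in HALLUCINATION_KINDS:
            active.append((kind, payload))
        elif kind in CLEARING_KINDS:
            # A clean completion or edit supersedes every earlier warning.
            active.clear()
    return active
```

Because the function only filters what it returns, the log itself is untouched, which matches the rule that audit events persist while the badge clears.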
| name | description | version | metadata |
|---|---|---|---|
| kanban-orchestrator | Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role. | 2.0.0 | |
Kanban Orchestrator — Decomposition Playbook
The core worker lifecycle (including the `kanban_create` fan-out pattern and the "decompose, don't execute" rule) is auto-injected into every kanban process via the `KANBAN_GUIDANCE` system-prompt block. This skill is the deeper playbook when you're an orchestrator profile whose whole job is routing.
When to use the board (vs. just doing the work)
Create Kanban tasks when any of these are true:
- Multiple specialists are needed. Research + analysis + writing is three profiles.
- The work should survive a crash or restart. Long-running, recurring, or important.
- The user might want to interject. Human-in-the-loop at any step.
- Multiple subtasks can run in parallel. Fan-out for speed.
- Review / iteration is expected. A reviewer profile loops on drafter output.
- The audit trail matters. Board rows persist in SQLite forever.
If none of those apply (it's a small one-shot reasoning task), use `delegate_task` instead or answer the user directly.
The anti-temptation rules
Your job description says "route, don't execute." The rules that enforce that:
- Do not execute the work yourself. Your restricted toolset usually doesn't even include terminal/file/code/web for implementation. If you find yourself "just fixing this quickly" — stop and create a task for the right specialist.
- For any concrete task, create a Kanban task and assign it. Every single time.
- If no specialist fits, ask the user which profile to create. Do not default to doing it yourself under "close enough."
- Decompose, route, and summarize — that's the whole job.
The standard specialist roster (convention)
Unless the user's setup has customized profiles, assume these exist. Adjust to whatever the user actually has — ask if you're unsure.
| Profile | Does | Typical workspace |
|---|---|---|
| `researcher` | Reads sources, gathers facts, writes findings | `scratch` |
| `analyst` | Synthesizes, ranks, de-dupes. Consumes multiple researcher outputs | `scratch` |
| `writer` | Drafts prose in the user's voice | `scratch` or `dir:` into their Obsidian vault |
| `reviewer` | Reads output, leaves findings, gates approval | `scratch` |
| `backend-eng` | Writes server-side code | `worktree` |
| `frontend-eng` | Writes client-side code | `worktree` |
| `ops` | Runs scripts, manages services, handles deployments | `dir:` into ops scripts repo |
| `pm` | Writes specs, acceptance criteria | `scratch` |
Decomposition playbook
Step 1 — Understand the goal
Ask clarifying questions if the goal is ambiguous. Cheap to ask; expensive to spawn the wrong fleet.
Step 2 — Sketch the task graph
Before creating anything, draft the graph out loud (in your response to the user). Example for "Analyze whether we should migrate to Postgres":
```
T1  researcher  research: Postgres cost vs current
T2  researcher  research: Postgres performance vs current
T3  analyst     synthesize migration recommendation    parents: T1, T2
T4  writer      draft decision memo                    parents: T3
```
Show this to the user. Let them correct it before you create anything.
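Before creating anything, the draft can also be checked mechanically: every parent must exist, there must be no cycles, and parents have to be created before their children because `kanban_create` takes parent ids. A minimal sketch using Python's standard `graphlib`; the dict layout and the `creation_order` helper are hypothetical and not part of the kanban tools.

```python
from graphlib import CycleError, TopologicalSorter

# Draft from Step 2: label -> (assignee, parent labels).
DRAFT = {
    "T1": ("researcher", []),
    "T2": ("researcher", []),
    "T3": ("analyst", ["T1", "T2"]),
    "T4": ("writer", ["T3"]),
}


def creation_order(graph):
    """Return labels ordered so every parent comes before its children.

    Raises ValueError for an unknown parent label and graphlib.CycleError
    (itself a ValueError subclass) if the draft contains a cycle.
    """
    for label, (_, parents) in graph.items():
        unknown = [p for p in parents if p not in graph]
        if unknown:
            raise ValueError(f"{label} has unknown parents: {unknown}")
    sorter = TopologicalSorter(
        {label: parents for label, (_, parents) in graph.items()}
    )
    return list(sorter.static_order())
```

Creating tasks in the returned order guarantees that every `parents=[...]` list only refers to ids that already exist.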
Step 3 — Create tasks and link
```python
import os

tenant = os.environ.get("HERMES_TENANT")  # None when no tenant is set

t1 = kanban_create(
    title="research: Postgres cost vs current",
    assignee="researcher",
    body="Compare estimated infrastructure costs, migration costs, and ongoing ops costs over a 3-year window. Sources: AWS/GCP pricing, team time estimates, current Postgres bills from peers.",
    tenant=tenant,
)["task_id"]
t2 = kanban_create(
    title="research: Postgres performance vs current",
    assignee="researcher",
    body="Compare query latency, throughput, and scaling characteristics at our expected data volume (~500GB, 10k QPS peak). Sources: benchmark papers, public case studies, pgbench results if easy.",
    tenant=tenant,
)["task_id"]
# T3 waits for both research tasks; T4 waits for T3.
t3 = kanban_create(
    title="synthesize migration recommendation",
    assignee="analyst",
    body="Read the findings from T1 (cost) and T2 (performance). Produce a 1-page recommendation with explicit trade-offs and a go/no-go call.",
    parents=[t1, t2],
    tenant=tenant,
)["task_id"]
t4 = kanban_create(
    title="draft decision memo",
    assignee="writer",
    body="Turn the analyst's recommendation into a 2-page memo for the CTO. Match the tone of previous decision memos in the team's knowledge base.",
    parents=[t3],
    tenant=tenant,
)["task_id"]
```
`parents=[...]` gates promotion: children stay in `todo` until every parent reaches `done`, then auto-promote to `ready`. No manual coordination is needed; the dispatcher and dependency engine handle it.
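To make the `todo` → `ready` transition concrete, here is a toy model of the gating rule. It is a sketch under simplifying assumptions, not the dependency engine: statuses are plain strings, and the hypothetical `promote` function ignores persistence, events, and concurrency.

```python
def promote(statuses, parents):
    """Move todo tasks whose parents are all done to ready.

    statuses: task id -> status string (updated in place)
    parents:  task id -> list of parent ids
    Returns the ids promoted in this pass.
    """
    promoted = []
    for task_id, status in statuses.items():
        if status != "todo":
            continue
        # all() over an empty parent list is True, so parentless todo tasks promote.
        if all(statuses[p] == "done" for p in parents.get(task_id, [])):
            statuses[task_id] = "ready"
            promoted.append(task_id)
    return promoted
```

Note that a grandchild is not promoted in the same pass as its parent: T4 stays in `todo` while T3 is merely `ready`, because only `done` counts.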
Step 4 — Complete your own task
If you were spawned as a task yourself (e.g. a `planner` profile was assigned T0: "investigate Postgres migration"), mark it done with a summary of what you created:
```python
kanban_complete(
    summary="decomposed into T1-T4: 2 researchers parallel, 1 analyst on their outputs, 1 writer on the recommendation",
    metadata={
        "task_graph": {
            "T1": {"assignee": "researcher", "parents": []},
            "T2": {"assignee": "researcher", "parents": []},
            "T3": {"assignee": "analyst", "parents": ["T1", "T2"]},
            "T4": {"assignee": "writer", "parents": ["T3"]},
        },
    },
)
```
Step 5 — Report back to the user
Tell them what you created in plain prose:
I've queued 4 tasks:
- T1 (researcher): cost comparison
- T2 (researcher): performance comparison, in parallel with T1
- T3 (analyst): synthesizes T1 + T2 into a recommendation
- T4 (writer): turns T3 into a CTO memo
The dispatcher will pick up T1 and T2 now. T3 starts when both finish. You'll get a gateway ping when T4 completes. Use the dashboard or `hermes kanban tail <id>` to follow along.
Common patterns
Fan-out + fan-in (research → synthesize): N researcher tasks with no parents, one analyst task with all of them as parents.
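The fan-out + fan-in shape in code. To keep the sketch runnable outside a worker, `kanban_create` here is a hypothetical in-memory stand-in that only mimics the keyword arguments used in Step 3 and the `{"task_id": ...}` return shape; the id format and the `BOARD` dict are made up for illustration.

```python
import itertools

# Hypothetical stand-in for the kanban_create tool; the real tool persists tasks.
_ids = itertools.count(1)
BOARD = {}


def kanban_create(title, assignee, body="", parents=None, tenant=None):
    """Record a task in BOARD and return {"task_id": ...}."""
    task_id = f"t_{next(_ids):04x}"
    BOARD[task_id] = {
        "title": title,
        "assignee": assignee,
        "body": body,
        "parents": list(parents or []),
        "tenant": tenant,
    }
    return {"task_id": task_id}


def fan_out_fan_in(topics):
    # Fan-out: one researcher task per topic, none of which waits on another.
    research_ids = [
        kanban_create(title=f"research: {topic}", assignee="researcher")["task_id"]
        for topic in topics
    ]
    # Fan-in: a single analyst task gated on every research task.
    synth_id = kanban_create(
        title="synthesize findings",
        assignee="analyst",
        parents=research_ids,
    )["task_id"]
    return research_ids, synth_id
```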
Pipeline with gates: `pm` → `backend-eng` → `reviewer`. Each stage sets `parents=[previous_task]`. The reviewer either blocks or completes; if it blocks, the operator unblocks it with feedback and the dispatcher respawns the worker.
Same-profile queue: 50 tasks, all assigned to `translator`, with no dependencies between them. The dispatcher serializes them: `translator` processes them in priority order, accumulating experience in its own memory.
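One way to picture that per-profile serialization is a priority queue drained one task at a time. The dispatcher's actual priority semantics are not specified in this skill, so this sketch assumes an integer priority where larger numbers run first and ties keep arrival order; `ProfileQueue` is a hypothetical name.

```python
import heapq
import itertools


class ProfileQueue:
    """Hypothetical per-profile queue: highest priority first, FIFO on ties."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def push(self, task_id, priority=0):
        # heapq is a min-heap, so negate priority to pop the largest first;
        # the sequence number keeps equal-priority tasks in arrival order.
        heapq.heappush(self._heap, (-priority, next(self._seq), task_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```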
Human-in-the-loop: any task can call `kanban_block()` to wait for input. The dispatcher respawns the worker after `/unblock`. The comment thread carries the full context.
Pitfalls
Reassignment vs. new task. If a reviewer blocks with "needs changes," create a NEW task linked from the reviewer's task — don't re-run the same task with a stern look. The new task is assigned to the original implementer profile.
Argument order for links. `kanban_link(parent_id=..., child_id=...)`: parent first. Mixing them up demotes the wrong task to `todo`.
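A toy illustration of why the order matters. For the sketch only, it assumes that linking demotes a `ready` child to `todo` when its new parent isn't `done`; `link_tasks` and the board dict are hypothetical and do not reproduce the real `kanban_link` signature.

```python
def link_tasks(board, parent_id, child_id):
    """Toy link: the child gains a parent and waits if that parent isn't done."""
    child = board[child_id]
    child["parents"].append(parent_id)
    if board[parent_id]["status"] != "done" and child["status"] == "ready":
        child["status"] = "todo"


def fresh_board():
    return {
        "t_spec": {"status": "ready", "parents": []},
        "t_impl": {"status": "ready", "parents": []},
    }
```

With `parent_id="t_spec", child_id="t_impl"` the implementation task waits for the spec, as intended; swapping the arguments parks the spec behind the implementation instead.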
Don't pre-create the whole graph if the shape depends on intermediate findings. If T3's structure depends on what T1 and T2 find, let T3 exist as a "synthesize findings" task whose own first step is to read parent handoffs and plan the rest. Orchestrators can spawn orchestrators.
Tenant inheritance. If `HERMES_TENANT` is set in your env, pass `tenant=os.environ.get("HERMES_TENANT")` on every `kanban_create` call so child tasks stay in the same namespace.
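When creating many tasks, binding the tenant once avoids forgetting it on a single call. A minimal sketch with `functools.partial`; `tenant_scoped` is a hypothetical helper, and `create` stands for whatever callable issues `kanban_create`. It reads `HERMES_TENANT` once, when the wrapper is built.

```python
import os
from functools import partial


def tenant_scoped(create):
    """Return `create` with tenant= pre-filled from HERMES_TENANT.

    The environment is read once, when the wrapper is built; an explicit
    tenant= passed at call time still overrides it.
    """
    return partial(create, tenant=os.environ.get("HERMES_TENANT"))


# Usage inside a worker, where kanban_create is the tool (not defined here):
# create = tenant_scoped(kanban_create)
# t1 = create(title="research: ...", assignee="researcher")["task_id"]
```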
Recovering stuck workers
When a worker profile keeps crashing, hallucinating, or getting blocked by its own mistakes (usually: wrong model, missing skill, broken credential), the kanban dashboard flags the task with a ⚠ badge and opens a Recovery section in the drawer. Three primary actions:
- Reclaim (or `hermes kanban reclaim <task_id>`): abort the running worker immediately and reset the task to `ready`. Without it, a stuck claim is only released once its TTL (~15 min) expires; this is the fast path out.
- Reassign (or `hermes kanban reassign <task_id> <new-profile> --reclaim`): switch the task to a different profile and let the dispatcher pick it up with a fresh worker.
- Change profile model: the dashboard prints a copy-paste hint for `hermes -p <profile> model`, since profile config lives on disk; edit it in a terminal, then Reclaim to retry with the new model.
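The reclaim/reassign rules can be summarized as a small in-memory model. This is not `kanban_db`: the `Task` shape, the function names, and returning `False` for a refused reassign are assumptions. What it encodes comes from the PR description: reclaim only acts on a running task, resets it to `ready` and logs a manual `reclaimed` event, and reassign refuses a running task unless `reclaim_first` is set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    assignee: str
    status: str  # e.g. "todo", "ready", "running", "done"
    claimed_by: Optional[str] = None
    events: list = field(default_factory=list)


def reclaim(task, reason=None):
    """Release a running worker's claim now instead of waiting for its TTL."""
    if task.status != "running":
        return False  # nothing to reclaim
    task.claimed_by = None
    task.status = "ready"
    task.events.append(("reclaimed", {"manual": True, "reason": reason}))
    return True


def reassign(task, profile, reclaim_first=False):
    """Point the task at another profile; a running task needs reclaim_first."""
    if task.status == "running":
        if not reclaim_first:
            return False  # refused: a worker still holds the claim
        reclaim(task, reason=f"reassigned to {profile}")
    task.assignee = profile
    return True
```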
Hallucination warnings appear on tasks where a worker's `kanban_complete(created_cards=[...])` claim included card ids that don't exist or weren't created by the worker's profile (a blocking gate: the completion is rejected), or where the free-form `summary` or `result` references `t_<hex>` ids that don't resolve (an advisory, non-blocking prose scan). Both produce audit events that persist even after recovery actions, so the trail stays available for debugging; the ⚠ badge itself clears once a clean `completed` or `edited` event supersedes the warning.
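Both checks are straightforward to sketch. Only `HallucinatedCardsError`, the `created_cards` field, and the `t_<hex>` reference form come from this skill and its PR; the board dict with a `created_by` field, the lowercase-hex regex, and the helper names are assumptions for illustration.

```python
import re


class HallucinatedCardsError(Exception):
    """Raised with the list of phantom ids when the manifest check fails."""


# Assumed id shape: "t_" followed by lowercase hex, as a whole word.
TASK_REF = re.compile(r"\bt_[0-9a-f]+\b")


def verify_created_cards(board, profile, created_cards):
    """Blocking gate: every claimed id must exist and be created by `profile`."""
    phantom = [
        card_id
        for card_id in created_cards
        if card_id not in board or board[card_id]["created_by"] != profile
    ]
    if phantom:
        raise HallucinatedCardsError(phantom)
    return list(created_cards)


def scan_prose(board, *texts):
    """Advisory scan: unresolved t_<hex> references in free text, deduplicated."""
    found = []
    for text in texts:
        for ref in TASK_REF.findall(text or ""):
            if ref not in board and ref not in found:
                found.append(ref)
    return found
```

The split mirrors the design: the manifest is structured data the worker explicitly asserted, so it can be enforced, while free text is only scanned heuristically, so a miss there is reported as a suspicion rather than blocking the completion.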