Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init|create|list|show|assign|link|unlink| claim|comment|complete|block|unblock|archive|tail|dispatch|context| init|gc|watch|stats|notify|log|heartbeat|runs|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>
6.8 KiB
| name | description | version | metadata | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| kanban-worker | Pitfalls, examples, and edge cases for Hermes Kanban workers. The lifecycle itself is auto-injected into every worker's system prompt as KANBAN_GUIDANCE (from agent/prompt_builder.py); this skill is what you load when you want deeper detail on specific scenarios. | 2.0.0 |
|
Kanban Worker — Pitfalls and Examples
You're seeing this skill because the Hermes Kanban dispatcher spawned you as a worker with
--skills kanban-worker— it's loaded automatically for every dispatched worker. The lifecycle (6 steps: orient → work → heartbeat → block/complete) also lives in theKANBAN_GUIDANCEblock that's auto-injected into your system prompt. This skill is the deeper detail: good handoff shapes, retry diagnostics, edge cases.
Workspace handling
Your workspace kind determines how you should behave inside $HERMES_KANBAN_WORKSPACE:
| Kind | What it is | How to work |
|---|---|---|
scratch |
Fresh tmp dir, yours alone | Read/write freely; it gets GC'd when the task is archived. |
dir:<path> |
Shared persistent directory | Other runs will read what you write. Treat it like long-lived state. Path is guaranteed absolute (the kernel rejects relative paths). |
worktree |
Git worktree at the resolved path | If .git doesn't exist, run git worktree add <path> <branch> from the main repo first, then cd and work normally. Commit work here. |
Tenant isolation
If $HERMES_TENANT is set, the task belongs to a tenant namespace. When reading or writing persistent memory, prefix memory entries with the tenant so context doesn't leak across tenants:
- Good:
business-a: Acme is our biggest customer - Bad (leaks):
Acme is our biggest customer
Good summary + metadata shapes
The kanban_complete(summary=..., metadata=...) handoff is how downstream workers read what you did. Patterns that work:
Coding task:
kanban_complete(
summary="shipped rate limiter — token bucket, keys on user_id with IP fallback, 14 tests pass",
metadata={
"changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
"tests_run": 14,
"tests_passed": 14,
"decisions": ["user_id primary, IP fallback for unauthenticated requests"],
},
)
Research task:
kanban_complete(
summary="3 competing libraries reviewed; vLLM wins on throughput, SGLang on latency, Tensorrt-LLM on memory efficiency",
metadata={
"sources_read": 12,
"recommendation": "vLLM",
"benchmarks": {"vllm": 1.0, "sglang": 0.87, "trtllm": 0.72},
},
)
Review task:
kanban_complete(
summary="reviewed PR #123; 2 blocking issues found (SQL injection in /search, missing CSRF on /settings)",
metadata={
"pr_number": 123,
"findings": [
{"severity": "critical", "file": "api/search.py", "line": 42, "issue": "raw SQL concat"},
{"severity": "high", "file": "api/settings.py", "issue": "missing CSRF middleware"},
],
"approved": False,
},
)
Shape metadata so downstream parsers (reviewers, aggregators, schedulers) can use it without re-reading your prose.
Block reasons that get answered fast
Bad: "stuck" — the human has no context.
Good: one sentence naming the specific decision you need. Leave longer context as a comment instead.
kanban_comment(
task_id=os.environ["HERMES_KANBAN_TASK"],
body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers. Keying on IP alone causes false positives.",
)
kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id (requires auth, skips anonymous endpoints)?")
The block message is what appears in the dashboard / gateway notifier. The comment is the deeper context a human reads when they open the task.
Heartbeats worth sending
Good heartbeats name progress: "epoch 12/50, loss 0.31", "scanned 1.2M/2.4M rows", "uploaded 47/120 videos".
Bad heartbeats: "still working", empty notes, sub-second intervals. Every few minutes max; skip entirely for tasks under ~2 minutes.
Retry scenarios
If you open the task and kanban_show returns runs: [...] with one or more closed runs, you're a retry. The prior runs' outcome / summary / error tell you what didn't work. Don't repeat that path. Typical retry diagnostics:
outcome: "timed_out"— the previous attempt hitmax_runtime_seconds. You may need to chunk the work or shorten it.outcome: "crashed"— OOM or segfault. Reduce memory footprint.outcome: "spawn_failed"+error: "..."— usually a profile config issue (missing credential, bad PATH). Ask the human viakanban_blockinstead of retrying blindly.outcome: "reclaimed"+summary: "task archived..."— operator archived the task out from under the previous run; you probably shouldn't be running at all, check status carefully.outcome: "blocked"— a previous attempt blocked; the unblock comment should be in the thread by now.
Do NOT
- Call
delegate_taskas a substitute forkanban_create.delegate_taskis for short reasoning subtasks inside YOUR run;kanban_createis for cross-agent handoffs that outlive one API loop. - Modify files outside
$HERMES_KANBAN_WORKSPACEunless the task body says to. - Create follow-up tasks assigned to yourself — assign to the right specialist.
- Complete a task you didn't actually finish. Block it instead.
Pitfalls
Task state can change between dispatch and your startup. Between when the dispatcher claimed and when your process actually booted, the task may have been blocked, reassigned, or archived. Always kanban_show first. If it reports blocked or archived, stop — you shouldn't be running.
Workspace may have stale artifacts. Especially dir: and worktree workspaces can have files from previous runs. Read the comment thread — it usually explains why you're running again and what state the workspace is in.
Don't rely on the CLI when the guidance is available. The kanban_* tools work across all terminal backends (Docker, Modal, SSH). hermes kanban <verb> from your terminal tool will fail in containerized backends because the CLI isn't installed there. When in doubt, use the tool.
CLI fallback (for scripting)
Every tool has a CLI equivalent for human operators and scripts:
kanban_show↔hermes kanban show <id> --jsonkanban_complete↔hermes kanban complete <id> --summary "..." --metadata '{...}'kanban_block↔hermes kanban block <id> "reason"kanban_create↔hermes kanban create "title" --assignee <profile> [--parent <id>]- etc.
Use the tools from inside an agent; the CLI exists for the human at the terminal.