mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 02:21:47 +00:00
Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing.

What this lands:

- SQLite kernel (hermes_cli/kanban_db.py): durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts.
- Dispatcher: runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue.
- Structured tool surface (tools/kanban_tools.py): 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions.
- System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active.
- Dashboard plugin (plugins/kanban/dashboard/): Linear-style board UI with triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables.
- CLI: `hermes kanban init|create|list|show|assign|link|unlink|claim|comment|complete|block|unblock|archive|tail|dispatch|context|init|gc|watch|stats|notify|log|heartbeat|runs|assignees` + `/kanban` slash in-session.
- Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator): pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns.
- Per-task force-loaded skills: `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity.
- Deprecation of standalone `hermes kanban daemon`: stub exits 2 with migration guidance; `--force` escape hatch for headless hosts.
- Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker).
- Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per-task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas.

Gateway integration:

- GatewayRunner._kanban_dispatcher_watcher: new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging.
- Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive; no _config_version bump needed.

Forward-compat:

- workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing).
- task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first-class from day one.

Closes #16102.

Co-authored-by: emozilla <emozilla@nousresearch.com>
134 lines
6.8 KiB
Markdown
---
name: kanban-worker
description: Pitfalls, examples, and edge cases for Hermes Kanban workers. The lifecycle itself is auto-injected into every worker's system prompt as KANBAN_GUIDANCE (from agent/prompt_builder.py); this skill is what you load when you want deeper detail on specific scenarios.
version: 2.0.0
metadata:
  hermes:
    tags: [kanban, multi-agent, collaboration, workflow, pitfalls]
    related_skills: [kanban-orchestrator]
---

# Kanban Worker: Pitfalls and Examples

> You're seeing this skill because the Hermes Kanban dispatcher spawned you as a worker with `--skills kanban-worker`, which is loaded automatically for every dispatched worker. The **lifecycle** (6 steps: orient → work → heartbeat → block/complete) also lives in the `KANBAN_GUIDANCE` block that's auto-injected into your system prompt. This skill covers the deeper detail: good handoff shapes, retry diagnostics, edge cases.

## Workspace handling

Your workspace kind determines how you should behave inside `$HERMES_KANBAN_WORKSPACE`:

| Kind | What it is | How to work |
|---|---|---|
| `scratch` | Fresh tmp dir, yours alone | Read/write freely; it gets GC'd when the task is archived. |
| `dir:<path>` | Shared persistent directory | Other runs will read what you write. Treat it like long-lived state. The path is guaranteed to be absolute (the kernel rejects relative paths). |
| `worktree` | Git worktree at the resolved path | If `.git` doesn't exist, run `git worktree add <path> <branch>` from the main repo first, then `cd` into it and work normally. Commit work here. |

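As a rough illustration of the table above, a worker could normalise the workspace kind before touching the filesystem. This is only a sketch: `parse_workspace_kind` is a hypothetical helper, not part of Hermes, and it assumes the kind reaches the worker as a plain string such as `scratch`, `dir:<path>`, or `worktree`.

```python
import os
from typing import Optional, Tuple


def parse_workspace_kind(kind: str) -> Tuple[str, Optional[str]]:
    """Split a workspace kind string into (kind, path).

    Hypothetical helper, not a Hermes API. Only `dir:<path>` carries a path.
    """
    if kind in ("scratch", "worktree"):
        return kind, None
    if kind.startswith("dir:"):
        path = kind[len("dir:"):]
        # The kernel is described as rejecting relative paths; mirror that check.
        if not os.path.isabs(path):
            raise ValueError(f"dir workspace must be absolute: {path!r}")
        return "dir", path
    raise ValueError(f"unknown workspace kind: {kind!r}")
```

Failing loudly on an unknown kind is a deliberate choice here: a worker that guesses at workspace semantics risks writing into shared state it should have treated as read-mostly.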
## Tenant isolation

If `$HERMES_TENANT` is set, the task belongs to a tenant namespace. When reading or writing persistent memory, prefix memory entries with the tenant so context doesn't leak across tenants:

- Good: `business-a: Acme is our biggest customer`
- Bad (leaks): `Acme is our biggest customer`

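The prefixing rule above can be wrapped in a small helper so it is applied consistently. The sketch below is an assumption-laden illustration: `tenant_prefixed` is a hypothetical name, and the `<tenant>: <entry>` format is simply taken from the "Good" example.

```python
import os
from typing import Optional


def tenant_prefixed(entry: str, tenant: Optional[str] = None) -> str:
    """Prefix a memory entry with the tenant namespace, if one is set.

    Hypothetical helper; falls back to $HERMES_TENANT when no tenant is passed.
    """
    if tenant is None:
        tenant = os.environ.get("HERMES_TENANT")
    if not tenant:
        return entry  # no tenant namespace: store the entry unchanged
    prefix = f"{tenant}: "
    # Avoid double-prefixing entries that are already namespaced.
    return entry if entry.startswith(prefix) else prefix + entry
```

For example, `tenant_prefixed("Acme is our biggest customer", "business-a")` yields the "Good" entry shown above, and calling it again on the result leaves it unchanged.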
## Good summary + metadata shapes

The `kanban_complete(summary=..., metadata=...)` handoff is how downstream workers read what you did. Patterns that work:

**Coding task:**
```python
kanban_complete(
    summary="shipped rate limiter: token bucket, keys on user_id with IP fallback, 14 tests pass",
    metadata={
        "changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
        "tests_run": 14,
        "tests_passed": 14,
        "decisions": ["user_id primary, IP fallback for unauthenticated requests"],
    },
)
```

**Research task:**
```python
kanban_complete(
    summary="3 competing libraries reviewed; vLLM wins on throughput, SGLang on latency, TensorRT-LLM on memory efficiency",
    metadata={
        "sources_read": 12,
        "recommendation": "vLLM",
        "benchmarks": {"vllm": 1.0, "sglang": 0.87, "trtllm": 0.72},
    },
)
```

**Review task:**
```python
kanban_complete(
    summary="reviewed PR #123; 2 blocking issues found (SQL injection in /search, missing CSRF on /settings)",
    metadata={
        "pr_number": 123,
        "findings": [
            {"severity": "critical", "file": "api/search.py", "line": 42, "issue": "raw SQL concat"},
            {"severity": "high", "file": "api/settings.py", "issue": "missing CSRF middleware"},
        ],
        "approved": False,
    },
)
```

Shape `metadata` so downstream parsers (reviewers, aggregators, schedulers) can use it without re-reading your prose.

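One cheap way to keep `metadata` machine-readable is to confirm it serializes to JSON before handing it off. The sketch below assumes metadata is expected to survive a JSON round trip (the CLI equivalent takes `--metadata '{...}'`, which suggests so); `check_metadata` is a hypothetical helper, not a Hermes function.

```python
import json
from typing import Any, Dict


def check_metadata(metadata: Dict[str, Any]) -> str:
    """Return metadata as a JSON string, or raise TypeError if it cannot be serialized.

    Hypothetical pre-flight check to run before calling kanban_complete.
    """
    # json.dumps raises TypeError on values such as sets, bytes, or custom objects.
    return json.dumps(metadata, sort_keys=True)
```

Calling this first surfaces a stray `set` or custom object in your own run, where you can still fix it, rather than at the handoff.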
## Block reasons that get answered fast

Bad: `"stuck"`. The human has no context.

Good: one sentence naming the specific decision you need. Leave longer context as a comment instead.

```python
import os  # needed for the HERMES_KANBAN_TASK lookup below

# Put the long-form context in a comment on the task...
kanban_comment(
    task_id=os.environ["HERMES_KANBAN_TASK"],
    body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers. Keying on IP alone causes false positives.",
)
# ...and keep the block reason to the one decision you need made.
kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id (requires auth, skips anonymous endpoints)?")
```

The block message is what appears in the dashboard / gateway notifier. The comment is the deeper context a human reads when they open the task.

## Heartbeats worth sending

Good heartbeats name progress: `"epoch 12/50, loss 0.31"`, `"scanned 1.2M/2.4M rows"`, `"uploaded 47/120 videos"`.

Bad heartbeats: `"still working"`, empty notes, sub-second intervals. Send one every few minutes at most, and skip heartbeats entirely for tasks under ~2 minutes.

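The pacing advice above can be enforced mechanically with a small throttle. This is a minimal sketch, not Hermes code: `HeartbeatThrottle` is a hypothetical class, the 180-second default is an arbitrary reading of "every few minutes", and the `send` callable stands in for however you invoke the `kanban_heartbeat` tool.

```python
import time
from typing import Callable, Optional


class HeartbeatThrottle:
    """Forward progress notes to `send`, at most once per `min_interval` seconds."""

    def __init__(
        self,
        send: Callable[[str], None],
        min_interval: float = 180.0,  # assumed default; tune per task
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Optional[float] = None

    def beat(self, note: str) -> bool:
        """Send `note` if it is non-empty and the interval has elapsed.

        Returns True when the note was actually sent.
        """
        if not note.strip():
            return False  # empty notes carry no progress information
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # too soon since the last heartbeat
        self._send(note)
        self._last_sent = now
        return True
```

With this in place a tight loop can call `beat(...)` on every iteration and only a note every few minutes goes out. The injectable `clock` exists purely so the pacing can be tested without sleeping.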
## Retry scenarios

If you open the task and `kanban_show` returns `runs: [...]` with one or more closed runs, you're a retry. The prior runs' `outcome` / `summary` / `error` tell you what didn't work. Don't repeat that path. Typical retry diagnostics:

- `outcome: "timed_out"`: the previous attempt hit `max_runtime_seconds`. You may need to chunk the work or shorten it.
- `outcome: "crashed"`: OOM or segfault. Reduce memory footprint.
- `outcome: "spawn_failed"` + `error: "..."`: usually a profile config issue (missing credential, bad PATH). Ask the human via `kanban_block` instead of retrying blindly.
- `outcome: "reclaimed"` + `summary: "task archived..."`: an operator archived the task out from under the previous run. You probably shouldn't be running at all, so check the task status carefully.
- `outcome: "blocked"`: a previous attempt blocked; the unblock comment should be in the thread by now.

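The outcomes listed above lend themselves to a simple lookup when you first inspect the run history. The sketch below is illustrative only: `retry_hints` is a hypothetical helper, and it assumes each run is a dict whose `outcome` key is absent or `None` while the run is still open.

```python
from typing import Any, Dict, List

# Hints paraphrased from the list above, keyed by the prior run's outcome.
RETRY_HINTS = {
    "timed_out": "chunk or shorten the work",
    "crashed": "reduce memory footprint",
    "spawn_failed": "ask the human via kanban_block",
    "reclaimed": "check task status before doing anything",
    "blocked": "read the unblock comment in the thread",
}


def retry_hints(runs: List[Dict[str, Any]]) -> List[str]:
    """Return one hint per prior run that has a recognised outcome."""
    hints = []
    for run in runs:
        outcome = run.get("outcome")  # assumed None/missing for an open run
        if outcome in RETRY_HINTS:
            hints.append(f"{outcome}: {RETRY_HINTS[outcome]}")
    return hints
```

An empty result means either a first attempt or outcomes this table doesn't know about, so it is worth reading the raw `summary` / `error` fields as well.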
## Do NOT

- Call `delegate_task` as a substitute for `kanban_create`. `delegate_task` is for short reasoning subtasks inside YOUR run; `kanban_create` is for cross-agent handoffs that outlive one API loop.
- Modify files outside `$HERMES_KANBAN_WORKSPACE` unless the task body says to.
- Create follow-up tasks assigned to yourself. Assign them to the right specialist instead.
- Complete a task you didn't actually finish. Block it instead.

## Pitfalls

**Task state can change between dispatch and your startup.** Between the dispatcher claiming the task and your process actually booting, the task may have been blocked, reassigned, or archived. Always call `kanban_show` first. If it reports `blocked` or `archived`, stop: you shouldn't be running.

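The startup check described above amounts to a one-line guard. In this sketch, `should_proceed` is a hypothetical helper and the `status` key is an assumed field name for whatever `kanban_show` returns; adjust it to the actual payload.

```python
from typing import Any, Dict

# Statuses under which the text above says a worker should stop.
STOP_STATUSES = {"blocked", "archived"}


def should_proceed(task: Dict[str, Any]) -> bool:
    """Return False when the task's status says this worker should not be running.

    `status` is an assumed key in the kanban_show result.
    """
    return task.get("status") not in STOP_STATUSES
```

Run it on the `kanban_show` result before doing any work, and exit quietly when it returns `False`.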
**Workspace may have stale artifacts.** `dir:` and `worktree` workspaces in particular can contain files from previous runs. Read the comment thread; it usually explains why you're running again and what state the workspace is in.

**Don't rely on the CLI when the tools are available.** The `kanban_*` tools work across all terminal backends (Docker, Modal, SSH). `hermes kanban <verb>` from your terminal tool will fail in containerized backends because the CLI isn't installed there. When in doubt, use the tool.

## CLI fallback (for scripting)

Every tool has a CLI equivalent for human operators and scripts:

- `kanban_show` ↔ `hermes kanban show <id> --json`
- `kanban_complete` ↔ `hermes kanban complete <id> --summary "..." --metadata '{...}'`
- `kanban_block` ↔ `hermes kanban block <id> "reason"`
- `kanban_create` ↔ `hermes kanban create "title" --assignee <profile> [--parent <id>]`
- and so on for the remaining tools.

Use the tools from inside an agent; the CLI exists for the human at the terminal.
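For operator scripts, building the command as an argument list sidesteps shell-quoting problems with the JSON metadata. The sketch below uses only the `complete` flags listed above; `complete_argv` is a hypothetical helper and the task id in the usage note is made up.

```python
import json
from typing import Any, Dict, List


def complete_argv(task_id: str, summary: str, metadata: Dict[str, Any]) -> List[str]:
    """Build the argv for `hermes kanban complete` from the flags listed above.

    Hypothetical helper for scripts; it only assembles arguments and runs nothing.
    """
    return [
        "hermes", "kanban", "complete", task_id,
        "--summary", summary,
        # Serialize metadata once so the CLI receives a single JSON argument.
        "--metadata", json.dumps(metadata),
    ]
```

A script would then pass the result to `subprocess.run(complete_argv("t1", "done", {"tests_run": 14}), check=True)`. Passing a list rather than a shell string means the quotes inside the JSON never need escaping.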