hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
Teknium	24d48ffb82	feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks (#21435 ) * feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks The Triage column shipped with a placeholder 'a specifier will flesh out the spec', but the specifier itself was never built. This wires it up as a dedicated CLI verb. `hermes kanban specify <id>` calls the auxiliary LLM (configured under `auxiliary.triage_specifier`) to expand a rough one-liner into a concrete spec — tightened title plus a body with Goal / Approach / Acceptance criteria / Out-of-scope sections — then atomically flips `status: triage -> todo` and recomputes ready so parent-free tasks go straight to the dispatcher on the same tick. Surface: hermes kanban specify <task_id> # single task hermes kanban specify --all [--tenant T] # sweep triage column hermes kanban specify ... --author NAME # audit-comment author hermes kanban specify ... --json # one JSON line per task Design choices: - Parent gating is preserved. specify_triage_task flips to 'todo', then recompute_ready promotes to 'ready' only when parents are done — same rule as a normal parent-gated todo. - No daemon, no background watcher. Every invocation is explicit — keeps cost predictable and doesn't fight the dispatcher loop. - Response parse is lenient: strict JSON preferred, markdown-fence tolerated, raw-body fallback on malformed JSON so the LLM can't strand a task in triage. - All failure modes (no aux client, API error, task moved out of triage mid-call) return SpecifyOutcome(ok=False, reason=...) so --all continues past individual failures. Changes: hermes_cli/kanban_db.py + specify_triage_task() hermes_cli/kanban_specify.py NEW (~220 LOC — prompt, parse, call) hermes_cli/kanban.py + specify subcommand + _cmd_specify hermes_cli/config.py + auxiliary.triage_specifier task slot website/docs/user-guide/features/kanban.md specify + config notes website/docs/reference/cli-commands.md CLI reference entry tests/hermes_cli/test_kanban_specify_db.py NEW (10 tests) tests/hermes_cli/test_kanban_specify.py NEW (20 tests) Validation: 30/30 targeted tests pass. E2E: triage task -> specify -> ends in 'ready' with events [created, specified, promoted] and the audit comment recorded under the configured author. * feat(kanban): wire specifier into dashboard and gateway slash Follow-ups to the initial PR #21435 — closes the two gaps I'd left as post-merge: dashboard button and first-class gateway surface. Dashboard (plugins/kanban/dashboard/) - POST /tasks/:id/specify NEW endpoint. Thin wrapper around kanban_specify.specify_task(). Returns the CLI outcome shape ({ok, task_id, reason, new_title}); ok=false with a human reason is a 200, not a 4xx, so the UI can render it inline without treating 'no aux client configured' as a crash. - Runs sync in FastAPI's threadpool because the LLM call can take tens of seconds on reasoning models. - Pins HERMES_KANBAN_BOARD around the specify call so the module's argless kb.connect() lands on the right board. - dist/index.js: doSpecify callback threaded through the drawer → TaskDetail → StatusActions prop chain. ✨ Specify button appears ONLY when task.status === 'triage' (elsewhere the backend would reject anyway — hide the button to keep the action row clean). Busy state (Specifying…) + inline success/error banner under the button using the response.reason text. - dist/style.css: tiny hermes-kanban-msg-ok / -err classes using existing --color vars so themes reskin cleanly. Gateway slash (/kanban specify) - Already works via the existing run_slash → build_parser → kanban_command pipeline. No code change needed — slash commands inherit the argparse tree automatically. Added coverage: test_run_slash_specify_end_to_end (create --triage, specify, verify promotion + retitle) and test_run_slash_specify_help_is_reachable. Tests - tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the REST endpoint — happy path, non-triage rejection as ok=false 200, missing aux client as ok=false 200. - tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests. Docs - website/docs/user-guide/features/kanban.md: dashboard action row description mentions ✨ Specify + all three surfaces. REST table gains /tasks/:id/specify. Slash examples include /kanban specify. Validation: 340/340 targeted tests pass. E2E via TestClient: create a triage task over REST → POST /specify with mocked aux client → task moves to 'ready' column on /board with new title and body applied.	2026-05-07 13:04:41 -07:00
SandroHub013	36ad97337a	fix(kanban): treat dashboard event-stream cancellation as normal shutdown Stopping `hermes dashboard` with Ctrl-C while the Kanban dashboard is open prints an ASGI traceback ending in `plugins/kanban/dashboard/plugin_api.py::stream_events` at the `asyncio.sleep(_EVENT_POLL_SECONDS)` line. This is a normal shutdown path: Uvicorn cancels the open websocket task while it is sleeping in the 300 ms poll loop. `asyncio.CancelledError` is a `BaseException` in Python 3.8+ — the bare `except Exception:` handler below the existing `WebSocketDisconnect:` clause does NOT catch it, so the cancellation surfaces as an application traceback and routine dashboard exit looks like a runtime failure. Add an explicit `except asyncio.CancelledError: return` clause beside the existing `WebSocketDisconnect` handler. Disconnection (client closed the tab) and shutdown cancellation (dashboard process exiting) are conceptually different paths but both warrant a quiet return; the two clauses are kept separate to keep that intent explicit. `asyncio` is already imported and used in this scope, so no new import is needed. The bare `except Exception:` handler is preserved verbatim, so genuine runtime failures still log a warning and close the socket cleanly. Closes #20790.	2026-05-07 05:31:07 -07:00
Brecht-H	3f97297413	feat(kanban): surface task_runs.summary on dashboard cards + ``kanban show`` The kanban-worker skill (built into the gateway dispatcher's spawn prompt) instructs every worker to hand off via ``kanban_complete(summary=..., metadata=...)``. That writes the summary onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the latter is left NULL unless the caller passes ``result=`` explicitly. Result: a glance at the dashboard or ``hermes kanban show <id>`` shows a blank "Result:" section even when the worker did real work, which on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a task that had a 10-line completion summary on its run. This patch surfaces the latest non-null run summary as ``latest_summary`` so the worker's actual handoff lands in front of operators. * New helpers ``kanban_db.latest_summary(conn, task_id)`` and ``kanban_db.latest_summaries(conn, task_ids)``. The batch variant uses a single window-function SELECT so the dashboard board endpoint doesn't pay an N+1 cost on multi-hundred-task boards. * CLI ``hermes kanban show <id>`` prints a "Latest summary:" block when ``tasks.result`` is empty but a run has produced a summary (the existing "Result:" section still wins when populated, so the back-compat path for hand-edited results is untouched). JSON output gains a top-level ``latest_summary`` field. * Dashboard ``/board`` and ``/tasks/{id}`` now include a ``latest_summary`` field on every task. Cards on /board carry a 200-character preview (cheap to render, plenty for "what did this worker do?" at a glance); the drawer/detail endpoint returns the full summary. * Five new tests cover: empty-runs case, post-complete surface, newest-of-multiple selection, empty-string skip, batch with missing tasks + empty input. Smoke-tested locally against the live profile DB on the three acceptance-criterion targets (t_f08fef91 cron-hygiene-audit, t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now return their populated summaries via both ``latest_summary`` and ``latest_summaries``. Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests pass. No regression on tasks where ``tasks.result`` is explicitly populated (the existing "Result:" branch is preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 17:26:15 -07:00
daixin1204	d2c6eceed9	fix(kanban): prevent child task dispatch when parent is not done Add parent dependency guard to _set_status_direct so dragging a task to the ready column is rejected (409) when its parents are not all done. Previously the guard only existed in recompute_ready, allowing direct status writes via the dashboard API to bypass the dependency engine. Root cause: after reclaiming stale workers, both T3 and T4 were set to ready via dashboard status writes in quick succession, causing the writer to be spawned while the analyst was blocked — upstream work wasn't done yet.	2026-05-05 17:26:15 -07:00
Teknium	f67063ba81	feat(kanban): generic diagnostics engine for task distress signals (#20332 ) * feat(kanban): generic diagnostics engine for task distress signals Replaces the hallucination-specific ``warnings`` / ``RecoverySection`` surface (shipped in PR #20232) with a reusable diagnostic-rule engine that covers five distress kinds in v1 and can be extended without touching UI code. The "something's wrong with this task" signal is no longer limited to phantom card ids. Closes the follow-up from #20232 discussion. New module ---------- ``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule engine. Each rule is a pure function of ``(task, events, runs, now, config) -> list[Diagnostic]``. Registry is a simple list; adding a new distress kind is one function + one import, no UI or API changes required. v1 rule set ----------- * ``hallucinated_cards`` (error) — folds the existing ``completion_blocked_hallucination`` event into the new surface. * ``prose_phantom_refs`` (warning) — folds ``suspected_hallucinated_references``. * ``repeated_spawn_failures`` (error → critical at 2x threshold) — fires when ``tasks.spawn_failures >= 3``; suggests ``hermes -p <profile> doctor`` / ``auth``. * ``repeated_crashes`` (error → critical) — fires after N consecutive ``crashed`` run outcomes with no successful completion between; suggests ``hermes kanban log <id>``. * ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked`` state with no comments / unblock attempts; suggests commenting. Every diagnostic carries structured ``actions`` (reclaim, reassign, unblock, cli_hint, comment, open_docs) that render consistently in both CLI and dashboard. Suggested actions are highlighted; generic recovery actions (reclaim / reassign) are available on every kind as fallbacks. Diagnostics auto-clear when the underlying failure resolves — a clean ``completed``/``edited`` event drops hallucination diagnostics, a successful run drops crash diagnostics, a comment drops stuck-blocked diagnostics. Audit events persist; the badge goes away. API --- ``plugin_api.py``: * ``/board`` now attaches ``diagnostics`` (full list) and ``warnings`` (compact summary with ``highest_severity``) per task. * ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics section auto-opens on flagged tasks. * NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by severity, sorted critical-first. CLI --- * NEW ``hermes kanban diagnostics [--severity X] [--task id] [--json]`` — fleet view or single-task view, matches dashboard rule output so CLI users see the same picture. * ``hermes kanban show <id>`` now renders a Diagnostics section near the top with severity markers + suggested actions. Dashboard --------- * Card badge is severity-coloured (⚠ amber warning, !! orange error, !!! red critical) using ``warnings.highest_severity``. * Attention strip above the toolbar counts EVERY task with active diagnostics (not just hallucinations), severity-coloured, lists affected tasks with Open buttons when expanded. * Drawer's old ``RecoverySection`` replaced with generic ``DiagnosticsSection`` rendering a card per active diagnostic: title + detail + structured data (task-id chips when payload keys look like id lists) + action buttons. Reassign profile picker is inline per-diagnostic. Clipboard fallback uses ``.catch()`` for environments where writeText rejects. * Three-rung severity palette; amber for warning, orange for error, red for critical. Uses CSS variables so theming is straightforward. Tests ----- * NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests covering each rule's positive/negative/threshold paths, severity sorting, broken-rule isolation, and sqlite3.Row integration. * Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty, populated, severity-filtered), ``/board`` exposes both diagnostic list and compact summary with ``highest_severity``. * Existing hallucination-specific test (``test_board_surfaces_ warnings_field_for_hallucinated_completions``) updated to reflect the new contract: warning summary keys by diagnostic kind (``hallucinated_cards``) not event kind. 379 kanban-suite tests pass (+16 net from this PR). Live verification ----------------- Seeded all 5 diagnostic kinds + one clean + one plain-running task (7 total) into an isolated HERMES_HOME, spun up the dashboard, and verified: * Attention strip: shows ``!! 5 tasks need attention`` in the error-severity orange; Show expands to a list of 5 rows ordered critical > error > warning. * Card badges: error tasks render ``!!`` orange, warning tasks render ``⚠`` amber, clean and plain-running tasks render no badge. * Each of the 5 rules opens a correctly-coloured, correctly-styled diagnostic card in the drawer with its specific suggested action. * Live reassign from a diagnostic card flipped ``broken-ml-worker → alice`` and the drawer refreshed with the new assignee + the same diagnostic still firing (correct: spawn_failures counter hasn't reset yet). * CLI ``hermes kanban diagnostics`` prints all 5 in severity order; ``--severity error`` narrows to 3; ``kanban show <id>`` includes the Diagnostics block at the top with suggested action hint. Migration note -------------- The old ``warnings`` shape (``{count, kinds, latest_at}``) is preserved on the API but ``kinds`` now keys by diagnostic kind (``hallucinated_cards``) instead of event kind (``completion_blocked_hallucination``). ``highest_severity`` is a new required field. The dashboard was the only consumer and has been updated in the same commit; external API consumers of the ``warnings`` field will need to update their kind-match logic. * feat(kanban/diagnostics): lead titles with the actual error text The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn N times' titles buried the actual cause in the data section. Operators had to open logs or expand the diagnostic to see WHY the worker is stuck — rate-limit vs insufficient quota vs bad auth vs context overflow vs network blip all looked identical at a glance. New titles: Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance Agent crashed 3x: provider auth error: 401 Unauthorized Agent spawn failed 4x: insufficient_quota: You exceeded your current Detail keeps the full error snippet (capped at 500 chars + ellipsis for tracebacks). Title takes the first line capped at 160 chars. Fallback title if no error recorded stays honest ('no error recorded'). Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total pass (+4). Live-verified on dashboard with 6 seeded scenarios (rate-limit, billing, auth, context, network, spawn-billing) — each card title leads with the actionable error text.	2026-05-05 13:32:42 -07:00
Teknium	de9238d37e	feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232 ) Workers completing a kanban task can now claim the ids of cards they created via an optional ``created_cards`` field on ``kanban_complete``. The kernel verifies each id exists and was created by the completing worker's profile; any phantom id blocks the completion with a ``HallucinatedCardsError`` and records a ``completion_blocked_hallucination`` event on the task so the rejected attempt is auditable. Successful completions also get a non-blocking prose-scan pass over their ``summary`` + ``result`` that emits a ``suspected_hallucinated_references`` event for any ``t_<hex>`` reference that doesn't resolve. Closes #20017. Recovery UX (kernel + CLI + dashboard) -------------------------------------- A structural gate alone isn't enough — operators also need to see and act on stuck workers, especially when a profile's model is the root cause. This PR ships the full loop: * ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that releases an active worker claim immediately (unlike ``release_stale_claims`` which only acts after claim_expires has passed). Emits a ``reclaimed`` event with ``manual: True`` payload. * ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` — switch a task to a different profile, optionally reclaiming a stuck running worker in the same call. * ``hermes kanban reclaim <id> [--reason ...]`` and ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]`` CLI subcommands wired through to the same helpers. * ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the dashboard plugin. Dashboard surfacing ------------------- * ⚠ warning badge on cards with active hallucination events. * attention strip at the top of the board listing all flagged tasks; dismissible per session. * events callout in the task drawer — hallucination events render with a red left border, amber icon, and phantom ids as styled chips. * recovery section in the task drawer with three actions: Reclaim, Reassign (with profile picker + reclaim-first checkbox), and a copy-to-clipboard hint for ``hermes -p <profile> model`` since profile config lives on disk and can't be edited from the browser. Auto-opens when the task has warnings, collapsed otherwise. Keyed by task id so state doesn't leak between drawers. Active-vs-stale rule: warnings clear when a clean ``completed`` or ``edited`` event supersedes the hallucination, so recovery is never permanently stigmatising — the audit events persist for debugging but the badge goes away once the worker succeeds. Skill updates ------------- * ``skills/devops/kanban-worker/SKILL.md`` documents the ``created_cards`` contract with good/bad examples. * ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering stuck workers" section with the three actions and when to use each. Tests ----- * Kernel gate: verified-cards manifest, phantom rejection + audit event, cross-worker rejection, prose scan positive + negative. * Recovery helpers: reclaim on running task, reclaim on non-running returns False, reassign refuses running without reclaim_first, reassign with reclaim_first succeeds on running. * API endpoints: warnings field present on /board and /tasks/:id, warnings cleared after clean completion, reclaim 200 + 409 paths, reassign 200 + 409 + reclaim_first paths. * CLI smoke: reclaim + reassign subcommands. Live-verified end-to-end on a dashboard with seeded scenarios: attention strip renders, badges land on the right cards, drawer callout shows phantom chips, Reclaim on a running task flips status to ready + emits manual reclaimed event + refreshes the drawer, Reassign swaps the assignee and triggers board refresh. 359/359 kanban-suite tests pass (test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).	2026-05-05 08:06:55 -07:00
LeonSGP43	354502ee48	fix(kanban): preserve dashboard completion summaries	2026-05-05 04:57:38 -07:00
Teknium	1c7c7c3c5f	feat(kanban-dashboard): per-platform home-channel notification toggles (#19864 ) * revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718) Reverts `ff3d2773e2`. Teknium reviewed the merged PR and decided this behavior isn't wanted — tool-driven kanban_create should not mirror the slash-command path's auto-subscribe. Orchestrators that want their originating chat notified can call kanban_notify-subscribe explicitly; we're not going to make it implicit. * feat(kanban-dashboard): per-platform home-channel notification toggles Adds a "Notify home channels" section to the task drawer in the kanban dashboard plugin. Each platform where the user has set a home channel (/sethome, TELEGRAM_HOME_CHANNEL env var, gateway.platforms.<p>.home_channel in config.yaml) gets a toggle pill. Toggling on writes a kanban_notify_subs row keyed to that platform's home (chat_id + thread_id); toggling off removes it. The existing gateway notifier watcher delivers completed / blocked / gave_up events without any new plumbing — this is purely a GUI surface over existing machinery. Replaces the reverted auto-subscribe behavior from #19718 with an explicit, per-task, per-platform, user-controlled opt-in. No implicit subscription on tool-driven kanban_create; no CLI commands; no slash commands. Just a toggle in the drawer. Backend (plugins/kanban/dashboard/plugin_api.py): - GET /api/plugins/kanban/home-channels[?task_id=X] Returns every platform with a configured home, plus a per-entry subscribed: bool relative to task_id (false when task_id omitted). Reads the live GatewayConfig via load_gateway_config() so env-var overlays stay honored. - POST /api/plugins/kanban/tasks/:id/home-subscribe/:platform Idempotent add_notify_sub keyed to the platform's home. - DELETE /api/plugins/kanban/tasks/:id/home-subscribe/:platform remove_notify_sub for the same tuple. - 404 when the platform has no home configured, or task_id doesn't exist (POST only). Frontend (plugins/kanban/dashboard/dist/index.js): - TaskDrawer fetches /home-channels on open, keyed on task_id. - HomeSubsSection renders nothing when zero platforms have a home (so users who haven't set one up don't see an empty UI block). - Optimistic toggle with busy flag + revert-on-failure. One pill per platform; ✓ prefix and --on class indicate the subscribed state. CSS (plugins/kanban/dashboard/dist/style.css): - .hermes-kanban-home-subs flex row + .hermes-kanban-home-sub pill style + --on subscribed variant (subtle ring-colored background). Live-tested against a dashboard with TELEGRAM + DISCORD_BOT_TOKEN / HOME_CHANNEL env vars set: drawer shows both pills, toggling each flips its visual state AND writes/removes the correct kanban_notify_subs row (verified via direct DB read). Tests (tests/plugins/test_kanban_dashboard_plugin.py, 11 new, 53/53 pass total): - home-channels lists only platforms with a home (slack with a token but no home is excluded) - no task_id -> all subscribed=false - subscribe creates notify_sub row with correct chat/thread/platform - subscribed=true reflected in subsequent GET - idempotent re-subscribe - unknown platform -> 404 - unknown task -> 404 - unsubscribe removes the row - telegram + discord subscribe/unsubscribe independent - zero homes -> empty list	2026-05-04 12:31:21 -07:00
luyao618	6b3efcee49	fix(kanban): reject direct status transition to 'running' via dashboard API The PATCH /tasks/:id endpoint allows setting status='running' via _set_status_direct(), bypassing the dispatcher/claim path that creates run rows, claim locks, expiry, and worker process metadata. This can leave tasks stuck in 'running' with no active worker. Fix: reject status='running' with HTTP 400, requiring all transitions to 'running' to go through the canonical claim_task() path. Closes #19535	2026-05-04 04:46:47 -07:00
Teknium	5ec6baa400	feat(kanban): multi-project boards — one install, many kanbans (#19653 ) Adds first-class board support to kanban so users can separate unrelated streams of work (projects, repos, domains) into isolated queues. Single- project users stay on the 'default' board and see no UI change. Isolation model --------------- - Each board is a directory at `~/.hermes/kanban/boards/<slug>/` with its own `kanban.db`, `workspaces/`, and `logs/`. The 'default' board keeps its legacy path (`~/.hermes/kanban.db`) for back-compat — fresh installs and pre-boards users get zero migration. - Workers spawned by the dispatcher have `HERMES_KANBAN_BOARD` pinned in their env alongside the existing `HERMES_KANBAN_DB` / `HERMES_KANBAN_WORKSPACES_ROOT` pins, so workers physically cannot see other boards' tasks. - The gateway's single dispatcher loop now sweeps every board per tick; per-tick cost is a few extra filesystem stats. - CAS concurrency guarantees are preserved per-board (each board is its own SQLite DB, same WAL+IMMEDIATE machinery as before). CLI --- hermes kanban boards list\|create\|switch\|show\|rename\|rm hermes kanban --board <slug> <any-subcommand> Board resolution order: `--board` flag → `HERMES_KANBAN_BOARD` env → `~/.hermes/kanban/current` file → `default`. Slug validation is strict: lowercase alphanumerics + hyphens + underscores, 1-64 chars, starts with alphanumeric. Uppercase is auto-downcased; slashes / dots / `..` / control chars are rejected so boards can't name their way out of the boards/ directory. Passive discoverability: when more than one board exists, `hermes kanban list` prints a one-line header ("Board: foo (2 other boards …)") so users who stumble across multi-project never have to hunt for the feature. Invisible for single-board installs. Dashboard --------- - New `BoardSwitcher` component at the top of the Kanban tab: dropdown with all boards + task counts, `+ New board` button, `Archive` button (non-default only). Hidden entirely when only `default` exists and is empty — single-project users never see it. - New `NewBoardDialog` modal: slug / display name / description / icon + "switch to this board after creating" checkbox. - Selected board persists to `localStorage` so browser users don't shift the CLI's active board out from under a terminal they left open. - New `?board=<slug>` query param on every existing endpoint plus a new `/boards` CRUD surface (`GET /boards`, `POST /boards`, `PATCH /boards/<slug>`, `DELETE /boards/<slug>`, `POST /boards/<slug>/switch`). - Events WebSocket is pinned to a board at connection time; switching opens a fresh WS against the new board. Also fixes a pre-existing bug in the plugin's tenant / assignee filters: the SDK's `Select` uses `onValueChange(value)`, not native `onChange(event)`, so those filters silently didn't work. New `selectChangeHandler` helper wires both signatures. Tests ----- 49 new tests in `tests/hermes_cli/test_kanban_boards.py` covering: slug validation (valid / invalid / auto-downcase), path resolution (default = legacy path, named = `boards/<slug>/`, env var override), current-board resolution chain (env > file > default), board CRUD + archive / hard-delete, per-board connection isolation (tasks don't leak), worker spawn env injection (`HERMES_KANBAN_BOARD`, `HERMES_KANBAN_DB`, `HERMES_KANBAN_WORKSPACES_ROOT` all point at the right board), and end-to-end CLI surface. Regression surface: all 264 pre-existing kanban tests continue to pass. Live-tested via the dashboard: created 3 boards (default, hermes-agent, atm10-server), created tasks on each via both CLI (`--board <slug> create`) and dashboard (inline create on the Ready column), confirmed zero cross-board leakage, confirmed `BoardSwitcher` + `NewBoardDialog` work end-to-end in the browser.	2026-05-04 04:42:38 -07:00
Teknium	c868425467	feat(kanban): durable multi-profile collaboration board (#17805 ) Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init\|create\|list\|show\|assign\|link\|unlink\| claim\|comment\|complete\|block\|unblock\|archive\|tail\|dispatch\|context\| init\|gc\|watch\|stats\|notify\|log\|heartbeat\|runs\|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>	2026-04-30 13:36:47 -07:00

11 commits