hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
stephen0110	40b51c93a2	fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at The kanban_heartbeat tool called heartbeat_worker but never heartbeat_claim, so a worker that loops the tool while a single tool call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got reclaimed by release_stale_claims. The function name and heartbeat_claim's own docstring imply otherwise: "Workers that know they'll exceed 15 minutes should call this every few minutes to keep ownership." But there was no caller in the worker tool path. Workers couldn't invoke heartbeat_claim themselves either — it isn't exposed as a tool. Fix: _handle_heartbeat now calls heartbeat_claim first, reading HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins this in _default_spawn). Falls back to _claimer_id() for locally- driven workers that didn't go through dispatcher spawn. Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires rewinds claim_expires into the past, calls the tool, and asserts the new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to fail against the unfixed code (claim_expires stays at the rewound value). Closes the root cause underlying the symptom in #21141 (15-min respawns of long-running workers). #21141 separately addresses post-reclaim cleanup; this fixes the upstream "shouldn't have been reclaimed in the first place" half.	2026-05-07 05:05:20 -07:00
misery-hl	56b4795115	guard kanban worker lifecycle by run id	2026-05-05 15:09:28 -07:00
Teknium	de9238d37e	feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232 ) Workers completing a kanban task can now claim the ids of cards they created via an optional ``created_cards`` field on ``kanban_complete``. The kernel verifies each id exists and was created by the completing worker's profile; any phantom id blocks the completion with a ``HallucinatedCardsError`` and records a ``completion_blocked_hallucination`` event on the task so the rejected attempt is auditable. Successful completions also get a non-blocking prose-scan pass over their ``summary`` + ``result`` that emits a ``suspected_hallucinated_references`` event for any ``t_<hex>`` reference that doesn't resolve. Closes #20017. Recovery UX (kernel + CLI + dashboard) -------------------------------------- A structural gate alone isn't enough — operators also need to see and act on stuck workers, especially when a profile's model is the root cause. This PR ships the full loop: * ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that releases an active worker claim immediately (unlike ``release_stale_claims`` which only acts after claim_expires has passed). Emits a ``reclaimed`` event with ``manual: True`` payload. * ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` — switch a task to a different profile, optionally reclaiming a stuck running worker in the same call. * ``hermes kanban reclaim <id> [--reason ...]`` and ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]`` CLI subcommands wired through to the same helpers. * ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the dashboard plugin. Dashboard surfacing ------------------- * ⚠ warning badge on cards with active hallucination events. * attention strip at the top of the board listing all flagged tasks; dismissible per session. * events callout in the task drawer — hallucination events render with a red left border, amber icon, and phantom ids as styled chips. * recovery section in the task drawer with three actions: Reclaim, Reassign (with profile picker + reclaim-first checkbox), and a copy-to-clipboard hint for ``hermes -p <profile> model`` since profile config lives on disk and can't be edited from the browser. Auto-opens when the task has warnings, collapsed otherwise. Keyed by task id so state doesn't leak between drawers. Active-vs-stale rule: warnings clear when a clean ``completed`` or ``edited`` event supersedes the hallucination, so recovery is never permanently stigmatising — the audit events persist for debugging but the badge goes away once the worker succeeds. Skill updates ------------- * ``skills/devops/kanban-worker/SKILL.md`` documents the ``created_cards`` contract with good/bad examples. * ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering stuck workers" section with the three actions and when to use each. Tests ----- * Kernel gate: verified-cards manifest, phantom rejection + audit event, cross-worker rejection, prose scan positive + negative. * Recovery helpers: reclaim on running task, reclaim on non-running returns False, reassign refuses running without reclaim_first, reassign with reclaim_first succeeds on running. * API endpoints: warnings field present on /board and /tasks/:id, warnings cleared after clean completion, reclaim 200 + 409 paths, reassign 200 + 409 + reclaim_first paths. * CLI smoke: reclaim + reassign subcommands. Live-verified end-to-end on a dashboard with seeded scenarios: attention strip renders, badges land on the right cards, drawer callout shows phantom chips, Reclaim on a running task flips status to ready + emits manual reclaimed event + refreshes the drawer, Reassign swaps the assignee and triggers board refresh. 359/359 kanban-suite tests pass (test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).	2026-05-05 08:06:55 -07:00
Teknium	3fb35520c6	revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718 ) (#19721 ) Reverts `ff3d2773e2`. Teknium reviewed the merged PR and decided this behavior isn't wanted — tool-driven kanban_create should not mirror the slash-command path's auto-subscribe. Orchestrators that want their originating chat notified can call kanban_notify-subscribe explicitly; we're not going to make it implicit.	2026-05-04 05:04:01 -07:00
Teknium	ff3d2773e2	feat(kanban): auto-subscribe gateway chat on tool-driven kanban_create (#19718 ) Closes #19479. When an orchestrator agent calls kanban_create from a gateway session (e.g. a Telegram user delegating to an orchestrator profile), auto- subscribe the originating (platform, chat, thread, user) to the new task's terminal events. Mirrors the behavior of the /kanban create slash command in gateway/run.py so tool-driven creation is at parity with human-driven creation. Without this, a user who interacts with an orchestrator exclusively via the gateway never receives blocked / completed / gave_up notifications for tasks the orchestrator created on their behalf — silently breaking the gateway-first multi-agent flow the reporter describes. Reads the context-local HERMES_SESSION_* vars via get_session_env() (not os.environ — those are contextvars for asyncio concurrency safety). Falls through cleanly in CLI / cron contexts with no session active (subscribed=False in the response). Best-effort: if the gateway module isn't importable (test rigs stubbing gateway.*), the task still creates, we just skip the subscription. Response gains a 'subscribed' bool so the orchestrator knows whether terminal events will land back in the originating chat or whether it needs to poll / unblock manually. Tests: 4 new in tests/tools/test_kanban_tools.py covering CLI/no-subscribe, telegram/gateway-auto-subscribe, discord-DM/no- thread subscribe, and partial-ctx/no-chat_id no-subscribe. 40/40 kanban tool tests pass.	2026-05-04 05:02:23 -07:00
Teknium	d3b22b76d8	fix(kanban): enforce worker task-ownership on destructive tool calls (#19713 ) Closes #19534 (security). A worker spawned by the kanban dispatcher has HERMES_KANBAN_TASK set to its own task id. The destructive tools (kanban_complete, kanban_block, kanban_heartbeat) resolved task_id via _default_task_id() which preferred an explicit arg over the env var, with no ownership check — so a buggy or prompt-injected worker could complete / block / heartbeat any OTHER task (sibling, cross-tenant, anything) by supplying its id. Reporter's repro: worker for t_A passed task_id=t_B to kanban_complete and got {"ok": true}. Fix: add _enforce_worker_task_ownership(tid). If HERMES_KANBAN_TASK is set and tid doesn't match, return a structured tool error with guidance to use kanban_comment (for information handoff across tasks) or kanban_create (for follow-up work). Orchestrator profiles (no env var, but kanban toolset enabled per #18968) are exempt — their job is routing and sometimes includes closing out child tasks. Kept unrestricted (deliberately): - kanban_show — workers legitimately read parent/sibling handoff context - kanban_comment — cross-task comments are the handoff mechanism - kanban_create — orchestrator fan-out, worker follow-up spawning - kanban_link — parent/child linking Tests: 5 new regression tests in tests/tools/test_kanban_tools.py covering the grid (worker-attacks-foreign ×3 tools, worker-own-task preserved, orchestrator-unrestricted). 36/36 pass.	2026-05-04 04:54:02 -07:00
pdonizete	deb59eab72	fix: allow kanban tools for orchestrator profiles with kanban toolset The _check_kanban_mode() gating function only checked for HERMES_KANBAN_TASK env var, which is only set by the dispatcher when spawning workers. This prevented orchestrator profiles (like techlead) from using kanban_create, kanban_link, etc. even when they had 'kanban' explicitly in their toolsets config. Now uses load_config() from hermes_cli.config (which has mtime-based caching) to check if 'kanban' is in the profile's toolsets list. This enables orchestrators to route work via Kanban while workers continue using the dispatcher env var. Fixes #18968	2026-05-04 02:00:42 -07:00
Teknium	c868425467	feat(kanban): durable multi-profile collaboration board (#17805 ) Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init\|create\|list\|show\|assign\|link\|unlink\| claim\|comment\|complete\|block\|unblock\|archive\|tail\|dispatch\|context\| init\|gc\|watch\|stats\|notify\|log\|heartbeat\|runs\|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>	2026-04-30 13:36:47 -07:00

8 commits