hermes-agent/hermes_cli/kanban_specify.py
Teknium 24d48ffb82
feat(kanban): add specify — auxiliary LLM fleshes out triage tasks (#21435)
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks

The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.

`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.

Surface:

  hermes kanban specify <task_id>               # single task
  hermes kanban specify --all [--tenant T]      # sweep triage column
  hermes kanban specify ... --author NAME       # audit-comment author
  hermes kanban specify ... --json              # one JSON line per task

Design choices:

  - Parent gating is preserved. specify_triage_task flips to 'todo',
    then recompute_ready promotes to 'ready' only when parents are
    done — same rule as a normal parent-gated todo.
  - No daemon, no background watcher. Every invocation is explicit —
    keeps cost predictable and doesn't fight the dispatcher loop.
  - Response parse is lenient: strict JSON preferred, markdown-fence
    tolerated, raw-body fallback on malformed JSON so the LLM can't
    strand a task in triage.
  - All failure modes (no aux client, API error, task moved out of
    triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
    --all continues past individual failures.

Changes:

  hermes_cli/kanban_db.py    + specify_triage_task()
  hermes_cli/kanban_specify.py  NEW (~220 LOC — prompt, parse, call)
  hermes_cli/kanban.py       + specify subcommand + _cmd_specify
  hermes_cli/config.py       + auxiliary.triage_specifier task slot
  website/docs/user-guide/features/kanban.md  specify + config notes
  website/docs/reference/cli-commands.md      CLI reference entry
  tests/hermes_cli/test_kanban_specify_db.py    NEW (10 tests)
  tests/hermes_cli/test_kanban_specify.py       NEW (20 tests)

Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.

* feat(kanban): wire specifier into dashboard and gateway slash

Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.

Dashboard (plugins/kanban/dashboard/)
  - POST /tasks/:id/specify  NEW endpoint. Thin wrapper around
    kanban_specify.specify_task(). Returns the CLI outcome shape
    ({ok, task_id, reason, new_title}); ok=false with a human reason
    is a 200, not a 4xx, so the UI can render it inline without
    treating 'no aux client configured' as a crash.
  - Runs sync in FastAPI's threadpool because the LLM call can take
    tens of seconds on reasoning models.
  - Pins HERMES_KANBAN_BOARD around the specify call so the module's
    argless kb.connect() lands on the right board.
  - dist/index.js: doSpecify callback threaded through the drawer →
    TaskDetail → StatusActions prop chain.  Specify button appears
    ONLY when task.status === 'triage' (elsewhere the backend would
    reject anyway — hide the button to keep the action row clean).
    Busy state (Specifying…) + inline success/error banner under the
    button using the response.reason text.
  - dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
    existing --color vars so themes reskin cleanly.

Gateway slash (/kanban specify)
  - Already works via the existing run_slash → build_parser →
    kanban_command pipeline. No code change needed — slash commands
    inherit the argparse tree automatically. Added coverage:
    test_run_slash_specify_end_to_end (create --triage, specify, verify
    promotion + retitle) and test_run_slash_specify_help_is_reachable.

Tests
  - tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
    REST endpoint — happy path, non-triage rejection as ok=false 200,
    missing aux client as ok=false 200.
  - tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.

Docs
  - website/docs/user-guide/features/kanban.md: dashboard action row
    description mentions  Specify + all three surfaces. REST table
    gains /tasks/:id/specify. Slash examples include /kanban specify.

Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.
2026-05-07 13:04:41 -07:00

265 lines
8.5 KiB
Python

"""Kanban triage specifier — flesh out a one-liner into a real spec.
Used by ``hermes kanban specify [task_id | --all]``. Takes a task that
lives in the Triage column (a rough idea, typically only a title), calls
the auxiliary LLM to produce:
* A tightened title (optional — only replaces if the model proposes a
materially different one)
* A concrete body: goal, proposed approach, acceptance criteria
and then flips the task ``triage -> todo`` via
``kanban_db.specify_triage_task``. The dispatcher promotes it to
``ready`` on its next tick (or immediately if there are no open parents).
Design notes
------------
* This module intentionally mirrors ``hermes_cli/goals.py`` — same aux
client pattern, same "empty config => skip, don't crash" tolerance.
Keeps the surface area tiny and the failure modes predictable.
* The prompt is a short system + user pair. We ask for JSON with
``{title, body}``; if parsing fails, we fall back to treating the
whole response as the body and leave the title untouched. No
retry loop — one shot, keep cost bounded.
* Structured output / JSON mode is not requested explicitly so the
specifier works on providers that don't implement it. The parse
is lenient (tolerates markdown code fences around the JSON).
"""
from __future__ import annotations
import json
import logging
import os
import re
from dataclasses import dataclass
from typing import Optional
from hermes_cli import kanban_db as kb
logger = logging.getLogger(__name__)
_SYSTEM_PROMPT = """You are the Kanban triage specifier for the Hermes Agent board.
A user dropped a rough idea into the Triage column. Your job is to turn it
into a concrete, actionable task spec that an autonomous worker can pick up
and execute without further clarification.
Output a single JSON object with exactly two keys:
{
"title": "<tightened task title, <= 80 chars, imperative voice>",
"body": "<multi-line spec, see structure below>"
}
The body MUST include these sections, each prefixed with a bold markdown
heading, in this order:
**Goal** — one sentence, user-facing outcome.
**Approach** — 2-5 bullets on how a worker should tackle it.
**Acceptance criteria** — checklist of concrete, verifiable conditions.
**Out of scope** — short list of things NOT to touch (omit if nothing
obvious; never invent scope creep).
Rules:
- Keep the tightened title close in meaning to the original idea — do
NOT invent a different project.
- If the original idea is already detailed, preserve its substance and
just reformat into the sections above.
- Never add invented requirements the user didn't hint at.
- No preamble, no closing remarks, no code fences around the JSON.
- Output only the JSON object and nothing else.
"""
_USER_TEMPLATE = """Task id: {task_id}
Current title: {title}
Current body:
{body}
"""
@dataclass
class SpecifyOutcome:
"""Result of specifying a single triage task."""
task_id: str
ok: bool
reason: str = ""
new_title: Optional[str] = None
def _truncate(text: str, limit: int) -> str:
if len(text) <= limit:
return text
return text[: limit - 1] + ""
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*|\s*```\s*$", re.IGNORECASE)
def _extract_json_blob(raw: str) -> Optional[dict]:
"""Lenient JSON extraction — tolerates fenced code blocks and
leading/trailing whitespace. Returns None if nothing parses."""
if not raw:
return None
stripped = _FENCE_RE.sub("", raw.strip())
# Greedy: find the first `{` and last `}` and try that slice.
first = stripped.find("{")
last = stripped.rfind("}")
if first == -1 or last == -1 or last <= first:
return None
candidate = stripped[first : last + 1]
try:
val = json.loads(candidate)
except (ValueError, json.JSONDecodeError):
return None
if not isinstance(val, dict):
return None
return val
def _profile_author() -> str:
"""Mirror of ``hermes_cli.kanban._profile_author``. Kept local to
avoid a circular import when kanban.py imports this module."""
return (
os.environ.get("HERMES_PROFILE")
or os.environ.get("USER")
or "specifier"
)
def specify_task(
task_id: str,
*,
author: Optional[str] = None,
timeout: Optional[int] = None,
) -> SpecifyOutcome:
"""Specify a single triage task and promote it to ``todo``.
Returns an outcome describing what happened. Never raises for expected
failure modes (task not in triage, no aux client configured, API
error, malformed response) — those surface via ``ok=False`` so the
``--all`` sweep can continue past individual failures.
"""
with kb.connect() as conn:
task = kb.get_task(conn, task_id)
if task is None:
return SpecifyOutcome(task_id, False, "unknown task id")
if task.status != "triage":
return SpecifyOutcome(
task_id, False, f"task is not in triage (status={task.status!r})"
)
try:
from agent.auxiliary_client import get_text_auxiliary_client
except Exception as exc: # pragma: no cover — import smoke test
logger.debug("specify: auxiliary client import failed: %s", exc)
return SpecifyOutcome(task_id, False, "auxiliary client unavailable")
try:
client, model = get_text_auxiliary_client("triage_specifier")
except Exception as exc:
logger.debug("specify: get_text_auxiliary_client failed: %s", exc)
return SpecifyOutcome(task_id, False, "auxiliary client unavailable")
if client is None or not model:
return SpecifyOutcome(
task_id, False, "no auxiliary client configured"
)
user_msg = _USER_TEMPLATE.format(
task_id=task.id,
title=_truncate(task.title or "", 400),
body=_truncate(task.body or "(no body)", 4000),
)
try:
resp = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": _SYSTEM_PROMPT},
{"role": "user", "content": user_msg},
],
temperature=0.3,
max_tokens=1500,
timeout=timeout or 120,
)
except Exception as exc:
logger.info(
"specify: API call failed for %s (%s) — skipping",
task_id, exc,
)
return SpecifyOutcome(
task_id, False, f"LLM error: {type(exc).__name__}"
)
try:
raw = resp.choices[0].message.content or ""
except Exception:
raw = ""
parsed = _extract_json_blob(raw)
new_title: Optional[str]
new_body: Optional[str]
if parsed is None:
# Fall back: treat the whole reply as the body, leave title as-is.
# Worst case the user edits afterward — still better than stranding
# the task in triage on a malformed LLM reply.
stripped_raw = raw.strip()
if not stripped_raw:
return SpecifyOutcome(
task_id, False, "LLM returned an empty response"
)
new_title = None
new_body = stripped_raw
else:
title_val = parsed.get("title")
body_val = parsed.get("body")
new_title = (
title_val.strip()
if isinstance(title_val, str) and title_val.strip()
else None
)
new_body = (
body_val if isinstance(body_val, str) and body_val.strip() else None
)
if new_body is None and new_title is None:
return SpecifyOutcome(
task_id, False, "LLM response missing title and body"
)
with kb.connect() as conn:
ok = kb.specify_triage_task(
conn,
task_id,
title=new_title,
body=new_body,
author=author or _profile_author(),
)
if not ok:
# Race: someone else promoted / archived the task between our
# read above and the write. Report, don't crash.
return SpecifyOutcome(
task_id, False, "task moved out of triage before promotion"
)
return SpecifyOutcome(task_id, True, "specified", new_title=new_title)
def list_triage_ids(*, tenant: Optional[str] = None) -> list[str]:
"""Return task ids currently in the triage column.
``tenant`` narrows the sweep; ``None`` returns every triage task.
"""
with kb.connect() as conn:
tasks = kb.list_tasks(
conn,
status="triage",
tenant=tenant,
include_archived=False,
)
return [t.id for t in tasks]