feat(skill): add video-orchestrator optional creative skill

Meta-pipeline that wraps any video request — narrative film, product /
marketing, music video, explainer, ASCII, generative, comic, 3D,
real-time/installation — in a Hermes Kanban pipeline. Performs adaptive
discovery, designs an appropriate team for the requested style, generates
the setup script that creates Hermes profiles + initial kanban task, and
helps monitor execution.

Routes scenes to whichever existing Hermes skill fits each beat
(`ascii-video`, `manim-video`, `p5js`, `comfyui`, `touchdesigner-mcp`,
`blender-mcp`, `pixel-art`, `baoyu-comic`, `claude-design`, `excalidraw`,
`songsee`, `heartmula`, …) plus external APIs for TTS, image-gen, and
image-to-video. Kanban orchestration uses the `kanban-orchestrator` and
`kanban-worker` skills.

The single-project workspace layout, profile-config patching pattern,
SOUL.md-per-profile model, and `--workspace dir:<path>` discipline are
adapted from alt-glitch's original kanban-video-pipeline at
https://github.com/NousResearch/kanban-video-pipeline. This skill
generalizes those patterns across video styles and replaces the original
string-replacement config patcher with a PyYAML-based one that touches
only `toolsets` and `skills.always_load` (preserving security-sensitive
fields like `approvals.mode`).

Includes:
- SKILL.md — workflow + critical rules
- references/ — intake, role archetypes, tool matrix, kanban setup,
  monitoring, six worked examples
- assets/ — brief / setup.sh / soul.md templates
- scripts/ — bootstrap_pipeline.py (plan.json -> setup.sh) and
  monitor.py (poll + issue detection)

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
Commit 511add7249 (parent e97a9993b9), authored by SHL0MS on 2026-05-03 11:40:34 -04:00, committed by Teknium.
12 changed files with 2656 additions and 0 deletions.

SKILL.md
@@ -0,0 +1,206 @@
---
name: video-orchestrator
description: Plan, set up, and monitor a multi-agent video production pipeline backed by Hermes Kanban. Use when the user wants to make ANY video — narrative film, product/marketing, music video, explainer, ASCII/terminal art, abstract/generative loop, comic, 3D, real-time/installation — and the work warrants decomposition into specialized profiles (writer, designer, animator, renderer, voice, editor, etc.) coordinated through a kanban board. Performs adaptive discovery to scope the brief, designs an appropriate team for the requested style, generates the setup script that creates Hermes profiles + initial kanban task, then helps monitor execution and intervene when tasks stall or fail. Routes scenes to whichever Hermes rendering / audio / design skill fits each beat (`ascii-video`, `manim-video`, `p5js`, `comfyui`, `touchdesigner-mcp`, `blender-mcp`, `pixel-art`, `baoyu-comic`, `claude-design`, `excalidraw`, `songsee`, `heartmula`, …) plus external APIs for TTS, image-gen, and image-to-video as needed.
version: 1.0.0
author: [SHL0MS, alt-glitch]
license: MIT
metadata:
  hermes:
    tags: [video, kanban, multi-agent, orchestration, production-pipeline]
    related_skills: [kanban-orchestrator, kanban-worker, ascii-video, manim-video, p5js, comfyui, touchdesigner-mcp, blender-mcp, pixel-art, ascii-art, songwriting-and-ai-music, heartmula, songsee, spotify, youtube-content, claude-design, excalidraw, architecture-diagram, concept-diagrams, baoyu-comic, baoyu-infographic, humanizer, gif-search, meme-generation]
  credits: |
    The single-project workspace layout, profile-config patching pattern,
    SOUL.md-per-profile model, TEAM.md task-graph convention, and
    `--workspace dir:<path>` discipline are adapted from alt-glitch's
    original multi-agent video pipeline at
    https://github.com/NousResearch/kanban-video-pipeline.
---
# Video Orchestrator
Wrap any video request — from a 15-second product teaser to a 5-minute narrative
short to a music video to an ASCII loop — in a Hermes Kanban pipeline that
decomposes the work into specialized agent profiles.
This skill does **not** render anything itself. It is a meta-pipeline that:
1. **Scopes** the request through targeted discovery
2. **Designs** an appropriate team (which roles, which tools per role) based on the style
3. **Generates** a setup script that creates Hermes profiles, project workspace, and the initial kanban task
4. **Hands off** to the director profile, which decomposes via the kanban
5. **Monitors** execution, helps intervene when tasks stall or fail
The actual rendering happens inside the kanban once it's running, via whichever
existing skills + tools fit the scenes — `ascii-video`, `manim-video`, `p5js`,
`comfyui`, `touchdesigner-mcp`, `blender-mcp`, `songwriting-and-ai-music`,
`heartmula`, external APIs, or plain Python with PIL + ffmpeg.
## When NOT to use this skill
- The video is one continuous procedural project that needs no specialists. Just write the code directly.
- The user wants a quick one-shot conversion (e.g. "convert this mp4 to a GIF") — use ffmpeg directly.
- The output is a static image, GIF, or audio-only artifact — use the matching specific skill (`ascii-art`, `gifs`, `meme-generation`, `songwriting-and-ai-music`).
- The work fits a single existing skill cleanly (e.g. a pure ASCII video — just use `ascii-video`).
## Workflow
```
DISCOVER → BRIEF → TEAM DESIGN → SETUP → EXECUTE → MONITOR
```
### Step 1 — Discover (ask the right questions)
The discovery process is **adaptive**: ask only what is actually needed. Always
start with three questions to identify the broad shape:
- **What is the video?** (one-sentence brief)
- **How long?** (5-30s teaser / 30-90s short / 90s-3min explainer / 3-10min film / longer)
- **What aspect ratio + target platform?** (1:1 / 9:16 / 16:9; X, IG, YouTube, internal, etc.)
From the answer, classify the style category. The style determines which
follow-up questions to ask. **Do not ask all questions at once.** Ask 2-4 at a
time, listen, then proceed. Make reasonable assumptions whenever the user
implies an answer.
For complete intake patterns and per-style question banks, see
**[references/intake.md](references/intake.md)**.
### Step 2 — Brief
Once enough is known, produce a structured `brief.md` using the template in
`assets/brief.md.tmpl`. Stages:
1. **Concept** — the one-sentence pitch + emotional north star
2. **Scope** — duration, aspect, platform, deadline
3. **Style** — visual references, brand constraints, tone
4. **Scenes** — beat-by-beat breakdown (durations, content, target tool)
5. **Audio** — narration / music / SFX / silent (per scene if needed)
6. **Deliverables** — file format, resolution, optional alternates (vertical cut, GIF, etc.)
Show the brief to the user for confirmation before designing the team. **The
brief is the contract** — every downstream task references it.
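The `{{PLACEHOLDER}}` tokens in `assets/brief.md.tmpl` can be filled with a few lines of Python. This is a sketch of the substitution pattern only, not a helper the skill ships; the function name and the strict missing-key behavior are assumptions:

```python
import re

def fill_template(template: str, values: dict) -> str:
    """Substitute every {{KEY}} token; a missing key raises KeyError loudly."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)

tmpl = "# Video Brief — {{TITLE}}\n> Slug: `{{SLUG}}` · Tenant: `{{TENANT}}`"
brief = fill_template(tmpl, {"TITLE": "Noir Teaser",
                             "SLUG": "noir-teaser",
                             "TENANT": "noir-teaser"})
```

Failing loudly on a missing key is deliberate: a `{{...}}` token left in `brief.md` would propagate into every downstream task.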
### Step 3 — Team design
Pick role archetypes from the library that fit this video. **Compose, don't
clone.** Most videos need 4-7 profiles. The director is always present; the
rest are picked by what the brief actually requires.
For the role library and per-style team compositions, see
**[references/role-archetypes.md](references/role-archetypes.md)**.
For mapping role → which Hermes skills + toolsets it loads, see
**[references/tool-matrix.md](references/tool-matrix.md)**.
### Step 4 — Setup
Generate a setup script (`setup.sh`) and run it. The script:
1. Creates the project workspace (`~/projects/video-pipeline/<slug>/`)
2. Copies any provided assets into `taste/`, `audio/`, `assets/`
3. Creates each Hermes profile via `hermes profile create --clone`
4. Writes per-profile `SOUL.md` (personality + role definition)
5. Configures profile YAML (toolsets, always_load skills, cwd)
6. Writes `brief.md`, `TEAM.md`, and `taste/` content
7. Fires the initial `hermes kanban create` task assigned to the director
Use `scripts/bootstrap_pipeline.py` to generate setup.sh from a brief +
team-design JSON. See **[references/kanban-setup.md](references/kanban-setup.md)**
for the setup script structure, profile config patterns, and the critical
"shared workspace" rule.
### Step 5 — Execute
Run `setup.sh`. Then provide the user with monitoring commands:
```bash
hermes kanban watch --tenant <project-tenant> # live events
hermes kanban list --tenant <project-tenant> # board snapshot
hermes dashboard # visual board UI
```
The director profile takes over from here, decomposing the work and routing
tasks to specialist profiles via the kanban toolset.
### Step 6 — Monitor and intervene
Stay engaged — the kanban runs autonomously but a stuck task or bad output
needs human (or AI) judgment.
Monitoring patterns: poll `kanban list` periodically, inspect any RUNNING task
that exceeds its expected duration with `kanban show <id>`, and check
heartbeats. When a worker's output fails review, the standard interventions are:
1. Comment on the worker's task with specific feedback (`kanban_comment`)
2. Create a re-run task with the original as parent
3. Adjust the brief's scope and let the director re-decompose
For diagnostic patterns, intervention recipes, and the "task is stuck"
playbook, see **[references/monitoring.md](references/monitoring.md)**.
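The poll-and-inspect pattern can stay independent of `hermes kanban list`'s exact output format by separating parsing from detection. A sketch, with an invented row shape; `scripts/monitor.py` is the real implementation:

```python
import time

def stale_tasks(rows, now=None, grace=1.5):
    """rows: (task_id, status, started_epoch_s, expected_duration_s) tuples.
    Flag RUNNING tasks past `grace` times their expected duration."""
    now = time.time() if now is None else now
    return [tid for tid, status, started, expected in rows
            if status == "RUNNING" and now - started > grace * expected]

# Illustrative board snapshot; the real rows come from parsing
# `hermes kanban list --tenant <project-tenant>` output.
rows = [
    ("T3", "RUNNING", 1_000.0, 600),  # running for 2000s, budget 900s -> stale
    ("T4", "RUNNING", 2_800.0, 600),  # running for 200s -> fine
    ("T5", "DONE",    1_000.0, 600),
]
overdue = stale_tasks(rows, now=3_000.0)  # → ["T3"]
```

Each flagged id then goes to `hermes kanban show <id>` for inspection.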
## Reference: worked examples
Six concrete pipelines covering very different video styles — narrative film,
product/marketing, music video, math/algorithm explainer, ASCII video, real-time
installation — showing how the same workflow yields very different teams and
task graphs. See **[references/examples.md](references/examples.md)**.
## Critical rules
1. **Discovery before action.** Never start generating a brief or team without
asking at least the three baseline questions. A bad brief cascades through
the entire pipeline.
2. **Match the team to the video.** Don't reuse the same 4-profile setup for
every job. A music video that doesn't have a beat-analysis profile will
misfire. A narrative film that doesn't have a writer profile will produce
incoherent scenes. See `references/role-archetypes.md`.
3. **One workspace per project.** All profiles for a given video share the same
`dir:` workspace. Tasks pass artifacts via shared filesystem and structured
handoffs. **Every** `kanban_create` call passes
`workspace_kind="dir"` + `workspace_path="<absolute project path>"`.
4. **Tenant every project.** Use a project-specific tenant
(`--tenant <project-slug>`). Keeps the dashboard scoped and prevents
cross-pollination with other ongoing kanbans.
5. **Respect existing skills.** When a scene fits an existing skill, the
relevant renderer should load that skill via `--skill <name>` on its task
or `always_load` in its profile. Do not re-derive what a skill already
provides.
6. **The director never executes.** Even with the full `kanban + terminal +
file` toolset, the director's `SOUL.md` rules forbid it from executing
work itself. It decomposes and routes only — every concrete task becomes
a `hermes kanban create` call to a specialist profile. The
`kanban-orchestrator` skill spells this out further.
7. **Don't over-decompose.** A 30-second product video does NOT need 20 tasks.
Aim for the smallest task graph that still parallelizes well and exposes the
right human-review gates.
8. **Verify API keys BEFORE firing.** External APIs (TTS, image-gen,
image-to-video) need keys in `~/.hermes/.env` or the user's secret store.
A worker that hits a missing-key error wastes a task slot. The setup
script's `check_key` helper aborts cleanly if a required key is missing.
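Rules 3 and 4 are mechanical enough to centralize: route every child-task payload through one helper so no call can forget the shared workspace or the tenant. A sketch only; the real `kanban_create` parameter set may differ:

```python
import os

def child_task(title: str, assignee: str, project_dir: str,
               tenant: str, **extra) -> dict:
    """Build a kanban_create payload that always carries the shared workspace."""
    path = os.path.abspath(os.path.expanduser(project_dir))
    payload = {
        "title": title,
        "assignee": assignee,
        "workspace_kind": "dir",   # rule 3: one shared dir workspace per project
        "workspace_path": path,    # rule 3: always the absolute project path
        "tenant": tenant,          # rule 4: project-scoped tenant
    }
    payload.update(extra)
    return payload

t = child_task("scene 1: bass-driven ASCII", "renderer-ascii",
               "~/projects/video-pipeline/lofi-loop", "lofi-loop", priority=2)
```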
## File map
```
SKILL.md ← this file (workflow + rules)
references/
intake.md ← discovery question banks per style
role-archetypes.md ← role library (writer, designer, animator, …)
tool-matrix.md ← skill + toolset mapping per role
kanban-setup.md ← setup script structure & profile config
monitoring.md ← watch + intervene patterns
examples.md ← six worked pipelines
assets/
brief.md.tmpl ← brief skeleton
setup.sh.tmpl ← setup script skeleton
soul.md.tmpl ← profile personality skeleton
scripts/
bootstrap_pipeline.py ← generate setup.sh from brief + team JSON
monitor.py ← polling + intervention helpers
```

assets/brief.md.tmpl
@@ -0,0 +1,79 @@
# Video Brief — {{TITLE}}
> Slug: `{{SLUG}}` · Tenant: `{{TENANT}}` · Project workspace: `{{WORKSPACE}}`
## 1. Concept
**One-line pitch.** {{ONE_LINE_PITCH}}
**Emotional north star.** {{EMOTIONAL_NORTH_STAR}}
*(What should the viewer feel walking away?)*
## 2. Scope
| | |
|---|---|
| Duration | {{DURATION_S}} seconds |
| Aspect ratio | {{ASPECT}} |
| Resolution | {{RESOLUTION}} |
| Frame rate | {{FPS}} fps |
| Target platforms | {{PLATFORMS}} |
| Deadline | {{DEADLINE}} |
| Quality bar | {{QUALITY_BAR}} *(rough draft / polished / archival)* |
## 3. Style
**Visual references.** {{VISUAL_REFS}}
**Tone.** {{TONE}}
**Brand constraints.** {{BRAND_CONSTRAINTS}}
*(colors, typography, motion language; or "n/a")*
**Aesthetic rules.**
{{AESTHETIC_RULES}}
## 4. Scenes
Beat-by-beat breakdown. Each scene gets a row.
| # | Time | Content | Target tool / skill | Audio | Notes |
|---|------|---------|---------------------|-------|-------|
| 1 | 0:00–0:0X | {{SCENE_1_CONTENT}} | {{SCENE_1_TOOL}} | {{SCENE_1_AUDIO}} | {{SCENE_1_NOTES}} |
| 2 | 0:0X–0:0Y | ... | ... | ... | ... |
## 5. Audio
**Approach.** {{AUDIO_APPROACH}}
*(narration / music-only / synced to track / silent / mixed)*
**Voiceover.** {{VO_DETAILS}}
*(provider, voice, language, script source — "n/a" if no VO)*
**Music.** {{MUSIC_DETAILS}}
*(provided track path / commission via Suno / commission via heartmula /
license-free / "n/a")*
**SFX.** {{SFX_DETAILS}}
*(generated, library, or "n/a")*
## 6. Deliverables
| Format | Resolution | Notes |
|--------|-----------|-------|
| {{PRIMARY_FORMAT}} | {{PRIMARY_RES}} | The main output |
| {{ALT_FORMAT_1}} | {{ALT_RES_1}} | {{ALT_NOTES_1}} |
**Final filename.** `output/final.mp4`
*(plus optional `output/final-9x16.mp4`, `output/captions.srt`, etc.)*
## 7. Constraints
- API keys required: {{API_KEYS_REQUIRED}}
- External dependencies: {{EXT_DEPS}}
- Source assets to incorporate: {{SOURCE_ASSETS}}
---
**This brief is the contract. The director and every downstream profile read
it. If the brief changes, the kanban must be re-fired — don't edit live.**

assets/setup.sh.tmpl
@@ -0,0 +1,185 @@
#!/usr/bin/env bash
# ═══════════════════════════════════════════════════════════════════════
# Video Pipeline Setup — {{TITLE}}
#
# Generated by video-orchestrator skill.
#
# Slug: {{SLUG}}
# Workspace: {{WORKSPACE}}
# Tenant: {{TENANT}}
# ═══════════════════════════════════════════════════════════════════════
set -euo pipefail
PROJECT_SLUG="{{SLUG}}"
WORKSPACE="$HOME/projects/video-pipeline/${PROJECT_SLUG}"
TENANT="{{TENANT}}"
# ─────────────────────────────────────────────────────────────────────
# 1. Verify required API keys
# ─────────────────────────────────────────────────────────────────────
echo "═══ Checking required API keys ═══"
check_key() {
  local var="$1"
  local kc_account="${2:-hermes}"
  local kc_service="${3:-$1}"
  if grep -q "^${var}=" "$HOME/.hermes/.env" 2>/dev/null && \
     [ -n "$(grep "^${var}=" "$HOME/.hermes/.env" | cut -d= -f2-)" ]; then
    echo " ✓ ${var} (env)"
    return 0
  fi
  if command -v security >/dev/null 2>&1 && \
     security find-generic-password -a "${kc_account}" -s "${kc_service}" -w >/dev/null 2>&1; then
    echo " ✓ ${var} (Keychain ${kc_account}/${kc_service})"
    return 0
  fi
  echo " ✗ ${var} not set in ~/.hermes/.env or Keychain (${kc_account}/${kc_service})"
  return 1
}
# Customize this list per project — only check keys actually used:
{{KEY_CHECKS}}
# ─────────────────────────────────────────────────────────────────────
# 2. Create project workspace
# ─────────────────────────────────────────────────────────────────────
echo "═══ Creating project workspace ═══"
mkdir -p "$WORKSPACE"/{taste,audio/{voiceover,sfx},assets,scenes,checkpoints,tools,output}
{{SCENE_DIRS}}
echo " ✓ $WORKSPACE"
# ─────────────────────────────────────────────────────────────────────
# 3. Create Hermes profiles
# ─────────────────────────────────────────────────────────────────────
echo "═══ Creating Hermes profiles ═══"
{{PROFILE_CREATE_COMMANDS}}
# ─────────────────────────────────────────────────────────────────────
# 4. Configure profiles (toolsets, skills, cwd)
# ─────────────────────────────────────────────────────────────────────
echo "═══ Configuring profiles ═══"
configure_profile() {
  local profile="$1"
  local toolsets_json="$2"  # JSON array string, e.g. '["kanban","terminal","file"]'
  local skills_json="$3"    # JSON array string, e.g. '["kanban-worker","ascii-video"]'
  # `|| { ...; exit 1; }` keeps the failure path reachable under `set -e`,
  # which would otherwise abort before any post-hoc `$?` check runs.
  python3 - "$profile" "$toolsets_json" "$skills_json" "$WORKSPACE" <<'PY' || { echo " ✗ failed to configure ${profile}" >&2; exit 1; }
"""Patch a Hermes profile config.yaml using PyYAML so we don't depend on the
exact default-config string format. Validates the patch took effect and exits
non-zero if anything's off."""
import json
import os
import sys
try:
import yaml
except ImportError:
print("ERROR: PyYAML required. pip install pyyaml", file=sys.stderr)
sys.exit(1)
profile, toolsets_json, skills_json, workspace = sys.argv[1:5]
toolsets = json.loads(toolsets_json)
skills = json.loads(skills_json)
p = os.path.expanduser(f"~/.hermes/profiles/{profile}/config.yaml")
if not os.path.exists(p):
print(f" ✗ profile config not found: {p}", file=sys.stderr)
sys.exit(1)
with open(p) as f:
cfg = yaml.safe_load(f) or {}
# Apply our changes — only the keys we actually want to set.
cfg["toolsets"] = toolsets
cfg.setdefault("skills", {})
cfg["skills"]["always_load"] = skills
# Note: we do NOT touch cfg["approvals"] — that's a security-sensitive
# setting (manual confirmation of tool calls). Workspace cwd is overridden
# per-task by `--workspace dir:<path>` on `hermes kanban create`, so we
# don't need to mutate cfg["terminal"]["cwd"] either.
with open(p, "w") as f:
yaml.safe_dump(cfg, f, sort_keys=False)
# Validate
with open(p) as f:
after = yaml.safe_load(f)
errors = []
if after.get("toolsets") != toolsets:
errors.append(f"toolsets mismatch: {after.get('toolsets')!r}")
if after.get("skills", {}).get("always_load") != skills:
errors.append(f"skills.always_load mismatch: {after.get('skills', {}).get('always_load')!r}")
if errors:
print(f" ✗ {profile}: " + "; ".join(errors), file=sys.stderr)
sys.exit(1)
PY
  echo " ✓ ${profile}"
}
{{PROFILE_CONFIG_COMMANDS}}
# ─────────────────────────────────────────────────────────────────────
# 5. Write SOUL.md per profile
# ─────────────────────────────────────────────────────────────────────
echo "═══ Writing profile personalities ═══"
{{SOUL_WRITES}}
# ─────────────────────────────────────────────────────────────────────
# 6. Copy brief, TEAM.md, and any provided assets
# ─────────────────────────────────────────────────────────────────────
echo "═══ Writing brief + taste ═══"
cat > "$WORKSPACE/brief.md" <<'BRIEF_EOF'
{{BRIEF_CONTENTS}}
BRIEF_EOF
cat > "$WORKSPACE/TEAM.md" <<'TEAM_EOF'
{{TEAM_CONTENTS}}
TEAM_EOF
{{TASTE_WRITES}}
{{ASSET_COPIES}}
# ─────────────────────────────────────────────────────────────────────
# 7. Fire the initial kanban task
# ─────────────────────────────────────────────────────────────────────
echo "═══ Firing initial kanban task ═══"
hermes kanban create "Direct production of {{TITLE}}" \
--assignee director \
--workspace dir:"$WORKSPACE" \
--tenant "$TENANT" \
--priority 2 \
--max-runtime 4h \
--body "$(cat <<EOF
Read brief.md, TEAM.md, and taste/.
Decompose into the team graph defined in TEAM.md.
All child tasks MUST use:
workspace_kind="dir"
workspace_path="$WORKSPACE"
tenant="$TENANT"
Do not execute the work yourself — route every concrete subtask to the
appropriate profile via kanban_create.
EOF
)"
echo ""
echo "═══ Setup complete ═══"
echo ""
echo "Monitor with:"
echo " hermes kanban watch --tenant $TENANT"
echo " hermes kanban list --tenant $TENANT"
echo " hermes dashboard"
echo ""
echo "Workspace: $WORKSPACE"

assets/soul.md.tmpl
@@ -0,0 +1,38 @@
# {{ROLE_NAME}}
You are the **{{ROLE_NAME}}** for this video production.
## Project context
- **Brief:** read `brief.md` in your CWD
- **Team graph:** read `TEAM.md` in your CWD
- **Style spec:** read `taste/brand-guide.md` and `taste/emotional-dna.md` in
your CWD
## What you do
{{ROLE_RESPONSIBILITIES}}
## Inputs you read
{{INPUTS_READ}}
## Outputs you produce
{{OUTPUTS_PRODUCED}}
## Tools and skills available
- **Toolsets:** {{TOOLSETS}}
- **Skills loaded:** {{SKILLS}}
- **External APIs / CLIs:** {{EXTERNAL_TOOLS}}
## Rules
{{ROLE_RULES}}
{{COMMON_RULES}}
## Common reference commands
{{COMMON_COMMANDS}}

references/examples.md
@@ -0,0 +1,227 @@
# Worked Examples
Six concrete pipelines covering different video styles. Each shows the team
composition, task graph, and skill/tool choices the orchestrator would make
for that brief. **These are illustrative, not templates** — adapt to the
actual brief.
## Example 1 — Narrative short film (text-to-image → image-to-video → cut)
**Brief:** A 90-second noir-style short. A detective walks through a rainy
city. Voiceover narration. AI-generated visuals.
**Team:**
- `director` — vision, decomposition, approval
- `writer` — script + voiceover copy (loads `humanizer` for natural voice)
- `storyboarder` — beat-by-beat shot list (loads `excalidraw`)
- `image-generator` — generates each shot's still via local ComfyUI workflows
(loads `comfyui`)
- `image-to-video-generator` — animates each still (Runway/Kling, OR
ComfyUI's AnimateDiff/WAN workflows via `comfyui`)
- `voice-talent` — narration via ElevenLabs
- `audio-mixer` — VO + ambient pad
- `editor` — assembly + transitions
- `reviewer` — final QA
**Task graph:**
```
T0 director decompose
T1 writer script + voiceover.md (parent: T0)
T2 storyboarder shot list with framing per beat (parent: T1)
T3 image-generator one still per shot (~12 shots) (parent: T2)
T4 image-to-video animate each still (parent: T3)
T5 voice-talent generate narration audio (parent: T1)
T6 audio-mixer mix VO + ambient (parent: T5)
T7 editor cut + transitions + audio mux (parents: T4, T6)
T8 reviewer final QA (parent: T7)
```
**Key choices:**
- Local ComfyUI via `comfyui` skill is preferred over external API for
cost/control — but external APIs are fine if ComfyUI isn't installed
- `editor` profile is ffmpeg-only, no Hermes skill required beyond
`kanban-worker`
- Storyboarder produces `storyboard.excalidraw` alongside the markdown
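A task graph like the one above is a DAG keyed by parent links, and "which tasks can start now" is a short query over it. A sketch using this example's ids; the helper itself is illustrative, not part of the skill:

```python
# Parent links mirroring Example 1's task graph.
PARENTS = {
    "T0": [], "T1": ["T0"], "T2": ["T1"], "T3": ["T2"], "T4": ["T3"],
    "T5": ["T1"], "T6": ["T5"], "T7": ["T4", "T6"], "T8": ["T7"],
}

def ready(done: set[str]) -> list[str]:
    """Tasks whose parents are all complete and which are not themselves done."""
    return sorted(t for t, parents in PARENTS.items()
                  if t not in done and all(p in done for p in parents))

ready({"T0", "T1"})  # → ["T2", "T5"]
```

This readiness set is what the kanban effectively computes when it unblocks children as their parents complete; note that T5 (narration) runs in parallel with the visual chain.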
## Example 2 — Product / marketing teaser
**Brief:** A 30-second product teaser for a developer tool. Shows code +
terminal + UI screen recordings, voiceover, CTA at end. Square 1:1.
**Team:**
- `director`
- `copywriter` — taglines, voiceover script, CTA (loads `humanizer`)
- `concept-artist` — style frames (loads `claude-design` for UI mockups)
- `renderer-motion-graphics` — animated UI sequences (Remotion CLI)
- `renderer-ascii` — terminal-style demo scenes (loads `ascii-video`)
- `voice-talent` — VO via ElevenLabs
- `editor` — assembly + brand-color treatment
- `audio-mixer` — VO + light music bed
- `captioner` — burned subtitles for muted-autoplay platforms
- `masterer` — produces 1:1 + 9:16 + 16:9 variants
**Task graph:**
```
T0 director decompose
T1 copywriter copy.md + cta + vo script (parent: T0)
T2 concept-artist visual-spec.md + style frames (parent: T1)
T3a renderer-motion-graphics scene 1: UI sequence (parent: T2)
T3b renderer-ascii scene 2: terminal demo (parent: T2)
T3c renderer-motion-graphics scene 3: feature highlight (parent: T2)
T3d renderer-motion-graphics scene 4: CTA card (parent: T2)
T4 voice-talent narration (parent: T1)
T5 audio-mixer VO + music bed (parent: T4)
T6 editor cut + transitions (parents: T3*, T5)
T7 captioner SRT + burned subtitles (parent: T6)
T8 masterer 1:1, 9:16, 16:9 variants (parent: T7)
```
**Key choices:**
- Multiple specialized renderers (motion-graphics + ASCII) coexist
- Captioner is included because muted autoplay is the norm on social
- `claude-design` skill for UI mockups maps directly to the product video idiom
## Example 3 — Music video (synced to provided track)
**Brief:** A 3-minute music video for a provided lo-fi hip-hop track. Visuals
should pulse with the beat. Generative + ASCII hybrid. Vertical 9:16.
**Team:**
- `director`
- `music-supervisor` — analyze track, emit `audio/beats.json` (loads `songsee`)
- `storyboarder` — beat-aligned shot list (loads `excalidraw`)
- `renderer-ascii` — ASCII scenes synced to bass kicks (loads `ascii-video`)
- `renderer-p5js` — generative particle scenes synced to highs (loads `p5js`)
- `editor` — beat-cut assembly using `beats.json`
- `reviewer` — sync QA
**Task graph:**
```
T0 director decompose
T1 music-supervisor analyze track → beats.json + spectrogram (parent: T0)
T2 storyboarder shot list aligned to beats (parents: T1, T0)
T3a renderer-ascii scene 1: bass-driven ASCII (parent: T2)
T3b renderer-p5js scene 2: high-end particle field (parent: T2)
... (more scenes)
T4 editor cut to beats + mux track (parents: T3*, T1)
T5 reviewer sync QA + final approval (parent: T4)
```
**Key choices:**
- `music-supervisor` runs FIRST — `beats.json` gates the renderers
- `editor` uses `beats.json` directly to align cuts to bass kicks
- No voice-talent — music is the audio
- Two specialized renderers (`ascii-video` + `p5js`) for visual variety
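For a fixed-tempo track, the gating `beats.json` reduces to evenly spaced timestamps. A sketch of the idea only; the actual file's schema is whatever the `music-supervisor` profile emits via `songsee`:

```python
def beat_times(bpm: float, duration_s: float) -> list[float]:
    """Evenly spaced beat timestamps for a fixed-tempo track."""
    step = 60.0 / bpm
    return [round(i * step, 3) for i in range(int(duration_s / step) + 1)]

beat_times(120, 2.0)  # → [0.0, 0.5, 1.0, 1.5, 2.0]
```

A real track's beats drift, which is why the supervisor analyzes the audio rather than trusting a nominal BPM grid.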
## Example 4 — Math/algorithm explainer
**Brief:** A 2-minute explainer of an algorithm. 3Blue1Brown-style. Animated
diagrams, equations, narration. Square 1:1.
**Team:**
- `director`
- `writer` — narration script (loads `humanizer`)
- `cinematographer` — visual spec (loads `manim-video`)
- `renderer-manim` — all animated scenes (loads `manim-video`)
- `voice-talent` — narration via ElevenLabs
- `editor` — assembly + audio mux
- `captioner` — burned subtitles
**Task graph:**
```
T0 director decompose
T1 writer script + narration (parent: T0)
T2 cinematographer visual spec for all scenes (parent: T1)
T3a-Tn renderer-manim scenes 1..N (parents: T2)
T4 voice-talent narration audio (parent: T1)
T5 editor cut + mux (parents: T3*, T4)
T6 captioner SRT + burn (parent: T5)
```
**Key choices:**
- `manim-video` skill drives both the cinematographer (visual language) and
the renderer (actual scene production)
- The `manim-video` skill's reference docs (animation-design-thinking,
scene-planning, equations) auto-load when needed via the renderer's pinned skill
## Example 5 — ASCII video, music-track-only
**Brief:** A 60-second pure-ASCII video reactive to an existing track. No
voiceover, no other tools. Square 1:1.
**Team:**
- `director`
- `music-supervisor` — track analysis (loads `songsee`)
- `renderer-ascii` — all visuals (loads `ascii-video`)
- `editor` — assembly + audio mux
**Task graph:**
```
T0 director decompose
T1 music-supervisor analyze track (parent: T0)
T2a renderer-ascii scene 1 (parents: T1, T0)
T2b renderer-ascii scene 2 (parents: T1, T0)
T2c renderer-ascii scene 3 (parents: T1, T0)
T3 editor stitch + mux audio (parents: T2*)
```
**Key choices:**
- Minimal team (4 profiles) for a focused single-tool project
- No reviewer — short experimental piece, director approves directly
- All scenes run through one `renderer-ascii` profile because the `ascii-video`
skill covers everything
This example illustrates the rule: **don't over-decompose**. Three scenes
through one renderer is fine. Don't spawn three renderer profiles.
## Example 6 — Real-time / installation art
**Brief:** A 2-minute audio-reactive visual for a gallery installation. Driven
by an audio input feed. TouchDesigner-based. 16:9 4K.
**Team:**
- `director`
- `cinematographer` — visual language spec (loads `touchdesigner-mcp`)
- `renderer-touchdesigner` — all visuals + record-to-disk
(loads `touchdesigner-mcp`)
- `audio-mixer` — final loudness pass on the captured audio (optional if
pre-mixed source)
- `editor` — assemble final clip from TouchDesigner recording
- `reviewer` — visual QA
**Task graph:**
```
T0 director decompose
T1 cinematographer TD operator graph spec (parent: T0)
T2 renderer-touchdesigner build TD network + record output (parent: T1)
T3 editor trim + audio mux (parent: T2)
T4 reviewer final QA (parent: T3)
```
**Key choices:**
- `touchdesigner-mcp` controls a running TouchDesigner instance — the
cinematographer designs the operator graph, renderer builds it
- Output is a recording from the running TD network, not a render-to-frames
process; editor mostly just trims
## Pattern recognition
When the user describes a video, look for these signals to map to an example:
- **Plot, characters, scripted dialogue** → Example 1 (narrative)
- **Specific product, CTA, brand colors, voiceover** → Example 2 (marketing)
- **Track file provided, "synced to music"** → Example 3 (music video)
- **"Explain how X works", math/algorithm/concept walkthrough** → Example 4 (manim explainer)
- **Terminal aesthetic, ASCII, retro pixel** → Example 5 (ASCII)
- **"Audio-reactive", "real-time", "installation"** → Example 6 (TouchDesigner)
- **Comic-style narrative** → use `renderer-comic` (`baoyu-comic` skill)
- **Retro game / pixel-art aesthetic** → use `renderer-pixel` (`pixel-art` skill)
- **3D scene, photoreal environment** → use `renderer-3d` (`blender-mcp`)
- **Generative art, particle system, shader** → use `renderer-p5js` (`p5js`)
- **AI-generated photoreal stills + animation** → use `renderer-comfyui`
(`comfyui`) for both stills and image-to-video
- **"video about how the system works", recursive demo** → composable from
any of the above; the recursion is a rendering technique, not a style
The actual team should be derived from the specific brief — these examples are
starting points, not endpoints.

references/intake.md
@@ -0,0 +1,166 @@
# Intake — Discovery Question Banks
The discovery process is **adaptive**. Always start with three baseline
questions to identify the broad style category, then drill into a per-style
question bank. Ask 2-4 questions at a time, listen, then proceed. Make
reasonable assumptions whenever the user implies an answer.
## Tier 0 — Baseline (always ask)
1. **What is the video?** — One-sentence pitch
2. **How long?** — Approximate duration
3. **Aspect ratio + target platform?** — 16:9 / 9:16 / 1:1 / 4:5; X, IG, YouTube, internal, etc.
From these answers, classify the style category and pick the relevant Tier 1
follow-ups. **Do not** proceed to the brief or team design until you have at
least these three answers.
## Style classification
Map the brief to one of these archetypes (or a hybrid):
| Archetype | Tells |
|-----------|-------|
| **Narrative film** | Plot, characters, scenes-with-events, dialogue, location |
| **Product / marketing** | A specific product or feature being shown / sold; CTA at end |
| **Music video** | A specific track exists; visuals sync to music |
| **Explainer / educational** | A concept being taught; voiceover-driven |
| **Tutorial / changelog** | Software demo, terminal-heavy, technical |
| **ASCII / terminal art** | Retro terminal aesthetic explicit, character-grid |
| **Abstract / loop** | Generative, no plot, often perfect-loop |
| **Documentary / interview cut** | Real footage, transcription-driven |
| **Real-time / installation** | Audio-reactive, gallery installation, VJ output |
If ambiguous, **ask** which category fits — don't guess. Hybrids are common
(e.g., a product video with a narrative arc); decompose into the dominant
mode + secondary modifiers.
**Recursive / meta** ("a video that shows its own production") is a
*rendering technique*, not a separate style — compose it from any of the
above by adding a two-pass render step where pass 2 uses pass 1's output as
texture inside the final scene.
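The "Tells" column can be roughed into a first-pass classifier, useful only as a prior before asking. The keywords below paraphrase the table and are not exhaustive; an empty or tied result maps to the rule above, ask rather than guess:

```python
# Keyword priors distilled from the "Tells" column; illustrative, not exhaustive.
TELLS = {
    "narrative_film": ["plot", "character", "dialogue", "story"],
    "product_marketing": ["product", "cta", "launch", "brand"],
    "music_video": ["track", "song", "beat", "synced"],
    "explainer": ["explain", "concept", "teach", "algorithm"],
    "ascii_terminal": ["ascii", "terminal", "retro"],
    "realtime_installation": ["audio-reactive", "real-time", "installation", "vj"],
}

def classify(pitch: str) -> list[str]:
    """Return the best-matching archetype(s); empty or tied result means: ask."""
    p = pitch.lower()
    scores = {name: sum(k in p for k in kws) for name, kws in TELLS.items()}
    best = max(scores.values())
    return sorted(n for n, s in scores.items() if s == best and s > 0)
```

For example, `classify("a 3-minute video synced to a lo-fi track")` surfaces the music-video archetype, while a pitch with no tells returns an empty list and sends you back to the question bank.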
## Tier 1 — Per-style follow-ups
### Narrative film
- **Setting / world?** — When and where the story takes place
- **Characters?** — How many, archetypes, who carries dialogue
- **Beat list or full script?** — Has the user written the story or do we draft it
- **Dialogue language?** — Spoken lines, on-screen subs only, silent
- **Visual generation approach?** — Text-to-image (FAL/Midjourney/Imagen) →
image-to-video (Runway/Kling), 3D animation (Blender), 2D animation,
procedural, or hybrid
- **Voice approach?** — TTS (which voice), recorded VO, no dialogue
- **Music / score?** — Commissioned (via `songwriting-and-ai-music` Suno
prompts, or local `heartmula`), licensed track provided, silent
### Product / marketing
- **Product?** — Name, what it does, key feature being shown
- **Target audience?** — Who's watching, what they care about
- **CTA?** — Visit URL, install, sign up, etc.
- **Tone?** — Serious, playful, technical, premium, edgy
- **Brand assets available?** — Logo files, color palette, fonts, existing footage
- **Animation style?** — Motion graphics (Remotion / AE-style), screen recording,
generative, illustrated
- **Voiceover?** — Yes (which voice / language) or text-only
- **Music?** — Track provided, license-free needed, custom-composed
### Music video
- **Track file?** — Path to the audio (essential — we'll analyze BPM + beats)
- **Track length to use?** — Full song or a section
- **Genre / energy?** — Tells what visual rhythm and density to use
- **Lyric / narrative content?** — Are there lyrics to render on screen,
or is it purely visual?
- **Visual reference style?** — Existing music videos / artists for reference
- **Performer footage?** — None, has clips, will provide
- **Visual generation approach?** — Per-beat generative, edit-driven cuts of stock
footage, illustrated, hybrid
### Explainer / educational
- **What concept is being taught?** — One-sentence concept, key takeaway
- **Audience expertise?** — Beginner / intermediate / expert
- **Diagram density?** — Heavy math / formulas / code / abstract concepts
- **Voiceover?** — TTS / recorded / on-screen text only
- **Tool preference?** — `manim-video` (math), `p5js` (generative),
Remotion (UI motion graphics), `comfyui` (AI-generated visuals),
`ascii-video` (technical/retro), hybrid
- **Pacing?** — Fast and dense (3Blue1Brown) or slow and contemplative
### Tutorial / changelog / software demo
- **Software being demonstrated?** — Name, what it does
- **Demo script?** — Sequence of commands / screens to show
- **Terminal-only or with GUI?**
- **Voiceover for narration?**
- **Diagram support needed?** — Often these benefit from a diagram skill
alongside the screen-capture/render step (`excalidraw`,
`architecture-diagram`, `concept-diagrams`)
### ASCII / terminal art
- **Source material?** — Generative / driven by audio / converting existing
video / static image starting point
- **Color palette?** — Brand-driven (gold/black/blue), Matrix green, full
rainbow, monochrome
- **Audio reactivity?** — None / loose mood / tight beat sync / FFT-driven
- **Character set?** — ASCII only / Unicode block-drawing / mystic glyphs
- **Loop or narrative?** — Perfect loop or one-shot
### Abstract / loop
- **Mood / emotion?** — One word that captures the feel
- **Motion type?** — Zoom-into-itself, particle drift, wave, geometric, organic
- **Loop required?** — Perfect loop (Droste-style) or just satisfying ending
- **Audio?** — Silent, ambient pad, beat-synced
### Documentary / interview cut
- **Source footage?** — Provided clips, length per clip
- **Transcript / subtitles?** — Provided or to be generated
- **Story structure?** — Chronological / thematic / arc
- **B-roll approach?** — Generated, stock library, none
### Real-time / installation
- **Output environment?** — Gallery wall, projector, screen, web embed
- **Audio source?** — Live audio input, pre-recorded track, both
- **Reactivity tightness?** — Mood-level (loose) vs. tight beat-sync vs. live
parameter control
- **Tool preference?** — `touchdesigner-mcp` for full TD operator graphs;
`p5js` for web-canvas; `comfyui` for generative-AI fed by audio features
## Tier 2 — Always ask near the end
- **Brand assets path?** — Where logo / color palette / fonts / music library lives
- **Output format requirements?** — Codec preference, target file size, accepted
alternates (vertical cut, GIF, audio-only)
- **Deadline?** — Affects task `max_runtime_seconds` and acceptable scope
- **Quality bar?** — Rough draft for review / polished final / archival
- **Existing footage / assets to reuse?** — Anything that should appear, not just inform
## Reasonable assumption defaults
When the user under-specifies, fill in these defaults rather than asking:
| Question | Default |
|----------|---------|
| Frame rate | 30 fps for X / IG; 60 fps for tutorials/explainers; 24 fps for narrative film |
| Resolution | 1080×1080 for square, 1920×1080 for 16:9, 1080×1920 for 9:16 |
| Codec | H.264 / yuv420p, CRF 18 |
| Audio codec | AAC 192 kbps |
| Voice | Provider's mid-range neutral voice unless brand calls for distinctive timbre |
| Music | Silent (require user to specify if music is wanted) |
| Captions | On for explainer/tutorial; off for narrative/abstract unless requested |
| Quality bar | Polished final unless user says draft |
State the assumption explicitly: *"Assuming 30fps and AAC audio unless you say otherwise — proceed?"*
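For reference, the encode-related defaults above collapse into a single ffmpeg invocation. A minimal sketch — `build_encode_cmd` is a hypothetical helper, not part of any Hermes skill; adapt paths and overrides per project:

```python
# Illustrative only: express the defaults table as an ffmpeg argv.
def build_encode_cmd(src, dst, fps=30, crf=18, audio_bitrate="192k"):
    return [
        "ffmpeg", "-y", "-i", src,
        "-r", str(fps),
        "-c:v", "libx264", "-crf", str(crf),   # H.264, CRF 18 default
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", audio_bitrate,  # AAC 192 kbps default
        dst,
    ]

print(" ".join(build_encode_cmd("output/final-noaudio.mp4", "output/final.mp4")))
```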
## Anti-patterns
- **Asking 10 questions at once.** Maximum 4 per turn.
- **Asking for things the brief already implies.** If the user said "music video for my track," do not ask "is there a track?"
- **Failing to classify before drilling in.** Tier-1 questions depend on classification; mixing them up wastes turns.
- **Treating "make a video" as enough to proceed.** Always confirm the three baseline questions.
# Kanban Setup — Project Bootstrap & Profile Configuration
Once the brief is locked and the team is designed, the next step is producing
the actual `setup.sh` that creates the project workspace, configures Hermes
profiles, and fires the initial kanban task.
This file documents the patterns. The companion script
`scripts/bootstrap_pipeline.py` automates most of it from a structured input
JSON.
> **Credit:** the single-project-workspace layout, profile-config patching
> approach, SOUL.md-per-profile convention, and `--workspace dir:<path>` rule
> are adapted from alt-glitch's original multi-agent video pipeline:
> [NousResearch/kanban-video-pipeline](https://github.com/NousResearch/kanban-video-pipeline).
> This skill generalizes those patterns across video styles and replaces the
> string-replacement config patcher with a PyYAML-based one.
## Project workspace structure
Every video project gets one workspace under `~/projects/video-pipeline/<slug>/`:
```
~/projects/video-pipeline/<slug>/
├── brief.md ← the contract; all tasks reference
├── TEAM.md ← team composition + task graph (director reads this)
├── taste/
│ ├── brand-guide.md ← color, typography, motion rules
│ ├── emotional-dna.md ← what the piece should FEEL like
│ └── style-frames/ ← optional: visual references
├── audio/
│ ├── track.mp3 ← provided music (if any)
│ ├── voiceover/ ← per-line TTS clips
│ └── sfx/ ← sound effects
├── assets/
│ ├── logos/
│ ├── fonts/
│ └── existing-footage/ ← reusable provided clips
├── scenes/
│ ├── scene-01/
│ │ ├── VISUAL_SPEC.md ← cinematographer's per-scene spec
│ │ ├── render.py ← renderer's code (or sketch.html, etc.)
│ │ ├── checkpoints/ ← preview frames for QA
│ │ └── clip.mp4 ← the deliverable for this scene
│ ├── scene-02/...
│ └── ...
├── checkpoints/ ← global review frames
├── tools/ ← optional project-local helpers
└── output/
├── final.mp4 ← stitched + audio
├── final-noaudio.mp4
├── final-9x16.mp4 ← optional: vertical alternate
└── captions.srt ← optional: subtitle file
```
**The slug** is derived from the brief title: lowercase, hyphen-separated.
Example: `q3-product-teaser`, `ascii-mood-loop`, `interview-cut-2026-q1`.
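The derivation can be sketched in a few lines (the exact normalization rules are an assumption — the examples above only pin down lowercase + hyphens):

```python
import re

def slugify(title: str) -> str:
    """Derive a project slug: lowercase, hyphen-separated, non-alphanumerics dropped."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

print(slugify("Q3 Product Teaser"))  # q3-product-teaser
```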
## The setup.sh script
The setup script does six things in order:
1. **Create workspace tree** — all directories above
2. **Create profiles** — `hermes profile create <name> --clone`
3. **Configure profiles** — patch each profile's
`~/.hermes/profiles/<name>/config.yaml` to set toolsets, always_load skills,
and `cwd`
4. **Write SOUL.md per profile** — the personality + role definition
5. **Copy any provided assets + write `brief.md`, `TEAM.md`, and `taste/`**
6. **Fire the initial kanban task** — `hermes kanban create` assigned to the director
See `assets/setup.sh.tmpl` for the skeleton.
### Profile creation pattern
```bash
hermes profile create director --clone 2>/dev/null || true
```
The `--clone` flag clones from the active profile (preserving model, base
config). The `|| true` makes the script idempotent — re-running won't error if
the profile already exists.
### Profile config patching
Each profile has a YAML config at `~/.hermes/profiles/<name>/config.yaml`. The
setup script edits exactly two keys:
1. `toolsets:` — replace the default with the role's required toolsets
2. `skills.always_load:` — list the role's must-load skills (may be empty)
**Do NOT** modify `approvals.mode` (controls user-confirmation of tool calls
— a security setting that must stay as the user configured it). **Do NOT**
modify `terminal.cwd` — the kanban dispatcher overrides cwd per-task via
`--workspace dir:<path>`, so the profile's cwd is irrelevant to the kanban
work and changing it could break the user's interactive use of the profile.
Use **PyYAML**, not string replacement, so the patch is robust against
default-config schema drift:
```bash
configure_profile() {
local profile="$1"
local toolsets_json="$2" # JSON array, e.g. '["kanban","terminal","file"]'
local skills_json="$3" # JSON array, e.g. '["kanban-worker","ascii-video"]'
python3 - "$profile" "$toolsets_json" "$skills_json" <<'PY'
import json, os, sys, yaml
profile, ts_json, sk_json = sys.argv[1:4]
p = os.path.expanduser(f"~/.hermes/profiles/{profile}/config.yaml")
with open(p) as f:
cfg = yaml.safe_load(f) or {}
cfg["toolsets"] = json.loads(ts_json)
cfg.setdefault("skills", {})["always_load"] = json.loads(sk_json)
with open(p, "w") as f:
yaml.safe_dump(cfg, f, sort_keys=False)
PY
}
```
PyYAML must be installed in the user's Python (it ships with most Hermes
installs). If absent: `pip install pyyaml`.
The setup script should also **validate** the patch by re-reading the file
and comparing — see `assets/setup.sh.tmpl` for the validation pattern.
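The validation itself is a comparison of the two patched keys against what was written. A minimal sketch of that check — shown as a pure function over the dict you get from `yaml.safe_load`-ing the patched file, with the YAML round-trip elided and the sample config shape assumed:

```python
def patch_applied(cfg: dict, toolsets: list, skills: list) -> bool:
    """Verify the two patched keys — and only those — match what was written."""
    return (
        cfg.get("toolsets") == toolsets
        and cfg.get("skills", {}).get("always_load") == skills
    )

# Dict shaped like a re-read ~/.hermes/profiles/<name>/config.yaml:
cfg = {
    "toolsets": ["kanban", "terminal", "file"],
    "skills": {"always_load": ["kanban-worker", "ascii-video"]},
    "approvals": {"mode": "ask"},  # security setting — left untouched
}
assert patch_applied(cfg, ["kanban", "terminal", "file"],
                     ["kanban-worker", "ascii-video"])
```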
### SOUL.md per profile
Each profile gets a `SOUL.md` at `~/.hermes/profiles/<name>/SOUL.md` that
defines its role, voice, and rules. See `assets/soul.md.tmpl` for the
template. Customize per role and per project.
The director's SOUL.md should be the most opinionated — its voice flavors
the entire production. **Critical content for the director's SOUL.md:**
- **Anti-temptation rules:** "Do not execute the work yourself. For every
concrete task, create a kanban task and assign it. Decompose, route, comment,
approve — that's the whole job." (The `kanban-orchestrator` skill provides
the deeper playbook; load it.)
- **Decomposition steps:** Read `brief.md`, `TEAM.md`, `taste/`. Use the team
graph in `TEAM.md` to fan out tasks.
- **The workspace_path rule** (see below).
Other profiles' SOUL.md files are briefer and mostly mechanical: who you are, what
you read, what you produce, which skills/tools to use, where to write outputs.
Most non-director profiles should `always_load: kanban-worker` for the
deeper-than-baseline kanban guidance.
### Initial kanban task
The final action of setup.sh is firing the kanban:
```bash
hermes kanban create "Direct production of <video title>" \
--assignee director \
--workspace dir:"$HOME/projects/video-pipeline/${PROJECT_SLUG}" \
--tenant ${PROJECT_SLUG} \
--priority 2 \
--max-runtime 4h \
--body "$(cat <<EOF
Read brief.md, TEAM.md, and taste/.
Decompose into the team graph defined in TEAM.md.
All child tasks MUST use:
workspace_kind="dir"
workspace_path="$HOME/projects/video-pipeline/${PROJECT_SLUG}"
tenant="${PROJECT_SLUG}"
EOF
)"
```
The `--workspace dir:<path>` flag is **critical** — it tells the kanban that
all child tasks share this workspace. Omitting it, or using `worktree` instead,
isolates each profile in its own directory and breaks artifact sharing.
## The TEAM.md file
Alongside `brief.md`, write a `TEAM.md` that the director reads. It documents
the team composition + task graph the orchestrator should follow. This
removes ambiguity and prevents the director from inventing extra steps.
Example structure (for an ASCII video with a music supervisor and editor):
```markdown
# Team & Task Graph — <video title>
## Team
- `director` (this profile) — vision, decomposition, approval
- `cinematographer` — visual spec, quality review (loads `ascii-video`)
- `renderer-ascii` — ASCII scenes (loads `ascii-video`)
- `music-supervisor` — track analysis (loads `songsee`)
- `voice-talent` — narration (uses ElevenLabs API)
- `audio-mixer` — final mix (ffmpeg)
- `editor` — assembly (ffmpeg)
- `reviewer` — final QA gate
## Task Graph
T0: this task — decompose
├── T1: cinematographer "Design visual language" (parent: T0)
│ │
│ ├── T2a: renderer-ascii "Scene 1 — title card" (parent: T1)
│ ├── T2b: renderer-ascii "Scene 2 — main beat" (parent: T1)
│ ├── T2c: renderer-ascii "Scene 3 — outro" (parent: T1)
├── T3: music-supervisor "Analyze track + emit beats.json" (parent: T0)
├── T4: voice-talent "Generate narration" (parent: T0)
├── T5: audio-mixer "Mix VO + bg music" (parents: T3, T4)
├── T6: editor "Assemble cut + mux audio" (parents: T2*, T5)
└── T7: reviewer "Final QA" (parent: T6)
```
The director turns this into actual `kanban_create` calls.
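That translation is mechanical: one `hermes kanban create` per graph node, with the shared workspace and tenant flags on every task. A sketch of the command construction (task titles, assignees, and the example slug are illustrative; `--parent` wiring and task bodies are elided):

```python
# Illustrative: map TEAM.md rows to kanban CLI invocations.
WORKSPACE = "$HOME/projects/video-pipeline/ascii-mood-loop"  # example slug
TENANT = "ascii-mood-loop"

def create_cmd(title, assignee, max_runtime="30m"):
    return [
        "hermes", "kanban", "create", title,
        "--assignee", assignee,
        "--workspace", f"dir:{WORKSPACE}",  # same shared dir on EVERY task
        "--tenant", TENANT,
        "--max-runtime", max_runtime,
    ]

for title, assignee in [
    ("Design visual language", "cinematographer"),
    ("Scene 1 — title card", "renderer-ascii"),
    ("Analyze track + emit beats.json", "music-supervisor"),
]:
    print(" ".join(create_cmd(title, assignee)))
```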
## API-key prerequisites check
Before firing the kanban, verify required keys are available. Check both
`~/.hermes/.env` and macOS Keychain (if on macOS):
```bash
check_key() {
local var="$1"
local kc_account="$2"
local kc_service="$3"
if grep -q "^${var}=" ~/.hermes/.env 2>/dev/null && \
[ -n "$(grep "^${var}=" ~/.hermes/.env | cut -d= -f2-)" ]; then
return 0
fi
if command -v security >/dev/null 2>&1 && \
security find-generic-password -a "${kc_account}" -s "${kc_service}" -w >/dev/null 2>&1; then
return 0
fi
echo "ERROR: ${var} not set in ~/.hermes/.env or Keychain (${kc_account}/${kc_service})"
return 1
}
check_key ELEVENLABS_API_KEY hermes ELEVENLABS_API_KEY || exit 1
check_key OPENROUTER_API_KEY hermes OPENROUTER_API_KEY || exit 1
# ...
```
If a key is missing, the script aborts with a clear message rather than
firing a kanban that will hit credential errors mid-execution.
## Critical rules
1. **`workspace_kind="dir"` + `workspace_path="<absolute>"` on every kanban_create.** Otherwise profiles can't share artifacts.
2. **Tenant every task.** `--tenant <project-slug>` keeps the dashboard scoped
and prevents cross-pollination with other ongoing kanbans.
3. **Idempotency keys.** For tasks that should not duplicate on re-run (e.g.,
setup creating profiles), use the `idempotency_key` argument or check
existence first.
4. **`max_runtime_seconds` per task.** Renderers that get stuck eat compute.
Standard defaults:
- Renderer task: 1800s (30min)
- Editor task: 600s (10min)
- Voice-talent task: 300s (5min)
- Image-generator task: 600s (10min)
- Image-to-video-generator task: 900s (15min)
5. **Heartbeats for long renders.** Tasks expected to run >5min should emit
`kanban_heartbeat` periodically with progress. Renderers should report
frame counts; the editor should report assembly progress.
6. **The `audio/` and `taste/` dirs are populated BEFORE firing the kanban.**
Don't ask the director's pipeline to source these — copy at setup time.
7. **`brief.md` is read-only after setup.** If the brief changes during
execution, that's a significant pivot — re-fire the kanban rather than edit
live.
# Monitoring — Watch the Pipeline + Intervene
After `setup.sh` fires the kanban, the work runs autonomously. The role of
this skill in the execution phase is to help the user (and the AI overseeing
the session) detect problems early and intervene effectively.
## Live monitoring commands
```bash
# Live event stream — task spawns, status changes, heartbeats, completions
hermes kanban watch --tenant <project-slug>
# Snapshot of the board
hermes kanban list --tenant <project-slug>
hermes kanban list --tenant <project-slug> --json # machine-readable
# Per-status counts + oldest-ready age
hermes kanban stats --tenant <project-slug>
# Visual dashboard (browser)
hermes dashboard
# Inspect a specific task (includes comments + events)
hermes kanban show <task-id>
# Follow a single task's event stream
hermes kanban tail <task-id>
```
Verify available subcommands with `hermes kanban --help` — the kanban CLI
ships with `init / create / list / show / assign / link / unlink / claim /
comment / complete / block / unblock / archive / tail / dispatch / watch /
stats / heartbeat / log / runs / context / gc`.
The companion `scripts/monitor.py` polls the kanban via the CLI and surfaces
common issues (stuck tasks, missing heartbeats, repeated retries, dependency
deadlocks).
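The core of the stuck-task check can be sketched as a filter over the `--json` snapshot. The field names here (`id`, `status`, `last_heartbeat` as a unix timestamp) are assumptions about the JSON shape — verify against real `hermes kanban list --json` output before relying on them:

```python
import time

def stale_tasks(tasks, now=None, stale_after=120):
    """Flag RUNNING tasks with no heartbeat in `stale_after` seconds.

    Field names are assumed — check the real `--json` shape.
    """
    now = time.time() if now is None else now
    return [
        t["id"] for t in tasks
        if t.get("status") == "RUNNING"
        and now - t.get("last_heartbeat", 0) > stale_after
    ]

sample = [
    {"id": "T2a", "status": "RUNNING", "last_heartbeat": 1000},
    {"id": "T3",  "status": "RUNNING", "last_heartbeat": 1290},
    {"id": "T1",  "status": "DONE",    "last_heartbeat": 900},
]
print(stale_tasks(sample, now=1300))  # ['T2a']
```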
## What to watch for
### Healthy pipeline indicators
- Tasks transition `READY → RUNNING → DONE` in roughly the expected order
- Renderers emit periodic `kanban_heartbeat` events with progress (e.g. "frame
240/720")
- Each task's runtime is well under its `max_runtime_seconds` cap
- No task accumulates more than 1 retry
- Dependency arrows resolve (children unblock as parents complete)
### Warning signs
| Symptom | Likely cause | Action |
|---------|--------------|--------|
| Task RUNNING but no heartbeat in 2+ min | Worker stuck, infinite loop, blocked on input | `hermes kanban show <id>` — read the worker's last events. The dispatcher SIGTERMs tasks that exceed their `max-runtime`; if you need to stop one earlier, `hermes kanban block <id>` then `hermes kanban archive <id>`, and create a re-run task. |
| Same task retried 2+ times | Reproducible failure (missing key, bad spec, broken tool) | `hermes kanban show <id>` to read failure events. Fix root cause before re-running. |
| RUNNING longer than max_runtime | Task is slow but progressing OR genuinely stuck | Check heartbeats with `hermes kanban tail <id>`. If progressing, the dispatcher will SIGTERM eventually anyway — raise `max-runtime` on a re-created task. |
| Child task READY but parents still RUNNING for >2× expected | Cascade slow, dependency miswired | Check the dependency graph. Inspect the parent: sometimes it completed but its handoff fields (summary, metadata) were empty so the child has nothing to consume. |
| New tasks not appearing | Director is hung in decomposition | Inspect director task with `kanban show`. Often a malformed `kanban_create` call. |
| Specialist tasks completing instantly | Decomposition created tasks without bodies | Director didn't pass enough context. Re-create with explicit body content. |
| Tasks created but never picked up | Profile not running, or tenant mismatch, or dispatcher not running | Check `hermes profile list` (profile exists?), `hermes status` (gateway/dispatcher up?), and verify tenant. |
| Specific renderer task fails → review note → renderer redoes → fails again | Brief is asking for the impossible | Pivot the brief, not the renderer. |
## Intervention recipes
### Rejecting bad output
When a renderer ships a clip that doesn't pass review:
```bash
# 1. Comment on the renderer's task with specific feedback
hermes kanban comment <renderer-task-id> "Scene 3 looks too sparse \
— increase visual density. Tighten color palette to brand spec."
# 2. Create a re-render task with the original as parent
hermes kanban create "Scene 3 — re-render with feedback" \
--assignee renderer-ascii \
--parent <renderer-task-id> \
--workspace dir:"$HOME/projects/video-pipeline/<slug>" \
--tenant <slug> \
--skill ascii-video \
--max-runtime 30m
```
### Adding a new dependency mid-flight
When the editor needs an asset that wasn't originally planned (e.g., a captions
file):
```bash
# 1. Create the new task and capture its id
NEW_TASK_ID=$(hermes kanban create "Generate SRT captions from voiceover" \
--assignee captioner \
--workspace dir:"$HOME/projects/video-pipeline/<slug>" \
--tenant <slug> \
--json | python3 -c "import json,sys;print(json.load(sys.stdin)['id'])")
# 2. Wire it as a parent of the editor's task with `kanban link`
hermes kanban link "$NEW_TASK_ID" <editor-task-id>
```
`kanban link` takes `parent_id child_id` (parent first). Use `kanban unlink`
to remove a dependency.
### Stopping a worker that's stuck
The kanban dispatcher will SIGTERM (then SIGKILL) any task that exceeds its
`--max-runtime` automatically. To stop one sooner:
```bash
# Mark blocked so the dispatcher leaves it alone, then archive
hermes kanban block <task-id>
hermes kanban archive <task-id>
# Diagnose what happened
hermes kanban show <task-id> # task body, comments, recent events
hermes kanban tail <task-id> # follow the live event stream
hermes kanban log <task-id> # worker process log
```
After stopping, decide: fix root cause + re-create the task, or skip and
adjust dependent tasks.
### Pivoting the brief
If during execution the user wants something fundamentally different:
1. Cancel the active director task and all RUNNING children
2. Edit `brief.md` and `TEAM.md`
3. Re-fire the initial `hermes kanban create` for the director
Don't try to "edit while running" — the kanban's audit trail makes a clean
pivot more legible than mid-stream changes.
## Periodic check-in script
A simple polling pattern for hands-off monitoring:
```bash
while true; do
clear
hermes kanban list --tenant <slug>
echo "---"
hermes kanban stats --tenant <slug>
sleep 30
done
```
For a live event feed, run `hermes kanban watch --tenant <slug>` in a
separate terminal — it streams task lifecycle events as they happen.
For automated intervention (auto-restart stuck tasks, auto-create re-render on
review failure), see the `scripts/monitor.py` patterns.
## When to call it done
The pipeline is finished when:
1. All RENDER tasks complete and pass review
2. The editor's `output/final.mp4` exists and `ffprobe` confirms expected
duration + streams
3. The reviewer (if present) has approved
4. Optional masterer variants exist
At this point, present the final.mp4 path to the user along with any review
notes. Do NOT delete the workspace — the user may want to iterate on a single
scene without re-running the whole pipeline.
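The duration + streams check in step 2 can be scripted against `ffprobe -v error -print_format json -show_format -show_streams output/final.mp4`. A sketch of parsing that output — the shell-out is elided and a sample payload inlined, so treat the exact field paths as the usual ffprobe JSON shape:

```python
import json

def check_final(probe_json: str, expected_dur: float, tol: float = 0.5) -> list:
    """Return a list of problems found in ffprobe's JSON output."""
    info = json.loads(probe_json)
    problems = []
    dur = float(info["format"]["duration"])
    if abs(dur - expected_dur) > tol:
        problems.append(f"duration {dur}s != expected {expected_dur}s")
    codecs = {s["codec_type"] for s in info["streams"]}
    if "video" not in codecs:
        problems.append("no video stream")
    if "audio" not in codecs:
        problems.append("no audio stream — did the editor mux the mix?")
    return problems

sample = ('{"format": {"duration": "42.03"}, '
          '"streams": [{"codec_type": "video"}, {"codec_type": "audio"}]}')
print(check_final(sample, expected_dur=42.0))  # []
```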
## Common gotchas
- **Tenant mismatches.** A task created with the wrong tenant won't appear in
monitoring. Always pass `--tenant <slug>` consistently.
- **Profile process not running.** Tasks queue indefinitely in READY if no
worker for that profile is online. Check `hermes profile list` and start
any missing profiles.
- **Workspace permissions.** All profiles need read+write to the workspace
directory. `chmod -R u+rw <workspace>` if any worker reports permission
errors.
- **Audio/visual sync.** The editor's clip stitching must match the
renderer's actual output durations. Don't hardcode scene durations in
the editor — read from the renderer's handoff metadata.
# Role Archetypes
The library of role archetypes for video production. **Compose a team from this
list, don't clone a fixed roster.** Most videos need 4-7 profiles. The director
is always present; everything else is conditional on the brief.
Each role's profile name is by convention `kebab-case` (e.g. `creative-director`,
`image-generator`). Multiple instances of the same role get descriptive suffixes
when they need different focus (e.g., `renderer-ascii`, `renderer-3d`).
For toolset + skill mapping per role, see [tool-matrix.md](tool-matrix.md).
## Always present
### director
The vision-holder. Reads the brief and brand guide, decomposes into a task
graph, comments to steer creative direction, approves the final cut.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-orchestrator`. The kanban plugin auto-injects baseline
orchestration guidance for free; `kanban-orchestrator` is the deeper
decomposition playbook. Add `creative-ideation` if the brief is wide-open
and needs framing help.
- **Personality:** Tied to the brand voice — see `assets/soul.md.tmpl`
The director has the same toolset as everyone else, but its `SOUL.md` rules
**forbid** execution. The "decompose, don't execute" discipline is enforced
by personality + the kanban-orchestrator skill, not by missing tools.
## Pre-production roles
Pick based on what the brief needs.
### writer / screenwriter
Writes scripts, dialogue, voiceover copy, narration. Use for any video with
spoken or written words beyond a tagline.
- **Toolsets:** kanban, file
- **Skills:** `kanban-worker`, `humanizer` (post-process to strip AI-tells)
- **Outputs:** `script.md`, `narration.md`, `dialogue/scene-NN.md`
### copywriter
Like `writer` but specifically for marketing copy: taglines, CTAs, voiceover
scripts for product videos.
- **Toolsets:** kanban, file
- **Skills:** `kanban-worker`, `humanizer`
- **Outputs:** `copy.md`
### concept-artist / visual-designer
Develops the visual identity: mood board, style frames, color palette
rationale, typography choices. Produces a `visual-spec.md` that all generators
follow. Often produces still reference frames using image-generation APIs or
local skills.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker` plus any project-specific design skill —
`claude-design` (UI/web), `sketch` (quick mockup variants),
`popular-web-designs` (matching known web aesthetic), `pixel-art` (retro),
`ascii-art` (terminal/retro), `excalidraw` (hand-drawn frames),
`design-md` (text-based design docs)
- **Outputs:** `visual-spec.md`, `taste/style-frames/*.png`
### storyboarder
Maps the brief to a beat-by-beat shot list with timing. Critical for narrative
film and music video. Often pairs with a diagramming tool.
- **Toolsets:** kanban, file
- **Skills:** `kanban-worker` plus a diagram skill — `excalidraw` (sketch),
`architecture-diagram` (technical/system), `concept-diagrams` (educational/
scientific)
- **Outputs:** `storyboard.md` with one row per scene/shot, optional
storyboard sketches
### cinematographer / dp
Designs the visual language: framing, color, motion, transitions. Reviews
generator output for visual consistency. Hands off per-scene `VISUAL_SPEC.md`.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker` plus the visual skill that matches the project
(e.g., `ascii-video` for ASCII work, `manim-video` for explainers,
`touchdesigner-mcp` for real-time visuals, etc.)
- **Outputs:** `scenes/scene-NN/VISUAL_SPEC.md`, review comments on renderer
tasks
- **Reviews via:** any media-analysis approach (Gemini multimodal, manual
inspection of clip thumbnails, ffprobe summaries)
## Production roles
### renderer (generic)
A worker that produces visual content for one or more scenes. Loaded with
whichever creative skill fits the scene's style. Multiple renderers can run in
parallel, each pinned to a different skill via `always_load` in their profile
or `--skill` on the task.
- **Toolsets:** kanban, terminal, file
- **Skills:** one creative skill (see specialized variants below)
- **Outputs:** `scenes/scene-NN/clip.mp4`
### Specialized renderer variants
When scenes need very different tools, create specialized renderer profiles
instead of overloading one. Each loads a different creative skill.
| Variant | Skill | Best for |
|---------|-------|----------|
| `renderer-ascii` | `ascii-video` | Terminal aesthetic, retro pixel, audio-reactive grid, video-to-ASCII conversion |
| `renderer-manim` | `manim-video` | Math, algorithms, 3Blue1Brown-style explainers, equation derivations |
| `renderer-p5js` | `p5js` | Generative art, particles, shaders, organic motion, web-canvas content |
| `renderer-comfyui` | `comfyui` | AI-generated stills + video using local ComfyUI workflows (img-to-img, img-to-video, etc.) |
| `renderer-touchdesigner` | `touchdesigner-mcp` | Real-time, audio-reactive, installation art, VJ-style content |
| `renderer-3d` | `blender-mcp` *(optional)* | 3D modeling, animation, photoreal environments, character animation |
| `renderer-pixel` | `pixel-art` | Retro game aesthetic with era-correct palettes |
| `renderer-comic` | `baoyu-comic` | Knowledge-comic style narrative scenes |
| `renderer-meme` | `meme-generation` *(optional)* | Meme-style stills for satirical/social content |
| `renderer-procedural` | (none — Python with PIL + ffmpeg directly) | Custom procedural content where no skill fits |
| `renderer-video` | (external image-to-video API: Runway / Kling / Luma) | Animating still images in narrative film |
| `renderer-motion-graphics` | (external — Remotion CLI) | Motion graphics, kinetic typography, UI animations |
For external-API renderers, the profile holds the API client logic; only
`kanban-worker` is loaded, plus the terminal toolset and the API key.
### image-generator
Specifically for text-to-image generation. Often produces stills that go to
`renderer-video` for animation.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`, optionally `comfyui` (drives a local
ComfyUI install for image generation)
- **External APIs (alternative to local ComfyUI):** FAL, Replicate, OpenAI
Images, Midjourney
- **Outputs:** `scenes/scene-NN/stills/*.png`
### image-to-video-generator
Takes still images and animates them via Runway/Kling/Luma APIs, or via
ComfyUI's image-to-video workflows locally. Almost always follows
`image-generator` in narrative film pipelines.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`, optionally `comfyui` (for local image-to-video
workflows like AnimateDiff or WAN)
- **External APIs:** Runway, Kling, Luma, Pika
- **Outputs:** `scenes/scene-NN/clip.mp4`
### music-supervisor
Sources, analyzes, and prepares the music track. For music videos, also
produces a beat/BPM map and key-moment timestamps. Uses `songsee` for
spectrograms when the editor or renderer needs a visual reference of the
audio's energy.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`, `songsee` (audio visualization), plus one of:
- `songwriting-and-ai-music` — when commissioning lyrics + Suno prompts
- `heartmula` — when generating music with the open-source local model
- `spotify` — when sourcing existing tracks
- **Outputs:** `audio/track.mp3`, `audio/beats.json`, optional
`audio/track-spectrogram.png`
### voice-talent / narrator
Generates voiceover audio. Calls a TTS API directly; no Hermes skill required
beyond `kanban-worker`. The user can also supply pre-recorded VO instead of
generation.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **External APIs:** ElevenLabs, OpenAI TTS, etc.
- **Outputs:** `audio/voiceover/line-NN.mp3`, `audio/voiceover/timeline.mp3`
### foley / sfx-designer
Sound effects and ambient design. Often optional unless the brief calls for
sound design specifically.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`, `songsee` for audio-feature visualization when
designing to a track
- **Outputs:** `audio/sfx/*.mp3`
## Post-production roles
### editor
Assembles the final cut from clips. Uses ffmpeg for stitching, fades,
transitions. Reviews each clip for pacing and quality before assembly.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **External tools:** ffmpeg, ffprobe
- **Outputs:** `output/final.mp4`, `output/final-noaudio.mp4`
### colorist
Color grading. Usually optional — if the renderers already produce
brand-consistent output and the editor just stitches, the colorist is overkill.
Worth including for narrative film with hero shots.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **Outputs:** `output/final-graded.mp4`
### audio-mixer
Mixes voiceover + music + SFX into a final audio track. Sets levels, ducks
music under VO, normalizes loudness (LUFS).
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **External tools:** ffmpeg with `loudnorm` filter, optional `sox`
- **Outputs:** `audio/final-mix.mp3`
### captioner
Burns subtitles into the video, generates SRT, handles accessibility. Can also
generate captions from audio via Whisper.
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **External tools:** Whisper (CLI or API), ffmpeg subtitle filters
- **Outputs:** `output/captions.srt`, `output/final-captioned.mp4`
### masterer
Final encode + format variants. Produces deliverables for each platform target
(square for IG, vertical for TikTok, full HD for YouTube, etc.).
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **Outputs:** `output/final-1080.mp4`, `output/final-9x16.mp4`, etc.
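Aspect-ratio variants usually come down to a center crop; a sketch of the geometry (dimensions kept even for yuv420p encoders):

```python
def crop_filter(src_w: int, src_h: int, aspect_w: int, aspect_h: int) -> str:
    """Build an ffmpeg crop filter converting the source to the target
    aspect ratio via a center crop (crop defaults to centered)."""
    if src_w * aspect_h > src_h * aspect_w:  # source too wide: trim width
        out_w = src_h * aspect_w // aspect_h
        out_w -= out_w % 2                   # keep even for yuv420p
        return f"crop={out_w}:{src_h}"
    out_h = src_w * aspect_h // aspect_w     # source too tall: trim height
    out_h -= out_h % 2
    return f"crop={src_w}:{out_h}"

print(crop_filter(1920, 1080, 9, 16))  # → crop=606:1080
```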
## QA roles
### reviewer
A neutral quality gate. Reads the brief, watches the cut, comments
specifically on what's off (pacing, sync, brand alignment, technical
quality). Distinct from the cinematographer (who reviews visuals during
production) and the editor (who reviews for assembly).
- **Toolsets:** kanban, terminal, file
- **Skills:** `kanban-worker`
- **External tools:** any media-analysis approach (Gemini multimodal,
ffprobe, manual frame extraction)
- **Outputs:** `review-notes.md`, comments on tasks
### brand-cop
Reviews specifically for brand compliance — colors, typography, voice. Use
when the brand guidelines are detailed and a generic reviewer might miss
violations.
- **Toolsets:** kanban, file
- **Skills:** `kanban-worker`
- **Outputs:** comments + `brand-review.md`
## Composing teams — heuristics
- **Always:** director + at least one renderer + editor.
- **Add writer** if scripted dialogue / narration / on-screen text exceeds a
tagline.
- **Add storyboarder** if the brief has more than 5 distinct beats and the
director hasn't already laid out a beat list.
- **Add cinematographer** if multiple renderer instances need consistent
visual language. (For a single-tool video, the renderer's own skill spec
is enough.)
- **Add image-generator + image-to-video-generator pair** for narrative film
with photorealistic visuals.
- **Add music-supervisor** when music is provided and rhythm matters
(music videos always; explainers sometimes).
- **Add voice-talent** for any voiceover / narrative dialogue.
- **Add audio-mixer** when there are 2+ audio sources (VO + music, music + SFX).
- **Add captioner** for accessibility-priority projects (explainer, tutorial,
any platform that defaults to muted playback).
- **Add reviewer** for high-stakes projects. Skip for quick experimental loops.
- **Add masterer** when multiple platform deliverables are needed.
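The heuristics above can be sketched as a team selector — the brief fields used here (`beats`, `has_vo`, and so on) are illustrative, not a fixed schema:

```python
def compose_team(brief: dict) -> list[str]:
    """Apply the composition heuristics to a simplified brief dict."""
    team = ["director", "renderer", "editor"]     # always-on core
    if brief.get("script_length", 0) > 1:         # more than a tagline
        team.append("writer")
    if brief.get("beats", 0) > 5:
        team.append("storyboarder")
    if brief.get("renderer_instances", 1) > 1:
        team.append("cinematographer")
    if brief.get("has_music") and brief.get("rhythm_matters"):
        team.append("music-supervisor")
    if brief.get("has_vo"):
        team.append("voice-talent")
    if brief.get("audio_sources", 0) >= 2:
        team.append("audio-mixer")
    if brief.get("accessibility_priority"):
        team.append("captioner")
    if brief.get("high_stakes"):
        team.append("reviewer")
    if len(brief.get("deliverables", [])) > 1:
        team.append("masterer")
    return team

print(compose_team({"beats": 8, "has_vo": True, "audio_sources": 2}))
```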
## Anti-patterns
- **One renderer doing everything.** If scenes use very different tools
(ASCII + 3D + motion graphics), use specialized renderer variants. The
renderer loads ONE creative skill at a time; mixing styles in a single
renderer causes thrashing.
- **A separate profile per scene.** No. Profiles are per-role, not per-scene.
Eight scenes use one or two renderer profiles, not eight.
- **A "general" profile that does everything.** Worse than no specialization.
The kanban routing breaks down if every task fits every profile.
- **No reviewer for important deliverables.** Saves an hour of pipeline time
but ships flaws.


@ -0,0 +1,305 @@
# Tool Matrix — Skills + Toolsets per Role
Maps each role archetype to the Hermes skills it should `always_load` and the
toolsets it needs. Only references skills that ship in the public hermes-agent
repository (under `skills/` or `optional-skills/`). External APIs and CLIs are
called from the terminal toolset; they don't appear in `always_load`.
## Hermes skills relevant to video production
### Visual / rendering skills (`hermes-agent/skills/creative/`)
| Skill | What it does | Best fit for |
|-------|--------------|--------------|
| `ascii-video` | Production pipeline for ASCII art video — generative, audio-reactive, video-to-ASCII | Renderer for ASCII / terminal / retro pixel content; cinematographer for ASCII projects |
| `ascii-art` | Static ASCII art generation | Concept artist for ASCII style frames; secondary tool for ASCII renderer |
| `manim-video` | Manim CE animations — math, algorithms, 3Blue1Brown-style explainers | Renderer for math, algorithm walkthroughs, technical concept explainers |
| `p5js` | p5.js sketches — generative art, shaders, interactive, 3D | Renderer for generative art, particle systems, organic motion, web-canvas content |
| `comfyui` | Generate images, video, audio with ComfyUI workflows (image-to-image, image-to-video, etc.) | image-generator, image-to-video-generator, or general renderer for AI-generated content |
| `touchdesigner-mcp` | Control a running TouchDesigner instance — real-time visuals, audio-reactive installation art, VJ | Renderer for real-time/audio-reactive content; installation art; live performance |
| `blender-mcp` *(optional)* | Control Blender 4.3+ via MCP — 3D modeling, animation, rendering | Renderer for 3D scenes, photoreal environments, character animation |
| `pixel-art` | Pixel art with era palettes (NES, Game Boy, PICO-8) | Renderer for retro game aesthetic; concept artist for pixel-style frames |
| `baoyu-comic` | Knowledge-comic generation (educational, biography, tutorial) | Renderer for comic-style narrative; explainer in panel form |
| `baoyu-infographic` | Infographic generation | Renderer for data-driven explainer scenes |
| `meme-generation` *(optional)* | Generate meme images by overlaying text on templates | Generator for satirical/social content; meme-style stills |
### Design / pre-production skills (`hermes-agent/skills/creative/`)
| Skill | What it does | Best fit for |
|-------|--------------|--------------|
| `claude-design` | Design one-off HTML artifacts (landing, deck, prototype) | Concept artist for product video style frames; storyboarder for UI-heavy content |
| `design-md` | Design markdown docs | Concept artist documenting visual specs |
| `popular-web-designs` | Reference patterns for popular web designs | Concept artist; cinematographer when matching a known UI aesthetic |
| `sketch` | Throwaway HTML mockups (2-3 design variants to compare) | Concept artist exploring directions; storyboarder for UI flows |
| `excalidraw` | Excalidraw-style hand-drawn diagrams | Storyboarder; concept artist for sketch-style frames |
| `architecture-diagram` | Software architecture diagrams | Storyboarder for technical content; explainer scenes about systems |
| `concept-diagrams` *(optional)* | Flat, minimal SVG diagrams (educational visual language; physics, chemistry, math, anatomy, etc.) | Renderer / storyboarder for explainer scenes with clean educational diagrams |
| `pretext` | Mathematical/scientific content authoring | Writer / cinematographer for technical-explainer pretexts |
| `creative-ideation` | Constraint-driven project ideation | Director / cinematographer when the brief is wide-open and needs framing |
| `humanizer` | Strip AI-isms from text, add real voice | Writer / copywriter post-process to avoid AI-tells in scripts and VO copy |
### Audio / media skills (`hermes-agent/skills/creative/` + `skills/media/`)
| Skill | What it does | Best fit for |
|-------|--------------|--------------|
| `songwriting-and-ai-music` | Songwriting craft + Suno prompt patterns | Music supervisor when commissioning a track via Suno |
| `heartmula` | Open-source music generation (Apache-2.0, Suno-like) | Music supervisor generating bespoke tracks without external APIs |
| `songsee` | Spectrograms, mel/chroma/MFCC of audio files | Music supervisor analyzing tracks; foley-designer designing to a beat; editor visualizing a mix |
| `spotify` | Spotify control — play, search, queue, manage playlists | Music supervisor sourcing existing tracks; reference research |
| `youtube-content` | Fetch transcripts + transform to chapters/summaries/posts | Documentary cut, content adaptation, research for explainers |
| `gif-search` | Find existing GIFs | Editor / concept artist sourcing references |
| `gifs` | GIF tooling | Masterer producing GIF deliverables |
### Kanban infrastructure (`hermes-agent/skills/devops/`)
| Skill | What it does | When to load |
|-------|--------------|--------------|
| `kanban-orchestrator` | Decomposition playbook + anti-temptation rules for orchestrator profiles | Director only |
| `kanban-worker` | Pitfalls, examples, edge cases for kanban workers (deeper than auto-injected guidance) | Any profile — load when handling tricky multi-step workflows |
The kanban plugin auto-injects baseline orchestration guidance into every
worker's system prompt — the `kanban_create` fan-out pattern, claim/handoff
lifecycle, and the "decompose, don't execute" rule for orchestrators.
`kanban-orchestrator` and `kanban-worker` are deeper playbooks loaded when a
profile needs them.
## External tools (called from terminal toolset)
These are **not** Hermes skills but external CLIs / APIs that profiles invoke.
They don't appear in `always_load`; instead the role's terminal commands hit
them directly.
| Tool | What it does | Profile that uses it |
|------|--------------|----------------------|
| `ffmpeg` | Video / audio encode, splice, mux | renderer, editor, audio-mixer, masterer |
| `ffprobe` | Inspect media | All media-touching profiles |
| Whisper (CLI or API) | Speech-to-text for captions | captioner |
| Text-to-image API (FAL / Replicate / OpenAI / Midjourney) | Stills generation | image-generator (alternative to local `comfyui`) |
| Image-to-video API (Runway / Kling / Luma / Pika) | Animate stills | image-to-video-generator |
| Text-to-speech API (ElevenLabs / OpenAI TTS / etc.) | Voiceover generation | voice-talent |
| Suno API or web | Track composition (paired with `songwriting-and-ai-music`) | music-supervisor |
| Remotion CLI (`npx remotion render`) | React-based motion graphics | renderer-motion-graphics |
| Manim CE (`manim`) | Math animation render (driven by `manim-video` skill's recipes) | renderer-manim |
| Blender (`blender -b`) | 3D rendering (alternative to `blender-mcp`) | renderer-3d |
| Gemini multimodal / Claude vision | AI review of clips | reviewer, cinematographer, editor |
## Standard toolset configurations per role
### director
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-orchestrator
```
The director gets terminal access by convention, but its SOUL.md rules forbid
executing the work itself; audit logs catch violations.
### writer / copywriter
```yaml
toolsets:
- kanban
- file
skills:
always_load:
- kanban-worker
- humanizer # post-process scripts to strip AI-tells
```
No terminal — writers don't need it.
### concept-artist
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
# plus one or more (style-dependent):
# - claude-design (UI / web product video)
# - sketch (quick mockup variants)
# - excalidraw (hand-drawn frames)
# - ascii-art (ASCII style frames)
# - pixel-art (retro/game aesthetic)
# - popular-web-designs (matching known web aesthetic)
# - design-md (text-based design docs)
```
### storyboarder
```yaml
toolsets:
- kanban
- file
skills:
always_load:
- kanban-worker
# one of:
# - excalidraw (sketch storyboards)
# - architecture-diagram (technical/system content)
# - concept-diagrams (educational / scientific content)
```
### cinematographer
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
# the visual skill that matches the project, e.g.:
# - ascii-video (ASCII projects)
# - manim-video (math/explainer)
# - p5js (generative)
# - comfyui (AI-generated visuals)
# - blender-mcp (3D)
# - touchdesigner-mcp (real-time/installation)
```
### renderer (specialized variants)
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
# ONE skill per renderer variant (or empty for external-API renderers):
# - ascii-video (renderer-ascii)
# - manim-video (renderer-manim)
# - p5js (renderer-p5js)
# - comfyui (renderer-comfyui — img/video AI gen)
# - touchdesigner-mcp (renderer-touchdesigner)
# - blender-mcp (renderer-3d)
# - pixel-art (renderer-pixel)
# - baoyu-comic (renderer-comic)
# - meme-generation (renderer-meme)
```
For external-API renderers (image-to-video-generator using Runway, voice-talent
using ElevenLabs, renderer-motion-graphics using Remotion), `always_load` only
contains `kanban-worker` — the role's work is API-driven and the API key +
terminal commands suffice.
For multi-skill renderer setups (rare — usually one variant per skill is
cleaner), use `--skill <name>` on individual `kanban_create` calls to override
which skill loads for that specific task.
### image-generator / image-to-video-generator / voice-talent
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
# for image-generator that drives ComfyUI locally:
# - comfyui
env_required:
# populate based on the chosen API:
- FAL_KEY # or REPLICATE_API_TOKEN, OPENAI_API_KEY for image-gen
- RUNWAY_API_KEY # or KLING_API_KEY, LUMA_API_KEY for image-to-video
- ELEVENLABS_API_KEY # or OPENAI_API_KEY for TTS
```
If the user's setup has ComfyUI installed locally, the `comfyui` skill can
replace the external image-gen API entirely (cheaper, more control, supports
custom workflows for image-to-video too).
### music-supervisor
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
- songsee # spectrograms / audio analysis
# plus (depending on what the project needs):
# - songwriting-and-ai-music (commissioning Suno tracks)
# - heartmula (commissioning open-source local generation)
# - spotify (sourcing existing tracks)
```
### editor / audio-mixer / captioner / masterer
```yaml
toolsets:
- kanban
- terminal
- file
skills:
always_load:
- kanban-worker
```
These are mostly ffmpeg-driven; no special skill needed beyond `kanban-worker`.
For captioner add Whisper invocation patterns to the SOUL.md.
### reviewer / brand-cop
```yaml
toolsets:
- kanban
- terminal # for media inspection
- file
skills:
always_load:
- kanban-worker
env_required:
- OPENROUTER_API_KEY # if using Gemini multimodal review
# or ANTHROPIC_API_KEY if using Claude vision (already required globally)
```
## API key requirements
Track these in the project setup. The setup script should verify each required
key is present in `~/.hermes/.env` (or macOS Keychain) before firing the kanban.
| Service | Env var | Used by |
|---------|---------|---------|
| ElevenLabs | `ELEVENLABS_API_KEY` | voice-talent |
| OpenAI | `OPENAI_API_KEY` | image-generator (DALL-E), voice-talent (TTS) |
| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (Gemini multimodal review) |
| FAL | `FAL_KEY` | image-generator (FAL flux models) |
| Replicate | `REPLICATE_API_TOKEN` | image-generator (alternate provider) |
| Runway | `RUNWAY_API_KEY` | image-to-video-generator |
| Kling | `KLING_API_KEY` | image-to-video-generator (alternate) |
| Luma | `LUMA_API_KEY` | image-to-video-generator (alternate) |
| Suno | `SUNO_API_KEY` | music-supervisor (paired with `songwriting-and-ai-music`) |
| Spotify | `SPOTIFY_CLIENT_ID` + `SPOTIFY_CLIENT_SECRET` | music-supervisor (paired with `spotify` skill) |
| Anthropic | `ANTHROPIC_API_KEY` | every Hermes profile (Claude) |
If a key is missing, prompt the user to add it. Storage methods, in order of
preference: macOS Keychain → `~/.hermes/.env` → environment variable.
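A sketch of that pre-flight check (Keychain lookup omitted; the dotenv path is the default documented above):

```python
import os
from pathlib import Path

def missing_keys(required: list[str], env_file: str = "~/.hermes/.env") -> list[str]:
    """Return the required keys found neither in the process environment
    nor in the dotenv file."""
    present = {k for k in required if os.environ.get(k)}
    dotenv = Path(env_file).expanduser()
    if dotenv.exists():
        for line in dotenv.read_text().splitlines():
            line = line.strip()
            if "=" in line and not line.startswith("#"):
                present.add(line.split("=", 1)[0].strip())
    return [k for k in required if k not in present]

print(missing_keys(["ELEVENLABS_API_KEY"]))
```

The setup script runs this before firing the kanban and prompts the user for each key it reports.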
## Skill version pinning
If a specific skill version is desired, pass it via the per-task
`--skill <name>=<version>` flag. The default is whatever's installed.
## Adding a new skill to the matrix
When a new Hermes-public video skill ships:
1. Add a row to the relevant table at the top of this file
2. If it warrants a specialized renderer variant, add to `role-archetypes.md`
3. Update relevant per-style examples in `examples.md`


@ -0,0 +1,501 @@
#!/usr/bin/env python3
"""
Bootstrap a video production kanban from a structured plan JSON.
Reads a plan.json describing the team + brief, expands templates from
../assets/, and writes a setup.sh that creates Hermes profiles and fires the
initial kanban task.
Profile-config patching, SOUL.md-per-profile, TEAM.md task-graph convention,
and the `hermes kanban create --workspace dir:` initial-task pattern are
adapted from alt-glitch's NousResearch/kanban-video-pipeline.
Usage:
bootstrap_pipeline.py plan.json [--out setup.sh]
The plan.json schema is documented inline below; see the `validate_plan`
function. A minimal example:
{
"title": "Q3 Product Teaser",
"slug": "q3-product-teaser",
"tenant": "q3-product-teaser",
"duration_s": 30,
"aspect": "1:1",
"resolution": "1080x1080",
"fps": 30,
"team": [
{
"profile": "director",
"role": "director",
"toolsets": ["kanban", "terminal", "file"],
"skills": [],
"responsibilities": "...",
"inputs": "brief.md, TEAM.md, taste/",
"outputs": "kanban tasks for the team"
},
...
],
"scenes": [
{"n": 1, "time": "0:00-0:08", "content": "...", "tool": "renderer-ascii"},
...
],
"audio": {"approach": "voiceover + music bed", "vo": "ElevenLabs Lily",
"music": "license-free", "sfx": "n/a"},
"deliverables": [
{"format": "mp4", "resolution": "1080x1080", "notes": "primary"}
],
"api_keys_required": ["ELEVENLABS_API_KEY", "OPENROUTER_API_KEY"],
"brief_extra": {
"concept_one_liner": "...",
"emotional_north_star": "...",
"visual_refs": "...",
"tone": "...",
"brand_constraints": "..."
}
}
"""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
from pathlib import Path
ASSETS_DIR = Path(__file__).resolve().parent.parent / "assets"
def load_template(name: str) -> str:
return (ASSETS_DIR / name).read_text()
PROFILE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9-]+$")
def validate_plan(plan: dict) -> list[str]:
"""Return a list of validation error strings; empty list = valid."""
errors = []
required_top = ["title", "slug", "tenant", "duration_s", "aspect",
"resolution", "fps", "team", "scenes", "audio",
"deliverables"]
for k in required_top:
if k not in plan:
errors.append(f"missing required key: {k}")
if "team" in plan:
if not isinstance(plan["team"], list) or not plan["team"]:
errors.append("team must be a non-empty list")
else:
roles = [t.get("role") for t in plan["team"]]
if "director" not in roles:
errors.append("team must include a director role")
seen_profiles = set()
for i, t in enumerate(plan["team"]):
for k in ["profile", "role", "toolsets", "skills",
"responsibilities"]:
if k not in t:
errors.append(f"team[{i}] missing {k}")
# Profile name must match Hermes's regex (lowercase
# alphanumeric + hyphens + underscores, up to 64 chars).
if "profile" in t:
if not PROFILE_NAME_RE.match(t["profile"]):
errors.append(
f"team[{i}].profile {t['profile']!r} must match "
f"[a-z0-9][a-z0-9_-]{{0,63}} per Hermes profile rules"
)
if t["profile"] in seen_profiles:
errors.append(
f"team[{i}].profile {t['profile']!r} is duplicated"
)
seen_profiles.add(t["profile"])
# Toolsets / skills must be lists, not strings.
if "toolsets" in t and not isinstance(t["toolsets"], list):
errors.append(
f"team[{i}].toolsets must be a list of strings"
)
if "skills" in t and not isinstance(t["skills"], list):
errors.append(
f"team[{i}].skills must be a list of strings"
)
if "slug" in plan:
if not SLUG_RE.match(plan["slug"]):
errors.append("slug must be lowercase, hyphenated, "
"starting with [a-z0-9]")
return errors
def render_brief(plan: dict) -> str:
"""Render brief.md from the plan."""
tmpl = load_template("brief.md.tmpl")
extra = plan.get("brief_extra", {})
# Scene table rows
scene_rows = []
for s in plan["scenes"]:
scene_rows.append(
f"| {s.get('n', '?')} | {s.get('time', '?')} | "
f"{s.get('content', '')} | {s.get('tool', '')} | "
f"{s.get('audio', '')} | {s.get('notes', '')} |"
)
scene_table = "\n".join(scene_rows) if scene_rows else "_(none yet)_"
    # Deliverable rows. Note: deliv_table is not substituted into the
    # template below; deliverables reach it via the PRIMARY_*/ALT_* keys.
deliv_rows = []
for d in plan["deliverables"]:
deliv_rows.append(
f"| {d.get('format', '?')} | {d.get('resolution', '?')} | "
f"{d.get('notes', '')} |"
)
deliv_table = "\n".join(deliv_rows) if deliv_rows else "_(none)_"
# Replacements (single-pass)
replacements = {
"TITLE": plan["title"],
"SLUG": plan["slug"],
"TENANT": plan["tenant"],
"WORKSPACE": f"~/projects/video-pipeline/{plan['slug']}",
"ONE_LINE_PITCH": extra.get("concept_one_liner", "_(TBD)_"),
"EMOTIONAL_NORTH_STAR": extra.get("emotional_north_star", "_(TBD)_"),
"DURATION_S": str(plan["duration_s"]),
"ASPECT": plan["aspect"],
"RESOLUTION": plan["resolution"],
"FPS": str(plan["fps"]),
"PLATFORMS": extra.get("platforms", "_(TBD)_"),
"DEADLINE": extra.get("deadline", "_(none)_"),
"QUALITY_BAR": extra.get("quality_bar", "polished"),
"VISUAL_REFS": extra.get("visual_refs", "_(none)_"),
"TONE": extra.get("tone", "_(TBD)_"),
"BRAND_CONSTRAINTS": extra.get("brand_constraints", "_(none)_"),
"AESTHETIC_RULES": extra.get("aesthetic_rules", "_(TBD)_"),
"AUDIO_APPROACH": plan["audio"].get("approach", "_(TBD)_"),
"VO_DETAILS": plan["audio"].get("vo", "_(n/a)_"),
"MUSIC_DETAILS": plan["audio"].get("music", "_(n/a)_"),
"SFX_DETAILS": plan["audio"].get("sfx", "_(n/a)_"),
"PRIMARY_FORMAT": plan["deliverables"][0]["format"],
"PRIMARY_RES": plan["deliverables"][0]["resolution"],
"ALT_FORMAT_1": (plan["deliverables"][1]["format"]
if len(plan["deliverables"]) > 1 else "_(none)_"),
"ALT_RES_1": (plan["deliverables"][1]["resolution"]
if len(plan["deliverables"]) > 1 else ""),
"ALT_NOTES_1": (plan["deliverables"][1].get("notes", "")
if len(plan["deliverables"]) > 1 else ""),
"API_KEYS_REQUIRED": ", ".join(plan.get("api_keys_required", [])) or "none",
"EXT_DEPS": extra.get("ext_deps", "ffmpeg, Python 3.11+"),
"SOURCE_ASSETS": extra.get("source_assets", "_(none)_"),
}
out = tmpl
for k, v in replacements.items():
out = out.replace("{{" + k + "}}", str(v))
    # Scene table: replace the template's placeholder rows. The sample row
    # starts `| 1 | 0:00-0:0X`; `.` below matches whichever dash character the
    # template uses between the times.
    out = re.sub(
        r"\|\s*1\s*\|\s*0:00.0:0X.+?\n\|\s*2\s*\|.+?\n",
        scene_table + "\n",
        out, flags=re.DOTALL,
    )
    return out
def render_team_md(plan: dict) -> str:
"""Render TEAM.md from the team list + scene → tool mapping."""
lines = [f"# Team & Task Graph — {plan['title']}", "", "## Team", ""]
for t in plan["team"]:
skills = (
f"loads `{', '.join(t['skills'])}`"
if t["skills"] else "no skills required"
)
lines.append(
f"- `{t['profile']}` — {t['responsibilities']} ({skills})"
)
lines.extend(["", "## Task Graph", "", "```"])
# Build a simple task graph based on conventions
profiles_by_role = {t["role"]: t["profile"] for t in plan["team"]}
director = profiles_by_role.get("director", "director")
lines.append(f"T0 {director} — decompose")
next_id = 1
parents_for_renderer: list[str] = ["T0"]
if "cinematographer" in profiles_by_role:
cid = f"T{next_id}"
lines.append(
f"{cid:5} {profiles_by_role['cinematographer']} — visual spec for all scenes (parent: T0)"
)
parents_for_renderer = [cid]
next_id += 1
if "music-supervisor" in profiles_by_role:
cid = f"T{next_id}"
lines.append(
f"{cid:5} {profiles_by_role['music-supervisor']} — track analysis + beats.json (parent: T0)"
)
next_id += 1
ms_id = cid
else:
ms_id = None
# Scenes
scene_ids = []
for s in plan["scenes"]:
cid = f"T{next_id}"
renderer_profile = s.get("tool") or "renderer"
# Lookup the actual profile name
for t in plan["team"]:
if t["role"] == renderer_profile or t["profile"] == renderer_profile:
renderer_profile = t["profile"]
break
parents = parents_for_renderer + ([ms_id] if ms_id else [])
parent_str = ", ".join(parents)
lines.append(
f"{cid:5} {renderer_profile} — scene {s.get('n', '?')}: "
f"{s.get('content', '')[:50]} (parents: {parent_str})"
)
scene_ids.append(cid)
next_id += 1
# VO + audio mix
if "voice-talent" in profiles_by_role:
vo_id = f"T{next_id}"
lines.append(f"{vo_id:5} {profiles_by_role['voice-talent']} — narration (parent: T0)")
next_id += 1
else:
vo_id = None
if "audio-mixer" in profiles_by_role:
am_id = f"T{next_id}"
am_parents = [p for p in [ms_id, vo_id] if p]
lines.append(
f"{am_id:5} {profiles_by_role['audio-mixer']} — mix audio (parents: {', '.join(am_parents)})"
)
next_id += 1
else:
am_id = None
# Editor
if "editor" in profiles_by_role:
ed_id = f"T{next_id}"
ed_parents = scene_ids + [p for p in [am_id, vo_id, ms_id] if p and p not in scene_ids]
lines.append(
f"{ed_id:5} {profiles_by_role['editor']} — assemble + mux (parents: {', '.join(ed_parents)})"
)
next_id += 1
else:
ed_id = None
# Captioner
if "captioner" in profiles_by_role and ed_id:
cap_id = f"T{next_id}"
lines.append(
f"{cap_id:5} {profiles_by_role['captioner']} — SRT + burn (parent: {ed_id})"
)
next_id += 1
last = cap_id
else:
last = ed_id
# Reviewer
if "reviewer" in profiles_by_role and last:
rv_id = f"T{next_id}"
lines.append(
f"{rv_id:5} {profiles_by_role['reviewer']} — final QA (parent: {last})"
)
lines.append("```")
lines.extend([
"",
"## Per-task workspace requirement",
"",
f"All `kanban_create` calls MUST pass:",
f"```",
f'workspace_kind="dir"',
f'workspace_path="$HOME/projects/video-pipeline/{plan["slug"]}"',
f'tenant="{plan["tenant"]}"',
f"```",
])
return "\n".join(lines)
def render_setup_sh(plan: dict, brief_md: str, team_md: str) -> str:
"""Render setup.sh from the plan."""
tmpl = load_template("setup.sh.tmpl")
# API key checks
key_checks = []
for key in plan.get("api_keys_required", []):
key_checks.append(f'check_key {key} hermes {key} || exit 1')
key_checks_str = "\n".join(key_checks) if key_checks else "# (no API keys required)"
# Scene dirs
    scene_dir_lines = []
    for i, s in enumerate(plan["scenes"], start=1):
        # "n" may be absent; fall back to list position so the {:02d}
        # format spec can't blow up on a placeholder string.
        n = int(s.get("n", i))
        scene_dir_lines.append(
            f'mkdir -p "$WORKSPACE/scenes/scene-{n:02d}/checkpoints"'
        )
scene_dirs = "\n".join(scene_dir_lines) if scene_dir_lines else ""
# Profile create
profile_creates = []
for t in plan["team"]:
profile_creates.append(
f'hermes profile create {t["profile"]} --clone 2>/dev/null || true'
)
# Profile config — emit JSON arrays so the bash function can pass them
# safely through to the Python YAML patcher.
profile_configs = []
for t in plan["team"]:
ts_json = json.dumps(t["toolsets"])
sk_json = json.dumps(t["skills"])
        # json.dumps output here contains only double quotes, brackets, and
        # commas (never single quotes), so Python's repr() yields safe
        # single-quoted bash strings.
profile_configs.append(
f"configure_profile {t['profile']!r} {ts_json!r} {sk_json!r}"
)
# SOUL writes — uses heredocs per profile
soul_writes = []
for t in plan["team"]:
soul_writes.append(
f'cat > "$HOME/.hermes/profiles/{t["profile"]}/SOUL.md" <<\'SOUL_EOF\'\n'
f"{render_soul_md(t, plan)}\n"
f"SOUL_EOF\n"
f'echo " ✓ SOUL.md for {t["profile"]}"'
)
# Taste writes (placeholder; real content optional)
taste_writes = (
'cat > "$WORKSPACE/taste/brand-guide.md" <<\'TASTE_EOF\'\n'
'# Brand Guide\n\n'
'_(Populate with project-specific colors, typography, motion rules)_\n'
'TASTE_EOF\n'
'cat > "$WORKSPACE/taste/emotional-dna.md" <<\'DNA_EOF\'\n'
'# Emotional DNA\n\n'
'_(What this piece should FEEL like — populate from the brief.)_\n'
'DNA_EOF'
)
# Asset copies — leave empty by default; user fills in
asset_copies = "# Add cp/rsync commands here for any provided assets"
out = tmpl
out = out.replace("{{TITLE}}", plan["title"])
out = out.replace("{{SLUG}}", plan["slug"])
out = out.replace("{{TENANT}}", plan["tenant"])
out = out.replace("{{WORKSPACE}}", f"~/projects/video-pipeline/{plan['slug']}")
out = out.replace("{{KEY_CHECKS}}", key_checks_str)
out = out.replace("{{SCENE_DIRS}}", scene_dirs)
out = out.replace("{{PROFILE_CREATE_COMMANDS}}", "\n".join(profile_creates))
out = out.replace("{{PROFILE_CONFIG_COMMANDS}}", "\n".join(profile_configs))
out = out.replace("{{SOUL_WRITES}}", "\n".join(soul_writes))
out = out.replace("{{BRIEF_CONTENTS}}", brief_md)
out = out.replace("{{TEAM_CONTENTS}}", team_md)
out = out.replace("{{TASTE_WRITES}}", taste_writes)
out = out.replace("{{ASSET_COPIES}}", asset_copies)
return out
def render_soul_md(team_member: dict, plan: dict) -> str:
"""Render a profile's SOUL.md from a team member dict + plan context."""
tmpl = load_template("soul.md.tmpl")
role = team_member["role"]
common_rules = (
"- **Read the brief and team graph** before doing anything else.\n"
"- **Pass `workspace_kind=\"dir\"` and `workspace_path` on every "
"`kanban_create` call.** This keeps the team in one shared workspace.\n"
f"- **Use tenant `{plan['tenant']}`** on every kanban call.\n"
"- **Write outputs to predictable paths.** Other profiles depend on "
"your filename conventions.\n"
"- **Emit heartbeats** during long-running work. Renderers should "
"report frame counts; editors should report assembly progress.\n"
)
if role == "director":
common_rules += (
"- **Do not execute the work yourself.** For every concrete task, "
"create a kanban task and assign it to the appropriate profile.\n"
"- **Decompose, route, comment, approve — that's the whole job.**\n"
"- **Read TEAM.md** for the canonical task graph. Do not invent "
"new roles unless the brief truly demands it.\n"
"- **Load the `kanban-orchestrator` skill** for the deeper "
"decomposition playbook beyond the auto-injected baseline.\n"
)
common_commands = (
"```bash\n"
"# Inspect a clip\n"
"ffprobe -v quiet -show_entries format=duration -show_entries "
"stream=codec_name,width,height,r_frame_rate <file.mp4>\n"
"\n"
"# Extract a frame for QA\n"
"ffmpeg -y -i <input.mp4> -vf \"select='eq(n,30)'\" -vsync vfr <out.png>\n"
"```"
)
out = tmpl
out = out.replace("{{ROLE_NAME}}", role)
out = out.replace("{{ROLE_RESPONSIBILITIES}}", team_member["responsibilities"])
out = out.replace("{{INPUTS_READ}}", team_member.get("inputs", "_(see brief)_"))
out = out.replace("{{OUTPUTS_PRODUCED}}", team_member.get("outputs", "_(see brief)_"))
out = out.replace("{{TOOLSETS}}", ", ".join(team_member["toolsets"]))
out = out.replace(
"{{SKILLS}}",
", ".join(team_member["skills"]) if team_member["skills"] else "(none)"
)
out = out.replace(
"{{EXTERNAL_TOOLS}}",
team_member.get("external_tools", "ffmpeg, ffprobe (via terminal)")
)
out = out.replace(
"{{ROLE_RULES}}",
team_member.get("role_rules", "_(see TEAM.md and brief.md)_")
)
out = out.replace("{{COMMON_RULES}}", common_rules)
out = out.replace("{{COMMON_COMMANDS}}", common_commands)
return out
def main():
ap = argparse.ArgumentParser(description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
ap.add_argument("plan_json", help="Path to plan.json")
ap.add_argument("--out", default="setup.sh",
help="Output path for setup.sh (default: ./setup.sh)")
ap.add_argument("--brief-out", default=None,
help="Write brief.md alongside (default: skipped)")
ap.add_argument("--team-out", default=None,
help="Write TEAM.md alongside (default: skipped)")
args = ap.parse_args()
plan = json.loads(Path(args.plan_json).read_text())
errors = validate_plan(plan)
if errors:
print("Plan validation failed:", file=sys.stderr)
for e in errors:
print(f" - {e}", file=sys.stderr)
sys.exit(2)
brief = render_brief(plan)
team = render_team_md(plan)
setup = render_setup_sh(plan, brief, team)
Path(args.out).write_text(setup)
os.chmod(args.out, 0o755)
print(f"Wrote {args.out}")
if args.brief_out:
Path(args.brief_out).write_text(brief)
print(f"Wrote {args.brief_out}")
if args.team_out:
Path(args.team_out).write_text(team)
print(f"Wrote {args.team_out}")
if __name__ == "__main__":
main()


@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Monitor a running video-production kanban. Polls `hermes kanban list` and
`events` for a tenant and surfaces issues (stuck tasks, missing heartbeats,
repeated retries, dependency deadlocks).
Usage:
monitor.py --tenant <project-slug> [--interval 30]
Outputs a periodic snapshot to stdout. Sends alerts via stderr when issues
are detected. Designed to run alongside the kanban; kill it with Ctrl-C when
you're satisfied (or script it to stop on completion).
This is best-effort observability. It does not auto-restart tasks; intervention
decisions should remain human/AI-overseen.
"""
from __future__ import annotations
import argparse
import json
import shutil
import subprocess
import sys
import time
from collections import defaultdict
from datetime import datetime, timedelta
def hermes_available() -> bool:
return shutil.which("hermes") is not None
def kanban_list(tenant: str) -> list[dict]:
"""Returns parsed task rows. Falls back to plain stdout parsing if JSON
output isn't supported by the installed hermes CLI."""
try:
out = subprocess.run(
["hermes", "kanban", "list", "--tenant", tenant, "--json"],
capture_output=True, text=True, check=False,
)
if out.returncode == 0 and out.stdout.strip().startswith("["):
return json.loads(out.stdout)
except (FileNotFoundError, json.JSONDecodeError):
pass
# Fallback: textual parse of `hermes kanban list`
out = subprocess.run(
["hermes", "kanban", "list", "--tenant", tenant],
capture_output=True, text=True, check=False,
)
rows = []
for line in out.stdout.splitlines():
line = line.strip()
if not line or line.startswith("#") or "STATUS" in line.upper():
continue
parts = line.split()
if len(parts) >= 4 and parts[0].startswith("t_"):
rows.append({
"id": parts[0],
"status": parts[1] if len(parts) > 1 else "?",
"assignee": parts[2] if len(parts) > 2 else "?",
"title": " ".join(parts[3:]) if len(parts) > 3 else "",
"started_at": None,
"heartbeat_at": None,
"max_runtime_s": None,
})
return rows
def kanban_show(task_id: str) -> dict | None:
out = subprocess.run(
["hermes", "kanban", "show", task_id, "--json"],
capture_output=True, text=True, check=False,
)
if out.returncode != 0:
return None
try:
return json.loads(out.stdout)
except json.JSONDecodeError:
return None
def detect_issues(tasks: list[dict]) -> list[str]:
"""Return a list of issue strings, one per concern."""
    # Task timestamps carry a trailing "Z" (UTC), so compare against UTC "now".
    now = datetime.utcnow()
issues: list[str] = []
by_status = defaultdict(list)
for t in tasks:
by_status[t.get("status", "?")].append(t)
# Stuck tasks: RUNNING with no heartbeat in 2 min
for t in by_status.get("running", []) + by_status.get("RUNNING", []):
hb = t.get("heartbeat_at")
if not hb:
continue
try:
hb_dt = datetime.fromisoformat(str(hb).rstrip("Z"))
except ValueError:
continue
if now - hb_dt > timedelta(minutes=2):
issues.append(
f"STUCK: {t['id']} ({t.get('assignee', '?')}) — "
f"no heartbeat in {(now - hb_dt).total_seconds():.0f}s"
)
# Tasks exceeding max_runtime
for t in by_status.get("running", []) + by_status.get("RUNNING", []):
started = t.get("started_at")
max_rt = t.get("max_runtime_s")
if not started or not max_rt:
continue
try:
started_dt = datetime.fromisoformat(str(started).rstrip("Z"))
except ValueError:
continue
elapsed = (now - started_dt).total_seconds()
if elapsed > max_rt:
issues.append(
f"OVERTIME: {t['id']} ({t.get('assignee', '?')}) — "
f"running {elapsed:.0f}s, cap was {max_rt}s"
)
# Repeated retries
for t in tasks:
retries = t.get("retries", 0)
if retries and retries >= 2:
issues.append(
f"FLAPPING: {t['id']} ({t.get('assignee', '?')}) — "
f"retried {retries}× — fix root cause before next run"
)
return issues
def snapshot(tenant: str) -> tuple[list[dict], list[str]]:
tasks = kanban_list(tenant)
issues = detect_issues(tasks)
return tasks, issues
def print_snapshot(tasks: list[dict], issues: list[str]):
counts = defaultdict(int)
for t in tasks:
counts[str(t.get("status", "?")).lower()] += 1
print(f"\n[{datetime.now().strftime('%H:%M:%S')}] "
f"Total: {len(tasks)} | "
+ " | ".join(f"{k}: {v}" for k, v in sorted(counts.items())))
for t in tasks:
bar = "" if str(t.get("status", "")).lower() == "done" else \
"" if str(t.get("status", "")).lower() == "running" else \
"·" if str(t.get("status", "")).lower() == "ready" else \
"" if str(t.get("status", "")).lower() == "failed" else "?"
print(f" {bar} {t.get('id', '?'):14} {t.get('assignee', '?'):20} "
f"{t.get('title', '')[:60]}")
if issues:
print("\n ⚠ ISSUES:", file=sys.stderr)
for i in issues:
print(f" {i}", file=sys.stderr)
def main():
ap = argparse.ArgumentParser(description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
ap.add_argument("--tenant", required=True,
help="Project tenant slug to monitor")
ap.add_argument("--interval", type=int, default=30,
help="Poll interval in seconds (default: 30)")
ap.add_argument("--once", action="store_true",
help="Print one snapshot and exit (no polling loop)")
args = ap.parse_args()
if not hermes_available():
print("ERROR: 'hermes' CLI not found in PATH", file=sys.stderr)
sys.exit(1)
if args.once:
tasks, issues = snapshot(args.tenant)
print_snapshot(tasks, issues)
sys.exit(0 if not issues else 2)
print(f"Monitoring tenant '{args.tenant}' every {args.interval}s. "
"Ctrl-C to exit.")
try:
while True:
tasks, issues = snapshot(args.tenant)
print_snapshot(tasks, issues)
time.sleep(args.interval)
except KeyboardInterrupt:
print("\nStopped.")
if __name__ == "__main__":
main()
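
The two-minute heartbeat cutoff used by `detect_issues` can be exercised in isolation. A minimal sketch, assuming Z-suffixed UTC timestamps as in the monitor above (`is_stale` is an illustrative helper, not part of the script):

```python
from datetime import datetime

def is_stale(heartbeat_iso: str, now: datetime, cutoff_s: int = 120) -> bool:
    """True when the heartbeat is older than the cutoff (120 s = 2 min)."""
    # Mirror the monitor's parsing: strip the trailing "Z", compare naive UTC.
    hb = datetime.fromisoformat(heartbeat_iso.rstrip("Z"))
    return (now - hb).total_seconds() > cutoff_s

now = datetime(2026, 5, 3, 12, 0, 0)
print(is_stale("2026-05-03T11:57:00Z", now))  # 180 s since heartbeat -> True
print(is_stale("2026-05-03T11:59:30Z", now))  # 30 s since heartbeat -> False
```

The same shape extends to the overtime check: parse `started_at` the same way and compare elapsed seconds against `max_runtime_s`.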