Worked Examples
Six concrete pipelines covering different video styles. Each shows the team composition, task graph, and skill/tool choices the orchestrator would make for that brief. These are illustrative, not templates — adapt to the actual brief.
Example 1 — Narrative short film (text-to-image → image-to-video → cut)
Brief: A 90-second noir-style short. A detective walks through a rainy city. Voiceover narration. AI-generated visuals.
Team:
- `director` — vision, decomposition, approval
- `writer` — script + voiceover copy (loads `humanizer` for natural voice)
- `storyboarder` — beat-by-beat shot list (loads `excalidraw`)
- `image-generator` — generates each shot's still via local ComfyUI workflows (loads `comfyui`)
- `image-to-video-generator` — animates each still (Runway/Kling, or ComfyUI's AnimateDiff/WAN workflows via `comfyui`)
- `voice-talent` — narration via ElevenLabs
- `audio-mixer` — VO + ambient pad
- `editor` — assembly + transitions
- `reviewer` — final QA
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | writer | script + voiceover.md | T0 |
| T2 | storyboarder | shot list with framing per beat | T1 |
| T3 | image-generator | one still per shot (~12 shots) | T2 |
| T4 | image-to-video | animate each still | T3 |
| T5 | voice-talent | generate narration audio | T1 |
| T6 | audio-mixer | mix VO + ambient | T5 |
| T7 | editor | cut + transitions + audio mux | T4, T6 |
| T8 | reviewer | final QA | T7 |
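The parent links above are the whole point of the decomposition: they decide what can run in parallel. A minimal sketch of the same graph as plain data, assuming nothing about the kanban plugin's actual task-creation API (the task IDs and the `ready` helper are illustrative only):

```python
# A sketch of the Example 1 dependency shape as plain data. The kanban plugin's
# real task-creation API is not shown here; this only restates the parent links.
GRAPH = {
    "T0": [],            # director: decompose
    "T1": ["T0"],        # writer: script + voiceover.md
    "T2": ["T1"],        # storyboarder: shot list
    "T3": ["T2"],        # image-generator: stills
    "T4": ["T3"],        # image-to-video: animate stills
    "T5": ["T1"],        # voice-talent: narration audio
    "T6": ["T5"],        # audio-mixer: VO + ambient
    "T7": ["T4", "T6"],  # editor: cut + mux
    "T8": ["T7"],        # reviewer: final QA
}

def ready(done: set[str]) -> list[str]:
    """Tasks whose parents are all complete and which are not done themselves."""
    return [t for t, parents in GRAPH.items()
            if t not in done and all(p in done for p in parents)]

# Once the script and storyboard land, the image branch and the voice branch run in parallel.
print(ready({"T0", "T1", "T2"}))  # -> ['T3', 'T5']
```

The image branch (T3, T4) and the voice branch (T5, T6) proceed independently until the editor joins them at T7.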
Key choices:
- Local ComfyUI via the `comfyui` skill is preferred over external API for cost/control — but external APIs are fine if ComfyUI isn't installed
- `editor` profile is ffmpeg-only, no Hermes skill required beyond `kanban-worker` (a possible assembly pass is sketched after this list)
- Storyboarder produces `storyboard.excalidraw` alongside the markdown
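A minimal sketch of what that ffmpeg-only assembly could look like, assuming the animated shots land as numbered `shot_*.mp4` files and the mixed audio as `audio/mix.wav` (both paths are assumptions, not something the skill prescribes):

```python
# editor_assemble.py - illustrative only; paths and filenames are assumed, not prescribed.
import subprocess
from pathlib import Path

SHOTS_DIR = Path("shots")       # assumed output dir of the image-to-video tasks
MIX = Path("audio/mix.wav")     # assumed output of the audio-mixer task
OUT = Path("final.mp4")

# Build an ffmpeg concat list from the animated shots, in shot order.
clips = sorted(SHOTS_DIR.glob("shot_*.mp4"))
concat_list = Path("concat.txt")
concat_list.write_text("".join(f"file '{c.resolve()}'\n" for c in clips))

# Stitch the clips (assumes they share codec/resolution; re-encode instead of
# "-c copy" if they don't), then mux the voiceover/ambient mix over the result.
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", str(concat_list), "-c", "copy", "video_only.mp4"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", "video_only.mp4", "-i", str(MIX),
                "-map", "0:v", "-map", "1:a", "-c:v", "copy",
                "-c:a", "aac", "-shortest", str(OUT)], check=True)
```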
Example 2 — Product / marketing teaser
Brief: A 30-second product teaser for a developer tool. Shows code + terminal + UI screen recordings, voiceover, CTA at end. Square 1:1.
Team:
- `director`
- `copywriter` — taglines, voiceover script, CTA (loads `humanizer`)
- `concept-artist` — style frames (loads `claude-design` for UI mockups)
- `renderer-motion-graphics` — animated UI sequences (Remotion CLI)
- `renderer-ascii` — terminal-style demo scenes (loads `ascii-video`)
- `voice-talent` — VO via ElevenLabs
- `editor` — assembly + brand-color treatment
- `audio-mixer` — VO + light music bed
- `captioner` — burned subtitles for muted-autoplay platforms
- `masterer` — produces 1:1 + 9:16 + 16:9 variants
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | copywriter | copy.md + cta + vo script | T0 |
| T2 | concept-artist | visual-spec.md + style frames | T1 |
| T3a | renderer-motion-graphics | scene 1: UI sequence | T2 |
| T3b | renderer-ascii | scene 2: terminal demo | T2 |
| T3c | renderer-motion-graphics | scene 3: feature highlight | T2 |
| T3d | renderer-motion-graphics | scene 4: CTA card | T2 |
| T4 | voice-talent | narration | T1 |
| T5 | audio-mixer | VO + music bed | T4 |
| T6 | editor | cut + transitions | T3*, T5 |
| T7 | captioner | SRT + burned subtitles | T6 |
| T8 | masterer | 1:1, 9:16, 16:9 variants | T7 |
Key choices:
- Multiple specialized renderers (motion-graphics + ASCII) coexist
- Captioner is included because muted autoplay is the norm on social
- The `claude-design` skill for UI mockups maps directly to the product video idiom
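For the T8 masterer step, one plausible way to derive the 9:16 and 16:9 variants is a scale-and-center-crop pass per aspect ratio. This is a sketch under that assumption; padding or per-platform reframing would be just as valid, and the filenames are illustrative:

```python
# master_variants.py - a sketch of the masterer's aspect-ratio variants. The
# center-crop strategy and filenames are assumptions, not part of the skill.
import subprocess

MASTER = "teaser_1x1.mp4"   # assumed 1:1 master coming out of the captioner

variants = {
    "teaser_9x16.mp4":  (1080, 1920),
    "teaser_16x9.mp4":  (1920, 1080),
}

for out, (w, h) in variants.items():
    # Scale to cover the target frame, then center-crop to the exact size.
    vf = f"scale={w}:{h}:force_original_aspect_ratio=increase,crop={w}:{h}"
    subprocess.run(["ffmpeg", "-y", "-i", MASTER, "-vf", vf,
                    "-c:a", "copy", out], check=True)
```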
Example 3 — Music video (synced to provided track)
Brief: A 3-minute music video for a provided lo-fi hip-hop track. Visuals should pulse with the beat. Generative + ASCII hybrid. Vertical 9:16.
Team:
- `director`
- `music-supervisor` — analyze track, emit `audio/beats.json` (loads `songsee`)
- `storyboarder` — beat-aligned shot list (loads `excalidraw`)
- `renderer-ascii` — ASCII scenes synced to bass kicks (loads `ascii-video`)
- `renderer-p5js` — generative particle scenes synced to highs (loads `p5js`)
- `editor` — beat-cut assembly using `beats.json`
- `reviewer` — sync QA
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | music-supervisor | analyze track → beats.json + spectrogram | T0 |
| T2 | storyboarder | shot list aligned to beats | T1, T0 |
| T3a | renderer-ascii | scene 1: bass-driven ASCII | T2 |
| T3b | renderer-p5js | scene 2: high-end particle field | T2 |
| ... | | (more scenes) | |
| T4 | editor | cut to beats + mux track | T3*, T1 |
| T5 | reviewer | sync QA + final approval | T4 |
Key choices:
- `music-supervisor` runs FIRST — `beats.json` gates the renderers
- `editor` uses `beats.json` directly to align cuts to bass kicks (see the sketch after this list)
- No voice-talent — music is the audio
- Two specialized renderers (`ascii-video` + `p5js`) for visual variety
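A sketch of how the editor might turn `beats.json` into cut points. The schema shown here (a `bpm` field plus a flat list of beat timestamps) is an assumption about what the music-supervisor emits, not a documented `songsee` format:

```python
# cut_to_beats.py - illustrative only; the beats.json schema is an assumed shape.
import json

with open("audio/beats.json") as f:
    beats = json.load(f)   # e.g. {"bpm": 82, "beats": [0.0, 0.73, 1.46, ...]}

# Cut on every 4th beat so each shot lasts roughly one bar.
cut_times = beats["beats"][::4]
segments = list(zip(cut_times, cut_times[1:]))

for i, (start, end) in enumerate(segments):
    print(f"shot {i:02d}: {start:.2f}s -> {end:.2f}s ({end - start:.2f}s)")
```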
Example 4 — Math/algorithm explainer
Brief: A 2-minute explainer of an algorithm. 3Blue1Brown-style. Animated diagrams, equations, narration. Square 1:1.
Team:
- `director`
- `writer` — narration script (loads `humanizer`)
- `cinematographer` — visual spec (loads `manim-video`)
- `renderer-manim` — all animated scenes (loads `manim-video`)
- `voice-talent` — narration via ElevenLabs
- `editor` — assembly + audio mux
- `captioner` — burned subtitles
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | writer | script + narration | T0 |
| T2 | cinematographer | visual spec for all scenes | T1 |
| T3a-Tn | renderer-manim | scenes 1..N | T2 |
| T4 | voice-talent | narration audio | T1 |
| T5 | editor | cut + mux | T3*, T4 |
| T6 | captioner | SRT + burn | T5 |
Key choices:
- The `manim-video` skill drives both the cinematographer (visual language) and the renderer (actual scene production); a minimal scene is sketched after this list
- The `manim-video` skill's reference docs (animation-design-thinking, scene-planning, equations) auto-load when needed via the renderer's pinned skill
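A minimal Manim Community scene of the kind `renderer-manim` would produce for one beat of the explainer. The visuals are placeholders; the `manim-video` skill's own scene-planning docs govern what real scenes look like:

```python
# scene_01.py - a minimal Manim Community sketch; content is illustrative only.
from manim import Scene, Square, Circle, Transform, Create, BLUE, GREEN


class ShapeMorph(Scene):
    def construct(self):
        # One beat of the explainer: draw a square, then morph it into a circle.
        square = Square(color=BLUE)
        circle = Circle(color=GREEN)
        self.play(Create(square))
        self.play(Transform(square, circle))
        self.wait(1)
```

Rendered with something like `manim -qm scene_01.py ShapeMorph`, each scene becomes one clip for the editor to cut against the narration.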
Example 5 — ASCII video, music-track-only
Brief: A 60-second pure-ASCII video reactive to an existing track. No voiceover, no other tools. Square 1:1.
Team:
- `director`
- `music-supervisor` — track analysis (loads `songsee`)
- `renderer-ascii` — all visuals (loads `ascii-video`)
- `editor` — assembly + audio mux
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | music-supervisor | analyze track | T0 |
| T2a | renderer-ascii | scene 1 | T1, T0 |
| T2b | renderer-ascii | scene 2 | T1, T0 |
| T2c | renderer-ascii | scene 3 | T1, T0 |
| T3 | editor | stitch + mux audio | T2* |
Key choices:
- Minimal team (4 profiles) for a focused single-tool project
- No reviewer — short experimental piece, director approves directly
- All scenes run through one `renderer-ascii` profile because the `ascii-video` skill covers everything
This example illustrates the rule: don't over-decompose. Three scenes through one renderer is fine. Don't spawn three renderer profiles.
Example 6 — Real-time / installation art
Brief: A 2-minute audio-reactive visual for a gallery installation. Driven by an audio input feed. TouchDesigner-based. 16:9 4K.
Team:
- `director`
- `cinematographer` — visual language spec (loads `touchdesigner-mcp`)
- `renderer-touchdesigner` — all visuals + record-to-disk (loads `touchdesigner-mcp`)
- `audio-mixer` — final loudness pass on the captured audio (optional if pre-mixed source)
- `editor` — assemble final clip from the TouchDesigner recording
- `reviewer` — visual QA
Task graph:
| Task | Profile | Work | Parents |
|------|---------|------|---------|
| T0 | director | decompose | |
| T1 | cinematographer | TD operator graph spec | T0 |
| T2 | renderer-touchdesigner | build TD network + record output | T1 |
| T3 | editor | trim + audio mux | T2 |
| T4 | reviewer | final QA | T3 |
Key choices:
- `touchdesigner-mcp` controls a running TouchDesigner instance — the cinematographer designs the operator graph, the renderer builds it
- Output is a recording from the running TD network, not a render-to-frames process; the editor mostly just trims
Pattern recognition
When the user describes a video, look for these signals to map to an example:
- Plot, characters, scripted dialogue → Example 1 (narrative)
- Specific product, CTA, brand colors, voiceover → Example 2 (marketing)
- Track file provided, "synced to music" → Example 3 (music video)
- "Explain how X works", math/algorithm/concept walkthrough → Example 4 (manim explainer)
- Terminal aesthetic, ASCII, retro pixel → Example 5 (ASCII)
- "Audio-reactive", "real-time", "installation" → Example 6 (TouchDesigner)
- Comic-style narrative → use `renderer-comic` (`baoyu-comic` skill)
- Retro game / pixel-art aesthetic → use `renderer-pixel` (`pixel-art` skill)
- 3D scene, photoreal environment → use `renderer-3d` (`blender-mcp`)
- Generative art, particle system, shader → use `renderer-p5js` (`p5js`)
- AI-generated photoreal stills + animation → use `renderer-comfyui` (`comfyui`) for both stills and image-to-video
- "Video about how the system works", recursive demo → composable from any of the above; the recursion is a rendering technique, not a style
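A toy sketch of that signal mapping as a keyword lookup. In practice the director profile reads the whole brief rather than matching keywords, so the signal words and style names below are illustrative only:

```python
# map_brief.py - a toy illustration of the signal -> example mapping; not how the
# director actually decides, and the keyword lists are assumptions.
SIGNALS = {
    "narrative":  ["plot", "character", "dialogue"],
    "marketing":  ["product", "cta", "brand"],
    "music":      ["track", "synced to music", "beat"],
    "explainer":  ["explain", "algorithm", "walkthrough"],
    "ascii":      ["terminal", "ascii", "retro pixel"],
    "realtime":   ["audio-reactive", "real-time", "installation"],
}

def match(brief: str) -> list[str]:
    """Return every example style whose signal words appear in the brief."""
    text = brief.lower()
    return [style for style, words in SIGNALS.items()
            if any(w in text for w in words)]

print(match("A 30-second product teaser with a CTA card and voiceover"))
# -> ['marketing']
```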
The actual team should be derived from the specific brief — these examples are starting points, not endpoints.