
Worked Examples

Six concrete pipelines covering different video styles. Each shows the team composition, task graph, and skill/tool choices the orchestrator would make for that brief. These are illustrative, not templates — adapt to the actual brief.

Example 1 — Narrative short film (text-to-image → image-to-video → cut)

Brief: A 90-second noir-style short. A detective walks through a rainy city. Voiceover narration. AI-generated visuals.

Team:

  • director — vision, decomposition, approval
  • writer — script + voiceover copy (loads humanizer for natural voice)
  • storyboarder — beat-by-beat shot list (loads excalidraw)
  • image-generator — generates each shot's still via local ComfyUI workflows (loads comfyui)
  • image-to-video-generator — animates each still (Runway/Kling, OR ComfyUI's AnimateDiff/WAN workflows via comfyui)
  • voice-talent — narration via ElevenLabs
  • audio-mixer — VO + ambient pad
  • editor — assembly + transitions
  • reviewer — final QA

Task graph:

T0  director         decompose
T1  writer           script + voiceover.md                    (parent: T0)
T2  storyboarder     shot list with framing per beat          (parent: T1)
T3  image-generator  one still per shot (~12 shots)           (parent: T2)
T4  image-to-video   animate each still                       (parent: T3)
T5  voice-talent     generate narration audio                 (parent: T1)
T6  audio-mixer      mix VO + ambient                         (parent: T5)
T7  editor           cut + transitions + audio mux            (parents: T4, T6)
T8  reviewer         final QA                                 (parent: T7)

Key choices:

  • Local ComfyUI via comfyui skill is preferred over external API for cost/control — but external APIs are fine if ComfyUI isn't installed
  • editor profile is ffmpeg-only, no Hermes skill required beyond kanban-worker (an assembly sketch follows this list)
  • Storyboarder produces storyboard.excalidraw alongside the markdown
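
As a concrete sketch of the ffmpeg-only editor role: concatenate the T4 shot clips and mux in the T6 audio mix. File names and encoder settings here are hypothetical, and a real cut would use crossfades (e.g. ffmpeg's xfade filter) rather than hard cuts.

```python
# Sketch of the editor step (T7), assuming T4 wrote shots/shot_*.mp4 and
# T6 wrote audio/mix.wav. All paths and settings are assumptions.
import subprocess
from pathlib import Path

shots = sorted(Path("shots").glob("shot_*.mp4"))

# ffmpeg's concat demuxer reads a list of "file '<path>'" lines.
Path("concat.txt").write_text(
    "".join(f"file '{s.resolve()}'\n" for s in shots)
)

subprocess.run([
    "ffmpeg", "-y",
    "-f", "concat", "-safe", "0", "-i", "concat.txt",  # video: stitched shots
    "-i", "audio/mix.wav",                             # audio: VO + ambient mix
    "-map", "0:v", "-map", "1:a",
    "-c:v", "libx264", "-crf", "18",
    "-c:a", "aac", "-shortest",
    "final.mp4",
], check=True)
```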

Example 2 — Product / marketing teaser

Brief: A 30-second product teaser for a developer tool. Shows code + terminal + UI screen recordings, voiceover, CTA at end. Square 1:1.

Team:

  • director
  • copywriter — taglines, voiceover script, CTA (loads humanizer)
  • concept-artist — style frames (loads claude-design for UI mockups)
  • renderer-motion-graphics — animated UI sequences (Remotion CLI)
  • renderer-ascii — terminal-style demo scenes (loads ascii-video)
  • voice-talent — VO via ElevenLabs
  • editor — assembly + brand-color treatment
  • audio-mixer — VO + light music bed
  • captioner — burned subtitles for muted-autoplay platforms
  • masterer — produces 1:1 + 9:16 + 16:9 variants

Task graph:

T0  director              decompose
T1  copywriter            copy.md + cta + vo script               (parent: T0)
T2  concept-artist        visual-spec.md + style frames           (parent: T1)
T3a renderer-motion-graphics  scene 1: UI sequence                (parent: T2)
T3b renderer-ascii        scene 2: terminal demo                  (parent: T2)
T3c renderer-motion-graphics  scene 3: feature highlight          (parent: T2)
T3d renderer-motion-graphics  scene 4: CTA card                   (parent: T2)
T4  voice-talent          narration                               (parent: T1)
T5  audio-mixer           VO + music bed                          (parent: T4)
T6  editor                cut + transitions                       (parents: T3*, T5)
T7  captioner             SRT + burned subtitles                  (parent: T6)
T8  masterer              1:1, 9:16, 16:9 variants                (parent: T7)

Key choices:

  • Multiple specialized renderers (motion-graphics + ASCII) coexist
  • Captioner is included because muted autoplay is the norm on social platforms (the caption-burn and mastering steps are sketched after this list)
  • claude-design skill for UI mockups maps directly to the product video idiom
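
The captioner and masterer steps are similarly plain ffmpeg. A minimal sketch, assuming T6 produced a 1080x1080 cut.mp4 and the captioner wrote subs.srt; names and resolutions are assumptions:

```python
# Sketch of T7 (burn subtitles) and T8 (aspect-ratio variants).
# Assumes a 1080x1080 master; filenames and sizes are assumptions.
import subprocess

def ff(*args: str) -> None:
    subprocess.run(["ffmpeg", "-y", *args], check=True)

# T7: burn subs for muted autoplay (libass via the subtitles filter).
ff("-i", "cut.mp4", "-vf", "subtitles=subs.srt", "-c:a", "copy",
   "teaser_1x1.mp4")

# T8: derive 9:16 and 16:9 from the square master by padding.
# Cropping is the other option; which to use is an editorial call.
ff("-i", "teaser_1x1.mp4",
   "-vf", "pad=1080:1920:(ow-iw)/2:(oh-ih)/2",  # center on a 9:16 canvas
   "teaser_9x16.mp4")
ff("-i", "teaser_1x1.mp4",
   "-vf", "pad=1920:1080:(ow-iw)/2:(oh-ih)/2",  # center on a 16:9 canvas
   "teaser_16x9.mp4")
```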

Example 3 — Music video (synced to provided track)

Brief: A 3-minute music video for a provided lo-fi hip-hop track. Visuals should pulse with the beat. Generative + ASCII hybrid. Vertical 9:16.

Team:

  • director
  • music-supervisor — analyze track, emit audio/beats.json (loads songsee)
  • storyboarder — beat-aligned shot list (loads excalidraw)
  • renderer-ascii — ASCII scenes synced to bass kicks (loads ascii-video)
  • renderer-p5js — generative particle scenes synced to highs (loads p5js)
  • editor — beat-cut assembly using beats.json
  • reviewer — sync QA

Task graph:

T0  director              decompose
T1  music-supervisor      analyze track → beats.json + spectrogram  (parent: T0)
T2  storyboarder          shot list aligned to beats                (parents: T1, T0)
T3a renderer-ascii        scene 1: bass-driven ASCII                (parent: T2)
T3b renderer-p5js         scene 2: high-end particle field          (parent: T2)
... (more scenes)
T4  editor                cut to beats + mux track                  (parents: T3*, T1)
T5  reviewer              sync QA + final approval                  (parent: T4)

Key choices:

  • music-supervisor runs FIRST — beats.json gates the renderers
  • editor uses beats.json directly to align cuts to bass kicks (see the sketch after this list)
  • No voice-talent — music is the audio
  • Two specialized renderers (ascii-video + p5js) for visual variety
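
A sketch of the editor consuming beats.json. The schema here, a flat list of beat timestamps in seconds, is an assumption; songsee's actual output may differ:

```python
# Sketch: turn beat timestamps into segment cut points for the editor (T4).
# Assumes beats.json is a flat list of seconds, e.g. [0.0, 0.52, 1.04, ...];
# songsee's real schema may differ.
import json

with open("audio/beats.json") as f:
    beats: list[float] = json.load(f)

# Cut on every 4th beat (one bar in 4/4) so the edit breathes rather
# than strobing; which beats to cut on is an editorial choice.
cut_points = beats[::4]

# Pair consecutive cut points into (start, duration) segments, then
# assign scenes round-robin across the rendered clips.
scenes = ["scene_ascii.mp4", "scene_p5js.mp4"]
segments = [
    (start, end - start, scenes[i % len(scenes)])
    for i, (start, end) in enumerate(zip(cut_points, cut_points[1:]))
]
for start, dur, clip in segments:
    print(f"take {dur:.2f}s of {clip} at track time {start:.2f}s")
```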

Example 4 — Math/algorithm explainer

Brief: A 2-minute explainer of an algorithm. 3Blue1Brown-style. Animated diagrams, equations, narration. Square 1:1.

Team:

  • director
  • writer — narration script (loads humanizer)
  • cinematographer — visual spec (loads manim-video)
  • renderer-manim — all animated scenes (loads manim-video)
  • voice-talent — narration via ElevenLabs
  • editor — assembly + audio mux
  • captioner — burned subtitles

Task graph:

T0  director           decompose
T1  writer             script + narration                  (parent: T0)
T2  cinematographer    visual spec for all scenes           (parent: T1)
T3a-Tn renderer-manim  scenes 1..N                          (parent: T2)
T4  voice-talent       narration audio                      (parent: T1)
T5  editor             cut + mux                            (parents: T3*, T4)
T6  captioner          SRT + burn                           (parent: T5)

Key choices:

  • manim-video skill drives both the cinematographer (visual language) and the renderer (actual scene production); a minimal scene sketch follows this list
  • The manim-video skill's reference docs (animation-design-thinking, scene-planning, equations) auto-load when needed via the renderer's pinned skill
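
For orientation, a minimal scene of the kind renderer-manim produces. This is generic Manim Community code with a stand-in algorithm; the manim-video skill's own scene conventions would shape the real thing:

```python
# Minimal Manim Community scene sketch: fade in a title, write an
# equation, morph it into its closed form. The algorithm is a stand-in.
from manim import Scene, MathTex, Text, FadeIn, Write, TransformMatchingTex, UP

class BinarySearchIntro(Scene):
    def construct(self):
        title = Text("Binary search").to_edge(UP)
        step = MathTex(r"T(n) = T(n/2) + O(1)")
        closed = MathTex(r"T(n) = O(\log n)")

        self.play(FadeIn(title))
        self.play(Write(step))
        self.wait(1)
        # Morph the recurrence into its closed form.
        self.play(TransformMatchingTex(step, closed))
        self.wait(2)
```

Each T3 task renders one such scene (e.g. `manim -qm scene.py BinarySearchIntro`); the editor then only cuts and muxes.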

Example 5 — ASCII video, music-track-only

Brief: A 60-second pure-ASCII video reactive to an existing track. No voiceover, no other tools. Square 1:1.

Team:

  • director
  • music-supervisor — track analysis (loads songsee)
  • renderer-ascii — all visuals (loads ascii-video)
  • editor — assembly + audio mux

Task graph:

T0  director           decompose
T1  music-supervisor   analyze track                       (parent: T0)
T2a renderer-ascii     scene 1                             (parents: T1, T0)
T2b renderer-ascii     scene 2                             (parents: T1, T0)
T2c renderer-ascii     scene 3                             (parents: T1, T0)
T3  editor             stitch + mux audio                  (parents: T2*)

Key choices:

  • Minimal team (4 profiles) for a focused single-tool project
  • No reviewer — short experimental piece, director approves directly
  • All scenes run through one renderer-ascii profile because the ascii-video skill covers everything

This example illustrates the rule: don't over-decompose. Three scenes through one renderer is fine. Don't spawn three renderer profiles.
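
The gating semantics behind every task graph above are the same: a task becomes eligible once all of its parents are done. A toy model of Example 5's graph (the kanban plugin manages this for real; this only illustrates the dependency semantics):

```python
# Toy model of Example 5's task graph: a task is ready once all its
# parents are done. Real scheduling lives in the kanban plugin.
GRAPH = {
    "T0": [],                    # director: decompose
    "T1": ["T0"],                # music-supervisor: analyze track
    "T2a": ["T1", "T0"],         # renderer-ascii: scene 1
    "T2b": ["T1", "T0"],         # renderer-ascii: scene 2
    "T2c": ["T1", "T0"],         # renderer-ascii: scene 3
    "T3": ["T2a", "T2b", "T2c"], # editor: stitch + mux audio
}

def ready(done: set[str]) -> list[str]:
    """Tasks whose parents are all done and which aren't done themselves."""
    return [t for t, parents in GRAPH.items()
            if t not in done and all(p in done for p in parents)]

assert ready(set()) == ["T0"]
assert ready({"T0", "T1"}) == ["T2a", "T2b", "T2c"]  # scenes unblock together
assert ready({"T0", "T1", "T2a", "T2b", "T2c"}) == ["T3"]
```

The three scene tasks unblock together: parallelism comes from the graph shape, not from multiplying renderer profiles.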

Example 6 — Real-time / installation art

Brief: A 2-minute audio-reactive visual for a gallery installation. Driven by an audio input feed. TouchDesigner-based. 16:9 4K.

Team:

  • director
  • cinematographer — visual language spec (loads touchdesigner-mcp)
  • renderer-touchdesigner — all visuals + record-to-disk (loads touchdesigner-mcp)
  • audio-mixer — final loudness pass on the captured audio (optional if the source is already mixed)
  • editor — assemble final clip from TouchDesigner recording
  • reviewer — visual QA

Task graph:

T0  director                decompose
T1  cinematographer         TD operator graph spec           (parent: T0)
T2  renderer-touchdesigner  build TD network + record output (parent: T1)
T3  editor                  trim + audio mux                 (parent: T2)
T4  reviewer                final QA                         (parent: T3)

Key choices:

  • touchdesigner-mcp controls a running TouchDesigner instance — the cinematographer designs the operator graph, renderer builds it
  • Output is a recording from the running TD network, not a render-to-frames process; editor mostly just trims

Pattern recognition

When the user describes a video, look for these signals to map to an example (a toy keyword router sketching this mapping follows the list):

  • Plot, characters, scripted dialogue → Example 1 (narrative)
  • Specific product, CTA, brand colors, voiceover → Example 2 (marketing)
  • Track file provided, "synced to music" → Example 3 (music video)
  • "Explain how X works", math/algorithm/concept walkthrough → Example 4 (manim explainer)
  • Terminal aesthetic, ASCII, retro pixel → Example 5 (ASCII)
  • "Audio-reactive", "real-time", "installation" → Example 6 (TouchDesigner)
  • Comic-style narrative → use renderer-comic (baoyu-comic skill)
  • Retro game / pixel-art aesthetic → use renderer-pixel (pixel-art skill)
  • 3D scene, photoreal environment → use renderer-3d (blender-mcp)
  • Generative art, particle system, shader → use renderer-p5js (p5js)
  • AI-generated photoreal stills + animation → use renderer-comfyui (comfyui) for both stills and image-to-video
  • "video about how the system works", recursive demo → composable from any of the above; the recursion is a rendering technique, not a style

The actual team should be derived from the specific brief — these examples are starting points, not endpoints.