hermes-agent/optional-skills/creative/kanban-video-orchestrator/references/intake.md
SHL0MS 0dd8e3f8d8 rename: video-orchestrator → kanban-video-orchestrator
The kanban prefix makes the skill discoverable alongside `kanban-orchestrator`
and `kanban-worker`, and signals up front that this skill drives the kanban
plugin rather than being a generic video tool.

Updated:
- directory rename
- SKILL.md frontmatter `name:` and H1
- setup.sh.tmpl header
2026-05-03 10:26:54 -07:00

7.8 KiB
Raw Blame History

Intake — Discovery Question Banks

The discovery process is adaptive. Always start with three baseline questions to identify the broad style category, then drill into a per-style question bank. Ask 2-4 questions at a time, listen, then proceed. Make reasonable assumptions whenever the user implies an answer.

Tier 0 — Baseline (always ask)

  1. What is the video? — One-sentence pitch
  2. How long? — Approximate duration
  3. Aspect ratio + target platform? — 16:9 / 9:16 / 1:1 / 4:5; X, IG, YouTube, internal, etc.

From these answers, classify the style category and pick the relevant Tier 1 follow-ups. Do not continue asking until you have at least these three.

Style classification

Map the brief to one of these archetypes (or a hybrid):

Archetype Tells
Narrative film Plot, characters, scenes-with-events, dialogue, location
Product / marketing A specific product or feature being shown / sold; CTA at end
Music video A specific track exists; visuals sync to music
Explainer / educational A concept being taught; voiceover-driven
Tutorial / changelog Software demo, terminal-heavy, technical
ASCII / terminal art Retro terminal aesthetic explicit, character-grid
Abstract / loop Generative, no plot, often perfect-loop
Documentary / interview cut Real footage, transcription-driven
Real-time / installation Audio-reactive, gallery installation, VJ output

If ambiguous, ask which category fits — don't guess. Hybrids are common (e.g., a product video with a narrative arc); decompose into the dominant mode + secondary modifiers.

Recursive / meta ("a video that shows its own production") is a rendering technique, not a separate style — compose it from any of the above by adding a two-pass render step where pass 2 uses pass 1's output as texture inside the final scene.

Tier 1 — Per-style follow-ups

Narrative film

  • Setting / world? — When and where the story takes place
  • Characters? — How many, archetypes, who carries dialogue
  • Beat list or full script? — Has the user written the story or do we draft it
  • Dialogue language? — Spoken lines, on-screen subs only, silent
  • Visual generation approach? — Text-to-image (FAL/Midjourney/Imagen) → image-to-video (Runway/Kling), 3D animation (Blender), 2D animation, procedural, or hybrid
  • Voice approach? — TTS (which voice), recorded VO, no dialogue
  • Music / score? — Commissioned (via songwriting-and-ai-music Suno prompts, or local heartmula), licensed track provided, silent

Product / marketing

  • Product? — Name, what it does, key feature being shown
  • Target audience? — Who's watching, what they care about
  • CTA? — Visit URL, install, sign up, etc.
  • Tone? — Serious, playful, technical, premium, edgy
  • Brand assets available? — Logo files, color palette, fonts, existing footage
  • Animation style? — Motion graphics (Remotion / AE-style), screen recording, generative, illustrated
  • Voiceover? — Yes (which voice / language) or text-only
  • Music? — Track provided, license-free needed, custom-composed

Music video

  • Track file? — Path to the audio (essential — we'll analyze BPM + beats)
  • Track length to use? — Full song or a section
  • Genre / energy? — Tells what visual rhythm and density to use
  • Lyric / narrative content? — Are there lyrics to render on screen, or is it purely visual?
  • Visual reference style? — Existing music videos / artists for reference
  • Performer footage? — None, has clips, will provide
  • Visual generation approach? — Per-beat generative, edit-driven cuts of stock footage, illustrated, hybrid

Explainer / educational

  • What concept is being taught? — One-sentence concept, key takeaway
  • Audience expertise? — Beginner / intermediate / expert
  • Diagram density? — Heavy math / formulas / code / abstract concepts
  • Voiceover? — TTS / recorded / on-screen text only
  • Tool preference?manim-video (math), p5js (generative), Remotion (UI motion graphics), comfyui (AI-generated visuals), ascii-video (technical/retro), hybrid
  • Pacing? — Fast and dense (3Blue1Brown) or slow and contemplative

Tutorial / changelog / software demo

  • Software being demonstrated? — Name, what it does
  • Demo script? — Sequence of commands / screens to show
  • Terminal-only or with GUI?
  • Voiceover for narration?
  • Diagram support needed? — Often these benefit from a diagram skill alongside the screen-capture/render step (excalidraw, architecture-diagram, concept-diagrams)

ASCII / terminal art

  • Source material? — Generative / driven by audio / converting existing video / static image starting point
  • Color palette? — Brand-driven (gold/black/blue), Matrix green, full rainbow, monochrome
  • Audio reactivity? — None / loose mood / tight beat sync / FFT-driven
  • Character set? — ASCII only / Unicode block-drawing / mystic glyphs
  • Loop or narrative? — Perfect loop or one-shot

Abstract / loop

  • Mood / emotion? — One word that captures the feel
  • Motion type? — Zoom-into-itself, particle drift, wave, geometric, organic
  • Loop required? — Perfect loop (Droste-style) or just satisfying ending
  • Audio? — Silent, ambient pad, beat-synced

Documentary / interview cut

  • Source footage? — Provided clips, length per clip
  • Transcript / subtitles? — Provided or to be generated
  • Story structure? — Chronological / thematic / arc
  • B-roll approach? — Generated, stock library, none

Real-time / installation

  • Output environment? — Gallery wall, projector, screen, web embed
  • Audio source? — Live audio input, pre-recorded track, both
  • Reactivity tightness? — Mood-level (loose) vs. tight beat-sync vs. live parameter control
  • Tool preference?touchdesigner-mcp for full TD operator graphs; p5js for web-canvas; comfyui for generative-AI fed by audio features

Tier 2 — Always ask near the end

  • Brand assets path? — Where logo / color palette / fonts / music library lives
  • Output format requirements? — Codec preference, target file size, accepted alternates (vertical cut, GIF, audio-only)
  • Deadline? — Affects task max_runtime_seconds and acceptable scope
  • Quality bar? — Rough draft for review / polished final / archival
  • Existing footage / assets to reuse? — Anything that should appear, not just inform

Reasonable assumption defaults

When the user under-specifies, fill in these defaults rather than asking:

Question Default
Frame rate 30 fps for X / IG; 60 fps for tutorials/explainers; 24 fps for narrative film
Resolution 1080×1080 for square, 1920×1080 for 16:9, 1080×1920 for 9:16
Codec H.264 / yuv420p, CRF 18
Audio codec AAC 192 kbps
Voice Provider's mid-range neutral voice unless brand calls for distinctive timbre
Music Silent (require user to specify if music is wanted)
Captions On for explainer/tutorial; off for narrative/abstract unless requested
Quality bar Polished final unless user says draft

State the assumption explicitly: "Assuming 30fps and AAC audio unless you say otherwise — proceed?"

Anti-patterns

  • Asking 10 questions at once. Maximum 4 per turn.
  • Asking for things the brief already implies. If the user said "music video for my track," do not ask "is there a track?"
  • Failing to classify before drilling in. Tier-1 questions depend on classification; mixing them up wastes turns.
  • Treating "make a video" as enough to proceed. Always confirm the three baseline questions.