The kanban prefix makes the skill discoverable alongside `kanban-orchestrator` and `kanban-worker`, and signals up front that this skill drives the kanban plugin rather than being a generic video tool. Updated: - directory rename - SKILL.md frontmatter `name:` and H1 - setup.sh.tmpl header
7.8 KiB
Intake — Discovery Question Banks
The discovery process is adaptive. Always start with three baseline questions to identify the broad style category, then drill into a per-style question bank. Ask 2-4 questions at a time, listen, then proceed. Make reasonable assumptions whenever the user implies an answer.
Tier 0 — Baseline (always ask)
- What is the video? — One-sentence pitch
- How long? — Approximate duration
- Aspect ratio + target platform? — 16:9 / 9:16 / 1:1 / 4:5; X, IG, YouTube, internal, etc.
From these answers, classify the style category and pick the relevant Tier 1 follow-ups. Do not continue asking until you have at least these three.
Style classification
Map the brief to one of these archetypes (or a hybrid):
| Archetype | Tells |
|---|---|
| Narrative film | Plot, characters, scenes-with-events, dialogue, location |
| Product / marketing | A specific product or feature being shown / sold; CTA at end |
| Music video | A specific track exists; visuals sync to music |
| Explainer / educational | A concept being taught; voiceover-driven |
| Tutorial / changelog | Software demo, terminal-heavy, technical |
| ASCII / terminal art | Retro terminal aesthetic explicit, character-grid |
| Abstract / loop | Generative, no plot, often perfect-loop |
| Documentary / interview cut | Real footage, transcription-driven |
| Real-time / installation | Audio-reactive, gallery installation, VJ output |
If ambiguous, ask which category fits — don't guess. Hybrids are common (e.g., a product video with a narrative arc); decompose into the dominant mode + secondary modifiers.
Recursive / meta ("a video that shows its own production") is a rendering technique, not a separate style — compose it from any of the above by adding a two-pass render step where pass 2 uses pass 1's output as texture inside the final scene.
Tier 1 — Per-style follow-ups
Narrative film
- Setting / world? — When and where the story takes place
- Characters? — How many, archetypes, who carries dialogue
- Beat list or full script? — Has the user written the story or do we draft it
- Dialogue language? — Spoken lines, on-screen subs only, silent
- Visual generation approach? — Text-to-image (FAL/Midjourney/Imagen) → image-to-video (Runway/Kling), 3D animation (Blender), 2D animation, procedural, or hybrid
- Voice approach? — TTS (which voice), recorded VO, no dialogue
- Music / score? — Commissioned (via
songwriting-and-ai-musicSuno prompts, or localheartmula), licensed track provided, silent
Product / marketing
- Product? — Name, what it does, key feature being shown
- Target audience? — Who's watching, what they care about
- CTA? — Visit URL, install, sign up, etc.
- Tone? — Serious, playful, technical, premium, edgy
- Brand assets available? — Logo files, color palette, fonts, existing footage
- Animation style? — Motion graphics (Remotion / AE-style), screen recording, generative, illustrated
- Voiceover? — Yes (which voice / language) or text-only
- Music? — Track provided, license-free needed, custom-composed
Music video
- Track file? — Path to the audio (essential — we'll analyze BPM + beats)
- Track length to use? — Full song or a section
- Genre / energy? — Tells what visual rhythm and density to use
- Lyric / narrative content? — Are there lyrics to render on screen, or is it purely visual?
- Visual reference style? — Existing music videos / artists for reference
- Performer footage? — None, has clips, will provide
- Visual generation approach? — Per-beat generative, edit-driven cuts of stock footage, illustrated, hybrid
Explainer / educational
- What concept is being taught? — One-sentence concept, key takeaway
- Audience expertise? — Beginner / intermediate / expert
- Diagram density? — Heavy math / formulas / code / abstract concepts
- Voiceover? — TTS / recorded / on-screen text only
- Tool preference? —
manim-video(math),p5js(generative), Remotion (UI motion graphics),comfyui(AI-generated visuals),ascii-video(technical/retro), hybrid - Pacing? — Fast and dense (3Blue1Brown) or slow and contemplative
Tutorial / changelog / software demo
- Software being demonstrated? — Name, what it does
- Demo script? — Sequence of commands / screens to show
- Terminal-only or with GUI?
- Voiceover for narration?
- Diagram support needed? — Often these benefit from a diagram skill
alongside the screen-capture/render step (
excalidraw,architecture-diagram,concept-diagrams)
ASCII / terminal art
- Source material? — Generative / driven by audio / converting existing video / static image starting point
- Color palette? — Brand-driven (gold/black/blue), Matrix green, full rainbow, monochrome
- Audio reactivity? — None / loose mood / tight beat sync / FFT-driven
- Character set? — ASCII only / Unicode block-drawing / mystic glyphs
- Loop or narrative? — Perfect loop or one-shot
Abstract / loop
- Mood / emotion? — One word that captures the feel
- Motion type? — Zoom-into-itself, particle drift, wave, geometric, organic
- Loop required? — Perfect loop (Droste-style) or just satisfying ending
- Audio? — Silent, ambient pad, beat-synced
Documentary / interview cut
- Source footage? — Provided clips, length per clip
- Transcript / subtitles? — Provided or to be generated
- Story structure? — Chronological / thematic / arc
- B-roll approach? — Generated, stock library, none
Real-time / installation
- Output environment? — Gallery wall, projector, screen, web embed
- Audio source? — Live audio input, pre-recorded track, both
- Reactivity tightness? — Mood-level (loose) vs. tight beat-sync vs. live parameter control
- Tool preference? —
touchdesigner-mcpfor full TD operator graphs;p5jsfor web-canvas;comfyuifor generative-AI fed by audio features
Tier 2 — Always ask near the end
- Brand assets path? — Where logo / color palette / fonts / music library lives
- Output format requirements? — Codec preference, target file size, accepted alternates (vertical cut, GIF, audio-only)
- Deadline? — Affects task
max_runtime_secondsand acceptable scope - Quality bar? — Rough draft for review / polished final / archival
- Existing footage / assets to reuse? — Anything that should appear, not just inform
Reasonable assumption defaults
When the user under-specifies, fill in these defaults rather than asking:
| Question | Default |
|---|---|
| Frame rate | 30 fps for X / IG; 60 fps for tutorials/explainers; 24 fps for narrative film |
| Resolution | 1080×1080 for square, 1920×1080 for 16:9, 1080×1920 for 9:16 |
| Codec | H.264 / yuv420p, CRF 18 |
| Audio codec | AAC 192 kbps |
| Voice | Provider's mid-range neutral voice unless brand calls for distinctive timbre |
| Music | Silent (require user to specify if music is wanted) |
| Captions | On for explainer/tutorial; off for narrative/abstract unless requested |
| Quality bar | Polished final unless user says draft |
State the assumption explicitly: "Assuming 30fps and AAC audio unless you say otherwise — proceed?"
Anti-patterns
- Asking 10 questions at once. Maximum 4 per turn.
- Asking for things the brief already implies. If the user said "music video for my track," do not ask "is there a track?"
- Failing to classify before drilling in. Tier-1 questions depend on classification; mixing them up wastes turns.
- Treating "make a video" as enough to proceed. Always confirm the three baseline questions.