diff --git a/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md b/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
index 5a4cb207a2..95eaeb33b6 100644
--- a/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
+++ b/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
@@ -82,14 +82,14 @@ film and music video. Often pairs with a diagramming tool.
 Designs the visual language: framing, color, motion, transitions. Reviews
 generator output for visual consistency. Hands off per-scene `VISUAL_SPEC.md`.
 
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker` plus the visual skill that matches the project
   (e.g., `ascii-video` for ASCII work, `manim-video` for explainers,
   `touchdesigner-mcp` for real-time visuals, etc.)
 - **Outputs:** `scenes/scene-NN/VISUAL_SPEC.md`, review comments on renderer
   tasks
-- **Reviews via:** any media-analysis approach (Gemini multimodal, manual
-  inspection of clip thumbnails, ffprobe summaries)
+- **Reviews via:** `video_analyze` (sends full clip to multimodal LLM for
+  native review), `vision_analyze` for spot-checking frames, ffprobe summaries
 
 ## Production roles
 
@@ -247,10 +247,10 @@ specifically on what's off (pacing, sync, brand alignment, technical quality).
 Distinct from the cinematographer (who reviews visuals during production)
 and the editor (who reviews for assembly).
 
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker`
-- **External tools:** any media-analysis approach (Gemini multimodal,
-  ffprobe, manual frame extraction)
+- **Review tools:** `video_analyze` (native clip review via multimodal LLM),
+  `vision_analyze` (frame/thumbnail review), ffprobe
 - **Outputs:** `review-notes.md`, comments on tasks
 
 ### brand-cop
diff --git a/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md b/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
index 5c78c4ff3d..5a52d15ddd 100644
--- a/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
+++ b/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
@@ -81,7 +81,16 @@ them directly.
 | Remotion CLI (`npx remotion render`) | React-based motion graphics | renderer-motion-graphics |
 | Manim CE (`manim`) | Math animation render (driven by `manim-video` skill's recipes) | renderer-manim |
 | Blender (`blender -b`) | 3D rendering (alternative to `blender-mcp`) | renderer-3d |
-| Gemini multimodal / Claude vision | AI review of clips | reviewer, cinematographer, editor |
+
+## Built-in Hermes tools for media review
+
+These are native Hermes tools — not invoked via terminal but through their own
+toolsets. Enable them per-profile by adding the toolset to the profile config.
+
+| Tool | Toolset | What it does | Profile that uses it |
+|------|---------|--------------|----------------------|
+| `video_analyze` | `video` (opt-in — `hermes tools enable video`) | Native video understanding — sends full clip to a multimodal LLM (Gemini via OpenRouter) for review without frame extraction. Supports mp4, webm, mov, avi, mkv. 50 MB cap. Model: `AUXILIARY_VIDEO_MODEL` env → `AUXILIARY_VISION_MODEL` fallback. | reviewer, cinematographer, editor |
+| `vision_analyze` | `vision` (core — enabled by default) | Image/frame analysis — review stills, thumbnails, exported frames. Already available to all profiles without opt-in. | reviewer, cinematographer, concept-artist |
 
 ## Standard toolset configurations per role
 
@@ -156,6 +165,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video     # video_analyze — review full clips natively
+  - vision    # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
@@ -246,6 +257,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video     # video_analyze — editor reviews assembled cuts natively
+  - vision    # vision_analyze — spot-check frames
 skills:
   always_load:
     - kanban-worker
@@ -259,14 +272,13 @@ For captioner add Whisper invocation patterns to the SOUL.md.
 ```yaml
 toolsets:
   - kanban
-  - terminal  # for media inspection
+  - terminal  # for media inspection (ffprobe, etc.)
   - file
+  - video     # video_analyze — review full clips natively
+  - vision    # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
-env_required:
-  - OPENROUTER_API_KEY  # if using Gemini multimodal review
-                        # or ANTHROPIC_API_KEY if using Claude vision (already required globally)
 ```
 
 ## API key requirements
@@ -278,7 +290,7 @@ key is present in `~/.hermes/.env` (or macOS Keychain) before firing the kanban.
 |---------|---------|---------|
 | ElevenLabs | `ELEVENLABS_API_KEY` | voice-talent |
 | OpenAI | `OPENAI_API_KEY` | image-generator (DALL-E), voice-talent (TTS) |
-| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (Gemini multimodal review) |
+| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (`video_analyze` routes through `AUXILIARY_VIDEO_MODEL` → OpenRouter) |
 | FAL | `FAL_KEY` | image-generator (FAL flux models) |
 | Replicate | `REPLICATE_API_TOKEN` | image-generator (alternate provider) |
 | Runway | `RUNWAY_API_KEY` | image-to-video-generator |
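
The new `video_analyze` row packs three operational rules into one cell: accepted containers (mp4, webm, mov, avi, mkv), a 50 MB cap, and model selection through `AUXILIARY_VIDEO_MODEL` with `AUXILIARY_VISION_MODEL` as the fallback. A reviewer, cinematographer, or editor profile could check a clip against those rules before calling the tool. The following is a minimal sketch under stated assumptions, not Hermes's implementation: `resolve_video_model` and `clip_problems` are hypothetical names, "50 MB" is read as 50 MiB (the diff does not say which unit is meant), blank variables are treated as unset, and returning `None` when neither variable is set stands in for whatever default Hermes actually applies.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Formats and size cap as listed for video_analyze in the tool matrix.
# ASSUMPTION: "50 MB" is interpreted as 50 MiB; use 50 * 10**6 if Hermes means decimal MB.
SUPPORTED_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".mkv"}
MAX_CLIP_BYTES = 50 * 1024 * 1024


def resolve_video_model(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: prefer AUXILIARY_VIDEO_MODEL, fall back to AUXILIARY_VISION_MODEL.

    Whitespace-only values count as unset. Returns None when neither is set
    (ASSUMPTION: the real tool may apply its own default model instead).
    """
    for var in ("AUXILIARY_VIDEO_MODEL", "AUXILIARY_VISION_MODEL"):
        value = env.get(var, "").strip()
        if value:
            return value
    return None


def clip_problems(path: Path) -> List[str]:
    """Hypothetical pre-flight check; returns the reasons a clip would be rejected.

    An empty list means the clip passes the format and size rules above.
    """
    problems: List[str] = []
    # Extension check is case-insensitive, so "clip.MP4" is accepted.
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file not found")
    else:
        size = path.stat().st_size
        if size > MAX_CLIP_BYTES:
            problems.append(f"too large: {size} bytes > {MAX_CLIP_BYTES}")
    return problems
```

For a hypothetical clip path such as `renders/scene-03.mp4`, an empty list from `clip_problems` means the clip fits the documented limits; a non-empty list gives reasons a reviewer could paste into a task comment instead of sending an oversized or unsupported file.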