fix(skill): reference built-in video_analyze/vision_analyze tools in kanban-video-orchestrator (#19562)

The tool-matrix.md had a vague 'Gemini multimodal / Claude vision' entry
in the external tools table that didn't point to the actual built-in
Hermes tools. Now that video_analyze exists (merged in #19301), update
the skill to reference it properly:

- Add 'Built-in Hermes tools for media review' section with proper
  toolset names, enablement instructions, and capability details
- Add video + vision toolsets to cinematographer, editor, and reviewer
  profile configs
- Update role-archetypes.md to reference tools by name
- Update API key table to explain video_analyze routing
Siddharth Balyan 2026-05-04 12:54:50 +05:30 committed by GitHub
parent a11aed1acc
commit 8163d37192
GPG key ID: B5690EEEBB952194
2 changed files with 24 additions and 12 deletions


@@ -82,14 +82,14 @@ film and music video. Often pairs with a diagramming tool.
 Designs the visual language: framing, color, motion, transitions. Reviews
 generator output for visual consistency. Hands off per-scene `VISUAL_SPEC.md`.
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker` plus the visual skill that matches the project
   (e.g., `ascii-video` for ASCII work, `manim-video` for explainers,
   `touchdesigner-mcp` for real-time visuals, etc.)
 - **Outputs:** `scenes/scene-NN/VISUAL_SPEC.md`, review comments on renderer
   tasks
-- **Reviews via:** any media-analysis approach (Gemini multimodal, manual
-  inspection of clip thumbnails, ffprobe summaries)
+- **Reviews via:** `video_analyze` (sends full clip to multimodal LLM for
+  native review), `vision_analyze` for spot-checking frames, ffprobe summaries
 ## Production roles
@@ -247,10 +247,10 @@ specifically on what's off (pacing, sync, brand alignment, technical
 quality). Distinct from the cinematographer (who reviews visuals during
 production) and the editor (who reviews for assembly).
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker`
-- **External tools:** any media-analysis approach (Gemini multimodal,
-  ffprobe, manual frame extraction)
+- **Review tools:** `video_analyze` (native clip review via multimodal LLM),
+  `vision_analyze` (frame/thumbnail review), ffprobe
 - **Outputs:** `review-notes.md`, comments on tasks
 ### brand-cop


@@ -81,7 +81,16 @@ them directly.
 | Remotion CLI (`npx remotion render`) | React-based motion graphics | renderer-motion-graphics |
 | Manim CE (`manim`) | Math animation render (driven by `manim-video` skill's recipes) | renderer-manim |
 | Blender (`blender -b`) | 3D rendering (alternative to `blender-mcp`) | renderer-3d |
-| Gemini multimodal / Claude vision | AI review of clips | reviewer, cinematographer, editor |
+## Built-in Hermes tools for media review
+These are native Hermes tools — not invoked via terminal but through their own
+toolsets. Enable them per-profile by adding the toolset to the profile config.
+| Tool | Toolset | What it does | Profile that uses it |
+|------|---------|--------------|----------------------|
+| `video_analyze` | `video` (opt-in — `hermes tools enable video`) | Native video understanding — sends full clip to a multimodal LLM (Gemini via OpenRouter) for review without frame extraction. Supports mp4, webm, mov, avi, mkv. 50 MB cap. Model: `AUXILIARY_VIDEO_MODEL` env → `AUXILIARY_VISION_MODEL` fallback. | reviewer, cinematographer, editor |
+| `vision_analyze` | `vision` (core — enabled by default) | Image/frame analysis — review stills, thumbnails, exported frames. Already available to all profiles without opt-in. | reviewer, cinematographer, concept-artist |
 ## Standard toolset configurations per role
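
Reviewer note: the `video_analyze` row in this hunk states three constraints (supported containers, a 50 MB cap, and an `AUXILIARY_VIDEO_MODEL` to `AUXILIARY_VISION_MODEL` fallback). A minimal pre-flight sketch of those constraints, which a worker could run before calling the tool, might look like the following. The function names `resolve_video_model` and `preflight` are hypothetical and are not part of Hermes; the cap is assumed to mean 50 MiB, which the diff does not specify.

```python
import os
from pathlib import Path

# Containers listed in the video_analyze row of the tool matrix.
SUPPORTED_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".mkv"}

# Assumption: "50 MB" is read as 50 MiB; the source does not say which.
MAX_BYTES = 50 * 1024 * 1024


def resolve_video_model(env=None):
    """Mirror the documented fallback: video model first, then vision model.

    Hypothetical helper; returns None when neither variable is set.
    """
    env = os.environ if env is None else env
    return env.get("AUXILIARY_VIDEO_MODEL") or env.get("AUXILIARY_VISION_MODEL")


def preflight(path, size_bytes=None):
    """Return a list of human-readable problems; an empty list means OK.

    size_bytes may be passed explicitly (useful in tests); otherwise the
    file is stat'ed on disk.
    """
    clip = Path(path)
    problems = []
    if clip.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported container: {clip.suffix or '(none)'}")
    size = clip.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"clip is {size} bytes, over the {MAX_BYTES} byte cap")
    return problems
```

A clip that fails the size check could be transcoded or trimmed with ffmpeg from the `terminal` toolset before review, or reviewed as exported frames with `vision_analyze` instead.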
@@ -156,6 +165,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video # video_analyze — review full clips natively
+  - vision # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
@@ -246,6 +257,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video # video_analyze — editor reviews assembled cuts natively
+  - vision # vision_analyze — spot-check frames
 skills:
   always_load:
     - kanban-worker
@@ -259,14 +272,13 @@ For captioner add Whisper invocation patterns to the SOUL.md.
 ```yaml
 toolsets:
   - kanban
-  - terminal # for media inspection
+  - terminal # for media inspection (ffprobe, etc.)
   - file
+  - video # video_analyze — review full clips natively
+  - vision # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
-env_required:
-  - OPENROUTER_API_KEY # if using Gemini multimodal review
-  # or ANTHROPIC_API_KEY if using Claude vision (already required globally)
 ```
 ## API key requirements
@@ -278,7 +290,7 @@ key is present in `~/.hermes/.env` (or macOS Keychain) before firing the kanban.
 |---------|---------|---------|
 | ElevenLabs | `ELEVENLABS_API_KEY` | voice-talent |
 | OpenAI | `OPENAI_API_KEY` | image-generator (DALL-E), voice-talent (TTS) |
-| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (Gemini multimodal review) |
+| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (`video_analyze` routes through `AUXILIARY_VIDEO_MODEL` → OpenRouter) |
 | FAL | `FAL_KEY` | image-generator (FAL flux models) |
 | Replicate | `REPLICATE_API_TOKEN` | image-generator (alternate provider) |
 | Runway | `RUNWAY_API_KEY` | image-to-video-generator |
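
Reviewer note: this hunk removes `env_required` from the reviewer profile, so the key check described in the hunk header context (key present in `~/.hermes/.env` before firing the kanban) becomes the only guard. A rough sketch of such a check is below. It is an illustration under stated assumptions: `parse_env` and `missing_keys` are hypothetical helpers, the role-to-key mapping is copied from a subset of the table rows, the `.env` file is assumed to hold plain `NAME=value` lines, and the macOS Keychain path is not covered.

```python
# Subset of the API key table: roles whose key requirement is unambiguous.
REQUIRED_BY_ROLE = {
    "reviewer": ["OPENROUTER_API_KEY"],
    "cinematographer": ["OPENROUTER_API_KEY"],
    "editor": ["OPENROUTER_API_KEY"],
    "image-to-video-generator": ["RUNWAY_API_KEY"],
}


def parse_env(text):
    """Parse simple NAME=value lines; comments and blank lines are skipped.

    Assumption: no `export` prefixes or multi-line values.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(roles, env_text):
    """Return the sorted env var names the given roles need but lack."""
    present = {name for name, value in parse_env(env_text).items() if value}
    needed = {key for role in roles for key in REQUIRED_BY_ROLE.get(role, [])}
    return sorted(needed - present)
```

In use, the text would come from reading `~/.hermes/.env` (for example with `Path.home() / ".hermes" / ".env"`), and a non-empty result would stop the orchestrator before any tasks are created.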