From 8163d371922768c32f43eb6036d7d36e56775605 Mon Sep 17 00:00:00 2001
From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com>
Date: Mon, 4 May 2026 12:54:50 +0530
Subject: [PATCH] fix(skill): reference built-in video_analyze/vision_analyze
 tools in kanban-video-orchestrator (#19562)

The tool-matrix.md had a vague 'Gemini multimodal / Claude vision' entry
in the external tools table that didn't point to the actual built-in
Hermes tools. Now that video_analyze exists (merged in #19301), update
the skill to reference it properly:

- Add 'Built-in Hermes tools for media review' section with proper
  toolset names, enablement instructions, and capability details
- Add video + vision toolsets to cinematographer, editor, and reviewer
  profile configs
- Update role-archetypes.md to reference tools by name
- Update API key table to explain video_analyze routing
- Drop the stale env_required block (OPENROUTER/ANTHROPIC keys) from the
  reviewer profile config
---
 .../references/role-archetypes.md             | 12 +++++-----
 .../references/tool-matrix.md                 | 24 ++++++++++++++-----
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md b/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
index 5a4cb207a2..95eaeb33b6 100644
--- a/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
+++ b/optional-skills/creative/kanban-video-orchestrator/references/role-archetypes.md
@@ -82,14 +82,14 @@ film and music video. Often pairs with a diagramming tool.
 Designs the visual language: framing, color, motion, transitions. Reviews
 generator output for visual consistency. Hands off per-scene `VISUAL_SPEC.md`.
 
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker` plus the visual skill that matches the project
   (e.g., `ascii-video` for ASCII work, `manim-video` for explainers,
   `touchdesigner-mcp` for real-time visuals, etc.)
 - **Outputs:** `scenes/scene-NN/VISUAL_SPEC.md`, review comments on renderer
   tasks
-- **Reviews via:** any media-analysis approach (Gemini multimodal, manual
-  inspection of clip thumbnails, ffprobe summaries)
+- **Reviews via:** `video_analyze` (sends full clip to multimodal LLM for
+  native review), `vision_analyze` for spot-checking frames, ffprobe summaries
 
 ## Production roles
 
@@ -247,10 +247,10 @@ specifically on what's off (pacing, sync, brand alignment, technical
 quality). Distinct from the cinematographer (who reviews visuals during
 production) and the editor (who reviews for assembly).
 
-- **Toolsets:** kanban, terminal, file
+- **Toolsets:** kanban, terminal, file, video, vision
 - **Skills:** `kanban-worker`
-- **External tools:** any media-analysis approach (Gemini multimodal,
-  ffprobe, manual frame extraction)
+- **Review tools:** `video_analyze` (native clip review via multimodal LLM),
+  `vision_analyze` (frame/thumbnail review), ffprobe
 - **Outputs:** `review-notes.md`, comments on tasks
 
 ### brand-cop
diff --git a/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md b/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
index 5c78c4ff3d..5a52d15ddd 100644
--- a/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
+++ b/optional-skills/creative/kanban-video-orchestrator/references/tool-matrix.md
@@ -81,7 +81,16 @@ them directly.
 | Remotion CLI (`npx remotion render`) | React-based motion graphics | renderer-motion-graphics |
 | Manim CE (`manim`) | Math animation render (driven by `manim-video` skill's recipes) | renderer-manim |
 | Blender (`blender -b`) | 3D rendering (alternative to `blender-mcp`) | renderer-3d |
-| Gemini multimodal / Claude vision | AI review of clips | reviewer, cinematographer, editor |
+
+## Built-in Hermes tools for media review
+
+These are native Hermes tools — not invoked via terminal but through their own
+toolsets. Enable them per-profile by adding the toolset to the profile config.
+
+| Tool | Toolset | What it does | Profiles that use it |
+|------|---------|--------------|----------------------|
+| `video_analyze` | `video` (opt-in — `hermes tools enable video`) | Native video understanding — sends full clip to a multimodal LLM (Gemini via OpenRouter) for review without frame extraction. Supports mp4, webm, mov, avi, mkv. 50 MB cap. Model: `AUXILIARY_VIDEO_MODEL` env → `AUXILIARY_VISION_MODEL` fallback. | reviewer, cinematographer, editor |
+| `vision_analyze` | `vision` (core — enabled by default) | Image/frame analysis — review stills, thumbnails, exported frames. Already available to all profiles without opt-in. | reviewer, cinematographer, concept-artist |
 
 ## Standard toolset configurations per role
 
@@ -156,6 +165,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video               # video_analyze — review full clips natively
+  - vision              # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
@@ -246,6 +257,8 @@ toolsets:
   - kanban
   - terminal
   - file
+  - video              # video_analyze — editor reviews assembled cuts natively
+  - vision             # vision_analyze — spot-check frames
 skills:
   always_load:
     - kanban-worker
@@ -259,14 +272,13 @@ For captioner add Whisper invocation patterns to the SOUL.md.
 ```yaml
 toolsets:
   - kanban
-  - terminal           # for media inspection
+  - terminal           # for media inspection (ffprobe, etc.)
   - file
+  - video              # video_analyze — review full clips natively
+  - vision             # vision_analyze — review stills / exported frames
 skills:
   always_load:
     - kanban-worker
-env_required:
-  - OPENROUTER_API_KEY    # if using Gemini multimodal review
-  # or ANTHROPIC_API_KEY if using Claude vision (already required globally)
 ```
 
 ## API key requirements
@@ -278,7 +290,7 @@ key is present in `~/.hermes/.env` (or macOS Keychain) before firing the kanban.
 |---------|---------|---------|
 | ElevenLabs | `ELEVENLABS_API_KEY` | voice-talent |
 | OpenAI | `OPENAI_API_KEY` | image-generator (DALL-E), voice-talent (TTS) |
-| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (Gemini multimodal review) |
+| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (`video_analyze` routes through `AUXILIARY_VIDEO_MODEL` → OpenRouter) |
 | FAL | `FAL_KEY` | image-generator (FAL flux models) |
 | Replicate | `REPLICATE_API_TOKEN` | image-generator (alternate provider) |
 | Runway | `RUNWAY_API_KEY` | image-to-video-generator |