mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-07 02:51:50 +00:00
fix(skill): reference built-in video_analyze/vision_analyze tools in kanban-video-orchestrator (#19562)
The tool-matrix.md had a vague 'Gemini multimodal / Claude vision' entry in the external tools table that didn't point to the actual built-in Hermes tools. Now that video_analyze exists (merged in #19301), update the skill to reference it properly: - Add 'Built-in Hermes tools for media review' section with proper toolset names, enablement instructions, and capability details - Add video + vision toolsets to cinematographer, editor, and reviewer profile configs - Update role-archetypes.md to reference tools by name - Update API key table to explain video_analyze routing
This commit is contained in:
parent
a11aed1acc
commit
8163d37192
2 changed files with 24 additions and 12 deletions
|
|
@ -82,14 +82,14 @@ film and music video. Often pairs with a diagramming tool.
|
|||
Designs the visual language: framing, color, motion, transitions. Reviews
|
||||
generator output for visual consistency. Hands off per-scene `VISUAL_SPEC.md`.
|
||||
|
||||
- **Toolsets:** kanban, terminal, file
|
||||
- **Toolsets:** kanban, terminal, file, video, vision
|
||||
- **Skills:** `kanban-worker` plus the visual skill that matches the project
|
||||
(e.g., `ascii-video` for ASCII work, `manim-video` for explainers,
|
||||
`touchdesigner-mcp` for real-time visuals, etc.)
|
||||
- **Outputs:** `scenes/scene-NN/VISUAL_SPEC.md`, review comments on renderer
|
||||
tasks
|
||||
- **Reviews via:** any media-analysis approach (Gemini multimodal, manual
|
||||
inspection of clip thumbnails, ffprobe summaries)
|
||||
- **Reviews via:** `video_analyze` (sends full clip to multimodal LLM for
|
||||
native review), `vision_analyze` for spot-checking frames, ffprobe summaries
|
||||
|
||||
## Production roles
|
||||
|
||||
|
|
@ -247,10 +247,10 @@ specifically on what's off (pacing, sync, brand alignment, technical
|
|||
quality). Distinct from the cinematographer (who reviews visuals during
|
||||
production) and the editor (who reviews for assembly).
|
||||
|
||||
- **Toolsets:** kanban, terminal, file
|
||||
- **Toolsets:** kanban, terminal, file, video, vision
|
||||
- **Skills:** `kanban-worker`
|
||||
- **External tools:** any media-analysis approach (Gemini multimodal,
|
||||
ffprobe, manual frame extraction)
|
||||
- **Review tools:** `video_analyze` (native clip review via multimodal LLM),
|
||||
`vision_analyze` (frame/thumbnail review), ffprobe
|
||||
- **Outputs:** `review-notes.md`, comments on tasks
|
||||
|
||||
### brand-cop
|
||||
|
|
|
|||
|
|
@ -81,7 +81,16 @@ them directly.
|
|||
| Remotion CLI (`npx remotion render`) | React-based motion graphics | renderer-motion-graphics |
|
||||
| Manim CE (`manim`) | Math animation render (driven by `manim-video` skill's recipes) | renderer-manim |
|
||||
| Blender (`blender -b`) | 3D rendering (alternative to `blender-mcp`) | renderer-3d |
|
||||
| Gemini multimodal / Claude vision | AI review of clips | reviewer, cinematographer, editor |
|
||||
|
||||
## Built-in Hermes tools for media review
|
||||
|
||||
These are native Hermes tools — not invoked via terminal but through their own
|
||||
toolsets. Enable them per-profile by adding the toolset to the profile config.
|
||||
|
||||
| Tool | Toolset | What it does | Profile that uses it |
|
||||
|------|---------|--------------|----------------------|
|
||||
| `video_analyze` | `video` (opt-in — `hermes tools enable video`) | Native video understanding — sends full clip to a multimodal LLM (Gemini via OpenRouter) for review without frame extraction. Supports mp4, webm, mov, avi, mkv. 50 MB cap. Model: `AUXILIARY_VIDEO_MODEL` env → `AUXILIARY_VISION_MODEL` fallback. | reviewer, cinematographer, editor |
|
||||
| `vision_analyze` | `vision` (core — enabled by default) | Image/frame analysis — review stills, thumbnails, exported frames. Already available to all profiles without opt-in. | reviewer, cinematographer, concept-artist |
|
||||
|
||||
## Standard toolset configurations per role
|
||||
|
||||
|
|
@ -156,6 +165,8 @@ toolsets:
|
|||
- kanban
|
||||
- terminal
|
||||
- file
|
||||
- video # video_analyze — review full clips natively
|
||||
- vision # vision_analyze — review stills / exported frames
|
||||
skills:
|
||||
always_load:
|
||||
- kanban-worker
|
||||
|
|
@ -246,6 +257,8 @@ toolsets:
|
|||
- kanban
|
||||
- terminal
|
||||
- file
|
||||
- video # video_analyze — editor reviews assembled cuts natively
|
||||
- vision # vision_analyze — spot-check frames
|
||||
skills:
|
||||
always_load:
|
||||
- kanban-worker
|
||||
|
|
@ -259,14 +272,13 @@ For captioner add Whisper invocation patterns to the SOUL.md.
|
|||
```yaml
|
||||
toolsets:
|
||||
- kanban
|
||||
- terminal # for media inspection
|
||||
- terminal # for media inspection (ffprobe, etc.)
|
||||
- file
|
||||
- video # video_analyze — review full clips natively
|
||||
- vision # vision_analyze — review stills / exported frames
|
||||
skills:
|
||||
always_load:
|
||||
- kanban-worker
|
||||
env_required:
|
||||
- OPENROUTER_API_KEY # if using Gemini multimodal review
|
||||
# or ANTHROPIC_API_KEY if using Claude vision (already required globally)
|
||||
```
|
||||
|
||||
## API key requirements
|
||||
|
|
@ -278,7 +290,7 @@ key is present in `~/.hermes/.env` (or macOS Keychain) before firing the kanban.
|
|||
|---------|---------|---------|
|
||||
| ElevenLabs | `ELEVENLABS_API_KEY` | voice-talent |
|
||||
| OpenAI | `OPENAI_API_KEY` | image-generator (DALL-E), voice-talent (TTS) |
|
||||
| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (Gemini multimodal review) |
|
||||
| OpenRouter | `OPENROUTER_API_KEY` | reviewer, cinematographer, editor (`video_analyze` routes through `AUXILIARY_VIDEO_MODEL` → OpenRouter) |
|
||||
| FAL | `FAL_KEY` | image-generator (FAL flux models) |
|
||||
| Replicate | `REPLICATE_API_TOKEN` | image-generator (alternate provider) |
|
||||
| Runway | `RUNWAY_API_KEY` | image-to-video-generator |
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue