* feat(video_gen): unified video_generate tool with pluggable provider backends

One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools.

Surface (one schema, every backend):
- operation: generate / edit / extend
- modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url)
- reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override
- Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities(); backend-specific quirks stay in the backend, the agent learns one tool

Backends shipped:
- plugins/video_gen/xai/: Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface)
- plugins/video_gen/fal/: Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare

Wiring:
- agent/video_gen_provider.py: VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/
- agent/video_gen_registry.py: thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml
- hermes_cli/plugins.py: PluginContext.register_video_gen_provider()
- hermes_cli/tools_config.py: Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model}
- toolsets.py: new video_gen toolset
- tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
- docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated

Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone; easy port to plugin interface).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): dynamic schema reflects active backend's capabilities

Address the 'capability variance' question: instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model.

The agent sees backend-specific guidance up-front:
- 'fal-ai/veo3.1/image-to-video': 'image-to-video only: image_url is REQUIRED; text-only prompts will be rejected'
- 'fal-ai/veo3.1' (t2v): no image_url restriction shown
- xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls'
- Backends without edit/extend: 'not supported on this backend: surface that they need to switch backends via hermes tools'

This is the same pattern PR #22694 used for delegate_task self-capping, documented in the dynamic-tool-schemas skill.

Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically.

Tested:
- Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise); client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip
- Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
- 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing
- 37/37 in the slice, 134/134 in the broader regression set

* test(video_gen/xai): full surface integration tests + cleaner schema

Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping.

15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs, so the tests assert the actual payload the plugin would send to xAI, not just the success/error envelope.

Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads:

  Active backend: xAI . model: grok-imagine-video
  - operations supported by this backend: edit, extend, generate
  - modalities supported by this backend: image, reference_images, text
  - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
  - resolution choices: 480p, 720p
  - duration range: 1-15s
  - reference_image_urls: up to 7 images

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing

Two design changes per Teknium:

1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema.

2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options; they pick 'veo3.1', and the plugin handles the rest.

Catalog rewritten as families:

  veo3.1             fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
  pixverse-v6        fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video
  kling-o3-standard  fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video

xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload; no edit/extend exposure.

Schema changes:
- VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model.
- VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key.
- success_response: add 'modality' field ('text' | 'image') so the agent and logs can see which endpoint was actually hit.

Dynamic schema builder simplified: no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads:

  Active backend: FAL . model: pixverse-v6
  - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically
  - aspect_ratio choices: 16:9, 9:16, 1:1
  - resolution choices: 360p, 540p, 720p, 1080p
  - duration range: 1-15s
  - audio: pass audio=true to enable native audio (pricing tier)
  - negative_prompt: supported

Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard.

Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface.

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers

Catalog now covers everything Teknium specced from FAL:

Cheap tier:
  ltx-2.3       fal-ai/ltx-2.3-22b/text-to-video / image-to-video
  pixverse-v6   fal-ai/pixverse/v6/text-to-video / image-to-video

Premium tier:
  veo3.1        fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
  seedance-2.0  bytedance/seedance-2.0/text-to-video / image-to-video
  kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video / image-to-video
  happy-horse   fal-ai/happy-horse/text-to-video / image-to-video

DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities): better first-run UX for users who haven't explicitly picked a model.

New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url.

Per-family capability flags reflect each model's docs:
- LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL; let endpoint apply defaults)
- Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs
- Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
- Veo 3.1: unchanged, 16:9/9:16, 4/6/8s

Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green.

Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available.

* test(video_gen): tool-surface routing matrix, every model x modality

End-to-end matrix test driven through _handle_video_generate(), the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape.

Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free).

Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases

For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI)

All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage.

* feat(video_gen): keep video_gen out of first-run setup, surface in status

Two changes:

1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow; most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker.

2. The 'hermes setup' status panel learns about video_gen, but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.

Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.

Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier: long wait, paid per-second pricing. Default-off matches user intent better.

---------

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
| sidebar_position | title | description |
|---|---|---|
| 4 | Toolsets Reference | Reference for Hermes core, composite, platform, and dynamic toolsets |
Toolsets Reference
Toolsets are named bundles of tools that control what the agent can do. They're the primary mechanism for configuring tool availability per platform, per session, or per task.
How Toolsets Work
Every tool belongs to exactly one toolset. When you enable a toolset, all tools in that bundle become available to the agent. Toolsets come in three kinds:
- Core: a single logical group of related tools (e.g., file bundles read_file, write_file, patch, search_files)
- Composite: combines multiple core toolsets for a common scenario (e.g., debugging bundles file, terminal, and web tools)
- Platform: a complete tool configuration for a specific deployment context (e.g., hermes-cli is the default for interactive CLI sessions)
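The relationship between core and composite toolsets can be sketched in a few lines. This is only an illustration: the CORE and COMPOSITE tables and the resolve_tools helper are hypothetical names invented for this example, not Hermes's internal registry. Only the toolset and tool names are taken from this page.

```python
# Hypothetical model of toolsets; not Hermes's actual registry.
CORE = {
    "file": {"patch", "read_file", "search_files", "write_file"},
    "terminal": {"process", "terminal"},
    "web": {"web_extract", "web_search"},
}

# A composite toolset is just a list of other toolsets.
COMPOSITE = {
    "debugging": ["file", "terminal", "web"],
}


def resolve_tools(name: str) -> set[str]:
    """Return the tool names that enabling `name` would make available."""
    if name in CORE:
        return set(CORE[name])
    if name in COMPOSITE:
        tools: set[str] = set()
        for member in COMPOSITE[name]:
            tools |= resolve_tools(member)  # composites expand recursively
        return tools
    raise KeyError(f"unknown toolset: {name}")


print(sorted(resolve_tools("debugging")))
```

Under these assumptions, enabling debugging yields the union of the file, terminal, and web tools.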
Configuring Toolsets
Per-session (CLI)
hermes chat --toolsets web,file,terminal
hermes chat --toolsets debugging # composite — expands to file + terminal + web
hermes chat --toolsets all # everything
Per-platform (config.yaml)
toolsets:
- hermes-cli # default for CLI
# - hermes-telegram # override for Telegram gateway
Interactive management
hermes tools # curses UI to enable/disable per platform
Or in-session:
/tools list
/tools disable browser
/tools enable rl
Core Toolsets
| Toolset | Tools | Purpose |
|---|---|---|
| browser | browser_back, browser_cdp, browser_click, browser_console, browser_dialog, browser_get_images, browser_navigate, browser_press, browser_scroll, browser_snapshot, browser_type, browser_vision, web_search | Core browser automation. Includes web_search as a fallback for quick lookups. browser_cdp and browser_dialog are gated at runtime: they are registered only when a CDP endpoint is reachable at session start (via /browser connect, browser.cdp_url config, Browserbase, or Camofox). browser_dialog works together with the pending_dialogs and frame_tree fields that browser_snapshot adds when a CDP supervisor is attached. |
| clarify | clarify | Ask the user a question when the agent needs clarification. |
| code_execution | execute_code | Run Python scripts that call Hermes tools programmatically. |
| cronjob | cronjob | Schedule and manage recurring tasks. |
| debugging | composite (file + terminal + web) | Debug bundle: file, process/terminal, web extract/search. |
| delegation | delegate_task | Spawn isolated subagent instances for parallel work. |
| discord | discord | Core Discord text/embed/DM actions (gateway-only). Active on the hermes-discord toolset. |
| discord_admin | discord_admin | Discord moderation (bans, role changes, channel management). Active on the hermes-discord toolset; requires the bot to hold the relevant Discord permissions. |
| feishu_doc | feishu_doc_read | Read Feishu/Lark document content. Used by the Feishu document-comment intelligent-reply handler. |
| feishu_drive | feishu_drive_add_comment, feishu_drive_list_comments, feishu_drive_list_comment_replies, feishu_drive_reply_comment | Feishu/Lark drive comment operations. Scoped to the comment agent; not exposed on hermes-cli or other messaging toolsets. |
| file | patch, read_file, search_files, write_file | File reading, writing, searching, and editing. |
| homeassistant | ha_call_service, ha_get_state, ha_list_entities, ha_list_services | Smart home control via Home Assistant. Only available when HASS_TOKEN is set. |
| computer_use | computer_use | Background macOS desktop control via cua-driver; does not steal cursor/focus. Works with any tool-capable model. macOS only; requires cua-driver on $PATH. |
| image_gen | image_generate | Text-to-image generation via FAL.ai (with opt-in OpenAI / xAI backends). |
| video_gen | video_generate | Text-to-video and image-to-video via plugin-registered backends (xAI Grok-Imagine, FAL.ai Veo 3.1 / Pixverse v6 / Kling O3). Pass image_url to animate an image; omit it for text-to-video. |
| kanban | kanban_block, kanban_comment, kanban_complete, kanban_create, kanban_heartbeat, kanban_link, kanban_show | Multi-agent coordination tools, registered only when the agent is spawned by the kanban dispatcher (HERMES_KANBAN_TASK env set). Lets workers mark tasks done with structured handoffs, block for human input, heartbeat during long ops, comment on threads, and (for orchestrators) fan out into child tasks. |
| memory | memory | Persistent cross-session memory management. |
| messaging | send_message | Send messages to other platforms (Telegram, Discord, etc.) from within a session. |
| moa | mixture_of_agents | Multi-model consensus via Mixture of Agents. |
| rl | rl_check_status, rl_edit_config, rl_get_current_config, rl_get_results, rl_list_environments, rl_list_runs, rl_select_environment, rl_start_training, rl_stop_training, rl_test_inference | RL training environment management (Atropos). |
| safe | image_generate, vision_analyze, web_extract, web_search (via includes) | Read-only research + media generation. No file writes, no terminal, no code execution. |
| search | web_search | Web search only (without extract). |
| session_search | session_search | Search past conversation sessions. |
| skills | skill_manage, skill_view, skills_list | Skill CRUD and browsing. |
| spotify | spotify_albums, spotify_devices, spotify_library, spotify_playback, spotify_playlists, spotify_queue, spotify_search | Native Spotify control (playback, queue, search, playlists, albums, library). Registered by the bundled spotify plugin. |
| terminal | process, terminal | Shell command execution and background process management. |
| todo | todo | Task list management within a session. |
| tts | text_to_speech | Text-to-speech audio generation. |
| vision | vision_analyze | Image analysis via vision-capable models. |
| video | video_analyze | Video analysis and understanding tools (opt-in, not in the default toolset; add explicitly via --toolsets). |
| web | web_extract, web_search | Web search and page content extraction. |
| yuanbao | yb_query_group_info, yb_query_group_members, yb_search_sticker, yb_send_dm, yb_send_sticker | Yuanbao DM/group actions and sticker search. Registered only on hermes-yuanbao. |
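Several rows above are conditional on the environment: homeassistant needs HASS_TOKEN, and kanban needs HERMES_KANBAN_TASK. A minimal sketch of that kind of gating follows. The ENV_GATES mapping and the filter_gated helper are hypothetical names for this example; the real checks may be more involved (for instance, the CDP reachability probe for browser_cdp is not modeled here).

```python
import os
from typing import Iterable, Mapping, Optional

# Environment variables named in the table above; the mapping itself is
# an illustrative assumption, not Hermes's actual gating code.
ENV_GATES = {
    "homeassistant": "HASS_TOKEN",
    "kanban": "HERMES_KANBAN_TASK",
}


def filter_gated(
    requested: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Keep ungated toolsets, plus gated ones whose env var is non-empty."""
    env = os.environ if env is None else env
    kept = []
    for name in requested:
        gate = ENV_GATES.get(name)
        if gate is None or env.get(gate):
            kept.append(name)
    return kept


print(filter_gated(["file", "homeassistant", "kanban"], {"HASS_TOKEN": "secret"}))
```

Passing the environment as an argument keeps the helper easy to test without mutating os.environ.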
Platform Toolsets
Platform toolsets define the complete tool configuration for a deployment target. Most messaging platforms use the same set as hermes-cli:
| Toolset | Differences from hermes-cli |
|---|---|
| hermes-cli | Full toolset, the default for interactive CLI sessions. Includes file, terminal, web, browser, memory, skills, vision, image_gen, todo, tts, delegation, code_execution, cronjob, session_search, clarify, and safe (read-only) bundles plus the standard messaging tools. |
| hermes-acp | Drops clarify, cronjob, image_generate, send_message, text_to_speech, and all four Home Assistant tools. Focused on coding tasks in IDE context. |
| hermes-api-server | Drops clarify, send_message, and text_to_speech. Keeps everything else; suitable for programmatic access where user interaction isn't possible. |
| hermes-cron | Same as hermes-cli. |
| hermes-telegram | Same as hermes-cli. |
| hermes-discord | Adds discord and discord_admin on top of hermes-cli. |
| hermes-slack | Same as hermes-cli. |
| hermes-whatsapp | Same as hermes-cli. |
| hermes-signal | Same as hermes-cli. |
| hermes-matrix | Same as hermes-cli. |
| hermes-mattermost | Same as hermes-cli. |
| hermes-email | Same as hermes-cli. |
| hermes-sms | Same as hermes-cli. |
| hermes-bluebubbles | Same as hermes-cli. |
| hermes-dingtalk | Same as hermes-cli. |
| hermes-feishu | Adds the five feishu_doc_* / feishu_drive_* tools (only used by the document-comment handler, not the regular chat adapter). |
| hermes-qqbot | Same as hermes-cli. |
| hermes-wecom | Same as hermes-cli. |
| hermes-wecom-callback | Same as hermes-cli. |
| hermes-weixin | Same as hermes-cli. |
| hermes-yuanbao | Adds the five yb_* tools (DM/group/sticker) on top of hermes-cli. |
| hermes-homeassistant | Same as hermes-cli (the Home Assistant tools are already present by default and activate when HASS_TOKEN is set). |
| hermes-webhook | Same as hermes-cli. |
| hermes-gateway | Internal gateway orchestrator toolset: the union of every hermes-<platform> toolset; used when the gateway needs to accept any message source. |
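Because the table describes each platform as a difference from hermes-cli, one way to picture it is as set arithmetic on a base tool list. The sketch below assumes a deliberately abbreviated HERMES_CLI set and hypothetical DROPS / ADDS tables; it mirrors a few rows of the table and is not how Hermes necessarily stores platform toolsets.

```python
# Illustrative subset of the hermes-cli tools; the real list is much longer.
HERMES_CLI = frozenset({
    "read_file", "write_file", "terminal", "web_search",
    "clarify", "cronjob", "image_generate", "send_message", "text_to_speech",
    "ha_call_service", "ha_get_state", "ha_list_entities", "ha_list_services",
})

# Differences copied from the table above (hypothetical data layout).
DROPS = {
    "hermes-acp": {
        "clarify", "cronjob", "image_generate", "send_message", "text_to_speech",
        "ha_call_service", "ha_get_state", "ha_list_entities", "ha_list_services",
    },
    "hermes-api-server": {"clarify", "send_message", "text_to_speech"},
}
ADDS = {
    "hermes-discord": {"discord", "discord_admin"},
}


def platform_tools(platform: str) -> frozenset[str]:
    """Apply a platform's drops and adds to the hermes-cli baseline."""
    return (HERMES_CLI - DROPS.get(platform, set())) | ADDS.get(platform, set())


print(sorted(platform_tools("hermes-acp")))
```

Platforms listed as "Same as hermes-cli" have no entry in either table, so they fall through to the unmodified baseline.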
Dynamic Toolsets
MCP server toolsets
Each configured MCP server generates a mcp-<server> toolset at runtime. For example, if you configure a github MCP server, a mcp-github toolset is created containing all tools that server exposes.
# config.yaml
mcp_servers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
This creates a mcp-github toolset you can reference in --toolsets or platform configs.
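The naming rule is simple enough to sketch. The helper below is hypothetical (mcp_toolset_names is not a Hermes function) and takes the already-parsed config as a plain dict, so it does not depend on any YAML library.

```python
def mcp_toolset_names(config: dict) -> list[str]:
    """Derive the dynamic toolset name for each configured MCP server."""
    servers = config.get("mcp_servers") or {}
    return [f"mcp-{name}" for name in servers]


# The config.yaml example above, expressed as the dict a YAML parser would return.
config = {
    "mcp_servers": {
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
        }
    }
}

print(mcp_toolset_names(config))  # ['mcp-github']
```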
Plugin toolsets
Plugins can register their own toolsets via ctx.register_tool() during plugin initialization. These appear alongside built-in toolsets and can be enabled/disabled the same way.
Custom toolsets
Define custom toolsets in config.yaml to create project-specific bundles:
toolsets:
- hermes-cli
custom_toolsets:
  data-science:
    - file
    - terminal
    - code_execution
    - web
    - vision
Wildcards
- all or *: expands to every registered toolset (built-in + dynamic + plugin)
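Custom bundles and wildcards both amount to expanding a requested name into a list of toolsets. A rough sketch under stated assumptions: expand_toolsets is a hypothetical helper, registered stands in for whatever Hermes treats as the full set of registered toolsets, and custom is the parsed custom_toolsets mapping from config.yaml. The actual resolution order inside Hermes may differ.

```python
def expand_toolsets(
    requested: list[str],
    registered: list[str],
    custom: dict[str, list[str]],
) -> list[str]:
    """Expand wildcards and custom bundles into an ordered, de-duplicated list."""
    result: list[str] = []

    def add(name: str) -> None:
        if name not in result:
            result.append(name)

    for name in requested:
        if name in ("all", "*"):
            for toolset in registered:  # wildcard: every registered toolset
                add(toolset)
        elif name in custom:
            for member in custom[name]:  # custom bundle: its listed members
                add(member)
        else:
            add(name)
    return result


custom = {"data-science": ["file", "terminal", "code_execution", "web", "vision"]}
print(expand_toolsets(["data-science", "web"], ["file", "web"], custom))
```

De-duplicating while preserving order keeps the result stable when a toolset is named both directly and through a bundle.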
Relationship to hermes tools
The hermes tools command provides a curses-based UI for toggling individual tools on or off per platform. This operates at the tool level (finer than toolsets) and persists to config.yaml. Disabled tools are filtered out even if their toolset is enabled.
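The precedence described here (a disabled tool stays hidden even when its toolset is enabled) can be sketched as a final subtraction. The effective_tools helper and its arguments are hypothetical and only illustrate the rule.

```python
def effective_tools(
    enabled_toolsets: list[str],
    toolset_tools: dict[str, list[str]],
    disabled_tools: list[str],
) -> set[str]:
    """Union the tools of every enabled toolset, then remove disabled tools."""
    tools: set[str] = set()
    for name in enabled_toolsets:
        tools |= set(toolset_tools.get(name, ()))
    # Tool-level disables win over toolset-level enables.
    return tools - set(disabled_tools)


toolset_tools = {"web": ["web_extract", "web_search"], "vision": ["vision_analyze"]}
print(sorted(effective_tools(["web", "vision"], toolset_tools, ["web_extract"])))
```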
See also: Tools Reference for the complete list of individual tools and their parameters.