feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)

* feat(video_gen): unified video_generate tool with pluggable provider backends

One core video_generate tool, every backend a plugin. Mirrors the
image_gen + memory_provider + context_engine architecture: ABC, registry,
plugin-context registration hook, and per-plugin model catalogs surfaced
through hermes tools.

Surface (one schema, every backend):
- operation: generate / edit / extend
- modalities: text-to-video (prompt only), image-to-video (prompt +
  image_url), video edit (prompt + video_url), video extend (video_url)
- reference_image_urls, duration, aspect_ratio, resolution,
  negative_prompt, audio, seed, model override
- Providers ignore unknown kwargs and declare what they support via
  VideoGenProvider.capabilities() — backend-specific quirks stay in the
  backend, the agent learns one tool

Backends shipped:
- plugins/video_gen/xai/  — Grok-Imagine, full generate/edit/extend +
  image-to-video + reference images (salvaged from PR #10600 by
  @Jaaneek, reshaped into the plugin interface)
- plugins/video_gen/fal/  — Veo 3.1 (t2v + i2v), Kling O3 i2v,
  Pixverse v6 i2v with model-aware payload building that drops keys a
  model doesn't declare

Wiring:
- agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation,
  success_response / error_response, save_b64_video / save_bytes_video,
  $HERMES_HOME/cache/videos/
- agent/video_gen_registry.py — thread-safe register/get/list +
  get_active_provider() reading video_gen.provider from config.yaml
- hermes_cli/plugins.py — PluginContext.register_video_gen_provider()
- hermes_cli/tools_config.py — Video Generation category in
  hermes tools, plugin-only providers list, model picker per plugin,
  config write to video_gen.{provider,model}
- toolsets.py — new video_gen toolset
- tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
- docs: developer-guide/video-gen-provider-plugin.md (parallel to the
  image-gen guide), sidebar + toolsets-reference + plugin guides updated

Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse),
#10458 (provider categories), #10786 (xAI media+search bundle), #2984
(FAL duplicate), #19086 (Google Veo standalone — easy port to plugin
interface).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): dynamic schema reflects active backend's capabilities

Address the 'capability variance' question — instead of one tool with a
static schema that lies about what every backend supports, the
video_generate tool now rebuilds its description at get_definitions()
time based on the configured video_gen.provider and video_gen.model.

The agent sees backend-specific guidance up-front:
- 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is
  REQUIRED; text-only prompts will be rejected'
- 'fal-ai/veo3.1' (t2v): no image_url restriction shown
- xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7
  reference_image_urls'
- Backends without edit/extend: 'not supported on this backend — surface
  that they need to switch backends via hermes tools'

This is the same pattern PR #22694 used for delegate_task self-capping —
documented in the dynamic-tool-schemas skill. Cache invalidation is
free: get_tool_definitions() already memoizes on config.yaml mtime, so a
mid-session backend swap rebuilds the schema automatically.
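
A minimal sketch of how such a description could be rendered from a capabilities dict. The function name `build_description` and its exact wording are assumptions for illustration, not the builder shipped in this commit; the capability keys follow the ones `VideoGenProvider.capabilities()` documents.

```python
from typing import Any, Dict, List


def build_description(provider: str, model: str, caps: Dict[str, Any]) -> str:
    """Render a capability summary (hypothetical helper, not the real builder)."""
    lines: List[str] = [f"Active backend: {provider}, model: {model}"]
    modalities = set(caps.get("modalities", []))
    if {"text", "image"} <= modalities:
        lines.append(
            "- supports both text-to-video (omit image_url) "
            "and image-to-video (pass image_url)"
        )
    elif modalities == {"image"}:
        lines.append("- image-to-video only: image_url is REQUIRED")
    if caps.get("aspect_ratios"):
        lines.append("- aspect_ratio choices: " + ", ".join(caps["aspect_ratios"]))
    if caps.get("resolutions"):
        lines.append("- resolution choices: " + ", ".join(caps["resolutions"]))
    lo, hi = caps.get("min_duration"), caps.get("max_duration")
    if lo is not None and hi is not None:
        lines.append(f"- duration range: {lo}-{hi}s")
    if caps.get("supports_negative_prompt"):
        lines.append("- negative_prompt: supported")
    return "\n".join(lines)


# Example values only; a real provider would supply its own capabilities.
description = build_description("FAL", "pixverse-v6", {
    "modalities": ["text", "image"],
    "aspect_ratios": ["16:9", "9:16", "1:1"],
    "resolutions": ["360p", "540p", "720p", "1080p"],
    "min_duration": 1,
    "max_duration": 15,
    "supports_negative_prompt": True,
})
print(description)
```

Because the text is derived from the dict on every rebuild, a backend swap changes the guidance without touching the tool schema itself.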

Tested:
- Empirical FAL OpenAPI schema check confirms image-to-video models
  require image_url (FAL returns HTTP 422 otherwise) — client-side
  rejection in FALVideoGenProvider.generate() now prevents the wasted
  round-trip
- Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean
  missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
- 6 new tests cover the builder (no config / image-only / full-surface /
  text-only / unknown provider / registry wiring), all passing
- 37/37 in the slice, 134/134 in the broader regression set

* test(video_gen/xai): full surface integration tests + cleaner schema

Verified end-to-end that the xAI plugin handles every documented mode
from PR #10600's surface: text-to-video, image-to-video,
reference-images-to-video, video edit, video extend (with and without
prompt). All five modes route to the correct xAI endpoint
(/videos/generations, /videos/edits, /videos/extensions) with the right
payload shape (image / reference_images / video keys), and all five
client-side rejections fire before the network: edit-without-prompt,
extend-without-video_url, image+refs conflict, >7 references, and
duration/aspect_ratio clamping.

15 new integration tests grouped into four classes (endpoint routing,
modalities, validation, clamping). httpx is stubbed via a small fake
AsyncClient that records POSTs so the tests assert the actual payload
the plugin would send to xAI — not just the success/error envelope.
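
The recording-stub idea can be sketched as follows. `FakeAsyncClient`, `FakeResponse`, and `submit` are hypothetical names written for this illustration, and the `{"url": ...}` shape of the `image` value is an assumption; only the `/videos/generations` path and the `image` key come from the description above.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class FakeResponse:
    """Tiny response object exposing the two methods the sketch calls."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self._payload = payload

    def raise_for_status(self) -> None:
        return None

    def json(self) -> Dict[str, Any]:
        return self._payload


class FakeAsyncClient:
    """Records POSTs instead of sending them (stand-in for an async HTTP client)."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.posts: List[Tuple[str, Dict[str, Any]]] = []

    async def __aenter__(self) -> "FakeAsyncClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False

    async def post(self, url: str, *, json: Optional[Dict[str, Any]] = None,
                   **kwargs: Any) -> FakeResponse:
        self.posts.append((url, json or {}))
        return FakeResponse({"request_id": "fake-123"})


async def submit(client: FakeAsyncClient, prompt: str,
                 image_url: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical code under test: builds a payload and POSTs it."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url:
        payload["image"] = {"url": image_url}  # assumed wrapper shape
    async with client as c:
        resp = await c.post("/videos/generations", json=payload)
        resp.raise_for_status()
        return resp.json()


client = FakeAsyncClient()
result = asyncio.run(
    submit(client, "a red kite", image_url="https://example.com/kite.png")
)
recorded_url, recorded_payload = client.posts[0]
```

Asserting on `client.posts` checks what would have gone over the wire, which a success/error envelope alone cannot show.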

Also cleaned up a description redundancy: when a model's operations
match the backend's overall set, we no longer print the duplicate
'operations supported by this model' line. xAI's description now reads:

    Active backend: xAI . model: grok-imagine-video
    - operations supported by this backend: edit, extend, generate
    - modalities supported by this backend: image, reference_images, text
    - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
    - resolution choices: 480p, 720p
    - duration range: 1-15s
    - reference_image_urls: up to 7 images

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing

Two design changes per Teknium:

1) Drop edit/extend from the tool surface entirely. Only text-to-video
and image-to-video remain. The agent sees a clean tool with two
modalities; backend-specific quirks like xAI's edit/extend endpoints
stay out of the unified schema.

2) FAL: pick a model FAMILY once, the plugin routes between the
family's text-to-video and image-to-video endpoints based on whether
image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND
'fal-ai/veo3.1/image-to-video' as separate options — they pick
'veo3.1', and the plugin handles the rest.

Catalog rewritten as families:

    veo3.1            fal-ai/veo3.1                                /  fal-ai/veo3.1/image-to-video
    pixverse-v6       fal-ai/pixverse/v6/text-to-video             /  fal-ai/pixverse/v6/image-to-video
    kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video /  fal-ai/kling-video/o3/standard/image-to-video

xAI uses a single endpoint (/videos/generations) for both modes,
routed by the presence of the 'image' field in the payload — no
edit/extend exposure.
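
A rough sketch of the family-routing rule, using the three family entries listed above. The dict layout and the `resolve_endpoint` helper are assumptions for illustration; the plugin's real catalog structure may differ.

```python
from typing import Dict, Optional, Tuple

# Endpoint IDs copied from the family table above; the dict layout is assumed.
FAMILIES: Dict[str, Dict[str, str]] = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
    },
    "pixverse-v6": {
        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
    },
    "kling-o3-standard": {
        "text_endpoint": "fal-ai/kling-video/o3/standard/text-to-video",
        "image_endpoint": "fal-ai/kling-video/o3/standard/image-to-video",
    },
}


def resolve_endpoint(family: str, image_url: Optional[str] = None) -> Tuple[str, str]:
    """Return (modality, endpoint): image_url present means image-to-video."""
    entry = FAMILIES[family]
    if image_url:
        return "image", entry["image_endpoint"]
    return "text", entry["text_endpoint"]


text_route = resolve_endpoint("veo3.1")
image_route = resolve_endpoint("pixverse-v6", "https://example.com/frame.png")
```

The caller only ever names the family; the modality falls out of whether `image_url` was passed.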

Schema changes:
- VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params:
  prompt (required), image_url, reference_image_urls, duration,
  aspect_ratio, resolution, negative_prompt, audio, seed, model.
- VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS,
  DEFAULT_OPERATION. capabilities() drops 'operations' key.
- success_response: add 'modality' field ('text' | 'image') so the
  agent and logs can see which endpoint was actually hit.

Dynamic schema builder simplified — no operations bullet, no
'switch backends if you need edit/extend' guidance. When the active
backend supports both modalities (the common case), description reads:

    Active backend: FAL . model: pixverse-v6
    - supports both text-to-video (omit image_url) and image-to-video
      (pass image_url) - routes automatically
    - aspect_ratio choices: 16:9, 9:16, 1:1
    - resolution choices: 360p, 540p, 720p, 1080p
    - duration range: 1-15s
    - audio: pass audio=true to enable native audio (pricing tier)
    - negative_prompt: supported

Tests: 51 in the video_gen slice, 216 across the broader image+video
sweep, all passing. New FAL routing tests prove pixverse-v6 + no image
hits text-to-video endpoint, pixverse-v6 + image_url hits
image-to-video endpoint, same for veo3.1 and kling-o3-standard.

Docs updated: developer-guide page rewrites the 'model families' pattern
as a first-class section so external plugin authors know the convention.
toolsets-reference and toolsets.py descriptions match the new surface.

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers

Catalog now covers everything Teknium specced from FAL:

  Cheap tier:
    ltx-2.3        fal-ai/ltx-2.3-22b/text-to-video       / image-to-video
    pixverse-v6    fal-ai/pixverse/v6/text-to-video       / image-to-video

  Premium tier:
    veo3.1         fal-ai/veo3.1                          / fal-ai/veo3.1/image-to-video
    seedance-2.0   bytedance/seedance-2.0/text-to-video   / image-to-video
    kling-v3-4k    fal-ai/kling-video/v3/4k/text-to-video / image-to-video
    happy-horse    fal-ai/happy-horse/text-to-video       / image-to-video

DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane
defaults, both modalities) — better first-run UX for users who haven't
explicitly picked a model.

New family-entry knob: image_param_key. Kling v3 4K's image-to-video
endpoint expects start_image_url instead of image_url; declaring
image_param_key='start_image_url' on the family lets _build_payload
remap correctly. Other families default to plain image_url.
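
The remap can be pictured with a short sketch. `build_payload` here is a simplified, hypothetical stand-in for the plugin's `_build_payload`, and the family dicts are trimmed to the one field that matters.

```python
from typing import Any, Dict, Optional


def build_payload(family: Dict[str, Any], prompt: str,
                  image_url: Optional[str] = None) -> Dict[str, Any]:
    """Place the image under the key the family's endpoint expects."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url:
        # Families that do not declare image_param_key fall back to image_url.
        key = family.get("image_param_key", "image_url")
        payload[key] = image_url
    return payload


kling_v3_4k = {"image_param_key": "start_image_url"}  # trimmed family entry
pixverse_v6: Dict[str, Any] = {}                      # no override declared

kling_payload = build_payload(kling_v3_4k, "a fox", "https://example.com/fox.png")
pixverse_payload = build_payload(pixverse_v6, "a fox", "https://example.com/fox.png")
text_payload = build_payload(kling_v3_4k, "a fox")
```

Keeping the key name in the family entry means the routing code stays identical across families.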

Per-family capability flags reflect each model's docs:
- LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution
  enum exposed by FAL — let endpoint apply defaults)
- Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported,
  negative prompts NOT supported per docs
- Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
- Veo 3.1: unchanged, 16:9/9:16, 4/6/8s

Tests: +5 covering the new families (full catalog, Kling 4K
start_image_url remap, Seedance routing, LTX payload minimality, Happy
Horse minimality). 56/56 in the slice green.

Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes
already has a direct xAI plugin that talks to xAI's own API; routing
the same model through FAL's wrapper would duplicate the surface
without adding capabilities. Users on FAL who want Grok-Imagine should
use the xAI plugin directly; flag if you want both routes available.

* test(video_gen): tool-surface routing matrix — every model x modality

End-to-end matrix test driven through _handle_video_generate() — the
actual function the agent's video_generate tool call lands in. Writes
config.yaml, invokes the registered handler with a raw args dict, then
asserts the outbound HTTP/SDK call hit the right endpoint with the right
payload shape.

Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new
families as they're added (add a family to FAL_FAMILIES and you get
both modalities tested for free).

Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases

For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most,
  start_image_url for kling-v3-4k, wrapped 'image' object for xAI)

All 16 cases passing. Confirms the tool surface routes every
(provider, model, modality) combination correctly with zero leakage.
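
For illustration, the FAL half of the matrix can be enumerated with a few lines of standard-library Python. The family names come from the catalog above; the list literal, the example URL, and `matrix_cases` are assumptions (the real test parametrizes over `FAL_FAMILIES.keys()`).

```python
import itertools
from typing import List, Optional, Tuple

# Family ids as listed in the catalog; hard-coded here to stay self-contained.
FAMILY_NAMES = [
    "ltx-2.3", "pixverse-v6", "veo3.1",
    "seedance-2.0", "kling-v3-4k", "happy-horse",
]

# Input shape label -> image_url value passed to the tool.
INPUT_SHAPES = {
    "text-only": None,
    "text+image": "https://example.com/frame.png",
}

Case = Tuple[str, str, Optional[str], str]


def matrix_cases() -> List[Case]:
    """Build (family, shape, image_url, expected_modality) for every pairing."""
    return [
        (family, shape, image_url, "image" if image_url else "text")
        for family, (shape, image_url)
        in itertools.product(FAMILY_NAMES, INPUT_SHAPES.items())
    ]


cases = matrix_cases()
```

Adding a family to the list grows the matrix by two cases with no other change, which is the property the parametrization is after.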

* feat(video_gen): keep video_gen out of first-run setup, surface in status

Two changes:

1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in
   the first-run toolset checklist. Video gen is niche, paid, and slow —
   most users don't want it nagging them during initial setup. Anyone
   who wants it opts in via 'hermes tools' -> Video Generation, which
   already routes to the provider+model picker.

2. The 'hermes setup' status panel learns about video_gen — but only
   shows the row when a plugin reports available. Users without
   FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of
   those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.

Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.

Rationale: image_gen is on by default because it's a featured creative
tool used in casual chat (Telegram, etc.). Video gen is heavier: long
wait, paid per-second pricing. Default-off matches user intent better.

---------

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Teknium 2026-05-13 16:39:41 -07:00 committed by GitHub
parent b833d85019
commit 9d42c2c286
25 changed files with 3617 additions and 3 deletions

299
agent/video_gen_provider.py Normal file

@ -0,0 +1,299 @@
"""
Video Generation Provider ABC
=============================

Defines the pluggable-backend interface for video generation. Providers register
instances via ``PluginContext.register_video_gen_provider()``; the active one
(selected via ``video_gen.provider`` in ``config.yaml``) services every
``video_generate`` tool call.

Providers live in ``<repo>/plugins/video_gen/<name>/`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/video_gen/<name>/`` (user, opt-in
via ``plugins.enabled``).

Mirrors the ``image_gen`` provider design (``agent/image_gen_provider.py``) so
the two surfaces stay learnable together.

Unified surface
---------------

One tool ``video_generate`` covers **text-to-video** and **image-to-video**.
The router is the presence of ``image_url``: if it's set, the provider routes
to its image-to-video endpoint; if it's omitted, the provider routes to
text-to-video. Users pick one **model family** (e.g. Pixverse v6, Veo 3.1,
Kling O3 Standard); the provider handles which underlying FAL/xAI endpoint
to hit.

Video edit and video extend are intentionally NOT exposed in this surface:
the inconsistency across backends is too large for one unified tool. If
those use cases warrant attention later they can ship as separate tools.

Response shape
--------------

All providers return a dict built by :func:`success_response` /
:func:`error_response`. Keys:

    success        bool
    video          str | None   URL or absolute file path
    model          str          provider-specific model identifier
    prompt         str          echoed prompt
    modality       str          "text" | "image" (which mode was used)
    aspect_ratio   str          provider-native (e.g. "16:9") or ""
    duration       int          seconds (0 if not applicable)
    provider       str          provider name (for diagnostics)
    error          str          only when success=False
    error_type     str          only when success=False
"""
from __future__ import annotations

import abc
import base64
import datetime
import logging
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Common aspect ratios across providers (Veo / Kling / xAI / Pixverse). The
# tool schema advertises this set as an enum hint, but providers may accept
# a narrower or wider set; they are responsible for clamping.
COMMON_ASPECT_RATIOS: Tuple[str, ...] = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")
DEFAULT_ASPECT_RATIO = "16:9"

COMMON_RESOLUTIONS: Tuple[str, ...] = ("480p", "540p", "720p", "1080p")
DEFAULT_RESOLUTION = "720p"

# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------


class VideoGenProvider(abc.ABC):
    """Abstract base class for a video generation backend.

    Subclasses must implement :attr:`name` and :meth:`generate`. Everything
    else has sane defaults; override only what your provider needs.
    """

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Stable short identifier used in ``video_gen.provider`` config.

        Lowercase, no spaces. Examples: ``xai``, ``fal``, ``google``.
        """

    @property
    def display_name(self) -> str:
        """Human-readable label shown in ``hermes tools``. Defaults to ``name.title()``."""
        return self.name.title()

    def is_available(self) -> bool:
        """Return True when this provider can service calls.

        Typically checks for a required API key and optional-dependency
        import. Default: True.
        """
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        """Return catalog entries for ``hermes tools`` model picker.

        Each entry represents a **model family** that supports text-to-video
        and/or image-to-video routing internally::

            {
                "id": "veo-3.1",                   # required
                "display": "Veo 3.1",              # optional; defaults to id
                "speed": "~60s",                   # optional
                "strengths": "...",                # optional
                "price": "$0.20/s",                # optional
                "modalities": ["text", "image"],   # optional, advisory
            }

        Default: empty list (provider has no user-selectable models).
        """
        return []

    def get_setup_schema(self) -> Dict[str, Any]:
        """Return provider metadata for the ``hermes tools`` picker."""
        return {
            "name": self.display_name,
            "badge": "",
            "tag": "",
            "env_vars": [],
        }

    def default_model(self) -> Optional[str]:
        """Return the default model id, or None if not applicable."""
        models = self.list_models()
        if models:
            return models[0].get("id")
        return None

    def capabilities(self) -> Dict[str, Any]:
        """Return what this provider supports.

        Returned dict (all keys optional)::

            {
                "modalities": ["text", "image"],   # which inputs the backend accepts
                "aspect_ratios": ["16:9", "9:16", ...],
                "resolutions": ["720p", "1080p"],
                "max_duration": 15,                # seconds
                "min_duration": 1,
                "supports_audio": True,
                "supports_negative_prompt": True,
                "max_reference_images": 7,
            }

        Used by the tool layer for soft validation and by ``hermes tools``
        for the picker. Default: text-only.
        """
        return {
            "modalities": ["text"],
            "aspect_ratios": list(COMMON_ASPECT_RATIOS),
            "resolutions": list(COMMON_RESOLUTIONS),
            "max_duration": 10,
            "min_duration": 1,
            "supports_audio": False,
            "supports_negative_prompt": False,
            "max_reference_images": 0,
        }

    @abc.abstractmethod
    def generate(
        self,
        prompt: str,
        *,
        model: Optional[str] = None,
        image_url: Optional[str] = None,
        reference_image_urls: Optional[List[str]] = None,
        duration: Optional[int] = None,
        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
        resolution: str = DEFAULT_RESOLUTION,
        negative_prompt: Optional[str] = None,
        audio: Optional[bool] = None,
        seed: Optional[int] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Generate a video from a prompt (text-to-video) or animate an image
        (image-to-video).

        Routing: if ``image_url`` is provided, the provider should route to
        its image-to-video endpoint; otherwise text-to-video. The plugin
        is responsible for picking the right underlying endpoint within
        the user's chosen model family.

        Implementations should return the dict from :func:`success_response`
        or :func:`error_response`. ``kwargs`` may contain forward-compat
        parameters that future versions of the schema will expose;
        implementations MUST ignore unknown keys (no TypeError).
        """


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _videos_cache_dir() -> Path:
    """Return ``$HERMES_HOME/cache/videos/``, creating parents as needed."""
    from hermes_constants import get_hermes_home

    path = get_hermes_home() / "cache" / "videos"
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_b64_video(
    b64_data: str,
    *,
    prefix: str = "video",
    extension: str = "mp4",
) -> Path:
    """Decode base64 video data and write under ``$HERMES_HOME/cache/videos/``.

    Returns the absolute :class:`Path` to the saved file.

    Filename format: ``<prefix>_<YYYYMMDD_HHMMSS>_<short-uuid>.<ext>``.
    """
    raw = base64.b64decode(b64_data)
    ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    short = uuid.uuid4().hex[:8]
    path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
    path.write_bytes(raw)
    return path


def save_bytes_video(
    raw: bytes,
    *,
    prefix: str = "video",
    extension: str = "mp4",
) -> Path:
    """Write raw video bytes (e.g. an HTTP download body) to the cache."""
    ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    short = uuid.uuid4().hex[:8]
    path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
    path.write_bytes(raw)
    return path


def success_response(
    *,
    video: str,
    model: str,
    prompt: str,
    modality: str = "text",
    aspect_ratio: str = "",
    duration: int = 0,
    provider: str,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a uniform success response dict.

    ``video`` may be an HTTP URL or an absolute filesystem path.

    ``modality`` is ``"text"`` (text-to-video) or ``"image"`` (image-to-video);
    it indicates which endpoint was actually hit, useful for diagnostics.
    """
    payload: Dict[str, Any] = {
        "success": True,
        "video": video,
        "model": model,
        "prompt": prompt,
        "modality": modality,
        "aspect_ratio": aspect_ratio,
        "duration": int(duration) if duration else 0,
        "provider": provider,
    }
    if extra:
        # setdefault: extras may add keys but never override the core fields.
        for k, v in extra.items():
            payload.setdefault(k, v)
    return payload


def error_response(
    *,
    error: str,
    error_type: str = "provider_error",
    provider: str = "",
    model: str = "",
    prompt: str = "",
    aspect_ratio: str = "",
) -> Dict[str, Any]:
    """Build a uniform error response dict."""
    return {
        "success": False,
        "video": None,
        "error": error,
        "error_type": error_type,
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "provider": provider,
    }
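
To make the ``**kwargs`` contract concrete, here is a self-contained sketch of a toy backend. `MiniVideoGenProvider` is a condensed stand-in for the ABC above (so the snippet runs without the Hermes codebase), and `EchoProvider`, its `echo-1` model id, and the `example.invalid` URL are hypothetical.

```python
import abc
from typing import Any, Dict, Optional


class MiniVideoGenProvider(abc.ABC):
    """Condensed stand-in for VideoGenProvider (only what this sketch needs)."""

    @property
    @abc.abstractmethod
    def name(self) -> str:
        ...

    @abc.abstractmethod
    def generate(self, prompt: str, *, image_url: Optional[str] = None,
                 duration: Optional[int] = None, **kwargs: Any) -> Dict[str, Any]:
        ...


class EchoProvider(MiniVideoGenProvider):
    """Hypothetical backend that fabricates a URL instead of calling an API."""

    @property
    def name(self) -> str:
        return "echo"

    def generate(self, prompt: str, *, image_url: Optional[str] = None,
                 duration: Optional[int] = None, **kwargs: Any) -> Dict[str, Any]:
        # Unknown keys land in kwargs and are ignored rather than raising.
        if not prompt.strip():
            return {"success": False, "video": None, "error": "prompt is empty",
                    "error_type": "invalid_request", "provider": self.name}
        modality = "image" if image_url else "text"
        return {
            "success": True,
            "video": f"https://example.invalid/{modality}.mp4",
            "model": "echo-1",
            "prompt": prompt,
            "modality": modality,
            "aspect_ratio": "",
            "duration": int(duration) if duration else 0,
            "provider": self.name,
        }


# future_flag is not part of the signature; it must be silently accepted.
result = EchoProvider().generate(
    "a paper boat", image_url="https://example.com/boat.png", future_flag=True
)
empty = EchoProvider().generate("   ")
```

A real plugin would build these dicts with `success_response` / `error_response` instead of by hand.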

117
agent/video_gen_registry.py Normal file

@ -0,0 +1,117 @@
"""
Video Generation Provider Registry
==================================

Central map of registered providers. Populated by plugins at import-time via
``PluginContext.register_video_gen_provider()``; consumed by the
``video_generate`` tool to dispatch each call to the active backend.

Active selection
----------------

The active provider is chosen by ``video_gen.provider`` in ``config.yaml``.
If unset (or set to a name that is not registered),
:func:`get_active_provider` applies fallback logic:

1. If exactly one provider is registered, use it.
2. Otherwise return ``None`` (the tool surfaces a helpful error pointing
   the user at ``hermes tools``).

Mirrors ``agent/image_gen_registry.py`` so the two surfaces behave the
same.
"""
from __future__ import annotations

import logging
import threading
from typing import Dict, List, Optional

from agent.video_gen_provider import VideoGenProvider

logger = logging.getLogger(__name__)

_providers: Dict[str, VideoGenProvider] = {}
_lock = threading.Lock()


def register_provider(provider: VideoGenProvider) -> None:
    """Register a video generation provider.

    Re-registration (same ``name``) overwrites the previous entry and logs
    a debug message; this makes hot-reload scenarios (tests, dev loops)
    behave predictably.
    """
    if not isinstance(provider, VideoGenProvider):
        raise TypeError(
            f"register_provider() expects a VideoGenProvider instance, "
            f"got {type(provider).__name__}"
        )
    name = provider.name
    if not isinstance(name, str) or not name.strip():
        raise ValueError("Video gen provider .name must be a non-empty string")
    with _lock:
        existing = _providers.get(name)
        _providers[name] = provider
    if existing is not None:
        logger.debug("Video gen provider '%s' re-registered (was %r)", name, type(existing).__name__)
    else:
        logger.debug("Registered video gen provider '%s' (%s)", name, type(provider).__name__)


def list_providers() -> List[VideoGenProvider]:
    """Return all registered providers, sorted by name."""
    with _lock:
        items = list(_providers.values())
    return sorted(items, key=lambda p: p.name)


def get_provider(name: str) -> Optional[VideoGenProvider]:
    """Return the provider registered under *name*, or None."""
    if not isinstance(name, str):
        return None
    with _lock:
        return _providers.get(name.strip())


def get_active_provider() -> Optional[VideoGenProvider]:
    """Resolve the currently-active provider.

    Reads ``video_gen.provider`` from config.yaml; falls back per the
    module docstring.
    """
    configured: Optional[str] = None
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        section = cfg.get("video_gen") if isinstance(cfg, dict) else None
        if isinstance(section, dict):
            raw = section.get("provider")
            if isinstance(raw, str) and raw.strip():
                configured = raw.strip()
    except Exception as exc:
        logger.debug("Could not read video_gen.provider from config: %s", exc)

    with _lock:
        snapshot = dict(_providers)

    if configured:
        provider = snapshot.get(configured)
        if provider is not None:
            return provider
        logger.debug(
            "video_gen.provider='%s' configured but not registered; falling back",
            configured,
        )

    # Fallback: single-provider case
    if len(snapshot) == 1:
        return next(iter(snapshot.values()))
    return None


def _reset_for_tests() -> None:
    """Clear the registry. **Test-only.**"""
    with _lock:
        _providers.clear()
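
The selection rule is easy to miss in the full function, so here is a stripped-down sketch of it. `register` and `resolve_active` are hypothetical names, providers are plain strings, and the configured name is passed in directly instead of being read from config.yaml.

```python
import threading
from typing import Dict, Optional

_lock = threading.Lock()
_providers: Dict[str, str] = {}  # name -> provider (strings stand in for instances)


def register(name: str, provider: str) -> None:
    """Add or overwrite a provider under the lock."""
    with _lock:
        _providers[name] = provider


def resolve_active(configured: Optional[str]) -> Optional[str]:
    """Configured name wins; otherwise fall back only if exactly one exists."""
    with _lock:
        snapshot = dict(_providers)  # copy so the lookup runs outside the lock
    if configured and configured in snapshot:
        return snapshot[configured]
    if len(snapshot) == 1:
        return next(iter(snapshot.values()))
    return None


register("fal", "FAL provider")
only_one = resolve_active(None)           # single provider: used implicitly
stale_name = resolve_active("google")     # unknown name: same fallback applies

register("xai", "xAI provider")
ambiguous = resolve_active(None)          # two providers, nothing configured
explicit = resolve_active("xai")
```

Note that a stale or misspelled configured name still resolves when only one provider is registered, which matches the fall-through in `get_active_provider`.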


@ -2107,10 +2107,10 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
"FAL_KEY": {
- "description": "FAL API key for image generation",
+ "description": "FAL API key for image and video generation",
"prompt": "FAL API key",
"url": "https://fal.ai/",
- "tools": ["image_generate"],
+ "tools": ["image_generate", "video_generate"],
"password": True,
"category": "tool",
},


@ -542,6 +542,33 @@ class PluginContext:
self.manifest.name, provider.name,
)
# -- video gen provider registration -------------------------------------
def register_video_gen_provider(self, provider) -> None:
"""Register a video generation backend.
``provider`` must be an instance of
:class:`agent.video_gen_provider.VideoGenProvider`. The
``provider.name`` attribute is what ``video_gen.provider`` in
``config.yaml`` matches against when routing ``video_generate``
tool calls.
"""
from agent.video_gen_provider import VideoGenProvider
from agent.video_gen_registry import register_provider as _register_video_provider
if not isinstance(provider, VideoGenProvider):
logger.warning(
"Plugin '%s' tried to register a video_gen provider that does "
"not inherit from VideoGenProvider. Ignoring.",
self.manifest.name,
)
return
_register_video_provider(provider)
logger.info(
"Plugin '%s' registered video_gen provider: %s",
self.manifest.name, provider.name,
)
# -- platform adapter registration ---------------------------------------
def register_platform(


@ -454,6 +454,26 @@ def _print_setup_summary(config: dict, hermes_home):
else:
tool_status.append(("Image Generation", False, "FAL_KEY or OPENAI_API_KEY"))
# Video generation — opt-in via `hermes tools` → Video Generation.
# Only show the row when a plugin reports available so we don't badger
# users who don't care about video gen with a "missing" status line.
try:
from agent.video_gen_registry import list_providers as _list_video_providers
from hermes_cli.plugins import _ensure_plugins_discovered as _ensure_plugins
_ensure_plugins()
_video_backend = None
for _vp in _list_video_providers():
try:
if _vp.is_available():
_video_backend = _vp.display_name
break
except Exception:
continue
except Exception:
_video_backend = None
if _video_backend:
tool_status.append((f"Video Generation ({_video_backend})", True, None))
# TTS — show configured provider
tts_provider = cfg_get(config, "tts", "provider", default="edge")
if subscription_features.tts.managed_by_nous:


@ -60,6 +60,7 @@ CONFIGURABLE_TOOLSETS = [
("vision", "👁️ Vision / Image Analysis", "vision_analyze"),
("video", "🎬 Video Analysis", "video_analyze (requires video-capable model)"),
("image_gen", "🎨 Image Generation", "image_generate"),
("video_gen", "🎬 Video Generation", "video_generate (text-to-video + image-to-video)"),
("moa", "🧠 Mixture of Agents", "mixture_of_agents"),
("tts", "🔊 Text-to-Speech", "text_to_speech"),
("skills", "📚 Skills", "list, view, manage"),
@ -82,7 +83,11 @@ CONFIGURABLE_TOOLSETS = [
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
- _DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video"}
+ #
+ # Video gen is off by default: it's a niche, paid, slow feature. Users
+ # who want it opt in via `hermes tools` → Video Generation, which walks
+ # them through provider + model selection.
+ _DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video", "video_gen"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
@ -349,6 +354,15 @@ TOOL_CATEGORIES = {
},
],
},
"video_gen": {
"name": "Video Generation",
"icon": "🎬",
# Providers list is intentionally empty — every video gen backend
# is a plugin, surfaced by ``_plugin_video_gen_providers()`` and
# injected by ``_visible_providers``. Mirrors the design we'll
# converge image_gen toward.
"providers": [],
},
"browser": {
"name": "Browser Automation",
"icon": "🌐",
@ -1525,6 +1539,43 @@ def _plugin_image_gen_providers() -> list[dict]:
return rows
def _plugin_video_gen_providers() -> list[dict]:
"""Build picker-row dicts from plugin-registered video gen providers.
Mirrors ``_plugin_image_gen_providers`` exactly. Every video backend
is a plugin, so this function is the *only* source of provider rows
for the Video Generation category. The hardcoded ``TOOL_CATEGORIES``
entry for ``video_gen`` keeps an empty providers list.
"""
try:
from agent.video_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
providers = list_providers()
except Exception:
return []
rows: list[dict] = []
for provider in providers:
try:
schema = provider.get_setup_schema()
except Exception:
continue
if not isinstance(schema, dict):
continue
rows.append(
{
"name": schema.get("name", provider.display_name),
"badge": schema.get("badge", ""),
"tag": schema.get("tag", ""),
"env_vars": schema.get("env_vars", []),
"video_gen_plugin_name": provider.name,
}
)
return rows
def _visible_providers(cat: dict, config: dict) -> list[dict]:
"""Return provider entries visible for the current auth/config state."""
features = get_nous_subscription_features(config)
@ -1541,6 +1592,11 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
if cat.get("name") == "Image Generation":
visible.extend(_plugin_image_gen_providers())
# Inject plugin-registered video_gen backends. Unlike image_gen,
# video_gen has NO hardcoded providers — every backend is a plugin.
if cat.get("name") == "Video Generation":
visible.extend(_plugin_video_gen_providers())
return visible
@ -1608,6 +1664,23 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
if provider.is_available():
return False
except Exception:
continue
except Exception:
pass
return True
if ts_key == "video_gen":
# Satisfied when any plugin-registered video gen provider reports
# available — no in-tree fallback (every backend is a plugin).
try:
from agent.video_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
@ -1952,6 +2025,106 @@ def _select_plugin_image_gen_provider(plugin_name: str, config: dict) -> None:
_configure_imagegen_model_for_plugin(plugin_name, config)
# ─── Video Generation Model Pickers ───────────────────────────────────────────
def _plugin_video_gen_catalog(plugin_name: str):
"""Return ``(catalog_dict, default_model_id)`` for a video gen plugin.
Mirrors :func:`_plugin_image_gen_catalog`. Returns ``({}, None)`` when
the plugin isn't registered or has no models.
"""
try:
from agent.video_gen_registry import get_provider
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
provider = get_provider(plugin_name)
except Exception:
return {}, None
if provider is None:
return {}, None
try:
models = provider.list_models() or []
default = provider.default_model()
except Exception:
return {}, None
catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}
return catalog, default
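A minimal sketch of the catalog-building step above, assuming a provider whose `list_models()` returns a list that may contain malformed entries. `build_catalog` and `_StubProvider` are hypothetical names used only for illustration; they are not part of the codebase.

```python
from typing import Any, Dict, List, Optional, Tuple


class _StubProvider:
    """Hypothetical stand-in for a registered video gen provider."""

    def list_models(self) -> List[Any]:
        # One entry without an "id" and one non-dict entry are included on
        # purpose to show that they get filtered out.
        return [
            {"id": "model-a", "speed": "~30s"},
            {"display": "no id here"},
            "not-a-dict",
            {"id": "model-b", "speed": "~90s"},
        ]

    def default_model(self) -> Optional[str]:
        return "model-a"


def build_catalog(provider: Any) -> Tuple[Dict[str, Dict[str, Any]], Optional[str]]:
    """Return (catalog keyed by model id, default id); ({}, None) on failure."""
    try:
        models = provider.list_models() or []
        default = provider.default_model()
    except Exception:
        return {}, None
    catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}
    return catalog, default


catalog, default = build_catalog(_StubProvider())
print(sorted(catalog), default)
```

Only well-formed entries survive, so the picker never has to handle a row without an `id`.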
def _configure_videogen_model_for_plugin(plugin_name: str, config: dict) -> None:
"""Prompt for a video gen model from a plugin's catalog.
Mirrors :func:`_configure_imagegen_model_for_plugin`. Writes the
selection to ``video_gen.model``.
"""
catalog, default_model = _plugin_video_gen_catalog(plugin_name)
if not catalog:
return
cur_cfg = config.setdefault("video_gen", {})
if not isinstance(cur_cfg, dict):
cur_cfg = {}
config["video_gen"] = cur_cfg
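current_model = cur_cfg.get("model") or default_model
model_ids = list(catalog.keys())
if current_model not in catalog:
# Fall back to the plugin default, or to the first catalog entry when
# the default is missing or unlisted, so ``ordered`` never holds a key
# that ``catalog[mid]`` below would reject.
current_model = default_model if default_model in catalog else model_ids[0]
ordered = [current_model] + [m for m in model_ids if m != current_model]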
widths = {
"model": max(len(m) for m in model_ids),
"speed": max((len(catalog[m].get("speed", "")) for m in model_ids), default=6),
"strengths": max((len(catalog[m].get("strengths", "")) for m in model_ids), default=0),
}
print()
header = (
f" {'Model':<{widths['model']}} "
f"{'Speed':<{widths['speed']}} "
f"{'Strengths':<{widths['strengths']}} "
f"Price"
)
print(color(header, Colors.CYAN))
rows = []
for mid in ordered:
meta = catalog[mid]
row = (
f" {mid:<{widths['model']}} "
f"{meta.get('speed', ''):<{widths['speed']}} "
f"{meta.get('strengths', ''):<{widths['strengths']}} "
f"{meta.get('price', '')}"
)
if mid == current_model:
row += " ← currently in use"
rows.append(row)
idx = _prompt_choice(
f" Choose {plugin_name} model:",
rows,
default=0,
)
chosen = ordered[idx]
cur_cfg["model"] = chosen
_print_success(f" Model set to: {chosen}")
def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
"""Persist a plugin-backed video generation provider selection."""
vid_cfg = config.setdefault("video_gen", {})
if not isinstance(vid_cfg, dict):
vid_cfg = {}
config["video_gen"] = vid_cfg
vid_cfg["provider"] = plugin_name
vid_cfg["use_gateway"] = False
_print_success(f" video_gen.provider set to: {plugin_name}")
_configure_videogen_model_for_plugin(plugin_name, config)
def _configure_provider(provider: dict, config: dict):
"""Configure a single provider - prompt for API keys and set config."""
env_vars = provider.get("env_vars", [])
@@ -2014,6 +2187,12 @@ def _configure_provider(provider: dict, config: dict):
if plugin_name:
_select_plugin_image_gen_provider(plugin_name, config)
return
# Plugin-registered video_gen provider — same flow, different
# registry.
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
# Imagegen backends prompt for model selection after backend pick.
backend = provider.get("imagegen_backend")
if backend:
@@ -2062,6 +2241,10 @@ def _configure_provider(provider: dict, config: dict):
if plugin_name:
_select_plugin_image_gen_provider(plugin_name, config)
return
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
# Imagegen backends prompt for model selection after env vars are in.
backend = provider.get("imagegen_backend")
if backend:

@@ -0,0 +1,523 @@
"""FAL.ai video generation backend.
User-facing surface: pick a **model family** (e.g. "Pixverse v6",
"Veo 3.1", "Seedance 2.0", "Kling v3 4K", "LTX 2.3", "Happy Horse").
The plugin auto-routes to the family's text-to-video endpoint when
called without ``image_url``, and to its image-to-video endpoint when
``image_url`` is provided. The agent never sees the routing; it just
calls ``video_generate(prompt=..., image_url=...)``.
Model families (each with t2v + i2v endpoints):
  Cheap tier:
    ltx-2.3       fal-ai/ltx-2.3-22b/text-to-video / fal-ai/ltx-2.3-22b/image-to-video
    pixverse-v6   fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video
  Premium tier:
    veo3.1        fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
    seedance-2.0  bytedance/seedance-2.0/text-to-video / bytedance/seedance-2.0/image-to-video
    kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video / fal-ai/kling-video/v3/4k/image-to-video
    happy-horse   fal-ai/happy-horse/text-to-video / fal-ai/happy-horse/image-to-video
Selection precedence for the active family:
1. ``model=`` arg from the tool call
2. ``FAL_VIDEO_MODEL`` env var
3. ``video_gen.fal.model`` in ``config.yaml``
4. ``video_gen.model`` in ``config.yaml`` (when it's one of our family IDs)
5. ``DEFAULT_MODEL``
Authentication via ``FAL_KEY``. Output is an HTTPS URL from FAL's CDN; the
gateway downloads and delivers it.
"""
from __future__ import annotations
import logging
import os
from typing import Any, Dict, List, Optional, Tuple
from agent.video_gen_provider import (
VideoGenProvider,
error_response,
success_response,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Family catalog
# ---------------------------------------------------------------------------
#
# Each family declares both endpoints (when available) plus a per-family
# capability sheet derived from FAL's OpenAPI schemas. Capability flags
# drive which keys get added to the request payload — keys a family doesn't
# advertise are dropped before send.
#
# Capabilities:
# aspect_ratios : tuple of supported ratios (None = endpoint decides)
# resolutions : tuple of supported resolutions (None = endpoint decides)
# durations : tuple of supported durations OR (min, max) range
# (heuristic: 2-element with gap > 1 is a range)
# audio : True if generate_audio is supported
# negative : True if negative_prompt is supported
FAL_FAMILIES: Dict[str, Dict[str, Any]] = {
# ─── Cheap / fast tier ─────────────────────────────────────────────
"ltx-2.3": {
"display": "LTX 2.3 (22B)",
"speed": "~30-60s",
"price": "cheap",
"strengths": "22B model with native audio generation. Affordable.",
"tier": "cheap",
"text_endpoint": "fal-ai/ltx-2.3-22b/text-to-video",
"image_endpoint": "fal-ai/ltx-2.3-22b/image-to-video",
# LTX docs don't expose duration/aspect/resolution enums — leave
# blank so we don't send unrecognized payload keys.
"aspect_ratios": None,
"resolutions": None,
"durations": None,
"audio": True,
"negative": True,
},
"pixverse-v6": {
"display": "Pixverse v6",
"speed": "~30-90s",
"price": "cheap",
"strengths": "Affordable. Negative prompts. 1-15s durations.",
"tier": "cheap",
"text_endpoint": "fal-ai/pixverse/v6/text-to-video",
"image_endpoint": "fal-ai/pixverse/v6/image-to-video",
"aspect_ratios": None,
"resolutions": ("360p", "540p", "720p", "1080p"),
"durations": (1, 15),
"audio": True,
"negative": True,
},
# ─── Expensive / premium tier ──────────────────────────────────────
"veo3.1": {
"display": "Veo 3.1",
"speed": "~60-120s",
"price": "premium",
"strengths": "Google DeepMind. Cinematic, native audio, strong prompt adherence.",
"tier": "premium",
"text_endpoint": "fal-ai/veo3.1",
"image_endpoint": "fal-ai/veo3.1/image-to-video",
"aspect_ratios": ("16:9", "9:16"),
"resolutions": ("720p", "1080p"),
"durations": (4, 6, 8),
"audio": True,
"negative": True,
},
"seedance-2.0": {
"display": "Seedance 2.0",
"speed": "~60-120s",
"price": "premium",
"strengths": "ByteDance. Cinematic, synchronized audio + lip-sync, 4-15s.",
"tier": "premium",
"text_endpoint": "bytedance/seedance-2.0/text-to-video",
"image_endpoint": "bytedance/seedance-2.0/image-to-video",
# Seedance accepts "auto" too — we omit it from the enum so the
# agent can't pass it; the endpoint defaults handle the rest.
"aspect_ratios": ("21:9", "16:9", "4:3", "1:1", "3:4", "9:16"),
"resolutions": ("480p", "720p", "1080p"),
"durations": (4, 15),
"audio": True,
"negative": False,
},
"kling-v3-4k": {
"display": "Kling v3 4K",
"speed": "~120-300s",
"price": "premium",
"strengths": "4K output, native audio (Chinese/English), 3-15s.",
"tier": "premium",
"text_endpoint": "fal-ai/kling-video/v3/4k/text-to-video",
"image_endpoint": "fal-ai/kling-video/v3/4k/image-to-video",
# Kling 4K image-to-video uses `start_image_url` instead of
# `image_url`. Handled in _build_payload via image_param_key.
"image_param_key": "start_image_url",
"aspect_ratios": ("16:9", "9:16", "1:1"),
"resolutions": None, # 4K is implicit
"durations": (3, 15),
"audio": True,
"negative": True,
},
"happy-horse": {
"display": "Happy Horse 1.0",
"speed": "~60-120s",
"price": "premium",
"strengths": "Alibaba. New model, sparse public docs — conservative defaults.",
"tier": "premium",
"text_endpoint": "fal-ai/happy-horse/text-to-video",
"image_endpoint": "fal-ai/happy-horse/image-to-video",
# Docs don't expose duration/aspect/resolution — let the endpoint
# apply its own defaults.
"aspect_ratios": None,
"resolutions": None,
"durations": None,
"audio": False,
"negative": False,
},
}
DEFAULT_MODEL = "pixverse-v6" # cheap, both modalities, sane defaults
def _is_duration_range(durations: Any) -> bool:
"""Heuristic: a 2-tuple of ints with a gap > 1 is treated as ``(min, max)``."""
if not isinstance(durations, tuple) or len(durations) != 2:
return False
if not all(isinstance(d, int) for d in durations):
return False
return durations[1] - durations[0] > 1
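def _clamp_duration(family: Dict[str, Any], duration: Optional[int]) -> Optional[int]:
"""Fit ``duration`` to the family's declared durations.
Clamps into a ``(min, max)`` range or snaps to the nearest enum value.
Returns ``duration`` unchanged when the family declares no durations.
"""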
durations = family.get("durations")
if not durations:
return duration
if duration is None:
return durations[0]
if _is_duration_range(durations):
lo, hi = durations
return max(lo, min(hi, duration))
# enum
if duration in durations:
return duration
return min(durations, key=lambda d: abs(d - duration))
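The range-versus-enum heuristic is easier to see with concrete inputs. Below is a standalone sketch of the same logic, taking the `durations` tuple directly instead of a family dict; `is_duration_range` and `clamp_duration` are renamed copies for illustration only.

```python
from typing import Optional, Tuple


def is_duration_range(durations: Tuple[int, ...]) -> bool:
    """A 2-tuple of ints with a gap > 1 is read as (min, max)."""
    return (
        isinstance(durations, tuple)
        and len(durations) == 2
        and all(isinstance(d, int) for d in durations)
        and durations[1] - durations[0] > 1
    )


def clamp_duration(
    durations: Optional[Tuple[int, ...]], duration: Optional[int]
) -> Optional[int]:
    if not durations:
        return duration        # nothing declared: pass the value through
    if duration is None:
        return durations[0]    # default to the first declared entry
    if is_duration_range(durations):
        lo, hi = durations
        return max(lo, min(hi, duration))
    if duration in durations:
        return duration
    # Enum: snap to the nearest allowed value; ties go to the earlier entry.
    return min(durations, key=lambda d: abs(d - duration))


range_result = clamp_duration((1, 15), 20)        # Pixverse-style range
enum_result = clamp_duration((4, 6, 8), 7)        # Veo-style enum
default_result = clamp_duration((4, 6, 8), None)  # no duration requested
ambiguous = is_duration_range((5, 10))            # hypothetical 2-value enum
print(range_result, enum_result, default_result, ambiguous)
```

One consequence of the heuristic: a family that allowed exactly two discrete durations more than one second apart (the hypothetical `(5, 10)` above) would be read as a range. No family in the current catalog has that shape, but a future entry might need an explicit marker.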
# ---------------------------------------------------------------------------
# Config / model resolution
# ---------------------------------------------------------------------------
def _load_video_gen_section() -> Dict[str, Any]:
try:
from hermes_cli.config import load_config
cfg = load_config()
section = cfg.get("video_gen") if isinstance(cfg, dict) else None
return section if isinstance(section, dict) else {}
except Exception as exc:
logger.debug("Could not load video_gen config: %s", exc)
return {}
def _resolve_family(explicit: Optional[str]) -> Tuple[str, Dict[str, Any]]:
"""Decide which FAL family to use. Returns ``(family_id, meta)``."""
candidates: List[Optional[str]] = []
candidates.append(explicit)
candidates.append(os.environ.get("FAL_VIDEO_MODEL"))
cfg = _load_video_gen_section()
fal_cfg = cfg.get("fal")
if isinstance(fal_cfg, dict):
candidates.append(fal_cfg.get("model"))
top = cfg.get("model")
if isinstance(top, str):
candidates.append(top)
for c in candidates:
if isinstance(c, str) and c.strip() and c.strip() in FAL_FAMILIES:
fid = c.strip()
return fid, FAL_FAMILIES[fid]
return DEFAULT_MODEL, FAL_FAMILIES[DEFAULT_MODEL]
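The precedence walk can be sketched without the config loader. In the example below, `FAMILIES` is a trimmed stand-in for `FAL_FAMILIES`, and `first_known_family` is a hypothetical helper that receives the candidates already collected in precedence order (tool argument, env var, `video_gen.fal.model`, `video_gen.model`).

```python
from typing import Dict, Iterable, Optional

# Trimmed, illustrative stand-in for the real family catalog.
FAMILIES: Dict[str, dict] = {"pixverse-v6": {}, "veo3.1": {}}
DEFAULT = "pixverse-v6"


def first_known_family(candidates: Iterable[Optional[str]]) -> str:
    """Return the first candidate that names a known family, else DEFAULT."""
    for c in candidates:
        if isinstance(c, str) and c.strip() in FAMILIES:
            return c.strip()
    return DEFAULT


# Candidate order: tool arg, env var, video_gen.fal.model, video_gen.model.
picked_env = first_known_family([None, " veo3.1 ", "pixverse-v6", None])
picked_skip = first_known_family(["grok-imagine-video", None, None, "veo3.1"])
picked_default = first_known_family([None, None, None, None])
print(picked_env, picked_skip, picked_default)
```

The second call shows a side effect of this design: an unrecognized `model=` value is skipped silently rather than rejected, so the next candidate in line decides the family.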
# ---------------------------------------------------------------------------
# Payload construction
# ---------------------------------------------------------------------------
def _build_payload(
family: Dict[str, Any],
*,
prompt: str,
image_url: Optional[str],
duration: Optional[int],
aspect_ratio: str,
resolution: str,
negative_prompt: Optional[str],
audio: Optional[bool],
seed: Optional[int],
) -> Dict[str, Any]:
"""Build a family-specific payload, dropping keys the family doesn't declare."""
payload: Dict[str, Any] = {}
if prompt:
payload["prompt"] = prompt
if image_url:
# Some endpoints (e.g. Kling v3 4K image-to-video) expect
# `start_image_url` instead of `image_url`. The family entry can
# declare an override.
key = family.get("image_param_key") or "image_url"
payload[key] = image_url
if seed is not None:
payload["seed"] = seed
if family.get("aspect_ratios"):
if aspect_ratio in family["aspect_ratios"]:
payload["aspect_ratio"] = aspect_ratio
# otherwise let the endpoint auto-crop / use its default
if family.get("resolutions"):
if resolution in family["resolutions"]:
payload["resolution"] = resolution
# else: let the endpoint default
clamped = _clamp_duration(family, duration)
if clamped is not None and family.get("durations"):
# FAL exposes duration as a string in the queue API ("8" not 8).
payload["duration"] = str(clamped)
if family.get("audio") and audio is not None:
payload["generate_audio"] = bool(audio)
if family.get("negative") and negative_prompt:
payload["negative_prompt"] = negative_prompt
return payload
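To illustrate how capability flags shape the request, here is a simplified sketch of the payload builder run against two trimmed family entries. Duration and seed handling are left out for brevity, `build_payload` is a reduced copy rather than the function above, and the example URL is a placeholder.

```python
from typing import Any, Dict, Optional

# Trimmed family entries; values copied from the catalog above.
KLING = {
    "image_param_key": "start_image_url",
    "aspect_ratios": ("16:9", "9:16", "1:1"),
    "resolutions": None,
    "audio": True,
    "negative": True,
}
SEEDANCE = {
    "aspect_ratios": ("21:9", "16:9", "4:3", "1:1", "3:4", "9:16"),
    "resolutions": ("480p", "720p", "1080p"),
    "audio": True,
    "negative": False,
}


def build_payload(
    family: Dict[str, Any],
    *,
    prompt: str,
    image_url: Optional[str] = None,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    negative_prompt: Optional[str] = None,
    audio: Optional[bool] = None,
) -> Dict[str, Any]:
    """Simplified builder: keys the family doesn't declare are dropped."""
    payload: Dict[str, Any] = {}
    if prompt:
        payload["prompt"] = prompt
    if image_url:
        payload[family.get("image_param_key") or "image_url"] = image_url
    if family.get("aspect_ratios") and aspect_ratio in family["aspect_ratios"]:
        payload["aspect_ratio"] = aspect_ratio
    if family.get("resolutions") and resolution in family["resolutions"]:
        payload["resolution"] = resolution
    if family.get("audio") and audio is not None:
        payload["generate_audio"] = bool(audio)
    if family.get("negative") and negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload


common = dict(
    prompt="a red kite",
    image_url="https://example.com/kite.png",  # placeholder URL
    negative_prompt="blurry",
    audio=True,
)
kling_payload = build_payload(KLING, **common)
seedance_payload = build_payload(SEEDANCE, **common)
print(kling_payload)
print(seedance_payload)
```

With identical inputs, the Kling payload carries `start_image_url` and no `resolution`, while the Seedance payload carries `image_url` and `resolution` but drops `negative_prompt`.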
# ---------------------------------------------------------------------------
# fal_client lazy import (same pattern as image_generation_tool)
# ---------------------------------------------------------------------------
_fal_client: Any = None
def _load_fal_client() -> Any:
global _fal_client
if _fal_client is not None:
return _fal_client
import fal_client # type: ignore
_fal_client = fal_client
return fal_client
# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------
class FALVideoGenProvider(VideoGenProvider):
"""FAL.ai multi-family video generation backend.
Routes between text-to-video and image-to-video endpoints automatically
based on whether ``image_url`` was provided.
"""
@property
def name(self) -> str:
return "fal"
@property
def display_name(self) -> str:
return "FAL"
def is_available(self) -> bool:
if not os.environ.get("FAL_KEY", "").strip():
return False
try:
import fal_client # noqa: F401
except ImportError:
return False
return True
def list_models(self) -> List[Dict[str, Any]]:
out: List[Dict[str, Any]] = []
for fid, meta in FAL_FAMILIES.items():
modalities: List[str] = []
if meta.get("text_endpoint"):
modalities.append("text")
if meta.get("image_endpoint"):
modalities.append("image")
out.append({
"id": fid,
"display": meta["display"],
"speed": meta["speed"],
"strengths": meta["strengths"],
"price": meta["price"],
"tier": meta.get("tier", "premium"),
"modalities": modalities,
})
return out
def default_model(self) -> Optional[str]:
return DEFAULT_MODEL
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "FAL",
"badge": "paid",
"tag": "LTX, Pixverse, Veo 3.1, Seedance 2.0, Kling 4K, Happy Horse — text-to-video & image-to-video",
"env_vars": [
{
"key": "FAL_KEY",
"prompt": "FAL.ai API key",
"url": "https://fal.ai/dashboard/keys",
},
],
}
def capabilities(self) -> Dict[str, Any]:
return {
"modalities": ["text", "image"],
"aspect_ratios": ["16:9", "9:16", "1:1"],
"resolutions": ["360p", "540p", "720p", "1080p"],
"max_duration": 15,
"min_duration": 1,
"supports_audio": True,
"supports_negative_prompt": True,
"max_reference_images": 0,
}
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = "16:9",
resolution: str = "720p",
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if not os.environ.get("FAL_KEY", "").strip():
return error_response(
error=(
"FAL_KEY not set. Run `hermes tools` → Video Generation "
"→ FAL to configure."
),
error_type="auth_required",
provider="fal",
prompt=prompt,
)
try:
fal_client = _load_fal_client()
except ImportError:
return error_response(
error="fal_client Python package not installed (pip install fal-client)",
error_type="missing_dependency",
provider="fal",
prompt=prompt,
)
prompt = (prompt or "").strip()
family_id, family = _resolve_family(model)
# Route: image_url → image-to-video endpoint; else → text-to-video.
image_url_norm = (image_url or "").strip() or None
if image_url_norm:
endpoint = family.get("image_endpoint")
modality_used = "image"
if not endpoint:
return error_response(
error=(
f"FAL family {family_id} has no image-to-video "
f"endpoint. Pick a family with image-to-video support "
f"via `hermes tools` → Video Generation."
),
error_type="modality_unsupported",
provider="fal", model=family_id, prompt=prompt,
)
else:
endpoint = family.get("text_endpoint")
modality_used = "text"
if not endpoint:
return error_response(
error=(
f"FAL family {family_id} has no text-to-video "
f"endpoint. Pass an image_url to use its "
f"image-to-video endpoint, or pick a different family."
),
error_type="modality_unsupported",
provider="fal", model=family_id, prompt=prompt,
)
if not prompt:
return error_response(
error="prompt is required.",
error_type="missing_prompt",
provider="fal", model=family_id, prompt=prompt,
)
payload = _build_payload(
family,
prompt=prompt,
image_url=image_url_norm,
duration=duration,
aspect_ratio=aspect_ratio,
resolution=resolution,
negative_prompt=negative_prompt,
audio=audio,
seed=seed,
)
try:
result = fal_client.subscribe(
endpoint,
arguments=payload,
with_logs=False,
)
except Exception as exc:
logger.warning(
"FAL video gen failed (family=%s, endpoint=%s): %s",
family_id, endpoint, exc, exc_info=True,
)
return error_response(
error=f"FAL video generation failed: {exc}",
error_type="api_error",
provider="fal", model=family_id, prompt=prompt,
aspect_ratio=aspect_ratio,
)
video = (result or {}).get("video") if isinstance(result, dict) else None
url: Optional[str] = None
if isinstance(video, dict):
url = video.get("url")
elif isinstance(video, str):
url = video
if not url:
return error_response(
error="FAL returned no video URL in response",
error_type="empty_response",
provider="fal", model=family_id, prompt=prompt,
)
extra: Dict[str, Any] = {"endpoint": endpoint}
if isinstance(video, dict):
if video.get("file_size"):
extra["file_size"] = video["file_size"]
if video.get("content_type"):
extra["content_type"] = video["content_type"]
return success_response(
video=url,
model=family_id,
prompt=prompt,
modality=modality_used,
aspect_ratio=aspect_ratio if "aspect_ratio" in payload else "",
duration=int(payload["duration"]) if "duration" in payload else 0,
provider="fal",
extra=extra,
)
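The endpoint routing inside `generate` reduces to a small decision. The sketch below isolates it, assuming a family dict with the same `text_endpoint` / `image_endpoint` keys; `pick_endpoint` is a hypothetical helper and the image URL is a placeholder.

```python
from typing import Any, Dict, Optional, Tuple


def pick_endpoint(
    family: Dict[str, Any], image_url: Optional[str]
) -> Tuple[Optional[str], str]:
    """Return (endpoint, modality); endpoint is None if the family lacks it."""
    normalized = (image_url or "").strip() or None
    if normalized:
        return family.get("image_endpoint"), "image"
    return family.get("text_endpoint"), "text"


VEO = {
    "text_endpoint": "fal-ai/veo3.1",
    "image_endpoint": "fal-ai/veo3.1/image-to-video",
}

text_route = pick_endpoint(VEO, None)
image_route = pick_endpoint(VEO, "https://example.com/frame.png")
blank_route = pick_endpoint(VEO, "   ")  # whitespace-only counts as absent
print(text_route, image_route, blank_route)
```

A `None` endpoint is the case the provider reports as `modality_unsupported`.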
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Plugin entry point — wire ``FALVideoGenProvider`` into the registry."""
ctx.register_video_gen_provider(FALVideoGenProvider())

@@ -0,0 +1,7 @@
name: fal
version: 1.0.0
description: "FAL.ai video generation backend. Multi-model (LTX 2.3, Pixverse v6, Veo 3.1, Seedance 2.0, Kling v3 4K, Happy Horse), covering text-to-video and image-to-video via fal_client's queue API."
author: NousResearch
kind: backend
requires_env:
- FAL_KEY

@@ -0,0 +1,402 @@
"""xAI Grok-Imagine video generation backend.
Surface: text-to-video and image-to-video (animate an input image)
through xAI's ``/videos/generations`` endpoint. Edit and extend are not
exposed in this unified surface: xAI is the only backend that supports
them, and the inconsistency would force per-backend prose in the agent's
tool description.
Originally salvaged from PR #10600 by @Jaaneek; reshaped into the
:class:`VideoGenProvider` plugin interface and trimmed to the
generate-only surface.
Authentication via ``XAI_API_KEY``. Output is an HTTPS URL from xAI's
CDN; the gateway downloads and delivers it.
"""
from __future__ import annotations
import asyncio
import logging
import os
import uuid
from typing import Any, Dict, List, Optional
import httpx
from agent.video_gen_provider import (
VideoGenProvider,
error_response,
success_response,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_XAI_BASE_URL = "https://api.x.ai/v1"
DEFAULT_MODEL = "grok-imagine-video"
DEFAULT_DURATION = 8
DEFAULT_ASPECT_RATIO = "16:9"
DEFAULT_RESOLUTION = "720p"
DEFAULT_TIMEOUT_SECONDS = 240
DEFAULT_POLL_INTERVAL_SECONDS = 5
VALID_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
VALID_RESOLUTIONS = {"480p", "720p"}
MAX_REFERENCE_IMAGES = 7
_MODELS: Dict[str, Dict[str, Any]] = {
"grok-imagine-video": {
"display": "Grok Imagine Video",
"speed": "~60-240s",
"strengths": "Text-to-video + image-to-video; up to 7 reference images for style/character.",
"price": "see https://docs.x.ai/docs/models",
"modalities": ["text", "image"],
},
}
# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
def _xai_base_url() -> str:
return (os.getenv("XAI_BASE_URL") or DEFAULT_XAI_BASE_URL).strip().rstrip("/")
def _xai_headers() -> Dict[str, str]:
api_key = os.getenv("XAI_API_KEY", "").strip()
if not api_key:
raise ValueError("XAI_API_KEY not set. Get one at https://console.x.ai/")
try:
from tools.xai_http import hermes_xai_user_agent
ua = hermes_xai_user_agent()
except Exception:
ua = "hermes-agent/video_gen"
return {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"User-Agent": ua,
}
def _normalize_reference_images(reference_image_urls: Optional[List[str]]):
refs = []
for url in reference_image_urls or []:
normalized = (url or "").strip()
if normalized:
refs.append({"url": normalized})
return refs or None
def _clamp_duration(duration: Optional[int], has_reference_images: bool) -> int:
value = duration if duration is not None else DEFAULT_DURATION
if value < 1:
value = 1
if value > 15:
value = 15
if has_reference_images and value > 10:
value = 10
return value
async def _submit(
client: httpx.AsyncClient,
payload: Dict[str, Any],
) -> str:
"""POST to /videos/generations — xAI's only public endpoint for our
text-to-video and image-to-video surface."""
response = await client.post(
f"{_xai_base_url()}/videos/generations",
headers={**_xai_headers(), "x-idempotency-key": str(uuid.uuid4())},
json=payload,
timeout=60,
)
response.raise_for_status()
body = response.json()
request_id = body.get("request_id")
if not request_id:
raise RuntimeError("xAI video response did not include request_id")
return request_id
async def _poll(
client: httpx.AsyncClient,
request_id: str,
*,
timeout_seconds: int,
poll_interval: int,
) -> Dict[str, Any]:
elapsed = 0.0
last_status = "queued"
while elapsed < timeout_seconds:
response = await client.get(
f"{_xai_base_url()}/videos/{request_id}",
headers=_xai_headers(),
timeout=30,
)
response.raise_for_status()
body = response.json()
last_status = (body.get("status") or "").lower()
if last_status == "done":
return {"status": "done", "body": body}
if last_status in {"failed", "error", "expired", "cancelled"}:
return {"status": last_status, "body": body}
await asyncio.sleep(poll_interval)
elapsed += poll_interval
return {"status": "timeout", "body": {"status": last_status}}
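The loop above advances `elapsed` only by the sleep interval, so time spent inside each HTTP request does not count toward the timeout. One possible variant, sketched here under that observation, measures against a `time.monotonic()` deadline instead. `poll_until_done`, `make_fake_fetch`, and the canned status strings are hypothetical; no real xAI call is made.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List

TERMINAL_FAILURES = {"failed", "error", "expired", "cancelled"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[Dict[str, Any]]],
    *,
    timeout_seconds: float,
    poll_interval: float,
) -> Dict[str, Any]:
    """Poll until a terminal status or until the monotonic deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    last_status = "queued"
    while time.monotonic() < deadline:
        body = await fetch_status()
        last_status = (body.get("status") or "").lower()
        if last_status == "done" or last_status in TERMINAL_FAILURES:
            return {"status": last_status, "body": body}
        await asyncio.sleep(poll_interval)
    return {"status": "timeout", "body": {"status": last_status}}


def make_fake_fetch(statuses: List[str]) -> Callable[[], Awaitable[Dict[str, Any]]]:
    """Hypothetical stand-in for the HTTP GET: replays canned statuses."""
    remaining = list(statuses)

    async def fetch() -> Dict[str, Any]:
        # Keep returning the final status once the list is exhausted.
        status = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return {"status": status}

    return fetch


finished = asyncio.run(
    poll_until_done(
        make_fake_fetch(["queued", "pending", "done"]),
        timeout_seconds=5,
        poll_interval=0,
    )
)
timed_out = asyncio.run(
    poll_until_done(
        make_fake_fetch(["queued"]),
        timeout_seconds=0.05,
        poll_interval=0.01,
    )
)
print(finished["status"], timed_out["status"])
```

The return shape matches the original `_poll`, so the caller's handling of `done`, `timeout`, and failure statuses would not need to change.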
# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------
class XAIVideoGenProvider(VideoGenProvider):
"""xAI grok-imagine-video backend (text-to-video + image-to-video)."""
@property
def name(self) -> str:
return "xai"
@property
def display_name(self) -> str:
return "xAI"
def is_available(self) -> bool:
return bool(os.environ.get("XAI_API_KEY", "").strip())
def list_models(self) -> List[Dict[str, Any]]:
return [{"id": mid, **meta} for mid, meta in _MODELS.items()]
def default_model(self) -> Optional[str]:
return DEFAULT_MODEL
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "xAI",
"badge": "paid",
"tag": "grok-imagine-video — text-to-video & image-to-video with reference images",
"env_vars": [
{
"key": "XAI_API_KEY",
"prompt": "xAI API key",
"url": "https://console.x.ai/",
},
],
}
def capabilities(self) -> Dict[str, Any]:
return {
"modalities": ["text", "image"],
"aspect_ratios": sorted(VALID_ASPECT_RATIOS),
"resolutions": sorted(VALID_RESOLUTIONS),
"max_duration": 15,
"min_duration": 1,
"supports_audio": False,
"supports_negative_prompt": False,
"max_reference_images": MAX_REFERENCE_IMAGES,
}
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
resolution: str = DEFAULT_RESOLUTION,
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any,
) -> Dict[str, Any]:
try:
loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(self._generate_async(
prompt=prompt,
model=model,
image_url=image_url,
reference_image_urls=reference_image_urls,
duration=duration,
aspect_ratio=aspect_ratio,
resolution=resolution,
))
finally:
loop.close()
except Exception as exc:
logger.warning("xAI video gen unexpected failure: %s", exc, exc_info=True)
return error_response(
error=f"xAI video generation failed: {exc}",
error_type="api_error",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
aspect_ratio=aspect_ratio,
)
async def _generate_async(
self,
*,
prompt: str,
model: Optional[str],
image_url: Optional[str],
reference_image_urls: Optional[List[str]],
duration: Optional[int],
aspect_ratio: str,
resolution: str,
) -> Dict[str, Any]:
if not os.environ.get("XAI_API_KEY", "").strip():
return error_response(
error="XAI_API_KEY not set. Get one at https://console.x.ai/",
error_type="auth_required",
provider="xai", prompt=prompt,
)
prompt = (prompt or "").strip()
image_url_norm = (image_url or "").strip() or None
normalized_aspect_ratio = (aspect_ratio or DEFAULT_ASPECT_RATIO).strip()
normalized_resolution = (resolution or DEFAULT_RESOLUTION).strip().lower()
modality_used = "image" if image_url_norm else "text"
if not prompt:
return error_response(
error=(
"prompt is required for xAI video generation "
"(text-to-video or image-to-video)"
),
error_type="missing_prompt",
provider="xai", prompt=prompt,
)
refs = _normalize_reference_images(reference_image_urls)
if refs and len(refs) > MAX_REFERENCE_IMAGES:
return error_response(
error=f"reference_image_urls supports at most {MAX_REFERENCE_IMAGES} images on xAI",
error_type="too_many_references",
provider="xai", prompt=prompt,
)
if image_url_norm and refs:
return error_response(
error="image_url and reference_image_urls cannot be combined on xAI",
error_type="conflicting_inputs",
provider="xai", prompt=prompt,
)
clamped_duration = _clamp_duration(duration, has_reference_images=bool(refs))
if normalized_aspect_ratio not in VALID_ASPECT_RATIOS:
normalized_aspect_ratio = DEFAULT_ASPECT_RATIO
if normalized_resolution not in VALID_RESOLUTIONS:
normalized_resolution = DEFAULT_RESOLUTION
payload: Dict[str, Any] = {
"model": model or DEFAULT_MODEL,
"prompt": prompt,
"duration": clamped_duration,
"aspect_ratio": normalized_aspect_ratio,
"resolution": normalized_resolution,
}
if image_url_norm:
payload["image"] = {"url": image_url_norm}
if refs:
payload["reference_images"] = refs
async with httpx.AsyncClient() as client:
try:
request_id = await _submit(client, payload)
except httpx.HTTPStatusError as exc:
detail = ""
try:
detail = exc.response.text[:500]
except Exception:
pass
return error_response(
error=f"xAI submit failed ({exc.response.status_code}): {detail or exc}",
error_type="api_error",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
poll_result = await _poll(
client, request_id,
timeout_seconds=DEFAULT_TIMEOUT_SECONDS,
poll_interval=DEFAULT_POLL_INTERVAL_SECONDS,
)
status = poll_result["status"]
body = poll_result["body"]
if status == "done":
video = body.get("video") or {}
url = video.get("url")
if not url:
return error_response(
error="xAI video generation completed without a video URL",
error_type="empty_response",
provider="xai",
model=body.get("model") or model or DEFAULT_MODEL,
prompt=prompt,
)
extra: Dict[str, Any] = {
"request_id": request_id,
"resolution": normalized_resolution,
}
if body.get("usage"):
extra["usage"] = body["usage"]
return success_response(
video=url,
model=body.get("model") or model or DEFAULT_MODEL,
prompt=prompt,
modality=modality_used,
aspect_ratio=normalized_aspect_ratio,
duration=video.get("duration") or clamped_duration,
provider="xai",
extra=extra,
)
if status == "timeout":
return error_response(
error=f"Timed out waiting for video generation after {DEFAULT_TIMEOUT_SECONDS}s",
error_type="timeout",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
message = (
(body.get("error", {}) or {}).get("message")
or body.get("message")
or f"xAI video generation ended with status '{status}'"
)
return error_response(
error=message,
error_type=f"xai_{status}",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Plugin entry point — wire ``XAIVideoGenProvider`` into the registry."""
ctx.register_video_gen_provider(XAIVideoGenProvider())

@@ -0,0 +1,7 @@
name: xai
version: 1.0.0
description: "xAI Grok-Imagine video generation backend. Supports text-to-video, image-to-video, and reference-image-guided generation via the xAI async videos API."
author: NousResearch
kind: backend
requires_env:
- XAI_API_KEY

@@ -0,0 +1,114 @@
"""Tests for agent/video_gen_registry.py — provider registration & active lookup."""
from __future__ import annotations
import pytest
from agent import video_gen_registry
from agent.video_gen_provider import VideoGenProvider
class _FakeProvider(VideoGenProvider):
def __init__(self, name: str, available: bool = True):
self._name = name
self._available = available
@property
def name(self) -> str:
return self._name
def is_available(self) -> bool:
return self._available
def generate(self, prompt, **kw):
return {"success": True, "video": f"{self._name}://{prompt}"}
@pytest.fixture(autouse=True)
def _reset_registry():
video_gen_registry._reset_for_tests()
yield
video_gen_registry._reset_for_tests()
class TestRegisterProvider:
def test_register_and_lookup(self):
provider = _FakeProvider("fake")
video_gen_registry.register_provider(provider)
assert video_gen_registry.get_provider("fake") is provider
def test_rejects_non_provider(self):
with pytest.raises(TypeError):
video_gen_registry.register_provider("not a provider") # type: ignore[arg-type]
def test_rejects_empty_name(self):
class Empty(VideoGenProvider):
@property
def name(self) -> str:
return ""
def generate(self, prompt, **kw):
return {}
with pytest.raises(ValueError):
video_gen_registry.register_provider(Empty())
def test_reregister_overwrites(self):
a = _FakeProvider("same")
b = _FakeProvider("same")
video_gen_registry.register_provider(a)
video_gen_registry.register_provider(b)
assert video_gen_registry.get_provider("same") is b
def test_list_is_sorted(self):
video_gen_registry.register_provider(_FakeProvider("zeta"))
video_gen_registry.register_provider(_FakeProvider("alpha"))
names = [p.name for p in video_gen_registry.list_providers()]
assert names == ["alpha", "zeta"]
class TestGetActiveProvider:
def test_single_provider_autoresolves(self, tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
video_gen_registry.register_provider(_FakeProvider("solo"))
active = video_gen_registry.get_active_provider()
assert active is not None and active.name == "solo"
def test_no_provider_returns_none(self, tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
assert video_gen_registry.get_active_provider() is None
def test_multi_without_config_returns_none(self, tmp_path, monkeypatch):
"""Unlike image_gen (which falls back to 'fal'), video_gen has no
legacy default: when there are multiple providers and no config,
the registry returns None and the tool surfaces a helpful error.
"""
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
video_gen_registry.register_provider(_FakeProvider("xai"))
video_gen_registry.register_provider(_FakeProvider("fal"))
assert video_gen_registry.get_active_provider() is None
def test_config_selects_provider(self, tmp_path, monkeypatch):
import yaml
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
(tmp_path / "config.yaml").write_text(
yaml.safe_dump({"video_gen": {"provider": "fal"}})
)
video_gen_registry.register_provider(_FakeProvider("xai"))
video_gen_registry.register_provider(_FakeProvider("fal"))
active = video_gen_registry.get_active_provider()
assert active is not None and active.name == "fal"
def test_unknown_config_falls_back(self, tmp_path, monkeypatch):
"""If video_gen.provider names a provider that isn't registered,
the single-provider fallback still applies."""
import yaml
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
(tmp_path / "config.yaml").write_text(
yaml.safe_dump({"video_gen": {"provider": "ghost"}})
)
video_gen_registry.register_provider(_FakeProvider("only"))
active = video_gen_registry.get_active_provider()
assert active is not None and active.name == "only"
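
The resolution order these tests pin down can be summarised in a few lines. The following is only a sketch inferred from the assertions above, not the registry's actual code; `resolve_active` and its plain-dict registry are hypothetical stand-ins for `get_active_provider()` and the real registry state.

```python
from typing import Dict, Optional


def resolve_active(providers: Dict[str, object], configured: Optional[str]) -> Optional[object]:
    """Hypothetical stand-in for get_active_provider(); inferred from the tests."""
    # 1. An explicitly configured provider wins, but only if it is registered.
    if configured and configured in providers:
        return providers[configured]
    # 2. Otherwise a lone registered provider is picked automatically
    #    (this also covers a config that names an unknown provider).
    if len(providers) == 1:
        return next(iter(providers.values()))
    # 3. Zero or several providers and no usable config: no legacy default.
    return None
```

Returning `None` in the ambiguous case, instead of guessing, is what lets the tool layer report a configuration error to the caller.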

@ -0,0 +1 @@
"""Make tests/plugins/video_gen a package."""

@ -0,0 +1,314 @@
"""Tests for the FAL video gen plugin — family routing, payload shape."""
from __future__ import annotations

import pytest

from agent import video_gen_registry


@pytest.fixture(autouse=True)
def _reset_registry():
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


def test_fal_provider_registers():
    from plugins.video_gen.fal import FALVideoGenProvider, DEFAULT_MODEL
    provider = FALVideoGenProvider()
    video_gen_registry.register_provider(provider)
    assert video_gen_registry.get_provider("fal") is provider
    assert provider.display_name == "FAL"
    # DEFAULT_MODEL is the cheap-tier default
    assert provider.default_model() == DEFAULT_MODEL
    assert DEFAULT_MODEL in {"pixverse-v6", "ltx-2.3"}


def test_fal_family_catalog():
    """Each family declares both endpoints. The catalog covers the
    cheap + premium tiers Teknium listed."""
    from plugins.video_gen.fal import FAL_FAMILIES
    expected = {
        # cheap
        "ltx-2.3", "pixverse-v6",
        # premium
        "veo3.1", "seedance-2.0", "kling-v3-4k", "happy-horse",
    }
    assert expected.issubset(set(FAL_FAMILIES.keys())), (
        f"missing families: {expected - set(FAL_FAMILIES.keys())}"
    )
    for fid, meta in FAL_FAMILIES.items():
        assert meta.get("text_endpoint"), f"{fid} missing text_endpoint"
        assert meta.get("image_endpoint"), f"{fid} missing image_endpoint"
        assert meta["text_endpoint"] != meta["image_endpoint"]
        assert meta.get("tier") in {"cheap", "premium"}, (
            f"{fid} has invalid tier"
        )


def test_kling_4k_uses_start_image_url():
    """Kling v3 4K's image-to-video endpoint expects start_image_url,
    not image_url. The family must declare image_param_key='start_image_url'."""
    from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
    meta = FAL_FAMILIES["kling-v3-4k"]
    assert meta.get("image_param_key") == "start_image_url"
    payload = _build_payload(
        meta,
        prompt="x",
        image_url="https://example.com/i.png",
        duration=5,
        aspect_ratio="16:9",
        resolution="720p",
        negative_prompt=None,
        audio=None,
        seed=None,
    )
    assert payload.get("start_image_url") == "https://example.com/i.png"
    assert "image_url" not in payload


def test_fal_list_models_advertises_both_modalities():
    from plugins.video_gen.fal import FALVideoGenProvider
    models = FALVideoGenProvider().list_models()
    for m in models:
        assert set(m["modalities"]) == {"text", "image"}, (
            f"{m['id']} doesn't advertise both modalities; every family "
            f"should have t2v + i2v"
        )


def test_fal_unavailable_without_key(monkeypatch):
    from plugins.video_gen.fal import FALVideoGenProvider
    monkeypatch.delenv("FAL_KEY", raising=False)
    assert FALVideoGenProvider().is_available() is False


def test_fal_generate_requires_fal_key(monkeypatch):
    from plugins.video_gen.fal import FALVideoGenProvider
    monkeypatch.delenv("FAL_KEY", raising=False)
    result = FALVideoGenProvider().generate("a happy dog")
    assert result["success"] is False
    assert result["error_type"] == "auth_required"


class TestFamilyRouting:
    """The headline behavior: image_url presence picks the endpoint."""

    @pytest.fixture
    def with_fake_fal(self, monkeypatch):
        """Stub fal_client.subscribe to capture which endpoint we hit."""
        import sys
        import types
        captured = {"endpoint": None, "arguments": None}
        fake = types.ModuleType("fal_client")

        def _subscribe(endpoint, arguments=None, with_logs=False):
            captured["endpoint"] = endpoint
            captured["arguments"] = arguments
            return {"video": {"url": "https://fake/out.mp4"}}

        fake.subscribe = _subscribe  # type: ignore
        monkeypatch.setitem(sys.modules, "fal_client", fake)
        # Reset the lazy global so it picks up our stub
        from plugins.video_gen import fal as fal_plugin
        fal_plugin._fal_client = None
        monkeypatch.setenv("FAL_KEY", "test")
        return captured

    def test_text_to_video_routes_to_text_endpoint(self, with_fake_fal):
        from plugins.video_gen.fal import FALVideoGenProvider
        result = FALVideoGenProvider().generate(
            "a dog running",
            model="pixverse-v6",
        )
        assert result["success"] is True
        assert with_fake_fal["endpoint"] == "fal-ai/pixverse/v6/text-to-video"
        assert result["modality"] == "text"
        assert with_fake_fal["arguments"]["prompt"] == "a dog running"
        assert "image_url" not in with_fake_fal["arguments"]

    def test_image_to_video_routes_to_image_endpoint(self, with_fake_fal):
        from plugins.video_gen.fal import FALVideoGenProvider
        result = FALVideoGenProvider().generate(
            "animate this dog",
            model="pixverse-v6",
            image_url="https://example.com/dog.png",
        )
        assert result["success"] is True
        assert with_fake_fal["endpoint"] == "fal-ai/pixverse/v6/image-to-video"
        assert result["modality"] == "image"
        assert with_fake_fal["arguments"]["image_url"] == "https://example.com/dog.png"

    def test_default_family_text_routing(self, with_fake_fal):
        """No model arg → DEFAULT_MODEL → text-to-video endpoint."""
        from plugins.video_gen.fal import FALVideoGenProvider, FAL_FAMILIES, DEFAULT_MODEL
        result = FALVideoGenProvider().generate("a dog")
        assert result["success"] is True
        expected_endpoint = FAL_FAMILIES[DEFAULT_MODEL]["text_endpoint"]
        assert with_fake_fal["endpoint"] == expected_endpoint

    def test_default_family_image_routing(self, with_fake_fal):
        from plugins.video_gen.fal import FALVideoGenProvider, FAL_FAMILIES, DEFAULT_MODEL
        result = FALVideoGenProvider().generate(
            "animate this",
            image_url="https://example.com/i.png",
        )
        assert result["success"] is True
        expected_endpoint = FAL_FAMILIES[DEFAULT_MODEL]["image_endpoint"]
        assert with_fake_fal["endpoint"] == expected_endpoint

    def test_unknown_family_falls_back_to_default(self, with_fake_fal):
        from plugins.video_gen.fal import FALVideoGenProvider, FAL_FAMILIES, DEFAULT_MODEL
        result = FALVideoGenProvider().generate(
            "x",
            model="not-a-real-family",
        )
        assert result["success"] is True
        expected_endpoint = FAL_FAMILIES[DEFAULT_MODEL]["text_endpoint"]
        assert with_fake_fal["endpoint"] == expected_endpoint

    def test_premium_seedance_routing(self, with_fake_fal):
        """Sanity check the premium-tier seedance routes correctly."""
        from plugins.video_gen.fal import FALVideoGenProvider
        result = FALVideoGenProvider().generate(
            "a dog",
            model="seedance-2.0",
            image_url="https://example.com/dog.png",
        )
        assert result["success"] is True
        assert with_fake_fal["endpoint"] == "bytedance/seedance-2.0/image-to-video"
        # Seedance uses regular image_url (not start_image_url)
        assert with_fake_fal["arguments"]["image_url"] == "https://example.com/dog.png"

    def test_kling_4k_remaps_image_param(self, with_fake_fal):
        """Kling v3 4K image-to-video receives start_image_url, not image_url."""
        from plugins.video_gen.fal import FALVideoGenProvider
        result = FALVideoGenProvider().generate(
            "x",
            model="kling-v3-4k",
            image_url="https://example.com/frame.png",
        )
        assert result["success"] is True
        assert with_fake_fal["endpoint"] == "fal-ai/kling-video/v3/4k/image-to-video"
        assert with_fake_fal["arguments"].get("start_image_url") == "https://example.com/frame.png"
        assert "image_url" not in with_fake_fal["arguments"]


class TestPayloadBuilder:
    def test_drops_unsupported_keys(self):
        """Veo enum-clamps duration, supports aspect+resolution+audio+neg."""
        from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
        meta = FAL_FAMILIES["veo3.1"]
        p = _build_payload(
            meta,
            prompt="x",
            image_url=None,
            duration=12,  # not in enum (4,6,8): snap to 8
            aspect_ratio="16:9",
            resolution="720p",
            negative_prompt="ugly",
            audio=True,
            seed=42,
        )
        assert p["prompt"] == "x"
        assert p["duration"] == "8"  # FAL queue API uses strings
        assert p["aspect_ratio"] == "16:9"
        assert p["resolution"] == "720p"
        assert p["generate_audio"] is True
        assert p["negative_prompt"] == "ugly"
        assert p["seed"] == 42

    def test_pixverse_range_clamps_correctly(self):
        from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
        meta = FAL_FAMILIES["pixverse-v6"]
        p = _build_payload(
            meta,
            prompt="x",
            image_url="https://i.png",
            duration=99,  # over max → 15
            aspect_ratio="16:9",
            resolution="540p",
            negative_prompt=None,
            audio=None,
            seed=None,
        )
        assert p["duration"] == "15"

    def test_kling_4k_clamps_below_min(self):
        from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
        meta = FAL_FAMILIES["kling-v3-4k"]
        p = _build_payload(
            meta,
            prompt="x",
            image_url="https://i.png",
            duration=1,  # below min (3) → 3
            aspect_ratio="16:9",
            resolution="720p",
            negative_prompt=None,
            audio=None,
            seed=None,
        )
        assert p["duration"] == "3"

    def test_ltx_omits_duration_aspect_resolution(self):
        """LTX 2.3 doesn't declare duration/aspect/resolution enums, so
        the payload should NOT include those keys (let FAL default)."""
        from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
        meta = FAL_FAMILIES["ltx-2.3"]
        p = _build_payload(
            meta,
            prompt="x",
            image_url=None,
            duration=8,
            aspect_ratio="16:9",
            resolution="720p",
            negative_prompt="ugly",
            audio=True,
            seed=None,
        )
        assert "duration" not in p
        assert "aspect_ratio" not in p
        assert "resolution" not in p
        # But audio + negative are advertised
        assert p["generate_audio"] is True
        assert p["negative_prompt"] == "ugly"

    def test_happy_horse_minimal_payload(self):
        """Happy Horse has sparse docs, so the payload should be minimal."""
        from plugins.video_gen.fal import FAL_FAMILIES, _build_payload
        meta = FAL_FAMILIES["happy-horse"]
        p = _build_payload(
            meta,
            prompt="a horse galloping",
            image_url=None,
            duration=8,
            aspect_ratio="16:9",
            resolution="720p",
            negative_prompt="watermark",
            audio=True,
            seed=None,
        )
        # Only prompt: no payload bloat for fields we can't verify
        assert p == {"prompt": "a horse galloping"}
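
The clamping and key-dropping behaviour asserted above could look roughly like the sketch below. It is an illustration under assumptions, not the plugin's `_build_payload`: the metadata keys `durations`, `min_duration` and `max_duration` are hypothetical, as is the "snap to the nearest allowed value" rule (the tests only show 12 becoming 8). Only `image_param_key` appears in the source.

```python
from typing import Any, Dict, Optional, Sequence


def clamp_duration(
    requested: Optional[int],
    *,
    allowed: Optional[Sequence[int]] = None,
    min_s: Optional[int] = None,
    max_s: Optional[int] = None,
) -> Optional[str]:
    """Return the duration as a string, or None when the family declares nothing."""
    if requested is None:
        return None
    if allowed:
        # Enum-style family: snap to the closest allowed value (assumed rule).
        value = min(allowed, key=lambda a: abs(a - requested))
    elif min_s is not None and max_s is not None:
        # Range-style family: clamp into [min_s, max_s].
        value = max(min_s, min(max_s, requested))
    else:
        # Family declares no duration support: omit the key entirely.
        return None
    return str(value)


def build_payload(
    meta: Dict[str, Any],
    prompt: str,
    image_url: Optional[str] = None,
    duration: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical, reduced payload builder covering image key + duration only."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url:
        # Some families rename the image field (e.g. start_image_url).
        payload[meta.get("image_param_key") or "image_url"] = image_url
    clamped = clamp_duration(
        duration,
        allowed=meta.get("durations"),
        min_s=meta.get("min_duration"),
        max_s=meta.get("max_duration"),
    )
    if clamped is not None:
        payload["duration"] = clamped
    return payload
```

Omitting a key the family does not declare, instead of sending a guessed value, leaves the backend free to apply its own default.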

@ -0,0 +1,69 @@
"""Smoke tests for the xAI video gen plugin — load & register surface."""
from __future__ import annotations

import pytest

from agent import video_gen_registry


@pytest.fixture(autouse=True)
def _reset_registry():
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


def test_xai_provider_registers():
    from plugins.video_gen.xai import XAIVideoGenProvider
    provider = XAIVideoGenProvider()
    video_gen_registry.register_provider(provider)
    assert video_gen_registry.get_provider("xai") is provider
    assert provider.display_name == "xAI"
    assert provider.default_model() == "grok-imagine-video"


def test_xai_capabilities_text_and_image_only():
    """xAI was previously advertised with edit/extend operations. The
    simplified surface only exposes text-to-video and image-to-video;
    confirm those are the only modalities advertised."""
    from plugins.video_gen.xai import XAIVideoGenProvider
    caps = XAIVideoGenProvider().capabilities()
    assert caps["modalities"] == ["text", "image"]
    # No 'operations' key in the simplified surface
    assert "operations" not in caps
    assert caps["max_reference_images"] == 7


def test_xai_unavailable_without_key(monkeypatch):
    from plugins.video_gen.xai import XAIVideoGenProvider
    monkeypatch.delenv("XAI_API_KEY", raising=False)
    assert XAIVideoGenProvider().is_available() is False


def test_xai_generate_requires_xai_key(monkeypatch):
    from plugins.video_gen.xai import XAIVideoGenProvider
    monkeypatch.delenv("XAI_API_KEY", raising=False)
    result = XAIVideoGenProvider().generate("a happy dog")
    assert result["success"] is False
    assert result["error_type"] == "auth_required"


def test_xai_no_operation_kwarg():
    """The ABC's generate() signature no longer accepts 'operation'.
    Passing it through **kwargs should be ignored (forward-compat)."""
    from plugins.video_gen.xai import XAIVideoGenProvider
    # We're not actually hitting the network; just verify the call
    # doesn't TypeError on the unexpected kwarg.
    # Will fail with auth_required (no XAI_API_KEY), but should NOT
    # fail with TypeError.
    result = XAIVideoGenProvider().generate("x", operation="generate")
    assert result["success"] is False
    # auth_required, NOT some signature error
    assert result["error_type"] in ("auth_required", "api_error")

@ -0,0 +1,191 @@
"""Integration tests for the xAI video gen plugin's simplified surface.
xAI exposes only text-to-video and image-to-video through the unified
``video_generate`` tool. We assert the endpoint hit and the payload shape
because routing is the part most likely to break silently.
"""
from __future__ import annotations

import asyncio
import json
from typing import Any, Dict, List, Optional

import pytest

from agent import video_gen_registry


@pytest.fixture(autouse=True)
def _reset_registry():
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


class _FakeResponse:
    def __init__(self, status: int = 200, payload: Optional[Dict[str, Any]] = None):
        self.status_code = status
        self._payload = payload or {}
        self.text = json.dumps(self._payload)

    def raise_for_status(self):
        if self.status_code >= 400:
            import httpx
            raise httpx.HTTPStatusError("err", request=None, response=self)  # type: ignore

    def json(self):
        return self._payload


class _FakeAsyncClient:
    def __init__(self):
        self.posts: List[Dict[str, Any]] = []

    async def __aenter__(self):
        return self

    async def __aexit__(self, *args):
        return None

    async def post(self, url, headers=None, json=None, timeout=None):
        self.posts.append({"url": url, "json": json})
        return _FakeResponse(200, {"request_id": "req-123"})

    async def get(self, url, headers=None, timeout=None):
        return _FakeResponse(200, {
            "status": "done",
            "video": {"url": "https://xai-cdn/out.mp4", "duration": 8},
            "model": "grok-imagine-video",
        })


@pytest.fixture
def xai_provider(monkeypatch):
    monkeypatch.setenv("XAI_API_KEY", "test-key")
    import plugins.video_gen.xai as xai_plugin
    captured: Dict[str, _FakeAsyncClient] = {}

    def _client_factory():
        captured["client"] = _FakeAsyncClient()
        return captured["client"]

    monkeypatch.setattr(xai_plugin.httpx, "AsyncClient", _client_factory)

    async def _no_sleep(*a, **k):
        return None

    monkeypatch.setattr(asyncio, "sleep", _no_sleep)
    provider = xai_plugin.XAIVideoGenProvider()
    return provider, captured


def _last_post(captured) -> Dict[str, Any]:
    return captured["client"].posts[-1]


class TestXAIEndpoint:
    """xAI uses one endpoint, ``/videos/generations``, for both modes."""

    def test_text_to_video_hits_generations(self, xai_provider):
        provider, captured = xai_provider
        result = provider.generate("a dog on a skateboard")
        assert result["success"] is True
        assert _last_post(captured)["url"].endswith("/videos/generations")
        assert result["modality"] == "text"

    def test_image_to_video_hits_generations(self, xai_provider):
        provider, captured = xai_provider
        result = provider.generate(
            "animate this",
            image_url="https://example.com/cat.png",
        )
        assert result["success"] is True
        assert _last_post(captured)["url"].endswith("/videos/generations")
        assert result["modality"] == "image"


class TestXAIPayload:
    def test_text_payload_has_no_image_field(self, xai_provider):
        provider, captured = xai_provider
        provider.generate("a dog at sunset")
        payload = _last_post(captured)["json"]
        assert payload["prompt"] == "a dog at sunset"
        assert "image" not in payload
        assert "reference_images" not in payload

    def test_image_payload_has_image_field(self, xai_provider):
        provider, captured = xai_provider
        provider.generate("animate this", image_url="https://example.com/cat.png")
        payload = _last_post(captured)["json"]
        assert payload["image"] == {"url": "https://example.com/cat.png"}

    def test_reference_images_payload(self, xai_provider):
        provider, captured = xai_provider
        provider.generate(
            "keep this character",
            reference_image_urls=[
                "https://example.com/a.png",
                "https://example.com/b.png",
            ],
        )
        payload = _last_post(captured)["json"]
        assert payload["reference_images"] == [
            {"url": "https://example.com/a.png"},
            {"url": "https://example.com/b.png"},
        ]


class TestXAIValidation:
    def test_missing_prompt_rejects(self, xai_provider):
        provider, captured = xai_provider
        result = provider.generate("")
        assert result["success"] is False
        assert result["error_type"] == "missing_prompt"
        # Never hit the network
        assert "client" not in captured or not captured["client"].posts

    def test_image_plus_refs_rejects(self, xai_provider):
        provider, captured = xai_provider
        result = provider.generate(
            "x",
            image_url="https://example.com/i.png",
            reference_image_urls=["https://example.com/r.png"],
        )
        assert result["success"] is False
        assert result["error_type"] == "conflicting_inputs"
        assert "client" not in captured or not captured["client"].posts

    def test_too_many_references_rejects(self, xai_provider):
        provider, captured = xai_provider
        result = provider.generate(
            "x",
            reference_image_urls=[f"https://example.com/r{i}.png" for i in range(8)],
        )
        assert result["success"] is False
        assert result["error_type"] == "too_many_references"


class TestXAIClamping:
    def test_duration_clamped_to_15(self, xai_provider):
        provider, captured = xai_provider
        provider.generate("x", duration=30)
        assert _last_post(captured)["json"]["duration"] == 15

    def test_duration_clamped_when_refs_present(self, xai_provider):
        provider, captured = xai_provider
        provider.generate(
            "x",
            duration=15,
            reference_image_urls=["https://example.com/r.png"],
        )
        # refs present caps to 10
        assert _last_post(captured)["json"]["duration"] == 10

    def test_invalid_aspect_ratio_soft_clamps(self, xai_provider):
        provider, captured = xai_provider
        provider.generate("x", aspect_ratio="21:9")
        assert _last_post(captured)["json"]["aspect_ratio"] == "16:9"
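
Taken together, the validation and clamping tests describe a pre-flight step that runs before any HTTP request. The sketch below is one way to express it, assuming a hypothetical `check_and_clamp` helper; the set of accepted aspect ratios and the default duration are guesses, while the error types, the limits (7 references, 15 s, 10 s with references) and the payload field shapes come from the assertions above.

```python
from typing import Any, Dict, List, Optional

MAX_REFERENCE_IMAGES = 7       # matches capabilities()["max_reference_images"]
MAX_DURATION = 15              # seconds, per test_duration_clamped_to_15
MAX_DURATION_WITH_REFS = 10    # seconds, per test_duration_clamped_when_refs_present
ASSUMED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # assumption: the real list is not shown


def check_and_clamp(
    prompt: str,
    image_url: Optional[str] = None,
    reference_image_urls: Optional[List[str]] = None,
    duration: int = 8,            # assumed default
    aspect_ratio: str = "16:9",
) -> Dict[str, Any]:
    """Hypothetical pre-flight step: reject bad input, then clamp soft limits."""
    refs = list(reference_image_urls or [])
    # Hard errors return early, so nothing reaches the network.
    if not prompt.strip():
        return {"success": False, "error_type": "missing_prompt"}
    if image_url and refs:
        return {"success": False, "error_type": "conflicting_inputs"}
    if len(refs) > MAX_REFERENCE_IMAGES:
        return {"success": False, "error_type": "too_many_references"}
    # Soft limits are clamped rather than rejected.
    cap = MAX_DURATION_WITH_REFS if refs else MAX_DURATION
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "duration": min(duration, cap),
        "aspect_ratio": aspect_ratio if aspect_ratio in ASSUMED_ASPECT_RATIOS else "16:9",
    }
    if image_url:
        payload["image"] = {"url": image_url}
    if refs:
        payload["reference_images"] = [{"url": u} for u in refs]
    return {"success": True, "payload": payload}
```

The split between hard errors (conflicting or missing inputs) and soft clamps (duration, aspect ratio) mirrors how the tests distinguish `error_type` results from silently adjusted payloads.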

@ -0,0 +1,126 @@
"""Tests for the unified ``video_generate`` tool dispatch surface."""
from __future__ import annotations

import json
from typing import Any, Dict, List, Optional

import pytest

from agent import video_gen_registry
from agent.video_gen_provider import VideoGenProvider


@pytest.fixture(autouse=True)
def _reset_registry():
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


class _RecordingProvider(VideoGenProvider):
    """Captures the kwargs the tool layer hands it."""

    def __init__(self, name: str = "fake"):
        self._name = name
        self.last_kwargs: Dict[str, Any] = {}

    @property
    def name(self) -> str:
        return self._name

    def list_models(self) -> List[Dict[str, Any]]:
        return [{"id": "model-a"}]

    def default_model(self) -> Optional[str]:
        return "model-a"

    def generate(self, prompt, **kwargs):
        self.last_kwargs = {"prompt": prompt, **kwargs}
        modality = "image" if kwargs.get("image_url") else "text"
        return {
            "success": True,
            "video": "https://example.com/v.mp4",
            "model": kwargs.get("model") or "model-a",
            "prompt": prompt,
            "modality": modality,
            "aspect_ratio": kwargs.get("aspect_ratio", ""),
            "duration": kwargs.get("duration") or 0,
            "provider": self._name,
        }


class _RaisingProvider(VideoGenProvider):
    @property
    def name(self) -> str:
        return "raises"

    def generate(self, prompt, **kwargs):
        raise RuntimeError("boom")


class TestUnifiedDispatch:
    def _run(self, args: Dict[str, Any], *, configured: Optional[str] = None) -> Dict[str, Any]:
        from tools import video_generation_tool
        import hermes_cli.plugins as plugins_module
        saved = video_generation_tool._read_configured_video_provider
        video_generation_tool._read_configured_video_provider = lambda: configured  # type: ignore
        saved_discover = plugins_module._ensure_plugins_discovered
        plugins_module._ensure_plugins_discovered = lambda *_a, **_k: None  # type: ignore
        try:
            raw = video_generation_tool._handle_video_generate(args)
        finally:
            video_generation_tool._read_configured_video_provider = saved  # type: ignore
            plugins_module._ensure_plugins_discovered = saved_discover  # type: ignore
        return json.loads(raw)

    def test_no_provider_returns_clear_error(self):
        result = self._run({"prompt": "a dog"})
        assert result["success"] is False
        assert result["error_type"] == "no_provider_configured"

    def test_unknown_provider_returns_clear_error(self):
        result = self._run({"prompt": "a dog"}, configured="ghost")
        assert result["success"] is False
        assert result["error_type"] == "provider_not_registered"

    def test_text_to_video_routes_without_image_url(self):
        provider = _RecordingProvider("rec")
        video_gen_registry.register_provider(provider)
        result = self._run({"prompt": "a happy dog"})
        assert result["success"] is True
        assert result["modality"] == "text"
        assert "image_url" not in provider.last_kwargs
        assert provider.last_kwargs["aspect_ratio"] == "16:9"
        assert provider.last_kwargs["resolution"] == "720p"

    def test_image_to_video_routes_with_image_url(self):
        provider = _RecordingProvider("rec")
        video_gen_registry.register_provider(provider)
        result = self._run({
            "prompt": "animate this",
            "image_url": "https://example.com/img.png",
        })
        assert result["success"] is True
        assert result["modality"] == "image"
        assert provider.last_kwargs["image_url"] == "https://example.com/img.png"

    def test_prompt_required(self):
        provider = _RecordingProvider("rec")
        video_gen_registry.register_provider(provider)
        result = self._run({"prompt": "", "image_url": "https://example.com/i.png"})
        assert "error" in result
        assert "prompt" in result["error"].lower()

    def test_provider_exception_caught(self):
        video_gen_registry.register_provider(_RaisingProvider())
        result = self._run({"prompt": "x"})
        assert result["success"] is False
        assert result["error_type"] == "provider_exception"

    def test_operation_field_not_in_schema(self):
        """Make sure we removed the operation field from the schema."""
        from tools.video_generation_tool import VIDEO_GENERATE_SCHEMA
        assert "operation" not in VIDEO_GENERATE_SCHEMA["parameters"]["properties"]
        assert "video_url" not in VIDEO_GENERATE_SCHEMA["parameters"]["properties"]
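
The dispatch contract exercised by `TestUnifiedDispatch` (always return JSON, never let a provider exception escape) can be sketched as follows. This is a simplified illustration, not `_handle_video_generate` itself: the `dispatch` function, its dict-of-callables registry and the wording of the error messages are assumptions, while the `error_type` values and the `16:9` / `720p` defaults come from the assertions above.

```python
import json
from typing import Any, Callable, Dict, Optional

GenerateFn = Callable[..., Dict[str, Any]]


def _err(error_type: str, message: str) -> str:
    return json.dumps({"success": False, "error_type": error_type, "error": message})


def dispatch(
    args: Dict[str, Any],
    providers: Dict[str, GenerateFn],
    configured: Optional[str] = None,
) -> str:
    """Hypothetical tool handler: validate, pick a provider, call it, wrap errors."""
    prompt = (args.get("prompt") or "").strip()
    if not prompt:
        return json.dumps({"success": False, "error": "prompt is required"})
    if configured is not None:
        generate = providers.get(configured)
        if generate is None:
            return _err("provider_not_registered", f"unknown provider: {configured}")
    elif len(providers) == 1:
        generate = next(iter(providers.values()))
    else:
        return _err("no_provider_configured", "no video backend is configured")
    # Defaults observed in the tests; image_url is forwarded only when present.
    kwargs: Dict[str, Any] = {
        "aspect_ratio": args.get("aspect_ratio", "16:9"),
        "resolution": args.get("resolution", "720p"),
    }
    if args.get("image_url"):
        kwargs["image_url"] = args["image_url"]
    try:
        result = generate(prompt, **kwargs)
    except Exception as exc:  # provider bugs must not crash the tool layer
        return _err("provider_exception", str(exc))
    return json.dumps(result)
```

Catching every exception at this boundary is deliberate in the sketch: the agent sees a structured error it can report instead of a traceback.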

@ -0,0 +1,153 @@
"""Tests for the dynamic schema builder under the simplified surface."""
from __future__ import annotations

from typing import Any, Dict, List, Optional

import pytest
import yaml

from agent import video_gen_registry
from agent.video_gen_provider import VideoGenProvider


@pytest.fixture(autouse=True)
def _reset_registry():
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


@pytest.fixture
def cfg_home(tmp_path, monkeypatch):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    return tmp_path


def _write_cfg(home, cfg: dict):
    (home / "config.yaml").write_text(yaml.safe_dump(cfg))


class _BothModalitiesProvider(VideoGenProvider):
    """Supports both text-to-video AND image-to-video (the common case)."""

    @property
    def name(self) -> str:
        return "both"

    def is_available(self) -> bool:
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        return [{"id": "family-a", "modalities": ["text", "image"]}]

    def default_model(self) -> Optional[str]:
        return "family-a"

    def capabilities(self) -> Dict[str, Any]:
        return {
            "modalities": ["text", "image"],
            "aspect_ratios": ["16:9", "9:16"],
            "resolutions": ["720p", "1080p"],
            "min_duration": 1,
            "max_duration": 15,
            "supports_audio": True,
            "supports_negative_prompt": True,
            "max_reference_images": 0,
        }

    def generate(self, prompt, **kwargs):
        return {"success": True}


class _ImageOnlyProvider(VideoGenProvider):
    """Backend with only image-to-video support (rare but possible)."""

    @property
    def name(self) -> str:
        return "img-only"

    def is_available(self) -> bool:
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        return [{"id": "img-only-v1", "modalities": ["image"]}]

    def default_model(self) -> Optional[str]:
        return "img-only-v1"

    def capabilities(self) -> Dict[str, Any]:
        return {"modalities": ["image"], "min_duration": 1, "max_duration": 10}

    def generate(self, prompt, **kwargs):
        return {"success": True}


class TestDynamicSchemaBuilder:
    def test_no_config_says_so(self, cfg_home):
        from tools.video_generation_tool import _build_dynamic_video_schema
        desc = _build_dynamic_video_schema()["description"]
        assert "No video backend is configured" in desc
        assert "hermes tools" in desc

    def test_does_not_mention_edit_or_extend(self, cfg_home):
        """The simplified surface only does text→video and image→video.
        The description must not mention edit/extend anywhere."""
        from tools.video_generation_tool import _build_dynamic_video_schema, _GENERIC_DESCRIPTION
        desc = _build_dynamic_video_schema()["description"]
        # Block words that would suggest functionality we removed
        assert "edit" not in desc.lower() or "audio" in desc.lower()  # 'audio' contains 'audi' not 'edit'
        # Stronger: no occurrence of the words "edit" or "extend" as standalone
        for forbidden in (" edit ", " edits ", " extend ", " extends "):
            assert forbidden not in desc.lower(), f"description leaks '{forbidden.strip()}'"
        # Sanity: the generic blurb itself is also clean
        for forbidden in ("edit", "extend"):
            assert forbidden not in _GENERIC_DESCRIPTION.lower()

    def test_both_modalities_advertises_auto_routing(self, cfg_home):
        from tools.video_generation_tool import _build_dynamic_video_schema
        _write_cfg(cfg_home, {"video_gen": {"provider": "both"}})
        video_gen_registry.register_provider(_BothModalitiesProvider())
        import hermes_cli.plugins as plugins_module
        saved = plugins_module._ensure_plugins_discovered
        plugins_module._ensure_plugins_discovered = lambda *a, **k: None
        try:
            desc = _build_dynamic_video_schema()["description"]
        finally:
            plugins_module._ensure_plugins_discovered = saved
        assert "Active backend: Both" in desc
        assert "text-to-video" in desc and "image-to-video" in desc
        assert "routes automatically" in desc
        # operations bullet is gone
        assert "operations supported" not in desc

    def test_image_only_model_warns_about_required_image_url(self, cfg_home):
        from tools.video_generation_tool import _build_dynamic_video_schema
        _write_cfg(cfg_home, {"video_gen": {"provider": "img-only"}})
        video_gen_registry.register_provider(_ImageOnlyProvider())
        import hermes_cli.plugins as plugins_module
        saved = plugins_module._ensure_plugins_discovered
        plugins_module._ensure_plugins_discovered = lambda *a, **k: None
        try:
            desc = _build_dynamic_video_schema()["description"]
        finally:
            plugins_module._ensure_plugins_discovered = saved
        assert "image-to-video only" in desc
        assert "image_url is REQUIRED" in desc

    def test_builder_wired_into_registry(self):
        from tools.registry import discover_builtin_tools, registry
        discover_builtin_tools()
        entry = registry._tools["video_generate"]
        assert entry.dynamic_schema_overrides is not None
        out = entry.dynamic_schema_overrides()
        assert "description" in out
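
For orientation, a description builder satisfying these assertions might look like the sketch below. Everything in it is illustrative: `describe_backend`, the generic blurb and the exact sentence wording are invented here, and only the quoted fragments the tests search for ("No video backend is configured", "Active backend: ...", "routes automatically", "image-to-video only", "image_url is REQUIRED") are taken from the source.

```python
from typing import Optional, Sequence

# Assumed generic blurb; the real _GENERIC_DESCRIPTION is not shown in the diff.
GENERIC = "Generate a video from a text prompt, optionally animating an input image."


def describe_backend(display_name: Optional[str], modalities: Sequence[str] = ()) -> str:
    """Hypothetical builder: tailor the tool description to the active backend."""
    if display_name is None:
        return (
            f"{GENERIC} No video backend is configured; "
            "run `hermes tools` to pick one."
        )
    lines = [GENERIC, f"Active backend: {display_name}"]
    mods = set(modalities)
    if mods == {"text", "image"}:
        lines.append(
            "Supports text-to-video and image-to-video; the tool routes "
            "automatically based on whether image_url is passed."
        )
    elif mods == {"image"}:
        lines.append("This model is image-to-video only: image_url is REQUIRED.")
    return "\n".join(lines)
```

Building the description from declared capabilities keeps the agent-facing text in step with whichever backend the user selected, which is the property the tests guard.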

@ -0,0 +1,253 @@
"""Tool-surface routing matrix: every (provider, model, modality) combo.
This is the integration test for the question Teknium asked: regardless
of which provider+model the user picks and whether they pass an
image_url or not, does the tool surface route correctly to the right
endpoint with the right payload shape?
Drives ``_handle_video_generate(args)`` end-to-end: config write →
config read → registry lookup → provider.generate() → outbound HTTP/SDK
call. Stubs fal_client and httpx so we observe routing without hitting
the network.
"""
from __future__ import annotations

import asyncio
import json
import types
from typing import Any, Dict, List, Optional

import pytest
import yaml


@pytest.fixture(autouse=True)
def _reset_registry():
    from agent import video_gen_registry
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


@pytest.fixture
def matrix_env(tmp_path, monkeypatch):
    """Set up HERMES_HOME, stub fal_client + httpx, force plugin discovery."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    monkeypatch.setenv("FAL_KEY", "test-key")
    monkeypatch.setenv("XAI_API_KEY", "test-key")
    fal_calls: List[Dict[str, Any]] = []
    xai_calls: List[Dict[str, Any]] = []

    # fal_client stub
    fake_fal = types.ModuleType("fal_client")

    def _subscribe(endpoint, arguments=None, with_logs=False):
        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
        return {"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}}

    fake_fal.subscribe = _subscribe  # type: ignore
    monkeypatch.setitem(__import__("sys").modules, "fal_client", fake_fal)

    # httpx stub for xAI
    import httpx

    class _Resp:
        def __init__(self, p, s=200):
            self.status_code = s
            self._p = p
            self.text = json.dumps(p)

        def raise_for_status(self):
            if self.status_code >= 400:
                raise httpx.HTTPStatusError("err", request=None, response=self)  # type: ignore

        def json(self):
            return self._p

    class _Client:
        async def __aenter__(self): return self
        async def __aexit__(self, *a): return None

        async def post(self, url, headers=None, json=None, timeout=None):
            xai_calls.append({"url": url, "json": json})
            return _Resp({"request_id": "req-1"})

        async def get(self, url, headers=None, timeout=None):
            return _Resp({
                "status": "done",
                "video": {"url": "https://xai-cdn/out.mp4", "duration": 8},
                "model": "grok-imagine-video",
            })

    import plugins.video_gen.xai as xai_plugin
    monkeypatch.setattr(xai_plugin.httpx, "AsyncClient", lambda: _Client())

    async def _no_sleep(*a, **k): return None
    monkeypatch.setattr(asyncio, "sleep", _no_sleep)

    # Reset FAL plugin's lazy fal_client cache so it picks up the stub
    from plugins.video_gen import fal as fal_plugin
    fal_plugin._fal_client = None

    # Force discovery
    from hermes_cli.plugins import _ensure_plugins_discovered
    _ensure_plugins_discovered(force=True)
    return tmp_path, fal_calls, xai_calls


def _invoke_tool(home, cfg: dict, args: dict) -> dict:
    """Write config, invoke the registered tool handler, return parsed JSON."""
    (home / "config.yaml").write_text(yaml.safe_dump(cfg))
    import hermes_cli.config as cfg_mod
    if hasattr(cfg_mod, "_invalidate_load_config_cache"):
        cfg_mod._invalidate_load_config_cache()
    from tools.registry import registry
    handler = registry._tools["video_generate"].handler
    return json.loads(handler(args))
# ─────────────────────────────────────────────────────────────────────────
# FAL: every family × {text-only, text+image}
# ─────────────────────────────────────────────────────────────────────────
# We parametrize over the catalog so the test discovers new families
# automatically. If someone adds 'sora-2' to FAL_FAMILIES, this matrix
# picks it up — no test changes needed beyond confirming the endpoints.
def _all_fal_families():
from plugins.video_gen.fal import FAL_FAMILIES
return list(FAL_FAMILIES.keys())
@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_only_routes_to_text_endpoint(matrix_env, family_id):
home, fal_calls, _ = matrix_env
from plugins.video_gen.fal import FAL_FAMILIES
result = _invoke_tool(
home,
{"video_gen": {"provider": "fal", "model": family_id}},
{"prompt": "a dog running"},
)
assert result["success"] is True, f"{family_id}: {result.get('error')}"
assert result["modality"] == "text"
assert result["provider"] == "fal"
# Outbound endpoint must be the family's text endpoint
assert len(fal_calls) == 1
endpoint = fal_calls[0]["endpoint"]
assert endpoint == FAL_FAMILIES[family_id]["text_endpoint"]
# Payload must NOT contain any image-shaped key
payload = fal_calls[0]["arguments"] or {}
image_keys = [k for k in payload if "image" in k and "url" in k]
assert not image_keys, f"{family_id} text-only leaked image keys: {image_keys}"


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_plus_image_routes_to_image_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "animate this dog", "image_url": "https://example.com/dog.png"},
    )
    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "image"
    assert result["provider"] == "fal"
    # Outbound endpoint must be the family's image endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["image_endpoint"]
    # Payload must contain the right image key (may be image_url or
    # start_image_url depending on the family's image_param_key)
    payload = fal_calls[0]["arguments"] or {}
    expected_image_key = FAL_FAMILIES[family_id].get("image_param_key") or "image_url"
    assert payload.get(expected_image_key) == "https://example.com/dog.png", (
        f"{family_id} text+image missing {expected_image_key} in payload "
        f"(keys: {sorted(payload.keys())})"
    )
# ─────────────────────────────────────────────────────────────────────────
# xAI: text-only / text+image both go to /videos/generations
# (xAI uses one endpoint with an optional 'image' field, not separate URLs)
# ─────────────────────────────────────────────────────────────────────────
def test_xai_text_only_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True
    assert result["modality"] == "text"
    assert result["provider"] == "xai"
    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert "image" not in payload
    assert "reference_images" not in payload


def test_xai_text_plus_image_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "animate this", "image_url": "https://example.com/img.png"},
    )
    assert result["success"] is True
    assert result["modality"] == "image"
    assert result["provider"] == "xai"
    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert payload["image"] == {"url": "https://example.com/img.png"}
# ─────────────────────────────────────────────────────────────────────────
# tool-level `model` arg overrides config
# ─────────────────────────────────────────────────────────────────────────
def test_tool_model_arg_overrides_config(matrix_env):
    """When the tool call passes model=, it wins over video_gen.model in config."""
    home, fal_calls, _ = matrix_env
    # Config picks pixverse-v6, but tool call says veo3.1
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {"prompt": "a dog", "model": "veo3.1"},
    )
    assert result["success"] is True
    assert result["model"] == "veo3.1"
    # Outbound endpoint reflects the override, not config
    assert fal_calls[0]["endpoint"] == "fal-ai/veo3.1"


def test_tool_model_arg_with_image_url_routes_to_override_image_endpoint(matrix_env):
    """model= override on text+image goes to the override family's image endpoint."""
    home, fal_calls, _ = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {
            "prompt": "animate this",
            "image_url": "https://example.com/i.png",
            "model": "kling-v3-4k",
        },
    )
    assert result["success"] is True
    assert result["model"] == "kling-v3-4k"
    assert fal_calls[0]["endpoint"] == "fal-ai/kling-video/v3/4k/image-to-video"
    # Kling 4K uses start_image_url
    assert fal_calls[0]["arguments"].get("start_image_url") == "https://example.com/i.png"
    assert "image_url" not in fal_calls[0]["arguments"]


@ -0,0 +1,561 @@
#!/usr/bin/env python3
"""
Video Generation Tool
=====================
Single ``video_generate`` tool that dispatches to a plugin-registered
video generation provider. Mirrors the ``image_generate`` design:
- ``agent/video_gen_provider.py`` defines the :class:`VideoGenProvider` ABC.
- ``agent/video_gen_registry.py`` holds the active providers (populated by
plugins at import time).
- Each provider lives under ``plugins/video_gen/<name>/``.
The tool itself is intentionally backend-agnostic and ships **no in-tree
provider**: turn on a backend by enabling a plugin (``hermes plugins
enable video_gen/<name>``) and selecting it in ``hermes tools`` → Video
Generation.
Unified surface
---------------
One tool covers the two common cases, text-to-video and image-to-video,
with a compact schema:

    prompt                text instruction (required)
    image_url             drives image-to-video when provided
    reference_image_urls  list, up to provider-declared cap
    duration              seconds (provider clamps)
    aspect_ratio          "16:9" | "9:16" | "1:1" | ...
    resolution            "480p" | "540p" | "720p" | "1080p"
    negative_prompt       optional (Pixverse/Kling style)
    audio                 optional (Veo3/Pixverse pricing tier)
    seed                  optional
    model                 optional, override the active provider's default

Providers ignore parameters they do not support. The tool layer does
**lightweight** validation (type/required-prompt) and lets each provider
do its own clamping inside :meth:`VideoGenProvider.generate`; that keeps
the tool surface stable as new providers ship with different capabilities.
"""
from __future__ import annotations

import json
import logging
from typing import Any, Dict, List, Optional

from agent.video_gen_provider import (
    COMMON_ASPECT_RATIOS,
    COMMON_RESOLUTIONS,
    DEFAULT_ASPECT_RATIO,
    DEFAULT_RESOLUTION,
    error_response,
)
from tools.registry import registry, tool_error

logger = logging.getLogger(__name__)

VIDEO_GENERATE_SCHEMA: Dict[str, Any] = {
    "name": "video_generate",
    # Placeholder: the real description is built dynamically at
    # get_tool_definitions() time so it reflects the active backend's
    # actual capabilities (which modalities / resolutions / duration
    # ranges the user's currently-selected model supports).
    # See _build_dynamic_video_schema() below and the dynamic-tool-schemas
    # skill at github/hermes-agent-dev/references/dynamic-tool-schemas.md.
    "description": "(rebuilt at get_tool_definitions() time; see _build_dynamic_video_schema)",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {
                "type": "string",
                "description": (
                    "Text instruction describing the desired video, motion, "
                    "subject, style, camera movement, etc."
                ),
            },
            "image_url": {
                "type": "string",
                "description": (
                    "Optional public URL of a still image. When provided, "
                    "the active backend routes to its image-to-video "
                    "endpoint (animate the image); when omitted, it routes "
                    "to text-to-video. Pass either a URL the user supplied "
                    "or a path/URL from the conversation."
                ),
            },
            "reference_image_urls": {
                "type": "array",
                "items": {"type": "string"},
                "description": (
                    "Optional list of reference image URLs (style or "
                    "character refs). Only supported by some backends; "
                    "the active backend's description below indicates whether "
                    "this is honored and what the max is."
                ),
            },
            "duration": {
                "type": "integer",
                "description": (
                    "Desired video duration in seconds. Providers clamp to "
                    "their supported range (commonly 4-15s). Omit to use the "
                    "provider's default."
                ),
            },
            "aspect_ratio": {
                "type": "string",
                "enum": list(COMMON_ASPECT_RATIOS),
                "description": (
                    "Output aspect ratio. Providers clamp to their supported "
                    "set."
                ),
                "default": DEFAULT_ASPECT_RATIO,
            },
            "resolution": {
                "type": "string",
                "enum": list(COMMON_RESOLUTIONS),
                "description": (
                    "Output resolution. Providers clamp to their supported "
                    "set."
                ),
                "default": DEFAULT_RESOLUTION,
            },
            "negative_prompt": {
                "type": "string",
                "description": (
                    "Optional negative prompt — content to avoid in the "
                    "output. Supported by Pixverse, Kling, and similar; "
                    "ignored by providers that do not support it."
                ),
            },
            "audio": {
                "type": "boolean",
                "description": (
                    "Optional audio generation toggle. Supported by Veo3 and "
                    "Pixverse (affects pricing tier); ignored elsewhere."
                ),
            },
            "seed": {
                "type": "integer",
                "description": (
                    "Optional seed for reproducible outputs (provider-"
                    "dependent)."
                ),
            },
            "model": {
                "type": "string",
                "description": (
                    "Optional model override. If omitted, the user's "
                    "configured ``video_gen.model`` (set via `hermes tools` "
                    "→ Video Generation) is used. Models that the active "
                    "provider does not know are rejected."
                ),
            },
        },
        "required": ["prompt"],
    },
}

# ---------------------------------------------------------------------------
# Config readers (mirror image_generation_tool.py)
# ---------------------------------------------------------------------------
def _read_video_gen_section() -> Dict[str, Any]:
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        section = cfg.get("video_gen") if isinstance(cfg, dict) else None
        return section if isinstance(section, dict) else {}
    except Exception as exc:
        logger.debug("Could not read video_gen config: %s", exc)
        return {}


def _read_configured_video_provider() -> Optional[str]:
    value = _read_video_gen_section().get("provider")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def _read_configured_video_model() -> Optional[str]:
    value = _read_video_gen_section().get("model")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None

# ---------------------------------------------------------------------------
# Availability check
# ---------------------------------------------------------------------------
def check_video_generation_requirements() -> bool:
    """Return True when at least one registered provider reports available.

    Triggers plugin discovery (idempotent) so user-installed plugins are
    visible to the toolset gate.
    """
    try:
        from agent.video_gen_registry import list_providers
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        for provider in list_providers():
            try:
                if provider.is_available():
                    return True
            except Exception:
                continue
    except Exception:
        pass
    return False

# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
def _resolve_active_provider():
    """Return the active provider object or None.

    Forces plugin discovery before checking the registry; this handles cases
    where a long-lived session was started before a plugin was installed.
    """
    try:
        from agent.video_gen_registry import get_active_provider
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        provider = get_active_provider()
        if provider is None:
            _ensure_plugins_discovered(force=True)
            provider = get_active_provider()
        return provider
    except Exception as exc:
        logger.debug("video_gen provider resolution failed: %s", exc)
        return None


def _missing_provider_error(configured: Optional[str]) -> str:
    if configured:
        msg = (
            f"video_gen.provider='{configured}' is set but no plugin "
            f"registered that name. Run `hermes plugins list` to see "
            f"installed video gen backends, or `hermes tools` → Video "
            f"Generation to pick one."
        )
        return json.dumps(error_response(
            error=msg, error_type="provider_not_registered",
            provider=configured,
        ))
    # Only the xAI and FAL plugins ship as bundled backends; Veo is reached
    # through the FAL plugin rather than a separate Google backend.
    msg = (
        "No video generation backend is configured. Run `hermes tools` → "
        "Video Generation to enable one (xAI or FAL)."
    )
    return json.dumps(error_response(
        error=msg, error_type="no_provider_configured",
    ))

# ---------------------------------------------------------------------------
# Handler
# ---------------------------------------------------------------------------
def _coerce_int(value: Any) -> Optional[int]:
    if value is None or value == "":
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def _coerce_bool(value: Any) -> Optional[bool]:
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        v = value.strip().lower()
        if v in ("true", "1", "yes", "on"):
            return True
        if v in ("false", "0", "no", "off"):
            return False
    return None


def _normalize_reference_images(value: Any) -> Optional[List[str]]:
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, (list, tuple)):
        return None
    out: List[str] = []
    for item in value:
        if isinstance(item, str) and item.strip():
            out.append(item.strip())
    return out or None


def _handle_video_generate(args: Dict[str, Any], **_kw: Any) -> str:
    prompt = (args.get("prompt") or "").strip()
    image_url = (args.get("image_url") or "").strip() or None
    reference_image_urls = _normalize_reference_images(args.get("reference_image_urls"))
    duration = _coerce_int(args.get("duration"))
    aspect_ratio = (args.get("aspect_ratio") or DEFAULT_ASPECT_RATIO).strip() or DEFAULT_ASPECT_RATIO
    resolution = (args.get("resolution") or DEFAULT_RESOLUTION).strip() or DEFAULT_RESOLUTION
    negative_prompt = (args.get("negative_prompt") or "").strip() or None
    audio = _coerce_bool(args.get("audio"))
    seed = _coerce_int(args.get("seed"))
    model_override = (args.get("model") or "").strip() or None

    # Soft validation — providers do their own. Prompt is required by the
    # schema; the backend may still accept image-only on its image-to-video
    # endpoint but our surface always needs a prompt.
    if not prompt:
        return tool_error("prompt is required for video generation")

    # Resolve the active provider.
    configured = _read_configured_video_provider()
    provider = _resolve_active_provider()
    if provider is None:
        return _missing_provider_error(configured)

    # Resolve model: explicit arg wins, then config, then provider default.
    model = model_override or _read_configured_video_model() or provider.default_model()

    kwargs: Dict[str, Any] = {
        "model": model,
        "image_url": image_url,
        "reference_image_urls": reference_image_urls,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "negative_prompt": negative_prompt,
        "audio": audio,
        "seed": seed,
    }
    # Drop None entries so providers see clean defaults.
    kwargs = {k: v for k, v in kwargs.items() if v is not None}

    try:
        result = provider.generate(prompt=prompt, **kwargs)
    except TypeError as exc:
        # A provider that hasn't widened its signature is a bug, not a
        # caller error — log and surface a clear contract message.
        logger.warning(
            "video_gen provider '%s' rejected kwargs (signature too narrow): %s",
            getattr(provider, "name", "?"), exc,
        )
        return json.dumps(error_response(
            error=(
                f"Provider '{getattr(provider, 'name', '?')}' signature is "
                f"out of date with the video_generate schema. Report this "
                f"to the plugin author."
            ),
            error_type="provider_contract",
            provider=getattr(provider, "name", ""),
            model=model or "",
            prompt=prompt,
        ))
    except Exception as exc:
        logger.warning(
            "video_gen provider '%s' raised: %s",
            getattr(provider, "name", "?"), exc,
        )
        return json.dumps(error_response(
            error=f"Provider '{getattr(provider, 'name', '?')}' error: {exc}",
            error_type="provider_exception",
            provider=getattr(provider, "name", ""),
            model=model or "",
            prompt=prompt,
        ))

    if not isinstance(result, dict):
        return json.dumps(error_response(
            error="Provider returned a non-dict result",
            error_type="provider_contract",
            provider=getattr(provider, "name", ""),
            model=model or "",
            prompt=prompt,
        ))
    return json.dumps(result)

# ---------------------------------------------------------------------------
# Dynamic schema — reflect the active backend's actual capabilities
# ---------------------------------------------------------------------------
#
# Why dynamic: the user's configured backend determines which operations
# (generate/edit/extend), modalities (text / image / refs), aspect ratios,
# resolutions, durations, and audio/negative-prompt flags are real. A model
# that calls video_generate without knowing the active backend wastes a
# turn on something like "fal-ai/veo3.1/image-to-video requires image_url".
# Surfacing the per-model surface in the description means the model
# usually gets the call right on the first try.
#
# Memoization: model_tools.get_tool_definitions() keys its cache on
# config.yaml mtime, so when the user changes provider/model via
# `hermes tools` or `/skills`, the schema rebuilds automatically.
_GENERIC_DESCRIPTION = (
"Generate a video from a text prompt (text-to-video) or animate a "
"still image (image-to-video) using the user's configured video "
"generation backend. Pass `image_url` to animate that image; omit it "
"to generate from text alone. The backend auto-routes to the right "
"endpoint. The backend and model family are user-configured via "
"`hermes tools` → Video Generation; the agent does not pick them. "
"Long-running generations may take 30 seconds to several minutes — "
"the call blocks until the video is ready. Returns either an HTTP "
"URL or an absolute file path in the `video` field; display it with "
"markdown ![description](url-or-path) and the gateway will deliver it."
)

def _format_model_caveats(
    model_meta: Dict[str, Any],
    backend_caps: Dict[str, Any],
) -> List[str]:
    """Pull human-readable caveats out of one model's catalog metadata.

    Only surfaces things that meaningfully differ from the backend's
    overall capabilities; repeating defaults is noise.
    """
    caveats: List[str] = []
    modalities = set(model_meta.get("modalities") or [])
    modality = model_meta.get("modality")  # FAL's plugin uses this key for single-modality entries
    if modality:
        modalities.add(modality)
    if "image" in modalities and "text" not in modalities:
        caveats.append(
            "this model is image-to-video only — image_url is REQUIRED; "
            "text-only calls will be rejected"
        )
    elif "text" in modalities and "image" not in modalities:
        caveats.append(
            "this model is text-to-video only — image_url is not supported"
        )
    return caveats


def _build_dynamic_video_schema() -> Dict[str, Any]:
    """Build a description that reflects the active backend's actual surface.

    Cheap: reads config (already memoized by the caller), asks the active
    provider for `capabilities()` and the active model's catalog entry,
    and formats a few lines of prose. Falls back to the generic
    description when no provider is configured or registered.
    """
    parts: List[str] = [_GENERIC_DESCRIPTION]
    configured = _read_configured_video_provider()
    configured_model = _read_configured_video_model()

    if not configured:
        parts.append(
            "\nNo video backend is configured. Calls will return an error "
            "until the user picks one via `hermes tools` → Video Generation."
        )
        return {"description": "\n".join(parts)}

    try:
        from agent.video_gen_registry import get_provider
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        provider = get_provider(configured)
    except Exception:
        provider = None

    if provider is None:
        parts.append(
            f"\nActive backend: {configured} (plugin not yet loaded — the "
            f"tool will retry discovery on first call)."
        )
        return {"description": "\n".join(parts)}

    try:
        caps = provider.capabilities() or {}
    except Exception:
        caps = {}
    try:
        models = provider.list_models() or []
    except Exception:
        models = []

    active_model = configured_model or provider.default_model()
    model_meta = next(
        (m for m in models if isinstance(m, dict) and m.get("id") == active_model),
        {},
    )

    backend_label = provider.display_name
    line = f"\nActive backend: {backend_label}"
    if active_model:
        line += f" · model: {active_model}"
    parts.append(line)

    # Model-specific caveats (the high-signal stuff)
    for c in _format_model_caveats(model_meta, caps):
        parts.append(f"- {c}")

    # Backend modality summary — only useful when the backend supports
    # both text and image. Single-modality backends are already covered by
    # the model caveat above.
    modalities = set(caps.get("modalities") or [])
    if "text" in modalities and "image" in modalities and not model_meta.get("modality"):
        parts.append(
            "- supports both text-to-video (omit image_url) and "
            "image-to-video (pass image_url) — routes automatically"
        )

    if caps.get("aspect_ratios"):
        parts.append(f"- aspect_ratio choices: {', '.join(caps['aspect_ratios'])}")
    if caps.get("resolutions"):
        parts.append(f"- resolution choices: {', '.join(caps['resolutions'])}")
    if caps.get("min_duration") and caps.get("max_duration"):
        parts.append(
            f"- duration range: {caps['min_duration']}-{caps['max_duration']}s"
        )
    if caps.get("supports_audio"):
        parts.append("- audio: pass `audio=true` to enable native audio (pricing tier)")
    if caps.get("supports_negative_prompt"):
        parts.append("- negative_prompt: supported")
    max_refs = caps.get("max_reference_images") or 0
    if max_refs:
        parts.append(f"- reference_image_urls: up to {max_refs} images")

    return {"description": "\n".join(parts)}

# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
registry.register(
    name="video_generate",
    toolset="video_gen",
    schema=VIDEO_GENERATE_SCHEMA,
    handler=_handle_video_generate,
    check_fn=check_video_generation_requirements,
    requires_env=[],
    is_async=False,
    emoji="🎬",
    dynamic_schema_overrides=_build_dynamic_video_schema,
)


@ -107,6 +107,17 @@ TOOLSETS = {
"includes": []
},
"video_gen": {
"description": (
"Video generation tools. Single ``video_generate`` tool covers "
"text-to-video (prompt only) and image-to-video (prompt + "
"image_url) — the active backend auto-routes. Configure via "
"``hermes tools`` → Video Generation."
),
"tools": ["video_generate"],
"includes": []
},
"computer_use": {
"description": (
"Background macOS desktop control via cua-driver — screenshots, "


@ -0,0 +1,231 @@
---
sidebar_position: 12
title: "Video Generation Provider Plugins"
description: "How to build a video-generation backend plugin for Hermes Agent"
---
# Building a Video Generation Provider Plugin
Video-gen provider plugins register a backend that services every `video_generate` tool call. Built-in providers (xAI, FAL) ship as plugins. Add a new one, or override a bundled one, by dropping a directory into `plugins/video_gen/<name>/`.
:::tip
Video-gen mirrors [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) almost line-for-line — if you've built an image-gen backend, you already know the shape. The main differences: a `capabilities()` method advertising modalities/aspect-ratios/durations, and a routing convention (pass `image_url` to use image-to-video, omit it to use text-to-video — the provider picks the right endpoint internally).
:::
## The unified surface (one tool, two modalities)
The `video_generate` tool exposes two modalities through one parameter:
- **Text-to-video** — call with `prompt` only. The provider routes to its text-to-video endpoint.
- **Image-to-video** — call with `prompt` + `image_url`. The provider routes to its image-to-video endpoint.
Edit and extend are intentionally out of scope. Most backends don't support them and the inconsistency would force per-backend prose into the agent's tool description.
## How discovery works
Hermes scans for video-gen backends in three places:
1. **Bundled**: `<repo>/plugins/video_gen/<name>/` (auto-loaded with `kind: backend`)
2. **User**: `~/.hermes/plugins/video_gen/<name>/` (opt-in via `plugins.enabled`)
3. **Pip**: packages declaring a `hermes_agent.plugins` entry point
Each plugin's `register(ctx)` function calls `ctx.register_video_gen_provider(...)`. The active provider is picked by `video_gen.provider` in `config.yaml`; `hermes tools` → Video Generation walks users through selection. Unlike `image_generate`, there is no in-tree legacy backend — every provider is a plugin.
## Directory structure
```
plugins/video_gen/my-backend/
├── __init__.py # VideoGenProvider subclass + register()
└── plugin.yaml # Manifest with kind: backend
```
## The VideoGenProvider ABC
Subclass `agent.video_gen_provider.VideoGenProvider`. Required: `name` property and `generate()` method.
```python
# plugins/video_gen/my-backend/__init__.py
from typing import Any, Dict, List, Optional
import os

from agent.video_gen_provider import (
    VideoGenProvider,
    error_response,
    success_response,
)


class MyVideoGenProvider(VideoGenProvider):
    @property
    def name(self) -> str:
        return "my-backend"

    @property
    def display_name(self) -> str:
        return "My Backend"

    def is_available(self) -> bool:
        return bool(os.environ.get("MY_API_KEY"))

    def list_models(self) -> List[Dict[str, Any]]:
        # Each entry is a model FAMILY — a name the user picks once.
        # Your provider's generate() routes within the family based on
        # whether image_url was passed.
        return [
            {
                "id": "fast",
                "display": "Fast",
                "speed": "~30s",
                "strengths": "Cheapest tier",
                "price": "$0.05/s",
                "modalities": ["text", "image"],  # advisory
            },
        ]

    def default_model(self) -> Optional[str]:
        return "fast"

    def capabilities(self) -> Dict[str, Any]:
        return {
            "modalities": ["text", "image"],
            "aspect_ratios": ["16:9", "9:16"],
            "resolutions": ["720p", "1080p"],
            "min_duration": 1,
            "max_duration": 10,
            "supports_audio": False,
            "supports_negative_prompt": True,
            "max_reference_images": 0,
        }

    def get_setup_schema(self) -> Dict[str, Any]:
        return {
            "name": "My Backend",
            "badge": "paid",
            "tag": "Short description shown in `hermes tools`",
            "env_vars": [
                {
                    "key": "MY_API_KEY",
                    "prompt": "My Backend API key",
                    "url": "https://mybackend.example.com/keys",
                },
            ],
        }

    def generate(
        self,
        prompt: str,
        *,
        model: Optional[str] = None,
        image_url: Optional[str] = None,
        reference_image_urls: Optional[List[str]] = None,
        duration: Optional[int] = None,
        aspect_ratio: str = "16:9",
        resolution: str = "720p",
        negative_prompt: Optional[str] = None,
        audio: Optional[bool] = None,
        seed: Optional[int] = None,
        **kwargs: Any,  # always ignore unknown kwargs for forward-compat
    ) -> Dict[str, Any]:
        # ROUTE: image_url presence picks the endpoint.
        if image_url:
            endpoint = "my-backend/image-to-video"
            modality_used = "image"
        else:
            endpoint = "my-backend/text-to-video"
            modality_used = "text"

        # ... call your API ...

        return success_response(
            video="https://your-cdn/output.mp4",
            model=model or "fast",
            prompt=prompt,
            modality=modality_used,
            aspect_ratio=aspect_ratio,
            duration=duration or 5,
            provider=self.name,
        )


def register(ctx) -> None:
    ctx.register_video_gen_provider(MyVideoGenProvider())
```
## The plugin manifest
```yaml
# plugins/video_gen/my-backend/plugin.yaml
name: my-backend
version: 1.0.0
description: "My video generation backend"
author: Your Name
kind: backend
requires_env:
- MY_API_KEY
```
## The `video_generate` schema
The tool exposes one schema across every backend. Providers ignore parameters they don't support.
| Parameter | What it does |
|---|---|
| `prompt` | Text instruction (required) |
| `image_url` | When set → image-to-video; when omitted → text-to-video |
| `reference_image_urls` | Style/character refs (provider-dependent) |
| `duration` | Seconds — provider clamps |
| `aspect_ratio` | `"16:9"`, `"9:16"`, `"1:1"`, ... — provider clamps |
| `resolution` | `"480p"` / `"540p"` / `"720p"` / `"1080p"` — provider clamps |
| `negative_prompt` | Content to avoid (Pixverse/Kling only) |
| `audio` | Native audio (Veo3 / Pixverse pricing tier) |
| `seed` | Reproducibility |
| `model` | Override the active model/family |
The provider's `capabilities()` advertises which of these are honored. The agent sees the active backend's capabilities in the tool description, dynamically rebuilt when the user changes backend via `hermes tools`.
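The "provider clamps" behaviour in the table can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: `clamp_to_capabilities` is a hypothetical helper, the capability keys are the ones from the `capabilities()` example above, and falling back to the first advertised choice for unsupported enum values is an assumed policy.

```python
from typing import Any, Dict, Optional


def clamp_to_capabilities(
    caps: Dict[str, Any],
    *,
    duration: Optional[int] = None,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
) -> Dict[str, Any]:
    """Hypothetical helper: coerce requested values into the advertised set."""
    clamped: Dict[str, Any] = {}

    # Duration: only clamp when the caller asked for a specific length.
    if duration is not None:
        lo = caps.get("min_duration", duration)
        hi = caps.get("max_duration", duration)
        clamped["duration"] = max(lo, min(hi, duration))

    # Enumerated fields: fall back to the first advertised choice (assumed policy).
    ratios = caps.get("aspect_ratios") or [aspect_ratio]
    clamped["aspect_ratio"] = aspect_ratio if aspect_ratio in ratios else ratios[0]

    resolutions = caps.get("resolutions") or [resolution]
    clamped["resolution"] = resolution if resolution in resolutions else resolutions[0]
    return clamped


# Same values as the capabilities() example earlier in this guide.
CAPS = {
    "aspect_ratios": ["16:9", "9:16"],
    "resolutions": ["720p", "1080p"],
    "min_duration": 1,
    "max_duration": 10,
}
```

A provider could call a helper like this at the top of `generate()` so that out-of-range requests degrade to a supported value instead of failing upstream.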
## Model families and endpoint routing (the FAL pattern)
When your backend has multiple endpoints per "model" — like FAL, where every family (Veo 3.1, Pixverse v6, Kling O3) has both a `/text-to-video` and an `/image-to-video` URL — represent each **family** as one catalog entry. Your `generate()` picks the right endpoint based on whether `image_url` was passed:
```python
FAMILIES = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
        # ... family-specific capability flags ...
    },
}


def generate(self, prompt, *, image_url=None, model=None, **kwargs):
    family_id, family = _resolve_family(model)
    endpoint = family["image_endpoint"] if image_url else family["text_endpoint"]
    # ... build payload from family's declared capability flags, call endpoint ...
```
The user picks `veo3.1` once in `hermes tools`. The agent never thinks about endpoints — it just passes (or doesn't pass) `image_url`.
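A slightly fuller, runnable sketch of the same pattern is shown here, under stated assumptions. The two Veo 3.1 endpoints come from the snippet above; the `supports_*` flags, the `audio` payload key, `DEFAULT_FAMILY`, and the `resolve_family` / `build_request` helpers are hypothetical names chosen for illustration and may not match the FAL plugin. It shows how a family entry can drive both endpoint selection and which optional keys reach the payload.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical catalog: endpoints as in the snippet above, flags illustrative.
FAMILIES: Dict[str, Dict[str, Any]] = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
        "supports_audio": True,
        "supports_negative_prompt": False,
    },
}
DEFAULT_FAMILY = "veo3.1"


def resolve_family(model: Optional[str]) -> Tuple[str, Dict[str, Any]]:
    """Use the default family when no model is given; reject unknown ids."""
    family_id = model or DEFAULT_FAMILY
    if family_id not in FAMILIES:
        raise ValueError(f"unknown model family: {family_id}")
    return family_id, FAMILIES[family_id]


def build_request(
    prompt: str,
    *,
    model: Optional[str] = None,
    image_url: Optional[str] = None,
    audio: Optional[bool] = None,
    negative_prompt: Optional[str] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint, payload) for one call."""
    _, family = resolve_family(model)
    payload: Dict[str, Any] = {"prompt": prompt}

    # Presence of image_url picks the endpoint and the image key name.
    if image_url:
        endpoint = family["image_endpoint"]
        payload[family.get("image_param_key") or "image_url"] = image_url
    else:
        endpoint = family["text_endpoint"]

    # Only forward optional keys the family declares support for.
    if audio is not None and family.get("supports_audio"):
        payload["audio"] = audio
    if negative_prompt and family.get("supports_negative_prompt"):
        payload["negative_prompt"] = negative_prompt
    return endpoint, payload
```

Keeping the flags next to the endpoints means a new family is one dictionary entry, and a key the family does not declare never reaches the upstream API.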
## Selection precedence
For per-instance model knobs (see `plugins/video_gen/fal/__init__.py`):
1. `model=` keyword from the tool call
2. `<PROVIDER>_VIDEO_MODEL` env var
3. `video_gen.<provider>.model` in `config.yaml`
4. `video_gen.model` in `config.yaml` (when it's one of your IDs)
5. Provider's `default_model()`
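The five-step order above could be implemented along these lines. This is a sketch only: `resolve_model` and its parameters are hypothetical, the config is passed in as a plain dict rather than read from `config.yaml`, and mapping hyphens to underscores in the env var name is an assumption.

```python
import os
from typing import Any, Dict, Iterable, Mapping, Optional


def resolve_model(
    provider: str,
    known_ids: Iterable[str],
    config: Dict[str, Any],
    *,
    model: Optional[str] = None,
    default: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical resolver following the documented precedence."""
    env = os.environ if environ is None else environ
    section = config.get("video_gen")
    if not isinstance(section, dict):
        section = {}
    per_provider = section.get(provider)
    if not isinstance(per_provider, dict):
        per_provider = {}

    # 1. model= keyword from the tool call
    if model:
        return model
    # 2. <PROVIDER>_VIDEO_MODEL env var (hyphen handling is an assumption)
    env_value = env.get(f"{provider.upper().replace('-', '_')}_VIDEO_MODEL")
    if env_value:
        return env_value
    # 3. video_gen.<provider>.model
    if per_provider.get("model"):
        return per_provider["model"]
    # 4. video_gen.model, but only when it is one of this provider's IDs
    shared = section.get("model")
    if isinstance(shared, str) and shared in set(known_ids):
        return shared
    # 5. provider default
    return default
```

Step 4 is the one that is easy to get wrong: the shared `video_gen.model` value may belong to a different provider, so it is checked against the provider's own catalog before being used.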
## Response shape
`success_response()` and `error_response()` produce the dict shape every backend returns. Use them — don't hand-roll the dict.
Success keys: `success`, `video` (URL or absolute path), `model`, `prompt`, `modality` (`"text"` or `"image"`), `aspect_ratio`, `duration`, `provider`, plus `extra`.
Error keys: `success`, `video` (None), `error`, `error_type`, `model`, `prompt`, `aspect_ratio`, `provider`.
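When writing tests for a new backend, it can help to assert that a result carries the documented keys. The checker below is a hypothetical sketch built only from the two key lists above; it does not import or reproduce the real `success_response()` / `error_response()` helpers, and it leaves `extra` out of the required set because the text above lists it as an addition.

```python
from typing import Any, Dict, Set

# Key sets copied from the lists above.
SUCCESS_KEYS = {
    "success", "video", "model", "prompt",
    "modality", "aspect_ratio", "duration", "provider",
}
ERROR_KEYS = {
    "success", "video", "error", "error_type",
    "model", "prompt", "aspect_ratio", "provider",
}


def missing_keys(result: Dict[str, Any]) -> Set[str]:
    """Return the documented keys that are absent from a provider result."""
    expected = SUCCESS_KEYS if result.get("success") else ERROR_KEYS
    return expected - set(result)


# Illustrative payload only; the values are made up.
example_success = {
    "success": True,
    "video": "https://your-cdn/output.mp4",
    "model": "fast",
    "prompt": "a dog running",
    "modality": "text",
    "aspect_ratio": "16:9",
    "duration": 5,
    "provider": "my-backend",
}
```

An empty set from `missing_keys(result)` means the dict has the documented shape; anything else names the keys a hand-rolled response forgot.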
## Where to save artifacts
If your backend returns base64, use `save_b64_video()` to write under `$HERMES_HOME/cache/videos/`. For raw bytes from a follow-up HTTP fetch, use `save_bytes_video()`. Otherwise return the upstream URL directly — the gateway resolves remote URLs on delivery.
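For orientation, the base64 path amounts to something like the sketch below. It is a stand-in written for illustration, not the real `save_b64_video()`: the function name, parameters, and file-naming scheme are assumptions, as is the `~/.hermes` fallback when `HERMES_HOME` is unset. In a plugin, prefer the real helper.

```python
import base64
import os
import uuid
from pathlib import Path
from typing import Optional


def videos_cache_dir() -> Path:
    """Assumed layout: $HERMES_HOME/cache/videos, defaulting to ~/.hermes."""
    home = Path(os.environ.get("HERMES_HOME") or Path.home() / ".hermes")
    return home / "cache" / "videos"


def save_b64_video_sketch(
    b64_data: str,
    *,
    cache_dir: Optional[Path] = None,
    suffix: str = ".mp4",
) -> str:
    """Decode base64 video data, write it to the cache, return an absolute path."""
    target_dir = cache_dir or videos_cache_dir()
    target_dir.mkdir(parents=True, exist_ok=True)

    # validate=True raises on non-base64 characters instead of dropping them.
    raw = base64.b64decode(b64_data, validate=True)

    # A random name avoids collisions between concurrent generations.
    path = target_dir / f"video_{uuid.uuid4().hex}{suffix}"
    path.write_bytes(raw)
    return str(path.resolve())
```

The returned absolute path is what would go in the `video` field of the success response.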
## Testing
Drop a smoke test under `tests/plugins/video_gen/test_<name>_plugin.py`. The xAI and FAL tests show the pattern — register, verify catalog, exercise routing both with and without `image_url`, assert clean error responses on missing auth.


@ -20,6 +20,7 @@ Hermes has several distinct pluggable interfaces — some use Python `register_*
| A **memory backend** (Honcho/Mem0/Supermemory/etc.) | [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) |
| A **context-compression engine** | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| An **image-generation backend** | [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) |
| A **video-generation backend** | [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, voice cloning, …) | [TTS custom command providers](/docs/user-guide/features/tts#custom-command-providers) — config-driven, no Python needed |
| An **STT backend** (custom whisper / ASR CLI) | [Voice Message Transcription](/docs/user-guide/features/tts#voice-message-transcription-stt) — set `HERMES_LOCAL_STT_COMMAND` to a shell template |
| **External tools via MCP** (filesystem, GitHub, Linear, any MCP server) | [MCP](/docs/user-guide/features/mcp) — declare `mcp_servers.<name>` in `config.yaml` |


@ -66,6 +66,7 @@ Or in-session:
| `homeassistant` | `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services` | Smart home control via Home Assistant. Only available when `HASS_TOKEN` is set. |
| `computer_use` | `computer_use` | Background macOS desktop control via cua-driver — does not steal cursor/focus. Works with any tool-capable model. macOS only; requires `cua-driver` on `$PATH`. |
| `image_gen` | `image_generate` | Text-to-image generation via FAL.ai (with opt-in OpenAI / xAI backends). |
| `video_gen` | `video_generate` | Text-to-video and image-to-video via plugin-registered backends (xAI Grok-Imagine, FAL.ai Veo 3.1 / Pixverse v6 / Kling O3). Pass `image_url` to animate an image; omit it for text-to-video. |
| `kanban` | `kanban_block`, `kanban_comment`, `kanban_complete`, `kanban_create`, `kanban_heartbeat`, `kanban_link`, `kanban_show` | Multi-agent coordination tools — only registered when the agent is spawned by the kanban dispatcher (`HERMES_KANBAN_TASK` env set). Lets workers mark tasks done with structured handoffs, block for human input, heartbeat during long ops, comment on threads, and (for orchestrators) fan out into child tasks. |
| `memory` | `memory` | Persistent cross-session memory management. |
| `messaging` | `send_message` | Send messages to other platforms (Telegram, Discord, etc.) from within a session. |


@ -109,6 +109,7 @@ Every `ctx.*` API below is available inside a plugin's `register(ctx)` function.
| Distribute via pip | `[project.entry-points."hermes_agent.plugins"]` |
| Register a gateway platform (Discord, Telegram, IRC, …) | `ctx.register_platform(name, label, adapter_factory, check_fn, ...)` — see [Adding Platform Adapters](/docs/developer-guide/adding-platform-adapters) |
| Register an image-generation backend | `ctx.register_image_gen_provider(provider)` — see [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) |
| Register a video-generation backend | `ctx.register_video_gen_provider(provider)` — see [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) |
| Register a context-compression engine | `ctx.register_context_engine(engine)` — see [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| Register a memory backend | Subclass `MemoryProvider` in `plugins/memory/<name>/__init__.py` — see [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) (uses a separate discovery system) |
| Run a host-owned LLM call | `ctx.llm.complete(...)` / `ctx.llm.complete_structured(...)` — borrow the user's active model + auth for a one-shot completion with optional JSON schema validation. See [Plugin LLM Access](/docs/developer-guide/plugin-llm-access) |
@@ -230,6 +231,7 @@ The table above shows the four plugin categories, but within "General plugins" t
| A **memory backend** (Honcho, Mem0, Supermemory, …) | Memory plugin — subclass `MemoryProvider` in `plugins/memory/<name>/` | [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) |
| A **context-compression strategy** | Context-engine plugin — `ctx.register_context_engine()` | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) |
| A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven — declare under `tts.providers.<name>` with `type: command` in `config.yaml` | [TTS setup](/docs/user-guide/features/tts#custom-command-providers) |
| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/docs/user-guide/features/tts#voice-message-transcription-stt) |
| **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/docs/user-guide/features/mcp) |
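
The rows added in this file point plugin authors at `ctx.register_video_gen_provider(provider)`, called from a plugin's `register(ctx)` function. The sketch below shows the general shape of such a registration. It is not the real interface: `VideoGenProvider` here is a local stand-in for the actual ABC, `FakePluginContext` stands in for the real plugin context, and the `generate` signature and the `name` attribute are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class VideoGenProvider(ABC):
    """Local stand-in for the real ABC; the actual interface may differ."""

    name: str = "base"

    @abstractmethod
    def generate(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        ...


class EchoVideoProvider(VideoGenProvider):
    """Hypothetical backend that returns a canned response."""

    name = "echo"

    def generate(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        # Extra keyword arguments are accepted and ignored, mirroring the
        # PR's note that providers ignore kwargs they do not understand.
        return {"success": True, "provider": self.name, "prompt": prompt}


class FakePluginContext:
    """Stand-in for the plugin context; only the one hook is modelled."""

    def __init__(self) -> None:
        self.video_gen_providers: Dict[str, VideoGenProvider] = {}

    def register_video_gen_provider(self, provider: VideoGenProvider) -> None:
        self.video_gen_providers[provider.name] = provider


def register(ctx: FakePluginContext) -> None:
    """Plugin entry point: hand the backend to the host."""
    ctx.register_video_gen_provider(EchoVideoProvider())


ctx = FakePluginContext()
register(ctx)
print(sorted(ctx.video_gen_providers))
```

The linked Video Generation Provider Plugins guide is the authoritative description of the methods a real provider has to implement.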

@@ -223,6 +223,7 @@ const sidebars: SidebarsConfig = {
'developer-guide/context-engine-plugin',
'developer-guide/model-provider-plugin',
'developer-guide/image-gen-provider-plugin',
'developer-guide/video-gen-provider-plugin',
'developer-guide/plugin-llm-access',
'developer-guide/creating-skills',
'developer-guide/extending-the-cli',