hermes-agent/tests/tools/test_video_generation_tool_surface_matrix.py
Teknium 9d42c2c286
feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)
* feat(video_gen): unified video_generate tool with pluggable provider backends

One core video_generate tool, every backend a plugin. Mirrors the
image_gen + memory_provider + context_engine architecture: ABC, registry,
plugin-context registration hook, and per-plugin model catalogs surfaced
through hermes tools.

Surface (one schema, every backend):
- operation: generate / edit / extend
- modalities: text-to-video (prompt only), image-to-video (prompt +
  image_url), video edit (prompt + video_url), video extend (video_url)
- reference_image_urls, duration, aspect_ratio, resolution,
  negative_prompt, audio, seed, model override
- Providers ignore unknown kwargs and declare what they support via
  VideoGenProvider.capabilities() — backend-specific quirks stay in the
  backend, the agent learns one tool
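
A minimal sketch of the "declare capabilities, ignore unknown kwargs" idea described above. Only the names `VideoGenProvider` and `capabilities()` come from this commit; the method signatures, the `EchoProvider` class, and the capability keys are illustrative assumptions, and the real ABC in agent/video_gen_provider.py may look different.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class VideoGenProvider(ABC):
    """Illustrative stand-in for the real ABC; actual signatures may differ."""

    @abstractmethod
    def capabilities(self) -> Dict[str, Any]:
        """Declare what this backend supports."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        """Generate a video; kwargs the backend does not know are ignored."""


class EchoProvider(VideoGenProvider):
    """Hypothetical backend that only understands duration and seed."""

    def capabilities(self) -> Dict[str, Any]:
        # The key names here are assumptions made for this sketch.
        return {"modalities": ["text"], "params": ["duration", "seed"]}

    def generate(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        known = set(self.capabilities()["params"])
        # Keep only declared params and drop anything unset or unknown.
        accepted = {k: v for k, v in kwargs.items() if k in known and v is not None}
        return {"success": True, "prompt": prompt, "params": accepted}


# The caller passes the full unified surface; the backend keeps what it knows.
result = EchoProvider().generate("a dog running", duration=5, audio=True)
```

Here `audio=True` is silently dropped because the hypothetical backend never declared it, which is the behavior the bullet above describes.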

Backends shipped:
- plugins/video_gen/xai/  — Grok-Imagine, full generate/edit/extend +
  image-to-video + reference images (salvaged from PR #10600 by
  @Jaaneek, reshaped into the plugin interface)
- plugins/video_gen/fal/  — Veo 3.1 (t2v + i2v), Kling O3 i2v,
  Pixverse v6 i2v with model-aware payload building that drops keys a
  model doesn't declare

Wiring:
- agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation,
  success_response / error_response, save_b64_video / save_bytes_video,
  $HERMES_HOME/cache/videos/
- agent/video_gen_registry.py — thread-safe register/get/list +
  get_active_provider() reading video_gen.provider from config.yaml
- hermes_cli/plugins.py — PluginContext.register_video_gen_provider()
- hermes_cli/tools_config.py — Video Generation category in
  hermes tools, plugin-only providers list, model picker per plugin,
  config write to video_gen.{provider,model}
- toolsets.py — new video_gen toolset
- tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
- docs: developer-guide/video-gen-provider-plugin.md (parallel to the
  image-gen guide), sidebar + toolsets-reference + plugin guides updated
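
The registry wiring above could be sketched roughly as follows. This is an assumption-laden illustration, not the contents of agent/video_gen_registry.py: the `ProviderRegistry` class, its method names, and the use of an already-parsed config dict are all hypothetical.

```python
import threading
from typing import Dict, List, Optional


class ProviderRegistry:
    """Hypothetical thread-safe registry keyed by provider name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._providers: Dict[str, object] = {}

    def register(self, name: str, provider: object) -> None:
        with self._lock:
            self._providers[name] = provider

    def get(self, name: str) -> Optional[object]:
        with self._lock:
            return self._providers.get(name)

    def list_names(self) -> List[str]:
        with self._lock:
            return sorted(self._providers)

    def get_active(self, config: dict) -> Optional[object]:
        # Mirrors reading video_gen.provider from an already-parsed config.yaml.
        name = (config.get("video_gen") or {}).get("provider")
        return self.get(name) if name else None


registry = ProviderRegistry()
registry.register("fal", "fal-provider")  # placeholder objects for the sketch
registry.register("xai", "xai-provider")
active = registry.get_active({"video_gen": {"provider": "fal"}})
```

Returning `None` when no provider is configured leaves it to the caller to decide how to report a missing backend.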

Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse),
#10458 (provider categories), #10786 (xAI media+search bundle), #2984
(FAL duplicate), #19086 (Google Veo standalone — easy port to plugin
interface).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): dynamic schema reflects active backend's capabilities

Address the 'capability variance' question — instead of one tool with a
static schema that lies about what every backend supports, the
video_generate tool now rebuilds its description at get_definitions()
time based on the configured video_gen.provider and video_gen.model.

The agent sees backend-specific guidance up-front:
- 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is
  REQUIRED; text-only prompts will be rejected'
- 'fal-ai/veo3.1' (t2v): no image_url restriction shown
- xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7
  reference_image_urls'
- Backends without edit/extend: 'not supported on this backend — surface
  that they need to switch backends via hermes tools'

This is the same pattern PR #22694 used for delegate_task self-capping —
documented in the dynamic-tool-schemas skill. Cache invalidation is
free: get_tool_definitions() already memoizes on config.yaml mtime, so a
mid-session backend swap rebuilds the schema automatically.
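
To illustrate why memoizing on the config file's mtime makes invalidation automatic, here is a small sketch. `MtimeCache` and `build_schema` are hypothetical names invented for this example; the real `get_tool_definitions()` caching may be implemented differently.

```python
import os
import tempfile
from typing import Callable, List, Optional, Tuple


class MtimeCache:
    """Rebuild a value only when the watched file's mtime changes."""

    def __init__(self, path: str, build: Callable[[], dict]) -> None:
        self._path = path
        self._build = build
        self._cached: Optional[Tuple[int, dict]] = None

    def get(self) -> dict:
        mtime = os.stat(self._path).st_mtime_ns
        if self._cached is None or self._cached[0] != mtime:
            self._cached = (mtime, self._build())
        return self._cached[1]


builds: List[int] = []


def build_schema() -> dict:
    # Stand-in for the expensive schema/description rebuild.
    builds.append(1)
    return {"description": f"build #{len(builds)}"}


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "config.yaml")
    with open(cfg, "w") as fh:
        fh.write("video_gen: {provider: fal}\n")
    cache = MtimeCache(cfg, build_schema)
    first = cache.get()
    second = cache.get()  # same mtime, served from the cache
    # Simulate a mid-session config edit by bumping the mtime explicitly.
    st = os.stat(cfg)
    os.utime(cfg, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = cache.get()  # mtime changed, so the schema is rebuilt
```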

Tested:
- Empirical FAL OpenAPI schema check confirms image-to-video models
  require image_url (FAL returns HTTP 422 otherwise) — client-side
  rejection in FALVideoGenProvider.generate() now prevents the wasted
  round-trip
- Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean
  missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
- 6 new tests cover the builder (no config / image-only / full-surface /
  text-only / unknown provider / registry wiring), all passing
- 37/37 in the slice, 134/134 in the broader regression set

* test(video_gen/xai): full surface integration tests + cleaner schema

Verified end-to-end that the xAI plugin handles every documented mode
from PR #10600's surface: text-to-video, image-to-video,
reference-images-to-video, video edit, video extend (with and without
prompt). All five modes route to the correct xAI endpoint
(/videos/generations, /videos/edits, /videos/extensions) with the right
payload shape (image / reference_images / video keys), and all five
client-side rejections fire before the network: edit-without-prompt,
extend-without-video_url, image+refs conflict, >7 references, and
duration/aspect_ratio clamping.

15 new integration tests grouped into four classes (endpoint routing,
modalities, validation, clamping). httpx is stubbed via a small fake
AsyncClient that records POSTs so the tests assert the actual payload
the plugin would send to xAI — not just the success/error envelope.

Also cleaned up a description redundancy: when a model's operations
match the backend's overall set, we no longer print the duplicate
'operations supported by this model' line. xAI's description now reads:

    Active backend: xAI . model: grok-imagine-video
    - operations supported by this backend: edit, extend, generate
    - modalities supported by this backend: image, reference_images, text
    - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
    - resolution choices: 480p, 720p
    - duration range: 1-15s
    - reference_image_urls: up to 7 images

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing

Two design changes per Teknium:

1) Drop edit/extend from the tool surface entirely. Only text-to-video
and image-to-video remain. The agent sees a clean tool with two
modalities; backend-specific quirks like xAI's edit/extend endpoints
stay out of the unified schema.

2) FAL: pick a model FAMILY once, the plugin routes between the
family's text-to-video and image-to-video endpoints based on whether
image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND
'fal-ai/veo3.1/image-to-video' as separate options — they pick
'veo3.1', and the plugin handles the rest.

Catalog rewritten as families:

    veo3.1            fal-ai/veo3.1                                /  fal-ai/veo3.1/image-to-video
    pixverse-v6       fal-ai/pixverse/v6/text-to-video             /  fal-ai/pixverse/v6/image-to-video
    kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video /  fal-ai/kling-video/o3/standard/image-to-video
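
The family-based routing can be sketched as below. The endpoints are copied from the catalog above and the `text_endpoint` / `image_endpoint` keys match what the matrix test reads from `FAL_FAMILIES`; the `FAMILIES` dict and the `route` helper are hypothetical names for this illustration, not the plugin's actual code.

```python
from typing import Dict, Optional, Tuple

# Endpoints copied from the catalog above; the dict layout is an assumption.
FAMILIES: Dict[str, Dict[str, str]] = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
    },
    "pixverse-v6": {
        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
    },
    "kling-o3-standard": {
        "text_endpoint": "fal-ai/kling-video/o3/standard/text-to-video",
        "image_endpoint": "fal-ai/kling-video/o3/standard/image-to-video",
    },
}


def route(family_id: str, image_url: Optional[str] = None) -> Tuple[str, str]:
    """Return (endpoint, modality) depending on whether image_url was passed."""
    family = FAMILIES[family_id]
    if image_url:
        return family["image_endpoint"], "image"
    return family["text_endpoint"], "text"


text_route = route("veo3.1")
image_route = route("pixverse-v6", "https://example.com/dog.png")
```

The user picks one family name and the presence of `image_url` alone selects the endpoint.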

xAI uses a single endpoint (/videos/generations) for both modes,
routed by the presence of the 'image' field in the payload — no
edit/extend exposure.

Schema changes:
- VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params:
  prompt (required), image_url, reference_image_urls, duration,
  aspect_ratio, resolution, negative_prompt, audio, seed, model.
- VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS,
  DEFAULT_OPERATION. capabilities() drops 'operations' key.
- success_response: add 'modality' field ('text' | 'image') so the
  agent and logs can see which endpoint was actually hit.

Dynamic schema builder simplified — no operations bullet, no
'switch backends if you need edit/extend' guidance. When the active
backend supports both modalities (the common case), description reads:

    Active backend: FAL . model: pixverse-v6
    - supports both text-to-video (omit image_url) and image-to-video
      (pass image_url) - routes automatically
    - aspect_ratio choices: 16:9, 9:16, 1:1
    - resolution choices: 360p, 540p, 720p, 1080p
    - duration range: 1-15s
    - audio: pass audio=true to enable native audio (pricing tier)
    - negative_prompt: supported
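
A rough sketch of how such a description might be assembled from a capabilities dict. The function name `describe_backend` and every key in `caps` are assumptions made for this example; the real builder and its capability schema are not shown in this commit message.

```python
from typing import Any, Dict, List


def describe_backend(provider: str, model: str, caps: Dict[str, Any]) -> str:
    """Render capability bullets; the keys read from caps are hypothetical."""
    lines: List[str] = [f"Active backend: {provider} . model: {model}"]
    modalities = set(caps.get("modalities", []))
    if {"text", "image"} <= modalities:
        lines.append(
            "- supports both text-to-video (omit image_url) and "
            "image-to-video (pass image_url) - routes automatically"
        )
    if caps.get("aspect_ratios"):
        lines.append("- aspect_ratio choices: " + ", ".join(caps["aspect_ratios"]))
    if caps.get("resolutions"):
        lines.append("- resolution choices: " + ", ".join(caps["resolutions"]))
    if caps.get("duration_range"):
        low, high = caps["duration_range"]
        lines.append(f"- duration range: {low}-{high}s")
    if caps.get("negative_prompt"):
        lines.append("- negative_prompt: supported")
    return "\n".join(lines)


text = describe_backend(
    "FAL",
    "pixverse-v6",
    {
        "modalities": ["text", "image"],
        "aspect_ratios": ["16:9", "9:16", "1:1"],
        "resolutions": ["360p", "540p", "720p", "1080p"],
        "duration_range": (1, 15),
        "negative_prompt": True,
    },
)
print(text)
```

Bullets are emitted only for capabilities the backend declares, so a backend without negative prompts simply omits that line.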

Tests: 51 in the video_gen slice, 216 across the broader image+video
sweep, all passing. New FAL routing tests prove pixverse-v6 + no image
hits text-to-video endpoint, pixverse-v6 + image_url hits
image-to-video endpoint, same for veo3.1 and kling-o3-standard.

Docs updated: developer-guide page rewrites the 'model families' pattern
as a first-class section so external plugin authors know the convention.
toolsets-reference and toolsets.py descriptions match the new surface.

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers

Catalog now covers everything Teknium specced from FAL:

  Cheap tier:
    ltx-2.3        fal-ai/ltx-2.3-22b/text-to-video       / image-to-video
    pixverse-v6    fal-ai/pixverse/v6/text-to-video       / image-to-video

  Premium tier:
    veo3.1         fal-ai/veo3.1                          / fal-ai/veo3.1/image-to-video
    seedance-2.0   bytedance/seedance-2.0/text-to-video   / image-to-video
    kling-v3-4k    fal-ai/kling-video/v3/4k/text-to-video / image-to-video
    happy-horse    fal-ai/happy-horse/text-to-video       / image-to-video

DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane
defaults, both modalities) — better first-run UX for users who haven't
explicitly picked a model.

New family-entry knob: image_param_key. Kling v3 4K's image-to-video
endpoint expects start_image_url instead of image_url; declaring
image_param_key='start_image_url' on the family lets _build_payload
remap correctly. Other families default to plain image_url.
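
The `image_param_key` remap amounts to something like the following. This `build_payload` helper is a simplified, hypothetical stand-in for the plugin's `_build_payload`, which also handles the other parameters; only the `image_param_key` knob and the `start_image_url` value come from the text above.

```python
from typing import Any, Dict, Optional


def build_payload(
    family: Dict[str, Any], prompt: str, image_url: Optional[str] = None
) -> Dict[str, Any]:
    """Build a request payload, remapping the image key when the family asks."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url:
        # Families without the knob fall back to plain image_url.
        key = family.get("image_param_key") or "image_url"
        payload[key] = image_url
    return payload


kling: Dict[str, Any] = {"image_param_key": "start_image_url"}
pixverse: Dict[str, Any] = {}

kling_payload = build_payload(kling, "animate this", "https://example.com/i.png")
pixverse_payload = build_payload(pixverse, "animate this", "https://example.com/i.png")
text_payload = build_payload(kling, "a dog running")
```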

Per-family capability flags reflect each model's docs:
- LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution
  enum exposed by FAL — let endpoint apply defaults)
- Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported,
  negative prompts NOT supported per docs
- Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
- Veo 3.1: unchanged, 16:9/9:16, 4/6/8s

Tests: +5 covering the new families (full catalog, Kling 4K
start_image_url remap, Seedance routing, LTX payload minimality, Happy
Horse minimality). 56/56 in the slice green.

Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes
already has a direct xAI plugin that talks to xAI's own API; routing
the same model through FAL's wrapper would duplicate the surface
without adding capabilities. Users on FAL who want Grok-Imagine should
use the xAI plugin directly; flag if you want both routes available.

* test(video_gen): tool-surface routing matrix — every model x modality

End-to-end matrix test driven through _handle_video_generate() — the
actual function the agent's video_generate tool call lands in. Writes
config.yaml, invokes the registered handler with a raw args dict, then
asserts the outbound HTTP/SDK call hit the right endpoint with the right
payload shape.

Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new
families as they're added (add a family to FAL_FAMILIES and you get
both modalities tested for free).

Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases

For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most,
  start_image_url for kling-v3-4k, wrapped 'image' object for xAI)

All 16 cases passing. Confirms the tool surface routes every
(provider, model, modality) combination correctly with zero leakage.

* feat(video_gen): keep video_gen out of first-run setup, surface in status

Two changes:

1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in
   the first-run toolset checklist. Video gen is niche, paid, and slow —
   most users don't want it nagging them during initial setup. Anyone
   who wants it opts in via 'hermes tools' -> Video Generation, which
   already routes to the provider+model picker.

2. The 'hermes setup' status panel learns about video_gen — but only
   shows the row when a plugin reports available. Users without
   FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of
   those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.

Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.

Rationale: image_gen is on by default because it's a featured creative
tool used in casual chat (Telegram, etc.). Video gen is heavier: long
waits and paid per-second pricing. Default-off matches user intent better.

---------

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
2026-05-13 16:39:41 -07:00

"""Tool-surface routing matrix: every (provider, model, modality) combo.
This is the integration test for the question Teknium asked: regardless
of which provider+model the user picks and whether they pass an
image_url or not, does the tool surface route correctly to the right
endpoint with the right payload shape?
Drives ``_handle_video_generate(args)`` end-to-end — config write →
config read → registry lookup → provider.generate() → outbound HTTP/SDK
call. Stubs fal_client and httpx so we observe routing without hitting
the network.
"""
from __future__ import annotations

import asyncio
import json
import types
from typing import Any, Dict, List, Optional

import pytest
import yaml


@pytest.fixture(autouse=True)
def _reset_registry():
    from agent import video_gen_registry
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


@pytest.fixture
def matrix_env(tmp_path, monkeypatch):
    """Set up HERMES_HOME, stub fal_client + httpx, force plugin discovery."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    monkeypatch.setenv("FAL_KEY", "test-key")
    monkeypatch.setenv("XAI_API_KEY", "test-key")

    fal_calls: List[Dict[str, Any]] = []
    xai_calls: List[Dict[str, Any]] = []

    # fal_client stub
    fake_fal = types.ModuleType("fal_client")

    def _subscribe(endpoint, arguments=None, with_logs=False):
        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
        return {"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}}

    fake_fal.subscribe = _subscribe  # type: ignore
    monkeypatch.setitem(__import__("sys").modules, "fal_client", fake_fal)

    # httpx stub for xAI
    import httpx

    class _Resp:
        def __init__(self, p, s=200):
            self.status_code = s
            self._p = p
            self.text = json.dumps(p)

        def raise_for_status(self):
            if self.status_code >= 400:
                raise httpx.HTTPStatusError("err", request=None, response=self)  # type: ignore

        def json(self):
            return self._p

    class _Client:
        async def __aenter__(self): return self
        async def __aexit__(self, *a): return None

        async def post(self, url, headers=None, json=None, timeout=None):
            xai_calls.append({"url": url, "json": json})
            return _Resp({"request_id": "req-1"})

        async def get(self, url, headers=None, timeout=None):
            return _Resp({
                "status": "done",
                "video": {"url": "https://xai-cdn/out.mp4", "duration": 8},
                "model": "grok-imagine-video",
            })

    import plugins.video_gen.xai as xai_plugin
    monkeypatch.setattr(xai_plugin.httpx, "AsyncClient", lambda: _Client())

    async def _no_sleep(*a, **k): return None
    monkeypatch.setattr(asyncio, "sleep", _no_sleep)

    # Reset FAL plugin's lazy fal_client cache so it picks up the stub
    from plugins.video_gen import fal as fal_plugin
    fal_plugin._fal_client = None

    # Force discovery
    from hermes_cli.plugins import _ensure_plugins_discovered
    _ensure_plugins_discovered(force=True)

    return tmp_path, fal_calls, xai_calls


def _invoke_tool(home, cfg: dict, args: dict) -> dict:
    """Write config, invoke the registered tool handler, return parsed JSON."""
    (home / "config.yaml").write_text(yaml.safe_dump(cfg))
    import hermes_cli.config as cfg_mod
    if hasattr(cfg_mod, "_invalidate_load_config_cache"):
        cfg_mod._invalidate_load_config_cache()
    from tools.registry import registry
    handler = registry._tools["video_generate"].handler
    return json.loads(handler(args))


# ─────────────────────────────────────────────────────────────────────────
# FAL: every family × {text-only, text+image}
# ─────────────────────────────────────────────────────────────────────────
# We parametrize over the catalog so the test discovers new families
# automatically. If someone adds 'sora-2' to FAL_FAMILIES, this matrix
# picks it up — no test changes needed beyond confirming the endpoints.
def _all_fal_families():
    from plugins.video_gen.fal import FAL_FAMILIES
    return list(FAL_FAMILIES.keys())


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_only_routes_to_text_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "text"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's text endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["text_endpoint"]

    # Payload must NOT contain any image-shaped key
    payload = fal_calls[0]["arguments"] or {}
    image_keys = [k for k in payload if "image" in k and "url" in k]
    assert not image_keys, f"{family_id} text-only leaked image keys: {image_keys}"


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_plus_image_routes_to_image_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "animate this dog", "image_url": "https://example.com/dog.png"},
    )
    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "image"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's image endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["image_endpoint"]

    # Payload must contain the right image key (may be image_url or
    # start_image_url depending on the family's image_param_key)
    payload = fal_calls[0]["arguments"] or {}
    expected_image_key = FAL_FAMILIES[family_id].get("image_param_key") or "image_url"
    assert payload.get(expected_image_key) == "https://example.com/dog.png", (
        f"{family_id} text+image missing {expected_image_key} in payload "
        f"(keys: {sorted(payload.keys())})"
    )


# ─────────────────────────────────────────────────────────────────────────
# xAI: text-only / text+image both go to /videos/generations
# (xAI uses one endpoint with an optional 'image' field, not separate URLs)
# ─────────────────────────────────────────────────────────────────────────
def test_xai_text_only_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True
    assert result["modality"] == "text"
    assert result["provider"] == "xai"

    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert "image" not in payload
    assert "reference_images" not in payload


def test_xai_text_plus_image_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "animate this", "image_url": "https://example.com/img.png"},
    )
    assert result["success"] is True
    assert result["modality"] == "image"
    assert result["provider"] == "xai"

    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert payload["image"] == {"url": "https://example.com/img.png"}


# ─────────────────────────────────────────────────────────────────────────
# tool-level `model` arg overrides config
# ─────────────────────────────────────────────────────────────────────────
def test_tool_model_arg_overrides_config(matrix_env):
    """When the tool call passes model=, it wins over video_gen.model in config."""
    home, fal_calls, _ = matrix_env

    # Config picks pixverse-v6, but tool call says veo3.1
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {"prompt": "a dog", "model": "veo3.1"},
    )
    assert result["success"] is True
    assert result["model"] == "veo3.1"
    # Outbound endpoint reflects the override, not config
    assert fal_calls[0]["endpoint"] == "fal-ai/veo3.1"


def test_tool_model_arg_with_image_url_routes_to_override_image_endpoint(matrix_env):
    """model= override on text+image goes to the override family's image endpoint."""
    home, fal_calls, _ = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {
            "prompt": "animate this",
            "image_url": "https://example.com/i.png",
            "model": "kling-v3-4k",
        },
    )
    assert result["success"] is True
    assert result["model"] == "kling-v3-4k"
    assert fal_calls[0]["endpoint"] == "fal-ai/kling-video/v3/4k/image-to-video"
    # Kling 4K uses start_image_url
    assert fal_calls[0]["arguments"].get("start_image_url") == "https://example.com/i.png"
    assert "image_url" not in fal_calls[0]["arguments"]