feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)

* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
  user-configured' — and notes the 'image' field can be a URL or an
  absolute path, which the gateway delivers either way via
  extract_local_files().
This commit is contained in:
Teknium 2026-04-21 21:30:10 -07:00 committed by GitHub
parent d1acf17773
commit ff9752410a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
13 changed files with 2122 additions and 67 deletions

View file

@ -774,14 +774,41 @@ def check_fal_api_key() -> bool:
def check_image_generation_requirements() -> bool:
"""True if FAL credentials and fal_client SDK are both available."""
"""True if any image gen backend is available.
Providers are considered in this order:
1. The in-tree FAL backend (FAL_KEY or managed gateway).
2. Any plugin-registered provider whose ``is_available()`` returns True.
Plugins win only when the in-tree FAL path is NOT ready, which matches
the historical behavior: shipping hermes with a FAL key configured
should still expose the tool. The active selection among ready
providers is resolved per-call by ``image_gen.provider``.
"""
try:
if not check_fal_api_key():
return False
fal_client # noqa: F401 — SDK presence check
return True
if check_fal_api_key():
fal_client # noqa: F401 — SDK presence check
return True
except ImportError:
return False
pass
# Probe plugin providers. Discovery is idempotent and cheap.
try:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
if provider.is_available():
return True
except Exception:
continue
except Exception:
pass
return False
# ---------------------------------------------------------------------------
@ -827,10 +854,11 @@ from tools.registry import registry, tool_error
IMAGE_GENERATE_SCHEMA = {
"name": "image_generate",
"description": (
"Generate high-quality images from text prompts using FAL.ai. "
"The underlying model is user-configured (default: FLUX 2 Klein 9B, "
"sub-1s generation) and is not selectable by the agent. Returns a "
"single image URL. Display it using markdown: ![description](URL)"
"Generate high-quality images from text prompts. The underlying "
"backend (FAL, OpenAI, etc.) and model are user-configured and not "
"selectable by the agent. Returns either a URL or an absolute file "
"path in the `image` field; display it with markdown "
"![description](url-or-path) and the gateway will deliver it."
),
"parameters": {
"type": "object",
@ -851,13 +879,104 @@ IMAGE_GENERATE_SCHEMA = {
}
def _read_configured_image_provider():
"""Return the value of ``image_gen.provider`` from config.yaml, or None.
We only consult the plugin registry when this is explicitly set an
unset value keeps users on the legacy in-tree FAL path even when other
providers happen to be registered (e.g. a user has OPENAI_API_KEY set
for other features but never asked for OpenAI image gen).
"""
try:
from hermes_cli.config import load_config
cfg = load_config()
section = cfg.get("image_gen") if isinstance(cfg, dict) else None
if isinstance(section, dict):
value = section.get("provider")
if isinstance(value, str) and value.strip():
return value.strip()
except Exception as exc:
logger.debug("Could not read image_gen.provider: %s", exc)
return None
def _dispatch_to_plugin_provider(prompt: str, aspect_ratio: str):
"""Route the call to a plugin-registered provider when one is selected.
Returns a JSON string on dispatch, or ``None`` to fall through to the
built-in FAL path.
Dispatch only fires when ``image_gen.provider`` is explicitly set AND
it does not point to ``fal`` (FAL still lives in-tree in this PR;
a later PR ports it into ``plugins/image_gen/fal/``). Any other value
that matches a registered plugin provider wins.
"""
configured = _read_configured_image_provider()
if not configured or configured == "fal":
return None
try:
# Import locally so plugin discovery isn't triggered just by
# importing this module (tests rely on that).
from agent.image_gen_registry import get_provider
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
provider = get_provider(configured)
except Exception as exc:
logger.debug("image_gen plugin dispatch skipped: %s", exc)
return None
if provider is None:
return json.dumps({
"success": False,
"image": None,
"error": (
f"image_gen.provider='{configured}' is set but no plugin "
f"registered that name. Run `hermes plugins list` to see "
f"available image gen backends."
),
"error_type": "provider_not_registered",
})
try:
result = provider.generate(prompt=prompt, aspect_ratio=aspect_ratio)
except Exception as exc:
logger.warning(
"Image gen provider '%s' raised: %s",
getattr(provider, "name", "?"), exc,
)
return json.dumps({
"success": False,
"image": None,
"error": f"Provider '{getattr(provider, 'name', '?')}' error: {exc}",
"error_type": "provider_exception",
})
if not isinstance(result, dict):
return json.dumps({
"success": False,
"image": None,
"error": "Provider returned a non-dict result",
"error_type": "provider_contract",
})
return json.dumps(result)
def _handle_image_generate(args, **kw):
prompt = args.get("prompt", "")
if not prompt:
return tool_error("prompt is required for image generation")
aspect_ratio = args.get("aspect_ratio", DEFAULT_ASPECT_RATIO)
# Route to a plugin-registered provider if one is active (and it's
# not the in-tree FAL path).
dispatched = _dispatch_to_plugin_provider(prompt, aspect_ratio)
if dispatched is not None:
return dispatched
return image_generate_tool(
prompt=prompt,
aspect_ratio=args.get("aspect_ratio", DEFAULT_ASPECT_RATIO),
aspect_ratio=aspect_ratio,
)