mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
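
To illustrate the path-derived keys and depth-2 recursion described above, here is a minimal sketch. It is not the scanner shipped in this PR: `discover_plugin_keys` is a hypothetical helper, and it assumes a directory counts as a plugin exactly when it contains a `plugin.yaml`.

```python
from pathlib import Path
from typing import Dict


def discover_plugin_keys(root: Path) -> Dict[str, Path]:
    """Map path-derived plugin keys to plugin directories (hypothetical helper)."""
    found: Dict[str, Path] = {}
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        if (child / "plugin.yaml").is_file():
            # Depth 1: a plugin sitting directly under the root.
            found[child.name] = child
            continue
        # No manifest of its own: treat the directory as a category
        # namespace and look exactly one level deeper.
        for grandchild in sorted(p for p in child.iterdir() if p.is_dir()):
            if (grandchild / "plugin.yaml").is_file():
                found[f"{child.name}/{grandchild.name}"] = grandchild
    return found
```

Under these assumptions `plugins/image_gen/openai/` and a future `plugins/tts/openai/` map to the distinct keys `image_gen/openai` and `tts/openai`.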
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
    gpt-image-2-low      ~15s    fast iteration
    gpt-image-2-medium   ~40s    default
    gpt-image-2-high     ~2min   highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
    image_gen.openai.model: gpt-image-2-high
    # or
    image_gen.model: gpt-image-2-low
    # or env var for scripts/tests
    OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
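
The tier mapping above can be sketched as a small lookup. This is an illustration, not the plugin's code: `resolve_tier` and `TIER_QUALITY` are hypothetical names, the quality strings are assumed to mirror the tier suffix, and falling back to the medium tier for unknown ids is also an assumption.

```python
from typing import Dict, Optional, Tuple

API_MODEL = "gpt-image-2"            # the single underlying API model
DEFAULT_TIER = "gpt-image-2-medium"  # the commit names medium as the default

# Tier id -> quality parameter. The values are assumed, not confirmed.
TIER_QUALITY: Dict[str, str] = {
    "gpt-image-2-low": "low",
    "gpt-image-2-medium": "medium",
    "gpt-image-2-high": "high",
}


def resolve_tier(model_id: Optional[str]) -> Tuple[str, str]:
    """Return (api_model, quality) for a tier id, using the default tier otherwise."""
    quality = TIER_QUALITY.get(model_id or "", TIER_QUALITY[DEFAULT_TIER])
    return API_MODEL, quality
```

Keeping the tiers as plain model ids means the existing model picker needs no separate quality setting.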
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
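
As a rough sketch of the mechanism above, the picker rows could be assembled along these lines. `DemoProvider` and `plugin_picker_rows` are hypothetical stand-ins, not the code in `tools_config.py`; the sketch only assumes that each provider exposes `name` and `get_setup_schema()` and that each row carries the `image_gen_plugin_name` marker.

```python
from typing import Any, Dict, Iterable, List


class DemoProvider:
    """Stand-in exposing only the two members this sketch needs."""

    def __init__(self, name: str, schema: Dict[str, Any]) -> None:
        self.name = name
        self._schema = schema

    def get_setup_schema(self) -> Dict[str, Any]:
        return dict(self._schema)


def plugin_picker_rows(providers: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build one picker row per plugin provider, skipping FAL."""
    rows: List[Dict[str, Any]] = []
    for provider in providers:
        if provider.name == "fal":
            continue  # FAL still has its hardcoded rows
        row = provider.get_setup_schema()
        # Marker that tells the configure step which plugin the row came from.
        row["image_gen_plugin_name"] = provider.name
        rows.append(row)
    return rows
```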
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
242 lines
7.4 KiB
Python
"""
Image Generation Provider ABC
=============================

Defines the pluggable-backend interface for image generation. Providers register
instances via ``PluginContext.register_image_gen_provider()``; the active one
(selected via ``image_gen.provider`` in ``config.yaml``) services every
``image_generate`` tool call.

Providers live in ``<repo>/plugins/image_gen/<name>/`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/image_gen/<name>/`` (user, opt-in
via ``plugins.enabled``).

Response shape
--------------
All providers return a dict that :func:`success_response` / :func:`error_response`
produce. The tool wrapper JSON-serializes it. Keys:

    success        bool
    image          str | None     URL or absolute file path
    model          str            provider-specific model identifier
    prompt         str            echoed prompt
    aspect_ratio   str            "landscape" | "square" | "portrait"
    provider       str            provider name (for diagnostics)
    error          str            only when success=False
    error_type     str            only when success=False
"""

from __future__ import annotations

import abc
import base64
import datetime
import logging
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


VALID_ASPECT_RATIOS: Tuple[str, ...] = ("landscape", "square", "portrait")
DEFAULT_ASPECT_RATIO = "landscape"


# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------


class ImageGenProvider(abc.ABC):
    """Abstract base class for an image generation backend.

    Subclasses must implement :meth:`generate`. Everything else has sane
    defaults — override only what your provider needs.
    """

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Stable short identifier used in ``image_gen.provider`` config.

        Lowercase, no spaces. Examples: ``fal``, ``openai``, ``replicate``.
        """

    @property
    def display_name(self) -> str:
        """Human-readable label shown in ``hermes tools``. Defaults to ``name.title()``."""
        return self.name.title()

    def is_available(self) -> bool:
        """Return True when this provider can service calls.

        Typically checks for a required API key. Default: True
        (providers with no external dependencies are always available).
        """
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        """Return catalog entries for ``hermes tools`` model picker.

        Each entry::

            {
                "id": "gpt-image-1.5",               # required
                "display": "GPT Image 1.5",          # optional; defaults to id
                "speed": "~10s",                     # optional
                "strengths": "...",                  # optional
                "price": "$...",                     # optional
            }

        Default: empty list (provider has no user-selectable models).
        """
        return []

    def get_setup_schema(self) -> Dict[str, Any]:
        """Return provider metadata for the ``hermes tools`` picker.

        Used by ``tools_config.py`` to inject this provider as a row in
        the Image Generation provider list. Shape::

            {
                "name": "OpenAI",                    # picker label
                "badge": "paid",                     # optional short tag
                "tag": "One-line description...",    # optional subtitle
                "env_vars": [                        # keys to prompt for
                    {"key": "OPENAI_API_KEY",
                     "prompt": "OpenAI API key",
                     "url": "https://platform.openai.com/api-keys"},
                ],
            }

        Default: minimal entry derived from ``display_name``. Override to
        expose API key prompts and custom badges.
        """
        return {
            "name": self.display_name,
            "badge": "",
            "tag": "",
            "env_vars": [],
        }

    def default_model(self) -> Optional[str]:
        """Return the default model id, or None if not applicable."""
        models = self.list_models()
        if models:
            return models[0].get("id")
        return None

    @abc.abstractmethod
    def generate(
        self,
        prompt: str,
        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Generate an image.

        Implementations should return the dict from :func:`success_response`
        or :func:`error_response`. ``kwargs`` may contain forward-compat
        parameters future versions of the schema will expose — implementations
        should ignore unknown keys.
        """


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def resolve_aspect_ratio(value: Optional[str]) -> str:
    """Clamp an aspect_ratio value to the valid set, defaulting to landscape.

    Invalid values are coerced rather than rejected so the tool surface is
    forgiving of agent mistakes.
    """
    if not isinstance(value, str):
        return DEFAULT_ASPECT_RATIO
    v = value.strip().lower()
    if v in VALID_ASPECT_RATIOS:
        return v
    return DEFAULT_ASPECT_RATIO


def _images_cache_dir() -> Path:
    """Return ``$HERMES_HOME/cache/images/``, creating parents as needed."""
    from hermes_constants import get_hermes_home

    path = get_hermes_home() / "cache" / "images"
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_b64_image(
    b64_data: str,
    *,
    prefix: str = "image",
    extension: str = "png",
) -> Path:
    """Decode base64 image data and write it under ``$HERMES_HOME/cache/images/``.

    Returns the absolute :class:`Path` to the saved file.

    Filename format: ``<prefix>_<YYYYMMDD_HHMMSS>_<short-uuid>.<ext>``.
    """
    raw = base64.b64decode(b64_data)
    ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    short = uuid.uuid4().hex[:8]
    path = _images_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
    path.write_bytes(raw)
    return path


def success_response(
    *,
    image: str,
    model: str,
    prompt: str,
    aspect_ratio: str,
    provider: str,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a uniform success response dict.

    ``image`` may be an HTTP URL or an absolute filesystem path (for b64
    providers like OpenAI). Callers that need to pass through additional
    backend-specific fields can supply ``extra``.
    """
    payload: Dict[str, Any] = {
        "success": True,
        "image": image,
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "provider": provider,
    }
    if extra:
        for k, v in extra.items():
            payload.setdefault(k, v)
    return payload


def error_response(
    *,
    error: str,
    error_type: str = "provider_error",
    provider: str = "",
    model: str = "",
    prompt: str = "",
    aspect_ratio: str = DEFAULT_ASPECT_RATIO,
) -> Dict[str, Any]:
    """Build a uniform error response dict."""
    return {
        "success": False,
        "image": None,
        "error": error,
        "error_type": error_type,
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "provider": provider,
    }
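
A minimal sketch of what a provider built on this interface might look like. To keep it runnable on its own, it declares condensed stand-ins for `ImageGenProvider`, `success_response`, and `error_response`; a real plugin would import them from the module above instead. `EchoProvider`, the `echo-1` model id, the placeholder URL, and the `invalid_request` error type are all hypothetical.

```python
import abc
from typing import Any, Dict


# Condensed stand-ins so the sketch runs by itself; they only mirror the
# parts of the module that the example touches.
class ImageGenProvider(abc.ABC):
    @property
    @abc.abstractmethod
    def name(self) -> str: ...

    @abc.abstractmethod
    def generate(self, prompt: str, aspect_ratio: str = "landscape",
                 **kwargs: Any) -> Dict[str, Any]: ...


def success_response(*, image: str, model: str, prompt: str,
                     aspect_ratio: str, provider: str) -> Dict[str, Any]:
    return {"success": True, "image": image, "model": model,
            "prompt": prompt, "aspect_ratio": aspect_ratio,
            "provider": provider}


def error_response(*, error: str, error_type: str = "provider_error",
                   provider: str = "", model: str = "", prompt: str = "",
                   aspect_ratio: str = "landscape") -> Dict[str, Any]:
    return {"success": False, "image": None, "error": error,
            "error_type": error_type, "model": model, "prompt": prompt,
            "aspect_ratio": aspect_ratio, "provider": provider}


class EchoProvider(ImageGenProvider):
    """Hypothetical provider that returns a fixed URL instead of calling an API."""

    @property
    def name(self) -> str:
        return "echo"

    def generate(self, prompt: str, aspect_ratio: str = "landscape",
                 **kwargs: Any) -> Dict[str, Any]:
        # Unknown kwargs are accepted and ignored, as the ABC docstring asks.
        if not prompt.strip():
            return error_response(error="empty prompt",
                                  error_type="invalid_request",
                                  provider=self.name, prompt=prompt,
                                  aspect_ratio=aspect_ratio)
        return success_response(image="https://example.invalid/echo.png",
                                model="echo-1", prompt=prompt,
                                aspect_ratio=aspect_ratio, provider=self.name)
```

Only `name` and `generate` are overridden here; in the real ABC the remaining methods (`is_available`, `list_models`, `get_setup_schema`, `default_model`) would fall back to their defaults.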