mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
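The three scanner primitives above can be sketched as follows. This is an illustrative sketch only: `derive_key` and `should_autoload` are hypothetical names, not the actual scanner functions.

```python
from pathlib import Path


def derive_key(plugins_root: Path, plugin_dir: Path) -> str:
    # Key is the plugin dir's path relative to the plugins root, so a
    # future plugins/tts/openai cannot collide with image_gen/openai.
    return plugin_dir.relative_to(plugins_root).as_posix()


def should_autoload(kind: str, bundled: bool, enabled: set, key: str) -> bool:
    # Bundled `kind: backend` plugins auto-load; everything else stays
    # opt-in via the plugins.enabled list.
    if kind == "backend" and bundled:
        return True
    return key in enabled


root = Path("plugins")
key = derive_key(root, root / "image_gen" / "openai")
assert key == "image_gen/openai"
assert should_autoload("backend", bundled=True, enabled=set(), key=key)
assert not should_autoload("backend", bundled=False, enabled=set(), key=key)
```

The path-derived key is what makes depth-2 recursion safe: two plugins named `openai` under different category namespaces produce distinct keys.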
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
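The rule the fix implies could look like the sketch below. The parameter names mirror the OpenAI Images API; the helper itself is hypothetical:

```python
def build_image_payload(model: str, prompt: str, size: str, quality: str) -> dict:
    payload = {"model": model, "prompt": prompt, "size": size, "quality": quality}
    # gpt-image-* models return b64_json unconditionally and reject
    # response_format ('Unknown parameter'), so only non-gpt-image
    # models get the parameter.
    if not model.startswith("gpt-image-"):
        payload["response_format"] = "b64_json"
    return payload


assert "response_format" not in build_image_payload(
    "gpt-image-1.5", "a cat", "1024x1024", "medium"
)
assert build_image_payload(
    "dall-e-3", "a cat", "1024x1024", "standard"
)["response_format"] == "b64_json"
```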
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
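Putting the tier catalog and the config sources together, resolution could be sketched as below. The precedence order (env var, then `image_gen.openai.model`, then `image_gen.model`) is an assumption consistent with the config examples above, and `resolve_tier` is an illustrative name:

```python
CATALOG = {
    "gpt-image-2-low": {"quality": "low", "speed": "~15s"},
    "gpt-image-2-medium": {"quality": "medium", "speed": "~40s"},
    "gpt-image-2-high": {"quality": "high", "speed": "~2min"},
}
API_MODEL = "gpt-image-2"          # single underlying API model for all tiers
DEFAULT_MODEL = "gpt-image-2-medium"


def resolve_tier(env: dict, config: dict) -> tuple:
    # Pick the tier ID and the quality parameter actually sent to the API.
    candidate = (
        env.get("OPENAI_IMAGE_MODEL")
        or config.get("image_gen", {}).get("openai", {}).get("model")
        or config.get("image_gen", {}).get("model")
        or DEFAULT_MODEL
    )
    if candidate not in CATALOG:   # unknown tier falls back to the default
        candidate = DEFAULT_MODEL
    return candidate, CATALOG[candidate]["quality"]


tier, quality = resolve_tier({"OPENAI_IMAGE_MODEL": "gpt-image-2-high"}, {})
assert (tier, quality) == ("gpt-image-2-high", "high")
tier, quality = resolve_tier({}, {"image_gen": {"model": "gpt-image-2-low"}})
assert (tier, quality) == ("gpt-image-2-low", "low")
```

Every tier resolves to the same `gpt-image-2` API model; only the `quality` kwarg changes.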
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
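The default `get_setup_schema()` could be sketched as below. The schema fields come from the list above; the exact derivation from `display_name` is an assumption, and the class body is a minimal stand-in for the real ABC:

```python
class ImageGenProvider:
    """Minimal sketch of the ABC surface the picker relies on."""

    display_name = "OpenAI"

    def get_setup_schema(self) -> dict:
        # Default schema derived from display_name; a provider may
        # override this hook to customize badge, tag, or env_vars.
        return {
            "name": self.display_name,
            "badge": self.display_name,
            "tag": self.display_name.lower().replace(" ", "_"),
            "env_vars": [],
        }


schema = ImageGenProvider().get_setup_schema()
assert schema["name"] == "OpenAI"
assert schema["tag"] == "openai"
```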
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
243 lines
9.2 KiB
Python
"""Tests for the bundled OpenAI image_gen plugin (gpt-image-2, three tiers)."""

from __future__ import annotations

from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch

import pytest

import plugins.image_gen.openai as openai_plugin


# 1×1 transparent PNG — valid bytes for save_b64_image()
_PNG_HEX = (
    "89504e470d0a1a0a0000000d49484452000000010000000108060000001f15c4"
    "890000000d49444154789c6300010000000500010d0a2db40000000049454e44"
    "ae426082"
)


def _b64_png() -> str:
    import base64
    return base64.b64encode(bytes.fromhex(_PNG_HEX)).decode()


def _fake_response(*, b64=None, url=None, revised_prompt=None):
    item = SimpleNamespace(b64_json=b64, url=url, revised_prompt=revised_prompt)
    return SimpleNamespace(data=[item])


@pytest.fixture(autouse=True)
def _tmp_hermes_home(tmp_path, monkeypatch):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    yield tmp_path


@pytest.fixture
def provider(monkeypatch):
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    return openai_plugin.OpenAIImageGenProvider()


def _patched_openai(fake_client: MagicMock):
    fake_openai = MagicMock()
    fake_openai.OpenAI.return_value = fake_client
    return patch.dict("sys.modules", {"openai": fake_openai})


# ── Metadata ────────────────────────────────────────────────────────────────


class TestMetadata:
    def test_name(self, provider):
        assert provider.name == "openai"

    def test_default_model(self, provider):
        assert provider.default_model() == "gpt-image-2-medium"

    def test_list_models_three_tiers(self, provider):
        ids = [m["id"] for m in provider.list_models()]
        assert ids == ["gpt-image-2-low", "gpt-image-2-medium", "gpt-image-2-high"]

    def test_catalog_entries_have_display_speed_strengths(self, provider):
        for entry in provider.list_models():
            assert entry["display"].startswith("GPT Image 2")
            assert entry["speed"]
            assert entry["strengths"]


# ── Availability ────────────────────────────────────────────────────────────


class TestAvailability:
    def test_no_api_key_unavailable(self, monkeypatch):
        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
        assert openai_plugin.OpenAIImageGenProvider().is_available() is False

    def test_api_key_set_available(self, monkeypatch):
        monkeypatch.setenv("OPENAI_API_KEY", "test")
        assert openai_plugin.OpenAIImageGenProvider().is_available() is True


# ── Model resolution ────────────────────────────────────────────────────────


class TestModelResolution:
    def test_default_is_medium(self):
        model_id, meta = openai_plugin._resolve_model()
        assert model_id == "gpt-image-2-medium"
        assert meta["quality"] == "medium"

    def test_env_var_override(self, monkeypatch):
        monkeypatch.setenv("OPENAI_IMAGE_MODEL", "gpt-image-2-high")
        model_id, meta = openai_plugin._resolve_model()
        assert model_id == "gpt-image-2-high"
        assert meta["quality"] == "high"

    def test_env_var_unknown_falls_back(self, monkeypatch):
        monkeypatch.setenv("OPENAI_IMAGE_MODEL", "bogus-tier")
        model_id, _ = openai_plugin._resolve_model()
        assert model_id == openai_plugin.DEFAULT_MODEL

    def test_config_openai_model(self, tmp_path):
        import yaml
        (tmp_path / "config.yaml").write_text(
            yaml.safe_dump({"image_gen": {"openai": {"model": "gpt-image-2-low"}}})
        )
        model_id, meta = openai_plugin._resolve_model()
        assert model_id == "gpt-image-2-low"
        assert meta["quality"] == "low"

    def test_config_top_level_model(self, tmp_path):
        """``image_gen.model: gpt-image-2-high`` also works (top-level)."""
        import yaml
        (tmp_path / "config.yaml").write_text(
            yaml.safe_dump({"image_gen": {"model": "gpt-image-2-high"}})
        )
        model_id, meta = openai_plugin._resolve_model()
        assert model_id == "gpt-image-2-high"
        assert meta["quality"] == "high"


# ── Generate ────────────────────────────────────────────────────────────────


class TestGenerate:
    def test_empty_prompt_rejected(self, provider):
        result = provider.generate("", aspect_ratio="square")
        assert result["success"] is False
        assert result["error_type"] == "invalid_argument"

    def test_missing_api_key(self, monkeypatch):
        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
        result = openai_plugin.OpenAIImageGenProvider().generate("a cat")
        assert result["success"] is False
        assert result["error_type"] == "auth_required"

    def test_b64_saves_to_cache(self, provider, tmp_path):
        png_bytes = bytes.fromhex(_PNG_HEX)
        fake_client = MagicMock()
        fake_client.images.generate.return_value = _fake_response(b64=_b64_png())

        with _patched_openai(fake_client):
            result = provider.generate("a cat", aspect_ratio="landscape")

        assert result["success"] is True
        assert result["model"] == "gpt-image-2-medium"
        assert result["aspect_ratio"] == "landscape"
        assert result["provider"] == "openai"
        assert result["quality"] == "medium"

        saved = Path(result["image"])
        assert saved.exists()
        assert saved.parent == tmp_path / "cache" / "images"
        assert saved.read_bytes() == png_bytes

        call_kwargs = fake_client.images.generate.call_args.kwargs
        # All tiers hit the single underlying API model.
        assert call_kwargs["model"] == "gpt-image-2"
        assert call_kwargs["quality"] == "medium"
        assert call_kwargs["size"] == "1536x1024"
        # gpt-image-2 rejects response_format — we must NOT send it.
        assert "response_format" not in call_kwargs

    @pytest.mark.parametrize("tier,expected_quality", [
        ("gpt-image-2-low", "low"),
        ("gpt-image-2-medium", "medium"),
        ("gpt-image-2-high", "high"),
    ])
    def test_tier_maps_to_quality(self, provider, monkeypatch, tier, expected_quality):
        monkeypatch.setenv("OPENAI_IMAGE_MODEL", tier)
        fake_client = MagicMock()
        fake_client.images.generate.return_value = _fake_response(b64=_b64_png())

        with _patched_openai(fake_client):
            result = provider.generate("a cat")

        assert result["model"] == tier
        assert result["quality"] == expected_quality
        assert fake_client.images.generate.call_args.kwargs["quality"] == expected_quality
        # Always the same underlying API model regardless of tier.
        assert fake_client.images.generate.call_args.kwargs["model"] == "gpt-image-2"

    @pytest.mark.parametrize("aspect,expected_size", [
        ("landscape", "1536x1024"),
        ("square", "1024x1024"),
        ("portrait", "1024x1536"),
    ])
    def test_aspect_ratio_mapping(self, provider, aspect, expected_size):
        fake_client = MagicMock()
        fake_client.images.generate.return_value = _fake_response(b64=_b64_png())

        with _patched_openai(fake_client):
            provider.generate("a cat", aspect_ratio=aspect)

        assert fake_client.images.generate.call_args.kwargs["size"] == expected_size

    def test_revised_prompt_passed_through(self, provider):
        fake_client = MagicMock()
        fake_client.images.generate.return_value = _fake_response(
            b64=_b64_png(), revised_prompt="A photo of a cat",
        )

        with _patched_openai(fake_client):
            result = provider.generate("a cat")

        assert result["revised_prompt"] == "A photo of a cat"

    def test_api_error_returns_error_response(self, provider):
        fake_client = MagicMock()
        fake_client.images.generate.side_effect = RuntimeError("boom")

        with _patched_openai(fake_client):
            result = provider.generate("a cat")

        assert result["success"] is False
        assert result["error_type"] == "api_error"
        assert "boom" in result["error"]

    def test_empty_response_data(self, provider):
        fake_client = MagicMock()
        fake_client.images.generate.return_value = SimpleNamespace(data=[])

        with _patched_openai(fake_client):
            result = provider.generate("a cat")

        assert result["success"] is False
        assert result["error_type"] == "empty_response"

    def test_url_fallback_if_api_changes(self, provider):
        """Defensive: if OpenAI ever returns URL instead of b64, pass through."""
        fake_client = MagicMock()
        fake_client.images.generate.return_value = _fake_response(
            b64=None, url="https://example.com/img.png",
        )

        with _patched_openai(fake_client):
            result = provider.generate("a cat")

        assert result["success"] is True
        assert result["image"] == "https://example.com/img.png"