feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)

* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).
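As a rough illustration of the path-derived keys, a scanner could derive the registry key from the plugin directory's path relative to the plugins root (the `derive_plugin_key` helper below is hypothetical, not the actual scanner code):

```python
from pathlib import Path

def derive_plugin_key(plugin_dir: Path, plugins_root: Path) -> str:
    """Derive a registry key from the plugin's path relative to the
    plugins root: 'plugins/image_gen/openai' -> 'image_gen/openai'.
    Depth-1 plugins keep their bare name ('plugins/foo' -> 'foo')."""
    parts = plugin_dir.relative_to(plugins_root).parts
    return "/".join(parts)

# 'image_gen/openai' and a future 'tts/openai' get distinct keys:
print(derive_plugin_key(Path("plugins/image_gen/openai"), Path("plugins")))
print(derive_plugin_key(Path("plugins/tts/openai"), Path("plugins")))
```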

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
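The dispatch behavior described above amounts to a gate like the following (a minimal sketch; `pick_backend` and its return values are illustrative, not the real shim):

```python
def pick_backend(config: dict) -> str:
    """Route to a plugin backend only when image_gen.provider is
    explicitly set to a non-FAL value; otherwise keep the legacy
    in-tree FAL path untouched."""
    provider = (config.get("image_gen") or {}).get("provider")
    if provider and provider != "fal":
        return f"plugin:{provider}"   # registered plugin backend
    return "fal"                      # legacy in-tree path

assert pick_backend({}) == "fal"
assert pick_backend({"image_gen": {"provider": "fal"}}) == "fal"
assert pick_backend({"image_gen": {"provider": "openai"}}) == "plugin:openai"
```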

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
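A hedged sketch of the resulting payload construction: the parameter is only attached for model families known to accept it (`build_payload` is illustrative; the real helper may be named differently):

```python
def build_payload(model: str, prompt: str, size: str) -> dict:
    """Build an images.generate payload, omitting response_format for
    gpt-image-* models: they return b64_json unconditionally and the
    live API rejects the parameter outright."""
    payload = {"model": model, "prompt": prompt, "size": size}
    if not model.startswith("gpt-image-"):
        payload["response_format"] = "b64_json"
    return payload
```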

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — each is slower, lower quality, or
awkward (dall-e-2 supports square sizes only). Trim the catalog down to
a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium
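The precedence between those three sources could be sketched like this (illustrative only; the real `_resolve_model` also returns per-tier metadata):

```python
import os

DEFAULT_MODEL = "gpt-image-2-medium"
TIERS = {"gpt-image-2-low", "gpt-image-2-medium", "gpt-image-2-high"}

def resolve_model(config: dict) -> str:
    """Illustrative precedence: env var, then image_gen.openai.model,
    then image_gen.model, then the default. Unknown IDs fall back."""
    image_gen = config.get("image_gen") or {}
    for candidate in (
        os.environ.get("OPENAI_IMAGE_MODEL"),
        (image_gen.get("openai") or {}).get("model"),
        image_gen.get("model"),
    ):
        if candidate in TIERS:
            return candidate
    return DEFAULT_MODEL
```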

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().
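A minimal sketch of the optional hook and its default, assuming the field names listed above (the class bodies here are illustrative, not the shipped ABC):

```python
class ImageGenProvider:
    """Illustrative base: providers get a usable setup schema for free,
    derived from display_name, and can override it."""
    display_name = "Unnamed"

    def get_setup_schema(self) -> dict:
        return {
            "name": self.display_name,
            "badge": self.display_name,
            "tag": self.display_name.lower().replace(" ", "_"),
            "env_vars": [],
        }

class OpenAIImageGenProvider(ImageGenProvider):
    display_name = "OpenAI"

    def get_setup_schema(self) -> dict:
        # Keep the derived defaults, declare the required key.
        schema = super().get_setup_schema()
        schema["env_vars"] = ["OPENAI_API_KEY"]
        return schema
```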

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote it to be provider-neutral ('backend and
  model are user-configured') and noted that the 'image' field can be a
  URL or an absolute path; the gateway delivers either via
  extract_local_files().
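The status-line probe might reduce to something like this (a sketch; `status_line` and the message strings are paraphrased from the description above):

```python
def status_line(providers) -> str:
    """Report the first available image_gen provider; otherwise fall
    back to a message naming the missing keys."""
    for p in providers:
        if p.is_available():
            return f"Image Generation: configured ({p.display_name})"
    return "Image Generation: missing FAL_KEY or OPENAI_API_KEY"
```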
Teknium 2026-04-21 21:30:10 -07:00 committed by GitHub
parent d1acf17773
commit ff9752410a
13 changed files with 2122 additions and 67 deletions


@@ -0,0 +1,243 @@
"""Tests for the bundled OpenAI image_gen plugin (gpt-image-2, three tiers)."""
from __future__ import annotations
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch
import pytest
import plugins.image_gen.openai as openai_plugin
# 1×1 transparent PNG — valid bytes for save_b64_image()
_PNG_HEX = (
"89504e470d0a1a0a0000000d49484452000000010000000108060000001f15c4"
"890000000d49444154789c6300010000000500010d0a2db40000000049454e44"
"ae426082"
)
def _b64_png() -> str:
import base64
return base64.b64encode(bytes.fromhex(_PNG_HEX)).decode()
def _fake_response(*, b64=None, url=None, revised_prompt=None):
item = SimpleNamespace(b64_json=b64, url=url, revised_prompt=revised_prompt)
return SimpleNamespace(data=[item])
@pytest.fixture(autouse=True)
def _tmp_hermes_home(tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
yield tmp_path
@pytest.fixture
def provider(monkeypatch):
monkeypatch.setenv("OPENAI_API_KEY", "test-key")
return openai_plugin.OpenAIImageGenProvider()
def _patched_openai(fake_client: MagicMock):
fake_openai = MagicMock()
fake_openai.OpenAI.return_value = fake_client
return patch.dict("sys.modules", {"openai": fake_openai})
# ── Metadata ────────────────────────────────────────────────────────────────
class TestMetadata:
def test_name(self, provider):
assert provider.name == "openai"
def test_default_model(self, provider):
assert provider.default_model() == "gpt-image-2-medium"
def test_list_models_three_tiers(self, provider):
ids = [m["id"] for m in provider.list_models()]
assert ids == ["gpt-image-2-low", "gpt-image-2-medium", "gpt-image-2-high"]
def test_catalog_entries_have_display_speed_strengths(self, provider):
for entry in provider.list_models():
assert entry["display"].startswith("GPT Image 2")
assert entry["speed"]
assert entry["strengths"]
# ── Availability ────────────────────────────────────────────────────────────
class TestAvailability:
def test_no_api_key_unavailable(self, monkeypatch):
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
assert openai_plugin.OpenAIImageGenProvider().is_available() is False
def test_api_key_set_available(self, monkeypatch):
monkeypatch.setenv("OPENAI_API_KEY", "test")
assert openai_plugin.OpenAIImageGenProvider().is_available() is True
# ── Model resolution ────────────────────────────────────────────────────────
class TestModelResolution:
def test_default_is_medium(self):
model_id, meta = openai_plugin._resolve_model()
assert model_id == "gpt-image-2-medium"
assert meta["quality"] == "medium"
def test_env_var_override(self, monkeypatch):
monkeypatch.setenv("OPENAI_IMAGE_MODEL", "gpt-image-2-high")
model_id, meta = openai_plugin._resolve_model()
assert model_id == "gpt-image-2-high"
assert meta["quality"] == "high"
def test_env_var_unknown_falls_back(self, monkeypatch):
monkeypatch.setenv("OPENAI_IMAGE_MODEL", "bogus-tier")
model_id, _ = openai_plugin._resolve_model()
assert model_id == openai_plugin.DEFAULT_MODEL
def test_config_openai_model(self, tmp_path):
import yaml
(tmp_path / "config.yaml").write_text(
yaml.safe_dump({"image_gen": {"openai": {"model": "gpt-image-2-low"}}})
)
model_id, meta = openai_plugin._resolve_model()
assert model_id == "gpt-image-2-low"
assert meta["quality"] == "low"
def test_config_top_level_model(self, tmp_path):
"""``image_gen.model: gpt-image-2-high`` also works (top-level)."""
import yaml
(tmp_path / "config.yaml").write_text(
yaml.safe_dump({"image_gen": {"model": "gpt-image-2-high"}})
)
model_id, meta = openai_plugin._resolve_model()
assert model_id == "gpt-image-2-high"
assert meta["quality"] == "high"
# ── Generate ────────────────────────────────────────────────────────────────
class TestGenerate:
def test_empty_prompt_rejected(self, provider):
result = provider.generate("", aspect_ratio="square")
assert result["success"] is False
assert result["error_type"] == "invalid_argument"
def test_missing_api_key(self, monkeypatch):
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
result = openai_plugin.OpenAIImageGenProvider().generate("a cat")
assert result["success"] is False
assert result["error_type"] == "auth_required"
def test_b64_saves_to_cache(self, provider, tmp_path):
import base64
png_bytes = bytes.fromhex(_PNG_HEX)
fake_client = MagicMock()
fake_client.images.generate.return_value = _fake_response(b64=_b64_png())
with _patched_openai(fake_client):
result = provider.generate("a cat", aspect_ratio="landscape")
assert result["success"] is True
assert result["model"] == "gpt-image-2-medium"
assert result["aspect_ratio"] == "landscape"
assert result["provider"] == "openai"
assert result["quality"] == "medium"
saved = Path(result["image"])
assert saved.exists()
assert saved.parent == tmp_path / "cache" / "images"
assert saved.read_bytes() == png_bytes
call_kwargs = fake_client.images.generate.call_args.kwargs
# All tiers hit the single underlying API model.
assert call_kwargs["model"] == "gpt-image-2"
assert call_kwargs["quality"] == "medium"
assert call_kwargs["size"] == "1536x1024"
# gpt-image-2 rejects response_format — we must NOT send it.
assert "response_format" not in call_kwargs
@pytest.mark.parametrize("tier,expected_quality", [
("gpt-image-2-low", "low"),
("gpt-image-2-medium", "medium"),
("gpt-image-2-high", "high"),
])
def test_tier_maps_to_quality(self, provider, monkeypatch, tier, expected_quality):
monkeypatch.setenv("OPENAI_IMAGE_MODEL", tier)
fake_client = MagicMock()
fake_client.images.generate.return_value = _fake_response(b64=_b64_png())
with _patched_openai(fake_client):
result = provider.generate("a cat")
assert result["model"] == tier
assert result["quality"] == expected_quality
assert fake_client.images.generate.call_args.kwargs["quality"] == expected_quality
# Always the same underlying API model regardless of tier.
assert fake_client.images.generate.call_args.kwargs["model"] == "gpt-image-2"
@pytest.mark.parametrize("aspect,expected_size", [
("landscape", "1536x1024"),
("square", "1024x1024"),
("portrait", "1024x1536"),
])
def test_aspect_ratio_mapping(self, provider, aspect, expected_size):
fake_client = MagicMock()
fake_client.images.generate.return_value = _fake_response(b64=_b64_png())
with _patched_openai(fake_client):
provider.generate("a cat", aspect_ratio=aspect)
assert fake_client.images.generate.call_args.kwargs["size"] == expected_size
def test_revised_prompt_passed_through(self, provider):
fake_client = MagicMock()
fake_client.images.generate.return_value = _fake_response(
b64=_b64_png(), revised_prompt="A photo of a cat",
)
with _patched_openai(fake_client):
result = provider.generate("a cat")
assert result["revised_prompt"] == "A photo of a cat"
def test_api_error_returns_error_response(self, provider):
fake_client = MagicMock()
fake_client.images.generate.side_effect = RuntimeError("boom")
with _patched_openai(fake_client):
result = provider.generate("a cat")
assert result["success"] is False
assert result["error_type"] == "api_error"
assert "boom" in result["error"]
def test_empty_response_data(self, provider):
fake_client = MagicMock()
fake_client.images.generate.return_value = SimpleNamespace(data=[])
with _patched_openai(fake_client):
result = provider.generate("a cat")
assert result["success"] is False
assert result["error_type"] == "empty_response"
def test_url_fallback_if_api_changes(self, provider):
"""Defensive: if OpenAI ever returns URL instead of b64, pass through."""
fake_client = MagicMock()
fake_client.images.generate.return_value = _fake_response(
b64=None, url="https://example.com/img.png",
)
with _patched_openai(fake_client):
result = provider.generate("a cat")
assert result["success"] is True
assert result["image"] == "https://example.com/img.png"