mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
1065 lines
39 KiB
Python
1065 lines
39 KiB
Python
#!/usr/bin/env python3
|
||
"""
|
||
Image Generation Tools Module
|
||
|
||
Provides image generation via FAL.ai. Multiple FAL models are supported and
|
||
selectable via ``hermes tools`` → Image Generation; the active model is
|
||
persisted to ``image_gen.model`` in ``config.yaml``.
|
||
|
||
Architecture:
|
||
- ``FAL_MODELS`` is a catalog of supported models with per-model metadata
|
||
(size-style family, defaults, ``supports`` whitelist, upscaler flag).
|
||
- ``_build_fal_payload()`` translates the agent's unified inputs (prompt +
|
||
aspect_ratio) into the model-specific payload and filters to the
|
||
``supports`` whitelist so models never receive rejected keys.
|
||
- Upscaling via FAL's Clarity Upscaler is gated per-model via the ``upscale``
|
||
flag — on for FLUX 2 Pro (backward-compat), off for all faster/newer models
|
||
where upscaling would either hurt latency or add marginal quality.
|
||
|
||
Pricing shown in UI strings is as-of the initial commit; we accept drift and
|
||
update when it's noticed.
|
||
"""
|
||
|
||
import json
|
||
import logging
|
||
import os
|
||
import datetime
|
||
import threading
|
||
import uuid
|
||
from typing import Any, Dict, Optional, Union
|
||
from urllib.parse import urlencode
|
||
|
||
# fal_client is imported lazily — see _load_fal_client(). Pulling it
|
||
# eagerly added ~64 ms to every CLI cold start because
|
||
# discover_builtin_tools() imports this module unconditionally during
|
||
# the registry walk, even when image generation is never used.
|
||
#
|
||
# Tests that monkeypatch this attribute (e.g.
|
||
# ``monkeypatch.setattr(image_tool, "fal_client", fake_fal_client)``)
|
||
# still work: _load_fal_client() short-circuits when the attribute is
|
||
# anything truthy, so a test-installed mock is not overwritten by a
|
||
# subsequent real import.
|
||
fal_client: Any = None
|
||
|
||
|
||
def _load_fal_client() -> Any:
|
||
"""Lazily import fal_client and rebind the module global on first use.
|
||
|
||
Idempotent. Returns the (now-loaded) ``fal_client`` module reference.
|
||
Skips the import if the global is already truthy — this preserves the
|
||
test pattern of monkeypatching the module global to install a mock.
|
||
"""
|
||
global fal_client
|
||
if fal_client is not None:
|
||
return fal_client
|
||
try:
|
||
from tools.lazy_deps import ensure as _lazy_ensure
|
||
_lazy_ensure("image.fal", prompt=False)
|
||
except ImportError:
|
||
pass
|
||
except Exception as e:
|
||
raise ImportError(str(e))
|
||
import fal_client as _fal_client # noqa: F811 — module-global rebind
|
||
fal_client = _fal_client
|
||
return fal_client
|
||
|
||
|
||
from tools.debug_helpers import DebugSession
|
||
from tools.managed_tool_gateway import resolve_managed_tool_gateway
|
||
from tools.tool_backend_helpers import (
|
||
fal_key_is_configured,
|
||
managed_nous_tools_enabled,
|
||
prefers_gateway,
|
||
)
|
||
|
||
logger = logging.getLogger(__name__)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# FAL model catalog
|
||
# ---------------------------------------------------------------------------
|
||
#
|
||
# Each entry declares how to translate our unified inputs into the model's
|
||
# native payload shape. Size specification falls into three families:
|
||
#
|
||
# "image_size_preset" — preset enum ("square_hd", "landscape_16_9", ...)
|
||
# used by the flux family, z-image, qwen, recraft,
|
||
# ideogram.
|
||
# "aspect_ratio" — aspect ratio enum ("16:9", "1:1", ...) used by
|
||
# nano-banana (Gemini).
|
||
# "gpt_literal" — literal dimension strings ("1024x1024", etc.)
|
||
# used by gpt-image-1.5.
|
||
#
|
||
# ``supports`` is a whitelist of keys allowed in the outgoing payload — any
|
||
# key outside this set is stripped before submission so models never receive
|
||
# rejected parameters (each FAL model rejects unknown keys differently).
|
||
#
|
||
# ``upscale`` controls whether to chain Clarity Upscaler after generation.
|
||
|
||
FAL_MODELS: Dict[str, Dict[str, Any]] = {
|
||
"fal-ai/flux-2/klein/9b": {
|
||
"display": "FLUX 2 Klein 9B",
|
||
"speed": "<1s",
|
||
"strengths": "Fast, crisp text",
|
||
"price": "$0.006/MP",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
"num_inference_steps": 4,
|
||
"output_format": "png",
|
||
"enable_safety_checker": False,
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "num_inference_steps", "seed",
|
||
"output_format", "enable_safety_checker",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/flux-2-pro": {
|
||
"display": "FLUX 2 Pro",
|
||
"speed": "~6s",
|
||
"strengths": "Studio photorealism",
|
||
"price": "$0.03/MP",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
"num_inference_steps": 50,
|
||
"guidance_scale": 4.5,
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
"enable_safety_checker": False,
|
||
"safety_tolerance": "5",
|
||
"sync_mode": True,
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "num_inference_steps", "guidance_scale",
|
||
"num_images", "output_format", "enable_safety_checker",
|
||
"safety_tolerance", "sync_mode", "seed",
|
||
},
|
||
"upscale": True, # Backward-compat: current default behavior.
|
||
},
|
||
"fal-ai/z-image/turbo": {
|
||
"display": "Z-Image Turbo",
|
||
"speed": "~2s",
|
||
"strengths": "Bilingual EN/CN, 6B",
|
||
"price": "$0.005/MP",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
"num_inference_steps": 8,
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
"enable_safety_checker": False,
|
||
"enable_prompt_expansion": False, # avoid the extra per-request charge
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "num_inference_steps", "num_images",
|
||
"seed", "output_format", "enable_safety_checker",
|
||
"enable_prompt_expansion",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/nano-banana-pro": {
|
||
"display": "Nano Banana Pro (Gemini 3 Pro Image)",
|
||
"speed": "~8s",
|
||
"strengths": "Gemini 3 Pro, reasoning depth, text rendering",
|
||
"price": "$0.15/image (1K)",
|
||
"size_style": "aspect_ratio",
|
||
"sizes": {
|
||
"landscape": "16:9",
|
||
"square": "1:1",
|
||
"portrait": "9:16",
|
||
},
|
||
"defaults": {
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
"safety_tolerance": "5",
|
||
# "1K" is the cheapest tier; 4K doubles the per-image cost.
|
||
# Users on Nous Subscription should stay at 1K for predictable billing.
|
||
"resolution": "1K",
|
||
},
|
||
"supports": {
|
||
"prompt", "aspect_ratio", "num_images", "output_format",
|
||
"safety_tolerance", "seed", "sync_mode", "resolution",
|
||
"enable_web_search", "limit_generations",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/gpt-image-1.5": {
|
||
"display": "GPT Image 1.5",
|
||
"speed": "~15s",
|
||
"strengths": "Prompt adherence",
|
||
"price": "$0.034/image",
|
||
"size_style": "gpt_literal",
|
||
"sizes": {
|
||
"landscape": "1536x1024",
|
||
"square": "1024x1024",
|
||
"portrait": "1024x1536",
|
||
},
|
||
"defaults": {
|
||
# Quality is pinned to medium to keep portal billing predictable
|
||
# across all users (low is too rough, high is 4-6x more expensive).
|
||
"quality": "medium",
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "quality", "num_images", "output_format",
|
||
"background", "sync_mode",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/gpt-image-2": {
|
||
"display": "GPT Image 2",
|
||
"speed": "~20s",
|
||
"strengths": "SOTA text rendering + CJK, world-aware photorealism",
|
||
"price": "$0.04–0.06/image",
|
||
# GPT Image 2 uses FAL's standard preset enum (unlike 1.5's literal
|
||
# dimensions). We map to the 4:3 variants — the 16:9 presets
|
||
# (1024x576) fall below GPT-Image-2's 655,360 min-pixel requirement
|
||
# and would be rejected. 4:3 keeps us above the minimum on all
|
||
# three aspect ratios.
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_4_3", # 1024x768
|
||
"square": "square_hd", # 1024x1024
|
||
"portrait": "portrait_4_3", # 768x1024
|
||
},
|
||
"defaults": {
|
||
# Same quality pinning as gpt-image-1.5: medium keeps Nous
|
||
# Portal billing predictable. "high" is 3-4x the per-image
|
||
# cost at the same size; "low" is too rough for production use.
|
||
"quality": "medium",
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "quality", "num_images", "output_format",
|
||
"sync_mode",
|
||
# openai_api_key (BYOK) intentionally omitted — all users go
|
||
# through the shared FAL billing path.
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/ideogram/v3": {
|
||
"display": "Ideogram V3",
|
||
"speed": "~5s",
|
||
"strengths": "Best typography",
|
||
"price": "$0.03-0.09/image",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
"rendering_speed": "BALANCED",
|
||
"expand_prompt": True,
|
||
"style": "AUTO",
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "rendering_speed", "expand_prompt",
|
||
"style", "seed",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/recraft/v4/pro/text-to-image": {
|
||
"display": "Recraft V4 Pro",
|
||
"speed": "~8s",
|
||
"strengths": "Design, brand systems, production-ready",
|
||
"price": "$0.25/image",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
# V4 Pro dropped V3's required `style` enum — defaults handle taste now.
|
||
"enable_safety_checker": False,
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "enable_safety_checker",
|
||
"colors", "background_color",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
"fal-ai/qwen-image": {
|
||
"display": "Qwen Image",
|
||
"speed": "~12s",
|
||
"strengths": "LLM-based, complex text",
|
||
"price": "$0.02/MP",
|
||
"size_style": "image_size_preset",
|
||
"sizes": {
|
||
"landscape": "landscape_16_9",
|
||
"square": "square_hd",
|
||
"portrait": "portrait_16_9",
|
||
},
|
||
"defaults": {
|
||
"num_inference_steps": 30,
|
||
"guidance_scale": 2.5,
|
||
"num_images": 1,
|
||
"output_format": "png",
|
||
"acceleration": "regular",
|
||
},
|
||
"supports": {
|
||
"prompt", "image_size", "num_inference_steps", "guidance_scale",
|
||
"num_images", "output_format", "acceleration", "seed", "sync_mode",
|
||
},
|
||
"upscale": False,
|
||
},
|
||
}
|
||
|
||
# Default model is the fastest reasonable option. Kept cheap and sub-1s.
|
||
DEFAULT_MODEL = "fal-ai/flux-2/klein/9b"
|
||
|
||
DEFAULT_ASPECT_RATIO = "landscape"
|
||
VALID_ASPECT_RATIOS = ("landscape", "square", "portrait")
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Upscaler (Clarity Upscaler — unchanged from previous implementation)
|
||
# ---------------------------------------------------------------------------
|
||
UPSCALER_MODEL = "fal-ai/clarity-upscaler"
|
||
UPSCALER_FACTOR = 2
|
||
UPSCALER_SAFETY_CHECKER = False
|
||
UPSCALER_DEFAULT_PROMPT = "masterpiece, best quality, highres"
|
||
UPSCALER_NEGATIVE_PROMPT = "(worst quality, low quality, normal quality:2)"
|
||
UPSCALER_CREATIVITY = 0.35
|
||
UPSCALER_RESEMBLANCE = 0.6
|
||
UPSCALER_GUIDANCE_SCALE = 4
|
||
UPSCALER_NUM_INFERENCE_STEPS = 18
|
||
|
||
|
||
_debug = DebugSession("image_tools", env_var="IMAGE_TOOLS_DEBUG")
|
||
_managed_fal_client = None
|
||
_managed_fal_client_config = None
|
||
_managed_fal_client_lock = threading.Lock()
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Managed FAL gateway (Nous Subscription)
|
||
# ---------------------------------------------------------------------------
|
||
def _resolve_managed_fal_gateway():
|
||
"""Return managed fal-queue gateway config when the user prefers the gateway
|
||
or direct FAL credentials are absent."""
|
||
if fal_key_is_configured() and not prefers_gateway("image_gen"):
|
||
return None
|
||
return resolve_managed_tool_gateway("fal-queue")
|
||
|
||
|
||
def _normalize_fal_queue_url_format(queue_run_origin: str) -> str:
|
||
normalized_origin = str(queue_run_origin or "").strip().rstrip("/")
|
||
if not normalized_origin:
|
||
raise ValueError("Managed FAL queue origin is required")
|
||
return f"{normalized_origin}/"
|
||
|
||
|
||
class _ManagedFalSyncClient:
|
||
"""Small per-instance wrapper around fal_client.SyncClient for managed queue hosts."""
|
||
|
||
def __init__(self, *, key: str, queue_run_origin: str):
|
||
# Trigger the lazy import on first construction. Idempotent — the
|
||
# placeholder is overwritten with the real module on first call.
|
||
_load_fal_client()
|
||
sync_client_class = getattr(fal_client, "SyncClient", None)
|
||
if sync_client_class is None:
|
||
raise RuntimeError("fal_client.SyncClient is required for managed FAL gateway mode")
|
||
|
||
client_module = getattr(fal_client, "client", None)
|
||
if client_module is None:
|
||
raise RuntimeError("fal_client.client is required for managed FAL gateway mode")
|
||
|
||
self._queue_url_format = _normalize_fal_queue_url_format(queue_run_origin)
|
||
self._sync_client = sync_client_class(key=key)
|
||
self._http_client = getattr(self._sync_client, "_client", None)
|
||
self._maybe_retry_request = getattr(client_module, "_maybe_retry_request", None)
|
||
self._raise_for_status = getattr(client_module, "_raise_for_status", None)
|
||
self._request_handle_class = getattr(client_module, "SyncRequestHandle", None)
|
||
self._add_hint_header = getattr(client_module, "add_hint_header", None)
|
||
self._add_priority_header = getattr(client_module, "add_priority_header", None)
|
||
self._add_timeout_header = getattr(client_module, "add_timeout_header", None)
|
||
|
||
if self._http_client is None:
|
||
raise RuntimeError("fal_client.SyncClient._client is required for managed FAL gateway mode")
|
||
if self._maybe_retry_request is None or self._raise_for_status is None:
|
||
raise RuntimeError("fal_client.client request helpers are required for managed FAL gateway mode")
|
||
if self._request_handle_class is None:
|
||
raise RuntimeError("fal_client.client.SyncRequestHandle is required for managed FAL gateway mode")
|
||
|
||
def submit(
|
||
self,
|
||
application: str,
|
||
arguments: Dict[str, Any],
|
||
*,
|
||
path: str = "",
|
||
hint: Optional[str] = None,
|
||
webhook_url: Optional[str] = None,
|
||
priority: Any = None,
|
||
headers: Optional[Dict[str, str]] = None,
|
||
start_timeout: Optional[Union[int, float]] = None,
|
||
):
|
||
url = self._queue_url_format + application
|
||
if path:
|
||
url += "/" + path.lstrip("/")
|
||
if webhook_url is not None:
|
||
url += "?" + urlencode({"fal_webhook": webhook_url})
|
||
|
||
request_headers = dict(headers or {})
|
||
if hint is not None and self._add_hint_header is not None:
|
||
self._add_hint_header(hint, request_headers)
|
||
if priority is not None:
|
||
if self._add_priority_header is None:
|
||
raise RuntimeError("fal_client.client.add_priority_header is required for priority requests")
|
||
self._add_priority_header(priority, request_headers)
|
||
if start_timeout is not None:
|
||
if self._add_timeout_header is None:
|
||
raise RuntimeError("fal_client.client.add_timeout_header is required for timeout requests")
|
||
self._add_timeout_header(start_timeout, request_headers)
|
||
|
||
response = self._maybe_retry_request(
|
||
self._http_client,
|
||
"POST",
|
||
url,
|
||
json=arguments,
|
||
timeout=getattr(self._sync_client, "default_timeout", 120.0),
|
||
headers=request_headers,
|
||
)
|
||
self._raise_for_status(response)
|
||
|
||
data = response.json()
|
||
return self._request_handle_class(
|
||
request_id=data["request_id"],
|
||
response_url=data["response_url"],
|
||
status_url=data["status_url"],
|
||
cancel_url=data["cancel_url"],
|
||
client=self._http_client,
|
||
)
|
||
|
||
|
||
def _get_managed_fal_client(managed_gateway):
|
||
"""Reuse the managed FAL client so its internal httpx.Client is not leaked per call."""
|
||
global _managed_fal_client, _managed_fal_client_config
|
||
|
||
client_config = (
|
||
managed_gateway.gateway_origin.rstrip("/"),
|
||
managed_gateway.nous_user_token,
|
||
)
|
||
with _managed_fal_client_lock:
|
||
if _managed_fal_client is not None and _managed_fal_client_config == client_config:
|
||
return _managed_fal_client
|
||
|
||
_managed_fal_client = _ManagedFalSyncClient(
|
||
key=managed_gateway.nous_user_token,
|
||
queue_run_origin=managed_gateway.gateway_origin,
|
||
)
|
||
_managed_fal_client_config = client_config
|
||
return _managed_fal_client
|
||
|
||
|
||
def _submit_fal_request(model: str, arguments: Dict[str, Any]):
|
||
"""Submit a FAL request using direct credentials or the managed queue gateway."""
|
||
# Trigger the lazy import on first call. Idempotent.
|
||
_load_fal_client()
|
||
request_headers = {"x-idempotency-key": str(uuid.uuid4())}
|
||
managed_gateway = _resolve_managed_fal_gateway()
|
||
if managed_gateway is None:
|
||
return fal_client.submit(model, arguments=arguments, headers=request_headers)
|
||
|
||
managed_client = _get_managed_fal_client(managed_gateway)
|
||
try:
|
||
return managed_client.submit(
|
||
model,
|
||
arguments=arguments,
|
||
headers=request_headers,
|
||
)
|
||
except Exception as exc:
|
||
# 4xx from the managed gateway typically means the portal doesn't
|
||
# currently proxy this model (allowlist miss, billing gate, etc.)
|
||
# — surface a clearer message with actionable remediation instead
|
||
# of a raw HTTP error from httpx.
|
||
status = _extract_http_status(exc)
|
||
if status is not None and 400 <= status < 500:
|
||
raise ValueError(
|
||
f"Nous Subscription gateway rejected model '{model}' "
|
||
f"(HTTP {status}). This model may not yet be enabled on "
|
||
f"the Nous Portal's FAL proxy. Either:\n"
|
||
f" • Set FAL_KEY in your environment to use FAL.ai directly, or\n"
|
||
f" • Pick a different model via `hermes tools` → Image Generation."
|
||
) from exc
|
||
raise
|
||
|
||
|
||
def _extract_http_status(exc: BaseException) -> Optional[int]:
|
||
"""Return an HTTP status code from httpx/fal exceptions, else None.
|
||
|
||
Defensive across exception shapes — httpx.HTTPStatusError exposes
|
||
``.response.status_code`` while fal_client wrappers may expose
|
||
``.status_code`` directly.
|
||
"""
|
||
response = getattr(exc, "response", None)
|
||
if response is not None:
|
||
status = getattr(response, "status_code", None)
|
||
if isinstance(status, int):
|
||
return status
|
||
status = getattr(exc, "status_code", None)
|
||
if isinstance(status, int):
|
||
return status
|
||
return None
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Model resolution + payload construction
|
||
# ---------------------------------------------------------------------------
|
||
def _resolve_fal_model() -> tuple:
|
||
"""Resolve the active FAL model from config.yaml (primary) or default.
|
||
|
||
Returns (model_id, metadata_dict). Falls back to DEFAULT_MODEL if the
|
||
configured model is unknown (logged as a warning).
|
||
"""
|
||
model_id = ""
|
||
try:
|
||
from hermes_cli.config import load_config
|
||
cfg = load_config()
|
||
img_cfg = cfg.get("image_gen") if isinstance(cfg, dict) else None
|
||
if isinstance(img_cfg, dict):
|
||
raw = img_cfg.get("model")
|
||
if isinstance(raw, str):
|
||
model_id = raw.strip()
|
||
except Exception as exc:
|
||
logger.debug("Could not load image_gen.model from config: %s", exc)
|
||
|
||
# Env var escape hatch (undocumented; backward-compat for tests/scripts).
|
||
if not model_id:
|
||
model_id = os.getenv("FAL_IMAGE_MODEL", "").strip()
|
||
|
||
if not model_id:
|
||
return DEFAULT_MODEL, FAL_MODELS[DEFAULT_MODEL]
|
||
|
||
if model_id not in FAL_MODELS:
|
||
logger.warning(
|
||
"Unknown FAL model '%s' in config; falling back to %s",
|
||
model_id, DEFAULT_MODEL,
|
||
)
|
||
return DEFAULT_MODEL, FAL_MODELS[DEFAULT_MODEL]
|
||
|
||
return model_id, FAL_MODELS[model_id]
|
||
|
||
|
||
def _build_fal_payload(
|
||
model_id: str,
|
||
prompt: str,
|
||
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
|
||
seed: Optional[int] = None,
|
||
overrides: Optional[Dict[str, Any]] = None,
|
||
) -> Dict[str, Any]:
|
||
"""Build a FAL request payload for `model_id` from unified inputs.
|
||
|
||
Translates aspect_ratio into the model's native size spec (preset enum,
|
||
aspect-ratio enum, or GPT literal string), merges model defaults, applies
|
||
caller overrides, then filters to the model's ``supports`` whitelist.
|
||
"""
|
||
meta = FAL_MODELS[model_id]
|
||
size_style = meta["size_style"]
|
||
sizes = meta["sizes"]
|
||
|
||
aspect = (aspect_ratio or DEFAULT_ASPECT_RATIO).lower().strip()
|
||
if aspect not in sizes:
|
||
aspect = DEFAULT_ASPECT_RATIO
|
||
|
||
payload: Dict[str, Any] = dict(meta.get("defaults", {}))
|
||
payload["prompt"] = (prompt or "").strip()
|
||
|
||
if size_style in {"image_size_preset", "gpt_literal"}:
|
||
payload["image_size"] = sizes[aspect]
|
||
elif size_style == "aspect_ratio":
|
||
payload["aspect_ratio"] = sizes[aspect]
|
||
else:
|
||
raise ValueError(f"Unknown size_style: {size_style!r}")
|
||
|
||
if seed is not None and isinstance(seed, int):
|
||
payload["seed"] = seed
|
||
|
||
if overrides:
|
||
for k, v in overrides.items():
|
||
if v is not None:
|
||
payload[k] = v
|
||
|
||
supports = meta["supports"]
|
||
return {k: v for k, v in payload.items() if k in supports}
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Upscaler
|
||
# ---------------------------------------------------------------------------
|
||
def _upscale_image(image_url: str, original_prompt: str) -> Optional[Dict[str, Any]]:
|
||
"""Upscale an image using FAL.ai's Clarity Upscaler.
|
||
|
||
Returns upscaled image dict, or None on failure (caller falls back to
|
||
the original image).
|
||
"""
|
||
try:
|
||
logger.info("Upscaling image with Clarity Upscaler...")
|
||
|
||
upscaler_arguments = {
|
||
"image_url": image_url,
|
||
"prompt": f"{UPSCALER_DEFAULT_PROMPT}, {original_prompt}",
|
||
"upscale_factor": UPSCALER_FACTOR,
|
||
"negative_prompt": UPSCALER_NEGATIVE_PROMPT,
|
||
"creativity": UPSCALER_CREATIVITY,
|
||
"resemblance": UPSCALER_RESEMBLANCE,
|
||
"guidance_scale": UPSCALER_GUIDANCE_SCALE,
|
||
"num_inference_steps": UPSCALER_NUM_INFERENCE_STEPS,
|
||
"enable_safety_checker": UPSCALER_SAFETY_CHECKER,
|
||
}
|
||
|
||
handler = _submit_fal_request(UPSCALER_MODEL, arguments=upscaler_arguments)
|
||
result = handler.get()
|
||
|
||
if result and "image" in result:
|
||
upscaled_image = result["image"]
|
||
logger.info(
|
||
"Image upscaled successfully to %sx%s",
|
||
upscaled_image.get("width", "unknown"),
|
||
upscaled_image.get("height", "unknown"),
|
||
)
|
||
return {
|
||
"url": upscaled_image["url"],
|
||
"width": upscaled_image.get("width", 0),
|
||
"height": upscaled_image.get("height", 0),
|
||
"upscaled": True,
|
||
"upscale_factor": UPSCALER_FACTOR,
|
||
}
|
||
logger.error("Upscaler returned invalid response")
|
||
return None
|
||
|
||
except Exception as e:
|
||
logger.error("Error upscaling image: %s", e, exc_info=True)
|
||
return None
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Tool entry point
|
||
# ---------------------------------------------------------------------------
|
||
def image_generate_tool(
|
||
prompt: str,
|
||
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
|
||
num_inference_steps: Optional[int] = None,
|
||
guidance_scale: Optional[float] = None,
|
||
num_images: Optional[int] = None,
|
||
output_format: Optional[str] = None,
|
||
seed: Optional[int] = None,
|
||
) -> str:
|
||
"""Generate an image from a text prompt using the configured FAL model.
|
||
|
||
The agent-facing schema exposes only ``prompt`` and ``aspect_ratio``; the
|
||
remaining kwargs are overrides for direct Python callers and are filtered
|
||
per-model via the ``supports`` whitelist (unsupported overrides are
|
||
silently dropped so legacy callers don't break when switching models).
|
||
|
||
Returns a JSON string with ``{"success": bool, "image": url | None,
|
||
"error": str, "error_type": str}``.
|
||
"""
|
||
model_id, meta = _resolve_fal_model()
|
||
|
||
debug_call_data = {
|
||
"model": model_id,
|
||
"parameters": {
|
||
"prompt": prompt,
|
||
"aspect_ratio": aspect_ratio,
|
||
"num_inference_steps": num_inference_steps,
|
||
"guidance_scale": guidance_scale,
|
||
"num_images": num_images,
|
||
"output_format": output_format,
|
||
"seed": seed,
|
||
},
|
||
"error": None,
|
||
"success": False,
|
||
"images_generated": 0,
|
||
"generation_time": 0,
|
||
}
|
||
|
||
start_time = datetime.datetime.now()
|
||
|
||
try:
|
||
if not prompt or not isinstance(prompt, str) or len(prompt.strip()) == 0:
|
||
raise ValueError("Prompt is required and must be a non-empty string")
|
||
|
||
if not (fal_key_is_configured() or _resolve_managed_fal_gateway()):
|
||
message = "FAL_KEY environment variable not set"
|
||
if managed_nous_tools_enabled():
|
||
message += " and managed FAL gateway is unavailable"
|
||
raise ValueError(message)
|
||
|
||
aspect_lc = (aspect_ratio or DEFAULT_ASPECT_RATIO).lower().strip()
|
||
if aspect_lc not in VALID_ASPECT_RATIOS:
|
||
logger.warning(
|
||
"Invalid aspect_ratio '%s', defaulting to '%s'",
|
||
aspect_ratio, DEFAULT_ASPECT_RATIO,
|
||
)
|
||
aspect_lc = DEFAULT_ASPECT_RATIO
|
||
|
||
overrides: Dict[str, Any] = {}
|
||
if num_inference_steps is not None:
|
||
overrides["num_inference_steps"] = num_inference_steps
|
||
if guidance_scale is not None:
|
||
overrides["guidance_scale"] = guidance_scale
|
||
if num_images is not None:
|
||
overrides["num_images"] = num_images
|
||
if output_format is not None:
|
||
overrides["output_format"] = output_format
|
||
|
||
arguments = _build_fal_payload(
|
||
model_id, prompt, aspect_lc, seed=seed, overrides=overrides,
|
||
)
|
||
|
||
logger.info(
|
||
"Generating image with %s (%s) — prompt: %s",
|
||
meta.get("display", model_id), model_id, prompt[:80],
|
||
)
|
||
|
||
handler = _submit_fal_request(model_id, arguments=arguments)
|
||
result = handler.get()
|
||
|
||
generation_time = (datetime.datetime.now() - start_time).total_seconds()
|
||
|
||
if not result or "images" not in result:
|
||
raise ValueError("Invalid response from FAL.ai API — no images returned")
|
||
|
||
images = result.get("images", [])
|
||
if not images:
|
||
raise ValueError("No images were generated")
|
||
|
||
should_upscale = bool(meta.get("upscale", False))
|
||
|
||
formatted_images = []
|
||
for img in images:
|
||
if not (isinstance(img, dict) and "url" in img):
|
||
continue
|
||
original_image = {
|
||
"url": img["url"],
|
||
"width": img.get("width", 0),
|
||
"height": img.get("height", 0),
|
||
}
|
||
|
||
if should_upscale:
|
||
upscaled_image = _upscale_image(img["url"], prompt.strip())
|
||
if upscaled_image:
|
||
formatted_images.append(upscaled_image)
|
||
continue
|
||
logger.warning("Using original image as fallback (upscale failed)")
|
||
|
||
original_image["upscaled"] = False
|
||
formatted_images.append(original_image)
|
||
|
||
if not formatted_images:
|
||
raise ValueError("No valid image URLs returned from API")
|
||
|
||
upscaled_count = sum(1 for img in formatted_images if img.get("upscaled"))
|
||
logger.info(
|
||
"Generated %s image(s) in %.1fs (%s upscaled) via %s",
|
||
len(formatted_images), generation_time, upscaled_count, model_id,
|
||
)
|
||
|
||
response_data = {
|
||
"success": True,
|
||
"image": formatted_images[0]["url"] if formatted_images else None,
|
||
}
|
||
|
||
debug_call_data["success"] = True
|
||
debug_call_data["images_generated"] = len(formatted_images)
|
||
debug_call_data["generation_time"] = generation_time
|
||
_debug.log_call("image_generate_tool", debug_call_data)
|
||
_debug.save()
|
||
|
||
return json.dumps(response_data, indent=2, ensure_ascii=False)
|
||
|
||
except Exception as e:
|
||
generation_time = (datetime.datetime.now() - start_time).total_seconds()
|
||
error_msg = f"Error generating image: {str(e)}"
|
||
logger.error("%s", error_msg, exc_info=True)
|
||
|
||
response_data = {
|
||
"success": False,
|
||
"image": None,
|
||
"error": str(e),
|
||
"error_type": type(e).__name__,
|
||
}
|
||
|
||
debug_call_data["error"] = error_msg
|
||
debug_call_data["generation_time"] = generation_time
|
||
_debug.log_call("image_generate_tool", debug_call_data)
|
||
_debug.save()
|
||
|
||
return json.dumps(response_data, indent=2, ensure_ascii=False)
|
||
|
||
|
||
def check_fal_api_key() -> bool:
|
||
"""True if the FAL.ai API key (direct or managed gateway) is available."""
|
||
return bool(fal_key_is_configured() or _resolve_managed_fal_gateway())
|
||
|
||
|
||
def check_image_generation_requirements() -> bool:
|
||
"""True if any image gen backend is available.
|
||
|
||
Providers are considered in this order:
|
||
|
||
1. The in-tree FAL backend (FAL_KEY or managed gateway).
|
||
2. Any plugin-registered provider whose ``is_available()`` returns True.
|
||
|
||
Plugins win only when the in-tree FAL path is NOT ready, which matches
|
||
the historical behavior: shipping hermes with a FAL key configured
|
||
should still expose the tool. The active selection among ready
|
||
providers is resolved per-call by ``image_gen.provider``.
|
||
"""
|
||
try:
|
||
if check_fal_api_key():
|
||
# Trigger the lazy fal_client import here as the SDK presence
|
||
# check. Raises ImportError if the optional ``fal-client``
|
||
# package isn't installed; the caller's except ImportError
|
||
# below catches that and continues to plugin probing.
|
||
_load_fal_client()
|
||
return True
|
||
except ImportError:
|
||
pass
|
||
|
||
# Probe plugin providers. Discovery is idempotent and cheap.
|
||
try:
|
||
from agent.image_gen_registry import list_providers
|
||
from hermes_cli.plugins import _ensure_plugins_discovered
|
||
|
||
_ensure_plugins_discovered()
|
||
for provider in list_providers():
|
||
try:
|
||
if provider.is_available():
|
||
return True
|
||
except Exception:
|
||
continue
|
||
except Exception:
|
||
pass
|
||
|
||
return False
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Demo / CLI entry point
|
||
# ---------------------------------------------------------------------------
|
||
if __name__ == "__main__":
|
||
print("🎨 Image Generation Tools — FAL.ai multi-model support")
|
||
print("=" * 60)
|
||
|
||
if not check_fal_api_key():
|
||
print("❌ FAL_KEY environment variable not set")
|
||
print(" Set it via: export FAL_KEY='your-key-here'")
|
||
print(" Get a key: https://fal.ai/")
|
||
raise SystemExit(1)
|
||
print("✅ FAL.ai API key found")
|
||
|
||
try:
|
||
import fal_client # noqa: F401
|
||
print("✅ fal_client library available")
|
||
except ImportError:
|
||
print("❌ fal_client library not found — pip install fal-client")
|
||
raise SystemExit(1)
|
||
|
||
model_id, meta = _resolve_fal_model()
|
||
print(f"🤖 Active model: {meta.get('display', model_id)} ({model_id})")
|
||
print(f" Speed: {meta.get('speed', '?')} · Price: {meta.get('price', '?')}")
|
||
print(f" Upscaler: {'on' if meta.get('upscale') else 'off'}")
|
||
|
||
print("\nAvailable models:")
|
||
for mid, m in FAL_MODELS.items():
|
||
marker = " ← active" if mid == model_id else ""
|
||
print(f" {mid:<32} {m.get('speed', '?'):<6} {m.get('price', '?')}{marker}")
|
||
|
||
if _debug.active:
|
||
print(f"\n🐛 Debug mode enabled — session {_debug.session_id}")
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Registry
|
||
# ---------------------------------------------------------------------------
|
||
from tools.registry import registry, tool_error
|
||
|
||
IMAGE_GENERATE_SCHEMA = {
|
||
"name": "image_generate",
|
||
"description": (
|
||
"Generate high-quality images from text prompts. The underlying "
|
||
"backend (FAL, OpenAI, etc.) and model are user-configured and not "
|
||
"selectable by the agent. Returns either a URL or an absolute file "
|
||
"path in the `image` field; display it with markdown "
|
||
" and the gateway will deliver it."
|
||
),
|
||
"parameters": {
|
||
"type": "object",
|
||
"properties": {
|
||
"prompt": {
|
||
"type": "string",
|
||
"description": "The text prompt describing the desired image. Be detailed and descriptive.",
|
||
},
|
||
"aspect_ratio": {
|
||
"type": "string",
|
||
"enum": list(VALID_ASPECT_RATIOS),
|
||
"description": "The aspect ratio of the generated image. 'landscape' is 16:9 wide, 'portrait' is 16:9 tall, 'square' is 1:1.",
|
||
"default": DEFAULT_ASPECT_RATIO,
|
||
},
|
||
},
|
||
"required": ["prompt"],
|
||
},
|
||
}
|
||
|
||
|
||
def _read_configured_image_model():
|
||
"""Return the value of ``image_gen.model`` from config.yaml, or None."""
|
||
try:
|
||
from hermes_cli.config import load_config
|
||
cfg = load_config()
|
||
section = cfg.get("image_gen") if isinstance(cfg, dict) else None
|
||
if isinstance(section, dict):
|
||
value = section.get("model")
|
||
if isinstance(value, str) and value.strip():
|
||
return value.strip()
|
||
except Exception as exc:
|
||
logger.debug("Could not read image_gen.model: %s", exc)
|
||
return None
|
||
|
||
|
||
def _read_configured_image_provider():
|
||
"""Return the value of ``image_gen.provider`` from config.yaml, or None.
|
||
|
||
We only consult the plugin registry when this is explicitly set — an
|
||
unset value keeps users on the legacy in-tree FAL path even when other
|
||
providers happen to be registered (e.g. a user has OPENAI_API_KEY set
|
||
for other features but never asked for OpenAI image gen).
|
||
"""
|
||
try:
|
||
from hermes_cli.config import load_config
|
||
cfg = load_config()
|
||
section = cfg.get("image_gen") if isinstance(cfg, dict) else None
|
||
if isinstance(section, dict):
|
||
value = section.get("provider")
|
||
if isinstance(value, str) and value.strip():
|
||
return value.strip()
|
||
except Exception as exc:
|
||
logger.debug("Could not read image_gen.provider: %s", exc)
|
||
return None
|
||
|
||
|
||
def _dispatch_to_plugin_provider(prompt: str, aspect_ratio: str):
|
||
"""Route the call to a plugin-registered provider when one is selected.
|
||
|
||
Returns a JSON string on dispatch, or ``None`` to fall through to the
|
||
built-in FAL path.
|
||
|
||
Dispatch only fires when ``image_gen.provider`` is explicitly set AND
|
||
it does not point to ``fal`` (FAL still lives in-tree in this PR;
|
||
a later PR ports it into ``plugins/image_gen/fal/``). Any other value
|
||
that matches a registered plugin provider wins.
|
||
"""
|
||
configured = _read_configured_image_provider()
|
||
if not configured or configured == "fal":
|
||
return None
|
||
|
||
# Also read configured model so we can pass it to the plugin
|
||
configured_model = _read_configured_image_model()
|
||
|
||
try:
|
||
# Import locally so plugin discovery isn't triggered just by
|
||
# importing this module (tests rely on that).
|
||
from agent.image_gen_registry import get_provider
|
||
from hermes_cli.plugins import _ensure_plugins_discovered
|
||
|
||
_ensure_plugins_discovered()
|
||
provider = get_provider(configured)
|
||
except Exception as exc:
|
||
logger.debug("image_gen plugin dispatch skipped: %s", exc)
|
||
return None
|
||
|
||
if provider is None:
|
||
try:
|
||
# Long-lived sessions may have discovered plugins before a bundled
|
||
# backend was patched in or before config changed. Retry once with
|
||
# a forced refresh before surfacing a missing-provider error.
|
||
_ensure_plugins_discovered(force=True)
|
||
provider = get_provider(configured)
|
||
except Exception as exc:
|
||
logger.debug("image_gen plugin force-refresh skipped: %s", exc)
|
||
|
||
if provider is None:
|
||
return json.dumps({
|
||
"success": False,
|
||
"image": None,
|
||
"error": (
|
||
f"image_gen.provider='{configured}' is set but no plugin "
|
||
f"registered that name. Run `hermes plugins list` to see "
|
||
f"available image gen backends."
|
||
),
|
||
"error_type": "provider_not_registered",
|
||
})
|
||
|
||
try:
|
||
kwargs = {"prompt": prompt, "aspect_ratio": aspect_ratio}
|
||
if configured_model:
|
||
kwargs["model"] = configured_model
|
||
result = provider.generate(**kwargs)
|
||
except Exception as exc:
|
||
logger.warning(
|
||
"Image gen provider '%s' raised: %s",
|
||
getattr(provider, "name", "?"), exc,
|
||
)
|
||
return json.dumps({
|
||
"success": False,
|
||
"image": None,
|
||
"error": f"Provider '{getattr(provider, 'name', '?')}' error: {exc}",
|
||
"error_type": "provider_exception",
|
||
})
|
||
if not isinstance(result, dict):
|
||
return json.dumps({
|
||
"success": False,
|
||
"image": None,
|
||
"error": "Provider returned a non-dict result",
|
||
"error_type": "provider_contract",
|
||
})
|
||
return json.dumps(result)
|
||
|
||
|
||
def _handle_image_generate(args, **kw):
|
||
prompt = args.get("prompt", "")
|
||
if not prompt:
|
||
return tool_error("prompt is required for image generation")
|
||
aspect_ratio = args.get("aspect_ratio", DEFAULT_ASPECT_RATIO)
|
||
|
||
# Route to a plugin-registered provider if one is active (and it's
|
||
# not the in-tree FAL path).
|
||
dispatched = _dispatch_to_plugin_provider(prompt, aspect_ratio)
|
||
if dispatched is not None:
|
||
return dispatched
|
||
|
||
return image_generate_tool(
|
||
prompt=prompt,
|
||
aspect_ratio=aspect_ratio,
|
||
)
|
||
|
||
|
||
registry.register(
|
||
name="image_generate",
|
||
toolset="image_gen",
|
||
schema=IMAGE_GENERATE_SCHEMA,
|
||
handler=_handle_image_generate,
|
||
check_fn=check_image_generation_requirements,
|
||
requires_env=[],
|
||
is_async=False, # sync fal_client API to avoid "Event loop is closed" in gateway
|
||
emoji="🎨",
|
||
)
|