* feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro
Upstream asked for these two upgrades ASAP — the old entries show
stale models when newer, higher-quality versions are available on FAL.
Recraft V3 → Recraft V4 Pro
ID: fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image
Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier)
Schema: V4 dropped the required `style` enum entirely; defaults
handle taste now. Added `colors` and `background_color`
to supports for brand-palette control. `seed` is not
supported by V4 per the API docs.
Nano Banana → Nano Banana Pro
ID: fal-ai/nano-banana → fal-ai/nano-banana-pro
Price: $0.08/image → $0.15/image (1K); $0.30 at 4K
Schema: Aspect ratio family unchanged. Added `resolution`
(1K/2K/4K, default 1K for billing predictability),
`enable_web_search` (real-time info grounding, +$0.015),
and `limit_generations` (force exactly 1 image).
Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality
and reasoning depth improved; slower (~6s → ~8s).
Migration: users who had the old IDs in `image_gen.model` will
fall through the existing 'unknown model → default' warning path
in `_resolve_fal_model()` and get the Klein 9B default on the next
run. Re-run `hermes tools` → Image Generation to pick the new
version. No silent cost-upgrade aliasing — the 2-6x price jump
on these tiers warrants explicit user re-selection.
Portal note: both new model IDs need to be allowlisted on the
Nous fal-queue-gateway alongside the previous 7 additions, or
users on Nous Subscription will see the 'managed gateway rejected
model' error we added previously (which is clear and
self-remediating, just noisy).
* docs: wrap '<1s' in backticks to unblock MDX compilation
Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and
'<1s' fails because '1' isn't a valid tag-name start character. This
was broken on main since PR #11265 (never noticed because
docs-site-checks was failing on OTHER issues at the time and we
admin-merged through it).
Wrapping in backticks also gives the cell monospace styling which
reads more cleanly alongside the inline-code model ID in the same row.
The other '<1s' occurrence (line 52) is inside a fenced code block
and is already safe — code fences bypass MDX parsing.
| title | description | sidebar_label | sidebar_position |
|---|---|---|---|
| Image Generation | Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, and more, selectable via `hermes tools`. | Image Generation | 6 |
Image Generation
Hermes Agent generates images from text prompts via FAL.ai. Eight models are supported out of the box, each with different speed, quality, and cost tradeoffs. The active model is user-configurable via `hermes tools` and persists in `config.yaml`.
Supported Models
| Model | Speed | Strengths | Price |
|---|---|---|---|
| `fal-ai/flux-2/klein/9b` (default) | `<1s` | Fast, crisp text | $0.006/MP |
| `fal-ai/flux-2-pro` | ~6s | Studio photorealism | $0.03/MP |
| `fal-ai/z-image/turbo` | ~2s | Bilingual EN/CN, 6B params | $0.005/MP |
| `fal-ai/nano-banana-pro` | ~8s | Gemini 3 Pro, reasoning depth, text rendering | $0.15/image (1K) |
| `fal-ai/gpt-image-1.5` | ~15s | Prompt adherence | $0.034/image |
| `fal-ai/ideogram/v3` | ~5s | Best typography | $0.03–0.09/image |
| `fal-ai/recraft/v4/pro/text-to-image` | ~8s | Design, brand systems, production-ready | $0.25/image |
| `fal-ai/qwen-image` | ~12s | LLM-based, complex text | $0.02/MP |
Prices are FAL's pricing at time of writing; check fal.ai for current numbers.
Setup
:::tip Nous Subscribers
If you have a paid Nous Portal subscription, you can use image generation through the Tool Gateway without a FAL API key. Your model selection persists across both paths.

If the managed gateway returns HTTP 4xx for a specific model, that model isn't yet proxied on the portal side. The agent will tell you so and give remediation steps (set `FAL_KEY` for direct access, or pick a different model).
:::
Get a FAL API Key
- Sign up at fal.ai
- Generate an API key from your dashboard
Configure and Pick a Model
Run the tools command:
hermes tools
Navigate to 🎨 Image Generation and pick your backend (Nous Subscription or FAL.ai). The picker then shows all supported models in a column-aligned table; use the arrow keys to navigate and Enter to select:
Model                     Speed  Strengths             Price
fal-ai/flux-2/klein/9b    <1s    Fast, crisp text      $0.006/MP   ← currently in use
fal-ai/flux-2-pro         ~6s    Studio photorealism   $0.03/MP
fal-ai/z-image/turbo      ~2s    Bilingual EN/CN, 6B   $0.005/MP
...
Your selection is saved to config.yaml:
image_gen:
  model: fal-ai/flux-2/klein/9b
  use_gateway: false  # true if using Nous Subscription
GPT-Image Quality
The `fal-ai/gpt-image-1.5` request quality is pinned to `medium` (~$0.034/image at 1024×1024). We don't expose the `low` / `high` tiers as a user-facing option so that Nous Portal billing stays predictable across all users: the cost spread between tiers is ~22×. If you want a cheaper GPT-Image option, pick a different model; if you want higher quality, use Klein 9B or Imagen-class models.
Usage
The agent-facing schema is intentionally minimal — the model picks up whatever you've configured:
Generate an image of a serene mountain landscape with cherry blossoms
Create a square portrait of a wise old owl — use the typography model
Make me a futuristic cityscape, landscape orientation
Aspect Ratios
Every model accepts the same three aspect ratios from the agent's perspective. Internally, each model's native size spec is filled in automatically:
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image) |
|---|---|---|---|
| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` |
| `square` | `square_hd` | `1:1` | `1024x1024` |
| `portrait` | `portrait_16_9` | `9:16` | `1024x1536` |
This translation happens in `_build_fal_payload()`, so agent code never has to know about per-model schema differences.
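The mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not the actual body of `_build_fal_payload()`: the function name `translate_aspect_ratio`, the three dictionaries, and the substring-based dispatch on the model ID are all assumptions made for the example.

```python
# Hypothetical lookup tables; the values are copied from the table above.
PRESET_SIZES = {
    "landscape": "landscape_16_9",
    "square": "square_hd",
    "portrait": "portrait_16_9",
}
RATIO_SIZES = {"landscape": "16:9", "square": "1:1", "portrait": "9:16"}
GPT_SIZES = {
    "landscape": "1536x1024",
    "square": "1024x1024",
    "portrait": "1024x1536",
}


def translate_aspect_ratio(model_id: str, aspect_ratio: str) -> dict:
    """Return the size parameter a model expects (illustrative only)."""
    if aspect_ratio not in PRESET_SIZES:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    # Assumption: dispatch on the model ID. The real code may instead
    # store the size style in per-model metadata.
    if "nano-banana" in model_id:
        return {"aspect_ratio": RATIO_SIZES[aspect_ratio]}
    if "gpt-image" in model_id:
        return {"image_size": GPT_SIZES[aspect_ratio]}
    return {"image_size": PRESET_SIZES[aspect_ratio]}
```

For example, `translate_aspect_ratio("fal-ai/nano-banana-pro", "portrait")` yields an `aspect_ratio` key, while the same input for a FLUX model yields an `image_size` preset.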
Automatic Upscaling
Upscaling via FAL's Clarity Upscaler is gated per-model:
| Model | Upscale? | Why |
|---|---|---|
| `fal-ai/flux-2-pro` | ✓ | Backward-compat (was the pre-picker default) |
| All others | ✗ | Fast models would lose their sub-second value prop; hi-res models don't need it |
When upscaling runs, it uses these settings:
| Setting | Value |
|---|---|
| Upscale factor | 2× |
| Creativity | 0.35 |
| Resemblance | 0.6 |
| Guidance scale | 4 |
| Inference steps | 18 |
If upscaling fails (network issue, rate limit), the original image is returned automatically.
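The gate-and-fallback behaviour described in this section might look roughly like the sketch below. The helper name `maybe_upscale`, the injected `upscale_fn` callable, and the key names in `UPSCALE_SETTINGS` are hypothetical; only the values and the "return the original on failure" rule come from the text above.

```python
from typing import Callable

# Values copied from the settings table; the key names are assumptions
# and may not match the parameter names the upscaler API uses.
UPSCALE_SETTINGS = {
    "upscale_factor": 2,
    "creativity": 0.35,
    "resemblance": 0.6,
    "guidance_scale": 4,
    "num_inference_steps": 18,
}


def maybe_upscale(
    image_url: str,
    model_meta: dict,
    upscale_fn: Callable[[str, dict], str],
) -> str:
    """Upscale only when the model opts in; fall back to the original URL."""
    if not model_meta.get("upscale", False):
        return image_url
    try:
        return upscale_fn(image_url, UPSCALE_SETTINGS)
    except Exception:
        # Network issue, rate limit, etc.: return the original image.
        return image_url
```

Passing the upscaler in as a callable keeps the sketch free of any real network client, so the fallback path can be exercised with a function that simply raises.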
How It Works Internally
- Model resolution: `_resolve_fal_model()` reads `image_gen.model` from `config.yaml`, falls back to the `FAL_IMAGE_MODEL` env var, then to `fal-ai/flux-2/klein/9b`.
- Payload building: `_build_fal_payload()` translates your `aspect_ratio` into the model's native format (preset enum, aspect-ratio enum, or GPT literal), merges the model's default params, applies any caller overrides, then filters to the model's `supports` whitelist so unsupported keys are never sent.
- Submission: `_submit_fal_request()` routes via direct FAL credentials or the managed Nous gateway.
- Upscaling: runs only if the model's metadata has `upscale: True`.
- Delivery: the final image URL is returned to the agent, which emits a `MEDIA:<url>` tag that platform adapters convert to native media.
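As a rough sketch of the model-resolution step, the fallback chain could be written as follows. The function name `resolve_model`, the `KNOWN_MODELS` set (a partial list for illustration), and the warning text are assumptions; the order of lookups and the default model ID are taken from the description above, and the "unknown model falls back to the default with a warning" rule comes from the migration note in the commit message.

```python
import logging
import os

logger = logging.getLogger(__name__)

DEFAULT_MODEL = "fal-ai/flux-2/klein/9b"

# Partial, illustrative subset of the supported model IDs.
KNOWN_MODELS = {
    "fal-ai/flux-2/klein/9b",
    "fal-ai/flux-2-pro",
    "fal-ai/nano-banana-pro",
    "fal-ai/recraft/v4/pro/text-to-image",
}


def resolve_model(config: dict, env=None) -> str:
    """Resolve the active model: config value, then env var, then default."""
    env = os.environ if env is None else env
    configured = (config.get("image_gen") or {}).get("model")
    candidate = configured or env.get("FAL_IMAGE_MODEL")
    if not candidate:
        return DEFAULT_MODEL
    if candidate not in KNOWN_MODELS:
        # Retired IDs such as fal-ai/recraft-v3 end up here.
        logger.warning("Unknown model %r, using %s", candidate, DEFAULT_MODEL)
        return DEFAULT_MODEL
    return candidate
```

With this shape, a config that still names a retired ID resolves to the default rather than being silently aliased to a more expensive successor.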
Debugging
Enable debug logging:
export IMAGE_TOOLS_DEBUG=true
Debug logs go to `./logs/image_tools_debug_<session_id>.json` with per-call details (model, parameters, timing, errors).
Platform Delivery
| Platform | Delivery |
|---|---|
| CLI | Image URL printed as markdown  — click to open |
| Telegram | Photo message with the prompt as caption |
| Discord | Embedded in a message |
| Slack | URL unfurled by Slack |
| Media message | |
| Others | URL in plain text |
Limitations
- Requires FAL credentials (direct `FAL_KEY` or Nous Subscription)
- Text-to-image only: no inpainting, img2img, or editing via this tool
- Temporary URLs: FAL returns hosted URLs that expire after hours/days; save locally if needed
- Per-model constraints: some models don't support `seed`, `num_inference_steps`, etc. The `supports` filter silently drops unsupported params; this is expected behavior
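The silent-drop behaviour in the last item can be illustrated with a one-line whitelist filter. The helper name `filter_supported` and the contents of `RECRAFT_SUPPORTS` are hypothetical; the commit message only states that `colors` and `background_color` are supported by Recraft V4 Pro and that `seed` is not.

```python
def filter_supported(payload: dict, supports: set) -> dict:
    """Keep only the keys the model declares as supported."""
    return {key: value for key, value in payload.items() if key in supports}


# Hypothetical whitelist for illustration; the real list may differ.
RECRAFT_SUPPORTS = {"prompt", "image_size", "colors", "background_color"}

payload = {"prompt": "a minimalist logo", "image_size": "square_hd", "seed": 42}
filtered = filter_supported(payload, RECRAFT_SUPPORTS)
# `seed` is dropped without an error because it is not in the whitelist.
```

Because the filter raises nothing, a caller who passes `seed` to a model that lacks it gets a valid request rather than an API error, at the cost of the parameter having no effect.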