mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Teknium 01906e99dd

feat(image_gen): multi-model FAL support with picker in hermes tools (#11265 )

* feat(image_gen): multi-model FAL support with picker in hermes tools

Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.

Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)

Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
  whitelist, and upscale flag. Three size families handled uniformly:
  image_size_preset (flux family), aspect_ratio (nano-banana), and
  gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
  into model-specific payloads, merges defaults, applies caller overrides,
  wires GPT quality_setting, then filters to the supports whitelist — so
  models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
  providers (Replicate, Stability, etc.); each provider entry tags itself
  with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
  prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.

Config:
  image_gen.model           = fal-ai/flux-2/klein/9b  (new)
  image_gen.quality_setting = medium                  (new, GPT only)
  image_gen.use_gateway     = bool                    (existing)

Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.

Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.

Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.

Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).

* feat(image_gen): translate managed-gateway 4xx to actionable error

When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
  1. The rejected model ID + HTTP status
  2. Two remediation paths: set FAL_KEY for direct access, or
     pick a different model via `hermes tools`

5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).

Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.

Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.

Docs: brief note in image-generation.md for Nous subscribers.

Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.

* feat(image_gen): pin GPT-Image quality to medium (no user choice)

Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:

1. Nous Portal billing complexity — the 22x cost spread between tiers
   ($0.009 low / $0.20 high) forces the gateway to meter per-tier per
   user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
   credit ~6x faster than `medium`.

This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:

- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
  config.yaml, no code path reads it — always sends medium.

Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
  (5 tests) — proves medium is baked in, config is ignored, flag is
  removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
  test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
  picker call fires when gpt-image is selected (no quality follow-up).

Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.

* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section

Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.

2026-04-16 20:19:53 -07:00

6.3 KiB

Raw Blame History

title	sidebar_label	sidebar_position
Features Overview	Overview	1

Features Overview

Hermes Agent includes a rich set of capabilities that extend far beyond basic chat. From persistent memory and file-aware context to browser automation and voice conversations, these features work together to make Hermes a powerful autonomous assistant.

Core

Tools & Toolsets — Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform, covering web search, terminal execution, file editing, memory, delegation, and more.
Skills System — On-demand knowledge documents the agent can load when needed. Skills follow a progressive disclosure pattern to minimize token usage and are compatible with the agentskills.io open standard.
Persistent Memory — Bounded, curated memory that persists across sessions. Hermes remembers your preferences, projects, environment, and things it has learned via MEMORY.md and USER.md.
Context Files — Hermes automatically discovers and loads project context files (.hermes.md, AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules) that shape how it behaves in your project.
Context References — Type @ followed by a reference to inject files, folders, git diffs, and URLs directly into your messages. Hermes expands the reference inline and appends the content automatically.
Checkpoints — Hermes automatically snapshots your working directory before making file changes, giving you a safety net to roll back with /rollback if something goes wrong.

Automation

Scheduled Tasks (Cron) — Schedule tasks to run automatically with natural language or cron expressions. Jobs can attach skills, deliver results to any platform, and support pause/resume/edit operations.
Subagent Delegation — The delegate_task tool spawns child agent instances with isolated context, restricted toolsets, and their own terminal sessions. Run up to 3 concurrent subagents for parallel workstreams.
Code Execution — The execute_code tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn via sandboxed RPC execution.
Event Hooks — Run custom code at key lifecycle points. Gateway hooks handle logging, alerts, and webhooks; plugin hooks handle tool interception, metrics, and guardrails.
Batch Processing — Run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.

Media & Web

Voice Mode — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
Browser Automation — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
Vision & Image Paste — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
Image Generation — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana, Ideogram V3, Recraft V3, Qwen, Z-Image Turbo); pick one via hermes tools.
Voice & TTS — Text-to-speech output and voice message transcription across all messaging platforms, with five provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, and NeuTTS.

Integrations

MCP Integration — Connect to any MCP server via stdio or HTTP transport. Access external tools from GitHub, databases, file systems, and internal APIs without writing native Hermes tools. Includes per-server tool filtering and sampling support.
Provider Routing — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering.
Fallback Providers — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression.
Credential Pools — Distribute API calls across multiple keys for the same provider. Automatic rotation on rate limits or failures.
Memory Providers — Plug in external memory backends (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover) for cross-session user modeling and personalization beyond the built-in memory system.
API Server — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more.
IDE Integration (ACP) — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Chat, tool activity, file diffs, and terminal commands render inside your editor.
RL Training — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.

Customization

Personality & SOUL.md — Fully customizable agent personality. SOUL.md is the primary identity file — the first thing in the system prompt — and you can swap in built-in or custom /personality presets per session.
Skins & Themes — Customize the CLI's visual presentation: banner colors, spinner faces and verbs, response-box labels, branding text, and the tool activity prefix.
Plugins — Add custom tools, hooks, and integrations without modifying core code. Three plugin types: general plugins (tools/hooks), memory providers (cross-session knowledge), and context engines (alternative context management). Managed via the unified hermes plugins interactive UI.

6.3 KiB Raw Blame History