mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Teknium 01906e99dd

feat(image_gen): multi-model FAL support with picker in hermes tools (#11265 )

* feat(image_gen): multi-model FAL support with picker in hermes tools

Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.

Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)

Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
  whitelist, and upscale flag. Three size families handled uniformly:
  image_size_preset (flux family), aspect_ratio (nano-banana), and
  gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
  into model-specific payloads, merges defaults, applies caller overrides,
  wires GPT quality_setting, then filters to the supports whitelist — so
  models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
  providers (Replicate, Stability, etc.); each provider entry tags itself
  with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
  prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.

Config:
  image_gen.model           = fal-ai/flux-2/klein/9b  (new)
  image_gen.quality_setting = medium                  (new, GPT only)
  image_gen.use_gateway     = bool                    (existing)

Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.

Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.

Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.

Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).

* feat(image_gen): translate managed-gateway 4xx to actionable error

When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
  1. The rejected model ID + HTTP status
  2. Two remediation paths: set FAL_KEY for direct access, or
     pick a different model via `hermes tools`

5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).

Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.

Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.

Docs: brief note in image-generation.md for Nous subscribers.

Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.

* feat(image_gen): pin GPT-Image quality to medium (no user choice)

Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:

1. Nous Portal billing complexity — the 22x cost spread between tiers
   ($0.009 low / $0.20 high) forces the gateway to meter per-tier per
   user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
   credit ~6x faster than `medium`.

This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:

- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
  config.yaml, no code path reads it — always sends medium.

Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
  (5 tests) — proves medium is baked in, config is ignored, flag is
  removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
  test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
  picker call fires when gpt-image is selected (no quality follow-up).

Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.

* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section

Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.

2026-04-16 20:19:53 -07:00

14 KiB

Raw Blame History

sidebar_position	title	description
3	Built-in Tools Reference	Authoritative reference for Hermes built-in tools, grouped by toolset

Built-in Tools Reference

This page documents all 47 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.

Quick counts: 10 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, and 15 standalone tools across other toolsets.

:::tip MCP Tools In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., github_create_issue for the github MCP server). See MCP Integration for configuration. :::

`browser` toolset

Tool	Description	Requires environment
`browser_back`	Navigate back to the previous page in browser history. Requires browser_navigate to be called first.	—
`browser_click`	Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first.	—
`browser_console`	Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requi…	—
`browser_get_images`	Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first.	—
`browser_navigate`	Navigate to a URL in the browser. Initializes the session and loads the page. Must be called before other browser tools. For simple information retrieval, prefer web_search or web_extract (faster, cheaper). Use browser tools when you need…	—
`browser_press`	Press a keyboard key. Useful for submitting forms (Enter), navigating (Tab), or keyboard shortcuts. Requires browser_navigate to be called first.	—
`browser_scroll`	Scroll the page in a direction. Use this to reveal more content that may be below or above the current viewport. Requires browser_navigate to be called first.	—
`browser_snapshot`	Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs (like @e1, @e2) for browser_click and browser_type. full=false (default): compact view with interactive elements. full=true: comp…	—
`browser_type`	Type text into an input field identified by its ref ID. Clears the field first, then types the new text. Requires browser_navigate and browser_snapshot to be called first.	—
`browser_vision`	Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what's on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snaps…	—

`clarify` toolset

Tool	Description	Requires environment
`clarify`	Ask the user a question when you need clarification, feedback, or a decision before proceeding. Supports two modes: 1. Multiple choice — provide up to 4 choices. The user picks one or types their own answer via a 5th 'Other' option. 2.…	—

`code_execution` toolset

Tool	Description	Requires environment
`execute_code`	Run a Python script that can call Hermes tools programmatically. Use this when you need 3+ tool calls with processing logic between them, need to filter/reduce large tool outputs before they enter your context, need conditional branching (…	—

`cronjob` toolset

Tool	Description	Requires environment
`cronjob`	Unified scheduled-task manager. Use `action="create"`, `"list"`, `"update"`, `"pause"`, `"resume"`, `"run"`, or `"remove"` to manage jobs. Supports skill-backed jobs with one or more attached skills, and `skills=[]` on update clears attached skills. Cron runs happen in fresh sessions with no current-chat context.	—

`delegation` toolset

Tool	Description	Requires environment
`delegate_task`	Spawn one or more subagents to work on tasks in isolated contexts. Each subagent gets its own conversation, terminal session, and toolset. Only the final summary is returned -- intermediate tool results never enter your context window. TWO…	—

`file` toolset

Tool	Description	Requires environment
`patch`	Targeted find-and-replace edits in files. Use this instead of sed/awk in terminal. Uses fuzzy matching (9 strategies) so minor whitespace/indentation differences won't break it. Returns a unified diff. Auto-runs syntax checks after editing…	—
`read_file`	Read a text file with line numbers and pagination. Use this instead of cat/head/tail in terminal. Output format: 'LINE_NUM\|CONTENT'. Suggests similar filenames if not found. Use offset and limit for large files. NOTE: Cannot read images o…	—
`search_files`	Search file contents or find files by name. Use this instead of grep/rg/find/ls in terminal. Ripgrep-backed, faster than shell equivalents. Content search (target='content'): Regex search inside files. Output modes: full matches with line…	—
`write_file`	Write content to a file, completely replacing existing content. Use this instead of echo/cat heredoc in terminal. Creates parent directories automatically. OVERWRITES the entire file — use 'patch' for targeted edits.	—

`homeassistant` toolset

Tool	Description	Requires environment
`ha_call_service`	Call a Home Assistant service to control a device. Use ha_list_services to discover available services and their parameters for each domain.	—
`ha_get_state`	Get the detailed state of a single Home Assistant entity, including all attributes (brightness, color, temperature setpoint, sensor readings, etc.).	—
`ha_list_entities`	List Home Assistant entities. Optionally filter by domain (light, switch, climate, sensor, binary_sensor, cover, fan, etc.) or by area name (living room, kitchen, bedroom, etc.).	—
`ha_list_services`	List available Home Assistant services (actions) for device control. Shows what actions can be performed on each device type and what parameters they accept. Use this to discover how to control devices found via ha_list_entities.	—

:::note Honcho tools (honcho_profile, honcho_search, honcho_context, honcho_reasoning, honcho_conclude) are no longer built-in. They are available via the Honcho memory provider plugin at plugins/memory/honcho/. See Memory Providers for installation and usage. :::

`image_gen` toolset

Tool	Description	Requires environment
`image_generate`	Generate high-quality images from text prompts using FAL.ai. The underlying model is user-configured (default: FLUX 2 Klein 9B, sub-1s generation) and is not selectable by the agent. Returns a single image URL. Display it using…	FAL_KEY

`memory` toolset

Tool	Description	Requires environment
`memory`	Save important information to persistent memory that survives across sessions. Your memory appears in your system prompt at session start -- it's how you remember things about the user and your environment between conversations. WHEN TO SA…	—

`messaging` toolset

Tool	Description	Requires environment
`send_message`	Send a message to a connected messaging platform, or list available targets. IMPORTANT: When the user asks to send to a specific channel or person (not just a bare platform name), call send_message(action='list') FIRST to see available tar…	—

`moa` toolset

Tool	Description	Requires environment
`mixture_of_agents`	Route a hard problem through multiple frontier LLMs collaboratively. Makes 5 API calls (4 reference models + 1 aggregator) with maximum reasoning effort — use sparingly for genuinely difficult problems. Best for: complex math, advanced alg…	OPENROUTER_API_KEY

`rl` toolset

Tool	Description	Requires environment
`rl_check_status`	Get status and metrics for a training run. RATE LIMITED: enforces 30-minute minimum between checks for the same run. Returns WandB metrics: step, state, reward_mean, loss, percent_correct.	TINKER_API_KEY, WANDB_API_KEY
`rl_edit_config`	Update a configuration field. Use rl_get_current_config() first to see all available fields for the selected environment. Each environment has different configurable options. Infrastructure settings (tokenizer, URLs, lora_rank, learning_ra…	TINKER_API_KEY, WANDB_API_KEY
`rl_get_current_config`	Get the current environment configuration. Returns only fields that can be modified: group_size, max_token_length, total_steps, steps_per_eval, use_wandb, wandb_name, max_num_workers.	TINKER_API_KEY, WANDB_API_KEY
`rl_get_results`	Get final results and metrics for a completed training run. Returns final metrics and path to trained weights.	TINKER_API_KEY, WANDB_API_KEY
`rl_list_environments`	List all available RL environments. Returns environment names, paths, and descriptions. TIP: Read the file_path with file tools to understand how each environment works (verifiers, data loading, rewards).	TINKER_API_KEY, WANDB_API_KEY
`rl_list_runs`	List all training runs (active and completed) with their status.	TINKER_API_KEY, WANDB_API_KEY
`rl_select_environment`	Select an RL environment for training. Loads the environment's default configuration. After selecting, use rl_get_current_config() to see settings and rl_edit_config() to modify them.	TINKER_API_KEY, WANDB_API_KEY
`rl_start_training`	Start a new RL training run with the current environment and config. Most training parameters (lora_rank, learning_rate, etc.) are fixed. Use rl_edit_config() to set group_size, batch_size, wandb_project before starting. WARNING: Training…	TINKER_API_KEY, WANDB_API_KEY
`rl_stop_training`	Stop a running training job. Use if metrics look bad, training is stagnant, or you want to try different settings.	TINKER_API_KEY, WANDB_API_KEY
`rl_test_inference`	Quick inference test for any environment. Runs a few steps of inference + scoring using OpenRouter. Default: 3 steps x 16 completions = 48 rollouts per model, testing 3 models = 144 total. Tests environment loading, prompt construction, in…	TINKER_API_KEY, WANDB_API_KEY

`session_search` toolset

Tool	Description	Requires environment
`session_search`	Search your long-term memory of past conversations. This is your recall -- every past session is searchable, and this tool summarizes what happened. USE THIS PROACTIVELY when: - The user says 'we did this before', 'remember when', 'last ti…	—

`skills` toolset

Tool	Description	Requires environment
`skill_manage`	Manage skills (create, update, delete). Skills are your procedural memory — reusable approaches for recurring task types. New skills go to ~/.hermes/skills/; existing skills can be modified wherever they live. Actions: create (full SKILL.m…	—
`skill_view`	Skills allow for loading information about specific tasks and workflows, as well as scripts and templates. Load a skill's full content or access its linked files (references, templates, scripts). First call returns SKILL.md content plus a…	—
`skills_list`	List available skills (name + description). Use skill_view(name) to load full content.	—

`terminal` toolset

Tool	Description	Requires environment
`process`	Manage background processes started with terminal(background=true). Actions: 'list' (show all), 'poll' (check status + new output), 'log' (full output with pagination), 'wait' (block until done or timeout), 'kill' (terminate), 'write' (sen…	—
`terminal`	Execute shell commands on a Linux environment. Filesystem persists between calls. Set `background=true` for long-running servers. Set `notify_on_complete=true` (with `background=true`) to get an automatic notification when the process finishes — no polling needed. Do NOT use cat/head/tail — use read_file. Do NOT use grep/rg/find — use search_files.	—

`todo` toolset

Tool	Description	Requires environment
`todo`	Manage your task list for the current session. Use for complex tasks with 3+ steps or when the user provides multiple tasks. Call with no parameters to read the current list. Writing: - Provide 'todos' array to create/update items - merge=…	—

`vision` toolset

Tool	Description	Requires environment
`vision_analyze`	Analyze images using AI vision. Provides a comprehensive description and answers a specific question about the image content.	—

`web` toolset

Tool	Description	Requires environment
`web_search`	Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions.	EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY
`web_extract`	Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized.	EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY

`tts` toolset

Tool	Description	Requires environment
`text_to_speech`	Convert text to speech audio. Returns a MEDIA: path that the platform delivers as a voice message. On Telegram it plays as a voice bubble, on Discord/WhatsApp as an audio attachment. In CLI mode, saves to ~/voice-memos/. Voice and provider…	—

14 KiB Raw Blame History

Built-in Tools Reference

browser toolset

clarify toolset

code_execution toolset

cronjob toolset

delegation toolset

file toolset

homeassistant toolset

image_gen toolset

memory toolset

messaging toolset

moa toolset

rl toolset

session_search toolset

skills toolset

terminal toolset

todo toolset

vision toolset

web toolset

tts toolset