Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor

This commit is contained in:
Brooklyn Nicholson 2026-04-17 08:59:33 -05:00
commit 1f37ef2fd1
126 changed files with 12584 additions and 2666 deletions

View file

@ -1,6 +1,6 @@
---
title: Image Generation
description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana, Ideogram, and more, selectable via `hermes tools`.
description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, and more, selectable via `hermes tools`.
sidebar_label: Image Generation
sidebar_position: 6
---
@ -13,13 +13,13 @@ Hermes Agent generates images from text prompts via FAL.ai. Eight models are sup
| Model | Speed | Strengths | Price |
|---|---|---|---|
| `fal-ai/flux-2/klein/9b` *(default)* | <1s | Fast, crisp text | $0.006/MP |
| `fal-ai/flux-2/klein/9b` *(default)* | `<1s` | Fast, crisp text | $0.006/MP |
| `fal-ai/flux-2-pro` | ~6s | Studio photorealism | $0.03/MP |
| `fal-ai/z-image/turbo` | ~2s | Bilingual EN/CN, 6B params | $0.005/MP |
| `fal-ai/nano-banana` | ~6s | Gemini 2.5, character consistency | $0.08/image |
| `fal-ai/nano-banana-pro` | ~8s | Gemini 3 Pro, reasoning depth, text rendering | $0.15/image (1K) |
| `fal-ai/gpt-image-1.5` | ~15s | Prompt adherence | $0.034/image |
| `fal-ai/ideogram/v3` | ~5s | Best typography | $0.030.09/image |
| `fal-ai/recraft-v3` | ~8s | Vector art, brand styles | $0.04/image |
| `fal-ai/recraft/v4/pro/text-to-image` | ~8s | Design, brand systems, production-ready | $0.25/image |
| `fal-ai/qwen-image` | ~12s | LLM-based, complex text | $0.02/MP |
Prices are FAL's pricing at time of writing; check [fal.ai](https://fal.ai/) for current numbers.
@ -87,7 +87,7 @@ Make me a futuristic cityscape, landscape orientation
Every model accepts the same three aspect ratios from the agent's perspective. Internally, each model's native size spec is filled in automatically:
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana) | image_size (gpt-image) |
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image) |
|---|---|---|---|
| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` |
| `square` | `square_hd` | `1:1` | `1024x1024` |

View file

@ -30,7 +30,7 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
- **[Voice Mode](voice-mode.md)** — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana, Ideogram V3, Recraft V3, Qwen, Z-Image Turbo); pick one via `hermes tools`.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana Pro, Ideogram V3, Recraft V4 Pro, Qwen, Z-Image Turbo); pick one via `hermes tools`.
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with five provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, and NeuTTS.
## Integrations

View file

@ -278,6 +278,8 @@ hermes skills check # Check installed hub skills f
hermes skills update # Reinstall hub skills with upstream changes when needed
hermes skills audit # Re-scan all hub skills for security
hermes skills uninstall k8s # Remove a hub skill
hermes skills reset google-workspace # Un-stick a bundled skill from "user-modified" (see below)
hermes skills reset google-workspace --restore # Also restore the bundled version, deleting your local edits
hermes skills publish skills/my-skill --to github --repo owner/repo
hermes skills snapshot export setup.json # Export skill config
hermes skills tap add myorg/skills-repo # Add a custom GitHub source
@ -430,6 +432,43 @@ This uses the stored source identifier plus the current upstream bundle content
Skills hub operations use the GitHub API, which has a rate limit of 60 requests/hour for unauthenticated users. If you see rate-limit errors during install or search, set `GITHUB_TOKEN` in your `.env` file to increase the limit to 5,000 requests/hour. The error message includes an actionable hint when this happens.
:::
## Bundled skill updates (`hermes skills reset`)
Hermes ships with a set of bundled skills in `skills/` inside the repo. On install and on every `hermes update`, a sync pass copies those into `~/.hermes/skills/` and records a manifest at `~/.hermes/skills/.bundled_manifest` mapping each skill name to the content hash at the time it was synced (the **origin hash**).
On each sync, Hermes recomputes the hash of your local copy and compares it to the origin hash:
- **Unchanged** → safe to pull upstream changes, copy the new bundled version in, record the new origin hash.
- **Changed** → treated as **user-modified** and skipped forever, so your edits never get stomped.
The protection is good, but it has one sharp edge. If you edit a bundled skill and then later want to abandon your changes and go back to the bundled version by just copy-pasting from `~/.hermes/hermes-agent/skills/`, the manifest still holds the *old* origin hash from whenever the last successful sync ran. Your fresh copy-paste contents (current bundled hash) won't match that stale origin hash, so sync keeps flagging it as user-modified.
`hermes skills reset` is the escape hatch:
```bash
# Safe: clears the manifest entry for this skill. Your current copy is preserved,
# but the next sync re-baselines against it so future updates work normally.
hermes skills reset google-workspace
# Full restore: also deletes your local copy and re-copies the current bundled
# version. Use this when you want the pristine upstream skill back.
hermes skills reset google-workspace --restore
# Non-interactive (e.g. in scripts or TUI mode) — skip the --restore confirmation.
hermes skills reset google-workspace --restore --yes
```
The same command works in chat as a slash command:
```text
/skills reset google-workspace
/skills reset google-workspace --restore
```
:::note Profiles
Each profile has its own `.bundled_manifest` under its own `HERMES_HOME`, so `hermes -p coder skills reset <name>` only affects that profile.
:::
### Slash commands (inside chat)
All the same commands work with `/skills`:
@ -442,6 +481,7 @@ All the same commands work with `/skills`:
/skills install openai/skills/skill-creator --force
/skills check
/skills update
/skills reset google-workspace
/skills list
```

View file

@ -18,7 +18,7 @@ The **Tool Gateway** lets paid [Nous Portal](https://portal.nousresearch.com) su
| Tool | What It Does | Direct Alternative |
|------|--------------|--------------------|
| **Web search & extract** | Search the web and extract page content via Firecrawl | `FIRECRAWL_API_KEY`, `EXA_API_KEY`, `PARALLEL_API_KEY`, `TAVILY_API_KEY` |
| **Image generation** | Generate images via FAL (8 models: FLUX 2 Klein/Pro, GPT-Image, Nano Banana, Ideogram, Recraft, Qwen, Z-Image) | `FAL_KEY` |
| **Image generation** | Generate images via FAL (8 models: FLUX 2 Klein/Pro, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, Qwen, Z-Image) | `FAL_KEY` |
| **Text-to-speech** | Convert text to speech via OpenAI TTS | `VOICE_TOOLS_OPENAI_KEY`, `ELEVENLABS_API_KEY` |
| **Browser automation** | Control cloud browsers via Browser Use | `BROWSER_USE_API_KEY`, `BROWSERBASE_API_KEY` |

View file

@ -283,6 +283,10 @@ Discord behavior is controlled through two files: **`~/.hermes/.env`** for crede
| `DISCORD_IGNORED_CHANNELS` | No | — | Comma-separated channel IDs where the bot **never** responds, even when `@mentioned`. Takes priority over all other channel settings. |
| `DISCORD_NO_THREAD_CHANNELS` | No | — | Comma-separated channel IDs where the bot responds directly in the channel instead of creating a thread. Only relevant when `DISCORD_AUTO_THREAD` is `true`. |
| `DISCORD_REPLY_TO_MODE` | No | `"first"` | Controls reply-reference behavior: `"off"` — never reply to the original message, `"first"` — reply-reference on the first message chunk only (default), `"all"` — reply-reference on every chunk. |
| `DISCORD_ALLOW_MENTION_EVERYONE` | No | `false` | When `false` (default), the bot cannot ping `@everyone` or `@here` even if its response contains those tokens. Set to `true` to opt back in. See [Mention Control](#mention-control) below. |
| `DISCORD_ALLOW_MENTION_ROLES` | No | `false` | When `false` (default), the bot cannot ping `@role` mentions. Set to `true` to allow. |
| `DISCORD_ALLOW_MENTION_USERS` | No | `true` | When `true` (default), the bot can ping individual users by ID. |
| `DISCORD_ALLOW_MENTION_REPLIED_USER` | No | `true` | When `true` (default), replying to a message pings the original author. |
### Config File (`config.yaml`)
@ -298,6 +302,11 @@ discord:
ignored_channels: [] # Channel IDs where bot never responds
no_thread_channels: [] # Channel IDs where bot responds without threading
channel_prompts: {} # Per-channel ephemeral system prompts
allow_mentions: # What the bot is allowed to ping (safe defaults)
everyone: false # @everyone / @here pings (default: false)
roles: false # @role pings (default: false)
users: true # @user pings (default: true)
replied_user: true # reply-reference pings the author (default: true)
# Session isolation (applies to all gateway platforms, not just Discord)
group_sessions_per_user: true # Isolate sessions per user in shared channels
@ -552,6 +561,34 @@ If you intentionally want a shared room conversation, leave it off — just expe
Always set `DISCORD_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
:::
### Mention Control
By default, Hermes blocks the bot from pinging `@everyone`, `@here`, and role mentions, even if its reply contains those tokens. This prevents a poorly-worded prompt or echoed user content from spamming a whole server. Individual `@user` pings and reply-reference pings (the little "replying to…" chip) stay enabled so normal conversation still works.
You can relax these defaults via either env vars or `config.yaml`:
```yaml
# ~/.hermes/config.yaml
discord:
allow_mentions:
everyone: false # allow the bot to ping @everyone / @here
roles: false # allow the bot to ping @role mentions
users: true # allow the bot to ping individual @users
replied_user: true # ping the author when replying to their message
```
```bash
# ~/.hermes/.env — env vars win over config.yaml
DISCORD_ALLOW_MENTION_EVERYONE=false
DISCORD_ALLOW_MENTION_ROLES=false
DISCORD_ALLOW_MENTION_USERS=true
DISCORD_ALLOW_MENTION_REPLIED_USER=true
```
:::tip
Leave `everyone` and `roles` at `false` unless you know exactly why you need them. It is very easy for an LLM to produce the string `@everyone` inside a normal-looking response; without this protection, that would notify every member of your server.
:::
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).

View file

@ -16,14 +16,14 @@ This adapter is for **personal WeChat accounts** (微信). If you need enterpris
- A personal WeChat account
- Python packages: `aiohttp` and `cryptography`
- The `qrcode` package is optional (for terminal QR rendering during setup)
- Terminal QR rendering is included when Hermes is installed with the `messaging` extra
Install the required dependencies:
```bash
pip install aiohttp cryptography
# Optional: for terminal QR code display
pip install qrcode
pip install hermes-agent[messaging]
```
## Setup
@ -90,7 +90,7 @@ The adapter will restore saved credentials, connect to the iLink API, and begin
- **Media support** — images, video, files, and voice messages
- **AES-128-ECB encrypted CDN** — automatic encryption/decryption for all media transfers
- **Context token persistence** — disk-backed reply continuity across restarts
- **Markdown formatting**headers, tables, and code blocks are reformatted for WeChat readability
- **Markdown formatting**preserves Markdown, including headers, tables, and code blocks, so WeChat clients that support Markdown can render it natively
- **Smart message chunking** — messages stay as a single bubble when under the limit; only oversized payloads split at logical boundaries
- **Typing indicators** — shows "typing…" status in the WeChat client while the agent processes
- **SSRF protection** — outbound media URLs are validated before download
@ -206,12 +206,12 @@ This ensures reply continuity even after gateway restarts.
## Markdown Formatting
WeChat's personal chat does not natively render full Markdown. The adapter reformats content for better readability:
WeChat clients connected through the iLink Bot API can render Markdown directly, so the adapter preserves Markdown instead of rewriting it:
- **Headers** (`# Title`) → converted to `【Title】` (level 1) or `**Title**` (level 2+)
- **Tables** → reformatted as labeled key-value lists (e.g., `- Column: Value`)
- **Code fences** → preserved as-is (WeChat renders these adequately)
- **Excessive blank lines** → collapsed to double newlines
- **Headers** stay as Markdown headings (`#`, `##`, ...)
- **Tables** stay as Markdown tables
- **Code fences** stay as fenced code blocks
- **Excessive blank lines** are collapsed to double newlines outside fenced code blocks
## Message Chunking
@ -296,4 +296,4 @@ Only one Weixin gateway instance can use a given token at a time. The adapter ac
| Voice messages show as text | If WeChat provides a transcription, the adapter uses the text. This is expected behavior |
| Messages appear duplicated | The adapter deduplicates by message ID. If you see duplicates, check if multiple gateway instances are running |
| `iLink POST ... HTTP 4xx/5xx` | API error from the iLink service. Check your token validity and network connectivity |
| Terminal QR code doesn't render | Install `qrcode`: `pip install qrcode`. Alternatively, open the URL printed above the QR |
| Terminal QR code doesn't render | Reinstall with the messaging extra: `pip install hermes-agent[messaging]`. Alternatively, open the URL printed above the QR |