mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor
commit
1f37ef2fd1
126 changed files with 12584 additions and 2666 deletions
|
|
@@ -1,6 +1,6 @@
|
|||
---
|
||||
title: Image Generation
|
||||
description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana, Ideogram, and more, selectable via `hermes tools`.
|
||||
description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, and more, selectable via `hermes tools`.
|
||||
sidebar_label: Image Generation
|
||||
sidebar_position: 6
|
||||
---
|
||||
|
|
@@ -13,13 +13,13 @@ Hermes Agent generates images from text prompts via FAL.ai. Eight models are sup
|
|||
|
||||
| Model | Speed | Strengths | Price |
|
||||
|---|---|---|---|
|
||||
| `fal-ai/flux-2/klein/9b` *(default)* | <1s | Fast, crisp text | $0.006/MP |
|
||||
| `fal-ai/flux-2/klein/9b` *(default)* | `<1s` | Fast, crisp text | $0.006/MP |
|
||||
| `fal-ai/flux-2-pro` | ~6s | Studio photorealism | $0.03/MP |
|
||||
| `fal-ai/z-image/turbo` | ~2s | Bilingual EN/CN, 6B params | $0.005/MP |
|
||||
| `fal-ai/nano-banana` | ~6s | Gemini 2.5, character consistency | $0.08/image |
|
||||
| `fal-ai/nano-banana-pro` | ~8s | Gemini 3 Pro, reasoning depth, text rendering | $0.15/image (1K) |
|
||||
| `fal-ai/gpt-image-1.5` | ~15s | Prompt adherence | $0.034/image |
|
||||
| `fal-ai/ideogram/v3` | ~5s | Best typography | $0.03–0.09/image |
|
||||
| `fal-ai/recraft-v3` | ~8s | Vector art, brand styles | $0.04/image |
|
||||
| `fal-ai/recraft/v4/pro/text-to-image` | ~8s | Design, brand systems, production-ready | $0.25/image |
|
||||
| `fal-ai/qwen-image` | ~12s | LLM-based, complex text | $0.02/MP |
|
||||
|
||||
Prices are FAL's pricing at time of writing; check [fal.ai](https://fal.ai/) for current numbers.
|
||||
|
|
@@ -87,7 +87,7 @@ Make me a futuristic cityscape, landscape orientation
|
|||
|
||||
Every model accepts the same three aspect ratios from the agent's perspective. Internally, each model's native size spec is filled in automatically:
|
||||
|
||||
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana) | image_size (gpt-image) |
|
||||
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image) |
|
||||
|---|---|---|---|
|
||||
| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` |
|
||||
| `square` | `square_hd` | `1:1` | `1024x1024` |
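
The mapping in the table above can be sketched as a small lookup. This is an illustrative sketch only, not the Hermes implementation: `SIZE_SPECS`, `size_params`, and the substring-based family detection are hypothetical, and only the two rows visible above (`landscape` and `square`) are included.

```python
# Values copied from the table rows shown above; the structure is hypothetical.
SIZE_SPECS = {
    "landscape": {"preset": "landscape_16_9", "ratio": "16:9", "pixels": "1536x1024"},
    "square": {"preset": "square_hd", "ratio": "1:1", "pixels": "1024x1024"},
}


def size_params(model_id: str, agent_input: str) -> dict[str, str]:
    """Translate the agent-facing aspect ratio into a model-native size field.

    Assumption: the model family can be recognized from a substring of the
    model ID. The real tool may select the family differently.
    """
    spec = SIZE_SPECS[agent_input]
    if "nano-banana" in model_id:
        # Nano Banana models take an aspect_ratio string.
        return {"aspect_ratio": spec["ratio"]}
    if "gpt-image" in model_id:
        # GPT-Image takes explicit pixel dimensions in image_size.
        return {"image_size": spec["pixels"]}
    # flux / z-image / qwen / recraft / ideogram take a named preset.
    return {"image_size": spec["preset"]}
```

For example, `size_params("fal-ai/nano-banana-pro", "landscape")` yields an `aspect_ratio` field, while the same input for `fal-ai/flux-2-pro` yields an `image_size` preset.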
|
||||
|
|
|
|||
|
|
@@ -30,7 +30,7 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
|
|||
- **[Voice Mode](voice-mode.md)** — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
|
||||
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
|
||||
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
|
||||
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana, Ideogram V3, Recraft V3, Qwen, Z-Image Turbo); pick one via `hermes tools`.
|
||||
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana Pro, Ideogram V3, Recraft V4 Pro, Qwen, Z-Image Turbo); pick one via `hermes tools`.
|
||||
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with five provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, and NeuTTS.
|
||||
|
||||
## Integrations
|
||||
|
|
|
|||
|
|
@@ -278,6 +278,8 @@ hermes skills check # Check installed hub skills f
|
|||
hermes skills update # Reinstall hub skills with upstream changes when needed
|
||||
hermes skills audit # Re-scan all hub skills for security
|
||||
hermes skills uninstall k8s # Remove a hub skill
|
||||
hermes skills reset google-workspace # Un-stick a bundled skill from "user-modified" (see below)
|
||||
hermes skills reset google-workspace --restore # Also restore the bundled version, deleting your local edits
|
||||
hermes skills publish skills/my-skill --to github --repo owner/repo
|
||||
hermes skills snapshot export setup.json # Export skill config
|
||||
hermes skills tap add myorg/skills-repo # Add a custom GitHub source
|
||||
|
|
@@ -430,6 +432,43 @@ This uses the stored source identifier plus the current upstream bundle content
|
|||
Skills hub operations use the GitHub API, which has a rate limit of 60 requests/hour for unauthenticated users. If you see rate-limit errors during install or search, set `GITHUB_TOKEN` in your `.env` file to increase the limit to 5,000 requests/hour. The error message includes an actionable hint when this happens.
|
||||
:::
|
||||
|
||||
## Bundled skill updates (`hermes skills reset`)
|
||||
|
||||
Hermes ships with a set of bundled skills in `skills/` inside the repo. On install and on every `hermes update`, a sync pass copies those into `~/.hermes/skills/` and records a manifest at `~/.hermes/skills/.bundled_manifest` mapping each skill name to the content hash at the time it was synced (the **origin hash**).
|
||||
|
||||
On each sync, Hermes recomputes the hash of your local copy and compares it to the origin hash:
|
||||
|
||||
- **Unchanged** → safe to pull upstream changes: copy the new bundled version in and record the new origin hash.
|
||||
- **Changed** → treated as **user-modified** and skipped forever, so your edits never get stomped.
|
||||
|
||||
The protection is good, but it has one sharp edge. Suppose you edit a bundled skill and later want to abandon your changes by copy-pasting the bundled version back from `~/.hermes/hermes-agent/skills/`. The manifest still holds the *old* origin hash from the last successful sync, while your freshly pasted copy hashes to the *current* bundled hash. The two don't match, so sync keeps flagging the skill as user-modified.
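
The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Hermes's actual code: `hash_skill_dir` and `sync_decision` are hypothetical names, SHA-256 is an assumed hash, and a plain dict stands in for `.bundled_manifest`, whose real on-disk format is not shown here.

```python
import hashlib
from pathlib import Path


def hash_skill_dir(skill_dir: Path) -> str:
    """Hash every file under skill_dir (relative path plus bytes) in sorted order.

    Hypothetical helper: the hash algorithm and traversal are assumptions.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(skill_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def sync_decision(name: str, local_hash: str, manifest: dict[str, str]) -> str:
    """Return what a sync pass would do for one bundled skill."""
    origin_hash = manifest.get(name)
    if origin_hash is None:
        # No manifest entry, e.g. after a reset cleared it: re-baseline.
        return "rebaseline"
    if local_hash == origin_hash:
        # Local copy is unchanged since the last sync: safe to update.
        return "update"
    # Local copy differs from the origin hash: treated as user-modified.
    return "skip"
```

In this model, the sharp edge is the `"skip"` branch: a copy pasted from the current bundled version hashes to the new value, the manifest still holds the old one, and the skill stays skipped until the manifest entry is cleared.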
|
||||
|
||||
`hermes skills reset` is the escape hatch:
|
||||
|
||||
```bash
|
||||
# Safe: clears the manifest entry for this skill. Your current copy is preserved,
|
||||
# but the next sync re-baselines against it so future updates work normally.
|
||||
hermes skills reset google-workspace
|
||||
|
||||
# Full restore: also deletes your local copy and re-copies the current bundled
|
||||
# version. Use this when you want the pristine upstream skill back.
|
||||
hermes skills reset google-workspace --restore
|
||||
|
||||
# Non-interactive (e.g. in scripts or TUI mode) — skip the --restore confirmation.
|
||||
hermes skills reset google-workspace --restore --yes
|
||||
```
|
||||
|
||||
The same command works in chat as a slash command:
|
||||
|
||||
```text
|
||||
/skills reset google-workspace
|
||||
/skills reset google-workspace --restore
|
||||
```
|
||||
|
||||
:::note Profiles
|
||||
Each profile has its own `.bundled_manifest` under its own `HERMES_HOME`, so `hermes -p coder skills reset <name>` only affects that profile.
|
||||
:::
|
||||
|
||||
### Slash commands (inside chat)
|
||||
|
||||
All the same commands work with `/skills`:
|
||||
|
|
@@ -442,6 +481,7 @@ All the same commands work with `/skills`:
|
|||
/skills install openai/skills/skill-creator --force
|
||||
/skills check
|
||||
/skills update
|
||||
/skills reset google-workspace
|
||||
/skills list
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@@ -18,7 +18,7 @@ The **Tool Gateway** lets paid [Nous Portal](https://portal.nousresearch.com) su
|
|||
| Tool | What It Does | Direct Alternative |
|
||||
|------|--------------|--------------------|
|
||||
| **Web search & extract** | Search the web and extract page content via Firecrawl | `FIRECRAWL_API_KEY`, `EXA_API_KEY`, `PARALLEL_API_KEY`, `TAVILY_API_KEY` |
|
||||
| **Image generation** | Generate images via FAL (8 models: FLUX 2 Klein/Pro, GPT-Image, Nano Banana, Ideogram, Recraft, Qwen, Z-Image) | `FAL_KEY` |
|
||||
| **Image generation** | Generate images via FAL (8 models: FLUX 2 Klein/Pro, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, Qwen, Z-Image) | `FAL_KEY` |
|
||||
| **Text-to-speech** | Convert text to speech via OpenAI TTS | `VOICE_TOOLS_OPENAI_KEY`, `ELEVENLABS_API_KEY` |
|
||||
| **Browser automation** | Control cloud browsers via Browser Use | `BROWSER_USE_API_KEY`, `BROWSERBASE_API_KEY` |
|
||||
|
||||
|
|
|
|||
|
|
@@ -283,6 +283,10 @@ Discord behavior is controlled through two files: **`~/.hermes/.env`** for crede
|
|||
| `DISCORD_IGNORED_CHANNELS` | No | — | Comma-separated channel IDs where the bot **never** responds, even when `@mentioned`. Takes priority over all other channel settings. |
|
||||
| `DISCORD_NO_THREAD_CHANNELS` | No | — | Comma-separated channel IDs where the bot responds directly in the channel instead of creating a thread. Only relevant when `DISCORD_AUTO_THREAD` is `true`. |
|
||||
| `DISCORD_REPLY_TO_MODE` | No | `"first"` | Controls reply-reference behavior: `"off"` — never reply to the original message, `"first"` — reply-reference on the first message chunk only (default), `"all"` — reply-reference on every chunk. |
|
||||
| `DISCORD_ALLOW_MENTION_EVERYONE` | No | `false` | When `false` (default), the bot cannot ping `@everyone` or `@here` even if its response contains those tokens. Set to `true` to opt back in. See [Mention Control](#mention-control) below. |
|
||||
| `DISCORD_ALLOW_MENTION_ROLES` | No | `false` | When `false` (default), the bot cannot ping `@role` mentions. Set to `true` to allow. |
|
||||
| `DISCORD_ALLOW_MENTION_USERS` | No | `true` | When `true` (default), the bot can ping individual users by ID. |
|
||||
| `DISCORD_ALLOW_MENTION_REPLIED_USER` | No | `true` | When `true` (default), replying to a message pings the original author. |
|
||||
|
||||
### Config File (`config.yaml`)
|
||||
|
||||
|
|
@@ -298,6 +302,11 @@ discord:
|
|||
ignored_channels: [] # Channel IDs where bot never responds
|
||||
no_thread_channels: [] # Channel IDs where bot responds without threading
|
||||
channel_prompts: {} # Per-channel ephemeral system prompts
|
||||
allow_mentions: # What the bot is allowed to ping (safe defaults)
|
||||
everyone: false # @everyone / @here pings (default: false)
|
||||
roles: false # @role pings (default: false)
|
||||
users: true # @user pings (default: true)
|
||||
replied_user: true # reply-reference pings the author (default: true)
|
||||
|
||||
# Session isolation (applies to all gateway platforms, not just Discord)
|
||||
group_sessions_per_user: true # Isolate sessions per user in shared channels
|
||||
|
|
@@ -552,6 +561,34 @@ If you intentionally want a shared room conversation, leave it off — just expe
|
|||
Always set `DISCORD_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
|
||||
:::
|
||||
|
||||
### Mention Control
|
||||
|
||||
By default, Hermes blocks the bot from pinging `@everyone`, `@here`, and role mentions, even if its reply contains those tokens. This prevents a poorly worded prompt or echoed user content from spamming a whole server. Individual `@user` pings and reply-reference pings (the little "replying to…" chip) stay enabled so normal conversation still works.
|
||||
|
||||
You can relax these defaults via either env vars or `config.yaml`:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/config.yaml
|
||||
discord:
|
||||
allow_mentions:
|
||||
everyone: false # allow the bot to ping @everyone / @here
|
||||
roles: false # allow the bot to ping @role mentions
|
||||
users: true # allow the bot to ping individual @users
|
||||
replied_user: true # ping the author when replying to their message
|
||||
```
|
||||
|
||||
```bash
|
||||
# ~/.hermes/.env — env vars win over config.yaml
|
||||
DISCORD_ALLOW_MENTION_EVERYONE=false
|
||||
DISCORD_ALLOW_MENTION_ROLES=false
|
||||
DISCORD_ALLOW_MENTION_USERS=true
|
||||
DISCORD_ALLOW_MENTION_REPLIED_USER=true
|
||||
```
|
||||
|
||||
:::tip
|
||||
Leave `everyone` and `roles` at `false` unless you know exactly why you need them. It is very easy for an LLM to produce the string `@everyone` inside a normal-looking response; without this protection, that would notify every member of your server.
|
||||
:::
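
The precedence rule above (env vars win over `config.yaml`, falling back to the documented defaults) can be sketched as follows. This is an illustrative sketch, not the gateway's real code: `resolve_allow_mentions` and `parse_bool` are hypothetical names, and the set of strings treated as true is an assumption.

```python
import os

# Documented defaults and env var names from this page.
DEFAULTS = {"everyone": False, "roles": False, "users": True, "replied_user": True}
ENV_VARS = {
    "everyone": "DISCORD_ALLOW_MENTION_EVERYONE",
    "roles": "DISCORD_ALLOW_MENTION_ROLES",
    "users": "DISCORD_ALLOW_MENTION_USERS",
    "replied_user": "DISCORD_ALLOW_MENTION_REPLIED_USER",
}


def parse_bool(raw: str) -> bool:
    """Assumption: these strings count as true; anything else is false."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_allow_mentions(config: dict, env=None) -> dict[str, bool]:
    """Merge defaults, config.yaml values, and env vars, with env winning."""
    env = os.environ if env is None else env
    section = (config.get("discord") or {}).get("allow_mentions") or {}
    resolved = {}
    for key, default in DEFAULTS.items():
        raw = env.get(ENV_VARS[key])
        if raw is not None and raw.strip():
            # An env var that is set and non-empty overrides config.yaml.
            resolved[key] = parse_bool(raw)
        else:
            resolved[key] = bool(section.get(key, default))
    return resolved
```

With `roles: true` in `config.yaml` and `DISCORD_ALLOW_MENTION_ROLES=false` in `.env`, this sketch resolves `roles` to `False`, matching the "env vars win" rule.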
|
||||
|
||||
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@@ -16,14 +16,14 @@ This adapter is for **personal WeChat accounts** (微信). If you need enterpris
|
|||
|
||||
- A personal WeChat account
|
||||
- Python packages: `aiohttp` and `cryptography`
|
||||
- The `qrcode` package is optional (for terminal QR rendering during setup)
|
||||
- Terminal QR rendering is included when Hermes is installed with the `messaging` extra
|
||||
|
||||
Install the required dependencies:
|
||||
|
||||
```bash
|
||||
pip install aiohttp cryptography
|
||||
# Optional: for terminal QR code display
|
||||
pip install qrcode
|
||||
pip install hermes-agent[messaging]
|
||||
```
|
||||
|
||||
## Setup
|
||||
|
|
@@ -90,7 +90,7 @@ The adapter will restore saved credentials, connect to the iLink API, and begin
|
|||
- **Media support** — images, video, files, and voice messages
|
||||
- **AES-128-ECB encrypted CDN** — automatic encryption/decryption for all media transfers
|
||||
- **Context token persistence** — disk-backed reply continuity across restarts
|
||||
- **Markdown formatting** — headers, tables, and code blocks are reformatted for WeChat readability
|
||||
- **Markdown formatting** — preserves Markdown, including headers, tables, and code blocks, so WeChat clients that support Markdown can render it natively
|
||||
- **Smart message chunking** — messages stay as a single bubble when under the limit; only oversized payloads split at logical boundaries
|
||||
- **Typing indicators** — shows "typing…" status in the WeChat client while the agent processes
|
||||
- **SSRF protection** — outbound media URLs are validated before download
|
||||
|
|
@@ -206,12 +206,12 @@ This ensures reply continuity even after gateway restarts.
|
|||
|
||||
## Markdown Formatting
|
||||
|
||||
WeChat's personal chat does not natively render full Markdown. The adapter reformats content for better readability:
|
||||
WeChat clients connected through the iLink Bot API can render Markdown directly, so the adapter preserves Markdown instead of rewriting it:
|
||||
|
||||
- **Headers** (`# Title`) → converted to `【Title】` (level 1) or `**Title**` (level 2+)
|
||||
- **Tables** → reformatted as labeled key-value lists (e.g., `- Column: Value`)
|
||||
- **Code fences** → preserved as-is (WeChat renders these adequately)
|
||||
- **Excessive blank lines** → collapsed to double newlines
|
||||
- **Headers** stay as Markdown headings (`#`, `##`, ...)
|
||||
- **Tables** stay as Markdown tables
|
||||
- **Code fences** stay as fenced code blocks
|
||||
- **Excessive blank lines** are collapsed to double newlines outside fenced code blocks
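
The last rule in the list above, collapsing blank-line runs only outside fenced code blocks, can be sketched like this. It is a minimal illustration, not the adapter's actual code: `collapse_blank_lines` is a hypothetical name, and fences are detected simply as lines starting with three backticks.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def collapse_blank_lines(text: str) -> str:
    """Keep at most one blank line in a row, leaving fenced code untouched."""
    out = []
    in_fence = False
    blank_run = 0
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            # Opening or closing fence: toggle state and keep the line.
            in_fence = not in_fence
            blank_run = 0
            out.append(line)
            continue
        if in_fence:
            # Inside a code block every line, blank or not, is preserved.
            out.append(line)
            continue
        if line.strip() == "":
            blank_run += 1
            if blank_run > 1:
                continue  # drop the extra blank lines
        else:
            blank_run = 0
        out.append(line)
    return "\n".join(out)
```

Tracking the fence state line by line, rather than running one regex over the whole message, is what keeps intentional blank lines inside code samples intact.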
|
||||
|
||||
## Message Chunking
|
||||
|
||||
|
|
@@ -296,4 +296,4 @@ Only one Weixin gateway instance can use a given token at a time. The adapter ac
|
|||
| Voice messages show as text | If WeChat provides a transcription, the adapter uses the text. This is expected behavior |
|
||||
| Messages appear duplicated | The adapter deduplicates by message ID. If you see duplicates, check if multiple gateway instances are running |
|
||||
| `iLink POST ... HTTP 4xx/5xx` | API error from the iLink service. Check your token validity and network connectivity |
|
||||
| Terminal QR code doesn't render | Install `qrcode`: `pip install qrcode`. Alternatively, open the URL printed above the QR |
|
||||
| Terminal QR code doesn't render | Reinstall with the messaging extra: `pip install hermes-agent[messaging]`. Alternatively, open the URL printed above the QR |
|
||||
|
|
|
|||