Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-27 01:11:40 +00:00)
Commit 1cd2b280fd: Merge remote-tracking branch 'origin/main' into feat/dashboard-chat
373 changed files with 35795 additions and 7622 deletions
@@ -20,7 +20,7 @@ delegate_task(

 ## Parallel Batch

-Up to 3 concurrent subagents:
+Up to 3 concurrent subagents by default (configurable, no hard ceiling):

 ```python
 delegate_task(tasks=[
@@ -33,10 +33,10 @@ delegate_task(tasks=[
 ## How Subagent Context Works

 :::warning Critical: Subagents Know Nothing
-Subagents start with a **completely fresh conversation**. They have zero knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The subagent's only context comes from the `goal` and `context` fields you provide.
+Subagents start with a **completely fresh conversation**. They have zero knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The subagent's only context comes from the `goal` and `context` fields the parent agent populates when it calls `delegate_task`.
 :::

-This means you must pass **everything** the subagent needs:
+This means the parent agent must pass **everything** the subagent needs in the call:

 ```python
 # BAD - subagent has no idea what "the error" is
@@ -121,8 +121,8 @@ delegate_task(

 When you provide a `tasks` array, subagents run in **parallel** using a thread pool:

-- **Maximum concurrency:** 3 tasks (the `tasks` array is truncated to 3 if longer)
-- **Thread pool:** Uses `ThreadPoolExecutor` with `MAX_CONCURRENT_CHILDREN = 3` workers
+- **Maximum concurrency:** 3 tasks by default (configurable via `delegation.max_concurrent_children` or the `DELEGATION_MAX_CONCURRENT_CHILDREN` env var; floor of 1, no hard ceiling). Batches larger than the limit return a tool error rather than being silently truncated.
+- **Thread pool:** Uses `ThreadPoolExecutor` with the configured concurrency limit as max workers
 - **Progress display:** In CLI mode, a tree-view shows tool calls from each subagent in real-time with per-task completion lines. In gateway mode, progress is batched and relayed to the parent's progress callback
 - **Result ordering:** Results are sorted by task index to match input order regardless of completion order
 - **Interrupt propagation:** Interrupting the parent (e.g., sending a new message) interrupts all active children
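The batch behavior described in the hunk above (configurable limit with a floor of 1, an error for oversized batches, a thread pool, and results ordered by task index) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `resolve_max_concurrent`, `run_batch`, and `run_child` are hypothetical names, and the choice to let the env var win over the config value is an assumption the diff does not confirm.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

DEFAULT_MAX_CONCURRENT_CHILDREN = 3


def resolve_max_concurrent(config: dict) -> int:
    """Resolve the concurrency limit; env var first (assumed precedence), floor of 1."""
    raw = os.environ.get("DELEGATION_MAX_CONCURRENT_CHILDREN")
    if raw is None:
        raw = config.get("delegation", {}).get(
            "max_concurrent_children", DEFAULT_MAX_CONCURRENT_CHILDREN
        )
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = DEFAULT_MAX_CONCURRENT_CHILDREN
    return max(1, value)  # floor of 1, no upper bound


def run_batch(tasks, run_child, config=None):
    """Run tasks in parallel; reject oversized batches instead of truncating them."""
    limit = resolve_max_concurrent(config or {})
    if len(tasks) > limit:
        return {"error": f"batch of {len(tasks)} tasks exceeds limit of {limit}"}
    indexed = []
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = {pool.submit(run_child, task): i for i, task in enumerate(tasks)}
        for future in as_completed(futures):
            indexed.append((futures[future], future.result()))
    indexed.sort(key=lambda pair: pair[0])  # restore input order
    return {"results": [result for _, result in indexed]}
```

Sorting by the stored index is what keeps the output aligned with the input even when children finish out of order.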
@@ -154,8 +154,8 @@ The `toolsets` parameter controls what tools the subagent has access to. Choose
 | `["file"]` | Read-only analysis, code review without execution |
 | `["terminal"]` | System administration, process management |

-Certain toolsets are **always blocked** for subagents regardless of what you specify:
-- `delegation` — no recursive delegation (prevents infinite spawning)
+Certain toolsets are blocked for subagents regardless of what you specify:
+- `delegation` — blocked for leaf subagents (the default). Retained for `role="orchestrator"` children, bounded by `max_spawn_depth` — see [Depth Limit and Nested Orchestration](#depth-limit-and-nested-orchestration) below.
 - `clarify` — subagents cannot interact with the user
 - `memory` — no writes to shared persistent memory
 - `code_execution` — children should reason step-by-step
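As a rough illustration of the blocking rule in the hunk above, the filter below drops the three toolsets that are blocked for every child and drops `delegation` only for leaf children. The function name `child_toolsets` and the constant are hypothetical, chosen for this sketch; the real filtering code is not shown in this diff.

```python
# Toolsets blocked for every subagent, per the list above.
ALWAYS_BLOCKED = {"clarify", "memory", "code_execution"}


def child_toolsets(requested, role="leaf"):
    """Return the requested toolsets minus those blocked for the given role."""
    blocked = set(ALWAYS_BLOCKED)
    if role != "orchestrator":
        # Only orchestrator children keep the delegation toolset.
        blocked.add("delegation")
    return [name for name in requested if name not in blocked]
```

For example, a leaf child that asks for `["terminal", "file", "delegation"]` would end up with only `["terminal", "file"]`.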
@@ -173,16 +173,32 @@ delegate_task(
 )
 ```

-## Depth Limit
+## Depth Limit and Nested Orchestration

-Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children (depth 1), but children cannot delegate further. This prevents runaway recursive delegation chains.
+By default, delegation is **flat**: a parent (depth 0) spawns children (depth 1), and those children cannot delegate further. This prevents runaway recursive delegation.
+
+For multi-stage workflows (research → synthesis, or parallel orchestration over sub-problems), a parent can spawn **orchestrator** children that *can* delegate their own workers:
+
+```python
+delegate_task(
+    goal="Survey three code review approaches and recommend one",
+    role="orchestrator",  # Allows this child to spawn its own workers
+    context="...",
+)
+```
+
+- `role="leaf"` (default): child cannot delegate further — identical to the flat-delegation behavior.
+- `role="orchestrator"`: child retains the `delegation` toolset. Gated by `delegation.max_spawn_depth` (default **1** = flat, so `role="orchestrator"` is a no-op at defaults). Raise `max_spawn_depth` to 2 to allow orchestrator children to spawn leaf grandchildren; 3 for three levels (cap).
+- `delegation.orchestrator_enabled: false`: global kill switch that forces every child to `leaf` regardless of the `role` parameter.
+
+**Cost warning:** With `max_spawn_depth: 3` and `max_concurrent_children: 3`, the tree can reach 3×3×3 = 27 concurrent leaf agents. Each extra level multiplies spend — raise `max_spawn_depth` intentionally.

 ## Key Properties

 - Each subagent gets its **own terminal session** (separate from the parent)
-- **No nested delegation** — children cannot delegate further (no grandchildren)
-- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
-- **Interrupt propagation** — interrupting the parent interrupts all active children
+- **Nested delegation is opt-in** — only `role="orchestrator"` children can delegate further, and only when `max_spawn_depth` is raised from its default of 1 (flat). Disable globally with `orchestrator_enabled: false`.
+- Leaf subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`. Orchestrator subagents retain `delegate_task` but still cannot use the other four.
+- **Interrupt propagation** — interrupting the parent interrupts all active children (including grandchildren under orchestrators)
 - Only the final summary enters the parent's context, keeping token usage efficient
 - Subagents inherit the parent's **API key, provider configuration, and credential pool** (enabling key rotation on rate limits)

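The role gating and the cost warning in the hunk above reduce to a small amount of arithmetic. The sketch below is one plausible reading, assuming a child at depth `d` keeps the orchestrator role only while `d < max_spawn_depth`; both function names are hypothetical and the actual gating code is not part of this diff.

```python
def effective_role(requested_role, child_depth, max_spawn_depth=1, orchestrator_enabled=True):
    """Role a child at `child_depth` would get (parent is depth 0, its children depth 1)."""
    if not orchestrator_enabled:
        return "leaf"  # global kill switch
    if requested_role == "orchestrator" and child_depth < max_spawn_depth:
        return "orchestrator"
    return "leaf"


def max_concurrent_leaves(max_spawn_depth, max_concurrent_children):
    """Upper bound on concurrent leaf agents when every level fans out fully."""
    return max_concurrent_children ** max_spawn_depth
```

With the defaults (`max_spawn_depth=1`), a depth-1 child that requests `orchestrator` still resolves to `leaf`, which matches the "no-op at defaults" note; `max_concurrent_leaves(3, 3)` reproduces the 27-agent figure from the cost warning.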
@@ -193,7 +209,7 @@ Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children
 | **Reasoning** | Full LLM reasoning loop | Just Python code execution |
 | **Context** | Fresh isolated conversation | No conversation, just script |
 | **Tool access** | All non-blocked tools with reasoning | 7 tools via RPC, no reasoning |
-| **Parallelism** | Up to 3 concurrent subagents | Single script |
+| **Parallelism** | 3 concurrent subagents by default (configurable) | Single script |
 | **Best for** | Complex tasks needing judgment | Mechanical multi-step pipelines |
 | **Token cost** | Higher (full LLM loop) | Lower (only stdout returned) |
 | **User interaction** | None (subagents can't clarify) | None |
@@ -206,7 +222,9 @@ Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children
 # In ~/.hermes/config.yaml
 delegation:
   max_iterations: 50 # Max turns per child (default: 50)
   default_toolsets: ["terminal", "file", "web"] # Default toolsets
+  # max_concurrent_children: 3 # Parallel children per batch (default: 3)
+  # max_spawn_depth: 1 # Tree depth (1-3, default 1 = flat). Raise to 2 to allow orchestrator children to spawn leaves; 3 for three levels.
+  # orchestrator_enabled: true # Disable to force all children to leaf role.
   model: "google/gemini-3-flash-preview" # Optional provider/model override
   provider: "openrouter" # Optional built-in provider

@@ -1,13 +1,13 @@
 ---
 title: Image Generation
-description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, and more, selectable via `hermes tools`.
+description: Generate images via FAL.ai — 9 models including FLUX 2, GPT Image (1.5 & 2), Nano Banana Pro, Ideogram, Recraft V4 Pro, and more, selectable via `hermes tools`.
 sidebar_label: Image Generation
 sidebar_position: 6
 ---

 # Image Generation

-Hermes Agent generates images from text prompts via FAL.ai. Eight models are supported out of the box, each with different speed, quality, and cost tradeoffs. The active model is user-configurable via `hermes tools` and persists in `config.yaml`.
+Hermes Agent generates images from text prompts via FAL.ai. Nine models are supported out of the box, each with different speed, quality, and cost tradeoffs. The active model is user-configurable via `hermes tools` and persists in `config.yaml`.

 ## Supported Models

@@ -18,6 +18,7 @@ Hermes Agent generates images from text prompts via FAL.ai. Eight models are sup
 | `fal-ai/z-image/turbo` | ~2s | Bilingual EN/CN, 6B params | $0.005/MP |
 | `fal-ai/nano-banana-pro` | ~8s | Gemini 3 Pro, reasoning depth, text rendering | $0.15/image (1K) |
 | `fal-ai/gpt-image-1.5` | ~15s | Prompt adherence | $0.034/image |
+| `fal-ai/gpt-image-2` | ~20s | SOTA text rendering + CJK, world-aware photorealism | $0.04–0.06/image |
 | `fal-ai/ideogram/v3` | ~5s | Best typography | $0.03–0.09/image |
 | `fal-ai/recraft/v4/pro/text-to-image` | ~8s | Design, brand systems, production-ready | $0.25/image |
 | `fal-ai/qwen-image` | ~12s | LLM-based, complex text | $0.02/MP |
@@ -65,7 +66,7 @@ image_gen:

 ### GPT-Image Quality

-The `fal-ai/gpt-image-1.5` request quality is pinned to `medium` (~$0.034/image at 1024×1024). We don't expose the `low` / `high` tiers as a user-facing option so that Nous Portal billing stays predictable across all users — the cost spread between tiers is ~22×. If you want a cheaper GPT-Image option, pick a different model; if you want higher quality, use Klein 9B or Imagen-class models.
+The `fal-ai/gpt-image-1.5` and `fal-ai/gpt-image-2` request quality is pinned to `medium` (~$0.034–$0.06/image at 1024×1024). We don't expose the `low` / `high` tiers as a user-facing option so that Nous Portal billing stays predictable across all users — the cost spread between tiers is 3–22×. If you want a cheaper option, pick Klein 9B or Z-Image Turbo; if you want higher quality, use Nano Banana Pro or Recraft V4 Pro.

 ## Usage

@@ -87,11 +88,13 @@ Make me a futuristic cityscape, landscape orientation

 Every model accepts the same three aspect ratios from the agent's perspective. Internally, each model's native size spec is filled in automatically:

-| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image) |
-|---|---|---|---|
-| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` |
-| `square` | `square_hd` | `1:1` | `1024x1024` |
-| `portrait` | `portrait_16_9` | `9:16` | `1024x1536` |
+| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image-1.5) | image_size (gpt-image-2) |
+|---|---|---|---|---|
+| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` | `landscape_4_3` (1024×768) |
+| `square` | `square_hd` | `1:1` | `1024x1024` | `square_hd` (1024×1024) |
+| `portrait` | `portrait_16_9` | `9:16` | `1024x1536` | `portrait_4_3` (768×1024) |
+
+GPT Image 2 maps to 4:3 presets rather than 16:9 because its minimum pixel count is 655,360 — the `landscape_16_9` preset (1024×576 = 589,824) would be rejected.

 This translation happens in `_build_fal_payload()` — agent code never has to know about per-model schema differences.

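The pixel-floor argument in the hunk above is easy to check numerically. The sketch below uses only the preset dimensions and the 655,360-pixel minimum quoted in the diff; the dictionary and function names are hypothetical and this is not the body of `_build_fal_payload()`.

```python
GPT_IMAGE_2_MIN_PIXELS = 655_360

# Preset dimensions as quoted in the table and note above.
PRESET_DIMENSIONS = {
    "landscape_16_9": (1024, 576),
    "landscape_4_3": (1024, 768),
    "square_hd": (1024, 1024),
    "portrait_4_3": (768, 1024),
}

# Agent-facing aspect ratio -> preset used for gpt-image-2.
GPT_IMAGE_2_PRESETS = {
    "landscape": "landscape_4_3",
    "square": "square_hd",
    "portrait": "portrait_4_3",
}


def meets_pixel_floor(preset, min_pixels=GPT_IMAGE_2_MIN_PIXELS):
    """True if the preset's width x height reaches the minimum pixel count."""
    width, height = PRESET_DIMENSIONS[preset]
    return width * height >= min_pixels
```

Under these numbers `landscape_16_9` falls short of the floor while all three 4:3 and square presets clear it, which is the stated reason for the different mapping.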
@@ -359,7 +359,11 @@ The setup wizard installs dependencies automatically and only installs what's ne
 | `auto_retain` | `true` | Automatically retain conversation turns |
 | `auto_recall` | `true` | Automatically recall memories before each turn |
 | `retain_async` | `true` | Process retain asynchronously on the server |
-| `tags` | — | Tags applied when storing memories |
+| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
+| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
+| `retain_source` | — | Optional `metadata.source` attached to retained memories |
+| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
+| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
 | `recall_tags` | — | Tags to filter on recall |

 See [plugin README](https://github.com/NousResearch/hermes-agent/blob/main/plugins/memory/hindsight/README.md) for the full configuration reference.
@@ -20,7 +20,7 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
 ## Automation

 - **[Scheduled Tasks (Cron)](cron.md)** — Schedule tasks to run automatically with natural language or cron expressions. Jobs can attach skills, deliver results to any platform, and support pause/resume/edit operations.
-- **[Subagent Delegation](delegation.md)** — The `delegate_task` tool spawns child agent instances with isolated context, restricted toolsets, and their own terminal sessions. Run up to 3 concurrent subagents for parallel workstreams.
+- **[Subagent Delegation](delegation.md)** — The `delegate_task` tool spawns child agent instances with isolated context, restricted toolsets, and their own terminal sessions. Run 3 concurrent subagents by default (configurable) for parallel workstreams.
 - **[Code Execution](code-execution.md)** — The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn via sandboxed RPC execution.
 - **[Event Hooks](hooks.md)** — Run custom code at key lifecycle points. Gateway hooks handle logging, alerts, and webhooks; plugin hooks handle tool interception, metrics, and guardrails.
 - **[Batch Processing](batch-processing.md)** — Run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.
@@ -14,7 +14,7 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,

 ## Text-to-Speech

-Convert text to speech with eight providers:
+Convert text to speech with nine providers:

 | Provider | Quality | Cost | API Key |
 |----------|---------|------|---------|
@@ -25,7 +25,8 @@ Convert text to speech with eight providers:
 | **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
 | **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` |
 | **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
-| **NeuTTS** | Good | Free | None needed |
+| **NeuTTS** | Good | Free (local) | None needed |
+| **KittenTTS** | Good | Free (local) | None needed |

 ### Platform Delivery

@@ -41,7 +42,7 @@ Convert text to speech with eight providers:
 ```yaml
 # In ~/.hermes/config.yaml
 tts:
-  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts"
+  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
   speed: 1.0 # Global speed multiplier (provider-specific settings override this)
   edge:
     voice: "en-US-AriaNeural" # 322 voices, 74 languages
@@ -77,6 +78,11 @@ tts:
     ref_text: ''
     model: neuphonic/neutts-air-q4-gguf
     device: cpu
+  kittentts:
+    model: KittenML/kitten-tts-nano-0.8-int8 # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
+    voice: Jasper # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
+    speed: 1.0 # 0.5 - 2.0
+    clean_text: true # Expand numbers, currencies, units
 ```

 **Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
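The speed precedence described in the last context line above (provider-specific `speed` first, then global `tts.speed`, then `1.0`) can be sketched as a small lookup. The helper name `resolve_tts_speed` is hypothetical and it takes the already-parsed `tts` section as a plain dict; the real resolution code is not part of this diff.

```python
def resolve_tts_speed(tts_config, provider=None):
    """Provider-specific speed wins; otherwise the global value; otherwise 1.0."""
    provider = provider or tts_config.get("provider", "edge")
    provider_cfg = tts_config.get(provider) or {}
    if provider_cfg.get("speed") is not None:
        return float(provider_cfg["speed"])
    return float(tts_config.get("speed", 1.0))
```

For a config with `speed: 1.2` and `openai: {speed: 1.5}`, the OpenAI provider would resolve to 1.5 while every other provider falls back to 1.2.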
@@ -91,6 +97,7 @@ Telegram voice bubbles require Opus/OGG audio format:
 - **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles
 - **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
 - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
+- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles

 ```bash
 # Ubuntu/Debian
@@ -103,7 +110,7 @@ brew install ffmpeg
 sudo dnf install ffmpeg
 ```

-Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
+Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, and KittenTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).

 :::tip
 If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
@@ -27,50 +27,52 @@ How you attach an image depends on your terminal environment. Not all methods wo

 ### `/paste` Command

-**The most reliable method. Works everywhere.**
+**The most reliable explicit image-attach fallback.**

 ```
 /paste
 ```

-Type `/paste` and press Enter. Hermes checks your clipboard for an image and attaches it. This works in every environment because it explicitly calls the clipboard backend — no terminal keybinding interception to worry about.
+Type `/paste` and press Enter. Hermes checks your clipboard for an image and attaches it. This is the safest option when your terminal rewrites `Cmd+V`/`Ctrl+V`, or when you copied only an image and there is no bracketed-paste text payload to inspect.

-### Ctrl+V / Cmd+V (Bracketed Paste)
+### Ctrl+V / Cmd+V

-When you paste text that's on the clipboard alongside an image, Hermes automatically checks for an image too. This works when:
-- Your clipboard contains **both text and an image** (some apps put both on the clipboard when you copy)
-- Your terminal supports bracketed paste (most modern terminals do)
+Hermes now treats paste as a layered flow:
+- normal text paste first
+- native clipboard / OSC52 text fallback if the terminal did not deliver text cleanly
+- image attach when the clipboard or pasted payload resolves to an image or image path
+
+This means pasted macOS screenshot temp paths and `file://...` image URIs can attach immediately instead of sitting in the composer as raw text.

 :::warning
-If your clipboard has **only an image** (no text), Ctrl+V does nothing in most terminals. Terminals can only paste text — there's no standard mechanism to paste binary image data. Use `/paste` or Alt+V instead.
+If your clipboard has **only an image** (no text), terminals still cannot send binary image bytes directly. Use `/paste` as the explicit image-attach fallback.
 :::

-### Alt+V
+### `/terminal-setup` for VS Code / Cursor / Windsurf

-Alt key combinations pass through most terminal emulators (they're sent as ESC + key rather than being intercepted). Press `Alt+V` to check the clipboard for an image.
+If you run the TUI inside a local VS Code-family integrated terminal on macOS, Hermes can install the recommended `workbench.action.terminal.sendSequence` bindings for better multiline and undo/redo parity:

-:::caution
-**Does not work in VSCode's integrated terminal.** VSCode intercepts many Alt+key combos for its own UI. Use `/paste` instead.
-:::
+```text
+/terminal-setup
+```

-### Ctrl+V (Raw — Linux Only)
-
-On Linux desktop terminals (GNOME Terminal, Konsole, Alacritty, etc.), `Ctrl+V` is **not** the paste shortcut — `Ctrl+Shift+V` is. So `Ctrl+V` sends a raw byte to the application, and Hermes catches it to check the clipboard. This only works on Linux desktop terminals with X11 or Wayland clipboard access.
+This is especially useful when `Cmd+Enter`, `Cmd+Z`, or `Shift+Cmd+Z` are being intercepted by the IDE. Run it on the local machine only — not inside an SSH session.

 ## Platform Compatibility

-| Environment | `/paste` | Ctrl+V text+image | Alt+V | Notes |
+| Environment | `/paste` | Cmd/Ctrl+V | `/terminal-setup` | Notes |
 |---|:---:|:---:|:---:|---|
-| **macOS Terminal / iTerm2** | ✅ | ✅ | ✅ | Best experience — `osascript` always available |
-| **Linux X11 desktop** | ✅ | ✅ | ✅ | Requires `xclip` (`apt install xclip`) |
-| **Linux Wayland desktop** | ✅ | ✅ | ✅ | Requires `wl-paste` (`apt install wl-clipboard`) |
-| **WSL2 (Windows Terminal)** | ✅ | ✅¹ | ✅ | Uses `powershell.exe` — no extra install needed |
-| **VSCode Terminal (local)** | ✅ | ✅¹ | ❌ | VSCode intercepts Alt+key |
-| **VSCode Terminal (SSH)** | ❌² | ❌² | ❌ | Remote clipboard not accessible |
-| **SSH terminal (any)** | ❌² | ❌² | ❌² | Remote clipboard not accessible |
+| **macOS Terminal / iTerm2** | ✅ | ✅ | n/a | Best experience — native clipboard + screenshot-path recovery |
+| **Apple Terminal** | ✅ | ✅ | n/a | If Cmd+←/→/⌫ gets rewritten, use Ctrl+A / Ctrl+E / Ctrl+U fallbacks |
+| **Linux X11 desktop** | ✅ | ✅ | n/a | Requires `xclip` (`apt install xclip`) |
+| **Linux Wayland desktop** | ✅ | ✅ | n/a | Requires `wl-paste` (`apt install wl-clipboard`) |
+| **WSL2 (Windows Terminal)** | ✅ | ✅ | n/a | Uses `powershell.exe` — no extra install needed |
+| **VS Code / Cursor / Windsurf (local)** | ✅ | ✅ | ✅ | Recommended for better Cmd+Enter / undo / redo parity |
+| **VS Code / Cursor / Windsurf (SSH)** | ❌² | ❌² | ❌³ | Run `/terminal-setup` on the local machine instead |
+| **SSH terminal (any)** | ❌² | ❌² | n/a | Remote clipboard not accessible |

-¹ Only when clipboard has both text and an image (image-only clipboard = nothing happens)
 ² See [SSH & Remote Sessions](#ssh--remote-sessions) below
+³ The command writes local IDE keybindings and should not be run from the remote host

 ## Platform-Specific Setup

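The last layer of the paste flow described in the hunk above (a pasted payload that resolves to an image path or a `file://` URI) might look roughly like the heuristic below. Everything here is an assumption made for illustration: the function name `pasted_image_path`, the suffix list, and the decision not to check whether the file exists are not taken from the Hermes source.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

# Hypothetical set of suffixes treated as images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


def pasted_image_path(payload):
    """Return a Path if the pasted text looks like an image path or file:// URI, else None."""
    text = payload.strip().strip("'\"")
    if not text or "\n" in text:
        return None  # empty or multi-line pastes are treated as ordinary text
    if text.startswith("file://"):
        # Decode percent-escapes such as %20 in screenshot file names.
        text = unquote(urlparse(text).path)
    path = Path(text)
    if path.suffix.lower() in IMAGE_SUFFIXES:
        return path
    return None
```

A real implementation would presumably also confirm that the path exists and is readable before attaching it; the sketch only shows the classification step.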
@@ -145,7 +147,9 @@ powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms;

 ## SSH & Remote Sessions

-**Clipboard paste does not work over SSH.** When you SSH into a remote machine, the Hermes CLI runs on the remote host. All clipboard tools (`xclip`, `wl-paste`, `powershell.exe`, `osascript`) read the clipboard of the machine they run on — which is the remote server, not your local machine. Your local clipboard is inaccessible from the remote side.
+**Clipboard image paste does not fully work over SSH.** When you SSH into a remote machine, the Hermes CLI runs on the remote host. Clipboard tools (`xclip`, `wl-paste`, `powershell.exe`, `osascript`) read the clipboard of the machine they run on — which is the remote server, not your local machine. Your local clipboard image is therefore inaccessible from the remote side.
+
+Text can sometimes still bridge through terminal paste or OSC52, but image clipboard access and local screenshot temp paths remain tied to the machine running Hermes.

 ### Workarounds for SSH

@@ -149,7 +149,7 @@ Two-stage algorithm detects when you've finished speaking:

 If no speech is detected at all for 15 seconds, recording stops automatically.

-Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`.
+Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`. You can also disable the record start/stop beeps with `voice.beep_enabled: false`.

 ### Streaming TTS

@@ -383,6 +383,7 @@ voice:
   record_key: "ctrl+b" # Key to start/stop recording
   max_recording_seconds: 120 # Maximum recording length
   auto_tts: false # Auto-enable TTS when voice mode starts
+  beep_enabled: true # Play record start/stop beeps
   silence_threshold: 200 # RMS level (0-32767) below which counts as silence
   silence_duration: 3.0 # Seconds of silence before auto-stop

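To make the `silence_threshold` and `silence_duration` comments above concrete, here is a simplified, single-stage sketch of RMS-based auto-stop over signed 16-bit PCM frames. It is not the two-stage algorithm the docs refer to; `rms_level`, `should_auto_stop`, and the frame layout (little-endian 16-bit samples, fixed frame length in seconds) are assumptions for illustration.

```python
import math
import struct


def rms_level(pcm_bytes):
    """RMS of signed 16-bit little-endian samples (range 0 to 32767)."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def should_auto_stop(frames, frame_seconds, silence_threshold=200, silence_duration=3.0):
    """True once consecutive quiet frames add up to silence_duration seconds."""
    quiet = 0.0
    for frame in frames:
        if rms_level(frame) < silence_threshold:
            quiet += frame_seconds
            if quiet >= silence_duration:
                return True
        else:
            quiet = 0.0  # any loud frame resets the silence timer
    return False
```

With half-second frames and the default settings, six consecutive quiet frames trigger the stop, while a single loud frame in between restarts the count.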