mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
This commit is contained in:
parent
eb93f88e1d
commit
0f6eabb890
139 changed files with 43523 additions and 306 deletions
|
|
@ -0,0 +1,161 @@
|
|||
---
|
||||
title: "Blackbox — Delegate coding tasks to Blackbox AI CLI agent"
|
||||
sidebar_label: "Blackbox"
|
||||
description: "Delegate coding tasks to Blackbox AI CLI agent"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Blackbox
|
||||
|
||||
Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result. Requires the blackbox CLI and a Blackbox AI API key.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/autonomous-ai-agents/blackbox` |
|
||||
| Path | `optional-skills/autonomous-ai-agents/blackbox` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent (Nous Research) |
|
||||
| License | MIT |
|
||||
| Tags | `Coding-Agent`, `Blackbox`, `Multi-Agent`, `Judge`, `Multi-Model` |
|
||||
| Related skills | [`claude-code`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-claude-code), [`codex`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-codex), [`hermes-agent`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Blackbox CLI
|
||||
|
||||
Delegate coding tasks to [Blackbox AI](https://www.blackbox.ai/) via the Hermes terminal. Blackbox is a multi-model coding agent CLI that dispatches tasks to multiple LLMs (Claude, Codex, Gemini, Blackbox Pro) and uses a judge to select the best implementation.
|
||||
|
||||
The CLI is [open-source](https://github.com/blackboxaicode/cli) (GPL-3.0, TypeScript, forked from Gemini CLI) and supports interactive sessions, non-interactive one-shots, checkpointing, MCP, and vision model switching.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Node.js 20+ installed
|
||||
- Blackbox CLI installed: `npm install -g @blackboxai/cli`
|
||||
- Or install from source:
|
||||
```
|
||||
git clone https://github.com/blackboxaicode/cli.git
|
||||
cd cli && npm install && npm install -g .
|
||||
```
|
||||
- API key from [app.blackbox.ai/dashboard](https://app.blackbox.ai/dashboard)
|
||||
- Configured: run `blackbox configure` and enter your API key
|
||||
- Use `pty=true` in terminal calls — Blackbox CLI is an interactive terminal app
|
||||
|
||||
## One-Shot Tasks
|
||||
|
||||
```
|
||||
terminal(command="blackbox --prompt 'Add JWT authentication with refresh tokens to the Express API'", workdir="/path/to/project", pty=true)
|
||||
```
|
||||
|
||||
For quick scratch work:
|
||||
```
|
||||
terminal(command="cd $(mktemp -d) && git init && blackbox --prompt 'Build a REST API for todos with SQLite'", pty=true)
|
||||
```
|
||||
|
||||
## Background Mode (Long Tasks)
|
||||
|
||||
For tasks that take minutes, use background mode so you can monitor progress:
|
||||
|
||||
```
|
||||
# Start in background with PTY
|
||||
terminal(command="blackbox --prompt 'Refactor the auth module to use OAuth 2.0'", workdir="~/project", background=true, pty=true)
|
||||
# Returns session_id
|
||||
|
||||
# Monitor progress
|
||||
process(action="poll", session_id="<id>")
|
||||
process(action="log", session_id="<id>")
|
||||
|
||||
# Send input if Blackbox asks a question
|
||||
process(action="submit", session_id="<id>", data="yes")
|
||||
|
||||
# Kill if needed
|
||||
process(action="kill", session_id="<id>")
|
||||
```
|
||||
|
||||
## Checkpoints & Resume
|
||||
|
||||
Blackbox CLI has built-in checkpoint support for pausing and resuming tasks:
|
||||
|
||||
```
|
||||
# After a task completes, Blackbox shows a checkpoint tag
|
||||
# Resume with a follow-up task:
|
||||
terminal(command="blackbox --resume-checkpoint 'task-abc123-2026-03-06' --prompt 'Now add rate limiting to the endpoints'", workdir="~/project", pty=true)
|
||||
```
|
||||
|
||||
## Session Commands
|
||||
|
||||
During an interactive session, use these commands:
|
||||
|
||||
| Command | Effect |
|
||||
|---------|--------|
|
||||
| `/compress` | Shrink conversation history to save tokens |
|
||||
| `/clear` | Wipe history and start fresh |
|
||||
| `/stats` | View current token usage |
|
||||
| `Ctrl+C` | Cancel current operation |
|
||||
|
||||
## PR Reviews
|
||||
|
||||
Clone to a temp directory to avoid modifying the working tree:
|
||||
|
||||
```
|
||||
terminal(command="REVIEW=$(mktemp -d) && git clone https://github.com/user/repo.git $REVIEW && cd $REVIEW && gh pr checkout 42 && blackbox --prompt 'Review this PR against main. Check for bugs, security issues, and code quality.'", pty=true)
|
||||
```
|
||||
|
||||
## Parallel Work
|
||||
|
||||
Spawn multiple Blackbox instances for independent tasks:
|
||||
|
||||
```
|
||||
terminal(command="blackbox --prompt 'Fix the login bug'", workdir="/tmp/issue-1", background=true, pty=true)
|
||||
terminal(command="blackbox --prompt 'Add unit tests for auth'", workdir="/tmp/issue-2", background=true, pty=true)
|
||||
|
||||
# Monitor all
|
||||
process(action="list")
|
||||
```
|
||||
|
||||
## Multi-Model Mode
|
||||
|
||||
Blackbox's unique feature is running the same task through multiple models and judging the results. Configure which models to use via `blackbox configure` — select multiple providers to enable the Chairman/judge workflow where the CLI evaluates outputs from different models and picks the best one.
|
||||
|
||||
## Key Flags
|
||||
|
||||
| Flag | Effect |
|
||||
|------|--------|
|
||||
| `--prompt "task"` | Non-interactive one-shot execution |
|
||||
| `--resume-checkpoint "tag"` | Resume from a saved checkpoint |
|
||||
| `--yolo` | Auto-approve all actions and model switches |
|
||||
| `blackbox session` | Start interactive chat session |
|
||||
| `blackbox configure` | Change settings, providers, models |
|
||||
| `blackbox info` | Display system information |
|
||||
|
||||
## Vision Support
|
||||
|
||||
Blackbox automatically detects images in input and can switch to multimodal analysis. VLM modes:
|
||||
- `"once"` — Switch model for current query only
|
||||
- `"session"` — Switch for entire session
|
||||
- `"persist"` — Stay on current model (no switch)
|
||||
|
||||
## Token Limits
|
||||
|
||||
Control token usage via `.blackboxcli/settings.json`:
|
||||
```json
|
||||
{
|
||||
"sessionTokenLimit": 32000
|
||||
}
|
||||
```
|
||||
|
||||
## Rules
|
||||
|
||||
1. **Always use `pty=true`** — Blackbox CLI is an interactive terminal app and will hang without a PTY
|
||||
2. **Use `workdir`** — keep the agent focused on the right directory
|
||||
3. **Background for long tasks** — use `background=true` and monitor with `process` tool
|
||||
4. **Don't interfere** — monitor with `poll`/`log`, don't kill sessions because they're slow
|
||||
5. **Report results** — after completion, check what changed and summarize for the user
|
||||
6. **Credits cost money** — Blackbox uses a credit-based system; multi-model mode consumes credits faster
|
||||
7. **Check prerequisites** — verify `blackbox` CLI is installed before attempting delegation
|
||||
|
|
@ -0,0 +1,445 @@
|
|||
---
|
||||
title: "Honcho"
|
||||
sidebar_label: "Honcho"
|
||||
description: "Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session su..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Honcho
|
||||
|
||||
Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/autonomous-ai-agents/honcho` |
|
||||
| Path | `optional-skills/autonomous-ai-agents/honcho` |
|
||||
| Version | `2.0.0` |
|
||||
| Author | Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `Honcho`, `Memory`, `Profiles`, `Observation`, `Dialectic`, `User-Modeling`, `Session-Summary` |
|
||||
| Related skills | [`hermes-agent`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Honcho Memory for Hermes
|
||||
|
||||
Honcho provides AI-native cross-session user modeling. It learns who the user is across conversations and gives every Hermes profile its own peer identity while sharing a unified view of the user.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Setting up Honcho (cloud or self-hosted)
|
||||
- Troubleshooting memory not working / peers not syncing
|
||||
- Creating multi-profile setups where each agent has its own Honcho peer
|
||||
- Tuning observation, recall, dialectic depth, or write frequency settings
|
||||
- Understanding what the 5 Honcho tools do and when to use them
|
||||
- Configuring context budgets and session summary injection
|
||||
|
||||
## Setup
|
||||
|
||||
### Cloud (app.honcho.dev)
|
||||
|
||||
```bash
|
||||
hermes honcho setup
|
||||
# select "cloud", paste API key from https://app.honcho.dev
|
||||
```
|
||||
|
||||
### Self-hosted
|
||||
|
||||
```bash
|
||||
hermes honcho setup
|
||||
# select "local", enter base URL (e.g. http://localhost:8000)
|
||||
```
|
||||
|
||||
See: https://docs.honcho.dev/v3/guides/integrations/hermes#running-honcho-locally-with-hermes
|
||||
|
||||
### Verify
|
||||
|
||||
```bash
|
||||
hermes honcho status # shows resolved config, connection test, peer info
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Base Context Injection
|
||||
|
||||
When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
|
||||
|
||||
1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
|
||||
2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
|
||||
3. **AI peer card** -- the identity card for this Hermes profile's AI peer
|
||||
|
||||
The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
|
||||
|
||||
### Cold / Warm Prompt Selection
|
||||
|
||||
Honcho automatically selects between two prompt strategies:
|
||||
|
||||
| Condition | Strategy | What happens |
|
||||
|-----------|----------|--------------|
|
||||
| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
|
||||
| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
|
||||
|
||||
You do not need to configure this -- it is automatic based on session state.
|
||||
|
||||
### Peers
|
||||
|
||||
Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
|
||||
|
||||
- **User peer** (`peerName`): represents the human. Honcho builds a user representation from observed messages.
|
||||
- **AI peer** (`aiPeer`): represents this Hermes instance. Each profile gets its own AI peer so agents develop independent views.
|
||||
|
||||
### Observation
|
||||
|
||||
Each peer has two observation toggles that control what Honcho learns from:
|
||||
|
||||
| Toggle | What it does |
|
||||
|--------|-------------|
|
||||
| `observeMe` | Peer's own messages are observed (builds self-representation) |
|
||||
| `observeOthers` | Other peers' messages are observed (builds cross-peer understanding) |
|
||||
|
||||
Default: all four toggles **on** (full bidirectional observation).
|
||||
|
||||
Configure per-peer in `honcho.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"observation": {
|
||||
"user": { "observeMe": true, "observeOthers": true },
|
||||
"ai": { "observeMe": true, "observeOthers": true }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Or use the shorthand presets:
|
||||
|
||||
| Preset | User | AI | Use case |
|
||||
|--------|------|----|----------|
|
||||
| `"directional"` (default) | me:on, others:on | me:on, others:on | Multi-agent, full memory |
|
||||
| `"unified"` | me:on, others:off | me:off, others:on | Single agent, user-only modeling |
|
||||
|
||||
Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init -- server-side config wins over local defaults.
|
||||
|
||||
### Sessions
|
||||
|
||||
Honcho sessions scope where messages and observations land. Strategy options:
|
||||
|
||||
| Strategy | Behavior |
|
||||
|----------|----------|
|
||||
| `per-directory` (default) | One session per working directory |
|
||||
| `per-repo` | One session per git repository root |
|
||||
| `per-session` | New Honcho session each Hermes run |
|
||||
| `global` | Single session across all directories |
|
||||
|
||||
Manual override: `hermes honcho map my-project-name`
|
||||
|
||||
### Recall Modes
|
||||
|
||||
How the agent accesses Honcho memory:
|
||||
|
||||
| Mode | Auto-inject context? | Tools available? | Use case |
|
||||
|------|---------------------|-----------------|----------|
|
||||
| `hybrid` (default) | Yes | Yes | Agent decides when to use tools vs auto context |
|
||||
| `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
|
||||
| `tools` | No | Yes | Agent controls all memory access explicitly |
|
||||
|
||||
## Three Orthogonal Knobs
|
||||
|
||||
Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
|
||||
|
||||
### Cadence (when)
|
||||
|
||||
Controls **how often** dialectic and context calls happen.
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `contextCadence` | `1` | Min turns between context API calls |
|
||||
| `dialecticCadence` | `2` | Min turns between dialectic API calls. Recommended 1–5 |
|
||||
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
|
||||
|
||||
Higher cadence values fire the dialectic LLM less often. `dialecticCadence: 2` means the engine fires every other turn. Setting it to `1` fires every turn.
|
||||
|
||||
### Depth (how many)
|
||||
|
||||
Controls **how many rounds** of dialectic reasoning Honcho performs per query.
|
||||
|
||||
| Key | Default | Range | Description |
|
||||
|-----|---------|-------|-------------|
|
||||
| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
|
||||
| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
|
||||
|
||||
`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
|
||||
|
||||
`dialecticDepthLevels` lets you set the reasoning level for each round independently:
|
||||
|
||||
```json
|
||||
{
|
||||
"dialecticDepth": 3,
|
||||
"dialecticDepthLevels": ["low", "medium", "high"]
|
||||
}
|
||||
```
|
||||
|
||||
If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
|
||||
|
||||
| Depth | Pass levels |
|
||||
|-------|-------------|
|
||||
| 1 | [base] |
|
||||
| 2 | [minimal, base] |
|
||||
| 3 | [minimal, base, low] |
|
||||
|
||||
This keeps earlier passes cheap while using full depth on the final synthesis.
|
||||
|
||||
**Depth at session start.** The session-start prewarm runs the full configured `dialecticDepth` in the background before turn 1. A single-pass prewarm on a cold peer often returns thin output — multi-pass depth runs the audit/reconcile cycle before the user ever speaks. Turn 1 consumes the prewarm result directly; if prewarm hasn't landed in time, turn 1 falls back to a synchronous call with a bounded timeout.
|
||||
|
||||
### Level (how hard)
|
||||
|
||||
Controls the **intensity** of each dialectic reasoning round.
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
|
||||
| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
|
||||
|
||||
Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
|
||||
|
||||
## Multi-Profile Setup
|
||||
|
||||
Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
|
||||
|
||||
- All profiles see the same user representation
|
||||
- Each profile builds its own AI identity and observations
|
||||
- Conclusions written by one profile are visible to others via the shared workspace
|
||||
|
||||
### Create a profile with Honcho peer
|
||||
|
||||
```bash
|
||||
hermes profile create coder --clone
|
||||
# creates host block hermes.coder, AI peer "coder", inherits config from default
|
||||
```
|
||||
|
||||
What `--clone` does for Honcho:
|
||||
1. Creates a `hermes.coder` host block in `honcho.json`
|
||||
2. Sets `aiPeer: "coder"` (the profile name)
|
||||
3. Inherits `workspace`, `peerName`, `writeFrequency`, `recallMode`, etc. from default
|
||||
4. Eagerly creates the peer in Honcho so it exists before first message
|
||||
|
||||
### Backfill existing profiles
|
||||
|
||||
```bash
|
||||
hermes honcho sync # creates host blocks for all profiles that don't have one yet
|
||||
```
|
||||
|
||||
### Per-profile config
|
||||
|
||||
Override any setting in the host block:
|
||||
|
||||
```json
|
||||
{
|
||||
"hosts": {
|
||||
"hermes.coder": {
|
||||
"aiPeer": "coder",
|
||||
"recallMode": "tools",
|
||||
"dialecticDepth": 2,
|
||||
"observation": {
|
||||
"user": { "observeMe": true, "observeOthers": false },
|
||||
"ai": { "observeMe": true, "observeOthers": true }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Tools
|
||||
|
||||
The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
|
||||
|
||||
| Tool | LLM call? | Cost | Use when |
|
||||
|------|-----------|------|----------|
|
||||
| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
|
||||
| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
|
||||
| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
|
||||
| `honcho_reasoning` | Yes | medium–high | Natural language question synthesized by Honcho's dialectic engine |
|
||||
| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |
|
||||
|
||||
### `honcho_profile`
|
||||
Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.
|
||||
|
||||
### `honcho_search`
|
||||
Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.
|
||||
|
||||
### `honcho_context`
|
||||
Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
|
||||
|
||||
### `honcho_reasoning`
|
||||
Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low` → `medium` → `high` → `max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.
|
||||
|
||||
### `honcho_conclude`
|
||||
Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
|
||||
|
||||
### Bidirectional peer targeting
|
||||
|
||||
All 5 tools accept an optional `peer` parameter:
|
||||
- `peer: "user"` (default) — operates on the user peer
|
||||
- `peer: "ai"` — operates on this profile's AI peer
|
||||
- `peer: "<explicit-id>"` — any peer ID in the workspace
|
||||
|
||||
Examples:
|
||||
```
|
||||
honcho_profile # read user's card
|
||||
honcho_profile peer="ai" # read AI peer's card
|
||||
honcho_reasoning query="What does this user care about most?"
|
||||
honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
|
||||
honcho_conclude conclusion="Prefers terse answers"
|
||||
honcho_conclude conclusion="I tend to over-explain code" peer="ai"
|
||||
honcho_conclude delete_id="abc123" # PII removal
|
||||
```
|
||||
|
||||
## Agent Usage Patterns
|
||||
|
||||
Guidelines for Hermes when Honcho memory is active.
|
||||
|
||||
### On conversation start
|
||||
|
||||
```
|
||||
1. honcho_profile → fast warmup, no LLM cost
|
||||
2. If context looks thin → honcho_context (full snapshot, still no LLM)
|
||||
3. If deep synthesis needed → honcho_reasoning (LLM call, use sparingly)
|
||||
```
|
||||
|
||||
Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
|
||||
|
||||
### When the user shares something to remember
|
||||
|
||||
```
|
||||
honcho_conclude conclusion="<specific, actionable fact>"
|
||||
```
|
||||
|
||||
Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
|
||||
Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
|
||||
|
||||
### When the user asks about past context / you need to recall specifics
|
||||
|
||||
```
|
||||
honcho_search query="<topic>" → fast, no LLM, good for specific facts
|
||||
honcho_context → full snapshot with summary + messages
|
||||
honcho_reasoning query="<question>" → synthesized answer, use when search isn't enough
|
||||
```
|
||||
|
||||
### When to use `peer: "ai"`
|
||||
|
||||
Use AI peer targeting to build and query the agent's own self-knowledge:
|
||||
- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
|
||||
- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
|
||||
- `honcho_profile peer="ai"` — review own identity card
|
||||
|
||||
### When NOT to call tools
|
||||
|
||||
In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
|
||||
- You need something the injected context doesn't have
|
||||
- The user explicitly asks you to recall or check memory
|
||||
- You're writing a conclusion about something new
|
||||
|
||||
### Cadence awareness
|
||||
|
||||
`honcho_reasoning` on the tool side shares the same cost as auto-injection dialectic. After an explicit tool call, the auto-injection cadence resets — avoiding double-charging the same turn.
|
||||
|
||||
## Config Reference
|
||||
|
||||
Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.json` (global).
|
||||
|
||||
### Key settings
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `apiKey` | -- | API key ([get one](https://app.honcho.dev)) |
|
||||
| `baseUrl` | -- | Base URL for self-hosted Honcho |
|
||||
| `peerName` | -- | User peer identity |
|
||||
| `aiPeer` | host key | AI peer identity |
|
||||
| `workspace` | host key | Shared workspace ID |
|
||||
| `recallMode` | `hybrid` | `hybrid`, `context`, or `tools` |
|
||||
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
|
||||
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
|
||||
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
|
||||
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
|
||||
|
||||
### Dialectic settings
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
|
||||
| `dialecticDynamic` | `true` | Auto-bump reasoning by query complexity. `false` = fixed level |
|
||||
| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
|
||||
| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
|
||||
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
|
||||
|
||||
### Context budget and injection
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
|
||||
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
|
||||
| `contextCadence` | `1` | Min turns between context API calls |
|
||||
| `dialecticCadence` | `2` | Min turns between dialectic LLM calls (recommended 1–5) |
|
||||
|
||||
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
|
||||
|
||||
### Memory-context sanitization
|
||||
|
||||
Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
|
||||
|
||||
- Strips XML/HTML tags from user-authored conclusions
|
||||
- Normalizes whitespace and control characters
|
||||
- Truncates individual conclusions that exceed `messageMaxChars`
|
||||
- Escapes delimiter sequences that could break the system prompt structure
|
||||
|
||||
This fix addresses edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Honcho not configured"
|
||||
Run `hermes honcho setup`. Ensure `memory.provider: honcho` is in `~/.hermes/config.yaml`.
|
||||
|
||||
### Memory not persisting across sessions
|
||||
Check `hermes honcho status` -- verify `saveMessages: true` and `writeFrequency` isn't `session` (which only writes on exit).
|
||||
|
||||
### Profile not getting its own peer
|
||||
Use `--clone` when creating: `hermes profile create <name> --clone`. For existing profiles: `hermes honcho sync`.
|
||||
|
||||
### Observation changes in dashboard not reflected
|
||||
Observation config is synced from the server on each session init. Start a new session after changing settings in the Honcho UI.
|
||||
|
||||
### Messages truncated
|
||||
Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.
|
||||
|
||||
### Context injection too large
|
||||
If you see warnings about context budget exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
|
||||
|
||||
### Session summary missing
|
||||
Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `hermes honcho setup` | Interactive setup wizard (cloud/local, identity, observation, recall, sessions) |
|
||||
| `hermes honcho status` | Show resolved config, connection test, peer info for active profile |
|
||||
| `hermes honcho enable` | Enable Honcho for the active profile (creates host block if needed) |
|
||||
| `hermes honcho disable` | Disable Honcho for the active profile |
|
||||
| `hermes honcho peer` | Show or update peer names (`--user <name>`, `--ai <name>`, `--reasoning <level>`) |
|
||||
| `hermes honcho peers` | Show peer identities across all profiles |
|
||||
| `hermes honcho mode` | Show or set recall mode (`hybrid`, `context`, `tools`) |
|
||||
| `hermes honcho tokens` | Show or set token budgets (`--context <N>`, `--dialectic <N>`) |
|
||||
| `hermes honcho sessions` | List known directory-to-session-name mappings |
|
||||
| `hermes honcho map <name>` | Map current working directory to a Honcho session name |
|
||||
| `hermes honcho identity` | Seed AI peer identity or show both peer representations |
|
||||
| `hermes honcho sync` | Create host blocks for all Hermes profiles that don't have one yet |
|
||||
| `hermes honcho migrate` | Step-by-step migration guide from OpenClaw native memory to Hermes + Honcho |
|
||||
| `hermes memory setup` | Generic memory provider picker (selecting "honcho" runs the same wizard) |
|
||||
| `hermes memory status` | Show active memory provider and config |
|
||||
| `hermes memory off` | Disable external memory provider |
|
||||
|
|
@ -0,0 +1,248 @@
|
|||
---
|
||||
title: "Base"
|
||||
sidebar_label: "Base"
|
||||
description: "Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detect..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Base
|
||||
|
||||
Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/blockchain/base` |
|
||||
| Path | `optional-skills/blockchain/base` |
|
||||
| Version | `0.1.0` |
|
||||
| Author | youssefea |
|
||||
| License | MIT |
|
||||
| Tags | `Base`, `Blockchain`, `Crypto`, `Web3`, `RPC`, `DeFi`, `EVM`, `L2`, `Ethereum` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Base Blockchain Skill
|
||||
|
||||
Query Base (Ethereum L2) on-chain data enriched with USD pricing via CoinGecko.
|
||||
8 commands: wallet portfolio, token info, transactions, gas analysis,
|
||||
contract inspection, whale detection, network stats, and price lookup.
|
||||
|
||||
No API key needed. Uses only Python standard library (urllib, json, argparse).
|
||||
|
||||
---
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks for a Base wallet balance, token holdings, or portfolio value
|
||||
- User wants to inspect a specific transaction by hash
|
||||
- User wants ERC-20 token metadata, price, supply, or market cap
|
||||
- User wants to understand Base gas costs and L1 data fees
|
||||
- User wants to inspect a contract (ERC type detection, proxy resolution)
|
||||
- User wants to find large ETH transfers (whale detection)
|
||||
- User wants Base network health, gas price, or ETH price
|
||||
- User asks "what's the price of USDC/AERO/DEGEN/ETH?"
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
The helper script uses only Python standard library (urllib, json, argparse).
|
||||
No external packages required.
|
||||
|
||||
Pricing data comes from CoinGecko's free API (no key needed, rate-limited
|
||||
to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
RPC endpoint (default): https://mainnet.base.org
|
||||
Override: export BASE_RPC_URL=https://your-private-rpc.com
|
||||
|
||||
Helper script path: ~/.hermes/skills/blockchain/base/scripts/base_client.py
|
||||
|
||||
```
|
||||
python3 base_client.py wallet <address> [--limit N] [--all] [--no-prices]
|
||||
python3 base_client.py tx <hash>
|
||||
python3 base_client.py token <contract_address>
|
||||
python3 base_client.py gas
|
||||
python3 base_client.py contract <address>
|
||||
python3 base_client.py whales [--min-eth N]
|
||||
python3 base_client.py stats
|
||||
python3 base_client.py price <contract_address_or_symbol>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Procedure
|
||||
|
||||
### 0. Setup Check
|
||||
|
||||
```bash
|
||||
python3 --version
|
||||
|
||||
# Optional: set a private RPC for better rate limits
|
||||
export BASE_RPC_URL="https://mainnet.base.org"
|
||||
|
||||
# Confirm connectivity
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
|
||||
```
|
||||
|
||||
### 1. Wallet Portfolio
|
||||
|
||||
Get ETH balance and ERC-20 token holdings with USD values.
|
||||
Checks ~15 well-known Base tokens (USDC, WETH, AERO, DEGEN, etc.)
|
||||
via on-chain `balanceOf` calls. Tokens sorted by value, dust filtered.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
|
||||
wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
|
||||
```
|
||||
|
||||
Flags:
|
||||
- `--limit N` — show top N tokens (default: 20)
|
||||
- `--all` — show all tokens, no dust filter, no limit
|
||||
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
|
||||
|
||||
Output includes: ETH balance + USD value, token list with prices sorted
|
||||
by value, dust count, total portfolio value in USD.
|
||||
|
||||
Note: Only checks known tokens. Unknown ERC-20s are not discovered.
|
||||
Use the `token` command with a specific contract address for any token.
|
||||
|
||||
### 2. Transaction Details
|
||||
|
||||
Inspect a full transaction by its hash. Shows ETH value transferred,
|
||||
gas used, fee in ETH/USD, status, and decoded ERC-20/ERC-721 transfers.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
|
||||
tx 0xabc123...your_tx_hash_here
|
||||
```
|
||||
|
||||
Output: hash, block, from, to, value (ETH + USD), gas price, gas used,
|
||||
fee, status, contract creation address (if any), token transfers.
|
||||
|
||||
### 3. Token Info
|
||||
|
||||
Get ERC-20 token metadata: name, symbol, decimals, total supply, price,
|
||||
market cap, and contract code size.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
|
||||
token 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
|
||||
```
|
||||
|
||||
Output: name, symbol, decimals, total supply, price, market cap.
|
||||
Reads name/symbol/decimals directly from the contract via eth_call.
|
||||
|
||||
### 4. Gas Analysis
|
||||
|
||||
Detailed gas analysis with cost estimates for common operations.
|
||||
Shows current gas price, base fee trends over 10 blocks, block
|
||||
utilization, and estimated costs for ETH transfers, ERC-20 transfers,
|
||||
and swaps.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py gas
|
||||
```
|
||||
|
||||
Output: current gas price, base fee, block utilization, 10-block trend,
|
||||
cost estimates in ETH and USD.
|
||||
|
||||
Note: Base is an L2 — actual transaction costs include an L1 data
|
||||
posting fee that depends on calldata size and L1 gas prices. The
|
||||
estimates shown are for L2 execution only.
|
||||
|
||||
### 5. Contract Inspection
|
||||
|
||||
Inspect an address: determine if it's an EOA or contract, detect
|
||||
ERC-20/ERC-721/ERC-1155 interfaces, resolve EIP-1967 proxy
|
||||
implementation addresses.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
|
||||
contract 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
|
||||
```
|
||||
|
||||
Output: is_contract, code size, ETH balance, detected interfaces
|
||||
(ERC-20, ERC-721, ERC-1155), ERC-20 metadata, proxy implementation
|
||||
address.
|
||||
|
||||
### 6. Whale Detector
|
||||
|
||||
Scan the most recent block for large ETH transfers with USD values.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
|
||||
whales --min-eth 1.0
|
||||
```
|
||||
|
||||
Note: scans the latest block only — point-in-time snapshot, not historical.
|
||||
Default threshold is 1.0 ETH (lower than Solana's default since ETH
|
||||
values are higher).
|
||||
|
||||
### 7. Network Stats
|
||||
|
||||
Live Base network health: latest block, chain ID, gas price, base fee,
|
||||
block utilization, transaction count, and ETH price.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
|
||||
```
|
||||
|
||||
### 8. Price Lookup
|
||||
|
||||
Quick price check for any token by contract address or known symbol.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price ETH
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price USDC
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price AERO
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price DEGEN
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
|
||||
```
|
||||
|
||||
Known symbols: ETH, WETH, USDC, cbETH, AERO, DEGEN, TOSHI, BRETT,
|
||||
WELL, wstETH, rETH, cbBTC.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
|
||||
Price lookups use 1 request per token. Use `--no-prices` for speed.
|
||||
- **Public RPC rate-limits** — Base's public RPC limits requests.
|
||||
For production use, set BASE_RPC_URL to a private endpoint
|
||||
(Alchemy, QuickNode, Infura).
|
||||
- **Wallet shows known tokens only** — unlike Solana, EVM chains have no
|
||||
built-in "get all tokens" RPC. The wallet command checks ~15 popular
|
||||
Base tokens via `balanceOf`. Unknown ERC-20s won't appear. Use the
|
||||
`token` command for any specific contract.
|
||||
- **Token names read from contract** — if a contract doesn't implement
|
||||
`name()` or `symbol()`, these fields may be empty. Known tokens have
|
||||
hardcoded labels as fallback.
|
||||
- **Gas estimates are L2 only** — Base transaction costs include an L1
|
||||
data posting fee (depends on calldata size and L1 gas prices). The gas
|
||||
command estimates L2 execution cost only.
|
||||
- **Whale detector scans latest block only** — not historical. Results
|
||||
vary by the moment you query. Default threshold is 1.0 ETH.
|
||||
- **Proxy detection** — only EIP-1967 proxies are detected. Other proxy
|
||||
patterns (EIP-1167 minimal proxy, custom storage slots) are not checked.
|
||||
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
|
||||
with exponential backoff on rate-limit errors.
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
```bash
|
||||
# Should print Base chain ID (8453), latest block, gas price, and ETH price
|
||||
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
|
||||
```
|
||||
|
|
@ -0,0 +1,224 @@
|
|||
---
|
||||
title: "Solana"
|
||||
sidebar_label: "Solana"
|
||||
description: "Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network s..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Solana
|
||||
|
||||
Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/blockchain/solana` |
|
||||
| Path | `optional-skills/blockchain/solana` |
|
||||
| Version | `0.2.0` |
|
||||
| Author | Deniz Alagoz (gizdusum), enhanced by Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `Solana`, `Blockchain`, `Crypto`, `Web3`, `RPC`, `DeFi`, `NFT` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Solana Blockchain Skill
|
||||
|
||||
Query Solana on-chain data enriched with USD pricing via CoinGecko.
|
||||
8 commands: wallet portfolio, token info, transactions, activity, NFTs,
|
||||
whale detection, network stats, and price lookup.
|
||||
|
||||
No API key needed. Uses only Python standard library (urllib, json, argparse).
|
||||
|
||||
---
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks for a Solana wallet balance, token holdings, or portfolio value
|
||||
- User wants to inspect a specific transaction by signature
|
||||
- User wants SPL token metadata, price, supply, or top holders
|
||||
- User wants recent transaction history for an address
|
||||
- User wants NFTs owned by a wallet
|
||||
- User wants to find large SOL transfers (whale detection)
|
||||
- User wants Solana network health, TPS, epoch, or SOL price
|
||||
- User asks "what's the price of BONK/JUP/SOL?"
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
The helper script uses only Python standard library (urllib, json, argparse).
|
||||
No external packages required.
|
||||
|
||||
Pricing data comes from CoinGecko's free API (no key needed, rate-limited
|
||||
to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
RPC endpoint (default): https://api.mainnet-beta.solana.com
|
||||
Override: export SOLANA_RPC_URL=https://your-private-rpc.com
|
||||
|
||||
Helper script path: ~/.hermes/skills/blockchain/solana/scripts/solana_client.py
|
||||
|
||||
```
|
||||
python3 solana_client.py wallet <address> [--limit N] [--all] [--no-prices]
|
||||
python3 solana_client.py tx <signature>
|
||||
python3 solana_client.py token <mint_address>
|
||||
python3 solana_client.py activity <address> [--limit N]
|
||||
python3 solana_client.py nft <address>
|
||||
python3 solana_client.py whales [--min-sol N]
|
||||
python3 solana_client.py stats
|
||||
python3 solana_client.py price <mint_or_symbol>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Procedure
|
||||
|
||||
### 0. Setup Check
|
||||
|
||||
```bash
|
||||
python3 --version
|
||||
|
||||
# Optional: set a private RPC for better rate limits
|
||||
export SOLANA_RPC_URL="https://api.mainnet-beta.solana.com"
|
||||
|
||||
# Confirm connectivity
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
|
||||
```
|
||||
|
||||
### 1. Wallet Portfolio
|
||||
|
||||
Get SOL balance, SPL token holdings with USD values, NFT count, and
|
||||
portfolio total. Tokens sorted by value, dust filtered, known tokens
|
||||
labeled by name (BONK, JUP, USDC, etc.).
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
wallet 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
|
||||
```
|
||||
|
||||
Flags:
|
||||
- `--limit N` — show top N tokens (default: 20)
|
||||
- `--all` — show all tokens, no dust filter, no limit
|
||||
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
|
||||
|
||||
Output includes: SOL balance + USD value, token list with prices sorted
|
||||
by value, dust count, NFT summary, total portfolio value in USD.
|
||||
|
||||
### 2. Transaction Details
|
||||
|
||||
Inspect a full transaction by its base58 signature. Shows balance changes
|
||||
in both SOL and USD.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
tx 5j7s8K...your_signature_here
|
||||
```
|
||||
|
||||
Output: slot, timestamp, fee, status, balance changes (SOL + USD),
|
||||
program invocations.
|
||||
|
||||
### 3. Token Info
|
||||
|
||||
Get SPL token metadata, current price, market cap, supply, decimals,
|
||||
mint/freeze authorities, and top 5 holders.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
token DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
|
||||
```
|
||||
|
||||
Output: name, symbol, decimals, supply, price, market cap, top 5
|
||||
holders with percentages.
|
||||
|
||||
### 4. Recent Activity
|
||||
|
||||
List recent transactions for an address (default: last 10, max: 25).
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
activity 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM --limit 25
|
||||
```
|
||||
|
||||
### 5. NFT Portfolio
|
||||
|
||||
List NFTs owned by a wallet (heuristic: SPL tokens with amount=1, decimals=0).
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
nft 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
|
||||
```
|
||||
|
||||
Note: Compressed NFTs (cNFTs) are not detected by this heuristic.
|
||||
|
||||
### 6. Whale Detector
|
||||
|
||||
Scan the most recent block for large SOL transfers with USD values.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
|
||||
whales --min-sol 500
|
||||
```
|
||||
|
||||
Note: scans the latest block only — point-in-time snapshot, not historical.
|
||||
|
||||
### 7. Network Stats
|
||||
|
||||
Live Solana network health: current slot, epoch, TPS, supply, validator
|
||||
version, SOL price, and market cap.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
|
||||
```
|
||||
|
||||
### 8. Price Lookup
|
||||
|
||||
Quick price check for any token by mint address or known symbol.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price BONK
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price JUP
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price SOL
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
|
||||
```
|
||||
|
||||
Known symbols: SOL, USDC, USDT, BONK, JUP, WETH, JTO, mSOL, stSOL,
|
||||
PYTH, HNT, RNDR, WEN, W, TNSR, DRIFT, bSOL, JLP, WIF, MEW, BOME, PENGU.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
|
||||
Price lookups use 1 request per token. Wallets with many tokens may
|
||||
not get prices for all of them. Use `--no-prices` for speed.
|
||||
- **Public RPC rate-limits** — Solana mainnet public RPC limits requests.
|
||||
For production use, set SOLANA_RPC_URL to a private endpoint
|
||||
(Helius, QuickNode, Triton).
|
||||
- **NFT detection is heuristic** — amount=1 + decimals=0. Compressed
|
||||
NFTs (cNFTs) and Token-2022 NFTs won't appear.
|
||||
- **Whale detector scans latest block only** — not historical. Results
|
||||
vary by the moment you query.
|
||||
- **Transaction history** — public RPC keeps ~2 days. Older transactions
|
||||
may not be available.
|
||||
- **Token names** — ~25 well-known tokens are labeled by name. Others
|
||||
show abbreviated mint addresses. Use the `token` command for full info.
|
||||
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
|
||||
with exponential backoff on rate-limit errors.
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
```bash
|
||||
# Should print current Solana slot, TPS, and SOL price
|
||||
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
|
||||
```
|
||||
|
|
@ -0,0 +1,113 @@
|
|||
---
|
||||
title: "One Three One Rule — Structured decision-making framework for technical proposals and trade-off analysis"
|
||||
sidebar_label: "One Three One Rule"
|
||||
description: "Structured decision-making framework for technical proposals and trade-off analysis"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# One Three One Rule
|
||||
|
||||
Structured decision-making framework for technical proposals and trade-off analysis. When the user faces a choice between multiple approaches (architecture decisions, tool selection, refactoring strategies, migration paths), this skill produces a 1-3-1 format: one clear problem statement, three distinct options with pros/cons, and one concrete recommendation with definition of done and implementation plan. Use when the user asks for a "1-3-1", says "give me options", or needs help choosing between competing approaches.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/communication/one-three-one-rule` |
|
||||
| Path | `optional-skills/communication/one-three-one-rule` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Willard Moore |
|
||||
| License | MIT |
|
||||
| Tags | `communication`, `decision-making`, `proposals`, `trade-offs` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# 1-3-1 Communication Rule
|
||||
|
||||
Structured decision-making format for when a task has multiple viable approaches and the user needs a clear recommendation. Produces a concise problem framing, three options with trade-offs, and an actionable plan for the recommended path.
|
||||
|
||||
## When to Use
|
||||
|
||||
- The user explicitly asks for a "1-3-1" response.
|
||||
- The user says "give me options" or "what are my choices" for a technical decision.
|
||||
- A task has multiple viable approaches with meaningful trade-offs (architecture, tooling, migration strategy).
|
||||
- The user needs a proposal they can forward to a team or stakeholder.
|
||||
|
||||
Do NOT use for simple questions with one obvious answer, debugging sessions, or tasks where the user has already decided on an approach.
|
||||
|
||||
## Procedure
|
||||
|
||||
1. **Problem** (one sentence)
|
||||
- State the core decision or desired outcome in a single concise sentence.
|
||||
- Focus on the *what*, not the *how* — no implementation details, tool names, or specific technologies.
|
||||
- Keep it tight. If you need "and", you're describing two problems.
|
||||
|
||||
2. **Options** (exactly three)
|
||||
- Present three distinct, viable approaches labeled A, B, C.
|
||||
- Each option gets a brief description, pros, and cons.
|
||||
- Options should represent genuinely different strategies, not minor variations of the same approach.
|
||||
|
||||
3. **Recommendation** (one option)
|
||||
- State which option you recommend and why, based on the user's context and priorities.
|
||||
- Be direct — this is your professional judgment, not a hedge.
|
||||
|
||||
4. **Definition of Done**
|
||||
- List the specific success criteria for the recommended option.
|
||||
- These are concrete, verifiable outcomes — not vague aspirations.
|
||||
- If the user picks a different option, revise this section to match.
|
||||
|
||||
5. **Implementation Plan**
|
||||
- Concrete steps to execute the recommended option.
|
||||
- Include specific commands, tools, or actions where applicable.
|
||||
- If the user picks a different option, revise this section to match.
|
||||
|
||||
## Verification
|
||||
|
||||
- Response has exactly one Problem sentence.
|
||||
- Response has exactly three Options (A, B, C) with pros and cons for each.
|
||||
- Response has a single Recommendation that picks one option with reasoning.
|
||||
- Definition of Done and Implementation Plan align with the recommended option.
|
||||
- If the user selects a different option, Recommendation, DoD, and Implementation Plan update accordingly.
|
||||
|
||||
## Example
|
||||
|
||||
User: "Give me a 1-3-1 for adding retry logic to our API client."
|
||||
|
||||
Agent:
|
||||
|
||||
**Problem:** API calls to external services fail intermittently under load and need graceful retry handling.
|
||||
|
||||
**Options:**
|
||||
|
||||
- **Option A: Exponential backoff with jitter in a shared utility.**
|
||||
Pros: Simple, well-understood pattern; single implementation shared across all callers.
|
||||
Cons: No circuit-breaking; callers still block during retries.
|
||||
|
||||
- **Option B: Retry library (tenacity) as a decorator on each endpoint call.**
|
||||
Pros: Configurable per-endpoint; built-in support for circuit breakers and rate limits.
|
||||
Cons: Adds a dependency; decorator pattern can obscure control flow.
|
||||
|
||||
- **Option C: Queue-based async retry with dead letter handling.**
|
||||
Pros: Non-blocking; failed calls are tracked and replayable; scales well.
|
||||
Cons: Significant complexity increase; requires a queue backend; overkill for low-volume services.
|
||||
|
||||
**Recommendation:** Option A. The call volume doesn't justify queue infrastructure, and a shared utility with exponential backoff + jitter covers the failure mode cleanly without new dependencies.
|
||||
|
||||
**Definition of Done:**
|
||||
- `api_retry()` utility handles transient HTTP errors (429, 502, 503, 504).
|
||||
- Retries up to 3 times with exponential backoff and jitter.
|
||||
- All external API calls route through the retry utility.
|
||||
- Logs each retry attempt with endpoint and attempt number.
|
||||
- Tests cover retry exhaustion, immediate success, and transient-then-success scenarios.
|
||||
|
||||
**Implementation Plan:**
|
||||
1. Create `utils/api_retry.py` with configurable max retries, base delay, and retryable status codes.
|
||||
2. Add jitter using `random.uniform(0, base_delay)` to prevent thundering herd.
|
||||
3. Wrap existing API calls in `api_client.py` with the retry utility.
|
||||
4. Add unit tests mocking HTTP responses for each retry scenario.
|
||||
5. Verify under load with a simple stress test against a flaky endpoint mock.
|
||||
|
|
@ -0,0 +1,134 @@
|
|||
---
|
||||
title: "Blender Mcp — Control Blender directly from Hermes via socket connection to the blender-mcp addon"
|
||||
sidebar_label: "Blender Mcp"
|
||||
description: "Control Blender directly from Hermes via socket connection to the blender-mcp addon"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Blender Mcp
|
||||
|
||||
Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. Use when user wants to create or modify anything in Blender.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/creative/blender-mcp` |
|
||||
| Path | `optional-skills/creative/blender-mcp` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | alireza78a |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Blender MCP
|
||||
|
||||
Control a running Blender instance from Hermes via socket on TCP port 9876.
|
||||
|
||||
## Setup (one-time)
|
||||
|
||||
### 1. Install the Blender addon
|
||||
|
||||
curl -sL https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py -o ~/Desktop/blender_mcp_addon.py
|
||||
|
||||
In Blender:
|
||||
Edit > Preferences > Add-ons > Install > select blender_mcp_addon.py
|
||||
Enable "Interface: Blender MCP"
|
||||
|
||||
### 2. Start the socket server in Blender
|
||||
|
||||
Press N in Blender viewport to open sidebar.
|
||||
Find "BlenderMCP" tab and click "Start Server".
|
||||
|
||||
### 3. Verify connection
|
||||
|
||||
nc -z -w2 localhost 9876 && echo "OPEN" || echo "CLOSED"
|
||||
|
||||
## Protocol
|
||||
|
||||
Plain UTF-8 JSON over TCP -- no length prefix.
|
||||
|
||||
Send: {"type": "<command>", "params": {<kwargs>}}
|
||||
Receive: {"status": "success", "result": <value>}
|
||||
{"status": "error", "message": "<reason>"}
|
||||
|
||||
## Available Commands
|
||||
|
||||
| type | params | description |
|
||||
|-------------------------|-------------------|---------------------------------|
|
||||
| execute_code | code (str) | Run arbitrary bpy Python code |
|
||||
| get_scene_info | (none) | List all objects in scene |
|
||||
| get_object_info | object_name (str) | Details on a specific object |
|
||||
| get_viewport_screenshot | (none) | Screenshot of current viewport |
|
||||
|
||||
## Python Helper
|
||||
|
||||
Use this inside execute_code tool calls:
|
||||
|
||||
import socket, json
|
||||
|
||||
def blender_exec(code: str, host="localhost", port=9876, timeout=15):
|
||||
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
s.connect((host, port))
|
||||
s.settimeout(timeout)
|
||||
payload = json.dumps({"type": "execute_code", "params": {"code": code}})
|
||||
s.sendall(payload.encode("utf-8"))
|
||||
buf = b""
|
||||
while True:
|
||||
try:
|
||||
chunk = s.recv(4096)
|
||||
if not chunk:
|
||||
break
|
||||
buf += chunk
|
||||
try:
|
||||
json.loads(buf.decode("utf-8"))
|
||||
break
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
except socket.timeout:
|
||||
break
|
||||
s.close()
|
||||
return json.loads(buf.decode("utf-8"))
|
||||
|
||||
## Common bpy Patterns
|
||||
|
||||
### Clear scene
|
||||
bpy.ops.object.select_all(action='SELECT')
|
||||
bpy.ops.object.delete()
|
||||
|
||||
### Add mesh objects
|
||||
bpy.ops.mesh.primitive_uv_sphere_add(radius=1, location=(0, 0, 0))
|
||||
bpy.ops.mesh.primitive_cube_add(size=2, location=(3, 0, 0))
|
||||
bpy.ops.mesh.primitive_cylinder_add(radius=0.5, depth=2, location=(-3, 0, 0))
|
||||
|
||||
### Create and assign material
|
||||
mat = bpy.data.materials.new(name="MyMat")
|
||||
mat.use_nodes = True
|
||||
bsdf = mat.node_tree.nodes.get("Principled BSDF")
|
||||
bsdf.inputs["Base Color"].default_value = (R, G, B, 1.0)
|
||||
bsdf.inputs["Roughness"].default_value = 0.3
|
||||
bsdf.inputs["Metallic"].default_value = 0.0
|
||||
obj.data.materials.append(mat)
|
||||
|
||||
### Keyframe animation
|
||||
obj.location = (0, 0, 0)
|
||||
obj.keyframe_insert(data_path="location", frame=1)
|
||||
obj.location = (0, 0, 3)
|
||||
obj.keyframe_insert(data_path="location", frame=60)
|
||||
|
||||
### Render to file
|
||||
bpy.context.scene.render.filepath = "/tmp/render.png"
|
||||
bpy.context.scene.render.engine = 'CYCLES'
|
||||
bpy.ops.render.render(write_still=True)
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- Must check socket is open before running (nc -z localhost 9876)
|
||||
- Addon server must be started inside Blender each session (N-panel > BlenderMCP > Connect)
|
||||
- Break complex scenes into multiple smaller execute_code calls to avoid timeouts
|
||||
- Render output path must be absolute (/tmp/...) not relative
|
||||
- shade_smooth() requires object to be selected and in object mode
|
||||
|
|
@ -0,0 +1,378 @@
|
|||
---
|
||||
title: "Concept Diagrams"
|
||||
sidebar_label: "Concept Diagrams"
|
||||
description: "Generate flat, minimal light/dark-aware SVG diagrams as standalone HTML files, using a unified educational visual language with 9 semantic color ramps, sente..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Concept Diagrams
|
||||
|
||||
Generate flat, minimal light/dark-aware SVG diagrams as standalone HTML files, using a unified educational visual language with 9 semantic color ramps, sentence-case typography, and automatic dark mode. Best suited for educational and non-software visuals — physics setups, chemistry mechanisms, math curves, physical objects (aircraft, turbines, smartphones, mechanical watches), anatomy, floor plans, cross-sections, narrative journeys (lifecycle of X, process of Y), hub-spoke system integrations (smart city, IoT), and exploded layer views. If a more specialized skill exists for the subject (dedicated software/cloud architecture, hand-drawn sketches, animated explainers, etc.), prefer that — otherwise this skill can also serve as a general-purpose SVG diagram fallback with a clean educational look. Ships with 15 example diagrams.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/creative/concept-diagrams` |
|
||||
| Path | `optional-skills/creative/concept-diagrams` |
|
||||
| Version | `0.1.0` |
|
||||
| Author | v1k22 (original PR), ported into hermes-agent |
|
||||
| License | MIT |
|
||||
| Tags | `diagrams`, `svg`, `visualization`, `education`, `physics`, `chemistry`, `engineering` |
|
||||
| Related skills | [`architecture-diagram`](/docs/user-guide/skills/bundled/creative/creative-architecture-diagram), [`excalidraw`](/docs/user-guide/skills/bundled/creative/creative-excalidraw), `generative-widgets` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Concept Diagrams
|
||||
|
||||
Generate production-quality SVG diagrams with a unified flat, minimal design system. Output is a single self-contained HTML file that renders identically in any modern browser, with automatic light/dark mode.
|
||||
|
||||
## Scope
|
||||
|
||||
**Best suited for:**
|
||||
- Physics setups, chemistry mechanisms, math curves, biology
|
||||
- Physical objects (aircraft, turbines, smartphones, mechanical watches, cells)
|
||||
- Anatomy, cross-sections, exploded layer views
|
||||
- Floor plans, architectural conversions
|
||||
- Narrative journeys (lifecycle of X, process of Y)
|
||||
- Hub-spoke system integrations (smart city, IoT networks, electricity grids)
|
||||
- Educational / textbook-style visuals in any domain
|
||||
- Quantitative charts (grouped bars, energy profiles)
|
||||
|
||||
**Look elsewhere first for:**
|
||||
- Dedicated software / cloud infrastructure architecture with a dark tech aesthetic (consider `architecture-diagram` if available)
|
||||
- Hand-drawn whiteboard sketches (consider `excalidraw` if available)
|
||||
- Animated explainers or video output (consider an animation skill)
|
||||
|
||||
If a more specialized skill is available for the subject, prefer that. If none fits, this skill can serve as a general-purpose SVG diagram fallback — the output will carry the clean educational aesthetic described below, which is a reasonable default for almost any subject.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. Decide on the diagram type (see Diagram Types below).
|
||||
2. Lay out components using the Design System rules.
|
||||
3. Write the full HTML page using `templates/template.html` as the wrapper — paste your SVG where the template says `<!-- PASTE SVG HERE -->`.
|
||||
4. Save as a standalone `.html` file (for example `~/my-diagram.html` or `./my-diagram.html`).
|
||||
5. User opens it directly in a browser — no server, no dependencies.
|
||||
|
||||
Optional: if the user wants a browsable gallery of multiple diagrams, see "Local Preview Server" at the bottom.
|
||||
|
||||
Load the HTML template:
|
||||
```
|
||||
skill_view(name="concept-diagrams", file_path="templates/template.html")
|
||||
```
|
||||
|
||||
The template embeds the full CSS design system (`c-*` color classes, text classes, light/dark variables, arrow marker styles). The SVG you generate relies on these classes being present on the hosting page.
|
||||
|
||||
---
|
||||
|
||||
## Design System
|
||||
|
||||
### Philosophy
|
||||
|
||||
- **Flat**: no gradients, drop shadows, blur, glow, or neon effects.
|
||||
- **Minimal**: show the essential. No decorative icons inside boxes.
|
||||
- **Consistent**: same colors, spacing, typography, and stroke widths across every diagram.
|
||||
- **Dark-mode ready**: all colors auto-adapt via CSS classes — no per-mode SVG.
|
||||
|
||||
### Color Palette
|
||||
|
||||
9 color ramps, each with 7 stops. Put the class name on a `<g>` or shape element; the template CSS handles both modes.
|
||||
|
||||
| Class | 50 (lightest) | 100 | 200 | 400 | 600 | 800 | 900 (darkest) |
|
||||
|------------|---------------|---------|---------|---------|---------|---------|---------------|
|
||||
| `c-purple` | #EEEDFE | #CECBF6 | #AFA9EC | #7F77DD | #534AB7 | #3C3489 | #26215C |
|
||||
| `c-teal` | #E1F5EE | #9FE1CB | #5DCAA5 | #1D9E75 | #0F6E56 | #085041 | #04342C |
|
||||
| `c-coral` | #FAECE7 | #F5C4B3 | #F0997B | #D85A30 | #993C1D | #712B13 | #4A1B0C |
|
||||
| `c-pink` | #FBEAF0 | #F4C0D1 | #ED93B1 | #D4537E | #993556 | #72243E | #4B1528 |
|
||||
| `c-gray` | #F1EFE8 | #D3D1C7 | #B4B2A9 | #888780 | #5F5E5A | #444441 | #2C2C2A |
|
||||
| `c-blue` | #E6F1FB | #B5D4F4 | #85B7EB | #378ADD | #185FA5 | #0C447C | #042C53 |
|
||||
| `c-green` | #EAF3DE | #C0DD97 | #97C459 | #639922 | #3B6D11 | #27500A | #173404 |
|
||||
| `c-amber` | #FAEEDA | #FAC775 | #EF9F27 | #BA7517 | #854F0B | #633806 | #412402 |
|
||||
| `c-red` | #FCEBEB | #F7C1C1 | #F09595 | #E24B4A | #A32D2D | #791F1F | #501313 |
|
||||
|
||||
#### Color Assignment Rules
|
||||
|
||||
Color encodes **meaning**, not sequence. Never cycle through colors like a rainbow.
|
||||
|
||||
- Group nodes by **category** — all nodes of the same type share one color.
|
||||
- Use `c-gray` for neutral/structural nodes (start, end, generic steps, users).
|
||||
- Use **2-3 colors per diagram**, not 6+.
|
||||
- Prefer `c-purple`, `c-teal`, `c-coral`, `c-pink` for general categories.
|
||||
- Reserve `c-blue`, `c-green`, `c-amber`, `c-red` for semantic meaning (info, success, warning, error).
|
||||
|
||||
Light/dark stop mapping (handled by the template CSS — just use the class):
|
||||
- Light mode: 50 fill + 600 stroke + 800 title / 600 subtitle
|
||||
- Dark mode: 800 fill + 200 stroke + 100 title / 200 subtitle
|
||||
|
||||
### Typography
|
||||
|
||||
Only two font sizes. No exceptions.
|
||||
|
||||
| Class | Size | Weight | Use |
|
||||
|-------|------|--------|-----|
|
||||
| `th` | 14px | 500 | Node titles, region labels |
|
||||
| `ts` | 12px | 400 | Subtitles, descriptions, arrow labels |
|
||||
| `t` | 14px | 400 | General text |
|
||||
|
||||
- **Sentence case always.** Never Title Case, never ALL CAPS.
|
||||
- Every `<text>` MUST carry a class (`t`, `ts`, or `th`). No unclassed text.
|
||||
- `dominant-baseline="central"` on all text inside boxes.
|
||||
- `text-anchor="middle"` for centered text in boxes.
|
||||
|
||||
**Width estimation (approx):**
|
||||
- 14px weight 500: ~8px per character
|
||||
- 12px weight 400: ~6.5px per character
|
||||
- Always verify: `box_width >= (char_count × px_per_char) + 48` (24px padding each side)
|
||||
|
||||
### Spacing & Layout
|
||||
|
||||
- **ViewBox**: `viewBox="0 0 680 H"` where H = content height + 40px buffer.
|
||||
- **Safe area**: x=40 to x=640, y=40 to y=(H-40).
|
||||
- **Between boxes**: 60px minimum gap.
|
||||
- **Inside boxes**: 24px horizontal padding, 12px vertical padding.
|
||||
- **Arrowhead gap**: 10px between arrowhead and box edge.
|
||||
- **Single-line box**: 44px height.
|
||||
- **Two-line box**: 56px height, 18px between title and subtitle baselines.
|
||||
- **Container padding**: 20px minimum inside every container.
|
||||
- **Max nesting**: 2-3 levels deep. Deeper gets unreadable at 680px width.
|
||||
|
||||
### Stroke & Shape
|
||||
|
||||
- **Stroke width**: 0.5px on all node borders. Not 1px, not 2px.
|
||||
- **Rect rounding**: `rx="8"` for nodes, `rx="12"` for inner containers, `rx="16"` to `rx="20"` for outer containers.
|
||||
- **Connector paths**: MUST have `fill="none"`. SVG defaults to `fill: black` otherwise.
|
||||
|
||||
### Arrow Marker
|
||||
|
||||
Include this `<defs>` block at the start of **every** SVG:
|
||||
|
||||
```xml
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="8" refY="5"
|
||||
markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M2 1L8 5L2 9" fill="none" stroke="context-stroke"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>
|
||||
</marker>
|
||||
</defs>
|
||||
```
|
||||
|
||||
Use `marker-end="url(#arrow)"` on lines. The arrowhead inherits the line color via `context-stroke`.
|
||||
|
||||
### CSS Classes (Provided by the Template)
|
||||
|
||||
The template page provides:
|
||||
|
||||
- Text: `.t`, `.ts`, `.th`
|
||||
- Neutral: `.box`, `.arr`, `.leader`, `.node`
|
||||
- Color ramps: `.c-purple`, `.c-teal`, `.c-coral`, `.c-pink`, `.c-gray`, `.c-blue`, `.c-green`, `.c-amber`, `.c-red` (all with automatic light/dark mode)
|
||||
|
||||
You do **not** need to redefine these — just apply them in your SVG. The template file contains the full CSS definitions.
|
||||
|
||||
---
|
||||
|
||||
## SVG Boilerplate
|
||||
|
||||
Every SVG inside the template page starts with this exact structure:
|
||||
|
||||
```xml
|
||||
<svg width="100%" viewBox="0 0 680 {HEIGHT}" xmlns="http://www.w3.org/2000/svg">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="8" refY="5"
|
||||
markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M2 1L8 5L2 9" fill="none" stroke="context-stroke"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<!-- Diagram content here -->
|
||||
|
||||
</svg>
|
||||
```
|
||||
|
||||
Replace `{HEIGHT}` with the actual computed height (last element bottom + 40px).
|
||||
|
||||
### Node Patterns
|
||||
|
||||
**Single-line node (44px):**
|
||||
```xml
|
||||
<g class="node c-blue">
|
||||
<rect x="100" y="20" width="180" height="44" rx="8" stroke-width="0.5"/>
|
||||
<text class="th" x="190" y="42" text-anchor="middle" dominant-baseline="central">Service name</text>
|
||||
</g>
|
||||
```
|
||||
|
||||
**Two-line node (56px):**
|
||||
```xml
|
||||
<g class="node c-teal">
|
||||
<rect x="100" y="20" width="200" height="56" rx="8" stroke-width="0.5"/>
|
||||
<text class="th" x="200" y="38" text-anchor="middle" dominant-baseline="central">Service name</text>
|
||||
<text class="ts" x="200" y="56" text-anchor="middle" dominant-baseline="central">Short description</text>
|
||||
</g>
|
||||
```
|
||||
|
||||
**Connector (no label):**
|
||||
```xml
|
||||
<line x1="200" y1="76" x2="200" y2="120" class="arr" marker-end="url(#arrow)"/>
|
||||
```
|
||||
|
||||
**Container (dashed or solid):**
|
||||
```xml
|
||||
<g class="c-purple">
|
||||
<rect x="40" y="92" width="600" height="300" rx="16" stroke-width="0.5"/>
|
||||
<text class="th" x="66" y="116">Container label</text>
|
||||
<text class="ts" x="66" y="134">Subtitle info</text>
|
||||
</g>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Diagram Types
|
||||
|
||||
Choose the layout that fits the subject:
|
||||
|
||||
1. **Flowchart** — CI/CD pipelines, request lifecycles, approval workflows, data processing. Single-direction flow (top-down or left-right). Max 4-5 nodes per row.
|
||||
2. **Structural / Containment** — Cloud infrastructure nesting, system architecture with layers. Large outer containers with inner regions. Dashed rects for logical groupings.
|
||||
3. **API / Endpoint Map** — REST routes, GraphQL schemas. Tree from root, branching to resource groups, each containing endpoint nodes.
|
||||
4. **Microservice Topology** — Service mesh, event-driven systems. Services as nodes, arrows for communication patterns, message queues between.
|
||||
5. **Data Flow** — ETL pipelines, streaming architectures. Left-to-right flow from sources through processing to sinks.
|
||||
6. **Physical / Structural** — Vehicles, buildings, hardware, anatomy. Use shapes that match the physical form — `<path>` for curved bodies, `<polygon>` for tapered shapes, `<ellipse>`/`<circle>` for cylindrical parts, nested `<rect>` for compartments. See `references/physical-shape-cookbook.md`.
|
||||
7. **Infrastructure / Systems Integration** — Smart cities, IoT networks, multi-domain systems. Hub-spoke layout with central platform connecting subsystems. Semantic line styles (`.data-line`, `.power-line`, `.water-pipe`, `.road`). See `references/infrastructure-patterns.md`.
|
||||
8. **UI / Dashboard Mockups** — Admin panels, monitoring dashboards. Screen frame with nested chart/gauge/indicator elements. See `references/dashboard-patterns.md`.
|
||||
|
||||
For physical, infrastructure, and dashboard diagrams, load the matching reference file before generating — each one provides ready-made CSS classes and shape primitives.
|
||||
|
||||
---
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before finalizing any SVG, verify ALL of the following:
|
||||
|
||||
1. Every `<text>` has class `t`, `ts`, or `th`.
|
||||
2. Every `<text>` inside a box has `dominant-baseline="central"`.
|
||||
3. Every connector `<path>` or `<line>` used as arrow has `fill="none"`.
|
||||
4. No arrow line crosses through an unrelated box.
|
||||
5. `box_width >= (longest_label_chars × 8) + 48` for 14px text.
|
||||
6. `box_width >= (longest_label_chars × 6.5) + 48` for 12px text.
|
||||
7. ViewBox height = bottom-most element + 40px.
|
||||
8. All content stays within x=40 to x=640.
|
||||
9. Color classes (`c-*`) are on `<g>` or shape elements, never on `<path>` connectors.
|
||||
10. Arrow `<defs>` block is present.
|
||||
11. No gradients, shadows, blur, or glow effects.
|
||||
12. Stroke width is 0.5px on all node borders.
|
||||
|
||||
---
|
||||
|
||||
## Output & Preview
|
||||
|
||||
### Default: standalone HTML file
|
||||
|
||||
Write a single `.html` file the user can open directly. No server, no dependencies, works offline. Pattern:
|
||||
|
||||
```python
|
||||
# 1. Load the template
|
||||
template = skill_view("concept-diagrams", "templates/template.html")
|
||||
|
||||
# 2. Fill in title, subtitle, and paste your SVG
|
||||
html = template.replace(
|
||||
"<!-- DIAGRAM TITLE HERE -->", "SN2 reaction mechanism"
|
||||
).replace(
|
||||
"<!-- OPTIONAL SUBTITLE HERE -->", "Bimolecular nucleophilic substitution"
|
||||
).replace(
|
||||
"<!-- PASTE SVG HERE -->", svg_content
|
||||
)
|
||||
|
||||
# 3. Write to a user-chosen path (or ./ by default)
|
||||
write_file("./sn2-mechanism.html", html)
|
||||
```
|
||||
|
||||
Tell the user how to open it:
|
||||
|
||||
```
|
||||
# macOS
|
||||
open ./sn2-mechanism.html
|
||||
# Linux
|
||||
xdg-open ./sn2-mechanism.html
|
||||
```
|
||||
|
||||
### Optional: local preview server (multi-diagram gallery)
|
||||
|
||||
Only use this when the user explicitly wants a browsable gallery of multiple diagrams.
|
||||
|
||||
**Rules:**
|
||||
- Bind to `127.0.0.1` only. Never `0.0.0.0`. Exposing diagrams on all network interfaces is a security hazard on shared networks.
|
||||
- Pick a free port (do NOT hard-code one) and tell the user the chosen URL.
|
||||
- The server is optional and opt-in — prefer the standalone HTML file first.
|
||||
|
||||
Recommended pattern (lets the OS pick a free ephemeral port):
|
||||
|
||||
```bash
|
||||
# Put each diagram in its own folder under .diagrams/
|
||||
mkdir -p .diagrams/sn2-mechanism
|
||||
# ...write .diagrams/sn2-mechanism/index.html...
|
||||
|
||||
# Serve on loopback only, free port
|
||||
cd .diagrams && python3 -c "
|
||||
import http.server, socketserver
|
||||
with socketserver.TCPServer(('127.0.0.1', 0), http.server.SimpleHTTPRequestHandler) as s:
|
||||
print(f'Serving at http://127.0.0.1:{s.server_address[1]}/')
|
||||
s.serve_forever()
|
||||
" &
|
||||
```
|
||||
|
||||
If the user insists on a fixed port, use `127.0.0.1:<port>` — still never `0.0.0.0`. Document how to stop the server (`kill %1` or `pkill -f "http.server"`).
|
||||
|
||||
---
|
||||
|
||||
## Examples Reference
|
||||
|
||||
The `examples/` directory ships 15 complete, tested diagrams. Browse them for working patterns before writing a new diagram of a similar type:
|
||||
|
||||
| File | Type | Demonstrates |
|
||||
|------|------|--------------|
|
||||
| `hospital-emergency-department-flow.md` | Flowchart | Priority routing with semantic colors |
|
||||
| `feature-film-production-pipeline.md` | Flowchart | Phased workflow, horizontal sub-flows |
|
||||
| `automated-password-reset-flow.md` | Flowchart | Auth flow with error branches |
|
||||
| `autonomous-llm-research-agent-flow.md` | Flowchart | Loop-back arrows, decision branches |
|
||||
| `place-order-uml-sequence.md` | Sequence | UML sequence diagram style |
|
||||
| `commercial-aircraft-structure.md` | Physical | Paths, polygons, ellipses for realistic shapes |
|
||||
| `wind-turbine-structure.md` | Physical cross-section | Underground/above-ground separation, color coding |
|
||||
| `smartphone-layer-anatomy.md` | Exploded view | Alternating left/right labels, layered components |
|
||||
| `apartment-floor-plan-conversion.md` | Floor plan | Walls, doors, proposed changes in dotted red |
|
||||
| `banana-journey-tree-to-smoothie.md` | Narrative journey | Winding path, progressive state changes |
|
||||
| `cpu-ooo-microarchitecture.md` | Hardware pipeline | Fan-out, memory hierarchy sidebar |
|
||||
| `sn2-reaction-mechanism.md` | Chemistry | Molecules, curved arrows, energy profile |
|
||||
| `smart-city-infrastructure.md` | Hub-spoke | Semantic line styles per system |
|
||||
| `electricity-grid-flow.md` | Multi-stage flow | Voltage hierarchy, flow markers |
|
||||
| `ml-benchmark-grouped-bar-chart.md` | Chart | Grouped bars, dual axis |
|
||||
|
||||
Load any example with:
|
||||
```
|
||||
skill_view(name="concept-diagrams", file_path="examples/<filename>")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference: What to Use When
|
||||
|
||||
| User says | Diagram type | Suggested colors |
|
||||
|-----------|--------------|------------------|
|
||||
| "show the pipeline" | Flowchart | gray start/end, purple steps, red errors, teal deploy |
|
||||
| "draw the data flow" | Data pipeline (left-right) | gray sources, purple processing, teal sinks |
|
||||
| "visualize the system" | Structural (containment) | purple container, teal services, coral data |
|
||||
| "map the endpoints" | API tree | purple root, one ramp per resource group |
|
||||
| "show the services" | Microservice topology | gray ingress, teal services, purple bus, coral workers |
|
||||
| "draw the aircraft/vehicle" | Physical | paths, polygons, ellipses for realistic shapes |
|
||||
| "smart city / IoT" | Hub-spoke integration | semantic line styles per subsystem |
|
||||
| "show the dashboard" | UI mockup | dark screen, chart colors: teal, purple, coral for alerts |
|
||||
| "power grid / electricity" | Multi-stage flow | voltage hierarchy (HV/MV/LV line weights) |
|
||||
| "wind turbine / turbine" | Physical cross-section | foundation + tower cutaway + nacelle color-coded |
|
||||
| "journey of X / lifecycle" | Narrative journey | winding path, progressive state changes |
|
||||
| "layers of X / exploded" | Exploded layer view | vertical stack, alternating labels |
|
||||
| "CPU / pipeline" | Hardware pipeline | vertical stages, fan-out to execution ports |
|
||||
| "floor plan / apartment" | Floor plan | walls, doors, proposed changes in dotted red |
|
||||
| "reaction mechanism" | Chemistry | atoms, bonds, curved arrows, transition state, energy profile |
|
||||
|
|
@ -0,0 +1,146 @@
|
|||
---
|
||||
title: "Meme Generation — Generate real meme images by picking a template and overlaying text with Pillow"
|
||||
sidebar_label: "Meme Generation"
|
||||
description: "Generate real meme images by picking a template and overlaying text with Pillow"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Meme Generation
|
||||
|
||||
Generate real meme images by picking a template and overlaying text with Pillow. Produces actual .png meme files.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/creative/meme-generation` |
|
||||
| Path | `optional-skills/creative/meme-generation` |
|
||||
| Version | `2.0.0` |
|
||||
| Author | adanaleycio |
|
||||
| License | MIT |
|
||||
| Tags | `creative`, `memes`, `humor`, `images` |
|
||||
| Related skills | [`ascii-art`](/docs/user-guide/skills/bundled/creative/creative-ascii-art), `generative-widgets` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Meme Generation
|
||||
|
||||
Generate actual meme images from a topic. Picks a template, writes captions, and renders a real .png file with text overlay.
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks you to make or generate a meme
|
||||
- User wants a meme about a specific topic, situation, or frustration
|
||||
- User says "meme this" or similar
|
||||
|
||||
## Available Templates
|
||||
|
||||
The script supports **any of the ~100 popular imgflip templates** by name or ID, plus 10 curated templates with hand-tuned text positioning.
|
||||
|
||||
### Curated Templates (custom text placement)
|
||||
|
||||
| ID | Name | Fields | Best for |
|
||||
|----|------|--------|----------|
|
||||
| `this-is-fine` | This is Fine | top, bottom | chaos, denial |
|
||||
| `drake` | Drake Hotline Bling | reject, approve | rejecting/preferring |
|
||||
| `distracted-boyfriend` | Distracted Boyfriend | distraction, current, person | temptation, shifting priorities |
|
||||
| `two-buttons` | Two Buttons | left, right, person | impossible choice |
|
||||
| `expanding-brain` | Expanding Brain | 4 levels | escalating irony |
|
||||
| `change-my-mind` | Change My Mind | statement | hot takes |
|
||||
| `woman-yelling-at-cat` | Woman Yelling at Cat | woman, cat | arguments |
|
||||
| `one-does-not-simply` | One Does Not Simply | top, bottom | deceptively hard things |
|
||||
| `grus-plan` | Gru's Plan | step1-3, realization | plans that backfire |
|
||||
| `batman-slapping-robin` | Batman Slapping Robin | robin, batman | shutting down bad ideas |
|
||||
|
||||
### Dynamic Templates (from imgflip API)
|
||||
|
||||
Any template not in the curated list can be used by name or imgflip ID. These get smart default text positioning (top/bottom for 2-field, evenly spaced for 3+). Search with:
|
||||
```bash
|
||||
python "$SKILL_DIR/scripts/generate_meme.py" --search "disaster"
|
||||
```
|
||||
|
||||
## Procedure
|
||||
|
||||
### Mode 1: Classic Template (default)
|
||||
|
||||
1. Read the user's topic and identify the core dynamic (chaos, dilemma, preference, irony, etc.)
|
||||
2. Pick the template that best matches. Use the "Best for" column, or search with `--search`.
|
||||
3. Write short captions for each field (8-12 words max per field, shorter is better).
|
||||
4. Find the skill's script directory:
|
||||
```
|
||||
SKILL_DIR=$(dirname "$(find ~/.hermes/skills -path '*/meme-generation/SKILL.md' 2>/dev/null | head -1)")
|
||||
```
|
||||
5. Run the generator:
|
||||
```bash
|
||||
python "$SKILL_DIR/scripts/generate_meme.py" <template_id> /tmp/meme.png "caption 1" "caption 2" ...
|
||||
```
|
||||
6. Return the image with `MEDIA:/tmp/meme.png`
|
||||
|
||||
### Mode 2: Custom AI Image (when image_generate is available)
|
||||
|
||||
Use this when no classic template fits, or when the user wants something original.
|
||||
|
||||
1. Write the captions first.
|
||||
2. Use `image_generate` to create a scene that matches the meme concept. Do NOT include any text in the image prompt — text will be added by the script. Describe only the visual scene.
|
||||
3. Find the generated image path from the image_generate result URL. Download it to a local path if needed.
|
||||
4. Run the script with `--image` to overlay text, choosing a mode:
|
||||
- **Overlay** (text directly on image, white with black outline):
|
||||
```bash
|
||||
python "$SKILL_DIR/scripts/generate_meme.py" --image /path/to/scene.png /tmp/meme.png "top text" "bottom text"
|
||||
```
|
||||
- **Bars** (black bars above/below with white text — cleaner, always readable):
|
||||
```bash
|
||||
python "$SKILL_DIR/scripts/generate_meme.py" --image /path/to/scene.png --bars /tmp/meme.png "top text" "bottom text"
|
||||
```
|
||||
Use `--bars` when the image is busy/detailed and text would be hard to read on top of it.
|
||||
5. **Verify with vision** (if `vision_analyze` is available): Check the result looks good:
|
||||
```
|
||||
vision_analyze(image_url="/tmp/meme.png", question="Is the text legible and well-positioned? Does the meme work visually?")
|
||||
```
|
||||
If the vision model flags issues (text hard to read, bad placement, etc.), try the other mode (switch between overlay and bars) or regenerate the scene.
|
||||
6. Return the image with `MEDIA:/tmp/meme.png`
|
||||
|
||||
## Examples
|
||||
|
||||
**"debugging production at 2 AM":**
|
||||
```bash
|
||||
python generate_meme.py this-is-fine /tmp/meme.png "SERVERS ARE ON FIRE" "This is fine"
|
||||
```
|
||||
|
||||
**"choosing between sleep and one more episode":**
|
||||
```bash
|
||||
python generate_meme.py drake /tmp/meme.png "Getting 8 hours of sleep" "One more episode at 3 AM"
|
||||
```
|
||||
|
||||
**"the stages of a Monday morning":**
|
||||
```bash
|
||||
python generate_meme.py expanding-brain /tmp/meme.png "Setting an alarm" "Setting 5 alarms" "Sleeping through all alarms" "Working from bed"
|
||||
```
|
||||
|
||||
## Listing Templates
|
||||
|
||||
To see all available templates:
|
||||
```bash
|
||||
python generate_meme.py --list
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- Keep captions SHORT. Memes with long text look terrible.
|
||||
- Match the number of text arguments to the template's field count.
|
||||
- Pick the template that fits the joke structure, not just the topic.
|
||||
- Do not generate hateful, abusive, or personally targeted content.
|
||||
- The script caches template images in `scripts/.cache/` after first download.
|
||||
|
||||
## Verification
|
||||
|
||||
The output is correct if:
|
||||
- A .png file was created at the output path
|
||||
- Text is legible (white with black outline) on the template
|
||||
- The joke lands — caption matches the template's intended structure
|
||||
- File can be delivered via MEDIA: path
|
||||
|
|
@ -0,0 +1,356 @@
|
|||
---
|
||||
title: "Touchdesigner Mcp"
|
||||
sidebar_label: "Touchdesigner Mcp"
|
||||
description: "Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Touchdesigner Mcp
|
||||
|
||||
Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/creative/touchdesigner-mcp` |
|
||||
| Path | `optional-skills/creative/touchdesigner-mcp` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | kshitijk4poor |
|
||||
| License | MIT |
|
||||
| Tags | `TouchDesigner`, `MCP`, `twozero`, `creative-coding`, `real-time-visuals`, `generative-art`, `audio-reactive`, `VJ`, `installation`, `GLSL` |
|
||||
| Related skills | [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`ascii-video`](/docs/user-guide/skills/bundled/creative/creative-ascii-video), [`manim-video`](/docs/user-guide/skills/bundled/creative/creative-manim-video), `hermes-video` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# TouchDesigner Integration (twozero MCP)
|
||||
|
||||
## CRITICAL RULES
|
||||
|
||||
1. **NEVER guess parameter names.** Call `td_get_par_info` for the op type FIRST. Your training data is wrong for TD 2025.32.
|
||||
2. **If `tdAttributeError` fires, STOP.** Call `td_get_operator_info` on the failing node before continuing.
|
||||
3. **NEVER hardcode absolute paths** in script callbacks. Use `me.parent()` / `scriptOp.parent()`.
|
||||
4. **Prefer native MCP tools over td_execute_python.** Use `td_create_operator`, `td_set_operator_pars`, `td_get_errors` etc. Only fall back to `td_execute_python` for complex multi-step logic.
|
||||
5. **Call `td_get_hints` before building.** It returns patterns specific to the op type you're working with.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
Hermes Agent -> MCP (Streamable HTTP) -> twozero.tox (port 40404) -> TD Python
|
||||
```
|
||||
|
||||
36 native tools. Free plugin (no payment/license — confirmed April 2026).
|
||||
Context-aware (knows selected OP, current network).
|
||||
Hub health check: `GET http://localhost:40404/mcp` returns JSON with instance PID, project name, TD version.
|
||||
|
||||
## Setup (Automated)
|
||||
|
||||
Run the setup script to handle everything:
|
||||
|
||||
```bash
|
||||
bash "${HERMES_HOME:-$HOME/.hermes}/skills/creative/touchdesigner-mcp/scripts/setup.sh"
|
||||
```
|
||||
|
||||
The script will:
|
||||
1. Check if TD is running
|
||||
2. Download twozero.tox if not already cached
|
||||
3. Add `twozero_td` MCP server to Hermes config (if missing)
|
||||
4. Test the MCP connection on port 40404
|
||||
5. Report what manual steps remain (drag .tox into TD, enable MCP toggle)
|
||||
|
||||
### Manual steps (one-time, cannot be automated)
|
||||
|
||||
1. **Drag `~/Downloads/twozero.tox` into the TD network editor** → click Install
|
||||
2. **Enable MCP:** click twozero icon → Settings → mcp → "auto start MCP" → Yes
|
||||
3. **Restart Hermes session** to pick up the new MCP server
|
||||
|
||||
After setup, verify:
|
||||
```bash
|
||||
nc -z 127.0.0.1 40404 && echo "twozero MCP: READY"
|
||||
```
|
||||
|
||||
## Environment Notes
|
||||
|
||||
- **Non-Commercial TD** caps resolution at 1280×1280. Use `outputresolution = 'custom'` and set width/height explicitly.
|
||||
- **Codecs:** `prores` (preferred on macOS) or `mjpa` as fallback. H.264/H.265/AV1 require a Commercial license.
|
||||
- Always call `td_get_par_info` before setting params — names vary by TD version (see CRITICAL RULES #1).
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 0: Discover (before building anything)
|
||||
|
||||
```
|
||||
Call td_get_par_info with op_type for each type you plan to use.
|
||||
Call td_get_hints with the topic you're building (e.g. "glsl", "audio reactive", "feedback").
|
||||
Call td_get_focus to see where the user is and what's selected.
|
||||
Call td_get_network to see what already exists.
|
||||
```
|
||||
|
||||
No temp nodes, no cleanup. This replaces the old discovery dance entirely.
|
||||
|
||||
### Step 1: Clean + Build
|
||||
|
||||
**IMPORTANT: Split cleanup and creation into SEPARATE MCP calls.** Destroying and recreating same-named nodes in one `td_execute_python` script causes "Invalid OP object" errors. See pitfalls #11b.
|
||||
|
||||
Use `td_create_operator` for each node (handles viewport positioning automatically):
|
||||
|
||||
```
|
||||
td_create_operator(type="noiseTOP", parent="/project1", name="bg", parameters={"resolutionw": 1280, "resolutionh": 720})
|
||||
td_create_operator(type="levelTOP", parent="/project1", name="brightness")
|
||||
td_create_operator(type="nullTOP", parent="/project1", name="out")
|
||||
```
|
||||
|
||||
For bulk creation or wiring, use `td_execute_python`:
|
||||
|
||||
```python
|
||||
# td_execute_python script:
|
||||
root = op('/project1')
|
||||
nodes = []
|
||||
for name, optype in [('bg', noiseTOP), ('fx', levelTOP), ('out', nullTOP)]:
|
||||
n = root.create(optype, name)
|
||||
nodes.append(n.path)
|
||||
# Wire chain
|
||||
for i in range(len(nodes)-1):
|
||||
op(nodes[i]).outputConnectors[0].connect(op(nodes[i+1]).inputConnectors[0])
|
||||
result = {'created': nodes}
|
||||
```
|
||||
|
||||
### Step 2: Set Parameters
|
||||
|
||||
Prefer the native tool (validates params, won't crash):
|
||||
|
||||
```
|
||||
td_set_operator_pars(path="/project1/bg", parameters={"roughness": 0.6, "monochrome": true})
|
||||
```
|
||||
|
||||
For expressions or modes, use `td_execute_python`:
|
||||
|
||||
```python
|
||||
op('/project1/time_driver').par.colorr.expr = "absTime.seconds % 1000.0"
|
||||
```
|
||||
|
||||
### Step 3: Wire
|
||||
|
||||
Use `td_execute_python` — no native wire tool exists:
|
||||
|
||||
```python
|
||||
op('/project1/bg').outputConnectors[0].connect(op('/project1/fx').inputConnectors[0])
|
||||
```
|
||||
|
||||
### Step 4: Verify
|
||||
|
||||
```
|
||||
td_get_errors(path="/project1", recursive=true)
|
||||
td_get_perf()
|
||||
td_get_operator_info(path="/project1/out", detail="full")
|
||||
```
|
||||
|
||||
### Step 5: Display / Capture
|
||||
|
||||
```
|
||||
td_get_screenshot(path="/project1/out")
|
||||
```
|
||||
|
||||
Or open a window via script:
|
||||
|
||||
```python
|
||||
win = op('/project1').create(windowCOMP, 'display')
|
||||
win.par.winop = op('/project1/out').path
|
||||
win.par.winw = 1280; win.par.winh = 720
|
||||
win.par.winopen.pulse()
|
||||
```
|
||||
|
||||
## MCP Tool Quick Reference
|
||||
|
||||
**Core (use these most):**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_execute_python` | Run arbitrary Python in TD. Full API access. |
|
||||
| `td_create_operator` | Create node with params + auto-positioning |
|
||||
| `td_set_operator_pars` | Set params safely (validates, won't crash) |
|
||||
| `td_get_operator_info` | Inspect one node: connections, params, errors |
|
||||
| `td_get_operators_info` | Inspect multiple nodes in one call |
|
||||
| `td_get_network` | See network structure at a path |
|
||||
| `td_get_errors` | Find errors/warnings recursively |
|
||||
| `td_get_par_info` | Get param names for an OP type (replaces discovery) |
|
||||
| `td_get_hints` | Get patterns/tips before building |
|
||||
| `td_get_focus` | What network is open, what's selected |
|
||||
|
||||
**Read/Write:**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_read_dat` | Read DAT text content |
|
||||
| `td_write_dat` | Write/patch DAT content |
|
||||
| `td_read_chop` | Read CHOP channel values |
|
||||
| `td_read_textport` | Read TD console output |
|
||||
|
||||
**Visual:**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_get_screenshot` | Capture one OP viewer to file |
|
||||
| `td_get_screenshots` | Capture multiple OPs at once |
|
||||
| `td_get_screen_screenshot` | Capture actual screen via TD |
|
||||
| `td_navigate_to` | Jump network editor to an OP |
|
||||
|
||||
**Search:**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_find_op` | Find ops by name/type across project |
|
||||
| `td_search` | Search code, expressions, string params |
|
||||
|
||||
**System:**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_get_perf` | Performance profiling (FPS, slow ops) |
|
||||
| `td_list_instances` | List all running TD instances |
|
||||
| `td_get_docs` | In-depth docs on a TD topic |
|
||||
| `td_agents_md` | Read/write per-COMP markdown docs |
|
||||
| `td_reinit_extension` | Reload extension after code edit |
|
||||
| `td_clear_textport` | Clear console before debug session |
|
||||
|
||||
**Input Automation:**
|
||||
| Tool | What |
|
||||
|------|------|
|
||||
| `td_input_execute` | Send mouse/keyboard to TD |
|
||||
| `td_input_status` | Poll input queue status |
|
||||
| `td_input_clear` | Stop input automation |
|
||||
| `td_op_screen_rect` | Get screen coords of a node |
|
||||
| `td_click_screen_point` | Click a point in a screenshot |
|
||||
|
||||
See `references/mcp-tools.md` for full parameter schemas.
|
||||
|
||||
## Key Implementation Rules
|
||||
|
||||
**GLSL time:** No `uTDCurrentTime` in GLSL TOP. Use the Values page:
|
||||
```python
|
||||
# Call td_get_par_info(op_type="glslTOP") first to confirm param names
|
||||
td_set_operator_pars(path="/project1/shader", parameters={"value0name": "uTime"})
|
||||
# Then set expression via script:
|
||||
# op('/project1/shader').par.value0.expr = "absTime.seconds"
|
||||
# In GLSL: uniform float uTime;
|
||||
```
|
||||
|
||||
Fallback: Constant TOP in `rgba32float` format (8-bit clamps to 0-1, freezing the shader).
|
||||
|
||||
**Feedback TOP:** Use `top` parameter reference, not direct input wire. "Not enough sources" resolves after first cook. "Cook dependency loop" warning is expected.
|
||||
|
||||
**Resolution:** Non-Commercial caps at 1280×1280. Use `outputresolution = 'custom'`.
|
||||
|
||||
**Large shaders:** Write GLSL to `/tmp/file.glsl`, then use `td_write_dat` or `td_execute_python` to load.
|
||||
|
||||
**Vertex/Point access (TD 2025.32):** `point.P[0]`, `point.P[1]`, `point.P[2]` — NOT `.x`, `.y`, `.z`.
|
||||
|
||||
**Extensions:** `ext0object` format is `"op('./datName').module.ClassName(me)"` in CONSTANT mode. After editing extension code with `td_write_dat`, call `td_reinit_extension`.
|
||||
|
||||
**Script callbacks:** ALWAYS use relative paths via `me.parent()` / `scriptOp.parent()`.
|
||||
|
||||
**Cleaning nodes:** Always `list(root.children)` before iterating + `child.valid` check.
|
||||
|
||||
## Recording / Exporting Video
|
||||
|
||||
```python
|
||||
# via td_execute_python:
|
||||
root = op('/project1')
|
||||
rec = root.create(moviefileoutTOP, 'recorder')
|
||||
op('/project1/out').outputConnectors[0].connect(rec.inputConnectors[0])
|
||||
rec.par.type = 'movie'
|
||||
rec.par.file = '/tmp/output.mov'
|
||||
rec.par.videocodec = 'prores' # Apple ProRes — NOT license-restricted on macOS
|
||||
rec.par.record = True # start
|
||||
# rec.par.record = False # stop (call separately later)
|
||||
```
|
||||
|
||||
H.264/H.265/AV1 need Commercial license. Use `prores` on macOS or `mjpa` as fallback.
|
||||
Extract frames: `ffmpeg -i /tmp/output.mov -vframes 120 /tmp/frames/frame_%06d.png`
|
||||
|
||||
**TOP.save() is useless for animation** — captures same GPU texture every time. Always use MovieFileOut.
|
||||
|
||||
### Before Recording: Checklist
|
||||
|
||||
1. **Verify FPS > 0** via `td_get_perf`. If FPS=0 the recording will be empty. See pitfalls #38-39.
|
||||
2. **Verify shader output is not black** via `td_get_screenshot`. Black output = shader error or missing input. See pitfalls #8, #40.
|
||||
3. **If recording with audio:** cue audio to start first, then delay recording by 3 frames. See pitfalls #19.
|
||||
4. **Set output path before starting record** — setting both in the same script can race.
|
||||
|
||||
## Audio-Reactive GLSL (Proven Recipe)
|
||||
|
||||
### Correct signal chain (tested April 2026)
|
||||
|
||||
```
|
||||
AudioFileIn CHOP (playmode=sequential)
|
||||
→ AudioSpectrum CHOP (FFT=512, outputmenu=setmanually, outlength=256, timeslice=ON)
|
||||
→ Math CHOP (gain=10)
|
||||
→ CHOP to TOP (dataformat=r, layout=rowscropped)
|
||||
→ GLSL TOP input 1 (spectrum texture, 256x2)
|
||||
|
||||
Constant TOP (rgba32float, time) → GLSL TOP input 0
|
||||
GLSL TOP → Null TOP → MovieFileOut
|
||||
```
|
||||
|
||||
### Critical audio-reactive rules (empirically verified)
|
||||
|
||||
1. **TimeSlice must stay ON** for AudioSpectrum. OFF = processes entire audio file → 24000+ samples → CHOP to TOP overflow.
|
||||
2. **Set Output Length manually** to 256 via `outputmenu='setmanually'` and `outlength=256`. Default outputs 22050 samples.
|
||||
3. **DO NOT use Lag CHOP for spectrum smoothing.** Lag CHOP operates in timeslice mode and expands 256 samples to 2400+, averaging all values to near-zero (~1e-06). The shader receives no usable data. This was the #1 audio sync failure in testing.
|
||||
4. **DO NOT use Filter CHOP either** — same timeslice expansion problem with spectrum data.
|
||||
5. **Smoothing belongs in the GLSL shader** if needed, via temporal lerp with a feedback texture: `mix(prevValue, newValue, 0.3)`. This gives frame-perfect sync with zero pipeline latency.
|
||||
6. **CHOP to TOP dataformat = 'r'**, layout = 'rowscropped'. Spectrum output is 256x2 (stereo). Sample at y=0.25 for first channel.
|
||||
7. **Math gain = 10** (not 5). Raw spectrum values are ~0.19 in bass range. Gain of 10 gives usable ~5.0 for the shader.
|
||||
8. **No Resample CHOP needed.** Control output size via AudioSpectrum's `outlength` param directly.
|
||||
|
||||
### GLSL spectrum sampling
|
||||
|
||||
```glsl
|
||||
// Input 0 = time (1x1 rgba32float), Input 1 = spectrum (256x2)
|
||||
float iTime = texture(sTD2DInputs[0], vec2(0.5)).r;
|
||||
|
||||
// Sample multiple points per band and average for stability:
|
||||
// NOTE: y=0.25 for first channel (stereo texture is 256x2, first row center is 0.25)
|
||||
float bass = (texture(sTD2DInputs[1], vec2(0.02, 0.25)).r +
|
||||
texture(sTD2DInputs[1], vec2(0.05, 0.25)).r) / 2.0;
|
||||
float mid = (texture(sTD2DInputs[1], vec2(0.2, 0.25)).r +
|
||||
texture(sTD2DInputs[1], vec2(0.35, 0.25)).r) / 2.0;
|
||||
float hi = (texture(sTD2DInputs[1], vec2(0.6, 0.25)).r +
|
||||
texture(sTD2DInputs[1], vec2(0.8, 0.25)).r) / 2.0;
|
||||
```
|
||||
|
||||
See `references/network-patterns.md` for complete build scripts + shader code.
|
||||
|
||||
## Operator Quick Reference
|
||||
|
||||
| Family | Color | Python class / MCP type | Suffix |
|
||||
|--------|-------|-------------|--------|
|
||||
| TOP | Purple | noiseTOP, glslTOP, compositeTOP, levelTop, blurTOP, textTOP, nullTOP | TOP |
|
||||
| CHOP | Green | audiofileinCHOP, audiospectrumCHOP, mathCHOP, lfoCHOP, constantCHOP | CHOP |
|
||||
| SOP | Blue | gridSOP, sphereSOP, transformSOP, noiseSOP | SOP |
|
||||
| DAT | White | textDAT, tableDAT, scriptDAT, webserverDAT | DAT |
|
||||
| MAT | Yellow | phongMAT, pbrMAT, glslMAT, constMAT | MAT |
|
||||
| COMP | Gray | geometryCOMP, containerCOMP, cameraCOMP, lightCOMP, windowCOMP | COMP |
|
||||
|
||||
## Security Notes
|
||||
|
||||
- MCP runs on localhost only (port 40404). No authentication — any local process can send commands.
|
||||
- `td_execute_python` has unrestricted access to the TD Python environment and filesystem as the TD process user.
|
||||
- `setup.sh` downloads twozero.tox from the official 404zero.com URL. Verify the download if concerned.
|
||||
- The skill never sends data outside localhost. All MCP communication is local.
|
||||
|
||||
## References
|
||||
|
||||
| File | What |
|
||||
|------|------|
|
||||
| `references/pitfalls.md` | Hard-won lessons from real sessions |
|
||||
| `references/operators.md` | All operator families with params and use cases |
|
||||
| `references/network-patterns.md` | Recipes: audio-reactive, generative, GLSL, instancing |
|
||||
| `references/mcp-tools.md` | Full twozero MCP tool parameter schemas |
|
||||
| `references/python-api.md` | TD Python: op(), scripting, extensions |
|
||||
| `references/troubleshooting.md` | Connection diagnostics, debugging |
|
||||
| `scripts/setup.sh` | Automated setup script |
|
||||
|
||||
---
|
||||
|
||||
> You're not writing code. You're conducting light.
|
||||
172
website/docs/user-guide/skills/optional/devops/devops-cli.md
Normal file
172
website/docs/user-guide/skills/optional/devops/devops-cli.md
Normal file
|
|
@ -0,0 +1,172 @@
|
|||
---
|
||||
title: "Inference Sh Cli — Run 150+ AI apps via inference"
|
||||
sidebar_label: "Inference Sh Cli"
|
||||
description: "Run 150+ AI apps via inference"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Inference Sh Cli
|
||||
|
||||
Run 150+ AI apps via inference.sh CLI (infsh) — image generation, video creation, LLMs, search, 3D, social automation. Uses the terminal tool. Triggers: inference.sh, infsh, ai apps, flux, veo, image generation, video generation, seedream, seedance, tavily
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/devops/cli` |
|
||||
| Path | `optional-skills/devops/cli` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | okaris |
|
||||
| License | MIT |
|
||||
| Tags | `AI`, `image-generation`, `video`, `LLM`, `search`, `inference`, `FLUX`, `Veo`, `Claude` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# inference.sh CLI
|
||||
|
||||
Run 150+ AI apps in the cloud with a simple CLI. No GPU required.
|
||||
|
||||
All commands use the **terminal tool** to run `infsh` commands.
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to generate images (FLUX, Reve, Seedream, Grok, Gemini image)
|
||||
- User asks to generate video (Veo, Wan, Seedance, OmniHuman)
|
||||
- User asks about inference.sh or infsh
|
||||
- User wants to run AI apps without managing individual provider APIs
|
||||
- User asks for AI-powered search (Tavily, Exa)
|
||||
- User needs avatar/lipsync generation
|
||||
|
||||
## Prerequisites
|
||||
|
||||
The `infsh` CLI must be installed and authenticated. Check with:
|
||||
|
||||
```bash
|
||||
infsh me
|
||||
```
|
||||
|
||||
If not installed:
|
||||
|
||||
```bash
|
||||
curl -fsSL https://cli.inference.sh | sh
|
||||
infsh login
|
||||
```
|
||||
|
||||
See `references/authentication.md` for full setup details.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Always Search First
|
||||
|
||||
Never guess app names — always search to find the correct app ID:
|
||||
|
||||
```bash
|
||||
infsh app list --search flux
|
||||
infsh app list --search video
|
||||
infsh app list --search image
|
||||
```
|
||||
|
||||
### 2. Run an App
|
||||
|
||||
Use the exact app ID from the search results. Always use `--json` for machine-readable output:
|
||||
|
||||
```bash
|
||||
infsh app run <app-id> --input '{"prompt": "your prompt here"}' --json
|
||||
```
|
||||
|
||||
### 3. Parse the Output
|
||||
|
||||
The JSON output contains URLs to generated media. Present these to the user with `MEDIA:<url>` for inline display.
|
||||
|
||||
## Common Commands
|
||||
|
||||
### Image Generation
|
||||
|
||||
```bash
|
||||
# Search for image apps
|
||||
infsh app list --search image
|
||||
|
||||
# FLUX Dev with LoRA
|
||||
infsh app run falai/flux-dev-lora --input '{"prompt": "sunset over mountains", "num_images": 1}' --json
|
||||
|
||||
# Gemini image generation
|
||||
infsh app run google/gemini-2-5-flash-image --input '{"prompt": "futuristic city", "num_images": 1}' --json
|
||||
|
||||
# Seedream (ByteDance)
|
||||
infsh app run bytedance/seedream-5-lite --input '{"prompt": "nature scene"}' --json
|
||||
|
||||
# Grok Imagine (xAI)
|
||||
infsh app run xai/grok-imagine-image --input '{"prompt": "abstract art"}' --json
|
||||
```
|
||||
|
||||
### Video Generation
|
||||
|
||||
```bash
|
||||
# Search for video apps
|
||||
infsh app list --search video
|
||||
|
||||
# Veo 3.1 (Google)
|
||||
infsh app run google/veo-3-1-fast --input '{"prompt": "drone shot of coastline"}' --json
|
||||
|
||||
# Seedance (ByteDance)
|
||||
infsh app run bytedance/seedance-1-5-pro --input '{"prompt": "dancing figure", "resolution": "1080p"}' --json
|
||||
|
||||
# Wan 2.5
|
||||
infsh app run falai/wan-2-5 --input '{"prompt": "person walking through city"}' --json
|
||||
```
|
||||
|
||||
### Local File Uploads
|
||||
|
||||
The CLI automatically uploads local files when you provide a path:
|
||||
|
||||
```bash
|
||||
# Upscale a local image
|
||||
infsh app run falai/topaz-image-upscaler --input '{"image": "/path/to/photo.jpg", "upscale_factor": 2}' --json
|
||||
|
||||
# Image-to-video from local file
|
||||
infsh app run falai/wan-2-5-i2v --input '{"image": "/path/to/image.png", "prompt": "make it move"}' --json
|
||||
|
||||
# Avatar with audio
|
||||
infsh app run bytedance/omnihuman-1-5 --input '{"audio": "/path/to/audio.mp3", "image": "/path/to/face.jpg"}' --json
|
||||
```
|
||||
|
||||
### Search & Research
|
||||
|
||||
```bash
|
||||
infsh app list --search search
|
||||
infsh app run tavily/tavily-search --input '{"query": "latest AI news"}' --json
|
||||
infsh app run exa/exa-search --input '{"query": "machine learning papers"}' --json
|
||||
```
|
||||
|
||||
### Other Categories
|
||||
|
||||
```bash
|
||||
# 3D generation
|
||||
infsh app list --search 3d
|
||||
|
||||
# Audio / TTS
|
||||
infsh app list --search tts
|
||||
|
||||
# Twitter/X automation
|
||||
infsh app list --search twitter
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
1. **Never guess app IDs** — always run `infsh app list --search <term>` first. App IDs change and new apps are added frequently.
|
||||
2. **Always use `--json`** — raw output is hard to parse. The `--json` flag gives structured output with URLs.
|
||||
3. **Check authentication** — if commands fail with auth errors, run `infsh login` or verify `INFSH_API_KEY` is set.
|
||||
4. **Long-running apps** — video generation can take 30-120 seconds. The terminal tool timeout should be sufficient, but warn the user it may take a moment.
|
||||
5. **Input format** — the `--input` flag takes a JSON string. Make sure to properly escape quotes.
|
||||
|
||||
## Reference Docs
|
||||
|
||||
- `references/authentication.md` — Setup, login, API keys
|
||||
- `references/app-discovery.md` — Searching and browsing the app catalog
|
||||
- `references/running-apps.md` — Running apps, input formats, output handling
|
||||
- `references/cli-reference.md` — Complete CLI command reference
|
||||
|
|
@ -0,0 +1,296 @@
|
|||
---
|
||||
title: "Docker Management"
|
||||
sidebar_label: "Docker Management"
|
||||
description: "Manage Docker containers, images, volumes, networks, and Compose stacks — lifecycle ops, debugging, cleanup, and Dockerfile optimization"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Docker Management
|
||||
|
||||
Manage Docker containers, images, volumes, networks, and Compose stacks — lifecycle ops, debugging, cleanup, and Dockerfile optimization.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/devops/docker-management` |
|
||||
| Path | `optional-skills/devops/docker-management` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | sprmn24 |
|
||||
| License | MIT |
|
||||
| Tags | `docker`, `containers`, `devops`, `infrastructure`, `compose`, `images`, `volumes`, `networks`, `debugging` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Docker Management
|
||||
|
||||
Manage Docker containers, images, volumes, networks, and Compose stacks using standard Docker CLI commands. No additional dependencies beyond Docker itself.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Run, stop, restart, remove, or inspect containers
|
||||
- Build, pull, push, tag, or clean up Docker images
|
||||
- Work with Docker Compose (multi-service stacks)
|
||||
- Manage volumes or networks
|
||||
- Debug a crashing container or analyze logs
|
||||
- Check Docker disk usage or free up space
|
||||
- Review or optimize a Dockerfile
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Docker Engine installed and running
|
||||
- User added to the `docker` group (or use `sudo`)
|
||||
- Docker Compose v2 (included with modern Docker installations)
|
||||
|
||||
Quick check:
|
||||
|
||||
```bash
|
||||
docker --version && docker compose version
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Task | Command |
|
||||
|------|---------|
|
||||
| Run container (background) | `docker run -d --name NAME IMAGE` |
|
||||
| Stop + remove | `docker stop NAME && docker rm NAME` |
|
||||
| View logs (follow) | `docker logs --tail 50 -f NAME` |
|
||||
| Shell into container | `docker exec -it NAME /bin/sh` |
|
||||
| List all containers | `docker ps -a` |
|
||||
| Build image | `docker build -t TAG .` |
|
||||
| Compose up | `docker compose up -d` |
|
||||
| Compose down | `docker compose down` |
|
||||
| Disk usage | `docker system df` |
|
||||
| Cleanup dangling | `docker image prune && docker container prune` |
|
||||
|
||||
## Procedure
|
||||
|
||||
### 1. Identify the domain
|
||||
|
||||
Figure out which area the request falls into:
|
||||
|
||||
- **Container lifecycle** → run, stop, start, restart, rm, pause/unpause
|
||||
- **Container interaction** → exec, cp, logs, inspect, stats
|
||||
- **Image management** → build, pull, push, tag, rmi, save/load
|
||||
- **Docker Compose** → up, down, ps, logs, exec, build, config
|
||||
- **Volumes & networks** → create, inspect, rm, prune, connect
|
||||
- **Troubleshooting** → log analysis, exit codes, resource issues
|
||||
|
||||
### 2. Container operations
|
||||
|
||||
**Run a new container:**
|
||||
|
||||
```bash
|
||||
# Detached service with port mapping
|
||||
docker run -d --name web -p 8080:80 nginx
|
||||
|
||||
# With environment variables
|
||||
docker run -d -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=mydb --name db postgres:16
|
||||
|
||||
# With persistent data (named volume)
|
||||
docker run -d -v pgdata:/var/lib/postgresql/data --name db postgres:16
|
||||
|
||||
# For development (bind mount source code)
|
||||
docker run -d -v $(pwd)/src:/app/src -p 3000:3000 --name dev my-app
|
||||
|
||||
# Interactive debugging (auto-remove on exit)
|
||||
docker run -it --rm ubuntu:22.04 /bin/bash
|
||||
|
||||
# With resource limits and restart policy
|
||||
docker run -d --memory=512m --cpus=1.5 --restart=unless-stopped --name app my-app
|
||||
```
|
||||
|
||||
Key flags: `-d` detached, `-it` interactive+tty, `--rm` auto-remove, `-p` port (host:container), `-e` env var, `-v` volume, `--name` name, `--restart` restart policy.
|
||||
|
||||
**Manage running containers:**
|
||||
|
||||
```bash
|
||||
docker ps # running containers
|
||||
docker ps -a # all (including stopped)
|
||||
docker stop NAME # graceful stop
|
||||
docker start NAME # start stopped container
|
||||
docker restart NAME # stop + start
|
||||
docker rm NAME # remove stopped container
|
||||
docker rm -f NAME # force remove running container
|
||||
docker container prune # remove ALL stopped containers
|
||||
```
|
||||
|
||||
**Interact with containers:**
|
||||
|
||||
```bash
|
||||
docker exec -it NAME /bin/sh # shell access (use /bin/bash if available)
|
||||
docker exec NAME env # view environment variables
|
||||
docker exec -u root NAME apt update # run as specific user
|
||||
docker logs --tail 100 -f NAME # follow last 100 lines
|
||||
docker logs --since 2h NAME # logs from last 2 hours
|
||||
docker cp NAME:/path/file ./local # copy file from container
|
||||
docker cp ./file NAME:/path/ # copy file to container
|
||||
docker inspect NAME # full container details (JSON)
|
||||
docker stats --no-stream # resource usage snapshot
|
||||
docker top NAME # running processes
|
||||
```
|
||||
|
||||
### 3. Image management
|
||||
|
||||
```bash
|
||||
# Build
|
||||
docker build -t my-app:latest .
|
||||
docker build -t my-app:prod -f Dockerfile.prod .
|
||||
docker build --no-cache -t my-app . # clean rebuild
|
||||
DOCKER_BUILDKIT=1 docker build -t my-app . # faster with BuildKit
|
||||
|
||||
# Pull and push
|
||||
docker pull node:20-alpine
|
||||
docker login ghcr.io
|
||||
docker tag my-app:latest registry/my-app:v1.0
|
||||
docker push registry/my-app:v1.0
|
||||
|
||||
# Inspect
|
||||
docker images # list local images
|
||||
docker history IMAGE # see layers
|
||||
docker inspect IMAGE # full details
|
||||
|
||||
# Cleanup
|
||||
docker image prune # remove dangling (untagged) images
|
||||
docker image prune -a # remove ALL unused images (careful!)
|
||||
docker image prune -a --filter "until=168h" # unused images older than 7 days
|
||||
```
|
||||
|
||||
### 4. Docker Compose
|
||||
|
||||
```bash
|
||||
# Start/stop
|
||||
docker compose up -d # start all services detached
|
||||
docker compose up -d --build # rebuild images before starting
|
||||
docker compose down # stop and remove containers
|
||||
docker compose down -v # also remove volumes (DESTROYS DATA)
|
||||
|
||||
# Monitoring
|
||||
docker compose ps # list services
|
||||
docker compose logs -f api # follow logs for specific service
|
||||
docker compose logs --tail 50 # last 50 lines all services
|
||||
|
||||
# Interaction
|
||||
docker compose exec api /bin/sh # shell into running service
|
||||
docker compose run --rm api npm test # one-off command (new container)
|
||||
docker compose restart api # restart specific service
|
||||
|
||||
# Validation
|
||||
docker compose config # validate and view resolved config
|
||||
```
|
||||
|
||||
**Minimal compose.yml example:**
|
||||
|
||||
```yaml
|
||||
services:
|
||||
api:
|
||||
build: .
|
||||
ports:
|
||||
- "3000:3000"
|
||||
environment:
|
||||
- DATABASE_URL=postgres://user:pass@db:5432/mydb
|
||||
depends_on:
|
||||
db:
|
||||
condition: service_healthy
|
||||
|
||||
db:
|
||||
image: postgres:16-alpine
|
||||
environment:
|
||||
POSTGRES_USER: user
|
||||
POSTGRES_PASSWORD: pass
|
||||
POSTGRES_DB: mydb
|
||||
volumes:
|
||||
- pgdata:/var/lib/postgresql/data
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "pg_isready -U user"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
volumes:
|
||||
pgdata:
|
||||
```
|
||||
|
||||
### 5. Volumes and networks
|
||||
|
||||
```bash
|
||||
# Volumes
|
||||
docker volume ls # list volumes
|
||||
docker volume create mydata # create named volume
|
||||
docker volume inspect mydata # details (mount point, etc.)
|
||||
docker volume rm mydata # remove (fails if in use)
|
||||
docker volume prune # remove unused volumes
|
||||
|
||||
# Networks
|
||||
docker network ls # list networks
|
||||
docker network create mynet # create bridge network
|
||||
docker network inspect mynet # details (connected containers)
|
||||
docker network connect mynet NAME # attach container to network
|
||||
docker network disconnect mynet NAME # detach container
|
||||
docker network rm mynet # remove network
|
||||
docker network prune # remove unused networks
|
||||
```
|
||||
|
||||
### 6. Disk usage and cleanup
|
||||
|
||||
Always start with a diagnostic before cleaning:
|
||||
|
||||
```bash
|
||||
# Check what's using space
|
||||
docker system df # summary
|
||||
docker system df -v # detailed breakdown
|
||||
|
||||
# Targeted cleanup (safe)
|
||||
docker container prune # stopped containers
|
||||
docker image prune # dangling images
|
||||
docker volume prune # unused volumes
|
||||
docker network prune # unused networks
|
||||
|
||||
# Aggressive cleanup (confirm with user first!)
|
||||
docker system prune # containers + images + networks
|
||||
docker system prune -a # also unused images
|
||||
docker system prune -a --volumes # EVERYTHING — named volumes too
|
||||
```
|
||||
|
||||
**Warning:** Never run `docker system prune -a --volumes` without confirming with the user. This removes named volumes with potentially important data.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
| Problem | Cause | Fix |
|
||||
|---------|-------|-----|
|
||||
| Container exits immediately | Main process finished or crashed | Check `docker logs NAME`, try `docker run -it --entrypoint /bin/sh IMAGE` |
|
||||
| "port is already allocated" | Another process using that port | `docker ps` or `lsof -i :PORT` to find it |
|
||||
| "no space left on device" | Docker disk full | `docker system df` then targeted prune |
|
||||
| Can't connect to container | App binds to 127.0.0.1 inside container | App must bind to `0.0.0.0`, check `-p` mapping |
|
||||
| Permission denied on volume | UID/GID mismatch host vs container | Use `--user $(id -u):$(id -g)` or fix permissions |
|
||||
| Compose services can't reach each other | Wrong network or service name | Services use service name as hostname, check `docker compose config` |
|
||||
| Build cache not working | Layer order wrong in Dockerfile | Put rarely-changing layers first (deps before source code) |
|
||||
| Image too large | No multi-stage build, no .dockerignore | Use multi-stage builds, add `.dockerignore` |
|
||||
|
||||
## Verification
|
||||
|
||||
After any Docker operation, verify the result:
|
||||
|
||||
- **Container started?** → `docker ps` (check status is "Up")
|
||||
- **Logs clean?** → `docker logs --tail 20 NAME` (no errors)
|
||||
- **Port accessible?** → `curl -s http://localhost:PORT` or `docker port NAME`
|
||||
- **Image built?** → `docker images | grep TAG`
|
||||
- **Compose stack healthy?** → `docker compose ps` (all services "running" or "healthy")
|
||||
- **Disk freed?** → `docker system df` (compare before/after)
|
||||
|
||||
## Dockerfile Optimization Tips
|
||||
|
||||
When reviewing or creating a Dockerfile, suggest these improvements:
|
||||
|
||||
1. **Multi-stage builds** — separate build environment from runtime to reduce final image size
|
||||
2. **Layer ordering** — put dependencies before source code so changes don't invalidate cached layers
|
||||
3. **Combine RUN commands** — fewer layers, smaller image
|
||||
4. **Use .dockerignore** — exclude `node_modules`, `.git`, `__pycache__`, etc.
|
||||
5. **Pin base image versions** — `node:20-alpine` not `node:latest`
|
||||
6. **Run as non-root** — add `USER` instruction for security
|
||||
7. **Use slim/alpine bases** — `python:3.12-slim` not `python:3.12`
|
||||
|
|
@ -0,0 +1,208 @@
|
|||
---
|
||||
title: "Adversarial Ux Test — Roleplay the most difficult, tech-resistant user for your product"
|
||||
sidebar_label: "Adversarial Ux Test"
|
||||
description: "Roleplay the most difficult, tech-resistant user for your product"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Adversarial Ux Test
|
||||
|
||||
Roleplay the most difficult, tech-resistant user for your product. Browse the app as that persona, find every UX pain point, then filter complaints through a pragmatism layer to separate real problems from noise. Creates actionable tickets from genuine issues only.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/dogfood/adversarial-ux-test` |
|
||||
| Path | `optional-skills/dogfood/adversarial-ux-test` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Omni @ Comelse |
|
||||
| License | MIT |
|
||||
| Tags | `qa`, `ux`, `testing`, `adversarial`, `dogfood`, `personas`, `user-testing` |
|
||||
| Related skills | [`dogfood`](/docs/user-guide/skills/bundled/dogfood/dogfood-dogfood) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Adversarial UX Test
|
||||
|
||||
Roleplay the worst-case user for your product — the person who hates technology, doesn't want your software, and will find every reason to complain. Then filter their feedback through a pragmatism layer to separate real UX problems from "I hate computers" noise.
|
||||
|
||||
Think of it as an automated "mom test" — but angry.
|
||||
|
||||
## Why This Works
|
||||
|
||||
Most QA finds bugs. This finds **friction**. A technically correct app can still be unusable for real humans. The adversarial persona catches:
|
||||
- Confusing terminology that makes sense to developers but not users
|
||||
- Too many steps to accomplish basic tasks
|
||||
- Missing onboarding or "aha moments"
|
||||
- Accessibility issues (font size, contrast, click targets)
|
||||
- Cold-start problems (empty states, no demo content)
|
||||
- Paywall/signup friction that kills conversion
|
||||
|
||||
The **pragmatism filter** (Phase 3) is what makes this useful instead of just entertaining. Without it, you'd add a "print this page" button to every screen because Grandpa can't figure out PDFs.
|
||||
|
||||
## How to Use
|
||||
|
||||
Tell the agent:
|
||||
```
|
||||
"Run an adversarial UX test on [URL]"
|
||||
"Be a grumpy [persona type] and test [app name]"
|
||||
"Do an asshole user test on my staging site"
|
||||
```
|
||||
|
||||
You can provide a persona or let the agent generate one based on your product's target audience.
|
||||
|
||||
## Step 1: Define the Persona
|
||||
|
||||
If no persona is provided, generate one by answering:
|
||||
|
||||
1. **Who is the HARDEST user for this product?** (age 50+, non-technical role, decades of experience doing it "the old way")
|
||||
2. **What is their tech comfort level?** (the lower the better — WhatsApp-only, paper notebooks, wife set up their email)
|
||||
3. **What is the ONE thing they need to accomplish?** (their core job, not your feature list)
|
||||
4. **What would make them give up?** (too many clicks, jargon, slow, confusing)
|
||||
5. **How do they talk when frustrated?** (blunt, sweary, dismissive, sighing)
|
||||
|
||||
### Good Persona Example
|
||||
> **"Big Mick" McAllister** — 58-year-old S&C coach. Uses WhatsApp and that's it. His "spreadsheet" is a paper notebook. "If I can't figure it out in 10 seconds I'm going back to my notebook." Needs to log session results for 25 players. Hates small text, jargon, and passwords.
|
||||
|
||||
### Bad Persona Example
|
||||
> "A user who doesn't like the app" — too vague, no constraints, no voice.
|
||||
|
||||
The persona must be **specific enough to stay in character** for 20 minutes of testing.
|
||||
|
||||
## Step 2: Become the Asshole (Browse as the Persona)
|
||||
|
||||
1. Read any available project docs for app context and URLs
|
||||
2. **Fully inhabit the persona** — their frustrations, limitations, goals
|
||||
3. Navigate to the app using browser tools
|
||||
4. **Attempt the persona's ACTUAL TASKS** (not a feature tour):
|
||||
- Can they do what they came to do?
|
||||
- How many clicks/screens to accomplish it?
|
||||
- What confuses them?
|
||||
- What makes them angry?
|
||||
- Where do they get lost?
|
||||
- What would make them give up and go back to their old way?
|
||||
|
||||
5. Test these friction categories:
|
||||
- **First impression** — would they even bother past the landing page?
|
||||
- **Core workflow** — the ONE thing they need to do most often
|
||||
- **Error recovery** — what happens when they do something wrong?
|
||||
- **Readability** — text size, contrast, information density
|
||||
- **Speed** — does it feel faster than their current method?
|
||||
- **Terminology** — any jargon they wouldn't understand?
|
||||
- **Navigation** — can they find their way back? do they know where they are?
|
||||
|
||||
6. Take screenshots of every pain point
|
||||
7. Check browser console for JS errors on every page
|
||||
|
||||
## Step 3: The Rant (Write Feedback in Character)
|
||||
|
||||
Write the feedback AS THE PERSONA — in their voice, with their frustrations. This is not a bug report. This is a real human venting.
|
||||
|
||||
```
|
||||
[PERSONA NAME]'s Review of [PRODUCT]
|
||||
|
||||
Overall: [Would they keep using it? Yes/No/Maybe with conditions]
|
||||
|
||||
THE GOOD (grudging admission):
|
||||
- [things even they have to admit work]
|
||||
|
||||
THE BAD (legitimate UX issues):
|
||||
- [real problems that would stop them from using the product]
|
||||
|
||||
THE UGLY (showstoppers):
|
||||
- [things that would make them uninstall/cancel immediately]
|
||||
|
||||
SPECIFIC COMPLAINTS:
|
||||
1. [Page/feature]: "[quote in persona voice]" — [what happened, expected]
|
||||
2. ...
|
||||
|
||||
VERDICT: "[one-line persona quote summarizing their experience]"
|
||||
```
|
||||
|
||||
## Step 4: The Pragmatism Filter (Critical — Do Not Skip)
|
||||
|
||||
Step OUT of the persona. Evaluate each complaint as a product person:
|
||||
|
||||
- **RED: REAL UX BUG** — Any user would have this problem, not just grumpy ones. Fix it.
|
||||
- **YELLOW: VALID BUT LOW PRIORITY** — Real issue but only for extreme users. Note it.
|
||||
- **WHITE: PERSONA NOISE** — "I hate computers" talking, not a product problem. Skip it.
|
||||
- **GREEN: FEATURE REQUEST** — Good idea hidden in the complaint. Consider it.
|
||||
|
||||
### Filter Criteria
|
||||
1. Would a 35-year-old competent-but-busy user have the same complaint? → RED
|
||||
2. Is this a genuine accessibility issue (font size, contrast, click targets)? → RED
|
||||
3. Is this "I want it to work like paper" resistance to digital? → WHITE
|
||||
4. Is this a real workflow inefficiency the persona stumbled on? → YELLOW or RED
|
||||
5. Would fixing this add complexity for the 80% who are fine? → WHITE
|
||||
6. Does the complaint reveal a missing onboarding moment? → GREEN
|
||||
|
||||
**This filter is MANDATORY.** Never ship raw persona complaints as tickets.
|
||||
|
||||
## Step 5: Create Tickets
|
||||
|
||||
For **RED** and **GREEN** items only:
|
||||
- Clear, actionable title
|
||||
- Include the persona's verbatim quote (entertaining + memorable)
|
||||
- The real UX issue underneath (objective)
|
||||
- A suggested fix (actionable)
|
||||
- Tag/label: "ux-review"
|
||||
|
||||
For **YELLOW** items: one catch-all ticket with all notes.
|
||||
|
||||
**WHITE** items appear in the report only. No tickets.
|
||||
|
||||
**Max 10 tickets per session** — focus on the worst issues.
|
||||
|
||||
## Step 6: Report
|
||||
|
||||
Deliver:
|
||||
1. The persona rant (Step 3) — entertaining and visceral
|
||||
2. The filtered assessment (Step 4) — pragmatic and actionable
|
||||
3. Tickets created (Step 5) — with links
|
||||
4. Screenshots of key issues
|
||||
|
||||
## Tips
|
||||
|
||||
- **One persona per session.** Don't mix perspectives.
|
||||
- **Stay in character during Steps 2-3.** Break character only at Step 4.
|
||||
- **Test the CORE WORKFLOW first.** Don't get distracted by settings pages.
|
||||
- **Empty states are gold.** New user experience reveals the most friction.
|
||||
- **The best findings are RED items the persona found accidentally** while trying to do something else.
|
||||
- **If the persona has zero complaints, your persona is too tech-savvy.** Make them older, less patient, more set in their ways.
|
||||
- **Run this before demos, launches, or after shipping a batch of features.**
|
||||
- **Register as a NEW user when possible.** Don't use pre-seeded admin accounts — the cold start experience is where most friction lives.
|
||||
- **Zero WHITE items is a signal, not a failure.** If the pragmatism filter finds no noise, your product has real UX problems, not just a grumpy persona.
|
||||
- **Check known issues in project docs AFTER the test.** If the persona found a bug that's already in the known issues list, that's actually the most damning finding — it means the team knew about it but never felt the user's pain.
|
||||
- **Subscription/paywall testing is critical.** Test with expired accounts, not just active ones. The "what happens when you can't pay" experience reveals whether the product respects users or holds their data hostage.
|
||||
- **Count the clicks to accomplish the persona's ONE task.** If it's more than 5, that's almost always a RED finding regardless of persona tech level.
|
||||
|
||||
## Example Personas by Industry
|
||||
|
||||
These are starting points — customize for your specific product:
|
||||
|
||||
| Product Type | Persona | Age | Key Trait |
|
||||
|-------------|---------|-----|-----------|
|
||||
| CRM | Retirement home director | 68 | Filing cabinet is the current CRM |
|
||||
| Photography SaaS | Rural wedding photographer | 62 | Books clients by phone, invoices on paper |
|
||||
| AI/ML Tool | Department store buyer | 55 | Burned by 3 failed tech startups |
|
||||
| Fitness App | Old-school gym coach | 58 | Paper notebook, thick fingers, bad eyes |
|
||||
| Accounting | Family bakery owner | 64 | Shoebox of receipts, hates subscriptions |
|
||||
| E-commerce | Market stall vendor | 60 | Cash only, smartphone is for calls |
|
||||
| Healthcare | Senior GP | 63 | Dictates notes, nurse handles the computer |
|
||||
| Education | Veteran teacher | 57 | Chalk and talk, worksheets in ring binders |
|
||||
|
||||
## Rules
|
||||
|
||||
- Stay in character during Steps 2-3
|
||||
- Be genuinely mean but fair — find real problems, not manufactured ones
|
||||
- The pragmatism filter (Step 4) is **MANDATORY**
|
||||
- Screenshots required for every complaint
|
||||
- Max 10 tickets per session
|
||||
- Test on staging/deployed app, not local dev
|
||||
- One persona, one session, one report
|
||||
142
website/docs/user-guide/skills/optional/email/email-agentmail.md
Normal file
142
website/docs/user-guide/skills/optional/email/email-agentmail.md
Normal file
|
|
@ -0,0 +1,142 @@
|
|||
---
|
||||
title: "Agentmail — Give the agent its own dedicated email inbox via AgentMail"
|
||||
sidebar_label: "Agentmail"
|
||||
description: "Give the agent its own dedicated email inbox via AgentMail"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Agentmail
|
||||
|
||||
Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/email/agentmail` |
|
||||
| Path | `optional-skills/email/agentmail` |
|
||||
| Version | `1.0.0` |
|
||||
| Tags | `email`, `communication`, `agentmail`, `mcp` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# AgentMail — Agent-Owned Email Inboxes
|
||||
|
||||
## Requirements
|
||||
|
||||
- **AgentMail API key** (required) — sign up at https://console.agentmail.to (free tier: 3 inboxes, 3,000 emails/month; paid plans from $20/mo)
|
||||
- Node.js 18+ (for the MCP server)
|
||||
|
||||
## When to Use
|
||||
Use this skill when you need to:
|
||||
- Give the agent its own dedicated email address
|
||||
- Send emails autonomously on behalf of the agent
|
||||
- Receive and read incoming emails
|
||||
- Manage email threads and conversations
|
||||
- Sign up for services or authenticate via email
|
||||
- Communicate with other agents or humans via email
|
||||
|
||||
This is NOT for reading the user's personal email (use himalaya or Gmail for that).
|
||||
AgentMail gives the agent its own identity and inbox.
|
||||
|
||||
## Setup
|
||||
|
||||
### 1. Get an API Key
|
||||
- Go to https://console.agentmail.to
|
||||
- Create an account and generate an API key (starts with `am_`)
|
||||
|
||||
### 2. Configure MCP Server
|
||||
Add to `~/.hermes/config.yaml` (paste your actual key — MCP env vars are not expanded from .env):
|
||||
```yaml
|
||||
mcp_servers:
|
||||
agentmail:
|
||||
command: "npx"
|
||||
args: ["-y", "agentmail-mcp"]
|
||||
env:
|
||||
AGENTMAIL_API_KEY: "am_your_key_here"
|
||||
```
|
||||
|
||||
### 3. Restart Hermes
|
||||
```bash
|
||||
hermes
|
||||
```
|
||||
All 11 AgentMail tools are now available automatically.
|
||||
|
||||
## Available Tools (via MCP)
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `list_inboxes` | List all agent inboxes |
|
||||
| `get_inbox` | Get details of a specific inbox |
|
||||
| `create_inbox` | Create a new inbox (gets a real email address) |
|
||||
| `delete_inbox` | Delete an inbox |
|
||||
| `list_threads` | List email threads in an inbox |
|
||||
| `get_thread` | Get a specific email thread |
|
||||
| `send_message` | Send a new email |
|
||||
| `reply_to_message` | Reply to an existing email |
|
||||
| `forward_message` | Forward an email |
|
||||
| `update_message` | Update message labels/status |
|
||||
| `get_attachment` | Download an email attachment |
|
||||
|
||||
## Procedure
|
||||
|
||||
### Create an inbox and send an email
|
||||
1. Create a dedicated inbox:
|
||||
- Use `create_inbox` with a username (e.g. `hermes-agent`)
|
||||
- The agent gets address: `hermes-agent@agentmail.to`
|
||||
2. Send an email:
|
||||
- Use `send_message` with `inbox_id`, `to`, `subject`, `text`
|
||||
3. Check for replies:
|
||||
- Use `list_threads` to see incoming conversations
|
||||
- Use `get_thread` to read a specific thread
|
||||
|
||||
### Check incoming email
|
||||
1. Use `list_inboxes` to find your inbox ID
|
||||
2. Use `list_threads` with the inbox ID to see conversations
|
||||
3. Use `get_thread` to read a thread and its messages
|
||||
|
||||
### Reply to an email
|
||||
1. Get the thread with `get_thread`
|
||||
2. Use `reply_to_message` with the message ID and your reply text
|
||||
|
||||
## Example Workflows
|
||||
|
||||
**Sign up for a service:**
|
||||
```
|
||||
1. create_inbox (username: "signup-bot")
|
||||
2. Use the inbox address to register on the service
|
||||
3. list_threads to check for verification email
|
||||
4. get_thread to read the verification code
|
||||
```
|
||||
|
||||
**Agent-to-human outreach:**
|
||||
```
|
||||
1. create_inbox (username: "hermes-outreach")
|
||||
2. send_message (to: user@example.com, subject: "Hello", text: "...")
|
||||
3. list_threads to check for replies
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
- Free tier limited to 3 inboxes and 3,000 emails/month
|
||||
- Emails come from `@agentmail.to` domain on free tier (custom domains on paid plans)
|
||||
- Node.js (18+) is required for the MCP server (`npx -y agentmail-mcp`)
|
||||
- The `mcp` Python package must be installed: `pip install mcp`
|
||||
- Real-time inbound email (webhooks) requires a public server — use `list_threads` polling via cronjob instead for personal use
|
||||
|
||||
## Verification
|
||||
After setup, test with:
|
||||
```
|
||||
hermes --toolsets mcp -q "Create an AgentMail inbox called test-agent and tell me its email address"
|
||||
```
|
||||
You should see the new inbox address returned.
|
||||
|
||||
## References
|
||||
- AgentMail docs: https://docs.agentmail.to/
|
||||
- AgentMail console: https://console.agentmail.to
|
||||
- AgentMail MCP repo: https://github.com/agentmail-to/agentmail-mcp
|
||||
- Pricing: https://www.agentmail.to/pricing
|
||||
|
|
@ -0,0 +1,257 @@
|
|||
---
|
||||
title: "Fitness Nutrition — Gym workout planner and nutrition tracker"
|
||||
sidebar_label: "Fitness Nutrition"
|
||||
description: "Gym workout planner and nutrition tracker"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Fitness Nutrition
|
||||
|
||||
Gym workout planner and nutrition tracker. Search 690+ exercises by muscle, equipment, or category via wger. Look up macros and calories for 380,000+ foods via USDA FoodData Central. Compute BMI, TDEE, one-rep max, macro splits, and body fat — pure Python, no pip installs. Built for anyone chasing gains, cutting weight, or just trying to eat better.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/health/fitness-nutrition` |
|
||||
| Path | `optional-skills/health/fitness-nutrition` |
|
||||
| Version | `1.0.0` |
|
||||
| License | MIT |
|
||||
| Tags | `health`, `fitness`, `nutrition`, `gym`, `workout`, `diet`, `exercise` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Fitness & Nutrition
|
||||
|
||||
Expert fitness coach and sports nutritionist skill. Two data sources
|
||||
plus offline calculators — everything a gym-goer needs in one place.
|
||||
|
||||
**Data sources (all free, no pip dependencies):**
|
||||
|
||||
- **wger** (https://wger.de/api/v2/) — open exercise database, 690+ exercises with muscles, equipment, images. Public endpoints need zero authentication.
|
||||
- **USDA FoodData Central** (https://api.nal.usda.gov/fdc/v1/) — US government nutrition database, 380,000+ foods. `DEMO_KEY` works instantly; free signup for higher limits.
|
||||
|
||||
**Offline calculators (pure stdlib Python):**
|
||||
|
||||
- BMI, TDEE (Mifflin-St Jeor), one-rep max (Epley/Brzycki/Lombardi), macro splits, body fat % (US Navy method)
|
||||
|
||||
---
|
||||
|
||||
## When to Use
|
||||
|
||||
Trigger this skill when the user asks about:
|
||||
- Exercises, workouts, gym routines, muscle groups, workout splits
|
||||
- Food macros, calories, protein content, meal planning, calorie counting
|
||||
- Body composition: BMI, body fat, TDEE, caloric surplus/deficit
|
||||
- One-rep max estimates, training percentages, progressive overload
|
||||
- Macro ratios for cutting, bulking, or maintenance
|
||||
|
||||
---
|
||||
|
||||
## Procedure
|
||||
|
||||
### Exercise Lookup (wger API)
|
||||
|
||||
All wger public endpoints return JSON and require no auth. Always add
|
||||
`format=json` and `language=2` (English) to exercise queries.
|
||||
|
||||
**Step 1 — Identify what the user wants:**
|
||||
|
||||
- By muscle → use `/api/v2/exercise/?muscles={id}&language=2&status=2&format=json`
|
||||
- By category → use `/api/v2/exercise/?category={id}&language=2&status=2&format=json`
|
||||
- By equipment → use `/api/v2/exercise/?equipment={id}&language=2&status=2&format=json`
|
||||
- By name → use `/api/v2/exercise/search/?term={query}&language=english&format=json`
|
||||
- Full details → use `/api/v2/exerciseinfo/{exercise_id}/?format=json`
|
||||
|
||||
**Step 2 — Reference IDs (so you don't need extra API calls):**
|
||||
|
||||
Exercise categories:
|
||||
|
||||
| ID | Category |
|
||||
|----|-------------|
|
||||
| 8 | Arms |
|
||||
| 9 | Legs |
|
||||
| 10 | Abs |
|
||||
| 11 | Chest |
|
||||
| 12 | Back |
|
||||
| 13 | Shoulders |
|
||||
| 14 | Calves |
|
||||
| 15 | Cardio |
|
||||
|
||||
Muscles:
|
||||
|
||||
| ID | Muscle | ID | Muscle |
|
||||
|----|---------------------------|----|-------------------------|
|
||||
| 1 | Biceps brachii | 2 | Anterior deltoid |
|
||||
| 3 | Serratus anterior | 4 | Pectoralis major |
|
||||
| 5 | Obliquus externus | 6 | Gastrocnemius |
|
||||
| 7 | Rectus abdominis | 8 | Gluteus maximus |
|
||||
| 9 | Trapezius | 10 | Quadriceps femoris |
|
||||
| 11 | Biceps femoris | 12 | Latissimus dorsi |
|
||||
| 13 | Brachialis | 14 | Triceps brachii |
|
||||
| 15 | Soleus | | |
|
||||
|
||||
Equipment:
|
||||
|
||||
| ID | Equipment |
|
||||
|----|----------------|
|
||||
| 1 | Barbell |
|
||||
| 3 | Dumbbell |
|
||||
| 4 | Gym mat |
|
||||
| 5 | Swiss Ball |
|
||||
| 6 | Pull-up bar |
|
||||
| 7 | none (bodyweight) |
|
||||
| 8 | Bench |
|
||||
| 9 | Incline bench |
|
||||
| 10 | Kettlebell |
|
||||
|
||||
**Step 3 — Fetch and present results:**
|
||||
|
||||
```bash
|
||||
# Search exercises by name
|
||||
QUERY="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$QUERY")
|
||||
curl -s "https://wger.de/api/v2/exercise/search/?term=${ENCODED}&language=english&format=json" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
for s in data.get('suggestions',[])[:10]:
|
||||
d=s.get('data',{})
|
||||
print(f\" ID {d.get('id','?'):>4} | {d.get('name','N/A'):<35} | Category: {d.get('category','N/A')}\")
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
# Get full details for a specific exercise
|
||||
EXERCISE_ID="$1"
|
||||
curl -s "https://wger.de/api/v2/exerciseinfo/${EXERCISE_ID}/?format=json" \
|
||||
| python3 -c "
|
||||
import json,sys,html,re
|
||||
data=json.load(sys.stdin)
|
||||
trans=[t for t in data.get('translations',[]) if t.get('language')==2]
|
||||
t=trans[0] if trans else data.get('translations',[{}])[0]
|
||||
desc=re.sub('<[^>]+>','',html.unescape(t.get('description','N/A')))
|
||||
print(f\"Exercise : {t.get('name','N/A')}\")
|
||||
print(f\"Category : {data.get('category',{}).get('name','N/A')}\")
|
||||
print(f\"Primary : {', '.join(m.get('name_en','') for m in data.get('muscles',[])) or 'N/A'}\")
|
||||
print(f\"Secondary : {', '.join(m.get('name_en','') for m in data.get('muscles_secondary',[])) or 'none'}\")
|
||||
print(f\"Equipment : {', '.join(e.get('name','') for e in data.get('equipment',[])) or 'bodyweight'}\")
|
||||
print(f\"How to : {desc[:500]}\")
|
||||
imgs=data.get('images',[])
|
||||
if imgs: print(f\"Image : {imgs[0].get('image','')}\")
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
# List exercises filtering by muscle, category, or equipment
|
||||
# Combine filters as needed: ?muscles=4&equipment=1&language=2&status=2
|
||||
FILTER="$1" # e.g. "muscles=4" or "category=11" or "equipment=3"
|
||||
curl -s "https://wger.de/api/v2/exercise/?${FILTER}&language=2&status=2&limit=20&format=json" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
print(f'Found {data.get(\"count\",0)} exercises.')
|
||||
for ex in data.get('results',[]):
|
||||
print(f\" ID {ex['id']:>4} | muscles: {ex.get('muscles',[])} | equipment: {ex.get('equipment',[])}\")
|
||||
"
|
||||
```
|
||||
|
||||
### Nutrition Lookup (USDA FoodData Central)
|
||||
|
||||
Uses `USDA_API_KEY` env var if set, otherwise falls back to `DEMO_KEY`.
|
||||
DEMO_KEY = 30 requests/hour. Free signup key = 1,000 requests/hour.
|
||||
|
||||
```bash
|
||||
# Search foods by name
|
||||
FOOD="$1"
|
||||
API_KEY="${USDA_API_KEY:-DEMO_KEY}"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$FOOD")
|
||||
curl -s "https://api.nal.usda.gov/fdc/v1/foods/search?api_key=${API_KEY}&query=${ENCODED}&pageSize=5&dataType=Foundation,SR%20Legacy" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
foods=data.get('foods',[])
|
||||
if not foods: print('No foods found.'); sys.exit()
|
||||
for f in foods:
|
||||
n={x['nutrientName']:x.get('value','?') for x in f.get('foodNutrients',[])}
|
||||
cal=n.get('Energy','?'); prot=n.get('Protein','?')
|
||||
fat=n.get('Total lipid (fat)','?'); carb=n.get('Carbohydrate, by difference','?')
|
||||
print(f\"{f.get('description','N/A')}\")
|
||||
print(f\" Per 100g: {cal} kcal | {prot}g protein | {fat}g fat | {carb}g carbs\")
|
||||
print(f\" FDC ID: {f.get('fdcId','N/A')}\")
|
||||
print()
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
# Detailed nutrient profile by FDC ID
|
||||
FDC_ID="$1"
|
||||
API_KEY="${USDA_API_KEY:-DEMO_KEY}"
|
||||
curl -s "https://api.nal.usda.gov/fdc/v1/food/${FDC_ID}?api_key=${API_KEY}" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
d=json.load(sys.stdin)
|
||||
print(f\"Food: {d.get('description','N/A')}\")
|
||||
print(f\"{'Nutrient':<40} {'Amount':>8} {'Unit'}\")
|
||||
print('-'*56)
|
||||
for x in sorted(d.get('foodNutrients',[]),key=lambda x:x.get('nutrient',{}).get('rank',9999)):
|
||||
nut=x.get('nutrient',{}); amt=x.get('amount',0)
|
||||
if amt and float(amt)>0:
|
||||
print(f\" {nut.get('name',''):<38} {amt:>8} {nut.get('unitName','')}\")
|
||||
"
|
||||
```
|
||||
|
||||
### Offline Calculators
|
||||
|
||||
Use the helper scripts in `scripts/` for batch operations,
|
||||
or run inline for single calculations:
|
||||
|
||||
- `python3 scripts/body_calc.py bmi <weight_kg> <height_cm>`
|
||||
- `python3 scripts/body_calc.py tdee <weight_kg> <height_cm> <age> <M|F> <activity 1-5>`
|
||||
- `python3 scripts/body_calc.py 1rm <weight> <reps>`
|
||||
- `python3 scripts/body_calc.py macros <tdee_kcal> <cut|maintain|bulk>`
|
||||
- `python3 scripts/body_calc.py bodyfat <M|F> <neck_cm> <waist_cm> [hip_cm] <height_cm>`
|
||||
|
||||
See `references/FORMULAS.md` for the science behind each formula.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- wger exercise endpoint returns **all languages by default** — always add `language=2` for English
|
||||
- wger includes **unverified user submissions** — add `status=2` to only get approved exercises
|
||||
- USDA `DEMO_KEY` has **30 req/hour** — add `sleep 2` between batch requests or get a free key
|
||||
- USDA data is **per 100g** — remind users to scale to their actual portion size
|
||||
- BMI does not distinguish muscle from fat — high BMI in muscular people is not necessarily unhealthy
|
||||
- Body fat formulas are **estimates** (±3-5%) — recommend DEXA scans for precision
|
||||
- 1RM formulas lose accuracy above 10 reps — use sets of 3-5 for best estimates
|
||||
- wger's `exercise/search` endpoint uses `term` not `query` as the parameter name
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
After running exercise search: confirm results include exercise names, muscle groups, and equipment.
|
||||
After nutrition lookup: confirm per-100g macros are returned with kcal, protein, fat, carbs.
|
||||
After calculators: sanity-check outputs (e.g. TDEE should be 1500-3500 for most adults).
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Task | Source | Endpoint |
|
||||
|------|--------|----------|
|
||||
| Search exercises by name | wger | `GET /api/v2/exercise/search/?term=&language=english` |
|
||||
| Exercise details | wger | `GET /api/v2/exerciseinfo/{id}/` |
|
||||
| Filter by muscle | wger | `GET /api/v2/exercise/?muscles={id}&language=2&status=2` |
|
||||
| Filter by equipment | wger | `GET /api/v2/exercise/?equipment={id}&language=2&status=2` |
|
||||
| List categories | wger | `GET /api/v2/exercisecategory/` |
|
||||
| List muscles | wger | `GET /api/v2/muscle/` |
|
||||
| Search foods | USDA | `GET /fdc/v1/foods/search?query=&dataType=Foundation,SR Legacy` |
|
||||
| Food details | USDA | `GET /fdc/v1/food/{fdcId}` |
|
||||
| BMI / TDEE / 1RM / macros | offline | `python3 scripts/body_calc.py` |
|
||||
|
|
@ -0,0 +1,469 @@
|
|||
---
|
||||
title: "Neuroskill Bci"
|
||||
sidebar_label: "Neuroskill Bci"
|
||||
description: "Connect to a running NeuroSkill instance and incorporate the user's real-time cognitive and emotional state (focus, relaxation, mood, cognitive load, drowsin..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Neuroskill Bci
|
||||
|
||||
Connect to a running NeuroSkill instance and incorporate the user's real-time cognitive and emotional state (focus, relaxation, mood, cognitive load, drowsiness, heart rate, HRV, sleep staging, and 40+ derived EXG scores) into responses. Requires a BCI wearable (Muse 2/S or OpenBCI) and the NeuroSkill desktop app running locally.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/health/neuroskill-bci` |
|
||||
| Path | `optional-skills/health/neuroskill-bci` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent + Nous Research |
|
||||
| License | MIT |
|
||||
| Tags | `BCI`, `neurofeedback`, `health`, `focus`, `EEG`, `cognitive-state`, `biometrics`, `neuroskill` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# NeuroSkill BCI Integration
|
||||
|
||||
Connect Hermes to a running [NeuroSkill](https://neuroskill.com/) instance to read
|
||||
real-time brain and body metrics from a BCI wearable. Use this to give
|
||||
cognitively-aware responses, suggest interventions, and track mental performance
|
||||
over time.
|
||||
|
||||
> **⚠️ Research Use Only** — NeuroSkill is an open-source research tool. It is
|
||||
> NOT a medical device and has NOT been cleared by the FDA, CE, or any regulatory
|
||||
> body. Never use these metrics for clinical diagnosis or treatment.
|
||||
|
||||
See `references/metrics.md` for the full metric reference, `references/protocols.md`
|
||||
for intervention protocols, and `references/api.md` for the WebSocket/HTTP API.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **Node.js 20+** installed (`node --version`)
|
||||
- **NeuroSkill desktop app** running with a connected BCI device
|
||||
- **BCI hardware**: Muse 2, Muse S, or OpenBCI (4-channel EEG + PPG + IMU via BLE)
|
||||
- `npx neuroskill status` returns data without errors
|
||||
|
||||
### Verify Setup
|
||||
```bash
|
||||
node --version # Must be 20+
|
||||
npx neuroskill status # Full system snapshot
|
||||
npx neuroskill status --json # Machine-parseable JSON
|
||||
```
|
||||
|
||||
If `npx neuroskill status` returns an error, tell the user:
|
||||
- Make sure the NeuroSkill desktop app is open
|
||||
- Ensure the BCI device is powered on and connected via Bluetooth
|
||||
- Check signal quality — green indicators in NeuroSkill (≥0.7 per electrode)
|
||||
- If `command not found`, install Node.js 20+
|
||||
|
||||
---
|
||||
|
||||
## CLI Reference: `npx neuroskill <command>`
|
||||
|
||||
All commands support `--json` (raw JSON, pipe-safe) and `--full` (human summary + JSON).
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `status` | Full system snapshot: device, scores, bands, ratios, sleep, history |
|
||||
| `session [N]` | Single session breakdown with first/second half trends (0=most recent) |
|
||||
| `sessions` | List all recorded sessions across all days |
|
||||
| `search` | ANN similarity search for neurally similar historical moments |
|
||||
| `compare` | A/B session comparison with metric deltas and trend analysis |
|
||||
| `sleep [N]` | Sleep stage classification (Wake/N1/N2/N3/REM) with analysis |
|
||||
| `label "text"` | Create a timestamped annotation at the current moment |
|
||||
| `search-labels "query"` | Semantic vector search over past labels |
|
||||
| `interactive "query"` | Cross-modal 4-layer graph search (text → EXG → labels) |
|
||||
| `listen` | Real-time event streaming (default 5s, set `--seconds N`) |
|
||||
| `umap` | 3D UMAP projection of session embeddings |
|
||||
| `calibrate` | Open calibration window and start a profile |
|
||||
| `timer` | Launch focus timer (Pomodoro/Deep Work/Short Focus presets) |
|
||||
| `notify "title" "body"` | Send an OS notification via the NeuroSkill app |
|
||||
| `raw '{json}'` | Raw JSON passthrough to the server |
|
||||
|
||||
### Global Flags
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--json` | Raw JSON output (no ANSI, pipe-safe) |
|
||||
| `--full` | Human summary + colorized JSON |
|
||||
| `--port <N>` | Override server port (default: auto-discover, usually 8375) |
|
||||
| `--ws` | Force WebSocket transport |
|
||||
| `--http` | Force HTTP transport |
|
||||
| `--k <N>` | Nearest neighbors count (search, search-labels) |
|
||||
| `--seconds <N>` | Duration for listen (default: 5) |
|
||||
| `--trends` | Show per-session metric trends (sessions) |
|
||||
| `--dot` | Graphviz DOT output (interactive) |
|
||||
|
||||
---
|
||||
|
||||
## 1. Checking Current State
|
||||
|
||||
### Get Live Metrics
|
||||
```bash
|
||||
npx neuroskill status --json
|
||||
```
|
||||
|
||||
**Always use `--json`** for reliable parsing. The default output is colorized
|
||||
human-readable text.
|
||||
|
||||
### Key Fields in the Response
|
||||
|
||||
The `scores` object contains all live metrics (0–1 scale unless noted):
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"scores": {
|
||||
"focus": 0.70, // β / (α + θ) — sustained attention
|
||||
"relaxation": 0.40, // α / (β + θ) — calm wakefulness
|
||||
"engagement": 0.60, // active mental investment
|
||||
"meditation": 0.52, // alpha + stillness + HRV coherence
|
||||
"mood": 0.55, // composite from FAA, TAR, BAR
|
||||
"cognitive_load": 0.33, // frontal θ / temporal α · f(FAA, TBR)
|
||||
"drowsiness": 0.10, // TAR + TBR + falling spectral centroid
|
||||
"hr": 68.2, // heart rate in bpm (from PPG)
|
||||
"snr": 14.3, // signal-to-noise ratio in dB
|
||||
"stillness": 0.88, // 0–1; 1 = perfectly still
|
||||
"faa": 0.042, // Frontal Alpha Asymmetry (+ = approach)
|
||||
"tar": 0.56, // Theta/Alpha Ratio
|
||||
"bar": 0.53, // Beta/Alpha Ratio
|
||||
"tbr": 1.06, // Theta/Beta Ratio (ADHD proxy)
|
||||
"apf": 10.1, // Alpha Peak Frequency in Hz
|
||||
"coherence": 0.614, // inter-hemispheric coherence
|
||||
"bands": {
|
||||
"rel_delta": 0.28, "rel_theta": 0.18,
|
||||
"rel_alpha": 0.32, "rel_beta": 0.17, "rel_gamma": 0.05
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Also includes: `device` (state, battery, firmware), `signal_quality` (per-electrode 0–1),
|
||||
`session` (duration, epochs), `embeddings`, `labels`, `sleep` summary, and `history`.
|
||||
|
||||
### Interpreting the Output
|
||||
|
||||
Parse the JSON and translate metrics into natural language. Never report raw
|
||||
numbers alone — always give them meaning:
|
||||
|
||||
**DO:**
|
||||
> "Your focus is solid right now at 0.70 — that's flow state territory. Heart
|
||||
> rate is steady at 68 bpm and your FAA is positive, which suggests good
|
||||
> approach motivation. Great time to tackle something complex."
|
||||
|
||||
**DON'T:**
|
||||
> "Focus: 0.70, Relaxation: 0.40, HR: 68"
|
||||
|
||||
Key interpretation thresholds (see `references/metrics.md` for the full guide):
|
||||
- **Focus > 0.70** → flow state territory, protect it
|
||||
- **Focus < 0.40** → suggest a break or protocol
|
||||
- **Drowsiness > 0.60** → fatigue warning, micro-sleep risk
|
||||
- **Relaxation < 0.30** → stress intervention needed
|
||||
- **Cognitive Load > 0.70 sustained** → mind dump or break
|
||||
- **TBR > 1.5** → theta-dominant, reduced executive control
|
||||
- **FAA < 0** → withdrawal/negative affect — consider FAA rebalancing
|
||||
- **SNR < 3 dB** → unreliable signal, suggest electrode repositioning
|
||||
|
||||
---
|
||||
|
||||
## 2. Session Analysis
|
||||
|
||||
### Single Session Breakdown
|
||||
```bash
|
||||
npx neuroskill session --json # most recent session
|
||||
npx neuroskill session 1 --json # previous session
|
||||
npx neuroskill session 0 --json | jq '{focus: .metrics.focus, trend: .trends.focus}'
|
||||
```
|
||||
|
||||
Returns full metrics with **first-half vs second-half trends** (`"up"`, `"down"`, `"flat"`).
|
||||
Use this to describe how a session evolved:
|
||||
|
||||
> "Your focus started at 0.64 and climbed to 0.76 by the end — a clear upward trend.
|
||||
> Cognitive load dropped from 0.38 to 0.28, suggesting the task became more automatic
|
||||
> as you settled in."
|
||||
|
||||
### List All Sessions
|
||||
```bash
|
||||
npx neuroskill sessions --json
|
||||
npx neuroskill sessions --trends # show per-session metric trends
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Historical Search
|
||||
|
||||
### Neural Similarity Search
|
||||
```bash
|
||||
npx neuroskill search --json # auto: last session, k=5
|
||||
npx neuroskill search --k 10 --json # 10 nearest neighbors
|
||||
npx neuroskill search --start <UTC> --end <UTC> --json
|
||||
```
|
||||
|
||||
Finds moments in history that are neurally similar using HNSW approximate
|
||||
nearest-neighbor search over 128-D ZUNA embeddings. Returns distance statistics,
|
||||
temporal distribution (hour of day), and top matching days.
|
||||
|
||||
Use this when the user asks:
|
||||
- "When was I last in a state like this?"
|
||||
- "Find my best focus sessions"
|
||||
- "When do I usually crash in the afternoon?"
|
||||
|
||||
### Semantic Label Search
|
||||
```bash
|
||||
npx neuroskill search-labels "deep focus" --k 10 --json
|
||||
npx neuroskill search-labels "stress" --json | jq '[.results[].EXG_metrics.tbr]'
|
||||
```
|
||||
|
||||
Searches label text using vector embeddings (Xenova/bge-small-en-v1.5). Returns
|
||||
matching labels with their associated EXG metrics at the time of labeling.
|
||||
|
||||
### Cross-Modal Graph Search
|
||||
```bash
|
||||
npx neuroskill interactive "deep focus" --json
|
||||
npx neuroskill interactive "deep focus" --dot | dot -Tsvg > graph.svg
|
||||
```
|
||||
|
||||
4-layer graph: query → text labels → EXG points → nearby labels. Use `--k-text`,
|
||||
`--k-EXG`, `--reach <minutes>` to tune.
|
||||
|
||||
---
|
||||
|
||||
## 4. Session Comparison
|
||||
```bash
|
||||
npx neuroskill compare --json # auto: last 2 sessions
|
||||
npx neuroskill compare --a-start <UTC> --a-end <UTC> --b-start <UTC> --b-end <UTC> --json
|
||||
```
|
||||
|
||||
Returns metric deltas with absolute change, percentage change, and direction for
|
||||
~50 metrics. Also includes `insights.improved[]` and `insights.declined[]` arrays,
|
||||
sleep staging for both sessions, and a UMAP job ID.
|
||||
|
||||
Interpret comparisons with context — mention trends, not just deltas:
|
||||
> "Yesterday you had two strong focus blocks (10am and 2pm). Today you've had one
|
||||
> starting around 11am that's still going. Your overall engagement is higher today
|
||||
> but there have been more stress spikes — your stress index jumped 15% and
|
||||
> FAA dipped negative more often."
|
||||
|
||||
```bash
|
||||
# Sort metrics by improvement percentage
|
||||
npx neuroskill compare --json | jq '.insights.deltas | to_entries | sort_by(.value.pct) | reverse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Sleep Data
|
||||
```bash
|
||||
npx neuroskill sleep --json # last 24 hours
|
||||
npx neuroskill sleep 0 --json # most recent sleep session
|
||||
npx neuroskill sleep --start <UTC> --end <UTC> --json
|
||||
```
|
||||
|
||||
Returns epoch-by-epoch sleep staging (5-second windows) with analysis:
|
||||
- **Stage codes**: 0=Wake, 1=N1, 2=N2, 3=N3 (deep), 4=REM
|
||||
- **Analysis**: efficiency_pct, onset_latency_min, rem_latency_min, bout counts
|
||||
- **Healthy targets**: N3 15–25%, REM 20–25%, efficiency >85%, onset <20 min
|
||||
|
||||
```bash
|
||||
npx neuroskill sleep --json | jq '.summary | {n3: .n3_epochs, rem: .rem_epochs}'
|
||||
npx neuroskill sleep --json | jq '.analysis.efficiency_pct'
|
||||
```
|
||||
|
||||
Use this when the user mentions sleep, tiredness, or recovery.
|
||||
|
||||
---
|
||||
|
||||
## 6. Labeling Moments
|
||||
```bash
|
||||
npx neuroskill label "breakthrough"
|
||||
npx neuroskill label "studying algorithms"
|
||||
npx neuroskill label "post-meditation"
|
||||
npx neuroskill label --json "focus block start" # returns label_id
|
||||
```
|
||||
|
||||
Auto-label moments when:
|
||||
- User reports a breakthrough or insight
|
||||
- User starts a new task type (e.g., "switching to code review")
|
||||
- User completes a significant protocol
|
||||
- User asks you to mark the current moment
|
||||
- A notable state transition occurs (entering/leaving flow)
|
||||
|
||||
Labels are stored in a database and indexed for later retrieval via `search-labels`
|
||||
and `interactive` commands.
|
||||
|
||||
---
|
||||
|
||||
## 7. Real-Time Streaming
|
||||
```bash
|
||||
npx neuroskill listen --seconds 30 --json
|
||||
npx neuroskill listen --seconds 5 --json | jq '[.[] | select(.event == "scores")]'
|
||||
```
|
||||
|
||||
Streams live WebSocket events (EXG, PPG, IMU, scores, labels) for the specified
|
||||
duration. Requires WebSocket connection (not available with `--http`).
|
||||
|
||||
Use this for continuous monitoring scenarios or to observe metric changes in real-time
|
||||
during a protocol.
|
||||
|
||||
---
|
||||
|
||||
## 8. UMAP Visualization
|
||||
```bash
|
||||
npx neuroskill umap --json # auto: last 2 sessions
|
||||
npx neuroskill umap --a-start <UTC> --a-end <UTC> --b-start <UTC> --b-end <UTC> --json
|
||||
```
|
||||
|
||||
GPU-accelerated 3D UMAP projection of ZUNA embeddings. The `separation_score`
|
||||
indicates how neurally distinct two sessions are:
|
||||
- **> 1.5** → Sessions are neurally distinct (different brain states)
|
||||
- **< 0.5** → Similar brain states across both sessions
|
||||
|
||||
---
|
||||
|
||||
## 9. Proactive State Awareness
|
||||
|
||||
### Session Start Check
|
||||
At the beginning of a session, optionally run a status check if the user mentions
|
||||
they're wearing their device or asks about their state:
|
||||
```bash
|
||||
npx neuroskill status --json
|
||||
```
|
||||
|
||||
Inject a brief state summary:
|
||||
> "Quick check-in: focus is building at 0.62, relaxation is good at 0.55, and your
|
||||
> FAA is positive — approach motivation is engaged. Looks like a solid start."
|
||||
|
||||
### When to Proactively Mention State
|
||||
|
||||
Mention cognitive state **only** when:
|
||||
- User explicitly asks ("How am I doing?", "Check my focus")
|
||||
- User reports difficulty concentrating, stress, or fatigue
|
||||
- A critical threshold is crossed (drowsiness > 0.70, focus < 0.30 sustained)
|
||||
- User is about to do something cognitively demanding and asks for readiness
|
||||
|
||||
**Do NOT** interrupt flow state to report metrics. If focus > 0.75, protect the
|
||||
session — silence is the correct response.
|
||||
|
||||
---
|
||||
|
||||
## 10. Suggesting Protocols
|
||||
|
||||
When metrics indicate a need, suggest a protocol from `references/protocols.md`.
|
||||
Always ask before starting — never interrupt flow state:
|
||||
|
||||
> "Your focus has been declining for the past 15 minutes and TBR is climbing past
|
||||
> 1.5 — signs of theta dominance and mental fatigue. Want me to walk you through
|
||||
> a Theta-Beta Neurofeedback Anchor? It's a 90-second exercise that uses rhythmic
|
||||
> counting and breath to suppress theta and lift beta."
|
||||
|
||||
Key triggers:
|
||||
- **Focus < 0.40, TBR > 1.5** → Theta-Beta Neurofeedback Anchor or Box Breathing
|
||||
- **Relaxation < 0.30, stress_index high** → Cardiac Coherence or 4-7-8 Breathing
|
||||
- **Cognitive Load > 0.70 sustained** → Cognitive Load Offload (mind dump)
|
||||
- **Drowsiness > 0.60** → Ultradian Reset or Wake Reset
|
||||
- **FAA < 0 (negative)** → FAA Rebalancing
|
||||
- **Flow State (focus > 0.75, engagement > 0.70)** → Do NOT interrupt
|
||||
- **High stillness + headache_index** → Neck Release Sequence
|
||||
- **Low RMSSD (< 25ms)** → Vagal Toning
|
||||
|
||||
---
|
||||
|
||||
## 11. Additional Tools
|
||||
|
||||
### Focus Timer
|
||||
```bash
|
||||
npx neuroskill timer --json
|
||||
```
|
||||
Launches the Focus Timer window with Pomodoro (25/5), Deep Work (50/10), or
|
||||
Short Focus (15/5) presets.
|
||||
|
||||
### Calibration
|
||||
```bash
|
||||
npx neuroskill calibrate
|
||||
npx neuroskill calibrate --profile "Eyes Open"
|
||||
```
|
||||
Opens the calibration window. Useful when signal quality is poor or the user
|
||||
wants to establish a personalized baseline.
|
||||
|
||||
### OS Notifications
|
||||
```bash
|
||||
npx neuroskill notify "Break Time" "Your focus has been declining for 20 minutes"
|
||||
```
|
||||
|
||||
### Raw JSON Passthrough
|
||||
```bash
|
||||
npx neuroskill raw '{"command":"status"}' --json
|
||||
```
|
||||
For any server command not yet mapped to a CLI subcommand.
|
||||
|
||||
---
|
||||
|
||||
## Error Handling
|
||||
|
||||
| Error | Likely Cause | Fix |
|
||||
|-------|-------------|-----|
|
||||
| `npx neuroskill status` hangs | NeuroSkill app not running | Open NeuroSkill desktop app |
|
||||
| `device.state: "disconnected"` | BCI device not connected | Check Bluetooth, device battery |
|
||||
| All scores return 0 | Poor electrode contact | Reposition headband, moisten electrodes |
|
||||
| `signal_quality` values < 0.7 | Loose electrodes | Adjust fit, clean electrode contacts |
|
||||
| SNR < 3 dB | Noisy signal | Minimize head movement, check environment |
|
||||
| `command not found: npx` | Node.js not installed | Install Node.js 20+ |
|
||||
|
||||
---
|
||||
|
||||
## Example Interactions
|
||||
|
||||
**"How am I doing right now?"**
|
||||
```bash
|
||||
npx neuroskill status --json
|
||||
```
|
||||
→ Interpret scores naturally, mentioning focus, relaxation, mood, and any notable
|
||||
ratios (FAA, TBR). Suggest an action only if metrics indicate a need.
|
||||
|
||||
**"I can't concentrate"**
|
||||
```bash
|
||||
npx neuroskill status --json
|
||||
```
|
||||
→ Check if metrics confirm it (high theta, low beta, rising TBR, high drowsiness).
|
||||
→ If confirmed, suggest an appropriate protocol from `references/protocols.md`.
|
||||
→ If metrics look fine, the issue may be motivational rather than neurological.
|
||||
|
||||
**"Compare my focus today vs yesterday"**
|
||||
```bash
|
||||
npx neuroskill compare --json
|
||||
```
|
||||
→ Interpret trends, not just numbers. Mention what improved, what declined, and
|
||||
possible causes.
|
||||
|
||||
**"When was I last in a flow state?"**
|
||||
```bash
|
||||
npx neuroskill search-labels "flow" --json
|
||||
npx neuroskill search --json
|
||||
```
|
||||
→ Report timestamps, associated metrics, and what the user was doing (from labels).
|
||||
|
||||
**"How did I sleep?"**
|
||||
```bash
|
||||
npx neuroskill sleep --json
|
||||
```
|
||||
→ Report sleep architecture (N3%, REM%, efficiency), compare to healthy targets,
|
||||
and note any issues (high wake epochs, low REM).
|
||||
|
||||
**"Mark this moment — I just had a breakthrough"**
|
||||
```bash
|
||||
npx neuroskill label "breakthrough"
|
||||
```
|
||||
→ Confirm label saved. Optionally note the current metrics to remember the state.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [NeuroSkill Paper — arXiv:2603.03212](https://arxiv.org/abs/2603.03212) (Kosmyna & Hauptmann, MIT Media Lab)
|
||||
- [NeuroSkill Desktop App](https://github.com/NeuroSkill-com/skill) (GPLv3)
|
||||
- [NeuroLoop CLI Companion](https://github.com/NeuroSkill-com/neuroloop) (GPLv3)
|
||||
- [MIT Media Lab Project](https://www.media.mit.edu/projects/neuroskill/overview/)
|
||||
314
website/docs/user-guide/skills/optional/mcp/mcp-fastmcp.md
Normal file
314
website/docs/user-guide/skills/optional/mcp/mcp-fastmcp.md
Normal file
|
|
@ -0,0 +1,314 @@
|
|||
---
|
||||
title: "Fastmcp — Build, test, inspect, install, and deploy MCP servers with FastMCP in Python"
|
||||
sidebar_label: "Fastmcp"
|
||||
description: "Build, test, inspect, install, and deploy MCP servers with FastMCP in Python"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Fastmcp
|
||||
|
||||
Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. Use when creating a new MCP server, wrapping an API or database as MCP tools, exposing resources or prompts, or preparing a FastMCP server for Claude Code, Cursor, or HTTP deployment.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mcp/fastmcp` |
|
||||
| Path | `optional-skills/mcp/fastmcp` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `MCP`, `FastMCP`, `Python`, `Tools`, `Resources`, `Prompts`, `Deployment` |
|
||||
| Related skills | [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`mcporter`](/docs/user-guide/skills/optional/mcp/mcp-mcporter) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# FastMCP
|
||||
|
||||
Build MCP servers in Python with FastMCP, validate them locally, install them into MCP clients, and deploy them as HTTP endpoints.
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when the task is to:
|
||||
|
||||
- create a new MCP server in Python
|
||||
- wrap an API, database, CLI, or file-processing workflow as MCP tools
|
||||
- expose resources or prompts in addition to tools
|
||||
- smoke-test a server with the FastMCP CLI before wiring it into Hermes or another client
|
||||
- install a server into Claude Code, Claude Desktop, Cursor, or a similar MCP client
|
||||
- prepare a FastMCP server repo for HTTP deployment
|
||||
|
||||
Use `native-mcp` when the server already exists and only needs to be connected to Hermes. Use `mcporter` when the goal is ad-hoc CLI access to an existing MCP server instead of building one.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Install FastMCP in the working environment first:
|
||||
|
||||
```bash
|
||||
pip install fastmcp
|
||||
fastmcp version
|
||||
```
|
||||
|
||||
For the API template, install `httpx` if it is not already present:
|
||||
|
||||
```bash
|
||||
pip install httpx
|
||||
```
|
||||
|
||||
## Included Files
|
||||
|
||||
### Templates
|
||||
|
||||
- `templates/api_wrapper.py` - REST API wrapper with auth header support
|
||||
- `templates/database_server.py` - read-only SQLite query server
|
||||
- `templates/file_processor.py` - text-file inspection and search server
|
||||
|
||||
### Scripts
|
||||
|
||||
- `scripts/scaffold_fastmcp.py` - copy a starter template and replace the server name placeholder
|
||||
|
||||
### References
|
||||
|
||||
- `references/fastmcp-cli.md` - FastMCP CLI workflow, installation targets, and deployment checks
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Pick the Smallest Viable Server Shape
|
||||
|
||||
Choose the narrowest useful surface area first:
|
||||
|
||||
- API wrapper: start with 1-3 high-value endpoints, not the whole API
|
||||
- database server: expose read-only introspection and a constrained query path
|
||||
- file processor: expose deterministic operations with explicit path arguments
|
||||
- prompts/resources: add only when the client needs reusable prompt templates or discoverable documents
|
||||
|
||||
Prefer a thin server with good names, docstrings, and schemas over a large server with vague tools.
|
||||
|
||||
### 2. Scaffold from a Template
|
||||
|
||||
Copy a template directly or use the scaffold helper:
|
||||
|
||||
```bash
|
||||
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py \
|
||||
--template api_wrapper \
|
||||
--name "Acme API" \
|
||||
--output ./acme_server.py
|
||||
```
|
||||
|
||||
Available templates:
|
||||
|
||||
```bash
|
||||
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py --list
|
||||
```
|
||||
|
||||
If copying manually, replace `__SERVER_NAME__` with a real server name.
|
||||
|
||||
### 3. Implement Tools First
|
||||
|
||||
Start with `@mcp.tool` functions before adding resources or prompts.
|
||||
|
||||
Rules for tool design:
|
||||
|
||||
- Give every tool a concrete verb-based name
|
||||
- Write docstrings as user-facing tool descriptions
|
||||
- Keep parameters explicit and typed
|
||||
- Return structured JSON-safe data where possible
|
||||
- Validate unsafe inputs early
|
||||
- Prefer read-only behavior by default for first versions
|
||||
|
||||
Good tool examples:
|
||||
|
||||
- `get_customer`
|
||||
- `search_tickets`
|
||||
- `describe_table`
|
||||
- `summarize_text_file`
|
||||
|
||||
Weak tool examples:
|
||||
|
||||
- `run`
|
||||
- `process`
|
||||
- `do_thing`
|
||||
|
||||
### 4. Add Resources and Prompts Only When They Help
|
||||
|
||||
Add `@mcp.resource` when the client benefits from fetching stable read-only content such as schemas, policy docs, or generated reports.
|
||||
|
||||
Add `@mcp.prompt` when the server should provide a reusable prompt template for a known workflow.
|
||||
|
||||
Do not turn every document into a prompt. Prefer:
|
||||
|
||||
- tools for actions
|
||||
- resources for data/document retrieval
|
||||
- prompts for reusable LLM instructions
|
||||
|
||||
### 5. Test the Server Before Integrating It Anywhere
|
||||
|
||||
Use the FastMCP CLI for local validation:
|
||||
|
||||
```bash
|
||||
fastmcp inspect acme_server.py:mcp
|
||||
fastmcp list acme_server.py --json
|
||||
fastmcp call acme_server.py search_resources query=router limit=5 --json
|
||||
```
|
||||
|
||||
For fast iterative debugging, run the server locally:
|
||||
|
||||
```bash
|
||||
fastmcp run acme_server.py:mcp
|
||||
```
|
||||
|
||||
To test HTTP transport locally:
|
||||
|
||||
```bash
|
||||
fastmcp run acme_server.py:mcp --transport http --host 127.0.0.1 --port 8000
|
||||
fastmcp list http://127.0.0.1:8000/mcp --json
|
||||
fastmcp call http://127.0.0.1:8000/mcp search_resources query=router --json
|
||||
```
|
||||
|
||||
Always run at least one real `fastmcp call` against each new tool before claiming the server works.
|
||||
|
||||
### 6. Install into a Client When Local Validation Passes
|
||||
|
||||
FastMCP can register the server with supported MCP clients:
|
||||
|
||||
```bash
|
||||
fastmcp install claude-code acme_server.py
|
||||
fastmcp install claude-desktop acme_server.py
|
||||
fastmcp install cursor acme_server.py -e .
|
||||
```
|
||||
|
||||
Use `fastmcp discover` to inspect named MCP servers already configured on the machine.
|
||||
|
||||
When the goal is Hermes integration, either:
|
||||
|
||||
- configure the server in `~/.hermes/config.yaml` using the `native-mcp` skill, or
|
||||
- keep using FastMCP CLI commands during development until the interface stabilizes
|
||||
|
||||
### 7. Deploy After the Local Contract Is Stable
|
||||
|
||||
For managed hosting, Prefect Horizon is the path FastMCP documents most directly. Before deployment:
|
||||
|
||||
```bash
|
||||
fastmcp inspect acme_server.py:mcp
|
||||
```
|
||||
|
||||
Make sure the repo contains:
|
||||
|
||||
- a Python file with the FastMCP server object
|
||||
- `requirements.txt` or `pyproject.toml`
|
||||
- any environment-variable documentation needed for deployment
|
||||
|
||||
For generic HTTP hosting, validate the HTTP transport locally first, then deploy on any Python-compatible platform that can expose the server port.
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### API Wrapper Pattern
|
||||
|
||||
Use when exposing a REST or HTTP API as MCP tools.
|
||||
|
||||
Recommended first slice:
|
||||
|
||||
- one read path
|
||||
- one list/search path
|
||||
- optional health check
|
||||
|
||||
Implementation notes:
|
||||
|
||||
- keep auth in environment variables, not hardcoded
|
||||
- centralize request logic in one helper
|
||||
- surface API errors with concise context
|
||||
- normalize inconsistent upstream payloads before returning them
|
||||
|
||||
Start from `templates/api_wrapper.py`.
|
||||
|
||||
### Database Pattern
|
||||
|
||||
Use when exposing safe query and inspection capabilities.
|
||||
|
||||
Recommended first slice:
|
||||
|
||||
- `list_tables`
|
||||
- `describe_table`
|
||||
- one constrained read query tool
|
||||
|
||||
Implementation notes:
|
||||
|
||||
- default to read-only DB access
|
||||
- reject non-`SELECT` SQL in early versions
|
||||
- limit row counts
|
||||
- return rows plus column names
|
||||
|
||||
Start from `templates/database_server.py`.
|
||||
|
||||
### File Processor Pattern
|
||||
|
||||
Use when the server needs to inspect or transform files on demand.
|
||||
|
||||
Recommended first slice:
|
||||
|
||||
- summarize file contents
|
||||
- search within files
|
||||
- extract deterministic metadata
|
||||
|
||||
Implementation notes:
|
||||
|
||||
- accept explicit file paths
|
||||
- check for missing files and encoding failures
|
||||
- cap previews and result counts
|
||||
- avoid shelling out unless a specific external tool is required
|
||||
|
||||
Start from `templates/file_processor.py`.
|
||||
|
||||
## Quality Bar
|
||||
|
||||
Before handing off a FastMCP server, verify all of the following:
|
||||
|
||||
- server imports cleanly
|
||||
- `fastmcp inspect <file.py:mcp>` succeeds
|
||||
- `fastmcp list <server spec> --json` succeeds
|
||||
- every new tool has at least one real `fastmcp call`
|
||||
- environment variables are documented
|
||||
- the tool surface is small enough to understand without guesswork
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### FastMCP command missing
|
||||
|
||||
Install the package in the active environment:
|
||||
|
||||
```bash
|
||||
pip install fastmcp
|
||||
fastmcp version
|
||||
```
|
||||
|
||||
### `fastmcp inspect` fails
|
||||
|
||||
Check that:
|
||||
|
||||
- the file imports without side effects that crash
|
||||
- the FastMCP instance is named correctly in `<file.py:object>`
|
||||
- optional dependencies from the template are installed
|
||||
|
||||
### Tool works in Python but not through CLI
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
fastmcp list server.py --json
|
||||
fastmcp call server.py your_tool_name --json
|
||||
```
|
||||
|
||||
This usually exposes naming mismatches, missing required arguments, or non-serializable return values.
|
||||
|
||||
### Hermes cannot see the deployed server
|
||||
|
||||
The server-building part may be correct while the Hermes config is not. Load the `native-mcp` skill and configure the server in `~/.hermes/config.yaml`, then restart Hermes.
|
||||
|
||||
## References
|
||||
|
||||
For CLI details, install targets, and deployment checks, read `references/fastmcp-cli.md`.
|
||||
137
website/docs/user-guide/skills/optional/mcp/mcp-mcporter.md
Normal file
137
website/docs/user-guide/skills/optional/mcp/mcp-mcporter.md
Normal file
|
|
@ -0,0 +1,137 @@
|
|||
---
|
||||
title: "Mcporter"
|
||||
sidebar_label: "Mcporter"
|
||||
description: "Use the mcporter CLI to list, configure, auth, and call MCP servers/tools directly (HTTP or stdio), including ad-hoc servers, config edits, and CLI/type gene..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Mcporter
|
||||
|
||||
Use the mcporter CLI to list, configure, auth, and call MCP servers/tools directly (HTTP or stdio), including ad-hoc servers, config edits, and CLI/type generation.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mcp/mcporter` |
|
||||
| Path | `optional-skills/mcp/mcporter` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | community |
|
||||
| License | MIT |
|
||||
| Tags | `MCP`, `Tools`, `API`, `Integrations`, `Interop` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# mcporter
|
||||
|
||||
Use `mcporter` to discover, call, and manage [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) servers and tools directly from the terminal.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Requires Node.js:
|
||||
```bash
|
||||
# No install needed (runs via npx)
|
||||
npx mcporter list
|
||||
|
||||
# Or install globally
|
||||
npm install -g mcporter
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# List MCP servers already configured on this machine
|
||||
mcporter list
|
||||
|
||||
# List tools for a specific server with schema details
|
||||
mcporter list <server> --schema
|
||||
|
||||
# Call a tool
|
||||
mcporter call <server.tool> key=value
|
||||
```
|
||||
|
||||
## Discovering MCP Servers
|
||||
|
||||
mcporter auto-discovers servers configured by other MCP clients (Claude Desktop, Cursor, etc.) on the machine. To find new servers to use, browse registries like [mcpfinder.dev](https://mcpfinder.dev) or [mcp.so](https://mcp.so), then connect ad-hoc:
|
||||
|
||||
```bash
|
||||
# Connect to any MCP server by URL (no config needed)
|
||||
mcporter list --http-url https://some-mcp-server.com --name my_server
|
||||
|
||||
# Or run a stdio server on the fly
|
||||
mcporter list --stdio "npx -y @modelcontextprotocol/server-filesystem" --name fs
|
||||
```
|
||||
|
||||
## Calling Tools
|
||||
|
||||
```bash
|
||||
# Key=value syntax
|
||||
mcporter call linear.list_issues team=ENG limit:5
|
||||
|
||||
# Function syntax
|
||||
mcporter call "linear.create_issue(title: \"Bug fix needed\")"
|
||||
|
||||
# Ad-hoc HTTP server (no config needed)
|
||||
mcporter call https://api.example.com/mcp.fetch url=https://example.com
|
||||
|
||||
# Ad-hoc stdio server
|
||||
mcporter call --stdio "bun run ./server.ts" scrape url=https://example.com
|
||||
|
||||
# JSON payload
|
||||
mcporter call <server.tool> --args '{"limit": 5}'
|
||||
|
||||
# Machine-readable output (recommended for Hermes)
|
||||
mcporter call <server.tool> key=value --output json
|
||||
```
|
||||
|
||||
## Auth and Config
|
||||
|
||||
```bash
|
||||
# OAuth login for a server
|
||||
mcporter auth <server | url> [--reset]
|
||||
|
||||
# Manage config
|
||||
mcporter config list
|
||||
mcporter config get <key>
|
||||
mcporter config add <server>
|
||||
mcporter config remove <server>
|
||||
mcporter config import <path>
|
||||
```
|
||||
|
||||
Config file location: `./config/mcporter.json` (override with `--config`).
|
||||
|
||||
## Daemon
|
||||
|
||||
For persistent server connections:
|
||||
```bash
|
||||
mcporter daemon start
|
||||
mcporter daemon status
|
||||
mcporter daemon stop
|
||||
mcporter daemon restart
|
||||
```
|
||||
|
||||
## Code Generation
|
||||
|
||||
```bash
|
||||
# Generate a CLI wrapper for an MCP server
|
||||
mcporter generate-cli --server <name>
|
||||
mcporter generate-cli --command <url>
|
||||
|
||||
# Inspect a generated CLI
|
||||
mcporter inspect-cli <path> [--json]
|
||||
|
||||
# Generate TypeScript types/client
|
||||
mcporter emit-ts <server> --mode client
|
||||
mcporter emit-ts <server> --mode types
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- Use `--output json` for structured output that's easier to parse
|
||||
- Ad-hoc servers (HTTP URL or `--stdio` command) work without any config — useful for one-off calls
|
||||
- OAuth auth may require interactive browser flow — use `terminal(command="mcporter auth <server>", pty=true)` if needed
|
||||
|
|
@ -0,0 +1,315 @@
|
|||
---
|
||||
title: "Openclaw Migration — Migrate a user's OpenClaw customization footprint into Hermes Agent"
|
||||
sidebar_label: "Openclaw Migration"
|
||||
description: "Migrate a user's OpenClaw customization footprint into Hermes Agent"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Openclaw Migration
|
||||
|
||||
Migrate a user's OpenClaw customization footprint into Hermes Agent. Imports Hermes-compatible memories, SOUL.md, command allowlists, user skills, and selected workspace assets from ~/.openclaw, then reports exactly what could not be migrated and why.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/migration/openclaw-migration` |
|
||||
| Path | `optional-skills/migration/openclaw-migration` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent (Nous Research) |
|
||||
| License | MIT |
|
||||
| Tags | `Migration`, `OpenClaw`, `Hermes`, `Memory`, `Persona`, `Import` |
|
||||
| Related skills | [`hermes-agent`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# OpenClaw -> Hermes Migration
|
||||
|
||||
Use this skill when a user wants to move their OpenClaw setup into Hermes Agent with minimal manual cleanup.
|
||||
|
||||
## CLI Command
|
||||
|
||||
For a quick, non-interactive migration, use the built-in CLI command:
|
||||
|
||||
```bash
|
||||
hermes claw migrate # Full interactive migration
|
||||
hermes claw migrate --dry-run # Preview what would be migrated
|
||||
hermes claw migrate --preset user-data # Migrate without secrets
|
||||
hermes claw migrate --overwrite # Overwrite existing conflicts
|
||||
hermes claw migrate --source /custom/path/.openclaw # Custom source
|
||||
```
|
||||
|
||||
The CLI command runs the same migration script described below. Use this skill (via the agent) when you want an interactive, guided migration with dry-run previews and per-item conflict resolution.
|
||||
|
||||
**First-time setup:** The `hermes setup` wizard automatically detects `~/.openclaw` and offers migration before configuration begins.
|
||||
|
||||
## What this skill does
|
||||
|
||||
It uses `scripts/openclaw_to_hermes.py` to:
|
||||
|
||||
- import `SOUL.md` into the Hermes home directory as `SOUL.md`
|
||||
- transform OpenClaw `MEMORY.md` and `USER.md` into Hermes memory entries
|
||||
- merge OpenClaw command approval patterns into Hermes `command_allowlist`
|
||||
- migrate Hermes-compatible messaging settings such as `TELEGRAM_ALLOWED_USERS` and `MESSAGING_CWD`
|
||||
- copy OpenClaw skills into `~/.hermes/skills/openclaw-imports/`
|
||||
- optionally copy the OpenClaw workspace instructions file into a chosen Hermes workspace
|
||||
- mirror compatible workspace assets such as `workspace/tts/` into `~/.hermes/tts/`
|
||||
- archive non-secret docs that do not have a direct Hermes destination
|
||||
- produce a structured report listing migrated items, conflicts, skipped items, and reasons
|
||||
|
||||
## Path resolution
|
||||
|
||||
The helper script lives in this skill directory at:
|
||||
|
||||
- `scripts/openclaw_to_hermes.py`
|
||||
|
||||
When this skill is installed from the Skills Hub, the normal location is:
|
||||
|
||||
- `~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py`
|
||||
|
||||
Do not guess a shorter path like `~/.hermes/skills/openclaw-migration/...`.
|
||||
|
||||
Before running the helper:
|
||||
|
||||
1. Prefer the installed path under `~/.hermes/skills/migration/openclaw-migration/`.
|
||||
2. If that path fails, inspect the installed skill directory and resolve the script relative to the installed `SKILL.md`.
|
||||
3. Only use `find` as a fallback if the installed location is missing or the skill was moved manually.
|
||||
4. When calling the terminal tool, do not pass `workdir: "~"`. Use an absolute directory such as the user's home directory, or omit `workdir` entirely.
|
||||
|
||||
With `--migrate-secrets`, it will also import a small allowlisted set of Hermes-compatible secrets, currently:
|
||||
|
||||
- `TELEGRAM_BOT_TOKEN`
|
||||
|
||||
## Default workflow
|
||||
|
||||
1. Inspect first with a dry run.
|
||||
2. Present a simple summary of what can be migrated, what cannot be migrated, and what would be archived.
|
||||
3. If the `clarify` tool is available, use it for user decisions instead of asking for a free-form prose reply.
|
||||
4. If the dry run finds imported skill directory conflicts, ask how those should be handled before executing.
|
||||
5. Ask the user to choose between the two supported migration modes before executing.
|
||||
6. Ask for a target workspace path only if the user wants the workspace instructions file brought over.
|
||||
7. Execute the migration with the matching preset and flags.
|
||||
8. Summarize the results, especially:
|
||||
- what was migrated
|
||||
- what was archived for manual review
|
||||
- what was skipped and why
|
||||
|
||||
## User interaction protocol
|
||||
|
||||
Hermes CLI supports the `clarify` tool for interactive prompts, but it is limited to:
|
||||
|
||||
- one choice at a time
|
||||
- up to 4 predefined choices
|
||||
- an automatic `Other` free-text option
|
||||
|
||||
It does **not** support true multi-select checkboxes in a single prompt.
|
||||
|
||||
For every `clarify` call:
|
||||
|
||||
- always include a non-empty `question`
|
||||
- include `choices` only for real selectable prompts
|
||||
- keep `choices` to 2-4 plain string options
|
||||
- never emit placeholder or truncated options such as `...`
|
||||
- never pad or stylize choices with extra whitespace
|
||||
- never include fake form fields in the question such as `enter directory here`, blank lines to fill in, or underscores like `_____`
|
||||
- for open-ended path questions, ask only the plain sentence; the user types in the normal CLI prompt below the panel
|
||||
|
||||
If a `clarify` call returns an error, inspect the error text, correct the payload, and retry once with a valid `question` and clean choices.
|
||||
|
||||
When `clarify` is available and the dry run reveals any required user decision, your **next action must be a `clarify` tool call**.
|
||||
Do not end the turn with a normal assistant message such as:
|
||||
|
||||
- "Let me present the choices"
|
||||
- "What would you like to do?"
|
||||
- "Here are the options"
|
||||
|
||||
If a user decision is required, collect it via `clarify` before producing more prose.
|
||||
If multiple unresolved decisions remain, do not insert an explanatory assistant message between them. After one `clarify` response is received, your next action should usually be the next required `clarify` call.
|
||||
|
||||
Treat `workspace-agents` as an unresolved decision whenever the dry run reports:
|
||||
|
||||
- `kind="workspace-agents"`
|
||||
- `status="skipped"`
|
||||
- reason containing `No workspace target was provided`
|
||||
|
||||
In that case, you must ask about workspace instructions before execution. Do not silently treat that as a decision to skip.
|
||||
|
||||
Because of that limitation, use this simplified decision flow:
|
||||
|
||||
1. For `SOUL.md` conflicts, use `clarify` with choices such as:
|
||||
- `keep existing`
|
||||
- `overwrite with backup`
|
||||
- `review first`
|
||||
2. If the dry run shows one or more `kind="skill"` items with `status="conflict"`, use `clarify` with choices such as:
|
||||
- `keep existing skills`
|
||||
- `overwrite conflicting skills with backup`
|
||||
- `import conflicting skills under renamed folders`
|
||||
3. For workspace instructions, use `clarify` with choices such as:
|
||||
- `skip workspace instructions`
|
||||
- `copy to a workspace path`
|
||||
- `decide later`
|
||||
4. If the user chooses to copy workspace instructions, ask a follow-up open-ended `clarify` question requesting an **absolute path**.
|
||||
5. If the user chooses `skip workspace instructions` or `decide later`, proceed without `--workspace-target`.
|
||||
5. For migration mode, use `clarify` with these 3 choices:
|
||||
- `user-data only`
|
||||
- `full compatible migration`
|
||||
- `cancel`
|
||||
6. `user-data only` means: migrate user data and compatible config, but do **not** import allowlisted secrets.
|
||||
7. `full compatible migration` means: migrate the same compatible user data plus the allowlisted secrets when present.
|
||||
8. If `clarify` is not available, ask the same question in normal text, but still constrain the answer to `user-data only`, `full compatible migration`, or `cancel`.
|
||||
|
||||
Execution gate:
|
||||
|
||||
- Do not execute while a `workspace-agents` skip caused by `No workspace target was provided` remains unresolved.
|
||||
- The only valid ways to resolve it are:
|
||||
- user explicitly chooses `skip workspace instructions`
|
||||
- user explicitly chooses `decide later`
|
||||
- user provides a workspace path after choosing `copy to a workspace path`
|
||||
- Absence of a workspace target in the dry run is not itself permission to execute.
|
||||
- Do not execute while any required `clarify` decision remains unresolved.
|
||||
|
||||
Use these exact `clarify` payload shapes as the default pattern:
|
||||
|
||||
- `{"question":"Your existing SOUL.md conflicts with the imported one. What should I do?","choices":["keep existing","overwrite with backup","review first"]}`
|
||||
- `{"question":"One or more imported OpenClaw skills already exist in Hermes. How should I handle those skill conflicts?","choices":["keep existing skills","overwrite conflicting skills with backup","import conflicting skills under renamed folders"]}`
|
||||
- `{"question":"Choose migration mode: migrate only user data, or run the full compatible migration including allowlisted secrets?","choices":["user-data only","full compatible migration","cancel"]}`
|
||||
- `{"question":"Do you want to copy the OpenClaw workspace instructions file into a Hermes workspace?","choices":["skip workspace instructions","copy to a workspace path","decide later"]}`
|
||||
- `{"question":"Please provide an absolute path where the workspace instructions should be copied."}`
|
||||
|
||||
## Decision-to-command mapping
|
||||
|
||||
Map user decisions to command flags exactly:
|
||||
|
||||
- If the user chooses `keep existing` for `SOUL.md`, do **not** add `--overwrite`.
|
||||
- If the user chooses `overwrite with backup`, add `--overwrite`.
|
||||
- If the user chooses `review first`, stop before execution and review the relevant files.
|
||||
- If the user chooses `keep existing skills`, add `--skill-conflict skip`.
|
||||
- If the user chooses `overwrite conflicting skills with backup`, add `--skill-conflict overwrite`.
|
||||
- If the user chooses `import conflicting skills under renamed folders`, add `--skill-conflict rename`.
|
||||
- If the user chooses `user-data only`, execute with `--preset user-data` and do **not** add `--migrate-secrets`.
|
||||
- If the user chooses `full compatible migration`, execute with `--preset full --migrate-secrets`.
|
||||
- Only add `--workspace-target` if the user explicitly provided an absolute workspace path.
|
||||
- If the user chooses `skip workspace instructions` or `decide later`, do not add `--workspace-target`.
|
||||
|
||||
Before executing, restate the exact command plan in plain language and make sure it matches the user's choices.
|
||||
|
||||
## Post-run reporting rules
|
||||
|
||||
After execution, treat the script's JSON output as the source of truth.
|
||||
|
||||
1. Base all counts on `report.summary`.
|
||||
2. Only list an item under "Successfully Migrated" if its `status` is exactly `migrated`.
|
||||
3. Do not claim a conflict was resolved unless the report shows that item as `migrated`.
|
||||
4. Do not say `SOUL.md` was overwritten unless the report item for `kind="soul"` has `status="migrated"`.
|
||||
5. If `report.summary.conflict > 0`, include a conflict section instead of silently implying success.
|
||||
6. If counts and listed items disagree, fix the list to match the report before responding.
|
||||
7. Include the `output_dir` path from the report when available so the user can inspect `report.json`, `summary.md`, backups, and archived files.
|
||||
8. For memory or user-profile overflow, do not say the entries were archived unless the report explicitly shows an archive path. If `details.overflow_file` exists, say the full overflow list was exported there.
|
||||
9. If a skill was imported under a renamed folder, report the final destination and mention `details.renamed_from`.
|
||||
10. If `report.skill_conflict_mode` is present, use it as the source of truth for the selected imported-skill conflict policy.
|
||||
11. If an item has `status="skipped"`, do not describe it as overwritten, backed up, migrated, or resolved.
|
||||
12. If `kind="soul"` has `status="skipped"` with reason `Target already matches source`, say it was left unchanged and do not mention a backup.
|
||||
13. If a renamed imported skill has an empty `details.backup`, do not imply the existing Hermes skill was renamed or backed up. Say only that the imported copy was placed in the new destination and reference `details.renamed_from` as the pre-existing folder that remained in place.
|
||||
|
||||
## Migration presets
|
||||
|
||||
Prefer these two presets in normal use:
|
||||
|
||||
- `user-data`
|
||||
- `full`
|
||||
|
||||
`user-data` includes:
|
||||
|
||||
- `soul`
|
||||
- `workspace-agents`
|
||||
- `memory`
|
||||
- `user-profile`
|
||||
- `messaging-settings`
|
||||
- `command-allowlist`
|
||||
- `skills`
|
||||
- `tts-assets`
|
||||
- `archive`
|
||||
|
||||
`full` includes everything in `user-data` plus:
|
||||
|
||||
- `secret-settings`
|
||||
|
||||
The helper script still supports category-level `--include` / `--exclude`, but treat that as an advanced fallback rather than the default UX.
|
||||
|
||||
## Commands
|
||||
|
||||
Dry run with full discovery:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py
|
||||
```
|
||||
|
||||
When using the terminal tool, prefer an absolute invocation pattern such as:
|
||||
|
||||
```json
|
||||
{"command":"python3 /home/USER/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py","workdir":"/home/USER"}
|
||||
```
|
||||
|
||||
Dry run with the user-data preset:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --preset user-data
|
||||
```
|
||||
|
||||
Execute a user-data migration:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset user-data --skill-conflict skip
|
||||
```
|
||||
|
||||
Execute a full compatible migration:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset full --migrate-secrets --skill-conflict skip
|
||||
```
|
||||
|
||||
Execute with workspace instructions included:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset user-data --skill-conflict rename --workspace-target "/absolute/workspace/path"
|
||||
```
|
||||
|
||||
Do not use `$PWD` or the home directory as the workspace target by default. Ask for an explicit workspace path first.
|
||||
|
||||
## Important rules
|
||||
|
||||
1. Run a dry run before writing unless the user explicitly says to proceed immediately.
|
||||
2. Do not migrate secrets by default. Tokens, auth blobs, device credentials, and raw gateway config should stay out of Hermes unless the user explicitly asks for secret migration.
|
||||
3. Do not silently overwrite non-empty Hermes targets unless the user explicitly wants that. The helper script will preserve backups when overwriting is enabled.
|
||||
4. Always give the user the skipped-items report. That report is part of the migration, not an optional extra.
|
||||
5. Prefer the primary OpenClaw workspace (`~/.openclaw/workspace/`) over `workspace.default/`. Only use the default workspace as fallback when the primary files are missing.
|
||||
6. Even in secret-migration mode, only migrate secrets with a clean Hermes destination. Unsupported auth blobs must still be reported as skipped.
|
||||
7. If the dry run shows a large asset copy, a conflicting `SOUL.md`, or overflowed memory entries, call those out separately before execution.
|
||||
8. Default to `user-data only` if the user is unsure.
|
||||
9. Only include `workspace-agents` when the user has explicitly provided a destination workspace path.
|
||||
10. Treat category-level `--include` / `--exclude` as an advanced escape hatch, not the normal flow.
|
||||
11. Do not end the dry-run summary with a vague “What would you like to do?” if `clarify` is available. Use structured follow-up prompts instead.
|
||||
12. Do not use an open-ended `clarify` prompt when a real choice prompt would work. Prefer selectable choices first, then free text only for absolute paths or file review requests.
|
||||
13. After a dry run, never stop after summarizing if there is still an unresolved decision. Use `clarify` immediately for the highest-priority blocking decision.
|
||||
14. Priority order for follow-up questions:
|
||||
- `SOUL.md` conflict
|
||||
- imported skill conflicts
|
||||
- migration mode
|
||||
- workspace instructions destination
|
||||
15. Do not promise to present choices later in the same message. Present them by actually calling `clarify`.
|
||||
16. After the migration-mode answer, explicitly check whether `workspace-agents` is still unresolved. If it is, your next action must be the workspace-instructions `clarify` call.
|
||||
17. After any `clarify` answer, if another required decision remains, do not narrate what was just decided. Ask the next required question immediately.
|
||||
|
||||
## Expected result
|
||||
|
||||
After a successful run, the user should have:
|
||||
|
||||
- Hermes persona state imported
|
||||
- Hermes memory files populated with converted OpenClaw knowledge
|
||||
- OpenClaw skills available under `~/.hermes/skills/openclaw-imports/`
|
||||
- a migration report showing any conflicts, omissions, or unsupported data
|
||||
|
|
@ -0,0 +1,349 @@
|
|||
---
|
||||
title: "Huggingface Accelerate — Simplest distributed training API"
|
||||
sidebar_label: "Huggingface Accelerate"
|
||||
description: "Simplest distributed training API"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Huggingface Accelerate
|
||||
|
||||
Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/accelerate` |
|
||||
| Path | `optional-skills/mlops/accelerate` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `accelerate`, `torch`, `transformers` |
|
||||
| Tags | `Distributed Training`, `HuggingFace`, `Accelerate`, `DeepSpeed`, `FSDP`, `Mixed Precision`, `PyTorch`, `DDP`, `Unified API`, `Simple` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# HuggingFace Accelerate - Unified Distributed Training
|
||||
|
||||
## Quick start
|
||||
|
||||
Accelerate simplifies distributed training to 4 lines of code.
|
||||
|
||||
**Installation**:
|
||||
```bash
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
**Convert PyTorch script** (4 lines):
|
||||
```python
|
||||
import torch
|
||||
+ from accelerate import Accelerator
|
||||
|
||||
+ accelerator = Accelerator()
|
||||
|
||||
model = torch.nn.Transformer()
|
||||
optimizer = torch.optim.Adam(model.parameters())
|
||||
dataloader = torch.utils.data.DataLoader(dataset)
|
||||
|
||||
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
||||
|
||||
for batch in dataloader:
|
||||
optimizer.zero_grad()
|
||||
loss = model(batch)
|
||||
- loss.backward()
|
||||
+ accelerator.backward(loss)
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**Run** (single command):
|
||||
```bash
|
||||
accelerate launch train.py
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: From single GPU to multi-GPU
|
||||
|
||||
**Original script**:
|
||||
```python
|
||||
# train.py
|
||||
import torch
|
||||
|
||||
model = torch.nn.Linear(10, 2).to('cuda')
|
||||
optimizer = torch.optim.Adam(model.parameters())
|
||||
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
|
||||
|
||||
for epoch in range(10):
|
||||
for batch in dataloader:
|
||||
batch = batch.to('cuda')
|
||||
optimizer.zero_grad()
|
||||
loss = model(batch).mean()
|
||||
loss.backward()
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**With Accelerate** (4 lines added):
|
||||
```python
|
||||
# train.py
|
||||
import torch
|
||||
from accelerate import Accelerator # +1
|
||||
|
||||
accelerator = Accelerator() # +2
|
||||
|
||||
model = torch.nn.Linear(10, 2)
|
||||
optimizer = torch.optim.Adam(model.parameters())
|
||||
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
|
||||
|
||||
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader) # +3
|
||||
|
||||
for epoch in range(10):
|
||||
for batch in dataloader:
|
||||
# No .to('cuda') needed - automatic!
|
||||
optimizer.zero_grad()
|
||||
loss = model(batch).mean()
|
||||
accelerator.backward(loss) # +4
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**Configure** (interactive):
|
||||
```bash
|
||||
accelerate config
|
||||
```
|
||||
|
||||
**Questions**:
|
||||
- Which machine? (single/multi GPU/TPU/CPU)
|
||||
- How many machines? (1)
|
||||
- Mixed precision? (no/fp16/bf16/fp8)
|
||||
- DeepSpeed? (no/yes)
|
||||
|
||||
**Launch** (works on any setup):
|
||||
```bash
|
||||
# Single GPU
|
||||
accelerate launch train.py
|
||||
|
||||
# Multi-GPU (8 GPUs)
|
||||
accelerate launch --multi_gpu --num_processes 8 train.py
|
||||
|
||||
# Multi-node
|
||||
accelerate launch --multi_gpu --num_processes 16 \
|
||||
--num_machines 2 --machine_rank 0 \
|
||||
--main_process_ip $MASTER_ADDR \
|
||||
train.py
|
||||
```
|
||||
|
||||
### Workflow 2: Mixed precision training
|
||||
|
||||
**Enable FP16/BF16**:
|
||||
```python
|
||||
from accelerate import Accelerator
|
||||
|
||||
# FP16 (with gradient scaling)
|
||||
accelerator = Accelerator(mixed_precision='fp16')
|
||||
|
||||
# BF16 (no scaling, more stable)
|
||||
accelerator = Accelerator(mixed_precision='bf16')
|
||||
|
||||
# FP8 (H100+)
|
||||
accelerator = Accelerator(mixed_precision='fp8')
|
||||
|
||||
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
||||
|
||||
# Everything else is automatic!
|
||||
for batch in dataloader:
|
||||
with accelerator.autocast(): # Optional, done automatically
|
||||
loss = model(batch)
|
||||
accelerator.backward(loss)
|
||||
```
|
||||
|
||||
### Workflow 3: DeepSpeed ZeRO integration
|
||||
|
||||
**Enable DeepSpeed ZeRO-2**:
|
||||
```python
|
||||
from accelerate import Accelerator
|
||||
|
||||
accelerator = Accelerator(
|
||||
mixed_precision='bf16',
|
||||
deepspeed_plugin={
|
||||
"zero_stage": 2, # ZeRO-2
|
||||
"offload_optimizer": False,
|
||||
"gradient_accumulation_steps": 4
|
||||
}
|
||||
)
|
||||
|
||||
# Same code as before!
|
||||
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
||||
```
|
||||
|
||||
**Or via config**:
|
||||
```bash
|
||||
accelerate config
|
||||
# Select: DeepSpeed → ZeRO-2
|
||||
```
|
||||
|
||||
**deepspeed_config.json**:
|
||||
```json
|
||||
{
|
||||
"fp16": {"enabled": false},
|
||||
"bf16": {"enabled": true},
|
||||
"zero_optimization": {
|
||||
"stage": 2,
|
||||
"offload_optimizer": {"device": "cpu"},
|
||||
"allgather_bucket_size": 5e8,
|
||||
"reduce_bucket_size": 5e8
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Launch**:
|
||||
```bash
|
||||
accelerate launch --config_file deepspeed_config.json train.py
|
||||
```
|
||||
|
||||
### Workflow 4: FSDP (Fully Sharded Data Parallel)
|
||||
|
||||
**Enable FSDP**:
|
||||
```python
|
||||
from accelerate import Accelerator, FullyShardedDataParallelPlugin
|
||||
|
||||
fsdp_plugin = FullyShardedDataParallelPlugin(
|
||||
sharding_strategy="FULL_SHARD", # ZeRO-3 equivalent
|
||||
auto_wrap_policy="TRANSFORMER_AUTO_WRAP",
|
||||
cpu_offload=False
|
||||
)
|
||||
|
||||
accelerator = Accelerator(
|
||||
mixed_precision='bf16',
|
||||
fsdp_plugin=fsdp_plugin
|
||||
)
|
||||
|
||||
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
||||
```
|
||||
|
||||
**Or via config**:
|
||||
```bash
|
||||
accelerate config
|
||||
# Select: FSDP → Full Shard → No CPU Offload
|
||||
```
|
||||
|
||||
### Workflow 5: Gradient accumulation
|
||||
|
||||
**Accumulate gradients**:
|
||||
```python
|
||||
from accelerate import Accelerator
|
||||
|
||||
accelerator = Accelerator(gradient_accumulation_steps=4)
|
||||
|
||||
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
||||
|
||||
for batch in dataloader:
|
||||
with accelerator.accumulate(model): # Handles accumulation
|
||||
optimizer.zero_grad()
|
||||
loss = model(batch)
|
||||
accelerator.backward(loss)
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**Effective batch size**: `batch_size * num_gpus * gradient_accumulation_steps`
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use Accelerate when**:
|
||||
- Want simplest distributed training
|
||||
- Need single script for any hardware
|
||||
- Use HuggingFace ecosystem
|
||||
- Want flexibility (DDP/DeepSpeed/FSDP/Megatron)
|
||||
- Need quick prototyping
|
||||
|
||||
**Key advantages**:
|
||||
- **4 lines**: Minimal code changes
|
||||
- **Unified API**: Same code for DDP, DeepSpeed, FSDP, Megatron
|
||||
- **Automatic**: Device placement, mixed precision, sharding
|
||||
- **Interactive config**: No manual launcher setup
|
||||
- **Single launch**: Works everywhere
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **PyTorch Lightning**: Need callbacks, high-level abstractions
|
||||
- **Ray Train**: Multi-node orchestration, hyperparameter tuning
|
||||
- **DeepSpeed**: Direct API control, advanced features
|
||||
- **Raw DDP**: Maximum control, minimal abstraction
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: Wrong device placement**
|
||||
|
||||
Don't manually move to device:
|
||||
```python
|
||||
# WRONG
|
||||
batch = batch.to('cuda')
|
||||
|
||||
# CORRECT
|
||||
# Accelerate handles it automatically after prepare()
|
||||
```
|
||||
|
||||
**Issue: Gradient accumulation not working**
|
||||
|
||||
Use context manager:
|
||||
```python
|
||||
# CORRECT
|
||||
with accelerator.accumulate(model):
|
||||
optimizer.zero_grad()
|
||||
accelerator.backward(loss)
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**Issue: Checkpointing in distributed**
|
||||
|
||||
Use accelerator methods:
|
||||
```python
|
||||
# Save only on main process
|
||||
if accelerator.is_main_process:
|
||||
accelerator.save_state('checkpoint/')
|
||||
|
||||
# Load on all processes
|
||||
accelerator.load_state('checkpoint/')
|
||||
```
|
||||
|
||||
**Issue: Different results with FSDP**
|
||||
|
||||
Ensure same random seed:
|
||||
```python
|
||||
from accelerate.utils import set_seed
|
||||
set_seed(42)
|
||||
```
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**Megatron integration**: See [references/megatron-integration.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/accelerate/references/megatron-integration.md) for tensor parallelism, pipeline parallelism, and sequence parallelism setup.
|
||||
|
||||
**Custom plugins**: See [references/custom-plugins.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/accelerate/references/custom-plugins.md) for creating custom distributed plugins and advanced configuration.
|
||||
|
||||
**Performance tuning**: See [references/performance.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/accelerate/references/performance.md) for profiling, memory optimization, and best practices.
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
- **CPU**: Works (slow)
|
||||
- **Single GPU**: Works
|
||||
- **Multi-GPU**: DDP (default), DeepSpeed, or FSDP
|
||||
- **Multi-node**: DDP, DeepSpeed, FSDP, Megatron
|
||||
- **TPU**: Supported
|
||||
- **Apple MPS**: Supported
|
||||
|
||||
**Launcher requirements**:
|
||||
- **DDP**: `torch.distributed.run` (built-in)
|
||||
- **DeepSpeed**: `deepspeed` (pip install deepspeed)
|
||||
- **FSDP**: PyTorch 1.12+ (built-in)
|
||||
- **Megatron**: Custom setup
|
||||
|
||||
## Resources
|
||||
|
||||
- Docs: https://huggingface.co/docs/accelerate
|
||||
- GitHub: https://github.com/huggingface/accelerate
|
||||
- Version: 1.11.0+
|
||||
- Tutorial: "Accelerate your scripts"
|
||||
- Examples: https://github.com/huggingface/accelerate/tree/main/examples
|
||||
- Used by: HuggingFace Transformers, TRL, PEFT, all HF libraries
|
||||
424
website/docs/user-guide/skills/optional/mlops/mlops-chroma.md
Normal file
424
website/docs/user-guide/skills/optional/mlops/mlops-chroma.md
Normal file
|
|
@ -0,0 +1,424 @@
|
|||
---
|
||||
title: "Chroma — Open-source embedding database for AI applications"
|
||||
sidebar_label: "Chroma"
|
||||
description: "Open-source embedding database for AI applications"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Chroma
|
||||
|
||||
Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/chroma` |
|
||||
| Path | `optional-skills/mlops/chroma` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `chromadb`, `sentence-transformers` |
|
||||
| Tags | `RAG`, `Chroma`, `Vector Database`, `Embeddings`, `Semantic Search`, `Open Source`, `Self-Hosted`, `Document Retrieval`, `Metadata Filtering` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Chroma - Open-Source Embedding Database
|
||||
|
||||
The AI-native database for building LLM applications with memory.
|
||||
|
||||
## When to use Chroma
|
||||
|
||||
**Use Chroma when:**
|
||||
- Building RAG (retrieval-augmented generation) applications
|
||||
- Need local/self-hosted vector database
|
||||
- Want open-source solution (Apache 2.0)
|
||||
- Prototyping in notebooks
|
||||
- Semantic search over documents
|
||||
- Storing embeddings with metadata
|
||||
|
||||
**Metrics**:
|
||||
- **24,300+ GitHub stars**
|
||||
- **1,900+ forks**
|
||||
- **v1.3.3** (stable, weekly releases)
|
||||
- **Apache 2.0 license**
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **Pinecone**: Managed cloud, auto-scaling
|
||||
- **FAISS**: Pure similarity search, no metadata
|
||||
- **Weaviate**: Production ML-native database
|
||||
- **Qdrant**: High performance, Rust-based
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Python
|
||||
pip install chromadb
|
||||
|
||||
# JavaScript/TypeScript
|
||||
npm install chromadb @chroma-core/default-embed
|
||||
```
|
||||
|
||||
### Basic usage (Python)
|
||||
|
||||
```python
|
||||
import chromadb
|
||||
|
||||
# Create client
|
||||
client = chromadb.Client()
|
||||
|
||||
# Create collection
|
||||
collection = client.create_collection(name="my_collection")
|
||||
|
||||
# Add documents
|
||||
collection.add(
|
||||
documents=["This is document 1", "This is document 2"],
|
||||
metadatas=[{"source": "doc1"}, {"source": "doc2"}],
|
||||
ids=["id1", "id2"]
|
||||
)
|
||||
|
||||
# Query
|
||||
results = collection.query(
|
||||
query_texts=["document about topic"],
|
||||
n_results=2
|
||||
)
|
||||
|
||||
print(results)
|
||||
```
|
||||
|
||||
## Core operations
|
||||
|
||||
### 1. Create collection
|
||||
|
||||
```python
|
||||
# Simple collection
|
||||
collection = client.create_collection("my_docs")
|
||||
|
||||
# With custom embedding function
|
||||
from chromadb.utils import embedding_functions
|
||||
|
||||
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
|
||||
api_key="your-key",
|
||||
model_name="text-embedding-3-small"
|
||||
)
|
||||
|
||||
collection = client.create_collection(
|
||||
name="my_docs",
|
||||
embedding_function=openai_ef
|
||||
)
|
||||
|
||||
# Get existing collection
|
||||
collection = client.get_collection("my_docs")
|
||||
|
||||
# Delete collection
|
||||
client.delete_collection("my_docs")
|
||||
```
|
||||
|
||||
### 2. Add documents
|
||||
|
||||
```python
|
||||
# Add with auto-generated IDs
|
||||
collection.add(
|
||||
documents=["Doc 1", "Doc 2", "Doc 3"],
|
||||
metadatas=[
|
||||
{"source": "web", "category": "tutorial"},
|
||||
{"source": "pdf", "page": 5},
|
||||
{"source": "api", "timestamp": "2025-01-01"}
|
||||
],
|
||||
ids=["id1", "id2", "id3"]
|
||||
)
|
||||
|
||||
# Add with custom embeddings
|
||||
collection.add(
|
||||
embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
|
||||
documents=["Doc 1", "Doc 2"],
|
||||
ids=["id1", "id2"]
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Query (similarity search)
|
||||
|
||||
```python
|
||||
# Basic query
|
||||
results = collection.query(
|
||||
query_texts=["machine learning tutorial"],
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Query with filters
|
||||
results = collection.query(
|
||||
query_texts=["Python programming"],
|
||||
n_results=3,
|
||||
where={"source": "web"}
|
||||
)
|
||||
|
||||
# Query with metadata filters
|
||||
results = collection.query(
|
||||
query_texts=["advanced topics"],
|
||||
where={
|
||||
"$and": [
|
||||
{"category": "tutorial"},
|
||||
{"difficulty": {"$gte": 3}}
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
# Access results
|
||||
print(results["documents"]) # List of matching documents
|
||||
print(results["metadatas"]) # Metadata for each doc
|
||||
print(results["distances"]) # Similarity scores
|
||||
print(results["ids"]) # Document IDs
|
||||
```
|
||||
|
||||
### 4. Get documents
|
||||
|
||||
```python
|
||||
# Get by IDs
|
||||
docs = collection.get(
|
||||
ids=["id1", "id2"]
|
||||
)
|
||||
|
||||
# Get with filters
|
||||
docs = collection.get(
|
||||
where={"category": "tutorial"},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get all documents
|
||||
docs = collection.get()
|
||||
```
|
||||
|
||||
### 5. Update documents
|
||||
|
||||
```python
|
||||
# Update document content
|
||||
collection.update(
|
||||
ids=["id1"],
|
||||
documents=["Updated content"],
|
||||
metadatas=[{"source": "updated"}]
|
||||
)
|
||||
```
|
||||
|
||||
### 6. Delete documents
|
||||
|
||||
```python
|
||||
# Delete by IDs
|
||||
collection.delete(ids=["id1", "id2"])
|
||||
|
||||
# Delete with filter
|
||||
collection.delete(
|
||||
where={"source": "outdated"}
|
||||
)
|
||||
```
|
||||
|
||||
## Persistent storage
|
||||
|
||||
```python
|
||||
# Persist to disk
|
||||
client = chromadb.PersistentClient(path="./chroma_db")
|
||||
|
||||
collection = client.create_collection("my_docs")
|
||||
collection.add(documents=["Doc 1"], ids=["id1"])
|
||||
|
||||
# Data persisted automatically
|
||||
# Reload later with same path
|
||||
client = chromadb.PersistentClient(path="./chroma_db")
|
||||
collection = client.get_collection("my_docs")
|
||||
```
|
||||
|
||||
## Embedding functions
|
||||
|
||||
### Default (Sentence Transformers)
|
||||
|
||||
```python
|
||||
# Uses sentence-transformers by default
|
||||
collection = client.create_collection("my_docs")
|
||||
# Default model: all-MiniLM-L6-v2
|
||||
```
|
||||
|
||||
### OpenAI
|
||||
|
||||
```python
|
||||
from chromadb.utils import embedding_functions
|
||||
|
||||
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
|
||||
api_key="your-key",
|
||||
model_name="text-embedding-3-small"
|
||||
)
|
||||
|
||||
collection = client.create_collection(
|
||||
name="openai_docs",
|
||||
embedding_function=openai_ef
|
||||
)
|
||||
```
|
||||
|
||||
### HuggingFace
|
||||
|
||||
```python
|
||||
huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
|
||||
api_key="your-key",
|
||||
model_name="sentence-transformers/all-mpnet-base-v2"
|
||||
)
|
||||
|
||||
collection = client.create_collection(
|
||||
name="hf_docs",
|
||||
embedding_function=huggingface_ef
|
||||
)
|
||||
```
|
||||
|
||||
### Custom embedding function
|
||||
|
||||
```python
|
||||
from chromadb import Documents, EmbeddingFunction, Embeddings
|
||||
|
||||
class MyEmbeddingFunction(EmbeddingFunction):
|
||||
def __call__(self, input: Documents) -> Embeddings:
|
||||
# Your embedding logic
|
||||
return embeddings
|
||||
|
||||
my_ef = MyEmbeddingFunction()
|
||||
collection = client.create_collection(
|
||||
name="custom_docs",
|
||||
embedding_function=my_ef
|
||||
)
|
||||
```
|
||||
|
||||
## Metadata filtering
|
||||
|
||||
```python
|
||||
# Exact match
|
||||
results = collection.query(
|
||||
query_texts=["query"],
|
||||
where={"category": "tutorial"}
|
||||
)
|
||||
|
||||
# Comparison operators
|
||||
results = collection.query(
|
||||
query_texts=["query"],
|
||||
where={"page": {"$gt": 10}} # $gt, $gte, $lt, $lte, $ne
|
||||
)
|
||||
|
||||
# Logical operators
|
||||
results = collection.query(
|
||||
query_texts=["query"],
|
||||
where={
|
||||
"$and": [
|
||||
{"category": "tutorial"},
|
||||
{"difficulty": {"$lte": 3}}
|
||||
]
|
||||
} # Also: $or
|
||||
)
|
||||
|
||||
# Contains
|
||||
results = collection.query(
|
||||
query_texts=["query"],
|
||||
where={"tags": {"$in": ["python", "ml"]}}
|
||||
)
|
||||
```
|
||||
|
||||
## LangChain integration
|
||||
|
||||
```python
|
||||
from langchain_chroma import Chroma
|
||||
from langchain_openai import OpenAIEmbeddings
|
||||
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
||||
|
||||
# Split documents
|
||||
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
|
||||
docs = text_splitter.split_documents(documents)
|
||||
|
||||
# Create Chroma vector store
|
||||
vectorstore = Chroma.from_documents(
|
||||
documents=docs,
|
||||
embedding=OpenAIEmbeddings(),
|
||||
persist_directory="./chroma_db"
|
||||
)
|
||||
|
||||
# Query
|
||||
results = vectorstore.similarity_search("machine learning", k=3)
|
||||
|
||||
# As retriever
|
||||
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
|
||||
```
|
||||
|
||||
## LlamaIndex integration
|
||||
|
||||
```python
|
||||
from llama_index.vector_stores.chroma import ChromaVectorStore
|
||||
from llama_index.core import VectorStoreIndex, StorageContext
|
||||
import chromadb
|
||||
|
||||
# Initialize Chroma
|
||||
db = chromadb.PersistentClient(path="./chroma_db")
|
||||
collection = db.get_or_create_collection("my_collection")
|
||||
|
||||
# Create vector store
|
||||
vector_store = ChromaVectorStore(chroma_collection=collection)
|
||||
storage_context = StorageContext.from_defaults(vector_store=vector_store)
|
||||
|
||||
# Create index
|
||||
index = VectorStoreIndex.from_documents(
|
||||
documents,
|
||||
storage_context=storage_context
|
||||
)
|
||||
|
||||
# Query
|
||||
query_engine = index.as_query_engine()
|
||||
response = query_engine.query("What is machine learning?")
|
||||
```
|
||||
|
||||
## Server mode
|
||||
|
||||
```python
|
||||
# Run Chroma server
|
||||
# Terminal: chroma run --path ./chroma_db --port 8000
|
||||
|
||||
# Connect to server
|
||||
import chromadb
|
||||
from chromadb.config import Settings
|
||||
|
||||
client = chromadb.HttpClient(
|
||||
host="localhost",
|
||||
port=8000,
|
||||
settings=Settings(anonymized_telemetry=False)
|
||||
)
|
||||
|
||||
# Use as normal
|
||||
collection = client.get_or_create_collection("my_docs")
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Use persistent client** - Don't lose data on restart
|
||||
2. **Add metadata** - Enables filtering and tracking
|
||||
3. **Batch operations** - Add multiple docs at once
|
||||
4. **Choose right embedding model** - Balance speed/quality
|
||||
5. **Use filters** - Narrow search space
|
||||
6. **Unique IDs** - Avoid collisions
|
||||
7. **Regular backups** - Copy chroma_db directory
|
||||
8. **Monitor collection size** - Scale up if needed
|
||||
9. **Test embedding functions** - Ensure quality
|
||||
10. **Use server mode for production** - Better for multi-user
|
||||
|
||||
## Performance
|
||||
|
||||
| Operation | Latency | Notes |
|
||||
|-----------|---------|-------|
|
||||
| Add 100 docs | ~1-3s | With embedding |
|
||||
| Query (top 10) | ~50-200ms | Depends on collection size |
|
||||
| Metadata filter | ~10-50ms | Fast with proper indexing |
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/chroma-core/chroma ⭐ 24,300+
|
||||
- **Docs**: https://docs.trychroma.com
|
||||
- **Discord**: https://discord.gg/MMeYNTmh3x
|
||||
- **Version**: 1.3.3+
|
||||
- **License**: Apache 2.0
|
||||
271
website/docs/user-guide/skills/optional/mlops/mlops-clip.md
Normal file
271
website/docs/user-guide/skills/optional/mlops/mlops-clip.md
Normal file
|
|
@ -0,0 +1,271 @@
|
|||
---
|
||||
title: "Clip — OpenAI's model connecting vision and language"
|
||||
sidebar_label: "Clip"
|
||||
description: "OpenAI's model connecting vision and language"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Clip
|
||||
|
||||
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/clip` |
|
||||
| Path | `optional-skills/mlops/clip` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `transformers`, `torch`, `pillow` |
|
||||
| Tags | `Multimodal`, `CLIP`, `Vision-Language`, `Zero-Shot`, `Image Classification`, `OpenAI`, `Image Search`, `Cross-Modal Retrieval`, `Content Moderation` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# CLIP - Contrastive Language-Image Pre-Training
|
||||
|
||||
OpenAI's model that understands images from natural language.
|
||||
|
||||
## When to use CLIP
|
||||
|
||||
**Use when:**
|
||||
- Zero-shot image classification (no training data needed)
|
||||
- Image-text similarity/matching
|
||||
- Semantic image search
|
||||
- Content moderation (detect NSFW, violence)
|
||||
- Visual question answering
|
||||
- Cross-modal retrieval (image→text, text→image)
|
||||
|
||||
**Metrics**:
|
||||
- **25,300+ GitHub stars**
|
||||
- Trained on 400M image-text pairs
|
||||
- Matches ResNet-50 on ImageNet (zero-shot)
|
||||
- MIT License
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **BLIP-2**: Better captioning
|
||||
- **LLaVA**: Vision-language chat
|
||||
- **Segment Anything**: Image segmentation
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install git+https://github.com/openai/CLIP.git
|
||||
pip install torch torchvision ftfy regex tqdm
|
||||
```
|
||||
|
||||
### Zero-shot classification
|
||||
|
||||
```python
|
||||
import torch
|
||||
import clip
|
||||
from PIL import Image
|
||||
|
||||
# Load model
|
||||
device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||
model, preprocess = clip.load("ViT-B/32", device=device)
|
||||
|
||||
# Load image
|
||||
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
|
||||
|
||||
# Define possible labels
|
||||
text = clip.tokenize(["a dog", "a cat", "a bird", "a car"]).to(device)
|
||||
|
||||
# Compute similarity
|
||||
with torch.no_grad():
|
||||
image_features = model.encode_image(image)
|
||||
text_features = model.encode_text(text)
|
||||
|
||||
# Cosine similarity
|
||||
logits_per_image, logits_per_text = model(image, text)
|
||||
probs = logits_per_image.softmax(dim=-1).cpu().numpy()
|
||||
|
||||
# Print results
|
||||
labels = ["a dog", "a cat", "a bird", "a car"]
|
||||
for label, prob in zip(labels, probs[0]):
|
||||
print(f"{label}: {prob:.2%}")
|
||||
```
|
||||
|
||||
## Available models
|
||||
|
||||
```python
|
||||
# Models (sorted by size)
|
||||
models = [
|
||||
"RN50", # ResNet-50
|
||||
"RN101", # ResNet-101
|
||||
"ViT-B/32", # Vision Transformer (recommended)
|
||||
"ViT-B/16", # Better quality, slower
|
||||
"ViT-L/14", # Best quality, slowest
|
||||
]
|
||||
|
||||
model, preprocess = clip.load("ViT-B/32")
|
||||
```
|
||||
|
||||
| Model | Parameters | Speed | Quality |
|
||||
|-------|------------|-------|---------|
|
||||
| RN50 | 102M | Fast | Good |
|
||||
| ViT-B/32 | 151M | Medium | Better |
|
||||
| ViT-L/14 | 428M | Slow | Best |
|
||||
|
||||
## Image-text similarity
|
||||
|
||||
```python
|
||||
# Compute embeddings
|
||||
image_features = model.encode_image(image)
|
||||
text_features = model.encode_text(text)
|
||||
|
||||
# Normalize
|
||||
image_features /= image_features.norm(dim=-1, keepdim=True)
|
||||
text_features /= text_features.norm(dim=-1, keepdim=True)
|
||||
|
||||
# Cosine similarity
|
||||
similarity = (image_features @ text_features.T).item()
|
||||
print(f"Similarity: {similarity:.4f}")
|
||||
```
|
||||
|
||||
## Semantic image search
|
||||
|
||||
```python
|
||||
# Index images
|
||||
image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
|
||||
image_embeddings = []
|
||||
|
||||
for img_path in image_paths:
|
||||
image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
|
||||
with torch.no_grad():
|
||||
embedding = model.encode_image(image)
|
||||
embedding /= embedding.norm(dim=-1, keepdim=True)
|
||||
image_embeddings.append(embedding)
|
||||
|
||||
image_embeddings = torch.cat(image_embeddings)
|
||||
|
||||
# Search with text query
|
||||
query = "a sunset over the ocean"
|
||||
text_input = clip.tokenize([query]).to(device)
|
||||
with torch.no_grad():
|
||||
text_embedding = model.encode_text(text_input)
|
||||
text_embedding /= text_embedding.norm(dim=-1, keepdim=True)
|
||||
|
||||
# Find most similar images
|
||||
similarities = (text_embedding @ image_embeddings.T).squeeze(0)
|
||||
top_k = similarities.topk(3)
|
||||
|
||||
for idx, score in zip(top_k.indices, top_k.values):
|
||||
print(f"{image_paths[idx]}: {score:.3f}")
|
||||
```
|
||||
|
||||
## Content moderation
|
||||
|
||||
```python
|
||||
# Define categories
|
||||
categories = [
|
||||
"safe for work",
|
||||
"not safe for work",
|
||||
"violent content",
|
||||
"graphic content"
|
||||
]
|
||||
|
||||
text = clip.tokenize(categories).to(device)
|
||||
|
||||
# Check image
|
||||
with torch.no_grad():
|
||||
logits_per_image, _ = model(image, text)
|
||||
probs = logits_per_image.softmax(dim=-1)
|
||||
|
||||
# Get classification
|
||||
max_idx = probs.argmax().item()
|
||||
max_prob = probs[0, max_idx].item()
|
||||
|
||||
print(f"Category: {categories[max_idx]} ({max_prob:.2%})")
|
||||
```
|
||||
|
||||
## Batch processing
|
||||
|
||||
```python
|
||||
# Process multiple images
|
||||
images = [preprocess(Image.open(f"img{i}.jpg")) for i in range(10)]
|
||||
images = torch.stack(images).to(device)
|
||||
|
||||
with torch.no_grad():
|
||||
image_features = model.encode_image(images)
|
||||
image_features /= image_features.norm(dim=-1, keepdim=True)
|
||||
|
||||
# Batch text
|
||||
texts = ["a dog", "a cat", "a bird"]
|
||||
text_tokens = clip.tokenize(texts).to(device)
|
||||
|
||||
with torch.no_grad():
|
||||
text_features = model.encode_text(text_tokens)
|
||||
text_features /= text_features.norm(dim=-1, keepdim=True)
|
||||
|
||||
# Similarity matrix (10 images × 3 texts)
|
||||
similarities = image_features @ text_features.T
|
||||
print(similarities.shape) # (10, 3)
|
||||
```
|
||||
|
||||
## Integration with vector databases
|
||||
|
||||
```python
|
||||
# Store CLIP embeddings in Chroma/FAISS
|
||||
import chromadb
|
||||
|
||||
client = chromadb.Client()
|
||||
collection = client.create_collection("image_embeddings")
|
||||
|
||||
# Add image embeddings
|
||||
for img_path, embedding in zip(image_paths, image_embeddings):
|
||||
collection.add(
|
||||
embeddings=[embedding.cpu().numpy().tolist()],
|
||||
metadatas=[{"path": img_path}],
|
||||
ids=[img_path]
|
||||
)
|
||||
|
||||
# Query with text
|
||||
query = "a sunset"
|
||||
text_embedding = model.encode_text(clip.tokenize([query]))
|
||||
results = collection.query(
|
||||
query_embeddings=[text_embedding.cpu().numpy().tolist()],
|
||||
n_results=5
|
||||
)
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Use ViT-B/32 for most cases** - Good balance
|
||||
2. **Normalize embeddings** - Required for cosine similarity
|
||||
3. **Batch processing** - More efficient
|
||||
4. **Cache embeddings** - Expensive to recompute
|
||||
5. **Use descriptive labels** - Better zero-shot performance
|
||||
6. **GPU recommended** - 10-50× faster
|
||||
7. **Preprocess images** - Use provided preprocess function
|
||||
|
||||
## Performance
|
||||
|
||||
| Operation | CPU | GPU (V100) |
|
||||
|-----------|-----|------------|
|
||||
| Image encoding | ~200ms | ~20ms |
|
||||
| Text encoding | ~50ms | ~5ms |
|
||||
| Similarity compute | <1ms | <1ms |
|
||||
|
||||
## Limitations
|
||||
|
||||
1. **Not for fine-grained tasks** - Best for broad categories
|
||||
2. **Requires descriptive text** - Vague labels perform poorly
|
||||
3. **Biased on web data** - May have dataset biases
|
||||
4. **No bounding boxes** - Whole image only
|
||||
5. **Limited spatial understanding** - Position/counting weak
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/openai/CLIP ⭐ 25,300+
|
||||
- **Paper**: https://arxiv.org/abs/2103.00020
|
||||
- **Colab**: https://colab.research.google.com/github/openai/clip/
|
||||
- **License**: MIT
|
||||
239
website/docs/user-guide/skills/optional/mlops/mlops-faiss.md
Normal file
239
website/docs/user-guide/skills/optional/mlops/mlops-faiss.md
Normal file
|
|
@ -0,0 +1,239 @@
|
|||
---
|
||||
title: "Faiss — Facebook's library for efficient similarity search and clustering of dense vectors"
|
||||
sidebar_label: "Faiss"
|
||||
description: "Facebook's library for efficient similarity search and clustering of dense vectors"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Faiss
|
||||
|
||||
Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/faiss` |
|
||||
| Path | `optional-skills/mlops/faiss` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `faiss-cpu`, `faiss-gpu`, `numpy` |
|
||||
| Tags | `RAG`, `FAISS`, `Similarity Search`, `Vector Search`, `Facebook AI`, `GPU Acceleration`, `Billion-Scale`, `K-NN`, `HNSW`, `High Performance`, `Large Scale` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# FAISS - Efficient Similarity Search
|
||||
|
||||
Facebook AI's library for billion-scale vector similarity search.
|
||||
|
||||
## When to use FAISS
|
||||
|
||||
**Use FAISS when:**
|
||||
- Need fast similarity search on large vector datasets (millions/billions)
|
||||
- GPU acceleration required
|
||||
- Pure vector similarity (no metadata filtering needed)
|
||||
- High throughput, low latency critical
|
||||
- Offline/batch processing of embeddings
|
||||
|
||||
**Metrics**:
|
||||
- **31,700+ GitHub stars**
|
||||
- Meta/Facebook AI Research
|
||||
- **Handles billions of vectors**
|
||||
- **C++** with Python bindings
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **Chroma/Pinecone**: Need metadata filtering
|
||||
- **Weaviate**: Need full database features
|
||||
- **Annoy**: Simpler, fewer features
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# CPU only
|
||||
pip install faiss-cpu
|
||||
|
||||
# GPU support
|
||||
pip install faiss-gpu
|
||||
```
|
||||
|
||||
### Basic usage
|
||||
|
||||
```python
|
||||
import faiss
|
||||
import numpy as np
|
||||
|
||||
# Create sample data (1000 vectors, 128 dimensions)
|
||||
d = 128
|
||||
nb = 1000
|
||||
vectors = np.random.random((nb, d)).astype('float32')
|
||||
|
||||
# Create index
|
||||
index = faiss.IndexFlatL2(d) # L2 distance
|
||||
index.add(vectors) # Add vectors
|
||||
|
||||
# Search
|
||||
k = 5 # Find 5 nearest neighbors
|
||||
query = np.random.random((1, d)).astype('float32')
|
||||
distances, indices = index.search(query, k)
|
||||
|
||||
print(f"Nearest neighbors: {indices}")
|
||||
print(f"Distances: {distances}")
|
||||
```
|
||||
|
||||
## Index types
|
||||
|
||||
### 1. Flat (exact search)
|
||||
|
||||
```python
|
||||
# L2 (Euclidean) distance
|
||||
index = faiss.IndexFlatL2(d)
|
||||
|
||||
# Inner product (cosine similarity if normalized)
|
||||
index = faiss.IndexFlatIP(d)
|
||||
|
||||
# Slowest, most accurate
|
||||
```
|
||||
|
||||
### 2. IVF (inverted file) - Fast approximate
|
||||
|
||||
```python
|
||||
# Create quantizer
|
||||
quantizer = faiss.IndexFlatL2(d)
|
||||
|
||||
# IVF index with 100 clusters
|
||||
nlist = 100
|
||||
index = faiss.IndexIVFFlat(quantizer, d, nlist)
|
||||
|
||||
# Train on data
|
||||
index.train(vectors)
|
||||
|
||||
# Add vectors
|
||||
index.add(vectors)
|
||||
|
||||
# Search (nprobe = clusters to search)
|
||||
index.nprobe = 10
|
||||
distances, indices = index.search(query, k)
|
||||
```
|
||||
|
||||
### 3. HNSW (Hierarchical NSW) - Best quality/speed
|
||||
|
||||
```python
|
||||
# HNSW index
|
||||
M = 32 # Number of connections per layer
|
||||
index = faiss.IndexHNSWFlat(d, M)
|
||||
|
||||
# No training needed
|
||||
index.add(vectors)
|
||||
|
||||
# Search
|
||||
distances, indices = index.search(query, k)
|
||||
```
|
||||
|
||||
### 4. Product Quantization - Memory efficient
|
||||
|
||||
```python
|
||||
# PQ reduces memory by 16-32×
|
||||
m = 8 # Number of subquantizers
|
||||
nbits = 8
|
||||
index = faiss.IndexPQ(d, m, nbits)
|
||||
|
||||
# Train and add
|
||||
index.train(vectors)
|
||||
index.add(vectors)
|
||||
```
|
||||
|
||||
## Save and load
|
||||
|
||||
```python
|
||||
# Save index
|
||||
faiss.write_index(index, "large.index")
|
||||
|
||||
# Load index
|
||||
index = faiss.read_index("large.index")
|
||||
|
||||
# Continue using
|
||||
distances, indices = index.search(query, k)
|
||||
```
|
||||
|
||||
## GPU acceleration
|
||||
|
||||
```python
|
||||
# Single GPU
|
||||
res = faiss.StandardGpuResources()
|
||||
index_cpu = faiss.IndexFlatL2(d)
|
||||
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu) # GPU 0
|
||||
|
||||
# Multi-GPU
|
||||
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
|
||||
|
||||
# 10-100× faster than CPU
|
||||
```
|
||||
|
||||
## LangChain integration
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import FAISS
|
||||
from langchain_openai import OpenAIEmbeddings
|
||||
|
||||
# Create FAISS vector store
|
||||
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
|
||||
|
||||
# Save
|
||||
vectorstore.save_local("faiss_index")
|
||||
|
||||
# Load
|
||||
vectorstore = FAISS.load_local(
|
||||
"faiss_index",
|
||||
OpenAIEmbeddings(),
|
||||
allow_dangerous_deserialization=True
|
||||
)
|
||||
|
||||
# Search
|
||||
results = vectorstore.similarity_search("query", k=5)
|
||||
```
|
||||
|
||||
## LlamaIndex integration
|
||||
|
||||
```python
|
||||
from llama_index.vector_stores.faiss import FaissVectorStore
|
||||
import faiss
|
||||
|
||||
# Create FAISS index
|
||||
d = 1536
|
||||
faiss_index = faiss.IndexFlatL2(d)
|
||||
|
||||
vector_store = FaissVectorStore(faiss_index=faiss_index)
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Choose right index type** - Flat for <10K, IVF for 10K-1M, HNSW for quality
|
||||
2. **Normalize for cosine** - Use IndexFlatIP with normalized vectors
|
||||
3. **Use GPU for large datasets** - 10-100× faster
|
||||
4. **Save trained indices** - Training is expensive
|
||||
5. **Tune nprobe/ef_search** - Balance speed/accuracy
|
||||
6. **Monitor memory** - PQ for large datasets
|
||||
7. **Batch queries** - Better GPU utilization
|
||||
|
||||
## Performance
|
||||
|
||||
| Index Type | Build Time | Search Time | Memory | Accuracy |
|
||||
|------------|------------|-------------|--------|----------|
|
||||
| Flat | Fast | Slow | High | 100% |
|
||||
| IVF | Medium | Fast | Medium | 95-99% |
|
||||
| HNSW | Slow | Fastest | High | 99% |
|
||||
| PQ | Medium | Fast | Low | 90-95% |
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
|
||||
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
|
||||
- **License**: MIT
|
||||
|
|
@ -0,0 +1,384 @@
|
|||
---
|
||||
title: "Optimizing Attention Flash"
|
||||
sidebar_label: "Optimizing Attention Flash"
|
||||
description: "Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Optimizing Attention Flash
|
||||
|
||||
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/flash-attention` |
|
||||
| Path | `optional-skills/mlops/flash-attention` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `flash-attn`, `torch`, `transformers` |
|
||||
| Tags | `Optimization`, `Flash Attention`, `Attention Optimization`, `Memory Efficiency`, `Speed Optimization`, `Long Context`, `PyTorch`, `SDPA`, `H100`, `FP8`, `Transformers` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Flash Attention - Fast Memory-Efficient Attention
|
||||
|
||||
## Quick start
|
||||
|
||||
Flash Attention provides 2-4x speedup and 10-20x memory reduction for transformer attention through IO-aware tiling and recomputation.
|
||||
|
||||
**PyTorch native (easiest, PyTorch 2.2+)**:
|
||||
```python
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
|
||||
q = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16) # [batch, heads, seq, dim]
|
||||
k = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)
|
||||
v = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)
|
||||
|
||||
# Automatically uses Flash Attention if available
|
||||
out = F.scaled_dot_product_attention(q, k, v)
|
||||
```
|
||||
|
||||
**flash-attn library (more features)**:
|
||||
```bash
|
||||
pip install flash-attn --no-build-isolation
|
||||
```
|
||||
|
||||
```python
|
||||
from flash_attn import flash_attn_func
|
||||
|
||||
# q, k, v: [batch, seqlen, nheads, headdim]
|
||||
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: Enable in existing PyTorch model
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
Flash Attention Integration:
|
||||
- [ ] Step 1: Check PyTorch version (≥2.2)
|
||||
- [ ] Step 2: Enable Flash Attention backend
|
||||
- [ ] Step 3: Verify speedup with profiling
|
||||
- [ ] Step 4: Test accuracy matches baseline
|
||||
```
|
||||
|
||||
**Step 1: Check PyTorch version**
|
||||
|
||||
```bash
|
||||
python -c "import torch; print(torch.__version__)"
|
||||
# Should be ≥2.2.0
|
||||
```
|
||||
|
||||
If <2.2, upgrade:
|
||||
```bash
|
||||
pip install --upgrade torch
|
||||
```
|
||||
|
||||
**Step 2: Enable Flash Attention backend**
|
||||
|
||||
Replace standard attention:
|
||||
```python
|
||||
# Before (standard attention)
|
||||
attn_weights = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
|
||||
out = attn_weights @ v
|
||||
|
||||
# After (Flash Attention)
|
||||
import torch.nn.functional as F
|
||||
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
|
||||
```
|
||||
|
||||
Force Flash Attention backend:
|
||||
```python
|
||||
with torch.backends.cuda.sdp_kernel(
|
||||
enable_flash=True,
|
||||
enable_math=False,
|
||||
enable_mem_efficient=False
|
||||
):
|
||||
out = F.scaled_dot_product_attention(q, k, v)
|
||||
```
|
||||
|
||||
**Step 3: Verify speedup with profiling**
|
||||
|
||||
```python
|
||||
import torch.utils.benchmark as benchmark
|
||||
|
||||
def test_attention(use_flash):
|
||||
q, k, v = [torch.randn(2, 8, 2048, 64, device='cuda', dtype=torch.float16) for _ in range(3)]
|
||||
|
||||
if use_flash:
|
||||
with torch.backends.cuda.sdp_kernel(enable_flash=True):
|
||||
return F.scaled_dot_product_attention(q, k, v)
|
||||
else:
|
||||
attn = (q @ k.transpose(-2, -1) / 8.0).softmax(dim=-1)
|
||||
return attn @ v
|
||||
|
||||
# Benchmark
|
||||
t_flash = benchmark.Timer(stmt='test_attention(True)', globals=globals())
|
||||
t_standard = benchmark.Timer(stmt='test_attention(False)', globals=globals())
|
||||
|
||||
print(f"Flash: {t_flash.timeit(100).mean:.3f}s")
|
||||
print(f"Standard: {t_standard.timeit(100).mean:.3f}s")
|
||||
```
|
||||
|
||||
Expected: 2-4x speedup for sequences >512 tokens.
|
||||
|
||||
**Step 4: Test accuracy matches baseline**
|
||||
|
||||
```python
|
||||
# Compare outputs
|
||||
q, k, v = [torch.randn(1, 8, 512, 64, device='cuda', dtype=torch.float16) for _ in range(3)]
|
||||
|
||||
# Flash Attention
|
||||
out_flash = F.scaled_dot_product_attention(q, k, v)
|
||||
|
||||
# Standard attention
|
||||
attn_weights = torch.softmax(q @ k.transpose(-2, -1) / 8.0, dim=-1)
|
||||
out_standard = attn_weights @ v
|
||||
|
||||
# Check difference
|
||||
diff = (out_flash - out_standard).abs().max()
|
||||
print(f"Max difference: {diff:.6f}")
|
||||
# Should be <1e-3 for float16
|
||||
```
|
||||
|
||||
### Workflow 2: Use flash-attn library for advanced features
|
||||
|
||||
For multi-query attention, sliding window, or H100 FP8.
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
flash-attn Library Setup:
|
||||
- [ ] Step 1: Install flash-attn library
|
||||
- [ ] Step 2: Modify attention code
|
||||
- [ ] Step 3: Enable advanced features
|
||||
- [ ] Step 4: Benchmark performance
|
||||
```
|
||||
|
||||
**Step 1: Install flash-attn library**
|
||||
|
||||
```bash
|
||||
# NVIDIA GPUs (CUDA 12.0+)
|
||||
pip install flash-attn --no-build-isolation
|
||||
|
||||
# Verify installation
|
||||
python -c "from flash_attn import flash_attn_func; print('Success')"
|
||||
```
|
||||
|
||||
**Step 2: Modify attention code**
|
||||
|
||||
```python
|
||||
from flash_attn import flash_attn_func
|
||||
|
||||
# Input: [batch_size, seq_len, num_heads, head_dim]
|
||||
# Transpose from [batch, heads, seq, dim] if needed
|
||||
q = q.transpose(1, 2) # [batch, seq, heads, dim]
|
||||
k = k.transpose(1, 2)
|
||||
v = v.transpose(1, 2)
|
||||
|
||||
out = flash_attn_func(
|
||||
q, k, v,
|
||||
dropout_p=0.1,
|
||||
causal=True, # For autoregressive models
|
||||
window_size=(-1, -1), # No sliding window
|
||||
softmax_scale=None # Auto-scale
|
||||
)
|
||||
|
||||
out = out.transpose(1, 2) # Back to [batch, heads, seq, dim]
|
||||
```
|
||||
|
||||
**Step 3: Enable advanced features**
|
||||
|
||||
Multi-query attention (shared K/V across heads):
|
||||
```python
|
||||
from flash_attn import flash_attn_func
|
||||
|
||||
# q: [batch, seq, num_q_heads, dim]
|
||||
# k, v: [batch, seq, num_kv_heads, dim] # Fewer KV heads
|
||||
out = flash_attn_func(q, k, v) # Automatically handles MQA
|
||||
```
|
||||
|
||||
Sliding window attention (local attention):
|
||||
```python
|
||||
# Only attend to window of 256 tokens before/after
|
||||
out = flash_attn_func(
|
||||
q, k, v,
|
||||
window_size=(256, 256), # (left, right) window
|
||||
causal=True
|
||||
)
|
||||
```
|
||||
|
||||
**Step 4: Benchmark performance**
|
||||
|
||||
```python
|
||||
import torch
|
||||
from flash_attn import flash_attn_func
|
||||
import time
|
||||
|
||||
q, k, v = [torch.randn(4, 4096, 32, 64, device='cuda', dtype=torch.float16) for _ in range(3)]
|
||||
|
||||
# Warmup
|
||||
for _ in range(10):
|
||||
_ = flash_attn_func(q, k, v)
|
||||
|
||||
# Benchmark
|
||||
torch.cuda.synchronize()
|
||||
start = time.time()
|
||||
for _ in range(100):
|
||||
out = flash_attn_func(q, k, v)
|
||||
torch.cuda.synchronize()
|
||||
end = time.time()
|
||||
|
||||
print(f"Time per iteration: {(end-start)/100*1000:.2f}ms")
|
||||
print(f"Memory allocated: {torch.cuda.max_memory_allocated()/1e9:.2f}GB")
|
||||
```
|
||||
|
||||
### Workflow 3: H100 FP8 optimization (FlashAttention-3)
|
||||
|
||||
For maximum performance on H100 GPUs.
|
||||
|
||||
```
|
||||
FP8 Setup:
|
||||
- [ ] Step 1: Verify H100 GPU available
|
||||
- [ ] Step 2: Install flash-attn with FP8 support
|
||||
- [ ] Step 3: Convert inputs to FP8
|
||||
- [ ] Step 4: Run with FP8 attention
|
||||
```
|
||||
|
||||
**Step 1: Verify H100 GPU**
|
||||
|
||||
```bash
|
||||
nvidia-smi --query-gpu=name --format=csv
|
||||
# Should show "H100" or "H800"
|
||||
```
|
||||
|
||||
**Step 2: Install flash-attn with FP8 support**
|
||||
|
||||
```bash
|
||||
pip install flash-attn --no-build-isolation
|
||||
# FP8 support included for H100
|
||||
```
|
||||
|
||||
**Step 3: Convert inputs to FP8**
|
||||
|
||||
```python
|
||||
import torch
|
||||
|
||||
q = torch.randn(2, 4096, 32, 64, device='cuda', dtype=torch.float16)
|
||||
k = torch.randn(2, 4096, 32, 64, device='cuda', dtype=torch.float16)
|
||||
v = torch.randn(2, 4096, 32, 64, device='cuda', dtype=torch.float16)
|
||||
|
||||
# Convert to float8_e4m3 (FP8)
|
||||
q_fp8 = q.to(torch.float8_e4m3fn)
|
||||
k_fp8 = k.to(torch.float8_e4m3fn)
|
||||
v_fp8 = v.to(torch.float8_e4m3fn)
|
||||
```
|
||||
|
||||
**Step 4: Run with FP8 attention**
|
||||
|
||||
```python
|
||||
from flash_attn import flash_attn_func
|
||||
|
||||
# FlashAttention-3 automatically uses FP8 kernels on H100
|
||||
out = flash_attn_func(q_fp8, k_fp8, v_fp8)
|
||||
# Result: ~1.2 PFLOPS, 1.5-2x faster than FP16
|
||||
```
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use Flash Attention when:**
|
||||
- Training transformers with sequences >512 tokens
|
||||
- Running inference with long context (>2K tokens)
|
||||
- GPU memory constrained (OOM with standard attention)
|
||||
- Need 2-4x speedup without accuracy loss
|
||||
- Using PyTorch 2.2+ or can install flash-attn
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **Standard attention**: Sequences <256 tokens (overhead not worth it)
|
||||
- **xFormers**: Need more attention variants (not just speed)
|
||||
- **Memory-efficient attention**: CPU inference (Flash Attention needs GPU)
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: ImportError: cannot import flash_attn**
|
||||
|
||||
Install with no-build-isolation flag:
|
||||
```bash
|
||||
pip install flash-attn --no-build-isolation
|
||||
```
|
||||
|
||||
Or install CUDA toolkit first:
|
||||
```bash
|
||||
conda install cuda -c nvidia
|
||||
pip install flash-attn --no-build-isolation
|
||||
```
|
||||
|
||||
**Issue: Slower than expected (no speedup)**
|
||||
|
||||
Flash Attention benefits increase with sequence length:
|
||||
- <512 tokens: Minimal speedup (10-20%)
|
||||
- 512-2K tokens: 2-3x speedup
|
||||
- >2K tokens: 3-4x speedup
|
||||
|
||||
Check sequence length is sufficient.
|
||||
|
||||
**Issue: RuntimeError: CUDA error**
|
||||
|
||||
Verify GPU supports Flash Attention:
|
||||
```python
|
||||
import torch
|
||||
print(torch.cuda.get_device_capability())
|
||||
# Should be ≥(7, 5) for Turing+
|
||||
```
|
||||
|
||||
Flash Attention requires:
|
||||
- Ampere (A100, A10): ✅ Full support
|
||||
- Turing (T4): ✅ Supported
|
||||
- Volta (V100): ❌ Not supported
|
||||
|
||||
**Issue: Accuracy degradation**
|
||||
|
||||
Check dtype is float16 or bfloat16 (not float32):
|
||||
```python
|
||||
q = q.to(torch.float16) # Or torch.bfloat16
|
||||
```
|
||||
|
||||
Flash Attention uses float16/bfloat16 for speed. Float32 not supported.
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**Integration with HuggingFace Transformers**: See [references/transformers-integration.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/flash-attention/references/transformers-integration.md) for enabling Flash Attention in BERT, GPT, Llama models.
|
||||
|
||||
**Performance benchmarks**: See [references/benchmarks.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/flash-attention/references/benchmarks.md) for detailed speed and memory comparisons across GPUs and sequence lengths.
|
||||
|
||||
**Algorithm details**: See [references/algorithm.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/flash-attention/references/algorithm.md) for tiling strategy, recomputation, and IO complexity analysis.
|
||||
|
||||
**Advanced features**: See [references/advanced-features.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/flash-attention/references/advanced-features.md) for rotary embeddings, ALiBi, paged KV cache, and custom attention masks.
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
- **GPU**: NVIDIA Ampere+ (A100, A10, A30) or AMD MI200+
|
||||
- **VRAM**: Same as standard attention (Flash Attention doesn't increase memory)
|
||||
- **CUDA**: 12.0+ (11.8 minimum)
|
||||
- **PyTorch**: 2.2+ for native support
|
||||
|
||||
**Not supported**: V100 (Volta), CPU inference
|
||||
|
||||
## Resources
|
||||
|
||||
- Paper: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness" (NeurIPS 2022)
|
||||
- Paper: "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning" (ICLR 2024)
|
||||
- Blog: https://tridao.me/blog/2024/flash3/
|
||||
- GitHub: https://github.com/Dao-AILab/flash-attention
|
||||
- PyTorch docs: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
|
||||
590
website/docs/user-guide/skills/optional/mlops/mlops-guidance.md
Normal file
590
website/docs/user-guide/skills/optional/mlops/mlops-guidance.md
Normal file
|
|
@ -0,0 +1,590 @@
|
|||
---
|
||||
title: "Guidance"
|
||||
sidebar_label: "Guidance"
|
||||
description: "Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidanc..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Guidance
|
||||
|
||||
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/guidance` |
|
||||
| Path | `optional-skills/mlops/guidance` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `guidance`, `transformers` |
|
||||
| Tags | `Prompt Engineering`, `Guidance`, `Constrained Generation`, `Structured Output`, `JSON Validation`, `Grammar`, `Microsoft Research`, `Format Enforcement`, `Multi-Step Workflows` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Guidance: Constrained LLM Generation
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use Guidance when you need to:
|
||||
- **Control LLM output syntax** with regex or grammars
|
||||
- **Guarantee valid JSON/XML/code** generation
|
||||
- **Reduce latency** vs traditional prompting approaches
|
||||
- **Enforce structured formats** (dates, emails, IDs, etc.)
|
||||
- **Build multi-step workflows** with Pythonic control flow
|
||||
- **Prevent invalid outputs** through grammatical constraints
|
||||
|
||||
**GitHub Stars**: 18,000+ | **From**: Microsoft Research
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Base installation
|
||||
pip install guidance
|
||||
|
||||
# With specific backends
|
||||
pip install guidance[transformers] # Hugging Face models
|
||||
pip install guidance[llama_cpp] # llama.cpp models
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Basic Example: Structured Generation
|
||||
|
||||
```python
|
||||
from guidance import models, gen
|
||||
|
||||
# Load model (supports OpenAI, Transformers, llama.cpp)
|
||||
lm = models.OpenAI("gpt-4")
|
||||
|
||||
# Generate with constraints
|
||||
result = lm + "The capital of France is " + gen("capital", max_tokens=5)
|
||||
|
||||
print(result["capital"]) # "Paris"
|
||||
```
|
||||
|
||||
### With Anthropic Claude
|
||||
|
||||
```python
|
||||
from guidance import models, gen, system, user, assistant
|
||||
|
||||
# Configure Claude
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# Use context managers for chat format
|
||||
with system():
|
||||
lm += "You are a helpful assistant."
|
||||
|
||||
with user():
|
||||
lm += "What is the capital of France?"
|
||||
|
||||
with assistant():
|
||||
lm += gen(max_tokens=20)
|
||||
```
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### 1. Context Managers
|
||||
|
||||
Guidance uses Pythonic context managers for chat-style interactions.
|
||||
|
||||
```python
|
||||
from guidance import system, user, assistant, gen
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# System message
|
||||
with system():
|
||||
lm += "You are a JSON generation expert."
|
||||
|
||||
# User message
|
||||
with user():
|
||||
lm += "Generate a person object with name and age."
|
||||
|
||||
# Assistant response
|
||||
with assistant():
|
||||
lm += gen("response", max_tokens=100)
|
||||
|
||||
print(lm["response"])
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Natural chat flow
|
||||
- Clear role separation
|
||||
- Easy to read and maintain
|
||||
|
||||
### 2. Constrained Generation
|
||||
|
||||
Guidance ensures outputs match specified patterns using regex or grammars.
|
||||
|
||||
#### Regex Constraints
|
||||
|
||||
```python
|
||||
from guidance import models, gen
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# Constrain to valid email format
|
||||
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
|
||||
|
||||
# Constrain to date format (YYYY-MM-DD)
|
||||
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")
|
||||
|
||||
# Constrain to phone number
|
||||
lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")
|
||||
|
||||
print(lm["email"]) # Guaranteed valid email
|
||||
print(lm["date"]) # Guaranteed YYYY-MM-DD format
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
- Regex converted to grammar at token level
|
||||
- Invalid tokens filtered during generation
|
||||
- Model can only produce matching outputs
|
||||
|
||||
#### Selection Constraints
|
||||
|
||||
```python
|
||||
from guidance import models, gen, select
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# Constrain to specific choices
|
||||
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
|
||||
|
||||
# Multiple-choice selection
|
||||
lm += "Best answer: " + select(
|
||||
["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
|
||||
name="answer"
|
||||
)
|
||||
|
||||
print(lm["sentiment"]) # One of: positive, negative, neutral
|
||||
print(lm["answer"]) # One of: A, B, C, or D
|
||||
```
|
||||
|
||||
### 3. Token Healing
|
||||
|
||||
Guidance automatically "heals" token boundaries between prompt and generation.
|
||||
|
||||
**Problem:** Tokenization creates unnatural boundaries.
|
||||
|
||||
```python
|
||||
# Without token healing
|
||||
prompt = "The capital of France is "
|
||||
# Last token: " is "
|
||||
# First generated token might be " Par" (with leading space)
|
||||
# Result: "The capital of France is Paris" (double space!)
|
||||
```
|
||||
|
||||
**Solution:** Guidance backs up one token and regenerates.
|
||||
|
||||
```python
|
||||
from guidance import models, gen
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# Token healing enabled by default
|
||||
lm += "The capital of France is " + gen("capital", max_tokens=5)
|
||||
# Result: "The capital of France is Paris" (correct spacing)
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Natural text boundaries
|
||||
- No awkward spacing issues
|
||||
- Better model performance (sees natural token sequences)
|
||||
|
||||
### 4. Grammar-Based Generation
|
||||
|
||||
Define complex structures using context-free grammars.
|
||||
|
||||
```python
|
||||
from guidance import models, gen
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
# JSON grammar (simplified)
|
||||
json_grammar = """
|
||||
{
|
||||
"name": <gen name regex="[A-Za-z ]+" max_tokens=20>,
|
||||
"age": <gen age regex="[0-9]+" max_tokens=3>,
|
||||
"email": <gen email regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" max_tokens=50>
|
||||
}
|
||||
"""
|
||||
|
||||
# Generate valid JSON
|
||||
lm += gen("person", grammar=json_grammar)
|
||||
|
||||
print(lm["person"]) # Guaranteed valid JSON structure
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Complex structured outputs
|
||||
- Nested data structures
|
||||
- Programming language syntax
|
||||
- Domain-specific languages
|
||||
|
||||
### 5. Guidance Functions
|
||||
|
||||
Create reusable generation patterns with the `@guidance` decorator.
|
||||
|
||||
```python
|
||||
from guidance import guidance, gen, models
|
||||
|
||||
@guidance
|
||||
def generate_person(lm):
|
||||
"""Generate a person with name and age."""
|
||||
lm += "Name: " + gen("name", max_tokens=20, stop="\n")
|
||||
lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
|
||||
return lm
|
||||
|
||||
# Use the function
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
lm = generate_person(lm)
|
||||
|
||||
print(lm["name"])
|
||||
print(lm["age"])
|
||||
```
|
||||
|
||||
**Stateful Functions:**
|
||||
|
||||
```python
|
||||
@guidance(stateless=False)
|
||||
def react_agent(lm, question, tools, max_rounds=5):
|
||||
"""ReAct agent with tool use."""
|
||||
lm += f"Question: {question}\n\n"
|
||||
|
||||
for i in range(max_rounds):
|
||||
# Thought
|
||||
lm += f"Thought {i+1}: " + gen("thought", stop="\n")
|
||||
|
||||
# Action
|
||||
lm += "\nAction: " + select(list(tools.keys()), name="action")
|
||||
|
||||
# Execute tool
|
||||
tool_result = tools[lm["action"]]()
|
||||
lm += f"\nObservation: {tool_result}\n\n"
|
||||
|
||||
# Check if done
|
||||
lm += "Done? " + select(["Yes", "No"], name="done")
|
||||
if lm["done"] == "Yes":
|
||||
break
|
||||
|
||||
# Final answer
|
||||
lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
|
||||
return lm
|
||||
```
|
||||
|
||||
## Backend Configuration
|
||||
|
||||
### Anthropic Claude
|
||||
|
||||
```python
|
||||
from guidance import models
|
||||
|
||||
lm = models.Anthropic(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
api_key="your-api-key" # Or set ANTHROPIC_API_KEY env var
|
||||
)
|
||||
```
|
||||
|
||||
### OpenAI
|
||||
|
||||
```python
|
||||
lm = models.OpenAI(
|
||||
model="gpt-4o-mini",
|
||||
api_key="your-api-key" # Or set OPENAI_API_KEY env var
|
||||
)
|
||||
```
|
||||
|
||||
### Local Models (Transformers)
|
||||
|
||||
```python
|
||||
from guidance.models import Transformers
|
||||
|
||||
lm = Transformers(
|
||||
"microsoft/Phi-4-mini-instruct",
|
||||
device="cuda" # Or "cpu"
|
||||
)
|
||||
```
|
||||
|
||||
### Local Models (llama.cpp)
|
||||
|
||||
```python
|
||||
from guidance.models import LlamaCpp
|
||||
|
||||
lm = LlamaCpp(
|
||||
model_path="/path/to/model.gguf",
|
||||
n_ctx=4096,
|
||||
n_gpu_layers=35
|
||||
)
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: JSON Generation
|
||||
|
||||
```python
|
||||
from guidance import models, gen, system, user, assistant
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
with system():
|
||||
lm += "You generate valid JSON."
|
||||
|
||||
with user():
|
||||
lm += "Generate a user profile with name, age, and email."
|
||||
|
||||
with assistant():
|
||||
lm += """{
|
||||
"name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
|
||||
"age": """ + gen("age", regex=r"[0-9]+", max_tokens=3) + """,
|
||||
"email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
|
||||
}"""
|
||||
|
||||
print(lm) # Valid JSON guaranteed
|
||||
```
|
||||
|
||||
### Pattern 2: Classification
|
||||
|
||||
```python
|
||||
from guidance import models, gen, select
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
|
||||
text = "This product is amazing! I love it."
|
||||
|
||||
lm += f"Text: {text}\n"
|
||||
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
|
||||
lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"
|
||||
|
||||
print(f"Sentiment: {lm['sentiment']}")
|
||||
print(f"Confidence: {lm['confidence']}%")
|
||||
```
|
||||
|
||||
### Pattern 3: Multi-Step Reasoning
|
||||
|
||||
```python
|
||||
from guidance import models, gen, guidance
|
||||
|
||||
@guidance
|
||||
def chain_of_thought(lm, question):
|
||||
"""Generate answer with step-by-step reasoning."""
|
||||
lm += f"Question: {question}\n\n"
|
||||
|
||||
# Generate multiple reasoning steps
|
||||
for i in range(3):
|
||||
lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"
|
||||
|
||||
# Final answer
|
||||
lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)
|
||||
|
||||
return lm
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
lm = chain_of_thought(lm, "What is 15% of 200?")
|
||||
|
||||
print(lm["answer"])
|
||||
```
|
||||
|
||||
### Pattern 4: ReAct Agent
|
||||
|
||||
```python
|
||||
from guidance import models, gen, select, guidance
|
||||
|
||||
@guidance(stateless=False)
|
||||
def react_agent(lm, question):
|
||||
"""ReAct agent with tool use."""
|
||||
tools = {
|
||||
"calculator": lambda expr: eval(expr),
|
||||
"search": lambda query: f"Search results for: {query}",
|
||||
}
|
||||
|
||||
lm += f"Question: {question}\n\n"
|
||||
|
||||
for round in range(5):
|
||||
# Thought
|
||||
lm += f"Thought: " + gen("thought", stop="\n") + "\n"
|
||||
|
||||
# Action selection
|
||||
lm += "Action: " + select(["calculator", "search", "answer"], name="action")
|
||||
|
||||
if lm["action"] == "answer":
|
||||
lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
|
||||
break
|
||||
|
||||
# Action input
|
||||
lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"
|
||||
|
||||
# Execute tool
|
||||
if lm["action"] in tools:
|
||||
result = tools[lm["action"]](lm["action_input"])
|
||||
lm += f"Observation: {result}\n\n"
|
||||
|
||||
return lm
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
lm = react_agent(lm, "What is 25 * 4 + 10?")
|
||||
print(lm["answer"])
|
||||
```
|
||||
|
||||
### Pattern 5: Data Extraction
|
||||
|
||||
```python
|
||||
from guidance import models, gen, guidance
|
||||
|
||||
@guidance
|
||||
def extract_entities(lm, text):
|
||||
"""Extract structured entities from text."""
|
||||
lm += f"Text: {text}\n\n"
|
||||
|
||||
# Extract person
|
||||
lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"
|
||||
|
||||
# Extract organization
|
||||
lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"
|
||||
|
||||
# Extract date
|
||||
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"
|
||||
|
||||
# Extract location
|
||||
lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"
|
||||
|
||||
return lm
|
||||
|
||||
text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."
|
||||
|
||||
lm = models.Anthropic("claude-sonnet-4-5-20250929")
|
||||
lm = extract_entities(lm, text)
|
||||
|
||||
print(f"Person: {lm['person']}")
|
||||
print(f"Organization: {lm['organization']}")
|
||||
print(f"Date: {lm['date']}")
|
||||
print(f"Location: {lm['location']}")
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Use Regex for Format Validation
|
||||
|
||||
```python
|
||||
# ✅ Good: Regex ensures valid format
|
||||
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
|
||||
|
||||
# ❌ Bad: Free generation may produce invalid emails
|
||||
lm += "Email: " + gen("email", max_tokens=50)
|
||||
```
|
||||
|
||||
### 2. Use select() for Fixed Categories
|
||||
|
||||
```python
|
||||
# ✅ Good: Guaranteed valid category
|
||||
lm += "Status: " + select(["pending", "approved", "rejected"], name="status")
|
||||
|
||||
# ❌ Bad: May generate typos or invalid values
|
||||
lm += "Status: " + gen("status", max_tokens=20)
|
||||
```
|
||||
|
||||
### 3. Leverage Token Healing
|
||||
|
||||
```python
|
||||
# Token healing is enabled by default
|
||||
# No special action needed - just concatenate naturally
|
||||
lm += "The capital is " + gen("capital") # Automatic healing
|
||||
```
|
||||
|
||||
### 4. Use stop Sequences
|
||||
|
||||
```python
|
||||
# ✅ Good: Stop at newline for single-line outputs
|
||||
lm += "Name: " + gen("name", stop="\n")
|
||||
|
||||
# ❌ Bad: May generate multiple lines
|
||||
lm += "Name: " + gen("name", max_tokens=50)
|
||||
```
|
||||
|
||||
### 5. Create Reusable Functions
|
||||
|
||||
```python
|
||||
# ✅ Good: Reusable pattern
|
||||
@guidance
|
||||
def generate_person(lm):
|
||||
lm += "Name: " + gen("name", stop="\n")
|
||||
lm += "\nAge: " + gen("age", regex=r"[0-9]+")
|
||||
return lm
|
||||
|
||||
# Use multiple times
|
||||
lm = generate_person(lm)
|
||||
lm += "\n\n"
|
||||
lm = generate_person(lm)
|
||||
```
|
||||
|
||||
### 6. Balance Constraints
|
||||
|
||||
```python
|
||||
# ✅ Good: Reasonable constraints
|
||||
lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)
|
||||
|
||||
# ❌ Too strict: May fail or be very slow
|
||||
lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)
|
||||
```
|
||||
|
||||
## Comparison to Alternatives
|
||||
|
||||
| Feature | Guidance | Instructor | Outlines | LMQL |
|
||||
|---------|----------|------------|----------|------|
|
||||
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
|
||||
| Grammar Support | ✅ CFG | ❌ No | ✅ CFG | ✅ CFG |
|
||||
| Pydantic Validation | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
|
||||
| Token Healing | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
|
||||
| Local Models | ✅ Yes | ⚠️ Limited | ✅ Yes | ✅ Yes |
|
||||
| API Models | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
|
||||
| Pythonic Syntax | ✅ Yes | ✅ Yes | ✅ Yes | ❌ SQL-like |
|
||||
| Learning Curve | Low | Low | Medium | High |
|
||||
|
||||
**When to choose Guidance:**
|
||||
- Need regex/grammar constraints
|
||||
- Want token healing
|
||||
- Building complex workflows with control flow
|
||||
- Using local models (Transformers, llama.cpp)
|
||||
- Prefer Pythonic syntax
|
||||
|
||||
**When to choose alternatives:**
|
||||
- Instructor: Need Pydantic validation with automatic retrying
|
||||
- Outlines: Need JSON schema validation
|
||||
- LMQL: Prefer declarative query syntax
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
**Latency Reduction:**
|
||||
- 30-50% faster than traditional prompting for constrained outputs
|
||||
- Token healing reduces unnecessary regeneration
|
||||
- Grammar constraints prevent invalid token generation
|
||||
|
||||
**Memory Usage:**
|
||||
- Minimal overhead vs unconstrained generation
|
||||
- Grammar compilation cached after first use
|
||||
- Efficient token filtering at inference time
|
||||
|
||||
**Token Efficiency:**
|
||||
- Prevents wasted tokens on invalid outputs
|
||||
- No need for retry loops
|
||||
- Direct path to valid outputs
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://guidance.readthedocs.io
|
||||
- **GitHub**: https://github.com/guidance-ai/guidance (18k+ stars)
|
||||
- **Notebooks**: https://github.com/guidance-ai/guidance/tree/main/notebooks
|
||||
- **Discord**: Community support available
|
||||
|
||||
## See Also
|
||||
|
||||
- `references/constraints.md` - Comprehensive regex and grammar patterns
|
||||
- `references/backends.md` - Backend-specific configuration
|
||||
- `references/examples.md` - Production-ready examples
|
||||
|
|
@ -0,0 +1,320 @@
|
|||
---
|
||||
title: "Hermes Atropos Environments — Build, test, and debug Hermes Agent RL environments for Atropos training"
|
||||
sidebar_label: "Hermes Atropos Environments"
|
||||
description: "Build, test, and debug Hermes Agent RL environments for Atropos training"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Hermes Atropos Environments
|
||||
|
||||
Build, test, and debug Hermes Agent RL environments for Atropos training. Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, evaluation with tools, wandb logging, and the three CLI modes (serve/process/evaluate). Use when creating, reviewing, or fixing RL environments in the hermes-agent repo.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/hermes-atropos-environments` |
|
||||
| Path | `optional-skills/mlops/hermes-atropos-environments` |
|
||||
| Version | `1.1.0` |
|
||||
| Author | Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `atropos`, `rl`, `environments`, `training`, `reinforcement-learning`, `reward-functions` |
|
||||
| Related skills | [`axolotl`](/docs/user-guide/skills/bundled/mlops/mlops-training-axolotl), [`fine-tuning-with-trl`](/docs/user-guide/skills/bundled/mlops/mlops-training-trl-fine-tuning), `lm-evaluation-harness` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Hermes Agent Atropos Environments
|
||||
|
||||
Guide for building RL environments in the hermes-agent repo that integrate with the Atropos training framework.
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
Atropos BaseEnv (atroposlib/envs/base.py)
|
||||
└── HermesAgentBaseEnv (environments/hermes_base_env.py)
|
||||
├── Handles agent loop orchestration
|
||||
├── Handles tool resolution per group
|
||||
├── Handles ToolContext for reward verification
|
||||
└── YOUR ENVIRONMENT (environments/your_env.py)
|
||||
Only implements: setup, get_next_item, format_prompt,
|
||||
compute_reward, evaluate, wandb_log
|
||||
```
|
||||
|
||||
Hermes environments are special because they run a **multi-turn agent loop with tool calling** — not just single-turn completions. The base env handles the loop; you implement the task and scoring.
|
||||
|
||||
## File Locations
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `environments/hermes_base_env.py` | Base class with agent loop + tool resolution |
|
||||
| `environments/agent_loop.py` | `HermesAgentLoop` + `AgentResult` dataclass |
|
||||
| `environments/tool_context.py` | `ToolContext` for reward verification |
|
||||
| `environments/tool_call_parsers.py` | Phase 2 tool call parsers (hermes, mistral, etc.) |
|
||||
| `environments/your_env.py` | Your environment implementation |
|
||||
|
||||
## Inference Setup — Ask the User First
|
||||
|
||||
**IMPORTANT:** Before running any test, evaluation, or data generation command, always ask the user how they want to handle inference. Do NOT assume OpenRouter or any specific endpoint. Present these options:
|
||||
|
||||
1. **OpenRouter** — Ask which model they want to use (e.g., `anthropic/claude-sonnet-4.5`, `google/gemini-2.5-pro`, `meta-llama/llama-3.3-70b-instruct`, etc.). Requires `OPENROUTER_API_KEY` in environment.
|
||||
2. **Self-hosted VLLM endpoint** — Ask for their base URL (e.g., `http://localhost:8000/v1`) and model name. Set `--openai.server_type vllm`.
|
||||
3. **Other OpenAI-compatible API** — Ask for the base URL, model name, and any required API key. Set `--openai.server_type openai` and `--openai.health_check false`.
|
||||
4. **Local Atropos training server** — For `serve` mode with a live training loop. Default `http://localhost:8000/v1`.
|
||||
|
||||
Once the user tells you their setup, use those values in all CLI commands for that session. Example prompts:
|
||||
|
||||
> "Before I run this, how would you like to handle inference?
|
||||
> 1. OpenRouter (I'll need your preferred model, e.g. claude-sonnet-4.5)
|
||||
> 2. A self-hosted VLLM endpoint (give me the URL and model name)
|
||||
> 3. Another OpenAI-compatible API (give me the URL, model, and any auth details)
|
||||
> 4. Local Atropos training server (serve mode)"
|
||||
|
||||
### Key flags by provider:
|
||||
|
||||
| Provider | `--openai.server_type` | `--openai.health_check` | `--openai.api_key` |
|
||||
|----------|----------------------|------------------------|-------------------|
|
||||
| OpenRouter | `openai` | `false` | `$OPENROUTER_API_KEY` |
|
||||
| VLLM (self-hosted) | `vllm` | (default) | (not needed) |
|
||||
| Other OpenAI-compatible | `openai` | `false` | As needed |
|
||||
| Local Atropos | (default) | (default) | (not needed) |
|
||||
|
||||
## Required Methods
|
||||
|
||||
### 1. `setup()` — Load dataset and initialize state
|
||||
|
||||
```python
|
||||
async def setup(self) -> None:
|
||||
"""Called once at startup. Load datasets, initialize state."""
|
||||
# Try HuggingFace first, fallback to built-in samples
|
||||
try:
|
||||
from datasets import load_dataset
|
||||
ds = load_dataset("your/dataset", split="test")
|
||||
self._items = [...]
|
||||
except Exception:
|
||||
self._items = BUILTIN_SAMPLES
|
||||
|
||||
# Always split into train/eval
|
||||
random.shuffle(self._items)
|
||||
eval_size = max(20, int(len(self._items) * 0.1))
|
||||
self._eval_items = self._items[:eval_size]
|
||||
self._items = self._items[eval_size:]
|
||||
```
|
||||
|
||||
### 2. `get_next_item()` — Return next training item
|
||||
|
||||
```python
|
||||
async def get_next_item(self) -> dict:
|
||||
"""Return next item, cycling through dataset."""
|
||||
item = self._items[self._index % len(self._items)]
|
||||
self._index += 1
|
||||
return item
|
||||
```
|
||||
|
||||
### 3. `format_prompt(item)` — Convert item to user message
|
||||
|
||||
```python
|
||||
def format_prompt(self, item: dict) -> str:
|
||||
"""Convert a dataset item into the user-facing prompt."""
|
||||
return f"Research this question: {item['question']}"
|
||||
```
|
||||
|
||||
### 4. `compute_reward(item, result, ctx)` — Score the rollout
|
||||
|
||||
**CRITICAL**: `result` is an `AgentResult`, NOT a dict. It has these attributes:
|
||||
- `result.messages` — List of message dicts (OpenAI format)
|
||||
- `result.turns_used` — Number of LLM calls made
|
||||
- `result.finished_naturally` — True if model stopped voluntarily
|
||||
- `result.tool_errors` — List of ToolError objects
|
||||
|
||||
**AgentResult does NOT have**: `final_response`, `tool_calls`, `tools_used`.
|
||||
You must extract these from `result.messages`:
|
||||
|
||||
```python
|
||||
async def compute_reward(self, item, result: AgentResult, ctx: ToolContext) -> float:
|
||||
# Extract final response (last assistant message with content)
|
||||
final_response = ""
|
||||
tools_used = []
|
||||
for msg in reversed(result.messages):
|
||||
if msg.get("role") == "assistant" and msg.get("content") and not final_response:
|
||||
final_response = msg["content"]
|
||||
if msg.get("role") == "assistant" and msg.get("tool_calls"):
|
||||
for tc in msg["tool_calls"]:
|
||||
fn = tc.get("function", {}) if isinstance(tc, dict) else {}
|
||||
name = fn.get("name", "")
|
||||
if name:
|
||||
tools_used.append(name)
|
||||
|
||||
# Score using LLM judge, heuristic, or ToolContext verification
|
||||
correctness = await self._llm_judge(item, final_response)
|
||||
return correctness
|
||||
```
|
||||
|
||||
`ctx` (ToolContext) gives you terminal/file access to the agent's sandbox for verification:
|
||||
```python
|
||||
# Run tests in the agent's sandbox
|
||||
result = ctx.terminal("pytest /workspace/test.py")
|
||||
return 1.0 if result["exit_code"] == 0 else 0.0
|
||||
```
|
||||
|
||||
### 5. `evaluate()` — Periodic evaluation with full agent loop
|
||||
|
||||
**MUST use the full agent loop with tools**, not single-turn chat_completion.
|
||||
The whole point of hermes-agent environments is agentic evaluation:
|
||||
|
||||
```python
|
||||
async def evaluate(self, *args, **kwargs) -> None:
|
||||
import time, uuid
|
||||
from environments.agent_loop import HermesAgentLoop
|
||||
from environments.tool_context import ToolContext
|
||||
|
||||
start_time = time.time()
|
||||
tools, valid_names = self._resolve_tools_for_group()
|
||||
samples = []
|
||||
|
||||
for item in self._eval_items[:self.config.eval_size]:
|
||||
task_id = str(uuid.uuid4())
|
||||
messages = []
|
||||
if self.config.system_prompt:
|
||||
messages.append({"role": "system", "content": self.config.system_prompt})
|
||||
messages.append({"role": "user", "content": self.format_prompt(item)})
|
||||
|
||||
agent = HermesAgentLoop(
|
||||
server=self.server,
|
||||
tool_schemas=tools,
|
||||
valid_tool_names=valid_names,
|
||||
max_turns=self.config.max_agent_turns,
|
||||
task_id=task_id,
|
||||
temperature=0.0, # Deterministic for eval
|
||||
max_tokens=self.config.max_token_length,
|
||||
extra_body=self.config.extra_body,
|
||||
)
|
||||
result = await agent.run(messages)
|
||||
|
||||
ctx = ToolContext(task_id)
|
||||
try:
|
||||
reward = await self.compute_reward(item, result, ctx)
|
||||
finally:
|
||||
ctx.cleanup()
|
||||
|
||||
samples.append({"prompt": ..., "response": ..., "reward": reward})
|
||||
|
||||
eval_metrics = {"eval/mean_reward": ...}
|
||||
await self.evaluate_log(metrics=eval_metrics, samples=samples,
|
||||
start_time=start_time, end_time=time.time())
|
||||
```
|
||||
|
||||
### 6. `wandb_log()` — Custom metrics logging
|
||||
|
||||
Always call `super().wandb_log()` at the end:
|
||||
|
||||
```python
|
||||
async def wandb_log(self, wandb_metrics=None):
|
||||
if wandb_metrics is None:
|
||||
wandb_metrics = {}
|
||||
if self._reward_buffer:
|
||||
n = len(self._reward_buffer)
|
||||
wandb_metrics["train/mean_reward"] = sum(self._reward_buffer) / n
|
||||
self._reward_buffer.clear()
|
||||
await super().wandb_log(wandb_metrics) # MUST call super
|
||||
```
|
||||
|
||||
**Pitfall**: `compute_reward` appends to metric buffers. During eval, this pollutes training metrics. Roll back buffer entries added during eval.
|
||||
|
||||
## Config Class
|
||||
|
||||
Always create a custom config subclass with Pydantic Field descriptors. Key inherited fields you can tune: `enabled_toolsets`, `max_agent_turns`, `agent_temperature`, `system_prompt`, `terminal_backend`, `group_size`, `steps_per_eval`, `total_steps`.
|
||||
|
||||
## config_init() — Default Configuration
|
||||
|
||||
Classmethod returning `(YourEnvConfig, [APIServerConfig(...)])`. Set server_type to "openai" for OpenRouter/external APIs. Load API key from environment variable.
|
||||
|
||||
## Three CLI Modes
|
||||
|
||||
```bash
|
||||
# SERVE — Full training loop (connects to Atropos API server)
|
||||
python environments/my_env.py serve --openai.base_url http://localhost:8000/v1
|
||||
|
||||
# PROCESS — Offline data generation (saves JSONL)
|
||||
python environments/my_env.py process --env.total_steps 10 --env.group_size 1 \
|
||||
--env.use_wandb false --env.data_path_to_save_groups output.jsonl \
|
||||
--openai.base_url "<USER_BASE_URL>" \
|
||||
--openai.model_name "<USER_MODEL>" \
|
||||
--openai.server_type <USER_SERVER_TYPE> --openai.health_check false
|
||||
|
||||
# EVALUATE — Standalone eval (runs setup + evaluate only)
|
||||
python environments/my_env.py evaluate --env.eval_size 20 \
|
||||
--env.data_dir_to_save_evals /tmp/eval_results \
|
||||
--openai.base_url "<USER_BASE_URL>" \
|
||||
--openai.model_name "<USER_MODEL>" \
|
||||
--openai.server_type <USER_SERVER_TYPE> --openai.health_check false
|
||||
```
|
||||
|
||||
Config priority: CLI args > YAML file > config_init() defaults.
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **AgentResult has .messages, not .final_response** — Extract the final response by iterating reversed(result.messages) looking for the last assistant message with content.
|
||||
|
||||
2. **evaluate() must use HermesAgentLoop, not chat_completion** — Single-turn chat_completion has no tools. The whole point of hermes-agent benchmarks is agentic evaluation with tool use.
|
||||
|
||||
3. **Don't call _llm_judge twice** — If compute_reward already calls it, extract the score from the buffer instead of calling judge separately in evaluate().
|
||||
|
||||
4. **Eval pollutes training buffers** — compute_reward appends to metric buffers. During eval, roll back buffer entries to keep training metrics clean.
|
||||
|
||||
5. **Always set health_check=false for OpenRouter** — OpenRouter has no /health endpoint.
|
||||
|
||||
6. **Set data_dir_to_save_evals in evaluate mode** — Without it, results aren't saved.
|
||||
|
||||
7. **default_toolsets class variable vs enabled_toolsets config** — The class variable is a hint; the config field is what actually controls tool resolution.
|
||||
|
||||
8. **Tool call parsing in messages** — Tool calls are dicts with `{"function": {"name": ..., "arguments": ...}}`. Always check `isinstance(tc, dict)`.
|
||||
|
||||
9. **ToolContext.cleanup()** — Always call in a finally block to release sandbox resources.
|
||||
|
||||
10. **server_type must be "openai" for external APIs** — Without it, Atropos assumes a local VLLM server.
|
||||
|
||||
11. **Always ask the user for their inference setup** — Never hardcode or assume a specific provider/model. See the "Inference Setup" section above.
|
||||
|
||||
## Reward Function Patterns
|
||||
|
||||
### LLM Judge (for open-ended tasks)
|
||||
Use `self.server.chat_completion()` with a scoring prompt. Parse JSON response for score float. Always include a heuristic fallback (keyword overlap) for when the judge call fails.
|
||||
|
||||
### Binary Verification (for code/terminal tasks)
|
||||
Use `ctx.terminal("pytest test.py -q")` to run tests in the agent's sandbox. Return 1.0 for pass, 0.0 for fail.
|
||||
|
||||
### Multi-Signal (combine multiple indicators)
|
||||
Weight correctness (0.6) + tool usage (0.2) + efficiency (0.2) + optional bonuses. Clamp to [0, 1].
|
||||
|
||||
## Testing Your Environment
|
||||
|
||||
1. **Import test**: `python -c "from environments.my_env import MyEnv; print('OK')"`
|
||||
2. **Ask the user for inference setup** (see "Inference Setup" section above)
|
||||
3. **Process mode** (1 item): Verify JSONL output has valid tokens, masks, scores
|
||||
4. **Evaluate mode**: Verify full agent loop runs with tools, metrics logged correctly
|
||||
5. **Check reward range**: Scores should be in [0, 1], not all identical
|
||||
|
||||
## Minimum Implementation Checklist
|
||||
|
||||
```python
|
||||
class MyEnv(HermesAgentBaseEnv):
|
||||
name = "my-env"
|
||||
env_config_cls = MyEnvConfig
|
||||
|
||||
@classmethod
|
||||
def config_init(cls): ... # Default server + env config
|
||||
async def setup(self): ... # Load dataset + train/eval split
|
||||
async def get_next_item(self): ... # Cycle through training items
|
||||
def format_prompt(self, item): ... # Item → user message string
|
||||
async def compute_reward(self, item, result, ctx): ... # Score rollout
|
||||
async def evaluate(self, *args, **kwargs): ... # Full agent loop eval
|
||||
async def wandb_log(self, metrics=None): ... # Custom metrics + super()
|
||||
|
||||
if __name__ == "__main__":
|
||||
MyEnv.cli()
|
||||
```
|
||||
|
|
@ -0,0 +1,534 @@
|
|||
---
|
||||
title: "Huggingface Tokenizers — Fast tokenizers optimized for research and production"
|
||||
sidebar_label: "Huggingface Tokenizers"
|
||||
description: "Fast tokenizers optimized for research and production"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Huggingface Tokenizers
|
||||
|
||||
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/huggingface-tokenizers` |
|
||||
| Path | `optional-skills/mlops/huggingface-tokenizers` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `tokenizers`, `transformers`, `datasets` |
|
||||
| Tags | `Tokenization`, `HuggingFace`, `BPE`, `WordPiece`, `Unigram`, `Fast Tokenization`, `Rust`, `Custom Tokenizer`, `Alignment Tracking`, `Production` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# HuggingFace Tokenizers - Fast Tokenization for NLP
|
||||
|
||||
Fast, production-ready tokenizers with Rust performance and Python ease-of-use.
|
||||
|
||||
## When to use HuggingFace Tokenizers
|
||||
|
||||
**Use HuggingFace Tokenizers when:**
|
||||
- Need extremely fast tokenization (<20s per GB of text)
|
||||
- Training custom tokenizers from scratch
|
||||
- Want alignment tracking (token → original text position)
|
||||
- Building production NLP pipelines
|
||||
- Need to tokenize large corpora efficiently
|
||||
|
||||
**Performance**:
|
||||
- **Speed**: <20 seconds to tokenize 1GB on CPU
|
||||
- **Implementation**: Rust core with Python/Node.js bindings
|
||||
- **Efficiency**: 10-100× faster than pure Python implementations
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **SentencePiece**: Language-independent, used by T5/ALBERT
|
||||
- **tiktoken**: OpenAI's BPE tokenizer for GPT models
|
||||
- **transformers AutoTokenizer**: Loading pretrained only (uses this library internally)
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Install tokenizers
|
||||
pip install tokenizers
|
||||
|
||||
# With transformers integration
|
||||
pip install tokenizers transformers
|
||||
```
|
||||
|
||||
### Load pretrained tokenizer
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
|
||||
# Load from HuggingFace Hub
|
||||
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
|
||||
|
||||
# Encode text
|
||||
output = tokenizer.encode("Hello, how are you?")
|
||||
print(output.tokens) # ['hello', ',', 'how', 'are', 'you', '?']
|
||||
print(output.ids) # [7592, 1010, 2129, 2024, 2017, 1029]
|
||||
|
||||
# Decode back
|
||||
text = tokenizer.decode(output.ids)
|
||||
print(text) # "hello, how are you?"
|
||||
```
|
||||
|
||||
### Train custom BPE tokenizer
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from tokenizers.models import BPE
|
||||
from tokenizers.trainers import BpeTrainer
|
||||
from tokenizers.pre_tokenizers import Whitespace
|
||||
|
||||
# Initialize tokenizer with BPE model
|
||||
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
|
||||
tokenizer.pre_tokenizer = Whitespace()
|
||||
|
||||
# Configure trainer
|
||||
trainer = BpeTrainer(
|
||||
vocab_size=30000,
|
||||
special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
|
||||
min_frequency=2
|
||||
)
|
||||
|
||||
# Train on files
|
||||
files = ["train.txt", "validation.txt"]
|
||||
tokenizer.train(files, trainer)
|
||||
|
||||
# Save
|
||||
tokenizer.save("my-tokenizer.json")
|
||||
```
|
||||
|
||||
**Training time**: ~1-2 minutes for 100MB corpus, ~10-20 minutes for 1GB
|
||||
|
||||
### Batch encoding with padding
|
||||
|
||||
```python
|
||||
# Enable padding
|
||||
tokenizer.enable_padding(pad_id=3, pad_token="[PAD]")
|
||||
|
||||
# Encode batch
|
||||
texts = ["Hello world", "This is a longer sentence"]
|
||||
encodings = tokenizer.encode_batch(texts)
|
||||
|
||||
for encoding in encodings:
|
||||
print(encoding.ids)
|
||||
# [101, 7592, 2088, 102, 3, 3, 3]
|
||||
# [101, 2023, 2003, 1037, 2936, 6251, 102]
|
||||
```
|
||||
|
||||
## Tokenization algorithms
|
||||
|
||||
### BPE (Byte-Pair Encoding)
|
||||
|
||||
**How it works**:
|
||||
1. Start with character-level vocabulary
|
||||
2. Find most frequent character pair
|
||||
3. Merge into new token, add to vocabulary
|
||||
4. Repeat until vocabulary size reached
|
||||
|
||||
**Used by**: GPT-2, GPT-3, RoBERTa, BART, DeBERTa
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from tokenizers.models import BPE
|
||||
from tokenizers.trainers import BpeTrainer
|
||||
from tokenizers.pre_tokenizers import ByteLevel
|
||||
|
||||
tokenizer = Tokenizer(BPE(unk_token="<|endoftext|>"))
|
||||
tokenizer.pre_tokenizer = ByteLevel()
|
||||
|
||||
trainer = BpeTrainer(
|
||||
vocab_size=50257,
|
||||
special_tokens=["<|endoftext|>"],
|
||||
min_frequency=2
|
||||
)
|
||||
|
||||
tokenizer.train(files=["data.txt"], trainer=trainer)
|
||||
```
|
||||
|
||||
**Advantages**:
|
||||
- Handles OOV words well (breaks into subwords)
|
||||
- Flexible vocabulary size
|
||||
- Good for morphologically rich languages
|
||||
|
||||
**Trade-offs**:
|
||||
- Tokenization depends on merge order
|
||||
- May split common words unexpectedly
|
||||
|
||||
### WordPiece
|
||||
|
||||
**How it works**:
|
||||
1. Start with character vocabulary
|
||||
2. Score merge pairs: `frequency(pair) / (frequency(first) × frequency(second))`
|
||||
3. Merge highest scoring pair
|
||||
4. Repeat until vocabulary size reached
|
||||
|
||||
**Used by**: BERT, DistilBERT, MobileBERT
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from tokenizers.models import WordPiece
|
||||
from tokenizers.trainers import WordPieceTrainer
|
||||
from tokenizers.pre_tokenizers import Whitespace
|
||||
from tokenizers.normalizers import BertNormalizer
|
||||
|
||||
tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
|
||||
tokenizer.normalizer = BertNormalizer(lowercase=True)
|
||||
tokenizer.pre_tokenizer = Whitespace()
|
||||
|
||||
trainer = WordPieceTrainer(
|
||||
vocab_size=30522,
|
||||
special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
|
||||
continuing_subword_prefix="##"
|
||||
)
|
||||
|
||||
tokenizer.train(files=["corpus.txt"], trainer=trainer)
|
||||
```
|
||||
|
||||
**Advantages**:
|
||||
- Prioritizes meaningful merges (high score = semantically related)
|
||||
- Used successfully in BERT (state-of-the-art results)
|
||||
|
||||
**Trade-offs**:
|
||||
- Unknown words become `[UNK]` if no subword match
|
||||
- Saves vocabulary, not merge rules (larger files)
|
||||
|
||||
### Unigram
|
||||
|
||||
**How it works**:
|
||||
1. Start with large vocabulary (all substrings)
|
||||
2. Compute loss for corpus with current vocabulary
|
||||
3. Remove tokens with minimal impact on loss
|
||||
4. Repeat until vocabulary size reached
|
||||
|
||||
**Used by**: ALBERT, T5, mBART, XLNet (via SentencePiece)
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from tokenizers.models import Unigram
|
||||
from tokenizers.trainers import UnigramTrainer
|
||||
|
||||
tokenizer = Tokenizer(Unigram())
|
||||
|
||||
trainer = UnigramTrainer(
|
||||
vocab_size=8000,
|
||||
special_tokens=["<unk>", "<s>", "</s>"],
|
||||
unk_token="<unk>"
|
||||
)
|
||||
|
||||
tokenizer.train(files=["data.txt"], trainer=trainer)
|
||||
```
|
||||
|
||||
**Advantages**:
|
||||
- Probabilistic (finds most likely tokenization)
|
||||
- Works well for languages without word boundaries
|
||||
- Handles diverse linguistic contexts
|
||||
|
||||
**Trade-offs**:
|
||||
- Computationally expensive to train
|
||||
- More hyperparameters to tune
|
||||
|
||||
## Tokenization pipeline
|
||||
|
||||
Complete pipeline: **Normalization → Pre-tokenization → Model → Post-processing**
|
||||
|
||||
### Normalization
|
||||
|
||||
Clean and standardize text:
|
||||
|
||||
```python
|
||||
from tokenizers.normalizers import NFD, StripAccents, Lowercase, Sequence
|
||||
|
||||
tokenizer.normalizer = Sequence([
|
||||
NFD(), # Unicode normalization (decompose)
|
||||
Lowercase(), # Convert to lowercase
|
||||
StripAccents() # Remove accents
|
||||
])
|
||||
|
||||
# Input: "Héllo WORLD"
|
||||
# After normalization: "hello world"
|
||||
```
|
||||
|
||||
**Common normalizers**:
|
||||
- `NFD`, `NFC`, `NFKD`, `NFKC` - Unicode normalization forms
|
||||
- `Lowercase()` - Convert to lowercase
|
||||
- `StripAccents()` - Remove accents (é → e)
|
||||
- `Strip()` - Remove whitespace
|
||||
- `Replace(pattern, content)` - Regex replacement
|
||||
|
||||
### Pre-tokenization
|
||||
|
||||
Split text into word-like units:
|
||||
|
||||
```python
|
||||
from tokenizers.pre_tokenizers import Whitespace, Punctuation, Sequence, ByteLevel
|
||||
|
||||
# Split on whitespace and punctuation
|
||||
tokenizer.pre_tokenizer = Sequence([
|
||||
Whitespace(),
|
||||
Punctuation()
|
||||
])
|
||||
|
||||
# Input: "Hello, world!"
|
||||
# After pre-tokenization: ["Hello", ",", "world", "!"]
|
||||
```
|
||||
|
||||
**Common pre-tokenizers**:
|
||||
- `Whitespace()` - Split on spaces, tabs, newlines
|
||||
- `ByteLevel()` - GPT-2 style byte-level splitting
|
||||
- `Punctuation()` - Isolate punctuation
|
||||
- `Digits(individual_digits=True)` - Split digits individually
|
||||
- `Metaspace()` - Replace spaces with ▁ (SentencePiece style)
|
||||
|
||||
### Post-processing
|
||||
|
||||
Add special tokens for model input:
|
||||
|
||||
```python
|
||||
from tokenizers.processors import TemplateProcessing
|
||||
|
||||
# BERT-style: [CLS] sentence [SEP]
|
||||
tokenizer.post_processor = TemplateProcessing(
|
||||
single="[CLS] $A [SEP]",
|
||||
pair="[CLS] $A [SEP] $B [SEP]",
|
||||
special_tokens=[
|
||||
("[CLS]", 1),
|
||||
("[SEP]", 2),
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
**Common patterns**:
|
||||
```python
|
||||
# GPT-2: sentence <|endoftext|>
|
||||
TemplateProcessing(
|
||||
single="$A <|endoftext|>",
|
||||
special_tokens=[("<|endoftext|>", 50256)]
|
||||
)
|
||||
|
||||
# RoBERTa: <s> sentence </s>
|
||||
TemplateProcessing(
|
||||
single="<s> $A </s>",
|
||||
pair="<s> $A </s> </s> $B </s>",
|
||||
special_tokens=[("<s>", 0), ("</s>", 2)]
|
||||
)
|
||||
```
|
||||
|
||||
## Alignment tracking
|
||||
|
||||
Track token positions in original text:
|
||||
|
||||
```python
|
||||
output = tokenizer.encode("Hello, world!")
|
||||
|
||||
# Get token offsets
|
||||
for token, offset in zip(output.tokens, output.offsets):
|
||||
start, end = offset
|
||||
print(f"{token:10} → [{start:2}, {end:2}): {text[start:end]!r}")
|
||||
|
||||
# Output:
|
||||
# hello → [ 0, 5): 'Hello'
|
||||
# , → [ 5, 6): ','
|
||||
# world → [ 7, 12): 'world'
|
||||
# ! → [12, 13): '!'
|
||||
```
|
||||
|
||||
**Use cases**:
|
||||
- Named entity recognition (map predictions back to text)
|
||||
- Question answering (extract answer spans)
|
||||
- Token classification (align labels to original positions)
|
||||
|
||||
## Integration with transformers
|
||||
|
||||
### Load with AutoTokenizer
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
|
||||
# AutoTokenizer automatically uses fast tokenizers
|
||||
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
|
||||
|
||||
# Check if using fast tokenizer
|
||||
print(tokenizer.is_fast) # True
|
||||
|
||||
# Access underlying tokenizers.Tokenizer
|
||||
fast_tokenizer = tokenizer.backend_tokenizer
|
||||
print(type(fast_tokenizer)) # <class 'tokenizers.Tokenizer'>
|
||||
```
|
||||
|
||||
### Convert custom tokenizer to transformers
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from transformers import PreTrainedTokenizerFast
|
||||
|
||||
# Train custom tokenizer
|
||||
tokenizer = Tokenizer(BPE())
|
||||
# ... train tokenizer ...
|
||||
tokenizer.save("my-tokenizer.json")
|
||||
|
||||
# Wrap for transformers
|
||||
transformers_tokenizer = PreTrainedTokenizerFast(
|
||||
tokenizer_file="my-tokenizer.json",
|
||||
unk_token="[UNK]",
|
||||
pad_token="[PAD]",
|
||||
cls_token="[CLS]",
|
||||
sep_token="[SEP]",
|
||||
mask_token="[MASK]"
|
||||
)
|
||||
|
||||
# Use like any transformers tokenizer
|
||||
outputs = transformers_tokenizer(
|
||||
"Hello world",
|
||||
padding=True,
|
||||
truncation=True,
|
||||
max_length=512,
|
||||
return_tensors="pt"
|
||||
)
|
||||
```
|
||||
|
||||
## Common patterns
|
||||
|
||||
### Train from iterator (large datasets)
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
|
||||
# Load dataset
|
||||
dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
|
||||
|
||||
# Create batch iterator
|
||||
def batch_iterator(batch_size=1000):
|
||||
for i in range(0, len(dataset), batch_size):
|
||||
yield dataset[i:i + batch_size]["text"]
|
||||
|
||||
# Train tokenizer
|
||||
tokenizer.train_from_iterator(
|
||||
batch_iterator(),
|
||||
trainer=trainer,
|
||||
length=len(dataset) # For progress bar
|
||||
)
|
||||
```
|
||||
|
||||
**Performance**: Processes 1GB in ~10-20 minutes
|
||||
|
||||
### Enable truncation and padding
|
||||
|
||||
```python
|
||||
# Enable truncation
|
||||
tokenizer.enable_truncation(max_length=512)
|
||||
|
||||
# Enable padding
|
||||
tokenizer.enable_padding(
|
||||
pad_id=tokenizer.token_to_id("[PAD]"),
|
||||
pad_token="[PAD]",
|
||||
length=512 # Fixed length, or None for batch max
|
||||
)
|
||||
|
||||
# Encode with both
|
||||
output = tokenizer.encode("This is a long sentence that will be truncated...")
|
||||
print(len(output.ids)) # 512
|
||||
```
|
||||
|
||||
### Multi-processing
|
||||
|
||||
```python
|
||||
from tokenizers import Tokenizer
|
||||
from multiprocessing import Pool
|
||||
|
||||
# Load tokenizer
|
||||
tokenizer = Tokenizer.from_file("tokenizer.json")
|
||||
|
||||
def encode_batch(texts):
|
||||
return tokenizer.encode_batch(texts)
|
||||
|
||||
# Process large corpus in parallel
|
||||
with Pool(8) as pool:
|
||||
# Split corpus into chunks
|
||||
chunk_size = 1000
|
||||
chunks = [corpus[i:i+chunk_size] for i in range(0, len(corpus), chunk_size)]
|
||||
|
||||
# Encode in parallel
|
||||
results = pool.map(encode_batch, chunks)
|
||||
```
|
||||
|
||||
**Speedup**: 5-8× with 8 cores
|
||||
|
||||
## Performance benchmarks
|
||||
|
||||
### Training speed
|
||||
|
||||
| Corpus Size | BPE (30k vocab) | WordPiece (30k) | Unigram (8k) |
|
||||
|-------------|-----------------|-----------------|--------------|
|
||||
| 10 MB | 15 sec | 18 sec | 25 sec |
|
||||
| 100 MB | 1.5 min | 2 min | 4 min |
|
||||
| 1 GB | 15 min | 20 min | 40 min |
|
||||
|
||||
**Hardware**: 16-core CPU, tested on English Wikipedia
|
||||
|
||||
### Tokenization speed
|
||||
|
||||
| Implementation | 1 GB corpus | Throughput |
|
||||
|----------------|-------------|---------------|
|
||||
| Pure Python | ~20 minutes | ~50 MB/min |
|
||||
| HF Tokenizers | ~15 seconds | ~4 GB/min |
|
||||
| **Speedup** | **80×** | **80×** |
|
||||
|
||||
**Test**: English text, average sentence length 20 words
|
||||
|
||||
### Memory usage
|
||||
|
||||
| Task | Memory |
|
||||
|-------------------------|---------|
|
||||
| Load tokenizer | ~10 MB |
|
||||
| Train BPE (30k vocab) | ~200 MB |
|
||||
| Encode 1M sentences | ~500 MB |
|
||||
|
||||
## Supported models
|
||||
|
||||
Pre-trained tokenizers available via `from_pretrained()`:
|
||||
|
||||
**BERT family**:
|
||||
- `bert-base-uncased`, `bert-large-cased`
|
||||
- `distilbert-base-uncased`
|
||||
- `roberta-base`, `roberta-large`
|
||||
|
||||
**GPT family**:
|
||||
- `gpt2`, `gpt2-medium`, `gpt2-large`
|
||||
- `distilgpt2`
|
||||
|
||||
**T5 family**:
|
||||
- `t5-small`, `t5-base`, `t5-large`
|
||||
- `google/flan-t5-xxl`
|
||||
|
||||
**Other**:
|
||||
- `facebook/bart-base`, `facebook/mbart-large-cc25`
|
||||
- `albert-base-v2`, `albert-xlarge-v2`
|
||||
- `xlm-roberta-base`, `xlm-roberta-large`
|
||||
|
||||
Browse all: https://huggingface.co/models?library=tokenizers
|
||||
|
||||
## References
|
||||
|
||||
- **[Training Guide](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/huggingface-tokenizers/references/training.md)** - Train custom tokenizers, configure trainers, handle large datasets
|
||||
- **[Algorithms Deep Dive](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/huggingface-tokenizers/references/algorithms.md)** - BPE, WordPiece, Unigram explained in detail
|
||||
- **[Pipeline Components](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/huggingface-tokenizers/references/pipeline.md)** - Normalizers, pre-tokenizers, post-processors, decoders
|
||||
- **[Transformers Integration](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/huggingface-tokenizers/references/integration.md)** - AutoTokenizer, PreTrainedTokenizerFast, special tokens
|
||||
|
||||
## Resources
|
||||
|
||||
- **Docs**: https://huggingface.co/docs/tokenizers
|
||||
- **GitHub**: https://github.com/huggingface/tokenizers ⭐ 9,000+
|
||||
- **Version**: 0.20.0+
|
||||
- **Course**: https://huggingface.co/learn/nlp-course/chapter6/1
|
||||
- **Paper**: BPE (Sennrich et al., 2016), WordPiece (Schuster & Nakajima, 2012)
|
||||
|
|
@ -0,0 +1,758 @@
|
|||
---
|
||||
title: "Instructor"
|
||||
sidebar_label: "Instructor"
|
||||
description: "Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream ..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Instructor
|
||||
|
||||
Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - battle-tested structured output library
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/instructor` |
|
||||
| Path | `optional-skills/mlops/instructor` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `instructor`, `pydantic`, `openai`, `anthropic` |
|
||||
| Tags | `Prompt Engineering`, `Instructor`, `Structured Output`, `Pydantic`, `Data Extraction`, `JSON Parsing`, `Type Safety`, `Validation`, `Streaming`, `OpenAI`, `Anthropic` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Instructor: Structured LLM Outputs
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use Instructor when you need to:
|
||||
- **Extract structured data** from LLM responses reliably
|
||||
- **Validate outputs** against Pydantic schemas automatically
|
||||
- **Retry failed extractions** with automatic error handling
|
||||
- **Parse complex JSON** with type safety and validation
|
||||
- **Stream partial results** for real-time processing
|
||||
- **Support multiple LLM providers** with consistent API
|
||||
|
||||
**GitHub Stars**: 15,000+ | **Battle-tested**: 100,000+ developers
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Base installation
|
||||
pip install instructor
|
||||
|
||||
# With specific providers
|
||||
pip install "instructor[anthropic]" # Anthropic Claude
|
||||
pip install "instructor[openai]" # OpenAI
|
||||
pip install "instructor[all]" # All providers
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Basic Example: Extract User Data
|
||||
|
||||
```python
|
||||
import instructor
|
||||
from pydantic import BaseModel
|
||||
from anthropic import Anthropic
|
||||
|
||||
# Define output structure
|
||||
class User(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
email: str
|
||||
|
||||
# Create instructor client
|
||||
client = instructor.from_anthropic(Anthropic())
|
||||
|
||||
# Extract structured data
|
||||
user = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "John Doe is 30 years old. His email is john@example.com"
|
||||
}],
|
||||
response_model=User
|
||||
)
|
||||
|
||||
print(user.name) # "John Doe"
|
||||
print(user.age) # 30
|
||||
print(user.email) # "john@example.com"
|
||||
```
|
||||
|
||||
### With OpenAI
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = instructor.from_openai(OpenAI())
|
||||
|
||||
user = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
response_model=User,
|
||||
messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
|
||||
)
|
||||
```
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### 1. Response Models (Pydantic)
|
||||
|
||||
Response models define the structure and validation rules for LLM outputs.
|
||||
|
||||
#### Basic Model
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class Article(BaseModel):
|
||||
title: str = Field(description="Article title")
|
||||
author: str = Field(description="Author name")
|
||||
word_count: int = Field(description="Number of words", gt=0)
|
||||
tags: list[str] = Field(description="List of relevant tags")
|
||||
|
||||
article = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Analyze this article: [article text]"
|
||||
}],
|
||||
response_model=Article
|
||||
)
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Type safety with Python type hints
|
||||
- Automatic validation (word_count > 0)
|
||||
- Self-documenting with Field descriptions
|
||||
- IDE autocomplete support
|
||||
|
||||
#### Nested Models
|
||||
|
||||
```python
|
||||
class Address(BaseModel):
|
||||
street: str
|
||||
city: str
|
||||
country: str
|
||||
|
||||
class Person(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
address: Address # Nested model
|
||||
|
||||
person = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "John lives at 123 Main St, Boston, USA"
|
||||
}],
|
||||
response_model=Person
|
||||
)
|
||||
|
||||
print(person.address.city) # "Boston"
|
||||
```
|
||||
|
||||
#### Optional Fields
|
||||
|
||||
```python
|
||||
from typing import Optional
|
||||
|
||||
class Product(BaseModel):
|
||||
name: str
|
||||
price: float
|
||||
discount: Optional[float] = None # Optional
|
||||
description: str = Field(default="No description") # Default value
|
||||
|
||||
# LLM doesn't need to provide discount or description
|
||||
```
|
||||
|
||||
#### Enums for Constraints
|
||||
|
||||
```python
|
||||
from enum import Enum
|
||||
|
||||
class Sentiment(str, Enum):
|
||||
POSITIVE = "positive"
|
||||
NEGATIVE = "negative"
|
||||
NEUTRAL = "neutral"
|
||||
|
||||
class Review(BaseModel):
|
||||
text: str
|
||||
sentiment: Sentiment # Only these 3 values allowed
|
||||
|
||||
review = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "This product is amazing!"
|
||||
}],
|
||||
response_model=Review
|
||||
)
|
||||
|
||||
print(review.sentiment) # Sentiment.POSITIVE
|
||||
```
|
||||
|
||||
### 2. Validation
|
||||
|
||||
Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.
|
||||
|
||||
#### Built-in Validators
|
||||
|
||||
```python
|
||||
from pydantic import Field, EmailStr, HttpUrl
|
||||
|
||||
class Contact(BaseModel):
|
||||
name: str = Field(min_length=2, max_length=100)
|
||||
age: int = Field(ge=0, le=120) # 0 <= age <= 120
|
||||
email: EmailStr # Validates email format
|
||||
website: HttpUrl # Validates URL format
|
||||
|
||||
# If LLM provides invalid data, Instructor retries automatically
|
||||
```
|
||||
|
||||
#### Custom Validators
|
||||
|
||||
```python
|
||||
from pydantic import field_validator
|
||||
|
||||
class Event(BaseModel):
|
||||
name: str
|
||||
date: str
|
||||
attendees: int
|
||||
|
||||
@field_validator('date')
|
||||
def validate_date(cls, v):
|
||||
"""Ensure date is in YYYY-MM-DD format."""
|
||||
import re
|
||||
if not re.match(r'\d{4}-\d{2}-\d{2}', v):
|
||||
raise ValueError('Date must be YYYY-MM-DD format')
|
||||
return v
|
||||
|
||||
@field_validator('attendees')
|
||||
def validate_attendees(cls, v):
|
||||
"""Ensure positive attendees."""
|
||||
if v < 1:
|
||||
raise ValueError('Must have at least 1 attendee')
|
||||
return v
|
||||
```
|
||||
|
||||
#### Model-Level Validation
|
||||
|
||||
```python
|
||||
from pydantic import model_validator
|
||||
|
||||
class DateRange(BaseModel):
|
||||
start_date: str
|
||||
end_date: str
|
||||
|
||||
@model_validator(mode='after')
|
||||
def check_dates(self):
|
||||
"""Ensure end_date is after start_date."""
|
||||
from datetime import datetime
|
||||
start = datetime.strptime(self.start_date, '%Y-%m-%d')
|
||||
end = datetime.strptime(self.end_date, '%Y-%m-%d')
|
||||
|
||||
if end < start:
|
||||
raise ValueError('end_date must be after start_date')
|
||||
return self
|
||||
```
|
||||
|
||||
### 3. Automatic Retrying
|
||||
|
||||
Instructor retries automatically when validation fails, providing error feedback to the LLM.
|
||||
|
||||
```python
|
||||
# Retries up to 3 times if validation fails
|
||||
user = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Extract user from: John, age unknown"
|
||||
}],
|
||||
response_model=User,
|
||||
max_retries=3 # Default is 3
|
||||
)
|
||||
|
||||
# If age can't be extracted, Instructor tells the LLM:
|
||||
# "Validation error: age - field required"
|
||||
# LLM tries again with better extraction
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
1. LLM generates output
|
||||
2. Pydantic validates
|
||||
3. If invalid: Error message sent back to LLM
|
||||
4. LLM tries again with error feedback
|
||||
5. Repeats up to max_retries
|
||||
|
||||
### 4. Streaming
|
||||
|
||||
Stream partial results for real-time processing.
|
||||
|
||||
#### Streaming Partial Objects
|
||||
|
||||
```python
|
||||
from instructor import Partial
|
||||
|
||||
class Story(BaseModel):
|
||||
title: str
|
||||
content: str
|
||||
tags: list[str]
|
||||
|
||||
# Stream partial updates as LLM generates
|
||||
for partial_story in client.messages.create_partial(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Write a short sci-fi story"
|
||||
}],
|
||||
response_model=Story
|
||||
):
|
||||
print(f"Title: {partial_story.title}")
|
||||
print(f"Content so far: {partial_story.content[:100]}...")
|
||||
# Update UI in real-time
|
||||
```
|
||||
|
||||
#### Streaming Iterables
|
||||
|
||||
```python
|
||||
class Task(BaseModel):
|
||||
title: str
|
||||
priority: str
|
||||
|
||||
# Stream list items as they're generated
|
||||
tasks = client.messages.create_iterable(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Generate 10 project tasks"
|
||||
}],
|
||||
response_model=Task
|
||||
)
|
||||
|
||||
for task in tasks:
|
||||
print(f"- {task.title} ({task.priority})")
|
||||
# Process each task as it arrives
|
||||
```
|
||||
|
||||
## Provider Configuration
|
||||
|
||||
### Anthropic Claude
|
||||
|
||||
```python
|
||||
import instructor
|
||||
from anthropic import Anthropic
|
||||
|
||||
client = instructor.from_anthropic(
|
||||
Anthropic(api_key="your-api-key")
|
||||
)
|
||||
|
||||
# Use with Claude models
|
||||
response = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[...],
|
||||
response_model=YourModel
|
||||
)
|
||||
```
|
||||
|
||||
### OpenAI
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = instructor.from_openai(
|
||||
OpenAI(api_key="your-api-key")
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
response_model=YourModel,
|
||||
messages=[...]
|
||||
)
|
||||
```
|
||||
|
||||
### Local Models (Ollama)
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
# Point to local Ollama server
|
||||
client = instructor.from_openai(
|
||||
OpenAI(
|
||||
base_url="http://localhost:11434/v1",
|
||||
api_key="ollama" # Required but ignored
|
||||
),
|
||||
mode=instructor.Mode.JSON
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="llama3.1",
|
||||
response_model=YourModel,
|
||||
messages=[...]
|
||||
)
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: Data Extraction from Text
|
||||
|
||||
```python
|
||||
class CompanyInfo(BaseModel):
|
||||
name: str
|
||||
founded_year: int
|
||||
industry: str
|
||||
employees: int
|
||||
headquarters: str
|
||||
|
||||
text = """
|
||||
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
|
||||
industry with approximately 140,000 employees. The company is headquartered
|
||||
in Austin, Texas.
|
||||
"""
|
||||
|
||||
company = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": f"Extract company information from: {text}"
|
||||
}],
|
||||
response_model=CompanyInfo
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Classification
|
||||
|
||||
```python
|
||||
class Category(str, Enum):
|
||||
TECHNOLOGY = "technology"
|
||||
FINANCE = "finance"
|
||||
HEALTHCARE = "healthcare"
|
||||
EDUCATION = "education"
|
||||
OTHER = "other"
|
||||
|
||||
class ArticleClassification(BaseModel):
|
||||
category: Category
|
||||
confidence: float = Field(ge=0.0, le=1.0)
|
||||
keywords: list[str]
|
||||
|
||||
classification = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Classify this article: [article text]"
|
||||
}],
|
||||
response_model=ArticleClassification
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Multi-Entity Extraction
|
||||
|
||||
```python
|
||||
class Person(BaseModel):
|
||||
name: str
|
||||
role: str
|
||||
|
||||
class Organization(BaseModel):
|
||||
name: str
|
||||
industry: str
|
||||
|
||||
class Entities(BaseModel):
|
||||
people: list[Person]
|
||||
organizations: list[Organization]
|
||||
locations: list[str]
|
||||
|
||||
text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."
|
||||
|
||||
entities = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": f"Extract all entities from: {text}"
|
||||
}],
|
||||
response_model=Entities
|
||||
)
|
||||
|
||||
for person in entities.people:
|
||||
print(f"{person.name} - {person.role}")
|
||||
```
|
||||
|
||||
### Pattern 4: Structured Analysis
|
||||
|
||||
```python
|
||||
class SentimentAnalysis(BaseModel):
|
||||
overall_sentiment: Sentiment
|
||||
positive_aspects: list[str]
|
||||
negative_aspects: list[str]
|
||||
suggestions: list[str]
|
||||
score: float = Field(ge=-1.0, le=1.0)
|
||||
|
||||
review = "The product works well but setup was confusing..."
|
||||
|
||||
analysis = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": f"Analyze this review: {review}"
|
||||
}],
|
||||
response_model=SentimentAnalysis
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 5: Batch Processing
|
||||
|
||||
```python
|
||||
def extract_person(text: str) -> Person:
|
||||
return client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": f"Extract person from: {text}"
|
||||
}],
|
||||
response_model=Person
|
||||
)
|
||||
|
||||
texts = [
|
||||
"John Doe is a 30-year-old engineer",
|
||||
"Jane Smith, 25, works in marketing",
|
||||
"Bob Johnson, age 40, software developer"
|
||||
]
|
||||
|
||||
people = [extract_person(text) for text in texts]
|
||||
```
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Union Types
|
||||
|
||||
```python
|
||||
from typing import Union
|
||||
|
||||
class TextContent(BaseModel):
|
||||
type: str = "text"
|
||||
content: str
|
||||
|
||||
class ImageContent(BaseModel):
|
||||
type: str = "image"
|
||||
url: HttpUrl
|
||||
caption: str
|
||||
|
||||
class Post(BaseModel):
|
||||
title: str
|
||||
content: Union[TextContent, ImageContent] # Either type
|
||||
|
||||
# LLM chooses appropriate type based on content
|
||||
```
|
||||
|
||||
### Dynamic Models
|
||||
|
||||
```python
|
||||
from pydantic import create_model
|
||||
|
||||
# Create model at runtime
|
||||
DynamicUser = create_model(
|
||||
'User',
|
||||
name=(str, ...),
|
||||
age=(int, Field(ge=0)),
|
||||
email=(EmailStr, ...)
|
||||
)
|
||||
|
||||
user = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[...],
|
||||
response_model=DynamicUser
|
||||
)
|
||||
```
|
||||
|
||||
### Custom Modes
|
||||
|
||||
```python
|
||||
# For providers without native structured outputs
|
||||
client = instructor.from_anthropic(
|
||||
Anthropic(),
|
||||
mode=instructor.Mode.JSON # JSON mode
|
||||
)
|
||||
|
||||
# Available modes:
|
||||
# - Mode.ANTHROPIC_TOOLS (recommended for Claude)
|
||||
# - Mode.JSON (fallback)
|
||||
# - Mode.TOOLS (OpenAI tools)
|
||||
```
|
||||
|
||||
### Context Management
|
||||
|
||||
```python
|
||||
# Single-use client
|
||||
with instructor.from_anthropic(Anthropic()) as client:
|
||||
result = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[...],
|
||||
response_model=YourModel
|
||||
)
|
||||
# Client closed automatically
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Handling Validation Errors
|
||||
|
||||
```python
|
||||
from pydantic import ValidationError
|
||||
|
||||
try:
|
||||
user = client.messages.create(
|
||||
model="claude-sonnet-4-5-20250929",
|
||||
max_tokens=1024,
|
||||
messages=[...],
|
||||
response_model=User,
|
||||
max_retries=3
|
||||
)
|
||||
except ValidationError as e:
|
||||
print(f"Failed after retries: {e}")
|
||||
# Handle gracefully
|
||||
|
||||
except Exception as e:
|
||||
print(f"API error: {e}")
|
||||
```
|
||||
|
||||
### Custom Error Messages
|
||||
|
||||
```python
|
||||
class ValidatedUser(BaseModel):
|
||||
name: str = Field(description="Full name, 2-100 characters")
|
||||
age: int = Field(description="Age between 0 and 120", ge=0, le=120)
|
||||
email: EmailStr = Field(description="Valid email address")
|
||||
|
||||
class Config:
|
||||
# Custom error messages
|
||||
json_schema_extra = {
|
||||
"examples": [
|
||||
{
|
||||
"name": "John Doe",
|
||||
"age": 30,
|
||||
"email": "john@example.com"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Clear Field Descriptions
|
||||
|
||||
```python
|
||||
# ❌ Bad: Vague
|
||||
class Product(BaseModel):
|
||||
name: str
|
||||
price: float
|
||||
|
||||
# ✅ Good: Descriptive
|
||||
class Product(BaseModel):
|
||||
name: str = Field(description="Product name from the text")
|
||||
price: float = Field(description="Price in USD, without currency symbol")
|
||||
```
|
||||
|
||||
### 2. Use Appropriate Validation
|
||||
|
||||
```python
|
||||
# ✅ Good: Constrain values
|
||||
class Rating(BaseModel):
|
||||
score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
|
||||
review: str = Field(min_length=10, description="Review text, at least 10 chars")
|
||||
```
|
||||
|
||||
### 3. Provide Examples in Prompts
|
||||
|
||||
```python
|
||||
messages = [{
|
||||
"role": "user",
|
||||
"content": """Extract person info from: "John, 30, engineer"
|
||||
|
||||
Example format:
|
||||
{
|
||||
"name": "John Doe",
|
||||
"age": 30,
|
||||
"occupation": "engineer"
|
||||
}"""
|
||||
}]
|
||||
```
|
||||
|
||||
### 4. Use Enums for Fixed Categories
|
||||
|
||||
```python
|
||||
# ✅ Good: Enum ensures valid values
|
||||
class Status(str, Enum):
|
||||
PENDING = "pending"
|
||||
APPROVED = "approved"
|
||||
REJECTED = "rejected"
|
||||
|
||||
class Application(BaseModel):
|
||||
status: Status # LLM must choose from enum
|
||||
```
|
||||
|
||||
### 5. Handle Missing Data Gracefully
|
||||
|
||||
```python
|
||||
class PartialData(BaseModel):
|
||||
required_field: str
|
||||
optional_field: Optional[str] = None
|
||||
default_field: str = "default_value"
|
||||
|
||||
# LLM only needs to provide required_field
|
||||
```
|
||||
|
||||
## Comparison to Alternatives
|
||||
|
||||
| Feature | Instructor | Manual JSON | LangChain | DSPy |
|
||||
|---------|------------|-------------|-----------|------|
|
||||
| Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
|
||||
| Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
|
||||
| Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
|
||||
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
|
||||
| Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
|
||||
| Learning Curve | Low | Low | Medium | High |
|
||||
|
||||
**When to choose Instructor:**
|
||||
- Need structured, validated outputs
|
||||
- Want type safety and IDE support
|
||||
- Require automatic retries
|
||||
- Building data extraction systems
|
||||
|
||||
**When to choose alternatives:**
|
||||
- DSPy: Need prompt optimization
|
||||
- LangChain: Building complex chains
|
||||
- Manual: Simple, one-off extractions
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://python.useinstructor.com
|
||||
- **GitHub**: https://github.com/jxnl/instructor (15k+ stars)
|
||||
- **Cookbook**: https://python.useinstructor.com/examples
|
||||
- **Discord**: Community support available
|
||||
|
||||
## See Also
|
||||
|
||||
- `references/validation.md` - Advanced validation patterns
|
||||
- `references/providers.md` - Provider-specific configuration
|
||||
- `references/examples.md` - Real-world use cases
|
||||
|
|
@ -0,0 +1,565 @@
|
|||
---
|
||||
title: "Lambda Labs Gpu Cloud — Reserved and on-demand GPU cloud instances for ML training and inference"
|
||||
sidebar_label: "Lambda Labs Gpu Cloud"
|
||||
description: "Reserved and on-demand GPU cloud instances for ML training and inference"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Lambda Labs Gpu Cloud
|
||||
|
||||
Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/lambda-labs` |
|
||||
| Path | `optional-skills/mlops/lambda-labs` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `lambda-cloud-client>=1.0.0` |
|
||||
| Tags | `Infrastructure`, `GPU Cloud`, `Training`, `Inference`, `Lambda Labs` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Lambda Labs GPU Cloud
|
||||
|
||||
Comprehensive guide to running ML workloads on Lambda Labs GPU cloud with on-demand instances and 1-Click Clusters.
|
||||
|
||||
## When to use Lambda Labs
|
||||
|
||||
**Use Lambda Labs when:**
|
||||
- Need dedicated GPU instances with full SSH access
|
||||
- Running long training jobs (hours to days)
|
||||
- Want simple pricing with no egress fees
|
||||
- Need persistent storage across sessions
|
||||
- Require high-performance multi-node clusters (16-512 GPUs)
|
||||
- Want pre-installed ML stack (Lambda Stack with PyTorch, CUDA, NCCL)
|
||||
|
||||
**Key features:**
|
||||
- **GPU variety**: B200, H100, GH200, A100, A10, A6000, V100
|
||||
- **Lambda Stack**: Pre-installed PyTorch, TensorFlow, CUDA, cuDNN, NCCL
|
||||
- **Persistent filesystems**: Keep data across instance restarts
|
||||
- **1-Click Clusters**: 16-512 GPU Slurm clusters with InfiniBand
|
||||
- **Simple pricing**: Pay-per-minute, no egress fees
|
||||
- **Global regions**: 12+ regions worldwide
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **Modal**: For serverless, auto-scaling workloads
|
||||
- **SkyPilot**: For multi-cloud orchestration and cost optimization
|
||||
- **RunPod**: For cheaper spot instances and serverless endpoints
|
||||
- **Vast.ai**: For GPU marketplace with lowest prices
|
||||
|
||||
## Quick start
|
||||
|
||||
### Account setup
|
||||
|
||||
1. Create account at https://lambda.ai
|
||||
2. Add payment method
|
||||
3. Generate API key from dashboard
|
||||
4. Add SSH key (required before launching instances)
|
||||
|
||||
### Launch via console
|
||||
|
||||
1. Go to https://cloud.lambda.ai/instances
|
||||
2. Click "Launch instance"
|
||||
3. Select GPU type and region
|
||||
4. Choose SSH key
|
||||
5. Optionally attach filesystem
|
||||
6. Launch and wait 3-15 minutes
|
||||
|
||||
### Connect via SSH
|
||||
|
||||
```bash
|
||||
# Get instance IP from console
|
||||
ssh ubuntu@<INSTANCE-IP>
|
||||
|
||||
# Or with specific key
|
||||
ssh -i ~/.ssh/lambda_key ubuntu@<INSTANCE-IP>
|
||||
```
|
||||
|
||||
## GPU instances
|
||||
|
||||
### Available GPUs
|
||||
|
||||
| GPU | VRAM | Price/GPU/hr | Best For |
|
||||
|-----|------|--------------|----------|
|
||||
| B200 SXM6 | 180 GB | $4.99 | Largest models, fastest training |
|
||||
| H100 SXM | 80 GB | $2.99-3.29 | Large model training |
|
||||
| H100 PCIe | 80 GB | $2.49 | Cost-effective H100 |
|
||||
| GH200 | 96 GB | $1.49 | Single-GPU large models |
|
||||
| A100 80GB | 80 GB | $1.79 | Production training |
|
||||
| A100 40GB | 40 GB | $1.29 | Standard training |
|
||||
| A10 | 24 GB | $0.75 | Inference, fine-tuning |
|
||||
| A6000 | 48 GB | $0.80 | Good VRAM/price ratio |
|
||||
| V100 | 16 GB | $0.55 | Budget training |
|
||||
|
||||
### Instance configurations
|
||||
|
||||
```
|
||||
8x GPU: Best for distributed training (DDP, FSDP)
|
||||
4x GPU: Large models, multi-GPU training
|
||||
2x GPU: Medium workloads
|
||||
1x GPU: Fine-tuning, inference, development
|
||||
```
|
||||
|
||||
### Launch times
|
||||
|
||||
- Single-GPU: 3-5 minutes
|
||||
- Multi-GPU: 10-15 minutes
|
||||
|
||||
## Lambda Stack
|
||||
|
||||
All instances come with Lambda Stack pre-installed:
|
||||
|
||||
```bash
|
||||
# Included software
|
||||
- Ubuntu 22.04 LTS
|
||||
- NVIDIA drivers (latest)
|
||||
- CUDA 12.x
|
||||
- cuDNN 8.x
|
||||
- NCCL (for multi-GPU)
|
||||
- PyTorch (latest)
|
||||
- TensorFlow (latest)
|
||||
- JAX
|
||||
- JupyterLab
|
||||
```
|
||||
|
||||
### Verify installation
|
||||
|
||||
```bash
|
||||
# Check GPU
|
||||
nvidia-smi
|
||||
|
||||
# Check PyTorch
|
||||
python -c "import torch; print(torch.cuda.is_available())"
|
||||
|
||||
# Check CUDA version
|
||||
nvcc --version
|
||||
```
|
||||
|
||||
## Python API
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install lambda-cloud-client
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
```python
|
||||
import os
|
||||
import lambda_cloud_client
|
||||
|
||||
# Configure with API key
|
||||
configuration = lambda_cloud_client.Configuration(
|
||||
host="https://cloud.lambdalabs.com/api/v1",
|
||||
access_token=os.environ["LAMBDA_API_KEY"]
|
||||
)
|
||||
```
|
||||
|
||||
### List available instances
|
||||
|
||||
```python
|
||||
with lambda_cloud_client.ApiClient(configuration) as api_client:
|
||||
api = lambda_cloud_client.DefaultApi(api_client)
|
||||
|
||||
# Get available instance types
|
||||
types = api.instance_types()
|
||||
for name, info in types.data.items():
|
||||
print(f"{name}: {info.instance_type.description}")
|
||||
```
|
||||
|
||||
### Launch instance
|
||||
|
||||
```python
|
||||
from lambda_cloud_client.models import LaunchInstanceRequest
|
||||
|
||||
request = LaunchInstanceRequest(
|
||||
region_name="us-west-1",
|
||||
instance_type_name="gpu_1x_h100_sxm5",
|
||||
ssh_key_names=["my-ssh-key"],
|
||||
file_system_names=["my-filesystem"], # Optional
|
||||
name="training-job"
|
||||
)
|
||||
|
||||
response = api.launch_instance(request)
|
||||
instance_id = response.data.instance_ids[0]
|
||||
print(f"Launched: {instance_id}")
|
||||
```
|
||||
|
||||
### List running instances
|
||||
|
||||
```python
|
||||
instances = api.list_instances()
|
||||
for instance in instances.data:
|
||||
print(f"{instance.name}: {instance.ip} ({instance.status})")
|
||||
```
|
||||
|
||||
### Terminate instance
|
||||
|
||||
```python
|
||||
from lambda_cloud_client.models import TerminateInstanceRequest
|
||||
|
||||
request = TerminateInstanceRequest(
|
||||
instance_ids=[instance_id]
|
||||
)
|
||||
api.terminate_instance(request)
|
||||
```
|
||||
|
||||
### SSH key management
|
||||
|
||||
```python
|
||||
from lambda_cloud_client.models import AddSshKeyRequest
|
||||
|
||||
# Add SSH key
|
||||
request = AddSshKeyRequest(
|
||||
name="my-key",
|
||||
public_key="ssh-rsa AAAA..."
|
||||
)
|
||||
api.add_ssh_key(request)
|
||||
|
||||
# List keys
|
||||
keys = api.list_ssh_keys()
|
||||
|
||||
# Delete key
|
||||
api.delete_ssh_key(key_id)
|
||||
```
|
||||
|
||||
## CLI with curl
|
||||
|
||||
### List instance types
|
||||
|
||||
```bash
|
||||
curl -u $LAMBDA_API_KEY: \
|
||||
https://cloud.lambdalabs.com/api/v1/instance-types | jq
|
||||
```
|
||||
|
||||
### Launch instance
|
||||
|
||||
```bash
|
||||
curl -u $LAMBDA_API_KEY: \
|
||||
-X POST https://cloud.lambdalabs.com/api/v1/instance-operations/launch \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"region_name": "us-west-1",
|
||||
"instance_type_name": "gpu_1x_h100_sxm5",
|
||||
"ssh_key_names": ["my-key"]
|
||||
}' | jq
|
||||
```
|
||||
|
||||
### Terminate instance
|
||||
|
||||
```bash
|
||||
curl -u $LAMBDA_API_KEY: \
|
||||
-X POST https://cloud.lambdalabs.com/api/v1/instance-operations/terminate \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"instance_ids": ["<INSTANCE-ID>"]}' | jq
|
||||
```
|
||||
|
||||
## Persistent storage
|
||||
|
||||
### Filesystems
|
||||
|
||||
Filesystems persist data across instance restarts:
|
||||
|
||||
```bash
|
||||
# Mount location
|
||||
/lambda/nfs/<FILESYSTEM_NAME>
|
||||
|
||||
# Example: save checkpoints
|
||||
python train.py --checkpoint-dir /lambda/nfs/my-storage/checkpoints
|
||||
```
|
||||
|
||||
### Create filesystem
|
||||
|
||||
1. Go to Storage in Lambda console
|
||||
2. Click "Create filesystem"
|
||||
3. Select region (must match instance region)
|
||||
4. Name and create
|
||||
|
||||
### Attach to instance
|
||||
|
||||
Filesystems must be attached at instance launch time:
|
||||
- Via console: Select filesystem when launching
|
||||
- Via API: Include `file_system_names` in launch request
|
||||
|
||||
### Best practices
|
||||
|
||||
```bash
|
||||
# Store on filesystem (persists)
|
||||
/lambda/nfs/storage/
|
||||
├── datasets/
|
||||
├── checkpoints/
|
||||
├── models/
|
||||
└── outputs/
|
||||
|
||||
# Local SSD (faster, ephemeral)
|
||||
/home/ubuntu/
|
||||
└── working/ # Temporary files
|
||||
```
|
||||
|
||||
## SSH configuration
|
||||
|
||||
### Add SSH key
|
||||
|
||||
```bash
|
||||
# Generate key locally
|
||||
ssh-keygen -t ed25519 -f ~/.ssh/lambda_key
|
||||
|
||||
# Add public key to Lambda console
|
||||
# Or via API
|
||||
```
|
||||
|
||||
### Multiple keys
|
||||
|
||||
```bash
|
||||
# On instance, add more keys
|
||||
echo 'ssh-rsa AAAA...' >> ~/.ssh/authorized_keys
|
||||
```
|
||||
|
||||
### Import from GitHub
|
||||
|
||||
```bash
|
||||
# On instance
|
||||
ssh-import-id gh:username
|
||||
```
|
||||
|
||||
### SSH tunneling
|
||||
|
||||
```bash
|
||||
# Forward Jupyter
|
||||
ssh -L 8888:localhost:8888 ubuntu@<IP>
|
||||
|
||||
# Forward TensorBoard
|
||||
ssh -L 6006:localhost:6006 ubuntu@<IP>
|
||||
|
||||
# Multiple ports
|
||||
ssh -L 8888:localhost:8888 -L 6006:localhost:6006 ubuntu@<IP>
|
||||
```
|
||||
|
||||
## JupyterLab
|
||||
|
||||
### Launch from console
|
||||
|
||||
1. Go to Instances page
|
||||
2. Click "Launch" in Cloud IDE column
|
||||
3. JupyterLab opens in browser
|
||||
|
||||
### Manual access
|
||||
|
||||
```bash
|
||||
# On instance
|
||||
jupyter lab --ip=0.0.0.0 --port=8888
|
||||
|
||||
# From local machine with tunnel
|
||||
ssh -L 8888:localhost:8888 ubuntu@<IP>
|
||||
# Open http://localhost:8888
|
||||
```
|
||||
|
||||
## Training workflows
|
||||
|
||||
### Single-GPU training
|
||||
|
||||
```bash
|
||||
# SSH to instance
|
||||
ssh ubuntu@<IP>
|
||||
|
||||
# Clone repo
|
||||
git clone https://github.com/user/project
|
||||
cd project
|
||||
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Train
|
||||
python train.py --epochs 100 --checkpoint-dir /lambda/nfs/storage/checkpoints
|
||||
```
|
||||
|
||||
### Multi-GPU training (single node)
|
||||
|
||||
```python
|
||||
# train_ddp.py
|
||||
import torch
|
||||
import torch.distributed as dist
|
||||
from torch.nn.parallel import DistributedDataParallel as DDP
|
||||
|
||||
def main():
|
||||
dist.init_process_group("nccl")
|
||||
rank = dist.get_rank()
|
||||
device = rank % torch.cuda.device_count()
|
||||
|
||||
model = MyModel().to(device)
|
||||
model = DDP(model, device_ids=[device])
|
||||
|
||||
# Training loop...
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
```
|
||||
|
||||
```bash
|
||||
# Launch with torchrun (8 GPUs)
|
||||
torchrun --nproc_per_node=8 train_ddp.py
|
||||
```
|
||||
|
||||
### Checkpoint to filesystem
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
checkpoint_dir = "/lambda/nfs/my-storage/checkpoints"
|
||||
os.makedirs(checkpoint_dir, exist_ok=True)
|
||||
|
||||
# Save checkpoint
|
||||
torch.save({
|
||||
'epoch': epoch,
|
||||
'model_state_dict': model.state_dict(),
|
||||
'optimizer_state_dict': optimizer.state_dict(),
|
||||
'loss': loss,
|
||||
}, f"{checkpoint_dir}/checkpoint_{epoch}.pt")
|
||||
```
|
||||
|
||||
## 1-Click Clusters
|
||||
|
||||
### Overview
|
||||
|
||||
High-performance Slurm clusters with:
|
||||
- 16-512 NVIDIA H100 or B200 GPUs
|
||||
- NVIDIA Quantum-2 400 Gb/s InfiniBand
|
||||
- GPUDirect RDMA at 3200 Gb/s
|
||||
- Pre-installed distributed ML stack
|
||||
|
||||
### Included software
|
||||
|
||||
- Ubuntu 22.04 LTS + Lambda Stack
|
||||
- NCCL, Open MPI
|
||||
- PyTorch with DDP and FSDP
|
||||
- TensorFlow
|
||||
- OFED drivers
|
||||
|
||||
### Storage
|
||||
|
||||
- 24 TB NVMe per compute node (ephemeral)
|
||||
- Lambda filesystems for persistent data
|
||||
|
||||
### Multi-node training
|
||||
|
||||
```bash
|
||||
# On Slurm cluster
|
||||
srun --nodes=4 --ntasks-per-node=8 --gpus-per-node=8 \
|
||||
torchrun --nnodes=4 --nproc_per_node=8 \
|
||||
--rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 \
|
||||
train.py
|
||||
```
|
||||
|
||||
## Networking
|
||||
|
||||
### Bandwidth
|
||||
|
||||
- Inter-instance (same region): up to 200 Gbps
|
||||
- Internet outbound: 20 Gbps max
|
||||
|
||||
### Firewall
|
||||
|
||||
- Default: Only port 22 (SSH) open
|
||||
- Configure additional ports in Lambda console
|
||||
- ICMP traffic allowed by default
|
||||
|
||||
### Private IPs
|
||||
|
||||
```bash
|
||||
# Find private IP
|
||||
ip addr show | grep 'inet '
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: Fine-tuning LLM
|
||||
|
||||
```bash
|
||||
# 1. Launch 8x H100 instance with filesystem
|
||||
|
||||
# 2. SSH and setup
|
||||
ssh ubuntu@<IP>
|
||||
pip install transformers accelerate peft
|
||||
|
||||
# 3. Download model to filesystem
|
||||
python -c "
|
||||
from transformers import AutoModelForCausalLM
|
||||
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
|
||||
model.save_pretrained('/lambda/nfs/storage/models/llama-2-7b')
|
||||
"
|
||||
|
||||
# 4. Fine-tune with checkpoints on filesystem
|
||||
accelerate launch --num_processes 8 train.py \
|
||||
--model_path /lambda/nfs/storage/models/llama-2-7b \
|
||||
--output_dir /lambda/nfs/storage/outputs \
|
||||
--checkpoint_dir /lambda/nfs/storage/checkpoints
|
||||
```
|
||||
|
||||
### Workflow 2: Batch inference
|
||||
|
||||
```bash
|
||||
# 1. Launch A10 instance (cost-effective for inference)
|
||||
|
||||
# 2. Run inference
|
||||
python inference.py \
|
||||
--model /lambda/nfs/storage/models/fine-tuned \
|
||||
--input /lambda/nfs/storage/data/inputs.jsonl \
|
||||
--output /lambda/nfs/storage/data/outputs.jsonl
|
||||
```
|
||||
|
||||
## Cost optimization
|
||||
|
||||
### Choose right GPU
|
||||
|
||||
| Task | Recommended GPU |
|
||||
|------|-----------------|
|
||||
| LLM fine-tuning (7B) | A100 40GB |
|
||||
| LLM fine-tuning (70B) | 8x H100 |
|
||||
| Inference | A10, A6000 |
|
||||
| Development | V100, A10 |
|
||||
| Maximum performance | B200 |
|
||||
|
||||
### Reduce costs
|
||||
|
||||
1. **Use filesystems**: Avoid re-downloading data
|
||||
2. **Checkpoint frequently**: Resume interrupted training
|
||||
3. **Right-size**: Don't over-provision GPUs
|
||||
4. **Terminate idle**: No auto-stop, manually terminate
|
||||
|
||||
### Monitor usage
|
||||
|
||||
- Dashboard shows real-time GPU utilization
|
||||
- API for programmatic monitoring
|
||||
|
||||
## Common issues
|
||||
|
||||
| Issue | Solution |
|
||||
|-------|----------|
|
||||
| Instance won't launch | Check region availability, try different GPU |
|
||||
| SSH connection refused | Wait for instance to initialize (3-15 min) |
|
||||
| Data lost after terminate | Use persistent filesystems |
|
||||
| Slow data transfer | Use filesystem in same region |
|
||||
| GPU not detected | Reboot instance, check drivers |
|
||||
|
||||
## References
|
||||
|
||||
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/lambda-labs/references/advanced-usage.md)** - Multi-node training, API automation
|
||||
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/lambda-labs/references/troubleshooting.md)** - Common issues and solutions
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://docs.lambda.ai
|
||||
- **Console**: https://cloud.lambda.ai
|
||||
- **Pricing**: https://lambda.ai/instances
|
||||
- **Support**: https://support.lambdalabs.com
|
||||
- **Blog**: https://lambda.ai/blog
|
||||
322
website/docs/user-guide/skills/optional/mlops/mlops-llava.md
Normal file
322
website/docs/user-guide/skills/optional/mlops/mlops-llava.md
Normal file
|
|
@ -0,0 +1,322 @@
|
|||
---
|
||||
title: "Llava — Large Language and Vision Assistant"
|
||||
sidebar_label: "Llava"
|
||||
description: "Large Language and Vision Assistant"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Llava
|
||||
|
||||
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/llava` |
|
||||
| Path | `optional-skills/mlops/llava` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `transformers`, `torch`, `pillow` |
|
||||
| Tags | `LLaVA`, `Vision-Language`, `Multimodal`, `Visual Question Answering`, `Image Chat`, `CLIP`, `Vicuna`, `Conversational AI`, `Instruction Tuning`, `VQA` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# LLaVA - Large Language and Vision Assistant
|
||||
|
||||
Open-source vision-language model for conversational image understanding.
|
||||
|
||||
## When to use LLaVA
|
||||
|
||||
**Use when:**
|
||||
- Building vision-language chatbots
|
||||
- Visual question answering (VQA)
|
||||
- Image description and captioning
|
||||
- Multi-turn image conversations
|
||||
- Visual instruction following
|
||||
- Document understanding with images
|
||||
|
||||
**Metrics**:
|
||||
- **23,000+ GitHub stars**
|
||||
- GPT-4V level capabilities (targeted)
|
||||
- Apache 2.0 License
|
||||
- Multiple model sizes (7B-34B params)
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **GPT-4V**: Highest quality, API-based
|
||||
- **CLIP**: Simple zero-shot classification
|
||||
- **BLIP-2**: Better for captioning only
|
||||
- **Flamingo**: Research, not open-source
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Clone repository
|
||||
git clone https://github.com/haotian-liu/LLaVA
|
||||
cd LLaVA
|
||||
|
||||
# Install
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
### Basic usage
|
||||
|
||||
```python
|
||||
from llava.model.builder import load_pretrained_model
|
||||
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
|
||||
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
|
||||
from llava.conversation import conv_templates
|
||||
from PIL import Image
|
||||
import torch
|
||||
|
||||
# Load model
|
||||
model_path = "liuhaotian/llava-v1.5-7b"
|
||||
tokenizer, model, image_processor, context_len = load_pretrained_model(
|
||||
model_path=model_path,
|
||||
model_base=None,
|
||||
model_name=get_model_name_from_path(model_path)
|
||||
)
|
||||
|
||||
# Load image
|
||||
image = Image.open("image.jpg")
|
||||
image_tensor = process_images([image], image_processor, model.config)
|
||||
image_tensor = image_tensor.to(model.device, dtype=torch.float16)
|
||||
|
||||
# Create conversation
|
||||
conv = conv_templates["llava_v1"].copy()
|
||||
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is in this image?")
|
||||
conv.append_message(conv.roles[1], None)
|
||||
prompt = conv.get_prompt()
|
||||
|
||||
# Generate response
|
||||
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(model.device)
|
||||
|
||||
with torch.inference_mode():
|
||||
output_ids = model.generate(
|
||||
input_ids,
|
||||
images=image_tensor,
|
||||
do_sample=True,
|
||||
temperature=0.2,
|
||||
max_new_tokens=512
|
||||
)
|
||||
|
||||
response = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
|
||||
print(response)
|
||||
```
|
||||
|
||||
## Available models
|
||||
|
||||
| Model | Parameters | VRAM | Quality |
|
||||
|-------|------------|------|---------|
|
||||
| LLaVA-v1.5-7B | 7B | ~14 GB | Good |
|
||||
| LLaVA-v1.5-13B | 13B | ~28 GB | Better |
|
||||
| LLaVA-v1.6-34B | 34B | ~70 GB | Best |
|
||||
|
||||
```python
|
||||
# Load different models
|
||||
model_7b = "liuhaotian/llava-v1.5-7b"
|
||||
model_13b = "liuhaotian/llava-v1.5-13b"
|
||||
model_34b = "liuhaotian/llava-v1.6-34b"
|
||||
|
||||
# 4-bit quantization for lower VRAM
|
||||
load_4bit = True # Reduces VRAM by ~4×
|
||||
```
|
||||
|
||||
## CLI usage
|
||||
|
||||
```bash
|
||||
# Single image query
|
||||
python -m llava.serve.cli \
|
||||
--model-path liuhaotian/llava-v1.5-7b \
|
||||
--image-file image.jpg \
|
||||
--query "What is in this image?"
|
||||
|
||||
# Multi-turn conversation
|
||||
python -m llava.serve.cli \
|
||||
--model-path liuhaotian/llava-v1.5-7b \
|
||||
--image-file image.jpg
|
||||
# Then type questions interactively
|
||||
```
|
||||
|
||||
## Web UI (Gradio)
|
||||
|
||||
```bash
|
||||
# Launch Gradio interface
|
||||
python -m llava.serve.gradio_web_server \
|
||||
--model-path liuhaotian/llava-v1.5-7b \
|
||||
--load-4bit # Optional: reduce VRAM
|
||||
|
||||
# Access at http://localhost:7860
|
||||
```
|
||||
|
||||
## Multi-turn conversations
|
||||
|
||||
```python
|
||||
# Initialize conversation
|
||||
conv = conv_templates["llava_v1"].copy()
|
||||
|
||||
# Turn 1
|
||||
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is in this image?")
|
||||
conv.append_message(conv.roles[1], None)
|
||||
response1 = generate(conv, model, image) # "A dog playing in a park"
|
||||
|
||||
# Turn 2
|
||||
conv.messages[-1][1] = response1 # Add previous response
|
||||
conv.append_message(conv.roles[0], "What breed is the dog?")
|
||||
conv.append_message(conv.roles[1], None)
|
||||
response2 = generate(conv, model, image) # "Golden Retriever"
|
||||
|
||||
# Turn 3
|
||||
conv.messages[-1][1] = response2
|
||||
conv.append_message(conv.roles[0], "What time of day is it?")
|
||||
conv.append_message(conv.roles[1], None)
|
||||
response3 = generate(conv, model, image)
|
||||
```
|
||||
|
||||
## Common tasks
|
||||
|
||||
### Image captioning
|
||||
|
||||
```python
|
||||
question = "Describe this image in detail."
|
||||
response = ask(model, image, question)
|
||||
```
|
||||
|
||||
### Visual question answering
|
||||
|
||||
```python
|
||||
question = "How many people are in the image?"
|
||||
response = ask(model, image, question)
|
||||
```
|
||||
|
||||
### Object detection (textual)
|
||||
|
||||
```python
|
||||
question = "List all the objects you can see in this image."
|
||||
response = ask(model, image, question)
|
||||
```
|
||||
|
||||
### Scene understanding
|
||||
|
||||
```python
|
||||
question = "What is happening in this scene?"
|
||||
response = ask(model, image, question)
|
||||
```
|
||||
|
||||
### Document understanding
|
||||
|
||||
```python
|
||||
question = "What is the main topic of this document?"
|
||||
response = ask(model, document_image, question)
|
||||
```
|
||||
|
||||
## Training custom model
|
||||
|
||||
```bash
|
||||
# Stage 1: Feature alignment (558K image-caption pairs)
|
||||
bash scripts/v1_5/pretrain.sh
|
||||
|
||||
# Stage 2: Visual instruction tuning (150K instruction data)
|
||||
bash scripts/v1_5/finetune.sh
|
||||
```
|
||||
|
||||
## Quantization (reduce VRAM)
|
||||
|
||||
```python
|
||||
# 4-bit quantization
|
||||
tokenizer, model, image_processor, context_len = load_pretrained_model(
|
||||
model_path="liuhaotian/llava-v1.5-13b",
|
||||
model_base=None,
|
||||
model_name=get_model_name_from_path("liuhaotian/llava-v1.5-13b"),
|
||||
load_4bit=True # Reduces VRAM ~4×
|
||||
)
|
||||
|
||||
# 8-bit quantization
|
||||
load_8bit=True # Reduces VRAM ~2×
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Start with 7B model** - Good quality, manageable VRAM
|
||||
2. **Use 4-bit quantization** - Reduces VRAM significantly
|
||||
3. **GPU required** - CPU inference extremely slow
|
||||
4. **Clear prompts** - Specific questions get better answers
|
||||
5. **Multi-turn conversations** - Maintain conversation context
|
||||
6. **Temperature 0.2-0.7** - Balance creativity/consistency
|
||||
7. **max_new_tokens 512-1024** - For detailed responses
|
||||
8. **Batch processing** - Process multiple images sequentially
|
||||
|
||||
## Performance
|
||||
|
||||
| Model | VRAM (FP16) | VRAM (4-bit) | Speed (tokens/s) |
|
||||
|-------|-------------|--------------|------------------|
|
||||
| 7B | ~14 GB | ~4 GB | ~20 |
|
||||
| 13B | ~28 GB | ~8 GB | ~12 |
|
||||
| 34B | ~70 GB | ~18 GB | ~5 |
|
||||
|
||||
*On A100 GPU*
|
||||
|
||||
## Benchmarks
|
||||
|
||||
LLaVA achieves competitive scores on:
|
||||
- **VQAv2**: 78.5%
|
||||
- **GQA**: 62.0%
|
||||
- **MM-Vet**: 35.4%
|
||||
- **MMBench**: 64.3%
|
||||
|
||||
## Limitations
|
||||
|
||||
1. **Hallucinations** - May describe things not in image
|
||||
2. **Spatial reasoning** - Struggles with precise locations
|
||||
3. **Small text** - Difficulty reading fine print
|
||||
4. **Object counting** - Imprecise for many objects
|
||||
5. **VRAM requirements** - Need powerful GPU
|
||||
6. **Inference speed** - Slower than CLIP
|
||||
|
||||
## Integration with frameworks
|
||||
|
||||
### LangChain
|
||||
|
||||
```python
|
||||
from langchain.llms.base import LLM
|
||||
|
||||
class LLaVALLM(LLM):
|
||||
def _call(self, prompt, stop=None):
|
||||
# Custom LLaVA inference
|
||||
return response
|
||||
|
||||
llm = LLaVALLM()
|
||||
```
|
||||
|
||||
### Gradio App
|
||||
|
||||
```python
|
||||
import gradio as gr
|
||||
|
||||
def chat(image, text, history):
|
||||
response = ask_llava(model, image, text)
|
||||
return response
|
||||
|
||||
demo = gr.ChatInterface(
|
||||
chat,
|
||||
additional_inputs=[gr.Image(type="pil")],
|
||||
title="LLaVA Chat"
|
||||
)
|
||||
demo.launch()
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/haotian-liu/LLaVA ⭐ 23,000+
|
||||
- **Paper**: https://arxiv.org/abs/2304.08485
|
||||
- **Demo**: https://llava.hliu.cc
|
||||
- **Models**: https://huggingface.co/liuhaotian
|
||||
- **License**: Apache 2.0
|
||||
361
website/docs/user-guide/skills/optional/mlops/mlops-modal.md
Normal file
361
website/docs/user-guide/skills/optional/mlops/mlops-modal.md
Normal file
|
|
@ -0,0 +1,361 @@
|
|||
---
|
||||
title: "Modal Serverless Gpu — Serverless GPU cloud platform for running ML workloads"
|
||||
sidebar_label: "Modal Serverless Gpu"
|
||||
description: "Serverless GPU cloud platform for running ML workloads"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Modal Serverless Gpu
|
||||
|
||||
Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/modal` |
|
||||
| Path | `optional-skills/mlops/modal` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `modal>=0.64.0` |
|
||||
| Tags | `Infrastructure`, `Serverless`, `GPU`, `Cloud`, `Deployment`, `Modal` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Modal Serverless GPU
|
||||
|
||||
Comprehensive guide to running ML workloads on Modal's serverless GPU cloud platform.
|
||||
|
||||
## When to use Modal
|
||||
|
||||
**Use Modal when:**
|
||||
- Running GPU-intensive ML workloads without managing infrastructure
|
||||
- Deploying ML models as auto-scaling APIs
|
||||
- Running batch processing jobs (training, inference, data processing)
|
||||
- Need pay-per-second GPU pricing without idle costs
|
||||
- Prototyping ML applications quickly
|
||||
- Running scheduled jobs (cron-like workloads)
|
||||
|
||||
**Key features:**
|
||||
- **Serverless GPUs**: T4, L4, A10G, L40S, A100, H100, H200, B200 on-demand
|
||||
- **Python-native**: Define infrastructure in Python code, no YAML
|
||||
- **Auto-scaling**: Scale to zero, scale to 100+ GPUs instantly
|
||||
- **Sub-second cold starts**: Rust-based infrastructure for fast container launches
|
||||
- **Container caching**: Image layers cached for rapid iteration
|
||||
- **Web endpoints**: Deploy functions as REST APIs with zero-downtime updates
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **RunPod**: For longer-running pods with persistent state
|
||||
- **Lambda Labs**: For reserved GPU instances
|
||||
- **SkyPilot**: For multi-cloud orchestration and cost optimization
|
||||
- **Kubernetes**: For complex multi-service architectures
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install modal
|
||||
modal setup # Opens browser for authentication
|
||||
```
|
||||
|
||||
### Hello World with GPU
|
||||
|
||||
```python
|
||||
import modal
|
||||
|
||||
app = modal.App("hello-gpu")
|
||||
|
||||
@app.function(gpu="T4")
|
||||
def gpu_info():
|
||||
import subprocess
|
||||
return subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
|
||||
|
||||
@app.local_entrypoint()
|
||||
def main():
|
||||
print(gpu_info.remote())
|
||||
```
|
||||
|
||||
Run: `modal run hello_gpu.py`
|
||||
|
||||
### Basic inference endpoint
|
||||
|
||||
```python
|
||||
import modal
|
||||
|
||||
app = modal.App("text-generation")
|
||||
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")
|
||||
|
||||
@app.cls(gpu="A10G", image=image)
|
||||
class TextGenerator:
|
||||
@modal.enter()
|
||||
def load_model(self):
|
||||
from transformers import pipeline
|
||||
self.pipe = pipeline("text-generation", model="gpt2", device=0)
|
||||
|
||||
@modal.method()
|
||||
def generate(self, prompt: str) -> str:
|
||||
return self.pipe(prompt, max_length=100)[0]["generated_text"]
|
||||
|
||||
@app.local_entrypoint()
|
||||
def main():
|
||||
print(TextGenerator().generate.remote("Hello, world"))
|
||||
```
|
||||
|
||||
## Core concepts
|
||||
|
||||
### Key components
|
||||
|
||||
| Component | Purpose |
|
||||
|-----------|---------|
|
||||
| `App` | Container for functions and resources |
|
||||
| `Function` | Serverless function with compute specs |
|
||||
| `Cls` | Class-based functions with lifecycle hooks |
|
||||
| `Image` | Container image definition |
|
||||
| `Volume` | Persistent storage for models/data |
|
||||
| `Secret` | Secure credential storage |
|
||||
|
||||
### Execution modes
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `modal run script.py` | Execute and exit |
|
||||
| `modal serve script.py` | Development with live reload |
|
||||
| `modal deploy script.py` | Persistent cloud deployment |
|
||||
|
||||
## GPU configuration
|
||||
|
||||
### Available GPUs
|
||||
|
||||
| GPU | VRAM | Best For |
|
||||
|-----|------|----------|
|
||||
| `T4` | 16GB | Budget inference, small models |
|
||||
| `L4` | 24GB | Inference, Ada Lovelace arch |
|
||||
| `A10G` | 24GB | Training/inference, 3.3x faster than T4 |
|
||||
| `L40S` | 48GB | Recommended for inference (best cost/perf) |
|
||||
| `A100-40GB` | 40GB | Large model training |
|
||||
| `A100-80GB` | 80GB | Very large models |
|
||||
| `H100` | 80GB | Fastest, FP8 + Transformer Engine |
|
||||
| `H200` | 141GB | Auto-upgrade from H100, 4.8TB/s bandwidth |
|
||||
| `B200` | Latest | Blackwell architecture |
|
||||
|
||||
### GPU specification patterns
|
||||
|
||||
```python
|
||||
# Single GPU
|
||||
@app.function(gpu="A100")
|
||||
|
||||
# Specific memory variant
|
||||
@app.function(gpu="A100-80GB")
|
||||
|
||||
# Multiple GPUs (up to 8)
|
||||
@app.function(gpu="H100:4")
|
||||
|
||||
# GPU with fallbacks
|
||||
@app.function(gpu=["H100", "A100", "L40S"])
|
||||
|
||||
# Any available GPU
|
||||
@app.function(gpu="any")
|
||||
```
|
||||
|
||||
## Container images
|
||||
|
||||
```python
|
||||
# Basic image with pip
|
||||
image = modal.Image.debian_slim(python_version="3.11").pip_install(
|
||||
"torch==2.1.0", "transformers==4.36.0", "accelerate"
|
||||
)
|
||||
|
||||
# From CUDA base
|
||||
image = modal.Image.from_registry(
|
||||
"nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04",
|
||||
add_python="3.11"
|
||||
).pip_install("torch", "transformers")
|
||||
|
||||
# With system packages
|
||||
image = modal.Image.debian_slim().apt_install("git", "ffmpeg").pip_install("whisper")
|
||||
```
|
||||
|
||||
## Persistent storage
|
||||
|
||||
```python
|
||||
volume = modal.Volume.from_name("model-cache", create_if_missing=True)
|
||||
|
||||
@app.function(gpu="A10G", volumes={"/models": volume})
|
||||
def load_model():
|
||||
import os
|
||||
model_path = "/models/llama-7b"
|
||||
if not os.path.exists(model_path):
|
||||
model = download_model()
|
||||
model.save_pretrained(model_path)
|
||||
volume.commit() # Persist changes
|
||||
return load_from_path(model_path)
|
||||
```
|
||||
|
||||
## Web endpoints
|
||||
|
||||
### FastAPI endpoint decorator
|
||||
|
||||
```python
|
||||
@app.function()
|
||||
@modal.fastapi_endpoint(method="POST")
|
||||
def predict(text: str) -> dict:
|
||||
return {"result": model.predict(text)}
|
||||
```
|
||||
|
||||
### Full ASGI app
|
||||
|
||||
```python
|
||||
from fastapi import FastAPI
|
||||
web_app = FastAPI()
|
||||
|
||||
@web_app.post("/predict")
|
||||
async def predict(text: str):
|
||||
return {"result": await model.predict.remote.aio(text)}
|
||||
|
||||
@app.function()
|
||||
@modal.asgi_app()
|
||||
def fastapi_app():
|
||||
return web_app
|
||||
```
|
||||
|
||||
### Web endpoint types
|
||||
|
||||
| Decorator | Use Case |
|
||||
|-----------|----------|
|
||||
| `@modal.fastapi_endpoint()` | Simple function → API |
|
||||
| `@modal.asgi_app()` | Full FastAPI/Starlette apps |
|
||||
| `@modal.wsgi_app()` | Django/Flask apps |
|
||||
| `@modal.web_server(port)` | Arbitrary HTTP servers |
|
||||
|
||||
## Dynamic batching
|
||||
|
||||
```python
|
||||
@app.function()
|
||||
@modal.batched(max_batch_size=32, wait_ms=100)
|
||||
async def batch_predict(inputs: list[str]) -> list[dict]:
|
||||
# Inputs automatically batched
|
||||
return model.batch_predict(inputs)
|
||||
```
|
||||
|
||||
## Secrets management
|
||||
|
||||
```bash
|
||||
# Create secret
|
||||
modal secret create huggingface HF_TOKEN=hf_xxx
|
||||
```
|
||||
|
||||
```python
|
||||
@app.function(secrets=[modal.Secret.from_name("huggingface")])
|
||||
def download_model():
|
||||
import os
|
||||
token = os.environ["HF_TOKEN"]
|
||||
```
|
||||
|
||||
## Scheduling
|
||||
|
||||
```python
|
||||
@app.function(schedule=modal.Cron("0 0 * * *")) # Daily midnight
|
||||
def daily_job():
|
||||
pass
|
||||
|
||||
@app.function(schedule=modal.Period(hours=1))
|
||||
def hourly_job():
|
||||
pass
|
||||
```
|
||||
|
||||
## Performance optimization
|
||||
|
||||
### Cold start mitigation
|
||||
|
||||
```python
|
||||
@app.function(
|
||||
container_idle_timeout=300, # Keep warm 5 min
|
||||
allow_concurrent_inputs=10, # Handle concurrent requests
|
||||
)
|
||||
def inference():
|
||||
pass
|
||||
```
|
||||
|
||||
### Model loading best practices
|
||||
|
||||
```python
|
||||
@app.cls(gpu="A100")
|
||||
class Model:
|
||||
@modal.enter() # Run once at container start
|
||||
def load(self):
|
||||
self.model = load_model() # Load during warm-up
|
||||
|
||||
@modal.method()
|
||||
def predict(self, x):
|
||||
return self.model(x)
|
||||
```
|
||||
|
||||
## Parallel processing
|
||||
|
||||
```python
|
||||
@app.function()
|
||||
def process_item(item):
|
||||
return expensive_computation(item)
|
||||
|
||||
@app.function()
|
||||
def run_parallel():
|
||||
items = list(range(1000))
|
||||
# Fan out to parallel containers
|
||||
results = list(process_item.map(items))
|
||||
return results
|
||||
```
|
||||
|
||||
## Common configuration
|
||||
|
||||
```python
|
||||
@app.function(
|
||||
gpu="A100",
|
||||
memory=32768, # 32GB RAM
|
||||
cpu=4, # 4 CPU cores
|
||||
timeout=3600, # 1 hour max
|
||||
container_idle_timeout=120,# Keep warm 2 min
|
||||
retries=3, # Retry on failure
|
||||
concurrency_limit=10, # Max concurrent containers
|
||||
)
|
||||
def my_function():
|
||||
pass
|
||||
```
|
||||
|
||||
## Debugging
|
||||
|
||||
```python
|
||||
# Test locally
|
||||
if __name__ == "__main__":
|
||||
result = my_function.local()
|
||||
|
||||
# View logs
|
||||
# modal app logs my-app
|
||||
```
|
||||
|
||||
## Common issues
|
||||
|
||||
| Issue | Solution |
|
||||
|-------|----------|
|
||||
| Cold start latency | Increase `container_idle_timeout`, use `@modal.enter()` |
|
||||
| GPU OOM | Use larger GPU (`A100-80GB`), enable gradient checkpointing |
|
||||
| Image build fails | Pin dependency versions, check CUDA compatibility |
|
||||
| Timeout errors | Increase `timeout`, add checkpointing |
|
||||
|
||||
## References
|
||||
|
||||
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/modal/references/advanced-usage.md)** - Multi-GPU, distributed training, cost optimization
|
||||
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/modal/references/troubleshooting.md)** - Common issues and solutions
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://modal.com/docs
|
||||
- **Examples**: https://github.com/modal-labs/modal-examples
|
||||
- **Pricing**: https://modal.com/pricing
|
||||
- **Discord**: https://discord.gg/modal
|
||||
|
|
@ -0,0 +1,400 @@
|
|||
---
|
||||
title: "Nemo Curator — GPU-accelerated data curation for LLM training"
|
||||
sidebar_label: "Nemo Curator"
|
||||
description: "GPU-accelerated data curation for LLM training"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Nemo Curator
|
||||
|
||||
GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/nemo-curator` |
|
||||
| Path | `optional-skills/mlops/nemo-curator` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `nemo-curator`, `cudf`, `dask`, `rapids` |
|
||||
| Tags | `Data Processing`, `NeMo Curator`, `Data Curation`, `GPU Acceleration`, `Deduplication`, `Quality Filtering`, `NVIDIA`, `RAPIDS`, `PII Redaction`, `Multimodal`, `LLM Training Data` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# NeMo Curator - GPU-Accelerated Data Curation
|
||||
|
||||
NVIDIA's toolkit for preparing high-quality training data for LLMs.
|
||||
|
||||
## When to use NeMo Curator
|
||||
|
||||
**Use NeMo Curator when:**
|
||||
- Preparing LLM training data from web scrapes (Common Crawl)
|
||||
- Need fast deduplication (16× faster than CPU)
|
||||
- Curating multi-modal datasets (text, images, video, audio)
|
||||
- Filtering low-quality or toxic content
|
||||
- Scaling data processing across GPU cluster
|
||||
|
||||
**Performance**:
|
||||
- **16× faster** fuzzy deduplication (8TB RedPajama v2)
|
||||
- **40% lower TCO** vs CPU alternatives
|
||||
- **Near-linear scaling** across GPU nodes
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **datatrove**: CPU-based, open-source data processing
|
||||
- **dolma**: Allen AI's data toolkit
|
||||
- **Ray Data**: General ML data processing (no curation focus)
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Text curation (CUDA 12)
|
||||
uv pip install "nemo-curator[text_cuda12]"
|
||||
|
||||
# All modalities
|
||||
uv pip install "nemo-curator[all_cuda12]"
|
||||
|
||||
# CPU-only (slower)
|
||||
uv pip install "nemo-curator[cpu]"
|
||||
```
|
||||
|
||||
### Basic text curation pipeline
|
||||
|
||||
```python
|
||||
from nemo_curator import ScoreFilter, Modify
|
||||
from nemo_curator.datasets import DocumentDataset
|
||||
import pandas as pd
|
||||
|
||||
# Load data
|
||||
df = pd.DataFrame({"text": ["Good document", "Bad doc", "Excellent text"]})
|
||||
dataset = DocumentDataset(df)
|
||||
|
||||
# Quality filtering
|
||||
def quality_score(doc):
|
||||
return len(doc["text"].split()) > 5 # Filter short docs
|
||||
|
||||
filtered = ScoreFilter(quality_score)(dataset)
|
||||
|
||||
# Deduplication
|
||||
from nemo_curator.modules import ExactDuplicates
|
||||
deduped = ExactDuplicates()(filtered)
|
||||
|
||||
# Save
|
||||
deduped.to_parquet("curated_data/")
|
||||
```
|
||||
|
||||
## Data curation pipeline
|
||||
|
||||
### Stage 1: Quality filtering
|
||||
|
||||
```python
|
||||
from nemo_curator.filters import (
|
||||
WordCountFilter,
|
||||
RepeatedLinesFilter,
|
||||
UrlRatioFilter,
|
||||
NonAlphaNumericFilter
|
||||
)
|
||||
|
||||
# Apply 30+ heuristic filters
|
||||
from nemo_curator import ScoreFilter
|
||||
|
||||
# Word count filter
|
||||
dataset = dataset.filter(WordCountFilter(min_words=50, max_words=100000))
|
||||
|
||||
# Remove repetitive content
|
||||
dataset = dataset.filter(RepeatedLinesFilter(max_repeated_line_fraction=0.3))
|
||||
|
||||
# URL ratio filter
|
||||
dataset = dataset.filter(UrlRatioFilter(max_url_ratio=0.2))
|
||||
```
|
||||
|
||||
### Stage 2: Deduplication
|
||||
|
||||
**Exact deduplication**:
|
||||
```python
|
||||
from nemo_curator.modules import ExactDuplicates
|
||||
|
||||
# Remove exact duplicates
|
||||
deduped = ExactDuplicates(id_field="id", text_field="text")(dataset)
|
||||
```
|
||||
|
||||
**Fuzzy deduplication** (16× faster on GPU):
|
||||
```python
|
||||
from nemo_curator.modules import FuzzyDuplicates
|
||||
|
||||
# MinHash + LSH deduplication
|
||||
fuzzy_dedup = FuzzyDuplicates(
|
||||
id_field="id",
|
||||
text_field="text",
|
||||
num_hashes=260, # MinHash parameters
|
||||
num_buckets=20,
|
||||
hash_method="md5"
|
||||
)
|
||||
|
||||
deduped = fuzzy_dedup(dataset)
|
||||
```
|
||||
|
||||
**Semantic deduplication**:
|
||||
```python
|
||||
from nemo_curator.modules import SemanticDuplicates
|
||||
|
||||
# Embedding-based deduplication
|
||||
semantic_dedup = SemanticDuplicates(
|
||||
id_field="id",
|
||||
text_field="text",
|
||||
embedding_model="sentence-transformers/all-MiniLM-L6-v2",
|
||||
threshold=0.8 # Cosine similarity threshold
|
||||
)
|
||||
|
||||
deduped = semantic_dedup(dataset)
|
||||
```
|
||||
|
||||
### Stage 3: PII redaction
|
||||
|
||||
```python
|
||||
from nemo_curator.modules import Modify
|
||||
from nemo_curator.modifiers import PIIRedactor
|
||||
|
||||
# Redact personally identifiable information
|
||||
pii_redactor = PIIRedactor(
|
||||
supported_entities=["EMAIL_ADDRESS", "PHONE_NUMBER", "PERSON", "LOCATION"],
|
||||
anonymize_action="replace" # or "redact"
|
||||
)
|
||||
|
||||
redacted = Modify(pii_redactor)(dataset)
|
||||
```
|
||||
|
||||
### Stage 4: Classifier filtering
|
||||
|
||||
```python
|
||||
from nemo_curator.classifiers import QualityClassifier
|
||||
|
||||
# Quality classification
|
||||
quality_clf = QualityClassifier(
|
||||
model_path="nvidia/quality-classifier-deberta",
|
||||
batch_size=256,
|
||||
device="cuda"
|
||||
)
|
||||
|
||||
# Filter low-quality documents
|
||||
high_quality = dataset.filter(lambda doc: quality_clf(doc["text"]) > 0.5)
|
||||
```
|
||||
|
||||
## GPU acceleration
|
||||
|
||||
### GPU vs CPU performance
|
||||
|
||||
| Operation | CPU (16 cores) | GPU (A100) | Speedup |
|
||||
|-----------|----------------|------------|---------|
|
||||
| Fuzzy dedup (8TB) | 120 hours | 7.5 hours | 16× |
|
||||
| Exact dedup (1TB) | 8 hours | 0.5 hours | 16× |
|
||||
| Quality filtering | 2 hours | 0.2 hours | 10× |
|
||||
|
||||
### Multi-GPU scaling
|
||||
|
||||
```python
|
||||
from nemo_curator import get_client
|
||||
import dask_cuda
|
||||
|
||||
# Initialize GPU cluster
|
||||
client = get_client(cluster_type="gpu", n_workers=8)
|
||||
|
||||
# Process with 8 GPUs
|
||||
deduped = FuzzyDuplicates(...)(dataset)
|
||||
```
|
||||
|
||||
## Multi-modal curation
|
||||
|
||||
### Image curation
|
||||
|
||||
```python
|
||||
from nemo_curator.image import (
|
||||
AestheticFilter,
|
||||
NSFWFilter,
|
||||
CLIPEmbedder
|
||||
)
|
||||
|
||||
# Aesthetic scoring
|
||||
aesthetic_filter = AestheticFilter(threshold=5.0)
|
||||
filtered_images = aesthetic_filter(image_dataset)
|
||||
|
||||
# NSFW detection
|
||||
nsfw_filter = NSFWFilter(threshold=0.9)
|
||||
safe_images = nsfw_filter(filtered_images)
|
||||
|
||||
# Generate CLIP embeddings
|
||||
clip_embedder = CLIPEmbedder(model="openai/clip-vit-base-patch32")
|
||||
image_embeddings = clip_embedder(safe_images)
|
||||
```
|
||||
|
||||
### Video curation
|
||||
|
||||
```python
|
||||
from nemo_curator.video import (
|
||||
SceneDetector,
|
||||
ClipExtractor,
|
||||
InternVideo2Embedder
|
||||
)
|
||||
|
||||
# Detect scenes
|
||||
scene_detector = SceneDetector(threshold=27.0)
|
||||
scenes = scene_detector(video_dataset)
|
||||
|
||||
# Extract clips
|
||||
clip_extractor = ClipExtractor(min_duration=2.0, max_duration=10.0)
|
||||
clips = clip_extractor(scenes)
|
||||
|
||||
# Generate embeddings
|
||||
video_embedder = InternVideo2Embedder()
|
||||
video_embeddings = video_embedder(clips)
|
||||
```
|
||||
|
||||
### Audio curation
|
||||
|
||||
```python
|
||||
from nemo_curator.audio import (
|
||||
ASRInference,
|
||||
WERFilter,
|
||||
DurationFilter
|
||||
)
|
||||
|
||||
# ASR transcription
|
||||
asr = ASRInference(model="nvidia/stt_en_fastconformer_hybrid_large_pc")
|
||||
transcribed = asr(audio_dataset)
|
||||
|
||||
# Filter by WER (word error rate)
|
||||
wer_filter = WERFilter(max_wer=0.3)
|
||||
high_quality_audio = wer_filter(transcribed)
|
||||
|
||||
# Duration filtering
|
||||
duration_filter = DurationFilter(min_duration=1.0, max_duration=30.0)
|
||||
filtered_audio = duration_filter(high_quality_audio)
|
||||
```
|
||||
|
||||
## Common patterns
|
||||
|
||||
### Web scrape curation (Common Crawl)
|
||||
|
||||
```python
|
||||
from nemo_curator import ScoreFilter, Modify
|
||||
from nemo_curator.filters import *
|
||||
from nemo_curator.modules import *
|
||||
from nemo_curator.datasets import DocumentDataset
|
||||
|
||||
# Load Common Crawl data
|
||||
dataset = DocumentDataset.read_parquet("common_crawl/*.parquet")
|
||||
|
||||
# Pipeline
|
||||
pipeline = [
|
||||
# 1. Quality filtering
|
||||
WordCountFilter(min_words=100, max_words=50000),
|
||||
RepeatedLinesFilter(max_repeated_line_fraction=0.2),
|
||||
SymbolToWordRatioFilter(max_symbol_to_word_ratio=0.3),
|
||||
UrlRatioFilter(max_url_ratio=0.3),
|
||||
|
||||
# 2. Language filtering
|
||||
LanguageIdentificationFilter(target_languages=["en"]),
|
||||
|
||||
# 3. Deduplication
|
||||
ExactDuplicates(id_field="id", text_field="text"),
|
||||
FuzzyDuplicates(id_field="id", text_field="text", num_hashes=260),
|
||||
|
||||
# 4. PII redaction
|
||||
PIIRedactor(),
|
||||
|
||||
# 5. NSFW filtering
|
||||
NSFWClassifier(threshold=0.8)
|
||||
]
|
||||
|
||||
# Execute
|
||||
for stage in pipeline:
|
||||
dataset = stage(dataset)
|
||||
|
||||
# Save
|
||||
dataset.to_parquet("curated_common_crawl/")
|
||||
```
|
||||
|
||||
### Distributed processing
|
||||
|
||||
```python
|
||||
from nemo_curator import get_client
|
||||
from dask_cuda import LocalCUDACluster
|
||||
|
||||
# Multi-GPU cluster
|
||||
cluster = LocalCUDACluster(n_workers=8)
|
||||
client = get_client(cluster=cluster)
|
||||
|
||||
# Process large dataset
|
||||
dataset = DocumentDataset.read_parquet("s3://large_dataset/*.parquet")
|
||||
deduped = FuzzyDuplicates(...)(dataset)
|
||||
|
||||
# Cleanup
|
||||
client.close()
|
||||
cluster.close()
|
||||
```
|
||||
|
||||
## Performance benchmarks
|
||||
|
||||
### Fuzzy deduplication (8TB RedPajama v2)
|
||||
|
||||
- **CPU (256 cores)**: 120 hours
|
||||
- **GPU (8× A100)**: 7.5 hours
|
||||
- **Speedup**: 16×
|
||||
|
||||
### Exact deduplication (1TB)
|
||||
|
||||
- **CPU (64 cores)**: 8 hours
|
||||
- **GPU (4× A100)**: 0.5 hours
|
||||
- **Speedup**: 16×
|
||||
|
||||
### Quality filtering (100GB)
|
||||
|
||||
- **CPU (32 cores)**: 2 hours
|
||||
- **GPU (2× A100)**: 0.2 hours
|
||||
- **Speedup**: 10×
|
||||
|
||||
## Cost comparison
|
||||
|
||||
**CPU-based curation** (AWS c5.18xlarge × 10):
|
||||
- Cost: $3.60/hour × 10 = $36/hour
|
||||
- Time for 8TB: 120 hours
|
||||
- **Total**: $4,320
|
||||
|
||||
**GPU-based curation** (AWS p4d.24xlarge × 2):
|
||||
- Cost: $32.77/hour × 2 = $65.54/hour
|
||||
- Time for 8TB: 7.5 hours
|
||||
- **Total**: $491.55
|
||||
|
||||
**Savings**: 89% reduction ($3,828 saved)
|
||||
|
||||
## Supported data formats
|
||||
|
||||
- **Input**: Parquet, JSONL, CSV
|
||||
- **Output**: Parquet (recommended), JSONL
|
||||
- **WebDataset**: TAR archives for multi-modal
|
||||
|
||||
## Use cases
|
||||
|
||||
**Production deployments**:
|
||||
- NVIDIA used NeMo Curator to prepare Nemotron-4 training data
|
||||
- Open-source datasets curated: RedPajama v2, The Pile
|
||||
|
||||
## References
|
||||
|
||||
- **[Filtering Guide](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/nemo-curator/references/filtering.md)** - 30+ quality filters, heuristics
|
||||
- **[Deduplication Guide](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/nemo-curator/references/deduplication.md)** - Exact, fuzzy, semantic methods
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/NVIDIA/NeMo-Curator ⭐ 500+
|
||||
- **Docs**: https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/
|
||||
- **Version**: 0.4.0+
|
||||
- **License**: Apache 2.0
|
||||
451
website/docs/user-guide/skills/optional/mlops/mlops-peft.md
Normal file
451
website/docs/user-guide/skills/optional/mlops/mlops-peft.md
Normal file
|
|
@ -0,0 +1,451 @@
|
|||
---
|
||||
title: "Peft Fine Tuning — Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods"
|
||||
sidebar_label: "Peft Fine Tuning"
|
||||
description: "Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Peft Fine Tuning
|
||||
|
||||
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/peft` |
|
||||
| Path | `optional-skills/mlops/peft` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `peft>=0.13.0`, `transformers>=4.45.0`, `torch>=2.0.0`, `bitsandbytes>=0.43.0` |
|
||||
| Tags | `Fine-Tuning`, `PEFT`, `LoRA`, `QLoRA`, `Parameter-Efficient`, `Adapters`, `Low-Rank`, `Memory Optimization`, `Multi-Adapter` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# PEFT (Parameter-Efficient Fine-Tuning)
|
||||
|
||||
Fine-tune LLMs by training <1% of parameters using LoRA, QLoRA, and 25+ adapter methods.
|
||||
|
||||
## When to use PEFT
|
||||
|
||||
**Use PEFT/LoRA when:**
|
||||
- Fine-tuning 7B-70B models on consumer GPUs (RTX 4090, A100)
|
||||
- Need to train <1% parameters (6MB adapters vs 14GB full model)
|
||||
- Want fast iteration with multiple task-specific adapters
|
||||
- Deploying multiple fine-tuned variants from one base model
|
||||
|
||||
**Use QLoRA (PEFT + quantization) when:**
|
||||
- Fine-tuning 70B models on single 24GB GPU
|
||||
- Memory is the primary constraint
|
||||
- Can accept ~5% quality trade-off vs full fine-tuning
|
||||
|
||||
**Use full fine-tuning instead when:**
|
||||
- Training small models (<1B parameters)
|
||||
- Need maximum quality and have compute budget
|
||||
- Significant domain shift requires updating all weights
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Basic installation
|
||||
pip install peft
|
||||
|
||||
# With quantization support (recommended)
|
||||
pip install peft bitsandbytes
|
||||
|
||||
# Full stack
|
||||
pip install peft transformers accelerate bitsandbytes datasets
|
||||
```
|
||||
|
||||
### LoRA fine-tuning (standard)
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
|
||||
from peft import get_peft_model, LoraConfig, TaskType
|
||||
from datasets import load_dataset
|
||||
|
||||
# Load base model
|
||||
model_name = "meta-llama/Llama-3.1-8B"
|
||||
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
tokenizer.pad_token = tokenizer.eos_token
|
||||
|
||||
# LoRA configuration
|
||||
lora_config = LoraConfig(
|
||||
task_type=TaskType.CAUSAL_LM,
|
||||
r=16, # Rank (8-64, higher = more capacity)
|
||||
lora_alpha=32, # Scaling factor (typically 2*r)
|
||||
lora_dropout=0.05, # Dropout for regularization
|
||||
target_modules=["q_proj", "v_proj", "k_proj", "o_proj"], # Attention layers
|
||||
bias="none" # Don't train biases
|
||||
)
|
||||
|
||||
# Apply LoRA
|
||||
model = get_peft_model(model, lora_config)
|
||||
model.print_trainable_parameters()
|
||||
# Output: trainable params: 13,631,488 || all params: 8,043,307,008 || trainable%: 0.17%
|
||||
|
||||
# Prepare dataset
|
||||
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
|
||||
|
||||
def tokenize(example):
|
||||
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
|
||||
return tokenizer(text, truncation=True, max_length=512, padding="max_length")
|
||||
|
||||
tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
|
||||
|
||||
# Training
|
||||
training_args = TrainingArguments(
|
||||
output_dir="./lora-llama",
|
||||
num_train_epochs=3,
|
||||
per_device_train_batch_size=4,
|
||||
gradient_accumulation_steps=4,
|
||||
learning_rate=2e-4,
|
||||
fp16=True,
|
||||
logging_steps=10,
|
||||
save_strategy="epoch"
|
||||
)
|
||||
|
||||
trainer = Trainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
train_dataset=tokenized,
|
||||
data_collator=lambda data: {"input_ids": torch.stack([f["input_ids"] for f in data]),
|
||||
"attention_mask": torch.stack([f["attention_mask"] for f in data]),
|
||||
"labels": torch.stack([f["input_ids"] for f in data])}
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
|
||||
# Save adapter only (6MB vs 16GB)
|
||||
model.save_pretrained("./lora-llama-adapter")
|
||||
```
|
||||
|
||||
### QLoRA fine-tuning (memory-efficient)
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
|
||||
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
|
||||
|
||||
# 4-bit quantization config
|
||||
bnb_config = BitsAndBytesConfig(
|
||||
load_in_4bit=True,
|
||||
bnb_4bit_quant_type="nf4", # NormalFloat4 (best for LLMs)
|
||||
bnb_4bit_compute_dtype="bfloat16", # Compute in bf16
|
||||
bnb_4bit_use_double_quant=True # Nested quantization
|
||||
)
|
||||
|
||||
# Load quantized model
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
"meta-llama/Llama-3.1-70B",
|
||||
quantization_config=bnb_config,
|
||||
device_map="auto"
|
||||
)
|
||||
|
||||
# Prepare for training (enables gradient checkpointing)
|
||||
model = prepare_model_for_kbit_training(model)
|
||||
|
||||
# LoRA config for QLoRA
|
||||
lora_config = LoraConfig(
|
||||
r=64, # Higher rank for 70B
|
||||
lora_alpha=128,
|
||||
lora_dropout=0.1,
|
||||
target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
|
||||
bias="none",
|
||||
task_type="CAUSAL_LM"
|
||||
)
|
||||
|
||||
model = get_peft_model(model, lora_config)
|
||||
# 70B model now fits on single 24GB GPU!
|
||||
```
|
||||
|
||||
## LoRA parameter selection
|
||||
|
||||
### Rank (r) - capacity vs efficiency
|
||||
|
||||
| Rank | Trainable Params | Memory | Quality | Use Case |
|
||||
|------|-----------------|--------|---------|----------|
|
||||
| 4 | ~3M | Minimal | Lower | Simple tasks, prototyping |
|
||||
| **8** | ~7M | Low | Good | **Recommended starting point** |
|
||||
| **16** | ~14M | Medium | Better | **General fine-tuning** |
|
||||
| 32 | ~27M | Higher | High | Complex tasks |
|
||||
| 64 | ~54M | High | Highest | Domain adaptation, 70B models |
|
||||
|
||||
### Alpha (lora_alpha) - scaling factor
|
||||
|
||||
```python
|
||||
# Rule of thumb: alpha = 2 * rank
|
||||
LoraConfig(r=16, lora_alpha=32) # Standard
|
||||
LoraConfig(r=16, lora_alpha=16) # Conservative (lower learning rate effect)
|
||||
LoraConfig(r=16, lora_alpha=64) # Aggressive (higher learning rate effect)
|
||||
```
|
||||
|
||||
### Target modules by architecture
|
||||
|
||||
```python
|
||||
# Llama / Mistral / Qwen
|
||||
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
|
||||
|
||||
# GPT-2 / GPT-Neo
|
||||
target_modules = ["c_attn", "c_proj", "c_fc"]
|
||||
|
||||
# Falcon
|
||||
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
|
||||
|
||||
# BLOOM
|
||||
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
|
||||
|
||||
# Auto-detect all linear layers
|
||||
target_modules = "all-linear" # PEFT 0.6.0+
|
||||
```
|
||||
|
||||
## Loading and merging adapters
|
||||
|
||||
### Load trained adapter
|
||||
|
||||
```python
|
||||
from peft import PeftModel, AutoPeftModelForCausalLM
|
||||
from transformers import AutoModelForCausalLM
|
||||
|
||||
# Option 1: Load with PeftModel
|
||||
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
|
||||
model = PeftModel.from_pretrained(base_model, "./lora-llama-adapter")
|
||||
|
||||
# Option 2: Load directly (recommended)
|
||||
model = AutoPeftModelForCausalLM.from_pretrained(
|
||||
"./lora-llama-adapter",
|
||||
device_map="auto"
|
||||
)
|
||||
```
|
||||
|
||||
### Merge adapter into base model
|
||||
|
||||
```python
|
||||
# Merge for deployment (no adapter overhead)
|
||||
merged_model = model.merge_and_unload()
|
||||
|
||||
# Save merged model
|
||||
merged_model.save_pretrained("./llama-merged")
|
||||
tokenizer.save_pretrained("./llama-merged")
|
||||
|
||||
# Push to Hub
|
||||
merged_model.push_to_hub("username/llama-finetuned")
|
||||
```
|
||||
|
||||
### Multi-adapter serving
|
||||
|
||||
```python
|
||||
from peft import PeftModel
|
||||
|
||||
# Load base with first adapter
|
||||
model = AutoPeftModelForCausalLM.from_pretrained("./adapter-task1")
|
||||
|
||||
# Load additional adapters
|
||||
model.load_adapter("./adapter-task2", adapter_name="task2")
|
||||
model.load_adapter("./adapter-task3", adapter_name="task3")
|
||||
|
||||
# Switch between adapters at runtime
|
||||
model.set_adapter("task1") # Use task1 adapter
|
||||
output1 = model.generate(**inputs)
|
||||
|
||||
model.set_adapter("task2") # Switch to task2
|
||||
output2 = model.generate(**inputs)
|
||||
|
||||
# Disable adapters (use base model)
|
||||
with model.disable_adapter():
|
||||
base_output = model.generate(**inputs)
|
||||
```
|
||||
|
||||
## PEFT methods comparison
|
||||
|
||||
| Method | Trainable % | Memory | Speed | Best For |
|
||||
|--------|------------|--------|-------|----------|
|
||||
| **LoRA** | 0.1-1% | Low | Fast | General fine-tuning |
|
||||
| **QLoRA** | 0.1-1% | Very Low | Medium | Memory-constrained |
|
||||
| AdaLoRA | 0.1-1% | Low | Medium | Automatic rank selection |
|
||||
| IA3 | 0.01% | Minimal | Fastest | Few-shot adaptation |
|
||||
| Prefix Tuning | 0.1% | Low | Medium | Generation control |
|
||||
| Prompt Tuning | 0.001% | Minimal | Fast | Simple task adaptation |
|
||||
| P-Tuning v2 | 0.1% | Low | Medium | NLU tasks |
|
||||
|
||||
### IA3 (minimal parameters)
|
||||
|
||||
```python
|
||||
from peft import IA3Config
|
||||
|
||||
ia3_config = IA3Config(
|
||||
target_modules=["q_proj", "v_proj", "k_proj", "down_proj"],
|
||||
feedforward_modules=["down_proj"]
|
||||
)
|
||||
model = get_peft_model(model, ia3_config)
|
||||
# Trains only 0.01% of parameters!
|
||||
```
|
||||
|
||||
### Prefix Tuning
|
||||
|
||||
```python
|
||||
from peft import PrefixTuningConfig
|
||||
|
||||
prefix_config = PrefixTuningConfig(
|
||||
task_type="CAUSAL_LM",
|
||||
num_virtual_tokens=20, # Prepended tokens
|
||||
prefix_projection=True # Use MLP projection
|
||||
)
|
||||
model = get_peft_model(model, prefix_config)
|
||||
```
|
||||
|
||||
## Integration patterns
|
||||
|
||||
### With TRL (SFTTrainer)
|
||||
|
||||
```python
|
||||
from trl import SFTTrainer, SFTConfig
|
||||
from peft import LoraConfig
|
||||
|
||||
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
|
||||
|
||||
trainer = SFTTrainer(
|
||||
model=model,
|
||||
args=SFTConfig(output_dir="./output", max_seq_length=512),
|
||||
train_dataset=dataset,
|
||||
peft_config=lora_config, # Pass LoRA config directly
|
||||
)
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
### With Axolotl (YAML config)
|
||||
|
||||
```yaml
|
||||
# axolotl config.yaml
|
||||
adapter: lora
|
||||
lora_r: 16
|
||||
lora_alpha: 32
|
||||
lora_dropout: 0.05
|
||||
lora_target_modules:
|
||||
- q_proj
|
||||
- v_proj
|
||||
- k_proj
|
||||
- o_proj
|
||||
lora_target_linear: true # Target all linear layers
|
||||
```
|
||||
|
||||
### With vLLM (inference)
|
||||
|
||||
```python
|
||||
from vllm import LLM
|
||||
from vllm.lora.request import LoRARequest
|
||||
|
||||
# Load base model with LoRA support
|
||||
llm = LLM(model="meta-llama/Llama-3.1-8B", enable_lora=True)
|
||||
|
||||
# Serve with adapter
|
||||
outputs = llm.generate(
|
||||
prompts,
|
||||
lora_request=LoRARequest("adapter1", 1, "./lora-adapter")
|
||||
)
|
||||
```
|
||||
|
||||
## Performance benchmarks
|
||||
|
||||
### Memory usage (Llama 3.1 8B)
|
||||
|
||||
| Method | GPU Memory | Trainable Params |
|
||||
|--------|-----------|------------------|
|
||||
| Full fine-tuning | 60+ GB | 8B (100%) |
|
||||
| LoRA r=16 | 18 GB | 14M (0.17%) |
|
||||
| QLoRA r=16 | 6 GB | 14M (0.17%) |
|
||||
| IA3 | 16 GB | 800K (0.01%) |
|
||||
|
||||
### Training speed (A100 80GB)
|
||||
|
||||
| Method | Tokens/sec | vs Full FT |
|
||||
|--------|-----------|------------|
|
||||
| Full FT | 2,500 | 1x |
|
||||
| LoRA | 3,200 | 1.3x |
|
||||
| QLoRA | 2,100 | 0.84x |
|
||||
|
||||
### Quality (MMLU benchmark)
|
||||
|
||||
| Model | Full FT | LoRA | QLoRA |
|
||||
|-------|---------|------|-------|
|
||||
| Llama 2-7B | 45.3 | 44.8 | 44.1 |
|
||||
| Llama 2-13B | 54.8 | 54.2 | 53.5 |
|
||||
|
||||
## Common issues
|
||||
|
||||
### CUDA OOM during training
|
||||
|
||||
```python
|
||||
# Solution 1: Enable gradient checkpointing
|
||||
model.gradient_checkpointing_enable()
|
||||
|
||||
# Solution 2: Reduce batch size + increase accumulation
|
||||
TrainingArguments(
|
||||
per_device_train_batch_size=1,
|
||||
gradient_accumulation_steps=16
|
||||
)
|
||||
|
||||
# Solution 3: Use QLoRA
|
||||
from transformers import BitsAndBytesConfig
|
||||
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
|
||||
```
|
||||
|
||||
### Adapter not applying
|
||||
|
||||
```python
|
||||
# Verify adapter is active
|
||||
print(model.active_adapters) # Should show adapter name
|
||||
|
||||
# Check trainable parameters
|
||||
model.print_trainable_parameters()
|
||||
|
||||
# Ensure model in training mode
|
||||
model.train()
|
||||
```
|
||||
|
||||
### Quality degradation
|
||||
|
||||
```python
|
||||
# Increase rank
|
||||
LoraConfig(r=32, lora_alpha=64)
|
||||
|
||||
# Target more modules
|
||||
target_modules = "all-linear"
|
||||
|
||||
# Use more training data and epochs
|
||||
TrainingArguments(num_train_epochs=5)
|
||||
|
||||
# Lower learning rate
|
||||
TrainingArguments(learning_rate=1e-4)
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Start with r=8-16**, increase if quality insufficient
|
||||
2. **Use alpha = 2 * rank** as starting point
|
||||
3. **Target attention + MLP layers** for best quality/efficiency
|
||||
4. **Enable gradient checkpointing** for memory savings
|
||||
5. **Save adapters frequently** (small files, easy rollback)
|
||||
6. **Evaluate on held-out data** before merging
|
||||
7. **Use QLoRA for 70B+ models** on consumer hardware
|
||||
|
||||
## References
|
||||
|
||||
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/peft/references/advanced-usage.md)** - DoRA, LoftQ, rank stabilization, custom modules
|
||||
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/peft/references/troubleshooting.md)** - Common errors, debugging, optimization
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/huggingface/peft
|
||||
- **Docs**: https://huggingface.co/docs/peft
|
||||
- **LoRA Paper**: arXiv:2106.09685
|
||||
- **QLoRA Paper**: arXiv:2305.14314
|
||||
- **Models**: https://huggingface.co/models?library=peft
|
||||
376
website/docs/user-guide/skills/optional/mlops/mlops-pinecone.md
Normal file
376
website/docs/user-guide/skills/optional/mlops/mlops-pinecone.md
Normal file
|
|
@ -0,0 +1,376 @@
|
|||
---
|
||||
title: "Pinecone — Managed vector database for production AI applications"
|
||||
sidebar_label: "Pinecone"
|
||||
description: "Managed vector database for production AI applications"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Pinecone
|
||||
|
||||
Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/pinecone` |
|
||||
| Path | `optional-skills/mlops/pinecone` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `pinecone-client` |
|
||||
| Tags | `RAG`, `Pinecone`, `Vector Database`, `Managed Service`, `Serverless`, `Hybrid Search`, `Production`, `Auto-Scaling`, `Low Latency`, `Recommendations` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Pinecone - Managed Vector Database
|
||||
|
||||
The vector database for production AI applications.
|
||||
|
||||
## When to use Pinecone
|
||||
|
||||
**Use when:**
|
||||
- Need managed, serverless vector database
|
||||
- Production RAG applications
|
||||
- Auto-scaling required
|
||||
- Low latency critical (<100ms)
|
||||
- Don't want to manage infrastructure
|
||||
- Need hybrid search (dense + sparse vectors)
|
||||
|
||||
**Metrics**:
|
||||
- Fully managed SaaS
|
||||
- Auto-scales to billions of vectors
|
||||
- **p95 latency <100ms**
|
||||
- 99.9% uptime SLA
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **Chroma**: Self-hosted, open-source
|
||||
- **FAISS**: Offline, pure similarity search
|
||||
- **Weaviate**: Self-hosted with more features
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install pinecone-client
|
||||
```
|
||||
|
||||
### Basic usage
|
||||
|
||||
```python
|
||||
from pinecone import Pinecone, ServerlessSpec
|
||||
|
||||
# Initialize
|
||||
pc = Pinecone(api_key="your-api-key")
|
||||
|
||||
# Create index
|
||||
pc.create_index(
|
||||
name="my-index",
|
||||
dimension=1536, # Must match embedding dimension
|
||||
metric="cosine", # or "euclidean", "dotproduct"
|
||||
spec=ServerlessSpec(cloud="aws", region="us-east-1")
|
||||
)
|
||||
|
||||
# Connect to index
|
||||
index = pc.Index("my-index")
|
||||
|
||||
# Upsert vectors
|
||||
index.upsert(vectors=[
|
||||
{"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
|
||||
{"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
|
||||
])
|
||||
|
||||
# Query
|
||||
results = index.query(
|
||||
vector=[0.1, 0.2, ...],
|
||||
top_k=5,
|
||||
include_metadata=True
|
||||
)
|
||||
|
||||
print(results["matches"])
|
||||
```
|
||||
|
||||
## Core operations
|
||||
|
||||
### Create index
|
||||
|
||||
```python
|
||||
# Serverless (recommended)
|
||||
pc.create_index(
|
||||
name="my-index",
|
||||
dimension=1536,
|
||||
metric="cosine",
|
||||
spec=ServerlessSpec(
|
||||
cloud="aws", # or "gcp", "azure"
|
||||
region="us-east-1"
|
||||
)
|
||||
)
|
||||
|
||||
# Pod-based (for consistent performance)
|
||||
from pinecone import PodSpec
|
||||
|
||||
pc.create_index(
|
||||
name="my-index",
|
||||
dimension=1536,
|
||||
metric="cosine",
|
||||
spec=PodSpec(
|
||||
environment="us-east1-gcp",
|
||||
pod_type="p1.x1"
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
### Upsert vectors
|
||||
|
||||
```python
|
||||
# Single upsert
|
||||
index.upsert(vectors=[
|
||||
{
|
||||
"id": "doc1",
|
||||
"values": [0.1, 0.2, ...], # 1536 dimensions
|
||||
"metadata": {
|
||||
"text": "Document content",
|
||||
"category": "tutorial",
|
||||
"timestamp": "2025-01-01"
|
||||
}
|
||||
}
|
||||
])
|
||||
|
||||
# Batch upsert (recommended)
|
||||
vectors = [
|
||||
{"id": f"vec{i}", "values": embedding, "metadata": metadata}
|
||||
for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
|
||||
]
|
||||
|
||||
index.upsert(vectors=vectors, batch_size=100)
|
||||
```
|
||||
|
||||
### Query vectors
|
||||
|
||||
```python
|
||||
# Basic query
|
||||
results = index.query(
|
||||
vector=[0.1, 0.2, ...],
|
||||
top_k=10,
|
||||
include_metadata=True,
|
||||
include_values=False
|
||||
)
|
||||
|
||||
# With metadata filtering
|
||||
results = index.query(
|
||||
vector=[0.1, 0.2, ...],
|
||||
top_k=5,
|
||||
filter={"category": {"$eq": "tutorial"}}
|
||||
)
|
||||
|
||||
# Namespace query
|
||||
results = index.query(
|
||||
vector=[0.1, 0.2, ...],
|
||||
top_k=5,
|
||||
namespace="production"
|
||||
)
|
||||
|
||||
# Access results
|
||||
for match in results["matches"]:
|
||||
print(f"ID: {match['id']}")
|
||||
print(f"Score: {match['score']}")
|
||||
print(f"Metadata: {match['metadata']}")
|
||||
```
|
||||
|
||||
### Metadata filtering
|
||||
|
||||
```python
|
||||
# Exact match
|
||||
filter = {"category": "tutorial"}
|
||||
|
||||
# Comparison
|
||||
filter = {"price": {"$gte": 100}} # $gt, $gte, $lt, $lte, $ne
|
||||
|
||||
# Logical operators
|
||||
filter = {
|
||||
"$and": [
|
||||
{"category": "tutorial"},
|
||||
{"difficulty": {"$lte": 3}}
|
||||
]
|
||||
} # Also: $or
|
||||
|
||||
# In operator
|
||||
filter = {"tags": {"$in": ["python", "ml"]}}
|
||||
```
|
||||
|
||||
## Namespaces
|
||||
|
||||
```python
|
||||
# Partition data by namespace
|
||||
index.upsert(
|
||||
vectors=[{"id": "vec1", "values": [...]}],
|
||||
namespace="user-123"
|
||||
)
|
||||
|
||||
# Query specific namespace
|
||||
results = index.query(
|
||||
vector=[...],
|
||||
namespace="user-123",
|
||||
top_k=5
|
||||
)
|
||||
|
||||
# List namespaces
|
||||
stats = index.describe_index_stats()
|
||||
print(stats['namespaces'])
|
||||
```
|
||||
|
||||
## Hybrid search (dense + sparse)
|
||||
|
||||
```python
|
||||
# Upsert with sparse vectors
|
||||
index.upsert(vectors=[
|
||||
{
|
||||
"id": "doc1",
|
||||
"values": [0.1, 0.2, ...], # Dense vector
|
||||
"sparse_values": {
|
||||
"indices": [10, 45, 123], # Token IDs
|
||||
"values": [0.5, 0.3, 0.8] # TF-IDF scores
|
||||
},
|
||||
"metadata": {"text": "..."}
|
||||
}
|
||||
])
|
||||
|
||||
# Hybrid query
|
||||
results = index.query(
|
||||
vector=[0.1, 0.2, ...],
|
||||
sparse_vector={
|
||||
"indices": [10, 45],
|
||||
"values": [0.5, 0.3]
|
||||
},
|
||||
top_k=5,
|
||||
alpha=0.5 # 0=sparse, 1=dense, 0.5=hybrid
|
||||
)
|
||||
```
|
||||
|
||||
## LangChain integration
|
||||
|
||||
```python
|
||||
from langchain_pinecone import PineconeVectorStore
|
||||
from langchain_openai import OpenAIEmbeddings
|
||||
|
||||
# Create vector store
|
||||
vectorstore = PineconeVectorStore.from_documents(
|
||||
documents=docs,
|
||||
embedding=OpenAIEmbeddings(),
|
||||
index_name="my-index"
|
||||
)
|
||||
|
||||
# Query
|
||||
results = vectorstore.similarity_search("query", k=5)
|
||||
|
||||
# With metadata filter
|
||||
results = vectorstore.similarity_search(
|
||||
"query",
|
||||
k=5,
|
||||
filter={"category": "tutorial"}
|
||||
)
|
||||
|
||||
# As retriever
|
||||
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
|
||||
```
|
||||
|
||||
## LlamaIndex integration
|
||||
|
||||
```python
|
||||
from llama_index.vector_stores.pinecone import PineconeVectorStore
|
||||
|
||||
# Connect to Pinecone
|
||||
pc = Pinecone(api_key="your-key")
|
||||
pinecone_index = pc.Index("my-index")
|
||||
|
||||
# Create vector store
|
||||
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
|
||||
|
||||
# Use in LlamaIndex
|
||||
from llama_index.core import StorageContext, VectorStoreIndex
|
||||
|
||||
storage_context = StorageContext.from_defaults(vector_store=vector_store)
|
||||
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
|
||||
```
|
||||
|
||||
## Index management
|
||||
|
||||
```python
|
||||
# List indices
|
||||
indexes = pc.list_indexes()
|
||||
|
||||
# Describe index
|
||||
index_info = pc.describe_index("my-index")
|
||||
print(index_info)
|
||||
|
||||
# Get index stats
|
||||
stats = index.describe_index_stats()
|
||||
print(f"Total vectors: {stats['total_vector_count']}")
|
||||
print(f"Namespaces: {stats['namespaces']}")
|
||||
|
||||
# Delete index
|
||||
pc.delete_index("my-index")
|
||||
```
|
||||
|
||||
## Delete vectors
|
||||
|
||||
```python
|
||||
# Delete by ID
|
||||
index.delete(ids=["vec1", "vec2"])
|
||||
|
||||
# Delete by filter
|
||||
index.delete(filter={"category": "old"})
|
||||
|
||||
# Delete all in namespace
|
||||
index.delete(delete_all=True, namespace="test")
|
||||
|
||||
# Delete entire index
|
||||
index.delete(delete_all=True)
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Use serverless** - Auto-scaling, cost-effective
|
||||
2. **Batch upserts** - More efficient (100-200 per batch)
|
||||
3. **Add metadata** - Enable filtering
|
||||
4. **Use namespaces** - Isolate data by user/tenant
|
||||
5. **Monitor usage** - Check Pinecone dashboard
|
||||
6. **Optimize filters** - Index frequently filtered fields
|
||||
7. **Test with free tier** - 1 index, 100K vectors free
|
||||
8. **Use hybrid search** - Better quality
|
||||
9. **Set appropriate dimensions** - Match embedding model
|
||||
10. **Regular backups** - Export important data
|
||||
|
||||
## Performance
|
||||
|
||||
| Operation | Latency | Notes |
|
||||
|-----------|---------|-------|
|
||||
| Upsert | ~50-100ms | Per batch |
|
||||
| Query (p50) | ~50ms | Depends on index size |
|
||||
| Query (p95) | ~100ms | SLA target |
|
||||
| Metadata filter | ~+10-20ms | Additional overhead |
|
||||
|
||||
## Pricing (as of 2025)
|
||||
|
||||
**Serverless**:
|
||||
- $0.096 per million read units
|
||||
- $0.06 per million write units
|
||||
- $0.06 per GB storage/month
|
||||
|
||||
**Free tier**:
|
||||
- 1 serverless index
|
||||
- 100K vectors (1536 dimensions)
|
||||
- Great for prototyping
|
||||
|
||||
## Resources
|
||||
|
||||
- **Website**: https://www.pinecone.io
|
||||
- **Docs**: https://docs.pinecone.io
|
||||
- **Console**: https://app.pinecone.io
|
||||
- **Pricing**: https://www.pinecone.io/pricing
|
||||
File diff suppressed because one or more lines are too long
|
|
@ -0,0 +1,364 @@
|
|||
---
|
||||
title: "Pytorch Lightning"
|
||||
sidebar_label: "Pytorch Lightning"
|
||||
description: "High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Pytorch Lightning
|
||||
|
||||
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/pytorch-lightning` |
|
||||
| Path | `optional-skills/mlops/pytorch-lightning` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `lightning`, `torch`, `transformers` |
|
||||
| Tags | `PyTorch Lightning`, `Training Framework`, `Distributed Training`, `DDP`, `FSDP`, `DeepSpeed`, `High-Level API`, `Callbacks`, `Best Practices`, `Scalable` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# PyTorch Lightning - High-Level Training Framework
|
||||
|
||||
## Quick start
|
||||
|
||||
PyTorch Lightning organizes PyTorch code to eliminate boilerplate while maintaining flexibility.
|
||||
|
||||
**Installation**:
|
||||
```bash
|
||||
pip install lightning
|
||||
```
|
||||
|
||||
**Convert PyTorch to Lightning** (3 steps):
|
||||
|
||||
```python
|
||||
import lightning as L
|
||||
import torch
|
||||
from torch import nn
|
||||
from torch.utils.data import DataLoader, Dataset
|
||||
|
||||
# Step 1: Define LightningModule (organize your PyTorch code)
|
||||
class LitModel(L.LightningModule):
|
||||
def __init__(self, hidden_size=128):
|
||||
super().__init__()
|
||||
self.model = nn.Sequential(
|
||||
nn.Linear(28 * 28, hidden_size),
|
||||
nn.ReLU(),
|
||||
nn.Linear(hidden_size, 10)
|
||||
)
|
||||
|
||||
def training_step(self, batch, batch_idx):
|
||||
x, y = batch
|
||||
y_hat = self.model(x)
|
||||
loss = nn.functional.cross_entropy(y_hat, y)
|
||||
self.log('train_loss', loss) # Auto-logged to TensorBoard
|
||||
return loss
|
||||
|
||||
def configure_optimizers(self):
|
||||
return torch.optim.Adam(self.parameters(), lr=1e-3)
|
||||
|
||||
# Step 2: Create data
|
||||
train_loader = DataLoader(train_dataset, batch_size=32)
|
||||
|
||||
# Step 3: Train with Trainer (handles everything else!)
|
||||
trainer = L.Trainer(max_epochs=10, accelerator='gpu', devices=2)
|
||||
model = LitModel()
|
||||
trainer.fit(model, train_loader)
|
||||
```
|
||||
|
||||
**That's it!** Trainer handles:
|
||||
- GPU/TPU/CPU switching
|
||||
- Distributed training (DDP, FSDP, DeepSpeed)
|
||||
- Mixed precision (FP16, BF16)
|
||||
- Gradient accumulation
|
||||
- Checkpointing
|
||||
- Logging
|
||||
- Progress bars
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: From PyTorch to Lightning
|
||||
|
||||
**Original PyTorch code**:
|
||||
```python
|
||||
model = MyModel()
|
||||
optimizer = torch.optim.Adam(model.parameters())
|
||||
model.to('cuda')
|
||||
|
||||
for epoch in range(max_epochs):
|
||||
for batch in train_loader:
|
||||
batch = batch.to('cuda')
|
||||
optimizer.zero_grad()
|
||||
loss = model(batch)
|
||||
loss.backward()
|
||||
optimizer.step()
|
||||
```
|
||||
|
||||
**Lightning version**:
|
||||
```python
|
||||
class LitModel(L.LightningModule):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.model = MyModel()
|
||||
|
||||
def training_step(self, batch, batch_idx):
|
||||
loss = self.model(batch) # No .to('cuda') needed!
|
||||
return loss
|
||||
|
||||
def configure_optimizers(self):
|
||||
return torch.optim.Adam(self.parameters())
|
||||
|
||||
# Train
|
||||
trainer = L.Trainer(max_epochs=10, accelerator='gpu')
|
||||
trainer.fit(LitModel(), train_loader)
|
||||
```
|
||||
|
||||
**Benefits**: 40+ lines → 15 lines, no device management, automatic distributed
|
||||
|
||||
### Workflow 2: Validation and testing
|
||||
|
||||
```python
|
||||
class LitModel(L.LightningModule):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.model = MyModel()
|
||||
|
||||
def training_step(self, batch, batch_idx):
|
||||
x, y = batch
|
||||
y_hat = self.model(x)
|
||||
loss = nn.functional.cross_entropy(y_hat, y)
|
||||
self.log('train_loss', loss)
|
||||
return loss
|
||||
|
||||
def validation_step(self, batch, batch_idx):
|
||||
x, y = batch
|
||||
y_hat = self.model(x)
|
||||
val_loss = nn.functional.cross_entropy(y_hat, y)
|
||||
acc = (y_hat.argmax(dim=1) == y).float().mean()
|
||||
self.log('val_loss', val_loss)
|
||||
self.log('val_acc', acc)
|
||||
|
||||
def test_step(self, batch, batch_idx):
|
||||
x, y = batch
|
||||
y_hat = self.model(x)
|
||||
test_loss = nn.functional.cross_entropy(y_hat, y)
|
||||
self.log('test_loss', test_loss)
|
||||
|
||||
def configure_optimizers(self):
|
||||
return torch.optim.Adam(self.parameters(), lr=1e-3)
|
||||
|
||||
# Train with validation
|
||||
trainer = L.Trainer(max_epochs=10)
|
||||
trainer.fit(model, train_loader, val_loader)
|
||||
|
||||
# Test
|
||||
trainer.test(model, test_loader)
|
||||
```
|
||||
|
||||
**Automatic features**:
|
||||
- Validation runs every epoch by default
|
||||
- Metrics logged to TensorBoard
|
||||
- Best model checkpointing based on val_loss
|
||||
|
||||
### Workflow 3: Distributed training (DDP)
|
||||
|
||||
```python
|
||||
# Same code as single GPU!
|
||||
model = LitModel()
|
||||
|
||||
# 8 GPUs with DDP (automatic!)
|
||||
trainer = L.Trainer(
|
||||
accelerator='gpu',
|
||||
devices=8,
|
||||
strategy='ddp' # Or 'fsdp', 'deepspeed'
|
||||
)
|
||||
|
||||
trainer.fit(model, train_loader)
|
||||
```
|
||||
|
||||
**Launch**:
|
||||
```bash
|
||||
# Single command, Lightning handles the rest
|
||||
python train.py
|
||||
```
|
||||
|
||||
**No changes needed**:
|
||||
- Automatic data distribution
|
||||
- Gradient synchronization
|
||||
- Multi-node support (just set `num_nodes=2`)
|
||||
|
||||
### Workflow 4: Callbacks for monitoring
|
||||
|
||||
```python
|
||||
from lightning.pytorch.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor
|
||||
|
||||
# Create callbacks
|
||||
checkpoint = ModelCheckpoint(
|
||||
monitor='val_loss',
|
||||
mode='min',
|
||||
save_top_k=3,
|
||||
filename='model-{epoch:02d}-{val_loss:.2f}'
|
||||
)
|
||||
|
||||
early_stop = EarlyStopping(
|
||||
monitor='val_loss',
|
||||
patience=5,
|
||||
mode='min'
|
||||
)
|
||||
|
||||
lr_monitor = LearningRateMonitor(logging_interval='epoch')
|
||||
|
||||
# Add to Trainer
|
||||
trainer = L.Trainer(
|
||||
max_epochs=100,
|
||||
callbacks=[checkpoint, early_stop, lr_monitor]
|
||||
)
|
||||
|
||||
trainer.fit(model, train_loader, val_loader)
|
||||
```
|
||||
|
||||
**Result**:
|
||||
- Auto-saves best 3 models
|
||||
- Stops early if no improvement for 5 epochs
|
||||
- Logs learning rate to TensorBoard
|
||||
|
||||
### Workflow 5: Learning rate scheduling
|
||||
|
||||
```python
|
||||
class LitModel(L.LightningModule):
|
||||
# ... (training_step, etc.)
|
||||
|
||||
def configure_optimizers(self):
|
||||
optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
|
||||
|
||||
# Cosine annealing
|
||||
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
|
||||
optimizer,
|
||||
T_max=100,
|
||||
eta_min=1e-5
|
||||
)
|
||||
|
||||
return {
|
||||
'optimizer': optimizer,
|
||||
'lr_scheduler': {
|
||||
'scheduler': scheduler,
|
||||
'interval': 'epoch', # Update per epoch
|
||||
'frequency': 1
|
||||
}
|
||||
}
|
||||
|
||||
# Learning rate auto-logged!
|
||||
trainer = L.Trainer(max_epochs=100)
|
||||
trainer.fit(model, train_loader)
|
||||
```
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use PyTorch Lightning when**:
|
||||
- Want clean, organized code
|
||||
- Need production-ready training loops
|
||||
- Switching between single GPU, multi-GPU, TPU
|
||||
- Want built-in callbacks and logging
|
||||
- Team collaboration (standardized structure)
|
||||
|
||||
**Key advantages**:
|
||||
- **Organized**: Separates research code from engineering
|
||||
- **Automatic**: DDP, FSDP, DeepSpeed with 1 line
|
||||
- **Callbacks**: Modular training extensions
|
||||
- **Reproducible**: Less boilerplate = fewer bugs
|
||||
- **Tested**: 1M+ downloads/month, battle-tested
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **Accelerate**: Minimal changes to existing code, more flexibility
|
||||
- **Ray Train**: Multi-node orchestration, hyperparameter tuning
|
||||
- **Raw PyTorch**: Maximum control, learning purposes
|
||||
- **Keras**: TensorFlow ecosystem
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: Loss not decreasing**
|
||||
|
||||
Check data and model setup:
|
||||
```python
|
||||
# Add to training_step
|
||||
def training_step(self, batch, batch_idx):
|
||||
if batch_idx == 0:
|
||||
print(f"Batch shape: {batch[0].shape}")
|
||||
print(f"Labels: {batch[1]}")
|
||||
loss = ...
|
||||
return loss
|
||||
```
|
||||
|
||||
**Issue: Out of memory**
|
||||
|
||||
Reduce batch size or use gradient accumulation:
|
||||
```python
|
||||
trainer = L.Trainer(
|
||||
accumulate_grad_batches=4, # Effective batch = batch_size × 4
|
||||
precision='bf16' # Or 'fp16', reduces memory 50%
|
||||
)
|
||||
```
|
||||
|
||||
**Issue: Validation not running**
|
||||
|
||||
Ensure you pass val_loader:
|
||||
```python
|
||||
# WRONG
|
||||
trainer.fit(model, train_loader)
|
||||
|
||||
# CORRECT
|
||||
trainer.fit(model, train_loader, val_loader)
|
||||
```
|
||||
|
||||
**Issue: DDP spawns multiple processes unexpectedly**
|
||||
|
||||
Lightning auto-detects GPUs. Explicitly set devices:
|
||||
```python
|
||||
# Test on CPU first
|
||||
trainer = L.Trainer(accelerator='cpu', devices=1)
|
||||
|
||||
# Then GPU
|
||||
trainer = L.Trainer(accelerator='gpu', devices=1)
|
||||
```
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**Callbacks**: See [references/callbacks.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/pytorch-lightning/references/callbacks.md) for EarlyStopping, ModelCheckpoint, custom callbacks, and callback hooks.
|
||||
|
||||
**Distributed strategies**: See [references/distributed.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/pytorch-lightning/references/distributed.md) for DDP, FSDP, DeepSpeed ZeRO integration, multi-node setup.
|
||||
|
||||
**Hyperparameter tuning**: See [references/hyperparameter-tuning.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/pytorch-lightning/references/hyperparameter-tuning.md) for integration with Optuna, Ray Tune, and WandB sweeps.
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
- **CPU**: Works (good for debugging)
|
||||
- **Single GPU**: Works
|
||||
- **Multi-GPU**: DDP (default), FSDP, or DeepSpeed
|
||||
- **Multi-node**: DDP, FSDP, DeepSpeed
|
||||
- **TPU**: Supported (8 cores)
|
||||
- **Apple MPS**: Supported
|
||||
|
||||
**Precision options**:
|
||||
- FP32 (default)
|
||||
- FP16 (V100, older GPUs)
|
||||
- BF16 (A100/H100, recommended)
|
||||
- FP8 (H100)
|
||||
|
||||
## Resources
|
||||
|
||||
- Docs: https://lightning.ai/docs/pytorch/stable/
|
||||
- GitHub: https://github.com/Lightning-AI/pytorch-lightning ⭐ 29,000+
|
||||
- Version: 2.5.5+
|
||||
- Examples: https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples
|
||||
- Discord: https://discord.gg/lightning-ai
|
||||
- Used by: Kaggle winners, research labs, production teams
|
||||
513
website/docs/user-guide/skills/optional/mlops/mlops-qdrant.md
Normal file
513
website/docs/user-guide/skills/optional/mlops/mlops-qdrant.md
Normal file
|
|
@ -0,0 +1,513 @@
|
|||
---
|
||||
title: "Qdrant Vector Search — High-performance vector similarity search engine for RAG and semantic search"
|
||||
sidebar_label: "Qdrant Vector Search"
|
||||
description: "High-performance vector similarity search engine for RAG and semantic search"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Qdrant Vector Search
|
||||
|
||||
High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/qdrant` |
|
||||
| Path | `optional-skills/mlops/qdrant` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `qdrant-client>=1.12.0` |
|
||||
| Tags | `RAG`, `Vector Search`, `Qdrant`, `Semantic Search`, `Embeddings`, `Similarity Search`, `HNSW`, `Production`, `Distributed` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Qdrant - Vector Similarity Search Engine
|
||||
|
||||
High-performance vector database written in Rust for production RAG and semantic search.
|
||||
|
||||
## When to use Qdrant
|
||||
|
||||
**Use Qdrant when:**
|
||||
- Building production RAG systems requiring low latency
|
||||
- Need hybrid search (vectors + metadata filtering)
|
||||
- Require horizontal scaling with sharding/replication
|
||||
- Want on-premise deployment with full data control
|
||||
- Need multi-vector storage per record (dense + sparse)
|
||||
- Building real-time recommendation systems
|
||||
|
||||
**Key features:**
|
||||
- **Rust-powered**: Memory-safe, high performance
|
||||
- **Rich filtering**: Filter by any payload field during search
|
||||
- **Multiple vectors**: Dense, sparse, multi-dense per point
|
||||
- **Quantization**: Scalar, product, binary for memory efficiency
|
||||
- **Distributed**: Raft consensus, sharding, replication
|
||||
- **REST + gRPC**: Both APIs with full feature parity
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **Chroma**: Simpler setup, embedded use cases
|
||||
- **FAISS**: Maximum raw speed, research/batch processing
|
||||
- **Pinecone**: Fully managed, zero ops preferred
|
||||
- **Weaviate**: GraphQL preference, built-in vectorizers
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Python client
|
||||
pip install qdrant-client
|
||||
|
||||
# Docker (recommended for development)
|
||||
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
|
||||
|
||||
# Docker with persistent storage
|
||||
docker run -p 6333:6333 -p 6334:6334 \
|
||||
-v $(pwd)/qdrant_storage:/qdrant/storage \
|
||||
qdrant/qdrant
|
||||
```
|
||||
|
||||
### Basic usage
|
||||
|
||||
```python
|
||||
from qdrant_client import QdrantClient
|
||||
from qdrant_client.models import Distance, VectorParams, PointStruct
|
||||
|
||||
# Connect to Qdrant
|
||||
client = QdrantClient(host="localhost", port=6333)
|
||||
|
||||
# Create collection
|
||||
client.create_collection(
|
||||
collection_name="documents",
|
||||
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
|
||||
)
|
||||
|
||||
# Insert vectors with payload
|
||||
client.upsert(
|
||||
collection_name="documents",
|
||||
points=[
|
||||
PointStruct(
|
||||
id=1,
|
||||
vector=[0.1, 0.2, ...], # 384-dim vector
|
||||
payload={"title": "Doc 1", "category": "tech"}
|
||||
),
|
||||
PointStruct(
|
||||
id=2,
|
||||
vector=[0.3, 0.4, ...],
|
||||
payload={"title": "Doc 2", "category": "science"}
|
||||
)
|
||||
]
|
||||
)
|
||||
|
||||
# Search with filtering
|
||||
results = client.search(
|
||||
collection_name="documents",
|
||||
query_vector=[0.15, 0.25, ...],
|
||||
query_filter={
|
||||
"must": [{"key": "category", "match": {"value": "tech"}}]
|
||||
},
|
||||
limit=10
|
||||
)
|
||||
|
||||
for point in results:
|
||||
print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
|
||||
```
|
||||
|
||||
## Core concepts
|
||||
|
||||
### Points - Basic data unit
|
||||
|
||||
```python
|
||||
from qdrant_client.models import PointStruct
|
||||
|
||||
# Point = ID + Vector(s) + Payload
|
||||
point = PointStruct(
|
||||
id=123, # Integer or UUID string
|
||||
vector=[0.1, 0.2, 0.3, ...], # Dense vector
|
||||
payload={ # Arbitrary JSON metadata
|
||||
"title": "Document title",
|
||||
"category": "tech",
|
||||
"timestamp": 1699900000,
|
||||
"tags": ["python", "ml"]
|
||||
}
|
||||
)
|
||||
|
||||
# Batch upsert (recommended)
|
||||
client.upsert(
|
||||
collection_name="documents",
|
||||
points=[point1, point2, point3],
|
||||
wait=True # Wait for indexing
|
||||
)
|
||||
```
|
||||
|
||||
### Collections - Vector containers
|
||||
|
||||
```python
|
||||
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff
|
||||
|
||||
# Create with HNSW configuration
|
||||
client.create_collection(
|
||||
collection_name="documents",
|
||||
vectors_config=VectorParams(
|
||||
size=384, # Vector dimensions
|
||||
distance=Distance.COSINE # COSINE, EUCLID, DOT, MANHATTAN
|
||||
),
|
||||
hnsw_config=HnswConfigDiff(
|
||||
m=16, # Connections per node (default 16)
|
||||
ef_construct=100, # Build-time accuracy (default 100)
|
||||
full_scan_threshold=10000 # Switch to brute force below this
|
||||
),
|
||||
on_disk_payload=True # Store payload on disk
|
||||
)
|
||||
|
||||
# Collection info
|
||||
info = client.get_collection("documents")
|
||||
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
|
||||
```
|
||||
|
||||
### Distance metrics
|
||||
|
||||
| Metric | Use Case | Range |
|
||||
|--------|----------|-------|
|
||||
| `COSINE` | Text embeddings, normalized vectors | 0 to 2 |
|
||||
| `EUCLID` | Spatial data, image features | 0 to ∞ |
|
||||
| `DOT` | Recommendations, unnormalized | -∞ to ∞ |
|
||||
| `MANHATTAN` | Sparse features, discrete data | 0 to ∞ |
|
||||
|
||||
## Search operations
|
||||
|
||||
### Basic search
|
||||
|
||||
```python
|
||||
# Simple nearest neighbor search
|
||||
results = client.search(
|
||||
collection_name="documents",
|
||||
query_vector=[0.1, 0.2, ...],
|
||||
limit=10,
|
||||
with_payload=True,
|
||||
with_vectors=False # Don't return vectors (faster)
|
||||
)
|
||||
```
|
||||
|
||||
### Filtered search
|
||||
|
||||
```python
|
||||
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
|
||||
|
||||
# Complex filtering
|
||||
results = client.search(
|
||||
collection_name="documents",
|
||||
query_vector=query_embedding,
|
||||
query_filter=Filter(
|
||||
must=[
|
||||
FieldCondition(key="category", match=MatchValue(value="tech")),
|
||||
FieldCondition(key="timestamp", range=Range(gte=1699000000))
|
||||
],
|
||||
must_not=[
|
||||
FieldCondition(key="status", match=MatchValue(value="archived"))
|
||||
]
|
||||
),
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Shorthand filter syntax
|
||||
results = client.search(
|
||||
collection_name="documents",
|
||||
query_vector=query_embedding,
|
||||
query_filter={
|
||||
"must": [
|
||||
{"key": "category", "match": {"value": "tech"}},
|
||||
{"key": "price", "range": {"gte": 10, "lte": 100}}
|
||||
]
|
||||
},
|
||||
limit=10
|
||||
)
|
||||
```
|
||||
|
||||
### Batch search
|
||||
|
||||
```python
|
||||
from qdrant_client.models import SearchRequest
|
||||
|
||||
# Multiple queries in one request
|
||||
results = client.search_batch(
|
||||
collection_name="documents",
|
||||
requests=[
|
||||
SearchRequest(vector=[0.1, ...], limit=5),
|
||||
SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
|
||||
SearchRequest(vector=[0.3, ...], limit=10)
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
## RAG integration
|
||||
|
||||
### With sentence-transformers
|
||||
|
||||
```python
|
||||
from sentence_transformers import SentenceTransformer
|
||||
from qdrant_client import QdrantClient
|
||||
from qdrant_client.models import VectorParams, Distance, PointStruct
|
||||
|
||||
# Initialize
|
||||
encoder = SentenceTransformer("all-MiniLM-L6-v2")
|
||||
client = QdrantClient(host="localhost", port=6333)
|
||||
|
||||
# Create collection
|
||||
client.create_collection(
|
||||
collection_name="knowledge_base",
|
||||
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
|
||||
)
|
||||
|
||||
# Index documents
|
||||
documents = [
|
||||
{"id": 1, "text": "Python is a programming language", "source": "wiki"},
|
||||
{"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
|
||||
]
|
||||
|
||||
points = [
|
||||
PointStruct(
|
||||
id=doc["id"],
|
||||
vector=encoder.encode(doc["text"]).tolist(),
|
||||
payload={"text": doc["text"], "source": doc["source"]}
|
||||
)
|
||||
for doc in documents
|
||||
]
|
||||
client.upsert(collection_name="knowledge_base", points=points)
|
||||
|
||||
# RAG retrieval
|
||||
def retrieve(query: str, top_k: int = 5) -> list[dict]:
|
||||
query_vector = encoder.encode(query).tolist()
|
||||
results = client.search(
|
||||
collection_name="knowledge_base",
|
||||
query_vector=query_vector,
|
||||
limit=top_k
|
||||
)
|
||||
return [{"text": r.payload["text"], "score": r.score} for r in results]
|
||||
|
||||
# Use in RAG pipeline
|
||||
context = retrieve("What is Python?")
|
||||
prompt = f"Context: {context}\n\nQuestion: What is Python?"
|
||||
```
|
||||
|
||||
### With LangChain
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import Qdrant
|
||||
from langchain_community.embeddings import HuggingFaceEmbeddings
|
||||
|
||||
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
|
||||
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
|
||||
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
|
||||
```
|
||||
|
||||
### With LlamaIndex
|
||||
|
||||
```python
|
||||
from llama_index.vector_stores.qdrant import QdrantVectorStore
|
||||
from llama_index.core import VectorStoreIndex, StorageContext
|
||||
|
||||
vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
|
||||
storage_context = StorageContext.from_defaults(vector_store=vector_store)
|
||||
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
|
||||
query_engine = index.as_query_engine()
|
||||
```
|
||||
|
||||
## Multi-vector support
|
||||
|
||||
### Named vectors (different embedding models)
|
||||
|
||||
```python
|
||||
from qdrant_client.models import VectorParams, Distance
|
||||
|
||||
# Collection with multiple vector types
|
||||
client.create_collection(
|
||||
collection_name="hybrid_search",
|
||||
vectors_config={
|
||||
"dense": VectorParams(size=384, distance=Distance.COSINE),
|
||||
"sparse": VectorParams(size=30000, distance=Distance.DOT)
|
||||
}
|
||||
)
|
||||
|
||||
# Insert with named vectors
|
||||
client.upsert(
|
||||
collection_name="hybrid_search",
|
||||
points=[
|
||||
PointStruct(
|
||||
id=1,
|
||||
vector={
|
||||
"dense": dense_embedding,
|
||||
"sparse": sparse_embedding
|
||||
},
|
||||
payload={"text": "document text"}
|
||||
)
|
||||
]
|
||||
)
|
||||
|
||||
# Search specific vector
|
||||
results = client.search(
|
||||
collection_name="hybrid_search",
|
||||
query_vector=("dense", query_dense), # Specify which vector
|
||||
limit=10
|
||||
)
|
||||
```
|
||||
|
||||
### Sparse vectors (BM25, SPLADE)
|
||||
|
||||
```python
|
||||
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector
|
||||
|
||||
# Collection with sparse vectors
|
||||
client.create_collection(
|
||||
collection_name="sparse_search",
|
||||
vectors_config={},
|
||||
sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
|
||||
)
|
||||
|
||||
# Insert sparse vector
|
||||
client.upsert(
|
||||
collection_name="sparse_search",
|
||||
points=[PointStruct(id=1, vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])}, payload={"text": "document"})]
|
||||
)
|
||||
```
|
||||
|
||||
## Quantization (memory optimization)
|
||||
|
||||
```python
|
||||
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType
|
||||
|
||||
# Scalar quantization (4x memory reduction)
|
||||
client.create_collection(
|
||||
collection_name="quantized",
|
||||
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
|
||||
quantization_config=ScalarQuantization(
|
||||
scalar=ScalarQuantizationConfig(
|
||||
type=ScalarType.INT8,
|
||||
quantile=0.99, # Clip outliers
|
||||
always_ram=True # Keep quantized in RAM
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
# Search with rescoring
|
||||
results = client.search(
|
||||
collection_name="quantized",
|
||||
query_vector=query,
|
||||
search_params={"quantization": {"rescore": True}}, # Rescore top results
|
||||
limit=10
|
||||
)
|
||||
```
|
||||
|
||||
## Payload indexing
|
||||
|
||||
```python
|
||||
from qdrant_client.models import PayloadSchemaType
|
||||
|
||||
# Create payload index for faster filtering
|
||||
client.create_payload_index(
|
||||
collection_name="documents",
|
||||
field_name="category",
|
||||
field_schema=PayloadSchemaType.KEYWORD
|
||||
)
|
||||
|
||||
client.create_payload_index(
|
||||
collection_name="documents",
|
||||
field_name="timestamp",
|
||||
field_schema=PayloadSchemaType.INTEGER
|
||||
)
|
||||
|
||||
# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL
|
||||
```
|
||||
|
||||
## Production deployment
|
||||
|
||||
### Qdrant Cloud
|
||||
|
||||
```python
|
||||
from qdrant_client import QdrantClient
|
||||
|
||||
# Connect to Qdrant Cloud
|
||||
client = QdrantClient(
|
||||
url="https://your-cluster.cloud.qdrant.io",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
```
|
||||
|
||||
### Performance tuning
|
||||
|
||||
```python
|
||||
# Optimize for search speed (higher recall)
|
||||
client.update_collection(
|
||||
collection_name="documents",
|
||||
hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
|
||||
)
|
||||
|
||||
# Optimize for indexing speed (bulk loads)
|
||||
client.update_collection(
|
||||
collection_name="documents",
|
||||
optimizer_config={"indexing_threshold": 20000}
|
||||
)
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Batch operations** - Use batch upsert/search for efficiency
|
||||
2. **Payload indexing** - Index fields used in filters
|
||||
3. **Quantization** - Enable for large collections (>1M vectors)
|
||||
4. **Sharding** - Use for collections >10M vectors
|
||||
5. **On-disk storage** - Enable `on_disk_payload` for large payloads
|
||||
6. **Connection pooling** - Reuse client instances
|
||||
|
||||
## Common issues
|
||||
|
||||
**Slow search with filters:**
|
||||
```python
|
||||
# Create payload index for filtered fields
|
||||
client.create_payload_index(
|
||||
collection_name="docs",
|
||||
field_name="category",
|
||||
field_schema=PayloadSchemaType.KEYWORD
|
||||
)
|
||||
```
|
||||
|
||||
**Out of memory:**
|
||||
```python
|
||||
# Enable quantization and on-disk storage
|
||||
client.create_collection(
|
||||
collection_name="large_collection",
|
||||
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
|
||||
quantization_config=ScalarQuantization(...),
|
||||
on_disk_payload=True
|
||||
)
|
||||
```
|
||||
|
||||
**Connection issues:**
|
||||
```python
|
||||
# Use timeout and retry
|
||||
client = QdrantClient(
|
||||
host="localhost",
|
||||
port=6333,
|
||||
timeout=30,
|
||||
prefer_grpc=True # gRPC for better performance
|
||||
)
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/qdrant/references/advanced-usage.md)** - Distributed mode, hybrid search, recommendations
|
||||
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/qdrant/references/troubleshooting.md)** - Common issues, debugging, performance tuning
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/qdrant/qdrant (22k+ stars)
|
||||
- **Docs**: https://qdrant.tech/documentation/
|
||||
- **Python Client**: https://github.com/qdrant/qdrant-client
|
||||
- **Cloud**: https://cloud.qdrant.io
|
||||
- **Version**: 1.12.0+
|
||||
- **License**: Apache 2.0
|
||||
406
website/docs/user-guide/skills/optional/mlops/mlops-saelens.md
Normal file
406
website/docs/user-guide/skills/optional/mlops/mlops-saelens.md
Normal file
|
|
@ -0,0 +1,406 @@
|
|||
---
|
||||
title: "Sparse Autoencoder Training"
|
||||
sidebar_label: "Sparse Autoencoder Training"
|
||||
description: "Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Sparse Autoencoder Training
|
||||
|
||||
Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/saelens` |
|
||||
| Path | `optional-skills/mlops/saelens` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `sae-lens>=6.0.0`, `transformer-lens>=2.0.0`, `torch>=2.0.0` |
|
||||
| Tags | `Sparse Autoencoders`, `SAE`, `Mechanistic Interpretability`, `Feature Discovery`, `Superposition` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# SAELens: Sparse Autoencoders for Mechanistic Interpretability
|
||||
|
||||
SAELens is the primary library for training and analyzing Sparse Autoencoders (SAEs) - a technique for decomposing polysemantic neural network activations into sparse, interpretable features. Based on Anthropic's groundbreaking research on monosemanticity.
|
||||
|
||||
**GitHub**: [jbloomAus/SAELens](https://github.com/jbloomAus/SAELens) (1,100+ stars)
|
||||
|
||||
## The Problem: Polysemanticity & Superposition
|
||||
|
||||
Individual neurons in neural networks are **polysemantic** - they activate in multiple, semantically distinct contexts. This happens because models use **superposition** to represent more features than they have neurons, making interpretability difficult.
|
||||
|
||||
**SAEs solve this** by decomposing dense activations into sparse, monosemantic features - typically only a small number of features activate for any given input, and each feature corresponds to an interpretable concept.
|
||||
|
||||
## When to Use SAELens
|
||||
|
||||
**Use SAELens when you need to:**
|
||||
- Discover interpretable features in model activations
|
||||
- Understand what concepts a model has learned
|
||||
- Study superposition and feature geometry
|
||||
- Perform feature-based steering or ablation
|
||||
- Analyze safety-relevant features (deception, bias, harmful content)
|
||||
|
||||
**Consider alternatives when:**
|
||||
- You need basic activation analysis → Use **TransformerLens** directly
|
||||
- You want causal intervention experiments → Use **pyvene** or **TransformerLens**
|
||||
- You need production steering → Consider direct activation engineering
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install sae-lens
|
||||
```
|
||||
|
||||
Requirements: Python 3.10+, transformer-lens>=2.0.0
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### What SAEs Learn
|
||||
|
||||
SAEs are trained to reconstruct model activations through a sparse bottleneck:
|
||||
|
||||
```
|
||||
Input Activation → Encoder → Sparse Features → Decoder → Reconstructed Activation
|
||||
(d_model) ↓ (d_sae >> d_model) ↓ (d_model)
|
||||
sparsity reconstruction
|
||||
penalty loss
|
||||
```
|
||||
|
||||
**Loss Function**: `MSE(original, reconstructed) + L1_coefficient × L1(features)`
|
||||
|
||||
### Key Validation (Anthropic Research)
|
||||
|
||||
In "Towards Monosemanticity", human evaluators found **70% of SAE features genuinely interpretable**. Features discovered include:
|
||||
- DNA sequences, legal language, HTTP requests
|
||||
- Hebrew text, nutrition statements, code syntax
|
||||
- Sentiment, named entities, grammatical structures
|
||||
|
||||
## Workflow 1: Loading and Analyzing Pre-trained SAEs
|
||||
|
||||
### Step-by-Step
|
||||
|
||||
```python
|
||||
from transformer_lens import HookedTransformer
|
||||
from sae_lens import SAE
|
||||
|
||||
# 1. Load model and pre-trained SAE
|
||||
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
|
||||
sae, cfg_dict, sparsity = SAE.from_pretrained(
|
||||
release="gpt2-small-res-jb",
|
||||
sae_id="blocks.8.hook_resid_pre",
|
||||
device="cuda"
|
||||
)
|
||||
|
||||
# 2. Get model activations
|
||||
tokens = model.to_tokens("The capital of France is Paris")
|
||||
_, cache = model.run_with_cache(tokens)
|
||||
activations = cache["resid_pre", 8] # [batch, pos, d_model]
|
||||
|
||||
# 3. Encode to SAE features
|
||||
sae_features = sae.encode(activations) # [batch, pos, d_sae]
|
||||
print(f"Active features: {(sae_features > 0).sum()}")
|
||||
|
||||
# 4. Find top features for each position
|
||||
for pos in range(tokens.shape[1]):
|
||||
top_features = sae_features[0, pos].topk(5)
|
||||
token = model.to_str_tokens(tokens[0, pos:pos+1])[0]
|
||||
print(f"Token '{token}': features {top_features.indices.tolist()}")
|
||||
|
||||
# 5. Reconstruct activations
|
||||
reconstructed = sae.decode(sae_features)
|
||||
reconstruction_error = (activations - reconstructed).norm()
|
||||
```
|
||||
|
||||
### Available Pre-trained SAEs
|
||||
|
||||
| Release | Model | Layers |
|
||||
|---------|-------|--------|
|
||||
| `gpt2-small-res-jb` | GPT-2 Small | Multiple residual streams |
|
||||
| `gemma-2b-res` | Gemma 2B | Residual streams |
|
||||
| Various on HuggingFace | Search tag `saelens` | Various |
|
||||
|
||||
### Checklist
|
||||
- [ ] Load model with TransformerLens
|
||||
- [ ] Load matching SAE for target layer
|
||||
- [ ] Encode activations to sparse features
|
||||
- [ ] Identify top-activating features per token
|
||||
- [ ] Validate reconstruction quality
|
||||
|
||||
## Workflow 2: Training a Custom SAE
|
||||
|
||||
### Step-by-Step
|
||||
|
||||
```python
|
||||
from sae_lens import SAE, LanguageModelSAERunnerConfig, SAETrainingRunner
|
||||
|
||||
# 1. Configure training
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
# Model
|
||||
model_name="gpt2-small",
|
||||
hook_name="blocks.8.hook_resid_pre",
|
||||
hook_layer=8,
|
||||
d_in=768, # Model dimension
|
||||
|
||||
# SAE architecture
|
||||
architecture="standard", # or "gated", "topk"
|
||||
d_sae=768 * 8, # Expansion factor of 8
|
||||
activation_fn="relu",
|
||||
|
||||
# Training
|
||||
lr=4e-4,
|
||||
l1_coefficient=8e-5, # Sparsity penalty
|
||||
l1_warm_up_steps=1000,
|
||||
train_batch_size_tokens=4096,
|
||||
training_tokens=100_000_000,
|
||||
|
||||
# Data
|
||||
dataset_path="monology/pile-uncopyrighted",
|
||||
context_size=128,
|
||||
|
||||
# Logging
|
||||
log_to_wandb=True,
|
||||
wandb_project="sae-training",
|
||||
|
||||
# Checkpointing
|
||||
checkpoint_path="checkpoints",
|
||||
n_checkpoints=5,
|
||||
)
|
||||
|
||||
# 2. Train
|
||||
trainer = SAETrainingRunner(cfg)
|
||||
sae = trainer.run()
|
||||
|
||||
# 3. Evaluate
|
||||
print(f"L0 (avg active features): {trainer.metrics['l0']}")
|
||||
print(f"CE Loss Recovered: {trainer.metrics['ce_loss_score']}")
|
||||
```
|
||||
|
||||
### Key Hyperparameters
|
||||
|
||||
| Parameter | Typical Value | Effect |
|
||||
|-----------|---------------|--------|
|
||||
| `d_sae` | 4-16× d_model | More features, higher capacity |
|
||||
| `l1_coefficient` | 5e-5 to 1e-4 | Higher = sparser, less accurate |
|
||||
| `lr` | 1e-4 to 1e-3 | Standard optimizer LR |
|
||||
| `l1_warm_up_steps` | 500-2000 | Prevents early feature death |
|
||||
|
||||
### Evaluation Metrics
|
||||
|
||||
| Metric | Target | Meaning |
|
||||
|--------|--------|---------|
|
||||
| **L0** | 50-200 | Average active features per token |
|
||||
| **CE Loss Score** | 80-95% | Cross-entropy recovered vs original |
|
||||
| **Dead Features** | <5% | Features that never activate |
|
||||
| **Explained Variance** | >90% | Reconstruction quality |
|
||||
|
||||
### Checklist
|
||||
- [ ] Choose target layer and hook point
|
||||
- [ ] Set expansion factor (d_sae = 4-16× d_model)
|
||||
- [ ] Tune L1 coefficient for desired sparsity
|
||||
- [ ] Enable L1 warm-up to prevent dead features
|
||||
- [ ] Monitor metrics during training (W&B)
|
||||
- [ ] Validate L0 and CE loss recovery
|
||||
- [ ] Check dead feature ratio
|
||||
|
||||
## Workflow 3: Feature Analysis and Steering
|
||||
|
||||
### Analyzing Individual Features
|
||||
|
||||
```python
|
||||
from transformer_lens import HookedTransformer
|
||||
from sae_lens import SAE
|
||||
import torch
|
||||
|
||||
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
|
||||
sae, _, _ = SAE.from_pretrained(
|
||||
release="gpt2-small-res-jb",
|
||||
sae_id="blocks.8.hook_resid_pre",
|
||||
device="cuda"
|
||||
)
|
||||
|
||||
# Find what activates a specific feature
|
||||
feature_idx = 1234
|
||||
test_texts = [
|
||||
"The scientist conducted an experiment",
|
||||
"I love chocolate cake",
|
||||
"The code compiles successfully",
|
||||
"Paris is beautiful in spring",
|
||||
]
|
||||
|
||||
for text in test_texts:
|
||||
tokens = model.to_tokens(text)
|
||||
_, cache = model.run_with_cache(tokens)
|
||||
features = sae.encode(cache["resid_pre", 8])
|
||||
activation = features[0, :, feature_idx].max().item()
|
||||
print(f"{activation:.3f}: {text}")
|
||||
```
|
||||
|
||||
### Feature Steering
|
||||
|
||||
```python
|
||||
def steer_with_feature(model, sae, prompt, feature_idx, strength=5.0):
|
||||
"""Add SAE feature direction to residual stream."""
|
||||
tokens = model.to_tokens(prompt)
|
||||
|
||||
# Get feature direction from decoder
|
||||
feature_direction = sae.W_dec[feature_idx] # [d_model]
|
||||
|
||||
def steering_hook(activation, hook):
|
||||
# Add scaled feature direction at all positions
|
||||
activation += strength * feature_direction
|
||||
return activation
|
||||
|
||||
# Generate with steering
|
||||
output = model.generate(
|
||||
tokens,
|
||||
max_new_tokens=50,
|
||||
fwd_hooks=[("blocks.8.hook_resid_pre", steering_hook)]
|
||||
)
|
||||
return model.to_string(output[0])
|
||||
```
|
||||
|
||||
### Feature Attribution
|
||||
|
||||
```python
|
||||
# Which features most affect a specific output?
|
||||
tokens = model.to_tokens("The capital of France is")
|
||||
_, cache = model.run_with_cache(tokens)
|
||||
|
||||
# Get features at final position
|
||||
features = sae.encode(cache["resid_pre", 8])[0, -1] # [d_sae]
|
||||
|
||||
# Get logit attribution per feature
|
||||
# Feature contribution = feature_activation × decoder_weight × unembedding
|
||||
W_dec = sae.W_dec # [d_sae, d_model]
|
||||
W_U = model.W_U # [d_model, vocab]
|
||||
|
||||
# Contribution to "Paris" logit
|
||||
paris_token = model.to_single_token(" Paris")
|
||||
feature_contributions = features * (W_dec @ W_U[:, paris_token])
|
||||
|
||||
top_features = feature_contributions.topk(10)
|
||||
print("Top features for 'Paris' prediction:")
|
||||
for idx, val in zip(top_features.indices, top_features.values):
|
||||
print(f" Feature {idx.item()}: {val.item():.3f}")
|
||||
```
|
||||
|
||||
## Common Issues & Solutions
|
||||
|
||||
### Issue: High dead feature ratio
|
||||
```python
|
||||
# WRONG: No warm-up, features die early
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
l1_coefficient=1e-4,
|
||||
l1_warm_up_steps=0, # Bad!
|
||||
)
|
||||
|
||||
# RIGHT: Warm-up L1 penalty
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
l1_coefficient=8e-5,
|
||||
l1_warm_up_steps=1000, # Gradually increase
|
||||
use_ghost_grads=True, # Revive dead features
|
||||
)
|
||||
```
|
||||
|
||||
### Issue: Poor reconstruction (low CE recovery)
|
||||
```python
|
||||
# Reduce sparsity penalty
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
l1_coefficient=5e-5, # Lower = better reconstruction
|
||||
d_sae=768 * 16, # More capacity
|
||||
)
|
||||
```
|
||||
|
||||
### Issue: Features not interpretable
|
||||
```python
|
||||
# Increase sparsity (higher L1)
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
l1_coefficient=1e-4, # Higher = sparser, more interpretable
|
||||
)
|
||||
# Or use TopK architecture
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
architecture="topk",
|
||||
activation_fn_kwargs={"k": 50}, # Exactly 50 active features
|
||||
)
|
||||
```
|
||||
|
||||
### Issue: Memory errors during training
|
||||
```python
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
train_batch_size_tokens=2048, # Reduce batch size
|
||||
store_batch_size_prompts=4, # Fewer prompts in buffer
|
||||
n_batches_in_buffer=8, # Smaller activation buffer
|
||||
)
|
||||
```
|
||||
|
||||
## Integration with Neuronpedia
|
||||
|
||||
Browse pre-trained SAE features at [neuronpedia.org](https://neuronpedia.org):
|
||||
|
||||
```python
|
||||
# Features are indexed by SAE ID
|
||||
# Example: gpt2-small layer 8 feature 1234
|
||||
# → neuronpedia.org/gpt2-small/8-res-jb/1234
|
||||
```
|
||||
|
||||
## Key Classes Reference
|
||||
|
||||
| Class | Purpose |
|
||||
|-------|---------|
|
||||
| `SAE` | Sparse Autoencoder model |
|
||||
| `LanguageModelSAERunnerConfig` | Training configuration |
|
||||
| `SAETrainingRunner` | Training loop manager |
|
||||
| `ActivationsStore` | Activation collection and batching |
|
||||
| `HookedSAETransformer` | TransformerLens + SAE integration |
|
||||
|
||||
## Reference Documentation
|
||||
|
||||
For detailed API documentation, tutorials, and advanced usage, see the `references/` folder:
|
||||
|
||||
| File | Contents |
|
||||
|------|----------|
|
||||
| [references/README.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/saelens/references/README.md) | Overview and quick start guide |
|
||||
| [references/api.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/saelens/references/api.md) | Complete API reference for SAE, TrainingSAE, configurations |
|
||||
| [references/tutorials.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/saelens/references/tutorials.md) | Step-by-step tutorials for training, analysis, steering |
|
||||
|
||||
## External Resources
|
||||
|
||||
### Tutorials
|
||||
- [Basic Loading & Analysis](https://github.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
|
||||
- [Training a Sparse Autoencoder](https://github.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
|
||||
- [ARENA SAE Curriculum](https://www.lesswrong.com/posts/LnHowHgmrMbWtpkxx/intro-to-superposition-and-sparse-autoencoders-colab)
|
||||
|
||||
### Papers
|
||||
- [Towards Monosemanticity](https://transformer-circuits.pub/2023/monosemantic-features) - Anthropic (2023)
|
||||
- [Scaling Monosemanticity](https://transformer-circuits.pub/2024/scaling-monosemanticity/) - Anthropic (2024)
|
||||
- [Sparse Autoencoders Find Highly Interpretable Features](https://arxiv.org/abs/2309.08600) - Cunningham et al. (ICLR 2024)
|
||||
|
||||
### Official Documentation
|
||||
- [SAELens Docs](https://jbloomaus.github.io/SAELens/)
|
||||
- [Neuronpedia](https://neuronpedia.org) - Feature browser
|
||||
|
||||
## SAE Architectures
|
||||
|
||||
| Architecture | Description | Use Case |
|
||||
|--------------|-------------|----------|
|
||||
| **Standard** | ReLU + L1 penalty | General purpose |
|
||||
| **Gated** | Learned gating mechanism | Better sparsity control |
|
||||
| **TopK** | Exactly K active features | Consistent sparsity |
|
||||
|
||||
```python
|
||||
# TopK SAE (exactly 50 features active)
|
||||
cfg = LanguageModelSAERunnerConfig(
|
||||
architecture="topk",
|
||||
activation_fn="topk",
|
||||
activation_fn_kwargs={"k": 50},
|
||||
)
|
||||
```
|
||||
236
website/docs/user-guide/skills/optional/mlops/mlops-simpo.md
Normal file
236
website/docs/user-guide/skills/optional/mlops/mlops-simpo.md
Normal file
|
|
@ -0,0 +1,236 @@
|
|||
---
|
||||
title: "Simpo Training — Simple Preference Optimization for LLM alignment"
|
||||
sidebar_label: "Simpo Training"
|
||||
description: "Simple Preference Optimization for LLM alignment"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Simpo Training
|
||||
|
||||
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/simpo` |
|
||||
| Path | `optional-skills/mlops/simpo` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `torch`, `transformers`, `datasets`, `trl`, `accelerate` |
|
||||
| Tags | `Post-Training`, `SimPO`, `Preference Optimization`, `Alignment`, `DPO Alternative`, `Reference-Free`, `LLM Alignment`, `Efficient Training` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# SimPO - Simple Preference Optimization
|
||||
|
||||
## Quick start
|
||||
|
||||
SimPO is a reference-free preference optimization method that outperforms DPO without needing a reference model.
|
||||
|
||||
**Installation**:
|
||||
```bash
|
||||
# Create environment
|
||||
conda create -n simpo python=3.10 && conda activate simpo
|
||||
|
||||
# Install PyTorch 2.2.2
|
||||
# Visit: https://pytorch.org/get-started/locally/
|
||||
|
||||
# Install alignment-handbook
|
||||
git clone https://github.com/huggingface/alignment-handbook.git
|
||||
cd alignment-handbook
|
||||
python -m pip install .
|
||||
|
||||
# Install Flash Attention 2
|
||||
python -m pip install flash-attn --no-build-isolation
|
||||
```
|
||||
|
||||
**Training** (Mistral 7B):
|
||||
```bash
|
||||
ACCELERATE_LOG_LEVEL=info accelerate launch \
|
||||
--config_file accelerate_configs/deepspeed_zero3.yaml \
|
||||
scripts/run_simpo.py \
|
||||
training_configs/mistral-7b-base-simpo.yaml
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: Train from base model (Mistral 7B)
|
||||
|
||||
**Config** (`mistral-7b-base-simpo.yaml`):
|
||||
```yaml
|
||||
# Model
|
||||
model_name_or_path: mistralai/Mistral-7B-v0.1
|
||||
torch_dtype: bfloat16
|
||||
|
||||
# Dataset
|
||||
dataset_mixer:
|
||||
HuggingFaceH4/ultrafeedback_binarized: 1.0
|
||||
dataset_splits:
|
||||
- train_prefs
|
||||
- test_prefs
|
||||
|
||||
# SimPO hyperparameters
|
||||
beta: 2.0 # Reward scaling (2.0-10.0)
|
||||
gamma_beta_ratio: 0.5 # Target margin (0-1)
|
||||
loss_type: sigmoid # sigmoid or hinge
|
||||
sft_weight: 0.0 # Optional SFT regularization
|
||||
|
||||
# Training
|
||||
learning_rate: 5e-7 # Critical: 3e-7 to 1e-6
|
||||
num_train_epochs: 1
|
||||
per_device_train_batch_size: 1
|
||||
gradient_accumulation_steps: 8
|
||||
|
||||
# Output
|
||||
output_dir: ./outputs/mistral-7b-simpo
|
||||
```
|
||||
|
||||
**Launch training**:
|
||||
```bash
|
||||
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
|
||||
scripts/run_simpo.py training_configs/mistral-7b-base-simpo.yaml
|
||||
```
|
||||
|
||||
### Workflow 2: Fine-tune instruct model (Llama 3 8B)
|
||||
|
||||
**Config** (`llama3-8b-instruct-simpo.yaml`):
|
||||
```yaml
|
||||
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
|
||||
|
||||
dataset_mixer:
|
||||
argilla/ultrafeedback-binarized-preferences-cleaned: 1.0
|
||||
|
||||
beta: 2.5
|
||||
gamma_beta_ratio: 0.5
|
||||
learning_rate: 5e-7
|
||||
sft_weight: 0.1 # Add SFT loss to preserve capabilities
|
||||
|
||||
num_train_epochs: 1
|
||||
per_device_train_batch_size: 2
|
||||
gradient_accumulation_steps: 4
|
||||
output_dir: ./outputs/llama3-8b-simpo
|
||||
```
|
||||
|
||||
**Launch**:
|
||||
```bash
|
||||
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
|
||||
scripts/run_simpo.py training_configs/llama3-8b-instruct-simpo.yaml
|
||||
```
|
||||
|
||||
### Workflow 3: Reasoning-intensive tasks (lower LR)
|
||||
|
||||
**For math/code tasks**:
|
||||
```yaml
|
||||
model_name_or_path: deepseek-ai/deepseek-math-7b-base
|
||||
|
||||
dataset_mixer:
|
||||
argilla/distilabel-math-preference-dpo: 1.0
|
||||
|
||||
beta: 5.0 # Higher for stronger signal
|
||||
gamma_beta_ratio: 0.7 # Larger margin
|
||||
learning_rate: 3e-7 # Lower LR for reasoning
|
||||
sft_weight: 0.0
|
||||
|
||||
num_train_epochs: 1
|
||||
per_device_train_batch_size: 1
|
||||
gradient_accumulation_steps: 16
|
||||
```
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use SimPO when**:
|
||||
- Want simpler training than DPO (no reference model)
|
||||
- Have preference data (chosen/rejected pairs)
|
||||
- Need better performance than DPO
|
||||
- Limited compute resources
|
||||
- Single-node training sufficient
|
||||
|
||||
**Algorithm selection**:
|
||||
- **SimPO**: Simplest, best performance, no reference model
|
||||
- **DPO**: Need reference model baseline, more conservative
|
||||
- **PPO**: Maximum control, need reward model, complex setup
|
||||
- **GRPO**: Memory-efficient RL, no critic
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **OpenRLHF**: Multi-node distributed training, PPO/GRPO
|
||||
- **TRL**: Need multiple methods in one framework
|
||||
- **DPO**: Established baseline comparison
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: Loss divergence**
|
||||
|
||||
Reduce learning rate:
|
||||
```yaml
|
||||
learning_rate: 3e-7 # Reduce from 5e-7
|
||||
```
|
||||
|
||||
Reduce beta:
|
||||
```yaml
|
||||
beta: 1.0 # Reduce from 2.0
|
||||
```
|
||||
|
||||
**Issue: Model forgets capabilities**
|
||||
|
||||
Add SFT regularization:
|
||||
```yaml
|
||||
sft_weight: 0.1 # Add SFT loss component
|
||||
```
|
||||
|
||||
**Issue: Poor preference separation**
|
||||
|
||||
Increase beta and margin:
|
||||
```yaml
|
||||
beta: 5.0 # Increase from 2.0
|
||||
gamma_beta_ratio: 0.8 # Increase from 0.5
|
||||
```
|
||||
|
||||
**Issue: OOM during training**
|
||||
|
||||
Reduce batch size:
|
||||
```yaml
|
||||
per_device_train_batch_size: 1
|
||||
gradient_accumulation_steps: 16 # Maintain effective batch
|
||||
```
|
||||
|
||||
Enable gradient checkpointing:
|
||||
```yaml
|
||||
gradient_checkpointing: true
|
||||
```
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**Loss functions**: See [references/loss-functions.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/simpo/references/loss-functions.md) for sigmoid vs hinge loss, mathematical formulations, and when to use each.
|
||||
|
||||
**Hyperparameter tuning**: See [references/hyperparameters.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/simpo/references/hyperparameters.md) for beta, gamma, learning rate selection guide, and model-size-specific recommendations.
|
||||
|
||||
**Dataset preparation**: See [references/datasets.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/simpo/references/datasets.md) for preference data formats, quality filtering, and custom dataset creation.
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
- **GPU**: NVIDIA A100/H100 recommended
|
||||
- **VRAM**:
|
||||
- 7B model: 1× A100 40GB (DeepSpeed ZeRO-3)
|
||||
- 8B model: 2× A100 40GB
|
||||
- 70B model: 8× A100 80GB
|
||||
- **Single-node**: DeepSpeed ZeRO-3 sufficient
|
||||
- **Mixed precision**: BF16 recommended
|
||||
|
||||
**Memory optimization**:
|
||||
- DeepSpeed ZeRO-3 (default config)
|
||||
- Gradient checkpointing
|
||||
- Flash Attention 2
|
||||
|
||||
## Resources
|
||||
|
||||
- Paper: https://arxiv.org/abs/2405.14734 (NeurIPS 2024)
|
||||
- GitHub: https://github.com/princeton-nlp/SimPO
|
||||
- Models: https://huggingface.co/princeton-nlp
|
||||
- Alignment Handbook: https://github.com/huggingface/alignment-handbook
|
||||
483
website/docs/user-guide/skills/optional/mlops/mlops-slime.md
Normal file
483
website/docs/user-guide/skills/optional/mlops/mlops-slime.md
Normal file
|
|
@ -0,0 +1,483 @@
|
|||
---
|
||||
title: "Slime Rl Training — Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework"
|
||||
sidebar_label: "Slime Rl Training"
|
||||
description: "Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Slime Rl Training
|
||||
|
||||
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/slime` |
|
||||
| Path | `optional-skills/mlops/slime` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `sglang-router>=0.2.3`, `ray`, `torch>=2.0.0`, `transformers>=4.40.0` |
|
||||
| Tags | `Reinforcement Learning`, `Megatron-LM`, `SGLang`, `GRPO`, `Post-Training`, `GLM` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# slime: LLM Post-Training Framework for RL Scaling
|
||||
|
||||
slime is an LLM post-training framework from Tsinghua's THUDM team, powering GLM-4.5, GLM-4.6, and GLM-4.7. It connects Megatron-LM for training with SGLang for high-throughput rollout generation.
|
||||
|
||||
## When to Use slime
|
||||
|
||||
**Choose slime when you need:**
|
||||
- Megatron-LM native training with SGLang inference
|
||||
- Custom data generation workflows with flexible data buffers
|
||||
- Training GLM, Qwen3, DeepSeek V3, or Llama 3 models
|
||||
- Research-grade framework with production backing (Z.ai)
|
||||
|
||||
**Consider alternatives when:**
|
||||
- You need enterprise-grade stability features → use **miles**
|
||||
- You want flexible backend swapping → use **verl**
|
||||
- You need PyTorch-native abstractions → use **torchforge**
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Training**: Megatron-LM with full parallelism support (TP, PP, DP, SP)
|
||||
- **Rollout**: SGLang-based high-throughput generation with router
|
||||
- **Data Buffer**: Flexible prompt management and sample storage
|
||||
- **Models**: GLM-4.x, Qwen3, DeepSeek V3/R1, Llama 3
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ Data Buffer │
|
||||
│ - Prompt initialization and management │
|
||||
│ - Custom data generation and filtering │
|
||||
│ - Rollout sample storage │
|
||||
└─────────────┬───────────────────────────┬───────────────┘
|
||||
│ │
|
||||
┌─────────────▼───────────┐ ┌─────────────▼───────────────┐
|
||||
│ Training (Megatron-LM) │ │ Rollout (SGLang + Router) │
|
||||
│ - Actor model training │ │ - Response generation │
|
||||
│ - Critic (optional) │ │ - Reward/verifier output │
|
||||
│ - Weight sync to rollout│ │ - Multi-turn support │
|
||||
└─────────────────────────┘ └─────────────────────────────┘
|
||||
```
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Recommended: Docker
|
||||
docker pull slimerl/slime:latest
|
||||
docker run --rm --gpus all --ipc=host --shm-size=16g \
|
||||
-it slimerl/slime:latest /bin/bash
|
||||
|
||||
# Inside container
|
||||
cd /root/slime && pip install -e . --no-deps
|
||||
```
|
||||
|
||||
### From Source
|
||||
|
||||
```bash
|
||||
git clone https://github.com/THUDM/slime.git
|
||||
cd slime
|
||||
pip install -r requirements.txt
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
## Quick Start: GRPO Training
|
||||
|
||||
```bash
|
||||
# Source model configuration
|
||||
source scripts/models/qwen3-4B.sh
|
||||
|
||||
# Launch training
|
||||
python train.py \
|
||||
--actor-num-nodes 1 \
|
||||
--actor-num-gpus-per-node 4 \
|
||||
--rollout-num-gpus 4 \
|
||||
--advantage-estimator grpo \
|
||||
--use-kl-loss --kl-loss-coef 0.001 \
|
||||
--rollout-batch-size 32 \
|
||||
--n-samples-per-prompt 8 \
|
||||
--global-batch-size 256 \
|
||||
--num-rollout 3000 \
|
||||
--prompt-data /path/to/data.jsonl \
|
||||
${MODEL_ARGS[@]} ${CKPT_ARGS[@]}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow 1: Standard GRPO Training
|
||||
|
||||
Use this workflow for training reasoning models with group-relative advantages.
|
||||
|
||||
### Prerequisites Checklist
|
||||
- [ ] Docker environment or Megatron-LM + SGLang installed
|
||||
- [ ] Model checkpoint (HuggingFace or Megatron format)
|
||||
- [ ] Training data in JSONL format
|
||||
|
||||
### Step 1: Prepare Data
|
||||
|
||||
```python
|
||||
# data.jsonl format
|
||||
{"prompt": "What is 2 + 2?", "label": "4"}
|
||||
{"prompt": "Solve: 3x = 12", "label": "x = 4"}
|
||||
```
|
||||
|
||||
Or with chat format:
|
||||
```python
|
||||
{
|
||||
"prompt": [
|
||||
{"role": "system", "content": "You are a math tutor."},
|
||||
{"role": "user", "content": "What is 15 + 27?"}
|
||||
],
|
||||
"label": "42"
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2: Configure Model
|
||||
|
||||
Choose a pre-configured model script:
|
||||
|
||||
```bash
|
||||
# List available models
|
||||
ls scripts/models/
|
||||
# glm4-9B.sh, qwen3-4B.sh, qwen3-30B-A3B.sh, deepseek-v3.sh, llama3-8B.sh, ...
|
||||
|
||||
# Source your model
|
||||
source scripts/models/qwen3-4B.sh
|
||||
```
|
||||
|
||||
### Step 3: Launch Training
|
||||
|
||||
```bash
|
||||
python train.py \
|
||||
--actor-num-nodes 1 \
|
||||
--actor-num-gpus-per-node 8 \
|
||||
--rollout-num-gpus 8 \
|
||||
--advantage-estimator grpo \
|
||||
--use-kl-loss \
|
||||
--kl-loss-coef 0.001 \
|
||||
--prompt-data /path/to/train.jsonl \
|
||||
--input-key prompt \
|
||||
--label-key label \
|
||||
--apply-chat-template \
|
||||
--rollout-batch-size 32 \
|
||||
--n-samples-per-prompt 8 \
|
||||
--global-batch-size 256 \
|
||||
--num-rollout 3000 \
|
||||
--save-interval 100 \
|
||||
--eval-interval 50 \
|
||||
${MODEL_ARGS[@]}
|
||||
```
|
||||
|
||||
### Step 4: Monitor Training
|
||||
- [ ] Check TensorBoard: `tensorboard --logdir outputs/`
|
||||
- [ ] Verify reward curves are increasing
|
||||
- [ ] Monitor GPU utilization across nodes
|
||||
|
||||
---
|
||||
|
||||
## Workflow 2: Asynchronous Training
|
||||
|
||||
Use async mode for higher throughput by overlapping rollout and training.
|
||||
|
||||
### When to Use Async
|
||||
- Large models with long generation times
|
||||
- High GPU idle time in synchronous mode
|
||||
- Sufficient memory for buffering
|
||||
|
||||
### Launch Async Training
|
||||
|
||||
```bash
|
||||
python train_async.py \
|
||||
--actor-num-nodes 1 \
|
||||
--actor-num-gpus-per-node 8 \
|
||||
--rollout-num-gpus 8 \
|
||||
--advantage-estimator grpo \
|
||||
--async-buffer-size 4 \
|
||||
--prompt-data /path/to/train.jsonl \
|
||||
${MODEL_ARGS[@]}
|
||||
```
|
||||
|
||||
### Async-Specific Parameters
|
||||
|
||||
```bash
|
||||
--async-buffer-size 4 # Number of rollouts to buffer
|
||||
--update-weights-interval 2 # Sync weights every N rollouts
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow 3: Multi-Turn Agentic Training
|
||||
|
||||
Use this workflow for training agents with tool use or multi-step reasoning.
|
||||
|
||||
### Prerequisites
|
||||
- [ ] Custom generate function for multi-turn logic
|
||||
- [ ] Tool/environment interface
|
||||
|
||||
### Step 1: Define Custom Generate Function
|
||||
|
||||
```python
|
||||
# custom_generate.py
|
||||
async def custom_generate(args, samples, evaluation=False):
|
||||
"""Multi-turn generation with tool calling."""
|
||||
for sample in samples:
|
||||
conversation = sample.prompt
|
||||
|
||||
for turn in range(args.max_turns):
|
||||
# Generate response
|
||||
response = await generate_single(conversation)
|
||||
|
||||
# Check for tool call
|
||||
tool_call = extract_tool_call(response)
|
||||
if tool_call:
|
||||
tool_result = execute_tool(tool_call)
|
||||
conversation.append({"role": "assistant", "content": response})
|
||||
conversation.append({"role": "tool", "content": tool_result})
|
||||
else:
|
||||
break
|
||||
|
||||
sample.response = response
|
||||
sample.reward = compute_reward(sample)
|
||||
|
||||
return samples
|
||||
```
|
||||
|
||||
### Step 2: Launch with Custom Function
|
||||
|
||||
```bash
|
||||
python train.py \
|
||||
--custom-generate-function-path custom_generate.py \
|
||||
--max-turns 5 \
|
||||
--prompt-data /path/to/agent_data.jsonl \
|
||||
${MODEL_ARGS[@]}
|
||||
```
|
||||
|
||||
See `examples/search-r1/` for a complete multi-turn search example.
|
||||
|
||||
---
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
### Three Argument Categories
|
||||
|
||||
slime uses three types of arguments:
|
||||
|
||||
**1. Megatron Arguments** (passed directly):
|
||||
```bash
|
||||
--tensor-model-parallel-size 2
|
||||
--pipeline-model-parallel-size 1
|
||||
--num-layers 32
|
||||
--hidden-size 4096
|
||||
```
|
||||
|
||||
**2. SGLang Arguments** (prefixed with `--sglang-`):
|
||||
```bash
|
||||
--sglang-mem-fraction-static 0.8
|
||||
--sglang-context-length 8192
|
||||
--sglang-log-level INFO
|
||||
```
|
||||
|
||||
**3. slime Arguments**:
|
||||
```bash
|
||||
# Resource allocation
|
||||
--actor-num-nodes 1
|
||||
--actor-num-gpus-per-node 8
|
||||
--rollout-num-gpus 8
|
||||
--colocate # Share GPUs between training/inference
|
||||
|
||||
# Data
|
||||
--prompt-data /path/to/data.jsonl
|
||||
--input-key prompt
|
||||
--label-key label
|
||||
|
||||
# Training loop
|
||||
--num-rollout 3000
|
||||
--rollout-batch-size 32
|
||||
--n-samples-per-prompt 8
|
||||
--global-batch-size 256
|
||||
|
||||
# Algorithm
|
||||
--advantage-estimator grpo # or: gspo, ppo, reinforce_plus_plus
|
||||
--use-kl-loss
|
||||
--kl-loss-coef 0.001
|
||||
```
|
||||
|
||||
### Key Constraints
|
||||
|
||||
```
|
||||
rollout_batch_size × n_samples_per_prompt = global_batch_size × num_steps_per_rollout
|
||||
```
|
||||
|
||||
Example: 32 × 8 = 256 × 1
|
||||
|
||||
---
|
||||
|
||||
## Data Buffer System
|
||||
|
||||
slime's data buffer enables flexible data management:
|
||||
|
||||
### Basic Data Source
|
||||
|
||||
```python
|
||||
class RolloutDataSource:
|
||||
def get_samples(self, num_samples):
|
||||
"""Fetch prompts from dataset."""
|
||||
return self.dataset.sample(num_samples)
|
||||
|
||||
def add_samples(self, samples):
|
||||
"""Called after generation (no-op by default)."""
|
||||
pass
|
||||
```
|
||||
|
||||
### Buffered Data Source (Off-Policy)
|
||||
|
||||
```python
|
||||
class RolloutDataSourceWithBuffer(RolloutDataSource):
|
||||
def __init__(self):
|
||||
self.buffer = []
|
||||
|
||||
def add_samples(self, samples):
|
||||
"""Store generated samples for reuse."""
|
||||
self.buffer.extend(samples)
|
||||
|
||||
def buffer_filter(self, args, buffer, num_samples):
|
||||
"""Custom selection logic (prioritized, stratified, etc.)."""
|
||||
return select_best(buffer, num_samples)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Solutions
|
||||
|
||||
### Issue: SGLang Engine Crash
|
||||
|
||||
**Symptoms**: Inference engine dies mid-training
|
||||
|
||||
**Solutions**:
|
||||
```bash
|
||||
# Enable fault tolerance
|
||||
--use-fault-tolerance
|
||||
|
||||
# Increase memory allocation
|
||||
--sglang-mem-fraction-static 0.85
|
||||
|
||||
# Reduce batch size
|
||||
--rollout-batch-size 16
|
||||
```
|
||||
|
||||
### Issue: Weight Sync Timeout
|
||||
|
||||
**Symptoms**: Training hangs after rollout
|
||||
|
||||
**Solutions**:
|
||||
```bash
|
||||
# Increase sync interval
|
||||
--update-weights-interval 5
|
||||
|
||||
# Use colocated mode (no network transfer)
|
||||
--colocate
|
||||
```
|
||||
|
||||
### Issue: OOM During Training
|
||||
|
||||
**Symptoms**: CUDA OOM in backward pass
|
||||
|
||||
**Solutions**:
|
||||
```bash
|
||||
# Enable gradient checkpointing
|
||||
--recompute-activations
|
||||
|
||||
# Reduce micro-batch size
|
||||
--micro-batch-size 1
|
||||
|
||||
# Enable sequence parallelism
|
||||
--sequence-parallel
|
||||
```
|
||||
|
||||
### Issue: Slow Data Loading
|
||||
|
||||
**Symptoms**: GPU idle during data fetch
|
||||
|
||||
**Solutions**:
|
||||
```bash
|
||||
# Increase data workers
|
||||
--num-data-workers 4
|
||||
|
||||
# Use streaming dataset
|
||||
--streaming-data
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Supported Models
|
||||
|
||||
| Model Family | Configurations |
|
||||
|--------------|----------------|
|
||||
| GLM | GLM-4.5, GLM-4.6, GLM-4.7, GLM-Z1-9B |
|
||||
| Qwen | Qwen3 (4B, 8B, 30B-A3B), Qwen3-MoE, Qwen2.5 |
|
||||
| DeepSeek | V3, V3.1, R1 |
|
||||
| Llama | Llama 3 (8B, 70B) |
|
||||
| Others | Kimi K2, Moonlight-16B |
|
||||
|
||||
Each model has pre-configured scripts in `scripts/models/`.
|
||||
|
||||
---
|
||||
|
||||
## Advanced Topics
|
||||
|
||||
### Co-location Mode
|
||||
|
||||
Share GPUs between training and inference to reduce memory:
|
||||
|
||||
```bash
|
||||
python train.py \
|
||||
--colocate \
|
||||
--actor-num-gpus-per-node 8 \
|
||||
--sglang-mem-fraction-static 0.4 \
|
||||
${MODEL_ARGS[@]}
|
||||
```
|
||||
|
||||
### Custom Reward Model
|
||||
|
||||
```python
|
||||
# custom_rm.py
|
||||
class CustomRewardModel:
|
||||
def __init__(self, model_path):
|
||||
self.model = load_model(model_path)
|
||||
|
||||
def compute_reward(self, prompts, responses):
|
||||
inputs = self.tokenize(prompts, responses)
|
||||
scores = self.model(inputs)
|
||||
return scores.tolist()
|
||||
```
|
||||
|
||||
```bash
|
||||
--custom-rm-path custom_rm.py
|
||||
```
|
||||
|
||||
### Evaluation Multi-Task
|
||||
|
||||
```bash
|
||||
--eval-prompt-data aime /path/to/aime.jsonl \
|
||||
--eval-prompt-data gsm8k /path/to/gsm8k.jsonl \
|
||||
--n-samples-per-eval-prompt 16
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://thudm.github.io/slime/
|
||||
- **GitHub**: https://github.com/THUDM/slime
|
||||
- **Blog**: https://lmsys.org/blog/2025-07-09-slime/
|
||||
- **Examples**: See `examples/` directory for 14+ worked examples
|
||||
|
|
@ -0,0 +1,539 @@
|
|||
---
|
||||
title: "Stable Diffusion Image Generation"
|
||||
sidebar_label: "Stable Diffusion Image Generation"
|
||||
description: "State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Stable Diffusion Image Generation
|
||||
|
||||
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/stable-diffusion` |
|
||||
| Path | `optional-skills/mlops/stable-diffusion` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `diffusers>=0.30.0`, `transformers>=4.41.0`, `accelerate>=0.31.0`, `torch>=2.0.0` |
|
||||
| Tags | `Image Generation`, `Stable Diffusion`, `Diffusers`, `Text-to-Image`, `Multimodal`, `Computer Vision` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Stable Diffusion Image Generation
|
||||
|
||||
Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
|
||||
|
||||
## When to use Stable Diffusion
|
||||
|
||||
**Use Stable Diffusion when:**
|
||||
- Generating images from text descriptions
|
||||
- Performing image-to-image translation (style transfer, enhancement)
|
||||
- Inpainting (filling in masked regions)
|
||||
- Outpainting (extending images beyond boundaries)
|
||||
- Creating variations of existing images
|
||||
- Building custom image generation workflows
|
||||
|
||||
**Key features:**
|
||||
- **Text-to-Image**: Generate images from natural language prompts
|
||||
- **Image-to-Image**: Transform existing images with text guidance
|
||||
- **Inpainting**: Fill masked regions with context-aware content
|
||||
- **ControlNet**: Add spatial conditioning (edges, poses, depth)
|
||||
- **LoRA Support**: Efficient fine-tuning and style adaptation
|
||||
- **Multiple Models**: SD 1.5, SDXL, SD 3.0, Flux support
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **DALL-E 3**: For API-based generation without GPU
|
||||
- **Midjourney**: For artistic, stylized outputs
|
||||
- **Imagen**: For Google Cloud integration
|
||||
- **Leonardo.ai**: For web-based creative workflows
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install diffusers transformers accelerate torch
|
||||
pip install xformers # Optional: memory-efficient attention
|
||||
```
|
||||
|
||||
### Basic text-to-image
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
import torch
|
||||
|
||||
# Load pipeline (auto-detects model type)
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
"stable-diffusion-v1-5/stable-diffusion-v1-5",
|
||||
torch_dtype=torch.float16
|
||||
)
|
||||
pipe.to("cuda")
|
||||
|
||||
# Generate image
|
||||
image = pipe(
|
||||
"A serene mountain landscape at sunset, highly detailed",
|
||||
num_inference_steps=50,
|
||||
guidance_scale=7.5
|
||||
).images[0]
|
||||
|
||||
image.save("output.png")
|
||||
```
|
||||
|
||||
### Using SDXL (higher quality)
|
||||
|
||||
```python
|
||||
from diffusers import AutoPipelineForText2Image
|
||||
import torch
|
||||
|
||||
pipe = AutoPipelineForText2Image.from_pretrained(
|
||||
"stabilityai/stable-diffusion-xl-base-1.0",
|
||||
torch_dtype=torch.float16,
|
||||
variant="fp16"
|
||||
)
|
||||
pipe.to("cuda")
|
||||
|
||||
# Enable memory optimization
|
||||
pipe.enable_model_cpu_offload()
|
||||
|
||||
image = pipe(
|
||||
prompt="A futuristic city with flying cars, cinematic lighting",
|
||||
height=1024,
|
||||
width=1024,
|
||||
num_inference_steps=30
|
||||
).images[0]
|
||||
```
|
||||
|
||||
## Architecture overview
|
||||
|
||||
### Three-pillar design
|
||||
|
||||
Diffusers is built around three core components:
|
||||
|
||||
```
|
||||
Pipeline (orchestration)
|
||||
├── Model (neural networks)
|
||||
│ ├── UNet / Transformer (noise prediction)
|
||||
│ ├── VAE (latent encoding/decoding)
|
||||
│ └── Text Encoder (CLIP/T5)
|
||||
└── Scheduler (denoising algorithm)
|
||||
```
|
||||
|
||||
### Pipeline inference flow
|
||||
|
||||
```
|
||||
Text Prompt → Text Encoder → Text Embeddings
|
||||
↓
|
||||
Random Noise → [Denoising Loop] ← Scheduler
|
||||
↓
|
||||
Predicted Noise
|
||||
↓
|
||||
VAE Decoder → Final Image
|
||||
```
|
||||
|
||||
## Core concepts
|
||||
|
||||
### Pipelines
|
||||
|
||||
Pipelines orchestrate complete workflows:
|
||||
|
||||
| Pipeline | Purpose |
|
||||
|----------|---------|
|
||||
| `StableDiffusionPipeline` | Text-to-image (SD 1.x/2.x) |
|
||||
| `StableDiffusionXLPipeline` | Text-to-image (SDXL) |
|
||||
| `StableDiffusion3Pipeline` | Text-to-image (SD 3.0) |
|
||||
| `FluxPipeline` | Text-to-image (Flux models) |
|
||||
| `StableDiffusionImg2ImgPipeline` | Image-to-image |
|
||||
| `StableDiffusionInpaintPipeline` | Inpainting |
|
||||
|
||||
### Schedulers
|
||||
|
||||
Schedulers control the denoising process:
|
||||
|
||||
| Scheduler | Steps | Quality | Use Case |
|
||||
|-----------|-------|---------|----------|
|
||||
| `EulerDiscreteScheduler` | 20-50 | Good | Default choice |
|
||||
| `EulerAncestralDiscreteScheduler` | 20-50 | Good | More variation |
|
||||
| `DPMSolverMultistepScheduler` | 15-25 | Excellent | Fast, high quality |
|
||||
| `DDIMScheduler` | 50-100 | Good | Deterministic |
|
||||
| `LCMScheduler` | 4-8 | Good | Very fast |
|
||||
| `UniPCMultistepScheduler` | 15-25 | Excellent | Fast convergence |
|
||||
|
||||
### Swapping schedulers
|
||||
|
||||
```python
|
||||
from diffusers import DPMSolverMultistepScheduler
|
||||
|
||||
# Swap for faster generation
|
||||
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
|
||||
pipe.scheduler.config
|
||||
)
|
||||
|
||||
# Now generate with fewer steps
|
||||
image = pipe(prompt, num_inference_steps=20).images[0]
|
||||
```
|
||||
|
||||
## Generation parameters
|
||||
|
||||
### Key parameters
|
||||
|
||||
| Parameter | Default | Description |
|
||||
|-----------|---------|-------------|
|
||||
| `prompt` | Required | Text description of desired image |
|
||||
| `negative_prompt` | None | What to avoid in the image |
|
||||
| `num_inference_steps` | 50 | Denoising steps (more = better quality) |
|
||||
| `guidance_scale` | 7.5 | Prompt adherence (7-12 typical) |
|
||||
| `height`, `width` | 512/1024 | Output dimensions (multiples of 8) |
|
||||
| `generator` | None | Torch generator for reproducibility |
|
||||
| `num_images_per_prompt` | 1 | Batch size |
|
||||
|
||||
### Reproducible generation
|
||||
|
||||
```python
|
||||
import torch
|
||||
|
||||
generator = torch.Generator(device="cuda").manual_seed(42)
|
||||
|
||||
image = pipe(
|
||||
prompt="A cat wearing a top hat",
|
||||
generator=generator,
|
||||
num_inference_steps=50
|
||||
).images[0]
|
||||
```
|
||||
|
||||
### Negative prompts
|
||||
|
||||
```python
|
||||
image = pipe(
|
||||
prompt="Professional photo of a dog in a garden",
|
||||
negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
|
||||
guidance_scale=7.5
|
||||
).images[0]
|
||||
```
|
||||
|
||||
## Image-to-image
|
||||
|
||||
Transform existing images with text guidance:
|
||||
|
||||
```python
|
||||
from diffusers import AutoPipelineForImage2Image
|
||||
from PIL import Image
|
||||
|
||||
pipe = AutoPipelineForImage2Image.from_pretrained(
|
||||
"stable-diffusion-v1-5/stable-diffusion-v1-5",
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
init_image = Image.open("input.jpg").resize((512, 512))
|
||||
|
||||
image = pipe(
|
||||
prompt="A watercolor painting of the scene",
|
||||
image=init_image,
|
||||
strength=0.75, # How much to transform (0-1)
|
||||
num_inference_steps=50
|
||||
).images[0]
|
||||
```
|
||||
|
||||
## Inpainting
|
||||
|
||||
Fill masked regions:
|
||||
|
||||
```python
|
||||
from diffusers import AutoPipelineForInpainting
|
||||
from PIL import Image
|
||||
|
||||
pipe = AutoPipelineForInpainting.from_pretrained(
|
||||
"runwayml/stable-diffusion-inpainting",
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
image = Image.open("photo.jpg")
|
||||
mask = Image.open("mask.png") # White = inpaint region
|
||||
|
||||
result = pipe(
|
||||
prompt="A red car parked on the street",
|
||||
image=image,
|
||||
mask_image=mask,
|
||||
num_inference_steps=50
|
||||
).images[0]
|
||||
```
|
||||
|
||||
## ControlNet
|
||||
|
||||
Add spatial conditioning for precise control:
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
|
||||
import torch
|
||||
|
||||
# Load ControlNet for edge conditioning
|
||||
controlnet = ControlNetModel.from_pretrained(
|
||||
"lllyasviel/control_v11p_sd15_canny",
|
||||
torch_dtype=torch.float16
|
||||
)
|
||||
|
||||
pipe = StableDiffusionControlNetPipeline.from_pretrained(
|
||||
"stable-diffusion-v1-5/stable-diffusion-v1-5",
|
||||
controlnet=controlnet,
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
# Use Canny edge image as control
|
||||
control_image = get_canny_image(input_image)
|
||||
|
||||
image = pipe(
|
||||
prompt="A beautiful house in the style of Van Gogh",
|
||||
image=control_image,
|
||||
num_inference_steps=30
|
||||
).images[0]
|
||||
```
|
||||
|
||||
### Available ControlNets
|
||||
|
||||
| ControlNet | Input Type | Use Case |
|
||||
|------------|------------|----------|
|
||||
| `canny` | Edge maps | Preserve structure |
|
||||
| `openpose` | Pose skeletons | Human poses |
|
||||
| `depth` | Depth maps | 3D-aware generation |
|
||||
| `normal` | Normal maps | Surface details |
|
||||
| `mlsd` | Line segments | Architectural lines |
|
||||
| `scribble` | Rough sketches | Sketch-to-image |
|
||||
|
||||
## LoRA adapters
|
||||
|
||||
Load fine-tuned style adapters:
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
"stable-diffusion-v1-5/stable-diffusion-v1-5",
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
# Load LoRA weights
|
||||
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")
|
||||
|
||||
# Generate with LoRA style
|
||||
image = pipe("A portrait in the trained style").images[0]
|
||||
|
||||
# Adjust LoRA strength
|
||||
pipe.fuse_lora(lora_scale=0.8)
|
||||
|
||||
# Unload LoRA
|
||||
pipe.unload_lora_weights()
|
||||
```
|
||||
|
||||
### Multiple LoRAs
|
||||
|
||||
```python
|
||||
# Load multiple LoRAs
|
||||
pipe.load_lora_weights("lora1", adapter_name="style")
|
||||
pipe.load_lora_weights("lora2", adapter_name="character")
|
||||
|
||||
# Set weights for each
|
||||
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
|
||||
|
||||
image = pipe("A portrait").images[0]
|
||||
```
|
||||
|
||||
## Memory optimization
|
||||
|
||||
### Enable CPU offloading
|
||||
|
||||
```python
|
||||
# Model CPU offload - moves models to CPU when not in use
|
||||
pipe.enable_model_cpu_offload()
|
||||
|
||||
# Sequential CPU offload - more aggressive, slower
|
||||
pipe.enable_sequential_cpu_offload()
|
||||
```
|
||||
|
||||
### Attention slicing
|
||||
|
||||
```python
|
||||
# Reduce memory by computing attention in chunks
|
||||
pipe.enable_attention_slicing()
|
||||
|
||||
# Or specific chunk size
|
||||
pipe.enable_attention_slicing("max")
|
||||
```
|
||||
|
||||
### xFormers memory-efficient attention
|
||||
|
||||
```python
|
||||
# Requires xformers package
|
||||
pipe.enable_xformers_memory_efficient_attention()
|
||||
```
|
||||
|
||||
### VAE slicing for large images
|
||||
|
||||
```python
|
||||
# Decode latents in tiles for large images
|
||||
pipe.enable_vae_slicing()
|
||||
pipe.enable_vae_tiling()
|
||||
```
|
||||
|
||||
## Model variants
|
||||
|
||||
### Loading different precisions
|
||||
|
||||
```python
|
||||
# FP16 (recommended for GPU)
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
"model-id",
|
||||
torch_dtype=torch.float16,
|
||||
variant="fp16"
|
||||
)
|
||||
|
||||
# BF16 (better precision, requires Ampere+ GPU)
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
"model-id",
|
||||
torch_dtype=torch.bfloat16
|
||||
)
|
||||
```
|
||||
|
||||
### Loading specific components
|
||||
|
||||
```python
|
||||
from diffusers import UNet2DConditionModel, AutoencoderKL
|
||||
|
||||
# Load custom VAE
|
||||
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
|
||||
|
||||
# Use with pipeline
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
"stable-diffusion-v1-5/stable-diffusion-v1-5",
|
||||
vae=vae,
|
||||
torch_dtype=torch.float16
|
||||
)
|
||||
```
|
||||
|
||||
## Batch generation
|
||||
|
||||
Generate multiple images efficiently:
|
||||
|
||||
```python
|
||||
# Multiple prompts
|
||||
prompts = [
|
||||
"A cat playing piano",
|
||||
"A dog reading a book",
|
||||
"A bird painting a picture"
|
||||
]
|
||||
|
||||
images = pipe(prompts, num_inference_steps=30).images
|
||||
|
||||
# Multiple images per prompt
|
||||
images = pipe(
|
||||
"A beautiful sunset",
|
||||
num_images_per_prompt=4,
|
||||
num_inference_steps=30
|
||||
).images
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: High-quality generation
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
|
||||
import torch
|
||||
|
||||
# 1. Load SDXL with optimizations
|
||||
pipe = StableDiffusionXLPipeline.from_pretrained(
|
||||
"stabilityai/stable-diffusion-xl-base-1.0",
|
||||
torch_dtype=torch.float16,
|
||||
variant="fp16"
|
||||
)
|
||||
pipe.to("cuda")
|
||||
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
|
||||
pipe.enable_model_cpu_offload()
|
||||
|
||||
# 2. Generate with quality settings
|
||||
image = pipe(
|
||||
prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
|
||||
negative_prompt="blurry, low quality, cartoon, anime, sketch",
|
||||
num_inference_steps=30,
|
||||
guidance_scale=7.5,
|
||||
height=1024,
|
||||
width=1024
|
||||
).images[0]
|
||||
```
|
||||
|
||||
### Workflow 2: Fast prototyping
|
||||
|
||||
```python
|
||||
from diffusers import AutoPipelineForText2Image, LCMScheduler
|
||||
import torch
|
||||
|
||||
# Use LCM for 4-8 step generation
|
||||
pipe = AutoPipelineForText2Image.from_pretrained(
|
||||
"stabilityai/stable-diffusion-xl-base-1.0",
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
# Load LCM LoRA for fast generation
|
||||
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
|
||||
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
|
||||
pipe.fuse_lora()
|
||||
|
||||
# Generate in ~1 second
|
||||
image = pipe(
|
||||
"A beautiful landscape",
|
||||
num_inference_steps=4,
|
||||
guidance_scale=1.0
|
||||
).images[0]
|
||||
```
|
||||
|
||||
## Common issues
|
||||
|
||||
**CUDA out of memory:**
|
||||
```python
|
||||
# Enable memory optimizations
|
||||
pipe.enable_model_cpu_offload()
|
||||
pipe.enable_attention_slicing()
|
||||
pipe.enable_vae_slicing()
|
||||
|
||||
# Or use lower precision
|
||||
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
|
||||
```
|
||||
|
||||
**Black/noise images:**
|
||||
```python
|
||||
# Check VAE configuration
|
||||
# Use safety checker bypass if needed
|
||||
pipe.safety_checker = None
|
||||
|
||||
# Ensure proper dtype consistency
|
||||
pipe = pipe.to(dtype=torch.float16)
|
||||
```
|
||||
|
||||
**Slow generation:**
|
||||
```python
|
||||
# Use faster scheduler
|
||||
from diffusers import DPMSolverMultistepScheduler
|
||||
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
|
||||
|
||||
# Reduce steps
|
||||
image = pipe(prompt, num_inference_steps=20).images[0]
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/stable-diffusion/references/advanced-usage.md)** - Custom pipelines, fine-tuning, deployment
|
||||
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/stable-diffusion/references/troubleshooting.md)** - Common issues and solutions
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://huggingface.co/docs/diffusers
|
||||
- **Repository**: https://github.com/huggingface/diffusers
|
||||
- **Model Hub**: https://huggingface.co/models?library=diffusers
|
||||
- **Discord**: https://discord.gg/diffusers
|
||||
|
|
@ -0,0 +1,205 @@
|
|||
---
|
||||
title: "Tensorrt Llm — Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency"
|
||||
sidebar_label: "Tensorrt Llm"
|
||||
description: "Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Tensorrt Llm
|
||||
|
||||
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/tensorrt-llm` |
|
||||
| Path | `optional-skills/mlops/tensorrt-llm` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `tensorrt-llm`, `torch` |
|
||||
| Tags | `Inference Serving`, `TensorRT-LLM`, `NVIDIA`, `Inference Optimization`, `High Throughput`, `Low Latency`, `Production`, `FP8`, `INT4`, `In-Flight Batching`, `Multi-GPU` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# TensorRT-LLM
|
||||
|
||||
NVIDIA's open-source library for optimizing LLM inference with state-of-the-art performance on NVIDIA GPUs.
|
||||
|
||||
## When to use TensorRT-LLM
|
||||
|
||||
**Use TensorRT-LLM when:**
|
||||
- Deploying on NVIDIA GPUs (A100, H100, GB200)
|
||||
- Need maximum throughput (24,000+ tokens/sec on Llama 3)
|
||||
- Require low latency for real-time applications
|
||||
- Working with quantized models (FP8, INT4, FP4)
|
||||
- Scaling across multiple GPUs or nodes
|
||||
|
||||
**Use vLLM instead when:**
|
||||
- Need simpler setup and Python-first API
|
||||
- Want PagedAttention without TensorRT compilation
|
||||
- Working with AMD GPUs or non-NVIDIA hardware
|
||||
|
||||
**Use llama.cpp instead when:**
|
||||
- Deploying on CPU or Apple Silicon
|
||||
- Need edge deployment without NVIDIA GPUs
|
||||
- Want simpler GGUF quantization format
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Docker (recommended)
|
||||
docker pull nvidia/tensorrt_llm:latest
|
||||
|
||||
# pip install
|
||||
pip install tensorrt_llm==1.2.0rc3
|
||||
|
||||
# Requires CUDA 13.0.0, TensorRT 10.13.2, Python 3.10-3.12
|
||||
```
|
||||
|
||||
### Basic inference
|
||||
|
||||
```python
|
||||
from tensorrt_llm import LLM, SamplingParams
|
||||
|
||||
# Initialize model
|
||||
llm = LLM(model="meta-llama/Meta-Llama-3-8B")
|
||||
|
||||
# Configure sampling
|
||||
sampling_params = SamplingParams(
|
||||
max_tokens=100,
|
||||
temperature=0.7,
|
||||
top_p=0.9
|
||||
)
|
||||
|
||||
# Generate
|
||||
prompts = ["Explain quantum computing"]
|
||||
outputs = llm.generate(prompts, sampling_params)
|
||||
|
||||
for output in outputs:
|
||||
print(output.text)
|
||||
```
|
||||
|
||||
### Serving with trtllm-serve
|
||||
|
||||
```bash
|
||||
# Start server (automatic model download and compilation)
|
||||
trtllm-serve meta-llama/Meta-Llama-3-8B \
|
||||
--tp_size 4 \ # Tensor parallelism (4 GPUs)
|
||||
--max_batch_size 256 \
|
||||
--max_num_tokens 4096
|
||||
|
||||
# Client request
|
||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "meta-llama/Meta-Llama-3-8B",
|
||||
"messages": [{"role": "user", "content": "Hello!"}],
|
||||
"temperature": 0.7,
|
||||
"max_tokens": 100
|
||||
}'
|
||||
```
|
||||
|
||||
## Key features
|
||||
|
||||
### Performance optimizations
|
||||
- **In-flight batching**: Dynamic batching during generation
|
||||
- **Paged KV cache**: Efficient memory management
|
||||
- **Flash Attention**: Optimized attention kernels
|
||||
- **Quantization**: FP8, INT4, FP4 for 2-4× faster inference
|
||||
- **CUDA graphs**: Reduced kernel launch overhead
|
||||
|
||||
### Parallelism
|
||||
- **Tensor parallelism (TP)**: Split model across GPUs
|
||||
- **Pipeline parallelism (PP)**: Layer-wise distribution
|
||||
- **Expert parallelism**: For Mixture-of-Experts models
|
||||
- **Multi-node**: Scale beyond single machine
|
||||
|
||||
### Advanced features
|
||||
- **Speculative decoding**: Faster generation with draft models
|
||||
- **LoRA serving**: Efficient multi-adapter deployment
|
||||
- **Disaggregated serving**: Separate prefill and generation
|
||||
|
||||
## Common patterns
|
||||
|
||||
### Quantized model (FP8)
|
||||
|
||||
```python
|
||||
from tensorrt_llm import LLM
|
||||
|
||||
# Load FP8 quantized model (2× faster, 50% memory)
|
||||
llm = LLM(
|
||||
model="meta-llama/Meta-Llama-3-70B",
|
||||
dtype="fp8",
|
||||
max_num_tokens=8192
|
||||
)
|
||||
|
||||
# Inference same as before
|
||||
outputs = llm.generate(["Summarize this article..."])
|
||||
```
|
||||
|
||||
### Multi-GPU deployment
|
||||
|
||||
```python
|
||||
# Tensor parallelism across 8 GPUs
|
||||
llm = LLM(
|
||||
model="meta-llama/Meta-Llama-3-405B",
|
||||
tensor_parallel_size=8,
|
||||
dtype="fp8"
|
||||
)
|
||||
```
|
||||
|
||||
### Batch inference
|
||||
|
||||
```python
|
||||
# Process 100 prompts efficiently
|
||||
prompts = [f"Question {i}: ..." for i in range(100)]
|
||||
|
||||
outputs = llm.generate(
|
||||
prompts,
|
||||
sampling_params=SamplingParams(max_tokens=200)
|
||||
)
|
||||
|
||||
# Automatic in-flight batching for maximum throughput
|
||||
```
|
||||
|
||||
## Performance benchmarks
|
||||
|
||||
**Meta Llama 3-8B** (H100 GPU):
|
||||
- Throughput: 24,000 tokens/sec
|
||||
- Latency: ~10ms per token
|
||||
- vs PyTorch: **100× faster**
|
||||
|
||||
**Llama 3-70B** (8× A100 80GB):
|
||||
- FP8 quantization: 2× faster than FP16
|
||||
- Memory: 50% reduction with FP8
|
||||
|
||||
## Supported models
|
||||
|
||||
- **LLaMA family**: Llama 2, Llama 3, CodeLlama
|
||||
- **GPT family**: GPT-2, GPT-J, GPT-NeoX
|
||||
- **Qwen**: Qwen, Qwen2, QwQ
|
||||
- **DeepSeek**: DeepSeek-V2, DeepSeek-V3
|
||||
- **Mixtral**: Mixtral-8x7B, Mixtral-8x22B
|
||||
- **Vision**: LLaVA, Phi-3-vision
|
||||
- **100+ models** on HuggingFace
|
||||
|
||||
## References
|
||||
|
||||
- **[Optimization Guide](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/tensorrt-llm/references/optimization.md)** - Quantization, batching, KV cache tuning
|
||||
- **[Multi-GPU Setup](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/tensorrt-llm/references/multi-gpu.md)** - Tensor/pipeline parallelism, multi-node
|
||||
- **[Serving Guide](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/tensorrt-llm/references/serving.md)** - Production deployment, monitoring, autoscaling
|
||||
|
||||
## Resources
|
||||
|
||||
- **Docs**: https://nvidia.github.io/TensorRT-LLM/
|
||||
- **GitHub**: https://github.com/NVIDIA/TensorRT-LLM
|
||||
- **Models**: https://huggingface.co/models?library=tensorrt_llm
|
||||
|
|
@ -0,0 +1,377 @@
|
|||
---
|
||||
title: "Distributed Llm Pretraining Torchtitan"
|
||||
sidebar_label: "Distributed Llm Pretraining Torchtitan"
|
||||
description: "Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP)"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Distributed Llm Pretraining Torchtitan
|
||||
|
||||
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/torchtitan` |
|
||||
| Path | `optional-skills/mlops/torchtitan` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `torch>=2.6.0`, `torchtitan>=0.2.0`, `torchao>=0.5.0` |
|
||||
| Tags | `Model Architecture`, `Distributed Training`, `TorchTitan`, `FSDP2`, `Tensor Parallel`, `Pipeline Parallel`, `Context Parallel`, `Float8`, `Llama`, `Pretraining` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# TorchTitan - PyTorch Native Distributed LLM Pretraining
|
||||
|
||||
## Quick start
|
||||
|
||||
TorchTitan is PyTorch's official platform for large-scale LLM pretraining with composable 4D parallelism (FSDP2, TP, PP, CP), achieving 65%+ speedups over baselines on H100 GPUs.
|
||||
|
||||
**Installation**:
|
||||
```bash
|
||||
# From PyPI (stable)
|
||||
pip install torchtitan
|
||||
|
||||
# From source (latest features, requires PyTorch nightly)
|
||||
git clone https://github.com/pytorch/torchtitan
|
||||
cd torchtitan
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
**Download tokenizer**:
|
||||
```bash
|
||||
# Get HF token from https://huggingface.co/settings/tokens
|
||||
python scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.1-8B --assets tokenizer --hf_token=...
|
||||
```
|
||||
|
||||
**Start training on 8 GPUs**:
|
||||
```bash
|
||||
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: Pretrain Llama 3.1 8B on single node
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
Single Node Pretraining:
|
||||
- [ ] Step 1: Download tokenizer
|
||||
- [ ] Step 2: Configure training
|
||||
- [ ] Step 3: Launch training
|
||||
- [ ] Step 4: Monitor and checkpoint
|
||||
```
|
||||
|
||||
**Step 1: Download tokenizer**
|
||||
|
||||
```bash
|
||||
python scripts/download_hf_assets.py \
|
||||
--repo_id meta-llama/Llama-3.1-8B \
|
||||
--assets tokenizer \
|
||||
--hf_token=YOUR_HF_TOKEN
|
||||
```
|
||||
|
||||
**Step 2: Configure training**
|
||||
|
||||
Edit or create a TOML config file:
|
||||
|
||||
```toml
|
||||
# llama3_8b_custom.toml
|
||||
[job]
|
||||
dump_folder = "./outputs"
|
||||
description = "Llama 3.1 8B training"
|
||||
|
||||
[model]
|
||||
name = "llama3"
|
||||
flavor = "8B"
|
||||
hf_assets_path = "./assets/hf/Llama-3.1-8B"
|
||||
|
||||
[optimizer]
|
||||
name = "AdamW"
|
||||
lr = 3e-4
|
||||
|
||||
[lr_scheduler]
|
||||
warmup_steps = 200
|
||||
|
||||
[training]
|
||||
local_batch_size = 2
|
||||
seq_len = 8192
|
||||
max_norm = 1.0
|
||||
steps = 1000
|
||||
dataset = "c4"
|
||||
|
||||
[parallelism]
|
||||
data_parallel_shard_degree = -1 # Use all GPUs for FSDP
|
||||
|
||||
[activation_checkpoint]
|
||||
mode = "selective"
|
||||
selective_ac_option = "op"
|
||||
|
||||
[checkpoint]
|
||||
enable = true
|
||||
folder = "checkpoint"
|
||||
interval = 500
|
||||
```
|
||||
|
||||
**Step 3: Launch training**
|
||||
|
||||
```bash
|
||||
# 8 GPUs on single node
|
||||
CONFIG_FILE="./llama3_8b_custom.toml" ./run_train.sh
|
||||
|
||||
# Or explicitly with torchrun
|
||||
torchrun --nproc_per_node=8 \
|
||||
-m torchtitan.train \
|
||||
--job.config_file ./llama3_8b_custom.toml
|
||||
```
|
||||
|
||||
**Step 4: Monitor and checkpoint**
|
||||
|
||||
TensorBoard logs are saved to `./outputs/tb/`:
|
||||
```bash
|
||||
tensorboard --logdir ./outputs/tb
|
||||
```
|
||||
|
||||
### Workflow 2: Multi-node training with SLURM
|
||||
|
||||
```
|
||||
Multi-Node Training:
|
||||
- [ ] Step 1: Configure parallelism for scale
|
||||
- [ ] Step 2: Set up SLURM script
|
||||
- [ ] Step 3: Submit job
|
||||
- [ ] Step 4: Resume from checkpoint
|
||||
```
|
||||
|
||||
**Step 1: Configure parallelism for scale**
|
||||
|
||||
For 70B model on 256 GPUs (32 nodes):
|
||||
```toml
|
||||
[parallelism]
|
||||
data_parallel_shard_degree = 32 # FSDP across 32 ranks
|
||||
tensor_parallel_degree = 8 # TP within node
|
||||
pipeline_parallel_degree = 1 # No PP for 70B
|
||||
context_parallel_degree = 1 # Increase for long sequences
|
||||
```
|
||||
|
||||
**Step 2: Set up SLURM script**
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
#SBATCH --job-name=llama70b
|
||||
#SBATCH --nodes=32
|
||||
#SBATCH --ntasks-per-node=8
|
||||
#SBATCH --gpus-per-node=8
|
||||
|
||||
srun torchrun \
|
||||
--nnodes=32 \
|
||||
--nproc_per_node=8 \
|
||||
--rdzv_backend=c10d \
|
||||
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
|
||||
-m torchtitan.train \
|
||||
--job.config_file ./llama3_70b.toml
|
||||
```
|
||||
|
||||
**Step 3: Submit job**
|
||||
|
||||
```bash
|
||||
sbatch multinode_trainer.slurm
|
||||
```
|
||||
|
||||
**Step 4: Resume from checkpoint**
|
||||
|
||||
Training auto-resumes if checkpoint exists in configured folder.
|
||||
|
||||
### Workflow 3: Enable Float8 training for H100s
|
||||
|
||||
Float8 provides 30-50% speedup on H100 GPUs.
|
||||
|
||||
```
|
||||
Float8 Training:
|
||||
- [ ] Step 1: Install torchao
|
||||
- [ ] Step 2: Configure Float8
|
||||
- [ ] Step 3: Launch with compile
|
||||
```
|
||||
|
||||
**Step 1: Install torchao**
|
||||
|
||||
```bash
|
||||
USE_CPP=0 pip install git+https://github.com/pytorch/ao.git
|
||||
```
|
||||
|
||||
**Step 2: Configure Float8**
|
||||
|
||||
Add to your TOML config:
|
||||
```toml
|
||||
[model]
|
||||
converters = ["quantize.linear.float8"]
|
||||
|
||||
[quantize.linear.float8]
|
||||
enable_fsdp_float8_all_gather = true
|
||||
precompute_float8_dynamic_scale_for_fsdp = true
|
||||
filter_fqns = ["output"] # Exclude output layer
|
||||
|
||||
[compile]
|
||||
enable = true
|
||||
components = ["model", "loss"]
|
||||
```
|
||||
|
||||
**Step 3: Launch with compile**
|
||||
|
||||
```bash
|
||||
CONFIG_FILE="./llama3_8b.toml" ./run_train.sh \
|
||||
--model.converters="quantize.linear.float8" \
|
||||
--quantize.linear.float8.enable_fsdp_float8_all_gather \
|
||||
--compile.enable
|
||||
```
|
||||
|
||||
### Workflow 4: 4D parallelism for 405B models
|
||||
|
||||
```
|
||||
4D Parallelism (FSDP + TP + PP + CP):
|
||||
- [ ] Step 1: Create seed checkpoint
|
||||
- [ ] Step 2: Configure 4D parallelism
|
||||
- [ ] Step 3: Launch on 512 GPUs
|
||||
```
|
||||
|
||||
**Step 1: Create seed checkpoint**
|
||||
|
||||
Required for consistent initialization across PP stages:
|
||||
```bash
|
||||
NGPU=1 CONFIG_FILE=./llama3_405b.toml ./run_train.sh \
|
||||
--checkpoint.enable \
|
||||
--checkpoint.create_seed_checkpoint \
|
||||
--parallelism.data_parallel_shard_degree 1 \
|
||||
--parallelism.tensor_parallel_degree 1 \
|
||||
--parallelism.pipeline_parallel_degree 1
|
||||
```
|
||||
|
||||
**Step 2: Configure 4D parallelism**
|
||||
|
||||
```toml
|
||||
[parallelism]
|
||||
data_parallel_shard_degree = 8 # FSDP
|
||||
tensor_parallel_degree = 8 # TP within node
|
||||
pipeline_parallel_degree = 8 # PP across nodes
|
||||
context_parallel_degree = 1 # CP for long sequences
|
||||
|
||||
[training]
|
||||
local_batch_size = 32
|
||||
seq_len = 8192
|
||||
```
|
||||
|
||||
**Step 3: Launch on 512 GPUs**
|
||||
|
||||
```bash
|
||||
# 64 nodes x 8 GPUs = 512 GPUs
|
||||
srun torchrun --nnodes=64 --nproc_per_node=8 \
|
||||
-m torchtitan.train \
|
||||
--job.config_file ./llama3_405b.toml
|
||||
```
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use TorchTitan when:**
|
||||
- Pretraining LLMs from scratch (8B to 405B+)
|
||||
- Need PyTorch-native solution without third-party dependencies
|
||||
- Require composable 4D parallelism (FSDP2, TP, PP, CP)
|
||||
- Training on H100s with Float8 support
|
||||
- Want interoperable checkpoints with torchtune/HuggingFace
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **Megatron-LM**: Maximum performance for NVIDIA-only deployments
|
||||
- **DeepSpeed**: Broader ZeRO optimization ecosystem, inference support
|
||||
- **Axolotl/TRL**: Fine-tuning rather than pretraining
|
||||
- **LitGPT**: Educational, smaller-scale training
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: Out of memory on large models**
|
||||
|
||||
Enable activation checkpointing and reduce batch size:
|
||||
```toml
|
||||
[activation_checkpoint]
|
||||
mode = "full" # Instead of "selective"
|
||||
|
||||
[training]
|
||||
local_batch_size = 1
|
||||
```
|
||||
|
||||
Or use gradient accumulation:
|
||||
```toml
|
||||
[training]
|
||||
local_batch_size = 1
|
||||
global_batch_size = 32 # Accumulates gradients
|
||||
```
|
||||
|
||||
**Issue: TP causes high memory with async collectives**
|
||||
|
||||
Set environment variable:
|
||||
```bash
|
||||
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
|
||||
```
|
||||
|
||||
**Issue: Float8 training not faster**
|
||||
|
||||
Float8 only benefits large GEMMs. Filter small layers:
|
||||
```toml
|
||||
[quantize.linear.float8]
|
||||
filter_fqns = ["attention.wk", "attention.wv", "output", "auto_filter_small_kn"]
|
||||
```
|
||||
|
||||
**Issue: Checkpoint loading fails after parallelism change**
|
||||
|
||||
Use DCP's resharding capability:
|
||||
```bash
|
||||
# Convert sharded checkpoint to single file
|
||||
python -m torch.distributed.checkpoint.format_utils \
|
||||
dcp_to_torch checkpoint/step-1000 checkpoint.pt
|
||||
```
|
||||
|
||||
**Issue: Pipeline parallelism initialization**
|
||||
|
||||
Create seed checkpoint first (see Workflow 4, Step 1).
|
||||
|
||||
## Supported models
|
||||
|
||||
| Model | Sizes | Status |
|
||||
|-------|-------|--------|
|
||||
| Llama 3.1 | 8B, 70B, 405B | Production |
|
||||
| Llama 4 | Various | Experimental |
|
||||
| DeepSeek V3 | 16B, 236B, 671B (MoE) | Experimental |
|
||||
| GPT-OSS | 20B, 120B (MoE) | Experimental |
|
||||
| Qwen 3 | Various | Experimental |
|
||||
| Flux | Diffusion | Experimental |
|
||||
|
||||
## Performance benchmarks (H100)
|
||||
|
||||
| Model | GPUs | Parallelism | TPS/GPU | Techniques |
|
||||
|-------|------|-------------|---------|------------|
|
||||
| Llama 8B | 8 | FSDP | 5,762 | Baseline |
|
||||
| Llama 8B | 8 | FSDP+compile+FP8 | 8,532 | +48% |
|
||||
| Llama 70B | 256 | FSDP+TP+AsyncTP | 876 | 2D parallel |
|
||||
| Llama 405B | 512 | FSDP+TP+PP | 128 | 3D parallel |
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**FSDP2 configuration**: See [references/fsdp.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/torchtitan/references/fsdp.md) for detailed FSDP2 vs FSDP1 comparison and ZeRO equivalents.
|
||||
|
||||
**Float8 training**: See [references/float8.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/torchtitan/references/float8.md) for tensorwise vs rowwise scaling recipes.
|
||||
|
||||
**Checkpointing**: See [references/checkpoint.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/torchtitan/references/checkpoint.md) for HuggingFace conversion and async checkpointing.
|
||||
|
||||
**Adding custom models**: See [references/custom-models.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/torchtitan/references/custom-models.md) for TrainSpec protocol.
|
||||
|
||||
## Resources
|
||||
|
||||
- GitHub: https://github.com/pytorch/torchtitan
|
||||
- Paper: https://arxiv.org/abs/2410.06511
|
||||
- ICLR 2025: https://iclr.cc/virtual/2025/poster/29620
|
||||
- PyTorch Forum: https://discuss.pytorch.org/c/distributed/torchtitan/44
|
||||
335
website/docs/user-guide/skills/optional/mlops/mlops-whisper.md
Normal file
335
website/docs/user-guide/skills/optional/mlops/mlops-whisper.md
Normal file
|
|
@ -0,0 +1,335 @@
|
|||
---
|
||||
title: "Whisper — OpenAI's general-purpose speech recognition model"
|
||||
sidebar_label: "Whisper"
|
||||
description: "OpenAI's general-purpose speech recognition model"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Whisper
|
||||
|
||||
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/whisper` |
|
||||
| Path | `optional-skills/mlops/whisper` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `openai-whisper`, `transformers`, `torch` |
|
||||
| Tags | `Whisper`, `Speech Recognition`, `ASR`, `Multimodal`, `Multilingual`, `OpenAI`, `Speech-To-Text`, `Transcription`, `Translation`, `Audio Processing` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Whisper - Robust Speech Recognition
|
||||
|
||||
OpenAI's multilingual speech recognition model.
|
||||
|
||||
## When to use Whisper
|
||||
|
||||
**Use when:**
|
||||
- Speech-to-text transcription (99 languages)
|
||||
- Podcast/video transcription
|
||||
- Meeting notes automation
|
||||
- Translation to English
|
||||
- Noisy audio transcription
|
||||
- Multilingual audio processing
|
||||
|
||||
**Metrics**:
|
||||
- **72,900+ GitHub stars**
|
||||
- 99 languages supported
|
||||
- Trained on 680,000 hours of audio
|
||||
- MIT License
|
||||
|
||||
**Use alternatives instead**:
|
||||
- **AssemblyAI**: Managed API, speaker diarization
|
||||
- **Deepgram**: Real-time streaming ASR
|
||||
- **Google Speech-to-Text**: Cloud-based
|
||||
|
||||
## Quick start
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Requires Python 3.8-3.11
|
||||
pip install -U openai-whisper
|
||||
|
||||
# Requires ffmpeg
|
||||
# macOS: brew install ffmpeg
|
||||
# Ubuntu: sudo apt install ffmpeg
|
||||
# Windows: choco install ffmpeg
|
||||
```
|
||||
|
||||
### Basic transcription
|
||||
|
||||
```python
|
||||
import whisper
|
||||
|
||||
# Load model
|
||||
model = whisper.load_model("base")
|
||||
|
||||
# Transcribe
|
||||
result = model.transcribe("audio.mp3")
|
||||
|
||||
# Print text
|
||||
print(result["text"])
|
||||
|
||||
# Access segments
|
||||
for segment in result["segments"]:
|
||||
print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
|
||||
```
|
||||
|
||||
## Model sizes
|
||||
|
||||
```python
|
||||
# Available models
|
||||
models = ["tiny", "base", "small", "medium", "large", "turbo"]
|
||||
|
||||
# Load specific model
|
||||
model = whisper.load_model("turbo") # Fastest, good quality
|
||||
```
|
||||
|
||||
| Model | Parameters | English-only | Multilingual | Speed | VRAM |
|
||||
|-------|------------|--------------|--------------|-------|------|
|
||||
| tiny | 39M | ✓ | ✓ | ~32x | ~1 GB |
|
||||
| base | 74M | ✓ | ✓ | ~16x | ~1 GB |
|
||||
| small | 244M | ✓ | ✓ | ~6x | ~2 GB |
|
||||
| medium | 769M | ✓ | ✓ | ~2x | ~5 GB |
|
||||
| large | 1550M | ✗ | ✓ | 1x | ~10 GB |
|
||||
| turbo | 809M | ✗ | ✓ | ~8x | ~6 GB |
|
||||
|
||||
**Recommendation**: Use `turbo` for best speed/quality, `base` for prototyping
|
||||
|
||||
## Transcription options
|
||||
|
||||
### Language specification
|
||||
|
||||
```python
|
||||
# Auto-detect language
|
||||
result = model.transcribe("audio.mp3")
|
||||
|
||||
# Specify language (faster)
|
||||
result = model.transcribe("audio.mp3", language="en")
|
||||
|
||||
# Supported: en, es, fr, de, it, pt, ru, ja, ko, zh, and 89 more
|
||||
```
|
||||
|
||||
### Task selection
|
||||
|
||||
```python
|
||||
# Transcription (default)
|
||||
result = model.transcribe("audio.mp3", task="transcribe")
|
||||
|
||||
# Translation to English
|
||||
result = model.transcribe("spanish.mp3", task="translate")
|
||||
# Input: Spanish audio → Output: English text
|
||||
```
|
||||
|
||||
### Initial prompt
|
||||
|
||||
```python
|
||||
# Improve accuracy with context
|
||||
result = model.transcribe(
|
||||
"audio.mp3",
|
||||
initial_prompt="This is a technical podcast about machine learning and AI."
|
||||
)
|
||||
|
||||
# Helps with:
|
||||
# - Technical terms
|
||||
# - Proper nouns
|
||||
# - Domain-specific vocabulary
|
||||
```
|
||||
|
||||
### Timestamps
|
||||
|
||||
```python
|
||||
# Word-level timestamps
|
||||
result = model.transcribe("audio.mp3", word_timestamps=True)
|
||||
|
||||
for segment in result["segments"]:
|
||||
for word in segment["words"]:
|
||||
print(f"{word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
|
||||
```
|
||||
|
||||
### Temperature fallback
|
||||
|
||||
```python
|
||||
# Retry with different temperatures if confidence low
|
||||
result = model.transcribe(
|
||||
"audio.mp3",
|
||||
temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
|
||||
)
|
||||
```
|
||||
|
||||
## Command line usage
|
||||
|
||||
```bash
|
||||
# Basic transcription
|
||||
whisper audio.mp3
|
||||
|
||||
# Specify model
|
||||
whisper audio.mp3 --model turbo
|
||||
|
||||
# Output formats
|
||||
whisper audio.mp3 --output_format txt # Plain text
|
||||
whisper audio.mp3 --output_format srt # Subtitles
|
||||
whisper audio.mp3 --output_format vtt # WebVTT
|
||||
whisper audio.mp3 --output_format json # JSON with timestamps
|
||||
|
||||
# Language
|
||||
whisper audio.mp3 --language Spanish
|
||||
|
||||
# Translation
|
||||
whisper spanish.mp3 --task translate
|
||||
```
|
||||
|
||||
## Batch processing
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
|
||||
|
||||
for audio_file in audio_files:
|
||||
print(f"Transcribing {audio_file}...")
|
||||
result = model.transcribe(audio_file)
|
||||
|
||||
# Save to file
|
||||
output_file = audio_file.replace(".mp3", ".txt")
|
||||
with open(output_file, "w") as f:
|
||||
f.write(result["text"])
|
||||
```
|
||||
|
||||
## Real-time transcription
|
||||
|
||||
```python
|
||||
# For streaming audio, use faster-whisper
|
||||
# pip install faster-whisper
|
||||
|
||||
from faster_whisper import WhisperModel
|
||||
|
||||
model = WhisperModel("base", device="cuda", compute_type="float16")
|
||||
|
||||
# Transcribe with streaming
|
||||
segments, info = model.transcribe("audio.mp3", beam_size=5)
|
||||
|
||||
for segment in segments:
|
||||
print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
|
||||
```
|
||||
|
||||
## GPU acceleration
|
||||
|
||||
```python
|
||||
import whisper
|
||||
|
||||
# Automatically uses GPU if available
|
||||
model = whisper.load_model("turbo")
|
||||
|
||||
# Force CPU
|
||||
model = whisper.load_model("turbo", device="cpu")
|
||||
|
||||
# Force GPU
|
||||
model = whisper.load_model("turbo", device="cuda")
|
||||
|
||||
# 10-20× faster on GPU
|
||||
```
|
||||
|
||||
## Integration with other tools
|
||||
|
||||
### Subtitle generation
|
||||
|
||||
```bash
|
||||
# Generate SRT subtitles
|
||||
whisper video.mp4 --output_format srt --language English
|
||||
|
||||
# Output: video.srt
|
||||
```
|
||||
|
||||
### With LangChain
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import WhisperTranscriptionLoader
|
||||
|
||||
loader = WhisperTranscriptionLoader(file_path="audio.mp3")
|
||||
docs = loader.load()
|
||||
|
||||
# Use transcription in RAG
|
||||
from langchain_chroma import Chroma
|
||||
from langchain_openai import OpenAIEmbeddings
|
||||
|
||||
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
|
||||
```
|
||||
|
||||
### Extract audio from video
|
||||
|
||||
```bash
|
||||
# Use ffmpeg to extract audio
|
||||
ffmpeg -i video.mp4 -vn -acodec pcm_s16le audio.wav
|
||||
|
||||
# Then transcribe
|
||||
whisper audio.wav
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
1. **Use turbo model** - Best speed/quality for English
|
||||
2. **Specify language** - Faster than auto-detect
|
||||
3. **Add initial prompt** - Improves technical terms
|
||||
4. **Use GPU** - 10-20× faster
|
||||
5. **Batch process** - More efficient
|
||||
6. **Convert to WAV** - Better compatibility
|
||||
7. **Split long audio** - <30 min chunks
|
||||
8. **Check language support** - Quality varies by language
|
||||
9. **Use faster-whisper** - 4× faster than openai-whisper
|
||||
10. **Monitor VRAM** - Scale model size to hardware
|
||||
|
||||
## Performance
|
||||
|
||||
| Model | Real-time factor (CPU) | Real-time factor (GPU) |
|
||||
|-------|------------------------|------------------------|
|
||||
| tiny | ~0.32 | ~0.01 |
|
||||
| base | ~0.16 | ~0.01 |
|
||||
| turbo | ~0.08 | ~0.01 |
|
||||
| large | ~1.0 | ~0.05 |
|
||||
|
||||
*Real-time factor: 0.1 = 10× faster than real-time*
|
||||
|
||||
## Language support
|
||||
|
||||
Top-supported languages:
|
||||
- English (en)
|
||||
- Spanish (es)
|
||||
- French (fr)
|
||||
- German (de)
|
||||
- Italian (it)
|
||||
- Portuguese (pt)
|
||||
- Russian (ru)
|
||||
- Japanese (ja)
|
||||
- Korean (ko)
|
||||
- Chinese (zh)
|
||||
|
||||
Full list: 99 languages total
|
||||
|
||||
## Limitations
|
||||
|
||||
1. **Hallucinations** - May repeat or invent text
|
||||
2. **Long-form accuracy** - Degrades on >30 min audio
|
||||
3. **Speaker identification** - No diarization
|
||||
4. **Accents** - Quality varies
|
||||
5. **Background noise** - Can affect accuracy
|
||||
6. **Real-time latency** - Not suitable for live captioning
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/openai/whisper ⭐ 72,900+
|
||||
- **Paper**: https://arxiv.org/abs/2212.04356
|
||||
- **Model Card**: https://github.com/openai/whisper/blob/main/model-card.md
|
||||
- **Colab**: Available in repo
|
||||
- **License**: MIT
|
||||
|
|
@ -0,0 +1,113 @@
|
|||
---
|
||||
title: "Canvas — Canvas LMS integration — fetch enrolled courses and assignments using API token authentication"
|
||||
sidebar_label: "Canvas"
|
||||
description: "Canvas LMS integration — fetch enrolled courses and assignments using API token authentication"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Canvas
|
||||
|
||||
Canvas LMS integration — fetch enrolled courses and assignments using API token authentication.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/productivity/canvas` |
|
||||
| Path | `optional-skills/productivity/canvas` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | community |
|
||||
| License | MIT |
|
||||
| Tags | `Canvas`, `LMS`, `Education`, `Courses`, `Assignments` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Canvas LMS — Course & Assignment Access
|
||||
|
||||
Read-only access to Canvas LMS for listing courses and assignments.
|
||||
|
||||
## Scripts
|
||||
|
||||
- `scripts/canvas_api.py` — Python CLI for Canvas API calls
|
||||
|
||||
## Setup
|
||||
|
||||
1. Log in to your Canvas instance in a browser
|
||||
2. Go to **Account → Settings** (click your profile icon, then Settings)
|
||||
3. Scroll to **Approved Integrations** and click **+ New Access Token**
|
||||
4. Name the token (e.g., "Hermes Agent"), set an optional expiry, and click **Generate Token**
|
||||
5. Copy the token and add to `~/.hermes/.env`:
|
||||
|
||||
```
|
||||
CANVAS_API_TOKEN=your_token_here
|
||||
CANVAS_BASE_URL=https://yourschool.instructure.com
|
||||
```
|
||||
|
||||
The base URL is whatever appears in your browser when you're logged into Canvas (no trailing slash).
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
CANVAS="python $HERMES_HOME/skills/productivity/canvas/scripts/canvas_api.py"
|
||||
|
||||
# List all active courses
|
||||
$CANVAS list_courses --enrollment-state active
|
||||
|
||||
# List all courses (any state)
|
||||
$CANVAS list_courses
|
||||
|
||||
# List assignments for a specific course
|
||||
$CANVAS list_assignments 12345
|
||||
|
||||
# List assignments ordered by due date
|
||||
$CANVAS list_assignments 12345 --order-by due_at
|
||||
```
|
||||
|
||||
## Output Format
|
||||
|
||||
**list_courses** returns:
|
||||
```json
|
||||
[{"id": 12345, "name": "Intro to CS", "course_code": "CS101", "workflow_state": "available", "start_at": "...", "end_at": "..."}]
|
||||
```
|
||||
|
||||
**list_assignments** returns:
|
||||
```json
|
||||
[{"id": 67890, "name": "Homework 1", "due_at": "2025-02-15T23:59:00Z", "points_possible": 100, "submission_types": ["online_upload"], "html_url": "...", "description": "...", "course_id": 12345}]
|
||||
```
|
||||
|
||||
Note: Assignment descriptions are truncated to 500 characters. The `html_url` field links to the full assignment page in Canvas.
|
||||
|
||||
## API Reference (curl)
|
||||
|
||||
```bash
|
||||
# List courses
|
||||
curl -s -H "Authorization: Bearer $CANVAS_API_TOKEN" \
|
||||
"$CANVAS_BASE_URL/api/v1/courses?enrollment_state=active&per_page=10"
|
||||
|
||||
# List assignments for a course
|
||||
curl -s -H "Authorization: Bearer $CANVAS_API_TOKEN" \
|
||||
"$CANVAS_BASE_URL/api/v1/courses/COURSE_ID/assignments?per_page=10&order_by=due_at"
|
||||
```
|
||||
|
||||
Canvas uses `Link` headers for pagination. The Python script handles pagination automatically.
|
||||
|
||||
## Rules
|
||||
|
||||
- This skill is **read-only** — it only fetches data, never modifies courses or assignments
|
||||
- On first use, verify auth by running `$CANVAS list_courses` — if it fails with 401, guide the user through setup
|
||||
- Canvas rate-limits to ~700 requests per 10 minutes; check `X-Rate-Limit-Remaining` header if hitting limits
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Fix |
|
||||
|---------|-----|
|
||||
| 401 Unauthorized | Token invalid or expired — regenerate in Canvas Settings |
|
||||
| 403 Forbidden | Token lacks permission for this course |
|
||||
| Empty course list | Try `--enrollment-state active` or omit the flag to see all states |
|
||||
| Wrong institution | Verify `CANVAS_BASE_URL` matches the URL in your browser |
|
||||
| Timeout errors | Check network connectivity to your Canvas instance |
|
||||
|
|
@ -0,0 +1,336 @@
|
|||
---
|
||||
title: "Memento Flashcards — Spaced-repetition flashcard system"
|
||||
sidebar_label: "Memento Flashcards"
|
||||
description: "Spaced-repetition flashcard system"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Memento Flashcards
|
||||
|
||||
Spaced-repetition flashcard system. Create cards from facts or text, chat with flashcards using free-text answers graded by the agent, generate quizzes from YouTube transcripts, review due cards with adaptive scheduling, and export/import decks as CSV.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/productivity/memento-flashcards` |
|
||||
| Path | `optional-skills/productivity/memento-flashcards` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Memento AI |
|
||||
| License | MIT |
|
||||
| Platforms | macos, linux |
|
||||
| Tags | `Education`, `Flashcards`, `Spaced Repetition`, `Learning`, `Quiz`, `YouTube` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Memento Flashcards — Spaced-Repetition Flashcard Skill
|
||||
|
||||
## Overview
|
||||
|
||||
Memento gives you a local, file-based flashcard system with spaced-repetition scheduling.
|
||||
Users can chat with their flashcards by answering in free text and having the agent grade the response before scheduling the next review.
|
||||
Use it whenever the user wants to:
|
||||
|
||||
- **Remember a fact** — turn any statement into a Q/A flashcard
|
||||
- **Study with spaced repetition** — review due cards with adaptive intervals and agent-graded free-text answers
|
||||
- **Quiz from a YouTube video** — fetch a transcript and generate a 5-question quiz
|
||||
- **Manage decks** — organise cards into collections, export/import CSV
|
||||
|
||||
All card data lives in a single JSON file. No external API keys are required — you (the agent) generate flashcard content and quiz questions directly.
|
||||
|
||||
User-facing response style for Memento Flashcards:
|
||||
- Use plain text only. Do not use Markdown formatting in replies to the user.
|
||||
- Keep review and quiz feedback brief and neutral. Avoid extra praise, pep, or long explanations.
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when the user wants to:
|
||||
- Save facts as flashcards for later review
|
||||
- Review due cards with spaced repetition
|
||||
- Generate a quiz from a YouTube video transcript
|
||||
- Import, export, inspect, or delete flashcard data
|
||||
|
||||
Do not use this skill for general Q&A, coding help, or non-memory tasks.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| User intent | Action |
|
||||
|---|---|
|
||||
| "Remember that X" / "save this as a flashcard" | Generate a Q/A card, call `memento_cards.py add` |
|
||||
| Sends a fact without mentioning flashcards | Ask "Want me to save this as a Memento flashcard?" — only create if confirmed |
|
||||
| "Create a flashcard" | Ask for Q, A, collection; call `memento_cards.py add` |
|
||||
| "Review my cards" | Call `memento_cards.py due`, present cards one-by-one |
|
||||
| "Quiz me on [YouTube URL]" | Call `youtube_quiz.py fetch VIDEO_ID`, generate 5 questions, call `memento_cards.py add-quiz` |
|
||||
| "Export my cards" | Call `memento_cards.py export --output PATH` |
|
||||
| "Import cards from CSV" | Call `memento_cards.py import --file PATH --collection NAME` |
|
||||
| "Show my stats" | Call `memento_cards.py stats` |
|
||||
| "Delete a card" | Call `memento_cards.py delete --id ID` |
|
||||
| "Delete a collection" | Call `memento_cards.py delete-collection --collection NAME` |
|
||||
|
||||
## Card Storage
|
||||
|
||||
Cards are stored in a JSON file at:
|
||||
|
||||
```
|
||||
~/.hermes/skills/productivity/memento-flashcards/data/cards.json
|
||||
```
|
||||
|
||||
**Never edit this file directly.** Always use `memento_cards.py` subcommands. The script handles atomic writes (write to temp file, then rename) to prevent corruption.
|
||||
|
||||
The file is created automatically on first use.
|
||||
|
||||
## Procedure
|
||||
|
||||
### Creating Cards from Facts
|
||||
|
||||
### Activation Rules
|
||||
|
||||
Not every factual statement should become a flashcard. Use this three-tier check:
|
||||
|
||||
1. **Explicit intent** — the user mentions "memento", "flashcard", "remember this", "save this card", "add a card", or similar phrasing that clearly requests a flashcard → **create the card directly**, no confirmation needed.
|
||||
2. **Implicit intent** — the user sends a factual statement without mentioning flashcards (e.g. "The speed of light is 299,792 km/s") → **ask first**: "Want me to save this as a Memento flashcard?" Only create the card if the user confirms.
|
||||
3. **No intent** — the message is a coding task, a question, instructions, normal conversation, or anything that is clearly not a fact to memorize → **do NOT activate this skill at all**. Let other skills or default behavior handle it.
|
||||
|
||||
When activation is confirmed (tier 1 directly, tier 2 after confirmation), generate a flashcard:
|
||||
|
||||
**Step 1:** Turn the statement into a Q/A pair. Use this format internally:
|
||||
|
||||
```
|
||||
Turn the factual statement into a front-back pair.
|
||||
Return exactly two lines:
|
||||
Q: <question text>
|
||||
A: <answer text>
|
||||
|
||||
Statement: "{statement}"
|
||||
```
|
||||
|
||||
Rules:
|
||||
- The question should test recall of the key fact
|
||||
- The answer should be concise and direct
|
||||
|
||||
**Step 2:** Call the script to store the card:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py add \
|
||||
--question "What year did World War 2 end?" \
|
||||
--answer "1945" \
|
||||
--collection "History"
|
||||
```
|
||||
|
||||
If the user doesn't specify a collection, use `"General"` as the default.
|
||||
|
||||
The script outputs JSON confirming the created card.
|
||||
|
||||
### Manual Card Creation
|
||||
|
||||
When the user explicitly asks to create a flashcard, ask them for:
|
||||
1. The question (front of card)
|
||||
2. The answer (back of card)
|
||||
3. The collection name (optional — default to `"General"`)
|
||||
|
||||
Then call `memento_cards.py add` as above.
|
||||
|
||||
### Reviewing Due Cards
|
||||
|
||||
When the user wants to review, fetch all due cards:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py due
|
||||
```
|
||||
|
||||
This returns a JSON array of cards where `next_review_at <= now`. If a collection filter is needed:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py due --collection "History"
|
||||
```
|
||||
|
||||
**Review flow (free-text grading):**
|
||||
|
||||
Here is an example of the EXACT interaction pattern you must follow. The user answers, you grade them, tell them the correct answer, then rate the card.
|
||||
|
||||
**Example interaction:**
|
||||
|
||||
> **Agent:** What year did the Berlin Wall fall?
|
||||
>
|
||||
> **User:** 1991
|
||||
>
|
||||
> **Agent:** Not quite. The Berlin Wall fell in 1989. Next review is tomorrow.
|
||||
> *(agent calls: memento_cards.py rate --id ABC --rating hard --user-answer "1991")*
|
||||
>
|
||||
> Next question: Who was the first person to walk on the moon?
|
||||
|
||||
**The rules:**
|
||||
|
||||
1. Show only the question. Wait for the user to answer.
|
||||
2. After receiving their answer, compare it to the expected answer and grade it:
|
||||
- **correct** → user got the key fact right (even if worded differently)
|
||||
- **partial** → right track but missing the core detail
|
||||
- **incorrect** → wrong or off-topic
|
||||
3. **You MUST tell the user the correct answer and how they did.** Keep it short and plain-text. Use this format:
|
||||
- correct: "Correct. Answer: {answer}. Next review in 7 days."
|
||||
- partial: "Close. Answer: {answer}. {what they missed}. Next review in 3 days."
|
||||
- incorrect: "Not quite. Answer: {answer}. Next review tomorrow."
|
||||
4. Then call the rate command: correct→easy, partial→good, incorrect→hard.
|
||||
5. Then show the next question.
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py rate \
|
||||
--id CARD_ID --rating easy --user-answer "what the user said"
|
||||
```
|
||||
|
||||
**Never skip step 3.** The user must always see the correct answer and feedback before you move on.
|
||||
|
||||
If no cards are due, tell the user: "No cards due for review right now. Check back later!"
|
||||
|
||||
**Retire override:** At any point the user can say "retire this card" to permanently remove it from reviews. Use `--rating retire` for this.
|
||||
|
||||
### Spaced Repetition Algorithm
|
||||
|
||||
The rating determines the next review interval:
|
||||
|
||||
| Rating | Interval | ease_streak | Status change |
|
||||
|---|---|---|---|
|
||||
| **hard** | +1 day | reset to 0 | stays learning |
|
||||
| **good** | +3 days | reset to 0 | stays learning |
|
||||
| **easy** | +7 days | +1 | if ease_streak >= 3 → retired |
|
||||
| **retire** | permanent | reset to 0 | → retired |
|
||||
|
||||
- **learning**: card is actively in rotation
|
||||
- **retired**: card won't appear in reviews (user has mastered it or manually retired it)
|
||||
- Three consecutive "easy" ratings automatically retire a card
|
||||
|
||||
### YouTube Quiz Generation
|
||||
|
||||
When the user sends a YouTube URL and wants a quiz:
|
||||
|
||||
**Step 1:** Extract the video ID from the URL (e.g. `dQw4w9WgXcQ` from `https://www.youtube.com/watch?v=dQw4w9WgXcQ`).
|
||||
|
||||
**Step 2:** Fetch the transcript:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/youtube_quiz.py fetch VIDEO_ID
|
||||
```
|
||||
|
||||
This returns `{"title": "...", "transcript": "..."}` or an error.
|
||||
|
||||
If the script reports `missing_dependency`, tell the user to install it:
|
||||
```bash
|
||||
pip install youtube-transcript-api
|
||||
```
|
||||
|
||||
**Step 3:** Generate 5 quiz questions from the transcript. Use these rules:
|
||||
|
||||
```
|
||||
You are creating a 5-question quiz for a podcast episode.
|
||||
Return ONLY a JSON array with exactly 5 objects.
|
||||
Each object must contain keys 'question' and 'answer'.
|
||||
|
||||
Selection criteria:
|
||||
- Prioritize important, surprising, or foundational facts.
|
||||
- Skip filler, obvious details, and facts that require heavy context.
|
||||
- Never return true/false questions.
|
||||
- Never ask only for a date.
|
||||
|
||||
Question rules:
|
||||
- Each question must test exactly one discrete fact.
|
||||
- Use clear, unambiguous wording.
|
||||
- Prefer What, Who, How many, Which.
|
||||
- Avoid open-ended Describe or Explain prompts.
|
||||
|
||||
Answer rules:
|
||||
- Each answer must be under 240 characters.
|
||||
- Lead with the answer itself, not preamble.
|
||||
- Add only minimal clarifying detail if needed.
|
||||
```
|
||||
|
||||
Use the first 15,000 characters of the transcript as context. Generate the questions yourself (you are the LLM).
|
||||
|
||||
**Step 4:** Validate the output is valid JSON with exactly 5 items, each having non-empty `question` and `answer` strings. If validation fails, retry once.
|
||||
|
||||
**Step 5:** Store quiz cards:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py add-quiz \
|
||||
--video-id "VIDEO_ID" \
|
||||
--questions '[{"question":"...","answer":"..."},...]' \
|
||||
--collection "Quiz - Episode Title"
|
||||
```
|
||||
|
||||
The script deduplicates by `video_id` — if cards for that video already exist, it skips creation and reports the existing cards.
|
||||
|
||||
**Step 6:** Present questions one-by-one using the same free-text grading flow:
|
||||
1. Show "Question 1/5: ..." and wait for the user's answer. Never include the answer or any hint about revealing it.
|
||||
2. Wait for the user to answer in their own words
|
||||
3. Grade their answer using the grading prompt (see "Reviewing Due Cards" section)
|
||||
4. **IMPORTANT: You MUST reply to the user with feedback before doing anything else.** Show the grade, the correct answer, and when the card is next due. Do NOT silently skip to the next question. Keep it short and plain-text. Example: "Not quite. Answer: {answer}. Next review tomorrow."
|
||||
5. **After showing feedback**, call the rate command and then show the next question in the same message:
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py rate \
|
||||
--id CARD_ID --rating easy --user-answer "what the user said"
|
||||
```
|
||||
6. Repeat. Every answer MUST receive visible feedback before the next question.
|
||||
|
||||
### Export/Import CSV
|
||||
|
||||
**Export:**
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py export \
|
||||
--output ~/flashcards.csv
|
||||
```
|
||||
|
||||
Produces a 3-column CSV: `question,answer,collection` (no header row).
|
||||
|
||||
**Import:**
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py import \
|
||||
--file ~/flashcards.csv \
|
||||
--collection "Imported"
|
||||
```
|
||||
|
||||
Reads a CSV with columns: question, answer, and optionally collection (column 3). If the collection column is missing, uses the `--collection` argument.
|
||||
|
||||
### Statistics
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py stats
|
||||
```
|
||||
|
||||
Returns JSON with:
|
||||
- `total`: total card count
|
||||
- `learning`: cards in active rotation
|
||||
- `retired`: mastered cards
|
||||
- `due_now`: cards due for review right now
|
||||
- `collections`: breakdown by collection name
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **Never edit `cards.json` directly** — always use the script subcommands to avoid corruption
|
||||
- **Transcript failures** — some YouTube videos have no English transcript or have transcripts disabled; inform the user and suggest another video
|
||||
- **Optional dependency** — `youtube_quiz.py` needs `youtube-transcript-api`; if missing, tell the user to run `pip install youtube-transcript-api`
|
||||
- **Large imports** — CSV imports with thousands of rows work fine but the JSON output may be verbose; summarize the result for the user
|
||||
- **Video ID extraction** — support both `youtube.com/watch?v=ID` and `youtu.be/ID` URL formats
|
||||
|
||||
## Verification
|
||||
|
||||
Verify the helper scripts directly:
|
||||
|
||||
```bash
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py stats
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py add --question "Capital of France?" --answer "Paris" --collection "General"
|
||||
python3 ~/.hermes/skills/productivity/memento-flashcards/scripts/memento_cards.py due
|
||||
```
|
||||
|
||||
If you are testing from the repo checkout, run:
|
||||
|
||||
```bash
|
||||
pytest tests/skills/test_memento_cards.py tests/skills/test_youtube_quiz.py -q
|
||||
```
|
||||
|
||||
Agent-level verification:
|
||||
- Start a review and confirm feedback is plain text, brief, and always includes the correct answer before the next card
|
||||
- Run a YouTube quiz flow and confirm each answer receives visible feedback before the next question
|
||||
|
|
@ -0,0 +1,304 @@
|
|||
---
|
||||
title: "Siyuan"
|
||||
sidebar_label: "Siyuan"
|
||||
description: "SiYuan Note API for searching, reading, creating, and managing blocks and documents in a self-hosted knowledge base via curl"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Siyuan
|
||||
|
||||
SiYuan Note API for searching, reading, creating, and managing blocks and documents in a self-hosted knowledge base via curl.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/productivity/siyuan` |
|
||||
| Path | `optional-skills/productivity/siyuan` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | FEUAZUR |
|
||||
| License | MIT |
|
||||
| Tags | `SiYuan`, `Notes`, `Knowledge Base`, `PKM`, `API` |
|
||||
| Related skills | [`obsidian`](/docs/user-guide/skills/bundled/note-taking/note-taking-obsidian), [`notion`](/docs/user-guide/skills/bundled/productivity/productivity-notion) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# SiYuan Note API
|
||||
|
||||
Use the [SiYuan](https://github.com/siyuan-note/siyuan) kernel API via curl to search, read, create, update, and delete blocks and documents in a self-hosted knowledge base. No extra tools needed -- just curl and an API token.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Install and run SiYuan (desktop or Docker)
|
||||
2. Get your API token: **Settings > About > API token**
|
||||
3. Store it in `~/.hermes/.env`:
|
||||
```
|
||||
SIYUAN_TOKEN=your_token_here
|
||||
SIYUAN_URL=http://127.0.0.1:6806
|
||||
```
|
||||
`SIYUAN_URL` defaults to `http://127.0.0.1:6806` if not set.
|
||||
|
||||
## API Basics
|
||||
|
||||
All SiYuan API calls are **POST with JSON body**. Every request follows this pattern:
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/..." \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"param": "value"}'
|
||||
```
|
||||
|
||||
Responses are JSON with this structure:
|
||||
```json
|
||||
{"code": 0, "msg": "", "data": { ... }}
|
||||
```
|
||||
`code: 0` means success. Any other value is an error -- check `msg` for details.
|
||||
|
||||
**ID format:** SiYuan IDs look like `20210808180117-6v0mkxr` (14-digit timestamp + 7 alphanumeric chars).
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Operation | Endpoint |
|
||||
|-----------|----------|
|
||||
| Full-text search | `/api/search/fullTextSearchBlock` |
|
||||
| SQL query | `/api/query/sql` |
|
||||
| Read block | `/api/block/getBlockKramdown` |
|
||||
| Read children | `/api/block/getChildBlocks` |
|
||||
| Get path | `/api/filetree/getHPathByID` |
|
||||
| Get attributes | `/api/attr/getBlockAttrs` |
|
||||
| List notebooks | `/api/notebook/lsNotebooks` |
|
||||
| List documents | `/api/filetree/listDocsByPath` |
|
||||
| Create notebook | `/api/notebook/createNotebook` |
|
||||
| Create document | `/api/filetree/createDocWithMd` |
|
||||
| Append block | `/api/block/appendBlock` |
|
||||
| Update block | `/api/block/updateBlock` |
|
||||
| Rename document | `/api/filetree/renameDocByID` |
|
||||
| Set attributes | `/api/attr/setBlockAttrs` |
|
||||
| Delete block | `/api/block/deleteBlock` |
|
||||
| Delete document | `/api/filetree/removeDocByID` |
|
||||
| Export as Markdown | `/api/export/exportMdContent` |
|
||||
|
||||
## Common Operations
|
||||
|
||||
### Search (Full-Text)
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/search/fullTextSearchBlock" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"query": "meeting notes", "page": 0}' | jq '.data.blocks[:5]'
|
||||
```
|
||||
|
||||
### Search (SQL)
|
||||
|
||||
Query the blocks database directly. Only SELECT statements are safe.
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/query/sql" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"stmt": "SELECT id, content, type, box FROM blocks WHERE content LIKE '\''%keyword%'\'' AND type='\''p'\'' LIMIT 20"}' | jq '.data'
|
||||
```
|
||||
|
||||
Useful columns: `id`, `parent_id`, `root_id`, `box` (notebook ID), `path`, `content`, `type`, `subtype`, `created`, `updated`.
|
||||
|
||||
### Read Block Content
|
||||
|
||||
Returns block content in Kramdown (Markdown-like) format.
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/block/getBlockKramdown" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "20210808180117-6v0mkxr"}' | jq '.data.kramdown'
|
||||
```
|
||||
|
||||
### Read Child Blocks
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/block/getChildBlocks" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "20210808180117-6v0mkxr"}' | jq '.data'
|
||||
```
|
||||
|
||||
### Get Human-Readable Path
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/filetree/getHPathByID" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "20210808180117-6v0mkxr"}' | jq '.data'
|
||||
```
|
||||
|
||||
### Get Block Attributes
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/attr/getBlockAttrs" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "20210808180117-6v0mkxr"}' | jq '.data'
|
||||
```
|
||||
|
||||
### List Notebooks
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/notebook/lsNotebooks" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{}' | jq '.data.notebooks[] | {id, name, closed}'
|
||||
```
|
||||
|
||||
### List Documents in a Notebook
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/filetree/listDocsByPath" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"notebook": "NOTEBOOK_ID", "path": "/"}' | jq '.data.files[] | {id, name}'
|
||||
```
|
||||
|
||||
### Create a Document
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/filetree/createDocWithMd" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"notebook": "NOTEBOOK_ID",
|
||||
"path": "/Meeting Notes/2026-03-22",
|
||||
"markdown": "# Meeting Notes\n\n- Discussed project timeline\n- Assigned tasks"
|
||||
}' | jq '.data'
|
||||
```
|
||||
|
||||
### Create a Notebook
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/notebook/createNotebook" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"name": "My New Notebook"}' | jq '.data.notebook.id'
|
||||
```
|
||||
|
||||
### Append Block to Document
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/block/appendBlock" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"parentID": "DOCUMENT_OR_BLOCK_ID",
|
||||
"data": "New paragraph added at the end.",
|
||||
"dataType": "markdown"
|
||||
}' | jq '.data'
|
||||
```
|
||||
|
||||
Also available: `/api/block/prependBlock` (same params, inserts at the beginning) and `/api/block/insertBlock` (uses `previousID` instead of `parentID` to insert after a specific block).
|
||||
|
||||
### Update Block Content
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/block/updateBlock" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"id": "BLOCK_ID",
|
||||
"data": "Updated content here.",
|
||||
"dataType": "markdown"
|
||||
}' | jq '.data'
|
||||
```
|
||||
|
||||
### Rename a Document
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/filetree/renameDocByID" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "DOCUMENT_ID", "title": "New Title"}'
|
||||
```
|
||||
|
||||
### Set Block Attributes
|
||||
|
||||
Custom attributes must be prefixed with `custom-`:
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/attr/setBlockAttrs" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"id": "BLOCK_ID",
|
||||
"attrs": {
|
||||
"custom-status": "reviewed",
|
||||
"custom-priority": "high"
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
### Delete a Block
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/block/deleteBlock" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "BLOCK_ID"}'
|
||||
```
|
||||
|
||||
To delete a whole document: use `/api/filetree/removeDocByID` with `{"id": "DOC_ID"}`.
|
||||
To delete a notebook: use `/api/notebook/removeNotebook` with `{"notebook": "NOTEBOOK_ID"}`.
|
||||
|
||||
### Export Document as Markdown
|
||||
|
||||
```bash
|
||||
curl -s -X POST "${SIYUAN_URL:-http://127.0.0.1:6806}/api/export/exportMdContent" \
|
||||
-H "Authorization: Token $SIYUAN_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"id": "DOCUMENT_ID"}' | jq -r '.data.content'
|
||||
```
|
||||
|
||||
## Block Types
|
||||
|
||||
Common `type` values in SQL queries:
|
||||
|
||||
| Type | Description |
|
||||
|------|-------------|
|
||||
| `d` | Document (root block) |
|
||||
| `p` | Paragraph |
|
||||
| `h` | Heading |
|
||||
| `l` | List |
|
||||
| `i` | List item |
|
||||
| `c` | Code block |
|
||||
| `m` | Math block |
|
||||
| `t` | Table |
|
||||
| `b` | Blockquote |
|
||||
| `s` | Super block |
|
||||
| `html` | HTML block |
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **All endpoints are POST** -- even read-only operations. Do not use GET.
|
||||
- **SQL safety**: only use SELECT queries. INSERT/UPDATE/DELETE/DROP are dangerous and should never be sent.
|
||||
- **ID validation**: IDs match the pattern `YYYYMMDDHHmmss-xxxxxxx`. Reject anything else.
|
||||
- **Error responses**: always check `code != 0` in responses before processing `data`.
|
||||
- **Large documents**: block content and export results can be very large. Use `LIMIT` in SQL and pipe through `jq` to extract only what you need.
|
||||
- **Notebook IDs**: when working with a specific notebook, get its ID first via `lsNotebooks`.
|
||||
|
||||
## Alternative: MCP Server
|
||||
|
||||
If you prefer a native integration instead of curl, install the SiYuan MCP server:
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml under mcp_servers:
|
||||
mcp_servers:
|
||||
siyuan:
|
||||
command: npx
|
||||
args: ["-y", "@porkll/siyuan-mcp"]
|
||||
env:
|
||||
SIYUAN_TOKEN: "your_token"
|
||||
SIYUAN_URL: "http://127.0.0.1:6806"
|
||||
```
|
||||
|
|
@ -0,0 +1,434 @@
|
|||
---
|
||||
title: "Telephony — Give Hermes phone capabilities without core tool changes"
|
||||
sidebar_label: "Telephony"
|
||||
description: "Give Hermes phone capabilities without core tool changes"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Telephony
|
||||
|
||||
Give Hermes phone capabilities without core tool changes. Provision and persist a Twilio number, send and receive SMS/MMS, make direct calls, and place AI-driven outbound calls through Bland.ai or Vapi.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/productivity/telephony` |
|
||||
| Path | `optional-skills/productivity/telephony` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Nous Research |
|
||||
| License | MIT |
|
||||
| Tags | `telephony`, `phone`, `sms`, `mms`, `voice`, `twilio`, `bland.ai`, `vapi`, `calling`, `texting` |
|
||||
| Related skills | [`maps`](/docs/user-guide/skills/bundled/productivity/productivity-maps), [`google-workspace`](/docs/user-guide/skills/bundled/productivity/productivity-google-workspace), [`agentmail`](/docs/user-guide/skills/optional/email/email-agentmail) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Telephony — Numbers, Calls, and Texts without Core Tool Changes
|
||||
|
||||
This optional skill gives Hermes practical phone capabilities while keeping telephony out of the core tool list.
|
||||
|
||||
It ships with a helper script, `scripts/telephony.py`, that can:
|
||||
- save provider credentials into `~/.hermes/.env`
|
||||
- search for and buy a Twilio phone number
|
||||
- remember that owned number for later sessions
|
||||
- send SMS / MMS from the owned number
|
||||
- poll inbound SMS for that number with no webhook server required
|
||||
- make direct Twilio calls using TwiML `<Say>` or `<Play>`
|
||||
- import the owned Twilio number into Vapi
|
||||
- place outbound AI calls through Bland.ai or Vapi
|
||||
|
||||
## What this solves
|
||||
|
||||
This skill is meant to cover the practical phone tasks users actually want:
|
||||
- outbound calls
|
||||
- texting
|
||||
- owning a reusable agent number
|
||||
- checking messages that arrive to that number later
|
||||
- preserving that number and related IDs between sessions
|
||||
- future-friendly telephony identity for inbound SMS polling and other automations
|
||||
|
||||
It does **not** turn Hermes into a real-time inbound phone gateway. Inbound SMS is handled by polling the Twilio REST API. That is enough for many workflows, including notifications and some one-time-code retrieval, without adding core webhook infrastructure.
|
||||
|
||||
## Safety rules — mandatory
|
||||
|
||||
1. Always confirm before placing a call or sending a text.
|
||||
2. Never dial emergency numbers.
|
||||
3. Never use telephony for harassment, spam, impersonation, or anything illegal.
|
||||
4. Treat third-party phone numbers as sensitive operational data:
|
||||
- do not save them to Hermes memory
|
||||
- do not include them in skill docs, summaries, or follow-up notes unless the user explicitly wants that
|
||||
5. It is fine to persist the **agent-owned Twilio number** because that is part of the user's configuration.
|
||||
6. VoIP numbers are **not guaranteed** to work for all third-party 2FA flows. Use with caution and set user expectations clearly.
|
||||
|
||||
## Decision tree — which service to use?
|
||||
|
||||
Use this logic instead of hardcoded provider routing:
|
||||
|
||||
### 1) "I want Hermes to own a real phone number"
|
||||
Use **Twilio**.
|
||||
|
||||
Why:
|
||||
- easiest path to buying and keeping a number
|
||||
- best SMS / MMS support
|
||||
- simplest inbound SMS polling story
|
||||
- cleanest future path to inbound webhooks or call handling
|
||||
|
||||
Use cases:
|
||||
- receive texts later
|
||||
- send deployment alerts / cron notifications
|
||||
- maintain a reusable phone identity for the agent
|
||||
- experiment with phone-based auth flows later
|
||||
|
||||
### 2) "I only need the easiest outbound AI phone call right now"
|
||||
Use **Bland.ai**.
|
||||
|
||||
Why:
|
||||
- quickest setup
|
||||
- one API key
|
||||
- no need to first buy/import a number yourself
|
||||
|
||||
Tradeoff:
|
||||
- less flexible
|
||||
- voice quality is decent, but not the best
|
||||
|
||||
### 3) "I want the best conversational AI voice quality"
|
||||
Use **Twilio + Vapi**.
|
||||
|
||||
Why:
|
||||
- Twilio gives you the owned number
|
||||
- Vapi gives you better conversational AI call quality and more voice/model flexibility
|
||||
|
||||
Recommended flow:
|
||||
1. Buy/save a Twilio number
|
||||
2. Import it into Vapi
|
||||
3. Save the returned `VAPI_PHONE_NUMBER_ID`
|
||||
4. Use `ai-call --provider vapi`
|
||||
|
||||
### 4) "I want to call with a custom prerecorded voice message"
|
||||
Use **Twilio direct call** with a public audio URL.
|
||||
|
||||
Why:
|
||||
- easiest way to play a custom MP3
|
||||
- pairs well with Hermes `text_to_speech` plus a public file host or tunnel
|
||||
|
||||
## Files and persistent state
|
||||
|
||||
The skill persists telephony state in two places:
|
||||
|
||||
### `~/.hermes/.env`
|
||||
Used for long-lived provider credentials and owned-number IDs, for example:
|
||||
- `TWILIO_ACCOUNT_SID`
|
||||
- `TWILIO_AUTH_TOKEN`
|
||||
- `TWILIO_PHONE_NUMBER`
|
||||
- `TWILIO_PHONE_NUMBER_SID`
|
||||
- `BLAND_API_KEY`
|
||||
- `VAPI_API_KEY`
|
||||
- `VAPI_PHONE_NUMBER_ID`
|
||||
- `PHONE_PROVIDER` (AI call provider: bland or vapi)
|
||||
|
||||
### `~/.hermes/telephony_state.json`
|
||||
Used for skill-only state that should survive across sessions, for example:
|
||||
- remembered default Twilio number / SID
|
||||
- remembered Vapi phone number ID
|
||||
- last inbound message SID/date for inbox polling checkpoints
|
||||
|
||||
This means:
|
||||
- the next time the skill is loaded, `diagnose` can tell you what number is already configured
|
||||
- `twilio-inbox --since-last --mark-seen` can continue from the previous checkpoint
|
||||
|
||||
## Locate the helper script
|
||||
|
||||
After installing this skill, locate the script like this:
|
||||
|
||||
```bash
|
||||
SCRIPT="$(find ~/.hermes/skills -path '*/telephony/scripts/telephony.py' -print -quit)"
|
||||
```
|
||||
|
||||
If `SCRIPT` is empty, the skill is not installed yet.
|
||||
|
||||
## Install
|
||||
|
||||
This is an official optional skill, so install it from the Skills Hub:
|
||||
|
||||
```bash
|
||||
hermes skills search telephony
|
||||
hermes skills install official/productivity/telephony
|
||||
```
|
||||
|
||||
## Provider setup
|
||||
|
||||
### Twilio — owned number, SMS/MMS, direct calls, inbound SMS polling
|
||||
|
||||
Sign up at:
|
||||
- https://www.twilio.com/try-twilio
|
||||
|
||||
Then save credentials into Hermes:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-twilio ACXXXXXXXXXXXXXXXXXXXXXXXXXXXX your_auth_token_here
|
||||
```
|
||||
|
||||
Search for available numbers:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 5
|
||||
```
|
||||
|
||||
Buy and remember a number:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
|
||||
```
|
||||
|
||||
List owned numbers:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-owned
|
||||
```
|
||||
|
||||
Set one of them as the default later:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-set-default "+17025551234" --save-env
|
||||
# or
|
||||
python3 "$SCRIPT" twilio-set-default PNXXXXXXXXXXXXXXXXXXXXXXXXXXXX --save-env
|
||||
```
|
||||
|
||||
### Bland.ai — easiest outbound AI calling
|
||||
|
||||
Sign up at:
|
||||
- https://app.bland.ai
|
||||
|
||||
Save config:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-bland your_bland_api_key --voice mason
|
||||
```
|
||||
|
||||
### Vapi — better conversational voice quality
|
||||
|
||||
Sign up at:
|
||||
- https://dashboard.vapi.ai
|
||||
|
||||
Save the API key first:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-vapi your_vapi_api_key
|
||||
```
|
||||
|
||||
Import your owned Twilio number into Vapi and persist the returned phone number ID:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" vapi-import-twilio --save-env
|
||||
```
|
||||
|
||||
If you already know the Vapi phone number ID, save it directly:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-vapi your_vapi_api_key --phone-number-id vapi_phone_number_id_here
|
||||
```
|
||||
|
||||
## Diagnose current state
|
||||
|
||||
At any time, inspect what the skill already knows:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" diagnose
|
||||
```
|
||||
|
||||
Use this first when resuming work in a later session.
|
||||
|
||||
## Common workflows
|
||||
|
||||
### A. Buy an agent number and keep using it later
|
||||
|
||||
1. Save Twilio credentials:
|
||||
```bash
|
||||
python3 "$SCRIPT" save-twilio AC... auth_token_here
|
||||
```
|
||||
|
||||
2. Search for a number:
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 10
|
||||
```
|
||||
|
||||
3. Buy it and save it into `~/.hermes/.env` + state:
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
|
||||
```
|
||||
|
||||
4. Next session, run:
|
||||
```bash
|
||||
python3 "$SCRIPT" diagnose
|
||||
```
|
||||
This shows the remembered default number and inbox checkpoint state.
|
||||
|
||||
### B. Send a text from the agent number
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Your deployment completed successfully."
|
||||
```
|
||||
|
||||
With media:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Here is the chart." --media-url "https://example.com/chart.png"
|
||||
```
|
||||
|
||||
### C. Check inbound texts later with no webhook server
|
||||
|
||||
Poll the inbox for the default Twilio number:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-inbox --limit 20
|
||||
```
|
||||
|
||||
Only show messages that arrived after the last checkpoint, and advance the checkpoint when you're done reading:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-inbox --since-last --mark-seen
|
||||
```
|
||||
|
||||
This is the main answer to “how do I access messages the number receives next time the skill is loaded?”
|
||||
|
||||
### D. Make a direct Twilio call with built-in TTS
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+15551230000" --message "Hello! This is Hermes calling with your status update." --voice Polly.Joanna
|
||||
```
|
||||
|
||||
### E. Call with a prerecorded / custom voice message
|
||||
|
||||
This is the main path for reusing Hermes's existing `text_to_speech` support.
|
||||
|
||||
Use this when:
|
||||
- you want the call to use Hermes's configured TTS voice rather than Twilio `<Say>`
|
||||
- you want a one-way voice delivery (briefing, alert, joke, reminder, status update)
|
||||
- you do **not** need a live conversational phone call
|
||||
|
||||
Generate or host audio separately, then:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+155****0000" --audio-url "https://example.com/briefing.mp3"
|
||||
```
|
||||
|
||||
Recommended Hermes TTS -> Twilio Play workflow:
|
||||
|
||||
1. Generate the audio with Hermes `text_to_speech`.
|
||||
2. Make the resulting MP3 publicly reachable.
|
||||
3. Place the Twilio call with `--audio-url`.
|
||||
|
||||
Example agent flow:
|
||||
- Ask Hermes to create the message audio with `text_to_speech`
|
||||
- If needed, expose the file with a temporary static host / tunnel / object storage URL
|
||||
- Use `twilio-call --audio-url ...` to deliver it by phone
|
||||
|
||||
Good hosting options for the MP3:
|
||||
- a temporary public object/storage URL
|
||||
- a short-lived tunnel to a local static file server
|
||||
- any existing HTTPS URL the phone provider can fetch directly
|
||||
|
||||
Important note:
|
||||
- Hermes TTS is great for prerecorded outbound messages
|
||||
- Bland/Vapi are better for **live conversational AI calls** because they handle the real-time telephony audio stack themselves
|
||||
- Hermes STT/TTS alone is not being used here as a full duplex phone conversation engine; that would require a much heavier streaming/webhook integration than this skill is trying to introduce
|
||||
|
||||
### F. Navigate a phone tree / IVR with Twilio direct calling
|
||||
|
||||
If you need to press digits after the call connects, use `--send-digits`.
|
||||
Twilio interprets `w` as a short wait.
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+18005551234" --message "Connecting to billing now." --send-digits "ww1w2w3"
|
||||
```
|
||||
|
||||
This is useful for reaching a specific menu branch before handing off to a human or delivering a short status message.
|
||||
|
||||
### G. Outbound AI phone call with Bland.ai
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-call "+15551230000" "Call the dental office, ask for a cleaning appointment on Tuesday afternoon, and if they do not have Tuesday availability, ask for Wednesday or Thursday instead." --provider bland --voice mason --max-duration 3
|
||||
```
|
||||
|
||||
Check status:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider bland
|
||||
```
|
||||
|
||||
Ask Bland analysis questions after completion:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider bland --analyze "Was the appointment confirmed?,What date and time?,Any special instructions?"
|
||||
```
|
||||
|
||||
### H. Outbound AI phone call with Vapi on your owned number
|
||||
|
||||
1. Import your Twilio number into Vapi:
|
||||
```bash
|
||||
python3 "$SCRIPT" vapi-import-twilio --save-env
|
||||
```
|
||||
|
||||
2. Place the call:
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-call "+15551230000" "You are calling to make a dinner reservation for two at 7:30 PM. If that is unavailable, ask for the nearest time between 6:30 and 8:30 PM." --provider vapi --max-duration 4
|
||||
```
|
||||
|
||||
3. Check result:
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider vapi
|
||||
```
|
||||
|
||||
## Suggested agent procedure
|
||||
|
||||
When the user asks for a call or text:
|
||||
|
||||
1. Determine which path fits the request via the decision tree.
|
||||
2. Run `diagnose` if configuration state is unclear.
|
||||
3. Gather the full task details.
|
||||
4. Confirm with the user before dialing or texting.
|
||||
5. Use the correct command.
|
||||
6. Poll for results if needed.
|
||||
7. Summarize the outcome without persisting third-party numbers to Hermes memory.
|
||||
|
||||
## What this skill still does not do
|
||||
|
||||
- real-time inbound call answering
|
||||
- webhook-based live SMS push into the agent loop
|
||||
- guaranteed support for arbitrary third-party 2FA providers
|
||||
|
||||
Those would require more infrastructure than a pure optional skill.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- Twilio trial accounts and regional rules can restrict who you can call/text.
|
||||
- Some services reject VoIP numbers for 2FA.
|
||||
- `twilio-inbox` polls the REST API; it is not instant push delivery.
|
||||
- Vapi outbound calling still depends on having a valid imported number.
|
||||
- Bland is easiest, but not always the best-sounding.
|
||||
- Do not store arbitrary third-party phone numbers in Hermes memory.
|
||||
|
||||
## Verification checklist
|
||||
|
||||
After setup, you should be able to do all of the following with just this skill:
|
||||
|
||||
1. `diagnose` shows provider readiness and remembered state
|
||||
2. search and buy a Twilio number
|
||||
3. persist that number to `~/.hermes/.env`
|
||||
4. send an SMS from the owned number
|
||||
5. poll inbound texts for the owned number later
|
||||
6. place a direct Twilio call
|
||||
7. place an AI call via Bland or Vapi
|
||||
|
||||
## References
|
||||
|
||||
- Twilio phone numbers: https://www.twilio.com/docs/phone-numbers/api
|
||||
- Twilio messaging: https://www.twilio.com/docs/messaging/api/message-resource
|
||||
- Twilio voice: https://www.twilio.com/docs/voice/api/call-resource
|
||||
- Vapi docs: https://docs.vapi.ai/
|
||||
- Bland.ai: https://app.bland.ai/
|
||||
|
|
@ -0,0 +1,252 @@
|
|||
---
|
||||
title: "Bioinformatics — Gateway to 400+ bioinformatics skills from bioSkills and ClawBio"
|
||||
sidebar_label: "Bioinformatics"
|
||||
description: "Gateway to 400+ bioinformatics skills from bioSkills and ClawBio"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Bioinformatics
|
||||
|
||||
Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology, and more. Fetches domain-specific reference material on demand.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/bioinformatics` |
|
||||
| Path | `optional-skills/research/bioinformatics` |
|
||||
| Version | `1.0.0` |
|
||||
| Platforms | linux, macos |
|
||||
| Tags | `bioinformatics`, `genomics`, `sequencing`, `biology`, `research`, `science` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Bioinformatics Skills Gateway
|
||||
|
||||
Use when asked about bioinformatics, genomics, sequencing, variant calling, gene expression, single-cell analysis, protein structure, pharmacogenomics, metagenomics, phylogenetics, or any computational biology task.
|
||||
|
||||
This skill is a gateway to two open-source bioinformatics skill libraries. Instead of bundling hundreds of domain-specific skills, it indexes them and fetches what you need on demand.
|
||||
|
||||
## Sources
|
||||
|
||||
◆ **bioSkills** — 385 reference skills (code patterns, parameter guides, decision trees)
|
||||
Repo: https://github.com/GPTomics/bioSkills
|
||||
Format: SKILL.md per topic with code examples. Python/R/CLI.
|
||||
|
||||
◆ **ClawBio** — 33 runnable pipeline skills (executable scripts, reproducibility bundles)
|
||||
Repo: https://github.com/ClawBio/ClawBio
|
||||
Format: Python scripts with demos. Each analysis exports report.md + commands.sh + environment.yml.
|
||||
|
||||
## How to fetch and use a skill
|
||||
|
||||
1. Identify the domain and skill name from the index below.
|
||||
2. Clone the relevant repo (shallow clone to save time):
|
||||
```bash
|
||||
# bioSkills (reference material)
|
||||
git clone --depth 1 https://github.com/GPTomics/bioSkills.git /tmp/bioSkills
|
||||
|
||||
# ClawBio (runnable pipelines)
|
||||
git clone --depth 1 https://github.com/ClawBio/ClawBio.git /tmp/ClawBio
|
||||
```
|
||||
3. Read the specific skill:
|
||||
```bash
|
||||
# bioSkills — each skill is at: <category>/<skill-name>/SKILL.md
|
||||
cat /tmp/bioSkills/variant-calling/gatk-variant-calling/SKILL.md
|
||||
|
||||
# ClawBio — each skill is at: skills/<skill-name>/
|
||||
cat /tmp/ClawBio/skills/pharmgx-reporter/README.md
|
||||
```
|
||||
4. Follow the fetched skill as reference material. These are NOT Hermes-format skills — treat them as expert domain guides. They contain correct parameters, proper tool flags, and validated pipelines.
|
||||
|
||||
## Skill Index by Domain
|
||||
|
||||
### Sequence Fundamentals
|
||||
bioSkills:
|
||||
sequence-io/ — read-sequences, write-sequences, format-conversion, batch-processing, compressed-files, fastq-quality, filter-sequences, paired-end-fastq, sequence-statistics
|
||||
sequence-manipulation/ — seq-objects, reverse-complement, transcription-translation, motif-search, codon-usage, sequence-properties, sequence-slicing
|
||||
ClawBio:
|
||||
seq-wrangler — Sequence QC, alignment, and BAM processing (wraps FastQC, BWA, SAMtools)
|
||||
|
||||
### Read QC & Alignment
|
||||
bioSkills:
|
||||
read-qc/ — quality-reports, fastp-workflow, adapter-trimming, quality-filtering, umi-processing, contamination-screening, rnaseq-qc
|
||||
read-alignment/ — bwa-alignment, star-alignment, hisat2-alignment, bowtie2-alignment
|
||||
alignment-files/ — sam-bam-basics, alignment-sorting, alignment-filtering, bam-statistics, duplicate-handling, pileup-generation
|
||||
|
||||
### Variant Calling & Annotation
|
||||
bioSkills:
|
||||
variant-calling/ — gatk-variant-calling, deepvariant, variant-calling (bcftools), joint-calling, structural-variant-calling, filtering-best-practices, variant-annotation, variant-normalization, vcf-basics, vcf-manipulation, vcf-statistics, consensus-sequences, clinical-interpretation
|
||||
ClawBio:
|
||||
vcf-annotator — VEP + ClinVar + gnomAD annotation with ancestry-aware context
|
||||
variant-annotation — Variant annotation pipeline
|
||||
|
||||
### Differential Expression (Bulk RNA-seq)
|
||||
bioSkills:
|
||||
differential-expression/ — deseq2-basics, edger-basics, batch-correction, de-results, de-visualization, timeseries-de
|
||||
rna-quantification/ — alignment-free-quant (Salmon/kallisto), featurecounts-counting, tximport-workflow, count-matrix-qc
|
||||
expression-matrix/ — counts-ingest, gene-id-mapping, metadata-joins, sparse-handling
|
||||
ClawBio:
|
||||
rnaseq-de — Full DE pipeline with QC, normalization, and visualization
|
||||
diff-visualizer — Rich visualization and reporting for DE results
|
||||
|
||||
### Single-Cell RNA-seq
|
||||
bioSkills:
|
||||
single-cell/ — preprocessing, clustering, batch-integration, cell-annotation, cell-communication, doublet-detection, markers-annotation, trajectory-inference, multimodal-integration, perturb-seq, scatac-analysis, lineage-tracing, metabolite-communication, data-io
|
||||
ClawBio:
|
||||
scrna-orchestrator — Full Scanpy pipeline (QC, clustering, markers, annotation)
|
||||
scrna-embedding — scVI-based latent embedding and batch integration
|
||||
|
||||
### Spatial Transcriptomics
|
||||
bioSkills:
|
||||
spatial-transcriptomics/ — spatial-data-io, spatial-preprocessing, spatial-domains, spatial-deconvolution, spatial-communication, spatial-neighbors, spatial-statistics, spatial-visualization, spatial-multiomics, spatial-proteomics, image-analysis
|
||||
|
||||
### Epigenomics
|
||||
bioSkills:
|
||||
chip-seq/ — peak-calling, differential-binding, motif-analysis, peak-annotation, chipseq-qc, chipseq-visualization, super-enhancers
|
||||
atac-seq/ — atac-peak-calling, atac-qc, differential-accessibility, footprinting, motif-deviation, nucleosome-positioning
|
||||
methylation-analysis/ — bismark-alignment, methylation-calling, dmr-detection, methylkit-analysis
|
||||
hi-c-analysis/ — hic-data-io, tad-detection, loop-calling, compartment-analysis, contact-pairs, matrix-operations, hic-visualization, hic-differential
|
||||
ClawBio:
|
||||
methylation-clock — Epigenetic age estimation
|
||||
|
||||
### Pharmacogenomics & Clinical
|
||||
bioSkills:
|
||||
clinical-databases/ — clinvar-lookup, gnomad-frequencies, dbsnp-queries, pharmacogenomics, polygenic-risk, hla-typing, variant-prioritization, somatic-signatures, tumor-mutational-burden, myvariant-queries
|
||||
ClawBio:
|
||||
pharmgx-reporter — PGx report from 23andMe/AncestryDNA (12 genes, 31 SNPs, 51 drugs)
|
||||
drug-photo — Photo of medication → personalized PGx dosage card (via vision)
|
||||
clinpgx — ClinPGx API for gene-drug data and CPIC guidelines
|
||||
gwas-lookup — Federated variant lookup across 9 genomic databases
|
||||
gwas-prs — Polygenic risk scores from consumer genetic data
|
||||
nutrigx_advisor — Personalized nutrition from consumer genetic data
|
||||
|
||||
### Population Genetics & GWAS
|
||||
bioSkills:
|
||||
population-genetics/ — association-testing (PLINK GWAS), plink-basics, population-structure, linkage-disequilibrium, scikit-allel-analysis, selection-statistics
|
||||
causal-genomics/ — mendelian-randomization, fine-mapping, colocalization-analysis, mediation-analysis, pleiotropy-detection
|
||||
phasing-imputation/ — haplotype-phasing, genotype-imputation, imputation-qc, reference-panels
|
||||
ClawBio:
|
||||
claw-ancestry-pca — Ancestry PCA against SGDP reference panel
|
||||
|
||||
### Metagenomics & Microbiome
|
||||
bioSkills:
|
||||
metagenomics/ — kraken-classification, metaphlan-profiling, abundance-estimation, functional-profiling, amr-detection, strain-tracking, metagenome-visualization
|
||||
microbiome/ — amplicon-processing, diversity-analysis, differential-abundance, taxonomy-assignment, functional-prediction, qiime2-workflow
|
||||
ClawBio:
|
||||
claw-metagenomics — Shotgun metagenomics profiling (taxonomy, resistome, functional pathways)
|
||||
|
||||
### Genome Assembly & Annotation
|
||||
bioSkills:
|
||||
genome-assembly/ — hifi-assembly, long-read-assembly, short-read-assembly, metagenome-assembly, assembly-polishing, assembly-qc, scaffolding, contamination-detection
|
||||
genome-annotation/ — eukaryotic-gene-prediction, prokaryotic-annotation, functional-annotation, ncrna-annotation, repeat-annotation, annotation-transfer
|
||||
long-read-sequencing/ — basecalling, long-read-alignment, long-read-qc, clair3-variants, structural-variants, medaka-polishing, nanopore-methylation, isoseq-analysis
|
||||
|
||||
### Structural Biology & Chemoinformatics
|
||||
bioSkills:
|
||||
structural-biology/ — alphafold-predictions, modern-structure-prediction, structure-io, structure-navigation, structure-modification, geometric-analysis
|
||||
chemoinformatics/ — molecular-io, molecular-descriptors, similarity-searching, substructure-search, virtual-screening, admet-prediction, reaction-enumeration
|
||||
ClawBio:
|
||||
struct-predictor — Local AlphaFold/Boltz/Chai structure prediction with comparison
|
||||
|
||||
### Proteomics
|
||||
bioSkills:
|
||||
proteomics/ — data-import, peptide-identification, protein-inference, quantification, differential-abundance, dia-analysis, ptm-analysis, proteomics-qc, spectral-libraries
|
||||
ClawBio:
|
||||
proteomics-de — Proteomics differential expression
|
||||
|
||||
### Pathway Analysis & Gene Networks
|
||||
bioSkills:
|
||||
pathway-analysis/ — go-enrichment, gsea, kegg-pathways, reactome-pathways, wikipathways, enrichment-visualization
|
||||
gene-regulatory-networks/ — scenic-regulons, coexpression-networks, differential-networks, multiomics-grn, perturbation-simulation
|
||||
|
||||
### Immunoinformatics
|
||||
bioSkills:
|
||||
immunoinformatics/ — mhc-binding-prediction, epitope-prediction, neoantigen-prediction, immunogenicity-scoring, tcr-epitope-binding
|
||||
tcr-bcr-analysis/ — mixcr-analysis, scirpy-analysis, immcantation-analysis, repertoire-visualization, vdjtools-analysis
|
||||
|
||||
### CRISPR & Genome Engineering
|
||||
bioSkills:
|
||||
crispr-screens/ — mageck-analysis, jacks-analysis, hit-calling, screen-qc, library-design, crispresso-editing, base-editing-analysis, batch-correction
|
||||
genome-engineering/ — grna-design, off-target-prediction, hdr-template-design, base-editing-design, prime-editing-design
|
||||
|
||||
### Workflow Management
|
||||
bioSkills:
|
||||
workflow-management/ — snakemake-workflows, nextflow-pipelines, cwl-workflows, wdl-workflows
|
||||
ClawBio:
|
||||
repro-enforcer — Export any analysis as reproducibility bundle (Conda env + Singularity + checksums)
|
||||
galaxy-bridge — Access 8,000+ Galaxy tools from usegalaxy.org
|
||||
|
||||
### Specialized Domains
|
||||
bioSkills:
|
||||
alternative-splicing/ — splicing-quantification, differential-splicing, isoform-switching, sashimi-plots, single-cell-splicing, splicing-qc
|
||||
ecological-genomics/ — edna-metabarcoding, landscape-genomics, conservation-genetics, biodiversity-metrics, community-ecology, species-delimitation
|
||||
epidemiological-genomics/ — pathogen-typing, variant-surveillance, phylodynamics, transmission-inference, amr-surveillance
|
||||
liquid-biopsy/ — cfdna-preprocessing, ctdna-mutation-detection, fragment-analysis, tumor-fraction-estimation, methylation-based-detection, longitudinal-monitoring
|
||||
epitranscriptomics/ — m6a-peak-calling, m6a-differential, m6anet-analysis, merip-preprocessing, modification-visualization
|
||||
metabolomics/ — xcms-preprocessing, metabolite-annotation, normalization-qc, statistical-analysis, pathway-mapping, lipidomics, targeted-analysis, msdial-preprocessing
|
||||
flow-cytometry/ — fcs-handling, gating-analysis, compensation-transformation, clustering-phenotyping, differential-analysis, cytometry-qc, doublet-detection, bead-normalization
|
||||
systems-biology/ — flux-balance-analysis, metabolic-reconstruction, gene-essentiality, context-specific-models, model-curation
|
||||
rna-structure/ — secondary-structure-prediction, ncrna-search, structure-probing
|
||||
|
||||
### Data Visualization & Reporting
|
||||
bioSkills:
|
||||
data-visualization/ — ggplot2-fundamentals, heatmaps-clustering, volcano-customization, circos-plots, genome-browser-tracks, interactive-visualization, multipanel-figures, network-visualization, upset-plots, color-palettes, specialized-omics-plots, genome-tracks
|
||||
reporting/ — rmarkdown-reports, quarto-reports, jupyter-reports, automated-qc-reports, figure-export
|
||||
ClawBio:
|
||||
profile-report — Analysis profile reporting
|
||||
data-extractor — Extract numerical data from scientific figure images (via vision)
|
||||
lit-synthesizer — PubMed/bioRxiv search, summarization, citation graphs
|
||||
pubmed-summariser — Gene/disease PubMed search with structured briefing
|
||||
|
||||
### Database Access
|
||||
bioSkills:
|
||||
database-access/ — entrez-search, entrez-fetch, entrez-link, blast-searches, local-blast, sra-data, geo-data, uniprot-access, batch-downloads, interaction-databases, sequence-similarity
|
||||
ClawBio:
|
||||
ukb-navigator — Semantic search across 12,000+ UK Biobank fields
|
||||
clinical-trial-finder — Clinical trial discovery
|
||||
|
||||
### Experimental Design
|
||||
bioSkills:
|
||||
experimental-design/ — power-analysis, sample-size, batch-design, multiple-testing
|
||||
|
||||
### Machine Learning for Omics
|
||||
bioSkills:
|
||||
machine-learning/ — omics-classifiers, biomarker-discovery, survival-analysis, model-validation, prediction-explanation, atlas-mapping
|
||||
ClawBio:
|
||||
claw-semantic-sim — Semantic similarity index for disease literature (PubMedBERT)
|
||||
omics-target-evidence-mapper — Aggregate target-level evidence across omics sources
|
||||
|
||||
## Environment Setup
|
||||
|
||||
These skills assume a bioinformatics workstation. Common dependencies:
|
||||
|
||||
```bash
|
||||
# Python
|
||||
pip install biopython pysam cyvcf2 pybedtools pyBigWig scikit-allel anndata scanpy mygene
|
||||
|
||||
# R/Bioconductor
|
||||
Rscript -e 'BiocManager::install(c("DESeq2","edgeR","Seurat","clusterProfiler","methylKit"))'
|
||||
|
||||
# CLI tools (Ubuntu/Debian)
|
||||
sudo apt install samtools bcftools ncbi-blast+ minimap2 bedtools
|
||||
|
||||
# CLI tools (macOS)
|
||||
brew install samtools bcftools blast minimap2 bedtools
|
||||
|
||||
# Or via Conda (recommended for reproducibility)
|
||||
conda install -c bioconda samtools bcftools blast minimap2 bedtools fastp kraken2
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- The fetched skills are NOT in Hermes SKILL.md format. They use their own structure (bioSkills: code pattern cookbooks; ClawBio: README + Python scripts). Read them as expert reference material.
|
||||
- bioSkills are reference guides — they show correct parameters and code patterns but aren't executable pipelines.
|
||||
- ClawBio skills are executable — many have `--demo` flags and can be run directly.
|
||||
- Both repos assume bioinformatics tools are installed. Check prerequisites before running pipelines.
|
||||
- For ClawBio, run `pip install -r requirements.txt` in the cloned repo first.
|
||||
- Genomic data files can be very large. Be mindful of disk space when downloading reference genomes, SRA datasets, or building indices.
|
||||
|
|
@ -0,0 +1,116 @@
|
|||
---
|
||||
title: "Domain Intel — Passive domain reconnaissance using Python stdlib"
|
||||
sidebar_label: "Domain Intel"
|
||||
description: "Passive domain reconnaissance using Python stdlib"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Domain Intel
|
||||
|
||||
Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/domain-intel` |
|
||||
| Path | `optional-skills/research/domain-intel` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Domain Intelligence — Passive OSINT
|
||||
|
||||
Passive domain reconnaissance using only Python stdlib.
|
||||
**Zero dependencies. Zero API keys. Works on Linux, macOS, and Windows.**
|
||||
|
||||
## Helper script
|
||||
|
||||
This skill includes `scripts/domain_intel.py` — a complete CLI tool for all domain intelligence operations.
|
||||
|
||||
```bash
|
||||
# Subdomain discovery via Certificate Transparency logs
|
||||
python3 SKILL_DIR/scripts/domain_intel.py subdomains example.com
|
||||
|
||||
# SSL certificate inspection (expiry, cipher, SANs, issuer)
|
||||
python3 SKILL_DIR/scripts/domain_intel.py ssl example.com
|
||||
|
||||
# WHOIS lookup (registrar, dates, name servers — 100+ TLDs)
|
||||
python3 SKILL_DIR/scripts/domain_intel.py whois example.com
|
||||
|
||||
# DNS records (A, AAAA, MX, NS, TXT, CNAME)
|
||||
python3 SKILL_DIR/scripts/domain_intel.py dns example.com
|
||||
|
||||
# Domain availability check (passive: DNS + WHOIS + SSL signals)
|
||||
python3 SKILL_DIR/scripts/domain_intel.py available coolstartup.io
|
||||
|
||||
# Bulk analysis — multiple domains, multiple checks in parallel
|
||||
python3 SKILL_DIR/scripts/domain_intel.py bulk example.com github.com google.com
|
||||
python3 SKILL_DIR/scripts/domain_intel.py bulk example.com github.com --checks ssl,dns
|
||||
```
|
||||
|
||||
`SKILL_DIR` is the directory containing this SKILL.md file. All output is structured JSON.
|
||||
|
||||
## Available commands
|
||||
|
||||
| Command | What it does | Data source |
|
||||
|---------|-------------|-------------|
|
||||
| `subdomains` | Find subdomains from certificate logs | crt.sh (HTTPS) |
|
||||
| `ssl` | Inspect TLS certificate details | Direct TCP:443 to target |
|
||||
| `whois` | Registration info, registrar, dates | WHOIS servers (TCP:43) |
|
||||
| `dns` | A, AAAA, MX, NS, TXT, CNAME records | System DNS + Google DoH |
|
||||
| `available` | Check if domain is registered | DNS + WHOIS + SSL signals |
|
||||
| `bulk` | Run multiple checks on multiple domains | All of the above |
|
||||
|
||||
## When to use this vs built-in tools
|
||||
|
||||
- **Use this skill** for infrastructure questions: subdomains, SSL certs, WHOIS, DNS records, availability
|
||||
- **Use `web_search`** for general research about what a domain/company does
|
||||
- **Use `web_extract`** to get the actual content of a webpage
|
||||
- **Use `terminal` with `curl -I`** for a simple "is this URL reachable" check
|
||||
|
||||
| Task | Better tool | Why |
|
||||
|------|-------------|-----|
|
||||
| "What does example.com do?" | `web_extract` | Gets page content, not DNS/WHOIS data |
|
||||
| "Find info about a company" | `web_search` | General research, not domain-specific |
|
||||
| "Is this website safe?" | `web_search` | Reputation checks need web context |
|
||||
| "Check if a URL is reachable" | `terminal` with `curl -I` | Simple HTTP check |
|
||||
| "Find subdomains of X" | **This skill** | Only passive source for this |
|
||||
| "When does the SSL cert expire?" | **This skill** | Built-in tools can't inspect TLS |
|
||||
| "Who registered this domain?" | **This skill** | WHOIS data not in web search |
|
||||
| "Is coolstartup.io available?" | **This skill** | Passive availability via DNS+WHOIS+SSL |
|
||||
|
||||
## Platform compatibility
|
||||
|
||||
Pure Python stdlib (`socket`, `ssl`, `urllib`, `json`, `concurrent.futures`).
|
||||
Works identically on Linux, macOS, and Windows with no dependencies.
|
||||
|
||||
- **crt.sh queries** use HTTPS (port 443) — works behind most firewalls
|
||||
- **WHOIS queries** use TCP port 43 — may be blocked on restrictive networks
|
||||
- **DNS queries** use Google DoH (HTTPS) for MX/NS/TXT — firewall-friendly
|
||||
- **SSL checks** connect to the target on port 443 — the only "active" operation
|
||||
|
||||
## Data sources
|
||||
|
||||
All queries are **passive** — no port scanning, no vulnerability testing:
|
||||
|
||||
- **crt.sh** — Certificate Transparency logs (subdomain discovery, HTTPS only)
|
||||
- **WHOIS servers** — Direct TCP to 100+ authoritative TLD registrars
|
||||
- **Google DNS-over-HTTPS** — MX, NS, TXT, CNAME resolution (firewall-friendly)
|
||||
- **System DNS** — A/AAAA record resolution
|
||||
- **SSL check** is the only "active" operation (TCP connection to target:443)
|
||||
|
||||
## Notes
|
||||
|
||||
- WHOIS queries use TCP port 43 — may be blocked on restrictive networks
|
||||
- Some WHOIS servers redact registrant info (GDPR) — mention this to the user
|
||||
- crt.sh can be slow for very popular domains (thousands of certs) — set reasonable expectations
|
||||
- The availability check is heuristic-based (3 passive signals) — not authoritative like a registrar API
|
||||
|
||||
---
|
||||
|
||||
*Contributed by [@FurkanL0](https://github.com/FurkanL0)*
|
||||
|
|
@ -0,0 +1,236 @@
|
|||
---
|
||||
title: "Drug Discovery — Pharmaceutical research assistant for drug discovery workflows"
|
||||
sidebar_label: "Drug Discovery"
|
||||
description: "Pharmaceutical research assistant for drug discovery workflows"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Drug Discovery
|
||||
|
||||
Pharmaceutical research assistant for drug discovery workflows. Search bioactive compounds on ChEMBL, calculate drug-likeness (Lipinski Ro5, QED, TPSA, synthetic accessibility), look up drug-drug interactions via OpenFDA, interpret ADMET profiles, and assist with lead optimization. Use for medicinal chemistry questions, molecule property analysis, clinical pharmacology, and open-science drug research.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/drug-discovery` |
|
||||
| Path | `optional-skills/research/drug-discovery` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | bennytimz |
|
||||
| License | MIT |
|
||||
| Tags | `science`, `chemistry`, `pharmacology`, `research`, `health` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Drug Discovery & Pharmaceutical Research
|
||||
|
||||
You are an expert pharmaceutical scientist and medicinal chemist with deep
|
||||
knowledge of drug discovery, cheminformatics, and clinical pharmacology.
|
||||
Use this skill for all pharma/chemistry research tasks.
|
||||
|
||||
## Core Workflows
|
||||
|
||||
### 1 — Bioactive Compound Search (ChEMBL)
|
||||
|
||||
Search ChEMBL (the world's largest open bioactivity database) for compounds
|
||||
by target, activity, or molecule name. No API key required.
|
||||
|
||||
```bash
|
||||
# Search compounds by target name (e.g. "EGFR", "COX-2", "ACE")
|
||||
TARGET="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$TARGET")
|
||||
curl -s "https://www.ebi.ac.uk/chembl/api/data/target/search?q=${ENCODED}&format=json" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
targets=data.get('targets',[])[:5]
|
||||
for t in targets:
|
||||
print(f\"ChEMBL ID : {t.get('target_chembl_id')}\")
|
||||
print(f\"Name : {t.get('pref_name')}\")
|
||||
print(f\"Type : {t.get('target_type')}\")
|
||||
print()
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
# Get bioactivity data for a ChEMBL target ID
|
||||
TARGET_ID="$1" # e.g. CHEMBL203
|
||||
curl -s "https://www.ebi.ac.uk/chembl/api/data/activity?target_chembl_id=${TARGET_ID}&pchembl_value__gte=6&limit=10&format=json" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
acts=data.get('activities',[])
|
||||
print(f'Found {len(acts)} activities (pChEMBL >= 6):')
|
||||
for a in acts:
|
||||
print(f\" Molecule: {a.get('molecule_chembl_id')} | {a.get('standard_type')}: {a.get('standard_value')} {a.get('standard_units')} | pChEMBL: {a.get('pchembl_value')}\")
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
# Look up a specific molecule by ChEMBL ID
|
||||
MOL_ID="$1" # e.g. CHEMBL25 (aspirin)
|
||||
curl -s "https://www.ebi.ac.uk/chembl/api/data/molecule/${MOL_ID}?format=json" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
m=json.load(sys.stdin)
|
||||
props=m.get('molecule_properties',{}) or {}
|
||||
print(f\"Name : {m.get('pref_name','N/A')}\")
|
||||
print(f\"SMILES : {m.get('molecule_structures',{}).get('canonical_smiles','N/A') if m.get('molecule_structures') else 'N/A'}\")
|
||||
print(f\"MW : {props.get('full_mwt','N/A')} Da\")
|
||||
print(f\"LogP : {props.get('alogp','N/A')}\")
|
||||
print(f\"HBD : {props.get('hbd','N/A')}\")
|
||||
print(f\"HBA : {props.get('hba','N/A')}\")
|
||||
print(f\"TPSA : {props.get('psa','N/A')} Ų\")
|
||||
print(f\"Ro5 violations: {props.get('num_ro5_violations','N/A')}\")
|
||||
print(f\"QED : {props.get('qed_weighted','N/A')}\")
|
||||
"
|
||||
```
|
||||
|
||||
### 2 — Drug-Likeness Calculation (Lipinski Ro5 + Veber)
|
||||
|
||||
Assess any molecule against established oral bioavailability rules using
|
||||
PubChem's free property API — no RDKit install needed.
|
||||
|
||||
```bash
|
||||
COMPOUND="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$COMPOUND")
|
||||
curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/${ENCODED}/property/MolecularWeight,XLogP,HBondDonorCount,HBondAcceptorCount,RotatableBondCount,TPSA,InChIKey/JSON" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
props=data['PropertyTable']['Properties'][0]
|
||||
mw = float(props.get('MolecularWeight', 0))
|
||||
logp = float(props.get('XLogP', 0))
|
||||
hbd = int(props.get('HBondDonorCount', 0))
|
||||
hba = int(props.get('HBondAcceptorCount', 0))
|
||||
rot = int(props.get('RotatableBondCount', 0))
|
||||
tpsa = float(props.get('TPSA', 0))
|
||||
print('=== Lipinski Rule of Five (Ro5) ===')
|
||||
print(f' MW {mw:.1f} Da {\"✓\" if mw<=500 else \"✗ VIOLATION (>500)\"}')
|
||||
print(f' LogP {logp:.2f} {\"✓\" if logp<=5 else \"✗ VIOLATION (>5)\"}')
|
||||
print(f' HBD {hbd} {\"✓\" if hbd<=5 else \"✗ VIOLATION (>5)\"}')
|
||||
print(f' HBA {hba} {\"✓\" if hba<=10 else \"✗ VIOLATION (>10)\"}')
|
||||
viol = sum([mw>500, logp>5, hbd>5, hba>10])
|
||||
print(f' Violations: {viol}/4 {\"→ Likely orally bioavailable\" if viol<=1 else \"→ Poor oral bioavailability predicted\"}')
|
||||
print()
|
||||
print('=== Veber Oral Bioavailability Rules ===')
|
||||
print(f' TPSA {tpsa:.1f} Ų {\"✓\" if tpsa<=140 else \"✗ VIOLATION (>140)\"}')
|
||||
print(f' Rot. bonds {rot} {\"✓\" if rot<=10 else \"✗ VIOLATION (>10)\"}')
|
||||
print(f' Both rules met: {\"Yes → good oral absorption predicted\" if tpsa<=140 and rot<=10 else \"No → reduced oral absorption\"}')
|
||||
"
|
||||
```
|
||||
|
||||
### 3 — Drug Interaction & Safety Lookup (OpenFDA)
|
||||
|
||||
```bash
|
||||
DRUG="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$DRUG")
|
||||
curl -s "https://api.fda.gov/drug/label.json?search=drug_interactions:\"${ENCODED}\"&limit=3" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
results=data.get('results',[])
|
||||
if not results:
|
||||
print('No interaction data found in FDA labels.')
|
||||
sys.exit()
|
||||
for r in results[:2]:
|
||||
brand=r.get('openfda',{}).get('brand_name',['Unknown'])[0]
|
||||
generic=r.get('openfda',{}).get('generic_name',['Unknown'])[0]
|
||||
interactions=r.get('drug_interactions',['N/A'])[0]
|
||||
print(f'--- {brand} ({generic}) ---')
|
||||
print(interactions[:800])
|
||||
print()
|
||||
"
|
||||
```
|
||||
|
||||
```bash
|
||||
DRUG="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$DRUG")
|
||||
curl -s "https://api.fda.gov/drug/event.json?search=patient.drug.medicinalproduct:\"${ENCODED}\"&count=patient.reaction.reactionmeddrapt.exact&limit=10" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
results=data.get('results',[])
|
||||
if not results:
|
||||
print('No adverse event data found.')
|
||||
sys.exit()
|
||||
print(f'Top adverse events reported:')
|
||||
for r in results[:10]:
|
||||
print(f\" {r['count']:>5}x {r['term']}\")
|
||||
"
|
||||
```
|
||||
|
||||
### 4 — PubChem Compound Search
|
||||
|
||||
```bash
|
||||
COMPOUND="$1"
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$COMPOUND")
|
||||
CID=$(curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/${ENCODED}/cids/TXT" | head -1 | tr -d '[:space:]')
|
||||
echo "PubChem CID: $CID"
|
||||
curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/${CID}/property/IsomericSMILES,InChIKey,IUPACName/JSON" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
p=json.load(sys.stdin)['PropertyTable']['Properties'][0]
|
||||
print(f\"IUPAC Name : {p.get('IUPACName','N/A')}\")
|
||||
print(f\"SMILES : {p.get('IsomericSMILES','N/A')}\")
|
||||
print(f\"InChIKey : {p.get('InChIKey','N/A')}\")
|
||||
"
|
||||
```
|
||||
|
||||
### 5 — Target & Disease Literature (OpenTargets)
|
||||
|
||||
```bash
|
||||
GENE="$1"
|
||||
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{\"query\":\"{ search(queryString: \\\"${GENE}\\\", entityNames: [\\\"target\\\"], page: {index: 0, size: 1}) { hits { id score object { ... on Target { id approvedSymbol approvedName associatedDiseases(page: {index: 0, size: 5}) { count rows { score disease { id name } } } } } } } }\"}" \
|
||||
| python3 -c "
|
||||
import json,sys
|
||||
data=json.load(sys.stdin)
|
||||
hits=data.get('data',{}).get('search',{}).get('hits',[])
|
||||
if not hits:
|
||||
print('Target not found.')
|
||||
sys.exit()
|
||||
obj=hits[0]['object']
|
||||
print(f\"Target: {obj.get('approvedSymbol')} — {obj.get('approvedName')}\")
|
||||
assoc=obj.get('associatedDiseases',{})
|
||||
print(f\"Associated with {assoc.get('count',0)} diseases. Top associations:\")
|
||||
for row in assoc.get('rows',[]):
|
||||
print(f\" Score {row['score']:.3f} | {row['disease']['name']}\")
|
||||
"
|
||||
```
|
||||
|
||||
## Reasoning Guidelines
|
||||
|
||||
When analysing drug-likeness or molecular properties, always:
|
||||
|
||||
1. **State raw values first** — MW, LogP, HBD, HBA, TPSA, RotBonds
|
||||
2. **Apply rule sets** — Ro5 (Lipinski), Veber, Ghose filter where relevant
|
||||
3. **Flag liabilities** — metabolic hotspots, hERG risk, high TPSA for CNS penetration
|
||||
4. **Suggest optimizations** — bioisosteric replacements, prodrug strategies, ring truncation
|
||||
5. **Cite the source API** — ChEMBL, PubChem, OpenFDA, or OpenTargets
|
||||
|
||||
For ADMET questions, reason through Absorption, Distribution, Metabolism, Excretion, Toxicity systematically. See references/ADMET_REFERENCE.md for detailed guidance.
|
||||
|
||||
## Important Notes
|
||||
|
||||
- All APIs are free, public, require no authentication
|
||||
- ChEMBL rate limits: add sleep 1 between batch requests
|
||||
- FDA data reflects reported adverse events, not necessarily causation
|
||||
- Always recommend consulting a licensed pharmacist or physician for clinical decisions
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Task | API | Endpoint |
|
||||
|------|-----|----------|
|
||||
| Find target | ChEMBL | `/api/data/target/search?q=` |
|
||||
| Get bioactivity | ChEMBL | `/api/data/activity?target_chembl_id=` |
|
||||
| Molecule properties | PubChem | `/rest/pug/compound/name/{name}/property/` |
|
||||
| Drug interactions | OpenFDA | `/drug/label.json?search=drug_interactions:` |
|
||||
| Adverse events | OpenFDA | `/drug/event.json?search=...&count=reaction` |
|
||||
| Gene-disease | OpenTargets | GraphQL POST `/api/v4/graphql` |
|
||||
|
|
@ -0,0 +1,254 @@
|
|||
---
|
||||
title: "Duckduckgo Search — Free web search via DuckDuckGo — text, news, images, videos"
|
||||
sidebar_label: "Duckduckgo Search"
|
||||
description: "Free web search via DuckDuckGo — text, news, images, videos"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Duckduckgo Search
|
||||
|
||||
Free web search via DuckDuckGo — text, news, images, videos. No API key needed. Prefer the `ddgs` CLI when installed; use the Python DDGS library only after verifying that `ddgs` is available in the current runtime.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/duckduckgo-search` |
|
||||
| Path | `optional-skills/research/duckduckgo-search` |
|
||||
| Version | `1.3.0` |
|
||||
| Author | gamedevCloudy |
|
||||
| License | MIT |
|
||||
| Tags | `search`, `duckduckgo`, `web-search`, `free`, `fallback` |
|
||||
| Related skills | [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# DuckDuckGo Search
|
||||
|
||||
Free web search using DuckDuckGo. **No API key required.**
|
||||
|
||||
Preferred when `web_search` is unavailable or unsuitable (for example when `FIRECRAWL_API_KEY` is not set). Can also be used as a standalone search path when DuckDuckGo results are specifically desired.
|
||||
|
||||
## Detection Flow
|
||||
|
||||
Check what is actually available before choosing an approach:
|
||||
|
||||
```bash
|
||||
# Check CLI availability
|
||||
command -v ddgs >/dev/null && echo "DDGS_CLI=installed" || echo "DDGS_CLI=missing"
|
||||
```
|
||||
|
||||
Decision tree:
|
||||
1. If `ddgs` CLI is installed, prefer `terminal` + `ddgs`
|
||||
2. If `ddgs` CLI is missing, do not assume `execute_code` can import `ddgs`
|
||||
3. If the user wants DuckDuckGo specifically, install `ddgs` first in the relevant environment
|
||||
4. Otherwise fall back to built-in web/browser tools
|
||||
|
||||
Important runtime note:
|
||||
- Terminal and `execute_code` are separate runtimes
|
||||
- A successful shell install does not guarantee `execute_code` can import `ddgs`
|
||||
- Never assume third-party Python packages are preinstalled inside `execute_code`
|
||||
|
||||
## Installation
|
||||
|
||||
Install `ddgs` only when DuckDuckGo search is specifically needed and the runtime does not already provide it.
|
||||
|
||||
```bash
|
||||
# Python package + CLI entrypoint
|
||||
pip install ddgs
|
||||
|
||||
# Verify CLI
|
||||
ddgs --help
|
||||
```
|
||||
|
||||
If a workflow depends on Python imports, verify that same runtime can import `ddgs` before using `from ddgs import DDGS`.
|
||||
|
||||
## Method 1: CLI Search (Preferred)
|
||||
|
||||
Use the `ddgs` command via `terminal` when it exists. This is the preferred path because it avoids assuming the `execute_code` sandbox has the `ddgs` Python package installed.
|
||||
|
||||
```bash
|
||||
# Text search
|
||||
ddgs text -q "python async programming" -m 5
|
||||
|
||||
# News search
|
||||
ddgs news -q "artificial intelligence" -m 5
|
||||
|
||||
# Image search
|
||||
ddgs images -q "landscape photography" -m 10
|
||||
|
||||
# Video search
|
||||
ddgs videos -q "python tutorial" -m 5
|
||||
|
||||
# With region filter
|
||||
ddgs text -q "best restaurants" -m 5 -r us-en
|
||||
|
||||
# Recent results only (d=day, w=week, m=month, y=year)
|
||||
ddgs text -q "latest AI news" -m 5 -t w
|
||||
|
||||
# JSON output for parsing
|
||||
ddgs text -q "fastapi tutorial" -m 5 -o json
|
||||
```
|
||||
|
||||
### CLI Flags
|
||||
|
||||
| Flag | Description | Example |
|
||||
|------|-------------|---------|
|
||||
| `-q` | Query — **required** | `-q "search terms"` |
|
||||
| `-m` | Max results | `-m 5` |
|
||||
| `-r` | Region | `-r us-en` |
|
||||
| `-t` | Time limit | `-t w` (week) |
|
||||
| `-s` | Safe search | `-s off` |
|
||||
| `-o` | Output format | `-o json` |
|
||||
|
||||
## Method 2: Python API (Only After Verification)
|
||||
|
||||
Use the `DDGS` class in `execute_code` or another Python runtime only after verifying that `ddgs` is installed there. Do not assume `execute_code` includes third-party packages by default.
|
||||
|
||||
Safe wording:
|
||||
- "Use `execute_code` with `ddgs` after installing or verifying the package if needed"
|
||||
|
||||
Avoid saying:
|
||||
- "`execute_code` includes `ddgs`"
|
||||
- "DuckDuckGo search works by default in `execute_code`"
|
||||
|
||||
**Important:** `max_results` must always be passed as a **keyword argument** — positional usage raises an error on all methods.
|
||||
|
||||
### Text Search
|
||||
|
||||
Best for: general research, companies, documentation.
|
||||
|
||||
```python
|
||||
from ddgs import DDGS
|
||||
|
||||
with DDGS() as ddgs:
|
||||
for r in ddgs.text("python async programming", max_results=5):
|
||||
print(r["title"])
|
||||
print(r["href"])
|
||||
print(r.get("body", "")[:200])
|
||||
print()
|
||||
```
|
||||
|
||||
Returns: `title`, `href`, `body`
|
||||
|
||||
### News Search
|
||||
|
||||
Best for: current events, breaking news, latest updates.
|
||||
|
||||
```python
|
||||
from ddgs import DDGS
|
||||
|
||||
with DDGS() as ddgs:
|
||||
for r in ddgs.news("AI regulation 2026", max_results=5):
|
||||
print(r["date"], "-", r["title"])
|
||||
print(r.get("source", ""), "|", r["url"])
|
||||
print(r.get("body", "")[:200])
|
||||
print()
|
||||
```
|
||||
|
||||
Returns: `date`, `title`, `body`, `url`, `image`, `source`
|
||||
|
||||
### Image Search
|
||||
|
||||
Best for: visual references, product images, diagrams.
|
||||
|
||||
```python
|
||||
from ddgs import DDGS
|
||||
|
||||
with DDGS() as ddgs:
|
||||
for r in ddgs.images("semiconductor chip", max_results=5):
|
||||
print(r["title"])
|
||||
print(r["image"])
|
||||
print(r.get("thumbnail", ""))
|
||||
print(r.get("source", ""))
|
||||
print()
|
||||
```
|
||||
|
||||
Returns: `title`, `image`, `thumbnail`, `url`, `height`, `width`, `source`
|
||||
|
||||
### Video Search
|
||||
|
||||
Best for: tutorials, demos, explainers.
|
||||
|
||||
```python
|
||||
from ddgs import DDGS
|
||||
|
||||
with DDGS() as ddgs:
|
||||
for r in ddgs.videos("FastAPI tutorial", max_results=5):
|
||||
print(r["title"])
|
||||
print(r.get("content", ""))
|
||||
print(r.get("duration", ""))
|
||||
print(r.get("provider", ""))
|
||||
print(r.get("published", ""))
|
||||
print()
|
||||
```
|
||||
|
||||
Returns: `title`, `content`, `description`, `duration`, `provider`, `published`, `statistics`, `uploader`
|
||||
|
||||
### Quick Reference
|
||||
|
||||
| Method | Use When | Key Fields |
|
||||
|--------|----------|------------|
|
||||
| `text()` | General research, companies | title, href, body |
|
||||
| `news()` | Current events, updates | date, title, source, body, url |
|
||||
| `images()` | Visuals, diagrams | title, image, thumbnail, url |
|
||||
| `videos()` | Tutorials, demos | title, content, duration, provider |
|
||||
|
||||
## Workflow: Search then Extract
|
||||
|
||||
DuckDuckGo returns titles, URLs, and snippets — not full page content. To get full page content, search first and then extract the most relevant URL with `web_extract`, browser tools, or curl.
|
||||
|
||||
CLI example:
|
||||
|
||||
```bash
|
||||
ddgs text -q "fastapi deployment guide" -m 3 -o json
|
||||
```
|
||||
|
||||
Python example, only after verifying `ddgs` is installed in that runtime:
|
||||
|
||||
```python
|
||||
from ddgs import DDGS
|
||||
|
||||
with DDGS() as ddgs:
|
||||
results = list(ddgs.text("fastapi deployment guide", max_results=3))
|
||||
for r in results:
|
||||
print(r["title"], "->", r["href"])
|
||||
```
|
||||
|
||||
Then extract the best URL with `web_extract` or another content-retrieval tool.
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Rate limiting**: DuckDuckGo may throttle after many rapid requests. Add a short delay between searches if needed.
|
||||
- **No content extraction**: `ddgs` returns snippets, not full page content. Use `web_extract`, browser tools, or curl for the full article/page.
|
||||
- **Results quality**: Generally good but less configurable than Firecrawl's search.
|
||||
- **Availability**: DuckDuckGo may block requests from some cloud IPs. If searches return empty, try different keywords or wait a few seconds.
|
||||
- **Field variability**: Return fields may vary between results or `ddgs` versions. Use `.get()` for optional fields to avoid `KeyError`.
|
||||
- **Separate runtimes**: A successful `ddgs` install in terminal does not automatically mean `execute_code` can import it.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Likely Cause | What To Do |
|
||||
|---------|--------------|------------|
|
||||
| `ddgs: command not found` | CLI not installed in the shell environment | Install `ddgs`, or use built-in web/browser tools instead |
|
||||
| `ModuleNotFoundError: No module named 'ddgs'` | Python runtime does not have the package installed | Do not use Python DDGS there until that runtime is prepared |
|
||||
| Search returns nothing | Temporary rate limiting or poor query | Wait a few seconds, retry, or adjust the query |
|
||||
| CLI works but `execute_code` import fails | Terminal and `execute_code` are different runtimes | Keep using CLI, or separately prepare the Python runtime |
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **`max_results` is keyword-only**: `ddgs.text("query", 5)` raises an error. Use `ddgs.text("query", max_results=5)`.
|
||||
- **Do not assume the CLI exists**: Check `command -v ddgs` before using it.
|
||||
- **Do not assume `execute_code` can import `ddgs`**: `from ddgs import DDGS` may fail with `ModuleNotFoundError` unless that runtime was prepared separately.
|
||||
- **Package name**: The package is `ddgs` (previously `duckduckgo-search`). Install with `pip install ddgs`.
|
||||
- **Don't confuse `-q` and `-m`** (CLI): `-q` is for the query, `-m` is for max results count.
|
||||
- **Empty results**: If `ddgs` returns nothing, it may be rate-limited. Wait a few seconds and retry.
|
||||
|
||||
## Validated With
|
||||
|
||||
Validated examples against `ddgs==9.11.2` semantics. Skill guidance now treats CLI availability and Python import availability as separate concerns so the documented workflow matches actual runtime behavior.
|
||||
|
|
@ -0,0 +1,231 @@
|
|||
---
|
||||
title: "Gitnexus Explorer"
|
||||
sidebar_label: "Gitnexus Explorer"
|
||||
description: "Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Gitnexus Explorer
|
||||
|
||||
Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/gitnexus-explorer` |
|
||||
| Path | `optional-skills/research/gitnexus-explorer` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent + Teknium |
|
||||
| License | MIT |
|
||||
| Tags | `gitnexus`, `code-intelligence`, `knowledge-graph`, `visualization` |
|
||||
| Related skills | [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`codebase-inspection`](/docs/user-guide/skills/bundled/github/github-codebase-inspection) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# GitNexus Explorer
|
||||
|
||||
Index any codebase into a knowledge graph and serve an interactive web UI for exploring
|
||||
symbols, call chains, clusters, and execution flows. Tunneled via Cloudflare for remote access.
|
||||
|
||||
## When to Use
|
||||
|
||||
- User wants to visually explore a codebase's architecture
|
||||
- User asks for a knowledge graph / dependency graph of a repo
|
||||
- User wants to share an interactive codebase explorer with someone
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **Node.js** (v18+) — required for GitNexus and the proxy
|
||||
- **git** — repo must have a `.git` directory
|
||||
- **cloudflared** — for tunneling (auto-installed to ~/.local/bin if missing)
|
||||
|
||||
## Size Warning
|
||||
|
||||
The web UI renders all nodes in the browser. Repos under ~5,000 files work well. Large
|
||||
repos (30k+ nodes) will be sluggish or crash the browser tab. The CLI/MCP tools work
|
||||
at any scale — only the web visualization has this limit.
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Clone and Build GitNexus (one-time setup)
|
||||
|
||||
```bash
|
||||
GITNEXUS_DIR="${GITNEXUS_DIR:-$HOME/.local/share/gitnexus}"
|
||||
|
||||
if [ ! -d "$GITNEXUS_DIR/gitnexus-web/dist" ]; then
|
||||
git clone https://github.com/abhigyanpatwari/GitNexus.git "$GITNEXUS_DIR"
|
||||
cd "$GITNEXUS_DIR/gitnexus-shared" && npm install && npm run build
|
||||
cd "$GITNEXUS_DIR/gitnexus-web" && npm install
|
||||
fi
|
||||
```
|
||||
|
||||
### 2. Patch the Web UI for Remote Access
|
||||
|
||||
The web UI defaults to `localhost:4747` for API calls. Patch it to use same-origin
|
||||
so it works through a tunnel/proxy:
|
||||
|
||||
**File: `$GITNEXUS_DIR/gitnexus-web/src/config/ui-constants.ts`**
|
||||
Change:
|
||||
```typescript
|
||||
export const DEFAULT_BACKEND_URL = 'http://localhost:4747';
|
||||
```
|
||||
To:
|
||||
```typescript
|
||||
export const DEFAULT_BACKEND_URL = typeof window !== 'undefined' && window.location.hostname !== 'localhost' ? window.location.origin : 'http://localhost:4747';
|
||||
```
|
||||
|
||||
**File: `$GITNEXUS_DIR/gitnexus-web/vite.config.ts`**
|
||||
Add `allowedHosts: true` inside the `server: { }` block (only needed if running dev
|
||||
mode instead of production build):
|
||||
```typescript
|
||||
server: {
|
||||
allowedHosts: true,
|
||||
// ... existing config
|
||||
},
|
||||
```
|
||||
|
||||
Then build the production bundle:
|
||||
```bash
|
||||
cd "$GITNEXUS_DIR/gitnexus-web" && npx vite build
|
||||
```
|
||||
|
||||
### 3. Index the Target Repo
|
||||
|
||||
```bash
|
||||
cd /path/to/target-repo
|
||||
npx gitnexus analyze --skip-agents-md
|
||||
rm -rf .claude/ # remove Claude Code-specific artifacts
|
||||
```
|
||||
|
||||
Add `--embeddings` for semantic search (slower — minutes instead of seconds).
|
||||
|
||||
The index lives in `.gitnexus/` inside the repo (auto-gitignored).
|
||||
|
||||
### 4. Create the Proxy Script
|
||||
|
||||
Write this to a file (e.g., `$GITNEXUS_DIR/proxy.mjs`). It serves the production
|
||||
web UI and proxies `/api/*` to the GitNexus backend — same origin, no CORS issues,
|
||||
no sudo, no nginx.
|
||||
|
||||
```javascript
|
||||
import http from 'node:http';
|
||||
import fs from 'node:fs';
|
||||
import path from 'node:path';
|
||||
|
||||
const API_PORT = parseInt(process.env.API_PORT || '4747');
|
||||
const DIST_DIR = process.argv[2] || './dist';
|
||||
const PORT = parseInt(process.argv[3] || '8888');
|
||||
|
||||
const MIME = {
|
||||
'.html': 'text/html', '.js': 'application/javascript', '.css': 'text/css',
|
||||
'.json': 'application/json', '.png': 'image/png', '.svg': 'image/svg+xml',
|
||||
'.ico': 'image/x-icon', '.woff2': 'font/woff2', '.woff': 'font/woff',
|
||||
'.wasm': 'application/wasm',
|
||||
};
|
||||
|
||||
function proxyToApi(req, res) {
|
||||
const opts = {
|
||||
hostname: '127.0.0.1', port: API_PORT,
|
||||
path: req.url, method: req.method, headers: req.headers,
|
||||
};
|
||||
const proxy = http.request(opts, (upstream) => {
|
||||
res.writeHead(upstream.statusCode, upstream.headers);
|
||||
upstream.pipe(res, { end: true });
|
||||
});
|
||||
proxy.on('error', () => { res.writeHead(502); res.end('Backend unavailable'); });
|
||||
req.pipe(proxy, { end: true });
|
||||
}
|
||||
|
||||
function serveStatic(req, res) {
|
||||
let filePath = path.join(DIST_DIR, req.url === '/' ? 'index.html' : req.url.split('?')[0]);
|
||||
if (!fs.existsSync(filePath)) filePath = path.join(DIST_DIR, 'index.html');
|
||||
const ext = path.extname(filePath);
|
||||
const mime = MIME[ext] || 'application/octet-stream';
|
||||
try {
|
||||
const data = fs.readFileSync(filePath);
|
||||
res.writeHead(200, { 'Content-Type': mime, 'Cache-Control': 'public, max-age=3600' });
|
||||
res.end(data);
|
||||
} catch { res.writeHead(404); res.end('Not found'); }
|
||||
}
|
||||
|
||||
http.createServer((req, res) => {
|
||||
if (req.url.startsWith('/api')) proxyToApi(req, res);
|
||||
else serveStatic(req, res);
|
||||
}).listen(PORT, () => console.log(`GitNexus proxy on http://localhost:${PORT}`));
|
||||
```
|
||||
|
||||
### 5. Start the Services
|
||||
|
||||
```bash
|
||||
# Terminal 1: GitNexus backend API
|
||||
npx gitnexus serve &
|
||||
|
||||
# Terminal 2: Proxy (web UI + API on one port)
|
||||
node "$GITNEXUS_DIR/proxy.mjs" "$GITNEXUS_DIR/gitnexus-web/dist" 8888 &
|
||||
```
|
||||
|
||||
Verify: `curl -s http://localhost:8888/api/repos` should return the indexed repo(s).
|
||||
|
||||
### 6. Tunnel with Cloudflare (optional — for remote access)
|
||||
|
||||
```bash
|
||||
# Install cloudflared if needed (no sudo)
|
||||
if ! command -v cloudflared &>/dev/null; then
|
||||
mkdir -p ~/.local/bin
|
||||
curl -sL https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 \
|
||||
-o ~/.local/bin/cloudflared
|
||||
chmod +x ~/.local/bin/cloudflared
|
||||
export PATH="$HOME/.local/bin:$PATH"
|
||||
fi
|
||||
|
||||
# Start tunnel (--config /dev/null avoids conflicts with existing named tunnels)
|
||||
cloudflared tunnel --config /dev/null --url http://localhost:8888 --no-autoupdate --protocol http2
|
||||
```
|
||||
|
||||
The tunnel URL (e.g., `https://random-words.trycloudflare.com`) is printed to stderr.
|
||||
Share it — anyone with the link can explore the graph.
|
||||
|
||||
### 7. Cleanup
|
||||
|
||||
```bash
|
||||
# Stop services
|
||||
pkill -f "gitnexus serve"
|
||||
pkill -f "proxy.mjs"
|
||||
pkill -f cloudflared
|
||||
|
||||
# Remove index from the target repo
|
||||
cd /path/to/target-repo
|
||||
npx gitnexus clean
|
||||
rm -rf .claude/
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **`--config /dev/null` is required for cloudflared** if the user has an existing
|
||||
named tunnel config at `~/.cloudflared/config.yml`. Without it, the catch-all
|
||||
ingress rule in the config returns 404 for all quick tunnel requests.
|
||||
|
||||
- **Production build is mandatory for tunneling.** The Vite dev server blocks
|
||||
non-localhost hosts by default (`allowedHosts`). The production build + Node
|
||||
proxy avoids this entirely.
|
||||
|
||||
- **The web UI does NOT create `.claude/` or `CLAUDE.md`.** Those are created by
|
||||
`npx gitnexus analyze`. Use `--skip-agents-md` to suppress the markdown files,
|
||||
then `rm -rf .claude/` for the rest. These are Claude Code integrations that
|
||||
hermes-agent users don't need.
|
||||
|
||||
- **Browser memory limit.** The web UI loads the entire graph into browser memory.
|
||||
Repos with 5k+ files may be sluggish. 30k+ files will likely crash the tab.
|
||||
|
||||
- **Embeddings are optional.** `--embeddings` enables semantic search but takes
|
||||
minutes on large repos. Skip it for quick exploration; add it if you want
|
||||
natural language queries via the AI chat panel.
|
||||
|
||||
- **Multiple repos.** `gitnexus serve` serves ALL indexed repos. Index several
|
||||
repos, start serve once, and the web UI lets you switch between them.
|
||||
|
|
@ -0,0 +1,408 @@
|
|||
---
|
||||
title: "Parallel Cli"
|
||||
sidebar_label: "Parallel Cli"
|
||||
description: "Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Parallel Cli
|
||||
|
||||
Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring. Prefer JSON output and non-interactive flows.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/parallel-cli` |
|
||||
| Path | `optional-skills/research/parallel-cli` |
|
||||
| Version | `1.1.0` |
|
||||
| Author | Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `Research`, `Web`, `Search`, `Deep-Research`, `Enrichment`, `CLI` |
|
||||
| Related skills | [`duckduckgo-search`](/docs/user-guide/skills/optional/research/research-duckduckgo-search), [`mcporter`](/docs/user-guide/skills/optional/mcp/mcp-mcporter) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Parallel CLI
|
||||
|
||||
Use `parallel-cli` when the user explicitly wants Parallel, or when a terminal-native workflow would benefit from Parallel's vendor-specific stack for web search, extraction, deep research, enrichment, entity discovery, or monitoring.
|
||||
|
||||
This is an optional third-party workflow, not a Hermes core capability.
|
||||
|
||||
Important expectations:
|
||||
- Parallel is a paid service with a free tier, not a fully free local tool.
|
||||
- It overlaps with Hermes native `web_search` / `web_extract`, so do not prefer it by default for ordinary lookups.
|
||||
- Prefer this skill when the user mentions Parallel specifically or needs capabilities like Parallel's enrichment, FindAll, or monitor workflows.
|
||||
|
||||
`parallel-cli` is designed for agents:
|
||||
- JSON output via `--json`
|
||||
- Non-interactive command execution
|
||||
- Async long-running jobs with `--no-wait`, `status`, and `poll`
|
||||
- Context chaining with `--previous-interaction-id`
|
||||
- Search, extract, research, enrichment, entity discovery, and monitoring in one CLI
|
||||
|
||||
## When to use it
|
||||
|
||||
Prefer this skill when:
|
||||
- The user explicitly mentions Parallel or `parallel-cli`
|
||||
- The task needs richer workflows than a simple one-shot search/extract pass
|
||||
- You need async deep research jobs that can be launched and polled later
|
||||
- You need structured enrichment, FindAll entity discovery, or monitoring
|
||||
|
||||
Prefer Hermes native `web_search` / `web_extract` for quick one-off lookups when Parallel is not specifically requested.
|
||||
|
||||
## Installation
|
||||
|
||||
Try the least invasive install path available for the environment.
|
||||
|
||||
### Homebrew
|
||||
|
||||
```bash
|
||||
brew install parallel-web/tap/parallel-cli
|
||||
```
|
||||
|
||||
### npm
|
||||
|
||||
```bash
|
||||
npm install -g parallel-web-cli
|
||||
```
|
||||
|
||||
### Python package
|
||||
|
||||
```bash
|
||||
pip install "parallel-web-tools[cli]"
|
||||
```
|
||||
|
||||
### Standalone installer
|
||||
|
||||
```bash
|
||||
curl -fsSL https://parallel.ai/install.sh | bash
|
||||
```
|
||||
|
||||
If you want an isolated Python install, `pipx` can also work:
|
||||
|
||||
```bash
|
||||
pipx install "parallel-web-tools[cli]"
|
||||
pipx ensurepath
|
||||
```
|
||||
|
||||
## Authentication
|
||||
|
||||
Interactive login:
|
||||
|
||||
```bash
|
||||
parallel-cli login
|
||||
```
|
||||
|
||||
Headless / SSH / CI:
|
||||
|
||||
```bash
|
||||
parallel-cli login --device
|
||||
```
|
||||
|
||||
API key environment variable:
|
||||
|
||||
```bash
|
||||
export PARALLEL_API_KEY="***"
|
||||
```
|
||||
|
||||
Verify current auth status:
|
||||
|
||||
```bash
|
||||
parallel-cli auth
|
||||
```
|
||||
|
||||
If auth requires browser interaction, run with `pty=true`.
|
||||
|
||||
## Core rule set
|
||||
|
||||
1. Always prefer `--json` when you need machine-readable output.
|
||||
2. Prefer explicit arguments and non-interactive flows.
|
||||
3. For long-running jobs, use `--no-wait` and then `status` / `poll`.
|
||||
4. Cite only URLs returned by the CLI output.
|
||||
5. Save large JSON outputs to a temp file when follow-up questions are likely.
|
||||
6. Use background processes only for genuinely long-running workflows; otherwise run in foreground.
|
||||
7. Prefer Hermes native tools unless the user wants Parallel specifically or needs Parallel-only workflows.
|
||||
|
||||
## Quick reference
|
||||
|
||||
```text
|
||||
parallel-cli
|
||||
├── auth
|
||||
├── login
|
||||
├── logout
|
||||
├── search
|
||||
├── extract / fetch
|
||||
├── research run|status|poll|processors
|
||||
├── enrich run|status|poll|plan|suggest|deploy
|
||||
├── findall run|ingest|status|poll|result|enrich|extend|schema|cancel
|
||||
└── monitor create|list|get|update|delete|events|event-group|simulate
|
||||
```
|
||||
|
||||
## Common flags and patterns
|
||||
|
||||
Commonly useful flags:
|
||||
- `--json` for structured output
|
||||
- `--no-wait` for async jobs
|
||||
- `--previous-interaction-id <id>` for follow-up tasks that reuse earlier context
|
||||
- `--max-results <n>` for search result count
|
||||
- `--mode one-shot|agentic` for search behavior
|
||||
- `--include-domains domain1.com,domain2.com`
|
||||
- `--exclude-domains domain1.com,domain2.com`
|
||||
- `--after-date YYYY-MM-DD`
|
||||
|
||||
Read from stdin when convenient:
|
||||
|
||||
```bash
|
||||
echo "What is the latest funding for Anthropic?" | parallel-cli search - --json
|
||||
echo "Research question" | parallel-cli research run - --json
|
||||
```
|
||||
|
||||
## Search
|
||||
|
||||
Use for current web lookups with structured results.
|
||||
|
||||
```bash
|
||||
parallel-cli search "What is Anthropic's latest AI model?" --json
|
||||
parallel-cli search "SEC filings for Apple" --include-domains sec.gov --json
|
||||
parallel-cli search "bitcoin price" --after-date 2026-01-01 --max-results 10 --json
|
||||
parallel-cli search "latest browser benchmarks" --mode one-shot --json
|
||||
parallel-cli search "AI coding agent enterprise reviews" --mode agentic --json
|
||||
```
|
||||
|
||||
Useful constraints:
|
||||
- `--include-domains` to narrow trusted sources
|
||||
- `--exclude-domains` to strip noisy domains
|
||||
- `--after-date` for recency filtering
|
||||
- `--max-results` when you need broader coverage
|
||||
|
||||
If you expect follow-up questions, save output:
|
||||
|
||||
```bash
|
||||
parallel-cli search "latest React 19 changes" --json -o /tmp/react-19-search.json
|
||||
```
|
||||
|
||||
When summarizing results:
|
||||
- lead with the answer
|
||||
- include dates, names, and concrete facts
|
||||
- cite only returned sources
|
||||
- avoid inventing URLs or source titles
|
||||
|
||||
## Extraction
|
||||
|
||||
Use to pull clean content or markdown from a URL.
|
||||
|
||||
```bash
|
||||
parallel-cli extract https://example.com --json
|
||||
parallel-cli extract https://company.com --objective "Find pricing info" --json
|
||||
parallel-cli extract https://example.com --full-content --json
|
||||
parallel-cli fetch https://example.com --json
|
||||
```
|
||||
|
||||
Use `--objective` when the page is broad and you only need one slice of information.
|
||||
|
||||
## Deep research
|
||||
|
||||
Use for deeper multi-step research tasks that may take time.
|
||||
|
||||
Common processor tiers:
|
||||
- `lite` / `base` for faster, cheaper passes
|
||||
- `core` / `pro` for more thorough synthesis
|
||||
- `ultra` for the heaviest research jobs
|
||||
|
||||
### Synchronous
|
||||
|
||||
```bash
|
||||
parallel-cli research run \
|
||||
"Compare the leading AI coding agents by pricing, model support, and enterprise controls" \
|
||||
--processor core \
|
||||
--json
|
||||
```
|
||||
|
||||
### Async launch + poll
|
||||
|
||||
```bash
|
||||
parallel-cli research run \
|
||||
"Compare the leading AI coding agents by pricing, model support, and enterprise controls" \
|
||||
--processor ultra \
|
||||
--no-wait \
|
||||
--json
|
||||
|
||||
parallel-cli research status trun_xxx --json
|
||||
parallel-cli research poll trun_xxx --json
|
||||
parallel-cli research processors --json
|
||||
```
|
||||
|
||||
### Context chaining / follow-up
|
||||
|
||||
```bash
|
||||
parallel-cli research run "What are the top AI coding agents?" --json
|
||||
parallel-cli research run \
|
||||
"What enterprise controls does the top-ranked one offer?" \
|
||||
--previous-interaction-id trun_xxx \
|
||||
--json
|
||||
```
|
||||
|
||||
Recommended Hermes workflow:
|
||||
1. launch with `--no-wait --json`
|
||||
2. capture the returned run/task ID
|
||||
3. if the user wants to continue other work, keep moving
|
||||
4. later call `status` or `poll`
|
||||
5. summarize the final report with citations from the returned sources
|
||||
|
||||
## Enrichment
|
||||
|
||||
Use when the user has CSV/JSON/tabular inputs and wants additional columns inferred from web research.
|
||||
|
||||
### Suggest columns
|
||||
|
||||
```bash
|
||||
parallel-cli enrich suggest "Find the CEO and annual revenue" --json
|
||||
```
|
||||
|
||||
### Plan a config
|
||||
|
||||
```bash
|
||||
parallel-cli enrich plan -o config.yaml
|
||||
```
|
||||
|
||||
### Inline data
|
||||
|
||||
```bash
|
||||
parallel-cli enrich run \
|
||||
--data '[{"company": "Anthropic"}, {"company": "Mistral"}]' \
|
||||
--intent "Find headquarters and employee count" \
|
||||
--json
|
||||
```
|
||||
|
||||
### Non-interactive file run
|
||||
|
||||
```bash
|
||||
parallel-cli enrich run \
|
||||
--source-type csv \
|
||||
--source companies.csv \
|
||||
--target enriched.csv \
|
||||
--source-columns '[{"name": "company", "description": "Company name"}]' \
|
||||
--intent "Find the CEO and annual revenue"
|
||||
```
|
||||
|
||||
### YAML config run
|
||||
|
||||
```bash
|
||||
parallel-cli enrich run config.yaml
|
||||
```
|
||||
|
||||
### Status / polling
|
||||
|
||||
```bash
|
||||
parallel-cli enrich status <task_group_id> --json
|
||||
parallel-cli enrich poll <task_group_id> --json
|
||||
```
|
||||
|
||||
Use explicit JSON arrays for column definitions when operating non-interactively.
|
||||
Validate the output file before reporting success.
|
||||
|
||||
## FindAll
|
||||
|
||||
Use for web-scale entity discovery when the user wants a discovered dataset rather than a short answer.
|
||||
|
||||
```bash
|
||||
parallel-cli findall run "Find AI coding agent startups with enterprise offerings" --json
|
||||
parallel-cli findall run "AI startups in healthcare" -n 25 --json
|
||||
parallel-cli findall status <run_id> --json
|
||||
parallel-cli findall poll <run_id> --json
|
||||
parallel-cli findall result <run_id> --json
|
||||
parallel-cli findall schema <run_id> --json
|
||||
```
|
||||
|
||||
This is a better fit than ordinary search when the user wants a discovered set of entities that can be reviewed, filtered, or enriched later.
|
||||
|
||||
## Monitor
|
||||
|
||||
Use for ongoing change detection over time.
|
||||
|
||||
```bash
|
||||
parallel-cli monitor list --json
|
||||
parallel-cli monitor get <monitor_id> --json
|
||||
parallel-cli monitor events <monitor_id> --json
|
||||
parallel-cli monitor delete <monitor_id> --json
|
||||
```
|
||||
|
||||
Creation is usually the sensitive part because cadence and delivery matter:
|
||||
|
||||
```bash
|
||||
parallel-cli monitor create --help
|
||||
```
|
||||
|
||||
Use this when the user wants recurring tracking of a page or source rather than a one-time fetch.
|
||||
|
||||
## Recommended Hermes usage patterns
|
||||
|
||||
### Fast answer with citations
|
||||
1. Run `parallel-cli search ... --json`
|
||||
2. Parse titles, URLs, dates, excerpts
|
||||
3. Summarize with inline citations from the returned URLs only
|
||||
|
||||
### URL investigation
|
||||
1. Run `parallel-cli extract URL --json`
|
||||
2. If needed, rerun with `--objective` or `--full-content`
|
||||
3. Quote or summarize the extracted markdown
|
||||
|
||||
### Long research workflow
|
||||
1. Run `parallel-cli research run ... --no-wait --json`
|
||||
2. Store the returned ID
|
||||
3. Continue other work or periodically poll
|
||||
4. Summarize the final report with citations
|
||||
|
||||
### Structured enrichment workflow
|
||||
1. Inspect the input file and columns
|
||||
2. Use `enrich suggest` or provide explicit enriched columns
|
||||
3. Run `enrich run`
|
||||
4. Poll for completion if needed
|
||||
5. Validate the output file before reporting success
|
||||
|
||||
## Error handling and exit codes
|
||||
|
||||
The CLI documents these exit codes:
|
||||
- `0` success
|
||||
- `2` bad input
|
||||
- `3` auth error
|
||||
- `4` API error
|
||||
- `5` timeout
|
||||
|
||||
If you hit auth errors:
|
||||
1. check `parallel-cli auth`
|
||||
2. confirm `PARALLEL_API_KEY` or run `parallel-cli login` / `parallel-cli login --device`
|
||||
3. verify `parallel-cli` is on `PATH`
|
||||
|
||||
## Maintenance
|
||||
|
||||
Check current auth / install state:
|
||||
|
||||
```bash
|
||||
parallel-cli auth
|
||||
parallel-cli --help
|
||||
```
|
||||
|
||||
Update commands:
|
||||
|
||||
```bash
|
||||
parallel-cli update
|
||||
pip install --upgrade parallel-web-tools
|
||||
parallel-cli config auto-update-check off
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- Do not omit `--json` unless the user explicitly wants human-formatted output.
|
||||
- Do not cite sources not present in the CLI output.
|
||||
- `login` may require PTY/browser interaction.
|
||||
- Prefer foreground execution for short tasks; do not overuse background processes.
|
||||
- For large result sets, save JSON to `/tmp/*.json` instead of stuffing everything into context.
|
||||
- Do not silently choose Parallel when Hermes native tools are already sufficient.
|
||||
- Remember this is a vendor workflow that usually requires account auth and paid usage beyond the free tier.
|
||||
459
website/docs/user-guide/skills/optional/research/research-qmd.md
Normal file
459
website/docs/user-guide/skills/optional/research/research-qmd.md
Normal file
|
|
@ -0,0 +1,459 @@
|
|||
---
|
||||
title: "Qmd"
|
||||
sidebar_label: "Qmd"
|
||||
description: "Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Qmd
|
||||
|
||||
Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/qmd` |
|
||||
| Path | `optional-skills/research/qmd` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent + Teknium |
|
||||
| License | MIT |
|
||||
| Platforms | macos, linux |
|
||||
| Tags | `Search`, `Knowledge-Base`, `RAG`, `Notes`, `MCP`, `Local-AI` |
|
||||
| Related skills | [`obsidian`](/docs/user-guide/skills/bundled/note-taking/note-taking-obsidian), [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# QMD — Query Markup Documents
|
||||
|
||||
Local, on-device search engine for personal knowledge bases. Indexes markdown
|
||||
notes, meeting transcripts, documentation, and any text-based files, then
|
||||
provides hybrid search combining keyword matching, semantic understanding, and
|
||||
LLM-powered reranking — all running locally with no cloud dependencies.
|
||||
|
||||
Created by [Tobi Lütke](https://github.com/tobi/qmd). MIT licensed.
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to search their notes, docs, knowledge base, or meeting transcripts
|
||||
- User wants to find something across a large collection of markdown/text files
|
||||
- User wants semantic search ("find notes about X concept") not just keyword grep
|
||||
- User has already set up qmd collections and wants to query them
|
||||
- User asks to set up a local knowledge base or document search system
|
||||
- Keywords: "search my notes", "find in my docs", "knowledge base", "qmd"
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Node.js >= 22 (required)
|
||||
|
||||
```bash
|
||||
# Check version
|
||||
node --version # must be >= 22
|
||||
|
||||
# macOS — install or upgrade via Homebrew
|
||||
brew install node@22
|
||||
|
||||
# Linux — use NodeSource or nvm
|
||||
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
|
||||
sudo apt-get install -y nodejs
|
||||
# or with nvm:
|
||||
nvm install 22 && nvm use 22
|
||||
```
|
||||
|
||||
### SQLite with Extension Support (macOS only)
|
||||
|
||||
macOS system SQLite lacks extension loading. Install via Homebrew:
|
||||
|
||||
```bash
|
||||
brew install sqlite
|
||||
```
|
||||
|
||||
### Install qmd
|
||||
|
||||
```bash
|
||||
npm install -g @tobilu/qmd
|
||||
# or with Bun:
|
||||
bun install -g @tobilu/qmd
|
||||
```
|
||||
|
||||
First run auto-downloads 3 local GGUF models (~2GB total):
|
||||
|
||||
| Model | Purpose | Size |
|
||||
|-------|---------|------|
|
||||
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
|
||||
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
|
||||
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |
|
||||
|
||||
### Verify Installation
|
||||
|
||||
```bash
|
||||
qmd --version
|
||||
qmd status
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Command | What It Does | Speed |
|
||||
|---------|-------------|-------|
|
||||
| `qmd search "query"` | BM25 keyword search (no models) | ~0.2s |
|
||||
| `qmd vsearch "query"` | Semantic vector search (1 model) | ~3s |
|
||||
| `qmd query "query"` | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
|
||||
| `qmd get <docid>` | Retrieve full document content | instant |
|
||||
| `qmd multi-get "glob"` | Retrieve multiple files | instant |
|
||||
| `qmd collection add <path> --name <n>` | Add a directory as a collection | instant |
|
||||
| `qmd context add <path> "description"` | Add context metadata to improve retrieval | instant |
|
||||
| `qmd embed` | Generate/update vector embeddings | varies |
|
||||
| `qmd status` | Show index health and collection info | instant |
|
||||
| `qmd mcp` | Start MCP server (stdio) | persistent |
|
||||
| `qmd mcp --http --daemon` | Start MCP server (HTTP, warm models) | persistent |
|
||||
|
||||
## Setup Workflow
|
||||
|
||||
### 1. Add Collections
|
||||
|
||||
Point qmd at directories containing your documents:
|
||||
|
||||
```bash
|
||||
# Add a notes directory
|
||||
qmd collection add ~/notes --name notes
|
||||
|
||||
# Add project docs
|
||||
qmd collection add ~/projects/myproject/docs --name project-docs
|
||||
|
||||
# Add meeting transcripts
|
||||
qmd collection add ~/meetings --name meetings
|
||||
|
||||
# List all collections
|
||||
qmd collection list
|
||||
```
|
||||
|
||||
### 2. Add Context Descriptions
|
||||
|
||||
Context metadata helps the search engine understand what each collection
|
||||
contains. This significantly improves retrieval quality:
|
||||
|
||||
```bash
|
||||
qmd context add qmd://notes "Personal notes, ideas, and journal entries"
|
||||
qmd context add qmd://project-docs "Technical documentation for the main project"
|
||||
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"
|
||||
```
|
||||
|
||||
### 3. Generate Embeddings
|
||||
|
||||
```bash
|
||||
qmd embed
|
||||
```
|
||||
|
||||
This processes all documents in all collections and generates vector
|
||||
embeddings. Re-run after adding new documents or collections.
|
||||
|
||||
### 4. Verify
|
||||
|
||||
```bash
|
||||
qmd status # shows index health, collection stats, model info
|
||||
```
|
||||
|
||||
## Search Patterns
|
||||
|
||||
### Fast Keyword Search (BM25)
|
||||
|
||||
Best for: exact terms, code identifiers, names, known phrases.
|
||||
No models loaded — near-instant results.
|
||||
|
||||
```bash
|
||||
qmd search "authentication middleware"
|
||||
qmd search "handleError async"
|
||||
```
|
||||
|
||||
### Semantic Vector Search
|
||||
|
||||
Best for: natural language questions, conceptual queries.
|
||||
Loads embedding model (~3s first query).
|
||||
|
||||
```bash
|
||||
qmd vsearch "how does the rate limiter handle burst traffic"
|
||||
qmd vsearch "ideas for improving onboarding flow"
|
||||
```
|
||||
|
||||
### Hybrid Search with Reranking (Best Quality)
|
||||
|
||||
Best for: important queries where quality matters most.
|
||||
Uses all 3 models — query expansion, parallel BM25+vector, reranking.
|
||||
|
||||
```bash
|
||||
qmd query "what decisions were made about the database migration"
|
||||
```
|
||||
|
||||
### Structured Multi-Mode Queries
|
||||
|
||||
Combine different search types in a single query for precision:
|
||||
|
||||
```bash
|
||||
# BM25 for exact term + vector for concept
|
||||
qmd query $'lex: rate limiter\nvec: how does throttling work under load'
|
||||
|
||||
# With query expansion
|
||||
qmd query $'expand: database migration plan\nlex: "schema change"'
|
||||
```
|
||||
|
||||
### Query Syntax (lex/BM25 mode)
|
||||
|
||||
| Syntax | Effect | Example |
|
||||
|--------|--------|---------|
|
||||
| `term` | Prefix match | `perf` matches "performance" |
|
||||
| `"phrase"` | Exact phrase | `"rate limiter"` |
|
||||
| `-term` | Exclude term | `performance -sports` |
|
||||
|
||||
### HyDE (Hypothetical Document Embeddings)
|
||||
|
||||
For complex topics, write what you expect the answer to look like:
|
||||
|
||||
```bash
|
||||
qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
|
||||
```
|
||||
|
||||
### Scoping to Collections
|
||||
|
||||
```bash
|
||||
qmd search "query" --collection notes
|
||||
qmd query "query" --collection project-docs
|
||||
```
|
||||
|
||||
### Output Formats
|
||||
|
||||
```bash
|
||||
qmd search "query" --json # JSON output (best for parsing)
|
||||
qmd search "query" --limit 5 # Limit results
|
||||
qmd get "#abc123" # Get by document ID
|
||||
qmd get "path/to/file.md" # Get by file path
|
||||
qmd get "file.md:50" -l 100 # Get specific line range
|
||||
qmd multi-get "journals/*.md" --json # Batch retrieve by glob
|
||||
```
|
||||
|
||||
## MCP Integration (Recommended)
|
||||
|
||||
qmd exposes an MCP server that provides search tools directly to
|
||||
Hermes Agent via the native MCP client. This is the preferred
|
||||
integration — once configured, the agent gets qmd tools automatically
|
||||
without needing to load this skill.
|
||||
|
||||
### Option A: Stdio Mode (Simple)
|
||||
|
||||
Add to `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
qmd:
|
||||
command: "qmd"
|
||||
args: ["mcp"]
|
||||
timeout: 30
|
||||
connect_timeout: 45
|
||||
```
|
||||
|
||||
This registers tools: `mcp_qmd_search`, `mcp_qmd_vsearch`,
|
||||
`mcp_qmd_deep_search`, `mcp_qmd_get`, `mcp_qmd_status`.
|
||||
|
||||
**Tradeoff:** Models load on first search call (~19s cold start),
|
||||
then stay warm for the session. Acceptable for occasional use.
|
||||
|
||||
### Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)
|
||||
|
||||
Start the qmd daemon separately — it keeps models warm in memory:
|
||||
|
||||
```bash
|
||||
# Start daemon (persists across agent restarts)
|
||||
qmd mcp --http --daemon
|
||||
|
||||
# Runs on http://localhost:8181 by default
|
||||
```
|
||||
|
||||
Then configure Hermes Agent to connect via HTTP:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
qmd:
|
||||
url: "http://localhost:8181/mcp"
|
||||
timeout: 30
|
||||
```
|
||||
|
||||
**Tradeoff:** Uses ~2GB RAM while running, but every query is fast
|
||||
(~2-3s). Best for users who search frequently.
|
||||
|
||||
### Keeping the Daemon Running
|
||||
|
||||
#### macOS (launchd)
|
||||
|
||||
```bash
|
||||
cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
||||
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>com.qmd.daemon</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>qmd</string>
|
||||
<string>mcp</string>
|
||||
<string>--http</string>
|
||||
<string>--daemon</string>
|
||||
</array>
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
<key>KeepAlive</key>
|
||||
<true/>
|
||||
<key>StandardOutPath</key>
|
||||
<string>/tmp/qmd-daemon.log</string>
|
||||
<key>StandardErrorPath</key>
|
||||
<string>/tmp/qmd-daemon.log</string>
|
||||
</dict>
|
||||
</plist>
|
||||
EOF
|
||||
|
||||
launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist
|
||||
```
|
||||
|
||||
#### Linux (systemd user service)
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.config/systemd/user
|
||||
|
||||
cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
|
||||
[Unit]
|
||||
Description=QMD MCP Daemon
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
ExecStart=qmd mcp --http --daemon
|
||||
Restart=on-failure
|
||||
RestartSec=10
|
||||
Environment=PATH=/usr/local/bin:/usr/bin:/bin
|
||||
|
||||
[Install]
|
||||
WantedBy=default.target
|
||||
EOF
|
||||
|
||||
systemctl --user daemon-reload
|
||||
systemctl --user enable --now qmd-daemon
|
||||
systemctl --user status qmd-daemon
|
||||
```
|
||||
|
||||
### MCP Tools Reference
|
||||
|
||||
Once connected, these tools are available as `mcp_qmd_*`:
|
||||
|
||||
| MCP Tool | Maps To | Description |
|
||||
|----------|---------|-------------|
|
||||
| `mcp_qmd_search` | `qmd search` | BM25 keyword search |
|
||||
| `mcp_qmd_vsearch` | `qmd vsearch` | Semantic vector search |
|
||||
| `mcp_qmd_deep_search` | `qmd query` | Hybrid search + reranking |
|
||||
| `mcp_qmd_get` | `qmd get` | Retrieve document by ID or path |
|
||||
| `mcp_qmd_status` | `qmd status` | Index health and stats |
|
||||
|
||||
The MCP tools accept structured JSON queries for multi-mode search:
|
||||
|
||||
```json
|
||||
{
|
||||
"searches": [
|
||||
{"type": "lex", "query": "authentication middleware"},
|
||||
{"type": "vec", "query": "how user login is verified"}
|
||||
],
|
||||
"collections": ["project-docs"],
|
||||
"limit": 10
|
||||
}
|
||||
```
|
||||
|
||||
## CLI Usage (Without MCP)
|
||||
|
||||
When MCP is not configured, use qmd directly via terminal:
|
||||
|
||||
```
|
||||
terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)
|
||||
```
|
||||
|
||||
For setup and management tasks, always use terminal:
|
||||
|
||||
```
|
||||
terminal(command="qmd collection add ~/Documents/notes --name notes")
|
||||
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
|
||||
terminal(command="qmd embed")
|
||||
terminal(command="qmd status")
|
||||
```
|
||||
|
||||
## How the Search Pipeline Works
|
||||
|
||||
Understanding the internals helps choose the right search mode:
|
||||
|
||||
1. **Query Expansion** — A fine-tuned 1.7B model generates 2 alternative
|
||||
queries. The original gets 2x weight in fusion.
|
||||
2. **Parallel Retrieval** — BM25 (SQLite FTS5) and vector search run
|
||||
simultaneously across all query variants.
|
||||
3. **RRF Fusion** — Reciprocal Rank Fusion (k=60) merges results.
|
||||
Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
|
||||
4. **LLM Reranking** — qwen3-reranker scores top 30 candidates (0.0-1.0).
|
||||
5. **Position-Aware Blending** — Ranks 1-3: 75% retrieval / 25% reranker.
|
||||
Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).
|
||||
|
||||
**Smart Chunking:** Documents are split at natural break points (headings,
|
||||
code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code
|
||||
blocks are never split mid-block.
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always add context descriptions** — `qmd context add` dramatically
|
||||
improves retrieval accuracy. Describe what each collection contains.
|
||||
2. **Re-embed after adding documents** — `qmd embed` must be re-run when
|
||||
new files are added to collections.
|
||||
3. **Use `qmd search` for speed** — when you need fast keyword lookup
|
||||
(code identifiers, exact names), BM25 is instant and needs no models.
|
||||
4. **Use `qmd query` for quality** — when the question is conceptual or
|
||||
the user needs the best possible results, use hybrid search.
|
||||
5. **Prefer MCP integration** — once configured, the agent gets native
|
||||
tools without needing to load this skill each time.
|
||||
6. **Daemon mode for frequent users** — if the user searches their
|
||||
knowledge base regularly, recommend the HTTP daemon setup.
|
||||
7. **First query in structured search gets 2x weight** — put the most
|
||||
important/certain query first when combining lex and vec.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Models downloading on first run"
|
||||
Normal — qmd auto-downloads ~2GB of GGUF models on first use.
|
||||
This is a one-time operation.
|
||||
|
||||
### Cold start latency (~19s)
|
||||
This happens when models aren't loaded in memory. Solutions:
|
||||
- Use HTTP daemon mode (`qmd mcp --http --daemon`) to keep warm
|
||||
- Use `qmd search` (BM25 only) when models aren't needed
|
||||
- MCP stdio mode loads models on first search, stays warm for session
|
||||
|
||||
### macOS: "unable to load extension"
|
||||
Install Homebrew SQLite: `brew install sqlite`
|
||||
Then ensure it's on PATH before system SQLite.
|
||||
|
||||
### "No collections found"
|
||||
Run `qmd collection add <path> --name <name>` to add directories,
|
||||
then `qmd embed` to index them.
|
||||
|
||||
### Embedding model override (CJK/multilingual)
|
||||
Set `QMD_EMBED_MODEL` environment variable for non-English content:
|
||||
```bash
|
||||
export QMD_EMBED_MODEL="your-multilingual-model"
|
||||
```
|
||||
|
||||
## Data Storage
|
||||
|
||||
- **Index & vectors:** `~/.cache/qmd/index.sqlite`
|
||||
- **Models:** Auto-downloaded to local cache on first run
|
||||
- **No cloud dependencies** — everything runs locally
|
||||
|
||||
## References
|
||||
|
||||
- [GitHub: tobi/qmd](https://github.com/tobi/qmd)
|
||||
- [QMD Changelog](https://github.com/tobi/qmd/blob/main/CHANGELOG.md)
|
||||
|
|
@ -0,0 +1,350 @@
|
|||
---
|
||||
title: "Scrapling"
|
||||
sidebar_label: "Scrapling"
|
||||
description: "Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Scrapling
|
||||
|
||||
Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/scrapling` |
|
||||
| Path | `optional-skills/research/scrapling` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | FEUAZUR |
|
||||
| License | MIT |
|
||||
| Tags | `Web Scraping`, `Browser`, `Cloudflare`, `Stealth`, `Crawling`, `Spider` |
|
||||
| Related skills | [`duckduckgo-search`](/docs/user-guide/skills/optional/research/research-duckduckgo-search), [`domain-intel`](/docs/user-guide/skills/optional/research/research-domain-intel) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Scrapling
|
||||
|
||||
[Scrapling](https://github.com/D4Vinci/Scrapling) is a web scraping framework with anti-bot bypass, stealth browser automation, and a spider framework. It provides three fetching strategies (HTTP, dynamic JS, stealth/Cloudflare) and a full CLI.
|
||||
|
||||
**This skill is for educational and research purposes only.** Users must comply with local/international data scraping laws and respect website Terms of Service.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Scraping static HTML pages (faster than browser tools)
|
||||
- Scraping JS-rendered pages that need a real browser
|
||||
- Bypassing Cloudflare Turnstile or bot detection
|
||||
- Crawling multiple pages with a spider
|
||||
- When the built-in `web_extract` tool does not return the data you need
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install "scrapling[all]"
|
||||
scrapling install
|
||||
```
|
||||
|
||||
Minimal install (HTTP only, no browser):
|
||||
```bash
|
||||
pip install scrapling
|
||||
```
|
||||
|
||||
With browser automation only:
|
||||
```bash
|
||||
pip install "scrapling[fetchers]"
|
||||
scrapling install
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Approach | Class | Use When |
|
||||
|----------|-------|----------|
|
||||
| HTTP | `Fetcher` / `FetcherSession` | Static pages, APIs, fast bulk requests |
|
||||
| Dynamic | `DynamicFetcher` / `DynamicSession` | JS-rendered content, SPAs |
|
||||
| Stealth | `StealthyFetcher` / `StealthySession` | Cloudflare, anti-bot protected sites |
|
||||
| Spider | `Spider` | Multi-page crawling with link following |
|
||||
|
||||
## CLI Usage
|
||||
|
||||
### Extract Static Page
|
||||
|
||||
```bash
|
||||
scrapling extract get 'https://example.com' output.md
|
||||
```
|
||||
|
||||
With CSS selector and browser impersonation:
|
||||
|
||||
```bash
|
||||
scrapling extract get 'https://example.com' output.md \
|
||||
--css-selector '.content' \
|
||||
--impersonate 'chrome'
|
||||
```
|
||||
|
||||
### Extract JS-Rendered Page
|
||||
|
||||
```bash
|
||||
scrapling extract fetch 'https://example.com' output.md \
|
||||
--css-selector '.dynamic-content' \
|
||||
--disable-resources \
|
||||
--network-idle
|
||||
```
|
||||
|
||||
### Extract Cloudflare-Protected Page
|
||||
|
||||
```bash
|
||||
scrapling extract stealthy-fetch 'https://protected-site.com' output.html \
|
||||
--solve-cloudflare \
|
||||
--block-webrtc \
|
||||
--hide-canvas
|
||||
```
|
||||
|
||||
### POST Request
|
||||
|
||||
```bash
|
||||
scrapling extract post 'https://example.com/api' output.json \
|
||||
--json '{"query": "search term"}'
|
||||
```
|
||||
|
||||
### Output Formats
|
||||
|
||||
The output format is determined by the file extension:
|
||||
- `.html` -- raw HTML
|
||||
- `.md` -- converted to Markdown
|
||||
- `.txt` -- plain text
|
||||
- `.json` / `.jsonl` -- JSON
|
||||
|
||||
## Python: HTTP Scraping
|
||||
|
||||
### Single Request
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import Fetcher
|
||||
|
||||
page = Fetcher.get('https://quotes.toscrape.com/')
|
||||
quotes = page.css('.quote .text::text').getall()
|
||||
for q in quotes:
|
||||
print(q)
|
||||
```
|
||||
|
||||
### Session (Persistent Cookies)
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import FetcherSession
|
||||
|
||||
with FetcherSession(impersonate='chrome') as session:
|
||||
page = session.get('https://example.com/', stealthy_headers=True)
|
||||
links = page.css('a::attr(href)').getall()
|
||||
for link in links[:5]:
|
||||
sub = session.get(link)
|
||||
print(sub.css('h1::text').get())
|
||||
```
|
||||
|
||||
### POST / PUT / DELETE
|
||||
|
||||
```python
|
||||
page = Fetcher.post('https://api.example.com/data', json={"key": "value"})
|
||||
page = Fetcher.put('https://api.example.com/item/1', data={"name": "updated"})
|
||||
page = Fetcher.delete('https://api.example.com/item/1')
|
||||
```
|
||||
|
||||
### With Proxy
|
||||
|
||||
```python
|
||||
page = Fetcher.get('https://example.com', proxy='http://user:pass@proxy:8080')
|
||||
```
|
||||
|
||||
## Python: Dynamic Pages (JS-Rendered)
|
||||
|
||||
For pages that require JavaScript execution (SPAs, lazy-loaded content):
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import DynamicFetcher
|
||||
|
||||
page = DynamicFetcher.fetch('https://example.com', headless=True)
|
||||
data = page.css('.js-loaded-content::text').getall()
|
||||
```
|
||||
|
||||
### Wait for Specific Element
|
||||
|
||||
```python
|
||||
page = DynamicFetcher.fetch(
|
||||
'https://example.com',
|
||||
wait_selector=('.results', 'visible'),
|
||||
network_idle=True,
|
||||
)
|
||||
```
|
||||
|
||||
### Disable Resources for Speed
|
||||
|
||||
Blocks fonts, images, media, stylesheets (~25% faster):
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import DynamicSession
|
||||
|
||||
with DynamicSession(headless=True, disable_resources=True, network_idle=True) as session:
|
||||
page = session.fetch('https://example.com')
|
||||
items = page.css('.item::text').getall()
|
||||
```
|
||||
|
||||
### Custom Page Automation
|
||||
|
||||
```python
|
||||
from playwright.sync_api import Page
|
||||
from scrapling.fetchers import DynamicFetcher
|
||||
|
||||
def scroll_and_click(page: Page):
|
||||
page.mouse.wheel(0, 3000)
|
||||
page.wait_for_timeout(1000)
|
||||
page.click('button.load-more')
|
||||
page.wait_for_selector('.extra-results')
|
||||
|
||||
page = DynamicFetcher.fetch('https://example.com', page_action=scroll_and_click)
|
||||
results = page.css('.extra-results .item::text').getall()
|
||||
```
|
||||
|
||||
## Python: Stealth Mode (Anti-Bot Bypass)
|
||||
|
||||
For Cloudflare-protected or heavily fingerprinted sites:
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import StealthyFetcher
|
||||
|
||||
page = StealthyFetcher.fetch(
|
||||
'https://protected-site.com',
|
||||
headless=True,
|
||||
solve_cloudflare=True,
|
||||
block_webrtc=True,
|
||||
hide_canvas=True,
|
||||
)
|
||||
content = page.css('.protected-content::text').getall()
|
||||
```
|
||||
|
||||
### Stealth Session
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import StealthySession
|
||||
|
||||
with StealthySession(headless=True, solve_cloudflare=True) as session:
|
||||
page1 = session.fetch('https://protected-site.com/page1')
|
||||
page2 = session.fetch('https://protected-site.com/page2')
|
||||
```
|
||||
|
||||
## Element Selection
|
||||
|
||||
All fetchers return a `Selector` object with these methods:
|
||||
|
||||
### CSS Selectors
|
||||
|
||||
```python
|
||||
page.css('h1::text').get() # First h1 text
|
||||
page.css('a::attr(href)').getall() # All link hrefs
|
||||
page.css('.quote .text::text').getall() # Nested selection
|
||||
```
|
||||
|
||||
### XPath
|
||||
|
||||
```python
|
||||
page.xpath('//div[@class="content"]/text()').getall()
|
||||
page.xpath('//a/@href').getall()
|
||||
```
|
||||
|
||||
### Find Methods
|
||||
|
||||
```python
|
||||
page.find_all('div', class_='quote') # By tag + attribute
|
||||
page.find_by_text('Read more', tag='a') # By text content
|
||||
page.find_by_regex(r'\$\d+\.\d{2}') # By regex pattern
|
||||
```
|
||||
|
||||
### Similar Elements
|
||||
|
||||
Find elements with similar structure (useful for product listings, etc.):
|
||||
|
||||
```python
|
||||
first_product = page.css('.product')[0]
|
||||
all_similar = first_product.find_similar()
|
||||
```
|
||||
|
||||
### Navigation
|
||||
|
||||
```python
|
||||
el = page.css('.target')[0]
|
||||
el.parent # Parent element
|
||||
el.children # Child elements
|
||||
el.next_sibling # Next sibling
|
||||
el.prev_sibling # Previous sibling
|
||||
```
|
||||
|
||||
## Python: Spider Framework
|
||||
|
||||
For multi-page crawling with link following:
|
||||
|
||||
```python
|
||||
from scrapling.spiders import Spider, Request, Response
|
||||
|
||||
class QuotesSpider(Spider):
|
||||
name = "quotes"
|
||||
start_urls = ["https://quotes.toscrape.com/"]
|
||||
concurrent_requests = 10
|
||||
download_delay = 1
|
||||
|
||||
async def parse(self, response: Response):
|
||||
for quote in response.css('.quote'):
|
||||
yield {
|
||||
"text": quote.css('.text::text').get(),
|
||||
"author": quote.css('.author::text').get(),
|
||||
"tags": quote.css('.tag::text').getall(),
|
||||
}
|
||||
|
||||
next_page = response.css('.next a::attr(href)').get()
|
||||
if next_page:
|
||||
yield response.follow(next_page)
|
||||
|
||||
result = QuotesSpider().start()
|
||||
print(f"Scraped {len(result.items)} quotes")
|
||||
result.items.to_json("quotes.json")
|
||||
```
|
||||
|
||||
### Multi-Session Spider
|
||||
|
||||
Route requests to different fetcher types:
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import FetcherSession, AsyncStealthySession
|
||||
|
||||
class SmartSpider(Spider):
|
||||
name = "smart"
|
||||
start_urls = ["https://example.com/"]
|
||||
|
||||
def configure_sessions(self, manager):
|
||||
manager.add("fast", FetcherSession(impersonate="chrome"))
|
||||
manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
|
||||
|
||||
async def parse(self, response: Response):
|
||||
for link in response.css('a::attr(href)').getall():
|
||||
if "protected" in link:
|
||||
yield Request(link, sid="stealth")
|
||||
else:
|
||||
yield Request(link, sid="fast", callback=self.parse)
|
||||
```
|
||||
|
||||
### Pause/Resume Crawling
|
||||
|
||||
```python
|
||||
spider = QuotesSpider(crawldir="./crawl_checkpoint")
|
||||
spider.start() # Ctrl+C to pause, re-run to resume from checkpoint
|
||||
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **Browser install required**: run `scrapling install` after pip install -- without it, `DynamicFetcher` and `StealthyFetcher` will fail
|
||||
- **Timeouts**: DynamicFetcher/StealthyFetcher timeout is in **milliseconds** (default 30000), Fetcher timeout is in **seconds**
|
||||
- **Cloudflare bypass**: `solve_cloudflare=True` adds 5-15 seconds to fetch time -- only enable when needed
|
||||
- **Resource usage**: StealthyFetcher runs a real browser -- limit concurrent usage
|
||||
- **Legal**: always check robots.txt and website ToS before scraping. This library is for educational and research purposes
|
||||
- **Python version**: requires Python 3.10+
|
||||
|
|
@ -0,0 +1,172 @@
|
|||
---
|
||||
title: "1Password — Set up and use 1Password CLI (op)"
|
||||
sidebar_label: "1Password"
|
||||
description: "Set up and use 1Password CLI (op)"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# 1Password
|
||||
|
||||
Set up and use 1Password CLI (op). Use when installing the CLI, enabling desktop app integration, signing in, and reading/injecting secrets for commands.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/security/1password` |
|
||||
| Path | `optional-skills/security/1password` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | arceus77-7, enhanced by Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `security`, `secrets`, `1password`, `op`, `cli` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# 1Password CLI
|
||||
|
||||
Use this skill when the user wants secrets managed through 1Password instead of plaintext env vars or files.
|
||||
|
||||
## Requirements
|
||||
|
||||
- 1Password account
|
||||
- 1Password CLI (`op`) installed
|
||||
- One of: desktop app integration, service account token (`OP_SERVICE_ACCOUNT_TOKEN`), or Connect server
|
||||
- `tmux` available for stable authenticated sessions during Hermes terminal calls (desktop app flow only)
|
||||
|
||||
## When to Use
|
||||
|
||||
- Install or configure 1Password CLI
|
||||
- Sign in with `op signin`
|
||||
- Read secret references like `op://Vault/Item/field`
|
||||
- Inject secrets into config/templates using `op inject`
|
||||
- Run commands with secret env vars via `op run`
|
||||
|
||||
## Authentication Methods
|
||||
|
||||
### Service Account (recommended for Hermes)
|
||||
|
||||
Set `OP_SERVICE_ACCOUNT_TOKEN` in `~/.hermes/.env` (the skill will prompt for this on first load).
|
||||
No desktop app needed. Supports `op read`, `op inject`, `op run`.
|
||||
|
||||
```bash
|
||||
export OP_SERVICE_ACCOUNT_TOKEN="your-token-here"
|
||||
op whoami # verify — should show Type: SERVICE_ACCOUNT
|
||||
```
|
||||
|
||||
### Desktop App Integration (interactive)
|
||||
|
||||
1. Enable in 1Password desktop app: Settings → Developer → Integrate with 1Password CLI
|
||||
2. Ensure app is unlocked
|
||||
3. Run `op signin` and approve the biometric prompt
|
||||
|
||||
### Connect Server (self-hosted)
|
||||
|
||||
```bash
|
||||
export OP_CONNECT_HOST="http://localhost:8080"
|
||||
export OP_CONNECT_TOKEN="your-connect-token"
|
||||
```
|
||||
|
||||
## Setup
|
||||
|
||||
1. Install CLI:
|
||||
|
||||
```bash
|
||||
# macOS
|
||||
brew install 1password-cli
|
||||
|
||||
# Linux (official package/install docs)
|
||||
# See references/get-started.md for distro-specific links.
|
||||
|
||||
# Windows (winget)
|
||||
winget install AgileBits.1Password.CLI
|
||||
```
|
||||
|
||||
2. Verify:
|
||||
|
||||
```bash
|
||||
op --version
|
||||
```
|
||||
|
||||
3. Choose an auth method above and configure it.
|
||||
|
||||
## Hermes Execution Pattern (desktop app flow)
|
||||
|
||||
Hermes terminal commands are non-interactive by default and can lose auth context between calls.
|
||||
For reliable `op` use with desktop app integration, run sign-in and secret operations inside a dedicated tmux session.
|
||||
|
||||
Note: This is NOT needed when using `OP_SERVICE_ACCOUNT_TOKEN` — the token persists across terminal calls automatically.
|
||||
|
||||
```bash
|
||||
SOCKET_DIR="${TMPDIR:-/tmp}/hermes-tmux-sockets"
|
||||
mkdir -p "$SOCKET_DIR"
|
||||
SOCKET="$SOCKET_DIR/hermes-op.sock"
|
||||
SESSION="op-auth-$(date +%Y%m%d-%H%M%S)"
|
||||
|
||||
tmux -S "$SOCKET" new -d -s "$SESSION" -n shell
|
||||
|
||||
# Sign in (approve in desktop app when prompted)
|
||||
tmux -S "$SOCKET" send-keys -t "$SESSION":0.0 -- "eval \"\$(op signin --account my.1password.com)\"" Enter
|
||||
|
||||
# Verify auth
|
||||
tmux -S "$SOCKET" send-keys -t "$SESSION":0.0 -- "op whoami" Enter
|
||||
|
||||
# Example read
|
||||
tmux -S "$SOCKET" send-keys -t "$SESSION":0.0 -- "op read 'op://Private/Npmjs/one-time password?attribute=otp'" Enter
|
||||
|
||||
# Capture output when needed
|
||||
tmux -S "$SOCKET" capture-pane -p -J -t "$SESSION":0.0 -S -200
|
||||
|
||||
# Cleanup
|
||||
tmux -S "$SOCKET" kill-session -t "$SESSION"
|
||||
```
|
||||
|
||||
## Common Operations
|
||||
|
||||
### Read a secret
|
||||
|
||||
```bash
|
||||
op read "op://app-prod/db/password"
|
||||
```
|
||||
|
||||
### Get OTP
|
||||
|
||||
```bash
|
||||
op read "op://app-prod/npm/one-time password?attribute=otp"
|
||||
```
|
||||
|
||||
### Inject into template
|
||||
|
||||
```bash
|
||||
echo "db_password: {{ op://app-prod/db/password }}" | op inject
|
||||
```
|
||||
|
||||
### Run a command with secret env var
|
||||
|
||||
```bash
|
||||
export DB_PASSWORD="op://app-prod/db/password"
|
||||
op run -- sh -c '[ -n "$DB_PASSWORD" ] && echo "DB_PASSWORD is set" || echo "DB_PASSWORD missing"'
|
||||
```
|
||||
|
||||
## Guardrails
|
||||
|
||||
- Never print raw secrets back to user unless they explicitly request the value.
|
||||
- Prefer `op run` / `op inject` instead of writing secrets into files.
|
||||
- If command fails with "account is not signed in", run `op signin` again in the same tmux session.
|
||||
- If desktop app integration is unavailable (headless/CI), use service account token flow.
|
||||
|
||||
## CI / Headless note
|
||||
|
||||
For non-interactive use, authenticate with `OP_SERVICE_ACCOUNT_TOKEN` and avoid interactive `op signin`.
|
||||
Service accounts require CLI v2.18.0+.
|
||||
|
||||
## References
|
||||
|
||||
- `references/get-started.md`
|
||||
- `references/cli-examples.md`
|
||||
- https://developer.1password.com/docs/cli/
|
||||
- https://developer.1password.com/docs/service-accounts/
|
||||
|
|
@ -0,0 +1,424 @@
|
|||
---
|
||||
title: "Oss Forensics — Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories"
|
||||
sidebar_label: "Oss Forensics"
|
||||
description: "Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Oss Forensics
|
||||
|
||||
Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories.
|
||||
Covers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence
|
||||
collection, hypothesis formation/validation, and structured forensic reporting.
|
||||
Inspired by RAPTOR's 1800+ line OSS Forensics system.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/security/oss-forensics` |
|
||||
| Path | `optional-skills/security/oss-forensics` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# OSS Security Forensics Skill
|
||||
|
||||
A 7-phase multi-agent investigation framework for researching open-source supply chain attacks.
|
||||
Adapted from RAPTOR's forensics system. Covers GitHub Archive, Wayback Machine, GitHub API,
|
||||
local git analysis, IOC extraction, evidence-backed hypothesis formation and validation,
|
||||
and final forensic report generation.
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ Anti-Hallucination Guardrails
|
||||
|
||||
Read these before every investigation step. Violating them invalidates the report.
|
||||
|
||||
1. **Evidence-First Rule**: Every claim in any report, hypothesis, or summary MUST cite at least one evidence ID (`EV-XXXX`). Assertions without citations are forbidden.
|
||||
2. **STAY IN YOUR LANE**: Each sub-agent (investigator) has a single data source. Do NOT mix sources. The GH Archive investigator does not query the GitHub API, and vice versa. Role boundaries are hard.
|
||||
3. **Fact vs. Hypothesis Separation**: Mark all unverified inferences with `[HYPOTHESIS]`. Only statements verified against original sources may be stated as facts.
|
||||
4. **No Evidence Fabrication**: The hypothesis validator MUST mechanically check that every cited evidence ID actually exists in the evidence store before accepting a hypothesis.
|
||||
5. **Proof-Required Disproval**: A hypothesis cannot be dismissed without a specific, evidence-backed counter-argument. "No evidence found" is not sufficient to disprove—it only makes a hypothesis inconclusive.
|
||||
6. **SHA/URL Double-Verification**: Any commit SHA, URL, or external identifier cited as evidence must be independently confirmed from at least two sources before being marked as verified.
|
||||
7. **Suspicious Code Rule**: Never run code found inside the investigated repository locally. Analyze statically only, or use `execute_code` in a sandboxed environment.
|
||||
8. **Secret Redaction**: Any API keys, tokens, or credentials discovered during investigation must be redacted in the final report. Log them internally only.
|
||||
|
||||
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
- **Scenario A: Dependency Confusion**: A malicious package `internal-lib-v2` is uploaded to NPM with a higher version than the internal one. The investigator must track when this package was first seen and if any PushEvents in the target repo updated `package.json` to this version.
|
||||
- **Scenario B: Maintainer Takeover**: A long-term contributor's account is used to push a backdoored `.github/workflows/build.yml`. The investigator looks for PushEvents from this user after a long period of inactivity or from a new IP/location (if detectable via BigQuery).
|
||||
- **Scenario C: Force-Push Hide**: A developer accidentally commits a production secret, then force-pushes to "fix" it. The investigator uses `git fsck` and GH Archive to recover the original commit SHA and verify what was leaked.
|
||||
|
||||
---
|
||||
|
||||
> **Path convention**: Throughout this skill, `SKILL_DIR` refers to the root of this skill's
|
||||
> installation directory (the folder containing this `SKILL.md`). When the skill is loaded,
|
||||
> resolve `SKILL_DIR` to the actual path — e.g. `~/.hermes/skills/security/oss-forensics/`
|
||||
> or the `optional-skills/` equivalent. All script and template references are relative to it.
|
||||
|
||||
## Phase 0: Initialization
|
||||
|
||||
1. Create investigation working directory:
|
||||
```bash
|
||||
mkdir investigation_$(echo "REPO_NAME" | tr '/' '_')
|
||||
cd investigation_$(echo "REPO_NAME" | tr '/' '_')
|
||||
```
|
||||
2. Initialize the evidence store:
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list
|
||||
```
|
||||
3. Copy the forensic report template:
|
||||
```bash
|
||||
cp SKILL_DIR/templates/forensic-report.md ./investigation-report.md
|
||||
```
|
||||
4. Create an `iocs.md` file to track Indicators of Compromise as they are discovered.
|
||||
5. Record the investigation start time, target repository, and stated investigation goal.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Prompt Parsing and IOC Extraction
|
||||
|
||||
**Goal**: Extract all structured investigative targets from the user's request.
|
||||
|
||||
**Actions**:
|
||||
- Parse the user prompt and extract:
|
||||
- Target repository (`owner/repo`)
|
||||
- Target actors (GitHub handles, email addresses)
|
||||
- Time window of interest (commit date ranges, PR timestamps)
|
||||
- Provided Indicators of Compromise: commit SHAs, file paths, package names, IP addresses, domains, API keys/tokens, malicious URLs
|
||||
- Any linked vendor security reports or blog posts
|
||||
|
||||
**Tools**: Reasoning only, or `execute_code` for regex extraction from large text blocks.
|
||||
|
||||
**Output**: Populate `iocs.md` with extracted IOCs. Each IOC must have:
|
||||
- Type (from: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, MALICIOUS_URL, OTHER)
|
||||
- Value
|
||||
- Source (user-provided, inferred)
|
||||
|
||||
**Reference**: See [evidence-types.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/evidence-types.md) for IOC taxonomy.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Parallel Evidence Collection
|
||||
|
||||
Spawn up to 5 specialist investigator sub-agents using `delegate_task` (batch mode, max 3 concurrent). Each investigator has a **single data source** and must not mix sources.
|
||||
|
||||
> **Orchestrator note**: Pass the IOC list from Phase 1 and the investigation time window in the `context` field of each delegated task.
|
||||
|
||||
---
|
||||
|
||||
### Investigator 1: Local Git Investigator
|
||||
|
||||
**ROLE BOUNDARY**: You query the LOCAL GIT REPOSITORY ONLY. Do not call any external APIs.
|
||||
|
||||
**Actions**:
|
||||
```bash
|
||||
# Clone repository
|
||||
git clone https://github.com/OWNER/REPO.git target_repo && cd target_repo
|
||||
|
||||
# Full commit log with stats
|
||||
git log --all --full-history --stat --format="%H|%ae|%an|%ai|%s" > ../git_log.txt
|
||||
|
||||
# Detect force-push evidence (orphaned/dangling commits)
|
||||
git fsck --lost-found --unreachable 2>&1 | grep commit > ../dangling_commits.txt
|
||||
|
||||
# Check reflog for rewritten history
|
||||
git reflog --all > ../reflog.txt
|
||||
|
||||
# List ALL branches including deleted remote refs
|
||||
git branch -a -v > ../branches.txt
|
||||
|
||||
# Find suspicious large binary additions
|
||||
git log --all --diff-filter=A --name-only --format="%H %ai" -- "*.so" "*.dll" "*.exe" "*.bin" > ../binary_additions.txt
|
||||
|
||||
# Check for GPG signature anomalies
|
||||
git log --show-signature --format="%H %ai %aN" > ../signature_check.txt 2>&1
|
||||
```
|
||||
|
||||
**Evidence to collect** (add via `python3 SKILL_DIR/scripts/evidence-store.py add`):
|
||||
- Each dangling commit SHA → type: `git`
|
||||
- Force-push evidence (reflog showing history rewrite) → type: `git`
|
||||
- Unsigned commits from verified contributors → type: `git`
|
||||
- Suspicious binary file additions → type: `git`
|
||||
|
||||
**Reference**: See [recovery-techniques.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/recovery-techniques.md) for accessing force-pushed commits.
|
||||
|
||||
---
|
||||
|
||||
### Investigator 2: GitHub API Investigator
|
||||
|
||||
**ROLE BOUNDARY**: You query the GITHUB REST API ONLY. Do not run git commands locally.
|
||||
|
||||
**Actions**:
|
||||
```bash
|
||||
# Commits (paginated)
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/commits?per_page=100" > api_commits.json
|
||||
|
||||
# Pull Requests including closed/deleted
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/pulls?state=all&per_page=100" > api_prs.json
|
||||
|
||||
# Issues
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100" > api_issues.json
|
||||
|
||||
# Contributors and collaborator changes
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/contributors" > api_contributors.json
|
||||
|
||||
# Repository events (last 300)
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/events?per_page=100" > api_events.json
|
||||
|
||||
# Check specific suspicious commit SHA details
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/git/commits/SHA" > commit_detail.json
|
||||
|
||||
# Releases
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/releases?per_page=100" > api_releases.json
|
||||
|
||||
# Check if a specific commit exists (force-pushed commits may 404 on commits/ but succeed on git/commits/)
|
||||
curl -s "https://api.github.com/repos/OWNER/REPO/commits/SHA" | jq .sha
|
||||
```
|
||||
|
||||
**Cross-reference targets** (flag discrepancies as evidence):
|
||||
- PR exists in archive but missing from API → evidence of deletion
|
||||
- Contributor in archive events but not in contributors list → evidence of permission revocation
|
||||
- Commit in archive PushEvents but not in API commit list → evidence of force-push/deletion
|
||||
|
||||
**Reference**: See [evidence-types.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/evidence-types.md) for GH event types.
|
||||
|
||||
---
|
||||
|
||||
### Investigator 3: Wayback Machine Investigator
|
||||
|
||||
**ROLE BOUNDARY**: You query the WAYBACK MACHINE CDX API ONLY. Do not use the GitHub API.
|
||||
|
||||
**Goal**: Recover deleted GitHub pages (READMEs, issues, PRs, releases, wiki pages).
|
||||
|
||||
**Actions**:
|
||||
```bash
|
||||
# Search for archived snapshots of the repo main page
|
||||
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO&output=json&limit=100&from=YYYYMMDD&to=YYYYMMDD" > wayback_main.json
|
||||
|
||||
# Search for a specific deleted issue
|
||||
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUM&output=json&limit=50" > wayback_issue_NUM.json
|
||||
|
||||
# Search for a specific deleted PR
|
||||
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/pull/NUM&output=json&limit=50" > wayback_pr_NUM.json
|
||||
|
||||
# Fetch the best snapshot of a page
|
||||
# Use the Wayback Machine URL: https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL
|
||||
# Example: https://web.archive.org/web/20240101000000*/github.com/OWNER/REPO
|
||||
|
||||
# Advanced: Search for deleted releases/tags
|
||||
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/releases/tag/*&output=json" > wayback_tags.json
|
||||
|
||||
# Advanced: Search for historical wiki changes
|
||||
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/wiki/*&output=json" > wayback_wiki.json
|
||||
```
|
||||
|
||||
**Evidence to collect**:
|
||||
- Archived snapshots of deleted issues/PRs with their content
|
||||
- Historical README versions showing changes
|
||||
- Evidence of content present in archive but missing from current GitHub state
|
||||
|
||||
**Reference**: See [github-archive-guide.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/github-archive-guide.md) for CDX API parameters.
|
||||
|
||||
---
|
||||
|
||||
### Investigator 4: GH Archive / BigQuery Investigator
|
||||
|
||||
**ROLE BOUNDARY**: You query GITHUB ARCHIVE via BIGQUERY ONLY. This is a tamper-proof record of all public GitHub events.
|
||||
|
||||
> **Prerequisites**: Requires Google Cloud credentials with BigQuery access (`gcloud auth application-default login`). If unavailable, skip this investigator and note it in the report.
|
||||
|
||||
**Cost Optimization Rules** (MANDATORY):
|
||||
1. ALWAYS run a `--dry_run` before every query to estimate cost.
|
||||
2. Use `_TABLE_SUFFIX` to filter by date range and minimize scanned data.
|
||||
3. Only SELECT the columns you need.
|
||||
4. Add a LIMIT unless aggregating.
|
||||
|
||||
```bash
|
||||
# Template: safe BigQuery query for PushEvents to OWNER/REPO
|
||||
bq query --use_legacy_sql=false --dry_run "
|
||||
SELECT created_at, actor.login, payload.commits, payload.before, payload.head,
|
||||
payload.size, payload.distinct_size
|
||||
FROM \`githubarchive.month.*\`
|
||||
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
|
||||
AND type = 'PushEvent'
|
||||
AND repo.name = 'OWNER/REPO'
|
||||
LIMIT 1000
|
||||
"
|
||||
# If cost is acceptable, re-run without --dry_run
|
||||
|
||||
# Detect force-pushes: zero-distinct_size PushEvents mean commits were force-erased
|
||||
# payload.distinct_size = 0 AND payload.size > 0 → force push indicator
|
||||
|
||||
# Check for deleted branch events
|
||||
bq query --use_legacy_sql=false "
|
||||
SELECT created_at, actor.login, payload.ref, payload.ref_type
|
||||
FROM \`githubarchive.month.*\`
|
||||
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
|
||||
AND type = 'DeleteEvent'
|
||||
AND repo.name = 'OWNER/REPO'
|
||||
LIMIT 200
|
||||
"
|
||||
```
|
||||
|
||||
**Evidence to collect**:
|
||||
- Force-push events (payload.size > 0, payload.distinct_size = 0)
|
||||
- DeleteEvents for branches/tags
|
||||
- WorkflowRunEvents for suspicious CI/CD automation
|
||||
- PushEvents that precede a "gap" in the git log (evidence of rewrite)
|
||||
|
||||
**Reference**: See [github-archive-guide.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/github-archive-guide.md) for all 12 event types and query patterns.
|
||||
|
||||
---
|
||||
|
||||
### Investigator 5: IOC Enrichment Investigator
|
||||
|
||||
**ROLE BOUNDARY**: You enrich EXISTING IOCs from Phase 1 using passive public sources ONLY. Do not execute any code from the target repository.
|
||||
|
||||
**Actions**:
|
||||
- For each commit SHA: attempt recovery via direct GitHub URL (`github.com/OWNER/REPO/commit/SHA.patch`)
|
||||
- For each domain/IP: check passive DNS, WHOIS records (via `web_extract` on public WHOIS services)
|
||||
- For each package name: check npm/PyPI for matching malicious package reports
|
||||
- For each actor username: check GitHub profile, contribution history, account age
|
||||
- Recover force-pushed commits using 3 methods (see [recovery-techniques.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/recovery-techniques.md))
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Evidence Consolidation
|
||||
|
||||
After all investigators complete:
|
||||
|
||||
1. Run `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list` to see all collected evidence.
|
||||
2. For each piece of evidence, verify the `content_sha256` hash matches the original source.
|
||||
3. Group evidence by:
|
||||
- **Timeline**: Sort all timestamped evidence chronologically
|
||||
- **Actor**: Group by GitHub handle or email
|
||||
- **IOC**: Link evidence to the IOC it relates to
|
||||
4. Identify **discrepancies**: items present in one source but absent in another (key deletion indicators).
|
||||
5. Flag evidence as `[VERIFIED]` (confirmed from 2+ independent sources) or `[UNVERIFIED]` (single source only).
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Hypothesis Formation
|
||||
|
||||
A hypothesis must:
|
||||
- State a specific claim (e.g., "Actor X force-pushed to BRANCH on DATE to erase commit SHA")
|
||||
- Cite at least 2 evidence IDs that support it (`EV-XXXX`, `EV-YYYY`)
|
||||
- Identify what evidence would disprove it
|
||||
- Be labeled `[HYPOTHESIS]` until validated
|
||||
|
||||
**Common hypothesis templates** (see [investigation-templates.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/investigation-templates.md)):
|
||||
- Maintainer Compromise: legitimate account used post-takeover to inject malicious code
|
||||
- Dependency Confusion: package name squatting to intercept installs
|
||||
- CI/CD Injection: malicious workflow changes to run code during builds
|
||||
- Typosquatting: near-identical package name targeting misspellers
|
||||
- Credential Leak: token/key accidentally committed then force-pushed to erase
|
||||
|
||||
For each hypothesis, spawn a `delegate_task` sub-agent to attempt to find disconfirming evidence before confirming.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Hypothesis Validation
|
||||
|
||||
The validator sub-agent MUST mechanically check:
|
||||
|
||||
1. For each hypothesis, extract all cited evidence IDs.
|
||||
2. Verify each ID exists in `evidence.json` (hard failure if any ID is missing → hypothesis rejected as potentially fabricated).
|
||||
3. Verify each `[VERIFIED]` piece of evidence was confirmed from 2+ sources.
|
||||
4. Check logical consistency: does the timeline depicted by the evidence support the hypothesis?
|
||||
5. Check for alternative explanations: could the same evidence pattern arise from a benign cause?
|
||||
|
||||
**Output**:
|
||||
- `VALIDATED`: All evidence cited, verified, logically consistent, no plausible alternative explanation.
|
||||
- `INCONCLUSIVE`: Evidence supports hypothesis but alternative explanations exist or evidence is insufficient.
|
||||
- `REJECTED`: Missing evidence IDs, unverified evidence cited as fact, logical inconsistency detected.
|
||||
|
||||
Rejected hypotheses feed back into Phase 4 for refinement (max 3 iterations).
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Final Report Generation
|
||||
|
||||
Populate `investigation-report.md` using the template in [forensic-report.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/templates/forensic-report.md).
|
||||
|
||||
**Mandatory sections**:
|
||||
- Executive Summary: one-paragraph verdict (Compromised / Clean / Inconclusive) with confidence level
|
||||
- Timeline: chronological reconstruction of all significant events with evidence citations
|
||||
- Validated Hypotheses: each with status and supporting evidence IDs
|
||||
- Evidence Registry: table of all `EV-XXXX` entries with source, type, and verification status
|
||||
- IOC List: all extracted and enriched Indicators of Compromise
|
||||
- Chain of Custody: how evidence was collected, from what sources, at what timestamps
|
||||
- Recommendations: immediate mitigations if compromise detected; monitoring recommendations
|
||||
|
||||
**Report rules**:
|
||||
- Every factual claim must have at least one `[EV-XXXX]` citation
|
||||
- Executive Summary must state confidence level (High / Medium / Low)
|
||||
- All secrets/credentials must be redacted to `[REDACTED]`
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: Completion
|
||||
|
||||
1. Run final evidence count: `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list`
|
||||
2. Archive the full investigation directory.
|
||||
3. If compromise is confirmed:
|
||||
- List immediate mitigations (rotate credentials, pin dependency hashes, notify affected users)
|
||||
- Identify affected versions/packages
|
||||
- Note disclosure obligations (if a public package: coordinate with the package registry)
|
||||
4. Present the final `investigation-report.md` to the user.
|
||||
|
||||
---
|
||||
|
||||
## Ethical Use Guidelines
|
||||
|
||||
This skill is designed for **defensive security investigation** — protecting open-source software from supply chain attacks. It must not be used for:
|
||||
|
||||
- **Harassment or stalking** of contributors or maintainers
|
||||
- **Doxing** — correlating GitHub activity to real identities for malicious purposes
|
||||
- **Competitive intelligence** — investigating proprietary or internal repositories without authorization
|
||||
- **False accusations** — publishing investigation results without validated evidence (see anti-hallucination guardrails)
|
||||
|
||||
Investigations should be conducted with the principle of **minimal intrusion**: collect only the evidence necessary to validate or refute the hypothesis. When publishing results, follow responsible disclosure practices and coordinate with affected maintainers before public disclosure.
|
||||
|
||||
If the investigation reveals a genuine compromise, follow the coordinated vulnerability disclosure process:
|
||||
1. Notify the repository maintainers privately first
|
||||
2. Allow reasonable time for remediation (typically 90 days)
|
||||
3. Coordinate with package registries (npm, PyPI, etc.) if published packages are affected
|
||||
4. File a CVE if appropriate
|
||||
|
||||
---
|
||||
|
||||
## API Rate Limiting
|
||||
|
||||
GitHub REST API enforces rate limits that will interrupt large investigations if not managed.
|
||||
|
||||
**Authenticated requests**: 5,000/hour (requires `GITHUB_TOKEN` env var or `gh` CLI auth)
|
||||
**Unauthenticated requests**: 60/hour (unusable for investigations)
|
||||
|
||||
**Best practices**:
|
||||
- Always authenticate: `export GITHUB_TOKEN=ghp_...` or use `gh` CLI (auto-authenticates)
|
||||
- Use conditional requests (`If-None-Match` / `If-Modified-Since` headers) to avoid consuming quota on unchanged data
|
||||
- For paginated endpoints, fetch all pages in sequence — don't parallelize against the same endpoint
|
||||
- Check `X-RateLimit-Remaining` header; if below 100, pause for `X-RateLimit-Reset` timestamp
|
||||
- BigQuery has its own quotas (10 TiB/day free tier) — always dry-run first
|
||||
- Wayback Machine CDX API: no formal rate limit, but be courteous (1-2 req/sec max)
|
||||
|
||||
If rate-limited mid-investigation, record the partial results in the evidence store and note the limitation in the report.
|
||||
|
||||
---
|
||||
|
||||
## Reference Materials
|
||||
|
||||
- [github-archive-guide.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/github-archive-guide.md) — BigQuery queries, CDX API, 12 event types
|
||||
- [evidence-types.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/evidence-types.md) — IOC taxonomy, evidence source types, observation types
|
||||
- [recovery-techniques.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/recovery-techniques.md) — Recovering deleted commits, PRs, issues
|
||||
- [investigation-templates.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/references/investigation-templates.md) — Pre-built hypothesis templates per attack type
|
||||
- [evidence-store.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/scripts/evidence-store.py) — CLI tool for managing the evidence JSON store
|
||||
- [forensic-report.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/security/oss-forensics/templates/forensic-report.md) — Structured report template
|
||||
|
|
@ -0,0 +1,207 @@
|
|||
---
|
||||
title: "Sherlock — OSINT username search across 400+ social networks"
|
||||
sidebar_label: "Sherlock"
|
||||
description: "OSINT username search across 400+ social networks"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Sherlock
|
||||
|
||||
OSINT username search across 400+ social networks. Hunt down social media accounts by username.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/security/sherlock` |
|
||||
| Path | `optional-skills/security/sherlock` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | unmodeled-tyler |
|
||||
| License | MIT |
|
||||
| Tags | `osint`, `security`, `username`, `social-media`, `reconnaissance` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Sherlock OSINT Username Search
|
||||
|
||||
Hunt down social media accounts by username across 400+ social networks using the [Sherlock Project](https://github.com/sherlock-project/sherlock).
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to find accounts associated with a username
|
||||
- User wants to check username availability across platforms
|
||||
- User is conducting OSINT or reconnaissance research
|
||||
- User asks "where is this username registered?" or similar
|
||||
|
||||
## Requirements
|
||||
|
||||
- Sherlock CLI installed: `pipx install sherlock-project` or `pip install sherlock-project`
|
||||
- Alternatively: Docker available (`docker run -it --rm sherlock/sherlock`)
|
||||
- Network access to query social platforms
|
||||
|
||||
## Procedure
|
||||
|
||||
### 1. Check if Sherlock is Installed
|
||||
|
||||
**Before doing anything else**, verify sherlock is available:
|
||||
|
||||
```bash
|
||||
sherlock --version
|
||||
```
|
||||
|
||||
If the command fails:
|
||||
- Offer to install: `pipx install sherlock-project` (recommended) or `pip install sherlock-project`
|
||||
- **Do NOT** try multiple installation methods — pick one and proceed
|
||||
- If installation fails, inform the user and stop
|
||||
|
||||
### 2. Extract Username
|
||||
|
||||
**Extract the username directly from the user's message if clearly stated.**
|
||||
|
||||
Examples where you should **NOT** use clarify:
|
||||
- "Find accounts for nasa" → username is `nasa`
|
||||
- "Search for johndoe123" → username is `johndoe123`
|
||||
- "Check if alice exists on social media" → username is `alice`
|
||||
- "Look up user bob on social networks" → username is `bob`
|
||||
|
||||
**Only use clarify if:**
|
||||
- Multiple potential usernames mentioned ("search for alice or bob")
|
||||
- Ambiguous phrasing ("search for my username" without specifying)
|
||||
- No username mentioned at all ("do an OSINT search")
|
||||
|
||||
When extracting, take the **exact** username as stated — preserve case, numbers, underscores, etc.
|
||||
|
||||
### 3. Build Command
|
||||
|
||||
**Default command** (use this unless user specifically requests otherwise):
|
||||
```bash
|
||||
sherlock --print-found --no-color "<username>" --timeout 90
|
||||
```
|
||||
|
||||
**Optional flags** (only add if user explicitly requests):
|
||||
- `--nsfw` — Include NSFW sites (only if user asks)
|
||||
- `--tor` — Route through Tor (only if user asks for anonymity)
|
||||
|
||||
**Do NOT ask about options via clarify** — just run the default search. Users can request specific options if needed.
|
||||
|
||||
### 4. Execute Search
|
||||
|
||||
Run via the `terminal` tool. The command typically takes 30-120 seconds depending on network conditions and site count.
|
||||
|
||||
**Example terminal call:**
|
||||
```json
|
||||
{
|
||||
"command": "sherlock --print-found --no-color \"target_username\"",
|
||||
"timeout": 180
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Parse and Present Results
|
||||
|
||||
Sherlock outputs found accounts in a simple format. Parse the output and present:
|
||||
|
||||
1. **Summary line:** "Found X accounts for username 'Y'"
|
||||
2. **Categorized links:** Group by platform type if helpful (social, professional, forums, etc.)
|
||||
3. **Output file location:** Sherlock saves results to `<username>.txt` by default
|
||||
|
||||
**Example output parsing:**
|
||||
```
|
||||
[+] Instagram: https://instagram.com/username
|
||||
[+] Twitter: https://twitter.com/username
|
||||
[+] GitHub: https://github.com/username
|
||||
```
|
||||
|
||||
Present findings as clickable links when possible.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
### No Results Found
|
||||
If Sherlock finds no accounts, this is often correct — the username may not be registered on checked platforms. Suggest:
|
||||
- Checking spelling/variation
|
||||
- Trying similar usernames with `?` wildcard: `sherlock "user?name"`
|
||||
- The user may have privacy settings or deleted accounts
|
||||
|
||||
### Timeout Issues
|
||||
Some sites are slow or block automated requests. Use `--timeout 120` to increase wait time, or `--site` to limit scope.
|
||||
|
||||
### Tor Configuration
|
||||
`--tor` requires Tor daemon running. If user wants anonymity but Tor isn't available, suggest:
|
||||
- Installing Tor service
|
||||
- Using `--proxy` with an alternative proxy
|
||||
|
||||
### False Positives
|
||||
Some sites always return "found" due to their response structure. Cross-reference unexpected results with manual checks.
|
||||
|
||||
### Rate Limiting
|
||||
Aggressive searches may trigger rate limits. For bulk username searches, add delays between calls or use `--local` with cached data.
|
||||
|
||||
## Installation
|
||||
|
||||
### pipx (recommended)
|
||||
```bash
|
||||
pipx install sherlock-project
|
||||
```
|
||||
|
||||
### pip
|
||||
```bash
|
||||
pip install sherlock-project
|
||||
```
|
||||
|
||||
### Docker
|
||||
```bash
|
||||
docker pull sherlock/sherlock
|
||||
docker run -it --rm sherlock/sherlock <username>
|
||||
```
|
||||
|
||||
### Linux packages
|
||||
Available on Debian 13+, Ubuntu 22.10+, Homebrew, Kali, BlackArch.
|
||||
|
||||
## Ethical Use
|
||||
|
||||
This tool is for legitimate OSINT and research purposes only. Remind users:
|
||||
- Only search usernames they own or have permission to investigate
|
||||
- Respect platform terms of service
|
||||
- Do not use for harassment, stalking, or illegal activities
|
||||
- Consider privacy implications before sharing results
|
||||
|
||||
## Verification
|
||||
|
||||
After running sherlock, verify:
|
||||
1. Output lists found sites with URLs
|
||||
2. `<username>.txt` file created (default output) if using file output
|
||||
3. If `--print-found` used, output should only contain `[+]` lines for matches
|
||||
|
||||
## Example Interaction
|
||||
|
||||
**User:** "Can you check if the username 'johndoe123' exists on social media?"
|
||||
|
||||
**Agent procedure:**
|
||||
1. Check `sherlock --version` (verify installed)
|
||||
2. Username provided — proceed directly
|
||||
3. Run: `sherlock --print-found --no-color "johndoe123" --timeout 90`
|
||||
4. Parse output and present links
|
||||
|
||||
**Response format:**
|
||||
> Found 12 accounts for username 'johndoe123':
|
||||
>
|
||||
> • https://twitter.com/johndoe123
|
||||
> • https://github.com/johndoe123
|
||||
> • https://instagram.com/johndoe123
|
||||
> • [... additional links]
|
||||
>
|
||||
> Results saved to: johndoe123.txt
|
||||
|
||||
---
|
||||
|
||||
**User:** "Search for username 'alice' including NSFW sites"
|
||||
|
||||
**Agent procedure:**
|
||||
1. Check sherlock installed
|
||||
2. Username + NSFW flag both provided
|
||||
3. Run: `sherlock --print-found --no-color --nsfw "alice" --timeout 90`
|
||||
4. Present results
|
||||
|
|
@ -0,0 +1,206 @@
|
|||
---
|
||||
title: "Page Agent"
|
||||
sidebar_label: "Page Agent"
|
||||
description: "Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-..."
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Page Agent
|
||||
|
||||
Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language ("click login, fill username as John"). No Python, no headless browser, no extension required. Use this skill when the user is a web developer who wants to add an AI copilot to their SaaS / admin panel / B2B tool, make a legacy web app accessible via natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. NOT for server-side browser automation — point those users to Hermes' built-in browser tool instead.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/web-development/page-agent` |
|
||||
| Path | `optional-skills/web-development/page-agent` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Hermes Agent |
|
||||
| License | MIT |
|
||||
| Tags | `web`, `javascript`, `agent`, `browser`, `gui`, `alibaba`, `embed`, `copilot`, `saas` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# page-agent
|
||||
|
||||
alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.
|
||||
|
||||
## When to use this skill
|
||||
|
||||
Load this skill when a user wants to:
|
||||
|
||||
- **Ship an AI copilot inside their own web app** (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
|
||||
- **Modernize a legacy web app** without rewriting the frontend — page-agent drops on top of existing DOM
|
||||
- **Add accessibility via natural language** — voice / screen-reader users drive the UI by describing what they want
|
||||
- **Demo or evaluate page-agent** against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
|
||||
- **Build interactive training / product demos** — let an AI walk a user through "how to submit an expense report" live in the real UI
|
||||
|
||||
## When NOT to use this skill
|
||||
|
||||
- User wants **Hermes itself to drive a browser** → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the *opposite* direction.
|
||||
- User wants **cross-tab automation without embedding** → use Playwright, browser-use, or the page-agent Chrome extension
|
||||
- User needs **visual grounding / screenshots** → page-agent is text-DOM only; use a multimodal browser agent instead
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
|
||||
- An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking `/v1/chat/completions`
|
||||
- Browser with devtools (for debugging)
|
||||
|
||||
## Path 1 — 30-second demo via CDN (no install)
|
||||
|
||||
Fastest way to see it work. Uses alibaba's free testing LLM proxy — **for evaluation only**, subject to their terms.
|
||||
|
||||
Add to any HTML page (or paste into the devtools console as a bookmarklet):
|
||||
|
||||
```html
|
||||
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
|
||||
```
|
||||
|
||||
A panel appears. Type an instruction. Done.
|
||||
|
||||
Bookmarklet form (drop into bookmarks bar, click on any page):
|
||||
|
||||
```javascript
|
||||
javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
|
||||
```
|
||||
|
||||
## Path 2 — npm install into your own web app (production use)
|
||||
|
||||
Inside an existing web project (React / Vue / Svelte / plain):
|
||||
|
||||
```bash
|
||||
npm install page-agent
|
||||
```
|
||||
|
||||
Wire it up with your own LLM endpoint — **never ship the demo CDN to real users**:
|
||||
|
||||
```javascript
|
||||
import { PageAgent } from 'page-agent'
|
||||
|
||||
const agent = new PageAgent({
|
||||
model: 'qwen3.5-plus',
|
||||
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
|
||||
apiKey: process.env.LLM_API_KEY, // never hardcode
|
||||
language: 'en-US',
|
||||
})
|
||||
|
||||
// Show the panel for end users:
|
||||
agent.panel.show()
|
||||
|
||||
// Or drive it programmatically:
|
||||
await agent.execute('Click submit button, then fill username as John')
|
||||
```
|
||||
|
||||
Provider examples (any OpenAI-compatible endpoint works):
|
||||
|
||||
| Provider | `baseURL` | `model` |
|
||||
|----------|-----------|---------|
|
||||
| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
|
||||
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
|
||||
| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
|
||||
| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
|
||||
|
||||
**Key config fields** (passed to `new PageAgent({...})`):
|
||||
|
||||
- `model`, `baseURL`, `apiKey` — LLM connection
|
||||
- `language` — UI language (`en-US`, `zh-CN`, etc.)
|
||||
- Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list
|
||||
|
||||
**Security.** Don't put your `apiKey` in client-side code for a real deployment — proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN exists because alibaba runs that proxy for evaluation.
|
||||
|
||||
## Path 3 — clone the source repo (contributing, or hacking on it)
|
||||
|
||||
Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
|
||||
|
||||
```bash
|
||||
git clone https://github.com/alibaba/page-agent.git
|
||||
cd page-agent
|
||||
npm ci # exact lockfile install (or `npm i` to allow updates)
|
||||
```
|
||||
|
||||
Create `.env` in the repo root with an LLM endpoint. Example:
|
||||
|
||||
```
|
||||
LLM_MODEL_NAME=gpt-4o-mini
|
||||
LLM_API_KEY=sk-...
|
||||
LLM_BASE_URL=https://api.openai.com/v1
|
||||
```
|
||||
|
||||
Ollama flavor:
|
||||
|
||||
```
|
||||
LLM_BASE_URL=http://localhost:11434/v1
|
||||
LLM_API_KEY=NA
|
||||
LLM_MODEL_NAME=qwen3:14b
|
||||
```
|
||||
|
||||
Common commands:
|
||||
|
||||
```bash
|
||||
npm start # docs/website dev server
|
||||
npm run build # build every package
|
||||
npm run dev:demo # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
|
||||
npm run dev:ext # develop the browser extension (WXT + React)
|
||||
npm run build:ext # build the extension
|
||||
```
|
||||
|
||||
**Test on any website** using the local IIFE bundle. Add this bookmarklet:
|
||||
|
||||
```javascript
|
||||
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
|
||||
```
|
||||
|
||||
Then: `npm run dev:demo`, click the bookmarklet on any page, and the local build injects. Auto-rebuilds on save.
|
||||
|
||||
**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
|
||||
|
||||
## Repo layout (Path 3)
|
||||
|
||||
Monorepo with npm workspaces. Key packages:
|
||||
|
||||
| Package | Path | Purpose |
|
||||
|---------|------|---------|
|
||||
| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
|
||||
| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
|
||||
| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
|
||||
| — | `packages/llms/` | LLM client |
|
||||
| — | `packages/page-controller/` | DOM ops + visual feedback |
|
||||
| — | `packages/ui/` | Panel + i18n |
|
||||
| — | `packages/extension/` | Chrome/Firefox extension |
|
||||
| — | `packages/website/` | Docs + landing site |
|
||||
|
||||
## Verifying it works
|
||||
|
||||
After Path 1 or Path 2:
|
||||
1. Open the page in a browser with devtools open
|
||||
2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
|
||||
3. Type a simple instruction matching something visible on the page ("click the Login link")
|
||||
4. Watch the Network tab — you should see a request to your `baseURL`
|
||||
|
||||
After Path 3:
|
||||
1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
|
||||
2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
|
||||
3. Click the bookmarklet on any site; panel appears
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **Demo CDN in production** — don't. It's rate-limited, uses alibaba's free proxy, and their terms forbid production use.
|
||||
- **API key exposure** — any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
|
||||
- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
|
||||
- **CSP blocks** — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
|
||||
- **Restart dev server** after editing `.env` in Path 3 — Vite only reads env at startup.
|
||||
- **Node version** — the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
|
||||
- **npm 10 vs 11** — docs say npm 11+; npm 10.9 actually works fine.
|
||||
|
||||
## Reference
|
||||
|
||||
- Repo: https://github.com/alibaba/page-agent
|
||||
- Docs: https://alibaba.github.io/page-agent/
|
||||
- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
|
||||
Loading…
Add table
Add a link
Reference in a new issue