mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-24 05:41:40 +00:00
chore(skills): move heavy training skills + outlines to optional-skills (#22912)
These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't be active by default. Moved to optional-skills/ where users opt-in via `hermes skills install official/...`. Moved: - mlops/training/axolotl - mlops/training/trl-fine-tuning - mlops/training/unsloth - mlops/inference/outlines Counts: 91 -> 87 built-in, 72 -> 76 optional. Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
This commit is contained in:
parent
4375b82cd9
commit
ded194eb6a
27 changed files with 18 additions and 18 deletions
|
|
@ -0,0 +1,671 @@
|
|||
---
|
||||
title: "Outlines — Outlines: structured JSON/regex/Pydantic LLM generation"
|
||||
sidebar_label: "Outlines"
|
||||
description: "Outlines: structured JSON/regex/Pydantic LLM generation"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Outlines
|
||||
|
||||
Outlines: structured JSON/regex/Pydantic LLM generation.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/outlines` |
|
||||
| Path | `optional-skills/mlops/inference/outlines` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `outlines`, `transformers`, `vllm`, `pydantic` |
|
||||
| Platforms | linux, macos, windows |
|
||||
| Tags | `Prompt Engineering`, `Outlines`, `Structured Generation`, `JSON Schema`, `Pydantic`, `Local Models`, `Grammar-Based Generation`, `vLLM`, `Transformers`, `Type Safety` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Outlines: Structured Text Generation
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use Outlines when you need to:
|
||||
- **Guarantee valid JSON/XML/code** structure during generation
|
||||
- **Use Pydantic models** for type-safe outputs
|
||||
- **Support local models** (Transformers, llama.cpp, vLLM)
|
||||
- **Maximize inference speed** with zero-overhead structured generation
|
||||
- **Generate against JSON schemas** automatically
|
||||
- **Control token sampling** at the grammar level
|
||||
|
||||
**GitHub Stars**: 8,000+ | **From**: dottxt.ai (formerly .txt)
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Base installation
|
||||
pip install outlines
|
||||
|
||||
# With specific backends
|
||||
pip install outlines transformers # Hugging Face models
|
||||
pip install outlines llama-cpp-python # llama.cpp
|
||||
pip install outlines vllm # vLLM for high-throughput
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Basic Example: Classification
|
||||
|
||||
```python
|
||||
import outlines
|
||||
from typing import Literal
|
||||
|
||||
# Load model
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
|
||||
# Generate with type constraint
|
||||
prompt = "Sentiment of 'This product is amazing!': "
|
||||
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
|
||||
sentiment = generator(prompt)
|
||||
|
||||
print(sentiment) # "positive" (guaranteed one of these)
|
||||
```
|
||||
|
||||
### With Pydantic Models
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
import outlines
|
||||
|
||||
class User(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
email: str
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
|
||||
# Generate structured output
|
||||
prompt = "Extract user: John Doe, 30 years old, john@example.com"
|
||||
generator = outlines.generate.json(model, User)
|
||||
user = generator(prompt)
|
||||
|
||||
print(user.name) # "John Doe"
|
||||
print(user.age) # 30
|
||||
print(user.email) # "john@example.com"
|
||||
```
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### 1. Constrained Token Sampling
|
||||
|
||||
Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.
|
||||
|
||||
**How it works:**
|
||||
1. Convert schema (JSON/Pydantic/regex) to context-free grammar (CFG)
|
||||
2. Transform CFG into Finite State Machine (FSM)
|
||||
3. Filter invalid tokens at each step during generation
|
||||
4. Fast-forward when only one valid token exists
|
||||
|
||||
**Benefits:**
|
||||
- **Zero overhead**: Filtering happens at token level
|
||||
- **Speed improvement**: Fast-forward through deterministic paths
|
||||
- **Guaranteed validity**: Invalid outputs impossible
|
||||
|
||||
```python
|
||||
import outlines
|
||||
|
||||
# Pydantic model -> JSON schema -> CFG -> FSM
|
||||
class Person(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
|
||||
# Behind the scenes:
|
||||
# 1. Person -> JSON schema
|
||||
# 2. JSON schema -> CFG
|
||||
# 3. CFG -> FSM
|
||||
# 4. FSM filters tokens during generation
|
||||
|
||||
generator = outlines.generate.json(model, Person)
|
||||
result = generator("Generate person: Alice, 25")
|
||||
```
|
||||
|
||||
### 2. Structured Generators
|
||||
|
||||
Outlines provides specialized generators for different output types.
|
||||
|
||||
#### Choice Generator
|
||||
|
||||
```python
|
||||
# Multiple choice selection
|
||||
generator = outlines.generate.choice(
|
||||
model,
|
||||
["positive", "negative", "neutral"]
|
||||
)
|
||||
|
||||
sentiment = generator("Review: This is great!")
|
||||
# Result: One of the three choices
|
||||
```
|
||||
|
||||
#### JSON Generator
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
|
||||
class Product(BaseModel):
|
||||
name: str
|
||||
price: float
|
||||
in_stock: bool
|
||||
|
||||
# Generate valid JSON matching schema
|
||||
generator = outlines.generate.json(model, Product)
|
||||
product = generator("Extract: iPhone 15, $999, available")
|
||||
|
||||
# Guaranteed valid Product instance
|
||||
print(type(product)) # <class '__main__.Product'>
|
||||
```
|
||||
|
||||
#### Regex Generator
|
||||
|
||||
```python
|
||||
# Generate text matching regex
|
||||
generator = outlines.generate.regex(
|
||||
model,
|
||||
r"[0-9]{3}-[0-9]{3}-[0-9]{4}" # Phone number pattern
|
||||
)
|
||||
|
||||
phone = generator("Generate phone number:")
|
||||
# Result: "555-123-4567" (guaranteed to match pattern)
|
||||
```
|
||||
|
||||
#### Integer/Float Generators
|
||||
|
||||
```python
|
||||
# Generate specific numeric types
|
||||
int_generator = outlines.generate.integer(model)
|
||||
age = int_generator("Person's age:") # Guaranteed integer
|
||||
|
||||
float_generator = outlines.generate.float(model)
|
||||
price = float_generator("Product price:") # Guaranteed float
|
||||
```
|
||||
|
||||
### 3. Model Backends
|
||||
|
||||
Outlines supports multiple local and API-based backends.
|
||||
|
||||
#### Transformers (Hugging Face)
|
||||
|
||||
```python
|
||||
import outlines
|
||||
|
||||
# Load from Hugging Face
|
||||
model = outlines.models.transformers(
|
||||
"microsoft/Phi-3-mini-4k-instruct",
|
||||
device="cuda" # Or "cpu"
|
||||
)
|
||||
|
||||
# Use with any generator
|
||||
generator = outlines.generate.json(model, YourModel)
|
||||
```
|
||||
|
||||
#### llama.cpp
|
||||
|
||||
```python
|
||||
# Load GGUF model
|
||||
model = outlines.models.llamacpp(
|
||||
"./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
|
||||
n_gpu_layers=35
|
||||
)
|
||||
|
||||
generator = outlines.generate.json(model, YourModel)
|
||||
```
|
||||
|
||||
#### vLLM (High Throughput)
|
||||
|
||||
```python
|
||||
# For production deployments
|
||||
model = outlines.models.vllm(
|
||||
"meta-llama/Llama-3.1-8B-Instruct",
|
||||
tensor_parallel_size=2 # Multi-GPU
|
||||
)
|
||||
|
||||
generator = outlines.generate.json(model, YourModel)
|
||||
```
|
||||
|
||||
#### OpenAI (Limited Support)
|
||||
|
||||
```python
|
||||
# Basic OpenAI support
|
||||
model = outlines.models.openai(
|
||||
"gpt-4o-mini",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
|
||||
# Note: Some features limited with API models
|
||||
generator = outlines.generate.json(model, YourModel)
|
||||
```
|
||||
|
||||
### 4. Pydantic Integration
|
||||
|
||||
Outlines has first-class Pydantic support with automatic schema translation.
|
||||
|
||||
#### Basic Models
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class Article(BaseModel):
|
||||
title: str = Field(description="Article title")
|
||||
author: str = Field(description="Author name")
|
||||
word_count: int = Field(description="Number of words", gt=0)
|
||||
tags: list[str] = Field(description="List of tags")
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, Article)
|
||||
|
||||
article = generator("Generate article about AI")
|
||||
print(article.title)
|
||||
print(article.word_count) # Guaranteed > 0
|
||||
```
|
||||
|
||||
#### Nested Models
|
||||
|
||||
```python
|
||||
class Address(BaseModel):
|
||||
street: str
|
||||
city: str
|
||||
country: str
|
||||
|
||||
class Person(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
address: Address # Nested model
|
||||
|
||||
generator = outlines.generate.json(model, Person)
|
||||
person = generator("Generate person in New York")
|
||||
|
||||
print(person.address.city) # "New York"
|
||||
```
|
||||
|
||||
#### Enums and Literals
|
||||
|
||||
```python
|
||||
from enum import Enum
|
||||
from typing import Literal
|
||||
|
||||
class Status(str, Enum):
|
||||
PENDING = "pending"
|
||||
APPROVED = "approved"
|
||||
REJECTED = "rejected"
|
||||
|
||||
class Application(BaseModel):
|
||||
applicant: str
|
||||
status: Status # Must be one of enum values
|
||||
priority: Literal["low", "medium", "high"] # Must be one of literals
|
||||
|
||||
generator = outlines.generate.json(model, Application)
|
||||
app = generator("Generate application")
|
||||
|
||||
print(app.status) # Status.PENDING (or APPROVED/REJECTED)
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: Data Extraction
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
import outlines
|
||||
|
||||
class CompanyInfo(BaseModel):
|
||||
name: str
|
||||
founded_year: int
|
||||
industry: str
|
||||
employees: int
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, CompanyInfo)
|
||||
|
||||
text = """
|
||||
Apple Inc. was founded in 1976 in the technology industry.
|
||||
The company employs approximately 164,000 people worldwide.
|
||||
"""
|
||||
|
||||
prompt = f"Extract company information:\n{text}\n\nCompany:"
|
||||
company = generator(prompt)
|
||||
|
||||
print(f"Name: {company.name}")
|
||||
print(f"Founded: {company.founded_year}")
|
||||
print(f"Industry: {company.industry}")
|
||||
print(f"Employees: {company.employees}")
|
||||
```
|
||||
|
||||
### Pattern 2: Classification
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
import outlines
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
|
||||
# Binary classification
|
||||
generator = outlines.generate.choice(model, ["spam", "not_spam"])
|
||||
result = generator("Email: Buy now! 50% off!")
|
||||
|
||||
# Multi-class classification
|
||||
categories = ["technology", "business", "sports", "entertainment"]
|
||||
category_gen = outlines.generate.choice(model, categories)
|
||||
category = category_gen("Article: Apple announces new iPhone...")
|
||||
|
||||
# With confidence
|
||||
class Classification(BaseModel):
|
||||
label: Literal["positive", "negative", "neutral"]
|
||||
confidence: float
|
||||
|
||||
classifier = outlines.generate.json(model, Classification)
|
||||
result = classifier("Review: This product is okay, nothing special")
|
||||
```
|
||||
|
||||
### Pattern 3: Structured Forms
|
||||
|
||||
```python
|
||||
class UserProfile(BaseModel):
|
||||
full_name: str
|
||||
age: int
|
||||
email: str
|
||||
phone: str
|
||||
country: str
|
||||
interests: list[str]
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, UserProfile)
|
||||
|
||||
prompt = """
|
||||
Extract user profile from:
|
||||
Name: Alice Johnson
|
||||
Age: 28
|
||||
Email: alice@example.com
|
||||
Phone: 555-0123
|
||||
Country: USA
|
||||
Interests: hiking, photography, cooking
|
||||
"""
|
||||
|
||||
profile = generator(prompt)
|
||||
print(profile.full_name)
|
||||
print(profile.interests) # ["hiking", "photography", "cooking"]
|
||||
```
|
||||
|
||||
### Pattern 4: Multi-Entity Extraction
|
||||
|
||||
```python
|
||||
class Entity(BaseModel):
|
||||
name: str
|
||||
type: Literal["PERSON", "ORGANIZATION", "LOCATION"]
|
||||
|
||||
class DocumentEntities(BaseModel):
|
||||
entities: list[Entity]
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, DocumentEntities)
|
||||
|
||||
text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
|
||||
prompt = f"Extract entities from: {text}"
|
||||
|
||||
result = generator(prompt)
|
||||
for entity in result.entities:
|
||||
print(f"{entity.name} ({entity.type})")
|
||||
```
|
||||
|
||||
### Pattern 5: Code Generation
|
||||
|
||||
```python
|
||||
class PythonFunction(BaseModel):
|
||||
function_name: str
|
||||
parameters: list[str]
|
||||
docstring: str
|
||||
body: str
|
||||
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, PythonFunction)
|
||||
|
||||
prompt = "Generate a Python function to calculate factorial"
|
||||
func = generator(prompt)
|
||||
|
||||
print(f"def {func.function_name}({', '.join(func.parameters)}):")
|
||||
print(f' """{func.docstring}"""')
|
||||
print(f" {func.body}")
|
||||
```
|
||||
|
||||
### Pattern 6: Batch Processing
|
||||
|
||||
```python
|
||||
def batch_extract(texts: list[str], schema: type[BaseModel]):
|
||||
"""Extract structured data from multiple texts."""
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
generator = outlines.generate.json(model, schema)
|
||||
|
||||
results = []
|
||||
for text in texts:
|
||||
result = generator(f"Extract from: {text}")
|
||||
results.append(result)
|
||||
|
||||
return results
|
||||
|
||||
class Person(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
|
||||
texts = [
|
||||
"John is 30 years old",
|
||||
"Alice is 25 years old",
|
||||
"Bob is 40 years old"
|
||||
]
|
||||
|
||||
people = batch_extract(texts, Person)
|
||||
for person in people:
|
||||
print(f"{person.name}: {person.age}")
|
||||
```
|
||||
|
||||
## Backend Configuration
|
||||
|
||||
### Transformers
|
||||
|
||||
```python
|
||||
import outlines
|
||||
|
||||
# Basic usage
|
||||
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
|
||||
|
||||
# GPU configuration
|
||||
model = outlines.models.transformers(
|
||||
"microsoft/Phi-3-mini-4k-instruct",
|
||||
device="cuda",
|
||||
model_kwargs={"torch_dtype": "float16"}
|
||||
)
|
||||
|
||||
# Popular models
|
||||
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
|
||||
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
|
||||
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
|
||||
```
|
||||
|
||||
### llama.cpp
|
||||
|
||||
```python
|
||||
# Load GGUF model
|
||||
model = outlines.models.llamacpp(
|
||||
"./models/llama-3.1-8b.Q4_K_M.gguf",
|
||||
n_ctx=4096, # Context window
|
||||
n_gpu_layers=35, # GPU layers
|
||||
n_threads=8 # CPU threads
|
||||
)
|
||||
|
||||
# Full GPU offload
|
||||
model = outlines.models.llamacpp(
|
||||
"./models/model.gguf",
|
||||
n_gpu_layers=-1 # All layers on GPU
|
||||
)
|
||||
```
|
||||
|
||||
### vLLM (Production)
|
||||
|
||||
```python
|
||||
# Single GPU
|
||||
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")
|
||||
|
||||
# Multi-GPU
|
||||
model = outlines.models.vllm(
|
||||
"meta-llama/Llama-3.1-70B-Instruct",
|
||||
tensor_parallel_size=4 # 4 GPUs
|
||||
)
|
||||
|
||||
# With quantization
|
||||
model = outlines.models.vllm(
|
||||
"meta-llama/Llama-3.1-8B-Instruct",
|
||||
quantization="awq" # Or "gptq"
|
||||
)
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Use Specific Types
|
||||
|
||||
```python
|
||||
# ✅ Good: Specific types
|
||||
class Product(BaseModel):
|
||||
name: str
|
||||
price: float # Not str
|
||||
quantity: int # Not str
|
||||
in_stock: bool # Not str
|
||||
|
||||
# ❌ Bad: Everything as string
|
||||
class Product(BaseModel):
|
||||
name: str
|
||||
price: str # Should be float
|
||||
quantity: str # Should be int
|
||||
```
|
||||
|
||||
### 2. Add Constraints
|
||||
|
||||
```python
|
||||
from pydantic import Field
|
||||
|
||||
# ✅ Good: With constraints
|
||||
class User(BaseModel):
|
||||
name: str = Field(min_length=1, max_length=100)
|
||||
age: int = Field(ge=0, le=120)
|
||||
email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
|
||||
|
||||
# ❌ Bad: No constraints
|
||||
class User(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
email: str
|
||||
```
|
||||
|
||||
### 3. Use Enums for Categories
|
||||
|
||||
```python
|
||||
# ✅ Good: Enum for fixed set
|
||||
class Priority(str, Enum):
|
||||
LOW = "low"
|
||||
MEDIUM = "medium"
|
||||
HIGH = "high"
|
||||
|
||||
class Task(BaseModel):
|
||||
title: str
|
||||
priority: Priority
|
||||
|
||||
# ❌ Bad: Free-form string
|
||||
class Task(BaseModel):
|
||||
title: str
|
||||
priority: str # Can be anything
|
||||
```
|
||||
|
||||
### 4. Provide Context in Prompts
|
||||
|
||||
```python
|
||||
# ✅ Good: Clear context
|
||||
prompt = """
|
||||
Extract product information from the following text.
|
||||
Text: iPhone 15 Pro costs $999 and is currently in stock.
|
||||
Product:
|
||||
"""
|
||||
|
||||
# ❌ Bad: Minimal context
|
||||
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
|
||||
```
|
||||
|
||||
### 5. Handle Optional Fields
|
||||
|
||||
```python
|
||||
from typing import Optional
|
||||
|
||||
# ✅ Good: Optional fields for incomplete data
|
||||
class Article(BaseModel):
|
||||
title: str # Required
|
||||
author: Optional[str] = None # Optional
|
||||
date: Optional[str] = None # Optional
|
||||
tags: list[str] = [] # Default empty list
|
||||
|
||||
# Can succeed even if author/date missing
|
||||
```
|
||||
|
||||
## Comparison to Alternatives
|
||||
|
||||
| Feature | Outlines | Instructor | Guidance | LMQL |
|
||||
|---------|----------|------------|----------|------|
|
||||
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
|
||||
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
|
||||
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
|
||||
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
|
||||
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
|
||||
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
|
||||
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
|
||||
| Learning Curve | Low | Low | Low | High |
|
||||
|
||||
**When to choose Outlines:**
|
||||
- Using local models (Transformers, llama.cpp, vLLM)
|
||||
- Need maximum inference speed
|
||||
- Want Pydantic model support
|
||||
- Require zero-overhead structured generation
|
||||
- Control token sampling process
|
||||
|
||||
**When to choose alternatives:**
|
||||
- Instructor: Need API models with automatic retrying
|
||||
- Guidance: Need token healing and complex workflows
|
||||
- LMQL: Prefer declarative query syntax
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
**Speed:**
|
||||
- **Zero overhead**: Structured generation as fast as unconstrained
|
||||
- **Fast-forward optimization**: Skips deterministic tokens
|
||||
- **1.2-2x faster** than post-generation validation approaches
|
||||
|
||||
**Memory:**
|
||||
- FSM compiled once per schema (cached)
|
||||
- Minimal runtime overhead
|
||||
- Efficient with vLLM for high throughput
|
||||
|
||||
**Accuracy:**
|
||||
- **100% valid outputs** (guaranteed by FSM)
|
||||
- No retry loops needed
|
||||
- Deterministic token filtering
|
||||
|
||||
## Resources
|
||||
|
||||
- **Documentation**: https://outlines-dev.github.io/outlines
|
||||
- **GitHub**: https://github.com/outlines-dev/outlines (8k+ stars)
|
||||
- **Discord**: https://discord.gg/R9DSu34mGd
|
||||
- **Blog**: https://blog.dottxt.co
|
||||
|
||||
## See Also
|
||||
|
||||
- `references/json_generation.md` - Comprehensive JSON and Pydantic patterns
|
||||
- `references/backends.md` - Backend-specific configuration
|
||||
- `references/examples.md` - Production-ready examples
|
||||
|
|
@ -0,0 +1,181 @@
|
|||
---
|
||||
title: "Axolotl — Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO)"
|
||||
sidebar_label: "Axolotl"
|
||||
description: "Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO)"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Axolotl
|
||||
|
||||
Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO).
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/axolotl` |
|
||||
| Path | `optional-skills/mlops/training/axolotl` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `axolotl`, `torch`, `transformers`, `datasets`, `peft`, `accelerate`, `deepspeed` |
|
||||
| Platforms | linux, macos |
|
||||
| Tags | `Fine-Tuning`, `Axolotl`, `LLM`, `LoRA`, `QLoRA`, `DPO`, `KTO`, `ORPO`, `GRPO`, `YAML`, `HuggingFace`, `DeepSpeed`, `Multimodal` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Axolotl Skill
|
||||
|
||||
## What's inside
|
||||
|
||||
Expert guidance for fine-tuning LLMs with Axolotl — YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support.
|
||||
|
||||
Comprehensive assistance with axolotl development, generated from official documentation.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
This skill should be triggered when:
|
||||
- Working with axolotl
|
||||
- Asking about axolotl features or APIs
|
||||
- Implementing axolotl solutions
|
||||
- Debugging axolotl code
|
||||
- Learning axolotl best practices
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Common Patterns
|
||||
|
||||
**Pattern 1:** To validate that acceptable data transfer speeds exist for your training job, running NCCL Tests can help pinpoint bottlenecks, for example:
|
||||
|
||||
```
|
||||
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
|
||||
```
|
||||
|
||||
**Pattern 2:** Configure your model to use FSDP in the Axolotl yaml. For example:
|
||||
|
||||
```
|
||||
fsdp_version: 2
|
||||
fsdp_config:
|
||||
offload_params: true
|
||||
state_dict_type: FULL_STATE_DICT
|
||||
auto_wrap_policy: TRANSFORMER_BASED_WRAP
|
||||
transformer_layer_cls_to_wrap: LlamaDecoderLayer
|
||||
reshard_after_forward: true
|
||||
```
|
||||
|
||||
**Pattern 3:** The context_parallel_size should be a divisor of the total number of GPUs. For example:
|
||||
|
||||
```
|
||||
context_parallel_size
|
||||
```
|
||||
|
||||
**Pattern 4:** For example: - With 8 GPUs and no sequence parallelism: 8 different batches processed per step - With 8 GPUs and context_parallel_size=4: Only 2 different batches processed per step (each split across 4 GPUs) - If your per-GPU micro_batch_size is 2, the global batch size decreases from 16 to 4
|
||||
|
||||
```
|
||||
context_parallel_size=4
|
||||
```
|
||||
|
||||
**Pattern 5:** Setting save_compressed: true in your configuration enables saving models in a compressed format, which: - Reduces disk space usage by approximately 40% - Maintains compatibility with vLLM for accelerated inference - Maintains compatibility with llmcompressor for further optimization (example: quantization)
|
||||
|
||||
```
|
||||
save_compressed: true
|
||||
```
|
||||
|
||||
**Pattern 6:** Note It is not necessary to place your integration in the integrations folder. It can be in any location, so long as it’s installed in a package in your python env. See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer
|
||||
|
||||
```
|
||||
integrations
|
||||
```
|
||||
|
||||
**Pattern 7:** Handle both single-example and batched data. - single example: sample[‘input_ids’] is a list[int] - batched data: sample[‘input_ids’] is a list[list[int]]
|
||||
|
||||
```
|
||||
utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
|
||||
```
|
||||
|
||||
### Example Code Patterns
|
||||
|
||||
**Example 1** (python):
|
||||
```python
|
||||
cli.cloud.modal_.ModalCloud(config, app=None)
|
||||
```
|
||||
|
||||
**Example 2** (python):
|
||||
```python
|
||||
cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)
|
||||
```
|
||||
|
||||
**Example 3** (python):
|
||||
```python
|
||||
core.trainers.base.AxolotlTrainer(
|
||||
*_args,
|
||||
bench_data_collator=None,
|
||||
eval_data_collator=None,
|
||||
dataset_tags=None,
|
||||
**kwargs,
|
||||
)
|
||||
```
|
||||
|
||||
**Example 4** (python):
|
||||
```python
|
||||
core.trainers.base.AxolotlTrainer.log(logs, start_time=None)
|
||||
```
|
||||
|
||||
**Example 5** (python):
|
||||
```python
|
||||
prompt_strategies.input_output.RawInputOutputPrompter()
|
||||
```
|
||||
|
||||
## Reference Files
|
||||
|
||||
This skill includes comprehensive documentation in `references/`:
|
||||
|
||||
- **api.md** - Api documentation
|
||||
- **dataset-formats.md** - Dataset-Formats documentation
|
||||
- **other.md** - Other documentation
|
||||
|
||||
Use `view` to read specific reference files when detailed information is needed.
|
||||
|
||||
## Working with This Skill
|
||||
|
||||
### For Beginners
|
||||
Start with the getting_started or tutorials reference files for foundational concepts.
|
||||
|
||||
### For Specific Features
|
||||
Use the appropriate category reference file (api, guides, etc.) for detailed information.
|
||||
|
||||
### For Code Examples
|
||||
The quick reference section above contains common patterns extracted from the official docs.
|
||||
|
||||
## Resources
|
||||
|
||||
### references/
|
||||
Organized documentation extracted from official sources. These files contain:
|
||||
- Detailed explanations
|
||||
- Code examples with language annotations
|
||||
- Links to original documentation
|
||||
- Table of contents for quick navigation
|
||||
|
||||
### scripts/
|
||||
Add helper scripts here for common automation tasks.
|
||||
|
||||
### assets/
|
||||
Add templates, boilerplate, or example projects here.
|
||||
|
||||
## Notes
|
||||
|
||||
- This skill was automatically generated from official documentation
|
||||
- Reference files preserve the structure and examples from source docs
|
||||
- Code examples include language detection for better syntax highlighting
|
||||
- Quick reference patterns are extracted from common usage examples in the docs
|
||||
|
||||
## Updating
|
||||
|
||||
To refresh this skill with updated documentation:
|
||||
1. Re-run the scraper with the same configuration
|
||||
2. The skill will be rebuilt with the latest information
|
||||
|
|
@ -0,0 +1,477 @@
|
|||
---
|
||||
title: "Fine Tuning With Trl — TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF"
|
||||
sidebar_label: "Fine Tuning With Trl"
|
||||
description: "TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Fine Tuning With Trl
|
||||
|
||||
TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/trl-fine-tuning` |
|
||||
| Path | `optional-skills/mlops/training/trl-fine-tuning` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `trl`, `transformers`, `datasets`, `peft`, `accelerate`, `torch` |
|
||||
| Platforms | linux, macos, windows |
|
||||
| Tags | `Post-Training`, `TRL`, `Reinforcement Learning`, `Fine-Tuning`, `SFT`, `DPO`, `PPO`, `GRPO`, `RLHF`, `Preference Alignment`, `HuggingFace` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# TRL - Transformer Reinforcement Learning
|
||||
|
||||
## Quick start
|
||||
|
||||
TRL provides post-training methods for aligning language models with human preferences.
|
||||
|
||||
**Installation**:
|
||||
```bash
|
||||
pip install trl transformers datasets peft accelerate
|
||||
```
|
||||
|
||||
**Supervised Fine-Tuning** (instruction tuning):
|
||||
```python
|
||||
from trl import SFTTrainer
|
||||
|
||||
trainer = SFTTrainer(
|
||||
model="Qwen/Qwen2.5-0.5B",
|
||||
train_dataset=dataset, # Prompt-completion pairs
|
||||
)
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
**DPO** (align with preferences):
|
||||
```python
|
||||
from trl import DPOTrainer, DPOConfig
|
||||
|
||||
config = DPOConfig(output_dir="model-dpo", beta=0.1)
|
||||
trainer = DPOTrainer(
|
||||
model=model,
|
||||
args=config,
|
||||
train_dataset=preference_dataset, # chosen/rejected pairs
|
||||
processing_class=tokenizer
|
||||
)
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Workflow 1: Full RLHF pipeline (SFT → Reward Model → PPO)
|
||||
|
||||
Complete pipeline from base model to human-aligned model.
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
RLHF Training:
|
||||
- [ ] Step 1: Supervised fine-tuning (SFT)
|
||||
- [ ] Step 2: Train reward model
|
||||
- [ ] Step 3: PPO reinforcement learning
|
||||
- [ ] Step 4: Evaluate aligned model
|
||||
```
|
||||
|
||||
**Step 1: Supervised fine-tuning**
|
||||
|
||||
Train base model on instruction-following data:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
from trl import SFTTrainer, SFTConfig
|
||||
from datasets import load_dataset
|
||||
|
||||
# Load model
|
||||
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
|
||||
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
|
||||
|
||||
# Load instruction dataset
|
||||
dataset = load_dataset("trl-lib/Capybara", split="train")
|
||||
|
||||
# Configure training
|
||||
training_args = SFTConfig(
|
||||
output_dir="Qwen2.5-0.5B-SFT",
|
||||
per_device_train_batch_size=4,
|
||||
num_train_epochs=1,
|
||||
learning_rate=2e-5,
|
||||
logging_steps=10,
|
||||
save_strategy="epoch"
|
||||
)
|
||||
|
||||
# Train
|
||||
trainer = SFTTrainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
train_dataset=dataset,
|
||||
tokenizer=tokenizer
|
||||
)
|
||||
trainer.train()
|
||||
trainer.save_model()
|
||||
```
|
||||
|
||||
**Step 2: Train reward model**
|
||||
|
||||
Train model to predict human preferences:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForSequenceClassification
|
||||
from trl import RewardTrainer, RewardConfig
|
||||
|
||||
# Load SFT model as base
|
||||
model = AutoModelForSequenceClassification.from_pretrained(
|
||||
"Qwen2.5-0.5B-SFT",
|
||||
num_labels=1 # Single reward score
|
||||
)
|
||||
tokenizer = AutoTokenizer.from_pretrained("Qwen2.5-0.5B-SFT")
|
||||
|
||||
# Load preference data (chosen/rejected pairs)
|
||||
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
|
||||
|
||||
# Configure training
|
||||
training_args = RewardConfig(
|
||||
output_dir="Qwen2.5-0.5B-Reward",
|
||||
per_device_train_batch_size=2,
|
||||
num_train_epochs=1,
|
||||
learning_rate=1e-5
|
||||
)
|
||||
|
||||
# Train reward model
|
||||
trainer = RewardTrainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
processing_class=tokenizer,
|
||||
train_dataset=dataset
|
||||
)
|
||||
trainer.train()
|
||||
trainer.save_model()
|
||||
```
|
||||
|
||||
**Step 3: PPO reinforcement learning**
|
||||
|
||||
Optimize policy using reward model:
|
||||
|
||||
```bash
|
||||
python -m trl.scripts.ppo \
|
||||
--model_name_or_path Qwen2.5-0.5B-SFT \
|
||||
--reward_model_path Qwen2.5-0.5B-Reward \
|
||||
--dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
|
||||
--output_dir Qwen2.5-0.5B-PPO \
|
||||
--learning_rate 3e-6 \
|
||||
--per_device_train_batch_size 64 \
|
||||
--total_episodes 10000
|
||||
```
|
||||
|
||||
**Step 4: Evaluate**
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
|
||||
# Load aligned model
|
||||
generator = pipeline("text-generation", model="Qwen2.5-0.5B-PPO")
|
||||
|
||||
# Test
|
||||
prompt = "Explain quantum computing to a 10-year-old"
|
||||
output = generator(prompt, max_length=200)[0]["generated_text"]
|
||||
print(output)
|
||||
```
|
||||
|
||||
### Workflow 2: Simple preference alignment with DPO
|
||||
|
||||
Align model with preferences without reward model.
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
DPO Training:
|
||||
- [ ] Step 1: Prepare preference dataset
|
||||
- [ ] Step 2: Configure DPO
|
||||
- [ ] Step 3: Train with DPOTrainer
|
||||
- [ ] Step 4: Evaluate alignment
|
||||
```
|
||||
|
||||
**Step 1: Prepare preference dataset**
|
||||
|
||||
Dataset format:
|
||||
```json
|
||||
{
|
||||
"prompt": "What is the capital of France?",
|
||||
"chosen": "The capital of France is Paris.",
|
||||
"rejected": "I don't know."
|
||||
}
|
||||
```
|
||||
|
||||
Load dataset:
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
|
||||
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
|
||||
# Or load your own
|
||||
# dataset = load_dataset("json", data_files="preferences.json")
|
||||
```
|
||||
|
||||
**Step 2: Configure DPO**
|
||||
|
||||
```python
|
||||
from trl import DPOConfig
|
||||
|
||||
config = DPOConfig(
|
||||
output_dir="Qwen2.5-0.5B-DPO",
|
||||
per_device_train_batch_size=4,
|
||||
num_train_epochs=1,
|
||||
learning_rate=5e-7,
|
||||
beta=0.1, # KL penalty strength
|
||||
max_prompt_length=512,
|
||||
max_length=1024,
|
||||
logging_steps=10
|
||||
)
|
||||
```
|
||||
|
||||
**Step 3: Train with DPOTrainer**
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
from trl import DPOTrainer
|
||||
|
||||
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
|
||||
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
|
||||
|
||||
trainer = DPOTrainer(
|
||||
model=model,
|
||||
args=config,
|
||||
train_dataset=dataset,
|
||||
processing_class=tokenizer
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
trainer.save_model()
|
||||
```
|
||||
|
||||
**CLI alternative**:
|
||||
```bash
|
||||
trl dpo \
|
||||
--model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
|
||||
--dataset_name argilla/Capybara-Preferences \
|
||||
--output_dir Qwen2.5-0.5B-DPO \
|
||||
--per_device_train_batch_size 4 \
|
||||
--learning_rate 5e-7 \
|
||||
--beta 0.1
|
||||
```
|
||||
|
||||
### Workflow 3: Memory-efficient online RL with GRPO
|
||||
|
||||
Train with reinforcement learning using minimal memory.
|
||||
|
||||
For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
```
|
||||
GRPO Training:
|
||||
- [ ] Step 1: Define reward function
|
||||
- [ ] Step 2: Configure GRPO
|
||||
- [ ] Step 3: Train with GRPOTrainer
|
||||
```
|
||||
|
||||
**Step 1: Define reward function**
|
||||
|
||||
```python
|
||||
def reward_function(completions, **kwargs):
|
||||
"""
|
||||
Compute rewards for completions.
|
||||
|
||||
Args:
|
||||
completions: List of generated texts
|
||||
|
||||
Returns:
|
||||
List of reward scores (floats)
|
||||
"""
|
||||
rewards = []
|
||||
for completion in completions:
|
||||
# Example: reward based on length and unique words
|
||||
score = len(completion.split()) # Favor longer responses
|
||||
score += len(set(completion.lower().split())) # Reward unique words
|
||||
rewards.append(score)
|
||||
return rewards
|
||||
```
|
||||
|
||||
Or use a reward model:
|
||||
```python
|
||||
from transformers import pipeline
|
||||
|
||||
reward_model = pipeline("text-classification", model="reward-model-path")
|
||||
|
||||
def reward_from_model(completions, prompts, **kwargs):
|
||||
# Combine prompt + completion
|
||||
full_texts = [p + c for p, c in zip(prompts, completions)]
|
||||
# Get reward scores
|
||||
results = reward_model(full_texts)
|
||||
return [r["score"] for r in results]
|
||||
```
|
||||
|
||||
**Step 2: Configure GRPO**
|
||||
|
||||
```python
|
||||
from trl import GRPOConfig
|
||||
|
||||
config = GRPOConfig(
|
||||
output_dir="Qwen2-GRPO",
|
||||
per_device_train_batch_size=4,
|
||||
num_train_epochs=1,
|
||||
learning_rate=1e-5,
|
||||
num_generations=4, # Generate 4 completions per prompt
|
||||
max_new_tokens=128
|
||||
)
|
||||
```
|
||||
|
||||
**Step 3: Train with GRPOTrainer**
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
from trl import GRPOTrainer
|
||||
|
||||
# Load prompt-only dataset
|
||||
dataset = load_dataset("trl-lib/tldr", split="train")
|
||||
|
||||
trainer = GRPOTrainer(
|
||||
model="Qwen/Qwen2-0.5B-Instruct",
|
||||
reward_funcs=reward_function, # Your reward function
|
||||
args=config,
|
||||
train_dataset=dataset
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
**CLI**:
|
||||
```bash
|
||||
trl grpo \
|
||||
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
|
||||
--dataset_name trl-lib/tldr \
|
||||
--output_dir Qwen2-GRPO \
|
||||
--num_generations 4
|
||||
```
|
||||
|
||||
## When to use vs alternatives
|
||||
|
||||
**Use TRL when:**
|
||||
- Need to align model with human preferences
|
||||
- Have preference data (chosen/rejected pairs)
|
||||
- Want to use reinforcement learning (PPO, GRPO)
|
||||
- Need reward model training
|
||||
- Doing RLHF (full pipeline)
|
||||
|
||||
**Method selection**:
|
||||
- **SFT**: Have prompt-completion pairs, want basic instruction following
|
||||
- **DPO**: Have preferences, want simple alignment (no reward model needed)
|
||||
- **PPO**: Have reward model, need maximum control over RL
|
||||
- **GRPO**: Memory-constrained, want online RL
|
||||
- **Reward Model**: Building RLHF pipeline, need to score generations
|
||||
|
||||
**Use alternatives instead:**
|
||||
- **HuggingFace Trainer**: Basic fine-tuning without RL
|
||||
- **Axolotl**: YAML-based training configuration
|
||||
- **LitGPT**: Educational, minimal fine-tuning
|
||||
- **Unsloth**: Fast LoRA training
|
||||
|
||||
## Common issues
|
||||
|
||||
**Issue: OOM during DPO training**
|
||||
|
||||
Reduce batch size and sequence length:
|
||||
```python
|
||||
config = DPOConfig(
|
||||
per_device_train_batch_size=1, # Reduce from 4
|
||||
max_length=512, # Reduce from 1024
|
||||
gradient_accumulation_steps=8 # Maintain effective batch
|
||||
)
|
||||
```
|
||||
|
||||
Or use gradient checkpointing:
|
||||
```python
|
||||
model.gradient_checkpointing_enable()
|
||||
```
|
||||
|
||||
**Issue: Poor alignment quality**
|
||||
|
||||
Tune beta parameter:
|
||||
```python
|
||||
# Higher beta = more conservative (stays closer to reference)
|
||||
config = DPOConfig(beta=0.5) # Default 0.1
|
||||
|
||||
# Lower beta = more aggressive alignment
|
||||
config = DPOConfig(beta=0.01)
|
||||
```
|
||||
|
||||
**Issue: Reward model not learning**
|
||||
|
||||
Check loss type and learning rate:
|
||||
```python
|
||||
config = RewardConfig(
|
||||
learning_rate=1e-5, # Try different LR
|
||||
num_train_epochs=3 # Train longer
|
||||
)
|
||||
```
|
||||
|
||||
Ensure preference dataset has clear winners:
|
||||
```python
|
||||
# Verify dataset
|
||||
print(dataset[0])
|
||||
# Should have clear chosen > rejected
|
||||
```
|
||||
|
||||
**Issue: PPO training unstable**
|
||||
|
||||
Adjust KL coefficient:
|
||||
```python
|
||||
config = PPOConfig(
|
||||
kl_coef=0.1, # Increase from 0.05
|
||||
cliprange=0.1 # Reduce from 0.2
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced topics
|
||||
|
||||
**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
|
||||
|
||||
**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
|
||||
|
||||
**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
|
||||
|
||||
**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
|
||||
|
||||
**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
- **GPU**: NVIDIA (CUDA required)
|
||||
- **VRAM**: Depends on model and method
|
||||
- SFT 7B: 16GB (with LoRA)
|
||||
- DPO 7B: 24GB (stores reference model)
|
||||
- PPO 7B: 40GB (policy + reward model)
|
||||
- GRPO 7B: 24GB (more memory efficient)
|
||||
- **Multi-GPU**: Supported via `accelerate`
|
||||
- **Mixed precision**: BF16 recommended (A100/H100)
|
||||
|
||||
**Memory optimization**:
|
||||
- Use LoRA/QLoRA for all methods
|
||||
- Enable gradient checkpointing
|
||||
- Use smaller batch sizes with gradient accumulation
|
||||
|
||||
## Resources
|
||||
|
||||
- Docs: https://huggingface.co/docs/trl/
|
||||
- GitHub: https://github.com/huggingface/trl
|
||||
- Papers:
|
||||
- "Training language models to follow instructions with human feedback" (InstructGPT, 2022)
|
||||
- "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (DPO, 2023)
|
||||
- "Group Relative Policy Optimization" (GRPO, 2024)
|
||||
- Examples: https://github.com/huggingface/trl/tree/main/examples/scripts
|
||||
|
|
@ -0,0 +1,98 @@
|
|||
---
|
||||
title: "Unsloth — Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM"
|
||||
sidebar_label: "Unsloth"
|
||||
description: "Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Unsloth
|
||||
|
||||
Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/mlops/unsloth` |
|
||||
| Path | `optional-skills/mlops/training/unsloth` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
| Dependencies | `unsloth`, `torch`, `transformers`, `trl`, `datasets`, `peft` |
|
||||
| Platforms | linux, macos |
|
||||
| Tags | `Fine-Tuning`, `Unsloth`, `Fast Training`, `LoRA`, `QLoRA`, `Memory-Efficient`, `Optimization`, `Llama`, `Mistral`, `Gemma`, `Qwen` |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Unsloth Skill
|
||||
|
||||
Comprehensive assistance with unsloth development, generated from official documentation.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
This skill should be triggered when:
|
||||
- Working with unsloth
|
||||
- Asking about unsloth features or APIs
|
||||
- Implementing unsloth solutions
|
||||
- Debugging unsloth code
|
||||
- Learning unsloth best practices
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Common Patterns
|
||||
|
||||
*Quick reference patterns will be added as you use the skill.*
|
||||
|
||||
## Reference Files
|
||||
|
||||
This skill includes comprehensive documentation in `references/`:
|
||||
|
||||
- **llms-txt.md** - Llms-Txt documentation
|
||||
|
||||
Use `view` to read specific reference files when detailed information is needed.
|
||||
|
||||
## Working with This Skill
|
||||
|
||||
### For Beginners
|
||||
Start with the getting_started or tutorials reference files for foundational concepts.
|
||||
|
||||
### For Specific Features
|
||||
Use the appropriate category reference file (api, guides, etc.) for detailed information.
|
||||
|
||||
### For Code Examples
|
||||
The quick reference section above contains common patterns extracted from the official docs.
|
||||
|
||||
## Resources
|
||||
|
||||
### references/
|
||||
Organized documentation extracted from official sources. These files contain:
|
||||
- Detailed explanations
|
||||
- Code examples with language annotations
|
||||
- Links to original documentation
|
||||
- Table of contents for quick navigation
|
||||
|
||||
### scripts/
|
||||
Add helper scripts here for common automation tasks.
|
||||
|
||||
### assets/
|
||||
Add templates, boilerplate, or example projects here.
|
||||
|
||||
## Notes
|
||||
|
||||
- This skill was automatically generated from official documentation
|
||||
- Reference files preserve the structure and examples from source docs
|
||||
- Code examples include language detection for better syntax highlighting
|
||||
- Quick reference patterns are extracted from common usage examples in the docs
|
||||
|
||||
## Updating
|
||||
|
||||
To refresh this skill with updated documentation:
|
||||
1. Re-run the scraper with the same configuration
|
||||
2. The skill will be rebuilt with the latest information
|
||||
|
||||
<!-- Trigger re-upload 1763621536 -->
|
||||
Loading…
Add table
Add a link
Reference in a new issue