chore(skills): move heavy training skills + outlines to optional-skills (#22912)

These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't be active by default. Moved to optional-skills/ where users opt-in via `hermes skills install official/...`.

Moved:
- mlops/training/axolotl
- mlops/training/trl-fine-tuning
- mlops/training/unsloth
- mlops/inference/outlines

Counts: 91 -> 87 built-in, 72 -> 76 optional.

Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
2026-05-24 05:41:40 +00:00 · 2026-05-09 18:44:12 -07:00 · 2026-05-09 18:44:12 -07:00 · ded194eb6a
commit ded194eb6a
parent 4375b82cd9
27 changed files with 18 additions and 18 deletions
--- a/website/docs/user-guide/skills/optional/mlops/mlops-inference-outlines.md
+++ b/website/docs/user-guide/skills/optional/mlops/mlops-inference-outlines.md
@@ -0,0 +1,671 @@
+---
+title: "Outlines — Outlines: structured JSON/regex/Pydantic LLM generation"
+sidebar_label: "Outlines"
+description: "Outlines: structured JSON/regex/Pydantic LLM generation"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Outlines
+
+Outlines: structured JSON/regex/Pydantic LLM generation.
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/mlops/outlines` |
+| Path | `optional-skills/mlops/inference/outlines` |
+| Version | `1.0.0` |
+| Author | Orchestra Research |
+| License | MIT |
+| Dependencies | `outlines`, `transformers`, `vllm`, `pydantic` |
+| Platforms | linux, macos, windows |
+| Tags | `Prompt Engineering`, `Outlines`, `Structured Generation`, `JSON Schema`, `Pydantic`, `Local Models`, `Grammar-Based Generation`, `vLLM`, `Transformers`, `Type Safety` |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# Outlines: Structured Text Generation
+
+## When to Use This Skill
+
+Use Outlines when you need to:
+- **Guarantee valid JSON/XML/code** structure during generation
+- **Use Pydantic models** for type-safe outputs
+- **Support local models** (Transformers, llama.cpp, vLLM)
+- **Maximize inference speed** with zero-overhead structured generation
+- **Generate against JSON schemas** automatically
+- **Control token sampling** at the grammar level
+
+**GitHub Stars**: 8,000+ | **From**: dottxt.ai (formerly .txt)
+
+## Installation
+
+```bash
+# Base installation
+pip install outlines
+
+# With specific backends
+pip install outlines transformers  # Hugging Face models
+pip install outlines llama-cpp-python  # llama.cpp
+pip install outlines vllm  # vLLM for high-throughput
+```
+
+## Quick Start
+
+### Basic Example: Classification
+
+```python
+import outlines
+from typing import Literal
+
+# Load model
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+
+# Generate with type constraint
+prompt = "Sentiment of 'This product is amazing!': "
+generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
+sentiment = generator(prompt)
+
+print(sentiment)  # "positive" (guaranteed one of these)
+```
+
+### With Pydantic Models
+
+```python
+from pydantic import BaseModel
+import outlines
+
+class User(BaseModel):
+    name: str
+    age: int
+    email: str
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+
+# Generate structured output
+prompt = "Extract user: John Doe, 30 years old, john@example.com"
+generator = outlines.generate.json(model, User)
+user = generator(prompt)
+
+print(user.name)   # "John Doe"
+print(user.age)    # 30
+print(user.email)  # "john@example.com"
+```
+
+## Core Concepts
+
+### 1. Constrained Token Sampling
+
+Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.
+
+**How it works:**
+1. Convert the schema (JSON Schema or Pydantic model) to a regular expression; a regex constraint is used directly
+2. Compile the regex into a Finite State Machine (FSM) indexed over the tokenizer's vocabulary
+3. Filter invalid tokens at each step during generation
+4. Fast-forward when only one valid token exists
+
+**Benefits:**
+- **Zero overhead**: Valid tokens are looked up per FSM state, so filtering adds negligible cost per token
+- **Speed improvement**: Fast-forward through deterministic paths
+- **Guaranteed validity**: Invalid outputs impossible
+
+```python
+import outlines
+from pydantic import BaseModel
+
+# Pydantic model -> JSON schema -> regex -> FSM
+class Person(BaseModel):
+    name: str
+    age: int
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+
+# Behind the scenes:
+# 1. Person -> JSON schema
+# 2. JSON schema -> regex
+# 3. Regex -> FSM
+# 4. FSM filters tokens during generation
+
+generator = outlines.generate.json(model, Person)
+result = generator("Generate person: Alice, 25")
+```
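
The token-filtering step can be illustrated without any model. The sketch below hand-builds a tiny FSM for the pattern `[0-9]{2}-[0-9]{2}` and masks a toy vocabulary against it. Everything in it is hypothetical: the `VOCAB` list, the helper names, and the `score` function that stands in for model logits. It is not Outlines' implementation, only a pure-Python picture of the idea.

```python
import re

# Toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["1", "2", "12", "-", "a", "34-", "hello"]

# Hand-built character-level FSM for the regex [0-9]{2}-[0-9]{2}.
# State i means "i characters of the pattern have been matched so far".
PATTERN_CLASSES = ["digit", "digit", "dash", "digit", "digit"]
FINAL_STATE = len(PATTERN_CLASSES)


def step(state, char):
    """Advance the FSM by one character; return None if the character is invalid."""
    if state >= FINAL_STATE:
        return None
    expected = PATTERN_CLASSES[state]
    ok = char.isdigit() if expected == "digit" else char == "-"
    return state + 1 if ok else None


def walk(state, token):
    """Advance the FSM over a whole (possibly multi-character) token."""
    for char in token:
        state = step(state, char)
        if state is None:
            return None
    return state


def allowed_tokens(state):
    """Map each token that is valid in `state` to the state it leads to."""
    allowed = {}
    for token in VOCAB:
        next_state = walk(state, token)
        if next_state is not None:
            allowed[token] = next_state
    return allowed


def score(token):
    """Stand-in for model logits: this fake model prefers long tokens."""
    return float(len(token))


def constrained_decode():
    state, output = 0, ""
    while state != FINAL_STATE:
        candidates = allowed_tokens(state)  # mask: invalid tokens never compete
        token = max(candidates, key=score)  # greedy pick among valid tokens only
        output += token
        state = candidates[token]
    return output


result = constrained_decode()
print(result, bool(re.fullmatch(r"[0-9]{2}-[0-9]{2}", result)))  # 34-12 True
```

Although the fake model scores `hello` highest, it is never a candidate, so the output always matches the pattern. In a real system the per-state token sets would be computed once and cached rather than rebuilt at every step.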
+
+### 2. Structured Generators
+
+Outlines provides specialized generators for different output types.
+
+#### Choice Generator
+
+```python
+# Multiple choice selection
+generator = outlines.generate.choice(
+    model,
+    ["positive", "negative", "neutral"]
+)
+
+sentiment = generator("Review: This is great!")
+# Result: One of the three choices
+```
+
+#### JSON Generator
+
+```python
+from pydantic import BaseModel
+
+class Product(BaseModel):
+    name: str
+    price: float
+    in_stock: bool
+
+# Generate valid JSON matching schema
+generator = outlines.generate.json(model, Product)
+product = generator("Extract: iPhone 15, $999, available")
+
+# Guaranteed valid Product instance
+print(type(product))  # <class '__main__.Product'>
+```
+
+#### Regex Generator
+
+```python
+# Generate text matching regex
+generator = outlines.generate.regex(
+    model,
+    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
+)
+
+phone = generator("Generate phone number:")
+# Result: "555-123-4567" (guaranteed to match pattern)
+```
+
+#### Integer/Float Generators
+
+```python
+# Generate specific numeric types with generate.format
+int_generator = outlines.generate.format(model, int)
+age = int_generator("Person's age:")  # Guaranteed integer
+
+float_generator = outlines.generate.format(model, float)
+price = float_generator("Product price:")  # Guaranteed float
+```
+
+### 3. Model Backends
+
+Outlines supports multiple local and API-based backends.
+
+#### Transformers (Hugging Face)
+
+```python
+import outlines
+
+# Load from Hugging Face
+model = outlines.models.transformers(
+    "microsoft/Phi-3-mini-4k-instruct",
+    device="cuda"  # Or "cpu"
+)
+
+# Use with any generator
+generator = outlines.generate.json(model, YourModel)
+```
+
+#### llama.cpp
+
+```python
+# Load GGUF model
+model = outlines.models.llamacpp(
+    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
+    n_gpu_layers=35
+)
+
+generator = outlines.generate.json(model, YourModel)
+```
+
+#### vLLM (High Throughput)
+
+```python
+# For production deployments
+model = outlines.models.vllm(
+    "meta-llama/Llama-3.1-8B-Instruct",
+    tensor_parallel_size=2  # Multi-GPU
+)
+
+generator = outlines.generate.json(model, YourModel)
+```
+
+#### OpenAI (Limited Support)
+
+```python
+# Basic OpenAI support
+model = outlines.models.openai(
+    "gpt-4o-mini",
+    api_key="your-api-key"
+)
+
+# Note: Some features limited with API models
+generator = outlines.generate.json(model, YourModel)
+```
+
+### 4. Pydantic Integration
+
+Outlines has first-class Pydantic support with automatic schema translation.
+
+#### Basic Models
+
+```python
+from pydantic import BaseModel, Field
+
+class Article(BaseModel):
+    title: str = Field(description="Article title")
+    author: str = Field(description="Author name")
+    word_count: int = Field(description="Number of words", gt=0)
+    tags: list[str] = Field(description="List of tags")
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+generator = outlines.generate.json(model, Article)
+
+article = generator("Generate article about AI")
+print(article.title)
+print(article.word_count)  # Guaranteed > 0
+```
+
+#### Nested Models
+
+```python
+class Address(BaseModel):
+    street: str
+    city: str
+    country: str
+
+class Person(BaseModel):
+    name: str
+    age: int
+    address: Address  # Nested model
+
+generator = outlines.generate.json(model, Person)
+person = generator("Generate person in New York")
+
+print(person.address.city)  # "New York"
+```
+
+#### Enums and Literals
+
+```python
+from enum import Enum
+from typing import Literal
+
+class Status(str, Enum):
+    PENDING = "pending"
+    APPROVED = "approved"
+    REJECTED = "rejected"
+
+class Application(BaseModel):
+    applicant: str
+    status: Status  # Must be one of enum values
+    priority: Literal["low", "medium", "high"]  # Must be one of literals
+
+generator = outlines.generate.json(model, Application)
+app = generator("Generate application")
+
+print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
+```
+
+## Common Patterns
+
+### Pattern 1: Data Extraction
+
+```python
+from pydantic import BaseModel
+import outlines
+
+class CompanyInfo(BaseModel):
+    name: str
+    founded_year: int
+    industry: str
+    employees: int
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+generator = outlines.generate.json(model, CompanyInfo)
+
+text = """
+Apple Inc. was founded in 1976 in the technology industry.
+The company employs approximately 164,000 people worldwide.
+"""
+
+prompt = f"Extract company information:\n{text}\n\nCompany:"
+company = generator(prompt)
+
+print(f"Name: {company.name}")
+print(f"Founded: {company.founded_year}")
+print(f"Industry: {company.industry}")
+print(f"Employees: {company.employees}")
+```
+
+### Pattern 2: Classification
+
+```python
+from typing import Literal
+import outlines
+from pydantic import BaseModel
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+
+# Binary classification
+generator = outlines.generate.choice(model, ["spam", "not_spam"])
+result = generator("Email: Buy now! 50% off!")
+
+# Multi-class classification
+categories = ["technology", "business", "sports", "entertainment"]
+category_gen = outlines.generate.choice(model, categories)
+category = category_gen("Article: Apple announces new iPhone...")
+
+# With confidence
+class Classification(BaseModel):
+    label: Literal["positive", "negative", "neutral"]
+    confidence: float
+
+classifier = outlines.generate.json(model, Classification)
+result = classifier("Review: This product is okay, nothing special")
+```
+
+### Pattern 3: Structured Forms
+
+```python
+from pydantic import BaseModel
+import outlines
+
+class UserProfile(BaseModel):
+    full_name: str
+    age: int
+    email: str
+    phone: str
+    country: str
+    interests: list[str]
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+generator = outlines.generate.json(model, UserProfile)
+
+prompt = """
+Extract user profile from:
+Name: Alice Johnson
+Age: 28
+Email: alice@example.com
+Phone: 555-0123
+Country: USA
+Interests: hiking, photography, cooking
+"""
+
+profile = generator(prompt)
+print(profile.full_name)
+print(profile.interests)  # ["hiking", "photography", "cooking"]
+```
+
+### Pattern 4: Multi-Entity Extraction
+
+```python
+from typing import Literal
+
+from pydantic import BaseModel
+import outlines
+
+class Entity(BaseModel):
+    name: str
+    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]
+
+class DocumentEntities(BaseModel):
+    entities: list[Entity]
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+generator = outlines.generate.json(model, DocumentEntities)
+
+text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
+prompt = f"Extract entities from: {text}"
+
+result = generator(prompt)
+for entity in result.entities:
+    print(f"{entity.name} ({entity.type})")
+```
+
+### Pattern 5: Code Generation
+
+```python
+from pydantic import BaseModel
+import outlines
+
+class PythonFunction(BaseModel):
+    function_name: str
+    parameters: list[str]
+    docstring: str
+    body: str
+
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+generator = outlines.generate.json(model, PythonFunction)
+
+prompt = "Generate a Python function to calculate factorial"
+func = generator(prompt)
+
+print(f"def {func.function_name}({', '.join(func.parameters)}):")
+print(f'    """{func.docstring}"""')
+print(f"    {func.body}")
+```
+
+### Pattern 6: Batch Processing
+
+```python
+from pydantic import BaseModel
+import outlines
+
+def batch_extract(texts: list[str], schema: type[BaseModel]):
+    """Extract structured data from multiple texts."""
+    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+    generator = outlines.generate.json(model, schema)
+
+    results = []
+    for text in texts:
+        result = generator(f"Extract from: {text}")
+        results.append(result)
+
+    return results
+
+class Person(BaseModel):
+    name: str
+    age: int
+
+texts = [
+    "John is 30 years old",
+    "Alice is 25 years old",
+    "Bob is 40 years old"
+]
+
+people = batch_extract(texts, Person)
+for person in people:
+    print(f"{person.name}: {person.age}")
+```
+
+## Backend Configuration
+
+### Transformers
+
+```python
+import outlines
+
+# Basic usage
+model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
+
+# GPU configuration
+model = outlines.models.transformers(
+    "microsoft/Phi-3-mini-4k-instruct",
+    device="cuda",
+    model_kwargs={"torch_dtype": "float16"}
+)
+
+# Popular models
+model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
+model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
+model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
+```
+
+### llama.cpp
+
+```python
+# Load GGUF model
+model = outlines.models.llamacpp(
+    "./models/llama-3.1-8b.Q4_K_M.gguf",
+    n_ctx=4096,         # Context window
+    n_gpu_layers=35,    # GPU layers
+    n_threads=8         # CPU threads
+)
+
+# Full GPU offload
+model = outlines.models.llamacpp(
+    "./models/model.gguf",
+    n_gpu_layers=-1  # All layers on GPU
+)
+```
+
+### vLLM (Production)
+
+```python
+# Single GPU
+model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")
+
+# Multi-GPU
+model = outlines.models.vllm(
+    "meta-llama/Llama-3.1-70B-Instruct",
+    tensor_parallel_size=4  # 4 GPUs
+)
+
+# With quantization
+model = outlines.models.vllm(
+    "meta-llama/Llama-3.1-8B-Instruct",
+    quantization="awq"  # Or "gptq"
+)
+```
+
+## Best Practices
+
+### 1. Use Specific Types
+
+```python
+# ✅ Good: Specific types
+class Product(BaseModel):
+    name: str
+    price: float  # Not str
+    quantity: int  # Not str
+    in_stock: bool  # Not str
+
+# ❌ Bad: Everything as string
+class Product(BaseModel):
+    name: str
+    price: str  # Should be float
+    quantity: str  # Should be int
+```
+
+### 2. Add Constraints
+
+```python
+from pydantic import Field
+
+# ✅ Good: With constraints
+class User(BaseModel):
+    name: str = Field(min_length=1, max_length=100)
+    age: int = Field(ge=0, le=120)
+    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
+
+# ❌ Bad: No constraints
+class User(BaseModel):
+    name: str
+    age: int
+    email: str
+```
+
+### 3. Use Enums for Categories
+
+```python
+# ✅ Good: Enum for fixed set
+class Priority(str, Enum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+
+class Task(BaseModel):
+    title: str
+    priority: Priority
+
+# ❌ Bad: Free-form string
+class Task(BaseModel):
+    title: str
+    priority: str  # Can be anything
+```
+
+### 4. Provide Context in Prompts
+
+```python
+# ✅ Good: Clear context
+prompt = """
+Extract product information from the following text.
+Text: iPhone 15 Pro costs $999 and is currently in stock.
+Product:
+"""
+
+# ❌ Bad: Minimal context
+prompt = "iPhone 15 Pro costs $999 and is currently in stock."
+```
+
+### 5. Handle Optional Fields
+
+```python
+from typing import Optional
+
+# ✅ Good: Optional fields for incomplete data
+class Article(BaseModel):
+    title: str  # Required
+    author: Optional[str] = None  # Optional
+    date: Optional[str] = None  # Optional
+    tags: list[str] = []  # Default empty list
+
+# Can succeed even if author/date missing
+```
+
+## Comparison to Alternatives
+
+| Feature | Outlines | Instructor | Guidance | LMQL |
+|---------|----------|------------|----------|------|
+| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
+| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
+| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
+| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
+| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
+| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
+| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
+| Learning Curve | Low | Low | Low | High |
+
+**When to choose Outlines:**
+- Using local models (Transformers, llama.cpp, vLLM)
+- Need maximum inference speed
+- Want Pydantic model support
+- Require zero-overhead structured generation
+- Control token sampling process
+
+**When to choose alternatives:**
+- Instructor: Need API models with automatic retrying
+- Guidance: Need token healing and complex workflows
+- LMQL: Prefer declarative query syntax
+
+## Performance Characteristics
+
+**Speed:**
+- **Zero overhead**: Structured generation as fast as unconstrained
+- **Fast-forward optimization**: Skips deterministic tokens
+- **1.2-2x faster** than post-generation validation approaches
+
+**Memory:**
+- FSM compiled once per schema (cached)
+- Minimal runtime overhead
+- Efficient with vLLM for high throughput
+
+**Accuracy:**
+- **100% valid outputs** (guaranteed by FSM)
+- No retry loops needed
+- Deterministic token filtering
+
+## Resources
+
+- **Documentation**: https://outlines-dev.github.io/outlines
+- **GitHub**: https://github.com/outlines-dev/outlines (8k+ stars)
+- **Discord**: https://discord.gg/R9DSu34mGd
+- **Blog**: https://blog.dottxt.co
+
+## See Also
+
+- `references/json_generation.md` - Comprehensive JSON and Pydantic patterns
+- `references/backends.md` - Backend-specific configuration
+- `references/examples.md` - Production-ready examples
--- a/website/docs/user-guide/skills/optional/mlops/mlops-training-axolotl.md
+++ b/website/docs/user-guide/skills/optional/mlops/mlops-training-axolotl.md
@@ -0,0 +1,181 @@
+---
+title: "Axolotl — Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO)"
+sidebar_label: "Axolotl"
+description: "Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO)"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Axolotl
+
+Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO).
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/mlops/axolotl` |
+| Path | `optional-skills/mlops/training/axolotl` |
+| Version | `1.0.0` |
+| Author | Orchestra Research |
+| License | MIT |
+| Dependencies | `axolotl`, `torch`, `transformers`, `datasets`, `peft`, `accelerate`, `deepspeed` |
+| Platforms | linux, macos |
+| Tags | `Fine-Tuning`, `Axolotl`, `LLM`, `LoRA`, `QLoRA`, `DPO`, `KTO`, `ORPO`, `GRPO`, `YAML`, `HuggingFace`, `DeepSpeed`, `Multimodal` |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# Axolotl Skill
+
+## What's inside
+
+Expert guidance for fine-tuning LLMs with Axolotl — YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support.
+
+Comprehensive assistance with axolotl development, generated from official documentation.
+
+## When to Use This Skill
+
+This skill should be triggered when:
+- Working with axolotl
+- Asking about axolotl features or APIs
+- Implementing axolotl solutions
+- Debugging axolotl code
+- Learning axolotl best practices
+
+## Quick Reference
+
+### Common Patterns
+
+**Pattern 1:** To check that data transfer speeds between GPUs are acceptable for your training job, run the NCCL Tests to pinpoint bottlenecks. For example:
+
+```
+./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
+```
+
+**Pattern 2:** Configure your model to use FSDP in the Axolotl yaml. For example:
+
+```
+fsdp_version: 2
+fsdp_config:
+  offload_params: true
+  state_dict_type: FULL_STATE_DICT
+  auto_wrap_policy: TRANSFORMER_BASED_WRAP
+  transformer_layer_cls_to_wrap: LlamaDecoderLayer
+  reshard_after_forward: true
+```
+
+**Pattern 3:** The `context_parallel_size` should be a divisor of the total number of GPUs. For example, with 8 GPUs:
+
+```
+context_parallel_size: 4
+```
+
+**Pattern 4:** For example:
+
+- With 8 GPUs and no sequence parallelism: 8 different batches processed per step
+- With 8 GPUs and context_parallel_size=4: Only 2 different batches processed per step (each split across 4 GPUs)
+- If your per-GPU micro_batch_size is 2, the global batch size decreases from 16 to 4
+
+```
+context_parallel_size=4
+```
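
The arithmetic in Patterns 3 and 4 can be checked with a few lines of Python. This is only a sketch: the helper name `batches_per_step` is hypothetical (it is not part of Axolotl), and it assumes the global batch size is the number of distinct batches times `micro_batch_size`, with no gradient accumulation.

```python
def batches_per_step(num_gpus, context_parallel_size=1, micro_batch_size=1):
    """Return (distinct batches per step, global batch size)."""
    if context_parallel_size < 1 or num_gpus % context_parallel_size != 0:
        raise ValueError(
            f"context_parallel_size={context_parallel_size} must divide "
            f"the number of GPUs ({num_gpus})"
        )
    # Each group of `context_parallel_size` GPUs shares one batch,
    # with every sequence split across the GPUs in that group.
    distinct_batches = num_gpus // context_parallel_size
    return distinct_batches, distinct_batches * micro_batch_size


print(batches_per_step(8, 1, 2))  # (8, 16)
print(batches_per_step(8, 4, 2))  # (2, 4)
```

A value that does not divide the GPU count, such as `batches_per_step(8, 3)`, raises `ValueError` in this sketch.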
+
+**Pattern 5:** Setting `save_compressed: true` in your configuration enables saving models in a compressed format, which:
+
+- Reduces disk space usage by approximately 40%
+- Maintains compatibility with vLLM for accelerated inference
+- Maintains compatibility with llmcompressor for further optimization (example: quantization)
+
+```
+save_compressed: true
+```
+
+**Pattern 6:** Note: it is not necessary to place your integration in the `integrations` folder. It can be in any location, so long as it’s installed in a package in your Python env. See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer
+
+```
+integrations
+```
+
+**Pattern 7:** Handle both single-example and batched data:
+
+- single example: `sample['input_ids']` is a `list[int]`
+- batched data: `sample['input_ids']` is a `list[list[int]]`
+
+```
+utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
+```
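
A minimal sketch of how a length filter can handle both shapes, assuming it returns a single `bool` for one example and a list of `bool`s for a batch. The name `keep_by_length` is hypothetical and this is not Axolotl's implementation of `drop_long_seq`; it only illustrates the single-versus-batched dispatch described above.

```python
def keep_by_length(sample, sequence_len=2048, min_sequence_len=2):
    """Return True (keep) or False (drop) based on token count.

    For a single example the result is one bool; for batched data it is
    a list with one bool per example.
    """
    input_ids = sample["input_ids"]

    def in_range(ids):
        return min_sequence_len <= len(ids) <= sequence_len

    # Batched data: the first element is itself a list of token ids.
    if input_ids and isinstance(input_ids[0], list):
        return [in_range(ids) for ids in input_ids]
    # Single example: a flat list of token ids.
    return in_range(input_ids)


single = {"input_ids": [1, 2, 3]}
batch = {"input_ids": [[1], [1, 2, 3], list(range(10))]}
print(keep_by_length(single, sequence_len=8))  # True
print(keep_by_length(batch, sequence_len=8))   # [False, True, False]
```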
+
+### Example Code Patterns
+
+**Example 1** (python):
+```python
+cli.cloud.modal_.ModalCloud(config, app=None)
+```
+
+**Example 2** (python):
+```python
+cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)
+```
+
+**Example 3** (python):
+```python
+core.trainers.base.AxolotlTrainer(
+    *_args,
+    bench_data_collator=None,
+    eval_data_collator=None,
+    dataset_tags=None,
+    **kwargs,
+)
+```
+
+**Example 4** (python):
+```python
+core.trainers.base.AxolotlTrainer.log(logs, start_time=None)
+```
+
+**Example 5** (python):
+```python
+prompt_strategies.input_output.RawInputOutputPrompter()
+```
+
+## Reference Files
+
+This skill includes comprehensive documentation in `references/`:
+
+- **api.md** - API documentation
+- **dataset-formats.md** - Dataset formats documentation
+- **other.md** - Other documentation
+
+Use `view` to read specific reference files when detailed information is needed.
+
+## Working with This Skill
+
+### For Beginners
+Start with the getting_started or tutorials reference files for foundational concepts.
+
+### For Specific Features
+Use the appropriate category reference file (api, guides, etc.) for detailed information.
+
+### For Code Examples
+The quick reference section above contains common patterns extracted from the official docs.
+
+## Resources
+
+### references/
+Organized documentation extracted from official sources. These files contain:
+- Detailed explanations
+- Code examples with language annotations
+- Links to original documentation
+- Table of contents for quick navigation
+
+### scripts/
+Add helper scripts here for common automation tasks.
+
+### assets/
+Add templates, boilerplate, or example projects here.
+
+## Notes
+
+- This skill was automatically generated from official documentation
+- Reference files preserve the structure and examples from source docs
+- Code examples include language detection for better syntax highlighting
+- Quick reference patterns are extracted from common usage examples in the docs
+
+## Updating
+
+To refresh this skill with updated documentation:
+1. Re-run the scraper with the same configuration
+2. The skill will be rebuilt with the latest information
--- a/website/docs/user-guide/skills/optional/mlops/mlops-training-trl-fine-tuning.md
+++ b/website/docs/user-guide/skills/optional/mlops/mlops-training-trl-fine-tuning.md
@@ -0,0 +1,477 @@
+---
+title: "Fine Tuning With Trl — TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF"
+sidebar_label: "Fine Tuning With Trl"
+description: "TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Fine Tuning With Trl
+
+TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF.
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/mlops/trl-fine-tuning` |
+| Path | `optional-skills/mlops/training/trl-fine-tuning` |
+| Version | `1.0.0` |
+| Author | Orchestra Research |
+| License | MIT |
+| Dependencies | `trl`, `transformers`, `datasets`, `peft`, `accelerate`, `torch` |
+| Platforms | linux, macos, windows |
+| Tags | `Post-Training`, `TRL`, `Reinforcement Learning`, `Fine-Tuning`, `SFT`, `DPO`, `PPO`, `GRPO`, `RLHF`, `Preference Alignment`, `HuggingFace` |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# TRL - Transformer Reinforcement Learning
+
+## Quick start
+
+TRL provides post-training methods for aligning language models with human preferences.
+
+**Installation**:
+```bash
+pip install trl transformers datasets peft accelerate
+```
+
+**Supervised Fine-Tuning** (instruction tuning):
+```python
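+from datasets import load_dataset
+from trl import SFTTrainer
+
+# Example dataset (the same one used in Workflow 1 below)
+dataset = load_dataset("trl-lib/Capybara", split="train")
+
+trainer = SFTTrainer(
+    model="Qwen/Qwen2.5-0.5B",
+    train_dataset=dataset,
+)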
+trainer.train()
+```
+
+**DPO** (align with preferences):
+```python
+from trl import DPOTrainer, DPOConfig
+
+config = DPOConfig(output_dir="model-dpo", beta=0.1)
+trainer = DPOTrainer(
+    model=model,
+    args=config,
+    train_dataset=preference_dataset,  # chosen/rejected pairs
+    processing_class=tokenizer
+)
+trainer.train()
+```
+
+## Common workflows
+
+### Workflow 1: Full RLHF pipeline (SFT → Reward Model → PPO)
+
+Complete pipeline from base model to human-aligned model.
+
+Copy this checklist:
+
+```
+RLHF Training:
+- [ ] Step 1: Supervised fine-tuning (SFT)
+- [ ] Step 2: Train reward model
+- [ ] Step 3: PPO reinforcement learning
+- [ ] Step 4: Evaluate aligned model
+```
+
+**Step 1: Supervised fine-tuning**
+
+Train base model on instruction-following data:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from trl import SFTTrainer, SFTConfig
+from datasets import load_dataset
+
+# Load model
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
+
+# Load instruction dataset
+dataset = load_dataset("trl-lib/Capybara", split="train")
+
+# Configure training
+training_args = SFTConfig(
+    output_dir="Qwen2.5-0.5B-SFT",
+    per_device_train_batch_size=4,
+    num_train_epochs=1,
+    learning_rate=2e-5,
+    logging_steps=10,
+    save_strategy="epoch"
+)
+
+# Train
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=dataset,
+    processing_class=tokenizer
+)
+trainer.train()
+trainer.save_model()
+```
+
+**Step 2: Train reward model**
+
+Train model to predict human preferences:
+
+```python
+from datasets import load_dataset
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+from trl import RewardTrainer, RewardConfig
+
+# Load SFT model as base
+model = AutoModelForSequenceClassification.from_pretrained(
+    "Qwen2.5-0.5B-SFT",
+    num_labels=1  # Single reward score
+)
+tokenizer = AutoTokenizer.from_pretrained("Qwen2.5-0.5B-SFT")
+
+# Load preference data (chosen/rejected pairs)
+dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
+
+# Configure training
+training_args = RewardConfig(
+    output_dir="Qwen2.5-0.5B-Reward",
+    per_device_train_batch_size=2,
+    num_train_epochs=1,
+    learning_rate=1e-5
+)
+
+# Train reward model
+trainer = RewardTrainer(
+    model=model,
+    args=training_args,
+    processing_class=tokenizer,
+    train_dataset=dataset
+)
+trainer.train()
+trainer.save_model()
+```
+
+**Step 3: PPO reinforcement learning**
+
+Optimize policy using reward model:
+
+```bash
+python -m trl.scripts.ppo \
+    --model_name_or_path Qwen2.5-0.5B-SFT \
+    --reward_model_path Qwen2.5-0.5B-Reward \
+    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
+    --output_dir Qwen2.5-0.5B-PPO \
+    --learning_rate 3e-6 \
+    --per_device_train_batch_size 64 \
+    --total_episodes 10000
+```
+
+**Step 4: Evaluate**
+
+```python
+from transformers import pipeline
+
+# Load aligned model
+generator = pipeline("text-generation", model="Qwen2.5-0.5B-PPO")
+
+# Test
+prompt = "Explain quantum computing to a 10-year-old"
+output = generator(prompt, max_length=200)[0]["generated_text"]
+print(output)
+```
+
+### Workflow 2: Simple preference alignment with DPO
+
+Align model with preferences without reward model.
+
+Copy this checklist:
+
+```
+DPO Training:
+- [ ] Step 1: Prepare preference dataset
+- [ ] Step 2: Configure DPO
+- [ ] Step 3: Train with DPOTrainer
+- [ ] Step 4: Evaluate alignment
+```
+
+**Step 1: Prepare preference dataset**
+
+Dataset format:
+```json
+{
+  "prompt": "What is the capital of France?",
+  "chosen": "The capital of France is Paris.",
+  "rejected": "I don't know."
+}
+```
+
+Load dataset:
+```python
+from datasets import load_dataset
+
+dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
+# Or load your own
+# dataset = load_dataset("json", data_files="preferences.json")
+```
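
Before training on your own file, it can help to check that every record has the three fields shown above. The sketch below uses only the standard library. The function name `validate_preferences`, the JSON Lines layout, and the requirement that all three fields are plain strings are assumptions for illustration; none of this is part of TRL.

```python
import json

REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_preferences(lines):
    """Check JSON Lines preference records; return (valid_records, errors)."""
    valid, errors = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: expected a JSON object")
            continue
        bad = [key for key in REQUIRED_KEYS if not isinstance(record.get(key), str)]
        if bad:
            errors.append(f"line {number}: missing or non-string fields {bad}")
        elif record["chosen"] == record["rejected"]:
            errors.append(f"line {number}: chosen and rejected are identical")
        else:
            valid.append(record)
    return valid, errors


sample_lines = [
    '{"prompt": "What is the capital of France?", '
    '"chosen": "The capital of France is Paris.", "rejected": "I am not sure."}',
    '{"prompt": "2 + 2?", "chosen": "4"}',
    "not json",
]
valid, errors = validate_preferences(sample_lines)
print(len(valid), "valid record(s)")
for error in errors:
    print(error)
```

With a real file, passing the open file object (for example `validate_preferences(open("preferences.jsonl"))`) works the same way, since the function only iterates over lines.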
+
+**Step 2: Configure DPO**
+
+```python
+from trl import DPOConfig
+
+config = DPOConfig(
+    output_dir="Qwen2.5-0.5B-DPO",
+    per_device_train_batch_size=4,
+    num_train_epochs=1,
+    learning_rate=5e-7,
+    beta=0.1,  # KL penalty strength
+    max_prompt_length=512,
+    max_length=1024,
+    logging_steps=10
+)
+```
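
To see what `beta` controls, here is a sketch of the standard sigmoid DPO loss for a single preference pair, written in plain Python. The function name `dpo_loss` and the log-probability values are made up for illustration; a trainer works on batched tensors and may offer other loss variants.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed log-probabilities of each response."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Policy has moved toward the chosen response: the loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0, beta=0.1) < math.log(2))  # True
```

Because `beta` scales the margin, a larger value makes the loss react more sharply to how far the policy has drifted from the reference model.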
+
+**Step 3: Train with DPOTrainer**
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from trl import DPOTrainer
+
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+
+trainer = DPOTrainer(
+    model=model,
+    args=config,
+    train_dataset=dataset,
+    processing_class=tokenizer
+)
+
+trainer.train()
+trainer.save_model()
+```
+
+**CLI alternative**:
+```bash
+trl dpo \
+    --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
+    --dataset_name argilla/Capybara-Preferences \
+    --output_dir Qwen2.5-0.5B-DPO \
+    --per_device_train_batch_size 4 \
+    --learning_rate 5e-7 \
+    --beta 0.1
+```
+
+### Workflow 3: Memory-efficient online RL with GRPO
+
+Train with reinforcement learning using minimal memory.
+
+For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
+
+Copy this checklist:
+
+```
+GRPO Training:
+- [ ] Step 1: Define reward function
+- [ ] Step 2: Configure GRPO
+- [ ] Step 3: Train with GRPOTrainer
+```
+
+**Step 1: Define reward function**
+
+```python
+def reward_function(completions, **kwargs):
+    """
+    Compute rewards for completions.
+
+    Args:
+        completions: List of generated texts
+
+    Returns:
+        List of reward scores (floats)
+    """
+    rewards = []
+    for completion in completions:
+        # Example: reward based on length and unique words
+        score = float(len(completion.split()))  # Favor longer responses (float, as documented)
+        score += len(set(completion.lower().split()))  # Reward unique words
+        rewards.append(score)
+    return rewards
+```
+
+Or use a reward model:
+```python
+from transformers import pipeline
+
+reward_model = pipeline("text-classification", model="reward-model-path")
+
+def reward_from_model(completions, prompts, **kwargs):
+    # Combine prompt + completion
+    full_texts = [p + c for p, c in zip(prompts, completions)]
+    # Get reward scores
+    results = reward_model(full_texts)
+    return [r["score"] for r in results]
+```
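
To see what the toy reward from Step 1 actually favors, it can help to run it on a few hand-written completions before training. The sketch below redefines the same scoring rule so that it runs on its own; the sample strings and the `prompts` argument are made up for illustration.

```python
def reward_function(completions, **kwargs):
    """Toy reward from Step 1: word count plus unique-word count."""
    rewards = []
    for completion in completions:
        words = completion.split()
        unique_words = set(completion.lower().split())
        rewards.append(float(len(words) + len(unique_words)))
    return rewards


# Hypothetical completions for one prompt (illustration only).
samples = [
    "Paris.",
    "The capital of France is Paris.",
    "the the the the the the the the",
]
scores = reward_function(samples, prompts=["What is the capital of France?"] * 3)

for text, score in zip(samples, scores):
    print(f"{score:5.1f}  {text}")
```

In this run the repetitive third string outscores the short correct answer, which suggests that a purely length-based reward invites padding. A task-specific signal such as a format or correctness check is usually a safer starting point.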
+
+**Step 2: Configure GRPO**
+
+```python
+from trl import GRPOConfig
+
+config = GRPOConfig(
+    output_dir="Qwen2-GRPO",
+    per_device_train_batch_size=4,
+    num_train_epochs=1,
+    learning_rate=1e-5,
+    num_generations=4,  # Generate 4 completions per prompt
+    max_completion_length=128  # Maximum tokens per generated completion
+)
+```
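
To make `num_generations` concrete: GRPO compares the completions sampled for the same prompt with each other instead of with a learned value estimate. The sketch below assumes the commonly described normalization (reward minus group mean, divided by group standard deviation). `group_advantages` and `eps` are hypothetical names, not TRL API, and TRL's own computation may differ in details such as the standard deviation estimator.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completions within their group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt (num_generations=4), with made-up rewards.
advantages = group_advantages([2.0, 12.0, 9.0, 5.0])
print([round(a, 3) for a in advantages])

# If every completion earns the same reward, the group carries no signal.
flat = group_advantages([3.0, 3.0, 3.0, 3.0])
print(flat)
```

One consequence of this design is that a reward function returning the same value for every completion of a prompt produces zero advantages, so the reward needs to discriminate within a group for training to make progress.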
+
+**Step 3: Train with GRPOTrainer**
+
+```python
+from datasets import load_dataset
+from trl import GRPOTrainer
+
+# Load prompt-only dataset
+dataset = load_dataset("trl-lib/tldr", split="train")
+
+trainer = GRPOTrainer(
+    model="Qwen/Qwen2-0.5B-Instruct",
+    reward_funcs=reward_function,  # Your reward function
+    args=config,
+    train_dataset=dataset
+)
+
+trainer.train()
+```
+
+**CLI alternative**:
+```bash
+trl grpo \
+    --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
+    --dataset_name trl-lib/tldr \
+    --output_dir Qwen2-GRPO \
+    --num_generations 4
+```
+
+## When to use vs alternatives
+
+**Use TRL when:**
+- Need to align model with human preferences
+- Have preference data (chosen/rejected pairs)
+- Want to use reinforcement learning (PPO, GRPO)
+- Need reward model training
+- Doing RLHF (full pipeline)
+
+**Method selection**:
+- **SFT**: Have prompt-completion pairs, want basic instruction following
+- **DPO**: Have preferences, want simple alignment (no reward model needed)
+- **PPO**: Have reward model, need maximum control over RL
+- **GRPO**: Memory-constrained, want online RL
+- **Reward Model**: Building RLHF pipeline, need to score generations
+
+**Use alternatives instead:**
+- **HuggingFace Trainer**: Basic fine-tuning without RL
+- **Axolotl**: YAML-based training configuration
+- **LitGPT**: Educational, minimal fine-tuning
+- **Unsloth**: Fast LoRA training
+
+## Common issues
+
+**Issue: OOM during DPO training**
+
+Reduce batch size and sequence length:
+```python
+config = DPOConfig(
+    per_device_train_batch_size=1,  # Reduce from 4
+    max_length=512,  # Reduce from 1024
+    gradient_accumulation_steps=8  # Maintain effective batch
+)
+```
+
+Or use gradient checkpointing:
+```python
+model.gradient_checkpointing_enable()
+```
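
The comment "Maintain effective batch" in the OOM fix above depends on what the original run used. A small arithmetic sketch, assuming the original configuration had no gradient accumulation and a single GPU (the section does not state either); `effective_batch_size` is a hypothetical helper:

```python
def effective_batch_size(per_device_batch, grad_accum_steps=1, num_devices=1):
    """Samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


original = effective_batch_size(per_device_batch=4)                       # assumed baseline
reduced = effective_batch_size(per_device_batch=1, grad_accum_steps=8)    # OOM fix above
matched = effective_batch_size(per_device_batch=1, grad_accum_steps=4)    # same size as baseline

print(original, reduced, matched)
```

Under these assumptions, `gradient_accumulation_steps=4` keeps the effective batch unchanged, while `8` doubles it; either is valid, but the learning rate may need revisiting if the effective batch changes.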
+
+**Issue: Poor alignment quality**
+
+Tune the `beta` parameter:
+```python
+# Higher beta = more conservative (stays closer to reference)
+config = DPOConfig(beta=0.5)  # Raised from the default of 0.1
+
+# Lower beta = more aggressive alignment
+config = DPOConfig(beta=0.01)
+```
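
To see why `beta` acts as a conservativeness knob, it helps to look at the per-example DPO loss from the DPO paper, which applies a log-sigmoid to `beta` times the difference of policy-versus-reference log-ratios. The sketch below is a plain-Python illustration with made-up log-probabilities; `dpo_loss` is a hypothetical helper, not the TRL implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example sigmoid DPO loss: -log(sigmoid(beta * margin))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


# Made-up sequence log-probabilities: the policy has already shifted
# toward the chosen response relative to the reference model.
logps = dict(policy_chosen_logp=-10.0, policy_rejected_logp=-14.0,
             ref_chosen_logp=-12.0, ref_rejected_logp=-12.0)

losses = {beta: dpo_loss(**logps, beta=beta) for beta in (0.01, 0.1, 0.5)}
for beta, loss in losses.items():
    print(f"beta={beta:<5} loss={loss:.4f}")
```

For the same shift away from the reference, a larger `beta` already yields a lower loss, so the policy has less incentive to drift further; a smaller `beta` keeps the loss high and pushes the policy to separate chosen from rejected more aggressively.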
+
+**Issue: Reward model not learning**
+
+Adjust the learning rate and train for more epochs:
+```python
+config = RewardConfig(
+    learning_rate=1e-5,  # Try different LR
+    num_train_epochs=3  # Train longer
+)
+```
+
+Ensure preference dataset has clear winners:
+```python
+# Verify dataset
+print(dataset[0])
+# The chosen response should be clearly better than the rejected one
+```
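
Inspecting one row by eye only goes so far. A minimal sketch of an automated sanity check, assuming each row stores `chosen` and `rejected` as plain strings (conversational datasets that store message lists would need adapting); `find_weak_pairs` and the sample rows are hypothetical:

```python
def find_weak_pairs(rows):
    """Return indices of rows whose chosen/rejected texts are empty or identical."""
    weak = []
    for index, row in enumerate(rows):
        chosen = row["chosen"].strip()
        rejected = row["rejected"].strip()
        if not chosen or not rejected or chosen == rejected:
            weak.append(index)
    return weak


# Made-up preference rows for illustration.
rows = [
    {"prompt": "2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Capital of France?", "chosen": "Paris", "rejected": "Paris"},
    {"prompt": "Say hi", "chosen": "Hello!", "rejected": "   "},
]

weak_indices = find_weak_pairs(rows)
print(f"{len(weak_indices)} of {len(rows)} pairs carry no preference signal: {weak_indices}")
```

This only catches degenerate pairs; it cannot tell whether the chosen response is genuinely better, which still needs manual review or a stronger judge.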
+
+**Issue: PPO training unstable**
+
+Increase the KL coefficient and tighten the clipping range:
+```python
+config = PPOConfig(
+    kl_coef=0.1,  # Increase from 0.05
+    cliprange=0.1  # Reduce from 0.2
+)
+```
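
To illustrate what a smaller `cliprange` does, the sketch below evaluates the standard PPO clipped surrogate for a single sample in plain Python. `clipped_objective` is a hypothetical helper written for this page, and the ratio and advantage values are made up; it is not the TRL training loop.

```python
def clipped_objective(ratio, advantage, cliprange=0.2):
    """PPO clipped surrogate for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - cliprange, min(ratio, 1.0 + cliprange))
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy is 1.5x more likely to produce a well-rewarded sample.
wide = clipped_objective(ratio=1.5, advantage=1.0, cliprange=0.2)
narrow = clipped_objective(ratio=1.5, advantage=1.0, cliprange=0.1)

# A badly rewarded sample whose probability has already dropped by half.
penalized = clipped_objective(ratio=0.5, advantage=-1.0, cliprange=0.2)

print(wide, narrow, penalized)
```

With the narrower range, the objective stops rewarding the policy sooner once it moves away from the behavior that generated the samples, which tends to make each update smaller and steadier at the cost of slower progress.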
+
+## Advanced topics
+
+**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
+
+**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
+
+**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
+
+**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
+
+**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
+
+## Hardware requirements
+
+- **GPU**: NVIDIA (CUDA required)
+- **VRAM**: Depends on model and method
+  - SFT 7B: 16GB (with LoRA)
+  - DPO 7B: 24GB (stores reference model)
+  - PPO 7B: 40GB (policy + reward model)
+  - GRPO 7B: 24GB (more memory efficient)
+- **Multi-GPU**: Supported via `accelerate`
+- **Mixed precision**: BF16 recommended (A100/H100)
+
+**Memory optimization**:
+- Use LoRA/QLoRA for all methods
+- Enable gradient checkpointing
+- Use smaller batch sizes with gradient accumulation
+
+## Resources
+
+- Docs: https://huggingface.co/docs/trl/
+- GitHub: https://github.com/huggingface/trl
+- Papers:
+  - "Training language models to follow instructions with human feedback" (InstructGPT, 2022)
+  - "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (DPO, 2023)
+  - "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (introduces GRPO, 2024)
+- Examples: https://github.com/huggingface/trl/tree/main/examples/scripts
--- a/website/docs/user-guide/skills/optional/mlops/mlops-training-unsloth.md
+++ b/website/docs/user-guide/skills/optional/mlops/mlops-training-unsloth.md
@ -0,0 +1,98 @@
+---
+title: "Unsloth — Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM"
+sidebar_label: "Unsloth"
+description: "Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Unsloth
+
+Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM.
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/mlops/unsloth` |
+| Path | `optional-skills/mlops/training/unsloth` |
+| Version | `1.0.0` |
+| Author | Orchestra Research |
+| License | MIT |
+| Dependencies | `unsloth`, `torch`, `transformers`, `trl`, `datasets`, `peft` |
+| Platforms | linux, macos |
+| Tags | `Fine-Tuning`, `Unsloth`, `Fast Training`, `LoRA`, `QLoRA`, `Memory-Efficient`, `Optimization`, `Llama`, `Mistral`, `Gemma`, `Qwen` |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# Unsloth Skill
+
+Comprehensive assistance with Unsloth development, generated from the official documentation.
+
+## When to Use This Skill
+
+This skill should be triggered when:
+- Working with Unsloth
+- Asking about Unsloth features or APIs
+- Implementing Unsloth solutions
+- Debugging Unsloth code
+- Learning Unsloth best practices
+
+## Quick Reference
+
+### Common Patterns
+
+*Quick reference patterns will be added as you use the skill.*
+
+## Reference Files
+
+This skill includes comprehensive documentation in `references/`:
+
+- **llms-txt.md** - Llms-Txt documentation
+
+Use `view` to read specific reference files when detailed information is needed.
+
+## Working with This Skill
+
+### For Beginners
+Start with the getting_started or tutorials reference files for foundational concepts.
+
+### For Specific Features
+Use the appropriate category reference file (api, guides, etc.) for detailed information.
+
+### For Code Examples
+The quick reference section above contains common patterns extracted from the official docs.
+
+## Resources
+
+### references/
+Organized documentation extracted from official sources. These files contain:
+- Detailed explanations
+- Code examples with language annotations
+- Links to original documentation
+- Table of contents for quick navigation
+
+### scripts/
+Add helper scripts here for common automation tasks.
+
+### assets/
+Add templates, boilerplate, or example projects here.
+
+## Notes
+
+- This skill was automatically generated from official documentation
+- Reference files preserve the structure and examples from source docs
+- Code examples include language detection for better syntax highlighting
+- Quick reference patterns are extracted from common usage examples in the docs
+
+## Updating
+
+To refresh this skill with updated documentation:
+1. Re-run the scraper with the same configuration
+2. The skill will be rebuilt with the latest information