# SFT Training Guide

Complete guide to Supervised Fine-Tuning (SFT) with TRL for instruction tuning and task-specific fine-tuning.

## Overview

SFT trains models on input-output pairs to minimize cross-entropy loss over the target tokens. Use it for:

- Instruction following
- Task-specific fine-tuning
- Chatbot training
- Domain adaptation
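The objective is the average negative log-likelihood of the target tokens. A toy sketch in pure Python (the probabilities are made up for illustration):

```python
import math

def token_cross_entropy(target_probs):
    """Average negative log-likelihood of the target tokens --
    the quantity SFT minimizes (toy illustration, not real model output)."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

# If the model assigns probability 0.5 to each correct next token:
print(token_cross_entropy([0.5, 0.5, 0.5]))  # ln(2) ~= 0.693
```

A perfectly confident model (probability 1.0 on every target token) would score a loss of exactly 0.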
## Dataset Formats

### Format 1: Prompt-Completion

```json
[
  {
    "prompt": "What is the capital of France?",
    "completion": "The capital of France is Paris."
  }
]
```

### Format 2: Conversational (ChatML)

```json
[
  {
    "messages": [
      {"role": "user", "content": "What is Python?"},
      {"role": "assistant", "content": "Python is a programming language."}
    ]
  }
]
```

### Format 3: Text-only

```json
[
  {"text": "User: Hello\nAssistant: Hi! How can I help?"}
]
```
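The three formats are distinguished by their keys, so a dataset file can be sanity-checked before training. A minimal sketch (`detect_format` is a helper invented here, not part of TRL):

```python
import json

# Inline stand-in for a prompt-completion JSON file (Format 1)
raw = ('[{"prompt": "What is the capital of France?", '
       '"completion": "The capital of France is Paris."}]')
examples = json.loads(raw)

def detect_format(example: dict) -> str:
    """Classify a record as one of the three TRL dataset formats by its keys."""
    if "messages" in example:
        return "conversational"
    if "prompt" in example and "completion" in example:
        return "prompt-completion"
    if "text" in example:
        return "text-only"
    raise ValueError(f"Unrecognized keys: {sorted(example)}")

print(detect_format(examples[0]))  # prompt-completion
```

Running this over every record catches mixed-format files early, before a confusing trainer error.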
## Basic Training

```python
from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

# Load dataset
dataset = load_dataset("trl-lib/Capybara", split="train")

# Configure
config = SFTConfig(
    output_dir="Qwen2.5-SFT",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,
    save_strategy="epoch",
)

# Train
trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
## Chat Templates

For datasets in the conversational ("messages") format, the tokenizer's chat template is applied automatically:

```python
trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,  # messages format
    tokenizer=tokenizer,
    # Chat template applied automatically
)
```

Or apply it manually and train on the resulting text field:

```python
def format_chat(example):
    messages = example["messages"]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return {"text": text}

dataset = dataset.map(format_chat)
```
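To make the template step concrete, here is a hand-rolled ChatML-style renderer that approximates what `apply_chat_template` produces for ChatML models (illustrative only — real templates are model-specific and come from the tokenizer):

```python
def render_chatml(messages):
    # Hand-rolled ChatML-style rendering: each turn is wrapped in
    # <|im_start|>role ... <|im_end|> markers (real templates vary per model)
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

msgs = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language."},
]
print(render_chatml(msgs))
```

In practice, always prefer the tokenizer's own template so training-time formatting matches inference-time formatting exactly.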
## Packing for Efficiency

Packing concatenates multiple short examples into one fixed-length sequence, so fewer tokens are wasted on padding and GPU utilization improves:

```python
config = SFTConfig(
    packing=True,              # Enable packing
    max_seq_length=2048,
    dataset_text_field="text",
)
```

- **Benefits**: 2-3× faster training on short-sequence datasets
- **Trade-off**: slightly more complex batching; examples can be split across chunk boundaries
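A simplified sketch of the idea — greedily concatenate tokenized examples and cut fixed-length chunks (this is an illustration of the concept, not TRL's actual implementation):

```python
def pack_sequences(token_lists, max_seq_length):
    """Greedy packing: concatenate tokenized examples into a running buffer
    and emit fixed-length chunks. A leftover partial chunk is dropped."""
    buffer, packed = [], []
    for tokens in token_lists:
        buffer.extend(tokens)
        while len(buffer) >= max_seq_length:
            packed.append(buffer[:max_seq_length])
            buffer = buffer[max_seq_length:]
    return packed

# Three short "examples" packed into chunks of length 4:
chunks = pack_sequences([[1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3, 3]], 4)
print(chunks)  # [[1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 3]]
```

Note how example boundaries fall mid-chunk — this is the trade-off mentioned above.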
## Multi-GPU Training

Launch with Accelerate:

```bash
accelerate launch --num_processes 4 train_sft.py
```

Pair a small per-device batch with gradient accumulation to keep the effective batch size up:

```python
config = SFTConfig(
    output_dir="model-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
)
```
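The effective batch size is the product of the three knobs above. For the example configuration (assuming 4 processes from the `accelerate launch` command):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_processes = 4  # from: accelerate launch --num_processes 4

# One optimizer step sees this many examples in total:
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)
print(effective_batch_size)  # 64
```

When scaling the number of GPUs up or down, adjust `gradient_accumulation_steps` in the opposite direction to hold the effective batch size constant.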
## LoRA Fine-Tuning

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    peft_config=lora_config,  # Add LoRA adapters
)
```
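LoRA keeps the base weights frozen and trains two small low-rank matrices per targeted linear layer, so the trainable parameter count is easy to estimate (`lora_param_count` is a helper invented here for illustration):

```python
def lora_param_count(d_in, d_out, r):
    # Each adapted linear layer gains two trainable low-rank matrices:
    # A with shape (r, d_in) and B with shape (d_out, r)
    return r * d_in + d_out * r

# A hypothetical 4096x4096 projection with r=16:
print(lora_param_count(4096, 4096, 16))  # 131072
```

Compare that to the 16.7M parameters of the frozen 4096×4096 matrix itself — under 1% of the layer is trained, which is why LoRA fits large models on modest hardware.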
## Hyperparameters

| Model Size | Learning Rate | Batch Size | Epochs |
|------------|---------------|------------|--------|
| <1B        | 5e-5          | 8-16       | 1-3    |
| 1-7B       | 2e-5          | 4-8        | 1-2    |
| 7-13B      | 1e-5          | 2-4        | 1      |
| 13B+       | 5e-6          | 1-2        | 1      |
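The learning-rate column can be read as a simple lookup — larger models want smaller learning rates. A sketch encoding the table (a heuristic starting point, not a rule; `suggested_lr` is a name invented here):

```python
def suggested_lr(n_params_billion: float) -> float:
    """Starting learning rate by model size, following the table above."""
    if n_params_billion < 1:
        return 5e-5
    if n_params_billion < 7:
        return 2e-5
    if n_params_billion < 13:
        return 1e-5
    return 5e-6

print(suggested_lr(0.5))  # 5e-05
```

Treat these as defaults to sweep around, not final values.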
## References

- TRL SFTTrainer docs: https://huggingface.co/docs/trl/sft_trainer
- Example scripts: https://github.com/huggingface/trl/tree/main/examples/scripts