# PEFT Troubleshooting Guide

## Installation Issues

### bitsandbytes CUDA Error

**Error**: `CUDA Setup failed despite GPU being available`

**Fix**:
```bash
# Check CUDA version
nvcc --version

# Reinstall bitsandbytes so the wheel matches your CUDA runtime
pip uninstall bitsandbytes
pip install bitsandbytes --no-cache-dir

# Or compile from source for a specific CUDA version
# (this Makefile target applies to older releases; newer releases build with
# CMake, so check the project's installation docs for your version)
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x  # Adjust for your CUDA
pip install .
```

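The `CUDA_VERSION=118` value above is the CUDA release with the dot removed. As a minimal sketch, the tag can be derived from the text printed by `nvcc --version`; the helper name `cuda_version_tag` and the sample string are illustrative assumptions, not part of bitsandbytes.

```python
import re
from typing import Optional


def cuda_version_tag(nvcc_output: str) -> Optional[str]:
    """Return a compact tag such as '118' from `nvcc --version` text, or None."""
    # nvcc prints a line containing "release <major>.<minor>"
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"{major}{minor}"


# Illustrative sample line, not captured from a real machine
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(cuda_version_tag(sample))
```

Feed it the real output (for example via `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`) to get the value for your system.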
### Triton Import Error

**Error**: `ModuleNotFoundError: No module named 'triton'`

**Fix**:
```bash
# Install triton (Linux only)
pip install triton

# Windows: Triton is not supported, use the CUDA backend instead.
# Note: CUDA_VISIBLE_DEVICES only selects which GPU is visible (GPU 0 here);
# it does not disable Triton.
export CUDA_VISIBLE_DEVICES=0
```

### PEFT Version Conflicts

**Error**: `AttributeError: 'LoraConfig' object has no attribute 'use_dora'`

**Fix**:
```bash
# Upgrade to latest PEFT (quote the specifier so the shell does not treat ">" as a redirect)
pip install --upgrade "peft>=0.13.0"

# Check version
python -c "import peft; print(peft.__version__)"
```

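The same check can be scripted so it fails early, before a training job starts. This is a rough sketch using only the standard library; the helper names are hypothetical, and the simple parser only compares the leading numeric components (it is not a full PEP 440 implementation).

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.13.0' or '0.13.1.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    if installed is None:
        return False
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum(installed_version("peft"), "0.13.0"))
```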
## Training Issues

### CUDA Out of Memory

**Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory`

**Solutions**:

1. **Enable gradient checkpointing**:
```python
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
```

2. **Reduce batch size**:
```python
TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16  # Maintain effective batch size
)
```

3. **Use QLoRA**:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
```

4. **Lower LoRA rank**:
```python
LoraConfig(r=8)  # Instead of r=16 or higher
```

5. **Target fewer modules**:
```python
target_modules=["q_proj", "v_proj"]  # Instead of all-linear
```

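To see how much the last two options change the adapter size, the trainable parameter count can be estimated by hand: a LoRA pair on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below assumes a hypothetical 32-layer decoder with 4096-wide `q_proj` and `v_proj`; substitute your model's real shapes.

```python
def lora_param_count(r, shapes):
    """Sum r * (in_features + out_features) over every adapted linear layer."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)


# Hypothetical model: 32 layers, q_proj and v_proj are both 4096 x 4096
hidden, layers = 4096, 32
qv_only = [(hidden, hidden)] * 2 * layers

print(lora_param_count(8, qv_only))   # 4194304
print(lora_param_count(16, qv_only))  # 8388608
```

This counts adapter weights only. Gradients, optimizer state, and activations also contribute to memory, so treat the number as a lower bound on what a change in `r` or `target_modules` saves.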
### Loss Not Decreasing

**Problem**: Training loss stays flat or increases.

**Solutions**:

1. **Check learning rate**:
```python
# Start lower
TrainingArguments(learning_rate=1e-4)  # Not 2e-4 or higher
```

2. **Verify adapter is active**:
```python
model.print_trainable_parameters()
# Should show >0 trainable params

# Check adapter applied
print(model.peft_config)
```

3. **Check data formatting**:
```python
# Verify tokenization
sample = dataset[0]
decoded = tokenizer.decode(sample["input_ids"])
print(decoded)  # Should look correct
```

4. **Increase rank**:
```python
LoraConfig(r=32, lora_alpha=64)  # More capacity
```

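The last suggestion raises `lora_alpha` together with `r`. LoRA multiplies the adapter update by `lora_alpha / r`, so changing only one of the two also changes the effective update size. A small sketch of that arithmetic follows; the function name is hypothetical, and the `use_rslora` branch assumes the rank-stabilized variant, which divides by the square root of `r` instead.

```python
import math


def lora_scaling(r, lora_alpha, use_rslora=False):
    """Return the factor applied to the LoRA update for a given rank and alpha."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


print(lora_scaling(16, 32))  # 2.0
print(lora_scaling(32, 64))  # 2.0, same scale after doubling both
print(lora_scaling(32, 32))  # 1.0, rank raised but alpha left unchanged
```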
### NaN Loss

**Error**: `Loss is NaN`

**Fix**:
```python
# Use bf16 instead of fp16
TrainingArguments(bf16=True, fp16=False)

# If you must use fp16, the Trainer already applies dynamic loss scaling;
# fp16_full_eval only runs evaluation in half precision
TrainingArguments(fp16=True, fp16_full_eval=True)

# Lower learning rate
TrainingArguments(learning_rate=5e-5)

# Check for data issues. Token IDs are integers and cannot be NaN, so look
# for batches with no supervised tokens (every label is -100), which can
# produce a NaN loss
for batch in dataloader:
    if (batch["labels"] != -100).sum() == 0:
        print("Batch has no supervised labels!")
```

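Knowing when the loss first became non-finite can help narrow the cause: a NaN at the very first step often points at data or dtype problems, while a later one often points at divergence. The helper below is a sketch that scans a plain list of logged loss values; how you collect that list (for example from your trainer's log history) is left as an assumption.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Illustrative values, not real training output
logged = [2.31, 2.05, 1.98, float("nan"), float("nan")]
print(first_non_finite(logged))  # 3
```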
### Adapter Not Training

**Problem**: `trainable params: 0` or model not updating.

**Fix**:
```python
# Verify LoRA applied to correct modules
for name, module in model.named_modules():
    if "lora" in name.lower():
        print(f"Found LoRA: {name}")

# Check target_modules match model architecture
from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING.get(model.config.model_type))

# Ensure model in training mode
model.train()

# Check requires_grad
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"Trainable: {name}")
```

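A common cause of `trainable params: 0` is a `target_modules` list whose names do not occur in the model. The sketch below approximates the suffix matching used when `target_modules` is a list of names: a module matches if its full name equals a target or ends with `.<target>`. It is a simplification (regex strings and `all-linear` are not handled), and the module names are a hypothetical example; take the real ones from `model.named_modules()`.

```python
def matched_modules(module_names, target_modules):
    """Return the module names that a list-style target_modules would select."""
    hits = []
    for name in module_names:
        for target in target_modules:
            if name == target or name.endswith("." + target):
                hits.append(name)
                break
    return hits


# Hypothetical module names for one decoder layer
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.up_proj",
]

print(matched_modules(names, ["q_proj", "v_proj"]))
print(matched_modules(names, ["query", "value"]))  # [] means nothing to train
```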
## Loading Issues

### Adapter Loading Fails

**Error**: `ValueError: Can't find adapter weights`

**Fix**:
```python
# Check adapter files exist
import os
print(os.listdir("./adapter-path"))
# Should contain: adapter_config.json, adapter_model.safetensors

# Load with correct structure
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Check config
config = PeftConfig.from_pretrained("./adapter-path")
print(config)

# Load base model first
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, "./adapter-path")
```

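The directory listing above can be turned into an explicit check that reports what is missing. This sketch assumes the usual file names: `adapter_config.json` plus either `adapter_model.safetensors` or the older `adapter_model.bin`. The function name is hypothetical, and the temporary directory only stands in for a real adapter path.

```python
import tempfile
from pathlib import Path

CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(adapter_dir):
    """Return a list describing which expected adapter files are absent."""
    root = Path(adapter_dir)
    missing = []
    if not (root / CONFIG_FILE).is_file():
        missing.append(CONFIG_FILE)
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


# Demo on a throwaway directory that contains only the config file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / CONFIG_FILE).write_text("{}", encoding="utf-8")
    print(missing_adapter_files(tmp))
```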
### Base Model Mismatch

**Error**: `RuntimeError: size mismatch`

**Fix**:
```python
# Ensure base model matches adapter
from peft import PeftConfig
from transformers import AutoModelForCausalLM

config = PeftConfig.from_pretrained("./adapter-path")
print(f"Base model: {config.base_model_name_or_path}")

# Load exact same base model
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
```

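If PEFT is not importable in the environment where you are debugging, the recorded base model can also be read straight from `adapter_config.json`. The following is a sketch with hypothetical helper names and made-up model identifiers.

```python
import json
import tempfile
from pathlib import Path


def recorded_base_model(adapter_dir):
    """Read base_model_name_or_path from the adapter's config file."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh).get("base_model_name_or_path")


def base_model_matches(adapter_dir, loaded_name):
    """Return True if the adapter was trained against the given base model name."""
    return recorded_base_model(adapter_dir) == loaded_name


# Demo with a throwaway config; "org/base-7b" is a made-up identifier
with tempfile.TemporaryDirectory() as tmp:
    cfg = {"base_model_name_or_path": "org/base-7b", "r": 8}
    (Path(tmp) / "adapter_config.json").write_text(json.dumps(cfg), encoding="utf-8")
    print(recorded_base_model(tmp))
    print(base_model_matches(tmp, "org/other-13b"))
```

A matching name does not rule out a size mismatch, for example when the tokenizer was extended and the embeddings resized during training, so compare the shapes in the error message as well.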
### Safetensors vs PyTorch Format

**Error**: `ValueError: We couldn't connect to 'https://huggingface.co'`

**Fix**:
```python
# Force local loading
model = PeftModel.from_pretrained(
    base_model,
    "./adapter-path",
    local_files_only=True
)

# Or specify format
model.save_pretrained("./adapter", safe_serialization=True)   # safetensors
model.save_pretrained("./adapter", safe_serialization=False)  # pytorch
```

## Inference Issues

### Slow Generation

**Problem**: Inference much slower than expected.

**Solutions**:

1. **Merge adapter for deployment**:
```python
merged_model = model.merge_and_unload()
# No adapter overhead during inference
```

2. **Use optimized inference engine**:
```python
from vllm import LLM
llm = LLM(model="./merged-model", dtype="half")
```

3. **Enable Flash Attention**:
```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # FlashAttention-2 requires fp16 or bf16
    attn_implementation="flash_attention_2"
)
```

### Output Quality Issues

**Problem**: Fine-tuned model produces worse outputs.

**Solutions**:

1. **Check evaluation without adapter**:
```python
with model.disable_adapter():
    base_output = model.generate(**inputs)
# Compare with adapter output
```

2. **Use greedy or low-temperature decoding during eval**:
```python
model.generate(**inputs, do_sample=False)  # Greedy decoding (temperature is ignored)
model.generate(**inputs, do_sample=True, temperature=0.1)  # Or low-temperature sampling
```

3. **Retrain with more data**:
```python
# Increase training samples
# Use higher quality data
# Train for more epochs
```

### Wrong Adapter Active

**Problem**: Model using wrong adapter or no adapter.

**Fix**:
```python
# Check active adapters
print(model.active_adapters)

# Explicitly set adapter
model.set_adapter("your-adapter-name")

# List all adapters
print(model.peft_config.keys())
```

## QLoRA Specific Issues

### Quantization Errors

**Error**: `RuntimeError: mat1 and mat2 shapes cannot be multiplied`

**Fix**:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Ensure compute dtype matches
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # Match model dtype
    bnb_4bit_quant_type="nf4"
)

# Load with correct dtype
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16
)
```

### QLoRA OOM

**Problem**: Out of memory even with 4-bit quantization.

**Fix**:
```python
# Enable double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True  # Further memory reduction
)

# Use offloading
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    max_memory={0: "20GB", "cpu": "100GB"}
)
```

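Before tuning `max_memory`, it helps to know roughly how much the weights alone should occupy. The back-of-the-envelope sketch below multiplies the parameter count by the bits per weight; the 7-billion-parameter figure is a hypothetical example. It ignores quantization constants, layers kept in higher precision, activations, optimizer state, and the CUDA context, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 7B-parameter model
print(weight_memory_gb(7e9, 16))  # 14.0
print(weight_memory_gb(7e9, 4))   # 3.5
```

If the estimate is far below the available GPU memory and training still runs out, the pressure is more likely coming from activations (sequence length, batch size) than from the quantized weights.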
### QLoRA Merge Fails

**Error**: `RuntimeError: expected scalar type BFloat16 but found Float`

**Fix**:
```python
# Dequantize before merging
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load in higher precision for merging
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Not quantized
    device_map="auto"
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "./qlora-adapter")

# Now merge
merged = model.merge_and_unload()
```

## Multi-Adapter Issues

### Adapter Conflict

**Error**: `ValueError: Adapter with name 'default' already exists`

**Fix**:
```python
# Use unique names
model.load_adapter("./adapter1", adapter_name="task1")
model.load_adapter("./adapter2", adapter_name="task2")

# Or delete existing
model.delete_adapter("default")
```

### Mixed Precision Adapters

**Problem**: Adapters trained with different dtypes.

**Fix**:
```python
# Convert adapter precision
model = PeftModel.from_pretrained(base_model, "./adapter")
model = model.to(torch.bfloat16)

# Or load the base model in the target dtype, then attach the adapter
# (torch_dtype is a base-model loading argument, not a PeftModel one)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base_model, "./adapter")
```

## Performance Optimization

### Memory Profiling

```python
import torch

def print_memory():
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1e9
        reserved = torch.cuda.memory_reserved() / 1e9
        print(f"Allocated: {allocated:.2f}GB, Reserved: {reserved:.2f}GB")

# Profile during training
print_memory()  # Before
model.train()
loss = model(**batch).loss
loss.backward()
print_memory()  # After
```

### Speed Profiling

```python
import time
import torch

def benchmark_generation(model, tokenizer, prompt, n_runs=5):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Warmup
    model.generate(**inputs, max_new_tokens=10)
    torch.cuda.synchronize()

    # Benchmark
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        outputs = model.generate(**inputs, max_new_tokens=100)
        torch.cuda.synchronize()
        times.append(time.perf_counter() - start)

    tokens = outputs.shape[1] - inputs.input_ids.shape[1]
    avg_time = sum(times) / len(times)
    print(f"Speed: {tokens/avg_time:.2f} tokens/sec")

# Compare adapter vs merged
benchmark_generation(adapter_model, tokenizer, "Hello")
benchmark_generation(merged_model, tokenizer, "Hello")
```

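The benchmark above reports a mean over a handful of runs, which a single slow run can skew. One possible refinement, sketched here with a hypothetical helper, is to summarize the collected timings with the median and the spread; the timing values are illustrative, not measurements.

```python
import statistics


def throughput_summary(times, new_tokens):
    """Summarize per-run generation times (seconds) for a fixed token count."""
    median = statistics.median(times)
    return {
        "median_s": median,
        "tokens_per_s": new_tokens / median,
        "spread_s": max(times) - min(times),
    }


# Illustrative timings with one slow outlier
summary = throughput_summary([2.0, 2.5, 4.0], new_tokens=100)
print(summary)
```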
## Getting Help

1. **Check PEFT GitHub Issues**: https://github.com/huggingface/peft/issues
2. **HuggingFace Forums**: https://discuss.huggingface.co/
3. **PEFT Documentation**: https://huggingface.co/docs/peft

### Debugging Template

When reporting issues, include:

```python
# System info
import peft
import transformers
import torch

print(f"PEFT: {peft.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")

# Config
print(model.peft_config)
model.print_trainable_parameters()
```

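If one of the imports in the template fails, which is itself useful information for a bug report, a variant that reads versions from package metadata still produces output. This is a sketch; the package list is an assumption (`accelerate` and `bitsandbytes` are included because they are common companions, not because the template above mentions them).

```python
import platform
from importlib import metadata

# Assumed list of relevant packages; adjust to your setup
PACKAGES = ("peft", "transformers", "torch", "accelerate", "bitsandbytes")


def collect_versions(packages=PACKAGES):
    """Return a mapping of package name to version, without importing the packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```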