mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

teknium1 ab0f4126cf fix: restore all removed bundled skills + fix skills sync system

- Restored 21 skills removed in commits 757d012 and 740dd92:
  accelerate, audiocraft, code-review, faiss, flash-attention, gguf,
  grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft,
  pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion,
  tensorrt-llm, torchtitan, trl-fine-tuning, whisper

- Rewrote sync_skills() with proper update semantics:
  * New skills (not in manifest): copied to user dir
  * Existing skills (in manifest + on disk): updated via hash comparison
  * User-deleted skills (in manifest, not on disk): respected, not re-added
  * Stale manifest entries (removed from bundled): cleaned from manifest

- Added sync_skills() to CLI startup (cmd_chat) and gateway startup
  (start_gateway) — previously only ran during 'hermes update'

- Updated cmd_update output to show new/updated/cleaned counts

- Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh
  install, user deletion respect, update detection, stale cleanup, and
  name collision handling

75 bundled skills total. 2002 tests pass.

2026-03-06 15:57:30 -08:00

12 KiB

Raw Blame History

Hyperparameter Tuning with PyTorch Lightning

Integration with Tuning Frameworks

Lightning integrates seamlessly with popular hyperparameter tuning libraries.

1. Ray Tune Integration

Installation:

pip install ray[tune]
pip install lightning

Basic Ray Tune example:

import lightning as L
from ray import tune
from ray.tune.integration.pytorch_lightning import TuneReportCallback

class LitModel(L.LightningModule):
    def __init__(self, lr, batch_size):
        super().__init__()
        self.lr = lr
        self.batch_size = batch_size
        self.model = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 1))

    def training_step(self, batch, batch_idx):
        loss = self.model(batch).mean()
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        val_loss = self.model(batch).mean()
        self.log('val_loss', val_loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

def train_fn(config):
    """Training function for Ray Tune."""
    model = LitModel(lr=config["lr"], batch_size=config["batch_size"])

    # Add callback to report metrics to Tune
    trainer = L.Trainer(
        max_epochs=10,
        callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")]
    )

    trainer.fit(model, train_loader, val_loader)

# Define search space
config = {
    "lr": tune.loguniform(1e-5, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128])
}

# Run hyperparameter search
analysis = tune.run(
    train_fn,
    config=config,
    num_samples=20,  # 20 trials
    resources_per_trial={"gpu": 1}
)

# Best hyperparameters
best_config = analysis.get_best_config(metric="loss", mode="min")
print(f"Best config: {best_config}")

Advanced: Population-Based Training (PBT):

from ray.tune.schedulers import PopulationBasedTraining

# PBT scheduler
scheduler = PopulationBasedTraining(
    time_attr='training_iteration',
    metric='val_loss',
    mode='min',
    perturbation_interval=5,  # Perturb every 5 epochs
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-1),
        "batch_size": [16, 32, 64, 128]
    }
)

analysis = tune.run(
    train_fn,
    config=config,
    num_samples=8,  # Population size
    scheduler=scheduler,
    resources_per_trial={"gpu": 1}
)

2. Optuna Integration

Installation:

pip install optuna
pip install optuna-integration

Optuna example:

import optuna
from optuna.integration import PyTorchLightningPruningCallback

def objective(trial):
    # Suggest hyperparameters
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-1)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
    n_layers = trial.suggest_int('n_layers', 1, 3)
    hidden_size = trial.suggest_int('hidden_size', 64, 512, step=64)

    # Create model
    model = LitModel(lr=lr, n_layers=n_layers, hidden_size=hidden_size)

    # Pruning callback (early stopping for bad trials)
    pruning_callback = PyTorchLightningPruningCallback(trial, monitor="val_loss")

    trainer = L.Trainer(
        max_epochs=20,
        callbacks=[pruning_callback],
        enable_progress_bar=False,
        logger=False
    )

    trainer.fit(model, train_loader, val_loader)

    return trainer.callback_metrics["val_loss"].item()

# Create study
study = optuna.create_study(
    direction='minimize',
    pruner=optuna.pruners.MedianPruner()  # Prune bad trials early
)

# Optimize
study.optimize(objective, n_trials=50, timeout=3600)

# Best params
print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_value}")

# Visualization
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()

Optuna with distributed training:

import optuna

# Shared database for distributed optimization
storage = optuna.storages.RDBStorage(
    url='postgresql://user:pass@localhost/optuna'
)

study = optuna.create_study(
    study_name='distributed_study',
    storage=storage,
    load_if_exists=True,
    direction='minimize'
)

# Run on multiple machines
study.optimize(objective, n_trials=50)

3. Weights & Biases (WandB) Sweeps

Installation:

pip install wandb

WandB sweep config (sweep.yaml):

program: train.py
method: bayes
metric:
  name: val_loss
  goal: minimize
parameters:
  lr:
    distribution: log_uniform_values
    min: 0.00001
    max: 0.1
  batch_size:
    values: [16, 32, 64, 128]
  optimizer:
    values: ['adam', 'sgd', 'adamw']
  dropout:
    distribution: uniform
    min: 0.0
    max: 0.5

Training script (train.py):

import wandb
import lightning as L
from lightning.pytorch.loggers import WandbLogger

def train():
    # Initialize wandb
    wandb.init()
    config = wandb.config

    # Create model with sweep params
    model = LitModel(
        lr=config.lr,
        batch_size=config.batch_size,
        optimizer=config.optimizer,
        dropout=config.dropout
    )

    # WandB logger
    wandb_logger = WandbLogger(project='hyperparameter-sweep')

    trainer = L.Trainer(
        max_epochs=20,
        logger=wandb_logger
    )

    trainer.fit(model, train_loader, val_loader)

if __name__ == '__main__':
    train()

Launch sweep:

# Initialize sweep
wandb sweep sweep.yaml
# Output: wandb: Created sweep with ID: abc123

# Run agent (can run on multiple machines)
wandb agent your-entity/your-project/abc123

4. Hyperopt Integration

Installation:

pip install hyperopt

Hyperopt example:

from hyperopt import hp, fmin, tpe, Trials

def objective(params):
    model = LitModel(
        lr=params['lr'],
        batch_size=int(params['batch_size']),
        hidden_size=int(params['hidden_size'])
    )

    trainer = L.Trainer(
        max_epochs=10,
        enable_progress_bar=False,
        logger=False
    )

    trainer.fit(model, train_loader, val_loader)

    # Return loss (minimize)
    return trainer.callback_metrics["val_loss"].item()

# Define search space
space = {
    'lr': hp.loguniform('lr', np.log(1e-5), np.log(1e-1)),
    'batch_size': hp.quniform('batch_size', 16, 128, 16),
    'hidden_size': hp.quniform('hidden_size', 64, 512, 64)
}

# Optimize
trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,  # Tree-structured Parzen Estimator
    max_evals=50,
    trials=trials
)

print(f"Best hyperparameters: {best}")

Built-In Lightning Tuning

Auto Learning Rate Finder

class LitModel(L.LightningModule):
    def __init__(self, lr=1e-3):
        super().__init__()
        self.lr = lr
        self.model = nn.Linear(10, 1)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

    def training_step(self, batch, batch_idx):
        loss = self.model(batch).mean()
        return loss

# Find optimal learning rate
model = LitModel()
trainer = L.Trainer(auto_lr_find=True)

# This runs LR finder before training
trainer.tune(model, train_loader)

# Or manually
from lightning.pytorch.tuner import Tuner
tuner = Tuner(trainer)
lr_finder = tuner.lr_find(model, train_loader)

# Plot results
fig = lr_finder.plot(suggest=True)
fig.show()

# Get suggested LR
suggested_lr = lr_finder.suggestion()
print(f"Suggested LR: {suggested_lr}")

# Update model
model.lr = suggested_lr

# Train with optimal LR
trainer.fit(model, train_loader)

Auto Batch Size Finder

class LitModel(L.LightningModule):
    def __init__(self, batch_size=32):
        super().__init__()
        self.batch_size = batch_size
        self.model = nn.Linear(10, 1)

    def train_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size)

model = LitModel()
trainer = L.Trainer(auto_scale_batch_size='binsearch')

# Find optimal batch size
trainer.tune(model)

print(f"Optimal batch size: {model.batch_size}")

# Train with optimal batch size
trainer.fit(model, train_loader)

Advanced Tuning Strategies

1. Multi-Fidelity Optimization (Successive Halving)

from ray.tune.schedulers import ASHAScheduler

# ASHA: Asynchronous Successive Halving Algorithm
scheduler = ASHAScheduler(
    max_t=100,  # Max epochs
    grace_period=10,  # Min epochs before stopping
    reduction_factor=2  # Halve resources each round
)

analysis = tune.run(
    train_fn,
    config=config,
    num_samples=64,
    scheduler=scheduler,
    resources_per_trial={"gpu": 1}
)

How it works:

Start 64 trials
After 10 epochs, stop bottom 50% (32 trials remain)
After 20 epochs, stop bottom 50% (16 trials remain)
After 40 epochs, stop bottom 50% (8 trials remain)
After 80 epochs, stop bottom 50% (4 trials remain)
Run remaining 4 trials to completion (100 epochs)

2. Bayesian Optimization

from ray.tune.search.bayesopt import BayesOptSearch

search = BayesOptSearch(
    metric="val_loss",
    mode="min"
)

analysis = tune.run(
    train_fn,
    config=config,
    num_samples=50,
    search_alg=search,
    resources_per_trial={"gpu": 1}
)

3. Grid Search

from ray import tune

# Exhaustive grid search
config = {
    "lr": tune.grid_search([1e-5, 1e-4, 1e-3, 1e-2]),
    "batch_size": tune.grid_search([16, 32, 64, 128]),
    "optimizer": tune.grid_search(['adam', 'sgd', 'adamw'])
}

# Total trials: 4 × 4 × 3 = 48
analysis = tune.run(train_fn, config=config)

4. Random Search

config = {
    "lr": tune.loguniform(1e-5, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128]),
    "dropout": tune.uniform(0.0, 0.5),
    "hidden_size": tune.randint(64, 512)
}

# Random sampling
analysis = tune.run(
    train_fn,
    config=config,
    num_samples=100  # 100 random samples
)

Best Practices

1. Start Simple

# Phase 1: Coarse search (fast)
coarse_config = {
    "lr": tune.loguniform(1e-5, 1e-1),
    "batch_size": tune.choice([32, 64])
}
coarse_analysis = tune.run(train_fn, config=coarse_config, num_samples=10, max_epochs=5)

# Phase 2: Fine-tune around best (slow)
best_lr = coarse_analysis.best_config["lr"]
fine_config = {
    "lr": tune.uniform(best_lr * 0.5, best_lr * 2),
    "batch_size": tune.choice([16, 32, 64, 128])
}
fine_analysis = tune.run(train_fn, config=fine_config, num_samples=20, max_epochs=20)

2. Use Checkpointing

def train_fn(config, checkpoint_dir=None):
    model = LitModel(lr=config["lr"])

    trainer = L.Trainer(
        max_epochs=100,
        callbacks=[
            TuneReportCheckpointCallback(
                metrics={"loss": "val_loss"},
                filename="checkpoint",
                on="validation_end"
            )
        ]
    )

    # Resume from checkpoint if exists
    ckpt_path = None
    if checkpoint_dir:
        ckpt_path = os.path.join(checkpoint_dir, "checkpoint")

    trainer.fit(model, train_loader, val_loader, ckpt_path=ckpt_path)

3. Monitor Resource Usage

import GPUtil

def train_fn(config):
    # Before training
    GPUs = GPUtil.getGPUs()
    print(f"GPU memory before: {GPUs[0].memoryUsed} MB")

    # Train
    model = LitModel(lr=config["lr"], batch_size=config["batch_size"])
    trainer.fit(model, train_loader)

    # After training
    GPUs = GPUtil.getGPUs()
    print(f"GPU memory after: {GPUs[0].memoryUsed} MB")

Common Issues

Issue: Trials Running Out of Memory

Solution: Reduce concurrent trials or batch size

analysis = tune.run(
    train_fn,
    config=config,
    resources_per_trial={"gpu": 0.5},  # 2 trials per GPU
    max_concurrent_trials=2  # Limit concurrent trials
)

Issue: Slow Hyperparameter Search

Solution: Use early stopping scheduler

from ray.tune.schedulers import ASHAScheduler

scheduler = ASHAScheduler(
    max_t=100,
    grace_period=5,  # Stop bad trials after 5 epochs
    reduction_factor=3
)

Issue: Can't Reproduce Best Trial

Solution: Set seeds in training function

def train_fn(config):
    L.seed_everything(42, workers=True)
    # Rest of training...

Resources

Ray Tune + Lightning: https://docs.ray.io/en/latest/tune/examples/tune-pytorch-lightning.html
Optuna: https://optuna.readthedocs.io/
WandB Sweeps: https://docs.wandb.ai/guides/sweeps
Lightning Tuner: https://lightning.ai/docs/pytorch/stable/tuning.html

12 KiB Raw Blame History Unescape Escape

Hyperparameter Tuning with PyTorch Lightning

Integration with Tuning Frameworks

1. Ray Tune Integration

2. Optuna Integration

3. Weights & Biases (WandB) Sweeps

4. Hyperopt Integration

Built-In Lightning Tuning

Auto Learning Rate Finder

Auto Batch Size Finder

Advanced Tuning Strategies

1. Multi-Fidelity Optimization (Successive Halving)

2. Bayesian Optimization

3. Grid Search

4. Random Search

Best Practices

1. Start Simple

2. Use Checkpointing

3. Monitor Resource Usage

Common Issues

Issue: Trials Running Out of Memory

Issue: Slow Hyperparameter Search

Issue: Can't Reproduce Best Trial

Resources

12 KiB

Raw Blame History