refactor: reorganize skills into sub-categories

The skills directory was getting disorganized: mlops alone had 40
skills in a flat list, and 12 categories held just one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner.
  skills/mlops/training/axolotl/SKILL.md now shows as category
  'mlops/training' instead of just 'mlops'. Backwards-compatible
  with existing flat structure.
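
  As an illustration (a hypothetical sketch, not the actual
  prompt_builder.py code), the category is everything between the
  skills root and the skill's own directory:

    from pathlib import Path

    def category_for(skill_md: Path, root: Path = Path("skills")) -> str:
        # skills/mlops/training/axolotl/SKILL.md -> "mlops/training"
        # skills/mlops/axolotl/SKILL.md          -> "mlops" (flat layout)
        parts = skill_md.relative_to(root).parts
        return "/".join(parts[:-2])  # drop skill dir and SKILL.md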

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention,
  grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning,
  simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp,
  obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything,
  stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers,
  lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton and near-singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.

# SAELens Reference Documentation
This directory contains comprehensive reference materials for SAELens.
## Contents
- [api.md](api.md) - Complete API reference for SAE, TrainingSAE, and configuration classes
- [tutorials.md](tutorials.md) - Step-by-step tutorials for training and analyzing SAEs
- [papers.md](papers.md) - Key research papers on sparse autoencoders
## Quick Links
- **GitHub Repository**: https://github.com/jbloomAus/SAELens
- **Neuronpedia**: https://neuronpedia.org (browse pre-trained SAE features)
- **HuggingFace SAEs**: Search for tag `saelens`
## Installation
```bash
pip install sae-lens
```
Requirements: Python 3.10+, transformer-lens>=2.0.0
## Basic Usage
```python
from transformer_lens import HookedTransformer
from sae_lens import SAE
# Load model and SAE
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
# Encode activations to sparse features
tokens = model.to_tokens("Hello world")
_, cache = model.run_with_cache(tokens)
activations = cache["resid_pre", 8]
features = sae.encode(activations) # Sparse feature activations
reconstructed = sae.decode(features) # Reconstructed activations
```
## Key Concepts
### Sparse Autoencoders
SAEs decompose dense neural activations into sparse, interpretable features:
- **Encoder**: Maps d_model → d_sae (typically 4-16x expansion)
- **ReLU/TopK**: Enforces sparsity
- **Decoder**: Reconstructs original activations
### Training Loss
`Loss = MSE(original, reconstructed) + L1_coefficient × L1(features)`
### Key Metrics
- **L0**: Average number of active features (target: 50-200)
- **CE Loss Score**: Cross-entropy recovered vs original model (target: 80-95%)
- **Dead Features**: Features that never activate (target: <5%)
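To make these pieces concrete, here is a minimal self-contained sketch of a standard SAE forward pass and training loss (illustrative only; the actual `sae_lens` classes implement this internally):
```python
import torch

d_model, d_sae = 768, 768 * 8  # 8x expansion

W_enc = torch.randn(d_model, d_sae) * 0.01
b_enc = torch.zeros(d_sae)
W_dec = torch.randn(d_sae, d_model) * 0.01
b_dec = torch.zeros(d_model)

def sae_forward(acts, l1_coefficient=8e-5):
    # Encoder + ReLU enforce sparsity
    features = torch.relu((acts - b_dec) @ W_enc + b_enc)
    # Decoder reconstructs the original activations
    reconstructed = features @ W_dec + b_dec
    mse = ((acts - reconstructed) ** 2).mean()
    l1 = features.abs().sum(dim=-1).mean()
    loss = mse + l1_coefficient * l1
    l0 = (features > 0).float().sum(dim=-1).mean()  # avg active features per token
    return loss, l0

loss, l0 = sae_forward(torch.randn(4, 128, d_model))
```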
## Available Pre-trained SAEs
| Release | Model | Description |
|---------|-------|-------------|
| `gpt2-small-res-jb` | GPT-2 Small | Residual stream SAEs |
| `gemma-2b-res` | Gemma 2B | Residual stream SAEs |
| Various | Search HuggingFace | Community-trained SAEs |
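Releases can also be listed programmatically (a sketch; assumes the directory entries expose a `saes_map` of SAE ids):
```python
from sae_lens import get_pretrained_saes_directory

directory = get_pretrained_saes_directory()
for release, entry in directory.items():
    print(release, list(entry.saes_map.keys())[:3])  # first few SAE ids
```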

# SAELens API Reference
## SAE Class
The core class representing a Sparse Autoencoder.
### Loading Pre-trained SAEs
```python
from sae_lens import SAE
# From official releases
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
# From HuggingFace
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="username/repo-name",
    sae_id="path/to/sae",
    device="cuda"
)
# From local disk
sae = SAE.load_from_disk("/path/to/sae", device="cuda")
```
### SAE Attributes
| Attribute | Shape | Description |
|-----------|-------|-------------|
| `W_enc` | [d_in, d_sae] | Encoder weights |
| `W_dec` | [d_sae, d_in] | Decoder weights |
| `b_enc` | [d_sae] | Encoder bias |
| `b_dec` | [d_in] | Decoder bias |
| `cfg` | SAEConfig | Configuration object |
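As a quick sanity check on a loaded SAE (continuing the loading examples above):
```python
assert sae.W_enc.shape == (sae.cfg.d_in, sae.cfg.d_sae)
assert sae.W_dec.shape == (sae.cfg.d_sae, sae.cfg.d_in)
print(tuple(sae.W_enc.shape))  # e.g., (768, 24576) for a 32x GPT-2 SAE
```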
### Core Methods
#### encode()
```python
# Encode activations to sparse features
features = sae.encode(activations)
# Input: [batch, pos, d_in]
# Output: [batch, pos, d_sae]
```
#### decode()
```python
# Reconstruct activations from features
reconstructed = sae.decode(features)
# Input: [batch, pos, d_sae]
# Output: [batch, pos, d_in]
```
#### forward()
```python
# Full forward pass (encode + decode)
reconstructed = sae(activations)
# Returns reconstructed activations
```
#### save_model()
```python
sae.save_model("/path/to/save")
```
---
## SAEConfig
Configuration class for SAE architecture and training context.
### Key Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `d_in` | int | Input dimension (model's d_model) |
| `d_sae` | int | SAE hidden dimension |
| `architecture` | str | "standard", "gated", "jumprelu", "topk" |
| `activation_fn_str` | str | Activation function name |
| `model_name` | str | Source model name |
| `hook_name` | str | Hook point in model |
| `normalize_activations` | str | Normalization method |
| `dtype` | str | Data type |
| `device` | str | Device |
### Accessing Config
```python
print(sae.cfg.d_in) # 768 for GPT-2 small
print(sae.cfg.d_sae) # e.g., 24576 (32x expansion)
print(sae.cfg.hook_name) # e.g., "blocks.8.hook_resid_pre"
```
---
## LanguageModelSAERunnerConfig
Comprehensive configuration for training SAEs.
### Example Configuration
```python
from sae_lens import LanguageModelSAERunnerConfig
cfg = LanguageModelSAERunnerConfig(
    # Model and hook
    model_name="gpt2-small",
    hook_name="blocks.8.hook_resid_pre",
    hook_layer=8,
    d_in=768,
    # SAE architecture
    architecture="standard",  # "standard", "gated", "jumprelu", "topk"
    d_sae=768 * 8,  # 8x expansion
    activation_fn="relu",
    # Training hyperparameters
    lr=4e-4,
    l1_coefficient=8e-5,
    lp_norm=1.0,
    lr_scheduler_name="constant",
    lr_warm_up_steps=500,
    # Sparsity control
    l1_warm_up_steps=1000,
    use_ghost_grads=True,
    feature_sampling_window=1000,
    dead_feature_window=5000,
    dead_feature_threshold=1e-8,
    # Data
    dataset_path="monology/pile-uncopyrighted",
    streaming=True,
    context_size=128,
    # Batch sizes
    train_batch_size_tokens=4096,
    store_batch_size_prompts=16,
    n_batches_in_buffer=64,
    # Training duration
    training_tokens=100_000_000,
    # Logging
    log_to_wandb=True,
    wandb_project="sae-training",
    wandb_log_frequency=100,
    # Checkpointing
    checkpoint_path="checkpoints",
    n_checkpoints=5,
    # Hardware
    device="cuda",
    dtype="float32",
)
```
### Key Parameters Explained
#### Architecture Parameters
| Parameter | Description |
|-----------|-------------|
| `architecture` | SAE type: "standard", "gated", "jumprelu", "topk" |
| `d_sae` | Hidden dimension (or use `expansion_factor`) |
| `expansion_factor` | Alternative to d_sae: d_sae = d_in × expansion_factor |
| `activation_fn` | "relu", "topk", etc. |
| `activation_fn_kwargs` | Dict for activation params (e.g., {"k": 50} for topk) |
#### Sparsity Parameters
| Parameter | Description |
|-----------|-------------|
| `l1_coefficient` | L1 penalty weight (higher = sparser) |
| `l1_warm_up_steps` | Steps to ramp up L1 penalty |
| `use_ghost_grads` | Apply gradients to dead features |
| `dead_feature_threshold` | Activation threshold for "dead" |
| `dead_feature_window` | Steps to check for dead features |
#### Learning Rate Parameters
| Parameter | Description |
|-----------|-------------|
| `lr` | Base learning rate |
| `lr_scheduler_name` | "constant", "cosineannealing", etc. |
| `lr_warm_up_steps` | LR warmup steps |
| `lr_decay_steps` | Steps for LR decay |
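For instance, a minimal sketch using `expansion_factor` in place of `d_sae` (hypothetical values; the remaining model, data, and logging fields are as in the full example above):
```python
cfg = LanguageModelSAERunnerConfig(
    # ... model, data, and logging fields as in the full example ...
    expansion_factor=16,  # d_sae = d_in * 16
    activation_fn="topk",
    activation_fn_kwargs={"k": 50},
    lr=4e-4,
    lr_scheduler_name="cosineannealing",
    lr_warm_up_steps=500,
)
```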
---
## SAETrainingRunner
Main class for executing training.
### Basic Training
```python
from sae_lens import SAETrainingRunner, LanguageModelSAERunnerConfig
cfg = LanguageModelSAERunnerConfig(...)
runner = SAETrainingRunner(cfg)
sae = runner.run()
```
### Accessing Training Metrics
```python
# During training, metrics logged to W&B include:
# - l0: Average active features
# - ce_loss_score: Cross-entropy recovery
# - mse_loss: Reconstruction loss
# - l1_loss: Sparsity loss
# - dead_features: Count of dead features
```
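The same quantities can be computed by hand on held-out activations (a sketch, assuming `sae` and `activations` as in the earlier usage examples):
```python
features = sae.encode(activations)
reconstructed = sae.decode(features)

l0 = (features > 0).float().sum(dim=-1).mean()          # avg active features
mse_loss = ((activations - reconstructed) ** 2).mean()  # reconstruction loss
l1_loss = features.abs().sum(dim=-1).mean()             # sparsity loss
print(f"L0: {l0.item():.1f}  MSE: {mse_loss.item():.6f}")
```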
---
## ActivationsStore
Manages activation collection and batching.
### Basic Usage
```python
from sae_lens import ActivationsStore
store = ActivationsStore.from_sae(
    model=model,
    sae=sae,
    store_batch_size_prompts=8,
    train_batch_size_tokens=4096,
    n_batches_in_buffer=32,
    device="cuda",
)
# get_batch_tokens() returns a batch of tokens; next_batch() yields
# the corresponding activations ready for SAE training
batch_tokens = store.get_batch_tokens()
activations = store.next_batch()
```
---
## HookedSAETransformer
Integration of SAEs with TransformerLens models.
### Basic Usage
```python
from sae_lens import HookedSAETransformer
# Load model with SAE
model = HookedSAETransformer.from_pretrained("gpt2-small")
model.add_sae(sae)
# Run with SAE in the loop
output = model.run_with_saes(tokens, saes=[sae])
# Cache with SAE activations
output, cache = model.run_with_cache_with_saes(tokens, saes=[sae])
```
---
## SAE Architectures
### Standard (ReLU + L1)
```python
cfg = LanguageModelSAERunnerConfig(
    architecture="standard",
    activation_fn="relu",
    l1_coefficient=8e-5,
)
```
### Gated
```python
cfg = LanguageModelSAERunnerConfig(
    architecture="gated",
)
```
### TopK
```python
cfg = LanguageModelSAERunnerConfig(
    architecture="topk",
    activation_fn="topk",
    activation_fn_kwargs={"k": 50},  # Exactly 50 active features
)
```
### JumpReLU (State-of-the-art)
```python
cfg = LanguageModelSAERunnerConfig(
    architecture="jumprelu",
)
```
---
## Utility Functions
### Upload to HuggingFace
```python
from sae_lens import upload_saes_to_huggingface
upload_saes_to_huggingface(
    saes=[sae],
    repo_id="username/my-saes",
    token="hf_token",
)
```
### Neuronpedia Integration
```python
# Features can be viewed on Neuronpedia
# URL format: neuronpedia.org/{model}/{layer}-{sae_type}/{feature_id}
# Example: neuronpedia.org/gpt2-small/8-res-jb/1234
```
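For example, a small hypothetical helper to build such URLs:
```python
def neuronpedia_url(model: str, layer: int, sae_type: str, feature_id: int) -> str:
    # Follows the URL format described above
    return f"https://neuronpedia.org/{model}/{layer}-{sae_type}/{feature_id}"

print(neuronpedia_url("gpt2-small", 8, "res-jb", 1234))
```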

# SAELens Tutorials
## Tutorial 1: Loading and Analyzing Pre-trained SAEs
### Goal
Load a pre-trained SAE and analyze which features activate on specific inputs.
### Step-by-Step
```python
from transformer_lens import HookedTransformer
from sae_lens import SAE
import torch
# 1. Load model and SAE
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
print(f"SAE input dim: {sae.cfg.d_in}")
print(f"SAE hidden dim: {sae.cfg.d_sae}")
print(f"Expansion factor: {sae.cfg.d_sae / sae.cfg.d_in:.1f}x")
# 2. Get model activations
prompt = "The capital of France is Paris"
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)
activations = cache["resid_pre", 8] # [1, seq_len, 768]
# 3. Encode to SAE features
features = sae.encode(activations) # [1, seq_len, d_sae]
# 4. Analyze sparsity
active_per_token = (features > 0).sum(dim=-1)
print(f"Average active features per token: {active_per_token.float().mean():.1f}")
# 5. Find top features for each token
str_tokens = model.to_str_tokens(prompt)
for pos in range(len(str_tokens)):
    top_features = features[0, pos].topk(5)
    print(f"\nToken '{str_tokens[pos]}':")
    for feat_idx, feat_val in zip(top_features.indices, top_features.values):
        print(f"  Feature {feat_idx.item()}: {feat_val.item():.3f}")
# 6. Check reconstruction quality
reconstructed = sae.decode(features)
mse = ((activations - reconstructed) ** 2).mean()
print(f"\nReconstruction MSE: {mse.item():.6f}")
```
---
## Tutorial 2: Training a Custom SAE
### Goal
Train a Sparse Autoencoder on GPT-2 activations.
### Step-by-Step
```python
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner
# 1. Configure training
cfg = LanguageModelSAERunnerConfig(
    # Model
    model_name="gpt2-small",
    hook_name="blocks.6.hook_resid_pre",
    hook_layer=6,
    d_in=768,
    # SAE architecture
    architecture="standard",
    d_sae=768 * 8,  # 8x expansion
    activation_fn="relu",
    # Training
    lr=4e-4,
    l1_coefficient=8e-5,
    l1_warm_up_steps=1000,
    train_batch_size_tokens=4096,
    training_tokens=10_000_000,  # Small run for demo
    # Data
    dataset_path="monology/pile-uncopyrighted",
    streaming=True,
    context_size=128,
    # Dead feature prevention
    use_ghost_grads=True,
    dead_feature_window=5000,
    # Logging
    log_to_wandb=True,
    wandb_project="sae-training-demo",
    # Hardware
    device="cuda",
    dtype="float32",
)
# 2. Train
runner = SAETrainingRunner(cfg)
sae = runner.run()
# 3. Save
sae.save_model("./my_trained_sae")
```
### Hyperparameter Tuning Guide
| If you see... | Try... |
|---------------|--------|
| High L0 (>200) | Increase `l1_coefficient` |
| Low CE recovery (<80%) | Decrease `l1_coefficient`, increase `d_sae` |
| Many dead features (>5%) | Enable `use_ghost_grads`, increase `l1_warm_up_steps` |
| Training instability | Lower `lr`, increase `lr_warm_up_steps` |
---
## Tutorial 3: Feature Attribution and Steering
### Goal
Identify which SAE features contribute to specific predictions and use them for steering.
### Step-by-Step
```python
from transformer_lens import HookedTransformer
from sae_lens import SAE
import torch
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
# 1. Feature attribution for a specific prediction
prompt = "The capital of France is"
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)
activations = cache["resid_pre", 8]
features = sae.encode(activations)
# Target token
target_token = model.to_single_token(" Paris")
# Compute feature contributions to target logit
# contribution = feature_activation * decoder_weight * unembedding
W_dec = sae.W_dec # [d_sae, d_model]
W_U = model.W_U # [d_model, d_vocab]
# Feature direction projected to vocabulary
feature_to_logit = W_dec @ W_U # [d_sae, d_vocab]
# Contribution of each feature to "Paris" at final position
feature_acts = features[0, -1] # [d_sae]
contributions = feature_acts * feature_to_logit[:, target_token]
# Top contributing features
top_features = contributions.topk(10)
print("Top features contributing to 'Paris':")
for idx, val in zip(top_features.indices, top_features.values):
    print(f"  Feature {idx.item()}: {val.item():.3f}")
# 2. Feature steering
def steer_with_feature(feature_idx, strength=5.0):
    """Add a feature direction to the residual stream."""
    feature_direction = sae.W_dec[feature_idx]  # [d_model]
    def hook(activation, hook):
        activation[:, -1, :] += strength * feature_direction
        return activation
    # generate() does not accept fwd_hooks directly, so attach the hook
    # with the hooks() context manager instead
    with model.hooks(fwd_hooks=[("blocks.8.hook_resid_pre", hook)]):
        output = model.generate(tokens, max_new_tokens=10)
    return model.to_string(output[0])
# Try steering with top feature
top_feature_idx = top_features.indices[0].item()
print(f"\nSteering with feature {top_feature_idx}:")
print(steer_with_feature(top_feature_idx, strength=10.0))
```
---
## Tutorial 4: Feature Ablation
### Goal
Test the causal importance of features by ablating them.
### Step-by-Step
```python
from transformer_lens import HookedTransformer
from sae_lens import SAE
import torch
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
prompt = "The capital of France is"
tokens = model.to_tokens(prompt)
# Baseline prediction
baseline_logits = model(tokens)
target_token = model.to_single_token(" Paris")
baseline_prob = torch.softmax(baseline_logits[0, -1], dim=-1)[target_token].item()
print(f"Baseline P(Paris): {baseline_prob:.4f}")
# Get features to ablate
_, cache = model.run_with_cache(tokens)
activations = cache["resid_pre", 8]
features = sae.encode(activations)
top_features = features[0, -1].topk(10).indices
# Ablate top features one by one
for feat_idx in top_features:
    def ablation_hook(activation, hook, feat_idx=feat_idx):
        # Encode → zero feature → decode
        feats = sae.encode(activation)
        feats[:, :, feat_idx] = 0
        return sae.decode(feats)
    ablated_logits = model.run_with_hooks(
        tokens,
        fwd_hooks=[("blocks.8.hook_resid_pre", ablation_hook)]
    )
    ablated_prob = torch.softmax(ablated_logits[0, -1], dim=-1)[target_token].item()
    change = (ablated_prob - baseline_prob) / baseline_prob * 100
    print(f"Ablate feature {feat_idx.item()}: P(Paris)={ablated_prob:.4f} ({change:+.1f}%)")
```
---
## Tutorial 5: Comparing Features Across Prompts
### Goal
Find which features activate consistently for a concept.
### Step-by-Step
```python
from transformer_lens import HookedTransformer
from sae_lens import SAE
import torch
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda"
)
# Test prompts about the same concept
prompts = [
    "The Eiffel Tower is located in",
    "Paris is the capital of",
    "France's largest city is",
    "The Louvre museum is in",
]
# Collect feature activations
all_features = []
for prompt in prompts:
    tokens = model.to_tokens(prompt)
    _, cache = model.run_with_cache(tokens)
    activations = cache["resid_pre", 8]
    features = sae.encode(activations)
    # Take max activation across positions
    max_features = features[0].max(dim=0).values
    all_features.append(max_features)
all_features = torch.stack(all_features) # [n_prompts, d_sae]
# Find features that activate consistently
mean_activation = all_features.mean(dim=0)
min_activation = all_features.min(dim=0).values
# Features active in ALL prompts
consistent_features = (min_activation > 0.5).nonzero().squeeze(-1)
print(f"Features active in all prompts: {len(consistent_features)}")
# Top consistent features
top_consistent = mean_activation[consistent_features].topk(min(10, len(consistent_features)))
print("\nTop consistent features (possibly 'France/Paris' related):")
for idx, val in zip(top_consistent.indices, top_consistent.values):
    feat_idx = consistent_features[idx].item()
    print(f"  Feature {feat_idx}: mean activation {val.item():.3f}")
```
---
## External Resources
### Official Tutorials
- [Basic Loading & Analysis](https://github.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
- [Training SAEs](https://github.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
- [Logits Lens with Features](https://github.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
### ARENA Curriculum
Comprehensive SAE course: https://www.lesswrong.com/posts/LnHowHgmrMbWtpkxx/intro-to-superposition-and-sparse-autoencoders-colab
### Key Papers
- [Towards Monosemanticity](https://transformer-circuits.pub/2023/monosemantic-features) - Anthropic (2023)
- [Scaling Monosemanticity](https://transformer-circuits.pub/2024/scaling-monosemanticity/) - Anthropic (2024)
- [Sparse Autoencoders Find Highly Interpretable Features in Language Models](https://arxiv.org/abs/2309.08600) - Cunningham et al., ICLR 2024