feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)

* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap

Map active skills to Telegram's slash command menu so users can discover and invoke skills directly.

Three changes:

1. Telegram menu now includes active skill commands alongside built-in commands, capped at 100 entries (Telegram Bot API limit). Overflow commands remain callable but hidden from the picker. Logged at startup when cap is hit.

2. New /commands [page] gateway command for paginated browsing of all commands + skills. /help now shows first 10 skill commands and points to /commands for the full list.

3. When a user types a slash command that matches a disabled or uninstalled skill, they get actionable guidance:
   - Disabled: 'Enable it with: hermes skills config'
   - Optional (not installed): 'Install with: hermes skills install official/<path>'

Built on ideas from PR #3921 by @kshitijk4poor.

* chore: move 21 niche skills to optional-skills

Move specialized/niche skills from built-in (skills/) to optional (optional-skills/) to reduce the default skill count. Users can install them with: hermes skills install official/<category>/<name>

Moved skills (21):
- mlops: accelerate, chroma, faiss, flash-attention, hermes-atropos-environments, huggingface-tokenizers, instructor, lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning, qdrant, saelens, simpo, slime, tensorrt-llm, torchtitan
- research: domain-intel, duckduckgo-search
- devops: inference-sh cli

Built-in skills: 96 → 75
Optional skills: 22 → 43

* fix: only include repo built-in skills in Telegram menu, not user-installed

User-installed skills (from hub or manually added) stay accessible via /skills and by typing the command directly, but don't get registered in the Telegram slash command picker. Only skills whose SKILL.md is under the repo's skills/ directory are included in the menu.
This keeps the Telegram menu focused on the curated built-in set while user-installed skills remain discoverable through /skills and /commands.
2026-04-25 00:51:20 +00:00 · 2026-03-30 10:57:30 -07:00 · 2026-03-30 10:57:30 -07:00 · 5ceed021dc
commit 5ceed021dc
parent 97d6813f51
73 changed files with 163 additions and 4 deletions
--- a/optional-skills/mlops/simpo/references/loss-functions.md
+++ b/optional-skills/mlops/simpo/references/loss-functions.md
@@ -0,0 +1,350 @@
+# Loss Functions
+
+Complete guide to SimPO loss functions and mathematical formulations.
+
+## Overview
+
+SimPO supports two loss types:
+- **Sigmoid** (default) - Smooth, differentiable loss
+- **Hinge** - Margin-based, sparse loss
+
+Both are reference-free (no reference model needed).
+
+## SimPO Loss Formula
+
+### Core Calculation
+
+**Step 1: Log probability ratio**:
+```
+pi_logratios = log P_θ(y_chosen|x) - log P_θ(y_rejected|x)
+```
+
+**Step 2: Apply target margin**:
+```
+logits = pi_logratios - γ/β
+```
+Where:
+- γ/β = `gamma_beta_ratio` (target margin)
+
+**Step 3: Compute loss** (depends on loss type)
+
+### Sigmoid Loss (Default)
+
+**Formula**:
+```
+L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε
+```
+
+Where:
+- β = `beta` (reward scaling)
+- σ = sigmoid function
+- ε = `label_smoothing` (default 0.0)
+
+**Implementation**:
+```python
+losses = (
+    -F.logsigmoid(self.beta * logits) * (1 - self.label_smoothing)
+    - F.logsigmoid(-self.beta * logits) * self.label_smoothing
+)
+```
+
+**Characteristics**:
+- Smooth, continuous gradients
+- Probabilistic interpretation
+- Standard choice for most tasks
+- Works well with higher beta values
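Steps 1 to 3 can be traced for a single preference pair with a small scalar sketch. It uses plain Python instead of the batched PyTorch code; the function names and default values here (`beta=2.0`, `gamma_beta_ratio=0.5`, as in Example 1 of this guide) are illustrative assumptions, not the trainer's API.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def simpo_sigmoid_loss(chosen_logp, rejected_logp, beta=2.0,
                       gamma_beta_ratio=0.5, label_smoothing=0.0):
    """Sigmoid SimPO loss for one preference pair (scalar sketch)."""
    pi_logratios = chosen_logp - rejected_logp   # Step 1: log probability ratio
    logits = pi_logratios - gamma_beta_ratio     # Step 2: apply target margin
    return (                                     # Step 3: sigmoid loss
        -log_sigmoid(beta * logits) * (1 - label_smoothing)
        - log_sigmoid(-beta * logits) * label_smoothing
    )


# Average log probs of -1.2 (chosen) and -2.5 (rejected)
loss = simpo_sigmoid_loss(-1.2, -2.5)
print(f"sigmoid loss: {loss:.3f}")
```

With these inputs the result should agree with the hand calculation in Example 1.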
+
+### Hinge Loss
+
+**Formula**:
+```
+L = max(0, 1 - β * logits)
+```
+
+**Implementation**:
+```python
+losses = torch.relu(1 - self.beta * logits)
+```
+
+**Characteristics**:
+- Non-smooth (has kink at logits = 1/β)
+- Margin-based (SVM-style)
+- Can lead to sparser solutions
+- Less commonly used
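To make the kink concrete, the following scalar sketch evaluates the hinge loss on both sides of `logits = 1/β`. The function name and default values are assumptions chosen for illustration.

```python
def simpo_hinge_loss(chosen_logp, rejected_logp, beta=2.0, gamma_beta_ratio=0.5):
    """Hinge SimPO loss for one preference pair (scalar sketch)."""
    logits = (chosen_logp - rejected_logp) - gamma_beta_ratio
    return max(0.0, 1.0 - beta * logits)


# With beta = 2.0 the kink sits at logits = 1/beta = 0.5.
satisfied = simpo_hinge_loss(-1.2, -2.5)  # logits = 0.8, past the kink
violated = simpo_hinge_loss(-1.2, -1.5)   # logits = -0.2, before the kink
print(satisfied, violated)
```

The first pair already clears the margin, so its loss is exactly zero; the second pair does not, so it is penalized linearly.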
+
+## Comparison to DPO
+
+### DPO Loss (Reference Model Required)
+
+**Formula**:
+```
+L_DPO = -E[log σ(β * log(π_θ(y_w|x)/π_ref(y_w|x)) - β * log(π_θ(y_l|x)/π_ref(y_l|x)))]
+```
+
+**Key features**:
+- Requires reference model π_ref
+- Normalizes by reference log probabilities
+- More conservative (stays close to reference)
+
+### SimPO Loss (Reference-Free)
+
+**Formula**:
+```
+L_SimPO = -log σ(β * (log π_θ(y_w|x) - log π_θ(y_l|x) - γ/β))
+```
+
+**Key features**:
+- No reference model needed
+- Reward is the policy's own log probability (length-normalized when `average_log_prob=True`)
+- Target margin γ/β sets how far the chosen response must lead the rejected one
+- More efficient (no reference-model forward passes)
+
+**Visual comparison**:
+```
+DPO:    [Policy] - [Reference] → Loss
+SimPO:  [Policy]               → Loss
+```
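The two formulas can be put side by side in a scalar sketch. All function and argument names below are hypothetical; the inputs are sequence log probabilities that would normally come from model forward passes.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta):
    """DPO: needs log probs from both the policy and the reference model."""
    return neg_log_sigmoid(beta * ((pol_w - ref_w) - (pol_l - ref_l)))


def simpo_loss(pol_w, pol_l, beta, gamma_beta_ratio):
    """SimPO: needs only the policy log probs plus a target margin."""
    return neg_log_sigmoid(beta * (pol_w - pol_l - gamma_beta_ratio))


# A policy identical to its reference gives DPO a zero logit, so the loss is log(2).
dpo_at_init = dpo_loss(-12.0, -20.0, -12.0, -20.0, beta=0.1)
simpo_value = simpo_loss(-1.2, -2.5, beta=2.0, gamma_beta_ratio=0.5)
print(f"DPO at init: {dpo_at_init:.3f}, SimPO: {simpo_value:.3f}")
```

`dpo_loss` takes four log probabilities (two models) while `simpo_loss` takes two, which is where the saving in forward passes comes from.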
+
+## Average Log Probability Reward
+
+### Calculation
+
+**Per-token log probabilities**:
+```python
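+import torch
+
+# Mask out padding, then point padded labels at a valid index for gather
+loss_mask = (labels != label_pad_token_id)
+safe_labels = labels.masked_fill(~loss_mask, 0)
+
+# Log prob of each label token (logits: [B, T, V], labels: [B, T], already aligned)
+per_token_logps = torch.gather(
+    logits.log_softmax(-1), dim=2, index=safe_labels.unsqueeze(2)
+).squeeze(2)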
+```
+
+**Average log probability** (if `average_log_prob=True`):
+```python
+avg_logp = (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
+```
+
+**Sum log probability** (if `average_log_prob=False`):
+```python
+sum_logp = (per_token_logps * loss_mask).sum(-1)
+```
+
+**Why average?**
+- Normalizes for sequence length
+- Prevents length bias: a summed log probability grows more negative with every extra token
+- Standard practice in SimPO
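The length effect is easy to see with two responses whose tokens are equally likely but whose lengths differ. This is a plain-Python sketch with made-up per-token values; `sequence_logps` is a hypothetical helper, not part of the SimPO code.

```python
def sequence_logps(per_token_logps, loss_mask):
    """Return (sum, average) of the unmasked per-token log probs."""
    total = sum(lp * m for lp, m in zip(per_token_logps, loss_mask))
    return total, total / sum(loss_mask)


# Every real token has log prob -0.5; the short response is padded to length 4.
short_sum, short_avg = sequence_logps([-0.5, -0.5, 0.0, 0.0], [1, 1, 0, 0])
long_sum, long_avg = sequence_logps([-0.5, -0.5, -0.5, -0.5], [1, 1, 1, 1])

print(short_sum, long_sum)  # the sums differ only because of length
print(short_avg, long_avg)  # the averages are identical
```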
+
+### Reward Metrics
+
+**Chosen reward**:
+```python
+chosen_rewards = beta * policy_chosen_logps.detach()
+```
+
+**Rejected reward**:
+```python
+rejected_rewards = beta * policy_rejected_logps.detach()
+```
+
+**Reward margin**:
+```python
+reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
+```
+
+## Label Smoothing
+
+### Formula with Smoothing
+
+**Sigmoid loss**:
+```
+L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε
+```
+
+**Effect**:
+- ε = 0.0: No smoothing (default)
+- ε = 0.1: 10% smoothing (soft labels)
+- ε = 0.5: Maximum smoothing (the loss no longer favors either response)
+
+**When to use**:
+- Noisy preference labels
+- Uncertain preferences
+- Prevent overconfidence
+
+**Config**:
+```yaml
+label_smoothing: 0.1  # 10% smoothing
+```
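A quick way to see what ε does is to evaluate the smoothed loss for a correctly ordered pair (`logits = 0.8`) and its mirror image (`logits = -0.8`). This scalar sketch assumes `beta = 2.0`; the helper names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)), equal to -log(sigmoid(-x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def smoothed_loss(logits, beta=2.0, eps=0.0):
    """Sigmoid loss with label smoothing eps for a single pair."""
    return softplus(-beta * logits) * (1 - eps) + softplus(beta * logits) * eps


for eps in (0.0, 0.1, 0.5):
    right = smoothed_loss(0.8, eps=eps)
    wrong = smoothed_loss(-0.8, eps=eps)
    print(f"eps={eps}: correct order {right:.3f}, reversed order {wrong:.3f}")
```

As ε grows, the gap between the two cases shrinks; at ε = 0.5 the loss is symmetric in `logits`, so it carries no preference signal.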
+
+## SFT Regularization
+
+### Combined Loss
+
+**With SFT component**:
+```
+L_total = L_SimPO + λ * L_SFT
+```
+
+Where:
+- L_SFT = cross-entropy loss on chosen responses
+- λ = `sft_weight` (0.0 to 1.0)
+
+**Implementation**:
+```python
+total_loss = simpo_loss
+if self.sft_weight > 0:
+    # SFT term: negative log-likelihood of the chosen responses
+    sft_loss = -policy_chosen_logps
+    total_loss = simpo_loss + self.sft_weight * sft_loss
+```
+
+**When to use**:
+- Preserve model capabilities
+- Prevent catastrophic forgetting
+- Fine-tuning instruct models
+
+**Trade-off**:
+- Higher sft_weight: Preserve capabilities, less alignment
+- Lower sft_weight: Stronger alignment, may forget capabilities
+
+**Config**:
+```yaml
+sft_weight: 0.1  # 10% SFT regularization
+```
+
+## Loss Type Selection
+
+### Sigmoid vs Hinge
+
+| Aspect | Sigmoid | Hinge |
+|--------|---------|-------|
+| Smoothness | Smooth | Non-smooth |
+| Gradients | Continuous | Discontinuous at margin |
+| Sparsity | Dense solutions | Sparse solutions |
+| Interpretability | Probabilistic | Geometric margin |
+| Use case | **General purpose** | Margin-based tasks |
+| Recommendation | **Default choice** | Experimental |
+
+**Config**:
+```yaml
+# Sigmoid (default)
+loss_type: sigmoid
+
+# Hinge (alternative)
+loss_type: hinge
+```
+
+## Mathematical Properties
+
+### Gradient Analysis
+
+**Sigmoid loss gradient**:
+```
+∂L/∂logits = -β * σ(-β * logits) * (1 - ε) + β * σ(β * logits) * ε
+```
+
+**Hinge loss gradient**:
+```
+∂L/∂logits = -β   if logits < 1/β
+             0     otherwise
+```
+
+**Implications**:
+- Sigmoid: Always provides gradient signal
+- Hinge: No gradient when margin satisfied
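The analytic gradients above can be sanity-checked against a central finite difference. This is a scalar sketch under assumed values (`beta = 2.0`, `eps = 0.1`, `logits = 0.8`); all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_loss(z, beta, eps):
    return (-math.log(sigmoid(beta * z)) * (1 - eps)
            - math.log(sigmoid(-beta * z)) * eps)


def sigmoid_grad(z, beta, eps):
    """Analytic dL/dlogits for the sigmoid loss."""
    return -beta * sigmoid(-beta * z) * (1 - eps) + beta * sigmoid(beta * z) * eps


def hinge_grad(z, beta):
    """Analytic dL/dlogits for the hinge loss (away from the kink)."""
    return -beta if z < 1.0 / beta else 0.0


def numeric_grad(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)


beta, eps, z = 2.0, 0.1, 0.8
analytic = sigmoid_grad(z, beta, eps)
numeric = numeric_grad(lambda t: sigmoid_loss(t, beta, eps), z)
print(f"sigmoid: analytic {analytic:.6f}, numeric {numeric:.6f}")
print(f"hinge: {hinge_grad(0.3, beta)} before the kink, {hinge_grad(0.8, beta)} after")
```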
+
+### Convergence Behavior
+
+**Sigmoid**:
+- Asymptotically approaches zero loss
+- Continues optimizing even with large margins
+- Smoother training curves
+
+**Hinge**:
+- Reaches zero loss at margin
+- Stops optimizing once margin satisfied
+- May have training plateaus
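The difference shows up when both losses are tabulated over a growing margin. The sketch below assumes `beta = 2.0` and no label smoothing; the grid of `logits` values is arbitrary.

```python
import math

beta = 2.0
rows = []
for logits in (0.0, 0.25, 0.5, 1.0, 2.0):
    # For logits >= 0, -log(sigmoid(beta * logits)) = log(1 + exp(-beta * logits))
    sigmoid_loss = math.log1p(math.exp(-beta * logits))
    hinge_loss = max(0.0, 1.0 - beta * logits)
    rows.append((logits, sigmoid_loss, hinge_loss))
    print(f"logits={logits:<4} sigmoid={sigmoid_loss:.3f} hinge={hinge_loss:.3f}")
```

The sigmoid column keeps shrinking without reaching zero, while the hinge column is exactly zero from `logits = 1/β = 0.5` onward.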
+
+## Complete Loss Examples
+
+### Example 1: Basic SimPO (Sigmoid)
+
+**Config**:
+```yaml
+beta: 2.0
+gamma_beta_ratio: 0.5
+loss_type: sigmoid
+label_smoothing: 0.0
+sft_weight: 0.0
+```
+
+**Loss calculation**:
+```python
+import math
+
+# Step 1: Average log probs of each response under the policy (illustrative values)
+chosen_logps = -1.2
+rejected_logps = -2.5
+
+# Step 2: Log ratio and margin
+pi_logratios = chosen_logps - rejected_logps  # -1.2 - (-2.5) = 1.3
+logits = pi_logratios - 0.5                   # 1.3 - 0.5 = 0.8
+
+# Step 3: Sigmoid loss with beta = 2.0
+prob = 1 / (1 + math.exp(-2.0 * logits))      # sigmoid(1.6) ≈ 0.832
+loss = -math.log(prob)                        # -log(0.832) ≈ 0.184
+```
+
+### Example 2: SimPO with SFT
+
+**Config**:
+```yaml
+beta: 2.0
+gamma_beta_ratio: 0.5
+loss_type: sigmoid
+sft_weight: 0.1
+```
+
+**Loss calculation**:
+```python
+# SimPO loss (from Example 1)
+simpo_loss = 0.184
+
+# SFT loss: negative average log prob of the chosen response
+chosen_logps = -1.2
+sft_loss = -chosen_logps                  # -(-1.2) = 1.2
+
+# Total loss with sft_weight = 0.1
+total_loss = simpo_loss + 0.1 * sft_loss  # 0.184 + 0.12 = 0.304
+```
+
+## Debugging
+
+### Check Reward Margins
+
+**Low margin (< 0.5)**:
+- Preferences not being learned
+- Increase beta or gamma_beta_ratio
+
+**High margin (> 5.0)**:
+- May be overfitting
+- Reduce beta or learning rate
+
+**Monitor**:
+```python
+reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
+print(f"Reward margin: {reward_margin:.2f}")
+```
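The thresholds above can be wrapped in a small check for use in a logging hook. This is a plain-Python sketch; `reward_margin_from_logps` and `diagnose_margin` are hypothetical helpers, and the cutoffs of 0.5 and 5.0 are the rule-of-thumb values from this section.

```python
def reward_margin_from_logps(chosen_logps, rejected_logps, beta):
    """Mean chosen reward minus mean rejected reward for one batch."""
    chosen_rewards = [beta * lp for lp in chosen_logps]
    rejected_rewards = [beta * lp for lp in rejected_logps]
    return (sum(chosen_rewards) / len(chosen_rewards)
            - sum(rejected_rewards) / len(rejected_rewards))


def diagnose_margin(margin, low=0.5, high=5.0):
    """Map a reward margin to the guidance given in this section."""
    if margin < low:
        return "low: consider increasing beta or gamma_beta_ratio"
    if margin > high:
        return "high: consider reducing beta or learning rate"
    return "ok"


margin = reward_margin_from_logps([-1.2, -1.0], [-2.5, -3.0], beta=2.0)
print(f"Reward margin: {margin:.2f} -> {diagnose_margin(margin)}")
```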
+
+### Check Log Probabilities
+
+**Typical values**:
+- Chosen: -1.0 to -2.0 (closer to 0 is better)
+- Rejected: -2.0 to -4.0 (should sit below chosen)
+
+**Warning signs**:
+- Both very negative (< -10): Model not learning
+- Any value above 0: Log probabilities cannot be positive, so this points to a bug or numerical instability
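These warning signs can be turned into an automated check on the average log probabilities logged during training. The helper below is a hypothetical sketch; the floor of -10 mirrors the rule of thumb above and may need tuning per model.

```python
def check_logps(chosen_logp, rejected_logp, floor=-10.0):
    """Return a list of warnings for one pair of average log probs."""
    warnings = []
    if chosen_logp > 0 or rejected_logp > 0:
        # A log probability is at most 0, so a positive value signals a bug.
        warnings.append("positive log probability: check masking and numerics")
    if chosen_logp < floor and rejected_logp < floor:
        warnings.append("both log probabilities very negative: model may not be learning")
    return warnings


print(check_logps(-1.2, -2.5))    # healthy values produce no warnings
print(check_logps(-12.0, -15.0))  # both below the floor
```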
+
+## References
+
+- SimPO paper: https://arxiv.org/abs/2405.14734
+- DPO paper: https://arxiv.org/abs/2305.18290
+- Implementation: https://github.com/princeton-nlp/SimPO