Hyperparameters
Complete guide to SimPO hyperparameter selection and tuning.
Overview
Key hyperparameters in SimPO:
- Learning Rate - Most critical
- Beta (β) - Reward scaling
- Gamma-Beta Ratio (γ/β) - Target margin
- SFT Weight - Regularization strength
Learning Rate
Recommended Ranges
By model size:
| Model Size | Learning Rate | Notes |
|---|---|---|
| 1B-3B | 5e-7 to 1e-6 | Higher end safe |
| 7B-8B | 3e-7 to 5e-7 | Standard |
| 13B-30B | 1e-7 to 3e-7 | Lower for stability |
| 70B+ | 5e-8 to 1e-7 | Very conservative |
By task type:
| Task | Learning Rate | Reason |
|---|---|---|
| General chat | 5e-7 | Standard |
| Code generation | 3e-7 | Precise reasoning |
| Math reasoning | 3e-7 | Careful optimization |
| Creative writing | 1e-6 | Tolerates more aggressive updates |
Why Learning Rate Matters
Too high (> 1e-6 for 7B):
- Loss divergence
- Catastrophic forgetting
- Unstable training
Too low (< 1e-7 for 7B):
- Very slow convergence
- May not converge within the training budget
- Undertraining
Optimal (3e-7 to 5e-7 for 7B):
- Stable convergence
- Good final performance
- Efficient training
Config Examples
Mistral 7B (general):
learning_rate: 5e-7
num_train_epochs: 1
warmup_ratio: 0.1
lr_scheduler_type: cosine
Llama 3 8B (reasoning):
learning_rate: 3e-7
num_train_epochs: 1
warmup_ratio: 0.1
lr_scheduler_type: cosine
Gemma 2 9B (creative):
learning_rate: 1e-6
num_train_epochs: 1
warmup_ratio: 0.1
lr_scheduler_type: linear
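For reference, warmup_ratio and lr_scheduler_type map onto a standard transformers scheduler; the Trainer builds this automatically. A minimal Python sketch, with step counts assumed purely for illustration:

import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the policy model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-7)

total_steps = 1000                     # assumed total training steps
warmup_steps = int(0.1 * total_steps)  # warmup_ratio: 0.1
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
# LR ramps from 0 to 5e-7 over the first 100 steps, then decays on a cosine curve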
Beta (β)
Recommended Values
Typical range: 2.0 to 10.0 (much higher than DPO's 0.01-0.1)
By preference strength:
| Beta | Preference Strength | Use Case |
|---|---|---|
| 1.0-2.0 | Weak | Subtle preferences |
| 2.0-5.0 | Standard | General alignment |
| 5.0-10.0 | Strong | Clear preferences |
Default: 2.0 to 2.5
Why Beta Matters
Low beta (< 2.0):
- Weak reward signal
- Slow preference learning
- May underfit
High beta (> 10.0):
- Very strong reward signal
- Risk of overfitting
- May ignore weak preferences
Optimal (2.0-5.0):
- Balanced reward scaling
- Stable training
- Good generalization
Interaction with Gamma
Beta and gamma together:
Target margin in reward space = gamma
Target margin in logit space = gamma / beta
Example:
beta: 2.0
gamma_beta_ratio: 0.5
# Effective gamma = 2.0 * 0.5 = 1.0
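A quick worked check of the two margins, using the values from the example above (plain Python):

beta = 2.0
gamma_beta_ratio = 0.5

gamma = beta * gamma_beta_ratio  # target margin in reward space = 1.0
logit_margin = gamma / beta      # target margin in log-prob space = 0.5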
Config Examples
Weak preferences:
beta: 2.0
gamma_beta_ratio: 0.3 # Small margin
Standard:
beta: 2.5
gamma_beta_ratio: 0.5 # Default
Strong preferences:
beta: 5.0
gamma_beta_ratio: 0.7 # Larger margin
Gamma-Beta Ratio (γ/β)
Recommended Values
Range: 0.0 to 1.0
By scenario:
| Ratio | Margin | Use Case |
|---|---|---|
| 0.0-0.3 | Small | Weak preference data |
| 0.4-0.6 | Standard | General use |
| 0.7-1.0 | Large | Very clear preferences |
Default: 0.5
Why Gamma Matters
Low gamma (< 0.3):
- Small target margin
- Less aggressive alignment
- More conservative
High gamma (> 0.7):
- Large target margin
- Stronger alignment
- More aggressive
Optimal (0.4-0.6):
- Balanced margin
- Stable training
- Good alignment
Mathematical Meaning
In the loss function:
logits = pi_logratios - gamma_beta_ratio
loss = -log(sigmoid(beta * logits))
Interpretation:
- gamma_beta_ratio shifts the decision boundary
- Higher ratio = requires larger log prob difference
- Controls how "clear" preferences must be
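A minimal runnable sketch of the sigmoid SimPO loss defined above. It assumes chosen_logps and rejected_logps are already length-averaged per sequence, as SimPO prescribes; the tensor values and names are illustrative:

import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma_beta_ratio=0.5):
    pi_logratios = chosen_logps - rejected_logps
    logits = pi_logratios - gamma_beta_ratio  # shift the decision boundary
    return -F.logsigmoid(beta * logits)       # per-example loss

# Two pairs where the chosen response is only slightly preferred:
loss = simpo_loss(torch.tensor([-1.0, -0.8]), torch.tensor([-1.5, -0.9])).mean()

Note how the second pair (log-prob gap 0.1, below the 0.5 margin) still incurs substantial loss: the margin forces preferences to be learned clearly.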
Config Examples
Noisy preferences:
gamma_beta_ratio: 0.3 # Smaller margin, more tolerant
Standard:
gamma_beta_ratio: 0.5 # Default
High-quality preferences:
gamma_beta_ratio: 0.8 # Larger margin, stricter
SFT Weight
Recommended Values
Range: 0.0 to 1.0
By model type:
| Model Type | SFT Weight | Reason |
|---|---|---|
| Base model | 0.0 | No instruction tuning to preserve |
| Instruct model | 0.05-0.1 | Preserve instruction following |
| Chat model | 0.1-0.2 | Preserve conversational skills |
Default: 0.0 (no SFT regularization)
Why SFT Weight Matters
Zero SFT (0.0):
- Pure preference optimization
- May forget capabilities
- Standard for base models
Low SFT (0.05-0.1):
- Balanced approach
- Recommended for instruct models
- Slight capability preservation
High SFT (> 0.2):
- Strong capability preservation
- Weaker preference alignment
- May reduce alignment gains
Trade-off
Total Loss = SimPO Loss + (sft_weight * SFT Loss)
Example:
sft_weight: 0.1
# 90% preference optimization + 10% capability preservation
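In code the trade-off is a plain weighted sum. A sketch, assuming sft_loss is the usual cross-entropy of the policy on the chosen responses (names illustrative):

def combined_loss(simpo_loss, sft_loss, sft_weight=0.1):
    # simpo_loss: preference term; sft_loss: cross-entropy on chosen responses
    return simpo_loss + sft_weight * sft_loss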
Config Examples
Base model (no SFT):
model_name_or_path: mistralai/Mistral-7B-v0.1
sft_weight: 0.0
Instruct model (light SFT):
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
sft_weight: 0.1
Chat model (moderate SFT):
model_name_or_path: HuggingFaceH4/zephyr-7b-beta
sft_weight: 0.2
Model-Size-Specific Recommendations
7B Models (Mistral, Llama 3)
Standard config:
learning_rate: 5e-7
beta: 2.0
gamma_beta_ratio: 0.5
sft_weight: 0.0 # 0.1 if instruct model
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
8B-13B Models
Standard config:
learning_rate: 3e-7
beta: 2.5
gamma_beta_ratio: 0.5
sft_weight: 0.1 # If instruct
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
70B Models
Standard config:
learning_rate: 1e-7
beta: 2.0
gamma_beta_ratio: 0.5
sft_weight: 0.05
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
Batch Size & Gradient Accumulation
Effective Batch Size
Effective Batch Size = per_device_batch_size * num_gpus * grad_accum_steps
Recommended effective batch sizes:
- 7B: 128-256
- 13B: 64-128
- 70B: 32-64
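The arithmetic as a one-line helper, useful for sanity-checking configs (a hypothetical function, not part of any trainer API):

def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int) -> int:
    return per_device * num_gpus * grad_accum

assert effective_batch_size(2, 4, 16) == 128  # matches the 4-GPU example below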
Config Examples
Single GPU (A100 40GB):
per_device_train_batch_size: 1
gradient_accumulation_steps: 128 # Effective batch = 128
4 GPUs (A100 40GB):
per_device_train_batch_size: 2
gradient_accumulation_steps: 16 # Effective batch = 2*4*16 = 128
8 GPUs (A100 80GB):
per_device_train_batch_size: 2
gradient_accumulation_steps: 8 # Effective batch = 2*8*8 = 128
Loss Type
Sigmoid vs Hinge
Sigmoid (default, recommended):
loss_type: sigmoid
label_smoothing: 0.0
Hinge (experimental):
loss_type: hinge
# No label smoothing for hinge
When to use hinge:
- Margin-based tasks
- SVM-style optimization
- Experimental purposes
Generally: Stick with sigmoid
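For comparison, a sketch of both variants; the hinge form follows the common DPO-style trainer implementation (a fixed margin of 1 in scaled-logit space), which is an assumption about this trainer's internals:

import torch
import torch.nn.functional as F

def preference_loss(logits, beta=2.0, loss_type="sigmoid"):
    # logits = pi_logratios - gamma_beta_ratio, as defined earlier
    if loss_type == "sigmoid":
        return -F.logsigmoid(beta * logits)
    if loss_type == "hinge":
        return torch.relu(1 - beta * logits)
    raise ValueError(f"unknown loss_type: {loss_type}")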
Tuning Guide
Step 1: Start with Defaults
learning_rate: 5e-7 # For 7B
beta: 2.0
gamma_beta_ratio: 0.5
sft_weight: 0.0 # 0.1 if instruct
loss_type: sigmoid
Step 2: Monitor Training
Check every 100 steps:
- Loss curve (should decrease smoothly)
- Reward margin (should increase)
- Chosen/rejected logps (should separate)
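A sketch of how these quantities derive from per-batch log probabilities (length-averaged, per SimPO); the variable names are illustrative, not trainer API:

# rewards in SimPO are just scaled average log probs
chosen_rewards = beta * chosen_logps
rejected_rewards = beta * rejected_logps

reward_margin = (chosen_rewards - rejected_rewards).mean()  # should trend upward
logp_gap = (chosen_logps - rejected_logps).mean()           # should separate from 0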
Step 3: Adjust if Needed
If loss diverges:
learning_rate: 3e-7 # Reduce from 5e-7
beta: 1.0 # Reduce from 2.0
If loss plateaus early:
learning_rate: 1e-6 # Increase from 5e-7
beta: 5.0 # Increase from 2.0
If the model forgets prior capabilities:
sft_weight: 0.2 # Increase from 0.0
Complete Example Configs
Mistral 7B Base (Standard)
model_name_or_path: mistralai/Mistral-7B-v0.1
dataset_mixer:
HuggingFaceH4/ultrafeedback_binarized: 1.0
learning_rate: 5e-7
beta: 2.0
gamma_beta_ratio: 0.5
loss_type: sigmoid
sft_weight: 0.0
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
warmup_ratio: 0.1
lr_scheduler_type: cosine
bf16: true
gradient_checkpointing: true
Llama 3 8B Instruct (Reasoning)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
dataset_mixer:
argilla/distilabel-math-preference-dpo: 1.0
learning_rate: 3e-7
beta: 5.0
gamma_beta_ratio: 0.7
loss_type: sigmoid
sft_weight: 0.1
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
warmup_ratio: 0.1
lr_scheduler_type: cosine
References
- SimPO paper: https://arxiv.org/abs/2405.14734
- Alignment Handbook: https://github.com/huggingface/alignment-handbook