mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-25 05:52:34 +00:00
chore(skills): move heavy training skills + outlines to optional-skills (#22912)
These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't be active by default. Moved to optional-skills/ where users opt-in via `hermes skills install official/...`. Moved: - mlops/training/axolotl - mlops/training/trl-fine-tuning - mlops/training/unsloth - mlops/inference/outlines Counts: 91 -> 87 built-in, 72 -> 76 optional. Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
This commit is contained in:
parent
4375b82cd9
commit
ded194eb6a
27 changed files with 18 additions and 18 deletions
|
|
@ -14,8 +14,8 @@ Outlines: structured JSON/regex/Pydantic LLM generation.
|
|||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Bundled (installed by default) |
|
||||
| Path | `skills/mlops/inference/outlines` |
|
||||
| Source | Optional — install with `hermes skills install official/mlops/outlines` |
|
||||
| Path | `optional-skills/mlops/inference/outlines` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
|
|
@ -14,8 +14,8 @@ Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO).
|
|||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Bundled (installed by default) |
|
||||
| Path | `skills/mlops/training/axolotl` |
|
||||
| Source | Optional — install with `hermes skills install official/mlops/axolotl` |
|
||||
| Path | `optional-skills/mlops/training/axolotl` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
|
|
@ -14,8 +14,8 @@ TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF.
|
|||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Bundled (installed by default) |
|
||||
| Path | `skills/mlops/training/trl-fine-tuning` |
|
||||
| Source | Optional — install with `hermes skills install official/mlops/trl-fine-tuning` |
|
||||
| Path | `optional-skills/mlops/training/trl-fine-tuning` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
|
|
@ -270,7 +270,7 @@ trl dpo \
|
|||
|
||||
Train with reinforcement learning using minimal memory.
|
||||
|
||||
For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
|
||||
For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
|
||||
|
||||
Copy this checklist:
|
||||
|
||||
|
|
@ -440,15 +440,15 @@ config = PPOConfig(
|
|||
|
||||
## Advanced topics
|
||||
|
||||
**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
|
||||
**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
|
||||
|
||||
**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
|
||||
**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
|
||||
|
||||
**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
|
||||
**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
|
||||
|
||||
**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
|
||||
**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
|
||||
|
||||
**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
|
||||
**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
|
||||
|
||||
## Hardware requirements
|
||||
|
||||
|
|
@ -14,8 +14,8 @@ Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM.
|
|||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Bundled (installed by default) |
|
||||
| Path | `skills/mlops/training/unsloth` |
|
||||
| Source | Optional — install with `hermes skills install official/mlops/unsloth` |
|
||||
| Path | `optional-skills/mlops/training/unsloth` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | Orchestra Research |
|
||||
| License | MIT |
|
||||
Loading…
Add table
Add a link
Reference in a new issue