chore(skills): move heavy training skills + outlines to optional-skills (#22912)

These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't
be active by default. Moved to optional-skills/, where users opt in via
`hermes skills install official/...`.

Moved:
- mlops/training/axolotl
- mlops/training/trl-fine-tuning
- mlops/training/unsloth
- mlops/inference/outlines

Counts: 91 -> 87 built-in, 72 -> 76 optional.

Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
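
The bookkeeping in this message can be cross-checked with a small sketch. The Python below is illustrative only: the helper names `new_path` and `install_command` are hypothetical, and the install-identifier rule (top-level category plus skill name, dropping the middle directory) is an assumption inferred from the four install commands in the regenerated pages, not a documented Hermes convention.

```python
# Skills listed under "Moved:" in the commit message.
MOVED = [
    "mlops/training/axolotl",
    "mlops/training/trl-fine-tuning",
    "mlops/training/unsloth",
    "mlops/inference/outlines",
]


def new_path(skill: str) -> str:
    """Hypothetical helper: repo path of a skill after the move."""
    return f"optional-skills/{skill}"


def install_command(skill: str) -> str:
    """Hypothetical helper: install command for a moved skill.

    Assumption: the identifier keeps the first and last path components,
    which matches the four commands shown in the regenerated pages.
    """
    parts = skill.split("/")
    return f"hermes skills install official/{parts[0]}/{parts[-1]}"


# Counts before the move, taken from the commit message.
builtin_before, optional_before = 91, 72
builtin_after = builtin_before - len(MOVED)
optional_after = optional_before + len(MOVED)

if __name__ == "__main__":
    for skill in MOVED:
        print(new_path(skill), "|", install_command(skill))
    print(f"built-in: {builtin_before} -> {builtin_after}")
    print(f"optional: {optional_before} -> {optional_after}")
```

Moving four skills accounts for both count changes: 91 - 4 = 87 built-in and 72 + 4 = 76 optional.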
Teknium 2026-05-09 18:44:12 -07:00 committed by GitHub
parent 4375b82cd9
commit ded194eb6a
GPG key ID: B5690EEEBB952194
27 changed files with 18 additions and 18 deletions

@ -14,8 +14,8 @@ Outlines: structured JSON/regex/Pydantic LLM generation.
| | |
|---|---|
-| Source | Bundled (installed by default) |
-| Path | `skills/mlops/inference/outlines` |
+| Source | Optional — install with `hermes skills install official/mlops/outlines` |
+| Path | `optional-skills/mlops/inference/outlines` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |

@ -14,8 +14,8 @@ Axolotl: YAML LLM fine-tuning (LoRA, DPO, GRPO).
| | |
|---|---|
-| Source | Bundled (installed by default) |
-| Path | `skills/mlops/training/axolotl` |
+| Source | Optional — install with `hermes skills install official/mlops/axolotl` |
+| Path | `optional-skills/mlops/training/axolotl` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |

@ -14,8 +14,8 @@ TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF.
| | |
|---|---|
-| Source | Bundled (installed by default) |
-| Path | `skills/mlops/training/trl-fine-tuning` |
+| Source | Optional — install with `hermes skills install official/mlops/trl-fine-tuning` |
+| Path | `optional-skills/mlops/training/trl-fine-tuning` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
@ -270,7 +270,7 @@ trl dpo \
Train with reinforcement learning using minimal memory.
-For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
+For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py)**.
Copy this checklist:
@ -440,15 +440,15 @@ config = PPOConfig(
## Advanced topics
-**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
+**SFT training guide**: See [references/sft-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/sft-training.md) for dataset formats, chat templates, packing strategies, and multi-GPU training.
-**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
+**DPO variants**: See [references/dpo-variants.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/dpo-variants.md) for IPO, cDPO, RPO, and other DPO loss functions with recommended hyperparameters.
-**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
+**Reward modeling**: See [references/reward-modeling.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/reward-modeling.md) for outcome vs process rewards, Bradley-Terry loss, and reward model evaluation.
-**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
+**Online RL methods**: See [references/online-rl.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations.
-**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
+**GRPO deep dive**: See [references/grpo-training.md](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py).
## Hardware requirements

@ -14,8 +14,8 @@ Unsloth: 2-5x faster LoRA/QLoRA fine-tuning, less VRAM.
| | |
|---|---|
-| Source | Bundled (installed by default) |
-| Path | `skills/mlops/training/unsloth` |
+| Source | Optional — install with `hermes skills install official/mlops/unsloth` |
+| Path | `optional-skills/mlops/training/unsloth` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |