
GRPO/RL Training Skill

Expert-level guidance for Group Relative Policy Optimization with TRL

📁 Skill Structure

grpo-rl-training/
├── SKILL.md                              # Main skill documentation (READ THIS FIRST)
├── README.md                             # This file
├── templates/
│   └── basic_grpo_training.py            # Production-ready training template
└── examples/
    └── reward_functions_library.py       # 20+ reward function examples

🚀 Quick Start

  1. Read SKILL.md - Comprehensive guide with all concepts and patterns
  2. Copy templates/basic_grpo_training.py - Start with working code
  3. Browse examples/reward_functions_library.py - Pick reward functions for your task
  4. Modify for your use case - Adapt dataset, rewards, and config

💡 What's Inside

SKILL.md (Main Documentation)

  • Core GRPO concepts and algorithm fundamentals
  • Complete implementation workflow (dataset → rewards → training → deployment)
  • 10+ reward function examples with code
  • Hyperparameter tuning guide
  • Training insights (loss behavior, metrics, debugging)
  • Troubleshooting guide
  • Production best practices

Templates

  • basic_grpo_training.py: Minimal, production-ready training script
    • Uses Qwen 2.5 1.5B Instruct
    • 3 reward functions (format + correctness)
    • LoRA for efficient training
    • Fully documented and ready to run
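The template itself is not reproduced in this README, so the following is only a rough sketch of what a script with these ingredients could look like, assuming the `trl` and `peft` packages are installed and the dataset has a `prompt` column in plain-text format. The function names `has_answer_tag` and `build_trainer`, the hyperparameter values, and the LoRA target modules are illustrative assumptions, not the template's actual contents.

```python
def has_answer_tag(completions, **kwargs):
    """Toy format reward: 1.0 if a completion contains an <answer> tag, else 0.0.

    Assumes plain-text completions (a list of strings).
    """
    return [1.0 if "<answer>" in text else 0.0 for text in completions]


def build_trainer(train_dataset, output_dir="outputs/grpo-sketch"):
    """Assemble a GRPO trainer with LoRA; all values here are assumptions."""
    # Imported lazily so the reward function can be tested without TRL installed.
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    args = GRPOConfig(
        output_dir=output_dir,
        learning_rate=5e-6,
        per_device_train_batch_size=8,
        num_generations=8,           # completions sampled per prompt
        max_completion_length=256,
        logging_steps=1,
    )
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return GRPOTrainer(
        model="Qwen/Qwen2.5-1.5B-Instruct",
        reward_funcs=[has_answer_tag],
        args=args,
        train_dataset=train_dataset,
        peft_config=peft_config,
    )
```

Calling `build_trainer(dataset).train()` would start training; the real template uses three reward functions rather than the single toy one shown here.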

Examples

  • reward_functions_library.py: 20+ battle-tested reward functions
    • Correctness rewards (exact match, fuzzy match, numeric, code execution)
    • Format rewards (XML, JSON, strict/soft)
    • Length rewards (ideal length, min/max)
    • Style rewards (reasoning quality, citations, repetition penalty)
    • Combined rewards (multi-objective optimization)
    • Preset collections for common tasks
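As an illustration of the first two categories, here is a minimal sketch of a soft XML format reward and an exact-match correctness reward. It assumes plain-text completions and a dataset column named `answer` that is passed to the reward function as a keyword argument; the function names and reward values are hypothetical and may differ from those in reward_functions_library.py.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FORMAT_RE = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL
)


def extract_answer(text):
    """Return the stripped contents of the first <answer> tag, or None."""
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


def soft_format_reward(completions, **kwargs):
    """0.5 if the reasoning/answer tags appear in order, else 0.0."""
    return [0.5 if FORMAT_RE.search(text) else 0.0 for text in completions]


def exact_match_reward(completions, answer, **kwargs):
    """1.0 if the extracted answer equals the reference string, else 0.0."""
    return [
        1.0 if extract_answer(text) == ref.strip() else 0.0
        for text, ref in zip(completions, answer)
    ]


completions = ["<reasoning>2+2</reasoning>\n<answer>4</answer>", "The answer is 4"]
print(soft_format_reward(completions))                   # [0.5, 0.0]
print(exact_match_reward(completions, answer=["4", "4"]))  # [1.0, 0.0]
```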

📖 Usage for Agents

When this skill is loaded in your agent's context:

  1. Always read SKILL.md first before implementing
  2. Start simple - Use length-based reward to validate setup
  3. Build incrementally - Add one reward function at a time
  4. Reference examples - Copy patterns from reward_functions_library.py
  5. Monitor training - Watch reward metrics (not loss!)
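Step 2 can be sketched as follows: a length-based reward needs no labels, so it is a cheap way to confirm that the pipeline runs end to end before adding task-specific rewards. The function name `length_reward` and the `target_chars` parameter are assumptions for illustration, and the check at the bottom shows the kind of standalone test worth running on any reward function before training.

```python
def length_reward(completions, target_chars=200, **kwargs):
    """Reward in [0, 1] that peaks when a completion is target_chars long.

    Assumes plain-text completions (a list of strings).
    """
    rewards = []
    for text in completions:
        deviation = abs(len(text) - target_chars) / target_chars
        rewards.append(max(0.0, 1.0 - deviation))
    return rewards


# Standalone check on hand-written samples, no model or trainer required.
samples = ["x" * 200, "x" * 100, "x" * 500]
print(length_reward(samples))  # [1.0, 0.5, 0.0]
```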

🎯 Common Use Cases

| Task Type       | Recommended Rewards            | Template                    |
|-----------------|--------------------------------|-----------------------------|
| Math reasoning  | MATH_REASONING_REWARDS preset  | basic_grpo_training.py      |
| Code generation | CODE_GENERATION_REWARDS preset | Modify dataset in template  |
| Summarization   | SUMMARIZATION_REWARDS preset   | Adjust prompts + rewards    |
| Q&A             | QA_REWARDS preset              | Use fuzzy match + citations |
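The contents of these presets are not shown in this README. As a purely hypothetical sketch, a preset could be a list of (reward function, weight) pairs whose weighted outputs are summed per completion. Every name below (`non_empty_reward`, `short_reward`, `EXAMPLE_PRESET`, `combined_reward`) is invented for illustration and is not the library's actual API.

```python
def non_empty_reward(completions, **kwargs):
    """1.0 for any completion with non-whitespace content."""
    return [1.0 if text.strip() else 0.0 for text in completions]


def short_reward(completions, max_chars=50, **kwargs):
    """1.0 for completions no longer than max_chars characters."""
    return [1.0 if len(text) <= max_chars else 0.0 for text in completions]


# Hypothetical preset: (reward function, weight) pairs.
EXAMPLE_PRESET = [(non_empty_reward, 1.0), (short_reward, 0.5)]


def combined_reward(preset, completions, **kwargs):
    """Weighted sum of every reward function in the preset, per completion."""
    totals = [0.0] * len(completions)
    for func, weight in preset:
        for i, reward in enumerate(func(completions, **kwargs)):
            totals[i] += weight * reward
    return totals


print(combined_reward(EXAMPLE_PRESET, ["a short summary", "", "x" * 80]))
# [1.5, 0.5, 1.0]
```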

⚠️ Critical Reminders

  • Loss often goes UP during training - This is normal: the reported loss mostly reflects the KL penalty, which grows as the policy moves away from the reference model
  • Use 3-5 reward functions - Single rewards often fail
  • Test rewards before training - Debug each function independently
  • Monitor reward_std - Should stay > 0.1; when every completion in a group earns the same reward, the group gives no learning signal (a sign of mode collapse)
  • Start with num_generations=4-8 - Scale up if GPU allows
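To see why reward_std matters, recall that GRPO scores each completion relative to the other completions sampled for the same prompt. The sketch below normalizes rewards within one group using the population standard deviation; actual implementations may differ in the exact estimator and epsilon, and the helper name `group_advantages` is an assumption for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with varied rewards gives a usable signal.
print(group_advantages([0.0, 1.0, 0.5, 1.0]))

# A collapsed group (identical rewards, std = 0) gives all-zero advantages,
# so these samples contribute nothing to the policy update.
print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0, 0.0]
```

With num_generations=4-8, each group contains only a handful of rewards, which is one reason a single coarse reward (for example, all 0 or all 1) can easily produce zero-variance groups.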

🔗 External Resources

📝 Version

v1.0.0 - Initial release (January 2025)

👨‍💻 Maintained By

Orchestra Research

For questions or improvements, see https://orchestra.com


License: MIT
Last Updated: January 2025