mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00


GRPO/RL Training Skill

Expert-level guidance for Group Relative Policy Optimization with TRL

📁 Skill Structure

grpo-rl-training/
├── SKILL.md                              # Main skill documentation (READ THIS FIRST)
├── README.md                             # This file
├── templates/
│   └── basic_grpo_training.py            # Production-ready training template
└── examples/
    └── reward_functions_library.py       # 20+ reward function examples

🚀 Quick Start

1. Read SKILL.md - Comprehensive guide with all concepts and patterns
2. Copy templates/basic_grpo_training.py - Start with working code
3. Browse examples/reward_functions_library.py - Pick reward functions for your task
4. Modify for your use case - Adapt dataset, rewards, and config

💡 What's Inside

SKILL.md (Main Documentation)

Core GRPO concepts and algorithm fundamentals
Complete implementation workflow (dataset → rewards → training → deployment)
10+ reward function examples with code
Hyperparameter tuning guide
Training insights (loss behavior, metrics, debugging)
Troubleshooting guide
Production best practices

Templates

basic_grpo_training.py: Minimal, production-ready training script
- Uses Qwen 2.5 1.5B Instruct
- 3 reward functions (format + correctness)
- LoRA for efficient training
- Fully documented and ready to run

Examples

reward_functions_library.py: 20+ battle-tested reward functions
- Correctness rewards (exact match, fuzzy match, numeric, code execution)
- Format rewards (XML, JSON, strict/soft)
- Length rewards (ideal length, min/max)
- Style rewards (reasoning quality, citations, repetition penalty)
- Combined rewards (multi-objective optimization)
- Preset collections for common tasks
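
As a rough illustration of the format and correctness categories above, the sketch below follows the common TRL convention that a reward function receives a list of completions plus extra keyword arguments and returns one float per completion. The `<reasoning>`/`<answer>` tag layout, the `answer` keyword, and the reward magnitudes are assumptions made for this example; they are not taken from reward_functions_library.py. Completions are assumed to be plain strings.

```python
import re

# Hypothetical tag layout; the actual library may expect different tags.
FORMAT_PATTERN = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL
)


def format_reward(completions, **kwargs):
    """Return 1.0 for each completion containing the expected layout, else 0.0."""
    return [1.0 if FORMAT_PATTERN.search(text) else 0.0 for text in completions]


def numeric_correctness_reward(completions, answer, **kwargs):
    """Return 2.0 when the extracted answer equals the reference number, else 0.0."""
    rewards = []
    for text, expected in zip(completions, answer):
        match = FORMAT_PATTERN.search(text)
        try:
            predicted = float(match.group(1).strip()) if match else None
        except ValueError:
            # The answer tag held something that is not a number.
            predicted = None
        is_correct = predicted is not None and predicted == float(expected)
        rewards.append(2.0 if is_correct else 0.0)
    return rewards
```

Keeping the format check separate from the correctness check means a completion can still earn partial credit for following the layout before it starts getting answers right.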

📖 Usage for Agents

When this skill is loaded in your agent's context:

1. Always read SKILL.md first before implementing
2. Start simple - Use length-based reward to validate setup
3. Build incrementally - Add one reward function at a time
4. Reference examples - Copy patterns from reward_functions_library.py
5. Monitor training - Watch reward metrics (not loss!)
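
A length-based reward for validating the setup could look like the minimal sketch below. The function name, the `target_chars` parameter, and the linear falloff are illustrative assumptions rather than the library's own implementation, and completions are assumed to be plain strings.

```python
def length_reward(completions, target_chars=200, **kwargs):
    """Score each completion in [0, 1] by how close its length is to target_chars."""
    rewards = []
    for text in completions:
        distance = abs(len(text) - target_chars)
        # Linear falloff: 1.0 at the target length, 0.0 at twice the target or empty.
        rewards.append(max(0.0, 1.0 - distance / target_chars))
    return rewards
```

Because this reward depends only on string length, it is cheap to compute and easy to verify by hand. If its mean rises during a short run, the training loop is wired correctly and task-specific rewards can be added one at a time.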

🎯 Common Use Cases

Task Type	Recommended Rewards	Template
Math reasoning	`MATH_REASONING_REWARDS` preset	basic_grpo_training.py
Code generation	`CODE_GENERATION_REWARDS` preset	Modify dataset in template
Summarization	`SUMMARIZATION_REWARDS` preset	Adjust prompts + rewards
Q&A	`QA_REWARDS` preset	Use fuzzy match + citations

⚠️ Critical Reminders

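Loss goes UP during training - This is normal: the reported loss largely reflects the KL divergence from the reference model, which grows as the policy moves away from it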
Use 3-5 reward functions - Single rewards often fail
Test rewards before training - Debug each function independently
Monitor reward_std - Should stay > 0.1 (avoid mode collapse)
Start with num_generations between 4 and 8 - Scale up if GPU memory allows
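
To see why reward_std matters, the sketch below normalizes the rewards of one prompt's group of generations into advantages, which is the core idea behind the "group relative" part of GRPO. It uses the population standard deviation and an `eps` term for simplicity; the exact normalization inside TRL may differ, and the helper names are hypothetical. The 0.1 threshold is the one given in the reminders above.

```python
import statistics


def group_advantages(rewards, eps=1e-4):
    """Normalize one prompt's rewards into advantages: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    advantages = [(r - mean) / (std + eps) for r in rewards]
    return advantages, std


def has_learning_signal(rewards, min_std=0.1):
    """Return True when the group's reward spread exceeds the threshold."""
    return statistics.pstdev(rewards) > min_std
```

When every generation in a group receives the same reward, the standard deviation is zero and all advantages are zero, so that group contributes no gradient signal. Running each reward function through a check like `has_learning_signal` on a few hand-written completions is one inexpensive way to test rewards before training.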

🔗 External Resources

📝 Version

v1.0.0 - Initial release (January 2025)

👨‍💻 Maintained By

Orchestra Research

For questions or improvements, see https://orchestra.com

License: MIT
Last Updated: January 2025