Comprehensive cleanup across 80 files, based on automated analysis
(ruff, pyflakes, vulture) and a manual review of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
  (vs. availability checks, which were left alone; sketched below)
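To illustrate the distinction, a minimal sketch (module names are hypothetical, not taken from the codebase):

```python
# Truly unused: imported in a try/except but never referenced again,
# so the whole block was safe to delete.
try:
    import ujson  # never used below
except ImportError:
    pass

# Availability check: the import outcome drives a feature flag, so the
# try/except is load-bearing and was left alone.
try:
    import uvloop
    HAS_UVLOOP = True
except ImportError:
    uvloop = None
    HAS_UVLOOP = False
```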
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
  (12 instances of add_parser() whose result was never used; see the sketch below)
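The pattern, sketched with hypothetical subcommand names (not the actual hermes_cli/main.py code): add_parser() registers the subcommand as a side effect, so binding its return value only matters when the subparser is configured further.

```python
import argparse

parser = argparse.ArgumentParser(prog="hermes")
subparsers = parser.add_subparsers(dest="command")

# Before: `run_parser = subparsers.add_parser("run")` with run_parser unused.
# After: registration happens as a side effect of the call itself.
subparsers.add_parser("run")

# The binding is kept only where the subparser needs further configuration.
chat = subparsers.add_parser("chat")
chat.add_argument("--model", default="hermes")
```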
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on a
  module-level function (@property is only meaningful on a class
  attribute; sketched after this list)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
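An illustrative sketch (not the actual providers.py code) of why @property belongs inside a class:

```python
@property
def broken():
    # At module level the decorator replaces the function with a
    # property object, so the name is no longer callable:
    #   broken()  ->  TypeError: 'property' object is not callable
    return "value"

class Config:
    @property
    def value(self):
        # Inside a class, the descriptor protocol applies and the
        # property is read as a plain attribute.
        return "value"

print(Config().value)  # -> "value"
```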
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
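The underlying rule, sketched with hypothetical names: global is only required to rebind a module-level name; in-place mutation and plain reads resolve through normal scoping.

```python
_sessions: dict[str, object] = {}
_shutdown = False

def close_session(key: str) -> None:
    # No `global` needed: .pop() mutates the dict in place and
    # never rebinds the name _sessions.
    _sessions.pop(key, None)

def should_stop() -> bool:
    # Reads never need `global` either.
    return _shutdown

def reset_sessions() -> None:
    # `global` IS required here: the name is rebound to a new object.
    global _sessions
    _sessions = {}
```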
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
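The consolidations above, side by side in a minimal sketch:

```python
url = "https://example.com"
items: list[str] = []
config = {"model": "hermes"}

# startswith/endswith accept a tuple of alternatives:
if url.startswith(("http://", "https://")):  # was: two or-ed startswith() calls
    ...

# Truthiness instead of explicit length comparisons:
if not items:  # was: len(items) == 0
    ...
if items:      # was: len(items) > 0
    ...

# Membership tests go against the dict itself:
if "model" in config:  # was: "model" in config.keys()
    ...
```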
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
# GRPO/RL Training Skill

Expert-level guidance for Group Relative Policy Optimization with TRL.
## 📁 Skill Structure

```
grpo-rl-training/
├── SKILL.md                         # Main skill documentation (READ THIS FIRST)
├── README.md                        # This file
├── templates/
│   └── basic_grpo_training.py       # Production-ready training template
└── examples/
    └── reward_functions_library.py  # 20+ reward function examples
```
## 🚀 Quick Start

1. **Read SKILL.md** - Comprehensive guide with all concepts and patterns
2. **Copy `templates/basic_grpo_training.py`** - Start with working code
3. **Browse `examples/reward_functions_library.py`** - Pick reward functions for your task
4. **Modify for your use case** - Adapt dataset, rewards, and config
## 💡 What's Inside

### SKILL.md (Main Documentation)
- Core GRPO concepts and algorithm fundamentals
- Complete implementation workflow (dataset → rewards → training → deployment)
- 10+ reward function examples with code
- Hyperparameter tuning guide
- Training insights (loss behavior, metrics, debugging)
- Troubleshooting guide
- Production best practices
### Templates

- `basic_grpo_training.py`: Minimal, production-ready training script
  - Uses Qwen 2.5 1.5B Instruct
  - 3 reward functions (format + correctness)
  - LoRA for efficient training
  - Fully documented and ready to run
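The template itself is the source of truth; as a rough sketch of its shape, assuming TRL's GRPOTrainer/GRPOConfig API with a placeholder dataset and a toy reward:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 200 characters.
    return [-abs(200 - len(c)) / 200 for c in completions]

args = GRPOConfig(
    output_dir="grpo-out",
    num_generations=8,  # completions sampled per prompt (the "group")
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=[reward_len],  # the real template passes 3 reward functions
    args=args,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
)
trainer.train()
```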
### Examples

- `reward_functions_library.py`: 20+ battle-tested reward functions
  - Correctness rewards (exact match, fuzzy match, numeric, code execution)
  - Format rewards (XML, JSON, strict/soft)
  - Length rewards (ideal length, min/max)
  - Style rewards (reasoning quality, citations, repetition penalty)
  - Combined rewards (multi-objective optimization)
  - Preset collections for common tasks
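The exact names and signatures live in the library file; as a hedged illustration, a TRL-style strict-format reward has this shape:

```python
import re

def strict_format_reward(completions, **kwargs):
    # 1.0 iff the completion is exactly <reasoning>...</reasoning>
    # followed by <answer>...</answer>; 0.0 otherwise.
    pattern = r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*$"
    return [1.0 if re.match(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Multi-objective training just passes several such functions:
#   GRPOTrainer(..., reward_funcs=[strict_format_reward, correctness_reward])
```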
## 📖 Usage for Agents

When this skill is loaded in your agent's context:

- **Always read SKILL.md first** before implementing
- **Start simple** - Use a length-based reward to validate the setup
- **Build incrementally** - Add one reward function at a time
- **Reference examples** - Copy patterns from `reward_functions_library.py`
- **Monitor training** - Watch reward metrics (not loss!)
## 🎯 Common Use Cases

| Task Type | Recommended Rewards | Template |
|---|---|---|
| Math reasoning | MATH_REASONING_REWARDS preset | `basic_grpo_training.py` |
| Code generation | CODE_GENERATION_REWARDS preset | Modify dataset in template |
| Summarization | SUMMARIZATION_REWARDS preset | Adjust prompts + rewards |
| Q&A | QA_REWARDS preset | Use fuzzy match + citations |
## ⚠️ Critical Reminders

- **Loss goes UP during training** - This is normal (it's KL divergence)
- **Use 3-5 reward functions** - Single rewards often fail
- **Test rewards before training** - Debug each function independently (see the sketch below)
- **Monitor reward_std** - Should stay > 0.1 (avoid mode collapse)
- **Start with num_generations=4-8** - Scale up if GPU allows
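A quick offline check of a reward function before committing GPU time (function and samples are hypothetical; adapt to your own rewards):

```python
def numeric_match_reward(completions, answer, **kwargs):
    # 1.0 if the completion contains the expected numeric answer.
    # Extra dataset columns (like `answer`) arrive as aligned lists.
    return [1.0 if str(a) in c else 0.0 for c, a in zip(completions, answer)]

samples = ["The result is 42.", "I am not sure."]
answers = [42, 42]

scores = numeric_match_reward(completions=samples, answer=answers)
print(scores)  # -> [1.0, 0.0]
assert len(scores) == len(samples)  # GRPO expects one float per completion
```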
## 🔗 External Resources
## 📝 Version

v1.0.0 - Initial release (January 2025)
## 👨‍💻 Maintained By

Orchestra Research. For questions or improvements, see https://orchestra.com

License: MIT
Last Updated: January 2025