hermes-agent/environments/benchmarks/terminalbench_2
teknium ad042fdd68 Update terminalbench_2 configuration for enhanced performance and evaluation
- Increased max_token_length from 16000 to 32000 to allow for longer inputs.
- Adjusted agent_temperature from 0.6 to 0.8 for more varied responses.
- Extended test_timeout from 180 to 600 seconds to accommodate longer evaluations.
- Updated data directory path for saving evaluations to ensure proper organization.
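
For orientation, the three numeric changes listed above might look like the following in default.yaml. This is only a sketch: the key names and values are taken from the commit message, but the flat top-level layout is an assumption, and the real file may nest these keys under other sections. The new data directory path is not given in the commit message, so it is left out.

```yaml
# Hypothetical excerpt of default.yaml after commit ad042fdd68.
# Key names and values come from the commit message; the flat layout is assumed.
max_token_length: 32000   # was 16000
agent_temperature: 0.8    # was 0.6
test_timeout: 600         # seconds; was 180
# The updated data directory path is not stated in the commit message and is omitted here.
```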
2026-02-10 19:48:41 +00:00
__init__.py            Add new environments and enhance tool context functionality                   2026-02-10 19:39:05 +00:00
default.yaml           Update terminalbench_2 configuration for enhanced performance and evaluation  2026-02-10 19:48:41 +00:00
run_eval.sh            Add new environments and enhance tool context functionality                   2026-02-10 19:39:05 +00:00
terminalbench2_env.py  Add new environments and enhance tool context functionality                   2026-02-10 19:39:05 +00:00