hermes-agent/environments/benchmarks/terminalbench_2
teknium ba3fea24f1 Enhance TerminalBench 2 configuration and evaluation handling
- Added task_timeout parameter to enforce a maximum wall-clock time for each task; a task that exceeds it is automatically scored as FAIL.
- Introduced terminal_timeout and tool_pool_size parameters to improve command execution and concurrency management.
- Updated logging to provide detailed task execution times and timeout handling, enhancing overall monitoring.
- Removed outdated evaluate_config.yaml file to streamline configuration management.
2026-02-10 22:53:24 +00:00
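The per-task timeout described in the commit message can be illustrated with a minimal sketch. This is not the code in terminalbench2_env.py: the function name run_with_task_timeout, the result dictionary keys, and the two demo tasks are hypothetical, and only the task_timeout parameter name comes from the commit message. The sketch assumes each task is an asyncio coroutine that returns True on success, and it records the elapsed wall-clock time so it can be logged.

```python
import asyncio
import time


async def run_with_task_timeout(task_name, coro_factory, task_timeout):
    """Run one task and score it as FAIL if it exceeds task_timeout seconds.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    start = time.monotonic()
    try:
        # wait_for cancels the inner coroutine once the timeout elapses.
        passed = await asyncio.wait_for(coro_factory(), timeout=task_timeout)
        timed_out = False
    except asyncio.TimeoutError:
        # Exceeding the wall-clock budget counts as a failed task.
        passed = False
        timed_out = True
    elapsed = time.monotonic() - start
    return {
        "task": task_name,
        "passed": bool(passed),
        "timed_out": timed_out,
        "elapsed_s": elapsed,
    }


async def _fast_task():
    # Stand-in for a task that finishes well within its budget.
    await asyncio.sleep(0.01)
    return True


async def _slow_task():
    # Stand-in for a task that would run far past its budget.
    await asyncio.sleep(5)
    return True


def demo():
    async def main():
        return await asyncio.gather(
            run_with_task_timeout("fast", _fast_task, task_timeout=1.0),
            run_with_task_timeout("slow", _slow_task, task_timeout=0.05),
        )

    return asyncio.run(main())


if __name__ == "__main__":
    for result in demo():
        print(result)
```

Using time.monotonic rather than time.time keeps the elapsed measurement unaffected by system clock adjustments, which matters when the value is compared against a timeout.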
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
default.yaml Enhance TerminalBench 2 configuration and evaluation handling 2026-02-10 22:53:24 +00:00
run_eval.sh Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
terminalbench2_env.py Enhance TerminalBench 2 configuration and evaluation handling 2026-02-10 22:53:24 +00:00