hermes-agent/environments/benchmarks/terminalbench_2
teknium 1b7bc299f3 Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements
- Updated task filter descriptions for clarity and added a new skip task feature to exclude incompatible tasks.
- Introduced a set of modal incompatible tasks to prevent execution errors in cloud environments.
- Implemented streaming JSONL logging for task results, preserving data even on interruptions.
- Refactored task evaluation logic to include skipped task reporting and improved error handling.
2026-02-12 05:36:45 +00:00
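The commit above describes two mechanisms. The first skips tasks known to be incompatible with Modal and reports them as skipped. The second streams each task result to a JSONL file so that finished results survive an interrupted run. The Python sketch below shows both ideas under stated assumptions. The skip-set name `MODAL_INCOMPATIBLE_TASKS`, the placeholder task ID, and the helpers `filter_tasks`, `append_result` and `read_results` are hypothetical. None of them are taken from terminalbench2_env.py.

```python
import json
import os
import tempfile
from typing import Iterable

# Hypothetical skip set; the real Modal-incompatible task IDs live in
# terminalbench2_env.py and are not reproduced here.
MODAL_INCOMPATIBLE_TASKS = frozenset({"example-task-needs-privileged-docker"})


def filter_tasks(task_ids: Iterable[str], skip: frozenset) -> tuple:
    """Split task IDs into (to_run, skipped), preserving input order."""
    to_run: list = []
    skipped: list = []
    for task_id in task_ids:
        (skipped if task_id in skip else to_run).append(task_id)
    return to_run, skipped


def append_result(path: str, record: dict) -> None:
    """Append one result as a single JSON line and force it to disk,
    so records already written survive if the run is interrupted later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def read_results(path: str) -> list:
    """Load every complete JSONL record, tolerating a truncated last line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # A crash mid-write can leave a partial final line; skip it.
                continue
    return records


if __name__ == "__main__":
    tasks = ["task-a", "example-task-needs-privileged-docker", "task-b"]
    to_run, skipped = filter_tasks(tasks, MODAL_INCOMPATIBLE_TASKS)
    log_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
    for task_id in skipped:
        append_result(log_path, {"task": task_id, "status": "skipped"})
    for task_id in to_run:
        # Placeholder for a real evaluation: every task is simply marked passed.
        append_result(log_path, {"task": task_id, "status": "passed"})
    print(read_results(log_path))
```

This design writes one line per task instead of a single JSON document at the end of the run. If the run is interrupted, at most the record being written is lost, and the reader drops a truncated final line rather than failing.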
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
default.yaml Enhance TerminalBench 2 configuration and evaluation handling 2026-02-10 22:53:24 +00:00
run_eval.sh Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
terminalbench2_env.py Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements 2026-02-12 05:36:45 +00:00