hermes-agent/environments/benchmarks
teknium 1b7bc299f3 Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements
- Updated task filter descriptions for clarity and added a new skip task feature to exclude incompatible tasks.
- Introduced a set of modal incompatible tasks to prevent execution errors in cloud environments.
- Implemented streaming JSONL logging for task results, preserving data even on interruptions.
- Refactored task evaluation logic to include skipped task reporting and improved error handling.
2026-02-12 05:36:45 +00:00
terminalbench_2 Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements 2026-02-12 05:36:45 +00:00
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
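
The commit notes above mention a skip set for incompatible tasks and streaming JSONL logging of results. The following is a minimal sketch of how those two ideas can fit together, not the code from this commit: the names `SKIP_TASKS`, `run_tasks`, and `evaluate`, the record fields, and the task names are all hypothetical and chosen only for illustration.

```python
import io
import json
from typing import Callable, Iterable, TextIO

# Hypothetical placeholder names; the actual set of incompatible tasks
# in the repository is not shown in this listing.
SKIP_TASKS = frozenset({"example-task-a", "example-task-b"})


def run_tasks(
    task_names: Iterable[str],
    evaluate: Callable[[str], float],
    log: TextIO,
    skip: Iterable[str] = SKIP_TASKS,
) -> dict:
    """Evaluate tasks one by one, writing one JSON line per task as it finishes."""
    skip_set = set(skip)
    summary = {"completed": [], "skipped": [], "failed": []}
    for name in task_names:
        if name in skip_set:
            # Skipped tasks are still reported so the log accounts for every task.
            record = {"task": name, "status": "skipped"}
            summary["skipped"].append(name)
        else:
            try:
                record = {"task": name, "status": "ok", "score": evaluate(name)}
                summary["completed"].append(name)
            except Exception as exc:
                # A failing task is recorded and does not abort the run.
                record = {"task": name, "status": "error", "error": str(exc)}
                summary["failed"].append(name)
        # Write and flush immediately so earlier results survive an interruption.
        log.write(json.dumps(record) + "\n")
        log.flush()
    return summary


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file opened in append mode
    print(run_tasks(["t1", "example-task-a"], lambda name: 1.0, buffer))
    print(buffer.getvalue(), end="")
```

Because each record is written and flushed as soon as its task finishes, a run that is interrupted partway still leaves a valid JSONL file containing every result produced up to that point.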