hermes-agent/environments/benchmarks/terminalbench_2
teknium1 ee7fde6531 feat: add OpenThoughts-TBLite evaluation script
Introduced an evaluation script for the OpenThoughts-TBLite environment. The script accepts customizable options, writes logs, and streams output in real time while evaluating terminal agents. It complements the existing benchmarking tools.
2026-03-04 12:55:56 +00:00
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
default.yaml Enhance TerminalBench 2 configuration and evaluation handling 2026-02-10 22:53:24 +00:00
run_eval.sh feat: add OpenThoughts-TBLite evaluation script 2026-03-04 12:55:56 +00:00
terminalbench2_env.py Enhance TerminalBench2 environment with task filtering (due to incompatibility with Modal) and logging improvements 2026-02-12 05:36:45 +00:00