hermes-agent/environments/benchmarks
teknium1 ee7fde6531 feat: add OpenThoughts-TBLite evaluation script
Adds an evaluation script for the OpenThoughts-TBLite environment that lets users run evaluations with customizable options. The script logs each run and streams output in real time, and it complements the existing benchmarking tools for terminal agents.
2026-03-04 12:55:56 +00:00
tblite feat: add OpenThoughts-TBLite evaluation script 2026-03-04 12:55:56 +00:00
terminalbench_2 feat: add OpenThoughts-TBLite evaluation script 2026-03-04 12:55:56 +00:00
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00