feat: add OpenThoughts-TBLite evaluation environment and configuration files

Add a new evaluation environment for OpenThoughts-TBLite, consisting of the main evaluation script, a configuration YAML, and README documentation. OpenThoughts-TBLite is a faster alternative to Terminal-Bench 2.0, with 100 difficulty-calibrated tasks for terminal agents. The new environment makes terminal-agent benchmarking quicker to run and easier to configure.
teknium1 2026-03-04 11:38:32 +00:00
parent 3db3d60368
commit 0ea6c34325
4 changed files with 231 additions and 0 deletions