The environment was merged without several standard components.
Updated to match the patterns established by 82 Atropos environments
and our own HermesAgentBaseEnv contract.
Added:
- WebResearchEnvConfig — custom Pydantic config with reward weights,
efficiency thresholds, eval settings, dataset config (all tunable
via CLI/YAML without code changes)
- config_init() classmethod — default server config (OpenRouter +
Claude) so the env works out of the box
- wandb_log() override — logs reward breakdown metrics (correctness,
tool_usage, efficiency, diversity, correct_rate, tool_usage_rate)
with proper buffer management and super() call
- evaluate() — uses server.chat_completion instead of broken stub
_run_agent_on_item(). Logs via evaluate_log() for lighteval-
compatible output.
Fixed:
- Removed broken _run_agent_on_item() stub that returned empty results
- evaluate() now uses server.chat_completion (same pattern as
TerminalTestEnv) for actual model evaluation
- compute_reward now reads tool calls from AgentResult correctly
- LLM judge uses self.server.chat_completion instead of ctx
Reward config is now tunable without code changes:
    --env.correctness_weight 0.6
    --env.tool_usage_weight 0.2
    --env.efficiency_weight 0.2
    --env.diversity_bonus 0.1
    --env.efficient_max_calls 5
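
For reviewers, a rough sketch of how these knobs could combine into one score. This is illustrative only and is not the code in this change. RewardWeights is a hypothetical stdlib stand-in for the reward fields of WebResearchEnvConfig (which is a Pydantic config), combine_reward is a hypothetical helper, and the weighted-sum-plus-bonus formula, the all-or-nothing efficiency rule, and the clipping to [0, 1] are all assumptions. The values are the ones from the flags above.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical stand-in for the reward fields of WebResearchEnvConfig."""
    correctness_weight: float = 0.6
    tool_usage_weight: float = 0.2
    efficiency_weight: float = 0.2
    diversity_bonus: float = 0.1
    efficient_max_calls: int = 5


def combine_reward(cfg, correctness, used_tools, num_calls, diverse_sources):
    """Assumed formula: weighted sum plus a flat diversity bonus, clipped to [0, 1]."""
    tool_usage = 1.0 if used_tools else 0.0
    # Assumption: full efficiency credit within the call budget, none otherwise.
    efficiency = 1.0 if 0 < num_calls <= cfg.efficient_max_calls else 0.0
    score = (
        cfg.correctness_weight * correctness
        + cfg.tool_usage_weight * tool_usage
        + cfg.efficiency_weight * efficiency
    )
    if diverse_sources:
        score += cfg.diversity_bonus
    # Clip so the bonus cannot push the reward above 1.0.
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    cfg = RewardWeights()
    # Half-correct answer, tools used, over the call budget, diverse sources.
    print(combine_reward(cfg, 0.5, True, 10, True))
```

Under this assumed formula, raising --env.efficient_max_calls only widens the range of call counts that earn efficiency credit, while the three weights set the relative importance of each component.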