Commit graph

4 commits

Author SHA1 Message Date
Teknium
cbce5e93fc codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.

Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs).  That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.

After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly.  Works identically on every platform
and every locale, no surprise behavior.

Mechanical sweep via:
  ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix     --exclude 'tests,venv,.venv,node_modules,website,optional-skills,               skills,tinker-atropos,plugins' .

All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8').  Nothing
else changed.  Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).

Scope notes:
  - tests/ excluded: test fixtures can use locale encoding intentionally
    (exercising edge cases).  If we want to tighten tests later that's
    a separate PR.
  - plugins/ excluded: plugin-specific conventions may differ; plugin
    authors own their code.
  - optional-skills/ and skills/ excluded: skill scripts are user-authored
    and we don't want to mass-edit them.
  - website/ and tinker-atropos/ excluded: vendored / generated content.

46 files touched, 89 +/- lines (symmetric replacement).  No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
2026-05-08 14:27:40 -07:00
alt-glitch
77c5bc9da9 feat(budget): make tool result persistence thresholds configurable
Add BudgetConfig dataclass to centralize and make overridable the
hardcoded constants (50K per-result, 200K per-turn, 2K preview) that
control when tool outputs get persisted to sandbox. Configurable at
the RL environment level via HermesAgentEnvConfig fields, threaded
through HermesAgentLoop to the storage layer.

Resolution: pinned (read_file=inf) > env config overrides > registry
per-tool > default. CLI override: --env.turn_budget_chars 80000
2026-04-08 02:24:32 -07:00
teknium1
ce28f847ce fix: update OpenRouter model names for yc-bench config
Use anthropic/claude-sonnet-4.6 (OpenRouter format) instead of
anthropic/claude-sonnet-4-20250514 (direct API format).
2026-03-06 19:58:56 -08:00
teknium1
b4fbb6fe10 feat: add YC-Bench long-horizon agent benchmark environment
Adds eval-only benchmark for YC-Bench (collinear-ai/yc-bench), a
deterministic long-horizon benchmark where the agent acts as CEO of an
AI startup over a simulated 1-3 year run.

Key design decisions verified against the official yc-bench repo:
- Uses 'sim init' (NOT 'yc-bench run') to avoid starting a competing
  built-in agent loop
- Correct DB table names: 'companies' and 'sim_events'
- Correct 4 domains: research, inference, data_environment, training
- Penalty values are preset-dependent (not hardcoded in system prompt)
- Sequential evaluation (each run is 100-500 turns)
- Follows TerminalBench2 patterns: KeyboardInterrupt handling,
  cleanup_all_environments(), tqdm logging handler, streaming JSONL

yc-bench added as optional dependency: pip install hermes-agent[yc-bench]

Closes #340
2026-03-06 19:25:56 -08:00