mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-09 03:11:58 +00:00
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).
Problem: Python on Windows has two long-standing text-encoding pitfalls:
1. sys.stdout/stderr are bound to the console code page (cp1252 on
US-locale installs) — print('café') crashes with UnicodeEncodeError.
2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
PYTHONIOENCODING are set in their env — so any Python we spawn
(linters, sandbox children, delegation workers) hits the same bug.
Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:
- hermes_cli/main.py (hermes / hermes-agent console_script)
- run_agent.py (hermes-agent direct)
- acp_adapter/entry.py (hermes-acp)
- gateway/run.py (messaging gateway)
- batch_runner.py (parallel batch mode)
- cli.py (legacy direct-launch CLI)
On Windows, the bootstrap:
- os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode)
- os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
- sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')
Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.
On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected. POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.
setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.
What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init. A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.
Tests (17): 16 passed, 1 skipped on Windows.
- Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
- POSIX: complete no-op (verified on fake POSIX + skipped on real
POSIX since we don't have a Linux box in this session)
- Idempotence: multiple calls safe
- Graceful degradation: non-reconfigurable streams don't crash
- User opt-out: explicit PYTHONUTF8=0 is respected
- Load order: every entry point's FIRST top-level import is
hermes_bootstrap, enforced by an AST-level parametrized test
pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
|
||
|---|---|---|
| .. | ||
| assets | ||
| builtin_hooks | ||
| platforms | ||
| __init__.py | ||
| channel_directory.py | ||
| config.py | ||
| delivery.py | ||
| display_config.py | ||
| hooks.py | ||
| mirror.py | ||
| pairing.py | ||
| platform_registry.py | ||
| restart.py | ||
| run.py | ||
| runtime_footer.py | ||
| session.py | ||
| session_context.py | ||
| status.py | ||
| sticker_cache.py | ||
| stream_consumer.py | ||
| whatsapp_identity.py | ||