Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-05-09 03:11:58 +00:00
execute_code: set PYTHONIOENCODING=utf-8 + PYTHONUTF8=1 in child env
Third Windows-specific sandbox bug (after WinError 10106 and the UTF-8
file-write bug): user scripts that print non-ASCII to stdout crash with:
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position N: character maps to <undefined>
Root cause: when the child's stdout is a pipe and PYTHONIOENCODING is not
set, Python on Windows binds sys.stdout to the system ANSI code page
(cp1252 on US-locale installs) instead of UTF-8. LLM-generated scripts
routinely print arrows, emoji, and other characters that cp1252 cannot
encode (it covers em-dashes and most Western European accented letters,
but not '→' or any emoji).
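A minimal, platform-independent sketch of the failure, assuming a pipe-attached sys.stdout behaves like an io.TextIOWrapper using the locale encoding with the default strict error handler. The helper names write_text and can_encode are hypothetical. Note that cp1252 can encode "café" (é is byte 0xE9); the crash comes from characters such as '→' and emoji that have no cp1252 mapping.

```python
import io


def write_text(encoding: str, text: str) -> bytes:
    """Encode text the way a TextIOWrapper over a pipe would (strict errors)."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding)  # errors="strict" by default
    stream.write(text)
    stream.flush()
    stream.detach()  # keep buf open after the wrapper is discarded
    return buf.getvalue()


def can_encode(encoding: str, text: str) -> bool:
    """True if writing text through a stream with this encoding succeeds."""
    try:
        write_text(encoding, text)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode("cp1252", "caf\u00e9"))             # True: é is 0xE9 in cp1252
print(can_encode("cp1252", "a \u2192 b"))            # False: no cp1252 byte for U+2192
print(can_encode("utf-8", "a \u2192 b \U0001F600"))  # True
```

Swapping the encoding from cp1252 to utf-8 is exactly the difference PYTHONIOENCODING=utf-8 makes for the child's real sys.stdout.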
Fix: spawn the sandbox child with:
    PYTHONIOENCODING=utf-8   # sys.stdin/stdout/stderr all UTF-8
    PYTHONUTF8=1             # PEP 540 UTF-8 mode: open() defaults to UTF-8 too
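A sketch of the spawn side under stated assumptions: the environment is copied straight from os.environ (the real code starts from _scrub_child_env(os.environ), which is not reproduced here), and utf8_child_env and probe_child are hypothetical helper names. The probe asks the child interpreter to report its own stdout encoding and UTF-8 mode flag.

```python
import os
import subprocess
import sys


def utf8_child_env(base=None):
    """Copy an environment and force UTF-8 stdio plus PEP 540 UTF-8 mode."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"  # sys.stdin/stdout/stderr
    env["PYTHONUTF8"] = "1"            # UTF-8 mode: open() defaults to UTF-8 too
    return env


def probe_child(env):
    """Return (stdout encoding, utf8_mode flag) as reported by a child interpreter."""
    code = "import sys; print(sys.stdout.encoding, sys.flags.utf8_mode)"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        check=True,
    )
    encoding, utf8_mode = proc.stdout.decode("utf-8").split()
    return encoding, utf8_mode


print(probe_child(utf8_child_env()))  # ('utf-8', '1')
```

Because the child's stdout is a pipe here (capture_output=True), this exercises the same code path that crashed in the sandbox.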
PYTHONUTF8 is the belt-and-suspenders half: LLM-generated scripts that
call open(path, 'w') without encoding= will now write UTF-8 files by
default, matching what the sandbox already does for its own staging
files.
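A sketch of the open() half, assuming a child that writes a file with open(path, 'w') and no encoding= argument; CHILD and child_write_bytes are hypothetical names. The parent reads the raw bytes back to confirm they are UTF-8. The \\u escapes keep the command line itself pure ASCII.

```python
import os
import subprocess
import sys
import tempfile

# Child code: open() with no encoding=, the pattern common in LLM-written scripts.
CHILD = (
    "import sys; f = open(sys.argv[1], 'w'); "
    "f.write('caf\\u00e9 \\u2192 \\u2713'); f.close()"
)


def child_write_bytes(extra_env):
    """Run CHILD with extra env vars and return the raw bytes it wrote."""
    env = {**os.environ, **extra_env}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        subprocess.run([sys.executable, "-c", CHILD, path], env=env, check=True)
        with open(path, "rb") as f:
            return f.read()


data = child_write_bytes({"PYTHONUTF8": "1"})
print(data == "caf\u00e9 \u2192 \u2713".encode("utf-8"))  # True
```

Without PYTHONUTF8=1 the same child would encode with the locale default, which on a cp1252 Windows install raises on '→'.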
The parent side already decodes child stdout/stderr as UTF-8 with
errors='replace' (lines 1345-1347), so the end-to-end chain is clean.
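The parent-side decoding can be sketched as follows. This is not the code at lines 1345-1347, which the diff does not show; run_child_decoded is a hypothetical helper. errors='replace' maps any byte sequence that is not valid UTF-8 to U+FFFD instead of raising, so even a child that writes raw bytes cannot crash the parent.

```python
import subprocess
import sys


def run_child_decoded(code, env=None):
    """Run a child interpreter; decode stdout/stderr as UTF-8, never raising."""
    proc = subprocess.run([sys.executable, "-c", code], env=env, capture_output=True)
    stdout = proc.stdout.decode("utf-8", errors="replace")
    stderr = proc.stderr.decode("utf-8", errors="replace")
    return stdout, stderr


# The child writes a lone 0xFF byte, which is never valid UTF-8.
out, _ = run_child_decoded("import sys; sys.stdout.buffer.write(b'ok \\xff')")
print(out == "ok \ufffd")  # True
```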
On POSIX these values usually match the locale default already, so
setting them is harmless there, and it acts as a safety net in
C/POSIX-locale containers and minimal base images.
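A sketch that checks the C-locale case, assuming LC_ALL=C and LANG=C are enough to simulate a minimal container (on Windows these variables are ignored and the probe still reports UTF-8). probe_c_locale_with_overrides is a hypothetical name. On Python 3.7+ a C locale may already be coerced to UTF-8 (PEP 538) or switch UTF-8 mode on by itself (PEP 540), which is why the overrides are a safety net on POSIX rather than the fix.

```python
import os
import subprocess
import sys

PROBE = (
    "import locale, sys; "
    "print(sys.stdout.encoding, sys.flags.utf8_mode, "
    "locale.getpreferredencoding(False))"
)


def probe_c_locale_with_overrides():
    """Report stdio encoding, UTF-8 mode, and preferred encoding under LC_ALL=C."""
    env = dict(os.environ, LC_ALL="C", LANG="C",
               PYTHONIOENCODING="utf-8", PYTHONUTF8="1")
    proc = subprocess.run([sys.executable, "-c", PROBE],
                          env=env, capture_output=True, check=True)
    # Lowercase because older Pythons report 'UTF-8' from getpreferredencoding.
    return [field.lower() for field in proc.stdout.decode("ascii").split()]


print(probe_c_locale_with_overrides())  # ['utf-8', '1', 'utf-8']
```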
Tests added (4); the test file now reports 28 passed, 1 skipped on Windows:
- test_popen_env_sets_pythonioencoding_utf8 (source grep)
- test_popen_env_sets_pythonutf8_mode (source grep)
- test_live_child_can_print_non_ascii (cross-platform live test)
- test_windows_child_without_utf8_env_would_fail (Windows negative
  control: reproduces the bug without our env overrides, proving the
  fix is load-bearing on this system)
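The commit's test bodies are not shown here, so the following is only a sketch of what checks in the spirit of the source-grep and live tests might look like. SANDBOX_SOURCE is a stand-in string rather than the real module, and source_sets_utf8_env and live_child_output are hypothetical names. The Windows negative control is left out because whether '→' fails depends on the machine's ANSI code page (cp932, for example, can encode it).

```python
import os
import subprocess
import sys

# Stand-in for the sandbox module source; a real test would read the file.
SANDBOX_SOURCE = '''
child_env["PYTHONIOENCODING"] = "utf-8"
child_env["PYTHONUTF8"] = "1"
'''


def source_sets_utf8_env(source):
    """Source-grep style check: both env assignments are present."""
    return ('child_env["PYTHONIOENCODING"] = "utf-8"' in source
            and 'child_env["PYTHONUTF8"] = "1"' in source)


def live_child_output():
    """Live check: a child with the overrides prints non-ASCII without crashing."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8", PYTHONUTF8="1")
    proc = subprocess.run(
        [sys.executable, "-c", "print('caf\\u00e9 \\u2192 \\U0001F600')"],
        env=env, capture_output=True, check=True,
    )
    return proc.stdout.decode("utf-8").strip()  # strip() drops \n or \r\n


assert source_sets_utf8_env(SANDBOX_SOURCE)
assert live_child_output() == "caf\u00e9 \u2192 \U0001F600"
```

A grep test is cheap and fails fast if someone deletes the lines; the live test is what actually proves the child can print.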
parent f5ec30dfe6
commit bf43f6cfdd
2 changed files with 152 additions and 0 deletions
@@ -1175,6 +1175,25 @@ def execute_code(
     child_env = _scrub_child_env(os.environ)
     child_env["HERMES_RPC_SOCKET"] = rpc_endpoint
     child_env["PYTHONDONTWRITEBYTECODE"] = "1"
+    # Force UTF-8 for the child's stdio and default file encoding.
+    #
+    # Without this, on Windows sys.stdout is bound to the console code
+    # page (cp1252 on US-locale installs), and any script that does
+    # ``print("café")`` or ``print("→")`` crashes with:
+    #
+    #   UnicodeEncodeError: 'charmap' codec can't encode character
+    #   '\u2192' in position N: character maps to <undefined>
+    #
+    # PYTHONIOENCODING fixes sys.stdin/stdout/stderr.
+    # PYTHONUTF8=1 enables "UTF-8 mode" (PEP 540) which additionally
+    # makes ``open()``'s default encoding UTF-8, so user scripts that
+    # write files without specifying encoding= also work correctly.
+    #
+    # On POSIX both values usually match the locale default already,
+    # so setting them is harmless belt-and-suspenders for environments
+    # with a C/POSIX locale (containers, minimal base images).
+    child_env["PYTHONIOENCODING"] = "utf-8"
+    child_env["PYTHONUTF8"] = "1"
     # Ensure the hermes-agent root is importable in the sandbox so
     # repo-root modules are available to child scripts. We also prepend
     # the staging tmpdir so ``from hermes_tools import ...`` resolves even