hermes-agent/gateway
Teknium d94fb47717 hermes_bootstrap: Windows-only UTF-8 stdio shim for all entry points
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).

Problem: Python on Windows has two long-standing text-encoding pitfalls:

  1. sys.stdout/stderr are bound to the console code page (cp1252 on
     US-locale installs) — print('café') crashes with UnicodeEncodeError.
  2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
     PYTHONIOENCODING are set in their env — so any Python we spawn
     (linters, sandbox children, delegation workers) hits the same bug.

Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:

  - hermes_cli/main.py   (hermes / hermes-agent console_script)
  - run_agent.py         (hermes-agent direct)
  - acp_adapter/entry.py (hermes-acp)
  - gateway/run.py       (messaging gateway)
  - batch_runner.py      (parallel batch mode)
  - cli.py               (legacy direct-launch CLI)

On Windows, the bootstrap:
  - os.environ.setdefault('PYTHONUTF8', '1')       (PEP 540 UTF-8 mode)
  - os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
  - sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')

Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.

On POSIX (Linux/macOS), the bootstrap is a complete no-op.  We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected.  POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.

setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.

What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init.  A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.

Tests (17): 16 passed, 1 skipped on Windows.
  - Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
  - POSIX: complete no-op (verified on fake POSIX + skipped on real
    POSIX since we don't have a Linux box in this session)
  - Idempotence: multiple calls safe
  - Graceful degradation: non-reconfigurable streams don't crash
  - User opt-out: explicit PYTHONUTF8=0 is respected
  - Load order: every entry point's FIRST top-level import is
    hermes_bootstrap, enforced by an AST-level parametrized test

pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
2026-05-08 14:27:40 -07:00
..
assets fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
builtin_hooks remove: BOOT.md built-in hook (#17093) 2026-04-28 09:50:27 -07:00
platforms feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags 2026-05-08 14:27:40 -07:00
__init__.py Enhance CLI with multi-platform messaging integration and configuration management 2026-02-02 19:01:51 -08:00
channel_directory.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
config.py fix(msgraph_webhook): harden auth surface + IP allowlisting + response hygiene 2026-05-08 10:29:58 -07:00
delivery.py fix(gateway): preserve case-sensitive chat IDs in DeliveryTarget.parse 2026-05-01 14:01:26 -07:00
display_config.py feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) 2026-05-07 05:04:37 -07:00
hooks.py fix(plugins): register dynamically-loaded modules in sys.modules before exec 2026-04-29 23:34:35 -07:00
mirror.py fix(gateway): avoid cross-user mirror writes in per-user group sessions 2026-04-26 18:31:24 -07:00
pairing.py fix(pairing): enforce lockout on approve_code, not just generate_code (#10195) (#21325) 2026-05-07 07:18:21 -07:00
platform_registry.py feat(gateway): generic plugin hooks for env enablement + cron delivery 2026-05-07 07:15:44 -07:00
restart.py fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
run.py hermes_bootstrap: Windows-only UTF-8 stdio shim for all entry points 2026-05-08 14:27:40 -07:00
runtime_footer.py feat(gateway): opt-in runtime-metadata footer on final replies (#17026) 2026-04-28 06:50:04 -07:00
session.py refactor(gateway): simplify auto-resume + extend to crash recovery 2026-05-07 05:05:34 -07:00
session_context.py fix(cron): run due jobs in parallel to prevent serial tick starvation (#13021) 2026-04-20 11:53:07 -07:00
status.py fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
sticker_cache.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
stream_consumer.py chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664 2026-04-29 21:56:51 -07:00
whatsapp_identity.py fix(whatsapp_identity): pin identifier regex to ASCII, clarify it's defense-in-depth 2026-04-26 20:48:31 -07:00