hermes-agent/hermes_bootstrap.py
Teknium c39b2b50ee
fix(tui): stop a cwd package named utils/proxy/ui from crashing the gateway child (#51693)
Launching Hermes from a directory that ships its own top-level package with a
Hermes-internal name (utils/, proxy/, ui/) crashed the gateway/TUI child with
an ImportError (exit 1, crash loop): from utils import atomic_replace resolved
to the user's package.

tui_gateway/entry.py already stripped the relative cwd forms ('' / '.'), but
the launch dir also reaches sys.path as its own ABSOLUTE path (venv activation
or a project that adds itself to PYTHONPATH), which the strip missed and which
sat ahead of the Hermes root.

Centralize a hardened guard in hermes_bootstrap.harden_import_path(): drop the
relative forms AND force the Hermes source root to the front even when an
absolute cwd entry is present. Wire it into tui_gateway/entry.py and
acp_adapter/entry.py (both spawn into arbitrary cwds); hermes_cli/main.py and
gateway/run.py already insert the root at front. gatewayClient.ts now also
exports HERMES_PYTHON_SRC_ROOT for defense in depth.
2026-06-23 23:29:45 -07:00

165 lines
6.9 KiB
Python

"""Windows UTF-8 bootstrap for Hermes entry points.
Python on Windows has two long-standing text-encoding footguns:
1. ``sys.stdout`` / ``sys.stderr`` are bound to the console code page
(``cp1252`` on US-locale installs), so ``print("café")`` crashes with
``UnicodeEncodeError: 'charmap' codec can't encode character``.
2. Child processes spawned via ``subprocess`` don't know to use UTF-8
unless ``PYTHONUTF8`` and/or ``PYTHONIOENCODING`` are set in their
environment — so any Python subprocess (the execute_code sandbox,
delegation children, linter subprocesses, etc.) inherits the same
cp1252 defaults and hits the same UnicodeEncodeError.
This module fixes both on Windows *only* — POSIX is untouched. It
should be imported at the very top of every Hermes entry point
(``hermes``, ``hermes-agent``, ``hermes-acp``, ``python -m gateway.run``,
``batch_runner.py``, ``cron/scheduler.py``) before any other imports
that might do file I/O or print to stdout.
What this module does on Windows:
- Sets ``os.environ["PYTHONUTF8"] = "1"`` (PEP 540 UTF-8 mode) so
every child process we spawn uses UTF-8 for ``open()`` and stdio.
- Sets ``os.environ["PYTHONIOENCODING"] = "utf-8"`` for belt-and-
suspenders — some tools read this instead of / in addition to
``PYTHONUTF8``.
- Reconfigures ``sys.stdout`` / ``sys.stderr`` to UTF-8 in the current
process, using the ``reconfigure()`` API (Python 3.7+). This fixes
``print("café")`` in the parent without a re-exec.
What this module does NOT do:
- It does not re-exec Python with ``-X utf8``, so ``open()`` calls in
the *current* process still default to locale encoding. Those need
an explicit ``encoding="utf-8"`` at the call site (lint rule
``PLW1514`` / ``PYI058``). Ruff is the right tool for that sweep.
What this module does on POSIX:
- Nothing. POSIX systems are already UTF-8 by default in 99% of cases,
and we don't want to touch ``LANG``/``LC_*`` behavior that users may
have configured intentionally. If someone hits a C/POSIX locale on
Linux, they can export ``PYTHONUTF8=1`` themselves — we won't override.
Idempotent: safe to call multiple times. ``_bootstrap_once`` guards
against double-reconfigure.
"""
from __future__ import annotations
import os
import sys
_IS_WINDOWS = sys.platform == "win32"
_bootstrap_applied = False
def apply_windows_utf8_bootstrap() -> bool:
"""Apply the Windows UTF-8 bootstrap if we're on Windows.
Returns True if bootstrap was applied (i.e. we're on Windows and
haven't already done this), False otherwise. The return value is
advisory — callers normally don't need it, but tests may want to
assert the path was taken.
Idempotent: subsequent calls after the first are a no-op.
"""
global _bootstrap_applied
if not _IS_WINDOWS:
return False
if _bootstrap_applied:
return False
# 1. Child processes inherit these and run in UTF-8 mode.
# We use setdefault() rather than overwriting so the user can
# explicitly opt out by setting PYTHONUTF8=0 in their environment
# (or PYTHONIOENCODING=something-else) if they really want to.
os.environ.setdefault("PYTHONUTF8", "1")
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
# 2. Reconfigure the current process's stdio to UTF-8. Needed
# because os.environ changes don't retroactively rebind sys.stdout
# — those were bound at interpreter startup based on the console
# code page. ``reconfigure`` is a TextIOWrapper method since 3.7.
#
# errors="replace" means that if we ever *read* something from
# stdin that isn't UTF-8 (unlikely but possible with piped input
# from legacy tools), we'll get U+FFFD replacement chars rather
# than a crash. Output is pure UTF-8.
for stream_name in ("stdout", "stderr"):
stream = getattr(sys, stream_name, None)
if stream is None:
continue
reconfigure = getattr(stream, "reconfigure", None)
if reconfigure is None:
# Not a TextIOWrapper (could be redirected to a BytesIO in
# tests, or a non-standard stream in some embedded cases).
# Skip silently — the env-var fix is still in effect for
# child processes, which is the bigger win.
continue
try:
reconfigure(encoding="utf-8", errors="replace")
except (OSError, ValueError):
# Already closed, or someone replaced it with something
# non-reconfigurable. Non-fatal.
pass
# stdin is reconfigured separately with errors="replace" too — input
# from a legacy pipe shouldn't crash the process.
stdin = getattr(sys, "stdin", None)
if stdin is not None:
reconfigure = getattr(stdin, "reconfigure", None)
if reconfigure is not None:
try:
reconfigure(encoding="utf-8", errors="replace")
except (OSError, ValueError):
pass
_bootstrap_applied = True
return True
def harden_import_path(src_root: str | None = None) -> None:
"""Stop a package in the current directory from shadowing Hermes modules.
Hermes ships top-level modules with common names (``utils``, ``proxy``,
``ui``). Python always seeds ``sys.path`` with the current directory, so
launching an entry point from a project that has its own ``utils/`` package
makes ``from utils import ...`` resolve to the *user's* package and crash
with an ImportError before the gateway can even start.
The current directory reaches ``sys.path`` two ways, and a complete guard
has to handle both:
- As the empty string ``""`` (or ``"."``) that Python inserts at
``sys.path[0]`` for ``-m`` / script launches.
- As its own *absolute* path, when a venv activation or a project that
adds itself to ``PYTHONPATH`` puts the directory there explicitly.
We drop the relative forms outright, then force the real Hermes source root
to the front — relocating it ahead of any absolute cwd entry rather than
only inserting when absent, so an absolute cwd path can't keep winning.
``src_root`` defaults to the directory this module lives in, which is the
repository root for every shipped entry point, so the guard is
self-sufficient and does not depend on the spawner exporting an env var.
"""
root = src_root or os.environ.get("HERMES_PYTHON_SRC_ROOT") or os.path.dirname(
os.path.abspath(__file__)
)
sys.path[:] = [p for p in sys.path if p not in ("", ".")]
root_abs = os.path.abspath(root)
sys.path[:] = [p for p in sys.path if os.path.abspath(p) != root_abs]
sys.path.insert(0, root)
# Apply on import — entry points just need ``import hermes_bootstrap``
# (or ``from hermes_bootstrap import apply_windows_utf8_bootstrap``) at
# the very top of their module, before importing anything else. The
# import side effect does the right thing.
apply_windows_utf8_bootstrap()