hermes-agent/hermes_cli/__init__.py
kshitij 2f19512341
fix(cli): repair non-UTF-8 stdout/stderr on all platforms, not just Windows (#43439)
`hermes setup` (and other banner-printing commands) crash with an unhandled
UnicodeEncodeError on Linux hosts whose locale selects a non-UTF-8 codec —
e.g. a fresh Raspberry Pi / minimal Debian with a latin-1 or C/POSIX locale.
The setup wizard prints box-drawing characters (┌│├└─) and the ⚕ glyph before
any stream repair runs, so the command dies before it can start.

The existing _ensure_utf8() shim already knew how to re-wrap the standard
streams as UTF-8, but it returned early on `sys.platform != "win32"`, so the
identical crash class on Linux was never covered.

- Drop the win32 gate: repair any stdout/stderr whose encoding is not UTF-8.
- Prefer TextIOWrapper.reconfigure() so the stream object is fixed in place
  (cached sys.stdout references keep working); fall back to reopening the fd
  with closefd=False (the CPython-recommended safe variant).
- Use errors="replace" (matching the sibling hermes_cli/stdio.py shim) so a
  stray un-encodable character degrades gracefully instead of crashing.
- Only set the PYTHONUTF8/PYTHONIOENCODING child-process hints when a repair
  actually happened, so a healthy UTF-8 host sees zero footprint (no stream
  swap, no env mutation).
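
The in-place repair from the bullets above can be sketched in isolation. This is an illustration only, not the shim itself: it assumes an in-memory `io.BytesIO` buffer standing in for a real terminal, and `BANNER` and `cached_reference` are hypothetical names introduced for the demo.

```python
import io

# Hypothetical banner text; the glyphs mirror those named in the commit message.
BANNER = "┌─ ⚕ setup ─┐"

# Stand-in for a stdout bound to a legacy locale: a strict latin-1 text wrapper
# over an in-memory byte buffer (an assumption made for this demo).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="latin-1", errors="strict")
cached_reference = stream  # e.g. code that captured sys.stdout earlier

try:
    stream.write(BANNER)
    stream.flush()
    crashed = False
except UnicodeEncodeError:
    crashed = True  # the failure mode this commit addresses

# Repair in place: same object, new codec and error policy.
stream.reconfigure(encoding="utf-8", errors="replace")

# The earlier reference benefits from the repair because identity is preserved.
cached_reference.write(BANNER)
cached_reference.flush()
written = raw.getvalue().decode("utf-8")
print(crashed, written == BANNER)
```

Because `reconfigure()` mutates the existing wrapper, nothing has to reassign `sys.stdout`, which is why the commit prefers it over reopening the descriptor.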

This is intentionally the earliest, platform-agnostic guard, running at import
time before any banner prints. hermes_cli/stdio.py::configure_windows_stdio()
still runs later from the entry points for the Windows-only extras (console
code-page flip, EDITOR default, PATH augmentation); it early-returns on
non-Windows and its stream reconfigure is an idempotent no-op once we've
already repaired the streams here.

Add regression tests covering latin-1 and ascii/POSIX streams, the reconfigure
fallback, already-UTF-8 no-op (identity preserved + no env mutation), the
repair-sets-env and respects-explicit-env contracts, and hostile/None streams.
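
The contracts listed above might be checked along the following lines. This is a sketch, not the repository's test suite: `repair_streams` is a simplified, hypothetical stand-in for `_ensure_utf8()` that takes the stream holder and environment mapping as arguments (and omits the fd-reopen fallback), and `make_stream` is a hypothetical helper.

```python
import io
from types import SimpleNamespace


def repair_streams(holder, environ):
    """Hypothetical stand-in for _ensure_utf8(), minus the fd fallback."""
    repaired = False
    for name in ("stdout", "stderr"):
        stream = getattr(holder, name, None)
        if stream is None:
            continue
        try:
            encoding = (getattr(stream, "encoding", "") or "").lower().replace("-", "")
            if encoding == "utf8":
                continue
            stream.reconfigure(encoding="utf-8", errors="replace")
            repaired = True
        except (AttributeError, OSError, ValueError):
            pass
    if repaired:
        # setdefault() leaves values the user exported explicitly untouched.
        environ.setdefault("PYTHONUTF8", "1")
        environ.setdefault("PYTHONIOENCODING", "utf-8")
    return repaired


def make_stream(encoding):
    # In-memory text stream with the requested codec (assumed test fixture).
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding)


def test_latin1_and_ascii_streams_are_repaired():
    holder = SimpleNamespace(stdout=make_stream("latin-1"), stderr=make_stream("ascii"))
    env = {}
    assert repair_streams(holder, env) is True
    assert holder.stdout.encoding == "utf-8"
    assert holder.stderr.encoding == "utf-8"
    assert env == {"PYTHONUTF8": "1", "PYTHONIOENCODING": "utf-8"}


def test_utf8_streams_are_left_alone():
    out, err = make_stream("utf-8"), make_stream("utf-8")
    holder = SimpleNamespace(stdout=out, stderr=err)
    env = {}
    assert repair_streams(holder, env) is False
    assert holder.stdout is out and holder.stderr is err  # identity preserved
    assert env == {}  # no env mutation


def test_explicit_env_and_none_stream():
    holder = SimpleNamespace(stdout=make_stream("latin-1"), stderr=None)
    env = {"PYTHONIOENCODING": "latin-1"}
    assert repair_streams(holder, env) is True
    assert env["PYTHONIOENCODING"] == "latin-1"  # explicit value respected
    assert env["PYTHONUTF8"] == "1"
```

Passing the holder and environment in as parameters keeps the sketch free of monkeypatching; the real tests would presumably patch `sys` and `os.environ` instead.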
2026-06-10 02:21:00 -07:00

"""
Hermes CLI - Unified command-line interface for Hermes Agent.
Provides subcommands for:
- hermes chat - Interactive chat (same as ./hermes)
- hermes gateway - Run gateway in foreground
- hermes gateway start - Start gateway service
- hermes gateway stop - Stop gateway service
- hermes setup - Interactive setup wizard
- hermes status - Show status of all components
- hermes cron - Manage cron jobs
"""
import os
import sys

__version__ = "0.16.0"
__release_date__ = "2026.6.5"


def _ensure_utf8():
    """Force UTF-8 stdout/stderr to prevent UnicodeEncodeError crashes.

    Several environments select a legacy, non-UTF-8 encoding for the standard
    streams:

    - Windows services and terminals default to cp1252.
    - Linux hosts with a latin-1 / C / POSIX locale (common on minimal Debian
      installs and Raspberry Pi) select latin-1 or ASCII.

    The CLI prints box-drawing characters (┌│├└─) and the ⚕ glyph in the setup
    wizard, doctor, and status banners. Encoding those under a non-UTF-8 codec
    raises an unhandled UnicodeEncodeError that crashes the command before it
    can even start (e.g. `hermes setup` on a fresh Pi).

    This runs at import time so it protects every CLI subcommand, on any
    platform. It re-wraps stdout/stderr as UTF-8 when their encoding is not
    already UTF-8, preferring TextIOWrapper.reconfigure() so the existing
    stream object is fixed in place (cached `sys.stdout` references keep
    working) and falling back to reopening the file descriptor with
    closefd=False (the CPython-recommended safe variant).

    No-op when the streams are already UTF-8: a healthy UTF-8 system sees no
    stream change and no environment mutation.

    Note: this is intentionally the earliest, platform-agnostic guard.
    hermes_cli/stdio.py::configure_windows_stdio() runs later from the entry
    points and layers on the Windows-only extras (console code-page flip,
    EDITOR default, PATH augmentation); its stream reconfiguration is a
    harmless idempotent no-op once we have already repaired the streams here.
    """
    repaired = False
    for stream_name in ("stdout", "stderr"):
        stream = getattr(sys, stream_name, None)
        if stream is None:
            continue
        try:
            encoding = (getattr(stream, "encoding", "") or "").lower().replace("-", "")
            if encoding == "utf8":
                continue

            # Preferred: reconfigure the existing TextIOWrapper in place. This
            # preserves object identity so any code already holding a reference
            # to the old sys.stdout benefits from the repair too.
            reconfigure = getattr(stream, "reconfigure", None)
            if callable(reconfigure):
                reconfigure(encoding="utf-8", errors="replace")
                repaired = True
                continue

            # Fallback: reopen the underlying file descriptor as UTF-8. Used
            # for streams that don't expose reconfigure() (e.g. some wrapped
            # or replaced streams). closefd=False keeps the original fd open.
            new_stream = open(
                stream.fileno(), "w", encoding="utf-8",
                errors="replace", buffering=1, closefd=False,
            )
            setattr(sys, stream_name, new_stream)
            repaired = True
        except (AttributeError, OSError, ValueError):
            pass

    # Only nudge child processes toward UTF-8 when we actually detected a
    # non-UTF-8 locale. On a healthy UTF-8 host children inherit UTF-8 from the
    # locale already, so leave the environment untouched (minimal footprint).
    if repaired:
        os.environ.setdefault("PYTHONUTF8", "1")
        os.environ.setdefault("PYTHONIOENCODING", "utf-8")


_ensure_utf8()
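
The fallback branch in `_ensure_utf8()` relies on `closefd=False` so that closing the new wrapper does not close the descriptor underneath it. A minimal sketch of that property follows. It assumes a temporary file from `tempfile.mkstemp()` standing in for the real stdout descriptor; `wrapper`, `fd_survived`, and `content` are names introduced for the demo.

```python
import os
import tempfile

# A temporary file's descriptor stands in for stdout's fd (demo assumption).
fd, path = tempfile.mkstemp()
try:
    # Same open() arguments as the fallback branch in _ensure_utf8().
    wrapper = open(
        fd, "w", encoding="utf-8",
        errors="replace", buffering=1, closefd=False,
    )
    wrapper.write("fallback ok ⚕\n")
    wrapper.close()  # closes the wrapper only, not the descriptor

    try:
        os.fstat(fd)  # raises OSError if the descriptor had been closed
        fd_survived = True
    except OSError:
        fd_survived = False
finally:
    os.close(fd)

# Read the bytes back as UTF-8 to confirm the glyph was encoded intact.
with open(path, encoding="utf-8") as handle:
    content = handle.read()
os.remove(path)

print(fd_survived, repr(content))
```

With the default `closefd=True`, garbage-collecting or closing the replacement stream would also close the process's real stdout descriptor, which is the hazard the fallback avoids.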