Three related Windows-only fixes that together make the browser toolset
actually usable on Windows. Symptom chain: user invokes browser_navigate
-> tool returns {"success": false, "error": "Daemon process exited
during startup with no error output"} and the CLI exits mid-turn with
the session summary.
Root cause (3 layers):
1. tools/browser_tool.py::_find_agent_browser() resolved
node_modules/.bin/agent-browser to the extensionless POSIX shell
shim via Path.exists(). On Windows, CreateProcessW cannot execute
that script (WinError 193 "not a valid Win32 application"). Fix:
delegate to shutil.which with path=node_modules/.bin so PATHEXT
picks up agent-browser.CMD on Windows and the extensionless shim
stays correct on POSIX.
2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the
parent hermes.exe whenever a background thread spawns a .cmd
subprocess. Python 3.11's default SIGINT handler raises
KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's
app.run() -> cli.py::run()'s finally block calls _run_cleanup()
-> _emergency_cleanup_all_sessions -> spawns a concurrent
_run_browser_command("close", ...) on the same session the agent
thread just opened. Two agent-browser processes race on the same
--session name, the daemon startup loses, and the tool returns
the "Daemon process exited during startup" error. Fix: install a
Windows-only SIGINT handler that absorbs the signal silently.
Real user Ctrl+C still routes through prompt_toolkit's own c-c
keybinding at the TUI layer, which is how Claude Code handles the
same quirk (driving cancellation via the TUI key handler, not
signals).
3. In tools/browser_tool.py, both Popen sites now pass
creationflags=CREATE_NO_WINDOW | STARTF_USESTDHANDLES with
close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd
console flash; STARTF_USESTDHANDLES + close_fds ensures the child
inherits only our three chosen handles (DEVNULL stdin, temp-file
stdout/stderr) and no leaked parent console handles that could
confuse agent-browser's native daemon spawn. Notably we do NOT
add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag
interacts badly with asyncio's ProactorEventLoop and makes things
worse.
Verified end-to-end on Windows 10 / Windows Terminal / PowerShell:
browser_navigate to https://example.com returns
{"success": true, "title": "Example Domain"} and the CLI stays alive
for follow-up tool calls and assistant turns.
Refs: earlier Windows quirks commits 1cebb3bad (Ctrl+Enter newline),
26f5af52a (environment hints), aefd1a37f (Playwright Chromium).
Replace hardcoded ~/.hermes/shared/ references with
get_default_hermes_root() / 'shared' so the cross-profile Nous auth
store lands in the correct location on every platform:
- Linux/macOS: ~/.hermes/shared/
- native Windows: %LOCALAPPDATA%\hermes\shared- Docker / custom HERMES_HOME: <root>/shared/
Updates _nous_shared_auth_dir(), the pytest seat-belt in
_nous_shared_store_path(), and the auth_add_command comment to match.
Previously Windows installs wrote to ~/.hermes/shared/ even though the
rest of the CLI uses %LOCALAPPDATA%\hermes, so profiles couldn't see
each other's shared credential.
Completes the Windows-gating coverage for the built-in skills/ tree. Every
bundled SKILL.md now carries an explicit platforms: declaration so the
loader (agent.skill_utils.skill_matches_platform) can skip-load skills
that don't fit the current OS.
74 skills declared cross-platform (platforms: [linux, macos, windows]):
Creative (16): ascii-art, ascii-video, architecture-diagram, baoyu-comic,
baoyu-infographic, claude-design, creative-ideation, design-md,
excalidraw, humanizer, manim-video, p5js, pixel-art,
popular-web-designs, pretext, sketch, songwriting-and-ai-music,
touchdesigner-mcp
Autonomous agents: claude-code, codex, hermes-agent, opencode
Data/devops: jupyter-live-kernel, kanban-orchestrator, kanban-worker,
webhook-subscriptions, dogfood, codebase-inspection
GitHub: github-auth, github-code-review, github-issues,
github-pr-workflow, github-repo-management
Media: gif-search, heartmula, songsee, spotify, youtube-content
MCP / email / gaming / notes / smart-home: native-mcp, himalaya,
pokemon-player, obsidian, openhue
mlops (non-broken): weights-and-biases, huggingface-hub, llama-cpp,
outlines, segment-anything-model, dspy, trl-fine-tuning
Productivity: airtable, google-workspace, linear, maps, nano-pdf,
notion, ocr-and-documents, powerpoint
Red-teaming / research: godmode, arxiv, blogwatcher, llm-wiki,
polymarket
Software-dev: debugging-hermes-tui-commands, hermes-agent-skill-authoring,
node-inspect-debugger, plan, requesting-code-review, spike,
subagent-driven-development, systematic-debugging,
test-driven-development, writing-plans
Misc: yuanbao
5 skills gated from Windows (platforms: [linux, macos]):
mlops/inference/vllm (serving-llms-vllm)
vLLM is officially Linux-only; Windows requires WSL.
mlops/training/axolotl
Axolotl's flash-attn + deepspeed + bitsandbytes stack is Linux-first.
mlops/training/unsloth
Requires Triton + xformers + flash-attn — Linux only in practice.
mlops/models/audiocraft (audiocraft-audio-generation)
torchaudio ffmpeg backend + encodec dependencies are Linux-first.
mlops/inference/obliteratus
Research abliteration workflow; relies on Linux-focused pytorch
kernels and MLX — no first-class Windows path.
Same strict-over-lenient policy as the optional-skills sweep: when the
underlying tool's Windows support is rough, missing, or WSL-only, gate the
skill. Easier to un-gate after verified Windows support lands than to leak
partial support that manifests as mid-task failures.
Combined with prior commits in this branch, every bundled SKILL.md
(skills/ + optional-skills/) now has a platforms: declaration.
Extends the Windows-gating work to the optional-skills/ tree. Every
SKILL.md that previously omitted the platforms: field now carries an
explicit declaration, which Hermes's loader (agent.skill_utils.
skill_matches_platform) honors to skip-load on incompatible OSes.
58 skills declared cross-platform (platforms: [linux, macos, windows]):
autonomous-ai-agents/blackbox, autonomous-ai-agents/honcho
blockchain/base, blockchain/solana
communication/one-three-one-rule
creative/blender-mcp, creative/concept-diagrams, creative/hyperframes,
creative/kanban-video-orchestrator, creative/meme-generation
devops/cli (inference-sh-cli), devops/docker-management
dogfood/adversarial-ux-test
email/agentmail
finance/3-statement-model, finance/comps-analysis, finance/dcf-model,
finance/excel-author, finance/lbo-model, finance/merger-model,
finance/pptx-author
health/fitness-nutrition, health/neuroskill-bci
mcp/fastmcp, mcp/mcporter
migration/openclaw-migration
mlops/accelerate, mlops/chroma, mlops/clip, mlops/guidance,
mlops/hermes-atropos-environments, mlops/huggingface-tokenizers,
mlops/instructor, mlops/lambda-labs, mlops/llava, mlops/modal,
mlops/peft, mlops/pinecone, mlops/pytorch-lightning, mlops/qdrant,
mlops/saelens, mlops/simpo, mlops/stable-diffusion
productivity/canvas, productivity/shop-app, productivity/shopify,
productivity/siyuan, productivity/telephony
research/domain-intel, research/drug-discovery, research/duckduckgo-search,
research/gitnexus-explorer, research/parallel-cli, research/scrapling
security/1password, security/oss-forensics, security/sherlock
web-development/page-agent
5 skills gated from Windows (platforms: [linux, macos]):
mlops/flash-attention - Flash Attention wheels are Linux-first; Windows
install requires building from source with CUDA
mlops/faiss - faiss-gpu has no Windows wheel; gate rather than
leak partial (faiss-cpu) support
mlops/nemo-curator - NVIDIA NeMo ecosystem has no first-class Windows path
mlops/slime - Megatron+SGLang RL stack is Linux-only in practice
mlops/whisper - openai-whisper + ffmpeg setup on Windows is
non-trivial; gate until Windows install stanza lands
Methodology: scanned every SKILL.md for Windows-hostile signals
(apt-get, brew, systemd, osascript, ptrace, X11 binaries, POSIX-only
Python APIs, Docker POSIX $(pwd) bind-mounts, explicit 'linux-only' /
'macos-only' text). 3 skills flagged as having hard signals on review:
docker-management and qdrant only had POSIX $(pwd) docker examples and
the tools themselves (Docker Desktop, Qdrant) run fine on Windows —
declared ALL. whisper had an apt/brew ffmpeg install path and nothing
else but the openai-whisper Windows install story is rough enough to
warrant gating.
Strict-over-lenient policy: when in doubt, gate. Easier to un-gate after
verified Windows support lands than to leak partial support that
manifests as mid-task failures for Windows users.
Hermes's skill loader (agent/skill_utils.skill_matches_platform) already honors
the 'platforms:' frontmatter field and skip-loads skills whose declared
platform list doesn't include sys.platform. Seven bundled skills are in fact
Linux/macOS-only but never declared it, so they leak into Windows skill
listings and sometimes load with broken instructions.
Audited all 160 SKILL.md files (skills/ + optional-skills/) for Windows-
hostile signals: apt-get/brew/systemd/chmod+x install flows, ptrace/proc
runtime dependencies, bash-only launcher scripts, and package dependencies
with no Windows build. The 7 below fail one or more of those tests in a way
that fundamentally can't be papered over by docs edits:
minecraft-modpack-server bash start.sh + chmod +x + apt openjdk
evaluating-llms-harness lm-eval-harness bash launcher scripts
distributed-llm-pretraining-
torchtitan bash multi-node torchrun launcher
python-debugpy remote attach relies on /proc ptrace_scope
pytorch-fsdp NCCL backend; Windows path is WSL only
tensorrt-llm NVIDIA TensorRT-LLM has no Windows build
searxng-search Docker volume flow assumes POSIX $(pwd)
All seven get 'platforms: [linux, macos]'. On Windows the loader now skips
them silently — no more phantom skill listings, no more mid-task failures
because an Apple-only path was surfaced as a suggestion.
Cross-platform skills that merely CONTAIN signals in examples or
install-instructions (brew install as one of several paths, /tmp/ in a code
snippet, etc.) are NOT touched by this commit. A broader audit that
declares the ~140 cross-platform skills as 'platforms: [linux, macos,
windows]' can follow as a separate change once each has been verified
working on Windows.
The installed user copies under ~/AppData/Local/hermes/skills/ (when they
exist) are also patched so the running session reflects the gating
immediately, but only the in-repo files are committed here.
scripts/install.sh runs 'npx playwright install --with-deps chromium'
on every Linux distro after the npm-install step, which is why browser
tools Just Work on Linux. scripts/install.ps1 never did the equivalent
step, so on native Windows installs check_browser_requirements() in
tools/browser_tool.py would return False (no Chromium under
%LOCALAPPDATA%\ms-playwright) and every browser_* tool got silently
filtered out of the agent's tool schema — no error, no log entry, user
just wondered why the tools didn't exist.
Two-part fix:
1. scripts/install.ps1: after 'npm install' in InstallDir succeeds, run
'npx playwright install chromium'. Resolves npx via the same
execution-policy-aware logic already used for npm (prefer npx.cmd
next to npmExe, fall back to Get-Command). Surfaces a warning +
manual-recovery hint when the install fails, matching install.sh
behaviour for distros.
2. hermes_cli/doctor.py: after the agent-browser check, lazily import
tools.browser_tool and reuse the exact same _chromium_installed()
predicate check_browser_requirements() uses, so the doctor signal
cannot drift from the runtime gate. Skip the check when Camofox /
CDP override / a cloud provider / Lightpanda is configured (those
bypass local Chromium). On missing Chromium, the hint is
platform-correct: '--with-deps' on POSIX, plain 'install chromium'
on win32.
Verified on Windows 10:
- 'npx playwright install chromium' completes successfully, drops
Chrome Headless Shell under %LOCALAPPDATA%\ms-playwright
- check_browser_requirements() flips from False -> True
- 'hermes doctor' now prints either '✓ Playwright Chromium (browser
engine)' or '⚠ Playwright Chromium not installed' + fix command
- tests/hermes_cli/test_doctor.py: 38/38 pass
- tests/tools/test_browser_chromium_check.py: 16/16 pass
Adds a dedicated '## Windows-Specific Quirks' section to the hermes-agent
skill so Windows pitfalls have one discoverable place to evolve. Inaugural
entries cover:
- Input / keybindings — Alt+Enter intercepted by Windows Terminal,
Ctrl+Enter as the Windows newline keystroke, mintty/git-bash behavior,
pointer to scripts/keystroke_diagnostic.py for investigation.
- Config / files — UTF-8 BOM HTTP-400 trap.
- execute_code / sandbox — WinError 10106 SYSTEMROOT root cause +
_WINDOWS_ESSENTIAL_ENV_VARS fix location.
- Testing / contributing — scripts/run_tests.sh POSIX-venv limitation and
the system-Python workaround, POSIX-only test skip-guard patterns.
- Path / filesystem — line-ending warnings (cosmetic), forward-slash
portability.
Collapses the old scattered Windows bullets under 'Platform-specific
issues' into a single pointer at the new dedicated section so there's
only one place to maintain this content.
Also adds the scripts/keystroke_diagnostic.py the skill now references —
a small prompt_toolkit Application that prints the Keys.* identifier and
raw escape bytes for every keystroke. Used to establish the Ctrl+Enter
= c-j fact on Windows Terminal; generally useful for anyone adding a
platform-aware keybinding.
Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving
Windows users with no Enter-involving way to insert a newline in the Hermes
prompt. Fix it by reclaiming c-j on Windows only:
- _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where
thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows
plain Enter is always c-m, so c-j is free.
- Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends
Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal
settings changes required.
- Alt+Enter binding unchanged; still works on mac/Linux/WSL.
- Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler
split into platform-aware assertions for POSIX vs win32.
- Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit
even on POSIX) to point Windows users at Ctrl+Enter.
Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline,
since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No
conflicting Hermes binding existed for Ctrl+J, so this is a harmless side
effect.
build_environment_hints() now emits a factual block describing the
execution environment on every prompt build:
* Local backend: host OS, $HOME, and cwd — so the agent stops guessing
paths from the hostname. Windows also gets two specific callouts:
- hostname != username (prevents C:\Users\<hostname>\... bugs)
- `terminal` shells out to bash (git-bash/MSYS), not PowerShell
* Remote backend (docker/singularity/modal/daytona/ssh/vercel_sandbox):
host info is SUPPRESSED — the agent's tools can't touch the host, so
showing it is misleading. Instead we probe the backend once per
process with `uname/whoami/pwd` and cache the result. On probe
failure, fall back to a per-backend description that states only what
we know from the backend choice itself (container type + likely OS
family) without inventing user/cwd/$HOME.
Linux/Mac local users now get a small helpful 3-line host block instead
of an empty string. Zero change to the existing WSL hint paragraph.
Tests: 8 new/updated in TestEnvironmentHints, including a regression
guard that fails if a new remote backend is added without listing it in
_REMOTE_TERMINAL_BACKENDS.
Turns the existing 'all lints disabled' stance into 'exactly one lint
enabled' — PLW1514 (unspecified-encoding) catches bare open() /
read_text() / write_text() calls that default to locale encoding on
Windows (cp1252), silently corrupting non-ASCII content.
Changes:
1. pyproject.toml
- Migrate [tool.ruff] top-level select → [tool.ruff.lint].select
(deprecated config location, ruff was warning on every run)
- Add preview = true (PLW1514 is a preview rule in ruff 0.15.x)
- select = ['PLW1514'] (exactly one rule, deliberately minimal)
- per-file-ignores exempt tests/, plugins/, skills/, optional-skills/ —
those have their own conventions or intentionally exercise edge cases
2. website/scripts/extract-skills.py
- Fix 3 remaining bare opens (website/ was excluded from the main
sweep but needed for ruff check . to go green)
3. tests/test_lint_config.py (new, 5 tests)
- Guards against accidental rule removal. If someone deletes PLW1514
from the select list or disables preview mode, these tests fail
with a loud message explaining why the rule exists.
Paired with a companion commit (held locally for now, pending a token
with workflow scope) that adds a blocking ruff step to .github/workflows/
lint.yml. Without that companion commit, ruff is configured correctly
but nothing in CI enforces it yet — the advisory PR comment will still
surface new PLW1514 violations though, so authors see them.
Verified: ruff check . → exit 0, 0 violations across the repo.
Test suite: 90 passed, 14 skipped, 0 failed.
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.
Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs). That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.
After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly. Works identically on every platform
and every locale, no surprise behavior.
Mechanical sweep via:
ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' .
All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing
else changed. Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).
Scope notes:
- tests/ excluded: test fixtures can use locale encoding intentionally
(exercising edge cases). If we want to tighten tests later that's
a separate PR.
- plugins/ excluded: plugin-specific conventions may differ; plugin
authors own their code.
- optional-skills/ and skills/ excluded: skill scripts are user-authored
and we don't want to mass-edit them.
- website/ and tinker-atropos/ excluded: vendored / generated content.
46 files touched, 89 +/- lines (symmetric replacement). No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).
Problem: Python on Windows has two long-standing text-encoding pitfalls:
1. sys.stdout/stderr are bound to the console code page (cp1252 on
US-locale installs) — print('café') crashes with UnicodeEncodeError.
2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
PYTHONIOENCODING are set in their env — so any Python we spawn
(linters, sandbox children, delegation workers) hits the same bug.
Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:
- hermes_cli/main.py (hermes / hermes-agent console_script)
- run_agent.py (hermes-agent direct)
- acp_adapter/entry.py (hermes-acp)
- gateway/run.py (messaging gateway)
- batch_runner.py (parallel batch mode)
- cli.py (legacy direct-launch CLI)
On Windows, the bootstrap:
- os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode)
- os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
- sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')
Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.
On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected. POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.
setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.
What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init. A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.
Tests (17): 16 passed, 1 skipped on Windows.
- Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
- POSIX: complete no-op (verified on fake POSIX + skipped on real
POSIX since we don't have a Linux box in this session)
- Idempotence: multiple calls safe
- Graceful degradation: non-reconfigurable streams don't crash
- User opt-out: explicit PYTHONUTF8=0 is respected
- Load order: every entry point's FIRST top-level import is
hermes_bootstrap, enforced by an AST-level parametrized test
pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
Third Windows-specific sandbox bug (after WinError 10106 and the UTF-8
file-write bug): user scripts that print non-ASCII to stdout crash with
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192'
in position N: character maps to <undefined>
Root cause: Python's sys.stdout on Windows is bound to the console code
page (cp1252 on US-locale installs) when the process is attached to a
pipe without PYTHONIOENCODING set. LLM-generated scripts routinely
print em-dashes, arrows, accented chars, and emoji — all of which cp1252
can't encode.
Fix: spawn the sandbox child with:
PYTHONIOENCODING=utf-8 # sys.stdin/stdout/stderr all UTF-8
PYTHONUTF8=1 # PEP 540 UTF-8 mode — open() defaults to UTF-8 too
PYTHONUTF8 is the belt-and-suspenders half: LLM scripts that call
open(path, 'w') without encoding= in user code will now produce UTF-8
files by default, matching what the sandbox already does for its own
staging files.
The parent side already decodes child stdout/stderr as UTF-8 with
errors='replace' (lines 1345-1347) so the end-to-end chain is clean.
On POSIX these values usually match the locale default already, so
setting them is harmless belt-and-suspenders for C/POSIX-locale
containers and minimal base images.
Tests added (4) — total file now at 28 passed, 1 skipped on Windows:
- test_popen_env_sets_pythonioencoding_utf8 (source grep)
- test_popen_env_sets_pythonutf8_mode (source grep)
- test_live_child_can_print_non_ascii (cross-platform live test)
- test_windows_child_without_utf8_env_would_fail (Windows negative
control — actually reproduces the bug without our env overrides,
proving the fix is load-bearing on this system)
test_code_execution_modes.py had two test-level failures and two
class-level stale skip reasons on this Windows-native branch:
- TestResolveChildPython::test_project_with_virtualenv_picks_venv_python
- TestResolveChildPython::test_project_prefers_virtualenv_over_conda
Both fail on Windows with OSError: [WinError 1314] — they call
pathlib.Path.symlink_to() to build a fake venv, which requires
developer mode or admin on Windows. They also assume POSIX venv
layout (bin/python) where Windows uses Scripts/python.exe. Skip
them with a specific, accurate reason.
Also updated two class-level skipif reasons that said
'execute_code is POSIX-only' — no longer true on this branch.
New reason explains it's the test infrastructure (symlinks + POSIX
venv layout) that's the blocker, not execute_code itself.
Results on Windows Python 3.11:
Before: 41 passed, 10 skipped, 2 failed
After: 43 passed, 12 skipped, 0 failed
Second Windows-specific sandbox bug (WinError 10106 was the first):
after the env-scrub fix let the child start, it immediately failed to
import hermes_tools with:
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x97
in position 154: invalid start byte
Root cause: _execute_local wrote the generated hermes_tools.py stub and
the user's script.py via open(path, 'w') without encoding=. On Windows
the default text-mode encoding is cp1252 (system locale), which encodes
em-dashes (used in the stub's docstrings) as 0x97. Python then decodes
source files as UTF-8 (PEP 3120) on import, chokes on 0x97, and the
sandbox dies before any tool call.
Fix: pass encoding='utf-8' to all four file opens in the code_execution
path — the two staging writes in _execute_local (hermes_tools.py +
script.py) and the two RPC file-transport reads/writes in the generated
remote stub. JSON is ASCII-safe for most payloads but tool results
(terminal output, web_extract content) routinely carry non-ASCII.
Tests added (4):
- test_stub_and_script_writes_specify_utf8 — source grep guard
- test_file_rpc_stub_uses_utf8 — generated remote stub check
- test_stub_source_roundtrips_through_utf8 — concrete round-trip
- test_windows_default_encoding_would_have_failed — negative control
(skips on modern Python builds where default is already UTF-8
compatible, but retained for platforms where the regression could
return)
24/25 tests pass on Windows 3.11 (negative control skips because this
Python build handles em-dashes via cp1252 subset — the fix is still
correct, just the corruption path isn't always triggerable).
Adds TestPosixEquivalence to test_code_execution_windows_env.py. The
class pins the invariant that _scrub_child_env(env, is_windows=False)
produces byte-for-byte identical output to the pre-refactor inline
scrubber, across a matrix of:
- 2 synthetic envs (POSIX-shaped, Windows-shaped-on-POSIX)
- 3 passthrough rules (none, single-var, everything)
- 1 real-os.environ check on whatever platform runs the test
Plus a superset sanity check: is_windows=True must keep everything
is_windows=False keeps, and any extras must come from the
_WINDOWS_ESSENTIAL_ENV_VARS allowlist.
Rationale: the previous commit refactored the env-scrubbing inline
block into a helper. Future changes to that helper must not silently
regress POSIX behavior — if someone needs to change it, they update
_legacy_posix_scrubber in lockstep so the churn is visible in review.
All 21 tests in the file pass locally on Windows (pytest 9.0.3). 8 of
them are parametrized equivalence checks that run on every OS.
The sandbox's env scrubbing was dropping SYSTEMROOT, WINDIR, COMSPEC,
APPDATA, etc. On Windows this broke the child process before any RPC
could happen:
OSError: [WinError 10106] The requested service provider could not
be loaded or initialized
Python's socket module uses SYSTEMROOT to locate mswsock.dll during
Winsock initialization. Without it, socket.socket(AF_INET, SOCK_STREAM)
fails — and the existing loopback-TCP fallback for Windows couldn't work.
Fix: add a small Windows-only allowlist (_WINDOWS_ESSENTIAL_ENV_VARS)
matched by exact uppercase name, after the existing secret-substring
block. The secret block still runs first, so the allowlist cannot be
used to exfiltrate credentials. Also extract the env scrubber into a
testable helper (_scrub_child_env) that takes is_windows as a parameter,
so the logic can be unit-tested on any OS.
Live Winsock smoke test verifies that a child spawned with the scrubbed
env can now create an AF_INET socket on a real Windows host; the test
is guarded by sys.platform == 'win32' so POSIX CI stays green.
Two fixes from teknium1's next install run:
1. **npm install: "npm.ps1 cannot be loaded because running scripts is
disabled on this system."** Get-Command's default PATHEXT ordering
picked up ``npm.ps1`` (the PowerShell shim) ahead of ``npm.cmd`` (the
batch shim). Most Windows users have PowerShell's execution policy
set to Restricted or RemoteSigned, which blocks unsigned ``.ps1``
files. ``npm.cmd`` has no such restriction and works universally.
Install-NodeDeps now detects when Get-Command returned npm.ps1, looks
for a sibling npm.cmd in the same directory, and prefers it. Prints
an info line so the user sees why. Emits a warning + hint if only
npm.ps1 is available.
2. **"Launch hermes chat now? Y" crashes with "%1 is not a valid Win32
application" on Windows installs.** The setup wizard calls
``relaunch(["chat"])``; ``resolve_hermes_bin()`` returned
``sys.argv[0]`` which was ``...\\hermes_cli\\main.py`` (because hermes
was launched via ``python -m hermes_cli.main`` during setup).
On Windows, ``os.access(script.py, os.X_OK)`` returns True because
PATHEXT lists ``.py`` when the Python launcher is registered — but
``subprocess.run([script.py, ...])`` can't actually execute a ``.py``
directly. CreateProcessW needs a real PE file.
Fixed ``resolve_hermes_bin`` to reject ``.py``/``.pyc`` argv0 values
on Windows specifically. Falls through to ``shutil.which("hermes")``
(hermes.exe in the venv Scripts dir) or, as a final fallback, lets
build_relaunch_argv build ``[sys.executable, "-m", "hermes_cli.main"]``
which is bulletproof. POSIX behaviour unchanged — ``.py`` argv0 with
a shebang + chmod+x is still a valid exec target there.
3 new tests cover the Windows paths: .py argv0 + hermes.exe on PATH →
returns hermes.exe; .py argv0 + no PATH → returns None (caller uses
python -m); POSIX + executable .py → still accepted.
26 relaunch tests pass, no POSIX regressions.
teknium1 noticed execute_code was missing from his enabled tools on Windows.
Root cause: tools/code_execution_tool.py set ``SANDBOX_AVAILABLE =
sys.platform != \"win32\"`` as a module-level constant, originally because
the RPC transport required AF_UNIX. We added loopback TCP fallback for
the sandbox in commit eeb723fff (and covered it in the Windows TCP tests),
but forgot to lift the availability gate. So execute_code was still
invisible via the check_fn path on Windows.
- SANDBOX_AVAILABLE is now True unconditionally (it's still checked — a
future platform could flip it off via monkeypatch/env if needed).
- Error message when disabled no longer mentions Windows specifically,
just says 'sandbox is unavailable in this environment'.
- test_windows_returns_error updated: patches SANDBOX_AVAILABLE=False
directly (which was always its real intent) and asserts on 'unavailable'
instead of 'Windows'.
Tests: 171 code-execution + windows-compat tests pass, no regressions.
Three bugs from teknium1's successful install + diagnostic chat on Windows:
1. **Start-Process -FilePath npm.cmd fails with "%1 is not a valid Win32
application".** Start-Process bypasses cmd.exe and PATHEXT to call
CreateProcessW directly, which refuses .cmd batch shims. Switched
Install-NodeDeps to use PowerShell's invocation operator (``& $npmExe
install --silent *> $log``) which DOES honour PATHEXT. Extracted a
``_Run-NpmInstall`` helper so the browser + TUI paths share the same
logic. Captures $LASTEXITCODE correctly, still surfaces the real
stderr on failure with a log-file pointer for the full output.
2. **patch tool returns false-negative on Windows due to CRLF round-trip.**
Root cause was upstream of patch: ``subprocess.Popen(..., text=True,
stdin=PIPE)`` on Windows translates ``\\n`` → ``\\r\\n`` when data flows
through the stdin pipe. ``_pipe_stdin()`` was writing the patch's
new_content string through a text-mode pipe, bash then wrote those
CRLF bytes to disk, and patch's post-write verify compared the
on-disk CRLF bytes against the original LF-only string — fail.
Fixed in two places for defense in depth:
- ``_pipe_stdin()`` now writes through ``proc.stdin.buffer`` with
explicit UTF-8 encoding, bypassing Python's newline translation on
every platform. No behaviour change on POSIX (bytes are identical)
but stops the CRLF injection on Windows.
- ``patch_replace``'s post-write verify normalizes CRLF→LF on both
sides before comparing, so even if some future backend still
translates newlines the patch tool won't report a bogus failure.
3. **SOUL.md gets a UTF-8 BOM on Windows PowerShell 5.1.** ``Set-Content
-Encoding UTF8`` on PS5.1 writes UTF-8 WITH a byte-order-mark (changed
in PS7 via ``utf8NoBOM``). Hermes's prompt-injection scanner sees
the BOM (U+FEFF invisible char) and refuses to load the file, so
SOUL.md's persona instructions never get applied.
Fixed by writing the file via ``[System.IO.File]::WriteAllText``
with an explicit ``UTF8Encoding($false)`` — BOM-free on every
PowerShell version.
All POSIX behaviour verified unchanged: 198 tests pass across
test_file_operations, test_local_env_cwd_recovery, test_code_execution,
test_windows_native_support, test_windows_compat.
User hit 'fatal: not in a git directory' on re-install because:
1. They ran Remove-Item -Force $env:LOCALAPPDATA\hermes -ErrorAction
SilentlyContinue WHILE cd'd inside the install dir. Windows
silently refuses to delete a directory any shell is currently cd'd
inside and leaves the skeleton intact, but the -ErrorAction
SilentlyContinue swallowed every partial-delete failure so they
thought the wipe succeeded.
2. The installer then walked into Install-Repository, saw $InstallDir
still exists with a partial .git stub, my repo-validity probe
returned success (the probe's git rev-parse may have exit-code-zeroed
in a way I didn't expect), and the real git fetch died with three
'fatal: not a git repository' errors.
Two fixes belt-and-braces:
- Main() now cds to $env:USERPROFILE at start if the current shell
is inside $InstallDir. Harmless when the user ran from elsewhere;
critical when they didn't. This alone fixes the user's case.
- Install-Repository's 'is this a valid repo' probe now runs BOTH
git rev-parse --is-inside-work-tree AND git status, resets
$LASTEXITCODE before each to avoid picking up a stale 0, and
requires BOTH to succeed. Also requires rev-parse's output to
match 'true' (not just exit 0) to rule out exit-0-with-empty-output
edge cases.
teknium1 hit "fatal: not in a git directory" on re-install when the previous
install left a $InstallDir\.git stub that Test-Path matched but git didn't
recognize (three "fatal: not a git repository" lines, then the script
exited before touching anything).
Two bugs:
1. Test-Path "$InstallDir\.git" was a weak gate — it matches .git
whether it's a directory, file, symlink, submodule gitfile, OR a
broken stub from a failed previous Remove-Item. Replaced with a
real repo probe: Push-Location + git rev-parse --is-inside-work-tree
+ $LASTEXITCODE check. If git itself can't see a repo, we treat
the directory as not-a-repo and fall through to fresh clone.
2. The original update path ignored $LASTEXITCODE. fetch/checkout/pull
all emitted fatals but the script kept going. Now each command
checks $LASTEXITCODE and throws with an explicit message.
Also: when the directory exists but isn't a valid repo, the new code
wipes it (Remove-Item -ErrorAction Stop) and falls through to fresh
clone, instead of dying with the old "Directory exists but is not a git
repository" error. If the wipe itself fails (file locked, hermes still
running), we throw with a user-readable "close any programs using files
in <dir>" hint.
Refactored the function to use a $didUpdate flag instead of my earlier
draft's early `return` — that was skipping the submodule init block at
the bottom of the function. Both the update and fresh-clone paths now
fall through to the submodule init step, which is correct (git pull
doesn't auto-update submodules).
PowerShell structural check: 21 functions defined, braces balanced.
Three interrelated bugs from teknium1's first interactive chat on Windows:
1. **Snapshot/cwd file paths unquoted in bash command strings.** The session
bootstrap and per-command wrapper interpolated
``self._snapshot_path`` / ``self._cwd_file`` unquoted into bash commands
like ``export -p > C:/Users/ryanc/.../hermes-snap-xxx.sh``. Git Bash's
MSYS2 layer handles ``C:/...`` paths correctly ONLY when quoted; unquoted,
the colon and forward-slash get glob-parsed and the redirect targets a
bogus path. Symptom: every terminal command emitted two
``C:/Users/.../hermes-snap-*.sh (No such file or directory)`` lines that
bled into stdout (``stderr=STDOUT`` on the local backend) and corrupted
file contents when the agent wrote to scratch paths via the terminal
tool. Fix: ``shlex.quote()`` every interpolation of ``_snapshot_path``
and ``_cwd_file`` in base.py — no-op on POSIX (the paths contain no
shell-metachars), critical on Windows.
2. **Stale PATH on first hermes launch after install.** ``install.ps1``
adds the PortableGit ``cmd`` / ``bin`` / ``usr\bin`` directories to the
Windows **User** PATH via ``SetEnvironmentVariable(..., "User")``. That
write propagates to newly *spawned* processes only — already-running
shells (including the one the user types ``hermes`` into immediately
after install) retain their old PATH. So hermes starts with a PATH that
doesn't include bash, rg, grep, ssh — and ``search_files`` reports
"rg/find not available" when the user clearly just installed them.
Fix: new ``_augment_path_with_known_tools()`` helper called from
``configure_windows_stdio()`` on startup. Prepends the Hermes-managed
Git directories + the WinGet Links directory (where ripgrep lands) to
``os.environ['PATH']`` if they exist on disk but aren't already in
PATH. Subsequent subprocess calls (including bash spawns via
``_find_bash()``) inherit the augmented PATH and find everything.
No-op on POSIX and when the directories don't exist.
3. **Root cause of "file content corruption".** #1 was the proximate cause.
Errors like ``C:/Users/.../hermes-snap-xxx.sh: No such file or directory``
were emitted on stderr by the failed redirect, captured into stdout via
``stderr=subprocess.STDOUT``, and if the agent used terminal commands
like ``cat > file`` the leaked error bytes became part of the file.
Fixing #1 eliminates this entirely.
## Tests
All 77 Windows-compat tests still pass on Linux (POSIX path is
shlex.quote('/tmp/foo.sh') → '/tmp/foo.sh' — unchanged).
## Not addressed here (would need a bigger design)
- Python file tools (``write_file``, ``read_file``) and the bash-backed
terminal tool see DIFFERENT views of ``/tmp`` on Windows. Python treats
``/tmp`` as ``C:\tmp`` (drive-relative), Git Bash's MSYS2 treats it as
a virtual mount to the PortableGit install's ``tmp\``. Would need a
translation shim in the Python tools to resolve bash-virtual paths to
their native-Windows equivalents. Workaround for users today: use
absolute native paths (``C:\Users\you\...``) instead of ``/tmp/...``
when crossing between terminal and Python file tools.
Three real bugs from teknium1's first Windows install run:
1. **MinGit has no bash.exe.** MinGit is the minimal-automation Git for Windows
distribution — it ships git.exe but deliberately strips bash and the POSIX
coreutils. Installer logged "Could not locate bash.exe" and Hermes would
fail to run any shell command. Switched to PortableGit — the full Git for
Windows minus the installer UI. PortableGit ships bash.exe at
<root>\bin\bash.exe plus sh, awk, sed, grep, curl, ssh in usr\bin\. ARM64
variant is detected separately (PortableGit-*-arm64.7z.exe). 32-bit falls
back to MinGit-32-bit with a warning (PortableGit is 64-bit only).
PortableGit ships as a 7z self-extractor (56MB vs MinGit's 38MB). We
invoke it with `-o<target> -y` to extract silently — no 7z install needed,
it's self-contained.
Updated tools/environments/local.py::_find_bash candidate order to prefer
the PortableGit layout (<root>\bin\bash.exe) with the MinGit layout
(<root>\usr\bin\bash.exe) as a fallback so existing installs keep working.
2. **os.execvp "Exec format error" on Windows.** Setup wizard's "Launch
hermes chat now? Y" called `os.execvp(["hermes", "chat"])` which on
Windows can only swap to real Win32 .exe files — chokes with OSError(8)
on .cmd batch shims and Python console-script wrappers. Added a
win32 branch in hermes_cli/relaunch.py::relaunch() that uses
subprocess.run + sys.exit — functionally identical (user sees "hermes
exited, then new hermes started") with one extra PID in play. POSIX
path is UNCHANGED — still uses os.execvp for in-place replacement.
Catches OSError in the Windows branch and surfaces a "open a new
terminal so PATH picks up, then re-run hermes" hint instead of a
cryptic traceback.
3. **npm install failures silent on Windows.** The install.ps1 was invoking
`npm install --silent 2>&1 | Out-Null` inside a try/catch. PowerShell's
try/catch does NOT trigger on non-zero process exit codes — only on
unhandled .NET exceptions — so npm failing printed a generic "npm
install failed" with zero information about WHY. The silent pipe ate
the stderr.
Rewrote Install-NodeDeps to:
- Resolve npm.cmd via Get-Command (respects PATHEXT) instead of
relying on bare `npm` name resolution.
- Use Start-Process with -PassThru to capture the actual exit code.
- Redirect stderr to a temp log and surface the first ~800 chars of
the real npm error when install fails, plus the log path for the
full text.
- Fail loudly with the right exit code instead of a misleading success.
- Bail cleanly with a helpful message when npm isn't on PATH at all.
4. **"True" printing to console after Node check.** `Test-Node` returns $true;
installer called it as a bare statement (no assignment, no cast). PowerShell
prints bare return values. Wrapped the call in `[void](Test-Node)`.
## Tests
- Added 3 new tests in tests/hermes_cli/test_relaunch.py covering the
Windows branch: subprocess is called (not execvp), child exit code
propagates, OSError surfaces a helpful message. All 23 tests pass
(20 existing + 3 new).
- 77 Windows-compat tests still pass, POSIX behaviour unchanged.
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
Pre-existing Windows bug surfaced while reviewing the portable-MinGit
install: prompt_toolkit's Buffer.open_in_editor() falls back to POSIX
absolute paths (/usr/bin/nano, /usr/bin/vi, /usr/bin/emacs) that don't
exist on native Windows. When neither $EDITOR nor $VISUAL is set,
Ctrl+X Ctrl+E ("open prompt in editor") and /edit both silently do
nothing on Windows — the user hits the key, nothing happens, no error.
This wasn't caused by MinGit (full Git for Windows doesn't fix it either,
because the Windows Python subprocess call resolves `/usr/bin/nano` as
`C:\usr\bin\nano`, which doesn't exist even with nano installed).
Fixes:
- hermes_cli/stdio.py::configure_windows_stdio now sets EDITOR=notepad
on Windows if neither EDITOR nor VISUAL is set. notepad.exe is in
every Windows install, works as a blocking editor (subprocess.call
waits for the window to close), and writes back to the file.
- hermes_cli/config.py (hermes config edit): reorder fallback list so
Windows tries notepad first — previously nano led the list, which
required Git Bash / WSL to be in PATH.
- Users who want VSCode / Neovim / Notepad++ can still override via
$env:EDITOR — that's checked before our default kicks in. Docstring
spells out the common overrides.
The Ink TUI (`hermes --tui`) already handled Windows correctly via
ui-tui/src/lib/editor.ts falling back to notepad.exe on win32 — this
commit brings the classic prompt_toolkit CLI into parity.
3 new tests in test_windows_native_support.py verify:
- EDITOR=notepad gets set when unset on Windows
- Explicit $EDITOR is respected
- $VISUAL is respected (not overwritten by our default)
User hit a real failure case: their system Git was in a half-installed state
(can neither uninstall nor reinstall) and winget refused to work around it.
We were one step away from shipping an installer that would have left users
with exactly the problem he already had.
What other agents do (reality check):
- Claude Code: requires pre-installed Git; breaks if user doesn't have it.
- OpenCode, Codex: don't need bash at all — PowerShell-first design.
- Cline: uses whatever shell VSCode is configured with; installs nothing.
None of them solve the "broken system Git" problem. We need to own our Git.
Changes:
- scripts/install.ps1::Install-Git: dropped winget path entirely. Now:
(1) use existing git if present; (2) download portable MinGit from the
official git-for-windows GitHub release to %LOCALAPPDATA%\hermes\git.
No winget, no admin, no Windows installer registry, no system impact.
- Added %LOCALAPPDATA%\hermes\git\{cmd,usr\bin} to User PATH so git + bash
+ POSIX coreutils (which, env, grep, …) resolve in fresh shells.
- tools/environments/local.py::_find_bash: reorder so Hermes' portable
MinGit install is checked BEFORE falling through to shutil.which("bash")
or system install locations. This way a broken system Git can't
hijack the bash lookup.
- README + installation docs reworded to reflect the new story: "portable
Git Bash, isolated from any system install, recoverable via rm -rf if it
ever breaks."
Recoverability: if Hermes' Git install ever breaks, ``Remove-Item %LOCALAPPDATA%\hermes\git``
and re-run the installer — no system impact, no uninstall drama, no winget
to fight with.
Native Windows (with Git for Windows installed) can now run the Hermes CLI
and gateway end-to-end without crashing. install.ps1 already existed and
the Git Bash terminal backend was already wired up — this PR fills the
remaining gaps discovered by auditing every Windows-unsafe primitive
(`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios`
imports) and by comparing hermes against how Claude Code, OpenCode, Codex,
and Cline handle native Windows.
## What changed
### UTF-8 stdio (new module)
- `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point.
Flips the console code page to CP_UTF8 (65001), reconfigures
`sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8`
for subprocesses. No-op on non-Windows. Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`.
- Called early in `cli.py::main`, `hermes_cli/main.py::main`, and
`gateway/run.py::main` so Unicode banners (box-drawing, geometric
symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252
consoles.
### Crash sites fixed
- `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw
`os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)`
which routes through `taskkill /T /F` on Windows.
- `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also
converted SIGTERM path to `terminate_pid()` and widened OSError catch
on the intermediate `os.kill(pid, 0)` probe.
- `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` →
`getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the
pattern already used in `gateway/status.py`).
### OSError widening on `os.kill(pid, 0)` probes
Windows raises `OSError` (WinError 87) for a gone PID instead of
`ProcessLookupError`. Widened the catch at:
- `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this,
the loop busy-spins the full 10s every Windows gateway start)
- `hermes_cli/gateway.py:228, 460, 940`
- `hermes_cli/profiles.py:777`
- `tools/process_registry.py::_is_host_pid_alive`
- `tools/browser_tool.py:1170, 1206`
### Dashboard PTY graceful degradation
`hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`,
none of which exist on native Windows. Previously a Windows dashboard
would crash on `import hermes_cli.web_server` because of a top-level
import. Now:
- `hermes_cli/web_server.py` wraps the pty_bridge import in
`try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`.
- The `/api/pty` WebSocket handler returns a friendly "use WSL2 for
this tab" message instead of exploding.
- Every other dashboard feature (sessions, jobs, metrics, config
editor) runs natively on Windows.
### Dependency
- `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so
Python's `zoneinfo` works on Windows (which has no IANA tzdata
shipped with the OS). Credits @sprmn24 (PR #13182).
### Docs
- README.md: removed "Native Windows is not supported"; added
PowerShell one-liner and Git-for-Windows prerequisite note.
- `website/docs/getting-started/installation.md`: new Windows section
with capability matrix (everything native except the dashboard
`/chat` PTY tab, which is WSL2-only).
- `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as
"WSL2 as an alternative to native" rather than "the only way".
- `website/docs/developer-guide/contributing.md`: updated
cross-platform guidance with the `signal.SIGKILL` / `OSError`
rules we enforce now.
- `website/docs/user-guide/features/web-dashboard.md`: acknowledged
native Windows works for everything except the embedded PTY pane.
## Why this shape
Pulled from a survey of how other agent codebases handle native
Windows (Claude Code, OpenCode, Codex, Cline):
- All four treat Git Bash as the canonical shell on Windows, same as
hermes already does in `tools/environments/local.py::_find_bash()`.
- None of them force `SetConsoleOutputCP` — but they don't have to,
Node/Rust write UTF-16 to the Win32 console API. Python does not get
that for free, so we flip CP_UTF8 via ctypes.
- None of them ship PowerShell-as-primary-shell (Claude Code exposes
PS as a secondary tool; scope creep for this PR).
- All of them use `taskkill /T /F` for force-kill on Windows, which
is exactly what `gateway.status.terminate_pid(force=True)` does.
## Non-goals (deliberate scope limits)
- No PowerShell-as-a-second-shell tool — worth designing separately.
- No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's
the hardest design call and needs a separate doc.
- No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld
cluster) — will do as follow-up if users hit actual breakage; most
modern code already specifies it.
## Validation
- 28 new tests in `tests/tools/test_windows_native_support.py` — all
platform-mocked, pass on Linux CI. Cover:
- `configure_windows_stdio` idempotency, opt-out, env-preservation
- `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback
- `getattr(signal, "SIGKILL", …)` fallback shape
- `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior)
- Source-level checks that all entry points call `configure_windows_stdio`
- pty_bridge import-guard present in `web_server.py`
- README no longer says "not supported"
- 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass.
- `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed
pre-existing on main by stash-test).
- `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure).
- `tests/tools/test_process_registry.py` + `test_browser_*` pass.
- Manual smoke: `import hermes_cli.stdio; import gateway.run;
import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True`
on Linux (as expected).
## Files
- New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py`
- Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`,
`hermes_cli/profiles.py`, `hermes_cli/gateway.py`,
`hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`,
`hermes_cli/web_server.py`, `tools/browser_tool.py`,
`tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4
docs pages.
Credits to everyone whose prior PR work informed these fixes — see
the co-author trailers. All of the PRs listed in
`~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL`
/ UTF-8 stdio / tzdata / README patterns found the same issues; this PR
consolidates them.
Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com>
Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com>
Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com>
Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com>
Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com>
Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com>
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>
Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>
_distribution_metadata() reads the profile's distribution.yaml without
an explicit encoding, which defaults to the platform's locale encoding
— UTF-8 on POSIX, cp1252/mbcs on Windows. Files round-tripped between
hosts get mojibake on the Windows side.
Single-line fix: add encoding='utf-8' to the open() call. Matches the
sibling _read_config_model() site at line 398, which already does this.
Surfaces once PR #21561 lands the blocking ruff-check CI job
(PLW1514 — unspecified-encoding), but the underlying bug is
pre-existing on main.
Fifth and final slice polish on top of @dlkakbs's docs + skill. Three
things ship here:
1. Subscription renewal cron recipe (the #1 operational footgun).
Microsoft Graph webhook subscriptions expire at 72 hours max and
don't auto-renew. The shipped operator runbook mentioned
`maintain-subscriptions --dry-run` as a "daily or periodic check"
but never told operators how to actually automate it. Without a
scheduled job, any production deployment silently stops ingesting
meetings three days after go-live.
Adds an "Automating subscription renewal (REQUIRED for production)"
section to website/docs/guides/operate-teams-meeting-pipeline.md
with three concrete options and copy-pasteable configs:
- Option 1: Hermes cron (`hermes cron add --schedule "0 */12 * * *"
--script-only --command "hermes teams-pipeline maintain-subscriptions"`)
- Option 2: systemd service + timer (12h cadence, Persistent=true
so missed runs catch up after reboots)
- Option 3: plain crontab with a wrapper that sources .env for
credentials
Go-Live Checklist gains a bolded mandatory item for the schedule
being in place, with a cross-link to the section.
website/docs/user-guide/messaging/teams-meetings.md adds a
`::⚠️::` admonition right after the manual `subscribe`
examples so anyone who creates a subscription manually is told
the same day that it will silently expire in 72 hours.
2. Sidebar wiring. Shela's new docs pages (teams-meetings.md and
operate-teams-meeting-pipeline.md) weren't in website/sidebars.ts,
so they were orphaned URLs — reachable only if someone knew the
path. Wired teams-meetings into Messaging Platforms next to the
existing teams entry, and operate-teams-meeting-pipeline into
Guides & Tutorials next to microsoft-graph-app-registration from
PR #21922. Adjacent placement keeps the related pages discoverable
from each other.
3. SKILL.md rewrite (v1.0.0 → v1.1.0).
The original skill had five Turkish-only trigger phrases, which
works in a Turkish-speaking session but doesn't match English
triggers. Rewrote the skill to:
- Describe triggers by intent instead of exact phrases, with
explicit "works in any language" framing and example phrases
in both English and Turkish.
- Add a Decision Tree section covering the three most common user
asks (missing summary, setup verification, re-run request) and
the specific CLI command sequence for each.
- Add a dedicated "Critical pitfall: Graph subscriptions expire
in 72 hours" section that tells the agent exactly what to do
when a user reports "worked yesterday, nothing today" — the
most common operational failure mode.
- Expand the command reference into three labeled groups (Status
and inspection / Re-running and debugging / Subscription
management) so the agent can reach for the right command
without scanning.
- Add cross-links to all four related docs pages (Azure app
registration, webhook listener setup, full pipeline setup,
operator runbook).
Validation:
- npm run build: all new pages route, anchor to
#automating-subscription-renewal-required-for-production resolves
from both the runbook TOC and the teams-meetings.md admonition.
- scripts/run_tests.sh on the relevant test suites (607 tests): all
pass.
* feat(tui): support attaching to an existing gateway
Allow the TUI gateway client to connect via HERMES_TUI_GATEWAY_URL while preserving spawned gateway fallback, and mirror event frames to sidecar feeds so dashboard tool activity remains visible.
* review(copilot): redact attach URLs and gate stale transport exits
Strip query strings (and any user info) from gateway / sidecar URLs before logging or surfacing them in `gateway.start_timeout`, so attach tokens never leak into the TUI log tail or activity feed. Also gate the spawned-proc and websocket close handlers on transport identity so a stale child or socket cannot clear a freshly-started ready timer or reject newly-issued pending requests during reconnect.
* review(copilot): tighten transport restart and shutdown lifecycle
Reject any in-flight RPCs in resetStartupState so callers do not hang on promises issued to the previous transport when start() swaps a child or socket. Have kill() explicitly reject pending so attach-mode promises drain after an intentional shutdown, and reattach when HERMES_TUI_GATEWAY_URL rotates between requests instead of silently keeping the old session. Fold the spawned child error path through handleTransportExit so a failed spawn clears the startup timer and emits a single exit event. Also null the websocket reference before calling close so the identity guard correctly tags stale close events on real WebSocket timing. Locks the new behaviors in with regression tests for kill, URL rotation, and stale-pending cleanup.
* review(copilot): swallow stray ws connect rejection and isolate test env
Attach a no-op catch handler on the websocket connect promise so an unobserved connect-error / early-close rejection cannot surface as an unhandled promise rejection in Node when no request is currently racing the open. Snapshot HERMES_TUI_GATEWAY_URL / HERMES_TUI_SIDECAR_URL in beforeEach and restore them in afterEach so vitest runs that set those env vars beforehand do not get permanently cleared.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* review(copilot): hoist wire decoder and harden redact fallback
Reuse a single module-level TextDecoder for binary websocket frames so high-frequency attach-mode traffic does not allocate one per message. Strengthen the redactUrl fallback so embedded user:pass@ credentials are also masked when the WHATWG URL parser rejects the input, and pin the new behavior with a regression test that drives a malformed bearer URL through the gateway-stderr publish path.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* review(copilot): force redact fallback path with deterministic fixture
Replace the "%zz" user-info fixture, which WHATWG URL actually accepts in recent Node and silently routed the test back through the structured-URL branch, with a port-99999 fixture that the parser rejects across Node versions. Add a pre-flight `expect(() => new URL(fixture)).toThrow()` assertion so a future URL-parser change can never silently bypass `redactUrl()`'s fallback again.
* review(copilot): sanitize websocket constructor failures
Avoid logging raw WebSocket constructor error messages because some implementations include the full input URL, including token-bearing query strings. Log the redacted gateway or sidecar URL with the error class instead, and add regression coverage for constructor-throw paths on both attach and sidecar sockets.
* review(self): restart transport on attach-mode transition
Route runtime HERMES_TUI_GATEWAY_URL changes through start() so switching from spawned-gateway mode to attach mode also tears down the previously spawned Python child instead of leaving it alive. Keep the existing fast-fail behavior for pending RPCs. Also make constructor-failure logging fully generic after the redacted URL, avoiding even implementation-specific error class text in the log tail.
* review(copilot): use websocket wording for attach close errors
When the attached websocket closes, reject pending RPCs with an explicit websocket-closed reason instead of the spawned-process oriented `gateway exited` wording. Add coverage to ensure close code 1011 surfaces as `gateway websocket closed (1011)`.
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Third docs slice shipped alongside the TeamsSummaryWriter code so
operators can configure outbound summary delivery the moment this
PR lands.
- website/docs/user-guide/messaging/teams.md: new 'Meeting Summary
Delivery (Teams Meeting Pipeline)' section under Features,
explaining that the existing teams adapter handles pipeline
outbound (not a separate adapter surface), with a config-snippet
example for graph and incoming_webhook modes, a mode-choice
trade-off table, and a note that settings are inert when the
teams_pipeline plugin is disabled.
- website/docs/reference/environment-variables.md: new Teams Meeting
Summary Delivery subsection documenting TEAMS_DELIVERY_MODE,
TEAMS_INCOMING_WEBHOOK_URL, TEAMS_GRAPH_ACCESS_TOKEN, TEAMS_TEAM_ID,
TEAMS_CHANNEL_ID, TEAMS_CHAT_ID with cross-link to the Teams setup
page section.
Verified via npm run build: pages route correctly, no new warnings
or errors.
test_build_pipeline_runtime_reuses_existing_teams_adapter_surface set
delivery_mode='incoming_webhook' but omitted incoming_webhook_url.
_teams_delivery_is_configured() requires the URL to mark delivery as
enabled, so the guarded build_pipeline_runtime gate in runtime.py
correctly left teams_sender=None and the assertion failed.
The intent of the test — prove we reuse the existing TeamsSummaryWriter
from plugins/platforms/teams/adapter.py rather than introducing a new
adapter surface elsewhere — is unchanged. Added the URL so the gate
passes and the architectural assertion holds.
Two salvage follow-ups on top of @dlkakbs's plugin runtime.
1. Install a drop-scheduler when the runtime fails to build.
Previously when ``build_pipeline_runtime()`` raised (e.g. missing
Graph env vars, subscription store path unwritable), ``bind_gateway_runtime``
logged a warning and returned False, leaving the msgraph_webhook
adapter with no scheduler at all. Incoming Graph notifications
would then fall back to the adapter's default ``handle_message``
path, which produces a raw JSON dump as a user-role message — not
useful and fires every time Graph retries.
Now a no-op drop-scheduler is installed instead, so:
- Graph notifications ack cleanly (202) so Graph stops retrying.
- The failure is surfaced once in the log with the error.
- No user-role messages get manufactured from raw change payloads.
The adapter is still bindable later once the runtime becomes
available (e.g. after the operator runs ``hermes teams-pipeline
validate`` and fixes the config), since the gateway's
``_teams_pipeline_runtime`` sentinel wasn't set to a non-None value.
2. Test wiring for ``_teams_pipeline_plugin_enabled()`` gate.
The happy-path runner-wiring tests monkeypatched ``bind_gateway_runtime``
but not ``_load_gateway_config``. In the hermetic test environment
the real config read ran, saw no enabled plugins, and short-circuited
the bind call before the test could observe it — so the test
expected ``calls == [runner]`` but got ``calls == []``.
Adds a ``_load_gateway_config`` monkeypatch with
``plugins.enabled = ["teams_pipeline"]`` to the happy-path tests.
The explicit-disabled test ``test_gateway_runner_skips_wiring_when_teams_pipeline_plugin_disabled``
already patches the config correctly.
Also renames ``test_bind_gateway_runtime_leaves_scheduler_unchanged_on_failure``
to ``test_bind_gateway_runtime_installs_drop_scheduler_on_failure``
and updates the assertion — this test contradicted the drop-scheduler
test in ``tests/plugins/test_teams_pipeline_plugin.py`` which
expected the scheduler to be installed. The plugin-test name
(``test_bind_gateway_runtime_drops_notifications_when_unavailable``)
clearly describes the intended behavior; fixing the wiring-test
assertion aligns both tests.
Validation:
- ``scripts/run_tests.sh tests/plugins/test_teams_pipeline_plugin.py
tests/gateway/test_teams_pipeline_runtime_wiring.py
tests/hermes_cli/test_teams_pipeline_plugin_cli.py`` — 25/25 passed.
Third slice of the Microsoft Teams meeting pipeline stack, salvaged
onto current main. Adds the standalone teams_pipeline plugin that
consumes Graph change notifications from the webhook listener,
resolves meeting artifacts (transcript first, recording + STT fallback
later), persists job state in a durable store, and exposes an operator
CLI for inspection, replay, subscription management, and validation.
Design choices follow maintainer review feedback on PR #19815:
- Standalone plugin rather than bolted-on core surface
(plugins/teams_pipeline/, kind: standalone in plugin.yaml).
- Zero new model tools. The agent drives the pipeline by invoking
the operator CLI via the terminal tool, guided by the skill that
ships with a follow-up PR.
- Reuses the existing msgraph_webhook gateway platform for Graph
ingress. Pipeline runtime is wired in via bind_gateway_runtime and
gated on plugins.enabled so gateways that don't run the plugin
boot cleanly.
Additions:
- plugins/teams_pipeline/: runtime (gateway wiring + config builder),
pipeline core, durable SQLite store, subscription maintenance
helpers, Graph artifact resolution, operator CLI (list, show,
run/replay, fetch dry-run, subscriptions list, subscribe,
renew-subscription, delete-subscription, maintain-subscriptions,
token-health, validate).
- hermes_cli/main.py: second-pass plugin CLI discovery so any
standalone plugin registered via ctx.register_cli_command()
outside the memory-plugin convention path gets its subcommand
wired into argparse without touching core.
- gateway/run.py: _teams_pipeline_plugin_enabled() config gate,
_wire_teams_pipeline_runtime() binding after adapter setup, and
the two runner attributes used by the runtime.
Credit to @dlkakbs for the entire plugin implementation.
PR #20831 shipped the feature with a terse reference page. This adds a
proper user guide — ~570 lines of what/why/when/how with use-case
walkthroughs, lifecycle coverage from author through installer through
update, and recipe snippets for common workflows.
New page: website/docs/user-guide/profile-distributions.md
Sections:
* What this means — the before/after, side-by-side
* Why git, not tarballs or a custom format
* When to use a distribution (personal, team, community, product) and
when NOT to (local backup, sharing credentials, sharing memories)
* The lifecycle — dedicated walkthroughs for authors (publish in 4 steps)
and installers (install, check, update, remove)
* Use cases: personal sync, team internal bot, community publish,
commercial product, ephemeral ops agent
* Recipes: pin a version, compare installed vs. latest, preserve local
customizations through updates, force clean reinstall, fork-and-customize,
test before pushing
* What is NEVER in a distribution (the user-owned exclude list verbatim)
* Security and trust model — what you are trusting, why cron is not
auto-scheduled, the browser-extension analogy
Cross-linking:
* Added to sidebar under Getting Started, right after user-guide/profiles.
* Existing Profiles page ends with a Sharing profiles as distributions
teaser that links here.
* The Distribution section of the reference page gets an admonition
pointing newcomers here first. The reference stays as a CLI-flag
lookup for people who already know what they want.
Validation:
* ascii-guard lint --exclude-code-blocks docs -> 0 errors.
* All internal links resolve to real pages.
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.
Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.
Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.
## cua_backend.py
**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.
**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.
**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.
**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).
**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.
**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.
**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.
## schema.py
Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.
## tool.py
Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.
## agent/display.py + run_agent.py
Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Second docs slice shipped alongside the webhook listener code so users
can actually wire up the endpoint the moment this PR lands.
- website/docs/user-guide/messaging/msgraph-webhook.md: new page
covering what the listener is (change-notification ingress, distinct
from the teams chat adapter), quick-start YAML + env-var config,
full config table, security hardening (clientState + timing-safe
compare, source-IP allowlisting against Microsoft's published egress
ranges, TLS termination at the reverse proxy, response hygiene),
status-code table, troubleshooting, and cross-links to the Azure
app registration guide.
- website/docs/reference/environment-variables.md: new Microsoft
Graph Webhook Listener subsection with MSGRAPH_WEBHOOK_ENABLED,
_PORT, _CLIENT_STATE, _ACCEPTED_RESOURCES, _ALLOWED_SOURCE_CIDRS.
- website/sidebars.ts: wire the new page into Messaging Platforms,
right after the teams chat adapter so the two related pages are
adjacent in the sidebar.
The pipeline runtime / operator CLI / outbound delivery pages still
land with their matching PRs. With this PR merged, an operator can get
the listener running end-to-end, register a Graph subscription
manually, and receive validation handshake plus notification POSTs
against the configured client_state.
Verified via npm run build: new page routes at
/docs/user-guide/messaging/msgraph-webhook, sidebar wires correctly,
no new warnings or errors.
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.
- Timing-safe clientState comparison. Previously used `==` on strings;
switches to hmac.compare_digest so a mismatch does not leak how many
leading characters matched. client_state is documented as a strong
shared secret (openssl rand -hex 32 in the setup docs), so a
timing-safe primitive is the right call.
- Split GET and POST handlers. Graph validates a subscription by sending
GET with validationToken in the query; anything else on GET is now a
400 so the endpoint cannot be probed or mistakenly used for data
exfil. Previously a bare GET fell through to the POST path and blew
up on request.json() with a confusing 400.
- Empty response bodies on success. 202 is returned with no body so
internal counters (accepted / duplicates / scheduled) do not leak to
any caller that can reach the endpoint; counters remain observable
via /health for operators. 403 on every-item-bad-clientState batches
(so forged POSTs stop retrying), 400 on malformed / unknown-resource
batches (sender configuration issue).
- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
(list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
env var let operators restrict the webhook to Microsoft Graph's
published webhook source ranges in production. Empty = allow all,
preserving dev-tunnel / localhost workflows. Invalid CIDRs are
logged and ignored rather than crashing. Also gates the handshake
endpoint so disallowed IPs cannot probe it.
- Tests updated for the new response contract (empty-body 202,
auth-only 403, config-error 400) and extended to cover: bare GET
rejection, POST-with-validationToken handshake tolerance,
timing-safe compare actually invoked via hmac.compare_digest spy,
malformed body / missing value array, IP allowlist accept/reject
paths, handshake IP allowlist, invalid CIDR entries, comma-string
CIDR list parsing. 52/52 passed (was 40).
Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).