fix(windows): %1 install error, patch CRLF false-negative, SOUL.md BOM

Three bugs from teknium1's successful install + diagnostic chat on Windows:

1. **Start-Process -FilePath npm.cmd fails with "%1 is not a valid Win32
   application".**  Start-Process bypasses cmd.exe and PATHEXT to call
   CreateProcessW directly, which refuses .cmd batch shims.  Switched
   Install-NodeDeps to use PowerShell's call operator (``& $npmExe
   install --silent *> $log``), which DOES honour PATHEXT.  Extracted a
   ``_Run-NpmInstall`` helper so the browser + TUI paths share the same
   logic.  Captures $LASTEXITCODE correctly, still surfaces the real
   stderr on failure with a log-file pointer for the full output.

2. **patch tool returns false-negative on Windows due to CRLF round-trip.**
   Root cause was upstream of patch: ``subprocess.Popen(..., text=True,
   stdin=PIPE)`` on Windows translates ``\\n`` → ``\\r\\n`` when data flows
   through the stdin pipe.  ``_pipe_stdin()`` was writing the patch's
   new_content string through a text-mode pipe, bash then wrote those
   CRLF bytes to disk, and patch's post-write verify compared the
   on-disk CRLF bytes against the original LF-only string — fail.
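
   The injection can be reproduced on any platform without a Windows box.
   This sketch forces the same ``\n`` → ``\r\n`` translation a Windows
   text-mode pipe applies (``newline="\r\n"`` stands in for Windows's
   ``os.linesep``; the filenames and strings are illustrative only):

   ```python
   import io

   # A TextIOWrapper with newline="\r\n" translates every "\n" written
   # through it, exactly like text=True stdin does on Windows where
   # os.linesep is "\r\n".
   raw = io.BytesIO()
   text_pipe = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
   text_pipe.write("line1\nline2\n")
   text_pipe.flush()

   on_disk = raw.getvalue()
   assert on_disk == b"line1\r\nline2\r\n"          # CRLFs were injected
   assert on_disk != "line1\nline2\n".encode()      # the compare that failed
   ```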

   Fixed in two places for defense in depth:
   - ``_pipe_stdin()`` now writes through ``proc.stdin.buffer`` with
     explicit UTF-8 encoding, bypassing Python's newline translation on
     every platform.  No behaviour change on POSIX (bytes are identical)
     but stops the CRLF injection on Windows.
   - ``patch_replace``'s post-write verify normalizes CRLF→LF on both
     sides before comparing, so even if some future backend still
     translates newlines the patch tool won't report a bogus failure.
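
   The byte-buffer approach can be sketched roughly as follows (simplified:
   no daemon thread, and ``pipe_stdin_bytes`` is a stand-in name, not the
   real helper):

   ```python
   import subprocess
   import sys

   def pipe_stdin_bytes(proc: subprocess.Popen, data: str) -> None:
       """Write through the raw byte buffer, sidestepping newline translation."""
       raw = data.encode("utf-8")
       # text=True gives a TextIOWrapper whose .buffer is the underlying
       # BufferedWriter; a byte-mode Popen has no .buffer, so fall back.
       target = getattr(proc.stdin, "buffer", proc.stdin)
       target.write(raw)
       target.close()

   # Child echoes stdin bytes back verbatim.
   proc = subprocess.Popen(
       [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"],
       stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
   pipe_stdin_bytes(proc, "line1\nline2\n")
   out = proc.stdout.read()
   proc.wait()
   assert out == "line1\nline2\n"   # bytes survived the pipe unmodified
   ```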

3. **SOUL.md gets a UTF-8 BOM on Windows PowerShell 5.1.**  ``Set-Content
   -Encoding UTF8`` on PS5.1 writes UTF-8 WITH a byte-order-mark (changed
   in PS7 via ``utf8NoBOM``).  Hermes's prompt-injection scanner sees
   the BOM (U+FEFF invisible char) and refuses to load the file, so
   SOUL.md's persona instructions never get applied.

   Fixed by writing the file via ``[System.IO.File]::WriteAllText``
   with an explicit ``UTF8Encoding($false)`` — BOM-free on every
   PowerShell version.
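
   The BOM's effect is easy to see from Python, where the ``utf-8-sig``
   codec produces the same bytes PS5.1's ``Set-Content -Encoding UTF8``
   does (the ``persona`` string is a stand-in for SOUL.md content):

   ```python
   persona = "Speak as Hermes."            # stand-in for SOUL.md content
   bom_bytes = persona.encode("utf-8-sig")  # UTF-8 with BOM, like PS5.1
   clean_bytes = persona.encode("utf-8")    # BOM-free, like WriteAllText fix

   assert bom_bytes[:3] == b"\xef\xbb\xbf"   # the 3-byte BOM a scanner sees
   assert bom_bytes[3:] == clean_bytes       # payload otherwise identical
   # Plain utf-8 decoding keeps the BOM as an invisible U+FEFF character;
   # utf-8-sig strips it.
   assert bom_bytes.decode("utf-8") == "\ufeff" + persona
   assert bom_bytes.decode("utf-8-sig") == persona
   ```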

All POSIX behaviour verified unchanged: 198 tests pass across
test_file_operations, test_local_env_cwd_recovery, test_code_execution,
test_windows_native_support, test_windows_compat.
Teknium 2026-05-07 18:11:43 -07:00
parent d52e54170a
commit 8f91d7bfa9
3 changed files with 97 additions and 75 deletions


```diff
@@ -99,12 +99,33 @@ def get_sandbox_dir() -> Path:
 def _pipe_stdin(proc: subprocess.Popen, data: str) -> None:
-    """Write *data* to proc.stdin on a daemon thread to avoid pipe-buffer deadlocks."""
+    """Write *data* to proc.stdin on a daemon thread to avoid pipe-buffer deadlocks.
+
+    On Windows, text-mode stdin (``text=True`` / ``encoding="utf-8"``)
+    translates ``\\n`` → ``\\r\\n`` as the data flows through the pipe,
+    which corrupts every write_file / patch call because the bytes that
+    land on disk include injected carriage returns.  The file IS created,
+    but every subsequent byte-count / content compare against the
+    caller's ``\\n``-only string fails.
+
+    Workaround: write through ``proc.stdin.buffer`` (the underlying byte
+    buffer), encoding to UTF-8 ourselves.  That bypasses Python's
+    newline translation entirely on every platform.  No behaviour change
+    on POSIX: the byte sequence is identical to what text-mode would
+    produce there.
+    """
     def _write():
         try:
-            proc.stdin.write(data)
-            proc.stdin.close()
+            # proc.stdin is a TextIOWrapper when text=True was set on the
+            # Popen.  Its ``.buffer`` attribute is the raw BufferedWriter
+            # that bypasses newline translation.  When Popen was created
+            # in byte mode, proc.stdin is already a BufferedWriter with
+            # no ``.buffer`` attribute — fall back to .write() directly.
+            raw = data.encode("utf-8") if isinstance(data, str) else data
+            target = getattr(proc.stdin, "buffer", proc.stdin)
+            target.write(raw)
+            target.close()
         except (BrokenPipeError, OSError):
             pass
```


```diff
@@ -966,11 +966,21 @@ class ShellFileOperations(FileOperations):
         verify_result = self._exec(verify_cmd)
         if verify_result.exit_code != 0:
             return PatchResult(error=f"Post-write verification failed: could not re-read {path}")
-        if verify_result.stdout != new_content:
+        # Normalize line endings before comparing.  On Windows, Python's
+        # default text-mode ``open()`` translates ``\n`` → ``\r\n`` on
+        # write, so the file on disk legitimately holds CRLFs while our
+        # ``new_content`` string has bare LFs.  Without this normalization
+        # every patch on Windows returns a bogus "wrote 39, read 42"
+        # false-negative even though the edit landed correctly.  POSIX
+        # backends don't translate, so this is a no-op there.
+        _verify_stdout_normalized = verify_result.stdout.replace("\r\n", "\n").replace("\r", "\n")
+        _new_content_normalized = new_content.replace("\r\n", "\n").replace("\r", "\n")
+        if _verify_stdout_normalized != _new_content_normalized:
             return PatchResult(error=(
                 f"Post-write verification failed for {path}: on-disk content "
                 f"differs from intended write "
-                f"(wrote {len(new_content)} chars, read back {len(verify_result.stdout)}). "
+                f"(wrote {len(_new_content_normalized)} chars, read back "
+                f"{len(_verify_stdout_normalized)} chars after normalizing line endings). "
                 "The patch did not persist. Re-read the file and try again."
             ))
```