mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-12 03:42:08 +00:00
docs: add Windows-Specific Quirks section to hermes-agent skill + keystroke diagnostic
Adds a dedicated '## Windows-Specific Quirks' section to the hermes-agent skill so Windows pitfalls have one discoverable place to evolve. Inaugural entries cover:

- Input / keybindings — Alt+Enter intercepted by Windows Terminal, Ctrl+Enter as the Windows newline keystroke, mintty/git-bash behavior, pointer to scripts/keystroke_diagnostic.py for investigation.
- Config / files — UTF-8 BOM HTTP-400 trap.
- execute_code / sandbox — WinError 10106 SYSTEMROOT root cause + _WINDOWS_ESSENTIAL_ENV_VARS fix location.
- Testing / contributing — scripts/run_tests.sh POSIX-venv limitation and the system-Python workaround, POSIX-only test skip-guard patterns.
- Path / filesystem — line-ending warnings (cosmetic), forward-slash portability.

Collapses the old scattered Windows bullets under 'Platform-specific issues' into a single pointer at the new dedicated section so there's only one place to maintain this content.

Also adds the scripts/keystroke_diagnostic.py the skill now references — a small prompt_toolkit Application that prints the Keys.* identifier and raw escape bytes for every keystroke. Used to establish the Ctrl+Enter = c-j fact on Windows Terminal; generally useful for anyone adding a platform-aware keybinding.
parent
d1838041e5
commit
b63f9645f0
2 changed files with 210 additions and 1 deletions
81
scripts/keystroke_diagnostic.py
Normal file
@@ -0,0 +1,81 @@
#!/usr/bin/env python3
"""Diagnose how prompt_toolkit identifies keystrokes in the current terminal.

Useful when adding a keybinding to Hermes (or any prompt_toolkit app) and you
need to know what the terminal actually delivers — particularly on Windows,
where terminals can collapse, intercept, or silently remap key combinations.

Usage:
    # POSIX
    python scripts/keystroke_diagnostic.py

    # Windows (PowerShell / git-bash / cmd)
    python scripts\\keystroke_diagnostic.py

Press the key combinations you care about. Each keystroke prints the
prompt_toolkit `Keys.*` identifier and the raw escape bytes the terminal
sent. The last 20 keystrokes stay on screen. Ctrl+Q or Ctrl+C to quit.

Common questions this answers:
  - Does my terminal distinguish Ctrl+Enter from plain Enter?
    (On Windows Terminal: yes, Ctrl+Enter → c-j, Enter → c-m.)
  - Does Alt+Enter reach the app, or does the terminal eat it?
    (Windows Terminal eats it for fullscreen; mintty may too.)
  - Does Shift+Enter register as a separate key?
    (Almost never — most terminals collapse it to Enter.)
  - What byte sequence does Home/End/PageUp/etc. produce?

Example output for Ctrl+Enter on Windows Terminal + PowerShell:
    key=<Keys.ControlJ: 'c-j'> data='\\n'

Then in Hermes, bind the newline behaviour to that key:
    @kb.add('c-j')
    def handle_ctrl_enter(event):
        event.current_buffer.insert_text('\\n')
"""
from prompt_toolkit import Application
from prompt_toolkit.key_binding import KeyBindings
from prompt_toolkit.layout import Layout
from prompt_toolkit.layout.containers import Window
from prompt_toolkit.layout.controls import FormattedTextControl


_HISTORY: list[str] = []


def _header() -> list[str]:
    return [
        "Keystroke diagnostic — press keys to see how prompt_toolkit sees them.",
        "Try: Enter, Ctrl+Enter, Shift+Enter, Alt+Enter, Ctrl+J, Ctrl+M, arrows, Home/End.",
        "Ctrl+Q or Ctrl+C to quit. Last 20 keystrokes shown.",
        "",
    ]


def _render_text() -> str:
    return "\n".join(_header() + _HISTORY[-20:])


def main() -> None:
    kb = KeyBindings()

    @kb.add("<any>")
    def _on_any(event):  # noqa: ANN001 — prompt_toolkit event type
        parts = []
        for kp in event.key_sequence:
            parts.append(f"key={kp.key!r} data={kp.data!r}")
        _HISTORY.append(" | ".join(parts))
        event.app.invalidate()

    @kb.add("c-q")
    @kb.add("c-c")
    def _quit(event):  # noqa: ANN001
        event.app.exit()

    control = FormattedTextControl(text=_render_text)
    layout = Layout(Window(content=control))
    Application(layout=layout, key_bindings=kb, full_screen=False).run()


if __name__ == "__main__":
    main()

@@ -700,6 +700,96 @@ User docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban

---

## Windows-Specific Quirks

Hermes runs natively on Windows (PowerShell, cmd, Windows Terminal, git-bash
mintty, VS Code integrated terminal). Most of it just works, but a handful
of differences between Win32 and POSIX have bitten us — document new ones
here as you hit them so the next person (or the next session) doesn't
rediscover them from scratch.

### Input / Keybindings

**Alt+Enter doesn't insert a newline.** Windows Terminal intercepts Alt+Enter
at the terminal layer to toggle fullscreen — the keystroke never reaches
prompt_toolkit. Use **Ctrl+Enter** instead. Windows Terminal delivers
Ctrl+Enter as LF (`c-j`), distinct from plain Enter (`c-m` / CR), and the
CLI binds `c-j` to newline insertion on `win32` only (see
`_bind_prompt_submit_keys` + the Windows-only `c-j` binding in `cli.py`).
Side effect: the raw Ctrl+J keystroke also inserts a newline on Windows —
unavoidable, because Windows Terminal collapses Ctrl+Enter and Ctrl+J to
the same keycode at the Win32 console API layer. No conflicting binding
existed for Ctrl+J on Windows, so this is a harmless side effect.

mintty / git-bash behaves the same (fullscreen on Alt+Enter) unless you
disable Alt+Fn shortcuts in Options → Keys. Easier to just use Ctrl+Enter.

**Diagnosing keybindings.** Run `python scripts/keystroke_diagnostic.py`
(repo root) to see exactly how prompt_toolkit identifies each keystroke
in the current terminal. Answers questions like "does Shift+Enter come
through as a distinct key?" (almost never — most terminals collapse it
to plain Enter) or "what byte sequence is my terminal sending for
Ctrl+Enter?" This is how the Ctrl+Enter = c-j fact was established.

### Config / Files

**HTTP 400 "No models provided" on first run.** `config.yaml` was saved
with a UTF-8 BOM (common when Windows apps write it). Re-save as UTF-8
without BOM. `hermes config edit` writes without BOM; manual edits in
Notepad are the usual culprit.
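
A minimal sketch of the re-save step, assuming you want to do it from Python rather than an editor. The helper name `strip_utf8_bom` is hypothetical and not part of Hermes; it only relies on the standard-library `codecs.BOM_UTF8` constant.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Rewrite *path* without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was already BOM-free (in which case it is left untouched).
    """
    raw = path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False
    # Drop only the three BOM bytes; the rest of the file is unchanged.
    path.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

Call it with the path to your own `config.yaml`; where that file lives depends on your install, so the path is left to the caller.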

### `execute_code` / Sandbox

**WinError 10106** ("The requested service provider could not be loaded
or initialized") from the sandbox child process — it can't create an
`AF_INET` socket, so the loopback-TCP RPC fallback fails before
`connect()`. Root cause is usually **not** a broken Winsock LSP; it's
Hermes's own env scrubber dropping `SYSTEMROOT` / `WINDIR` / `COMSPEC`
from the child env. Python's `socket` module needs `SYSTEMROOT` to locate
`mswsock.dll`. Fixed via the `_WINDOWS_ESSENTIAL_ENV_VARS` allowlist in
`tools/code_execution_tool.py`. If you still hit it, echo `os.environ`
inside an `execute_code` block to confirm `SYSTEMROOT` is set. Full
diagnostic recipe in `references/execute-code-sandbox-env-windows.md`.
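
The allowlist idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `tools/code_execution_tool.py`: the three variable names come from the paragraph above, the real allowlist may contain more entries, and `scrub_env` / `missing_essentials` are hypothetical helpers.

```python
# Names taken from the paragraph above; the real allowlist may be longer.
WINDOWS_ESSENTIAL_ENV_VARS = ("SYSTEMROOT", "WINDIR", "COMSPEC")


def scrub_env(parent_env, allowed, essential=WINDOWS_ESSENTIAL_ENV_VARS):
    """Keep only allowlisted variables plus the Windows essentials.

    Matching is case-insensitive because Windows environment variable
    names are. Essentials are passed through only if the parent has them,
    so a POSIX parent environment is unaffected.
    """
    keep = {name.upper() for name in (*allowed, *essential)}
    return {k: v for k, v in parent_env.items() if k.upper() in keep}


def missing_essentials(env, essential=WINDOWS_ESSENTIAL_ENV_VARS):
    """Return the essential variable names that *env* does not define."""
    present = {k.upper() for k in env}
    return [name for name in essential if name not in present]
```

Running `print(missing_essentials(os.environ))` (after `import os`) inside an `execute_code` block is one way to do the check described above: an empty list means the child environment still carries all three variables.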

### Testing / Contributing

**`scripts/run_tests.sh` doesn't work as-is on Windows** — it looks for
POSIX venv layouts (`.venv/bin/activate`). The Hermes-installed venv at
`venv/Scripts/` has no pip or pytest either (stripped for install size).
Workaround: install `pytest + pytest-xdist + pyyaml` into a system Python
3.11 user site, then invoke pytest directly with `PYTHONPATH` set:

```bash
"/c/Program Files/Python311/python" -m pip install --user pytest pytest-xdist pyyaml
export PYTHONPATH="$(pwd)"
"/c/Program Files/Python311/python" -m pytest tests/foo/test_bar.py -v --tb=short -n 0
```

Use `-n 0`, not `-n 4` — `pyproject.toml`'s default `addopts` already
includes `-n`, and the wrapper's CI-parity guarantees don't apply off POSIX.

**POSIX-only tests need skip guards.** Common markers already in the codebase:
- Symlinks — elevated privileges on Windows
- `0o600` file modes — POSIX mode bits not enforced on NTFS by default
- `signal.SIGALRM` — Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
- Winsock / Windows-specific regressions — `@pytest.mark.skipif(sys.platform != "win32", ...)`

Use the existing skip-pattern style (`sys.platform == "win32"` or
`sys.platform.startswith("win")`) to stay consistent with the rest of the
suite.

### Path / Filesystem

**Line endings.** Git may warn `LF will be replaced by CRLF the next time
Git touches it`. Cosmetic — the repo's `.gitattributes` normalizes. Don't
let editors auto-convert committed POSIX-newline files to CRLF.

**Forward slashes work almost everywhere.** `C:/Users/...` is accepted by
every Hermes tool and most Windows APIs. Prefer forward slashes in code
and logs — avoids shell-escaping backslashes in bash.
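
One way to normalise a path before logging it, sketched with the standard-library `pathlib.PureWindowsPath` (which parses Windows paths on any host OS). The helper name `to_forward_slashes` is an assumption made for this example.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Render a Windows path with forward slashes, e.g. for logs."""
    # PureWindowsPath accepts either separator; as_posix() emits "/".
    return PureWindowsPath(path).as_posix()
```

Both spellings of a path compare equal as `PureWindowsPath` objects, so converting for display does not change which file is meant.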

---

## Troubleshooting

### Voice not working

@@ -742,7 +832,7 @@ Common gateway problems:
### Platform-specific issues
- **Discord bot silent**: Must enable **Message Content Intent** in Bot → Privileged Gateway Intents.
- **Slack bot only works in DMs**: Must subscribe to `message.channels` event. Without it, the bot ignores public channels.
-- **Windows HTTP 400 "No models provided"**: Config file encoding issue (BOM). Ensure `config.yaml` is saved as UTF-8 without BOM.
+- **Windows-specific issues** (`Alt+Enter` newline, WinError 10106, UTF-8 BOM config, test suite, line endings): see the dedicated **Windows-Specific Quirks** section above.

### Auxiliary models not working
If `auxiliary` tasks (vision, compression, session_search) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:

@@ -865,6 +955,44 @@ python -m pytest tests/tools/ -q # Specific area
- Run full suite before pushing any change
- Use `-o 'addopts='` to clear any baked-in pytest flags

**Windows contributors:** `scripts/run_tests.sh` currently looks for POSIX venvs (`.venv/bin/activate` / `venv/bin/activate`) and will error out on Windows where the layout is `venv/Scripts/activate` + `python.exe`. The Hermes-installed venv at `venv/Scripts/` also has no `pip` or `pytest` — it's stripped for end-user install size. Workaround: install pytest + pytest-xdist + pyyaml into a system Python 3.11 user site (`/c/Program Files/Python311/python -m pip install --user pytest pytest-xdist pyyaml`), then run tests directly:

```bash
export PYTHONPATH="$(pwd)"
"/c/Program Files/Python311/python" -m pytest tests/tools/test_foo.py -v --tb=short -n 0
```

Use `-n 0` (not `-n 4`) because `pyproject.toml`'s default `addopts` already includes `-n`, and the wrapper's CI-parity story doesn't apply off-POSIX.

**Cross-platform test guards:** tests that use POSIX-only syscalls need a skip marker. Common ones already in the codebase:
- Symlink creation → `@pytest.mark.skipif(sys.platform == "win32", reason="Symlinks require elevated privileges on Windows")` (see `tests/cron/test_cron_script.py`)
- POSIX file modes (0o600, etc.) → `@pytest.mark.skipif(sys.platform.startswith("win"), reason="POSIX mode bits not enforced on Windows")` (see `tests/hermes_cli/test_auth_toctou_file_modes.py`)
- `signal.SIGALRM` → Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
- Live Winsock / Windows-specific regression tests → `@pytest.mark.skipif(sys.platform != "win32", reason="Windows-specific regression")`

**Monkeypatching `sys.platform` is not enough** when the code under test also calls `platform.system()` / `platform.release()` / `platform.mac_ver()`. Those functions re-read the real OS independently, so a test that sets `sys.platform = "linux"` on a Windows runner will still see `platform.system() == "Windows"` and route through the Windows branch. Patch all three together:

```python
monkeypatch.setattr(sys, "platform", "linux")
monkeypatch.setattr(platform, "system", lambda: "Linux")
monkeypatch.setattr(platform, "release", lambda: "6.8.0-generic")
```

See `tests/agent/test_prompt_builder.py::TestEnvironmentHints` for a worked example.

### Extending the system prompt's execution-environment block

Factual guidance about the host OS, user home, cwd, terminal backend, and shell (bash vs. PowerShell on Windows) is emitted from `agent/prompt_builder.py::build_environment_hints()`. This is also where the WSL hint and per-backend probe logic live. The convention:

- **Local terminal backend** → emit host info (OS, `$HOME`, cwd) + Windows-specific notes (hostname ≠ username, `terminal` uses bash not PowerShell).
- **Remote terminal backend** (anything in `_REMOTE_TERMINAL_BACKENDS`: `docker, singularity, modal, daytona, ssh, vercel_sandbox, managed_modal`) → **suppress** host info entirely and describe only the backend. A live `uname`/`whoami`/`pwd` probe runs inside the backend via `tools.environments.get_environment(...).execute(...)`, cached per process in `_BACKEND_PROBE_CACHE`, with a static fallback if the probe times out.
- **Key fact for prompt authoring:** when `TERMINAL_ENV != "local"`, *every* file tool (`read_file`, `write_file`, `patch`, `search_files`) runs inside the backend container, not on the host. The system prompt must never describe the host in that case — the agent can't touch it.
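
The convention above can be sketched as a small decision function. This is an illustration only, not the body of `build_environment_hints()`: the backend names are copied from the list above, while `environment_hint_lines` and its parameters are hypothetical, and the live probe is reduced to an optional `backend_facts` argument.

```python
# Backend names copied from the list above; everything else is illustrative.
REMOTE_TERMINAL_BACKENDS = frozenset({
    "docker", "singularity", "modal", "daytona",
    "ssh", "vercel_sandbox", "managed_modal",
})


def environment_hint_lines(terminal_env, host_facts, backend_facts=None):
    """Choose which facts the system prompt is allowed to mention.

    Remote backend: host facts are suppressed entirely. If no probe
    result is available, fall back to a static description of the backend.
    Local backend: host facts are emitted as-is.
    """
    if terminal_env in REMOTE_TERMINAL_BACKENDS:
        facts = backend_facts or {"backend": terminal_env}
        return [f"{key}: {value}" for key, value in facts.items()]
    return [f"{key}: {value}" for key, value in host_facts.items()]
```

The point of the shape is that the remote branch never reads `host_facts`, so nothing about the host can leak into the prompt when the tools run elsewhere.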

Full design notes, the exact emitted strings, and testing pitfalls:
`references/prompt-builder-environment-hints.md`.

**Refactor-safety pattern (POSIX-equivalence guard):** when you extract inline logic into a helper that adds Windows/platform-specific behavior, keep a `_legacy_<name>` oracle function in the test file that's a verbatim copy of the old code, then parametrize-diff against it. Example: `tests/tools/test_code_execution_windows_env.py::TestPosixEquivalence`. This locks in the invariant that POSIX behavior is bit-for-bit identical and makes any future drift fail loudly with a clear diff.
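
A rough sketch of the oracle pattern, written without pytest so it runs on its own. The functions and cases here are hypothetical and much simpler than the referenced test; in the real suite the loop would more likely be a `pytest.mark.parametrize` over the cases.

```python
def _legacy_child_env(parent):
    """Verbatim copy of the old inline logic (hypothetical): keep PATH and HOME."""
    return {k: v for k, v in parent.items() if k in ("PATH", "HOME")}


def build_child_env(parent, is_windows):
    """Refactored helper: same allowlist, plus SYSTEMROOT on Windows only."""
    allowed = {"PATH", "HOME"}
    if is_windows:
        allowed.add("SYSTEMROOT")
    return {k: v for k, v in parent.items() if k in allowed}


CASES = [
    {},
    {"PATH": "/usr/bin"},
    {"PATH": "/usr/bin", "HOME": "/home/me", "SECRET": "x"},
    {"SYSTEMROOT": "C:\\Windows", "PATH": "/usr/bin"},
]


def check_posix_equivalence():
    """Fail loudly if the POSIX path of the new helper drifts from the oracle."""
    for parent in CASES:
        new = build_child_env(parent, is_windows=False)
        old = _legacy_child_env(parent)
        assert new == old, f"drift for {parent!r}: {new!r} != {old!r}"
```

Including a case that carries a Windows-only variable (the last entry) matters: it checks that the new platform branch stays inactive on POSIX.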

### Commit Conventions

```