Two isolated reliability fixes:
- chat_completion_helpers: raise on a zero-chunk stream (no finish_reason,
no content/reasoning/tool_calls) so retry handles it instead of
fabricating a successful empty turn.
- model_metadata: parse the OpenRouter/Nous output-cap error phrasing
("maximum context length is N ... (A of text input, B of tool input,
C in the output)") so parse_available_output_tokens_from_error returns
a real cap and the caller stops looping on it.
Salvaged from #40405 (@ashishpatel26) — took the two stream/error-parsing
fixes. The PR also bundled compression-state changes (on_session_start
clearing _previous_summary; cron session-id prefix preservation, #38788);
those touch the compression hot path and are split out for separate review.
Co-authored-by: ashishpatel26 <ashishpatel26@users.noreply.github.com>
Packaged Desktop first-launch bootstrap no longer dies with a fatal HTTP
404 when install-stamp.json pins a commit that isn't fetchable from GitHub.
This only happens for locally-built desktop apps: write-build-stamp.cjs's
fromLocalGit() pins `git rev-parse HEAD`, which can be an unpushed commit
or dirty tree. CI builds stamp $GITHUB_SHA and are unaffected. The fix
unblocks the dev / self-builder workflow.
resolveInstallScript() now wraps the GitHub download in try/catch; on
failure it resolves ~/.hermes/hermes-agent/scripts/install.sh (the
already-installed agent checkout), copies it into bootstrap-cache, and
returns it as source 'installed-agent'. If the cache copy fails (read-only
FS), it uses the source path directly. With no installed checkout to fall
back to, the original error rethrows unchanged.
Download is now injectable via an optional _download param so the fallback
path is tested hermetically (no network).
Reported with a precise repro and suggested fix by @Tamaz-sujashvili (#40815).
Co-authored-by: Tamaz-sujashvili <56168197+Tamaz-sujashvili@users.noreply.github.com>
The dashboard font is now selectable from the UI, not just YAML. A new Font
section in the header theme picker overrides the UI font of whatever theme is
active; the choice is orthogonal to the theme and survives theme switches.
Each theme keeps its own font as the default — picking "Theme default" clears
the override.
- web/src/themes/fonts.ts: curated font catalog (system + Google Fonts across
sans/serif/mono), each with a family stack and optional webfont URL. The
catalog is the only injected-font surface — no free-text URL box, so the
injected <link> origins stay fixed.
- web/src/themes/context.tsx: font-override state (localStorage + server),
applied after theme typography so it wins; theme apply re-asserts it, and
clearing re-runs theme apply to restore the theme's own font. Mono is left
to the theme so code/terminal are untouched.
- web/src/components/ThemeSwitcher.tsx: Font section with grouped, self-
previewing font rows and a "Theme default" clear option.
- hermes_cli/web_server.py: GET/PUT /api/dashboard/font persisting to
config.yaml dashboard.font, with a server-side id allow-list (unknown ids
coerce to the theme sentinel).
- i18n + types, api client methods, tests, and docs.
Validation: 6 new backend endpoint tests pass; tsc + vite build clean; live
browser test confirmed pick/persist/survive-theme-switch/clear all work.
process_command() is typed -> bool, but the /clear, /new, and /undo
cancel paths did a bare `return` (None) when _confirm_destructive_slash
was declined, leaking None through the bool contract. Return True
(command handled, keep the REPL alive) on cancel.
Co-authored-by: yubingz <yubingz@users.noreply.github.com>
The desktop model picker calls POST /api/model/set with provider+model only
(no base_url). _apply_main_model_assignment cleared model.base_url for every
non-custom provider, so re-picking a Xiaomi MiMo model wiped a Token Plan
endpoint (https://token-plan-*.xiaomimimo.com/v1) back to the registry default
api.xiaomimimo.com — breaking valid tp- keys with 401s.
Now base_url is cleared only when switching to a different provider (the stale
URL belonged to the old one); same-provider re-assignment preserves it, and an
explicitly supplied base_url is honored for any provider.
The desktop markdown preprocessor autolinks bare URLs by wrapping them in
<...>. RAW_URL_RE allowed '*' in its character classes, so a bold line with
a URL and no separating space — e.g. '**PR opened: https://.../pull/123**' —
greedily pulled the closing '**' into the href, producing a broken link and
an unterminated bold run. Exclude '*' from both URL character classes; '_'
and '~' (which can appear in real paths) are preserved.
The cron run-history endpoint (GET /api/cron/jobs/{id}/runs, added in
#40684) reused list_sessions_rich's order_by_last_active path with a
leading-wildcard id_query. That routes through the recursive
compression-chain CTE, which seeds from EVERY source='cron' row in the DB
and runs per-row preview/last_active subqueries before filtering to one
job and applying LIMIT. Work scaled with the total cron history, so a
large pile made the run-history load time out before eventually
populating.
Cron runs are flat, never-compressed sessions with ids of the form
cron_{job_id}_{ts}, so the chain machinery is pure overhead and the
job binding is a true prefix, not a substring.
- New SessionDB.list_cron_job_runs(): bounded [prefix, hi) id-range scan
on source='cron', ordered by started_at DESC, with the same
preview/last_active enrichment. No CTE, no leading-wildcard LIKE.
- Add idx_sessions_source(source, id) so the range is an index scan;
bump SCHEMA_VERSION 14 -> 15 (index reconciles onto existing DBs via
CREATE INDEX IF NOT EXISTS on startup).
- Point the endpoint at the new method.
Measured on a real SessionDB with 30k cron rows: 5ms vs 85ms for the old
path (16x), and the new path stays flat as the pile grows while the old
one scaled with it. Verified the query plan uses idx_sessions_source_id
(range scan, no full table scan), runs are correctly scoped (substring
collisions like cron_xalpha_ excluded), newest-first, and paged.
* fix(desktop): scope in-session /model switch per-session, stop process-env leak
The desktop/dashboard tui_gateway backend hosts every same-profile session
in ONE process. An in-session /model switch wrote process-global env vars
(HERMES_MODEL / HERMES_INFERENCE_MODEL / HERMES_TUI_PROVIDER /
HERMES_INFERENCE_PROVIDER), which _resolve_startup_runtime() reads when
building a fresh agent. So switching the model in one session leaked into
every other live session's next agent rebuild (/new, resume) — changing the
model in session B silently changed it in session A.
Fix: record the switch as a per-session model_override on the session dict
instead of mutating os.environ. _make_agent honors that override on rebuild
(carrying the concrete base_url/api_key/api_mode the switch resolved), and
falls back to global config when absent. Global persistence on the --global
flag is unchanged.
Also a cleaner fix for #16857 (/new after switching to a custom-provider
model): the override carries the resolved credentials, so the rebuild keeps
the right endpoint without relying on the leaky env vars.
Reported via Twitter (@Da7_Tech): MiniMax M3 in one session + GLM 5.1 in
another interfere when switching between them.
* test(tui_gateway): align /model switch tests with per-session override contract
The three test_config_set_model_syncs_* tests asserted the old leaky contract
(switch writes HERMES_MODEL / HERMES_TUI_PROVIDER / HERMES_INFERENCE_PROVIDER to
process env). That env-sync IS the cross-session contamination bug this PR
removes. Updated to assert the new contract: shared process env untouched, the
switch recorded as a per-session model_override carrying provider/model/base_url/
api_key/api_mode. #16857's intent (a custom-provider switch survives /new) is
still covered — now via the override _make_agent honors on rebuild.
The desktop sidebar fetched the unified cross-profile session list as
profile='all' and filtered it client-side by the active profile. On a
large multi-profile install the active profile's rows could be windowed
out of the cross-profile recency page entirely, so switching to a profile
agent showed an empty history panel (and the 'all' fetch could exceed the
15s IPC timeout on startup). Scope the fetch to the active profile so its
own page comes back on its merits, and bump the session-list IPC timeout
to 60s. profileScope is now a refreshSessions dep, so the existing
gateway-open effect re-pulls on profile switch.
Persist the inbound user turn before provider/tool execution so a crash
before run_conversation() (e.g. provider/httpx client init failure) keeps
the inbound message in the transcript. Repair stale/missing SSL_CERT_FILE
state on gateway startup, and avoid duplicate gateway fallback writes.
The salvaged main-agent fix (sanidhyasin) applies model.default_headers
to the primary OpenAI client, but the auxiliary client (title generation,
context compression, vision routing) builds its own clients and did not
read the override. For a `provider: custom` endpoint behind a gateway/WAF
that rejects the OpenAI SDK's identifying headers, the main turn would
succeed while auxiliary calls to the same endpoint still failed with the
opaque 502/4xx from #40033.
Add agent.auxiliary_client._apply_user_default_headers() (user values win
over provider/SDK defaults; no-op when unconfigured) and apply it at every
OpenAI-wire client construction site:
- _try_custom_endpoint() — config-level `model.provider: custom`
- the named custom-provider branch (custom_providers/providers entries),
including the anthropic-SDK-missing OpenAI-wire fallback
- the api-key-provider, async-conversion, and main resolve_provider_client
fallback branches
To prevent the two clients ever drifting on precedence/value handling,
AIAgent._apply_user_default_headers (run_agent.py) now delegates the config
read + merge to this shared helper (run_agent already imports from
auxiliary_client). Native Anthropic/Bedrock branches are untouched (they
don't use the OpenAI wire).
8 new tests (helper semantics + config-level custom + named custom);
full aux + attribution header suites green (295).
Custom OpenAI-compatible endpoints sitting behind a gateway/WAF can reject
the OpenAI Python SDK's default identifying headers (User-Agent: OpenAI/Python,
X-Stainless-*) and return an opaque 502/4xx even though the same request body
succeeds under curl. There was no supported way to override those headers.
Add a model.default_headers config key whose values are merged onto the
OpenAI client's default_headers, taking precedence over provider- and
SDK-supplied defaults. Applied at client construction and on every credential
swap / client rebuild so the override survives reconnects. No-op for native
Anthropic / Bedrock modes and when unconfigured.
Mirrors the EN deep-audit fixes (PR #40952) into the zh-Hans translation so the
two locales agree. zh-Hans is the only non-English locale; 26 translated pages
carried the same stale claims.
Corrections ported (code tokens identical across locales; prose re-translated
where the surrounding text was already Chinese):
- reference: /version slash command + dual-surface list; cli --provider adds
openai-api + novita aliases; tool count 70->71 (+ removed phantom "10 RL tools"
and fixed kanban 7->9); model_catalog ttl 24->1.
- user-guide: hermes -w -q -> -w -z; language list 8->16; aux slots 8->11;
docker separate-dashboard claim; gateway-streaming per-platform note;
computer-use frontmatter.
- features: curator prune_builtins truth; codex-runtime aux keys
(context_compression->compression, vision_detect->vision); voice-mode STT/TTS
enums; removed phantom rl toolset.
- integrations: StepFun step-3-mini->step-3.5-flash; web-search backends 4->8;
nous-portal status subcommand.
- messaging: WeCom typing/streaming columns; telegram transport default
edit->auto; sms host 0.0.0.0->127.0.0.1; simplex/ntfy gateway-setup + pairing
approve; line smart-chunking; matrix MATRIX_DM_AUTO_THREAD; msgraph host note.
- developer-guide: entry-point group hermes.plugins->hermes_agent.plugins;
PLUGIN.yaml->plugin.yaml.
Net-new EN sections (mcp mTLS, api-server run-approval, kanban CLI verbs) are
untranslated in zh-Hans and fall back to English source, consistent with the
mirror's existing partial-coverage state. Verified: docusaurus build --locale
zh-Hans succeeds; no new broken anchors from these edits.
compress_context() sets last_prompt_tokens=-1 right after compression to
mark "no real API usage yet". The preflight display-seed used
`_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1
(truthy), so any positive rough estimate clobbered the sentinel with a
schema-inflated count — re-triggering compression on the next turn.
Treat any negative value as "no real data yet" and skip the seed.
Salvaged from #40246 as the minimal root-cause fix. The original also
added an `_awaiting_suppression_count` bounded-window state machine to
should_compress() across 3 files; left out here to keep blast radius
small — the sentinel guard alone fixes the re-fire. The suppression
window can be added separately if the usage=None-stub edge case warrants it.
Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
Adds the AUTHOR_MAP entry for the #40403 salvage (model.default_headers
for custom OpenAI-compatible providers, fixes#40033) so contributor_audit
passes when the salvage PR lands.
* test(kimi): align stale parity/profile tests with thinking-xor-effort contract
ce4e74b3 (fix(kimi): send thinking xor reasoning_effort, never both)
changed the Kimi profile to emit at most one of extra_body.thinking or a
top-level reasoning_effort, and added tests/plugins/model_providers/test_kimi_profile.py
to pin it — but left two older test files still asserting the removed
'send both' behavior, turning main red for every PR branched after it.
Update the stale assertions to the xor contract:
- explicit recognized effort (low|medium|high) -> reasoning_effort only,
no thinking
- enabled w/o effort, or no reasoning_config -> thinking:enabled only,
no reasoning_effort
- disabled -> thinking:disabled only
No production change.
* test(kimi): cover remaining xor stale assertions (profile_wiring, run_agent)
Two more test files asserted the pre-ce4e74b3 'thinking + reasoning_effort
together' behavior — landed in a different CI shard so they surfaced only
after the first batch went green:
- tests/providers/test_profile_wiring.py::TestKimiProfileParity (2)
- tests/run_agent/test_run_agent.py::TestBuildApiKwargs (3: kimi-coding,
moonshot, moonshot-cn)
Same realignment to the xor contract: default/enabled-without-effort emits
thinking:enabled and no reasoning_effort; explicit effort emits
reasoning_effort only. Verified by running the full provider +
TestBuildApiKwargs Kimi surface (202 passed) plus a codebase-wide grep for
any remaining paired thinking+effort assertion (none).
The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window
(verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses
is rejected with context_length_exceeded while ~250K succeeds; the same slug
exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the
default 50% trigger, auto-compaction fires at ~136K — half the usable window.
Raise the trigger to 85% (~231K) on this exact route only, gated by a new
compression.codex_gpt55_autoraise config flag (default true). When it fires,
emit a one-time notice (CLI inline print + gateway status_callback replay) with
the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's
global threshold.
- _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex
- _compression_threshold_for_model() now provider-aware + opt-out param
- config key + _config_version bump (27->28) for backfill
- docs + tests (40 cases in test_arcee_trinity_overrides.py)
Follow-up to the subprocess timeout: _prepare_local_audio only caught
CalledProcessError, so a timeout would raise uncaught. Return a clean
error instead.
subprocess.run() and proc.wait() without timeout can hang indefinitely
if the child process becomes unresponsive. This blocks the calling
thread forever.
Fixed locations:
- tools/transcription_tools.py: ffmpeg conversion (timeout=300) and
user-configured STT commands with shell=True (timeout=300)
- gateway/run.py: helper script proc.wait() (timeout=3600)
Not fixed:
- agent/anthropic_adapter.py: interactive 'claude setup-token' —
user-driven, timeout would be inappropriate
The standalone Kimi/Moonshot profile (api.moonshot.ai/v1) sent both
extra_body.thinking AND a top-level reasoning_effort. With no reasoning
config it even defaulted to thinking:enabled + reasoning_effort:medium,
pairing them on every default call. Moonshot treats these as mutually
exclusive (cannot specify both 'thinking' and 'reasoning_effort').
Align with the kimi-k2 handling already shipped for the opencode-go relay:
send effort when a recognized low|medium|high is requested, otherwise fall
back to the extra_body.thinking toggle. Disabled sends thinking:disabled
only. Never both.
Reported by Cars29 (NOUS Discord). DeepSeek was deliberately left untouched:
its native endpoint accepts both (verified by the live guardrail in
test_deepseek_v4_thinking_live.py), so the report's DeepSeek claim does not
hold there.
Tests: tests/plugins/model_providers/test_kimi_profile.py pins the xor
contract across all config shapes.
#40909 added `CREATE_BREAKAWAY_FROM_JOB` to `windows_detach_flags()`,
which fixed the headline bug (gateway dies after Desktop GUI update
and never comes back). The flag's own docstring acknowledges that
restrictive parent job objects can still refuse breakaway with
`ERROR_ACCESS_DENIED`, surfacing as `OSError` on the `subprocess.Popen`
call:
"Callers in this codebase already wrap detached spawns in
try/except OSError and fall back to a cmd.exe wrapper, so the
breakaway-denied case degrades gracefully rather than crashing."
That's true for `_spawn_detached` in `gateway_windows.py` (the
`hermes gateway start` path), which has both the breakaway bit AND a
retry-without-breakaway fallback. It's NOT true for the post-update
watcher path in `launch_detached_profile_gateway_restart`
(`hermes_cli/gateway.py`), which only has `except OSError: return
False` and gives up entirely. If a user's shell/terminal/container
wraps Hermes in a breakaway-denying job, the gateway-respawn watcher
silently fails to launch instead of trying again without breakaway.
This PR closes that gap and adds the regression tests that were
missing from the original fix.
## Changes
### `hermes_cli/_subprocess_compat.py`
Adds a sibling helper `windows_detach_flags_without_breakaway()` so
callers can express the fallback symbolically (via the helper) rather
than coding the magic `& ~0x01000000` mask at every site. Documented
on `windows_detach_flags` and `windows_detach_flags_without_breakaway`
with the recommended try/except pattern.
### `hermes_cli/gateway.py::launch_detached_profile_gateway_restart`
Two changes, both aligned with the canonical pattern in
`gateway_windows._spawn_detached`:
1. The outer watcher Popen now wraps in `try/except OSError`, and on
failure retries with `windows_detach_flags_without_breakaway()`
(POSIX never reaches this branch — `start_new_session=True` can't
raise OSError).
2. The inlined respawn payload (the `python -c` watcher) also
wraps its CreateProcess in try/except OSError and retries with
`_flags & ~_CREATE_BREAKAWAY_FROM_JOB` on failure. This matters
because the watcher's job-object inheritance is independent of the
outer process's — even if the outer Popen succeeds with breakaway,
the respawned gateway might inherit a job that doesn't.
### Regression tests in `tests/tools/test_windows_native_support.py`
#40909 shipped the fix without any test that the breakaway bit is
present (the existing `test_windows_detach_flags_has_expected_win32_bits`
asserted only the three legacy bits). Four new tests close that:
- `test_windows_detach_flags_includes_breakaway_from_job` — explicit
assertion that the breakaway bit is in the default bundle, with the
rationale spelled out in the docstring so a future maintainer
staring at this test understands why removing it would resurrect
the gateway-dies-after-GUI-update bug.
- `test_windows_detach_flags_without_breakaway_drops_only_that_bit`
— fallback payload keeps the other three detach bits intact.
- `test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway`
— static-text check on the stringified watcher payload. The inlined
Python program isn't reachable via normal import-time inspection
because it lives in a `textwrap.dedent("""...""")` literal that
gets passed to a separate `python -c` interpreter. Asserting that
both `_CREATE_BREAKAWAY_FROM_JOB` (symbolic) and `0x01000000` (hex
literal) appear inside the dedent block is a sufficient regression
guard against accidental refactors.
- `test_launch_detached_profile_gateway_restart_outer_popen_has_access_denied_fallback`
— static check that this PR's fallback retry is wired up
symbolically. Without standing up a real Windows job object that
refuses breakaway, we can't trigger the OSError in a unit test;
the text guard catches the case where a future refactor removes
the helper import or the `& ~_CREATE_BREAKAWAY_FROM_JOB` retry.
Also extends `test_windows_detach_flags_has_expected_win32_bits` to
include the breakaway bit assertion and updates
`test_windows_flags_zero_on_posix` to cover the new helper.
## Tests
Locally on Windows: 8/8 in the `-k "detach or breakaway or
popen_kwargs or launch_detached or gateway_run_update or
hermes_cli_gateway"` slice pass.
Broader `tests/hermes_cli/test_gateway*.py + test_windows_native_support.py`:
172 passed, 10 failed. All 10 failures are pre-existing POSIX-only
tests running on a Windows host (os.geteuid, SIGKILL fallback,
is_linux fixture mismatches). Stashing this PR and re-running on bare
post-#40909 main reproduces all 10 identically — none are regressions.
POSIX paths unchanged: `windows_detach_flags()` and
`windows_detach_flags_without_breakaway()` both return 0 off Windows,
`windows_detach_popen_kwargs()` still yields `{"start_new_session": True}`.
## Out of scope
- The other detached-spawn site in `hermes_cli/gateway.py` (around
line 3068) also uses `windows_detach_popen_kwargs()` + `except
OSError`. It deserves the same fallback treatment but the codepath
is different enough (not the update-flow watcher) that it warrants
a separate PR with its own scrutiny.
- `gateway/run.py` has Windows branches with `windows_detach_popen_kwargs`
too — same reasoning.
## Context
Follow-up to #40909 (merged). I had a parallel PR (#40934, closed)
that duplicated the core breakaway fix; the bits unique to that PR
that #40909 didn't cover are the contents of this one. Closing #40934
and opening this slimmed-down version as the focused follow-up.
Follow-up to the salvaged perf fix. The new force_fresh_nous_tier param was
inserted into list_authenticated_providers between custom_providers and
max_models. Make it keyword-only (*) so a positional caller passing max_models
as the 5th arg can never silently mis-bind it to the tier-refresh flag, and
add a signature-contract test that fails if the keyword-only separator is
later dropped. All in-repo callers already use keyword args; verified no
caller breaks.
Mirror the bootstrap-installer (Rust) fix in the Electron first-launch
runner. spawnPowerShell launched bare 'powershell.exe', trusting PATH to
contain %SystemRoot%\System32\WindowsPowerShell\v1.0 — the same latent
weakness that stalled the native installer at "0 of 0 steps" when PATH is
trimmed/truncated or stored as a non-expanding REG_SZ. Resolve by absolute
path first (%SystemRoot%/%windir%), then PATH (powershell 5.1 -> pwsh 7),
then bare name as last resort.
Make `powershell_under_root` visible under `cfg(test)` so the
%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe layout is
asserted on any host (the rest of the resolution is gated to Windows).
The native Windows installer spawned PowerShell via the bare program name
`powershell.exe`, which trusts PATH to contain
%SystemRoot%\System32\WindowsPowerShell\v1.0. On machines whose PATH was
trimmed or truncated (Windows silently drops entries once the variable
exceeds its length limit), the lookup fails and the spawn dies with
"program not found" before install.ps1 runs at all — the installer then
stalls at "0 of 0 steps".
Resolve PowerShell by absolute path first (%SystemRoot%/%windir%), then
fall back to PATH (powershell 5.1, then pwsh 7), then a bare name as a
last resort. Also include the resolved interpreter in the spawn-failure
context; the old message printed only the script path, which misleadingly
read as if the .ps1 itself was missing.
Regression guard for the emoji-fallback fix: checks DEFAULT_TYPOGRAPHY and every
defined builtin-theme fontSans/fontMono stack contains a color-emoji font.
None of the UI sans/mono font stacks (themes/presets.ts, styles.css) carry
emoji glyphs, so on platforms whose default text font lacks them (e.g. Linux)
emoji rendered as tofu boxes in the composer and chat.
Append a color-emoji fallback — Apple Color Emoji / Segoe UI Emoji / Segoe UI
Symbol / Noto Color Emoji / the `emoji` generic — to every font stack
(SYSTEM_SANS, SYSTEM_MONO, the Courier theme, and the CSS --dt-font-* defaults).
Text still uses the primary fonts; the browser only falls back for emoji
codepoints. Custom themes build on SYSTEM_* so they inherit it automatically.
On Windows with non-UTF-8 console encodings (e.g. cp949, cp1252),
StreamHandler emits raise UnicodeEncodeError when log messages contain
characters outside the console codepage — such as the em-dash (U+2014)
in the session hygiene message.
This crashed the gateway process silently, leaving no diagnostic output.
Fix: add _safe_stderr() helper that wraps sys.stderr in a TextIOWrapper
with encoding='utf-8' and errors='replace' when the console encoding
is not UTF-8. Applied to both:
- hermes_logging.py setup_verbose_logging() stderr handler
- gateway/run.py optional stderr handler
The wrapper ensures log lines are never lost — un-encodable characters
are replaced with '?' instead of crashing the process.
Fixes#40432
* fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep
Three coordinated fixes for the Windows gateway reliability story:
1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn
The 'hermes update' triggered from the Electron Desktop GUI ran inside
Electron's job object. Without breakaway, the post-update gateway
watcher spawned by update — already DETACHED_PROCESS — was still
reaped when Electron's job tore down, so the gateway never came back
after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000)
to:
- hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by
every helper that calls windows_detach_popen_kwargs(), including
launch_detached_profile_gateway_restart()
- The watcher subprocess's own respawn snippet in
hermes_cli/gateway.py (inlined flags so the watcher's child
respawn also breaks away)
_spawn_detached() in gateway_windows.py already had the flag; this
change brings the rest of the codebase to parity.
2. Per-minute supervisor Scheduled Task — Windows equivalent of
systemd Restart=always
Introduces hermes_cli/gateway_supervisor.py and registers it as a
second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1,
LIMITED rights) alongside the existing ONLOGON task. Every minute,
the supervisor uses the same gateway.status.get_running_pid() probe
as 'hermes gateway status' and, if no gateway is alive, calls
gateway_windows._spawn_detached() (which now includes BREAKAWAY) to
bring one back.
Covers every crash mode, not just 'machine rebooted': taskkill,
OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw
startup per minute when down, one PID-existence check per minute
when up.
Wired into both the schtasks-success and Startup-folder-fallback
install paths via _install_supervisor_best_effort(), and removed in
uninstall(). Best-effort: a failing supervisor install logs a
warning but doesn't roll back the primary install.
3. 'hermes gateway status --deep' shows per-probe PASS/FAIL
Replaces the existing terse '--deep' output (which only printed
paths) with an actual diagnostic table:
[1] PID file present
[2] Lock file held by a live process
[3] get_running_pid() result
[4] _pid_exists(pid) — OS-level liveness
[5] gateway_state.json (state + age)
[6] Last lifecycle event from gateway-exit-diag.log
When the high-level summary disagrees with reality, the user can
see exactly which signal is lying.
Test-leak fix
-------------
tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages
monkey-patched is_linux/is_wsl/supports_systemd_services to simulate
WSL but did NOT stub is_windows(). On a Windows host, the dispatcher
in _gateway_command_inner takes the is_windows() branch BEFORE the
WSL guidance branch, so the test invoked gateway_windows.install()
for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd
— the REAL user Startup folder, never sandboxed by tmp_path — pointing
at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper.
When pytest tore down the tmp_path, every subsequent Windows login
flashed a cmd.exe window that failed to find the missing target.
Stubs is_windows=False on all four affected tests:
test_install_wsl_no_systemd
test_start_wsl_no_systemd
test_status_wsl_running_manual
test_status_wsl_not_running
Defense-in-depth: _build_startup_launcher() now prefixes the launcher
with 'if not exist <target> exit /b 0', so any future stale Startup
entry silently no-ops instead of flashing a console window.
Status enhancements
-------------------
- status() now reports supervisor task presence alongside the existing
schtasks/Startup info, and nudges the user to reinstall if the
supervisor isn't registered.
- Deep mode dumps both the supervisor task name + script path.
* fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes
Earlier in this branch we added a per-minute schtasks-based supervisor to
respawn the gateway after crashes / GUI-update SIGTERMs. The implementation
flashed a brief console window on every firing, which stole window focus.
We tried several variants:
- cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem)
- schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is
actually subsystem=Console, not GUI; it respawns the real pythonw)
- schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side
conhost preallocation; documented Windows quirk)
- XML registration with <Hidden>true> -> still flashes (<Hidden> only hides
the task in the Task Scheduler UI, not the spawned window)
Researched what leading projects do:
- Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor.
- Tailscale: real Windows Service via SCM. Session 0, no console possible.
- Syncthing: --no-console flag inside the binary + Startup folder.
- openclaw: VBS Run(..., 0, False) wrapper. Suppresses the *window* but
Super User Q971162 confirms focus-steal still occurs in some cases.
None of these use a per-minute polling scheduled task. The 'auto-restart on
crash' responsibility belongs INSIDE the daemon (Tailscale's in-process
recovery / Ollama's monitor+worker pair) OR is delegated to the Windows
Service Control Manager — not Task Scheduler.
So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB
fix in _subprocess_compat.py (from commit c1e5fa433) survives — that is the
*real* fix for problem #2 (GUI-update kills gateway): the post-update
watcher in launch_detached_profile_gateway_restart() now breaks out of
Electron's job object, so the gateway respawn watcher survives the GUI
quit and successfully respawns the gateway.
Surviving from c1e5fa433:
* CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes#2)
* Inlined breakaway flag in the watcher respawn snippet in gateway.py
* hermes gateway status --deep PASS/FAIL probes (fixes#1 — visibility)
* 'if not exist <target> exit /b 0' guard in _build_startup_launcher
(fixes#3 — silent no-op for stale Startup entries)
* tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause
of #3 — pytest WSL tests no longer leak Startup entries on Win hosts)
Removed in this commit:
* hermes_cli/gateway_supervisor.py (entire file)
* Supervisor section in hermes_cli/gateway_windows.py (~180 lines):
get_supervisor_task_name, get_supervisor_script_path,
_build_supervisor_cmd_script, _write_supervisor_script,
_install_supervisor_task, is_supervisor_task_registered,
_install_supervisor_best_effort
* _install_supervisor_best_effort() calls in install() (3 spots)
* supervisor cleanup block in uninstall()
* supervisor display lines in status() / status(deep=True)
Future direction (out of scope for this PR): the right place for Windows
'Restart=always' semantics is a real Windows Service installed via
pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM
auto-restart, no console window possible. That's a meaningful next-PR
project, not a band-aid.
Tests: 51 pass / 2 pre-existing failures in
tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are
TestSupportsSystemdServicesWSL cases that fail on origin/main too —
unrelated to this PR).
_hook_jsonable() referenced SimpleNamespace without importing it, so
sanitizing any hook payload that contained one raised
NameError: name 'SimpleNamespace' is not defined.
Bedrock, Codex-responses, and the auxiliary client build their
response / message / tool_call objects as SimpleNamespace and hand the
raw objects to the post_api_request hook. The hook call sites swallow
exceptions (except Exception: pass), so the crash silently dropped the
observability hook for those providers.
Add the missing `from types import SimpleNamespace` and a regression
test covering the SimpleNamespace sanitization path.
Add a help line under the Orchestrator profile selector explaining it
owns the root task after fan-out and does not drive how tasks split;
point at auxiliary.kanban_decomposer for the decomposer model. Also fix
the Profile descriptions hint to credit the decomposer (not the
orchestrator) for routing. This is the dashboard surface that prompted
the original support confusion.
_read_events() returned normally when self._ws was closed-but-non-None
(the while-condition is false on entry). _listen_loop treats a normal
return as a clean read, resets backoff to 0, and immediately retries —
a tight busy-loop pinning CPU. Raising on entry routes it through the
reconnect/backoff path instead.
Co-authored-by: xushibo <xushibo@users.noreply.github.com>
Co-authored-by: cnfi <cnfi@users.noreply.github.com>
Pillow drives the byte/pixel image-shrink path that runs at vision-embed
time. Without it, an oversized image (>5 MB or >8000px) bakes into
immutable history and bricks the session on Anthropic's non-retryable
400. It's a pure-wheel dep with no system-lib requirement for the codecs
we use, so there's no reason to gate it behind an extra + a mid-session
lazy install (the install that deadlocked the CLI under prompt_toolkit,
#40490). Every install — base, [all], packagers — now ships it.
The [vision] extra becomes a no-op back-compat alias so existing
'pip install hermes-agent[vision]' invocations still resolve. The
tool.vision lazy-deps entry is kept as a belt-and-suspenders fallback for
stripped/source-build installs.
The vision (Pillow) and faster-whisper STT tool paths were the only
ensure() call sites that defaulted to prompt=True, so they could fire a
blocking input() confirmation mid-session. Every other call site already
passes prompt=False. Under the interactive CLI prompt_toolkit owns stdin,
so that input() deadlocks the terminal (#40490). The install is already
gated by security.allow_lazy_installs, so the prompt was redundant
consent anyway. This makes the deadlock-capable input() branch
unreachable from any tool-call path.
A memory provider tool whose name collides with a built-in core tool
(e.g. clarify, delegate_task) was skipped from agent.tools at init but
lingered in MemoryManager._tool_to_provider, where the has_tool dispatch
branch could route a call to a tool that was never registered (#40466).
Block the collision at registration instead of patching dispatch:
- MemoryManager.add_provider rejects any tool whose name is in
_HERMES_CORE_TOOLS (warn + skip), so it never enters the routing table.
- get_all_tool_schemas applies the same filter, so the manager never
advertises a schema it would refuse to route.
Built-ins always win, matching the invariant used by the TTS/browser/
search provider registries. Makes the dispatch-hijack structurally
impossible regardless of branch ordering.
Closes#40466.
Adds regression tests for the SSH cwd fix: local backend keeps
host-validated session cwd; non-local backend uses TERMINAL_CWD (or
terminal.cwd config) verbatim without host isdir() validation; sentinel
values fall back to session cwd.