* fix(cli): persist custom --portal-url to .env on dashboard register
`hermes dashboard register --portal-url <url>` resolved the custom portal
for the registration request but only persisted it to .env when the var was
absent AND non-default. So a user who re-registered against a different
portal (e.g. switching preview deploys) silently kept the stale
HERMES_DASHBOARD_PORTAL_URL, and an explicit request for the production
portal was never written at all.
Track whether a custom portal was *explicitly supplied* (--portal-url flag
or HERMES_DASHBOARD_PORTAL_URL env), separately from the resolved value:
- explicit custom URL -> always persist (update in place via
save_env_value, which overwrites the matching key rather than appending
a duplicate), even when it equals the production default; no-op when it
already matches.
- no custom URL supplied -> unchanged conservative behaviour: only write an
inferred portal when absent and non-default; never alter an existing
entry unexpectedly.
save_env_value already preserves other lines/comments and dedups in place;
this only changes the decision of *when* to call it.
Adds TestCustomPortalPersistence covering all four cases.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
* feat(cli): persist dashboard public URL from --redirect-uri on register
When the user registers a publicly-exposed dashboard with --redirect-uri
(the full OAuth callback, e.g. https://hermes.example.com/auth/callback),
derive its origin and persist it as HERMES_DASHBOARD_PUBLIC_URL — the env var
the dashboard auth layer actually consumes at serve time.
dashboard_auth/routes._redirect_uri reconstructs the callback as
HERMES_DASHBOARD_PUBLIC_URL + "/auth/callback" (verbatim), and
dashboard_auth/prefix.resolve_public_url reads that var (then config.yaml
dashboard.public_url) to decide the public origin. Previously --redirect-uri
was sent to the portal at registration but never persisted, so the operator
had to set HERMES_DASHBOARD_PUBLIC_URL by hand for the login gate to engage
and the callback to round-trip. We now wire it automatically.
Persist the ORIGIN (scheme://host[:port]), not the full callback path —
persisting the raw redirect would double the path when the runtime appends
/auth/callback. Mirrors the portal-url persistence semantics already in this
PR: always write an explicitly-derived value (updating in place, no
duplicate), no-op when it already matches, never written on a localhost-only
install (no --redirect-uri), and skipped for a non-http(s)/malformed redirect.
Verified end-to-end: cmd_dashboard_register writes the origin to .env, then
resolve_public_url() reads it back and public_url + /auth/callback
reconstructs exactly the originally-supplied --redirect-uri.
Adds TestPublicUrlPersistence (8 cases) incl. origin-derivation, port
preservation, update-in-place, no-op, no-flag, non-http skip, and
both-portal-and-public-url-persisted.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
---------
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Re-running `hermes dashboard register` now updates the existing dashboard
record in nous-account-service instead of creating a duplicate.
The stable key is the client_id this install already persisted in
HERMES_DASHBOARD_OAUTH_CLIENT_ID on a prior run:
- No stored client_id -> first registration -> create a fresh client with an
auto-generated name (unchanged behavior).
- Stored client_id present -> re-send it as `client_id` so the portal updates
that row in place. Without an explicit --name, the name is omitted so the
portal-stored name isn't churned to a new random value on every re-run.
- Prints "Updated dashboard" vs "Registered dashboard" based on whether the
portal echoed back the same client_id. A stale/deleted id safely falls
through to a fresh create server-side.
Requires the matching nous-account-service change (POST
/api/oauth/self-hosted-client accepting an optional client_id + optional name).
Tests: 7 new TestIdempotentRerun cases (key sent, name preserved/overridden,
Updated message, persisted id, stale-id fall-through, blank-id first-run);
existing create-path tests unchanged (23 pass).
A bare `git fetch origin` (and `git fetch upstream`) pulls every ref. The
repo carries thousands of auto-generated branches, so on any
non-single-branch checkout the installer's update path and `hermes update`
spend minutes downloading the full branch list — long enough to stall the
desktop installer or trip the follow-up `git pull --ff-only`.
Scope every update-path fetch to the branch we actually compare/merge
against:
- scripts/install.sh: collapse the remote to single-branch and fetch only
$BRANCH on the "existing install, updating" path.
- hermes_cli/main.py: fetch the resolved branch in the apply path, the
--check path (upstream + origin), and the fork upstream-sync.
Tracking-ref updates still happen via git's opportunistic refspec, so the
later origin/<branch> rev-parse/rev-list checks are unaffected.
Tests assert the apply-path fetch is branch-scoped and never bare.
hermes auth add openai-codex now creates an independent
manual:device_code pool entry per account instead of routing through
the singleton _save_codex_tokens save path, which collapsed every
added account into the latest login (the second add overwrote the
first account's singleton-mirrored device_code entry). This is the
add-path half of #39236; PR #39243 (already on this branch) fixes the
re-auth half.
manual:device_code entries refresh from their own token pair
(_sync_codex_entry_from_auth_store only adopts the singleton for
source=="device_code"), so they need no providers.openai-codex
shadow. Adding the first credential marks openai-codex active (the
singleton path did this implicitly) so the setup wizard's
get_active_provider() check still passes; subsequent adds leave the
active provider untouched.
Adds SOURCE_MANUAL_DEVICE_CODE constant and a regression test that two
distinct accounts keep distinct token pairs. Updates two existing add
tests to the pool-only behavior.
Co-authored-by: glesperance <info@glesperance.com>
The #33538 fix refreshed every credential_pool entry with source
"manual:device_code" on every Codex OAuth re-auth, on the assumption that
such entries were always legacy aliases of the singleton from the #33000
workaround era. That assumption is no longer true: `hermes auth add
openai-codex` also produces "manual:device_code" entries for independent
ChatGPT accounts, and the broad sync silently clobbered them with the
latest-authenticated token pair (labels preserved, token material
overwritten, status / quota readings then lie).
Narrow the sync: refresh a "manual:device_code" entry only when its
existing access_token matches the previous singleton access_token (true
legacy alias). Entries with distinct token material represent independent
accounts and are now left alone. Error markers are cleared only on
entries actually rewritten, so an independent account's own 429 / 401
state survives a re-auth that targeted a different account.
Tests:
* New: independent acctB/acctC are not overwritten when acctA re-auths.
* New: legacy singleton-alias still refreshed (preserves #33538).
* New: missing previous singleton state handled (no crash, no false
alias match).
* New: access_token-only alias match (legacy schema without
refresh_token still recognized).
* New: error markers cleared only on entries actually refreshed.
* Updated: existing manual-device-code sync test now covers both the
legacy-alias path AND the independent-account path in one fixture.
Behaviour change is zero for users with a single Codex account and zero
for users whose only "manual:device_code" entry is the legacy alias of
the singleton. Users with multiple independent Codex accounts added via
`hermes auth add` now keep their distinct token material across
re-auths.
Local: 29 passed in tests/hermes_cli/test_auth_codex_provider.py, no
new failures in tests/hermes_cli/ vs upstream/main baseline.
Fixes#39236.
* fix(desktop): stop running app locking win-unpacked before pack
On Windows a running Hermes.exe keeps an exclusive lock on
release/win-unpacked/Hermes.exe, so electron-builder's pack cannot
replace it and dies with "remove ...\Hermes.exe: Access is denied" /
ERR_ELECTRON_BUILDER_CANNOT_EXECUTE (before-pack hits the same EPERM
cleaning the dir, and the cache-purge retry repeats the failure since
the lock is still held).
Before building the packaged app, terminate any process whose
executable lives inside this build's release/ tree so the rebuild --
including the installer's headless --update rebuild -- can replace the
binary. Scope is narrow (only exes under release/), POSIX is a no-op
(it can unlink a running binary), and the final error now points
Windows users at the running-app cause.
* test(desktop): cover the win-unpacked lock-breaker helper
Verify _stop_desktop_processes_locking_build is a no-op off-Windows,
terminates only processes whose exe lives under release/ (sparing our
own PID and unrelated installs), and short-circuits when no release dir
exists.
* feat(windows): enable dashboard chat tab via ConPTY (win_pty_bridge)
Add hermes_cli/win_pty_bridge.py — a pywinpty-backed drop-in for
PtyBridge with the same spawn/read/write/resize/close surface — and
wire it into the web_server PTY import block so Windows picks it up
instead of falling back to None.
pywinpty is already a declared win32 dependency (pyproject.toml).
The ConPTY read path runs inside run_in_executor so the event loop
is never blocked. Spawn/read/write/terminate call shapes are taken
directly from tools/process_registry.py which already exercises the
same pywinpty version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: remove WSL2-only caveat for dashboard chat tab
The chat pane now works on native Windows via the ConPTY bridge added
in the previous commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(windows): cover ConPTY bridge + web_server platform-branched import
Companion to the bridge added in the previous commits. Verified live on
native Windows 11 (pywinpty 2.0.15) against `hermes dashboard`'s
`/api/pty` WebSocket: the spawned `hermes --tui` (node entry.js) renders
through ConPTY, resize escapes reach `setwinsize`, and closing the WS
reaps both the node child and the pywinpty agent with zero orphans.
tests/hermes_cli/test_win_pty_bridge.py
Mirrors the layout of the existing POSIX test_pty_bridge.py:
spawn/io/resize/close/env coverage against cmd.exe and python -c,
plus the cross-platform fallback surface (PtyUnavailableError, the
off-Windows `spawn -> raises PtyUnavailableError` guard, and the
load-bearing _clamp() helper that protects setwinsize from garbage
winsize values out of xterm.js).
tests/hermes_cli/test_web_server_pty_import.py
Asserts that web_server.PtyBridge resolves to WinPtyBridge on win32
and to the POSIX PtyBridge on POSIX, that PtyUnavailableError is the
matching class on each side (so isinstance checks in /api/pty's
spawn fallback path work), and a source-text check that pins the
platform-branched import shape so a future refactor can't quietly
collapse it back to a POSIX-only import.
scripts/release.py
AUTHOR_MAP entries so CI release-note generation can resolve both
authors' plain (non-noreply) emails to their GitHub logins.
Co-Authored-By: JoelJJohnson <josephjohnson.joel@gmail.com>
Co-Authored-By: Nea74 <andreas@schwarz-ketsch.de>
---------
Co-authored-by: JoelJJohnson <josephjohnson.joel@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Nea74 <andreas@schwarz-ketsch.de>
resolve_provider() auto-detection only checked OPENROUTER_API_KEY/
OPENAI_API_KEY env vars, never the credential pool. A key added via
`hermes auth add openrouter` (manual pool entry, no env var) was invisible:
the provider failed to resolve or resolved with an empty api_key, so
requests went out with no Authorization header and OpenRouter returned
"HTTP 401: Missing Authentication header" while `hermes auth list` showed
the credential. Closes#42130.
- auth.py: check load_pool("openrouter").has_credentials() after the env check
- dump.py: `debug share` shows 'openrouter set (auth pool)' instead of the
misleading 'not set' when the key lives in the pool
- add regression tests (pool credential auto-detects; empty pool still raises)
Lift the 18 _model_flow_* provider-setup wizard functions out of hermes_cli/main.py
into hermes_cli/model_setup_flows.py. Behavior-neutral; main.py 14050 -> 11479 LOC.
select_provider_and_model (the dispatcher) STAYS in main.py and re-imports the
flows via an explicit 'from hermes_cli.model_setup_flows import (...)' block, so
both its bare-name calls and existing test monkeypatches targeting
hermes_cli.main._model_flow_* keep resolving against main's namespace unchanged.
Imports: 3 neutral deps (argparse, os, subprocess) at the module top; the 14
main.py-internal helpers the flows call (_prompt_api_key, _save_custom_provider,
the reasoning-effort/stepfun/qwen helpers, _run_anthropic_oauth_flow, ...) are
lazy-imported per-flow (from hermes_cli.main import ...) so the new module never
imports main at module scope -> no import cycle.
Repointed one source-inspection change-detector (test_setup_ollama_cloud_force_refresh)
to read the module the ollama-cloud branch moved to.
Validation: 6563/6563 hermes_cli tests pass; live flow-dispatch probe confirms the
lazy main-internal imports resolve at runtime.
Add a best-effort `commits` list (sha/summary/author/at) to the update-check
response for git/pip installs that are behind upstream, so the desktop's
remote update overlay can show what's changed before applying.
Additive and non-breaking: existing consumers (legacy dashboard, tests using
subset assertions) ignore the new field. Leaves the shared check_for_updates()
int contract untouched — commits come from a separate best-effort git call.
#41076 makes `hermes plugins list` discover nested category plugins (e.g.
observability/nemo_relay). This adds the missing enable/disable mutation path
so those plugins can actually be toggled, and fixes two incomplete-update
breakages on the #41076 base.
Before: `hermes plugins enable nemo_relay` -> "Plugin 'nemo_relay' is not
installed or bundled." (exit 1), because cmd_enable/cmd_disable went through
_plugin_exists(), which only checked top-level plugins/<name>/.
Changes:
- Add _resolve_plugin_key(): resolve a bare manifest/leaf name OR a full
path-derived key (observability/nemo_relay) to the canonical key the runtime
loader gates on, reusing #41076's _discover_all_plugins(). A bare leaf name
ambiguous across two categories resolves to None rather than silently picking
one.
- cmd_enable/cmd_disable resolve first, persist the canonical key, and drop any
stale legacy bare-name alias so the enabled/disabled lists can't drift into a
contradictory state. _plugin_exists delegates to the same resolver.
- Fix#41076 base breakages: _discover_all_plugins now returns 6-tuples, but
web_server._merged_plugins_hub() still unpacked 5 (ValueError on the
dashboard plugins-hub endpoint) and several test_plugins_cmd_list.py fixtures
were still 5-tuples. Both updated; the hub status check is now key-aware.
Verified e2e on the real CLI + runtime loader (isolated HERMES_HOME):
`hermes plugins enable nemo_relay` writes observability/nemo_relay to
config.yaml and the loader then loads it (enabled=True, error=None); a stale
bare-name alias is cleared on disable; the dashboard _merged_plugins_hub() runs
without crashing. Adds resolution + enable/disable tests; full
tests/hermes_cli/test_plugins_cmd* + web_server plugin tests green.
Follow-up to #41076 (#41066). Branched from that PR's head.
* fix(cli): set PYTHON env for node-gyp native builds on NixOS
node-gyp (triggered by node-pty during npm ci) looks for python3 on
PATH, which fails on NixOS because python3 lives in the nix store and
is not on the system PATH.
Add _nixos_build_env() — a two-tier helper that detects NixOS and:
1. Fast path: hermes venv python3 (~0s)
2. Fallback: nix-shell which python3 (~2-5s)
Wire it into _run_npm_install_deterministic() via a new env= parameter,
then pass it through cmd_gui() and _update_node_dependencies().
Non-NixOS systems: _nixos_build_env() returns None, behavior unchanged.
* fix(cli): merge _nixos_build_env() with os.environ, fix NixOS detection, add explicit return None
- Critical fix: both Tier 1 (venv) and Tier 2 (nix-shell) now return
{**os.environ, "PYTHON": ...} instead of {"PYTHON": ...} — subprocess.run
with env= replaces the entire environment, so the old code wiped PATH
and broke npm/node on NixOS entirely.
- Uses re.search(r"^ID=nixos$", ...) for anchored NixOS detection instead
of unanchored substring match (could match ID_LIKE=...nixos).
- Removes redundant Path.exists() guard before read_text(); just catches
OSError (one filesystem read instead of two).
- Adds explicit return None at end of function for type-hint consistency.
Under systemd's Restart=always, --replace turns every restart into a
self-kill loop: the new instance reads gateway.pid, kills the previous
process, writes its own PID, and on the next restart the cycle repeats.
A process supervisor owns the lifecycle — --replace is for manual
one-shot takeovers and fights the supervisor.
Remove --replace from both the system-level and user-level systemd
ExecStart lines. The --replace flag stays available for manual
'hermes gateway run --replace' and on the macOS launchd fallback path
(#23387), which is a deliberate manual takeover, not a supervised unit.
Also drop RestartMaxDelaySec / RestartSteps from the templates — they
require systemd v255+ and are silently ignored on older versions. The
_strip_optional_systemd_directives normalizer stays so existing installs
whose on-disk unit still carries those directives aren't flagged as
outdated.
Credit: reported and diagnosed by @Skippy-the-Magnificent-one (PR #37145);
reimplemented here under project authorship because the original commit
was authored under a non-existent email.
gateway/run.py is the largest god file (20k LOC, GatewayRunner with 220
methods). This lifts the cohesive kanban-watcher cluster — _kanban_notifier_watcher,
_kanban_dispatcher_watcher, _kanban_advance/unsub/rewind, _deliver_kanban_artifacts
(~1,035 LOC, 6 methods) — into gateway/kanban_watchers.py as a mixin that
GatewayRunner inherits.
Mixin (not free functions) because the methods use only self state: inheriting
keeps every self._kanban_* call site working unchanged via the MRO, making this
a behavior-neutral move. The methods' lazy imports (_kb, _decomp, _load_config,
Platform) travel with them; the mixin needs only stdlib + a matching
logging.getLogger('gateway.run').
run.py 20187 -> 19157 LOC; GatewayRunner direct methods 220 -> 214.
Behavior-neutral: gateway test suite 6582 passed / 0 failed; start() still wires
both watchers via self._kanban_*; MRO resolves all 6 to the mixin. One test
(corrupt-board quarantine retry) keyed its time-travel mock on the caller's
filename being gateway/run.py — updated to also accept gateway/kanban_watchers.py.
Establishes the mixin-extraction pattern for further GatewayRunner decomposition
(the 2406-LOC _run_agent and 1164-LOC _handle_message remain, but their callback
closures need a context-object redesign — deferred).
Subcommands whose handler was a closure defined inside main() — memory, acp,
tools, insights, skills, pairing, plugins, mcp, claw — have their handler
promoted to a top-level function and their parser block extracted into
hermes_cli/subcommands/<name>.py (build_<name>_parser, injected handler).
These 9 had zero closure-over-main-locals, so promotion is a pure relocation.
acp/mcp parser blocks use the shared add_accept_hooks_flag helper.
main() 1798 -> 954 LOC (71% below the 3297 Phase-2 starting point);
add_parser calls in main.py 89 -> 28.
Deferred: sessions, computer-use, secrets handlers reference <name>_parser
(for a no-subcommand print_help fallback) — left in place to avoid the
_self_parser indirection; minority, low value.
Behavior-neutral: all 9 subcommands' --help (incl nested subactions) byte-
identical to pre-extraction (diff-verified). tests/hermes_cli/ 6519 passed /
0 failed; new test_subcommands_followup.py covers the 9 builders.
Batch extraction of every remaining subcommand whose handler is top-level and
whose parser block is pure argparse: model, setup, postinstall, whatsapp, slack,
login, logout, auth, status, webhook, hooks, doctor, security, dump, debug,
backup, import, config, version, update, uninstall, dashboard, gui, logs,
prompt-size.
Each becomes hermes_cli/subcommands/<name>.py with build_<name>_parser() and an
injected handler (no main import). dashboard also injects cmd_dashboard_register
for its nested 'register' action.
Behavior-neutral: all 25 subcommands' --help output (and nested subaction help)
diff-verified byte-identical to pre-extraction. Two RawDescriptionHelpFormatter
epilogs (debug, logs) needed their multi-line string interiors preserved at
column 0 — caught by the --help diff, not compile.
main() 3297 -> 1798 LOC across this PR; add_parser calls in main.py 179 -> 89.
Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process
isolation; new test_subcommands_batch.py smoke-tests all 25 builders + the
dashboard two-handler case.
Follow-on to the cron extraction in the same Phase 2 PR. Same pattern:
per-group build_<name>_parser() functions with injected handlers, no main
import.
- subcommands/profile.py: build_profile_parser (190-line block out of main()).
- subcommands/gateway.py: build_gateway_parser (gateway + proxy, 238-line block;
they shared one inline section). Imports argparse for SUPPRESS defaults.
- main(): two more inline blocks become single builder calls.
Behavior-neutral: 'profile [sub] --help' and 'gateway/proxy [sub] --help'
byte-identical to pre-extraction (diff-verified).
main() now 2723 LOC (was 3297 at Phase 2 start); add_parser calls in main.py
179 -> 141.
Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process
isolation; new builder unit tests cover subactions, aliases, dispatch, flags.
Phase 2 of the god-file decomposition plan. main()'s argparse tree is 179
inline add_parser calls in one 3,297-line function. This establishes the
hermes_cli/subcommands/ package and extracts the first group (cron) as the
proof-of-pattern:
- hermes_cli/subcommands/_shared.py: shared parser helpers (add_accept_hooks_flag),
re-exported from main.py for backwards compat.
- hermes_cli/subcommands/cron.py: build_cron_parser(subparsers, cmd_cron=...).
Handler injected so the module never imports main (cycle avoidance).
- main()'s ~155-line inline cron block becomes one build_cron_parser() call.
Behavior-neutral: 'hermes cron create --help' output is byte-identical to
origin/main. main() 3297 -> 3143 LOC.
Validation: tests/hermes_cli/ 6466 passed / 0 failed under per-file process
isolation; new test_subcommands_cron.py covers subactions, aliases, options,
no-agent tristate, injected dispatch, and --accept-hooks.
- web_server.py: after proc.poll() returns a non-None exit code, call
proc.wait() to reap the child and move the entry from _ACTION_PROCS
to _ACTION_RESULTS. Previously .poll() alone left <defunct> zombies.
- meet_bot.py: terminate and wait on the pcm_pump subprocess (paplay/
ffmpeg) during the finally-block teardown. Previously leaked on every
normal bot exit.
- tests: add test_action_status_reaps_completed_process and
test_action_status_ignores_wait_failure covering both the happy path
and the wait()-raises-OSError edge case.
Closes#38032
The copytree ignore lambda in _copy_dist_payload applied USER_OWNED_EXCLUDE
recursively at every directory depth. This caused nested directories whose
names matched exclude entries (bin, logs, cache, etc.) to be silently dropped
during distribution install/update.
Fix: only apply USER_OWNED_EXCLUDE filtering at the root of the staged tree,
matching the two-tier pattern used by _clone_all_copytree_ignore and
_default_export_ignore in profiles.py.
Add 5 tests covering nested bin/logs/cache preservation and top-level
filtering still working.
Fixes#37954
On older systemd versions that don't support RestartMaxDelaySec /
RestartSteps, the installed unit file has those directives silently
dropped. systemd_unit_is_current() did a strict text comparison, so
the unit was perpetually flagged as outdated.
Fix: _strip_optional_systemd_directives() removes RestartMaxDelaySec
and RestartSteps from both the installed and expected text before
comparison. Units that differ only by these optional directives are
now correctly considered current.
Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.
Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
dir and queued via the existing native-image-attach pipeline. Magic-byte
extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
structured error codes. Accepts content_base64/filename (canonical) and
data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
so the two methods and the existing image.attach don't duplicate logic.
Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
clients can display it. Auth-gated like every /api route, extension
allowlist + size cap, AND confined to the gateway's own media roots
(images/screenshots/cache, resolved symlink-safe) so an authed caller can't
read image-extension files anywhere on disk.
Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
markdown-text fetch images over /api/media in remote mode.
Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from #38876 is intentionally left out as a
separable feature. TUI file uploads (#40492) remain a separate surface.
Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.
Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
Follow-up on the deferred-cleanup salvage (#33774): _cleanup_workspace
returned early for a non-scratch ('dir'/'worktree') task and never ran the
parent sweep, so a scratch parent waiting on a 'dir' child would leak its
deferred workspace forever. Run the parent sweep before the early return.
Adds regression tests: deferred-while-child-active, swept-after-last-child,
and dir-child-unblocks-scratch-parent.
The dashboard font is now selectable from the UI, not just YAML. A new Font
section in the header theme picker overrides the UI font of whatever theme is
active; the choice is orthogonal to the theme and survives theme switches.
Each theme keeps its own font as the default — picking "Theme default" clears
the override.
- web/src/themes/fonts.ts: curated font catalog (system + Google Fonts across
sans/serif/mono), each with a family stack and optional webfont URL. The
catalog is the only injected-font surface — no free-text URL box, so the
injected <link> origins stay fixed.
- web/src/themes/context.tsx: font-override state (localStorage + server),
applied after theme typography so it wins; theme apply re-asserts it, and
clearing re-runs theme apply to restore the theme's own font. Mono is left
to the theme so code/terminal are untouched.
- web/src/components/ThemeSwitcher.tsx: Font section with grouped, self-
previewing font rows and a "Theme default" clear option.
- hermes_cli/web_server.py: GET/PUT /api/dashboard/font persisting to
config.yaml dashboard.font, with a server-side id allow-list (unknown ids
coerce to the theme sentinel).
- i18n + types, api client methods, tests, and docs.
Validation: 6 new backend endpoint tests pass; tsc + vite build clean; live
browser test confirmed pick/persist/survive-theme-switch/clear all work.
The desktop model picker calls POST /api/model/set with provider+model only
(no base_url). _apply_main_model_assignment cleared model.base_url for every
non-custom provider, so re-picking a Xiaomi MiMo model wiped a Token Plan
endpoint (https://token-plan-*.xiaomimimo.com/v1) back to the registry default
api.xiaomimimo.com — breaking valid tp- keys with 401s.
Now base_url is cleared only when switching to a different provider (the stale
URL belonged to the old one); same-provider re-assignment preserves it, and an
explicitly supplied base_url is honored for any provider.
_discover_all_plugins() previously did a flat iterdir() scan, missing
all category-namespaced plugins (web/*, image_gen/*, browser/*, video_gen/*).
Now recurses up to 2 levels deep, matching PluginManager._scan_directory_level().
Also fixes _plugin_status() to check both manifest name AND path-derived
key against enabled/disabled sets, so category plugins like 'web/tavily'
show correct status when enabled via config.
Follow-up to the salvaged perf fix. The new force_fresh_nous_tier param was
inserted into list_authenticated_providers between custom_providers and
max_models. Make it keyword-only (*) so a positional caller passing max_models
as the 5th arg can never silently mis-bind it to the tier-refresh flag, and
add a signature-contract test that fails if the keyword-only separator is
later dropped. All in-repo callers already use keyword args; verified no
caller breaks.
* fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep
Three coordinated fixes for the Windows gateway reliability story:
1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn
The 'hermes update' triggered from the Electron Desktop GUI ran inside
Electron's job object. Without breakaway, the post-update gateway
watcher spawned by update — already DETACHED_PROCESS — was still
reaped when Electron's job tore down, so the gateway never came back
after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000)
to:
- hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by
every helper that calls windows_detach_popen_kwargs(), including
launch_detached_profile_gateway_restart()
- The watcher subprocess's own respawn snippet in
hermes_cli/gateway.py (inlined flags so the watcher's child
respawn also breaks away)
_spawn_detached() in gateway_windows.py already had the flag; this
change brings the rest of the codebase to parity.
2. Per-minute supervisor Scheduled Task — Windows equivalent of
systemd Restart=always
Introduces hermes_cli/gateway_supervisor.py and registers it as a
second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1,
LIMITED rights) alongside the existing ONLOGON task. Every minute,
the supervisor uses the same gateway.status.get_running_pid() probe
as 'hermes gateway status' and, if no gateway is alive, calls
gateway_windows._spawn_detached() (which now includes BREAKAWAY) to
bring one back.
Covers every crash mode, not just 'machine rebooted': taskkill,
OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw
startup per minute when down, one PID-existence check per minute
when up.
Wired into both the schtasks-success and Startup-folder-fallback
install paths via _install_supervisor_best_effort(), and removed in
uninstall(). Best-effort: a failing supervisor install logs a
warning but doesn't roll back the primary install.
3. 'hermes gateway status --deep' shows per-probe PASS/FAIL
Replaces the existing terse '--deep' output (which only printed
paths) with an actual diagnostic table:
[1] PID file present
[2] Lock file held by a live process
[3] get_running_pid() result
[4] _pid_exists(pid) — OS-level liveness
[5] gateway_state.json (state + age)
[6] Last lifecycle event from gateway-exit-diag.log
When the high-level summary disagrees with reality, the user can
see exactly which signal is lying.
Test-leak fix
-------------
tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages
monkey-patched is_linux/is_wsl/supports_systemd_services to simulate
WSL but did NOT stub is_windows(). On a Windows host, the dispatcher
in _gateway_command_inner takes the is_windows() branch BEFORE the
WSL guidance branch, so the test invoked gateway_windows.install()
for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd
— the REAL user Startup folder, never sandboxed by tmp_path — pointing
at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper.
When pytest tore down the tmp_path, every subsequent Windows login
flashed a cmd.exe window that failed to find the missing target.
Stubs is_windows=False on all four affected tests:
test_install_wsl_no_systemd
test_start_wsl_no_systemd
test_status_wsl_running_manual
test_status_wsl_not_running
Defense-in-depth: _build_startup_launcher() now prefixes the launcher
with 'if not exist <target> exit /b 0', so any future stale Startup
entry silently no-ops instead of flashing a console window.
Status enhancements
-------------------
- status() now reports supervisor task presence alongside the existing
schtasks/Startup info, and nudges the user to reinstall if the
supervisor isn't registered.
- Deep mode dumps both the supervisor task name + script path.
* fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes
Earlier in this branch we added a per-minute schtasks-based supervisor to
respawn the gateway after crashes / GUI-update SIGTERMs. The implementation
flashed a brief console window on every firing, which stole window focus.
We tried several variants:
- cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem)
- schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is
actually subsystem=Console, not GUI; it respawns the real pythonw)
- schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side
conhost preallocation; documented Windows quirk)
- XML registration with <Hidden>true> -> still flashes (<Hidden> only hides
the task in the Task Scheduler UI, not the spawned window)
Researched what leading projects do:
- Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor.
- Tailscale: real Windows Service via SCM. Session 0, no console possible.
- Syncthing: --no-console flag inside the binary + Startup folder.
- openclaw: VBS Run(..., 0, False) wrapper. Suppresses the *window* but
Super User Q971162 confirms focus-steal still occurs in some cases.
None of these use a per-minute polling scheduled task. The 'auto-restart on
crash' responsibility belongs INSIDE the daemon (Tailscale's in-process
recovery / Ollama's monitor+worker pair) OR is delegated to the Windows
Service Control Manager — not Task Scheduler.
So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB
fix in _subprocess_compat.py (from commit c1e5fa433) survives — that is the
*real* fix for problem #2 (GUI-update kills gateway): the post-update
watcher in launch_detached_profile_gateway_restart() now breaks out of
Electron's job object, so the gateway respawn watcher survives the GUI
quit and successfully respawns the gateway.
Surviving from c1e5fa433:
* CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes#2)
* Inlined breakaway flag in the watcher respawn snippet in gateway.py
* hermes gateway status --deep PASS/FAIL probes (fixes#1 — visibility)
* 'if not exist <target> exit /b 0' guard in _build_startup_launcher
(fixes#3 — silent no-op for stale Startup entries)
* tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause
of #3 — pytest WSL tests no longer leak Startup entries on Win hosts)
Removed in this commit:
* hermes_cli/gateway_supervisor.py (entire file)
* Supervisor section in hermes_cli/gateway_windows.py (~180 lines):
get_supervisor_task_name, get_supervisor_script_path,
_build_supervisor_cmd_script, _write_supervisor_script,
_install_supervisor_task, is_supervisor_task_registered,
_install_supervisor_best_effort
* _install_supervisor_best_effort() calls in install() (3 spots)
* supervisor cleanup block in uninstall()
* supervisor display lines in status() / status(deep=True)
Future direction (out of scope for this PR): the right place for Windows
'Restart=always' semantics is a real Windows Service installed via
pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM
auto-restart, no console window possible. That's a meaningful next-PR
project, not a band-aid.
Tests: 51 pass / 2 pre-existing failures in
tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are
TestSupportsSystemdServicesWSL cases that fail on origin/main too —
unrelated to this PR).
* feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI)
Adds a GUI-only uninstall path so people can remove the desktop Chat GUI
while keeping the Hermes agent + their config/sessions/.env, and surfaces
the three CLI uninstall modes inside the desktop app's Settings → About.
CLI:
- New hermes_cli/gui_uninstall.py: cross-platform discovery + removal of the
desktop GUI's artifacts (source-built dist/release/node_modules + build
stamp, the packaged app bundle, and the Electron userData dir) on Linux,
macOS, and Windows. Never touches the agent source, venv, or user data.
- `hermes uninstall --gui` removes only the Chat GUI; `--gui-summary` prints a
JSON install snapshot (used by the desktop UI to gate options + detect a
missing agent for a future lite client).
- `hermes uninstall --yes` / `--full --yes` now run non-interactively, sharing
the destructive sequence via a new _perform_uninstall() helper. The keep-data
and full flows also sweep the GUI artifacts.
Desktop:
- electron/desktop-uninstall.cjs: pure helpers mapping each mode (gui/lite/full)
to CLI flags, resolving the running app bundle per OS, and building the
detached cleanup script that waits for the app to exit, runs the Python
uninstall, and removes the bundle.
- IPC hermes:uninstall:summary / :run, preload bridge, and types.
- Settings → About "Danger zone" with the three options; agent-removing
options hide when no local agent is detected.
Tests: tests/hermes_cli/test_gui_uninstall.py (22 pass with the existing
uninstall tests), electron/desktop-uninstall.test.cjs (17 pass, wired into
test:desktop:platforms). Docs: desktop.md "Uninstalling" + cli-commands.md.
* fix(desktop): tear down backend process tree before GUI uninstall (Windows lock safety)
The desktop uninstall cleanup script waited only on the desktop app's own
PID, but a backend grandchild (gateway / pty terminal / hermes REPL) can
outlive it and keep hermes.exe + venv files mandatory-locked on Windows —
making the script's rmdir half-fail and leaving a partial install, the same
failure class as the self-update path's #37532.
- main.cjs: runDesktopUninstall now awaits releaseBackendLock() before
spawning the cleanup script — tree-kills every backend PID the desktop owns
(primary + pool) via taskkill /T /F and polls the venv shim until unlocked.
Extracted the shared core out of releaseBackendLockForUpdate so both the
update hand-off and the uninstaller use the identical, incident-hardened
teardown. No-op on macOS/Linux (no mandatory locks).
- desktop-uninstall.cjs: Windows cleanup script removes the bundle via a
bounded rmdir retry loop (10x, 1s) instead of a single rmdir, since Windows
releases directory handles lazily even after the holding process exits.
- Dropped a fragile tasklist|findstr reap-by-path attempt; the Electron-side
tree-kill-by-PID is the reliable mechanism.
Tests: desktop-uninstall.test.cjs updated for the retry-loop output (17 pass).
* fix(desktop): address review on GUI uninstall (venv self-delete, gates, wait-loop)
Resolves @OutThisLife's review on #40355:
1. full mode now gated on agent presence (needsAgent: true). It removes the
agent + user data, so on a lite client with no local agent it's hidden
like lite — no more offering to remove an agent that isn't there.
2. (Finding 3, the real bug) lite/full no longer rmtree the venv from the
venv's OWN python. On Windows a running python.exe is mandatory-locked, so
that half-fails. New lightweight 'python -m hermes_cli.uninstall --mode X'
entrypoint (stdlib-only imports) lets the desktop run agent-removing modes
under the SYSTEM python (findSystemPython) with PYTHONPATH=<agentRoot>, so
import hermes_cli resolves from source while the venv is torn down. Falls
back to venv python + logs when no system python (gui-only unaffected).
3. Windows wait-loop is now bounded (60 tries, matching POSIX) and matches the
PID as a whole space-delimited token via findstr (no substring 99->990
trap, no redundant bare find). set HERMES_HOME/PID/PYTHONPATH now quoted.
4. Renamed the misleading 'returns null for dev run' test — the dev-run safety
is shouldRemoveAppBundle(isPackaged=false), which the test now asserts.
Docs: note that --gui on a source checkout also sweeps node_modules/build
output. Tests: 18 python + 19 desktop pass.
Conflict resolution prefixes --workspace web before --silent (preserving
the Termux npm_workspace_args path); update test_cmd_update fixture to match.
Add zakame@zakame.net -> zakame mapping so CI author check passes.
- check_dir = npm_dir if audit_extra else npm_dir evaluated identically in
both branches; change to PROJECT_ROOT if audit_extra else npm_dir so
workspace-scoped audits check the workspace root's node_modules
- Add test_npm_install_uses_workspace_web_scope asserting --workspace web is
passed adjacently in the _build_web_ui npm install invocation
- Update the --skip-build pre-build hint in the dashboard startup path
to use `npm install --workspace web && npm run build -w web` so users
don't accidentally trigger a desktop rebuild by following the hint.
- Add test_tui_launch_install_uses_workspace_scope to assert that the
TUI launch npm install carries --workspace ui-tui, covering the call
site added in the prior commit.
- Add --workspace ui-tui to the TUI launch npm install, the one call
site missed by the prior commit. Without scoping it ran from
PROJECT_ROOT and still resolved apps/desktop via the apps/* glob.
- Update the two manual-recovery hints in _build_web_ui (npm install
failure and build failure paths) to use the scoped form
`npm install --workspace web && npm run build -w web` so users
following the hint don't accidentally trigger a desktop rebuild.
- Update the stale test assertion in test_cmd_update.py to expect
--workspace web in the _build_web_ui npm ci call, which was
previously unreachable through the if-guard and left the workspace-
scoping change from the prior commit unverified.
The dependency-verification repair in _verify_core_dependencies_installed
ran 'pip install --reinstall -e .' via _run_install_with_heartbeat directly,
bypassing the Windows shim-quarantine that the primary install path performs.
That reinstall rewrites the entry-point shims, and on Windows the live
hermes.exe is the running process — pip can neither delete nor overwrite it.
With no quarantine, the shim was left missing and 'hermes' dropped off PATH
('hermes' is not recognized... after update).
Extract the rename-out-of-the-way / restore-on-failure logic into a reusable
_run_quarantined_install helper and route both the primary editable installs
and the --reinstall -e . repair through it. The per-package repair installs
only third-party deps (never hermes-agent), so they don't touch the shims and
are left untouched. Add a regression test (fails on old code, passes on new).