Commit graph

6 commits

Author SHA1 Message Date
Teknium
0ad4f55aa8
feat(dashboard): add --stop and --status flags (#17840)
`hermes dashboard` is a long-lived foreground server that users often
start and forget about, sometimes in a shell they've since closed.  We
didn't have a way to stop it — users had to find the PID manually.

Adds two lifecycle flags that reuse the same detection + termination
path the post-`hermes update` cleanup (PR #17832) uses:

  hermes dashboard --status
    List running hermes dashboard processes with PID + cmdline.
    Exit 0, informational.

  hermes dashboard --stop
    Terminate all running dashboards (3s grace then force-kill survivors).
    Exit 0 if none remain, 1 if any couldn't be stopped.
    Windows uses `taskkill /F` as before.

Both flags short-circuit before any fastapi/uvicorn import so they work
even on installations where the dashboard extras aren't installed —
useful when you're cleaning up after uninstalling.

The kill helper gained an optional `reason=...` param so the output
reads "(requested via --stop)" instead of the post-update-specific
"running backend no longer matches the updated frontend" wording.

E2E: `hermes dashboard --status` with nothing running prints the
empty message; with a fake `hermes dashboard ...` cmdline spawned via
`exec -a`, `--status` lists it, `--stop` terminates it (exit -15),
and a follow-up `--status` returns empty.
2026-04-30 02:30:20 -07:00
Teknium
5b85a7d351
fix(update): kill stale dashboard processes instead of warning (#17832)
`hermes update` previously just printed a warning when it detected a
running `hermes dashboard` process from the previous version, telling
the user to kill and restart it themselves.  In practice dashboards get
started and forgotten, so the warning was routinely ignored and users
ended up with a silent frontend/backend mismatch (new JS bundle served
against the old in-memory Python backend, e.g. new auth headers the old
code doesn't recognise → every API call 401s).

The dashboard has no service manager, no PID file, and we don't record
the original launch args (--host, --port, --insecure, --tui, --no-open)
so we can't auto-restart it.  But we CAN stop it, which is what the
user wants — the failure mode when the stale process is left alive is
worse than the dashboard just being down.

- POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors.
- Windows: `taskkill /PID <pid> /F`.
- Print each PID's outcome plus a one-line restart hint.
- Detection logic is unchanged (same ps / wmic scan, same guards
  against the `pgrep -f` greedy-match trap from #16872 and the
  #17049 wmic UnicodeDecodeError fix).

Also split the old monolithic `_warn_stale_dashboard_processes` into
`_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes`
(kill), keeping the old name as an alias so any external callers still
work.

E2E verified: spawned a fake `hermes dashboard` cmdline via
`exec -a 'hermes dashboard …' sleep 300`, ran
`_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15)
and that a post-scan returns an empty PID list.
2026-04-30 01:34:34 -07:00
briandevans
835f9adec0 fix(update,test): clarify wmic comment; switch tests to monkeypatch sys.platform
Two fix-ups for #17123:

1. Reword the inline comment in `_warn_stale_dashboard_processes` to
   accurately describe the failure mode (locale-dependent decoder, not a
   "default UTF-8 decoder") and identify `errors="ignore"` as the
   load-bearing protection. Per Copilot's review.

2. Switch `TestWindowsWmicEncoding` from `patch("hermes_cli.main.sys")`
   to `monkeypatch.setattr(sys, "platform", "win32")` — the codebase's
   canonical pattern (e.g. `tests/hermes_cli/test_auth_ssl_macos.py`).
   The MagicMock-replacement approach passed locally on Python 3.12 but
   the platform-equality check failed under CI's xdist+Python 3.11,
   leaving both new tests red despite the fix being present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
briandevans
b85fff9495 fix(update): protect dashboard wmic scan against UnicodeDecodeError on Windows non-UTF-8 locales (#17049)
`hermes update` calls `_warn_stale_dashboard_processes()` to warn about
dashboard processes still running the pre-update Python backend. On
Windows, that scan shells out to `wmic process get ProcessId,CommandLine
/FORMAT:LIST` with `text=True` and no explicit encoding.

`wmic` emits text in the system code page (e.g. cp936 on zh-CN locales),
not UTF-8. Without an explicit `encoding=`, Python's default UTF-8
decoder crashes the subprocess reader thread with
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 ...`. In
Python 3.11 that crash is silently absorbed: `subprocess.run()` returns
a `CompletedProcess` with `result.stdout = None`, the next line calls
`result.stdout.split("\n")`, and `hermes update` aborts with the
exact `AttributeError: 'NoneType' object has no attribute 'split'`
trace reported in #17049.

Fix: pass `encoding="utf-8", errors="ignore"` so undecodable bytes
cannot take down the reader thread (the parsing only matches the ASCII
prefixes `CommandLine=` and `ProcessId=`, so dropping non-UTF-8 bytes
is safe), and short-circuit when `result.stdout is None` as a defensive
guard for environments where the reader thread still fails for other
reasons.

This is the same root cause as #17074 (which patches
`hermes_cli/gateway._scan_gateway_pids` for the `hermes setup` path).
That PR does not touch `_warn_stale_dashboard_processes`, so
`hermes update` remains broken on the same locales until this lands.

Regression test in `tests/hermes_cli/test_update_stale_dashboard.py`:
- `test_wmic_invoked_with_utf8_ignore_errors` asserts the explicit
  encoding/errors kwargs reach `subprocess.run`.
- `test_wmic_returns_none_stdout_does_not_crash` simulates the
  reader-thread-crashed `result.stdout=None` aftermath and asserts the
  function returns silently instead of raising AttributeError.

Both new tests fail against clean origin/main (7d4648461) reproducing
the original AttributeError; both pass with this patch. The remaining
3 failures in `tests/hermes_cli/test_cmd_update.py` and
`test_update_autostash.py` are pre-existing baselines on origin/main —
they reproduce identically without this change and are unrelated to
the wmic scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
Teknium
9048fd020f fix(cli): tighten stale-dashboard match to explicit patterns
Replace the Linux/macOS pgrep regex ("hermes.*dashboard") with a ps
scan + the same explicit patterns list already used on the Windows
branch and in hermes_cli.gateway._scan_gateway_pids:

    hermes dashboard
    hermes_cli.main dashboard
    hermes_cli/main.py dashboard

The old greedy regex would match any cmdline containing both words —
e.g. a chat session whose argv mentions "dashboard" or an unrelated
grafana/dashboard-server process. Added regression tests for both.

Follow-up tightening on #16881.
2026-04-28 01:14:44 -07:00
Societus
66b1142384 fix(cli): warn about stale dashboard processes after hermes update
The dashboard is a long-lived server process users start and forget.
When hermes update replaces files on disk, the running process holds
the old Python backend in memory while the JS bundle gets updated,
producing a silent frontend/backend mismatch (e.g. v0.11.0 changed
the session token header -- old backends reject every API call).

Scan for running dashboard processes after a successful update (both
git and ZIP paths) and print a warning with their PIDs and restart
instructions. Mirrors the existing pattern for gateway processes.

Fixes #16872
2026-04-28 01:14:44 -07:00