When a Windows user relaunches Hermes while an in-app update is still
running (the desktop vanished with no progress shown, so it looks
crashed), the fresh instance spawns its own dashboard backend. That
backend re-locks the venv shim, and the updater's straggler cleanup
(force_kill_other_hermes -> taskkill /F /T /IM hermes.exe) then kills
it. The launch dies with the 45s "backend didn't come up" timeout, and
the user relaunches into the same trap -- an infinite respawn/kill loop
(#50238).
Root cause: no mutual exclusion between an update that is being applied
and a fresh desktop spawning its own local backend.
Fix: the updater publishes a HERMES_HOME/.hermes-update-in-progress
marker (pid + start time) for the whole run via an RAII drop-guard that
removes it on every exit path (success, early return, panic). A
freshly-launched desktop checks the marker before spawning its local
backend and PARKS until the update finishes -- then brings the backend
up itself (it is the surviving instance; the updater's own relaunch hits
the single-instance lock and quits). A stale marker (dead pid, or older
than a 20-minute ceiling) is pruned so a crashed updater can never
strand future launches. With no rogue backend spawning mid-update,
force_kill_other_hermes has nothing legitimate to kill.
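
For illustration only, the drop-guard idea can be sketched roughly as below. This is not the code in this change: the type name UpdateMarkerGuard, the helper run_update, and the on-disk format ("<pid> <unix start seconds>") are all assumptions made for the example. Only the marker file name comes from the description above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical guard: writes the marker when created, removes it when dropped.
struct UpdateMarkerGuard {
    path: PathBuf,
}

impl UpdateMarkerGuard {
    /// Publish the marker under `hermes_home` and return the guard that owns it.
    fn publish(hermes_home: &Path) -> io::Result<Self> {
        let path = hermes_home.join(".hermes-update-in-progress");
        let started = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        // Assumed format for this sketch: "<pid> <unix start seconds>".
        fs::write(&path, format!("{} {}\n", process::id(), started))?;
        Ok(Self { path })
    }
}

impl Drop for UpdateMarkerGuard {
    fn drop(&mut self) {
        // Runs on normal return, early return, and unwinding panics
        // (not when the binary is built with panic = "abort").
        // Best effort: a failed removal is ignored here.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical updater entry point: the guard lives for the whole run.
fn run_update(hermes_home: &Path, fail_early: bool) -> io::Result<()> {
    let _guard = UpdateMarkerGuard::publish(hermes_home)?;
    if fail_early {
        // Early return: `_guard` is dropped here and the marker is removed.
        return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
    }
    // ... apply the update ...
    Ok(())
}
```

Binding the guard to a named local (`_guard`, not `_`) keeps it alive until the function exits; a bare `_` pattern would drop it immediately and remove the marker too early.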
Marker parse/staleness logic is extracted to update-marker.cjs and
unit-tested; the Rust guard has unit tests; the Rust-write <-> JS-read
contract is E2E-verified.
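
As a rough sketch of the staleness rule (dead pid or older than 20 minutes), written in Rust for illustration even though the real reader lives in update-marker.cjs: the marker format ("<pid> <unix start seconds>"), the names Marker, parse_marker, is_stale and should_park, and the choice to treat an unparseable marker as "do not park" are all assumptions of this example. Process liveness is passed in as a closure because the actual check is platform-specific.

```rust
use std::time::Duration;

/// Parsed marker contents (assumed format: "<pid> <unix start seconds>").
#[derive(Debug, PartialEq)]
struct Marker {
    pid: u32,
    started_secs: u64,
}

/// The 20-minute ceiling from the description above.
const MAX_UPDATE_AGE: Duration = Duration::from_secs(20 * 60);

/// Parse the marker text; returns None for anything malformed.
fn parse_marker(text: &str) -> Option<Marker> {
    let mut parts = text.split_whitespace();
    let pid = parts.next()?.parse().ok()?;
    let started_secs = parts.next()?.parse().ok()?;
    Some(Marker { pid, started_secs })
}

/// A marker is stale if its owner is gone or it has outlived the ceiling.
fn is_stale(marker: &Marker, now_secs: u64, pid_alive: impl Fn(u32) -> bool) -> bool {
    // saturating_sub guards against a start time in the future (clock skew).
    let age_secs = now_secs.saturating_sub(marker.started_secs);
    !pid_alive(marker.pid) || age_secs > MAX_UPDATE_AGE.as_secs()
}

/// Decide whether a fresh launch should park instead of spawning a backend.
/// Assumption: a missing or unparseable marker never blocks a launch.
fn should_park(marker_text: &str, now_secs: u64, pid_alive: impl Fn(u32) -> bool) -> bool {
    match parse_marker(marker_text) {
        Some(marker) => !is_stale(&marker, now_secs, pid_alive),
        None => false,
    }
}
```

A caller that gets `false` for an existing marker file would prune the file and continue with a normal launch, so a crashed updater cannot block later starts.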