fix: hermes update kills freshly-restarted gateway service

After restarting a service-managed gateway (systemd/launchd), the stale-process sweep calls find_gateway_pids(), which returns ALL gateway PIDs via ps aux, including the one just spawned by the service manager. The sweep kills it, leaving the user with a stopped gateway and a confusing 'Restart manually' message.

Fix: add _get_service_pids() to query systemd MainPID and launchd PID for active gateway services, then exclude those PIDs from the sweep.

Also add an exclude_pids parameter to find_gateway_pids() and kill_gateway_processes() so callers can skip known service-managed PIDs.

Adds 9 targeted tests covering:
- _get_service_pids() for systemd, launchd, empty, and zero-PID cases
- find_gateway_pids() exclude_pids filtering
- cmd_update integration: service PID not killed after restart
- cmd_update integration: manual PID killed while service PID preserved
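
The exclusion logic described above can be illustrated with a minimal sketch. It is not the code from this commit: parse_systemd_main_pid() and filter_pids() are hypothetical helper names, and the sketch assumes the systemd query returns text in the `MainPID=<n>` form that `systemctl show --property=MainPID` prints. A MainPID of 0 is treated as "no running process", matching the zero-PID case the tests mention.

```python
import re
from typing import Iterable, List, Optional


def parse_systemd_main_pid(show_output: str) -> Optional[int]:
    """Extract the PID from 'MainPID=<n>' text (hypothetical helper).

    Returns None when the property is missing or is 0, since systemd
    reports MainPID=0 for a unit with no running main process.
    """
    match = re.search(r"^MainPID=(\d+)\s*$", show_output, re.MULTILINE)
    if match is None:
        return None
    pid = int(match.group(1))
    return pid if pid > 0 else None


def filter_pids(all_pids: Iterable[int],
                exclude_pids: Optional[Iterable[int]] = None) -> List[int]:
    """Drop service-managed PIDs from a sweep list (hypothetical helper).

    Order of the remaining PIDs is preserved; exclude_pids=None means
    nothing is excluded, which keeps the pre-fix behaviour for old callers.
    """
    excluded = set(exclude_pids or ())
    return [pid for pid in all_pids if pid not in excluded]


# Example: PID 4242 was just spawned by the service manager, so only the
# manually started gateways (100 and 200) remain candidates for SIGTERM.
service_pid = parse_systemd_main_pid("MainPID=4242\n")
service_pids = {service_pid} if service_pid is not None else set()
manual_pids = filter_pids([100, 4242, 200], exclude_pids=service_pids)
```

Defaulting exclude_pids to None keeps existing call sites working unchanged, which is presumably why the commit adds it as an optional parameter rather than changing what find_gateway_pids() returns by default.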
2026-04-25 00:51:20 +00:00 · 2026-04-06 09:52:22 +05:30 · 2026-04-06 09:52:22 +05:30 · a2a9ad7431
commit a2a9ad7431
parent 9c96f669a1
3 changed files with 371 additions and 9 deletions
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -3607,6 +3607,7 @@ def cmd_update(args):
            from hermes_cli.gateway import (
                is_macos, is_linux, _ensure_user_systemd_env,
                get_systemd_linger_status, find_gateway_pids,
+                _get_service_pids,
            )
            import signal as _signal

@@ -3673,8 +3674,11 @@ def cmd_update(args):
                    pass

            # --- Manual (non-service) gateways ---
-            # Kill any remaining gateway processes not managed by a service
-            manual_pids = find_gateway_pids()
+            # Kill any remaining gateway processes not managed by a service.
+            # Exclude PIDs that belong to just-restarted services so we don't
+            # immediately kill the process that systemd/launchd just spawned.
+            service_pids = _get_service_pids()
+            manual_pids = find_gateway_pids(exclude_pids=service_pids)
            for pid in manual_pids:
                try:
                    os.kill(pid, _signal.SIGTERM)