feat(dashboard): add HTTP health probe for cross-container gateway detection

The dashboard's gateway status detection relied solely on local PID checks
(os.kill + /proc), which fails when the gateway runs in a separate container.

Changes:
- web_server.py: Add _probe_gateway_health() that queries the gateway's HTTP
  /health/detailed endpoint when the local PID check fails. Activated by
  setting the GATEWAY_HEALTH_URL env var (e.g. http://gateway:8642/health).
  Falls back to standard PID check when the env var is not set.
- api_server.py: Add GET /health/detailed endpoint that returns full gateway
  state (platforms, gateway_state, active_agents, pid, etc.) without auth.
  The existing GET /health remains unchanged for backwards compatibility.
- StatusPage.tsx: Handle the case where gateway_pid is null but the gateway
  is running remotely, displaying 'Running (remote)' instead of 'PID null'.

Environment variables:
- GATEWAY_HEALTH_URL: URL of the gateway health endpoint (e.g.
  http://gateway-container:8642/health). Unset = local PID check only.
- GATEWAY_HEALTH_TIMEOUT: Probe timeout in seconds (default: 3).
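
A minimal sketch of how such a probe could work, not the code from this commit. The names probe_gateway_health and gateway_status are hypothetical, and the sketch queries the configured URL exactly as given, since the message does not say whether the real _probe_gateway_health() rewrites the path to /health/detailed. It uses only the Python standard library:

```python
import json
import os
import urllib.error
import urllib.request


def probe_gateway_health(url=None, timeout=None):
    """Return the gateway's health payload as a dict, or None if unavailable.

    Hypothetical helper: reads GATEWAY_HEALTH_URL and GATEWAY_HEALTH_TIMEOUT
    as described in the commit message; everything else is an assumption.
    """
    url = url or os.environ.get("GATEWAY_HEALTH_URL")
    if not url:
        return None  # env var unset: the caller relies on the local PID check only
    if timeout is None:
        try:
            timeout = float(os.environ.get("GATEWAY_HEALTH_TIMEOUT", "3"))
        except ValueError:
            timeout = 3.0  # fall back to the documented default
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed JSON
        return None
    return payload if isinstance(payload, dict) else None


def gateway_status(local_pid_alive, local_pid=None):
    """Combine the local PID check with the remote probe (assumed response shape)."""
    if local_pid_alive:
        return {"gateway_running": True, "gateway_pid": local_pid}
    if probe_gateway_health() is not None:
        # Remote gateway: no local PID, so the UI can show 'Running (remote)'
        return {"gateway_running": True, "gateway_pid": None}
    return {"gateway_running": False, "gateway_pid": None}
```

Running the HTTP probe only after the local PID check fails leaves single-container deployments unchanged, and returning None on any network or parse error means an unreachable gateway is reported as not running rather than raising in the dashboard.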
Hermes Agent 2026-04-14 05:17:17 +00:00
parent f0b353bade
commit 28c39fda3d
3 changed files with 75 additions and 1 deletions

@@ -29,7 +29,8 @@ const GATEWAY_STATE_DISPLAY: Record<string, { badge: "success" | "warning" | "de
 };

 function gatewayValue(status: StatusResponse): string {
-  if (status.gateway_running) return `PID ${status.gateway_pid}`;
+  if (status.gateway_running && status.gateway_pid) return `PID ${status.gateway_pid}`;
+  if (status.gateway_running) return "Running (remote)";
   if (status.gateway_state === "startup_failed") return "Start failed";
   return "Not running";
 }