feat(dashboard): add HTTP health probe for cross-container gateway detection

The dashboard's gateway status detection relied solely on local PID checks
(os.kill + /proc), which fails when the gateway runs in a separate container.

Changes:
- web_server.py: Add _probe_gateway_health() that queries the gateway's HTTP
  /health/detailed endpoint when the local PID check fails. Activated by
  setting the GATEWAY_HEALTH_URL env var (e.g. http://gateway:8642/health).
  Falls back to standard PID check when the env var is not set.
- api_server.py: Add GET /health/detailed endpoint that returns full gateway
  state (platforms, gateway_state, active_agents, pid, etc.) without auth.
  The existing GET /health remains unchanged for backwards compatibility.
- StatusPage.tsx: Handle the case where gateway_pid is null but the gateway
  is running remotely, displaying 'Running (remote)' instead of 'PID null'.

Environment variables:
- GATEWAY_HEALTH_URL: URL of the gateway health endpoint (e.g.
  http://gateway-container:8642/health). Unset = local PID check only.
- GATEWAY_HEALTH_TIMEOUT: Probe timeout in seconds (default: 3).
This commit is contained in:
Hermes Agent 2026-04-14 05:17:17 +00:00 committed by Teknium
parent 397386cae2
commit 45595f4805
6 changed files with 78 additions and 1 deletions

View file

@ -80,6 +80,7 @@ export const zh: Translations = {
notRunning: "未运行",
startFailed: "启动失败",
pid: "进程",
runningRemote: "运行中(远程)",
noneRunning: "无",
gatewayFailedToStart: "网关启动失败",
lastUpdate: "最后更新",