refactor(gateway): route all active_agents coercion through parse_active_agents; harden drain-timeout fallback

Second cleanup pass (simplify-code review of the first follow-up):

- write_runtime_status now clamps active_agents via parse_active_agents
  instead of an inline max(0, int(...)). This removes the duplicated clamp
  that the helper's docstring acknowledged and closes a write-side
  ValueError gap (a non-numeric active_agents previously raised; it now
  degrades to 0).
- hermes_cli/gateway.py: the draining-status line now routes its
  active-agents count through parse_active_agents too. This was the third
  coercion site of the same persisted field; it is now non-raising and
  consistent with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
  ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
  (a real float) instead of a blanket 'except Exception -> None'. None would
  have violated the surfaced field's int/float contract and silently stripped
  NAS's poll-deadline hint.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
  handles the empty/None case) and tightened the parse_active_agents docstring
  to describe the actual single-contract role (write + both reads).
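
A minimal sketch of the two behaviours described above, for reviewers who want to see the contract in isolation. The body of parse_active_agents is not part of this diff, so the `_sketch` functions, the `load_resolver` parameter, and the 30.0 default are hypothetical stand-ins, not the code in gateway.status or gateway.restart.

```python
from typing import Any, Callable

# Placeholder value for illustration only; the real default is
# DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT in gateway.restart.
ASSUMED_DEFAULT_DRAIN_TIMEOUT = 30.0


def parse_active_agents_sketch(value: Any) -> int:
    """Coerce a persisted active_agents value to a non-negative int.

    Hypothetical stand-in for parse_active_agents: it never raises and
    degrades to 0 for missing, non-numeric, or negative input.
    """
    try:
        return max(0, int(value))
    except (TypeError, ValueError, OverflowError):
        # None -> TypeError, "abc"/NaN -> ValueError, inf -> OverflowError
        return 0


def resolve_drain_timeout_sketch(
    load_resolver: Callable[[], Callable[[], float]],
    default: float = ASSUMED_DEFAULT_DRAIN_TIMEOUT,
) -> float:
    """Return the resolver's timeout, or a numeric default on ImportError.

    load_resolver stands in for the lazy import inside get_status. Only
    ImportError is swallowed; any other failure propagates to the caller.
    """
    try:
        resolver = load_resolver()
        return resolver()
    except ImportError:
        return default


def _missing_resolver() -> Callable[[], float]:
    # Simulates the resolver having been moved or renamed.
    raise ImportError("resolver moved or renamed")


# Read side: a missing runtime dict collapses to 0 with no separate branch.
runtime = None
active_agents = parse_active_agents_sketch((runtime or {}).get("active_agents", 0))
timeout_hint = resolve_drain_timeout_sketch(_missing_resolver)
```

Under these assumptions the status field stays numeric in every case, which is the property the narrowed except clause is meant to preserve.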
kshitijk4poor 2026-06-21 16:43:13 +05:30
parent b577f25100
commit 4d7bb382b0
3 changed files with 16 additions and 10 deletions


@@ -1844,7 +1844,7 @@ async def get_status(profile: Optional[str] = None):
# liveness via the single shared contract in gateway.status. Liveness
# keys off gateway_running (a live PID/health probe), NEVER
# gateway_updated_at — a healthy idle gateway never advances that.
-active_agents = parse_active_agents(runtime.get("active_agents", 0)) if runtime else 0
+active_agents = parse_active_agents((runtime or {}).get("active_agents", 0))
gateway_busy = derive_gateway_busy(
gateway_running=gateway_running,
gateway_state=gateway_state,
@@ -1862,8 +1862,12 @@ async def get_status(profile: Optional[str] = None):
from hermes_cli.gateway import _get_restart_drain_timeout
restart_drain_timeout = _get_restart_drain_timeout()
-except Exception:
-restart_drain_timeout = None
+except ImportError:
+# Resolver moved/renamed — fall back to the real default so the
+# field stays a numeric poll-deadline hint, never None.
+from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+restart_drain_timeout = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
# Dashboard auth gate (Phase 7): surface whether the gate is engaged
# and which providers are registered so ``hermes status`` and the