hermes-agent/hermes_cli/dashboard_auth/public_paths.py
Ben a618789dba fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.

Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.

Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.

Endpoint inventory (verified safe to remain public):

* /api/status            — version, gateway state, active session count,
                           auth-gate shape. Portal liveness probe target.
* /api/config/defaults   — config-defaults feed for the SPA's Config page
* /api/config/schema     — config schema for the SPA's Config page
* /api/model/info        — model catalogue metadata (context windows)
* /api/dashboard/themes  — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard

No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.

Tests:

* New: test_gated_status_is_public (regression guard with the NAS
  fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
  over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
  never the response)
* New: docker integration check #3 in
  test_dashboard_oauth_gate_engaged_by_default — /api/status
  remains 200 under the gate AND reports auth_required=True so the
  portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
  /api/sessions instead of /api/status (status is public, so it
  can no longer distinguish 'logged in' from 'gate accidentally
  disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
  dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
  test_dashboard_oauth_gate_engaged_by_default probes
  /api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
  test_dashboard_auth_status_endpoint.py (no longer needed since
  /api/status is reachable cold)

Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
2026-05-29 12:17:12 +10:00

49 lines
2.2 KiB
Python

"""Shared allowlist of ``/api/*`` paths that bypass dashboard auth.
Two middlewares enforce dashboard auth and previously kept independent
copies of this list:
* ``hermes_cli.web_server.auth_middleware`` — loopback / ``--insecure``
mode, gates on the ephemeral ``_SESSION_TOKEN``.
* ``hermes_cli.dashboard_auth.middleware.gated_auth_middleware`` —
non-loopback mode, gates on the OAuth session cookie.
When the lists drifted, ``/api/status`` ended up public under the legacy
gate but 401'd under the OAuth gate. That broke the portal's wildcard
liveness probe (``nous-account-service`` ``fly-provider.ts``
``getInstanceRuntimeStatus``), which fetches ``/api/status`` without a
cookie as its sole signal of "agent dashboard is alive": every healthy
wildcard-subdomain agent surfaced as STARTING/down in the portal UI even
though the dashboard was serving correctly.
Centralising the allowlist here so both middlewares import the same
frozenset prevents the next drift. Keep this list minimal — only truly
non-sensitive, read-only endpoints belong here. As a sanity check, every
entry should be safe to expose to:
* external uptime probes (Pingdom, Better Stack, NAS),
* the dashboard SPA before the user has logged in,
* anyone who happens to ``curl`` the hostname.
If a new endpoint doesn't pass all three tests, it should be gated and
the SPA should bootstrap it after login instead.
"""
from __future__ import annotations
PUBLIC_API_PATHS: frozenset[str] = frozenset({
# Liveness probe target. Returns version, gateway state, active
# session count, and the dashboard auth-gate shape. No bodies, no
# session content, no secrets. Documented as the portal's wildcard
# liveness probe in
# ``docs/agent-dashboard-public-url-contract.md`` (NAS side).
"/api/status",
# Read-only config-defaults / schema feeds for the SPA's Config page.
"/api/config/defaults",
"/api/config/schema",
# Read-only model metadata (context windows, etc.) — same shape as
# provider catalogs already exposed on the public internet.
"/api/model/info",
# Read-only theme + plugin manifests for the dashboard skin engine.
"/api/dashboard/themes",
"/api/dashboard/plugins",
})