hermes-agent/hermes_cli/dashboard_auth/routes.py
Ben a890389b69 feat(dashboard-auth): HERMES_DASHBOARD_PUBLIC_URL / dashboard.public_url override
Operators behind reverse proxies that don't reliably forward
X-Forwarded-Host / X-Forwarded-Proto / X-Forwarded-Prefix (manual
nginx setups, on-prem ingresses, custom-domain Fly deploys with
incomplete proxy chains) had no way to force the absolute base URL
the OAuth callback redirects from. The dashboard would reconstruct
the redirect_uri from request headers, the IDP would echo it back,
and the user would land on the wrong host or wrong path — 404.

Add `dashboard.public_url` to config.yaml with env override
HERMES_DASHBOARD_PUBLIC_URL. When set, it is the complete authority —
scheme + host + optional path prefix (e.g. https://example.com/hermes) —
and becomes the base for the OAuth `redirect_uri`. X-Forwarded-Prefix
is IGNORED on this code path because the operator has explicitly
declared the public URL; we no longer need to guess from proxy
headers, and stacking the prefix on top would double-prefix the
common case where the prefix is already baked into public_url.

When unset, the existing proxy_headers + X-Forwarded-Prefix
reconstruction runs untouched. Existing Fly.io deploys continue to
work without configuration — this is purely additive.

Precedence mirrors dashboard.oauth.client_id:

  env (non-empty) > config.yaml > reconstructed from request

Implementation:

  - hermes_cli/config.py: add dashboard.public_url to DEFAULT_CONFIG
    with a multi-paragraph doc comment explaining the use case,
    the X-Forwarded-Prefix interaction, and the validation rules.
  - hermes_cli/dashboard_auth/prefix.py: factored out the existing
    _REJECT_CHARS frozenset, added _normalise_public_url() validator
    (requires http/https scheme + non-empty host + no header-injection
    chars), _load_dashboard_section() loader (robust to load_config
    raising, non-dict shapes), and resolve_public_url() entry point
    with the env-overrides-config precedence. A malformed value
    silently falls through to ""; the caller treats "" as "reconstruct
    from request" so a typo never breaks the login flow.
  - hermes_cli/dashboard_auth/routes.py: rewrite _redirect_uri()
    docstring to spell out the three resolution tiers; add the
    public_url short-circuit before the existing X-Forwarded-Prefix
    splicing. Source-level comment notes that X-Forwarded-Prefix is
    intentionally ignored when public_url is set so a future reader
    doesn't try to "fix" the missing prefix layering.
  - cli-config.yaml.example: extend the existing dashboard section
    with a public_url block.
  - website/docs/user-guide/features/web-dashboard.md: new "Public
    URL override" section between the provider configuration and
    the OAuth flow walkthrough. Documents the env-vs-config table,
    the validation rules, and the `http://` `public_url` ↔ Secure
    cookie footgun.

Test coverage — new TestPublicUrlOverride class (8 tests):

  - env var overrides request reconstruction (the primary motivating
    case)
  - config.yaml used when env unset
  - env wins over config (precedence pin)
  - public_url with a path prefix already baked in (the Q1-a case the
    user explicitly chose)
  - public_url suppresses X-Forwarded-Prefix layering (defends
    against the double-prefix bug)
  - trailing slash stripped from public_url (no //auth/callback)
  - malformed public_url falls through to reconstruction (six
    hostile inputs: javascript:, ftp:, missing scheme, missing host,
    quote chars, CRLF injection)
  - empty env string doesn't shadow config.yaml entry (CI / Fly
    provisioned-but-empty secret case)

Mutation-tested: flipping the precedence in resolve_public_url() trips
exactly test_env_overrides_config_public_url; weakening the validator
(accept any scheme) trips exactly test_malformed_public_url_falls_through_to_reconstruction.
Both other tests in each pair stay green, confirming the suite
discriminates the specific regression each test pins.
2026-05-27 02:12:27 -07:00

456 lines
16 KiB
Python

"""HTTP routes for the dashboard-auth OAuth round trip.
Mounted at root (no prefix) by ``web_server.py``. The router does not
auto-gate; gating is performed by ``gated_auth_middleware``, which
allowlists everything under ``/auth/*`` and ``/api/auth/providers``.
The routes:
GET /login → server-rendered login page
GET /auth/login?provider=N → 302 to IDP, sets PKCE cookie
GET /auth/callback?code,state → completes login, sets session cookies
POST /auth/logout → clears cookies, best-effort revoke
GET /api/auth/providers → list registered providers (login bootstrap)
GET /api/auth/me → current Session as JSON (auth-required)
"""
from __future__ import annotations
import logging
import time
from typing import Any
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
from hermes_cli.dashboard_auth import (
get_provider,
list_providers,
)
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
from hermes_cli.dashboard_auth.base import (
InvalidCodeError,
ProviderError,
)
from hermes_cli.dashboard_auth.cookies import (
clear_pkce_cookie,
clear_session_cookies,
detect_https,
read_pkce_cookie,
read_session_cookies,
set_pkce_cookie,
set_session_cookies,
)
from hermes_cli.dashboard_auth.login_page import render_login_html
_log = logging.getLogger(__name__)
router = APIRouter()
def _redirect_uri(request: Request) -> str:
"""Reconstruct the absolute callback URL the IDP redirects back to.
Three resolution tiers:
1. ``HERMES_DASHBOARD_PUBLIC_URL`` env var or
``dashboard.public_url`` in config.yaml — when set, this is
the complete authority (scheme + host + optional path prefix)
and we append ``/auth/callback`` verbatim. ``X-Forwarded-Prefix``
is IGNORED on this code path because the operator has declared
the public URL — we no longer need to guess from proxy headers,
and stacking the prefix on top would double-prefix the common
case where the prefix is already baked into ``public_url``.
Relief valve for deploys behind reverse proxies whose forwarded
headers aren't reliable.
2. ``X-Forwarded-Prefix: /hermes`` (Mission Control deploys) — we
prepend the prefix to the path FastAPI's ``url_for`` produces
(it doesn't natively honour this header — it isn't part of the
Starlette/uvicorn proxy_headers set).
3. Bare ``request.url_for("auth_callback")`` — under uvicorn's
``proxy_headers=True`` this picks up the public https URL from
``X-Forwarded-Host`` plus ``X-Forwarded-Proto``. Fly.io's
default path.
"""
from urllib.parse import urlparse, urlunparse
from hermes_cli.dashboard_auth.prefix import (
prefix_from_request,
resolve_public_url,
)
# Tier 1: operator-declared public URL.
public_url = resolve_public_url()
if public_url:
# ``public_url`` is the complete authority (possibly with a
# path prefix already baked in). Append the auth callback path
# verbatim. ``resolve_public_url`` already stripped any trailing
# slash so we don't produce ``//auth/callback`` double-slashes.
return f"{public_url}/auth/callback"
# Tier 2 + 3: reconstruct from the request URL, optionally with
# X-Forwarded-Prefix layered on top of the path.
base = str(request.url_for("auth_callback"))
prefix = prefix_from_request(request)
if not prefix:
return base
parsed = urlparse(base)
return urlunparse(parsed._replace(path=f"{prefix}{parsed.path}"))
def _client_ip(request: Request) -> str:
fwd = request.headers.get("x-forwarded-for", "")
if fwd:
return fwd.split(",")[0].strip()
return request.client.host if request.client else ""
def _prefix(request: Request) -> str:
"""Resolve the X-Forwarded-Prefix header for the active request.
Local indirection so the routes pass a consistent value to the
cookie helpers (cookie name + Path attribute) and the gate's
redirect builders (login_url construction). See
``hermes_cli.dashboard_auth.prefix`` for the normalisation rules.
"""
from hermes_cli.dashboard_auth.prefix import prefix_from_request
return prefix_from_request(request)
# ---------------------------------------------------------------------------
# Public: login page (server-rendered HTML, no SPA bundle)
# ---------------------------------------------------------------------------
@router.get("/login", name="login_page")
async def login_page(request: Request) -> HTMLResponse:
# Read the ``next=`` query the gate's ``_unauth_response`` set on
# the redirect URL. Validate against the same same-origin rules the
# callback applies (defence in depth — the gate already filters,
# but /login is reachable directly too).
next_path = _validate_post_login_target(
request.query_params.get("next", "")
)
return HTMLResponse(
render_login_html(next_path=next_path),
headers={"Cache-Control": "no-store, no-cache, must-revalidate"},
)
# ---------------------------------------------------------------------------
# Public: provider list for the login-page bootstrap
# ---------------------------------------------------------------------------
@router.get("/api/auth/providers", name="auth_providers")
async def api_auth_providers() -> Any:
providers = list_providers()
if not providers:
# Q13: fail-closed when zero providers are registered.
return JSONResponse(
{"detail": "no auth providers registered"},
status_code=503,
)
return {
"providers": [
{"name": p.name, "display_name": p.display_name}
for p in providers
],
}
# ---------------------------------------------------------------------------
# Public: OAuth round trip
# ---------------------------------------------------------------------------
@router.get("/auth/login", name="auth_login")
async def auth_login(request: Request, provider: str, next: str = ""):
p = get_provider(provider)
if p is None:
raise HTTPException(
status_code=404,
detail=f"Unknown provider: {provider!r}",
)
try:
ls = p.start_login(redirect_uri=_redirect_uri(request))
except ProviderError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider,
reason="provider_unreachable",
ip=_client_ip(request),
)
raise HTTPException(
status_code=503,
detail=f"Provider unreachable: {e}",
)
audit_log(
AuditEvent.LOGIN_START,
provider=provider,
ip=_client_ip(request),
)
resp = RedirectResponse(url=ls.redirect_url, status_code=302)
# Pack the provider name into the PKCE cookie so the callback can
# find it without a separate cookie. Provider may or may not have
# already included a ``provider=`` segment.
pkce = ls.cookie_payload.get("hermes_session_pkce", "")
if "provider=" not in pkce:
pkce = f"provider={provider};{pkce}" if pkce else f"provider={provider}"
# Carry ``next=`` through the round trip in the PKCE cookie. Real
# IDPs only echo back ``code`` + ``state`` on the callback URL, so
# query-string transport would lose the value — the cookie is the
# only server-controlled channel that survives. Validate before we
# store it so an attacker who reaches /auth/login directly with
# ``next=//evil.example`` can't poison the cookie.
safe_next = _validate_post_login_target(next)
if safe_next:
from urllib.parse import quote
pkce = f"{pkce};next={quote(safe_next, safe='')}"
set_pkce_cookie(
resp, payload=pkce, use_https=detect_https(request),
prefix=_prefix(request),
)
return resp
@router.get("/auth/callback", name="auth_callback")
async def auth_callback(
request: Request,
code: str = "",
state: str = "",
error: str = "",
error_description: str = "",
):
pkce_raw = read_pkce_cookie(request)
if not pkce_raw:
audit_log(
AuditEvent.LOGIN_FAILURE,
reason="missing_pkce_cookie",
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail="Missing PKCE state cookie",
)
# Parse ``provider=...;state=...;verifier=...;next=...`` — the
# ``next`` segment is optional (only present when /auth/login was
# given a next= query). All keys live in the same flat namespace;
# ``next`` carries a URL-encoded path so it never contains ``;``.
parts = dict(
seg.split("=", 1) for seg in pkce_raw.split(";") if "=" in seg
)
provider_name = parts.get("provider", "")
expected_state = parts.get("state", "")
verifier = parts.get("verifier", "")
# Read next= from the cookie ONLY. The IDP doesn't echo next= back
# on the callback URL (it only carries ``code`` + ``state``), so any
# next= query parameter on the callback URL is attacker-controlled
# and MUST be ignored.
next_from_cookie = parts.get("next", "")
p = get_provider(provider_name)
if p is None:
raise HTTPException(
status_code=400,
detail=f"Unknown provider in cookie: {provider_name!r}",
)
if error:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="idp_error",
error=error,
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail=f"OAuth error from provider: {error} ({error_description})",
)
if not state or state != expected_state:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="state_mismatch",
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail="OAuth state mismatch (CSRF check failed)",
)
try:
session = p.complete_login(
code=code,
state=state,
code_verifier=verifier,
redirect_uri=_redirect_uri(request),
)
except InvalidCodeError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="invalid_code",
ip=_client_ip(request),
)
raise HTTPException(status_code=400, detail=f"Invalid code: {e}")
except ProviderError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="provider_unreachable",
ip=_client_ip(request),
)
raise HTTPException(
status_code=503,
detail=f"Provider unreachable: {e}",
)
audit_log(
AuditEvent.LOGIN_SUCCESS,
provider=provider_name,
user_id=session.user_id,
email=session.email,
org_id=session.org_id,
ip=_client_ip(request),
)
expires_in = max(60, session.expires_at - int(time.time()))
# Honour the ``next=`` value the gate's _unauth_response set in the
# /login redirect URL and that /auth/login persisted into the PKCE
# cookie. We re-validate against the same-origin rules here — the
# cookie is server-set so this is defence in depth, but a regression
# that lets attacker-controlled bytes into the cookie would otherwise
# produce an open redirect.
landing = _validate_post_login_target(next_from_cookie) or "/"
resp = RedirectResponse(url=landing, status_code=302)
set_session_cookies(
resp,
access_token=session.access_token,
refresh_token=session.refresh_token,
access_token_expires_in=expires_in,
use_https=detect_https(request),
prefix=_prefix(request),
)
clear_pkce_cookie(resp, prefix=_prefix(request))
return resp
def _validate_post_login_target(raw: str) -> str:
"""Return ``raw`` if it's a safe same-origin path, else empty string.
The ``next`` query param survives a full OAuth round trip — the gate
encodes it into the /login redirect, the login page emits it back into
/auth/login, and the IDP preserves it across /authorize/callback. We
have to re-validate here because the value came back in via the
URL (an attacker could craft a /auth/callback URL with their own
``next=https://evil.example``).
"""
if not raw:
return ""
from urllib.parse import unquote
decoded = unquote(raw)
if not decoded.startswith("/") or decoded.startswith("//"):
return ""
# Don't loop back to login pages or auth flow.
if any(
decoded == p or decoded.startswith(p)
for p in ("/login", "/auth/", "/api/auth/")
):
return ""
return decoded
@router.post("/auth/logout", name="auth_logout")
async def auth_logout(request: Request):
_at, rt = read_session_cookies(request)
if rt:
# Best-effort revoke. Try every provider so a session minted by
# any registered provider is revoked correctly. Failures are
# logged but never raised.
for provider in list_providers():
try:
provider.revoke_session(refresh_token=rt)
except Exception as e: # noqa: BLE001 — best-effort
_log.warning(
"dashboard-auth: revoke on %r failed: %s",
provider.name, e,
)
sess = getattr(request.state, "session", None)
audit_log(
AuditEvent.LOGOUT,
provider=(sess.provider if sess else "unknown"),
user_id=(sess.user_id if sess else ""),
ip=_client_ip(request),
)
prefix = _prefix(request)
resp = RedirectResponse(url=f"{prefix}/login", status_code=302)
clear_session_cookies(resp, prefix=prefix)
clear_pkce_cookie(resp, prefix=prefix)
return resp
# ---------------------------------------------------------------------------
# Auth-required: identity probe for the SPA
# ---------------------------------------------------------------------------
@router.get("/api/auth/me", name="auth_me")
async def api_auth_me(request: Request):
"""Return the verified session as JSON. Auth-required (gate enforces)."""
sess = getattr(request.state, "session", None)
if sess is None:
raise HTTPException(status_code=401, detail="Unauthorized")
return {
"user_id": sess.user_id,
"email": sess.email,
"display_name": sess.display_name,
"org_id": sess.org_id,
"provider": sess.provider,
"expires_at": sess.expires_at,
}
# ---------------------------------------------------------------------------
# Auth-required: WS upgrade ticket (Phase 5)
# ---------------------------------------------------------------------------
@router.post("/api/auth/ws-ticket", name="auth_ws_ticket")
async def api_auth_ws_ticket(request: Request):
"""Mint a short-lived single-use ticket for the authenticated session.
Browsers cannot set ``Authorization`` on a WebSocket upgrade, so in
gated mode the SPA POSTs this endpoint to get a ``?ticket=`` value to
append to ``/api/pty``, ``/api/ws``, ``/api/pub``, or ``/api/events``.
The ticket has a 30-second TTL and is single-use. Calling this endpoint
multiple times in quick succession (e.g. one ticket per WS) is the
expected pattern.
"""
sess = getattr(request.state, "session", None)
if sess is None:
# Middleware should already have rejected, but check defensively.
raise HTTPException(status_code=401, detail="Unauthorized")
# Import here so the routes module stays usable in test contexts that
# don't load the ticket store.
from hermes_cli.dashboard_auth.ws_tickets import TTL_SECONDS, mint_ticket
ticket = mint_ticket(user_id=sess.user_id, provider=sess.provider)
audit_log(
AuditEvent.WS_TICKET_MINTED,
provider=sess.provider,
user_id=sess.user_id,
ip=_client_ip(request),
)
return {"ticket": ticket, "ttl_seconds": TTL_SECONDS}