hermes-agent/hermes_cli/dashboard_auth/cookies.py
Ben b26d81d536 feat(dashboard-auth): honour X-Forwarded-Prefix + __Host-/__Secure- cookies
Mission-control style deploys reverse-proxy the dashboard at a path
prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject
X-Forwarded-Prefix: /hermes on every request. The SPA mount already
honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__,
but the OAuth gate didn't:

  1. The gate's Location: header to /login and the 401 envelope's
     login_url were built bare ("/login?next=..."). Under a /hermes
     prefix the browser follows that to mission-control.tilos.com/login
     which the proxy doesn't route to the dashboard.
  2. _redirect_uri (the OAuth callback URL handed to the IDP) used
     request.url_for() which doesn't honour X-Forwarded-Prefix
     (Starlette/uvicorn only proxy_headers Host + Proto + For). The
     IDP redirects back to /auth/callback instead of /hermes/auth/
     callback → 404 in the user's browser.
  3. Cookies were set with Path=/ which leaks them to other apps on
     the same origin and won't be sent back on requests under the
     prefix in the first place.

Fix threads the normalised prefix through every boundary:

  * New hermes_cli/dashboard_auth/prefix.py — single source of truth
    for X-Forwarded-Prefix parsing. web_server._normalise_prefix
    becomes a re-export so the SPA mount, the gate, and the cookies
    helper all agree.
  * middleware._unauth_response builds login_url = f"{prefix}/login".
  * routes._redirect_uri splices the prefix into the path component
    of the IDP-bound URL (with full validation of the header).
  * cookies.{set,clear}_{session,pkce}_cookie now take prefix="".
    Path attribute switches to /hermes when set; cookie name switches
    name variant (see below). Every caller passes the request's
    normalised prefix.

Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt
the __Host- / __Secure- cookie name prefixes per draft-west-cookie-
prefixes. The variant is selected from (use_https, prefix):

  * Loopback HTTP → bare "hermes_session_at" (both prefixes require
    Secure, incompatible with HTTP).
  * HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at".
    Strongest spec: bound to exact origin, no Domain attribute, Secure
    required.
  * HTTPS, behind a proxy prefix (Path=/hermes) →
    "__Secure-hermes_session_at". __Host- forbids Path != "/"; the
    explicit Path=/hermes covers same-origin app isolation.

Setter and reader BOTH consult the prefix because the cookie *name*
changes — a reader that looked up the bare name when the setter wrote
__Secure- would never find the value. The reader falls back across
all three variants so a request whose shape changed mid-session (e.g.
post-deploy from no-prefix to /hermes) still picks up the existing
cookie until it expires.

Test coverage:

  - tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests
    pinning:
      • Location: /hermes/login on the gate's HTML redirect
      • 401 envelope login_url carries the prefix
      • Malformed X-Forwarded-Prefix is ignored (header-injection
        defence; the script-tag value is normalised to empty string)
      • _redirect_uri splices /hermes into the path (the property
        that prevents the IDP-returns-to-404 failure)
      • PKCE cookie uses Path=/hermes + __Secure- when proxied
      • Session cookies use __Host- when direct, __Secure- when
        proxied, bare on loopback HTTP
      • End-to-end round trip with hand-managed PKCE cookie carriage
        (TestClient can't simulate a Path=/hermes cookie automatically)
  - tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin
    each (use_https, prefix) shape produces its expected cookie name,
    plus reader-side coverage that __Host- and __Secure- variants are
    both recognised.
  - Existing tests across middleware / 401-reauth / etc. updated to
    match the new cookie names (substring contains instead of
    startswith).

Mutation-tested: reverting _unauth_response to build the bare
"/login" URL trips exactly the two tests that pin the prefix
carriage, confirming the suite discriminates the regression.
2026-05-27 02:12:27 -07:00

234 lines
9 KiB
Python

"""Cookie helpers for dashboard auth.
Three cookies in play:
- hermes_session_at: the OAuth access token
(HttpOnly, lifetime = token TTL)
- hermes_session_rt: the OAuth refresh token
(HttpOnly, lifetime = 30 days)
**DEPRECATED in OAuth contract v1** — Nous Portal
does not issue refresh tokens; we keep the cookie
name and clear semantics for forward compatibility
and to flush stale cookies from old browsers.
- hermes_session_pkce: short-lived PKCE state + CSRF nonce + provider
hint (HttpOnly, lifetime = 10 minutes)
All three are ``SameSite=Lax`` (browser will send on cross-site GET
top-level navigation, which we need for the IDP redirect back to
``/auth/callback``) and live under the prefix's Path. ``Secure`` is set
ONLY when the dashboard was reached over HTTPS — detected via the
request URL scheme, which honours ``X-Forwarded-Proto`` upstream of
Fly's TLS terminator when uvicorn is configured with
``proxy_headers=True``. Loopback dev traffic is always HTTP so
``Secure`` would lock the cookies out of the browser.
Cookie prefix selection (browser hardening per
https://datatracker.ietf.org/doc/html/draft-west-cookie-prefixes):
* Loopback HTTP — bare name. ``__Host-`` / ``__Secure-`` require
``Secure``, which is incompatible with HTTP.
* Gated HTTPS, direct deploy (Path=/) — ``__Host-`` prefix. Binds the
cookie to the exact origin (no Domain attribute) — strongest spec
guarantee.
* Gated HTTPS, behind a reverse-proxy prefix (Path=/hermes) —
``__Secure-`` prefix. ``__Host-`` is disallowed when Path != "/";
``__Secure-`` keeps the Secure-required hardening without the
Path constraint, and the explicit ``Path=/hermes`` covers
same-origin app isolation.
The setters and readers BOTH consult the active prefix because the
cookie *name* changes — a reader that looked up the bare name when the
setter wrote ``__Secure-hermes_session_at`` would never find the value.
.. deprecated:: contract v1
``set_session_cookies`` accepts ``refresh_token=""`` (the contract-v1
default) and silently skips writing the RT cookie in that case.
``clear_session_cookies`` still emits a Max-Age=0 deletion for the RT
cookie so users carrying a stale cookie from an earlier deployment get
it cleared on logout / session expiry. The full refresh-flow machinery
was rewritten as "401 → redirect to /login" in Phase 6.
"""
from __future__ import annotations
from typing import Optional, Tuple
from fastapi import Request
from fastapi.responses import Response
# Bare cookie names — the request-scoped ``_resolved_name`` helper
# decides whether to prepend ``__Host-`` / ``__Secure-`` based on the
# request's HTTPS + prefix combination.
SESSION_AT_COOKIE = "hermes_session_at"
SESSION_RT_COOKIE = "hermes_session_rt"
PKCE_COOKIE = "hermes_session_pkce"
# Possible name variants we may have to read back. Sorted so most-strict
# wins on iteration when both happen to be present (shouldn't happen in
# practice — a single request emits exactly one variant).
_NAME_VARIANTS = ("__Host-", "__Secure-", "")
# 30 days — matches Portal's REFRESH_TOKEN_TTL_SECONDS
_RT_MAX_AGE = 30 * 24 * 60 * 60
_PKCE_MAX_AGE = 10 * 60
def _resolved_name(bare: str, *, use_https: bool, prefix: str) -> str:
"""Pick the cookie-prefix variant for the active request shape.
See module docstring for the prefix selection rules. Mismatch
between setter and reader would silently break sessions, so this
function is the single source of truth for naming.
"""
if not use_https:
return bare
if prefix:
# Path != "/" forbids __Host-; fall back to __Secure-.
return f"__Secure-{bare}"
return f"__Host-{bare}"
def _cookie_path(prefix: str) -> str:
"""Cookie ``Path`` attribute for the active deploy shape.
Under ``X-Forwarded-Prefix: /hermes`` we want ``Path=/hermes`` so:
a) the browser sends the cookie back on requests under the prefix
(browsers omit the cookie if request path doesn't start with
Path);
b) the cookie doesn't leak to other apps on the same origin
(``mission-control.tilos.com/billing/...``).
Direct-deploy (no proxy prefix) gets ``Path=/``.
"""
return prefix if prefix else "/"
def _common_attrs(*, use_https: bool, prefix: str) -> dict:
attrs: dict = {
"httponly": True,
"samesite": "lax",
"path": _cookie_path(prefix),
}
if use_https:
attrs["secure"] = True
return attrs
def set_session_cookies(
response: Response,
*,
access_token: str,
refresh_token: str,
access_token_expires_in: int,
use_https: bool,
prefix: str = "",
) -> None:
"""Set the session cookies on the response.
``access_token_expires_in`` is in seconds. Use the provider's reported
TTL for the access token.
``refresh_token`` is accepted for backward / forward compatibility but
SKIPPED when empty — Nous Portal contract v1 issues no refresh tokens
so a ``Session.refresh_token == ""`` from the provider means we don't
persist anything. If a future contract revision starts emitting refresh
tokens, this helper will write the RT cookie again with no other change.
``prefix`` is the normalised X-Forwarded-Prefix value (e.g. ``/hermes``)
or ``""`` for a direct deploy. It influences both the cookie name
(``__Host-`` vs ``__Secure-`` vs bare) and the ``Path`` attribute.
"""
response.set_cookie(
_resolved_name(SESSION_AT_COOKIE, use_https=use_https, prefix=prefix),
access_token,
max_age=access_token_expires_in,
**_common_attrs(use_https=use_https, prefix=prefix),
)
# Contract v1: empty refresh token means "don't persist RT cookie".
# Keeping a literal empty-value cookie around would be dead state at
# best, attack surface at worst.
if refresh_token:
response.set_cookie(
_resolved_name(SESSION_RT_COOKIE, use_https=use_https, prefix=prefix),
refresh_token,
max_age=_RT_MAX_AGE,
**_common_attrs(use_https=use_https, prefix=prefix),
)
def clear_session_cookies(response: Response, *, prefix: str = "") -> None:
"""Emit Max-Age=0 deletions for both session cookies.
To delete a cookie reliably the deletion's ``Path`` must match the
set path AND the cookie name must match the variant the setter used.
We don't know which variant was originally set (cookie prefix
depends on the request that set it), so we emit deletions for every
plausible variant under the active path.
"""
path = _cookie_path(prefix)
for variant in _NAME_VARIANTS:
response.set_cookie(
f"{variant}{SESSION_AT_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
response.set_cookie(
f"{variant}{SESSION_RT_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
def set_pkce_cookie(
response: Response, *, payload: str, use_https: bool, prefix: str = "",
) -> None:
response.set_cookie(
_resolved_name(PKCE_COOKIE, use_https=use_https, prefix=prefix),
payload,
max_age=_PKCE_MAX_AGE,
**_common_attrs(use_https=use_https, prefix=prefix),
)
def clear_pkce_cookie(response: Response, *, prefix: str = "") -> None:
path = _cookie_path(prefix)
for variant in _NAME_VARIANTS:
response.set_cookie(
f"{variant}{PKCE_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
def _read_with_fallback(
request: Request, bare_name: str,
) -> Optional[str]:
"""Read a cookie by checking every prefix variant in order.
The setter chooses one variant based on the active request shape;
the reader doesn't know which one fired (the request that READS
the cookie may not be the same shape as the request that SET it
in pathological cases). Trying all three guarantees we find it.
"""
for variant in _NAME_VARIANTS:
value = request.cookies.get(f"{variant}{bare_name}")
if value is not None:
return value
return None
def read_session_cookies(request: Request) -> Tuple[Optional[str], Optional[str]]:
"""Returns (access_token, refresh_token), either may be None."""
at = _read_with_fallback(request, SESSION_AT_COOKIE)
rt = _read_with_fallback(request, SESSION_RT_COOKIE)
return at, rt
def read_pkce_cookie(request: Request) -> Optional[str]:
return _read_with_fallback(request, PKCE_COOKIE)
def detect_https(request: Request) -> bool:
"""Decide whether to set the ``Secure`` cookie flag.
Reads ``request.url.scheme`` — under uvicorn's ``proxy_headers=True``
(which start_server enables when the gate is active), this honours
``X-Forwarded-Proto`` from Fly's TLS terminator. Loopback traffic is
always HTTP so this returns False there.
"""
return request.url.scheme == "https"