hermes-agent/hermes_cli/dashboard_auth/cookies.py
Ben f5ecbe1ec6 feat(dashboard): auto-initiate portal SSO redirect on unauthenticated load
When the dashboard gateway has no local session cookie, it rendered a
click-through /login interstitial — even though the Nous portal's
/oauth/authorize auto-approves any current member of the dashboard's org
and is a silent 302 when the user already holds a portal session. For the
common case (clicking a hosted-agent dashboard link while signed in to the
portal) that interstitial click is pure friction.

This makes the gate auto-initiate the OAuth redirect on an unauthenticated
HTML document load instead of rendering the interstitial, when exactly one
interactive provider is registered. A one-shot loop-guard cookie
(hermes_sso_attempt, 60s TTL) ensures that a genuinely absent portal
session (the portal bounces back still-unauthenticated) falls back to the
/login page after exactly one bounce rather than ping-ponging forever. The
marker is cleared on a successful callback and whenever the gate falls back
to /login.
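A framework-free sketch of the loop guard described above, under stated assumptions. `decide`, `simulate` and `Decision` are hypothetical names, not the gate's real API. Cookies are modeled as a plain dict and the 60s TTL is not modeled. With zero or several providers the sketch simply renders /login (an assumption for the zero case). It shows that an absent portal session produces exactly one bounce before /login.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model of the gate's choice for an unauthenticated HTML
# document load. Not the real gate: names and shapes are illustrative.
MARKER = "hermes_sso_attempt"  # bare cookie name of the loop-guard marker


@dataclass
class Decision:
    action: str  # "redirect_sso" or "render_login"
    set_cookies: Dict[str, str] = field(default_factory=dict)
    clear_cookies: List[str] = field(default_factory=list)


def decide(jar: Dict[str, str], interactive_providers: int) -> Decision:
    if interactive_providers != 1:
        # No single provider to auto-pick (assumed): render /login as before.
        return Decision("render_login")
    if MARKER in jar:
        # Already bounced once and still unauthenticated: stop, show /login.
        return Decision("render_login", clear_cookies=[MARKER])
    # First unauthenticated load: start the OAuth redirect, drop the breadcrumb.
    return Decision("redirect_sso", set_cookies={MARKER: "1"})


def simulate(portal_session: bool, max_loads: int = 5) -> List[str]:
    """Replay document loads against a toy cookie jar; return the actions."""
    jar: Dict[str, str] = {}
    trace: List[str] = []
    for _ in range(max_loads):
        d = decide(jar, interactive_providers=1)
        trace.append(d.action)
        jar.update(d.set_cookies)
        for name in d.clear_cookies:
            jar.pop(name, None)
        if d.action == "render_login":
            break  # a human click is needed from here on
        if portal_session:
            jar.pop(MARKER, None)  # a successful callback clears the marker
            trace.append("authenticated")
            break
        # Portal bounced back unauthenticated: the next iteration is the reload.
    return trace


print(simulate(portal_session=True))   # ['redirect_sso', 'authenticated']
print(simulate(portal_session=False))  # ['redirect_sso', 'render_login']
```

The guard is stateless on the server. Its only memory is the marker cookie, which is why it must be cleared on success and on fallback.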

Security: this removes a human CLICK, not a security check. The redirect
lands on the existing /auth/login route and runs the unchanged PKCE
auth-code flow; token verification, audience checks, redirect-URI match,
and org-membership checks are all untouched. /api/* fetches still get the
401 JSON envelope (never a 302, which fetch() would follow opaquely), and
with two or more providers the /login chooser still renders.
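A hedged sketch of how a gate could tell an HTML document load from an /api/* fetch. `is_html_document_load` is a hypothetical name. Several things are assumptions, not necessarily what the gate checks:

- using the Fetch Metadata `Sec-Fetch-Dest` header, with an `Accept: text/html` fallback;
- lower-cased header keys;
- paths relative to the app root.

Only requests classified as document loads would be eligible for the auto-SSO 302. Everything else keeps the 401 JSON envelope.

```python
from typing import Mapping


def is_html_document_load(path: str, headers: Mapping[str, str]) -> bool:
    """Hypothetical classifier; ``headers`` keys are assumed lower-cased."""
    if path == "/api" or path.startswith("/api/"):
        # API calls never get a redirect: fetch() would follow a 302 opaquely.
        return False
    dest = headers.get("sec-fetch-dest")
    if dest is not None:
        # Fetch Metadata: browsers label top-level navigations "document".
        return dest == "document"
    # Clients without Fetch Metadata: fall back to content negotiation.
    return "text/html" in headers.get("accept", "")


print(is_html_document_load("/", {"sec-fetch-dest": "document"}))  # True
print(is_html_document_load("/api/status", {"accept": "text/html"}))  # False
```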

Phase 1 of the cloud-auto-discovery work.
2026-06-29 04:25:18 -07:00


"""Cookie helpers for dashboard auth.

Three session cookies are in play (plus the auto-SSO loop-guard marker
``hermes_sso_attempt``; see ``SSO_ATTEMPT_COOKIE`` below):

- hermes_session_at:   the OAuth access token
                       (HttpOnly, lifetime = token TTL, ~15 min)
- hermes_session_rt:   the OAuth refresh token
                       (HttpOnly, ROTATING + reuse-detected; the token
                       itself lives 24h, while the cookie's Max-Age is
                       30 days, see ``_RT_MAX_AGE``)
                       Nous Portal issues a rotating refresh token for the
                       dashboard auth-code grant (Portal NAS #293 / hermes
                       #37247). ``set_session_cookies`` writes this cookie
                       whenever the provider returns a non-empty
                       ``refresh_token``; the middleware uses it to rotate a
                       fresh access token transparently on AT expiry. A
                       provider that omits the refresh token (empty string)
                       degrades gracefully to access-token-only sessions:
                       the RT cookie is simply not written.
- hermes_session_pkce: short-lived PKCE state + CSRF nonce + provider
                       hint (HttpOnly, lifetime = 10 minutes)

All of these cookies, the marker included, are ``SameSite=Lax`` (the
browser sends them on a cross-site top-level GET navigation, which we
need for the IDP redirect back to ``/auth/callback``) and live under the
prefix's Path. ``Secure`` is set ONLY when the dashboard was reached over
HTTPS. This is detected via the request URL scheme, which honours the
``X-Forwarded-Proto`` header set by Fly's TLS terminator when uvicorn is
configured with ``proxy_headers=True``. Loopback dev traffic is always
HTTP, so setting ``Secure`` there would lock the cookies out of the
browser.

Cookie prefix selection (browser hardening per
https://datatracker.ietf.org/doc/html/draft-west-cookie-prefixes):

* Loopback HTTP: bare name. ``__Host-`` / ``__Secure-`` require
  ``Secure``, which is incompatible with HTTP.
* Gated HTTPS, direct deploy (Path=/): ``__Host-`` prefix. Binds the
  cookie to the exact origin (no Domain attribute), the strongest
  guarantee the spec offers.
* Gated HTTPS, behind a reverse-proxy prefix (Path=/hermes):
  ``__Secure-`` prefix. ``__Host-`` is disallowed when Path != "/";
  ``__Secure-`` keeps the Secure-required hardening without the
  Path constraint, and the explicit ``Path=/hermes`` covers
  same-origin app isolation.

The cookie *name* changes with the prefix, so setters, readers and
clearers must agree on it. Setters write the single variant chosen by
``_resolved_name`` for the active request; readers and clearers try
every variant, so a reader never misses a value that the setter wrote
as, say, ``__Secure-hermes_session_at``.

Refresh-token handling:

``set_session_cookies`` accepts ``refresh_token=""`` (provider omitted
it) and silently skips writing the RT cookie in that case, so a
refresh-token-less provider degrades to access-token-only sessions.
``clear_session_cookies`` always emits a Max-Age=0 deletion for the RT
cookie on logout / session expiry, so that a stale cookie from an
earlier deployment gets cleared. The transparent rotation flow
("expired AT + live RT → rotate server-side, else 401 → /login") lives
in ``middleware._attempt_refresh``.
"""
from __future__ import annotations
from typing import Optional, Tuple
from fastapi import Request
from fastapi.responses import Response
# Bare cookie names — the request-scoped ``_resolved_name`` helper
# decides whether to prepend ``__Host-`` / ``__Secure-`` based on the
# request's HTTPS + prefix combination.
SESSION_AT_COOKIE = "hermes_session_at"
SESSION_RT_COOKIE = "hermes_session_rt"
PKCE_COOKIE = "hermes_session_pkce"
# One-shot loop-guard marker for the auto-SSO redirect (Phase 1,
# cloud-auto-discovery). Set when the gate auto-initiates the portal OAuth
# redirect on an unauthenticated document load; its mere PRESENCE on the next
# unauthenticated load tells the gate "we already bounced once" so a genuinely
# absent portal session degrades to the /login page instead of ping-ponging.
# Carries no secret — it's a boolean breadcrumb — but is set HttpOnly/Lax/Secure
# like the others for consistency. Short TTL so a user who returns later gets a
# fresh silent attempt rather than a permanently-disabled one.
SSO_ATTEMPT_COOKIE = "hermes_sso_attempt"

# Possible name variants we may have to read back, ordered most-strict
# first so the strictest variant wins on iteration if several are present
# (which shouldn't happen in practice: a single request emits exactly one
# variant).
_NAME_VARIANTS = ("__Host-", "__Secure-", "")
# RT cookie Max-Age. Kept at 30 days as a generous upper bound on the cookie's
# browser lifetime; Portal's actual refresh-token TTL (24h, rotating) is the
# real authority — once the RT itself expires/rotates out, a refresh attempt
# returns 400 → RefreshExpiredError → clean re-login, regardless of how long
# the cookie lingers. (Not tightened to 24h here to avoid coupling the cookie
# lifetime to a server-side TTL that can change independently; revisit if the
# stale-cookie refresh churn ever matters.)
_RT_MAX_AGE = 30 * 24 * 60 * 60
_PKCE_MAX_AGE = 10 * 60
# Auto-SSO loop-guard marker TTL. Just long enough to cover one redirect
# round trip to the portal and back (a few seconds in practice); kept at 60s
# so a slow portal hop or a manual back-button still trips the guard, while a
# user returning minutes later gets a fresh silent attempt rather than being
# stuck on /login forever. The marker is also cleared explicitly on a
# successful callback and whenever the gate falls back to /login.
_SSO_ATTEMPT_MAX_AGE = 60


def _resolved_name(bare: str, *, use_https: bool, prefix: str) -> str:
    """Pick the cookie-prefix variant for the active request shape.

    See module docstring for the prefix selection rules. Mismatch
    between setter and reader would silently break sessions, so this
    function is the single source of truth for naming.
    """
    if not use_https:
        return bare
    if prefix:
        # Path != "/" forbids __Host-; fall back to __Secure-.
        return f"__Secure-{bare}"
    return f"__Host-{bare}"
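
# Illustrative name resolution for SESSION_AT_COOKIE. Each line follows
# directly from the branches of _resolved_name above; "/hermes" stands in
# for any non-empty prefix:
#   use_https=False, prefix=""        -> "hermes_session_at"
#   use_https=False, prefix="/hermes" -> "hermes_session_at"
#   use_https=True,  prefix=""        -> "__Host-hermes_session_at"
#   use_https=True,  prefix="/hermes" -> "__Secure-hermes_session_at"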


def _cookie_path(prefix: str) -> str:
    """Cookie ``Path`` attribute for the active deploy shape.

    Under ``X-Forwarded-Prefix: /hermes`` we want ``Path=/hermes`` so:

    a) the browser sends the cookie back on requests under the prefix
       (browsers omit the cookie if the request path doesn't start with
       Path);
    b) the cookie doesn't leak to other apps on the same origin
       (``mission-control.tilos.com/billing/...``).

    Direct-deploy (no proxy prefix) gets ``Path=/``.
    """
    return prefix if prefix else "/"
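

def _common_attrs(*, use_https: bool, prefix: str) -> dict:
    """Attributes shared by every cookie the setters in this module write."""
    attrs: dict = {
        "httponly": True,
        "samesite": "lax",
        "path": _cookie_path(prefix),
    }
    # Secure only over HTTPS: a Secure cookie is unusable on loopback HTTP.
    if use_https:
        attrs["secure"] = True
    return attrs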


def set_session_cookies(
    response: Response,
    *,
    access_token: str,
    refresh_token: str,
    access_token_expires_in: int,
    use_https: bool,
    prefix: str = "",
) -> None:
    """Set the session cookies on the response.

    ``access_token_expires_in`` is in seconds. Use the provider's reported
    TTL for the access token.

    ``refresh_token`` is written as the RT cookie when non-empty. Nous Portal
    issues a 24h rotating refresh token (hermes #37247); a provider that
    omits it returns ``Session.refresh_token == ""`` and we simply don't
    persist the RT cookie, so the session then behaves as access-token-only
    until the AT expires. No other branch changes between the two cases.

    ``prefix`` is the normalised X-Forwarded-Prefix value (e.g. ``/hermes``)
    or ``""`` for a direct deploy. It influences both the cookie name
    (``__Host-`` vs ``__Secure-`` vs bare) and the ``Path`` attribute.
    """
    response.set_cookie(
        _resolved_name(SESSION_AT_COOKIE, use_https=use_https, prefix=prefix),
        access_token,
        max_age=access_token_expires_in,
        **_common_attrs(use_https=use_https, prefix=prefix),
    )
    # Contract v1: empty refresh token means "don't persist RT cookie".
    # Keeping a literal empty-value cookie around would be dead state at
    # best, attack surface at worst.
    if refresh_token:
        response.set_cookie(
            _resolved_name(SESSION_RT_COOKIE, use_https=use_https, prefix=prefix),
            refresh_token,
            max_age=_RT_MAX_AGE,
            **_common_attrs(use_https=use_https, prefix=prefix),
        )


def clear_session_cookies(response: Response, *, prefix: str = "") -> None:
    """Emit Max-Age=0 deletions for both session cookies.

    To delete a cookie reliably the deletion's ``Path`` must match the
    set path AND the cookie name must match the variant the setter used.
    We don't know which variant was originally set (the cookie prefix
    depends on the request that set it), so we emit deletions for every
    plausible variant under the active path.
    """
    path = _cookie_path(prefix)
    for variant in _NAME_VARIANTS:
        # Browsers ignore any Set-Cookie for a ``__Host-`` / ``__Secure-``
        # name that lacks ``Secure``, deletions included, so the prefixed
        # deletions must carry it. Those variants only exist on HTTPS
        # origins, where ``Secure`` is accepted; the bare name needs no flag.
        secure = bool(variant)
        response.set_cookie(
            f"{variant}{SESSION_AT_COOKIE}", "", max_age=0,
            path=path, httponly=True, samesite="lax", secure=secure,
        )
        response.set_cookie(
            f"{variant}{SESSION_RT_COOKIE}", "", max_age=0,
            path=path, httponly=True, samesite="lax", secure=secure,
        )


def set_pkce_cookie(
    response: Response, *, payload: str, use_https: bool, prefix: str = "",
) -> None:
    """Store the short-lived PKCE payload (state, CSRF nonce, provider hint)."""
    response.set_cookie(
        _resolved_name(PKCE_COOKIE, use_https=use_https, prefix=prefix),
        payload,
        max_age=_PKCE_MAX_AGE,
        **_common_attrs(use_https=use_https, prefix=prefix),
    )
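

def clear_pkce_cookie(response: Response, *, prefix: str = "") -> None:
    """Emit Max-Age=0 deletions for the PKCE cookie, every name variant."""
    path = _cookie_path(prefix)
    for variant in _NAME_VARIANTS:
        # Browsers ignore a ``__Host-`` / ``__Secure-`` Set-Cookie (a deletion
        # included) unless it carries ``Secure``; the bare name needs no flag.
        response.set_cookie(
            f"{variant}{PKCE_COOKIE}", "", max_age=0,
            path=path, httponly=True, samesite="lax", secure=bool(variant),
        )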


def _read_with_fallback(
    request: Request, bare_name: str,
) -> Optional[str]:
    """Read a cookie by checking every prefix variant in order.

    The setter chooses one variant based on the active request shape;
    the reader doesn't know which one fired (the request that READS
    the cookie may not be the same shape as the request that SET it
    in pathological cases). Trying all three guarantees we find it.
    """
    for variant in _NAME_VARIANTS:
        value = request.cookies.get(f"{variant}{bare_name}")
        if value is not None:
            return value
    return None


def read_session_cookies(request: Request) -> Tuple[Optional[str], Optional[str]]:
    """Returns (access_token, refresh_token), either may be None."""
    at = _read_with_fallback(request, SESSION_AT_COOKIE)
    rt = _read_with_fallback(request, SESSION_RT_COOKIE)
    return at, rt


def read_pkce_cookie(request: Request) -> Optional[str]:
    """Return the PKCE payload if present (any variant), else None."""
    return _read_with_fallback(request, PKCE_COOKIE)


def set_sso_attempt_cookie(
    response: Response, *, use_https: bool, prefix: str = "",
) -> None:
    """Set the one-shot auto-SSO loop-guard marker (Phase 1).

    Written by the gate the moment it auto-initiates the portal OAuth
    redirect on an unauthenticated document load. The value is a constant
    (``"1"``); only its presence matters. Short Max-Age so a stale marker
    can't permanently suppress a future silent attempt.
    """
    response.set_cookie(
        _resolved_name(SSO_ATTEMPT_COOKIE, use_https=use_https, prefix=prefix),
        "1",
        max_age=_SSO_ATTEMPT_MAX_AGE,
        **_common_attrs(use_https=use_https, prefix=prefix),
    )


def read_sso_attempt_cookie(request: Request) -> Optional[str]:
    """Return the auto-SSO marker value if present (any variant), else None."""
    return _read_with_fallback(request, SSO_ATTEMPT_COOKIE)


def clear_sso_attempt_cookie(response: Response, *, prefix: str = "") -> None:
    """Emit Max-Age=0 deletions for the auto-SSO marker, every name variant.

    Called on a successful callback and whenever the gate falls back to
    /login, so the marker never lingers to suppress a later silent attempt.
    """
    path = _cookie_path(prefix)
    for variant in _NAME_VARIANTS:
        # Browsers ignore a ``__Host-`` / ``__Secure-`` Set-Cookie (a deletion
        # included) unless it carries ``Secure``; the bare name needs no flag.
        response.set_cookie(
            f"{variant}{SSO_ATTEMPT_COOKIE}", "", max_age=0,
            path=path, httponly=True, samesite="lax", secure=bool(variant),
        )


def detect_https(request: Request) -> bool:
    """Decide whether to set the ``Secure`` cookie flag.

    Reads ``request.url.scheme``. Under uvicorn's ``proxy_headers=True``
    (which start_server enables when the gate is active), this honours
    ``X-Forwarded-Proto`` from Fly's TLS terminator. Loopback traffic is
    always HTTP, so this returns False there.
    """
    return request.url.scheme == "https"