Commit graph

8 commits

Author SHA1 Message Date
Ben
b26d81d536 feat(dashboard-auth): honour X-Forwarded-Prefix + __Host-/__Secure- cookies
Mission-control style deploys reverse-proxy the dashboard at a path
prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject
X-Forwarded-Prefix: /hermes on every request. The SPA mount already
honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__,
but the OAuth gate didn't:

  1. The gate's Location: header to /login and the 401 envelope's
     login_url were built bare ("/login?next=..."). Under a /hermes
     prefix the browser follows that to mission-control.tilos.com/login
     which the proxy doesn't route to the dashboard.
  2. _redirect_uri (the OAuth callback URL handed to the IDP) used
     request.url_for() which doesn't honour X-Forwarded-Prefix
     (Starlette/uvicorn only proxy_headers Host + Proto + For). The
     IDP redirects back to /auth/callback instead of /hermes/auth/
     callback → 404 in the user's browser.
  3. Cookies were set with Path=/ which leaks them to other apps on
     the same origin and won't be sent back on requests under the
     prefix in the first place.

Fix threads the normalised prefix through every boundary:

  * New hermes_cli/dashboard_auth/prefix.py — single source of truth
    for X-Forwarded-Prefix parsing. web_server._normalise_prefix
    becomes a re-export so the SPA mount, the gate, and the cookies
    helper all agree.
  * middleware._unauth_response builds login_url = f"{prefix}/login".
  * routes._redirect_uri splices the prefix into the path component
    of the IDP-bound URL (with full validation of the header).
  * cookies.{set,clear}_{session,pkce}_cookie now take prefix="".
    Path attribute switches to /hermes when set; cookie name switches
    name variant (see below). Every caller passes the request's
    normalised prefix.

Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt
the __Host- / __Secure- cookie name prefixes per draft-west-cookie-
prefixes. The variant is selected from (use_https, prefix):

  * Loopback HTTP → bare "hermes_session_at" (both prefixes require
    Secure, incompatible with HTTP).
  * HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at".
    Strongest spec: bound to exact origin, no Domain attribute, Secure
    required.
  * HTTPS, behind a proxy prefix (Path=/hermes) →
    "__Secure-hermes_session_at". __Host- forbids Path != "/"; the
    explicit Path=/hermes covers same-origin app isolation.

Setter and reader BOTH consult the prefix because the cookie *name*
changes — a reader that looked up the bare name when the setter wrote
__Secure- would never find the value. The reader falls back across
all three variants so a request whose shape changed mid-session (e.g.
post-deploy from no-prefix to /hermes) still picks up the existing
cookie until it expires.

Test coverage:

  - tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests
    pinning:
      • Location: /hermes/login on the gate's HTML redirect
      • 401 envelope login_url carries the prefix
      • Malformed X-Forwarded-Prefix is ignored (header-injection
        defence; the script-tag value is normalised to empty string)
      • _redirect_uri splices /hermes into the path (the property
        that prevents the IDP-returns-to-404 failure)
      • PKCE cookie uses Path=/hermes + __Secure- when proxied
      • Session cookies use __Host- when direct, __Secure- when
        proxied, bare on loopback HTTP
      • End-to-end round trip with hand-managed PKCE cookie carriage
        (TestClient can't simulate a Path=/hermes cookie automatically)
  - tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin
    each (use_https, prefix) shape produces its expected cookie name,
    plus reader-side coverage that __Host- and __Secure- variants are
    both recognised.
  - Existing tests across middleware / 401-reauth / etc. updated to
    match the new cookie names (substring contains instead of
    startswith).

Mutation-tested: reverting _unauth_response to build the bare
"/login" URL trips exactly the two tests that pin the prefix
carriage, confirming the suite discriminates the regression.
2026-05-27 02:12:27 -07:00
Ben
034ad95fed fix(dashboard-auth): propagate next= through login page + PKCE cookie
The gate's _unauth_response set next=<path> on the /login redirect URL,
but nothing downstream read it: render_login_html ignored next=,
auth_login dropped it, and auth_callback read next= from its own query
string — which an IDP never sets on the callback URL (real IDPs only
echo back code+state). The _validate_post_login_target plumbing in the
callback was unreachable on the happy path, so users always landed on
"/" regardless of what they originally requested.

Worse: reading next= from the callback URL was a latent open-redirect
sink, since an attacker could craft /auth/callback?...&next=/admin and
have the server honour it post-auth.

Fix carries next= through the round trip on a server-controlled channel:

  1. login_page reads request.query_params['next'] and passes it (post-
     validation) to render_login_html.
  2. render_login_html threads next= URL-encoded into each provider
     button's href, with HTML-attribute escaping as defence in depth.
  3. auth_login accepts ?next= as a query param, re-validates, and
     appends it as a fourth segment (next=<urlquoted>) in the PKCE
     cookie payload alongside provider/state/verifier.
  4. auth_callback no longer accepts a next: str = "" query param. It
     parses next= out of the PKCE cookie and validates that with the
     same same-origin rules. Any attacker-supplied ?next= on the
     callback URL is silently ignored — server-only carrier.

Test coverage adds three classes:

  - TestAuthCallbackNext drives /login → /auth/login → IDP-bounce →
    /auth/callback end-to-end without smuggling next= onto the callback
    URL (which is what the previous tests did and why they didn't
    catch the bug). Includes test_attacker_callback_next_param_is_ignored
    to pin the security property that the URL value is never read.
  - TestRenderLoginHtmlNext covers the rendering function at the
    unit boundary so a regression that drops next_path is caught
    without spinning up the full app.
  - TestAuthLoginPkceCookieNext inspects the Set-Cookie header on
    /auth/login responses so a regression in cookie encoding is caught
    without driving the full round trip.

Mutation-tested: reverting auth_callback to read next= from the URL
trips 3 of 6 TestAuthCallbackNext tests (the safe-path and attacker-
hardening ones), confirming the suite discriminates between the cookie
read and the URL read.
2026-05-27 02:12:27 -07:00
Ben
5e9308b5b8 feat(dashboard-auth): Phase 6 — 401 re-auth envelope + next= propagation
Contract V1 of nous-account-service PR #180 ships no refresh tokens, so
the original Phase 6 silent-refresh design is replaced with a thinner
'401 → redirect to /login' UX. The dashboard's gated middleware now
emits a structured envelope on any auth failure; the SPA's fetch
wrapper sees it and full-page-navigates the user through re-auth.

hermes_cli/dashboard_auth/cookies.py:
  set_session_cookies(refresh_token='') SKIPS writing the
  hermes_session_rt cookie. Forward-compat: a non-empty refresh_token
  still emits the cookie unchanged, so a future Portal contract that
  starts issuing RTs flips the persistence on with no other change.
  clear_session_cookies still emits a Max-Age=0 deletion for the RT
  cookie so stale cookies from earlier deployments get flushed on
  logout / session expiry. Deprecation marker + rationale in
  module docstring per the user's docstring-only deprecation pattern.

hermes_cli/dashboard_auth/middleware.py:
  _unauth_response now builds a structured JSON envelope for API 401s:
    { error: 'session_expired' | 'unauthenticated',
      detail: 'Unauthorized',
      reason: <internal>,
      login_url: '/login?next=<safe-path>' }
  HTML redirects also carry next= so a user landing on /sessions
  without a cookie bounces back to /sessions after re-auth.
  _safe_next_target validates same-origin: drops protocol-relative
  paths (//evil.com), absolute URLs, and any /login or /auth/* loop.
  Dead cookies are cleared on the 401 path so the browser stops
  replaying invalid tokens.

hermes_cli/dashboard_auth/routes.py:
  /auth/callback accepts next= query param and validates via
  _validate_post_login_target (same rules as the gate's
  _safe_next_target — defence-in-depth because next= survived a full
  IDP round trip and attacker-controlled state can re-enter via the
  callback URL). Open-redirect attempts land at '/' instead.

web/src/lib/api.ts:
  fetchJSON parses the 401 envelope and full-page-navigates to
  body.login_url ONLY on the known session-expiry error codes.
  Domain-level 401s (e.g. permission errors) bubble up as regular
  errors. credentials: 'include' added so cookie auth works for all
  fetches routed through this wrapper. sessionStorage.lastLocation is
  preserved for future use by AuthWidget / hermes_status.

Test files marked with pytest.mark.xdist_group so the four files that
mutate web_server.app.state.auth_required serialize onto the same xdist
worker — eliminates 'works locally, fails in CI' app-state bleed.

20 new tests in test_dashboard_auth_401_reauth.py:
  - set_session_cookies(refresh_token='') skips RT cookie
  - clear_session_cookies still emits RT deletion
  - 401 envelope shape (unauthenticated vs session_expired)
  - dead cookie cleared on invalid-token 401
  - login_url carries next= for deep paths
  - login loop avoided when path is /login/auth/api-auth
  - protocol-relative URL rejected
  - _safe_next_target unit tests (accept same-origin, reject loops/abs)
  - /auth/callback respects safe next= but rejects open redirects

2 pre-existing tests updated to accept the new /login?next=%2F shape.

Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing).
2026-05-27 02:12:27 -07:00
Ben
b69fce9c86 feat(dashboard-auth): single-use WS tickets + POST /api/auth/ws-ticket
Phase 5 task 5.1. Browsers cannot set Authorization on a WebSocket
upgrade, so in gated mode the SPA needs an alternative way to bind the
upgrade to its authenticated session.

  hermes_cli/dashboard_auth/ws_tickets.py — in-memory single-use ticket
  store with 30s TTL. Thread-safe (threading.Lock), token_urlsafe(32)
  values, ticket value truncated to 8 chars in error messages for log
  hygiene. Module-level state with _reset_for_tests() helper.

  hermes_cli/dashboard_auth/routes.py — adds POST /api/auth/ws-ticket.
  Auth-required (the gate middleware already attaches Session to
  request.state.session). Returns {ticket, ttl_seconds}; emits
  WS_TICKET_MINTED audit event with user_id + provider + ip.

  hermes_cli/dashboard_auth/audit.py — adds WS_TICKET_REJECTED enum
  value for the consume-side rejection event (wired into the WS
  endpoints in task 5.2).

11 new tests covering round-trip, single-use, TTL boundary, unknown
ticket rejection, secret-hygiene truncation in error messages, and
concurrent mint+consume from 20 threads.
2026-05-27 02:12:27 -07:00
Ben
5b17eab67a feat(dashboard-auth): auth gate middleware + /auth/* routes + /login HTML
Phase 3, Tasks 3.2 + 3.3 + 3.4. These three pieces are mutually
dependent so they land together.

middleware.py - gated_auth_middleware engages when app.state.auth_required
is True.  Allowlists /login, /auth/*, /api/auth/providers, and static
asset paths; everything else demands a valid session_at cookie.  Verifies
by trying every registered provider's verify_session in turn (multi-
provider stack); attaches verified Session to request.state.session.
Returns 401 JSON for /api/* and 302 -> /login for HTML.  ProviderError
during verify -> 503.

routes.py - APIRouter with:
  GET  /login              server-rendered HTML
  GET  /auth/login?provider=N  302 to IDP + PKCE cookie
  GET  /auth/callback?code,state  completes login, sets session cookies
  POST /auth/logout        clears cookies + best-effort revoke
  GET  /api/auth/providers public bootstrap endpoint (503 if zero)
  GET  /api/auth/me        verified session as JSON (auth-required)

login_page.py - Inline-CSS HTML template, no React, no JavaScript.

web_server.py - Mounted gated_auth_middleware between host_header and
auth_middleware (FastAPI runs middlewares in registration order: host
check -> cookie auth -> token auth).  auth_middleware short-circuits
when auth_required so cookie auth is authoritative in gated mode.
Router is included before mount_spa so the catch-all doesn't swallow
/login or /auth/*.

17 new behavioural tests; loopback regression harness still green.
2026-05-27 02:12:27 -07:00
Ben
a30c4d8ebd feat(dashboard-auth): cookie helpers for session_at/session_rt/pkce
Phase 3, Task 3.1. Three cookies:
  - hermes_session_at: OAuth access token (HttpOnly, TTL = token TTL)
  - hermes_session_rt: OAuth refresh token (HttpOnly, 30d max-age)
  - hermes_session_pkce: PKCE state + verifier + provider hint (10min)

All SameSite=Lax + Path=/. Secure flag is set ONLY when the request
scheme is https — uvicorn proxy_headers=True (enabled in gated mode at
Phase 3.5) rewrites scheme from X-Forwarded-Proto so Fly's TLS
terminator works.
2026-05-27 02:12:27 -07:00
Ben
865cae4f61 feat(dashboard-auth): json-lines audit log at $HERMES_HOME/logs/dashboard-auth.log
Phase 1, Task 1.4. Records every auth event (login start/success/failure,
logout, refresh success/failure, revoke, session verify failure, WS
ticket mint) as one JSON object per line. Token-like kwargs (access_token,
refresh_token, code, code_verifier, state, ticket, cookie, Authorization)
are dropped before serialisation so the log never contains live secrets.

Write failures log at WARNING but never raise — auth flows must not fail
because the audit logger broke.
2026-05-27 02:12:27 -07:00
Ben
2dc6d03a3d feat(dashboard-auth): define DashboardAuthProvider ABC + Session dataclass
Phase 1, Task 1.1. New package hermes_cli/dashboard_auth/ contains:

  base.py     - DashboardAuthProvider ABC with 5 abstract methods
                (start_login, complete_login, verify_session,
                refresh_session, revoke_session), Session + LoginStart
                frozen dataclasses, three exception types
                (ProviderError / InvalidCodeError / RefreshExpiredError),
                and assert_protocol_compliance() for plugins to call
                in their own tests.
  registry.py - Module-level register/get/list/clear with a lock.

Nothing reads the registry yet — Phase 2 adds the StubAuthProvider and
Phase 3 wires the gate middleware. The plugin hook lands in Task 1.3.
2026-05-27 02:12:27 -07:00