The Nous OAuth provider plugin (plugins/dashboard_auth/nous) is bundled
and auto-loaded, as before. Previously, however, it refused to register
unless BOTH HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL
were set, and the gate's fail-closed branch then told the operator to
'install the default Nous provider'. That message is misleading: the
provider IS installed; it's just unconfigured. The contract also only
really needs the per-instance client_id, since the portal URL is the
same for everyone in production.
Three changes:
1. plugins/dashboard_auth/nous/__init__.py:
   - HERMES_DASHBOARD_PORTAL_URL is now optional and defaults to
     'https://portal.nousresearch.com'. Override only for staging
     (portal.rewbs.uk) or a custom deployment. Empty string also
     falls back to the default so an empty Fly secret can't point
     the dashboard at nowhere.
   - Plugin exposes a module-level LAST_SKIP_REASON: str that the gate
     reads when no providers register. Cleared on each register() call.
     Skip reasons are human-readable and actionable
     ('HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal
     provisions this env var…').
2. plugins/dashboard_auth/nous/plugin.yaml:
   - requires_env drops HERMES_DASHBOARD_PORTAL_URL; only the client_id
     is mandatory. Description updated to reflect this.
3. hermes_cli/web_server.py:
   - When the gate fail-closes for 'no providers', it now reads each
     bundled plugin's LAST_SKIP_REASON and embeds them in the SystemExit
     message. Operator sees the specific config fix needed:
         Bundled providers reported these issues:
           • nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. …
     instead of the prior generic 'Install the default Nous provider'.
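A minimal sketch of the three changes above, not the plugin's actual
code. The env var names, the default URL, and LAST_SKIP_REASON come from
this commit; the helper names (resolve_portal_url, format_skip_reasons),
the return value of register(), and the exact message wording are
assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORTAL_URL = "https://portal.nousresearch.com"

# Module-level skip reason the gate can read when nothing registered.
LAST_SKIP_REASON: Optional[str] = None


def resolve_portal_url(env: Mapping[str, str]) -> str:
    """Unset and empty-string values both fall back to the default."""
    return env.get("HERMES_DASHBOARD_PORTAL_URL", "") or DEFAULT_PORTAL_URL


def register(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Hypothetical register(): provider config, or None when skipped."""
    global LAST_SKIP_REASON
    LAST_SKIP_REASON = None  # cleared on each register() call
    client_id = env.get("HERMES_DASHBOARD_OAUTH_CLIENT_ID", "")
    if not client_id:
        # Message wording is illustrative only.
        LAST_SKIP_REASON = "HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set."
        return None
    return {"client_id": client_id, "portal_url": resolve_portal_url(env)}


def format_skip_reasons(reasons: Mapping[str, Optional[str]]) -> str:
    """Assumed shape of the text embedded in the SystemExit message."""
    lines = ["Bundled providers reported these issues:"]
    lines += [f"  • {name}: {why}" for name, why in reasons.items() if why]
    return "\n".join(lines)
```

With only the client_id set, register() in this sketch returns a config
that points at the production portal; with nothing set it returns None
and leaves a reason behind for the gate to print.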
Tests:
- TestPluginRegister rewritten to assert the new defaults +
  LAST_SKIP_REASON contents (6 tests, +1 new for empty-string env).
- New gate test test_start_server_surfaces_nous_skip_reason_when_unconfigured.
- test_get_method_is_not_allowed widened to handle the SPA-shell 200
  path explicitly: the assertion now verifies that no JSON ticket leaks
  instead of asserting a specific status code, which covers all four of
  401/404/405/200.
Docs updated: web-dashboard.md's 'Default provider' section now shows
the env-var table with required/optional columns and embeds the
fail-closed error message verbatim so operators can match what they
see at the prompt.
Contract V1 of nous-account-service PR #180 ships no refresh tokens, so
the original Phase 6 silent-refresh design is replaced with a thinner
'401 → redirect to /login' UX. The dashboard's gated middleware now
emits a structured envelope on any auth failure; the SPA's fetch
wrapper sees it and full-page-navigates the user through re-auth.
hermes_cli/dashboard_auth/cookies.py:
set_session_cookies(refresh_token='') SKIPS writing the
hermes_session_rt cookie. Forward-compat: a non-empty refresh_token
still emits the cookie unchanged, so a future Portal contract that
starts issuing RTs flips the persistence on with no other change.
clear_session_cookies still emits a Max-Age=0 deletion for the RT
cookie so stale cookies from earlier deployments get flushed on
logout / session expiry. Deprecation marker + rationale in
module docstring per the user's docstring-only deprecation pattern.
hermes_cli/dashboard_auth/middleware.py:
_unauth_response now builds a structured JSON envelope for API 401s:
    { error: 'session_expired' | 'unauthenticated',
      detail: 'Unauthorized',
      reason: <internal>,
      login_url: '/login?next=<safe-path>' }
HTML redirects also carry next= so a user landing on /sessions
without a cookie bounces back to /sessions after re-auth.
_safe_next_target validates same-origin: drops protocol-relative
paths (//evil.com), absolute URLs, and any /login or /auth/* loop.
Dead cookies are cleared on the 401 path so the browser stops
replaying invalid tokens.
hermes_cli/dashboard_auth/routes.py:
/auth/callback accepts next= query param and validates via
_validate_post_login_target (same rules as the gate's
_safe_next_target — defence-in-depth because next= survived a full
IDP round trip and attacker-controlled state can re-enter via the
callback URL). Open-redirect attempts land at '/' instead.
web/src/lib/api.ts:
fetchJSON parses the 401 envelope and full-page-navigates to
body.login_url ONLY on the known session-expiry error codes.
Domain-level 401s (e.g. permission errors) bubble up as regular
errors. credentials: 'include' added so cookie auth works for all
fetches routed through this wrapper. sessionStorage.lastLocation is
preserved for future use by AuthWidget / hermes_status.
Test files marked with pytest.mark.xdist_group so the four files that
mutate web_server.app.state.auth_required serialize onto the same xdist
worker — eliminates 'works locally, fails in CI' app-state bleed.
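One way this marking could look at the top of each affected test file.
The group name is hypothetical, and the sketch assumes the suite runs
pytest-xdist with --dist loadgroup, the mode in which xdist_group marks
decide worker placement.

```python
import pytest

# Module-level mark: every test in this file joins the same xdist group, so
# under `pytest -n auto --dist loadgroup` they all run on a single worker.
# The group name is a made-up example, not the one used in the repo.
pytestmark = pytest.mark.xdist_group(name="dashboard_auth_app_state")
```

Using the same group name in all four files is what keeps them from
racing on web_server.app.state.auth_required.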
20 new tests in test_dashboard_auth_401_reauth.py:
- set_session_cookies(refresh_token='') skips RT cookie
- clear_session_cookies still emits RT deletion
- 401 envelope shape (unauthenticated vs session_expired)
- dead cookie cleared on invalid-token 401
- login_url carries next= for deep paths
- login loop avoided when path is /login/auth/api-auth
- protocol-relative URL rejected
- _safe_next_target unit tests (accept same-origin, reject loops/abs)
- /auth/callback respects safe next= but rejects open redirects
2 pre-existing tests updated to accept the new /login?next=%2F shape.
Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing).
Phase 3, Task 3.5. Three changes to web_server.py:
1. start_server replaces the legacy SystemExit-refusing-to-bind guard
   with: if app.state.auth_required and no providers registered, exit
   with a clear message; otherwise log the gate-on banner. --insecure
   keeps its existing behaviour.
2. uvicorn proxy_headers flag is computed from app.state.auth_required.
   Loopback / --insecure keep it False (so _ws_client_is_allowed sees
   the real peer for the loopback gate); gated mode flips it True so
   X-Forwarded-Proto from Fly's TLS terminator is honoured for cookie
   Secure-flag decisions in detect_https().
3. _serve_index no longer injects window.__HERMES_SESSION_TOKEN__ when
   the gate is on; the SPA reads identity from /api/auth/me using
   cookie auth instead. window.__HERMES_AUTH_REQUIRED__ flag lets the
   SPA pick between ticket-auth (gated) and token-auth (loopback) for
   /api/pty + /api/ws (Phase 5 will wire this in the React layer).
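Changes 1 and 2 can be sketched as two small helpers. Both function
names and the exit message are hypothetical; the only facts taken from
this commit are that the gate exits when auth is required with no
providers registered, and that uvicorn's proxy_headers option follows
app.state.auth_required.

```python
from typing import Any, Dict, Sequence


def check_gate(auth_required: bool, providers: Sequence[str]) -> None:
    """Fail closed when the gate is on but nothing can authenticate users."""
    if auth_required and not providers:
        # Message text is illustrative, not the real one.
        raise SystemExit("Auth gate is active but no auth provider registered.")


def uvicorn_run_kwargs(host: str, port: int, auth_required: bool) -> Dict[str, Any]:
    """Keyword arguments that would be passed to uvicorn.run(app, **kwargs)."""
    return {
        "host": host,
        "port": port,
        # Honour X-Forwarded-* only in gated mode; loopback keeps the real peer.
        "proxy_headers": auth_required,
    }
```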
4 new behavioural tests; loopback regression harness still green.
Phase 0, Task 0.3. start_server now computes should_require_auth(host,
allow_public) and records it on app.state.auth_required BEFORE the
existing legacy SystemExit guard fires. This gives middleware, the SPA
token-injection path, and WS endpoints a consistent read source for
'is the gate active'. The flag is set but no one reads it yet — Phase 3
registers the gate middleware.
Note: 4 pre-existing test failures in tests/hermes_cli/test_web_server.py
(PtyWebSocket) + test_update_hangup_protection.py reproduce on pristine
HEAD and are unrelated to this change (starlette TestClient WS regression).
Phase 0, Task 0.2. Single source of truth for 'is the auth gate active?'.
Reuses the existing _LOOPBACK_HOST_VALUES frozenset so this stays in sync
with the DNS-rebinding host-header check. RFC1918/CGNAT/link-local are
treated as public, which is exactly the threat model the gate exists for.
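A minimal sketch of that predicate. The function name, its two
parameters, and _LOOPBACK_HOST_VALUES come from these commits; the
contents of the frozenset and the reading of allow_public as the
explicit opt-out (--insecure) are assumptions.

```python
# Assumed contents; the real frozenset lives in web_server.py.
_LOOPBACK_HOST_VALUES = frozenset({"127.0.0.1", "localhost", "::1"})


def should_require_auth(host: str, allow_public: bool) -> bool:
    """True when the dashboard is reachable beyond loopback and not opted out."""
    if allow_public:
        return False  # assumption: operator explicitly opted out of the gate
    # Anything that is not loopback counts as public, including RFC1918,
    # CGNAT, and link-local addresses.
    return host.strip().lower() not in _LOOPBACK_HOST_VALUES
```

Checking membership in a loopback allow-list, instead of trying to
enumerate 'private' ranges, is what makes 10.x, 100.64.x, and 169.254.x
binds fall on the gated side by default.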
Phase 0, Task 0.1 of the dashboard-oauth plan. Establishes a baseline for
the loopback dashboard's auth surface so future phases can prove they
didn't regress the existing _SESSION_TOKEN flow when adding the OAuth gate.