mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-10 08:32:09 +00:00
8 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
616c0a36b6 |
fix(dashboard-auth): don't abort verify chain on one provider's ProviderError
The gated dashboard verifies a session cookie by trying each registered DashboardAuthProvider's verify_session in turn (the session cookie stores only the access token, not which provider issued it). A provider that doesn't recognise a token returns None; a provider whose IDP/JWKS is unreachable raises ProviderError. The loop used to return HTTP 503 on the FIRST ProviderError, before any later provider got a turn. With multiple providers stacked, that means an unreachable IDP for a session you didn't even use blocks login through a different, reachable provider. Concrete repro: a self-hosted-OIDC session hits the 'nous' provider first (registered earlier); nous tries to reach Nous Portal's JWKS, which is unreachable in a self-hosted deployment, so it raises — and the gate 503s before the 'self-hosted' provider can verify the token. Hit live while testing the new self-hosted OIDC plugin against a local Keycloak. Fix: a ProviderError from one provider is logged and the loop continues to the next. A 503 is returned only if NO provider verified the token AND at least one was unreachable — distinguishing a transient IDP outage (don't force a needless re-login) from a token that's genuinely invalid (fall through to refresh/relogin). Single-provider behaviour is unchanged. Tests: adds an _UnreachableProvider stub and three cases — unreachable provider first must not block a working second; all-unreachable still 503s; reachable-but-unrecognised falls through to 401/relogin (not 503). Mutation-tested: reverting the fix makes the first case fail with the exact 503 bug. |
||
|
|
ed9e8ba097 |
feat(dashboard-auth): add pluggable password (non-redirect) login
The dashboard auth gate was OAuth-only: a DashboardAuthProvider could
authenticate only via a redirect to an IDP (start_login -> /auth/callback
-> complete_login). There was no first-class path for username/password
auth, so self-hosters who just want a password on their dashboard had no
clean option short of an external OAuth IDP.
Extend the provider framework with a parallel, non-redirect front door
that converges on the same Session + cookie + refresh machinery:
- base.py: add the optional supports_password flag and
complete_password_login(username, password) -> Session (default
raises NotImplementedError so an OAuth-only provider that forgets the
flag fails loudly). Add InvalidCredentialsError. OAuth providers are
unaffected (flag defaults False; the method is never called).
- routes.py: add POST /auth/password-login, mirroring the cookie-minting
tail of /auth/callback but skipping PKCE/state/code. Returns JSON
{ok, next} (the form POSTs via fetch). Generic 401 for both unknown
user and wrong password (no enumeration oracle); 404 hides whether a
provider exists or supports passwords; per-IP sliding-window rate
limit (10/min -> 429). /api/auth/providers now reports
supports_password so the login page can branch.
- middleware.py: allowlist /auth/password-login (a bootstrap route).
verify/refresh/revoke/ws-tickets/logout need zero changes — a password
session is just a Session with provider-minted opaque tokens.
- login_page.py: render a credential form (instead of a redirect button)
for supports_password providers, wired by a small inline script that
POSTs to /auth/password-login and navigates on success. OAuth-only
pages stay script-free.
|
||
|
|
c10ccaaf51
|
feat(dashboard-auth): rotate dashboard sessions via refresh token (#37247)
* feat(dashboard-auth): rotate dashboard sessions via refresh token The dashboard auth-code grant now issues a 24h rotating refresh token (server side: NousResearch/nous-account-service#293). This wires up the Hermes client half so an expired access token is transparently refreshed instead of bouncing the user to /login every 15 minutes. plugins/dashboard_auth/nous: - refresh_session() now POSTs grant_type=refresh_token to Portal's token endpoint and returns a Session carrying the ROTATED refresh token (was an unconditional RefreshExpiredError under the old "no RT in V1" contract). The RT is sent in BOTH the request body (Portal's schema requires it there) and the X-Refresh-Token header (log redaction) — verified against the #293 preview deploy: header-only is rejected as invalid_request, body is accepted. - A 400 from Portal (expired / revoked / reuse-detected) maps to RefreshExpiredError so the middleware forces a clean re-login; network errors map to ProviderError; empty RT fast-fails without a network call. - complete_login now captures the initial refresh token Portal returns (forward-tolerant: empty string if a deploy omits it). - Extracted the shared token-response handling into _token_response_to_session, parameterised on the 400 exception type so the auth-code path raises InvalidCodeError and the refresh path raises RefreshExpiredError. - revoke_session stays a best-effort no-op: Portal exposes no public token-endpoint revocation grant (revocation is the authenticated /sessions UI, keyed by sessionId+userId), so logout is cookie-clearing and the 24h session expires on its own. Documented for a future revoke grant. hermes_cli/dashboard_auth/middleware: - On an expired/invalid access token the gate now attempts refresh via the session's RT BEFORE forcing re-login. On success it serves the request and re-sets the rotated cookies on the response (mandatory: Portal rotates the RT every refresh and reuse-detects, so a stale RT cookie would revoke the whole session on the next refresh). On RefreshExpiredError (or no RT) it falls through to clear-and-relogin. - ProviderError during refresh (Portal unreachable) forces a clean re-login rather than 500-ing the request. - Uses the existing REFRESH_SUCCESS / REFRESH_FAILURE audit events. Validation: - 176 dashboard-auth unit/integration tests pass. - Live E2E against the #293 preview deploy: refresh_session(bad rt) -> RefreshExpiredError through the real token endpoint; live JWKS fetch + RS256 verification rejects a forged token; empty-RT fast-fail. The successful happy-path rotation is covered by unit tests (a live run needs an interactive browser OAuth round trip + registered agent:* client). Depends on: NousResearch/nous-account-service#293 (server-side RT issuance). * fix(dashboard-auth): use Portal's x-nous-refresh-token header name The refresh-token header must match Portal's REFRESH_TOKEN_HEADER exactly ("x-nous-refresh-token"); the initial cut used "X-Refresh-Token", which Portal silently ignores (harmless since the RT is also in the body, which is what the schema requires — but the header redaction was a no-op). Confirmed against the NAS token route + re-validated live against the #293 preview deploy. * fix(dashboard-auth): refresh session when access-token cookie has been evicted The gated middleware bounced users to /login the instant the access-token cookie was absent, without ever consulting the refresh token: at, _rt = read_session_cookies(request) if not at: return _unauth_response(...) # bailed here This made transparent refresh effectively dead for the common case. The access-token cookie is set with Max-Age = access_token_expires_in (~15 min), so a real browser EVICTS hermes_session_at the moment the token lapses while hermes_session_rt persists (30-day Max-Age). From that point the browser sends only the refresh-token cookie — and the old guard rejected it before _attempt_refresh could run. The _attempt_refresh path only fired for a present-but-invalid access token, which never happens in a browser. Fix: only hard-bounce when NEITHER cookie is present. A request carrying just the refresh token now skips verification (no AT to verify) and flows into the existing refresh path, which rotates both cookies and serves the request transparently. A dead/expired RT still raises RefreshExpiredError and falls through to clear-and-relogin. This failure mode escaped the original tests + manual refresh button because both kept the access-token cookie present; only a real browser evicting the cookie at Max-Age exposes it. Added 3 regression tests covering: AT-evicted + RT-present (transparent refresh), no-cookies (still bounces), and RT-only with a dead RT (clean 401, no 500). |
||
|
|
e1eba6f8cc
|
fix(dashboard-auth): drop /api/* paths from OAuth next= round trip (#36244)
When an unauthenticated SPA fetch hit a gated /api/* endpoint (e.g.
GET /api/analytics/models?days=30 fired from ModelsPage on mount or
after a session expiry), the gated middleware stamped the request's
own path into next= on the 401 envelope's login_url. The SPA's global
401 handler in web/src/lib/api.ts full-page-navigated to that URL,
the PKCE cookie carried the encoded /api/* value through the OAuth
round trip to Portal, and /auth/callback's _validate_post_login_target
accepted it as same-origin and redirected the user to the raw JSON
endpoint instead of the dashboard.
Symptom Ben reported: after the OAuth screen he kept landing on
$DOMAIN/api/analytics/models?days=30 (raw JSON) rather than /models.
The bug was deterministic per page — whichever /api/* call ModelsPage,
AnalyticsPage, or SessionsPage fired first owned the redirect race.
Fix: both validators now reject /api/* targets in addition to the
existing /login, /auth/, /api/auth/ exclusions:
- _safe_next_target in middleware.py drops the value before it ever
enters login_url, so the SPA's 401 handler navigates to a bare
/login (which the SPA itself can return-from via its own
sessionStorage["hermes.lastLocation"] fallback that was already
saving the actual browser location).
- _validate_post_login_target in routes.py drops it as second-line
defence at the callback boundary, so a legacy cookie, a regressed
middleware, or an attacker-crafted /auth/login?next=/api/... value
can't smuggle the redirect through. Either layer alone is enough;
pairing them means a regression in one is caught by the other.
The match is anchored: ``decoded == "/api"`` or
``decoded.startswith("/api/")``. SPA route lookalikes like /apidocs
or /api-keys remain valid landing targets — tests pin that.
Test additions in test_dashboard_auth_401_reauth.py:
- TestApi401Envelope: rewrote test_login_url_carries_next_for_deep_
api_path (which asserted the pre-fix behaviour) as
test_login_url_drops_next_for_deep_api_path, plus added the
specific analytics-models repro case from Ben's report.
- TestNextSameOriginValidation: rejects-api-paths + does-not-reject-
api-prefix-lookalikes (covers /apidocs, /api-keys).
- TestAuthCallbackNext: end-to-end test_callback_with_api_next_
lands_at_root drives /auth/login?next=/api/... through to the
callback and asserts the user lands at "/", not the API URL.
- TestValidatePostLoginTarget: new class covering the callback-side
validator directly, including the URL-encoded ``%2Fapi%2F...``
form the PKCE cookie actually carries.
Mutation-tested: reverting both validators causes exactly the 5 new
or rewritten /api/*-related assertions to fail (each fix layer is
independently tested), while the 31 other assertions in the file
remain green. Full tests/hermes_cli/ suite (288 files, 5,938 tests)
passes with the fix applied.
|
||
|
|
a618789dba |
fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.
Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.
Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.
Endpoint inventory (verified safe to remain public):
* /api/status — version, gateway state, active session count,
auth-gate shape. Portal liveness probe target.
* /api/config/defaults — config-defaults feed for the SPA's Config page
* /api/config/schema — config schema for the SPA's Config page
* /api/model/info — model catalogue metadata (context windows)
* /api/dashboard/themes — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard
No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.
Tests:
* New: test_gated_status_is_public (regression guard with the NAS
fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
never the response)
* New: docker integration check #3 in
test_dashboard_oauth_gate_engaged_by_default — /api/status
remains 200 under the gate AND reports auth_required=True so the
portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
/api/sessions instead of /api/status (status is public, so it
can no longer distinguish 'logged in' from 'gate accidentally
disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
test_dashboard_oauth_gate_engaged_by_default probes
/api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
test_dashboard_auth_status_endpoint.py (no longer needed since
/api/status is reachable cold)
Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
|
||
|
|
b26d81d536 |
feat(dashboard-auth): honour X-Forwarded-Prefix + __Host-/__Secure- cookies
Mission-control style deploys reverse-proxy the dashboard at a path
prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject
X-Forwarded-Prefix: /hermes on every request. The SPA mount already
honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__,
but the OAuth gate didn't:
1. The gate's Location: header to /login and the 401 envelope's
login_url were built bare ("/login?next=..."). Under a /hermes
prefix the browser follows that to mission-control.tilos.com/login
which the proxy doesn't route to the dashboard.
2. _redirect_uri (the OAuth callback URL handed to the IDP) used
request.url_for() which doesn't honour X-Forwarded-Prefix
(Starlette/uvicorn only proxy_headers Host + Proto + For). The
IDP redirects back to /auth/callback instead of /hermes/auth/
callback → 404 in the user's browser.
3. Cookies were set with Path=/ which leaks them to other apps on
the same origin and won't be sent back on requests under the
prefix in the first place.
Fix threads the normalised prefix through every boundary:
* New hermes_cli/dashboard_auth/prefix.py — single source of truth
for X-Forwarded-Prefix parsing. web_server._normalise_prefix
becomes a re-export so the SPA mount, the gate, and the cookies
helper all agree.
* middleware._unauth_response builds login_url = f"{prefix}/login".
* routes._redirect_uri splices the prefix into the path component
of the IDP-bound URL (with full validation of the header).
* cookies.{set,clear}_{session,pkce}_cookie now take prefix="".
Path attribute switches to /hermes when set; cookie name switches
name variant (see below). Every caller passes the request's
normalised prefix.
Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt
the __Host- / __Secure- cookie name prefixes per draft-west-cookie-
prefixes. The variant is selected from (use_https, prefix):
* Loopback HTTP → bare "hermes_session_at" (both prefixes require
Secure, incompatible with HTTP).
* HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at".
Strongest spec: bound to exact origin, no Domain attribute, Secure
required.
* HTTPS, behind a proxy prefix (Path=/hermes) →
"__Secure-hermes_session_at". __Host- forbids Path != "/"; the
explicit Path=/hermes covers same-origin app isolation.
Setter and reader BOTH consult the prefix because the cookie *name*
changes — a reader that looked up the bare name when the setter wrote
__Secure- would never find the value. The reader falls back across
all three variants so a request whose shape changed mid-session (e.g.
post-deploy from no-prefix to /hermes) still picks up the existing
cookie until it expires.
Test coverage:
- tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests
pinning:
• Location: /hermes/login on the gate's HTML redirect
• 401 envelope login_url carries the prefix
• Malformed X-Forwarded-Prefix is ignored (header-injection
defence; the script-tag value is normalised to empty string)
• _redirect_uri splices /hermes into the path (the property
that prevents the IDP-returns-to-404 failure)
• PKCE cookie uses Path=/hermes + __Secure- when proxied
• Session cookies use __Host- when direct, __Secure- when
proxied, bare on loopback HTTP
• End-to-end round trip with hand-managed PKCE cookie carriage
(TestClient can't simulate a Path=/hermes cookie automatically)
- tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin
each (use_https, prefix) shape produces its expected cookie name,
plus reader-side coverage that __Host- and __Secure- variants are
both recognised.
- Existing tests across middleware / 401-reauth / etc. updated to
match the new cookie names (substring contains instead of
startswith).
Mutation-tested: reverting _unauth_response to build the bare
"/login" URL trips exactly the two tests that pin the prefix
carriage, confirming the suite discriminates the regression.
|
||
|
|
5e9308b5b8 |
feat(dashboard-auth): Phase 6 — 401 re-auth envelope + next= propagation
Contract V1 of nous-account-service PR #180 ships no refresh tokens, so the original Phase 6 silent-refresh design is replaced with a thinner '401 → redirect to /login' UX. The dashboard's gated middleware now emits a structured envelope on any auth failure; the SPA's fetch wrapper sees it and full-page-navigates the user through re-auth. hermes_cli/dashboard_auth/cookies.py: set_session_cookies(refresh_token='') SKIPS writing the hermes_session_rt cookie. Forward-compat: a non-empty refresh_token still emits the cookie unchanged, so a future Portal contract that starts issuing RTs flips the persistence on with no other change. clear_session_cookies still emits a Max-Age=0 deletion for the RT cookie so stale cookies from earlier deployments get flushed on logout / session expiry. Deprecation marker + rationale in module docstring per the user's docstring-only deprecation pattern. hermes_cli/dashboard_auth/middleware.py: _unauth_response now builds a structured JSON envelope for API 401s: { error: 'session_expired' | 'unauthenticated', detail: 'Unauthorized', reason: <internal>, login_url: '/login?next=<safe-path>' } HTML redirects also carry next= so a user landing on /sessions without a cookie bounces back to /sessions after re-auth. _safe_next_target validates same-origin: drops protocol-relative paths (//evil.com), absolute URLs, and any /login or /auth/* loop. Dead cookies are cleared on the 401 path so the browser stops replaying invalid tokens. hermes_cli/dashboard_auth/routes.py: /auth/callback accepts next= query param and validates via _validate_post_login_target (same rules as the gate's _safe_next_target — defence-in-depth because next= survived a full IDP round trip and attacker-controlled state can re-enter via the callback URL). Open-redirect attempts land at '/' instead. web/src/lib/api.ts: fetchJSON parses the 401 envelope and full-page-navigates to body.login_url ONLY on the known session-expiry error codes. Domain-level 401s (e.g. permission errors) bubble up as regular errors. credentials: 'include' added so cookie auth works for all fetches routed through this wrapper. sessionStorage.lastLocation is preserved for future use by AuthWidget / hermes_status. Test files marked with pytest.mark.xdist_group so the four files that mutate web_server.app.state.auth_required serialize onto the same xdist worker — eliminates 'works locally, fails in CI' app-state bleed. 20 new tests in test_dashboard_auth_401_reauth.py: - set_session_cookies(refresh_token='') skips RT cookie - clear_session_cookies still emits RT deletion - 401 envelope shape (unauthenticated vs session_expired) - dead cookie cleared on invalid-token 401 - login_url carries next= for deep paths - login loop avoided when path is /login/auth/api-auth - protocol-relative URL rejected - _safe_next_target unit tests (accept same-origin, reject loops/abs) - /auth/callback respects safe next= but rejects open redirects 2 pre-existing tests updated to accept the new /login?next=%2F shape. Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing). |
||
|
|
5b17eab67a |
feat(dashboard-auth): auth gate middleware + /auth/* routes + /login HTML
Phase 3, Tasks 3.2 + 3.3 + 3.4. These three pieces are mutually dependent so they land together. middleware.py - gated_auth_middleware engages when app.state.auth_required is True. Allowlists /login, /auth/*, /api/auth/providers, and static asset paths; everything else demands a valid session_at cookie. Verifies by trying every registered provider's verify_session in turn (multi- provider stack); attaches verified Session to request.state.session. Returns 401 JSON for /api/* and 302 -> /login for HTML. ProviderError during verify -> 503. routes.py - APIRouter with: GET /login server-rendered HTML GET /auth/login?provider=N 302 to IDP + PKCE cookie GET /auth/callback?code,state completes login, sets session cookies POST /auth/logout clears cookies + best-effort revoke GET /api/auth/providers public bootstrap endpoint (503 if zero) GET /api/auth/me verified session as JSON (auth-required) login_page.py - Inline-CSS HTML template, no React, no JavaScript. web_server.py - Mounted gated_auth_middleware between host_header and auth_middleware (FastAPI runs middlewares in registration order: host check -> cookie auth -> token auth). auth_middleware short-circuits when auth_required so cookie auth is authoritative in gated mode. Router is included before mount_spa so the catch-all doesn't swallow /login or /auth/*. 17 new behavioural tests; loopback regression harness still green. |