The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.
Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.
Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
Allow non-loopback websocket peers when the dashboard is explicitly exposed with --host 0.0.0.0/:: and --insecure.
This fixes the failure mode where /chat rendered over LAN but /api/ws and /api/events were rejected with HTTP 403, leaving the embedded TUI chat disconnected.
Add regression coverage for the insecure public bind case in the dashboard websocket auth tests.
The packaged Electron desktop loads its renderer over file://, so its
/api/ws handshake carries Origin: file:// (or null). The DNS-rebinding
WebSocket Origin guard only accepted http(s) origins matching the bound
host, so it rejected the desktop's own renderer with 4403 -> "Could not
connect to Hermes gateway" on macOS.
A browser DNS-rebinding attacker can only ever present an http(s) origin
(the site hosting the malicious page); it cannot forge file://, null, or
a custom app scheme AND hold the loopback session token. So on loopback
binds we now trust non-web origins -- the token in _ws_auth_ok remains
the real authenticator. Public/gated binds still reject them, and
cross-site http(s) origins are still rejected everywhere.
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
When the OAuth gate is active, start_server runs uvicorn with
proxy_headers=True so the dashboard can honour X-Forwarded-Proto from
Fly's TLS terminator (cookies, redirect URI reconstruction). A side
effect: ws.client.host is rewritten to the X-Forwarded-For value, which
on Fly is the real internet client IP — never loopback. The loopback
peer guard in _ws_client_is_allowed then rejected every WS upgrade in
gated mode (4403 close) even after a successful OAuth round trip and
ticket consumption, silently breaking /api/pty, /api/ws, /api/pub, and
/api/events.
Fix: in gated mode, bypass the peer-IP check. The OAuth gate +
single-use ticket is the auth. The Host/Origin guard in
_ws_host_origin_is_allowed still runs and is what protects against
DNS-rebinding here, not the peer IP.
Loopback mode behaviour is unchanged: the legacy ?token= path is the
only auth there and we don't want LAN hosts guessing tokens.
Regression coverage: TestWsRequestIsAllowedGated pins all four
behaviours — non-loopback peer allowed in gated mode, non-loopback peer
rejected in loopback mode, loopback peer allowed in loopback mode, and
the Host/Origin guard still firing on a rebinding attempt with gated
mode + matching peer.
The Nous OAuth provider plugin (plugins/dashboard_auth/nous) is bundled
and auto-loaded — same as before — but previously refused to register
unless BOTH HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL
were set, then the gate's fail-closed branch told the operator 'install
the default Nous provider'. That message is misleading: the provider IS
installed; it's just unconfigured. And the contract only really needs
the per-instance client_id — the portal URL is the same for everyone
in production.
Three changes:
1. plugins/dashboard_auth/nous/__init__.py:
- HERMES_DASHBOARD_PORTAL_URL is now optional and defaults to
'https://portal.nousresearch.com'. Override only for staging
(portal.rewbs.uk) or a custom deployment. Empty string also
falls back to the default so an empty Fly secret can't point
the dashboard at nowhere.
- Plugin exposes a module-level LAST_SKIP_REASON: str that the gate
reads when no providers register. Cleared on each register() call.
Skip reasons are human-readable and actionable
('HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal
provisions this env var…').
2. plugins/dashboard_auth/nous/plugin.yaml:
- requires_env drops HERMES_DASHBOARD_PORTAL_URL; only the client_id
is mandatory. Description updated to reflect this.
3. hermes_cli/web_server.py:
- When the gate fail-closes for 'no providers', it now reads each
bundled plugin's LAST_SKIP_REASON and embeds them in the SystemExit
message. Operator sees the specific config fix needed:
Bundled providers reported these issues:
• nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. …
instead of the prior generic 'Install the default Nous provider'.
Tests:
- TestPluginRegister rewritten to assert the new defaults +
LAST_SKIP_REASON contents (6 tests, +1 new for empty-string env).
- New gate test test_start_server_surfaces_nous_skip_reason_when_unconfigured.
- test_get_method_is_not_allowed widened to handle the SPA-shell 200
path explicitly — assertion now verifies no JSON ticket leaks
rather than asserting a specific status code (covers all four of
401/404/405/200).
Docs updated: web-dashboard.md's 'Default provider' section now shows
the env-var table with required/optional columns and embeds the
fail-closed error message verbatim so operators can match what they
see at the prompt.
Contract V1 of nous-account-service PR #180 ships no refresh tokens, so
the original Phase 6 silent-refresh design is replaced with a thinner
'401 → redirect to /login' UX. The dashboard's gated middleware now
emits a structured envelope on any auth failure; the SPA's fetch
wrapper sees it and full-page-navigates the user through re-auth.
hermes_cli/dashboard_auth/cookies.py:
set_session_cookies(refresh_token='') SKIPS writing the
hermes_session_rt cookie. Forward-compat: a non-empty refresh_token
still emits the cookie unchanged, so a future Portal contract that
starts issuing RTs flips the persistence on with no other change.
clear_session_cookies still emits a Max-Age=0 deletion for the RT
cookie so stale cookies from earlier deployments get flushed on
logout / session expiry. Deprecation marker + rationale in
module docstring per the user's docstring-only deprecation pattern.
hermes_cli/dashboard_auth/middleware.py:
_unauth_response now builds a structured JSON envelope for API 401s:
{ error: 'session_expired' | 'unauthenticated',
detail: 'Unauthorized',
reason: <internal>,
login_url: '/login?next=<safe-path>' }
HTML redirects also carry next= so a user landing on /sessions
without a cookie bounces back to /sessions after re-auth.
_safe_next_target validates same-origin: drops protocol-relative
paths (//evil.com), absolute URLs, and any /login or /auth/* loop.
Dead cookies are cleared on the 401 path so the browser stops
replaying invalid tokens.
hermes_cli/dashboard_auth/routes.py:
/auth/callback accepts next= query param and validates via
_validate_post_login_target (same rules as the gate's
_safe_next_target — defence-in-depth because next= survived a full
IDP round trip and attacker-controlled state can re-enter via the
callback URL). Open-redirect attempts land at '/' instead.
web/src/lib/api.ts:
fetchJSON parses the 401 envelope and full-page-navigates to
body.login_url ONLY on the known session-expiry error codes.
Domain-level 401s (e.g. permission errors) bubble up as regular
errors. credentials: 'include' added so cookie auth works for all
fetches routed through this wrapper. sessionStorage.lastLocation is
preserved for future use by AuthWidget / hermes_status.
Test files marked with pytest.mark.xdist_group so the four files that
mutate web_server.app.state.auth_required serialize onto the same xdist
worker — eliminates 'works locally, fails in CI' app-state bleed.
20 new tests in test_dashboard_auth_401_reauth.py:
- set_session_cookies(refresh_token='') skips RT cookie
- clear_session_cookies still emits RT deletion
- 401 envelope shape (unauthenticated vs session_expired)
- dead cookie cleared on invalid-token 401
- login_url carries next= for deep paths
- login loop avoided when path is /login/auth/api-auth
- protocol-relative URL rejected
- _safe_next_target unit tests (accept same-origin, reject loops/abs)
- /auth/callback respects safe next= but rejects open redirects
2 pre-existing tests updated to accept the new /login?next=%2F shape.
Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing).
Phase 5 task 5.2. Four WebSocket endpoints — /api/pty, /api/ws, /api/pub,
/api/events — previously authed with the same constant-time check against
`_SESSION_TOKEN`. Replaced with a single helper that branches on
`app.state.auth_required`:
Loopback / --insecure: legacy ?token=<_SESSION_TOKEN> path (unchanged).
Gated: ?ticket=<single-use> consumed against the
dashboard-auth ticket store.
Critical security property: gated mode UNCONDITIONALLY rejects the
?token= path. A leaked _SESSION_TOKEN value from a log line is not
replayable for WS access in gated deployments.
`_build_sidecar_url` now branches too: loopback uses the legacy token;
gated mode mints a server-internal ticket via mint_ticket() with
pseudo-user 'pty-sidecar' / provider 'server-internal' so audit logs can
distinguish PTY-internal sidecar tickets from browser tickets. PTY
children open /api/pub exactly once at startup so single-use suffices.
Ticket rejections audit-log as WS_TICKET_REJECTED with truncated reason
+ client IP + WS path. Operators debugging 'WS keeps closing' issues see
which endpoint and why.
17 new tests:
- POST /api/auth/ws-ticket: 200 with cookie, 401/302 without, distinct
per call, GET-not-allowed.
- _ws_auth_ok loopback: token accept/reject, missing-token reject,
ticket-param-ignored.
- _ws_auth_ok gated: ticket accept, single-use rejection, unknown reject,
legacy-token-rejected-in-gated assertion, audit-log emission.
- _build_sidecar_url: loopback uses token=, gated uses ticket=, no-bound
returns None.