feat(dashboard-auth): config.yaml as canonical surface for dashboard.oauth

Per AGENTS.md, ~/.hermes/.env is reserved for API keys / secrets and
config.yaml is the surface for non-secret configuration. The Nous
Portal plugin previously read HERMES_DASHBOARD_OAUTH_CLIENT_ID and
HERMES_DASHBOARD_PORTAL_URL from the environment only, which forced
local-dev / on-prem operators to put non-secret per-instance
configuration in .env — violating the convention.

Add dashboard.oauth.{client_id,portal_url} to DEFAULT_CONFIG and have
the plugin resolve each setting with env-overrides-config precedence:

  1. Env var when set to a non-empty value (Fly.io platform-secret
     injection — what pushes per-deploy client_ids without baking
     them into the image).
  2. config.yaml entry (canonical surface for local dev / on-prem).
  3. Plugin default (no provider registered when client_id is empty;
     portal_url defaults to https://portal.nousresearch.com).

Empty env values are explicitly treated as unset so a provisioned-but-
not-populated Fly secret can't accidentally shadow a valid config.yaml
entry with an empty string — operators would otherwise lose the gate.

Implementation:

  - hermes_cli/config.py: add dashboard.oauth.{client_id,portal_url}
    block to DEFAULT_CONFIG with full doc comment explaining the
    override precedence and Fly.io rationale.
  - plugins/dashboard_auth/nous/__init__.py: add _load_config_oauth_section,
    _resolve_client_id, _resolve_portal_url helpers; replace the two
    direct os.environ.get() calls in register() with the resolvers.
    Update the skip-reason string to mention BOTH surfaces so an
    operator looking at the fail-closed bind error knows config.yaml
    is a valid alternative to the env var.
  - plugins/dashboard_auth/nous/plugin.yaml: update description to
    name both surfaces. requires_env stays pointing at the env var
    name — it's metadata-only (not used by the plugin loader for
    gating) so this is documentation/UX, not enforcement.
  - cli-config.yaml.example: append commented dashboard.oauth block
    with the same override rationale operators see in code.
  - website/docs/user-guide/features/web-dashboard.md: rewrite the
    'Default provider: Nous Research' section to lead with config.yaml,
    present env vars as operator overrides (Fly.io's primary path).
    Updated the example fail-closed bind error to match the new
    skip-reason text.

Test coverage — new TestConfigYamlSource class (8 tests) pinning
every tier of the precedence chain:

  - config-yaml-only path registers correctly
  - both config-yaml fields (client_id + portal_url) honoured
  - env var overrides config for client_id (Fly.io critical path)
  - env var overrides config for portal_url
  - empty env string does NOT shadow config (CI/Fly edge case)
  - neither source set → skip with reason mentioning BOTH surfaces
  - load_config() raising falls through to env-only path (resilience)
  - non-dict oauth section falls through cleanly (typo resilience)

Mutation-tested: flipping the precedence to config-wins-over-env trips
exactly test_env_overrides_config_client_id while the other 7 stay
green, confirming the suite discriminates the order, not just the
sources.

This closes the last item in Teknium's PR review (PR #30156).
This commit is contained in:
Ben 2026-05-26 12:00:13 +10:00 committed by Teknium
parent e2a92ce649
commit 61dcc33893
6 changed files with 348 additions and 40 deletions

View file

@ -323,26 +323,45 @@ If the gate would engage but **no** `DashboardAuthProvider` is registered (no No
### Default provider: Nous Research
The bundled `plugins/dashboard_auth/nous` plugin is **always installed** and auto-loaded. It auto-registers a `DashboardAuthProvider` named `nous` when the per-instance client ID is set:
The bundled `plugins/dashboard_auth/nous` plugin is **always installed** and auto-loaded. It auto-registers a `DashboardAuthProvider` named `nous` when a client ID is configured.
| Env var | Required? | Format | Provisioned by |
#### Configuration
The plugin reads from two surfaces, with the environment variable winning when set non-empty:
**`config.yaml`** — the canonical surface:
```yaml
dashboard:
oauth:
client_id: agent:01HXYZ… # required to engage the gate
portal_url: https://portal.nousresearch.com # optional; defaults to production
```
**Environment variables** — operator overrides:
| Env var | Overrides | Format | Provisioned by |
|---------|-----------|--------|----------------|
| `HERMES_DASHBOARD_OAUTH_CLIENT_ID` | **yes** | `agent:{instance_id}` | Nous Portal at Fly.io provisioning time |
| `HERMES_DASHBOARD_PORTAL_URL` | no | `https://portal.nousresearch.com` (default) | Portal — override only for staging or a custom deployment |
| `HERMES_DASHBOARD_OAUTH_CLIENT_ID` | `dashboard.oauth.client_id` | `agent:{instance_id}` | Nous Portal at Fly.io provisioning time |
| `HERMES_DASHBOARD_PORTAL_URL` | `dashboard.oauth.portal_url` | URL (default: `https://portal.nousresearch.com`) | Portal — override only for staging or a custom deployment |
`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is the only required variable; it's injected automatically when you deploy through the Nous Portal. The portal URL defaults to production, so the typical operator never touches it — set it explicitly only if you're pointing at staging (`portal.rewbs.uk`) or a custom Portal deployment.
Per the Hermes Agent convention (`~/.hermes/.env` is for API keys / secrets only), **`config.yaml` is the recommended place to set these values** for local dev, on-prem, and any deployment you control directly. The environment-variable path exists so Fly.io's platform-secret injection can push per-deploy `client_id`s without anyone having to edit `config.yaml` inside the image — that's its primary purpose.
If `HERMES_DASHBOARD_OAUTH_CLIENT_ID` is absent or malformed, the plugin reports the specific reason and the dashboard's fail-closed bind error tells you exactly what to fix:
Empty environment values are treated as unset, so a provisioned-but-not-populated Fly secret can't accidentally shadow a valid `config.yaml` entry.
If neither source provides a client_id, the plugin reports the specific reason and the dashboard's fail-closed bind error tells you exactly what to fix:
```
Refusing to bind dashboard to 0.0.0.0 — the OAuth auth gate engages on
non-loopback binds, but no auth providers are registered.
Bundled providers reported these issues:
• nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal
• nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set (and
dashboard.oauth.client_id in config.yaml is empty). The Nous Portal
provisions this env var (shape 'agent:{instance_id}') when it
deploys a Hermes Agent instance — set it to your provisioned
client id, or pass --insecure to skip the OAuth gate entirely.
client id (either as an env var or under dashboard.oauth.client_id
in config.yaml), or pass --insecure to skip the OAuth gate entirely.
Or pass --insecure to skip the auth gate (NOT recommended on untrusted
networks).
@ -406,11 +425,20 @@ The login page lists all registered providers; multiple providers can be stacked
### Verifying the gate is on
```bash
# Run the dashboard with the gate engaged (Fly.io shape).
# HERMES_DASHBOARD_PORTAL_URL is optional — defaults to production.
# Quick env-var path (Fly.io shape). HERMES_DASHBOARD_PORTAL_URL is
# optional — defaults to production.
HERMES_DASHBOARD_OAUTH_CLIENT_ID=agent:test \
hermes dashboard --host 0.0.0.0
# Or the equivalent via config.yaml (recommended for local dev / on-prem):
#
# dashboard:
# oauth:
# client_id: agent:test
#
# then just:
hermes dashboard --host 0.0.0.0
# Hit /api/status to see the gate state:
curl -s http://127.0.0.1:9119/api/status | jq '.auth_required, .auth_providers'
# true