feat(relay): terminal 4401 (opt-out) → clean "Relay disabled" state

Phase 7 Unit 7d-B. When an operator opts an instance OUT of the Team Gateway
relay (Unit 7b deprovision), the connector revokes the per-gateway secret and
closes the gateway's WS with 4401. The reconnect supervisor previously treated
EVERY close as retryable, so the live process spun "retrying 4401" forever and
the dashboard showed a red error — opt-out looked like a failure.

Now a 4401 close that arrives AFTER a successful handshake is recognized as a
terminal credential revocation:

- ws_transport.py: track `_handshake_succeeded` (set when a descriptor is
  received); on a 4401 close after a prior success, latch `auth_revoked` and do
  NOT spawn the reconnect supervisor. A 4401 BEFORE any successful handshake
  stays retryable (cold-start / not-yet-provisioned race, not a revocation).
  New `auth_revoked` property + a websockets-version-safe close-code reader
  (prefers `.rcvd`/`.sent` Close frames; `.code` is deprecated in websockets 13+).
- adapter.py: a revocation monitor turns `transport.auth_revoked` into a clean,
  NON-retryable `relay_disabled` fatal and notifies the gateway's fatal-error
  handler (so the adapter is removed and NOT queued for reconnection — the
  credential is dead until the instance is recreated). Monitor is cancelled on
  disconnect; only started when the transport exposes `auth_revoked` (prod WS).
- run.py: `_handle_adapter_fatal_error` maps the `relay_disabled` code to a
  `disabled` platform_state (not `fatal`/`retrying`).
- web: PlatformsCard renders the `disabled` state with a neutral outline badge,
  a PowerOff icon, and muted (not destructive-red) text + message. New optional
  `status.disabled` i18n string ("Disabled").
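
The transport-side decision described above can be sketched as follows. This is a minimal illustration, not the code in ws_transport.py: `TransportState`, `read_close_code`, and `should_reconnect` are hypothetical names, and the exception is duck-typed (any object exposing `.rcvd`/`.sent` Close frames or a legacy `.code`) so the sketch runs without the websockets package installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional

AUTH_REVOKED_CLOSE_CODE = 4401


def read_close_code(exc: object) -> Optional[int]:
    """Best-effort close code from a connection-closed exception.

    Prefers the received/sent Close frames (`.rcvd` / `.sent`) and only
    falls back to the legacy `.code` attribute when neither frame has one.
    """
    for frame_name in ("rcvd", "sent"):
        frame = getattr(exc, frame_name, None)
        code = getattr(frame, "code", None)
        if code is not None:
            return code
    return getattr(exc, "code", None)


@dataclass
class TransportState:
    """Hypothetical stand-in for the two flags the transport tracks."""

    handshake_succeeded: bool = False  # set once a descriptor is received
    auth_revoked: bool = False         # latched on a terminal 4401


def should_reconnect(state: TransportState, exc: object) -> bool:
    """Return True if the reconnect supervisor should be spawned."""
    code = read_close_code(exc)
    if code == AUTH_REVOKED_CLOSE_CODE and state.handshake_succeeded:
        # 4401 after a prior success: credential revoked, terminal.
        state.auth_revoked = True
        return False
    # Any other close, or a 4401 before the first successful handshake
    # (cold-start / not-yet-provisioned race), stays retryable.
    return True


# Usage: a 4401 Close frame received from the connector.
revoked_close = SimpleNamespace(rcvd=SimpleNamespace(code=4401), sent=None)
live = TransportState(handshake_succeeded=True)
cold = TransportState()
live_retry = should_reconnect(live, revoked_close)
cold_retry = should_reconnect(cold, revoked_close)
```

In this sketch only the live transport latches `auth_revoked`; the cold-start transport keeps retrying, matching the two ws_transport test cases listed under Tests.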

Also bundles the Phase 7 contract-doc update (this doc is authoritative in
hermes-agent): docs/relay-connector-contract.md gains an "Author-first
resolution + the account-link (DM) path" section documenting the
multi-tenant-guild rule (D-7.2 — route by authenticated author binding, never by
guild; unlinked → fail-closed), the `/link <code>` DM flow, and the
connector-authoritative opt-out + terminal-4401 behavior this PR implements.
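
As an illustration of the D-7.2 routing rule summarized above, the sketch below resolves the target instance from the authenticated author's binding and ignores the guild entirely. The function name, the `author_bindings` mapping, and the string IDs are assumptions made for this example; the contract doc remains the authoritative description.

```python
from typing import Mapping, Optional


def resolve_instance(
    author_id: str,
    guild_id: Optional[str],
    author_bindings: Mapping[str, str],
) -> Optional[str]:
    """Pick the instance for a message by its authenticated author.

    `guild_id` is accepted only to make the rule explicit: it is never
    consulted, because one guild may host authors bound to different
    tenants. An unlinked author resolves to None (fail closed).
    """
    del guild_id  # deliberately unused: never route by guild
    return author_bindings.get(author_id)


# Usage with hypothetical IDs: one linked author, one unlinked.
bindings = {"author-1": "instance-a"}
linked = resolve_instance("author-1", "guild-x", bindings)
unlinked = resolve_instance("author-2", "guild-x", bindings)
```

Returning `None` rather than a guild-level default is what makes the rule fail-closed: a caller has nothing to route to until the author completes the `/link <code>` flow.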

Tests: +2 ws_transport (4401-after-handshake terminal / no-reconnect;
4401-before-handshake stays retryable) and +2 adapter (revocation → non-retryable
relay_disabled fatal + handler fired; no-revocation → no fatal). 138 relay tests
pass (incl. the contract-doc conformance test); ruff clean; web tsc clean.

Phase 7 Unit 7d-B (relay-adapter solo lane). Q17 → Option 2; Option 3 (live
de-register, no recreate) + the restart-re-provision hole deferred post-alpha.
Ben 2026-06-24 18:13:37 +10:00 committed by Ben Barclay
parent 3c75e11571
commit c93b9f9057
9 changed files with 367 additions and 8 deletions


@@ -1,4 +1,4 @@
-import { AlertTriangle, Radio, Wifi, WifiOff } from "lucide-react";
+import { AlertTriangle, PowerOff, Radio, Wifi, WifiOff } from "lucide-react";
 import type { PlatformStatus } from "@/lib/api";
 import { isoTimeAgo } from "@/lib/utils";
 import { Badge } from "@nous-research/ui/ui/components/badge";
@@ -9,10 +9,11 @@ export function PlatformsCard({ platforms }: PlatformsCardProps) {
 const { t } = useI18n();
 const platformStateBadge: Record<
 string,
-{ tone: "success" | "warning" | "destructive"; label: string }
+{ tone: "success" | "warning" | "destructive" | "outline"; label: string }
 > = {
 connected: { tone: "success", label: t.status.connected },
 disconnected: { tone: "warning", label: t.status.disconnected },
+disabled: { tone: "outline", label: t.status.disabled ?? "Disabled" },
 fatal: { tone: "destructive", label: t.status.error },
 };
@@ -38,7 +39,9 @@ export function PlatformsCard({ platforms }: PlatformsCardProps) {
 ? Wifi
 : info.state === "fatal"
 ? AlertTriangle
-: WifiOff;
+: info.state === "disabled"
+? PowerOff
+: WifiOff;
 return (
 <div
@@ -52,7 +55,9 @@ export function PlatformsCard({ platforms }: PlatformsCardProps) {
 ? "text-success"
 : info.state === "fatal"
 ? "text-destructive"
-: "text-warning"
+: info.state === "disabled"
+? "text-muted-foreground"
+: "text-warning"
 }`}
 />
@@ -62,7 +67,13 @@ export function PlatformsCard({ platforms }: PlatformsCardProps) {
 </span>
 {info.error_message && (
-<span className="font-mondwest normal-case text-xs text-destructive">
+<span
+className={`font-mondwest normal-case text-xs ${
+info.state === "disabled"
+? "text-muted-foreground"
+: "text-destructive"
+}`}
+>
 {info.error_message}
 </span>
 )}


@@ -107,6 +107,7 @@ export const en: Translations = {
 activeSessions: "Active Sessions",
 connected: "Connected",
 connectedPlatforms: "Connected Platforms",
+disabled: "Disabled",
 disconnected: "Disconnected",
 error: "Error",
 failed: "Failed",


@@ -124,6 +124,7 @@ export interface Translations {
 agent: string;
 connected: string;
 connectedPlatforms: string;
+disabled?: string;
 disconnected: string;
 error: string;
 failed: string;