mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-21 10:22:18 +00:00
feat(relay): WS-only inbound on the gateway adapter (Phase 3) (#48294)
The connector now delivers inbound (messages + interrupts) over the gateway's OUTBOUND /relay WebSocket, not a signed HTTP POST to an inbound endpoint. The gateway needs no inbound HTTP port — which is what makes hosted gateways (no public IP) able to receive inbound at all. - gateway/relay/adapter.py: connect() wires set_interrupt_inbound_handler( self.on_interrupt) so connector->gateway interrupt_inbound frames bridge into the existing per-session interrupt path (the inbound message handler was already wired). Removed _maybe_start_inbound_receiver() + the _inbound_runner lifecycle — there is no HTTP receiver anymore. - gateway/relay/inbound_receiver.py: deleted (the signed-HTTP InboundDelivery receiver). - gateway/relay/__init__.py: removed relay_inbound_config() (dead with the receiver gone). The delivery key is still set in-process by self-provision for forward-compat but is no longer consumed for inbound. - docs/relay-connector-contract.md: §3 rewritten — inbound is the WS back-channel routed cross-instance via the connector's relay bus; §5 interrupt + §6 auth table updated; the old signed-HTTP-POST + per-tenant-delivery-key-signing path is documented as superseded. gatewayEndpoint noted as passthrough-plane only. Tests: stub_connector grows set_interrupt_inbound_handler + push_interrupt; new test_relay_interrupt case proves connect() wires BOTH inbound handlers and an interrupt_inbound frame over the WS cancels the right session. Removed the HTTP-receiver test; updated the crypto-shedding scan + self-provision delivery-key assertion. 88 relay tests pass. EXPERIMENTAL. Pairs with gateway-gateway (relay bus + WsGatewayDelivery) and the NAS GATEWAY_RELAY_URL stamp. The cross-repo E2E (connector repo) proves the full multi-instance path against this production adapter code.
This commit is contained in:
parent
03d9a95a74
commit
d2c53ff558
9 changed files with 117 additions and 476 deletions
|
|
@ -62,33 +62,55 @@ live platform adapter's capability methods.
|
|||
|
||||
The connector normalizes each platform wire event into a `MessageEvent`
|
||||
(`gateway/platforms/base.py`) and delivers it to the gateway. **Inbound is
|
||||
delivered over a signed HTTP POST, not the outbound `/relay` WebSocket** (see
|
||||
the transport note below). The gateway keys the session via `build_session_key()`
|
||||
delivered over the gateway's OUTBOUND `/relay` WebSocket** (see the transport
|
||||
note below) — the connector pushes an `inbound` frame down the socket the
|
||||
gateway already dialed. The gateway keys the session via `build_session_key()`
|
||||
from the embedded `SessionSource` — so populating the right discriminators is
|
||||
the single highest-correctness responsibility of the connector.
|
||||
|
||||
### Inbound transport (signed HTTP POST, not the outbound WS)
|
||||
### Inbound transport (WS back-channel, not HTTP)
|
||||
|
||||
The gateway dials **out** to the connector's `/relay` WebSocket for the
|
||||
handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound,
|
||||
however, is delivered the other way: the connector **POSTs** the normalized
|
||||
event to the gateway's inbound endpoint (`HttpGatewayDelivery` on the connector;
|
||||
`gateway/relay/inbound_receiver.py` on the gateway). The reason is
|
||||
multi-instance: the connector instance that owns a platform's socket (and thus
|
||||
produces inbound events) is generally **not** the instance a given gateway
|
||||
dialed its outbound WS into, so inbound must target a tenant **endpoint** (which
|
||||
may load-balance across gateway instances) rather than ride one gateway's
|
||||
outbound socket. Each delivery is HMAC-signed with the per-tenant **delivery
|
||||
key** (§6.1); the gateway verifies the signature over the exact raw bytes before
|
||||
accepting the event. Two POST targets:
|
||||
handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound rides
|
||||
the **same socket** in the other direction: the connector pushes an `inbound`
|
||||
frame (and `interrupt_inbound` for §5) down the gateway's outbound WS. There is
|
||||
**no gateway-side inbound HTTP endpoint** — a gateway need not (and, when hosted,
|
||||
cannot) expose any inbound port; everything flows over the connection it
|
||||
initiated.
|
||||
|
||||
**Multi-instance routing.** The connector instance that owns a platform's socket
|
||||
(and thus produces inbound events) is generally **not** the instance the gateway
|
||||
dialed its outbound WS into. The producing instance therefore publishes the
|
||||
event on the connector's internal **relay bus** (Redis pub/sub; `RelayBus` in
|
||||
`src/core/relayBus.ts`) keyed by tenant. Every connector instance subscribes and
|
||||
routes each message to its **local** sessions for that tenant
|
||||
(`RelayServer.routeBusMessage`); the single instance that actually holds the
|
||||
gateway's socket delivers it, and instances with no local session for the tenant
|
||||
no-op. Cross-instance delivery is thus an in-cluster Redis hop, not a public
|
||||
HTTP call.
|
||||
|
||||
Frames (connector → gateway, over the WS):
|
||||
|
||||
- `{"type":"inbound", "event": <MessageEvent>, "bufferId"?}`
|
||||
- `{"type":"interrupt_inbound", "session_key", "chat_id"}` (§5)
|
||||
|
||||
**Trust.** The WS upgrade is authenticated with the gateway's per-gateway secret
|
||||
(§6.1), so the channel is trusted end to end — inbound frames are not separately
|
||||
HMAC-signed (the authenticated socket subsumes the per-delivery origin proof the
|
||||
old HTTP path needed). The relay-bus hop is inside the connector trust domain
|
||||
(same as the lease/buffer/capability stores).
|
||||
|
||||
> Earlier drafts of this contract delivered inbound over a signed **HTTP POST**
|
||||
> to a `gatewayEndpoint` (`HttpGatewayDelivery` + a gateway-side
|
||||
> `inbound_receiver`), HMAC-signed with a per-tenant delivery key. That required
|
||||
> every gateway to expose a reachable inbound URL — impossible for hosted
|
||||
> gateways, which have no public IP. The WS back-channel above replaces it; the
|
||||
> per-tenant delivery key is retained at provision for forward-compat but is no
|
||||
> longer used for inbound. `gatewayEndpoint` remains only for the **passthrough
|
||||
> plane** (Class-2/3 webhooks like Discord interactions / Twilio), which is a
|
||||
> separate synchronous-forward path and out of scope for this section.
|
||||
|
||||
- `POST {gatewayEndpoint}` → `{"type":"message", "event": <MessageEvent>}`
|
||||
- `POST {gatewayEndpoint}/interrupt` → `{"type":"interrupt", "session_key", "reason"?}` (§5)
|
||||
|
||||
> An earlier draft of this contract delivered inbound over the WS `inbound`
|
||||
> frame. That only works single-instance and predates the multi-instance
|
||||
> socket-ownership + channel-auth model; the signed-HTTP path above is the
|
||||
> shipped design.
|
||||
|
||||
### SessionSource fields (the wire surface)
|
||||
|
||||
|
|
@ -178,13 +200,15 @@ gateway holds zero capability material). Source of truth:
|
|||
mid-turn `/stop` over the outbound WS. The connector MUST forward it to the
|
||||
gateway instance running that `session_key` (the routing invariant).
|
||||
- **Connector → gateway:** an inbound interrupt for a `session_key` is delivered
|
||||
as a **signed HTTP POST** to `{gatewayEndpoint}/interrupt` (§3 transport note),
|
||||
and bridged by the adapter's `on_interrupt(session_key, chat_id)` into the
|
||||
existing per-session interrupt mechanism, cancelling exactly that turn
|
||||
as an `interrupt_inbound` frame down the gateway's outbound WS (§3 transport
|
||||
note) — routed cross-instance via the relay bus to whichever instance holds
|
||||
the socket — and bridged by the adapter's `on_interrupt(session_key, chat_id)`
|
||||
into the existing per-session interrupt mechanism, cancelling exactly that turn
|
||||
(siblings untouched).
|
||||
|
||||
The gateway→connector `/stop` rides the outbound WS; the connector→gateway
|
||||
interrupt rides the same signed-HTTP inbound path as a normalized event.
|
||||
Both directions ride the gateway's outbound WS: the gateway→connector `/stop`
|
||||
egresses over it, and the connector→gateway interrupt rides the same `inbound`
|
||||
back-channel as a normalized event.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -231,20 +255,21 @@ only in transport. See `docs/capability-trust-boundary.md` (connector repo:
|
|||
|
||||
A2 makes the connector the sole holder of platform secrets while the gateway may
|
||||
be **customer-managed and internet-exposed**, so the connector⇄gateway channel
|
||||
is itself authenticated. The gateway holds two enrollment-issued credentials
|
||||
(`hermes gateway enroll` → connector `/relay/enroll`): a **per-gateway secret**
|
||||
and a **per-tenant delivery key**. Both are HMAC-SHA256 schemes with a
|
||||
multi-secret rotation verify list (gateway side: `gateway/relay/auth.py`;
|
||||
connector side: `src/core/relayAuthToken.ts` + `src/core/deliverySigning.ts`).
|
||||
is itself authenticated. The gateway holds an enrollment- or provision-issued
|
||||
**per-gateway secret** (`hermes gateway enroll` → connector `/relay/enroll`, or
|
||||
managed self-provision → `/relay/provision`) that authenticates its outbound WS
|
||||
upgrade. It is an HMAC-SHA256 scheme with a multi-secret rotation verify list
|
||||
(gateway side: `gateway/relay/auth.py`; connector side:
|
||||
`src/core/relayAuthToken.ts`).
|
||||
|
||||
| Leg | Credential | Mechanism |
|
||||
|-----|-----------|-----------|
|
||||
| Gateway → connector WS upgrade | per-gateway secret | An `Authorization` bearer header on the `/relay` upgrade. The token is `base64url(payload:exp:sig)` where `payload = gatewayId` and `sig = HMAC(payload:exp, secret)`. Connector verifies and rejects the upgrade (**close 4401**) on mismatch/absence/revocation. The authenticated tenant comes from the connector's store, never the `hello` frame. |
|
||||
| Connector → gateway inbound POST | per-tenant delivery key | Two headers: `x-relay-timestamp` (unix seconds) and `x-relay-signature` (hex `HMAC(ts.rawBody, deliveryKey)`). Gateway verifies over the **exact raw bytes** within a ±300s replay window before accepting the event; rejects **401** otherwise. |
|
||||
| Connector → gateway inbound (`inbound` / `interrupt_inbound` frames) | — (rides the authenticated WS) | Inbound is pushed down the gateway's already-authenticated outbound socket (§3), so no per-message signature is needed. A **per-tenant delivery key** is still issued at enroll/provision and retained for forward-compat, but is no longer used to sign inbound. |
|
||||
|
||||
This is the **channel** authenticator — distinct from platform crypto, which the
|
||||
relay path still sheds entirely (§6). The gateway holds zero platform secrets;
|
||||
these two keys authenticate only the connector link. Full threat model +
|
||||
the per-gateway secret authenticates only the connector link. Full threat model +
|
||||
enrollment/rotation/kill-switch design: `docs/connector-gateway-auth-design.md`
|
||||
(connector repo).
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue