hermes-agent/docs/relay-connector-contract.md
Ben Barclay 64a507da44
feat(relay): handle passthrough_forward over the WS (Phase 5 §5.1, gateway half) (#50702)
The connector half (gateway-gateway) moves the passthrough plane's post-ACK
forward off the HTTP gatewayEndpoint onto the gateway's outbound /relay WS via
a new passthrough_forward frame. This is the gateway side: the relay adapter
now RECEIVES and handles that frame, so a hosted gateway (no public IP) can
process forwarded Class-2/3 traffic (Discord interactions, Twilio) over the
socket it already holds — closing the "passthrough inbound doesn't work for
hosted gateways" gap.

- ws_transport.py: decode the passthrough_forward frame; PassthroughForward
  dataclass + _passthrough_from_wire (base64 body -> exact bytes, byte parity
  with the connector's toPassthroughForward); set_passthrough_handler mirrors
  set_interrupt_inbound_handler.
- transport.py: PassthroughHandler type + set_passthrough_handler on the
  RelayTransport protocol.
- adapter.py: connect() wires the passthrough handler; _on_passthrough decodes
  the (already-sanitized, token-free) forward and, for a Discord interaction,
  converts it to a MessageEvent routed through the normal agent path
  (handle_message) — the reply egresses over the outbound / token-less
  follow_up path, so the gateway never holds the interaction credential. Never
  raises (a bad forward can't kill the read loop). Non-discord forwards (Twilio)
  are logged + dropped for now.
- docs/relay-connector-contract.md: document the passthrough_forward frame +
  PassthroughForward shape + §3.1.

The interaction -> MessageEvent CONVERSION semantics (slash-command vs button
UX, option rendering) are the open sub-design flagged in the spec; the TRANSPORT
+ receive mechanism (this) is settled per Ben's Gate-2 decision: "the relay
adapter handles receiving these events over the WS."

Tests (tests/gateway/relay/test_relay_passthrough.py): byte-preservation
round-trip (+ malformed-body tolerance), connect() wiring, application-command
and message-component interactions route through handle_message with correct
session source + scope capture, malformed/non-discord forwards dropped cleanly.
100 relay tests green. Pairs with the connector PR (gateway-gateway).
2026-06-22 20:10:57 +10:00


# Relay ↔ Connector Contract (v1, EXPERIMENTAL)
> **Status:** EXPERIMENTAL. This contract MAY CHANGE without a deprecation
> cycle until at least two real Class-1 platforms (Discord + Telegram) have
> validated it. Evolution during the experimental phase is **additive-only**,
> gated by `contract_version`. A breaking change updates both repos in lockstep.

This document is the formal interface between the **Hermes gateway** (Python,
`gateway/relay/`) and the **connector** (Node/TypeScript,
`NousResearch/gateway-gateway`). The connector implementer's first action is to
read this file.
The gateway runs a generic `RelayAdapter` that dials **out** to the connector,
receives a `CapabilityDescriptor` at handshake, then exchanges normalized
`MessageEvent`s (inbound) and actions (outbound) over a per-turn bidirectional
WebSocket. The gateway never learns which concrete platform is fronting it; the
connector owns all platform-specific socket/identity logic.

---

## 1. Handshake
1. Gateway opens the transport (`connect`).
2. Gateway calls `handshake()`; connector returns a `CapabilityDescriptor`
(section 2) describing the platform this adapter instance fronts.
3. Gateway configures the adapter from the descriptor (char limit, length unit,
draft/edit/thread/markdown capabilities) and registers an inbound handler.
4. Connector then streams inbound events and accepts outbound actions.

`contract_version` (currently `1`) is carried in the descriptor. The gateway
ignores unknown descriptor fields (forward-compat) and fills missing optional
fields from defaults.

---

## 2. CapabilityDescriptor (handshake payload)
JSON object. Source of truth: `gateway/relay/descriptor.py`.
| Field | Type | Required | Meaning |
| --- | --- | --- | --- |
| `contract_version` | int | yes | Contract version (additive-only within a version). |
| `platform` | string | yes | Platform name (e.g. `"discord"`, `"telegram"`). |
| `label` | string | yes | Human-readable label. |
| `max_message_length` | int | yes | Char limit; gateway exposes as `MAX_MESSAGE_LENGTH`. 0 → treat as 4096. |
| `supports_draft_streaming` | bool | yes | Native draft-streaming preview support. |
| `supports_edit` | bool | yes | Edit-based streaming possible; if false, consumer degrades to one-message-per-segment. |
| `supports_threads` | bool | yes | `create_handoff_thread` capability. |
| `markdown_dialect` | string | yes | `"plain"`, `"markdown_v2"`, `"discord"`, … (drives `supports_code_blocks`). |
| `len_unit` | string | yes | `"chars"` (builtin len) or `"utf16"` (Telegram UTF-16 code units). |
| `emoji` | string | no | Display emoji (default 🔌). |
| `platform_hint` | string | no | System-prompt platform hint. |
| `pii_safe` | bool | no | Redact PII in session descriptions. |

Most fields are a projection of the gateway's existing `PlatformEntry`; the
runtime-only fields (`len_unit`, `supports_*`, `markdown_dialect`) come from the
live platform adapter's capability methods.
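
The descriptor rules above can be sketched as follows. This is a minimal illustration, not the code in `gateway/relay/descriptor.py`: `DescriptorSketch`, `parse_descriptor`, and `message_length` are hypothetical names, the defaults for `platform_hint` and `pii_safe` are assumptions (the table only fixes the 🔌 emoji default and the 0 → 4096 rule), and raising on a missing required field is one possible policy.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class DescriptorSketch:
    # Required fields, in the order the table lists them.
    contract_version: int
    platform: str
    label: str
    max_message_length: int
    supports_draft_streaming: bool
    supports_edit: bool
    supports_threads: bool
    markdown_dialect: str
    len_unit: str
    # Optional fields. Only the emoji default comes from the table;
    # the other two defaults are assumptions made for this sketch.
    emoji: str = "🔌"
    platform_hint: str = ""
    pii_safe: bool = False


def parse_descriptor(payload: Mapping[str, Any]) -> DescriptorSketch:
    """Build a descriptor, ignoring unknown keys and filling optional defaults."""
    known = {f.name for f in fields(DescriptorSketch)}
    required = [f.name for f in fields(DescriptorSketch) if f.default is MISSING]
    # Forward-compat: silently drop fields this gateway version does not know.
    kept = {k: v for k, v in payload.items() if k in known}
    missing = [name for name in required if name not in kept]
    if missing:
        raise ValueError(f"descriptor missing required fields: {missing}")
    # Table rule: a limit of 0 is treated as 4096.
    if kept["max_message_length"] == 0:
        kept["max_message_length"] = 4096
    return DescriptorSketch(**kept)


def message_length(text: str, len_unit: str) -> int:
    """Measure text in the unit the descriptor declares."""
    if len_unit == "utf16":
        # Each UTF-16 code unit is two bytes in the little-endian encoding.
        return len(text.encode("utf-16-le")) // 2
    return len(text)
```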

---

## 3. Inbound: `MessageEvent` envelope
The connector normalizes each platform wire event into a `MessageEvent`
(`gateway/platforms/base.py`) and delivers it to the gateway. **Inbound is
delivered over the gateway's OUTBOUND `/relay` WebSocket** (see the transport
note below) — the connector pushes an `inbound` frame down the socket the
gateway already dialed. The gateway keys the session via `build_session_key()`
from the embedded `SessionSource`, so populating the right discriminators is
the connector's most correctness-critical responsibility.
### Inbound transport (WS back-channel, not HTTP)
The gateway dials **out** to the connector's `/relay` WebSocket for the
handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound rides
the **same socket** in the other direction: the connector pushes an `inbound`
frame (and `interrupt_inbound` for §5) down the gateway's outbound WS. There is
**no gateway-side inbound HTTP endpoint** — a gateway need not (and, when hosted,
cannot) expose any inbound port; everything flows over the connection it
initiated.
**Multi-instance routing.** The connector instance that owns a platform's socket
(and thus produces inbound events) is generally **not** the instance the gateway
dialed its outbound WS into. The producing instance therefore publishes the
event on the connector's internal **relay bus** (Redis pub/sub; `RelayBus` in
`src/core/relayBus.ts`) keyed by tenant. Every connector instance subscribes and
routes each message to its **local** sessions for that tenant
(`RelayServer.routeBusMessage`); the single instance that actually holds the
gateway's socket delivers it, and instances with no local session for the tenant
no-op. Cross-instance delivery is thus an in-cluster Redis hop, not a public
HTTP call.
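
The routing rule above can be modelled in a few lines. The real implementation is the connector's TypeScript (`RelayBus`, `RelayServer.routeBusMessage`) over Redis pub/sub; the sketch below is only an in-memory Python model of the same behaviour, and every class and method name in it is hypothetical.

```python
from typing import Any, Dict, List

Frame = Dict[str, Any]


class ConnectorInstanceSketch:
    """One connector process; it may or may not hold a given gateway's socket."""

    def __init__(self, name: str) -> None:
        self.name = name
        # tenant -> outboxes; each outbox stands in for one gateway WebSocket.
        self.local_sessions: Dict[str, List[List[Frame]]] = {}

    def attach_gateway(self, tenant: str) -> List[Frame]:
        """Register a gateway socket for a tenant and return its outbox."""
        outbox: List[Frame] = []
        self.local_sessions.setdefault(tenant, []).append(outbox)
        return outbox

    def route_bus_message(self, tenant: str, frame: Frame) -> int:
        """Deliver to local sessions for the tenant; no-op when there are none."""
        sessions = self.local_sessions.get(tenant, [])
        for outbox in sessions:
            outbox.append(frame)
        return len(sessions)


class InMemoryBus:
    """Stand-in for the Redis pub/sub bus: every subscriber sees every message."""

    def __init__(self) -> None:
        self.subscribers: List[ConnectorInstanceSketch] = []

    def subscribe(self, instance: ConnectorInstanceSketch) -> None:
        self.subscribers.append(instance)

    def publish(self, tenant: str, frame: Frame) -> int:
        """Fan out to all instances and return how many sockets received it."""
        return sum(i.route_bus_message(tenant, frame) for i in self.subscribers)
```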
Frames (connector → gateway, over the WS):
- `{"type":"inbound", "event": <MessageEvent>, "bufferId"?}`
- `{"type":"interrupt_inbound", "session_key", "chat_id"}` (§5)
- `{"type":"passthrough_forward", "forward": <PassthroughForward>, "bufferId"?}` (§3.1; Phase 5 §5.1)

`PassthroughForward` is the wire form of a forwarded passthrough-plane request
(Class-2/3 webhooks — Discord interactions, Twilio): `{platform, botId, method,
path, headers: [[k,v],…], bodyB64}`. The body is base64-encoded so arbitrary
bytes survive the newline-delimited-JSON transport; the gateway base64-decodes
back to the exact bytes the connector forwarded (the connector already verified
the provider signature and stripped any shared-identity credential at the edge —
§6 — so the gateway re-processes a sanitized, token-free body and acts on it via
the token-less `follow_up` path). See §3.1.
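
A minimal sketch of decoding the `PassthroughForward` shape described above back to exact bytes. `ForwardSketch` and `forward_from_wire` are hypothetical names chosen for illustration, not the gateway's actual decoder, and returning `None` for a malformed forward is an assumed policy.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ForwardSketch:
    platform: str
    bot_id: str
    method: str
    path: str
    headers: Tuple[Tuple[str, str], ...]
    body: bytes  # the exact bytes the connector forwarded


def forward_from_wire(wire: Mapping[str, Any]) -> Optional[ForwardSketch]:
    """Decode the wire dict; return None instead of raising on bad input."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        body = base64.b64decode(wire["bodyB64"], validate=True)
        headers = tuple((str(k), str(v)) for k, v in wire["headers"])
        return ForwardSketch(
            platform=str(wire["platform"]),
            bot_id=str(wire["botId"]),
            method=str(wire["method"]),
            path=str(wire["path"]),
            headers=headers,
            body=body,
        )
    except (KeyError, TypeError, ValueError):
        # binascii.Error is a ValueError subclass, so bad base64 lands here too.
        return None


def encode_frame(forward: Mapping[str, Any]) -> str:
    """Serialize one frame as a single line of JSON (no raw newlines)."""
    return json.dumps({"type": "passthrough_forward", "forward": forward})
```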
**Trust.** The WS upgrade is authenticated with the gateway's per-gateway secret
(§6.1), so the channel is trusted end to end — inbound frames are not separately
HMAC-signed (the authenticated socket subsumes the per-delivery origin proof the
old HTTP path needed). The relay-bus hop is inside the connector trust domain
(same as the lease/buffer/capability stores).
> Earlier drafts of this contract delivered inbound over a signed **HTTP POST**
> to a `gatewayEndpoint` (`HttpGatewayDelivery` + a gateway-side
> `inbound_receiver`), HMAC-signed with a per-tenant delivery key. That required
> every gateway to expose a reachable inbound URL — impossible for hosted
> gateways, which have no public IP. The WS back-channel above replaces it; the
> per-tenant delivery key is retained at provision for forward-compat but is no
> longer used for inbound. The **passthrough plane** (Class-2/3 webhooks like
> Discord interactions / Twilio) historically still used `gatewayEndpoint` for
> its post-ACK forward; Phase 5 §5.1 moves that forward onto the WS too (the
> `passthrough_forward` frame above), so a hosted gateway needs zero public
> inbound surface and `gatewayEndpoint` is retired once the cutover lands.
### 3.1 Passthrough-plane forward (Phase 5 §5.1)
The passthrough plane answers the provider's latency-critical ACK at the
connector EDGE (e.g. Discord's deferred interaction response within ~3s), then
does a **fire-and-forget** forward of the real request to the gateway. That
forward needs no response back (the provider was already satisfied), so it rides
the same outbound WS as `inbound` via a `passthrough_forward` frame rather than
an HTTP POST. The gateway processes the decoded request through its normal agent
path (a Discord interaction is decoded to a `MessageEvent` and handled like a
message; the reply egresses over the outbound / `follow_up` path). `bufferId` is
present when the forward was buffered (Phase 5 §5.3 buffered-only flip) and the
gateway acks it after durable handoff.
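
The receive path described above can be sketched as a handler that never raises. Everything here is illustrative: `interaction_to_event` and `on_passthrough` are hypothetical names, the event is a plain dict rather than the real `MessageEvent`, and `handle_message` is a synchronous stand-in callback. The interaction type numbers (2 for application commands, 3 for message components) come from Discord's API; how an interaction is rendered into message text is an assumption made only for this example.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("relay.passthrough.sketch")


def interaction_to_event(body: bytes) -> Optional[Dict[str, Any]]:
    """Turn a sanitized Discord interaction body into a minimal event dict."""
    data = json.loads(body)
    payload = data.get("data") or {}
    if data.get("type") == 2:  # application (slash) command
        text = "/" + str(payload["name"])
    elif data.get("type") == 3:  # message component, e.g. a button
        text = str(payload["custom_id"])
    else:
        return None
    # Guild interactions nest the user under "member"; DMs use "user".
    user = (data.get("member") or {}).get("user") or data.get("user") or {}
    return {
        "text": text,
        "source": {
            "platform": "discord",
            "chat_id": data.get("channel_id"),
            "guild_id": data.get("guild_id"),
            "user_id": user.get("id"),
        },
    }


def on_passthrough(
    platform: str, body: bytes, handle_message: Callable[[Dict[str, Any]], None]
) -> bool:
    """Route one forward; return True if it reached handle_message."""
    try:
        if platform != "discord":
            log.info("dropping non-discord passthrough forward: %s", platform)
            return False
        event = interaction_to_event(body)
        if event is None:
            return False
        handle_message(event)
        return True
    except Exception:
        # A bad forward must not kill the read loop, so log and drop it.
        log.exception("dropping malformed passthrough forward")
        return False
```

Returning a boolean rather than raising keeps the caller's read loop trivial: it can log the outcome and move on to the next frame.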
### SessionSource fields (the wire surface)
Source of truth: `SessionSource.to_dict()` in `gateway/session.py`. These are
all the keys the gateway accepts on the wire. `platform`, `chat_id`, `chat_type`,
`user_id`, `user_name`, `thread_id`, `chat_name`, and `chat_topic` are always
present (may be `null`); the rest are included only when set.
| Field | Type | Always sent | Meaning |
| --- | --- | --- | --- |
| `platform` | string | yes | Platform name (matches the descriptor's `platform`). |
| `chat_id` | string | yes | Primary conversation id (channel/chat). Session-key discriminator. |
| `chat_type` | string | yes | `dm` / `group` / `channel` / `thread` / `forum`. |
| `chat_name` | string\|null | yes | Human-readable chat name. |
| `user_id` | string\|null | yes | Message author id. Session-key discriminator. |
| `user_name` | string\|null | yes | Author display name. |
| `thread_id` | string\|null | yes | Thread/forum-topic id when in a thread. Session-key discriminator. |
| `chat_topic` | string\|null | yes | Channel topic/description (Discord, Slack). |
| `user_id_alt` | string | no | Platform-specific stable alt id (Signal UUID, Feishu union_id). |
| `chat_id_alt` | string | no | Alternate chat id (e.g. Signal group internal id). |
| `guild_id` | string | no | Discord guild / Slack workspace / Matrix server scope. **REQUIRED for Discord server isolation.** Session-key discriminator. |
| `parent_chat_id` | string | no | Parent channel when `chat_id` refers to a thread. |
| `message_id` | string | no | Id of the triggering message (for pin/reply/react). |
> `is_bot` (author-is-a-bot/webhook classification) exists on the gateway-side
> dataclass but is **intentionally NOT on the wire** in v1 — it is not part of
> `to_dict()`. Do not add it to the connector's `SessionSource` until it is
> first added here and to `to_dict()` (additive bump).
### SessionSource discriminators per platform
| Platform | chat_id | chat_type | user_id | thread_id | guild_id |
| --- | --- | --- | --- | --- | --- |
| **Discord** | channel id | `dm`/`group`/`thread` | author id | thread channel id (threads) | **guild id** (REQUIRED for server isolation) |
| **Telegram** | chat id | `dm`/`group`/`forum` | from id | forum topic id (forums) | — |

**Get Discord's `guild_id` wrong and two servers collide into one session.**
This is the #1 High-severity risk. The gateway's `build_session_key()` is the
conformance oracle: for a given `SessionSource`, the connector's normalization
must produce the same key the Python adapter would. (The Phase-1 stub tests
assert known-input → known-key.)
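
To see why a dropped `guild_id` is dangerous, consider a toy key function. `toy_session_key` is hypothetical and is **not** the format `build_session_key()` produces; it only assumes that the key is derived from the discriminator fields listed above.

```python
from typing import Mapping, Optional

# Discriminator fields named in the tables above, in an arbitrary order.
_DISCRIMINATORS = ("platform", "guild_id", "chat_id", "thread_id", "user_id")


def toy_session_key(source: Mapping[str, Optional[str]]) -> str:
    """Join the discriminators; missing or null values become '-'."""
    return ":".join(source.get(name) or "-" for name in _DISCRIMINATORS)


server_a = {
    "platform": "discord",
    "guild_id": "guild-A",
    "chat_id": "100",
    "thread_id": None,
    "user_id": "7",
}
server_b = dict(server_a, guild_id="guild-B")

# With guild_id populated, the two servers get distinct keys.
distinct = toy_session_key(server_a) != toy_session_key(server_b)

# If the connector omits guild_id, the otherwise-identical sources collide.
no_guild_a = {k: v for k, v in server_a.items() if k != "guild_id"}
no_guild_b = {k: v for k, v in server_b.items() if k != "guild_id"}
collide = toy_session_key(no_guild_a) == toy_session_key(no_guild_b)
```

A conformance test on the connector side would instead feed known `SessionSource` inputs through its normalizer and compare against keys produced by the real `build_session_key()`.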
### Bot identity vs tenant (single-bot consolidation, Appendix A)
The envelope carries the **originating bot identity** as a field **distinct from
tenant**. Tenant is resolved from the event's own discriminator (Discord
`guild_id`, Telegram `chat_id`, webhook path/subdomain) — **never** from which
token/socket/process delivered it. This keeps one shared bot able to front many
tenants (Phase 6) without overloading an existing field.

---

## 4. Outbound: action set
The gateway calls the transport with action dicts. Source of truth:
`gateway/relay/transport.py` + `gateway/relay/adapter.py`.
| `op` | Fields | Result |
| --- | --- | --- |
| `send` | `chat_id`, `content`, `reply_to?`, `metadata?` | `{success: bool, message_id?, error?}` |
| `edit` | `chat_id`, `message_id`, `content`, `metadata?` | `{success: bool, error?}` |
| `typing` | `chat_id` | `{success: bool}` |
| `follow_up` | `session_key`, `kind`, `content`, `metadata?` | `{success: bool, message_id?, error?}` |

`get_chat_info(chat_id)` is a separate proxied call returning at least
`{name, type}`. Media actions follow the same envelope shape (deferred to a
later contract revision; additive).
**`follow_up` (A2 capability action).** Some inbound payloads carry a credential
that acts on the **shared** bot identity (e.g. a Discord interaction follow-up
token). Per §6 the connector strips that at the edge and binds it in its
capability vault keyed by the session; it **never reaches the gateway**. To use
it, the gateway issues `follow_up` naming the **session it is already in**
(`session_key`) plus the capability `kind` (e.g. `discord.interaction_token`) —
**never a token**. The connector resolves the real value from its vault,
enforces the tenant match (tenant B can never wield tenant A's capability), and
egresses. `success: false` when the capability is absent/expired or the tenant
doesn't match — the gateway has nothing to retry with, by design (a leaked
gateway holds zero capability material). Source of truth:
`gateway/relay/transport.py` (`send_follow_up`) + `gateway/relay/adapter.py`.
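
A minimal sketch of the `follow_up` flow described above, assuming a dict-shaped action and an in-memory vault. `follow_up_action`, `resolve_follow_up`, the vault layout, and the error strings are all hypothetical; the connector's real vault lives in the `gateway-gateway` repo.

```python
from typing import Any, Dict, Optional, Tuple

Action = Dict[str, Any]
# (session_key, kind) -> (owning tenant, secret value). Connector-side only.
Vault = Dict[Tuple[str, str], Tuple[str, str]]


def follow_up_action(
    session_key: str,
    kind: str,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Action:
    """Gateway side: name the session and capability kind, never a token."""
    action: Action = {
        "op": "follow_up",
        "session_key": session_key,
        "kind": kind,
        "content": content,
    }
    if metadata is not None:
        action["metadata"] = metadata
    return action


def resolve_follow_up(vault: Vault, tenant: str, action: Action) -> Dict[str, Any]:
    """Connector side: look up the capability and enforce the tenant match."""
    entry = vault.get((action["session_key"], action["kind"]))
    if entry is None:
        return {"success": False, "error": "capability absent or expired"}
    owner, _token = entry
    if owner != tenant:
        return {"success": False, "error": "tenant mismatch"}
    # A real connector would egress to the platform with the token here.
    return {"success": True, "message_id": "sketch-1"}
```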

---

## 5. Interrupt (`/stop`) routing
- **Gateway → connector:** `send_interrupt(session_key, reason?)` egresses a
mid-turn `/stop` over the outbound WS. The connector MUST forward it to the
gateway instance running that `session_key` (the routing invariant).
- **Connector → gateway:** an inbound interrupt for a `session_key` is delivered
as an `interrupt_inbound` frame down the gateway's outbound WS (§3 transport
note) — routed cross-instance via the relay bus to whichever instance holds
the socket — and bridged by the adapter's `on_interrupt(session_key, chat_id)`
into the existing per-session interrupt mechanism, cancelling exactly that turn
(siblings untouched).

Both directions ride the gateway's outbound WS: the gateway→connector `/stop`
egresses over it, and the connector→gateway interrupt rides the same `inbound`
back-channel as a normalized event.
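
The "cancel exactly that turn, siblings untouched" behaviour can be sketched with one asyncio task per session. This is an assumption about shape only: `TurnRegistrySketch` is a hypothetical name, and the gateway's existing per-session interrupt mechanism may work quite differently.

```python
import asyncio
from typing import Any, Awaitable, Dict, Tuple


class TurnRegistrySketch:
    """Tracks one in-flight turn per session_key."""

    def __init__(self) -> None:
        self._turns: Dict[str, "asyncio.Future[Any]"] = {}

    def start(self, session_key: str, work: Awaitable[Any]) -> "asyncio.Future[Any]":
        """Schedule a turn on the running loop and remember it by session."""
        task = asyncio.ensure_future(work)
        self._turns[session_key] = task
        return task

    def on_interrupt(self, session_key: str, chat_id: str) -> bool:
        """Cancel only the named session's turn; chat_id mirrors the frame."""
        task = self._turns.get(session_key)
        if task is None or task.done():
            return False
        task.cancel()
        return True


async def demo() -> Tuple[bool, bool, bool, Any]:
    registry = TurnRegistrySketch()
    long_turn = registry.start("s:a", asyncio.sleep(60))
    sibling = registry.start("s:b", asyncio.sleep(0.01, result="done"))
    hit = registry.on_interrupt("s:a", "chat-1")
    miss = registry.on_interrupt("s:unknown", "chat-1")
    results = await asyncio.gather(long_turn, sibling, return_exceptions=True)
    cancelled = isinstance(results[0], asyncio.CancelledError)
    return hit, miss, cancelled, results[1]
```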

---

## 6. Trust boundary & signed-body handling (A2)
**The connector is the sole crypto/identity boundary. The gateway re-validates
nothing.**
Webhook signatures (Discord ed25519, Twilio HMAC, WeCom BizMsgCrypt) are
computed over exact raw bytes, and some payloads are *encrypted* with a shared
secret. The connector fronts a **shared** bot for many tenants and holds every
tenant's platform secrets, so it:
- **verifies / decrypts at the edge** (the only place the secrets live),
- **normalizes** the payload into a tenant-scoped `MessageEvent` (§3),
- **strips any shared-identity capability** out of the payload and binds it in
its capability vault, keyed by the session (see §4 `follow_up`),
- **forwards only the sanitized `MessageEvent`**, never the raw signed body.

The gateway therefore performs **no** platform signature/crypto verification on
the relay path; it trusts the normalized event. This is an enforced invariant on
the gateway side (`tests/gateway/relay/test_relay_sheds_crypto.py`: the relay
package imports/calls no platform-crypto).
**Why not "forward the signed body byte-for-byte so the gateway re-validates"?**
That earlier model is incoherent under an untrusted, disposable tenant gateway:
- Re-validating Twilio HMAC / WeCom crypto would require handing the gateway the
**shared signing secret** — which is itself the leak, and on a shared bot it's
a *cross-tenant* leak.
- WeCom payloads are encrypted with the shared secret; the connector must decrypt
at the edge just to route, so forwarding ciphertext would again require giving
the gateway the secret.
- A Discord interaction token lives **inside** the signed JSON body — you cannot
both preserve the bytes and strip the credential; they are the same bytes.

So byte-preservation is abandoned deliberately: the connector re-serializes the
sanitized event and the gateway trusts it. This also unifies the passthrough and
relay planes — both are "verify at the edge → emit a normalized event," differing
only in transport. See `docs/capability-trust-boundary.md` (connector repo:
`gateway-gateway`) for the full A2 rationale and the connector-side vault.
### 6.1 Channel authentication (the connector⇄gateway link itself)
A2 makes the connector the sole holder of platform secrets while the gateway may
be **customer-managed and internet-exposed**, so the connector⇄gateway channel
is itself authenticated. The gateway holds an enrollment- or provision-issued
**per-gateway secret** (`hermes gateway enroll` → connector `/relay/enroll`, or
managed self-provision → `/relay/provision`) that authenticates its outbound WS
upgrade. It is an HMAC-SHA256 scheme with a multi-secret rotation verify list
(gateway side: `gateway/relay/auth.py`; connector side:
`src/core/relayAuthToken.ts`).
| Leg | Credential | Mechanism |
|-----|-----------|-----------|
| Gateway → connector WS upgrade | per-gateway secret | An `Authorization` bearer header on the `/relay` upgrade. The token is `base64url(payload:exp:sig)` where `payload = gatewayId` and `sig` is the HMAC-SHA256 of `payload:exp` keyed with the per-gateway secret. The connector verifies the token and rejects the upgrade (**close 4401**) on mismatch/absence/revocation. The authenticated tenant comes from the connector's store, never the `hello` frame. |
| Connector → gateway inbound (`inbound` / `interrupt_inbound` frames) | — (rides the authenticated WS) | Inbound is pushed down the gateway's already-authenticated outbound socket (§3), so no per-message signature is needed. A **per-tenant delivery key** is still issued at enroll/provision and retained for forward-compat, but is no longer used to sign inbound. |

This is the **channel** authenticator, distinct from platform crypto, which the
relay path still sheds entirely (§6). The gateway holds zero platform secrets;
the per-gateway secret authenticates only the connector link. Full threat model +
enrollment/rotation/kill-switch design: `docs/connector-gateway-auth-design.md`
(connector repo).
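
A minimal sketch of the token scheme in the table above, with a rotation verify list. It is not the code in `gateway/relay/auth.py` or `src/core/relayAuthToken.ts`: the function names, the hex encoding of `sig`, the unpadded base64url output, the TTL, and the use of Unix seconds for `exp` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Iterable, Optional


def mint_token(
    gateway_id: str, secret: bytes, ttl_s: int = 300, now: Optional[float] = None
) -> str:
    """Build base64url(payload:exp:sig) with payload = gateway_id."""
    exp = int((time.time() if now is None else now) + ttl_s)
    signed = f"{gateway_id}:{exp}".encode("utf-8")
    sig = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    raw = f"{gateway_id}:{exp}:{sig}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def verify_token(
    token: str, secrets: Iterable[bytes], now: Optional[float] = None
) -> Optional[str]:
    """Return the gateway id if any secret in the rotation list verifies."""
    current = time.time() if now is None else now
    try:
        padded = token + "=" * (-len(token) % 4)
        raw = base64.urlsafe_b64decode(padded.encode("ascii")).decode("utf-8")
        # Split from the right so a gateway id containing ':' still parses.
        gateway_id, exp_text, sig = raw.rsplit(":", 2)
        exp = int(exp_text)
    except ValueError:
        # Covers bad base64, bad UTF-8, too few parts, and a non-integer exp.
        return None
    if exp < current:
        return None
    signed = f"{gateway_id}:{exp}".encode("utf-8")
    for secret in secrets:
        expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
        # Constant-time comparison on bytes avoids leaking match position.
        if hmac.compare_digest(expected.encode("ascii"), sig.encode("utf-8")):
            return gateway_id
    return None
```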

---

## 7. Versioning policy
- `contract_version` is an int; bump **only** for additive changes during the
experimental phase (new optional fields, new `op`s).
- A breaking change (renamed/removed field, changed semantics) requires a
coordinated update of both repos and a version bump.
- The connector's first PR references the commit SHA of this file it implements
against.