mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-24 10:52:21 +00:00
The connector half (gateway-gateway) moves the passthrough plane's post-ACK forward off the HTTP gatewayEndpoint onto the gateway's outbound /relay WS via a new passthrough_forward frame. This is the gateway side: the relay adapter now RECEIVES and handles that frame, so a hosted gateway (no public IP) can process forwarded Class-2/3 traffic (Discord interactions, Twilio) over the socket it already holds — closing the "passthrough inbound doesn't work for hosted gateways" gap.

- ws_transport.py: decode the passthrough_forward frame; PassthroughForward dataclass + _passthrough_from_wire (base64 body -> exact bytes, byte parity with the connector's toPassthroughForward); set_passthrough_handler mirrors set_interrupt_inbound_handler.
- transport.py: PassthroughHandler type + set_passthrough_handler on the RelayTransport protocol.
- adapter.py: connect() wires the passthrough handler; _on_passthrough decodes the (already-sanitized, token-free) forward and, for a Discord interaction, converts it to a MessageEvent routed through the normal agent path (handle_message) — the reply egresses over the outbound / token-less follow_up path, so the gateway never holds the interaction credential. Never raises (a bad forward can't kill the read loop). Non-discord forwards (Twilio) are logged + dropped for now.
- docs/relay-connector-contract.md: document the passthrough_forward frame + PassthroughForward shape + §3.1.

The interaction -> MessageEvent CONVERSION semantics (slash-command vs button UX, option rendering) are the open sub-design flagged in the spec; the TRANSPORT + receive mechanism (this) is settled per Ben's Gate-2 decision: "the relay adapter handles receiving these events over the WS."

Tests (tests/gateway/relay/test_relay_passthrough.py): byte-preservation round-trip (+ malformed-body tolerance), connect() wiring, application-command and message-component interactions route through handle_message with correct session source + scope capture, malformed/non-discord forwards dropped cleanly. 100 relay tests green. Pairs with the connector PR (gateway-gateway).
310 lines
17 KiB
Markdown
# Relay ↔ Connector Contract (v1, EXPERIMENTAL)

> **Status:** EXPERIMENTAL. This contract MAY CHANGE without a deprecation
> cycle until at least two real Class-1 platforms (Discord + Telegram) have
> validated it. Evolution during the experimental phase is **additive-only**,
> gated by `contract_version`. A breaking change updates both repos in lockstep.

This document is the formal interface between the **Hermes gateway** (Python,
`gateway/relay/`) and the **connector** (Node/TypeScript,
`NousResearch/gateway-gateway`). The connector implementer's first action is to
read this file.

The gateway runs a generic `RelayAdapter` that dials **out** to the connector,
receives a `CapabilityDescriptor` at handshake, then exchanges normalized
`MessageEvent`s (inbound) and actions (outbound) over a per-turn bidirectional
WebSocket. The gateway never learns which concrete platform is fronting it; the
connector owns all platform-specific socket/identity logic.

---

## 1. Handshake

1. Gateway opens the transport (`connect`).
2. Gateway calls `handshake()`; connector returns a `CapabilityDescriptor`
   (section 2) describing the platform this adapter instance fronts.
3. Gateway configures the adapter from the descriptor (char limit, length unit,
   draft/edit/thread/markdown capabilities) and registers an inbound handler.
4. Connector then streams inbound events and accepts outbound actions.

`contract_version` (currently `1`) is carried in the descriptor. The gateway
ignores unknown descriptor fields (forward-compat) and fills missing optional
fields from defaults.

---

## 2. CapabilityDescriptor (handshake payload)

JSON object. Source of truth: `gateway/relay/descriptor.py`.

| Field | Type | Required | Meaning |
| --- | --- | --- | --- |
| `contract_version` | int | yes | Contract version (additive-only within a version). |
| `platform` | string | yes | Platform name (e.g. `"discord"`, `"telegram"`). |
| `label` | string | yes | Human-readable label. |
| `max_message_length` | int | yes | Char limit; gateway exposes as `MAX_MESSAGE_LENGTH`. 0 → treat as 4096. |
| `supports_draft_streaming` | bool | yes | Native draft-streaming preview support. |
| `supports_edit` | bool | yes | Edit-based streaming possible; if false, consumer degrades to one-message-per-segment. |
| `supports_threads` | bool | yes | `create_handoff_thread` capability. |
| `markdown_dialect` | string | yes | `"plain"`, `"markdown_v2"`, `"discord"`, … (drives `supports_code_blocks`). |
| `len_unit` | string | yes | `"chars"` (builtin len) or `"utf16"` (Telegram UTF-16 code units). |
| `emoji` | string | no | Display emoji (default 🔌). |
| `platform_hint` | string | no | System-prompt platform hint. |
| `pii_safe` | bool | no | Redact PII in session descriptions. |

Most fields are a projection of the gateway's existing `PlatformEntry`; the
runtime-only fields (`len_unit`, `supports_*`, `markdown_dialect`) come from the
live platform adapter's capability methods.

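As an illustration of the descriptor rules above (required fields, unknown fields ignored, optional fields defaulted, `0` treated as 4096, and the two `len_unit` modes), here is a minimal sketch. It is not the code in `gateway/relay/descriptor.py`: `DescriptorSketch`, `descriptor_from_wire`, and `message_length` are hypothetical names, and the defaults for `platform_hint` and `pii_safe` are assumptions (only the emoji default is stated in the table).

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

# Required fields, per the table above.
REQUIRED = (
    "contract_version", "platform", "label", "max_message_length",
    "supports_draft_streaming", "supports_edit", "supports_threads",
    "markdown_dialect", "len_unit",
)


@dataclass(frozen=True)
class DescriptorSketch:
    contract_version: int
    platform: str
    label: str
    max_message_length: int
    supports_draft_streaming: bool
    supports_edit: bool
    supports_threads: bool
    markdown_dialect: str
    len_unit: str
    emoji: str = "🔌"          # default stated in the table
    platform_hint: str = ""    # assumed default
    pii_safe: bool = False     # assumed default


def descriptor_from_wire(payload: Mapping[str, Any]) -> DescriptorSketch:
    """Build a descriptor from the handshake JSON object."""
    missing = [name for name in REQUIRED if name not in payload]
    if missing:
        raise ValueError(f"descriptor missing required fields: {missing}")
    known = {f.name for f in fields(DescriptorSketch)}
    # Forward-compat: silently drop fields this version does not know.
    kept = {k: v for k, v in payload.items() if k in known}
    if kept["max_message_length"] == 0:
        kept["max_message_length"] = 4096
    return DescriptorSketch(**kept)


def message_length(text: str, len_unit: str) -> int:
    """Measure text in the unit the descriptor declares."""
    if len_unit == "utf16":
        # Two bytes per UTF-16 code unit; surrogate pairs count as two units.
        return len(text.encode("utf-16-le")) // 2
    return len(text)
```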
---

## 3. Inbound: `MessageEvent` envelope

The connector normalizes each platform wire event into a `MessageEvent`
(`gateway/platforms/base.py`) and delivers it to the gateway. **Inbound is
delivered over the gateway's OUTBOUND `/relay` WebSocket** (see the transport
note below) — the connector pushes an `inbound` frame down the socket the
gateway already dialed. The gateway keys the session via `build_session_key()`
from the embedded `SessionSource` — so populating the right discriminators is
the single highest-correctness responsibility of the connector.

### Inbound transport (WS back-channel, not HTTP)

The gateway dials **out** to the connector's `/relay` WebSocket for the
handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound rides
the **same socket** in the other direction: the connector pushes an `inbound`
frame (and `interrupt_inbound` for §5) down the gateway's outbound WS. There is
**no gateway-side inbound HTTP endpoint** — a gateway need not (and, when hosted,
cannot) expose any inbound port; everything flows over the connection it
initiated.

**Multi-instance routing.** The connector instance that owns a platform's socket
(and thus produces inbound events) is generally **not** the instance the gateway
dialed its outbound WS into. The producing instance therefore publishes the
event on the connector's internal **relay bus** (Redis pub/sub; `RelayBus` in
`src/core/relayBus.ts`) keyed by tenant. Every connector instance subscribes and
routes each message to its **local** sessions for that tenant
(`RelayServer.routeBusMessage`); the single instance that actually holds the
gateway's socket delivers it, and instances with no local session for the tenant
no-op. Cross-instance delivery is thus an in-cluster Redis hop, not a public
HTTP call.

Frames (connector → gateway, over the WS):

- `{"type":"inbound", "event": <MessageEvent>, "bufferId"?}`
- `{"type":"interrupt_inbound", "session_key", "chat_id"}` (§5)
- `{"type":"passthrough_forward", "forward": <PassthroughForward>, "bufferId"?}` (§3.1; Phase 5 §5.1 of the spec)

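A minimal sketch of how a receiver might route these three frame types, assuming one JSON object per line and a `type` key as shown in the list above. `dispatch_line` and the handler map are hypothetical names, not the actual read loop in `ws_transport.py`; the point is only that unknown or malformed frames are skipped rather than raised.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def dispatch_line(line: str, handlers: Dict[str, Handler]) -> bool:
    """Decode one newline-delimited JSON frame and route it by its "type".

    Returns True when a handler ran, False when the frame was skipped.
    """
    try:
        frame = json.loads(line)
    except json.JSONDecodeError:
        return False  # malformed JSON: skip, keep the read loop alive
    if not isinstance(frame, dict):
        return False
    kind = frame.get("type")
    if not isinstance(kind, str):
        return False
    handler = handlers.get(kind)
    if handler is None:
        return False  # unknown frame type: ignore (forward-compat)
    handler(frame)
    return True


# Usage: one handler per frame type listed above.
seen = []
handlers = {
    "inbound": lambda f: seen.append(("inbound", f.get("bufferId"))),
    "interrupt_inbound": lambda f: seen.append(("interrupt", f["session_key"])),
    "passthrough_forward": lambda f: seen.append(("forward", f.get("bufferId"))),
}
dispatch_line('{"type":"interrupt_inbound","session_key":"s1","chat_id":"c1"}', handlers)
```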
`PassthroughForward` is the wire form of a forwarded passthrough-plane request
(Class-2/3 webhooks — Discord interactions, Twilio): `{platform, botId, method,
path, headers: [[k,v],…], bodyB64}`. The body is base64-encoded so arbitrary
bytes survive the newline-delimited-JSON transport; the gateway base64-decodes
back to the exact bytes the connector forwarded (the connector already verified
the provider signature and stripped any shared-identity credential at the edge —
§6 — so the gateway re-processes a sanitized, token-free body and acts on it via
the token-less `follow_up` path). See §3.1.

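The byte-preserving decode described above can be sketched as follows. This is an illustrative stand-in, not the `PassthroughForward` / `_passthrough_from_wire` pair in `ws_transport.py`: `ForwardSketch` and `forward_from_wire` are hypothetical names, the standard base64 alphabet is an assumption, and returning `None` for a malformed forward is one possible tolerance policy among several.

```python
import base64
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ForwardSketch:
    platform: str
    bot_id: str
    method: str
    path: str
    headers: List[Tuple[str, str]]
    body: bytes  # exact bytes the connector forwarded


def forward_from_wire(wire: Mapping[str, Any]) -> Optional[ForwardSketch]:
    """Decode the wire dict; return None instead of raising on bad input."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        body = base64.b64decode(wire["bodyB64"], validate=True)
        headers = [(str(k), str(v)) for k, v in wire.get("headers", [])]
        return ForwardSketch(
            platform=str(wire["platform"]),
            bot_id=str(wire["botId"]),
            method=str(wire["method"]),
            path=str(wire["path"]),
            headers=headers,
            body=body,
        )
    except (KeyError, TypeError, ValueError):
        # binascii.Error is a ValueError subclass, so bad base64 lands here.
        return None


# Round trip: arbitrary bytes (including NUL and 0xFF) survive the JSON hop.
raw = b'{"type":2}\n\x00\xff'
wire = {
    "platform": "discord", "botId": "bot-1", "method": "POST",
    "path": "/interactions",
    "headers": [["content-type", "application/json"]],
    "bodyB64": base64.b64encode(raw).decode("ascii"),
}
decoded = forward_from_wire(wire)
```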
**Trust.** The WS upgrade is authenticated with the gateway's per-gateway secret
(§6.1), so the channel is trusted end to end — inbound frames are not separately
HMAC-signed (the authenticated socket subsumes the per-delivery origin proof the
old HTTP path needed). The relay-bus hop is inside the connector trust domain
(same as the lease/buffer/capability stores).

> Earlier drafts of this contract delivered inbound over a signed **HTTP POST**
> to a `gatewayEndpoint` (`HttpGatewayDelivery` + a gateway-side
> `inbound_receiver`), HMAC-signed with a per-tenant delivery key. That required
> every gateway to expose a reachable inbound URL — impossible for hosted
> gateways, which have no public IP. The WS back-channel above replaces it; the
> per-tenant delivery key is retained at provision for forward-compat but is no
> longer used for inbound. The **passthrough plane** (Class-2/3 webhooks like
> Discord interactions / Twilio) historically still used `gatewayEndpoint` for
> its post-ACK forward; Phase 5 §5.1 moves that forward onto the WS too (the
> `passthrough_forward` frame above), so a hosted gateway needs zero public
> inbound surface and `gatewayEndpoint` is retired once the cutover lands.

### 3.1 Passthrough-plane forward (Phase 5 §5.1)

The passthrough plane answers the provider's latency-critical ACK at the
connector EDGE (e.g. Discord's deferred interaction response within ~3s), then
does a **fire-and-forget** forward of the real request to the gateway. That
forward needs no response back (the provider was already satisfied), so it rides
the same outbound WS as `inbound` via a `passthrough_forward` frame rather than
an HTTP POST. The gateway processes the decoded request through its normal agent
path (a Discord interaction is decoded to a `MessageEvent` and handled like a
message; the reply egresses over the outbound / `follow_up` path). `bufferId` is
present when the forward was buffered (Phase 5 §5.3 buffered-only flip) and the
gateway acks it after durable handoff.

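The receive-side behaviour described here (route Discord interactions into the agent path, drop other platforms, never let a bad forward escape, ack a buffered forward only after handoff) might be sketched like this. Everything in the block is hypothetical: `handle_forward`, the `route` callback standing in for the agent path, and the `ack` callback are illustrative, and the contract does not specify how the ack is carried on the wire or whether dropped forwards are acked.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("relay.passthrough.sketch")


def handle_forward(
    platform: str,
    body: bytes,
    route: Callable[[Dict[str, Any]], None],
    ack: Callable[[str], None],
    buffer_id: Optional[str] = None,
) -> bool:
    """Process one decoded forward. Returns True if it reached `route`.

    Never raises: any failure is logged and the forward is dropped, so a
    bad forward cannot take down the caller's read loop.
    """
    try:
        if platform != "discord":
            log.info("dropping non-discord forward (platform=%s)", platform)
            return False
        interaction = json.loads(body)
        if not isinstance(interaction, dict):
            log.warning("dropping forward with non-object body")
            return False
        route(interaction)      # hand off to the normal agent path
        if buffer_id is not None:
            ack(buffer_id)      # assumption: ack only after a successful handoff
        return True
    except Exception:
        log.exception("passthrough forward failed; dropping")
        return False
```

In this sketch a forward that fails before handoff is left un-acked, on the assumption that the buffer would redeliver it; the real policy belongs to the buffered-delivery design, not to this example.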
### SessionSource fields (the wire surface)

Source of truth: `SessionSource.to_dict()` in `gateway/session.py`. These are
every key the gateway accepts on the wire. `platform`, `chat_id`, `chat_type`,
`user_id`, `user_name`, `thread_id`, `chat_name`, and `chat_topic` are always
present (may be `null`); the rest are included only when set.

| Field | Type | Always sent | Meaning |
| --- | --- | --- | --- |
| `platform` | string | yes | Platform name (matches the descriptor's `platform`). |
| `chat_id` | string | yes | Primary conversation id (channel/chat). Session-key discriminator. |
| `chat_type` | string | yes | `dm` / `group` / `channel` / `thread` / `forum`. |
| `chat_name` | string\|null | yes | Human-readable chat name. |
| `user_id` | string\|null | yes | Message author id. Session-key discriminator. |
| `user_name` | string\|null | yes | Author display name. |
| `thread_id` | string\|null | yes | Thread/forum-topic id when in a thread. Session-key discriminator. |
| `chat_topic` | string\|null | yes | Channel topic/description (Discord, Slack). |
| `user_id_alt` | string | no | Platform-specific stable alt id (Signal UUID, Feishu union_id). |
| `chat_id_alt` | string | no | Alternate chat id (e.g. Signal group internal id). |
| `guild_id` | string | no | Discord guild / Slack workspace / Matrix server scope. **REQUIRED for Discord server isolation.** Session-key discriminator. |
| `parent_chat_id` | string | no | Parent channel when `chat_id` refers to a thread. |
| `message_id` | string | no | Id of the triggering message (for pin/reply/react). |

> `is_bot` (author-is-a-bot/webhook classification) exists on the gateway-side
> dataclass but is **intentionally NOT on the wire** in v1 — it is not part of
> `to_dict()`. Do not add it to the connector's `SessionSource` until it is
> first added here and to `to_dict()` (additive bump).

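To make the "always sent" versus "only when set" split concrete, here is a small sketch of a serializer following the table. It is not the real `SessionSource.to_dict()` in `gateway/session.py`: `SourceSketch` and `to_wire` are hypothetical, and treating "set" as "not `None`" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Keys that are always present on the wire (value may be null).
ALWAYS = ("platform", "chat_id", "chat_type", "user_id", "user_name",
          "thread_id", "chat_name", "chat_topic")
# Keys that appear only when set.
WHEN_SET = ("user_id_alt", "chat_id_alt", "guild_id", "parent_chat_id",
            "message_id")


@dataclass
class SourceSketch:
    platform: str
    chat_id: str
    chat_type: str
    user_id: Optional[str] = None
    user_name: Optional[str] = None
    thread_id: Optional[str] = None
    chat_name: Optional[str] = None
    chat_topic: Optional[str] = None
    user_id_alt: Optional[str] = None
    chat_id_alt: Optional[str] = None
    guild_id: Optional[str] = None
    parent_chat_id: Optional[str] = None
    message_id: Optional[str] = None
    is_bot: bool = False  # gateway-side only; deliberately never serialized

    def to_wire(self) -> Dict[str, Any]:
        wire: Dict[str, Any] = {key: getattr(self, key) for key in ALWAYS}
        for key in WHEN_SET:
            value = getattr(self, key)
            if value is not None:  # assumption: "set" means not None
                wire[key] = value
        return wire


wire_source = SourceSketch(
    platform="discord", chat_id="chan-1", chat_type="group",
    user_id="user-1", guild_id="guild-1", is_bot=True,
).to_wire()
```

Here `wire_source` carries `thread_id` as `None`, includes `guild_id`, and has no `is_bot` or `message_id` key.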
### SessionSource discriminators per platform

| Platform | chat_id | chat_type | user_id | thread_id | guild_id |
| --- | --- | --- | --- | --- | --- |
| **Discord** | channel id | `dm`/`group`/`thread` | author id | thread channel id (threads) | **guild id** (REQUIRED for server isolation) |
| **Telegram** | chat id | `dm`/`group`/`forum` | from id | forum topic id (forums) | — |

**Get Discord's `guild_id` wrong and two servers collide into one session.**
This is the #1 High-severity risk. The gateway's `build_session_key()` is the
conformance oracle: for a given `SessionSource`, the connector's normalization
must produce the same key the Python adapter would. (The Phase-1 stub tests
assert known-input → known-key.)

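The collision risk can be shown with a toy key function. This is emphatically not `build_session_key()`: the real key format is defined by the gateway and is not reproduced here. `toy_session_key`, its field order, and the ids below are all invented for illustration; the only point is that a key built from the discriminators stops separating two servers once `guild_id` is dropped.

```python
from typing import Mapping, Optional


def toy_session_key(source: Mapping[str, Optional[str]]) -> str:
    """Hypothetical key: join the discriminator fields, '-' when absent."""
    names = ("platform", "guild_id", "chat_id", "thread_id", "user_id")
    return ":".join(source.get(name) or "-" for name in names)


# Two events that agree on everything except the server they came from.
server_a = {"platform": "discord", "guild_id": "guild-A",
            "chat_id": "general", "user_id": "user-1"}
server_b = {"platform": "discord", "guild_id": "guild-B",
            "chat_id": "general", "user_id": "user-1"}

distinct = toy_session_key(server_a) != toy_session_key(server_b)

# Same events with guild_id lost during normalization: the keys now match.
without_guild_a = {k: v for k, v in server_a.items() if k != "guild_id"}
without_guild_b = {k: v for k, v in server_b.items() if k != "guild_id"}
collided = toy_session_key(without_guild_a) == toy_session_key(without_guild_b)
```

A conformance test on the connector side would compare its normalized output against keys produced by the real `build_session_key()`, not against a stand-in like this one.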
### Bot identity vs tenant (single-bot consolidation, Appendix A)

The envelope carries the **originating bot identity** as a field **distinct from
tenant**. Tenant is resolved from the event's own discriminator (Discord
`guild_id`, Telegram `chat_id`, webhook path/subdomain) — **never** from which
token/socket/process delivered it. This keeps one shared bot able to front many
tenants (Phase 6) without overloading an existing field.

---

## 4. Outbound: action set

The gateway calls the transport with action dicts. Source of truth:
`gateway/relay/transport.py` + `gateway/relay/adapter.py`.

| `op` | Fields | Result |
| --- | --- | --- |
| `send` | `chat_id`, `content`, `reply_to?`, `metadata?` | `{success: bool, message_id?, error?}` |
| `edit` | `chat_id`, `message_id`, `content`, `metadata?` | `{success: bool, error?}` |
| `typing` | `chat_id` | `{success: bool}` |
| `follow_up` | `session_key`, `kind`, `content`, `metadata?` | `{success: bool, message_id?, error?}` |

`get_chat_info(chat_id)` is a separate proxied call returning at least
`{name, type}`. Media actions follow the same envelope shape (deferred to a
later contract revision; additive).

**`follow_up` (A2 capability action).** Some inbound payloads carry a credential
that acts on the **shared** bot identity (e.g. a Discord interaction follow-up
token). Per §6 the connector strips that at the edge and binds it in its
capability vault keyed by the session; it **never reaches the gateway**. To use
it, the gateway issues `follow_up` naming the **session it is already in**
(`session_key`) plus the capability `kind` (e.g. `discord.interaction_token`) —
**never a token**. The connector resolves the real value from its vault,
enforces the tenant match (tenant B can never wield tenant A's capability), and
egresses. `success: false` when the capability is absent/expired or the tenant
doesn't match — the gateway has nothing to retry with, by design (a leaked
gateway holds zero capability material). Source of truth:
`gateway/relay/transport.py` (`send_follow_up`) + `gateway/relay/adapter.py`.

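A sketch of both halves of `follow_up`, under stated assumptions: the action is a plain dict whose operation key is named `op` (taken from the table's column header), and the connector-side vault is modelled in Python purely for illustration (the real connector is TypeScript). `send_action`, `follow_up_action`, `ToyVault`, and `egress_follow_up` are hypothetical names.

```python
from typing import Any, Dict, Optional, Tuple


def send_action(chat_id: str, content: str,
                reply_to: Optional[str] = None) -> Dict[str, Any]:
    action: Dict[str, Any] = {"op": "send", "chat_id": chat_id, "content": content}
    if reply_to is not None:
        action["reply_to"] = reply_to
    return action


def follow_up_action(session_key: str, kind: str, content: str) -> Dict[str, Any]:
    # The gateway names the session and capability kind; it has no token to send.
    return {"op": "follow_up", "session_key": session_key,
            "kind": kind, "content": content}


class ToyVault:
    """Illustrative stand-in for the connector's capability vault."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def bind(self, tenant: str, session_key: str, kind: str, value: str) -> None:
        self._store[(session_key, kind)] = (tenant, value)

    def resolve(self, tenant: str, session_key: str, kind: str) -> Optional[str]:
        entry = self._store.get((session_key, kind))
        if entry is None or entry[0] != tenant:
            return None  # absent, or bound to a different tenant
        return entry[1]


def egress_follow_up(vault: ToyVault, tenant: str,
                     action: Dict[str, Any]) -> Dict[str, Any]:
    token = vault.resolve(tenant, action["session_key"], action["kind"])
    if token is None:
        return {"success": False, "error": "capability absent or tenant mismatch"}
    # A real connector would call the platform API with `token` here.
    return {"success": True}


vault = ToyVault()
vault.bind("tenant-a", "sess-1", "discord.interaction_token", "secret-token")
action = follow_up_action("sess-1", "discord.interaction_token", "done!")
ok = egress_follow_up(vault, "tenant-a", action)
denied = egress_follow_up(vault, "tenant-b", action)
```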
---

## 5. Interrupt (`/stop`) routing

- **Gateway → connector:** `send_interrupt(session_key, reason?)` egresses a
  mid-turn `/stop` over the outbound WS. The connector MUST forward it to the
  gateway instance running that `session_key` (the routing invariant).
- **Connector → gateway:** an inbound interrupt for a `session_key` is delivered
  as an `interrupt_inbound` frame down the gateway's outbound WS (§3 transport
  note) — routed cross-instance via the relay bus to whichever instance holds
  the socket — and bridged by the adapter's `on_interrupt(session_key, chat_id)`
  into the existing per-session interrupt mechanism, cancelling exactly that turn
  (siblings untouched).

Both directions ride the gateway's outbound WS: the gateway→connector `/stop`
egresses over it, and the connector→gateway interrupt rides the same
back-channel as `inbound` events, as an `interrupt_inbound` frame.

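The "cancel exactly that turn, siblings untouched" behaviour can be sketched with one `asyncio` task per session. This is not the gateway's interrupt mechanism: `TurnRegistry` and `demo` are hypothetical, and only the `on_interrupt(session_key, chat_id)` signature is taken from the text above (`chat_id` is accepted but unused in this sketch).

```python
import asyncio
from typing import Any, Coroutine, Dict


class TurnRegistry:
    """Tracks one running turn (an asyncio task) per session key."""

    def __init__(self) -> None:
        self._turns: Dict[str, "asyncio.Task[Any]"] = {}

    def start(self, session_key: str,
              coro: Coroutine[Any, Any, Any]) -> "asyncio.Task[Any]":
        # Must be called from inside a running event loop.
        task = asyncio.create_task(coro)
        self._turns[session_key] = task
        return task

    def on_interrupt(self, session_key: str, chat_id: str) -> bool:
        """Cancel only the turn for this session. True if one was cancelled."""
        task = self._turns.pop(session_key, None)
        if task is None or task.done():
            return False
        task.cancel()
        return True


async def demo():
    registry = TurnRegistry()
    long_turn = registry.start("sess-a", asyncio.sleep(60))
    sibling = registry.start("sess-b", asyncio.sleep(0.01, result="done"))
    await asyncio.sleep(0)  # let both turns start
    hit = registry.on_interrupt("sess-a", "chat-1")
    try:
        await long_turn
    except asyncio.CancelledError:
        pass  # expected: the interrupted turn was cancelled
    return hit, long_turn.cancelled(), await sibling
```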
---

## 6. Trust boundary & signed-body handling (A2)

**The connector is the sole crypto/identity boundary. The gateway re-validates
nothing.**

Webhook signatures (Discord ed25519, Twilio HMAC, WeCom BizMsgCrypt) are
computed over exact raw bytes, and some payloads are *encrypted* with a shared
secret. The connector fronts a **shared** bot for many tenants and holds every
tenant's platform secrets, so it:

- **verifies / decrypts at the edge** (the only place the secrets live),
- **normalizes** the payload into a tenant-scoped `MessageEvent` (§3),
- **strips any shared-identity capability** out of the payload and binds it in
  its capability vault, keyed by the session (see §4 `follow_up`),
- **forwards only the sanitized `MessageEvent`** — never the raw signed body.

The gateway therefore performs **no** platform signature/crypto verification on
the relay path; it trusts the normalized event. This is an enforced invariant on
the gateway side (`tests/gateway/relay/test_relay_sheds_crypto.py`: the relay
package imports/calls no platform-crypto).

**Why not "forward the signed body byte-for-byte so the gateway re-validates"?**
That earlier model is incoherent under an untrusted, disposable tenant gateway:

- Re-validating Twilio HMAC / WeCom crypto would require handing the gateway the
  **shared signing secret** — which is itself the leak, and on a shared bot it's
  a *cross-tenant* leak.
- WeCom payloads are encrypted with the shared secret; the connector must decrypt
  at the edge just to route, so forwarding ciphertext would again require giving
  the gateway the secret.
- A Discord interaction token lives **inside** the signed JSON body — you cannot
  both preserve the bytes and strip the credential; they are the same bytes.

So byte-preservation is abandoned deliberately: the connector re-serializes the
sanitized event and the gateway trusts it. This also unifies the passthrough and
relay planes — both are "verify at the edge → emit a normalized event," differing
only in transport. See `docs/capability-trust-boundary.md` (connector repo:
`gateway-gateway`) for the full A2 rationale and the connector-side vault.

### 6.1 Channel authentication (the connector⇄gateway link itself)

A2 makes the connector the sole holder of platform secrets while the gateway may
be **customer-managed and internet-exposed**, so the connector⇄gateway channel
is itself authenticated. The gateway holds an enrollment- or provision-issued
**per-gateway secret** (`hermes gateway enroll` → connector `/relay/enroll`, or
managed self-provision → `/relay/provision`) that authenticates its outbound WS
upgrade. It is an HMAC-SHA256 scheme with a multi-secret rotation verify list
(gateway side: `gateway/relay/auth.py`; connector side:
`src/core/relayAuthToken.ts`).

| Leg | Credential | Mechanism |
|-----|-----------|-----------|
| Gateway → connector WS upgrade | per-gateway secret | An `Authorization` bearer header on the `/relay` upgrade. The token is `base64url(payload:exp:sig)` where `payload = gatewayId` and `sig = HMAC(payload:exp, secret)`. Connector verifies and rejects the upgrade (**close 4401**) on mismatch/absence/revocation. The authenticated tenant comes from the connector's store, never the `hello` frame. |
| Connector → gateway inbound (`inbound` / `interrupt_inbound` frames) | — (rides the authenticated WS) | Inbound is pushed down the gateway's already-authenticated outbound socket (§3), so no per-message signature is needed. A **per-tenant delivery key** is still issued at enroll/provision and retained for forward-compat, but is no longer used to sign inbound. |

This is the **channel** authenticator — distinct from platform crypto, which the
relay path still sheds entirely (§6). The gateway holds zero platform secrets;
the per-gateway secret authenticates only the connector link. Full threat model +
enrollment/rotation/kill-switch design: `docs/connector-gateway-auth-design.md`
(connector repo).

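A sketch of the token shape in the table, `base64url(payload:exp:sig)` with `sig = HMAC(payload:exp, secret)` using SHA-256, and of verification against a rotation list. Several details are assumptions, since the contract text does not fix them: the signature is hex-encoded, `exp` is a Unix timestamp in seconds, base64url padding is kept, and the TTL is arbitrary. `mint_token` and `verify_token` are hypothetical names, not the functions in `gateway/relay/auth.py` or `src/core/relayAuthToken.ts`.

```python
import base64
import hashlib
import hmac
import time
from typing import Iterable, Optional


def _sign(payload: str, exp: int, secret: bytes) -> str:
    message = f"{payload}:{exp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()  # hex: assumed


def mint_token(gateway_id: str, secret: bytes, ttl_s: int = 300,
               now: Optional[float] = None) -> str:
    exp = int((time.time() if now is None else now) + ttl_s)
    raw = f"{gateway_id}:{exp}:{_sign(gateway_id, exp, secret)}"
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def verify_token(token: str, secrets: Iterable[bytes],
                 now: Optional[float] = None) -> Optional[str]:
    """Return the gateway id if any secret in the rotation list verifies."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
        gateway_id, exp_text, sig = raw.rsplit(":", 2)
        exp = int(exp_text)
    except ValueError:
        return None  # bad base64, bad UTF-8, wrong shape, or non-integer exp
    if exp < (time.time() if now is None else now):
        return None  # expired
    for secret in secrets:
        expected = _sign(gateway_id, exp, secret)
        # Constant-time comparison on bytes.
        if hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
            return gateway_id
    return None  # caller would reject the upgrade (close 4401)


# During rotation the verifier accepts both the new and the previous secret.
rotation = [b"new-secret", b"old-secret"]
token_old = mint_token("gw-1", b"old-secret", now=1000)
accepted = verify_token(token_old, rotation, now=1001)
```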
---

## 7. Versioning policy

- `contract_version` is an int; bump **only** for additive changes during the
  experimental phase (new optional fields, new `op`s).
- A breaking change (renamed/removed field, changed semantics) requires a
  coordinated update of both repos and a version bump.
- The connector's first PR references the commit SHA of this file it implements
  against.