* feat(relay): authenticate the connector⇄gateway WS channel
The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.
Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.
* feat(relay): signed-HTTP inbound delivery receiver
The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.
Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).
Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.
* feat(relay): hermes gateway enroll CLI
Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.
Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.
* docs(relay): inbound is signed HTTP, not WS; document channel auth
Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.
* test(relay): update build_gateway_parser callers for cmd_gateway_enroll
The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
14 KiB
Relay ↔ Connector Contract (v1, EXPERIMENTAL)
Status: EXPERIMENTAL. This contract MAY CHANGE without a deprecation cycle until at least two real Class-1 platforms (Discord + Telegram) have validated it. Evolution during the experimental phase is additive-only, gated by
contract_version. A breaking change updates both repos in lockstep.
This document is the formal interface between the Hermes gateway (Python,
gateway/relay/) and the connector (Node/TypeScript,
NousResearch/gateway-gateway). The connector implementer's first action is to
read this file.
The gateway runs a generic RelayAdapter that dials out to the connector,
receives a CapabilityDescriptor at handshake, then exchanges normalized
MessageEvents (inbound) and actions (outbound) over a per-turn bidirectional
WebSocket. The gateway never learns which concrete platform is fronting it; the
connector owns all platform-specific socket/identity logic.
1. Handshake
- Gateway opens the transport (
connect). - Gateway calls
handshake(); connector returns aCapabilityDescriptor(section 2) describing the platform this adapter instance fronts. - Gateway configures the adapter from the descriptor (char limit, length unit, draft/edit/thread/markdown capabilities) and registers an inbound handler.
- Connector then streams inbound events and accepts outbound actions.
contract_version (currently 1) is carried in the descriptor. The gateway
ignores unknown descriptor fields (forward-compat) and fills missing optional
fields from defaults.
2. CapabilityDescriptor (handshake payload)
JSON object. Source of truth: gateway/relay/descriptor.py.
| Field | Type | Required | Meaning |
|---|---|---|---|
contract_version |
int | yes | Contract version (additive-only within a version). |
platform |
string | yes | Platform name (e.g. "discord", "telegram"). |
label |
string | yes | Human-readable label. |
max_message_length |
int | yes | Char limit; gateway exposes as MAX_MESSAGE_LENGTH. 0 → treat as 4096. |
supports_draft_streaming |
bool | yes | Native draft-streaming preview support. |
supports_edit |
bool | yes | Edit-based streaming possible; if false, consumer degrades to one-message-per-segment. |
supports_threads |
bool | yes | create_handoff_thread capability. |
markdown_dialect |
string | yes | "plain", "markdown_v2", "discord", … (drives supports_code_blocks). |
len_unit |
string | yes | "chars" (builtin len) or "utf16" (Telegram UTF-16 code units). |
emoji |
string | no | Display emoji (default 🔌). |
platform_hint |
string | no | System-prompt platform hint. |
pii_safe |
bool | no | Redact PII in session descriptions. |
Most fields are a projection of the gateway's existing PlatformEntry; the
runtime-only fields (len_unit, supports_*, markdown_dialect) come from the
live platform adapter's capability methods.
3. Inbound: MessageEvent envelope
The connector normalizes each platform wire event into a MessageEvent
(gateway/platforms/base.py) and delivers it to the gateway. Inbound is
delivered over a signed HTTP POST, not the outbound /relay WebSocket (see
the transport note below). The gateway keys the session via build_session_key()
from the embedded SessionSource — so populating the right discriminators is
the single highest-correctness responsibility of the connector.
Inbound transport (signed HTTP POST, not the outbound WS)
The gateway dials out to the connector's /relay WebSocket for the
handshake + outbound actions (§4) + its own /stop egress (§5). Inbound,
however, is delivered the other way: the connector POSTs the normalized
event to the gateway's inbound endpoint (HttpGatewayDelivery on the connector;
gateway/relay/inbound_receiver.py on the gateway). The reason is
multi-instance: the connector instance that owns a platform's socket (and thus
produces inbound events) is generally not the instance a given gateway
dialed its outbound WS into, so inbound must target a tenant endpoint (which
may load-balance across gateway instances) rather than ride one gateway's
outbound socket. Each delivery is HMAC-signed with the per-tenant delivery
key (§6.1); the gateway verifies the signature over the exact raw bytes before
accepting the event. Two POST targets:
POST {gatewayEndpoint}→{"type":"message", "event": <MessageEvent>}POST {gatewayEndpoint}/interrupt→{"type":"interrupt", "session_key", "reason"?}(§5)
An earlier draft of this contract delivered inbound over the WS
inboundframe. That only works single-instance and predates the multi-instance socket-ownership + channel-auth model; the signed-HTTP path above is the shipped design.
SessionSource fields (the wire surface)
Source of truth: SessionSource.to_dict() in gateway/session.py. These are
every key the gateway accepts on the wire. platform, chat_id, chat_type,
user_id, user_name, thread_id, chat_name, and chat_topic are always
present (may be null); the rest are included only when set.
| Field | Type | Always sent | Meaning |
|---|---|---|---|
platform |
string | yes | Platform name (matches the descriptor's platform). |
chat_id |
string | yes | Primary conversation id (channel/chat). Session-key discriminator. |
chat_type |
string | yes | dm / group / channel / thread / forum. |
chat_name |
string|null | yes | Human-readable chat name. |
user_id |
string|null | yes | Message author id. Session-key discriminator. |
user_name |
string|null | yes | Author display name. |
thread_id |
string|null | yes | Thread/forum-topic id when in a thread. Session-key discriminator. |
chat_topic |
string|null | yes | Channel topic/description (Discord, Slack). |
user_id_alt |
string | no | Platform-specific stable alt id (Signal UUID, Feishu union_id). |
chat_id_alt |
string | no | Alternate chat id (e.g. Signal group internal id). |
guild_id |
string | no | Discord guild / Slack workspace / Matrix server scope. REQUIRED for Discord server isolation. Session-key discriminator. |
parent_chat_id |
string | no | Parent channel when chat_id refers to a thread. |
message_id |
string | no | Id of the triggering message (for pin/reply/react). |
is_bot(author-is-a-bot/webhook classification) exists on the gateway-side dataclass but is intentionally NOT on the wire in v1 — it is not part ofto_dict(). Do not add it to the connector'sSessionSourceuntil it is first added here and toto_dict()(additive bump).
SessionSource discriminators per platform
| Platform | chat_id | chat_type | user_id | thread_id | guild_id |
|---|---|---|---|---|---|
| Discord | channel id | dm/group/thread |
author id | thread channel id (threads) | guild id (REQUIRED for server isolation) |
| Telegram | chat id | dm/group/forum |
from id | forum topic id (forums) | — |
Get Discord's guild_id wrong and two servers collide into one session.
This is the #1 High-severity risk. The gateway's build_session_key() is the
conformance oracle: for a given SessionSource, the connector's normalization
must produce the same key the Python adapter would. (The Phase-1 stub tests
assert known-input → known-key.)
Bot identity vs tenant (single-bot consolidation, Appendix A)
The envelope carries the originating bot identity as a field distinct from
tenant. Tenant is resolved from the event's own discriminator (Discord
guild_id, Telegram chat_id, webhook path/subdomain) — never from which
token/socket/process delivered it. This keeps one shared bot able to front many
tenants (Phase 6) without overloading an existing field.
4. Outbound: action set
The gateway calls the transport with action dicts. Source of truth:
gateway/relay/transport.py + gateway/relay/adapter.py.
op |
Fields | Result |
|---|---|---|
send |
chat_id, content, reply_to?, metadata? |
{success: bool, message_id?, error?} |
edit |
chat_id, message_id, content, metadata? |
{success: bool, error?} |
typing |
chat_id |
{success: bool} |
follow_up |
session_key, kind, content, metadata? |
{success: bool, message_id?, error?} |
get_chat_info(chat_id) is a separate proxied call returning at least
{name, type}. Media actions follow the same envelope shape (deferred to a
later contract revision; additive).
follow_up (A2 capability action). Some inbound payloads carry a credential
that acts on the shared bot identity (e.g. a Discord interaction follow-up
token). Per §6 the connector strips that at the edge and binds it in its
capability vault keyed by the session; it never reaches the gateway. To use
it, the gateway issues follow_up naming the session it is already in
(session_key) plus the capability kind (e.g. discord.interaction_token) —
never a token. The connector resolves the real value from its vault,
enforces the tenant match (tenant B can never wield tenant A's capability), and
egresses. success: false when the capability is absent/expired or the tenant
doesn't match — the gateway has nothing to retry with, by design (a leaked
gateway holds zero capability material). Source of truth:
gateway/relay/transport.py (send_follow_up) + gateway/relay/adapter.py.
5. Interrupt (/stop) routing
- Gateway → connector:
send_interrupt(session_key, reason?)egresses a mid-turn/stopover the outbound WS. The connector MUST forward it to the gateway instance running thatsession_key(the routing invariant). - Connector → gateway: an inbound interrupt for a
session_keyis delivered as a signed HTTP POST to{gatewayEndpoint}/interrupt(§3 transport note), and bridged by the adapter'son_interrupt(session_key, chat_id)into the existing per-session interrupt mechanism, cancelling exactly that turn (siblings untouched).
The gateway→connector /stop rides the outbound WS; the connector→gateway
interrupt rides the same signed-HTTP inbound path as a normalized event.
6. Trust boundary & signed-body handling (A2)
The connector is the sole crypto/identity boundary. The gateway re-validates nothing.
Webhook signatures (Discord ed25519, Twilio HMAC, WeCom BizMsgCrypt) are computed over exact raw bytes, and some payloads are encrypted with a shared secret. The connector fronts a shared bot for many tenants and holds every tenant's platform secrets, so it:
- verifies / decrypts at the edge (the only place the secrets live),
- normalizes the payload into a tenant-scoped
MessageEvent(§3), - strips any shared-identity capability out of the payload and binds it in
its capability vault, keyed by the session (see §4
follow_up), - forwards only the sanitized
MessageEvent— never the raw signed body.
The gateway therefore performs no platform signature/crypto verification on
the relay path; it trusts the normalized event. This is an enforced invariant on
the gateway side (tests/gateway/relay/test_relay_sheds_crypto.py: the relay
package imports/calls no platform-crypto).
Why not "forward the signed body byte-for-byte so the gateway re-validates"? That earlier model is incoherent under an untrusted, disposable tenant gateway:
- Re-validating Twilio HMAC / WeCom crypto would require handing the gateway the shared signing secret — which is itself the leak, and on a shared bot it's a cross-tenant leak.
- WeCom payloads are encrypted with the shared secret; the connector must decrypt at the edge just to route, so forwarding ciphertext would again require giving the gateway the secret.
- A Discord interaction token lives inside the signed JSON body — you cannot both preserve the bytes and strip the credential; they are the same bytes.
So byte-preservation is abandoned deliberately: the connector re-serializes the
sanitized event and the gateway trusts it. This also unifies the passthrough and
relay planes — both are "verify at the edge → emit a normalized event," differing
only in transport. See docs/capability-trust-boundary.md (connector repo:
gateway-gateway) for the full A2 rationale and the connector-side vault.
6.1 Channel authentication (the connector⇄gateway link itself)
A2 makes the connector the sole holder of platform secrets while the gateway may
be customer-managed and internet-exposed, so the connector⇄gateway channel
is itself authenticated. The gateway holds two enrollment-issued credentials
(hermes gateway enroll → connector /relay/enroll): a per-gateway secret
and a per-tenant delivery key. Both are HMAC-SHA256 schemes with a
multi-secret rotation verify list (gateway side: gateway/relay/auth.py;
connector side: src/core/relayAuthToken.ts + src/core/deliverySigning.ts).
| Leg | Credential | Mechanism |
|---|---|---|
| Gateway → connector WS upgrade | per-gateway secret | An Authorization bearer header on the /relay upgrade. The token is base64url(payload:exp:sig) where payload = gatewayId and sig = HMAC(payload:exp, secret). Connector verifies and rejects the upgrade (close 4401) on mismatch/absence/revocation. The authenticated tenant comes from the connector's store, never the hello frame. |
| Connector → gateway inbound POST | per-tenant delivery key | Two headers: x-relay-timestamp (unix seconds) and x-relay-signature (hex HMAC(ts.rawBody, deliveryKey)). Gateway verifies over the exact raw bytes within a ±300s replay window before accepting the event; rejects 401 otherwise. |
This is the channel authenticator — distinct from platform crypto, which the
relay path still sheds entirely (§6). The gateway holds zero platform secrets;
these two keys authenticate only the connector link. Full threat model +
enrollment/rotation/kill-switch design: docs/connector-gateway-auth-design.md
(connector repo).
7. Versioning policy
contract_versionis an int; bump only for additive changes during the experimental phase (new optional fields, newops).- A breaking change (renamed/removed field, changed semantics) requires a coordinated update of both repos and a version bump.
- The connector's first PR references the commit SHA of this file it implements against.