hermes-agent/gateway
Ben Barclay a64fc490fe
fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828)
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect

Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.

1. RELAY platform never enabled in config.platforms (gateway/config.py).
   register_relay_adapter() puts the adapter in the platform_registry, but
   start_gateway()'s connect loop iterates self.config.platforms — which never
   contained Platform.RELAY. So the adapter was "registered" but never connected
   (logs showed "relay adapter registered" then "No messaging platforms
   enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
   relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
   or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
   single-tenant gateways unaffected).

2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
   The relay URL is configured once as the http(s):// base (used as-is for the
   provision POST), but websockets.connect rejects http(s):// with "scheme isn't
   ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.

3. /relay path not appended (same helper). The connector mounts its
   WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
   other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
   -> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
   already carrying ws(s):// and/or /relay is unchanged, so provision's
   _provision_url (which derives /relay/provision from either form) still works.

Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.

Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.

Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).

Relay-adapter lane. EXPERIMENTAL.

* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant

The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".

Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.

Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.

Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).

Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).

Relay-adapter lane. EXPERIMENTAL.
2026-06-19 16:30:24 +10:00
..
assets fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
builtin_hooks remove: BOOT.md built-in hook (#17093) 2026-04-28 09:50:27 -07:00
platforms fix(telegram): resolve replies to rich (sendRichMessage) messages 2026-06-16 13:04:20 -07:00
relay fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828) 2026-06-19 16:30:24 +10:00
__init__.py docs(gateway): mention Weixin in gateway help and docstrings 2026-05-12 17:08:51 -07:00
authz_mixin.py fix(gateway): preserve WeCom per-group sender allowlists 2026-06-13 07:18:54 -07:00
channel_directory.py fix: harden WhatsApp target alias salvage 2026-06-15 05:51:47 -07:00
config.py fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828) 2026-06-19 16:30:24 +10:00
delivery.py fix(gateway): drop outbound silence-narration messages pre-send 2026-05-29 19:06:05 -07:00
display_config.py feat(gateway): rename to tool_progress_grouping, add config/docs/tests 2026-06-16 05:49:24 -07:00
hooks.py feat(hooks): expose thread_id and chat_type in agent:start/end context (#41672) 2026-06-07 19:16:36 -07:00
kanban_watchers.py refactor(gateway): extract kanban watcher loops into GatewayKanbanWatchersMixin (god-file Phase 3) 2026-06-07 23:14:18 -07:00
memory_monitor.py Port from cline/cline#10343: periodic gateway memory logging (#27102) 2026-05-16 12:55:23 -07:00
message_timestamps.py feat(gateway): inject stable human-readable message timestamps 2026-06-16 15:49:59 -07:00
mirror.py refactor(gateway): drop _append_to_jsonl from mirror 2026-05-20 13:00:57 -07:00
pairing.py fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
platform_registry.py refactor(plugins): add apply_yaml_config_fn registry hook 2026-05-13 22:20:30 -07:00
response_filters.py fix(gateway): suppress exact silence tokens without mutating history 2026-06-14 03:25:08 -07:00
restart.py fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
rich_sent_store.py fix(telegram): resolve replies to rich (sendRichMessage) messages 2026-06-16 13:04:20 -07:00
run.py fix(relay): trigger self-provision on relay-config + NAS token, not is_managed() (#48724) 2026-06-19 01:01:24 +00:00
runtime_footer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
session.py refactor: remove agent-callable send_message tool (#47856) 2026-06-17 07:11:23 -07:00
session_context.py fix(api-server): bind request session context for tools 2026-06-08 20:52:08 -07:00
shutdown_forensics.py chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) 2026-05-11 11:03:29 -07:00
slash_access.py feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373) 2026-05-10 12:33:54 -07:00
slash_commands.py fix(gateway): preserve original transcript when /compress rotation is skipped 2026-06-18 13:38:35 -07:00
status.py Fix dashboard gateway profile scoping 2026-06-17 05:40:57 -07:00
sticker_cache.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
stream_consumer.py fix(mattermost): harden delivery hygiene 2026-06-16 06:34:54 -07:00
stream_dispatch.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
stream_events.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
whatsapp_identity.py fix(whatsapp_identity): pin identifier regex to ASCII, clarify it's defense-in-depth 2026-04-26 20:48:31 -07:00