hermes-agent/tests/gateway/relay/test_relay_adapter.py
Ben Barclay a64fc490fe
fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828)
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect

Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.

1. RELAY platform never enabled in config.platforms (gateway/config.py).
   register_relay_adapter() puts the adapter in the platform_registry, but
   start_gateway()'s connect loop iterates self.config.platforms — which never
   contained Platform.RELAY. So the adapter was "registered" but never connected
   (logs showed "relay adapter registered" then "No messaging platforms
   enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
   relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
   or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
   single-tenant gateways unaffected).

2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
   The relay URL is configured once as the http(s):// base (used as-is for the
   provision POST), but websockets.connect rejects http(s):// with "scheme isn't
   ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.

3. /relay path not appended (same helper). The connector mounts its
   WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
   other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
   -> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
   already carrying ws(s):// and/or /relay is unchanged, so provision's
   _provision_url (which derives /relay/provision from either form) still works.

Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.

Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.

Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).
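
The enablement rule from item 1, and the env / yaml / absent cases that test it, can be sketched as below. This is an illustration only: relay_platform_entry and the returned dict shape are hypothetical, the real logic lives in _apply_env_overrides and works on PlatformConfig objects, and letting the env var take precedence over yaml is an assumption the commit does not state.

```python
from typing import Any, Mapping, Optional


def relay_platform_entry(
    env: Mapping[str, str], yaml_cfg: Mapping[str, Any]
) -> Optional[dict]:
    """Return a RELAY platform entry, or None when no relay URL is configured."""
    gateway_section = yaml_cfg.get("gateway") or {}
    # Assumption: GATEWAY_RELAY_URL (env) wins over gateway.relay_url (yaml).
    relay_url = env.get("GATEWAY_RELAY_URL") or gateway_section.get("relay_url")
    if not relay_url:
        # Direct/single-tenant gateways: no RELAY entry at all.
        return None
    # relay_url is mirrored into "extra" so a connected-checker can find it.
    return {"enabled": True, "extra": {"relay_url": relay_url}}
```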

Relay-adapter lane. EXPERIMENTAL.

* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant

The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".

Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.

Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.
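
A rough sketch of the capture-and-reattach idea, kept separate from RelayAdapter so it runs on its own. The ScopeMemo class and its method names are hypothetical; the real adapter does this in _capture_scope and _with_scope, whose signatures are not shown in this commit message.

```python
from typing import Dict, Optional


class ScopeMemo:
    """Remembers chat_id -> guild_id from inbound events (hypothetical helper)."""

    def __init__(self) -> None:
        self._guild_by_chat: Dict[str, str] = {}

    def capture(self, chat_id: str, guild_id: Optional[str]) -> None:
        # Only record chats that actually carry a guild (DMs have none).
        if chat_id and guild_id:
            self._guild_by_chat[chat_id] = guild_id

    def with_scope(self, chat_id: str, metadata: Optional[dict]) -> dict:
        # Copy so the caller's dict is never mutated; None becomes {}.
        merged = dict(metadata or {})
        guild_id = self._guild_by_chat.get(chat_id)
        # Re-attach only when known and not already set explicitly.
        if guild_id and "guild_id" not in merged:
            merged["guild_id"] = guild_id
        return merged
```

Keeping the memo inside the adapter means the generic delivery path and the other platforms never see guild_id handling, which matches the relay-adapter-local scope of the fix.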

Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).

Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).

Relay-adapter lane. EXPERIMENTAL.
2026-06-19 16:30:24 +10:00

142 lines
4.6 KiB
Python

"""RelayAdapter capability-advertisement tests (relay Phase 1, Task 1.1)."""
import pytest

from gateway.config import Platform, PlatformConfig
from gateway.relay.adapter import RelayAdapter
from gateway.relay.descriptor import CONTRACT_VERSION, CapabilityDescriptor


def make_desc(**kw) -> CapabilityDescriptor:
    base = dict(
        contract_version=CONTRACT_VERSION,
        platform="telegram",
        label="Telegram",
        max_message_length=4096,
        supports_draft_streaming=False,
        supports_edit=True,
        supports_threads=True,
        markdown_dialect="markdown_v2",
        len_unit="utf16",
        emoji="\u2708\ufe0f",
        platform_hint="",
        pii_safe=False,
    )
    base.update(kw)
    return CapabilityDescriptor(**base)


def _adapter(**desc_kw) -> RelayAdapter:
    return RelayAdapter(PlatformConfig(), make_desc(**desc_kw))


def test_relay_platform_member_exists():
    assert Platform("relay") is Platform.RELAY


def test_advertises_descriptor_max_length():
    a = _adapter(max_message_length=2000)
    assert a.MAX_MESSAGE_LENGTH == 2000


def test_supports_draft_streaming_follows_descriptor():
    assert _adapter(supports_draft_streaming=False).supports_draft_streaming() is False
    assert _adapter(supports_draft_streaming=True).supports_draft_streaming() is True


def test_len_fn_utf16_counts_code_units():
    a = _adapter(len_unit="utf16")
    # An astral-plane emoji is two UTF-16 code units.
    assert a.message_len_fn("\U0001f600") == 2


def test_len_fn_chars_uses_builtin_len():
    a = _adapter(len_unit="chars")
    assert a.message_len_fn("\U0001f600") == 1


def test_is_a_base_platform_adapter():
    # stream_consumer's isinstance(adapter, BasePlatformAdapter) guard must pass.
    from gateway.platforms.base import BasePlatformAdapter

    assert isinstance(_adapter(), BasePlatformAdapter)


@pytest.mark.asyncio
async def test_connect_without_transport_raises():
    a = _adapter()
    with pytest.raises(RuntimeError, match="no transport"):
        await a.connect()


@pytest.mark.asyncio
async def test_send_without_transport_returns_failure():
    a = _adapter()
    result = await a.send("chat1", "hello")
    assert result.success is False
    assert result.error == "no transport"


class _CaptureTransport:
    """Minimal RelayTransport stand-in that records the outbound action."""

    def __init__(self):
        self.sent = None

    def set_inbound_handler(self, h):  # noqa: D401
        self._h = h

    async def send_outbound(self, action):
        self.sent = action
        return {"success": True, "message_id": "m1"}


def _make_event(chat_id="chan-1", guild_id="guild-9"):
    from gateway.platforms.base import MessageEvent, MessageType
    from gateway.session import SessionSource

    src = SessionSource(
        platform=Platform.RELAY,
        chat_id=chat_id,
        chat_type="channel",
        guild_id=guild_id,
    )
    return MessageEvent(text="hi", source=src, message_type=MessageType.TEXT)


@pytest.mark.asyncio
async def test_send_reattaches_guild_id_from_inbound_scope():
    """The connector's egress guard resolves the owning tenant from
    metadata.guild_id; the gateway's generic delivery path drops it, so the
    relay adapter must re-attach the guild scope learned from the inbound event.
    Regression for live 'discord egress declined: target not routed to an
    onboarded tenant'."""
    t = _CaptureTransport()
    a = RelayAdapter(PlatformConfig(), make_desc(platform="discord"), transport=t)
    # Simulate the connector delivering an inbound message in guild-9 / chan-1,
    # but don't run the full handle_message pipeline — just the scope capture.
    a._capture_scope(_make_event(chat_id="chan-1", guild_id="guild-9"))
    await a.send("chan-1", "the reply")
    assert t.sent["metadata"].get("guild_id") == "guild-9"


@pytest.mark.asyncio
async def test_send_without_known_scope_omits_guild_id():
    """A chat we never saw inbound (e.g. a DM) gets no guild_id — no-op, never
    invents a scope."""
    t = _CaptureTransport()
    a = RelayAdapter(PlatformConfig(), make_desc(platform="discord"), transport=t)
    await a.send("unknown-chat", "hi")
    assert "guild_id" not in t.sent["metadata"]


@pytest.mark.asyncio
async def test_send_preserves_explicit_guild_id():
    """An explicitly-provided metadata.guild_id is never overwritten."""
    t = _CaptureTransport()
    a = RelayAdapter(PlatformConfig(), make_desc(platform="discord"), transport=t)
    a._capture_scope(_make_event(chat_id="chan-1", guild_id="guild-9"))
    await a.send("chan-1", "hi", metadata={"guild_id": "explicit-1"})
    assert t.sent["metadata"]["guild_id"] == "explicit-1"