mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-19 10:02:16 +00:00
* feat(relay): authenticate the connector⇄gateway WS channel
The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.
Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.
* feat(relay): signed-HTTP inbound delivery receiver
The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.
Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).
Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.
* feat(relay): hermes gateway enroll CLI
Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.
Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.
* docs(relay): inbound is signed HTTP, not WS; document channel auth
Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.
* test(relay): update build_gateway_parser callers for cmd_gateway_enroll
The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
250 lines
10 KiB
Python
250 lines
10 KiB
Python
"""``hermes gateway enroll`` — enroll a self-hosted gateway with a relay connector.
|
|
|
|
The connector⇄gateway channel is authenticated (the gateway may be
|
|
customer-managed and internet-exposed). This command is the gateway half of the
|
|
zero-touch enrollment in the connector repo's
|
|
``docs/connector-gateway-auth-design.md``:
|
|
|
|
1. Resolve a fresh Nous Portal access token from the existing login
|
|
(``~/.hermes/auth.json``) — the same path ``hermes dashboard register``
|
|
uses (``resolve_nous_access_token``). This proves *which Nous org (tenant)*
|
|
the caller owns; the connector derives the authoritative tenant from it via
|
|
``GET /api/oauth/account`` (never from anything the gateway asserts).
|
|
2. POST ``{enrollmentToken, gatewayId}`` to the connector's ``/relay/enroll``
|
|
with that token in the ``Authorization`` header, over TLS.
|
|
3. The connector verifies the enrollment token (signature + single-use +
|
|
tenant match), mints a per-gateway secret, get-or-creates the per-tenant
|
|
delivery key, and returns both ONCE.
|
|
4. Persist ``GATEWAY_RELAY_ID`` / ``GATEWAY_RELAY_SECRET`` /
|
|
``GATEWAY_RELAY_DELIVERY_KEY`` (+ ``GATEWAY_RELAY_URL`` if supplied) into
|
|
``~/.hermes/.env``. The per-gateway secret authenticates the WS upgrade;
|
|
the per-tenant delivery key verifies signed inbound deliveries.
|
|
|
|
Managed/hosted installs do NOT self-enroll: the orchestrator (NAS) mints the
|
|
secret directly and stamps it into the container env, so this command refuses to
|
|
run under ``is_managed()`` (mirrors ``dashboard register``).
|
|
|
|
EXPERIMENTAL: the relay auth scheme may change without a deprecation cycle until
|
|
≥2 Class-1 platforms validate the contract.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import json
|
|
import os
|
|
import socket
|
|
import sys
|
|
import urllib.error
|
|
import urllib.request
|
|
from typing import Optional
|
|
|
|
|
|
def _default_gateway_id() -> str:
|
|
"""A stable-ish default gateway instance id: ``<hostname>-<pid-free slug>``.
|
|
|
|
The gatewayId identifies this enrolled instance for kill-switch granularity
|
|
(the connector indexes its secret verify list by it). Default to the host
|
|
name so a human can recognize it; overridable via ``--gateway-id``.
|
|
"""
|
|
host = ""
|
|
try:
|
|
host = socket.gethostname().strip()
|
|
except Exception:
|
|
host = ""
|
|
return f"gw-{host or 'hermes'}"
|
|
|
|
|
|
def _resolve_connector_url(override: Optional[str]) -> Optional[str]:
|
|
"""Resolve the connector base URL (no trailing slash) for enrollment.
|
|
|
|
Precedence: explicit ``--connector-url`` flag > ``GATEWAY_RELAY_URL`` env >
|
|
``gateway.relay_url`` in config.yaml. The relay URL is a ``ws(s)://`` dial
|
|
target; enrollment is an ``http(s)://`` POST to the same host, so we map the
|
|
scheme. Returns None when nothing is configured (the user must supply one).
|
|
"""
|
|
raw = (override or os.environ.get("GATEWAY_RELAY_URL", "")).strip()
|
|
if not raw:
|
|
try:
|
|
from gateway.run import _load_gateway_config # late import to avoid cycle
|
|
|
|
cfg = (_load_gateway_config().get("gateway") or {})
|
|
raw = str(cfg.get("relay_url", "") or "").strip()
|
|
except Exception:
|
|
raw = ""
|
|
if not raw:
|
|
return None
|
|
raw = raw.rstrip("/")
|
|
# The relay dial URL is ws(s)://…/relay; enrollment posts to http(s)://…/relay/enroll.
|
|
if raw.startswith("ws://"):
|
|
raw = "http://" + raw[len("ws://"):]
|
|
elif raw.startswith("wss://"):
|
|
raw = "https://" + raw[len("wss://"):]
|
|
# Strip a trailing /relay path segment if the user pasted the dial URL.
|
|
if raw.endswith("/relay"):
|
|
raw = raw[: -len("/relay")]
|
|
return raw
|
|
|
|
|
|
def _post_enroll(
|
|
*,
|
|
connector_base_url: str,
|
|
access_token: str,
|
|
enrollment_token: str,
|
|
gateway_id: str,
|
|
timeout: float = 15.0,
|
|
) -> dict:
|
|
"""POST to the connector's ``/relay/enroll`` and return the JSON body.
|
|
|
|
Raises RuntimeError with a user-facing message on any non-2xx / transport
|
|
failure. The connector returns ``{secret, deliveryKey, tenant, gatewayId}``
|
|
on success, ``{error}`` at 400/401/403.
|
|
"""
|
|
url = f"{connector_base_url.rstrip('/')}/relay/enroll"
|
|
data = json.dumps({"enrollmentToken": enrollment_token, "gatewayId": gateway_id}).encode("utf-8")
|
|
req = urllib.request.Request(
|
|
url,
|
|
data=data,
|
|
method="POST",
|
|
headers={
|
|
"Authorization": f"Bearer {access_token}",
|
|
"Content-Type": "application/json",
|
|
"Accept": "application/json",
|
|
},
|
|
)
|
|
try:
|
|
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
|
payload = json.loads(resp.read().decode())
|
|
except urllib.error.HTTPError as exc:
|
|
detail = ""
|
|
try:
|
|
detail = (json.loads(exc.read().decode()) or {}).get("error", "")
|
|
except Exception:
|
|
pass
|
|
if exc.code == 401:
|
|
raise RuntimeError(
|
|
"Connector rejected the caller identity (401). Your Nous Portal "
|
|
"token could not be verified — try `hermes auth login nous` and retry."
|
|
) from exc
|
|
if exc.code == 403:
|
|
raise RuntimeError(
|
|
detail
|
|
or "Enrollment token invalid, expired, already used, or tenant mismatch (403)."
|
|
) from exc
|
|
raise RuntimeError(
|
|
f"Connector returned HTTP {exc.code}" + (f": {detail}" if detail else "")
|
|
) from exc
|
|
except urllib.error.URLError as exc:
|
|
raise RuntimeError(
|
|
f"Could not reach the connector at {connector_base_url}: {exc.reason}"
|
|
) from exc
|
|
|
|
if not isinstance(payload, dict) or not payload.get("secret"):
|
|
raise RuntimeError("Connector returned an unexpected response (no secret).")
|
|
return payload
|
|
|
|
|
|
def cmd_gateway_enroll(args) -> None:
|
|
"""Enroll this gateway with a relay connector; persist the auth creds to .env."""
|
|
from hermes_cli.auth import AuthError, resolve_nous_access_token
|
|
from hermes_cli.config import is_managed, save_env_value
|
|
|
|
# Managed installs get GATEWAY_RELAY_* stamped in by the orchestrator (NAS
|
|
# mints the secret directly per the design's managed shape). Self-enrolling
|
|
# from inside such a container is a mistake — and save_env_value refuses to
|
|
# write anyway.
|
|
if is_managed():
|
|
print(
|
|
"✗ `hermes gateway enroll` is not available in a managed/hosted install.\n"
|
|
" The relay gateway secret is provisioned by the hosting platform."
|
|
)
|
|
sys.exit(1)
|
|
|
|
enrollment_token = (getattr(args, "token", None) or os.environ.get("GATEWAY_RELAY_ENROLL_TOKEN", "")).strip()
|
|
if not enrollment_token:
|
|
print(
|
|
"✗ No enrollment token. Pass --token <token> (or set "
|
|
"GATEWAY_RELAY_ENROLL_TOKEN).\n"
|
|
" The connector mints this single-use token when your tenant's route "
|
|
"is provisioned; it is delivered with your gateway config."
|
|
)
|
|
sys.exit(1)
|
|
|
|
connector_base_url = _resolve_connector_url(getattr(args, "connector_url", None))
|
|
if not connector_base_url:
|
|
print(
|
|
"✗ No connector URL. Pass --connector-url <url> (or set GATEWAY_RELAY_URL "
|
|
"/ gateway.relay_url in config.yaml)."
|
|
)
|
|
sys.exit(1)
|
|
|
|
gateway_id = (getattr(args, "gateway_id", None) or _default_gateway_id()).strip()
|
|
|
|
# 1. Resolve a fresh Nous access token (the tenant-proving identity).
|
|
try:
|
|
access_token = resolve_nous_access_token()
|
|
except AuthError as exc:
|
|
if getattr(exc, "relogin_required", False):
|
|
print("✗ You're not logged into Nous Portal.")
|
|
print(" Run `hermes setup` (or `hermes auth login nous`) first, then retry.")
|
|
else:
|
|
print(f"✗ Could not resolve a Nous Portal access token: {exc}")
|
|
sys.exit(1)
|
|
except Exception as exc:
|
|
print(f"✗ Could not resolve a Nous Portal access token: {exc}")
|
|
sys.exit(1)
|
|
|
|
# 2-3. Redeem the enrollment token at the connector.
|
|
try:
|
|
result = _post_enroll(
|
|
connector_base_url=connector_base_url,
|
|
access_token=access_token,
|
|
enrollment_token=enrollment_token,
|
|
gateway_id=gateway_id,
|
|
)
|
|
except RuntimeError as exc:
|
|
print(f"✗ Enrollment failed: {exc}")
|
|
sys.exit(1)
|
|
|
|
secret = str(result.get("secret") or "")
|
|
delivery_key = str(result.get("deliveryKey") or "")
|
|
tenant = str(result.get("tenant") or "")
|
|
resolved_gateway_id = str(result.get("gatewayId") or gateway_id)
|
|
|
|
# 4. Persist the creds idempotently. The secret + delivery key are sensitive;
|
|
# save_env_value writes them to ~/.hermes/.env (0600 dir) and never logs.
|
|
to_write = {
|
|
"GATEWAY_RELAY_ID": resolved_gateway_id,
|
|
"GATEWAY_RELAY_SECRET": secret,
|
|
"GATEWAY_RELAY_DELIVERY_KEY": delivery_key,
|
|
}
|
|
# Persist the connector URL too (as the ws(s):// dial target) when supplied
|
|
# explicitly, so the runtime can dial without re-specifying it.
|
|
explicit_url = (getattr(args, "connector_url", None) or "").strip()
|
|
if explicit_url:
|
|
to_write["GATEWAY_RELAY_URL"] = explicit_url.rstrip("/")
|
|
|
|
for key, value in to_write.items():
|
|
if not value:
|
|
continue
|
|
try:
|
|
save_env_value(key, value)
|
|
except Exception as exc:
|
|
print(f"✗ Failed to write {key} to .env: {exc}")
|
|
sys.exit(1)
|
|
|
|
from hermes_cli.config import get_env_path
|
|
|
|
print(f'✓ Enrolled gateway "{resolved_gateway_id}"' + (f" for tenant {tenant}" if tenant else ""))
|
|
print()
|
|
print(f" Wrote to {get_env_path()}:")
|
|
print(f" GATEWAY_RELAY_ID={resolved_gateway_id}")
|
|
print(" GATEWAY_RELAY_SECRET=<hidden>")
|
|
print(" GATEWAY_RELAY_DELIVERY_KEY=<hidden>")
|
|
if explicit_url:
|
|
print(f" GATEWAY_RELAY_URL={explicit_url.rstrip('/')}")
|
|
print()
|
|
print(
|
|
" The gateway now authenticates its relay WS upgrade with the per-gateway\n"
|
|
" secret and verifies signed inbound deliveries with the tenant delivery\n"
|
|
" key. Restart the gateway to pick up the new env."
|
|
)
|