hermes-agent/gateway/platforms/ADDING_A_PLATFORM.md
emozilla 984e6cb5b8 feat(whatsapp): add WhatsApp Business Cloud API adapter
Add an official, production-grade WhatsApp integration via Meta's
Business Cloud API as a complement to the existing Baileys bridge.
No bridge subprocess, no QR codes, no account-ban risk — at the cost
of a Meta Business account and a public HTTPS webhook URL.

Setup is fully wizard-driven: 'hermes whatsapp-cloud' walks through
every credential with paste-time validation (catches the #1 trap of
pasting a phone number into the Phone Number ID field), generates a
verify token, and ends with copy-paste instructions for the
cloudflared / Meta-dashboard / Business Manager pieces that can't be
automated. The wizard also points users at Meta's Business Manager
for setting the bot's display name and profile picture.

Feature set:

- Inbound: text, images (with native-vision routing), voice notes
  (STT), documents (small text inlined, larger cached), reply context.
- Outbound: text with WhatsApp-flavored markdown conversion, images,
  videos, documents, opus voice notes via ffmpeg with MP3 fallback.
- Native interactive buttons for clarify, dangerous-command approval,
  and slash-command confirmation flows — matches the Telegram /
  Discord UX, gracefully degrades to plain text.
- Read receipts (blue double-checkmarks) and typing indicator,
  using Meta's combined endpoint so they fire in a single API call.
- Webhook security: X-Hub-Signature-256 HMAC verification (raw body,
  constant-time), wamid deduplication, group-shaped-message refusal
  (groups deferred to v2 — Baileys still covers them).
- Full integration with the gateway's session, cron, display-tier,
  prompt-hint, and auth-allowlist systems. Cloud and Baileys can run
  side-by-side against different phone numbers.

Also wires STT (speech-to-text) through Nous's managed audio gateway
for Nous subscribers — previously the default stt.provider=local
required a separate faster-whisper install. New subscribers now get
voice-note transcription out of the box.

Docs: 418-line user guide at website/docs/user-guide/messaging/
whatsapp-cloud.md, sidebar entry, environment-variables reference,
ADDING_A_PLATFORM.md updated with the optional interactive-UX
contract for future adapter authors.

Tests: 100 dedicated tests for the adapter, 32 for the setup wizard,
20 for the Nous subscription STT wiring, plus regression coverage
across display_config, prompt_builder, and the cron scheduler.

Known limitations (deferred until clear demand signal):
- Group chats — use the Baileys bridge if you need them.
- Message templates for 24-hour-window outside-conversation sends —
  reactive chat is unaffected; cron / delegate_task with gaps > 24h
  will fail with a clear error. The agent's system prompt warns the
  model about this so it knows to mention it when scheduling delayed
  messages.
2026-05-23 01:07:01 -04:00

Adding a New Messaging Platform

There are two ways to add a platform to the Hermes gateway: as a plugin, which is covered first, or as a built-in adapter, which is covered under "Built-in Path (Core Contributors Only)".

Plugin path. Create a plugin directory in ~/.hermes/plugins/ (or under plugins/platforms/ for bundled plugins) with a plugin.yaml and adapter.py. The adapter inherits from BasePlatformAdapter and registers via ctx.register_platform() in the register(ctx) entry point. This requires zero changes to core Hermes code.

The plugin system automatically handles: adapter creation, config parsing, user authorization, cron delivery, send_message routing, system prompt hints, status display, gateway setup, and more.

Optional hooks cover the edges most adapters need:

  • env_enablement_fn: () -> Optional[dict] — seeds PlatformConfig.extra (and an optional home_channel dict) from env vars BEFORE the adapter is constructed. Without this, env-only setups don't surface in hermes gateway status or get_connected_platforms() until the SDK instantiates.
  • apply_yaml_config_fn: (yaml_cfg, platform_cfg) -> Optional[dict] — translate this platform's config.yaml keys into env vars and/or seed PlatformConfig.extra directly. Lets a plugin own its YAML schema instead of growing core gateway/config.py boilerplate per platform. Mutating os.environ is allowed (use not os.getenv(...) guards to preserve env > YAML precedence); the returned dict is merged into PlatformConfig.extra. Called during load_gateway_config() after the generic shared-key loop and before _apply_env_overrides().
  • cron_deliver_env_var: str — name of the *_HOME_CHANNEL env var. When set, deliver=<name> cron jobs route to this var without editing cron/scheduler.py's hardcoded sets.
  • standalone_sender_fn: async (...) -> dict: out-of-process delivery for cron jobs that run separately from the gateway. Without this, a deliver=<name> job fires correctly but the actual send returns the error "No live adapter for platform '<name>'". Pair with cron_deliver_env_var for end-to-end cron support. See the docsite for the signature.
  • plugin.yaml requires_env / optional_env rich-dict entries — auto-populate OPTIONAL_ENV_VARS in hermes_cli/config.py so the setup wizard surfaces proper descriptions, prompts, password flags, and URLs.
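
As a rough illustration of the first two hooks, the sketch below follows the signatures listed above. The platform name mychat, the env var names, the YAML keys, and the shape of the home_channel dict are all assumptions made up for this example, not part of Hermes.

```python
import os
from typing import Optional

# Hypothetical env var names, chosen for illustration only.
TOKEN_VAR = "MYCHAT_TOKEN"
HOME_VAR = "MYCHAT_HOME_CHANNEL"


def env_enablement() -> Optional[dict]:
    """Seed PlatformConfig.extra from env vars; None means 'not configured'."""
    token = os.getenv(TOKEN_VAR)
    if not token:
        return None
    seed = {"token": token}
    home = os.getenv(HOME_VAR)
    if home:
        # Assumed shape for the optional home_channel dict.
        seed["home_channel"] = {"chat_id": home}
    return seed


def apply_yaml_config(yaml_cfg: dict, platform_cfg: object) -> Optional[dict]:
    """Translate this platform's YAML keys into env vars and extra values."""
    section = yaml_cfg.get("mychat") or {}
    token = section.get("token")
    # Only fill the env var when it is unset, so env keeps precedence over YAML.
    if token and not os.getenv(TOKEN_VAR):
        os.environ[TOKEN_VAR] = str(token)
    extra = {}
    if "poll_interval" in section:
        extra["poll_interval"] = int(section["poll_interval"])
    # The returned dict is what would be merged into PlatformConfig.extra.
    return extra or None
```

The not os.getenv(...) guard is the part that matters: a value already present in the environment is never overwritten by config.yaml.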

Subclassing for platform-specific UX. When a platform has a hard time-window constraint that the base adapter can't anticipate (LINE's 60s single-use reply token, WhatsApp's 24h session window, etc.), an adapter can override _keep_typing to layer a mid-flight bubble at a threshold without expanding the kwarg surface. Always await super()._keep_typing(...) so the typing heartbeat keeps running, and tear down your side task in finally. See plugins/platforms/line/ for the full pattern (Template Buttons postback at 45s, RequestCache state machine, interrupt_session_activity override for /stop orphans) and the developer-guide page for the prose walkthrough.
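
The override pattern described above can be sketched in isolation. BaseAdapterStub stands in for BasePlatformAdapter, and the (chat_id, stop_event) signature is an assumption for this example; check the real _keep_typing signature in gateway/platforms/base.py before copying it.

```python
import asyncio


class BaseAdapterStub:
    """Stand-in for BasePlatformAdapter; only the typing heartbeat is modeled."""

    def __init__(self):
        self.events = []

    async def _keep_typing(self, chat_id, stop_event):
        # Heartbeat: emit a typing indicator until the reply is ready.
        while not stop_event.is_set():
            self.events.append("typing")
            await asyncio.sleep(0.01)


class WindowedAdapter(BaseAdapterStub):
    BUBBLE_AFTER = 0.05  # seconds here; a real threshold would be e.g. 45s

    async def _send_bubble_later(self, chat_id):
        await asyncio.sleep(self.BUBBLE_AFTER)
        self.events.append("bubble")

    async def _keep_typing(self, chat_id, stop_event):
        side_task = asyncio.create_task(self._send_bubble_later(chat_id))
        try:
            # Always delegate so the base heartbeat keeps running.
            await super()._keep_typing(chat_id, stop_event)
        finally:
            # Tear down the side task whether we finished or were cancelled.
            side_task.cancel()
            await asyncio.gather(side_task, return_exceptions=True)


async def demo(reply_delay):
    adapter = WindowedAdapter()
    stop = asyncio.Event()
    typing = asyncio.create_task(adapter._keep_typing("chat-1", stop))
    await asyncio.sleep(reply_delay)
    stop.set()  # the reply is ready
    await typing
    return adapter.events
```

With a slow reply (asyncio.run(demo(0.2))) the mid-flight bubble fires once; with a fast reply the finally block cancels it before the threshold is reached.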

Sibling adapters that share behavior. When a single platform has two transport modes the user picks between — unofficial vs official APIs, polling vs websocket, library A vs library B — the right structure is two adapters that share a behavior mixin. WhatsApp does this: gateway/platforms/whatsapp.py (Baileys bridge) and gateway/platforms/whatsapp_cloud.py (Meta Cloud API) both inherit from WhatsAppBehaviorMixin in gateway/platforms/whatsapp_common.py. The mixin owns gating, allow-lists, mention parsing, broadcast filters, and the WhatsApp-flavored markdown conversion — everything that's platform-protocol-agnostic. Each adapter owns its transport. Both register distinct Platform.* enum values so the gateway can run both simultaneously against different phone numbers. The mixin must come first in the bases list — class WhatsAppAdapter(Mixin, BasePlatformAdapter) — so the mixin's format_message overrides BasePlatformAdapter's generic default.
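
The base-order rule follows from Python's method resolution order. The stub classes below are stand-ins written for this example, not the real Hermes classes; they only show which format_message wins for each ordering.

```python
class BasePlatformAdapterStub:
    def format_message(self, text: str) -> str:
        return text  # generic default: pass through unchanged


class WhatsAppBehaviorMixinStub:
    def format_message(self, text: str) -> str:
        # Toy conversion: Markdown bold (**x**) to WhatsApp bold (*x*).
        return text.replace("**", "*")


class MixinFirst(WhatsAppBehaviorMixinStub, BasePlatformAdapterStub):
    """Correct order: the mixin's format_message is found first."""


class BaseFirst(BasePlatformAdapterStub, WhatsAppBehaviorMixinStub):
    """Wrong order: the generic default shadows the mixin."""
```

MixinFirst().format_message("**hi**") returns "*hi*", while BaseFirst().format_message("**hi**") returns the text unchanged because the base class comes earlier in the MRO.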

See plugins/platforms/irc/, plugins/platforms/teams/, and plugins/platforms/google_chat/ for complete working examples, and website/docs/developer-guide/adding-platform-adapters.md for the full plugin guide with code examples and hook documentation.


Built-in Path (Core Contributors Only)

Checklist for integrating a platform directly into the Hermes core. Use this as a reference when building a built-in adapter — every item here is a real integration point. Missing any of them will cause broken functionality, missing features, or inconsistent behavior.


1. Core Adapter (gateway/platforms/<platform>.py)

The adapter is a subclass of BasePlatformAdapter from gateway/platforms/base.py.

Required methods

| Method | Purpose |
|---|---|
| `__init__(self, config)` | Parse config, init state. Call `super().__init__(config, Platform.YOUR_PLATFORM)` |
| `connect() -> bool` | Connect to the platform, start listeners. Return True on success |
| `disconnect()` | Stop listeners, close connections, cancel tasks |
| `send(chat_id, text, ...) -> SendResult` | Send a text message |
| `send_typing(chat_id)` | Send typing indicator |
| `send_image(chat_id, image_url, caption) -> SendResult` | Send an image |
| `get_chat_info(chat_id) -> dict` | Return {name, type, chat_id} for a chat |

Optional methods (have default stubs in base)

| Method | Purpose |
|---|---|
| `send_document(chat_id, path, caption)` | Send a file attachment |
| `send_voice(chat_id, path)` | Send a voice message |
| `send_video(chat_id, path, caption)` | Send a video |
| `send_animation(chat_id, path, caption)` | Send a GIF/animation |
| `send_image_file(chat_id, path, caption)` | Send image from local file |

If your platform supports interactive button/menu messages, implement these for a more polished agent experience. They all degrade gracefully to plain text when not overridden:

| Method | Purpose |
|---|---|
| `send_clarify(chat_id, question, choices, clarify_id, session_key, ...)` | Render the clarify tool's multi-choice question as tappable buttons. Pair with inbound dispatch that routes button taps to tools.clarify_gateway.resolve_gateway_clarify. |
| `send_exec_approval(chat_id, command, session_key, description, ...)` | Render dangerous-command approval as Approve/Deny buttons. Inbound dispatch routes to tools.approval.resolve_gateway_approval. |
| `send_slash_confirm(chat_id, title, message, session_key, confirm_id, ...)` | Render slash-command confirmations (e.g. /reload-mcp) as Once/Always/Cancel buttons. Inbound dispatch routes to tools.slash_confirm.resolve. |
| `send_model_picker(...)` | Interactive /model picker. Used by Telegram and Discord. |

See gateway/platforms/telegram.py, discord.py, and whatsapp_cloud.py for reference implementations. The button-callback id convention (cl:<id>:<idx>, appr:<id>:<choice>, sc:<choice>:<id>) is shared across adapters — match it so the gateway-side resolvers work without modification.
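
A minimal parser for the three id shapes might look like the sketch below. ButtonTap and parse_callback_id are hypothetical names, and the sketch assumes that <idx> is numeric and that <choice> contains no colon; the reference adapters remain the authority on the exact format.

```python
from typing import NamedTuple, Optional


class ButtonTap(NamedTuple):
    kind: str     # "clarify", "approval", or "slash_confirm"
    ref_id: str   # clarify id, approval id, or confirm id
    choice: str   # choice index (as text) or choice label


def parse_callback_id(data: str) -> Optional[ButtonTap]:
    """Split a button-callback id into its parts; None if it is not one."""
    prefix, _, rest = data.partition(":")
    if prefix == "cl":      # cl:<id>:<idx>
        ref_id, sep, idx = rest.rpartition(":")
        if sep and ref_id and idx.isdigit():
            return ButtonTap("clarify", ref_id, idx)
    elif prefix == "appr":  # appr:<id>:<choice>
        ref_id, sep, choice = rest.rpartition(":")
        if sep and ref_id and choice:
            return ButtonTap("approval", ref_id, choice)
    elif prefix == "sc":    # sc:<choice>:<id>
        choice, sep, ref_id = rest.partition(":")
        if sep and choice and ref_id:
            return ButtonTap("slash_confirm", ref_id, choice)
    return None
```

An adapter's inbound dispatch could call this on each button payload and fall through to normal text handling when it returns None.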

Required function

def check_<platform>_requirements() -> bool:
    """Check if this platform's dependencies are available."""
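
One way to fill in the body is to probe for the SDK modules without importing them. This is a sketch only: the module names are placeholders, and the required parameter is added here to make the example testable, it is not part of the expected signature.

```python
import importlib.util


def _module_available(name: str) -> bool:
    """True when the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def check_your_platform_requirements(required=("json", "asyncio")) -> bool:
    """Check if this platform's dependencies are available."""
    # "json" and "asyncio" are placeholders; list your SDK's top-level modules.
    return all(_module_available(name) for name in required)
```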

Key patterns to follow

  • Use self.build_source(...) to construct SessionSource objects
  • Call self.handle_message(event) to dispatch inbound messages to the gateway
  • Use MessageEvent, MessageType, SendResult from base
  • Use cache_image_from_bytes, cache_audio_from_bytes, cache_document_from_bytes for attachments
  • Filter self-messages (prevent reply loops)
  • Filter sync/echo messages if the platform has them
  • Redact sensitive identifiers (phone numbers, tokens) in all log output
  • Implement reconnection with exponential backoff + jitter for streaming connections
  • Set MAX_MESSAGE_LENGTH if the platform has message size limits
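
The reconnection item can be sketched as follows. This uses the "full jitter" variant of exponential backoff, which is one common choice and not necessarily what the existing adapters do; backoff_delays and reconnect are hypothetical helper names.

```python
import asyncio
import random


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=None):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


async def reconnect(connect, sleep=asyncio.sleep, **backoff_kwargs):
    """Call `connect` until it returns True or the delays run out."""
    for delay in backoff_delays(**backoff_kwargs):
        if await connect():
            return True
        await sleep(delay)  # wait longer (on average) after each failure
    return False
```

The jitter spreads reconnect attempts out so that many gateways restarting at once do not hit the platform in lockstep.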

2. Platform Enum (gateway/config.py)

Add the platform to the Platform enum:

class Platform(Enum):
    ...
    YOUR_PLATFORM = "your_platform"

Add env var loading in _apply_env_overrides():

# Your Platform
your_token = os.getenv("YOUR_PLATFORM_TOKEN")
if your_token:
    if Platform.YOUR_PLATFORM not in config.platforms:
        config.platforms[Platform.YOUR_PLATFORM] = PlatformConfig()
    config.platforms[Platform.YOUR_PLATFORM].enabled = True
    config.platforms[Platform.YOUR_PLATFORM].token = your_token

Update get_connected_platforms() if your platform doesn't use token/api_key (e.g., WhatsApp uses the enabled flag, Signal uses the extra dict).


3. Adapter Factory (gateway/run.py)

Add to _create_adapter():

elif platform == Platform.YOUR_PLATFORM:
    from gateway.platforms.your_platform import YourAdapter, check_your_requirements
    if not check_your_requirements():
        logger.warning("Your Platform: dependencies not met")
        return None
    return YourAdapter(config)

4. Authorization Maps (gateway/run.py)

Add to BOTH dicts in _is_user_authorized():

platform_env_map = {
    ...
    Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOWED_USERS",
}
platform_allow_all_map = {
    ...
    Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOW_ALL_USERS",
}

5. Session Source (gateway/session.py)

If your platform needs extra identity fields (e.g., Signal's UUID alongside phone number), add them to the SessionSource dataclass with Optional defaults, and update to_dict(), from_dict(), and build_source() in base.py.
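
To show why the new fields need Optional defaults, here is a stand-in dataclass with a to_dict/from_dict round trip. SessionSourceStub and its field names are assumptions for this example; the real SessionSource has its own fields and serialization code.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class SessionSourceStub:
    platform: str
    chat_id: str
    user_id: Optional[str] = None
    # New platform-specific field; Optional so existing sessions still load.
    user_uuid: Optional[str] = None

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SessionSourceStub":
        known = {f.name for f in fields(cls)}
        # Ignore unknown keys and let missing ones fall back to defaults.
        return cls(**{k: v for k, v in data.items() if k in known})
```

A session stored before the field existed, such as {"platform": "signal", "chat_id": "c1"}, still loads and gets user_uuid=None.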


6. System Prompt Hints (agent/prompt_builder.py)

Add a PLATFORM_HINTS entry so the agent knows what platform it's on:

PLATFORM_HINTS = {
    ...
    "your_platform": (
        "You are on Your Platform. "
        "Describe formatting capabilities, media support, etc."
    ),
}

Without this, the agent won't know it's on your platform and may use inappropriate formatting (e.g., markdown on platforms that don't render it).


7. Toolset (toolsets.py)

Add a named toolset for your platform:

"hermes-your-platform": {
    "description": "Your Platform bot toolset",
    "tools": _HERMES_CORE_TOOLS,
    "includes": []
},

And add it to the hermes-gateway composite:

"hermes-gateway": {
    "includes": [..., "hermes-your-platform"]
}

8. Cron Delivery (cron/scheduler.py)

Add to platform_map in _deliver_result():

platform_map = {
    ...
    "your_platform": Platform.YOUR_PLATFORM,
}

Without this, cronjob(action="create", deliver="your_platform", ...) silently fails.


9. Send Message Tool (tools/send_message_tool.py)

Add to platform_map in send_message_tool():

platform_map = {
    ...
    "your_platform": Platform.YOUR_PLATFORM,
}

Add routing in _send_to_platform():

elif platform == Platform.YOUR_PLATFORM:
    return await _send_your_platform(pconfig, chat_id, message)

Implement _send_your_platform() — a standalone async function that sends a single message without requiring the full adapter (for use by cron jobs and the send_message tool outside the gateway process).

Update the tool schema target description to include your platform example.


10. Cronjob Tool Schema (tools/cronjob_tools.py)

Update the deliver parameter description and docstring to mention your platform as a delivery option.


11. Channel Directory (gateway/channel_directory.py)

If your platform can't enumerate chats (most can't), add it to the session-based discovery list:

for plat_name in ("telegram", "whatsapp", "signal", "your_platform"):

12. Status Display (hermes_cli/status.py)

Add to the platforms dict in the Messaging Platforms section:

platforms = {
    ...
    "Your Platform": ("YOUR_PLATFORM_TOKEN", "YOUR_PLATFORM_HOME_CHANNEL"),
}

13. Gateway Setup Wizard (hermes_cli/gateway.py)

Add to the _PLATFORMS list:

{
    "key": "your_platform",
    "label": "Your Platform",
    "emoji": "📱",
    "token_var": "YOUR_PLATFORM_TOKEN",
    "setup_instructions": [...],
    "vars": [...],
}

If your platform needs custom setup logic (connectivity testing, QR codes, policy choices), add a _setup_your_platform() function and route to it in the platform selection switch.

Update _platform_status() if your platform's "configured" check differs from the standard bool(get_env_value(token_var)).


14. Phone/ID Redaction (agent/redact.py)

If your platform uses sensitive identifiers (phone numbers, etc.), add a regex pattern and redaction function to agent/redact.py. This ensures identifiers are masked in ALL log output, not just your adapter's logs.
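
A redaction helper of this kind could look like the sketch below. The pattern and the function name are assumptions for illustration and are not the contents of agent/redact.py; tune the regex to the identifier format your platform actually uses.

```python
import re

# Assumed pattern: optional "+", then 8 to 15 digits (E.164 caps at 15),
# not embedded in a longer word or number.
_PHONE_RE = re.compile(r"(?<![\w+])\+?\d{8,15}(?!\w)")


def redact_phone_numbers(text: str, keep: int = 4) -> str:
    """Mask every phone-like number, keeping only the last `keep` digits."""
    def _mask(match):
        number = match.group(0)
        return "*" * (len(number) - keep) + number[-keep:]

    return _PHONE_RE.sub(_mask, text)
```

For example, redact_phone_numbers("from +15551234567 ok") gives "from ********4567 ok", while short numbers such as "id 42" are left alone.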


15. Documentation

| File | What to update |
|---|---|
| `README.md` | Platform list in feature table + documentation table |
| `AGENTS.md` | Gateway description + env var config section |
| `website/docs/user-guide/messaging/<platform>.md` | NEW: full setup guide (see existing platform docs for template) |
| `website/docs/user-guide/messaging/index.md` | Architecture diagram, toolset table, security examples, Next Steps links |
| `website/docs/reference/environment-variables.md` | All env vars for the platform |

16. Tests (tests/gateway/test_<platform>.py)

Recommended test coverage:

  • Platform enum exists with correct value
  • Config loading from env vars via _apply_env_overrides
  • Adapter init (config parsing, allowlist handling, default values)
  • Helper functions (redaction, parsing, file type detection)
  • Session source round-trip (to_dict → from_dict)
  • Authorization integration (platform in allowlist maps)
  • Send message tool routing (platform in platform_map)

Optional but valuable:

  • Async tests for message handling flow (mock the platform API)
  • SSE/WebSocket reconnection logic
  • Attachment processing
  • Group message filtering

Quick Verification

After implementing everything, verify with:

# All tests pass
python -m pytest tests/ -q

# Grep for existing platform names to list every file that has per-platform integration points
grep -r "telegram\|discord\|whatsapp\|slack" gateway/ tools/ agent/ cron/ hermes_cli/ toolsets.py \
  --include="*.py" -l | sort -u
# Check each file in the output — if it mentions other platforms but not yours, you missed it
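
The manual check in the last step can also be automated. The helper below is a sketch with a hypothetical name: it uses plain substring matching, like the grep above, and reports only the files that mention a known platform but never mention yours.

```python
from pathlib import Path

KNOWN = ("telegram", "discord", "whatsapp", "slack")


def find_missed_files(roots, mine, known=KNOWN):
    """Return .py files that name a known platform but never mention `mine`."""
    missed = []
    for root in map(Path, roots):
        # Accept both single files (toolsets.py) and directories (gateway/).
        paths = [root] if root.is_file() else sorted(root.rglob("*.py"))
        for path in paths:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if any(name in text for name in known) and mine.lower() not in text:
                missed.append(str(path))
    return missed
```

Running find_missed_files(["gateway", "tools", "agent", "cron", "hermes_cli", "toolsets.py"], "your_platform") from the repository root would list the candidates to review. Some hits will be legitimate, for example another platform's own adapter file.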