mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858)

Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.

messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
  Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
  extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
  google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
  silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
  baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
  plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
  SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
  per-profile .env, not 'hermes config set' (same pattern fixed in
  api-server.md last round). Also bumped example ports to 8650+ to dodge the
  default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
  collision.

developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
  run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
  gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
  platform tree updated (qqbot is a sub-package, not qqbot.py; added
  yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
  (always active)' was wrong — it's an empty extension point and
  _register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
  callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
  NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
  Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
  legacy single-pair model — corrected to the canonical list-form
  fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
  alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
  which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
  'git submodule update --init' is required if cloning without
  --recurse-submodules.

guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
  positional (not --schedule), the script-only flag is --no-agent (not
  --script-only), and there's no --command flag. Replaced with a real example
  that creates the script under ~/.hermes/scripts/ and uses the actual flags.
  Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
  the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
  rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
  fails because --region isn't registered on the auth-add argparse spec.
  Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
  China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
  table mention with a webhook-subscription pointer; also fixed the dead link
  to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
  at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
  knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
  and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
  one quoted string, so cmd.exe gets a single arg instead of the multi-token
  command line it needs. Removed the surrounding quotes — argparse nargs='*'
  collects each token correctly.

integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
  actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
  and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
  api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
  refreshed. Fallback section rewritten to lead with the canonical
  fallback_providers list form (was leading with the legacy fallback_model
  single dict); supported-providers list extended to include azure-foundry,
  alibaba-coding-plan, lmstudio.

index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
  integrations/index.md ('19+') and undercounted — bumped to 20+ and added
  Weixin/QQ Bot/Yuanbao/Google Chat to the list.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.

2026-05-09 15:00:24 -07:00


sidebar_position	title	description
7	Gateway Internals	How the messaging gateway boots, authorizes users, routes sessions, and delivers messages

Gateway Internals

The messaging gateway is the long-running process that connects Hermes to 20+ external messaging platforms through a unified architecture.

Key Files

File	Purpose
`gateway/run.py`	`GatewayRunner` — main loop, slash commands, message dispatch (large file; check git for current LOC)
`gateway/session.py`	`SessionStore` — conversation persistence and session key construction
`gateway/delivery.py`	Outbound message delivery to target platforms/channels
`gateway/pairing.py`	DM pairing flow for user authorization
`gateway/channel_directory.py`	Maps chat IDs to human-readable names for cron delivery
`gateway/hooks.py`	Hook discovery, loading, and lifecycle event dispatch
`gateway/mirror.py`	Cross-session message mirroring for `send_message`
`gateway/status.py`	Token lock management for profile-scoped gateway instances
`gateway/builtin_hooks/`	Extension point for always-registered hooks (none shipped)
`gateway/platforms/`	Platform adapters (one per messaging platform)

Architecture Overview

┌─────────────────────────────────────────────────┐
│                  GatewayRunner                  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ Telegram │  │ Discord  │  │  Slack   │       │
│  │ Adapter  │  │ Adapter  │  │ Adapter  │       │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘       │
│       │             │             │             │
│       └─────────────┼─────────────┘             │
│                     ▼                           │
│              _handle_message()                  │
│                     │                           │
│         ┌───────────┼───────────┐               │
│         ▼           ▼           ▼               │
│  Slash command   AIAgent    Queue/BG            │
│    dispatch      creation   sessions            │
│                     │                           │
│                     ▼                           │
│                 SessionStore                    │
│              (SQLite persistence)               │
└─────────────────────────────────────────────────┘

Message Flow

When a message arrives from any platform:

1. The platform adapter receives the raw event and normalizes it into a MessageEvent.
2. The base adapter checks the active session guard:
   - If agent is running for this session → queue message, set interrupt event
   - If /approve, /deny, /stop → bypass guard (dispatched inline)
3. GatewayRunner._handle_message() receives the event:
   - Resolve session key via _session_key_for_source() (format: agent:main:{platform}:{chat_type}:{chat_id})
   - Check authorization (see Authorization below)
   - Check if it's a slash command → dispatch to command handler
   - Check if agent is already running → intercept commands like /stop, /status
   - Otherwise → create AIAgent instance and run conversation
4. The response is sent back through the platform adapter.

Session Key Format

Session keys encode the full routing context:

agent:main:{platform}:{chat_type}:{chat_id}

For example: agent:main:telegram:private:123456789

Thread-aware platforms (Telegram forum topics, Discord threads, Slack threads) may include thread IDs in the chat_id portion. Never construct session keys manually — always use build_session_key() from gateway/session.py.
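For logging or debugging it can help to split a key back into its parts. The following is a minimal sketch that only reads keys and never builds them; `split_session_key` and `SessionKeyParts` are hypothetical names that are not part of the gateway, and the sketch assumes the `agent:main:` prefix shown above.

```python
from typing import NamedTuple


class SessionKeyParts(NamedTuple):
    """Hypothetical container for the three variable parts of a session key."""
    platform: str
    chat_type: str
    chat_id: str


def split_session_key(key: str) -> SessionKeyParts:
    """Split 'agent:main:{platform}:{chat_type}:{chat_id}' into its parts.

    Everything after the fourth colon is kept as chat_id, so any extra
    colons inside that portion are preserved unchanged.
    """
    parts = key.split(":", 4)
    if len(parts) != 5 or parts[:2] != ["agent", "main"]:
        raise ValueError(f"unexpected session key: {key!r}")
    _, _, platform, chat_type, chat_id = parts
    return SessionKeyParts(platform, chat_type, chat_id)
```

Applied to the example key above, this would yield the platform `telegram`, the chat type `private`, and the chat ID `123456789`.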

Two-Level Message Guard

When an agent is actively running, incoming messages pass through two sequential guards:

Level 1 — Base adapter (gateway/platforms/base.py): Checks _active_sessions. If the session is active, queues the message in _pending_messages and sets an interrupt event. This catches messages before they reach the gateway runner.
Level 2 — Gateway runner (gateway/run.py): Checks _running_agents. Intercepts specific commands (/stop, /new, /queue, /status, /approve, /deny) and routes them appropriately. Everything else triggers running_agent.interrupt().

Commands that must reach the runner while the agent is blocked (like /approve) are dispatched inline via await self._message_handler(event) — they bypass the background task system to avoid race conditions.
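The Level 1 decision can be pictured with a toy model. This is a sketch under assumptions, not the adapter's code: the attribute names `_active_sessions` and `_pending_messages` come from the text above, but their data structures, the `threading.Event` type, and the `admit()` method are invented for illustration.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Commands the text says bypass the Level 1 guard.
BYPASS_COMMANDS = {"/approve", "/deny", "/stop"}


class GuardSketch:
    """Toy model of the Level 1 guard; structures here are assumptions."""

    def __init__(self) -> None:
        self._active_sessions: Dict[str, threading.Event] = {}
        self._pending_messages: DefaultDict[str, List[str]] = defaultdict(list)

    def mark_active(self, session_key: str) -> threading.Event:
        """Record that an agent is running; return its interrupt event."""
        interrupt = threading.Event()
        self._active_sessions[session_key] = interrupt
        return interrupt

    def admit(self, session_key: str, text: str) -> str:
        """Classify an incoming message as 'dispatch', 'inline', or 'queued'."""
        interrupt = self._active_sessions.get(session_key)
        if interrupt is None:
            return "dispatch"  # no agent running: normal handling
        words = text.split()
        if words and words[0] in BYPASS_COMMANDS:
            return "inline"    # must reach the runner even while blocked
        self._pending_messages[session_key].append(text)
        interrupt.set()        # signal the running agent to stop early
        return "queued"
```

The point of the ordering is that the bypass check happens before queuing, so a command such as /approve is never parked behind the agent run that is waiting for it.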

Authorization

The gateway uses a multi-layer authorization check, evaluated in order:

1. Per-platform allow-all flag (e.g., TELEGRAM_ALLOW_ALL_USERS): if set, all users on that platform are authorized
2. Platform allowlist (e.g., TELEGRAM_ALLOWED_USERS): comma-separated user IDs
3. DM pairing: authenticated users can pair new users via a pairing code
4. Global allow-all (GATEWAY_ALLOW_ALL_USERS): if set, all users across all platforms are authorized
5. Default: deny. Unauthorized users are rejected
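The evaluation order above can be sketched as a single function. This is illustrative only: `is_authorized` and `_flag` are hypothetical helpers, the set of truthy values is an assumption, and the `{PLATFORM}_ALLOW_ALL_USERS` / `{PLATFORM}_ALLOWED_USERS` naming is generalized from the Telegram examples in the list.

```python
from typing import Mapping, Set


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Treat a few common spellings as 'set' (an assumption, not the real parser)."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


def is_authorized(
    platform: str,
    user_id: str,
    env: Mapping[str, str],
    paired_users: Set[str],
) -> bool:
    """Walk the layers in the documented order; the first match wins."""
    prefix = platform.upper()
    if _flag(env, f"{prefix}_ALLOW_ALL_USERS"):       # 1. per-platform allow-all
        return True
    raw = env.get(f"{prefix}_ALLOWED_USERS", "")
    allowed = {u.strip() for u in raw.split(",") if u.strip()}
    if user_id in allowed:                            # 2. platform allowlist
        return True
    if user_id in paired_users:                       # 3. DM pairing
        return True
    if _flag(env, "GATEWAY_ALLOW_ALL_USERS"):         # 4. global allow-all
        return True
    return False                                      # 5. default: deny
```

Passing the environment and the paired-user set as arguments keeps the sketch testable without touching real credentials.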

DM Pairing Flow

Admin: /pair
Gateway: "Pairing code: ABC123. Share with the user."
New user: ABC123
Gateway: "Paired! You're now authorized."

Pairing state is persisted by gateway/pairing.py and survives restarts.

Slash Command Dispatch

All slash commands in the gateway flow through the same resolution pipeline:

resolve_command() from hermes_cli/commands.py maps input to canonical name (handles aliases, prefix matching)
The canonical name is checked against GATEWAY_KNOWN_COMMANDS
Handler in _handle_message() dispatches based on canonical name
Some commands are gated on config (gateway_config_gate on CommandDef)
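Alias and prefix resolution of the kind described in the first step might look like the sketch below. It is not the real `resolve_command()`: the function name `resolve_command_sketch`, the command table, and the `reset` alias are hypothetical, and the rule that ambiguous prefixes resolve to nothing is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical registry: canonical name -> aliases.
COMMANDS: Dict[str, List[str]] = {
    "stop": [],
    "status": [],
    "new": ["reset"],
    "model": [],
}


def resolve_command_sketch(text: str) -> Optional[str]:
    """Map '/input args...' to a canonical command name, or None."""
    if not text.startswith("/"):
        return None
    parts = text[1:].split()
    if not parts:
        return None
    word = parts[0].lower()
    # Exact names and aliases take priority over prefixes.
    for name, aliases in COMMANDS.items():
        if word == name or word in aliases:
            return name
    # Accept a prefix only when it identifies exactly one command.
    matches = [name for name in COMMANDS if name.startswith(word)]
    return matches[0] if len(matches) == 1 else None
```

With this table, `/mod` would resolve to `model`, while `/st` stays unresolved because it matches both `stop` and `status`.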

Running-Agent Guard

Commands that must NOT execute while the agent is processing are rejected early:

if _quick_key in self._running_agents:
    if canonical == "model":
        return "⏳ Agent is running — wait for it to finish or /stop first."

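Bypass commands (/stop, /new, /approve, /deny, /queue, /status) are exempt from this check: the runner intercepts them while the agent is running, as described under Two-Level Message Guard.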

Config Sources

The gateway reads configuration from multiple sources:

Source	What it provides
`~/.hermes/.env`	API keys, bot tokens, platform credentials
`~/.hermes/config.yaml`	Model settings, tool configuration, display options
Environment variables	Override any of the above

Unlike the CLI (which uses load_cli_config() with hardcoded defaults), the gateway reads config.yaml directly via a YAML loader. As a result, a config key that exists in the CLI's defaults dict but not in the user's config file may behave differently between the CLI and the gateway.
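The difference between a defaults-merged view and a raw view of the same file can be shown with plain dicts. This is a sketch of the general pattern only: the `display.compact` and `display.show_cost` keys, the `DEFAULTS` table, and `deep_merge` are hypothetical and do not reflect the real config schema or loader.

```python
from copy import deepcopy

# Hypothetical defaults; the real defaults dict has different keys.
DEFAULTS = {"display": {"compact": False, "show_cost": True}}


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied recursively (inputs untouched)."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# What the user's config file contains after parsing.
user_config = {"display": {"compact": True}}

cli_view = deep_merge(DEFAULTS, user_config)  # defaults filled in
gateway_view = user_config                    # raw file contents only

cli_show_cost = cli_view["display"]["show_cost"]              # from defaults
gateway_show_cost = gateway_view["display"].get("show_cost")  # key is absent
```

Code that reads from the raw view has to supply its own fallback for every key, which is where the two entry points can drift apart.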

Platform Adapters

Each messaging platform has an adapter in gateway/platforms/:

gateway/platforms/
├── base.py              # BaseAdapter — shared logic for all platforms
├── telegram.py          # Telegram Bot API (long polling or webhook)
├── discord.py           # Discord bot via discord.py
├── slack.py             # Slack Socket Mode
├── whatsapp.py          # WhatsApp Business Cloud API
├── signal.py            # Signal via signal-cli REST API
├── matrix.py            # Matrix via mautrix (optional E2EE)
├── mattermost.py        # Mattermost WebSocket API
├── email.py             # Email via IMAP/SMTP
├── sms.py               # SMS via Twilio
├── dingtalk.py          # DingTalk WebSocket
├── feishu.py            # Feishu/Lark WebSocket or webhook
├── wecom.py             # WeCom (WeChat Work) callback
├── weixin.py            # Weixin (personal WeChat) via iLink Bot API
├── bluebubbles.py       # Apple iMessage via BlueBubbles macOS server
├── qqbot/               # QQ Bot (Tencent QQ) via Official API v2 (sub-package: adapter.py, crypto.py, keyboards.py, …)
├── yuanbao.py           # Yuanbao (Tencent) DM/group adapter
├── feishu_comment.py    # Feishu document/drive comment-reply handler
├── msgraph_webhook.py   # Microsoft Graph change-notification webhook (Teams, Outlook, etc.)
├── webhook.py           # Inbound/outbound webhook adapter
├── api_server.py        # REST API server adapter
└── homeassistant.py     # Home Assistant conversation integration

Adapters implement a common interface:

connect() / disconnect() — lifecycle management
send_message() — outbound message delivery
on_message() — inbound message normalization → MessageEvent
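As a rough picture of that interface, the sketch below defines an abstract base and a toy implementation. Only the four method names come from the list above; the class names, the signatures, the choice of `async` for the I/O methods, and the fields of `MessageEventSketch` are assumptions rather than the real `BaseAdapter` API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MessageEventSketch:
    """Hypothetical stand-in for MessageEvent; the real fields may differ."""
    platform: str
    chat_id: str
    user_id: str
    text: str


class AdapterSketch(ABC):
    """Hypothetical interface built from the documented method names."""

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    @abstractmethod
    def on_message(self, raw: dict) -> MessageEventSketch: ...


class EchoAdapter(AdapterSketch):
    """Toy adapter that records outbound messages instead of sending them."""

    def __init__(self) -> None:
        self.connected = False
        self.sent: List[Tuple[str, str]] = []

    async def connect(self) -> None:
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False

    async def send_message(self, chat_id: str, text: str) -> None:
        self.sent.append((chat_id, text))

    def on_message(self, raw: dict) -> MessageEventSketch:
        # Normalize a made-up raw payload shape into the common event type.
        return MessageEventSketch("echo", raw["chat"], raw["from"], raw["text"])
```

Keeping normalization in `on_message()` means the runner only ever sees one event shape, whatever the platform's native payload looks like.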

Token Locks

Adapters that connect with unique credentials call acquire_scoped_lock() in connect() and release_scoped_lock() in disconnect(). This prevents two profiles from using the same bot token simultaneously.

Delivery Path

Outgoing deliveries (gateway/delivery.py) handle:

Direct reply — send response back to the originating chat
Home channel delivery — route cron job outputs and background results to a configured home channel
Explicit target delivery — send_message tool specifying telegram:-1001234567890
Cross-platform delivery — deliver to a different platform than the originating message
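An explicit target such as `telegram:-1001234567890` can be split on its first colon. The helper below is a hypothetical sketch, not the parser in gateway/delivery.py, and it assumes the `<platform>:<chat_id>` shape shown in the example.

```python
from typing import Tuple


def parse_target(target: str) -> Tuple[str, str]:
    """Split '<platform>:<chat_id>' on the first colon only.

    Splitting once keeps negative or colon-containing chat IDs intact.
    """
    platform, sep, chat_id = target.partition(":")
    if not sep or not platform or not chat_id:
        raise ValueError(f"expected '<platform>:<chat_id>', got {target!r}")
    return platform, chat_id
```

Rejecting malformed targets up front gives a clearer error than failing later inside a platform adapter.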

Cron job deliveries are NOT mirrored into gateway session history; they live in their own cron session only. This is a deliberate design choice to avoid message alternation violations (consecutive messages with the same role in one conversation history).

Hooks

Gateway hooks are Python modules that respond to lifecycle events:

Gateway Hook Events

Event	When fired
`gateway:startup`	Gateway process starts
`session:start`	New conversation session begins
`session:end`	Session completes or times out
`session:reset`	User resets session with `/new`
`agent:start`	Agent begins processing a message
`agent:step`	Agent completes one tool-calling iteration
`agent:end`	Agent finishes and returns response
`command:*`	Any slash command is executed

Hooks are discovered from two locations: gateway/builtin_hooks/ (an extension point that is currently empty in the shipped distribution; _register_builtin_hooks() is a no-op stub) and ~/.hermes/hooks/ (user-installed). Each hook is a directory containing a HOOK.yaml manifest and a handler.py.
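Event dispatch with a wildcard entry such as `command:*` can be sketched with glob-style matching. This is an assumption-heavy illustration: `HookRegistrySketch`, its `register()` and `emit()` methods, and the handler signature are hypothetical, and the real loader in gateway/hooks.py (including the HOOK.yaml schema) is not shown here.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Callable, DefaultDict, List

# Hypothetical handler signature: (event name, context dict) -> None.
Handler = Callable[[str, dict], None]


class HookRegistrySketch:
    """Toy registry that matches event names against glob-style patterns."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, pattern: str, handler: Handler) -> None:
        """Subscribe a handler to an exact event name or a pattern."""
        self._handlers[pattern].append(handler)

    def emit(self, event: str, context: dict) -> int:
        """Call every handler whose pattern matches; return how many ran."""
        called = 0
        for pattern, handlers in list(self._handlers.items()):
            if fnmatchcase(event, pattern):
                for handler in handlers:
                    handler(event, context)
                    called += 1
        return called
```

A handler registered for `command:*` would then run for `command:model` but not for `agent:start`.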

Memory Provider Integration

When a memory provider plugin (e.g., Honcho) is enabled:

Gateway creates an AIAgent per message with the session ID
The MemoryManager initializes the provider with the session context
Provider tools (e.g., honcho_profile, viking_search) are routed through:

AIAgent._invoke_tool()
  → self._memory_manager.handle_tool_call(name, args)
    → provider.handle_tool_call(name, args)

On session end/reset, on_session_end() fires for cleanup and final data flush

Memory Flush Lifecycle

When a session is reset, resumed, or expires:

Built-in memories are flushed to disk
Memory provider's on_session_end() hook fires
A temporary AIAgent runs a memory-only conversation turn
Context is then discarded or archived

Background Maintenance

The gateway runs periodic maintenance alongside message handling:

Cron ticking — checks job schedules and fires due jobs
Session expiry — cleans up abandoned sessions after timeout
Memory flush — proactively flushes memory before session expiry
Cache refresh — refreshes model lists and provider status

Process Management

The gateway runs as a long-lived process, managed via:

hermes gateway start / hermes gateway stop — manual control
systemctl (Linux) or launchctl (macOS) — service management
PID file at ~/.hermes/gateway.pid — profile-scoped process tracking

Profile-scoped vs global: start_gateway() uses profile-scoped PID files. hermes gateway stop stops only the current profile's gateway. hermes gateway stop --all uses global ps aux scanning to kill all gateway processes (used during updates).
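PID-file tracking in general follows the pattern sketched below. This is not `start_gateway()`: the helper names are hypothetical, the path is supplied by the caller, and the liveness check relies on POSIX signal 0, so it should not be run on Windows.

```python
import os
from pathlib import Path
from typing import Optional


def write_pid_file(path: Path) -> None:
    """Record the current process ID, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(os.getpid()))


def read_live_pid(path: Path) -> Optional[int]:
    """Return the recorded PID if that process still exists, else None."""
    try:
        pid = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no file, or contents are not a number
    try:
        os.kill(pid, 0)  # POSIX only: signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return None      # stale PID file: the process is gone
    except PermissionError:
        return pid       # process exists but belongs to another user
    return pid
```

Because each profile would have its own file, checking or stopping one profile's process never needs to scan the system process list.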
