hermes-agent/gateway/platforms
Jeff Escalante 4aef055805 fix(gateway/webhook): don't pop delivery_info on send
The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.

Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:

  - "🔄 Primary model failed — switching to fallback: ..."
    (run_agent.py::_emit_status when `fallback_providers` activates)
  - context-pressure / compression notices
  - any other lifecycle event routed through `status_callback`

When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.

This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.

Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.

Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.

Test changes:
  - `test_delivery_info_cleaned_after_send` is replaced with
    `test_delivery_info_survives_multiple_sends`, which is now the
    regression test for this bug — it asserts that two consecutive
    `send()` calls both see the delivery config.
  - A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
    behavior.
  - The two integration tests that asserted `chat_id not in
    adapter._delivery_info` after `send()` now assert the opposite, with
    a comment explaining why.

All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
2026-04-07 17:27:09 -07:00
..
__init__.py Enhance CLI with multi-platform messaging integration and configuration management 2026-02-02 19:01:51 -08:00
ADDING_A_PLATFORM.md docs: finish cron terminology cleanup 2026-03-14 19:20:58 -07:00
api_server.py fix(gateway): wrap cron helpers with staticmethod to prevent self-binding 2026-04-05 12:31:10 -07:00
base.py fix: extend caption substring fix to all platforms 2026-04-07 14:08:59 -07:00
dingtalk.py fix(dingtalk): requirements check passes with only one credential set 2026-03-17 03:50:45 -07:00
discord.py fix(discord): remove default selection from model picker provider dropdown 2026-04-06 23:06:33 -07:00
email.py fix(email): close SMTP and IMAP connections on failure (#3804) 2026-03-29 15:38:32 -07:00
feishu.py fix: extend caption substring fix to all platforms 2026-04-07 14:08:59 -07:00
homeassistant.py fix(gateway): add request timeouts to HA, Email, Mattermost, SMS adapters (#3258) 2026-03-26 14:36:07 -07:00
matrix.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
mattermost.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
signal.py fix(signal): implement send_image_file, send_voice, and send_video for MEDIA: tag delivery 2026-04-06 11:41:34 -07:00
slack.py feat(slack): thread engagement — auto-respond in bot-started and mentioned threads (#5897) 2026-04-07 11:12:08 -07:00
sms.py fix: store asyncio task references to prevent GC mid-execution (#3267) 2026-03-26 14:36:24 -07:00
telegram.py fix: extend caption substring fix to all platforms 2026-04-07 14:08:59 -07:00
telegram_network.py fix(security): reject private and loopback IPs in Telegram DoH fallback (#4129) 2026-03-30 18:53:24 -07:00
webhook.py fix(gateway/webhook): don't pop delivery_info on send 2026-04-07 17:27:09 -07:00
wecom.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
whatsapp.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00