hermes-agent/tests
Jeff Escalante 4aef055805 fix(gateway/webhook): don't pop delivery_info on send
The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.

Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:

  - "🔄 Primary model failed — switching to fallback: ..."
    (run_agent.py::_emit_status when `fallback_providers` activates)
  - context-pressure / compression notices
  - any other lifecycle event routed through `status_callback`

When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.

This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.

Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.

Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.

Test changes:
  - `test_delivery_info_cleaned_after_send` is replaced with
    `test_delivery_info_survives_multiple_sends`, which is now the
    regression test for this bug — it asserts that two consecutive
    `send()` calls both see the delivery config.
  - A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
    behavior.
  - The two integration tests that asserted `chat_id not in
    adapter._delivery_info` after `send()` now assert the opposite, with
    a comment explaining why.

All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
2026-04-07 17:27:09 -07:00
..
acp feat(api): structured run events via /v1/runs SSE endpoint 2026-04-05 12:05:13 -07:00
agent refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
cli refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
cron refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
e2e test(e2e): remove section separator comments 2026-04-01 15:23:52 -07:00
fakes fix: streaming tool call parsing, error handling, and fake HA state mutation 2026-03-14 14:27:20 +03:00
gateway fix(gateway/webhook): don't pop delivery_info on send 2026-04-07 17:27:09 -07:00
hermes_cli refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
honcho_plugin fix(honcho): migration guard for observation mode default change 2026-04-05 12:34:11 -07:00
integration refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804) 2026-03-24 07:30:25 -07:00
plugins feat(supermemory): add multi-container, search_mode, identity template, and env var override (#5933) 2026-04-07 14:03:46 -07:00
run_agent refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
skills fix: protect profile-scoped google workspace oauth tokens 2026-04-03 17:49:18 -07:00
tools refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
__init__.py A bit of restructuring for simplicity and organization 2025-10-01 23:29:25 +00:00
conftest.py fix(approval): show full command in dangerous command approval (#1553) 2026-03-17 02:02:33 -07:00
run_interrupt_test.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_batch_runner_checkpoint.py fix: sanitize chat payloads and provider precedence 2026-03-13 23:59:12 -07:00
test_evidence_store.py feat: add OSS Security Forensics skill (Skills Hub) (#1482) 2026-03-15 21:59:53 -07:00
test_hermes_logging.py fix: repair 57 failing CI tests across 14 files (#5823) 2026-04-07 09:58:45 -07:00
test_hermes_state.py fix: merge dotted+hyphenated FTS5 quoting into single pass 2026-04-02 00:49:11 -07:00
test_honcho_client_config.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
test_mcp_serve.py feat: add MCP server mode — hermes mcp serve (#3795) 2026-03-29 15:47:19 -07:00
test_minisweagent_path.py chore: remove all remaining mini-swe-agent references 2026-03-24 08:19:23 -07:00
test_model_tools.py Add request-scoped plugin lifecycle hooks 2026-04-05 23:31:29 -07:00
test_model_tools_async_bridge.py fix: use per-thread persistent event loops in worker threads 2026-03-20 15:41:06 -04:00
test_packaging_metadata.py chore: prepare Hermes for Homebrew packaging (#4099) 2026-03-30 17:34:43 -07:00
test_project_metadata.py fix: exclude matrix from [all] extras — python-olm is upstream-broken (#4615) 2026-04-02 09:21:37 -07:00
test_sql_injection.py fix(security): eliminate SQL string formatting in execute() calls 2026-03-19 15:16:35 +01:00
test_timezone.py fix: repair 57 failing CI tests across 14 files (#5823) 2026-04-07 09:58:45 -07:00
test_toolset_distributions.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_toolsets.py fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs 2026-03-09 23:27:19 -07:00
test_trajectory_compressor.py fix: URL-based auth for third-party Anthropic endpoints + CI test fixes (#4148) 2026-03-30 20:36:56 -07:00
test_trajectory_compressor_async.py fix: create AsyncOpenAI lazily in trajectory_compressor to avoid closed event loop (#4013) 2026-03-30 13:16:16 -07:00
test_utils_truthy_values.py Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway. 2026-03-30 13:28:10 +09:00