mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-27 11:22:03 +00:00
The scale-to-zero idle watcher never started on a correctly-opted-in, relay-only instance, so the gateway never ran its idle decision, never called go_dormant(), and never sent going_idle to the connector. Fly's autostop still suspended the machine on traffic-idle, but the connector never flipped the instance to buffered-only — so an inbound DM took the live delivery path, found no live session for the suspended machine, and was dropped fail-closed with no wake poke. The machine slept and never woke. Root cause: _scale_to_zero_should_arm() passed list(config.platforms.keys()) to messaging_is_relay_only_or_absent(). config.platforms is pre-seeded with a DISABLED placeholder PlatformConfig for every known platform (telegram, discord, slack, matrix, …), so the key set is always the full ~20-entry catalog regardless of what the instance actually runs. The relay-only check discarded "relay", saw the disabled placeholders as live direct-socket platforms, and returned False — so should_arm() was False and the watcher was never created. Verified live on a staging instance: config.platforms keys = [telegram, discord, slack, mattermost, matrix, relay] with only relay enabled=True; should_arm() = False. Fix: filter config.platforms to ENABLED entries before the relay-only check, mirroring the adapter-connect loop which already gates on `if not platform_config.enabled: continue`. This arms off the same notion of "active platform" the rest of start() already uses — no parallel concept. Also add a one-line not-armed diagnostic: when an instance IS opted in (the HERMES_SCALE_TO_ZERO stamp is set) but the watcher still doesn't arm, log why (relay_only_or_absent, the enabled platforms, wake_url present/missing). A non-opted instance stays silent. The arm path previously logged only on success, so a failed arm was invisible. Tests: the existing pure-helper tests passed bare names so they never exercised the call site that feeds the placeholder-laden config. Add behaviour-contract tests against the REAL _scale_to_zero_should_arm with a realistic config.platforms (relay enabled + others disabled). The F25 regression test (relay-only + disabled placeholders must arm) and the no-platform case are RED without this fix, GREEN with it; the genuinely-enabled-direct-platform / not-opted-in / no-wake-url cases stay correctly non-arming so the filter can't over-broaden. Wake mechanism itself verified healthy independently (direct wakeUrl GET resumed a suspended staging instance in 1.15s, clean resume signature). |
||
|---|---|---|
| .. | ||
| assets | ||
| builtin_hooks | ||
| platforms | ||
| relay | ||
| __init__.py | ||
| authz_mixin.py | ||
| channel_directory.py | ||
| code_skew.py | ||
| config.py | ||
| delivery.py | ||
| display_config.py | ||
| hooks.py | ||
| kanban_watchers.py | ||
| memory_monitor.py | ||
| message_timestamps.py | ||
| mirror.py | ||
| pairing.py | ||
| platform_registry.py | ||
| response_filters.py | ||
| restart.py | ||
| rich_sent_store.py | ||
| run.py | ||
| runtime_footer.py | ||
| scale_to_zero.py | ||
| session.py | ||
| session_context.py | ||
| shutdown_forensics.py | ||
| slash_access.py | ||
| slash_commands.py | ||
| status.py | ||
| sticker_cache.py | ||
| stream_consumer.py | ||
| stream_dispatch.py | ||
| stream_events.py | ||
| whatsapp_identity.py | ||