hermes-agent/tests/gateway
Teknium 307c85e5c1 fix(goals): auto-pause when judge model returns unparseable output
Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed open and continued on every such turn, burning the entire turn
budget with log lines like

  judge returned empty response
  judge reply was not JSON: "Let me analyze whether the goal..."

and /goal clear could not stop it mid-loop without /stop.

After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:

  ⏸ Goal paused — the judge model (3 turns) isn't returning the
  required JSON verdict. Route the judge to a stricter model in
  ~/.hermes/config.yaml:
    auxiliary:
      goal_judge:
        provider: openrouter
        model: google/gemini-3-flash-preview
  Then /goal resume to continue.

The counter resets on any usable verdict ("done" or "continue") and on
API errors, and it persists across GoalManager reloads so cross-session
resumes carry the correct state.
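
The counting rule described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: JudgeState, parse_verdict and record_judge_reply are hypothetical names, and the real GoalManager fields and persistence format may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional

# N=3 comes from the commit message; the constant name is an assumption.
MAX_CONSECUTIVE_PARSE_FAILURES = 3


@dataclass
class JudgeState:
    """Hypothetical per-goal state that would be persisted with the goal."""
    consecutive_parse_failures: int = 0
    paused: bool = False


def parse_verdict(raw: str) -> Optional[dict]:
    """Return the {done, reason} verdict, or None if the reply is unusable."""
    if not raw or not raw.strip():
        return None  # corresponds to "judge returned empty response"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # corresponds to "judge reply was not JSON"
    if not isinstance(data, dict) or "done" not in data:
        return None
    return data


def record_judge_reply(state: JudgeState, raw: Optional[str],
                       api_error: bool = False) -> str:
    """Update the failure streak and return "done", "continue" or "paused"."""
    if api_error:
        # Transport/API errors are transient: they do not count as parse
        # failures, and they reset the streak.
        state.consecutive_parse_failures = 0
        return "continue"
    verdict = parse_verdict(raw or "")
    if verdict is None:
        state.consecutive_parse_failures += 1
        if state.consecutive_parse_failures >= MAX_CONSECUTIVE_PARSE_FAILURES:
            state.paused = True
            return "paused"
        return "continue"
    # Any usable verdict resets the streak.
    state.consecutive_parse_failures = 0
    return "done" if verdict["done"] else "continue"
```

Because the streak lives in a plain dataclass, it could be serialized with the rest of the goal state (for example via dataclasses.asdict) so that a reload starts from the stored count rather than from zero.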

Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
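
A minimal sketch of the uuid-suffix idea, assuming a small helper; the name unique_session_id and the prefix are hypothetical and not taken from test_goal_verdict_send.py:

```python
import uuid


def unique_session_id(prefix: str = "goal-verdict-test") -> str:
    """Build a per-test session id so parallel xdist workers never share state."""
    # uuid4 is random, so two calls are overwhelmingly unlikely to collide.
    return f"{prefix}-{uuid.uuid4().hex[:8]}"
```

Each test would call the helper once and pass the result wherever the hardcoded id was used before.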
2026-05-07 17:33:09 -07:00
__init__.py
_plugin_adapter_loader.py test(gateway): isolate plugin adapter imports and guard the anti-pattern 2026-04-30 01:19:34 -07:00
conftest.py test(gateway): isolate plugin adapter imports and guard the anti-pattern 2026-04-30 01:19:34 -07:00
feishu_helpers.py feat(feishu): operator-configurable bot admission and mention policy 2026-04-30 20:30:31 -07:00
restart_test_helpers.py fix(gateway): cap cached session sources with LRU eviction 2026-05-07 05:16:38 -07:00
test_7100_transient_failure_transcript.py fix(gateway): persist user message on transient agent failures (#7100) 2026-04-30 04:32:33 -07:00
test_agent_cache.py fix(agent): honor configured model max tokens 2026-05-07 06:40:30 -07:00
test_allowed_channels_widening.py feat(gateway): add allowed_{chats,channels,rooms} whitelist to Telegram, Mattermost, Matrix, DingTalk 2026-05-07 06:54:29 -07:00
test_allowlist_startup_check.py
test_api_server.py docs: clarify API server tool execution locality 2026-05-07 05:30:37 -07:00
test_api_server_bind_guard.py
test_api_server_jobs.py
test_api_server_multimodal.py
test_api_server_normalize.py
test_api_server_runs.py fix(api-server): use session-scoped task IDs for tool isolation 2026-04-30 19:59:38 -07:00
test_api_server_toolset.py
test_approve_deny_commands.py fix(approval): wake blocked gateway approvals on session cleanup 2026-04-30 19:46:27 -07:00
test_auth_fallback.py
test_auto_continue.py
test_background_command.py
test_background_process_notifications.py fix(gateway): preserve thread routing from cached live session sources 2026-05-07 05:16:38 -07:00
test_base_topic_sessions.py
test_bluebubbles.py
test_busy_session_ack.py
test_busy_session_auth_bypass.py fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775) 2026-04-30 04:29:15 -07:00
test_cancel_background_drain.py
test_channel_directory.py
test_clean_shutdown_marker.py fix: update tests for resume_pending semantics + add AUTHOR_MAP entries 2026-05-03 03:54:03 -07:00
test_command_bypass_active_session.py
test_complete_path_at_filter.py
test_compress_command.py fix(compression): include system prompt + tool schemas in token estimates (#18265) 2026-04-30 23:03:54 -07:00
test_compress_focus.py
test_compress_plugin_engine.py
test_config.py feat(gateway): per-platform gateway_restart_notification flag 2026-05-06 13:39:43 -07:00
test_config_cwd_bridge.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
test_config_env_bridge_authority.py fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761) 2026-05-02 02:08:06 -07:00
test_debug_command.py
test_delivery.py fix(gateway): preserve case-sensitive chat IDs in DeliveryTarget.parse 2026-05-01 14:01:26 -07:00
test_dingtalk.py
test_discord_allowed_channels.py
test_discord_allowed_mentions.py
test_discord_attachment_download.py
test_discord_bot_auth_bypass.py
test_discord_bot_filter.py
test_discord_channel_controls.py
test_discord_channel_prompts.py
test_discord_channel_skills.py
test_discord_component_auth.py fix(gateway/discord): require allowlist auth on slash commands 2026-05-03 03:44:55 -07:00
test_discord_connect.py fix(discord): narrow rate-limit catch and move sync state under gateway/ 2026-05-06 18:12:35 -07:00
test_discord_document_handling.py test(discord): annotate make_attachment content_type as Optional[str] 2026-05-04 12:36:47 -07:00
test_discord_free_response.py fix(gateway): coerce scalar free_response_channels to str before split 2026-05-01 14:01:26 -07:00
test_discord_imports.py
test_discord_media_metadata.py
test_discord_model_picker.py
test_discord_opus.py
test_discord_race_polish.py
test_discord_reactions.py
test_discord_reply_mode.py fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram 2026-05-05 04:58:23 -07:00
test_discord_roles_dm_scope.py fix(discord): route DM role-auth opt-in through config.yaml (not env var) 2026-05-07 05:51:56 -07:00
test_discord_send.py
test_discord_slash_auth.py fix(discord): extend role-scope fix to slash surface + fixture update 2026-05-07 05:51:56 -07:00
test_discord_slash_commands.py fix(gateway/discord): require allowlist auth on slash commands 2026-05-03 03:44:55 -07:00
test_discord_system_messages.py
test_discord_thread_persistence.py fix(gateway): ensure deterministic thread eviction in helpers 2026-05-05 10:13:55 -07:00
test_display_config.py feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) 2026-05-07 05:04:37 -07:00
test_dm_topics.py
test_document_cache.py
test_duplicate_reply_suppression.py fix(gateway): drain pending messages via fresh task, not recursion (#17758) 2026-04-30 03:27:08 -07:00
test_email.py fix(email): drop non-allowlisted senders before dispatch to prevent mail loops 2026-05-04 12:35:22 -07:00
test_ephemeral_reply.py feat(gateway): auto-delete slash-command system notices after TTL (#18266) 2026-04-30 23:05:48 -07:00
test_extract_local_files.py
test_fallback_eviction.py
test_fast_command.py fix(gateway): guard against None request_overrides in _build_api_kwargs 2026-04-28 06:57:23 -07:00
test_feishu.py fix(feishu): keep topic replies in threads 2026-05-06 10:52:51 -07:00
test_feishu_approval_buttons.py
test_feishu_bot_admission.py feat(feishu): operator-configurable bot admission and mention policy 2026-04-30 20:30:31 -07:00
test_feishu_bot_auth_bypass.py feat(feishu): operator-configurable bot admission and mention policy 2026-04-30 20:30:31 -07:00
test_feishu_comment.py
test_feishu_comment_rules.py
test_feishu_onboard.py fix(gateway): use monotonic deadlines in QR onboarding flows 2026-05-07 05:09:39 -07:00
test_fresh_reset_skill_injection.py fix(gateway): re-inject topic-bound skill after /new or /reset 2026-04-30 20:29:19 -07:00
test_gateway_command_help.py fix: sanitize Telegram help command mentions 2026-05-03 17:00:09 -07:00
test_gateway_inactivity_timeout.py
test_gateway_shutdown.py
test_goal_max_turns_config.py fix(gateway): honor configured goal turn budget 2026-05-07 06:31:08 -07:00
test_goal_status_notice.py fix(gateway): defer goal status notices until after response delivery 2026-05-07 17:33:09 -07:00
test_goal_verdict_send.py fix(goals): auto-pause when judge model returns unparseable output 2026-05-07 17:33:09 -07:00
test_google_chat.py feat(plugins/google_chat): Google Chat platform adapter as a bundled plugin 2026-05-07 07:15:44 -07:00
test_home_target_env_var.py fix(gateway): preserve home-channel thread targets across restart notifications 2026-05-03 08:47:49 -07:00
test_homeassistant.py
test_hooks.py
test_insights_unicode_flags.py
test_internal_event_bypass_pairing.py
test_interrupt_key_match.py
test_irc_adapter.py test(gateway): isolate plugin adapter imports and guard the anti-pattern 2026-04-30 01:19:34 -07:00
test_keep_typing_timeout.py
test_matrix.py fix(matrix): defer reaction cleanup redactions 2026-05-07 06:05:44 -07:00
test_matrix_exec_approval.py test(matrix): set user_id in approval-reaction test to bypass defensive self-drop 2026-04-27 21:22:44 -07:00
test_matrix_mention.py test(matrix): adapt outbound-mention notice test to current _send_simple_message API 2026-04-27 21:22:44 -07:00
test_matrix_voice.py
test_mattermost.py
test_media_download_retry.py
test_media_extraction.py
test_message_deduplicator.py
test_mirror.py
test_model_command_custom_providers.py
test_model_switch_persistence.py
test_native_image_buffer_isolation.py fix(gateway): isolate pending native image paths by session 2026-04-30 20:26:35 -07:00
test_notice_delivery.py feat(gateway): private notice delivery and Slack format_message fixes 2026-05-01 13:33:06 -07:00
test_pairing.py fix(pairing): enforce lockout on approve_code, not just generate_code (#10195) (#21325) 2026-05-07 07:18:21 -07:00
test_pending_drain_no_recursion.py test(gateway): pin cleanup invariants for #17758 in-band drain hand-off 2026-04-30 05:00:25 -07:00
test_pending_drain_race.py
test_pending_event_none.py
test_pii_redaction.py
test_platform_base.py feat(gateway): support [[as_document]] directive for skill media routing 2026-05-07 05:20:10 -07:00
test_platform_connected_checkers.py feat(irc): add interactive setup 2026-04-29 21:56:51 -07:00
test_platform_http_client_limits.py fix(gateway): tighten httpx keepalive and close whatsapp typing-response leak (#18451) 2026-05-02 02:23:37 -07:00
test_platform_reconnect.py fix(gateway): isolate platform connect failures with per-platform timeout 2026-04-29 05:00:37 -07:00
test_platform_registry.py feat(irc): add interactive setup 2026-04-29 21:56:51 -07:00
test_plugin_platform_interface.py feat(irc): add interactive setup 2026-04-29 21:56:51 -07:00
test_post_delivery_callback_chaining.py feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) 2026-05-07 05:04:37 -07:00
test_pre_gateway_dispatch.py
test_proxy_mode.py
test_qqbot.py feat(qqbot): wire native tool-approval UX via inline keyboards 2026-05-07 07:48:15 -07:00
test_queue_consumption.py
test_reasoning_command.py fix: coerce show_reasoning and guard_agent_created config bools 2026-04-30 20:40:46 -07:00
test_reload_skills_command.py refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool 2026-04-29 21:07:47 -07:00
test_reload_skills_discord_resync.py fix(discord): /reload-skills now refreshes the /skill autocomplete live (#18754) 2026-05-02 02:00:11 -07:00
test_reply_to_injection.py
test_restart_drain.py feat(gateway): also gate pre-restart "Gateway restarting" notification 2026-05-06 13:39:43 -07:00
test_restart_notification.py fix(gateway): preserve thread routing from cached live session sources 2026-05-07 05:16:38 -07:00
test_restart_redelivery_dedup.py
test_restart_resume_pending.py refactor(gateway): simplify auto-resume + extend to crash recovery 2026-05-07 05:05:34 -07:00
test_resume_command.py feat(memory): notify providers on mid-process session_id rotation (#17409) 2026-04-29 04:57:22 -07:00
test_retry_replacement.py
test_retry_response.py
test_run_cleanup_progress.py feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) 2026-05-07 05:04:37 -07:00
test_run_progress_interrupt.py
test_run_progress_topics.py fix(feishu): keep topic replies in threads 2026-05-06 10:52:51 -07:00
test_runner_fatal_adapter.py
test_runner_startup_failures.py
test_running_agent_session_toggles.py
test_runtime_env_reload_config_authority.py fix(gateway): preserve max turns after env reload 2026-05-07 05:49:16 -07:00
test_runtime_footer.py feat(gateway): opt-in runtime-metadata footer on final replies (#17026) 2026-04-28 06:50:04 -07:00
test_safe_adapter_disconnect.py
test_send_image_file.py
test_send_multiple_images.py feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email 2026-04-30 04:28:08 -07:00
test_send_retry.py
test_session.py fix(state): JSON-encode multimodal message content for sqlite 2026-04-30 20:25:52 -07:00
test_session_boundary_hooks.py
test_session_boundary_security_state.py fix(gateway): clear queued reload-skills notes on new/resume/branch 2026-05-03 17:00:31 -07:00
test_session_dm_thread_seeding.py
test_session_env.py
test_session_hygiene.py feat(gateway): make hygiene hard message limit configurable (#17000) 2026-04-28 05:43:12 -07:00
test_session_info.py
test_session_list_allowed_sources.py
test_session_model_override_routing.py fix: use configured model for gateway auth fallback 2026-05-07 06:29:27 -07:00
test_session_model_reset.py
test_session_race_guard.py fix(gateway): preserve document type when merging queued events 2026-04-30 20:37:27 -07:00
test_session_reset_notify.py
test_session_split_brain_11016.py
test_session_state_cleanup.py
test_session_store_prune.py
test_setup_feishu.py
test_shared_group_sender_prefix.py
test_shutdown_cache_cleanup.py
test_shutdown_memory_provider_messages.py
test_signal.py fix(signal): skip contentless envelopes (profile key updates, empty messages) 2026-04-30 19:42:59 -07:00
test_signal_format.py feat(gateway/signal): native formatting, reply quotes, and reactions 2026-04-29 04:38:17 -07:00
test_signal_rate_limit.py feat(gateway/signal): add support for multiple images sending 2026-04-30 04:28:08 -07:00
test_slack.py fix(slack): close previous handler in connect() to prevent zombie Socket Mode connections 2026-05-03 03:47:49 -07:00
test_slack_approval_buttons.py
test_slack_channel_skills.py
test_slack_mention.py feat(slack): add allowed_channels whitelist config 2026-05-07 06:54:29 -07:00
test_sms.py test(sms): use clear=True in test_missing_phone_number_is_non_retryable 2026-05-04 05:25:09 -07:00
test_sse_agent_cancel.py
test_ssl_certs.py
test_status.py fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
test_status_command.py fix(gateway): snapshot callback generation after agent binds it, not before 2026-04-30 20:41:18 -07:00
test_steer_command.py
test_step_callback_compat.py
test_sticker_cache.py
test_stream_consumer.py fix(gateway): linearize tool-progress bubbles with content messages (#17280) 2026-04-28 22:17:33 -07:00
test_stream_consumer_fresh_final.py
test_stt_config.py
test_stuck_loop.py
test_teams.py fix(tests): patch TypingActivityInput after mock on Python <3.12 2026-05-04 20:59:18 -07:00
test_telegram_approval_buttons.py fix(telegram): enforce gateway auth for inline approval callbacks 2026-04-30 19:59:31 -07:00
test_telegram_caption_merge.py
test_telegram_conflict.py
test_telegram_documents.py fix: route Telegram image documents through photo handling 2026-05-07 04:51:46 -07:00
test_telegram_format.py feat(telegram): render markdown tables as row groups 2026-04-28 05:37:50 -07:00
test_telegram_group_gating.py fix(gateway): bridge top-level require_mention to Telegram config 2026-05-03 16:59:46 -07:00
test_telegram_mention_boundaries.py
test_telegram_network.py fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520) 2026-05-05 04:42:59 -07:00
test_telegram_network_reconnect.py fix(telegram): probe polling liveness after reconnect to detect wedged Updater 2026-05-02 01:55:04 -07:00
test_telegram_photo_interrupts.py
test_telegram_reactions.py
test_telegram_reply_mode.py fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram 2026-05-05 04:58:23 -07:00
test_telegram_text_batching.py
test_telegram_thread_fallback.py fix(telegram): preserve thread_id=1 for forum General typing indicator (#21390) 2026-05-07 08:39:21 -07:00
test_telegram_topic_mode.py feat(telegram): /topic off + help + auth gate + screenshot debounce 2026-05-04 12:07:17 -07:00
test_telegram_webhook_secret.py
test_text_batching.py
test_title_command.py fix(cli,gateway): surface title errors from /new <name> 2026-05-04 03:14:50 -07:00
test_transcript_offset.py
test_tts_media_routing.py feat(gateway): centralize audio routing + FLAC support + Telegram doc fallback (#17833) 2026-04-30 01:32:31 -07:00
test_unauthorized_dm_behavior.py fix(telegram): preserve pre-#17686 chat-ID-in-_USERS configs + doc split 2026-04-29 21:07:55 -07:00
test_unavailable_skill_hint.py fix(gateway): match disabled/optional skills by frontmatter slug, not dir name (#18753) 2026-05-02 02:00:09 -07:00
test_unknown_command.py
test_update_command.py fix(gateway): preserve thread routing for /update progress and prompts 2026-04-30 20:19:23 -07:00
test_update_streaming.py fix(gateway): preserve pending update prompts across restarts 2026-05-05 03:59:39 -07:00
test_usage_command.py
test_verbose_command.py fix(gateway): coerce tool_progress_command as a real boolean 2026-04-30 20:40:46 -07:00
test_vision_memory_leak.py
test_voice_command.py fix(gateway): suppress duplicate voice transcripts 2026-05-03 16:59:21 -07:00
test_voice_mode_platform_isolation.py
test_weak_credential_guard.py
test_webhook_adapter.py fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs 2026-05-07 07:38:43 -07:00
test_webhook_deliver_only.py fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs 2026-05-07 07:38:43 -07:00
test_webhook_dynamic_routes.py
test_webhook_integration.py
test_webhook_signature_rate_limit.py
test_wecom.py fix(gateway): use monotonic deadlines in QR onboarding flows 2026-05-07 05:09:39 -07:00
test_wecom_callback.py
test_weixin.py fix(weixin): wrap long copy-unfriendly lines 2026-05-07 06:08:06 -07:00
test_whatsapp_connect.py fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761) 2026-05-02 02:08:06 -07:00
test_whatsapp_formatting.py Fix WhatsApp long message splitting 2026-05-07 06:27:47 -07:00
test_whatsapp_group_gating.py
test_whatsapp_reply_prefix.py
test_ws_auth_retry.py
test_yolo_command.py