perf(gateway): tune Telegram cadence + adaptive fast-path for short replies

Re-authored against current main from PR #10388 by @wilsen0. The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
    total <= 320 cp   -> min(cap, 0.18)
    total <= 1024 cp  -> min(cap, 0.24)
    else              -> cap
  A single short reply now reaches the agent in ~180ms instead of
  600ms. Tier constants compose with the configured cap via min(), so
  an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
  below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()). It
  rejects NaN / Inf and applies optional min/max bounds. Used for the
  text-batch and media-batch knobs. Prevents asyncio.sleep(NaN) crashes
  when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
  a single source of truth. StreamConsumerConfig imports them instead
  of duplicating the literals, which fixes the prior dual-source drift.

Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside Telegram's
  ~1 edit/s flood envelope, the 'all' default would accumulate edit
  pressure on busy chats; 'new' shows only the leading bubble per tool
  batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.
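The two text-batch changes can be sketched as follows. This is a minimal illustration, not the code in gateway/platforms/telegram.py: `env_float_clamped` and `fast_path_delay` are hypothetical names, the helper's keyword arguments are assumed, and clamping out-of-range values (rather than rejecting them) is an assumption. Only the tier thresholds, tier delays, the min() composition, and the NaN/Inf rejection come from the summary above.

```python
import math
import os
from typing import Optional


def env_float_clamped(
    name: str,
    default: float,
    *,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> float:
    """Read a float knob from the environment (hypothetical signature).

    Falls back to `default` when the variable is unset, unparsable, NaN,
    or infinite. Assumption: out-of-range values are clamped to the bounds.
    """
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        # Garbage such as "0.3s" falls back instead of raising.
        return default
    if not math.isfinite(value):
        # float("nan") and float("inf") parse successfully, so check explicitly.
        return default
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value


# Tier thresholds (code points; len(str) counts code points in Python)
# and tier delays (seconds), taken from the tuning summary.
SHORT_TIER_CP, SHORT_TIER_DELAY = 320, 0.18
MEDIUM_TIER_CP, MEDIUM_TIER_DELAY = 1024, 0.24


def fast_path_delay(total_cp: int, cap: float) -> float:
    """Pick the flush delay for a batch of `total_cp` code points.

    Each tier is min()-ed with the configured cap, so an operator cap
    below a tier constant always wins.
    """
    if total_cp <= SHORT_TIER_CP:
        return min(cap, SHORT_TIER_DELAY)
    if total_cp <= MEDIUM_TIER_CP:
        return min(cap, MEDIUM_TIER_DELAY)
    return cap
```

With the new 0.3s default cap, `fast_path_delay(len("ok"), 0.3)` yields 0.18, while an operator cap of 0.1 yields 0.1 on every tier. Keeping the tier constants as ceilings rather than fixed values is what lets the operator override stay authoritative.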
Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH the
draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native drafts as much as edit-based streaming.
The text-batch fast-path applies before the consumer starts, so it
reduces first-token latency on every transport. No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked. Dropped from the
original branch:
- Unrelated commit d2f043f9c 'fix(anthropic): preserve third-party
  thinking continuity'
- boot_md.py builtin gateway hook (unrelated)
- Revert of the Slack tool_progress='off' restoration (#14663)
- Reverts of Platform plugin discovery, MSGRAPH_WEBHOOK, and the
  YUANBAO members deletion
- 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for
  Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new and
  Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
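The composition note says one cadence gates both the draft path and the edit path. The sketch below illustrates that idea with a hypothetical `CadenceGate` whose `send` callback could wrap either transport. How the real stream consumer combines the two knobs is not shown in this commit; the rule used here (flush once at least `buffer_threshold` new characters are pending and `edit_interval` seconds have passed since the last flush) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transport callback: receives the full text rendered so far.
# A draft transport (send_draft) and an edit transport (edit_message) could
# both be adapted to this shape.
Send = Callable[[str], None]


@dataclass
class CadenceGate:
    """Illustrative cadence gate shared by every streaming transport.

    Assumed rule: push an update once at least `buffer_threshold` unsent
    characters are pending AND `edit_interval` seconds have elapsed since
    the previous update.
    """

    edit_interval: float = 0.8   # mirrors DEFAULT_STREAMING_EDIT_INTERVAL
    buffer_threshold: int = 24   # mirrors DEFAULT_STREAMING_BUFFER_THRESHOLD
    text: str = ""
    unsent_chars: int = 0
    last_flush: float = float("-inf")  # first update is never interval-blocked

    def feed(self, now: float, delta: str, send: Send) -> bool:
        """Append a token delta; return True if an update was pushed."""
        self.text += delta
        self.unsent_chars += len(delta)
        if self.unsent_chars < self.buffer_threshold:
            return False
        if now - self.last_flush < self.edit_interval:
            return False
        send(self.text)
        self.unsent_chars = 0
        self.last_flush = now
        return True

    def finish(self, send: Send) -> None:
        """Flush any remaining text when the stream ends."""
        if self.unsent_chars:
            send(self.text)
            self.unsent_chars = 0
```

Feeding the same token timeline through two gates, one wired to a draft sink and one to an edit sink, produces identical update sequences, which is the property that makes the tighter 0.8s / 24-char defaults benefit both transports equally.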
2026-05-23 05:31:23 +00:00 · 2026-05-10 22:18:06 -07:00 · 2026-05-10 22:18:06 -07:00 · ac95b8cdbe
commit ac95b8cdbe
parent e3b88a8fe2
4 changed files with 103 additions and 15 deletions
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -317,6 +317,16 @@ class PlatformConfig:
        )


+# Streaming defaults — single source of truth so both StreamingConfig and
+# StreamConsumerConfig agree on the out-of-the-box edit rhythm.  Tuned for
+# Telegram's ~1 edit/s flood envelope: a touch under 1s lets the cadence
+# breathe without bumping into rate limits, and a smaller buffer threshold
+# makes short replies feel near-instant in DMs.
+DEFAULT_STREAMING_EDIT_INTERVAL: float = 0.8
+DEFAULT_STREAMING_BUFFER_THRESHOLD: int = 24
+DEFAULT_STREAMING_CURSOR: str = " ▉"
+
+
 @dataclass
 class StreamingConfig:
    """Configuration for real-time token streaming to messaging platforms."""
@@ -330,9 +340,9 @@ class StreamingConfig:
    #   "edit"  — progressive editMessageText only (legacy behaviour).
    #   "off"   — disable streaming entirely.
    transport: str = "auto"
-    edit_interval: float = 1.0    # Seconds between message edits (Telegram rate-limits at ~1/s)
-    buffer_threshold: int = 40    # Chars before forcing an edit
-    cursor: str = " ▉"           # Cursor shown during streaming
+    edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
+    buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
+    cursor: str = DEFAULT_STREAMING_CURSOR
    # Ported from openclaw/openclaw#72038.  When >0, the final edit for
    # a long-running streamed response is delivered as a fresh message
    # if the original preview has been visible for at least this many
@@ -359,9 +369,13 @@ class StreamingConfig:
        return cls(
            enabled=_coerce_bool(data.get("enabled"), False),
            transport=data.get("transport", "auto"),
-            edit_interval=_coerce_float(data.get("edit_interval"), 1.0),
-            buffer_threshold=_coerce_int(data.get("buffer_threshold"), 40),
-            cursor=data.get("cursor", " ▉"),
+            edit_interval=_coerce_float(
+                data.get("edit_interval"), DEFAULT_STREAMING_EDIT_INTERVAL,
+            ),
+            buffer_threshold=_coerce_int(
+                data.get("buffer_threshold"), DEFAULT_STREAMING_BUFFER_THRESHOLD,
+            ),
+            cursor=data.get("cursor", DEFAULT_STREAMING_CURSOR),
            fresh_final_after_seconds=_coerce_float(
                data.get("fresh_final_after_seconds"), 60.0
            ),
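The single-source-of-truth change in this hunk can be sketched in isolation. This is a trimmed, self-contained stand-in: `coerce_float` and `coerce_int` are hypothetical substitutes for `_coerce_float` / `_coerce_int` (whose exact behaviour is not shown in the diff), and the `StreamConsumerConfig` fields are assumed. The point illustrated is that both dataclass defaults and the `from_dict` fallbacks read the same module-level constants, so they cannot drift apart.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Module-level defaults, mirroring the constants added to gateway/config.py.
DEFAULT_STREAMING_EDIT_INTERVAL: float = 0.8
DEFAULT_STREAMING_BUFFER_THRESHOLD: int = 24
DEFAULT_STREAMING_CURSOR: str = " ▉"


def coerce_float(value: Any, default: float) -> float:
    """Hypothetical stand-in for _coerce_float: fall back on missing/bad input."""
    if value is None:
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def coerce_int(value: Any, default: int) -> int:
    """Hypothetical stand-in for _coerce_int: fall back on missing/bad input."""
    if value is None:
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


@dataclass
class StreamingConfig:
    # Trimmed to the three cadence fields touched by this commit.
    edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
    buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
    cursor: str = DEFAULT_STREAMING_CURSOR

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "StreamingConfig":
        # Missing keys fall back to the same constants as the field defaults,
        # so StreamingConfig() and StreamingConfig.from_dict({}) agree.
        return cls(
            edit_interval=coerce_float(
                data.get("edit_interval"), DEFAULT_STREAMING_EDIT_INTERVAL
            ),
            buffer_threshold=coerce_int(
                data.get("buffer_threshold"), DEFAULT_STREAMING_BUFFER_THRESHOLD
            ),
            cursor=data.get("cursor", DEFAULT_STREAMING_CURSOR),
        )


@dataclass
class StreamConsumerConfig:
    # Assumed shape. Per the commit message, the real stream_consumer.py
    # imports these constants from gateway.config instead of repeating literals.
    edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
    buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
    cursor: str = DEFAULT_STREAMING_CURSOR
```

Retuning the cadence later then means editing one constant rather than three literals spread across two modules.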