perf(gateway): tune Telegram cadence + adaptive fast-path for short replies

Re-authored against current main from PR #10388 by @wilsen0.  The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
  - HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
  - HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
  - Adaptive fast-path tiers in _flush_text_batch:
      total <= 320 cp -> min(cap, 0.18)
      total <= 1024 cp -> min(cap, 0.24)
      else            -> cap
    A single short reply now reaches the agent in ~180ms instead of
    600ms.  Tier constants compose with the configured cap via min()
    so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
    below 0.18 still wins on every tier.
  - _env_float_clamped helper replaces bare float(os.getenv()).
    Rejects NaN / Inf, applies optional min/max bounds.  Used for
    text-batch + media-batch knobs.  Prevents asyncio.sleep(NaN)
    crashes when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
  - StreamingConfig.edit_interval default 1.0s -> 0.8s
  - StreamingConfig.buffer_threshold default 40 -> 24 chars
  - DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
    a single source of truth.  StreamConsumerConfig imports them
    instead of duplicating the literals; the prior dual-source drift
    is fixed.

Tool progress (gateway/display_config.py):
  - Telegram default tool_progress 'all' -> 'new'.  Inside
    Telegram's ~1 edit/s flood envelope the 'all' default would
    accumulate edit pressure on busy chats; 'new' shows only the
    leading bubble per tool batch and feels less spammy.
  - Slack tier_low override (tool_progress='off') is preserved.

Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based.  The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport.  No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked.  Dropped from the
original branch:
  - Unrelated d2f043f9c 'fix(anthropic): preserve third-party thinking
    continuity' commit
  - boot_md.py builtin gateway hook (unrelated)
  - Reverted Slack tool_progress='off' (#14663) restoration
  - Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO
    members deletion
  - 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
  - 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
  - 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
  - test_platform_default_when_no_user_config: 'all' -> 'new' for
    Telegram, with comment.
  - test_high_tier_platforms: split into Telegram-overrides-to-new
    and Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
This commit is contained in:
wilsen0 2026-05-10 22:18:06 -07:00 committed by Teknium
parent e3b88a8fe2
commit ac95b8cdbe
4 changed files with 103 additions and 15 deletions

View file

@ -317,6 +317,16 @@ class PlatformConfig:
)
# Streaming defaults — single source of truth so both StreamingConfig and
# StreamConsumerConfig agree on the out-of-the-box edit rhythm. Tuned for
# Telegram's ~1 edit/s flood envelope: a touch under 1s lets the cadence
# breathe without bumping into rate limits, and a smaller buffer threshold
# makes short replies feel near-instant in DMs.
DEFAULT_STREAMING_EDIT_INTERVAL: float = 0.8
DEFAULT_STREAMING_BUFFER_THRESHOLD: int = 24
DEFAULT_STREAMING_CURSOR: str = ""
@dataclass
class StreamingConfig:
"""Configuration for real-time token streaming to messaging platforms."""
@ -330,9 +340,9 @@ class StreamingConfig:
# "edit" — progressive editMessageText only (legacy behaviour).
# "off" — disable streaming entirely.
transport: str = "auto"
edit_interval: float = 1.0 # Seconds between message edits (Telegram rate-limits at ~1/s)
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
cursor: str = DEFAULT_STREAMING_CURSOR
# Ported from openclaw/openclaw#72038. When >0, the final edit for
# a long-running streamed response is delivered as a fresh message
# if the original preview has been visible for at least this many
@ -359,9 +369,13 @@ class StreamingConfig:
return cls(
enabled=_coerce_bool(data.get("enabled"), False),
transport=data.get("transport", "auto"),
edit_interval=_coerce_float(data.get("edit_interval"), 1.0),
buffer_threshold=_coerce_int(data.get("buffer_threshold"), 40),
cursor=data.get("cursor", ""),
edit_interval=_coerce_float(
data.get("edit_interval"), DEFAULT_STREAMING_EDIT_INTERVAL,
),
buffer_threshold=_coerce_int(
data.get("buffer_threshold"), DEFAULT_STREAMING_BUFFER_THRESHOLD,
),
cursor=data.get("cursor", DEFAULT_STREAMING_CURSOR),
fresh_final_after_seconds=_coerce_float(
data.get("fresh_final_after_seconds"), 60.0
),