Re-authored against current main from PR #10388 by @wilsen0. The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.
Tuning summary
==============
Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
total <= 320 cp -> min(cap, 0.18)
total <= 1024 cp -> min(cap, 0.24)
else -> cap
A single short reply now reaches the agent in ~180ms instead of
600ms. Tier constants compose with the configured cap via min()
so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()).
Rejects NaN / Inf, applies optional min/max bounds. Used for
text-batch + media-batch knobs. Prevents asyncio.sleep(NaN)
crashes when an operator typos an env var.
Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
a single source of truth. StreamConsumerConfig imports them
instead of duplicating the literals; the prior dual-source drift
is fixed.
Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside
Telegram's ~1 edit/s flood envelope the 'all' default would
accumulate edit pressure on busy chats; 'new' shows only the
leading bubble per tool batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.
Composition with native draft streaming (#23512)
================================================
The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based. The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport. No conflict.
Stale-base avoidance
====================
Re-authored from scratch rather than cherry-picked. Dropped from the
original branch:
- Unrelated d2f043f9c 'fix(anthropic): preserve third-party thinking
continuity' commit
- boot_md.py builtin gateway hook (unrelated)
- Reverted Slack tool_progress='off' (#14663) restoration
- Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO
members deletion
- 2300+ lines of run.py base-skew noise
Tests
=====
New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.
Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for
Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new
and Discord-stays-all assertions.
Closes#10388.
Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.
Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.
Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for off to new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
Three changes that address the poor WhatsApp experience reported by users:
1. Reclassify WhatsApp from TIER_LOW to TIER_MEDIUM in display_config.py
— enables streaming and tool progress via the existing Baileys /edit
bridge endpoint. Users now see progressive responses instead of
minutes of silence followed by a wall of text.
2. Lower MAX_MESSAGE_LENGTH from 65536 to 4096 and add proper chunking
— send() now calls format_message() and truncate_message() before
sending, then loops through chunks with a small delay between them.
The base class truncate_message() already handles code block boundary
detection (closes/reopens fences at chunk boundaries). reply_to is
only set on the first chunk.
3. Override format_message() with WhatsApp-specific markdown conversion
— converts **bold** to *bold*, ~~strike~~ to ~strike~, headers to
bold text, and [links](url) to text (url). Code blocks and inline
code are protected from conversion via placeholder substitution.
Together these fix the two user complaints:
- 'sends the whole code all the time' → now chunked at 4K with proper
formatting
- 'terminal gets interrupted and gets cooked' → streaming + tool progress
give visual feedback so users don't accidentally interrupt with
follow-up messages
Add display.platforms section to config.yaml for per-platform overrides of
display settings (tool_progress, show_reasoning, streaming, tool_preview_length).
Each platform gets sensible built-in defaults based on capability tier:
- High (telegram, discord): tool_progress=all, streaming follows global
- Medium (slack, mattermost, matrix, feishu): tool_progress=new
- Low (signal, whatsapp, bluebubbles, wecom, etc.): tool_progress=off, streaming=false
- Minimal (email, sms, webhook, homeassistant): tool_progress=off, streaming=false
Example config:
display:
platforms:
telegram:
tool_progress: all
show_reasoning: true
slack:
tool_progress: off
Resolution order: platform override > global setting > built-in platform default.
Changes:
- New gateway/display_config.py: resolver module with tier-based platform defaults
- gateway/run.py: tool_progress, tool_preview_length, streaming, show_reasoning
all resolve per-platform via the new resolver
- /verbose command: now cycles tool_progress per-platform (saves to
display.platforms.<platform>.tool_progress instead of global)
- /reasoning show|hide: now saves show_reasoning per-platform
- Config version 15 -> 16: migrates tool_progress_overrides into display.platforms
- Backward compat: legacy tool_progress_overrides still read as fallback
- 27 new tests for resolver, normalization, migration, backward compat
- Updated verbose command tests for per-platform behavior
Addresses community request for per-channel verbosity control (Guillaume Meyer,
Nathan Danielsen) — high verbosity on backchannel Telegram, low on customer-facing
Slack, none on email.