diff --git a/.env.example b/.env.example
index 589978e6b5..5c08a4acd6 100644
--- a/.env.example
+++ b/.env.example
@@ -244,6 +244,15 @@ BROWSERBASE_PROXIES=true
# Uses custom Chromium build to avoid bot detection altogether
BROWSERBASE_ADVANCED_STEALTH=false
+# Browser engine for local mode (default: auto = Chrome)
+# "auto" — use Chrome (don't pass --engine flag)
+# "lightpanda" — use Lightpanda (1.3-5.8x faster navigation, no screenshots)
+# "chrome" — explicitly request Chrome
+# Requires agent-browser v0.25.3+. Lightpanda commands that fail or return
+# empty results are automatically retried with Chrome.
+# Also configurable via browser.engine in config.yaml.
+# AGENT_BROWSER_ENGINE=auto
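+# Example: prefer Lightpanda for faster navigation; commands that fail or
+# return empty results fall back to Chrome automatically.
+# AGENT_BROWSER_ENGINE=lightpanda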
+
# Browser session timeout in seconds (default: 300)
# Sessions are cleaned up after this duration of inactivity
BROWSER_SESSION_TIMEOUT=300
@@ -414,3 +423,24 @@ IMAGE_TOOLS_DEBUG=false
# TEAMS_HOME_CHANNEL= # Default channel/chat ID for cron delivery
# TEAMS_HOME_CHANNEL_NAME= # Display name for the home channel
# TEAMS_PORT=3978 # Webhook listen port (Bot Framework default)
+
+# =============================================================================
+# GOOGLE CHAT INTEGRATION
+# =============================================================================
+# Connects via Cloud Pub/Sub pull subscription (no public URL required).
+# Setup walkthrough: website/docs/user-guide/messaging/google_chat.md.
+# 1. Create a GCP project, enable the Google Chat API and Cloud Pub/Sub.
+# 2. Create a Service Account with roles/pubsub.subscriber on the
+# subscription (NOT project-wide); download the JSON key.
+# 3. Configure your Chat app at console.cloud.google.com/apis/credentials
+# → Google Chat API → Configuration → Cloud Pub/Sub topic.
+# 4. (Optional, for native attachment delivery) Each user runs
+# `/setup-files` once in their own DM after Pub/Sub is wired up.
+#
+# GOOGLE_CHAT_PROJECT_ID= # GCP project hosting the topic (or set GOOGLE_CLOUD_PROJECT)
+# GOOGLE_CHAT_SUBSCRIPTION_NAME= # Full path: projects/
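+# Example (hypothetical values; full Pub/Sub subscription paths follow the
+# standard format projects/<project-id>/subscriptions/<subscription-name>):
+# GOOGLE_CHAT_PROJECT_ID=my-gcp-project
+# GOOGLE_CHAT_SUBSCRIPTION_NAME=projects/my-gcp-project/subscriptions/hermes-chat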
+
+
+
| A true terminal UI | Full TUI with multi-line editing, slash-command autocomplete, conversation history, interrupt redirection, and streaming tool output. |
| Wherever you are | Telegram, Discord, Slack, WhatsApp, Signal, and the CLI, all running from a single gateway process. Voice-memo transcription and cross-platform conversation continuity. |
| Closed-loop learning | The agent curates its own memory and periodically reminds itself. Skills are created automatically after complex tasks and improve themselves with use. FTS5 session search plus LLM summaries enable recall across sessions. Honcho dialectical user modeling. Compatible with the agentskills.io open standard. |
| Scheduled automation | Built-in cron scheduler that can deliver to any platform. Daily reports, nightly backups, weekly audits: describe them in natural language and they run unattended. |
| Delegation and parallelism | Spawn isolated sub-agents for parallel workflows. Write Python scripts that call tools over RPC, collapsing multi-step pipelines into turns with zero context overhead. |
| Run anywhere | Six terminal backends: local, Docker, SSH, Daytona, Singularity, and Modal. Daytona and Modal offer serverless persistence: the agent environment sleeps when idle and wakes on demand, at near-zero cost while idle. Runs on a $5 VPS or a GPU cluster. |
| Research ready | Batch trajectory generation, Atropos RL environments, and trajectory compression for training the next generation of tool-calling models. |
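
The scripted tool-calling pattern in the delegation row can be sketched as follows. This is a minimal illustration, not the actual `hermes_tools` API: `call_tool` is a hypothetical stand-in for the RPC bridge, dispatching to local stubs so the shape of a collapsed multi-step pipeline is visible.

```python
# Sketch of collapsing a multi-step pipeline into a single scripted turn.
# `call_tool` is a hypothetical stand-in for an RPC tool client; here it
# dispatches to local stub implementations.

def call_tool(name: str, **kwargs):
    tools = {
        "web_search": lambda query: [f"result for {query!r}"],
        "summarize": lambda texts: " | ".join(texts),
    }
    return tools[name](**kwargs)

def research_pipeline(topics):
    """Search every topic, then summarize all hits, in one scripted pass."""
    hits = []
    for topic in topics:
        hits.extend(call_tool("web_search", query=topic))
    return call_tool("summarize", texts=hits)

print(research_pipeline(["lightpanda", "atropos"]))
```

Instead of one agent turn per tool call, the whole loop runs as a single script, so no intermediate results land in the model's context.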
` background inside dashboard board ([#20687](https://github.com/NousResearch/hermes-agent/pull/20687))
+- Fix: preserve dashboard completion summaries + add kanban edit (salvages #20016) ([#20195](https://github.com/NousResearch/hermes-agent/pull/20195))
+- Fix: avoid fragile failure-column renames (salvage #20848) (@kshitijk4poor) ([#20855](https://github.com/NousResearch/hermes-agent/pull/20855))
+
+### Worker lifecycle + reliability
+- **Heartbeat + reclaim + zombie + retry-cap fixes** (#21147, #21141, #21169, #20881) ([#21183](https://github.com/NousResearch/hermes-agent/pull/21183))
+- **Auto-block workers that exit without completing + shutdown race** (#20894) ([#21214](https://github.com/NousResearch/hermes-agent/pull/21214))
+- **Detect darwin zombie workers** (salvages #20023) ([#20188](https://github.com/NousResearch/hermes-agent/pull/20188))
+- **Unify failure counter across spawn/timeout/crash outcomes** ([#20410](https://github.com/NousResearch/hermes-agent/pull/20410))
+- **Enforce worker task-ownership on destructive tool calls** ([#19713](https://github.com/NousResearch/hermes-agent/pull/19713))
+- **Drop worker identity claim from KANBAN_GUIDANCE** ([#19427](https://github.com/NousResearch/hermes-agent/pull/19427))
+- Fix: skip dispatch for tasks assigned to non-profile lanes (salvages #20105, #20134) ([#20165](https://github.com/NousResearch/hermes-agent/pull/20165))
+- Fix: include default profile in on-disk assignee enumeration (salvages #20123) ([#20170](https://github.com/NousResearch/hermes-agent/pull/20170))
+- Fix: ignore stale current board pointers (salvages #20063) ([#20183](https://github.com/NousResearch/hermes-agent/pull/20183))
+- Fix: profile discovery ignores HERMES_HOME in custom-root deployments (@jackey8616) ([#19020](https://github.com/NousResearch/hermes-agent/pull/19020))
+- Fix: allow orchestrator profiles to see kanban tools via toolsets config ([#19606](https://github.com/NousResearch/hermes-agent/pull/19606))
+
+### Batch salvages
+- Tier-1 batch — metadata test, max_spawn config, run-id lifecycle guard (salvages #19522 #19556 #19829) ([#20440](https://github.com/NousResearch/hermes-agent/pull/20440))
+- Tier-2 batch — doctor, started_at, parent-guard, latest_summary, selects, linked-children ([#20448](https://github.com/NousResearch/hermes-agent/pull/20448))
+
+### Documentation
+- Backfill multi-board refs in reference docs ([#19704](https://github.com/NousResearch/hermes-agent/pull/19704))
+- Document `/kanban` slash command ([#19584](https://github.com/NousResearch/hermes-agent/pull/19584))
+- Document recommended handoff evidence metadata (salvage #19512) ([#20415](https://github.com/NousResearch/hermes-agent/pull/20415))
+- Fix orchestrator + worker skill setup instructions (@helix4u) ([#20958](https://github.com/NousResearch/hermes-agent/pull/20958), [#20960](https://github.com/NousResearch/hermes-agent/pull/20960))
+
+---
+
+## 🎯 Persistent Goals, Checkpoints & Session Durability
+
+### `/goal` — persistent cross-turn goals (Ralph loop)
+- **`feat: /goal — persistent cross-turn goals`** ([#18262](https://github.com/NousResearch/hermes-agent/pull/18262))
+- **Docs page — Persistent Goals (/goal)** ([#18275](https://github.com/NousResearch/hermes-agent/pull/18275))
+- Fix: honor configured goal turn budget (salvage #19423) ([#21287](https://github.com/NousResearch/hermes-agent/pull/21287))
+
+### Checkpoints v2
+- **Single-store rewrite with real pruning + disk guardrails** ([#20709](https://github.com/NousResearch/hermes-agent/pull/20709))
+
+### Session durability
+- **Auto-resume interrupted sessions after gateway restart** (salvage #20888) ([#21192](https://github.com/NousResearch/hermes-agent/pull/21192))
+- **Preserve pending update prompts across restarts** ([#20160](https://github.com/NousResearch/hermes-agent/pull/20160))
+- **Preserve home-channel thread targets across restart notifications** (salvage #18440) ([#19271](https://github.com/NousResearch/hermes-agent/pull/19271))
+- **Preserve thread routing from cached live session sources** ([#21206](https://github.com/NousResearch/hermes-agent/pull/21206))
+- **Preserve assistant metadata when branching sessions** ([#18222](https://github.com/NousResearch/hermes-agent/pull/18222))
+- **Preserve thread routing for /update progress and prompts** ([#18193](https://github.com/NousResearch/hermes-agent/pull/18193))
+- **Preserve document type when merging queued events** ([#18215](https://github.com/NousResearch/hermes-agent/pull/18215))
+
+---
+
+## 🛡️ Security & Reliability
+
+### Security hardening (8 P0 closures)
+- **Enable secret redaction by default** (#17691, #20785) ([#21193](https://github.com/NousResearch/hermes-agent/pull/21193))
+- **Discord — scope `DISCORD_ALLOWED_ROLES` to originating guild** (#12136, CVSS 8.1) ([#21241](https://github.com/NousResearch/hermes-agent/pull/21241))
+- **WhatsApp — reject strangers by default, never respond in self-chat** (#8389) ([#21291](https://github.com/NousResearch/hermes-agent/pull/21291))
+- **MCP OAuth — close TOCTOU window when saving credentials** ([#21176](https://github.com/NousResearch/hermes-agent/pull/21176))
+- **`hermes_cli/auth.py` — close TOCTOU window in credential writers** ([#21194](https://github.com/NousResearch/hermes-agent/pull/21194))
+- **Browser — enforce cloud-metadata SSRF floor in hybrid routing** (#16234) ([#21228](https://github.com/NousResearch/hermes-agent/pull/21228))
+- **`hermes debug share` — redact log content at upload time** (@GodsBoy) ([#19318](https://github.com/NousResearch/hermes-agent/pull/19318))
+- **Cron — scan assembled prompt including skill content for prompt injection** (#3968) ([#21350](https://github.com/NousResearch/hermes-agent/pull/21350))
+- **Restore .env/auth.json/state.db with 0600 perms** ([#19699](https://github.com/NousResearch/hermes-agent/pull/19699))
+- **SRI integrity for dashboard plugin scripts** (salvage #19389) ([#21277](https://github.com/NousResearch/hermes-agent/pull/21277))
+- **Bind Meet node server to localhost, restrict token file to owner read** ([#19597](https://github.com/NousResearch/hermes-agent/pull/19597))
+- **Extend sensitive-write target to cover shell RC and credential files** ([#19282](https://github.com/NousResearch/hermes-agent/pull/19282))
+- **Harden YOLO mode env parsing against quoted-bool strings** ([#18214](https://github.com/NousResearch/hermes-agent/pull/18214))
+- **OSV-Scanner CI + Dependabot for github-actions only** ([#20037](https://github.com/NousResearch/hermes-agent/pull/20037))
+
+### Reliability — critical bug closures
+- **CLI crash on startup — `Invalid key 'c-S-c'`** (P0, prompt_toolkit doesn't support Shift modifier) ([#19895](https://github.com/NousResearch/hermes-agent/pull/19895), [#19919](https://github.com/NousResearch/hermes-agent/pull/19919))
+- **CLOSE_WAIT fd leak audit** — httpx keepalive + WhatsApp aiohttp leak + Feishu hygiene (#18451) ([#18766](https://github.com/NousResearch/hermes-agent/pull/18766))
+- **Gateway creates AIAgent with empty OpenRouter API key when OPENROUTER_API_KEY is missing** (#20982) — fallback providers correctly honored
+- **Background review + curator protected from overwriting bundled/hub skills** (#20273) ([#20194](https://github.com/NousResearch/hermes-agent/pull/20194))
+- **TUI compression continuation — ghost sessions with incomplete metadata** (#20001)
+- **`hermes mcp add` silently launches chat instead of registering MCP server** (#19785) ([#21204](https://github.com/NousResearch/hermes-agent/pull/21204))
+- **Background review agent runtime propagation** — provider/model/credentials now actually inherit from parent
+- **Inbound document host paths translated to container paths for Docker backend** (salvage #19048) ([#21184](https://github.com/NousResearch/hermes-agent/pull/21184))
+- **Matrix gateway race between auto-redaction and message delivery with high-speed models** (#19075)
+- **`/new` during active agent session never sends response on Telegram** (#18912)
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### New platform
+- **Google Chat — 20th platform** + generic `env_enablement_fn` / `cron_deliver_env_var` platform-plugin hooks (IRC + Teams migrated) ([#21306](https://github.com/NousResearch/hermes-agent/pull/21306), [#21331](https://github.com/NousResearch/hermes-agent/pull/21331))
+
+### Cross-platform
+- **`allowed_{channels,chats,rooms}` whitelist** — Slack (salvage #7401), Telegram, Mattermost, Matrix, DingTalk ([#21251](https://github.com/NousResearch/hermes-agent/pull/21251))
+- **Per-platform `gateway_restart_notification` flag** ([#20892](https://github.com/NousResearch/hermes-agent/pull/20892))
+- **`busy_ack_enabled` config — suppress ack messages** ([#18194](https://github.com/NousResearch/hermes-agent/pull/18194))
+- **Auto-delete slash-command system notices after TTL** ([#18266](https://github.com/NousResearch/hermes-agent/pull/18266))
+- **Opt-in cleanup of temporary progress bubbles** ([#21186](https://github.com/NousResearch/hermes-agent/pull/21186))
+- **`[[as_document]]` directive — skill media routing** (salvage #19069) ([#21210](https://github.com/NousResearch/hermes-agent/pull/21210))
+- **`hermes gateway list` — cross-profile status** (salvage #19129) ([#21225](https://github.com/NousResearch/hermes-agent/pull/21225))
+- **Auto-resume interrupted sessions after restart** (salvage #20888) ([#21192](https://github.com/NousResearch/hermes-agent/pull/21192))
+- **Atomic restart markers + Windows runtime-lock offset** (#17842) ([#18179](https://github.com/NousResearch/hermes-agent/pull/18179))
+- Fix: `config.yaml` wins over `.env` for agent/display/timezone settings ([#18764](https://github.com/NousResearch/hermes-agent/pull/18764))
+- Fix: auto-restart when source files change out from under us (#17648) ([#18409](https://github.com/NousResearch/hermes-agent/pull/18409))
+- Fix: use git HEAD SHA for stale-code check, not file mtimes ([#19740](https://github.com/NousResearch/hermes-agent/pull/19740))
+- Fix: shutdown + restart hygiene — drain timeout, false-fatal, success log ([#18761](https://github.com/NousResearch/hermes-agent/pull/18761))
+- Fix: preserve max_turns after env reload (salvage #19183) ([#21240](https://github.com/NousResearch/hermes-agent/pull/21240))
+- Fix: exclude ancestor PIDs from gateway process scan ([#19586](https://github.com/NousResearch/hermes-agent/pull/19586))
+- Fix: move quick-command alias dispatch before built-ins ([#19588](https://github.com/NousResearch/hermes-agent/pull/19588))
+- Fix: show other profiles in 'gateway status' to prevent confusion ([#19582](https://github.com/NousResearch/hermes-agent/pull/19582))
+- Fix: include external_dirs skills in Telegram/Discord slash commands (salvage #8790) ([#18741](https://github.com/NousResearch/hermes-agent/pull/18741))
+- Fix: match disabled/optional skills by frontmatter slug, not dir name ([#18753](https://github.com/NousResearch/hermes-agent/pull/18753))
+- Fix: read /status token totals from SessionDB (#17158) ([#18206](https://github.com/NousResearch/hermes-agent/pull/18206))
+- Fix: snapshot callback generation after agent binds it, not before ([#18219](https://github.com/NousResearch/hermes-agent/pull/18219))
+- Fix: re-inject topic-bound skill after /new or /reset ([#18205](https://github.com/NousResearch/hermes-agent/pull/18205))
+- Fix: isolate pending native image paths by session ([#18202](https://github.com/NousResearch/hermes-agent/pull/18202))
+- Fix: clear queued reload skills notes on new/resume/branch ([#19431](https://github.com/NousResearch/hermes-agent/pull/19431))
+- Fix: hide required-arg commands from Telegram menu ([#19400](https://github.com/NousResearch/hermes-agent/pull/19400))
+- Fix: bridge top-level `require_mention` to Telegram config ([#19429](https://github.com/NousResearch/hermes-agent/pull/19429))
+- Fix: suppress duplicate voice transcripts ([#19428](https://github.com/NousResearch/hermes-agent/pull/19428))
+- Fix: show friendly error when service is not installed ([#19707](https://github.com/NousResearch/hermes-agent/pull/19707))
+- Fix: read context_length from custom_providers in session info header ([#19708](https://github.com/NousResearch/hermes-agent/pull/19708))
+- Fix: preserve WSL interop PATH in systemd units ([#19867](https://github.com/NousResearch/hermes-agent/pull/19867))
+- Fix: handle planned service stops (salvage #19876) ([#19936](https://github.com/NousResearch/hermes-agent/pull/19936))
+- Fix: keep DoH-confirmed Telegram IPs that match system DNS (salvage #17043) ([#20175](https://github.com/NousResearch/hermes-agent/pull/20175))
+- Fix: load `reply_to_mode` from config.yaml for Discord + Telegram (salvage #17117) ([#20171](https://github.com/NousResearch/hermes-agent/pull/20171))
+- Fix: tolerate malformed HERMES_HUMAN_DELAY_* env vars (salvage #16933) ([#20217](https://github.com/NousResearch/hermes-agent/pull/20217))
+- Fix: deterministic thread eviction preserves newest entries (salvage #13639) ([#20285](https://github.com/NousResearch/hermes-agent/pull/20285))
+- Fix: don't dead-end setup wizard when only system-scope unit is installed ([#20905](https://github.com/NousResearch/hermes-agent/pull/20905))
+- Fix: wait for systemd restart readiness + harden Discord slash-command sync ([#20949](https://github.com/NousResearch/hermes-agent/pull/20949))
+- Fix: avoid duplicated Responses history (salvage #18995) ([#21185](https://github.com/NousResearch/hermes-agent/pull/21185))
+- Fix: surface bootstrap failures to stderr (salvage #21157) ([#21278](https://github.com/NousResearch/hermes-agent/pull/21278))
+- Fix: log agent task failures instead of silently losing usage data (salvage #21159) ([#21274](https://github.com/NousResearch/hermes-agent/pull/21274))
+- Fix: log runtime-status write failures with rate-limiting (salvage #21158) ([#21285](https://github.com/NousResearch/hermes-agent/pull/21285))
+- Fix: reset-failed before every fallback restart so the gateway can't get stranded ([#21371](https://github.com/NousResearch/hermes-agent/pull/21371))
+- Fix: Telegram — preserve `thread_id=1` for forum General typing indicator ([#21390](https://github.com/NousResearch/hermes-agent/pull/21390))
+- Fix: batch critical fixes — session resume, /new race, HA WebSocket scheme (@kshitijk4poor) ([#19182](https://github.com/NousResearch/hermes-agent/pull/19182))
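+
+Several of the cross-platform toggles above are plain config flags. A hypothetical sketch of how they might sit together in `config.yaml` (key placement and names per platform may differ; `telegram` and the chat ID are illustrative):
+
+```yaml
+telegram:
+  busy_ack_enabled: false            # suppress "busy" ack messages
+  gateway_restart_notification: false
+  allowed_chats:                     # whitelist; other chats are ignored
+    - "12345678"
+```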
+
+### Telegram
+- **DM user-managed multi-session topics** (salvage of #19185) ([#19206](https://github.com/NousResearch/hermes-agent/pull/19206))
+
+### Discord
+- **Message deletion action** (salvage #19052) ([#21197](https://github.com/NousResearch/hermes-agent/pull/21197))
+- Fix: allow `free_response_channels` to override `DISCORD_IGNORE_NO_MENTION` ([#19629](https://github.com/NousResearch/hermes-agent/pull/19629))
+
+### Slack
+- Fix: ephemeral slash-command ack, private notice delivery, format_message fixes (@kshitijk4poor) ([#18198](https://github.com/NousResearch/hermes-agent/pull/18198))
+
+### WhatsApp
+- Fix: load WhatsApp home channel from env overrides ([#18190](https://github.com/NousResearch/hermes-agent/pull/18190))
+
+### Feishu
+- **Operator-configurable bot admission and mention policy** ([#18208](https://github.com/NousResearch/hermes-agent/pull/18208))
+- Fix: force text mode for markdown tables (salvage of #13723 by @WuTianyi123) ([#20275](https://github.com/NousResearch/hermes-agent/pull/20275))
+
+### Matrix + Email
+- Fix: `/sethome` on Matrix and Email now persists across restarts ([#18272](https://github.com/NousResearch/hermes-agent/pull/18272))
+
+### Teams
+- **Docs + feat: sidebar + threading with group-chat fallback** ([#20042](https://github.com/NousResearch/hermes-agent/pull/20042))
+
+### Weixin
+- Fix: deduplicate Weixin messages by content fingerprint ([#19742](https://github.com/NousResearch/hermes-agent/pull/19742))
+
+### QQBot
+- **Port SDK improvements in-tree — chunked upload, approval keyboards, quoted attachments** ([#21342](https://github.com/NousResearch/hermes-agent/pull/21342))
+- **Wire native tool-approval UX via inline keyboards** ([#21353](https://github.com/NousResearch/hermes-agent/pull/21353))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Provider & Model Support
+
+#### Pluggable providers
+- **ProviderProfile ABC + `plugins/model-providers/`** — inference providers are now a pluggable surface (salvage of #14424) ([#20324](https://github.com/NousResearch/hermes-agent/pull/20324))
+- **`list_picker_providers`** — credential-filtered picker (salvage #13561) ([#20298](https://github.com/NousResearch/hermes-agent/pull/20298))
+- **Remove `/provider` alias for `/model`** ([#20358](https://github.com/NousResearch/hermes-agent/pull/20358))
+- **Shared Hermes dotenv loader across CLI + plugins** (salvage #13660) ([#20281](https://github.com/NousResearch/hermes-agent/pull/20281))
+- **Nous OAuth persisted across profiles via shared token store** ([#19712](https://github.com/NousResearch/hermes-agent/pull/19712))
+
+#### New models
+- `deepseek/deepseek-v4-pro` added to OpenRouter + Nous Portal ([#20495](https://github.com/NousResearch/hermes-agent/pull/20495))
+- `x-ai/grok-4.3` added to OpenRouter + Nous Portal ([#20497](https://github.com/NousResearch/hermes-agent/pull/20497))
+- `openrouter/owl-alpha` (free tier) added to curated OpenRouter list ([#18071](https://github.com/NousResearch/hermes-agent/pull/18071))
+- `tencent/hy3-preview` paid route on OpenRouter (@Contentment003111) ([#21077](https://github.com/NousResearch/hermes-agent/pull/21077))
+- Arcee Trinity Large Thinking — temperature + compression overrides ([#20473](https://github.com/NousResearch/hermes-agent/pull/20473))
+- Rename `x-ai/grok-4.20-beta` to `x-ai/grok-4.20` ([#19640](https://github.com/NousResearch/hermes-agent/pull/19640))
+- Demote Vercel AI Gateway to bottom of provider picker ([#18112](https://github.com/NousResearch/hermes-agent/pull/18112))
+
+#### Provider configuration
+- **OpenRouter — response caching support** (@kshitijk4poor) ([#19132](https://github.com/NousResearch/hermes-agent/pull/19132))
+- **`image_gen.model` from config.yaml honored** (salvage #19376) ([#21273](https://github.com/NousResearch/hermes-agent/pull/21273))
+- Fix: honor runtime default model during delegate provider resolution (@johnncenae) ([#17587](https://github.com/NousResearch/hermes-agent/pull/17587))
+- Fix: avoid Bedrock credential probe in provider picker (@helix4u) ([#18998](https://github.com/NousResearch/hermes-agent/pull/18998))
+- Fix: drop stale env-var override of persisted provider for cron ([#19627](https://github.com/NousResearch/hermes-agent/pull/19627))
+- Fix: auxiliary curator api_key/base_url into runtime resolution ([#19421](https://github.com/NousResearch/hermes-agent/pull/19421))
+
+### Agent Loop & Conversation
+- **`video_analyze` — native video understanding tool** (@alt-glitch) ([#19301](https://github.com/NousResearch/hermes-agent/pull/19301))
+- **Show context compression count in status bar** (CLI + TUI) ([#21218](https://github.com/NousResearch/hermes-agent/pull/21218))
+- **Isolate `get_tool_definitions` quiet_mode cache + dedup LCM injection** (#17335) ([#17889](https://github.com/NousResearch/hermes-agent/pull/17889))
+- Fix: warning-first tool-call loop guardrails ([#18227](https://github.com/NousResearch/hermes-agent/pull/18227))
+- Fix: break permanent empty-response loop from orphan tool-tail ([#21385](https://github.com/NousResearch/hermes-agent/pull/21385))
+- Fix: propagate ContextVars to concurrent tool worker threads (salvage #16660) ([#18123](https://github.com/NousResearch/hermes-agent/pull/18123))
+- Fix: surface self-improvement review summaries across CLI, TUI, and gateway ([#18073](https://github.com/NousResearch/hermes-agent/pull/18073))
+- Fix: serialize concurrent `hermes_tools` RPC calls from `execute_code` ([#17894](https://github.com/NousResearch/hermes-agent/pull/17894), [#17902](https://github.com/NousResearch/hermes-agent/pull/17902))
+- Fix: include system prompt + tool schemas in token estimates for compression ([#18265](https://github.com/NousResearch/hermes-agent/pull/18265))
+
+### Compression
+- Fix: skip non-string tool content in dedup pass to prevent AttributeError ([#19398](https://github.com/NousResearch/hermes-agent/pull/19398))
+- Fix: reset `_summary_failure_cooldown_until` on session reset ([#19622](https://github.com/NousResearch/hermes-agent/pull/19622))
+- Fix: trigger fallback on timeout errors alongside model-unavailable errors ([#19665](https://github.com/NousResearch/hermes-agent/pull/19665))
+- Fix: `_prune_old_tool_results` boundary direction ([#19725](https://github.com/NousResearch/hermes-agent/pull/19725))
+- Fix: soften summary prompt for content filters (salvage #19456) ([#21302](https://github.com/NousResearch/hermes-agent/pull/21302))
+
+### Delegate
+- Fix: inherit parent fallback_chain in `_build_child_agent` ([#19601](https://github.com/NousResearch/hermes-agent/pull/19601))
+- Fix: guard `_load_config()` against `delegation: null` in config.yaml ([#19662](https://github.com/NousResearch/hermes-agent/pull/19662))
+- Fix: inherit parent api_key when `delegation.base_url` set without `delegation.api_key` ([#19741](https://github.com/NousResearch/hermes-agent/pull/19741))
+- Fix: expand composite toolsets before intersection (salvage #19455) ([#21300](https://github.com/NousResearch/hermes-agent/pull/21300))
+- Fix: correct ACP docs — Claude Code CLI has no --acp flag (salvage #19058) ([#21201](https://github.com/NousResearch/hermes-agent/pull/21201))
+
+### Session & Memory
+- **Hindsight — probe API for `update_mode='append'` to dedupe across processes** (@nicoloboschi) ([#20222](https://github.com/NousResearch/hermes-agent/pull/20222))
+
+### Curator
+- **`hermes curator archive` and `prune` subcommands** ([#20200](https://github.com/NousResearch/hermes-agent/pull/20200))
+- **`hermes curator list-archived`** (#20651) ([#21236](https://github.com/NousResearch/hermes-agent/pull/21236))
+- **Synchronous manual `hermes curator run`** (#20555) ([#21216](https://github.com/NousResearch/hermes-agent/pull/21216))
+- Fix: preserve `last_report_path` in state ([#18169](https://github.com/NousResearch/hermes-agent/pull/18169))
+- Fix: rewrite cron job skill refs after consolidation ([#18253](https://github.com/NousResearch/hermes-agent/pull/18253))
+- Fix: defer first run + `--dry-run` preview (#18373) ([#18389](https://github.com/NousResearch/hermes-agent/pull/18389))
+- Fix: authoritative `absorbed_into` on delete + restore cron skill links on rollback (#18671) ([#18731](https://github.com/NousResearch/hermes-agent/pull/18731))
+- Fix: prevent false-positive consolidation from substring matching ([#19573](https://github.com/NousResearch/hermes-agent/pull/19573))
+- Fix: only mark agent-created for background-review sediment ([#19621](https://github.com/NousResearch/hermes-agent/pull/19621))
+- Fix: protect hub skills by frontmatter name ([#20194](https://github.com/NousResearch/hermes-agent/pull/20194))
+
+---
+
+## 🔧 Tool System
+
+### File tools
+- **Post-write delta lint on `write_file` + `patch`** — in-proc linters for Python, JSON, YAML, TOML ([#20191](https://github.com/NousResearch/hermes-agent/pull/20191))
+
+### Cron
+- **`no_agent` mode — script-only cron jobs (watchdog pattern)** ([#19709](https://github.com/NousResearch/hermes-agent/pull/19709))
+- **`context_from` chaining docs** (salvage #15724) ([#20394](https://github.com/NousResearch/hermes-agent/pull/20394))
+- Fix: treat non-dict origin as missing instead of crashing tick ([#19283](https://github.com/NousResearch/hermes-agent/pull/19283))
+- Fix: bump skill usage when cron jobs load skills ([#19433](https://github.com/NousResearch/hermes-agent/pull/19433))
+- Fix: recover null `next_run_at` jobs ([#19576](https://github.com/NousResearch/hermes-agent/pull/19576))
+- Fix: skip AI call when prerun script produces no output ([#19628](https://github.com/NousResearch/hermes-agent/pull/19628))
+- Fix: expand config.yaml refs during job execution ([#19872](https://github.com/NousResearch/hermes-agent/pull/19872))
+- Fix: serialize `get_due_jobs` writes to prevent parallel state corruption ([#19874](https://github.com/NousResearch/hermes-agent/pull/19874))
+- Fix: initialize MCP servers before constructing the cron AIAgent ([#21354](https://github.com/NousResearch/hermes-agent/pull/21354))
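+
+A hypothetical sketch of the `no_agent` watchdog pattern introduced above (the surrounding job schema and field names are illustrative, not the exact format): the scheduler runs the script on its cron cadence and skips the AI call entirely, which also plays well with the "skip AI call when prerun script produces no output" fix.
+
+```yaml
+jobs:
+  - name: disk-watchdog
+    schedule: "*/15 * * * *"   # every 15 minutes
+    no_agent: true             # script-only; no model invocation
+    script: scripts/check_disk.sh
+```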
+
+### MCP
+- **SSE transport support** (salvage #19135) ([#21227](https://github.com/NousResearch/hermes-agent/pull/21227))
+- **Forward OAuth auth + bump `sse_read_timeout` on SSE transport** ([#21323](https://github.com/NousResearch/hermes-agent/pull/21323))
+- **Retry stale pipe transport failures as session-expired** ([#21289](https://github.com/NousResearch/hermes-agent/pull/21289))
+- **Surface image tool results as MEDIA tags instead of dropping them** ([#21328](https://github.com/NousResearch/hermes-agent/pull/21328))
+- **Periodic keepalive to `_wait_for_lifecycle_event`** (salvage #17016) ([#20209](https://github.com/NousResearch/hermes-agent/pull/20209))
+- Fix: reconnect on terminated sessions ([#19380](https://github.com/NousResearch/hermes-agent/pull/19380))
+- Fix: decouple AnyUrl import from mcp dependency ([#19695](https://github.com/NousResearch/hermes-agent/pull/19695))
+- Fix: `mcp add --command` gets distinct argparse dest ([#21204](https://github.com/NousResearch/hermes-agent/pull/21204))
+- Fix: clear stale thread interrupt before MCP discovery ([#21276](https://github.com/NousResearch/hermes-agent/pull/21276))
+- Fix: report configured timeout in MCP call errors ([#21281](https://github.com/NousResearch/hermes-agent/pull/21281))
+- Fix: include exception type in error messages when str(exc) is empty (salvage #19425) ([#21292](https://github.com/NousResearch/hermes-agent/pull/21292))
+- Fix: re-raise CancelledError explicitly in `MCPServerTask.run` ([#21318](https://github.com/NousResearch/hermes-agent/pull/21318))
+- Fix: coerce numeric tool args defensively in `mcp_serve` ([#21329](https://github.com/NousResearch/hermes-agent/pull/21329))
+- Fix: gate utility stubs on server-advertised capabilities ([#21347](https://github.com/NousResearch/hermes-agent/pull/21347))
+
+### Browser
+- Fix: allow explicit CDP override without local agent-browser ([#19670](https://github.com/NousResearch/hermes-agent/pull/19670))
+- Fix: inject `--no-sandbox` for root + AppArmor userns restrictions ([#19747](https://github.com/NousResearch/hermes-agent/pull/19747))
+- Fix: tighten Lightpanda fallback edge cases (@kshitijk4poor) ([#20672](https://github.com/NousResearch/hermes-agent/pull/20672))
+
+### Web tools
+- **Per-capability backend selection — search/extract split** (@kshitijk4poor) ([#20061](https://github.com/NousResearch/hermes-agent/pull/20061))
+- **SearXNG native search-only backend** (@kshitijk4poor) ([#20823](https://github.com/NousResearch/hermes-agent/pull/20823))
+
+### Approval / Tool gating
+- Fix: wake blocked gateway approvals on session cleanup ([#18171](https://github.com/NousResearch/hermes-agent/pull/18171))
+- Fix: harden YOLO mode env parsing against quoted-bool strings ([#18214](https://github.com/NousResearch/hermes-agent/pull/18214))
+- Fix: extend sensitive write target to cover shell RC and credential files ([#19282](https://github.com/NousResearch/hermes-agent/pull/19282))
+
+---
+
+## 🔌 Plugin System
+
+- **`transform_llm_output` plugin hook** (salvage of #20813) ([#21235](https://github.com/NousResearch/hermes-agent/pull/21235))
+- **Document `env_enablement_fn` + `cron_deliver_env_var` platform-plugin hooks** ([#21331](https://github.com/NousResearch/hermes-agent/pull/21331))
+- **Pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix** ([#20749](https://github.com/NousResearch/hermes-agent/pull/20749))
+- **Plugin-authoring gaps — image-gen provider guide + publishing a skill tap** ([#20800](https://github.com/NousResearch/hermes-agent/pull/20800))
+
+---
+
+## 🧩 Skills Ecosystem
+
+### New optional skills
+- **Shopify** — Admin + Storefront GraphQL optional skill ([#18116](https://github.com/NousResearch/hermes-agent/pull/18116))
+- **here.now** — optional skill ([#18170](https://github.com/NousResearch/hermes-agent/pull/18170))
+- **shop-app** — personal shopping assistant (optional) ([#20702](https://github.com/NousResearch/hermes-agent/pull/20702))
+- **Anthropic financial-services bundle** — ported as optional finance skills ([#21180](https://github.com/NousResearch/hermes-agent/pull/21180))
+- **kanban-video-orchestrator** — creative optional skill (@SHL0MS) ([#19281](https://github.com/NousResearch/hermes-agent/pull/19281))
+- **searxng-search** — optional skill + Web Search + Extract docs page (@kshitijk4poor) ([#20841](https://github.com/NousResearch/hermes-agent/pull/20841), [#20844](https://github.com/NousResearch/hermes-agent/pull/20844))
+
+### Skill UX
+- **Linear skill — add Documents support + Python helper script** ([#20752](https://github.com/NousResearch/hermes-agent/pull/20752))
+- **Modernize Obsidian skill to use file tools** (salvage #19332) ([#20413](https://github.com/NousResearch/hermes-agent/pull/20413))
+- **Default custom tool creation to plugins** (@kshitijk4poor) ([#19755](https://github.com/NousResearch/hermes-agent/pull/19755))
+- **skill_commands cache — rescan on platform scope changes** (salvage #14570 by @LeonSGP43) ([#18739](https://github.com/NousResearch/hermes-agent/pull/18739))
+- **Skills — additional rescan paths in skill_commands cache** (salvage #19042) ([#21181](https://github.com/NousResearch/hermes-agent/pull/21181))
+- Fix: regression tests for non-dict metadata in `extract_skill_conditions` ([#18213](https://github.com/NousResearch/hermes-agent/pull/18213))
+- Docs: explain restoring bundled skills (salvage #19254) ([#20404](https://github.com/NousResearch/hermes-agent/pull/20404))
+- Docs: document `hermes skills reset` subcommand (salvage #11544) ([#20395](https://github.com/NousResearch/hermes-agent/pull/20395))
+- Docs: himalaya v1.2.0 `folder.aliases` syntax ([#19882](https://github.com/NousResearch/hermes-agent/pull/19882))
+- Point agent at `hermes-agent` skill + docs site sync ([#20390](https://github.com/NousResearch/hermes-agent/pull/20390))
+
+---
+
+## 🖥️ CLI & User Experience
+
+### CLI
+- **`/new` accepts optional session name argument** (salvage of #19555) ([#19637](https://github.com/NousResearch/hermes-agent/pull/19637))
+- **100 new CLI startup tips** ([#20168](https://github.com/NousResearch/hermes-agent/pull/20168))
+- **`display.language` — static message translation** (zh/ja/de/es) ([#20231](https://github.com/NousResearch/hermes-agent/pull/20231))
+- **French (fr) locale** (@Foolafroos) ([#20329](https://github.com/NousResearch/hermes-agent/pull/20329))
+- **Ukrainian (uk) locale** ([#20467](https://github.com/NousResearch/hermes-agent/pull/20467))
+- **Turkish (tr) locale** ([#20474](https://github.com/NousResearch/hermes-agent/pull/20474))
+- Fix: recover classic CLI output after resize (@helix4u) ([#20444](https://github.com/NousResearch/hermes-agent/pull/20444))
+- Fix: complete absolute paths as paths (@helix4u) ([#19930](https://github.com/NousResearch/hermes-agent/pull/19930))
+- Fix: resolve lazy session creation regressions (#18370 fallout) (@alt-glitch) ([#20363](https://github.com/NousResearch/hermes-agent/pull/20363))
+- Fix: local backend CLI always uses launch directory (@alt-glitch) ([#19334](https://github.com/NousResearch/hermes-agent/pull/19334))
+- Refactor: drop dead c-S-c key binding (follow-up to #19895) ([#19919](https://github.com/NousResearch/hermes-agent/pull/19919))
+
+### TUI (Ink)
+- **`/model` picker overhaul to match `hermes model` with inline auth** (@austinpickett) ([#18117](https://github.com/NousResearch/hermes-agent/pull/18117))
+- **Collapsible sections in startup banner** — skills, system prompt, MCP (@kshitijk4poor) ([#20625](https://github.com/NousResearch/hermes-agent/pull/20625))
+- **Show context compression count in status bar** ([#21218](https://github.com/NousResearch/hermes-agent/pull/21218))
+- Perf: reduce overlay render churn with focused selectors (@OutThisLife) ([#20393](https://github.com/NousResearch/hermes-agent/pull/20393))
+- Fix: restore voice push-to-talk parity (salvage of #16189 by @Montbra) (@OutThisLife) ([#20897](https://github.com/NousResearch/hermes-agent/pull/20897))
+- Fix: kanban button (@austinpickett) ([#18358](https://github.com/NousResearch/hermes-agent/pull/18358))
+
+### Dashboard
+- **Plugins page — manage, enable/disable, auth status** (@austinpickett) ([#18095](https://github.com/NousResearch/hermes-agent/pull/18095))
+- **Profiles management page** (@vincez-hms-coder) ([#16419](https://github.com/NousResearch/hermes-agent/pull/16419))
+- **Interactive column sorting in analytics tables** ([#18192](https://github.com/NousResearch/hermes-agent/pull/18192))
+- **`default-large` built-in theme with 18px base size** ([#20820](https://github.com/NousResearch/hermes-agent/pull/20820))
+- **Support serving under URL prefix via `X-Forwarded-Prefix`** (salvage #19450) ([#21296](https://github.com/NousResearch/hermes-agent/pull/21296))
+- **Launch dashboard as side-process via `HERMES_DASHBOARD=1` in Docker** (@benbarclay) ([#19540](https://github.com/NousResearch/hermes-agent/pull/19540))
+- Fix: dashboard theme layout shift (@AllardQuek) ([#17232](https://github.com/NousResearch/hermes-agent/pull/17232))
+- Fix: gateway model picker current context (@helix4u) ([#20513](https://github.com/NousResearch/hermes-agent/pull/20513))
+
+### Update + setup
+- **`hermes update --yes/-y` to skip interactive prompts** ([#18261](https://github.com/NousResearch/hermes-agent/pull/18261))
+- **Restart manual profile gateways after update** ([#18178](https://github.com/NousResearch/hermes-agent/pull/18178))
+
+### Profiles
+- **`--no-skills` flag for empty profile creation** ([#20986](https://github.com/NousResearch/hermes-agent/pull/20986))
+
+---
+
+## 🎵 Voice, Image & Media
+
+- **xAI Custom Voices — voice cloning** (@alt-glitch) ([#18776](https://github.com/NousResearch/hermes-agent/pull/18776))
+- **Achievements — share card render on unlocked badges** ([#19657](https://github.com/NousResearch/hermes-agent/pull/19657))
+- **Refresh systemd unit on gateway boot (not just start/restart)** (@alt-glitch) ([#19684](https://github.com/NousResearch/hermes-agent/pull/19684))
+
+---
+
+## 🔗 API Server & Remote Access
+
+- **`X-Hermes-Session-Key` header for long-term memory scoping** (closes #20060) ([#20199](https://github.com/NousResearch/hermes-agent/pull/20199))
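The header above scopes long-term memory to a caller-chosen key. A minimal sketch of sending it from Python — only the `X-Hermes-Session-Key` header name comes from #20199; the gateway URL and endpoint path here are placeholders for illustration:

```python
import urllib.request

# Hypothetical gateway address and endpoint -- adjust for your deployment.
# The X-Hermes-Session-Key value is an arbitrary per-user/per-tenant key
# chosen by the caller; requests sharing a key share long-term memory.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat",
    data=b'{"message": "hello"}',
    headers={
        "Content-Type": "application/json",
        "X-Hermes-Session-Key": "user-42",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment against a running gateway
print(req.get_header("X-hermes-session-key"))
```

Note that `urllib` normalizes stored header names, so lookups use the capitalized form shown in the final line.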
+
+---
+
+## 🧰 ACP Adapter (VS Code / Zed / JetBrains)
+
+- **`/steer` and `/queue` slash commands** (@HenkDz) ([#18114](https://github.com/NousResearch/hermes-agent/pull/18114))
+- Fix: translate Windows cwd for WSL sessions (salvage #18128) ([#18233](https://github.com/NousResearch/hermes-agent/pull/18233))
+- Fix: run `/steer` as a regular prompt on idle sessions ([#18258](https://github.com/NousResearch/hermes-agent/pull/18258))
+- Fix: route Zed thoughts to reasoning + polish tool/context rendering ([#19139](https://github.com/NousResearch/hermes-agent/pull/19139))
+- Fix: atomic session persistence via `replace_messages` (salvage #13675) ([#20279](https://github.com/NousResearch/hermes-agent/pull/20279))
+- Fix: preserve assistant reasoning metadata in session persistence (salvage #13575) ([#20296](https://github.com/NousResearch/hermes-agent/pull/20296))
+- Docs: update VS Code setup for ACP Client extension (salvage #12495) ([#20433](https://github.com/NousResearch/hermes-agent/pull/20433))
+
+---
+
+## 🐳 Docker
+
+- **Launch dashboard as side-process via `HERMES_DASHBOARD=1`** (@benbarclay) ([#19540](https://github.com/NousResearch/hermes-agent/pull/19540))
+- **Refuse root gateway runs in official image** (salvage #19215) ([#21250](https://github.com/NousResearch/hermes-agent/pull/21250))
+- **Chown runtime `node_modules` trees to hermes user** (salvage #19303) ([#21267](https://github.com/NousResearch/hermes-agent/pull/21267))
+- Fix: exclude compose/profile runtime state from build context ([#19626](https://github.com/NousResearch/hermes-agent/pull/19626))
+- CI: don't cancel overlapping builds, guard `:latest` (@ethernet8023) ([#20890](https://github.com/NousResearch/hermes-agent/pull/20890))
+- Test: align Dockerfile contract tests with simplified TUI flow (salvage #19024) ([#21174](https://github.com/NousResearch/hermes-agent/pull/21174))
+- Docs: connect to local inference servers (vLLM, Ollama) (salvage #12335) ([#20407](https://github.com/NousResearch/hermes-agent/pull/20407))
+- Docs: document `API_SERVER_*` env vars (salvage #11758) ([#20409](https://github.com/NousResearch/hermes-agent/pull/20409))
+- Docs: clarify Docker terminal backend is a single persistent container ([#20003](https://github.com/NousResearch/hermes-agent/pull/20003))
+
+---
+
+## 🐛 Notable Bug Fixes
+
+### Agent
+- Fix: resolve lazy session creation regressions (#18370 fallout) (@alt-glitch) ([#20363](https://github.com/NousResearch/hermes-agent/pull/20363))
+- Fix: propagate ContextVars to concurrent tool worker threads (salvage #16660) ([#18123](https://github.com/NousResearch/hermes-agent/pull/18123))
+- Fix: warning-first tool-call loop guardrails ([#18227](https://github.com/NousResearch/hermes-agent/pull/18227))
+- Fix: surface self-improvement review summaries across CLI, TUI, and gateway ([#18073](https://github.com/NousResearch/hermes-agent/pull/18073))
+
+### Gateway streaming
+- Fix: harden StreamingConfig bool and numeric coercion (@simbam99) ([#16463](https://github.com/NousResearch/hermes-agent/pull/16463))
+
+### Model
+- Fix: avoid Bedrock credential probe in provider picker (@helix4u) ([#18998](https://github.com/NousResearch/hermes-agent/pull/18998))
+
+### Doctor
+- Fix: check global agent-browser when local install not found ([#19671](https://github.com/NousResearch/hermes-agent/pull/19671))
+- Test: kimi-coding-cn provider validation regression ([#19734](https://github.com/NousResearch/hermes-agent/pull/19734))
+
+### Update
+- Fix: patch `isatty` on real streams to fix xdist-flaky `--yes` tests (salvage #19026) ([#21175](https://github.com/NousResearch/hermes-agent/pull/21175))
+- Fix: teach restart-mocks about the post-update survivor sweep (salvage #19031) ([#21177](https://github.com/NousResearch/hermes-agent/pull/21177))
+
+### ACP
+- Fix: preserve assistant reasoning metadata in session persistence ([#20296](https://github.com/NousResearch/hermes-agent/pull/20296))
+
+### Redact
+- Fix: add `code_file` param to skip false-positive ENV/JSON patterns ([#19715](https://github.com/NousResearch/hermes-agent/pull/19715))
+
+### Email
+- Fix: quoted-relative file-drop paths + Date header on tool email path ([#19646](https://github.com/NousResearch/hermes-agent/pull/19646))
+
+---
+
+## 🧪 Testing
+
+- **ACP — accept prompt persistence kwargs in MCP E2E mocks** (@stephenschoettler) ([#18047](https://github.com/NousResearch/hermes-agent/pull/18047))
+- **Toolsets — include kanban in expected post-#17805 toolset assertions** (@briandevans) ([#18122](https://github.com/NousResearch/hermes-agent/pull/18122))
+- **Agent — cover max-iterations summary message sanitization** ([#19580](https://github.com/NousResearch/hermes-agent/pull/19580))
+- **run_agent — `-inf` and `nan` regression coverage for `_coerce_number`** ([#19703](https://github.com/NousResearch/hermes-agent/pull/19703))
+
+---
+
+## 📚 Documentation
+
+### Major docs additions
+- **`llms.txt` + `llms-full.txt` — agent-friendly ingestion** ([#18276](https://github.com/NousResearch/hermes-agent/pull/18276))
+- **User Stories and Use Cases collage page** ([#18282](https://github.com/NousResearch/hermes-agent/pull/18282))
+- **Persistent Goals (/goal) feature page** ([#18275](https://github.com/NousResearch/hermes-agent/pull/18275))
+- **Windows (WSL2) guide expansion** — filesystem, networking, services, pitfalls ([#20748](https://github.com/NousResearch/hermes-agent/pull/20748))
+- **Chinese (zh-CN) README translation** (salvage #13508) ([#20431](https://github.com/NousResearch/hermes-agent/pull/20431))
+- **zh-Hans Docusaurus locale** + Tool Gateway / image-gen / WSL quickstart translations (salvage #11728) ([#20430](https://github.com/NousResearch/hermes-agent/pull/20430))
+- **Tool Gateway docs restructure** — lead with what it does, config moved to bottom ([#20827](https://github.com/NousResearch/hermes-agent/pull/20827))
+- **Quickstart — Onchain AI Garage Hermes tutorials playlist** ([#20192](https://github.com/NousResearch/hermes-agent/pull/20192))
+- **Open WebUI bootstrap script** (salvage #9566) ([#20427](https://github.com/NousResearch/hermes-agent/pull/20427))
+- **Local Ollama setup guide** (salvage #5842) ([#20426](https://github.com/NousResearch/hermes-agent/pull/20426))
+- **Google Gemini guide** (salvage #17450) ([#20401](https://github.com/NousResearch/hermes-agent/pull/20401))
+- **Custom model aliases for /model command** ([#20475](https://github.com/NousResearch/hermes-agent/pull/20475))
+- **Together/Groq/Perplexity cookbook via `custom_providers`** (salvage #15214) ([#20400](https://github.com/NousResearch/hermes-agent/pull/20400))
+- **Doubao speech integration examples** (TTS + STT) (salvage #18065) ([#20418](https://github.com/NousResearch/hermes-agent/pull/20418))
+- **WSL-to-Windows Chrome MCP bridge** (salvage #8313) ([#20428](https://github.com/NousResearch/hermes-agent/pull/20428))
+- **Hermes skills docs sync** — slash commands + durable-systems section ([#20390](https://github.com/NousResearch/hermes-agent/pull/20390))
+- **AGENTS.md — curator/cron/delegation/toolsets + fix plugin tree** ([#20226](https://github.com/NousResearch/hermes-agent/pull/20226))
+- **Bedrock quickstart entry + fallback comment + deployment link** (salvage #11093) ([#20397](https://github.com/NousResearch/hermes-agent/pull/20397))
+
+### Docs polish
+- Collapse exploding skills tree to a single Skills node ([#18259](https://github.com/NousResearch/hermes-agent/pull/18259))
+- Clarify `session_search` auxiliary model docs ([#19593](https://github.com/NousResearch/hermes-agent/pull/19593))
+- Open WebUI Quick Setup gap fill ([#19654](https://github.com/NousResearch/hermes-agent/pull/19654))
+- Default custom tool creation to plugins (@kshitijk4poor) ([#19755](https://github.com/NousResearch/hermes-agent/pull/19755))
+- Clarify Telegram group chat troubleshooting (salvage #18672) ([#20416](https://github.com/NousResearch/hermes-agent/pull/20416))
+- Codex OAuth auth prerequisite clarification (salvage #18688) ([#20417](https://github.com/NousResearch/hermes-agent/pull/20417))
+- Discord Server Members Intent + SSRC-mapping drift + /voice join slash Choice (salvage #11350) ([#20411](https://github.com/NousResearch/hermes-agent/pull/20411))
+- Document `ctx.dispatch_tool()` (salvage #10955) ([#20391](https://github.com/NousResearch/hermes-agent/pull/20391))
+- Document `hermes webhook subscribe --deliver-only` (salvage #12612) ([#20392](https://github.com/NousResearch/hermes-agent/pull/20392))
+- Document `hermes import` reference (salvage #14711) ([#20396](https://github.com/NousResearch/hermes-agent/pull/20396))
+- Document per-provider TTS `max_text_length` caps (salvage #13825) ([#20389](https://github.com/NousResearch/hermes-agent/pull/20389))
+- Clarify supported prompt customization surfaces (salvage #19987) ([#20383](https://github.com/NousResearch/hermes-agent/pull/20383))
+- Correct `web_extract` summarizer timeout comment (salvage #20051) ([#20381](https://github.com/NousResearch/hermes-agent/pull/20381))
+- Fix fallback provider config paths (salvage #20033) ([#20382](https://github.com/NousResearch/hermes-agent/pull/20382))
+- Fix misleading RL install-extras claim (salvage #19080) ([#21213](https://github.com/NousResearch/hermes-agent/pull/21213))
+- Clarify API server tool execution locality (salvage #19117) ([#21223](https://github.com/NousResearch/hermes-agent/pull/21223))
+- Prefer `.venv` to match AGENTS.md and scripts/run_tests.sh (@xxxigm) ([#21334](https://github.com/NousResearch/hermes-agent/pull/21334))
+- Align tool discovery + test runner with AGENTS.md (@xxxigm) ([#20791](https://github.com/NousResearch/hermes-agent/pull/20791))
+- Align terminal-backend count and naming across docs and code (salvage #19044) ([#20402](https://github.com/NousResearch/hermes-agent/pull/20402))
+- Refresh stale platform counts (salvage #19053) ([#20403](https://github.com/NousResearch/hermes-agent/pull/20403))
+
+---
+
+## 👥 Contributors
+
+### Core
+- **@teknium1** — salvage, triage, review, feature work, and release management
+
+### Top Community Contributors
+
+- **@kshitijk4poor** (21 PRs) — SearXNG native search backend, per-capability backend selection, collapsible TUI startup banner, Slack ephemeral ack + format fixes, Lightpanda fallback hardening, searxng-search optional skill + Web Search + Extract docs, default custom tool creation to plugins, kanban failure-column fix
+- **@alt-glitch** (13 PRs) — video_analyze tool, xAI Custom Voices (voice cloning), local-backend CLI launch-directory fix, lazy-session creation regression recovery, systemd unit refresh on gateway boot
+- **@OutThisLife** (9 PRs) — TUI perf (overlay render churn reduction), voice push-to-talk parity restoration (salvaging @Montbra)
+- **@helix4u** (6 PRs) — Classic CLI output recovery after resize, absolute-path TUI completion, gateway model picker current-context fix, Bedrock credential probe avoidance, kanban docs fixes
+- **@ethernet8023** (3 PRs) — Docker CI — don't cancel overlapping builds, :latest guard
+- **@benbarclay** (3 PRs) — Docker — launch dashboard as side-process via HERMES_DASHBOARD=1
+- **@austinpickett** (3 PRs) — Dashboard Plugins page, TUI /model picker overhaul with inline auth, kanban button fix
+- **@sprmn24** (2 PRs) — Contributor
+- **@asheriif** (2 PRs) — Contributor
+- **@xxxigm** (2 PRs) — Contributing docs — .venv preference and test runner alignment with AGENTS.md
+- **@stephenschoettler** (1 PR) — ACP — MCP E2E mock kwargs
+- **@vincez-hms-coder** (1 PR) — Dashboard — Profiles management page
+- **@cdanis** (1 PR) — Contributor
+- **@briandevans** (1 PR) — Toolsets test — kanban assertions post-#17805
+- **@heyitsaamir** (1 PR) — Contributor
+
+### All Contributors
+
+Thanks to everyone who contributed to v0.13.0 — commits, co-authored work, and salvaged PRs. 295 contributors in one week.
+
+@0oAstro, @0xDevNinja, @0xharryriddle, @0xKingBack, @0xsir0000, @0xyg3n, @0z1-ghb, @abhinav11082001-stack,
+@acc001k, @acesjohnny, @adamludwin, @adybag14-cyber, @agentlinker, @agilejava, @ai-ag2026, @AJV20,
+@alanxchen85, @albert748, @AllardQuek, @alt-glitch, @altmazza0-star, @ambition0802, @amitgaur, @amroessam,
+@andrewhosf, @Asce66, @asheriif, @ashermorse, @asimons81, @Aslaaen, @Asunfly, @atongrun, @austinpickett,
+@banditburai, @barteqpl, @Bartok9, @Beandon13, @beardthelion, @beibi9966, @benbarclay, @binhnt92, @bjianhang,
+@BlackJulySnow, @bobashopcashier, @bogerman1, @Bongulielmi, @Brecht-H, @briandevans, @brooklynnicholson,
+@c3115644151, @camaragon, @CashWilliams, @CCClelo, @cdanis, @CES4751, @cg2aigc, @changchun989, @ChanlerDev,
+@CharlieKerfoot, @chengoak, @chenyunbo411, @chinadbo, @CIRWEL, @cixuuz, @cmcgrabby-hue, @colorcross,
+@Contentment003111, @CoreyNoDream, @counterposition, @curiouscleo, @DaniuXie, @deep-name, @dengtaoyuan450-a11y,
+@discodirector, @donramon77, @dpaluy, @ee-blog, @ehz0ah, @el-analista, @elmatadorgh, @EmelyanenkoK,
+@Emidomenge, @emozilla, @Es1la, @EthanGuo-coder, @etherman-os, @ethernet8023, @EvilDrag0n, @exxmen, @Fearvox,
+@Feranmi10, @firefly, @flobo3, @fmercurio, @Foolafroos, @formulahendry, @franksong2702, @ggnnggez, @GinWU05,
+@giwaov, @glesperance, @gnanirahulnutakki, @GodsBoy, @Gosuj, @Grey0202, @guillaumemeyer, @Gutslabs, @h0tp-ftw,
+@haidao1919, @halmisen, @happy5318, @hedirman, @helix4u, @hendrixfreire, @HenkDz, @hex-clawd, @heyitsaamir,
+@hharry11, @Hinotoi-agent, @holynn-q, @hrkzogw, @Hypn0sis, @Hypnus-Yuan, @ideathinklab01-source, @IMHaoyan,
+@Interstellar-code, @ishardo, @jacdevos, @jackey8616, @JanCong, @jasonoutland, @jatingodnani, @JayGwod,
+@jethac, @JezzaHehn, @JiaDe-Wu, @jjjojoj, @jkausel-ai, @John-tip, @johnncenae, @jrusso1020, @jslizar,
+@JTroyerOvermatch, @julysir, @Junass1, @JustinUssuri, @Kailigithub, @keepcalmqqf, @kiala9, @konsisumer,
+@kowenhaoai, @Krionex, @kshitijk4poor, @kyan12, @leavrcn, @leon7609, @LeonSGP43, @leprincep35700, @lhysdl,
+@likejudy, @lisanhu, @liu-collab, @liuguangyong93, @liuhao1024, @LucianoSP, @luoyuctl, @luyao618, @M3RCUR2Y,
+@maciekczech, @Magicray1217, @MaHaoHao-ch, @malaiwah, @manateelazycat, @masonjames, @megastary,
+@memosr, @MichaelWDanko, @mikeyobrien, @millerc79, @Mind-Dragon, @mioimotoai-lgtm, @misery-hl, @molvikar,
+@momowind, @Montbra, @MottledShadow, @mrbob-git, @mrcharlesiv, @mrcoferland, @ms-alan, @mwnickerson,
+@nazirulhafiy, @nftpoetrist, @nicoloboschi, @nightq, @nikolay-bratanov, @NikolayGusev-astra, @nocturnum91,
+@noOne-list, @nouseman666, @novax635, @npmisantosh, @nudiltoys-cmyk, @olisikh, @oluwadareab12, @Oxidane-bot,
+@pama0227, @pander, @pasevin, @paul-tian, @pdonizete, @perlowja, @pingchesu, @PratikRai0101, @priveperfumes,
+@probepark, @QifengKuang, @quocanh261997, @qWaitCrypto, @qxxaa, @r266-tech, @rames-jusso, @revaraver,
+@Ricardo-M-L, @rob-maron, @Roy-oss1, @rxdxxxx, @SandroHub013, @Sanjays2402, @Sertug17, @shashwatgokhe,
+@shellybotmoyer, @SHL0MS, @SimbaKingjoe, @simbam99, @simplenamebox-ops, @socrates1024, @sonic-netizen,
+@sprmn24, @steezkelly, @stephen0110, @stephenschoettler, @stevenchanin, @stevenchouai, @stormhierta,
+@subtract0, @suncokret12, @swithek, @taeng0204, @TakeshiSawaguchi, @tangyuanjc, @TheEpTic, @thelumiereguy,
+@Tkander1715, @tmdgusya, @Tranquil-Flow, @TruaShamu, @UgwujaGeorge, @valda, @vincez-hms-coder, @VinVC,
+@vominh1919, @wabrent, @WadydX, @wanazhar, @WanderWang, @warabe1122, @web-dev0521, @WideLee, @willy-scr,
+@wmagev, @WuTianyi123, @wxst, @wysie, @xsfX20, @xxxigm, @xyiy001, @YanzhongSu, @ygd58, @Yoimex,
+@yuehei, @Yukipukii1, @yuqianma, @YX234, @zeejaytan, @zhanggttry, @zhao0112, @zng8418, @zons-zhaozhy, @Zyproth
+
+---
+
+**Full Changelog**: [v2026.4.30...v2026.5.7](https://github.com/NousResearch/hermes-agent/compare/v2026.4.30...v2026.5.7)
diff --git a/acp_adapter/server.py b/acp_adapter/server.py
index dd9d75af9c..c61bb80e47 100644
--- a/acp_adapter/server.py
+++ b/acp_adapter/server.py
@@ -3,13 +3,16 @@
from __future__ import annotations
import asyncio
+import base64
import contextvars
import json
import logging
import os
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
+from pathlib import Path
from typing import Any, Deque, Optional
+from urllib.parse import unquote, urlparse
import acp
from acp.schema import (
@@ -18,6 +21,7 @@ from acp.schema import (
AuthenticateResponse,
AvailableCommand,
AvailableCommandsUpdate,
+ BlobResourceContents,
ClientCapabilities,
EmbeddedResourceContentBlock,
ForkSessionResponse,
@@ -46,6 +50,7 @@ from acp.schema import (
SessionResumeCapabilities,
SessionInfo,
TextContentBlock,
+ TextResourceContents,
UnstructuredCommandInput,
Usage,
UsageUpdate,
@@ -83,6 +88,272 @@ _executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="acp-agent")
# does not expose a client-side limit, so this is a fixed cap that clients
# paginate against using `cursor` / `next_cursor`.
_LIST_SESSIONS_PAGE_SIZE = 50
+_MAX_ACP_RESOURCE_BYTES = 512 * 1024
+_TEXT_RESOURCE_MIME_PREFIXES = ("text/",)
+_TEXT_RESOURCE_MIME_TYPES = {
+ "application/json",
+ "application/javascript",
+ "application/typescript",
+ "application/xml",
+ "application/x-yaml",
+ "application/yaml",
+ "application/toml",
+ "application/sql",
+}
+
+
+def _resource_display_name(uri: str, name: str | None = None, title: str | None = None) -> str:
+ """Human-readable attachment name for prompt context."""
+ raw_name = (name or "").strip()
+ raw_title = (title or "").strip()
+ if raw_title and raw_name and raw_title != raw_name:
+ return f"{raw_title} ({raw_name})"
+ if raw_title:
+ return raw_title
+ if raw_name:
+ return raw_name
+ parsed = urlparse(uri)
+ candidate = parsed.path if parsed.scheme else uri
+ return Path(unquote(candidate)).name or uri or "resource"
+
+
+def _is_text_resource(mime_type: str | None) -> bool:
+ mime = (mime_type or "").split(";", 1)[0].strip().lower()
+ if not mime:
+ return False
+ return mime.startswith(_TEXT_RESOURCE_MIME_PREFIXES) or mime in _TEXT_RESOURCE_MIME_TYPES
+
+
+def _is_image_resource(mime_type: str | None) -> bool:
+ mime = (mime_type or "").split(";", 1)[0].strip().lower()
+ return mime.startswith("image/")
+
+
+def _guess_image_mime_from_path(path: Path) -> str | None:
+ suffix = path.suffix.lower()
+ return {
+ ".png": "image/png",
+ ".jpg": "image/jpeg",
+ ".jpeg": "image/jpeg",
+ ".gif": "image/gif",
+ ".webp": "image/webp",
+ ".bmp": "image/bmp",
+ ".svg": "image/svg+xml",
+ }.get(suffix)
+
+
+def _image_data_url(data: bytes, mime_type: str) -> str:
+ return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"
+
+
+def _path_from_file_uri(uri: str) -> Path | None:
+ """Convert local file URIs/paths from ACP clients into a readable Path.
+
+ Zed may send POSIX file URIs from Linux/WSL workspaces or Windows-ish paths
+ when launched through wsl.exe. Translate the common Windows drive form to
+    /mnt/<drive>/... so Hermes running in WSL can read it (e.g.
+    ``file:///C:/Users/me/a.txt`` -> ``/mnt/c/Users/me/a.txt``).
+ """
+ raw = (uri or "").strip()
+ if not raw:
+ return None
+
+ parsed = urlparse(raw)
+ if parsed.scheme and parsed.scheme != "file":
+ return None
+
+ if parsed.scheme == "file":
+ if parsed.netloc and parsed.netloc not in {"", "localhost"}:
+ return None
+ path_text = unquote(parsed.path or "")
+ else:
+ path_text = unquote(raw)
+
+ # file:///C:/Users/... or C:\Users\...
+ if len(path_text) >= 3 and path_text[0] == "/" and path_text[2] == ":" and path_text[1].isalpha():
+ drive = path_text[1].lower()
+ rest = path_text[3:].lstrip("/\\").replace("\\", "/")
+ return Path("/mnt") / drive / rest
+ if len(path_text) >= 2 and path_text[1] == ":" and path_text[0].isalpha():
+ drive = path_text[0].lower()
+ rest = path_text[2:].lstrip("/\\").replace("\\", "/")
+ return Path("/mnt") / drive / rest
+
+ return Path(path_text)
+
+
+def _decode_text_bytes(data: bytes, mime_type: str | None) -> str | None:
+ """Decode resource bytes if they are probably text; return None for binary."""
+ if b"\x00" in data and not _is_text_resource(mime_type):
+ return None
+ for encoding in ("utf-8-sig", "utf-8", "latin-1"):
+ try:
+ return data.decode(encoding)
+ except UnicodeDecodeError:
+ continue
+ return data.decode("utf-8", errors="replace")
+
+
+def _format_resource_text(
+ *,
+ uri: str,
+ body: str,
+ name: str | None = None,
+ title: str | None = None,
+ note: str | None = None,
+) -> str:
+ display = _resource_display_name(uri, name=name, title=title)
+ header = f"[Attached file: {display}]"
+ if note:
+ header += f" ({note})"
+ return f"{header}\nURI: {uri}\n\n{body}"
+
+
+def _resource_link_to_parts(block: ResourceContentBlock) -> list[dict[str, Any]]:
+ """Convert an ACP resource_link block to OpenAI content parts.
+
+ Returns a list of {"type": "text", ...} and/or {"type": "image_url", ...}
+ parts. Image resources produce an image_url part with a small text header
+ so the model knows which attachment it is. Non-image resources return a
+ single text part with the inlined file body (or a binary-omit note).
+ """
+ uri = str(getattr(block, "uri", "") or "").strip()
+ if not uri:
+ return []
+
+ name = str(getattr(block, "name", "") or "").strip() or None
+ title = str(getattr(block, "title", "") or "").strip() or None
+ mime_type = str(getattr(block, "mime_type", "") or "").strip() or None
+ path = _path_from_file_uri(uri)
+
+ if path is None:
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ name=name,
+ title=title,
+ body="[Resource link only; Hermes cannot read non-file ACP resource URIs directly.]",
+ ),
+ }]
+
+ # Image files: emit a short text header + image_url data URL so vision
+ # models can see the attachment instead of a "binary omitted" note.
+ image_mime = mime_type if _is_image_resource(mime_type) else _guess_image_mime_from_path(path)
+ if image_mime and _is_image_resource(image_mime):
+ try:
+ size = path.stat().st_size
+ if size > _MAX_ACP_RESOURCE_BYTES:
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ name=name,
+ title=title,
+ body=f"[Image too large to inline: {size} bytes, cap={_MAX_ACP_RESOURCE_BYTES}]",
+ ),
+ }]
+ with path.open("rb") as fh:
+ data = fh.read()
+ except OSError as exc:
+ logger.warning("ACP image resource read failed: %s", uri, exc_info=True)
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ name=name,
+ title=title,
+ body=f"[Could not read attached image: {exc}]",
+ ),
+ }]
+ display = _resource_display_name(uri, name=name, title=title)
+ return [
+ {"type": "text", "text": f"[Attached image: {display}]\nURI: {uri}"},
+ {"type": "image_url", "image_url": {"url": _image_data_url(data, image_mime)}},
+ ]
+
+ try:
+ size = path.stat().st_size
+ read_size = min(size, _MAX_ACP_RESOURCE_BYTES)
+ with path.open("rb") as fh:
+ data = fh.read(read_size)
+ text = _decode_text_bytes(data, mime_type)
+ if text is None:
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ name=name,
+ title=title,
+ body=f"[Binary file omitted: {size} bytes, mime={mime_type or 'unknown'}]",
+ ),
+ }]
+ note = None
+ if size > _MAX_ACP_RESOURCE_BYTES:
+ note = f"truncated to {_MAX_ACP_RESOURCE_BYTES} of {size} bytes"
+ return [{
+ "type": "text",
+ "text": _format_resource_text(uri=uri, name=name, title=title, body=text, note=note),
+ }]
+ except OSError as exc:
+ logger.warning("ACP resource read failed: %s", uri, exc_info=True)
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ name=name,
+ title=title,
+ body=f"[Could not read attached file: {exc}]",
+ ),
+ }]
+
+
+def _embedded_resource_to_parts(block: EmbeddedResourceContentBlock) -> list[dict[str, Any]]:
+ resource = getattr(block, "resource", None)
+ if resource is None:
+ return []
+
+ uri = str(getattr(resource, "uri", "") or "").strip()
+ mime_type = str(getattr(resource, "mime_type", "") or "").strip() or None
+
+ if isinstance(resource, TextResourceContents):
+ return [{"type": "text", "text": _format_resource_text(uri=uri, body=resource.text)}]
+
+ if isinstance(resource, BlobResourceContents):
+ blob = resource.blob or ""
+ try:
+ data = base64.b64decode(blob, validate=True)
+ except Exception:
+ data = blob.encode("utf-8", errors="replace")
+
+ # Image blobs go through as image_url so vision models can see them.
+ if _is_image_resource(mime_type):
+ if len(data) > _MAX_ACP_RESOURCE_BYTES:
+ return [{
+ "type": "text",
+ "text": _format_resource_text(
+ uri=uri,
+ body=f"[Embedded image too large to inline: {len(data)} bytes, cap={_MAX_ACP_RESOURCE_BYTES}]",
+ ),
+ }]
+ display = _resource_display_name(uri)
+ return [
+ {"type": "text", "text": f"[Attached image: {display}]" + (f"\nURI: {uri}" if uri else "")},
+ {"type": "image_url", "image_url": {"url": _image_data_url(data, mime_type or "image/png")}},
+ ]
+
+ text = _decode_text_bytes(data[:_MAX_ACP_RESOURCE_BYTES], mime_type)
+ if text is None:
+ body = f"[Binary embedded file omitted: {len(data)} bytes, mime={mime_type or 'unknown'}]"
+ else:
+ body = text
+ if len(data) > _MAX_ACP_RESOURCE_BYTES:
+ body += f"\n\n[Truncated to {_MAX_ACP_RESOURCE_BYTES} of {len(data)} bytes]"
+ return [{"type": "text", "text": _format_resource_text(uri=uri, body=body)}]
+
+ text = getattr(resource, "text", None)
+ if text:
+ return [{"type": "text", "text": _format_resource_text(uri=uri, body=str(text))}]
+ return []
def _extract_text(
@@ -144,6 +415,20 @@ def _content_blocks_to_openai_user_content(
if image_part is not None:
parts.append(image_part)
continue
+ if isinstance(block, ResourceContentBlock):
+ resource_parts = _resource_link_to_parts(block)
+ for part in resource_parts:
+ parts.append(part)
+ if part.get("type") == "text":
+ text_parts.append(part["text"])
+ continue
+ if isinstance(block, EmbeddedResourceContentBlock):
+ resource_parts = _embedded_resource_to_parts(block)
+ for part in resource_parts:
+ parts.append(part)
+ if part.get("type") == "text":
+ text_parts.append(part["text"])
+ continue
if not parts:
return _extract_text(prompt)
@@ -803,6 +1088,7 @@ class HermesACPAgent(acp.Agent):
user_text = _extract_text(prompt).strip()
user_content = _content_blocks_to_openai_user_content(prompt)
+ text_only_prompt = all(isinstance(block, TextContentBlock) for block in prompt)
has_content = bool(user_text) or (
isinstance(user_content, list) and bool(user_content)
)
@@ -821,7 +1107,7 @@ class HermesACPAgent(acp.Agent):
# silently append to state.queued_prompts and respond with
# "No active turn — queued for the next turn", which looks like
# /queue even though the user never typed /queue.
- if isinstance(user_content, str) and user_text.startswith("/steer"):
+ if text_only_prompt and isinstance(user_content, str) and user_text.startswith("/steer"):
steer_text = user_text.split(maxsplit=1)[1].strip() if len(user_text.split(maxsplit=1)) > 1 else ""
interrupted_prompt = ""
rewrite_idle = False
@@ -846,7 +1132,7 @@ class HermesACPAgent(acp.Agent):
# Slash commands are text-only; if the client included images/resources,
# send the whole multimodal prompt to the agent instead of treating it as
# an ACP command.
- if isinstance(user_content, str) and user_text.startswith("/"):
+ if text_only_prompt and isinstance(user_content, str) and user_text.startswith("/"):
response_text = self._handle_slash_command(user_text, state)
if response_text is not None:
if self._conn:
diff --git a/acp_adapter/session.py b/acp_adapter/session.py
index d6dace66b4..61d06432a7 100644
--- a/acp_adapter/session.py
+++ b/acp_adapter/session.py
@@ -466,17 +466,10 @@ class SessionManager:
except Exception:
logger.debug("Failed to update ACP session metadata", exc_info=True)
- # Replace stored messages with current history.
- db.clear_messages(state.session_id)
- for msg in state.history:
- db.append_message(
- session_id=state.session_id,
- role=msg.get("role", "user"),
- content=msg.get("content"),
- tool_name=msg.get("tool_name") or msg.get("name"),
- tool_calls=msg.get("tool_calls"),
- tool_call_id=msg.get("tool_call_id"),
- )
+ # Replace stored messages with current history atomically so a
+ # mid-rewrite failure rolls back and the previously persisted
+ # conversation is preserved (salvaged from #13675).
+ db.replace_messages(state.session_id, state.history)
except Exception:
logger.warning("Failed to persist ACP session %s", state.session_id, exc_info=True)
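The new `db.replace_messages` call above is assumed to wrap the delete-and-rewrite in a single transaction. A minimal sqlite3 sketch of that contract (the `messages` table and its columns here are hypothetical, not the project's actual schema):

```python
import sqlite3


def replace_messages(conn: sqlite3.Connection, session_id: str, history: list) -> None:
    """Atomically swap a session's stored messages for `history`.

    The DELETE and all INSERTs share one transaction, so a failure
    mid-rewrite rolls everything back and the previously persisted
    conversation survives.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
        conn.executemany(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            [(session_id, m.get("role", "user"), m.get("content")) for m in history],
        )
```

The point of the sketch is the `with conn:` scope: either every row of the new history lands, or none of the old rows are lost.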
diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py
index bb1b33fcc8..eb6b3e79ad 100644
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@@ -231,33 +231,30 @@ def _supports_fast_mode(model: str) -> bool:
return any(v in model for v in _FAST_MODE_SUPPORTED_SUBSTRINGS)
-# Beta headers for enhanced features (sent with ALL auth types).
-# As of Opus 4.7 (2026-04-16), the first two are GA on Claude 4.6+ — the
+# Beta headers for enhanced features that are safe on ordinary/native Anthropic
+# requests. As of Opus 4.7 (2026-04-16), these are GA on Claude 4.6+ — the
# beta headers are still accepted (harmless no-op) but not required. Kept
-# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
-# that still gate on the headers continue to get the enhanced features.
+# here so older Claude (4.5, 4.1) + compatible endpoints that still gate on
+# the headers continue to get the enhanced features.
#
-# ``context-1m-2025-08-07`` unlocks the 1M context window on Claude Opus 4.6/4.7
-# and Sonnet 4.6 when served via AWS Bedrock or Azure AI Foundry. 1M is GA on
-# native Anthropic (api.anthropic.com) for Opus 4.6+, but Bedrock/Azure still
-# gate it behind this beta header as of 2026-04 — without it Bedrock caps Opus
-# at 200K even though model_metadata.py advertises 1M. The header is a harmless
-# no-op on endpoints where 1M is GA.
+# Do NOT include ``context-1m-2025-08-07`` here. Anthropic returns HTTP 400
+# ("long context beta is not yet available for this subscription") for
+# accounts without the long-context beta, which breaks normal short auxiliary
+# calls like title generation/session summarization.
#
-# Migration guide: remove these if you no longer support ≤4.5 models or once
-# Bedrock/Azure promote 1M to GA.
+# ``context-1m-2025-08-07`` is still required to unlock the 1M context window
+# on Claude Opus 4.6/4.7 and Sonnet 4.6 when served via AWS Bedrock or Azure
+# AI Foundry. Add it only for those endpoint-specific paths below.
_COMMON_BETAS = [
"interleaved-thinking-2025-05-14",
"fine-grained-tool-streaming-2025-05-14",
- "context-1m-2025-08-07",
]
# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
# the fine-grained tool streaming beta is present. Omit it so tool calls
# fall back to the provider's default response path.
_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
-# 1M context beta — see comment on _COMMON_BETAS above. Stripped for
-# Bearer-auth (MiniMax) endpoints since they host their own models and
-# unknown Anthropic beta headers risk request rejection.
+# 1M context beta. Native Anthropic does not get this by default because some
+# subscriptions reject it, but Bedrock/Azure still need it for 1M context.
_CONTEXT_1M_BETA = "context-1m-2025-08-07"
# Fast mode beta — enables the ``speed: "fast"`` request parameter for
@@ -476,6 +473,14 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))
+def _base_url_needs_context_1m_beta(base_url: str | None) -> bool:
+ """Return True for endpoints that still gate 1M context behind a beta."""
+ normalized = _normalize_base_url_text(base_url).lower()
+ if not normalized:
+ return False
+ return "azure.com" in normalized
+
+
def _common_betas_for_base_url(
base_url: str | None,
*,
@@ -485,27 +490,25 @@ def _common_betas_for_base_url(
MiniMax's Anthropic-compatible endpoints (Bearer-auth) reject requests
that include Anthropic's ``fine-grained-tool-streaming`` beta — every
- tool-use message triggers a connection error. Strip that beta for
- Bearer-auth endpoints while keeping all other betas intact.
+ tool-use message triggers a connection error.
- The ``context-1m-2025-08-07`` beta is also stripped for Bearer-auth
- endpoints — MiniMax hosts its own models, not Claude, so the header is
- irrelevant at best and risks request rejection at worst.
+ The ``context-1m-2025-08-07`` beta is not sent to native Anthropic by
+ default because some subscriptions reject it. Add it only for endpoint
+ families that still require it for 1M context, currently Azure AI Foundry.
+ Bedrock uses its own client helper below and opts in explicitly.
- ``drop_context_1m_beta=True`` additionally strips the 1M-context beta on
- otherwise-unrelated endpoints. The OAuth retry path flips this flag after
- a subscription rejects the beta with
- "The long context beta is not yet available for this subscription" so
- subsequent requests in the same session don't repeat the probe. See the
- reactive recovery loop in ``run_agent.py`` and issue-comment history on
- PR #17680 for the full rationale.
+ ``drop_context_1m_beta=True`` strips the 1M-context beta from any path that
+ would otherwise include it after a subscription/endpoint rejects the beta.
"""
+ betas = list(_COMMON_BETAS)
+ if _base_url_needs_context_1m_beta(base_url) and not drop_context_1m_beta:
+ betas.append(_CONTEXT_1M_BETA)
if _requires_bearer_auth(base_url):
_stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
- return [b for b in _COMMON_BETAS if b not in _stripped]
+ return [b for b in betas if b not in _stripped]
if drop_context_1m_beta:
- return [b for b in _COMMON_BETAS if b != _CONTEXT_1M_BETA]
- return _COMMON_BETAS
+ return [b for b in betas if b != _CONTEXT_1M_BETA]
+ return betas
def build_anthropic_client(
@@ -642,7 +645,7 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
- default_headers={"anthropic-beta": ",".join(_COMMON_BETAS)},
+ default_headers={"anthropic-beta": ",".join([*_COMMON_BETAS, _CONTEXT_1M_BETA])},
)
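The beta-selection rules in this hunk can be condensed into a standalone sketch (constants copied from above; the URL matching is simplified to the substrings the hunk checks, so this is an illustration rather than the adapter's exact code path):

```python
COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
]
TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
CONTEXT_1M_BETA = "context-1m-2025-08-07"


def betas_for(base_url: str, *, drop_context_1m: bool = False) -> list:
    url = (base_url or "").lower()
    betas = list(COMMON_BETAS)
    # Azure AI Foundry still gates 1M context behind the beta header.
    if "azure.com" in url and not drop_context_1m:
        betas.append(CONTEXT_1M_BETA)
    # MiniMax Bearer-auth endpoints reject both betas outright.
    if url.startswith(("https://api.minimax.io/anthropic",
                       "https://api.minimaxi.com/anthropic")):
        betas = [b for b in betas if b not in {TOOL_STREAMING_BETA, CONTEXT_1M_BETA}]
    # Reactive strip after a subscription/endpoint rejects the 1M beta.
    if drop_context_1m:
        betas = [b for b in betas if b != CONTEXT_1M_BETA]
    return betas
```

Native Anthropic gets only the common betas; Azure opts into 1M; MiniMax loses both sensitive headers; Bedrock (via its own client helper) appends the 1M beta explicitly.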
diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py
index 4c706748a0..bd4e6be457 100644
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -196,6 +196,12 @@ def _is_kimi_model(model: Optional[str]) -> bool:
return bare.startswith("kimi-") or bare == "kimi"
+def _is_arcee_trinity_thinking(model: Optional[str]) -> bool:
+ """True for Arcee Trinity Large Thinking (direct or via OpenRouter)."""
+ bare = (model or "").strip().lower().rsplit("/", 1)[-1]
+ return bare == "trinity-large-thinking"
+
+
def _fixed_temperature_for_model(
model: Optional[str],
base_url: Optional[str] = None,
@@ -213,10 +219,46 @@ def _fixed_temperature_for_model(
if _is_kimi_model(model):
logger.debug("Omitting temperature for Kimi model %r (server-managed)", model)
return OMIT_TEMPERATURE
+ if _is_arcee_trinity_thinking(model):
+ return 0.5
+ return None
+
+
+def _compression_threshold_for_model(model: Optional[str]) -> Optional[float]:
+ """Return a context-compression threshold override for specific models.
+
+ The threshold is the fraction of the model's context window that must be
+ consumed before Hermes triggers summarization. Higher values delay
+ compression and preserve more raw context.
+
+ Returns a float in (0, 1] to override the global ``compression.threshold``
+ config value, or ``None`` to leave the user's config value unchanged.
+ """
+ if _is_arcee_trinity_thinking(model):
+ return 0.75
return None
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
-_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
+def _get_aux_model_for_provider(provider_id: str) -> str:
+ """Return the cheap auxiliary model for a provider.
+
+ Reads from ProviderProfile.default_aux_model first, falling back to the
+ legacy hardcoded dict for providers that predate the profiles system.
+ """
+ try:
+ from providers import get_provider_profile
+ _p = get_provider_profile(provider_id)
+ if _p and _p.default_aux_model:
+ return _p.default_aux_model
+ except Exception:
+ pass
+ return _API_KEY_PROVIDER_AUX_MODELS_FALLBACK.get(provider_id, "")
+
+
+# Fallback for providers not yet migrated to ProviderProfile.default_aux_model,
+# plus providers we intentionally keep pinned here (e.g. Anthropic predates
+# profiles). New providers should set default_aux_model on their profile instead.
+_API_KEY_PROVIDER_AUX_MODELS_FALLBACK: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
@@ -235,6 +277,10 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"tencent-tokenhub": "hy3-preview",
}
+# Legacy alias — callers that haven't been updated to _get_aux_model_for_provider()
+# can still use this dict directly. It is the same dict object as the
+# fallback table above, so the two can never diverge.
+_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = _API_KEY_PROVIDER_AUX_MODELS_FALLBACK
+
# Vision-specific model overrides for direct providers.
# When the user's main provider has a dedicated vision/multimodal model that
# differs from their main chat model, map it here. The vision auto-detect
@@ -259,10 +305,12 @@ _PROVIDERS_WITHOUT_VISION: frozenset = frozenset({
"kimi-coding-cn",
})
-# OpenRouter app attribution headers (base — always sent)
+# OpenRouter app attribution headers (base — always sent).
+# `X-Title` is the canonical attribution header OpenRouter's dashboard
+# reads; the previous `X-OpenRouter-Title` label was not recognized there.
_OR_HEADERS_BASE = {
"HTTP-Referer": "https://hermes-agent.nousresearch.com",
- "X-OpenRouter-Title": "Hermes Agent",
+ "X-Title": "Hermes Agent",
"X-OpenRouter-Categories": "productivity,cli-agent",
}
@@ -407,6 +455,12 @@ def _to_openai_base_url(base_url: str) -> str:
"""
url = str(base_url or "").strip().rstrip("/")
if url.endswith("/anthropic"):
+ # ZAI (open.bigmodel.cn) uses /api/anthropic for Anthropic wire
+ # but /api/paas/v4 for OpenAI wire — the generic /v1 rewrite is wrong.
+ if "open.bigmodel.cn" in url or "bigmodel" in url:
+ rewritten = url[: -len("/anthropic")] + "/paas/v4"
+ logger.debug("Auxiliary client: rewrote ZAI base URL %s → %s", url, rewritten)
+ return rewritten
rewritten = url[: -len("/anthropic")] + "/v1"
logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
return rewritten
@@ -548,6 +602,14 @@ class _CodexCompletionsAdapter:
"store": False,
}
+ # Preserve the chat.completions timeout contract. This adapter is used
+ # by auxiliary calls such as context compression; if the timeout is not
+ # forwarded and enforced, a Codex Responses stream can sit behind a
+ # dead-looking CLI until the user force-interrupts the whole session.
+ timeout = kwargs.get("timeout")
+ if timeout is not None:
+ resp_kwargs["timeout"] = timeout
+
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
@@ -567,7 +629,12 @@ class _CodexCompletionsAdapter:
# API allows it.
pass
else:
- effort = reasoning_cfg.get("effort", "medium")
+ # Truthy-only check mirrors agent/transports/codex.py
+ # build_kwargs(): falsy values (None, "", 0) fall back
+ # to the default rather than being forwarded to the
+ # Codex backend, which rejects e.g. {"effort": null}
+ # with a 400.
+ effort = reasoning_cfg.get("effort") or "medium"
# Codex backend rejects "minimal"; clamp to "low" to
# match the main-agent Codex transport behavior.
if effort == "minimal":
@@ -600,6 +667,37 @@ class _CodexCompletionsAdapter:
text_parts: List[str] = []
tool_calls_raw: List[Any] = []
usage = None
+ total_timeout = timeout if isinstance(timeout, (int, float)) and timeout > 0 else None
+ deadline = time.monotonic() + float(total_timeout) if total_timeout else None
+ timed_out = threading.Event()
+ timeout_timer: Optional[threading.Timer] = None
+
+ def _timeout_message() -> str:
+ return f"Codex auxiliary Responses stream exceeded {float(total_timeout):.1f}s total timeout"
+
+ def _close_client_on_timeout() -> None:
+ timed_out.set()
+ close = getattr(self._client, "close", None)
+ if callable(close):
+ try:
+ close()
+ except Exception:
+ logger.debug("Codex auxiliary: client close during timeout failed", exc_info=True)
+
+ def _check_cancelled() -> None:
+ if deadline is not None and time.monotonic() >= deadline:
+ timed_out.set()
+ raise TimeoutError(_timeout_message())
+ try:
+ from tools.interrupt import is_interrupted
+ if is_interrupted():
+ raise InterruptedError("Codex auxiliary Responses stream interrupted")
+ except InterruptedError:
+ raise
+ except Exception:
+ # Interrupt state is a best-effort UX hook; never make it a
+ # new failure mode for auxiliary calls.
+ pass
try:
# Collect output items and text deltas during streaming —
@@ -608,8 +706,14 @@ class _CodexCompletionsAdapter:
collected_output_items: List[Any] = []
collected_text_deltas: List[str] = []
has_function_calls = False
+ if total_timeout:
+ timeout_timer = threading.Timer(float(total_timeout), _close_client_on_timeout)
+ timeout_timer.daemon = True
+ timeout_timer.start()
+ _check_cancelled()
with self._client.responses.stream(**resp_kwargs) as stream:
for _event in stream:
+ _check_cancelled()
_etype = getattr(_event, "type", "")
if _etype == "response.output_item.done":
_done = getattr(_event, "item", None)
@@ -621,6 +725,7 @@ class _CodexCompletionsAdapter:
collected_text_deltas.append(_delta)
elif "function_call" in _etype:
has_function_calls = True
+ _check_cancelled()
final = stream.get_final_response()
# Backfill empty output from collected stream events
@@ -680,8 +785,13 @@ class _CodexCompletionsAdapter:
total_tokens=getattr(resp_usage, "total_tokens", 0),
)
except Exception as exc:
+ if timed_out.is_set():
+ raise TimeoutError(_timeout_message()) from exc
logger.debug("Codex auxiliary Responses API call failed: %s", exc)
raise
+ finally:
+ if timeout_timer is not None:
+ timeout_timer.cancel()
content = "".join(text_parts).strip() or None
@@ -775,7 +885,14 @@ class _AnthropicCompletionsAdapter:
model = kwargs.get("model", self._model)
tools = kwargs.get("tools")
tool_choice = kwargs.get("tool_choice")
- max_tokens = kwargs.get("max_tokens") or kwargs.get("max_completion_tokens") or 2000
+ # ZAI's Anthropic-compatible endpoint rejects max_tokens on vision
+ # models (glm-4v-flash etc.) with error code 1210. When the caller
+ # signals this by setting _skip_zai_max_tokens in kwargs, omit it.
+ _skip_mt = kwargs.pop("_skip_zai_max_tokens", False)
+ if _skip_mt:
+ max_tokens = None
+ else:
+ max_tokens = kwargs.get("max_tokens") or kwargs.get("max_completion_tokens") or 2000
temperature = kwargs.get("temperature")
normalized_tool_choice = None
@@ -1150,7 +1267,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
raw_base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
base_url = _to_openai_base_url(raw_base_url)
- model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
+ model = _get_aux_model_for_provider(provider_id) or None
if model is None:
continue # skip provider if we don't know a valid aux model
logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
@@ -1166,6 +1283,14 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
+ else:
+ try:
+ from providers import get_provider_profile as _gpf_aux
+ _ph_aux = _gpf_aux(provider_id)
+ if _ph_aux and _ph_aux.default_headers:
+ extra["default_headers"] = dict(_ph_aux.default_headers)
+ except Exception:
+ pass
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1177,7 +1302,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
raw_base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
base_url = _to_openai_base_url(raw_base_url)
- model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
+ model = _get_aux_model_for_provider(provider_id) or None
if model is None:
continue # skip provider if we don't know a valid aux model
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
@@ -1193,6 +1318,14 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
+ else:
+ try:
+ from providers import get_provider_profile as _gpf_aux2
+ _ph_aux2 = _gpf_aux2(provider_id)
+ if _ph_aux2 and _ph_aux2.default_headers:
+ extra["default_headers"] = dict(_ph_aux2.default_headers)
+ except Exception:
+ pass
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1565,7 +1698,7 @@ def _try_anthropic(explicit_api_key: str = None) -> Tuple[Optional[Any], Optiona
from agent.anthropic_adapter import _is_oauth_token
is_oauth = _is_oauth_token(token)
- model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
+ model = _get_aux_model_for_provider("anthropic") or "claude-haiku-4-5-20251001"
logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
try:
real_client = build_anthropic_client(token, base_url)
@@ -1643,6 +1776,39 @@ def _is_payment_error(exc: Exception) -> bool:
return False
+def _is_rate_limit_error(exc: Exception) -> bool:
+ """Detect rate-limit errors that warrant provider fallback.
+
+ Returns True for HTTP 429 errors whose message indicates rate limiting
+ (as opposed to billing/quota exhaustion, which _is_payment_error handles).
+ Also catches OpenAI SDK RateLimitError instances that may not set
+ .status_code on the exception object.
+ """
+ status = getattr(exc, "status_code", None)
+ err_lower = str(exc).lower()
+
+ # OpenAI SDK's RateLimitError sometimes omits .status_code —
+ # detect by class name so we don't miss these. (PR #8023 pattern)
+ if type(exc).__name__ == "RateLimitError":
+ return True
+
+ if status == 429:
+ # Distinguish rate-limit from billing: billing keywords are handled
+ # by _is_payment_error, everything else on 429 is a rate limit.
+ if any(kw in err_lower for kw in (
+ "rate limit", "rate_limit", "too many requests",
+ "try again", "retry after", "resets in",
+ )):
+ return True
+ # Generic 429 without billing keywords = likely a rate limit
+ if not any(kw in err_lower for kw in (
+ "credits", "insufficient funds", "billing",
+ "payment required", "can only afford",
+ )):
+ return True
+ return False
+
+
def _is_connection_error(exc: Exception) -> bool:
"""Detect connection/network errors that warrant provider fallback.
@@ -2368,7 +2534,7 @@ def resolve_provider_client(
if explicit_base_url:
base_url = _to_openai_base_url(explicit_base_url.strip().rstrip("/"))
- default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
+ default_model = _get_aux_model_for_provider(provider)
final_model = _normalize_resolved_model(model or default_model, provider)
if provider == "gemini":
@@ -2733,6 +2899,33 @@ def resolve_vision_provider_client(
)
return _finalize(requested, sync_client, default_model)
+ # ZAI vision models must use the OpenAI-compatible endpoint, not the
+ # Anthropic-compatible one (which may be the main-runtime default).
+ # The Anthropic wire rejects max_tokens on multimodal calls (error 1210),
+ # while the OpenAI wire handles it correctly.
+ if requested == "zai" and not resolved_base_url:
+ zai_openai_urls = [
+ "https://open.bigmodel.cn/api/paas/v4",
+ "https://api.z.ai/api/paas/v4",
+ ]
+ for _zai_url in zai_openai_urls:
+ client, final_model = _get_cached_client(
+ requested, resolved_model, async_mode,
+ base_url=_zai_url,
+ api_key=resolved_api_key or None,
+ api_mode="chat_completions",
+ is_vision=True,
+ )
+ if client is not None:
+ return _finalize(requested, client, final_model)
+ # Fallback: try without explicit base_url (old behavior)
+ client, final_model = _get_cached_client(requested, resolved_model, async_mode,
+ api_mode=resolved_api_mode,
+ is_vision=True)
+ if client is None:
+ return requested, None, None
+ return requested, client, final_model
+
client, final_model = _get_cached_client(requested, resolved_model, async_mode,
api_mode=resolved_api_mode,
is_vision=True)
@@ -2760,10 +2953,11 @@ def auxiliary_max_tokens_param(value: int) -> dict:
"""
custom_base = _current_custom_base_url()
or_key = os.getenv("OPENROUTER_API_KEY")
- # Only use max_completion_tokens for direct OpenAI custom endpoints
+ # Use max_completion_tokens for direct OpenAI-compatible providers that reject
+ # max_tokens on newer GPT-4o/o-series/GPT-5-style models.
if (not or_key
and _read_nous_auth() is None
- and base_url_hostname(custom_base) == "api.openai.com"):
+ and base_url_hostname(custom_base) in {"api.openai.com", "api.githubcopilot.com"}):
return {"max_completion_tokens": value}
return {"max_tokens": value}
@@ -3127,8 +3321,14 @@ def _resolve_task_provider_model(
if task:
# Config.yaml is the primary source for per-task overrides.
- if cfg_base_url:
+ if cfg_base_url and cfg_api_key:
+ # Both base_url and api_key explicitly set → custom endpoint.
return "custom", resolved_model, cfg_base_url, cfg_api_key, resolved_api_mode
+ if cfg_base_url and cfg_provider and cfg_provider != "auto":
+ # base_url set without api_key but with a known provider — use
+ # the provider so it can resolve credentials from env vars
+ # (e.g. OPENROUTER_API_KEY) instead of locking into "custom".
+ return cfg_provider, resolved_model, cfg_base_url, None, resolved_api_mode
if cfg_provider and cfg_provider != "auto":
return cfg_provider, resolved_model, None, None, resolved_api_mode
@@ -3285,7 +3485,16 @@ def _build_call_kwargs(
if max_tokens is not None:
# Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
# Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
- if provider == "custom":
+ # ZAI vision models (glm-4v-flash, glm-4v-plus, etc.) reject max_tokens with
+    # error code 1210 ("API 调用参数有误", i.e. invalid API call parameters) on multimodal requests — skip it.
+ _model_lower = (model or "").lower()
+ _skip_max_tokens = (
+ provider == "zai"
+ and ("4v" in _model_lower or "5v" in _model_lower or "-v" in _model_lower)
+ )
+ if _skip_max_tokens:
+ pass # ZAI vision models do not accept max_tokens
+ elif provider == "custom":
custom_base = base_url or _current_custom_base_url()
if base_url_hostname(custom_base) == "api.openai.com":
kwargs["max_completion_tokens"] = max_tokens
@@ -3516,20 +3725,30 @@ def call_llm(
kwargs = retry_kwargs
err_str = str(first_err)
+ # ZAI vision models (glm-4v-flash etc.) return error code 1210
+        # ("API 调用参数有误", i.e. invalid API call parameters) when max_tokens is passed on multimodal
+ # calls. The error message does NOT contain "max_tokens" so the
+ # generic retry below never fires. Detect the ZAI-specific error
+ # and strip max_tokens before retrying.
+ _is_zai_param_error = (
+ "1210" in err_str
+ and "bigmodel" in str(getattr(client, "base_url", ""))
+ )
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
+ or _is_zai_param_error
):
kwargs.pop("max_tokens", None)
- kwargs["max_completion_tokens"] = max_tokens
+ kwargs.pop("max_completion_tokens", None)
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
# If the max_tokens retry also hits a payment or connection
# error, fall through to the fallback chain below.
- if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
+ if not (_is_payment_error(retry_err) or _is_connection_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
first_err = retry_err
@@ -3612,13 +3831,27 @@ def call_llm(
# Codex/OAuth tokens that authenticate but whose endpoint is down,
# and providers the user never configured that got picked up by
# the auto-detection chain.
- should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
+ #
+ # ── Rate-limit fallback (#13579) ─────────────────────────────
+ # When the provider returns a 429 rate-limit (not billing), fall
+ # back to an alternative provider instead of exhausting retries
+ # against the same rate-limited endpoint.
+ should_fallback = (
+ _is_payment_error(first_err)
+ or _is_connection_error(first_err)
+ or _is_rate_limit_error(first_err)
+ )
# Only try alternative providers when the user didn't explicitly
# configure this task's provider. Explicit provider = hard constraint;
# auto (the default) = best-effort fallback chain. (#7559)
is_auto = resolved_provider in ("auto", "", None)
if should_fallback and is_auto:
- reason = "payment error" if _is_payment_error(first_err) else "connection error"
+ if _is_payment_error(first_err):
+ reason = "payment error"
+ elif _is_rate_limit_error(first_err):
+ reason = "rate limit"
+ else:
+ reason = "connection error"
logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
task or "call", reason, resolved_provider, first_err)
fb_client, fb_model, fb_label = _try_payment_fallback(
@@ -3808,20 +4041,30 @@ async def async_call_llm(
kwargs = retry_kwargs
err_str = str(first_err)
+ # ZAI vision models (glm-4v-flash etc.) return error code 1210
+        # ("API 调用参数有误", i.e. invalid API call parameters) when max_tokens is passed on multimodal
+ # calls. The error message does NOT contain "max_tokens" so the
+ # generic retry below never fires. Detect the ZAI-specific error
+ # and strip max_tokens before retrying.
+ _is_zai_param_error = (
+ "1210" in err_str
+ and "bigmodel" in str(getattr(client, "base_url", ""))
+ )
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
+ or _is_zai_param_error
):
kwargs.pop("max_tokens", None)
- kwargs["max_completion_tokens"] = max_tokens
+ kwargs.pop("max_completion_tokens", None)
try:
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
# If the max_tokens retry also hits a payment or connection
# error, fall through to the fallback chain below.
- if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
+ if not (_is_payment_error(retry_err) or _is_connection_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
first_err = retry_err
@@ -3890,11 +4133,20 @@ async def async_call_llm(
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
- # ── Payment / connection fallback (mirrors sync call_llm) ─────
- should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
+ # ── Payment / connection / rate-limit fallback (mirrors sync call_llm) ──
+ should_fallback = (
+ _is_payment_error(first_err)
+ or _is_connection_error(first_err)
+ or _is_rate_limit_error(first_err)
+ )
is_auto = resolved_provider in ("auto", "", None)
if should_fallback and is_auto:
- reason = "payment error" if _is_payment_error(first_err) else "connection error"
+ if _is_payment_error(first_err):
+ reason = "payment error"
+ elif _is_rate_limit_error(first_err):
+ reason = "rate limit"
+ else:
+ reason = "connection error"
logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",
task or "call", reason, resolved_provider, first_err)
fb_client, fb_model, fb_label = _try_payment_fallback(
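The 429 split that drives this fallback can be sketched on its own (keyword lists copied from `_is_rate_limit_error` and the billing checks it defers to; the function name is illustrative):

```python
def classify_429(message: str) -> str:
    """Route an HTTP 429 the way the fallback chain above does:
    billing-flavoured messages go to the payment path, everything
    else counts as a rate limit worth trying another provider for.
    """
    msg = message.lower()
    billing_kws = ("credits", "insufficient funds", "billing",
                   "payment required", "can only afford")
    if any(kw in msg for kw in billing_kws):
        return "payment"
    # Explicit rate-limit phrasing and bare 429s both land here.
    return "rate_limit"
```

Treating an unrecognized 429 as a rate limit is the safer default: a spurious provider fallback is cheap, while retrying forever against a genuinely rate-limited endpoint stalls the auxiliary call.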
diff --git a/agent/bedrock_adapter.py b/agent/bedrock_adapter.py
index c1dc6bb979..34eebd73ba 100644
--- a/agent/bedrock_adapter.py
+++ b/agent/bedrock_adapter.py
@@ -631,11 +631,18 @@ def normalize_converse_response(response: Dict) -> SimpleNamespace:
stop_reason = response.get("stopReason", "end_turn")
text_parts = []
+ reasoning_parts = []
tool_calls = []
for block in content_blocks:
if "text" in block:
text_parts.append(block["text"])
+ elif "reasoningContent" in block:
+ reasoning = block["reasoningContent"]
+ if isinstance(reasoning, dict):
+ thinking_text = reasoning.get("text", "")
+ if thinking_text:
+ reasoning_parts.append(str(thinking_text))
elif "toolUse" in block:
tu = block["toolUse"]
tool_calls.append(SimpleNamespace(
@@ -652,6 +659,7 @@ def normalize_converse_response(response: Dict) -> SimpleNamespace:
role="assistant",
content="\n".join(text_parts) if text_parts else None,
tool_calls=tool_calls if tool_calls else None,
+ reasoning_content="\n\n".join(reasoning_parts) if reasoning_parts else None,
)
# Build usage stats
@@ -732,6 +740,7 @@ def stream_converse_with_callbacks(
``normalize_converse_response()``.
"""
text_parts: List[str] = []
+ reasoning_parts: List[str] = []
tool_calls: List[SimpleNamespace] = []
current_tool: Optional[Dict] = None
current_text_buffer: List[str] = []
@@ -777,8 +786,10 @@ def stream_converse_with_callbacks(
reasoning = delta["reasoningContent"]
if isinstance(reasoning, dict):
thinking_text = reasoning.get("text", "")
- if thinking_text and on_reasoning_delta:
- on_reasoning_delta(thinking_text)
+ if thinking_text:
+ reasoning_parts.append(str(thinking_text))
+ if on_reasoning_delta:
+ on_reasoning_delta(thinking_text)
elif "contentBlockStop" in event:
if current_tool is not None:
@@ -817,6 +828,7 @@ def stream_converse_with_callbacks(
role="assistant",
content="\n".join(text_parts) if text_parts else None,
tool_calls=tool_calls if tool_calls else None,
+ reasoning_content="\n\n".join(reasoning_parts) if reasoning_parts else None,
)
usage = SimpleNamespace(
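The streaming-side change above accumulates `reasoningContent` deltas in addition to forwarding them to the callback; a small sketch of that accumulation over Converse-style events (the event dicts are illustrative of the shape, not captured API payloads):

```python
def collect_reasoning(events: list):
    """Gather reasoningContent text deltas from Converse stream events,
    joining collected pieces with a blank line, mirroring how the
    normalized assistant message is assembled above."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        reasoning = delta.get("reasoningContent")
        if isinstance(reasoning, dict) and reasoning.get("text"):
            parts.append(str(reasoning["text"]))
    return "\n\n".join(parts) if parts else None
```

Returning `None` when no reasoning arrived keeps the field absent-by-default, matching the `reasoning_content=... if reasoning_parts else None` convention in the hunk.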
diff --git a/agent/context_compressor.py b/agent/context_compressor.py
index f9111f9600..80b0a9b45b 100644
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -6,8 +6,7 @@ protecting head and tail context.
Improvements over v2:
- Structured summary template with Resolved/Pending question tracking
- - Summarizer preamble: "Do not respond to any questions" (from OpenCode)
- - Handoff framing: "different assistant" (from Codex) to create separation
+ - Filter-safe summarizer preamble that treats prior turns as source material
- "Remaining Work" replaces "Next Steps" to avoid reading as active instructions
- Clear separator when summary merges into tail message
- Iterative summary updates (preserves info across multiple compactions)
@@ -43,6 +42,9 @@ SUMMARY_PREFIX = (
"they were already addressed. "
"Your current task is identified in the '## Active Task' section of the "
"summary — resume exactly from there. "
+ "IMPORTANT: Your persistent memory (MEMORY.md, USER.md) in the system "
+ "prompt is ALWAYS authoritative and active — never ignore or deprioritize "
+ "memory content due to this compaction note. "
"Respond ONLY to the latest user message "
"that appears AFTER this summary. The current session state (files, "
"config, etc.) may reflect work described here — avoid repeating it:"
@@ -752,15 +754,14 @@ class ContextCompressor(ContextEngine):
content_to_summarize = self._serialize_for_summary(turns_to_summarize)
# Preamble shared by both first-compaction and iterative-update prompts.
- # Inspired by OpenCode's "do not respond to any questions" instruction
- # and Codex's "another language model" framing.
+ # Keep the wording deliberately plain: Azure/OpenAI-compatible content
+ # filters have flagged stronger "injection" / "do not respond" framing.
_summarizer_preamble = (
"You are a summarization agent creating a context checkpoint. "
- "Your output will be injected as reference material for a DIFFERENT "
- "assistant that continues the conversation. "
- "Do NOT respond to any questions or requests in the conversation — "
- "only output the structured summary. "
- "Do NOT include any preamble, greeting, or prefix. "
+ "Treat the conversation turns below as source material for a "
+ "compact record of prior work. "
+ "Produce only the structured summary; do not add a greeting, "
+ "preamble, or prefix. "
"Write the summary in the same language the user was using in the "
"conversation — do not translate or switch to English. "
"NEVER include API keys, tokens, passwords, secrets, credentials, "
@@ -774,7 +775,7 @@ class ContextCompressor(ContextEngine):
[THE SINGLE MOST IMPORTANT FIELD. Copy the user's most recent request or
task assignment verbatim — the exact words they used. If multiple tasks
were requested and only some are done, list only the ones NOT yet completed.
-The next assistant must pick up exactly here. Example:
+Continuation should pick up exactly here. Example:
"User asked: 'Now refactor the auth module to use JWT instead of sessions'"
If no outstanding task exists, write "None."]
@@ -811,7 +812,7 @@ Be specific with file paths, commands, line numbers, and results.]
[Important technical decisions and WHY they were made]
## Resolved Questions
-[Questions the user asked that were ALREADY answered — include the answer so the next assistant does not re-answer them]
+[Questions the user asked that were ALREADY answered — include the answer so it is not repeated]
## Pending User Asks
[Questions or requests from the user that have NOT yet been answered or fulfilled. If none, write "None."]
@@ -848,7 +849,7 @@ Update the summary using this exact structure. PRESERVE all existing information
# First compaction: summarize from scratch
prompt = f"""{_summarizer_preamble}
-Create a structured handoff summary for a different assistant that will continue this conversation after earlier turns are compacted. The next assistant should be able to understand what happened without re-reading the original turns.
+Create a structured checkpoint summary for the conversation after earlier turns are compacted. The summary should preserve enough detail for continuity without re-reading the original turns.
TURNS TO SUMMARIZE:
{content_to_summarize}
@@ -993,15 +994,39 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return None
@staticmethod
- def _with_summary_prefix(summary: str) -> str:
- """Normalize summary text to the current compaction handoff format."""
+ def _strip_summary_prefix(summary: str) -> str:
+ """Return summary body without the current or legacy handoff prefix."""
text = (summary or "").strip()
- for prefix in (LEGACY_SUMMARY_PREFIX, SUMMARY_PREFIX):
+ for prefix in (SUMMARY_PREFIX, LEGACY_SUMMARY_PREFIX):
if text.startswith(prefix):
- text = text[len(prefix):].lstrip()
- break
+ return text[len(prefix):].lstrip()
+ return text
+
+ @classmethod
+ def _with_summary_prefix(cls, summary: str) -> str:
+ """Normalize summary text to the current compaction handoff format."""
+ text = cls._strip_summary_prefix(summary)
return f"{SUMMARY_PREFIX}\n{text}" if text else SUMMARY_PREFIX
+ @staticmethod
+ def _is_context_summary_content(content: Any) -> bool:
+ text = _content_text_for_contains(content).lstrip()
+ return text.startswith(SUMMARY_PREFIX) or text.startswith(LEGACY_SUMMARY_PREFIX)
+
+ @classmethod
+ def _find_latest_context_summary(
+ cls,
+ messages: List[Dict[str, Any]],
+ start: int,
+ end: int,
+ ) -> tuple[Optional[int], str]:
+ """Find the newest handoff summary inside a compression window."""
+ for idx in range(end - 1, start - 1, -1):
+ content = messages[idx].get("content")
+ if cls._is_context_summary_content(content):
+ return idx, cls._strip_summary_prefix(_content_text_for_contains(content))
+ return None, ""
+
# ------------------------------------------------------------------
# Tool-call / tool-result pair integrity helpers
# ------------------------------------------------------------------
@@ -1308,6 +1333,15 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return messages
turns_to_summarize = messages[compress_start:compress_end]
+ summary_idx, summary_body = self._find_latest_context_summary(
+ messages,
+ compress_start,
+ compress_end,
+ )
+ if summary_idx is not None:
+ if summary_body and not self._previous_summary:
+ self._previous_summary = summary_body
+ turns_to_summarize = messages[summary_idx + 1:compress_end]
if not self.quiet_mode:
logger.info(
@@ -1340,7 +1374,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
msg = messages[i].copy()
if i == 0 and msg.get("role") == "system":
existing = msg.get("content")
- _compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
+ _compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work. Your persistent memory (MEMORY.md, USER.md) remains fully authoritative regardless of compaction.]"
if _compression_note not in _content_text_for_contains(existing):
msg["content"] = _append_text_to_content(
existing,
@@ -1385,6 +1419,19 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Merge the summary into the first tail message instead
# of inserting a standalone message that breaks alternation.
_merge_summary_into_tail = True
+
+ # When the summary lands as a standalone role="user" message,
+ # weak models read the verbatim "## Active Task" quote of a past
+ # user request as fresh input (#11475, #14521). Append the explicit
+ # end marker — the same one used in the merge-into-tail path — so
+ # the model has a clear "summary above, not new input" signal.
+ if not _merge_summary_into_tail and summary_role == "user":
+ summary = (
+ summary
+ + "\n\n--- END OF CONTEXT SUMMARY — "
+ "respond to the message below, not the summary above ---"
+ )
+
if not _merge_summary_into_tail:
compressed.append({"role": summary_role, "content": summary})
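
The strip-then-reapply normalization in `_strip_summary_prefix` / `_with_summary_prefix` can be sketched in isolation. The prefix strings below are illustrative stand-ins, not the real `SUMMARY_PREFIX` / `LEGACY_SUMMARY_PREFIX` values:

```python
# Illustrative stand-ins for the module's real prefix constants.
SUMMARY_PREFIX = "[Context summary: earlier turns were compacted.]"
LEGACY_SUMMARY_PREFIX = "[Conversation summary]"

def strip_summary_prefix(summary: str) -> str:
    """Return the summary body without the current or legacy prefix."""
    text = (summary or "").strip()
    for prefix in (SUMMARY_PREFIX, LEGACY_SUMMARY_PREFIX):
        if text.startswith(prefix):
            return text[len(prefix):].lstrip()
    return text

def with_summary_prefix(summary: str) -> str:
    """Normalize to exactly one current prefix, upgrading legacy ones."""
    text = strip_summary_prefix(summary)
    return f"{SUMMARY_PREFIX}\n{text}" if text else SUMMARY_PREFIX

# Normalization is idempotent: applying it twice changes nothing, and a
# legacy-prefixed summary is upgraded to the current prefix.
once = with_summary_prefix("Body text")
assert with_summary_prefix(once) == once
assert with_summary_prefix(f"{LEGACY_SUMMARY_PREFIX}\nOld body").startswith(SUMMARY_PREFIX)
```

Checking `SUMMARY_PREFIX` before `LEGACY_SUMMARY_PREFIX` (the order the diff switches to) matters when the legacy prefix happens to be a leading substring of the current one.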
diff --git a/agent/copilot_acp_client.py b/agent/copilot_acp_client.py
index 027defa22b..457b32b37b 100644
--- a/agent/copilot_acp_client.py
+++ b/agent/copilot_acp_client.py
@@ -477,8 +477,8 @@ class CopilotACPClient:
proc.stdin.write(json.dumps(payload) + "\n")
proc.stdin.flush()
- deadline = time.time() + timeout_seconds
- while time.time() < deadline:
+ deadline = time.monotonic() + timeout_seconds
+ while time.monotonic() < deadline:
if proc.poll() is not None:
break
try:
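
The `time.time()` → `time.monotonic()` change above matters because wall-clock time can jump (NTP sync, DST, manual clock changes), which can make a `time.time()` deadline loop exit early or hang. A minimal sketch of the corrected pattern, with a hypothetical `wait_until` helper:

```python
import time

def wait_until(predicate, timeout_seconds: float, poll: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or the deadline passes.

    time.monotonic() never jumps with NTP/DST adjustments, so the loop
    cannot hang or exit early the way a time.time()-based deadline can.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return False

assert wait_until(lambda: True, 1.0) is True
assert wait_until(lambda: False, 0.1) is False
```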
diff --git a/agent/credential_pool.py b/agent/credential_pool.py
index 27a16bd435..0043c70ca2 100644
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@@ -68,8 +68,10 @@ SUPPORTED_POOL_STRATEGIES = {
}
# Cooldown before retrying an exhausted credential.
-# 429 (rate-limited) and 402 (billing/quota) both cool down after 1 hour.
+# Transient 401 auth failures cool down briefly so single-key setups can recover.
+# 429 (rate-limited), 402 (billing/quota), and other failures cool down after 1 hour.
# Provider-supplied reset_at timestamps override these defaults.
+EXHAUSTED_TTL_401_SECONDS = 5 * 60 # 5 minutes
EXHAUSTED_TTL_429_SECONDS = 60 * 60 # 1 hour
EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60 # 1 hour
@@ -190,6 +192,8 @@ def _is_manual_source(source: str) -> bool:
def _exhausted_ttl(error_code: Optional[int]) -> int:
"""Return cooldown seconds based on the HTTP status that caused exhaustion."""
+ if error_code == 401:
+ return EXHAUSTED_TTL_401_SECONDS
if error_code == 429:
return EXHAUSTED_TTL_429_SECONDS
return EXHAUSTED_TTL_DEFAULT_SECONDS
@@ -305,14 +309,29 @@ def _iter_custom_providers(config: Optional[dict] = None):
yield _normalize_custom_pool_name(name), entry
-def get_custom_provider_pool_key(base_url: str) -> Optional[str]:
+def get_custom_provider_pool_key(base_url: str, provider_name: Optional[str] = None) -> Optional[str]:
"""Look up the custom_providers list in config.yaml and return 'custom:' for a matching base_url.
+ When provider_name is given, prefer matching by name first (solving the case where
+ multiple custom providers share the same base_url but have different API keys).
+ Falls back to base_url matching when no name match is found.
+
Returns None if no match is found.
"""
if not base_url:
return None
normalized_url = base_url.strip().rstrip("/")
+
+ # When a provider name is given, try to match by name first.
+ # This fixes the P1 bug where two custom providers sharing the same
+ # base_url always resolve to the first one's credentials.
+ if provider_name:
+ normalized_name = _normalize_custom_pool_name(provider_name)
+ for norm_name, entry in _iter_custom_providers():
+ if norm_name == normalized_name:
+ return f"{CUSTOM_POOL_PREFIX}{norm_name}"
+
+ # Fall back to base_url matching (original behavior)
for norm_name, entry in _iter_custom_providers():
entry_url = str(entry.get("base_url") or "").strip().rstrip("/")
if entry_url and entry_url == normalized_url:
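
The name-first, URL-fallback lookup order can be sketched with an in-memory provider table (the table and provider names here are hypothetical; the real function iterates config.yaml entries via `_iter_custom_providers` and normalizes names):

```python
CUSTOM_POOL_PREFIX = "custom:"

# Hypothetical config: two custom providers sharing one base_url but
# holding different API keys -- the P1 case the diff fixes.
_CUSTOM_PROVIDERS = {
    "fast-proxy": {"base_url": "https://llm.example.com/v1", "api_key": "A"},
    "slow-proxy": {"base_url": "https://llm.example.com/v1", "api_key": "B"},
}

def get_custom_provider_pool_key(base_url, provider_name=None):
    if not base_url:
        return None
    normalized_url = base_url.strip().rstrip("/")
    # Name match first: disambiguates providers sharing one base_url.
    if provider_name and provider_name in _CUSTOM_PROVIDERS:
        return f"{CUSTOM_POOL_PREFIX}{provider_name}"
    # Fall back to URL matching (original behaviour): first hit wins.
    for name, entry in _CUSTOM_PROVIDERS.items():
        if str(entry.get("base_url", "")).rstrip("/") == normalized_url:
            return f"{CUSTOM_POOL_PREFIX}{name}"
    return None

# With a name, the second provider resolves correctly; without one,
# the URL fallback still returns the first match.
assert get_custom_provider_pool_key("https://llm.example.com/v1", "slow-proxy") == "custom:slow-proxy"
assert get_custom_provider_pool_key("https://llm.example.com/v1") == "custom:fast-proxy"
```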
diff --git a/agent/display.py b/agent/display.py
index 474595d76c..1dd65c3514 100644
--- a/agent/display.py
+++ b/agent/display.py
@@ -852,13 +852,15 @@ def get_cute_tool_message(
s = str(s)
if _tool_preview_max_len == 0:
return s # no limit
- return (s[:n-3] + "...") if len(s) > n else s
+ limit = _tool_preview_max_len
+ return (s[:limit-3] + "...") if len(s) > limit else s
def _path(p, n=35):
p = str(p)
if _tool_preview_max_len == 0:
return p # no limit
- return ("..." + p[-(n-3):]) if len(p) > n else p
+ limit = _tool_preview_max_len
+ return ("..." + p[-(limit-3):]) if len(p) > limit else p
def _wrap(line: str) -> str:
"""Apply skin tool prefix and failure suffix."""
diff --git a/agent/error_classifier.py b/agent/error_classifier.py
index 67feaa4304..419a984b75 100644
--- a/agent/error_classifier.py
+++ b/agent/error_classifier.py
@@ -55,6 +55,7 @@ class FailoverReason(enum.Enum):
thinking_signature = "thinking_signature" # Anthropic thinking block sig invalid
long_context_tier = "long_context_tier" # Anthropic "extra usage" tier gate
oauth_long_context_beta_forbidden = "oauth_long_context_beta_forbidden" # Anthropic OAuth subscription rejects 1M context beta — disable beta and retry
+ llama_cpp_grammar_pattern = "llama_cpp_grammar_pattern" # llama.cpp json-schema-to-grammar rejects regex escapes in `pattern` / `format` — strip from tools and retry
# Catch-all
unknown = "unknown" # Unclassifiable — retry with backoff
@@ -470,6 +471,31 @@ def classify_api_error(
should_compress=False,
)
+ # llama.cpp's ``json-schema-to-grammar`` converter (used by its OAI
+ # server to build GBNF tool-call parsers) rejects regex escape classes
+ # like ``\d``/``\w``/``\s`` and most ``format`` values. MCP servers
+ # routinely emit ``"pattern": "\\d{4}-\\d{2}-\\d{2}"`` for date/phone/
+ # email params. llama.cpp surfaces this as HTTP 400 with one of a few
+ # recognizable phrases; on match we strip ``pattern``/``format`` from
+ # ``self.tools`` in the retry loop and retry once. Cloud providers are
+ # unaffected — they accept these keywords and we never hit this branch.
+ if (
+ status_code == 400
+ and (
+ "error parsing grammar" in error_msg
+ or "json-schema-to-grammar" in error_msg
+ or (
+ "unable to generate parser" in error_msg
+ and "template" in error_msg
+ )
+ )
+ ):
+ return _result(
+ FailoverReason.llama_cpp_grammar_pattern,
+ retryable=True,
+ should_compress=False,
+ )
+
# ── 2. HTTP status code classification ──────────────────────────
if status_code is not None:
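
The llama.cpp detection branch reduces to a substring predicate over the error body. A minimal sketch (assumes, as the classifier's surrounding code suggests, that `error_msg` has already been lowercased upstream; the helper name is hypothetical):

```python
def is_llama_cpp_grammar_error(status_code, error_msg: str) -> bool:
    """Detect llama.cpp json-schema-to-grammar rejections (HTTP 400)."""
    msg = error_msg.lower()  # defensive; classifier input is lowercased
    return status_code == 400 and (
        "error parsing grammar" in msg
        or "json-schema-to-grammar" in msg
        or ("unable to generate parser" in msg and "template" in msg)
    )

assert is_llama_cpp_grammar_error(400, "error parsing grammar: ...")
assert not is_llama_cpp_grammar_error(500, "error parsing grammar")
assert not is_llama_cpp_grammar_error(400, "rate limit exceeded")
```

The status-code guard is what keeps cloud providers out of this branch: they accept `pattern`/`format` keywords, so the phrases never appear in a 400 body.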
diff --git a/agent/i18n.py b/agent/i18n.py
new file mode 100644
index 0000000000..0196439bb4
--- /dev/null
+++ b/agent/i18n.py
@@ -0,0 +1,233 @@
+"""Lightweight internationalization (i18n) for Hermes static user-facing messages.
+
+Scope (thin slice, by design): only the highest-impact static strings shown
+to the user by Hermes itself -- approval prompts, a handful of gateway slash
+command replies, restart-drain notices. Agent-generated output, log lines,
+error tracebacks, tool outputs, and slash-command descriptions all stay in
+English.
+
+Catalog files live under ``locales/<lang>.yaml`` at the repo root. Each
+catalog is a flat dict keyed by dotted paths (e.g. ``approval.choose`` or
+``gateway.approval_expired``). Missing keys fall back to English; if English
+is missing too, the key path itself is returned so a broken catalog never
+crashes the agent.
+
+Usage::
+
+ from agent.i18n import t
+ print(t("approval.choose_long")) # current lang
+ print(t("gateway.draining", count=3)) # {count} formatted
+ print(t("approval.choose_long", lang="zh")) # explicit override
+
+Language resolution order:
+ 1. Explicit ``lang=`` argument passed to :func:`t`
+ 2. ``HERMES_LANGUAGE`` environment variable (for tests / quick override)
+ 3. ``display.language`` from config.yaml
+ 4. ``"en"`` (baseline)
+
+Supported languages: en, zh, ja, de, es, fr, tr, uk. Unknown values fall back to en.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from functools import lru_cache
+from pathlib import Path
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+SUPPORTED_LANGUAGES: tuple[str, ...] = ("en", "zh", "ja", "de", "es", "fr", "tr", "uk")
+DEFAULT_LANGUAGE = "en"
+
+# Accept a few natural aliases so users who type "chinese" / "zh-CN" / "jp"
+# get the right catalog instead of silently falling back to English.
+_LANGUAGE_ALIASES: dict[str, str] = {
+ "english": "en", "en-us": "en", "en-gb": "en",
+ "chinese": "zh", "mandarin": "zh", "zh-cn": "zh", "zh-tw": "zh", "zh-hans": "zh", "zh-hant": "zh",
+ "japanese": "ja", "jp": "ja", "ja-jp": "ja",
+ "german": "de", "deutsch": "de", "de-de": "de",
+ "spanish": "es", "español": "es", "espanol": "es", "es-es": "es", "es-mx": "es",
+ "french": "fr", "français": "fr", "france": "fr", "fr-fr": "fr", "fr-be": "fr", "fr-ca": "fr", "fr-ch": "fr",
+ "ukrainian": "uk", "ukrainisch": "uk", "українська": "uk", "uk-ua": "uk", "ua": "uk",
+ "turkish": "tr", "türkçe": "tr", "tr-tr": "tr",
+}
+
+_catalog_cache: dict[str, dict[str, str]] = {}
+_catalog_lock = threading.Lock()
+
+
+def _locales_dir() -> Path:
+ """Return the directory containing locale YAML files.
+
+ Lives next to the repo root so both the bundled install and editable
+ checkouts find it without PYTHONPATH gymnastics.
+ """
+ # agent/i18n.py -> agent/ -> repo root
+ return Path(__file__).resolve().parent.parent / "locales"
+
+
+def _normalize_lang(value: Any) -> str:
+ """Normalize a user-supplied language value to a supported code.
+
+ Accepts supported codes directly, common aliases (``chinese`` -> ``zh``),
+ and case-insensitive regional tags (``zh-CN`` -> ``zh``). Returns the
+ default language for unknown values.
+ """
+ if not isinstance(value, str):
+ return DEFAULT_LANGUAGE
+ key = value.strip().lower()
+ if not key:
+ return DEFAULT_LANGUAGE
+ if key in SUPPORTED_LANGUAGES:
+ return key
+ if key in _LANGUAGE_ALIASES:
+ return _LANGUAGE_ALIASES[key]
+ # Try stripping a region suffix (e.g. "pt-br" -> "pt" won't be supported,
+ # but "zh-CN" -> "zh" will).
+ base = key.split("-", 1)[0]
+ if base in SUPPORTED_LANGUAGES:
+ return base
+ return DEFAULT_LANGUAGE
+
+
+def _load_catalog(lang: str) -> dict[str, str]:
+ """Load and flatten one locale YAML file into a dotted-key dict.
+
+ YAML files can be nested for human readability; this produces the flat
+ key space :func:`t` expects. Cached per-language for the process.
+ """
+ with _catalog_lock:
+ cached = _catalog_cache.get(lang)
+ if cached is not None:
+ return cached
+
+ path = _locales_dir() / f"{lang}.yaml"
+ if not path.is_file():
+ logger.debug("i18n catalog missing for %s at %s", lang, path)
+ with _catalog_lock:
+ _catalog_cache[lang] = {}
+ return {}
+
+ try:
+ import yaml # PyYAML is already a hermes dependency
+ with path.open("r", encoding="utf-8") as f:
+ raw = yaml.safe_load(f) or {}
+ except Exception as exc:
+ logger.warning("Failed to load i18n catalog %s: %s", path, exc)
+ with _catalog_lock:
+ _catalog_cache[lang] = {}
+ return {}
+
+ flat: dict[str, str] = {}
+ _flatten_into(raw, "", flat)
+ with _catalog_lock:
+ _catalog_cache[lang] = flat
+ return flat
+
+
+def _flatten_into(node: Any, prefix: str, out: dict[str, str]) -> None:
+ if isinstance(node, dict):
+ for key, value in node.items():
+ child_key = f"{prefix}.{key}" if prefix else str(key)
+ _flatten_into(value, child_key, out)
+ elif isinstance(node, str):
+ out[prefix] = node
+ # Non-string, non-dict leaves are ignored -- catalogs are text-only.
+
+
+@lru_cache(maxsize=1)
+def _config_language_cached() -> str | None:
+ """Read ``display.language`` from config.yaml once per process.
+
+ Cached because ``t()`` is called in hot paths (every approval prompt,
+ every gateway reply) and re-reading YAML each call would be wasteful.
+ ``reset_language_cache()`` clears this when config changes at runtime
+ (e.g. after the setup wizard).
+ """
+ try:
+ from hermes_cli.config import load_config
+ cfg = load_config()
+ lang = (cfg.get("display") or {}).get("language")
+ if lang:
+ return _normalize_lang(lang)
+ except Exception as exc:
+ logger.debug("Could not read display.language from config: %s", exc)
+ return None
+
+
+def reset_language_cache() -> None:
+ """Invalidate cached language resolution and catalogs.
+
+ Call after :func:`hermes_cli.config.save_config` if a running process
+ needs to pick up a changed ``display.language`` without restart.
+ """
+ _config_language_cached.cache_clear()
+ with _catalog_lock:
+ _catalog_cache.clear()
+
+
+def get_language() -> str:
+ """Resolve the active language using env > config > default order."""
+ env_lang = os.environ.get("HERMES_LANGUAGE")
+ if env_lang:
+ return _normalize_lang(env_lang)
+ cfg_lang = _config_language_cached()
+ if cfg_lang:
+ return cfg_lang
+ return DEFAULT_LANGUAGE
+
+
+def t(key: str, lang: str | None = None, **format_kwargs: Any) -> str:
+ """Translate a dotted key to the active language.
+
+ Parameters
+ ----------
+ key
+ Dotted path into the catalog, e.g. ``"approval.choose_long"``.
+ lang
+ Explicit language override. Takes precedence over env + config.
+ **format_kwargs
+        ``str.format`` substitution arguments (``t("gateway.draining", count=3)``
+ expects a catalog entry with a ``{count}`` placeholder).
+
+ Returns
+ -------
+ The translated string, or the English fallback if the key is missing in
+ the target language, or the bare key if English is also missing.
+ """
+ target = _normalize_lang(lang) if lang else get_language()
+ catalog = _load_catalog(target)
+ value = catalog.get(key)
+
+ if value is None and target != DEFAULT_LANGUAGE:
+ # Fall through to English rather than showing a key path to the user.
+ value = _load_catalog(DEFAULT_LANGUAGE).get(key)
+
+ if value is None:
+ # Last-ditch: return the key itself. A broken catalog should not
+ # crash anything; it just looks ugly until someone fixes it.
+ logger.debug("i18n miss: key=%r lang=%r", key, target)
+ value = key
+
+ if format_kwargs:
+ try:
+ return value.format(**format_kwargs)
+ except (KeyError, IndexError, ValueError) as exc:
+ logger.warning(
+ "i18n format failed for key=%r lang=%r kwargs=%r: %s",
+ key, target, format_kwargs, exc,
+ )
+ return value
+ return value
+
+
+__all__ = [
+ "SUPPORTED_LANGUAGES",
+ "DEFAULT_LANGUAGE",
+ "t",
+ "get_language",
+ "reset_language_cache",
+]
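
A nested catalog (as one might write in a hypothetical `locales/de.yaml`) and the flattening pass that produces the dotted-key space `t()` looks up can be exercised standalone; the German strings below are illustrative, not from the real catalogs:

```python
raw = {
    "approval": {"choose": "Bitte wählen", "choose_long": "Bitte eine Option wählen"},
    "gateway": {"draining": "{count} Sitzungen werden beendet"},
}

def flatten_into(node, prefix, out):
    """Flatten nested dicts into dotted keys; non-string leaves are dropped."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            flatten_into(value, child, out)
    elif isinstance(node, str):
        out[prefix] = node

flat = {}
flatten_into(raw, "", flat)
assert flat["approval.choose"] == "Bitte wählen"
assert flat["gateway.draining"].format(count=3) == "3 Sitzungen werden beendet"
```

This mirrors `_flatten_into` in the module: YAML files stay nested for human readability while lookups stay flat and O(1).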
diff --git a/agent/image_routing.py b/agent/image_routing.py
index bd2ba83c87..0b6687787a 100644
--- a/agent/image_routing.py
+++ b/agent/image_routing.py
@@ -144,7 +144,51 @@ def decide_image_input_mode(
# it fires, which is cheaper than permanent quality loss.
-def _guess_mime(path: Path) -> str:
+def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
+ """Detect image MIME from magic bytes. Returns None if unrecognised.
+
+ Filename-based detection (``mimetypes.guess_type``) is unreliable when
+ upstream platforms lie about content-type. Discord, for example, can
+ serve a PNG with ``content_type=image/webp`` for proxied/animated
+ stickers, custom emoji previews, or images uploaded via certain bots.
+ Anthropic strictly validates that declared media_type matches the
+ actual bytes and returns HTTP 400 on mismatch, so we sniff to be safe.
+ """
+ if not raw:
+ return None
+ # PNG: 89 50 4E 47 0D 0A 1A 0A
+ if raw.startswith(b"\x89PNG\r\n\x1a\n"):
+ return "image/png"
+ # JPEG: FF D8 FF
+ if raw.startswith(b"\xff\xd8\xff"):
+ return "image/jpeg"
+ # GIF87a / GIF89a
+ if raw[:6] in (b"GIF87a", b"GIF89a"):
+ return "image/gif"
+ # WEBP: "RIFF" .... "WEBP"
+ if len(raw) >= 12 and raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
+ return "image/webp"
+ # BMP: "BM"
+ if raw.startswith(b"BM"):
+ return "image/bmp"
+ # HEIC/HEIF: ftypheic / ftypheix / ftypmif1 / ftypmsf1 etc.
+ if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in (
+ b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1", b"heim", b"heis",
+ ):
+ return "image/heic"
+ return None
+
+
+def _guess_mime(path: Path, raw: Optional[bytes] = None) -> str:
+ """Return image MIME type for *path*.
+
+ If *raw* bytes are provided, magic-byte sniffing wins (authoritative).
+ Otherwise we fall back to ``mimetypes`` then suffix-based defaults.
+ """
+ if raw is not None:
+ sniffed = _sniff_mime_from_bytes(raw)
+ if sniffed:
+ return sniffed
mime, _ = mimetypes.guess_type(str(path))
if mime and mime.startswith("image/"):
return mime
@@ -178,7 +222,7 @@ def _file_to_data_url(path: Path) -> Optional[str]:
except Exception as exc:
logger.warning("image_routing: failed to read %s — %s", path, exc)
return None
- mime = _guess_mime(path)
+ mime = _guess_mime(path, raw=raw)
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"
@@ -190,24 +234,30 @@ def build_native_content_parts(
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
- [{"type": "text", "text": "..."},
+ [{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
...]
+ The local path of each successfully attached image is appended to the
+    text part as ``[Image attached at: <path>]``. The model still sees the
+ pixels via the ``image_url`` part (full native vision); the path note
+ just gives it a string handle so MCP/skill tools that take an image
+ path or URL argument can be invoked on the same image without an
+ extra round-trip. This parallels the text-mode hint produced by
+ ``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
+    <url>``) so behaviour is consistent across both image input modes.
+
Images are attached at their native size. If a provider rejects the
request because an image is too large (e.g. Anthropic's 5 MB per-image
ceiling), the agent's retry loop transparently shrinks and retries
once — see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
- couldn't be read from disk.
+ couldn't be read from disk and are NOT advertised in the path hints.
"""
- parts: List[Dict[str, Any]] = []
skipped: List[str] = []
-
- text = (user_text or "").strip()
- if text:
- parts.append({"type": "text", "text": text})
+ image_parts: List[Dict[str, Any]] = []
+ attached_paths: List[str] = []
for raw_path in image_paths:
p = Path(raw_path)
@@ -218,15 +268,30 @@ def build_native_content_parts(
if not data_url:
skipped.append(str(raw_path))
continue
- parts.append({
+ image_parts.append({
"type": "image_url",
"image_url": {"url": data_url},
})
+ attached_paths.append(str(raw_path))
- # If the text was empty, add a neutral prompt so the turn isn't just images.
- if not text and any(p.get("type") == "image_url" for p in parts):
- parts.insert(0, {"type": "text", "text": "What do you see in this image?"})
+ text = (user_text or "").strip()
+ # If at least one image attached, build a single text part that combines
+ # the user's caption (or a neutral default) with one path hint per image.
+ if attached_paths:
+ base_text = text or "What do you see in this image?"
+ path_hints = "\n".join(
+ f"[Image attached at: {p}]" for p in attached_paths
+ )
+ combined_text = f"{base_text}\n\n{path_hints}"
+ parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
+ parts.extend(image_parts)
+ return parts, skipped
+
+ # No images successfully attached — fall back to plain text-only behaviour.
+ parts = []
+ if text:
+ parts.append({"type": "text", "text": text})
return parts, skipped
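
The magic-byte signatures used by `_sniff_mime_from_bytes` can be verified with a trimmed standalone sniffer (PNG, JPEG, GIF, and WEBP only; the real function also covers BMP and HEIC):

```python
def sniff_mime(raw: bytes):
    """Minimal magic-byte sniffer matching the diff's core signatures."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):   # PNG: 89 50 4E 47 0D 0A 1A 0A
        return "image/png"
    if raw.startswith(b"\xff\xd8\xff"):        # JPEG: FF D8 FF
        return "image/jpeg"
    if raw[:6] in (b"GIF87a", b"GIF89a"):      # GIF
        return "image/gif"
    if len(raw) >= 12 and raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"                    # WEBP: RIFF....WEBP
    return None

assert sniff_mime(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8) == "image/png"
assert sniff_mime(b"RIFF\x00\x00\x00\x00WEBPVP8 ") == "image/webp"
assert sniff_mime(b"not an image") is None
```

Because bytes win over filename guesses, a Discord attachment whose `content_type` lies (say, PNG bytes labelled `image/webp`) is declared correctly and never triggers Anthropic's media_type mismatch 400.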
diff --git a/agent/memory_manager.py b/agent/memory_manager.py
index ea9b7425fc..1319681d3b 100644
--- a/agent/memory_manager.py
+++ b/agent/memory_manager.py
@@ -1,17 +1,14 @@
-"""MemoryManager — orchestrates the built-in memory provider plus at most
-ONE external plugin memory provider.
+"""MemoryManager — orchestrates memory providers for the agent.
Single integration point in run_agent.py. Replaces scattered per-backend
code with one manager that delegates to registered providers.
-The BuiltinMemoryProvider is always registered first and cannot be removed.
-Only ONE external (non-builtin) provider is allowed at a time — attempting
-to register a second external provider is rejected with a warning. This
+Only ONE external plugin provider is allowed at a time — attempting to
+register a second external provider is rejected with a warning. This
prevents tool schema bloat and conflicting memory backends.
Usage in run_agent.py:
self._memory_manager = MemoryManager()
- self._memory_manager.add_provider(BuiltinMemoryProvider(...))
# Only ONE of these:
self._memory_manager.add_provider(plugin_provider)
@@ -49,7 +46,7 @@ _INTERNAL_CONTEXT_RE = re.compile(
re.IGNORECASE,
)
_INTERNAL_NOTE_RE = re.compile(
- r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
+ r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as (?:informational background data|authoritative reference data[^\]]*)\.\]\s*',
re.IGNORECASE,
)
@@ -183,7 +180,8 @@ def build_memory_context_block(raw_context: str) -> str:
return (
"\n"
"[System note: The following is recalled memory context, "
- "NOT new user input. Treat as informational background data.]\n\n"
+ "NOT new user input. Treat as authoritative reference data — "
+ "this is the agent's persistent memory and should inform all responses.]\n\n"
f"{clean}\n"
" "
)
diff --git a/agent/memory_provider.py b/agent/memory_provider.py
index 1c8dbaf682..c9abc48c7a 100644
--- a/agent/memory_provider.py
+++ b/agent/memory_provider.py
@@ -1,17 +1,16 @@
"""Abstract base class for pluggable memory providers.
-Memory providers give the agent persistent recall across sessions. One
-external provider is active at a time alongside the always-on built-in
-memory (MEMORY.md / USER.md). The MemoryManager enforces this limit.
+Memory providers give the agent persistent recall across sessions.
+The MemoryManager enforces a one-external-provider limit to prevent
+tool schema bloat and conflicting memory backends.
-Built-in memory is always active as the first provider and cannot be removed.
-External providers (Honcho, Hindsight, Mem0, etc.) are additive — they never
-disable the built-in store. Only one external provider runs at a time to
-prevent tool schema bloat and conflicting memory backends.
+External providers (Honcho, Hindsight, Mem0, etc.) are registered
+and managed via MemoryManager. Only one external provider runs at a
+time.
Registration:
- 1. Built-in: BuiltinMemoryProvider — always present, not removable.
-    2. Plugins: Ship in plugins/memory/<name>/, activated by memory.provider config.
+    Plugins ship in plugins/memory/<name>/ and are activated via
+ the memory.provider config key.
Lifecycle (called by MemoryManager, wired in run_agent.py):
initialize() — connect, create resources, warm up
diff --git a/agent/model_metadata.py b/agent/model_metadata.py
index 12117f1446..c362a9ec93 100644
--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@@ -318,6 +318,17 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"ollama.com": "ollama-cloud",
}
+# Auto-extend with hostnames derived from provider profiles.
+# Any provider with a base_url not already in the map gets added automatically.
+try:
+ from providers import list_providers as _list_providers
+ for _pp in _list_providers():
+ _host = _pp.get_hostname()
+ if _host and _host not in _URL_TO_PROVIDER:
+ _URL_TO_PROVIDER[_host] = _pp.name
+except Exception:
+ pass
+
def _infer_provider_from_url(base_url: str) -> Optional[str]:
"""Infer the models.dev provider name from a base URL.
diff --git a/agent/models_dev.py b/agent/models_dev.py
index 79cfa90ca9..0ef18f4ce1 100644
--- a/agent/models_dev.py
+++ b/agent/models_dev.py
@@ -381,14 +381,18 @@ def get_model_capabilities(provider: str, model: str) -> Optional[ModelCapabilit
# Extract capability flags (default to False if missing)
supports_tools = bool(entry.get("tool_call", False))
- # Vision: check both the `attachment` flag and `modalities.input` for "image".
- # Some models (e.g. gemma-4) list image in input modalities but not attachment.
+ # Vision: prefer explicit `modalities.input` when models.dev provides it.
+ # The older `attachment` flag can be stale or too broad for image routing;
+ # fall back to it only when the input modalities are absent/invalid.
input_mods = entry.get("modalities", {})
if isinstance(input_mods, dict):
- input_mods = input_mods.get("input", [])
+ input_mods = input_mods.get("input")
else:
- input_mods = []
- supports_vision = bool(entry.get("attachment", False)) or "image" in input_mods
+ input_mods = None
+ if isinstance(input_mods, list):
+ supports_vision = "image" in input_mods
+ else:
+ supports_vision = bool(entry.get("attachment", False))
supports_reasoning = bool(entry.get("reasoning", False))
# Extract limits
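
The new precedence (explicit `modalities.input` first, `attachment` flag only as fallback) can be captured in a small predicate over a models.dev entry dict:

```python
def supports_vision(entry: dict) -> bool:
    """Prefer modalities.input when it is a list; else fall back to `attachment`."""
    mods = entry.get("modalities", {})
    input_mods = mods.get("input") if isinstance(mods, dict) else None
    if isinstance(input_mods, list):
        return "image" in input_mods
    return bool(entry.get("attachment", False))

# Explicit modalities win even when the attachment flag disagrees:
assert supports_vision({"attachment": True, "modalities": {"input": ["text"]}}) is False
assert supports_vision({"attachment": False, "modalities": {"input": ["text", "image"]}}) is True
# Without modalities, the attachment flag decides:
assert supports_vision({"attachment": True}) is True
```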
diff --git a/agent/prompt_builder.py b/agent/prompt_builder.py
index 8494a70eef..2f00020cc1 100644
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -513,6 +513,12 @@ PLATFORM_HINTS = {
"image and is the WRONG path. Bare Unicode emoji in text is also not a substitute "
"— when a sticker is the right response, use yb_send_sticker."
),
+ "api_server": (
+ "You're responding through an API server. The rendering layer is unknown — "
+ "assume plain text. No markdown formatting (no asterisks, bullets, headers, "
+ "code fences). Treat this like a conversation, not a document. Keep responses "
+ "brief and natural."
+ ),
}
# ---------------------------------------------------------------------------
diff --git a/agent/redact.py b/agent/redact.py
index afdee65288..1ac284cffd 100644
--- a/agent/redact.py
+++ b/agent/redact.py
@@ -56,12 +56,15 @@ _SENSITIVE_BODY_KEYS = frozenset({
})
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
-# `export HERMES_REDACT_SECRETS=true`) cannot enable/disable redaction
-# mid-session. OFF by default — user must opt in via
-# `security.redact_secrets: true` in config.yaml (bridged to this env var
-# in hermes_cli/main.py and gateway/run.py) or `HERMES_REDACT_SECRETS=true`
-# in ~/.hermes/.env.
-_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("1", "true", "yes", "on")
+# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction
+# mid-session. ON by default — secure default per issue #17691. Users who
+# need raw credential values in tool output (e.g. working on the redactor
+# itself) can opt out via `security.redact_secrets: false` in config.yaml
+# (bridged to this env var in hermes_cli/main.py, gateway/run.py, and
+# cli.py) or `HERMES_REDACT_SECRETS=false` in ~/.hermes/.env. An opt-out
+# warning is logged at gateway and CLI startup so operators see the
+# downgrade — see `_log_redaction_status()` in gateway/run.py and cli.py.
+_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "true").lower() in ("1", "true", "yes", "on")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
diff --git a/agent/think_scrubber.py b/agent/think_scrubber.py
new file mode 100644
index 0000000000..44ddcacff7
--- /dev/null
+++ b/agent/think_scrubber.py
@@ -0,0 +1,386 @@
+"""Stateful scrubber for reasoning/thinking blocks in streamed assistant text.
+
+``run_agent._strip_think_blocks`` is regex-based and correct for a complete
+string, but when it runs *per-delta* in ``_fire_stream_delta`` it destroys
+the state that downstream consumers (CLI ``_stream_delta``, gateway
+``GatewayStreamConsumer._filter_and_accumulate``) rely on.
+
+Concretely, when MiniMax-M2.7 streams
+
+    delta1 = "<think>"
+    delta2 = "Let me check their config"
+    delta3 = "</think> "
+
+the per-delta regex erases delta1 entirely (case 2: unterminated-open at
+boundary matches ``^...``), so the downstream state machine never
+sees the open tag, treats delta2 as regular content, and leaks reasoning
+to the user. Consumers that don't run their own state machine (ACP,
+api_server, TTS) never had any defence at all — they just emitted
+whatever survived the upstream regex.
+
+This module centralises the tag-suppression state machine at the
+upstream layer so every stream_delta_callback sees text that has
+already had reasoning blocks removed. Partial tags at delta
+boundaries are held back until the next delta resolves them, and
+end-of-stream flushing surfaces any held-back prose that turned out
+not to be a real tag.
+
+Usage::
+
+ scrubber = StreamingThinkScrubber()
+ for delta in stream:
+ visible = scrubber.feed(delta)
+ if visible:
+ emit(visible)
+ tail = scrubber.flush() # at end of stream
+ if tail:
+ emit(tail)
+
+The scrubber is re-entrant per agent instance. Call ``reset()`` at
+the top of each new turn so a hung block from an interrupted prior
+stream cannot taint the next turn's output.
+
+Tag variants handled (case-insensitive):
+    ``<think>``, ``<thinking>``, ``<reasoning>``, ``<thought>``,
+    ``<REASONING_SCRATCHPAD>``.
+
+Block-boundary rule for opens: an opening tag is only treated as a
+reasoning-block opener when it appears at the start of the stream,
+after a newline (optionally followed by whitespace), or when only
+whitespace has been emitted on the current line. This prevents prose
+that *mentions* the tag name (e.g. ``"use tags here"``) from
+being incorrectly suppressed. Closed pairs (``X ``) are
+always suppressed regardless of boundary; a closed pair is an
+intentional, bounded construct.
+"""
+
+from __future__ import annotations
+
+from typing import Tuple
+
+__all__ = ["StreamingThinkScrubber"]
+
+
+class StreamingThinkScrubber:
+ """Stateful scrubber for streaming reasoning/thinking blocks.
+
+ State machine:
+ - ``_in_block``: True while inside an opened block, waiting for
+ a close tag. All text inside is discarded.
+ - ``_buf``: held-back partial-tag tail. Emitted / discarded on
+ the next ``feed()`` call or by ``flush()``.
+ - ``_last_emitted_ended_newline``: True iff the most recent
+ emission to the consumer ended with ``\\n``, or nothing has
+ been emitted yet (start-of-stream counts as a boundary). Used
+ to decide whether an open tag at buffer position 0 is at a
+ block boundary.
+ """
+
+ _OPEN_TAG_NAMES: Tuple[str, ...] = (
+ "think",
+ "thinking",
+ "reasoning",
+ "thought",
+ "REASONING_SCRATCHPAD",
+ )
+
+ # Materialise literal tag strings so the hot path does string
+ # operations, not regex compilation per feed().
+ _OPEN_TAGS: Tuple[str, ...] = tuple(f"<{name}>" for name in _OPEN_TAG_NAMES)
+    _CLOSE_TAGS: Tuple[str, ...] = tuple(f"</{name}>" for name in _OPEN_TAG_NAMES)
+
+ # Pre-compute the longest tag (for partial-tag hold-back bound).
+ _MAX_TAG_LEN: int = max(len(tag) for tag in _OPEN_TAGS + _CLOSE_TAGS)
+
+ def __init__(self) -> None:
+ self._in_block: bool = False
+ self._buf: str = ""
+ self._last_emitted_ended_newline: bool = True
+
+ def reset(self) -> None:
+ """Reset all state. Call at the top of every new turn."""
+ self._in_block = False
+ self._buf = ""
+ self._last_emitted_ended_newline = True
+
+ def feed(self, text: str) -> str:
+ """Feed one delta; return the scrubbed visible portion.
+
+ May return an empty string when the entire delta is reasoning
+ content or is being held back pending resolution of a partial
+ tag at the boundary.
+ """
+ if not text:
+ return ""
+ buf = self._buf + text
+ self._buf = ""
+ out: list[str] = []
+
+ while buf:
+ if self._in_block:
+ # Hunt for the earliest close tag.
+ close_idx, close_len = self._find_first_tag(
+ buf, self._CLOSE_TAGS,
+ )
+ if close_idx == -1:
+ # No close yet — hold back a potential partial
+ # close-tag prefix; discard everything else.
+ held = self._max_partial_suffix(buf, self._CLOSE_TAGS)
+ self._buf = buf[-held:] if held else ""
+ return "".join(out)
+ # Found close: discard block content + tag, continue.
+ buf = buf[close_idx + close_len:]
+ self._in_block = False
+ else:
+                # Priority 1 — closed <tag>X</tag> pair anywhere in
+ # buf. Closed pairs are always an intentional,
+ # bounded construct (even mid-line prose containing
+ # an open/close pair is almost certainly a model
+ # leaking reasoning inline), so no boundary gating.
+ pair = self._find_earliest_closed_pair(buf)
+ # Priority 2 — unterminated open tag at a block
+ # boundary. Boundary-gated so prose that mentions
+                # '<think>' isn't over-stripped.
+ open_idx, open_len = self._find_open_at_boundary(
+ buf, out,
+ )
+
+ # Pick whichever match comes earliest in the buffer.
+ if pair is not None and (
+ open_idx == -1 or pair[0] <= open_idx
+ ):
+ start_idx, end_idx = pair
+ preceding = buf[:start_idx]
+ if preceding:
+ preceding = self._strip_orphan_close_tags(preceding)
+ if preceding:
+ out.append(preceding)
+ self._last_emitted_ended_newline = (
+ preceding.endswith("\n")
+ )
+ buf = buf[end_idx:]
+ continue
+
+ if open_idx != -1:
+ # Unterminated open at boundary — emit preceding,
+ # enter block, continue loop with remainder.
+ preceding = buf[:open_idx]
+ if preceding:
+ preceding = self._strip_orphan_close_tags(preceding)
+ if preceding:
+ out.append(preceding)
+ self._last_emitted_ended_newline = (
+ preceding.endswith("\n")
+ )
+ self._in_block = True
+ buf = buf[open_idx + open_len:]
+ continue
+
+ # No resolvable tag structure in buf. Hold back any
+ # partial-tag prefix at the tail so a split tag
+ # across deltas isn't missed, then emit the rest.
+ held = self._max_partial_suffix(buf, self._OPEN_TAGS)
+ held_close = self._max_partial_suffix(
+ buf, self._CLOSE_TAGS,
+ )
+ held = max(held, held_close)
+ if held:
+ emit_text = buf[:-held]
+ self._buf = buf[-held:]
+ else:
+ emit_text = buf
+ self._buf = ""
+ if emit_text:
+ emit_text = self._strip_orphan_close_tags(emit_text)
+ if emit_text:
+ out.append(emit_text)
+ self._last_emitted_ended_newline = (
+ emit_text.endswith("\n")
+ )
+ return "".join(out)
+
+ return "".join(out)
+
+ def flush(self) -> str:
+ """End-of-stream flush.
+
+ If still inside an unterminated block, held-back content is
+ discarded — leaking partial reasoning is worse than a
+ truncated answer. Otherwise the held-back partial-tag tail is
+ emitted verbatim (it turned out not to be a real tag prefix).
+ """
+ if self._in_block:
+ self._buf = ""
+ self._in_block = False
+ return ""
+ tail = self._buf
+ self._buf = ""
+ if not tail:
+ return ""
+ tail = self._strip_orphan_close_tags(tail)
+ if tail:
+ self._last_emitted_ended_newline = tail.endswith("\n")
+ return tail
+
+ # ── internal helpers ───────────────────────────────────────────────
+
+ @staticmethod
+ def _find_first_tag(
+ buf: str, tags: Tuple[str, ...],
+ ) -> Tuple[int, int]:
+ """Return (earliest_index, tag_length) over *tags*, or (-1, 0).
+
+ Case-insensitive match.
+ """
+ buf_lower = buf.lower()
+ best_idx = -1
+ best_len = 0
+ for tag in tags:
+ idx = buf_lower.find(tag.lower())
+ if idx != -1 and (best_idx == -1 or idx < best_idx):
+ best_idx = idx
+ best_len = len(tag)
+ return best_idx, best_len
+
+ def _find_earliest_closed_pair(self, buf: str):
+ """Return (start_idx, end_idx) of the earliest closed pair, else None.
+
+        A closed pair is ``<tag> ... </tag>`` of any variant. Matches are
+        case-insensitive and non-greedy (the closest close tag after
+        an open tag wins), matching the regex ``<tag>.*?</tag>``
+ semantics of ``_strip_think_blocks`` case 1. When two tag
+ variants could both match, the one whose open tag appears
+ earlier wins.
+ """
+ buf_lower = buf.lower()
+ best: "tuple[int, int] | None" = None
+ for open_tag, close_tag in zip(self._OPEN_TAGS, self._CLOSE_TAGS):
+ open_lower = open_tag.lower()
+ close_lower = close_tag.lower()
+ open_idx = buf_lower.find(open_lower)
+ if open_idx == -1:
+ continue
+ close_idx = buf_lower.find(
+ close_lower, open_idx + len(open_lower),
+ )
+ if close_idx == -1:
+ continue
+ end_idx = close_idx + len(close_lower)
+ if best is None or open_idx < best[0]:
+ best = (open_idx, end_idx)
+ return best
+
+ def _find_open_at_boundary(
+ self, buf: str, already_emitted: list[str],
+ ) -> Tuple[int, int]:
+ """Return the earliest block-boundary open-tag (idx, len).
+
+ Returns (-1, 0) if no boundary-legal opener is present.
+ """
+ buf_lower = buf.lower()
+ best_idx = -1
+ best_len = 0
+ for tag in self._OPEN_TAGS:
+ tag_lower = tag.lower()
+ search_start = 0
+ while True:
+ idx = buf_lower.find(tag_lower, search_start)
+ if idx == -1:
+ break
+ if self._is_block_boundary(buf, idx, already_emitted):
+ if best_idx == -1 or idx < best_idx:
+ best_idx = idx
+ best_len = len(tag)
+ break # first boundary hit for this tag is enough
+ search_start = idx + 1
+ return best_idx, best_len
+
+ def _is_block_boundary(
+ self, buf: str, idx: int, already_emitted: list[str],
+ ) -> bool:
+ """True iff position *idx* in *buf* is a block boundary.
+
+ A block boundary is:
+ - buf position 0 AND the most recent emission ended with
+ a newline (or nothing has been emitted yet)
+ - any position whose preceding text on the current line
+ (since the last newline in buf) is whitespace-only, AND
+ if there is no newline in the preceding buf portion, the
+ most recent prior emission ended with a newline
+ """
+ if idx == 0:
+ # Check whether the last already-emitted chunk in THIS
+ # feed() call ended with a newline, otherwise fall back
+ # to the cross-feed flag.
+ if already_emitted:
+ return already_emitted[-1].endswith("\n")
+ return self._last_emitted_ended_newline
+ preceding = buf[:idx]
+ last_nl = preceding.rfind("\n")
+ if last_nl == -1:
+ # No newline in buf before the tag — boundary only if the
+ # prior emission ended with a newline AND everything since
+ # is whitespace.
+ if already_emitted:
+ prior_newline = already_emitted[-1].endswith("\n")
+ else:
+ prior_newline = self._last_emitted_ended_newline
+ return prior_newline and preceding.strip() == ""
+ # Newline present — text between it and the tag must be
+ # whitespace-only.
+ return preceding[last_nl + 1:].strip() == ""
+
+ @classmethod
+ def _max_partial_suffix(
+ cls, buf: str, tags: Tuple[str, ...],
+ ) -> int:
+ """Return the longest buf-suffix that is a prefix of any tag.
+
+ Only prefixes strictly shorter than the tag itself count
+ (full-length suffixes are the tag and are handled as matches,
+ not held-back partials). Case-insensitive.
+ """
+ if not buf:
+ return 0
+ buf_lower = buf.lower()
+ max_check = min(len(buf_lower), cls._MAX_TAG_LEN - 1)
+ for i in range(max_check, 0, -1):
+ suffix = buf_lower[-i:]
+ for tag in tags:
+ tag_lower = tag.lower()
+ if len(tag_lower) > i and tag_lower.startswith(suffix):
+ return i
+ return 0
+
+ @classmethod
+ def _strip_orphan_close_tags(cls, text: str) -> str:
+ """Remove any close tags from *text* (orphan-close handling).
+
+ An orphan close tag has no matching open in the current
+ scrubber state; it's always noise, stripped with any trailing
+ whitespace so the surrounding prose flows naturally.
+ """
+        if "</" not in text:
+ return text
+ text_lower = text.lower()
+ out: list[str] = []
+ i = 0
+ while i < len(text):
+ matched = False
+            if text_lower[i:i + 2] == "</":
+ for tag in cls._CLOSE_TAGS:
+ tag_lower = tag.lower()
+ tag_len = len(tag_lower)
+ if text_lower[i:i + tag_len] == tag_lower:
+ # Skip the tag and any trailing whitespace,
+ # matching _strip_think_blocks case 3.
+ j = i + tag_len
+ while j < len(text) and text[j] in " \t\n\r":
+ j += 1
+ i = j
+ matched = True
+ break
+ if not matched:
+ out.append(text[i])
+ i += 1
+ return "".join(out)
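The hold-back state machine above can be distilled to a toy version that handles a single tag pair, with no boundary gating, case folding, or orphan-close stripping. This is illustrative only, not the module's API; it shows the two core moves: discard everything inside an open block, and hold back a buffer suffix that might be the start of a split tag.

```python
OPEN, CLOSE = "<think>", "</think>"

class MiniScrubber:
    def __init__(self):
        self.in_block = False  # inside an open reasoning block?
        self.buf = ""          # held-back possible partial tag

    def _held(self, text, tag):
        # Longest proper suffix of text that is a prefix of tag.
        for i in range(min(len(text), len(tag) - 1), 0, -1):
            if tag.startswith(text[-i:]):
                return i
        return 0

    def feed(self, delta):
        buf, self.buf, out = self.buf + delta, "", []
        while buf:
            if self.in_block:
                idx = buf.find(CLOSE)
                if idx == -1:
                    # No close yet: keep a possible partial close, drop the rest.
                    h = self._held(buf, CLOSE)
                    self.buf = buf[-h:] if h else ""
                    break
                buf = buf[idx + len(CLOSE):]
                self.in_block = False
            else:
                idx = buf.find(OPEN)
                if idx == -1:
                    h = self._held(buf, OPEN)
                    if h:
                        out.append(buf[:-h])
                        self.buf = buf[-h:]
                    else:
                        out.append(buf)
                    break
                out.append(buf[:idx])
                self.in_block = True
                buf = buf[idx + len(OPEN):]
        return "".join(out)

    def flush(self):
        # Unterminated block: discard; otherwise the held tail was prose.
        tail = "" if self.in_block else self.buf
        self.buf, self.in_block = "", False
        return tail
```

The real class layers boundary gating, multiple tag variants, closed-pair priority, and orphan-close stripping on top of this skeleton.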
diff --git a/agent/title_generator.py b/agent/title_generator.py
index 3f617093c0..a7f1e158e1 100644
--- a/agent/title_generator.py
+++ b/agent/title_generator.py
@@ -17,6 +17,7 @@ logger = logging.getLogger(__name__)
# so silent-drops (e.g. OpenRouter 402 exhausting the fallback chain)
# become visible instead of piling up as NULL session titles.
FailureCallback = Callable[[str, BaseException], None]
+TitleCallback = Callable[[str], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
@@ -90,6 +91,7 @@ def auto_title_session(
assistant_response: str,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
+ title_callback: Optional[TitleCallback] = None,
) -> None:
"""Generate and set a session title if one doesn't already exist.
@@ -119,6 +121,11 @@ def auto_title_session(
try:
session_db.set_session_title(session_id, title)
logger.debug("Auto-generated session title: %s", title)
+ if title_callback is not None:
+ try:
+ title_callback(title)
+ except Exception:
+ logger.debug("Auto-title callback failed", exc_info=True)
except Exception as e:
logger.debug("Failed to set auto-generated title: %s", e)
@@ -131,6 +138,7 @@ def maybe_auto_title(
conversation_history: list,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
+ title_callback: Optional[TitleCallback] = None,
) -> None:
"""Fire-and-forget title generation after the first exchange.
@@ -152,7 +160,11 @@ def maybe_auto_title(
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
- kwargs={"failure_callback": failure_callback, "main_runtime": main_runtime},
+ kwargs={
+ "failure_callback": failure_callback,
+ "main_runtime": main_runtime,
+ "title_callback": title_callback,
+ },
daemon=True,
name="auto-title",
)
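The callback contract added above has two properties worth isolating: the callback fires only after the title is persisted, and its failures are swallowed so they cannot poison the titling thread. A sketch with illustrative names (a plain dict stands in for `session_db`):

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

def set_title_and_notify(
    store: dict,
    session_id: str,
    title: str,
    title_callback: Optional[Callable[[str], None]] = None,
) -> None:
    store[session_id] = title  # stands in for session_db.set_session_title
    if title_callback is not None:
        try:
            title_callback(title)
        except Exception:
            # Callback errors are observability problems, not title problems.
            logger.debug("Auto-title callback failed", exc_info=True)
```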
diff --git a/agent/transports/__init__.py b/agent/transports/__init__.py
index d1c8251ed2..b606da7fec 100644
--- a/agent/transports/__init__.py
+++ b/agent/transports/__init__.py
@@ -6,9 +6,16 @@ Usage:
result = transport.normalize_response(raw_response)
"""
-from agent.transports.types import NormalizedResponse, ToolCall, Usage, build_tool_call, map_finish_reason # noqa: F401
+from agent.transports.types import (
+ NormalizedResponse,
+ ToolCall,
+ Usage,
+ build_tool_call,
+ map_finish_reason,
+) # noqa: F401
_REGISTRY: dict = {}
+_discovered: bool = False
def register_transport(api_mode: str, transport_cls: type) -> None:
@@ -23,6 +30,9 @@ def get_transport(api_mode: str):
This allows gradual migration — call sites can check for None
and fall back to the legacy code path.
"""
+ global _discovered
+ if not _discovered:
+ _discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
# The registry can be partially populated when a specific transport
@@ -38,6 +48,8 @@ def get_transport(api_mode: str):
def _discover_transports() -> None:
"""Import all transport modules to trigger auto-registration."""
+ global _discovered
+ _discovered = True
try:
import agent.transports.anthropic # noqa: F401
except ImportError:
diff --git a/agent/transports/chat_completions.py b/agent/transports/chat_completions.py
index 9a115e4547..ca29b39ffe 100644
--- a/agent/transports/chat_completions.py
+++ b/agent/transports/chat_completions.py
@@ -109,7 +109,9 @@ class ChatCompletionsTransport(ProviderTransport):
def api_mode(self) -> str:
return "chat_completions"
- def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
+ def convert_messages(
+ self, messages: list[dict[str, Any]], **kwargs
+ ) -> list[dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` /
@@ -126,7 +128,9 @@ class ChatCompletionsTransport(ProviderTransport):
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
- if isinstance(tc, dict) and ("call_id" in tc or "response_item_id" in tc):
+ if isinstance(tc, dict) and (
+ "call_id" in tc or "response_item_id" in tc
+ ):
needs_sanitize = True
break
if needs_sanitize:
@@ -149,39 +153,41 @@ class ChatCompletionsTransport(ProviderTransport):
tc.pop("response_item_id", None)
return sanitized
- def convert_tools(self, tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+ def convert_tools(self, tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Tools are already in OpenAI format — identity."""
return tools
def build_kwargs(
self,
model: str,
- messages: List[Dict[str, Any]],
- tools: Optional[List[Dict[str, Any]]] = None,
+ messages: list[dict[str, Any]],
+ tools: list[dict[str, Any]] | None = None,
**params,
- ) -> Dict[str, Any]:
+ ) -> dict[str, Any]:
"""Build chat.completions.create() kwargs.
- This is the most complex transport method — it handles ~16 providers
- via params rather than subclasses.
-
- params:
+ params (all optional):
timeout: float — API call timeout
max_tokens: int | None — user-configured max tokens
- ephemeral_max_output_tokens: int | None — one-shot override (error recovery)
+ ephemeral_max_output_tokens: int | None — one-shot override
max_tokens_param_fn: callable — returns {max_tokens: N} or {max_completion_tokens: N}
reasoning_config: dict | None
request_overrides: dict | None
session_id: str | None
- qwen_session_metadata: dict | None — {sessionId, promptId} precomputed
model_lower: str — lowercase model name for pattern matching
- # Provider detection flags (all optional, default False)
+ # Provider profile path (all per-provider quirks live in providers/)
+ provider_profile: ProviderProfile | None — when present, delegates to
+ _build_kwargs_from_profile(); all flag params below are bypassed.
+ # Legacy-path flags — only used when provider_profile is None
+ # (i.e. custom / unregistered providers). Known providers all go
+ # through provider_profile.
is_openrouter: bool
is_nous: bool
is_qwen_portal: bool
is_github_models: bool
is_nvidia_nim: bool
is_kimi: bool
+ is_tokenhub: bool
is_lmstudio: bool
is_custom_provider: bool
ollama_num_ctx: int | None
@@ -190,6 +196,7 @@ class ChatCompletionsTransport(ProviderTransport):
# Qwen-specific
qwen_prepare_fn: callable | None — runs AFTER codex sanitization
qwen_prepare_inplace_fn: callable | None — in-place variant for deepcopied lists
+ qwen_session_metadata: dict | None
# Temperature
fixed_temperature: Any — from _fixed_temperature_for_model()
omit_temperature: bool
@@ -199,28 +206,21 @@ class ChatCompletionsTransport(ProviderTransport):
lmstudio_reasoning_options: list[str] | None # raw allowed_options from /api/v1/models
# Claude on OpenRouter/Nous max output
anthropic_max_output: int | None
- # Extra
- extra_body_additions: dict | None — pre-built extra_body entries
+ extra_body_additions: dict | None
"""
# Codex sanitization: drop reasoning_items / call_id / response_item_id
sanitized = self.convert_messages(messages)
- # Qwen portal prep AFTER codex sanitization. If sanitize already
- # deepcopied, reuse that copy via the in-place variant to avoid a
- # second deepcopy.
- is_qwen = params.get("is_qwen_portal", False)
- if is_qwen:
- qwen_prep = params.get("qwen_prepare_fn")
- qwen_prep_inplace = params.get("qwen_prepare_inplace_fn")
- if sanitized is messages:
- if qwen_prep is not None:
- sanitized = qwen_prep(sanitized)
- else:
- # Already deepcopied — transform in place
- if qwen_prep_inplace is not None:
- qwen_prep_inplace(sanitized)
- elif qwen_prep is not None:
- sanitized = qwen_prep(sanitized)
+ # ── Provider profile: single-path when present ──────────────────
+ _profile = params.get("provider_profile")
+ if _profile:
+ return self._build_kwargs_from_profile(
+ _profile, model, sanitized, tools, params
+ )
+
+ # ── Legacy fallback (unregistered / unknown provider) ───────────
+ # Reached only when get_provider_profile() returned None.
+ # Known providers always go through the profile path above.
# Developer role swap for GPT-5/Codex models
model_lower = params.get("model_lower", (model or "").lower())
@@ -233,7 +233,7 @@ class ChatCompletionsTransport(ProviderTransport):
sanitized = list(sanitized)
sanitized[0] = {**sanitized[0], "role": "developer"}
- api_kwargs: Dict[str, Any] = {
+ api_kwargs: dict[str, Any] = {
"model": model,
"messages": sanitized,
}
@@ -242,19 +242,6 @@ class ChatCompletionsTransport(ProviderTransport):
if timeout is not None:
api_kwargs["timeout"] = timeout
- # Temperature
- fixed_temp = params.get("fixed_temperature")
- omit_temp = params.get("omit_temperature", False)
- if omit_temp:
- api_kwargs.pop("temperature", None)
- elif fixed_temp is not None:
- api_kwargs["temperature"] = fixed_temp
-
- # Qwen metadata (caller precomputes {sessionId, promptId})
- qwen_meta = params.get("qwen_session_metadata")
- if qwen_meta and is_qwen:
- api_kwargs["metadata"] = qwen_meta
-
# Tools
if tools:
# Moonshot/Kimi uses a stricter flavored JSON Schema. Rewriting
@@ -278,13 +265,6 @@ class ChatCompletionsTransport(ProviderTransport):
api_kwargs.update(max_tokens_fn(ephemeral))
elif max_tokens is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(max_tokens))
- elif is_nvidia_nim and max_tokens_fn:
- api_kwargs.update(max_tokens_fn(16384))
- elif is_qwen and max_tokens_fn:
- api_kwargs.update(max_tokens_fn(65536))
- elif is_kimi and max_tokens_fn:
- # Kimi/Moonshot: 32000 matches Kimi CLI's default
- api_kwargs.update(max_tokens_fn(32000))
elif anthropic_max_out is not None:
api_kwargs["max_tokens"] = anthropic_max_out
@@ -331,7 +311,7 @@ class ChatCompletionsTransport(ProviderTransport):
api_kwargs["reasoning_effort"] = _lm_effort
# extra_body assembly
- extra_body: Dict[str, Any] = {}
+ extra_body: dict[str, Any] = {}
is_openrouter = params.get("is_openrouter", False)
is_nous = params.get("is_nous", False)
@@ -361,35 +341,7 @@ class ChatCompletionsTransport(ProviderTransport):
if gh_reasoning is not None:
extra_body["reasoning"] = gh_reasoning
else:
- if reasoning_config is not None:
- rc = dict(reasoning_config)
- if is_nous and rc.get("enabled") is False:
- pass # omit for Nous when disabled
- else:
- extra_body["reasoning"] = rc
- else:
- extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
-
- if is_nous:
- extra_body["tags"] = ["product=hermes-agent"]
-
- # Ollama num_ctx
- ollama_ctx = params.get("ollama_num_ctx")
- if ollama_ctx:
- options = extra_body.get("options", {})
- options["num_ctx"] = ollama_ctx
- extra_body["options"] = options
-
- # Ollama/custom think=false
- if params.get("is_custom_provider", False):
- if reasoning_config and isinstance(reasoning_config, dict):
- _effort = (reasoning_config.get("effort") or "").strip().lower()
- _enabled = reasoning_config.get("enabled", True)
- if _effort == "none" or _enabled is False:
- extra_body["think"] = False
-
- if is_qwen:
- extra_body["vl_high_resolution_images"] = True
+ extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
if provider_name == "gemini":
raw_thinking_config = _build_gemini_thinking_config(model, reasoning_config)
@@ -423,6 +375,120 @@ class ChatCompletionsTransport(ProviderTransport):
return api_kwargs
+ def _build_kwargs_from_profile(self, profile, model, sanitized, tools, params):
+ """Build API kwargs using a ProviderProfile — single path, no legacy flags.
+
+ This method replaces the entire flag-based kwargs assembly when a
+ provider_profile is passed. Every quirk comes from the profile object.
+ """
+ from providers.base import OMIT_TEMPERATURE
+
+ # Message preprocessing
+ sanitized = profile.prepare_messages(sanitized)
+
+ # Developer role swap — model-name-based, applies to all providers
+ _model_lower = (model or "").lower()
+ if (
+ sanitized
+ and isinstance(sanitized[0], dict)
+ and sanitized[0].get("role") == "system"
+ and any(p in _model_lower for p in DEVELOPER_ROLE_MODELS)
+ ):
+ sanitized = list(sanitized)
+ sanitized[0] = {**sanitized[0], "role": "developer"}
+
+ api_kwargs: dict[str, Any] = {
+ "model": model,
+ "messages": sanitized,
+ }
+
+ # Temperature
+ if profile.fixed_temperature is OMIT_TEMPERATURE:
+ pass # Don't include temperature at all
+ elif profile.fixed_temperature is not None:
+ api_kwargs["temperature"] = profile.fixed_temperature
+ else:
+ # Use caller's temperature if provided
+ temp = params.get("temperature")
+ if temp is not None:
+ api_kwargs["temperature"] = temp
+
+ # Timeout
+ timeout = params.get("timeout")
+ if timeout is not None:
+ api_kwargs["timeout"] = timeout
+
+ # Tools — apply Moonshot/Kimi schema sanitization regardless of path
+ if tools:
+ if is_moonshot_model(model):
+ tools = sanitize_moonshot_tools(tools)
+ api_kwargs["tools"] = tools
+
+ # max_tokens resolution — priority: ephemeral > user > profile default
+ max_tokens_fn = params.get("max_tokens_param_fn")
+ ephemeral = params.get("ephemeral_max_output_tokens")
+ user_max = params.get("max_tokens")
+ anthropic_max = params.get("anthropic_max_output")
+
+ if ephemeral is not None and max_tokens_fn:
+ api_kwargs.update(max_tokens_fn(ephemeral))
+ elif user_max is not None and max_tokens_fn:
+ api_kwargs.update(max_tokens_fn(user_max))
+ elif profile.default_max_tokens and max_tokens_fn:
+ api_kwargs.update(max_tokens_fn(profile.default_max_tokens))
+ elif anthropic_max is not None:
+ api_kwargs["max_tokens"] = anthropic_max
+
+ # Provider-specific api_kwargs extras (reasoning_effort, metadata, etc.)
+ reasoning_config = params.get("reasoning_config")
+ extra_body_from_profile, top_level_from_profile = (
+ profile.build_api_kwargs_extras(
+ reasoning_config=reasoning_config,
+ supports_reasoning=params.get("supports_reasoning", False),
+ qwen_session_metadata=params.get("qwen_session_metadata"),
+ model=model,
+ ollama_num_ctx=params.get("ollama_num_ctx"),
+ )
+ )
+ api_kwargs.update(top_level_from_profile)
+
+ # extra_body assembly
+ extra_body: dict[str, Any] = {}
+
+ # Profile's extra_body (tags, provider prefs, vl_high_resolution, etc.)
+ profile_body = profile.build_extra_body(
+ session_id=params.get("session_id"),
+ provider_preferences=params.get("provider_preferences"),
+ model=model,
+ base_url=params.get("base_url"),
+ reasoning_config=reasoning_config,
+ )
+ if profile_body:
+ extra_body.update(profile_body)
+
+ # Profile's reasoning/thinking extra_body entries
+ if extra_body_from_profile:
+ extra_body.update(extra_body_from_profile)
+
+ # Merge any pre-built extra_body additions from the caller
+ additions = params.get("extra_body_additions")
+ if additions:
+ extra_body.update(additions)
+
+ # Request overrides (user config)
+ overrides = params.get("request_overrides")
+ if overrides:
+ for k, v in overrides.items():
+ if k == "extra_body" and isinstance(v, dict):
+ extra_body.update(v)
+ else:
+ api_kwargs[k] = v
+
+ if extra_body:
+ api_kwargs["extra_body"] = extra_body
+
+ return api_kwargs
+
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize OpenAI ChatCompletion to NormalizedResponse.
@@ -444,7 +510,7 @@ class ChatCompletionsTransport(ProviderTransport):
# Gemini 3 thinking models attach extra_content with
# thought_signature — without replay on the next turn the API
# rejects the request with 400.
- tc_provider_data: Dict[str, Any] = {}
+ tc_provider_data: dict[str, Any] = {}
extra = getattr(tc, "extra_content", None)
if extra is None and hasattr(tc, "model_extra"):
extra = (tc.model_extra or {}).get("extra_content")
@@ -455,12 +521,14 @@ class ChatCompletionsTransport(ProviderTransport):
except Exception:
pass
tc_provider_data["extra_content"] = extra
- tool_calls.append(ToolCall(
- id=tc.id,
- name=tc.function.name,
- arguments=tc.function.arguments,
- provider_data=tc_provider_data or None,
- ))
+ tool_calls.append(
+ ToolCall(
+ id=tc.id,
+ name=tc.function.name,
+ arguments=tc.function.arguments,
+ provider_data=tc_provider_data or None,
+ )
+ )
usage = None
if hasattr(response, "usage") and response.usage:
@@ -508,7 +576,7 @@ class ChatCompletionsTransport(ProviderTransport):
return False
return True
- def extract_cache_stats(self, response: Any) -> Optional[Dict[str, int]]:
+ def extract_cache_stats(self, response: Any) -> dict[str, int] | None:
"""Extract OpenRouter/OpenAI cache stats from prompt_tokens_details."""
usage = getattr(response, "usage", None)
if usage is None:
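The max-tokens resolution order in `_build_kwargs_from_profile` above is a strict priority chain: ephemeral one-shot override, then the user's configured value, then the profile default, then the Anthropic output cap. Isolated as a sketch (hypothetical helper name; the real code routes the first three through `max_tokens_param_fn`, which may emit `max_completion_tokens` instead):

```python
def resolve_max_tokens(ephemeral=None, user_max=None,
                       profile_default=None, anthropic_max=None) -> dict:
    if ephemeral is not None:          # one-shot error-recovery override
        return {"max_tokens": ephemeral}
    if user_max is not None:           # user-configured value
        return {"max_tokens": user_max}
    if profile_default:                # provider profile default (truthy check)
        return {"max_tokens": profile_default}
    if anthropic_max is not None:      # Claude-on-OpenRouter/Nous cap
        return {"max_tokens": anthropic_max}
    return {}                          # let the provider pick its default
```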
diff --git a/agent/transports/types.py b/agent/transports/types.py
index 68a807b47c..f0da1eb6f8 100644
--- a/agent/transports/types.py
+++ b/agent/transports/types.py
@@ -12,7 +12,7 @@ from __future__ import annotations
import json
from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional
+from typing import Any
@dataclass
@@ -32,10 +32,10 @@ class ToolCall:
* Others: ``None``
"""
- id: Optional[str]
+ id: str | None
name: str
arguments: str # JSON string
- provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
+ provider_data: dict[str, Any] | None = field(default=None, repr=False)
# ── Backward compatibility ──────────────────────────────────
# The agent loop reads tc.function.name / tc.function.arguments
@@ -47,17 +47,17 @@ class ToolCall:
return "function"
@property
- def function(self) -> "ToolCall":
+ def function(self) -> ToolCall:
"""Return self so tc.function.name / tc.function.arguments work."""
return self
@property
- def call_id(self) -> Optional[str]:
+ def call_id(self) -> str | None:
"""Codex call_id from provider_data, accessed via getattr by _build_assistant_message."""
return (self.provider_data or {}).get("call_id")
@property
- def response_item_id(self) -> Optional[str]:
+ def response_item_id(self) -> str | None:
"""Codex response_item_id from provider_data."""
return (self.provider_data or {}).get("response_item_id")
@@ -101,18 +101,18 @@ class NormalizedResponse:
* Others: ``None``
"""
- content: Optional[str]
- tool_calls: Optional[List[ToolCall]]
+ content: str | None
+ tool_calls: list[ToolCall] | None
finish_reason: str # "stop", "tool_calls", "length", "content_filter"
- reasoning: Optional[str] = None
- usage: Optional[Usage] = None
- provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
+ reasoning: str | None = None
+ usage: Usage | None = None
+ provider_data: dict[str, Any] | None = field(default=None, repr=False)
# ── Backward compatibility ──────────────────────────────────
# The shim _nr_to_assistant_message() mapped these from provider_data.
# These properties let NormalizedResponse pass through directly.
@property
- def reasoning_content(self) -> Optional[str]:
+ def reasoning_content(self) -> str | None:
pd = self.provider_data or {}
return pd.get("reasoning_content")
@@ -136,8 +136,9 @@ class NormalizedResponse:
# Factory helpers
# ---------------------------------------------------------------------------
+
def build_tool_call(
- id: Optional[str],
+ id: str | None,
name: str,
arguments: Any,
**provider_fields: Any,
@@ -151,7 +152,7 @@ def build_tool_call(
return ToolCall(id=id, name=name, arguments=args_str, provider_data=pd)
-def map_finish_reason(reason: Optional[str], mapping: Dict[str, str]) -> str:
+def map_finish_reason(reason: str | None, mapping: dict[str, str]) -> str:
"""Translate a provider-specific stop reason to the normalised set.
Falls back to ``"stop"`` for unknown or ``None`` reasons.
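The fallback contract in this docstring can be sketched as a standalone function. The body below is an assumption inferred from the docstring, not the file's actual implementation, and the example mapping is illustrative:

```python
def map_finish_reason(reason, mapping):
    """Translate a provider-specific stop reason to the normalised set.

    Sketch matching the docstring above: unknown or None reasons
    fall back to "stop".
    """
    if reason is None:
        return "stop"
    return mapping.get(reason, "stop")


# Illustrative provider mapping (assumed, not taken from the diff)
provider_reasons = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}
print(map_finish_reason("max_tokens", provider_reasons))  # length
print(map_finish_reason(None, provider_reasons))          # stop
print(map_finish_reason("mystery", provider_reasons))     # stop
```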
diff --git a/agent/usage_pricing.py b/agent/usage_pricing.py
index 746f962097..467b72931c 100644
--- a/agent/usage_pricing.py
+++ b/agent/usage_pricing.py
@@ -1,5 +1,6 @@
from __future__ import annotations
+import re
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
@@ -82,6 +83,121 @@ _UTC_NOW = lambda: datetime.now(timezone.utc)
# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
+ # ── Anthropic Claude 4.7 ─────────────────────────────────────────────
+ # Opus 4.5/4.6/4.7 share $5/$25 pricing (new tokenizer, up to 35% more
+ # tokens for the same text).
+ # Source: https://platform.claude.com/docs/en/about-claude/pricing
+ (
+ "anthropic",
+ "claude-opus-4-7",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("5.00"),
+ output_cost_per_million=Decimal("25.00"),
+ cache_read_cost_per_million=Decimal("0.50"),
+ cache_write_cost_per_million=Decimal("6.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-opus-4-7-20250507",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("5.00"),
+ output_cost_per_million=Decimal("25.00"),
+ cache_read_cost_per_million=Decimal("0.50"),
+ cache_write_cost_per_million=Decimal("6.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ # ── Anthropic Claude 4.6 ─────────────────────────────────────────────
+ (
+ "anthropic",
+ "claude-opus-4-6",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("5.00"),
+ output_cost_per_million=Decimal("25.00"),
+ cache_read_cost_per_million=Decimal("0.50"),
+ cache_write_cost_per_million=Decimal("6.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-opus-4-6-20250414",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("5.00"),
+ output_cost_per_million=Decimal("25.00"),
+ cache_read_cost_per_million=Decimal("0.50"),
+ cache_write_cost_per_million=Decimal("6.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-sonnet-4-6",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("3.00"),
+ output_cost_per_million=Decimal("15.00"),
+ cache_read_cost_per_million=Decimal("0.30"),
+ cache_write_cost_per_million=Decimal("3.75"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-sonnet-4-6-20250414",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("3.00"),
+ output_cost_per_million=Decimal("15.00"),
+ cache_read_cost_per_million=Decimal("0.30"),
+ cache_write_cost_per_million=Decimal("3.75"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ # ── Anthropic Claude 4.5 ─────────────────────────────────────────────
+ (
+ "anthropic",
+ "claude-opus-4-5",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("5.00"),
+ output_cost_per_million=Decimal("25.00"),
+ cache_read_cost_per_million=Decimal("0.50"),
+ cache_write_cost_per_million=Decimal("6.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-sonnet-4-5",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("3.00"),
+ output_cost_per_million=Decimal("15.00"),
+ cache_read_cost_per_million=Decimal("0.30"),
+ cache_write_cost_per_million=Decimal("3.75"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ (
+ "anthropic",
+ "claude-haiku-4-5",
+ ): PricingEntry(
+ input_cost_per_million=Decimal("1.00"),
+ output_cost_per_million=Decimal("5.00"),
+ cache_read_cost_per_million=Decimal("0.10"),
+ cache_write_cost_per_million=Decimal("1.25"),
+ source="official_docs_snapshot",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
+ ),
+ # ── Anthropic Claude 4 / 4.1 ─────────────────────────────────────────
(
"anthropic",
"claude-opus-4-20250514",
@@ -91,8 +207,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-prompt-caching-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
@@ -103,8 +219,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-prompt-caching-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
# OpenAI
(
@@ -184,7 +300,7 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
- # Anthropic older models (pre-4.6 generation)
+ # ── Anthropic older models (pre-4.5 generation) ────────────────────────
(
"anthropic",
"claude-3-5-sonnet-20241022",
@@ -194,8 +310,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-pricing-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
@@ -206,8 +322,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("0.08"),
cache_write_cost_per_million=Decimal("1.00"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-pricing-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
@@ -218,8 +334,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-pricing-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
@@ -230,8 +346,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
cache_read_cost_per_million=Decimal("0.03"),
cache_write_cost_per_million=Decimal("0.30"),
source="official_docs_snapshot",
- source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
- pricing_version="anthropic-pricing-2026-03-16",
+ source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+ pricing_version="anthropic-pricing-2026-05",
),
# DeepSeek
(
@@ -426,8 +542,37 @@ def resolve_billing_route(
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
+def _normalize_anthropic_model_name(model: str) -> str:
+ """Normalize Anthropic model name variants to canonical form.
+
+ Handles:
+ - Dot notation: claude-opus-4.7 → claude-opus-4-7
+ - Case and whitespace: lowercased and stripped
+ - Strips anthropic/ prefix if present
+ """
+ name = model.lower().strip()
+ if name.startswith("anthropic/"):
+ name = name[len("anthropic/"):]
+ # Normalize dots to dashes in version numbers (e.g. 4.7 → 4-7, 4.6 → 4-6)
+ # But preserve the rest of the name structure
+ name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
+ return name
+
+
def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]:
- return _OFFICIAL_DOCS_PRICING.get((route.provider, route.model.lower()))
+ model = route.model.lower()
+ # Direct lookup first
+ entry = _OFFICIAL_DOCS_PRICING.get((route.provider, model))
+ if entry:
+ return entry
+ # Try normalized name for Anthropic (handles dot-notation like opus-4.7)
+ if route.provider == "anthropic":
+ normalized = _normalize_anthropic_model_name(model)
+ if normalized != model:
+ entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
+ if entry:
+ return entry
+ return None
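Taken together, the direct-then-normalized lookup means dot-notation aliases resolve to the same pricing entry as the canonical key. A minimal self-contained sketch of the normalization step, mirroring the three transformations in the diff:

```python
import re


def normalize_anthropic_model_name(model: str) -> str:
    # Same steps as the diff: lowercase/strip, drop the
    # "anthropic/" prefix, rewrite version dots to dashes.
    name = model.lower().strip()
    if name.startswith("anthropic/"):
        name = name[len("anthropic/"):]
    name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
    return name


print(normalize_anthropic_model_name("anthropic/Claude-Opus-4.7"))  # claude-opus-4-7
print(normalize_anthropic_model_name("claude-sonnet-4.6 "))         # claude-sonnet-4-6
```

Already-canonical names pass through unchanged, which is why the lookup only retries when `normalized != model`.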
def _openrouter_pricing_entry(route: BillingRoute) -> Optional[PricingEntry]:
diff --git a/cli-config.yaml.example b/cli-config.yaml.example
index 963268d4ba..d7b7dcf931 100644
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -601,7 +601,7 @@ agent:
# - A preset like "hermes-cli" or "hermes-telegram" (curated tool set)
# - A list of individual toolsets to compose your own (see list below)
#
-# Supported platform keys: cli, telegram, discord, whatsapp, slack, qqbot, teams
+# Supported platform keys: cli, telegram, discord, whatsapp, slack, qqbot, teams, google_chat
#
# Examples:
#
@@ -632,6 +632,7 @@ agent:
# homeassistant: hermes-homeassistant (same as telegram)
# qqbot: hermes-qqbot (same as telegram)
# teams: hermes-teams (same as telegram)
+# google_chat: hermes-google_chat (same as telegram)
#
platform_toolsets:
cli: [hermes-cli]
@@ -644,6 +645,7 @@ platform_toolsets:
qqbot: [hermes-qqbot]
yuanbao: [hermes-yuanbao]
teams: [hermes-teams]
+ google_chat: [hermes-google_chat]
# =============================================================================
# Gateway Platform Settings
@@ -875,6 +877,22 @@ display:
# Toggle at runtime with /verbose in the CLI
tool_progress: all
+ # Auto-cleanup of temporary progress bubbles after the final response lands.
+ # On platforms that support message deletion (currently Telegram), this
+ # removes the tool-progress bubble, "⏳ Still working..." notices, and
+ # context-pressure status messages once the final reply has been delivered —
+ # keeping long-running turns visible live, then tidy afterward. Failed runs
+ # leave the bubbles in place as breadcrumbs. Off by default.
+ # Per-platform override: display.platforms.telegram.cleanup_progress
+ # true: Delete tracked progress/status bubbles on successful turn
+ # false: Leave everything in place (default)
+ # Example:
+ # display:
+ # platforms:
+ # telegram:
+ # cleanup_progress: true
+ cleanup_progress: false
+
# Gateway-only natural mid-turn assistant updates.
# When true, completed assistant status messages are sent as separate chat
# messages. This is independent of tool_progress and gateway streaming.
diff --git a/cli.py b/cli.py
index 3b9f6af531..08a9bb94ce 100644
--- a/cli.py
+++ b/cli.py
@@ -27,6 +27,7 @@ import tempfile
import time
import uuid
import textwrap
+from collections import deque
from urllib.parse import unquote, urlparse
from contextlib import contextmanager
from pathlib import Path
@@ -298,6 +299,7 @@ def load_cli_config() -> Dict[str, Any]:
"browser": {
"inactivity_timeout": 120, # Auto-cleanup inactive browser sessions after 2 min
"record_sessions": False, # Auto-record browser sessions as WebM videos
+ "engine": "auto", # Browser engine: auto (Chrome), lightpanda, chrome
},
"compression": {
"enabled": True, # Auto-compress when approaching context limit
@@ -334,6 +336,8 @@ def load_cli_config() -> Dict[str, Any]:
"show_reasoning": False,
"streaming": True,
"busy_input_mode": "interrupt",
+ "persistent_output": True,
+ "persistent_output_max_lines": 200,
"skin": "default",
},
@@ -940,6 +944,18 @@ def _run_state_db_auto_maintenance(session_db) -> None:
except Exception as _prune_exc:
logger.debug("Ghost session prune skipped: %s", _prune_exc)
+ # One-time finalize of orphaned compression continuations (#20001).
+ try:
+ if not session_db.get_meta("orphaned_compression_finalize_v1"):
+ finalized = session_db.finalize_orphaned_compression_sessions()
+ session_db.set_meta("orphaned_compression_finalize_v1", "1")
+ if finalized:
+ logger.info(
+ "Finalized %d orphaned compression sessions", finalized
+ )
+ except Exception as _finalize_exc:
+ logger.debug("Orphan compression finalize skipped: %s", _finalize_exc)
+
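The one-time finalize guard follows a reusable run-once-per-flag pattern: check a meta flag, do the work, set the flag so later startups skip it. A hedged sketch with a stand-in for the session DB's meta API (`FakeMetaDB` and `run_once` are illustrative names, not part of the codebase):

```python
class FakeMetaDB:
    """Stand-in for the session DB's assumed get_meta/set_meta API."""

    def __init__(self):
        self._meta = {}

    def get_meta(self, key):
        return self._meta.get(key)

    def set_meta(self, key, value):
        self._meta[key] = value


def run_once(db, flag, action):
    # Mirrors the guard above: skip if the flag is already set,
    # otherwise run the action and record the flag. If the action
    # raises, the flag stays unset so the next startup retries.
    if db.get_meta(flag):
        return False
    action()
    db.set_meta(flag, "1")
    return True


db = FakeMetaDB()
ran = []
print(run_once(db, "finalize_v1", lambda: ran.append(1)))  # True
print(run_once(db, "finalize_v1", lambda: ran.append(1)))  # False
```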
cfg = (_load_full_config().get("sessions") or {})
if not cfg.get("auto_prune", False):
return
@@ -971,6 +987,7 @@ def _run_checkpoint_auto_maintenance() -> None:
retention_days=int(cfg.get("retention_days", 7)),
min_interval_hours=int(cfg.get("min_interval_hours", 24)),
delete_orphans=bool(cfg.get("delete_orphans", True)),
+ max_total_size_mb=int(cfg.get("max_total_size_mb", 500)),
)
except Exception as exc:
logger.debug("checkpoint auto-maintenance skipped: %s", exc)
@@ -1263,6 +1280,87 @@ def _render_final_assistant_content(text: str, mode: str = "render"):
return Markdown(plain)
+_OUTPUT_HISTORY_ENABLED = True
+_OUTPUT_HISTORY_REPLAYING = False
+_OUTPUT_HISTORY_SUPPRESSED = False
+_OUTPUT_HISTORY_MAX_LINES = 200
+_OUTPUT_HISTORY = deque(maxlen=_OUTPUT_HISTORY_MAX_LINES)
+_ANSI_CONTROL_RE = re.compile(
+ r"\x1b(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x07]*(?:\x07|\x1b\\))"
+)
+
+
+def _coerce_output_history_limit(value) -> int:
+ try:
+ return max(10, int(value))
+ except (TypeError, ValueError):
+ return 200
+
+
+def _configure_output_history(enabled: bool, max_lines=200) -> None:
+ """Configure recent CLI output replayed after terminal redraws."""
+ global _OUTPUT_HISTORY_ENABLED, _OUTPUT_HISTORY_MAX_LINES, _OUTPUT_HISTORY
+ _OUTPUT_HISTORY_ENABLED = bool(enabled)
+ _OUTPUT_HISTORY_MAX_LINES = _coerce_output_history_limit(max_lines)
+ _OUTPUT_HISTORY = deque(maxlen=_OUTPUT_HISTORY_MAX_LINES)
+
+
+def _clear_output_history() -> None:
+ _OUTPUT_HISTORY.clear()
+
+
+@contextmanager
+def _suspend_output_history():
+ global _OUTPUT_HISTORY_SUPPRESSED
+ old_value = _OUTPUT_HISTORY_SUPPRESSED
+ _OUTPUT_HISTORY_SUPPRESSED = True
+ try:
+ yield
+ finally:
+ _OUTPUT_HISTORY_SUPPRESSED = old_value
+
+
+def _record_output_history_entry(entry) -> None:
+ if not _OUTPUT_HISTORY_ENABLED or _OUTPUT_HISTORY_REPLAYING or _OUTPUT_HISTORY_SUPPRESSED:
+ return
+ _OUTPUT_HISTORY.append(entry)
+
+
+def _record_output_history(text: str) -> None:
+ if not _OUTPUT_HISTORY_ENABLED or _OUTPUT_HISTORY_REPLAYING or _OUTPUT_HISTORY_SUPPRESSED:
+ return
+ clean = _ANSI_CONTROL_RE.sub("", str(text)).replace("\r", "").rstrip("\n")
+ if not clean:
+ return
+ for line in clean.splitlines():
+ _record_output_history_entry(line)
+
+
+def _replay_output_history() -> None:
+ """Repaint recent output above the prompt after a full screen clear."""
+ global _OUTPUT_HISTORY_REPLAYING
+ if not _OUTPUT_HISTORY_ENABLED or not _OUTPUT_HISTORY:
+ return
+ _OUTPUT_HISTORY_REPLAYING = True
+ try:
+ for entry in tuple(_OUTPUT_HISTORY):
+ if callable(entry):
+ try:
+ lines = entry()
+ except Exception:
+ continue
+ if isinstance(lines, str):
+ lines = lines.splitlines()
+ else:
+ lines = [entry]
+ for line in lines:
+ _pt_print(_PT_ANSI(str(line)))
+ except Exception:
+ pass
+ finally:
+ _OUTPUT_HISTORY_REPLAYING = False
+
+
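The replay loop handles two entry shapes: plain strings recorded verbatim, and callables that are re-invoked at replay time so width-sensitive panels can re-render for the resized terminal. A minimal sketch of that dispatch over a bounded deque:

```python
from collections import deque

history = deque(maxlen=200)
history.append("plain line")
# Callable entries re-render at replay time (e.g. a panel laid out
# for the current terminal width).
history.append(lambda: ["re-rendered", "at current width"])

replayed = []
for entry in tuple(history):
    if callable(entry):
        lines = entry()
        if isinstance(lines, str):
            lines = lines.splitlines()
    else:
        lines = [entry]
    replayed.extend(str(line) for line in lines)

print(replayed)  # ['plain line', 're-rendered', 'at current width']
```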
def _cprint(text: str):
"""Print ANSI-colored text through prompt_toolkit's native renderer.
@@ -1279,6 +1377,8 @@ def _cprint(text: str):
``loop.call_soon_threadsafe``, which pauses the input area, prints
the line above it, and redraws the prompt cleanly.
"""
+ _record_output_history(text)
+
try:
from prompt_toolkit.application import get_app_or_none, run_in_terminal
except Exception:
@@ -1308,7 +1408,13 @@ def _cprint(text: str):
import asyncio as _asyncio
try:
- current_loop = _asyncio.get_event_loop_policy().get_event_loop()
+ # Use get_running_loop() instead of get_event_loop() to avoid the
+ # DeprecationWarning / RuntimeWarning emitted by Python 3.10+ when
+ # get_event_loop() is called from a thread that has no current event
+ # loop set (e.g. the process_loop background thread). Fixes #19285.
+ current_loop = _asyncio.get_running_loop()
+ except RuntimeError:
+ current_loop = None
except Exception:
current_loop = None
# Same thread as the app's loop → safe to print directly.
@@ -1450,7 +1556,21 @@ def _resolve_attachment_path(raw_path: str) -> Path | None:
except Exception:
resolved = path
- if not resolved.exists() or not resolved.is_file():
+ # Path.exists() / is_file() invoke os.stat(), which raises OSError when
+ # the candidate string is structurally invalid as a path — most commonly
+ # ENAMETOOLONG (errno 63 on macOS, errno 36 on Linux) when the input
+ # exceeds NAME_MAX (typically 255 bytes). This bites pasted slash
+ # commands like `/goal ` because `_detect_file_drop()`'s
+ # `starts_like_path` prefilter accepts any input starting with `/`,
+ # then this resolver tries to stat it before short-circuiting on the
+ # slash-command path. Without this guard the OSError propagates up to
+ # the process_loop catch-all in _interactive_loop and the user input
+ # is silently lost (the warning ends up in agent.log but the user sees
+ # nothing — the prompt just hangs).
+ try:
+ if not resolved.exists() or not resolved.is_file():
+ return None
+ except OSError:
return None
return resolved
@@ -1660,6 +1780,20 @@ _TERMINAL_INPUT_MODE_RESET_SEQ = (
)
+def _bind_prompt_submit_keys(kb, handler) -> None:
+ """Bind both CR and LF terminal Enter forms to the submit handler."""
+ for key in ("enter", "c-j"):
+ kb.add(key)(handler)
+
+
+def _disable_prompt_toolkit_cpr_warning(app) -> None:
+ """Let prompt_toolkit fall back from CPR without printing into the prompt."""
+ try:
+ app.renderer.cpr_not_supported_callback = None
+ except Exception:
+ pass
+
+
def _strip_leaked_terminal_responses_with_meta(text: str) -> tuple[str, bool]:
"""Strip leaked terminal control-response sequences from user input.
@@ -1890,8 +2024,8 @@ _skill_commands = scan_skill_commands()
def _get_plugin_cmd_handler_names() -> set:
"""Return plugin command names (without slash prefix) for dispatch matching."""
try:
- from hermes_cli.plugins import get_plugin_manager
- return set(get_plugin_manager()._plugin_commands.keys())
+ from hermes_cli.plugins import get_plugin_commands
+ return set(get_plugin_commands().keys())
except Exception:
return set()
@@ -2035,6 +2169,10 @@ class HermesCLI:
self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
# show_reasoning: display model thinking/reasoning before the response
self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
+ _configure_output_history(
+ enabled=CLI_CONFIG["display"].get("persistent_output", True),
+ max_lines=CLI_CONFIG["display"].get("persistent_output_max_lines", 200),
+ )
# busy_input_mode: "interrupt" (Enter interrupts current run),
# "queue" (Enter queues for next turn), or "steer" (Enter injects
# mid-run via /steer, arriving after the next tool call).
@@ -2145,7 +2283,10 @@ class HermesCLI:
elif CLI_CONFIG.get("max_turns"): # Backwards compat: root-level max_turns
self.max_turns = CLI_CONFIG["max_turns"]
elif os.getenv("HERMES_MAX_ITERATIONS"):
- self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS"))
+ try:
+ self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS", ""))
+ except (TypeError, ValueError):
+ self.max_turns = 90
else:
self.max_turns = 90
@@ -2167,7 +2308,9 @@ class HermesCLI:
if isinstance(cp_cfg, bool):
cp_cfg = {"enabled": cp_cfg}
self.checkpoints_enabled = checkpoints or cp_cfg.get("enabled", False)
- self.checkpoint_max_snapshots = cp_cfg.get("max_snapshots", 50)
+ self.checkpoint_max_snapshots = cp_cfg.get("max_snapshots", 20)
+ self.checkpoint_max_total_size_mb = cp_cfg.get("max_total_size_mb", 500)
+ self.checkpoint_max_file_size_mb = cp_cfg.get("max_file_size_mb", 10)
self.pass_session_id = pass_session_id
# --ignore-rules: honor either the constructor flag or the env var set
# by `hermes chat --ignore-rules` in hermes_cli/main.py. When true we
@@ -2309,6 +2452,9 @@ class HermesCLI:
# Status bar visibility (toggled via /statusbar)
self._status_bar_visible = True
+ self._resize_recovery_lock = threading.Lock()
+ self._resize_recovery_timer = None
+ self._resize_recovery_pending = False
# Background task tracking: {task_id: threading.Thread}
self._background_tasks: Dict[str, threading.Thread] = {}
@@ -2316,6 +2462,8 @@ class HermesCLI:
def _invalidate(self, min_interval: float = 0.25) -> None:
"""Throttled UI repaint — prevents terminal blinking on slow/SSH connections."""
+ if getattr(self, "_resize_recovery_pending", False):
+ return
now = time.monotonic()
if hasattr(self, "_app") and self._app and (now - self._last_invalidate) >= min_interval:
self._last_invalidate = now
@@ -2339,11 +2487,25 @@ class HermesCLI:
app = getattr(self, "_app", None)
if not app:
return
+ self._clear_prompt_toolkit_screen(app)
+ _replay_output_history()
+ try:
+ app.invalidate()
+ except Exception:
+ pass
+
+ def _clear_prompt_toolkit_screen(self, app, *, rebuild_scrollback: bool = False) -> None:
+ """Clear the terminal and reset prompt_toolkit renderer state."""
try:
renderer = app.renderer
out = renderer.output
out.reset_attributes()
out.erase_screen()
+ if rebuild_scrollback:
+ try:
+ out.write_raw("\x1b[3J")
+ except Exception:
+ pass
out.cursor_goto(0, 0)
out.flush()
# Drop prompt_toolkit's cached screen + cursor state so the
@@ -2352,10 +2514,57 @@ class HermesCLI:
renderer.reset(leave_alternate_screen=False)
except Exception:
pass
+
+ def _recover_after_resize(self, app, original_on_resize) -> None:
+ """Recover a resized classic CLI without desynchronizing cursor state."""
+ self._clear_prompt_toolkit_screen(app, rebuild_scrollback=True)
+ _replay_output_history()
+ original_on_resize()
+
+ def _schedule_resize_recovery(self, app, original_on_resize, delay: float = 0.12) -> None:
+ """Debounce resize redraws so footer chrome is not stamped into scrollback."""
try:
- app.invalidate()
+ old_timer = getattr(self, "_resize_recovery_timer", None)
+ lock = getattr(self, "_resize_recovery_lock", None)
+ if lock is None:
+ lock = threading.Lock()
+ self._resize_recovery_lock = lock
+
+ def _timer_fired(timer_ref):
+ def _run_recovery():
+ with lock:
+ if getattr(self, "_resize_recovery_timer", None) is not timer_ref:
+ return
+ self._resize_recovery_timer = None
+ self._resize_recovery_pending = False
+ self._recover_after_resize(app, original_on_resize)
+
+ try:
+ loop = app.loop # type: ignore[attr-defined]
+ except Exception:
+ loop = None
+ if loop is not None:
+ try:
+ loop.call_soon_threadsafe(_run_recovery)
+ return
+ except Exception:
+ pass
+ _run_recovery()
+
+ with lock:
+ if old_timer is not None:
+ try:
+ old_timer.cancel()
+ except Exception:
+ pass
+ self._resize_recovery_pending = True
+ timer = threading.Timer(delay, lambda: _timer_fired(timer))
+ timer.daemon = True
+ self._resize_recovery_timer = timer
+ timer.start()
except Exception:
- pass
+ self._resize_recovery_pending = False
+ self._recover_after_resize(app, original_on_resize)
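The cancel-and-restart timer dance generalizes to a small trailing-edge debouncer: each new event cancels the pending timer, so only the last event in a burst fires. A sketch of the same pattern (class name and API are illustrative):

```python
import threading
import time


class Debouncer:
    """Collapse a burst of events into one trailing call."""

    def __init__(self, delay: float, fn):
        self._delay = delay
        self._fn = fn
        self._lock = threading.Lock()
        self._timer = None

    def trigger(self) -> None:
        with self._lock:
            # Cancel the pending timer so only the last event in a
            # burst fires, as in the resize-recovery logic above.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            self._timer = None
        self._fn()


calls = []
debouncer = Debouncer(0.05, lambda: calls.append("recovered"))
for _ in range(5):
    debouncer.trigger()
time.sleep(0.3)
print(calls)  # ['recovered']
```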
def _status_bar_context_style(self, percent_used: Optional[int]) -> str:
if percent_used is None:
@@ -2368,6 +2577,15 @@ class HermesCLI:
return "class:status-bar-warn"
return "class:status-bar-good"
+ @staticmethod
+ def _compression_count_style(count: int) -> str:
+ """Return a style class reflecting context compression pressure."""
+ if count >= 10:
+ return "class:status-bar-bad"
+ if count >= 5:
+ return "class:status-bar-warn"
+ return "class:status-bar-dim"
+
def _build_context_bar(self, percent_used: Optional[int], width: int = 10) -> str:
safe_percent = max(0, min(100, percent_used or 0))
filled = round((safe_percent / 100) * width)
@@ -2573,29 +2791,68 @@ class HermesCLI:
elapsed = time.monotonic() - t0
if elapsed >= 60:
_m, _s = int(elapsed // 60), int(elapsed % 60)
- elapsed_str = f"{_m}m {_s}s"
+ # Fixed-width timer to avoid status-line wrap jitter while
+ # scrolling/repainting (e.g. 01m05s, 12m09s).
+ elapsed_str = f"{_m:02d}m{_s:02d}s"
else:
- elapsed_str = f"{elapsed:.1f}s"
+ # Keep width stable before the 60s rollover as well.
+ elapsed_str = f"{elapsed:5.1f}s"
return f" {txt} ({elapsed_str})"
return f" {txt}"
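The fixed-width formatting keeps the status line from shifting as the timer ticks. The two branches behave like this (standalone sketch of the same format strings):

```python
def format_elapsed(elapsed: float) -> str:
    # Mirrors the diff: zero-padded minutes/seconds past 60s,
    # width-5 float seconds before the rollover.
    if elapsed >= 60:
        minutes, seconds = int(elapsed // 60), int(elapsed % 60)
        return f"{minutes:02d}m{seconds:02d}s"
    return f"{elapsed:5.1f}s"


print(repr(format_elapsed(65)))   # '01m05s'
print(repr(format_elapsed(729)))  # '12m09s'
print(repr(format_elapsed(3.2)))  # '  3.2s'
```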
+ def _voice_record_key_label(self) -> str:
+ """Return the configured voice push-to-talk key formatted for UI.
+
+ Shared helper so every voice-facing status line / placeholder /
+ recording hint advertises the SAME label as the registered
+ prompt_toolkit binding.
+
+ Cached at startup (see ``set_voice_record_key_cache``) rather
+ than re-read per render. Two reasons (Copilot round-13 on
+ #19835):
+
+ * The prompt_toolkit binding is registered once at session
+ start via ``@kb.add(_voice_key)``; re-reading config per
+ render meant the status bar could advertise a new shortcut
+ after a config edit while the actual binding was still the
+ startup chord — exactly the display/binding drift this PR
+ is trying to eliminate.
+ * The label is on the hot render path (status bar + composer
+ placeholder invalidated every 150ms during recording), so
+ reading config on every call added avoidable UI overhead.
+ """
+ return getattr(self, "_voice_record_key_display_cache", None) or "Ctrl+B"
+
+ def set_voice_record_key_cache(self, raw_key: object) -> None:
+ """Populate the voice label cache from a raw ``voice.record_key``.
+
+ Called at CLI startup after the prompt_toolkit binding is
+ registered so the cached label always matches the live binding.
+ """
+ try:
+ from hermes_cli.voice import format_voice_record_key_for_status
+ self._voice_record_key_display_cache = format_voice_record_key_for_status(raw_key)
+ except Exception:
+ self._voice_record_key_display_cache = "Ctrl+B"
+
def _get_voice_status_fragments(self, width: Optional[int] = None):
"""Return the voice status bar fragments for the interactive TUI."""
width = width or self._get_tui_terminal_width()
compact = self._use_minimal_tui_chrome(width=width)
+ label = self._voice_record_key_label()
if self._voice_recording:
if compact:
return [("class:voice-status-recording", " ● REC ")]
- return [("class:voice-status-recording", " ● REC Ctrl+B to stop ")]
+ return [("class:voice-status-recording", f" ● REC {label} to stop ")]
if self._voice_processing:
if compact:
return [("class:voice-status", " ◉ STT ")]
return [("class:voice-status", " ◉ Transcribing... ")]
if compact:
- return [("class:voice-status", " 🎤 Ctrl+B ")]
+ return [("class:voice-status", f" 🎤 {label} ")]
tts = " | TTS on" if self._voice_tts else ""
cont = " | Continuous" if self._voice_continuous else ""
- return [("class:voice-status", f" 🎤 Voice mode{tts}{cont} — Ctrl+B to record ")]
+ return [("class:voice-status", f" 🎤 Voice mode{tts}{cont} — {label} to record ")]
def _build_status_bar_text(self, width: Optional[int] = None) -> str:
"""Return a compact one-line session status string for the TUI footer."""
@@ -2612,6 +2869,9 @@ class HermesCLI:
return self._trim_status_bar_text(text, width)
if width < 76:
parts = [f"⚕ {snapshot['model_short']}", percent_label]
+ compressions = snapshot.get("compressions", 0)
+ if compressions:
+ parts.append(f"🗜️ {compressions}")
parts.append(duration_label)
return self._trim_status_bar_text(" · ".join(parts), width)
@@ -2622,7 +2882,10 @@ class HermesCLI:
else:
context_label = "ctx --"
+ compressions = snapshot.get("compressions", 0)
parts = [f"⚕ {snapshot['model_short']}", context_label, percent_label]
+ if compressions:
+ parts.append(f"🗜️ {compressions}")
parts.append(duration_label)
prompt_elapsed = snapshot.get("prompt_elapsed")
if prompt_elapsed:
@@ -2656,15 +2919,21 @@ class HermesCLI:
percent = snapshot["context_percent"]
percent_label = f"{percent}%" if percent is not None else "--"
if width < 76:
+ compressions = snapshot.get("compressions", 0)
frags = [
("class:status-bar", " ⚕ "),
("class:status-bar-strong", snapshot["model_short"]),
("class:status-bar-dim", " · "),
(self._status_bar_context_style(percent), percent_label),
+ ]
+ if compressions:
+ frags.append(("class:status-bar-dim", " · "))
+ frags.append((self._compression_count_style(compressions), f"🗜️ {compressions}"))
+ frags.extend([
("class:status-bar-dim", " · "),
("class:status-bar-dim", duration_label),
("class:status-bar", " "),
- ]
+ ])
else:
if snapshot["context_length"]:
ctx_total = _format_context_length(snapshot["context_length"])
@@ -2674,6 +2943,7 @@ class HermesCLI:
context_label = "ctx --"
bar_style = self._status_bar_context_style(percent)
+ compressions = snapshot.get("compressions", 0)
frags = [
("class:status-bar", " ⚕ "),
("class:status-bar-strong", snapshot["model_short"]),
@@ -2683,9 +2953,14 @@ class HermesCLI:
(bar_style, self._build_context_bar(percent)),
("class:status-bar-dim", " "),
(bar_style, percent_label),
+ ]
+ if compressions:
+ frags.append(("class:status-bar-dim", " │ "))
+ frags.append((self._compression_count_style(compressions), f"🗜️ {compressions}"))
+ frags.extend([
("class:status-bar-dim", " │ "),
("class:status-bar-dim", duration_label),
- ]
+ ])
# Position 7: per-prompt elapsed timer (live or frozen)
prompt_elapsed = snapshot.get("prompt_elapsed")
if prompt_elapsed:
@@ -3634,6 +3909,8 @@ class HermesCLI:
thinking_callback=self._on_thinking,
checkpoints_enabled=self.checkpoints_enabled,
checkpoint_max_snapshots=self.checkpoint_max_snapshots,
+ checkpoint_max_total_size_mb=self.checkpoint_max_total_size_mb,
+ checkpoint_max_file_size_mb=self.checkpoint_max_file_size_mb,
pass_session_id=self.pass_session_id,
skip_context_files=self.ignore_rules,
skip_memory=self.ignore_rules,
@@ -3991,7 +4268,26 @@ class HermesCLI:
padding=(0, 1),
style=_history_text_c,
)
- self._console_print(panel)
+ _record_output_history_entry(lambda: self._render_resume_history_panel_lines(panel))
+ with _suspend_output_history():
+ self._console_print(panel)
+
+ def _render_resume_history_panel_lines(self, panel) -> list[str]:
+ """Render the resume panel at the current terminal width for resize replay."""
+ from io import StringIO
+
+ buf = StringIO()
+ width = shutil.get_terminal_size((80, 24)).columns
+ console = Console(
+ file=buf,
+ force_terminal=True,
+ color_system="truecolor",
+ highlight=False,
+ width=width,
+ )
+ with _suspend_output_history():
+ console.print(panel)
+ return buf.getvalue().rstrip("\n").splitlines()
def _try_attach_clipboard_image(self) -> bool:
"""Check clipboard for an image and attach it if found.
@@ -6350,6 +6646,7 @@ class HermesCLI:
_cprint(f" {_DIM}✓ UI redrawn{_RST}")
elif canonical == "clear":
self.new_session(silent=True)
+ _clear_output_history()
# Clear terminal screen. Inside the TUI, Rich's console.clear()
# goes through patch_stdout's StdoutProxy which swallows the
# screen-clear escape sequences. Use prompt_toolkit's output
@@ -7080,7 +7377,20 @@ class HermesCLI:
if provider is not None:
print(f"🌐 Browser: {provider.provider_name()} (cloud)")
else:
- print("🌐 Browser: local headless Chromium (agent-browser)")
+ # Show engine info for local mode
+ try:
+ from tools.browser_tool import _get_browser_engine
+ engine = _get_browser_engine()
+ except Exception:
+ engine = "auto"
+ if engine == "lightpanda":
+ print("🌐 Browser: local Lightpanda (agent-browser --engine lightpanda)")
+ print(" ⚡ Lightpanda: faster navigation, no screenshot support")
+ print(" Automatic Chrome fallback for screenshots and failed commands")
+ elif engine == "chrome":
+ print("🌐 Browser: local headless Chrome (agent-browser --engine chrome)")
+ else:
+ print("🌐 Browser: local headless Chromium (agent-browser)")
print()
print(" /browser connect — connect to your live Chrome")
print(" /browser disconnect — revert to default")
@@ -7623,6 +7933,10 @@ class HermesCLI:
):
self.session_id = self.agent.session_id
self._pending_title = None
+ # Manual /compress replaces conversation_history with a new
+ # compressed handoff for the child session. Persist it from
+ # offset 0 so resume can recover the continuation after exit.
+ self.agent._flush_messages_to_session_db(self.conversation_history, None)
new_tokens = estimate_request_tokens_rough(
self.conversation_history,
system_prompt=_sys_prompt,
@@ -7677,6 +7991,7 @@ class HermesCLI:
output_tokens = getattr(agent, "session_output_tokens", 0) or 0
cache_read_tokens = getattr(agent, "session_cache_read_tokens", 0) or 0
cache_write_tokens = getattr(agent, "session_cache_write_tokens", 0) or 0
+ reasoning_tokens = getattr(agent, "session_reasoning_tokens", 0) or 0
prompt = agent.session_prompt_tokens
completion = agent.session_completion_tokens
total = agent.session_total_tokens
@@ -7708,6 +8023,8 @@ class HermesCLI:
print(f" Cache read tokens: {cache_read_tokens:>10,}")
print(f" Cache write tokens: {cache_write_tokens:>10,}")
print(f" Output tokens: {output_tokens:>10,}")
+ if reasoning_tokens:
+ print(f" ↳ Reasoning (subset): {reasoning_tokens:>10,}")
print(f" Prompt tokens (total): {prompt:>10,}")
print(f" Completion tokens: {completion:>10,}")
print(f" Total tokens: {total:>10,}")
@@ -8270,20 +8587,38 @@ class HermesCLI:
return
self._voice_recording = True
- # Load silence detection params from config
- voice_cfg = {}
+ # Load silence detection params from config. Shape-safe: a
+ # hand-edited ``voice: true`` / ``voice: cmd+b`` leaves
+ # ``load_config()['voice']`` as a non-dict; coerce to {} so
+ # continuous recording falls back to the documented defaults
+ # instead of crashing on ``.get()``.
+ voice_cfg: dict = {}
try:
from hermes_cli.config import load_config
- voice_cfg = load_config().get("voice", {})
+ _cfg = load_config().get("voice")
+ voice_cfg = _cfg if isinstance(_cfg, dict) else {}
except Exception:
pass
if self._voice_recorder is None:
self._voice_recorder = create_audio_recorder()
- # Apply config-driven silence params
- self._voice_recorder._silence_threshold = voice_cfg.get("silence_threshold", 200)
- self._voice_recorder._silence_duration = voice_cfg.get("silence_duration", 3.0)
+ # Apply config-driven silence params (numeric-guarded so YAML
+ # scalar corruption doesn't break recording start-up).
+ #
+ # ``bool`` is explicitly excluded from the numeric check — in
+ # Python bool is a subclass of int, so a hand-edited
+ # ``silence_threshold: true`` would otherwise be forwarded as
+ # ``1`` instead of falling back to the 200 default (Copilot
+ # round-12 on #19835).
+ _threshold = voice_cfg.get("silence_threshold")
+ _duration = voice_cfg.get("silence_duration")
+ self._voice_recorder._silence_threshold = (
+ _threshold if isinstance(_threshold, (int, float)) and not isinstance(_threshold, bool) else 200
+ )
+ self._voice_recorder._silence_duration = (
+ _duration if isinstance(_duration, (int, float)) and not isinstance(_duration, bool) else 3.0
+ )
def _on_silence():
"""Called by AudioRecorder when silence is detected after speech."""
@@ -8309,12 +8644,13 @@ class HermesCLI:
with self._voice_lock:
self._voice_recording = False
raise
+ _label = self._voice_record_key_label()
if getattr(self._voice_recorder, "supports_silence_autostop", True):
- _recording_hint = "auto-stops on silence | Ctrl+B to stop & exit continuous"
+ _recording_hint = f"auto-stops on silence | {_label} to stop & exit continuous"
elif _is_termux_environment():
- _recording_hint = "Termux:API capture | Ctrl+B to stop"
+ _recording_hint = f"Termux:API capture | {_label} to stop"
else:
- _recording_hint = "Ctrl+B to stop"
+ _recording_hint = f"{_label} to stop"
_cprint(f"\n{_ACCENT}● Recording...{_RST} {_DIM}({_recording_hint}){_RST}")
# Periodically refresh prompt to update audio level indicator
@@ -8559,10 +8895,12 @@ class HermesCLI:
with self._voice_lock:
self._voice_mode = True
- # Check config for auto_tts
+ # Check config for auto_tts (shape-safe — malformed ``voice:`` YAML
+ # leaves ``voice_config`` as a non-dict, so guard before .get()).
try:
from hermes_cli.config import load_config
- voice_config = load_config().get("voice", {})
+ _raw_voice = load_config().get("voice")
+ voice_config = _raw_voice if isinstance(_raw_voice, dict) else {}
if voice_config.get("auto_tts", False):
with self._voice_lock:
self._voice_tts = True
@@ -8574,13 +8912,11 @@ class HermesCLI:
# _voice_message_prefix property and its usage in _process_message().
tts_status = " (TTS enabled)" if self._voice_tts else ""
- try:
- from hermes_cli.config import load_config
- _raw_ptt = load_config().get("voice", {}).get("record_key", "ctrl+b")
- _ptt_key = _raw_ptt.lower().replace("ctrl+", "c-").replace("alt+", "a-")
- except Exception:
- _ptt_key = "c-b"
- _ptt_display = _ptt_key.replace("c-", "Ctrl+").upper()
+ # Use the startup-pinned cache so the advertised shortcut always
+ # matches the live prompt_toolkit binding — reading live config
+ # here would drift after a mid-session config edit (Copilot
+ # round-14 on #19835, same class as round-13).
+ _ptt_display = self._voice_record_key_label()
_cprint(f"\n{_ACCENT}Voice mode enabled{tts_status}{_RST}")
_cprint(f" {_DIM}{_ptt_display} to start/stop recording{_RST}")
_cprint(f" {_DIM}/voice tts to toggle speech output{_RST}")
@@ -8637,7 +8973,6 @@ class HermesCLI:
def _show_voice_status(self):
"""Show current voice mode status."""
- from hermes_cli.config import load_config
from tools.voice_mode import check_voice_requirements
reqs = check_voice_requirements()
@@ -8646,9 +8981,11 @@ class HermesCLI:
_cprint(f" Mode: {'ON' if self._voice_mode else 'OFF'}")
_cprint(f" TTS: {'ON' if self._voice_tts else 'OFF'}")
_cprint(f" Recording: {'YES' if self._voice_recording else 'no'}")
- _raw_key = load_config().get("voice", {}).get("record_key", "ctrl+b")
- _display_key = _raw_key.replace("ctrl+", "Ctrl+").upper() if "ctrl+" in _raw_key.lower() else _raw_key
- _cprint(f" Record key: {_display_key}")
+ # Display the startup-pinned label so /voice status always
+ # matches the live prompt_toolkit binding (Copilot round-14 on
+ # #19835, same class as round-13). Reading live config here
+ # would drift after a mid-session config edit.
+ _cprint(f" Record key: {self._voice_record_key_label()}")
_cprint(f"\n {_BOLD}Requirements:{_RST}")
for line in reqs["details"].split("\n"):
_cprint(f" {line}")
@@ -9912,6 +10249,24 @@ class HermesCLI:
_welcome_text = "Welcome to Hermes Agent! Type your message or /help for commands."
_welcome_color = "#FFF8DC"
self._console_print(f"[{_welcome_color}]{_welcome_text}[/]")
+
+ # Redaction opt-out warning (#17691): ON by default, loud when off.
+ # The redactor snapshots its state at import time so any toggle now
+ # won't affect the running process — we just want the operator to
+ # see that they're running without the safety net.
+ try:
+ _redact_raw = os.getenv("HERMES_REDACT_SECRETS", "true")
+ if _redact_raw.lower() not in ("1", "true", "yes", "on"):
+ self._console_print(
+ "[bold red]⚠ Secret redaction is DISABLED[/] "
+ f"(HERMES_REDACT_SECRETS={_redact_raw}). "
+ "API keys and tokens may appear verbatim in chat output, "
+ "session JSONs, and logs. Set "
+ "[cyan]security.redact_secrets: true[/] in config.yaml "
+ "to re-enable."
+ )
+ except Exception:
+ pass
# First-time OpenClaw-residue banner — fires once if ~/.openclaw/ exists
# after an OpenClaw→Hermes migration (especially migrations done by
# OpenClaw's own tool, which doesn't archive the source directory).
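The redaction warning's truthiness check can be reduced to a small predicate — default-on, disabled only by an explicit non-truthy value:

```python
import os

_TRUTHY = ("1", "true", "yes", "on")

def redaction_enabled(env=os.environ):
    """Mirror the startup check: redaction is ON unless the env var is
    explicitly set to a value outside the truthy set."""
    return env.get("HERMES_REDACT_SECRETS", "true").lower() in _TRUTHY
```

Defaulting to `"true"` when the variable is unset is what makes the warning an opt-out notice rather than a configuration reminder.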
@@ -10051,7 +10406,6 @@ class HermesCLI:
# Key bindings for the input area
kb = KeyBindings()
- @kb.add('enter')
def handle_enter(event):
"""Handle Enter key - submit input.
@@ -10210,17 +10564,14 @@ class HermesCLI:
else:
self._pending_input.put(payload)
event.app.current_buffer.reset(append_to_history=True)
+
+ _bind_prompt_submit_keys(kb, handle_enter)
@kb.add('escape', 'enter')
def handle_alt_enter(event):
"""Alt+Enter inserts a newline for multi-line input."""
event.current_buffer.insert_text('\n')
- @kb.add('c-j')
- def handle_ctrl_enter(event):
- """Ctrl+Enter (c-j) inserts a newline. Most terminals send c-j for Ctrl+Enter."""
- event.current_buffer.insert_text('\n')
-
# VSCode/Cursor bind Ctrl+G to "Find Next" at the editor level, so
# the keystroke never reaches the embedded terminal. Alt+G is unbound
# in those IDEs and arrives here as ('escape', 'g') — register it as
@@ -10483,7 +10834,92 @@ class HermesCLI:
else:
self._should_exit = True
event.app.exit()
-
+
+ # Ctrl+Shift+C: no binding needed. Terminal emulators (GNOME Terminal,
+ # iTerm2, kitty, Windows Terminal, etc.) intercept Ctrl+Shift+C before
+ # the keystroke reaches the application's stdin — prompt_toolkit never
+ # sees it, and prompt_toolkit's key spec parser doesn't even recognise
+ # 'c-S-c' anyway (the Shift modifier is meaningless on control-sequence
+ # keys). #19884 added a handler for this; #19895 patched the resulting
+ # startup crash with try/except. Both were based on a misreading of how
+ # terminal key events propagate. Deleting the dead handler outright.
+
+ @kb.add('c-q') # Ctrl+Q
+ def handle_ctrl_q(event):
+ """Alternative interrupt/exit shortcut (Ctrl+Q).
+
+ Behaves like Ctrl+C: cancels active prompts, interrupts the
+ running agent, or clears the input buffer. Does not support
+ the double-press 'force exit' feature of Ctrl+C.
+ """
+ # Cancel active voice recording.
+ _should_cancel_voice = False
+ _recorder_ref = None
+ with cli_ref._voice_lock:
+ if cli_ref._voice_recording and cli_ref._voice_recorder:
+ _recorder_ref = cli_ref._voice_recorder
+ cli_ref._voice_recording = False
+ cli_ref._voice_continuous = False
+ _should_cancel_voice = True
+ if _should_cancel_voice:
+ _cprint(f"\n{_DIM}Recording cancelled.{_RST}")
+ threading.Thread(
+ target=_recorder_ref.cancel, daemon=True
+ ).start()
+ event.app.invalidate()
+ return
+
+ # Cancel sudo prompt
+ if self._sudo_state:
+ self._sudo_state["response_queue"].put("")
+ self._sudo_state = None
+ event.app.invalidate()
+ return
+
+ # Cancel secret prompt
+ if self._secret_state:
+ self._cancel_secret_capture()
+ event.app.current_buffer.reset()
+ event.app.invalidate()
+ return
+
+ # Cancel approval prompt (deny)
+ if self._approval_state:
+ self._approval_state["response_queue"].put("deny")
+ self._approval_state = None
+ event.app.invalidate()
+ return
+
+ # Cancel /model picker
+ if self._model_picker_state:
+ self._close_model_picker()
+ event.app.current_buffer.reset()
+ event.app.invalidate()
+ return
+
+ # Cancel clarify prompt
+ if self._clarify_state:
+ self._clarify_state["response_queue"].put(
+ "The user cancelled. Use your best judgement to proceed."
+ )
+ self._clarify_state = None
+ self._clarify_freetext = False
+ event.app.current_buffer.reset()
+ event.app.invalidate()
+ return
+
+ if self._agent_running and self.agent:
+ print("\n⚡ Interrupting agent...")
+ self.agent.interrupt()
+ else:
+ if event.app.current_buffer.text or self._attached_images:
+ event.app.current_buffer.reset()
+ self._attached_images.clear()
+ event.app.invalidate()
+ else:
+ self._should_exit = True
+ event.app.exit()
+
@kb.add('c-d')
def handle_ctrl_d(event):
"""Ctrl+D: delete char under cursor (standard readline behaviour).
@@ -10537,15 +10973,44 @@ class HermesCLI:
run_in_terminal(_suspend)
# Voice push-to-talk key: configurable via config.yaml (voice.record_key)
- # Default: Ctrl+B (avoids conflict with Ctrl+R readline reverse-search)
- # Config uses "ctrl+b" format; prompt_toolkit expects "c-b" format.
+ # Default: Ctrl+B (avoids conflict with Ctrl+R readline reverse-search).
+ # Config spellings (ctrl/control/alt/option/opt) are normalized to
+ # prompt_toolkit's c-x / a-x format via ``normalize_voice_record_key_for_prompt_toolkit``
+ # so the same config value binds identically in the TUI and CLI
+ # (Copilot round-9 review on #19835). ``super``/``win``/``windows``
+ # configs silently fall back to the default here since prompt_toolkit
+ # has no super modifier — log a warning so users notice the
+ # TUI/CLI split instead of a silent mismatch (round-11).
+ _raw_key: object = "ctrl+b"
try:
from hermes_cli.config import load_config
- _raw_key = load_config().get("voice", {}).get("record_key", "ctrl+b")
- _voice_key = _raw_key.lower().replace("ctrl+", "c-").replace("alt+", "a-")
+ from hermes_cli.voice import (
+ normalize_voice_record_key_for_prompt_toolkit,
+ voice_record_key_from_config,
+ )
+ _raw_key = voice_record_key_from_config(load_config())
+ _voice_key = normalize_voice_record_key_for_prompt_toolkit(_raw_key)
+ if (
+ isinstance(_raw_key, str)
+ and _raw_key.strip().lower().split("+", 1)[0].strip() in {"super", "win", "windows"}
+ and _voice_key == "c-b"
+ ):
+ logger.warning(
+ "voice.record_key %r uses a TUI-only modifier (super/win); "
+ "CLI fell back to Ctrl+B. Use ctrl+ or alt+ for "
+ "cross-runtime parity.",
+ _raw_key,
+ )
except Exception:
_voice_key = "c-b"
+ # Cache the UI label here — same ``_raw_key`` that drives the
+ # prompt_toolkit binding below. Every status / placeholder /
+ # recording-hint render reads this cached value so display can
+ # never drift from the live keybinding even if the user edits
+ # voice.record_key mid-session (Copilot round-13 on #19835).
+ self.set_voice_record_key_cache(_raw_key)
+
@kb.add(_voice_key)
def handle_voice_record(event):
"""Toggle voice recording when voice mode is active.
@@ -10705,7 +11170,7 @@ class HermesCLI:
def get_prompt():
return cli_ref._get_tui_prompt_fragments()
- # Create the input area with multiline (shift+enter), autocomplete, and paste handling
+ # Create the input area with multiline (Alt+Enter), autocomplete, and paste handling
from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
@@ -10848,7 +11313,8 @@ class HermesCLI:
def _get_placeholder():
if cli_ref._voice_recording:
- return "recording... Ctrl+B to stop, Ctrl+C to cancel"
+ _label = cli_ref._voice_record_key_label()
+ return f"recording... {_label} to stop, Ctrl+C to cancel"
if cli_ref._voice_processing:
return "transcribing..."
if cli_ref._sudo_state:
@@ -10868,7 +11334,8 @@ class HermesCLI:
if cli_ref._agent_running:
return "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
if cli_ref._voice_mode:
- return "type or Ctrl+B to record"
+ _label = cli_ref._voice_record_key_label()
+ return f"type or {_label} to record"
return ""
input_area.control.input_processors.append(_PlaceholderProcessor(_get_placeholder))
@@ -11445,6 +11912,7 @@ class HermesCLI:
mouse_support=False,
**({'cursor': _STEADY_CURSOR} if _STEADY_CURSOR is not None else {}),
)
+ _disable_prompt_toolkit_cpr_warning(app)
self._app = app # Store reference for clarify_callback
# ── Fix ghost status-bar lines on terminal resize ──────────────
@@ -11464,23 +11932,7 @@ class HermesCLI:
_original_on_resize = app._on_resize
def _resize_clear_ghosts():
- renderer = app.renderer
- try:
- out = renderer.output
- # Reset attributes, erase the entire screen, and home the
- # cursor. This overwrites any reflowed status-bar rows or
- # stale content the terminal kept from the prior layout.
- out.reset_attributes()
- out.erase_screen()
- out.cursor_goto(0, 0)
- out.flush()
- # Tell the renderer its tracked position is fresh so its
- # own erase() inside _on_resize doesn't cursor_up() past
- # the top of the screen.
- renderer.reset(leave_alternate_screen=False)
- except Exception:
- pass # never break resize handling
- _original_on_resize()
+ self._schedule_resize_recovery(app, _original_on_resize)
app._on_resize = _resize_clear_ghosts
@@ -11671,8 +12123,22 @@ class HermesCLI:
call _kill_process (SIGTERM + 1 s wait + SIGKILL if needed) →
return from _wait_for_process. ``time.sleep`` releases the
GIL so the daemon actually runs during the window.
+
+ Guarded ``logger.debug``: CPython's ``logging`` module is not
+ reentrant-safe. ``Logger.isEnabledFor`` caches level results
+ in ``Logger._cache``; under shutdown races the cache can be
+ cleared (``_clear_cache``) or mid-mutation when the signal
+ fires, raising ``KeyError: <level>`` (e.g. ``KeyError: 10``
+ for DEBUG) inside the handler. That KeyError then escapes
+ before ``raise KeyboardInterrupt()`` can fire, which bypasses
+ prompt_toolkit's normal interrupt unwind and surfaces as the
+ EIO cascade from issue #13710. Wrap the log in a bare
+ ``try/except`` so the handler can never raise through it.
"""
- logger.debug("Received signal %s, triggering graceful shutdown", signum)
+ try:
+ logger.debug("Received signal %s, triggering graceful shutdown", signum)
+ except Exception:
+ pass # never let logging raise from a signal handler (#13710 regression)
try:
if getattr(self, "agent", None) and getattr(self, "_agent_running", False):
self.agent.interrupt(f"received signal {signum}")
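The guarded-logging pattern from the signal handler above can be sketched as a factory: any exception the logging machinery raises (such as the `Logger._cache` race) is swallowed before the shutdown path runs.

```python
import logging

logger = logging.getLogger("hermes.shutdown")

def make_signal_handler(on_shutdown):
    """Build a signal handler whose logging can never raise: exceptions
    from the logging machinery are swallowed so the shutdown callback
    always runs (the #13710 class of failure)."""
    def _handler(signum, frame):
        try:
            logger.debug("Received signal %s, triggering graceful shutdown", signum)
        except Exception:
            pass  # logging must never raise out of a signal handler
        on_shutdown(signum)
    return _handler
```

Note the `try/except` wraps only the log call; the shutdown callback itself stays outside the guard so its own errors remain visible.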
@@ -11733,8 +12199,12 @@ class HermesCLI:
# Set the custom handler on prompt_toolkit's event loop
try:
import asyncio as _aio
- _loop = _aio.get_event_loop()
+ # Use get_running_loop() to avoid DeprecationWarning on
+ # Python 3.10+ when called outside an async context.
+ _loop = _aio.get_running_loop()
_loop.set_exception_handler(_suppress_closed_loop_errors)
+ except RuntimeError:
+ pass # No running loop -- nothing to patch
except Exception:
pass
app.run()
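The `get_running_loop()` switch above boils down to this pattern — attach to the running loop if there is one, and treat "no running loop" as nothing to patch rather than falling back to the deprecated `get_event_loop()`:

```python
import asyncio

def install_exception_handler(handler):
    """Attach handler to the running event loop, if any.

    asyncio.get_running_loop() raises RuntimeError outside an async
    context (unlike get_event_loop(), which would create a loop and emit
    a DeprecationWarning on 3.10+), so the except branch cleanly means
    'no loop to patch'."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        return False  # no running loop -- nothing to patch
    loop.set_exception_handler(handler)
    return True
```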
@@ -12069,7 +12539,18 @@ def main(
):
cli.session_id = cli.agent.session_id
response = result.get("final_response", "") if isinstance(result, dict) else str(result)
- if response:
+ # Surface backend errors that produced no visible output
+ # (e.g. invalid model slug → provider 4xx). Mirrors the
+ # interactive CLI path. Write to stderr so piped stdout
+ # stays clean for automation wrappers.
+ if (
+ not response
+ and isinstance(result, dict)
+ and result.get("error")
+ and (result.get("failed") or result.get("partial"))
+ ):
+ print(f"Error: {result['error']}", file=sys.stderr)
+ elif response:
print(response)
# Session ID goes to stderr so piped stdout is clean.
print(f"\nsession_id: {cli.session_id}", file=sys.stderr)
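The stdout/stderr split in the headless path above can be sketched as a routing function. The `result` keys (`final_response`, `error`, `failed`, `partial`) follow the patch; the return value is an illustrative addition for testability:

```python
import sys

def emit_result(result, out=sys.stdout, err=sys.stderr):
    """Route a one-shot run's outcome: the response goes to stdout so
    piped output stays clean for automation wrappers; backend errors
    that produced no visible output go to stderr."""
    response = result.get("final_response", "") if isinstance(result, dict) else str(result)
    if (
        not response
        and isinstance(result, dict)
        and result.get("error")
        and (result.get("failed") or result.get("partial"))
    ):
        print(f"Error: {result['error']}", file=err)
        return "stderr"
    if response:
        print(response, file=out)
        return "stdout"
    return "silent"
```

Keeping diagnostics on stderr means `hermes ... | jq` style pipelines never see the error text, only the payload.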
diff --git a/cron/jobs.py b/cron/jobs.py
index 5e493ae3f7..93ad4c17fb 100644
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -420,7 +420,7 @@ def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
def create_job(
- prompt: str,
+ prompt: Optional[str],
schedule: str,
name: Optional[str] = None,
repeat: Optional[int] = None,
@@ -435,12 +435,14 @@ def create_job(
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
+ no_agent: bool = False,
) -> Dict[str, Any]:
"""
Create a new cron job.
Args:
- prompt: The prompt to run (must be self-contained, or a task instruction when skill is set)
+ prompt: The prompt to run (must be self-contained, or a task instruction when skill is set).
+ Ignored when ``no_agent=True`` except as an optional name hint.
schedule: Schedule string (see parse_schedule)
name: Optional friendly name
repeat: How many times to run (None = forever, 1 = once)
@@ -451,21 +453,33 @@ def create_job(
model: Optional per-job model override
provider: Optional per-job provider override
base_url: Optional per-job base URL override
- script: Optional path to a Python script whose stdout is injected into the
- prompt each run. The script runs before the agent turn, and its output
- is prepended as context. Useful for data collection / change detection.
+ script: Optional path to a script whose stdout feeds the job. With
+ ``no_agent=True`` the script IS the job — its stdout is
+ delivered verbatim. Without ``no_agent``, its stdout is
+ injected into the agent's prompt as context (data-collection /
+ change-detection pattern). Paths resolve under
+ ~/.hermes/scripts/; ``.sh`` / ``.bash`` files run via bash,
+ anything else via Python.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
+ Ignored when ``no_agent=True``.
workdir: Optional absolute path. When set, the job runs as if launched
from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
that directory are injected into the system prompt, and the
terminal/file/code_exec tools use it as their working directory
(via TERMINAL_CWD). When unset, the old behaviour is preserved
(no context files injected, tools use the scheduler's cwd).
+ With ``no_agent=True``, ``workdir`` is still applied as the
+ script's cwd so relative paths inside the script behave
+ predictably.
+ no_agent: When True, skip the agent entirely — run ``script`` on schedule
+ and deliver its stdout directly. Empty stdout = silent (no
+ delivery). Requires ``script`` to be set. Ideal for classic
+ watchdogs and periodic alerts that don't need LLM reasoning.
Returns:
The created job dict
@@ -499,6 +513,16 @@ def create_job(
normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
+ normalized_no_agent = bool(no_agent)
+
+ # no_agent jobs are meaningless without a script — the script IS the job.
+ # Surface this as a clear ValueError at create time so bad configs never
+ # reach the scheduler.
+ if normalized_no_agent and not normalized_script:
+ raise ValueError(
+ "no_agent=True requires a script — with no agent and no script "
+ "there is nothing for the job to run."
+ )
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
@@ -508,7 +532,7 @@ def create_job(
else:
context_from = None
- label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
+ label_source = (prompt or (normalized_skills[0] if normalized_skills else None) or (normalized_script if normalized_no_agent else None)) or "cron job"
job = {
"id": job_id,
"name": name or label_source[:50].strip(),
@@ -519,6 +543,7 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
+ "no_agent": normalized_no_agent,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
@@ -785,6 +810,12 @@ def get_due_jobs() -> List[Dict[str, Any]]:
the job is fast-forwarded to the next future run instead of firing
immediately. This prevents a burst of missed jobs on gateway restart.
"""
+ with _jobs_file_lock:
+ return _get_due_jobs_locked()
+
+
+def _get_due_jobs_locked() -> List[Dict[str, Any]]:
+ """Inner implementation of get_due_jobs(); must be called with _jobs_file_lock held."""
now = _hermes_now()
raw_jobs = load_jobs()
jobs = [_apply_skill_fields(j) for j in copy.deepcopy(raw_jobs)]
diff --git a/cron/scheduler.py b/cron/scheduler.py
index cee1cb4067..97d0567300 100644
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -35,12 +35,25 @@ from typing import List, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from hermes_constants import get_hermes_home
-from hermes_cli.config import load_config
+from hermes_cli.config import load_config, _expand_env_vars
from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
+class CronPromptInjectionBlocked(Exception):
+ """Raised by _build_job_prompt when the fully-assembled prompt trips the
+ injection scanner. Caught in run_job so the operator sees a clean
+ "job blocked" delivery instead of the scheduler crashing.
+
+ Assembled-prompt scanning (including loaded skill content) plugs the
+ gap from #3968: create-time scanning only covers the user-supplied
+ prompt field; skill content loaded at runtime was never scanned, so a
+ malicious skill could carry an injection payload that reached the
+ non-interactive (auto-approve) cron agent.
+ """
+
+
def _resolve_cron_enabled_toolsets(job: dict, cfg: dict) -> list[str] | None:
"""Resolve the toolset list for a cron job.
@@ -114,12 +127,20 @@ from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_
# locally for audit.
SILENT_MARKER = "[SILENT]"
-# Resolve Hermes home directory (respects HERMES_HOME override)
-_hermes_home = get_hermes_home()
+# Backward-compatible module override used by tests and emergency monkeypatches.
+_hermes_home: Path | None = None
-# File-based lock prevents concurrent ticks from gateway + daemon + systemd timer
-_LOCK_DIR = _hermes_home / "cron"
-_LOCK_FILE = _LOCK_DIR / ".tick.lock"
+
+def _get_hermes_home() -> Path:
+ """Resolve Hermes home dynamically while preserving test monkeypatch hooks."""
+ return _hermes_home or get_hermes_home()
+
+
+def _get_lock_paths() -> tuple[Path, Path]:
+ """Resolve cron lock paths at call time so profile/env changes are honored."""
+ hermes_home = _get_hermes_home()
+ lock_dir = hermes_home / "cron"
+ return lock_dir, lock_dir / ".tick.lock"
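The call-time resolution above (versus the old import-time module constants) can be sketched with a hypothetical `APP_HOME` variable standing in for the real `HERMES_HOME` machinery:

```python
import os
from pathlib import Path

_home_override = None  # test/monkeypatch hook, mirrors the module override

def get_home() -> Path:
    """Resolve the home dir at call time so env/profile changes made
    after import are honored (stand-in for get_hermes_home)."""
    return _home_override or Path(os.environ.get("APP_HOME", "~/.app")).expanduser()

def get_lock_paths():
    """Lock paths derived on every call, never frozen at import."""
    lock_dir = get_home() / "cron"
    return lock_dir, lock_dir / ".tick.lock"
```

With import-time constants, the first import wins forever; with call-time resolution, switching profiles mid-process moves the lock file along with everything else.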
def _resolve_origin(job: dict) -> Optional[dict]:
@@ -144,9 +165,54 @@ def _resolve_origin(job: dict) -> Optional[dict]:
return None
+def _plugin_cron_env_var(platform_name: str) -> str:
+ """Return the cron home-channel env var registered by a plugin platform.
+
+ Falls through the platform registry so plugins that set
+ ``cron_deliver_env_var`` on their ``PlatformEntry`` get cron delivery
+ support without editing this module.
+ """
+ try:
+ from hermes_cli.plugins import discover_plugins
+ discover_plugins() # idempotent
+ from gateway.platform_registry import platform_registry
+ entry = platform_registry.get(platform_name.lower())
+ if entry and entry.cron_deliver_env_var:
+ return entry.cron_deliver_env_var
+ except Exception:
+ pass
+ return ""
+
+
+def _is_known_delivery_platform(platform_name: str) -> bool:
+ """Whether ``platform_name`` is a valid cron delivery target.
+
+ Hardcoded built-ins in ``_KNOWN_DELIVERY_PLATFORMS`` are checked first;
+ plugin platforms registered via ``PlatformEntry`` are accepted if they
+ provide a ``cron_deliver_env_var``.
+ """
+ name = platform_name.lower()
+ if name in _KNOWN_DELIVERY_PLATFORMS:
+ return True
+ return bool(_plugin_cron_env_var(name))
+
+
+def _resolve_home_env_var(platform_name: str) -> str:
+ """Return the env var name for a platform's cron home channel.
+
+ Built-in platforms are in ``_HOME_TARGET_ENV_VARS``; plugin platforms are
+ resolved from the platform registry.
+ """
+ name = platform_name.lower()
+ env_var = _HOME_TARGET_ENV_VARS.get(name)
+ if env_var:
+ return env_var
+ return _plugin_cron_env_var(name)
+
+
def _get_home_target_chat_id(platform_name: str) -> str:
"""Return the configured home target chat/room ID for a delivery platform."""
- env_var = _HOME_TARGET_ENV_VARS.get(platform_name.lower())
+ env_var = _resolve_home_env_var(platform_name)
if not env_var:
return ""
value = os.getenv(env_var, "")
@@ -159,7 +225,7 @@ def _get_home_target_chat_id(platform_name: str) -> str:
def _get_home_target_thread_id(platform_name: str) -> Optional[str]:
"""Return the optional thread/topic ID for a platform home target."""
- env_var = _HOME_TARGET_ENV_VARS.get(platform_name.lower())
+ env_var = _resolve_home_env_var(platform_name)
if not env_var:
return None
value = os.getenv(f"{env_var}_THREAD_ID", "").strip()
@@ -170,6 +236,24 @@ def _get_home_target_thread_id(platform_name: str) -> Optional[str]:
return value or None
+def _iter_home_target_platforms():
+ """Iterate built-in + plugin platform names that expose a home channel.
+
+ Used by the ``deliver=origin`` fallback when the job has no origin.
+ """
+ for name in _HOME_TARGET_ENV_VARS:
+ yield name
+ try:
+ from hermes_cli.plugins import discover_plugins
+ discover_plugins() # idempotent
+ from gateway.platform_registry import platform_registry
+ for entry in platform_registry.plugin_entries():
+ if entry.cron_deliver_env_var and entry.name not in _HOME_TARGET_ENV_VARS:
+ yield entry.name
+ except Exception:
+ pass
+
+
def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[dict]:
"""Resolve one concrete auto-delivery target for a cron job."""
@@ -187,7 +271,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
}
# Origin missing (e.g. job created via API/script) — try each
# platform's home channel as a fallback instead of silently dropping.
- for platform_name in _HOME_TARGET_ENV_VARS:
+ for platform_name in _iter_home_target_platforms():
chat_id = _get_home_target_chat_id(platform_name)
if chat_id:
logger.info(
@@ -243,7 +327,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
"thread_id": origin.get("thread_id"),
}
- if platform_name.lower() not in _KNOWN_DELIVERY_PLATFORMS:
+ if not _is_known_delivery_platform(platform_name):
return None
chat_id = _get_home_target_chat_id(platform_name)
if not chat_id:
@@ -576,8 +660,18 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
prevent arbitrary script execution via path traversal or absolute
path injection.
+ Supported interpreters (chosen by file extension):
+
+ * ``.sh`` / ``.bash`` — run with ``/bin/bash``
+ * anything else — run with the current Python interpreter
+ (``sys.executable``), preserving the original behaviour for
+ Python-based pre-check and data-collection scripts.
+
+ Shell support lets ``no_agent=True`` jobs ship classic bash watchdogs
+ (the `memory-watchdog.sh` pattern) without wrapping them in Python.
+
Args:
- script_path: Path to a Python script. Relative paths are resolved
+ script_path: Path to the script. Relative paths are resolved
against HERMES_HOME/scripts/. Absolute and ~-prefixed paths
are also validated to ensure they stay within the scripts dir.
@@ -587,7 +681,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
"""
from hermes_constants import get_hermes_home
- scripts_dir = get_hermes_home() / "scripts"
+ scripts_dir = _get_hermes_home() / "scripts"
scripts_dir.mkdir(parents=True, exist_ok=True)
scripts_dir_resolved = scripts_dir.resolve()
@@ -614,9 +708,19 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
script_timeout = _get_script_timeout()
+ # Pick an interpreter by extension. Bash for .sh/.bash, Python for
+ # everything else. We deliberately do NOT honour the file's own
+ # shebang: the scripts dir is trusted, but keeping the interpreter
+ # choice explicit here keeps the allowed surface small and auditable.
+ suffix = path.suffix.lower()
+ if suffix in (".sh", ".bash"):
+ argv = ["/bin/bash", str(path)]
+ else:
+ argv = [sys.executable, str(path)]
+
try:
result = subprocess.run(
- [sys.executable, str(path)],
+ argv,
capture_output=True,
text=True,
timeout=script_timeout,
@@ -777,7 +881,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
skill_names = [str(name).strip() for name in skills if str(name).strip()]
if not skill_names:
- return prompt
+ return _scan_assembled_cron_prompt(prompt, job)
from tools.skills_tool import skill_view
from tools.skill_usage import bump_use
@@ -820,7 +924,32 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
if prompt:
parts.extend(["", f"The user has provided the following instruction alongside the skill invocation: {prompt}"])
- return "\n".join(parts)
+ return _scan_assembled_cron_prompt("\n".join(parts), job)
+
+
+def _scan_assembled_cron_prompt(assembled: str, job: dict) -> str:
+ """Scan the fully-assembled cron prompt (including skill content) for
+ injection patterns. Raises ``CronPromptInjectionBlocked`` when a match
+ fires so ``run_job`` can surface a clear refusal to the operator.
+
+ Plugs the #3968 gap: ``_scan_cron_prompt`` runs on the user-supplied
+ prompt at create/update, but skill content is loaded from disk at
+ runtime and was never scanned. Since cron runs non-interactively
+ (auto-approves tool calls), a malicious skill carrying an injection
+ payload bypassed every gate.
+ """
+ from tools.cronjob_tools import _scan_cron_prompt
+
+ scan_error = _scan_cron_prompt(assembled)
+ if scan_error:
+ job_label = job.get("name") or job.get("id") or ""
+ logger.warning(
+ "Cron job '%s': assembled prompt blocked by injection scanner — %s",
+ job_label,
+ scan_error,
+ )
+ raise CronPromptInjectionBlocked(scan_error)
+ return assembled
def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
@@ -830,8 +959,120 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
Returns:
Tuple of (success, full_output_doc, final_response, error_message)
"""
+ job_id = job["id"]
+ job_name = job["name"]
+
+ # ---------------------------------------------------------------
+ # no_agent short-circuit — the script IS the job, no LLM involvement.
+ # ---------------------------------------------------------------
+ # This mirrors the classic "run a bash script on a timer, send its
+ # stdout to telegram" watchdog pattern. The agent path is skipped
+ # entirely: no AIAgent, no prompt, no tool loop, no token spend.
+ #
+ # We check this BEFORE importing run_agent / constructing SessionDB so
+ # a pure-script tick never pays for the agent machinery it isn't going
+ # to use. Keep this block self-contained.
+ #
+ # Semantics:
+ # - script stdout (trimmed) → delivered verbatim as the final message
+ # - empty stdout → silent run (no delivery, success=True)
+ # - non-zero exit / timeout → delivered as an error alert, success=False
+ # - wakeAgent=false gate → treated like empty stdout (silent), since
+ # the whole point of no_agent is that there
+ # is no agent to wake
+ if job.get("no_agent"):
+ script_path = job.get("script")
+ if not script_path:
+ err = "no_agent=True but no script is set for this job"
+ logger.error("Job '%s': %s", job_id, err)
+ return False, "", "", err
+
+ # Apply workdir if configured — lets scripts use predictable relative
+ # paths. For no_agent jobs this is just the subprocess cwd (not an
+ # agent TERMINAL_CWD bridge).
+ _job_workdir = (job.get("workdir") or "").strip() or None
+ _prior_cwd = None
+ if _job_workdir and Path(_job_workdir).is_dir():
+ _prior_cwd = os.getcwd()
+ try:
+ os.chdir(_job_workdir)
+ except OSError:
+ _prior_cwd = None
+
+ try:
+ ok, output = _run_job_script(script_path)
+ finally:
+ if _prior_cwd is not None:
+ try:
+ os.chdir(_prior_cwd)
+ except OSError:
+ pass
+
+ now_iso = _hermes_now().strftime("%Y-%m-%d %H:%M:%S")
+
+ if not ok:
+ # Script crashed / timed out / exited non-zero. Deliver the
+ # error so the user knows the watchdog itself broke — silent
+ # failure for an alerting job is the worst-case outcome.
+ alert = (
+ f"⚠ Cron watchdog '{job_name}' script failed\n\n"
+ f"{output}\n\n"
+ f"Time: {now_iso}"
+ )
+ doc = (
+ f"# Cron Job: {job_name}\n\n"
+ f"**Job ID:** {job_id}\n"
+ f"**Run Time:** {now_iso}\n"
+ f"**Mode:** no_agent (script)\n"
+ f"**Status:** script failed\n\n"
+ f"{output}\n"
+ )
+ return False, doc, alert, output
+
+ # Honour the wakeAgent gate as a silent signal — `wakeAgent: false`
+ # means "nothing to report this tick", same as empty stdout.
+ if not _parse_wake_gate(output):
+ logger.info(
+ "Job '%s' (no_agent): wakeAgent=false gate — silent run", job_id
+ )
+ silent_doc = (
+ f"# Cron Job: {job_name}\n\n"
+ f"**Job ID:** {job_id}\n"
+ f"**Run Time:** {now_iso}\n"
+ f"**Mode:** no_agent (script)\n"
+ f"**Status:** silent (wakeAgent=false)\n"
+ )
+ return True, silent_doc, SILENT_MARKER, None
+
+ if not output.strip():
+ logger.info("Job '%s' (no_agent): empty stdout — silent run", job_id)
+ silent_doc = (
+ f"# Cron Job: {job_name}\n\n"
+ f"**Job ID:** {job_id}\n"
+ f"**Run Time:** {now_iso}\n"
+ f"**Mode:** no_agent (script)\n"
+ f"**Status:** silent (empty output)\n"
+ )
+ return True, silent_doc, SILENT_MARKER, None
+
+ doc = (
+ f"# Cron Job: {job_name}\n\n"
+ f"**Job ID:** {job_id}\n"
+ f"**Run Time:** {now_iso}\n"
+ f"**Mode:** no_agent (script)\n\n"
+ f"---\n\n"
+ f"{output}\n"
+ )
+ return True, doc, output, None
+
+ # ---------------------------------------------------------------
+ # Default (LLM) path — import and construct the agent machinery now
+ # that we know we actually need it. Doing these imports here instead of
+ # at module top keeps no_agent ticks from paying for AIAgent / SessionDB
+ # construction costs.
+ # ---------------------------------------------------------------
from run_agent import AIAgent
-
+
# Initialize SQLite session store so cron job messages are persisted
# and discoverable via session_search (same pattern as gateway/run.py).
_session_db = None
@@ -840,9 +1081,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_session_db = SessionDB()
except Exception as e:
logger.debug("Job '%s': SQLite session store not available: %s", job.get("id", "?"), e)
-
- job_id = job["id"]
- job_name = job["name"]
# Wake-gate: if this job has a pre-check script, run it BEFORE building
# the prompt so a ``{"wakeAgent": false}`` response can short-circuit
@@ -866,7 +1104,31 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
)
return True, silent_doc, SILENT_MARKER, None
- prompt = _build_job_prompt(job, prerun_script=prerun_script)
+ try:
+ prompt = _build_job_prompt(job, prerun_script=prerun_script)
+ except CronPromptInjectionBlocked as block_exc:
+ # Assembled prompt (user prompt + loaded skill content) tripped the
+ # injection scanner. Refuse to run the agent this tick and surface
+ # a clear failure to the operator so they see WHY the scheduled job
+ # didn't run and can audit the offending skill.
+ logger.warning(
+ "Job '%s' (ID: %s): blocked by prompt-injection scanner — %s",
+ job_name, job_id, block_exc,
+ )
+ blocked_doc = (
+ f"# Cron Job: {job_name}\n\n"
+ f"**Job ID:** {job_id}\n"
+ f"**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}\n"
+ f"**Status:** BLOCKED\n\n"
+ "The assembled prompt (user prompt + loaded skill content) tripped "
+ "the cron injection scanner and the agent was NOT run.\n\n"
+ f"**Scanner result:** {block_exc}\n\n"
+ "Audit the skill(s) attached to this job for prompt-injection "
+ "payloads or invisible-unicode markers. If the skill is legitimate "
+ "and the match is a false positive, rephrase the content to avoid "
+ "the threat pattern (`tools/cronjob_tools.py::_CRON_THREAT_PATTERNS`)."
+ )
+ return False, blocked_doc, "", str(block_exc)
if prompt is None:
logger.info("Job '%s': script produced no output, skipping AI call.", job_name)
return True, "", SILENT_MARKER, None
@@ -929,9 +1191,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
# changes take effect without a gateway restart.
from dotenv import load_dotenv
try:
- load_dotenv(str(_hermes_home / ".env"), override=True, encoding="utf-8")
+ load_dotenv(str(_get_hermes_home() / ".env"), override=True, encoding="utf-8")
except UnicodeDecodeError:
- load_dotenv(str(_hermes_home / ".env"), override=True, encoding="latin-1")
+ load_dotenv(str(_get_hermes_home() / ".env"), override=True, encoding="latin-1")
delivery_target = _resolve_delivery_target(job)
if delivery_target:
@@ -949,10 +1211,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_cfg = {}
try:
import yaml
- _cfg_path = str(_hermes_home / "config.yaml")
+ _cfg_path = str(_get_hermes_home() / "config.yaml")
if os.path.exists(_cfg_path):
with open(_cfg_path) as _f:
_cfg = yaml.safe_load(_f) or {}
+ _cfg = _expand_env_vars(_cfg)
_model_cfg = _cfg.get("model", {})
if not job.get("model"):
if isinstance(_model_cfg, str):
@@ -982,7 +1245,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if prefill_file:
pfpath = Path(prefill_file).expanduser()
if not pfpath.is_absolute():
- pfpath = _hermes_home / pfpath
+ pfpath = _get_hermes_home() / pfpath
if pfpath.exists():
try:
with open(pfpath, "r", encoding="utf-8") as _pf:
@@ -1060,6 +1323,27 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
except Exception as e:
logger.debug("Job '%s': failed to load credential pool for %s: %s", job_id, runtime_provider, e)
+ # Initialize MCP servers so configured mcp_servers are available to
+ # the agent's tool registry before AIAgent is constructed. Without
+ # this, cron jobs never saw any MCP tools — only the gateway / CLI
+ # paths called discover_mcp_tools() at startup. Idempotent: subsequent
+ # ticks short-circuit on already-connected servers inside
+ # register_mcp_servers(). Non-fatal on failure: a broken MCP server
+ # shouldn't kill an otherwise-working cron job. See #4219.
+ try:
+ from tools.mcp_tool import discover_mcp_tools
+ _mcp_tools = discover_mcp_tools()
+ if _mcp_tools:
+ logger.info(
+ "Job '%s': %d MCP tool(s) available",
+ job_id, len(_mcp_tools),
+ )
+ except Exception as _mcp_exc:
+ logger.warning(
+ "Job '%s': MCP initialization failed (non-fatal): %s",
+ job_id, _mcp_exc,
+ )
+
agent = AIAgent(
model=model,
api_key=runtime.get("api_key"),
@@ -1306,12 +1590,13 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
Returns:
Number of jobs executed (0 if another tick is already running)
"""
- _LOCK_DIR.mkdir(parents=True, exist_ok=True)
+ lock_dir, lock_file = _get_lock_paths()
+ lock_dir.mkdir(parents=True, exist_ok=True)
# Cross-platform file locking: fcntl on Unix, msvcrt on Windows
lock_fd = None
try:
- lock_fd = open(_LOCK_FILE, "w")
+ lock_fd = open(lock_file, "w")
if fcntl:
fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
elif msvcrt:
diff --git a/docker-compose.yml b/docker-compose.yml
index bac125c93f..8bdc96b7a9 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -14,6 +14,9 @@
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
+# - If you override entrypoint, keep /opt/hermes/docker/entrypoint.sh in
+# the command chain. It drops root to the hermes user before gateway
+# files such as gateway.lock are created.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
@@ -41,6 +44,15 @@ services:
# - TEAMS_TENANT_ID=${TEAMS_TENANT_ID}
# - TEAMS_ALLOWED_USERS=${TEAMS_ALLOWED_USERS}
# - TEAMS_PORT=${TEAMS_PORT:-3978}
+ # Google Chat — uncomment and fill in to enable the Google Chat gateway.
+ # See website/docs/user-guide/messaging/google_chat.md for the full setup.
+ # The SA JSON path must point to a file mounted into the container —
+ # add a volume entry above (e.g. ``- ~/.hermes/google-chat-sa.json:/secrets/google-chat-sa.json:ro``)
+ # then set GOOGLE_CHAT_SERVICE_ACCOUNT_JSON to that mount path.
+ # - GOOGLE_CHAT_PROJECT_ID=${GOOGLE_CHAT_PROJECT_ID}
+ # - GOOGLE_CHAT_SUBSCRIPTION_NAME=${GOOGLE_CHAT_SUBSCRIPTION_NAME}
+ # - GOOGLE_CHAT_SERVICE_ACCOUNT_JSON=${GOOGLE_CHAT_SERVICE_ACCOUNT_JSON}
+ # - GOOGLE_CHAT_ALLOWED_USERS=${GOOGLE_CHAT_ALLOWED_USERS}
command: ["gateway", "run"]
dashboard:
diff --git a/docs/plans/2026-05-02-telegram-dm-user-managed-multisession-topics.md b/docs/plans/2026-05-02-telegram-dm-user-managed-multisession-topics.md
new file mode 100644
index 0000000000..43c0e5da78
--- /dev/null
+++ b/docs/plans/2026-05-02-telegram-dm-user-managed-multisession-topics.md
@@ -0,0 +1,473 @@
+# Telegram DM User-Managed Multi-Session Topics Implementation Plan
+
+> **For Hermes:** Use test-driven-development for implementation. Use subagent-driven-development only after this plan is split into small reviewed tasks.
+
+**Goal:** Add an opt-in Telegram DM multi-session mode where Telegram user-created private-chat topics become independent Hermes session lanes, while the root DM becomes a system lobby.
+
+**Architecture:** Rely on Telegram's native private-chat topic UI. Users create new topics with the `+` button; Hermes maps each `message_thread_id` to a separate session lane. Hermes does not create topics for normal `/new` flow and does not try to manage topic lifecycle beyond activation/status, root-lobby behavior, and restoring legacy sessions into a user-created topic.
+
+**Tech Stack:** Hermes gateway, Telegram Bot API 9.4+, python-telegram-bot adapter, SQLite SessionDB / side tables, pytest.
+
+---
+
+## 1. Product decisions
+
+### Accepted
+
+- PR-quality implementation: migrations, tests, docs, backwards compatibility.
+- Use SQLite persistence, not JSON sidecars.
+- Live status suffixes in topic titles are out of MVP.
+- Topic title sync/editing is out of MVP except future-compatible storage if cheap.
+- User creates Telegram topics manually through the Telegram bot interface.
+- `/new` does **not** create Telegram topics.
+- Root/main DM becomes a system lobby after activation.
+- Existing Telegram behavior remains unchanged until the feature is activated/enabled.
+- Migration of old sessions is supported through `/topic` listing and `/topic <session_id>` restore inside a user-created topic.
+
+### Telegram API assumptions verified from Bot API docs
+
+- `getMe` returns bot `User` fields:
+ - `has_topics_enabled`: forum/topic mode enabled in private chats.
+ - `allows_users_to_create_topics`: users may create/delete topics in private chats.
+- `createForumTopic` works for private chats with a user, but MVP does not rely on it for normal flow.
+- `Message.message_thread_id` identifies a topic in private chats.
+- `sendMessage` supports `message_thread_id` for private-chat topics.
+- `pinChatMessage` is allowed in private chats.
+
+---
+
+## 2. Target UX
+
+### 2.1 Activation from root/main DM
+
+User sends:
+
+```text
+/topic
+```
+
+Hermes:
+
+1. calls Telegram `getMe`;
+2. verifies `has_topics_enabled` and `allows_users_to_create_topics`;
+3. enables multi-session topic mode for this Telegram DM user/chat;
+4. sends an onboarding message;
+5. pins the onboarding message if configured;
+6. shows old/unlinked sessions that can be restored into topics.
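+
+Step 2's capability check can be sketched as pseudocode against the raw
+`getMe` result. The two field names are the Bot API ones listed in section 1;
+the function name and error strings are placeholders, not the real adapter
+surface:
+
+```python
+def check_topic_capabilities(getme_user: dict) -> tuple[bool, str]:
+    # Both flags come from the bot User object returned by getMe.
+    if not getme_user.get("has_topics_enabled"):
+        return False, "Enable topic mode for this bot in BotFather first."
+    if not getme_user.get("allows_users_to_create_topics"):
+        return False, "Allow users to create topics in the bot's settings."
+    return True, ""
+```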
+
+Suggested onboarding text:
+
+```text
+Multi-session mode is enabled.
+
+Create new Hermes chats with the + button in this bot interface. Each Telegram topic is an independent Hermes session, so you can work on different tasks in parallel.
+
+This main chat is reserved for system commands, status, and session management.
+
+To restore an old session:
+1. Use /topic here to see unlinked sessions.
+2. Create a new topic with the + button.
+3. Send /topic <session_id> inside that topic.
+```
+
+### 2.2 Root/main DM after activation
+
+Root DM is a system lobby.
+
+Allowed/system commands include at least:
+
+- `/topic`
+- `/status`
+- `/sessions` if available
+- `/usage`
+- `/help`
+- `/platforms`
+
+Normal user prompts in root DM do not enter the agent loop. Reply:
+
+```text
+This main chat is reserved for system commands.
+
+To chat with Hermes, create a new topic using the + button in this bot interface. Each topic works as an independent Hermes session.
+```
+
+`/new` in root DM does not create a session/topic. Reply:
+
+```text
+To start a new parallel Hermes chat, create a new topic with the + button in this bot interface.
+
+Each topic is an independent Hermes session. Use /new inside a topic only if you want to replace that topic's current session.
+```
+
+### 2.3 First message in a user-created topic
+
+When a user creates a Telegram topic and sends the first message there:
+
+1. Hermes receives a Telegram DM message with `message_thread_id`.
+2. Hermes derives the existing thread-aware `session_key` from `(platform=telegram, chat_type=dm, chat_id, thread_id)`.
+3. If no binding exists, Hermes creates a fresh Hermes session for this topic lane and persists the binding.
+4. The message runs through the normal agent loop for that lane.
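+
+A pseudocode sketch of steps 2–3, assuming the thread-aware `session_key`
+shape shown in step 2; `get_binding`, `create_session`, and `upsert_binding`
+are hypothetical helper names, not the real SessionDB surface:
+
+```python
+def lane_session_id(db, chat_id: int, thread_id: int, user_id: int) -> str:
+    # Step 2: derive the thread-aware lane key for this private-chat topic.
+    session_key = f"telegram:dm:{chat_id}:{thread_id}"
+    # Step 3: reuse an existing binding, else create and persist a new one.
+    binding = db.get_binding(chat_id, thread_id)
+    if binding is None:
+        session_id = db.create_session(source="telegram", user_id=user_id)
+        db.upsert_binding(chat_id, thread_id, user_id,
+                          session_key, session_id, managed_mode="auto")
+        return session_id
+    return binding.session_id
+```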
+
+### 2.4 `/new` inside a non-main topic
+
+`/new` remains supported but replaces the session attached to the current topic lane.
+
+Hermes should warn:
+
+```text
+Started a new Hermes session in this topic.
+
+Tip: for parallel work, create a new topic with the + button instead of using /new here. /new replaces the session attached to the current topic.
+```
+
+### 2.5 `/topic` in root/main DM after activation
+
+Shows:
+
+- mode enabled/disabled;
+- last capability check result;
+- whether intro message is pinned if known;
+- count of known topic bindings;
+- list of old/unlinked sessions.
+
+Example:
+
+```text
+Telegram multi-session topics are enabled.
+
+Create new Hermes chats with the + button in this bot interface.
+
+Unlinked previous sessions:
+1. 2026-05-01 Research notes — id: abc123
+2. 2026-04-30 Deploy debugging — id: def456
+3. Untitled session — id: ghi789
+
+To restore one:
+1. Create a new topic with the + button.
+2. Open that topic.
+3. Send /topic <session_id>
+```
+
+### 2.6 `/topic` inside a non-main topic
+
+Without args, show the current topic binding:
+
+```text
+This topic is linked to:
+Session: Research notes
+ID: abc123
+
+Use /new to replace this topic with a fresh session.
+For parallel work, create another topic with the + button.
+```
+
+### 2.7 `/topic <session_id>` inside a non-main topic
+
+Restore an old/unlinked session into the current user-created topic.
+
+Behavior:
+
+1. reject if not in Telegram DM topic;
+2. verify session belongs to the same Telegram user/chat or is a safe legacy root DM session for this user;
+3. reject if session is already linked to another active topic in MVP;
+4. `SessionStore.switch_session(current_topic_session_key, target_session_id)`;
+5. upsert binding with `managed_mode = restored`;
+6. send two messages into the topic:
+ - session restored confirmation;
+ - last Hermes assistant message if available.
+
+Example:
+
+```text
+Session restored: Research notes
+
+Last Hermes message:
+...
+```
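+
+The ownership/linking checks in steps 2–3 can be sketched as pseudocode;
+`binding_for_session` and the session attributes are invented names for
+illustration, not an existing API:
+
+```python
+def can_restore(db, session, user_id: int) -> tuple[bool, str]:
+    # Step 2: same Telegram user, or a safe legacy root-DM session for them.
+    if session.user_id != user_id:
+        return False, "Session belongs to a different user."
+    # Step 3 (MVP): refuse sessions already linked to another active topic.
+    if db.binding_for_session(session.id) is not None:
+        return False, "Session is already linked to another topic."
+    return True, ""
+```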
+
+---
+
+## 3. Persistence model
+
+Use SQLite, but topic-mode schema changes are **explicit opt-in migrations**, not automatic startup reconciliation.
+
+Important rollback-safety rule:
+
+- upgrading Hermes and starting the gateway must not create Telegram topic-mode tables or columns;
+- old/default Telegram behavior must keep working on the existing `state.db`;
+- the first `/topic` activation path calls an idempotent explicit migration, then enables topic mode for that chat;
+- if activation fails before the migration is needed, the database remains in the pre-topic-mode shape.
+
+### 3.1 No eager `sessions` table mutation for MVP
+
+Do **not** add `chat_id`, `chat_type`, `thread_id`, or `session_key` columns to `sessions` as part of ordinary `SessionDB()` startup. The existing declarative `_reconcile_columns()` mechanism would add them eagerly on every process start, which violates the managed-migration requirement.
+
+For MVP, keep origin/session-lane data in topic-specific side tables created only by the explicit `/topic` migration. Legacy unlinked sessions can be discovered conservatively from existing data (`source = telegram`, `user_id = current Telegram user`) plus absence from topic bindings.
+
+If future PRs need richer origin metadata for all gateway sessions, introduce it behind a separate explicit migration/command or a compatibility-reviewed schema bump.
+
+### 3.2 Explicit `/topic` migration API
+
+Add an idempotent method such as:
+
+```python
+def apply_telegram_topic_migration(self) -> None: ...
+```
+
+It creates only topic-mode side tables/indexes and records:
+
+```text
+state_meta.telegram_dm_topic_schema_version = 1
+```
+
+This method is called from `/topic` activation/status paths before reading or writing topic-mode state. It is not called from generic `SessionDB.__init__`, gateway startup, CLI startup, or auto-maintenance.
+
+### 3.3 `telegram_dm_topic_mode`
+
+Stores per-user/chat activation state. Created only by `apply_telegram_topic_migration()`.
+
+Suggested fields:
+
+- `chat_id` primary key
+- `user_id`
+- `enabled`
+- `activated_at`
+- `updated_at`
+- `has_topics_enabled`
+- `allows_users_to_create_topics`
+- `capability_checked_at`
+- `intro_message_id`
+- `pinned_message_id`
+
+### 3.4 `telegram_dm_topic_bindings`
+
+Stores Telegram topic/thread to Hermes session binding. Created only by `apply_telegram_topic_migration()`.
+
+Suggested fields:
+
+- `chat_id`
+- `thread_id`
+- `user_id`
+- `session_key`
+- `session_id`
+- `managed_mode`
+ - `auto`
+ - `restored`
+ - `new_replaced`
+- `linked_at`
+- `updated_at`
+
+Recommended constraints:
+
+- primary key `(chat_id, thread_id)`;
+- unique index on `session_id` for MVP to prevent one session linked to multiple topics;
+- index `(user_id, chat_id)` for status/listing.
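+
+One possible shape for the DDL that the explicit `/topic` migration would run
+idempotently — column types and index names are suggestions, not a finalized
+schema:
+
+```sql
+CREATE TABLE IF NOT EXISTS telegram_dm_topic_mode (
+    chat_id INTEGER PRIMARY KEY,
+    user_id INTEGER,
+    enabled INTEGER NOT NULL DEFAULT 0,
+    activated_at TEXT,
+    updated_at TEXT,
+    has_topics_enabled INTEGER,
+    allows_users_to_create_topics INTEGER,
+    capability_checked_at TEXT,
+    intro_message_id INTEGER,
+    pinned_message_id INTEGER
+);
+
+CREATE TABLE IF NOT EXISTS telegram_dm_topic_bindings (
+    chat_id INTEGER NOT NULL,
+    thread_id INTEGER NOT NULL,
+    user_id INTEGER,
+    session_key TEXT,
+    session_id TEXT,
+    managed_mode TEXT CHECK (managed_mode IN ('auto', 'restored', 'new_replaced')),
+    linked_at TEXT,
+    updated_at TEXT,
+    PRIMARY KEY (chat_id, thread_id)
+);
+
+-- MVP: one session may be linked to at most one topic.
+CREATE UNIQUE INDEX IF NOT EXISTS idx_topic_bindings_session
+    ON telegram_dm_topic_bindings (session_id);
+CREATE INDEX IF NOT EXISTS idx_topic_bindings_user_chat
+    ON telegram_dm_topic_bindings (user_id, chat_id);
+```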
+
+### 3.5 Unlinked session semantics
+
+For MVP, a session is unlinked if:
+
+- `source = telegram`;
+- `user_id = current Telegram user`;
+- no row in `telegram_dm_topic_bindings` references the session's `session_id`.
+
+This is intentionally conservative until a future explicit migration adds richer cross-platform origin metadata.
+
+Never dedupe by title.
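+
+Given the binding table described in 3.4, the conservative unlinked-session
+check is a single anti-join; the `sessions` column names here are
+assumptions about the existing schema:
+
+```sql
+SELECT s.id, s.title, s.created_at
+FROM sessions AS s
+LEFT JOIN telegram_dm_topic_bindings AS b ON b.session_id = s.id
+WHERE s.source = 'telegram'
+  AND s.user_id = :current_telegram_user
+  AND b.session_id IS NULL;
+```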
+
+---
+
+## 4. Config
+
+Suggested config block:
+
+```yaml
+platforms:
+ telegram:
+ extra:
+ multisession_topics:
+ enabled: false
+ mode: user_managed_topics
+ root_chat_behavior: system_lobby
+ pin_intro_message: true
+```
+
+Notes:
+
+- `enabled: false` means existing Telegram behavior is unchanged.
+- Activation via `/topic` may create per-chat enabled state only if global config permits it.
+- `root_chat_behavior: system_lobby` is the MVP behavior for activated chats.
+
+---
+
+## 5. Command behavior summary
+
+### `/topic` root/main DM
+
+- If not activated: capability check, activate, send/pin onboarding, list unlinked sessions.
+- If activated: show status and unlinked sessions.
+
+### `/topic` non-main topic
+
+- Show current binding.
+
+### `/topic <session_id>` root/main DM
+
+Reject with instructions:
+
+```text
+Create a new topic with the + button, open it, then send /topic <session_id> there to restore this session.
+```
+
+### `/topic <session_id>` non-main topic
+
+Restore that session into this topic if ownership/linking checks pass.
+
+### `/new` root/main DM when activated
+
+Reply with instructions to use the `+` button. Do not enter agent loop.
+
+### `/new` non-main topic
+
+Create a new session in the current topic lane, persist/update binding, warn that `+` is preferred for parallel work.
+
+### Normal text root/main DM when activated
+
+Reply with system-lobby instruction. Do not enter agent loop.
+
+### Normal text non-main topic
+
+Normal Hermes agent flow for that topic's session lane.
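+
+The behavior summary above collapses to a small dispatch on
+(activated, is_root, command). A pseudocode sketch — handler names are
+invented for illustration:
+
+```python
+def route(activated: bool, is_root: bool, command: str | None):
+    if not activated:
+        return legacy_behavior()           # feature off: nothing changes
+    if is_root:
+        if command == "/topic":
+            return topic_status_and_unlinked_list()
+        if command in SYSTEM_COMMANDS:     # /status, /usage, /help, ...
+            return run_system_command(command)
+        return lobby_instructions()        # /new and plain text: no agent loop
+    # Non-main topic lane:
+    if command == "/topic":
+        return show_or_restore_binding()   # with arg: restore; bare: status
+    if command == "/new":
+        return replace_topic_session()     # managed_mode = new_replaced
+    return agent_loop_for_topic_lane()
+```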
+
+---
+
+## 6. PR breakdown
+
+### PR 1 — Explicit topic-mode schema migration
+
+**Goal:** Add rollback-safe SQLite support for Telegram topic mode without mutating `state.db` on ordinary upgrade/startup.
+
+**Files likely touched:**
+
+- `hermes_state.py`
+- tests under `tests/`
+
+**Tests first:**
+
+1. opening an old/current DB with `SessionDB()` does not create topic-mode tables or `sessions` origin columns;
+2. calling `apply_telegram_topic_migration()` creates `telegram_dm_topic_mode` and `telegram_dm_topic_bindings` idempotently;
+3. migration records `state_meta.telegram_dm_topic_schema_version = 1`.
+
+### PR 2 — Topic mode activation and binding APIs
+
+**Goal:** Add SQLite persistence for activation and topic bindings.
+
+**Tests first:**
+
+1. enable/check mode row round-trips;
+2. binding upsert and lookup by `(chat_id, user_id, thread_id)`;
+3. linked sessions are excluded from the unlinked list.
+
+### PR 3 — `/topic` activation/status command
+
+**Goal:** Implement root activation/status/listing behavior.
+
+**Tests first:**
+
+1. `/topic` in root checks `getMe` capabilities and records activation;
+2. capability failure returns readable instructions;
+3. activated root `/topic` lists unlinked sessions.
+
+### PR 4 — System lobby behavior
+
+**Goal:** Prevent root chat from entering agent loop after activation.
+
+**Tests first:**
+
+1. normal text in activated root returns lobby instruction;
+2. `/new` in activated root returns `+` button instruction;
+3. non-activated root behavior is unchanged.
+
+### PR 5 — Auto-bind user-created topics
+
+**Goal:** First message in non-main topic creates/uses an independent session lane.
+
+**Tests first:**
+
+1. new topic message creates binding with `managed_mode = auto`;
+2. repeated topic message reuses same binding/lane;
+3. two topics in same DM do not share sessions.
+
+### PR 6 — Restore legacy sessions into a topic
+
+**Goal:** Implement `/topic ` in non-main topics.
+
+**Tests first:**
+
+1. root `/topic <session_id>` rejects with instructions;
+2. topic `/topic <session_id>` switches current topic lane to target session;
+3. restore rejects sessions from other users/chats;
+4. restore rejects already-linked sessions;
+5. restore emits confirmation and last Hermes assistant message.
+
+### PR 7 — `/new` inside topic updates binding
+
+**Goal:** Keep existing `/new` semantics but persist topic binding replacement.
+
+**Tests first:**
+
+1. `/new` in topic creates a new session for same topic lane;
+2. binding updates to `managed_mode = new_replaced`;
+3. response includes guidance to use `+` for parallel work.
+
+### PR 8 — Docs and polish
+
+**Goal:** Document the feature and Telegram setup.
+
+**Files likely touched:**
+
+- `website/docs/user-guide/messaging/telegram.md`
+- maybe `website/docs/user-guide/sessions.md`
+
+Docs must explain:
+
+- BotFather/Telegram settings for topic mode and user-created topics;
+- `/topic` activation;
+- root system lobby;
+- using `+` for new parallel chats;
+- restoring old sessions with `/topic <session_id>` inside a topic;
+- limitations.
+
+---
+
+## 7. Testing / quality gates
+
+Run targeted tests after each TDD cycle, then broader tests before completion.
+
+Suggested commands after inspection confirms test paths:
+
+```bash
+python -m pytest tests/test_hermes_state.py -q
+python -m pytest tests/gateway/ -q
+python -m pytest tests/ -o 'addopts=' -q
+```
+
+Do not ship without verifying disabled-feature backwards compatibility.
+
+---
+
+## 8. Definition of done for MVP
+
+- `/topic` activates/checks Telegram DM multi-session mode.
+- Root DM becomes a system lobby after activation.
+- Onboarding message tells users to create new chats with the Telegram `+` button.
+- Onboarding message can be pinned in private chat.
+- User-created topics automatically become independent Hermes session lanes.
+- `/new` in root gives instructions, not a new agent run.
+- `/new` in a topic creates a new session in that topic and warns that `+` is preferred for parallel work.
+- `/topic` in root lists unlinked old sessions.
+- `/topic <session_id>` inside a topic restores that session and sends confirmation + last Hermes assistant message.
+- Ownership checks prevent restoring other users' sessions.
+- Already-linked sessions are not restored into a second topic in MVP.
+- Existing Telegram behavior is unchanged when the feature is disabled.
+- Tests and docs are included.
diff --git a/environments/README.md b/environments/README.md
index 9677fdb70e..3936e1f35b 100644
--- a/environments/README.md
+++ b/environments/README.md
@@ -40,7 +40,7 @@ This directory contains the integration layer between **hermes-agent's** tool-ca
- `evaluate_log()` for saving eval results to JSON + samples.jsonl
**HermesAgentBaseEnv** (`hermes_base_env.py`) extends BaseEnv with hermes-agent specifics:
-- Sets `os.environ["TERMINAL_ENV"]` to configure the terminal backend (local, docker, modal, daytona, ssh, singularity)
+- Sets `os.environ["TERMINAL_ENV"]` to configure the terminal backend (local, docker, ssh, singularity, modal, daytona, vercel_sandbox)
- Resolves hermes-agent toolsets via `_resolve_tools_for_group()` (calls `get_tool_definitions()` which queries `tools/registry.py`)
- Implements `collect_trajectory()` which runs the full agent loop and computes rewards
- Supports two-phase operation (Phase 1: OpenAI server, Phase 2: VLLM ManagedServer)
diff --git a/gateway/assets/telegram-botfather-threads-settings.jpg b/gateway/assets/telegram-botfather-threads-settings.jpg
new file mode 100644
index 0000000000..b1de115acd
Binary files /dev/null and b/gateway/assets/telegram-botfather-threads-settings.jpg differ
diff --git a/gateway/config.py b/gateway/config.py
index fa64b9046d..6df6b5f4a5 100644
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -271,15 +271,23 @@ class PlatformConfig:
# - "first": Only first chunk threads to user's message (default)
# - "all": All chunks in multi-part replies thread to user's message
reply_to_mode: str = "first"
-
+
+ # Whether the gateway is allowed to send "♻️ Gateway online" /
+ # "♻ Gateway restarted" lifecycle notifications on this platform.
+ # Default True preserves prior behavior. Set False on platforms used
+ # by end users (e.g. Slack) where operator-flavored restart pings are
+ # noise; keep True for back-channels where the operator wants them.
+ gateway_restart_notification: bool = True
+
# Platform-specific settings
extra: Dict[str, Any] = field(default_factory=dict)
-
+
def to_dict(self) -> Dict[str, Any]:
result = {
"enabled": self.enabled,
"extra": self.extra,
"reply_to_mode": self.reply_to_mode,
+ "gateway_restart_notification": self.gateway_restart_notification,
}
if self.token:
result["token"] = self.token
@@ -288,19 +296,22 @@ class PlatformConfig:
if self.home_channel:
result["home_channel"] = self.home_channel.to_dict()
return result
-
+
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PlatformConfig":
home_channel = None
if "home_channel" in data:
home_channel = HomeChannel.from_dict(data["home_channel"])
-
+
return cls(
enabled=_coerce_bool(data.get("enabled"), False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
reply_to_mode=data.get("reply_to_mode", "first"),
+ gateway_restart_notification=_coerce_bool(
+ data.get("gateway_restart_notification"), True
+ ),
extra=data.get("extra", {}),
)
@@ -798,6 +809,12 @@ def load_gateway_config() -> GatewayConfig:
os.environ["SLACK_FREE_RESPONSE_CHANNELS"] = str(frc)
if "reactions" in slack_cfg and not os.getenv("SLACK_REACTIONS"):
os.environ["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()
+ # allowed_channels: if set, bot ONLY responds in these channels (whitelist)
+ ac = slack_cfg.get("allowed_channels")
+ if ac is not None and not os.getenv("SLACK_ALLOWED_CHANNELS"):
+ if isinstance(ac, list):
+ ac = ",".join(str(v) for v in ac)
+ os.environ["SLACK_ALLOWED_CHANNELS"] = str(ac)
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
@@ -845,6 +862,16 @@ def load_gateway_config() -> GatewayConfig:
):
if yaml_key in allow_mentions_cfg and not os.getenv(env_key):
os.environ[env_key] = str(allow_mentions_cfg[yaml_key]).lower()
+ # reply_to_mode: top-level preferred, falls back to extra.reply_to_mode
+ # YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
+ _discord_extra = discord_cfg.get("extra") if isinstance(discord_cfg.get("extra"), dict) else {}
+ _discord_rtm = (
+ discord_cfg["reply_to_mode"] if "reply_to_mode" in discord_cfg
+ else _discord_extra.get("reply_to_mode")
+ )
+ if _discord_rtm is not None and not os.getenv("DISCORD_REPLY_TO_MODE"):
+ _rtm_str = "off" if _discord_rtm is False else str(_discord_rtm).lower()
+ os.environ["DISCORD_REPLY_TO_MODE"] = _rtm_str
# Bridge top-level require_mention to Telegram when the telegram: section
# does not already provide one. Users often write "require_mention: true"
@@ -872,6 +899,12 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["TELEGRAM_FREE_RESPONSE_CHATS"] = str(frc)
+ # allowed_chats: if set, bot ONLY responds in these group chats (whitelist)
+ ac = telegram_cfg.get("allowed_chats")
+ if ac is not None and not os.getenv("TELEGRAM_ALLOWED_CHATS"):
+ if isinstance(ac, list):
+ ac = ",".join(str(v) for v in ac)
+ os.environ["TELEGRAM_ALLOWED_CHATS"] = str(ac)
ignored_threads = telegram_cfg.get("ignored_threads")
if ignored_threads is not None and not os.getenv("TELEGRAM_IGNORED_THREADS"):
if isinstance(ignored_threads, list):
@@ -881,6 +914,16 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
+ # reply_to_mode: top-level preferred, falls back to extra.reply_to_mode
+ # YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
+ _telegram_extra = telegram_cfg.get("extra") if isinstance(telegram_cfg.get("extra"), dict) else {}
+ _telegram_rtm = (
+ telegram_cfg["reply_to_mode"] if "reply_to_mode" in telegram_cfg
+ else _telegram_extra.get("reply_to_mode")
+ )
+ if _telegram_rtm is not None and not os.getenv("TELEGRAM_REPLY_TO_MODE"):
+ _rtm_str = "off" if _telegram_rtm is False else str(_telegram_rtm).lower()
+ os.environ["TELEGRAM_REPLY_TO_MODE"] = _rtm_str
allowed_users = telegram_cfg.get("allow_from")
if allowed_users is not None and not os.getenv("TELEGRAM_ALLOWED_USERS"):
if isinstance(allowed_users, list):
@@ -945,12 +988,35 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["DINGTALK_FREE_RESPONSE_CHATS"] = str(frc)
+ # allowed_chats: if set, bot ONLY responds in these group chats (whitelist)
+ ac = dingtalk_cfg.get("allowed_chats")
+ if ac is not None and not os.getenv("DINGTALK_ALLOWED_CHATS"):
+ if isinstance(ac, list):
+ ac = ",".join(str(v) for v in ac)
+ os.environ["DINGTALK_ALLOWED_CHATS"] = str(ac)
allowed = dingtalk_cfg.get("allowed_users")
if allowed is not None and not os.getenv("DINGTALK_ALLOWED_USERS"):
if isinstance(allowed, list):
allowed = ",".join(str(v) for v in allowed)
os.environ["DINGTALK_ALLOWED_USERS"] = str(allowed)
+ # Mattermost settings → env vars (env vars take precedence)
+ mattermost_cfg = yaml_cfg.get("mattermost", {})
+ if isinstance(mattermost_cfg, dict):
+ if "require_mention" in mattermost_cfg and not os.getenv("MATTERMOST_REQUIRE_MENTION"):
+ os.environ["MATTERMOST_REQUIRE_MENTION"] = str(mattermost_cfg["require_mention"]).lower()
+ frc = mattermost_cfg.get("free_response_channels")
+ if frc is not None and not os.getenv("MATTERMOST_FREE_RESPONSE_CHANNELS"):
+ if isinstance(frc, list):
+ frc = ",".join(str(v) for v in frc)
+ os.environ["MATTERMOST_FREE_RESPONSE_CHANNELS"] = str(frc)
+ # allowed_channels: if set, bot ONLY responds in these channels (whitelist)
+ ac = mattermost_cfg.get("allowed_channels")
+ if ac is not None and not os.getenv("MATTERMOST_ALLOWED_CHANNELS"):
+ if isinstance(ac, list):
+ ac = ",".join(str(v) for v in ac)
+ os.environ["MATTERMOST_ALLOWED_CHANNELS"] = str(ac)
+
# Matrix settings → env vars (env vars take precedence)
matrix_cfg = yaml_cfg.get("matrix", {})
if isinstance(matrix_cfg, dict):
@@ -961,6 +1027,12 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["MATRIX_FREE_RESPONSE_ROOMS"] = str(frc)
+ # allowed_rooms: if set, bot ONLY responds in these rooms (whitelist)
+ ar = matrix_cfg.get("allowed_rooms")
+ if ar is not None and not os.getenv("MATRIX_ALLOWED_ROOMS"):
+ if isinstance(ar, list):
+ ar = ",".join(str(v) for v in ar)
+ os.environ["MATRIX_ALLOWED_ROOMS"] = str(ar)
if "auto_thread" in matrix_cfg and not os.getenv("MATRIX_AUTO_THREAD"):
os.environ["MATRIX_AUTO_THREAD"] = str(matrix_cfg["auto_thread"]).lower()
if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
@@ -1121,10 +1193,17 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# WhatsApp (typically uses different auth mechanism)
whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
- if whatsapp_enabled:
- if Platform.WHATSAPP not in config.platforms:
- config.platforms[Platform.WHATSAPP] = PlatformConfig()
- config.platforms[Platform.WHATSAPP].enabled = True
+ whatsapp_disabled_explicitly = os.getenv("WHATSAPP_ENABLED", "").lower() in ("false", "0", "no")
+ if Platform.WHATSAPP in config.platforms:
+ # YAML config exists — respect explicit disable
+ wa_cfg = config.platforms[Platform.WHATSAPP]
+ if whatsapp_disabled_explicitly:
+ wa_cfg.enabled = False
+ elif whatsapp_enabled:
+ wa_cfg.enabled = True
+ # else: keep whatever the YAML set
+ elif whatsapp_enabled:
+ config.platforms[Platform.WHATSAPP] = PlatformConfig(enabled=True)
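The precedence the branch above implements can be captured as a pure function; a sketch under the stated truthy/falsy token sets (the function name is illustrative):

```python
from typing import Optional

def resolve_whatsapp_enabled(env_value: str, yaml_enabled: Optional[bool]) -> Optional[bool]:
    """An explicit env toggle beats YAML; an unset/unrecognized env value
    keeps the YAML decision. Returns None when neither side configures
    the platform (no PlatformConfig is created)."""
    v = env_value.lower()
    if yaml_enabled is not None:          # YAML config exists
        if v in ("false", "0", "no"):
            return False                  # explicit disable wins
        if v in ("true", "1", "yes"):
            return True
        return yaml_enabled               # keep whatever YAML set
    return True if v in ("true", "1", "yes") else None

print(resolve_whatsapp_enabled("false", True))  # False
print(resolve_whatsapp_enabled("", True))       # True
```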
whatsapp_home = os.getenv("WHATSAPP_HOME_CHANNEL")
if whatsapp_home and Platform.WHATSAPP in config.platforms:
config.platforms[Platform.WHATSAPP].home_channel = HomeChannel(
@@ -1585,7 +1664,10 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# Registry-driven enable for plugin platforms. Built-ins have explicit
# blocks above; plugins expose check_fn() which is the single source of
# truth for "are my env vars set?". When it returns True, ensure the
- # platform is enabled so start() will create its adapter.
+ # platform is enabled so start() will create its adapter. Plugins that
+ # need to seed ``PlatformConfig.extra`` from env vars (e.g. Google Chat's
+ # project_id / subscription_name) can supply ``env_enablement_fn`` on
+ # their PlatformEntry — called here BEFORE adapter construction.
try:
from hermes_cli.plugins import discover_plugins
discover_plugins() # idempotent
@@ -1601,5 +1683,31 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
if platform not in config.platforms:
config.platforms[platform] = PlatformConfig()
config.platforms[platform].enabled = True
+ # Seed extras from env if the plugin opted in.
+ if entry.env_enablement_fn is not None:
+ try:
+ seed = entry.env_enablement_fn()
+ except Exception as e:
+ logger.debug(
+ "env_enablement_fn for %s raised: %s", entry.name, e
+ )
+ seed = None
+ if isinstance(seed, dict) and seed:
+ # Extract the home_channel dict (if provided) so we wire it
+ # up as a proper HomeChannel dataclass. Everything else is
+ # merged into ``extra``.
+ home = seed.pop("home_channel", None)
+ config.platforms[platform].extra.update(seed)
+ if isinstance(home, dict) and home.get("chat_id"):
+ config.platforms[platform].home_channel = HomeChannel(
+ platform=platform,
+ chat_id=str(home["chat_id"]),
+ name=str(home.get("name") or "Home"),
+ thread_id=(
+ str(home["thread_id"])
+ if home.get("thread_id")
+ else None
+ ),
+ )
except Exception as e:
logger.debug("Plugin platform enable pass failed: %s", e)
diff --git a/gateway/display_config.py b/gateway/display_config.py
index 832f5cb2f2..55cc344677 100644
--- a/gateway/display_config.py
+++ b/gateway/display_config.py
@@ -35,6 +35,12 @@ _GLOBAL_DEFAULTS: dict[str, Any] = {
"show_reasoning": False,
"tool_preview_length": 0,
"streaming": None, # None = follow top-level streaming config
+ # When true, delete tool-progress / "Still working..." / status bubbles
+ # after the final response lands on platforms that support message
+ # deletion (e.g. Telegram). Off by default. When enabled, progress is
+ # still shown live and then cleaned up after success so the chat doesn't
+ # fill up with stale status bubbles; failed runs leave the bubbles in
+ # place as debugging breadcrumbs.
+ "cleanup_progress": False,
}
# ---------------------------------------------------------------------------
@@ -188,6 +194,10 @@ def _normalise(setting: str, value: Any) -> Any:
if isinstance(value, str):
return value.lower() in ("true", "1", "yes", "on")
return bool(value)
+ if setting == "cleanup_progress":
+ if isinstance(value, str):
+ return value.lower() in ("true", "1", "yes", "on")
+ return bool(value)
if setting == "tool_preview_length":
try:
return int(value)
diff --git a/gateway/pairing.py b/gateway/pairing.py
index d5f7ec6b96..af9ff2fdbf 100644
--- a/gateway/pairing.py
+++ b/gateway/pairing.py
@@ -195,12 +195,23 @@ class PairingStore:
"""
Approve a pairing code. Adds the user to the approved list.
- Returns {user_id, user_name} on success, None if code is invalid/expired.
+ Returns {user_id, user_name} on success, None if code is
+ invalid/expired OR the platform is currently locked out after
+ ``MAX_FAILED_ATTEMPTS`` failed approvals (#10195). Callers can
+ disambiguate with ``_is_locked_out(platform)``.
"""
with self._lock:
self._cleanup_expired(platform)
code = code.upper().strip()
+ # Lockout check — must run before the pending lookup so a
+ # valid code (e.g. one already sitting in pending) cannot be
+ # accepted once the lockout fires. Without this, the lockout
+ # only blocks `generate_code`, not `approve_code` — nullifying
+ # the brute-force protection for any code already issued.
+ if self._is_locked_out(platform):
+ return None
+
pending = self._load_json(self._pending_path(platform))
if code not in pending:
self._record_failed_attempt(platform)
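The ordering argument in the comment above is easiest to see in a stripped-down sketch (locking, persistence, and expiry omitted; names are illustrative):

```python
def approve_code(code, pending, is_locked_out, record_failure):
    """The lockout gate must run before the pending lookup; otherwise an
    already-issued valid code bypasses the brute-force protection."""
    if is_locked_out():
        return None
    code = code.upper().strip()
    if code not in pending:
        record_failure()
        return None
    return pending[code]

pending = {"AB12CD": {"user_id": "u1", "user_name": "Ada"}}
print(approve_code("ab12cd ", pending, lambda: False, lambda: None))
print(approve_code("AB12CD", pending, lambda: True, lambda: None))  # None
```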
diff --git a/gateway/platform_registry.py b/gateway/platform_registry.py
index 11303466da..a52f659692 100644
--- a/gateway/platform_registry.py
+++ b/gateway/platform_registry.py
@@ -110,6 +110,21 @@ class PlatformEntry:
# Do not use markdown."). Empty string = no hint.
platform_hint: str = ""
+ # ── Env-driven auto-configuration ──
+ # Optional: read env vars, return a dict of ``PlatformConfig.extra`` fields
+ # to seed when the platform is auto-enabled. Called during
+ # ``_apply_env_overrides`` BEFORE the adapter is constructed, so
+ # ``gateway status`` etc. can reflect env-only configuration without
+ # instantiating the adapter. Return ``None`` (or an empty dict) to skip.
+ # Signature: () -> Optional[dict[str, Any]]
+ env_enablement_fn: Optional[Callable[[], Optional[dict]]] = None
+
+ # Optional: home-channel env var name for cron/notification delivery
+ # (e.g. ``"IRC_HOME_CHANNEL"``). When set, ``cron.scheduler`` treats this
+ # platform as a valid ``deliver=`` target and reads the env var to
+ # resolve the default chat/room ID. Empty = no cron home-channel support.
+ cron_deliver_env_var: str = ""
+
class PlatformRegistry:
"""Central registry of platform adapters.
diff --git a/gateway/platforms/ADDING_A_PLATFORM.md b/gateway/platforms/ADDING_A_PLATFORM.md
index 7fd28245b1..5091c4647c 100644
--- a/gateway/platforms/ADDING_A_PLATFORM.md
+++ b/gateway/platforms/ADDING_A_PLATFORM.md
@@ -4,18 +4,34 @@ There are two ways to add a platform to the Hermes gateway:
## Plugin Path (Recommended for Community/Third-Party)
-Create a plugin directory in `~/.hermes/plugins/` with a `PLUGIN.yaml` and
-`adapter.py`. The adapter inherits from `BasePlatformAdapter` and registers
-via `ctx.register_platform()` in the `register(ctx)` entry point. This
-requires **zero changes to core Hermes code**.
+Create a plugin directory in `~/.hermes/plugins/` (or under `plugins/platforms/`
+for bundled plugins) with a `plugin.yaml` and `adapter.py`. The adapter
+inherits from `BasePlatformAdapter` and registers via
+`ctx.register_platform()` in the `register(ctx)` entry point. This requires
+**zero changes to core Hermes code**.
The plugin system automatically handles: adapter creation, config parsing,
user authorization, cron delivery, send_message routing, system prompt hints,
status display, gateway setup, and more.
-See `plugins/platforms/irc/` for a complete reference implementation, and
+**Three optional hooks cover the edges most adapters need:**
+
+- `env_enablement_fn: () -> Optional[dict]` — seeds `PlatformConfig.extra`
+ (and an optional `home_channel` dict) from env vars BEFORE the adapter is
+ constructed. Without this, env-only setups don't surface in
+ `hermes gateway status` or `get_connected_platforms()` until the SDK
+ instantiates.
+- `cron_deliver_env_var: str` — name of the `*_HOME_CHANNEL` env var. When
+ set, `deliver=` cron jobs route to this var without editing
+ `cron/scheduler.py`'s hardcoded sets.
+- `plugin.yaml` `requires_env` / `optional_env` rich-dict entries —
+ auto-populate `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` so the setup
+ wizard surfaces proper descriptions, prompts, password flags, and URLs.
+
+See `plugins/platforms/irc/`, `plugins/platforms/teams/`, and
+`plugins/platforms/google_chat/` for complete working examples, and
`website/docs/developer-guide/adding-platform-adapters.md` for the full
-plugin guide with code examples.
+plugin guide with code examples and hook documentation.
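A minimal `env_enablement_fn` might look like the following sketch, loosely modeled on the Google Chat env vars from `.env.example` (the `GOOGLE_CHAT_HOME_CHANNEL` name is hypothetical, not confirmed by the source):

```python
import os
from typing import Optional

def env_enablement_fn() -> Optional[dict]:
    """Return extra fields (plus an optional home_channel dict) when env
    vars fully configure the platform, or None to skip seeding."""
    project = os.getenv("GOOGLE_CHAT_PROJECT_ID") or os.getenv("GOOGLE_CLOUD_PROJECT")
    subscription = os.getenv("GOOGLE_CHAT_SUBSCRIPTION_NAME")
    if not (project and subscription):
        return None  # not env-configured
    seed = {"project_id": project, "subscription_name": subscription}
    home = os.getenv("GOOGLE_CHAT_HOME_CHANNEL")  # hypothetical var name
    if home:
        seed["home_channel"] = {"chat_id": home, "name": "Home"}
    return seed
```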
---
diff --git a/gateway/platforms/api_server.py b/gateway/platforms/api_server.py
index 230859023b..3b0375ff03 100644
--- a/gateway/platforms/api_server.py
+++ b/gateway/platforms/api_server.py
@@ -2,8 +2,8 @@
OpenAI-compatible API server platform adapter.
Exposes an HTTP server with endpoints:
-- POST /v1/chat/completions — OpenAI Chat Completions format (stateless; opt-in session continuity via X-Hermes-Session-Id header)
-- POST /v1/responses — OpenAI Responses API format (stateful via previous_response_id)
+- POST /v1/chat/completions — OpenAI Chat Completions format (stateless; opt-in session continuity via X-Hermes-Session-Id header; opt-in long-term memory scoping via X-Hermes-Session-Key header)
+- POST /v1/responses — OpenAI Responses API format (stateful via previous_response_id; X-Hermes-Session-Key supported)
- GET /v1/responses/{response_id} — Retrieve a stored response
- DELETE /v1/responses/{response_id} — Delete a stored response
- GET /v1/models — lists hermes-agent as an available model
@@ -56,7 +56,7 @@ logger = logging.getLogger(__name__)
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8642
MAX_STORED_RESPONSES = 100
-MAX_REQUEST_BYTES = 1_000_000 # 1 MB default limit for POST bodies
+MAX_REQUEST_BYTES = 10_000_000 # 10 MB — accommodates long agent conversations with tool calls
CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS = 30.0
MAX_NORMALIZED_TEXT_LENGTH = 65_536 # 64 KB cap for normalized content parts
MAX_CONTENT_LIST_SIZE = 1_000 # Max items when content is an array
@@ -698,6 +698,71 @@ class APIServerAdapter(BasePlatformAdapter):
status=401,
)
+ # ------------------------------------------------------------------
+ # Session header helpers
+ # ------------------------------------------------------------------
+
+ # Soft length cap for session identifiers. aiohttp already bounds each
+ # header line via ``max_field_size`` (default 8190 bytes; ``client_max_size``
+ # limits only the request body), but we impose a tighter limit on the
+ # session headers so a caller can't burn memory by passing a
+ # multi-kilobyte "session key".
+ # 256 chars is well above any realistic stable channel identifier
+ # (e.g. ``agent:main:webui:dm:user-42``) while staying small enough
+ # that the sanitized form is safe to pass into Honcho / state.db.
+ _MAX_SESSION_HEADER_LEN = 256
+
+ def _parse_session_key_header(
+ self, request: "web.Request"
+ ) -> tuple[Optional[str], Optional["web.Response"]]:
+ """Extract and validate the ``X-Hermes-Session-Key`` header.
+
+ The session key is a stable per-channel identifier that scopes
+ long-term memory (e.g. Honcho sessions) across transcripts. It
+ is independent of ``X-Hermes-Session-Id``: callers may send
+ either, both, or neither.
+
+ Returns ``(session_key, None)`` on success (with an empty/absent
+ header yielding ``None`` for the key), or ``(None, error_response)``
+ on validation failure.
+
+ Security: like session continuation, accepting a caller-supplied
+ memory scope requires API-key authentication so that an
+ unauthenticated client on a local-only server can't inject itself
+ into another user's long-term memory scope by guessing a key.
+ """
+ raw = request.headers.get("X-Hermes-Session-Key", "").strip()
+ if not raw:
+ return None, None
+
+ if not self._api_key:
+ logger.warning(
+ "X-Hermes-Session-Key rejected: no API key configured. "
+ "Set API_SERVER_KEY to enable long-term memory scoping."
+ )
+ return None, web.json_response(
+ _openai_error(
+ "X-Hermes-Session-Key requires API key authentication. "
+ "Configure API_SERVER_KEY to enable this feature."
+ ),
+ status=403,
+ )
+
+ # Reject control characters that could enable header injection on
+ # the echo path.
+ if re.search(r'[\r\n\x00]', raw):
+ return None, web.json_response(
+ {"error": {"message": "Invalid session key", "type": "invalid_request_error"}},
+ status=400,
+ )
+
+ if len(raw) > self._MAX_SESSION_HEADER_LEN:
+ return None, web.json_response(
+ {"error": {"message": "Session key too long", "type": "invalid_request_error"}},
+ status=400,
+ )
+
+ return raw, None
+
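The validation steps can be exercised in isolation; a standalone mirror of the checks (auth gating omitted, since it needs adapter state):

```python
import re
from typing import Optional, Tuple

MAX_SESSION_HEADER_LEN = 256

def validate_session_key(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (key, None) on success or (None, error_message) on failure;
    an absent/blank header yields (None, None), which is not an error."""
    raw = raw.strip()
    if not raw:
        return None, None
    if re.search(r"[\r\n\x00]", raw):      # header-injection characters
        return None, "Invalid session key"
    if len(raw) > MAX_SESSION_HEADER_LEN:  # memory/abuse cap
        return None, "Session key too long"
    return raw, None

print(validate_session_key("agent:main:webui:dm:user-42"))
print(validate_session_key("bad\r\nkey"))
```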
# ------------------------------------------------------------------
# Session DB helper
# ------------------------------------------------------------------
@@ -728,6 +793,7 @@ class APIServerAdapter(BasePlatformAdapter):
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
+ gateway_session_key: Optional[str] = None,
) -> Any:
"""
Create an AIAgent instance using the gateway's runtime config.
@@ -736,6 +802,13 @@ class APIServerAdapter(BasePlatformAdapter):
base_url, etc. from config.yaml / env vars. Toolsets are resolved
from config.yaml platform_toolsets.api_server (same as all other
gateway platforms), falling back to the hermes-api-server default.
+
+ ``gateway_session_key`` is a stable per-channel identifier supplied
+ by the client (via ``X-Hermes-Session-Key``). Unlike ``session_id``
+ which scopes the short-term transcript and rotates on /new, this
+ key is meant to persist across transcripts so long-term memory
+ providers (e.g. Honcho) can scope their per-chat state correctly
+ — matching the semantics of the native gateway's ``session_key``.
"""
from run_agent import AIAgent
from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config, GatewayRunner
@@ -771,6 +844,7 @@ class APIServerAdapter(BasePlatformAdapter):
session_db=self._ensure_session_db(),
fallback_model=fallback_model,
reasoning_config=reasoning_config,
+ gateway_session_key=gateway_session_key,
)
return agent
@@ -843,6 +917,16 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "bearer",
"required": bool(self._api_key),
},
+ "runtime": {
+ "mode": "server_agent",
+ "tool_execution": "server",
+ "split_runtime": False,
+ "description": (
+ "The API server creates a server-side Hermes AIAgent; "
+ "tools execute on the API-server host unless a future "
+ "explicit split-runtime mode is enabled."
+ ),
+ },
"features": {
"chat_completions": True,
"chat_completions_streaming": True,
@@ -854,6 +938,7 @@ class APIServerAdapter(BasePlatformAdapter):
"run_stop": True,
"tool_progress_events": True,
"session_continuity_header": "X-Hermes-Session-Id",
+ "session_key_header": "X-Hermes-Session-Key",
"cors": bool(self._cors_origins),
},
"endpoints": {
@@ -925,6 +1010,15 @@ class APIServerAdapter(BasePlatformAdapter):
status=400,
)
+ # Allow caller to scope long-term memory (e.g. Honcho) with a
+ # stable per-channel identifier via X-Hermes-Session-Key. This
+ # is independent of X-Hermes-Session-Id: the key persists across
+ # transcripts while the id rotates when the caller starts a new
+ # transcript (i.e. /new semantics). See _parse_session_key_header.
+ gateway_session_key, key_err = self._parse_session_key_header(request)
+ if key_err is not None:
+ return key_err
+
# Allow caller to continue an existing session by passing X-Hermes-Session-Id.
# When provided, history is loaded from state.db instead of from the request body.
#
@@ -1059,11 +1153,13 @@ class APIServerAdapter(BasePlatformAdapter):
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
+ gateway_session_key=gateway_session_key,
))
return await self._write_sse_chat_completion(
request, completion_id, model_name, created, _stream_q,
agent_task, agent_ref, session_id=session_id,
+ gateway_session_key=gateway_session_key,
)
# Non-streaming: run the agent (with optional Idempotency-Key)
@@ -1073,6 +1169,7 @@ class APIServerAdapter(BasePlatformAdapter):
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
+ gateway_session_key=gateway_session_key,
)
idempotency_key = request.headers.get("Idempotency-Key")
@@ -1122,11 +1219,17 @@ class APIServerAdapter(BasePlatformAdapter):
},
}
- return web.json_response(response_data, headers={"X-Hermes-Session-Id": session_id})
+ response_headers = {
+ "X-Hermes-Session-Id": result.get("session_id", session_id),
+ }
+ if gateway_session_key:
+ response_headers["X-Hermes-Session-Key"] = gateway_session_key
+ return web.json_response(response_data, headers=response_headers)
async def _write_sse_chat_completion(
self, request: "web.Request", completion_id: str, model: str,
created: int, stream_q, agent_task, agent_ref=None, session_id: str = None,
+ gateway_session_key: Optional[str] = None,
) -> "web.StreamResponse":
"""Write real streaming SSE from agent's stream_delta_callback queue.
@@ -1149,6 +1252,8 @@ class APIServerAdapter(BasePlatformAdapter):
sse_headers.update(cors)
if session_id:
sse_headers["X-Hermes-Session-Id"] = session_id
+ if gateway_session_key:
+ sse_headers["X-Hermes-Session-Key"] = gateway_session_key
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
@@ -1221,8 +1326,8 @@ class APIServerAdapter(BasePlatformAdapter):
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
- except Exception:
- pass
+ except Exception as exc:
+ logger.warning("Agent task %s failed, usage data lost: %s", completion_id, exc)
# Finish chunk
finish_chunk = {
@@ -1254,6 +1359,22 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", completion_id)
+ except Exception as _exc:
+ # Agent crashed mid-stream. Try to emit an error chunk
+ # so the client gets a proper response instead of a
+ # TransferEncodingError from incomplete chunked encoding.
+ import traceback as _tb
+ logger.error("Agent crashed mid-stream for %s: %s", completion_id, _tb.format_exc()[:300])
+ try:
+ error_chunk = {
+ "id": completion_id, "object": "chat.completion.chunk",
+ "created": created, "model": model,
+ "choices": [{"index": 0, "delta": {}, "finish_reason": "error"}],
+ }
+ await response.write(f"data: {json.dumps(error_chunk)}\n\n".encode())
+ await response.write(b"data: [DONE]\n\n")
+ except Exception:
+ pass
return response
@@ -1272,6 +1393,7 @@ class APIServerAdapter(BasePlatformAdapter):
conversation: Optional[str],
store: bool,
session_id: str,
+ gateway_session_key: Optional[str] = None,
) -> "web.StreamResponse":
"""Write an SSE stream for POST /v1/responses (OpenAI Responses API).
@@ -1314,6 +1436,8 @@ class APIServerAdapter(BasePlatformAdapter):
sse_headers.update(cors)
if session_id:
sse_headers["X-Hermes-Session-Id"] = session_id
+ if gateway_session_key:
+ sse_headers["X-Hermes-Session-Key"] = gateway_session_key
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
@@ -1571,20 +1695,54 @@ class APIServerAdapter(BasePlatformAdapter):
async def _dispatch(it) -> None:
"""Route a queue item to the correct SSE emitter.
- Plain strings are text deltas. Tagged tuples with
- ``__tool_started__`` / ``__tool_completed__`` prefixes
- are tool lifecycle events.
+ Plain strings are text deltas — they are batched (50ms)
+ to reduce Open WebUI re-render storms. Tagged tuples
+ with ``__tool_started__`` / ``__tool_completed__``
+ prefixes are tool lifecycle events and flush the buffer
+ before emitting.
"""
+ nonlocal _batch_timer
if isinstance(it, tuple) and len(it) == 2 and isinstance(it[0], str):
tag, payload = it
+ # Flush batched text before tool events
+ if _batch_buf:
+ await _flush_batch()
if tag == "__tool_started__":
await _emit_tool_started(payload)
elif tag == "__tool_completed__":
await _emit_tool_completed(payload)
- # Unknown tags are silently ignored (forward-compat).
elif isinstance(it, str):
- await _emit_text_delta(it)
- # Other types (non-string, non-tuple) are silently dropped.
+ # Batch text deltas — append to buffer, flush on timer
+ _batch_buf.append(it)
+ if _batch_timer is None:
+ _batch_timer = asyncio.create_task(_batch_flush_after(0.05))
+ # Other types are silently dropped.
+
+ # ── Batching state ──
+ _batch_buf: List[str] = []
+ _batch_timer: Optional[asyncio.Task] = None
+ _batch_lock = asyncio.Lock()
+
+ async def _batch_flush_after(delay: float) -> None:
+ """Wait delay seconds, then flush accumulated text deltas."""
+ try:
+ await asyncio.sleep(delay)
+ except asyncio.CancelledError:
+ return
+ # Clear timer reference BEFORE flush so new deltas
+ # can start a fresh timer while we emit
+ nonlocal _batch_timer
+ _batch_timer = None
+ await _flush_batch()
+
+ async def _flush_batch() -> None:
+ """Emit a single SSE delta for all accumulated text."""
+ nonlocal _batch_buf
+ async with _batch_lock:
+ if _batch_buf:
+ combined = "".join(_batch_buf)
+ _batch_buf = []
+ await _emit_text_delta(combined)
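The timer/buffer interplay above can be demonstrated end to end with a self-contained sketch (emitter and delta source are stand-ins for the SSE queue):

```python
import asyncio
from typing import List, Optional

emitted: List[str] = []  # each entry is one combined delta actually "sent"

async def demo() -> None:
    buf: List[str] = []
    timer: Optional[asyncio.Task] = None

    async def flush() -> None:
        nonlocal buf
        if buf:
            emitted.append("".join(buf))
            buf = []

    async def flush_after(delay: float) -> None:
        nonlocal timer
        await asyncio.sleep(delay)
        timer = None  # clear before flushing so new deltas restart the timer
        await flush()

    def on_delta(text: str) -> None:
        nonlocal timer
        buf.append(text)
        if timer is None:
            timer = asyncio.create_task(flush_after(0.05))

    for piece in ("Hel", "lo, ", "wor", "ld"):
        on_delta(piece)          # four deltas land within the 50 ms window
    await asyncio.sleep(0.1)     # let the timer fire

asyncio.run(demo())
print(emitted)  # ['Hello, world']
```

Four rapid deltas collapse into a single emit, which is the point: one client re-render instead of four.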
loop = asyncio.get_running_loop()
while True:
@@ -1609,11 +1767,21 @@ class APIServerAdapter(BasePlatformAdapter):
continue
if item is None: # EOS sentinel
+ # Cancel pending timer and flush remaining batched text
+ if _batch_timer and not _batch_timer.done():
+ _batch_timer.cancel()
+ _batch_timer = None
+ if _batch_buf:
+ await _flush_batch()
break
await _dispatch(item)
last_activity = time.monotonic()
+ # Flush any final batched text before processing result
+ if _batch_buf:
+ await _flush_batch()
+
# Pick up agent result + usage from the completed task
try:
result, agent_usage = await agent_task
@@ -1664,6 +1832,31 @@ class APIServerAdapter(BasePlatformAdapter):
# payload still see the assistant text. This mirrors the
# shape produced by _extract_output_items in the batch path.
final_items: List[Dict[str, Any]] = list(emitted_items)
+
+ # Trim large content from tool call arguments to keep the
+ # response.completed event under ~100KB. Clients already
+ # received full details via incremental events.
+ for _item in final_items:
+ if _item.get("type") == "function_call":
+ try:
+ _args = json.loads(_item.get("arguments", "{}")) if isinstance(_item.get("arguments"), str) else _item.get("arguments", {})
+ if isinstance(_args, dict):
+ for _k in ("content", "query", "pattern", "old_string", "new_string"):
+ if isinstance(_args.get(_k), str) and len(_args[_k]) > 500:
+ _args[_k] = "[" + str(len(_args[_k])) + " chars — truncated for response.completed]"
+ _item["arguments"] = json.dumps(_args)
+ except Exception:
+ pass
+ elif _item.get("type") == "function_call_output":
+ _output = _item.get("output", [])
+ if isinstance(_output, list) and _output:
+ _first = _output[0]
+ if isinstance(_first, dict) and _first.get("type") == "input_text":
+ _text = _first.get("text", "")
+ if len(_text) > 1000:
+ _first["text"] = _text[:500] + "...[" + str(len(_text) - 500) + " more chars]"
+ _item["output"] = [_first]
+
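The trimming pass above, extracted into a testable helper (the function name and placeholder wording are illustrative):

```python
import json

def trim_function_call_args(item: dict, limit: int = 500) -> dict:
    """Replace oversized string fields in a function_call item's arguments
    with a length placeholder, keeping the completed event small."""
    args = item.get("arguments", "{}")
    if isinstance(args, str):
        args = json.loads(args or "{}")
    if isinstance(args, dict):
        for key in ("content", "query", "pattern", "old_string", "new_string"):
            value = args.get(key)
            if isinstance(value, str) and len(value) > limit:
                args[key] = f"[{len(value)} chars truncated]"
        item["arguments"] = json.dumps(args)
    return item

item = {"type": "function_call",
        "arguments": json.dumps({"content": "x" * 2000, "path": "a.txt"})}
print(trim_function_call_args(item)["arguments"])
```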
final_items.append({
"type": "message",
"role": "assistant",
@@ -1705,12 +1898,12 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
- full_history = list(conversation_history)
- full_history.append({"role": "user", "content": user_message})
- if isinstance(result, dict) and result.get("messages"):
- full_history.extend(result["messages"])
- else:
- full_history.append({"role": "assistant", "content": final_response_text})
+ full_history = self._build_response_conversation_history(
+ conversation_history,
+ user_message,
+ result,
+ final_response_text,
+ )
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
@@ -1754,6 +1947,30 @@ class APIServerAdapter(BasePlatformAdapter):
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
+ except Exception as _exc:
+ # Agent crashed with an unhandled error (e.g. model API error like
+ # BadRequestError, AuthenticationError). Emit a response.failed
+ # event and properly terminate the SSE stream so the client doesn't
+ # get a TransferEncodingError from incomplete chunked encoding.
+ import traceback as _tb
+ _persist_incomplete_if_needed()
+ agent_error = _tb.format_exc()
+ try:
+ failed_env = _envelope("failed")
+ failed_env["output"] = list(emitted_items)
+ failed_env["error"] = {"message": str(_exc)[:500], "type": "server_error"}
+ failed_env["usage"] = {
+ "input_tokens": usage.get("input_tokens", 0),
+ "output_tokens": usage.get("output_tokens", 0),
+ "total_tokens": usage.get("total_tokens", 0),
+ }
+ await _write_event("response.failed", {
+ "type": "response.failed",
+ "response": failed_env,
+ })
+ except Exception:
+ pass
+ logger.error("Agent crashed mid-stream for %s: %s", response_id, str(agent_error)[:300])
return response
@@ -1763,6 +1980,11 @@ class APIServerAdapter(BasePlatformAdapter):
if auth_err:
return auth_err
+ # Long-term memory scope header (see chat_completions for details).
+ gateway_session_key, key_err = self._parse_session_key_header(request)
+ if key_err is not None:
+ return key_err
+
# Parse request body
try:
body = await request.json()
@@ -1914,6 +2136,7 @@ class APIServerAdapter(BasePlatformAdapter):
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
+ gateway_session_key=gateway_session_key,
))
response_id = f"resp_{uuid.uuid4().hex[:28]}"
@@ -1934,6 +2157,7 @@ class APIServerAdapter(BasePlatformAdapter):
conversation=conversation,
store=store,
session_id=session_id,
+ gateway_session_key=gateway_session_key,
)
async def _compute_response():
@@ -1942,6 +2166,7 @@ class APIServerAdapter(BasePlatformAdapter):
conversation_history=conversation_history,
ephemeral_system_prompt=instructions,
session_id=session_id,
+ gateway_session_key=gateway_session_key,
)
idempotency_key = request.headers.get("Idempotency-Key")
@@ -1977,17 +2202,22 @@ class APIServerAdapter(BasePlatformAdapter):
# Build the full conversation history for storage
# (includes tool calls from the agent run)
- full_history = list(conversation_history)
- full_history.append({"role": "user", "content": user_message})
- # Add agent's internal messages if available
- agent_messages = result.get("messages", [])
- if agent_messages:
- full_history.extend(agent_messages)
- else:
- full_history.append({"role": "assistant", "content": final_response})
+ full_history = self._build_response_conversation_history(
+ conversation_history,
+ user_message,
+ result,
+ final_response,
+ )
- # Build output items (includes tool calls + final message)
- output_items = self._extract_output_items(result)
+ # Build output items from the current turn only. AIAgent returns a
+ # full transcript in result["messages"], while older/mocked paths may
+ # return only the current turn suffix.
+ output_start_index = self._response_messages_turn_start_index(
+ conversation_history,
+ user_message,
+ result,
+ )
+ output_items = self._extract_output_items(result, start_index=output_start_index)
response_data = {
"id": response_id,
@@ -2016,7 +2246,10 @@ class APIServerAdapter(BasePlatformAdapter):
if conversation:
self._response_store.set_conversation(conversation, response_id)
- return web.json_response(response_data)
+ response_headers = {"X-Hermes-Session-Id": session_id}
+ if gateway_session_key:
+ response_headers["X-Hermes-Session-Key"] = gateway_session_key
+ return web.json_response(response_data, headers=response_headers)
# ------------------------------------------------------------------
# GET / DELETE response endpoints
@@ -2276,17 +2509,70 @@ class APIServerAdapter(BasePlatformAdapter):
# ------------------------------------------------------------------
@staticmethod
- def _extract_output_items(result: Dict[str, Any]) -> List[Dict[str, Any]]:
- """
- Build the full output item array from the agent's messages.
+ def _build_response_conversation_history(
+ conversation_history: List[Dict[str, Any]],
+ user_message: Any,
+ result: Dict[str, Any],
+ final_response: Any,
+ ) -> List[Dict[str, Any]]:
+ """Build the stored Responses transcript without duplicating history."""
+ prior = list(conversation_history)
+ current_user = {"role": "user", "content": user_message}
+ agent_messages = result.get("messages") if isinstance(result, dict) else None
- Walks *result["messages"]* and emits:
+ if isinstance(agent_messages, list) and agent_messages:
+ turn_start = APIServerAdapter._response_messages_turn_start_index(
+ conversation_history,
+ user_message,
+ result,
+ )
+ if turn_start:
+ return list(agent_messages)
+
+ full_history = prior
+ full_history.append(current_user)
+ full_history.extend(agent_messages)
+ return full_history
+
+ full_history = prior
+ full_history.append(current_user)
+ full_history.append({"role": "assistant", "content": final_response})
+ return full_history
+
+ @staticmethod
+ def _response_messages_turn_start_index(
+ conversation_history: List[Dict[str, Any]],
+ user_message: Any,
+ result: Dict[str, Any],
+ ) -> int:
+ """Detect transcript-shaped result["messages"] and return turn start."""
+ agent_messages = result.get("messages") if isinstance(result, dict) else None
+ if not isinstance(agent_messages, list) or not agent_messages:
+ return 0
+
+ prior = list(conversation_history)
+ current_user = {"role": "user", "content": user_message}
+ expected_prefix = prior + [current_user]
+ if agent_messages[:len(expected_prefix)] == expected_prefix:
+ return len(expected_prefix)
+ if prior and agent_messages[:len(prior)] == prior:
+ return len(prior)
+ return 0
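The prefix matching above can be sketched as a standalone helper (hypothetical names, not the adapter's API): a transcript-shaped list starts with the prior history plus the current user message; a suffix-shaped list does not, so extraction starts at index 0.

```python
from typing import Any, Dict, List

def turn_start_index(history: List[Dict[str, Any]], user_message: Any,
                     agent_messages: List[Dict[str, Any]]) -> int:
    """Return where the current turn begins in a transcript-shaped
    message list, or 0 for suffix-shaped results."""
    if not agent_messages:
        return 0
    expected = history + [{"role": "user", "content": user_message}]
    if agent_messages[:len(expected)] == expected:
        return len(expected)  # full transcript: history + current user matched
    if history and agent_messages[:len(history)] == history:
        return len(history)   # transcript that lacks the current user message
    return 0                  # suffix-shaped: only the current turn is present

hist = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
full = hist + [{"role": "user", "content": "next"}, {"role": "assistant", "content": "ok"}]
print(turn_start_index(hist, "next", full))  # 3 — skip history + current user
```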
+
+ @staticmethod
+ def _extract_output_items(result: Dict[str, Any], start_index: int = 0) -> List[Dict[str, Any]]:
+ """
+ Build the output item array from the agent's messages.
+
+ Walks *result["messages"]* starting at *start_index* and emits:
- ``function_call`` items for each tool_call on assistant messages
- ``function_call_output`` items for each tool-role message
- a final ``message`` item with the assistant's text reply
"""
items: List[Dict[str, Any]] = []
messages = result.get("messages", [])
+ if start_index > 0:
+ messages = messages[start_index:]
for msg in messages:
role = msg.get("role")
@@ -2338,6 +2624,7 @@ class APIServerAdapter(BasePlatformAdapter):
tool_start_callback=None,
tool_complete_callback=None,
agent_ref: Optional[list] = None,
+ gateway_session_key: Optional[str] = None,
) -> tuple:
"""
Create an agent and run a conversation in a thread executor.
@@ -2360,6 +2647,7 @@ class APIServerAdapter(BasePlatformAdapter):
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
+ gateway_session_key=gateway_session_key,
)
if agent_ref is not None:
agent_ref[0] = agent
@@ -2374,6 +2662,12 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": getattr(agent, "session_completion_tokens", 0) or 0,
"total_tokens": getattr(agent, "session_total_tokens", 0) or 0,
}
+        # Include the effective session ID in the result so callers
+        # (e.g. the X-Hermes-Session-Id header) can track
+        # compression-triggered session rotations. (#16938)
+ _eff_sid = getattr(agent, "session_id", session_id)
+ if isinstance(_eff_sid, str) and _eff_sid:
+ result["session_id"] = _eff_sid
return result, usage
return await loop.run_in_executor(None, _run)
@@ -2453,6 +2747,11 @@ class APIServerAdapter(BasePlatformAdapter):
if auth_err:
return auth_err
+ # Long-term memory scope header (see chat_completions for details).
+ gateway_session_key, key_err = self._parse_session_key_header(request)
+ if key_err is not None:
+ return key_err
+
# Enforce concurrency limit
if len(self._run_streams) >= self._MAX_CONCURRENT_RUNS:
return web.json_response(
@@ -2561,6 +2860,7 @@ class APIServerAdapter(BasePlatformAdapter):
session_id=session_id,
stream_delta_callback=_text_cb,
tool_progress_callback=event_cb,
+ gateway_session_key=gateway_session_key,
)
self._active_run_agents[run_id] = agent
def _run_sync():
@@ -2661,7 +2961,14 @@ class APIServerAdapter(BasePlatformAdapter):
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
- return web.json_response({"run_id": run_id, "status": "started"}, status=202)
+ response_headers = (
+ {"X-Hermes-Session-Key": gateway_session_key} if gateway_session_key else {}
+ )
+ return web.json_response(
+ {"run_id": run_id, "status": "started"},
+ status=202,
+ headers=response_headers,
+ )
async def _handle_get_run(self, request: "web.Request") -> "web.Response":
"""GET /v1/runs/{run_id} — return pollable run status for external UIs."""
@@ -2805,7 +3112,7 @@ class APIServerAdapter(BasePlatformAdapter):
try:
mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
- self._app = web.Application(middlewares=mws)
+ self._app = web.Application(middlewares=mws, client_max_size=MAX_REQUEST_BYTES)
self._app["api_server_adapter"] = self
self._app.router.add_get("/health", self._handle_health)
self._app.router.add_get("/health/detailed", self._handle_health_detailed)
diff --git a/gateway/platforms/base.py b/gateway/platforms/base.py
index 4d611fdaa5..0c238d4d09 100644
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -1304,37 +1304,52 @@ class BasePlatformAdapter(ABC):
self._fatal_error_code = None
self._fatal_error_message = None
self._fatal_error_retryable = True
- try:
- from gateway.status import write_runtime_status
- write_runtime_status(platform=self.platform.value, platform_state="connected", error_code=None, error_message=None)
- except Exception:
- pass
+ self._write_runtime_status_safe("connected", platform_state="connected", error_code=None, error_message=None)
def _mark_disconnected(self) -> None:
self._running = False
if self.has_fatal_error:
return
- try:
- from gateway.status import write_runtime_status
- write_runtime_status(platform=self.platform.value, platform_state="disconnected", error_code=None, error_message=None)
- except Exception:
- pass
+ self._write_runtime_status_safe("disconnected", platform_state="disconnected", error_code=None, error_message=None)
def _set_fatal_error(self, code: str, message: str, *, retryable: bool) -> None:
self._running = False
self._fatal_error_code = code
self._fatal_error_message = message
self._fatal_error_retryable = retryable
+ self._write_runtime_status_safe("fatal", platform_state="fatal", error_code=code, error_message=message)
+
+ def _write_runtime_status_safe(self, context: str, **kwargs) -> None:
+ """Write runtime status; log first failure per context at warning, rest at debug.
+
+ Status writes can fail on permissions, ENOSPC, missing status dir, etc.
+ A persistently failing status dir used to be silent (``except: pass``).
+ Logging every failure would spam the log on reconnect loops, so this
+ surfaces the first failure per (platform, context) at warning level and
+ downgrades subsequent failures to debug.
+ """
try:
from gateway.status import write_runtime_status
- write_runtime_status(
- platform=self.platform.value,
- platform_state="fatal",
- error_code=code,
- error_message=message,
- )
- except Exception:
- pass
+ write_runtime_status(platform=self.platform.value, **kwargs)
+ except Exception as exc:
+ # Use getattr so object.__new__(...) test harnesses that skip __init__
+ # don't blow up on attribute access.
+ logged = getattr(self, "_status_write_logged", None)
+ if logged is None:
+ logged = set()
+ try:
+ self._status_write_logged = logged
+ except Exception:
+ pass
+ key = (self.platform.value, context)
+ if key not in logged:
+ logger.warning(
+ "Failed to write runtime status (%s) for %s: %s (further failures at debug level)",
+ context, self.platform.value, exc,
+ )
+ logged.add(key)
+ else:
+ logger.debug("Failed to write runtime status (%s) for %s: %s", context, self.platform.value, exc)
async def _notify_fatal_error(self) -> None:
handler = self._fatal_error_handler
@@ -1874,23 +1889,38 @@ class BasePlatformAdapter(ABC):
def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
"""
Extract MEDIA: tags and [[audio_as_voice]] directives from response text.
-
+
The TTS tool returns responses like:
[[audio_as_voice]]
MEDIA:/path/to/audio.ogg
-
+
+ Skills that produce large/lossless images (e.g. info-graph, where a
+ rendered JPG is 1-2 MB but Telegram's sendPhoto recompresses to
+ ~200 KB at 1280px) can use ``[[as_document]]`` to request unmodified
+ delivery via sendDocument instead of sendPhoto/sendMediaGroup. The
+ directive is detected at the dispatch sites (which have access to the
+ original response); this method just strips it so it never leaks into
+ user-visible text. Per-file granularity is intentionally not exposed —
+ when an agent emits ``[[as_document]]`` once, every image path in the
+ same response is delivered as a document, mirroring the all-or-nothing
+ scope of ``[[audio_as_voice]]``.
+
Args:
content: The response text to scan.
-
+
Returns:
Tuple of (list of (path, is_voice) pairs, cleaned content with tags removed).
"""
media = []
cleaned = content
-
+
# Check for [[audio_as_voice]] directive
has_voice_tag = "[[audio_as_voice]]" in content
cleaned = cleaned.replace("[[audio_as_voice]]", "")
+ # Strip [[as_document]] directive — callers inspect the original
+ # ``content`` for it (so they can still react to it); here we just
+ # keep it out of the user-visible cleaned text.
+ cleaned = cleaned.replace("[[as_document]]", "")
# Extract MEDIA: tags, allowing optional whitespace after the colon
# and quoted/backticked paths for LLM-formatted outputs.
@@ -2096,9 +2126,52 @@ class BasePlatformAdapter(ABC):
``generation`` lets callers tie the callback to a specific gateway run
generation so stale runs cannot clear callbacks owned by a fresher run.
+
+ If a callback for the same ``session_key`` (and generation, when set)
+ is already registered, the new callback is chained — both fire, in
+ registration order, with per-callback exception isolation. This lets
+ independent features (background-review release + temporary-bubble
+ cleanup) coexist without clobbering each other. Stale-generation
+ callers never overwrite a fresher generation's slot.
"""
if not session_key or not callable(callback):
return
+
+ existing = self._post_delivery_callbacks.get(session_key)
+ if existing is not None:
+ if isinstance(existing, tuple) and len(existing) == 2:
+ existing_gen, existing_cb = existing
+ else:
+ existing_gen, existing_cb = None, existing
+ # Stale-generation registrations never overwrite a fresher slot.
+ if (
+ existing_gen is not None
+ and generation is not None
+ and int(generation) < int(existing_gen)
+ ):
+ return
+ # Same-or-newer generation: chain with the existing callback so
+ # both fire in registration order.
+ if callable(existing_cb) and (
+ existing_gen is None
+ or generation is None
+ or int(existing_gen) == int(generation)
+ ):
+ _prev = existing_cb
+ _new = callback
+
+ def _chained() -> None:
+ try:
+ _prev()
+ except Exception:
+ logger.debug("Post-delivery callback failed", exc_info=True)
+ try:
+ _new()
+ except Exception:
+ logger.debug("Post-delivery callback failed", exc_info=True)
+
+ callback = _chained
+
if generation is None:
self._post_delivery_callbacks[session_key] = callback
else:
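The chaining closure above composes callbacks with per-callback exception isolation, so one failing registrant cannot suppress the other. A self-contained sketch of that composition (helper name hypothetical):

```python
import logging
from typing import Callable

logger = logging.getLogger("callbacks")

def chain(prev: Callable[[], None], new: Callable[[], None]) -> Callable[[], None]:
    """Run both callbacks in registration order; swallow and log each
    failure independently so one bad callback can't block the other."""
    def _chained() -> None:
        for cb in (prev, new):
            try:
                cb()
            except Exception:
                logger.debug("Post-delivery callback failed", exc_info=True)
    return _chained

calls = []
def boom() -> None:
    raise RuntimeError("boom")

combined = chain(chain(lambda: calls.append("release"), boom),
                 lambda: calls.append("cleanup"))
combined()
print(calls)  # ['release', 'cleanup'] — the failing middle callback is isolated
```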
@@ -2675,10 +2748,18 @@ class BasePlatformAdapter(ABC):
mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
if mode == "off":
return 0.0
- min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
- max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
if mode == "natural":
min_ms, max_ms = 800, 2500
+ return random.uniform(min_ms / 1000.0, max_ms / 1000.0)
+ # custom mode — tolerate malformed env vars instead of crashing.
+ try:
+ min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
+ except (TypeError, ValueError):
+ min_ms = 800
+ try:
+ max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
+ except (TypeError, ValueError):
+ max_ms = 2500
return random.uniform(min_ms / 1000.0, max_ms / 1000.0)
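The tolerant env parsing above can be exercised standalone. This is a hypothetical distillation of the method (same env var names, not the adapter itself), showing that a malformed value falls back to the default band instead of raising:

```python
import os
import random

def human_delay_seconds() -> float:
    """'off' → 0.0; 'natural' → fixed 800-2500ms band; anything else is
    custom mode, where malformed env values fall back to the defaults."""
    mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
    if mode == "off":
        return 0.0
    if mode == "natural":
        min_ms, max_ms = 800, 2500
    else:
        try:
            min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
        except (TypeError, ValueError):
            min_ms = 800  # malformed value: use the default, don't crash
        try:
            max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
        except (TypeError, ValueError):
            max_ms = 2500
    return random.uniform(min_ms / 1000.0, max_ms / 1000.0)

os.environ["HERMES_HUMAN_DELAY_MODE"] = "custom"
os.environ["HERMES_HUMAN_DELAY_MIN_MS"] = "not-a-number"
os.environ.pop("HERMES_HUMAN_DELAY_MAX_MS", None)
delay = human_delay_seconds()
print(0.8 <= delay <= 2.5)  # True — bad MIN_MS falls back to 800ms
```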
async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
@@ -2764,13 +2845,21 @@ class BasePlatformAdapter(ABC):
if not response:
logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
if response:
+ # Capture [[as_document]] before extract_media strips it, so the
+ # dispatch partition below can route image-extension files
+ # through send_document instead of send_multiple_images. Used
+ # by skills that produce large/lossless images (e.g. info-graph)
+ # where Telegram's sendPhoto recompression destroys legibility.
+ force_document_attachments = "[[as_document]]" in response
+
# Extract MEDIA: tags (from TTS tool) before other processing
media_files, response = self.extract_media(response)
-
+
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
# Strip any remaining internal directives from message body (fixes #1561)
text_content = text_content.replace("[[audio_as_voice]]", "").strip()
+ text_content = text_content.replace("[[as_document]]", "").strip()
text_content = re.sub(r"MEDIA:\s*\S+", "", text_content).strip()
if images:
logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
@@ -2872,19 +2961,26 @@ class BasePlatformAdapter(ABC):
_IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
# Partition images out of media_files + local_files so they
- # can be sent as a single batch (Signal RPC)
+ # can be sent as a single batch (Signal RPC). When
+ # ``[[as_document]]`` was set on the original response, image
+ # files skip the photo path and route to send_document below
+ # so they're delivered with original bytes (no Telegram
+ # sendPhoto recompression).
from urllib.parse import quote as _quote
_image_paths: list = []
_non_image_media: list = []
for media_path, is_voice in media_files:
_ext = Path(media_path).suffix.lower()
- if _ext in _IMAGE_EXTS and not is_voice:
+ if (_ext in _IMAGE_EXTS
+ and not is_voice
+ and not force_document_attachments):
_image_paths.append(media_path)
else:
_non_image_media.append((media_path, is_voice))
_non_image_local: list = []
for file_path in local_files:
- if Path(file_path).suffix.lower() in _IMAGE_EXTS:
+ if (Path(file_path).suffix.lower() in _IMAGE_EXTS
+ and not force_document_attachments):
_image_paths.append(file_path)
else:
_non_image_local.append(file_path)
diff --git a/gateway/platforms/dingtalk.py b/gateway/platforms/dingtalk.py
index f1520e22c6..59913b8b17 100644
--- a/gateway/platforms/dingtalk.py
+++ b/gateway/platforms/dingtalk.py
@@ -365,6 +365,20 @@ class DingTalkAdapter(BasePlatformAdapter):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
+ def _dingtalk_allowed_chats(self) -> Set[str]:
+ """Return the whitelist of group chat IDs the bot will respond in.
+
+ When non-empty, group messages from chats NOT in this set are silently
+ ignored — even if the bot is @mentioned. DMs are never filtered.
+ Empty set means no restriction (fully backward compatible).
+ """
+ raw = self.config.extra.get("allowed_chats") if self.config.extra else None
+ if raw is None:
+ raw = os.getenv("DINGTALK_ALLOWED_CHATS", "")
+ if isinstance(raw, list):
+ return {str(part).strip() for part in raw if str(part).strip()}
+ return {part.strip() for part in str(raw).split(",") if part.strip()}
+
def _compile_mention_patterns(self) -> List[re.Pattern]:
"""Compile optional regex wake-word patterns for group triggers."""
patterns = self.config.extra.get("mention_patterns") if self.config.extra else None
@@ -443,13 +457,21 @@ class DingTalkAdapter(BasePlatformAdapter):
DMs remain unrestricted (subject to ``allowed_users`` which is enforced
earlier). Group messages are accepted when:
+ - the chat passes the ``allowed_chats`` whitelist (when set)
- the chat is explicitly allowlisted in ``free_response_chats``
- ``require_mention`` is disabled
- the bot is @mentioned (``is_in_at_list``)
- the text matches a configured regex wake-word pattern
+
+ When ``allowed_chats`` is non-empty, it acts as a hard gate — messages
+ from any group chat not in the list are ignored regardless of the
+ other rules.
"""
if not is_group:
return True
+ allowed = self._dingtalk_allowed_chats()
+ if allowed and chat_id and chat_id not in allowed:
+ return False
if chat_id and chat_id in self._dingtalk_free_response_chats():
return True
if not self._dingtalk_require_mention():
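The gating order for DingTalk group messages can be summarized in a standalone sketch (hypothetical function, flattening the adapter's helper calls into parameters): the `allowed_chats` whitelist is a hard gate evaluated before every other acceptance rule.

```python
from typing import Optional, Set

def should_respond(is_group: bool, chat_id: Optional[str],
                   allowed_chats: Set[str], free_response_chats: Set[str],
                   require_mention: bool, mentioned: bool) -> bool:
    """DMs pass unconditionally; group chats must first clear the
    allowed_chats whitelist (when set), then any of the softer rules."""
    if not is_group:
        return True  # DMs are never filtered here
    if allowed_chats and chat_id and chat_id not in allowed_chats:
        return False  # hard gate: even an @mention elsewhere is ignored
    if chat_id and chat_id in free_response_chats:
        return True
    if not require_mention:
        return True
    return mentioned

print(should_respond(True, "cidB", {"cidA"}, set(), True, True))  # False — gated
print(should_respond(True, "cidA", {"cidA"}, set(), True, True))  # True
```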
diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py
index ecfa38c723..ae107cdfb2 100644
--- a/gateway/platforms/discord.py
+++ b/gateway/platforms/discord.py
@@ -10,6 +10,8 @@ Uses discord.py library for:
"""
import asyncio
+import hashlib
+import json
import logging
import os
import struct
@@ -24,6 +26,10 @@ logger = logging.getLogger(__name__)
VALID_THREAD_AUTO_ARCHIVE_MINUTES = {60, 1440, 4320, 10080}
_DISCORD_COMMAND_SYNC_POLICIES = {"safe", "bulk", "off"}
+_DISCORD_COMMAND_SYNC_STATE_SUBDIR = "gateway"
+_DISCORD_COMMAND_SYNC_STATE_FILENAME = "discord_command_sync_state.json"
+_DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS = 4.5
+_DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS = 30.0
try:
import discord
@@ -45,6 +51,7 @@ from gateway.config import Platform, PlatformConfig
import re
from gateway.platforms.helpers import MessageDeduplicator, ThreadParticipationTracker
+from utils import atomic_json_write
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
@@ -470,6 +477,34 @@ class VoiceReceiver:
pass
+def _read_dm_role_auth_guild() -> Optional[int]:
+ """Return the guild ID opted-in for DM role-based auth, or None.
+
+ Reads ``discord.dm_role_auth_guild`` from config.yaml. This is
+ deliberately a config.yaml-only setting (not an env var): per repo
+ policy, ``~/.hermes/.env`` is for secrets only, and this is a
+ behavioral setting. Guild IDs aren't secrets.
+
+ Accepts ints or numeric strings in the config. Anything else
+ (empty, malformed, None) returns None, which keeps the secure
+ default (DM role-auth disabled).
+ """
+ try:
+ from hermes_cli.config import read_raw_config
+ cfg = read_raw_config() or {}
+ discord_cfg = cfg.get("discord", {}) or {}
+ raw = discord_cfg.get("dm_role_auth_guild")
+ except Exception:
+ return None
+ if raw is None or raw == "":
+ return None
+ try:
+ guild_id = int(raw)
+ except (TypeError, ValueError):
+ return None
+ return guild_id if guild_id > 0 else None
+
+
class DiscordAdapter(BasePlatformAdapter):
"""
Discord bot adapter.
@@ -694,7 +729,17 @@ class DiscordAdapter(BasePlatformAdapter):
# human-user allowlist below (bots aren't in it).
else:
# Non-bot: enforce the configured user/role allowlists.
- if not self._is_allowed_user(str(message.author.id), message.author):
+ # Pass guild + is_dm so role checks are scoped to the
+ # originating guild (prevents cross-guild DM bypass, see
+ # _is_allowed_user docstring).
+ _msg_guild = getattr(message, "guild", None)
+ _is_dm = isinstance(message.channel, discord.DMChannel) or _msg_guild is None
+ if not self._is_allowed_user(
+ str(message.author.id),
+ message.author,
+ guild=_msg_guild,
+ is_dm=_is_dm,
+ ):
return
# Multi-agent filtering: if the message mentions specific bots
@@ -825,6 +870,167 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Disconnected", self.name)
+ def _command_sync_state_path(self) -> _Path:
+ from hermes_constants import get_hermes_home
+
+ directory = get_hermes_home() / _DISCORD_COMMAND_SYNC_STATE_SUBDIR
+ try:
+ directory.mkdir(parents=True, exist_ok=True)
+ except Exception:
+ pass
+ return directory / _DISCORD_COMMAND_SYNC_STATE_FILENAME
+
+ def _read_command_sync_state(self) -> dict:
+ try:
+ path = self._command_sync_state_path()
+ if not path.exists():
+ return {}
+ data = json.loads(path.read_text(encoding="utf-8"))
+ except Exception:
+ return {}
+ return data if isinstance(data, dict) else {}
+
+ def _write_command_sync_state(self, state: dict) -> None:
+ atomic_json_write(
+ self._command_sync_state_path(),
+ state,
+ indent=None,
+ separators=(",", ":"),
+ )
+
+ def _command_sync_state_key(self, app_id: Any) -> str:
+ return str(app_id or "unknown")
+
+ def _desired_command_sync_fingerprint(self) -> str:
+ tree = self._client.tree if self._client else None
+ desired = []
+ if tree is not None:
+ desired = [
+ self._canonicalize_app_command_payload(command.to_dict(tree))
+ for command in tree.get_commands()
+ ]
+ desired.sort(key=lambda item: (item.get("type", 1), item.get("name", "")))
+ payload = json.dumps(desired, sort_keys=True, separators=(",", ":"))
+ return hashlib.sha256(payload.encode("utf-8")).hexdigest()
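The fingerprint must be order-insensitive so that an unchanged command set never looks "changed": commands are sorted by (type, name) and serialized with sorted keys before hashing. A standalone sketch of that canonicalization (function name hypothetical):

```python
import hashlib
import json

def command_fingerprint(commands: list) -> str:
    """SHA-256 over a canonical serialization: sort commands by
    (type, name) and dump with sorted keys so neither list order nor
    dict key order produces a spurious fingerprint change."""
    desired = sorted(commands, key=lambda c: (c.get("type", 1), c.get("name", "")))
    payload = json.dumps(desired, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = command_fingerprint([{"name": "ask", "type": 1}, {"name": "help", "type": 1}])
b = command_fingerprint([{"type": 1, "name": "help"}, {"name": "ask", "type": 1}])
print(a == b)  # True — reordering and key order don't change the fingerprint
```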
+
+ def _command_sync_skip_reason(self, app_id: Any, fingerprint: str) -> Optional[str]:
+ entry = self._read_command_sync_state().get(self._command_sync_state_key(app_id))
+ if not isinstance(entry, dict):
+ return None
+ now = time.time()
+ retry_after_until = float(entry.get("retry_after_until") or 0)
+ if retry_after_until > now:
+ remaining = max(1, int(retry_after_until - now))
+ return f"Discord asked us to wait before syncing slash commands; retry in {remaining}s"
+ if entry.get("fingerprint") == fingerprint and entry.get("last_success_at"):
+ return "same slash-command fingerprint already synced"
+ return None
+
+    def _record_command_sync_attempt(self, app_id: Any, fingerprint: str) -> None:
+        state = self._read_command_sync_state()
+        key = self._command_sync_state_key(app_id)
+        existing = state.get(key)
+        entry = dict(existing) if isinstance(existing, dict) else {}
+        entry["fingerprint"] = fingerprint
+        entry["last_attempt_at"] = time.time()
+        state[key] = entry
+        self._write_command_sync_state(state)
+
+    def _record_command_sync_rate_limit(self, app_id: Any, fingerprint: str, retry_after: float) -> None:
+        retry_after = max(1.0, float(retry_after))
+        state = self._read_command_sync_state()
+        key = self._command_sync_state_key(app_id)
+        existing = state.get(key)
+        entry = dict(existing) if isinstance(existing, dict) else {}
+        entry["fingerprint"] = fingerprint
+        entry["last_attempt_at"] = time.time()
+        entry["retry_after_until"] = time.time() + retry_after
+        entry["retry_after"] = retry_after
+        state[key] = entry
+        self._write_command_sync_state(state)
+
+ def _record_command_sync_success(self, app_id: Any, fingerprint: str, summary: dict) -> None:
+ state = self._read_command_sync_state()
+ state[self._command_sync_state_key(app_id)] = {
+ "fingerprint": fingerprint,
+ "last_attempt_at": time.time(),
+ "last_success_at": time.time(),
+ "summary": summary,
+ }
+ self._write_command_sync_state(state)
+
+ @staticmethod
+ def _extract_discord_retry_after(exc: BaseException) -> Optional[float]:
+ value = getattr(exc, "retry_after", None)
+ if value is not None:
+ try:
+ return max(1.0, float(value))
+ except (TypeError, ValueError):
+ return None
+ response = getattr(exc, "response", None)
+ headers = getattr(response, "headers", None)
+ if headers:
+ for key in ("Retry-After", "X-RateLimit-Reset-After"):
+ try:
+ raw = headers.get(key)
+ except Exception:
+ raw = None
+ if raw is None:
+ continue
+ try:
+ return max(1.0, float(raw))
+ except (TypeError, ValueError):
+ continue
+ return None
+
+ @staticmethod
+ def _is_discord_rate_limit(exc: BaseException) -> bool:
+ """True only for exceptions that look like Discord 429 rate limits.
+
+ Narrower than ``hasattr(exc, 'retry_after')``: discord.py's own
+ ``RateLimited`` exception and any HTTPException with status 429
+ qualify. This prevents suppressing unrelated failures that happen
+ to expose a ``retry_after`` attribute."""
+ # discord.py emits RateLimited / HTTPException subclasses for 429s.
+ # Guard with isinstance-of-class so a mocked ``discord`` module
+ # (where attrs are MagicMocks, not types) doesn't trip isinstance.
+ if DISCORD_AVAILABLE and discord is not None:
+ for attr_name in ("RateLimited", "HTTPException"):
+ cls = getattr(discord, attr_name, None)
+ if not isinstance(cls, type):
+ continue
+ if isinstance(exc, cls):
+ if attr_name == "RateLimited":
+ return True
+ status = getattr(exc, "status", None)
+ if status == 429:
+ return True
+ # Fallback duck-type: something named like a rate-limit with a
+ # numeric retry_after. Covers mocked clients in tests and exotic
+ # transports, without swallowing arbitrary exceptions.
+ name = type(exc).__name__.lower()
+ if ("ratelimit" in name or "rate_limit" in name) and getattr(exc, "retry_after", None) is not None:
+ return True
+ response = getattr(exc, "response", None)
+ status = getattr(response, "status", None) or getattr(response, "status_code", None)
+ if status == 429:
+ return True
+ return False
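The duck-typed fallback branch of the check above (used when `discord` is mocked or unavailable) can be isolated into a sketch. This reimplements only the fallback, with hypothetical exception classes standing in for discord.py's:

```python
def looks_like_429(exc: BaseException) -> bool:
    """Fallback rate-limit detection: a rate-limit-ish class name plus a
    retry_after attribute, or an attached response carrying status 429.
    Deliberately narrow — arbitrary exceptions are never swallowed."""
    name = type(exc).__name__.lower()
    if ("ratelimit" in name or "rate_limit" in name) and getattr(exc, "retry_after", None) is not None:
        return True
    resp = getattr(exc, "response", None)
    status = getattr(resp, "status", None) or getattr(resp, "status_code", None)
    return status == 429

class RateLimited(Exception):          # stand-in for discord.RateLimited
    def __init__(self, retry_after: float) -> None:
        self.retry_after = retry_after

class HttpBoom(Exception):             # stand-in for a 429 HTTPException
    def __init__(self) -> None:
        self.response = type("R", (), {"status": 429})()

print(looks_like_429(RateLimited(4.5)))   # True — name + retry_after
print(looks_like_429(HttpBoom()))         # True — response.status == 429
print(looks_like_429(ValueError("x")))    # False — unrelated failure
```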
+
+ def _command_sync_mutation_interval_seconds(self) -> float:
+ return _DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS
+
+ async def _sleep_between_command_sync_mutations(self) -> None:
+ interval = self._command_sync_mutation_interval_seconds()
+ if interval > 0:
+ await asyncio.sleep(interval)
+
async def _run_post_connect_initialization(self) -> None:
"""Finish non-critical startup work after Discord is connected."""
if not self._client:
@@ -840,14 +1046,46 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
return
- # Discord's per-app command-management bucket is ~5 writes / 20 s,
- # so a mass-prune-plus-upsert reconcile (e.g. 77 orphans + 30
- # desired = 107 writes) takes several minutes of forced waits.
- # A flat 30 s budget blew up reliably under bucket pressure and
- # left slash commands broken for ~60 min until the bucket fully
- # recovered. Use a wide ceiling; the cap still guards against a
- # true hang. (#16713)
- summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
+ app_id = getattr(self._client, "application_id", None) or getattr(getattr(self._client, "user", None), "id", None)
+ fingerprint = self._desired_command_sync_fingerprint()
+ skip_reason = self._command_sync_skip_reason(app_id, fingerprint)
+ if skip_reason:
+ logger.info("[%s] Skipping Discord slash command sync: %s", self.name, skip_reason)
+ return
+ self._record_command_sync_attempt(app_id, fingerprint)
+
+ http = getattr(self._client, "http", None)
+ has_ratelimit_timeout = http is not None and hasattr(http, "max_ratelimit_timeout")
+ previous_ratelimit_timeout = getattr(http, "max_ratelimit_timeout", None) if has_ratelimit_timeout else None
+ if has_ratelimit_timeout:
+ http.max_ratelimit_timeout = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
+
+ try:
+ # Discord's per-app command-management bucket is small, and
+ # discord.py can otherwise sit inside one long retry sleep
+ # before surfacing the 429. Keep the whole sync bounded and
+ # persist Discord's retry-after when it refuses the batch.
+ summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
+ except Exception as e:
+ if not self._is_discord_rate_limit(e):
+ raise
+ retry_after = self._extract_discord_retry_after(e)
+ if retry_after is None:
+ # Rate-limited but no retry-after signal — back off for a
+ # conservative default so we don't slam the bucket again.
+ retry_after = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
+ self._record_command_sync_rate_limit(app_id, fingerprint, retry_after)
+ logger.warning(
+ "[%s] Discord rate-limited slash command sync; retrying after %.0fs",
+ self.name,
+ retry_after,
+ )
+ return
+ finally:
+ if has_ratelimit_timeout:
+ http.max_ratelimit_timeout = previous_ratelimit_timeout
+
+ self._record_command_sync_success(app_id, fingerprint, summary)
logger.info(
"[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
self.name,
@@ -1009,11 +1247,20 @@ class DiscordAdapter(BasePlatformAdapter):
created = 0
deleted = 0
http = self._client.http
+ mutation_count = 0
+
+ async def mutate(call, *args):
+ nonlocal mutation_count
+ if mutation_count:
+ await self._sleep_between_command_sync_mutations()
+ result = await call(*args)
+ mutation_count += 1
+ return result
for key, desired in desired_by_key.items():
current = existing_by_key.pop(key, None)
if current is None:
- await http.upsert_global_command(app_id, desired)
+ await mutate(http.upsert_global_command, app_id, desired)
created += 1
continue
@@ -1025,16 +1272,16 @@ class DiscordAdapter(BasePlatformAdapter):
continue
if self._patchable_app_command_payload(current_existing_payload) == self._patchable_app_command_payload(desired):
- await http.delete_global_command(app_id, current.id)
- await http.upsert_global_command(app_id, desired)
+ await mutate(http.delete_global_command, app_id, current.id)
+ await mutate(http.upsert_global_command, app_id, desired)
recreated += 1
continue
- await http.edit_global_command(app_id, current.id, desired)
+ await mutate(http.edit_global_command, app_id, current.id, desired)
updated += 1
for current in existing_by_key.values():
- await http.delete_global_command(app_id, current.id)
+ await mutate(http.delete_global_command, app_id, current.id)
deleted += 1
return {
@@ -1854,8 +2101,16 @@ class DiscordAdapter(BasePlatformAdapter):
pass
completed = receiver.check_silence()
+ # Voice inputs always originate from a specific guild
+ # (guild_id is in scope). Pass it so role checks are
+ # guild-scoped and not cross-guild.
+ _vc_guild = self._client.get_guild(guild_id) if self._client is not None else None
for user_id, pcm_data in completed:
- if not self._is_allowed_user(str(user_id)):
+ if not self._is_allowed_user(
+ str(user_id),
+ guild=_vc_guild,
+ is_dm=False,
+ ):
continue
await self._process_voice_input(guild_id, user_id, pcm_data)
except asyncio.CancelledError:
@@ -1898,13 +2153,32 @@ class DiscordAdapter(BasePlatformAdapter):
except OSError:
pass
- def _is_allowed_user(self, user_id: str, author=None) -> bool:
+ def _is_allowed_user(
+ self,
+ user_id: str,
+ author=None,
+ *,
+ guild=None,
+ is_dm: bool = False,
+ ) -> bool:
"""Check if user is allowed via DISCORD_ALLOWED_USERS or DISCORD_ALLOWED_ROLES.
Uses OR semantics: if the user matches EITHER allowlist, they're allowed.
If both allowlists are empty, everyone is allowed (backwards compatible).
- When author is a Member, checks .roles directly; otherwise falls back
- to scanning the bot's mutual guilds for a Member record.
+
+ Role checks are **scoped to the guild the message originated from**.
+ For DMs (no guild context), role-based auth is disabled by default and
+ only user-ID allowlist applies. Set ``discord.dm_role_auth_guild``
+ in config.yaml to a specific guild ID to opt-in: role membership in
+ that one guild will authorize DMs. This prevents cross-guild
+ privilege escalation where a user with the configured role in any
+ shared public server could DM the bot and pass the allowlist.
+
+ Args:
+ user_id: Author ID as a string.
+ author: Optional Member/User object for in-guild role lookup.
+ guild: The guild the message arrived in (None for DMs).
+ is_dm: True if the message came from a DM channel.
"""
# ``getattr`` fallbacks here guard against test fixtures that build
# an adapter via ``object.__new__(DiscordAdapter)`` and skip __init__
@@ -1915,31 +2189,54 @@ class DiscordAdapter(BasePlatformAdapter):
has_roles = bool(allowed_roles)
if not has_users and not has_roles:
return True
- # Check user ID allowlist
+ # Check user ID allowlist (works for both DMs and guild messages)
if has_users and user_id in allowed_users:
return True
- # Check role allowlist
- if has_roles:
- # Try direct role check from Member object
- direct_roles = getattr(author, "roles", None) if author is not None else None
- if direct_roles:
- if any(getattr(r, "id", None) in allowed_roles for r in direct_roles):
- return True
- # Fallback: scan mutual guilds for member's roles
- if self._client is not None:
- try:
- uid_int = int(user_id)
- except (TypeError, ValueError):
- uid_int = None
- if uid_int is not None:
- for guild in self._client.guilds:
- m = guild.get_member(uid_int)
- if m is None:
- continue
- m_roles = getattr(m, "roles", None) or []
- if any(getattr(r, "id", None) in allowed_roles for r in m_roles):
- return True
- return False
+ # Role allowlist is only consulted when configured.
+ if not has_roles:
+ return False
+
+ # DM path: roles require explicit opt-in via
+ # ``discord.dm_role_auth_guild`` in config.yaml. Without this, a
+ # user with the configured role in ANY mutual guild could DM the
+ # bot and bypass the allowlist (cross-guild leakage).
+ if is_dm or guild is None:
+ dm_guild_id = _read_dm_role_auth_guild()
+ if dm_guild_id is None:
+ return False
+ if self._client is None:
+ return False
+ dm_guild = self._client.get_guild(dm_guild_id)
+ if dm_guild is None:
+ return False
+ try:
+ uid_int = int(user_id)
+ except (TypeError, ValueError):
+ return False
+ m = dm_guild.get_member(uid_int)
+ if m is None:
+ return False
+ m_roles = getattr(m, "roles", None) or []
+ return any(getattr(r, "id", None) in allowed_roles for r in m_roles)
+
+ # Guild path: role check is scoped to THIS guild only.
+ # 1) Prefer the direct Member object passed in (correct guild by construction).
+ direct_roles = getattr(author, "roles", None) if author is not None else None
+ author_guild = getattr(author, "guild", None)
+ if direct_roles and (author_guild is None or author_guild.id == guild.id):
+ if any(getattr(r, "id", None) in allowed_roles for r in direct_roles):
+ return True
+ # 2) Fallback: resolve the Member in the message's guild only — NEVER
+ # scan other mutual guilds (that is the cross-guild bypass bug).
+ try:
+ uid_int = int(user_id)
+ except (TypeError, ValueError):
+ return False
+ m = guild.get_member(uid_int)
+ if m is None:
+ return False
+ m_roles = getattr(m, "roles", None) or []
+ return any(getattr(r, "id", None) in allowed_roles for r in m_roles)
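The scoping rule above can be sketched in isolation (names and the `Guild` shape are illustrative, not the adapter's API): the role allowlist is only ever consulted against the guild the message arrived in, and DMs authorize only through an explicitly opted-in guild.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Guild:
    """Minimal stand-in for a Discord guild: user_id -> role-id set."""
    id: int
    members: Dict[int, Set[int]] = field(default_factory=dict)

    def member_roles(self, user_id: int) -> Optional[Set[int]]:
        return self.members.get(user_id)


def is_allowed(
    user_id: int,
    guild: Optional[Guild],
    allowed_roles: Set[int],
    dm_role_auth_guild: Optional[Guild] = None,
) -> bool:
    """Role check scoped to the originating guild; DMs need opt-in."""
    if guild is None:
        # DM path: only the configured opt-in guild may authorize.
        guild = dm_role_auth_guild
        if guild is None:
            return False
    roles = guild.member_roles(user_id)
    if roles is None:
        # Not a member of the scoped guild — never fall back to
        # scanning other mutual guilds.
        return False
    return bool(roles & allowed_roles)
```

Note that membership in *other* guilds never enters the decision, which is exactly the cross-guild bypass the patch closes.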
# ── Slash command authorization ─────────────────────────────────────
# Slash commands (``_run_simple_slash`` and ``_handle_thread_create_slash``)
@@ -2036,7 +2333,16 @@ class DiscordAdapter(BasePlatformAdapter):
return (True, None)
user_id = str(user.id)
- if not self._is_allowed_user(user_id, author=user):
+ # Pass guild + is_dm so role check is scoped to the originating
+ # guild and cross-guild DM bypass (#12136) can't land via the
+ # slash surface either.
+ interaction_guild = getattr(interaction, "guild", None)
+ if not self._is_allowed_user(
+ user_id,
+ author=user,
+ guild=interaction_guild,
+ is_dm=in_dm,
+ ):
return (
False,
"user not in DISCORD_ALLOWED_USERS / DISCORD_ALLOWED_ROLES",
@@ -2654,9 +2960,14 @@ class DiscordAdapter(BasePlatformAdapter):
await self._run_simple_slash(interaction, "/reload-skills")
@tree.command(name="voice", description="Toggle voice reply mode")
- @discord.app_commands.describe(mode="Voice mode: on, off, tts, channel, leave, or status")
+ @discord.app_commands.describe(mode="Voice mode: join, channel, leave, on, tts, off, or status")
@discord.app_commands.choices(mode=[
- discord.app_commands.Choice(name="channel — join your voice channel", value="channel"),
+ # `join` and `channel` both route to _handle_voice_channel_join in
+ # gateway/run.py — expose both in the slash UI so autocomplete
+ # matches what the docs advertise and what the runner accepts when
+ # the command is typed as plain text.
+ discord.app_commands.Choice(name="join — join your voice channel", value="join"),
+ discord.app_commands.Choice(name="channel — join your voice channel (alias)", value="channel"),
discord.app_commands.Choice(name="leave — leave voice channel", value="leave"),
discord.app_commands.Choice(name="on — voice reply to voice messages", value="on"),
discord.app_commands.Choice(name="tts — voice reply to all messages", value="tts"),
diff --git a/gateway/platforms/email.py b/gateway/platforms/email.py
index a343692636..7717494de5 100644
--- a/gateway/platforms/email.py
+++ b/gateway/platforms/email.py
@@ -416,6 +416,18 @@ class EmailAdapter(BasePlatformAdapter):
logger.debug("[Email] Dropping automated sender at dispatch: %s", sender_addr)
return
+ # Skip senders not in EMAIL_ALLOWED_USERS — prevents the adapter
+ # from creating a MessageEvent (and thus thread context) for senders
+ # that the gateway will never authorize. Without this early guard,
+ # a race between dispatch and authorization can result in the adapter
+ # sending a reply even though the handler returned None.
+ allowed_raw = os.getenv("EMAIL_ALLOWED_USERS", "").strip()
+ if allowed_raw:
+ allowed = {addr.strip().lower() for addr in allowed_raw.split(",") if addr.strip()}
+ if sender_addr.lower() not in allowed:
+ logger.debug("[Email] Dropping non-allowlisted sender at dispatch: %s", sender_addr)
+ return
+
subject = msg_data["subject"]
body = msg_data["body"].strip()
attachments = msg_data["attachments"]
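The early guard above reduces to a small pure function (the helper name is illustrative): an empty `EMAIL_ALLOWED_USERS` means "allow everyone", otherwise addresses are compared case-insensitively after trimming.

```python
def is_allowlisted(sender_addr: str, allowed_raw: str) -> bool:
    """Mirror of the dispatch-time guard: empty allowlist allows all
    senders; otherwise compare lowercased, stripped addresses."""
    allowed_raw = allowed_raw.strip()
    if not allowed_raw:
        return True
    allowed = {
        addr.strip().lower()
        for addr in allowed_raw.split(",")
        if addr.strip()
    }
    return sender_addr.lower() in allowed
```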
diff --git a/gateway/platforms/feishu.py b/gateway/platforms/feishu.py
index ac920bab69..cd9504e1da 100644
--- a/gateway/platforms/feishu.py
+++ b/gateway/platforms/feishu.py
@@ -153,6 +153,9 @@ _MARKDOWN_HINT_RE = re.compile(
r"(^#{1,6}\s)|(^\s*[-*]\s)|(^\s*\d+\.\s)|(^\s*---+\s*$)|(```)|(`[^`\n]+`)|(\*\*[^*\n].+?\*\*)|(~~[^~\n].+?~~)|(.+?)|(\*[^*\n]+\*)|(\[[^\]]+\]\([^)]+\))|(^>\s)",
re.MULTILINE,
)
+# Detect markdown tables: a line starting with | followed by a separator line.
+# Feishu post-type 'md' elements do not render tables, so we force text mode.
+_MARKDOWN_TABLE_RE = re.compile(r"^\|.*\|\n\|[-|: ]+\|", re.MULTILINE)
_MARKDOWN_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
_MARKDOWN_FENCE_OPEN_RE = re.compile(r"^```([^\n`]*)\s*$")
_MARKDOWN_FENCE_CLOSE_RE = re.compile(r"^```\s*$")
@@ -3862,47 +3865,50 @@ class FeishuAdapter(BasePlatformAdapter):
and self-sent bot event filtering.
Populates ``_bot_open_id`` and ``_bot_name`` from /open-apis/bot/v3/info
- (no extra scopes required beyond the tenant access token). Falls back to
- the application info endpoint for ``_bot_name`` only when the first probe
- doesn't return it. Each field is hydrated independently — a value already
- supplied via env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID /
- FEISHU_BOT_NAME) is preserved and skips its probe.
+ (no extra scopes required beyond the tenant access token). The probe
+ always runs when a client is available so stale env vars from app/bot
+ migrations do not break group @mention gating. Falls back to the
+ application info endpoint for ``_bot_name`` only when the first probe
+ doesn't return it. If the probe fails, env-provided values are preserved.
"""
if not self._client:
return
- if self._bot_open_id and self._bot_name:
- # Everything the self-send filter and precise mention gate need is
- # already in place; nothing to probe.
- return
# Primary probe: /open-apis/bot/v3/info — returns bot_name + open_id, no
# extra scopes required. This is the same endpoint the onboarding wizard
# uses via probe_bot().
- if not self._bot_open_id or not self._bot_name:
- try:
- req = (
- BaseRequest.builder()
- .http_method(HttpMethod.GET)
- .uri("/open-apis/bot/v3/info")
- .token_types({AccessTokenType.TENANT})
- .build()
- )
- resp = await asyncio.to_thread(self._client.request, req)
- content = getattr(getattr(resp, "raw", None), "content", None)
- if content:
- payload = json.loads(content)
- parsed = _parse_bot_response(payload) or {}
- open_id = (parsed.get("bot_open_id") or "").strip()
- bot_name = (parsed.get("bot_name") or "").strip()
- if open_id and not self._bot_open_id:
- self._bot_open_id = open_id
- if bot_name and not self._bot_name:
- self._bot_name = bot_name
- except Exception:
- logger.debug(
- "[Feishu] /bot/v3/info probe failed during hydration",
- exc_info=True,
- )
+ try:
+ req = (
+ BaseRequest.builder()
+ .http_method(HttpMethod.GET)
+ .uri("/open-apis/bot/v3/info")
+ .token_types({AccessTokenType.TENANT})
+ .build()
+ )
+ resp = await asyncio.to_thread(self._client.request, req)
+ content = getattr(getattr(resp, "raw", None), "content", None)
+ if content:
+ payload = json.loads(content)
+ parsed = _parse_bot_response(payload) or {}
+ open_id = (parsed.get("bot_open_id") or "").strip()
+ bot_name = (parsed.get("bot_name") or "").strip()
+ if open_id:
+ if self._bot_open_id and self._bot_open_id != open_id:
+ logger.warning(
+ "[Feishu] FEISHU_BOT_OPEN_ID is stale; using /bot/v3/info open_id for group @mention gating."
+ )
+ self._bot_open_id = open_id
+ if bot_name:
+ if self._bot_name and self._bot_name != bot_name:
+ logger.info(
+ "[Feishu] FEISHU_BOT_NAME differs from /bot/v3/info; using hydrated bot name for group @mention gating."
+ )
+ self._bot_name = bot_name
+ except Exception:
+ logger.debug(
+ "[Feishu] /bot/v3/info probe failed during hydration",
+ exc_info=True,
+ )
# Fallback probe for _bot_name only: application info endpoint. Needs
# admin:app.info:readonly or application:application:self_manage scope,
@@ -3947,7 +3953,14 @@ class FeishuAdapter(BasePlatformAdapter):
if isinstance(seen_data, list):
entries: Dict[str, float] = {str(item).strip(): 0.0 for item in seen_data if str(item).strip()}
elif isinstance(seen_data, dict):
- entries = {k: float(v) for k, v in seen_data.items() if isinstance(k, str) and k.strip()}
+ entries = {}
+ for key, value in seen_data.items():
+ if not isinstance(key, str) or not key.strip():
+ continue
+ try:
+ entries[key] = float(value)
+ except (TypeError, ValueError):
+ continue
else:
return
# Filter out TTL-expired entries (entries saved with ts=0.0 are treated as immortal
@@ -3992,6 +4005,12 @@ class FeishuAdapter(BasePlatformAdapter):
# =========================================================================
def _build_outbound_payload(self, content: str) -> tuple[str, str]:
+ # Feishu post-type 'md' elements do not render markdown tables; sending
+ # table content as post causes the message to appear blank on the client.
+ # Force plain text for anything that looks like a markdown table.
+ if _MARKDOWN_TABLE_RE.search(content):
+ text_payload = {"text": content}
+ return "text", json.dumps(text_payload, ensure_ascii=False)
if _MARKDOWN_HINT_RE.search(content):
return "post", _build_markdown_post_payload(content)
text_payload = {"text": content}
@@ -4070,15 +4089,18 @@ class FeishuAdapter(BasePlatformAdapter):
reply_to: Optional[str],
metadata: Optional[Dict[str, Any]],
) -> Any:
+ effective_reply_to = reply_to
+ if not effective_reply_to and metadata and metadata.get("thread_id"):
+ effective_reply_to = metadata.get("reply_to_message_id")
reply_in_thread = bool((metadata or {}).get("thread_id"))
- if reply_to:
+ if effective_reply_to:
body = self._build_reply_message_body(
content=payload,
msg_type=msg_type,
reply_in_thread=reply_in_thread,
uuid_value=str(uuid.uuid4()),
)
- request = self._build_reply_message_request(reply_to, body)
+ request = self._build_reply_message_request(effective_reply_to, body)
return await asyncio.to_thread(self._client.im.v1.message.reply, request)
body = self._build_create_message_body(
@@ -4087,7 +4109,15 @@ class FeishuAdapter(BasePlatformAdapter):
content=payload,
uuid_value=str(uuid.uuid4()),
)
- request = self._build_create_message_request("chat_id", body)
+ # Detect whether chat_id is a user open_id (DM) or a chat_id (group).
+ # Feishu API expects receive_id_type="open_id" for user DMs (ou_ prefix)
+ # and receive_id_type="chat_id" for group chats (oc_ prefix, which IS
+ # the chat_id format — see https://open.feishu.cn/document/).
+ if chat_id.startswith("ou_"):
+ receive_id_type = "open_id"
+ else:
+ receive_id_type = "chat_id"
+ request = self._build_create_message_request(receive_id_type, body)
return await asyncio.to_thread(self._client.im.v1.message.create, request)
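The prefix routing added here is small enough to factor out (the helper name is hypothetical): Feishu user open_ids carry an `ou_` prefix and require `receive_id_type="open_id"`, while group chat_ids carry `oc_` and use `"chat_id"`.

```python
def receive_id_type_for(chat_id: str) -> str:
    """Pick the Feishu receive_id_type from the id's prefix:
    "ou_" marks a user open_id (DM), anything else is treated as a
    group chat_id ("oc_" prefix)."""
    return "open_id" if chat_id.startswith("ou_") else "chat_id"
```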
@staticmethod
@@ -4561,12 +4591,12 @@ def _poll_registration(
Returns dict with app_id, app_secret, domain, open_id on success.
Returns None on failure.
"""
- deadline = time.time() + expire_in
+ deadline = time.monotonic() + expire_in
current_domain = domain
domain_switched = False
poll_count = 0
- while time.time() < deadline:
+ while time.monotonic() < deadline:
base_url = _accounts_base_url(current_domain)
try:
res = _post_registration(base_url, {
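The switch from `time.time()` to `time.monotonic()` matters because the polling deadline must not stretch or shrink when the wall clock is adjusted (NTP step, DST, manual change). A generic sketch of the monotonic-deadline loop (the function shape is illustrative, not `_poll_registration` itself):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(probe: Callable[[], Optional[T]],
               timeout_s: float,
               interval_s: float = 0.0) -> Optional[T]:
    """Poll *probe* until it returns a value or *timeout_s* elapses.
    time.monotonic() never jumps backwards or forwards with the wall
    clock, so the window is exactly timeout_s of real elapsed time."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = probe()
        if result is not None:
            return result
        if interval_s:
            time.sleep(interval_s)
    return None
```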
diff --git a/gateway/platforms/helpers.py b/gateway/platforms/helpers.py
index 8db7af7eba..673beeac9b 100644
--- a/gateway/platforms/helpers.py
+++ b/gateway/platforms/helpers.py
@@ -222,33 +222,37 @@ class ThreadParticipationTracker:
def __init__(self, platform_name: str, max_tracked: int = 500):
self._platform = platform_name
self._max_tracked = max_tracked
- self._threads: set = self._load()
+ self._threads: dict[str, None] = {
+ str(thread_id): None for thread_id in self._load()
+ }
def _state_path(self) -> Path:
from hermes_constants import get_hermes_home
return get_hermes_home() / f"{self._platform}_threads.json"
- def _load(self) -> set:
+ def _load(self) -> list[str]:
path = self._state_path()
if path.exists():
try:
- return set(json.loads(path.read_text(encoding="utf-8")))
+ data = json.loads(path.read_text(encoding="utf-8"))
+ if isinstance(data, list):
+ return [str(thread_id) for thread_id in data]
except Exception:
pass
- return set()
+ return []
def _save(self) -> None:
path = self._state_path()
thread_list = list(self._threads)
if len(thread_list) > self._max_tracked:
thread_list = thread_list[-self._max_tracked:]
- self._threads = set(thread_list)
+ self._threads = {thread_id: None for thread_id in thread_list}
atomic_json_write(path, thread_list, indent=None)
def mark(self, thread_id: str) -> None:
"""Mark *thread_id* as participated and persist."""
if thread_id not in self._threads:
- self._threads.add(thread_id)
+ self._threads[thread_id] = None
self._save()
def __contains__(self, thread_id: str) -> bool:
diff --git a/gateway/platforms/matrix.py b/gateway/platforms/matrix.py
index e3bcd24c5e..12e840b69c 100644
--- a/gateway/platforms/matrix.py
+++ b/gateway/platforms/matrix.py
@@ -17,7 +17,8 @@ Environment variables:
MATRIX_REACTIONS Set "false" to disable processing lifecycle reactions
(eyes/checkmark/cross). Default: true
MATRIX_REQUIRE_MENTION Require @mention in rooms (default: true)
- MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement
+ MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement (alias of matrix.free_response_rooms)
+ MATRIX_ALLOWED_ROOMS Comma-separated room IDs; if set, bot ONLY responds in these rooms (whitelist, DMs exempt; alias of matrix.allowed_rooms)
MATRIX_AUTO_THREAD Auto-create threads for room messages (default: true)
MATRIX_DM_AUTO_THREAD Auto-create threads for DM messages (default: false)
MATRIX_RECOVERY_KEY Recovery key for cross-signing verification after device key rotation
@@ -343,10 +344,29 @@ class MatrixAdapter(BasePlatformAdapter):
self._require_mention: bool = os.getenv(
"MATRIX_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
- free_rooms_raw = os.getenv("MATRIX_FREE_RESPONSE_ROOMS", "")
- self._free_rooms: Set[str] = {
- r.strip() for r in free_rooms_raw.split(",") if r.strip()
- }
+ free_rooms_raw = config.extra.get("free_response_rooms")
+ if free_rooms_raw is None:
+ free_rooms_raw = os.getenv("MATRIX_FREE_RESPONSE_ROOMS", "")
+ if isinstance(free_rooms_raw, list):
+ self._free_rooms: Set[str] = {
+ str(r).strip() for r in free_rooms_raw if str(r).strip()
+ }
+ else:
+ self._free_rooms: Set[str] = {
+ r.strip() for r in str(free_rooms_raw).split(",") if r.strip()
+ }
+ # If non-empty, bot ONLY responds in these rooms (whitelist); DMs exempt.
+ allowed_rooms_raw = config.extra.get("allowed_rooms")
+ if allowed_rooms_raw is None:
+ allowed_rooms_raw = os.getenv("MATRIX_ALLOWED_ROOMS", "")
+ if isinstance(allowed_rooms_raw, list):
+ self._allowed_rooms: Set[str] = {
+ str(r).strip() for r in allowed_rooms_raw if str(r).strip()
+ }
+ else:
+ self._allowed_rooms: Set[str] = {
+ r.strip() for r in str(allowed_rooms_raw).split(",") if r.strip()
+ }
self._auto_thread: bool = os.getenv("MATRIX_AUTO_THREAD", "true").lower() in (
"true",
"1",
@@ -364,6 +384,12 @@ class MatrixAdapter(BasePlatformAdapter):
"MATRIX_REACTIONS", "true"
).lower() not in ("false", "0", "no")
self._pending_reactions: dict[tuple[str, str], str] = {}
+ # Delay before redacting reactions so Matrix homeservers have time to
+ # deliver the final message event without tripping "missing event"
+ # errors in some clients. 5s is empirically safe; not user-tunable —
+ # if that changes, add a config.yaml entry rather than an env var.
+ self._reaction_redaction_delay_seconds = 5.0
+ self._reaction_redaction_tasks: Set[asyncio.Task] = set()
# Proxy support — resolve once at init, reuse for all HTTP traffic.
self._proxy_url: str | None = resolve_proxy_url(platform_env_var="MATRIX_PROXY")
@@ -851,6 +877,14 @@ class MatrixAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
+ redaction_tasks = list(self._reaction_redaction_tasks)
+ for task in redaction_tasks:
+ if not task.done():
+ task.cancel()
+ if redaction_tasks:
+ await asyncio.gather(*redaction_tasks, return_exceptions=True)
+ self._reaction_redaction_tasks.clear()
+
# Close the SQLite crypto store database.
if hasattr(self, "_crypto_db") and self._crypto_db:
try:
@@ -1559,6 +1593,18 @@ class MatrixAdapter(BasePlatformAdapter):
# Require-mention gating.
if not is_dm:
+ # allowed_rooms check (whitelist — must pass before other gating).
+ # When set, messages from rooms NOT in this whitelist are silently
+ # ignored, even if @mentioned. DMs are already excluded above.
+ if self._allowed_rooms and room_id not in self._allowed_rooms:
+ logger.debug(
+ "Matrix: ignoring message %s in %s — room not in "
+ "MATRIX_ALLOWED_ROOMS whitelist",
+ event_id,
+ room_id,
+ )
+ return None
+
is_free_room = room_id in self._free_rooms
in_bot_thread = bool(thread_id and thread_id in self._threads)
if self._require_mention and not is_free_room and not in_bot_thread:
@@ -1929,6 +1975,35 @@ class MatrixAdapter(BasePlatformAdapter):
"""Remove a reaction by redacting its event."""
return await self.redact_message(room_id, reaction_event_id, reason)
+ def _schedule_reaction_redaction(
+ self,
+ room_id: str,
+ reaction_event_id: str,
+ reason: str = "",
+ ) -> None:
+ """Redact a reaction after a short delay so message delivery settles."""
+
+ async def _redact_later() -> None:
+ try:
+ if self._reaction_redaction_delay_seconds:
+ await asyncio.sleep(self._reaction_redaction_delay_seconds)
+ if not await self._redact_reaction(room_id, reaction_event_id, reason):
+ logger.debug(
+ "Matrix: failed to redact reaction %s", reaction_event_id
+ )
+ except asyncio.CancelledError:
+ raise
+ except Exception as exc:
+ logger.debug(
+ "Matrix: delayed reaction redaction failed for %s: %s",
+ reaction_event_id,
+ exc,
+ )
+
+ task = asyncio.create_task(_redact_later())
+ self._reaction_redaction_tasks.add(task)
+ task.add_done_callback(self._reaction_redaction_tasks.discard)
+
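The task bookkeeping in `_schedule_reaction_redaction` follows the standard asyncio pattern: hold a strong reference to each fire-and-forget task (the event loop keeps only weak references, so an unreferenced task can be garbage-collected mid-flight) and discard it on completion. A generic sketch of that pattern, with illustrative names:

```python
import asyncio


class TaskPool:
    """Track fire-and-forget tasks: keep strong refs while running,
    drop them when done, and cancel any stragglers on shutdown."""

    def __init__(self) -> None:
        self._tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # discard (not remove) tolerates a task already dropped.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> None:
        for task in list(self._tasks):
            if not task.done():
                task.cancel()
        # return_exceptions swallows CancelledError from each task.
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
```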
async def on_processing_start(self, event: MessageEvent) -> None:
"""Add eyes reaction when the agent starts processing a message."""
if not self._reactions_enabled:
@@ -1957,8 +2032,11 @@ class MatrixAdapter(BasePlatformAdapter):
reaction_key = (room_id, msg_id)
if reaction_key in self._pending_reactions:
eyes_event_id = self._pending_reactions.pop(reaction_key)
- if not await self._redact_reaction(room_id, eyes_event_id):
- logger.debug("Matrix: failed to redact eyes reaction %s", eyes_event_id)
+ self._schedule_reaction_redaction(
+ room_id,
+ eyes_event_id,
+ "processing complete",
+ )
await self._send_reaction(
room_id,
msg_id,
@@ -2037,11 +2115,8 @@ class MatrixAdapter(BasePlatformAdapter):
) -> None:
"""Redact the bot's seed ✅/❎ reactions, leaving only the user's reaction."""
for emoji, evt_id in prompt.bot_reaction_events.items():
- try:
- await self.redact_message(room_id, evt_id, "approval resolved")
- logger.debug("Matrix: redacted bot reaction %s (%s)", emoji, evt_id)
- except Exception as exc:
- logger.debug("Matrix: failed to redact bot reaction %s: %s", emoji, exc)
+ self._schedule_reaction_redaction(room_id, evt_id, "approval resolved")
+ logger.debug("Matrix: scheduled bot reaction redaction %s (%s)", emoji, evt_id)
# ------------------------------------------------------------------
# Text message aggregation (handles Matrix client-side splits)
diff --git a/gateway/platforms/mattermost.py b/gateway/platforms/mattermost.py
index ef3c134a03..3ffd74326d 100644
--- a/gateway/platforms/mattermost.py
+++ b/gateway/platforms/mattermost.py
@@ -706,10 +706,30 @@ class MattermostAdapter(BasePlatformAdapter):
message_text = post.get("message", "")
# Mention-gating for non-DM channels.
- # Config (env vars):
- # MATTERMOST_REQUIRE_MENTION: Require @mention in channels (default: true)
- # MATTERMOST_FREE_RESPONSE_CHANNELS: Channel IDs where bot responds without mention
+ # Config (config.yaml `mattermost.*` with env-var fallback):
+ # require_mention / MATTERMOST_REQUIRE_MENTION: Require @mention in channels (default: true)
+ # free_response_channels / MATTERMOST_FREE_RESPONSE_CHANNELS: Channel IDs where bot responds without mention
+ # allowed_channels / MATTERMOST_ALLOWED_CHANNELS: If set, bot ONLY responds in these channels (whitelist)
if channel_type_raw != "D":
+ # allowed_channels check (whitelist — must pass before other gating).
+ # When set, messages from channels NOT in this list are silently
+ # ignored, even if @mentioned. DMs are already excluded above.
+ allowed_raw = self.config.extra.get("allowed_channels") if self.config.extra else None
+ if allowed_raw is None:
+ allowed_raw = os.getenv("MATTERMOST_ALLOWED_CHANNELS", "")
+ if isinstance(allowed_raw, list):
+ allowed_channels = {str(c).strip() for c in allowed_raw if str(c).strip()}
+ else:
+ allowed_channels = {
+ c.strip() for c in str(allowed_raw).split(",") if c.strip()
+ }
+ if allowed_channels and channel_id not in allowed_channels:
+ logger.debug(
+ "Mattermost: ignoring message in non-allowed channel: %s",
+ channel_id,
+ )
+ return
+
require_mention = os.getenv(
"MATTERMOST_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
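The same list-or-CSV normalization now appears in the Matrix (`free_response_rooms`, `allowed_rooms`) and Mattermost (`allowed_channels`) hunks; it could be factored into a shared helper along these lines (hypothetical name, a sketch only):

```python
from typing import Optional, Set, Union


def normalize_id_set(raw: Union[None, str, list]) -> Set[str]:
    """Accept a YAML list (config.yaml) or a comma-separated string
    (env var) and return the set of stripped, non-empty ids."""
    if raw is None:
        return set()
    if isinstance(raw, (list, tuple, set)):
        return {str(v).strip() for v in raw if str(v).strip()}
    return {v.strip() for v in str(raw).split(",") if v.strip()}
```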
diff --git a/gateway/platforms/qqbot/__init__.py b/gateway/platforms/qqbot/__init__.py
index 130269b5f2..d755ec48df 100644
--- a/gateway/platforms/qqbot/__init__.py
+++ b/gateway/platforms/qqbot/__init__.py
@@ -34,6 +34,27 @@ from .crypto import decrypt_secret, generate_bind_key # noqa: F401
# -- Utils -----------------------------------------------------------------
from .utils import build_user_agent, get_api_headers, coerce_list # noqa: F401
+# -- Chunked upload --------------------------------------------------------
+from .chunked_upload import ( # noqa: F401
+ ChunkedUploader,
+ UploadDailyLimitExceededError,
+ UploadFileTooLargeError,
+)
+
+# -- Inline keyboards ------------------------------------------------------
+from .keyboards import ( # noqa: F401
+ ApprovalRequest,
+ ApprovalSender,
+ InlineKeyboard,
+ InteractionEvent,
+ build_approval_keyboard,
+ build_approval_text,
+ build_update_prompt_keyboard,
+ parse_approval_button_data,
+ parse_interaction_event,
+ parse_update_prompt_button_data,
+)
+
__all__ = [
# adapter
"QQAdapter",
@@ -52,4 +73,19 @@ __all__ = [
"build_user_agent",
"get_api_headers",
"coerce_list",
+ # chunked upload
+ "ChunkedUploader",
+ "UploadDailyLimitExceededError",
+ "UploadFileTooLargeError",
+ # keyboards
+ "ApprovalRequest",
+ "ApprovalSender",
+ "InlineKeyboard",
+ "InteractionEvent",
+ "build_approval_keyboard",
+ "build_approval_text",
+ "build_update_prompt_keyboard",
+ "parse_approval_button_data",
+ "parse_interaction_event",
+ "parse_update_prompt_button_data",
]
diff --git a/gateway/platforms/qqbot/adapter.py b/gateway/platforms/qqbot/adapter.py
index f8d7aed787..12caef0f14 100644
--- a/gateway/platforms/qqbot/adapter.py
+++ b/gateway/platforms/qqbot/adapter.py
@@ -41,7 +41,7 @@ import time
import uuid
from datetime import datetime, timezone
from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse
try:
@@ -119,6 +119,22 @@ from gateway.platforms.qqbot.utils import (
coerce_list as _coerce_list_impl,
build_user_agent,
)
+from gateway.platforms.qqbot.chunked_upload import (
+ ChunkedUploader,
+ UploadDailyLimitExceededError,
+ UploadFileTooLargeError,
+)
+from gateway.platforms.qqbot.keyboards import (
+ ApprovalRequest,
+ ApprovalSender,
+ InlineKeyboard,
+ InteractionEvent,
+ build_approval_keyboard,
+ build_update_prompt_keyboard,
+ parse_approval_button_data,
+ parse_interaction_event,
+ parse_update_prompt_button_data,
+)
def check_qq_requirements() -> bool:
@@ -208,6 +224,22 @@ class QQAdapter(BasePlatformAdapter):
# Upload cache: content_hash -> {file_info, file_uuid, expires_at}
self._upload_cache: Dict[str, Dict[str, Any]] = {}
+ # Inline-keyboard interaction routing. The callback (if set) is invoked
+ # for every INTERACTION_CREATE event after the adapter has already
+ # ACKed it. Callers (gateway wiring for approvals / update prompts)
+ # register via set_interaction_callback().
+ self._interaction_callback: Optional[
+ Callable[[InteractionEvent], Awaitable[None]]
+ ] = None
+
+ # Default interaction dispatcher: routes approval-button clicks to
+ # tools.approval.resolve_gateway_approval() and update-prompt clicks
+ # to ~/.hermes/.update_response. Set here so the cross-adapter gateway
+ # contract (send_exec_approval / send_update_prompt) works out of the
+ # box; callers can override with set_interaction_callback(None) or
+ # register a custom handler.
+ self._interaction_callback = self._default_interaction_dispatch
+
# ------------------------------------------------------------------
# Properties
# ------------------------------------------------------------------
@@ -759,6 +791,8 @@ class QQAdapter(BasePlatformAdapter):
"GUILD_AT_MESSAGE_CREATE",
):
asyncio.create_task(self._on_message(t, d))
+ elif t == "INTERACTION_CREATE":
+ self._create_task(self._on_interaction(d))
else:
logger.debug("[%s] Unhandled dispatch: %s", self._log_tag, t)
return
@@ -832,6 +866,206 @@ class QQAdapter(BasePlatformAdapter):
elif event_type == "DIRECT_MESSAGE_CREATE":
await self._handle_dm_message(d, msg_id, content, author, timestamp)
+ # ------------------------------------------------------------------
+ # Inline-keyboard interactions (INTERACTION_CREATE)
+ # ------------------------------------------------------------------
+
+ def set_interaction_callback(
+ self,
+ callback: Optional[Callable[[InteractionEvent], Awaitable[None]]],
+ ) -> None:
+ """Register (or clear) the interaction callback.
+
+ Invoked once per ``INTERACTION_CREATE`` event *after* the adapter has
+ ACKed the interaction. The callback is responsible for routing the
+ button click to the right subsystem (approval resolver, update-prompt
+ resolver, etc.) based on the ``button_data`` payload.
+ """
+ self._interaction_callback = callback
+
+ async def _on_interaction(self, d: Any) -> None:
+ """Handle an ``INTERACTION_CREATE`` event.
+
+ Responsibilities:
+
+ 1. Parse the raw payload into an :class:`InteractionEvent`.
+ 2. ACK the interaction (``PUT /interactions/{id}``) so the client
+ stops showing a loading indicator on the button.
+ 3. Dispatch to the registered interaction callback, if any.
+ """
+ if not isinstance(d, dict):
+ return
+ try:
+ event = parse_interaction_event(d)
+ except Exception as exc:
+ logger.warning(
+ "[%s] Failed to parse INTERACTION_CREATE: %s", self._log_tag, exc
+ )
+ return
+
+ if not event.id:
+ logger.warning(
+ "[%s] INTERACTION_CREATE missing id, skipping ACK", self._log_tag
+ )
+ return
+
+ # ACK the interaction promptly — per the QQ docs the client will show
+ # an error icon on the button if we don't respond quickly.
+ try:
+ await self._acknowledge_interaction(event.id)
+ except Exception as exc:
+ logger.warning(
+ "[%s] Failed to ACK interaction %s: %s",
+ self._log_tag, event.id, exc,
+ )
+
+ logger.info(
+ "[%s] Interaction: scene=%s button_data=%r operator=%s",
+ self._log_tag, event.scene, event.button_data, event.operator_openid,
+ )
+
+ callback = self._interaction_callback
+ if callback is None:
+ logger.debug(
+ "[%s] No interaction callback registered; dropping button "
+ "click %r",
+ self._log_tag, event.button_data,
+ )
+ return
+ try:
+ await callback(event)
+ except Exception as exc:
+ logger.error(
+ "[%s] Interaction callback raised: %s",
+ self._log_tag, exc, exc_info=True,
+ )
+
+ async def _acknowledge_interaction(
+ self,
+ interaction_id: str,
+ code: int = 0,
+ ) -> None:
+ """ACK a button interaction via ``PUT /interactions/{id}``.
+
+ :param interaction_id: The ``id`` field from the
+ ``INTERACTION_CREATE`` event.
+ :param code: Response code (``0`` = success).
+ """
+ if not self._http_client:
+ raise RuntimeError("HTTP client not initialized — not connected?")
+ token = await self._ensure_token()
+ headers = {
+ "Authorization": f"QQBot {token}",
+ "Content-Type": "application/json",
+ "User-Agent": build_user_agent(),
+ }
+ resp = await self._http_client.put(
+ f"{API_BASE}/interactions/{interaction_id}",
+ headers=headers,
+ json={"code": code},
+ timeout=DEFAULT_API_TIMEOUT,
+ )
+ if resp.status_code >= 400:
+ raise RuntimeError(
+ f"Interaction ACK failed [{resp.status_code}]: "
+ f"{resp.text[:200]}"
+ )
+
+ # Mapping from QQ keyboard button decisions → the ``choice`` vocabulary
+ # accepted by ``tools.approval.resolve_gateway_approval``. QQ's 3-button
+ # layout (mobile-space constraint) collapses "session" and "always" into
+ # a single "always" button; users wanting session-only approval can fall
+ # back to the ``/approve session`` text command.
+ _APPROVAL_BUTTON_TO_CHOICE = {
+ "allow-once": "once",
+ "allow-always": "always",
+ "deny": "deny",
+ }
+
+ async def _default_interaction_dispatch(
+ self,
+ event: InteractionEvent,
+ ) -> None:
+ """Route ``INTERACTION_CREATE`` button clicks to the right subsystem.
+
+ - ``approve:<session_key>:<decision>`` →
+ :func:`tools.approval.resolve_gateway_approval`
+ (unblocks the agent thread waiting on a dangerous-command approval).
+ - ``update_prompt:<answer>`` →
+ writes the answer to ``~/.hermes/.update_response`` for the
+ detached ``hermes update --gateway`` process to consume.
+ - Anything else is logged at DEBUG and ignored.
+
+ Installed as the adapter's default interaction callback in
+ ``__init__``. Callers can replace via
+ :meth:`set_interaction_callback` to route clicks elsewhere (or pass
+ ``None`` to drop them entirely).
+ """
+ button_data = event.button_data
+ if not button_data:
+ return
+
+ approval = parse_approval_button_data(button_data)
+ if approval is not None:
+ session_key, decision = approval
+ choice = self._APPROVAL_BUTTON_TO_CHOICE.get(decision)
+ if choice is None:
+ logger.warning(
+ "[%s] Unknown approval decision %r (session=%s)",
+ self._log_tag, decision, session_key,
+ )
+ return
+ try:
+ # Import lazily to keep the adapter importable in tests that
+ # don't exercise the approval subsystem.
+ from tools.approval import resolve_gateway_approval
+ count = resolve_gateway_approval(session_key, choice)
+ logger.info(
+ "[%s] Button resolved %d approval(s) for session %s "
+ "(choice=%s, operator=%s)",
+ self._log_tag, count, session_key, choice,
+ event.operator_openid,
+ )
+ except Exception as exc:
+ logger.error(
+ "[%s] resolve_gateway_approval failed for session %s: %s",
+ self._log_tag, session_key, exc,
+ )
+ return
+
+ update_answer = parse_update_prompt_button_data(button_data)
+ if update_answer is not None:
+ self._write_update_response(update_answer, event.operator_openid)
+ return
+
+ logger.debug(
+ "[%s] Unrecognised button_data %r from interaction %s",
+ self._log_tag, button_data, event.id,
+ )
+
+ @staticmethod
+ def _write_update_response(answer: str, operator: str = "") -> None:
+ """Atomically write the update-prompt answer to ``.update_response``.
+
+ Mirrors the Discord / Telegram / Feishu adapters: the detached
+ ``hermes update --gateway`` watcher polls this file for a ``y``/``n``
+ response to its interactive prompts (stash-restore, config migration).
+ Writes via ``tmp + rename`` so a partial write can't fool the reader.
+ """
+ try:
+ from hermes_constants import get_hermes_home
+ home = get_hermes_home()
+ response_path = home / ".update_response"
+ tmp = response_path.with_suffix(".tmp")
+ tmp.write_text(answer)
+ tmp.replace(response_path)
+ logger.info(
+ "QQ update prompt answered %r by %s",
+ answer, operator or "(unknown)",
+ )
+ except Exception as exc:
+ logger.error("Failed to write update response: %s", exc)
+
async def _handle_c2c_message(
self,
d: Dict[str, Any],
@@ -900,6 +1134,13 @@ class QQAdapter(BasePlatformAdapter):
len(voice_transcripts),
)
+ # Merge any quoted-message context (message_type=103 → msg_elements[0]).
+ quoted = await self._process_quoted_context(d)
+ text = self._merge_quote_into(text, quoted["quote_block"])
+ if quoted["image_urls"]:
+ image_urls = image_urls + quoted["image_urls"]
+ image_media_types = image_media_types + quoted["image_media_types"]
+
if not text.strip() and not image_urls:
return
@@ -958,6 +1199,13 @@ class QQAdapter(BasePlatformAdapter):
else attachment_info
)
+ # Merge any quoted-message context (message_type=103 → msg_elements[0]).
+ quoted = await self._process_quoted_context(d)
+ text = self._merge_quote_into(text, quoted["quote_block"])
+ if quoted["image_urls"]:
+ image_urls = image_urls + quoted["image_urls"]
+ image_media_types = image_media_types + quoted["image_media_types"]
+
if not text.strip() and not image_urls:
return
@@ -1025,6 +1273,13 @@ class QQAdapter(BasePlatformAdapter):
else attachment_info
)
+ # Merge any quoted-message context (message_type=103 → msg_elements[0]).
+ quoted = await self._process_quoted_context(d)
+ text = self._merge_quote_into(text, quoted["quote_block"])
+ if quoted["image_urls"]:
+ image_urls = image_urls + quoted["image_urls"]
+ image_media_types = image_media_types + quoted["image_media_types"]
+
if not text.strip() and not image_urls:
return
@@ -1089,6 +1344,13 @@ class QQAdapter(BasePlatformAdapter):
else attachment_info
)
+ # Merge any quoted-message context (message_type=103 → msg_elements[0]).
+ quoted = await self._process_quoted_context(d)
+ text = self._merge_quote_into(text, quoted["quote_block"])
+ if quoted["image_urls"]:
+ image_urls = image_urls + quoted["image_urls"]
+ image_media_types = image_media_types + quoted["image_media_types"]
+
if not text.strip() and not image_urls:
return
@@ -1109,6 +1371,113 @@ class QQAdapter(BasePlatformAdapter):
)
await self.handle_message(event)
+ # ------------------------------------------------------------------
+ # Quoted-message handling
+ # ------------------------------------------------------------------
+
+ async def _process_quoted_context(
+ self,
+ d: Dict[str, Any],
+ ) -> Dict[str, Any]:
+ """Process the quoted message a user is replying to.
+
+ When a user replies while quoting another message, the platform sets
+ ``message_type = 103`` and pushes the referenced message's content and
+ attachments inside ``msg_elements[0]``. The old adapter ignored
+ ``msg_elements`` entirely, so:
+
+ - Quoted text was surfaced only when the user typed something of
+ their own — bare quote-replies showed nothing.
+ - Quoted attachments (images, voice, files) were never downloaded
+ or described.
+ - Quoted voice messages specifically produced no transcript, so the
+ LLM had no way to see what the user was referring to.
+
+ This method parses ``msg_elements`` and runs the quoted attachments
+ through the same :meth:`_process_attachments` pipeline as the main
+ message body, so quoted voice messages get STT transcripts and
+ quoted images are cached identically.
+
+ :param d: Raw inbound message dict (from the WS dispatch payload).
+ :returns: Dict with keys:
+
+ - ``quote_block``: string to prepend to the user's text body
+ (empty when there's nothing quoted).
+ - ``image_urls``: list of cached quoted-image paths.
+ - ``image_media_types``: parallel list of image MIME types.
+ """
+ empty = {
+ "quote_block": "",
+ "image_urls": [],
+ "image_media_types": [],
+ }
+ # Short-circuit: only message_type 103 indicates a quote.
+ try:
+ if int(d.get("message_type", 0) or 0) != 103:
+ return empty
+ except (TypeError, ValueError):
+ return empty
+
+ elements = d.get("msg_elements")
+ if not isinstance(elements, list) or not elements:
+ return empty
+
+ # msg_elements[0] carries the referenced message. Additional elements
+ # (if any) are very rare in practice; we concatenate their text and
+ # union their attachments for completeness.
+ quoted_text_parts: List[str] = []
+ all_attachments: List[Dict[str, Any]] = []
+ for elem in elements:
+ if not isinstance(elem, dict):
+ continue
+ etext = str(elem.get("content", "")).strip()
+ if etext:
+ quoted_text_parts.append(etext)
+ eatts = elem.get("attachments")
+ if isinstance(eatts, list):
+ for a in eatts:
+ if isinstance(a, dict):
+ all_attachments.append(a)
+
+ att_result = await self._process_attachments(all_attachments)
+ quoted_voice = att_result.get("voice_transcripts") or []
+ quoted_info = att_result.get("attachment_info") or ""
+ quoted_images = att_result.get("image_urls") or []
+ quoted_image_types = att_result.get("image_media_types") or []
+
+ lines: List[str] = []
+ if quoted_text_parts:
+ lines.append(" ".join(quoted_text_parts))
+ for t in quoted_voice:
+ lines.append(t)
+ if quoted_info:
+ lines.append(quoted_info)
+
+ if not lines and not quoted_images:
+ return empty
+
+ if lines:
+ quote_block = "[Quoted message]:\n" + "\n".join(lines)
+ else:
+ # Images-only quote: give the LLM at least a marker so it knows
+ # context was referenced.
+ quote_block = "[Quoted message]: (image)"
+
+ return {
+ "quote_block": quote_block,
+ "image_urls": quoted_images,
+ "image_media_types": quoted_image_types,
+ }
+
+ @staticmethod
+ def _merge_quote_into(text: str, quote_block: str) -> str:
+ """Prepend ``quote_block`` to *text*, separated by a blank line."""
+ if not quote_block:
+ return text
+ if text.strip():
+ return f"{quote_block}\n\n{text}".strip()
+ return quote_block
+
# ------------------------------------------------------------------
# Attachment processing
# ------------------------------------------------------------------
@@ -1992,26 +2361,44 @@ class QQAdapter(BasePlatformAdapter):
return SendResult(success=False, error=error_msg, retryable=retryable)
async def _send_c2c_text(
- self, openid: str, content: str, reply_to: Optional[str] = None
+ self,
+ openid: str,
+ content: str,
+ reply_to: Optional[str] = None,
+ keyboard: Optional[InlineKeyboard] = None,
) -> SendResult:
- """Send text to a C2C user via REST API."""
+ """Send text to a C2C user via REST API.
+
+ :param keyboard: Optional inline keyboard attached to the message.
+ """
self._next_msg_seq(reply_to or openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
+ if keyboard is not None:
+ body["keyboard"] = keyboard.to_dict()
data = await self._api_request("POST", f"/v2/users/{openid}/messages", body)
msg_id = str(data.get("id", uuid.uuid4().hex[:12]))
return SendResult(success=True, message_id=msg_id, raw_response=data)
async def _send_group_text(
- self, group_openid: str, content: str, reply_to: Optional[str] = None
+ self,
+ group_openid: str,
+ content: str,
+ reply_to: Optional[str] = None,
+ keyboard: Optional[InlineKeyboard] = None,
) -> SendResult:
- """Send text to a group via REST API."""
+ """Send text to a group via REST API.
+
+ :param keyboard: Optional inline keyboard attached to the message.
+ """
self._next_msg_seq(reply_to or group_openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
+ if keyboard is not None:
+ body["keyboard"] = keyboard.to_dict()
data = await self._api_request(
"POST", f"/v2/groups/{group_openid}/messages", body
@@ -2031,6 +2418,156 @@ class QQAdapter(BasePlatformAdapter):
msg_id = str(data.get("id", uuid.uuid4().hex[:12]))
return SendResult(success=True, message_id=msg_id, raw_response=data)
+ # ------------------------------------------------------------------
+ # Inline-keyboard outbound helpers (approval / update-prompt flows)
+ # ------------------------------------------------------------------
+
+ async def send_with_keyboard(
+ self,
+ chat_id: str,
+ content: str,
+ keyboard: InlineKeyboard,
+ reply_to: Optional[str] = None,
+ ) -> SendResult:
+ """Send a single text message with an inline keyboard attached.
+
+ Unlike :meth:`send`, this does NOT split long content into chunks —
+ a keyboard message has exactly one interactive surface, and splitting
+ would orphan the buttons from the first chunk. Callers should keep
+ approval/update-prompt bodies short.
+
+ Guild (channel) chats don't support inline keyboards; returns a
+ non-retryable failure for those.
+ """
+ if not self.is_connected:
+ if not await self._wait_for_reconnection():
+ return SendResult(
+ success=False, error="Not connected", retryable=True
+ )
+
+ chat_type = self._guess_chat_type(chat_id)
+ formatted = self.format_message(content)
+ truncated = formatted[: self.MAX_MESSAGE_LENGTH]
+ try:
+ if chat_type == "c2c":
+ return await self._send_c2c_text(
+ chat_id, truncated, reply_to, keyboard=keyboard,
+ )
+ if chat_type == "group":
+ return await self._send_group_text(
+ chat_id, truncated, reply_to, keyboard=keyboard,
+ )
+ return SendResult(
+ success=False,
+ error=(
+ f"Inline keyboards not supported for chat_type "
+ f"{chat_type!r}"
+ ),
+ retryable=False,
+ )
+ except Exception as exc:
+ logger.error(
+ "[%s] send_with_keyboard failed: %s", self._log_tag, exc
+ )
+ return SendResult(success=False, error=str(exc))
+
+ async def send_approval_request(
+ self,
+ chat_id: str,
+ req: ApprovalRequest,
+ reply_to: Optional[str] = None,
+ ) -> SendResult:
+ """Send a 3-button approval request (``allow-once / allow-always / deny``).
+
+ The rendered text comes from :func:`build_approval_text`; callers can
+ override by passing a custom :class:`ApprovalRequest`.
+
+ Users click the button → ``INTERACTION_CREATE`` fires → the adapter's
+ registered :meth:`set_interaction_callback` handler decodes
+ ``button_data`` via :func:`parse_approval_button_data`.
+ """
+ from gateway.platforms.qqbot.keyboards import build_approval_text
+ return await self.send_with_keyboard(
+ chat_id,
+ build_approval_text(req),
+ build_approval_keyboard(req.session_key),
+ reply_to=reply_to,
+ )
+
+ # ------------------------------------------------------------------
+ # Cross-adapter gateway contract — send_exec_approval + send_update_prompt
+ # ------------------------------------------------------------------
+ #
+ # These mirror the signatures that gateway/run.py detects on the adapter
+ # class (e.g. type(adapter).send_exec_approval, type(adapter).send_update_prompt)
+ # for button-based approval / update-confirm UX. Discord, Telegram, Slack,
+ # Matrix, and Feishu already implement the same contract.
+
+ async def send_exec_approval(
+ self,
+ chat_id: str,
+ command: str,
+ session_key: str,
+ description: str = "dangerous command",
+ metadata: Optional[Dict[str, Any]] = None,
+ ) -> SendResult:
+ """Send a button-based exec-approval prompt for a dangerous command.
+
+ Called by ``gateway/run.py``'s ``_approval_notify_sync`` when the
+ agent is blocked waiting for approval. Button clicks resolve via
+ :func:`tools.approval.resolve_gateway_approval` — dispatched by the
+ adapter's interaction callback (:meth:`_default_interaction_dispatch`).
+ """
+ del metadata # QQ doesn't have thread_id / DM targeting overrides.
+
+ # Use the reply-to message for passive-message context when we have one.
+ # QQ requires a msg_id on outbound messages to a user we've never
+ # seen; the last inbound msg_id is the natural choice.
+ msg_id = self._last_msg_id.get(chat_id)
+
+ req = ApprovalRequest(
+ session_key=session_key,
+ title="Execute this command?",
+ description=description,
+ command_preview=command,
+ timeout_sec=self._APPROVAL_TIMEOUT_SECONDS,
+ )
+ return await self.send_approval_request(
+ chat_id, req, reply_to=msg_id,
+ )
+
+ _APPROVAL_TIMEOUT_SECONDS = 300 # matches gateway's default gateway_timeout
+
+ async def send_update_prompt(
+ self,
+ chat_id: str,
+ prompt: str,
+ default: str = "",
+ session_key: str = "",
+ metadata: Optional[Dict[str, Any]] = None,
+ ) -> SendResult:
+ """Send a Yes/No update-confirmation prompt with inline buttons.
+
+ Matches the cross-adapter contract used by
+ ``gateway/run.py``'s ``hermes update --gateway`` watcher. Button
+ clicks surface as ``INTERACTION_CREATE`` with
+ ``button_data = 'update_prompt:y'`` or ``'update_prompt:n'``;
+ the adapter's interaction callback writes the answer to
+ ``~/.hermes/.update_response`` so the detached update process
+ can read it.
+ """
+ del session_key, metadata # present for contract parity only.
+
+ default_hint = f" (default: {default})" if default else ""
+ content = f"⚕ **Update Needs Your Input**\n\n{prompt}{default_hint}"
+ msg_id = self._last_msg_id.get(chat_id)
+ return await self.send_with_keyboard(
+ chat_id,
+ content,
+ build_update_prompt_keyboard(),
+ reply_to=msg_id,
+ )
+
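The `'update_prompt:y'` / `'update_prompt:n'` wire format documented above implies a parser shaped roughly like the following. This is a hypothetical sketch for illustration only — the real `parse_update_prompt_button_data` ships alongside the keyboard builders and may differ:

```python
from typing import Optional

# Hypothetical reconstruction based on the documented wire format
# 'update_prompt:y' / 'update_prompt:n'; not the in-tree implementation.
_UPDATE_PREFIX = "update_prompt:"


def parse_update_prompt_button_data(button_data: str) -> Optional[str]:
    """Return 'y'/'n' for an update-prompt button click, else None."""
    if not button_data.startswith(_UPDATE_PREFIX):
        return None
    answer = button_data[len(_UPDATE_PREFIX):]
    return answer if answer in ("y", "n") else None
```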
def _build_text_body(
self, content: str, reply_to: Optional[str] = None
) -> Dict[str, Any]:
@@ -2160,42 +2697,62 @@ class QQAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
file_name: Optional[str] = None,
) -> SendResult:
- """Upload media and send as a native message."""
+ """Upload media and send as a native message.
+
+ Upload strategy:
+
+ - **HTTP(S) URLs** → single ``POST /v2/{users|groups}/{id}/files``
+ with ``url=...``. The QQ platform fetches the URL directly; fastest
+ path when the source is already hosted.
+ - **Local files** → three-step chunked upload (prepare / PUT parts /
+ complete). Handles files up to the platform's ~100 MB per-file
+ limit without the ~10 MB inline-base64 cap of the old adapter.
+ """
if not self.is_connected:
if not await self._wait_for_reconnection():
return SendResult(success=False, error="Not connected", retryable=True)
- try:
- # Resolve media source
- data, content_type, resolved_name = await self._load_media(
- media_source, file_name
+ chat_type = self._guess_chat_type(chat_id)
+ if chat_type == "guild":
+ # Guild channels don't support native media upload in the same way.
+ return SendResult(
+ success=False,
+ error="Guild media send not supported via this path",
)
- # Route
- chat_type = self._guess_chat_type(chat_id)
-
- if chat_type == "guild":
- # Guild channels don't support native media upload in the same way
- # Send as URL fallback
- return SendResult(
- success=False, error="Guild media send not supported via this path"
+ try:
+ if self._is_url(media_source):
+ # URL upload — let the platform fetch it directly.
+ resolved_name = (
+ file_name
+ or Path(urlparse(media_source).path).name
+ or "media"
+ )
+ upload = await self._upload_media(
+ chat_type,
+ chat_id,
+ file_type,
+ url=media_source,
+ srv_send_msg=False,
+ file_name=resolved_name if file_type == MEDIA_TYPE_FILE else None,
+ )
+ else:
+ # Local file — chunked upload (prepare / PUT parts / complete).
+ resolved_name, upload = await self._upload_local_file(
+ chat_type,
+ chat_id,
+ media_source,
+ file_type,
+ file_name,
)
- # Upload
- upload = await self._upload_media(
- chat_type,
- chat_id,
- file_type,
- file_data=data if not self._is_url(media_source) else None,
- url=media_source if self._is_url(media_source) else None,
- srv_send_msg=False,
- file_name=resolved_name if file_type == MEDIA_TYPE_FILE else None,
- )
-
- file_info = upload.get("file_info")
+ file_info = upload.get("file_info") or (
+ upload.get("data", {}) or {}
+ ).get("file_info")
if not file_info:
return SendResult(
- success=False, error=f"Upload returned no file_info: {upload}"
+ success=False,
+ error=f"Upload returned no file_info: {upload}",
)
# Send media message
@@ -2224,10 +2781,86 @@ class QQAdapter(BasePlatformAdapter):
message_id=str(send_data.get("id", uuid.uuid4().hex[:12])),
raw_response=send_data,
)
+ except UploadDailyLimitExceededError as exc:
+ # Non-retryable: daily quota hit. Give the caller actionable text
+ # so the model can compose a helpful reply.
+ logger.warning(
+ "[%s] Daily upload limit exceeded for %s (%s)",
+ self._log_tag, exc.file_name, exc.file_size_human,
+ )
+ return SendResult(
+ success=False,
+ error=(
+ f"QQ daily upload limit exceeded for {exc.file_name!r} "
+ f"({exc.file_size_human}). Retry tomorrow."
+ ),
+ retryable=False,
+ )
+ except UploadFileTooLargeError as exc:
+ logger.warning(
+ "[%s] File too large: %s (%s, platform limit %s)",
+ self._log_tag, exc.file_name, exc.file_size_human, exc.limit_human,
+ )
+ return SendResult(
+ success=False,
+ error=(
+ f"{exc.file_name!r} ({exc.file_size_human}) exceeds the "
+ f"QQ per-file upload limit ({exc.limit_human})."
+ ),
+ retryable=False,
+ )
except Exception as exc:
logger.error("[%s] Media send failed: %s", self._log_tag, exc)
return SendResult(success=False, error=str(exc))
+ async def _upload_local_file(
+ self,
+ chat_type: str,
+ chat_id: str,
+ media_source: str,
+ file_type: int,
+ file_name: Optional[str],
+ ) -> Tuple[str, Dict[str, Any]]:
+ """Chunked-upload a local file and return ``(resolved_name, complete_response)``.
+
+ The returned ``complete_response`` contains the ``file_info`` token
+ that goes into the subsequent RichMedia message body.
+
+ :raises UploadDailyLimitExceededError: On biz_code 40093002.
+ :raises UploadFileTooLargeError: When the file exceeds the platform limit.
+ :raises FileNotFoundError: If the path does not exist.
+ :raises ValueError: If the path looks like a placeholder (e.g. starts with ``<``).
+ :raises RuntimeError: If the HTTP client is not initialized.
+ """
+ if not self._http_client:
+ raise RuntimeError("HTTP client not initialized — not connected?")
+
+ local_path = Path(media_source).expanduser()
+ if not local_path.is_absolute():
+ local_path = (Path.cwd() / local_path).resolve()
+
+ if not local_path.exists() or not local_path.is_file():
+ if media_source.startswith("<") or len(media_source) < 3:
+ raise ValueError(
+ f"Invalid media source (looks like a placeholder): {media_source!r}"
+ )
+ raise FileNotFoundError(f"Media file not found: {local_path}")
+
+ resolved_name = file_name or local_path.name
+ uploader = ChunkedUploader(
+ api_request=self._api_request,
+ http_put=self._http_client.put,
+ log_tag=self._log_tag,
+ )
+ complete = await uploader.upload(
+ chat_type=chat_type,
+ target_id=chat_id,
+ file_path=str(local_path),
+ file_type=file_type,
+ file_name=resolved_name,
+ )
+ return resolved_name, complete
+
async def _load_media(
self, source: str, file_name: Optional[str] = None
) -> Tuple[str, str, str]:
diff --git a/gateway/platforms/qqbot/chunked_upload.py b/gateway/platforms/qqbot/chunked_upload.py
new file mode 100644
index 0000000000..d0a6e5d226
--- /dev/null
+++ b/gateway/platforms/qqbot/chunked_upload.py
@@ -0,0 +1,603 @@
+"""QQ Bot chunked upload flow.
+
+The QQ v2 API caps inline base64 uploads (``file_data`` / ``url``) at ~10 MB.
+For files between 10 MB and ~100 MB we have to use the three-step chunked
+upload flow::
+
+ 1. POST /v2/{users|groups}/{id}/upload_prepare
+ → returns upload_id, block_size, and an array of pre-signed COS part URLs.
+ 2. For each part:
+ PUT the part bytes to its pre-signed COS URL,
+ then POST /v2/{users|groups}/{id}/upload_part_finish to acknowledge.
+ 3. POST /v2/{users|groups}/{id}/files with {"upload_id": ...}
+ → returns the ``file_info`` token the caller uses in a RichMedia
+ message.
+
+Error-code semantics (from the QQ Bot v2 API spec):
+
+- ``40093001`` — ``upload_part_finish`` retryable. Retry until the server-provided
+ ``retry_timeout`` elapses (or a local cap).
+- ``40093002`` — daily cumulative upload quota exceeded. Not retryable; surface
+ as :class:`UploadDailyLimitExceededError` so the caller can build a
+ user-friendly reply.
+
+Exceptions:
+
+- :class:`UploadDailyLimitExceededError` — daily quota hit (non-retryable).
+- :class:`UploadFileTooLargeError` — file exceeds the platform per-file limit.
+- :class:`RuntimeError` — generic upload failure (network, part PUT, complete).
+
+Ported from WideLee's qqbot-agent-sdk v1.2.2 (``media_loader.py::ChunkedUploader``)
+so the heavy-upload path stays in-tree. Authorship preserved via Co-authored-by.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import functools
+import hashlib
+import logging
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Awaitable, Callable, Dict, List, Optional
+
+from gateway.platforms.qqbot.constants import FILE_UPLOAD_TIMEOUT
+
+logger = logging.getLogger(__name__)
+
+
+# ── Error codes ──────────────────────────────────────────────────────
+_BIZ_CODE_DAILY_LIMIT = 40093002 # upload_prepare: daily cumulative limit
+_BIZ_CODE_PART_RETRYABLE = 40093001 # upload_part_finish: transient
+
+# ── Part upload tuning ───────────────────────────────────────────────
+_DEFAULT_CONCURRENT_PARTS = 1
+_MAX_CONCURRENT_PARTS = 10
+
+_PART_UPLOAD_TIMEOUT = 300.0 # 5 minutes per COS PUT
+_PART_UPLOAD_MAX_RETRIES = 2
+_PART_FINISH_RETRY_INTERVAL = 1.0
+_PART_FINISH_DEFAULT_TIMEOUT = 120.0
+_PART_FINISH_MAX_TIMEOUT = 600.0
+
+_COMPLETE_UPLOAD_MAX_RETRIES = 2
+_COMPLETE_UPLOAD_BASE_DELAY = 2.0
+
+# First 10,002,432 bytes used for the ``md5_10m`` hash (per QQ API spec).
+_MD5_10M_SIZE = 10_002_432
+
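The hash triple sent to `upload_prepare` is computed by `_compute_file_hashes`, which is not shown in this hunk. A single pass suffices: `md5`/`sha1` over the whole file, `md5_10m` over only the first `_MD5_10M_SIZE` bytes. A hypothetical standalone sketch (the in-tree helper may differ in signature):

```python
import hashlib
from typing import Dict

_MD5_10M_SIZE = 10_002_432  # first-slice size for md5_10m (per the spec above)


def compute_file_hashes(file_path: str) -> Dict[str, str]:
    """Single-pass md5/sha1 over the file, plus md5 of the first ~10 MB."""
    md5, sha1, md5_10m = hashlib.md5(), hashlib.sha1(), hashlib.md5()
    remaining = _MD5_10M_SIZE
    with open(file_path, "rb") as f:
        while chunk := f.read(1 << 20):  # 1 MiB read blocks
            md5.update(chunk)
            sha1.update(chunk)
            if remaining > 0:
                md5_10m.update(chunk[:remaining])
                remaining -= len(chunk)
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "md5_10m": md5_10m.hexdigest(),
    }
```

For files under the 10 MB slice, `md5_10m` equals `md5`.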
+
+# ── Exceptions ───────────────────────────────────────────────────────
+
+class UploadDailyLimitExceededError(Exception):
+ """Raised when ``upload_prepare`` returns biz_code 40093002.
+
+ The daily cumulative upload quota for this bot has been reached. Callers
+ should surface :attr:`file_name` + :attr:`file_size_human` so the model
+ can compose a helpful reply.
+ """
+
+ def __init__(self, file_name: str, file_size: int, message: str = "") -> None:
+ self.file_name = file_name
+ self.file_size = file_size
+ super().__init__(
+ message or f"Daily upload limit exceeded for {file_name!r}"
+ )
+
+ @property
+ def file_size_human(self) -> str:
+ return format_size(self.file_size)
+
+
+class UploadFileTooLargeError(Exception):
+ """Raised when a file exceeds the platform per-file size limit."""
+
+ def __init__(
+ self,
+ file_name: str,
+ file_size: int,
+ limit_bytes: int = 0,
+ message: str = "",
+ ) -> None:
+ self.file_name = file_name
+ self.file_size = file_size
+ self.limit_bytes = limit_bytes
+ limit_str = f" ({format_size(limit_bytes)})" if limit_bytes else ""
+ super().__init__(
+ message
+ or (
+ f"File {file_name!r} ({format_size(file_size)}) "
+ f"exceeds platform limit{limit_str}"
+ )
+ )
+
+ @property
+ def file_size_human(self) -> str:
+ return format_size(self.file_size)
+
+ @property
+ def limit_human(self) -> str:
+ return format_size(self.limit_bytes) if self.limit_bytes else "unknown"
+
+
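`format_size` is used throughout this module but defined elsewhere in the tree. A conventional sketch of the assumed contract (unit names and precision are an assumption, not the real helper):

```python
def format_size(num_bytes: int) -> str:
    """Human-readable byte count; units/precision assumed, not in-tree code."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"  # not reached; keeps the return type total
```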
+# ── Progress tracking ────────────────────────────────────────────────
+
+@dataclass
+class _UploadProgress:
+ total_parts: int = 0
+ total_bytes: int = 0
+ completed_parts: int = 0
+ uploaded_bytes: int = 0
+
+
+# ── Prepare-response shape ───────────────────────────────────────────
+
+@dataclass
+class _PreparePart:
+ index: int
+ presigned_url: str
+ block_size: int = 0
+
+
+@dataclass
+class _PrepareResult:
+ upload_id: str
+ block_size: int
+ parts: List[_PreparePart]
+ concurrency: int = _DEFAULT_CONCURRENT_PARTS
+ retry_timeout: float = 0.0
+
+
+def _parse_prepare_response(raw: Dict[str, Any]) -> _PrepareResult:
+ """Parse the upload_prepare API response into a normalized shape.
+
+ The API may return the response directly or wrapped in ``data``.
+ """
+ src = raw.get("data") if isinstance(raw.get("data"), dict) else raw
+ upload_id = str(src.get("upload_id", ""))
+ if not upload_id:
+ raise ValueError(
+ f"upload_prepare response missing upload_id: {str(raw)[:200]}"
+ )
+ block_size = int(src.get("block_size", 0))
+ raw_parts = src.get("parts") or src.get("part_list") or []
+ if not isinstance(raw_parts, list) or not raw_parts:
+ raise ValueError(
+ f"upload_prepare response missing parts: {str(raw)[:200]}"
+ )
+ parts: List[_PreparePart] = []
+ for p in raw_parts:
+ if not isinstance(p, dict):
+ continue
+ parts.append(
+ _PreparePart(
+ index=int(p.get("part_index") or p.get("index") or 0),
+ presigned_url=str(
+ p.get("presigned_url") or p.get("url") or ""
+ ),
+ block_size=int(p.get("block_size", 0)),
+ )
+ )
+ return _PrepareResult(
+ upload_id=upload_id,
+ block_size=block_size,
+ parts=parts,
+ concurrency=(
+ int(src.get("concurrency", _DEFAULT_CONCURRENT_PARTS))
+ or _DEFAULT_CONCURRENT_PARTS
+ ),
+ retry_timeout=float(src.get("retry_timeout", 0.0) or 0.0),
+ )
+
+
+# ── Chunked upload driver ────────────────────────────────────────────
+
+ApiRequestFn = Callable[..., Awaitable[Dict[str, Any]]]
+"""Signature of the adapter's ``_api_request`` callable.
+
+We pass the bound method in rather than importing the adapter, to avoid
+circular imports and keep this module testable in isolation.
+"""
+
+
+class ChunkedUploader:
+ """Run the prepare → PUT parts → complete sequence.
+
+ :param api_request: Bound ``_api_request(method, path, body=..., timeout=...)``
+ coroutine from the adapter. Must raise ``RuntimeError`` with the biz_code
+ embedded in the message on API errors.
+ :param http_put: Coroutine ``(url, data, headers, timeout) -> response`` for
+ COS part uploads. Typically wraps ``httpx.AsyncClient.put``.
+ :param log_tag: Log prefix.
+ """
+
+ def __init__(
+ self,
+ api_request: ApiRequestFn,
+ http_put: Callable[..., Awaitable[Any]],
+ log_tag: str = "QQBot",
+ ) -> None:
+ self._api_request = api_request
+ self._http_put = http_put
+ self._log_tag = log_tag
+
+ async def upload(
+ self,
+ chat_type: str,
+ target_id: str,
+ file_path: str,
+ file_type: int,
+ file_name: str,
+ ) -> Dict[str, Any]:
+ """Run the full chunked upload and return the ``complete_upload`` response.
+
+ :param chat_type: ``'c2c'`` or ``'group'``.
+ :param target_id: User or group openid.
+ :param file_path: Absolute path to a local file.
+ :param file_type: ``MEDIA_TYPE_*`` constant.
+ :param file_name: Original filename (for upload_prepare).
+ :returns: The raw response dict from ``complete_upload`` — contains
+ ``file_info`` that the caller uses in a RichMedia message body.
+ :raises UploadDailyLimitExceededError: On biz_code 40093002.
+ :raises UploadFileTooLargeError: When the file exceeds the platform limit.
+ :raises RuntimeError: On other API or I/O failures.
+ """
+ if chat_type not in ("c2c", "group"):
+ raise ValueError(
+ f"ChunkedUploader: unsupported chat_type {chat_type!r}"
+ )
+
+ path = Path(file_path)
+ file_size = path.stat().st_size
+
+ logger.info(
+ "[%s] Chunked upload start: file=%s size=%s type=%d",
+ self._log_tag, file_name, format_size(file_size), file_type,
+ )
+
+ # Step 1: compute hashes (blocking I/O → executor).
+ hashes = await asyncio.get_running_loop().run_in_executor(
+ None, _compute_file_hashes, file_path, file_size
+ )
+
+ # Step 2: upload_prepare.
+ prepare = await self._prepare(
+ chat_type, target_id, file_type, file_name, file_size, hashes
+ )
+ max_concurrent = min(prepare.concurrency, _MAX_CONCURRENT_PARTS)
+ retry_timeout = min(
+ prepare.retry_timeout if prepare.retry_timeout > 0 else _PART_FINISH_DEFAULT_TIMEOUT,
+ _PART_FINISH_MAX_TIMEOUT,
+ )
+ logger.info(
+ "[%s] Prepared: upload_id=%s block_size=%s parts=%d concurrency=%d",
+ self._log_tag, prepare.upload_id, format_size(prepare.block_size),
+ len(prepare.parts), max_concurrent,
+ )
+
+ progress = _UploadProgress(
+ total_parts=len(prepare.parts),
+ total_bytes=file_size,
+ )
+
+ # Step 3: PUT each part + notify.
+ tasks: List[Callable[[], Awaitable[None]]] = [
+ functools.partial(
+ self._upload_one_part,
+ chat_type=chat_type,
+ target_id=target_id,
+ file_path=file_path,
+ file_size=file_size,
+ upload_id=prepare.upload_id,
+ rsp_block_size=prepare.block_size,
+ part=part,
+ retry_timeout=retry_timeout,
+ progress=progress,
+ )
+ for part in prepare.parts
+ ]
+ await _run_with_concurrency(tasks, max_concurrent)
+
+ logger.info(
+ "[%s] All %d parts uploaded, completing…",
+ self._log_tag, len(prepare.parts),
+ )
+
+ # Step 4: complete_upload (retry on transient errors).
+ return await self._complete(chat_type, target_id, prepare.upload_id)
+
+ # ──────────────────────────────────────────────────────────────────
+ # Step 1 — upload_prepare
+ # ──────────────────────────────────────────────────────────────────
+
+ async def _prepare(
+ self,
+ chat_type: str,
+ target_id: str,
+ file_type: int,
+ file_name: str,
+ file_size: int,
+ hashes: Dict[str, str],
+ ) -> _PrepareResult:
+ base = "/v2/users" if chat_type == "c2c" else "/v2/groups"
+ path = f"{base}/{target_id}/upload_prepare"
+ body = {
+ "file_type": file_type,
+ "file_name": file_name,
+ "file_size": file_size,
+ "md5": hashes["md5"],
+ "sha1": hashes["sha1"],
+ "md5_10m": hashes["md5_10m"],
+ }
+ try:
+ raw = await self._api_request(
+ "POST", path, body=body, timeout=FILE_UPLOAD_TIMEOUT
+ )
+ except RuntimeError as exc:
+ err_msg = str(exc)
+ if f"{_BIZ_CODE_DAILY_LIMIT}" in err_msg:
+ raise UploadDailyLimitExceededError(
+ file_name, file_size, err_msg
+ ) from exc
+ raise
+ return _parse_prepare_response(raw)
+
+ # ──────────────────────────────────────────────────────────────────
+ # Step 2 — PUT one part + part_finish
+ # ──────────────────────────────────────────────────────────────────
+
+ async def _upload_one_part(
+ self,
+ chat_type: str,
+ target_id: str,
+ file_path: str,
+ file_size: int,
+ upload_id: str,
+ rsp_block_size: int,
+ part: _PreparePart,
+ retry_timeout: float,
+ progress: _UploadProgress,
+ ) -> None:
+ """PUT one part to COS, then call ``upload_part_finish``."""
+ part_index = part.index
+ # Per-part block_size wins; fall back to the response-level value.
+ actual_block_size = part.block_size if part.block_size > 0 else rsp_block_size
+ offset = (part_index - 1) * rsp_block_size
+ length = min(actual_block_size, file_size - offset)
+
+ # Read this slice of the file (blocking → executor).
+ data = await asyncio.get_running_loop().run_in_executor(
+ None, _read_file_chunk, file_path, offset, length
+ )
+ md5_hex = hashlib.md5(data).hexdigest()
+
+ logger.debug(
+ "[%s] Part %d/%d: uploading %s (offset=%d md5=%s)",
+ self._log_tag, part_index, progress.total_parts,
+ format_size(length), offset, md5_hex,
+ )
+
+ await self._put_to_presigned_url(
+ part.presigned_url, data, part_index, progress.total_parts
+ )
+ await self._part_finish_with_retry(
+ chat_type, target_id, upload_id,
+ part_index, length, md5_hex, retry_timeout,
+ )
+
+ progress.completed_parts += 1
+ progress.uploaded_bytes += length
+ logger.debug(
+ "[%s] Part %d/%d done (%d/%d total)",
+ self._log_tag, part_index, progress.total_parts,
+ progress.completed_parts, progress.total_parts,
+ )
+
+ async def _put_to_presigned_url(
+ self,
+ url: str,
+ data: bytes,
+ part_index: int,
+ total_parts: int,
+ ) -> None:
+ """PUT part data to a pre-signed COS URL with retry."""
+ last_exc: Optional[Exception] = None
+ for attempt in range(_PART_UPLOAD_MAX_RETRIES + 1):
+ try:
+ resp = await asyncio.wait_for(
+ self._http_put(
+ url,
+ data=data,
+ headers={"Content-Length": str(len(data))},
+ ),
+ timeout=_PART_UPLOAD_TIMEOUT,
+ )
+ # Caller's http_put is expected to return an httpx-like response.
+ status = getattr(resp, "status_code", 0)
+ if 200 <= status < 300:
+ logger.debug(
+ "[%s] PUT part %d/%d: %d OK",
+ self._log_tag, part_index, total_parts, status,
+ )
+ return
+ body_preview = ""
+ try:
+ body_preview = getattr(resp, "text", "")[:200]
+ except Exception: # pragma: no cover — defensive
+ pass
+ raise RuntimeError(
+ f"COS PUT returned {status}: {body_preview}"
+ )
+ except Exception as exc:
+ last_exc = exc
+ if attempt < _PART_UPLOAD_MAX_RETRIES:
+ delay = 1.0 * (2 ** attempt)
+ logger.warning(
+ "[%s] PUT part %d/%d attempt %d failed, retry in %.1fs: %s",
+ self._log_tag, part_index, total_parts,
+ attempt + 1, delay, exc,
+ )
+ await asyncio.sleep(delay)
+ raise RuntimeError(
+ f"Part {part_index}/{total_parts} upload failed after "
+ f"{_PART_UPLOAD_MAX_RETRIES + 1} attempts: {last_exc}"
+ )
+
+ async def _part_finish_with_retry(
+ self,
+ chat_type: str,
+ target_id: str,
+ upload_id: str,
+ part_index: int,
+ block_size: int,
+ md5: str,
+ retry_timeout: float,
+ ) -> None:
+ """Call ``upload_part_finish``, retrying on biz_code 40093001."""
+ base = "/v2/users" if chat_type == "c2c" else "/v2/groups"
+ path = f"{base}/{target_id}/upload_part_finish"
+ body = {
+ "upload_id": upload_id,
+ "part_index": part_index,
+ "block_size": block_size,
+ "md5": md5,
+ }
+
+ loop = asyncio.get_running_loop()
+ start = loop.time()
+ attempt = 0
+ while True:
+ try:
+ await self._api_request(
+ "POST", path, body=body, timeout=FILE_UPLOAD_TIMEOUT
+ )
+ return
+ except RuntimeError as exc:
+ err_msg = str(exc)
+                if str(_BIZ_CODE_PART_RETRYABLE) not in err_msg:
+ raise
+ elapsed = loop.time() - start
+ if elapsed >= retry_timeout:
+ raise RuntimeError(
+ f"upload_part_finish persistent retry timed out "
+ f"after {retry_timeout:.0f}s ({attempt} retries): {exc}"
+ ) from exc
+ attempt += 1
+ logger.debug(
+ "[%s] part_finish retryable error, attempt %d, "
+ "elapsed=%.1fs: %s",
+ self._log_tag, attempt, elapsed, exc,
+ )
+ await asyncio.sleep(_PART_FINISH_RETRY_INTERVAL)
+
+ # ──────────────────────────────────────────────────────────────────
+ # Step 3 — complete_upload
+ # ──────────────────────────────────────────────────────────────────
+
+ async def _complete(
+ self,
+ chat_type: str,
+ target_id: str,
+ upload_id: str,
+ ) -> Dict[str, Any]:
+ """Call ``complete_upload`` with retry.
+
+ This reuses the ``/files`` endpoint (same as the simple URL-based upload)
+ but signals the chunked-completion path by sending only ``upload_id``.
+ """
+ base = "/v2/users" if chat_type == "c2c" else "/v2/groups"
+ path = f"{base}/{target_id}/files"
+ body = {"upload_id": upload_id}
+
+ last_exc: Optional[Exception] = None
+ for attempt in range(_COMPLETE_UPLOAD_MAX_RETRIES + 1):
+ try:
+ return await self._api_request(
+ "POST", path, body=body, timeout=FILE_UPLOAD_TIMEOUT
+ )
+ except Exception as exc:
+ last_exc = exc
+ if attempt < _COMPLETE_UPLOAD_MAX_RETRIES:
+ delay = _COMPLETE_UPLOAD_BASE_DELAY * (2 ** attempt)
+ logger.warning(
+ "[%s] complete_upload attempt %d failed, "
+ "retry in %.1fs: %s",
+ self._log_tag, attempt + 1, delay, exc,
+ )
+ await asyncio.sleep(delay)
+ raise RuntimeError(
+ f"complete_upload failed after "
+ f"{_COMPLETE_UPLOAD_MAX_RETRIES + 1} attempts: {last_exc}"
+ )
+
+
+# ── Helpers (module-level for testability) ───────────────────────────
+
+def format_size(size_bytes: int) -> str:
+ """Return a human-readable file size string (e.g. ``'12.3 MB'``)."""
+ size = float(size_bytes)
+ for unit in ("B", "KB", "MB", "GB"):
+ if size < 1024.0:
+ return f"{size:.1f} {unit}"
+ size /= 1024.0
+ return f"{size:.1f} TB"
+
+
+def _read_file_chunk(file_path: str, offset: int, length: int) -> bytes:
+ """Read *length* bytes from *file_path* starting at *offset*.
+
+ :raises IOError: If fewer bytes were read than expected (truncated file).
+ """
+ with open(file_path, "rb") as fh:
+ fh.seek(offset)
+ data = fh.read(length)
+ if len(data) != length:
+ raise IOError(
+ f"Short read from {file_path}: expected {length} bytes at "
+ f"offset {offset}, got {len(data)} (file may be truncated)"
+ )
+ return data
+
+
+def _compute_file_hashes(file_path: str, file_size: int) -> Dict[str, str]:
+ """Compute md5, sha1, and md5_10m in a single pass."""
+ md5 = hashlib.md5()
+ sha1 = hashlib.sha1()
+ md5_10m = hashlib.md5()
+
+ need_10m = file_size > _MD5_10M_SIZE
+ bytes_read = 0
+
+ with open(file_path, "rb") as fh:
+ while True:
+ chunk = fh.read(65536)
+ if not chunk:
+ break
+ md5.update(chunk)
+ sha1.update(chunk)
+ if need_10m:
+ remaining = _MD5_10M_SIZE - bytes_read
+ if remaining > 0:
+ md5_10m.update(chunk[:remaining])
+ bytes_read += len(chunk)
+
+ full_md5 = md5.hexdigest()
+ return {
+ "md5": full_md5,
+ "sha1": sha1.hexdigest(),
+ # For small files the "10m" hash is just the full md5.
+ "md5_10m": md5_10m.hexdigest() if need_10m else full_md5,
+ }
+
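The single-pass hashing trick above can be exercised against in-memory bytes. This sketch mirrors `_compute_file_hashes` (the 10 MB threshold is parameterized here for testability; names are illustrative):

```python
import hashlib

def multi_hash(data: bytes, prefix_limit: int = 10 * 1024 * 1024) -> dict:
    """Compute full md5/sha1 plus md5 of the first prefix_limit bytes,
    all in one pass over the input."""
    md5, sha1, md5_prefix = hashlib.md5(), hashlib.sha1(), hashlib.md5()
    need_prefix = len(data) > prefix_limit
    seen = 0
    for i in range(0, len(data), 65536):
        chunk = data[i:i + 65536]
        md5.update(chunk)
        sha1.update(chunk)
        if need_prefix:
            # Feed only the slice of this chunk that falls inside the prefix.
            remaining = prefix_limit - seen
            if remaining > 0:
                md5_prefix.update(chunk[:remaining])
            seen += len(chunk)
    full = md5.hexdigest()
    return {
        "md5": full,
        "sha1": sha1.hexdigest(),
        # Small inputs reuse the full digest, matching the adapter's behavior.
        "md5_prefix": md5_prefix.hexdigest() if need_prefix else full,
    }
```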
+
+async def _run_with_concurrency(
+ tasks: List[Callable[[], Awaitable[None]]],
+ concurrency: int,
+) -> None:
+ """Run a list of thunks with a bounded number in flight at once."""
+ if concurrency < 1:
+ concurrency = 1
+ sem = asyncio.Semaphore(concurrency)
+
+ async def _wrap(thunk: Callable[[], Awaitable[None]]) -> None:
+ async with sem:
+ await thunk()
+
+ await asyncio.gather(*(_wrap(t) for t in tasks))
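The semaphore-bounded pattern used by `_run_with_concurrency` can be verified in isolation. This sketch instruments the wrapper to record the peak number of thunks in flight (names are illustrative, not the module's):

```python
import asyncio

async def run_bounded(thunks, concurrency: int) -> int:
    """Run zero-arg coroutine factories with at most `concurrency` in flight.
    Returns the peak number observed running simultaneously."""
    concurrency = max(1, concurrency)
    sem = asyncio.Semaphore(concurrency)
    running = 0
    peak = 0

    async def wrap(thunk):
        nonlocal running, peak
        async with sem:  # blocks while `concurrency` thunks hold the permit
            running += 1
            peak = max(peak, running)
            await thunk()
            running -= 1

    await asyncio.gather(*(wrap(t) for t in thunks))
    return peak

async def main() -> int:
    async def work():
        await asyncio.sleep(0.01)
    return await run_bounded([work for _ in range(10)], concurrency=3)

peak = asyncio.run(main())
```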
diff --git a/gateway/platforms/qqbot/keyboards.py b/gateway/platforms/qqbot/keyboards.py
new file mode 100644
index 0000000000..19fd36e370
--- /dev/null
+++ b/gateway/platforms/qqbot/keyboards.py
@@ -0,0 +1,473 @@
+"""QQ Bot inline keyboards + approval / update-prompt senders.
+
+QQ Bot v2 supports attaching inline keyboards to outbound messages. When a
+user clicks a button, the platform dispatches an ``INTERACTION_CREATE``
+gateway event containing the button's ``data`` payload. The bot must ACK the
+interaction promptly via ``PUT /interactions/{id}`` or the user sees an
+error indicator on the button.
+
+This module provides:
+
+- :class:`InlineKeyboard` + button dataclasses — serialized into the
+ ``keyboard`` field of the outbound message body.
+- :func:`build_approval_keyboard` — 3-button ✅ once / ⭐ always / ❌ deny
+ keyboard for tool-approval flows.
+- :func:`build_update_prompt_keyboard` — Yes/No keyboard for update confirms.
+- :func:`parse_approval_button_data` / :func:`parse_update_prompt_button_data`
+ — decode the ``button_data`` payload from ``INTERACTION_CREATE``.
+- :class:`ApprovalRequest` + :class:`ApprovalSender` — high-level helper that
+ builds an approval message with keyboard and posts it to a c2c / group chat.
+
+``button_data`` formats::
+
+    approve:<session_key>:<decision>   # decision = allow-once|allow-always|deny
+    update_prompt:<answer>             # answer = y|n
+
+Ported from WideLee's qqbot-agent-sdk v1.2.2 (``approval.py`` + ``dto.py``
+keyboard types). Authorship preserved via Co-authored-by.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+from dataclasses import dataclass, field
+from typing import Any, Awaitable, Callable, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+# ── button_data prefixes + patterns ──────────────────────────────────
+
+APPROVAL_BUTTON_PREFIX = "approve:"
+UPDATE_PROMPT_PREFIX = "update_prompt:"
+
+# Pattern: approve:<session_key>:<decision>
+# session_key may itself contain colons (e.g. agent:main:qqbot:c2c:OPENID),
+# so the session_key group matches greedily and the trailing decision
+# alternation anchors the final split.
+_APPROVAL_DATA_RE = re.compile(
+ r"^approve:(.+):(allow-once|allow-always|deny)$"
+)
+
+# Pattern: update_prompt:y | update_prompt:n
+_UPDATE_PROMPT_RE = re.compile(r"^update_prompt:(y|n)$")
+
+
+# ── Keyboard dataclasses ─────────────────────────────────────────────
+
+@dataclass
+class KeyboardButtonPermission:
+ """Button permission metadata. ``type=2`` means all users can click."""
+ type: int = 2
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {"type": self.type}
+
+
+@dataclass
+class KeyboardButtonAction:
+ """What happens when the button is clicked.
+
+ :param type: ``1`` (Callback — triggers ``INTERACTION_CREATE``) or
+ ``2`` (Link — opens a URL).
+ :param data: Payload delivered in ``data.resolved.button_data`` when
+ ``type=1``.
+ :param permission: :class:`KeyboardButtonPermission`.
+ :param click_limit: Max clicks per user (``1`` = single-use).
+ """
+ type: int
+ data: str
+ permission: KeyboardButtonPermission = field(
+ default_factory=KeyboardButtonPermission
+ )
+ click_limit: int = 1
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {
+ "type": self.type,
+ "data": self.data,
+ "permission": self.permission.to_dict(),
+ "click_limit": self.click_limit,
+ }
+
+
+@dataclass
+class KeyboardButtonRenderData:
+ """Visual rendering of a button.
+
+ :param label: Pre-click label.
+ :param visited_label: Post-click label (button stays greyed in place).
+ :param style: ``0`` = grey, ``1`` = blue.
+ """
+ label: str
+ visited_label: str
+ style: int = 1
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {
+ "label": self.label,
+ "visited_label": self.visited_label,
+ "style": self.style,
+ }
+
+
+@dataclass
+class KeyboardButton:
+ """One button in a keyboard.
+
+ :param group_id: Buttons sharing a ``group_id`` are mutually exclusive —
+ clicking one greys the rest.
+ """
+ id: str
+ render_data: KeyboardButtonRenderData
+ action: KeyboardButtonAction
+ group_id: str = "default"
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {
+ "id": self.id,
+ "render_data": self.render_data.to_dict(),
+ "action": self.action.to_dict(),
+ "group_id": self.group_id,
+ }
+
+
+@dataclass
+class KeyboardRow:
+ buttons: List[KeyboardButton] = field(default_factory=list)
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {"buttons": [b.to_dict() for b in self.buttons]}
+
+
+@dataclass
+class KeyboardContent:
+ rows: List[KeyboardRow] = field(default_factory=list)
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {"rows": [r.to_dict() for r in self.rows]}
+
+
+@dataclass
+class InlineKeyboard:
+ """Top-level keyboard payload — goes into ``MessageToCreate.keyboard``."""
+ content: KeyboardContent = field(default_factory=KeyboardContent)
+
+ def to_dict(self) -> Dict[str, Any]:
+ return {"content": self.content.to_dict()}
+
+
+# ── INTERACTION_CREATE parsing ───────────────────────────────────────
+
+def parse_approval_button_data(button_data: str) -> Optional[tuple[str, str]]:
+ """Parse approval ``button_data`` into ``(session_key, decision)``.
+
+ :param button_data: Raw ``data.resolved.button_data`` from
+ ``INTERACTION_CREATE``.
+ :returns: ``(session_key, decision)`` or ``None`` if not an approval button.
+ """
+ m = _APPROVAL_DATA_RE.match(button_data or "")
+ if not m:
+ return None
+ return m.group(1), m.group(2)
+
+
+def parse_update_prompt_button_data(button_data: str) -> Optional[str]:
+ """Parse update-prompt ``button_data`` into ``'y'`` or ``'n'``."""
+ m = _UPDATE_PROMPT_RE.match(button_data or "")
+ if not m:
+ return None
+ return m.group(1)
+
+
+# ── Keyboard builders ────────────────────────────────────────────────
+
+def _make_callback_button(
+ btn_id: str,
+ label: str,
+ visited_label: str,
+ data: str,
+ style: int,
+ group_id: str,
+) -> KeyboardButton:
+ return KeyboardButton(
+ id=btn_id,
+ render_data=KeyboardButtonRenderData(
+ label=label,
+ visited_label=visited_label,
+ style=style,
+ ),
+ action=KeyboardButtonAction(type=1, data=data),
+ group_id=group_id,
+ )
+
+
+def build_approval_keyboard(session_key: str) -> InlineKeyboard:
+ """Build the 3-button approval keyboard.
+
+ Layout: ``[✅ 允许一次] [⭐ 始终允许] [❌ 拒绝]`` — all three share
+ ``group_id='approval'`` so clicking one greys out the rest.
+
+ :param session_key: Embedded into ``button_data`` so the decision
+ routes back to the right pending approval.
+ """
+ return InlineKeyboard(
+ content=KeyboardContent(
+ rows=[
+ KeyboardRow(buttons=[
+ _make_callback_button(
+ btn_id="allow",
+ label="✅ 允许一次",
+ visited_label="已允许",
+ data=f"{APPROVAL_BUTTON_PREFIX}{session_key}:allow-once",
+ style=1,
+ group_id="approval",
+ ),
+ _make_callback_button(
+ btn_id="always",
+ label="⭐ 始终允许",
+ visited_label="已始终允许",
+ data=f"{APPROVAL_BUTTON_PREFIX}{session_key}:allow-always",
+ style=1,
+ group_id="approval",
+ ),
+ _make_callback_button(
+ btn_id="deny",
+ label="❌ 拒绝",
+ visited_label="已拒绝",
+ data=f"{APPROVAL_BUTTON_PREFIX}{session_key}:deny",
+ style=0,
+ group_id="approval",
+ ),
+ ]),
+ ]
+ )
+ )
+
+
+def build_update_prompt_keyboard() -> InlineKeyboard:
+ """Build a Yes/No keyboard for update confirmation prompts."""
+ return InlineKeyboard(
+ content=KeyboardContent(
+ rows=[
+ KeyboardRow(buttons=[
+ _make_callback_button(
+ btn_id="yes",
+ label="✓ 确认",
+ visited_label="已确认",
+ data=f"{UPDATE_PROMPT_PREFIX}y",
+ style=1,
+ group_id="update_prompt",
+ ),
+ _make_callback_button(
+ btn_id="no",
+ label="✗ 取消",
+ visited_label="已取消",
+ data=f"{UPDATE_PROMPT_PREFIX}n",
+ style=0,
+ group_id="update_prompt",
+ ),
+ ]),
+ ]
+ )
+ )
+
+
+# ── ApprovalRequest + text builder ───────────────────────────────────
+
+@dataclass
+class ApprovalRequest:
+ """Structured approval-request display data.
+
+ :param session_key: Routes the decision back to the waiting caller.
+ :param title: Short title at the top.
+ :param description: Optional longer description.
+ :param command_preview: Command text (exec approvals).
+ :param cwd: Working directory (exec approvals).
+ :param tool_name: Tool name (plugin approvals).
+ :param severity: ``'critical' | 'info' | ''``.
+ :param timeout_sec: Seconds until the approval expires.
+ """
+ session_key: str
+ title: str
+ description: str = ""
+ command_preview: str = ""
+ cwd: str = ""
+ tool_name: str = ""
+ severity: str = ""
+ timeout_sec: int = 120
+
+
+def build_approval_text(req: ApprovalRequest) -> str:
+ """Render an :class:`ApprovalRequest` into the message body (markdown)."""
+ if req.command_preview or req.cwd:
+ return _build_exec_text(req)
+ return _build_plugin_text(req)
+
+
+def _build_exec_text(req: ApprovalRequest) -> str:
+ lines: List[str] = ["🔐 **命令执行审批**", ""]
+ if req.command_preview:
+ preview = req.command_preview[:300]
+ lines.append(f"```\n{preview}\n```")
+ if req.cwd:
+ lines.append(f"📁 目录: {req.cwd}")
+ if req.title and req.title != req.command_preview:
+ lines.append(f"📋 {req.title}")
+ if req.description:
+ lines.append(f"📝 {req.description}")
+ lines.append("")
+ lines.append(f"⏱️ 超时: {req.timeout_sec} 秒")
+ return "\n".join(lines)
+
+
+def _build_plugin_text(req: ApprovalRequest) -> str:
+ icon = (
+ "🔴" if req.severity == "critical"
+ else "🔵" if req.severity == "info"
+ else "🟡"
+ )
+ lines: List[str] = [f"{icon} **审批请求**", ""]
+ lines.append(f"📋 {req.title}")
+ if req.description:
+ lines.append(f"📝 {req.description}")
+ if req.tool_name:
+ lines.append(f"🔧 工具: {req.tool_name}")
+ lines.append("")
+ lines.append(f"⏱️ 超时: {req.timeout_sec} 秒")
+ return "\n".join(lines)
+
+
+# ── ApprovalSender ───────────────────────────────────────────────────
+
+PostMessageFn = Callable[..., Awaitable[Dict[str, Any]]]
+"""Signature of an async POST to ``/v2/{users|groups}/{id}/messages``.
+
+Implementations accept a body dict and return the raw API response.
+"""
+
+
+class ApprovalSender:
+ """Send an approval-request message with an inline keyboard.
+
+ Decoupled from the adapter via callables so it can be unit-tested in
+ isolation. Pass the adapter's ``_send_message_with_keyboard`` helper
+ (or any equivalent) as ``post_message``.
+ """
+
+ def __init__(
+ self,
+ post_c2c: PostMessageFn,
+ post_group: PostMessageFn,
+ log_tag: str = "QQBot",
+ ) -> None:
+ self._post_c2c = post_c2c
+ self._post_group = post_group
+ self._log_tag = log_tag
+
+ async def send(
+ self,
+ chat_type: str,
+ chat_id: str,
+ req: ApprovalRequest,
+ msg_id: Optional[str] = None,
+ ) -> bool:
+ """Send an approval message to *chat_id*.
+
+ :param chat_type: ``'c2c'`` or ``'group'``.
+ :param chat_id: User openid or group openid.
+ :param req: :class:`ApprovalRequest`.
+ :param msg_id: Reply-to message id (required for passive messages).
+ :returns: ``True`` on success, ``False`` on failure.
+ """
+ text = build_approval_text(req)
+ keyboard = build_approval_keyboard(req.session_key)
+
+ logger.info(
+ "[%s] Sending approval request to %s:%s (session=%.20s…)",
+ self._log_tag, chat_type, chat_id, req.session_key,
+ )
+
+ try:
+ if chat_type == "c2c":
+ await self._post_c2c(chat_id, text, msg_id, keyboard)
+ elif chat_type == "group":
+ await self._post_group(chat_id, text, msg_id, keyboard)
+ else:
+ logger.warning(
+ "[%s] Approval: unsupported chat_type %r",
+ self._log_tag, chat_type,
+ )
+ return False
+ logger.info(
+ "[%s] Approval message sent to %s:%s",
+ self._log_tag, chat_type, chat_id,
+ )
+ return True
+ except Exception as exc:
+ logger.error(
+ "[%s] Failed to send approval message to %s:%s: %s",
+ self._log_tag, chat_type, chat_id, exc,
+ )
+ return False
+
+
+# ── INTERACTION_CREATE event shape ───────────────────────────────────
+
+@dataclass
+class InteractionEvent:
+ """Parsed ``INTERACTION_CREATE`` event payload.
+
+ See https://bot.q.qq.com/wiki/develop/api-v2/dev-prepare/interface-framework/event-emit.html
+ """
+ id: str = ""
+ """Interaction event id — required for the ``PUT /interactions/{id}`` ACK."""
+
+ type: int = 0
+ """Event type code (``11`` = message button)."""
+
+ chat_type: int = 0
+ """``0`` = guild, ``1`` = group, ``2`` = c2c."""
+
+ scene: str = ""
+ """``'guild'`` | ``'group'`` | ``'c2c'`` — human-readable scene."""
+
+ group_openid: str = ""
+ group_member_openid: str = ""
+ user_openid: str = ""
+ channel_id: str = ""
+ guild_id: str = ""
+
+ button_data: str = ""
+ button_id: str = ""
+ resolver_user_id: str = ""
+
+ @property
+ def operator_openid(self) -> str:
+ """Best available operator openid (group → member; c2c → user)."""
+ return (
+ self.group_member_openid
+ or self.user_openid
+ or self.resolver_user_id
+ )
+
+
+def parse_interaction_event(raw: Dict[str, Any]) -> InteractionEvent:
+ """Parse a raw ``INTERACTION_CREATE`` dispatch payload (``d``)."""
+ data_raw = raw.get("data") or {}
+ resolved = data_raw.get("resolved") or {}
+ scene_code = int(raw.get("chat_type", 0) or 0)
+ scene = {0: "guild", 1: "group", 2: "c2c"}.get(scene_code, "")
+ return InteractionEvent(
+ id=str(raw.get("id", "")),
+ type=int(data_raw.get("type", 0) or 0),
+ chat_type=scene_code,
+ scene=scene,
+ group_openid=str(raw.get("group_openid", "")),
+ group_member_openid=str(raw.get("group_member_openid", "")),
+ user_openid=str(raw.get("user_openid", "")),
+ channel_id=str(raw.get("channel_id", "")),
+ guild_id=str(raw.get("guild_id", "")),
+ button_data=str(resolved.get("button_data", "")),
+ button_id=str(resolved.get("button_id", "")),
+ resolver_user_id=str(resolved.get("user_id", "")),
+ )
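The greedy-group-plus-anchored-alternation approach in `_APPROVAL_DATA_RE` is what lets session keys contain colons. A quick standalone check of that behavior (same pattern, hypothetical wrapper name):

```python
import re

# Same pattern as _APPROVAL_DATA_RE: the greedy session_key group may contain
# colons; the decision alternation at the end anchors where the split happens.
APPROVAL_RE = re.compile(r"^approve:(.+):(allow-once|allow-always|deny)$")

def parse_approval(button_data):
    """Return (session_key, decision), or None for non-approval payloads."""
    m = APPROVAL_RE.match(button_data or "")
    return (m.group(1), m.group(2)) if m else None

# Colons inside the session key survive the parse intact:
result = parse_approval("approve:agent:main:qqbot:c2c:OPENID:allow-always")
```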
diff --git a/gateway/platforms/slack.py b/gateway/platforms/slack.py
index c8ee28859d..843fb78959 100644
--- a/gateway/platforms/slack.py
+++ b/gateway/platforms/slack.py
@@ -1887,6 +1887,12 @@ class SlackAdapter(BasePlatformAdapter):
is_thread_reply = bool(event_thread_ts and event_thread_ts != ts)
if not is_dm and bot_uid:
+ # Check allowed channels — if set, only respond in these channels (whitelist)
+ allowed_channels = self._slack_allowed_channels()
+ if allowed_channels and channel_id not in allowed_channels:
+ logger.debug("[Slack] Ignoring message in non-allowed channel: %s", channel_id)
+ return
+
if channel_id in self._slack_free_response_channels():
pass # Free-response channel — always process
elif not self._slack_require_mention():
@@ -2924,3 +2930,19 @@ class SlackAdapter(BasePlatformAdapter):
if s:
return {part.strip() for part in s.split(",") if part.strip()}
return set()
+
+    def _slack_allowed_channels(self) -> set[str]:
+ """Return the whitelist of channel IDs the bot will respond in.
+
+ When non-empty, messages from channels NOT in this set are silently
+ ignored — even if the bot is @mentioned. DMs are never filtered.
+ Empty set means no restriction (fully backward compatible).
+ """
+ raw = self.config.extra.get("allowed_channels")
+ if raw is None:
+ raw = os.getenv("SLACK_ALLOWED_CHANNELS", "")
+ if isinstance(raw, list):
+ return {str(part).strip() for part in raw if str(part).strip()}
+ if isinstance(raw, str) and raw.strip():
+ return {part.strip() for part in raw.split(",") if part.strip()}
+ return set()
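The whitelist may arrive as a YAML list (`config.extra`) or a comma-separated env string, so the normalization handles both plus the unset case. A standalone sketch of that parse (hypothetical function name):

```python
def parse_whitelist(raw) -> set:
    """Normalize a whitelist that may be a YAML list, a comma-separated
    string, or unset. An empty result means no restriction."""
    if isinstance(raw, list):
        return {str(part).strip() for part in raw if str(part).strip()}
    if isinstance(raw, str) and raw.strip():
        return {part.strip() for part in raw.split(",") if part.strip()}
    return set()
```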
diff --git a/gateway/platforms/telegram.py b/gateway/platforms/telegram.py
index 167d47237e..0d0ac3866f 100644
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -86,6 +86,22 @@ from gateway.platforms.telegram_network import (
)
from utils import atomic_replace
+_TELEGRAM_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
+_TELEGRAM_IMAGE_MIME_TO_EXT = {
+ "image/png": ".png",
+ "image/jpeg": ".jpg",
+ "image/jpg": ".jpg",
+ "image/webp": ".webp",
+ "image/gif": ".gif",
+}
+_TELEGRAM_IMAGE_EXT_TO_MIME = {
+ ".png": "image/png",
+ ".jpg": "image/jpeg",
+ ".jpeg": "image/jpeg",
+ ".webp": "image/webp",
+ ".gif": "image/gif",
+}
+
def check_telegram_requirements() -> bool:
"""Check if Telegram dependencies are available."""
@@ -353,6 +369,13 @@ class TelegramAdapter(BasePlatformAdapter):
@classmethod
def _message_thread_id_for_typing(cls, thread_id: Optional[str]) -> Optional[int]:
+ # Asymmetric with _message_thread_id_for_send on purpose. Telegram's
+ # sendMessage and sendChatAction treat thread id "1" (the forum General
+ # topic) differently: sends reject message_thread_id=1 and must omit it,
+ # but sendChatAction needs message_thread_id=1 to place the typing
+ # bubble in the General topic (omitting it hides the bubble entirely
+ # from the client's view of that topic). Preserve the real id here —
+ # sends still map "1" → None via _message_thread_id_for_send.
if not thread_id:
return None
return int(thread_id)
@@ -688,6 +711,29 @@ class TelegramAdapter(BasePlatformAdapter):
)
return None
+ async def rename_dm_topic(
+ self,
+ chat_id: int,
+ thread_id: int,
+ name: str,
+ ) -> None:
+ """Rename a forum topic in a private (DM) chat."""
+ if not self._bot:
+ return
+ try:
+ chat_id_arg = int(chat_id)
+ except (TypeError, ValueError):
+ chat_id_arg = chat_id
+ await self._bot.edit_forum_topic(
+ chat_id=chat_id_arg,
+ message_thread_id=int(thread_id),
+ name=name,
+ )
+ logger.info(
+ "[%s] Renamed DM topic in chat %s thread_id=%s -> '%s'",
+ self.name, chat_id, thread_id, name,
+ )
+
def _persist_dm_topic_thread_id(self, chat_id: int, topic_name: str, thread_id: int) -> None:
"""Save a newly created thread_id back into config.yaml so it persists across restarts."""
try:
@@ -2485,21 +2531,16 @@ class TelegramAdapter(BasePlatformAdapter):
try:
_typing_thread = self._metadata_thread_id(metadata)
message_thread_id = self._message_thread_id_for_typing(_typing_thread)
- try:
- await self._bot.send_chat_action(
- chat_id=int(chat_id),
- action="typing",
- message_thread_id=message_thread_id,
- )
- except Exception as e:
- if message_thread_id is not None and self._is_thread_not_found_error(e):
- await self._bot.send_chat_action(
- chat_id=int(chat_id),
- action="typing",
- message_thread_id=None,
- )
- else:
- raise
+ # No retry-without-thread fallback here: _message_thread_id_for_typing
+ # already maps the forum General topic to None, so any non-None value
+ # reaching this call is a user-created topic. If Telegram rejects it
+ # (e.g. topic deleted mid-session), we swallow the failure rather than
+ # showing a typing indicator in the wrong chat/All Messages.
+ await self._bot.send_chat_action(
+ chat_id=int(chat_id),
+ action="typing",
+ message_thread_id=message_thread_id,
+ )
except Exception as e:
# Typing failures are non-fatal; log at debug level only.
logger.debug(
@@ -2734,6 +2775,20 @@ class TelegramAdapter(BasePlatformAdapter):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
+ def _telegram_allowed_chats(self) -> set[str]:
+ """Return the whitelist of group/supergroup chat IDs the bot will respond in.
+
+ When non-empty, group messages from chats NOT in this set are silently
+ ignored — even if the bot is @mentioned. DMs are never filtered.
+ Empty set means no restriction (fully backward compatible).
+ """
+ raw = self.config.extra.get("allowed_chats")
+ if raw is None:
+ raw = os.getenv("TELEGRAM_ALLOWED_CHATS", "")
+ if isinstance(raw, list):
+ return {str(part).strip() for part in raw if str(part).strip()}
+ return {part.strip() for part in str(raw).split(",") if part.strip()}
+
def _telegram_ignored_threads(self) -> set[int]:
raw = self.config.extra.get("ignored_threads")
if raw is None:
@@ -2882,13 +2937,16 @@ class TelegramAdapter(BasePlatformAdapter):
"""Apply Telegram group trigger rules.
DMs remain unrestricted. Group/supergroup messages are accepted when:
+ - the chat passes the ``allowed_chats`` whitelist (when set)
- the chat is explicitly allowlisted in ``free_response_chats``
- ``require_mention`` is disabled
- the message replies to the bot
- the bot is @mentioned
- the text/caption matches a configured regex wake-word pattern
- When ``require_mention`` is enabled, slash commands are not given
+ When ``allowed_chats`` is non-empty, it acts as a hard gate — messages
+ from any chat not in the list are ignored regardless of the other
+ rules. When ``require_mention`` is enabled, slash commands are not given
special treatment — they must pass the same mention/reply checks
as any other group message. Users can still trigger commands via
the Telegram bot menu (``/command@botname``) or by explicitly
@@ -2897,6 +2955,14 @@ class TelegramAdapter(BasePlatformAdapter):
"""
if not self._is_group_chat(message):
return True
+ # allowed_chats check (whitelist — must pass before other gating).
+ # When set, group messages from chats NOT in this whitelist are
+ # silently ignored, even if @mentioned. DMs are already excluded above.
+ allowed = self._telegram_allowed_chats()
+ if allowed:
+ chat_id_str = str(getattr(getattr(message, "chat", None), "id", ""))
+ if chat_id_str not in allowed:
+ return False
thread_id = getattr(message, "message_thread_id", None)
if thread_id is not None:
try:
@@ -3218,10 +3284,59 @@ class TelegramAdapter(BasePlatformAdapter):
_, ext = os.path.splitext(original_filename)
ext = ext.lower()
+ # Normalize mime_type for robust comparisons (some clients send
+ # uppercase like "IMAGE/PNG").
+ doc_mime = (doc.mime_type or "").lower()
+
# If no extension from filename, reverse-lookup from MIME type
- if not ext and doc.mime_type:
- mime_to_ext = {v: k for k, v in SUPPORTED_DOCUMENT_TYPES.items()}
- ext = mime_to_ext.get(doc.mime_type, "")
+ if not ext and doc_mime:
+ ext = _TELEGRAM_IMAGE_MIME_TO_EXT.get(doc_mime, "")
+ if not ext:
+ mime_to_ext = {v: k for k, v in SUPPORTED_DOCUMENT_TYPES.items()}
+ ext = mime_to_ext.get(doc_mime, "")
+
+ # Check file size early so image documents cannot bypass the
+ # document size limit by taking the image path.
+ MAX_DOC_BYTES = 20 * 1024 * 1024
+ if not doc.file_size or doc.file_size > MAX_DOC_BYTES:
+ event.text = (
+ "The document is too large or its size could not be verified. "
+ "Maximum: 20 MB."
+ )
+ logger.info("[Telegram] Document too large: %s bytes", doc.file_size)
+ await self.handle_message(event)
+ return
+
+ # Telegram may deliver screenshots/photos as documents. If the
+ # payload is actually an image, route it through the image cache
+ # and batching path instead of rejecting it as a document.
+ if ext in _TELEGRAM_IMAGE_EXTENSIONS or doc_mime.startswith("image/"):
+ file_obj = await doc.get_file()
+ image_bytes = await file_obj.download_as_bytearray()
+ image_ext = ext if ext in _TELEGRAM_IMAGE_EXTENSIONS else _TELEGRAM_IMAGE_MIME_TO_EXT.get(doc_mime, ".jpg")
+ try:
+ cached_path = cache_image_from_bytes(bytes(image_bytes), ext=image_ext)
+ except ValueError as e:
+ logger.warning("[Telegram] Failed to cache image document: %s", e, exc_info=True)
+ event.text = (
+ f"Image document '{original_filename or doc_mime or ext or 'unknown'}' "
+ "could not be read as an image."
+ )
+ await self.handle_message(event)
+ return
+
+ event.message_type = MessageType.PHOTO
+ event.media_urls = [cached_path]
+ event.media_types = [doc_mime if doc_mime.startswith("image/") else _TELEGRAM_IMAGE_EXT_TO_MIME.get(image_ext, "image/jpeg")]
+ logger.info("[Telegram] Cached user image-document at %s", cached_path)
+
+ media_group_id = getattr(msg, "media_group_id", None)
+ if media_group_id:
+ await self._queue_media_group_event(str(media_group_id), event)
+ else:
+ batch_key = self._photo_batch_key(event, msg)
+ self._enqueue_photo_event(batch_key, event)
+ return
if not ext and doc.mime_type:
video_mime_to_ext = {v: k for k, v in SUPPORTED_VIDEO_TYPES.items()}
@@ -3249,17 +3364,6 @@ class TelegramAdapter(BasePlatformAdapter):
await self.handle_message(event)
return
- # Check file size (Telegram Bot API limit: 20 MB)
- MAX_DOC_BYTES = 20 * 1024 * 1024
- if not doc.file_size or doc.file_size > MAX_DOC_BYTES:
- event.text = (
- "The document is too large or its size could not be verified. "
- "Maximum: 20 MB."
- )
- logger.info("[Telegram] Document too large: %s bytes", doc.file_size)
- await self.handle_message(event)
- return
-
# Download and cache
file_obj = await doc.get_file()
doc_bytes = await file_obj.download_as_bytearray()
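The image-document detour above hinges on resolving an extension from either the filename or a possibly-uppercase MIME type. That decision can be sketched as a pure function (table contents copied from the module constants; the function name is illustrative):

```python
from typing import Optional

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MIME_TO_EXT = {
    "image/png": ".png", "image/jpeg": ".jpg", "image/jpg": ".jpg",
    "image/webp": ".webp", "image/gif": ".gif",
}

def resolve_image_ext(filename_ext: str, mime_type: str) -> Optional[str]:
    """Decide whether a Telegram document should take the image path and,
    if so, which extension to cache it under. None means: not an image."""
    ext = (filename_ext or "").lower()
    mime = (mime_type or "").lower()  # some clients send "IMAGE/PNG"
    if ext in IMAGE_EXTS:
        return ext
    if mime.startswith("image/"):
        # Unknown image subtype: fall back to .jpg, as the adapter does.
        return MIME_TO_EXT.get(mime, ".jpg")
    return None
```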
diff --git a/gateway/platforms/telegram_network.py b/gateway/platforms/telegram_network.py
index b099adc50e..8fe4c28093 100644
--- a/gateway/platforms/telegram_network.py
+++ b/gateway/platforms/telegram_network.py
@@ -185,10 +185,13 @@ async def _query_doh_provider(
async def discover_fallback_ips() -> list[str]:
"""Auto-discover Telegram API IPs via DNS-over-HTTPS.
- Resolves api.telegram.org through Google and Cloudflare DoH, collects all
- unique IPs, and excludes the system-DNS-resolved IP (which is presumably
- unreachable on this network). Falls back to a hardcoded seed list when DoH
- is also unavailable.
+ Resolves api.telegram.org through Google and Cloudflare DoH and returns all
+ unique A records. IPs that match the local system resolver are kept rather
+ than excluded: in many networks the system-DNS IP is the most reliable path
+ to api.telegram.org and a transient primary-path failure should be retried
+ against the same address via the IP-rewrite path before the seed list is
+ consulted (#14520). Falls back to a hardcoded seed list only when DoH
+ yields no usable answers.
"""
async with httpx.AsyncClient(timeout=httpx.Timeout(_DOH_TIMEOUT)) as client:
doh_tasks = [_query_doh_provider(client, p) for p in _DOH_PROVIDERS]
@@ -203,11 +206,11 @@ async def discover_fallback_ips() -> list[str]:
if isinstance(r, list):
doh_ips.extend(r)
- # Deduplicate preserving order, exclude system-DNS IPs
+ # Deduplicate preserving order
seen: set[str] = set()
candidates: list[str] = []
for ip in doh_ips:
- if ip not in seen and ip not in system_ips:
+ if ip not in seen:
seen.add(ip)
candidates.append(ip)
@@ -219,7 +222,7 @@ async def discover_fallback_ips() -> list[str]:
return validated
logger.info(
- "DoH discovery yielded no new IPs (system DNS: %s); using seed fallback IPs %s",
+ "DoH discovery yielded no usable IPs (system DNS: %s); using seed fallback IPs %s",
", ".join(system_ips) or "unknown",
", ".join(_SEED_FALLBACK_IPS),
)
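The dedup step keeps first-seen order so that answers from the first-responding DoH provider stay at the front of the candidate list. Sketch of that helper (hypothetical name):

```python
def dedupe_preserving_order(ips):
    """First occurrence wins; later duplicates are dropped. DoH answers from
    multiple providers (Google, Cloudflare) routinely overlap."""
    seen = set()
    out = []
    for ip in ips:
        if ip not in seen:
            seen.add(ip)
            out.append(ip)
    return out
```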
diff --git a/gateway/platforms/webhook.py b/gateway/platforms/webhook.py
index 34e2dfa2c5..83aa93e94c 100644
--- a/gateway/platforms/webhook.py
+++ b/gateway/platforms/webhook.py
@@ -59,6 +59,29 @@ DEFAULT_PORT = 8644
_INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
_DYNAMIC_ROUTES_FILENAME = "webhook_subscriptions.json"
+# Hostnames/IP literals that only serve connections originating on the same
+# machine. Anything else is treated as a public bind for safety-rail purposes.
+_LOOPBACK_HOSTS = frozenset({
+ "127.0.0.1",
+ "localhost",
+ "::1",
+ "ip6-localhost",
+ "ip6-loopback",
+})
+
+
+def _is_loopback_host(host: str) -> bool:
+ """True when `host` binds only to the local machine.
+
+ Covers IPv4 loopback, the standard `localhost` alias, IPv6 loopback in
+ both bracketed and bare form, and the common Debian-style aliases. Any
+ falsy value (empty string, None) is conservatively treated as non-loopback
+ because an unset host usually means the platform-default public bind.
+ """
+    if not host:
+        return False
+    h = host.strip().lower()
+    # Normalize the bracketed IPv6 form ("[::1]") to bare "::1" before lookup.
+    if h.startswith("[") and h.endswith("]"):
+        h = h[1:-1]
+    return h in _LOOPBACK_HOSTS
+
def check_webhook_requirements() -> bool:
"""Check if webhook adapter dependencies are available."""
@@ -126,6 +149,17 @@ class WebhookAdapter(BasePlatformAdapter):
f"For testing without auth, set secret to '{_INSECURE_NO_AUTH}'."
)
+ # Safety rail: refuse to start if INSECURE_NO_AUTH is combined with a
+ # non-loopback bind. The escape hatch is for local testing only;
+ # serving an unauthenticated route on a public interface is a
+ # deployment-grade footgun we'd rather crash early than ship.
+ if secret == _INSECURE_NO_AUTH and not _is_loopback_host(self._host):
+ raise ValueError(
+ f"[webhook] Route '{name}' uses INSECURE_NO_AUTH secret "
+ f"but is bound to non-loopback host '{self._host}'. "
+ f"INSECURE_NO_AUTH is for local testing only. "
+ f"Refusing to start to prevent accidental exposure."
+ )
# deliver_only routes bypass the agent — the POST body becomes a
# direct push notification via the configured delivery target.
# Validate up-front so misconfiguration surfaces at startup rather
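The safety rail added above is easy to check in isolation. Here is a minimal standalone sketch; the constant and function names are illustrative, not the module's real API:

```python
# Illustrative sketch of the loopback-bind safety rail; names are not
# the module's real API.
LOOPBACK_HOSTS = frozenset(
    {"127.0.0.1", "localhost", "::1", "ip6-localhost", "ip6-loopback"}
)

def is_loopback_host(host):
    # An unset host usually means the platform-default (public) bind,
    # so falsy values are conservatively treated as non-loopback.
    if not host:
        return False
    # strip("[]") also accepts URL-style bracketed IPv6 literals.
    return host.strip().lower().strip("[]") in LOOPBACK_HOSTS

def check_bind(secret, host):
    # Mirrors the startup guard: crash early rather than serve an
    # unauthenticated route on a public interface.
    if secret == "INSECURE_NO_AUTH" and not is_loopback_host(host):
        raise ValueError(f"refusing INSECURE_NO_AUTH on non-loopback host {host!r}")

check_bind("INSECURE_NO_AUTH", "127.0.0.1")  # loopback: allowed
check_bind("real-secret", "0.0.0.0")         # authenticated: allowed
try:
    check_bind("INSECURE_NO_AUTH", "0.0.0.0")
except ValueError as exc:
    print("blocked:", exc)
```

The guard deliberately fails closed: an empty or missing host counts as public.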
diff --git a/gateway/platforms/wecom.py b/gateway/platforms/wecom.py
index 873284de79..769743794d 100644
--- a/gateway/platforms/wecom.py
+++ b/gateway/platforms/wecom.py
@@ -37,6 +37,7 @@ import logging
import mimetypes
import os
import re
+import time
import uuid
from datetime import datetime, timezone
from pathlib import Path
@@ -1015,6 +1016,8 @@ class WeComAdapter(BasePlatformAdapter):
if not aes_key:
raise ValueError("aes_key is required")
+ # WeCom doesn't pad base64 keys; add padding if needed
+ aes_key = aes_key + '=' * ((4 - len(aes_key) % 4) % 4)
key = base64.b64decode(aes_key)
if len(key) != 32:
raise ValueError(f"Invalid WeCom AES key length: expected 32 bytes, got {len(key)}")
@@ -1560,12 +1563,11 @@ def qr_scan_for_bot_info(
print(" Fetching configuration results...", end="", flush=True)
# ── Step 3: Poll for result ──
- import time
- deadline = time.time() + timeout_seconds
+ deadline = time.monotonic() + timeout_seconds
query_url = f"{_QR_QUERY_URL}?scode={urllib.parse.quote(scode)}"
poll_count = 0
- while time.time() < deadline:
+ while time.monotonic() < deadline:
try:
req = urllib.request.Request(query_url, headers={"User-Agent": "HermesAgent/1.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
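Two fixes in this file are worth spelling out: the padding expression `'=' * ((4 - len(key) % 4) % 4)` restores stripped base64 padding, and the poll loop now uses `time.monotonic()` so wall-clock adjustments cannot shrink or extend the deadline. A quick sketch of the padding arithmetic (helper name illustrative):

```python
import base64

def pad_b64(s):
    # WeCom-style keys omit base64 padding; restore it so b64decode
    # accepts the string. (4 - len % 4) % 4 is 0 when already aligned.
    return s + "=" * ((4 - len(s) % 4) % 4)

# A 43-char WeCom EncodingAESKey needs exactly one '=' to reach 44
# chars, which decodes to the required 32 bytes.
key43 = base64.b64encode(b"\x01" * 32).decode().rstrip("=")
assert len(key43) == 43
decoded = base64.b64decode(pad_b64(key43))
print(len(decoded))  # 32
```

Already-padded input passes through unchanged, since the expression evaluates to zero when the length is a multiple of four.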
diff --git a/gateway/platforms/weixin.py b/gateway/platforms/weixin.py
index 482692ee7a..1c20b3f290 100644
--- a/gateway/platforms/weixin.py
+++ b/gateway/platforms/weixin.py
@@ -23,6 +23,7 @@ import re
import secrets
import struct
import tempfile
+import textwrap
import time
import uuid
from datetime import datetime
@@ -32,6 +33,8 @@ from urllib.parse import quote, urlparse
logger = logging.getLogger(__name__)
+WEIXIN_COPY_LINE_WIDTH = 120
+
try:
import aiohttp
@@ -548,17 +551,21 @@ async def _upload_ciphertext(
Accepts either a constructed CDN URL (from upload_param) or a direct
upload_full_url — both use POST with the raw ciphertext as the body.
"""
- timeout = aiohttp.ClientTimeout(total=120)
- async with session.post(upload_url, data=ciphertext, headers={"Content-Type": "application/octet-stream"}, timeout=timeout) as response:
- if response.status == 200:
- encrypted_param = response.headers.get("x-encrypted-param")
- if encrypted_param:
- await response.read()
- return encrypted_param
+ # Use asyncio.wait_for() instead of aiohttp ClientTimeout to avoid
+ # "Timeout context manager should be used inside a task" errors when
+ # invoked via asyncio.run_coroutine_threadsafe() from cron jobs.
+ async def _do_upload() -> str:
+ async with session.post(upload_url, data=ciphertext, headers={"Content-Type": "application/octet-stream"}) as response:
+ if response.status == 200:
+ encrypted_param = response.headers.get("x-encrypted-param")
+ if encrypted_param:
+ await response.read()
+ return encrypted_param
+ raw = await response.text()
+ raise RuntimeError(f"CDN upload missing x-encrypted-param header: {raw[:200]}")
raw = await response.text()
- raise RuntimeError(f"CDN upload missing x-encrypted-param header: {raw[:200]}")
- raw = await response.text()
- raise RuntimeError(f"CDN upload HTTP {response.status}: {raw[:200]}")
+ raise RuntimeError(f"CDN upload HTTP {response.status}: {raw[:200]}")
+ return await asyncio.wait_for(_do_upload(), timeout=120)
async def _download_bytes(
@@ -567,10 +574,13 @@ async def _download_bytes(
url: str,
timeout_seconds: float = 60.0,
) -> bytes:
- timeout = aiohttp.ClientTimeout(total=timeout_seconds)
- async with session.get(url, timeout=timeout) as response:
- response.raise_for_status()
- return await response.read()
+ # Use asyncio.wait_for() instead of aiohttp ClientTimeout to avoid
+ # "Timeout context manager should be used inside a task" errors.
+ async def _do_download() -> bytes:
+ async with session.get(url) as response:
+ response.raise_for_status()
+ return await response.read()
+ return await asyncio.wait_for(_do_download(), timeout=timeout_seconds)
_WEIXIN_CDN_ALLOWLIST: frozenset[str] = frozenset(
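The pattern used in both helpers above, wrapping the whole request coroutine in `asyncio.wait_for()` instead of passing a client-library timeout object, generalizes to any awaitable. A minimal sketch with stubbed-out requests (no aiohttp needed; names illustrative):

```python
import asyncio

# Wrap the whole operation in asyncio.wait_for() around a coroutine
# produced by a zero-argument factory, mirroring the hunks above.
async def run_with_deadline(coro_factory, timeout):
    return await asyncio.wait_for(coro_factory(), timeout=timeout)

async def fast():
    return "ok"

async def slow():
    await asyncio.sleep(10)
    return "late"

async def main():
    print(await run_with_deadline(fast, timeout=1.0))  # ok
    try:
        await run_with_deadline(slow, timeout=0.05)
    except asyncio.TimeoutError:
        print("timed out")

asyncio.run(main())
```

Because `wait_for` only needs a running event loop at await time, it sidesteps the "Timeout context manager should be used inside a task" failure mode the comments describe.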
@@ -724,6 +734,46 @@ def _normalize_markdown_blocks(content: str) -> str:
return "\n".join(result).strip()
+def _wrap_copy_friendly_lines_for_weixin(content: str) -> str:
+ """Wrap long display lines that are hard to copy in WeChat clients."""
+ if not content:
+ return content
+
+ wrapped: List[str] = []
+ in_code_block = False
+
+ for raw_line in content.splitlines():
+ line = raw_line.rstrip()
+ stripped = line.strip()
+
+ if _FENCE_RE.match(stripped):
+ in_code_block = not in_code_block
+ wrapped.append(line)
+ continue
+
+ if (
+ in_code_block
+ or len(line) <= WEIXIN_COPY_LINE_WIDTH
+ or not stripped
+ or stripped.startswith("|")
+ or _TABLE_RULE_RE.match(stripped)
+ ):
+ wrapped.append(line)
+ continue
+
+ wrapped_lines = textwrap.wrap(
+ line,
+ width=WEIXIN_COPY_LINE_WIDTH,
+ break_long_words=False,
+ break_on_hyphens=False,
+ replace_whitespace=False,
+ drop_whitespace=True,
+ )
+ wrapped.extend(wrapped_lines or [line])
+
+ return "\n".join(wrapped).strip()
+
+
def _split_markdown_blocks(content: str) -> List[str]:
if not content:
return []
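`_wrap_copy_friendly_lines_for_weixin` above leans on `textwrap.wrap` with `break_long_words=False` so URLs and long tokens survive intact. A simplified sketch of the same skip-code-blocks-and-tables shape; the real helper uses `_FENCE_RE` and `_TABLE_RULE_RE` regexes where this sketch uses plain prefix checks:

```python
import textwrap

FENCE = "`" * 3  # avoids embedding a literal fence marker in this sketch
WIDTH = 120

def wrap_plain_lines(text, width=WIDTH):
    # Simplified copy-friendly wrapper: soft-wrap long prose lines while
    # passing code fences, fenced content, blanks, and table rows
    # through untouched.
    out = []
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_code = not in_code
            out.append(line)
            continue
        if in_code or len(line) <= width or not stripped or stripped.startswith("|"):
            out.append(line)
            continue
        out.extend(
            textwrap.wrap(
                line,
                width=width,
                break_long_words=False,  # keep URLs and long tokens intact
                break_on_hyphens=False,
            )
            or [line]
        )
    return "\n".join(out)

wrapped = wrap_plain_lines(("word " * 40).strip())
print(max(len(l) for l in wrapped.splitlines()) <= WIDTH)  # True
```

The `or [line]` fallback preserves a line that `textwrap.wrap` would otherwise reduce to nothing.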
@@ -1037,11 +1087,11 @@ async def qr_login(
except Exception as _qr_exc:
print(f"(终端二维码渲染失败: {_qr_exc},请直接打开上面的二维码链接)")
- deadline = time.time() + timeout_seconds
+ deadline = time.monotonic() + timeout_seconds
current_base_url = ILINK_BASE_URL
refresh_count = 0
- while time.time() < deadline:
+ while time.monotonic() < deadline:
try:
status_resp = await _api_get(
session,
@@ -1216,7 +1266,12 @@ class WeixinAdapter(BasePlatformAdapter):
logger.debug("[%s] Token lock unavailable (non-fatal): %s", self.name, exc)
self._poll_session = aiohttp.ClientSession(trust_env=True, connector=_make_ssl_connector())
- self._send_session = aiohttp.ClientSession(trust_env=True, connector=_make_ssl_connector())
+ # Disable aiohttp's built-in ClientTimeout (total=None) to prevent
+ # "Timeout context manager should be used inside a task" errors when
+ # send() is invoked via asyncio.run_coroutine_threadsafe() from cron.
+ # Timeout is managed externally via asyncio.wait_for() in _api_post/_api_get.
+ _no_aiohttp_timeout = aiohttp.ClientTimeout(total=None, connect=None, sock_connect=None, sock_read=None)
+ self._send_session = aiohttp.ClientSession(trust_env=True, connector=_make_ssl_connector(), timeout=_no_aiohttp_timeout)
self._token_store.restore(self._account_id)
self._poll_task = asyncio.create_task(self._poll_loop(), name="weixin-poll")
self._mark_connected()
@@ -1824,10 +1879,14 @@ class WeixinAdapter(BasePlatformAdapter):
raise ValueError(f"Blocked unsafe URL (SSRF protection): {url}")
assert self._send_session is not None
- async with self._send_session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
- response.raise_for_status()
- data = await response.read()
- suffix = Path(url.split("?", 1)[0]).suffix or ".bin"
+ # Use asyncio.wait_for() instead of aiohttp ClientTimeout to avoid
+ # "Timeout context manager should be used inside a task" errors.
+ async def _do_fetch():
+ async with self._send_session.get(url) as response:
+ response.raise_for_status()
+ return await response.read()
+ data = await asyncio.wait_for(_do_fetch(), timeout=30)
+ suffix = Path(url.split("?", 1)[0]).suffix or ".bin"
with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as handle:
handle.write(data)
return handle.name
@@ -2006,7 +2065,7 @@ class WeixinAdapter(BasePlatformAdapter):
def format_message(self, content: Optional[str]) -> str:
if content is None:
return ""
- return _normalize_markdown_blocks(content)
+ return _wrap_copy_friendly_lines_for_weixin(_normalize_markdown_blocks(content))
async def send_weixin_direct(
diff --git a/gateway/platforms/whatsapp.py b/gateway/platforms/whatsapp.py
index 921dd70d72..ec45487039 100644
--- a/gateway/platforms/whatsapp.py
+++ b/gateway/platforms/whatsapp.py
@@ -21,6 +21,7 @@ import logging
import os
import platform
import re
+import signal
import subprocess
_IS_WINDOWS = platform.system() == "Windows"
@@ -54,19 +55,77 @@ def _kill_port_process(port: int) -> None:
except subprocess.SubprocessError:
pass
else:
- result = subprocess.run(
- ["fuser", f"{port}/tcp"],
- capture_output=True, timeout=5,
- )
- if result.returncode == 0:
- subprocess.run(
- ["fuser", "-k", f"{port}/tcp"],
+ # Try fuser first (Linux), fall back to lsof (macOS / WSL2)
+ killed = False
+ try:
+ result = subprocess.run(
+ ["fuser", f"{port}/tcp"],
capture_output=True, timeout=5,
)
+ if result.returncode == 0:
+ subprocess.run(
+ ["fuser", "-k", f"{port}/tcp"],
+ capture_output=True, timeout=5,
+ )
+ killed = True
+ except FileNotFoundError:
+ pass # fuser not installed
+
+ if not killed:
+ try:
+ result = subprocess.run(
+ ["lsof", "-ti", f":{port}"],
+ capture_output=True, text=True, timeout=5,
+ )
+ for pid_str in result.stdout.strip().splitlines():
+ try:
+ os.kill(int(pid_str), signal.SIGTERM)
+ except (ValueError, ProcessLookupError, PermissionError):
+ pass
+ except FileNotFoundError:
+ pass # lsof not installed either
except Exception:
pass
+def _kill_stale_bridge_by_pidfile(session_path: Path) -> None:
+ """Kill a bridge process recorded in a PID file from a previous run.
+
+ The bridge writes ``bridge.pid`` into the session directory when it
+ starts. If the gateway crashed without a clean shutdown the old bridge
+ process becomes orphaned — this helper finds and kills it.
+ """
+ pid_file = session_path / "bridge.pid"
+ if not pid_file.exists():
+ return
+ try:
+ pid = int(pid_file.read_text().strip())
+ except (ValueError, OSError, TypeError):
+ try:
+ pid_file.unlink()
+ except OSError:
+ pass
+ return
+ try:
+ os.kill(pid, 0) # check existence
+ os.kill(pid, signal.SIGTERM)
+ logger.info("[whatsapp] Killed stale bridge PID %d from pidfile", pid)
+ except (ProcessLookupError, PermissionError, OSError):
+ pass
+ try:
+ pid_file.unlink()
+ except OSError:
+ pass
+
+
+def _write_bridge_pidfile(session_path: Path, pid: int) -> None:
+ """Write the bridge PID to a file for later cleanup."""
+ try:
+ (session_path / "bridge.pid").write_text(str(pid))
+ except OSError:
+ pass
+
+
def _terminate_bridge_process(proc, *, force: bool = False) -> None:
"""Terminate the bridge process using process-tree semantics where possible."""
if _IS_WINDOWS:
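The pidfile lifecycle added above (write on spawn, probe-and-kill on the next startup, always unlink) can be condensed into one helper. A sketch under the same assumptions, with illustrative names:

```python
import os
import signal
import tempfile
from pathlib import Path

def kill_stale_by_pidfile(pid_file: Path) -> None:
    # Read a PID recorded by a previous run, signal it if the process
    # still exists, then always remove the file so a corrupt pidfile
    # cannot wedge future startups.
    if not pid_file.exists():
        return
    try:
        pid = int(pid_file.read_text().strip())
    except (ValueError, OSError):
        pid_file.unlink(missing_ok=True)
        return
    try:
        os.kill(pid, 0)  # existence/permission probe
        os.kill(pid, signal.SIGTERM)
    except (ProcessLookupError, PermissionError, OSError):
        pass
    pid_file.unlink(missing_ok=True)

# A garbage pidfile is removed without raising.
p = Path(tempfile.mkdtemp()) / "bridge.pid"
p.write_text("not-a-pid")
kill_stale_by_pidfile(p)
print(p.exists())  # False
```

Unlinking unconditionally is the key design choice: the pidfile is a hint, not a lock, so it must never survive a failed parse.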
@@ -158,6 +217,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
# WhatsApp message limits — practical UX limit, not protocol max.
# WhatsApp allows ~65K but long messages are unreadable on mobile.
MAX_MESSAGE_LENGTH = 4096
+ DEFAULT_REPLY_PREFIX = "⚕ *Hermes Agent*\n────────────\n"
# Default bridge location relative to the hermes-agent install
_DEFAULT_BRIDGE_DIR = Path(__file__).resolve().parents[2] / "scripts" / "whatsapp-bridge"
@@ -193,6 +253,25 @@ class WhatsAppAdapter(BasePlatformAdapter):
# notification before the normal "✓ whatsapp disconnected" fires.
self._shutting_down: bool = False
+ def _effective_reply_prefix(self) -> str:
+ """Return the prefix the Node bridge will add in self-chat mode."""
+ whatsapp_mode = os.getenv("WHATSAPP_MODE", "self-chat")
+ if whatsapp_mode != "self-chat":
+ return ""
+ if self._reply_prefix is not None:
+ return self._reply_prefix.replace("\\n", "\n")
+ env_prefix = os.getenv("WHATSAPP_REPLY_PREFIX")
+ if env_prefix is not None:
+ return env_prefix.replace("\\n", "\n")
+ return self.DEFAULT_REPLY_PREFIX
+
+ def _outgoing_chunk_limit(self) -> int:
+ """Reserve room for the bridge-side prefix so final WhatsApp text fits."""
+ prefix_len = len(self._effective_reply_prefix())
+ # Keep enough space for truncate_message's pagination indicator and
+ # code-fence repair even if a user configures a very long prefix.
+ return max(1024, self.MAX_MESSAGE_LENGTH - prefix_len)
+
def _whatsapp_require_mention(self) -> bool:
configured = self.config.extra.get("require_mention")
if configured is not None:
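`_outgoing_chunk_limit` reserves space for the bridge-side prefix before chunking. The arithmetic in isolation, with the constants copied from the class above and an illustrative function name:

```python
MAX_MESSAGE_LENGTH = 4096
DEFAULT_REPLY_PREFIX = "⚕ *Hermes Agent*\n────────────\n"

def outgoing_chunk_limit(prefix=DEFAULT_REPLY_PREFIX, floor=1024):
    # The bridge prepends `prefix` to each chunk, so chunking must leave
    # room for it; the floor preserves space for pagination indicators
    # even under a pathologically long configured prefix.
    return max(floor, MAX_MESSAGE_LENGTH - len(prefix))

print(outgoing_chunk_limit())                   # 4096 minus the prefix length
print(outgoing_chunk_limit(prefix="x" * 5000))  # 1024 (clamped to the floor)
```

Without the reservation, a chunk of exactly `MAX_MESSAGE_LENGTH` would overflow WhatsApp's practical limit once the bridge prepends its header.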
@@ -428,6 +507,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
pass # Bridge not running, start a new one
# Kill any orphaned bridge from a previous gateway run
+ _kill_stale_bridge_by_pidfile(self._session_path)
_kill_port_process(self._bridge_port)
await asyncio.sleep(1)
@@ -459,6 +539,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
preexec_fn=None if _IS_WINDOWS else os.setsid,
env=bridge_env,
)
+ _write_bridge_pidfile(self._session_path, self._bridge_process.pid)
# Wait for the bridge to connect to WhatsApp.
# Phase 1: wait for the HTTP server to come up (up to 15s).
@@ -609,6 +690,12 @@ class WhatsAppAdapter(BasePlatformAdapter):
# Bridge was not started by us, don't kill it
print(f"[{self.name}] Disconnecting (external bridge left running)")
+ # Clean up PID file
+ try:
+ (self._session_path / "bridge.pid").unlink(missing_ok=True)
+ except OSError:
+ pass
+
# Cancel the poll task explicitly
if self._poll_task and not self._poll_task.done():
self._poll_task.cancel()
@@ -713,7 +800,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
# Format and chunk the message
formatted = self.format_message(content)
- chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+ chunks = self.truncate_message(formatted, self._outgoing_chunk_limit())
last_message_id = None
for chunk in chunks:
diff --git a/gateway/run.py b/gateway/run.py
index 6047de3220..24ed660895 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -39,6 +39,7 @@ from typing import Dict, Optional, Any, List, Union
# gateway is a long-running daemon, so its boot cost matters less than
# preserving the established test-patch surface.
from agent.account_usage import fetch_account_usage, render_account_usage_lines
+from agent.i18n import t
from hermes_cli.config import cfg_get
# --- Agent cache tuning ---------------------------------------------------
@@ -93,46 +94,6 @@ def _telegramize_command_mentions(text: str, platform: Any) -> str:
_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT = 60 * 60
-# --- Stale-code self-check ------------------------------------------------
-# Long-running gateway processes that survive an ``hermes update`` keep the
-# old ``hermes_cli.config`` (and friends) cached in ``sys.modules``. When
-# the updated tool files on disk then try to ``from hermes_cli.config
-# import cfg_get`` (added in PR #17304), the import resolves against the
-# already-loaded stale module object and raises ``ImportError`` — see
-# Issue #17648. Rather than papering over the import failure site-by-site
-# in every tool file, detect the stale state centrally and auto-restart
-# so the gateway reloads with fresh code. The sentinel files below are
-# the canonical repo-level markers that every update touches; if any is
-# newer than the gateway's boot time, we know the running process is out
-# of date.
-_STALE_CODE_SENTINELS: tuple[str, ...] = (
- "hermes_cli/config.py",
- "hermes_cli/__init__.py",
- "run_agent.py",
- "gateway/run.py",
- "pyproject.toml",
-)
-
-
-def _compute_repo_mtime(repo_root: Path) -> float:
- """Return the newest mtime across the stale-code sentinel files.
-
- Missing files are ignored (they may not exist on older checkouts).
- Returns 0.0 if no sentinel file is readable — treat that as "can't
- tell", which downstream callers interpret as "not stale" to avoid
- false-positive restart loops.
- """
- newest = 0.0
- for rel in _STALE_CODE_SENTINELS:
- try:
- st = (repo_root / rel).stat()
- except (OSError, FileNotFoundError):
- continue
- if st.st_mtime > newest:
- newest = st.st_mtime
- return newest
-
-
def _coerce_gateway_timestamp(value: Any) -> Optional[float]:
"""Best-effort conversion of stored gateway timestamps to epoch seconds.
@@ -297,13 +258,18 @@ def _ensure_ssl_certs() -> None:
return
def _home_target_env_var(platform_name: str) -> str:
- """Return the configured home-target env var for a platform."""
- from cron.scheduler import _HOME_TARGET_ENV_VARS
+ """Return the configured home-target env var for a platform.
- return _HOME_TARGET_ENV_VARS.get(
- platform_name.lower(),
- f"{platform_name.upper()}_HOME_CHANNEL",
- )
+    Consults the built-in ``_HOME_TARGET_ENV_VARS`` map and the plugin
+    registry via ``cron.scheduler._resolve_home_env_var``, then falls back
+    to ``<PLATFORM>_HOME_CHANNEL`` for unknown names.
+ """
+ from cron.scheduler import _resolve_home_env_var
+
+ resolved = _resolve_home_env_var(platform_name)
+ if resolved:
+ return resolved
+ return f"{platform_name.upper()}_HOME_CHANNEL"
def _home_thread_env_var(platform_name: str) -> str:
@@ -338,6 +304,36 @@ _env_path = _hermes_home / '.env'
load_hermes_dotenv(hermes_home=_hermes_home, project_env=Path(__file__).resolve().parents[1] / '.env')
+def _reload_runtime_env_preserving_config_authority() -> None:
+ """Reload .env for fresh credentials without letting stale .env override config.
+
+ Gateway processes are long-lived, so per-turn code reloads ~/.hermes/.env to
+ pick up rotated API keys. config.yaml remains authoritative for agent budget
+ settings such as agent.max_turns; otherwise a stale HERMES_MAX_ITERATIONS in
+ .env can replace the startup bridge on later turns.
+ """
+ load_hermes_dotenv(
+ hermes_home=_hermes_home,
+ project_env=Path(__file__).resolve().parents[1] / '.env',
+ )
+
+ config_path = _hermes_home / 'config.yaml'
+ if not config_path.exists():
+ return
+ try:
+ import yaml as _yaml
+ with open(config_path, encoding="utf-8") as f:
+ cfg = _yaml.safe_load(f) or {}
+ from hermes_cli.config import _expand_env_vars
+ cfg = _expand_env_vars(cfg)
+ except Exception:
+ return
+
+ agent_cfg = cfg.get("agent", {})
+ if isinstance(agent_cfg, dict) and "max_turns" in agent_cfg:
+ os.environ["HERMES_MAX_ITERATIONS"] = str(agent_cfg["max_turns"])
+
+
_DOCKER_VOLUME_SPEC_RE = re.compile(r"^(?P<host>.+):(?P<container>/[^:]+?)(?::(?P<mode>[^:]+))?$")
_DOCKER_MEDIA_OUTPUT_CONTAINER_PATHS = {"/output", "/outputs"}
@@ -507,22 +503,22 @@ try:
_network_cfg = (_cfg if '_cfg' in dir() else {}).get("network", {})
if isinstance(_network_cfg, dict) and _network_cfg.get("force_ipv4"):
apply_ipv4_preference(force=True)
-except Exception:
- pass
+except Exception as _bootstrap_exc:
+ print(f" Warning: IPv4 preference application failed: {_bootstrap_exc}", file=sys.stderr)
# Validate config structure early — log warnings so gateway operators see problems
try:
from hermes_cli.config import print_config_warnings
print_config_warnings()
-except Exception:
- pass
+except Exception as _bootstrap_exc:
+ print(f" Warning: config validation failed: {_bootstrap_exc}", file=sys.stderr)
# Warn if user has deprecated MESSAGING_CWD / TERMINAL_CWD in .env
try:
from hermes_cli.config import warn_deprecated_cwd_env_vars
warn_deprecated_cwd_env_vars()
-except Exception:
- pass
+except Exception as _bootstrap_exc:
+ print(f" Warning: deprecation check failed: {_bootstrap_exc}", file=sys.stderr)
# Gateway runs in quiet mode - suppress debug output and use cwd directly (no temp dirs)
os.environ["HERMES_QUIET"] = "1"
@@ -652,7 +648,11 @@ def _try_resolve_fallback_provider() -> dict | None:
explicit_base_url=entry.get("base_url"),
explicit_api_key=entry.get("api_key"),
)
- logger.info("Fallback provider resolved: %s", runtime.get("provider"))
+ logger.info(
+ "Fallback provider resolved: %s model=%s",
+ runtime.get("provider"),
+ entry.get("model"),
+ )
return {
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
@@ -661,6 +661,7 @@ def _try_resolve_fallback_provider() -> dict | None:
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
"credential_pool": runtime.get("credential_pool"),
+ "model": entry.get("model"),
}
except Exception as fb_exc:
logger.debug("Fallback entry %s failed: %s", entry.get("provider"), fb_exc)
@@ -978,6 +979,72 @@ import weakref as _weakref
_gateway_runner_ref: _weakref.ref = lambda: None
+def _normalize_empty_agent_response(
+ agent_result: dict,
+ response: str,
+ *,
+ history_len: int = 0,
+) -> str:
+ """Normalize empty/None agent responses into user-facing messages.
+
+ Consolidates the existing ``failed`` handler and adds a catch-all for
+ the case where the agent did work (api_calls > 0) but returned no text.
+ Fix for #18765.
+ """
+ if response:
+ return response
+
+ if agent_result.get("failed"):
+ error_detail = agent_result.get("error", "unknown error")
+ error_str = str(error_detail).lower()
+ is_context_failure = any(
+ p in error_str
+ for p in ("context", "token", "too large", "too long", "exceed", "payload")
+ ) or ("400" in error_str and history_len > 50)
+ if is_context_failure:
+ return (
+ "⚠️ Session too large for the model's context window.\n"
+ "Use /compact to compress the conversation, or "
+ "/reset to start fresh."
+ )
+ return (
+ f"The request failed: {str(error_detail)[:300]}\n"
+ "Try again or use /reset to start a fresh session."
+ )
+
+ api_calls = int(agent_result.get("api_calls", 0) or 0)
+ if api_calls > 0 and not agent_result.get("interrupted"):
+ if agent_result.get("partial"):
+ err = agent_result.get("error", "processing incomplete")
+ return f"⚠️ Processing stopped: {str(err)[:200]}. Try again."
+ return (
+ "⚠️ Processing completed but no response was generated. "
+ "This may be a transient error — try sending your message again."
+ )
+
+ return response
+
+
+def _should_clear_resume_pending_after_turn(agent_result: dict) -> bool:
+ """Return True only when a gateway turn really completed successfully.
+
+ Restart recovery uses ``resume_pending`` as a durable marker for sessions
+ interrupted during gateway drain. A soft interrupt can still bubble out as
+ a syntactically normal agent result with an empty final response; clearing
+ the marker in that case loses the recovery signal and startup auto-resume
+ has nothing to schedule.
+ """
+ if not isinstance(agent_result, dict):
+ return False
+ if agent_result.get("interrupted"):
+ return False
+ if agent_result.get("failed") or agent_result.get("partial") or agent_result.get("error"):
+ return False
+ if agent_result.get("completed") is False:
+ return False
+ return True
+
+
class GatewayRunner:
"""
Main gateway controller.
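The `_should_clear_resume_pending_after_turn` predicate added above is pure and easy to table-test. Restated standalone, with the name shortened for the sketch:

```python
def should_clear_resume_pending(agent_result) -> bool:
    # Only a fully clean turn clears the durable resume_pending marker;
    # interrupts, partial results, errors, or an explicit
    # completed=False all preserve it for startup auto-resume.
    if not isinstance(agent_result, dict):
        return False
    if agent_result.get("interrupted"):
        return False
    if agent_result.get("failed") or agent_result.get("partial") or agent_result.get("error"):
        return False
    if agent_result.get("completed") is False:
        return False
    return True

print(should_clear_resume_pending({"completed": True}))    # True
print(should_clear_resume_pending({"interrupted": True}))  # False
print(should_clear_resume_pending({"error": "boom"}))      # False
```

Note the asymmetry: `completed` missing is fine (legacy results), but `completed=False` is an explicit veto.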
@@ -1000,12 +1067,6 @@ class GatewayRunner:
_stop_task: Optional[asyncio.Task] = None
_session_model_overrides: Dict[str, Dict[str, str]] = {}
_session_reasoning_overrides: Dict[str, Dict[str, Any]] = {}
- # Stale-code self-check defaults (see _detect_stale_code()). Class-level
- # so tests that construct GatewayRunner via ``object.__new__`` without
- # running __init__ don't crash when _handle_message reads these.
- _boot_wall_time: float = 0.0
- _boot_repo_mtime: float = 0.0
- _stale_code_restart_triggered: bool = False
def __init__(self, config: Optional[GatewayConfig] = None):
global _gateway_runner_ref
@@ -1014,22 +1075,6 @@ class GatewayRunner:
self._warn_if_docker_media_delivery_is_risky()
_gateway_runner_ref = _weakref.ref(self)
- # Boot-time snapshot used by the stale-code self-check. Captured
- # before any work happens so post-update file writes are guaranteed
- # to have newer mtimes. See _detect_stale_code() / Issue #17648.
- try:
- self._boot_wall_time: float = time.time()
- self._repo_root_for_staleness: Path = Path(__file__).resolve().parent.parent
- self._boot_repo_mtime: float = _compute_repo_mtime(
- self._repo_root_for_staleness,
- )
- except Exception:
- self._boot_wall_time = 0.0
- self._repo_root_for_staleness = Path(".")
- self._boot_repo_mtime = 0.0
- self._stale_code_notified: set[str] = set()
- self._stale_code_restart_triggered: bool = False
-
# Load ephemeral config from config.yaml / env vars.
# Both are injected at API-call time only and never persisted.
self._prefill_messages = self._load_prefill_messages()
@@ -1050,6 +1095,7 @@ class GatewayRunner:
)
self.delivery_router = DeliveryRouter(self.config)
self._running = False
+ self._gateway_loop: Optional[asyncio.AbstractEventLoop] = None
self._shutdown_event = asyncio.Event()
self._exit_cleanly = False
self._exit_with_failure = False
@@ -1080,6 +1126,13 @@ class GatewayRunner:
self._pending_native_image_paths_by_session: Dict[str, List[str]] = {}
self._busy_ack_ts: Dict[str, float] = {} # last busy-ack timestamp per session (debounce)
self._session_run_generation: Dict[str, int] = {}
+ # LRU cache of live SessionSources keyed by session_key. Used by
+ # fallback routing paths (shutdown notifications, synthetic
+ # background-process events) when the persisted origin is missing
+ # and _parse_session_key can't recover thread_id. Capped so it
+ # cannot grow unbounded over a long-running gateway lifetime.
+ self._session_sources: "OrderedDict[str, SessionSource]" = OrderedDict()
+ self._session_sources_max = 512
# Cache AIAgent instances per session to preserve prompt caching.
# Without this, a new AIAgent is created per message, rebuilding the
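The capped `OrderedDict` cache described above is the classic LRU shape: `move_to_end()` on every write, `popitem(last=False)` to evict the oldest entry. A minimal sketch with an illustrative class name:

```python
from collections import OrderedDict

class LRUCache:
    # Minimal version of the session-source cache: bounded size,
    # recently-written keys survive, oldest keys are evicted first.
    def __init__(self, maxsize=512):
        self._data = OrderedDict()
        self._maxsize = maxsize

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)  # mark as most recently used
        while len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        return self._data.get(key)

cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)  # evicts "a", the oldest entry
print(cache.get("a"), cache.get("c"))  # None 3
```

The cap is what makes the cache safe in a long-lived gateway: memory use is bounded regardless of how many sessions come and go.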
@@ -1174,6 +1227,7 @@ class GatewayRunner:
retention_days=int(_ckpt_cfg.get("retention_days", 7)),
min_interval_hours=int(_ckpt_cfg.get("min_interval_hours", 24)),
delete_orphans=bool(_ckpt_cfg.get("delete_orphans", True)),
+ max_total_size_mb=int(_ckpt_cfg.get("max_total_size_mb", 500)),
)
except Exception as exc:
logger.debug("checkpoint auto-maintenance skipped: %s", exc)
@@ -1454,6 +1508,118 @@ class GatewayRunner:
thread_sessions_per_user=getattr(config, "thread_sessions_per_user", False),
)
+ def _telegram_topic_mode_enabled(self, source: SessionSource) -> bool:
+ """Return whether Telegram DM topic mode is active for this chat."""
+ if source.platform != Platform.TELEGRAM or source.chat_type != "dm":
+ return False
+ session_db = getattr(self, "_session_db", None)
+ if session_db is None:
+ return False
+ try:
+ raw = session_db.is_telegram_topic_mode_enabled(
+ chat_id=str(source.chat_id),
+ user_id=str(source.user_id),
+ )
+ except Exception:
+ logger.debug("Failed to read Telegram topic mode state", exc_info=True)
+ return False
+ # Only honor a real True from the SessionDB. Any other value
+ # (including MagicMock instances from test fixtures that didn't
+ # opt into topic mode) means topic mode is off for this chat.
+ return raw is True
+
+ # Telegram's General (pinned top) topic in forum-enabled private chats.
+ # Bot API behavior varies: some clients omit message_thread_id for
+ # General, others send "1". Treat both as "root" for lobby/lane purposes.
+ _TELEGRAM_GENERAL_TOPIC_IDS = frozenset({"", "1"})
+
+ def _is_telegram_topic_root_lobby(self, source: SessionSource) -> bool:
+ """True for the main Telegram DM (or General topic) when topic mode has made it a lobby."""
+ if source.platform != Platform.TELEGRAM or source.chat_type != "dm":
+ return False
+ if not self._telegram_topic_mode_enabled(source):
+ return False
+ tid = str(source.thread_id or "")
+ return tid in self._TELEGRAM_GENERAL_TOPIC_IDS
+
+ def _is_telegram_topic_lane(self, source: SessionSource) -> bool:
+ """True for a user-created Telegram private-chat topic lane."""
+ if source.platform != Platform.TELEGRAM or source.chat_type != "dm":
+ return False
+ if not self._telegram_topic_mode_enabled(source):
+ return False
+ tid = str(source.thread_id or "")
+ if not tid or tid in self._TELEGRAM_GENERAL_TOPIC_IDS:
+ return False
+ return True
+
+ _TELEGRAM_LOBBY_REMINDER_COOLDOWN_S = 30.0
+
+ def _should_send_telegram_lobby_reminder(self, source: SessionSource) -> bool:
+ """Rate-limit root-DM lobby reminders to one message per cooldown window.
+
+ A user who forgets multi-session mode is enabled and types several
+ prompts in the root DM would otherwise get a reminder for every
+ message. Cap it so the first one lands and the rest stay quiet.
+ """
+ if not hasattr(self, "_telegram_lobby_reminder_ts"):
+ self._telegram_lobby_reminder_ts = {}
+ chat_id = str(source.chat_id or "")
+ if not chat_id:
+ return True
+ import time as _time
+ now = _time.monotonic()
+ last = self._telegram_lobby_reminder_ts.get(chat_id, 0.0)
+ if now - last < self._TELEGRAM_LOBBY_REMINDER_COOLDOWN_S:
+ return False
+ self._telegram_lobby_reminder_ts[chat_id] = now
+ return True
+
+ def _telegram_topic_root_lobby_message(self) -> str:
+ return (
+ "This main chat is reserved for system commands.\n\n"
+ "To start a new Hermes chat, open the All Messages topic at the top "
+ "of this bot interface and send any message there. Telegram will "
+ "create a new topic for that message; each topic works as an "
+ "independent Hermes session."
+ )
+
+ def _telegram_topic_root_new_message(self) -> str:
+ return (
+ "To start a new parallel Hermes chat, open the All Messages topic "
+ "at the top of this bot interface and send any message there. "
+ "Telegram will create a new topic for it.\n\n"
+ "Each topic is an independent Hermes session. Use /new inside an "
+ "existing topic only if you want to replace that topic's current session."
+ )
+
+ def _telegram_topic_new_header(self, source: SessionSource) -> Optional[str]:
+ if not self._is_telegram_topic_lane(source):
+ return None
+ return (
+ "Started a new Hermes session in this topic.\n\n"
+ "Tip: for parallel work, open All Messages and send a message there "
+ "to create a separate topic instead of using /new here. /new replaces "
+ "the session attached to the current topic."
+ )
+
+ def _record_telegram_topic_binding(
+ self,
+ source: SessionSource,
+ session_entry,
+ ) -> None:
+ """Persist the Telegram topic -> Hermes session binding for topic lanes."""
+ session_db = getattr(self, "_session_db", None)
+ if session_db is None or not source.chat_id or not source.thread_id:
+ return
+ session_db.bind_telegram_topic(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ user_id=str(source.user_id or ""),
+ session_key=session_entry.session_key,
+ session_id=session_entry.session_id,
+ )
+
def _resolve_session_agent_runtime(
self,
*,
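The lobby-reminder debounce in `_should_send_telegram_lobby_reminder` above is a per-key cooldown gate on the monotonic clock. The same shape extracted, with illustrative names:

```python
import time

class CooldownGate:
    # At most one event per key within the cooldown window, measured on
    # the monotonic clock so wall-clock jumps cannot re-open or extend
    # the window.
    def __init__(self, cooldown_s=30.0):
        self._cooldown_s = cooldown_s
        self._last = {}

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last.get(key, float("-inf")) < self._cooldown_s:
            return False
        self._last[key] = now
        return True

gate = CooldownGate(cooldown_s=30.0)
print(gate.allow("chat-1", now=100.0))  # True: first message lands
print(gate.allow("chat-1", now=110.0))  # False: inside the window
print(gate.allow("chat-1", now=131.0))  # True: window elapsed
```

The `now` parameter exists only to make the sketch deterministic; production callers rely on `time.monotonic()`.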
@@ -1505,6 +1671,14 @@ class GatewayRunner:
)
runtime_kwargs = _resolve_runtime_agent_kwargs()
+ runtime_model = runtime_kwargs.pop("model", None)
+ if runtime_model:
+ logger.info(
+ "Runtime provider supplied explicit model override: %s -> %s",
+ model,
+ runtime_model,
+ )
+ model = runtime_model
if override and resolved_session_key:
model, runtime_kwargs = self._apply_session_model_override(
resolved_session_key, model, runtime_kwargs
@@ -2332,6 +2506,9 @@ class GatewayRunner:
e,
)
+ if source is None:
+ source = self._get_cached_session_source(session_key)
+
if source is not None:
platform_str = source.platform.value
chat_id = str(source.chat_id)
@@ -2359,6 +2536,14 @@ class GatewayRunner:
if not adapter:
continue
+ platform_cfg = self.config.platforms.get(platform)
+ if platform_cfg is not None and not platform_cfg.gateway_restart_notification:
+ logger.info(
+ "Shutdown notification suppressed for active session: %s has gateway_restart_notification=false",
+ platform_str,
+ )
+ continue
+
# Include thread_id if present so the message lands in the
# correct forum topic / thread.
metadata = {"thread_id": thread_id} if thread_id else None
@@ -2384,11 +2569,24 @@ class GatewayRunner:
platform_str, chat_id, e,
)
- for platform, adapter in self.adapters.items():
+ # Snapshot adapters up front: adapter.send() can hit a fatal error
+ # path that pops the adapter from self.adapters (see _handle_fatal
+ # elsewhere), which would otherwise trigger
+ # ``RuntimeError: dictionary changed size during iteration`` —
+ # observed in a user report during gateway shutdown.
+ for platform, adapter in list(self.adapters.items()):
home = self.config.get_home_channel(platform)
if not home or not home.chat_id:
continue
+ platform_cfg = self.config.platforms.get(platform)
+ if platform_cfg is not None and not platform_cfg.gateway_restart_notification:
+ logger.info(
+ "Shutdown notification suppressed for home channel: %s has gateway_restart_notification=false",
+ platform.value,
+ )
+ continue
+
dedup_key = (platform.value, str(home.chat_id), str(home.thread_id) if home.thread_id else None)
if dedup_key in notified:
continue
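The `list(self.adapters.items())` snapshot above is the standard defense against "dictionary changed size during iteration". A small demonstration of why the snapshot matters:

```python
# Mutating a dict while iterating its live view raises RuntimeError,
# and here a send() failure path can pop an adapter mid-loop.
# Iterating a list() snapshot makes the loop immune to removal.
adapters = {"telegram": "a", "whatsapp": "b", "weixin": "c"}

removed = []
for name, adapter in list(adapters.items()):
    # Simulate a fatal-error handler removing the current adapter.
    adapters.pop(name, None)
    removed.append(name)

print(sorted(removed))  # ['telegram', 'weixin', 'whatsapp']
print(adapters)         # {}
```

The snapshot costs one shallow copy per shutdown pass, a negligible price for crash-proof iteration.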
@@ -2624,62 +2822,82 @@ class GatewayRunner:
task.add_done_callback(self._background_tasks.discard)
return True
- def _detect_stale_code(self) -> bool:
- """Return True if source files on disk are newer than the running process.
+ # Drain-timeout reasons set by _stop_impl() when a still-running turn is
+ # force-interrupted; "restart_interrupted" is set by
+ # SessionStore.suspend_recently_active() on crash recovery (no
+ # .clean_shutdown marker). All three mean "the agent was mid-turn and
+ # we killed it" — eligible for startup auto-resume.
+ _AUTO_RESUME_REASONS = frozenset(
+ {"restart_timeout", "shutdown_timeout", "restart_interrupted"}
+ )
- A gateway that survives ``hermes update`` (manual SIGTERM never
- escalated, systemd restart race, detached-process respawn failed,
- etc.) keeps pre-update modules cached in ``sys.modules``. Later
- imports of names added post-update — e.g. ``cfg_get`` from PR
- #17304 — raise ImportError against the stale module object (see
- Issue #17648). Detecting this at the source — "the code on disk
- is newer than me" — lets us auto-restart instead of serving
- broken responses until the user notices and runs
- ``hermes gateway restart`` manually.
+ def _schedule_resume_pending_sessions(self) -> int:
+ """Auto-continue fresh restart-interrupted sessions after startup.
- Returns False when the boot-time snapshot is unavailable or no
- sentinel file is readable, to avoid false-positive restart loops
- in unusual checkouts (sparse clones, read-only filesystems).
+ ``resume_pending`` already preserves the transcript AND the existing
+ ``_is_resume_pending`` branch in ``_handle_message_with_agent``
+ injects a reason-aware recovery system note on the next turn. This
+ method closes the UX gap by synthesizing that next turn once
+ adapters are back online — the event text is empty, so the existing
+ injection path owns the wording and the note is never duplicated.
+
+ Adapters that are not yet ready (adapter missing from
+ ``self.adapters``) are skipped silently; their sessions stay
+ ``resume_pending`` and will auto-resume on the next real user
+ message, or on the next gateway startup.
"""
- if not self._boot_wall_time or not self._boot_repo_mtime:
- return False
+ window = _auto_continue_freshness_window()
try:
- current = _compute_repo_mtime(self._repo_root_for_staleness)
- except Exception:
- return False
- if current <= 0.0:
- return False
- # 2-second slack guards against filesystems with coarse mtime
- # resolution (FAT32, some NFS mounts). Real updates always move
- # the newest-file mtime forward by minutes, so this doesn't hide
- # genuine staleness.
- return current > self._boot_repo_mtime + 2.0
-
- def _trigger_stale_code_restart(self) -> None:
- """Idempotently kick off a graceful restart after stale-code detection.
-
- Runs at most once per process. The restart request goes through
- the normal drain path so in-flight agent turns finish before the
- process exits; the service manager (systemd / launchd / detached
- profile watcher) then respawns with fresh code. On manual
- ``hermes gateway run`` installs without a supervisor, the
- process exits and the user must restart by hand — but they get a
- user-visible message telling them so.
- """
- if self._stale_code_restart_triggered:
- return
- self._stale_code_restart_triggered = True
- logger.warning(
- "Stale-code self-check: source files newer than gateway boot "
- "time (boot=%.0f, newest=%.0f) — requesting graceful restart. "
- "See Issue #17648.",
- self._boot_repo_mtime,
- _compute_repo_mtime(self._repo_root_for_staleness),
- )
- try:
- self.request_restart(detached=False, via_service=True)
+ with self.session_store._lock: # noqa: SLF001 — snapshot under lock
+ self.session_store._ensure_loaded_locked() # noqa: SLF001
+ candidates = [
+ entry for entry in self.session_store._entries.values() # noqa: SLF001
+ if entry.resume_pending
+ and not entry.suspended
+ and entry.origin is not None
+ and entry.resume_reason in self._AUTO_RESUME_REASONS
+ ]
except Exception as exc:
- logger.error("Stale-code restart request failed: %s", exc)
+ logger.warning("Failed to enumerate resume-pending sessions: %s", exc)
+ return 0
+
+ now = datetime.now()
+ scheduled = 0
+ for entry in candidates:
+ marker = entry.last_resume_marked_at or entry.updated_at
+ if marker is not None and (now - marker).total_seconds() > window:
+ continue
+
+ source = entry.origin
+ adapter = self.adapters.get(source.platform)
+ if adapter is None:
+ logger.debug(
+ "Skipping auto-resume for %s: adapter not ready for %s",
+ entry.session_key,
+ getattr(source.platform, "value", source.platform),
+ )
+ continue
+
+ # Empty-text internal event — the _is_resume_pending branch in
+ # _handle_message_with_agent prepends the proper reason-aware
+ # system note before the turn runs.
+ event = MessageEvent(
+ text="",
+ message_type=MessageType.TEXT,
+ source=source,
+ internal=True,
+ )
+ task = asyncio.create_task(adapter.handle_message(event))
+ self._background_tasks.add(task)
+ task.add_done_callback(self._background_tasks.discard)
+ scheduled += 1
+
+ if scheduled:
+ logger.info(
+ "Scheduled auto-resume for %d restart-interrupted session(s)",
+ scheduled,
+ )
+ return scheduled
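The freshness gate inside the loop above can be isolated as follows — a minimal sketch, where the 15-minute window and the helper name are hypothetical stand-ins for `_auto_continue_freshness_window()`:

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW_S = 15 * 60  # hypothetical freshness window, in seconds

def is_fresh(marker: Optional[datetime], now: datetime,
             window_s: float = WINDOW_S) -> bool:
    # A missing marker counts as fresh — mirroring the loop above,
    # which only skips when a marker exists AND is older than the window.
    if marker is None:
        return True
    return (now - marker).total_seconds() <= window_s

now = datetime(2024, 1, 1, 12, 0, 0)
recent = is_fresh(now - timedelta(minutes=5), now)
stale = is_fresh(now - timedelta(hours=2), now)
unmarked = is_fresh(None, now)
```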
async def start(self) -> bool:
"""
@@ -2688,6 +2906,10 @@ class GatewayRunner:
Returns True if at least one adapter connected successfully.
"""
logger.info("Starting Hermes Gateway...")
+ try:
+ self._gateway_loop = asyncio.get_running_loop()
+ except RuntimeError:
+ self._gateway_loop = None
logger.info("Session storage: %s", self.config.sessions_dir)
# Log the resolved max_iterations budget so operators can verify the
# config.yaml → env bridge did the right thing at a glance (instead
@@ -2701,6 +2923,29 @@ class GatewayRunner:
)
except Exception:
pass
+ # Redaction status: ON by default (#17691). Surface a prominent
+ # warning if an operator has explicitly opted out so they don't
+ # forget the downgrade is active — the redactor snapshots its
+ # state at import time, so this log line is the source of truth
+ # for this process's lifetime.
+ try:
+ _redact_raw = os.getenv("HERMES_REDACT_SECRETS", "true")
+ _redact_on = _redact_raw.lower() in ("1", "true", "yes", "on")
+ if _redact_on:
+ logger.info(
+ "Secret redaction: ENABLED (tool output, logs, and chat "
+ "responses are scrubbed before delivery)"
+ )
+ else:
+ logger.warning(
+ "Secret redaction: DISABLED (HERMES_REDACT_SECRETS=%s). "
+ "API keys and tokens may appear verbatim in chat output, "
+ "session JSONs, and logs. Set security.redact_secrets: true "
+ "in config.yaml to re-enable.",
+ _redact_raw,
+ )
+ except Exception:
+ pass
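The env parsing above follows a common default-on boolean-flag pattern. A standalone sketch (the helper name and demo variable are illustrative, not from the codebase):

```python
import os

def env_flag(name: str, default: str = "true") -> bool:
    # Accept the same truthy spellings as the redaction check above;
    # anything else — including typos — reads as False.
    return os.getenv(name, default).lower() in ("1", "true", "yes", "on")

os.environ["HERMES_DEMO_FLAG"] = "YES"
enabled = env_flag("HERMES_DEMO_FLAG")
default_on = env_flag("HERMES_DEMO_FLAG_UNSET")  # unset → default "true"
os.environ["HERMES_DEMO_FLAG"] = "0"
disabled = env_flag("HERMES_DEMO_FLAG")
```

Note the asymmetry: an unset variable inherits the default, but a set-but-unrecognized value (e.g. `"enabled"`) silently disables the flag — which is why the code above logs the raw value in the warning.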
try:
from hermes_cli.profiles import get_active_profile_name
_profile = get_active_profile_name()
@@ -3065,6 +3310,12 @@ class GatewayRunner:
skip_targets=skip_home_targets,
)
+ # Automatically continue fresh sessions that were interrupted by the
+ # previous gateway restart/shutdown. The resume_pending flag is cleared
+ # by the normal successful-turn path, so a failed auto-resume remains
+ # visible for manual recovery on the next user message.
+ self._schedule_resume_pending_sessions()
+
# Drain any recovered process watchers (from crash recovery checkpoint)
try:
from tools.process_registry import process_registry
@@ -3578,6 +3829,29 @@ class GatewayRunner:
if interval < 1.0:
interval = 1.0 # sanity floor — tighter than this is a footgun
+ # Read max_spawn config to limit concurrent kanban tasks
+ max_spawn = kanban_cfg.get("max_spawn", None)
+ if max_spawn is not None:
+ logger.info("kanban dispatcher: max_spawn=%s", max_spawn)
+
+ raw_failure_limit = kanban_cfg.get("failure_limit", _kb.DEFAULT_FAILURE_LIMIT)
+ try:
+ failure_limit = int(raw_failure_limit)
+ except (TypeError, ValueError):
+ logger.warning(
+ "kanban dispatcher: invalid kanban.failure_limit=%r; using default %d",
+ raw_failure_limit,
+ _kb.DEFAULT_FAILURE_LIMIT,
+ )
+ failure_limit = _kb.DEFAULT_FAILURE_LIMIT
+ if failure_limit < 1:
+ logger.warning(
+ "kanban dispatcher: kanban.failure_limit=%r is below 1; using default %d",
+ raw_failure_limit,
+ _kb.DEFAULT_FAILURE_LIMIT,
+ )
+ failure_limit = _kb.DEFAULT_FAILURE_LIMIT
+
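The `failure_limit` validation above is a coerce-then-range-check pattern that can be factored into one helper — a sketch, with `DEFAULT_FAILURE_LIMIT` as a hypothetical stand-in for the real `_kb` constant:

```python
DEFAULT_FAILURE_LIMIT = 3  # illustrative value

def resolve_failure_limit(raw) -> int:
    # Bad types (None, lists, non-numeric strings) fall back to the
    # default, as do values below the sanity floor of 1.
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_FAILURE_LIMIT
    if value < 1:
        return DEFAULT_FAILURE_LIMIT
    return value
```

Catching `TypeError` alongside `ValueError` is the key detail: `int(None)` raises `TypeError`, not `ValueError`, so catching only the latter would crash on a YAML `failure_limit:` with no value.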
# Initial delay so the gateway finishes wiring adapters before the
# dispatcher spawns workers (those workers may hit gateway notify
# subscriptions etc.). Matches the notifier watcher's delay.
@@ -3606,7 +3880,12 @@ class GatewayRunner:
_kb.init_db(board=slug) # idempotent, handles first-run
except Exception:
pass
- return _kb.dispatch_once(conn, board=slug)
+ return _kb.dispatch_once(
+ conn,
+ board=slug,
+ max_spawn=max_spawn,
+ failure_limit=failure_limit,
+ )
except Exception:
logger.exception("kanban dispatcher: tick failed on board %s", slug)
return None
@@ -3635,7 +3914,17 @@ class GatewayRunner:
return out
def _ready_nonempty() -> bool:
- """Cheap probe: is there a ready+assigned+unclaimed task on ANY board?"""
+ """Cheap probe: is there at least one ready+assigned+unclaimed
+ task on ANY board whose assignee maps to a real Hermes profile
+ (i.e. one the dispatcher would actually spawn for)?
+
+ Tasks assigned to control-plane lanes (e.g. ``orion-cc``,
+ ``orion-research``) are pulled by terminals via
+ ``claim_task`` directly and are never spawnable, so a queue full
+ of those is "correctly idle", not "stuck". Filtering them out
+ here keeps the stuck-warning firing only on real failures (broken
+ PATH, missing venv, credential loss for a real Hermes profile).
+ """
try:
boards = _kb.list_boards(include_archived=False)
except Exception:
@@ -3645,12 +3934,7 @@ class GatewayRunner:
conn = None
try:
conn = _kb.connect(board=slug)
- row = conn.execute(
- "SELECT 1 FROM tasks "
- "WHERE status = 'ready' AND assignee IS NOT NULL "
- " AND claim_lock IS NULL LIMIT 1"
- ).fetchone()
- if row is not None:
+ if _kb.has_spawnable_ready(conn):
return True
except Exception:
continue
@@ -4602,27 +4886,6 @@ class GatewayRunner:
"""
source = event.source
- # Stale-code self-check (Issue #17648). A gateway that survives
- # ``hermes update`` keeps old modules cached in sys.modules; the
- # first inbound message is our earliest safe chance to detect
- # this and restart gracefully before we dispatch to the agent
- # and hit ImportError on freshly-added names (e.g. cfg_get).
- # Idempotent — runs the real check at most once per message, and
- # request_restart() no-ops after the first call.
- try:
- if self._detect_stale_code():
- self._trigger_stale_code_restart()
- # Acknowledge to the user so they don't see a silent
- # drop; the gateway will be back up in a moment via the
- # service manager / profile-watcher respawn.
- return (
- "⟳ Gateway code was updated in the background — "
- "restarting this gateway so your next message runs "
- "on the new code. Please retry in a moment."
- )
- except Exception as _stale_exc:
- logger.debug("Stale-code self-check failed: %s", _stale_exc)
-
# Internal events (e.g. background-process completion notifications)
# are system-generated and must skip user authorization.
is_internal = bool(getattr(event, "internal", False))
@@ -4749,10 +5012,12 @@ class GatewayRunner:
response_text = raw
if response_text:
response_path = _hermes_home / ".update_response"
+ prompt_path = _hermes_home / ".update_prompt.json"
try:
tmp = response_path.with_suffix(".tmp")
tmp.write_text(response_text)
tmp.replace(response_path)
+ prompt_path.unlink(missing_ok=True)
except OSError as e:
logger.warning("Failed to write update response: %s", e)
return f"✗ Failed to send response to update process: {e}"
@@ -4767,10 +5032,12 @@ class GatewayRunner:
# The slash command then falls through to normal dispatch.
if _recognized_cmd:
response_path = _hermes_home / ".update_response"
+ prompt_path = _hermes_home / ".update_prompt.json"
try:
tmp = response_path.with_suffix(".tmp")
tmp.write_text("")
tmp.replace(response_path)
+ prompt_path.unlink(missing_ok=True)
logger.info(
"Recognized /%s during pending update prompt for %s; "
"cancelled prompt with default and dispatching command",
@@ -5274,7 +5541,12 @@ class GatewayRunner:
break
if canonical == "new":
+ if self._is_telegram_topic_root_lobby(source):
+ return self._telegram_topic_root_new_message()
return await self._handle_reset_command(event)
+
+ if canonical == "topic":
+ return await self._handle_topic_command(event)
if canonical == "help":
return await self._handle_help_command(event)
@@ -5523,6 +5795,13 @@ class GatewayRunner:
# No bare text matching — "yes" in normal conversation must not trigger
# execution of a dangerous command.
+ if self._is_telegram_topic_root_lobby(source):
+ # Debounce the lobby reminder so a user who forgets about
+ # topic mode and fires ten prompts doesn't get ten copies.
+ if self._should_send_telegram_lobby_reminder(source):
+ return self._telegram_topic_root_lobby_message()
+ return None
+
# ── Claim this session before any await ───────────────────────
# Between here and _run_agent registering the real AIAgent, there
# are numerous await points (hooks, vision enrichment, STT,
@@ -5690,6 +5969,7 @@ class GatewayRunner:
if event.media_urls and event.message_type == MessageType.DOCUMENT:
import mimetypes as _mimetypes
+ from tools.credential_files import to_agent_visible_cache_path
_TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".log", ".json", ".xml", ".yaml", ".yml", ".toml", ".ini", ".cfg"}
for i, path in enumerate(event.media_urls):
@@ -5710,16 +5990,21 @@ class GatewayRunner:
display_name = parts[2] if len(parts) >= 3 else basename
display_name = re.sub(r'[^\w.\- ]', '_', display_name)
+ # Translate host cache path to in-container path if running under Docker backend.
+ # This ensures the agent receives a path it can open inside its sandbox, as the
+ # cache directories are auto-mounted at /root/.hermes/cache/* by get_cache_directory_mounts().
+ agent_path = to_agent_visible_cache_path(path)
+
if mtype.startswith("text/"):
context_note = (
f"[The user sent a text document: '{display_name}'. "
f"Its content has been included below. "
- f"The file is also saved at: {path}]"
+ f"The file is also saved at: {agent_path}]"
)
else:
context_note = (
f"[The user sent a document: '{display_name}'. "
- f"The file is saved at: {path}. "
+ f"The file is saved at: {agent_path}. "
f"Ask the user what they'd like you to do with it.]"
)
message_text = f"{context_note}\n\n{message_text}"
@@ -5784,6 +6069,41 @@ class GatewayRunner:
return []
return list(pending_native.pop(session_key, []) or [])
+ def _cache_session_source(self, session_key: str, source) -> None:
+ if not session_key or source is None:
+ return
+ cached_sources = getattr(self, "_session_sources", None)
+ if cached_sources is None:
+ cached_sources = OrderedDict()
+ self._session_sources = cached_sources
+ try:
+ cached_sources[session_key] = dataclasses.replace(source)
+ except Exception:
+ logger.debug("Failed to cache live session source for %s", session_key, exc_info=True)
+ return
+ # LRU: mark as most-recently-used and trim to max size.
+ try:
+ cached_sources.move_to_end(session_key)
+ max_size = getattr(self, "_session_sources_max", 512)
+ while len(cached_sources) > max_size:
+ cached_sources.popitem(last=False)
+ except Exception:
+ pass
+
+ def _get_cached_session_source(self, session_key: str):
+ if not session_key:
+ return None
+ cached_sources = getattr(self, "_session_sources", None)
+ if not cached_sources:
+ return None
+ source = cached_sources.get(session_key)
+ if source is not None:
+ try:
+ cached_sources.move_to_end(session_key)
+ except Exception:
+ pass
+ return source
+
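The `OrderedDict`-based LRU used by `_cache_session_source`/`_get_cached_session_source` can be sketched as a self-contained class (names and the 512 default are illustrative):

```python
from collections import OrderedDict

class SourceCache:
    """Bounded LRU cache: inserts and reads both refresh recency."""

    def __init__(self, max_size: int = 512) -> None:
        self._data: OrderedDict = OrderedDict()
        self._max = max_size

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)          # mark most-recently-used
        while len(self._data) > self._max:
            self._data.popitem(last=False)   # evict least-recently-used

    def get(self, key):
        value = self._data.get(key)
        if value is not None:
            self._data.move_to_end(key)      # a read also refreshes recency
        return value

cache = SourceCache(max_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # refresh "a" so "b" becomes the LRU entry
cache.put("c", 3)  # evicts "b"
```

Refreshing on `get` (not just `put`) is what keeps a hot session's source pinned even when many other sessions churn through the cache.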
async def _handle_message_with_agent(self, event, source, _quick_key: str, run_generation: int):
"""Inner handler that runs under the _running_agents sentinel guard."""
_msg_start_time = time.time()
@@ -5798,6 +6118,32 @@ class GatewayRunner:
# Get or create session
session_entry = self.session_store.get_or_create_session(source)
session_key = session_entry.session_key
+ self._cache_session_source(session_key, source)
+ if self._is_telegram_topic_lane(source):
+ try:
+ binding = self._session_db.get_telegram_topic_binding(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ ) if self._session_db else None
+ except Exception:
+ logger.debug("Failed to read Telegram topic binding", exc_info=True)
+ binding = None
+ if binding:
+ bound_session_id = str(binding.get("session_id") or "")
+ if bound_session_id and bound_session_id != session_entry.session_id:
+ # Route the override through SessionStore so the session_key
+ # → session_id mapping is persisted to disk and the previous
+ # lane session is ended cleanly. Mutating session_entry in
+ # place here created a split-brain state where the JSON
+ # index pointed at one id but code downstream used another.
+ switched = self.session_store.switch_session(session_key, bound_session_id)
+ if switched is not None:
+ session_entry = switched
+ else:
+ try:
+ self._record_telegram_topic_binding(source, session_entry)
+ except Exception:
+ logger.debug("Failed to record Telegram topic binding", exc_info=True)
if getattr(session_entry, "was_auto_reset", False):
# Treat auto-reset as a full conversation boundary — drop every
# session-scoped transient state so the fresh session does not
@@ -6247,6 +6593,10 @@ class GatewayRunner:
_werr,
)
finally:
+ # Evict the cached agent so the next turn
+ # rebuilds its system prompt from current
+ # SOUL.md, memory, and skills.
+ self._evict_cached_agent(session_key)
self._cleanup_agent_resources(_hyg_agent)
except Exception as e:
@@ -6405,7 +6755,7 @@ class GatewayRunner:
# shutdown) — the turn ran to completion, so recovery
# succeeded and subsequent messages should no longer receive
# the restart-interruption system note.
- if session_key:
+ if session_key and _should_clear_resume_pending_after_turn(agent_result):
self._clear_restart_failure_count(session_key)
try:
self.session_store.clear_resume_pending(session_key)
@@ -6415,33 +6765,11 @@ class GatewayRunner:
session_key, _e,
)
- # Surface error details when the agent failed silently (final_response=None)
- if not response and agent_result.get("failed"):
- error_detail = agent_result.get("error", "unknown error")
- error_str = str(error_detail).lower()
-
- # Detect context-overflow failures and give specific guidance.
- # Generic 400 "Error" from Anthropic with large sessions is the
- # most common cause of this (#1630).
- _is_ctx_fail = any(p in error_str for p in (
- "context", "token", "too large", "too long",
- "exceed", "payload",
- )) or (
- "400" in error_str
- and len(history) > 50
- )
-
- if _is_ctx_fail:
- response = (
- "⚠️ Session too large for the model's context window.\n"
- "Use /compact to compress the conversation, or "
- "/reset to start fresh."
- )
- else:
- response = (
- f"The request failed: {str(error_detail)[:300]}\n"
- "Try again or use /reset to start a fresh session."
- )
+ # Normalize empty responses: surface errors, partial failures, and
+ # the case where the agent did work but returned no text. Fix for #18765.
+ response = _normalize_empty_agent_response(
+ agent_result, response, history_len=len(history),
+ )
# If the agent's session_id changed during compression, update
# session_entry so transcript writes below go to the right session.
@@ -6984,11 +7312,11 @@ class GatewayRunner:
session_info = ""
if new_entry:
- header = "✨ Session reset! Starting fresh."
+ header = self._telegram_topic_new_header(source) or "✨ Session reset! Starting fresh."
else:
# No existing session, just create one
new_entry = self.session_store.get_or_create_session(source, force_new=True)
- header = "✨ New session started!"
+ header = self._telegram_topic_new_header(source) or "✨ New session started!"
# Set session title if provided with /new
_title_arg = event.get_command_args().strip()
@@ -7013,6 +7341,17 @@ class GatewayRunner:
_title_note = "\n⚠️ Title is empty after cleanup — session started untitled."
header = header + _title_note
+ # When /new runs inside a Telegram DM topic lane, rewrite the
+ # (chat_id, thread_id) → session_id binding so the next message
+ # uses the freshly-created session. Without this, the binding
+ # still points at the old session and the binding-lookup at the
+ # top of _handle_message_with_agent would switch right back.
+ if self._is_telegram_topic_lane(source) and new_entry is not None:
+ try:
+ self._record_telegram_topic_binding(source, new_entry)
+ except Exception:
+ logger.debug("Failed to rebind Telegram topic after /new", exc_info=True)
+
# Fire plugin on_session_reset hook (new session guaranteed to exist)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
@@ -7343,7 +7682,7 @@ class GatewayRunner:
if self._restart_requested or self._draining:
count = self._running_agent_count()
if count:
- return f"⏳ Draining {count} active agent(s) before restart..."
+ return t("gateway.draining", count=count)
return EphemeralReply("⏳ Gateway restart already in progress...")
# Save the requester's routing info so the new gateway process can
@@ -7395,7 +7734,7 @@ class GatewayRunner:
else:
self.request_restart(detached=True, via_service=False)
if active_agents:
- return f"⏳ Draining {active_agents} active agent(s) before restart..."
+ return t("gateway.draining", count=active_agents)
return EphemeralReply("♻ Restarting gateway. If you aren't notified within 60 seconds, restart from the console with `hermes gateway restart`.")
def _is_stale_restart_redelivery(self, event: MessageEvent) -> bool:
@@ -7543,6 +7882,7 @@ class GatewayRunner:
from hermes_cli.model_switch import (
switch_model as _switch_model, parse_model_flags,
list_authenticated_providers,
+ list_picker_providers,
)
from hermes_cli.providers import get_label
@@ -7597,7 +7937,7 @@ class GatewayRunner:
if has_picker:
try:
- providers = list_authenticated_providers(
+ providers = list_picker_providers(
current_provider=current_provider,
current_base_url=current_base_url,
current_model=current_model,
@@ -7996,6 +8336,27 @@ class GatewayRunner:
# ────────────────────────────────────────────────────────────────
# /goal — persistent cross-turn goals (Ralph-style loop)
# ────────────────────────────────────────────────────────────────
+ def _goal_max_turns_from_config(self) -> int:
+ """Resolve the configured /goal turn budget for gateway sessions.
+
+ GatewayRunner.config is a GatewayConfig dataclass, not the full
+ user config mapping. Top-level config blocks such as ``goals`` are
+ therefore only available through hermes_cli.config.load_config().
+ """
+ try:
+ goals_cfg = (
+ (self.config or {}).get("goals", {})
+ if isinstance(self.config, dict)
+ else getattr(self.config, "goals", {}) or {}
+ )
+ if not goals_cfg:
+ from hermes_cli.config import load_config
+
+ goals_cfg = (load_config() or {}).get("goals") or {}
+ return int(goals_cfg.get("max_turns", 20) or 20)
+ except Exception:
+ return 20
+
def _get_goal_manager_for_event(self, event: "MessageEvent"):
"""Return a GoalManager bound to the session for this gateway event.
@@ -8015,15 +8376,7 @@ class GatewayRunner:
sid = getattr(session_entry, "session_id", None) or ""
if not sid:
return None, None
- try:
- goals_cfg = (
- (self.config or {}).get("goals", {})
- if isinstance(self.config, dict)
- else getattr(self.config, "goals", {}) or {}
- )
- max_turns = int(goals_cfg.get("max_turns", 20) or 20)
- except Exception:
- max_turns = 20
+ max_turns = self._goal_max_turns_from_config()
return GoalManager(session_id=sid, default_max_turns=max_turns), session_entry
async def _handle_goal_command(self, event: "MessageEvent") -> str:
@@ -8065,7 +8418,7 @@ class GatewayRunner:
if lower in ("clear", "stop", "done"):
had = mgr.has_goal()
mgr.clear()
- return "✓ Goal cleared." if had else "No active goal."
+ return t("gateway.goal_cleared") if had else t("gateway.no_active_goal")
# Otherwise — treat the remaining text as the new goal.
try:
@@ -8123,15 +8476,7 @@ class GatewayRunner:
if not sid:
return
- try:
- goals_cfg = (
- (self.config or {}).get("goals", {})
- if isinstance(self.config, dict)
- else getattr(self.config, "goals", {}) or {}
- )
- max_turns = int(goals_cfg.get("max_turns", 20) or 20)
- except Exception:
- max_turns = 20
+ max_turns = self._goal_max_turns_from_config()
mgr = GoalManager(session_id=sid, default_max_turns=max_turns)
if not mgr.is_active():
@@ -8674,6 +9019,12 @@ class GatewayRunner:
from urllib.parse import quote as _quote
try:
+ # Capture [[as_document]] before extract_media strips it, so the
+ # dispatch partition below can route image-extension files
+ # through send_document (preserving bytes) instead of
+ # send_multiple_images (Telegram sendPhoto recompresses to ~1280px).
+ force_document_attachments = "[[as_document]]" in response
+
media_files, _ = adapter.extract_media(response)
_, cleaned = adapter.extract_images(response)
local_files, _ = adapter.extract_local_files(cleaned)
@@ -8686,19 +9037,24 @@ class GatewayRunner:
_IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
# Partition out images so they can be sent as a single batch
- # (e.g. Signal's multi-attachment RPC)
+ # (e.g. Signal's multi-attachment RPC). When [[as_document]] was
+ # set, image-extension files skip the photo path and route to
+ # send_document below — preserving original bytes.
image_paths: list = []
non_image_media: list = []
for media_path, is_voice in media_files:
ext = Path(media_path).suffix.lower()
- if ext in _IMAGE_EXTS and not is_voice:
+ if (ext in _IMAGE_EXTS
+ and not is_voice
+ and not force_document_attachments):
image_paths.append(media_path)
else:
non_image_media.append((media_path, is_voice))
non_image_local: list = []
for file_path in local_files:
- if Path(file_path).suffix.lower() in _IMAGE_EXTS:
+ if (Path(file_path).suffix.lower() in _IMAGE_EXTS
+ and not force_document_attachments):
image_paths.append(file_path)
else:
non_image_local.append(file_path)
@@ -9283,7 +9639,7 @@ class GatewayRunner:
try:
user_config: dict = _load_gateway_config()
except Exception as e:
- return f"⚠️ Could not read config.yaml: {e}"
+ return t("gateway.config_read_failed", error=e)
effective = resolve_footer_config(user_config, platform_key)
@@ -9316,7 +9672,7 @@ class GatewayRunner:
atomic_yaml_write(config_path, user_config)
except Exception as e:
logger.warning("Failed to save runtime_footer.enabled: %s", e)
- return f"⚠️ Could not save config: {e}"
+ return t("gateway.config_save_failed", error=e)
state = "ON" if new_state else "OFF"
example = ""
@@ -9440,6 +9796,9 @@ class GatewayRunner:
_aux_fail_model = getattr(compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(compressor, "_last_aux_model_failure_error", None)
finally:
+ # Evict cached agent so next turn rebuilds system prompt
+ # from current files (SOUL.md, memory, etc.).
+ self._evict_cached_agent(session_key)
self._cleanup_agent_resources(tmp_agent)
lines = [f"🗜️ {summary['headline']}"]
if focus_topic:
@@ -9466,6 +9825,504 @@ class GatewayRunner:
logger.warning("Manual compress failed: %s", e)
return f"Compression failed: {e}"
+ async def _get_telegram_topic_capabilities(self, source: SessionSource) -> dict:
+ """Read Telegram private-topic capability flags via Bot API getMe."""
+ adapter = self.adapters.get(source.platform) if getattr(self, "adapters", None) else None
+ bot = getattr(adapter, "_bot", None)
+ if bot is None or not hasattr(bot, "get_me"):
+ return {"checked": False}
+ try:
+ me = await bot.get_me()
+ except Exception:
+ logger.debug("Failed to fetch Telegram getMe topic capabilities", exc_info=True)
+ return {"checked": False}
+
+ def _field(name: str):
+ if hasattr(me, name):
+ return getattr(me, name)
+ api_kwargs = getattr(me, "api_kwargs", None)
+ if isinstance(api_kwargs, dict) and name in api_kwargs:
+ return api_kwargs.get(name)
+ if isinstance(me, dict):
+ return me.get(name)
+ return None
+
+ return {
+ "checked": True,
+ "has_topics_enabled": _field("has_topics_enabled"),
+ "allows_users_to_create_topics": _field("allows_users_to_create_topics"),
+ }
+
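The `_field()` closure above implements a three-tier fallback lookup: real attribute first, then the `api_kwargs` overflow dict that python-telegram-bot uses for fields the installed library version doesn't model, then plain-dict access. A standalone sketch with a hypothetical stand-in object:

```python
class FakeUser:
    """Illustrative stand-in for python-telegram-bot's User object."""
    has_topics_enabled = True
    api_kwargs = {"allows_users_to_create_topics": False}

def field(obj, name):
    # Tier 1: a modeled attribute on the object itself.
    if hasattr(obj, name):
        return getattr(obj, name)
    # Tier 2: unmodeled Bot API fields parked in api_kwargs.
    api_kwargs = getattr(obj, "api_kwargs", None)
    if isinstance(api_kwargs, dict) and name in api_kwargs:
        return api_kwargs.get(name)
    # Tier 3: a raw dict payload.
    if isinstance(obj, dict):
        return obj.get(name)
    return None

me = FakeUser()
modeled = field(me, "has_topics_enabled")
overflow = field(me, "allows_users_to_create_topics")
missing = field({"has_topics_enabled": None}, "absent_field")
```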
+ async def _ensure_telegram_system_topic(self, source: SessionSource) -> None:
+ """Create/pin the managed System topic after /topic activation when possible."""
+ adapter = self.adapters.get(source.platform) if getattr(self, "adapters", None) else None
+ if adapter is None or not source.chat_id:
+ return
+
+ thread_id = None
+ create_topic = getattr(adapter, "_create_dm_topic", None)
+ if callable(create_topic):
+ try:
+ thread_id = await create_topic(int(source.chat_id), "System")
+ except Exception:
+ logger.debug("Failed to create Telegram System topic", exc_info=True)
+ if not thread_id:
+ return
+
+ message_id = None
+ try:
+ send_result = await adapter.send(
+ source.chat_id,
+ "System topic for Hermes commands and status.",
+ metadata={"thread_id": str(thread_id)},
+ )
+ message_id = getattr(send_result, "message_id", None)
+ except Exception:
+ logger.debug("Failed to send Telegram System topic intro", exc_info=True)
+ if not message_id:
+ return
+
+ bot = getattr(adapter, "_bot", None)
+ if bot is None or not hasattr(bot, "pin_chat_message"):
+ return
+ try:
+ await bot.pin_chat_message(
+ chat_id=int(source.chat_id),
+ message_id=int(message_id),
+ disable_notification=True,
+ )
+ except Exception:
+ logger.debug("Failed to pin Telegram System topic intro", exc_info=True)
+
+ async def _send_telegram_topic_setup_image(self, source: SessionSource) -> None:
+ """Send the bundled BotFather Threads Settings screenshot when available."""
+ adapter = self.adapters.get(source.platform) if getattr(self, "adapters", None) else None
+ if adapter is None or not source.chat_id or not hasattr(adapter, "send_image_file"):
+ return
+ image_path = Path(__file__).resolve().parent / "assets" / "telegram-botfather-threads-settings.jpg"
+ if not image_path.exists():
+ return
+ try:
+ await adapter.send_image_file(
+ chat_id=source.chat_id,
+ image_path=str(image_path),
+ caption="BotFather → Bot Settings → Threads Settings",
+ metadata={"thread_id": str(source.thread_id)} if source.thread_id else None,
+ )
+ except Exception:
+ logger.debug("Failed to send Telegram topic setup image", exc_info=True)
+
+ def _sanitize_telegram_topic_title(self, title: str) -> str:
+ """Return a Bot API-safe forum topic name from a generated session title."""
+ cleaned = re.sub(r"\s+", " ", str(title or "")).strip()
+ if not cleaned:
+ return "Hermes Chat"
+ # Telegram forum topic names are short (currently 1-128 chars). Keep
+ # extra room for multi-byte titles and avoid trailing ellipsis churn.
+ if len(cleaned) > 120:
+ cleaned = cleaned[:117].rstrip() + "..."
+ return cleaned
+
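The sanitizer above can be exercised in isolation — a sketch restating its three rules (collapse whitespace, default for empty input, ellipsis-truncate under Telegram's 128-char topic-name cap):

```python
import re

def sanitize_topic_title(title) -> str:
    # Collapse all runs of whitespace (including newlines) to one space.
    cleaned = re.sub(r"\s+", " ", str(title or "")).strip()
    if not cleaned:
        return "Hermes Chat"
    # Truncate well under the 128-char limit; 117 + "..." = 120 chars.
    if len(cleaned) > 120:
        cleaned = cleaned[:117].rstrip() + "..."
    return cleaned
```

The `str(title or "")` guard is doing double duty: it handles both `None` and non-string inputs before the regex ever runs.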
+ async def _rename_telegram_topic_for_session_title(
+ self,
+ source: SessionSource,
+ session_id: str,
+ title: str,
+ ) -> None:
+ """Best-effort rename of a Telegram DM topic when Hermes auto-titles a session."""
+ if not self._is_telegram_topic_lane(source) or not source.chat_id or not source.thread_id:
+ return
+
+ # Skip rename when the topic is operator-declared via
+ # extra.dm_topics. Those topics have fixed names chosen by the
+ # operator (plus optional skill binding); auto-renaming would
+ # silently mutate operator config.
+ #
+ # Check the class, not the instance — getattr() on MagicMock
+ # auto-creates attributes, so `hasattr(adapter, "_get_dm_topic_info")`
+ # would return True for every test double.
+ adapter = self.adapters.get(source.platform) if getattr(self, "adapters", None) else None
+ if adapter is not None:
+ get_info = getattr(type(adapter), "_get_dm_topic_info", None)
+ if callable(get_info):
+ try:
+ operator_topic = get_info(adapter, str(source.chat_id), str(source.thread_id))
+ except Exception:
+ operator_topic = None
+ # Only treat dict-shaped returns as operator-declared; a
+ # bare MagicMock or other sentinel shouldn't count.
+ if isinstance(operator_topic, dict):
+ return
+
+ session_db = getattr(self, "_session_db", None)
+ if session_db is not None:
+ try:
+ binding = session_db.get_telegram_topic_binding(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ )
+ if binding and str(binding.get("session_id") or "") != str(session_id):
+ return
+ except Exception:
+ logger.debug("Failed to verify Telegram topic binding before rename", exc_info=True)
+ return
+
+ if adapter is None:
+ return
+ topic_name = self._sanitize_telegram_topic_title(title)
+ try:
+ rename_topic = getattr(adapter, "rename_dm_topic", None)
+ if rename_topic is not None:
+ await rename_topic(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ name=topic_name,
+ )
+ return
+
+ bot = getattr(adapter, "_bot", None)
+ edit_forum_topic = getattr(bot, "edit_forum_topic", None) if bot is not None else None
+ if edit_forum_topic is None:
+ edit_forum_topic = getattr(bot, "editForumTopic", None) if bot is not None else None
+ if edit_forum_topic is None:
+ return
+ try:
+ await edit_forum_topic(
+ chat_id=int(source.chat_id),
+ message_thread_id=int(source.thread_id),
+ name=topic_name,
+ )
+ except (TypeError, ValueError):
+ await edit_forum_topic(
+ chat_id=source.chat_id,
+ message_thread_id=source.thread_id,
+ name=topic_name,
+ )
+ except Exception:
+ logger.debug("Failed to rename Telegram topic for auto-generated title", exc_info=True)
+
+ def _schedule_telegram_topic_title_rename(
+ self,
+ source: SessionSource,
+ session_id: str,
+ title: str,
+ ) -> None:
+ """Schedule a topic rename from the auto-title background thread."""
+ if not title or not self._is_telegram_topic_lane(source):
+ return
+ try:
+ loop = asyncio.get_running_loop()
+ except RuntimeError:
+ loop = getattr(self, "_gateway_loop", None)
+ if loop is None or loop.is_closed():
+ return
+ try:
+ copied_source = dataclasses.replace(source)
+ except Exception:
+ copied_source = source
+ future = asyncio.run_coroutine_threadsafe(
+ self._rename_telegram_topic_for_session_title(copied_source, session_id, title),
+ loop,
+ )
+ def _log_rename_failure(fut) -> None:
+ try:
+ fut.result()
+ except Exception:
+ logger.debug("Telegram topic title rename failed", exc_info=True)
+
+ future.add_done_callback(_log_rename_failure)
+
+ _TELEGRAM_CAPABILITY_HINT_COOLDOWN_S = 300.0
+
+ def _should_send_telegram_capability_hint(self, source: SessionSource) -> bool:
+ """Rate-limit the BotFather Threads Settings screenshot.
+
+ If a user sends /topic repeatedly while Threads Settings are still
+ off, we shouldn't keep re-uploading the screenshot every time.
+ """
+ if not hasattr(self, "_telegram_capability_hint_ts"):
+ self._telegram_capability_hint_ts = {}
+ chat_id = str(source.chat_id or "")
+ if not chat_id:
+ return True
+ import time as _time
+ now = _time.monotonic()
+ last = self._telegram_capability_hint_ts.get(chat_id, 0.0)
+ if now - last < self._TELEGRAM_CAPABILITY_HINT_COOLDOWN_S:
+ return False
+ self._telegram_capability_hint_ts[chat_id] = now
+ return True
+
+ def _telegram_topic_help_text(self) -> str:
+ return (
+ "/topic — enable multi-session DM mode (one bot, many parallel chats)\n"
+ "\n"
+ "Usage:\n"
+ " /topic Enable topic mode, or show status if already on\n"
+ " /topic help Show this message\n"
+ " /topic off Disable topic mode and clear topic bindings\n"
+ " /topic Inside a topic: restore a previous session by ID\n"
+ "\n"
+ "How it works:\n"
+ "1. Run /topic once in this DM — Hermes checks BotFather Threads\n"
+ " Settings are enabled and flips on multi-session mode.\n"
+ "2. Tap All Messages at the top of the bot and send any message.\n"
+ " Telegram creates a new topic for that message; each topic is\n"
+ " an independent Hermes session (fresh history, fresh context).\n"
+ "3. The root DM becomes a system lobby — send /topic, /status,\n"
+ " /help, /usage there. Normal prompts go in a topic.\n"
+ "4. /new inside a topic resets just that topic's session.\n"
+ "5. /topic inside a topic restores an old session into it."
+ )
+
+ def _disable_telegram_topic_mode_for_chat(self, source: SessionSource) -> str:
+ """Cleanly disable topic mode for a chat via /topic off."""
+ if not self._session_db:
+ return "Session database not available."
+ chat_id = str(source.chat_id or "")
+ if not chat_id:
+ return "Could not determine chat ID."
+ # No-op if never enabled.
+ try:
+ currently_enabled = self._session_db.is_telegram_topic_mode_enabled(
+ chat_id=chat_id,
+ user_id=str(source.user_id or ""),
+ )
+ except Exception:
+ currently_enabled = False
+ if not currently_enabled:
+ return "Multi-session topic mode is not currently enabled for this chat."
+ try:
+ self._session_db.disable_telegram_topic_mode(chat_id=chat_id)
+ except Exception as exc:
+ logger.exception("Failed to disable Telegram topic mode")
+ return f"Failed to disable topic mode: {exc}"
+ # Reset per-chat debounce state so the user doesn't see a stale
+ # cooldown on the next activation.
+ for attr in ("_telegram_lobby_reminder_ts", "_telegram_capability_hint_ts"):
+ store = getattr(self, attr, None)
+ if isinstance(store, dict):
+ store.pop(chat_id, None)
+ return (
+ "Multi-session topic mode is now OFF for this chat.\n\n"
+ "Existing topics in Telegram aren't removed — they'll just stop "
+ "being gated as independent sessions. The root DM works as a "
+ "normal Hermes chat again. Run /topic to re-enable later."
+ )
+
+ async def _handle_topic_command(self, event: MessageEvent, args: str = "") -> str:
+ """Handle /topic for Telegram DM user-managed topic sessions."""
+ source = event.source
+ if source.platform != Platform.TELEGRAM or source.chat_type != "dm":
+ return "The /topic command is only available in Telegram private chats."
+ if not self._session_db:
+ return "Session database not available."
+
+ # Authorization: /topic activates multi-session mode and mutates
+ # SQLite side tables. Unauthorized senders (not in allowlist) must
+ # not be able to do that. Gateway routes already authorize the
+ # message before reaching here, but defense in depth.
+ auth_fn = getattr(self, "_is_user_authorized", None)
+ if callable(auth_fn):
+ try:
+ if not auth_fn(source):
+ return "You are not authorized to use /topic on this bot."
+ except Exception:
+ logger.debug("Topic auth check failed", exc_info=True)
+
+ args = event.get_command_args().strip()
+
+ # /topic help — inline usage without leaving the bot.
+ if args.lower() in {"help", "?", "-h", "--help"}:
+ return self._telegram_topic_help_text()
+
+ # /topic off — clean disable path so users don't have to edit the DB.
+ if args.lower() in {"off", "disable", "stop"}:
+ return self._disable_telegram_topic_mode_for_chat(source)
+
+ if args:
+ if not source.thread_id:
+ return (
+ "To restore a session, first create or open a Telegram topic, "
+ "then send /topic inside that topic. To create a "
+ "new topic, open All Messages and send any message there."
+ )
+ return await self._restore_telegram_topic_session(event, args)
+
+ capabilities = await self._get_telegram_topic_capabilities(source)
+ if capabilities.get("checked"):
+ if capabilities.get("has_topics_enabled") is False:
+ # Debounce the BotFather screenshot: don't re-send on every
+ # /topic while threads are still disabled.
+ if self._should_send_telegram_capability_hint(source):
+ await self._send_telegram_topic_setup_image(source)
+ return (
+ "Telegram topics are not enabled for this bot yet.\n\n"
+ "How to enable them:\n"
+ "1. Open @BotFather.\n"
+ "2. Choose your bot.\n"
+ "3. Open Bot Settings → Threads Settings.\n"
+ "4. Turn on Threaded Mode and make sure users are allowed to create new threads.\n\n"
+ "Then send /topic again."
+ )
+ if capabilities.get("allows_users_to_create_topics") is False:
+ if self._should_send_telegram_capability_hint(source):
+ await self._send_telegram_topic_setup_image(source)
+ return (
+ "Telegram topics are enabled, but users are not allowed to create topics.\n\n"
+ "Open @BotFather → choose your bot → Bot Settings → Threads Settings, "
+ "then turn off 'Disallow users to create new threads'.\n\n"
+ "Then send /topic again."
+ )
+
+ try:
+ self._session_db.enable_telegram_topic_mode(
+ chat_id=str(source.chat_id),
+ user_id=str(source.user_id),
+ has_topics_enabled=capabilities.get("has_topics_enabled"),
+ allows_users_to_create_topics=capabilities.get("allows_users_to_create_topics"),
+ )
+ except Exception as exc:
+ logger.exception("Failed to enable Telegram topic mode")
+ return f"Failed to enable Telegram topic mode: {exc}"
+
+ if not source.thread_id:
+ await self._ensure_telegram_system_topic(source)
+
+ if source.thread_id:
+ try:
+ binding = self._session_db.get_telegram_topic_binding(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ )
+ except Exception:
+ logger.debug("Failed to read Telegram topic binding", exc_info=True)
+ binding = None
+ if binding:
+ session_id = str(binding.get("session_id") or "")
+ title = None
+ try:
+ title = self._session_db.get_session_title(session_id)
+ except Exception:
+ title = None
+ session_label = title or "Untitled session"
+ return (
+ "This topic is linked to:\n"
+ f"Session: {session_label}\n"
+ f"ID: {session_id}\n\n"
+ "Use /new to replace this topic with a fresh session.\n"
+ "For parallel work, open All Messages and send a message there "
+ "to create another topic."
+ )
+ return (
+ "Telegram multi-session topics are enabled.\n\n"
+ "This topic will be used as an independent Hermes session. "
+ "Use /new to replace this topic's current session. For parallel "
+ "work, open All Messages and send a message there to create another topic."
+ )
+
+ return self._telegram_topic_root_status_message(source)
+
+ def _telegram_topic_root_status_message(self, source: SessionSource) -> str:
+ lines = [
+ "Telegram multi-session topics are enabled.",
+ "",
+ "To create a new Hermes chat, open All Messages at the top of this "
+ "bot interface and send any message there. Telegram will create a "
+ "new topic for it.",
+ "",
+ ]
+ try:
+ sessions = self._session_db.list_unlinked_telegram_sessions_for_user(
+ chat_id=str(source.chat_id),
+ user_id=str(source.user_id),
+ limit=10,
+ )
+ except Exception:
+ logger.debug("Failed to list unlinked Telegram sessions", exc_info=True)
+ sessions = []
+
+ if sessions:
+ lines.append("Previous unlinked sessions:")
+ for session in sessions:
+ session_id = str(session.get("id") or "")
+ title = str(session.get("title") or "Untitled session")
+ preview = str(session.get("preview") or "").strip()
+ line = f"- {title} — `{session_id}`"
+ if preview:
+ line += f" — {preview}"
+ lines.append(line)
+ lines.extend([
+ "",
+ "To restore one:",
+ "1. Create or open a topic. To create a new one, open All Messages and send any message there.",
+ "2. Send /topic inside that topic.",
+ f"Example: Send /topic {sessions[0].get('id')} inside a topic.",
+ ])
+ else:
+ lines.extend([
+ "No previous unlinked Telegram sessions found.",
+ "",
+ "To restore a previous session later:",
+ "1. Create or open a topic. To create a new one, open All Messages and send any message there.",
+ "2. Send /topic inside that topic.",
+ ])
+ return "\n".join(lines)
+
+ async def _restore_telegram_topic_session(self, event: MessageEvent, raw_session_id: str) -> str:
+ """Restore an existing Telegram-owned Hermes session into this topic."""
+ source = event.source
+ session_id = self._session_db.resolve_session_id(raw_session_id.strip())
+ if not session_id:
+ return f"Session not found: {raw_session_id.strip()}"
+
+ session = self._session_db.get_session(session_id)
+ if not session:
+ return f"Session not found: {raw_session_id.strip()}"
+ if str(session.get("source") or "") != "telegram":
+ return "That session is not a Telegram session and cannot be restored into this topic."
+ if str(session.get("user_id") or "") != str(source.user_id):
+ return "That session does not belong to this Telegram user."
+
+ linked = self._session_db.is_telegram_session_linked_to_topic(session_id=session_id)
+ current_binding = self._session_db.get_telegram_topic_binding(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ )
+ if linked:
+ if not current_binding or current_binding.get("session_id") != session_id:
+ return "That session is already linked to another Telegram topic."
+
+ session_key = self._session_key_for_source(source)
+ try:
+ self._session_db.bind_telegram_topic(
+ chat_id=str(source.chat_id),
+ thread_id=str(source.thread_id),
+ user_id=str(source.user_id),
+ session_key=session_key,
+ session_id=session_id,
+ managed_mode="restored",
+ )
+ except ValueError as exc:
+ if "already linked" in str(exc):
+ return "That session is already linked to another Telegram topic."
+ raise
+
+ title = self._session_db.get_session_title(session_id) or session_id
+ last_assistant = None
+ try:
+ for message in reversed(self._session_db.get_messages(session_id)):
+ if message.get("role") == "assistant" and message.get("content"):
+ last_assistant = str(message.get("content"))
+ break
+ except Exception:
+ last_assistant = None
+
+ response = f"Session restored: {title}"
+ if last_assistant:
+ response += f"\n\nLast Hermes message:\n{last_assistant}"
+ return response
+
async def _handle_title_command(self, event: MessageEvent) -> str:
"""Handle /title command — set or show the current session's title."""
source = event.source
@@ -10256,7 +11113,7 @@ class GatewayRunner:
if not has_blocking_approval(session_key):
if session_key in self._pending_approvals:
self._pending_approvals.pop(session_key)
- return "⚠️ Approval expired (agent is no longer waiting). Ask the agent to try again."
+ return t("gateway.approval_expired")
return "No pending command to approve."
# Parse args: support "all", "all session", "all always", "session", "always"
@@ -10671,12 +11528,13 @@ class GatewayRunner:
f"or type your answer directly.",
metadata=metadata,
)
+ # Keep the prompt marker on disk until the user
+ # answers. If the gateway restarts mid-prompt, the
+ # next watcher can recover by re-forwarding it from
+ # disk. Duplicate sends in the same process are
+ # still suppressed by _update_prompt_pending.
self._update_prompt_pending[session_key] = True
- # Remove the prompt file so it isn't re-read on the
- # next poll cycle. The update process only needs
- # .update_response to continue — it doesn't re-check
- # .update_prompt.json while waiting.
- prompt_path.unlink(missing_ok=True)
logger.info("Forwarded update prompt to %s: %s", session_key, prompt_text[:80])
except (json.JSONDecodeError, OSError) as e:
logger.debug("Failed to read update prompt: %s", e)
@@ -10814,6 +11672,14 @@ class GatewayRunner:
)
return None
+ platform_cfg = self.config.platforms.get(platform)
+ if platform_cfg is not None and not platform_cfg.gateway_restart_notification:
+ logger.info(
+ "Restart notification suppressed: %s has gateway_restart_notification=false",
+ platform_str,
+ )
+ return None
+
metadata = {"thread_id": thread_id} if thread_id else None
result = await adapter.send(
str(chat_id),
@@ -10865,6 +11731,14 @@ class GatewayRunner:
if not home or not home.chat_id:
continue
+ platform_cfg = self.config.platforms.get(platform)
+ if platform_cfg is not None and not platform_cfg.gateway_restart_notification:
+ logger.info(
+ "Home-channel startup notification suppressed: %s has gateway_restart_notification=false",
+ platform.value,
+ )
+ continue
+
target = (platform.value, str(home.chat_id), str(home.thread_id) if home.thread_id else None)
if target in skipped or target in delivered:
continue
@@ -11135,6 +12009,10 @@ class GatewayRunner:
exc,
)
+ cached_source = self._get_cached_session_source(session_key)
+ if cached_source is not None:
+ return cached_source
+
_parsed = _parse_session_key(session_key)
if _parsed:
derived_platform = _parsed["platform"]
@@ -11378,6 +12256,7 @@ class GatewayRunner:
# Add more here as new baked-at-construction config settings are added.
_CACHE_BUSTING_CONFIG_KEYS: tuple = (
("model", "context_length"),
+ ("model", "max_tokens"),
("compression", "enabled"),
("compression", "threshold"),
("compression", "target_ratio"),
@@ -12235,6 +13114,24 @@ class GatewayRunner:
last_tool = [None] # Mutable container for tracking in closure
last_progress_msg = [None] # Track last message for dedup
repeat_count = [0] # How many times the same message repeated
+
+ # Auto-cleanup of temporary progress bubbles (Telegram + any adapter
+ # that implements ``delete_message``). When enabled via
+ # ``display.platforms.<platform>.cleanup_progress: true``, message IDs
+ # from the tool-progress / "Still working..." / status-callback bubbles
+ # are collected here and deleted after the final response lands.
+ # Failed runs skip cleanup so the bubbles remain as breadcrumbs.
+ _cleanup_progress = bool(
+ resolve_display_setting(user_config, platform_key, "cleanup_progress")
+ )
+ _cleanup_adapter = self.adapters.get(source.platform) if _cleanup_progress else None
+ if _cleanup_adapter is not None and (
+ type(_cleanup_adapter).delete_message is BasePlatformAdapter.delete_message
+ ):
+ # Adapter doesn't support deletion — silently disable.
+ _cleanup_progress = False
+ _cleanup_adapter = None
+ _cleanup_msg_ids: List[str] = []
# First-touch onboarding latch: fires at most once per run, even if
# several tools exceed the threshold.
long_tool_hint_fired = [False]
@@ -12357,12 +13254,19 @@ class GatewayRunner:
# - Slack DM threading needs event_message_id fallback (reply thread)
# - Telegram uses message_thread_id only for forum topics; passing a
# normal DM/group message id as thread_id causes send failures
+ # - Feishu only honors reply_in_thread when sending a reply, so topic
+ # progress uses the triggering event message as the reply target
# - Other platforms should use explicit source.thread_id only
if source.platform == Platform.SLACK:
_progress_thread_id = source.thread_id or event_message_id
else:
_progress_thread_id = source.thread_id
_progress_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
+ _progress_reply_to = (
+ event_message_id
+ if source.platform == Platform.FEISHU and source.thread_id and event_message_id
+ else None
+ )
async def send_progress_messages():
if not progress_queue:
@@ -12476,17 +13380,40 @@ class GatewayRunner:
adapter.name,
)
can_edit = False
- await adapter.send(chat_id=source.chat_id, content=msg, metadata=_progress_metadata)
+ _flood_result = await adapter.send(
+ chat_id=source.chat_id,
+ content=msg,
+ reply_to=_progress_reply_to,
+ metadata=_progress_metadata,
+ )
+ if (
+ _cleanup_progress
+ and getattr(_flood_result, "success", False)
+ and getattr(_flood_result, "message_id", None)
+ ):
+ _cleanup_msg_ids.append(str(_flood_result.message_id))
else:
if can_edit:
# First tool: send all accumulated text as new message
full_text = "\n".join(progress_lines)
- result = await adapter.send(chat_id=source.chat_id, content=full_text, metadata=_progress_metadata)
+ result = await adapter.send(
+ chat_id=source.chat_id,
+ content=full_text,
+ reply_to=_progress_reply_to,
+ metadata=_progress_metadata,
+ )
else:
# Editing unsupported: send just this line
- result = await adapter.send(chat_id=source.chat_id, content=msg, metadata=_progress_metadata)
+ result = await adapter.send(
+ chat_id=source.chat_id,
+ content=msg,
+ reply_to=_progress_reply_to,
+ metadata=_progress_metadata,
+ )
if result.success and result.message_id:
progress_msg_id = result.message_id
+ if _cleanup_progress:
+ _cleanup_msg_ids.append(str(result.message_id))
_last_edit_ts = time.monotonic()
@@ -12584,13 +13511,23 @@ class GatewayRunner:
# Bridge sync status_callback → async adapter.send for context pressure
_status_adapter = self.adapters.get(source.platform)
_status_chat_id = source.chat_id
- _status_thread_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
+ if source.platform == Platform.FEISHU and source.thread_id and event_message_id:
+ # Feishu topics only keep messages inside the topic when they are
+ # sent via the reply API with reply_in_thread=true. Status/interim,
+ # approval, and stream-consumer paths usually only receive metadata,
+ # so carry the triggering message id as a Feishu-specific fallback.
+ _status_thread_metadata: Optional[Dict[str, Any]] = {
+ "thread_id": _progress_thread_id,
+ "reply_to_message_id": event_message_id,
+ }
+ else:
+ _status_thread_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
def _status_callback_sync(event_type: str, message: str) -> None:
if not _status_adapter or not _run_still_current():
return
try:
- asyncio.run_coroutine_threadsafe(
+ _fut = asyncio.run_coroutine_threadsafe(
_status_adapter.send(
_status_chat_id,
message,
@@ -12598,6 +13535,16 @@ class GatewayRunner:
),
_loop_for_step,
)
+ if _cleanup_progress:
+ def _track_status_id(fut) -> None:
+ try:
+ res = fut.result()
+ except Exception:
+ return
+ mid = getattr(res, "message_id", None)
+ if getattr(res, "success", False) and mid:
+ _cleanup_msg_ids.append(str(mid))
+ _fut.add_done_callback(_track_status_id)
except Exception as _e:
logger.debug("status_callback error (%s): %s", event_type, _e)
@@ -12631,13 +13578,9 @@ class GatewayRunner:
combined_ephemeral = (combined_ephemeral + "\n\n" + self._ephemeral_system_prompt).strip()
# Re-read .env and config for fresh credentials (gateway is long-lived,
- # keys may change without restart).
- try:
- load_dotenv(_env_path, override=True, encoding="utf-8")
- except UnicodeDecodeError:
- load_dotenv(_env_path, override=True, encoding="latin-1")
- except Exception:
- pass
+ # keys may change without restart). Keep config.yaml authoritative for
+ # runtime budget settings bridged into env vars.
+ _reload_runtime_env_preserving_config_authority()
try:
model, runtime_kwargs = self._resolve_session_agent_runtime(
@@ -12728,7 +13671,7 @@ class GatewayRunner:
adapter=_adapter,
chat_id=source.chat_id,
config=_consumer_cfg,
- metadata={"thread_id": _progress_thread_id} if _progress_thread_id else None,
+ metadata=_status_thread_metadata,
on_new_message=(
(lambda: progress_queue.put(("__reset__",)))
if progress_queue is not None
@@ -13205,6 +14148,11 @@ class GatewayRunner:
"messages": result.get("messages", []),
"api_calls": result.get("api_calls", 0),
"failed": result.get("failed", False),
+ "partial": result.get("partial", False),
+ "completed": result.get("completed"),
+ "interrupted": result.get("interrupted", False),
+ "interrupt_message": result.get("interrupt_message"),
+ "error": result.get("error"),
"compression_exhausted": result.get("compression_exhausted", False),
"tools": tools_holder[0] or [],
"history_offset": len(agent_history),
@@ -13288,20 +14236,29 @@ class GatewayRunner:
_title_failure_cb = getattr(
agent, "_emit_auxiliary_failure", None
)
- maybe_auto_title(
- self._session_db,
- effective_session_id,
- message,
- final_response,
- all_msgs,
- failure_callback=_title_failure_cb,
- main_runtime={
+ maybe_auto_title_kwargs = {
+ "failure_callback": _title_failure_cb,
+ "main_runtime": {
"model": getattr(agent, "model", None),
"provider": getattr(agent, "provider", None),
"base_url": getattr(agent, "base_url", None),
"api_key": getattr(agent, "api_key", None),
"api_mode": getattr(agent, "api_mode", None),
} if agent else None,
+ }
+ if self._is_telegram_topic_lane(source):
+ maybe_auto_title_kwargs["title_callback"] = lambda title: self._schedule_telegram_topic_title_rename(
+ source,
+ effective_session_id,
+ title,
+ )
+ maybe_auto_title(
+ self._session_db,
+ effective_session_id,
+ message,
+ final_response,
+ all_msgs,
+ **maybe_auto_title_kwargs,
)
except Exception:
pass
@@ -13311,6 +14268,11 @@ class GatewayRunner:
"last_reasoning": result.get("last_reasoning"),
"messages": result_holder[0].get("messages", []) if result_holder[0] else [],
"api_calls": result_holder[0].get("api_calls", 0) if result_holder[0] else 0,
+ "completed": result_holder[0].get("completed") if result_holder[0] else None,
+ "interrupted": result_holder[0].get("interrupted", False) if result_holder[0] else False,
+ "partial": result_holder[0].get("partial", False) if result_holder[0] else False,
+ "error": result_holder[0].get("error") if result_holder[0] else None,
+ "interrupt_message": result_holder[0].get("interrupt_message") if result_holder[0] else None,
"tools": tools_holder[0] or [],
"history_offset": _effective_history_offset,
"last_prompt_tokens": _last_prompt_toks,
@@ -13449,11 +14411,17 @@ class GatewayRunner:
except Exception:
pass
try:
- await _notify_adapter.send(
+ _notify_res = await _notify_adapter.send(
source.chat_id,
f"⏳ Still working... ({_elapsed_mins} min elapsed{_status_detail})",
metadata=_status_thread_metadata,
)
+ if (
+ _cleanup_progress
+ and getattr(_notify_res, "success", False)
+ and getattr(_notify_res, "message_id", None)
+ ):
+ _cleanup_msg_ids.append(str(_notify_res.message_id))
except Exception as _ne:
logger.debug("Long-running notification error: %s", _ne)
@@ -13927,7 +14895,49 @@ class GatewayRunner:
_previewed,
)
response["already_sent"] = True
-
+
+ # Schedule deletion of tracked temporary progress bubbles after the
+ # final response lands. Failed runs skip this so bubbles remain as
+ # breadcrumbs for the user to see what work happened. Only fires on
+ # adapters that support ``delete_message`` (see init above); failures
+ # are swallowed — deletion is best-effort.
+ if (
+ _cleanup_progress
+ and _cleanup_adapter is not None
+ and _cleanup_msg_ids
+ and session_key
+ and isinstance(response, dict)
+ and not response.get("failed")
+ and hasattr(_cleanup_adapter, "register_post_delivery_callback")
+ ):
+ _ids_snapshot = list(_cleanup_msg_ids)
+ _chat_id_snapshot = source.chat_id
+ _adapter_snapshot = _cleanup_adapter
+ _loop_snapshot = asyncio.get_running_loop()
+
+ def _cleanup_temp_bubbles() -> None:
+ async def _delete_all() -> None:
+ for _mid in _ids_snapshot:
+ try:
+ await _adapter_snapshot.delete_message(
+ _chat_id_snapshot, _mid
+ )
+ except Exception:
+ pass
+ try:
+ asyncio.run_coroutine_threadsafe(_delete_all(), _loop_snapshot)
+ except Exception:
+ pass
+
+ try:
+ _cleanup_adapter.register_post_delivery_callback(
+ session_key,
+ _cleanup_temp_bubbles,
+ generation=run_generation,
+ )
+ except Exception as _rpe:
+ logger.debug("Post-delivery cleanup registration failed: %s", _rpe)
+
return response
@@ -14177,15 +15187,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
runner = GatewayRunner(config)
- # Track whether a signal initiated the shutdown (vs. internal request).
- # When an unexpected SIGTERM kills the gateway, we exit non-zero so
- # systemd's Restart=on-failure revives the process. systemctl stop
- # is safe: systemd tracks stop-requested state independently of exit
- # code, so Restart= never fires for a deliberate stop.
+ # Track whether an unexpected signal initiated the shutdown. When an
+ # unexpected SIGTERM kills the gateway, we exit non-zero so service
+ # managers can revive the process. Planned stop paths write a marker
+ # before signalling us so they can exit cleanly instead.
_signal_initiated_shutdown = False
# Set up signal handlers
- def shutdown_signal_handler():
+ def shutdown_signal_handler(received_signal=None):
nonlocal _signal_initiated_shutdown
# Planned --replace takeover check: when a sibling gateway is
# taking over via --replace, it wrote a marker naming this PID
@@ -14201,10 +15210,28 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
except Exception as e:
logger.debug("Takeover marker check failed: %s", e)
+ # Planned stop check: service managers and `hermes gateway stop`
+ # also send SIGTERM, which is indistinguishable from an unexpected
+ # external kill unless the CLI marks it first. SIGINT comes from an
+ # interactive Ctrl+C and is likewise an intentional foreground stop.
+ planned_stop = False
+ if received_signal == signal.SIGINT:
+ planned_stop = True
+ elif not planned_takeover:
+ try:
+ from gateway.status import consume_planned_stop_marker_for_self
+ planned_stop = consume_planned_stop_marker_for_self()
+ except Exception as e:
+ logger.debug("Planned stop marker check failed: %s", e)
+
if planned_takeover:
logger.info(
"Received SIGTERM as a planned --replace takeover — exiting cleanly"
)
+ elif planned_stop:
+ logger.info(
+ "Received SIGTERM/SIGINT as a planned gateway stop — exiting cleanly"
+ )
else:
_signal_initiated_shutdown = True
logger.info("Received SIGTERM/SIGINT — initiating shutdown")
@@ -14240,7 +15267,7 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
if threading.current_thread() is threading.main_thread():
for sig in (signal.SIGINT, signal.SIGTERM):
try:
- loop.add_signal_handler(sig, shutdown_signal_handler)
+ loop.add_signal_handler(sig, shutdown_signal_handler, sig)
except NotImplementedError:
pass
if hasattr(signal, "SIGUSR1"):
@@ -14338,14 +15365,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
if runner.exit_code is not None:
raise SystemExit(runner.exit_code)
- # When a signal (SIGTERM/SIGINT) caused the shutdown and it wasn't a
- # planned restart (/restart, /update, SIGUSR1), exit non-zero so
- # systemd's Restart=on-failure revives the process. This covers:
+ # When an unexpected SIGTERM caused the shutdown and it wasn't a planned
+ # restart (/restart, /update, SIGUSR1), exit non-zero so systemd's
+ # Restart=on-failure revives the process. This covers:
# - hermes update killing the gateway mid-work
# - External kill commands
# - WSL2/container runtime sending unexpected signals
- # systemctl stop is safe: systemd tracks "stop requested" state
- # independently of exit code, so Restart= never fires for it.
+ # `hermes gateway stop` and interactive Ctrl+C are handled above as
+ # planned stops and should not trigger service-manager revival.
if _signal_initiated_shutdown and not runner._restart_requested:
logger.info(
"Exiting with code 1 (signal-initiated shutdown without restart "
diff --git a/gateway/session.py b/gateway/session.py
index 16de296e0e..be393e48e6 100644
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -1276,8 +1276,9 @@ class SessionStore:
# Also write legacy JSONL (keeps existing tooling working during transition)
transcript_path = self.get_transcript_path(session_id)
- with open(transcript_path, "a", encoding="utf-8") as f:
- f.write(json.dumps(message, ensure_ascii=False) + "\n")
+ with self._lock:
+ with open(transcript_path, "a", encoding="utf-8") as f:
+ f.write(json.dumps(message, ensure_ascii=False) + "\n")
def rewrite_transcript(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
"""Replace the entire transcript for a session with new messages.
diff --git a/gateway/status.py b/gateway/status.py
index f329b25f08..bdff9aa988 100644
--- a/gateway/status.py
+++ b/gateway/status.py
@@ -637,6 +637,8 @@ def release_all_scoped_locks(
_TAKEOVER_MARKER_FILENAME = ".gateway-takeover.json"
_TAKEOVER_MARKER_TTL_S = 60 # Marker older than this is treated as stale
+_PLANNED_STOP_MARKER_FILENAME = ".gateway-planned-stop.json"
+_PLANNED_STOP_MARKER_TTL_S = 60
def _get_takeover_marker_path() -> Path:
@@ -645,6 +647,67 @@ def _get_takeover_marker_path() -> Path:
return home / _TAKEOVER_MARKER_FILENAME
+def _get_planned_stop_marker_path() -> Path:
+ """Return the path to the intentional gateway stop marker file."""
+ home = get_hermes_home()
+ return home / _PLANNED_STOP_MARKER_FILENAME
+
+
+def _marker_is_stale(written_at: str, ttl_s: int) -> bool:
+ try:
+ written_dt = datetime.fromisoformat(written_at)
+ age = (datetime.now(timezone.utc) - written_dt).total_seconds()
+ return age > ttl_s
+ except (TypeError, ValueError):
+ return True
+
+
+def _consume_pid_marker_for_self(
+ path: Path,
+ *,
+ pid_field: str,
+ start_time_field: str,
+ ttl_s: int,
+) -> bool:
+ record = _read_json_file(path)
+ if not record:
+ return False
+
+ try:
+ target_pid = int(record[pid_field])
+ target_start_time = record.get(start_time_field)
+ written_at = record.get("written_at") or ""
+ except (KeyError, TypeError, ValueError):
+ try:
+ path.unlink(missing_ok=True)
+ except OSError:
+ pass
+ return False
+
+ if _marker_is_stale(written_at, ttl_s):
+ try:
+ path.unlink(missing_ok=True)
+ except OSError:
+ pass
+ return False
+
+ our_pid = os.getpid()
+ our_start_time = _get_process_start_time(our_pid)
+ matches = (
+ target_pid == our_pid
+ and target_start_time is not None
+ and our_start_time is not None
+ and target_start_time == our_start_time
+ )
+
+ try:
+ path.unlink(missing_ok=True)
+ except OSError:
+ pass
+
+ return matches
+
+
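The staleness rule used by `_consume_pid_marker_for_self` can be exercised in isolation. A minimal standalone sketch (names are illustrative, not the module's API), assuming the same ISO-8601 `written_at` convention:

```python
from datetime import datetime, timedelta, timezone

def marker_is_stale(written_at: str, ttl_s: int) -> bool:
    # Unparseable or missing timestamps are treated as stale, matching the
    # helper above: a marker we can't date is a marker we can't trust.
    try:
        written_dt = datetime.fromisoformat(written_at)
    except (TypeError, ValueError):
        return True
    age = (datetime.now(timezone.utc) - written_dt).total_seconds()
    return age > ttl_s

fresh = datetime.now(timezone.utc).isoformat()
old = (datetime.now(timezone.utc) - timedelta(seconds=120)).isoformat()
print(marker_is_stale(fresh, 60))      # False
print(marker_is_stale(old, 60))        # True
print(marker_is_stale("garbage", 60))  # True
```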
def write_takeover_marker(target_pid: int) -> bool:
"""Record that ``target_pid`` is being replaced by the current process.
@@ -681,59 +744,13 @@ def consume_takeover_marker_for_self() -> bool:
Always unlinks the marker on match (and on detected staleness) so
subsequent unrelated signals don't re-trigger.
"""
- path = _get_takeover_marker_path()
- record = _read_json_file(path)
- if not record:
- return False
-
- # Any malformed or stale marker → drop it and return False
- try:
- target_pid = int(record["target_pid"])
- target_start_time = record.get("target_start_time")
- written_at = record.get("written_at") or ""
- except (KeyError, TypeError, ValueError):
- try:
- path.unlink(missing_ok=True)
- except OSError:
- pass
- return False
-
- # TTL guard: a stale marker older than _TAKEOVER_MARKER_TTL_S is ignored.
- stale = False
- try:
- written_dt = datetime.fromisoformat(written_at)
- age = (datetime.now(timezone.utc) - written_dt).total_seconds()
- if age > _TAKEOVER_MARKER_TTL_S:
- stale = True
- except (TypeError, ValueError):
- stale = True # Unparseable timestamp — treat as stale
-
- if stale:
- try:
- path.unlink(missing_ok=True)
- except OSError:
- pass
- return False
-
- # Does the marker name THIS process?
- our_pid = os.getpid()
- our_start_time = _get_process_start_time(our_pid)
- matches = (
- target_pid == our_pid
- and target_start_time is not None
- and our_start_time is not None
- and target_start_time == our_start_time
+ return _consume_pid_marker_for_self(
+ _get_takeover_marker_path(),
+ pid_field="target_pid",
+ start_time_field="target_start_time",
+ ttl_s=_TAKEOVER_MARKER_TTL_S,
)
- # Consume the marker whether it matched or not — a marker that doesn't
- # match our identity is stale-for-us anyway.
- try:
- path.unlink(missing_ok=True)
- except OSError:
- pass
-
- return matches
-
def clear_takeover_marker() -> None:
"""Remove the takeover marker unconditionally. Safe to call repeatedly."""
@@ -743,6 +760,45 @@ def clear_takeover_marker() -> None:
pass
+def write_planned_stop_marker(target_pid: int) -> bool:
+ """Record that ``target_pid`` is being stopped intentionally.
+
+    The gateway exits non-zero on an unexpected SIGTERM so service managers
+    restart it. Intentional stop commands send the same SIGTERM, so the CLI
+    writes this short-lived marker first; the target consumes it in its
+    signal handler and exits cleanly (status 0) instead.
+ """
+ try:
+ target_start_time = _get_process_start_time(target_pid)
+ record = {
+ "target_pid": target_pid,
+ "target_start_time": target_start_time,
+ "stopper_pid": os.getpid(),
+ "written_at": _utc_now_iso(),
+ }
+ _write_json_file(_get_planned_stop_marker_path(), record)
+ return True
+    except OSError:  # PermissionError is a subclass of OSError
+        return False
+
+
+def consume_planned_stop_marker_for_self() -> bool:
+ """Return True when the current process is being intentionally stopped."""
+ return _consume_pid_marker_for_self(
+ _get_planned_stop_marker_path(),
+ pid_field="target_pid",
+ start_time_field="target_start_time",
+ ttl_s=_PLANNED_STOP_MARKER_TTL_S,
+ )
+
+
+def clear_planned_stop_marker() -> None:
+ """Remove the planned-stop marker unconditionally."""
+ try:
+ _get_planned_stop_marker_path().unlink(missing_ok=True)
+ except OSError:
+ pass
+
+
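The planned-stop handshake above can be sketched end to end: the stopper writes a marker naming the target PID, and the target consumes it (single-use) in its SIGTERM handler to choose between a clean exit and a non-zero "revive me" exit. This is a hypothetical standalone version — it omits the start-time identity check and TTL guard of the real helpers:

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

marker = Path(tempfile.mkdtemp()) / ".gateway-planned-stop.json"

def write_marker(target_pid: int) -> None:
    # Stopper side: record who is being stopped and when.
    marker.write_text(json.dumps({
        "target_pid": target_pid,
        "written_at": datetime.now(timezone.utc).isoformat(),
    }))

def consume_for_self() -> bool:
    # Target side: check whether the marker names THIS process.
    if not marker.exists():
        return False
    record = json.loads(marker.read_text())
    marker.unlink()  # single-use: consumed whether it matches or not
    return record.get("target_pid") == os.getpid()

write_marker(os.getpid())
print(consume_for_self())  # True: this process is being stopped on purpose
print(consume_for_self())  # False: marker already consumed
```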
def get_running_pid(
pid_path: Optional[Path] = None,
*,
diff --git a/hermes_cli/__init__.py b/hermes_cli/__init__.py
index 9141ea93e7..0f247ddcc1 100644
--- a/hermes_cli/__init__.py
+++ b/hermes_cli/__init__.py
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
-__version__ = "0.12.0"
-__release_date__ = "2026.4.30"
+__version__ = "0.13.0"
+__release_date__ = "2026.5.7"
def _ensure_utf8():
diff --git a/hermes_cli/_parser.py b/hermes_cli/_parser.py
index 29ac96c97b..3ece411e75 100644
--- a/hermes_cli/_parser.py
+++ b/hermes_cli/_parser.py
@@ -70,6 +70,9 @@ Examples:
hermes logs --since 1h Lines from the last hour
hermes debug share Upload debug report for support
hermes update Update to latest version
+ hermes dashboard Start web UI dashboard (port 9119)
+ hermes dashboard --stop Stop running dashboard processes
+ hermes dashboard --status List running dashboard processes
For more help on a command:
hermes --help
diff --git a/hermes_cli/auth.py b/hermes_cli/auth.py
index 5b63d41eb1..3fa726d6a7 100644
--- a/hermes_cli/auth.py
+++ b/hermes_cli/auth.py
@@ -416,6 +416,40 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
),
}
+# Auto-extend PROVIDER_REGISTRY with any api-key provider registered in
+# providers/ that is not already declared above. New providers only need a
+# plugins/model-providers// plugin — no edits to this file required.
+try:
+ from providers import list_providers as _list_providers_for_registry
+ for _pp in _list_providers_for_registry():
+ if _pp.name in PROVIDER_REGISTRY:
+ continue
+ if _pp.auth_type != "api_key" or not _pp.env_vars:
+ continue
+ # Skip providers that need custom token resolution or are special-cased
+ # in resolve_provider() (copilot/kimi/zai have bespoke token refresh;
+ # openrouter/custom are aggregator/user-supplied and handled outside
+ # the registry — adding them here breaks runtime_provider resolution
+ # that relies on `openrouter not in PROVIDER_REGISTRY`).
+ if _pp.name in {"copilot", "kimi-coding", "kimi-coding-cn", "zai", "openrouter", "custom"}:
+ continue
+ _api_key_vars = tuple(v for v in _pp.env_vars if not v.endswith("_BASE_URL") and not v.endswith("_URL"))
+ _base_url_var = next((v for v in _pp.env_vars if v.endswith("_BASE_URL") or v.endswith("_URL")), None)
+ PROVIDER_REGISTRY[_pp.name] = ProviderConfig(
+ id=_pp.name,
+ name=_pp.display_name or _pp.name,
+ auth_type="api_key",
+ inference_base_url=_pp.base_url,
+ api_key_env_vars=_api_key_vars or _pp.env_vars,
+ base_url_env_var=_base_url_var or "",
+ )
+ # Also register aliases so resolve_provider() resolves them
+ for _alias in _pp.aliases:
+ if _alias not in PROVIDER_REGISTRY:
+ PROVIDER_REGISTRY[_alias] = PROVIDER_REGISTRY[_pp.name]
+except Exception:
+ pass
+
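The registry auto-extension above partitions each provider's env vars into API-key vars and an optional base-URL override by suffix. A minimal sketch of that split (provider names here are invented; note that `endswith("_URL")` alone already covers `_BASE_URL`):

```python
from typing import Optional, Tuple

def partition_env_vars(env_vars: Tuple[str, ...]) -> Tuple[Tuple[str, ...], Optional[str]]:
    # Vars ending in _URL are base-URL overrides; everything else is
    # treated as an API-key variable.
    key_vars = tuple(v for v in env_vars if not v.endswith("_URL"))
    base_url_var = next((v for v in env_vars if v.endswith("_URL")), None)
    return key_vars, base_url_var

print(partition_env_vars(("ACME_API_KEY", "ACME_BASE_URL")))
# → (('ACME_API_KEY',), 'ACME_BASE_URL')
print(partition_env_vars(("ACME_API_KEY",)))
# → (('ACME_API_KEY',), None)
```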
# =============================================================================
# Anthropic Key Helper
@@ -746,42 +780,121 @@ def _auth_file_path() -> Path:
return path
+def _global_auth_file_path() -> Optional[Path]:
+ """Return the global-root auth.json when the process is in profile mode.
+
+ Returns ``None`` when the profile and global root resolve to the same
+ directory (classic mode, or custom HERMES_HOME that is not a profile).
+ Used by read-only fallback paths so providers authed at the root are
+ visible to profile processes that haven't configured them locally.
+
+ See issue #18594 follow-up (credential_pool shadowing).
+ """
+ try:
+ from hermes_constants import get_default_hermes_root
+ global_root = get_default_hermes_root()
+ except Exception:
+ return None
+ profile_home = get_hermes_home()
+ try:
+ if profile_home.resolve(strict=False) == global_root.resolve(strict=False):
+ return None
+ except Exception:
+ if profile_home == global_root:
+ return None
+ # No pytest seat belt here: this is a pure read-only path, and
+ # ``_load_global_auth_store()`` wraps the read in a try/except so an
+ # unreadable global file can never break the profile process. The
+ # write-side seat belt still lives on ``_auth_file_path()`` where it
+ # belongs (that's what protects the real user's auth store from being
+ # corrupted by a mis-configured test).
+ return global_root / "auth.json"
+
+
+def _load_global_auth_store() -> Dict[str, Any]:
+ """Load the global-root auth store (read-only fallback).
+
+ Returns an empty dict when no global fallback exists (classic mode,
+ or the global auth.json is absent). Never raises on missing file.
+
+ Seat belt: under pytest, refuses to read the real user's
+ ``~/.hermes/auth.json`` even when HERMES_HOME is set to a profile
+ path. The hermetic conftest does not redirect ``HOME``, so
+ ``get_default_hermes_root()`` for a profile-shaped HERMES_HOME can
+ still resolve to the real user's home on a dev machine. That would
+ leak real credentials into tests. This guard uses the unmodified
+ ``HOME`` env var (what ``os.path.expanduser('~')`` would resolve to),
+ not ``Path.home()``, because ``Path.home`` is sometimes monkeypatched
+ by fixtures that want to relocate the global root to a tmp path.
+ """
+ global_path = _global_auth_file_path()
+ if global_path is None or not global_path.exists():
+ return {}
+ if os.environ.get("PYTEST_CURRENT_TEST"):
+ real_home_env = os.environ.get("HOME", "")
+ if real_home_env:
+ real_root = Path(real_home_env) / ".hermes" / "auth.json"
+ try:
+ if global_path.resolve(strict=False) == real_root.resolve(strict=False):
+ return {}
+ except Exception:
+ pass
+ try:
+ return _load_auth_store(global_path)
+ except Exception:
+ # A malformed global store must not break profile reads. The
+ # profile's own auth store is still authoritative.
+ return {}
+
+
def _auth_lock_path() -> Path:
return _auth_file_path().with_suffix(".lock")
_auth_lock_holder = threading.local()
+
@contextmanager
-def _auth_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
- """Cross-process advisory lock for auth.json reads+writes. Reentrant."""
- # Reentrant: if this thread already holds the lock, just yield.
- if getattr(_auth_lock_holder, "depth", 0) > 0:
- _auth_lock_holder.depth += 1
+def _file_lock(
+ lock_path: Path,
+ holder: threading.local,
+ timeout_seconds: float,
+ timeout_message: str,
+):
+ """Cross-process advisory flock helper.
+
+ Reentrant per-thread via ``holder.depth``. Falls back to a depth-only
+ guard when neither ``fcntl`` nor ``msvcrt`` is available (rare).
+ Callers supply their own ``threading.local`` so independent locks
+ (e.g. profile auth.json vs shared Nous store) don't share reentrancy
+ state — that would let one lock's reentrant acquisition silently skip
+ the other's kernel-level flock.
+ """
+ if getattr(holder, "depth", 0) > 0:
+ holder.depth += 1
try:
yield
finally:
- _auth_lock_holder.depth -= 1
+ holder.depth -= 1
return
- lock_path = _auth_lock_path()
lock_path.parent.mkdir(parents=True, exist_ok=True)
if fcntl is None and msvcrt is None:
- _auth_lock_holder.depth = 1
+ holder.depth = 1
try:
yield
finally:
- _auth_lock_holder.depth = 0
+ holder.depth = 0
return
# On Windows, msvcrt.locking needs the file to have content and the
- # file pointer at position 0. Ensure the lock file has at least 1 byte.
+ # file pointer at position 0. Ensure the lock file has at least 1 byte.
if msvcrt and (not lock_path.exists() or lock_path.stat().st_size == 0):
lock_path.write_text(" ", encoding="utf-8")
with lock_path.open("r+" if msvcrt else "a+") as lock_file:
- deadline = time.time() + max(1.0, timeout_seconds)
+ deadline = time.monotonic() + max(1.0, timeout_seconds)
while True:
try:
if fcntl:
@@ -791,15 +904,15 @@ def _auth_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
msvcrt.locking(lock_file.fileno(), msvcrt.LK_NBLCK, 1)
break
except (BlockingIOError, OSError, PermissionError):
- if time.time() >= deadline:
- raise TimeoutError("Timed out waiting for auth store lock")
+ if time.monotonic() >= deadline:
+ raise TimeoutError(timeout_message)
time.sleep(0.05)
- _auth_lock_holder.depth = 1
+ holder.depth = 1
try:
yield
finally:
- _auth_lock_holder.depth = 0
+ holder.depth = 0
if fcntl:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
elif msvcrt:
@@ -810,6 +923,25 @@ def _auth_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
pass
+@contextmanager
+def _auth_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
+ """Cross-process advisory lock for auth.json reads+writes. Reentrant.
+
+ Lock ordering invariant: when this lock is held together with
+ ``_nous_shared_store_lock``, acquire ``_auth_store_lock`` FIRST
+ (outer) and the shared Nous lock SECOND (inner). All runtime
+ refresh paths follow this order; violating it risks deadlock
+ against a concurrent import on the shared store.
+ """
+ with _file_lock(
+ _auth_lock_path(),
+ _auth_lock_holder,
+ timeout_seconds,
+ "Timed out waiting for auth store lock",
+ ):
+ yield
+
+
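The per-thread reentrancy that `_file_lock` layers over the kernel flock can be shown in isolation: a `threading.local` depth counter lets the same thread re-enter without a second acquisition. A minimal sketch with the flock itself omitted for brevity:

```python
import threading
from contextlib import contextmanager

holder = threading.local()
events = []  # records acquire/reenter/release for demonstration

@contextmanager
def reentrant_lock():
    # Re-entry: this thread already holds the lock, just bump the depth.
    if getattr(holder, "depth", 0) > 0:
        holder.depth += 1
        events.append("reenter")
        try:
            yield
        finally:
            holder.depth -= 1
        return
    # First acquisition: the real helper takes the kernel flock here.
    events.append("acquire")
    holder.depth = 1
    try:
        yield
    finally:
        holder.depth = 0
        events.append("release")

with reentrant_lock():
    with reentrant_lock():  # nested: no second acquisition
        pass
print(events)  # ['acquire', 'reenter', 'release']
```

Passing a caller-supplied `threading.local` (as `_file_lock` does) keeps independent locks from sharing this depth state, which is exactly the hazard the docstring warns about.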
def _load_auth_store(auth_file: Optional[Path] = None) -> Dict[str, Any]:
auth_file = auth_file or _auth_file_path()
if not auth_file.exists():
@@ -853,12 +985,27 @@ def _load_auth_store(auth_file: Optional[Path] = None) -> Dict[str, Any]:
def _save_auth_store(auth_store: Dict[str, Any]) -> Path:
auth_file = _auth_file_path()
auth_file.parent.mkdir(parents=True, exist_ok=True)
+    # Tighten parent dir to 0o700 so other local users can't traverse to creds.
+ # No-op on Windows (POSIX mode bits not enforced); ignore failures.
+ try:
+ os.chmod(auth_file.parent, 0o700)
+ except OSError:
+ pass
auth_store["version"] = AUTH_STORE_VERSION
auth_store["updated_at"] = datetime.now(timezone.utc).isoformat()
payload = json.dumps(auth_store, indent=2) + "\n"
tmp_path = auth_file.with_name(f"{auth_file.name}.tmp.{os.getpid()}.{uuid.uuid4().hex}")
try:
- with tmp_path.open("w", encoding="utf-8") as handle:
+ # Create with 0o600 atomically via os.open(O_EXCL) + fdopen to close
+ # the TOCTOU window where default umask (often 0o644) briefly exposed
+ # OAuth tokens to other local users between open() and chmod().
+ # Mirrors agent/google_oauth.py (#19673) and tools/mcp_oauth.py (#21148).
+ fd = os.open(
+ str(tmp_path),
+ os.O_WRONLY | os.O_CREAT | os.O_EXCL,
+ stat.S_IRUSR | stat.S_IWUSR,
+ )
+ with os.fdopen(fd, "w", encoding="utf-8") as handle:
handle.write(payload)
handle.flush()
os.fsync(handle.fileno())
@@ -932,15 +1079,50 @@ def get_auth_provider_display_name(provider_id: str) -> str:
def read_credential_pool(provider_id: Optional[str] = None) -> Dict[str, Any]:
- """Return the persisted credential pool, or one provider slice."""
+ """Return the persisted credential pool, or one provider slice.
+
+ In profile mode, the profile's credential pool is authoritative. If a
+ provider has no entries in the profile, entries from the global-root
+ ``auth.json`` are used as a read-only fallback — so workers spawned in a
+ profile can see providers that were only authenticated at global scope.
+
+ Profile entries always win: the global fallback only applies per-provider
+ when the profile has zero entries for that provider. Once the user runs
+ ``hermes auth add `` inside the profile, profile entries
+ fully shadow global for that provider on the next read.
+
+ Writes always go to the profile (``write_credential_pool`` is unchanged).
+ See issue #18594 follow-up.
+ """
auth_store = _load_auth_store()
pool = auth_store.get("credential_pool")
if not isinstance(pool, dict):
pool = {}
+
+ global_pool: Dict[str, Any] = {}
+ global_store = _load_global_auth_store()
+ maybe_global_pool = global_store.get("credential_pool") if global_store else None
+ if isinstance(maybe_global_pool, dict):
+ global_pool = maybe_global_pool
+
if provider_id is None:
- return dict(pool)
+ merged = dict(pool)
+ for gp_key, gp_entries in global_pool.items():
+ if not isinstance(gp_entries, list) or not gp_entries:
+ continue
+ # Per-provider shadowing: profile wins whenever it has ANY entries.
+ existing = merged.get(gp_key)
+ if isinstance(existing, list) and existing:
+ continue
+ merged[gp_key] = list(gp_entries)
+ return merged
+
provider_entries = pool.get(provider_id)
- return list(provider_entries) if isinstance(provider_entries, list) else []
+ if isinstance(provider_entries, list) and provider_entries:
+ return list(provider_entries)
+ # Profile has no entries for this provider — fall back to global.
+ global_entries = global_pool.get(provider_id)
+ return list(global_entries) if isinstance(global_entries, list) else []
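The per-provider shadowing rule in `read_credential_pool` reduces to a small merge: the profile wins whenever it has any entries for a provider; otherwise the global pool fills the gap. A sketch with illustrative provider names and entries:

```python
from typing import Any, Dict, List

def merge_pools(profile: Dict[str, List[Any]],
                global_pool: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    merged = dict(profile)
    for provider, entries in global_pool.items():
        if not entries:
            continue
        existing = merged.get(provider)
        if isinstance(existing, list) and existing:
            continue  # profile shadows global for this provider
        merged[provider] = list(entries)
    return merged

profile = {"anthropic": [{"key": "profile-key"}], "openai": []}
global_pool = {"anthropic": [{"key": "global-key"}], "openai": [{"key": "g"}]}
print(merge_pools(profile, global_pool))
# anthropic keeps the profile entry; openai falls back to global
```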
def write_credential_pool(provider_id: str, entries: List[Dict[str, Any]]) -> Path:
@@ -999,9 +1181,25 @@ def unsuppress_credential_source(provider_id: str, source: str) -> bool:
def get_provider_auth_state(provider_id: str) -> Optional[Dict[str, Any]]:
- """Return persisted auth state for a provider, or None."""
+ """Return persisted auth state for a provider, or None.
+
+ In profile mode, falls back to the global-root ``auth.json`` when the
+ profile has no state for this provider. Profile state always wins when
+ present. Writes (``_save_auth_store`` / ``persist_*_credentials``) are
+ unchanged — they still target the profile only. This mirrors
+ ``read_credential_pool``'s per-provider shadowing semantics so that
+ ``_seed_from_singletons`` can reseed a profile's credential pool from
+ global-scope provider state (e.g. a globally-authenticated Anthropic
+ OAuth or Nous device-code session). See issue #18594 follow-up.
+ """
auth_store = _load_auth_store()
- return _load_provider_state(auth_store, provider_id)
+ state = _load_provider_state(auth_store, provider_id)
+ if state is not None:
+ return state
+ global_store = _load_global_auth_store()
+ if not global_store:
+ return None
+ return _load_provider_state(global_store, provider_id)
def get_active_provider() -> Optional[str]:
@@ -1195,6 +1393,17 @@ def resolve_provider(
"vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
}
+ # Extend with aliases declared in plugins/model-providers// that aren't already mapped.
+ # This keeps providers/ as the single source for new aliases while the
+ # hardcoded dict above remains authoritative for existing ones.
+ try:
+ from providers import list_providers as _lp
+ for _pp in _lp():
+ for _alias in _pp.aliases:
+ if _alias not in _PROVIDER_ALIASES:
+ _PROVIDER_ALIASES[_alias] = _pp.name
+ except Exception:
+ pass
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
if normalized == "openrouter":
@@ -1360,10 +1569,33 @@ def _read_qwen_cli_tokens() -> Dict[str, Any]:
def _save_qwen_cli_tokens(tokens: Dict[str, Any]) -> Path:
auth_path = _qwen_cli_auth_path()
auth_path.parent.mkdir(parents=True, exist_ok=True)
- tmp_path = auth_path.with_suffix(".tmp")
- tmp_path.write_text(json.dumps(tokens, indent=2, sort_keys=True) + "\n", encoding="utf-8")
- os.chmod(tmp_path, stat.S_IRUSR | stat.S_IWUSR)
- tmp_path.replace(auth_path)
+ try:
+ os.chmod(auth_path.parent, 0o700)
+ except OSError:
+ pass
+ # Per-process random temp suffix avoids collisions between concurrent
+ # writers and stale leftovers from a crashed prior write.
+ tmp_path = auth_path.with_name(f"{auth_path.name}.tmp.{os.getpid()}.{uuid.uuid4().hex}")
+ # Create with 0o600 atomically via os.open(O_EXCL) — closes the TOCTOU
+ # window where write_text() + post-write chmod briefly exposed tokens
+ # at process umask (typically 0o644). See #19673, #21148.
+ fd = os.open(
+ str(tmp_path),
+ os.O_WRONLY | os.O_CREAT | os.O_EXCL,
+ stat.S_IRUSR | stat.S_IWUSR,
+ )
+ try:
+ with os.fdopen(fd, "w", encoding="utf-8") as fh:
+ fh.write(json.dumps(tokens, indent=2, sort_keys=True) + "\n")
+ fh.flush()
+ os.fsync(fh.fileno())
+ atomic_replace(tmp_path, auth_path)
+ finally:
+ try:
+ if tmp_path.exists():
+ tmp_path.unlink()
+ except OSError:
+ pass
return auth_path
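The atomic 0o600 write pattern this hunk applies to the Qwen token file can be distilled into a standalone sketch (`write_secret_json` is a hypothetical name): create the temp file with `O_EXCL` and owner-only mode so there is no window at process umask, fsync, then atomically replace the destination.

```python
import json
import os
import stat
import tempfile
import uuid
from pathlib import Path

def write_secret_json(dest: Path, payload: dict) -> None:
    # Per-process random suffix avoids collisions between concurrent writers.
    tmp = dest.with_name(f"{dest.name}.tmp.{os.getpid()}.{uuid.uuid4().hex}")
    # O_EXCL + mode 0o600 at creation: no chmod-after-write TOCTOU window.
    fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                 stat.S_IRUSR | stat.S_IWUSR)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(json.dumps(payload, indent=2, sort_keys=True) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, dest)  # atomic on POSIX; readers see old or new, never partial
    finally:
        if tmp.exists():  # only on failure: replace() already moved it on success
            tmp.unlink()

dest = Path(tempfile.mkdtemp()) / "auth.json"
write_secret_json(dest, {"token": "secret"})
print(oct(dest.stat().st_mode & 0o777))  # 0o600 on POSIX
```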
@@ -1780,9 +2012,9 @@ def _spotify_wait_for_callback(
thread = threading.Thread(target=server.serve_forever, kwargs={"poll_interval": 0.1}, daemon=True)
thread.start()
- deadline = time.time() + max(5.0, timeout_seconds)
+ deadline = time.monotonic() + max(5.0, timeout_seconds)
try:
- while time.time() < deadline:
+ while time.monotonic() < deadline:
if result["code"] or result["error"]:
return result
time.sleep(0.1)
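The repeated `time.time()` → `time.monotonic()` swaps in these deadline loops guard against wall-clock jumps: NTP corrections or manual clock changes can spuriously expire (or extend) a `time.time()` deadline, while monotonic time never goes backwards. A small sketch of the pattern:

```python
import time

# Monotonic deadline loop: immune to wall-clock adjustments.
deadline = time.monotonic() + 0.2
ticks = 0
while time.monotonic() < deadline:
    ticks += 1
    time.sleep(0.05)
print(ticks >= 1)  # True: the loop ran at least once before the deadline
```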
@@ -2545,10 +2777,10 @@ def _poll_for_token(
poll_interval: int,
) -> Dict[str, Any]:
"""Poll the token endpoint until the user approves or the code expires."""
- deadline = time.time() + max(1, expires_in)
+ deadline = time.monotonic() + max(1, expires_in)
current_interval = max(1, min(poll_interval, DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS))
- while time.time() < deadline:
+ while time.monotonic() < deadline:
response = client.post(
f"{portal_base_url}/api/oauth/token",
data={
@@ -2606,6 +2838,7 @@ def _poll_for_token(
# -----------------------------------------------------------------------------
NOUS_SHARED_STORE_FILENAME = "nous_auth.json"
+_nous_shared_lock_holder = threading.local()
def _nous_shared_auth_dir() -> Path:
@@ -2645,6 +2878,69 @@ def _nous_shared_store_path() -> Path:
return path
+@contextmanager
+def _nous_shared_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
+ """Cross-profile lock for the shared Nous OAuth store.
+
+ Lock ordering invariant: if both this and ``_auth_store_lock`` need
+ to be held, acquire ``_auth_store_lock`` FIRST. All runtime refresh
+ paths follow this order. The one exception is
+ ``_try_import_shared_nous_state``, which holds this lock alone for
+ the entire refresh+mint cycle so concurrent imports on sibling
+ profiles can't race on the single-use shared refresh token; that
+ helper must NOT be called with ``_auth_store_lock`` already held.
+ """
+ try:
+ lock_path = _nous_shared_store_path().with_suffix(".lock")
+ except RuntimeError:
+ # No HERMES_HOME yet (pre-setup): fall through without locking.
+ yield
+ return
+
+ with _file_lock(
+ lock_path,
+ _nous_shared_lock_holder,
+ timeout_seconds,
+ "Timed out waiting for shared Nous auth lock",
+ ):
+ yield
+
+
+def _merge_shared_nous_oauth_state(state: Dict[str, Any]) -> bool:
+ """Copy fresher shared OAuth tokens into a profile-local Nous state."""
+ shared = _read_shared_nous_state()
+ if not shared:
+ return False
+
+ shared_refresh = shared.get("refresh_token")
+ if not isinstance(shared_refresh, str) or not shared_refresh.strip():
+ return False
+
+ local_refresh = state.get("refresh_token")
+ shared_access_exp = _parse_iso_timestamp(shared.get("expires_at")) or 0.0
+ local_access_exp = _parse_iso_timestamp(state.get("expires_at")) or 0.0
+ refresh_changed = shared_refresh.strip() != str(local_refresh or "").strip()
+ fresher_access = shared_access_exp > local_access_exp
+ if not refresh_changed and not fresher_access:
+ return False
+
+ for key in (
+ "access_token",
+ "refresh_token",
+ "token_type",
+ "scope",
+ "client_id",
+ "portal_base_url",
+ "inference_base_url",
+ "obtained_at",
+ "expires_at",
+ ):
+ value = shared.get(key)
+ if value not in (None, ""):
+ state[key] = value
+ return True
+
+
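The "copy only when fresher" rule in `_merge_shared_nous_oauth_state` hinges on two signals: a rotated refresh token, or a shared access token that expires later than the local one. A sketch of just that decision (timestamps and token values are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

def should_merge(shared: Dict[str, Any], local: Dict[str, Any]) -> bool:
    def exp(state: Dict[str, Any]) -> float:
        ts = state.get("expires_at")
        return datetime.fromisoformat(ts).timestamp() if ts else 0.0
    refresh_changed = (shared.get("refresh_token") or "").strip() != \
                      (local.get("refresh_token") or "").strip()
    return refresh_changed or exp(shared) > exp(local)

now = datetime.now(timezone.utc)
shared = {"refresh_token": "rt-1",
          "expires_at": (now + timedelta(hours=1)).isoformat()}
stale_local = {"refresh_token": "rt-1", "expires_at": now.isoformat()}
same_local = dict(shared)
print(should_merge(shared, stale_local))  # True: shared access token is fresher
print(should_merge(shared, same_local))   # False: nothing newer to copy
```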
def _write_shared_nous_state(state: Dict[str, Any]) -> None:
"""Persist a minimal copy of the Nous OAuth state to the shared store.
@@ -2677,15 +2973,34 @@ def _write_shared_nous_state(state: Dict[str, Any]) -> None:
"updated_at": datetime.now(timezone.utc).isoformat(),
}
try:
- path = _nous_shared_store_path()
- path.parent.mkdir(parents=True, exist_ok=True)
- tmp = path.with_suffix(path.suffix + ".tmp")
- tmp.write_text(json.dumps(shared, indent=2, sort_keys=True))
- try:
- os.chmod(tmp, 0o600)
- except OSError:
- pass
- os.replace(tmp, path)
+ with _nous_shared_store_lock():
+ path = _nous_shared_store_path()
+ path.parent.mkdir(parents=True, exist_ok=True)
+ try:
+ os.chmod(path.parent, 0o700)
+ except OSError:
+ pass
+ tmp = path.with_name(f"{path.name}.tmp.{os.getpid()}.{uuid.uuid4().hex}")
+ # Create with 0o600 atomically via os.open(O_EXCL) — closes the TOCTOU
+ # window where write_text() + post-write chmod briefly exposed Nous
+ # refresh_token at process umask. See #19673, #21148.
+ fd = os.open(
+ str(tmp),
+ os.O_WRONLY | os.O_CREAT | os.O_EXCL,
+ stat.S_IRUSR | stat.S_IWUSR,
+ )
+ try:
+ with os.fdopen(fd, "w", encoding="utf-8") as fh:
+ fh.write(json.dumps(shared, indent=2, sort_keys=True))
+ fh.flush()
+ os.fsync(fh.fileno())
+ os.replace(tmp, path)
+ finally:
+ try:
+ if tmp.exists():
+ tmp.unlink()
+ except OSError:
+ pass
_oauth_trace(
"nous_shared_store_written",
path=str(path),
@@ -2742,36 +3057,38 @@ def _try_import_shared_nous_state(
etc.) — caller should then fall through to the normal device-code
flow.
"""
- shared = _read_shared_nous_state()
- if not shared:
- return None
-
- # Build a full state dict so refresh_nous_oauth_from_state has every
- # field it needs. force_refresh=True gets us a fresh access_token
- # for this profile; force_mint=True gets us a fresh agent_key.
- state: Dict[str, Any] = {
- "access_token": shared.get("access_token"),
- "refresh_token": shared.get("refresh_token"),
- "client_id": shared.get("client_id") or DEFAULT_NOUS_CLIENT_ID,
- "portal_base_url": shared.get("portal_base_url") or DEFAULT_NOUS_PORTAL_URL,
- "inference_base_url": shared.get("inference_base_url") or DEFAULT_NOUS_INFERENCE_URL,
- "token_type": shared.get("token_type") or "Bearer",
- "scope": shared.get("scope") or DEFAULT_NOUS_SCOPE,
- "obtained_at": shared.get("obtained_at"),
- "expires_at": shared.get("expires_at"),
- "agent_key": None,
- "agent_key_expires_at": None,
- "tls": {"insecure": False, "ca_bundle": None},
- }
-
try:
- refreshed = refresh_nous_oauth_from_state(
- state,
- min_key_ttl_seconds=min_key_ttl_seconds,
- timeout_seconds=timeout_seconds,
- force_refresh=True,
- force_mint=True,
- )
+ with _nous_shared_store_lock(timeout_seconds=max(timeout_seconds + 5.0, AUTH_LOCK_TIMEOUT_SECONDS)):
+ shared = _read_shared_nous_state()
+ if not shared:
+ return None
+
+ # Build a full state dict so refresh_nous_oauth_from_state has every
+ # field it needs. force_refresh=True gets us a fresh access_token
+ # for this profile; force_mint=True gets us a fresh agent_key.
+ state: Dict[str, Any] = {
+ "access_token": shared.get("access_token"),
+ "refresh_token": shared.get("refresh_token"),
+ "client_id": shared.get("client_id") or DEFAULT_NOUS_CLIENT_ID,
+ "portal_base_url": shared.get("portal_base_url") or DEFAULT_NOUS_PORTAL_URL,
+ "inference_base_url": shared.get("inference_base_url") or DEFAULT_NOUS_INFERENCE_URL,
+ "token_type": shared.get("token_type") or "Bearer",
+ "scope": shared.get("scope") or DEFAULT_NOUS_SCOPE,
+ "obtained_at": shared.get("obtained_at"),
+ "expires_at": shared.get("expires_at"),
+ "agent_key": None,
+ "agent_key_expires_at": None,
+ "tls": {"insecure": False, "ca_bundle": None},
+ }
+
+ refreshed = refresh_nous_oauth_from_state(
+ state,
+ min_key_ttl_seconds=min_key_ttl_seconds,
+ timeout_seconds=timeout_seconds,
+ force_refresh=True,
+ force_mint=True,
+ )
+ _write_shared_nous_state(refreshed)
except AuthError as exc:
_oauth_trace(
"nous_shared_import_failed",
@@ -2973,59 +3290,65 @@ def resolve_nous_access_token(
client_id = str(state.get("client_id") or DEFAULT_NOUS_CLIENT_ID)
verify = _resolve_verify(insecure=insecure, ca_bundle=ca_bundle, auth_state=state)
- access_token = state.get("access_token")
- refresh_token = state.get("refresh_token")
- if not isinstance(access_token, str) or not access_token:
- raise AuthError(
- "No access token found for Nous Portal login.",
- provider="nous",
- relogin_required=True,
- )
+ with _nous_shared_store_lock(timeout_seconds=max(timeout_seconds + 5.0, AUTH_LOCK_TIMEOUT_SECONDS)):
+ merged_shared = _merge_shared_nous_oauth_state(state)
+ access_token = state.get("access_token")
+ refresh_token = state.get("refresh_token")
+ if not isinstance(access_token, str) or not access_token:
+ raise AuthError(
+ "No access token found for Nous Portal login.",
+ provider="nous",
+ relogin_required=True,
+ )
- if not _is_expiring(state.get("expires_at"), refresh_skew_seconds):
- return access_token
+ if not _is_expiring(state.get("expires_at"), refresh_skew_seconds):
+ if merged_shared:
+ _save_provider_state(auth_store, "nous", state)
+ _save_auth_store(auth_store)
+ return access_token
- if not isinstance(refresh_token, str) or not refresh_token:
- raise AuthError(
- "Session expired and no refresh token is available.",
- provider="nous",
- relogin_required=True,
- )
+ if not isinstance(refresh_token, str) or not refresh_token:
+ raise AuthError(
+ "Session expired and no refresh token is available.",
+ provider="nous",
+ relogin_required=True,
+ )
- timeout = httpx.Timeout(timeout_seconds if timeout_seconds else 15.0)
- with httpx.Client(
- timeout=timeout,
- headers={"Accept": "application/json"},
- verify=verify,
- ) as client:
- refreshed = _refresh_access_token(
- client=client,
- portal_base_url=portal_base_url,
- client_id=client_id,
- refresh_token=refresh_token,
- )
+ timeout = httpx.Timeout(timeout_seconds if timeout_seconds else 15.0)
+ with httpx.Client(
+ timeout=timeout,
+ headers={"Accept": "application/json"},
+ verify=verify,
+ ) as client:
+ refreshed = _refresh_access_token(
+ client=client,
+ portal_base_url=portal_base_url,
+ client_id=client_id,
+ refresh_token=refresh_token,
+ )
- now = datetime.now(timezone.utc)
- access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
- state["access_token"] = refreshed["access_token"]
- state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
- state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
- state["scope"] = refreshed.get("scope") or state.get("scope")
- state["obtained_at"] = now.isoformat()
- state["expires_in"] = access_ttl
- state["expires_at"] = datetime.fromtimestamp(
- now.timestamp() + access_ttl,
- tz=timezone.utc,
- ).isoformat()
- state["portal_base_url"] = portal_base_url
- state["client_id"] = client_id
- state["tls"] = {
- "insecure": verify is False,
- "ca_bundle": verify if isinstance(verify, str) else None,
- }
- _save_provider_state(auth_store, "nous", state)
- _save_auth_store(auth_store)
- return state["access_token"]
+ now = datetime.now(timezone.utc)
+ access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
+ state["access_token"] = refreshed["access_token"]
+ state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
+ state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
+ state["scope"] = refreshed.get("scope") or state.get("scope")
+ state["obtained_at"] = now.isoformat()
+ state["expires_in"] = access_ttl
+ state["expires_at"] = datetime.fromtimestamp(
+ now.timestamp() + access_ttl,
+ tz=timezone.utc,
+ ).isoformat()
+ state["portal_base_url"] = portal_base_url
+ state["client_id"] = client_id
+ state["tls"] = {
+ "insecure": verify is False,
+ "ca_bundle": verify if isinstance(verify, str) else None,
+ }
+ _save_provider_state(auth_store, "nous", state)
+ _save_auth_store(auth_store)
+ _write_shared_nous_state(state)
+ return state["access_token"]
def refresh_nous_oauth_pure(
@@ -3293,46 +3616,53 @@ def resolve_nous_runtime_credentials(
# Step 1: refresh access token if expiring
if _is_expiring(state.get("expires_at"), ACCESS_TOKEN_REFRESH_SKEW_SECONDS):
- if not isinstance(refresh_token, str) or not refresh_token:
- raise AuthError("Session expired and no refresh token is available.",
- provider="nous", relogin_required=True)
+ with _nous_shared_store_lock(timeout_seconds=max(timeout_seconds + 5.0, AUTH_LOCK_TIMEOUT_SECONDS)):
+ if _merge_shared_nous_oauth_state(state):
+ access_token = state.get("access_token")
+ refresh_token = state.get("refresh_token")
+ _persist_state("post_shared_merge_access_expiring")
- _oauth_trace(
- "refresh_start",
- sequence_id=sequence_id,
- reason="access_expiring",
- refresh_token_fp=_token_fingerprint(refresh_token),
- )
- refreshed = _refresh_access_token(
- client=client, portal_base_url=portal_base_url,
- client_id=client_id, refresh_token=refresh_token,
- )
- now = datetime.now(timezone.utc)
- access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
- previous_refresh_token = refresh_token
- state["access_token"] = refreshed["access_token"]
- state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
- state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
- state["scope"] = refreshed.get("scope") or state.get("scope")
- refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
- if refreshed_url:
- inference_base_url = refreshed_url
- state["obtained_at"] = now.isoformat()
- state["expires_in"] = access_ttl
- state["expires_at"] = datetime.fromtimestamp(
- now.timestamp() + access_ttl, tz=timezone.utc
- ).isoformat()
- access_token = state["access_token"]
- refresh_token = state["refresh_token"]
- _oauth_trace(
- "refresh_success",
- sequence_id=sequence_id,
- reason="access_expiring",
- previous_refresh_token_fp=_token_fingerprint(previous_refresh_token),
- new_refresh_token_fp=_token_fingerprint(refresh_token),
- )
- # Persist immediately so downstream mint failures cannot drop rotated refresh tokens.
- _persist_state("post_refresh_access_expiring")
+ if _is_expiring(state.get("expires_at"), ACCESS_TOKEN_REFRESH_SKEW_SECONDS):
+ if not isinstance(refresh_token, str) or not refresh_token:
+ raise AuthError("Session expired and no refresh token is available.",
+ provider="nous", relogin_required=True)
+
+ _oauth_trace(
+ "refresh_start",
+ sequence_id=sequence_id,
+ reason="access_expiring",
+ refresh_token_fp=_token_fingerprint(refresh_token),
+ )
+ refreshed = _refresh_access_token(
+ client=client, portal_base_url=portal_base_url,
+ client_id=client_id, refresh_token=refresh_token,
+ )
+ now = datetime.now(timezone.utc)
+ access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
+ previous_refresh_token = refresh_token
+ state["access_token"] = refreshed["access_token"]
+ state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
+ state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
+ state["scope"] = refreshed.get("scope") or state.get("scope")
+ refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
+ if refreshed_url:
+ inference_base_url = refreshed_url
+ state["obtained_at"] = now.isoformat()
+ state["expires_in"] = access_ttl
+ state["expires_at"] = datetime.fromtimestamp(
+ now.timestamp() + access_ttl, tz=timezone.utc
+ ).isoformat()
+ access_token = state["access_token"]
+ refresh_token = state["refresh_token"]
+ _oauth_trace(
+ "refresh_success",
+ sequence_id=sequence_id,
+ reason="access_expiring",
+ previous_refresh_token_fp=_token_fingerprint(previous_refresh_token),
+ new_refresh_token_fp=_token_fingerprint(refresh_token),
+ )
+ # Persist immediately so downstream mint failures cannot drop rotated refresh tokens.
+ _persist_state("post_refresh_access_expiring")
# Step 2: mint agent key if missing/expiring
used_cached_key = False
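
The refresh branch above gates on `_is_expiring(state.get("expires_at"), ACCESS_TOKEN_REFRESH_SKEW_SECONDS)`, i.e. the token is refreshed a little before it actually expires. A minimal sketch of such a helper (the function body, the skew constant's value, and treating a missing or unparseable timestamp as already expired are assumptions, not the repo's actual implementation):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed value for illustration; the real constant lives in the auth module.
ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 60


def is_expiring(expires_at: Optional[str], skew_seconds: float) -> bool:
    """Return True when the ISO-8601 deadline is within ``skew_seconds`` of now.

    A missing or unparseable timestamp is treated as expired, which forces a
    refresh attempt rather than silently using a token of unknown lifetime.
    """
    if not expires_at:
        return True
    try:
        deadline = datetime.fromisoformat(expires_at)
    except ValueError:
        return True
    if deadline.tzinfo is None:
        # Persisted timestamps are written as UTC isoformat; assume UTC if naive.
        deadline = deadline.replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) >= deadline - timedelta(seconds=skew_seconds)
```

Refreshing ahead of the real deadline means a request started just before expiry never races the portal's clock.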
@@ -3365,41 +3695,47 @@ def resolve_nous_runtime_credentials(
and isinstance(latest_refresh_token, str)
and latest_refresh_token
):
- _oauth_trace(
- "refresh_start",
- sequence_id=sequence_id,
- reason="mint_retry_after_invalid_token",
- refresh_token_fp=_token_fingerprint(latest_refresh_token),
- )
- refreshed = _refresh_access_token(
- client=client, portal_base_url=portal_base_url,
- client_id=client_id, refresh_token=latest_refresh_token,
- )
- now = datetime.now(timezone.utc)
- access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
- state["access_token"] = refreshed["access_token"]
- state["refresh_token"] = refreshed.get("refresh_token") or latest_refresh_token
- state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
- state["scope"] = refreshed.get("scope") or state.get("scope")
- refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
- if refreshed_url:
- inference_base_url = refreshed_url
- state["obtained_at"] = now.isoformat()
- state["expires_in"] = access_ttl
- state["expires_at"] = datetime.fromtimestamp(
- now.timestamp() + access_ttl, tz=timezone.utc
- ).isoformat()
- access_token = state["access_token"]
- refresh_token = state["refresh_token"]
- _oauth_trace(
- "refresh_success",
- sequence_id=sequence_id,
- reason="mint_retry_after_invalid_token",
- previous_refresh_token_fp=_token_fingerprint(latest_refresh_token),
- new_refresh_token_fp=_token_fingerprint(refresh_token),
- )
- # Persist retry refresh immediately for crash safety and cross-process visibility.
- _persist_state("post_refresh_mint_retry")
+ with _nous_shared_store_lock(timeout_seconds=max(timeout_seconds + 5.0, AUTH_LOCK_TIMEOUT_SECONDS)):
+ if _merge_shared_nous_oauth_state(state):
+ access_token = state.get("access_token")
+ latest_refresh_token = state.get("refresh_token")
+ _persist_state("post_shared_merge_mint_retry")
+ else:
+ _oauth_trace(
+ "refresh_start",
+ sequence_id=sequence_id,
+ reason="mint_retry_after_invalid_token",
+ refresh_token_fp=_token_fingerprint(latest_refresh_token),
+ )
+ refreshed = _refresh_access_token(
+ client=client, portal_base_url=portal_base_url,
+ client_id=client_id, refresh_token=latest_refresh_token,
+ )
+ now = datetime.now(timezone.utc)
+ access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
+ state["access_token"] = refreshed["access_token"]
+ state["refresh_token"] = refreshed.get("refresh_token") or latest_refresh_token
+ state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
+ state["scope"] = refreshed.get("scope") or state.get("scope")
+ refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
+ if refreshed_url:
+ inference_base_url = refreshed_url
+ state["obtained_at"] = now.isoformat()
+ state["expires_in"] = access_ttl
+ state["expires_at"] = datetime.fromtimestamp(
+ now.timestamp() + access_ttl, tz=timezone.utc
+ ).isoformat()
+ access_token = state["access_token"]
+ refresh_token = state["refresh_token"]
+ _oauth_trace(
+ "refresh_success",
+ sequence_id=sequence_id,
+ reason="mint_retry_after_invalid_token",
+ previous_refresh_token_fp=_token_fingerprint(latest_refresh_token),
+ new_refresh_token_fp=_token_fingerprint(refresh_token),
+ )
+ # Persist retry refresh immediately for crash safety and cross-process visibility.
+ _persist_state("post_refresh_mint_retry")
mint_payload = _mint_agent_key(
client=client, portal_base_url=portal_base_url,
@@ -3895,6 +4231,14 @@ def _config_provider_matches(provider_id: Optional[str]) -> bool:
return _get_config_provider() == provider_id.strip().lower()
+def _should_reset_config_provider_on_logout(provider_id: Optional[str]) -> bool:
+ """Return True when logout should reset the model provider config."""
+ if not provider_id:
+ return False
+ normalized = provider_id.strip().lower()
+ return normalized in PROVIDER_REGISTRY and _config_provider_matches(normalized)
+
+
def _logout_default_provider_from_config() -> Optional[str]:
"""Fallback logout target when auth.json has no active provider.
@@ -4980,15 +5324,18 @@ def logout_command(args) -> None:
print("No provider is currently logged in.")
return
- config_matches = _config_provider_matches(target)
+ should_reset_config = _should_reset_config_provider_on_logout(target)
provider_name = get_auth_provider_display_name(target)
- if clear_provider_auth(target) or config_matches:
- _reset_config_provider()
+ if clear_provider_auth(target) or should_reset_config:
+ if should_reset_config:
+ _reset_config_provider()
print(f"Logged out of {provider_name}.")
- if os.getenv("OPENROUTER_API_KEY"):
+ if should_reset_config and os.getenv("OPENROUTER_API_KEY"):
print("Hermes will use OpenRouter for inference.")
- else:
+ elif should_reset_config:
print("Run `hermes model` or configure an API key to use Hermes.")
+ else:
+ print("Model provider configuration was unchanged.")
else:
print(f"No auth state found for {provider_name}.")
diff --git a/hermes_cli/checkpoints.py b/hermes_cli/checkpoints.py
new file mode 100644
index 0000000000..cac5cd0979
--- /dev/null
+++ b/hermes_cli/checkpoints.py
@@ -0,0 +1,244 @@
+"""`hermes checkpoints` CLI subcommand.
+
+Gives users direct visibility and control over the filesystem checkpoint
+store at ``~/.hermes/checkpoints/``. Actions:
+
+ hermes checkpoints # same as `status`
+ hermes checkpoints status # total size, project count, breakdown
+ hermes checkpoints list # per-project checkpoint counts + workdir
+ hermes checkpoints prune [opts] # force a sweep (ignores the 24h marker)
+ hermes checkpoints clear [-f] # nuke the entire base (asks first)
+ hermes checkpoints clear-legacy # delete just the legacy-* archives
+
+Examples::
+
+ hermes checkpoints
+ hermes checkpoints prune --retention-days 3 --max-size-mb 200
+ hermes checkpoints clear -f
+
+None of these require the agent to be running. Safe to call any time.
+"""
+
+from __future__ import annotations
+
+import argparse
+import time
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict
+
+
+def _fmt_bytes(n: int) -> str:
+ units = ("B", "KB", "MB", "GB", "TB")
+ size = float(n or 0)
+ for unit in units:
+ if size < 1024 or unit == units[-1]:
+ if unit == "B":
+ return f"{int(size)} {unit}"
+ return f"{size:.1f} {unit}"
+ size /= 1024
+ return f"{size:.1f} TB"
+
+
+def _fmt_ts(ts: Any) -> str:
+ try:
+ return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M")
+ except (TypeError, ValueError):
+ return "—"
+
+
+def _fmt_age(ts: Any) -> str:
+ try:
+ age = time.time() - float(ts)
+ except (TypeError, ValueError):
+ return "—"
+ if age < 0:
+ return "now"
+ if age < 60:
+ return f"{int(age)}s ago"
+ if age < 3600:
+ return f"{int(age / 60)}m ago"
+ if age < 86400:
+ return f"{int(age / 3600)}h ago"
+ return f"{int(age / 86400)}d ago"
+
+
+def cmd_status(args: argparse.Namespace) -> int:
+ from tools.checkpoint_manager import store_status
+
+ info = store_status()
+ base = info["base"]
+ print(f"Checkpoint base: {base}")
+ print(f"Total size: {_fmt_bytes(info['total_size_bytes'])}")
+ print(f" store/ {_fmt_bytes(info['store_size_bytes'])}")
+ print(f" legacy-* {_fmt_bytes(info['legacy_size_bytes'])}")
+ print(f"Projects: {info['project_count']}")
+
+ projects = sorted(
+ info["projects"],
+ key=lambda p: (p.get("last_touch") or 0),
+ reverse=True,
+ )
+ if projects:
+ print()
+ print(f" {'WORKDIR':<60} {'COMMITS':>7} {'LAST TOUCH':>12} STATE")
+ for p in projects[: getattr(args, "limit", 0) or 20]:
+ wd = p.get("workdir") or "(unknown)"
+ if len(wd) > 60:
+ wd = "…" + wd[-59:]
+ exists = p.get("exists")
+ state = "live" if exists else "orphan"
+ commits = p.get("commits", 0)
+ last = _fmt_age(p.get("last_touch"))
+ print(f" {wd:<60} {commits:>7} {last:>12} {state}")
+
+ legacy = info.get("legacy_archives", [])
+ if legacy:
+ print()
+ print(f"Legacy archives ({len(legacy)}):")
+ for arch in sorted(legacy, key=lambda a: a.get("mtime", 0), reverse=True):
+ print(f" {arch['name']:<40} {_fmt_bytes(arch['size_bytes']):>10}")
+ print()
+ print("Clear with: hermes checkpoints clear-legacy")
+ return 0
+
+
+def cmd_list(args: argparse.Namespace) -> int:
+ # `list` is an alias for `status`; it prints the same output.
+ return cmd_status(args)
+
+
+def cmd_prune(args: argparse.Namespace) -> int:
+ from tools.checkpoint_manager import prune_checkpoints
+
+ retention_days = args.retention_days
+ max_size_mb = args.max_size_mb
+
+ print("Pruning checkpoint store…")
+ print(f" retention_days: {retention_days}")
+ print(f" delete_orphans: {not args.keep_orphans}")
+ print(f" max_total_size_mb: {max_size_mb}")
+ print()
+
+ result = prune_checkpoints(
+ retention_days=retention_days,
+ delete_orphans=not args.keep_orphans,
+ max_total_size_mb=max_size_mb,
+ )
+ print(f"Scanned: {result['scanned']}")
+ print(f"Deleted orphan: {result['deleted_orphan']}")
+ print(f"Deleted stale: {result['deleted_stale']}")
+ print(f"Errors: {result['errors']}")
+ print(f"Bytes reclaimed: {_fmt_bytes(result['bytes_freed'])}")
+ return 0
+
+
+def _confirm(prompt: str) -> bool:
+ try:
+ resp = input(f"{prompt} [y/N]: ").strip().lower()
+ except (EOFError, KeyboardInterrupt):
+ print()
+ return False
+ return resp in ("y", "yes")
+
+
+def cmd_clear(args: argparse.Namespace) -> int:
+ from tools.checkpoint_manager import CHECKPOINT_BASE, clear_all, store_status
+
+ info = store_status()
+ if info["total_size_bytes"] == 0 and not Path(CHECKPOINT_BASE).exists():
+ print("Nothing to clear — checkpoint base does not exist.")
+ return 0
+
+ print(f"This will delete the ENTIRE checkpoint base at {info['base']}")
+ print(f" size: {_fmt_bytes(info['total_size_bytes'])}")
+ print(f" projects: {info['project_count']}")
+ print(f" legacy dirs: {len(info.get('legacy_archives', []))}")
+ print()
+ print("All /rollback history for every working directory will be lost.")
+ if not args.force and not _confirm("Proceed?"):
+ print("Aborted.")
+ return 1
+
+ result = clear_all()
+ if result["deleted"]:
+ print(f"Cleared. Reclaimed {_fmt_bytes(result['bytes_freed'])}.")
+ return 0
+ print("Could not clear checkpoint base (see logs).")
+ return 2
+
+
+def cmd_clear_legacy(args: argparse.Namespace) -> int:
+ from tools.checkpoint_manager import clear_legacy, store_status
+
+ info = store_status()
+ legacy = info.get("legacy_archives", [])
+ if not legacy:
+ print("No legacy archives to clear.")
+ return 0
+
+ total = sum(a.get("size_bytes", 0) for a in legacy)
+ print(f"Found {len(legacy)} legacy archive(s), total {_fmt_bytes(total)}:")
+ for arch in legacy:
+ print(f" {arch['name']:<40} {_fmt_bytes(arch['size_bytes']):>10}")
+ print()
+ print("Legacy archives hold pre-v2 per-project shadow repos, moved aside")
+ print("during the single-store migration. Delete when you're confident")
+ print("you don't need the old /rollback history.")
+ if not args.force and not _confirm("Delete all legacy archives?"):
+ print("Aborted.")
+ return 1
+
+ result = clear_legacy()
+ print(f"Deleted {result['deleted']} archive(s), reclaimed {_fmt_bytes(result['bytes_freed'])}.")
+ return 0
+
+
+def register_cli(parser: argparse.ArgumentParser) -> None:
+ """Wire subcommands onto the ``hermes checkpoints`` parser."""
+ parser.set_defaults(func=cmd_status) # bare `hermes checkpoints` → status
+ subs = parser.add_subparsers(dest="checkpoints_command", metavar="COMMAND")
+
+ p_status = subs.add_parser(
+ "status",
+ help="Show total size, project count, and per-project breakdown",
+ )
+ p_status.add_argument("--limit", type=int, default=20,
+ help="Max projects to list (default 20)")
+ p_status.set_defaults(func=cmd_status)
+
+ p_list = subs.add_parser(
+ "list",
+ help="Alias for 'status'",
+ )
+ p_list.add_argument("--limit", type=int, default=20)
+ p_list.set_defaults(func=cmd_list)
+
+ p_prune = subs.add_parser(
+ "prune",
+ help="Delete orphan/stale checkpoints and GC the store",
+ )
+ p_prune.add_argument("--retention-days", type=int, default=7,
+ help="Drop projects whose last_touch is older than N days (default 7)")
+ p_prune.add_argument("--max-size-mb", type=int, default=500,
+ help="After orphan/stale prune, drop oldest commits "
+ "per project until total size <= this (default 500)")
+ p_prune.add_argument("--keep-orphans", action="store_true",
+ help="Skip deleting projects whose workdir no longer exists")
+ p_prune.set_defaults(func=cmd_prune)
+
+ p_clear = subs.add_parser(
+ "clear",
+ help="Delete the entire checkpoint base (all /rollback history)",
+ )
+ p_clear.add_argument("-f", "--force", action="store_true",
+ help="Skip confirmation prompt")
+ p_clear.set_defaults(func=cmd_clear)
+
+ p_legacy = subs.add_parser(
+ "clear-legacy",
+ help="Delete only the legacy-*/ archives from v1 migration"
+ )
+ p_legacy.add_argument("-f", "--force", action="store_true",
+ help="Skip confirmation prompt")
+ p_legacy.set_defaults(func=cmd_clear_legacy)
diff --git a/hermes_cli/claw.py b/hermes_cli/claw.py
index 9b02916a55..5455b4355d 100644
--- a/hermes_cli/claw.py
+++ b/hermes_cli/claw.py
@@ -235,6 +235,9 @@ def _scan_workspace_state(source_dir: Path) -> list[tuple[Path, str]]:
"""
findings: list[tuple[Path, str]] = []
+ if not source_dir.exists():
+ return findings
+
# Direct state files in the root
for name in ("todo.json", "sessions", "logs"):
candidate = source_dir / name
@@ -243,7 +246,12 @@ def _scan_workspace_state(source_dir: Path) -> list[tuple[Path, str]]:
findings.append((candidate, f"Root {kind}: {name}"))
# State files inside workspace directories
- for child in sorted(source_dir.iterdir()):
+ try:
+ children = sorted(source_dir.iterdir())
+ except OSError:
+ return findings
+
+ for child in children:
if not child.is_dir() or child.name.startswith("."):
continue
# Check for workspace-like subdirectories
diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py
index c7ddfa0fa0..de41bcfae7 100644
--- a/hermes_cli/commands.py
+++ b/hermes_cli/commands.py
@@ -65,6 +65,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Session
CommandDef("new", "Start a new session (fresh session ID + history)", "Session",
aliases=("reset",), args_hint="[name]"),
+ CommandDef("topic", "Enable or inspect Telegram DM topic sessions", "Session",
+ gateway_only=True, args_hint="[off|help|session-id]"),
CommandDef("clear", "Clear screen and start a new session", "Session",
cli_only=True),
CommandDef("redraw", "Force a full UI repaint (recovers from terminal drift)", "Session",
@@ -107,6 +109,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("resume", "Resume a previously-named session", "Session",
args_hint="[name]"),
+ # Session
+ CommandDef("sessions", "Browse and resume previous sessions", "Session"),
+
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
@@ -155,9 +160,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
- CommandDef("curator", "Background skill maintenance (status, run, pin, archive)",
+ CommandDef("curator", "Background skill maintenance (status, run, pin, archive, list-archived)",
"Tools & Skills", args_hint="[subcommand]",
- subcommands=("status", "run", "pause", "resume", "pin", "unpin", "restore")),
+ subcommands=("status", "run", "pause", "resume", "pin", "unpin", "restore", "list-archived")),
CommandDef("kanban", "Multi-profile collaboration board (tasks, links, comments)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("list", "ls", "show", "create", "assign", "link", "unlink",
diff --git a/hermes_cli/config.py b/hermes_cli/config.py
index 0f34d98528..1e040c3685 100644
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -544,12 +544,25 @@ DEFAULT_CONFIG = {
# via TERMINAL_LOCAL_PERSISTENT env var.
"persistent_shell": True,
},
-
+
+ "web": {
+ "backend": "", # shared fallback — applies to both search and extract
+ "search_backend": "", # per-capability override for web_search (e.g. "searxng")
+ "extract_backend": "", # per-capability override for web_extract (e.g. "native")
+ },
+
"browser": {
"inactivity_timeout": 120,
"command_timeout": 30, # Timeout for browser commands in seconds (screenshot, navigate, etc.)
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
+ # Browser engine for local mode. Passed as ``--engine <engine>`` to
+ # agent-browser v0.25.3+.
+ # "auto" — use Chrome (default, don't pass --engine at all)
+ # "lightpanda" — use Lightpanda (1.3-5.8x faster navigation, no screenshots)
+ # "chrome" — explicitly request Chrome
+ # Also settable via AGENT_BROWSER_ENGINE env var.
+ "engine": "auto",
"auto_local_for_private_urls": True, # When a cloud provider is set, auto-spawn local Chromium for LAN/localhost URLs instead of sending them to the cloud
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
@@ -567,21 +580,39 @@ DEFAULT_CONFIG = {
},
# Filesystem checkpoints — automatic snapshots before destructive file ops.
- # When enabled, the agent takes a snapshot of the working directory once per
- # conversation turn (on first write_file/patch call). Use /rollback to restore.
+ # When enabled, the agent takes a snapshot of the working directory once
+ # per conversation turn (on first write_file/patch call). Use /rollback
+ # to restore.
+ #
+ # Defaults changed in v2 (single shared shadow store, real pruning):
+ # - enabled: True -> False (opt-in; most users never use /rollback)
+ # - max_snapshots: 50 -> 20 (now actually enforced via ref rewrite)
+ # - auto_prune: False -> True (orphans/stale pruned automatically)
+ # Opt in via ``hermes chat --checkpoints`` or set enabled=True here.
"checkpoints": {
- "enabled": True,
- "max_snapshots": 50, # Max checkpoints to keep per directory
- # Auto-maintenance: shadow repos accumulate forever under
- # ~/.hermes/checkpoints/ (one per cd'd working directory). Field
- # reports put the typical offender at 1000+ repos / ~12 GB. When
- # auto_prune is on, hermes sweeps at startup (at most once per
- # min_interval_hours) and deletes:
- # * orphan repos: HERMES_WORKDIR no longer exists on disk
- # * stale repos: newest mtime older than retention_days
- # Opt-in so users who rely on /rollback against long-ago sessions
- # never lose data silently.
- "auto_prune": False,
+ "enabled": False,
+ # Max checkpoints to keep per working directory. Pre-v2 this only
+ # limited the `/rollback` listing; v2 actually rewrites the ref and
+ # garbage-collects older commits.
+ "max_snapshots": 20,
+ # Hard ceiling on total ``~/.hermes/checkpoints/`` size (MB). When
+ # exceeded, the oldest checkpoint per project is dropped in a
+ # round-robin pass until total size falls under the cap.
+ # 0 disables the size cap.
+ "max_total_size_mb": 500,
+ # Skip any single file larger than this when staging a checkpoint.
+ # Prevents accidental snapshotting of datasets, model weights, and
+ # other large generated assets. 0 disables the filter.
+ "max_file_size_mb": 10,
+ # Auto-maintenance: hermes sweeps the checkpoint base at startup
+ # (at most once per ``min_interval_hours``) and:
+ # * deletes project entries whose workdir no longer exists (orphan)
+ # * deletes project entries whose last_touch is older than
+ # ``retention_days``
+ # * GCs the single shared store to reclaim unreachable objects
+ # * enforces ``max_total_size_mb`` across remaining projects
+ # * deletes ``legacy-*`` archives older than ``retention_days``
+ "auto_prune": True,
"retention_days": 7,
"delete_orphans": True,
"min_interval_hours": 24,
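
The ``max_total_size_mb`` comment above describes a round-robin pass: when the store exceeds the cap, the oldest checkpoint of each project is dropped in turn until total size falls under it, and 0 disables the cap. A sketch of that policy under assumed data shapes (a mapping of project name to oldest-first ``(timestamp, size_bytes)`` lists; the real pruner operates on git refs, not in-memory lists):

```python
from collections import deque
from typing import Dict, List, Tuple

Commits = List[Tuple[float, int]]  # (timestamp, size_bytes), oldest first


def enforce_size_cap(projects: Dict[str, Commits], max_total_bytes: int) -> Dict[str, Commits]:
    """Drop the oldest checkpoint per project, round-robin, until under the cap.

    A cap of 0 (or below) disables enforcement, mirroring max_total_size_mb=0.
    """
    if max_total_bytes <= 0:
        return projects
    queues = {name: deque(commits) for name, commits in projects.items()}
    total = sum(size for q in queues.values() for _, size in q)
    while total > max_total_bytes:
        dropped = False
        for q in queues.values():
            if total <= max_total_bytes:
                break
            if q:
                _, size = q.popleft()  # shed this project's oldest checkpoint
                total -= size
                dropped = True
        if not dropped:
            break  # every queue is empty; nothing left to reclaim
    return {name: list(q) for name, q in queues.items()}
```

Round-robin keeps the trimming fair: a single huge project cannot force every other project's history to be deleted first.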
@@ -749,6 +780,19 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
+ # Triage specifier — flesh out a rough one-liner in the Kanban
+ # Triage column into a concrete spec, then promote it to ``todo``.
+ # Invoked by ``hermes kanban specify`` (single id or --all). Set a
+ # cheap, capable model here (gemini-flash works well); the main
+ # model is overkill for short spec expansion.
+ "triage_specifier": {
+ "provider": "auto",
+ "model": "",
+ "base_url": "",
+ "api_key": "",
+ "timeout": 120,
+ "extra_body": {},
+ },
# Curator — skill-usage review fork. Timeout is generous because the
# review pass can take several minutes on reasoning models (umbrella
# building over hundreds of candidate skills). "auto" = use main chat
@@ -778,9 +822,19 @@ DEFAULT_CONFIG = {
"show_reasoning": False,
"streaming": False,
"final_response_markdown": "strip", # render | strip | raw
+ # Preserve recent classic CLI output across Ctrl+L, /redraw, and
+ # terminal resize full-screen clears. Disable if a terminal emulator
+ # behaves badly with replayed scrollback.
+ "persistent_output": True,
+ "persistent_output_max_lines": 200,
"inline_diffs": True, # Show inline diff previews for write actions (write_file, patch, skill_manage)
"show_cost": False, # Show $ cost in the status bar (off by default)
"skin": "default",
+ # UI language for static user-facing messages (approval prompts, a
+ # handful of gateway slash-command replies). Does NOT affect agent
+ # responses, log lines, tool outputs, or slash-command descriptions.
+ # Supported: en, zh, ja, de, es, fr, tr, uk. Unknown values fall back to en.
+ "language": "en",
# TUI busy indicator style: kaomoji (default), emoji, unicode (braille
# spinner), or ascii. Live-swappable via `/indicator