Merge branch 'main' into fix/hermes-plugin-openinference-finalization

2026-06-09 08:21:50 +00:00 · 2026-06-08 14:19:18 -07:00 · 2026-06-08 14:19:18 -07:00 · cf49630379
commit cf49630379
parent 728612c29c 9fd3d5cf85
314 changed files with 33143 additions and 14664 deletions
--- a/.github/workflows/nix-lockfile-fix.yml
+++ b/.github/workflows/nix-lockfile-fix.yml
@ -75,9 +75,10 @@ jobs:
        run: |
          set -euo pipefail

-          # Ensure only nix files were modified — prevents accidental
-          # self-triggering if fix-lockfiles ever touches package files.
-          unexpected="$(git diff --name-only | grep -Ev '^nix/(tui|web)\.nix$' || true)"
+          # Ensure only nix/lib.nix (home of the single npmDepsHash) was
+          # modified — prevents accidental self-triggering if fix-lockfiles
+          # ever touches package files.
+          unexpected="$(git diff --name-only | grep -Ev '^nix/lib\.nix$' || true)"
          if [ -n "$unexpected" ]; then
            echo "::error::Unexpected modified files: $unexpected"
            exit 1
@ -89,7 +90,7 @@ jobs:

          git config user.name 'github-actions[bot]'
          git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
-          git add nix/tui.nix nix/web.nix
+          git add nix/lib.nix
          git commit -m "fix(nix): auto-refresh npm lockfile hashes" \
            -m "Source: $GITHUB_SHA" \
            -m "Run: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID"
@ -216,7 +217,7 @@ jobs:
          set -euo pipefail
          git config user.name 'github-actions[bot]'
          git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
-          git add nix/tui.nix nix/web.nix
+          git add nix/lib.nix
          git commit -m "fix(nix): refresh npm lockfile hashes"
          git push
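
Reviewer note: the narrowed guard above can be reasoned about outside the workflow. The sketch below mirrors the `grep -Ev '^nix/lib\.nix$'` filter in Python. The helper name `unexpected_files` and the sample paths are illustrative assumptions, not part of the repository.

```python
import re
from typing import Iterable, List

# Same anchored pattern the workflow passes to `grep -Ev`.
ALLOWED = re.compile(r"^nix/lib\.nix$")


def unexpected_files(changed: Iterable[str]) -> List[str]:
    """Return every changed path that is NOT the single allowed file.

    Hypothetical helper: mirrors `git diff --name-only | grep -Ev ...`,
    where any surviving line means the job should fail.
    """
    return [path for path in changed if path and not ALLOWED.search(path)]


# Only nix/lib.nix changed: nothing unexpected, the commit step may proceed.
clean = unexpected_files(["nix/lib.nix"])

# The old per-package files now count as unexpected modifications.
dirty = unexpected_files(["nix/lib.nix", "nix/tui.nix", "package-lock.json"])
```

With this allowlist, a run that still rewrites `nix/tui.nix` or `nix/web.nix` would fail the guard instead of being committed silently.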

--- a/README.md
+++ b/README.md
@ -10,6 +10,7 @@
  <a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License: MIT"></a>
  <a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
  <a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
+  <a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
 </p>

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
@ -52,7 +53,7 @@ If you already have Git installed, the installer detects it and uses that instea

 > **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
 >
-> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
+> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux.

 After installation:

--- a/README.ur-pk.md
+++ b/README.ur-pk.md
@ -0,0 +1,261 @@
+<div dir="rtl">
+
+<p align="center">
+  <img src="assets/banner.png" alt="Hermes Agent" width="100%">
+</p>
+
+# ہرمیس ایجنٹ ☤ (Hermes Agent)
+
+<p align="center">
+  <a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentation"></a>
+  <a href="https://discord.gg/NousResearch"><img src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
+  <a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License: MIT"></a>
+  <a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
+  <a href="README.md"><img src="https://img.shields.io/badge/Lang-English-lightgrey?style=for-the-badge" alt="English"></a>
+  <a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
+</p>
+
+**[نوس ریسرچ (Nous Research)](https://nousresearch.com) کا تیار کردہ خود کو بہتر بنانے والا اے آئی (AI) ایجنٹ۔** یہ واحد ایجنٹ ہے جس میں سیکھنے کا عمل (learning loop) پہلے سے موجود ہے — یہ اپنے تجربات سے نئی مہارتیں (skills) بناتا ہے، استعمال کے دوران ان کو بہتر کرتا ہے، معلومات کو محفوظ رکھنے کے لیے خود کو یاد دہانی کرواتا ہے، اپنی پرانی بات چیت کو تلاش کر سکتا ہے، اور مختلف سیشنز کے دوران آپ کے بارے میں ایک گہری سمجھ پیدا کرتا ہے۔ اسے $5 والے VPS پر چلائیں، GPU کلسٹر پر، یا سرور لیس (serverless) انفراسٹرکچر پر جس کی قیمت استعمال نہ ہونے پر تقریباً صفر ہے۔ یہ آپ کے لیپ ٹاپ تک محدود نہیں ہے — آپ ٹیلی گرام (Telegram) سے اس کے ساتھ بات چیت کر سکتے ہیں جبکہ یہ کلاؤڈ VM پر کام کر رہا ہو۔
+
+آپ اپنی مرضی کا کوئی بھی ماڈل استعمال کر سکتے ہیں — [Nous Portal](https://portal.nousresearch.com)، [OpenRouter](https://openrouter.ai) (200 سے زائد ماڈلز)، [NovitaAI](https://novita.ai) (ماڈل API، ایجنٹ سینڈ باکس، اور GPU کلاؤڈ کے لیے اے آئی مقامی کلاؤڈ)، [NVIDIA NIM](https://build.nvidia.com) (Nemotron)، [Xiaomi MiMo](https://platform.xiaomimimo.com)، [z.ai/GLM](https://z.ai)، [Kimi/Moonshot](https://platform.moonshot.ai)، [MiniMax](https://www.minimax.io)، [Hugging Face](https://huggingface.co)، OpenAI، یا اپنا حسب ضرورت اینڈ پوائنٹ (endpoint) استعمال کریں۔ ماڈل تبدیل کرنے کے لیے صرف `hermes model` استعمال کریں — کسی کوڈ کو تبدیل کرنے کی ضرورت نہیں، کوئی پابندی نہیں۔
+
+<table>
+<tr><td><b>حقیقی ٹرمینل انٹرفیس</b></td><td>مکمل TUI جس میں ملٹی لائن ایڈیٹنگ، سلیش-کمانڈ آٹو کمپلیٹ، بات چیت کی ہسٹری، انٹرپٹ اور ری ڈائریکٹ، اور سٹریمنگ ٹول آؤٹ پٹ شامل ہے۔</td></tr>
+<tr><td><b>یہ وہاں موجود ہے جہاں آپ ہیں</b></td><td>ٹیلی گرام، ڈسکارڈ (Discord)، سلیک (Slack)، واٹس ایپ (WhatsApp)، سگنل (Signal)، اور CLI — سب ایک ہی گیٹ وے پروسیس سے کام کرتے ہیں۔ وائس میمو (Voice memo) ٹرانسکرپشن، کراس پلیٹ فارم بات چیت کا تسلسل۔</td></tr>
+<tr><td><b>سیکھنے کا ایک مکمل عمل</b></td><td>ایجنٹ کی اپنی ترتیب دی گئی میموری، جس میں وہ خود کو وقتاً فوقتاً یاد دہانی کرواتا ہے۔ پیچیدہ کاموں کے بعد خود کار طریقے سے مہارت (skill) کی تخلیق۔ استعمال کے دوران مہارتوں میں بہتری۔ LLM سمرائزیشن کے ساتھ FTS5 سیشن سرچ تاکہ پرانے سیشنز کی یاددہانی کی جا سکے۔ <a href="https://github.com/plastic-labs/honcho">Honcho</a> کے ذریعے صارف کی ماڈلنگ۔ <a href="https://agentskills.io">agentskills.io</a> اوپن سٹینڈرڈ کے ساتھ مکمل مطابقت۔</td></tr>
+<tr><td><b>شیڈول کی گئی خودکار کارروائیاں</b></td><td>بلٹ ان (Built-in) کرون (cron) شیڈیولر جو کسی بھی پلیٹ فارم پر ڈیلیوری کے لیے استعمال ہو سکتا ہے۔ روزانہ کی رپورٹس، رات کے بیک اپس، ہفتہ وار آڈٹس — یہ سب کچھ قدرتی زبان (natural language) میں اور بغیر کسی نگرانی کے کام کرتا ہے۔</td></tr>
+<tr><td><b>کام کی تقسیم اور متوازی عمل</b></td><td>متوازی (parallel) کاموں کے لیے الگ سے ذیلی ایجنٹس (subagents) بنائیں۔ پائتھون (Python) سکرپٹس لکھیں جو RPC کے ذریعے ٹولز کو استعمال کریں، تاکہ کئی مراحل پر مشتمل کاموں کو بغیر کسی سیاق و سباق (context) کے خرچ کے، ایک ہی باری میں انجام دیا جا سکے۔</td></tr>
+<tr><td><b>کہیں بھی چلائیں، صرف اپنے لیپ ٹاپ پر نہیں</b></td><td>چھ (Six) ٹرمینل بیک اینڈز — لوکل، Docker، SSH، Singularity، Modal، اور Daytona۔ ڈیٹونا (Daytona) اور موڈل (Modal) سرور لیس (serverless) فعالیت پیش کرتے ہیں — جب آپ کا ایجنٹ فارغ ہوتا ہے تو اس کا ماحول سلیپ (hibernate) ہو جاتا ہے اور ضرورت پڑنے پر خود بخود جاگ جاتا ہے، جس کی وجہ سے سیشنز کے درمیان لاگت تقریباً صفر رہتی ہے۔ اسے $5 والے VPS یا GPU کلسٹر پر چلائیں۔</td></tr>
+<tr><td><b>تحقیق کے لیے تیار</b></td><td>بیچ (Batch) ٹریجیکٹری (trajectory) جنریشن، اگلی نسل کے ٹول کالنگ ماڈلز کی تربیت کے لیے ٹریجیکٹری کمپریشن۔</td></tr>
+</table>
+
+---
+
+## فوری انسٹالیشن (Quick Install)
+
+### لینکس (Linux)، میک او ایس (macOS)، ڈبلیو ایس ایل ٹو (WSL2)، ٹرمکس (Termux)
+
+<div dir="ltr">
+
+```bash
+curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
+```
+
+</div>
+
+### ونڈوز (نیٹو، پاور شیل)
+
+> **توجہ فرمائیں:** مقامی ونڈوز (Native Windows) پر ہرمیس بغیر WSL کے چلتا ہے — CLI، گیٹ وے، TUI، اور ٹولز سب مقامی طور پر کام کرتے ہیں۔ اگر آپ WSL2 استعمال کرنا پسند کرتے ہیں، تو اوپر دی گئی لینکس/میک او ایس کی کمانڈ وہاں بھی کام کرے گی۔ کوئی مسئلہ نظر آیا؟ براہ کرم [مسائل (issues) درج کریں](https://github.com/NousResearch/hermes-agent/issues)۔
+
+اسے پاور شیل (PowerShell) میں چلائیں:
+
+<div dir="ltr">
+
+```powershell
+iex (irm https://hermes-agent.nousresearch.com/install.ps1)
+```
+
+</div>
+
+انسٹالر سب کچھ خود سنبھالتا ہے: uv، Python 3.11، Node.js، ripgrep، ffmpeg، **اور ایک پورٹ ایبل (portable) گٹ بیش (Git Bash)** (یعنی MinGit، جو `%LOCALAPPDATA%\hermes\git` میں ان پیک ہوتا ہے — اس کے لیے ایڈمن کی اجازت درکار نہیں، اور یہ سسٹم کے کسی بھی گٹ انسٹال سے بالکل الگ ہے)۔ ہرمیس اس بنڈل شدہ گٹ بیش کو شیل کمانڈز چلانے کے لیے استعمال کرتا ہے۔
+
+اگر آپ کے پاس پہلے سے گٹ (Git) انسٹال ہے، تو انسٹالر اسے شناخت کر لیتا ہے اور اسے ہی استعمال کرتا ہے۔ بصورت دیگر آپ کو صرف ~45MB کے MinGit ڈاؤنلوڈ کی ضرورت ہوگی — یہ آپ کے سسٹم کے گٹ پر کوئی اثر نہیں ڈالے گا۔
+
+> **اینڈرائیڈ (Android) / ٹرمکس (Termux):** ٹیسٹ کیا گیا مینوئل طریقہ [Termux گائیڈ](https://hermes-agent.nousresearch.com/docs/getting-started/termux) میں موجود ہے۔ ٹرمکس پر ہرمیس ایک مخصوص `.[termux]` ایکسٹرا انسٹال کرتا ہے کیونکہ مکمل `.[all]` ایکسٹرا میں ایسی وائس ڈیپینڈینسیز شامل ہیں جو اینڈرائیڈ کے ساتھ مطابقت نہیں رکھتیں۔
+>
+> **ونڈوز (Windows):** مقامی ونڈوز کی مکمل سپورٹ موجود ہے — اوپر دی گئی پاور شیل کی کمانڈ سب کچھ انسٹال کر دیتی ہے۔ اگر آپ WSL2 استعمال کرنا چاہتے ہیں، تو لینکس کی کمانڈ وہاں کام کرتی ہے۔ مقامی ونڈوز میں انسٹالیشن `%LOCALAPPDATA%\hermes` میں ہوتی ہے؛ جبکہ WSL2 میں لینکس کی طرح `~/.hermes` میں ہوتی ہے۔
+
+انسٹالیشن کے بعد:
+
+<div dir="ltr">
+
+```bash
+source ~/.bashrc    # شیل کو ری لوڈ کریں (یا: source ~/.zshrc)
+hermes              # بات چیت شروع کریں!
+```
+
+</div>
+
+---
+
+## آغاز کریں (Getting Started)
+
+<div dir="ltr">
+
+```bash
+hermes              # انٹرایکٹو CLI — بات چیت شروع کریں
+hermes model        # اپنا LLM پرووائیڈر اور ماڈل منتخب کریں
+hermes tools        # کنفیگر کریں کہ کون سے ٹولز ایکٹو ہیں
+hermes config set   # انفرادی کنفگ (config) ویلیوز سیٹ کریں
+hermes gateway      # میسجنگ گیٹ وے شروع کریں (ٹیلی گرام، ڈسکارڈ، وغیرہ)
+hermes setup        # مکمل سیٹ اپ وزرڈ چلائیں (یہ سب کچھ ایک ساتھ کنفیگر کر دے گا)
+hermes claw migrate # OpenClaw سے مائیگریٹ کریں (اگر آپ OpenClaw سے آ رہے ہیں)
+hermes update       # لیٹسٹ ورژن پر اپ ڈیٹ کریں
+hermes doctor       # کسی بھی مسئلے کی تشخیص کریں
+```
+
+</div>
+
+📖 **[مکمل دستاویزات →](https://hermes-agent.nousresearch.com/docs/)**
+
+---
+
+## API-کیز اکٹھی کرنے سے بچیں — Nous Portal
+
+ہرمیس آپ کے پسندیدہ پرووائیڈر کے ساتھ کام کرتا ہے — یہ چیز تبدیل نہیں ہو رہی۔ لیکن اگر آپ ماڈل، ویب سرچ، امیج جنریشن، TTS، اور کلاؤڈ براؤزر کے لیے پانچ الگ الگ API کیز جمع نہیں کرنا چاہتے، تو **[Nous Portal](https://portal.nousresearch.com)** ان سب کو ایک ہی سبسکرپشن کے تحت کور کرتا ہے:
+
+- **300+ ماڈلز** — ان میں سے کوئی بھی ماڈل `/model <name>` کے ذریعے منتخب کریں
+- **ٹول گیٹ وے (Tool Gateway)** — ویب سرچ (Firecrawl)، امیج جنریشن (FAL)، ٹیکسٹ ٹو سپیچ (OpenAI)، کلاؤڈ براؤزر (Browser Use)، یہ سب آپ کی سبسکرپشن کے ذریعے چلتے ہیں۔ کسی اضافی اکاؤنٹ کی ضرورت نہیں۔
+
+نئی انسٹالیشن کے بعد بس ایک کمانڈ کی ضرورت ہے:
+
+<div dir="ltr">
+
+```bash
+hermes setup --portal
+```
+
+</div>
+
+یہ آپ کو OAuth کے ذریعے لاگ ان کرواتا ہے، Nous کو آپ کا پرووائیڈر مقرر کرتا ہے، اور ٹول گیٹ وے کو آن کر دیتا ہے۔ `hermes portal info` کمانڈ استعمال کر کے آپ کسی بھی وقت چیک کر سکتے ہیں کہ کون کون سی سروسز منسلک ہیں۔ مکمل تفصیلات [Tool Gateway دستاویزات کے صفحے](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway) پر موجود ہیں۔
+
+آپ اب بھی کسی بھی ٹول کے لیے اپنی مرضی کی API کیز استعمال کر سکتے ہیں — گیٹ وے ہر سروس کے لیے الگ الگ کام کرتا ہے، ایسا نہیں کہ یا تو سب کچھ استعمال کریں یا کچھ بھی نہیں۔
+
+---
+
+## CLI بمقابلہ میسجنگ فوری حوالہ
+
+ہرمیس کے دو بنیادی انٹر فیس ہیں: آپ ٹرمینل UI کو `hermes` کے ساتھ شروع کریں، یا گیٹ وے چلا کر اس کے ساتھ ٹیلی گرام، ڈسکارڈ، سلیک، واٹس ایپ، سگنل، یا ای میل کے ذریعے بات کریں۔ جب آپ کسی بات چیت میں ہوتے ہیں، تو بہت سی سلیش (slash) کمانڈز دونوں انٹرفیسز میں ایک جیسی ہوتی ہیں۔
+
+<div dir="ltr">
+
+| کارروائی (Action)                         | سی ایل آئی (CLI)                              | میسجنگ پلیٹ فارمز (Messaging platforms)                                          |
+| --------------------------------------- | --------------------------------------------- | -------------------------------------------------------------------------------- |
+| بات چیت شروع کریں                       | `hermes`                                      | `hermes gateway setup` اور `hermes gateway start` چلائیں، پھر بوٹ کو میسج بھیجیں |
+| نئی بات چیت شروع کریں                   | `/new` یا `/reset`                            | `/new` یا `/reset`                                                               |
+| ماڈل تبدیل کریں                         | `/model [provider:model]`                     | `/model [provider:model]`                                                        |
+| پرسنلٹی (Personality) سیٹ کریں           | `/personality [name]`                         | `/personality [name]`                                                            |
+| پچھلی باری کو دوبارہ یا منسوخ (undo) کریں | `/retry`، `/undo`                             | `/retry`، `/undo`                                                                |
+| کانٹیکسٹ (context) کمپریس کریں / استعمال چیک کریں | `/compress`، `/usage`، `/insights [--days N]` | `/compress`، `/usage`، `/insights [days]`                                        |
+| مہارتیں (Skills) براؤز کریں             | `/skills` یا `/<skill-name>`                  | `/<skill-name>`                                                                  |
+| موجودہ کام کو روکیں                     | `Ctrl+C` دبائیں یا نیا میسج بھیجیں            | `/stop` یا نیا میسج بھیجیں                                                       |
+| پلیٹ فارم کے لحاظ سے سٹیٹس              | `/platforms`                                  | `/status`، `/sethome`                                                            |
+
+</div>
+
+مکمل کمانڈ لسٹ کے لیے، [CLI گائیڈ](https://hermes-agent.nousresearch.com/docs/user-guide/cli) اور [میسجنگ گیٹ وے گائیڈ](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) دیکھیں۔
+
+---
+
+## دستاویزات (Documentation)
+
+تمام دستاویزات **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)** پر موجود ہیں:
+
+<div dir="ltr">
+
+| سیکشن (Section)                                                                                     | تفصیل (What's Covered)                                     |
+| --------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
+| [فوری آغاز (Quickstart)](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart)     | انسٹالیشن → سیٹ اپ → 2 منٹ میں پہلی بات چیت شروع کریں       |
+| [CLI کا استعمال](https://hermes-agent.nousresearch.com/docs/user-guide/cli)                         | کمانڈز، کی بائنڈنگز (keybindings)، پرسنلٹیز (personalities)، سیشنز |
+| [کنفیگریشن (Configuration)](https://hermes-agent.nousresearch.com/docs/user-guide/configuration)    | کنفگ فائل، پرووائیڈرز، ماڈلز، اور تمام آپشنز               |
+| [میسجنگ گیٹ وے](https://hermes-agent.nousresearch.com/docs/user-guide/messaging)                    | ٹیلی گرام، ڈسکارڈ، سلیک، واٹس ایپ، سگنل، ہوم اسسٹنٹ         |
+| [سیکیورٹی (Security)](https://hermes-agent.nousresearch.com/docs/user-guide/security)              | کمانڈ کی منظوری، DM پیئرنگ (pairing)، کنٹینر آئسولیشن       |
+| [ٹولز اور ٹول سیٹس](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools)          | 40 سے زائد ٹولز، ٹول سیٹ سسٹم، ٹرمینل بیک اینڈز             |
+| [مہارتوں کا سسٹم (Skills System)](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills)| پروسیجرل (Procedural) میموری، سکلز ہب، نئی مہارتیں بنانا    |
+| [میموری (Memory)](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory)            | مستقل میموری، یوزر پروفائلز، بہترین طریقہ کار              |
+| [MCP انضمام (Integration)](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp)      | صلاحیتوں کو بڑھانے کے لیے کسی بھی MCP سرور کو جوڑیں        |
+| [کرون (Cron) شیڈیولنگ](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron)         | پلیٹ فارم ڈیلیوری کے ساتھ شیڈول کیے گئے کام                 |
+| [کانٹیکسٹ (Context) فائلز](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files)| پروجیکٹ کا سیاق و سباق (context) جو ہر بات چیت پر اثر انداز ہوتا ہے |
+| [آرکیٹیکچر (Architecture)](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | پروجیکٹ کا ڈھانچہ، ایجنٹ لوپ، اہم کلاسز                    |
+| [تعاون (Contributing)](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing)     | ڈیویلپمنٹ سیٹ اپ، PR کا طریقہ کار، کوڈنگ کا انداز          |
+| [CLI حوالہ جات (Reference)](https://hermes-agent.nousresearch.com/docs/reference/cli-commands)      | تمام کمانڈز اور فلیگز (flags)                              |
+| [انوائرمنٹ ویری ایبلز](https://hermes-agent.nousresearch.com/docs/reference/environment-variables)  | مکمل انوائرمنٹ ویری ایبل حوالہ جات                         |
+
+</div>
+
+---
+
+## OpenClaw سے منتقلی
+
+اگر آپ OpenClaw سے منتقل ہو رہے ہیں، تو ہرمیس آپ کی سیٹنگز، یادیں (memories)، مہارتیں (skills)، اور API کیز کو خود بخود امپورٹ کر سکتا ہے۔
+
+**پہلی بار سیٹ اپ کے دوران:** سیٹ اپ وزرڈ (`hermes setup`) خود بخود `~/.openclaw` کو پہچان لیتا ہے اور کنفیگریشن شروع ہونے سے پہلے مائیگریٹ (migrate) کرنے کا آپشن دیتا ہے۔
+
+**انسٹالیشن کے بعد کسی بھی وقت:**
+
+<div dir="ltr">
+
+```bash
+hermes claw migrate              # انٹرایکٹو مائیگریشن (مکمل پری سیٹ)
+hermes claw migrate --dry-run    # جائزہ لیں کہ کیا کیا مائیگریٹ ہوگا
+hermes claw migrate --preset user-data   # حساس معلومات (secrets) کے بغیر مائیگریٹ کریں
+hermes claw migrate --overwrite  # موجودہ متصادم فائلوں کو اوور رائٹ کریں
+```
+
+</div>
+
+جو چیزیں امپورٹ ہوتی ہیں:
+
+- **SOUL.md** — پرسونا (persona) فائل
+- **میموریز (Memories)** — MEMORY.md اور USER.md کی اندراجات
+- **مہارتیں (Skills)** — صارف کی بنائی گئی مہارتیں → `~/.hermes/skills/openclaw-imports/`
+- **کمانڈ الاؤ لسٹ (allowlist)** — منظوری کے پیٹرنز (approval patterns)
+- **میسجنگ سیٹنگز** — پلیٹ فارم کنفیگریشنز، اجازت یافتہ صارفین، ورکنگ ڈائریکٹری
+- **API کیز** — الاؤ لسٹ شدہ حساس معلومات (ٹیلی گرام، OpenRouter، OpenAI، Anthropic، ElevenLabs)
+- **TTS اثاثے** — ورک اسپیس کی آڈیو فائلیں
+- **ورک اسپیس کی ہدایات** — AGENTS.md (`--workspace-target` کے ساتھ)
+
+تمام آپشنز دیکھنے کے لیے `hermes claw migrate --help` استعمال کریں، یا انٹرایکٹو ایجنٹ کی مدد سے مائیگریٹ کرنے کے لیے `openclaw-migration` سکل کا استعمال کریں (جس میں ڈرائی رن (dry-run) پریویوز شامل ہیں)۔
+
+---
+
+## تعاون کریں (Contributing)
+
+ہم آپ کے تعاون کا خیرمقدم کرتے ہیں! ڈیویلپمنٹ سیٹ اپ، کوڈ کے انداز اور PR کے طریقہ کار کے لیے براہ کرم ہماری [Contributing گائیڈ](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) دیکھیں۔
+
+معاونین (contributors) کے لیے فوری آغاز — کلون (clone) کریں اور `setup-hermes.sh` چلائیں:
+
+<div dir="ltr">
+
+```bash
+git clone https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+./setup-hermes.sh     # uv کو انسٹال کرتا ہے، venv بناتا ہے، .[all] کو انسٹال کرتا ہے، اور ~/.local/bin/hermes کا سیم لنک (symlink) بناتا ہے
+./hermes              # خود بخود venv کی شناخت کرتا ہے، پہلے `source` کرنے کی ضرورت نہیں
+```
+
+</div>
+
+مینوئل طریقہ (اوپر والے طریقے کے مساوی):
+
+<div dir="ltr">
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+uv venv .venv --python 3.11
+source .venv/bin/activate
+uv pip install -e ".[all,dev]"
+scripts/run_tests.sh
+```
+
+</div>
+
+---
+
+## کمیونٹی (Community)
+
+- 💬 [ڈسکارڈ (Discord)](https://discord.gg/NousResearch)
+- 📚 [سکلز ہب (Skills Hub)](https://agentskills.io)
+- 🐛 [مسائل (Issues)](https://github.com/NousResearch/hermes-agent/issues)
+- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — ہرمیس اور دیگر MCP ہوسٹس کے لیے لینکس (Linux) ڈیسک ٹاپ کنٹرول MCP سرور، جس میں AT-SPI ایکسیسیبلٹی ٹریز، Wayland/X11 ان پٹ، سکرین شاٹس، اور کمپوزیٹر ونڈو ٹارگیٹنگ شامل ہے۔
+- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — کمیونٹی وی چیٹ (WeChat) برج: ہرمیس ایجنٹ اور OpenClaw کو ایک ہی وی چیٹ اکاؤنٹ پر چلائیں۔
+
+---
+
+## لائسنس (License)
+
+MIT — تفصیلات کے لیے [LICENSE](LICENSE) دیکھیں۔
+
+[نوس ریسرچ (Nous Research)](https://nousresearch.com) کی جانب سے تیار کردہ۔
+
+</div>
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@ -10,6 +10,7 @@
  <a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License: MIT"></a>
  <a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
  <a href="README.md"><img src="https://img.shields.io/badge/Lang-English-lightgrey?style=for-the-badge" alt="English"></a>
+  <a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
 </p>

 **由 [Nous Research](https://nousresearch.com) 构建的自进化 AI 代理。** 它是唯一内置学习闭环的智能代理——从经验中创建技能，在使用中改进技能，主动持久化知识，搜索过往对话，并在跨会话中逐步构建对你的深度理解。可以在 $5 的 VPS 上运行，也可以在 GPU 集群上运行，或者使用几乎零成本的 Serverless 基础设施。它不绑定你的笔记本——你可以在 Telegram 上与它对话，而它在云端 VM 上工作。
--- a/acp_adapter/provenance.py
+++ b/acp_adapter/provenance.py
@ -0,0 +1,127 @@
+"""Derive ACP session-provenance metadata from the existing compression chain.
+
+This is an additive Hermes extension surfaced under ACP ``_meta.hermes`` so
+existing ACP clients ignore it. It carries no new persisted state: everything
+is derived on demand from the ``sessions`` table (``parent_session_id`` /
+``end_reason``), which already models compression-continuation chains.
+
+The ACP/editor ``session_id`` stays the stable public handle. When context
+compression rotates the internal Hermes head, ``build_session_provenance`` lets
+a client see the previous/current internal ids and the lineage root without
+parsing status text, guessing from token drops, or reading ``state.db``.
+"""
+
+from __future__ import annotations
+
+from typing import Any, Dict, Optional
+
+# Bound defensive walks; compression chains this deep are pathological.
+_MAX_WALK = 100
+
+
+def build_session_provenance(
+    db: Any,
+    acp_session_id: str,
+    current_hermes_session_id: str,
+    *,
+    previous_hermes_session_id: Optional[str] = None,
+) -> Optional[Dict[str, Any]]:
+    """Build ``_meta.hermes.sessionProvenance`` for an ACP session.
+
+    Args:
+        db: A ``SessionDB`` (must expose ``get_session``).
+        acp_session_id: The stable ACP/editor-facing session handle.
+        current_hermes_session_id: The live internal Hermes DB session id
+            (``state.agent.session_id``).
+        previous_hermes_session_id: The internal id from before the most recent
+            turn, when known. Supplied by ``prompt()`` to flag a rotation.
+
+    Returns:
+        A dict suitable for ``{"hermes": {"sessionProvenance": <dict>}}`` under
+        ACP ``_meta``, or ``None`` if the session can't be read.
+    """
+    try:
+        row = db.get_session(current_hermes_session_id)
+    except Exception:
+        return None
+    if not row:
+        return None
+
+    parent_id = row.get("parent_session_id")
+    end_reason = row.get("end_reason")
+
+    # Walk parents to the lineage root and count compression depth. Only
+    # compression-split parents (parent.end_reason == 'compression') count
+    # toward depth — delegate/branch children share the parent_session_id
+    # column but are not compaction boundaries.
+    root_id = current_hermes_session_id
+    compression_depth = 0
+    cursor_parent = parent_id
+    seen = {current_hermes_session_id}
+    for _ in range(_MAX_WALK):
+        if not cursor_parent or cursor_parent in seen:
+            break
+        seen.add(cursor_parent)
+        try:
+            prow = db.get_session(cursor_parent)
+        except Exception:
+            prow = None
+        if not prow:
+            break
+        root_id = cursor_parent
+        if prow.get("end_reason") == "compression":
+            compression_depth += 1
+        cursor_parent = prow.get("parent_session_id")
+
+    # A session is a compression continuation when its parent was ended with
+    # end_reason='compression'. Determine that from the immediate parent.
+    is_continuation = False
+    if parent_id:
+        try:
+            immediate_parent = db.get_session(parent_id)
+        except Exception:
+            immediate_parent = None
+        if immediate_parent and immediate_parent.get("end_reason") == "compression":
+            is_continuation = True
+
+    rotated = bool(
+        previous_hermes_session_id
+        and previous_hermes_session_id != current_hermes_session_id
+    )
+
+    provenance: Dict[str, Any] = {
+        "acpSessionId": acp_session_id,
+        "currentHermesSessionId": current_hermes_session_id,
+        "rootHermesSessionId": root_id,
+        "parentHermesSessionId": parent_id,
+        "sessionKind": "continuation" if is_continuation else "root",
+        "compressionDepth": compression_depth,
+    }
+    if previous_hermes_session_id:
+        provenance["previousHermesSessionId"] = previous_hermes_session_id
+    if rotated:
+        # The head moved during the last turn. The only mechanism that rotates
+        # the internal id mid-turn is compression-driven session splitting.
+        provenance["reason"] = "compression"
+        provenance["creatorKind"] = "compression"
+
+    return provenance
+
+
+def session_provenance_meta(
+    db: Any,
+    acp_session_id: str,
+    current_hermes_session_id: str,
+    *,
+    previous_hermes_session_id: Optional[str] = None,
+) -> Optional[Dict[str, Any]]:
+    """Return a ready ``_meta`` payload: ``{"hermes": {"sessionProvenance": ...}}``."""
+    prov = build_session_provenance(
+        db,
+        acp_session_id,
+        current_hermes_session_id,
+        previous_hermes_session_id=previous_hermes_session_id,
+    )
+    if prov is None:
+        return None
+    return {"hermes": {"sessionProvenance": prov}}
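
Reviewer note: the lineage walk in `build_session_provenance` can be exercised without a real `SessionDB`. The sketch below is a simplified stand-in, not the module itself. `FakeSessionDB`, `walk_lineage`, and the session ids `s1`..`s3` are hypothetical, and the only thing assumed about the database is that `get_session` returns a dict-like row or `None`, as the docstring above describes.

```python
from typing import Any, Dict, Optional, Tuple


class FakeSessionDB:
    """Hypothetical in-memory stand-in exposing only ``get_session``."""

    def __init__(self, rows: Dict[str, Dict[str, Any]]) -> None:
        self._rows = rows

    def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
        return self._rows.get(session_id)


def walk_lineage(db: Any, current_id: str, max_walk: int = 100) -> Tuple[str, int]:
    """Follow parent_session_id to the root; count compression-ended parents."""
    row = db.get_session(current_id)
    if not row:
        return current_id, 0
    root_id, depth = current_id, 0
    seen = {current_id}  # guards against cycles in the parent chain
    cursor = row.get("parent_session_id")
    for _ in range(max_walk):
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
        parent = db.get_session(cursor)
        if not parent:
            break
        root_id = cursor
        # Only parents ended by compression are compaction boundaries.
        if parent.get("end_reason") == "compression":
            depth += 1
        cursor = parent.get("parent_session_id")
    return root_id, depth


# Two compressions: s1 -> s2 -> s3, where s3 is the live head.
db = FakeSessionDB({
    "s1": {"parent_session_id": None, "end_reason": "compression"},
    "s2": {"parent_session_id": "s1", "end_reason": "compression"},
    "s3": {"parent_session_id": "s2", "end_reason": None},
})
root_id, depth = walk_lineage(db, "s3")
```

For this chain the walk should report `s1` as the lineage root with a compression depth of 2. A parent that ended for any other reason still moves the root but leaves the depth unchanged.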
--- a/acp_adapter/server.py
+++ b/acp_adapter/server.py
@ -71,6 +71,7 @@ from acp_adapter.events import (
    make_tool_progress_cb,
 )
 from acp_adapter.permissions import make_approval_callback
+from acp_adapter.provenance import session_provenance_meta
 from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
 from acp_adapter.tools import build_tool_complete, build_tool_start

@ -709,8 +710,39 @@ class HermesACPAgent(acp.Agent):
                exc_info=True,
            )

-    async def _send_session_info_update(self, session_id: str) -> None:
-        """Send ACP native session metadata after Hermes changes it."""
+    def _provenance_meta(
+        self,
+        acp_session_id: str,
+        current_hermes_session_id: str,
+        previous_hermes_session_id: Optional[str] = None,
+    ) -> Optional[dict]:
+        """Best-effort ``_meta.hermes.sessionProvenance`` for an ACP session."""
+        try:
+            return session_provenance_meta(
+                self.session_manager._get_db(),
+                acp_session_id,
+                current_hermes_session_id,
+                previous_hermes_session_id=previous_hermes_session_id,
+            )
+        except Exception:
+            logger.debug(
+                "Could not build ACP session provenance for %s", acp_session_id, exc_info=True
+            )
+            return None
+
+    async def _send_session_info_update(
+        self,
+        session_id: str,
+        *,
+        current_hermes_session_id: Optional[str] = None,
+        previous_hermes_session_id: Optional[str] = None,
+    ) -> None:
+        """Send ACP native session metadata after Hermes changes it.
+
+        When the internal Hermes head rotated (e.g. compression-driven session
+        split during a turn), pass ``previous_hermes_session_id`` so the
+        attached ``_meta.hermes.sessionProvenance`` flags the rotation reason.
+        """
        if not self._conn:
            return
        try:
@ -727,10 +759,16 @@ class HermesACPAgent(acp.Agent):
        # the updated_at since we're emitting this notification precisely
        # because the title was just refreshed.
        updated_at = datetime.now(timezone.utc).isoformat()
+        meta = self._provenance_meta(
+            session_id,
+            current_hermes_session_id or session_id,
+            previous_hermes_session_id,
+        )
        update = SessionInfoUpdate(
            session_update="session_info_update",
            title=title if isinstance(title, str) and title.strip() else None,
            updated_at=updated_at,
+            field_meta=meta,
        )
        try:
            await self._conn.session_update(
@ -1081,6 +1119,9 @@ class HermesACPAgent(acp.Agent):
            session_id=state.session_id,
            models=self._build_model_state(state),
            modes=self._session_modes(state),
+            field_meta=self._provenance_meta(
+                state.session_id, getattr(state.agent, "session_id", state.session_id)
+            ),
        )

    async def load_session(
@ -1125,6 +1166,9 @@ class HermesACPAgent(acp.Agent):
        return LoadSessionResponse(
            models=self._build_model_state(state),
            modes=self._session_modes(state),
+            field_meta=self._provenance_meta(
+                session_id, getattr(state.agent, "session_id", session_id)
+            ),
        )

    async def resume_session(
@ -1157,6 +1201,9 @@ class HermesACPAgent(acp.Agent):
        return ResumeSessionResponse(
            models=self._build_model_state(state),
            modes=self._session_modes(state),
+            field_meta=self._provenance_meta(
+                state.session_id, getattr(state.agent, "session_id", state.session_id)
+            ),
        )

    async def cancel(self, session_id: str, **kwargs: Any) -> None:
@ -1494,6 +1541,11 @@ class HermesACPAgent(acp.Agent):
                        logger.debug("Could not clear ACP session context", exc_info=True)

        try:
+            # Snapshot the internal Hermes DB session id before the turn so we
+            # can detect a compression-driven session rotation afterwards. The
+            # ACP `session_id` stays the stable client handle; agent.session_id
+            # is the live internal head that compression may rotate.
+            pre_turn_hermes_id = getattr(state.agent, "session_id", None)
            # Wrap the executor call in a fresh copy of the current context so
            # concurrent ACP sessions on the shared ThreadPoolExecutor don't
            # stomp on each other's ContextVar writes (HERMES_SESSION_KEY in
@ -1512,8 +1564,41 @@ class HermesACPAgent(acp.Agent):
            # Persist updated history so sessions survive process restarts.
            self.session_manager.save_session(session_id)

+        # Detect a compression-driven internal session rotation. If the agent's
+        # DB head moved during the turn, emit a session_info_update carrying
+        # _meta.hermes.sessionProvenance so ACP clients can render the boundary
+        # and keep old/new ids in lineage. The ACP session_id is unchanged.
+        post_turn_hermes_id = getattr(state.agent, "session_id", None)
+        if (
+            conn
+            and post_turn_hermes_id
+            and pre_turn_hermes_id
+            and post_turn_hermes_id != pre_turn_hermes_id
+        ):
+            try:
+                await self._send_session_info_update(
+                    session_id,
+                    current_hermes_session_id=post_turn_hermes_id,
+                    previous_hermes_session_id=pre_turn_hermes_id,
+                )
+            except Exception:
+                logger.debug(
+                    "Could not emit ACP provenance update after rotation for %s",
+                    session_id,
+                    exc_info=True,
+                )
+
        final_response = result.get("final_response", "")
-        if final_response:
+        cancelled = bool(state.cancel_event and state.cancel_event.is_set())
+        interrupted = bool(result.get("interrupted")) or cancelled
+        # Hermes' local "waiting for model response" interrupt status is metadata,
+        # not assistant prose — clients get cancellation from stop_reason instead.
+        from agent.conversation_loop import INTERRUPT_WAITING_FOR_MODEL_PREFIX
+
+        suppress_interrupt_response = interrupted and final_response.startswith(
+            INTERRUPT_WAITING_FOR_MODEL_PREFIX
+        )
+        if final_response and not suppress_interrupt_response:
            try:
                from agent.title_generator import maybe_auto_title

@ -1534,7 +1619,12 @@ class HermesACPAgent(acp.Agent):
                )
            except Exception:
                logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
-        if final_response and conn and (not streamed_message or result.get("response_transformed")):
+        if (
+            final_response
+            and conn
+            and not suppress_interrupt_response
+            and (not streamed_message or result.get("response_transformed"))
+        ):
            # Deliver the final response when streaming did not already send it,
            # or when a plugin hook transformed the response after streaming
            # finished (e.g. transform_llm_output) — otherwise the appended /
@ -1576,7 +1666,7 @@ class HermesACPAgent(acp.Agent):

        await self._send_usage_update(state)

-        stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn"
+        stop_reason = "cancelled" if cancelled else "end_turn"
        return PromptResponse(stop_reason=stop_reason, usage=usage)

    # ---- Slash commands (headless) -------------------------------------------
--- a/agent/agent_init.py
+++ b/agent/agent_init.py
@ -169,6 +169,7 @@ def init_agent(
    save_trajectories: bool = False,
    verbose_logging: bool = False,
    quiet_mode: bool = False,
+    tool_progress_mode: str = "all",
    ephemeral_system_prompt: str = None,
    log_prefix_chars: int = 100,
    log_prefix: str = "",
@ -280,6 +281,7 @@ def init_agent(
    agent.save_trajectories = save_trajectories
    agent.verbose_logging = verbose_logging
    agent.quiet_mode = quiet_mode
+    agent.tool_progress_mode = tool_progress_mode
    agent.ephemeral_system_prompt = ephemeral_system_prompt
    agent.platform = platform  # "cli", "telegram", "discord", "whatsapp", etc.
    agent._user_id = user_id  # Platform user identifier (gateway sessions)
--- a/agent/agent_runtime_helpers.py
+++ b/agent/agent_runtime_helpers.py
@ -1846,6 +1846,27 @@ def repair_tool_call(agent, tool_name: str) -> str | None:
    if not tool_name:
        return None

+    # VolcEngine api/plan workaround (issue #33007): the endpoint's
+    # protocol-translation layer occasionally leaks raw XML attribute
+    # fragments into tool_use.name, e.g.
+    #   `terminal" parameter="command" string="true`
+    #   `execute_code" parameter="code" string="true`
+    #   `session_search" parameter="session_id" string="true`
+    # We trim at the first unambiguous XML/quote character so the rest
+    # of the repair pipeline (lowercase / snake_case / fuzzy match)
+    # can resolve the cleaned name to a real tool.
+    #
+    # Crucially we DO NOT split on whitespace: legitimate inputs like
+    # "write file" must keep flowing through ``_norm`` -> ``write_file``
+    # (covered by test_space_to_underscore in
+    # tests/run_agent/test_repair_tool_call_name.py).
+    for _xml_sep in ('"', "'", "<", ">"):
+        _idx = tool_name.find(_xml_sep)
+        if _idx > 0:
+            tool_name = tool_name[:_idx]
+    if not tool_name:
+        return None
+
    def _norm(s: str) -> str:
        return s.lower().replace("-", "_").replace(" ", "_")

--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@ -2301,3 +2301,43 @@ def build_anthropic_kwargs(
        kwargs["extra_headers"] = {"anthropic-beta": ",".join(betas)}

    return kwargs
+
+
+# Keys that belong exclusively to the OpenAI Responses / Codex API shape.
+# The Anthropic Messages SDK (``messages.create()`` / ``messages.stream()``)
+# raises ``TypeError: ... got an unexpected keyword argument`` on any of them.
+_RESPONSES_ONLY_KWARGS = frozenset(
+    {"instructions", "input", "store", "parallel_tool_calls"}
+)
+
+
+def sanitize_anthropic_kwargs(api_kwargs: Any, *, log_prefix: str = "") -> Any:
+    """Drop Responses-API-only keys before an Anthropic Messages SDK call.
+
+    Defensive boundary guard for #31673: under rare api_mode-flip races
+    (e.g. a concurrent auxiliary call mutating a shared agent between the
+    kwargs build and the stream dispatch), a Responses-shaped payload
+    carrying ``instructions=`` can reach ``messages.stream()`` /
+    ``messages.create()``. The Anthropic SDK rejects it with a
+    non-retryable ``TypeError`` that nukes the whole turn and propagates
+    through the entire fallback chain.
+
+    Mutates ``api_kwargs`` in place and returns it. When a foreign key is
+    present we log a WARNING so the underlying race stays visible in the
+    wild instead of being silently papered over.
+    """
+    if not isinstance(api_kwargs, dict):
+        return api_kwargs
+    leaked = _RESPONSES_ONLY_KWARGS.intersection(api_kwargs)
+    if leaked:
+        for _key in leaked:
+            api_kwargs.pop(_key, None)
+        logger.warning(
+            "%sStripped Responses-only kwarg(s) %s from an Anthropic Messages "
+            "call (api_mode flip race — see #31673). The call will proceed; "
+            "this breadcrumb means a kwargs build ran under a Responses "
+            "api_mode while dispatch ran under anthropic_messages.",
+            log_prefix,
+            sorted(leaked),
+        )
+    return api_kwargs
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@ -637,54 +637,6 @@ def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
 # calls to the Codex Responses API so callers don't need any changes.


-def _convert_content_for_responses(content: Any) -> Any:
-    """Convert chat.completions content to Responses API format.
-
-    chat.completions uses:
-      {"type": "text", "text": "..."}
-      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
-
-    Responses API uses:
-      {"type": "input_text", "text": "..."}
-      {"type": "input_image", "image_url": "data:image/png;base64,..."}
-
-    If content is a plain string, it's returned as-is (the Responses API
-    accepts strings directly for text-only messages).
-    """
-    if isinstance(content, str):
-        return content
-    if not isinstance(content, list):
-        return str(content) if content else ""
-
-    converted: List[Dict[str, Any]] = []
-    for part in content:
-        if not isinstance(part, dict):
-            continue
-        ptype = part.get("type", "")
-        if ptype == "text":
-            converted.append({"type": "input_text", "text": part.get("text", "")})
-        elif ptype == "image_url":
-            # chat.completions nests the URL: {"image_url": {"url": "..."}}
-            image_data = part.get("image_url", {})
-            url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
-            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
-            # Preserve detail if specified
-            detail = image_data.get("detail") if isinstance(image_data, dict) else None
-            if detail:
-                entry["detail"] = detail
-            converted.append(entry)
-        elif ptype in {"input_text", "input_image"}:
-            # Already in Responses format — pass through
-            converted.append(part)
-        else:
-            # Unknown content type — try to preserve as text
-            text = part.get("text", "")
-            if text:
-                converted.append({"type": "input_text", "text": text})
-
-    return converted or ""
-
-
 class _CodexCompletionsAdapter:
    """Drop-in shim that accepts chat.completions.create() kwargs and
    routes them through the Codex Responses streaming API."""
@ -697,26 +649,37 @@ class _CodexCompletionsAdapter:
        messages = kwargs.get("messages", [])
        model = kwargs.get("model", self._model)

-        # Separate system/instructions from conversation messages.
-        # Convert chat.completions multimodal content blocks to Responses
-        # API format (input_text / input_image instead of text / image_url).
+        # Separate system/instructions from replayable conversation messages,
+        # then route the rest through the SINGLE shared chat->Responses
+        # converter used by the main agent transport
+        # (agent/transports/codex.py). Maintaining a private conversion loop
+        # here let chat-style messages with role="tool" leak straight into
+        # Responses input[] — which the Responses API rejects with
+        # "Invalid value: 'tool'. Supported values are: 'assistant', 'system',
+        # 'developer', and 'user'." (issue #5709, hit hard by flush_memories()
+        # / compression replaying real session history that includes assistant
+        # tool_calls + role="tool" results). The shared converter encodes
+        # assistant tool calls as `function_call` items and tool results as
+        # `function_call_output` items with a valid call_id, so every
+        # Responses path normalizes tool history identically and cannot drift.
+        from agent.codex_responses_adapter import _chat_messages_to_responses_input
+
        instructions = "You are a helpful assistant."
-        input_msgs: List[Dict[str, Any]] = []
+        replay_messages: List[Dict[str, Any]] = []
        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content") or ""
            if role == "system":
                instructions = content if isinstance(content, str) else str(content)
            else:
-                input_msgs.append({
-                    "role": role,
-                    "content": _convert_content_for_responses(content),
-                })
+                replay_messages.append(msg)
+
+        input_items = _chat_messages_to_responses_input(replay_messages)

        resp_kwargs: Dict[str, Any] = {
            "model": model,
            "instructions": instructions,
-            "input": input_msgs or [{"role": "user", "content": ""}],
+            "input": input_items or [{"role": "user", "content": ""}],
            "store": False,
        }

@ -2513,6 +2476,25 @@ def _is_connection_error(exc: Exception) -> bool:
    return False


+def _is_transient_transport_error(exc: Exception) -> bool:
+    """Return True for a one-off transport blip worth retrying ONCE on the
+    same provider before any provider/model fallback.
+
+    Covers connection/streaming-close errors (via the canonical
+    ``_is_connection_error`` detector, shared so the two cannot drift) plus a
+    pure 5xx/408 HTTP status. Deliberately narrow: this is the "retry the
+    same target once" gate, distinct from ``_is_payment_error`` /
+    ``_is_auth_error`` / ``_is_rate_limit_error`` which the except-chain
+    handles by switching provider, refreshing creds, or rotating the pool.
+    """
+    if _is_connection_error(exc):
+        return True
+    status = getattr(exc, "status_code", None) or getattr(
+        getattr(exc, "response", None), "status_code", None
+    )
+    return isinstance(status, int) and (status == 408 or 500 <= status < 600)
+
+
 def _is_auth_error(exc: Exception) -> bool:
    """Detect auth failures that should trigger provider-specific refresh."""
    status = getattr(exc, "status_code", None)
@ -5184,8 +5166,28 @@ def call_llm(
    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
    # then payment fallback.
    try:
-        return _validate_llm_response(
-            client.chat.completions.create(**kwargs), task)
+        # Retry ONCE on the same provider for a one-off transient transport
+        # blip (streaming-close / incomplete chunked read / 5xx / 408) before
+        # the except-chain below escalates to provider/model fallback. A
+        # single dropped connection shouldn't abandon an otherwise-healthy
+        # provider. A second failure (or any non-transient error) falls
+        # through to ``first_err`` and the existing fallback handling
+        # unchanged. This is the unified home for the transient retry that
+        # every auxiliary task (compression, memory flush, title-gen,
+        # session-search, vision) shares. (PR #16587)
+        try:
+            return _validate_llm_response(
+                client.chat.completions.create(**kwargs), task)
+        except Exception as transient_err:
+            if not _is_transient_transport_error(transient_err):
+                raise
+            logger.info(
+                "Auxiliary %s: transient transport error; retrying once on "
+                "the same provider before fallback: %s",
+                task or "call", transient_err,
+            )
+            return _validate_llm_response(
+                client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
            retry_kwargs = dict(kwargs)
@ -5651,8 +5653,22 @@ async def async_call_llm(
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

    try:
-        return _validate_llm_response(
-            await client.chat.completions.create(**kwargs), task)
+        # Retry ONCE on the same provider for a transient transport blip
+        # before the except-chain escalates to fallback — see call_llm()
+        # for the rationale. (PR #16587)
+        try:
+            return _validate_llm_response(
+                await client.chat.completions.create(**kwargs), task)
+        except Exception as transient_err:
+            if not _is_transient_transport_error(transient_err):
+                raise
+            logger.info(
+                "Auxiliary %s (async): transient transport error; retrying "
+                "once on the same provider before fallback: %s",
+                task or "call", transient_err,
+            )
+            return _validate_llm_response(
+                await client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
            retry_kwargs = dict(kwargs)
--- a/agent/background_review.py
+++ b/agent/background_review.py
@ -449,6 +449,17 @@ def _run_review_in_thread(
            # if a future code path bypasses the cache.
            review_agent.session_start = agent.session_start
            review_agent.session_id = agent.session_id
+            # Never let the review fork compress. It shares the parent's
+            # session_id, so if it won a compression race it would rotate the
+            # parent into a NEW child that the gateway never adopts (the fork
+            # is single-lifecycle and dies right after this run_conversation).
+            # The foreground turn would then start from the stale parent and
+            # compress it again, leaving the same parent with two sibling
+            # children (issue #38727). Review also needs full context to
+            # produce a good memory/skill summary — compressing would strip
+            # detail. Both compression triggers in conversation_loop.py gate on
+            # agent.compression_enabled, so this short-circuits both paths.
+            review_agent.compression_enabled = False

            from model_tools import get_tool_definitions
            from hermes_cli.plugins import (
--- a/agent/chat_completion_helpers.py
+++ b/agent/chat_completion_helpers.py
@ -139,6 +139,15 @@ def interruptible_api_call(agent, api_kwargs: dict):
    result = {"response": None, "error": None}
    request_client_holder = {"client": None, "owner_tid": None}
    request_client_lock = threading.Lock()
+    # Request-local cancellation flag. Distinct from agent._interrupt_requested
+    # because that flag is cleared at run_conversation() turn boundaries, but
+    # this daemon worker thread can outlive the turn (the gateway caches
+    # AIAgent instances per session). Tracks whether THIS specific request was
+    # cancelled by the main thread's interrupt handler, so the transport error
+    # that is the expected consequence of our own force-close isn't misread as
+    # a network bug and surfaced to the caller. (PR #6600 — cascading interrupt
+    # hang.)
+    _request_cancelled = {"value": False}

    def _set_request_client(client):
        with request_client_lock:
@ -229,6 +238,17 @@ def interruptible_api_call(agent, api_kwargs: dict):
                )
                result["response"] = request_client.chat.completions.create(**api_kwargs)
        except Exception as e:
+            # If the request was cancelled by the main thread's interrupt
+            # handler, the transport error is the expected consequence of our
+            # own force-close, NOT a network bug. Swallow it instead of
+            # surfacing — the main thread raises InterruptedError. (#6600)
+            if _request_cancelled["value"]:
+                logger.debug(
+                    "Non-streaming worker caught %s after request cancellation — "
+                    "exiting without surfacing a network error.",
+                    type(e).__name__,
+                )
+                return
            result["error"] = e
        finally:
            _close_request_client_once("request_complete")
@ -506,6 +526,14 @@ def interruptible_api_call(agent, api_kwargs: dict):
            break

        if agent._interrupt_requested:
+            # Mark THIS request cancelled before force-closing so the worker's
+            # exception handler recognizes the forced transport error as a
+            # cancel and exits cleanly instead of surfacing a network error or
+            # (in the streaming path) burning full retry cycles. (#6600)
+            _request_cancelled["value"] = True
+            logger.debug(
+                "Force-closing httpx client due to interrupt (not a network error)."
+            )
            # Force-close the in-flight worker-local HTTP connection to stop
            # token generation without poisoning the shared client used to
            # seed future retries.
@ -1625,6 +1653,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
    result = {"response": None, "error": None, "partial_tool_names": []}
    request_client_holder = {"client": None, "diag": None, "owner_tid": None}
    request_client_lock = threading.Lock()
+    # Request-local cancellation flag — see interruptible_api_call for the full
+    # rationale. The streaming retry loop is where the 7-minute cascading-
+    # interrupt hang originated: a force-close raised RemoteProtocolError, the
+    # loop classified it as a transient network error, and burned full retry
+    # cycles (and emitted "reconnecting" noise) on a request the user already
+    # cancelled. The flag lets the worker recognize its own forced close and
+    # exit immediately instead of retrying. (PR #6600.)
+    _request_cancelled = {"value": False}

    def _set_request_client(client):
        with request_client_lock:
@ -1950,6 +1986,58 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
                "(possible upstream error or malformed SSE response)."
            )

+        # A stream that delivered a tool call but only partial/unparseable
+        # JSON args splits into two very different cases:
+        #
+        #   1. Provider sent finish_reason="length" → a genuine output-cap
+        #      truncation.  Boosting max_tokens on retry is the right move.
+        #
+        #   2. Provider sent NO finish_reason (the SSE simply stopped after
+        #      the opening "{" with no terminator and no [DONE]) → the
+        #      upstream dropped/stalled the connection mid tool-call.  This
+        #      is NOT an output cap — the model never reported hitting one.
+        #      Some dedicated endpoints (e.g. NVIDIA Nemotron Ultra on the
+        #      Nous dedicated endpoint) stall for minutes during large
+        #      tool-arg generation, then close the stream cleanly without a
+        #      finish_reason.  Stamping "length" here sends it down the
+        #      max_tokens-boost truncation path, which retries 3× to no
+        #      effect and finally reports the misleading "Response truncated
+        #      due to output length limit" — the red herring this guards
+        #      against.  Route it through the partial-stream-stub path
+        #      instead so the loop reports an honest mid-tool-call stream
+        #      drop and fails fast rather than escalating output budget.
+        _tool_args_dropped_no_finish = has_truncated_tool_args and finish_reason is None
+        if _tool_args_dropped_no_finish:
+            _dropped_names = [
+                (tool_calls_acc[idx]["function"]["name"] or "?")
+                for idx in sorted(tool_calls_acc)
+            ]
+            logger.warning(
+                "Stream ended with no finish_reason while a tool call's "
+                "arguments were still incomplete (tools=%s); treating as a "
+                "mid-tool-call stream drop, not an output-length truncation.",
+                _dropped_names,
+            )
+            full_reasoning = "".join(reasoning_parts) or None
+            mock_message = SimpleNamespace(
+                role=role,
+                content=full_content,
+                tool_calls=None,
+                reasoning_content=full_reasoning,
+            )
+            mock_choice = SimpleNamespace(
+                index=0,
+                message=mock_message,
+                finish_reason=FINISH_REASON_LENGTH,
+            )
+            return SimpleNamespace(
+                id=PARTIAL_STREAM_STUB_ID,
+                model=model_name,
+                choices=[mock_choice],
+                usage=usage_obj,
+                _dropped_tool_names=_dropped_names or None,
+            )
+
        effective_finish_reason = finish_reason or "stop"
        if has_truncated_tool_args:
            effective_finish_reason = "length"
@ -1988,6 +2076,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
        # Per-attempt diagnostic dict for the retry block to consume.
        _diag = agent._stream_diag_init()
        request_client_holder["diag"] = _diag
+        # Defensive: strip Responses-only kwargs (instructions, input, ...)
+        # that can leak in under an api_mode-flip race. The Anthropic SDK
+        # raises a non-retryable TypeError on them, killing the turn. See
+        # #31673 / sanitize_anthropic_kwargs().
+        from agent.anthropic_adapter import sanitize_anthropic_kwargs
+        sanitize_anthropic_kwargs(
+            api_kwargs, log_prefix=getattr(agent, "log_prefix", "")
+        )
        # Use the Anthropic SDK's streaming context manager
        with agent._anthropic_client.messages.stream(**api_kwargs) as stream:
            # The Anthropic SDK exposes the raw httpx response on
@ -2078,6 +2174,21 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
                        result["response"] = _call_chat_completions()
                    return  # success
                except Exception as e:
+                    # If the main poll loop force-closed this request because
+                    # of an interrupt, the resulting transport error is the
+                    # expected consequence of our own close — NOT a transient
+                    # network error. Exit immediately: no retry, no fallback,
+                    # no "reconnecting" status. The outer poll loop raises
+                    # InterruptedError. This is the fix for the cascading-
+                    # interrupt hang where doomed retries burned full
+                    # stream-stale-timeout cycles. (#6600)
+                    if _request_cancelled["value"]:
+                        logger.debug(
+                            "Streaming worker caught %s after request "
+                            "cancellation — exiting without retry.",
+                            type(e).__name__,
+                        )
+                        return
                    _is_timeout = isinstance(
                        e, (_httpx.ReadTimeout, _httpx.ConnectTimeout, _httpx.PoolTimeout)
                    )
@ -2387,6 +2498,15 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
            )

        if agent._interrupt_requested:
+            # Mark THIS request cancelled before force-closing so the worker's
+            # exception handler recognizes the forced transport error as a
+            # cancel and exits without retrying or surfacing a network error.
+            # (#6600)
+            _request_cancelled["value"] = True
+            logger.debug(
+                "Force-closing streaming httpx client due to interrupt "
+                "(not a network error)."
+            )
            try:
                if agent.api_mode == "anthropic_messages":
                    agent._anthropic_client.close()
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@ -553,6 +553,22 @@ class ContextCompressor(ContextEngine):
        self.last_rough_tokens_when_real_prompt_fit = 0
        self.awaiting_real_usage_after_compression = False

+    def on_session_end(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
+        """Clear per-session compaction state at a real session boundary.
+
+        ``_previous_summary`` is per-session iterative-summary state. It is
+        cleared on ``on_session_reset()`` (/new, /reset), but session *end*
+        (CLI exit, gateway expiry, session-id rotation) goes through
+        ``on_session_end()`` instead — which inherited a no-op from
+        ``ContextEngine``. Without clearing here, a cron/background session's
+        summary could survive on a reused compressor instance and leak into the
+        next live session via the ``_generate_summary()`` iterative-update path
+        (#38788). ``compress()`` already guards the leak at the point of use;
+        this is defense-in-depth that drops the stale summary the moment the
+        owning session ends.
+        """
+        self._previous_summary = None
+
    def update_model(
        self,
        model: str,
@ -1818,6 +1834,41 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            accumulated += msg_tokens
            cut_idx = i

+        # If the backward walk never broke early because the entire transcript
+        # fits within soft_ceiling, accumulated now holds the total transcript
+        # size.  Without intervention _ensure_last_user_message_in_tail pushes
+        # cut_idx forward to include the last user message, and the caller's
+        # compress_start >= compress_end guard either returns unchanged (no-op)
+        # or compresses a single message — both of which trigger the infinite
+        # compaction loop described in #40803.
+        #
+        # Fix: when the whole transcript fits in soft_ceiling, compute a
+        # meaningful cut point using the raw (non-inflated) budget so that
+        # compression actually summarizes a worthwhile middle section.
+        if cut_idx <= head_end and accumulated <= soft_ceiling and accumulated > 0:
+            # The entire compressible region fits in the soft ceiling.
+            # Re-walk with the raw budget (no 1.5x multiplier) to find a
+            # split that gives the summarizer something useful.
+            raw_budget = token_budget
+            raw_accumulated = 0
+            for j in range(n - 1, head_end - 1, -1):
+                raw_msg = messages[j]
+                raw_content = raw_msg.get("content") or ""
+                raw_len = _content_length_for_budget(raw_content)
+                raw_tok = raw_len // _CHARS_PER_TOKEN + 10
+                for tc in raw_msg.get("tool_calls") or []:
+                    if isinstance(tc, dict):
+                        args = tc.get("function", {}).get("arguments", "")
+                        raw_tok += len(args) // _CHARS_PER_TOKEN
+                if raw_accumulated + raw_tok > raw_budget and (n - j) >= min_tail:
+                    cut_idx = j
+                    break
+                raw_accumulated += raw_tok
+                cut_idx = j
+            # If the raw-budget walk also consumed everything (very small
+            # transcript), fall through — the existing fallback logic below
+            # will still force a minimal cut after head_end.
+
        # Ensure we protect at least min_tail messages
        fallback_cut = n - min_tail
        cut_idx = min(cut_idx, fallback_cut)
@ -1920,6 +1971,21 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)

        if compress_start >= compress_end:
+            # No compressible window — the entire transcript fits within
+            # the tail budget (soft_ceiling).  Without recording this as
+            # an ineffective compression the anti-thrashing guard in
+            # should_compress() never fires and every subsequent turn
+            # re-triggers a no-op compression loop.  (#40803)
+            self._ineffective_compression_count += 1
+            self._last_compression_savings_pct = 0.0
+            if not self.quiet_mode:
+                logger.warning(
+                    "Compression skipped: compress_start (%d) >= compress_end (%d) "
+                    "— transcript fits within tail budget, nothing to compress. "
+                    "ineffective_compression_count=%d",
+                    compress_start, compress_end,
+                    self._ineffective_compression_count,
+                )
            return messages

        turns_to_summarize = messages[compress_start:compress_end]
@ -1940,6 +2006,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            if summary_body and not self._previous_summary:
                self._previous_summary = summary_body
            turns_to_summarize = messages[max(compress_start, summary_idx + 1):compress_end]
+        elif self._previous_summary:
+            # No handoff summary found in the current messages, but
+            # _previous_summary is non-empty — it was set by a different
+            # (now-ended) session (e.g., a cron job, a prior /new).  Discard
+            # it so _generate_summary() does not inject cross-session content
+            # into the summarizer prompt via the iterative-update path.
+            self._previous_summary = None

        if not self.quiet_mode:
            logger.info(
--- a/agent/conversation_compression.py
+++ b/agent/conversation_compression.py
@ -507,12 +507,29 @@ def compress_context(
            agent._session_db.end_session(agent.session_id, "compression")
            old_session_id = agent.session_id
            agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+            # Ordering contract: the agent thread updates the contextvar here;
+            # the gateway propagates to SessionEntry after run_in_executor returns.
            try:
                from gateway.session_context import set_current_session_id

                set_current_session_id(agent.session_id)
            except Exception:
                os.environ["HERMES_SESSION_ID"] = agent.session_id
+            # The gateway/tools session context (ContextVar + env) and the
+            # logging session context are SEPARATE mechanisms. The call above
+            # moves the former; the ``[session_id]`` tag on log lines comes
+            # from ``hermes_logging._session_context`` (set once per turn in
+            # conversation_loop.py). Without this, post-rotation log lines in
+            # the same turn keep the STALE old id while the message/DB/gateway
+            # state carry the new one — breaking log correlation exactly at the
+            # compaction boundary (see #34089). Guarded separately so a logging
+            # failure can never regress the routing update above.
+            try:
+                from hermes_logging import set_session_context
+
+                set_session_context(agent.session_id)
+            except Exception:
+                pass
            agent._session_db_created = False
            agent._session_db.create_session(
                session_id=agent.session_id,
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@ -91,6 +91,7 @@ AUTH_TYPE_OAUTH = "oauth"
 AUTH_TYPE_API_KEY = "api_key"

 SOURCE_MANUAL = "manual"
+SOURCE_MANUAL_DEVICE_CODE = f"{SOURCE_MANUAL}:device_code"

 STRATEGY_FILL_FIRST = "fill_first"
 STRATEGY_ROUND_ROBIN = "round_robin"
@ -374,7 +375,7 @@ def _iter_custom_providers(config: Optional[dict] = None):
        yield _normalize_custom_pool_name(name), entry


-def get_custom_provider_pool_key(base_url: str, provider_name: Optional[str] = None) -> Optional[str]:
+def get_custom_provider_pool_key(base_url: Optional[str], provider_name: Optional[str] = None) -> Optional[str]:
    """Look up the custom_providers list in config.yaml and return 'custom:<name>' for a matching base_url.

    When provider_name is given, prefer matching by name first (solving the case where
--- a/agent/curator.py
+++ b/agent/curator.py
@ -375,6 +375,11 @@ CURATOR_REVIEW_PROMPT = (
    "into ~/.hermes/skills/.archive/) is the maximum destructive action. "
    "Archives are recoverable; deletion is not.\n"
    "3. DO NOT touch skills shown as pinned=yes. Skip them entirely.\n"
+    "3b. DO NOT archive, delete, consolidate, move, or otherwise modify any "
+    "skill named in the protected built-ins list (currently: plan). These "
+    "back load-bearing UX (slash-command entry points referenced in docs and "
+    "tips) and are filtered out of the candidate list below — never resurrect "
+    "one as an archive or absorb target.\n"
    "4. DO NOT use usage counters as a reason to skip consolidation. The "
    "counters are new and often mostly zero. Judge overlap on CONTENT, "
    "not on use_count. 'use=0' is not evidence a skill is valuable; it's "
--- a/agent/image_routing.py
+++ b/agent/image_routing.py
@ -219,6 +219,35 @@ def _supports_vision_override(
        coerced = _coerce_capability_bool(per_model.get("supports_vision"))
        if coerced is not None:
            return coerced
+
+    # 2b. Legacy list-style custom_providers. Entries are dicts with a
+    # "name" key and a nested "models" dict. Match by provider name (which
+    # may appear as the raw name or "custom:<name>" at runtime).
+    custom_providers = cfg.get("custom_providers")
+    if isinstance(custom_providers, list):
+        # Build candidate names: the provider value and the config provider
+        # value, both raw and with "custom:" prefix stripped/added.
+        candidate_names: set = set()
+        for p in filter(None, (provider, config_provider)):
+            candidate_names.add(p)
+            if p.startswith("custom:"):
+                candidate_names.add(p[len("custom:"):])
+            else:
+                candidate_names.add(f"custom:{p}")
+        for entry_raw in custom_providers:
+            if not isinstance(entry_raw, dict):
+                continue
+            entry_name = str(entry_raw.get("name") or "").strip()
+            if entry_name not in candidate_names:
+                continue
+            models_raw = entry_raw.get("models")
+            models_cfg = models_raw if isinstance(models_raw, dict) else {}
+            per_model_raw = models_cfg.get(model)
+            per_model = per_model_raw if isinstance(per_model_raw, dict) else {}
+            coerced = _coerce_capability_bool(per_model.get("supports_vision"))
+            if coerced is not None:
+                return coerced
+
    return None


--- a/agent/insights.py
+++ b/agent/insights.py
@ -20,23 +20,17 @@ import json
 import time
 from collections import Counter, defaultdict
 from datetime import datetime
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional

 from agent.usage_pricing import (
    CanonicalUsage,
-    DEFAULT_PRICING,
    estimate_usage_cost,
    format_duration_compact,
    has_known_pricing,
 )

-_DEFAULT_PRICING = DEFAULT_PRICING


-def _has_known_pricing(model_name: str, provider: str = None, base_url: str = None) -> bool:
-    """Check if a model has known pricing (vs unknown/custom endpoint)."""
-    return has_known_pricing(model_name, provider=provider, base_url=base_url)
-

 def _estimate_cost(
    session_or_model: Dict[str, Any] | str,
@ -45,8 +39,8 @@ def _estimate_cost(
    *,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
-    provider: str = None,
-    base_url: str = None,
+    provider: Optional[str] = None,
+    base_url: Optional[str] = None,
 ) -> tuple[float, str]:
    """Estimate the USD cost for a session row or a model/token tuple."""
    if isinstance(session_or_model, dict):
@ -77,9 +71,6 @@ def _estimate_cost(
    return float(result.amount_usd or 0.0), result.status


-def _format_duration(seconds: float) -> str:
-    """Format seconds into a human-readable duration string."""
-    return format_duration_compact(seconds)


 def _bar_chart(values: List[int], max_width: int = 20) -> List[str]:
@ -435,7 +426,7 @@ class InsightsEngine:
                included_cost_sessions += 1
            elif status == "unknown":
                unknown_cost_sessions += 1
-            if _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url")):
+            if has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url")):
                models_with_pricing.add(display)
            else:
                models_without_pricing.add(display)
@ -508,7 +499,7 @@ class InsightsEngine:
            d["tool_calls"] += s.get("tool_call_count") or 0
            estimate, status = _estimate_cost(s)
            d["cost"] += estimate
-            d["has_pricing"] = _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url"))
+            d["has_pricing"] = has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url"))
            d["cost_status"] = status

        result = [
@ -679,7 +670,7 @@ class InsightsEngine:
            top.append({
                "label": "Longest session",
                "session_id": longest["id"][:16],
-                "value": _format_duration(dur),
+                "value": format_duration_compact(dur),
                "date": datetime.fromtimestamp(longest["started_at"]).strftime("%b %d"),
            })

@ -764,7 +755,7 @@ class InsightsEngine:
        lines.append(f"  Input tokens:      {o['total_input_tokens']:<12,}  Output tokens:   {o['total_output_tokens']:,}")
        lines.append(f"  Total tokens:      {o['total_tokens']:,}")
        if o["total_hours"] > 0:
-            lines.append(f"  Active time:       ~{_format_duration(o['total_hours'] * 3600):<11}  Avg session:     ~{_format_duration(o['avg_session_duration'])}")
+            lines.append(f"  Active time:       ~{format_duration_compact(o['total_hours'] * 3600):<11}  Avg session:     ~{format_duration_compact(o['avg_session_duration'])}")
        lines.append(f"  Avg msgs/session:  {o['avg_messages_per_session']:.1f}")
        lines.append("")

@ -879,7 +870,7 @@ class InsightsEngine:
        lines.append(f"**Sessions:** {o['total_sessions']} | **Messages:** {o['total_messages']:,} | **Tool calls:** {o['total_tool_calls']:,}")
        lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
        if o["total_hours"] > 0:
-            lines.append(f"**Active time:** ~{_format_duration(o['total_hours'] * 3600)} | **Avg session:** ~{_format_duration(o['avg_session_duration'])}")
+            lines.append(f"**Active time:** ~{format_duration_compact(o['total_hours'] * 3600)} | **Avg session:** ~{format_duration_compact(o['avg_session_duration'])}")
        lines.append("")

        # Models (top 5)
--- a/agent/memory_manager.py
+++ b/agent/memory_manager.py
@ -28,6 +28,8 @@ from __future__ import annotations
 import logging
 import re
 import inspect
+import threading
+from concurrent.futures import ThreadPoolExecutor
 from typing import Any, Dict, List, Optional

 from agent.memory_provider import MemoryProvider
@ -35,6 +37,12 @@ from tools.registry import tool_error

 logger = logging.getLogger(__name__)

+# How long shutdown_all() waits for in-flight background sync/prefetch work
+# to drain before abandoning it, so a wedged provider cannot stall teardown
+# of the manager. Caveat: on Python 3.9+ ThreadPoolExecutor workers are not
+# daemon threads; the interpreter joins them at exit, so a stuck task can delay it.
+_SYNC_DRAIN_TIMEOUT_S = 5.0
+

 # ---------------------------------------------------------------------------
 # Context fencing helpers
@ -252,6 +260,13 @@ class MemoryManager:
        self._providers: List[MemoryProvider] = []
        self._tool_to_provider: Dict[str, MemoryProvider] = {}
        self._has_external: bool = False  # True once a non-builtin provider is added
+        # Background executor for end-of-turn sync/prefetch. Lazily created on
+        # first use so the common builtin-only path spawns no extra threads.
+        # A single worker serializes a provider's writes (turn N must land
+        # before turn N+1) and caps thread growth at one per manager. See
+        # _submit_background() and the sync_all/queue_prefetch_all rationale.
+        self._sync_executor: Optional[ThreadPoolExecutor] = None
+        self._sync_executor_lock = threading.Lock()

    # -- Registration --------------------------------------------------------

@ -375,15 +390,27 @@ class MemoryManager:
        return "\n\n".join(parts)

    def queue_prefetch_all(self, query: str, *, session_id: str = "") -> None:
-        """Queue background prefetch on all providers for the next turn."""
-        for provider in self._providers:
-            try:
-                provider.queue_prefetch(query, session_id=session_id)
-            except Exception as e:
-                logger.debug(
-                    "Memory provider '%s' queue_prefetch failed (non-fatal): %s",
-                    provider.name, e,
-                )
+        """Queue background prefetch on all providers for the next turn.
+
+        Provider work is dispatched to a background worker so a slow or
+        wedged provider can never block the caller. See ``sync_all`` for
+        the full rationale (agent stuck "running" minutes after a turn).
+        """
+        providers = list(self._providers)
+        if not providers:
+            return
+
+        def _run() -> None:
+            for provider in providers:
+                try:
+                    provider.queue_prefetch(query, session_id=session_id)
+                except Exception as e:
+                    logger.debug(
+                        "Memory provider '%s' queue_prefetch failed (non-fatal): %s",
+                        provider.name, e,
+                    )
+
+        self._submit_background(_run)

    # -- Sync ----------------------------------------------------------------

@ -407,27 +434,120 @@ class MemoryManager:
        session_id: str = "",
        messages: Optional[List[Dict[str, Any]]] = None,
    ) -> None:
-        """Sync a completed turn to all providers."""
-        for provider in self._providers:
+        """Sync a completed turn to all providers.
+
+        Runs on a background worker thread, NOT inline on the
+        turn-completion path. A provider's ``sync_turn`` may make a
+        blocking network/daemon call (a misconfigured Hindsight daemon
+        was observed blocking ~298s before failing); doing that inline
+        held ``run_conversation`` open long after the user saw their
+        response, so every interface (CLI, TUI, gateway) kept the agent
+        marked "running" for minutes and any follow-up message triggered
+        an aggressive interrupt. Dispatching off-thread means a slow or
+        broken provider can never stall the turn — the sync simply
+        completes (or fails, logged) in the background.
+
+        Writes are serialized through a single worker so turn N lands
+        before turn N+1; provider implementations don't need their own
+        ordering guarantees.
+        """
+        providers = list(self._providers)
+        if not providers:
+            return
+
+        def _run() -> None:
+            for provider in providers:
+                try:
+                    if messages is not None and self._provider_sync_accepts_messages(provider):
+                        provider.sync_turn(
+                            user_content,
+                            assistant_content,
+                            session_id=session_id,
+                            messages=messages,
+                        )
+                    else:
+                        provider.sync_turn(
+                            user_content,
+                            assistant_content,
+                            session_id=session_id,
+                        )
+                except Exception as e:
+                    logger.warning(
+                        "Memory provider '%s' sync_turn failed: %s",
+                        provider.name, e,
+                    )
+
+        self._submit_background(_run)
+
+    # -- Background dispatch -------------------------------------------------
+
+    def _submit_background(self, fn) -> None:
+        """Run ``fn`` on the manager's background worker.
+
+        The executor is created lazily and shared across calls. If the
+        executor can't be created or has already been shut down, ``fn``
+        runs inline as a last-resort fallback — losing the async benefit
+        but never losing the write itself. ``fn`` must do its own
+        per-provider error handling; this wrapper only guards executor
+        plumbing.
+        """
+        executor = self._get_sync_executor()
+        if executor is None:
+            # Executor unavailable (shut down / creation failed) — run
+            # inline rather than drop the work. Slow, but correct.
            try:
-                if messages is not None and self._provider_sync_accepts_messages(provider):
-                    provider.sync_turn(
-                        user_content,
-                        assistant_content,
-                        session_id=session_id,
-                        messages=messages,
+                fn()
+            except Exception as e:  # pragma: no cover - fn guards internally
+                logger.debug("Inline memory background task failed: %s", e)
+            return
+        try:
+            executor.submit(fn)
+        except RuntimeError:
+            # Executor was shut down between the get and the submit
+            # (teardown race). Fall back to inline.
+            try:
+                fn()
+            except Exception as e:  # pragma: no cover - fn guards internally
+                logger.debug("Inline memory background task failed: %s", e)
+
+    def _get_sync_executor(self) -> Optional[ThreadPoolExecutor]:
+        """Lazily create the single-worker background executor."""
+        if self._sync_executor is not None:
+            return self._sync_executor
+        with self._sync_executor_lock:
+            if self._sync_executor is None:
+                try:
+                    self._sync_executor = ThreadPoolExecutor(
+                        max_workers=1,
+                        thread_name_prefix="mem-sync",
                    )
-                else:
-                    provider.sync_turn(
-                        user_content,
-                        assistant_content,
-                        session_id=session_id,
-                    )
-            except Exception as e:
-                logger.warning(
-                    "Memory provider '%s' sync_turn failed: %s",
-                    provider.name, e,
-                )
+                except Exception as e:  # pragma: no cover - resource exhaustion
+                    logger.warning("Failed to create memory sync executor: %s", e)
+                    return None
+            return self._sync_executor
+
+    def flush_pending(self, timeout: Optional[float] = None) -> bool:
+        """Block until queued sync/prefetch work has drained.
+
+        Single-worker executor means submitting a sentinel and waiting on
+        it guarantees every previously-submitted task has run. Returns
+        True if the barrier completed within ``timeout`` (or no executor
+        exists), False on timeout. Used at real session boundaries and by
+        tests that need to assert provider state deterministically.
+        """
+        executor = self._sync_executor
+        if executor is None:
+            return True
+        try:
+            fut = executor.submit(lambda: None)
+        except RuntimeError:
+            # Executor already shut down — nothing pending.
+            return True
+        try:
+            fut.result(timeout=timeout)
+            return True
+        except Exception:
+            return False

    # -- Tools ---------------------------------------------------------------

@ -653,7 +773,15 @@ class MemoryManager:
                )

    def shutdown_all(self) -> None:
-        """Shut down all providers (reverse order for clean teardown)."""
+        """Shut down all providers (reverse order for clean teardown).
+
+        Drains the background sync/prefetch executor first (bounded by
+        ``_SYNC_DRAIN_TIMEOUT_S``) so a turn's final sync has a chance to
+        land before providers are torn down. This call returns once the
+        drain window elapses; on Python 3.9+ the executor's worker is not
+        a daemon thread, so a still-wedged task may delay interpreter exit.
+        """
+        self._drain_sync_executor()
        for provider in reversed(self._providers):
            try:
                provider.shutdown()
@ -663,6 +791,52 @@ class MemoryManager:
                    provider.name, e,
                )

+    def _drain_sync_executor(self) -> None:
+        """Shut down the background executor, waiting briefly for drain.
+
+        Bounded by ``_SYNC_DRAIN_TIMEOUT_S``: a wedged provider must never
+        hang process/session teardown. We stop accepting new work and
+        cancel anything still queued, then wait at most the drain timeout
+        for the currently-running task on a daemon watcher thread. The
+        worker itself is non-daemon on Python 3.9+ and is joined at exit.
+        """
+        with self._sync_executor_lock:
+            executor = self._sync_executor
+            self._sync_executor = None
+        if executor is None:
+            return
+        try:
+            # Stop accepting new work and drop anything still queued, but
+            # do NOT block here — cancel_futures cancels not-yet-started
+            # tasks; the in-flight one keeps running on its worker thread.
+            executor.shutdown(wait=False, cancel_futures=True)
+        except TypeError:
+            # Older Python without cancel_futures kwarg.
+            try:
+                executor.shutdown(wait=False)
+            except Exception as e:  # pragma: no cover
+                logger.debug("Memory sync executor shutdown failed: %s", e)
+            return
+        except Exception as e:  # pragma: no cover
+            logger.debug("Memory sync executor shutdown failed: %s", e)
+            return
+        # Give an in-flight sync a bounded chance to finish on a watcher
+        # thread so we don't block the caller past the drain timeout.
+        drainer = threading.Thread(
+            target=lambda: self._bounded_executor_wait(executor),
+            daemon=True,
+            name="mem-sync-drain",
+        )
+        drainer.start()
+        drainer.join(timeout=_SYNC_DRAIN_TIMEOUT_S)
+
+    @staticmethod
+    def _bounded_executor_wait(executor: ThreadPoolExecutor) -> None:
+        try:
+            executor.shutdown(wait=True)
+        except Exception as e:  # pragma: no cover
+            logger.debug("Memory sync executor drain wait failed: %s", e)
+
    def initialize_all(self, session_id: str, **kwargs) -> None:
        """Initialize all providers.

--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@ -1684,6 +1684,26 @@ def get_model_context_length(
                "in config.yaml to override.",
                model, base_url, f"{DEFAULT_FALLBACK_CONTEXT:,}",
            )
+            # 3b. Before falling back to the hard 256K default, consult the
+            # hardcoded catalog as a last resort.  A proxied/custom Anthropic
+            # gateway (e.g. corporate proxy) fails the Ollama/local probes
+            # above, but the model name may still match an entry in
+            # DEFAULT_CONTEXT_LENGTHS (e.g. "claude-opus-4-8" → 1M).
+            # Without this, the early return here short-circuits the catalog
+            # lookup at step 8 and silently caps context at 256K.
+            model_lower = model.lower()
+            for default_model, length in sorted(
+                DEFAULT_CONTEXT_LENGTHS.items(),
+                key=lambda x: len(x[0]),
+                reverse=True,
+            ):
+                if default_model in model_lower:
+                    logger.info(
+                        "Using hardcoded context length %s for model %r "
+                        "(custom endpoint, catalog match on %r)",
+                        f"{length:,}", model, default_model,
+                    )
+                    return length
            return DEFAULT_FALLBACK_CONTEXT

    # 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
--- a/agent/tool_executor.py
+++ b/agent/tool_executor.py
@ -702,7 +702,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
        if agent._should_emit_quiet_tool_messages():
            cute_msg = _get_cute_tool_message_impl(name, args, tool_duration, result=function_result)
            agent._safe_print(f"  {cute_msg}")
-        elif not agent.quiet_mode:
+        elif getattr(agent, "tool_progress_mode", "all") != "off":
            _preview_str = _multimodal_text_summary(function_result)
            if agent.verbose_logging:
                print(f"  ✅ Tool {i+1} completed in {tool_duration:.2f}s")
--- a/agent/turn_context.py
+++ b/agent/turn_context.py
@ -0,0 +1,388 @@
+"""Per-turn setup for ``run_conversation`` (the turn prologue).
+
+``run_conversation`` opened with ~470 lines of straight-line setup before the
+tool-calling loop ever started: stdio guarding, runtime-main wiring, retry-counter
+resets, user-message sanitization, todo/nudge-counter hydration, system-prompt
+restore-or-build, crash-resilience persistence, preflight context compression, the
+``pre_llm_call`` plugin hook, and external-memory prefetch.
+
+All of that is *prologue* — it runs once per turn, has no back-references into the
+loop, and produces a fixed set of values the loop then consumes. ``TurnContext``
+captures those produced values; ``build_turn_context`` performs the setup work and
+returns one. ``run_conversation`` is left to unpack the context and run the loop,
+shrinking the orchestrator by the full prologue.
+
+The builder still mutates ``agent`` heavily (counters, thread id, cached prompt,
+session DB) exactly as the inline code did — those side effects are the point. The
+``TurnContext`` it returns carries only the *locals* the loop reads back.
+
+Behavior is identical to the original inline prologue; this is a pure
+move-and-name refactor with no semantic change.
+"""
+
+from __future__ import annotations
+
+import logging
+import threading
+import uuid
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional
+
+from agent.iteration_budget import IterationBudget
+from agent.model_metadata import estimate_request_tokens_rough
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class TurnContext:
+    """Values produced by the turn prologue and consumed by the turn loop."""
+
+    # Sanitized inbound message (surrogates stripped).
+    user_message: str
+    # Clean message preserved for transcripts / memory queries (no nudge injection).
+    original_user_message: Any
+    # Working message list for this turn (loop appends to it).
+    messages: List[Dict[str, Any]]
+    # May be reset to None by preflight compression (new session created).
+    conversation_history: Optional[List[Dict[str, Any]]]
+    # Cached system prompt active for this turn (may be rebuilt by compression).
+    active_system_prompt: Optional[str]
+    # Task / turn identifiers.
+    effective_task_id: str
+    turn_id: str
+    # Index of the current user turn within ``messages``.
+    current_turn_user_idx: int
+    # Whether the post-turn memory review should fire.
+    should_review_memory: bool = False
+    # Context contributed by ``pre_llm_call`` plugins (appended to user message).
+    plugin_user_context: str = ""
+    # External-memory prefetch result, reused across loop iterations.
+    ext_prefetch_cache: str = ""
+
+
+def build_turn_context(
+    agent,
+    user_message: str,
+    system_message: Optional[str],
+    conversation_history: Optional[List[Dict[str, Any]]],
+    task_id: Optional[str],
+    stream_callback,
+    persist_user_message: Optional[str],
+    *,
+    restore_or_build_system_prompt,
+    install_safe_stdio,
+    sanitize_surrogates,
+    summarize_user_message_for_log,
+    set_session_context,
+    set_current_write_origin,
+    ra,
+) -> TurnContext:
+    """Run the once-per-turn setup and return the loop's input context.
+
+    The callables/helpers the original prologue referenced from the
+    ``conversation_loop`` module are passed in explicitly to keep this module
+    free of an import cycle with ``agent.conversation_loop``.
+    """
+    # Guard stdio against OSError from broken pipes (systemd/headless/daemon).
+    install_safe_stdio()
+
+    agent._ensure_db_session()
+
+    # Tell auxiliary_client what the live main provider/model are for this turn.
+    try:
+        from agent.auxiliary_client import set_runtime_main
+        set_runtime_main(
+            getattr(agent, "provider", "") or "",
+            getattr(agent, "model", "") or "",
+            base_url=getattr(agent, "base_url", "") or "",
+            api_key=getattr(agent, "api_key", "") or "",
+            api_mode=getattr(agent, "api_mode", "") or "",
+        )
+    except Exception:
+        pass
+
+    # Tag log records on this thread with the session ID for ``hermes logs``.
+    set_session_context(agent.session_id)
+
+    # Bind the skill write-origin ContextVar for this thread.
+    set_current_write_origin(getattr(agent, "_memory_write_origin", "assistant_tool"))
+
+    # Restore the primary runtime if the previous turn activated fallback.
+    agent._restore_primary_runtime()
+
+    # Sanitize surrogate characters from user input.
+    if isinstance(user_message, str):
+        user_message = sanitize_surrogates(user_message)
+    if isinstance(persist_user_message, str):
+        persist_user_message = sanitize_surrogates(persist_user_message)
+
+    # Store stream callback for _interruptible_api_call to pick up.
+    agent._stream_callback = stream_callback
+    agent._persist_user_message_idx = None
+    agent._persist_user_message_override = persist_user_message
+    # Generate unique task_id if not provided to isolate VMs between tasks.
+    effective_task_id = task_id or str(uuid.uuid4())
+    agent._current_task_id = effective_task_id
+    turn_id = f"{agent.session_id or 'session'}:{effective_task_id}:{uuid.uuid4().hex[:8]}"
+    agent._current_turn_id = turn_id
+    agent._current_api_request_id = ""
+
+    # Reset retry counters and iteration budget at the start of each turn.
+    agent._invalid_tool_retries = 0
+    agent._invalid_json_retries = 0
+    agent._empty_content_retries = 0
+    agent._incomplete_scratchpad_retries = 0
+    agent._codex_incomplete_retries = 0
+    agent._thinking_prefill_retries = 0
+    agent._post_tool_empty_retried = False
+    agent._last_content_with_tools = None
+    agent._last_content_tools_all_housekeeping = False
+    agent._mute_post_response = False
+    agent._unicode_sanitization_passes = 0
+    agent._tool_guardrails.reset_for_turn()
+    agent._tool_guardrail_halt_decision = None
+    agent._vision_supported = True
+
+    # Pre-turn connection health check: clean up dead TCP connections.
+    if agent.api_mode != "anthropic_messages":
+        try:
+            if agent._cleanup_dead_connections():
+                agent._emit_status(
+                    "🔌 Detected stale connections from a previous provider "
+                    "issue — cleaned up automatically. Proceeding with fresh "
+                    "connection."
+                )
+        except Exception:
+            pass
+    # Replay compression warning through status_callback for gateway platforms.
+    if agent._compression_warning:
+        agent._replay_compression_warning()
+        agent._compression_warning = None  # send once
+
+    # NOTE: _turns_since_memory and _iters_since_skill are NOT reset here.
+    agent.iteration_budget = IterationBudget(agent.max_iterations)
+
+    # Log conversation turn start for debugging/observability.
+    _preview_text = summarize_user_message_for_log(user_message)
+    _msg_preview = (_preview_text[:80] + "...") if len(_preview_text) > 80 else _preview_text
+    _msg_preview = _msg_preview.replace("\n", " ")
+    logger.info(
+        "conversation turn: session=%s model=%s provider=%s platform=%s history=%d msg=%r",
+        agent.session_id or "none", agent.model, agent.provider or "unknown",
+        agent.platform or "unknown", len(conversation_history or []),
+        _msg_preview,
+    )
+
+    # Initialize conversation (copy to avoid mutating the caller's list).
+    messages = list(conversation_history) if conversation_history else []
+
+    # Hydrate todo store from conversation history.
+    if conversation_history and not agent._todo_store.has_items():
+        agent._hydrate_todo_store(conversation_history)
+
+    # Hydrate per-session nudge counters from persisted history (issue #22357).
+    if conversation_history and agent._user_turn_count == 0:
+        prior_user_turns = sum(
+            1 for m in conversation_history if m.get("role") == "user"
+        )
+        if prior_user_turns > 0:
+            agent._user_turn_count = prior_user_turns
+            if agent._memory_nudge_interval > 0 and agent._turns_since_memory == 0:
+                agent._turns_since_memory = prior_user_turns % agent._memory_nudge_interval
+
+    # Track user turns for memory flush and periodic nudge logic.
+    agent._user_turn_count += 1
+
+    # Reset the streaming context scrubber at the top of each turn.
+    scrubber = getattr(agent, "_stream_context_scrubber", None)
+    if scrubber is not None:
+        scrubber.reset()
+    # Reset the think scrubber for the same reason.
+    think_scrubber = getattr(agent, "_stream_think_scrubber", None)
+    if think_scrubber is not None:
+        think_scrubber.reset()
+
+    # Preserve the original user message (no nudge injection).
+    original_user_message = persist_user_message if persist_user_message is not None else user_message
+
+    # Track memory nudge trigger (turn-based, checked here).
+    should_review_memory = False
+    if (agent._memory_nudge_interval > 0
+            and "memory" in agent.valid_tool_names
+            and agent._memory_store):
+        agent._turns_since_memory += 1
+        if agent._turns_since_memory >= agent._memory_nudge_interval:
+            should_review_memory = True
+            agent._turns_since_memory = 0
+
+    # Add user message.
+    user_msg = {"role": "user", "content": user_message}
+    messages.append(user_msg)
+    current_turn_user_idx = len(messages) - 1
+    agent._persist_user_message_idx = current_turn_user_idx
+
+    if not agent.quiet_mode:
+        _print_preview = summarize_user_message_for_log(user_message)
+        agent._safe_print(
+            f"💬 Starting conversation: '{_print_preview[:60]}"
+            f"{'...' if len(_print_preview) > 60 else ''}'"
+        )
+
+    # ── System prompt (cached per session for prefix caching) ──
+    if agent._cached_system_prompt is None:
+        restore_or_build_system_prompt(agent, system_message, conversation_history)
+
+    active_system_prompt = agent._cached_system_prompt
+
+    # Crash-resilience: persist the inbound user turn as soon as the session row exists.
+    try:
+        agent._persist_session(messages, conversation_history)
+    except Exception:
+        logger.warning(
+            "Early turn-start session persistence failed for session=%s",
+            agent.session_id or "none",
+            exc_info=True,
+        )
+
+    # ── Preflight context compression ──
+    if (
+        agent.compression_enabled
+        and len(messages) > agent.context_compressor.protect_first_n
+                            + agent.context_compressor.protect_last_n + 1
+    ):
+        _preflight_tokens = estimate_request_tokens_rough(
+            messages,
+            system_prompt=active_system_prompt or "",
+            tools=agent.tools or None,
+        )
+        _compressor = agent.context_compressor
+        _defer_preflight = getattr(
+            _compressor,
+            "should_defer_preflight_to_real_usage",
+            lambda _tokens: False,
+        )
+        _preflight_deferred = _defer_preflight(_preflight_tokens)
+
+        if not _preflight_deferred:
+            _last = _compressor.last_prompt_tokens
+            # Do NOT overwrite the -1 sentinel (#36718).
+            if _last >= 0 and _preflight_tokens > _last:
+                _compressor.last_prompt_tokens = _preflight_tokens
+
+        if _preflight_deferred:
+            logger.info(
+                "Skipping preflight compression: rough estimate ~%s >= %s, "
+                "but last real provider prompt was %s after compression",
+                f"{_preflight_tokens:,}",
+                f"{_compressor.threshold_tokens:,}",
+                f"{_compressor.last_real_prompt_tokens:,}",
+            )
+        elif _compressor.should_compress(_preflight_tokens):
+            logger.info(
+                "Preflight compression: ~%s tokens >= %s threshold (model %s, ctx %s)",
+                f"{_preflight_tokens:,}",
+                f"{_compressor.threshold_tokens:,}",
+                agent.model,
+                f"{_compressor.context_length:,}",
+            )
+            agent._emit_status(
+                f"📦 Preflight compression: ~{_preflight_tokens:,} tokens "
+                f">= {_compressor.threshold_tokens:,} threshold. "
+                "This may take a moment."
+            )
+            for _pass in range(3):
+                _orig_len = len(messages)
+                messages, active_system_prompt = agent._compress_context(
+                    messages, system_message, approx_tokens=_preflight_tokens,
+                    task_id=effective_task_id,
+                )
+                if len(messages) >= _orig_len:
+                    break  # Cannot compress further
+                conversation_history = None
+                agent._empty_content_retries = 0
+                agent._thinking_prefill_retries = 0
+                agent._last_content_with_tools = None
+                agent._last_content_tools_all_housekeeping = False
+                agent._mute_post_response = False
+                _preflight_tokens = estimate_request_tokens_rough(
+                    messages,
+                    system_prompt=active_system_prompt or "",
+                    tools=agent.tools or None,
+                )
+                if not _compressor.should_compress(_preflight_tokens):
+                    break
+
+    # Plugin hook: pre_llm_call (context injected into user message, not system prompt).
+    plugin_user_context = ""
+    try:
+        from hermes_cli.plugins import invoke_hook as _invoke_hook
+        _pre_results = _invoke_hook(
+            "pre_llm_call",
+            session_id=agent.session_id,
+            task_id=effective_task_id,
+            turn_id=turn_id,
+            user_message=original_user_message,
+            conversation_history=list(messages),
+            is_first_turn=(not bool(conversation_history)),
+            model=agent.model,
+            platform=getattr(agent, "platform", None) or "",
+            sender_id=getattr(agent, "_user_id", None) or "",
+        )
+        _ctx_parts: list[str] = []
+        for r in _pre_results:
+            if isinstance(r, dict) and r.get("context"):
+                _ctx_parts.append(str(r["context"]))
+            elif isinstance(r, str) and r.strip():
+                _ctx_parts.append(r)
+        if _ctx_parts:
+            plugin_user_context = "\n\n".join(_ctx_parts)
+    except Exception as exc:
+        logger.warning("pre_llm_call hook failed: %s", exc)
+
+    # Per-turn file-mutation verifier state.
+    agent._turn_failed_file_mutations = {}
+
+    # Record the execution thread so interrupt()/clear_interrupt() can scope
+    # the tool-level interrupt signal to THIS agent's thread only.
+    agent._execution_thread_id = threading.current_thread().ident
+
+    # Clear stale per-thread interrupt state, preserving a pending interrupt.
+    ra()._set_interrupt(False, agent._execution_thread_id)
+    if agent._interrupt_requested:
+        ra()._set_interrupt(True, agent._execution_thread_id)
+        agent._interrupt_thread_signal_pending = False
+    else:
+        agent._interrupt_message = None
+        agent._interrupt_thread_signal_pending = False
+
+    # Notify memory providers of the new turn (BEFORE prefetch_all).
+    if agent._memory_manager:
+        try:
+            _turn_msg = original_user_message if isinstance(original_user_message, str) else ""
+            agent._memory_manager.on_turn_start(agent._user_turn_count, _turn_msg)
+        except Exception:
+            pass
+
+    # External memory provider: prefetch once before the tool loop.
+    ext_prefetch_cache = ""
+    if agent._memory_manager:
+        try:
+            _query = original_user_message if isinstance(original_user_message, str) else ""
+            ext_prefetch_cache = agent._memory_manager.prefetch_all(_query) or ""
+        except Exception:
+            pass
+
+    return TurnContext(
+        user_message=user_message,
+        original_user_message=original_user_message,
+        messages=messages,
+        conversation_history=conversation_history,
+        active_system_prompt=active_system_prompt,
+        effective_task_id=effective_task_id,
+        turn_id=turn_id,
+        current_turn_user_idx=current_turn_user_idx,
+        should_review_memory=should_review_memory,
+        plugin_user_context=plugin_user_context,
+        ext_prefetch_cache=ext_prefetch_cache,
+    )
--- a/agent/turn_finalizer.py
+++ b/agent/turn_finalizer.py
@ -0,0 +1,428 @@
+"""Post-loop turn finalization for ``run_conversation``.
+
+Extracted from ``agent/conversation_loop.py`` as part of the god-file
+decomposition campaign (``~/.hermes/plans/god-file-decomposition.md``, Phase 1
+step 4 — the post-loop ``TurnFinalizer`` seam). ``run_conversation``'s tail
+(everything after the main tool-calling ``while`` loop) is lifted here verbatim:
+budget-exhaustion summary, trajectory save, session persist, turn diagnostics,
+response transforms, result-dict assembly, steer drain, and the memory/skill
+review trigger.
+
+Behavior-neutral: the body is moved unchanged. All ``agent.*`` side effects fire
+exactly as before; only the post-loop *locals* are passed in as keyword args, and
+the assembled ``result`` dict is returned to ``run_conversation`` which returns it
+to the caller. The function is synchronous with a single return — mirroring the
+region it replaces (no awaits, no early returns).
+
+The module ``logger`` is imported lazily inside the body (``from
+agent.conversation_loop import logger``) so this module never imports
+``agent.conversation_loop`` at import time. That avoids an import cycle, and
+the log records keep the exact logger name (``"agent.conversation_loop"``).
+"""
+
+from __future__ import annotations
+
+import os
+
+from agent.codex_responses_adapter import _summarize_user_message_for_log
+
+
+def finalize_turn(
+    agent,
+    *,
+    final_response,
+    api_call_count,
+    interrupted,
+    failed,
+    messages,
+    conversation_history,
+    effective_task_id,
+    turn_id,
+    user_message,
+    original_user_message,
+    _should_review_memory,
+    _turn_exit_reason,
+):
+    """Run the post-loop finalization and return the turn ``result`` dict.
+
+    Lifted verbatim from ``run_conversation`` (the region after the main agent
+    loop). See module docstring.
+    """
+    from agent.conversation_loop import logger
+
+    if final_response is None and (
+        api_call_count >= agent.max_iterations
+        or agent.iteration_budget.remaining <= 0
+    ):
+        # Budget exhausted — ask the model for a summary via one extra
+        # API call with tools stripped.  _handle_max_iterations injects a
+        # user message and makes a single toolless request.
+        _turn_exit_reason = f"max_iterations_reached({api_call_count}/{agent.max_iterations})"
+        agent._emit_status(
+            f"⚠️ Iteration budget exhausted ({api_call_count}/{agent.max_iterations}) "
+            "— asking model to summarise"
+        )
+        if not agent.quiet_mode:
+            agent._safe_print(
+                f"\n⚠️  Iteration budget exhausted ({api_call_count}/{agent.max_iterations}) "
+                "— requesting summary..."
+            )
+        final_response = agent._handle_max_iterations(messages, api_call_count)
+
+        # If running as a kanban worker, signal the dispatcher that the
+        # worker could not complete (rather than treating it as a
+        # protocol violation).  The agent loop strips tools before calling
+        # _handle_max_iterations, so the model cannot call kanban_block
+        # itself — we must do it on its behalf.
+        #
+        # We route through ``_record_task_failure(outcome="timed_out")``
+        # rather than ``kanban_block`` so this counts toward the
+        # ``consecutive_failures`` counter and the dispatcher's
+        # ``failure_limit`` circuit breaker (#29747 gap 2).  Without this,
+        # a task whose worker keeps exhausting its budget would block
+        # silently each run, get auto-promoted by the operator (or never
+        # surface), and re-block in an endless loop with no signal.
+        _kanban_task = os.environ.get("HERMES_KANBAN_TASK")
+        if _kanban_task:
+            try:
+                from hermes_cli import kanban_db as _kb
+                _conn = _kb.connect()
+                try:
+                    _kb._record_task_failure(
+                        _conn,
+                        _kanban_task,
+                        error=(
+                            f"Iteration budget exhausted "
+                            f"({api_call_count}/{agent.max_iterations}) — "
+                            "task could not complete within the allowed "
+                            "iterations"
+                        ),
+                        outcome="timed_out",
+                        release_claim=True,
+                        end_run=True,
+                        event_payload_extra={
+                            "budget_used": api_call_count,
+                            "budget_max": agent.max_iterations,
+                        },
+                    )
+                    logger.info(
+                        "recorded budget-exhausted failure for task %s (%d/%d)",
+                        _kanban_task, api_call_count, agent.max_iterations,
+                    )
+                finally:
+                    try:
+                        _conn.close()
+                    except Exception:
+                        pass
+            except Exception:
+                logger.warning(
+                    "Failed to record budget-exhausted failure for task %s",
+                    _kanban_task,
+                    exc_info=True,
+                )
+
+    # Determine if conversation completed successfully
+    completed = (
+        final_response is not None
+        and api_call_count < agent.max_iterations
+        and not failed
+    )
+
+    # Save trajectory if enabled.  ``user_message`` may be a multimodal
+    # list of parts; the trajectory format wants a plain string.
+    agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
+
+    # Clean up VM and browser for this task after conversation completes
+    agent._cleanup_task_resources(effective_task_id)
+
+    # Persist session to both JSON log and SQLite only after private retry
+    # scaffolding has been removed. Otherwise a later user "continue" turn
+    # can replay assistant("(empty)") / recovery nudges and fall into the
+    # same empty-response loop again.
+    agent._drop_trailing_empty_response_scaffolding(messages)
+    agent._persist_session(messages, conversation_history)
+
+    # ── Turn-exit diagnostic log ─────────────────────────────────────
+    # Logged on every turn so agent.log captures WHY the turn ended: INFO
+    # by default, WARNING when the last message is a tool result and the
+    # turn was not interrupted (the "just stops" scenario users report).
+    _last_msg_role = messages[-1].get("role") if messages else None
+    _last_tool_name = None
+    if _last_msg_role == "tool":
+        # Walk back to find the assistant message with the tool call
+        for _m in reversed(messages):
+            if _m.get("role") == "assistant" and _m.get("tool_calls"):
+                _tcs = _m["tool_calls"]
+                if _tcs and isinstance(_tcs[-1], dict):
+                    _last_tool_name = _tcs[-1].get("function", {}).get("name")
+                break
+
+    _turn_tool_count = sum(
+        1 for m in messages
+        if isinstance(m, dict) and m.get("role") == "assistant" and m.get("tool_calls")
+    )
+    _resp_len = len(final_response) if final_response else 0
+    _budget_used = agent.iteration_budget.used if agent.iteration_budget else 0
+    _budget_max = agent.iteration_budget.max_total if agent.iteration_budget else 0
+
+    _diag_msg = (
+        "Turn ended: reason=%s model=%s api_calls=%d/%d budget=%d/%d "
+        "tool_turns=%d last_msg_role=%s response_len=%d session=%s"
+    )
+    _diag_args = (
+        _turn_exit_reason, agent.model, api_call_count, agent.max_iterations,
+        _budget_used, _budget_max,
+        _turn_tool_count, _last_msg_role, _resp_len,
+        agent.session_id or "none",
+    )
+
+    if _last_msg_role == "tool" and not interrupted:
+        # Agent was mid-work — this is the "just stops" case.
+        logger.warning(
+            "Turn ended with pending tool result (agent may appear stuck). "
+            + _diag_msg + " last_tool=%s",
+            *_diag_args, _last_tool_name,
+        )
+    else:
+        logger.info(_diag_msg, *_diag_args)
+
+    # File-mutation verifier footer.
+    # If one or more ``write_file`` / ``patch`` calls failed during this
+    # turn and were never superseded by a successful write to the same
+    # path, append an advisory footer to the assistant response.  This
+    # catches the specific case — reported by Ben Eng (#15524-adjacent)
+    # — where a model issues a batch of parallel patches, half of them
+    # fail with "Could not find old_string", and the model summarises
+    # the turn claiming every file was edited.  The user then has to
+    # manually run ``git status`` to catch the lie.  With this footer
+    # the failed writes are surfaced whenever they occur, so an
+    # over-claim by the model no longer reaches the user unflagged.
+    #
+    # Gate: only applied when a real text response exists for this
+    # turn and the user didn't interrupt.  Empty/interrupted turns
+    # already have other surface text that shouldn't be augmented.
+    if final_response and not interrupted:
+        try:
+            _failed = getattr(agent, "_turn_failed_file_mutations", None) or {}
+            if _failed and agent._file_mutation_verifier_enabled():
+                footer = agent._format_file_mutation_failure_footer(_failed)
+                if footer:
+                    final_response = final_response.rstrip() + "\n\n" + footer
+        except Exception as _ver_err:
+            logger.debug("file-mutation verifier footer failed: %s", _ver_err)
+
+    # Turn-completion explainer.
+    # When a turn ends abnormally after substantive work — empty content
+    # after retries, a partial/truncated stream, a still-pending tool
+    # result, or an iteration/budget limit — the user otherwise gets a
+    # blank or fragmentary response box with no consolidated reason why
+    # the agent stopped (#34452).  Surface a single user-visible
+    # explanation derived from ``_turn_exit_reason``, mirroring the
+    # file-mutation verifier footer pattern above.
+    #
+    # Gate carefully so healthy turns stay quiet:
+    #   - ``text_response(...)`` exits never produce an explanation
+    #     (handled inside the formatter), so a terse ``Done.`` is silent.
+    #   - We only ACT when there is no genuinely usable reply this turn:
+    #     an empty response, the "(empty)" terminal sentinel, or a
+    #     suspiciously short partial fragment with no terminating
+    #     punctuation (e.g. "The").  A real short answer keeps its text.
+    if not interrupted:
+        try:
+            if agent._turn_completion_explainer_enabled():
+                _stripped = (final_response or "").strip()
+                _is_empty_terminal = _stripped == "" or _stripped == "(empty)"
+                # A short fragment that is not a normal text_response exit
+                # and lacks sentence-ending punctuation is treated as a
+                # truncated partial (the "The" case from #34452).
+                _is_partial_fragment = (
+                    not _is_empty_terminal
+                    and not str(_turn_exit_reason).startswith("text_response")
+                    and len(_stripped) <= 24
+                    and _stripped[-1:] not in {".", "!", "?", "。", "！", "？", "`", ")"}
+                )
+                if _is_empty_terminal or _is_partial_fragment:
+                    _explanation = agent._format_turn_completion_explanation(
+                        _turn_exit_reason
+                    )
+                    if _explanation:
+                        if _is_empty_terminal:
+                            # Replace the bare "(empty)"/blank sentinel with
+                            # the actionable explanation.
+                            final_response = _explanation
+                        else:
+                            # Keep the partial fragment, append the reason so
+                            # the user sees both what arrived and why it
+                            # stopped.
+                            final_response = (
+                                _stripped + "\n\n" + _explanation
+                            )
+        except Exception as _exp_err:
+            logger.debug("turn-completion explainer failed: %s", _exp_err)
+
+    _response_transformed = False
+
+    # Plugin hook: transform_llm_output
+    # Fired once per turn after the tool-calling loop completes.
+    # Plugins can transform the LLM's output text before it's returned.
+    # First hook to return a non-empty string wins; None/empty returns leave the text unchanged.
+    if final_response and not interrupted:
+        try:
+            from hermes_cli.plugins import invoke_hook as _invoke_hook
+            _transform_results = _invoke_hook(
+                "transform_llm_output",
+                response_text=final_response,
+                session_id=agent.session_id or "",
+                model=agent.model,
+                platform=getattr(agent, "platform", None) or "",
+            )
+            for _hook_result in _transform_results:
+                if isinstance(_hook_result, str) and _hook_result:
+                    final_response = _hook_result
+                    _response_transformed = True
+                    break  # First non-empty string wins
+        except Exception as exc:
+            logger.warning("transform_llm_output hook failed: %s", exc)
+
+    # Plugin hook: post_llm_call
+    # Fired once per turn after the tool-calling loop completes.
+    # Plugins can use this to persist conversation data (e.g. sync
+    # to an external memory system).
+    if final_response and not interrupted:
+        try:
+            from hermes_cli.plugins import invoke_hook as _invoke_hook
+            _invoke_hook(
+                "post_llm_call",
+                session_id=agent.session_id,
+                task_id=effective_task_id,
+                turn_id=turn_id,
+                user_message=original_user_message,
+                assistant_response=final_response,
+                conversation_history=list(messages),
+                model=agent.model,
+                platform=getattr(agent, "platform", None) or "",
+            )
+        except Exception as exc:
+            logger.warning("post_llm_call hook failed: %s", exc)
+
+    # Extract reasoning from the CURRENT turn only.  Walk backwards
+    # but stop at the user message that started this turn — anything
+    # earlier is from a prior turn and must not leak into the reasoning
+    # box (confusing stale display; #17055).  Within the current turn
+    # we still want the *most recent* non-empty reasoning: many
+    # providers (Claude thinking, DeepSeek v4, Codex Responses) emit
+    # reasoning on the tool-call step and leave the final-answer step
+    # with reasoning=None, so picking only the last assistant would
+    # silently drop legitimate same-turn reasoning.
+    last_reasoning = None
+    for msg in reversed(messages):
+        if msg.get("role") == "user":
+            break  # turn boundary — don't cross into prior turns
+        if msg.get("role") == "assistant" and msg.get("reasoning"):
+            last_reasoning = msg["reasoning"]
+            break
+
+    # Build result with interrupt info if applicable
+    result = {
+        "final_response": final_response,
+        "last_reasoning": last_reasoning,
+        "messages": messages,
+        "api_calls": api_call_count,
+        "completed": completed,
+        "turn_exit_reason": _turn_exit_reason,
+        "failed": failed,
+        "partial": False,  # True only when stopped due to invalid tool calls
+        "interrupted": interrupted,
+        "response_transformed": _response_transformed,
+        "response_previewed": getattr(agent, "_response_was_previewed", False),
+        "model": agent.model,
+        "provider": agent.provider,
+        "base_url": agent.base_url,
+        "input_tokens": agent.session_input_tokens,
+        "output_tokens": agent.session_output_tokens,
+        "cache_read_tokens": agent.session_cache_read_tokens,
+        "cache_write_tokens": agent.session_cache_write_tokens,
+        "reasoning_tokens": agent.session_reasoning_tokens,
+        "prompt_tokens": agent.session_prompt_tokens,
+        "completion_tokens": agent.session_completion_tokens,
+        "total_tokens": agent.session_total_tokens,
+        "last_prompt_tokens": getattr(agent.context_compressor, "last_prompt_tokens", 0) or 0,
+        "estimated_cost_usd": agent.session_estimated_cost_usd,
+        "cost_status": agent.session_cost_status,
+        "cost_source": agent.session_cost_source,
+        "session_id": agent.session_id,
+    }
+    if agent._tool_guardrail_halt_decision is not None:
+        result["guardrail"] = agent._tool_guardrail_halt_decision.to_metadata()
+    # If a /steer landed after the final assistant turn (no more tool
+    # batches to drain into), hand it back to the caller so it can be
+    # delivered as the next user turn instead of being silently lost.
+    _leftover_steer = agent._drain_pending_steer()
+    if _leftover_steer:
+        result["pending_steer"] = _leftover_steer
+    agent._response_was_previewed = False
+
+    # Include interrupt message if one triggered the interrupt
+    if interrupted and agent._interrupt_message:
+        result["interrupt_message"] = agent._interrupt_message
+
+    # Clear interrupt state after handling
+    agent.clear_interrupt()
+
+    # Clear stream callback so it doesn't leak into future calls
+    agent._stream_callback = None
+
+    # Check skill trigger NOW — based on how many tool iterations THIS turn used.
+    _should_review_skills = False
+    if (agent._skill_nudge_interval > 0
+            and agent._iters_since_skill >= agent._skill_nudge_interval
+            and "skill_manage" in agent.valid_tool_names):
+        _should_review_skills = True
+        agent._iters_since_skill = 0
+
+    # External memory provider: sync the completed turn + queue next prefetch.
+    agent._sync_external_memory_for_turn(
+        original_user_message=original_user_message,
+        final_response=final_response,
+        interrupted=interrupted,
+        messages=messages,
+    )
+
+    # Background memory/skill review — runs AFTER the response is delivered
+    # so it never competes with the user's task for model attention.
+    if final_response and not interrupted and (_should_review_memory or _should_review_skills):
+        try:
+            agent._spawn_background_review(
+                messages_snapshot=list(messages),
+                review_memory=_should_review_memory,
+                review_skills=_should_review_skills,
+            )
+        except Exception:
+            pass  # Background review is best-effort
+
+    # Note: Memory provider on_session_end() + shutdown_all() are NOT
+    # called here — run_conversation() is called once per user message in
+    # multi-turn sessions. Shutting down after every turn would kill the
+    # provider before the second message. Actual session-end cleanup is
+    # handled by the CLI (atexit / /reset) and gateway (session expiry /
+    # _reset_session).
+
+    # Plugin hook: on_session_end
+    # Fired at the very end of every run_conversation call.
+    # Plugins can use this for cleanup, flushing buffers, etc.
+    try:
+        from hermes_cli.plugins import invoke_hook as _invoke_hook
+        _invoke_hook(
+            "on_session_end",
+            session_id=agent.session_id,
+            task_id=effective_task_id,
+            turn_id=turn_id,
+            completed=completed,
+            interrupted=interrupted,
+            model=agent.model,
+            platform=getattr(agent, "platform", None) or "",
+        )
+    except Exception as exc:
+        logger.warning("on_session_end hook failed: %s", exc)
+
+    return result
--- a/agent/turn_retry_state.py
+++ b/agent/turn_retry_state.py
@ -0,0 +1,68 @@
+"""Per-attempt recovery bookkeeping for the conversation turn loop.
+
+The inner retry loop in ``run_conversation`` (``while retry_count <
+max_retries``) makes several distinct recovery attempts on a single model API
+call: a credential-pool 429 retry, a per-provider OAuth refresh (codex,
+anthropic, nous, copilot), a long-context compression restart, a length-
+continuation restart, and a handful of format-recovery branches (thinking-
+signature stripping, multimodal-tool-content stripping, llama.cpp grammar
+fallback, image shrink, invalid-encrypted-content, 1M-beta header).
+
+Each of those branches is guarded by a one-shot boolean so it fires at most
+once per attempt. They used to be ~16 bare ``*_attempted`` / ``has_retried_*``
+/ ``restart_with_*`` locals declared inline before the loop and threaded
+through its 2,400-line body. ``TurnRetryState`` collapses them into one object
+the loop mutates in place (``state.codex_auth_retry_attempted = True``), giving
+the recovery bookkeeping a single named, testable home.
+
+Loop-control variables (``retry_count``, ``max_retries``,
+``max_compression_attempts``) intentionally stay as plain locals — they are the
+``while`` mechanics, not recovery bookkeeping, and putting them on the object
+would add indirection without clarifying anything.
+
+This module is dependency-free so it can be unit-tested in isolation and
+imported by the turn loop without an import cycle.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, fields
+
+
+@dataclass
+class TurnRetryState:
+    """One-shot recovery guards + restart signals for a single API-call attempt.
+
+    A fresh instance is created for each iteration of the outer turn loop
+    (once per ``api_call_count``). Each guard fires its recovery branch at most
+    once; the ``restart_with_*`` signals are read by the loop after the attempt
+    to decide whether to rebuild the request and retry.
+    """
+
+    # ── Per-provider OAuth / credential refresh guards ───────────────────
+    codex_auth_retry_attempted: bool = False
+    anthropic_auth_retry_attempted: bool = False
+    nous_auth_retry_attempted: bool = False
+    nous_paid_entitlement_refresh_attempted: bool = False
+    copilot_auth_retry_attempted: bool = False
+
+    # ── Format / payload recovery guards ─────────────────────────────────
+    thinking_sig_retry_attempted: bool = False
+    invalid_encrypted_content_retry_attempted: bool = False
+    image_shrink_retry_attempted: bool = False
+    multimodal_tool_content_retry_attempted: bool = False
+    oauth_1m_beta_retry_attempted: bool = False
+    llama_cpp_grammar_retry_attempted: bool = False
+
+    # ── Transport / rate-limit recovery ──────────────────────────────────
+    primary_recovery_attempted: bool = False
+    has_retried_429: bool = False
+
+    # ── Restart signals (read by the outer loop after the attempt) ───────
+    restart_with_compressed_messages: bool = False
+    restart_with_length_continuation: bool = False
+
+    def __iter__(self):
+        # Convenience for debugging / tests: iterate (name, value) pairs.
+        for f in fields(self):
+            yield f.name, getattr(self, f.name)
--- a/apps/desktop/electron/main.cjs
+++ b/apps/desktop/electron/main.cjs
@ -1902,12 +1902,36 @@ function resolveWebDist() {
  const unpackedDist = path.join(unpackedPathFor(APP_ROOT), 'dist')
  if (directoryExists(unpackedDist)) return unpackedDist

-  return path.join(APP_ROOT, 'dist')
+  // Final fallback: APP_ROOT/dist. When packaged with asar:true this lives
+  // INSIDE app.asar — not a servable filesystem directory — so the embedded
+  // dashboard backend 404s on static routes (see #41327, #39472). The durable
+  // fix is unpacking dist/ (PR #41411 adds dist/** to asarUnpack so the tier-2
+  // unpackedDist above resolves). If we still land here while packaged, log it
+  // so the cause isn't silent.
+  const fallback = path.join(APP_ROOT, 'dist')
+  if (IS_PACKAGED && /app\.asar(?=$|[\\/])/.test(fallback) && !directoryExists(fallback)) {
+    rememberLog(
+      `[web-dist] dashboard frontend dir resolved to an asar-internal path that ` +
+        `is not a real directory: ${fallback}. Static routes will 404. ` +
+        `Ensure dist/** is unpacked (asarUnpack) or set HERMES_DESKTOP_WEB_DIST.`
+    )
+  }
+  return fallback
 }

 function resolveRendererIndex() {
  const candidates = [path.join(APP_ROOT, 'dist', 'index.html'), path.join(resolveWebDist(), 'index.html')]
-  return candidates.find(fileExists) || candidates[0]
+  const found = candidates.find(fileExists)
+  if (found) return found
+  // Nothing on disk. A packaged build with no renderer bundle blank-pages with
+  // a bare ERR_FILE_NOT_FOUND and no clue why (see #39484). Surface the cause
+  // and the fix before Electron loads the missing file.
+  rememberLog(
+    `[renderer] index.html not found — the desktop app was packaged without a ` +
+      `renderer bundle. Tried: ${candidates.join(', ')}. ` +
+      `Rebuild with: hermes desktop --force-build`
+  )
+  return candidates[0]
 }

 function resolveHermesCwd() {
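Review note on the `app\.asar(?=$|[\\/])` test in `resolveWebDist`: the lookahead accepts `app.asar` only when it ends the path or is followed by a path separator, so an `app.asar.unpacked` directory does not trigger the warning. A minimal sketch of that behaviour follows; the helper name `isAsarInternal` and the sample paths are illustrative assumptions, not taken from the codebase.

```javascript
// Same pattern as in resolveWebDist. The helper name is hypothetical.
const ASAR_SEGMENT = /app\.asar(?=$|[\\/])/

function isAsarInternal(candidatePath) {
  // No `g` flag, so test() carries no state between calls.
  return ASAR_SEGMENT.test(candidatePath)
}

// Made-up install locations, used only to exercise the pattern.
const samplePaths = [
  '/opt/Hermes/resources/app.asar/dist',
  '/opt/Hermes/resources/app.asar',
  'C:\\Hermes\\resources\\app.asar\\dist',
  '/opt/Hermes/resources/app.asar.unpacked/dist'
]

for (const samplePath of samplePaths) {
  console.log(isAsarInternal(samplePath), samplePath)
}
```

Only the last sample is rejected, because the character after `app.asar` is a `.` rather than a separator or the end of the string.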
@ -3137,7 +3161,7 @@ function buildApplicationMenu() {
        label: 'Actual Size',
        accelerator: 'CommandOrControl+0',
        click: () => {
-          if (mainWindow && !mainWindow.isDestroyed()) mainWindow.webContents.setZoomLevel(0)
+          setAndPersistZoomLevel(mainWindow, 0)
        }
      },
      {
@ -3145,8 +3169,7 @@ function buildApplicationMenu() {
        accelerator: 'CommandOrControl+Plus',
        click: () => {
          if (mainWindow && !mainWindow.isDestroyed()) {
-            const next = Math.min(mainWindow.webContents.getZoomLevel() + 0.1, 9)
-            mainWindow.webContents.setZoomLevel(next)
+            setAndPersistZoomLevel(mainWindow, mainWindow.webContents.getZoomLevel() + 0.1)
          }
        }
      },
@ -3155,8 +3178,7 @@ function buildApplicationMenu() {
        accelerator: 'CommandOrControl+-',
        click: () => {
          if (mainWindow && !mainWindow.isDestroyed()) {
-            const next = Math.max(mainWindow.webContents.getZoomLevel() - 0.1, -9)
-            mainWindow.webContents.setZoomLevel(next)
+            setAndPersistZoomLevel(mainWindow, mainWindow.webContents.getZoomLevel() - 0.1)
          }
        }
      },
@ -3218,6 +3240,38 @@ function installPreviewShortcut(window) {
  })
 }

+// Zoom level is persisted in the renderer's own localStorage (per-origin,
+// survives reloads/restarts) rather than a main-process JSON file. The main
+// process owns setZoomLevel, so we mirror each change into localStorage and
+// read it back on did-finish-load to re-apply after reloads or crash recovery.
+const ZOOM_STORAGE_KEY = 'hermes:desktop:zoomLevel'
+
+function clampZoomLevel(value) {
+  if (!Number.isFinite(value)) return 0
+  return Math.min(Math.max(value, -9), 9)
+}
+
+function setAndPersistZoomLevel(window, zoomLevel) {
+  if (!window || window.isDestroyed()) return
+  const next = clampZoomLevel(zoomLevel)
+  window.webContents.setZoomLevel(next)
+  window.webContents
+    .executeJavaScript(`try { localStorage.setItem(${JSON.stringify(ZOOM_STORAGE_KEY)}, ${JSON.stringify(String(next))}) } catch {}`)
+    .catch(error => rememberLog(`[zoom] persist failed: ${error?.message || error}`))
+}
+
+function restorePersistedZoomLevel(window) {
+  if (!window || window.isDestroyed()) return
+  window.webContents
+    .executeJavaScript(`(() => { try { return localStorage.getItem(${JSON.stringify(ZOOM_STORAGE_KEY)}) } catch { return null } })()`)
+    .then(stored => {
+      if (stored == null || !window || window.isDestroyed()) return
+      const level = clampZoomLevel(Number(stored))
+      window.webContents.setZoomLevel(level)
+    })
+    .catch(error => rememberLog(`[zoom] restore failed: ${error?.message || error}`))
+}
+
 function installZoomShortcuts(window) {
  // Override Ctrl/Cmd + +/-/0 with half the default zoom step (0.1 vs 0.2).
  // The menu items handle this on macOS (where the menu is always present),
@ -3231,15 +3285,13 @@ function installZoomShortcuts(window) {
    const key = input.key
    if (key === '0') {
      event.preventDefault()
-      window.webContents.setZoomLevel(0)
+      setAndPersistZoomLevel(window, 0)
    } else if (key === '=' || key === '+') {
      event.preventDefault()
-      const next = Math.min(window.webContents.getZoomLevel() + ZOOM_STEP, 9)
-      window.webContents.setZoomLevel(next)
+      setAndPersistZoomLevel(window, window.webContents.getZoomLevel() + ZOOM_STEP)
    } else if (key === '-') {
      event.preventDefault()
-      const next = Math.max(window.webContents.getZoomLevel() - ZOOM_STEP, -9)
-      window.webContents.setZoomLevel(next)
+      setAndPersistZoomLevel(window, window.webContents.getZoomLevel() - ZOOM_STEP)
    }
  })
 }
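Review note on the zoom persistence round trip: the level is clamped, stored as a string, and converted back with `Number(...)` on restore. The sketch below mirrors that flow without Electron. `fakeStorage`, `persistZoom` and `restoreZoom` are hypothetical stand-ins for `localStorage` and the two `executeJavaScript` calls; only `clampZoomLevel` and the storage key are copied from the diff.

```javascript
const ZOOM_KEY = 'hermes:desktop:zoomLevel'

// Copied from the diff: non-finite input falls back to 0, the rest is clamped to [-9, 9].
function clampZoomLevel(value) {
  if (!Number.isFinite(value)) return 0
  return Math.min(Math.max(value, -9), 9)
}

// Hypothetical in-memory replacement for the renderer's localStorage.
const fakeStorage = new Map()

function persistZoom(level) {
  const next = clampZoomLevel(level)
  fakeStorage.set(ZOOM_KEY, String(next))
  return next
}

function restoreZoom() {
  const stored = fakeStorage.has(ZOOM_KEY) ? fakeStorage.get(ZOOM_KEY) : null
  // Number(null) is 0, so a missing key has to be handled before converting.
  if (stored == null) return null
  return clampZoomLevel(Number(stored))
}

console.log(restoreZoom())   // nothing stored yet
console.log(persistZoom(12)) // clamped before it is stored
console.log(restoreZoom())
```

The explicit `stored == null` check matters: without it a first launch would be indistinguishable from a persisted zoom level of `0`. A corrupted value such as `'garbage'` converts to `NaN` and is reset to `0` by the clamp.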
@ -3847,10 +3899,12 @@ async function sanitizeDesktopConnectionConfig(config = readDesktopConnectionCon
  const scoped = key ? config.profiles?.[key] || null : null
  const block = key ? scoped || {} : config.remote || {}

+  const envOverride = key ? false : Boolean(process.env.HERMES_DESKTOP_REMOTE_URL)
+
  const remoteToken = decryptDesktopSecret(block.token)
  const authMode = normAuthMode(block.authMode)
-  const remoteUrl = String(block.url || '')
-  const mode = (key ? scoped?.mode : config.mode) === 'remote' ? 'remote' : 'local'
+  const remoteUrl = envOverride ? String(process.env.HERMES_DESKTOP_REMOTE_URL || '') : String(block.url || '')
+  const mode = envOverride || (key ? scoped?.mode : config.mode) === 'remote' ? 'remote' : 'local'

  let remoteOauthConnected = false
  if (authMode === 'oauth' && remoteUrl) {
@ -3876,7 +3930,7 @@ async function sanitizeDesktopConnectionConfig(config = readDesktopConnectionCon
    remoteTokenSet: Boolean(remoteToken),
    // The env override only forces the global/primary connection; a per-profile
    // scope is never overridden by HERMES_DESKTOP_REMOTE_URL.
-    envOverride: key ? false : Boolean(process.env.HERMES_DESKTOP_REMOTE_URL)
+    envOverride
  }
 }
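Review note on the new `mode` expression: it mixes `||`, `===` and `?:` in a single line. In JavaScript `===` binds tighter than `||`, which binds tighter than the conditional operator, so the whole `envOverride || (...) === 'remote'` part is the condition. The sketch below checks this for every input combination; both function names are hypothetical and `configuredMode` stands in for `key ? scoped?.mode : config.mode`.

```javascript
// Hypothetical helper with the grouping spelled out.
function resolveModeExplicit(envOverride, configuredMode) {
  return (envOverride || configuredMode === 'remote') ? 'remote' : 'local'
}

// Same shape as the expression in the diff, relying on operator precedence.
function resolveModeAsWritten(envOverride, configuredMode) {
  return envOverride || configuredMode === 'remote' ? 'remote' : 'local'
}

for (const envOverride of [true, false]) {
  for (const configuredMode of ['remote', 'local', undefined]) {
    console.log(
      envOverride,
      configuredMode,
      resolveModeExplicit(envOverride, configuredMode),
      resolveModeAsWritten(envOverride, configuredMode)
    )
  }
}
```

Both forms agree for all six combinations, so the env override forces `'remote'` even when the stored mode is `'local'`.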

@ -4614,7 +4668,7 @@ function createWindow() {
  mainWindow = new BrowserWindow({
    width: 1220,
    height: 800,
-    minWidth: 900,
+    minWidth: 400,
    minHeight: 620,
    title: 'Hermes',
    // Frameless title bar on every platform so the renderer can paint the
@ -4730,6 +4784,7 @@ function createWindow() {
  }

  mainWindow.webContents.once('did-finish-load', () => {
+    restorePersistedZoomLevel(mainWindow)
    broadcastBootProgress()
    sendWindowStateChanged()
    startHermes().catch(error => rememberLog(error.stack || error.message))
@ -4737,6 +4792,45 @@ function createWindow() {
 }

 ipcMain.handle('hermes:connection', async (_event, profile) => ensureBackend(profile))
+// Reconnect-after-wake recovery. A REMOTE primary backend has no child process,
+// so the 'exit'/'error' handlers that would clear a dead connectionPromise never
+// fire — once the remote becomes unreachable across a sleep/wake the renderer
+// re-dials the same dead descriptor forever and the composer stays stuck on
+// "Starting Hermes…". Before the renderer's backoff loop reconnects, it asks us
+// to confirm the cached PRIMARY backend is still reachable; if a remote one is
+// not, we drop the cache so the next getConnection() rebuilds it. Local backends
+// self-heal via their child 'exit' handler, so we never touch them here.
+ipcMain.handle('hermes:connection:revalidate', async () => {
+  if (!connectionPromise) {
+    return { ok: true, rebuilt: false }
+  }
+
+  let conn = null
+  try {
+    conn = await connectionPromise
+  } catch {
+    // The cached boot already rejected (its own catch nulls connectionPromise);
+    // nothing to revalidate — the next getConnection() builds fresh.
+    return { ok: true, rebuilt: false }
+  }
+
+  if (!conn || conn.mode !== 'remote' || !conn.baseUrl) {
+    return { ok: true, rebuilt: false }
+  }
+
+  const base = conn.baseUrl.replace(/\/+$/, '')
+  try {
+    await fetchPublicJson(`${base}/api/status`, { timeoutMs: 2_500 })
+    return { ok: true, rebuilt: false }
+  } catch {
+    // Unreachable remote: drop the stale cache so the renderer's next reconnect
+    // tick rebuilds a fresh, reachable descriptor. resetHermesConnection only
+    // nulls connectionPromise for a remote (no child to SIGTERM).
+    rememberLog('Cached remote Hermes backend failed liveness probe; dropping stale connection.')
+    resetHermesConnection()
+    return { ok: true, rebuilt: true }
+  }
+})
 ipcMain.handle('hermes:backend:touch', async (_event, profile) => {
  touchPoolBackend(profile)
  return { ok: true }
--- a/apps/desktop/electron/preload.cjs
+++ b/apps/desktop/electron/preload.cjs
@ -2,6 +2,7 @@ const { contextBridge, ipcRenderer, webUtils } = require('electron')

 contextBridge.exposeInMainWorld('hermesDesktop', {
  getConnection: profile => ipcRenderer.invoke('hermes:connection', profile),
+  revalidateConnection: () => ipcRenderer.invoke('hermes:connection:revalidate'),
  touchBackend: profile => ipcRenderer.invoke('hermes:backend:touch', profile),
  getGatewayWsUrl: profile => ipcRenderer.invoke('hermes:gateway:ws-url', profile),
  getBootProgress: () => ipcRenderer.invoke('hermes:boot-progress:get'),
--- a/apps/desktop/package.json
+++ b/apps/desktop/package.json
@ -18,7 +18,7 @@
    "profile:main": "wait-on http://127.0.0.1:5174 && cross-env XCURSOR_SIZE=24 HERMES_DESKTOP_DEV_SERVER=http://127.0.0.1:5174 electron --inspect=9229 .",
    "profile:main:cpu": "wait-on http://127.0.0.1:5174 && cross-env XCURSOR_SIZE=24 NODE_OPTIONS=--cpu-prof HERMES_DESKTOP_DEV_SERVER=http://127.0.0.1:5174 electron .",
    "start": "npm run build && electron .",
-    "build": "node scripts/assert-root-install.cjs && node scripts/write-build-stamp.cjs && node scripts/stage-native-deps.cjs && tsc -b && vite build",
+    "build": "node scripts/assert-root-install.cjs && node scripts/write-build-stamp.cjs && node scripts/stage-native-deps.cjs && tsc -b && vite build && node scripts/assert-dist-built.cjs",
    "builder": "cross-env NODE_OPTIONS=--max-old-space-size=16384 electron-builder",
    "pack": "npm run build && npm run builder -- --dir",
    "dist": "npm run build && npm run builder",
@ -166,7 +166,8 @@
    "afterSign": "scripts/notarize.cjs",
    "asarUnpack": [
      "**/*.node",
-      "**/prebuilds/**"
+      "**/prebuilds/**",
+      "dist/**"
    ],
    "mac": {
      "category": "public.app-category.developer-tools",
--- a/apps/desktop/pr-assets/session-source-folders.png
+++ b/apps/desktop/pr-assets/session-source-folders.png
--- a/apps/desktop/scripts/assert-dist-built.cjs
+++ b/apps/desktop/scripts/assert-dist-built.cjs
@ -0,0 +1,70 @@
+"use strict"
+
+// Build-time guard: refuse to hand a half-built renderer to electron-builder.
+//
+// `npm run pack` / `npm run dist*` are `npm run build && npm run builder`.
+// If the `build` step (tsc -b && vite build) fails but packaging proceeds
+// anyway — a stale checkout that fails typecheck, an interrupted vite build,
+// or npm not short-circuiting `&&` in some shells — electron-builder happily
+// packages an app with an empty or missing `dist/`. The result launches but
+// blank-pages with `ERR_FILE_NOT_FOUND` for dist/index.html, with no clue why.
+//
+// This runs at the tail of `build`, after vite build, so any packaging path
+// inherits it. It fails loud and early instead of shipping a broken bundle.
+// See issues #39484 (renderer blank page) and #41327 / #39472 (dashboard 404).
+
+const fs = require("fs")
+const path = require("path")
+
+// Pure check — returns { ok: true } or { ok: false, error: "..." }.
+// Kept side-effect-free so it can be unit tested without spawning a process.
+function checkDistBuilt(distDir) {
+  if (!fs.existsSync(distDir) || !fs.statSync(distDir).isDirectory()) {
+    return { ok: false, error: `no dist directory at ${distDir}` }
+  }
+
+  const indexHtml = path.join(distDir, "index.html")
+  if (!fs.existsSync(indexHtml) || !fs.statSync(indexHtml).isFile()) {
+    return { ok: false, error: `dist/index.html is missing at ${indexHtml}` }
+  }
+  if (fs.statSync(indexHtml).size === 0) {
+    return { ok: false, error: `dist/index.html is empty at ${indexHtml}` }
+  }
+
+  // index.html alone isn't enough — vite emits hashed JS into dist/assets.
+  // An index.html with no script bundle still blank-pages.
+  const assetsDir = path.join(distDir, "assets")
+  const hasAssets =
+    fs.existsSync(assetsDir) &&
+    fs.statSync(assetsDir).isDirectory() &&
+    fs.readdirSync(assetsDir).some(name => name.endsWith(".js"))
+  if (!hasAssets) {
+    return { ok: false, error: `dist/assets has no built JS bundle (expected vite output under ${assetsDir})` }
+  }
+
+  return { ok: true }
+}
+
+function main() {
+  const desktopRoot = path.resolve(__dirname, "..")
+  const distDir = path.join(desktopRoot, "dist")
+  const result = checkDistBuilt(distDir)
+
+  if (!result.ok) {
+    console.error(`\n✗ assert-dist-built: ${result.error}`)
+    console.error("  The renderer bundle is missing or incomplete, so packaging")
+    console.error("  would produce an app that launches to a blank page.")
+    console.error("  Re-run the build and check the tsc/vite output above for the")
+    console.error("  real failure, then package again:")
+    console.error(`    cd ${desktopRoot} && npm run build\n`)
+    process.exit(1)
+  }
+
+  console.log("✓ assert-dist-built: dist/index.html + assets present")
+}
+
+if (require.main === module) {
+  main()
+}
+
+module.exports = { checkDistBuilt }
--- a/apps/desktop/scripts/assert-dist-built.test.cjs
+++ b/apps/desktop/scripts/assert-dist-built.test.cjs
@ -0,0 +1,84 @@
+const assert = require('node:assert/strict')
+const fs = require('node:fs')
+const os = require('node:os')
+const path = require('node:path')
+const test = require('node:test')
+
+const { checkDistBuilt } = require('./assert-dist-built.cjs')
+
+function makeDist(extra) {
+  const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-assert-dist-'))
+  const distDir = path.join(tempRoot, 'dist')
+  fs.mkdirSync(distDir, { recursive: true })
+  if (extra) extra(distDir)
+  return { tempRoot, distDir }
+}
+
+test('checkDistBuilt passes when index.html + an assets JS bundle exist', () => {
+  const { tempRoot, distDir } = makeDist(d => {
+    fs.writeFileSync(path.join(d, 'index.html'), '<!doctype html><div id=root></div>', 'utf8')
+    fs.mkdirSync(path.join(d, 'assets'))
+    fs.writeFileSync(path.join(d, 'assets', 'index-abc123.js'), 'console.log(1)', 'utf8')
+  })
+  try {
+    assert.deepEqual(checkDistBuilt(distDir), { ok: true })
+  } finally {
+    fs.rmSync(tempRoot, { recursive: true, force: true })
+  }
+})
+
+test('checkDistBuilt fails when the dist directory is absent', () => {
+  const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-assert-dist-'))
+  try {
+    const result = checkDistBuilt(path.join(tempRoot, 'dist'))
+    assert.equal(result.ok, false)
+    assert.match(result.error, /no dist directory/)
+  } finally {
+    fs.rmSync(tempRoot, { recursive: true, force: true })
+  }
+})
+
+test('checkDistBuilt fails when index.html is missing', () => {
+  const { tempRoot, distDir } = makeDist(d => {
+    fs.mkdirSync(path.join(d, 'assets'))
+    fs.writeFileSync(path.join(d, 'assets', 'index-abc123.js'), 'console.log(1)', 'utf8')
+  })
+  try {
+    const result = checkDistBuilt(distDir)
+    assert.equal(result.ok, false)
+    assert.match(result.error, /index\.html is missing/)
+  } finally {
+    fs.rmSync(tempRoot, { recursive: true, force: true })
+  }
+})
+
+test('checkDistBuilt fails when index.html is empty', () => {
+  const { tempRoot, distDir } = makeDist(d => {
+    fs.writeFileSync(path.join(d, 'index.html'), '', 'utf8')
+    fs.mkdirSync(path.join(d, 'assets'))
+    fs.writeFileSync(path.join(d, 'assets', 'index-abc123.js'), 'console.log(1)', 'utf8')
+  })
+  try {
+    const result = checkDistBuilt(distDir)
+    assert.equal(result.ok, false)
+    assert.match(result.error, /index\.html is empty/)
+  } finally {
+    fs.rmSync(tempRoot, { recursive: true, force: true })
+  }
+})
+
+test('checkDistBuilt fails when assets/ has no JS bundle', () => {
+  const { tempRoot, distDir } = makeDist(d => {
+    fs.writeFileSync(path.join(d, 'index.html'), '<!doctype html>', 'utf8')
+    fs.mkdirSync(path.join(d, 'assets'))
+    // CSS only, no JS — still a blank page at runtime.
+    fs.writeFileSync(path.join(d, 'assets', 'index-abc123.css'), 'body{}', 'utf8')
+  })
+  try {
+    const result = checkDistBuilt(distDir)
+    assert.equal(result.ok, false)
+    assert.match(result.error, /no built JS bundle/)
+  } finally {
+    fs.rmSync(tempRoot, { recursive: true, force: true })
+  }
+})
--- a/apps/desktop/src/app/chat/index.tsx
+++ b/apps/desktop/src/app/chat/index.tsx
@ -124,7 +124,10 @@ function ChatHeader({

  return (
    <header className={cn(titlebarHeaderBaseClass, isRoutedSessionView && titlebarHeaderShadowClass)}>
-      <div className="min-w-0 flex-1">
+      <div
+        className="min-w-0 flex-1"
+        style={{ maxWidth: 'calc(100vw - var(--titlebar-content-inset,0px) - var(--titlebar-tools-right) - var(--titlebar-tools-width) - 1.5rem)' }}
+      >
        <SessionActionsMenu
          align="start"
          onDelete={selectedSessionId ? onDeleteSelectedSession : undefined}
@ -135,11 +138,11 @@ function ChatHeader({
          title={title}
        >
          <Button
-            className="pointer-events-auto h-6 min-w-0 gap-1 border border-transparent bg-transparent px-2 py-0 text-(--ui-text-secondary) hover:border-(--ui-stroke-tertiary) hover:bg-(--ui-control-hover-background) hover:text-foreground data-[state=open]:border-(--ui-stroke-tertiary) data-[state=open]:bg-(--ui-control-active-background) [-webkit-app-region:no-drag]"
+            className="pointer-events-auto flex h-6 min-w-0 max-w-full gap-1 border border-transparent bg-transparent px-2 py-0 text-(--ui-text-secondary) hover:border-(--ui-stroke-tertiary) hover:bg-(--ui-control-hover-background) hover:text-foreground data-[state=open]:border-(--ui-stroke-tertiary) data-[state=open]:bg-(--ui-control-active-background) [-webkit-app-region:no-drag]"
            type="button"
            variant="ghost"
          >
-            <h2 className="max-w-[52vw] truncate text-[0.75rem] font-medium leading-none">{title}</h2>
+            <h2 className="min-w-0 flex-1 truncate text-[0.75rem] font-medium leading-none">{title}</h2>
            <Codicon className="shrink-0 text-(--ui-text-tertiary)" name="chevron-down" size="0.8125rem" />
          </Button>
        </SessionActionsMenu>
--- a/apps/desktop/src/app/chat/sidebar/index.tsx
+++ b/apps/desktop/src/app/chat/sidebar/index.tsx
@ -19,6 +19,7 @@ import { useStore } from '@nanostores/react'
 import type * as React from 'react'
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'

+import { PlatformAvatar } from '@/app/messaging/platform-icon'
 import { Button } from '@/components/ui/button'
 import { Codicon } from '@/components/ui/codicon'
 import { DisclosureCaret } from '@/components/ui/disclosure-caret'
@ -39,6 +40,7 @@ import { searchSessions, type SessionInfo, type SessionSearchResult } from '@/he
 import { useI18n } from '@/i18n'
 import { profileColor } from '@/lib/profile-color'
 import { sessionMatchesSearch } from '@/lib/session-search'
+import { normalizeSessionSource, sessionSourceLabel } from '@/lib/session-source'
 import { cn } from '@/lib/utils'
 import { $cronJobs } from '@/store/cron'
 import {
@ -47,8 +49,11 @@ import {
  $sidebarAgentsGrouped,
  $sidebarCronOpen,
  $sidebarOpen,
+  $sidebarOverlayMounted,
  $sidebarPinsOpen,
  $sidebarRecentsOpen,
+  $sidebarSessionOrderIds,
+  $sidebarWorkspaceOrderIds,
  pinSession,
  reorderPinnedSession,
  SESSION_SEARCH_FOCUS_EVENT,
@ -56,6 +61,8 @@ import {
  setSidebarCronOpen,
  setSidebarPinsOpen,
  setSidebarRecentsOpen,
+  setSidebarSessionOrderIds,
+  setSidebarWorkspaceOrderIds,
  SIDEBAR_SESSIONS_PAGE_SIZE,
  unpinSession
 } from '@/store/layout'
@ -116,10 +123,14 @@ const WORKSPACE_PAGE = 5
 // ALL-profiles view: show only the latest N per profile up front to keep the
 // unified list scannable, then reveal/fetch more in N-sized steps on demand.
 const PROFILE_INITIAL_PAGE = 5
-const WS_ID_PREFIX = 'workspace:'
+const GROUP_DND_ID_PREFIX = 'group:'
+const LOCAL_SESSION_SOURCES = new Set(['cli', 'desktop', 'local', 'tui'])
+
+const groupDndId = (id: string) => `${GROUP_DND_ID_PREFIX}${id}`
+
+const parseGroupDndId = (id: string) =>
+  id.startsWith(GROUP_DND_ID_PREFIX) ? id.slice(GROUP_DND_ID_PREFIX.length) : null

-const wsId = (id: string) => `${WS_ID_PREFIX}${id}`
-const parseWsId = (id: string) => (id.startsWith(WS_ID_PREFIX) ? id.slice(WS_ID_PREFIX.length) : null)
 const countLabel = (loaded: number, total: number) => (total > loaded ? `${loaded}/${total}` : String(loaded))
 const sessionTime = (s: SessionInfo) => s.last_active || s.started_at || 0
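Review note on the renamed drag-and-drop id helpers: `groupDndId` and `parseGroupDndId` are inverses, and `parseGroupDndId` returns `null` for any id without the `group:` prefix, which is how the drag-end handler tells group rows from session rows. A plain JavaScript sketch with the TypeScript annotations removed follows; the ids `source:telegram` and `session-123` are made-up examples.

```javascript
const GROUP_DND_ID_PREFIX = 'group:'

const groupDndId = id => `${GROUP_DND_ID_PREFIX}${id}`

const parseGroupDndId = id =>
  id.startsWith(GROUP_DND_ID_PREFIX) ? id.slice(GROUP_DND_ID_PREFIX.length) : null

// Illustrative ids only.
console.log(groupDndId('source:telegram'))
console.log(parseGroupDndId(groupDndId('source:telegram')))
console.log(parseGroupDndId('session-123'))
```

One consequence of prefix-based tagging is that a session id that itself begins with `group:` would be read as a group. Whether such ids can occur is not visible in this diff.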

@ -150,6 +161,33 @@ function orderByIds<T>(items: T[], getId: (item: T) => string, orderIds: string[
  return out
 }

+function reconcileOrderIds(currentIds: string[], orderIds: string[]): string[] {
+  if (!currentIds.length) {
+    return []
+  }
+
+  if (!orderIds.length) {
+    return currentIds
+  }
+
+  const current = new Set(currentIds)
+  const next = orderIds.filter(id => current.has(id))
+  const known = new Set(next)
+
+  for (const id of currentIds) {
+    if (!known.has(id)) {
+      next.push(id)
+      known.add(id)
+    }
+  }
+
+  return next
+}
+
+function sameIds(left: string[], right: string[]) {
+  return left.length === right.length && left.every((item, index) => item === right[index])
+}
+
 const baseName = (path: string) =>
  path
    .replace(/[/\\]+$/, '')
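Review note on `reconcileOrderIds`: it keeps the persisted order for ids that still exist, drops ids that have disappeared, and appends ids that are new. Below is a plain JavaScript port for illustration, with the type annotations removed and made-up ids.

```javascript
function reconcileOrderIds(currentIds, orderIds) {
  if (!currentIds.length) {
    return []
  }

  if (!orderIds.length) {
    return currentIds
  }

  const current = new Set(currentIds)
  // Keep the stored order, but only for ids that still exist.
  const next = orderIds.filter(id => current.has(id))
  const known = new Set(next)

  // Append ids that the stored order has never seen, in their current order.
  for (const id of currentIds) {
    if (!known.has(id)) {
      next.push(id)
      known.add(id)
    }
  }

  return next
}

// 'x' was deleted since the order was saved; 'b' is new.
console.log(reconcileOrderIds(['a', 'b', 'c'], ['c', 'x', 'a']))
```

With these inputs the stored order contributes `c` and `a`, the stale `x` is dropped, and the unseen `b` is appended, giving `['c', 'a', 'b']`. The `sameIds` comparison in the effect then decides whether the store needs to be updated at all.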
@ -183,7 +221,11 @@ function searchResultToSession(result: SessionSearchResult): SessionInfo {
  }
 }

-function workspaceGroupsFor(sessions: SessionInfo[], noWorkspaceLabel: string): SidebarSessionGroup[] {
+function workspaceGroupsFor(
+  sessions: SessionInfo[],
+  noWorkspaceLabel: string,
+  options: { preserveSessionOrder?: boolean } = {}
+): SidebarSessionGroup[] {
  const groups = new Map<string, SidebarSessionGroup>()

  for (const session of sessions) {
@ -196,17 +238,56 @@ function workspaceGroupsFor(sessions: SessionInfo[], noWorkspaceLabel: string):
    groups.set(id, group)
  }

-  // Groups keep recency order (Map insertion = first-seen in the recency-sorted
-  // input, so an active project floats up), but rows *within* a group sort by
-  // creation time so they don't reshuffle every time a message lands — keeps
-  // muscle memory intact.
-  for (const group of groups.values()) {
-    group.sessions.sort((a, b) => b.started_at - a.started_at)
+  if (!options.preserveSessionOrder) {
+    // Groups keep recency order (Map insertion = first-seen in the recency-sorted
+    // input, so an active project floats up), but rows *within* a group sort by
+    // creation time so they don't reshuffle every time a message lands — keeps
+    // muscle memory intact.
+    for (const group of groups.values()) {
+      group.sessions.sort((a, b) => b.started_at - a.started_at)
+    }
  }

  return [...groups.values()]
 }

+function sourceSessionGroupsFor(sessions: SessionInfo[]): {
+  localSessions: SessionInfo[]
+  sourceGroups: SidebarSessionGroup[]
+} {
+  const groups = new Map<string, SidebarSessionGroup>()
+  const localSessions: SessionInfo[] = []
+
+  for (const session of sessions) {
+    const sourceId = normalizeSessionSource(session.source)
+
+    if (!sourceId || LOCAL_SESSION_SOURCES.has(sourceId)) {
+      localSessions.push(session)
+
+      continue
+    }
+
+    const label = sessionSourceLabel(sourceId) ?? sourceId
+
+    const group = groups.get(sourceId) ?? {
+      id: `source:${sourceId}`,
+      label,
+      mode: 'source',
+      path: null,
+      sessions: [],
+      sourceId
+    }
+
+    group.sessions.push(session)
+    groups.set(sourceId, group)
+  }
+
+  return {
+    localSessions,
+    sourceGroups: [...groups.values()].sort((a, b) => sessionTime(b.sessions[0]) - sessionTime(a.sessions[0]))
+  }
+}
+
 function useSortableBindings(id: string) {
  const { attributes, isDragging, listeners, setNodeRef, transform, transition } = useSortable({ id })

@ -215,7 +296,11 @@ function useSortableBindings(id: string) {
    dragHandleProps: { ...attributes, ...listeners },
    ref: setNodeRef,
    reorderable: true as const,
-    style: { transform: CSS.Transform.toString(transform), transition }
+    style: {
+      transform: CSS.Transform.toString(transform),
+      transition: isDragging ? undefined : transition,
+      willChange: isDragging ? 'transform' : undefined
+    }
  }
 }

@ -247,6 +332,9 @@ export function ChatSidebar({
  const { t } = useI18n()
  const s = t.sidebar
  const sidebarOpen = useStore($sidebarOpen)
+  // Collapsed-but-overlay-mounted → render the full sidebar, not just the nav rail.
+  const overlayMounted = useStore($sidebarOverlayMounted)
+  const contentVisible = sidebarOpen || overlayMounted
  const panesFlipped = useStore($panesFlipped)
  const agentsGrouped = useStore($sidebarAgentsGrouped)
  const pinnedSessionIds = useStore($pinnedSessionIds)
@ -270,8 +358,8 @@ export function ChatSidebar({
  // profile while scope is still ALL (persisted), the rail is hidden and they'd
  // otherwise be stuck in the grouped view with no way out.
  const showAllProfiles = multiProfile && profileScope === ALL_PROFILES
-  const [agentOrderIds, setAgentOrderIds] = useState<string[]>([])
-  const [workspaceOrderIds, setWorkspaceOrderIds] = useState<string[]>([])
+  const agentOrderIds = useStore($sidebarSessionOrderIds)
+  const workspaceOrderIds = useStore($sidebarWorkspaceOrderIds)
  const [searchQuery, setSearchQuery] = useState('')
  const [serverMatches, setServerMatches] = useState<SessionSearchResult[]>([])
  const [newSessionKbdFlash, setNewSessionKbdFlash] = useState(false)
@ -425,14 +513,40 @@ export function ChatSidebar({
    [sortedSessions, pinnedRealIdSet]
  )

+  useEffect(() => {
+    const next = reconcileOrderIds(
+      unpinnedAgentSessions.map(s => s.id),
+      agentOrderIds
+    )
+
+    if (!sameIds(next, agentOrderIds)) {
+      setSidebarSessionOrderIds(next)
+    }
+  }, [agentOrderIds, unpinnedAgentSessions])
+
  const agentSessions = useMemo(
    () => orderByIds(unpinnedAgentSessions, s => s.id, agentOrderIds),
    [unpinnedAgentSessions, agentOrderIds]
  )

+  const { localSessions: localAgentSessions, sourceGroups } = useMemo(
+    () => sourceSessionGroupsFor(agentSessions),
+    [agentSessions]
+  )
+
+  const orderedSourceGroups = useMemo(
+    () => orderByIds(sourceGroups, g => g.id, workspaceOrderIds),
+    [sourceGroups, workspaceOrderIds]
+  )
+
  const agentGroups = useMemo(
-    () => orderByIds(workspaceGroupsFor(agentSessions, s.noWorkspace), g => g.id, workspaceOrderIds),
-    [agentSessions, s.noWorkspace, workspaceOrderIds]
+    () =>
+      orderByIds(
+        workspaceGroupsFor(localAgentSessions, s.noWorkspace, { preserveSessionOrder: sourceGroups.length > 0 }),
+        g => g.id,
+        workspaceOrderIds
+      ),
+    [localAgentSessions, s.noWorkspace, sourceGroups.length, workspaceOrderIds]
  )

  const loadMoreForProfileGroup = useCallback(
@ -445,9 +559,7 @@ export function ChatSidebar({

      void Promise.resolve(onLoadMoreProfileSessions(profile))
        .catch(() => undefined)
-        .finally(() =>
-          setProfileLoadMorePending(({ [profile]: _done, ...rest }) => rest)
-        )
+        .finally(() => setProfileLoadMorePending(({ [profile]: _done, ...rest }) => rest))
    },
    [onLoadMoreProfileSessions]
  )
@ -478,15 +590,17 @@ export function ChatSidebar({
      groups.set(key, group)
    }

-    return [...groups.values()]
-      .map(group => ({
-        ...group,
-        loadingMore: Boolean(profileLoadMorePending[group.id]),
-        onLoadMore: onLoadMoreProfileSessions ? () => loadMoreForProfileGroup(group.id) : undefined,
-        totalCount: Math.max(group.sessions.length, sessionProfileTotals[group.id] ?? 0)
-      }))
-      // default (root) first, then the rest alphabetically.
-      .sort((a, b) => (a.id === 'default' ? -1 : b.id === 'default' ? 1 : a.label.localeCompare(b.label)))
+    return (
+      [...groups.values()]
+        .map(group => ({
+          ...group,
+          loadingMore: Boolean(profileLoadMorePending[group.id]),
+          onLoadMore: onLoadMoreProfileSessions ? () => loadMoreForProfileGroup(group.id) : undefined,
+          totalCount: Math.max(group.sessions.length, sessionProfileTotals[group.id] ?? 0)
+        }))
+        // default (root) first, then the rest alphabetically.
+        .sort((a, b) => (a.id === 'default' ? -1 : b.id === 'default' ? 1 : a.label.localeCompare(b.label)))
+    )
  }, [
    showAllProfiles,
    agentSessions,
@ -496,6 +610,53 @@ export function ChatSidebar({
    sessionProfileTotals
  ])

+  const displayAgentSessions = sourceGroups.length ? localAgentSessions : agentSessions
+
+  const displayAgentGroups = useMemo(() => {
+    if (orderedSourceGroups.length) {
+      const localGroups = agentsGrouped
+        ? agentGroups
+        : localAgentSessions.length
+          ? [
+              {
+                id: 'local-sessions',
+                label: 'Local',
+                mode: 'workspace' as const,
+                path: null,
+                sessions: localAgentSessions
+              }
+            ]
+          : []
+
+      return orderByIds([...orderedSourceGroups, ...localGroups], g => g.id, workspaceOrderIds)
+    }
+
+    return showAllProfiles ? profileGroups : agentsGrouped ? agentGroups : undefined
+  }, [
+    agentGroups,
+    agentsGrouped,
+    localAgentSessions,
+    orderedSourceGroups,
+    profileGroups,
+    showAllProfiles,
+    workspaceOrderIds
+  ])
+
+  useEffect(() => {
+    if (!displayAgentGroups?.length || showAllProfiles) {
+      return
+    }
+
+    const next = reconcileOrderIds(
+      displayAgentGroups.map(g => g.id),
+      workspaceOrderIds
+    )
+
+    if (!sameIds(next, workspaceOrderIds)) {
+      setSidebarWorkspaceOrderIds(next)
+    }
+  }, [displayAgentGroups, showAllProfiles, workspaceOrderIds])
+
  const showSessionSkeletons = sessionsLoading && sortedSessions.length === 0

  const showSessionSections = showSessionSkeletons || sortedSessions.length > 0
@ -543,23 +704,24 @@ export function ChatSidebar({

    const activeId = String(active.id)
    const overId = String(over.id)
-    const activeWs = parseWsId(activeId)
-    const overWs = parseWsId(overId)
+    const activeGroup = parseGroupDndId(activeId)
+    const overGroup = parseGroupDndId(overId)

-    if (activeWs && overWs) {
-      const oldIdx = agentGroups.findIndex(g => g.id === activeWs)
-      const newIdx = agentGroups.findIndex(g => g.id === overWs)
+    if (activeGroup && overGroup) {
+      const groups = displayAgentGroups ?? []
+      const oldIdx = groups.findIndex(g => g.id === activeGroup)
+      const newIdx = groups.findIndex(g => g.id === overGroup)

      if (oldIdx < 0 || newIdx < 0) {
        return
      }

-      setWorkspaceOrderIds(arrayMove(agentGroups, oldIdx, newIdx).map(g => g.id))
+      setSidebarWorkspaceOrderIds(arrayMove(groups, oldIdx, newIdx).map(g => g.id))

      return
    }

-    if (activeWs || overWs) {
+    if (activeGroup || overGroup) {
      return
    }

@ -570,7 +732,7 @@ export function ChatSidebar({
      return
    }

-    setAgentOrderIds(arrayMove(agentSessions, oldIdx, newIdx).map(s => s.id))
+    setSidebarSessionOrderIds(arrayMove(agentSessions, oldIdx, newIdx).map(s => s.id))
  }

  return (
@ -580,7 +742,11 @@ export function ChatSidebar({
        panesFlipped ? 'border-l border-r-0' : 'border-r border-l-0',
        sidebarOpen
          ? 'border-(--sidebar-edge-border) bg-(--ui-sidebar-surface-background) opacity-100'
-          : 'pointer-events-none border-transparent bg-transparent opacity-0'
+          : 'pointer-events-none border-transparent bg-transparent opacity-0',
+        // While floated by PaneShell's hover-reveal, force visible + interactive
+        // — on hover (group-hover/reveal) or when keyboard-pinned (data-forced).
+        'in-data-[pane-hover-reveal=open]:pointer-events-auto in-data-[pane-hover-reveal=open]:border-(--sidebar-edge-border) in-data-[pane-hover-reveal=open]:bg-(--ui-sidebar-surface-background) in-data-[pane-hover-reveal=open]:opacity-100',
+        'group-hover/reveal:pointer-events-auto group-hover/reveal:border-(--sidebar-edge-border) group-hover/reveal:bg-(--ui-sidebar-surface-background) group-hover/reveal:opacity-100'
      )}
      collapsible="none"
    >
@ -624,14 +790,14 @@ export function ChatSidebar({
                      type="button"
                    >
                      <item.icon className="size-4 shrink-0 text-[color-mix(in_srgb,currentColor_72%,transparent)]" />
-                      {sidebarOpen && (
+                      {contentVisible && (
                        <>
-                          <span className="min-w-0 flex-1 truncate max-[46.25rem]:hidden">
+                          <span className="min-w-0 flex-1 truncate">
                            {s.nav[item.id] ?? item.label}
                          </span>
                          {isNewSession && (
                            <KbdGroup
-                              className={cn('ml-auto max-[46.25rem]:hidden', newSessionKbdFlash && 'opacity-100!')}
+                              className={cn('ml-auto', newSessionKbdFlash && 'opacity-100!')}
                              keys={[...NEW_SESSION_KBD]}
                            />
                          )}
@ -645,7 +811,7 @@ export function ChatSidebar({
          </SidebarGroupContent>
        </SidebarGroup>

-        {sidebarOpen && showSessionSections && (
+        {contentVisible && showSessionSections && (
          <div className="shrink-0 px-2 pb-1 pt-1">
            <SearchField
              aria-label={s.searchAria}
@ -657,7 +823,7 @@ export function ChatSidebar({
          </div>
        )}

-        {sidebarOpen && showSessionSections && trimmedQuery && (
+        {contentVisible && showSessionSections && trimmedQuery && (
          <SidebarSessionsSection
            activeSessionId={activeSidebarSessionId}
            contentClassName="flex min-h-0 flex-1 flex-col gap-px overflow-y-auto overscroll-contain pb-1.75"
@ -681,7 +847,7 @@ export function ChatSidebar({
          />
        )}

-        {sidebarOpen && showSessionSections && !trimmedQuery && (
+        {contentVisible && showSessionSections && !trimmedQuery && (
          <SidebarSessionsSection
            activeSessionId={activeSidebarSessionId}
            contentClassName="flex min-h-10 shrink-0 flex-col gap-px rounded-lg pb-2 pt-1"
@ -703,7 +869,7 @@ export function ChatSidebar({
          />
        )}

-        {sidebarOpen && showSessionSections && !trimmedQuery && (
+        {contentVisible && showSessionSections && !trimmedQuery && (
          <SidebarSessionsSection
            activeSessionId={activeSidebarSessionId}
            contentClassName={cn(
@ -727,7 +893,7 @@ export function ChatSidebar({
              ) : null
            }
            forceEmptyState={showSessionSkeletons}
-            groups={showAllProfiles ? profileGroups : agentsGrouped ? agentGroups : undefined}
+            groups={displayAgentGroups}
            headerAction={
              // Always reserve the icon-xs (size-6) slot so the header keeps the
              // same height whether or not the toggle renders — otherwise the
@ -736,7 +902,7 @@ export function ChatSidebar({
              // the toggle does nothing, and it's irrelevant in the ALL-profiles
              // view (always grouped by profile), so hide the button (not the slot).
              <div className="grid size-6 shrink-0 place-items-center">
-                {!showAllProfiles && agentSessions.length > 0 ? (
+                {!showAllProfiles && localAgentSessions.length > 0 ? (
                  <Tip label={agentsGrouped ? s.groupTitleGrouped : s.groupTitleUngrouped}>
                    <Button
                      aria-label={agentsGrouped ? s.groupAriaGrouped : s.groupAriaUngrouped}
@ -770,13 +936,13 @@ export function ChatSidebar({
            open={agentsOpen}
            pinned={false}
            rootClassName="min-h-0 flex-1 p-0"
-            sessions={agentSessions}
+            sessions={displayAgentSessions}
            sortable={!showAllProfiles && agentSessions.length > 1}
            workingSessionIdSet={workingSessionIdSet}
          />
        )}

-        {sidebarOpen && !trimmedQuery && cronJobs.length > 0 && (
+        {contentVisible && !trimmedQuery && cronJobs.length > 0 && (
          <SidebarCronJobsSection
            jobs={cronJobs}
            label={s.cronJobs}
@ -788,9 +954,9 @@ export function ChatSidebar({
          />
        )}

-        {sidebarOpen && !showSessionSections && <div className="min-h-0 flex-1" />}
+        {contentVisible && !showSessionSections && <div className="min-h-0 flex-1" />}

-        {sidebarOpen && (
+        {contentVisible && (
          <div className="shrink-0 px-0.5 pb-1 pt-0.5">
            <ProfileRail />
          </div>
@ -872,8 +1038,9 @@ interface SidebarSessionGroup {
  // Profile color for the ALL-profiles view; absent for workspace groups.
  color?: null | string
  loadingMore?: boolean
-  mode?: 'profile' | 'workspace'
+  mode?: 'profile' | 'source' | 'workspace'
  onLoadMore?: () => void
+  sourceId?: string
  totalCount?: number
 }

@ -928,7 +1095,8 @@ function SidebarSessionsSection({
  onReorder,
  dndSensors
 }: SidebarSessionsSectionProps) {
-  const showEmptyState = forceEmptyState || sessions.length === 0
+  const hasGroupedSessions = Boolean(groups?.some(group => group.sessions.length > 0))
+  const showEmptyState = forceEmptyState || (!hasGroupedSessions && sessions.length === 0)
  const dndActive = sortable && !!onReorder

  const renderRow = (session: SessionInfo) => {
@ -961,12 +1129,25 @@ function SidebarSessionsSection({
      renderRows(items)
    )

+  const renderNestedSessionList = (items: SessionInfo[]) =>
+    dndActive ? (
+      <DndContext collisionDetection={closestCenter} onDragEnd={onReorder} sensors={dndSensors}>
+        <SortableContext items={items.map(s => s.id)} strategy={verticalListSortingStrategy}>
+          {renderRows(items)}
+        </SortableContext>
+      </DndContext>
+    ) : (
+      renderRows(items)
+    )
+
  const flatVirtualized = !showEmptyState && !groups?.length && sessions.length >= VIRTUALIZE_THRESHOLD

  let inner: React.ReactNode
+  let bodyOwnsDndContext = dndActive && !showEmptyState

  if (showEmptyState) {
    inner = emptyState
+    bodyOwnsDndContext = false
  } else if (groups?.length) {
    const groupNodes = groups.map(group =>
      dndActive ? (
@ -974,7 +1155,7 @@ function SidebarSessionsSection({
          group={group}
          key={group.id}
          onNewSession={onNewSessionInWorkspace}
-          renderRows={renderSessionList}
+          renderRows={renderNestedSessionList}
        />
      ) : (
        <SidebarWorkspaceGroup
@ -987,12 +1168,15 @@ function SidebarSessionsSection({
    )

    inner = dndActive ? (
-      <SortableContext items={groups.map(g => wsId(g.id))} strategy={verticalListSortingStrategy}>
-        {groupNodes}
-      </SortableContext>
+      <DndContext collisionDetection={closestCenter} onDragEnd={onReorder} sensors={dndSensors}>
+        <SortableContext items={groups.map(g => groupDndId(g.id))} strategy={verticalListSortingStrategy}>
+          {groupNodes}
+        </SortableContext>
+      </DndContext>
    ) : (
      groupNodes
    )
+    bodyOwnsDndContext = false
  } else if (flatVirtualized) {
    inner = (
      <VirtualSessionList
@ -1011,14 +1195,13 @@ function SidebarSessionsSection({
    inner = renderSessionList(sessions)
  }

-  const body =
-    dndActive && !showEmptyState ? (
-      <DndContext collisionDetection={closestCenter} onDragEnd={onReorder} sensors={dndSensors}>
-        {inner}
-      </DndContext>
-    ) : (
-      inner
-    )
+  const body = bodyOwnsDndContext ? (
+    <DndContext collisionDetection={closestCenter} onDragEnd={onReorder} sensors={dndSensors}>
+      {inner}
+    </DndContext>
+  ) : (
+    inner
+  )

  // The virtualizer owns its own scroller, so suppress the wrapper's overflow
  // to avoid a double scroll container.
@ -1061,6 +1244,7 @@ function SidebarWorkspaceGroup({
  const { t } = useI18n()
  const s = t.sidebar
  const isProfileGroup = group.mode === 'profile'
+  const isSourceGroup = group.mode === 'source'
  const pageStep = isProfileGroup ? PROFILE_INITIAL_PAGE : WORKSPACE_PAGE
  const [open, setOpen] = useState(true)
  const [visibleCount, setVisibleCount] = useState(pageStep)
@ -1086,7 +1270,16 @@ function SidebarWorkspaceGroup({
  }

  return (
-    <div className={cn('grid gap-px', dragging && 'z-10 opacity-60', className)} ref={ref} style={style} {...rest}>
+    <div
+      className={cn(
+        'grid gap-px data-[dragging=true]:z-10 data-[dragging=true]:opacity-70 data-[dragging=true]:will-change-transform',
+        className
+      )}
+      data-dragging={dragging ? 'true' : undefined}
+      ref={ref}
+      style={style}
+      {...rest}
+    >
      <div className="group/workspace flex min-h-6 items-center gap-1 px-2 pt-1 text-[0.6875rem] font-medium text-(--ui-text-tertiary)">
        <button
          className="flex min-w-0 items-center gap-1.5 bg-transparent text-left hover:text-(--ui-text-secondary)"
@ -1094,7 +1287,18 @@ function SidebarWorkspaceGroup({
          type="button"
        >
          {group.color ? (
-            <span aria-hidden="true" className="size-2 shrink-0 rounded-full" style={{ backgroundColor: group.color }} />
+            <span
+              aria-hidden="true"
+              className="size-2 shrink-0 rounded-full"
+              style={{ backgroundColor: group.color }}
+            />
+          ) : null}
+          {isSourceGroup && group.sourceId ? (
+            <PlatformAvatar
+              className="size-4 rounded-[4px] text-[0.5625rem] [&_svg]:size-3"
+              platformId={group.sourceId}
+              platformName={group.label}
+            />
          ) : null}
          <span className="truncate">{group.label}</span>
          <SidebarCount>
@ -1143,7 +1347,11 @@ function SidebarWorkspaceGroup({
          {renderRows(visibleSessions)}
          {hiddenCount > 0 &&
            (isProfileGroup ? (
-              <SidebarLoadMoreRow loading={Boolean(group.loadingMore)} onClick={handleProfileLoadMore} step={nextCount} />
+              <SidebarLoadMoreRow
+                loading={Boolean(group.loadingMore)}
+                onClick={handleProfileLoadMore}
+                step={nextCount}
+              />
            ) : (
              <Tip label={s.showMoreIn(nextCount, group.label)}>
                <button
@ -1169,7 +1377,7 @@ interface SortableWorkspaceProps {
 }

 function SortableSidebarWorkspaceGroup(props: SortableWorkspaceProps) {
-  return <SidebarWorkspaceGroup {...props} {...useSortableBindings(wsId(props.group.id))} />
+  return <SidebarWorkspaceGroup {...props} {...useSortableBindings(groupDndId(props.group.id))} />
 }

 function SidebarCount({ children }: { children: React.ReactNode }) {
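
The profile-group ordering in the `ChatSidebar` hunk above ("default (root) first, then the rest alphabetically") can be sketched in isolation. This is a minimal illustration, assuming group ids are unique; `GroupSketch` and `sortProfileGroups` are hypothetical names that do not appear in the commit, and only the comparator expression is taken from the diff.

```typescript
// Hypothetical, trimmed-down shape of a sidebar group (assumption for illustration).
interface GroupSketch {
  id: string
  label: string
}

// Same comparator expression as in the diff: the 'default' group is pinned
// first, every other group is ordered by label via localeCompare.
function sortProfileGroups(groups: GroupSketch[]): GroupSketch[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...groups].sort((a, b) =>
    a.id === 'default' ? -1 : b.id === 'default' ? 1 : a.label.localeCompare(b.label)
  )
}

const sortedGroupIds = sortProfileGroups([
  { id: 'b', label: 'beta' },
  { id: 'default', label: 'zeta' },
  { id: 'a', label: 'alpha' }
]).map(group => group.id)
```

The 'default' group sorts first even though its label would otherwise place it last, which is the point of the special case.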
--- a/apps/desktop/src/app/chat/sidebar/session-row.tsx
+++ b/apps/desktop/src/app/chat/sidebar/session-row.tsx
@ -176,8 +176,8 @@ export function SidebarSessionRow({
                needsInput ? 'overflow-visible' : 'overflow-hidden'
              )}
            >
-            <SidebarRowDot isWorking={isWorking} needsInput={needsInput} />
-          </span>
+              <SidebarRowDot isWorking={isWorking} needsInput={needsInput} />
+            </span>
          )}
          <span className="min-w-0 flex-1 truncate text-[0.8125rem] font-normal text-(--ui-text-secondary) group-hover:text-foreground group-data-[working=true]:text-foreground/90">
            {title}
--- a/apps/desktop/src/app/desktop-controller.tsx
+++ b/apps/desktop/src/app/desktop-controller.tsx
@ -8,6 +8,7 @@ import { DesktopInstallOverlay } from '@/components/desktop-install-overlay'
 import { DesktopOnboardingOverlay } from '@/components/desktop-onboarding-overlay'
 import { GatewayConnectingOverlay } from '@/components/gateway-connecting-overlay'
 import { Pane, PaneMain } from '@/components/pane-shell'
+import { useMediaQuery } from '@/hooks/use-media-query'
 import { useSkinCommand } from '@/themes/use-skin-command'

 import { formatRefValue } from '../components/assistant-ui/directive-text'
@ -23,6 +24,7 @@ import {
  FILE_BROWSER_MAX_WIDTH,
  FILE_BROWSER_MIN_WIDTH,
  pinSession,
+  setSidebarOverlayMounted,
  SIDEBAR_DEFAULT_WIDTH,
  SIDEBAR_MAX_WIDTH,
  SIDEBAR_SESSIONS_PAGE_SIZE,
@ -46,6 +48,7 @@ import {
  $sessions,
  $workingSessionIds,
  CRON_SECTION_LIMIT,
+  getRecentlySettledSessionIds,
  mergeSessionPage,
  sessionPinId,
  setAwaitingResponse,
@ -76,6 +79,7 @@ import { CommandPalette } from './command-palette'
 import { useGatewayBoot } from './gateway/hooks/use-gateway-boot'
 import { useGatewayRequest } from './gateway/hooks/use-gateway-request'
 import { useKeybinds } from './hooks/use-keybinds'
+import { SIDEBAR_COLLAPSE_MEDIA_QUERY } from './layout-constants'
 import { ModelPickerOverlay } from './model-picker-overlay'
 import { ModelVisibilityOverlay } from './model-visibility-overlay'
 import { RightSidebarPane } from './right-sidebar'
@ -127,12 +131,18 @@ function sameCronSignature(a: SessionInfo[], b: SessionInfo[]): boolean {
 }

 // Rows a session refresh must preserve even if the aggregator omits them:
-// in-flight first turns (message_count 0), pinned rows aged off the page, and
-// the actively-viewed chat (its "working" flag clears a beat before the
-// aggregator sees the persisted row). Pass `scope` to only keep the active row
-// when it belongs to the profile being paged.
+// in-flight first turns (message_count 0), pinned rows aged off the page, the
+// actively-viewed chat (its "working" flag clears a beat before the aggregator
+// sees the persisted row), and sessions whose turn just settled (same race, but
+// for a chat the user has already navigated away from). Pass `scope` to only
+// keep the active row when it belongs to the profile being paged.
 function sessionsToKeep(scope?: string): Set<string> {
-  const keep = new Set<string>([...$workingSessionIds.get(), ...$pinnedSessionIds.get()])
+  const keep = new Set<string>([
+    ...$workingSessionIds.get(),
+    ...$pinnedSessionIds.get(),
+    ...getRecentlySettledSessionIds()
+  ])
+
  const active = $selectedStoredSessionId.get()

  if (active) {
@ -165,6 +175,10 @@ export function DesktopController() {
  const terminalTakeover = useStore($terminalTakeover)
  const panesFlipped = useStore($panesFlipped)
  const profileScope = useStore($profileScope)
+  // Below SIDEBAR_COLLAPSE_BREAKPOINT_PX there's no room for a docked rail —
+  // collapse both sidebars (without touching their stored open state) so the
+  // hover-reveal overlay becomes the way in. Restores once it's wide again.
+  const narrowViewport = useMediaQuery(SIDEBAR_COLLAPSE_MEDIA_QUERY)

  const routedSessionId = routeSessionId(location.pathname)
  const routeToken = `${location.pathname}:${location.search}:${location.hash}`
@ -300,6 +314,7 @@ export function DesktopController() {
      // with few recent sessions isn't windowed out of the cross-profile
      // recency page — the empty-history-on-profile-switch bug.
      const sessionProfile = profileScope === ALL_PROFILES ? 'all' : profileScope
+
      const result = await listAllProfileSessions(limit, 1, 'exclude', 'recent', sessionProfile, {
        excludeSources: ['cron']
      })
@ -846,6 +861,8 @@ export function DesktopController() {
    <Pane
      defaultOpen={false}
      disabled={!chatOpen}
+      forceCollapsed={narrowViewport}
+      hoverReveal
      id="file-browser"
      key="file-browser"
      maxWidth={FILE_BROWSER_MAX_WIDTH}
@ -873,9 +890,12 @@ export function DesktopController() {
    >
      <Pane
        disabled={terminalTakeoverActive}
+        forceCollapsed={narrowViewport}
+        hoverReveal
        id="chat-sidebar"
        maxWidth={SIDEBAR_MAX_WIDTH}
        minWidth={SIDEBAR_DEFAULT_WIDTH}
+        onOverlayActiveChange={setSidebarOverlayMounted}
        resizable
        side={sidebarSide}
        width={`${SIDEBAR_DEFAULT_WIDTH}px`}
--- a/apps/desktop/src/app/gateway/hooks/use-gateway-boot.ts
+++ b/apps/desktop/src/app/gateway/hooks/use-gateway-boot.ts
@ -120,6 +120,13 @@ export function useGatewayBoot({
      reconnecting = true

      try {
+        // Drop a stale REMOTE backend cache before re-dialing. After sleep/wake a
+        // remote backend can become unreachable, but it has no child process
+        // whose 'exit' would clear the main process's cached descriptor — without
+        // this the renderer re-dials the same dead endpoint forever and stays on
+        // "Starting Hermes…". The probe is a no-op for a healthy or local backend.
+        await desktop.revalidateConnection?.().catch(() => undefined)
+
        const conn = await desktop.getConnection($activeGatewayProfile.get())

        if (cancelled) {
@ -218,6 +225,15 @@ export function useGatewayBoot({
        reconnectAttempt = 0
        reauthNotified = false
        clearReconnectTimer()
+
+        // A revalidate-driven reconnect can rebuild the backend in place when the
+        // cached remote was found dead, which re-drives the boot-progress overlay.
+        // Unlike the initial boot, nothing calls completeDesktopBoot() afterwards,
+        // so dismiss it here once we're open again — otherwise the overlay sticks
+        // at ~94%. A no-op on a normal (non-rebuild) reconnect.
+        if (bootCompleted) {
+          completeDesktopBoot()
+        }
      } else if (bootCompleted && (st === 'closed' || st === 'error')) {
        // The socket dropped after a healthy boot (typically sleep/wake). Try
        // to bring it back instead of leaving the composer stuck disabled.
--- a/apps/desktop/src/app/hooks/use-keybinds.ts
+++ b/apps/desktop/src/app/hooks/use-keybinds.ts
@ -2,11 +2,15 @@ import { useEffect, useRef } from 'react'
 import { useNavigate } from 'react-router-dom'

 import { setRightSidebarTab } from '@/app/right-sidebar/store'
+import { PANE_TOGGLE_REVEAL_EVENT } from '@/components/pane-shell'
+import { matchesQuery } from '@/hooks/use-media-query'
 import { PROFILE_SLOT_COUNT } from '@/lib/keybinds/actions'
 import { comboAllowedInInput, comboFromEvent, isEditableTarget } from '@/lib/keybinds/combo'
 import { toggleCommandPalette } from '@/store/command-palette'
 import { $capture, $comboIndex, endCapture, setBinding, toggleKeybindPanel } from '@/store/keybinds'
 import {
+  CHAT_SIDEBAR_PANE_ID,
+  FILE_BROWSER_PANE_ID,
  requestSessionSearchFocus,
  setFileBrowserOpen,
  toggleFileBrowserOpen,
@ -24,6 +28,7 @@ import { $activeSessionId, $sessions, setModelPickerOpen } from '@/store/session
 import { useTheme } from '@/themes/context'

 import { requestComposerFocus } from '../chat/composer/focus'
+import { SIDEBAR_COLLAPSE_MEDIA_QUERY } from '../layout-constants'
 import {
  AGENTS_ROUTE,
  ARTIFACTS_ROUTE,
@ -109,8 +114,20 @@ export function useKeybinds(deps: KeybindRuntimeDeps): void {
    'session.focusSearch': requestSessionSearchFocus,
    'session.togglePin': deps.toggleSelectedPin,

-    'view.toggleSidebar': toggleSidebarOpen,
-    'view.toggleRightSidebar': toggleFileBrowserOpen,
+    'view.toggleSidebar': () => {
+      if (matchesQuery(SIDEBAR_COLLAPSE_MEDIA_QUERY)) {
+        window.dispatchEvent(new CustomEvent(PANE_TOGGLE_REVEAL_EVENT, { detail: { id: CHAT_SIDEBAR_PANE_ID } }))
+      } else {
+        toggleSidebarOpen()
+      }
+    },
+    'view.toggleRightSidebar': () => {
+      if (matchesQuery(SIDEBAR_COLLAPSE_MEDIA_QUERY)) {
+        window.dispatchEvent(new CustomEvent(PANE_TOGGLE_REVEAL_EVENT, { detail: { id: FILE_BROWSER_PANE_ID } }))
+      } else {
+        toggleFileBrowserOpen()
+      }
+    },
    'view.showFiles': () => showRightSidebarTab('files'),
    'view.showTerminal': () => showRightSidebarTab('terminal'),
    'view.flipPanes': togglePanesFlipped,
--- a/apps/desktop/src/app/layout-constants.ts
+++ b/apps/desktop/src/app/layout-constants.ts
@ -11,3 +11,9 @@ export const PAGE_INSET_X = 'px-[clamp(1.25rem,4vw,4rem)]'
 // Matching negative inline-margin to bleed an element (e.g. a sticky header bar)
 // out to the gutter edges before re-applying PAGE_INSET_X.
 export const PAGE_INSET_NEG_X = '-mx-[clamp(1.25rem,4vw,4rem)]'
+
+// At or below this viewport width (the `max-width` query is inclusive) a docked
+// sidebar leaves no room for content, so both rails auto-collapse into the
+// hover-reveal overlay. Single source of truth for the responsive collapse point.
+export const SIDEBAR_COLLAPSE_BREAKPOINT_PX = 768
+export const SIDEBAR_COLLAPSE_MEDIA_QUERY = `(max-width: ${SIDEBAR_COLLAPSE_BREAKPOINT_PX}px)`
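
A minimal sketch of how the breakpoint constant and the derived query relate, assuming standard CSS semantics where `max-width` matches widths up to and including the stated value. `collapsesAt` is a hypothetical helper, not part of this change; it only mirrors what the media query would report for a given viewport width.

```typescript
// Values copied from layout-constants.ts in this diff.
const SIDEBAR_COLLAPSE_BREAKPOINT_PX = 768
const SIDEBAR_COLLAPSE_MEDIA_QUERY = `(max-width: ${SIDEBAR_COLLAPSE_BREAKPOINT_PX}px)`

// Hypothetical helper (assumption): true when a viewport of the given width
// would satisfy the max-width query, i.e. width <= breakpoint.
function collapsesAt(viewportWidthPx: number): boolean {
  return viewportWidthPx <= SIDEBAR_COLLAPSE_BREAKPOINT_PX
}

const collapsedAtBreakpoint = collapsesAt(768)
const collapsedJustAbove = collapsesAt(769)
```

Deriving the query string from the numeric constant keeps the two from drifting apart, which is what the "single source of truth" comment is after.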
--- a/apps/desktop/src/app/messaging/platform-icon.tsx
+++ b/apps/desktop/src/app/messaging/platform-icon.tsx
@ -28,15 +28,17 @@ import { cn } from '@/lib/utils'
 type IconKind = 'brand' | 'generic'

 interface PlatformIconSpec {
-  Icon: ComponentType<SVGProps<SVGSVGElement>>
+  Icon?: ComponentType<SVGProps<SVGSVGElement>>
  color: string
  kind: IconKind
+  monogram?: string
 }

 const PLATFORM_ICONS: Record<string, PlatformIconSpec> = {
  telegram: { Icon: SiTelegram, color: '#26A5E4', kind: 'brand' },
  discord: { Icon: SiDiscord, color: '#5865F2', kind: 'brand' },
  // Slack removed from Simple Icons by Salesforce request — letter monogram.
+  slack: { color: '#4A154B', kind: 'brand', monogram: 'S' },
  mattermost: { Icon: SiMattermost, color: '#0058CC', kind: 'brand' },
  matrix: { Icon: SiMatrix, color: '#000000', kind: 'brand' },
  signal: { Icon: SiSignal, color: '#3A76F0', kind: 'brand' },
@ -87,7 +89,7 @@ export function PlatformAvatar({ className, platformId, platformName }: Platform
        color
      }}
    >
-      <Icon className="size-3.5" />
+      {Icon ? <Icon className="size-3.5" /> : spec.monogram || platformName.charAt(0).toUpperCase()}
    </span>
  )
 }
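
The fallback order introduced in `PlatformAvatar` (icon component, then explicit monogram, then the first letter of the platform name) can be sketched without React. This is an illustration only: `IconSpecSketch` and `avatarFallbackText` are hypothetical names, and the icon is modelled as an opaque optional value rather than a real component.

```typescript
// Hypothetical, reduced version of PlatformIconSpec (assumption for illustration).
interface IconSpecSketch {
  Icon?: unknown
  monogram?: string
}

// Returns null when an icon would render; otherwise the text shown in its place.
// The `||` matches the diff, so an empty monogram also falls back to the initial.
function avatarFallbackText(spec: IconSpecSketch, platformName: string): string | null {
  if (spec.Icon) {
    return null
  }

  return spec.monogram || platformName.charAt(0).toUpperCase()
}

const slackText = avatarFallbackText({ monogram: 'S' }, 'Slack')
const unknownText = avatarFallbackText({}, 'zulip')
const iconText = avatarFallbackText({ Icon: () => null }, 'Telegram')
```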
--- a/apps/desktop/src/app/session/hooks/use-message-stream.ts
+++ b/apps/desktop/src/app/session/hooks/use-message-stream.ts
@ -14,6 +14,7 @@ import {
  upsertToolPart
 } from '@/lib/chat-messages'
 import { coerceGatewayText, coerceThinkingText, normalizePersonalityValue } from '@/lib/chat-runtime'
+import { gatewayEventRequiresSessionId } from '@/lib/gateway-events'
 import { triggerHaptic } from '@/lib/haptics'
 import { isProviderSetupErrorMessage } from '@/lib/provider-setup-errors'
 import { setClarifyRequest } from '@/store/clarify'
@ -613,6 +614,9 @@ export function useMessageStream({
    (event: RpcEvent) => {
      const payload = event.payload as GatewayEventPayload | undefined
      const explicitSid = event.session_id || ''
+      if (!explicitSid && gatewayEventRequiresSessionId(event.type)) {
+        return
+      }
      const sessionId = explicitSid || activeSessionIdRef.current
      const isActiveEvent = !!sessionId && sessionId === activeSessionIdRef.current

--- a/apps/desktop/src/app/session/hooks/use-session-state-cache.ts
+++ b/apps/desktop/src/app/session/hooks/use-session-state-cache.ts
@ -9,6 +9,28 @@ import { $busy, $messages, noteSessionActivity, setSessionAttention, setSessionW

 import type { ClientSessionState } from '../../types'

+// Shallow per-message identity check. When a flush carries no transcript
+// changes, `preserveLocalAssistantErrors` returns the same message objects in
+// the same order, so reference equality per slot is enough to detect "nothing
+// to publish" and avoid a needless `$messages` churn.
+function sameMessageList(a: ChatMessage[], b: ChatMessage[]): boolean {
+  if (a === b) {
+    return true
+  }
+
+  if (a.length !== b.length) {
+    return false
+  }
+
+  for (let index = 0; index < a.length; index += 1) {
+    if (a[index] !== b[index]) {
+      return false
+    }
+  }
+
+  return true
+}
+
 interface SessionStateCacheOptions {
  activeSessionId: string | null
  busyRef: MutableRefObject<boolean>
@ -88,7 +110,20 @@ export function useSessionStateCache({
      return
    }

-    setMessages(preserveLocalAssistantErrors(pending.state.messages, $messages.get()))
+    // `preserveLocalAssistantErrors` always returns a fresh array, so publishing
+    // it unconditionally puts a new `$messages` reference on the store every
+    // flush — including the periodic `session.info` heartbeats that don't touch
+    // the transcript. That churns ChatView → runtimeMessageRepository → the
+    // assistant-ui runtime → the virtualizer, which re-measures and visibly
+    // jerks the scroll position while the user is reading. Skip the publish when
+    // the merged result is content-identical to what's already on screen.
+    const currentMessages = $messages.get()
+    const nextMessages = preserveLocalAssistantErrors(pending.state.messages, currentMessages)
+
+    if (!sameMessageList(nextMessages, currentMessages)) {
+      setMessages(nextMessages)
+    }
+
    setBusy(pending.state.busy)
    setMutableRef(busyRef, pending.state.busy)
    setAwaitingResponse(pending.state.awaitingResponse)
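
A small sketch of why the per-slot reference check is enough to skip a publish. `sameMessageList` is copied from the diff; the `ChatMessage` shape, the `published` variable, `setMessages`, and `flush` are simplified stand-ins (assumptions) for the real `$messages` store and flush path.

```typescript
// Simplified message shape (assumption; the real ChatMessage has more fields).
interface ChatMessage {
  id: string
  text: string
}

// Copied from the diff: shallow, per-slot identity comparison.
function sameMessageList(a: ChatMessage[], b: ChatMessage[]): boolean {
  if (a === b) {
    return true
  }

  if (a.length !== b.length) {
    return false
  }

  for (let index = 0; index < a.length; index += 1) {
    if (a[index] !== b[index]) {
      return false
    }
  }

  return true
}

// Hypothetical stand-in for the $messages store and its setter.
let published: ChatMessage[] = []
let publishCount = 0

function setMessages(next: ChatMessage[]): void {
  published = next
  publishCount += 1
}

// Publish only when some slot holds a different object reference.
function flush(next: ChatMessage[]): void {
  if (!sameMessageList(next, published)) {
    setMessages(next)
  }
}

const first: ChatMessage = { id: 'm1', text: 'hello' }
const second: ChatMessage = { id: 'm2', text: 'world' }

flush([first, second]) // lengths differ from the empty list: publishes
flush([first, second]) // fresh array, same objects: skipped
flush([first, { ...second }]) // new object in slot 1: publishes
```

The second call is the heartbeat case described in the comment: a new array wrapping unchanged message objects no longer replaces the store value.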
--- a/apps/desktop/src/app/settings/appearance-settings.tsx
+++ b/apps/desktop/src/app/settings/appearance-settings.tsx
@ -6,6 +6,7 @@ import { useI18n } from '@/i18n'
 import { triggerHaptic } from '@/lib/haptics'
 import { Check, Palette } from '@/lib/icons'
 import { cn } from '@/lib/utils'
+import { $activeGatewayProfile, $profiles, normalizeProfileKey } from '@/store/profile'
 import { $toolViewMode, setToolViewMode } from '@/store/tool-view'
 import { useTheme } from '@/themes/context'
 import { BUILTIN_THEMES } from '@/themes/presets'
@ -57,8 +58,17 @@ export function AppearanceSettings() {
  const { t, isSavingLocale } = useI18n()
  const { themeName, mode, availableThemes, setTheme, setMode } = useTheme()
  const toolViewMode = useStore($toolViewMode)
+  const profiles = useStore($profiles)
+  const activeProfileKey = normalizeProfileKey(useStore($activeGatewayProfile))
  const a = t.settings.appearance

+  // Themes save per profile. Surface that only when the user actually has more
+  // than one profile (single-profile installs never see the distinction).
+  const showProfileNote = profiles.length > 1
+
+  const activeProfileName =
+    profiles.find(profile => normalizeProfileKey(profile.name) === activeProfileKey)?.name ?? activeProfileKey
+
  const modeOptions = MODE_OPTIONS.map(({ id, icon }) => ({ icon, id, label: t.settings.modeOptions[id].label }))

  const toolOptions = [
@ -98,43 +108,50 @@ export function AppearanceSettings() {

          <ListRow
            below={
-              <div className="mt-3 grid gap-3 sm:grid-cols-2 xl:grid-cols-3">
-                {availableThemes.map(theme => {
-                  const active = themeName === theme.name
+              <>
+                <div className="mt-3 grid gap-3 sm:grid-cols-2 xl:grid-cols-3">
+                  {availableThemes.map(theme => {
+                    const active = themeName === theme.name

-                  return (
-                    <button
-                      className={cn(
-                        'rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-2 text-left transition hover:bg-(--chrome-action-hover)',
-                        active && 'border-(--ui-stroke-secondary) bg-(--ui-bg-tertiary)'
-                      )}
-                      key={theme.name}
-                      onClick={() => {
-                        triggerHaptic('crisp')
-                        setTheme(theme.name)
-                      }}
-                      type="button"
-                    >
-                      <ThemePreview name={theme.name} />
-                      <div className="mt-3 flex items-start justify-between gap-3 px-1">
-                        <div className="min-w-0">
-                          <div className="truncate text-[length:var(--conversation-text-font-size)] font-medium">
-                            {theme.label}
-                          </div>
-                          <div className="mt-0.5 line-clamp-2 text-[length:var(--conversation-caption-font-size)] leading-(--conversation-caption-line-height) text-(--ui-text-tertiary)">
-                            {theme.description}
-                          </div>
-                        </div>
-                        {active && (
-                          <span className="mt-0.5 grid size-5 shrink-0 place-items-center rounded-full bg-primary text-primary-foreground">
-                            <Check className="size-3.5" />
-                          </span>
+                    return (
+                      <button
+                        className={cn(
+                          'rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-2 text-left transition hover:bg-(--chrome-action-hover)',
+                          active && 'border-(--ui-stroke-secondary) bg-(--ui-bg-tertiary)'
                        )}
-                      </div>
-                    </button>
-                  )
-                })}
-              </div>
+                        key={theme.name}
+                        onClick={() => {
+                          triggerHaptic('crisp')
+                          setTheme(theme.name)
+                        }}
+                        type="button"
+                      >
+                        <ThemePreview name={theme.name} />
+                        <div className="mt-3 flex items-start justify-between gap-3 px-1">
+                          <div className="min-w-0">
+                            <div className="truncate text-[length:var(--conversation-text-font-size)] font-medium">
+                              {theme.label}
+                            </div>
+                            <div className="mt-0.5 line-clamp-2 text-[length:var(--conversation-caption-font-size)] leading-(--conversation-caption-line-height) text-(--ui-text-tertiary)">
+                              {theme.description}
+                            </div>
+                          </div>
+                          {active && (
+                            <span className="mt-0.5 grid size-5 shrink-0 place-items-center rounded-full bg-primary text-primary-foreground">
+                              <Check className="size-3.5" />
+                            </span>
+                          )}
+                        </div>
+                      </button>
+                    )
+                  })}
+                </div>
+                {showProfileNote && (
+                  <p className="mt-3 text-[length:var(--conversation-caption-font-size)] leading-(--conversation-caption-line-height) text-(--ui-text-tertiary)">
+                    {a.themeProfileNote(activeProfileName)}
+                  </p>
+                )}
+              </>
            }
            description={a.themeDesc}
            title={a.themeTitle}
--- a/apps/desktop/src/app/shell/app-shell.tsx
+++ b/apps/desktop/src/app/shell/app-shell.tsx
@ -5,6 +5,7 @@ import { useSyncExternalStore } from 'react'
 import { NotificationStack } from '@/components/notifications'
 import { PaneShell } from '@/components/pane-shell'
 import { SidebarProvider } from '@/components/ui/sidebar'
+import { useMediaQuery } from '@/hooks/use-media-query'
 import {
  $fileBrowserOpen,
  $panesFlipped,
@ -16,6 +17,8 @@ import {
 import { $paneWidthOverride } from '@/store/panes'
 import { $connection } from '@/store/session'

+import { SIDEBAR_COLLAPSE_MEDIA_QUERY } from '../layout-constants'
+
 import { KeybindPanel } from './keybind-panel'
 import { StatusbarControls, type StatusbarItem } from './statusbar-controls'
 import { TITLEBAR_HEIGHT, titlebarControlsPosition } from './titlebar'
@ -58,6 +61,7 @@ export function AppShell({
  const sidebarOpen = useStore($sidebarOpen)
  const fileBrowserOpen = useStore($fileBrowserOpen)
  const panesFlipped = useStore($panesFlipped)
+  const narrowViewport = useMediaQuery(SIDEBAR_COLLAPSE_MEDIA_QUERY)
  const fileBrowserWidthOverride = useStore($paneWidthOverride(FILE_BROWSER_PANE_ID))
  const connection = useStore($connection)
  const viewportFullscreen = useSyncExternalStore(subscribeWindowSize, viewportIsFullscreen, () => false)
@ -71,8 +75,10 @@ export function AppShell({

  // The inset clears the top-left titlebar buttons when nothing covers the
  // window's left edge. Default layout: the sessions sidebar sits there.
-  // Flipped layout: the file browser does instead.
-  const leftEdgePaneOpen = panesFlipped ? fileBrowserOpen : sidebarOpen
+  // Flipped layout: the file browser does instead. Below the collapse
+  // breakpoint both rails are force-collapsed (hover-reveal overlay), so the
+  // edge is uncovered regardless of their stored open state.
+  const leftEdgePaneOpen = !narrowViewport && (panesFlipped ? fileBrowserOpen : sidebarOpen)

  const titlebarContentInset = leftEdgePaneOpen
    ? 0
--- a/apps/desktop/src/app/shell/hooks/use-statusbar-items.tsx
+++ b/apps/desktop/src/app/shell/hooks/use-statusbar-items.tsx
@ -4,6 +4,7 @@ import { useCallback, useMemo } from 'react'

 import type { CommandCenterSection } from '@/app/command-center'
 import { GatewayMenuPanel } from '@/app/shell/gateway-menu-panel'
+import { useI18n } from '@/i18n'
 import {
  Activity,
  AlertCircle,
@ -16,17 +17,17 @@ import {
  Zap,
  ZapFilled
 } from '@/lib/icons'
-import { useI18n } from '@/i18n'
 import { formatModelStatusLabel } from '@/lib/model-status-label'
 import type { RuntimeReadinessResult } from '@/lib/runtime-readiness'
 import { contextBarLabel, LiveDuration, usageContextLabel } from '@/lib/statusbar'
 import { cn } from '@/lib/utils'
-import { setSessionYolo } from '@/lib/yolo-session'
+import { setGlobalYolo, setSessionYolo } from '@/lib/yolo-session'
 import { $desktopActionTasks } from '@/store/activity'
 import { $previewServerRestartStatus } from '@/store/preview'
 import {
  $activeSessionId,
  $busy,
+  $connection,
  $currentFastMode,
  $currentModel,
  $currentProvider,
@ -40,11 +41,18 @@ import {
  setYoloActive
 } from '@/store/session'
 import { $subagentsBySession, activeSubagentCount } from '@/store/subagents'
-import { $desktopVersion, $updateApply, $updateStatus, setUpdateOverlayOpen } from '@/store/updates'
+import {
+  $backendUpdateApply,
+  $backendUpdateStatus,
+  $desktopVersion,
+  $updateApply,
+  $updateStatus,
+  openUpdateOverlayFor
+} from '@/store/updates'
 import type { StatusResponse } from '@/types/hermes'

 import { CRON_ROUTE } from '../../routes'
-import type { StatusbarItem } from '../statusbar-controls'
+import type { StatusbarItem, StatusbarSelectModifiers } from '../statusbar-controls'

 interface StatusbarItemsOptions {
  agentsOpen: boolean
@ -97,7 +105,10 @@ export function useStatusbarItems({
  const subagentsBySession = useStore($subagentsBySession)
  const updateStatus = useStore($updateStatus)
  const updateApply = useStore($updateApply)
+  const backendUpdateStatus = useStore($backendUpdateStatus)
+  const backendUpdateApply = useStore($backendUpdateApply)
  const desktopVersion = useStore($desktopVersion)
+  const connection = useStore($connection)

  const contextUsage = useMemo(() => usageContextLabel(currentUsage), [currentUsage])
  const contextBar = useMemo(() => contextBarLabel(currentUsage), [currentUsage])
@ -105,22 +116,39 @@ export function useStatusbarItems({
  // Per-session approval bypass (same scope as the TUI's Shift+Tab). On a
  // new-chat draft (no runtime session yet) we arm locally; the session-create
  // path applies it once the backend session exists.
-  const toggleYolo = useCallback(async () => {
-    const next = !$yoloActive.get()
-    const sid = $activeSessionId.get()
+  //
+  // Shift+click flips the GLOBAL approvals.mode instead — a persistent,
+  // all-sessions/CLI/TUI/cron bypass that survives restarts.
+  const toggleYolo = useCallback(
+    async (modifiers?: StatusbarSelectModifiers) => {
+      const next = !$yoloActive.get()

-    setYoloActive(next)
+      setYoloActive(next)

-    if (!sid) {
-      return
-    }
+      if (modifiers?.shiftKey) {
+        try {
+          await setGlobalYolo(requestGateway, next)
+        } catch {
+          setYoloActive(!next)
+        }

-    try {
-      await setSessionYolo(requestGateway, sid, next)
-    } catch {
-      setYoloActive(!next)
-    }
-  }, [requestGateway])
+        return
+      }
+
+      const sid = $activeSessionId.get()
+
+      if (!sid) {
+        return
+      }
+
+      try {
+        await setSessionYolo(requestGateway, sid, next)
+      } catch {
+        setYoloActive(!next)
+      }
+    },
+    [requestGateway]
+  )

  const showYoloToggle = gatewayState === 'open' && (!!activeSessionId || freshDraftReady)

@ -177,18 +205,19 @@ export function useStatusbarItems({
      ? 'text-amber-600 hover:text-amber-600'
      : 'text-destructive hover:text-destructive'

-  const versionItem = useMemo<StatusbarItem>(() => {
+  const clientVersionItem = useMemo<StatusbarItem>(() => {
    const appVersion = desktopVersion?.appVersion
    const sha = updateStatus?.currentSha?.slice(0, 7) ?? null
    const behind = updateStatus?.behind ?? 0
    const applying = updateApply.applying || updateApply.stage === 'restart'
-    const base = appVersion ? `v${appVersion}` : (sha ?? copy.unknown)
+    const remote = connection?.mode === 'remote'
+
+    const version = appVersion ? `v${appVersion}` : (sha ?? copy.unknown)
+    const base = remote ? copy.clientLabel(appVersion ?? sha ?? copy.unknown) : version
    const behindHint = !applying && behind > 0 ? ` (+${behind})` : ''

    const label = applying
-      ? updateApply.stage === 'restart'
-        ? `${base} · ${copy.restart}`
-        : `${base} · ${copy.update}`
+      ? `${base} · ${updateApply.stage === 'restart' ? copy.restart : copy.update}`
      : `${base}${behindHint}`

    const tooltip = [
@ -203,17 +232,18 @@ export function useStatusbarItems({

    return {
      className: !applying && behind > 0 ? 'text-primary hover:text-primary' : undefined,
-      detail: appVersion && sha && !applying ? sha : undefined,
+      detail: appVersion && sha && !applying && !remote ? sha : undefined,
      hidden: !appVersion && !sha,
      icon: applying ? <Loader2 className="size-3 animate-spin" /> : <Hash className="size-3" />,
-      id: 'version',
+      id: 'version-client',
      label,
-      onSelect: () => setUpdateOverlayOpen(true),
+      onSelect: () => openUpdateOverlayFor('client'),
      title: tooltip || undefined,
      variant: 'action'
    }
  }, [
    desktopVersion?.appVersion,
+    connection?.mode,
    copy,
    updateApply.applying,
    updateApply.message,
@ -223,6 +253,50 @@ export function useStatusbarItems({
    updateStatus?.currentSha
  ])

+  const backendVersionItem = useMemo<StatusbarItem | null>(() => {
+    if (connection?.mode !== 'remote') {
+      return null
+    }
+
+    const backendVersion = statusSnapshot?.version
+    const behind = backendUpdateStatus?.behind ?? 0
+    const applying = backendUpdateApply.applying || backendUpdateApply.stage === 'restart'
+
+    const base = copy.backendLabel(backendVersion ?? copy.unknown)
+    const behindHint = !applying && behind > 0 ? ` (+${behind})` : ''
+
+    const label = applying
+      ? `${base} · ${backendUpdateApply.stage === 'restart' ? copy.restart : copy.update}`
+      : `${base}${behindHint}`
+
+    const tooltip = [
+      applying ? backendUpdateApply.message || copy.updateInProgress : null,
+      !applying && behind > 0 && copy.commitsBehind(behind, 'main'),
+      backendVersion && copy.backendVersion(backendVersion)
+    ]
+      .filter(Boolean)
+      .join(' · ')
+
+    return {
+      className: !applying && behind > 0 ? 'text-primary hover:text-primary' : undefined,
+      hidden: !backendVersion,
+      icon: applying ? <Loader2 className="size-3 animate-spin" /> : <Hash className="size-3" />,
+      id: 'version-backend',
+      label,
+      onSelect: () => openUpdateOverlayFor('backend'),
+      title: tooltip || undefined,
+      variant: 'action'
+    }
+  }, [
+    connection?.mode,
+    statusSnapshot?.version,
+    backendUpdateStatus?.behind,
+    backendUpdateApply.applying,
+    backendUpdateApply.message,
+    backendUpdateApply.stage,
+    copy
+  ])
+
  const coreLeftStatusbarItems = useMemo<readonly StatusbarItem[]>(
    () => [
      {
@ -333,7 +407,7 @@ export function useStatusbarItems({
          <Zap className="size-3.5 shrink-0 opacity-70" />
        ),
        id: 'yolo',
-        onSelect: () => void toggleYolo(),
+        onSelect: modifiers => void toggleYolo(modifiers),
        title: yoloActive ? copy.yoloOn : copy.yoloOff,
        variant: 'action'
      },
@ -368,7 +442,8 @@ export function useStatusbarItems({
              variant: 'action' as const
            })
      },
-      versionItem
+      clientVersionItem,
+      ...(backendVersionItem ? [backendVersionItem] : [])
    ],
    [
      busy,
@ -384,7 +459,8 @@ export function useStatusbarItems({
      showYoloToggle,
      toggleYolo,
      turnStartedAt,
-      versionItem,
+      clientVersionItem,
+      backendVersionItem,
      yoloActive
    ]
  )
--- a/apps/desktop/src/app/shell/model-menu-panel.tsx
+++ b/apps/desktop/src/app/shell/model-menu-panel.tsx
@ -24,6 +24,7 @@ import {
  $visibleModels,
  collapseModelFamilies,
  DEFAULT_VISIBLE_PER_PROVIDER,
+  effectiveVisibleKeys,
  type ModelFamily,
  modelVisibilityKey,
  setModelVisibilityOpen
@ -86,13 +87,17 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
    : null

  const providers = modelOptions.data?.providers
+  const effectiveVisibleModels = useMemo(
+    () => effectiveVisibleKeys(visibleModels, providers ?? []),
+    [visibleModels, providers]
+  )

  const switchTo = (model: string, provider: string) =>
    onSelectModel({ model, persistGlobal: !activeSessionId, provider })

  const groups = useMemo(
-    () => groupModels(providers ?? [], search, { model: optionsModel, provider: optionsProvider }, visibleModels),
-    [providers, search, optionsModel, optionsProvider, visibleModels]
+    () => groupModels(providers ?? [], search, { model: optionsModel, provider: optionsProvider }, effectiveVisibleModels),
+    [providers, search, optionsModel, optionsProvider, effectiveVisibleModels]
  )

  return (
--- a/apps/desktop/src/app/shell/statusbar-controls.tsx
+++ b/apps/desktop/src/app/shell/statusbar-controls.tsx
@ -35,12 +35,16 @@ export interface StatusbarItem {
  menuClassName?: string
  menuContent?: ReactNode
  menuItems?: readonly StatusbarMenuItem[]
-  onSelect?: () => void
+  onSelect?: (modifiers: StatusbarSelectModifiers) => void
  title?: string
  to?: string
  variant?: 'action' | 'link' | 'menu' | 'text'
 }

+export interface StatusbarSelectModifiers {
+  shiftKey: boolean
+}
+
 export type StatusbarItemSide = 'left' | 'right'
 export type SetStatusbarItemGroup = (id: string, items: readonly StatusbarItem[], side?: StatusbarItemSide) => void

@ -170,12 +174,12 @@ function StatusbarItemView({ item, navigate }: { item: StatusbarItem; navigate:
    <button
      className={cn(STATUSBAR_ACTION_CLASS, item.className)}
      disabled={item.disabled}
-      onClick={() => {
+      onClick={event => {
        if (item.to) {
          navigate(item.to)
        }

-        item.onSelect?.()
+        item.onSelect?.({ shiftKey: event.shiftKey })
      }}
      type="button"
    >
--- a/apps/desktop/src/app/updates-overlay.tsx
+++ b/apps/desktop/src/app/updates-overlay.tsx
@ -12,12 +12,19 @@ import { useI18n } from '@/i18n'
 import { buildCommitChangelog, type CommitGroup } from '@/lib/commit-changelog'
 import { AlertCircle, Check, CheckCircle2, Copy, Terminal } from '@/lib/icons'
+import { resolveUpdateCopy, type UpdateTarget } from '@/lib/update-copy'
 import { cn } from '@/lib/utils'
 import {
+  $backendUpdateApply,
+  $backendUpdateChecking,
+  $backendUpdateStatus,
  $updateApply,
  $updateChecking,
  $updateOverlayOpen,
+  $updateOverlayTarget,
  $updateStatus,
+  applyBackendUpdate,
  applyUpdates,
+  checkBackendUpdates,
  checkUpdates,
  resetUpdateApplyState,
  setUpdateOverlayOpen,
@ -30,15 +37,27 @@ function totalItems(groups: readonly CommitGroup[]) {

 export function UpdatesOverlay() {
  const open = useStore($updateOverlayOpen)
-  const status = useStore($updateStatus)
-  const checking = useStore($updateChecking)
-  const apply = useStore($updateApply)
+  const target = useStore($updateOverlayTarget)
+
+  const clientStatus = useStore($updateStatus)
+  const clientChecking = useStore($updateChecking)
+  const clientApply = useStore($updateApply)
+  const backendStatus = useStore($backendUpdateStatus)
+  const backendChecking = useStore($backendUpdateChecking)
+  const backendApply = useStore($backendUpdateApply)
+
+  const isBackend = target === 'backend'
+  const status = isBackend ? backendStatus : clientStatus
+  const checking = isBackend ? backendChecking : clientChecking
+  const apply = isBackend ? backendApply : clientApply
+  const check = isBackend ? checkBackendUpdates : checkUpdates
+  const install = isBackend ? applyBackendUpdate : applyUpdates

  useEffect(() => {
    if (open && !status && !checking) {
-      void checkUpdates()
+      void check()
    }
-  }, [checking, open, status])
+  }, [check, checking, open, status])

  const behind = status?.behind ?? 0

@ -64,7 +83,7 @@ export function UpdatesOverlay() {
  }

  const handleInstall = () => {
-    void applyUpdates()
+    void install()
  }

  return (
@ -73,7 +92,7 @@ export function UpdatesOverlay() {
        className="max-w-sm overflow-hidden border-border/70 p-0 gap-0"
        showCloseButton={phase !== 'applying'}
      >
-        {phase === 'applying' && <ApplyingView apply={apply} />}
+        {phase === 'applying' && <ApplyingView apply={apply} isBackend={isBackend} />}

        {phase === 'manual' && (
          <ManualView command={apply.command ?? 'hermes update'} onDone={() => handleClose(false)} />
@ -90,8 +109,9 @@ export function UpdatesOverlay() {
            commits={status?.commits ?? []}
            onInstall={handleInstall}
            onLater={() => handleClose(false)}
-            onRetryCheck={() => void checkUpdates()}
+            onRetryCheck={() => void check()}
            status={status}
+            target={target}
          />
        )}
      </DialogContent>
@ -106,7 +126,8 @@ function IdleView({
  onInstall,
  onLater,
  onRetryCheck,
-  status
+  status,
+  target
 }: {
  behind: number
  checking: boolean
@ -115,6 +136,7 @@ function IdleView({
  onLater: () => void
  onRetryCheck: () => void
  status: DesktopUpdateStatus | null
+  target: UpdateTarget
 }) {
  const { t } = useI18n()
  const u = t.updates
@ -167,7 +189,7 @@ function IdleView({
  if (behind === 0) {
    return (
      <CenteredStatus
-        body={u.latestBody}
+        body={target === 'backend' ? u.latestBodyBackend : u.latestBody}
        icon={<CheckCircle2 className="size-7 text-emerald-600 dark:text-emerald-400" />}
        title={u.allSetTitle}
      />
@ -178,14 +200,20 @@ function IdleView({
  const shownItems = totalItems(groups)
  const remaining = Math.max(0, behind - shownItems)

+  // Name what's being updated. In remote mode the overlay acts on the connected
+  // backend, not the local client — say so. When there are no commit rows to
+  // show (e.g. pip/non-git backend), degrade to honest "no release notes" copy
+  // instead of generic filler.
+  const { title, body } = resolveUpdateCopy({ target, shownItems, copy: u })
+
  return (
    <div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
      <div className="flex flex-col items-center gap-3 text-center">
        <BrandMark className="size-16" />

-        <DialogTitle className="text-center text-xl">{u.availableTitle}</DialogTitle>
+        <DialogTitle className="text-center text-xl">{title}</DialogTitle>
        <DialogDescription className="text-center text-sm">
-          {u.availableBody}
+          {body}
        </DialogDescription>
      </div>

@ -281,10 +309,11 @@ function ManualView({ command, onDone }: { command: string; onDone: () => void }
  )
 }

-function ApplyingView({ apply }: { apply: UpdateApplyState }) {
+function ApplyingView({ apply, isBackend }: { apply: UpdateApplyState; isBackend: boolean }) {
  const { t } = useI18n()
  const u = t.updates
  const label = u.stages[apply.stage as DesktopUpdateStage] ?? u.stages.idle
+  const body = isBackend ? u.applyingBodyBackend : u.applyingBody

  const percent =
    typeof apply.percent === 'number' && Number.isFinite(apply.percent)
@ -298,7 +327,7 @@ function ApplyingView({ apply }: { apply: UpdateApplyState }) {

        <DialogTitle className="text-center text-xl">{label}</DialogTitle>
        <DialogDescription className="text-center text-sm">
-          {u.applyingBody}
+          {body}
        </DialogDescription>
      </div>

--- a/apps/desktop/src/components/assistant-ui/markdown-text.tsx
+++ b/apps/desktop/src/components/assistant-ui/markdown-text.tsx
@ -425,7 +425,7 @@ function MarkdownTextSurface({ containerClassName, containerProps }: MarkdownTex
          <div className="aui-md-table my-2 max-w-full overflow-x-auto rounded-[0.375rem] border border-border">
            <table
              className={cn(
-                'm-0 w-full border-collapse text-[0.8125rem] [&_tr]:border-b [&_tr]:border-border last:[&_tr]:border-0',
+                'm-0 w-full min-w-[18rem] border-collapse text-[0.8125rem] [&_tr]:border-b [&_tr]:border-border last:[&_tr]:border-0',
                className
              )}
              {...props}
@ -438,7 +438,7 @@ function MarkdownTextSurface({ containerClassName, containerProps }: MarkdownTex
        th: ({ className, ...props }: ComponentProps<'th'>) => (
          <th
            className={cn(
-              'px-2.5 py-1.5 text-left align-middle text-[0.75rem] font-medium text-muted-foreground',
+              'whitespace-nowrap px-2.5 py-1.5 text-left align-middle text-[0.75rem] font-medium text-muted-foreground',
              className
            )}
            {...props}
--- a/apps/desktop/src/components/assistant-ui/streaming.test.tsx
+++ b/apps/desktop/src/components/assistant-ui/streaming.test.tsx
@ -489,7 +489,7 @@ describe('assistant-ui streaming renderer', () => {
    expect(viewport.scrollTop).toBe(420)
  })

-  it('keeps sticky-bottom armed through viewport height changes during streaming', async () => {
+  it('does not follow streaming content growth even while parked at the bottom', async () => {
    const { container } = render(<StreamingHarness />)

    const content = container.querySelector('[data-slot="aui_thread-content"]') as HTMLDivElement
@ -508,6 +508,7 @@ describe('assistant-ui streaming renderer', () => {

    await wait(80)

+    // Park the user at the bottom of the current content.
    await act(async () => {
      viewport.scrollTop = 800
      fireEvent.scroll(viewport)
@ -520,6 +521,9 @@ describe('assistant-ui streaming renderer', () => {
      fireEvent.scroll(viewport)
    })

+    // Content grows as tokens stream in. Streaming auto-follow is removed, so
+    // the viewport must NOT chase the new bottom — it stays where the user
+    // last left it.
    scrollHeight = 1_200

    await act(async () => {
@ -529,7 +533,7 @@ describe('assistant-ui streaming renderer', () => {
    })
    await wait(0)

-    expect(viewport.scrollTop).toBe(1_200)
+    expect(viewport.scrollTop).toBe(760)
  })

  it('honors the first upward wheel scroll even when a programmatic bottom-pin scroll event is still pending', async () => {
@ -566,7 +570,7 @@ describe('assistant-ui streaming renderer', () => {
    expect(viewport.scrollTop).toBe(420)
  })

-  it('keeps following final code-highlight growth when a run completes at bottom', async () => {
+  it('does not snap to the bottom on final code-highlight growth after a run completes', async () => {
    const { container } = render(<StreamingHarness />)

    const content = container.querySelector('[data-slot="aui_thread-content"]') as HTMLDivElement
@ -588,10 +592,13 @@ describe('assistant-ui streaming renderer', () => {

    await wait(650)

+    // Completion re-measures (Shiki highlight) and grows the content. The
+    // post-run bottom lock is removed, so the viewport stays put instead of
+    // snapping to the new bottom.
    scrollHeight = 1_700
    await wait(0)

-    expect(viewport.scrollTop).toBe(1_700)
+    expect(viewport.scrollTop).toBe(800)
  })

  it('does not restart bottom-follow after completion when the user scrolled up', async () => {
--- a/apps/desktop/src/components/assistant-ui/thread-virtualizer.tsx
+++ b/apps/desktop/src/components/assistant-ui/thread-virtualizer.tsx
@ -19,7 +19,6 @@ import { setThreadScrolledUp } from '@/store/thread-scroll'
 const ESTIMATED_ITEM_HEIGHT = 220
 const OVERSCAN = 4
 const AT_BOTTOM_THRESHOLD = 4
-const POST_RUN_BOTTOM_LOCK_MS = 1_200

 type ThreadMessageComponents = ComponentProps<typeof ThreadPrimitive.MessageByIndex>['components']

@ -265,8 +264,27 @@ function useThreadScrollAnchor({
      return
    }

+    // Already parked at the bottom: writing `scrollTop` is a no-op and the
+    // browser fires NO scroll event, so arming the programmatic gate here would
+    // leave it set until some later scroll event drains it. The next genuine
+    // user scroll-up would then be misread as one of our programmatic scrolls,
+    // re-arming sticky-bottom and yanking the viewport back down. Refresh the
+    // trackers and bail.
+    const distFromBottom = el.scrollHeight - (el.scrollTop + el.clientHeight)
+
+    if (distFromBottom <= AT_BOTTOM_THRESHOLD) {
+      lastTopRef.current = el.scrollTop
+      lastHeightRef.current = el.scrollHeight
+      lastClientHeightRef.current = el.clientHeight
+
+      return
+    }
+
    // Hold the disarm gate across the scroll event the next line will fire.
-    programmaticScrollPendingRef.current += 1
+    // Set to 1 rather than incrementing: coalesced writes within a frame fire a
+    // single scroll event, so a counter > 1 can never drain and would swallow a
+    // later real user scroll.
+    programmaticScrollPendingRef.current = 1
    scrollElementToBottom(el)
    lastTopRef.current = el.scrollTop
    lastHeightRef.current = el.scrollHeight
@ -369,51 +387,15 @@ function useThreadScrollAnchor({
    }
  }, [scrollerRef, stickyBottomRef])

-  // Follow content growth (streaming, item measurements, loading indicator)
-  // while armed. During fast streaming the ResizeObserver can fire many
-  // times per frame as Streamdown re-tokenizes; coalesce to one pin per
-  // animation frame so we don't run the scroll-event/re-pin chain
-  // (~20+ ms self in `Virtualizer.getMaxScrollOffset`) several times per
-  // token.
-  useEffect(() => {
-    if (!enabled || !isRunning) {
-      return undefined
-    }
-
-    const el = scrollerRef.current
-
-    if (!el) {
-      return undefined
-    }
-
-    let pinRafScheduled = false
-
-    const schedulePin = () => {
-      if (pinRafScheduled || !stickyBottomRef.current) {
-        return
-      }
-
-      pinRafScheduled = true
-      requestAnimationFrame(() => {
-        pinRafScheduled = false
-
-        if (stickyBottomRef.current) {
-          pinToBottom()
-        }
-      })
-    }
-
-    const observer = new ResizeObserver(schedulePin)
-
-    // Observe ONLY the content (firstElementChild), not the scroller `el`
-    // itself. Resizes of the viewport/scroller (window resize, devtools
-    // panel toggle) shouldn't trigger a pin — only content growth should.
-    if (el.firstElementChild) {
-      observer.observe(el.firstElementChild)
-    }
-
-    return () => observer.disconnect()
-  }, [enabled, isRunning, pinToBottom, scrollerRef, stickyBottomRef])
+  // Intentionally NO streaming auto-follow. Earlier builds ran a
+  // ResizeObserver here that re-pinned the viewport to the bottom on every
+  // content growth while a turn was running, so the chat tracked tokens as
+  // they streamed. That behavior is removed by request: once a turn is in
+  // flight the viewport stays exactly where the user left it. The viewport
+  // is still moved to the bottom ONCE per user submit / new turn / session
+  // change (see the layout effect and the session-change effect below) so a
+  // freshly submitted message lands in view — but it does not chase the
+  // stream afterward.

  // Jump to bottom on session change OR when an empty thread first gets
  // content. Both share the same intent and the same effect.
@ -429,22 +411,21 @@ function useThreadScrollAnchor({
    }
  }, [enabled, groupCount, jumpToBottom, sessionKey])

-  // Pre-paint pin: when groupCount increases while armed (optimistic user
-  // message insert, streaming assistant turn arriving, etc.), pin BEFORE
-  // the browser commits the layout to screen. Using useLayoutEffect rather
-  // than useEffect so this runs synchronously after React commits the DOM
-  // mutation but before the browser paints. Without this, there's a ~50ms
-  // visual window where the new message sits below the fold while we wait
-  // for the ResizeObserver / scroll event chain to fire and re-pin.
+  // Pre-paint pin: when groupCount increases while armed (a user submit or
+  // the start of an assistant turn), pin BEFORE the browser commits the
+  // layout to screen. Using useLayoutEffect rather than useEffect so this
+  // runs synchronously after React commits the DOM mutation but before the
+  // browser paints. Without this, there's a ~50ms visual window where the
+  // new message sits below the fold.
  //
  // We pin TWICE in this critical path — once synchronously, then once on
  // the next rAF. The second pin catches the case where React mounts the
  // new message in the second commit (after our layout effect ran), which
  // grows scrollHeight again; without the rAF pin the user briefly sees a
-  // ~15 px gap below the new message until the RO catches up. Streaming
-  // tokens use the rate-limited RO path only; only the group-count change
-  // (which fires once per user submit / new turn arrival) pays for the
-  // extra pin.
+  // ~15 px gap below the new message. This fires once per user submit / new
+  // turn arrival — it is NOT streaming-token follow (that path is removed
+  // above), so a turn that streams a long response after this initial jump
+  // will not chase the bottom.
  const prevGroupCountForLayoutRef = useRef(groupCount)
  useLayoutEffect(() => {
    if (!enabled) {
@ -468,45 +449,17 @@ function useThreadScrollAnchor({
    prevGroupCountForLayoutRef.current = groupCount
  }, [enabled, groupCount, pinToBottom, stickyBottomRef])

-  // Completion swaps streaming placeholders/plain code for final rendered DOM
-  // (notably Shiki-highlighted code). Keep following the bottom briefly after
-  // `isRunning` flips false so that final measurement pass cannot strand the
-  // viewport near the top of a large code block.
+  // Intentionally NO post-run bottom lock. Earlier builds kept pinning to
+  // the bottom for POST_RUN_BOTTOM_LOCK_MS after `isRunning` flipped false to
+  // chase the final Shiki re-highlight measurement. With streaming follow
+  // gone, re-pinning at completion would yank the viewport back to the bottom
+  // even when the user is reading earlier content, the opposite of what's
+  // wanted. The one-time submit / new-turn jump already covers landing a
+  // fresh message in view.
  const prevIsRunningForLayoutRef = useRef(isRunning)
  useLayoutEffect(() => {
-    const finishedRun = prevIsRunningForLayoutRef.current && !isRunning
    prevIsRunningForLayoutRef.current = isRunning
-
-    if (!enabled || !finishedRun || !stickyBottomRef.current) {
-      return undefined
-    }
-
-    const lockUntil = performance.now() + POST_RUN_BOTTOM_LOCK_MS
-    let lockRaf: number | null = null
-
-    const lockFrame = () => {
-      lockRaf = null
-
-      if (!stickyBottomRef.current) {
-        return
-      }
-
-      pinToBottom()
-
-      if (performance.now() < lockUntil) {
-        lockRaf = requestAnimationFrame(lockFrame)
-      }
-    }
-
-    pinToBottom()
-    lockRaf = requestAnimationFrame(lockFrame)
-
-    return () => {
-      if (lockRaf !== null) {
-        cancelAnimationFrame(lockRaf)
-      }
-    }
-  }, [enabled, isRunning, pinToBottom, stickyBottomRef])
+  }, [isRunning])

  useAuiEvent('thread.runStart', jumpToBottom)
 }
--- a/apps/desktop/src/components/assistant-ui/thread.tsx
+++ b/apps/desktop/src/components/assistant-ui/thread.tsx
@ -150,10 +150,7 @@ export const Thread: FC<{
  )

  const emptyPlaceholder = intro ? (
-    <div
-      className="flex min-h-0 w-full flex-col items-center justify-center"
-      style={{ paddingBottom: 'var(--composer-measured-height)' }}
-    >
+    <div className="flex min-h-0 w-full flex-col items-center justify-center pt-[var(--composer-measured-height)]">
      <Intro {...intro} />
    </div>
  ) : undefined
@ -470,9 +467,7 @@ const ReasoningAccordionGroup: FC<{ children?: ReactNode; endIndex: number; star
    s =>
      s.thread.isRunning &&
      s.message.status?.type === 'running' &&
-      s.message.parts
-        .slice(Math.max(0, startIndex))
-        .some(p => p?.type === 'reasoning' && p.status?.type !== 'complete')
+      s.message.parts.slice(Math.max(0, startIndex)).some(p => p?.type === 'reasoning' && p.status?.type !== 'complete')
  )

  // A reasoning group with no actual text is pure noise — drop the whole
--- a/apps/desktop/src/components/chat/intro.tsx
+++ b/apps/desktop/src/components/chat/intro.tsx
@ -160,14 +160,14 @@ export function Intro({ personality, seed }: IntroProps) {

  return (
    <div
-      className="pointer-events-none flex w-full min-w-0 flex-col items-center justify-center px-3 py-6 text-center text-muted-foreground sm:px-6 lg:px-8"
+      className="pointer-events-none flex w-full min-w-0 flex-col items-center justify-center px-0.5 py-6 text-center text-muted-foreground sm:px-6 lg:px-8"
      data-slot="aui_intro"
    >
      <div className="w-full min-w-0">
        <p
          aria-label={WORDMARK}
-          className="fit-text mx-auto mb-3 w-[88%] font-['Collapse'] font-bold uppercase leading-[0.9] tracking-[0.08em] text-midground mix-blend-plus-lighter dark:text-foreground/90"
-          style={{ '--fit-text-line-height': '0.9', '--fit-text-min': '2.75rem' } as CSSProperties}
+          className="fit-text mx-auto mb-1 w-[calc(100%-1rem)] font-['Collapse'] font-bold uppercase leading-[0.9] tracking-[0.08em] text-midground mix-blend-plus-lighter dark:text-foreground/90"
+          style={{ '--fit-min': '2.75rem' } as CSSProperties}
        >
          <span>
            <span>{WORDMARK}</span>
--- a/apps/desktop/src/components/pane-shell/index.ts
+++ b/apps/desktop/src/components/pane-shell/index.ts
@ -1,4 +1,4 @@
 export type { PaneShellContextValue, PaneSlot } from './context'
 export { PaneShellContext } from './context'
-export { Pane, PaneMain, PaneShell } from './pane-shell'
+export { Pane, PANE_TOGGLE_REVEAL_EVENT, PaneMain, PaneShell } from './pane-shell'
 export type { PaneMainProps, PaneProps, PaneShellProps } from './pane-shell'
--- a/apps/desktop/src/components/pane-shell/pane-shell.tsx
+++ b/apps/desktop/src/components/pane-shell/pane-shell.tsx
@ -10,7 +10,8 @@ import {
  useContext,
  useEffect,
  useMemo,
-  useRef
+  useRef,
+  useState
 } from 'react'

 import { cn } from '@/lib/utils'
@ -31,6 +32,12 @@ export interface PaneProps {
  defaultOpen?: boolean
  /** Forces the pane closed (track→0, aria-hidden) without writing to the store — for transient route gates. */
  disabled?: boolean
+  /** Like disabled, but keeps hoverReveal alive — collapses the track without writing to the store (e.g. narrow window). */
+  forceCollapsed?: boolean
+  /** When collapsed, float the contents over the main column on hover/focus instead of hiding them (track stays 0px). */
+  hoverReveal?: boolean
+  /** Called with true while the pane is a collapsed hover-reveal overlay, so the consumer can keep contents mounted (ready to slide). */
+  onOverlayActiveChange?: (overlayActive: boolean) => void
  id: string
  maxWidth?: WidthValue
  minWidth?: WidthValue
@ -53,6 +60,7 @@ export interface PaneShellProps {
 interface CollectedPane {
  defaultOpen: boolean
  disabled: boolean
+  forceCollapsed: boolean
  id: string
  resizable: boolean
  side: PaneSide
@ -62,6 +70,22 @@ interface CollectedPane {
 const DEFAULT_WIDTH = '16rem'
 const DEFAULT_RESIZE_MIN_WIDTH = 160

+// Hover-reveal slide. The enter delay is a pure-CSS hover-intent gate: a fast
+// pass-by doesn't dwell on the trigger long enough for the delay to elapse.
+const HOVER_REVEAL_SLIDE_MS = 220
+const HOVER_REVEAL_ENTER_DELAY_MS = 130
+const HOVER_REVEAL_EASE = 'cubic-bezier(0.32,0.72,0,1)'
+// Offset shadow lifting the revealed panel off the content (same both sides;
+// the mirror axis is offset-x, which is 0). Same color on light + dark.
+const HOVER_REVEAL_SHADOW = '0px -18px 18px -5px #00000012'
+// Edge trigger strip, inset past the OS window-resize grab area.
+const HOVER_REVEAL_TRIGGER_WIDTH = 14
+const HOVER_REVEAL_EDGE_GUTTER = 6
+
+// Fired (window CustomEvent<{ id }>) to toggle a force-collapsed pane's reveal
+// from the keyboard, since its store-open toggle is a no-op while collapsed.
+export const PANE_TOGGLE_REVEAL_EVENT = 'hermes:pane-toggle-reveal'
+
 const widthToCss = (value: WidthValue | undefined, fallback: string) =>
  value === undefined ? fallback : typeof value === 'number' ? `${value}px` : value

@ -110,6 +134,7 @@ function collectPanes(children: ReactNode) {
    const entry: CollectedPane = {
      defaultOpen: props.defaultOpen ?? true,
      disabled: props.disabled ?? false,
+      forceCollapsed: props.forceCollapsed ?? false,
      id: props.id,
      resizable: props.resizable ?? false,
      side: props.side,
@ -124,7 +149,7 @@ function collectPanes(children: ReactNode) {

 function trackForPane(pane: CollectedPane, states: Record<string, { open: boolean; widthOverride?: number }>) {
  const stateOpen = states[pane.id]?.open ?? pane.defaultOpen
-  const open = !pane.disabled && stateOpen
+  const open = !pane.disabled && !pane.forceCollapsed && stateOpen

  if (!open) {
    return { open: false, track: '0px' }
@ -193,14 +218,29 @@ export function Pane({
  className,
  defaultOpen = true,
  disabled = false,
+  hoverReveal = false,
  id,
  maxWidth,
  minWidth,
-  resizable = false
+  onOverlayActiveChange,
+  resizable = false,
+  width
 }: PaneProps) {
  const ctx = useContext(PaneShellContext)
+  const paneStates = useStore($paneStates)
  const registered = useRef(false)
  const paneRef = useRef<HTMLDivElement | null>(null)
+  // Keyboard (mod+b / mod+j) pins the reveal open while collapsed; hover is CSS.
+  const [forced, setForced] = useState(false)
+
+  const slot = ctx?.paneById.get(id)
+  const open = Boolean(slot?.open && !disabled)
+  const side = slot?.side ?? 'left'
+  // Collapsed + hoverReveal: float the pane contents over the main column on
+  // hover/focus instead of hiding them. Honors any persisted resize width.
+  const overlayActive = !open && hoverReveal && !disabled
+  const override = resizable ? paneStates[id]?.widthOverride : undefined
+  const overlayWidth = override !== undefined ? `${override}px` : widthToCss(width, DEFAULT_WIDTH)

  useEffect(() => {
    if (registered.current) {
@ -211,12 +251,34 @@ export function Pane({
    ensurePaneRegistered(id, { open: defaultOpen })
  }, [defaultOpen, id])

-  const slot = ctx?.paneById.get(id)
-  const open = Boolean(slot?.open && !disabled)
+  // Keyboard toggle pins/unpins the reveal while collapsed; clear when no longer
+  // a collapsed overlay (reopened / widened).
+  useEffect(() => {
+    if (typeof window === 'undefined' || !overlayActive) {
+      setForced(false)
+
+      return
+    }
+
+    const onToggle = (e: Event) => {
+      if ((e as CustomEvent<{ id: string }>).detail?.id === id) {
+        setForced(v => !v)
+      }
+    }
+
+    window.addEventListener(PANE_TOGGLE_REVEAL_EVENT, onToggle)
+
+    return () => window.removeEventListener(PANE_TOGGLE_REVEAL_EVENT, onToggle)
+  }, [id, overlayActive])
+
+  // Report overlay state so the consumer can keep contents mounted while collapsed; reveal then stays a pure CSS transform.
+  useEffect(() => {
+    onOverlayActiveChange?.(overlayActive)
+  }, [onOverlayActiveChange, overlayActive])
+
  const canResize = open && resizable
  const lo = widthToPx(minWidth) ?? DEFAULT_RESIZE_MIN_WIDTH
  const hi = widthToPx(maxWidth) ?? Number.POSITIVE_INFINITY
-  const side = slot?.side ?? 'left'

  const startResize = useCallback(
    (event: ReactPointerEvent<HTMLDivElement>) => {
@ -273,6 +335,58 @@ export function Pane({
    return null
  }

+  // Collapsed hover-reveal track: a 0px, pointer-transparent grid cell holding a
+  // thin edge trigger + the floating panel (both absolute, escaping the zero
+  // box). group-hover (or data-forced from the keyboard) drives the slide; the
+  // enter-delay is the hover-intent gate. No JS pointer math.
+  if (overlayActive) {
+    const edge = side === 'left' ? 'left' : 'right'
+    const offscreen = side === 'left' ? '-translate-x-[calc(100%+1rem)]' : 'translate-x-[calc(100%+1rem)]'
+
+    return (
+      <div
+        className={cn('group/reveal pointer-events-none relative row-start-1 min-w-0', className)}
+        data-forced={forced ? '' : undefined}
+        data-pane-hover-reveal={forced ? 'open' : 'closed'}
+        data-pane-id={id}
+        data-pane-open="false"
+        data-pane-side={side}
+        ref={paneRef}
+        style={{ gridColumn: `${slot.column} / ${slot.column + 1}` }}
+      >
+        <div
+          aria-hidden="true"
+          className="pointer-events-auto absolute inset-y-0 z-30 [-webkit-app-region:no-drag]"
+          style={{ [edge]: HOVER_REVEAL_EDGE_GUTTER, width: HOVER_REVEAL_TRIGGER_WIDTH }}
+        />
+
+        {/* Keyed on side so flipping panes remounts off-screen on the new edge
+            instead of transitioning the transform across the viewport. */}
+        <div
+          className={cn(
+            'pointer-events-none absolute inset-y-0 z-30 overflow-hidden transition-transform delay-0',
+            offscreen,
+            'group-hover/reveal:pointer-events-auto group-hover/reveal:translate-x-0 group-hover/reveal:delay-[var(--reveal-enter-delay)] group-hover/reveal:shadow-[var(--reveal-shadow)]',
+            'group-data-[forced]/reveal:pointer-events-auto group-data-[forced]/reveal:translate-x-0 group-data-[forced]/reveal:delay-0 group-data-[forced]/reveal:shadow-[var(--reveal-shadow)]'
+          )}
+          key={edge}
+          style={
+            {
+              [edge]: 0,
+              width: overlayWidth,
+              '--reveal-shadow': HOVER_REVEAL_SHADOW,
+              transitionDuration: `${HOVER_REVEAL_SLIDE_MS}ms`,
+              transitionTimingFunction: HOVER_REVEAL_EASE,
+              '--reveal-enter-delay': `${HOVER_REVEAL_ENTER_DELAY_MS}ms`
+            } as CSSProperties
+          }
+        >
+          <div className="flex h-full w-full flex-col">{children}</div>
+        </div>
+      </div>
+    )
+  }
+
  return (
    <div
      aria-hidden={!open}
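Reviewer note: the `PANE_TOGGLE_REVEAL_EVENT` listener added above only reacts to a `CustomEvent` whose `detail.id` matches the pane's `id`. A minimal sketch of the dispatching side follows, assuming a keyboard handler elsewhere calls it. The helper name `togglePaneReveal` and the explicit `target` parameter are hypothetical and not part of this diff; in the renderer the target would presumably be `window`.

```typescript
// Same event name as the constant exported from pane-shell.tsx in this diff.
const PANE_TOGGLE_REVEAL_EVENT = 'hermes:pane-toggle-reveal'

// Hypothetical helper (not in the diff): asks the pane with the given id to
// pin or unpin its hover-reveal overlay. The target is passed in explicitly so
// the sketch also runs outside a browser window.
function togglePaneReveal(target: EventTarget, id: string): void {
  target.dispatchEvent(new CustomEvent<{ id: string }>(PANE_TOGGLE_REVEAL_EVENT, { detail: { id } }))
}

// Usage sketch: a listener shaped like the one in Pane, filtering on detail.id.
const demoTarget = new EventTarget()
const toggled: string[] = []

demoTarget.addEventListener(PANE_TOGGLE_REVEAL_EVENT, (e: Event) => {
  const id = (e as CustomEvent<{ id: string }>).detail?.id

  if (id === 'sidebar') {
    toggled.push(id)
  }
})

togglePaneReveal(demoTarget, 'sidebar') // matches, listener records it
togglePaneReveal(demoTarget, 'inspector') // different id, listener ignores it
```

Because the pane only subscribes while `overlayActive` is true, dispatching the event for an open pane should be a no-op under this design.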
--- a/apps/desktop/src/global.d.ts
+++ b/apps/desktop/src/global.d.ts
@ -7,6 +7,13 @@ declare global {
      // the window's backend; pass a named profile to lazily spawn/reuse that
      // profile's backend from the pool.
      getConnection: (profile?: string | null) => Promise<HermesConnection>
+      // Reconnect-after-wake recovery: liveness-probe the cached PRIMARY backend
+      // and drop it if a remote one has gone unreachable, so the next
+      // getConnection() rebuilds a reachable descriptor instead of the renderer
+      // re-dialing a dead remote forever. No-op for local backends (they
+      // self-heal via the child 'exit' handler). `rebuilt` is true when a stale
+      // remote cache was dropped.
+      revalidateConnection: () => Promise<{ ok: boolean; rebuilt: boolean }>
      // Keepalive: mark a pool profile backend as recently used so the idle
      // reaper spares it while its chat is active.
      touchBackend: (profile?: string | null) => Promise<{ ok: boolean }>
--- a/apps/desktop/src/hermes.ts
+++ b/apps/desktop/src/hermes.ts
@ -7,6 +7,7 @@ import type {
  AudioSpeakResponse,
  AudioTranscriptionResponse,
  AuxiliaryModelsResponse,
+  BackendUpdateCheckResponse,
  ConfigSchemaResponse,
  CronJob,
  CronJobCreatePayload,
@ -53,6 +54,7 @@ export type {
  AnalyticsSkillEntry,
  AnalyticsSkillsSummary,
  AnalyticsTotals,
+  BackendUpdateCheckResponse,
  AudioSpeakResponse,
  AudioTranscriptionResponse,
  AuxiliaryModelsResponse,
@ -686,6 +688,15 @@ export function updateHermes(): Promise<ActionResponse> {
  })
 }

+/** Query the connected backend's own update state. In remote mode this is the
+ *  authoritative source for the backend's behind-count + "what's changed",
+ *  distinct from the Electron client clone's git state. */
+export function checkHermesUpdate(force = false): Promise<BackendUpdateCheckResponse> {
+  return window.hermesDesktop.api<BackendUpdateCheckResponse>({
+    path: `/api/hermes/update/check${force ? '?force=true' : ''}`
+  })
+}
+
 export function getActionStatus(name: string, lines = 200): Promise<ActionStatusResponse> {
  return window.hermesDesktop.api<ActionStatusResponse>({
    path: `/api/actions/${encodeURIComponent(name)}/status?lines=${Math.max(1, lines)}`
--- a/apps/desktop/src/i18n/en.ts
+++ b/apps/desktop/src/i18n/en.ts
@ -292,7 +292,8 @@ export const en: Translations = {
      technical: 'Technical',
      technicalDesc: 'Include raw tool args/results and low-level details.',
      themeTitle: 'Theme',
-      themeDesc: 'Desktop palettes only. The selected mode is applied on top.'
+      themeDesc: 'Desktop palettes only. The selected mode is applied on top.',
+      themeProfileNote: profile => `Saved for the ${profile} profile — each profile keeps its own theme.`
    },
    fieldLabels: FIELD_LABELS,
    fieldDescriptions: FIELD_DESCRIPTIONS,
@ -1237,9 +1238,13 @@ export const en: Translations = {
    unsupportedMessage: 'This version of Hermes can’t update itself from inside the app.',
    connectionRetry: 'Check your connection and try again.',
    latestBody: 'You’re running the latest version.',
+    latestBodyBackend: 'The backend is running the latest version.',
    allSetTitle: 'You’re all set',
    availableTitle: 'New update available',
    availableBody: 'A new version of Hermes is ready to install.',
+    availableTitleBackend: 'Backend update available',
+    availableBodyBackend: 'A newer version of the connected Hermes backend is ready to install.',
+    availableBodyNoChangelog: 'A newer version is ready. Release notes aren’t available for this install type.',
    updateNow: 'Update now',
    maybeLater: 'Maybe later',
    moreChanges: count => `+ ${count} more change${count === 1 ? '' : 's'} included.`,
@ -1250,10 +1255,19 @@ export const en: Translations = {
    copied: 'Copied',
    done: 'Done',
    applyingBody: 'The Hermes updater will take over in its own window and reopen Hermes when it’s done.',
+    applyingBodyBackend: 'The remote backend is applying the update and will restart. Hermes reconnects automatically when it’s back.',
    applyingClose: 'Hermes will close to apply the update.',
    errorTitle: 'Update didn’t finish',
    errorBody: 'No worries — nothing was lost. You can try again now.',
-    notNow: 'Not now'
+    notNow: 'Not now',
+    applyStatus: {
+      preparing: 'Updating backend…',
+      pulling: 'Backend updating…',
+      restarting: 'Backend restarting to load the update…',
+      notAvailable: 'Update not available for this backend.',
+      failed: 'Backend update failed.',
+      noReturn: 'Backend didn’t come back online. The update may not have completed — check the backend host.'
+    }
  },

  install: {
@ -1439,6 +1453,9 @@ export const en: Translations = {
      updateInProgress: 'Update in progress',
      commitsBehind: (count, branch) => `${count} commit${count === 1 ? '' : 's'} behind ${branch}`,
      desktopVersion: version => `Hermes Desktop v${version}`,
+      backendVersion: version => `Backend v${version}`,
+      clientLabel: version => `client v${version}`,
+      backendLabel: version => `backend v${version}`,
      commit: sha => `commit ${sha}`,
      branch: branch => `branch ${branch}`,
      closeCommandCenter: 'Close Command Center',
@ -1463,8 +1480,8 @@ export const en: Translations = {
      contextUsage: 'Context usage',
      session: 'Session',
      runtimeSessionElapsed: 'Runtime session elapsed',
-      yoloOn: 'YOLO on — auto-approving dangerous commands. Click to turn off.',
-      yoloOff: 'YOLO off — click to auto-approve dangerous commands.',
+      yoloOn: 'YOLO on — auto-approving dangerous commands. Click to turn off. Shift+click toggles it globally.',
+      yoloOff: 'YOLO off — click to auto-approve dangerous commands. Shift+click toggles it globally.',
      modelNone: 'none',
      noModel: 'no model',
      switchModel: 'Switch model',
--- a/apps/desktop/src/i18n/ja.ts
+++ b/apps/desktop/src/i18n/ja.ts
@ -215,7 +215,8 @@ export const ja = defineLocale({
      technical: 'テクニカル',
      technicalDesc: '生のツール引数、結果、低レベルの詳細を含めます。',
      themeTitle: 'テーマ',
-      themeDesc: 'デスクトップ専用のパレットです。選択したモードの上に適用されます。'
+      themeDesc: 'デスクトップ専用のパレットです。選択したモードの上に適用されます。',
+      themeProfileNote: profile => `「${profile}」プロファイルに保存されます。プロファイルごとに個別のテーマを保持します。`
    },
    fieldLabels: defineFieldCopy({
      model: 'デフォルトモデル',
@ -1378,9 +1379,13 @@ export const ja = defineLocale({
    unsupportedMessage: 'このバージョンの Hermes はアプリ内から自分を更新できません。',
    connectionRetry: '接続を確認してもう一度試してください。',
    latestBody: '最新バージョンを実行しています。',
+    latestBodyBackend: 'バックエンドは最新バージョンを実行しています。',
    allSetTitle: '準備完了',
    availableTitle: '新しい更新が利用可能',
    availableBody: '新しいバージョンの Hermes をインストールする準備ができています。',
+    availableTitleBackend: 'バックエンドの更新があります',
+    availableBodyBackend: '接続中の Hermes バックエンドの新しいバージョンをインストールできます。',
+    availableBodyNoChangelog: '新しいバージョンを利用できます。このインストール形式ではリリースノートは表示できません。',
    updateNow: '今すぐ更新',
    maybeLater: '後で',
    moreChanges: count => `さらに ${count} 件の変更が含まれています。`,
@ -1392,10 +1397,19 @@ export const ja = defineLocale({
    copied: 'コピーしました',
    done: '完了',
    applyingBody: 'Hermes アップデーターが独自のウィンドウで引き継ぎ、完了後に Hermes を再度開きます。',
+    applyingBodyBackend: 'リモートバックエンドが更新を適用して再起動します。復帰すると Hermes が自動的に再接続します。',
    applyingClose: 'Hermes は更新を適用するために閉じます。',
    errorTitle: '更新が完了しませんでした',
    errorBody: 'ご安心ください。何も失われていません。今すぐ再試行できます。',
-    notNow: '今は後で'
+    notNow: '今は後で',
+    applyStatus: {
+      preparing: 'バックエンドを更新しています…',
+      pulling: 'バックエンドを更新中…',
+      restarting: 'バックエンドが更新を読み込むため再起動しています…',
+      notAvailable: 'このバックエンドでは更新を利用できません。',
+      failed: 'バックエンドの更新に失敗しました。',
+      noReturn: 'バックエンドがオンラインに戻りませんでした。更新が完了していない可能性があります。バックエンドホストを確認してください。'
+    }
  },

  install: {
@ -1582,6 +1596,9 @@ export const ja = defineLocale({
      updateInProgress: '更新中',
      commitsBehind: (count, branch) => `${branch} より ${count} コミット遅れています`,
      desktopVersion: version => `Hermes Desktop v${version}`,
+      backendVersion: version => `バックエンド v${version}`,
+      clientLabel: version => `クライアント v${version}`,
+      backendLabel: version => `バックエンド v${version}`,
      commit: sha => `コミット ${sha}`,
      branch: branch => `ブランチ ${branch}`,
      closeCommandCenter: 'コマンドセンターを閉じる',
@ -1606,8 +1623,8 @@ export const ja = defineLocale({
      contextUsage: 'コンテキスト使用状況',
      session: 'セッション',
      runtimeSessionElapsed: 'ランタイムセッション経過時間',
-      yoloOn: 'YOLO オン — 危険なコマンドを自動承認中。クリックでオフに。',
-      yoloOff: 'YOLO オフ — クリックで危険なコマンドを自動承認。',
+      yoloOn: 'YOLO オン — 危険なコマンドを自動承認中。クリックでオフに。Shift+クリックで全体に切り替え。',
+      yoloOff: 'YOLO オフ — クリックで危険なコマンドを自動承認。Shift+クリックで全体に切り替え。',
      modelNone: 'なし',
      noModel: 'モデルなし',
      switchModel: 'モデルを切り替え',
--- a/apps/desktop/src/i18n/types.ts
+++ b/apps/desktop/src/i18n/types.ts
@ -219,6 +219,7 @@ export interface Translations {
      technicalDesc: string
      themeTitle: string
      themeDesc: string
+      themeProfileNote: (profile: string) => string
    }
    fieldLabels: Record<string, string>
    fieldDescriptions: Record<string, string>
@ -937,9 +938,13 @@ export interface Translations {
    unsupportedMessage: string
    connectionRetry: string
    latestBody: string
+    latestBodyBackend: string
    allSetTitle: string
    availableTitle: string
    availableBody: string
+    availableTitleBackend: string
+    availableBodyBackend: string
+    availableBodyNoChangelog: string
    updateNow: string
    maybeLater: string
    moreChanges: (count: number) => string
@ -950,10 +955,19 @@ export interface Translations {
    copied: string
    done: string
    applyingBody: string
+    applyingBodyBackend: string
    applyingClose: string
    errorTitle: string
    errorBody: string
    notNow: string
+    applyStatus: {
+      preparing: string
+      pulling: string
+      restarting: string
+      notAvailable: string
+      failed: string
+      noReturn: string
+    }
  }

  install: {
@ -1111,6 +1125,9 @@ export interface Translations {
      updateInProgress: string
      commitsBehind: (count: number, branch: string) => string
      desktopVersion: (version: string) => string
+      backendVersion: (version: string) => string
+      clientLabel: (version: string) => string
+      backendLabel: (version: string) => string
      commit: (sha: string) => string
      branch: (branch: string) => string
      closeCommandCenter: string
--- a/apps/desktop/src/i18n/zh-hant.ts
+++ b/apps/desktop/src/i18n/zh-hant.ts
@ -209,7 +209,8 @@ export const zhHant = defineLocale({
      technical: '技術',
      technicalDesc: '包含原始工具參數、結果與底層細節。',
      themeTitle: '主題',
-      themeDesc: '僅限桌面端的調色盤。所選模式會套用在其上。'
+      themeDesc: '僅限桌面端的調色盤。所選模式會套用在其上。',
+      themeProfileNote: profile => `已為「${profile}」設定檔儲存——每個設定檔保留各自的主題。`
    },
    fieldLabels: defineFieldCopy({
      model: '預設模型',
@ -1344,9 +1345,13 @@ export const zhHant = defineLocale({
    unsupportedMessage: '此版本的 Hermes 無法在應用程式內自行更新。',
    connectionRetry: '請檢查網路連線後重試。',
    latestBody: '您正在執行最新版本。',
+    latestBodyBackend: '後端正在執行最新版本。',
    allSetTitle: '已是最新版本',
    availableTitle: '有可用更新',
    availableBody: '新版 Hermes 已可安裝。',
+    availableTitleBackend: '後端有可用更新',
+    availableBodyBackend: '已連接的 Hermes 後端有新版本可安裝。',
+    availableBodyNoChangelog: '已有新版本可用。此安裝方式無法顯示更新日誌。',
    updateNow: '立即更新',
    maybeLater: '稍後再說',
    moreChanges: count => `另有 ${count} 項變更。`,
@ -1357,10 +1362,19 @@ export const zhHant = defineLocale({
    copied: '已複製',
    done: '完成',
    applyingBody: 'Hermes 更新程式會在自己的視窗中接管，並在完成後重新開啟 Hermes。',
+    applyingBodyBackend: '遠端後端正在套用更新並將重新啟動。恢復後 Hermes 會自動重新連線。',
    applyingClose: 'Hermes 將關閉以套用更新。',
    errorTitle: '更新未完成',
    errorBody: '沒有資料遺失。您可以現在重試。',
-    notNow: '暫不'
+    notNow: '暫不',
+    applyStatus: {
+      preparing: '正在更新後端…',
+      pulling: '後端更新中…',
+      restarting: '後端正在重新啟動以載入更新…',
+      notAvailable: '此後端無法更新。',
+      failed: '後端更新失敗。',
+      noReturn: '後端未恢復連線。更新可能未完成——請檢查後端主機。'
+    }
  },

  install: {
@ -1543,6 +1557,9 @@ export const zhHant = defineLocale({
      updateInProgress: '更新中',
      commitsBehind: (count, branch) => `落後 ${branch} ${count} 個提交`,
      desktopVersion: version => `Hermes Desktop v${version}`,
+      backendVersion: version => `後端 v${version}`,
+      clientLabel: version => `用戶端 v${version}`,
+      backendLabel: version => `後端 v${version}`,
      commit: sha => `提交 ${sha}`,
      branch: branch => `分支 ${branch}`,
      closeCommandCenter: '關閉命令中心',
@ -1567,8 +1584,8 @@ export const zhHant = defineLocale({
      contextUsage: '上下文使用量',
      session: '工作階段',
      runtimeSessionElapsed: '執行時工作階段已用時間',
-      yoloOn: 'YOLO 已開啟 — 自動核准危險指令。點擊關閉。',
-      yoloOff: 'YOLO 已關閉 — 點擊自動核准危險指令。',
+      yoloOn: 'YOLO 已開啟 — 自動核准危險指令。點擊關閉。Shift+點擊可全域切換。',
+      yoloOff: 'YOLO 已關閉 — 點擊自動核准危險指令。Shift+點擊可全域切換。',
      modelNone: '無',
      noModel: '無模型',
      switchModel: '切換模型',
--- a/apps/desktop/src/i18n/zh.ts
+++ b/apps/desktop/src/i18n/zh.ts
@ -287,7 +287,8 @@ export const zh: Translations = {
      technical: '技术',
      technicalDesc: '包含原始工具参数/结果及底层细节。',
      themeTitle: '主题',
-      themeDesc: '仅桌面端调色板。所选模式叠加其上。'
+      themeDesc: '仅桌面端调色板。所选模式叠加其上。',
+      themeProfileNote: profile => `已为「${profile}」配置文件保存——每个配置文件保留各自的主题。`
    },
    fieldLabels: defineFieldCopy({
      model: '默认模型',
@ -1424,9 +1425,13 @@ export const zh: Translations = {
    unsupportedMessage: '此版本的 Hermes 无法在应用内自行更新。',
    connectionRetry: '请检查网络连接后重试。',
    latestBody: '你正在运行最新版本。',
+    latestBodyBackend: '后端正在运行最新版本。',
    allSetTitle: '已是最新',
    availableTitle: '有可用更新',
    availableBody: '新版 Hermes 已可安装。',
+    availableTitleBackend: '后端有可用更新',
+    availableBodyBackend: '已连接的 Hermes 后端有新版本可安装。',
+    availableBodyNoChangelog: '已有新版本可用。此安装方式无法显示更新日志。',
    updateNow: '立即更新',
    maybeLater: '稍后再说',
    moreChanges: count => `另有 ${count} 项更改。`,
@ -1437,10 +1442,19 @@ export const zh: Translations = {
    copied: '已复制',
    done: '完成',
    applyingBody: 'Hermes 更新器会在自己的窗口中接管，并在完成后重新打开 Hermes。',
+    applyingBodyBackend: '远程后端正在应用更新并将重启。恢复后 Hermes 会自动重新连接。',
    applyingClose: 'Hermes 将关闭以应用更新。',
    errorTitle: '更新未完成',
    errorBody: '没有数据丢失。你可以现在重试。',
-    notNow: '暂不'
+    notNow: '暂不',
+    applyStatus: {
+      preparing: '正在更新后端…',
+      pulling: '后端更新中…',
+      restarting: '后端正在重启以加载更新…',
+      notAvailable: '此后端无法更新。',
+      failed: '后端更新失败。',
+      noReturn: '后端未恢复在线。更新可能未完成——请检查后端主机。'
+    }
  },

  install: {
@ -1620,6 +1634,9 @@ export const zh: Translations = {
      updateInProgress: '正在更新',
      commitsBehind: (count, branch) => `落后 ${branch} ${count} 个提交`,
      desktopVersion: version => `Hermes Desktop v${version}`,
+      backendVersion: version => `后端 v${version}`,
+      clientLabel: version => `客户端 v${version}`,
+      backendLabel: version => `后端 v${version}`,
      commit: sha => `提交 ${sha}`,
      branch: branch => `分支 ${branch}`,
      closeCommandCenter: '关闭命令中心',
@ -1644,8 +1661,8 @@ export const zh: Translations = {
      contextUsage: '上下文用量',
      session: '会话',
      runtimeSessionElapsed: '运行时会话已用时间',
-      yoloOn: 'YOLO 已开启 - 自动批准危险命令。点击关闭。',
-      yoloOff: 'YOLO 已关闭 - 点击自动批准危险命令。',
+      yoloOn: 'YOLO 已开启 - 自动批准危险命令。点击关闭。Shift+点击可全局切换。',
+      yoloOff: 'YOLO 已关闭 - 点击自动批准危险命令。Shift+点击可全局切换。',
      modelNone: '无',
      noModel: '无模型',
      switchModel: '切换模型',
--- a/apps/desktop/src/lib/gateway-events.test.ts
+++ b/apps/desktop/src/lib/gateway-events.test.ts
@ -0,0 +1,27 @@
+import { describe, expect, it } from 'vitest'
+
+import { gatewayEventRequiresSessionId } from './gateway-events'
+
+describe('gateway event routing', () => {
+  it('drops only unscoped subagent events (genuinely background work)', () => {
+    expect(gatewayEventRequiresSessionId('subagent.progress')).toBe(true)
+    expect(gatewayEventRequiresSessionId('subagent.start')).toBe(true)
+  })
+
+  it('attributes unscoped foreground turn events to the active chat', () => {
+    // These must NOT be dropped when unscoped — they are the focused turn's own
+    // output, and dropping them loses the live response until a refetch (#42178).
+    expect(gatewayEventRequiresSessionId('message.delta')).toBe(false)
+    expect(gatewayEventRequiresSessionId('message.complete')).toBe(false)
+    expect(gatewayEventRequiresSessionId('reasoning.delta')).toBe(false)
+    expect(gatewayEventRequiresSessionId('tool.start')).toBe(false)
+    expect(gatewayEventRequiresSessionId('approval.request')).toBe(false)
+  })
+
+  it('allows global events to remain unscoped', () => {
+    expect(gatewayEventRequiresSessionId('gateway.ready')).toBe(false)
+    expect(gatewayEventRequiresSessionId('preview.restart.progress')).toBe(false)
+    expect(gatewayEventRequiresSessionId('session.info')).toBe(false)
+    expect(gatewayEventRequiresSessionId(undefined)).toBe(false)
+  })
+})
--- a/apps/desktop/src/lib/gateway-events.ts
+++ b/apps/desktop/src/lib/gateway-events.ts
@ -11,6 +11,22 @@ function asRecord(payload: unknown): Record<string, unknown> {
  return payload && typeof payload === 'object' ? (payload as Record<string, unknown>) : {}
 }

+/**
+ * Whether an unscoped event (no `session_id`) must be dropped rather than
+ * attributed to the focused chat.
+ *
+ * Only `subagent.*` qualifies: it describes background/async work that must
+ * never attach to whichever chat happens to be focused. Every other scoped
+ * event — message/reasoning/thinking/tool/status/prompt — is, when unscoped,
+ * the active turn's own output. The gateway always stamps a *background*
+ * session's events with that session's id, so a missing id can only mean "the
+ * focused turn". #42178 dropped those too, which silently swallowed the live
+ * answer; it then reappeared only after a transcript refetch (manual refresh).
+ */
+export function gatewayEventRequiresSessionId(eventType: string | undefined): boolean {
+  return eventType?.startsWith('subagent.') ?? false
+}
+
 export function gatewayEventCompletedFileDiff(event: RpcEventLike): boolean {
  if (event.type !== 'tool.complete') {
    return false
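Reviewer note: the doc comment on `gatewayEventRequiresSessionId` describes a routing rule, and a small sketch may make the three cases easier to see. The predicate below is copied from the diff; `SketchEvent` and `routeEvent` are hypothetical names used only for illustration, since the real `RpcEventLike` type and the actual call site are not shown here. The sketch also does not model global events such as `gateway.ready`, which the real consumer presumably handles separately.

```typescript
// Copied from the diff above so the sketch is self-contained.
function gatewayEventRequiresSessionId(eventType: string | undefined): boolean {
  return eventType?.startsWith('subagent.') ?? false
}

// Hypothetical event shape, standing in for RpcEventLike.
interface SketchEvent {
  type?: string
  session_id?: string
}

// Hypothetical router: returns the session the event belongs to, or null when
// the event should be dropped.
function routeEvent(event: SketchEvent, focusedSessionId: string | null): string | null {
  // Scoped events always go to the session the gateway stamped on them.
  if (event.session_id) {
    return event.session_id
  }

  // Unscoped subagent.* events are background work and must not attach to
  // whichever chat happens to be focused.
  if (gatewayEventRequiresSessionId(event.type)) {
    return null
  }

  // Any other unscoped turn event is treated as the focused turn's own output.
  return focusedSessionId
}

const scoped = routeEvent({ type: 'subagent.progress', session_id: 'bg-1' }, 'chat-1')
const droppedEvent = routeEvent({ type: 'subagent.progress' }, 'chat-1')
const attributed = routeEvent({ type: 'message.delta' }, 'chat-1')
```

Under these assumptions, `scoped` is `'bg-1'`, `droppedEvent` is `null`, and `attributed` is `'chat-1'`.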
--- a/apps/desktop/src/lib/session-search.test.ts
+++ b/apps/desktop/src/lib/session-search.test.ts
@ -52,6 +52,14 @@ describe('sessionMatchesSearch', () => {
    expect(sessionMatchesSearch(session, 'hermes-agent')).toBe(true)
  })

+  it('matches sessions by source platform and aliases', () => {
+    expect(sessionMatchesSearch(makeSession({ source: 'telegram' }), 'Telegram')).toBe(true)
+    expect(sessionMatchesSearch(makeSession({ source: 'whatsapp' }), 'WhatsApp')).toBe(true)
+    expect(sessionMatchesSearch(makeSession({ source: 'whatsapp' }), 'wa')).toBe(true)
+    expect(sessionMatchesSearch(makeSession({ source: 'slack' }), 'slack')).toBe(true)
+    expect(sessionMatchesSearch(makeSession({ source: 'bluebubbles' }), 'imessage')).toBe(true)
+  })
+
  it('does not match unrelated queries', () => {
    expect(sessionMatchesSearch(makeSession(), 'totally-unrelated')).toBe(false)
  })
--- a/apps/desktop/src/lib/session-search.ts
+++ b/apps/desktop/src/lib/session-search.ts
@ -1,6 +1,7 @@
 import type { SessionInfo } from '@/types/hermes'

 import { sessionTitle } from './chat-runtime'
+import { sessionSourceSearchTerms } from './session-source'

 export function sessionMatchesSearch(session: SessionInfo, query: string): boolean {
  const needle = query.trim().toLowerCase()
@ -14,6 +15,7 @@ export function sessionMatchesSearch(session: SessionInfo, query: string): boole
    session._lineage_root_id ?? '',
    sessionTitle(session),
    session.preview ?? '',
-    session.cwd ?? ''
+    session.cwd ?? '',
+    ...sessionSourceSearchTerms(session.source)
  ].some(value => value.toLowerCase().includes(needle))
 }
--- a/apps/desktop/src/lib/session-source.ts
+++ b/apps/desktop/src/lib/session-source.ts
@ -0,0 +1,62 @@
+const SOURCE_LABELS: Record<string, string> = {
+  api_server: 'API',
+  bluebubbles: 'iMessage',
+  cli: 'CLI',
+  codex: 'Codex',
+  desktop: 'Desktop',
+  discord: 'Discord',
+  email: 'Email',
+  gateway: 'Gateway',
+  local: 'Local',
+  matrix: 'Matrix',
+  mattermost: 'Mattermost',
+  qqbot: 'QQ',
+  signal: 'Signal',
+  slack: 'Slack',
+  sms: 'SMS',
+  telegram: 'Telegram',
+  tui: 'TUI',
+  webhook: 'Webhook',
+  weixin: 'WeChat',
+  whatsapp: 'WhatsApp',
+  yuanbao: 'Yuanbao'
+}
+
+const SOURCE_ALIASES: Record<string, string[]> = {
+  bluebubbles: ['apple messages', 'imessage'],
+  cli: ['terminal'],
+  desktop: ['app', 'gui'],
+  local: ['machine'],
+  qqbot: ['qq'],
+  telegram: ['tg'],
+  tui: ['terminal'],
+  weixin: ['wechat'],
+  whatsapp: ['wa']
+}
+
+export function normalizeSessionSource(source: null | string | undefined): string | null {
+  const id = source?.trim().toLowerCase()
+
+  return id || null
+}
+
+export function sessionSourceLabel(source: null | string | undefined): string | null {
+  const id = normalizeSessionSource(source)
+
+  if (!id) {
+    return null
+  }
+
+  return SOURCE_LABELS[id] || id.replace(/[_-]+/g, ' ').replace(/\b\w/g, char => char.toUpperCase())
+}
+
+export function sessionSourceSearchTerms(source: null | string | undefined): string[] {
+  const id = normalizeSessionSource(source)
+  const label = sessionSourceLabel(id)
+
+  if (!id) {
+    return []
+  }
+
+  return [id, label ?? '', ...(SOURCE_ALIASES[id] ?? [])].filter(Boolean)
+}
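Reviewer note: the new test in `session-search.test.ts` covers known sources and aliases, but not the fallback label for a source id missing from `SOURCE_LABELS`. The sketch below reproduces the two functions with trimmed copies of the tables (two entries each, an assumption made only to keep it short) and shows both paths. The id `my_custom-source` is a made-up example.

```typescript
// Trimmed copies of the tables in the diff above, so the sketch runs on its own.
const SOURCE_LABELS: Record<string, string> = { bluebubbles: 'iMessage', whatsapp: 'WhatsApp' }
const SOURCE_ALIASES: Record<string, string[]> = { bluebubbles: ['apple messages', 'imessage'], whatsapp: ['wa'] }

function sessionSourceLabel(source: null | string | undefined): string | null {
  const id = source?.trim().toLowerCase()

  if (!id) {
    return null
  }

  // Unknown ids fall back to a title-cased version with _ and - turned into spaces.
  return SOURCE_LABELS[id] || id.replace(/[_-]+/g, ' ').replace(/\b\w/g, char => char.toUpperCase())
}

function sessionSourceSearchTerms(source: null | string | undefined): string[] {
  const id = source?.trim().toLowerCase()

  if (!id) {
    return []
  }

  return [id, sessionSourceLabel(id) ?? '', ...(SOURCE_ALIASES[id] ?? [])].filter(Boolean)
}

const knownTerms = sessionSourceSearchTerms(' WhatsApp ') // ['whatsapp', 'WhatsApp', 'wa']
const fallbackLabel = sessionSourceLabel('my_custom-source') // 'My Custom Source'
const emptyTerms = sessionSourceSearchTerms(null) // []
```

Since `sessionMatchesSearch` lowercases each term before matching, the mixed-case label in the list is harmless.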
--- a/apps/desktop/src/lib/update-copy.test.ts
+++ b/apps/desktop/src/lib/update-copy.test.ts
@ -0,0 +1,38 @@
+import { describe, expect, it } from 'vitest'
+
+import { resolveUpdateCopy } from './update-copy'
+
+const copy = {
+  availableTitle: 'New update available',
+  availableBody: 'A new version of Hermes is ready to install.',
+  availableTitleBackend: 'Backend update available',
+  availableBodyBackend: 'A newer version of the connected Hermes backend is ready to install.',
+  availableBodyNoChangelog: 'A newer version is ready. Release notes aren’t available for this install type.'
+}
+
+describe('resolveUpdateCopy', () => {
+  it('client target with commits: client title + client body', () => {
+    const r = resolveUpdateCopy({ target: 'client', shownItems: 5, copy })
+    expect(r.title).toBe('New update available')
+    expect(r.body).toBe('A new version of Hermes is ready to install.')
+  })
+
+  it('backend target with commits: names the backend in title and body', () => {
+    const r = resolveUpdateCopy({ target: 'backend', shownItems: 5, copy })
+    expect(r.title).toBe('Backend update available')
+    expect(r.body).toContain('backend')
+  })
+
+  it('no changelog (pip/non-git backend): degrades honestly, still names backend target in title', () => {
+    const r = resolveUpdateCopy({ target: 'backend', shownItems: 0, copy })
+    expect(r.title).toBe('Backend update available')
+    // Body must NOT pretend there are notes — it states they're unavailable.
+    expect(r.body).toBe(copy.availableBodyNoChangelog)
+  })
+
+  it('no changelog on client: same honest degrade', () => {
+    const r = resolveUpdateCopy({ target: 'client', shownItems: 0, copy })
+    expect(r.title).toBe('New update available')
+    expect(r.body).toBe(copy.availableBodyNoChangelog)
+  })
+})
--- a/apps/desktop/src/lib/update-copy.ts
+++ b/apps/desktop/src/lib/update-copy.ts
@ -0,0 +1,44 @@
+/**
+ * Pure copy-selection for the updates overlay's "available" state.
+ *
+ * Names the update target (client vs the connected backend in remote mode) and
+ * degrades honestly when there's no commit changelog to show (e.g. a pip /
+ * non-git backend where `git log` yields nothing) instead of generic filler.
+ *
+ * Extracted from updates-overlay.tsx so the wording logic is unit-testable.
+ */
+
+export type UpdateTarget = 'client' | 'backend'
+
+export interface UpdateCopyStrings {
+  availableTitle: string
+  availableBody: string
+  availableTitleBackend: string
+  availableBodyBackend: string
+  availableBodyNoChangelog: string
+}
+
+export interface ResolveUpdateCopyInput {
+  target: UpdateTarget
+  /** Number of commit rows actually shown in the changelog. 0 → no notes. */
+  shownItems: number
+  copy: UpdateCopyStrings
+}
+
+export interface UpdateCopyResult {
+  title: string
+  body: string
+}
+
+export function resolveUpdateCopy({ target, shownItems, copy }: ResolveUpdateCopyInput): UpdateCopyResult {
+  const title = target === 'backend' ? copy.availableTitleBackend : copy.availableTitle
+
+  const body =
+    shownItems === 0
+      ? copy.availableBodyNoChangelog
+      : target === 'backend'
+        ? copy.availableBodyBackend
+        : copy.availableBody
+
+  return { title, body }
+}
--- a/apps/desktop/src/lib/yolo-session.ts
+++ b/apps/desktop/src/lib/yolo-session.ts
@ -24,3 +24,27 @@ export async function setSessionYolo(

  return active
 }
+
+/**
+ * Set GLOBAL YOLO (approval bypass) on or off via gateway `config.set` with
+ * `scope: 'global'`. This writes the persistent `approvals.mode` in config.yaml:
+ * `off` (bypass on) or `manual` (bypass off). It affects every session, the
+ * CLI, the TUI, and cron, and it survives restarts. Triggered by
+ * Shift+clicking the status-bar zap.
+ */
+export async function setGlobalYolo(
+  requestGateway: GatewayRequester,
+  enabled: boolean
+): Promise<boolean> {
+  const result = await requestGateway<{ value?: string }>('config.set', {
+    key: 'yolo',
+    scope: 'global',
+    value: enabled ? '1' : '0'
+  })
+
+  const active = result?.value === '1'
+
+  setYoloActive(active)
+
+  return active
+}
--- a/apps/desktop/src/store/layout.ts
+++ b/apps/desktop/src/store/layout.ts
@ -23,6 +23,8 @@ export const SIDEBAR_SESSIONS_PAGE_SIZE = 50
 const SIDEBAR_PINNED_STORAGE_KEY = 'hermes.desktop.pinnedSessions'
 const SIDEBAR_AGENTS_GROUPED_STORAGE_KEY = 'hermes.desktop.agentsGroupedByWorkspace'
 const SIDEBAR_CRON_OPEN_STORAGE_KEY = 'hermes.desktop.sidebarCronOpen'
+const SIDEBAR_SESSION_ORDER_STORAGE_KEY = 'hermes.desktop.sessionOrder'
+const SIDEBAR_WORKSPACE_ORDER_STORAGE_KEY = 'hermes.desktop.workspaceOrder'
 const PANES_FLIPPED_STORAGE_KEY = 'hermes.desktop.panesFlipped'

 export const CHAT_SIDEBAR_PANE_ID = 'chat-sidebar'
@ -53,7 +55,14 @@ export const $sidebarWidth: ReadableAtom<number> = computed($paneStates, states
 })

 export const $pinnedSessionIds = atom(storedStringArray(SIDEBAR_PINNED_STORAGE_KEY))
+export const $sidebarSessionOrderIds = atom(storedStringArray(SIDEBAR_SESSION_ORDER_STORAGE_KEY))
+export const $sidebarWorkspaceOrderIds = atom(storedStringArray(SIDEBAR_WORKSPACE_ORDER_STORAGE_KEY))
 export const $sidebarPinsOpen = atom(true)
+// Set by the PaneShell hover-reveal overlay while the sidebar is collapsed. It
+// stays true for as long as the sidebar is a floating overlay (not just while
+// it is shown) so the consumer mounts its contents off-screen, ready to slide
+// in. ChatSidebar mounts its rows when `sidebarOpen || $sidebarOverlayMounted`.
+export const $sidebarOverlayMounted = atom(false)
 export const $sidebarRecentsOpen = atom(true)
 // Cron-job sessions live in their own section below recents, collapsed by
 // default (it only renders at all when cron sessions exist) so the
@ -68,6 +77,8 @@ export const $sessionsLimit = atom(SIDEBAR_SESSIONS_PAGE_SIZE)

 $pinnedSessionIds.subscribe(ids => persistStringArray(SIDEBAR_PINNED_STORAGE_KEY, [...ids]))
 $sidebarCronOpen.subscribe(open => persistBoolean(SIDEBAR_CRON_OPEN_STORAGE_KEY, open))
+$sidebarSessionOrderIds.subscribe(ids => persistStringArray(SIDEBAR_SESSION_ORDER_STORAGE_KEY, [...ids]))
+$sidebarWorkspaceOrderIds.subscribe(ids => persistStringArray(SIDEBAR_WORKSPACE_ORDER_STORAGE_KEY, [...ids]))
 $sidebarAgentsGrouped.subscribe(grouped => persistBoolean(SIDEBAR_AGENTS_GROUPED_STORAGE_KEY, grouped))
 $panesFlipped.subscribe(flipped => persistBoolean(PANES_FLIPPED_STORAGE_KEY, flipped))

@ -116,6 +127,10 @@ export function setSidebarPinsOpen(open: boolean) {
  $sidebarPinsOpen.set(open)
 }

+export function setSidebarOverlayMounted(mounted: boolean) {
+  $sidebarOverlayMounted.set(mounted)
+}
+
 export function setSidebarRecentsOpen(open: boolean) {
  $sidebarRecentsOpen.set(open)
 }
@ -128,6 +143,18 @@ export function setSidebarAgentsGrouped(grouped: boolean) {
  $sidebarAgentsGrouped.set(grouped)
 }

+export function setSidebarSessionOrderIds(ids: string[]) {
+  if (!arraysEqual($sidebarSessionOrderIds.get(), ids)) {
+    $sidebarSessionOrderIds.set(ids)
+  }
+}
+
+export function setSidebarWorkspaceOrderIds(ids: string[]) {
+  if (!arraysEqual($sidebarWorkspaceOrderIds.get(), ids)) {
+    $sidebarWorkspaceOrderIds.set(ids)
+  }
+}
+
 export function setSidebarResizing(resizing: boolean) {
  $isSidebarResizing.set(resizing)
 }
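
Review note: the new order setters only call `.set()` when the ids actually changed, which matters because each atom has a subscriber that persists to storage. The sketch below illustrates that guard under stated assumptions: `createAtom` is a hypothetical, simplified stand-in for the store's atom (it does not replay the current value on subscribe), and `arraysEqual` is an assumed implementation of the helper the diff calls but does not show.

```typescript
// Assumed implementation of the arraysEqual helper (not shown in the diff).
const arraysEqual = (a: readonly string[], b: readonly string[]): boolean =>
  a.length === b.length && a.every((value, index) => value === b[index])

// Hypothetical minimal atom: get, set, and change listeners only.
function createAtom<T>(initial: T) {
  let value = initial
  const listeners: Array<(next: T) => void> = []

  return {
    get: (): T => value,
    set(next: T): void {
      value = next
      listeners.forEach(listener => listener(next))
    },
    subscribe(listener: (next: T) => void): void {
      listeners.push(listener)
    }
  }
}

const $order = createAtom<string[]>([])
let persistWrites = 0

// Stands in for the persistStringArray subscriber in the diff.
$order.subscribe(() => {
  persistWrites += 1
})

function setOrderIds(ids: string[]): void {
  // Skip the set (and therefore the storage write) when nothing changed.
  if (!arraysEqual($order.get(), ids)) {
    $order.set(ids)
  }
}

setOrderIds(['a', 'b']) // changed: one write
setOrderIds(['a', 'b']) // identical: skipped
setOrderIds(['b', 'a']) // reordered: second write
```

Comparing element by element (rather than by reference) means a caller can rebuild the array on every render without triggering redundant persistence.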
--- a/apps/desktop/src/store/model-visibility.test.ts
+++ b/apps/desktop/src/store/model-visibility.test.ts
@ -0,0 +1,37 @@
+import { describe, expect, it } from 'vitest'
+
+import type { ModelOptionProvider } from '@/types/hermes'
+
+import { effectiveVisibleKeys, modelVisibilityKey } from './model-visibility'
+
+const provider = (slug: string, models: string[]): ModelOptionProvider => ({
+  models,
+  name: slug,
+  slug
+})
+
+describe('model visibility', () => {
+  it('keeps newly configured providers visible when stored choices are stale', () => {
+    const stored = new Set([modelVisibilityKey('copilot', 'claude-sonnet-4.6')])
+
+    const visible = effectiveVisibleKeys(stored, [
+      provider('copilot', ['claude-sonnet-4.6']),
+      provider('local-ollama', ['qwen3:latest', 'llama3.2:latest'])
+    ])
+
+    expect(visible.has(modelVisibilityKey('copilot', 'claude-sonnet-4.6'))).toBe(true)
+    expect(visible.has(modelVisibilityKey('local-ollama', 'qwen3:latest'))).toBe(true)
+    expect(visible.has(modelVisibilityKey('local-ollama', 'llama3.2:latest'))).toBe(true)
+  })
+
+  it('does not re-add models from a provider that already has stored choices', () => {
+    const stored = new Set([modelVisibilityKey('local-ollama', 'qwen3:latest')])
+
+    const visible = effectiveVisibleKeys(stored, [
+      provider('local-ollama', ['qwen3:latest', 'llama3.2:latest'])
+    ])
+
+    expect(visible.has(modelVisibilityKey('local-ollama', 'qwen3:latest'))).toBe(true)
+    expect(visible.has(modelVisibilityKey('local-ollama', 'llama3.2:latest'))).toBe(false)
+  })
+})
--- a/apps/desktop/src/store/model-visibility.ts
+++ b/apps/desktop/src/store/model-visibility.ts
@ -104,5 +104,30 @@ export function effectiveVisibleKeys(
  stored: Set<string> | null,
  providers: readonly ModelOptionProvider[]
 ): Set<string> {
-  return stored ?? defaultVisibleKeys(providers)
+  if (!stored) {
+    return defaultVisibleKeys(providers)
+  }
+
+  if (stored.size === 0) {
+    return new Set()
+  }
+
+  const next = new Set(stored)
+
+  for (const provider of providers) {
+    const providerPrefix = `${provider.slug}::`
+    const hasStoredProvider = [...stored].some(key => key.startsWith(providerPrefix))
+
+    if (hasStoredProvider) {
+      continue
+    }
+
+    const families = collapseModelFamilies(provider.models ?? [])
+
+    for (const family of families.slice(0, DEFAULT_VISIBLE_PER_PROVIDER)) {
+      next.add(modelVisibilityKey(provider.slug, family.id))
+    }
+  }
+
+  return next
 }
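
Review note: the rewritten `effectiveVisibleKeys` now distinguishes three cases: no stored preference, an explicitly empty selection, and a stored selection that predates a newly configured provider. The sketch below walks through each case. The key format, `collapseModelFamilies`, `defaultVisibleKeys`, and the per-provider limit of 2 are simplified assumptions made for illustration; only the body of `effectiveVisibleKeys` follows the diff.

```typescript
interface ModelOptionProvider {
  slug: string
  name: string
  models?: string[]
}

// Assumed stand-ins for helpers defined elsewhere in model-visibility.ts.
const DEFAULT_VISIBLE_PER_PROVIDER = 2
const modelVisibilityKey = (slug: string, model: string): string => `${slug}::${model}`
const collapseModelFamilies = (models: string[]): { id: string }[] => models.map(id => ({ id }))

function defaultVisibleKeys(providers: readonly ModelOptionProvider[]): Set<string> {
  const keys = new Set<string>()

  providers.forEach(provider => {
    collapseModelFamilies(provider.models ?? [])
      .slice(0, DEFAULT_VISIBLE_PER_PROVIDER)
      .forEach(family => keys.add(modelVisibilityKey(provider.slug, family.id)))
  })

  return keys
}

function effectiveVisibleKeys(stored: Set<string> | null, providers: readonly ModelOptionProvider[]): Set<string> {
  if (!stored) {
    return defaultVisibleKeys(providers) // never customized: use defaults
  }

  if (stored.size === 0) {
    return new Set() // user explicitly hid everything: respect that
  }

  const next = new Set(stored)

  providers.forEach(provider => {
    const prefix = `${provider.slug}::`

    // A provider with any stored key is considered curated and is left alone.
    if (Array.from(stored).some(key => key.startsWith(prefix))) {
      return
    }

    collapseModelFamilies(provider.models ?? [])
      .slice(0, DEFAULT_VISIBLE_PER_PROVIDER)
      .forEach(family => next.add(modelVisibilityKey(provider.slug, family.id)))
  })

  return next
}

const providers: ModelOptionProvider[] = [
  { slug: 'a', name: 'a', models: ['a1'] },
  { slug: 'b', name: 'b', models: ['b1', 'b2', 'b3'] }
]

const firstRun = effectiveVisibleKeys(null, providers) // a::a1, b::b1, b::b2
const hiddenAll = effectiveVisibleKeys(new Set<string>(), providers) // empty
const staleStore = effectiveVisibleKeys(new Set(['a::a1']), providers) // adds b::b1, b::b2
const curated = effectiveVisibleKeys(new Set(['b::b3']), providers) // keeps b::b3, adds a::a1
```

The `stored.size === 0` branch is what keeps "hide everything" from being mistaken for "every provider is new".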
--- a/apps/desktop/src/store/session.test.ts
+++ b/apps/desktop/src/store/session.test.ts
@ -1,8 +1,16 @@
-import { describe, expect, it } from 'vitest'
+import { afterEach, describe, expect, it, vi } from 'vitest'

 import type { SessionInfo } from '@/types/hermes'

-import { $attentionSessionIds, mergeSessionPage, sessionPinId, setSessionAttention } from './session'
+import {
+  $attentionSessionIds,
+  $workingSessionIds,
+  getRecentlySettledSessionIds,
+  mergeSessionPage,
+  sessionPinId,
+  setSessionAttention,
+  setSessionWorking
+} from './session'

 const session = (over: Partial<SessionInfo>): SessionInfo => ({
  archived: false,
@ -129,3 +137,61 @@ describe('mergeSessionPage', () => {
    expect(merged.map(s => s.id)).toEqual(['tip', 'other'])
  })
 })
+
+describe('getRecentlySettledSessionIds', () => {
+  afterEach(() => {
+    vi.useRealTimers()
+    $workingSessionIds.set([])
+
+    // Drain anything left in the grace map so tests stay isolated. Reading
+    // with a far-future `now` prunes every entry as a side effect, so the
+    // return value can be discarded.
+    void getRecentlySettledSessionIds(Number.MAX_SAFE_INTEGER)
+  })
+
+  it('keeps a session for the grace window after its turn settles, then drops it', () => {
+    vi.useFakeTimers()
+    vi.setSystemTime(0)
+    $workingSessionIds.set([])
+
+    // A turn starts then ends: the working→idle transition grants grace.
+    setSessionWorking('s1', true)
+    setSessionWorking('s1', false)
+    expect(getRecentlySettledSessionIds()).toEqual(['s1'])
+
+    // Still inside the window.
+    vi.setSystemTime(29_000)
+    expect(getRecentlySettledSessionIds()).toEqual(['s1'])
+
+    // Past the window: the entry is pruned on read.
+    vi.setSystemTime(31_000)
+    expect(getRecentlySettledSessionIds()).toEqual([])
+  })
+
+  it('does not grant grace when the session was never working (idle re-asserts)', () => {
+    vi.useFakeTimers()
+    vi.setSystemTime(0)
+    $workingSessionIds.set([])
+
+    // updateSessionState re-asserts `false` for idle sessions on every tick;
+    // these must not pin an idle chat into the keep-set indefinitely.
+    setSessionWorking('idle', false)
+    setSessionWorking('idle', false)
+    expect(getRecentlySettledSessionIds()).toEqual([])
+  })
+
+  it('clears the grace timer when the session goes busy again', () => {
+    vi.useFakeTimers()
+    vi.setSystemTime(0)
+    $workingSessionIds.set([])
+
+    setSessionWorking('s2', true)
+    setSessionWorking('s2', false)
+    expect(getRecentlySettledSessionIds()).toEqual(['s2'])
+
+    // A new turn for the same session is "working" again — drop it from the
+    // settled set so it's tracked as working, not recently-finished.
+    setSessionWorking('s2', true)
+    expect(getRecentlySettledSessionIds()).toEqual([])
+  })
+})
--- a/apps/desktop/src/store/session.ts
+++ b/apps/desktop/src/store/session.ts
@ -202,6 +202,47 @@ function clearSessionWatchdog(sessionId: string) {
  }
 }

+// A session's "working" flag clears the instant its turn ends, but the
+// cross-profile aggregator (listSessions with min_messages=1) only sees the
+// just-persisted first turn a beat later. The active chat is shielded from that
+// race by sessionsToKeep(), but a brand-new session that finished *while you
+// were viewing a different chat* is, at the next refresh, neither working,
+// pinned, nor active — so mergeSessionPage() evicts it. Nothing re-fetches
+// afterward, so it stays gone until the app restarts. (Repro: start a new chat,
+// then click another session before the first reply lands.)
+//
+// To bridge that window we keep a session in the merge keep-set for a short
+// grace period after its turn settles, giving the aggregator time to catch up.
+// Entries auto-expire, so this never accumulates and can't resurrect a deleted
+// session (mergeSessionPage only revives rows still present in the in-memory
+// list, which optimistic delete/archive already drops).
+const SESSION_SETTLE_GRACE_MS = 30 * 1000
+const settledSessionExpiry = new Map<string, number>()
+
+function markSessionSettled(sessionId: string) {
+  settledSessionExpiry.set(sessionId, Date.now() + SESSION_SETTLE_GRACE_MS)
+}
+
+function clearSessionSettled(sessionId: string) {
+  settledSessionExpiry.delete(sessionId)
+}
+
+/** Stored ids of sessions whose turn ended within the grace window. Prunes
+ *  expired entries as it reads, so it stays bounded without a timer. */
+export function getRecentlySettledSessionIds(now: number = Date.now()): string[] {
+  const live: string[] = []
+
+  for (const [id, expiry] of settledSessionExpiry) {
+    if (expiry > now) {
+      live.push(id)
+    } else {
+      settledSessionExpiry.delete(id)
+    }
+  }
+
+  return live
+}
+
 /** Call when a streaming event for a session lands. Refreshes the watchdog
 *  so the session keeps its "working" status as long as data keeps coming. */
 export function noteSessionActivity(sessionId: string | null | undefined) {
@ -243,13 +284,24 @@ export function setSessionWorking(sessionId: string | null | undefined, working:
    return
  }

+  const wasWorking = $workingSessionIds.get().includes(sessionId)
+
  toggleMembership(setWorkingSessionIds, sessionId, working)

  // Bookend the watchdog: arm on enter, disarm on leave. A later
  // noteSessionActivity() from a streaming event refreshes the timer.
  if (working) {
+    clearSessionSettled(sessionId)
    armSessionWatchdog(sessionId)
  } else {
    clearSessionWatchdog(sessionId)
+
+    // Only grant grace on a real working→idle transition (updateSessionState
+    // re-asserts `false` on every state tick, which must not keep extending the
+    // window). This keeps the just-finished session visible long enough for the
+    // aggregator to return its now-persisted row.
+    if (wasWorking) {
+      markSessionSettled(sessionId)
+    }
  }
 }
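
Review note: the grace-window bookkeeping above can be read in isolation from the store. The sketch below is a simplified model, not the store code: it replaces `$workingSessionIds` with a plain `Set`, drops the watchdog, and takes `now` as an explicit parameter so the expiry arithmetic is visible without fake timers.

```typescript
const SESSION_SETTLE_GRACE_MS = 30 * 1000

const workingIds = new Set<string>()
const settledExpiry = new Map<string, number>()

function setWorking(sessionId: string, working: boolean, now: number): void {
  const wasWorking = workingIds.has(sessionId)

  if (working) {
    workingIds.add(sessionId)
    settledExpiry.delete(sessionId) // busy again: no longer "recently settled"
  } else {
    workingIds.delete(sessionId)

    // Only a real working-to-idle transition grants grace; repeated idle
    // re-asserts must not keep extending the window.
    if (wasWorking) {
      settledExpiry.set(sessionId, now + SESSION_SETTLE_GRACE_MS)
    }
  }
}

function recentlySettled(now: number): string[] {
  const live: string[] = []

  // Prune on read, so the map stays bounded without a timer.
  settledExpiry.forEach((expiry, id) => {
    if (expiry > now) {
      live.push(id)
    } else {
      settledExpiry.delete(id)
    }
  })

  return live
}

setWorking('s1', true, 0)
setWorking('s1', false, 0) // expiry recorded at 30_000
setWorking('idle', false, 0) // never worked: no grace

const insideWindow = recentlySettled(29_000) // ['s1']
const pastWindow = recentlySettled(31_000) // []
```

Pruning inside the read path is the design choice worth noting: an entry that is never read again costs one map slot until the next read, rather than one timer per session.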
--- a/apps/desktop/src/store/updates.test.ts
+++ b/apps/desktop/src/store/updates.test.ts
@ -1,4 +1,4 @@
-import { beforeEach, describe, expect, it, vi } from 'vitest'
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'

 import type { DesktopUpdateStatus } from '@/global'

@ -23,7 +23,18 @@ vi.mock('@/store/notifications', () => ({
  dismissNotification: (...args: unknown[]) => dismissSpy(...args)
 }))

-const { maybeNotifyUpdateAvailable } = await import('./updates')
+const checkHermesUpdateSpy = vi.fn()
+const updateHermesSpy = vi.fn()
+const getActionStatusSpy = vi.fn()
+
+vi.mock('@/hermes', () => ({
+  checkHermesUpdate: (...args: unknown[]) => checkHermesUpdateSpy(...args),
+  updateHermes: (...args: unknown[]) => updateHermesSpy(...args),
+  getActionStatus: (...args: unknown[]) => getActionStatusSpy(...args)
+}))
+
+const { maybeNotifyUpdateAvailable, checkBackendUpdates, $backendUpdateStatus, applyBackendUpdate, $backendUpdateApply } = await import('./updates')
+const { setConnection } = await import('./session')

 const status = (over: Partial<DesktopUpdateStatus> = {}): DesktopUpdateStatus => ({
  supported: true,
@ -75,3 +86,114 @@ describe('maybeNotifyUpdateAvailable', () => {
    expect(notifySpy).not.toHaveBeenCalled()
  })
 })
+
+describe('checkBackendUpdates', () => {
+  beforeEach(() => {
+    storage.clear()
+    notifySpy.mockClear()
+    checkHermesUpdateSpy.mockReset()
+    $backendUpdateStatus.set(null)
+    vi.useRealTimers()
+  })
+
+  const setRemote = (on: boolean) =>
+    setConnection({
+      baseUrl: 'http://box:9119',
+      isFullscreen: false,
+      mode: on ? 'remote' : 'local',
+      nativeOverlayWidth: 0,
+      token: 't',
+      wsUrl: 'ws://box:9119',
+      logs: [],
+      windowButtonPosition: null
+    })
+
+  it('maps the backend /update/check onto the backend status, including commits', async () => {
+    setRemote(true)
+    checkHermesUpdateSpy.mockResolvedValue({
+      install_method: 'git',
+      current_version: '0.16.0',
+      behind: 2,
+      update_available: true,
+      can_apply: true,
+      update_command: 'hermes update',
+      message: null,
+      commits: [{ sha: 'abc1234', summary: 'feat: x', author: 'a', at: 1 }]
+    })
+
+    const result = await checkBackendUpdates()
+
+    expect(checkHermesUpdateSpy).toHaveBeenCalled()
+    expect(result?.behind).toBe(2)
+    expect(result?.commits?.[0]?.sha).toBe('abc1234')
+    expect(result?.supported).toBe(true)
+    expect($backendUpdateStatus.get()?.commits?.[0]?.summary).toBe('feat: x')
+  })
+
+  it('honours can_apply=false (docker/nix): not supported, carries message', async () => {
+    setRemote(true)
+    checkHermesUpdateSpy.mockResolvedValue({
+      install_method: 'docker',
+      current_version: '0.16.0',
+      behind: null,
+      update_available: false,
+      can_apply: false,
+      update_command: 'docker pull ...',
+      message: 'Docker images are immutable.'
+    })
+
+    const result = await checkBackendUpdates()
+
+    expect(result?.supported).toBe(false)
+    expect(result?.message).toBe('Docker images are immutable.')
+  })
+
+  it('is a no-op in local mode (backend check only runs when remote)', async () => {
+    setRemote(false)
+    await checkBackendUpdates()
+    expect(checkHermesUpdateSpy).not.toHaveBeenCalled()
+  })
+})
+
+describe('applyBackendUpdate recovery', () => {
+  beforeEach(() => {
+    storage.clear()
+    checkHermesUpdateSpy.mockReset()
+    updateHermesSpy.mockReset()
+    getActionStatusSpy.mockReset()
+    $backendUpdateApply.set({ applying: false, stage: 'idle', message: '', percent: null, error: null, command: null, log: [] })
+    vi.useFakeTimers()
+  })
+
+  afterEach(() => {
+    vi.useRealTimers()
+  })
+
+  it('waits for the backend to return after the restart drops the connection, then clears the overlay', async () => {
+    updateHermesSpy.mockResolvedValue({ ok: true, name: 'update', pid: 1 })
+    getActionStatusSpy.mockRejectedValue(new Error('ECONNREFUSED'))
+    checkHermesUpdateSpy.mockResolvedValue({ install_method: 'git', current_version: '0.16.0', behind: 0, update_available: false, can_apply: true, update_command: 'hermes update', message: null })
+
+    const promise = applyBackendUpdate()
+    await vi.advanceTimersByTimeAsync(5000)
+    const result = await promise
+
+    expect(result.ok).toBe(true)
+    expect($backendUpdateApply.get().stage).toBe('idle')
+    expect($backendUpdateApply.get().applying).toBe(false)
+  })
+
+  it('surfaces an error when the backend never comes back after the restart', async () => {
+    updateHermesSpy.mockResolvedValue({ ok: true, name: 'update', pid: 1 })
+    getActionStatusSpy.mockRejectedValue(new Error('ECONNREFUSED'))
+    checkHermesUpdateSpy.mockRejectedValue(new Error('ECONNREFUSED'))
+
+    const promise = applyBackendUpdate()
+    await vi.advanceTimersByTimeAsync(70000)
+    const result = await promise
+
+    expect(result.ok).toBe(false)
+    expect($backendUpdateApply.get().stage).toBe('error')
+  })
+})
+
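
Review note: the two `advanceTimersByTimeAsync` values in the recovery tests follow from the polling constants in `updates.ts`. The arithmetic is sketched below. `STATUS_POLL_MS` is a hypothetical name for the literal 1500 ms delay in the status-poll loop; the other two constants are the ones the diff defines. The sketch assumes, as the mocks arrange, that the first `getActionStatus` call rejects.

```typescript
const STATUS_POLL_MS = 1500 // hypothetical name: delay before each getActionStatus call
const BACKEND_RETURN_POLL_MS = 1500
const BACKEND_RETURN_MAX_ATTEMPTS = 40

// Recovery case: one status poll rejects, then the first return probe succeeds.
const recoveredAfterMs = STATUS_POLL_MS + BACKEND_RETURN_POLL_MS

// Failure case: one status poll rejects, then every return probe rejects.
const giveUpAfterMs = STATUS_POLL_MS + BACKEND_RETURN_MAX_ATTEMPTS * BACKEND_RETURN_POLL_MS

// The tests advance fake time by 5000 ms and 70000 ms respectively.
const recoveryTestCoversIt = recoveredAfterMs <= 5000
const failureTestCoversIt = giveUpAfterMs <= 70000
```

If either polling constant is raised, the 70000 ms advance in the failure test is the value most likely to need revisiting.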
--- a/apps/desktop/src/store/updates.ts
+++ b/apps/desktop/src/store/updates.ts
@ -13,9 +13,12 @@ import type {
  DesktopUpdateStatus,
  DesktopVersionInfo
 } from '@/global'
+import { checkHermesUpdate, getActionStatus, updateHermes } from '@/hermes'
 import { translateNow } from '@/i18n'
 import { persistString, storedString } from '@/lib/storage'
 import { dismissNotification, notify } from '@/store/notifications'
+import { $connection } from '@/store/session'
+import type { BackendUpdateCheckResponse } from '@/types/hermes'

 export interface UpdateApplyState {
  applying: boolean
@ -45,8 +48,24 @@ export const $updateChecking = atom<boolean>(false)
 export const $updateOverlayOpen = atom<boolean>(false)
 export const $updateStatus = atom<DesktopUpdateStatus | null>(null)

+// Client and backend are independently updatable; each keeps its own state.
+export const $backendUpdateStatus = atom<DesktopUpdateStatus | null>(null)
+export const $backendUpdateApply = atom<UpdateApplyState>(IDLE)
+export const $backendUpdateChecking = atom<boolean>(false)
+
+export type UpdateTarget = 'client' | 'backend'
+export const $updateOverlayTarget = atom<UpdateTarget>('client')
+
 export const setUpdateOverlayOpen = (open: boolean) => $updateOverlayOpen.set(open)
-export const resetUpdateApplyState = () => $updateApply.set(IDLE)
+export const openUpdateOverlayFor = (target: UpdateTarget) => {
+  $updateOverlayTarget.set(target)
+  $updateOverlayOpen.set(true)
+  void (target === 'backend' ? checkBackendUpdates() : checkUpdates())
+}
+export const resetUpdateApplyState = () => {
+  $updateApply.set(IDLE)
+  $backendUpdateApply.set(IDLE)
+}

 const UPDATE_TOAST_ID = 'desktop-update-available'
 // Time-based snooze instead of per-sha dismissal: this repo lands ~100 commits
@ -86,7 +105,7 @@ export function reportBackendContract(contract: number | undefined): void {
  }

  notify({
-    action: { label: translateNow('notifications.updateHermes'), onClick: () => void applyUpdates() },
+    action: { label: translateNow('notifications.updateHermes'), onClick: () => void applyBackendUpdate() },
    durationMs: 0,
    id: SKEW_TOAST_ID,
    kind: 'warning',
@ -137,13 +156,8 @@ export function maybeNotifyUpdateAvailable(status: DesktopUpdateStatus | null) {
  })
 }

-/**
- * Opens the updates dialog and kicks off a fresh check so the user always
- * sees current state, even if a stale status is cached from earlier.
- */
 export function openUpdatesWindow(): void {
-  $updateOverlayOpen.set(true)
-  void checkUpdates()
+  openUpdateOverlayFor(isRemoteMode() ? 'backend' : 'client')
 }

 /** Re-read the running app's version from the Electron main process and
@ -174,6 +188,52 @@ export async function refreshDesktopVersion(): Promise<DesktopVersionInfo | null
  }
 }

+function isRemoteMode(): boolean {
+  return $connection.get()?.mode === 'remote'
+}
+
+function mapBackendCheck(res: BackendUpdateCheckResponse): DesktopUpdateStatus {
+  const behind = res.behind ?? 0
+
+  return {
+    supported: res.can_apply,
+    message: res.message ?? undefined,
+    behind: behind > 0 ? behind : 0,
+    targetSha: res.update_available ? `backend:${res.current_version}` : undefined,
+    commits: res.commits,
+    fetchedAt: Date.now()
+  }
+}
+
+export async function checkBackendUpdates(): Promise<DesktopUpdateStatus | null> {
+  if (!isRemoteMode() || $backendUpdateChecking.get()) {
+    return $backendUpdateStatus.get()
+  }
+
+  $backendUpdateChecking.set(true)
+
+  try {
+    const status = mapBackendCheck(await checkHermesUpdate(true))
+    $backendUpdateStatus.set(status)
+    maybeNotifyUpdateAvailable(status)
+
+    return status
+  } catch (error) {
+    const fallback: DesktopUpdateStatus = {
+      supported: $backendUpdateStatus.get()?.supported ?? true,
+      error: 'check-failed',
+      message: error instanceof Error ? error.message : String(error),
+      fetchedAt: Date.now()
+    }
+
+    $backendUpdateStatus.set(fallback)
+
+    return fallback
+  } finally {
+    $backendUpdateChecking.set(false)
+  }
+}
+
 export async function checkUpdates(): Promise<DesktopUpdateStatus | null> {
  const bridge = window.hermesDesktop?.updates

@ -187,9 +247,6 @@ export async function checkUpdates(): Promise<DesktopUpdateStatus | null> {
    const status = await bridge.check()
    $updateStatus.set(status)
    maybeNotifyUpdateAvailable(status)
-    // The update check pulls the latest hermes_cli + bundled package metadata
-    // into place. Re-read the running version so About reflects the now-fresh
-    // checkout rather than the one captured at process start.
    void refreshDesktopVersion()

    return status
@ -247,6 +304,107 @@ export async function applyUpdates(opts: DesktopUpdateApplyOptions = {}): Promis
  }
 }

+const BACKEND_RETURN_POLL_MS = 1500
+const BACKEND_RETURN_MAX_ATTEMPTS = 40
+
+async function waitForBackendReturn(): Promise<boolean> {
+  for (let attempt = 0; attempt < BACKEND_RETURN_MAX_ATTEMPTS; attempt += 1) {
+    await new Promise(resolve => globalThis.setTimeout(resolve, BACKEND_RETURN_POLL_MS))
+    try {
+      await checkHermesUpdate()
+
+      return true
+    } catch {
+      continue
+    }
+  }
+
+  return false
+}
+
+function finishBackendApply(returned: boolean): DesktopUpdateApplyResult {
+  if (returned) {
+    $backendUpdateApply.set(IDLE)
+    setUpdateOverlayOpen(false)
+    void checkBackendUpdates()
+
+    return { ok: true, message: 'Backend update applied.' }
+  }
+
+  $backendUpdateApply.set({
+    ...$backendUpdateApply.get(),
+    applying: false,
+    stage: 'error',
+    error: 'apply-failed',
+    message: translateNow('updates.applyStatus.noReturn')
+  })
+
+  return { ok: false, error: 'apply-failed', message: 'Backend did not come back online.' }
+}
+
+export async function applyBackendUpdate(): Promise<DesktopUpdateApplyResult> {
+  dismissNotification(UPDATE_TOAST_ID)
+  $backendUpdateApply.set({ ...IDLE, applying: true, stage: 'prepare', message: translateNow('updates.applyStatus.preparing') })
+
+  try {
+    const started = await updateHermes()
+
+    if (!started.ok) {
+      const message = (started as { message?: string }).message || translateNow('updates.applyStatus.notAvailable')
+      const command = (started as { update_command?: string }).update_command || 'hermes update'
+      $backendUpdateApply.set({ ...IDLE, applying: false, stage: 'manual', message, command })
+
+      return { ok: false, error: 'manual', manual: true, message, command }
+    }
+
+    $backendUpdateApply.set({ ...IDLE, applying: true, stage: 'pull', message: translateNow('updates.applyStatus.pulling') })
+
+    let last: Awaited<ReturnType<typeof getActionStatus>> | null = null
+    for (let attempt = 0; attempt < 30; attempt += 1) {
+      await new Promise(resolve => globalThis.setTimeout(resolve, 1500))
+      try {
+        last = await getActionStatus(started.name, 200)
+      } catch {
+        // The dashboard restarts mid-update, dropping this connection — expected, not a failure.
+        $backendUpdateApply.set({
+          ...$backendUpdateApply.get(),
+          applying: true,
+          stage: 'restart',
+          message: translateNow('updates.applyStatus.restarting')
+        })
+
+        return finishBackendApply(await waitForBackendReturn())
+      }
+
+      if (last && !last.running) {
+        break
+      }
+    }
+
+    const ok = !!last && (last.exit_code ?? 1) === 0
+    if (ok) {
+      $backendUpdateApply.set({ ...$backendUpdateApply.get(), applying: true, stage: 'restart', message: translateNow('updates.applyStatus.restarting') })
+
+      return finishBackendApply(await waitForBackendReturn())
+    }
+
+    $backendUpdateApply.set({
+      ...$backendUpdateApply.get(),
+      applying: false,
+      stage: 'error',
+      error: 'apply-failed',
+      message: translateNow('updates.applyStatus.failed')
+    })
+
+    return { ok: false, error: 'apply-failed', message: 'Backend update failed.' }
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error)
+    $backendUpdateApply.set({ ...$backendUpdateApply.get(), applying: false, stage: 'error', error: 'apply-failed', message })
+
+    return { ok: false, error: 'apply-failed', message }
+  }
+}
+
 function ingestProgress(payload: DesktopUpdateProgress): void {
  const current = $updateApply.get()
  const log = [...current.log, { stage: payload.stage, message: payload.message, at: payload.at }].slice(-50)
@ -267,6 +425,8 @@ function ingestProgress(payload: DesktopUpdateProgress): void {
 let pollerStarted = false
 let backgroundTimer: ReturnType<typeof setInterval> | null = null
 let lastFocusAt = 0
+let connectionUnsub: (() => void) | null = null
+let lastConnectionMode: string | undefined

 /** Wire up background polling + progress streaming. Idempotent. */
 export function startUpdatePoller(): void {
@ -282,11 +442,28 @@ export function startUpdatePoller(): void {

  pollerStarted = true
  void checkUpdates()
+  void checkBackendUpdates()
  void refreshDesktopVersion()
  bridge.onProgress(ingestProgress)

+  // The poller starts at mount, before the gateway connects — so the first
+  // backend check above sees mode≠remote and no-ops. Re-check once the
+  // connection resolves to remote.
+  connectionUnsub = $connection.subscribe(conn => {
+    if (conn?.mode === lastConnectionMode) {
+      return
+    }
+    lastConnectionMode = conn?.mode
+    if (conn?.mode === 'remote') {
+      void checkBackendUpdates()
+    }
+  })
+
  window.addEventListener('focus', onFocus)
-  backgroundTimer = setInterval(() => void checkUpdates(), 30 * 60 * 1000)
+  backgroundTimer = setInterval(() => {
+    void checkUpdates()
+    void checkBackendUpdates()
+  }, 30 * 60 * 1000)
 }

 export function stopUpdatePoller(): void {
@ -295,6 +472,9 @@ export function stopUpdatePoller(): void {
    backgroundTimer = null
  }

+  connectionUnsub?.()
+  connectionUnsub = null
+  lastConnectionMode = undefined
  window.removeEventListener('focus', onFocus)
  pollerStarted = false
 }
@ -308,8 +488,6 @@ function onFocus() {

  lastFocusAt = now
  void checkUpdates()
-  // Cheap and safe to re-read on every (throttled) focus: the user may have
-  // updated Hermes from another window/CLI between focuses, and About should
-  // catch up without forcing a restart.
+  void checkBackendUpdates()
  void refreshDesktopVersion()
 }
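
Review note: `mapBackendCheck` is the seam between the backend's `/update/check` payload and the status shape the overlay already understands. The sketch below reproduces its mapping on the two payloads used in the tests. The interfaces are inferred from the diff and its tests rather than copied from `@/types/hermes` or `@/global`, `BackendStatus` is a hypothetical name, and `now` is passed in instead of calling `Date.now()` so the result is deterministic.

```typescript
// Shapes inferred from the diff; treat the exact field types as assumptions.
interface BackendUpdateCheckResponse {
  install_method: string
  current_version: string
  behind: number | null
  update_available: boolean
  can_apply: boolean
  update_command: string
  message: string | null
  commits?: { sha: string; summary: string; author: string; at: number }[]
}

interface BackendStatus {
  supported: boolean
  message?: string
  behind: number
  targetSha?: string
  commits?: BackendUpdateCheckResponse['commits']
  fetchedAt: number
}

function mapBackendCheck(res: BackendUpdateCheckResponse, now: number): BackendStatus {
  const behind = res.behind ?? 0

  return {
    supported: res.can_apply, // docker/nix installs report can_apply=false
    message: res.message ?? undefined,
    behind: behind > 0 ? behind : 0, // null or negative counts clamp to 0
    targetSha: res.update_available ? `backend:${res.current_version}` : undefined,
    commits: res.commits,
    fetchedAt: now
  }
}

const gitStatus = mapBackendCheck(
  {
    install_method: 'git',
    current_version: '0.16.0',
    behind: 2,
    update_available: true,
    can_apply: true,
    update_command: 'hermes update',
    message: null,
    commits: [{ sha: 'abc1234', summary: 'feat: x', author: 'a', at: 1 }]
  },
  1000
)

const dockerStatus = mapBackendCheck(
  {
    install_method: 'docker',
    current_version: '0.16.0',
    behind: null,
    update_available: false,
    can_apply: false,
    update_command: 'docker pull ...',
    message: 'Docker images are immutable.'
  },
  1000
)
```

Note that `targetSha` is derived from the current version rather than a commit sha, so it acts as an opaque marker that an update exists, not as a pointer to the target commit.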
--- a/apps/desktop/src/styles.css
+++ b/apps/desktop/src/styles.css
@ -888,52 +888,42 @@ canvas {
 }

 .fit-text {
+  --fit-captured-length: initial;
+  --fit-support-sentinel: var(--fit-captured-length, 9999px);
+
  display: flex;
-  font-size: var(--fit-text-min, 1rem);
  container-type: inline-size;
-  --captured-length: initial;
-  --support-sentinel: var(--captured-length, 9999px);
 }

-.fit-text > [aria-hidden='true'] {
+.fit-text > [aria-hidden] {
  visibility: hidden;
 }

-.fit-text > :not([aria-hidden='true']) {
+.fit-text > :not([aria-hidden]) {
  flex-grow: 1;
  container-type: inline-size;
-  --captured-length: 100cqi;
-  --available-space: var(--captured-length);
+
+  --fit-captured-length: 100cqi;
+  --fit-available-space: var(--fit-captured-length);
 }

-.fit-text > :not([aria-hidden='true']) > * {
+.fit-text > :not([aria-hidden]) > * {
+  --fit-support-sentinel: inherit;
+  --fit-captured-length: 100cqi;
+  --fit-ratio: tan(atan2(var(--fit-available-space), var(--fit-available-space) - var(--fit-captured-length)));
+
  display: block;
-  inline-size: var(--available-space);
-  line-height: var(--fit-text-line-height, 1);
-  --support-sentinel: inherit;
-  --captured-length: 100cqi;
-  --ratio: tan(atan2(var(--available-space), var(--available-space) - var(--captured-length)));
-  --font-size: clamp(
-    var(--fit-text-min, 1em),
-    1em * var(--ratio),
-    var(--fit-text-max, infinity * 1px) - var(--support-sentinel)
-  );
-  font-size: var(--font-size);
+  inline-size: var(--fit-available-space);
+  font-size: clamp(var(--fit-min, 1em), 1em * var(--fit-ratio), var(--fit-max, infinity * 1px) - var(--fit-support-sentinel));
 }

@container (inline-size > 0) {
-  .fit-text > :not([aria-hidden='true']) > * {
+  .fit-text > :not([aria-hidden]) > * {
    white-space: nowrap;
  }
 }

-@property --captured-length {
-  syntax: '<length>';
-  initial-value: 0px;
-  inherits: true;
-}
-
-@property --captured-length2 {
+@property --fit-captured-length {
  syntax: '<length>';
  initial-value: 0px;
  inherits: true;
--- a/apps/desktop/src/themes/context.tsx
+++ b/apps/desktop/src/themes/context.tsx
@ -9,15 +9,28 @@
 * The two are persisted independently. Shift+X toggles light/dark.
 */

+import { useStore } from '@nanostores/react'
 import { createContext, type ReactNode, useCallback, useContext, useEffect, useMemo, useState } from 'react'

 import { matchesQuery, useMediaQuery } from '@/hooks/use-media-query'
+import { persistString, persistStringRecord, storedString, storedStringRecord } from '@/lib/storage'
+import { $activeGatewayProfile, normalizeProfileKey } from '@/store/profile'

 import { BUILTIN_THEME_LIST, BUILTIN_THEMES, DEFAULT_SKIN_NAME, DEFAULT_TYPOGRAPHY, nousTheme } from './presets'
 import type { DesktopTheme, DesktopThemeColors } from './types'

+// Legacy global skin key, which predates per-profile themes. It remains the
+// inheritance fallback for any profile without its own assignment, so
+// single-profile users and old installs are unaffected.
 const SKIN_KEY = 'hermes-desktop-theme-v2'
 const MODE_KEY = 'hermes-desktop-mode-v1'
+// Per-profile skin + light/dark mode assignments: { [profileKey]: value }. A
+// profile inherits the global default until it's given its own appearance.
+const PROFILE_SKINS_KEY = 'hermes-desktop-profile-themes-v1'
+const PROFILE_MODES_KEY = 'hermes-desktop-profile-modes-v1'
+// Last active profile, recorded so the boot-time paint can pick that profile's
+// theme before the gateway reports which profile actually launched.
+const LAST_PROFILE_KEY = 'hermes-desktop-active-profile-v1'
 const RETIRED_SKINS = new Set(['nous-light', 'default', 'gold'])

 export type ThemeMode = 'light' | 'dark' | 'system'
@ -27,9 +40,36 @@ const INJECTED_FONT_URLS = new Set<string>()
 const resolveMode = (mode: ThemeMode, systemDark = matchesQuery('(prefers-color-scheme: dark)')): 'light' | 'dark' =>
  mode === 'system' ? (systemDark ? 'dark' : 'light') : mode

-const normalizeSkin = (name: string | null | undefined): string =>
+const normalizeSkin = (name: string | null): string =>
  name && BUILTIN_THEMES[name] && !RETIRED_SKINS.has(name) ? name : DEFAULT_SKIN_NAME

+const normalizeMode = (value: string | null): ThemeMode =>
+  value === 'light' || value === 'dark' || value === 'system' ? value : 'light'
+
+// ─── Per-profile appearance persistence ─────────────────────────────────────
+// Skin and mode are each stored per profile. "default" isn't a real profile —
+// it *is* the legacy global slot, so it reads/writes the global directly. Named
+// profiles get their own entry and fall back to that global until assigned, so
+// unassigned profiles and pre-per-profile installs stay on the global value.
+const profilePref = <T extends string>(record: string, legacy: string, normalize: (v: string | null) => T) => ({
+  resolve: (profile: string): T => normalize(storedStringRecord(record)[profile] ?? storedString(legacy)),
+  assign: (profile: string, value: T): void => {
+    if (profile === 'default') {
+      persistString(legacy, value)
+    } else {
+      persistStringRecord(record, { ...storedStringRecord(record), [profile]: value })
+    }
+  }
+})
+
+export const skinPref = profilePref(PROFILE_SKINS_KEY, SKIN_KEY, normalizeSkin)
+export const modePref = profilePref(PROFILE_MODES_KEY, MODE_KEY, normalizeMode)
+
+// Last active profile — lets the boot paint pick its appearance before the
+// gateway reports which profile actually launched.
+const readBootProfileKey = () => normalizeProfileKey(storedString(LAST_PROFILE_KEY))
+const rememberActiveProfileKey = (profile: string) => persistString(LAST_PROFILE_KEY, profile)
+
 // ─── Color math (for synthesised light variants of dark-only skins) ────────

 function hexToRgb(hex: string): [number, number, number] | null {
@ -231,12 +271,13 @@ function applyTheme(theme: DesktopTheme, mode: 'light' | 'dark') {
  }
 }

-// Boot-time paint to avoid a flash before <ThemeProvider> mounts.
+// Boot-time paint to avoid a flash before <ThemeProvider> mounts. Use the last
+// active profile's appearance so a non-default profile relaunch paints its own
+// skin + light/dark mode.
 if (typeof window !== 'undefined') {
-  const skin = normalizeSkin(window.localStorage.getItem(SKIN_KEY))
-  const mode = (window.localStorage.getItem(MODE_KEY) as ThemeMode) ?? 'light'
-  const resolved = resolveMode(mode)
-  applyTheme(deriveTheme(skin, resolved), resolved)
+  const profile = readBootProfileKey()
+  const resolved = resolveMode(modePref.resolve(profile))
+  applyTheme(deriveTheme(skinPref.resolve(profile), resolved), resolved)
 }

 // ─── Context ────────────────────────────────────────────────────────────────
@ -264,29 +305,46 @@ const ThemeContext = createContext<ThemeContextValue>({
 })

 export function ThemeProvider({ children }: { children: ReactNode }) {
+  // Skin + mode are assigned per profile; the active profile drives which
+  // appearance shows. Single-profile users only ever see "default", so their
+  // behavior is unchanged.
+  const profileKey = normalizeProfileKey(useStore($activeGatewayProfile))
+
  const [themeName, setThemeNameState] = useState(() =>
-    typeof window === 'undefined' ? DEFAULT_SKIN_NAME : normalizeSkin(window.localStorage.getItem(SKIN_KEY))
+    typeof window === 'undefined' ? DEFAULT_SKIN_NAME : skinPref.resolve(readBootProfileKey())
  )

  const [mode, setModeState] = useState<ThemeMode>(() =>
-    typeof window === 'undefined' ? 'light' : ((window.localStorage.getItem(MODE_KEY) as ThemeMode) ?? 'light')
+    typeof window === 'undefined' ? 'light' : modePref.resolve(readBootProfileKey())
  )

+  // Follow profile switches: paint the profile's assigned skin + mode and
+  // remember it for the next boot's first paint.
+  useEffect(() => {
+    rememberActiveProfileKey(profileKey)
+    setThemeNameState(skinPref.resolve(profileKey))
+    setModeState(modePref.resolve(profileKey))
+  }, [profileKey])
+
  const systemDark = useMediaQuery('(prefers-color-scheme: dark)')
  const resolvedMode = resolveMode(mode, systemDark)
  const activeTheme = useMemo(() => deriveTheme(themeName, resolvedMode), [themeName, resolvedMode])

  useEffect(() => applyTheme(activeTheme, resolvedMode), [activeTheme, resolvedMode])

+  // Assign to whichever profile is live right now (read fresh so the callbacks
+  // stay stable across profile switches).
+  const liveProfile = () => normalizeProfileKey($activeGatewayProfile.get())
+
  const setTheme = useCallback((name: string) => {
    const next = normalizeSkin(name)
    setThemeNameState(next)
-    window.localStorage.setItem(SKIN_KEY, next)
+    skinPref.assign(liveProfile(), next)
  }, [])

  const setMode = useCallback((next: ThemeMode) => {
    setModeState(next)
-    window.localStorage.setItem(MODE_KEY, next)
+    modePref.assign(liveProfile(), next)
  }, [])

  // The light/dark toggle (Shift+X by default) is owned by the keybind runtime
--- a/apps/desktop/src/themes/profile-theme.test.ts
+++ b/apps/desktop/src/themes/profile-theme.test.ts
@ -0,0 +1,41 @@
+import { beforeEach, describe, expect, it } from 'vitest'
+
+import { modePref, skinPref } from './context'
+import { DEFAULT_SKIN_NAME } from './presets'
+
+// Skin and mode share one per-profile contract, so assert it once over both.
+interface Pref {
+  resolve: (profile: string) => string
+  assign: (profile: string, value: string) => void
+}
+
+const cases = [
+  { name: 'skin', pref: skinPref as unknown as Pref, fallback: DEFAULT_SKIN_NAME, a: 'ember', b: 'midnight', junk: 'nope' },
+  { name: 'mode', pref: modePref as unknown as Pref, fallback: 'light', a: 'dark', b: 'system', junk: 'dusk' }
+]
+
+describe.each(cases)('per-profile $name', ({ pref, fallback, a, b, junk }) => {
+  beforeEach(() => window.localStorage.clear())
+
+  it('falls back to the default when unassigned', () => {
+    expect(pref.resolve('default')).toBe(fallback)
+    expect(pref.resolve('work')).toBe(fallback)
+  })
+
+  it('keeps each profile on its own value', () => {
+    pref.assign('work', a)
+    pref.assign('default', b)
+    expect(pref.resolve('work')).toBe(a)
+    expect(pref.resolve('default')).toBe(b)
+  })
+
+  it('lets unassigned profiles inherit the default profile as the global fallback', () => {
+    pref.assign('default', a)
+    expect(pref.resolve('never-themed')).toBe(a)
+  })
+
+  it('normalizes an unknown stored value back to the default', () => {
+    pref.assign('work', junk)
+    expect(pref.resolve('work')).toBe(fallback)
+  })
+})
--- a/apps/desktop/src/types/hermes.ts
+++ b/apps/desktop/src/types/hermes.ts
@ -596,6 +596,27 @@ export interface ActionStatusResponse {
  running: boolean
 }

+export interface BackendUpdateCommit {
+  sha: string
+  summary: string
+  author: string
+  at: number
+}
+
+/** Shape of `GET /api/hermes/update/check` — the backend's own update state.
+ *  Used by the desktop's remote update overlay so the backend version (not the
+ *  Electron client clone) drives "what's changed + Install" in remote mode. */
+export interface BackendUpdateCheckResponse {
+  install_method: string
+  current_version: string
+  behind: number | null
+  update_available: boolean
+  can_apply: boolean
+  update_command: string | null
+  message: string | null
+  commits?: BackendUpdateCommit[]
+}
+
 export interface AuxiliaryTaskAssignment {
  base_url: string
  model: string
--- a/apps/desktop/vite.config.ts
+++ b/apps/desktop/vite.config.ts
@ -6,6 +6,19 @@ import path from 'path'
 export default defineConfig({
  base: './',
  plugins: [react(), tailwindcss()],
+  css: {
+    // Pin an explicit (empty) PostCSS config. Tailwind is handled entirely by
+    // `@tailwindcss/vite`, so the renderer needs no PostCSS plugins — and
+    // without this, Vite's `postcss-load-config` walks UP the filesystem
+    // looking for a stray `postcss.config.*` / `tailwind.config.*`. The desktop
+    // build runs from inside the user's home tree (e.g.
+    // `C:\Users\<name>\AppData\Local\hermes\hermes-agent\apps\desktop`), so an
+    // unrelated Tailwind v3 config higher up the tree gets picked up and
+    // reprocesses our v4 stylesheet, failing the build with
+    // "`@layer base` is used but no matching `@tailwind base` directive is
+    // present." Pinning the config makes the build hermetic.
+    postcss: { plugins: [] }
+  },
  build: {
    // Keep desktop packaging stable: Shiki ships many dynamic chunks by
    // default, and electron-builder can OOM scanning thousands of files.
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@ -885,7 +885,7 @@ delegation:
  max_iterations: 50                          # Max tool-calling turns per child (default: 50)
  # max_concurrent_children: 3                # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
                                              # WARNING: values above 10 multiply API cost linearly.
-  # max_spawn_depth: 1                        # Delegation tree depth (floor 1, no ceiling; default: 1 = flat).
+  # max_spawn_depth: 1                        # Delegation tree depth cap (range: 1-3, default: 1 = flat).
                                              # Raise to 2 to allow workers to spawn their own subagents.
                                              # Requires role="orchestrator" on intermediate agents.
  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
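
The revised `max_spawn_depth` comment above documents a 1-3 range with a default of 1. A minimal sketch of how a loader might enforce that range is shown below. The helper name `resolve_max_spawn_depth` and the clamp-rather-than-reject behavior are assumptions made for illustration; this diff does not show how the CLI actually handles out-of-range values.

```python
from typing import Any, Mapping

# Bounds taken from the config comment (range: 1-3, default: 1 = flat).
MIN_SPAWN_DEPTH = 1
MAX_SPAWN_DEPTH = 3
DEFAULT_SPAWN_DEPTH = 1


def resolve_max_spawn_depth(delegation: Mapping[str, Any]) -> int:
    """Hypothetical helper: read max_spawn_depth and clamp it into 1..3."""
    raw = delegation.get("max_spawn_depth", DEFAULT_SPAWN_DEPTH)
    try:
        depth = int(raw)
    except (TypeError, ValueError):
        # Non-numeric or null values fall back to the documented default.
        return DEFAULT_SPAWN_DEPTH
    return max(MIN_SPAWN_DEPTH, min(MAX_SPAWN_DEPTH, depth))
```

Under these assumptions, a missing key yields 1, a value of 2 passes through unchanged, and anything above 3 is capped at 3.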
--- a/cli.py
+++ b/cli.py
--- a/gateway/authz_mixin.py
+++ b/gateway/authz_mixin.py
@ -0,0 +1,426 @@
+"""User-authorization methods for ``GatewayRunner``.
+
+Extracted from ``gateway/run.py`` as part of the god-file decomposition campaign
+(``~/.hermes/plans/god-file-decomposition.md``, Phase 3 mechanical mixin lifts).
+This mixin holds the inbound-message authorization cluster: whether a user/chat
+is allowed to talk to the agent, the per-adapter DM policy, and the
+unauthorized-DM behavior.
+
+Behavior-neutral: every method is lifted verbatim from ``GatewayRunner``.
+``self.*`` calls resolve unchanged via the MRO. Neutral dependencies import at
+module top; the module-level ``logger`` is imported lazily inside the one method
+that uses it (``from gateway.run import logger`` resolves at call time, when
+``gateway.run`` is fully loaded), so this module never imports ``gateway.run`` at
+import time and no import cycle arises. The lazy import preserves the exact logger name
+(``"gateway.run"``) so log records are unchanged.
+"""
+
+from __future__ import annotations
+
+import os
+from typing import Optional
+
+from gateway.config import Platform
+from gateway.session import SessionSource
+from gateway.whatsapp_identity import (
+    expand_whatsapp_aliases as _expand_whatsapp_auth_aliases,
+    normalize_whatsapp_identifier as _normalize_whatsapp_identifier,
+)
+
+
+class GatewayAuthorizationMixin:
+    """User/chat authorization methods for ``GatewayRunner``."""
+
+    def _adapter_enforces_own_access_policy(self, platform: Optional[Platform]) -> bool:
+        """Whether the adapter for *platform* gates access at intake itself.
+
+        Mirrors ``BasePlatformAdapter.enforces_own_access_policy``. Adapters
+        such as WeCom, Weixin, Yuanbao, QQBot, and WhatsApp evaluate their
+        documented ``dm_policy`` / ``group_policy`` / ``allow_from`` config before a
+        message is dispatched to the gateway, so a message that reaches
+        ``_is_user_authorized`` has already been authorized by the adapter.
+        Defaults to ``False`` when the adapter is unknown or doesn't expose
+        the flag.
+        """
+        if not platform:
+            return False
+        # Some test helpers build a bare GatewayRunner via object.__new__ and
+        # never set ``adapters``; treat a missing/empty map as "no adapter"
+        # rather than raising (see pitfalls.md #17).
+        adapters = getattr(self, "adapters", None)
+        if not adapters:
+            return False
+        adapter = adapters.get(platform)
+        if adapter is None:
+            return False
+        return bool(getattr(adapter, "enforces_own_access_policy", False))
+
+    def _adapter_dm_policy(self, platform: Optional[Platform]) -> str:
+        """Best-effort read of an own-policy adapter's effective DM policy.
+
+        Returns the lowercased ``dm_policy`` (``"open"`` / ``"allowlist"`` /
+        ``"disabled"`` / ``"pairing"``) for *platform*, or ``""`` when unknown.
+        Prefers the live adapter's resolved ``_dm_policy`` — which already folds
+        in both ``config.extra`` and the ``<PLATFORM>_DM_POLICY`` env var (the
+        env var is not always bridged back into ``config.extra``) — and falls
+        back to ``config.extra`` for bare runners built without a live adapter.
+
+        Used by ``_is_user_authorized`` to carve ``dm_policy: pairing`` out of
+        the adapter-trust shortcut: in pairing mode the adapter forwards the DM
+        so the gateway can run its pairing handshake, so "reached the gateway"
+        must not be read as "authorized".
+        """
+        if not platform:
+            return ""
+        adapters = getattr(self, "adapters", None) or {}
+        adapter = adapters.get(platform)
+        policy = getattr(adapter, "_dm_policy", None) if adapter is not None else None
+        if policy is None:
+            config = getattr(self, "config", None)
+            platform_cfg = (
+                config.platforms.get(platform)
+                if config is not None and hasattr(config, "platforms")
+                else None
+            )
+            extra = getattr(platform_cfg, "extra", None) if platform_cfg else None
+            if isinstance(extra, dict):
+                policy = extra.get("dm_policy")
+        return str(policy or "").strip().lower()
+
+    def _is_user_authorized(self, source: SessionSource) -> bool:
+        """
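+        Check if a user is authorized to use the bot.
+
+        Checks in order:
+        1. Home Assistant and webhook sources: always authorized
+        2. Chat-scoped group allowlists (e.g., TELEGRAM_GROUP_ALLOWED_CHATS)
+        3. Missing user ID: deny
+        4. Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
+        5. Bots admitted by {PLATFORM}_ALLOW_BOTS
+        6. DM pairing approved list
+        7. No allowlists configured: adapter-enforced access policy, then
+           global allow-all (GATEWAY_ALLOW_ALL_USERS=true)
+        8. Environment variable allowlists (TELEGRAM_ALLOWED_USERS, etc.)
+        9. Default: deny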
+        """
+        from gateway.run import logger
+        # Home Assistant events are system-generated (state changes), not
+        # user-initiated messages.  The HASS_TOKEN already authenticates the
+        # connection, so HA events are always authorized.
+        # Webhook events are authenticated via HMAC signature validation in
+        # the adapter itself — no user allowlist applies.
+        if source.platform in {Platform.HOMEASSISTANT, Platform.WEBHOOK}:
+            return True
+
+        user_id = source.user_id
+
+        # Telegram (and similar) authorize entire group/forum/channel chats
+        # by chat ID via TELEGRAM_GROUP_ALLOWED_CHATS / QQ_GROUP_ALLOWED_USERS.
+        # That allowlist is chat-scoped, so it must work even when
+        # source.user_id is None — Telegram emits anonymous-admin posts,
+        # sender_chat traffic, and channel broadcasts with no `from_user`,
+        # and an operator who explicitly listed the chat expects those to
+        # be honored. Run this check before the no-user-id guard below so
+        # documented behavior matches reality
+        # (website/docs/reference/environment-variables.md,
+        # website/docs/user-guide/messaging/telegram.md).
+        if source.chat_type in {"group", "forum", "channel"} and source.chat_id:
+            chat_allowlist_env = {
+                Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_CHATS",
+                Platform.QQBOT: "QQ_GROUP_ALLOWED_USERS",
+            }.get(source.platform, "")
+            if chat_allowlist_env:
+                raw_chat_allowlist = os.getenv(chat_allowlist_env, "").strip()
+                if raw_chat_allowlist:
+                    allowed_group_ids = {
+                        cid.strip()
+                        for cid in raw_chat_allowlist.split(",")
+                        if cid.strip()
+                    }
+                    if "*" in allowed_group_ids or source.chat_id in allowed_group_ids:
+                        return True
+
+        if not user_id:
+            return False
+
+        platform_env_map = {
+            Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
+            Platform.DISCORD: "DISCORD_ALLOWED_USERS",
+            Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
+            Platform.SLACK: "SLACK_ALLOWED_USERS",
+            Platform.SIGNAL: "SIGNAL_ALLOWED_USERS",
+            Platform.EMAIL: "EMAIL_ALLOWED_USERS",
+            Platform.SMS: "SMS_ALLOWED_USERS",
+            Platform.MATTERMOST: "MATTERMOST_ALLOWED_USERS",
+            Platform.MATRIX: "MATRIX_ALLOWED_USERS",
+            Platform.DINGTALK: "DINGTALK_ALLOWED_USERS",
+            Platform.FEISHU: "FEISHU_ALLOWED_USERS",
+            Platform.WECOM: "WECOM_ALLOWED_USERS",
+            Platform.WECOM_CALLBACK: "WECOM_CALLBACK_ALLOWED_USERS",
+            Platform.WEIXIN: "WEIXIN_ALLOWED_USERS",
+            Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOWED_USERS",
+            Platform.QQBOT: "QQ_ALLOWED_USERS",
+            Platform.YUANBAO: "YUANBAO_ALLOWED_USERS",
+        }
+        platform_group_user_env_map = {
+            Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_USERS",
+        }
+        platform_group_chat_env_map = {
+            Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_CHATS",
+            Platform.QQBOT: "QQ_GROUP_ALLOWED_USERS",
+        }
+        platform_allow_all_map = {
+            Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
+            Platform.DISCORD: "DISCORD_ALLOW_ALL_USERS",
+            Platform.WHATSAPP: "WHATSAPP_ALLOW_ALL_USERS",
+            Platform.SLACK: "SLACK_ALLOW_ALL_USERS",
+            Platform.SIGNAL: "SIGNAL_ALLOW_ALL_USERS",
+            Platform.EMAIL: "EMAIL_ALLOW_ALL_USERS",
+            Platform.SMS: "SMS_ALLOW_ALL_USERS",
+            Platform.MATTERMOST: "MATTERMOST_ALLOW_ALL_USERS",
+            Platform.MATRIX: "MATRIX_ALLOW_ALL_USERS",
+            Platform.DINGTALK: "DINGTALK_ALLOW_ALL_USERS",
+            Platform.FEISHU: "FEISHU_ALLOW_ALL_USERS",
+            Platform.WECOM: "WECOM_ALLOW_ALL_USERS",
+            Platform.WECOM_CALLBACK: "WECOM_CALLBACK_ALLOW_ALL_USERS",
+            Platform.WEIXIN: "WEIXIN_ALLOW_ALL_USERS",
+            Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOW_ALL_USERS",
+            Platform.QQBOT: "QQ_ALLOW_ALL_USERS",
+            Platform.YUANBAO: "YUANBAO_ALLOW_ALL_USERS",
+        }
+        # Bots admitted by {PLATFORM}_ALLOW_BOTS bypass the human allowlist (#4466).
+        platform_allow_bots_map = {
+            Platform.DISCORD: "DISCORD_ALLOW_BOTS",
+            Platform.FEISHU: "FEISHU_ALLOW_BOTS",
+        }
+
+        # Plugin platforms: check the registry for auth env var names
+        if source.platform not in platform_env_map:
+            try:
+                from gateway.platform_registry import platform_registry
+                entry = platform_registry.get(source.platform.value)
+                if entry:
+                    if entry.allowed_users_env:
+                        platform_env_map[source.platform] = entry.allowed_users_env
+                    if entry.allow_all_env:
+                        platform_allow_all_map[source.platform] = entry.allow_all_env
+            except Exception:
+                pass
+
+        # Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
+        platform_allow_all_var = platform_allow_all_map.get(source.platform, "")
+        if platform_allow_all_var and os.getenv(platform_allow_all_var, "").lower() in {"true", "1", "yes"}:
+            return True
+
+        if getattr(source, "is_bot", False):
+            allow_bots_var = platform_allow_bots_map.get(source.platform)
+            if allow_bots_var and os.getenv(allow_bots_var, "none").lower().strip() in {"mentions", "all"}:
+                return True
+
+        # Check pairing store (always checked, regardless of allowlists)
+        platform_name = source.platform.value if source.platform else ""
+        if self.pairing_store.is_approved(platform_name, user_id):
+            return True
+
+        # Check platform-specific and global allowlists
+        platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""), "").strip()
+        group_user_allowlist = ""
+        group_chat_allowlist = ""
+        if source.chat_type in {"group", "forum"}:
+            group_user_allowlist = os.getenv(platform_group_user_env_map.get(source.platform, ""), "").strip()
+            group_chat_allowlist = os.getenv(platform_group_chat_env_map.get(source.platform, ""), "").strip()
+        global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "").strip()
+
+        if not platform_allowlist and not group_user_allowlist and not group_chat_allowlist and not global_allowlist:
+            # No env allowlists configured. Adapters that own their own
+            # config-driven access policy (dm_policy / group_policy /
+            # allow_from / group_allow_from) already gated this message at
+            # intake — it would not have reached the gateway otherwise — so
+            # honor that decision instead of falling through to the
+            # env-only default-deny below, which would silently break
+            # `dm_policy: open` and config-only allowlists. (#34515)
+            if self._adapter_enforces_own_access_policy(source.platform):
+                # Exception: `dm_policy: pairing` does NOT authorize at intake.
+                # The adapter forwards the DM precisely so the gateway can run
+                # its pairing handshake (issue a code, consult the pairing
+                # store). The pairing-store approval check above already ran and
+                # returned False for this sender, so blanket-trusting the
+                # adapter here would silently turn pairing mode into open
+                # access. Fall through to default-deny so the unpaired sender is
+                # offered a pairing code instead. (Pairing is DM-only; group
+                # traffic keeps the adapter-trust path.)
+                if not (
+                    source.chat_type == "dm"
+                    and self._adapter_dm_policy(source.platform) == "pairing"
+                ):
+                    return True
+            # No allowlists configured -- check global allow-all flag
+            return os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in {"true", "1", "yes"}
+
+        # Telegram can optionally authorize group traffic by chat ID.
+        # Keep this separate from TELEGRAM_GROUP_ALLOWED_USERS, which gates
+        # the sender user ID for group/forum messages.
+        if group_chat_allowlist and source.chat_type in {"group", "forum"} and source.chat_id:
+            allowed_group_ids = {
+                chat_id.strip() for chat_id in group_chat_allowlist.split(",") if chat_id.strip()
+            }
+            if "*" in allowed_group_ids or source.chat_id in allowed_group_ids:
+                return True
+
+        # Backward-compat shim for #15027: prior to PR #17686,
+        # TELEGRAM_GROUP_ALLOWED_USERS was (mis)used as a chat-ID allowlist.
+        # Values starting with "-" are Telegram chat IDs, not user IDs, so if
+        # users still have those in TELEGRAM_GROUP_ALLOWED_USERS we honor them
+        # as chat IDs and warn once. The correct var is now
+        # TELEGRAM_GROUP_ALLOWED_CHATS.
+        if (
+            source.platform == Platform.TELEGRAM
+            and group_user_allowlist
+            and source.chat_type in {"group", "forum"}
+            and source.chat_id
+        ):
+            legacy_chat_ids = {
+                v.strip()
+                for v in group_user_allowlist.split(",")
+                if v.strip().startswith("-")
+            }
+            if legacy_chat_ids:
+                if not getattr(self, "_warned_telegram_group_users_legacy", False):
+                    logger.warning(
+                        "TELEGRAM_GROUP_ALLOWED_USERS contains chat-ID-shaped values "
+                        "(%s). Treating them as chat IDs for backward compatibility. "
+                        "Move chat IDs to TELEGRAM_GROUP_ALLOWED_CHATS — the _USERS var "
+                        "is now for sender user IDs.",
+                        ",".join(sorted(legacy_chat_ids)),
+                    )
+                    self._warned_telegram_group_users_legacy = True
+                if source.chat_id in legacy_chat_ids:
+                    return True
+
+        # Check if user is in any allowlist. In group/forum chats,
+        # TELEGRAM_GROUP_ALLOWED_USERS is the scoped allowlist and should not
+        # imply DM access; TELEGRAM_ALLOWED_USERS remains the platform-wide
+        # allowlist and still works everywhere for backward compatibility.
+        allowed_ids = set()
+        if platform_allowlist:
+            allowed_ids.update(uid.strip() for uid in platform_allowlist.split(",") if uid.strip())
+        if group_user_allowlist:
+            allowed_ids.update(uid.strip() for uid in group_user_allowlist.split(",") if uid.strip())
+        if global_allowlist:
+            allowed_ids.update(uid.strip() for uid in global_allowlist.split(",") if uid.strip())
+
+        # "*" in any allowlist means allow everyone (consistent with
+        # SIGNAL_GROUP_ALLOWED_USERS precedent)
+        if "*" in allowed_ids:
+            return True
+
+        check_ids = {user_id}
+        if "@" in user_id:
+            check_ids.add(user_id.split("@")[0])
+
+        # WhatsApp: resolve phone↔LID aliases from bridge session mapping files
+        if source.platform == Platform.WHATSAPP:
+            normalized_allowed_ids = set()
+            for allowed_id in allowed_ids:
+                normalized_allowed_ids.update(_expand_whatsapp_auth_aliases(allowed_id))
+            if normalized_allowed_ids:
+                allowed_ids = normalized_allowed_ids
+
+            check_ids.update(_expand_whatsapp_auth_aliases(user_id))
+            normalized_user_id = _normalize_whatsapp_identifier(user_id)
+            if normalized_user_id:
+                check_ids.add(normalized_user_id)
+
+        # SimpleX: SIMPLEX_ALLOWED_USERS accepts either the numeric contactId
+        # or the contact's display name. The adapter sets user_id=contactId for
+        # stability across renames, but the SimpleX UI never surfaces the
+        # numeric id — operators only see display names, so that's what they
+        # naturally put in the env var. Match both so the allowlist works
+        # regardless of which form was chosen.
+        # Plugin platform: compare by value since Platform.SIMPLEX is not a
+        # hardcoded enum member (it's a dynamic plugin platform).
+        if (
+            source.platform is not None
+            and source.platform.value == "simplex"
+            and source.user_name
+        ):
+            check_ids.add(source.user_name)
+
+        return bool(check_ids & allowed_ids)
+
+    def _get_unauthorized_dm_behavior(self, platform: Optional[Platform]) -> str:
+        """Return how unauthorized DMs should be handled for a platform.
+
+        Resolution order:
+        1. Explicit per-platform ``unauthorized_dm_behavior`` in config: always wins.
+        2. Explicit global ``unauthorized_dm_behavior`` in config: wins when no
+           per-platform override is set.
+        3. When an allowlist (``PLATFORM_ALLOWED_USERS``,
+           ``PLATFORM_GROUP_ALLOWED_USERS`` / ``PLATFORM_GROUP_ALLOWED_CHATS``,
+           or ``GATEWAY_ALLOWED_USERS``) is configured, default to ``"ignore"`` —
+           the allowlist signals that the owner has deliberately restricted
+           access; spamming unknown contacts with pairing codes is both noisy
+           and a potential info-leak. (#9337)
+        4. No allowlist and no explicit config → ``"pair"`` (open-gateway default).
+        """
+        config = getattr(self, "config", None)
+
+        # Check for an explicit per-platform override first.
+        if config and hasattr(config, "get_unauthorized_dm_behavior") and platform:
+            platform_cfg = config.platforms.get(platform) if hasattr(config, "platforms") else None
+            if platform_cfg and "unauthorized_dm_behavior" in getattr(platform_cfg, "extra", {}):
+                # Operator explicitly configured behavior for this platform — respect it.
+                return config.get_unauthorized_dm_behavior(platform)
+
+        # Check for an explicit global config override.
+        if config and hasattr(config, "unauthorized_dm_behavior"):
+            if config.unauthorized_dm_behavior != "pair":  # non-default → explicit override
+                return config.unauthorized_dm_behavior
+
+        # Config-driven dm_policy (WeCom / Weixin / Yuanbao / QQBot). An
+        # allowlist or disabled DM policy means the operator restricted access,
+        # so unauthorized DMs should be dropped silently rather than answered
+        # with a pairing code. An explicit pairing policy opts back into codes.
+        if platform and config and hasattr(config, "platforms"):
+            platform_cfg = config.platforms.get(platform)
+            extra = getattr(platform_cfg, "extra", None) if platform_cfg else None
+            if isinstance(extra, dict):
+                dm_policy = str(extra.get("dm_policy") or "").strip().lower()
+                if dm_policy == "pairing":
+                    return "pair"
+                if dm_policy in {"allowlist", "disabled"}:
+                    return "ignore"
+
+        # No explicit override.  Fall back to allowlist-aware default:
+        # if any allowlist is configured for this platform, silently drop
+        # unauthorized messages instead of sending pairing codes.
+        if platform:
+            platform_env_map = {
+                Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
+                Platform.DISCORD:  "DISCORD_ALLOWED_USERS",
+                Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
+                Platform.SLACK:    "SLACK_ALLOWED_USERS",
+                Platform.SIGNAL:   "SIGNAL_ALLOWED_USERS",
+                Platform.EMAIL:    "EMAIL_ALLOWED_USERS",
+                Platform.SMS:      "SMS_ALLOWED_USERS",
+                Platform.MATTERMOST: "MATTERMOST_ALLOWED_USERS",
+                Platform.MATRIX:   "MATRIX_ALLOWED_USERS",
+                Platform.DINGTALK: "DINGTALK_ALLOWED_USERS",
+                Platform.FEISHU:   "FEISHU_ALLOWED_USERS",
+                Platform.WECOM:    "WECOM_ALLOWED_USERS",
+                Platform.WECOM_CALLBACK: "WECOM_CALLBACK_ALLOWED_USERS",
+                Platform.WEIXIN:   "WEIXIN_ALLOWED_USERS",
+                Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOWED_USERS",
+                Platform.QQBOT:    "QQ_ALLOWED_USERS",
+            }
+            platform_group_env_map = {
+                Platform.TELEGRAM: (
+                    "TELEGRAM_GROUP_ALLOWED_USERS",
+                    "TELEGRAM_GROUP_ALLOWED_CHATS",
+                ),
+                Platform.QQBOT: ("QQ_GROUP_ALLOWED_USERS",),
+            }
+            if os.getenv(platform_env_map.get(platform, ""), "").strip():
+                return "ignore"
+            for env_key in platform_group_env_map.get(platform, ()):
+                if os.getenv(env_key, "").strip():
+                    return "ignore"
+
+        if os.getenv("GATEWAY_ALLOWED_USERS", "").strip():
+            return "ignore"
+
+        return "pair"
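Review note: the precedence implemented by this hunk can be summarised in a small standalone sketch. The function name `resolve_dm_behavior` and its flat keyword arguments are hypothetical stand-ins for the real config object; only the ordering of the checks is taken from the diff above.

```python
from typing import Mapping, Optional, Tuple


def resolve_dm_behavior(
    platform_override: Optional[str] = None,
    global_behavior: str = "pair",
    dm_policy: str = "",
    env: Optional[Mapping[str, str]] = None,
    allowlist_keys: Tuple[str, ...] = (),
) -> str:
    """Hypothetical, simplified mirror of the resolution order in the diff."""
    env = env or {}
    # 1. An explicit per-platform setting always wins.
    if platform_override:
        return platform_override
    # 2. A non-default global setting counts as an explicit override.
    if global_behavior != "pair":
        return global_behavior
    # 3. Config-driven dm_policy: pairing opts in, allowlist/disabled opt out.
    policy = dm_policy.strip().lower()
    if policy == "pairing":
        return "pair"
    if policy in {"allowlist", "disabled"}:
        return "ignore"
    # 4. Any configured allowlist env var (platform-specific or global)
    #    means unauthorized DMs are dropped silently.
    for key in (*allowlist_keys, "GATEWAY_ALLOWED_USERS"):
        if env.get(key, "").strip():
            return "ignore"
    # 5. Nothing restricts access: fall back to pairing codes.
    return "pair"
```

With this ordering, an explicit `dm_policy: pairing` still yields pairing codes even when `GATEWAY_ALLOWED_USERS` is set, because the policy check runs before the environment fallback.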
--- a/gateway/hooks.py
+++ b/gateway/hooks.py
@ -17,6 +17,23 @@ Events:
  - command:*           -- Any slash command executed (wildcard match)

 Errors in hooks are caught and logged but never block the main pipeline.
+
+Context dict passed to ``agent:start`` / ``agent:end`` handlers:
+  platform     -- source platform name (e.g. "telegram", "matrix", "slack")
+  user_id      -- platform user id of the sender
+  chat_id      -- platform chat id (group/DM identifier)
+  thread_id    -- Telegram forum-topic id / thread root id (string; empty
+                  when not in a thread / topic)
+  chat_type    -- "dm" | "group" | "forum" (empty if unknown)
+  session_id   -- Hermes session id
+  message      -- inbound message text (truncated to 500 chars)
+
+``agent:end`` adds:
+  response     -- agent response text (truncated to 500 chars)
+
+Handlers posting a follow-up into the same Telegram forum-topic should
+include ``message_thread_id=int(thread_id)`` when ``chat_type == "forum"``
+and ``thread_id`` is non-empty.
 """

 import asyncio
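Review note: a handler following the docstring's advice might build its send arguments roughly as below. This is a sketch only: `build_followup_kwargs` is a hypothetical helper, and the assumption is that the resulting dict is passed to a Telegram `sendMessage`-style call, which accepts `message_thread_id`.

```python
from typing import Any, Dict


def build_followup_kwargs(context: Dict[str, Any], text: str) -> Dict[str, Any]:
    """Hypothetical helper: derive send arguments from a hook context dict."""
    kwargs: Dict[str, Any] = {"chat_id": context.get("chat_id", ""), "text": text}
    thread_id = context.get("thread_id") or ""
    # Only forum topics carry a usable topic id; thread_id is a string and
    # is empty when the message is not in a thread / topic.
    if context.get("chat_type") == "forum" and thread_id:
        kwargs["message_thread_id"] = int(thread_id)
    return kwargs
```

Leaving `message_thread_id` out for non-forum chats keeps the same handler usable for DMs and ordinary groups.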
--- a/gateway/kanban_watchers.py
+++ b/gateway/kanban_watchers.py
--- a/gateway/platforms/api_server.py
+++ b/gateway/platforms/api_server.py
@ -61,6 +61,29 @@ from gateway.platforms.base import (

 logger = logging.getLogger(__name__)

+
+def _hermes_version() -> str:
+    """Return the hermes-agent version string, or "dev" if it can't be resolved.
+
+    Tries the installed package metadata first (authoritative for a pip/uv
+    install), then the in-tree ``hermes_cli.__version__`` (covers editable /
+    source checkouts where metadata may be stale or absent). Never raises —
+    a version probe must not be able to break the health endpoint.
+    """
+    try:
+        from importlib.metadata import version
+
+        return version("hermes-agent")
+    except Exception:
+        pass
+    try:
+        from hermes_cli import __version__
+
+        return __version__
+    except Exception:
+        return "dev"
+
+
 # Default settings
 DEFAULT_HOST = "127.0.0.1"
 DEFAULT_PORT = 8642
@ -1047,7 +1070,9 @@ class APIServerAdapter(BasePlatformAdapter):

    async def _handle_health(self, request: "web.Request") -> "web.Response":
        """GET /health — simple health check."""
-        return web.json_response({"status": "ok", "platform": "hermes-agent"})
+        return web.json_response(
+            {"status": "ok", "platform": "hermes-agent", "version": _hermes_version()}
+        )

    async def _handle_health_detailed(self, request: "web.Request") -> "web.Response":
        """GET /health/detailed — rich status for cross-container dashboard probing.
@ -1062,6 +1087,7 @@ class APIServerAdapter(BasePlatformAdapter):
        return web.json_response({
            "status": "ok",
            "platform": "hermes-agent",
+            "version": _hermes_version(),
            "gateway_state": runtime.get("gateway_state"),
            "platforms": runtime.get("platforms", {}),
            "active_agents": runtime.get("active_agents", 0),
@ -1454,10 +1480,11 @@ class APIServerAdapter(BasePlatformAdapter):
        if err:
            return err
        db = self._ensure_session_db()
-        messages = db.get_messages(session_id)
+        resolved_id = db.resolve_resume_session_id(session_id)
+        messages = db.get_messages(resolved_id)
        return web.json_response({
            "object": "list",
-            "session_id": session_id,
+            "session_id": resolved_id,
            "data": [self._message_response(m) for m in messages],
        })

--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@ -1792,7 +1792,14 @@ class BasePlatformAdapter(ABC):
    - Sending messages/responses
    - Handling media
    """
-    
+
+    # Whether this platform renders triple-backtick fenced code blocks (i.e.
+    # ``format_message`` translates/preserves markdown fences into a real code
+    # block).  Drives presentation choices like rendering a ``terminal`` tool
+    # call's command as a ```bash block instead of a flat preview line.
+    # Default False (plain-text platforms); markdown-rendering adapters set True.
+    supports_code_blocks: bool = False
+
    def __init__(self, config: PlatformConfig, platform: Platform):
        self.config = config
        self.platform = platform
--- a/gateway/platforms/feishu.py
+++ b/gateway/platforms/feishu.py
@ -1409,6 +1409,8 @@ def check_feishu_requirements() -> bool:
 class FeishuAdapter(BasePlatformAdapter):
    """Feishu/Lark bot adapter."""

+    supports_code_blocks = True  # Feishu renders fenced code blocks
+
    MAX_MESSAGE_LENGTH = 8000
    # Max distinct chat IDs retained in _chat_locks before LRU eviction kicks in.
    CHAT_LOCK_MAX_SIZE: int = 1000
--- a/gateway/platforms/matrix.py
+++ b/gateway/platforms/matrix.py
@ -420,6 +420,8 @@ class _CryptoStateStore:
 class MatrixAdapter(BasePlatformAdapter):
    """Gateway adapter for Matrix (any homeserver)."""

+    supports_code_blocks = True  # Matrix renders fenced code blocks (HTML/markdown)
+
    # Threshold for detecting Matrix client-side message splits.
    # When a chunk is near the ~4000-char practical limit, a continuation
    # is almost certain.
--- a/gateway/platforms/slack.py
+++ b/gateway/platforms/slack.py
@ -317,6 +317,7 @@ class SlackAdapter(BasePlatformAdapter):
    """

    MAX_MESSAGE_LENGTH = 39000  # Slack API allows 40,000 chars; leave margin
+    supports_code_blocks = True  # Slack mrkdwn renders fenced code blocks

    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.SLACK)
@ -2290,7 +2291,38 @@ class SlackAdapter(BasePlatformAdapter):
            if not thread_ts and self._dm_top_level_threads_as_sessions():
                thread_ts = ts
        else:
-            thread_ts = event.get("thread_ts") or ts  # ts fallback for channels
+            # Channel message session scoping.
+            #
+            # Three cases:
+            #   (a) genuine thread reply   → scope session per thread
+            #   (b) top-level, reply_in_thread=true (the default)  →
+            #       legacy behaviour: each top-level message becomes its
+            #       own thread, so the UX still "replies in a thread"
+            #       and sessions are keyed per thread root
+            #   (c) top-level, reply_in_thread=false → scope one session
+            #       across the whole channel so context accumulates across
+            #       messages (#15421 bug 1)
+            event_thread_ts_raw = event.get("thread_ts")
+            # Align with ``is_thread_reply`` below — a ``thread_ts ==
+            # ts`` payload (some thread-root shapes) is not a real reply
+            # and must not prevent the shared-session path from taking
+            # effect.  Matching the same invariant here keeps the two
+            # branches in sync even if Slack introduces new payload
+            # variants (Copilot on #15464).
+            if event_thread_ts_raw and event_thread_ts_raw != ts:
+                thread_ts = event_thread_ts_raw
+            elif self.config.extra.get("reply_in_thread", True):
+                # Legacy default: treat ts as a synthetic thread root so
+                # this top-level message gets its own session.
+                thread_ts = ts
+            else:
+                # reply_in_thread=false: no thread key → session manager
+                # groups by (platform, channel_id, None) and the channel
+                # shares one conversation.  reply_to_message_id at the
+                # outbound side is already gated on ``thread_ts != ts``
+                # so None here produces a non-threaded reply without
+                # further changes.
+                thread_ts = None

        # In channels, respond if:
        #   0. Channel is in free_response_channels, OR require_mention is
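Review note: the three channel cases described in the Slack hunk reduce to a small pure function. The name `channel_thread_key` is hypothetical; the branch conditions are copied from the diff.

```python
from typing import Optional


def channel_thread_key(
    ts: str,
    event_thread_ts: Optional[str],
    reply_in_thread: bool = True,
) -> Optional[str]:
    """Hypothetical helper returning the thread key used for session scoping."""
    # (a) Genuine thread reply: thread_ts is set and differs from ts.
    if event_thread_ts and event_thread_ts != ts:
        return event_thread_ts
    # (b) Top-level message with reply_in_thread enabled (the default):
    #     treat ts as a synthetic thread root, one session per message.
    if reply_in_thread:
        return ts
    # (c) Top-level message with reply_in_thread disabled: no thread key,
    #     so the whole channel shares one session.
    return None
```

A payload where `thread_ts == ts` falls through case (a) on purpose, so it is handled like any other top-level message.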
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@ -344,6 +344,7 @@ class TelegramAdapter(BasePlatformAdapter):

    # Telegram message limits
    MAX_MESSAGE_LENGTH = 4096
+    supports_code_blocks = True  # Telegram MarkdownV2 renders fenced code blocks
    # Threshold for detecting Telegram client-side message splits.
    # When a chunk is near this limit, a continuation is almost certain.
    _SPLIT_THRESHOLD = 4000
@ -1142,7 +1143,13 @@ class TelegramAdapter(BasePlatformAdapter):
                # gateway process is alive and reports "connected" but
                # no messages are received or sent.
                if self._polling_conflict_count < MAX_CONFLICT_RETRIES:
-                    loop = asyncio.get_event_loop()
+                    # We are inside a running coroutine, so the running loop is
+                    # guaranteed to exist. asyncio.get_event_loop() can raise
+                    # "RuntimeError: There is no current event loop in thread
+                    # 'MainThread'" when no loop is attached to the calling
+                    # context (which can happen when PTB dispatches this error
+                    # callback). Use get_running_loop() instead.
+                    loop = asyncio.get_running_loop()
                    self._polling_error_task = loop.create_task(
                        self._handle_polling_conflict(retry_err)
                    )
--- a/gateway/platforms/weixin.py
+++ b/gateway/platforms/weixin.py
@ -1138,6 +1138,8 @@ async def qr_login(
 class WeixinAdapter(BasePlatformAdapter):
    """Native Hermes adapter for Weixin personal accounts."""

+    supports_code_blocks = True  # Weixin renders fenced code blocks
+
    MAX_MESSAGE_LENGTH = 2000

    # WeChat does not support editing sent messages — streaming must use the
@ -1172,6 +1174,24 @@ class WeixinAdapter(BasePlatformAdapter):
            extra.get("send_chunk_retry_delay_seconds")
            or os.getenv("WEIXIN_SEND_CHUNK_RETRY_DELAY_SECONDS", "1.0")
        )
+        self._send_text_gate = asyncio.Lock()
+        self._rate_limit_circuit_threshold = max(
+            1,
+            int(
+                extra.get("rate_limit_circuit_threshold")
+                or os.getenv("WEIXIN_RATE_LIMIT_CIRCUIT_THRESHOLD", "1")
+            ),
+        )
+        self._rate_limit_circuit_window_seconds = float(
+            extra.get("rate_limit_circuit_window_seconds")
+            or os.getenv("WEIXIN_RATE_LIMIT_CIRCUIT_WINDOW_SECONDS", "30.0")
+        )
+        self._rate_limit_circuit_open_seconds = float(
+            extra.get("rate_limit_circuit_open_seconds")
+            or os.getenv("WEIXIN_RATE_LIMIT_CIRCUIT_OPEN_SECONDS", "30.0")
+        )
+        self._rate_limit_circuit_until = 0.0
+        self._rate_limit_events: List[float] = []
        self._dm_policy = str(extra.get("dm_policy") or os.getenv("WEIXIN_DM_POLICY", "open")).strip().lower()
        self._group_policy = str(extra.get("group_policy") or os.getenv("WEIXIN_GROUP_POLICY", "disabled")).strip().lower()
        allow_from = extra.get("allow_from")
@ -1645,6 +1665,37 @@ class WeixinAdapter(BasePlatformAdapter):
            content, self.MAX_MESSAGE_LENGTH, self._split_multiline_messages,
        )

+    def _rate_limit_cooldown_remaining(self) -> float:
+        return max(0.0, self._rate_limit_circuit_until - time.monotonic())
+
+    def _rate_limit_error(self) -> RuntimeError:
+        return RuntimeError(
+            f"iLink sendmessage rate limited; cooldown active for {self._rate_limit_cooldown_remaining():.1f}s"
+        )
+
+    def _open_rate_limit_circuit(self) -> None:
+        if self._rate_limit_circuit_open_seconds <= 0:
+            return
+        self._rate_limit_circuit_until = max(
+            self._rate_limit_circuit_until,
+            time.monotonic() + self._rate_limit_circuit_open_seconds,
+        )
+
+    def _record_rate_limit_event(self) -> bool:
+        """Record a genuine iLink rate limit and return True if breaker opened."""
+        now = time.monotonic()
+        window_start = now - self._rate_limit_circuit_window_seconds
+        self._rate_limit_events = [ts for ts in self._rate_limit_events if ts >= window_start]
+        self._rate_limit_events.append(now)
+        if len(self._rate_limit_events) >= self._rate_limit_circuit_threshold:
+            self._open_rate_limit_circuit()
+            return self._rate_limit_cooldown_remaining() > 0
+        return False
+
+    def _reset_rate_limit_circuit(self) -> None:
+        self._rate_limit_events.clear()
+        self._rate_limit_circuit_until = 0.0
+
    async def _send_text_chunk(
        self,
        *,
@ -1660,9 +1711,28 @@ class WeixinAdapter(BasePlatformAdapter):
        degraded fallback, which keeps cron-initiated push messages working
        even when no user message has refreshed the session recently.
        """
+        async with self._send_text_gate:
+            await self._send_text_chunk_locked(
+                chat_id=chat_id,
+                chunk=chunk,
+                context_token=context_token,
+                client_id=client_id,
+            )
+
+    async def _send_text_chunk_locked(
+        self,
+        *,
+        chat_id: str,
+        chunk: str,
+        context_token: Optional[str],
+        client_id: str,
+    ) -> None:
+        """Send a text chunk while holding the adapter-wide outbound text gate."""
        last_error: Optional[Exception] = None
        retried_without_token = False
        for attempt in range(self._send_chunk_retries + 1):
+            if self._rate_limit_cooldown_remaining() > 0:
+                raise self._rate_limit_error()
            try:
                resp = await _send_message(
                    self._send_session,
@ -1708,6 +1778,9 @@ class WeixinAdapter(BasePlatformAdapter):
                            last_error = RuntimeError(
                                f"iLink sendmessage rate limited: ret={ret} errcode={errcode} errmsg={errmsg}"
                            )
+                            if self._record_rate_limit_event():
+                                last_error = self._rate_limit_error()
+                                break
                            if attempt >= self._send_chunk_retries:
                                break
                            wait = self._send_chunk_retry_delay_seconds * 3  # 3x backoff for rate limit
@ -1721,6 +1794,7 @@ class WeixinAdapter(BasePlatformAdapter):
                        raise RuntimeError(
                            f"iLink sendmessage error: ret={ret} errcode={errcode} errmsg={errmsg}"
                        )
+                self._reset_rate_limit_circuit()
                return
            except Exception as exc:
                last_error = exc
@ -1808,10 +1882,47 @@ class WeixinAdapter(BasePlatformAdapter):
            logger.error("[%s] send failed to=%s: %s", self.name, _safe_id(chat_id), exc)
            return SendResult(success=False, error=str(exc))

+    async def _ensure_typing_ticket(self, chat_id: str) -> Optional[str]:
+        """Return a valid typing ticket, refreshing from getConfig if expired.
+
+        The iLink typing ticket has a 600-second TTL.  When a long-running
+        session exceeds that window the cached ticket is evicted, and both
+        ``send_typing`` and ``stop_typing`` silently no-op — leaving the
+        WeChat client stuck showing the typing indicator forever.  This
+        method transparently refreshes the ticket so the stop signal can
+        always be delivered.
+        """
+        ticket = self._typing_cache.get(chat_id)
+        if ticket:
+            return ticket
+        if not self._send_session or not self._token:
+            return None
+        # Ticket expired or never fetched — refresh via getConfig.
+        # Use the most recent context_token for this peer if available.
+        context_token = self._token_store.get(self._account_id, chat_id)
+        try:
+            response = await _get_config(
+                self._send_session,
+                base_url=self._base_url,
+                token=self._token,
+                user_id=chat_id,
+                context_token=context_token,
+            )
+            typing_ticket = str(response.get("typing_ticket") or "")
+            if typing_ticket:
+                self._typing_cache.set(chat_id, typing_ticket)
+                return typing_ticket
+        except Exception as exc:
+            logger.debug(
+                "[%s] typing ticket refresh failed for %s: %s",
+                self.name, _safe_id(chat_id), exc,
+            )
+        return None
+
    async def send_typing(self, chat_id: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        if not self._send_session or not self._token:
            return
-        typing_ticket = self._typing_cache.get(chat_id)
+        typing_ticket = await self._ensure_typing_ticket(chat_id)
        if not typing_ticket:
            return
        try:
@ -1829,7 +1940,7 @@ class WeixinAdapter(BasePlatformAdapter):
    async def stop_typing(self, chat_id: str) -> None:
        if not self._send_session or not self._token:
            return
-        typing_ticket = self._typing_cache.get(chat_id)
+        typing_ticket = await self._ensure_typing_ticket(chat_id)
        if not typing_ticket:
            return
        try:
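Review note: the rate-limit breaker added to the Weixin adapter can be exercised in isolation. The class below is a hypothetical standalone version with an injectable clock for testing; the pruning, threshold and cooldown logic follow `_record_rate_limit_event`, `_open_rate_limit_circuit` and `_reset_rate_limit_circuit` from the diff.

```python
import time
from typing import Callable, List


class RateLimitCircuit:
    """Hypothetical standalone version of the adapter's breaker state."""

    def __init__(
        self,
        threshold: int = 1,
        window_seconds: float = 30.0,
        open_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = max(1, threshold)
        self.window_seconds = window_seconds
        self.open_seconds = open_seconds
        self.clock = clock
        self.open_until = 0.0
        self.events: List[float] = []

    def cooldown_remaining(self) -> float:
        return max(0.0, self.open_until - self.clock())

    def record_rate_limit(self) -> bool:
        """Record one rate-limit response; return True if the breaker is open."""
        now = self.clock()
        window_start = now - self.window_seconds
        # Keep only events inside the sliding window, then add this one.
        self.events = [ts for ts in self.events if ts >= window_start]
        self.events.append(now)
        if len(self.events) >= self.threshold:
            if self.open_seconds > 0:
                # Never shorten an already-open cooldown.
                self.open_until = max(self.open_until, now + self.open_seconds)
            return self.cooldown_remaining() > 0
        return False

    def reset(self) -> None:
        """Called after a successful send: forget history, close the breaker."""
        self.events.clear()
        self.open_until = 0.0
```

With the default threshold of 1 shown in the diff, the first rate-limit response opens the breaker immediately, and later sends fail fast until the cooldown expires.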
--- a/gateway/platforms/whatsapp.py
+++ b/gateway/platforms/whatsapp.py
@ -242,6 +242,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
    # WhatsApp message limits — practical UX limit, not protocol max.
    # WhatsApp allows ~65K but long messages are unreadable on mobile.
    MAX_MESSAGE_LENGTH = 4096
+    supports_code_blocks = True  # WhatsApp renders fenced code blocks (monospace)
    DEFAULT_REPLY_PREFIX = "⚕ *Hermes Agent*\n────────────\n"
    
    # Default bridge location relative to the hermes-agent install
--- a/gateway/platforms/yuanbao.py
+++ b/gateway/platforms/yuanbao.py
@ -120,6 +120,16 @@ AUTH_TIMEOUT_SECONDS = 10.0
 MAX_RECONNECT_ATTEMPTS = 100
 DEFAULT_SEND_TIMEOUT = 30.0  # WS biz request timeout

+# Upper bound on the WS close handshake during teardown (#40383). The
+# websockets connection's own close_timeout (5s) blocks until the server
+# echoes the close frame; an idle/unresponsive server never replies, stalling
+# gateway shutdown by the full timeout. Bounding the close await here keeps
+# teardown fast — a responsive server completes the handshake in well under a
+# second, so this only caps the pathological hang. Also bounds the reconnect /
+# connect-failure cleanup paths that reuse _cleanup_ws(), where a graceful
+# close is unnecessary anyway (the socket is being discarded to redial).
+WS_CLOSE_TIMEOUT_S = 1.0
+
 # Close codes that indicate permanent errors — do NOT reconnect.
 NO_RECONNECT_CLOSE_CODES = {4012, 4013, 4014, 4018, 4019, 4021}

@ -3445,12 +3455,22 @@ class ConnectionManager:
        return False

    async def _cleanup_ws(self) -> None:
-        """Close and clear the WebSocket connection."""
+        """Close and clear the WebSocket connection, bounded by
+        ``WS_CLOSE_TIMEOUT_S`` so an unresponsive server can't stall teardown
+        (see the constant's definition for the full rationale)."""
        ws = self._ws
        self._ws = None
        if ws is not None:
            try:
-                await ws.close()
+                await asyncio.wait_for(ws.close(), timeout=WS_CLOSE_TIMEOUT_S)
+            except asyncio.TimeoutError:
+                # Server never echoed the close frame within the bound; drop the
+                # connection. websockets force-closes the transport on cancel,
+                # and at shutdown the loop is tearing down anyway.
+                logger.debug(
+                    "[%s] WS close handshake exceeded %.1fs — dropping connection",
+                    self._adapter.name, WS_CLOSE_TIMEOUT_S,
+                )
            except Exception:
                pass
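Review note: the bounded-close pattern used in `_cleanup_ws` is plain `asyncio.wait_for`. The sketch below uses a hypothetical `close_bounded` helper and stand-in coroutines instead of a real WebSocket, to show that a hung close is abandoned after the timeout.

```python
import asyncio
from typing import Awaitable, Callable


async def close_bounded(
    close: Callable[[], Awaitable[None]],
    timeout: float,
) -> bool:
    """Hypothetical helper: await close(), giving up after `timeout` seconds.

    Returns True if the close completed, False if it timed out or failed.
    """
    try:
        await asyncio.wait_for(close(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # The peer never finished the handshake; wait_for has already
        # cancelled the pending close coroutine at this point.
        return False
    except Exception:
        # Teardown is best-effort: other close errors are swallowed too.
        return False


async def responsive_close() -> None:
    return None


async def hung_close() -> None:
    await asyncio.sleep(10)  # stands in for a server that never replies
```

Calling `asyncio.run(close_bounded(hung_close, 0.05))` returns after roughly the timeout rather than the full ten seconds.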

--- a/gateway/run.py
+++ b/gateway/run.py
--- a/gateway/session.py
+++ b/gateway/session.py
@ -635,6 +635,22 @@ def build_session_key(
            if source.thread_id:
                return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
            return f"agent:main:{platform}:dm:{dm_chat_id}"
+        # No chat_id — fall back to the sender's own identifier before the
+        # bare per-platform sink.  Without this, every DM from every user that
+        # arrives without a chat_id (non-standard adapters / synthetic sources)
+        # collapses into one shared "agent:main:<platform>:dm" session, and a
+        # single cached agent ends up serving multiple people's conversations —
+        # cross-user history bleed.  participant_id keeps DMs isolated per user.
+        dm_participant_id = source.user_id_alt or source.user_id
+        if dm_participant_id and source.platform == Platform.WHATSAPP:
+            dm_participant_id = (
+                canonical_whatsapp_identifier(str(dm_participant_id))
+                or dm_participant_id
+            )
+        if dm_participant_id:
+            if source.thread_id:
+                return f"agent:main:{platform}:dm:{dm_participant_id}:{source.thread_id}"
+            return f"agent:main:{platform}:dm:{dm_participant_id}"
        if source.thread_id:
            return f"agent:main:{platform}:dm:{source.thread_id}"
        return f"agent:main:{platform}:dm"
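Review note: the DM key fallback order after this change can be sketched as one function. `dm_session_key` and its keyword arguments are hypothetical simplifications of the `source` object, and the WhatsApp identifier canonicalization is deliberately left out.

```python
from typing import List, Optional


def dm_session_key(
    platform: str,
    chat_id: Optional[str] = None,
    user_id: Optional[str] = None,
    user_id_alt: Optional[str] = None,
    thread_id: Optional[str] = None,
) -> str:
    """Hypothetical sketch of the DM session-key fallback order."""
    parts: List[str] = [f"agent:main:{platform}:dm"]
    # Prefer the chat id; otherwise fall back to the sender's own id so
    # DMs from different users never share one session.
    identifier = chat_id or user_id_alt or user_id
    if identifier:
        parts.append(str(identifier))
    if thread_id:
        parts.append(str(thread_id))
    return ":".join(parts)
```

Only a source with no chat id, no user id and no thread id still lands on the bare per-platform key.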
--- a/gateway/slash_commands.py
+++ b/gateway/slash_commands.py
--- a/hermes_cli/auth.py
+++ b/hermes_cli/auth.py
@ -1182,6 +1182,24 @@ def _store_provider_state(
        auth_store["active_provider"] = provider_id


+def mark_provider_active_if_unset(provider_id: str) -> None:
+    """Set ``active_provider`` to *provider_id* only when none is set yet.
+
+    Used by ``hermes auth add`` OAuth paths that create credential-pool
+    entries directly (no singleton ``providers.<id>`` block). Adding the
+    very first credential for a provider should make it the active provider
+    so the setup wizard's ``_model_section_has_credentials()`` check (which
+    consults ``get_active_provider()``) does not report "No inference
+    provider configured". Subsequent adds for an already-active setup leave
+    the user's chosen active provider untouched.
+    """
+    with _auth_store_lock():
+        auth_store = _load_auth_store()
+        if not (auth_store.get("active_provider") or "").strip():
+            auth_store["active_provider"] = provider_id
+            _save_auth_store(auth_store)
+
+
 def is_known_auth_provider(provider_id: str) -> bool:
    normalized = (provider_id or "").strip().lower()
    return normalized in PROVIDER_REGISTRY or normalized in SERVICE_PROVIDER_NAMES
@ -1561,6 +1579,21 @@ def resolve_provider(
    if has_usable_secret(os.getenv("OPENAI_API_KEY")) or has_usable_secret(os.getenv("OPENROUTER_API_KEY")):
        return "openrouter"

+    # Auto-detect an OpenRouter credential added via `hermes auth add openrouter`
+    # (manual pool entry, no env var). Without this, a key that only lives in
+    # the credential pool is invisible to auto-detection — the user sees
+    # `hermes auth list` showing the credential while requests go out with no
+    # Authorization header ("HTTP 401: Missing Authentication header"). The
+    # env-var check above only covers keys exported as OPENROUTER_API_KEY /
+    # OPENAI_API_KEY. See issue #42130.
+    try:
+        from agent.credential_pool import load_pool as _load_pool
+
+        if _load_pool("openrouter").has_credentials():
+            return "openrouter"
+    except Exception as e:
+        logger.debug("Could not check OpenRouter credential pool: %s", e)
+
    # Auto-detect API-key providers by checking their env vars
    for pid, pconfig in PROVIDER_REGISTRY.items():
        if pconfig.auth_type != "api_key":
@ -3340,6 +3373,7 @@ def _sync_codex_pool_entries(
    auth_store: Dict[str, Any],
    tokens: Dict[str, str],
    last_refresh: Optional[str],
+    previous_singleton_tokens: Optional[Dict[str, str]] = None,
 ) -> None:
    """Mirror a fresh Codex re-auth into the credential_pool OAuth entries.

@ -3355,24 +3389,34 @@ def _sync_codex_pool_entries(
      OAuth flow when the user logged in via ``hermes setup`` / the model
      picker.  Always synced with the fresh tokens.
    * ``manual:device_code`` — entries created by ``hermes auth add openai-codex``
-      that use the same device-code OAuth mechanism.  An interactive re-auth
-      proves the user owns the ChatGPT account, so it is safe (and expected)
-      to refresh these entries too.  Without this, a user who once ran the
-      ``hermes auth add`` workaround for #33000 would silently leave that
-      manual entry stale on every subsequent re-auth, recreating the issue
-      reported in #33538.
+      that use the same device-code OAuth mechanism.  ONLY synced if the
+      entry's existing access_token matches the *previous* singleton
+      access_token (i.e. the entry is a legacy singleton-alias from the
+      #33000 workaround era).  Manual entries whose tokens never matched the
+      singleton represent INDEPENDENT accounts added via
+      ``hermes auth add openai-codex`` and must not be overwritten by a
+      re-auth that targeted a different account (regression for #39236).
+
+      The original #33538 fix refreshed every ``manual:device_code`` entry
+      unconditionally.  That worked when ``manual:device_code`` only meant
+      "legacy alias of the singleton", but the same source string is now
+      also produced by independent-account additions, and the broad sync
+      silently clobbered distinct accounts with the latest-authenticated
+      token pair.  The access_token-match check distinguishes the two cases
+      without changing the source-string contract.

    What does NOT get refreshed:

    * ``manual:api_key`` and any other non-device-code manual sources — those
      are independent credentials (an explicit API key, a different ChatGPT
      account, etc.) and must not be overwritten by a single re-auth.
+    * ``manual:device_code`` entries whose access_token does NOT match the
+      previous singleton — see above; these are independent accounts.

-    Error markers (``last_status``, ``last_error_*``) are also cleared on
-    every device-code-backed entry — even those whose tokens we did not
-    rewrite — so that an interactive re-auth gives every relevant pool entry
-    a fresh selection chance instead of leaving them marked unhealthy from a
-    pre-re-auth 401.
+    Error markers (``last_status``, ``last_error_*``) are cleared ONLY on
+    entries that actually had their tokens rewritten by this re-auth.
+    Independent entries keep their own error state (their 401/429 markers
+    belong to that account's own auth flow, not this re-auth).
    """
    access_token = tokens.get("access_token")
    if not access_token:
@ -3384,15 +3428,34 @@ def _sync_codex_pool_entries(
    entries = pool.get("openai-codex")
    if not isinstance(entries, list):
        return
-    # Sources whose tokens should be rewritten by a fresh Codex device-code
-    # OAuth re-auth.  ``manual:api_key`` and unknown sources are intentionally
-    # excluded — they represent independent credentials.
-    REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}
+    # Previous singleton access_token (before this re-auth overwrote it) —
+    # used to distinguish legacy singleton-aliases from independent accounts.
+    # When None or empty, no manual entry can be treated as an alias (which
+    # is the right default for first-ever-save or a freshly initialized
+    # auth.json).
+    prev_at = None
+    if isinstance(previous_singleton_tokens, dict):
+        prev_at = previous_singleton_tokens.get("access_token") or None
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        source = entry.get("source")
-        if source not in REFRESHABLE_SOURCES:
+        if source == "device_code":
+            # Singleton-seeded mirror — always refresh.
+            refresh_this_entry = True
+        elif source == "manual:device_code":
+            # Refresh only if this entry's existing access_token matches the
+            # previous singleton access_token (i.e. it is a true alias of the
+            # singleton from the #33000 workaround era).  An entry with its
+            # own distinct token material is an independent account and must
+            # be left alone (#39236).
+            refresh_this_entry = bool(
+                prev_at and entry.get("access_token") == prev_at
+            )
+        else:
+            # ``manual:api_key`` and any future non-device-code sources.
+            refresh_this_entry = False
+        if not refresh_this_entry:
            continue
        entry["access_token"] = access_token
        if refresh_token:
@ -3414,13 +3477,24 @@ def _save_codex_tokens(tokens: Dict[str, str], last_refresh: str = None, label:
    with _auth_store_lock():
        auth_store = _load_auth_store()
        state = _load_provider_state(auth_store, "openai-codex") or {}
+        # Capture the previous singleton tokens BEFORE overwriting them.  The
+        # pool-sync step uses this to distinguish legacy singleton-aliases
+        # (which should be refreshed) from independent accounts that
+        # ``hermes auth add openai-codex`` created (which must not be
+        # overwritten — see #39236).
+        previous_singleton_tokens = state.get("tokens") if isinstance(state.get("tokens"), dict) else None
        state["tokens"] = tokens
        state["last_refresh"] = last_refresh
        state["auth_mode"] = "chatgpt"
        if label and str(label).strip():
            state["label"] = str(label).strip()
        _save_provider_state(auth_store, "openai-codex", state)
-        _sync_codex_pool_entries(auth_store, tokens, last_refresh)
+        _sync_codex_pool_entries(
+            auth_store,
+            tokens,
+            last_refresh,
+            previous_singleton_tokens=previous_singleton_tokens,
+        )
        _save_auth_store(auth_store)
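Review note: the per-entry refresh decision in `_sync_codex_pool_entries` can be isolated as a predicate. `should_refresh_entry` is a hypothetical name; the three branches mirror the diff.

```python
from typing import Any, Dict, Optional


def should_refresh_entry(
    entry: Dict[str, Any],
    previous_access_token: Optional[str],
) -> bool:
    """Hypothetical predicate: should a re-auth overwrite this pool entry?"""
    source = entry.get("source")
    if source == "device_code":
        # Singleton-seeded mirror: always follows the singleton.
        return True
    if source == "manual:device_code":
        # Only a legacy alias, whose token equals the singleton's previous
        # access token, is refreshed. A distinct token means an
        # independent account that must be left alone.
        return bool(
            previous_access_token
            and entry.get("access_token") == previous_access_token
        )
    # manual:api_key and any other source are independent credentials.
    return False
```

When there is no previous singleton token (first save, or a fresh auth store), no manual entry is treated as an alias.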


--- a/hermes_cli/auth_commands.py
+++ b/hermes_cli/auth_commands.py
@ -13,6 +13,7 @@ from agent.credential_pool import (
    AUTH_TYPE_OAUTH,
    CUSTOM_POOL_PREFIX,
    SOURCE_MANUAL,
+    SOURCE_MANUAL_DEVICE_CODE,
    STATUS_EXHAUSTED,
    STRATEGY_FILL_FIRST,
    STRATEGY_ROUND_ROBIN,
@ -312,15 +313,35 @@ def auth_add_command(args) -> None:
            creds["tokens"]["access_token"],
            _oauth_default_label(provider, len(pool.entries()) + 1),
        )
-        auth_mod._save_codex_tokens(
-            creds["tokens"],
-            last_refresh=creds.get("last_refresh"),
+        # Add a distinct, self-contained pool entry per account (matching the
+        # xai-oauth / google-gemini-cli / qwen-oauth patterns) instead of
+        # routing through the singleton ``_save_codex_tokens`` save path.
+        # The singleton round-trip collapsed every added account into the
+        # latest login: a second ``hermes auth add openai-codex`` overwrote
+        # the first account's singleton-mirrored ``device_code`` entry rather
+        # than creating an independent one (#39236). ``manual:device_code``
+        # entries refresh from their own token pair, so they need no singleton
+        # shadow.
+        entry = PooledCredential(
+            provider=provider,
+            id=uuid.uuid4().hex[:6],
            label=label,
+            auth_type=AUTH_TYPE_OAUTH,
+            priority=0,
+            source=SOURCE_MANUAL_DEVICE_CODE,
+            access_token=creds["tokens"]["access_token"],
+            refresh_token=creds["tokens"].get("refresh_token"),
+            base_url=creds.get("base_url"),
+            last_refresh=creds.get("last_refresh"),
        )
-        pool = load_pool(provider)
-        entry = next((item for item in pool.entries() if item.source == "device_code"), None)
-        shown_label = entry.label if entry is not None else label
-        print(f'Saved {provider} OAuth device-code credentials: "{shown_label}"')
+        first_credential = not pool.entries()
+        pool.add_entry(entry)
+        # Adding the first Codex credential should make it the active provider
+        # (the old singleton save path did this implicitly via
+        # _save_provider_state). Subsequent adds leave the active provider as-is.
+        if first_credential:
+            auth_mod.mark_provider_active_if_unset(provider)
+        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
        return

    if provider == "xai-oauth":
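Note on the `auth_add_command` hunk above: the change replaces the singleton round-trip with one self-contained pool entry per login, and marks the provider active only on the first add. A minimal toy model of that behavior follows. `ToyCredential` and `ToyPool` are hypothetical stand-ins for illustration, not the real `PooledCredential` or pool API, and the `"manual:device_code"` string is assumed from the comment in the diff.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCredential:
    label: str
    access_token: str
    refresh_token: Optional[str] = None
    source: str = "manual:device_code"  # assumed value, taken from the diff comment
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:6])


@dataclass
class ToyPool:
    entries: List[ToyCredential] = field(default_factory=list)
    active_provider: Optional[str] = None

    def add(self, provider: str, cred: ToyCredential) -> int:
        """Append an independent entry; return the new entry count."""
        first_credential = not self.entries
        self.entries.append(cred)
        # Only the first add may set the active provider, and only if unset.
        if first_credential and self.active_provider is None:
            self.active_provider = provider
        return len(self.entries)


toy_pool = ToyPool()
n1 = toy_pool.add("openai-codex", ToyCredential("account 1", "tok-1", "ref-1"))
n2 = toy_pool.add("openai-codex", ToyCredential("account 2", "tok-2", "ref-2"))
```

In this model the second login appends a new entry instead of overwriting the first, so both token pairs survive and each can refresh on its own.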