Merge remote-tracking branch 'origin/main' into bb/pets

2026-06-27 11:22:03 +00:00 · 2026-06-22 05:25:49 -05:00 · 2026-06-22 05:25:49 -05:00 · 5342eccf12
commit 5342eccf12
parent 6fd839ac84 04a1d9efd7
823 changed files with 58322 additions and 13772 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -102,6 +102,3 @@ acp_registry/
 .gitattributes
 .hadolint.yaml
 .mailmap
-
-# Top-level LICENSE (not matched by *.md); not needed inside the container
-LICENSE
--- a/.env.example
+++ b/.env.example
@@ -105,6 +105,7 @@
 # Get your token at: https://huggingface.co/settings/tokens
 # Required permission: "Make calls to Inference Providers"
 # HF_TOKEN=
+# HF_BASE_URL=https://router.huggingface.co/v1  # Override default base URL
 # OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL

 # =============================================================================
@@ -411,6 +412,9 @@ IMAGE_TOOLS_DEBUG=false
 # Groq API key (free tier — used for Whisper STT in voice mode)
 # GROQ_API_KEY=

+# ElevenLabs API key (cloud STT/TTS — Scribe transcription)
+# ELEVENLABS_API_KEY=
+
 # =============================================================================
 # STT PROVIDER SELECTION
 # =============================================================================
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -954,9 +954,10 @@ Enable/disable per platform via `hermes tools` (the curses UI) or the
 ## Delegation (`delegate_task`)

 `tools/delegate_tool.py` spawns a subagent with an isolated
-context + terminal session. Synchronous: the parent waits for the
-child's summary before continuing its own loop — if the parent is
-interrupted, the child is cancelled.
+context + terminal session. By default the parent waits for the
+child's summary before continuing its own loop. With `background=true`,
+Hermes returns a delegation id immediately and the result re-enters the
+conversation later through the async-delegation completion queue.

 Two shapes:

@@ -978,9 +979,9 @@ Key config knobs (under `delegation:` in `config.yaml`):
 `orchestrator_enabled`, `subagent_auto_approve`, `inherit_mcp_toolsets`,
 `max_iterations`.

-Synchronicity rule: delegate_task is **not** durable. For long-running
-work that must outlive the current turn, use `cronjob` or
-`terminal(background=True, notify_on_complete=True)` instead.
+Durability rule: background `delegate_task` is detached from the current
+turn but still process-local. For work that must survive process restart, use
+`cronjob` or `terminal(background=True, notify_on_complete=True)` instead.
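The sync/background split and the durability rule can be modelled in a few lines. This is a minimal sketch that assumes a thread pool for children and an in-memory completion queue. `DelegationSketch` and its methods are hypothetical names, not the API of `tools/delegate_tool.py`. Because the queue lives in process memory, pending results vanish on restart, which is why durable work belongs in `cronjob`.

```python
import itertools
import queue
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class DelegationSketch:
    """Hypothetical model of sync vs. background delegation (not Hermes code)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # In-memory only: anything still queued is lost if the process exits.
        self._completed: queue.Queue[tuple[str, str]] = queue.Queue()
        self._ids = itertools.count(1)

    def delegate(self, task: Callable[[], str], background: bool = False) -> str:
        if not background:
            return task()  # default: the parent blocks until the child's summary
        delegation_id = f"delegation-{next(self._ids)}"
        future = self._pool.submit(task)
        future.add_done_callback(lambda f: self._on_done(delegation_id, f))
        return delegation_id  # returned immediately; the result arrives later

    def _on_done(self, delegation_id: str, future: Future) -> None:
        exc = future.exception()
        summary = f"error: {exc}" if exc else future.result()
        self._completed.put((delegation_id, summary))

    def drain_completions(self) -> list[tuple[str, str]]:
        """Called between turns to re-inject finished results into the conversation."""
        results = []
        while True:
            try:
                results.append(self._completed.get_nowait())
            except queue.Empty:
                return results

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```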

 ---

@@ -1174,7 +1175,7 @@ automatically scope to the active profile.
   a unique credential (bot token, API key), call `acquire_scoped_lock()` from
   `gateway.status` in the `connect()`/`start()` method and `release_scoped_lock()` in
   `disconnect()`/`stop()`. This prevents two profiles from using the same credential.
-   See `gateway/platforms/telegram.py` for the canonical pattern.
+   See `plugins/platforms/irc/adapter.py` for the canonical pattern.

 6. **Profile operations are HOME-anchored, not HERMES_HOME-anchored** — `_get_profiles_root()`
   returns `Path.home() / ".hermes" / "profiles"`, NOT `get_hermes_home() / "profiles"`.
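Here is a minimal sketch of what a credential-scoped lock provides. It assumes one lock file per (platform, credential) pair in a shared directory. `try_acquire` and `release` are hypothetical and do not reproduce the signatures of `acquire_scoped_lock()` / `release_scoped_lock()` in `gateway.status`. Stale-lock recovery after a crash is deliberately left out.

```python
import hashlib
import os
from pathlib import Path


def _lock_path(lock_dir: Path, scope: str, credential: str) -> Path:
    # Hash the credential so the secret never lands on disk, while two
    # profiles holding the same token still map to the same lock file.
    digest = hashlib.sha256(credential.encode("utf-8")).hexdigest()[:16]
    return lock_dir / f"{scope}-{digest}.lock"


def try_acquire(lock_dir: Path, scope: str, credential: str) -> bool:
    """Atomically create the lock file; False if another holder already has it."""
    path = _lock_path(lock_dir, scope, credential)
    try:
        # O_CREAT | O_EXCL fails if the file already exists (POSIX and Windows).
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record the owner for debugging
    return True


def release(lock_dir: Path, scope: str, credential: str) -> None:
    _lock_path(lock_dir, scope, credential).unlink(missing_ok=True)
```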
--- a/CONTRIBUTING.es.md
+++ b/CONTRIBUTING.es.md
@@ -0,0 +1,602 @@
+# Contribuir a Hermes Agent
+
+¡Gracias por contribuir a Hermes Agent! Esta guía cubre todo lo que necesitas: configurar tu entorno de desarrollo, entender la arquitectura, decidir qué construir y conseguir que tu PR sea aceptado.
+
+---
+
+## Prioridades de Contribución
+
+Valoramos las contribuciones en este orden:
+
+1. **Correcciones de errores** — bloqueos, comportamiento incorrecto, pérdida de datos. Siempre la máxima prioridad.
+2. **Compatibilidad entre plataformas** — macOS, diferentes distribuciones de Linux y WSL2 en Windows. Queremos que Hermes funcione en todas partes.
+3. **Fortalecimiento de seguridad** — inyección de shell, inyección de prompts, recorrido de rutas (path traversal), escalada de privilegios. Ver [Consideraciones de Seguridad](#consideraciones-de-seguridad).
+4. **Rendimiento y robustez** — lógica de reintento, manejo de errores, degradación elegante.
+5. **Nuevas habilidades** — pero solo las ampliamente útiles. Ver [¿Debería ser una Habilidad o una Herramienta?](#debería-ser-una-habilidad-o-una-herramienta)
+6. **Nuevas herramientas** — raramente necesarias. La mayoría de las capacidades deberían ser habilidades. Ver más abajo.
+7. **Documentación** — correcciones, aclaraciones, nuevos ejemplos.
+
+---
+
+## ¿Debería ser una Habilidad o una Herramienta?
+
+Esta es la pregunta más común para los nuevos colaboradores. La respuesta casi siempre es **habilidad**.
+
+### Hazlo una Habilidad cuando:
+
+- La capacidad se puede expresar como instrucciones + comandos de shell + herramientas existentes
+- Envuelve una CLI externa o API que el agente puede llamar a través de `terminal` o `web_extract`
+- No necesita integración personalizada de Python ni gestión de claves API integrada en el agente
+- Ejemplos: búsqueda en arXiv, flujos de trabajo de git, gestión de Docker, procesamiento de PDF, email a través de herramientas CLI
+
+### Hazlo una Herramienta cuando:
+
+- Requiere integración de extremo a extremo con claves API, flujos de autenticación o configuración de múltiples componentes gestionada por el harness del agente
+- Necesita lógica de procesamiento personalizada que debe ejecutarse con precisión en cada ocasión (no "mejor esfuerzo" de la interpretación del LLM)
+- Maneja datos binarios, streaming o eventos en tiempo real que no pueden pasar por el terminal
+- Ejemplos: automatización de navegador (gestión de sesiones Browserbase), TTS (codificación de audio + entrega en plataforma), análisis de visión (manejo de imágenes base64)
+
+### ¿Debería la Habilidad estar incluida?
+
+Las habilidades incluidas (en `skills/`) se envían con cada instalación de Hermes. Deben ser **ampliamente útiles para la mayoría de los usuarios**:
+
+- Manejo de documentos, investigación web, flujos de trabajo de desarrollo comunes, administración de sistemas
+- Usadas regularmente por una amplia gama de personas
+
+Si tu habilidad es oficial y útil pero no universalmente necesaria (ej., una integración de servicio de pago, una dependencia pesada), ponla en **`optional-skills/`** — se envía con el repositorio pero no está activada por defecto. Los usuarios pueden descubrirla a través de `hermes skills browse` (etiquetada como "oficial") e instalarla con `hermes skills install` (sin advertencia de terceros, confianza integrada).
+
+Si tu habilidad es especializada, contribuida por la comunidad o de nicho, es mejor para un **Skills Hub** — súbela a un registro de habilidades y compártela en el [Discord de Nous Research](https://discord.gg/NousResearch). Los usuarios pueden instalarla con `hermes skills install`.
+
+---
+
+## Proveedores de Memoria: Publicar como Plugin Independiente
+
+**Ya no aceptamos nuevos proveedores de memoria en este repositorio.** El conjunto de proveedores integrados en `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) está cerrado. Si quieres añadir un nuevo backend de memoria, publícalo como un **repositorio de plugin independiente** que los usuarios instalen en `~/.hermes/plugins/` (o a través de un entry point de pip).
+
+Los plugins de memoria independientes:
+
+- Implementan el mismo ABC `MemoryProvider` (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown` y opcionalmente `post_setup(hermes_home, config)` para integración con el asistente de configuración
+- Usan el mismo sistema de descubrimiento — `discover_memory_providers()` los recoge desde directorios de plugins de usuario/proyecto y entry points de pip
+- Se integran con `hermes memory setup` a través de `post_setup()` — sin necesidad de tocar el código base
+- Pueden registrar sus propios subcomandos CLI a través de `register_cli(subparser)` en un archivo `cli.py`
+- Obtienen los mismos hooks de ciclo de vida y la misma infraestructura de configuración que los proveedores incluidos en el árbol
+
+Los PRs que añadan un nuevo directorio bajo `plugins/memory/` se cerrarán con la indicación de publicar el proveedor en su propio repositorio. Los proveedores existentes en el árbol se mantienen; las correcciones de errores para ellos son bienvenidas.
+
+Esto no es un juicio sobre la calidad, sino una decisión de acoplamiento y mantenimiento. Los proveedores de memoria son el tipo de plugin más común y no deberían vivir todos en este árbol.
+
+---
+
+## Configuración del Desarrollo
+
+### Prerrequisitos
+
+| Requisito | Notas |
+|-----------|-------|
+| **Git** | Con la extensión `git-lfs` instalada |
+| **Python 3.11+** | uv lo instalará si falta |
+| **uv** | Gestor de paquetes Python rápido ([instalar](https://docs.astral.sh/uv/)) |
+| **Node.js 20+** | Opcional — necesario para las herramientas de navegador y el puente de WhatsApp (coincide con el campo `engines` del `package.json` raíz) |
+
+### Clonar e instalar
+
+```bash
+git clone https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+
+# Crear venv con Python 3.11
+uv venv venv --python 3.11
+export VIRTUAL_ENV="$(pwd)/venv"
+
+# Instalar con todos los extras (mensajería, cron, menús CLI, herramientas de desarrollo)
+uv pip install -e ".[all,dev]"
+
+# Opcional: herramientas de navegador
+npm install
+```
+
+### Configurar para desarrollo
+
+```bash
+mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
+cp cli-config.yaml.example ~/.hermes/config.yaml
+touch ~/.hermes/.env
+
+# Añadir al menos una clave de proveedor LLM:
+echo "OPENROUTER_API_KEY=***" >> ~/.hermes/.env
+```
+
+### Ejecutar
+
+```bash
+# Enlace simbólico para acceso global
+mkdir -p ~/.local/bin
+ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
+
+# Verificar
+hermes doctor
+hermes chat -q "Hola"
+```
+
+### Ejecutar tests
+
+```bash
+# Preferido — coincide con CI (entorno hermético, 4 workers xdist); ver AGENTS.md
+scripts/run_tests.sh
+
+# Alternativa (activa el venv primero). El wrapper sigue recomendándose
+# para paridad con GitHub Actions antes de abrir un PR:
+pytest tests/ -v
+```
+
+---
+
+## Estructura del Proyecto
+
+```
+hermes-agent/
+├── run_agent.py              # Clase AIAgent — bucle de conversación central, despacho de herramientas, persistencia de sesión
+├── cli.py                    # Clase HermesCLI — TUI interactiva, integración prompt_toolkit
+├── model_tools.py            # Orquestación de herramientas (capa delgada sobre tools/registry.py)
+├── toolsets.py               # Agrupaciones y presets de herramientas (hermes-cli, hermes-telegram, etc.)
+├── hermes_state.py           # Base de datos de sesiones SQLite con búsqueda de texto completo FTS5, títulos de sesión
+├── batch_runner.py           # Procesamiento en lote paralelo para generación de trayectorias
+│
+├── agent/                    # Internos del agente (módulos extraídos)
+│   ├── prompt_builder.py         # Ensamblaje del prompt del sistema (identidad, habilidades, archivos de contexto, memoria)
+│   ├── context_compressor.py     # Resumen automático al acercarse a los límites de contexto
+│   ├── auxiliary_client.py       # Resuelve clientes OpenAI auxiliares (resumen, visión)
+│   ├── display.py                # KawaiiSpinner, formateo del progreso de herramientas
+│   ├── model_metadata.py         # Longitudes de contexto del modelo, estimación de tokens
+│   └── trajectory.py             # Ayudantes para guardar trayectorias
+│
+├── hermes_cli/               # Implementaciones de comandos CLI
+│   ├── main.py                   # Punto de entrada, análisis de argumentos, despacho de comandos
+│   ├── config.py                 # Gestión de configuración, migración, definiciones de variables de entorno
+│   ├── setup.py                  # Asistente de configuración interactivo
+│   ├── auth.py                   # Resolución de proveedor, OAuth, Nous Portal
+│   ├── models.py                 # Listas de selección de modelos de OpenRouter
+│   ├── banner.py                 # Banner de bienvenida, arte ASCII
+│   ├── commands.py               # Registro central de comandos de barra (CommandDef), autocompletado, ayudantes del gateway
+│   ├── callbacks.py              # Callbacks interactivos (aclarar, sudo, aprobación)
+│   ├── doctor.py                 # Diagnósticos
+│   ├── skills_hub.py             # CLI del Skills Hub + comando de barra /skills
+│   └── skin_engine.py            # Motor de skins/temas — personalización visual de CLI basada en datos
+│
+├── tools/                    # Implementaciones de herramientas (auto-registradas)
+│   ├── registry.py               # Registro central de herramientas (esquemas, manejadores, despacho)
+│   ├── approval.py               # Detección de comandos peligrosos + aprobación por sesión
+│   ├── terminal_tool.py          # Orquestación del terminal (sudo, ciclo de vida del entorno, backends)
+│   ├── file_operations.py        # read_file, write_file, búsqueda, patch, etc.
+│   ├── web_tools.py              # web_search, web_extract (Parallel/Firecrawl + resumen con Gemini)
+│   ├── vision_tools.py           # Análisis de imágenes a través de modelos multimodales
+│   ├── delegate_tool.py          # Lanzamiento de subagentes y ejecución paralela de tareas
+│   ├── code_execution_tool.py    # Python en sandbox con acceso a herramientas vía RPC
+│   ├── session_search_tool.py    # Búsqueda en conversaciones pasadas con FTS5 + ventanas ancladas
+│   ├── cronjob_tools.py          # Gestión de tareas programadas
+│   ├── skill_tools.py            # Búsqueda, carga y gestión de habilidades
+│   └── environments/             # Backends de ejecución del terminal
+│       ├── base.py                   # ABC BaseEnvironment
+│       ├── local.py, docker.py, ssh.py, singularity.py, modal.py, daytona.py
+│
+├── gateway/                  # Gateway de mensajería
+│   ├── run.py                    # GatewayRunner — ciclo de vida de plataformas, enrutamiento de mensajes, cron
+│   ├── config.py                 # Resolución de configuración de plataformas
+│   ├── session.py                # Almacén de sesiones, prompts de contexto, políticas de reset
+│   └── platforms/                # Adaptadores de plataformas
+│       ├── telegram.py, discord_adapter.py, slack.py, whatsapp.py
+│
+├── scripts/                  # Scripts del instalador y puente
+│   ├── install.sh                # Instalador Linux/macOS
+│   ├── install.ps1               # Instalador Windows PowerShell
+│   └── whatsapp-bridge/          # Puente WhatsApp Node.js (Baileys)
+│
+├── skills/                   # Habilidades incluidas (copiadas a ~/.hermes/skills/ en la instalación)
+├── optional-skills/          # Habilidades opcionales oficiales (descubribles vía hub, no activadas por defecto)
+├── tests/                    # Suite de tests
+├── website/                  # Sitio de documentación (hermes-agent.nousresearch.com)
+│
+├── cli-config.yaml.example   # Configuración de ejemplo (copiada a ~/.hermes/config.yaml)
+└── AGENTS.md                 # Guía de desarrollo para asistentes de codificación IA
+```
+
+### Configuración del usuario (almacenada en `~/.hermes/`)
+
+| Ruta | Propósito |
+|------|-----------|
+| `~/.hermes/config.yaml` | Configuración (modelo, terminal, toolsets, compresión, etc.) |
+| `~/.hermes/.env` | Claves API y secretos |
+| `~/.hermes/auth.json` | Credenciales OAuth (Nous Portal) |
+| `~/.hermes/skills/` | Todas las habilidades activas (incluidas + instaladas desde hub + creadas por el agente) |
+| `~/.hermes/memories/` | Memoria persistente (MEMORY.md, USER.md) |
+| `~/.hermes/state.db` | Base de datos de sesiones SQLite |
+| `~/.hermes/sessions/` | Índice de enrutamiento del gateway (`sessions.json`), migas de pan de solicitudes, transcripciones `*.jsonl` del gateway y (opcionalmente) snapshots JSON por sesión cuando `sessions.write_json_snapshots: true` está configurado. Los snapshots por sesión están desactivados por defecto; state.db es canónica. |
+| `~/.hermes/cron/` | Datos de trabajos programados |
+| `~/.hermes/whatsapp/session/` | Credenciales del puente WhatsApp |
+
+---
+
+## Descripción General de la Arquitectura
+
+### Bucle Central
+
+```
+Mensaje del usuario → AIAgent._run_agent_loop()
+  ├── Construir prompt del sistema (prompt_builder.py)
+  ├── Construir kwargs de API (modelo, mensajes, herramientas, configuración de razonamiento)
+  ├── Llamar al LLM (API compatible con OpenAI)
+  ├── Si tool_calls en la respuesta:
+  │     ├── Ejecutar cada herramienta a través del despacho del registro
+  │     ├── Añadir resultados de herramientas a la conversación
+  │     └── Volver a la llamada al LLM
+  ├── Si respuesta de texto:
+  │     ├── Persistir sesión en DB
+  │     └── Devolver final_response
+  └── Compresión de contexto si se acerca al límite de tokens
+```
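El flujo anterior puede esbozarse con un LLM simulado. Es un boceto mínimo bajo supuestos: `run_agent_loop` y la forma de los mensajes (estilo OpenAI, con `tool_calls` y rol `tool`) son ilustrativos. No es el código de `AIAgent._run_agent_loop()`, y omite el prompt del sistema, la persistencia en DB y la compresión de contexto.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    user_message: str,
    call_llm: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    max_iterations: int = 10,
) -> tuple[str, list[Message]]:
    messages: list[Message] = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # Respuesta de texto: fin del bucle (aquí se persistiría la sesión).
            messages.append({"role": "assistant", "content": reply["content"]})
            return reply["content"], messages
        messages.append({"role": "assistant", "content": None, "tool_calls": tool_calls})
        for call in tool_calls:
            # Despacho: nombre de la herramienta -> manejador, argumentos en JSON.
            fn = call["function"]
            result = tools[fn["name"]](**json.loads(fn["arguments"] or "{}"))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
        # Siguiente iteración: volver a llamar al LLM con los resultados añadidos.
    raise RuntimeError("se alcanzó max_iterations sin respuesta final")
```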
+
+### Patrones de Diseño Clave
+
+- **Herramientas auto-registradas**: Cada archivo de herramienta llama a `registry.register()` en el momento de importación. `model_tools.py` activa el descubrimiento importando todos los módulos de herramientas.
+- **Agrupación en toolsets**: Las herramientas se agrupan en toolsets (`web`, `terminal`, `file`, `browser`, etc.) que pueden habilitarse/deshabilitarse por plataforma.
+- **Persistencia de sesión**: Todas las conversaciones se almacenan en SQLite (`hermes_state.py`) con búsqueda de texto completo y títulos de sesión únicos.
+- **Inyección efímera**: Los prompts del sistema y los mensajes de precarga (prefill) se inyectan en el momento de la llamada a la API; nunca se persisten en la base de datos ni en los logs.
+- **Abstracción de proveedor**: El agente funciona con cualquier API compatible con OpenAI. La resolución del proveedor ocurre en el momento de la inicialización.
+- **Enrutamiento de proveedor**: Al usar OpenRouter, `provider_routing` en config.yaml controla la selección del proveedor.
+
+---
+
+## Estilo de Código
+
+- **PEP 8** con excepciones prácticas (no imponemos longitud de línea estricta)
+- **Comentarios**: Solo para explicar intenciones no obvias, compromisos de diseño o peculiaridades de la API. No narres lo que hace el código
+- **Manejo de errores**: Captura excepciones específicas. Registra con `logger.warning()`/`logger.error()` — usa `exc_info=True` para errores inesperados
+- **Multiplataforma**: Nunca asumas Unix. Ver [Compatibilidad Multiplataforma](#compatibilidad-multiplataforma)
+
+---
+
+## Añadir una Nueva Herramienta
+
+Antes de escribir una herramienta, pregúntate: [¿debería ser una habilidad en su lugar?](#debería-ser-una-habilidad-o-una-herramienta)
+
+Las herramientas se auto-registran en el registro central. Cada archivo de herramienta co-localiza su esquema, manejador y registro:
+
+```python
+"""my_tool — Breve descripción de lo que hace esta herramienta."""
+
+import json
+from tools.registry import registry
+
+
+def my_tool(param1: str, param2: int = 10, **kwargs) -> str:
+    """Manejador. Devuelve un resultado en cadena (a menudo JSON)."""
+    result = {"param1": param1, "param2": param2}  # Sustituye por la lógica real de la herramienta
+    return json.dumps(result)
+
+
+MY_TOOL_SCHEMA = {
+    "type": "function",
+    "function": {
+        "name": "my_tool",
+        "description": "Qué hace esta herramienta y cuándo debería usarla el agente.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "param1": {"type": "string", "description": "Qué es param1"},
+                "param2": {"type": "integer", "description": "Qué es param2", "default": 10},
+            },
+            "required": ["param1"],
+        },
+    },
+}
+
+
+def _check_requirements() -> bool:
+    """Devuelve True si las dependencias de esta herramienta están disponibles."""
+    return True
+
+
+registry.register(
+    name="my_tool",
+    toolset="my_toolset",
+    schema=MY_TOOL_SCHEMA,
+    handler=lambda args, **kw: my_tool(**args, **kw),
+    check_fn=_check_requirements,
+)
+```
+
+**Conectar a un toolset (requerido):** Las herramientas integradas se auto-descubren: cualquier
+archivo `tools/*.py` que contenga una llamada de nivel superior `registry.register(...)` es
+importado por `discover_builtin_tools()` en `tools/registry.py` cuando `model_tools`
+se carga. **No** hay una lista de importaciones manual en `model_tools.py` que mantener.
+
+Todavía debes añadir el nombre de la herramienta a la lista apropiada en `toolsets.py`
+(por ejemplo `_HERMES_CORE_TOOLS` o un toolset dedicado); de lo contrario la herramienta
+se registra pero nunca se expone al agente.
+
+Consulta `AGENTS.md` (sección **Adding New Tools**) para rutas conscientes del perfil y
+orientación sobre plugins vs. núcleo.
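Para ilustrar el criterio de descubrimiento descrito arriba, este boceto usa `ast` para decidir si un archivo contiene una llamada `registry.register(...)` de nivel superior. Es una reimplementación hipotética del criterio, no el código de `discover_builtin_tools()`. Los nombres `has_top_level_register` y `find_tool_modules` son inventados para el ejemplo.

```python
import ast
from pathlib import Path


def has_top_level_register(source: str) -> bool:
    """True si el módulo llama a registry.register(...) en su nivel superior."""
    for node in ast.parse(source).body:  # solo sentencias de nivel superior
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if (
                isinstance(func, ast.Attribute)
                and func.attr == "register"
                and isinstance(func.value, ast.Name)
                and func.value.id == "registry"
            ):
                return True
    return False


def find_tool_modules(tools_dir: Path) -> list[str]:
    """Módulos (tools.<archivo>) que se importarían para auto-registrarse."""
    return sorted(
        f"tools.{path.stem}"
        for path in tools_dir.glob("*.py")
        if has_top_level_register(path.read_text(encoding="utf-8"))
    )
```

Una llamada dentro de una función no cuenta, porque solo se ejecutaría al invocar esa función, no al importar el módulo.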
+
+---
+
+## Añadir una Habilidad
+
+Las habilidades incluidas viven en `skills/` organizadas por categoría. Las habilidades opcionales oficiales usan la misma estructura en `optional-skills/`:
+
+```
+skills/
+├── research/
+│   └── arxiv/
+│       ├── SKILL.md              # Requerido: instrucciones principales
+│       └── scripts/              # Opcional: scripts auxiliares
+│           └── search_arxiv.py
+├── productivity/
+│   └── ocr-and-documents/
+│       ├── SKILL.md
+│       ├── scripts/
+│       └── references/
+└── ...
+```
+
+### Formato de SKILL.md
+
+```markdown
+---
+name: my-skill
+description: Breve descripción (mostrada en los resultados de búsqueda de habilidades)
+version: 1.0.0
+author: Tu Nombre
+license: MIT
+platforms: [macos, linux]          # Opcional — restringir a plataformas de SO específicas
+required_environment_variables:    # Opcional — metadatos de configuración segura al cargar
+  - name: MY_API_KEY
+    prompt: Clave API
+    help: Dónde obtenerla
+    required_for: funcionalidad completa
+prerequisites:                     # Opcional: requisitos de ejecución (formato heredado)
+  env_vars: [MY_API_KEY]
+  commands: [curl, jq]
+metadata:
+  hermes:
+    tags: [Categoría, Subcategoría, Palabras clave]
+    related_skills: [other-skill-name]
+    fallback_for_toolsets: [web]
+    requires_toolsets: [terminal]
+---
+
+# Título de la Habilidad
+
+Introducción breve.
+
+## Cuándo Usar
+Condiciones de activación — ¿cuándo debería el agente cargar esta habilidad?
+
+## Referencia Rápida
+Tabla de comandos o llamadas API comunes.
+
+## Procedimiento
+Instrucciones paso a paso que el agente sigue.
+
+## Problemas Conocidos
+Modos de fallo conocidos y cómo manejarlos.
+
+## Verificación
+Cómo confirma el agente que funcionó.
+```
+
+### Estándares de autoría de habilidades (OBLIGATORIOS)
+
+Toda habilidad nueva o modernizada (incluida, opcional o contribuida) debe cumplir estos estándares antes del merge:
+
+1. **`description` ≤ 60 caracteres, una oración, termina con punto.** Las descripciones largas saturan la UI de listado de habilidades. Indica la capacidad, no la implementación. Sin palabras de marketing ("potente", "completo", "fluido", "avanzado").
+
+2. **Las herramientas referenciadas en el cuerpo de SKILL.md deben ser herramientas nativas de Hermes o servidores MCP que la habilidad espere explícitamente.** Usa los nombres de herramientas en comillas invertidas: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, etc.
+
+3. **El campo `platforms:` debe auditarse contra las importaciones reales de los scripts.** Las habilidades que usen primitivas exclusivas de POSIX deben declarar las plataformas que soportan.
+
+4. **`author` da crédito primero al colaborador humano.**
+
+5. **El cuerpo de SKILL.md usa el orden moderno de secciones:** título, introducción de 2-3 oraciones y luego: `## Cuándo Usar`, `## Prerrequisitos`, `## Cómo Ejecutar`, `## Referencia Rápida`, `## Procedimiento`, `## Problemas Conocidos`, `## Verificación`.
+
+6. **Los scripts van en `scripts/`, las referencias en `references/`, las plantillas en `templates/`.**
+
+7. **Los tests viven en `tests/skills/test_<skill>_skill.py`** y usan solo stdlib + pytest + `unittest.mock`. Sin llamadas de red en vivo.
+
+8. **Las adiciones a `.env.example` están aisladas en un bloque claramente delimitado.**
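La regla 1 se puede comprobar de forma mecánica. Este es un boceto mínimo de un validador, que asume que basta con contar caracteres y detectar finales de oración con una heurística de puntuación. `description_problems` y el conjunto `MARKETING_WORDS` son hipotéticos, no un linter existente del repositorio.

```python
import re

# Palabras de marketing citadas en la regla 1, más sus formas femeninas.
MARKETING_WORDS = {
    "potente", "completo", "completa", "fluido", "fluida", "avanzado", "avanzada",
}


def description_problems(description: str) -> list[str]:
    """Devuelve los incumplimientos de la regla 1; lista vacía si es válida."""
    problems = []
    if len(description) > 60:
        problems.append(f"{len(description)} caracteres (máximo 60)")
    if not description.endswith("."):
        problems.append("no termina con punto")
    # Heurística: cada [.!?] seguido de espacio o fin de cadena cierra una oración.
    if len(re.findall(r"[.!?](?:\s|$)", description)) > 1:
        problems.append("más de una oración")
    found = sorted(set(re.findall(r"\w+", description.lower())) & MARKETING_WORDS)
    if found:
        problems.append("palabras de marketing: " + ", ".join(found))
    return problems
```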
+
+---
+
+## Añadir una Skin / Tema
+
+Hermes usa un sistema de skins basado en datos — no se necesitan cambios de código para añadir una nueva skin.
+
+**Opción A: Skin de usuario (archivo YAML)**
+
+Crea `~/.hermes/skins/<nombre>.yaml`:
+
+```yaml
+name: mitema
+description: Breve descripción del tema
+
+colors:
+  banner_border: "#HEX"
+  banner_title: "#HEX"
+  banner_accent: "#HEX"
+  banner_dim: "#HEX"
+  banner_text: "#HEX"
+  response_border: "#HEX"
+
+spinner:
+  waiting_faces: ["(⚔)", "(⛨)"]
+  thinking_faces: ["(⚔)", "(⌁)"]
+  thinking_verbs: ["forjando", "planeando"]
+
+branding:
+  agent_name: "Mi Agente"
+  welcome: "Mensaje de bienvenida"
+  response_label: " ⚔ Agente "
+  prompt_symbol: "⚔"
+
+tool_prefix: "╎"
+```
+
+Todos los campos son opcionales — los valores faltantes se heredan de la skin predeterminada.
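La herencia de valores faltantes equivale a fusionar la skin del usuario sobre la predeterminada. Este boceto mínimo asume una fusión recursiva por clave: por ejemplo, definir solo `colors.banner_border` conserva el resto de colores por defecto. `merge_skin` es un nombre hipotético y no tiene por qué coincidir con la lógica de `hermes_cli/skin_engine.py`.

```python
import copy
from typing import Any


def merge_skin(default: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Devuelve una skin nueva con los valores de `user` sobre los de `default`."""
    merged = copy.deepcopy(default)  # no muta la skin predeterminada
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_skin(merged[key], value)  # fusiona secciones anidadas
        else:
            merged[key] = copy.deepcopy(value)  # escalares y listas se sustituyen enteros
    return merged
```

En este boceto, las listas como `thinking_faces` se sustituyen completas en lugar de concatenarse.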
+
+**Opción B: Skin integrada**
+
+Añade al dict `_BUILTIN_SKINS` en `hermes_cli/skin_engine.py`. Usa el mismo esquema que arriba pero como dict de Python.
+
+**Activar:**
+- CLI: `/skin mitema` o establece `display.skin: mitema` en config.yaml
+
+---
+
+## Compatibilidad Multiplataforma
+
+Hermes se ejecuta en Linux, macOS y Windows nativo (además de WSL2). Al escribir código
+que toca el SO, asume que *cualquier* plataforma puede alcanzar tu ruta de código.
+
+> **Antes de abrir un PR:** ejecuta `scripts/check-windows-footguns.py` para detectar
+> en tu diff los patrones inseguros más comunes en Windows. Se basa en grep y es rápido;
+> CI también lo ejecuta en cada PR.
+
+### Reglas críticas
+
+1. **Nunca llames a `os.kill(pid, 0)` para comprobar si un proceso sigue vivo.** En Windows **NO es una operación sin efecto**. Usa `psutil.pid_exists(pid)` en su lugar.
+
+2. **Usa `shutil.which()` antes de invocar comandos externos; no asumas que Windows tiene las herramientas que tiene Linux.** `ps`, `kill`, `grep`, `awk`, etc. simplemente no existen en Windows.
+
+3. **`termios` y `fcntl` son solo de Unix.** Siempre captura tanto `ImportError` como `NotImplementedError`.
+
+4. **Codificación de archivos.** Windows puede guardar archivos `.env` en `cp1252`. Siempre maneja errores de codificación.
+
+5. **Gestión de procesos.** `os.setsid()`, `os.killpg()`, `os.fork()`, `os.getuid()` y el manejo de señales POSIX difieren en Windows.
+
+6. **Señales que no existen en Windows:** `SIGALRM`, `SIGCHLD`, `SIGHUP`, `SIGUSR1`, `SIGUSR2`, etc.
+
+7. **Separadores de ruta.** Usa `pathlib.Path` en lugar de concatenación de cadenas con `/`.
+
+8. **Los enlaces simbólicos necesitan privilegios elevados en Windows** (a menos que el Modo Desarrollador esté activado).
+
+9. **Los modos de archivo POSIX (0o600, 0o644, etc.) NO se aplican en NTFS** por defecto.
+
+10. **Los daemons de fondo desacoplados en Windows necesitan `pythonw.exe`, NO `python.exe`.**
+
+---
+
+## Consideraciones de Seguridad
+
+Hermes tiene acceso al terminal. La seguridad importa.
+
+### Protecciones existentes
+
+| Capa | Implementación |
+|------|---------------|
+| **Piping de contraseña sudo** | Usa `shlex.quote()` para prevenir inyección de shell |
+| **Detección de comandos peligrosos** | Patrones regex en `tools/approval.py` con flujo de aprobación del usuario |
+| **Inyección de prompts en cron** | Escáner en `tools/cronjob_tools.py` bloquea patrones de anulación de instrucciones |
+| **Lista de denegación de escritura** | Rutas protegidas resueltas a través de `os.path.realpath()` para prevenir bypass de enlaces simbólicos |
+| **Skills Guard** | Escáner de seguridad para habilidades instaladas desde el hub (`tools/skills_guard.py`) |
+| **Sandbox de ejecución de código** | El proceso hijo `execute_code` se ejecuta con claves API eliminadas del entorno |
+| **Fortalecimiento de contenedor** | Docker: todas las capacidades eliminadas, sin escalada de privilegios, límites de PID, tmpfs de tamaño limitado |
+
+### Al contribuir código sensible a la seguridad
+
+- **Siempre usa `shlex.quote()`** al interpolar entrada del usuario en comandos de shell
+- **Resuelve enlaces simbólicos** con `os.path.realpath()` antes de comprobaciones de control de acceso basadas en rutas
+- **No registres secretos.** Las claves API, tokens y contraseñas nunca deben aparecer en la salida de log
+- **Captura excepciones amplias** alrededor de la ejecución de herramientas para que un solo fallo no bloquee el bucle del agente
+- **Prueba en todas las plataformas** si tu cambio toca rutas de archivos, gestión de procesos o comandos de shell
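Las dos primeras reglas pueden ilustrarse así. Es un boceto mínimo bajo supuestos: `build_sudo_pipeline` e `is_write_denied` son hipotéticas y no reproducen el código del terminal ni la lista de denegación reales de Hermes. `-S` (leer la contraseña desde stdin) y `-p ''` (prompt vacío) son opciones estándar de `sudo`.

```python
import os
import shlex


def build_sudo_pipeline(password: str, argv: list[str]) -> str:
    """Línea de shell que pasa la contraseña a `sudo -S` por stdin."""
    # shlex.quote/shlex.join neutralizan comillas, ';', '$()' y espacios.
    return f"printf '%s\\n' {shlex.quote(password)} | sudo -S -p '' {shlex.join(argv)}"


def is_write_denied(path: str, denied_dirs: list[str]) -> bool:
    """True si `path`, tras resolver enlaces simbólicos, cae en un directorio denegado."""
    real = os.path.realpath(path)
    for denied in denied_dirs:
        denied_real = os.path.realpath(denied)
        try:
            if os.path.commonpath([real, denied_real]) == denied_real:
                return True
        except ValueError:
            # Windows: rutas en unidades distintas no tienen ruta común.
            continue
    return False
```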
+
+### Política de fijación de dependencias (fortalecimiento de la cadena de suministro)
+
+Tras el [ataque a la cadena de suministro de litellm](https://github.com/BerriAI/litellm/issues/24512) en marzo de 2026 y la [campaña del gusano Mini Shai-Hulud](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) en mayo de 2026, todas las dependencias deben seguir estas reglas:
+
+| Tipo de fuente | Tratamiento requerido | Justificación |
+|---|---|---|
+| **Paquete PyPI** | `>=suelo,<siguiente_mayor` | Las versiones de PyPI son inmutables una vez publicadas, pero pueden publicarse nuevas versiones dentro de tu rango. |
+| **URL de Git** | SHA completo del commit | Las ramas y etiquetas son refs mutables; el SHA está direccionado por contenido. |
+| **GitHub Actions** | SHA completo del commit + comentario de versión | Las etiquetas de acción son refs mutables. Fija como `uses: owner/action@<sha>  # vX.Y.Z` |
+| **Instalaciones pip solo de CI** | `==exacto` | Builds de CI herméticos; la rotación frecuente de versiones es aceptable. |
+
+**Cada nueva dependencia de PyPI en un PR debe tener un límite superior `<siguiente_mayor`.** Los PRs que añadan especificaciones `>=X.Y.Z` sin límite superior serán rechazados.
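La regla del límite superior se puede verificar antes de abrir el PR. Este boceto mínimo usa una expresión regular en lugar de la biblioteca `packaging`, por lo que solo cubre líneas de requisitos típicas (`nombre[extras]>=X,<Y; marcador`). `lacks_upper_bound` es hipotética. Trata `<`, `<=`, `==` y `~=` como límites superiores y deja las dependencias por URL a la regla del SHA.

```python
import re

# nombre[extras] seguido de especificadores; basta para líneas típicas de pyproject.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def lacks_upper_bound(requirement: str) -> bool:
    """True si una dependencia de PyPI incumple la regla `<siguiente_mayor`."""
    spec = requirement.split(";", 1)[0].strip()  # descarta marcadores de entorno
    if "@" in spec:
        return False  # dependencia por URL: aplica la regla del SHA, no esta
    match = _REQ.match(spec)
    if match is None:
        raise ValueError(f"no se pudo interpretar: {requirement!r}")
    clauses = [c.strip() for c in match.group(3).split(",") if c.strip()]
    # '<', '<=', '==' y '~=' limitan la versión por arriba.
    return not any(c.startswith(("<", "==", "~=")) for c in clauses)
```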
+
+---
+
+## Proceso de Pull Request
+
+### Nomenclatura de ramas
+
+```
+fix/descripcion        # Correcciones de errores
+feat/descripcion       # Nuevas funcionalidades
+docs/descripcion       # Documentación
+test/descripcion       # Tests
+refactor/descripcion   # Reestructuración de código
+```
+
+### Antes de enviar
+
+1. **Ejecutar tests**: `scripts/run_tests.sh` (recomendado; igual que CI) o `pytest tests/ -v` con el venv del proyecto activado
+2. **Probar manualmente**: Ejecuta `hermes` y ejercita la ruta de código que cambiaste
+3. **Verificar impacto multiplataforma**: Si tocas E/S de archivos, gestión de procesos o manejo del terminal, considera macOS, Linux y WSL2
+4. **Mantén los PRs enfocados**: Un cambio lógico por PR. No mezcles una corrección de error con una refactorización con una nueva funcionalidad.
+
+### Descripción del PR
+
+Incluye:
+- **Qué** cambió y **por qué**
+- **Cómo probarlo** (pasos de reproducción para errores, ejemplos de uso para funcionalidades)
+- **Qué plataformas** probaste
+- Referencia cualquier issue relacionado
+
+### Mensajes de commit
+
+Usamos [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+<tipo>(<alcance>): <descripción>
+```
+
+| Tipo | Usar para |
+|------|-----------|
+| `fix` | Correcciones de errores |
+| `feat` | Nuevas funcionalidades |
+| `docs` | Documentación |
+| `test` | Tests |
+| `refactor` | Reestructuración de código (sin cambio de comportamiento) |
+| `chore` | Build, CI, actualizaciones de dependencias |
+
+Alcances: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`, etc.
+
+Ejemplos:
+```
+fix(cli): prevenir bloqueo en save_config_value cuando el modelo es una cadena
+feat(gateway): añadir aislamiento de sesión multi-usuario de WhatsApp
+fix(security): prevenir inyección de shell en el piping de contraseña sudo
+test(tools): añadir tests unitarios para file_operations
+```
+
+---
+
+## Reportar Issues
+
+- Usa [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
+- Incluye: SO, versión de Python, versión de Hermes (`hermes version`), traza completa del error (traceback)
+- Incluye pasos para reproducir
+- Verifica los issues existentes antes de crear duplicados
+- Para vulnerabilidades de seguridad, por favor reporta de forma privada
+
+---
+
+## Comunidad
+
+- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch) — para preguntas, mostrar proyectos y compartir habilidades
+- **GitHub Discussions**: Para propuestas de diseño y discusiones de arquitectura
+- **Skills Hub**: Sube habilidades especializadas a un registro y compártelas con la comunidad
+
+---
+
+## Licencia
+
+Al contribuir, aceptas que tus contribuciones serán licenciadas bajo la [Licencia MIT](LICENSE).
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -18,6 +18,24 @@ We value contributions in this order:

 ---

+## Before You Start: Search First
+
+A quick search before you build saves your time and keeps the PR queue clean — duplicates are common here, so it's worth a minute up front.
+
+- **Search both open *and* merged PRs and issues** for your topic or error symptom — the duplicate-check in the PR template fires at review time, after you've already done the work:
+  ```bash
+  gh search issues --repo NousResearch/hermes-agent "<your terms>"
+  gh search prs --repo NousResearch/hermes-agent --state all "<your terms>"
+  ```
+  Or use the web UI: [issues](https://github.com/NousResearch/hermes-agent/issues?q=) · [PRs (all states)](https://github.com/NousResearch/hermes-agent/pulls?q=is%3Apr).
+- **The issue tracker can lag the code.** Many requested features are already implemented in-tree, so also search the source (Hermes's `search_files` tool, or your editor's grep) for the capability before proposing it.
+- **If an open PR already addresses it**, consider reviewing or improving that one instead of opening a competing duplicate.
+- **For larger work**, comment on the issue to signal you're working on it, so others don't start the same thing.
+
+Related: #38284 covers the agent-side analog — Hermes itself checking existing issues and PRs before deep self-troubleshooting. This section is the human-contributor complement.
+
+---
+
 ## Should it be a Skill or a Tool?

 This is the most common question for new contributors. The answer is almost always **skill**.
@ -412,6 +430,12 @@ Brief intro.
 ## When to Use
 Trigger conditions — when should the agent load this skill?

+## Prerequisites
+Env vars, install steps, MCP setup, API key sourcing.
+
+## How to Run
+Canonical invocation through the `terminal` tool.
+
 ## Quick Reference
 Table of common commands or API calls.

--- a/README.es.md
+++ b/README.es.md
@ -0,0 +1,220 @@
+<p align="center">
+  <img src="assets/banner.png" alt="Hermes Agent" width="100%">
+</p>
+
+# Hermes Agent ☤
+<p align="center">
+  <a href="https://hermes-agent.nousresearch.com/">Hermes Agent</a> | <a href="https://hermes-agent.nousresearch.com/">Hermes Desktop</a>
+</p>
+<p align="center">
+  <a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentación"></a>
+  <a href="https://discord.gg/NousResearch"><img src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
+  <a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/Licencia-MIT-green?style=for-the-badge" alt="Licencia: MIT"></a>
+  <a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Creado%20por-Nous%20Research-blueviolet?style=for-the-badge" alt="Creado por Nous Research"></a>
+  <a href="README.md"><img src="https://img.shields.io/badge/Lang-English-blue?style=for-the-badge" alt="English"></a>
+  <a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
+  <a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
+</p>
+
+**El agente de IA que se mejora a sí mismo, creado por [Nous Research](https://nousresearch.com).** Es el único agente con un bucle de aprendizaje integrado: crea habilidades a partir de la experiencia, las mejora durante el uso, se recuerda a sí mismo que debe conservar el conocimiento, busca en sus propias conversaciones pasadas y construye un modelo cada vez más profundo de quién eres a lo largo de las sesiones. Ejecútalo en un VPS de $5, en un clúster de GPUs o en infraestructura sin servidor que no cuesta casi nada cuando está inactiva. No está atado a tu laptop: habla con él desde Telegram mientras trabaja en una VM en la nube.
+
+Usa cualquier modelo que quieras: [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (más de 200 modelos), [NovitaAI](https://novita.ai), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI o tu propio endpoint. Cambia de modelo con `hermes model`, sin cambios de código y sin quedar atado a ningún proveedor.
+
+<table>
+<tr><td><b>Una interfaz de terminal real</b></td><td>TUI completa con edición multilínea, autocompletado de comandos, historial de conversaciones, interrupción y redirección, y salida de herramientas en streaming.</td></tr>
+<tr><td><b>Vive donde tú vives</b></td><td>Telegram, Discord, Slack, WhatsApp, Signal y CLI — todo desde un único proceso gateway. Transcripción de notas de voz, continuidad de conversación entre plataformas.</td></tr>
+<tr><td><b>Un bucle de aprendizaje cerrado</b></td><td>Memoria curada por el agente con recordatorios periódicos. Creación autónoma de habilidades tras tareas complejas. Las habilidades mejoran solas durante el uso. Búsqueda de sesiones con FTS5 y resúmenes generados por LLM para recordar información entre sesiones. Modelado dialéctico del usuario con <a href="https://github.com/plastic-labs/honcho">Honcho</a>. Compatible con el estándar abierto de <a href="https://agentskills.io">agentskills.io</a>.</td></tr>
+<tr><td><b>Automatizaciones programadas</b></td><td>Planificador cron integrado con entrega a cualquier plataforma. Informes diarios, copias de seguridad nocturnas, auditorías semanales — todo en lenguaje natural, ejecutándose de forma autónoma.</td></tr>
+<tr><td><b>Delega y paraleliza</b></td><td>Lanza subagentes aislados para flujos de trabajo paralelos. Escribe scripts de Python que llaman a herramientas vía RPC, convirtiendo pipelines de múltiples pasos en turnos sin coste de contexto.</td></tr>
+<tr><td><b>Funciona en cualquier lugar, no solo en tu laptop</b></td><td>Seis backends de terminal — local, Docker, SSH, Singularity, Modal y Daytona. Daytona y Modal ofrecen persistencia sin servidor — el entorno de tu agente hiberna cuando está inactivo y se activa bajo demanda, costando casi nada entre sesiones. Ejecútalo en un VPS de $5 o un clúster de GPUs.</td></tr>
+<tr><td><b>Listo para investigación</b></td><td>Generación de trayectorias en lote, compresión de trayectorias para entrenar la próxima generación de modelos de llamadas a herramientas.</td></tr>
+</table>
+
+---
+
+## Instalación rápida
+
+### Linux, macOS, WSL2, Termux
+
+```bash
+curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
+```
+
+### Windows (nativo, PowerShell)
+
+> **Nota:** En Windows nativo, Hermes funciona sin WSL — la CLI, el gateway, la TUI y las herramientas funcionan de forma nativa. Si prefieres usar WSL2, el comando de Linux/macOS de arriba también funciona allí. ¿Encontraste un error? Por favor [crea un issue](https://github.com/NousResearch/hermes-agent/issues).
+
+Ejecuta esto en PowerShell:
+
+```powershell
+iex (irm https://hermes-agent.nousresearch.com/install.ps1)
+```
+
+El instalador se encarga de todo: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **y un Git Bash portátil** (MinGit, descomprimido en `%LOCALAPPDATA%\hermes\git`; no requiere permisos de administrador y está completamente aislado de cualquier instalación de Git del sistema). Hermes usa este Git Bash incluido para ejecutar comandos de shell.
+
+Si ya tienes Git instalado, el instalador lo detecta y lo usa en su lugar. De lo contrario, basta con una descarga de ~45 MB de MinGit, que no tocará ni interferirá con ningún Git del sistema.
+
+> **Android / Termux:** La ruta manual probada está documentada en la [guía de Termux](https://hermes-agent.nousresearch.com/docs/getting-started/termux). En Termux, Hermes instala el extra `.[termux]` curado porque el extra completo `.[all]` actualmente incluye dependencias de voz incompatibles con Android.
+>
+> **Windows:** Windows nativo es totalmente compatible — el comando de PowerShell de arriba instala todo. Si prefieres usar WSL2, el comando de Linux también funciona allí. La instalación nativa de Windows se encuentra en `%LOCALAPPDATA%\hermes`; WSL2 instala en `~/.hermes` como en Linux.
+
+Después de la instalación:
+
+```bash
+source ~/.bashrc    # recargar shell (o: source ~/.zshrc)
+hermes              # ¡empieza a chatear!
+```
+
+---
+
+## Primeros pasos
+
+```bash
+hermes              # CLI interactiva — inicia una conversación
+hermes model        # Elige tu proveedor y modelo LLM
+hermes tools        # Configura qué herramientas están habilitadas
+hermes config set   # Establece valores de configuración individuales
+hermes gateway      # Inicia el gateway de mensajería (Telegram, Discord, etc.)
+hermes setup        # Ejecuta el asistente de configuración completo
+hermes claw migrate # Migra desde OpenClaw (si vienes de OpenClaw)
+hermes update       # Actualiza a la última versión
+hermes doctor       # Diagnostica cualquier problema
+```
+
+📖 **[Documentación completa →](https://hermes-agent.nousresearch.com/docs/)**
+
+---
+
+## Olvídate de reunir claves API: Nous Portal
+
+Hermes funciona con cualquier proveedor que quieras — eso no cambiará. Pero si prefieres no recopilar cinco claves API separadas para el modelo, búsqueda web, generación de imágenes, TTS y un navegador en la nube, **[Nous Portal](https://portal.nousresearch.com)** las cubre todas bajo una sola suscripción:
+
+- **Más de 300 modelos** — elige cualquiera con `/model <nombre>`
+- **Tool Gateway** — búsqueda web (Firecrawl), generación de imágenes (FAL), texto a voz (OpenAI), navegador en la nube (Browser Use), todo enrutado a través de tu suscripción. Sin cuentas adicionales.
+
+Un comando desde una instalación nueva:
+
+```bash
+hermes setup --portal
+```
+
+Esto te autentica vía OAuth, establece Nous como tu proveedor y activa el Tool Gateway. Comprueba qué está conectado en cualquier momento con `hermes portal info`. Detalles completos en la [página de documentación del Tool Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
+
+Puedes seguir usando tus propias claves por herramienta cuando quieras — el gateway es por backend, no todo o nada.
+
+---
+
+## Referencia rápida: CLI vs Mensajería
+
+Hermes tiene dos puntos de entrada: inicia la interfaz de terminal con `hermes`, o ejecuta el gateway y habla con él desde Telegram, Discord, Slack, WhatsApp, Signal o Email. Una vez en una conversación, muchos comandos de barra son compartidos entre ambas interfaces.
+
+| Acción                              | CLI                                           | Plataformas de mensajería                                                         |
+| ----------------------------------- | --------------------------------------------- | --------------------------------------------------------------------------------- |
+| Empezar a chatear                   | `hermes`                                      | Ejecuta `hermes gateway setup` + `hermes gateway start`, luego envía un mensaje al bot |
+| Nueva conversación                  | `/new` o `/reset`                             | `/new` o `/reset`                                                                 |
+| Cambiar modelo                      | `/model [proveedor:modelo]`                   | `/model [proveedor:modelo]`                                                       |
+| Establecer personalidad             | `/personality [nombre]`                       | `/personality [nombre]`                                                           |
+| Reintentar o deshacer último turno  | `/retry`, `/undo`                             | `/retry`, `/undo`                                                                 |
+| Comprimir contexto / ver uso        | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]`                                         |
+| Explorar habilidades                | `/skills` o `/<nombre-habilidad>`             | `/<nombre-habilidad>`                                                             |
+| Interrumpir trabajo actual          | `Ctrl+C` o enviar un nuevo mensaje            | `/stop` o enviar un nuevo mensaje                                                 |
+| Estado específico de plataforma     | `/platforms`                                  | `/status`, `/sethome`                                                             |
+
+Para las listas de comandos completas, consulta la [guía de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) y la [guía del Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
+
+---
+
+## Documentación
+
+Toda la documentación está en **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
+
+| Sección                                                                                             | Contenido                                                    |
+| --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
+| [Inicio rápido](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart)              | Instalar → configurar → primera conversación en 2 minutos   |
+| [Uso de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli)                             | Comandos, atajos de teclado, personalidades, sesiones        |
+| [Configuración](https://hermes-agent.nousresearch.com/docs/user-guide/configuration)               | Archivo de configuración, proveedores, modelos, todas las opciones |
+| [Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging)           | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant   |
+| [Seguridad](https://hermes-agent.nousresearch.com/docs/user-guide/security)                        | Aprobación de comandos, emparejamiento por DM, aislamiento en contenedor |
+| [Herramientas y Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools)   | Más de 40 herramientas, sistema de toolsets, backends de terminal |
+| [Sistema de Habilidades](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills)   | Memoria procedimental, Skills Hub, creación de habilidades   |
+| [Memoria](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory)                   | Memoria persistente, perfiles de usuario, mejores prácticas  |
+| [Integración MCP](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp)              | Conecta cualquier servidor MCP para capacidades extendidas   |
+| [Programación Cron](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron)           | Tareas programadas con entrega a plataforma                  |
+| [Archivos de Contexto](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Contexto de proyecto que da forma a cada conversación      |
+| [Arquitectura](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture)            | Estructura del proyecto, bucle del agente, clases principales |
+| [Contribuir](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing)              | Configuración de desarrollo, proceso de PR, estilo de código |
+| [Referencia de CLI](https://hermes-agent.nousresearch.com/docs/reference/cli-commands)             | Todos los comandos y flags                                   |
+| [Variables de Entorno](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Referencia completa de variables de entorno                  |
+
+---
+
+## Migración desde OpenClaw
+
+Si vienes de OpenClaw, Hermes puede importar automáticamente tu configuración, memorias, habilidades y claves API.
+
+**Durante la configuración inicial:** El asistente de configuración (`hermes setup`) detecta automáticamente `~/.openclaw` y ofrece migrar antes de que comience la configuración.
+
+**En cualquier momento después de instalar:**
+
+```bash
+hermes claw migrate              # Migración interactiva (preset completo)
+hermes claw migrate --dry-run    # Vista previa de qué se migraría
+hermes claw migrate --preset user-data   # Migrar sin secretos
+hermes claw migrate --overwrite  # Sobrescribir conflictos existentes
+```
+
+Qué se importa:
+
+- **SOUL.md** — archivo de personalidad
+- **Memorias** — entradas de MEMORY.md y USER.md
+- **Habilidades** — habilidades creadas por el usuario → `~/.hermes/skills/openclaw-imports/`
+- **Lista de comandos permitidos** — patrones de aprobación
+- **Configuración de mensajería** — configuración de plataformas, usuarios permitidos, directorio de trabajo
+- **Claves API** — secretos en lista de permitidos (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
+- **Recursos de TTS** — archivos de audio del espacio de trabajo
+- **Instrucciones del espacio de trabajo** — AGENTS.md (con `--workspace-target`)
+
+Consulta `hermes claw migrate --help` para todas las opciones, o usa la habilidad `openclaw-migration` para una migración guiada interactiva por el agente con vistas previas de dry-run.
+
+---
+
+## Contribuir
+
+¡Las contribuciones son bienvenidas! Consulta la [Guía de Contribución](CONTRIBUTING.es.md) para la configuración del desarrollo, el estilo de código y el proceso de PR.
+
+Inicio rápido para colaboradores — clona y comienza con `setup-hermes.sh`:
+
+```bash
+git clone https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+./setup-hermes.sh     # instala uv, crea venv, instala .[all], enlaza ~/.local/bin/hermes
+./hermes              # detecta automáticamente el venv, no necesitas hacer `source` primero
+```
+
+Ruta manual (equivalente a lo anterior):
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+uv venv .venv --python 3.11
+source .venv/bin/activate
+uv pip install -e ".[all,dev]"
+scripts/run_tests.sh
+```
+
+---
+
+## Comunidad
+
+- 💬 [Discord](https://discord.gg/NousResearch)
+- 📚 [Skills Hub](https://agentskills.io)
+- 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
+- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — Servidor MCP de control de escritorio Linux para Hermes y otros hosts MCP, con árboles de accesibilidad AT-SPI, entrada Wayland/X11, capturas de pantalla y selección de ventanas a través del compositor.
+- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Puente comunitario para WeChat: ejecuta Hermes Agent y OpenClaw en la misma cuenta de WeChat.
+
+---
+
+## Licencia
+
+MIT — ver [LICENSE](LICENSE).
+
+Creado por [Nous Research](https://nousresearch.com).
--- a/README.md
+++ b/README.md
@ -13,6 +13,7 @@
  <a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
  <a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
  <a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
+  <a href="README.es.md"><img src="https://img.shields.io/badge/Lang-Español-orange?style=for-the-badge" alt="Español"></a>
 </p>

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
@ -64,6 +65,41 @@ source ~/.bashrc    # reload shell (or: source ~/.zshrc)
 hermes              # start chatting!
 ```

+### Troubleshooting
+
+#### Windows Defender or antivirus flags `uv.exe` as malware
+
+If your antivirus (Bitdefender, Windows Defender, etc.) quarantines `uv.exe` from the Hermes `bin` folder (`%LOCALAPPDATA%\hermes\bin\uv.exe`), this is a **false positive**. The file is Astral's `uv` — the Rust Python package manager Hermes bundles to manage its Python environment. ML-based antivirus engines commonly flag unsigned Rust binaries that download and install packages.
+
+**To verify your copy is authentic:**
+
+```powershell
+# Install GitHub CLI if needed
+winget install --id GitHub.cli
+
+# Log in to GitHub
+gh auth login
+
+# Run verification
+$uv = "$env:LOCALAPPDATA\hermes\bin\uv.exe"
+$ver = (& $uv --version).Split(' ')[1]
+[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
+$zip = "$env:TEMP\uv.zip"
+Invoke-WebRequest "https://github.com/astral-sh/uv/releases/download/$ver/uv-x86_64-pc-windows-msvc.zip" -OutFile $zip -UseBasicParsing
+gh attestation verify $zip --repo astral-sh/uv
+Expand-Archive $zip "$env:TEMP\uv_x" -Force
+(Get-FileHash "$env:TEMP\uv_x\uv.exe").Hash -eq (Get-FileHash $uv).Hash
+```
+
+If the attestation reports "Verification succeeded" and the last line prints `True`, your copy matches the official Astral release.
+
+**To whitelist Hermes:**
+- **Windows Defender:** Run PowerShell as Admin → `Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\hermes\bin"`
+- **Bitdefender:** Add an exception in the Bitdefender console (Protection > Antivirus > Settings > Manage Exceptions)
+- Whitelist the **folder**, not the file hash: Hermes updates `uv`, and the hash changes with every version
+
+For more context, see the upstream Astral reports: [astral-sh/uv#13553](https://github.com/astral-sh/uv/issues/13553), [astral-sh/uv#15011](https://github.com/astral-sh/uv/issues/15011), [astral-sh/uv#10079](https://github.com/astral-sh/uv/issues/10079).
+
 ---

 ## Getting Started
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@ -39,7 +39,11 @@ curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash

 > **Android / Termux：** 已测试的手动安装路径请参考 [Termux 指南](https://hermes-agent.nousresearch.com/docs/getting-started/termux)。在 Termux 上，Hermes 会安装精选的 `.[termux]` 扩展，因为完整的 `.[all]` 扩展会拉取 Android 不兼容的语音依赖。
 >
-> **Windows：** 原生 Windows 不受支持。请安装 [WSL2](https://learn.microsoft.com/zh-cn/windows/wsl/install) 并运行上述命令。
+> **Windows：** 在 PowerShell 中运行：
+> ```powershell
+> iex (irm https://hermes-agent.nousresearch.com/install.ps1)
+> ```
+> 安装完成后，可能需要重启终端，然后运行 `hermes` 开始对话。

 安装后：

--- a/SECURITY.es.md
+++ b/SECURITY.es.md
@ -0,0 +1,322 @@
+# Política de Seguridad de Hermes Agent
+
+Este documento describe el modelo de confianza de Hermes Agent, identifica el
+único límite de seguridad que el proyecto trata como estructural y define el
+alcance para los informes de vulnerabilidades.
+
+## 1. Reportar una Vulnerabilidad
+
+Reporta de forma privada a través de [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
+o **security@nousresearch.com**. No abras issues públicos para
+vulnerabilidades de seguridad. **Hermes Agent no opera un programa de
+recompensas por errores.**
+
+Un informe útil incluye:
+
+- Una descripción concisa y evaluación de severidad.
+- El componente afectado, identificado por ruta de archivo y rango de líneas
+  (ej. `path/to/file.py:120-145`).
+- Detalles del entorno (`hermes version`, SHA del commit, SO, versión de Python).
+- Una reproducción contra `main` o el último release.
+- Una declaración de qué límite de confianza del §2 se cruza.
+
+Por favor lee el §2 y el §3 antes de enviar. Los informes que demuestren
+limitaciones de una heurística en proceso que esta política no considera un
+límite de seguridad se cerrarán como fuera de alcance según el §3; pero consulta
+el §3.2: siguen siendo bienvenidos como issues o pull requests normales,
+simplemente no a través del canal de seguridad privado.
+
+---
+
+## 2. Modelo de Confianza
+
+Hermes Agent es un agente personal de un solo inquilino. Su postura de seguridad
+es por capas, y las capas no tienen el mismo peso. Quienes reporten
+vulnerabilidades y los operadores deben razonar sobre ellas en los mismos términos.
+
+### 2.1 Definiciones
+
+- **Proceso del agente.** El intérprete Python que ejecuta Hermes Agent,
+  incluyendo cualquier módulo Python que haya cargado (habilidades, plugins,
+  manejadores de hooks).
+- **Backend de terminal.** Un objetivo de ejecución conectado para la
+  herramienta `terminal()`. El predeterminado ejecuta comandos directamente en el host.
+  Otros backends ejecutan comandos dentro de un contenedor, sandbox en la nube o
+  host remoto.
+- **Superficie de entrada.** Cualquier canal a través del cual el contenido entra en el
+  contexto del agente: entrada del operador, fetches web, email, mensajes del gateway,
+  lecturas de archivos, respuestas del servidor MCP, resultados de herramientas.
+- **Envolvente de confianza.** El conjunto de recursos a los que un operador ha otorgado
+  implícitamente acceso a Hermes Agent al ejecutarlo — típicamente, todo lo que
+  la propia cuenta de usuario del operador puede alcanzar en el host.
+- **Postura.** Una declaración explícita en la documentación o código de Hermes Agent
+  sobre cómo una capa consumidora (adaptador, UI, escritor de archivos,
+  shell) debe tratar la salida del agente — ej. "el dashboard renderiza
+  la salida del agente como HTML inerte."
+
+### 2.2 El Límite: Aislamiento a Nivel de SO
+
+**El único límite de seguridad contra un LLM adversario es el
+sistema operativo.** Nada dentro del proceso del agente constituye
+contención — ni la puerta de aprobación, ni la redacción de salida, ni ningún
+escáner de patrones, ni ninguna lista de herramientas permitidas. Cualquier componente dentro
+del proceso que filtre la salida del LLM es una heurística operando sobre una
+cadena influenciada por el atacante, y esta política lo trata como tal.
+
+Hermes Agent admite dos posturas de aislamiento a nivel de SO. Abordan
+diferentes amenazas y un operador debe elegir deliberadamente.
+
+#### Aislamiento del backend de terminal
+
+Un backend de terminal no predeterminado ejecuta comandos de shell emitidos por el LLM
+dentro de un contenedor, host remoto o sandbox en la nube. Las herramientas de archivos
+(`read_file`, `write_file`, `patch`) también se ejecutan a través de este backend,
+ya que están implementadas sobre el contrato del shell — no pueden
+alcanzar rutas que el backend no exponga.
+
+Lo que confina: todo lo que el agente hace emitiendo operaciones de shell o
+de archivos. Lo que **no** confina: todo lo que el agente hace en su propio
+proceso Python. Eso incluye la herramienta de ejecución de código (lanzada como
+subproceso del host), subprocesos MCP (lanzados desde el entorno del agente),
+carga de plugins, despacho de hooks y carga de habilidades (todos importados en el
+intérprete del agente).
+
+El aislamiento del backend de terminal es la postura correcta cuando la preocupación es
+que el LLM emita comandos de shell destructivos o escrituras de herramientas de archivo no deseadas, y el
+operador es de confianza.
+
+#### Envoltura del proceso completo
+
+La envoltura del proceso completo ejecuta todo el árbol de procesos del agente dentro de un
+sandbox. Cada ruta de código — shell, ejecución de código, MCP, herramientas de archivos,
+plugins, hooks, carga de habilidades — está sujeta a la misma política de sistema de archivos,
+red, proceso e (donde sea aplicable) inferencia.
+
+Hermes Agent admite esto de dos maneras:
+
+- **La propia imagen Docker de Hermes Agent y la configuración de Compose.** Más
+  liviana; el agente se ejecuta en un contenedor estándar con montajes y
+  política de red configurados por el operador.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
+  OpenShell proporciona sandboxes por sesión con política declarativa
+  a través de capas de sistema de archivos, red (egreso L7), proceso/syscall y
+  enrutamiento de inferencia. Las políticas de red e inferencia son
+  recargables en caliente. Las credenciales se inyectan desde un almacén de Proveedor
+  y nunca tocan el sistema de archivos del sandbox.
+
+Bajo una envoltura de proceso completo, las heurísticas en proceso de Hermes Agent
+(§2.4) funcionan como prevención de accidentes en capas sobre un límite real.
+Esta es la postura soportada cuando el agente ingiere contenido de superficies
+que el operador no controla — la web abierta, email entrante, canales de
+múltiples usuarios, servidores MCP no confiables — y para despliegues en
+producción o compartidos.
+
+Los operadores que ejecuten el backend local predeterminado con superficies de entrada
+no confiables, o que ejecuten un sandbox de backend de terminal esperando que contenga
+rutas de código que no pasan por el shell, están operando fuera de la postura de
+seguridad soportada.
+
+### 2.3 Alcance de Credenciales
+
+Hermes Agent filtra el entorno que pasa a sus procesos hijos de
+menor confianza: subprocesos de shell, subprocesos MCP y el proceso hijo
+de ejecución de código. Las credenciales como las claves API del proveedor y los
+tokens del gateway se eliminan por defecto; las variables declaradas explícitamente
+por el operador o por una habilidad cargada se pasan.
+
+Esto reduce la exfiltración casual. No es contención. Cualquier
+componente que se ejecute dentro del proceso del agente (habilidades, plugins, manejadores
+de hooks) puede leer lo que el agente mismo puede leer, incluidas las
+credenciales en memoria. La mitigación contra un componente en proceso comprometido
+es la revisión del operador antes de instalar (§2.4, §2.5), no el
+saneamiento del entorno.
+
+### 2.4 Heurísticas en Proceso
+
+Los siguientes componentes filtran o advierten sobre el comportamiento del LLM. Son
+útiles. No son límites.
+
+- La **puerta de aprobación** detecta patrones de shell destructivos comunes
+  y le pide al operador confirmación antes de la ejecución. El shell es Turing-
+  completo; una lista de denegación sobre cadenas de shell es estructuralmente
+  incompleta. La puerta detecta errores en modo cooperativo, no salidas
+  adversariales.
+- **La redacción de salida** elimina patrones similares a secretos de la visualización.
+  Un productor de salida motivado la evitará.
+- **Skills Guard** escanea el contenido de habilidades instalables en busca de patrones
+  de inyección. Es una ayuda de revisión; el límite para habilidades de terceros
+  es la revisión del operador antes de instalar. Revisar una habilidad significa
+  leer su código Python y scripts, no solo su descripción SKILL.md —
+  las habilidades ejecutan Python arbitrario en el momento de importación.
+
+### 2.5 Modelo de Confianza de Plugins
+
+Los plugins se cargan en el proceso del agente y se ejecutan con todos los privilegios
+del agente: pueden leer las mismas credenciales, llamar a las mismas
+herramientas, registrar los mismos hooks e importar los mismos módulos que
+cualquier cosa incluida en el árbol. El límite para los plugins de terceros es
+la revisión del operador antes de instalar — la misma regla que las habilidades (§2.4),
+mencionada por separado porque los plugins son arquitectónicamente más pesados
+y a menudo incluyen sus propios servicios en segundo plano, oyentes de red
+y dependencias.
+
+Un plugin malicioso o con errores no es una vulnerabilidad en Hermes Agent
+en sí mismo. Los errores en la ruta de instalación o descubrimiento de plugins de Hermes Agent
+que impidan al operador ver lo que está instalando están en alcance bajo el §3.1.
+
+### 2.6 External Surfaces
+
+An **external surface** is any channel outside the local agent process
+through which a caller can dispatch agent work, resolve
+approvals, or receive agent output. Each surface has its own
+authorization model, but the rules below apply uniformly.
+
+**Surfaces in Hermes Agent:**
+
+- **Gateway platform adapters.** Messaging integrations in
+  `gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+  and analogous adapters shipped as plugins.
+- **Network-exposed HTTP surfaces.** The API server adapter, the
+  dashboard plugin, the kanban plugin's HTTP endpoints, and any
+  other plugin that binds a listening socket.
+- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
+  equivalent integrations that accept requests from a local client process.
+- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
+  Ink terminal UI, reached over local IPC.
+
+**Uniform rules:**
+
+1. **Authorization is required on every surface that crosses a trust boundary.** For
+   messaging and networked HTTP surfaces, the boundary is the network: authorization
+   means an operator-configured allowlist of callers. For editor and
+   local-IPC surfaces (ACP, TUI gateway), the boundary is the host user account:
+   authorization means relying on OS-level access control (file
+   permissions, loopback-only binds) and not exposing the surface beyond
+   the local user without an explicit network authentication layer.
+2. **An allowlist is required for every enabled network adapter.**
+   Adapters must refuse to dispatch agent work, resolve
+   approvals, or stream output until an allowlist is set. Code paths
+   that fail open when no allowlist is configured are code bugs in
+   scope under §3.1.
+3. **Session identifiers are routing handles, not authorization boundaries.**
+   Knowing another caller's session ID does not grant access to their approvals or output;
+   authorization is always re-checked against the allowlist (or its OS-level
+   equivalent).
+4. **Within the authorized set, all callers are equally trusted.**
+   Hermes Agent does not model per-caller capabilities within a single adapter.
+   Operators who need capability separation should run separate agent
+   instances with separate allowlists.
+5. **Binding a local-only surface to a non-loopback interface is a
+   break-glass operator decision (§3.2).** The dashboard and other plugin HTTP servers
+   default to loopback; exposing them via `--host 0.0.0.0` or equivalent
+   makes public-exposure hardening (§4) the operator's responsibility.
+
+---
+
+## 3. Scope
+
+### 3.1 In Scope
+
+- Escape from a declared OS-level isolation posture (§2.2): an
+  attacker-controlled code path reaching state that the posture
+  claimed to confine.
+- Unauthorized external-surface access: a caller outside the configured
+  authorization set (allowlist, or OS-level equivalent
+  for local-IPC surfaces) dispatching work, receiving output, or
+  resolving approvals (§2.6).
+- Credential exfiltration: leaking operator credentials or
+  session authorization material to a destination outside the trust
+  envelope, through a mechanism that should have prevented it
+  (an environment-scrubbing bug, adapter logging, a transport bug
+  that flushes credentials to an upstream, etc.).
+- Trust-model documentation violations: code that behaves
+  contrary to what this policy, Hermes Agent's own documentation, or
+  reasonable operator expectations would predict, including cases where
+  Hermes Agent has documented a posture on how its output should be
+  rendered by a consuming layer (dashboard, gateway adapter,
+  file writer, shell) and a code path breaks that posture.
+
+### 3.2 Out of Scope
+
+"Out of scope" here means "not a security vulnerability under this
+policy." It does not mean "not worth reporting." Improvements to
+in-process heuristics, hardening ideas, and UX fixes are welcome as
+regular issues or pull requests: the approval gate can always catch
+more patterns, redaction can always get smarter, and adapter behavior
+can always be tightened. These items simply do not go through the private
+disclosure channel and do not receive advisories.
+
+- **In-process heuristic bypasses (§2.4).** Approval-gate regex bypasses,
+  redaction bypasses, Skills Guard pattern bypasses, and analogous
+  reports against future heuristics. These components are not boundaries;
+  defeating them is not a vulnerability under this policy.
+- **Prompt injection per se.** Getting the LLM to emit unusual output,
+  whether through injected content, hallucination, training artifacts,
+  or any other cause, is not in itself a vulnerability. "I achieved
+  prompt injection" without a chained §3.1 outcome is not an
+  actionable report under this policy.
+- **Consequences of a chosen isolation posture.** Reports that
+  a code path operating within the scope of its posture can do what that
+  posture permits are not vulnerabilities. Examples: shell or file tools
+  reaching host state under the local backend; code-execution or MCP
+  subprocesses reaching host state under terminal-backend isolation that only
+  sandboxes the shell; reports whose preconditions require pre-existing write access
+  to operator-owned config or credential files (those are already inside
+  the trust envelope).
+- **Documented break-glass configurations.** Operator-selected trade-offs
+  that explicitly disable protections: `--insecure` and equivalent flags
+  on the dashboard or other components, disabled approvals,
+  the local backend in production, development profiles that bypass
+  hermes-home safety, and the like. Reports against those
+  configurations are not vulnerabilities; that is the flag's job.
+- **Community-contributed skills and plugins.** Third-party skills
+  (including the community skills repository) and third-party plugins
+  are on the operator's review surface, not Hermes Agent's trust surface
+  (§2.4, §2.5). A skill or plugin that does something
+  malicious is the expected failure mode of one that was not
+  reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes Agent's
+  skill or plugin install path that prevent the
+  operator from seeing what they are installing are in scope under §3.1.
+- **Public exposure without external controls.** Exposing the
+  gateway or the API to the public internet without authentication,
+  a VPN, or a firewall.
+- **Tool-level read/write restrictions in a posture where the shell is
+  permitted.** If a path is reachable through the terminal tool, reports
+  that other file tools can reach it add nothing.
+
+---
+
+## 4. Deployment Hardening
+
+The most important hardening decision is matching isolation
+(§2.2) to the trust level of the content the agent will ingest. Beyond that:
+
+- Run the agent as a non-root user. The provided container image
+  does this by default.
+- Keep credentials in the operator's credentials file with strict
+  permissions, never in the main config, never in version control.
+  Under OpenShell, use the Providers store instead of an on-disk
+  credentials file.
+- Do not expose the gateway or the API to the public internet without
+  VPN, Tailscale, or firewall protection. Under OpenShell, use the
+  network policy layer to restrict egress.
+- Configure a caller allowlist for every exposed network adapter
+  you enable (§2.6).
+- Review third-party skills and plugins before installing (§2.4,
+  §2.5). For skills, this means reading the Python and scripts,
+  not just SKILL.md. Skills Guard reports and the install audit
+  log are the review surface.
+- Hermes Agent ships supply-chain guards for MCP server
+  launches and for dependency / bundled-package changes in CI; see
+  `CONTRIBUTING.es.md` for details.
+
+---
+
+## 5. Disclosure
+
+- **Coordinated disclosure window:** 90 days from the report, or until a
+  fix is published, whichever comes first.
+- **Channel:** the GHSA thread or email correspondence with
+  security@nousresearch.com.
+- **Credit:** reporters are credited in the release notes unless
+  anonymity is requested.
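Rules 2 and 3 of §2.6 (fail closed without an allowlist; session IDs route but never authorize) can be illustrated with a minimal sketch. `AdapterAuth`, `is_authorized`, and `resolve_approval` are hypothetical names chosen for illustration, not Hermes Agent APIs, and the sketch assumes an allowlist of plain caller-ID strings.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class AdapterAuth:
    # Hypothetical per-adapter state. None models "the operator never
    # configured an allowlist"; an empty frozenset also denies everyone.
    allowlist: Optional[FrozenSet[str]] = None


def is_authorized(auth: AdapterAuth, caller_id: str) -> bool:
    # Fail closed: with no allowlist configured, nobody is authorized.
    if auth.allowlist is None:
        return False
    return caller_id in auth.allowlist


def resolve_approval(
    auth: AdapterAuth,
    caller_id: str,
    pending: Dict[str, Optional[str]],
    session_id: str,
    decision: str,
) -> bool:
    """Record `decision` for the pending approval routed by `session_id`.

    The session id is only a routing handle: knowing it is not enough,
    so the caller is re-checked against the allowlist on every call.
    """
    if not is_authorized(auth, caller_id):
        return False
    if session_id not in pending:
        return False
    pending[session_id] = decision
    return True
```

Per rule 4, the sketch deliberately does not check which authorized caller owns a session: within the allowlisted set, every caller is equally trusted.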
--- a/SECURITY.md
+++ b/SECURITY.md
@ -121,10 +121,11 @@ outside the supported security posture.
 ### 2.3 Credential Scoping

 Hermes Agent filters the environment it passes to its lower-trust
-in-process components: shell subprocesses, MCP subprocesses, and
-the code-execution child. Credentials like provider API keys and
-gateway tokens are stripped by default; variables explicitly
-declared by the operator or by a loaded skill are passed through.
+in-process components: shell subprocesses, MCP subprocesses,
+cron job scripts, and the code-execution child. Credentials like
+provider API keys and gateway tokens are stripped by default;
+variables explicitly declared by the operator or by a loaded
+skill are passed through.

 This reduces casual exfiltration. It is not containment. Any
 component running inside the agent process (skills, plugins, hook
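The §2.3 text says credentials are stripped by default from the environment handed to shell subprocesses, MCP subprocesses, cron job scripts, and the code-execution child, while variables declared by the operator or a loaded skill pass through. Below is a minimal sketch of that kind of filter. The suffix-based name pattern is an assumption for illustration; the real filter's rules are not shown in this diff.

```python
import re
from typing import Dict, Iterable, Mapping

# Hypothetical heuristic: names ending in one of these suffixes look like
# credentials. The actual Hermes Agent rules may differ.
_CREDENTIAL_NAME = re.compile(r"(API_KEY|TOKEN|SECRET|PASSWORD)$", re.IGNORECASE)


def filtered_env(env: Mapping[str, str], passthrough: Iterable[str] = ()) -> Dict[str, str]:
    """Return a copy of `env` with credential-looking variables removed.

    Names listed in `passthrough` (declared by the operator or a loaded
    skill) are kept even if they look like credentials.
    """
    allowed = set(passthrough)
    return {
        name: value
        for name, value in env.items()
        if name in allowed or not _CREDENTIAL_NAME.search(name)
    }
```

A caller would pass the result to `subprocess.run(cmd, env=filtered_env(os.environ, passthrough=[...]))`. As the policy notes, this only reduces casual exfiltration. Code running inside the agent process can still read the unfiltered environment.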
--- a/acp_adapter/session.py
+++ b/acp_adapter/session.py
@ -617,6 +617,10 @@ class SessionManager:

        _register_task_cwd(session_id, cwd)
        agent = AIAgent(**kwargs)
+        # Codex app-server sessions are spawned lazily on the first turn. Stamp
+        # the ACP workspace onto the agent so the Codex runtime starts from the
+        # editor/session cwd instead of the Hermes daemon's process cwd.
+        agent.session_cwd = cwd
        # ACP stdio transport requires stdout to remain protocol-only JSON-RPC.
        # Route any incidental human-readable agent output to stderr instead.
        agent._print_fn = _acp_stderr_print
--- a/acp_registry/agent.json
+++ b/acp_registry/agent.json
@ -1,7 +1,7 @@
 {
  "id": "hermes-agent",
  "name": "Hermes Agent",
-  "version": "0.16.0",
+  "version": "0.17.0",
  "description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
  "repository": "https://github.com/NousResearch/hermes-agent",
  "website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@ -9,7 +9,7 @@
  "license": "MIT",
  "distribution": {
    "uvx": {
-      "package": "hermes-agent[acp]==0.16.0",
+      "package": "hermes-agent[acp]==0.17.0",
      "args": ["hermes-acp"]
    }
  }
--- a/agent/agent_init.py
+++ b/agent/agent_init.py
@ -50,7 +50,7 @@ from agent.tool_guardrails import (
 from hermes_cli.config import cfg_get
 from hermes_cli.timeouts import get_provider_request_timeout
 from hermes_constants import get_hermes_home
-from utils import base_url_host_matches
+from utils import base_url_host_matches, is_truthy_value

 # Use the same logger name as run_agent so tests patching ``run_agent.logger``
 # capture our warnings.  (run_agent.py also does
@ -265,7 +265,8 @@ def init_agent(
            output_config.format instead of a trailing-assistant prefill.
        platform (str): The interface platform the user is on (e.g. "cli", "telegram", "discord", "whatsapp").
            Used to inject platform-specific formatting hints into the system prompt.
-        skip_context_files (bool): If True, skip auto-injection of SOUL.md, AGENTS.md, and .cursorrules
+        skip_context_files (bool): If True, skip auto-injection of project context files
+            (SOUL.md, .hermes.md, AGENTS.md, CLAUDE.md, .cursorrules) from the cwd / HERMES_HOME
            into the system prompt. Use this for batch processing and data generation to avoid
            polluting trajectories with user-specific persona or project instructions.
        load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary
@ -531,7 +532,14 @@ def init_agent(
    agent._last_activity_desc: str = "initializing"
    agent._current_tool: str | None = None
    agent._api_call_count: int = 0
-
+    # Opt-out flag for the between-turns MCP tool refresh (build_turn_context).
+    # Set on internal forks (e.g. background_review) that must keep ``tools[]``
+    # byte-identical to a parent for provider cache parity.
+    agent._skip_mcp_refresh = False
+    # Registry generation the current tool snapshot was derived from. Lets a
+    # late/concurrent refresh reject a stale (older-generation) rebuild instead
+    # of clobbering a newer one. Set adjacent to the tool snapshot below.
+    agent._tool_snapshot_generation = 0
    # Rate limit tracking — updated from x-ratelimit-* response headers
    # after each API call.  Accessed by /usage slash command.
    agent._rate_limit_state: Optional["RateLimitState"] = None
@ -800,6 +808,8 @@ def init_agent(
                # _custom_headers; older/mocked clients may expose
                # _default_headers instead.
                _routed_headers = getattr(_routed_client, "_custom_headers", None)
+                if not _routed_headers:
+                    _routed_headers = getattr(_routed_client, "default_headers", None)
                if not _routed_headers:
                    _routed_headers = getattr(_routed_client, "_default_headers", None)
                if _routed_headers:
@ -853,6 +863,8 @@ def init_agent(
                            if _provider_timeout is not None:
                                client_kwargs["timeout"] = _provider_timeout
                            _fb_headers = getattr(_fb_client, "_custom_headers", None)
+                            if not _fb_headers:
+                                _fb_headers = getattr(_fb_client, "default_headers", None)
                            if not _fb_headers:
                                _fb_headers = getattr(_fb_client, "_default_headers", None)
                            if _fb_headers:
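Both header hunks now probe three attribute names in order (`_custom_headers`, then `default_headers`, then `_default_headers`), because different client versions and test mocks store headers under different names. The lookup reduces to a small helper. `first_nonempty_attr` is a hypothetical name, not a Hermes Agent function.

```python
from typing import Any, Iterable, Optional

# Probe order used by the diff: private custom headers, public property,
# then the older private attribute.
HEADER_ATTRS = ("_custom_headers", "default_headers", "_default_headers")


def first_nonempty_attr(obj: Any, names: Iterable[str]) -> Optional[Any]:
    """Return the first truthy attribute among `names`, else None.

    An empty dict counts as missing, matching the diff's `if not ...` checks.
    """
    for name in names:
        value = getattr(obj, name, None)
        if value:
            return value
    return None
```

Treating an empty mapping as absent matters here: a client that initializes `_custom_headers` to `{}` would otherwise hide headers available under the later names.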
@ -953,7 +965,14 @@ def init_agent(
            print(f"🔄 Fallback chain ({len(agent._fallback_chain)} providers): " +
                  " → ".join(f"{f['model']} ({f['provider']})" for f in agent._fallback_chain))

-    # Get available tools with filtering
+    # Get available tools with filtering. Capture the registry generation this
+    # snapshot is derived from FIRST, so a later concurrent refresh can tell
+    # whether it holds a newer or staler view (see refresh_agent_mcp_tools).
+    try:
+        from tools.registry import registry as _snapshot_registry
+        agent._tool_snapshot_generation = _snapshot_registry._generation
+    except Exception:
+        agent._tool_snapshot_generation = 0
    agent.tools = _ra().get_tool_definitions(
        enabled_toolsets=enabled_toolsets,
        disabled_toolsets=disabled_toolsets,
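The `_tool_snapshot_generation` field records which registry generation the tool list was derived from, so a late or concurrent refresh can refuse to overwrite a newer snapshot. Below is a minimal sketch of that guard. `Registry`, `Agent`, and `apply_snapshot` are hypothetical names, and the sketch reads the generation and the tool list under one lock instead of capturing the generation first as the diff does.

```python
import threading
from typing import List, Tuple


class Registry:
    """Hypothetical tool registry with a monotonically increasing generation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._generation = 0
        self._tools: List[str] = []

    def register(self, name: str) -> None:
        with self._lock:
            self._tools.append(name)
            self._generation += 1

    def snapshot(self) -> Tuple[int, List[str]]:
        # Read both under the lock so they describe the same registry state.
        with self._lock:
            return self._generation, list(self._tools)


class Agent:
    def __init__(self) -> None:
        self.tools: List[str] = []
        self.tool_snapshot_generation = 0

    def apply_snapshot(self, generation: int, tools: List[str]) -> bool:
        # Reject a rebuild computed from an older registry state so a slow
        # refresh cannot clobber a newer snapshot that already landed.
        if generation < self.tool_snapshot_generation:
            return False
        self.tools = tools
        self.tool_snapshot_generation = generation
        return True
```

Capturing the generation before building the tool list, as the diff does, errs on the safe side: if the registry changes mid-build, the snapshot carries the older number, so the next refresh is still accepted.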
@ -1081,6 +1100,12 @@ def init_agent(
    agent._parent_session_id = parent_session_id
    agent._last_flushed_db_idx = 0  # tracks DB-write cursor to prevent duplicate writes
    agent._session_db_created = False  # DB row deferred to run_conversation()
+    # Most agents own their session row and should finalize it on close().
+    # Some temporary helper agents (manual compression / session-hygiene /
+    # background-review forks) rotate or share the session forward to a
+    # continuation row that must remain open after the helper is torn down;
+    # those callers explicitly set this flag to False.
+    agent._end_session_on_close = True
    agent._session_init_model_config = {
        "max_iterations": agent.max_iterations,
        "reasoning_config": reasoning_config,
@ -1325,6 +1350,14 @@ def init_agent(
    compression_abort_on_summary_failure = str(
        _compression_cfg.get("abort_on_summary_failure", False)
    ).lower() in {"true", "1", "yes"}
+    # In-place compaction: when True, compress_context() rewrites the message
+    # list + rebuilds the system prompt WITHOUT rotating the session id (no
+    # parent_session_id chain, no `name #N` renumber). See #38763 and
+    # agent/conversation_compression.py. Consumed by compress_context(), not the
+    # compressor, so it rides on the agent.
+    compression_in_place = is_truthy_value(
+        _compression_cfg.get("in_place"), default=False
+    )

    # Read optional explicit context_length override for the auxiliary
    # compression model. Custom endpoints often cannot report this via
@ -1544,6 +1577,7 @@ def init_agent(
            abort_on_summary_failure=compression_abort_on_summary_failure,
        )
    agent.compression_enabled = compression_enabled
+    agent.compression_in_place = compression_in_place

    # Reject models whose context window is below the minimum required
    # for reliable tool-calling workflows (64K tokens).
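The new `compression_in_place` flag is read through `is_truthy_value(..., default=False)` from `utils`, whose exact rules are not shown in this diff. The key is presumably set as `in_place: true` under the compression section of `config.yaml`. Below is a hypothetical stand-in, `parse_truthy`, that shows the behavior such a helper typically needs. It accepts YAML booleans and common strings, and it falls back to the default for a missing key. This differs from the string-set test used for `abort_on_summary_failure` a few lines above, which can only ever return False for unrecognized values.

```python
from typing import Any

_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off", ""}


def parse_truthy(value: Any, default: bool = False) -> bool:
    """Hypothetical truthy-config parser (not the real `is_truthy_value`).

    Real booleans pass through; numbers and strings are matched against
    common spellings; None (missing key) and anything unrecognized yield
    `default`.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    return default
```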
--- a/agent/agent_runtime_helpers.py
+++ b/agent/agent_runtime_helpers.py
@ -1050,6 +1050,11 @@ def restore_primary_runtime(agent) -> bool:
        agent._fallback_activated = False
        agent._fallback_index = 0

+        # Undo the fallback's identity rewrite so the prompt is
+        # byte-identical to the stored copy again (prefix cache match).
+        from agent.chat_completion_helpers import rewrite_prompt_model_identity
+        rewrite_prompt_model_identity(agent, rt["model"], rt["provider"])
+
        logger.info(
            "Primary runtime restored for new turn: %s (%s)",
            agent.model, agent.provider,
@ -1373,22 +1378,6 @@ def create_openai_client(agent, client_kwargs: dict, *, reason: str, shared: boo
            agent._client_log_context(),
        )
        return client
-    if agent.provider == "google-gemini-cli" or str(client_kwargs.get("base_url", "")).startswith("cloudcode-pa://"):
-        from agent.gemini_cloudcode_adapter import GeminiCloudCodeClient
-
-        # Strip OpenAI-specific kwargs the Gemini client doesn't accept
-        safe_kwargs = {
-            k: v for k, v in client_kwargs.items()
-            if k in {"api_key", "base_url", "default_headers", "project_id", "timeout"}
-        }
-        client = GeminiCloudCodeClient(**safe_kwargs)
-        _ra().logger.info(
-            "Gemini Cloud Code Assist client created (%s, shared=%s) %s",
-            reason,
-            shared,
-            agent._client_log_context(),
-        )
-        return client
    if agent.provider == "gemini":
        from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url

@ -2182,25 +2171,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
    if source_msg.get("role") != "assistant":
        return

-    # 1. Explicit reasoning_content already set — preserve it verbatim
-    # (includes DeepSeek/Kimi's own space-placeholder written at creation
-    # time, and any valid reasoning content from the same provider).
+    needs_thinking_pad = agent._needs_thinking_reasoning_pad()
+
+    # 1. Explicit reasoning_content already set.
    #
-    # Exception: sessions persisted BEFORE #17341 have empty-string
-    # placeholders pinned at creation time. DeepSeek V4 Pro rejects
-    # those with HTTP 400. When the active provider enforces the
-    # thinking-mode echo, upgrade "" → " " on replay so stale history
-    # doesn't 400 the user on the next turn.
+    # When the active provider enforces the thinking-mode echo-back
+    # (DeepSeek / Kimi / MiMo), preserve it verbatim — that includes their
+    # own space-placeholder written at creation time and any valid reasoning
+    # from the same provider. Sessions persisted BEFORE #17341 have
+    # empty-string placeholders pinned at creation time; DeepSeek V4 Pro
+    # rejects those with HTTP 400, so upgrade "" → " " on replay.
+    #
+    # When the active provider does NOT enforce echo-back, strip the field
+    # entirely. Strict OpenAI-compatible providers (Mistral, Cerebras, Groq,
+    # SambaNova, …) reject ANY reasoning_content key in input messages with
+    # HTTP 400/422 ("Extra inputs are not permitted"), even an empty string
+    # or a single-space pad. This is the cross-provider fallback case: a
+    # reasoning primary (DeepSeek/Kimi/MiMo) pads history with " ", then a
+    # fallback to a strict provider replays that pad and 422s. Stripping
+    # here covers the rebuild path; reapply_reasoning_echo_for_provider()
+    # covers the already-built api_messages path. Refs #45655.
    existing = source_msg.get("reasoning_content")
    if isinstance(existing, str):
-        if existing == "" and agent._needs_thinking_reasoning_pad():
+        if not needs_thinking_pad:
+            api_msg.pop("reasoning_content", None)
+        elif existing == "":
            api_msg["reasoning_content"] = " "
        else:
            api_msg["reasoning_content"] = existing
        return

-    needs_thinking_pad = agent._needs_thinking_reasoning_pad()
-
    # 2. Cross-provider poisoned history (#15748): on DeepSeek/Kimi,
    # if the source turn has tool_calls AND a 'reasoning' field but no
    # 'reasoning_content' key, the 'reasoning' text was written by a
@ -2226,9 +2226,13 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
    # for providers that use the internal 'reasoning' key.
    # This must happen before the unconditional empty-string fallback so
    # genuine reasoning content is not overwritten (#15812 regression in
-    # PR #15478).
+    # PR #15478). Only promote for providers that enforce echo-back —
+    # strict providers reject the field (refs #45655).
    if isinstance(normalized_reasoning, str) and normalized_reasoning:
-        api_msg["reasoning_content"] = normalized_reasoning
+        if needs_thinking_pad:
+            api_msg["reasoning_content"] = normalized_reasoning
+        else:
+            api_msg.pop("reasoning_content", None)
        return

    # 4. DeepSeek / Kimi thinking mode: all assistant messages need
@ -2249,34 +2253,53 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No


 def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
-    """Re-pad assistant turns with reasoning_content for the active provider.
+    """Re-pad (or strip) assistant turns' reasoning_content for the active provider.

    ``api_messages`` is built once, before the retry loop, while the *primary*
-    provider is active.  If a mid-conversation fallback then switches to a
-    require-side provider (DeepSeek / Kimi / MiMo thinking mode), assistant
-    turns that were built when the prior provider did NOT need the echo-back go
-    out without ``reasoning_content`` and the new provider rejects them with
-    HTTP 400 ("The reasoning_content in the thinking mode must be passed back").
+    provider is active.  A mid-conversation fallback can then switch providers,
+    so the reasoning fields baked into ``api_messages`` are shaped for the
+    *prior* provider and must be reconciled against the *current* one:

-    Calling this immediately before building the request kwargs re-applies the
-    pad against the *current* provider.  It is idempotent and a no-op unless
-    ``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
-    is safe to call every iteration and covers every fallback path.
+    * Switching TO a require-side provider (DeepSeek / Kimi / MiMo thinking
+      mode): assistant turns built when the prior provider did NOT need the
+      echo-back go out without ``reasoning_content`` and the new provider
+      rejects them with HTTP 400 ("The reasoning_content in the thinking mode
+      must be passed back").  Re-apply the pad.

-    Returns the number of assistant turns that gained reasoning_content.
+    * Switching TO a strict provider that rejects the field (Mistral,
+      Cerebras, Groq, SambaNova, …): assistant turns built under a reasoning
+      primary carry a ``reasoning_content`` pad (often a single space ``" "``),
+      and the strict provider rejects it with HTTP 400/422 ("Extra inputs are
+      not permitted").  Strip the field.  This is the exact cross-provider
+      fallback bug from #45655 — a DeepSeek primary pads history with ``" "``,
+      the request falls back to Mistral, and Mistral 422s on the stale pad.
+
+    Calling this immediately before building the request kwargs reconciles the
+    fields against the *current* provider.  It is idempotent and safe to call
+    every iteration; it covers every fallback path.
+
+    Returns the number of assistant turns whose reasoning_content was added or
+    removed.
    """
-    if not agent._needs_thinking_reasoning_pad():
-        return 0
-    padded = 0
+    needs_pad = agent._needs_thinking_reasoning_pad()
+    changed = 0
    for api_msg in api_messages:
        if api_msg.get("role") != "assistant":
            continue
-        if api_msg.get("reasoning_content"):
-            continue
-        copy_reasoning_content_for_api(agent, api_msg, api_msg)
-        if api_msg.get("reasoning_content"):
-            padded += 1
-    return padded
+        if needs_pad:
+            if api_msg.get("reasoning_content"):
+                continue
+            copy_reasoning_content_for_api(agent, api_msg, api_msg)
+            if api_msg.get("reasoning_content"):
+                changed += 1
+        else:
+            # Strict provider — strip any stale reasoning_content pad left
+            # over from a reasoning primary so the fallback request doesn't
+            # 400/422 on it.
+            if "reasoning_content" in api_msg:
+                api_msg.pop("reasoning_content", None)
+                changed += 1
+    return changed


 def _iter_pool_sockets(client: Any):
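After this change, `reapply_reasoning_echo_for_provider` reconciles `reasoning_content` in both directions. Echo-back providers (DeepSeek / Kimi / MiMo) need the field on every assistant turn. Strict providers (Mistral, Cerebras, Groq, SambaNova, …) reject the key entirely. Below is a simplified, self-contained sketch of that reconciliation. `reconcile_reasoning` is a hypothetical name, and it only ever pads with a single space, whereas the real `copy_reasoning_content_for_api` can also copy genuine reasoning text from the source turn.

```python
from typing import Dict, List


def reconcile_reasoning(messages: List[Dict], needs_pad: bool) -> int:
    """Simplified sketch of per-provider reasoning_content reconciliation.

    needs_pad=True:  echo-back provider; every assistant turn must carry a
                     non-empty reasoning_content, so missing or "" becomes " ".
    needs_pad=False: strict provider; the key must not appear at all.
    Returns how many assistant turns changed. Idempotent.
    """
    changed = 0
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        if needs_pad:
            if not msg.get("reasoning_content"):
                msg["reasoning_content"] = " "
                changed += 1
        elif "reasoning_content" in msg:
            del msg["reasoning_content"]
            changed += 1
    return changed
```

Because the function is idempotent, it is safe to call before every request attempt, which is what lets the real helper cover every fallback path.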
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@ -2535,3 +2535,56 @@ def sanitize_anthropic_kwargs(api_kwargs: Any, *, log_prefix: str = "") -> Any:
            sorted(leaked),
        )
    return api_kwargs
+
+
+def _is_stream_unavailable_error(exc: Exception) -> bool:
+    """Return True when an Anthropic stream call should fall back to create()."""
+    err_lower = str(exc).lower()
+    if "stream" in err_lower and "not supported" in err_lower:
+        return True
+    if "invokemodelwithresponsestream" in err_lower:
+        from agent.bedrock_adapter import is_streaming_access_denied_error
+
+        return is_streaming_access_denied_error(exc)
+    return False
+
+
+def create_anthropic_message(
+    client: Any,
+    api_kwargs: dict,
+    *,
+    log_prefix: str = "",
+    prefer_stream: bool = True,
+) -> Any:
+    """Create an Anthropic message, aggregating via stream when available.
+
+    Some Anthropic-compatible gateways are SSE-only: they ignore non-streaming
+    requests and return ``text/event-stream`` even for ``messages.create()``.
+    The SDK can surface that as raw text, so callers that expect a Message then
+    crash on ``.content``.  Prefer ``messages.stream().get_final_message()`` to
+    match the main turn path, falling back to ``create()`` only for providers
+    that explicitly do not support streaming, such as restricted Bedrock roles.
+    """
+    sanitize_anthropic_kwargs(api_kwargs, log_prefix=log_prefix)
+
+    messages_api = getattr(client, "messages", None)
+    stream_fn = getattr(messages_api, "stream", None)
+    if prefer_stream and callable(stream_fn):
+        stream_kwargs = dict(api_kwargs)
+        stream_kwargs.pop("stream", None)
+        try:
+            with stream_fn(**stream_kwargs) as stream:
+                return stream.get_final_message()
+        except Exception as exc:
+            if not _is_stream_unavailable_error(exc):
+                raise
+            logger.debug(
+                "%sAnthropic Messages stream unavailable; falling back to "
+                "messages.create(): %s",
+                log_prefix,
+                exc,
+            )
+
+    create_kwargs = dict(api_kwargs)
+    create_kwargs.pop("stream", None)
+    return messages_api.create(**create_kwargs)
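`create_anthropic_message` prefers `messages.stream(...).get_final_message()` and only falls back to `messages.create()` when the error indicates that streaming is unavailable. Below is a minimal sketch of that control flow against fake clients. `create_message`, `StreamUnsupported`, and the fake client classes are hypothetical. The fake client exposes `stream`/`create` directly rather than under `.messages`, and the sketch classifies errors by exception type rather than by the diff's message-string matching.

```python
from contextlib import contextmanager


class StreamUnsupported(Exception):
    """Hypothetical error a gateway raises when it cannot stream."""


def create_message(client, kwargs: dict, prefer_stream: bool = True):
    """Stream-first message creation with a narrow fallback to create()."""
    # Neither path should receive an explicit "stream" flag.
    call_kwargs = {k: v for k, v in kwargs.items() if k != "stream"}
    stream_fn = getattr(client, "stream", None)
    if prefer_stream and callable(stream_fn):
        try:
            with stream_fn(**call_kwargs) as stream:
                return stream.get_final_message()
        except StreamUnsupported:
            pass  # only this error falls through; anything else propagates
    return client.create(**call_kwargs)


class _FakeStream:
    def __init__(self, message):
        self._message = message

    def get_final_message(self):
        return self._message


class StreamingClient:
    @contextmanager
    def stream(self, **kw):
        yield _FakeStream(("streamed", kw["model"]))

    def create(self, **kw):
        return ("created", kw["model"])


class NoStreamClient(StreamingClient):
    @contextmanager
    def stream(self, **kw):
        raise StreamUnsupported("stream not supported")
        yield  # unreachable; makes this a generator for @contextmanager
```

Keeping the fallback narrow is the design point. A network error or a 4xx in the middle of a stream propagates instead of being silently retried as a second, non-streaming request.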
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@ -40,6 +40,7 @@ Payment / credit exhaustion fallback:
  their OpenRouter balance but has Codex OAuth or another provider available.
 """

+import contextlib
 import json
 import logging
 import os
@ -102,11 +103,44 @@ OpenAI = _OpenAIProxy()  # module-level name, resolves lazily on call/isinstance
 from agent.credential_pool import load_pool
 from hermes_cli.config import get_hermes_home
 from hermes_constants import OPENROUTER_BASE_URL
-from utils import base_url_host_matches, base_url_hostname, model_forces_max_completion_tokens, normalize_proxy_env_vars
+from utils import base_url_host_matches, base_url_hostname, env_float, model_forces_max_completion_tokens, normalize_proxy_env_vars

 logger = logging.getLogger(__name__)


+# ── Interrupt protection for atomic auxiliary tasks ──────────────────────
+# Some auxiliary tasks must NOT be aborted mid-flight by a gateway interrupt
+# (e.g. an incoming user message while the agent is busy). Context
+# compression is the prime case: if the summary LLM call is interrupted
+# part-way, compression falls back to a static "summary unavailable" marker
+# and the real handoff is lost (#23975). A thread-local flag lets such a
+# task mark its in-flight LLM call as interrupt-protected; the Codex
+# Responses stream's cancellation check honors it. TIMEOUTS still fire
+# (a hung call must die), and all OTHER aux tasks (vision, web_extract,
+# title_generation, …) remain freely interruptible.
+_aux_interrupt_protection = threading.local()
+
+
+def _aux_interrupt_protected() -> bool:
+    return bool(getattr(_aux_interrupt_protection, "active", False))
+
+
+@contextlib.contextmanager
+def aux_interrupt_protection(active: bool = True):
+    """Mark the current thread's auxiliary LLM call as interrupt-protected.
+
+    Used by atomic aux tasks (compression) so a mid-flight gateway interrupt
+    doesn't abort the call and trigger a degraded fallback. Re-entrant-safe:
+    restores the previous value on exit.
+    """
+    prev = getattr(_aux_interrupt_protection, "active", False)
+    _aux_interrupt_protection.active = active
+    try:
+        yield
+    finally:
+        _aux_interrupt_protection.active = prev
+
+
 def _safe_isinstance(obj: Any, maybe_type: Any) -> bool:
    """Return False instead of raising when a patched symbol is not a type."""
    try:
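`aux_interrupt_protection` relies on a thread-local flag that is saved and restored on exit. The Codex stream's cancellation check ignores interrupts while that flag is set, but it still enforces its timeout. Below is a minimal sketch of those three properties: timeouts still fire, nested blocks unwind correctly, and the flag does not leak into other threads. `protection`, `is_protected`, and `cancel_reason` are hypothetical names that mirror the diff's helpers. They are not imports from `agent.auxiliary_client`.

```python
import contextlib
import threading
from typing import Optional

_protection = threading.local()


def is_protected() -> bool:
    return bool(getattr(_protection, "active", False))


@contextlib.contextmanager
def protection(active: bool = True):
    # Save and restore the previous value so nested blocks unwind correctly.
    prev = getattr(_protection, "active", False)
    _protection.active = active
    try:
        yield
    finally:
        _protection.active = prev


def cancel_reason(now: float, deadline: float, interrupted: bool) -> Optional[str]:
    """Decide whether an in-flight auxiliary call should be cancelled."""
    if now >= deadline:
        return "timeout"  # a hung call must die, protected or not
    if interrupted and not is_protected():
        return "interrupt"
    return None
```

Because the state lives in `threading.local()`, wrapping the compression call on one thread leaves vision, web_extract, and other auxiliary calls on other threads freely interruptible.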
@ -631,6 +665,13 @@ def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
    return str(url or "").strip().rstrip("/")


+def _nous_min_key_ttl_seconds() -> int:
+    try:
+        return max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
+    except (TypeError, ValueError):
+        return 1800
+
+
 # ── Codex Responses → chat.completions adapter ─────────────────────────────
 # All auxiliary consumers call client.chat.completions.create(**kwargs) and
 # read response.choices[0].message.content. This adapter translates those
@ -805,7 +846,11 @@ class _CodexCompletionsAdapter:
                raise TimeoutError(_timeout_message())
            try:
                from tools.interrupt import is_interrupted
-                if is_interrupted():
+                # Honor interrupt protection for atomic aux tasks (compression):
+                # a mid-flight gateway interrupt must NOT abort the summary call
+                # and trigger a degraded fallback marker (#23975). Timeouts above
+                # still fire; other aux tasks remain interruptible.
+                if is_interrupted() and not _aux_interrupt_protected():
                    raise InterruptedError("Codex auxiliary Responses stream interrupted")
            except InterruptedError:
                raise
@ -997,7 +1042,7 @@ class _AnthropicCompletionsAdapter:
        self._is_oauth = is_oauth

    def create(self, **kwargs) -> Any:
-        from agent.anthropic_adapter import build_anthropic_kwargs
+        from agent.anthropic_adapter import build_anthropic_kwargs, create_anthropic_message
        from agent.transports import get_transport

        messages = kwargs.get("messages", [])
@ -1041,7 +1086,7 @@ class _AnthropicCompletionsAdapter:
            if not _forbids_sampling_params(model):
                anthropic_kwargs["temperature"] = temperature

-        response = self._client.messages.create(**anthropic_kwargs)
+        response = create_anthropic_message(self._client, anthropic_kwargs)
        _transport = get_transport("anthropic_messages")
        _nr = _transport.normalize_response(
            response, strip_tool_prefix=self._is_oauth
@ -1300,6 +1345,57 @@ def _nous_base_url() -> str:
    return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)


+def _resolve_nous_pool_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
+    """Resolve Nous auxiliary credentials from the selected pool entry."""
+    try:
+        from hermes_cli.auth import _agent_key_is_usable
+
+        pool = load_pool("nous")
+    except Exception as exc:
+        logger.debug("Auxiliary Nous pool credential resolution failed: %s", exc)
+        return None
+
+    if not pool or not pool.has_credentials():
+        return None
+
+    try:
+        entry = pool.select()
+    except Exception as exc:
+        logger.debug("Auxiliary Nous pool selection failed: %s", exc)
+        return None
+
+    if entry is None:
+        return None
+
+    state = {
+        "agent_key": getattr(entry, "agent_key", None),
+        "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+        "scope": getattr(entry, "scope", None),
+    }
+    if force_refresh or not _agent_key_is_usable(state, _nous_min_key_ttl_seconds()):
+        try:
+            refreshed = pool.try_refresh_current()
+        except Exception as exc:
+            logger.debug("Auxiliary Nous pool refresh failed: %s", exc)
+            refreshed = None
+        if refreshed is None:
+            return None
+        entry = refreshed
+
+    provider = {
+        "agent_key": getattr(entry, "agent_key", None),
+        "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+        "access_token": getattr(entry, "access_token", None),
+        "expires_at": getattr(entry, "expires_at", None),
+        "scope": getattr(entry, "scope", None),
+    }
+    api_key = _nous_api_key(provider)
+    base_url = _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL)
+    if not api_key or not base_url:
+        return None
+    return api_key, base_url
+
+
 def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
    """Return fresh Nous runtime credentials when available.

@ -1308,11 +1404,15 @@ def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[
    relying only on whatever raw tokens happen to be sitting in auth.json
    or the credential pool.
    """
+    pooled = _resolve_nous_pool_runtime_api(force_refresh=force_refresh)
+    if pooled is not None:
+        return pooled
+
    try:
        from hermes_cli.auth import resolve_nous_runtime_credentials

        creds = resolve_nous_runtime_credentials(
-            timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
+            timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
            force_refresh=force_refresh,
        )
    except Exception as exc:
@ -2905,7 +3005,7 @@ def _refresh_provider_credentials(provider: str) -> bool:
            from hermes_cli.auth import resolve_nous_runtime_credentials

            creds = resolve_nous_runtime_credentials(
-                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
+                timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
                force_refresh=True,
            )
            if not str(creds.get("api_key", "") or "").strip():
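The interrupt-protection change in `agent/auxiliary_client.py` depends on two properties. First, the flag lives in a `threading.local`, so protecting the compression thread does not leak into other threads. Second, the context manager restores the previous value on exit, so nested use (for example a main-model retry) is safe. The following is a minimal standalone sketch of that pattern. The names `is_protected`, `interrupt_protection`, `should_abort`, and `protected_in_new_thread` are illustrative stand-ins, not the module's real API, and timeouts are deliberately not modeled because the diff keeps them unconditional.

```python
import contextlib
import threading

# Illustrative stand-in for _aux_interrupt_protection: one flag per thread.
_protection = threading.local()


def is_protected() -> bool:
    """True if the CURRENT thread is inside an interrupt_protection() block."""
    return bool(getattr(_protection, "active", False))


@contextlib.contextmanager
def interrupt_protection(active: bool = True):
    """Set the per-thread flag, restoring the previous value on exit."""
    prev = getattr(_protection, "active", False)
    _protection.active = active
    try:
        yield
    finally:
        _protection.active = prev  # restoring (not clearing) makes nesting safe


def should_abort(interrupt_requested: bool) -> bool:
    """Mirror of the stream check: abort only when no protection is active."""
    return interrupt_requested and not is_protected()


def protected_in_new_thread() -> bool:
    """Read the flag from a fresh thread; thread-locals are not inherited."""
    seen: list[bool] = []
    worker = threading.Thread(target=lambda: seen.append(is_protected()))
    worker.start()
    worker.join()
    return seen[0]


if __name__ == "__main__":
    with interrupt_protection():
        with interrupt_protection():  # nested use, like a retry
            pass
        print(is_protected())             # True: outer block still active
        print(should_abort(True))         # False: interrupt ignored
        print(protected_in_new_thread())  # False: flag is per-thread
    print(should_abort(True))             # True: protection released
```

Restoring `prev` rather than resetting to `False` is what keeps the outer block protected after an inner block exits; a plain reset would silently drop protection halfway through the summary call.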
--- a/agent/background_review.py
+++ b/agent/background_review.py
@ -535,6 +535,13 @@ def _run_review_in_thread(
            )
            review_agent._memory_write_origin = "background_review"
            review_agent._memory_write_context = "background_review"
+            # The review fork pins the parent's cached system prompt and keeps
+            # ``tools[]`` byte-identical to the parent so its outbound request
+            # hits the same provider cache prefix (see the toolset-parity note
+            # above). The between-turns MCP refresh in build_turn_context would
+            # add late-connecting MCP tools to this fork and break that parity,
+            # so opt the review fork out of it.
+            review_agent._skip_mcp_refresh = True
            review_agent._memory_store = agent._memory_store
            review_agent._memory_enabled = agent._memory_enabled
            review_agent._user_profile_enabled = agent._user_profile_enabled
@ -568,6 +575,13 @@ def _run_review_in_thread(
            # if a future code path bypasses the cache.
            review_agent.session_start = agent.session_start
            review_agent.session_id = agent.session_id
+            # The fork shares the parent's live session_id (pinned above for
+            # prefix-cache parity). It is single-lifecycle and calls close()
+            # right after this run_conversation(); without opting out, close()
+            # would finalize the parent's still-active session row
+            # mid-conversation (the review fires every ~10 turns). Leave session
+            # finalization to the real owner (CLI close / gateway reset / cron).
+            review_agent._end_session_on_close = False
            # Never let the review fork compress. It shares the parent's
            # session_id, so if it won a compression race it would rotate the
            # parent into a NEW child that the gateway never adopts (the fork
--- a/agent/chat_completion_helpers.py
+++ b/agent/chat_completion_helpers.py
@ -34,7 +34,7 @@ from agent.message_sanitization import (
    _repair_tool_call_arguments,
 )
 from tools.terminal_tool import is_persistent_env
-from utils import base_url_host_matches, base_url_hostname, env_int
+from utils import base_url_host_matches, base_url_hostname, env_float, env_int

 logger = logging.getLogger(__name__)

@ -1042,6 +1042,35 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic



+def rewrite_prompt_model_identity(agent, model: str, provider: str) -> None:
+    """Point the cached system prompt's ``Model:``/``Provider:`` lines at
+    the active runtime after a provider switch.
+
+    The system prompt is session-stable and replayed verbatim for prefix-cache
+    warmth, but after a failover the new backend's cache is cold anyway —
+    while a stale identity line makes the agent misreport which model it is
+    when asked.  Rewrite the lines in place WITHOUT persisting to the session
+    DB: the stored row keeps the primary's labels, so when the primary is
+    restored the prompt is byte-identical to the stored copy again and its
+    prefix cache still matches.
+
+    Only the LAST occurrence of each line is touched — the identity lines
+    live in the volatile tail of the prompt, and earlier matches could be
+    user content (memory snapshots, context files).
+    """
+    sp = getattr(agent, "_cached_system_prompt", None)
+    if not isinstance(sp, str) or not sp:
+        return
+    for label, value in (("Model", model), ("Provider", provider)):
+        if not value:
+            continue
+        matches = list(re.finditer(rf"(?m)^{label}: .*$", sp))
+        if matches:
+            last = matches[-1]
+            sp = f"{sp[:last.start()]}{label}: {value}{sp[last.end():]}"
+    agent._cached_system_prompt = sp
+
+
 def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool:
    """Switch to the next fallback model/provider in the chain.

@ -1287,6 +1316,10 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
                api_mode=agent.api_mode,
            )

+        # Keep the prompt's self-identity in sync with the model actually
+        # answering, so "what model are you?" doesn't report the primary.
+        rewrite_prompt_model_identity(agent, fb_model, fb_provider)
+
        agent._buffer_status(
            f"🔄 Primary model failed — switching to fallback: "
            f"{fb_model} via {fb_provider}"
@ -1761,14 +1794,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
        _base_timeout = (
            _provider_timeout_cfg
            if _provider_timeout_cfg is not None
-            else float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            else env_float("HERMES_API_TIMEOUT", 1800.0)
        )
        # Read timeout: config wins here too.  Otherwise use
        # HERMES_STREAM_READ_TIMEOUT (default 120s) for cloud providers.
        if _provider_timeout_cfg is not None:
            _stream_read_timeout = _provider_timeout_cfg
        else:
-            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            _stream_read_timeout = env_float("HERMES_STREAM_READ_TIMEOUT", 120.0)
            # Local providers (Ollama, llama.cpp, vLLM) can take minutes for
            # prefill on large contexts before producing the first token.
            # Auto-increase the httpx read timeout unless the user explicitly
@ -2508,7 +2541,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
    if _cfg_stale is not None:
        _stream_stale_timeout_base = _cfg_stale
    else:
-        _stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
+        _stream_stale_timeout_base = env_float("HERMES_STREAM_STALE_TIMEOUT", 180.0)
    # Local providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
    # for prefill on large contexts.  Disable the stale detector unless
    # the user explicitly set HERMES_STREAM_STALE_TIMEOUT.
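`rewrite_prompt_model_identity` in `agent/chat_completion_helpers.py` rewrites only the last `Model:`/`Provider:` line, because earlier matches may be user content such as memory snapshots. A minimal standalone sketch of that logic follows. The function name `rewrite_identity` is hypothetical, and unlike the original it returns a new string instead of mutating `agent._cached_system_prompt`.

```python
import re


def rewrite_identity(prompt: str, model: str, provider: str) -> str:
    """Replace only the LAST ``Model:`` and ``Provider:`` lines in ``prompt``."""
    for label, value in (("Model", model), ("Provider", provider)):
        if not value:
            continue  # empty value: leave any existing line untouched
        # (?m) lets ^/$ match at line boundaries; .* stops at the newline.
        matches = list(re.finditer(rf"(?m)^{label}: .*$", prompt))
        if matches:
            last = matches[-1]
            # Splice by slicing rather than re.sub, so backslashes in the
            # model name are never read as replacement escapes.
            prompt = f"{prompt[:last.start()]}{label}: {value}{prompt[last.end():]}"
    return prompt


if __name__ == "__main__":
    prompt = (
        "Intro\n"
        "Model: user-note\n"   # earlier match: user content, must survive
        "ctx\n"
        "Model: primary\n"     # volatile identity tail
        "Provider: nous\n"
    )
    print(rewrite_identity(prompt, "fb-model", "openrouter"))
```

Because the stored session row keeps the primary's labels, restoring the primary yields a prompt byte-identical to the stored copy, which is why the original does not persist the rewrite.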
--- a/agent/codex_runtime.py
+++ b/agent/codex_runtime.py
@ -25,6 +25,61 @@ from typing import Any, Dict, List
 logger = logging.getLogger(__name__)


+def _codex_note_to_tool_progress(note: dict) -> tuple[str, str, dict] | None:
+    """Map a Codex app-server ``item/started`` notification to a Hermes
+    tool-progress event ``(tool_name, preview, args)``.
+
+    The Codex app-server runtime processes ``item/started`` notifications for
+    command execution, file changes, and MCP/dynamic tool calls, but never
+    surfaced them as Hermes tool-progress events — so gateways (Telegram, etc.)
+    showed no verbose "running X" breadcrumbs on this route while every other
+    provider did (#38835). Returns None for items that aren't tool-shaped.
+    """
+    if not isinstance(note, dict) or note.get("method") != "item/started":
+        return None
+    params = note.get("params") or {}
+    item = params.get("item") or {}
+    if not isinstance(item, dict):
+        return None
+
+    item_type = item.get("type") or ""
+    if item_type == "commandExecution":
+        command = item.get("command") or ""
+        return "exec_command", command, {"command": command, "cwd": item.get("cwd") or ""}
+
+    if item_type == "fileChange":
+        changes = item.get("changes") or []
+        preview = "file changes"
+        if isinstance(changes, list) and changes:
+            paths = [
+                str(change.get("path"))
+                for change in changes
+                if isinstance(change, dict) and change.get("path")
+            ]
+            if paths:
+                preview = ", ".join(paths[:3])
+                if len(paths) > 3:
+                    preview += f", +{len(paths) - 3} more"
+        return "apply_patch", preview, {"changes": changes}
+
+    if item_type == "mcpToolCall":
+        server = item.get("server") or "mcp"
+        tool = item.get("tool") or "unknown"
+        args = item.get("arguments") or {}
+        if not isinstance(args, dict):
+            args = {"arguments": args}
+        return f"mcp.{server}.{tool}", tool, args
+
+    if item_type == "dynamicToolCall":
+        tool = item.get("tool") or "unknown"
+        args = item.get("arguments") or {}
+        if not isinstance(args, dict):
+            args = {"arguments": args}
+        return tool, tool, args
+
+    return None
+
+
 def _coerce_usage_int(value: Any) -> int:
    if isinstance(value, bool):
        return 0
@ -195,7 +250,9 @@ def run_codex_app_server_turn(
    # Spawned on first turn, reused across turns, closed at AIAgent
    # shutdown (see _cleanup hook).
    if not hasattr(agent, "_codex_session") or agent._codex_session is None:
-        cwd = getattr(agent, "session_cwd", None) or os.getcwd()
+        from agent.runtime_cwd import resolve_agent_cwd
+
+        cwd = getattr(agent, "session_cwd", None) or str(resolve_agent_cwd())
        # Approval callback: defer to Hermes' standard prompt flow if a
        # CLI thread has installed one. Gateway / cron contexts get the
        # codex-side fail-closed default.
@ -204,9 +261,27 @@ def run_codex_app_server_turn(
            approval_callback = _get_approval_callback()
        except Exception:
            approval_callback = None
+
+        def _on_codex_event(note: dict) -> None:
+            # Bridge Codex app-server item/started notifications to Hermes
+            # tool-progress so gateways show verbose "running X" breadcrumbs
+            # on this route too (#38835).
+            progress_callback = getattr(agent, "tool_progress_callback", None)
+            if progress_callback is None:
+                return
+            mapped = _codex_note_to_tool_progress(note)
+            if mapped is None:
+                return
+            tool_name, preview, args = mapped
+            try:
+                progress_callback("tool.started", tool_name, preview, args)
+            except Exception:
+                logger.debug("codex tool-progress callback raised", exc_info=True)
+
        agent._codex_session = CodexAppServerSession(
            cwd=cwd,
            approval_callback=approval_callback,
+            on_event=_on_codex_event,
        )

    # NOTE: the user message is ALREADY appended to messages by the
@ -290,6 +365,7 @@ def run_codex_app_server_turn(
                original_user_message=original_user_message,
                final_response=turn.final_text,
                interrupted=False,
+                messages=messages,
            )
        except Exception:
            logger.debug("external memory sync raised", exc_info=True)
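The `_compute_threshold_tokens` change in the `agent/context_compressor.py` diff that follows is easiest to see with concrete numbers. The sketch below contrasts the old `max(pct, floor)` formula with the new guarded one. `MINIMUM_CONTEXT_LENGTH = 64_000` is an assumption inferred only from the docstring's worked example (`max(0.5*64000, 64000) == 64000`); the real constant lives in `agent.model_metadata` and may differ. The function names are hypothetical.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # ASSUMPTION: inferred from the docstring example
MIN_CTX_TRIGGER_RATIO = 0.85     # matches _MIN_CTX_TRIGGER_RATIO in the diff


def old_threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Pre-change formula: the floor can equal or exceed the whole window."""
    return max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)


def compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Guarded formula: fall back to 85% of the window when the floor is unreachable."""
    pct_value = int(context_length * threshold_percent)
    floored = max(pct_value, MINIMUM_CONTEXT_LENGTH)
    if context_length > 0 and floored >= context_length:
        # Keep the trigger strictly below the window so it can actually fire.
        return max(1, min(int(context_length * MIN_CTX_TRIGGER_RATIO), context_length - 1))
    return floored


if __name__ == "__main__":
    for window in (200_000, 100_000, 64_000):
        print(window, old_threshold_tokens(window, 0.5), compute_threshold_tokens(window, 0.5))
    # 200000 100000 100000  (percentage wins; unchanged)
    # 100000 64000 64000    (floor wins but is still below the window)
    # 64000 64000 54400     (old value equals the window; new fires at 85%)
```

Large-context models behave exactly as before; only windows at or below the floor change, which is the case #14690 describes.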
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@ -23,7 +23,7 @@ import re
 import time
 from typing import Any, Dict, List, Optional

-from agent.auxiliary_client import call_llm, _is_connection_error
+from agent.auxiliary_client import call_llm, _is_connection_error, aux_interrupt_protection
 from agent.context_engine import ContextEngine
 from agent.model_metadata import (
    MINIMUM_CONTEXT_LENGTH,
@ -656,9 +656,8 @@ class ContextCompressor(ContextEngine):
        self.provider = provider
        self.api_mode = api_mode
        self.context_length = context_length
-        self.threshold_tokens = max(
-            int(context_length * self.threshold_percent),
-            MINIMUM_CONTEXT_LENGTH,
+        self.threshold_tokens = self._compute_threshold_tokens(
+            context_length, self.threshold_percent
        )
        # Recalculate token budgets for the new context length so the
        # compressor stays calibrated after a model switch (e.g. 200K → 32K).
@ -668,6 +667,62 @@ class ContextCompressor(ContextEngine):
            int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
        )

+        # Reset cross-call calibration state captured under the PREVIOUS model.
+        # These fields encode "the provider proved this prompt fit" / "preflight
+        # can be deferred" decisions that are only valid for the model that
+        # produced them. Carrying them across a switch to a smaller-context
+        # model would let should_defer_preflight_to_real_usage() suppress a
+        # preflight compression the new model actually needs — the exact
+        # oversized-send-after-switch failure in #23767. The new model's first
+        # response repopulates them via update_from_response(). Setting
+        # last_prompt_tokens to 0 (NOT -1) is deliberate: 0 is the documented
+        # "no real usage yet -> use the rough estimate" state, so the post-
+        # response should_compress path falls back to estimate_request_tokens_rough
+        # rather than skipping compression. -1 is a different sentinel
+        # (#36718, "compression just ran, await real usage") and must not be set here.
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+        self.last_real_prompt_tokens = 0
+        self.last_rough_tokens_when_real_prompt_fit = 0
+        self.last_compression_rough_tokens = 0
+        self.awaiting_real_usage_after_compression = False
+        self._ineffective_compression_count = 0
+
+    # When the MINIMUM_CONTEXT_LENGTH floor meets/exceeds a small context
+    # window, compacting at the percentage (50% → 32K of a 64K window) wastes
+    # half the usable context. Trigger near the top of the window instead so a
+    # minimum-context model uses most of its budget before compacting — same
+    # rationale as the gpt-5.5/Codex 85% autoraise.
+    _MIN_CTX_TRIGGER_RATIO = 0.85
+
+    @staticmethod
+    def _compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
+        """Compute the compaction trigger threshold in tokens.
+
+        The base value is ``context_length * threshold_percent``, floored at
+        ``MINIMUM_CONTEXT_LENGTH`` so large-context models don't compress
+        prematurely at 50%. BUT that floor degenerates at small windows: for a
+        model whose ``context_length`` is at/below the minimum (e.g. a 64K
+        local model), ``max(0.5*64000, 64000) == 64000`` makes the threshold
+        equal the ENTIRE window — auto-compression can never fire because the
+        provider rejects the request before usage reaches 100% (#14690).
+
+        When the floor would meet or exceed the context window, trigger at
+        ``_MIN_CTX_TRIGGER_RATIO`` (85%) of the window — high enough that a
+        small model uses most of its context before compacting, but below
+        100% so compaction fires before the provider rejects the request.
+        """
+        pct_value = int(context_length * threshold_percent)
+        floored = max(pct_value, MINIMUM_CONTEXT_LENGTH)
+        # If flooring pushed the threshold to/over the window it can never be
+        # reached. Trigger at 85% of the window so a minimum-context model
+        # rides most of its budget before compacting instead of wasting half.
+        if context_length > 0 and floored >= context_length:
+            return max(1, min(int(context_length * ContextCompressor._MIN_CTX_TRIGGER_RATIO),
+                              context_length - 1))
+        return floored
+
    def __init__(
        self,
        model: str,
@ -708,10 +763,11 @@ class ContextCompressor(ContextEngine):
        # Floor: never compress below MINIMUM_CONTEXT_LENGTH tokens even if
        # the percentage would suggest a lower value.  This prevents premature
        # compression on large-context models at 50% while keeping the % sane
-        # for models right at the minimum.
-        self.threshold_tokens = max(
-            int(self.context_length * threshold_percent),
-            MINIMUM_CONTEXT_LENGTH,
+        # for models right at the minimum. _compute_threshold_tokens also
+        # guards the degenerate case where the floor would equal/exceed the
+        # window (small models), so auto-compression can still fire (#14690).
+        self.threshold_tokens = self._compute_threshold_tokens(
+            self.context_length, threshold_percent
        )
        self.compression_count = 0

@ -761,6 +817,14 @@ class ContextCompressor(ContextEngine):
        # this flag to know "compression was attempted but aborted, freeze
        # the chat until the user manually retries via /compress".
        self._last_compress_aborted: bool = False
+        # Set True when the summary call failed with an authentication /
+        # permission error (HTTP 401/403). Auth failures are non-recoverable
+        # at the request level — the credential or endpoint is broken — so
+        # compress() must ABORT (preserve the session unchanged) rather than
+        # rotate into a degraded child session with a placeholder summary.
+        # This is independent of the abort_on_summary_failure config flag:
+        # rotating on a broken credential is never the right behavior.
+        self._last_summary_auth_failure: bool = False
        # When a user-configured summary model fails and we recover by
        # retrying on the main model, record the failure so gateway /
        # CLI callers can still warn the user even though compression
@ -1245,7 +1309,10 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
 Unknown from deterministic fallback. Inspect current repository/session state if needed.

 {HISTORICAL_IN_PROGRESS_HEADING}
-{active_task}
+Unknown from deterministic fallback — the latest user ask is recorded once under
+"{HISTORICAL_TASK_HEADING}" above as historical context only. Do NOT treat it as an
+unfulfilled instruction to re-answer; verify current state and continue from the
+protected recent messages after this summary.

 ## Blocked
 {_bullets(blockers, limit=5)}
@ -1257,7 +1324,9 @@ None recoverable from deterministic fallback.
 None recoverable from deterministic fallback.

 {HISTORICAL_PENDING_ASKS_HEADING}
-{active_task}
+None recoverable from deterministic fallback. (The latest user ask is preserved once
+under "{HISTORICAL_TASK_HEADING}" as historical context — it is NOT necessarily
+outstanding.)

 ## Relevant Files
 {_bullets(relevant_files, limit=12)}
@ -1511,11 +1580,33 @@ This compaction should PRIORITISE preserving all information related to the focu
            }
            if self.summary_model:
                call_kwargs["model"] = self.summary_model
-            response = call_llm(**call_kwargs)
+            # Compression is atomic: protect the in-flight summary call from a
+            # mid-turn gateway interrupt. Without this, an incoming user message
+            # aborts the summary and compression falls back to a degraded static
+            # marker, losing the real handoff (#23975). Re-entrant: a main-model
+            # retry (_generate_summary recursion) re-enters harmlessly.
+            with aux_interrupt_protection():
+                response = call_llm(**call_kwargs)
            content = response.choices[0].message.content
            # Handle cases where content is not a string (e.g., dict from llama.cpp)
            if not isinstance(content, str):
                content = str(content) if content else ""
+            # Some OpenAI-compatible proxies (e.g. cmkey.cn, one-api channels)
+            # return a well-formed HTTP 200 with an empty or whitespace-only
+            # ``content`` instead of an error or empty ``choices``. That payload
+            # passes ``_validate_llm_response`` (a ``message`` exists), so it
+            # reaches here and would otherwise be stored as a prefix-only
+            # summary with no body — silently wiping the compacted turns and
+            # making the model forget the in-progress task (#11978, #11914).
+            # Treat empty content as a failure so it routes through the same
+            # main-model fallback + cooldown machinery as a transport error,
+            # rather than replacing real context with an empty summary.
+            if not content.strip():
+                raise RuntimeError(
+                    "Context compression LLM returned empty content "
+                    f"(provider={self.provider or 'auto'} "
+                    f"model={self.summary_model or self.model})"
+                )
            # Redact the summary output as well — the summarizer LLM may
            # ignore prompt instructions and echo back secrets verbatim.
            summary = redact_sensitive_text(content.strip())
@ -1524,17 +1615,29 @@ This compaction should PRIORITISE preserving all information related to the focu
            self._summary_failure_cooldown_until = 0.0
            self._summary_model_fallen_back = False
            self._last_summary_error = None
+            self._last_summary_auth_failure = False
            return self._with_summary_prefix(summary)
-        except RuntimeError:
-            # No provider configured — long cooldown, unlikely to self-resolve
-            self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
-            self._last_summary_error = "no auxiliary LLM provider configured"
-            logger.warning("Context compression: no provider available for "
-                            "summary. Middle turns will be dropped without summary "
-                            "for %d seconds.",
-                            _SUMMARY_FAILURE_COOLDOWN_SECONDS)
-            return None
        except Exception as e:
+            # ``call_llm`` raises ``RuntimeError`` for two very different cases:
+            #   1. No provider configured ("No LLM provider configured ...") —
+            #      a permanent misconfiguration, long cooldown is correct.
+            #   2. An empty/invalid response from a configured provider
+            #      (``_validate_llm_response`` empty-``choices``/``None``, or our
+            #      empty-``content`` guard above) — a transient/proxy fault that
+            #      should fall back to the main model first, exactly like the
+            #      transport errors handled below.
+            # Only (1) belongs in the long no-provider cooldown; (2) and every
+            # other exception flow into the generic fallback logic so they get
+            # a main-model retry before any cooldown. (#11978, #11914)
+            if isinstance(e, RuntimeError) and "no llm provider configured" in str(e).lower():
+                # No provider configured — long cooldown, unlikely to self-resolve
+                self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
+                self._last_summary_error = "no auxiliary LLM provider configured"
+                logger.warning("Context compression: no provider available for "
+                                "summary. Middle turns will be dropped without summary "
+                                "for %d seconds.",
+                                _SUMMARY_FAILURE_COOLDOWN_SECONDS)
+                return None
            # If the summary model is different from the main model and the
            # error looks permanent (model not found, 503, 404), fall back to
            # using the main model instead of entering cooldown that leaves
@ -1571,6 +1674,26 @@ This compaction should PRIORITISE preserving all information related to the focu
            # back to the main model instead of entering a 60-second cooldown.
            # See issue #18458.
            _is_streaming_closed = _is_connection_error(e)
+            # Authentication / permission failures (401/403) are NOT transient
+            # and NOT fixable by retrying the same request: the credential is
+            # invalid/blocked/expired or the endpoint is wrong (e.g. a prod
+            # token sent to a staging inference URL). Flag them so compress()
+            # aborts and preserves the session instead of rotating into a
+            # degraded child with a placeholder summary. We still allow the
+            # one-shot fallback to the MAIN model below when the failure came
+            # from a distinct auxiliary summary_model (its dedicated creds may
+            # be the only broken thing); only a failure on the main model — or
+            # a fallback that also auth-fails — makes the abort stick.
+            _is_auth_error = (
+                _status in {401, 403}
+                or "invalid api key" in _err_str
+                or "invalid x-api-key" in _err_str
+                or ("api key" in _err_str and ("invalid" in _err_str or "blocked" in _err_str))
+                or "unauthorized" in _err_str
+                or "authentication" in _err_str
+            )
+            if _is_auth_error:
+                self._last_summary_auth_failure = True
            if _is_json_decode and not _is_model_not_found and not _is_timeout:
                logger.error(
                    "Context compression failed: auxiliary LLM returned a "
@ -1809,6 +1932,23 @@ This compaction should PRIORITISE preserving all information related to the focu
            idx += 1
        return idx

+    def _effective_protect_first_n(self) -> int:
+        """``protect_first_n`` decayed across compression cycles.
+
+        ``protect_first_n`` keeps the first N non-system messages verbatim so
+        the original task framing survives the FIRST compaction. But applying
+        it on every subsequent pass fossilizes those early turns — they're
+        re-copied into each child session and never summarized away, so old
+        user messages become immortal and grow the head unboundedly across a
+        long session (#11996). Once the session has been compressed at least
+        once, the early turns are already captured in the handoff summary, so
+        there's no need to keep re-protecting them: decay to 0 (the system
+        prompt is still always protected separately by _protect_head_size).
+        """
+        if self.compression_count >= 1 or self._previous_summary:
+            return 0
+        return self.protect_first_n
+
    def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
        """Total count of head messages to protect.

@ -1820,14 +1960,19 @@ This compaction should PRIORITISE preserving all information related to the focu
        the ``messages`` list (e.g. the gateway ``/compress`` handler
        strips it before calling compress()).

-        Examples:
+        The ``protect_first_n`` portion DECAYS after the first compression
+        (see _effective_protect_first_n) so early user turns don't fossilize
+        across repeated compactions (#11996).
+
+        Examples (first compaction):
          protect_first_n=0 → system prompt only (or nothing if no system msg)
          protect_first_n=3 → system + first 3 non-system messages
+        After the first compaction: system prompt only.
        """
        head = 0
        if messages and messages[0].get("role") == "system":
            head = 1
-        return head + self.protect_first_n
+        return head + self._effective_protect_first_n()

    def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
        """Pull a compress-end boundary backward to avoid splitting a
@ -2178,6 +2323,7 @@ This compaction should PRIORITISE preserving all information related to the focu
        self._last_aux_model_failure_error = None
        self._last_aux_model_failure_model = None
        self._last_compress_aborted = False
+        self._last_summary_auth_failure = False

        # Manual /compress (force=True) bypasses the failure cooldown so the
        # user can retry immediately after an auto-compress abort.  Without
@ -2293,19 +2439,38 @@ This compaction should PRIORITISE preserving all information related to the focu
        #           _last_summary_dropped_count for gateway hygiene to
        #           surface a warning.
        # Default is False (historical behavior).
-        if not summary and self.abort_on_summary_failure:
+        #
+        # EXCEPTION — auth failures always abort. A 401/403 from the summary
+        # call means the credential or endpoint is broken (invalid/blocked
+        # key, or a token pointed at the wrong inference host). Rotating into
+        # a child session with a placeholder summary on a broken credential
+        # strands the user on a degraded session for zero benefit — every
+        # subsequent call fails the same way. So when the failure was an auth
+        # error we abort regardless of abort_on_summary_failure, preserving
+        # the conversation unchanged until the credential is fixed.
+        if not summary and (self.abort_on_summary_failure or self._last_summary_auth_failure):
            n_skipped = compress_end - compress_start
            self._last_summary_dropped_count = 0  # nothing actually dropped
            self._last_summary_fallback_used = False
            self._last_compress_aborted = True
            if not self.quiet_mode:
-                logger.warning(
-                    "Summary generation failed — aborting compression "
-                    "(compression.abort_on_summary_failure=true). "
-                    "%d message(s) preserved unchanged. Conversation is "
-                    "frozen until the next /compress or /new.",
-                    n_skipped,
-                )
+                if self._last_summary_auth_failure:
+                    logger.warning(
+                        "Summary generation failed with an authentication "
+                        "error — aborting compression. %d message(s) preserved "
+                        "unchanged; the session was NOT rotated. Check your "
+                        "provider credential / inference endpoint, then retry "
+                        "with /compress or start fresh with /new.",
+                        n_skipped,
+                    )
+                else:
+                    logger.warning(
+                        "Summary generation failed — aborting compression "
+                        "(compression.abort_on_summary_failure=true). "
+                        "%d message(s) preserved unchanged. Conversation is "
+                        "frozen until the next /compress or /new.",
+                        n_skipped,
+                    )
            return messages

        # Phase 4: Assemble compressed message list
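The two compressor changes above (head-protection decay and the auth-failure abort gate) reduce to a few lines of state logic. The sketch below restates them standalone so the behaviour can be checked in isolation. `HeadProtectionSketch` and `should_abort` are hypothetical names; they mirror `_effective_protect_first_n`, `_protect_head_size`, and the `if not summary and (...)` condition from the diff but are not the real `ContextCompressor` class.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class HeadProtectionSketch:
    """Hypothetical stand-in for the compressor's head-protection state."""
    protect_first_n: int = 3
    compression_count: int = 0
    previous_summary: Optional[str] = None

    def effective_protect_first_n(self) -> int:
        # Decay to 0 once any compaction has happened or a summary exists,
        # so early user turns are not re-copied into every child session.
        if self.compression_count >= 1 or self.previous_summary:
            return 0
        return self.protect_first_n

    def protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
        # The system prompt (if present) is always protected on its own.
        head = 1 if messages and messages[0].get("role") == "system" else 0
        return head + self.effective_protect_first_n()


def should_abort(summary: Optional[str], abort_on_summary_failure: bool,
                 last_summary_auth_failure: bool) -> bool:
    """Abort when no summary was produced and either the config asks for it
    or the summary call failed with an auth (401/403) error."""
    return not summary and (abort_on_summary_failure or last_summary_auth_failure)


if __name__ == "__main__":
    msgs = [{"role": "system", "content": "sys"}] + [
        {"role": "user", "content": str(i)} for i in range(5)
    ]
    hp = HeadProtectionSketch(protect_first_n=3)
    print(hp.protect_head_size(msgs))  # 4: system + first 3 turns
    hp.compression_count = 1
    print(hp.protect_head_size(msgs))  # 1: system prompt only
```

Note that the auth gate wins even when `abort_on_summary_failure` is False: a broken credential never triggers a rotation into a placeholder-summary child.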
--- a/agent/conversation_compression.py
+++ b/agent/conversation_compression.py
@ -328,6 +328,16 @@ def compress_context(
        agent._compression_feasibility_checked = True

    _pre_msg_count = len(messages)
+    # In-place compaction (config: compression.in_place, see #38763). When True,
+    # this compaction rewrites the message list + rebuilds the system prompt but
+    # keeps the SAME session_id — no end_session, no parent_session_id child, no
+    # `name #N` renumber, no contextvar/env/logging re-sync, no memory/context-
+    # engine session-switch. The conversation keeps one durable id for life,
+    # sidestepping the bugs tied to rotation (orphaned child rows, lost /goal,
+    # stale log ids). Default False during rollout.
+    in_place = bool(getattr(agent, "compression_in_place", False))
+    # Set True once the in-place DB write actually completes (the DB block can
+    # raise and skip it). Surfaced to the gateway via agent._last_compaction_in_place.
+    compacted_in_place = False
    logger.info(
        "context compression started: session=%s messages=%d tokens=~%s model=%s focus=%r",
        agent.session_id or "none", _pre_msg_count,
@ -508,125 +518,244 @@ def compress_context(

    if agent._session_db:
        try:
-            # Propagate title to the new session with auto-numbering
-            old_title = agent._session_db.get_session_title(agent.session_id)
-            # Trigger memory extraction on the old session before it rotates.
+            # Trigger memory extraction on the current session before the
+            # transcript is rewritten (runs in BOTH modes — the logical
+            # conversation's pre-compaction turns are about to be summarized
+            # away regardless of whether the id rotates).
            agent.commit_memory_session(messages)
-            # Flush any un-persisted messages from the current turn to the
-            # old session *before* rotating.  compress_context() can be
-            # called mid-turn (auto-compress when context exceeds threshold)
-            # at a point when _flush_messages_to_session_db() has not yet
-            # run.  Without this, messages generated during the current turn
-            # are silently lost on session rotation (#47202).
-            try:
-                agent._flush_messages_to_session_db(messages)
-            except Exception:
-                pass  # best-effort — don't block compression on a flush error
-            agent._session_db.end_session(agent.session_id, "compression")
-            old_session_id = agent.session_id
-            agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
-            # Ordering contract: the agent thread updates the contextvar here;
-            # the gateway propagates to SessionEntry after run_in_executor returns.
-            try:
-                from gateway.session_context import set_current_session_id

-                set_current_session_id(agent.session_id)
-            except Exception:
-                os.environ["HERMES_SESSION_ID"] = agent.session_id
-            # The gateway/tools session context (ContextVar + env) and the
-            # logging session context are SEPARATE mechanisms. The call above
-            # moves the former; the ``[session_id]`` tag on log lines comes
-            # from ``hermes_logging._session_context`` (set once per turn in
-            # conversation_loop.py). Without this, post-rotation log lines in
-            # the same turn keep the STALE old id while the message/DB/gateway
-            # state carry the new one — breaking log correlation exactly at the
-            # compaction boundary (see #34089). Guarded separately so a logging
-            # failure can never regress the routing update above.
-            try:
-                from hermes_logging import set_session_context
-
-                set_session_context(agent.session_id)
-            except Exception:
-                pass
-            agent._session_db_created = False
-            agent._session_db.create_session(
-                session_id=agent.session_id,
-                source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
-                model=agent.model,
-                model_config=agent._session_init_model_config,
-                parent_session_id=old_session_id,
-            )
-            agent._session_db_created = True
-            # Auto-number the title for the continuation session
-            if old_title:
+            if in_place:
+                # ── In-place compaction: keep the same session_id ──────────
+                # No end_session, no new row, no parent_session_id, no title
+                # renumber, no contextvar/env/logging re-sync. The session's
+                # id, title, cwd, /goal, and gateway routing all stay put.
+                #
+                # Durable, NON-DESTRUCTIVE replace: soft-archive the
+                # pre-compaction turns (active=0, kept on disk + FTS-searchable +
+                # recoverable) and insert `compressed` as the new live (active=1)
+                # set, atomically. `compressed` already carries the surviving
+                # tail (current-turn messages the compressor kept via
+                # protect_last_n), so we DON'T pre-flush here — a flush would
+                # INSERT current-turn rows that archive_and_compact would then
+                # archive alongside the rest (harmless but wasted writes). The
+                # live-context load filters active=1, so a resume reloads ONLY
+                # the compacted set; the original turns remain under the SAME id
+                # for search/recovery (Teknium review — keep one durable id
+                # WITHOUT destroying history, unlike a hard replace_messages).
+                # See #38763.
+                agent._session_db.archive_and_compact(agent.session_id, compressed)
+                # Reset the flush identity set so the next turn's appends are
+                # diffed against the COMPACTED transcript: the compacted dicts
+                # are passed as conversation_history next turn and skipped by
+                # identity, so only genuinely new turn messages get appended
+                # (no dup of the summary, no resurrection of dropped turns).
+                agent._flushed_db_message_ids = set()
+                # Rotation-independent signal: the conversation was compacted in
+                # place (id unchanged). The gateway reads this (NOT an id-change
+                # diff) to re-baseline transcript handling.
+                compacted_in_place = True
+            else:
+                # ── Rotation (legacy): end this session, fork a continuation ─
+                # Flush any un-persisted current-turn messages to the OLD
+                # session before ending it, so they survive in the preserved
+                # parent transcript (#47202). (In-place skips this — see above.)
                try:
-                    new_title = agent._session_db.get_next_title_in_lineage(old_title)
-                    agent._session_db.set_session_title(agent.session_id, new_title)
-                except (ValueError, Exception) as e:
-                    logger.debug("Could not propagate title on compression: %s", e)
+                    agent._flush_messages_to_session_db(messages)
+                except Exception:
+                    pass  # best-effort — don't block compression on a flush error
+                # Propagate title to the new session with auto-numbering
+                old_title = agent._session_db.get_session_title(agent.session_id)
+                agent._session_db.end_session(agent.session_id, "compression")
+                old_session_id = agent.session_id
+                agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+                # Ordering contract: the agent thread updates the contextvar here;
+                # the gateway propagates to SessionEntry after run_in_executor returns.
+                try:
+                    from gateway.session_context import set_current_session_id
+
+                    set_current_session_id(agent.session_id)
+                except Exception:
+                    os.environ["HERMES_SESSION_ID"] = agent.session_id
+                # The gateway/tools session context (ContextVar + env) and the
+                # logging session context are SEPARATE mechanisms. The call above
+                # moves the former; the ``[session_id]`` tag on log lines comes
+                # from ``hermes_logging._session_context`` (set once per turn in
+                # conversation_loop.py). Without this, post-rotation log lines in
+                # the same turn keep the STALE old id while the message/DB/gateway
+                # state carry the new one — breaking log correlation exactly at the
+                # compaction boundary (see #34089). Guarded separately so a logging
+                # failure can never regress the routing update above.
+                try:
+                    from hermes_logging import set_session_context
+
+                    set_session_context(agent.session_id)
+                except Exception:
+                    pass
+                agent._session_db_created = False
+                try:
+                    agent._session_db.create_session(
+                        session_id=agent.session_id,
+                        source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
+                        model=agent.model,
+                        model_config=agent._session_init_model_config,
+                        parent_session_id=old_session_id,
+                    )
+                except Exception as _cs_err:
+                    # The child row could not be created (e.g. FK constraint,
+                    # contended write). Previously the outer handler simply
+                    # warned and let the agent continue on the NEW id — which
+                    # has no row in state.db, producing an orphan: the parent
+                    # is ended, the child is never indexed, and every
+                    # subsequent message is attributed to a session that
+                    # doesn't exist (#33906/#33907). Roll the live id back to
+                    # the parent so the conversation stays attached to a real,
+                    # indexed session instead of a phantom.
+                    logger.warning(
+                        "Compression child session create failed (%s) — "
+                        "rolling back to parent session %s to avoid an orphan.",
+                        _cs_err, old_session_id,
+                    )
+                    agent.session_id = old_session_id
+                    try:
+                        from gateway.session_context import set_current_session_id
+                        set_current_session_id(agent.session_id)
+                    except Exception:
+                        os.environ["HERMES_SESSION_ID"] = agent.session_id
+                    try:
+                        from hermes_logging import set_session_context
+                        set_session_context(agent.session_id)
+                    except Exception:
+                        pass
+                    # Re-open the parent: it was ended above, but we're
+                    # continuing on it, so it must not stay closed.
+                    try:
+                        agent._session_db.reopen_session(old_session_id)
+                    except Exception:
+                        pass
+                    old_session_id = None  # no rotation happened
+                    # The parent row already exists in state.db, so mark the
+                    # session as created — _ensure_db_session would otherwise
+                    # retry a (harmless INSERT OR IGNORE) create next turn.
+                    agent._session_db_created = True
+                    raise
+                agent._session_db_created = True
+                # Carry a persistent /goal onto the continuation session.
+                # Compression mints a fresh child id; load_goal does a flat
+                # per-session lookup with no parent walk, so without this an
+                # active goal silently dies at the boundary (#33618).
+                try:
+                    from hermes_cli.goals import migrate_goal_to_session
+                    migrate_goal_to_session(old_session_id, agent.session_id, reason="compression")
+                except Exception as _goal_err:
+                    logger.debug("Could not migrate goal on compression: %s", _goal_err)
+                # Auto-number the title for the continuation session
+                if old_title:
+                    try:
+                        new_title = agent._session_db.get_next_title_in_lineage(old_title)
+                        agent._session_db.set_session_title(agent.session_id, new_title)
+                    except Exception as e:
+                        logger.debug("Could not propagate title on compression: %s", e)
+
+            # Shared post-write steps (both modes target agent.session_id, which
+            # in-place keeps and rotation has already reassigned to the new id):
+            # refresh the stored system prompt and reset the flush cursor so the
+            # next turn re-bases its append diff.
            agent._session_db.update_system_prompt(agent.session_id, new_system_prompt)
-            # Reset flush cursor — new session starts with no messages written
            agent._last_flushed_db_idx = 0
        except Exception as e:
-            logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
+            # If the rotation rolled back to the parent (orphan-avoidance
+            # above), agent.session_id is the still-indexed parent and
+            # old_session_id was cleared — so this is recovery, not an
+            # un-indexed orphan. Otherwise an earlier step failed before the
+            # child was created and the warning's original meaning holds.
+            if (
+                "old_session_id" in locals()
+                and locals()["old_session_id"] is None
+                and not in_place
+            ):
+                logger.warning(
+                    "Compression rotation aborted and rolled back to the "
+                    "parent session (%s): %s", agent.session_id or "?", e,
+                )
+            else:
+                logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)

-    # Notify the context engine that the session_id rotated because of
-    # compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
-    # boundary_reason="compression" to preserve DAG lineage across the
-    # rollover instead of re-initializing fresh per-session state.
-    # See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
+    # Compaction-boundary bookkeeping, computed once. `old_session_id` is only
+    # bound in the rotation branch; in-place leaves it unset. `_boundary_parent`
+    # is the id the boundary notifications attribute the prior state to: the old
+    # id on rotation, the (unchanged) current id in-place.
+    _old_sid = locals().get("old_session_id")
+    _is_boundary = bool(_old_sid) or in_place
+    _boundary_parent = _old_sid or agent.session_id or ""
+
+    # Notify the context engine that a compaction boundary occurred. Plugin
+    # engines (e.g. hermes-lcm) use boundary_reason="compression" to preserve
+    # DAG lineage / checkpoint per-session state across the boundary instead of
+    # re-initializing fresh. See hermes-lcm#68. Built-in ContextCompressor
+    # ignores kwargs. Fires in BOTH modes: rotation passes old→new ids; in-place
+    # passes the SAME id (the boundary is real even though the id didn't move).
    try:
-        _old_sid = locals().get("old_session_id")
-        if _old_sid and hasattr(agent.context_compressor, "on_session_start"):
+        if _is_boundary and hasattr(agent.context_compressor, "on_session_start"):
            agent.context_compressor.on_session_start(
                agent.session_id or "",
                boundary_reason="compression",
-                old_session_id=_old_sid,
+                old_session_id=_boundary_parent,
+                platform=getattr(agent, "platform", None) or "cli",
                conversation_id=getattr(agent, "_gateway_session_key", None),
            )
    except Exception as _ce_err:
        logger.debug("context engine on_session_start (compression): %s", _ce_err)

-    # Notify memory providers of the compression-driven session_id rotation
-    # so provider-cached per-session state (Hindsight's _document_id,
-    # accumulated turn buffers, counters) refreshes. reset=False because
-    # the logical conversation continues; only the id and DB row rolled
-    # over. See #6672.
+    # Notify memory providers of the compaction boundary so provider-cached
+    # per-session state (Hindsight's _document_id, accumulated turn buffers,
+    # counters) refreshes. reset=False because the logical conversation
+    # continues. See #6672. Fires in BOTH modes: in-place uses the same id as
+    # parent (the conversation didn't fork, but the buffer must still be told
+    # the transcript was compacted so it doesn't double-count dropped turns).
    try:
-        _old_sid = locals().get("old_session_id")
-        if _old_sid and agent._memory_manager:
+        if _is_boundary and agent._memory_manager:
            agent._memory_manager.on_session_switch(
                agent.session_id or "",
-                parent_session_id=_old_sid,
+                parent_session_id=_boundary_parent,
                reset=False,
                reason="compression",
            )
    except Exception as _me_err:
        logger.debug("memory manager on_session_switch (compression): %s", _me_err)

-    # Warn on repeated compressions (quality degrades with each pass)
+    # Warn on repeated compressions (quality degrades with each pass).
+    # Route through _emit_status (like the other compression warnings above)
+    # so the warning reaches the TUI / Telegram / Discord via status_callback,
+    # not just CLI stdout. _emit_status still _vprints for the CLI, and
+    # storing it on _compression_warning lets replay_compression_warning
+    # re-deliver it once a late-bound gateway status_callback is wired (#36908).
    _cc = agent.context_compressor.compression_count
    if _cc >= 2:
-        agent._vprint(
+        _cc_msg = (
            f"{agent.log_prefix}⚠️  Session compressed {_cc} times — "
-            f"accuracy may degrade. Consider /new to start fresh.",
-            force=True,
+            f"accuracy may degrade. Consider /new to start fresh."
        )
+        agent._compression_warning = _cc_msg
+        agent._emit_status(_cc_msg)

    # Emit session:compress event so hooks (e.g. MemPalace sync) can ingest
-    # the completed old session before its details are lost.
-    _old_sid_for_event = locals().get("old_session_id")
+    # the completed old session before its details are lost. In in-place mode
+    # there is no old id (same session); ``in_place=True`` tells hooks the
+    # transcript was compacted on the same id rather than rotated.
    if getattr(agent, "event_callback", None):
        try:
            agent.event_callback("session:compress", {
                "platform": agent.platform or "",
                "session_id": agent.session_id,
-                "old_session_id": _old_sid_for_event or "",
+                "old_session_id": _old_sid or "",
+                "in_place": in_place,
                "compression_count": agent.context_compressor.compression_count,
            })
        except Exception as e:
            logger.debug("event_callback error on session:compress: %s", e)

+    # Surface the compaction mode to the caller (run_conversation / gateway)
+    # via a rotation-independent flag. The gateway uses this — NOT an
+    # id-change diff — to re-baseline transcript handling (history_offset=0 +
+    # rewrite on the same id) when compaction happened in place. See #38763.
+    agent._last_compaction_in_place = compacted_in_place
+
    # Keep the post-compression rough estimate for diagnostics, but do not
    # treat it as provider-reported prompt usage. Schema-heavy rough estimates
    # can remain above threshold even after the next real API request fits.
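The compaction-boundary bookkeeping in `compress_context` is effectively a small truth table over `old_session_id` and `in_place`. The sketch below isolates it, together with the `session:compress` payload shape, so each case (rotation, in-place, rolled-back rotation) can be checked directly. `compaction_boundary`, `compress_event_payload`, and `describe_compaction` are hypothetical helpers; the last one shows how a hook *might* tell the two modes apart and is not part of Hermes.

```python
from typing import Any, Dict, Optional, Tuple


def compaction_boundary(
    old_session_id: Optional[str],
    current_session_id: Optional[str],
    in_place: bool,
) -> Tuple[bool, str]:
    """Return (is_boundary, boundary_parent) the way the diff computes them."""
    # A boundary exists if the id rotated OR the transcript was compacted in place.
    is_boundary = bool(old_session_id) or in_place
    # Rotation attributes prior state to the old id; in-place to the same id.
    boundary_parent = old_session_id or current_session_id or ""
    return is_boundary, boundary_parent


def compress_event_payload(
    platform: Optional[str],
    session_id: str,
    old_session_id: Optional[str],
    in_place: bool,
    compression_count: int,
) -> Dict[str, Any]:
    """Mirror of the session:compress payload fields in the diff."""
    return {
        "platform": platform or "",
        "session_id": session_id,
        "old_session_id": old_session_id or "",
        "in_place": in_place,
        "compression_count": compression_count,
    }


def describe_compaction(payload: Dict[str, Any]) -> str:
    """Hypothetical hook consumer distinguishing the two compaction modes."""
    if payload["in_place"]:
        return f"compacted {payload['session_id']} in place"
    if payload["old_session_id"]:
        return f"rotated {payload['old_session_id']} -> {payload['session_id']}"
    return "no session boundary"
```

A rotation that rolled back to its parent clears `old_session_id`, so it yields `(False, parent)` and neither the context engine nor the memory manager is notified.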
@ -712,33 +841,58 @@ def try_shrink_image_parts_in_messages(
    # actually brought under the target.
    unshrinkable_oversized = 0

-    def _shrink_data_url(url: str) -> Optional[str]:
-        """Return a smaller data URL, or None if shrink can't help."""
-        if not isinstance(url, str) or not url.startswith("data:"):
+    def _decode_pixels(data_url: str) -> Optional[tuple]:
+        """Return ``(width, height)`` of a base64 data URL, or None on failure.
+
+        Soft-depends on Pillow; returns None (caller falls back to a
+        bytes-only check) if Pillow is missing or the payload is corrupt.
+        """
+        try:
+            import base64 as _b64_dim
+            import io as _io_dim
+            header_d, _, data_d = data_url.partition(",")
+            if not data_d or not data_url.startswith("data:"):
+                return None
+            from PIL import Image as _PILImage
+            with _PILImage.open(_io_dim.BytesIO(_b64_dim.b64decode(data_d))) as _img:
+                return _img.size
+        except Exception:
            return None

-        # Check both byte size AND pixel dimensions.
+    def _shrink_data_url(url: str) -> tuple:
+        """Return ``(resized_url, unshrinkable)`` for a data URL.
+
+        ``resized_url`` is a smaller/dimension-correct data URL, or None when
+        no rewrite was applied.  ``unshrinkable`` is True only when the image
+        exceeded a constraint (byte-size or dimensions) and the resize failed
+        to satisfy *that same* constraint — so the caller knows retrying is
+        pointless even if a different image in the request shrank.
+        """
+        if not isinstance(url, str) or not url.startswith("data:"):
+            return None, False
+
+        # Determine which constraint is binding.  The accept/reject gate below
+        # MUST be checked against the same axis that triggered the shrink: a
+        # downscaled screenshot PNG routinely re-encodes to *more* bytes than
+        # the original (PNG compression is non-monotonic in image size — a
+        # smaller raster with LANCZOS resampling noise compresses worse than a
+        # larger smooth one).  Rejecting a pixel-correct downscale purely
+        # because its bytes grew permanently wedges sessions on the Anthropic
+        # many-image 2000px path (#48013).
        needs_shrink = len(url) > target_bytes  # over byte budget
+        triggered_by = "bytes" if needs_shrink else None
        if not needs_shrink:
-            # Even if bytes are fine, check pixel dimensions against the
-            # provider's reported per-side cap.  A screenshot can be tiny in
-            # bytes yet too large in pixels.
-            try:
-                import base64 as _b64_dim
-                header_d, _, data_d = url.partition(",")
-                if not data_d:
-                    return None
-                raw_d = _b64_dim.b64decode(data_d)
-                from PIL import Image as _PILImage
-                import io as _io_dim
-                with _PILImage.open(_io_dim.BytesIO(raw_d)) as _img:
-                    if max(_img.size) <= max_dimension:
-                        return None  # both bytes and pixels are fine
-                needs_shrink = True  # pixels exceed limit, force shrink
-            except Exception:
-                # If we can't check dimensions (Pillow unavailable, corrupt
-                # image, etc.), fall back to byte-only check.
-                return None
+            # Bytes are fine — check pixel dimensions against the provider's
+            # reported per-side cap.  A screenshot can be tiny in bytes yet
+            # too large in pixels.
+            dims = _decode_pixels(url)
+            if dims is None:
+                # Pillow missing or corrupt data — fall back to byte-only.
+                return None, False
+            if max(dims) <= max_dimension:
+                return None, False  # both bytes and pixels are within limits
+            needs_shrink = True
+            triggered_by = "dimension"

        try:
            header, _, data = url.partition(",")
@ -770,13 +924,45 @@ def try_shrink_image_parts_in_messages(
                    Path(tmp.name).unlink(missing_ok=True)
                except Exception:
                    pass
-            if not resized or len(resized) >= len(url):
-                # Shrink didn't help (or made it bigger — corrupt input?).
-                return None
-            return resized
+            if not resized:
+                # Resize returned nothing — Pillow couldn't help.
+                return None, True
+            if triggered_by == "bytes":
+                # Byte budget is the binding constraint — bytes must shrink.
+                if len(resized) >= len(url):
+                    return None, True  # re-encode made it bigger
+                # The per-side dimension cap is ALSO an active provider
+                # constraint on this request (the caller passes the parsed cap
+                # to both this helper and the resizer).  _resize_image_for_vision
+                # returns a best-effort, possibly-over-cap blob when it
+                # exhausts its halving budget — it freezes the long side once
+                # the short side hits its 64px floor, so a very-high-aspect
+                # image can stay over the cap even after bytes shrank.  If the
+                # output is still over the cap, retrying would re-400 on
+                # dimensions; treat it as unshrinkable.  (Skip when dims can't
+                # be decoded — preserves historical byte-only behaviour.)
+                new_dims = _decode_pixels(resized)
+                if new_dims is not None and max(new_dims) > max_dimension:
+                    return None, True
+                return resized, False
+            # triggered_by == "dimension": the per-side cap is binding.  The
+            # re-encode may have grown in bytes; accept it as long as it is now
+            # within the dimension cap.  Verify the new dimensions when we can.
+            new_dims = _decode_pixels(resized)
+            if new_dims is not None:
+                if max(new_dims) <= max_dimension:
+                    return resized, False
+                # Still over the per-side cap — the resize didn't satisfy it.
+                return None, True
+            # Couldn't verify the re-encode's dimensions (corrupt output or
+            # Pillow gone mid-call).  Fall back to the historical "bytes must
+            # shrink" gate so we never accept an unverifiable, byte-larger blob.
+            if len(resized) >= len(url):
+                return None, True
+            return resized, False
        except Exception as exc:
            logger.warning("image-shrink recovery: re-encode failed — %s", exc)
-            return None
+            return None, triggered_by is not None

    for msg in api_messages:
        if not isinstance(msg, dict):
@ -795,20 +981,18 @@ def try_shrink_image_parts_in_messages(
            # OpenAI Responses: {"image_url": "data:..."}
            if isinstance(image_value, dict):
                url = image_value.get("url", "")
-                resized = _shrink_data_url(url)
+                resized, unshrinkable = _shrink_data_url(url)
                if resized:
                    image_value["url"] = resized
                    changed_count += 1
-                elif isinstance(url, str) and url.startswith("data:") \
-                        and len(url) > target_bytes:
+                elif unshrinkable:
                    unshrinkable_oversized += 1
            elif isinstance(image_value, str):
-                resized = _shrink_data_url(image_value)
+                resized, unshrinkable = _shrink_data_url(image_value)
                if resized:
                    part["image_url"] = resized
                    changed_count += 1
-                elif image_value.startswith("data:") \
-                        and len(image_value) > target_bytes:
+                elif unshrinkable:
                    unshrinkable_oversized += 1

    if changed_count:
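The key rule in the reworked `_shrink_data_url` is that a resize is judged on the same axis that triggered it. The sketch below extracts only that accept/reject decision, taking byte lengths and the re-encoded image's longest side as plain numbers so Pillow is not needed. `gate_resize` is a hypothetical name; `None` for `new_max_side` models the case where the re-encoded dimensions could not be decoded.

```python
from typing import Optional, Tuple


def gate_resize(
    triggered_by: str,            # "bytes" or "dimension"
    original_len: int,            # length of the original data URL
    resized_len: Optional[int],   # None if the resizer returned nothing
    new_max_side: Optional[int],  # longest side after re-encode, None if undecodable
    max_dimension: int,           # provider's per-side pixel cap
) -> Tuple[bool, bool]:
    """Return (accepted, unshrinkable) following the diff's gating rules."""
    if resized_len is None:
        return False, True
    if triggered_by == "bytes":
        # Byte budget is binding: the re-encode must be strictly smaller...
        if resized_len >= original_len:
            return False, True
        # ...and must not leave the image over the per-side cap either.
        if new_max_side is not None and new_max_side > max_dimension:
            return False, True
        return True, False
    # Dimension cap is binding: bytes may grow as long as pixels now fit.
    if new_max_side is not None:
        if new_max_side <= max_dimension:
            return True, False
        return False, True
    # Dimensions unverifiable: fall back to "bytes must shrink".
    if resized_len >= original_len:
        return False, True
    return True, False
```

For example, a screenshot downscaled to fit a 2000px cap whose PNG re-encode grew from 1000 to 1500 bytes is accepted when triggered by dimensions, whereas the old single "bytes must shrink" gate would have rejected it.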
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@ -466,6 +466,32 @@ def _content_policy_blocked_result(
    }


+def _sync_failover_system_message(agent, api_messages, active_system_prompt):
+    """Refresh the in-flight system message after a provider failover.
+
+    ``try_activate_fallback`` rewrites the ``Model:``/``Provider:`` identity
+    lines on ``agent._cached_system_prompt`` (see
+    ``rewrite_prompt_model_identity``) so the agent reports the model that is
+    actually answering.  But the current call block's ``api_messages`` were
+    built from the pre-failover prompt, and the retry loop rebuilds
+    ``api_kwargs`` from that list each iteration — without this sync the
+    whole turn (and every gateway turn, since fallback re-activates per
+    message while the primary is down) ships the stale identity.
+
+    Mutates ``api_messages[0]`` in place and returns the prompt to use as
+    ``active_system_prompt`` for subsequent call-block rebuilds.
+    """
+    sp = getattr(agent, "_cached_system_prompt", None)
+    if not isinstance(sp, str) or not sp:
+        return active_system_prompt
+    if api_messages and api_messages[0].get("role") == "system":
+        effective = sp
+        if agent.ephemeral_system_prompt:
+            effective = (effective + "\n\n" + agent.ephemeral_system_prompt).strip()
+        api_messages[0]["content"] = effective
+    return sp
+
+
 def run_conversation(
    agent,
    user_message: str,
@ -940,6 +966,8 @@ def run_conversation(
                        )
                        agent._buffer_status(f"⏳ {_nous_msg}")
                        if agent._try_activate_fallback():
+                            active_system_prompt = _sync_failover_system_message(
+                                agent, api_messages, active_system_prompt)
                            retry_count = 0
                            compression_attempts = 0
                            _retry.primary_recovery_attempted = False
@ -1265,6 +1293,8 @@ def run_conversation(
                    if agent._fallback_index < len(agent._fallback_chain):
                        agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
                    if agent._try_activate_fallback():
+                        active_system_prompt = _sync_failover_system_message(
+                            agent, api_messages, active_system_prompt)
                        retry_count = 0
                        compression_attempts = 0
                        _retry.primary_recovery_attempted = False
@ -1336,6 +1366,8 @@ def run_conversation(
                        if agent._has_pending_fallback():
                            agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
                        if agent._try_activate_fallback():
+                            active_system_prompt = _sync_failover_system_message(
+                                agent, api_messages, active_system_prompt)
                            retry_count = 0
                            compression_attempts = 0
                            _retry.primary_recovery_attempted = False
@ -1479,6 +1511,8 @@ def run_conversation(
                            "⚠️ Model declined to respond (safety refusal) — trying fallback..."
                        )
                    if agent._try_activate_fallback():
+                        active_system_prompt = _sync_failover_system_message(
+                            agent, api_messages, active_system_prompt)
                        retry_count = 0
                        compression_attempts = 0
                        _retry.primary_recovery_attempted = False
@ -2783,11 +2817,46 @@ def run_conversation(
                        else:
                            agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
                        if agent._try_activate_fallback(reason=classified.reason):
+                            active_system_prompt = _sync_failover_system_message(
+                                agent, api_messages, active_system_prompt)
                            retry_count = 0
                            compression_attempts = 0
                            _retry.primary_recovery_attempted = False
                            continue

+                # ── Auth-failure provider failover ───────────────────────
+                # A 401/403 that survives the per-provider credential-refresh
+                # attempt above (each guarded by its own
+                # ``*_auth_retry_attempted`` flag) means the active provider's
+                # credential or endpoint is broken in a way refreshing can't
+                # fix (revoked OAuth, blocked/expired key, an account pinned to
+                # a dead/staging endpoint). Previously the loop only printed
+                # "switch providers manually" advice and fell through, so a
+                # user with a configured fallback chain kept thrashing on the
+                # same dead credential every turn instead of failing over.
+                # Escalate to the fallback chain here, mirroring the rate-
+                # limit/billing failover above. When no fallback is configured
+                # (or the chain is exhausted), _try_activate_fallback returns
+                # False and we fall through to the existing terminal handling
+                # + provider-specific troubleshooting guidance unchanged.
+                if (
+                    classified.is_auth
+                    and not _retry.auth_failover_attempted
+                    and agent._fallback_index < len(agent._fallback_chain)
+                ):
+                    _retry.auth_failover_attempted = True
+                    agent._buffer_status(
+                        "🔐 Authentication failed and could not be refreshed — "
+                        "switching to fallback provider..."
+                    )
+                    if agent._try_activate_fallback(reason=classified.reason):
+                        active_system_prompt = _sync_failover_system_message(
+                            agent, api_messages, active_system_prompt)
+                        retry_count = 0
+                        compression_attempts = 0
+                        _retry.primary_recovery_attempted = False
+                        continue
+
                # ── Nous Portal: record rate limit & skip retries ─────
                # When Nous returns a 429 that is a genuine account-
                # level rate limit, record the reset time to a shared
@ -2914,6 +2983,7 @@ def run_conversation(
                    agent._buffer_status(f"⚠️  Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")

                    original_len = len(messages)
+                    original_tokens = estimate_messages_tokens_rough(messages)
                    messages, active_system_prompt = agent._compress_context(
                        messages, system_message, approx_tokens=approx_tokens,
                        task_id=effective_task_id,
@ -2923,8 +2993,18 @@ def run_conversation(
                    # messages to the new session, not skipping them.
                    conversation_history = None

-                    if len(messages) < original_len:
-                        agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+                    # Re-estimate tokens after compression.  Same-message-count
+                    # compression (tool-result pruning, in-place summarization)
+                    # can materially reduce request size without reducing the
+                    # message array.  (#39550)
+                    new_tokens = estimate_messages_tokens_rough(messages)
+                    approx_tokens = new_tokens  # update for downstream logging
+
+                    if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95):
+                        if len(messages) < original_len:
+                            agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+                        else:
+                            agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
                        time.sleep(2)  # Brief pause between compression retries
                        _retry.restart_with_compressed_messages = True
                        break
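Both the 413 path and the context-overflow path now treat a compression pass as progress in either of two cases: it drops messages, or it cuts the rough token estimate by more than 5%. The sketch below restates that check on its own. `rough_tokens` is a hypothetical stand-in for `estimate_messages_tokens_rough` that assumes about 4 characters per token; the real function's body is not part of this diff.

```python
from typing import List


def rough_tokens(messages: List[dict]) -> int:
    """Hypothetical estimator: about 4 characters per token."""
    total_chars = sum(len(str(m.get("content") or "")) for m in messages)
    return total_chars // 4


def compression_made_progress(
    before: List[dict],
    after: List[dict],
    min_ratio: float = 0.95,
) -> bool:
    """True if compression shrank the message list or cut estimated tokens by >5%."""
    if len(after) < len(before):
        return True  # classic case: fewer messages
    old_tokens = rough_tokens(before)
    new_tokens = rough_tokens(after)
    # Same-count compression (tool-result pruning, in-place summaries) still counts.
    return new_tokens > 0 and new_tokens < old_tokens * min_ratio
```

The 5% margin presumably keeps estimation noise from being counted as progress, so the loop does not keep retrying a compression that changed nothing material.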
@ -3070,6 +3150,7 @@ def run_conversation(
                    agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")

                    original_len = len(messages)
+                    original_tokens = estimate_messages_tokens_rough(messages)
                    messages, active_system_prompt = agent._compress_context(
                        messages, system_message, approx_tokens=approx_tokens,
                        task_id=effective_task_id,
@ -3079,9 +3160,18 @@ def run_conversation(
                    # messages to the new session, not skipping them.
                    conversation_history = None

-                    if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
+                    # Re-estimate tokens after compression.  Same-message-count
+                    # compression (tool-result pruning, in-place summarization)
+                    # can materially reduce request size without reducing the
+                    # message array.  (#39550)
+                    new_tokens = estimate_messages_tokens_rough(messages)
+                    approx_tokens = new_tokens  # update for downstream logging
+
+                    if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95) or (new_ctx and new_ctx < old_ctx):
                        if len(messages) < original_len:
                            agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+                        elif new_tokens > 0 and new_tokens < original_tokens * 0.95:
+                            agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
                        time.sleep(2)  # Brief pause between compression retries
                        _retry.restart_with_compressed_messages = True
                        break
@ -3090,13 +3180,13 @@ def run_conversation(
                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
                        agent._vprint(f"{agent.log_prefix}   💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
-                        logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
+                        logger.error(f"{agent.log_prefix}Context length exceeded: {new_tokens:,} tokens. Cannot compress further.")
                        agent._persist_session(messages, conversation_history)
                        return {
                            "messages": messages,
                            "completed": False,
                            "api_calls": api_call_count,
-                            "error": f"Context length exceeded ({approx_tokens:,} tokens). Cannot compress further.",
+                            "error": f"Context length exceeded ({new_tokens:,} tokens). Cannot compress further.",
                            "partial": True,
                            "failed": True,
                            "compression_exhausted": True,
@ -3186,6 +3276,8 @@ def run_conversation(
                        else:
                            agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
                    if agent._try_activate_fallback():
+                        active_system_prompt = _sync_failover_system_message(
+                            agent, api_messages, active_system_prompt)
                        retry_count = 0
                        compression_attempts = 0
                        _retry.primary_recovery_attempted = False
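The following hunks call `agent._summarize_api_error(api_error)` once and reuse the result in three places: the status line, the content-policy response, and the returned `error` field. That helper's body is not shown here. The sketch below is a hypothetical `summarize_error_body` that illustrates the kind of collapsing the comment describes: an HTML challenge page is reduced to its title, whitespace is folded, and the length is capped.

```python
import html
import re

_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
_TAG_RE = re.compile(r"<[^>]+>")


def summarize_error_body(body: str, limit: int = 200) -> str:
    """Hypothetical helper: collapse a raw provider error body to one short line."""
    text = body or ""
    lowered = text.lower()
    if "<html" in lowered or "<!doctype" in lowered:
        # Proxy/challenge pages: keep only the <title>, else strip all tags.
        match = _TITLE_RE.search(text)
        text = match.group(1) if match else _TAG_RE.sub(" ", text)
        text = "HTML error page: " + html.unescape(text)
    text = " ".join(text.split())  # fold newlines and runs of spaces
    if len(text) > limit:
        text = text[: limit - 3] + "..."
    return text
```

Computing the summary once means every consumer sees the same bounded string. This includes cron notifications that forward `error` verbatim.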
@ -3197,15 +3289,22 @@ def run_conversation(
                    # Terminal — flush buffered context so the user sees
                    # what was tried before the abort.
                    agent._flush_status_buffer()
+                    # Summarize once: Cloudflare/proxy HTML challenge pages and
+                    # other raw provider bodies must be collapsed to a short
+                    # one-liner here, otherwise the full page leaks into the
+                    # returned ``error`` field and downstream consumers deliver
+                    # it verbatim (e.g. a cron failure notification dumped a
+                    # ~60KB Cloudflare challenge page as 31 Discord messages).
+                    _nonretryable_summary = agent._summarize_api_error(api_error)
                    if classified.reason == FailoverReason.content_policy_blocked:
                        agent._emit_status(
                            f"❌ Provider safety filter blocked this request: "
-                            f"{agent._summarize_api_error(api_error)}"
+                            f"{_nonretryable_summary}"
                        )
                    else:
                        agent._emit_status(
                            f"❌ Non-retryable error (HTTP {status_code}): "
-                            f"{agent._summarize_api_error(api_error)}"
+                            f"{_nonretryable_summary}"
                        )
                    agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
                    agent._vprint(f"{agent.log_prefix}   🔌 Provider: {_provider}  Model: {_model}", force=True)
@ -3290,18 +3389,17 @@ def run_conversation(
                    else:
                        agent._persist_session(messages, conversation_history)
                    if classified.reason == FailoverReason.content_policy_blocked:
-                        _summary = agent._summarize_api_error(api_error)
                        _policy_response = (
                            "⚠️  The model provider's safety filter blocked this request "
                            "(not a Hermes/gateway failure).\n\n"
-                            f"Provider message: {_summary}\n\n"
+                            f"Provider message: {_nonretryable_summary}\n\n"
                            f"{_CONTENT_POLICY_RECOVERY_HINT}"
                        )
                        return _content_policy_blocked_result(
                            messages,
                            api_call_count,
                            final_response=_policy_response,
-                            error_detail=_summary,
+                            error_detail=_nonretryable_summary,
                        )
                    return {
                        "final_response": None,
@ -3309,7 +3407,7 @@ def run_conversation(
                        "api_calls": api_call_count,
                        "completed": False,
                        "failed": True,
-                        "error": str(api_error),
+                        "error": _nonretryable_summary,
                    }

                if retry_count >= max_retries:
@ -3327,6 +3425,8 @@ def run_conversation(
                    if agent._has_pending_fallback():
                        agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
                    if agent._try_activate_fallback():
+                        active_system_prompt = _sync_failover_system_message(
+                            agent, api_messages, active_system_prompt)
                        retry_count = 0
                        compression_attempts = 0
                        _retry.primary_recovery_attempted = False
@ -4273,6 +4373,8 @@ def run_conversation(
                            "switching to fallback provider..."
                        )
                        if agent._try_activate_fallback():
+                            active_system_prompt = _sync_failover_system_message(
+                                agent, api_messages, active_system_prompt)
                            agent._empty_content_retries = 0
                            agent._buffer_status(
                                f"↻ Switched to fallback: {agent.model} "
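`_sync_failover_system_message` is needed because the failover rewrites only `agent._cached_system_prompt`, while the retry loop keeps rebuilding requests from the already-built `api_messages`. Below is a self-contained sketch of the same sync, run against a `SimpleNamespace` agent. `rewrite_model_line` is a hypothetical stand-in for `rewrite_prompt_model_identity` and touches only the `Model:` line. The model names are made up.

```python
import re
from types import SimpleNamespace
from typing import Any, Dict, List


def rewrite_model_line(prompt: str, model: str) -> str:
    """Hypothetical stand-in for rewrite_prompt_model_identity (Model: line only)."""
    return re.sub(r"(?m)^Model: .*$", f"Model: {model}", prompt)


def sync_failover_system_message(
    agent: Any, api_messages: List[Dict[str, Any]], active_system_prompt: str
) -> str:
    """Copy the refreshed cached prompt into the in-flight system message."""
    sp = getattr(agent, "_cached_system_prompt", None)
    if not isinstance(sp, str) or not sp:
        return active_system_prompt  # nothing cached: keep the current prompt
    if api_messages and api_messages[0].get("role") == "system":
        effective = sp
        if agent.ephemeral_system_prompt:
            effective = (effective + "\n\n" + agent.ephemeral_system_prompt).strip()
        api_messages[0]["content"] = effective  # patch the request being retried
    return sp


agent = SimpleNamespace(
    _cached_system_prompt="You are Hermes.\nModel: primary-model",
    ephemeral_system_prompt="Be brief.",
)
api_messages = [
    {"role": "system", "content": agent._cached_system_prompt + "\n\nBe brief."},
    {"role": "user", "content": "hi"},
]
# Simulate a failover that only rewrites the cached prompt.
agent._cached_system_prompt = rewrite_model_line(agent._cached_system_prompt, "backup-model")
active = sync_failover_system_message(agent, api_messages, "stale prompt")
```

Without the sync step, `api_messages[0]` would still name `primary-model` on every retry of the turn.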
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@ -15,6 +15,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple

 from hermes_constants import OPENROUTER_BASE_URL
 from hermes_cli.config import load_env
+from agent.secret_scope import get_secret as _get_secret
 from agent.credential_persistence import (
    is_borrowed_credential_source,
    sanitize_borrowed_credential_payload,
@ -1666,7 +1667,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
        _env_file = load_env()

        def _env_val(key: str) -> str:
-            return (_env_file.get(key) or os.environ.get(key) or "").strip()
+            return (_env_file.get(key) or _get_secret(key, "") or "").strip()

        anthropic_api_key = _env_val("ANTHROPIC_API_KEY")
        anthropic_oauth_env = (
@ -1952,7 +1953,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
    # changes to the .env file.
    def _get_env_prefer_dotenv(key: str) -> str:
        env_file = load_env()
-        val = env_file.get(key) or os.environ.get(key) or ""
+        val = env_file.get(key) or _get_secret(key, "") or ""
        return val.strip()

    # Honour user suppression — `hermes auth remove <provider> <N>` for an
@ -2061,19 +2062,34 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
    return changed, active_sources


-def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources: Set[str]) -> bool:
+def _prune_stale_seeded_entries(
+    entries: List[PooledCredential],
+    active_sources: Set[str],
+    *,
+    prune_env_sources: bool = True,
+) -> bool:
+    def _is_prunable(entry: PooledCredential) -> bool:
+        # ``env:*`` entries are persisted references that get re-hydrated from
+        # the environment on every load. A process that merely lacks the env
+        # var during this call must NOT delete the on-disk entry for every other
+        # process — that destructive read is the bug behind #9331. Only prune
+        # an env source when ``prune_env_sources`` is explicitly requested
+        # (e.g. an `hermes auth` command that confirmed the source is gone).
+        if entry.source.startswith("env:"):
+            return prune_env_sources
+        # File-backed singletons (device-code OAuth, claude_code) and Hermes
+        # PKCE should disappear from the pool when their backing file is gone.
+        return (
+            is_borrowed_credential_source(entry.source, entry.provider)
+            or entry.source == "hermes_pkce"
+        )
+
    retained = [
        entry
        for entry in entries
        if _is_manual_source(entry.source)
        or entry.source in active_sources
-        or not (
-            is_borrowed_credential_source(entry.source, entry.provider)
-            # Hermes PKCE is Hermes-owned/persistable while present, but it is
-            # still a file-backed singleton and should disappear from the pool
-            # when the backing OAuth file is gone.
-            or entry.source == "hermes_pkce"
-        )
+        or not _is_prunable(entry)
    ]
    if len(retained) == len(entries):
        return False
@ -2173,7 +2189,15 @@ def load_pool(provider: str) -> CredentialPool:
        singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
        env_changed, env_sources = _seed_from_env(provider, entries)
        changed = raw_needs_sanitization or singleton_changed or env_changed
-        changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
+        # ``load_pool()`` is a non-destructive read for env-seeded entries: a
+        # process missing a provider env var must not delete the persisted
+        # pool entry for every other process (#9331). File-backed singletons
+        # still prune when their backing file is gone.
+        changed |= _prune_stale_seeded_entries(
+            entries,
+            singleton_sources | env_sources,
+            prune_env_sources=False,
+        )
        changed |= _normalize_pool_priorities(provider, entries)

    if changed:
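The `load_pool()` call now passes `prune_env_sources=False`. A process that lacks a provider env var therefore no longer deletes the persisted `env:*` entry, while file-backed singletons still disappear once their backing file is gone. Below is a minimal sketch of that retention rule. The `is_manual` and `is_file_backed` predicates are hypothetical stand-ins for `_is_manual_source` and `is_borrowed_credential_source`, and the source strings are illustrative.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Entry:
    source: str
    provider: str


# Hypothetical stand-ins for _is_manual_source / is_borrowed_credential_source.
def is_manual(source: str) -> bool:
    return source.startswith("manual")


def is_file_backed(source: str) -> bool:
    return source in {"claude_code", "device_code", "hermes_pkce"}


def prune_stale(
    entries: List[Entry],
    active_sources: Set[str],
    *,
    prune_env_sources: bool = True,
) -> List[Entry]:
    """Return entries to keep; env:* entries survive unless pruning is explicit."""

    def prunable(entry: Entry) -> bool:
        if entry.source.startswith("env:"):
            return prune_env_sources  # only an explicit caller may drop these
        return is_file_backed(entry.source)

    return [
        e
        for e in entries
        if is_manual(e.source) or e.source in active_sources or not prunable(e)
    ]
```

Because the default stays `True`, an explicit cleanup path can still remove a confirmed-gone env source, while the routine read in `load_pool()` opts out.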
--- a/agent/gemini_cloudcode_adapter.py
+++ b/agent/gemini_cloudcode_adapter.py
@ -1,909 +0,0 @@
-"""OpenAI-compatible facade that talks to Google's Cloud Code Assist backend.
-
-This adapter lets Hermes use the ``google-gemini-cli`` provider as if it were
-a standard OpenAI-shaped chat completion endpoint, while the underlying HTTP
-traffic goes to ``cloudcode-pa.googleapis.com/v1internal:{generateContent,
-streamGenerateContent}`` with a Bearer access token obtained via OAuth PKCE.
-
-Architecture
------------
- ``GeminiCloudCodeClient`` exposes ``.chat.completions.create(**kwargs)``
-  mirroring the subset of the OpenAI SDK that ``run_agent.py`` uses.
- Incoming OpenAI ``messages[]`` / ``tools[]`` / ``tool_choice`` are translated
-  to Gemini's native ``contents[]`` / ``tools[].functionDeclarations`` /
-  ``toolConfig`` / ``systemInstruction`` shape.
- The request body is wrapped ``{project, model, user_prompt_id, request}``
-  per Code Assist API expectations.
- Responses (``candidates[].content.parts[]``) are converted back to
-  OpenAI ``choices[0].message`` shape with ``content`` + ``tool_calls``.
- Streaming uses SSE (``?alt=sse``) and yields OpenAI-shaped delta chunks.
-
-Attribution
-----------
-Translation semantics follow jenslys/opencode-gemini-auth (MIT) and the public
-Gemini API docs. Request envelope shape
-(``{project, model, user_prompt_id, request}``) is documented nowhere; it is
-reverse-engineered from the opencode-gemini-auth and clawdbot implementations.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-import time
-import uuid
-from types import SimpleNamespace
-from typing import Any, Dict, Iterator, List, Optional
-
-import httpx
-
-from agent import google_oauth
-from agent.gemini_schema import sanitize_gemini_tool_parameters
-from agent.google_code_assist import (
-    CODE_ASSIST_ENDPOINT,
-    CodeAssistError,
-    ProjectContext,
-    resolve_project_context,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# =============================================================================
-# Request translation: OpenAI → Gemini
-# =============================================================================
-
-_ROLE_MAP_OPENAI_TO_GEMINI = {
-    "user": "user",
-    "assistant": "model",
-    "system": "user",   # handled separately via systemInstruction
-    "tool": "user",     # functionResponse is wrapped in a user-role turn
-    "function": "user",
-}
-
-
-def _coerce_content_to_text(content: Any) -> str:
-    """OpenAI content may be str or a list of parts; reduce to plain text."""
-    if content is None:
-        return ""
-    if isinstance(content, str):
-        return content
-    if isinstance(content, list):
-        pieces: List[str] = []
-        for p in content:
-            if isinstance(p, str):
-                pieces.append(p)
-            elif isinstance(p, dict):
-                if p.get("type") == "text" and isinstance(p.get("text"), str):
-                    pieces.append(p["text"])
-                # Multimodal (image_url, etc.) — stub for now; log and skip
-                elif p.get("type") in {"image_url", "input_audio"}:
-                    logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
-        return "\n".join(pieces)
-    return str(content)
-
-
-def _translate_tool_call_to_gemini(tool_call: Dict[str, Any]) -> Dict[str, Any]:
-    """OpenAI tool_call -> Gemini functionCall part."""
-    fn = tool_call.get("function") or {}
-    args_raw = fn.get("arguments", "")
-    try:
-        args = json.loads(args_raw) if isinstance(args_raw, str) and args_raw else {}
-    except json.JSONDecodeError:
-        args = {"_raw": args_raw}
-    if not isinstance(args, dict):
-        args = {"_value": args}
-    return {
-        "functionCall": {
-            "name": fn.get("name") or "",
-            "args": args,
-        },
-        # Sentinel signature — matches opencode-gemini-auth's approach.
-        # Without this, Code Assist rejects function calls that originated
-        # outside its own chain.
-        "thoughtSignature": "skip_thought_signature_validator",
-    }
-
-
-def _translate_tool_result_to_gemini(message: Dict[str, Any]) -> Dict[str, Any]:
-    """OpenAI tool-role message -> Gemini functionResponse part.
-
-    The function name isn't in the OpenAI tool message directly; it must be
-    passed via the assistant message that issued the call. For simplicity we
-    look up ``name`` on the message (OpenAI SDK copies it there) or on the
-    ``tool_call_id`` cross-reference.
-    """
-    name = str(message.get("name") or message.get("tool_call_id") or "tool")
-    content = _coerce_content_to_text(message.get("content"))
-    # Gemini expects the response as a dict under `response`. We wrap plain
-    # text in {"output": "..."}.
-    try:
-        parsed = json.loads(content) if content.strip().startswith(("{", "[")) else None
-    except json.JSONDecodeError:
-        parsed = None
-    response = parsed if isinstance(parsed, dict) else {"output": content}
-    return {
-        "functionResponse": {
-            "name": name,
-            "response": response,
-        },
-    }
-
-
-def _build_gemini_contents(
-    messages: List[Dict[str, Any]],
-) -> tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
-    """Convert OpenAI messages[] to Gemini contents[] + systemInstruction."""
-    system_text_parts: List[str] = []
-    contents: List[Dict[str, Any]] = []
-
-    for msg in messages:
-        if not isinstance(msg, dict):
-            continue
-        role = str(msg.get("role") or "user")
-
-        if role == "system":
-            system_text_parts.append(_coerce_content_to_text(msg.get("content")))
-            continue
-
-        # Tool result message — emit a user-role turn with functionResponse
-        if role == "tool" or role == "function":
-            contents.append({
-                "role": "user",
-                "parts": [_translate_tool_result_to_gemini(msg)],
-            })
-            continue
-
-        gemini_role = _ROLE_MAP_OPENAI_TO_GEMINI.get(role, "user")
-        parts: List[Dict[str, Any]] = []
-
-        text = _coerce_content_to_text(msg.get("content"))
-        if text:
-            parts.append({"text": text})
-
-        # Assistant messages can carry tool_calls
-        tool_calls = msg.get("tool_calls") or []
-        if isinstance(tool_calls, list):
-            for tc in tool_calls:
-                if isinstance(tc, dict):
-                    parts.append(_translate_tool_call_to_gemini(tc))
-
-        if not parts:
-            # Gemini rejects empty parts; skip the turn entirely
-            continue
-
-        contents.append({"role": gemini_role, "parts": parts})
-
-    system_instruction: Optional[Dict[str, Any]] = None
-    joined_system = "\n".join(p for p in system_text_parts if p).strip()
-    if joined_system:
-        system_instruction = {
-            "role": "system",
-            "parts": [{"text": joined_system}],
-        }
-
-    return contents, system_instruction
-
-
-def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
-    """OpenAI tools[] -> Gemini tools[].functionDeclarations[]."""
-    if not isinstance(tools, list) or not tools:
-        return []
-    declarations: List[Dict[str, Any]] = []
-    for t in tools:
-        if not isinstance(t, dict):
-            continue
-        fn = t.get("function") or {}
-        if not isinstance(fn, dict):
-            continue
-        name = fn.get("name")
-        if not name:
-            continue
-        decl = {"name": str(name)}
-        if fn.get("description"):
-            decl["description"] = str(fn["description"])
-        params = fn.get("parameters")
-        if isinstance(params, dict):
-            decl["parameters"] = sanitize_gemini_tool_parameters(params)
-        declarations.append(decl)
-    if not declarations:
-        return []
-    return [{"functionDeclarations": declarations}]
-
-
-def _translate_tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
-    """OpenAI tool_choice -> Gemini toolConfig.functionCallingConfig."""
-    if tool_choice is None:
-        return None
-    if isinstance(tool_choice, str):
-        if tool_choice == "auto":
-            return {"functionCallingConfig": {"mode": "AUTO"}}
-        if tool_choice == "required":
-            return {"functionCallingConfig": {"mode": "ANY"}}
-        if tool_choice == "none":
-            return {"functionCallingConfig": {"mode": "NONE"}}
-    if isinstance(tool_choice, dict):
-        fn = tool_choice.get("function") or {}
-        name = fn.get("name")
-        if name:
-            return {
-                "functionCallingConfig": {
-                    "mode": "ANY",
-                    "allowedFunctionNames": [str(name)],
-                },
-            }
-    return None
-
-
-def _normalize_thinking_config(config: Any) -> Optional[Dict[str, Any]]:
-    """Accept thinkingBudget / thinkingLevel / includeThoughts (+ snake_case)."""
-    if not isinstance(config, dict) or not config:
-        return None
-    budget = config.get("thinkingBudget", config.get("thinking_budget"))
-    level = config.get("thinkingLevel", config.get("thinking_level"))
-    include = config.get("includeThoughts", config.get("include_thoughts"))
-    normalized: Dict[str, Any] = {}
-    if isinstance(budget, (int, float)):
-        normalized["thinkingBudget"] = int(budget)
-    if isinstance(level, str) and level.strip():
-        normalized["thinkingLevel"] = level.strip().lower()
-    if isinstance(include, bool):
-        normalized["includeThoughts"] = include
-    return normalized or None
-
-
-def build_gemini_request(
-    *,
-    messages: List[Dict[str, Any]],
-    tools: Any = None,
-    tool_choice: Any = None,
-    temperature: Optional[float] = None,
-    max_tokens: Optional[int] = None,
-    top_p: Optional[float] = None,
-    stop: Any = None,
-    thinking_config: Any = None,
-) -> Dict[str, Any]:
-    """Build the inner Gemini request body (goes inside ``request`` wrapper)."""
-    contents, system_instruction = _build_gemini_contents(messages)
-
-    body: Dict[str, Any] = {"contents": contents}
-    if system_instruction is not None:
-        body["systemInstruction"] = system_instruction
-
-    gemini_tools = _translate_tools_to_gemini(tools)
-    if gemini_tools:
-        body["tools"] = gemini_tools
-    tool_cfg = _translate_tool_choice_to_gemini(tool_choice)
-    if tool_cfg is not None:
-        body["toolConfig"] = tool_cfg
-
-    generation_config: Dict[str, Any] = {}
-    if isinstance(temperature, (int, float)):
-        generation_config["temperature"] = float(temperature)
-    if isinstance(max_tokens, int) and max_tokens > 0:
-        generation_config["maxOutputTokens"] = max_tokens
-    if isinstance(top_p, (int, float)):
-        generation_config["topP"] = float(top_p)
-    if isinstance(stop, str) and stop:
-        generation_config["stopSequences"] = [stop]
-    elif isinstance(stop, list) and stop:
-        generation_config["stopSequences"] = [str(s) for s in stop if s]
-    normalized_thinking = _normalize_thinking_config(thinking_config)
-    if normalized_thinking:
-        generation_config["thinkingConfig"] = normalized_thinking
-    if generation_config:
-        body["generationConfig"] = generation_config
-
-    return body
-
-
-def wrap_code_assist_request(
-    *,
-    project_id: str,
-    model: str,
-    inner_request: Dict[str, Any],
-    user_prompt_id: Optional[str] = None,
-) -> Dict[str, Any]:
-    """Wrap the inner Gemini request in the Code Assist envelope."""
-    return {
-        "project": project_id,
-        "model": model,
-        "user_prompt_id": user_prompt_id or str(uuid.uuid4()),
-        "request": inner_request,
-    }
-
-
-# =============================================================================
-# Response translation: Gemini → OpenAI
-# =============================================================================
-
-def _translate_gemini_response(
-    resp: Dict[str, Any],
-    model: str,
-) -> SimpleNamespace:
-    """Non-streaming Gemini response -> OpenAI-shaped SimpleNamespace.
-
-    Code Assist wraps the actual Gemini response inside ``response``, so we
-    unwrap it first if present.
-    """
-    inner = resp.get("response") if isinstance(resp.get("response"), dict) else resp
-
-    candidates = inner.get("candidates") or []
-    if not isinstance(candidates, list) or not candidates:
-        return _empty_response(model)
-
-    cand = candidates[0]
-    content_obj = cand.get("content") if isinstance(cand, dict) else {}
-    parts = content_obj.get("parts") if isinstance(content_obj, dict) else []
-
-    text_pieces: List[str] = []
-    reasoning_pieces: List[str] = []
-    tool_calls: List[SimpleNamespace] = []
-
-    for i, part in enumerate(parts or []):
-        if not isinstance(part, dict):
-            continue
-        # Thought parts carry the model's internal reasoning; surface them as
-        # reasoning and don't mix them into content.
-        if part.get("thought") is True:
-            if isinstance(part.get("text"), str):
-                reasoning_pieces.append(part["text"])
-            continue
-        if isinstance(part.get("text"), str):
-            text_pieces.append(part["text"])
-            continue
-        fc = part.get("functionCall")
-        if isinstance(fc, dict) and fc.get("name"):
-            try:
-                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
-            except (TypeError, ValueError):
-                args_str = "{}"
-            tool_calls.append(SimpleNamespace(
-                id=f"call_{uuid.uuid4().hex[:12]}",
-                type="function",
-                index=i,
-                function=SimpleNamespace(name=str(fc["name"]), arguments=args_str),
-            ))
-
-    finish_reason = "tool_calls" if tool_calls else _map_gemini_finish_reason(
-        str(cand.get("finishReason") or "")
-    )
-
-    usage_meta = inner.get("usageMetadata") or {}
-    usage = SimpleNamespace(
-        prompt_tokens=int(usage_meta.get("promptTokenCount") or 0),
-        completion_tokens=int(usage_meta.get("candidatesTokenCount") or 0),
-        total_tokens=int(usage_meta.get("totalTokenCount") or 0),
-        prompt_tokens_details=SimpleNamespace(
-            cached_tokens=int(usage_meta.get("cachedContentTokenCount") or 0),
-        ),
-    )
-
-    message = SimpleNamespace(
-        role="assistant",
-        content="".join(text_pieces) if text_pieces else None,
-        tool_calls=tool_calls or None,
-        reasoning="".join(reasoning_pieces) or None,
-        reasoning_content="".join(reasoning_pieces) or None,
-        reasoning_details=None,
-    )
-    choice = SimpleNamespace(
-        index=0,
-        message=message,
-        finish_reason=finish_reason,
-    )
-    return SimpleNamespace(
-        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
-        object="chat.completion",
-        created=int(time.time()),
-        model=model,
-        choices=[choice],
-        usage=usage,
-    )
-
-
-def _empty_response(model: str) -> SimpleNamespace:
-    message = SimpleNamespace(
-        role="assistant", content="", tool_calls=None,
-        reasoning=None, reasoning_content=None, reasoning_details=None,
-    )
-    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
-    usage = SimpleNamespace(
-        prompt_tokens=0, completion_tokens=0, total_tokens=0,
-        prompt_tokens_details=SimpleNamespace(cached_tokens=0),
-    )
-    return SimpleNamespace(
-        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
-        object="chat.completion",
-        created=int(time.time()),
-        model=model,
-        choices=[choice],
-        usage=usage,
-    )
-
-
-def _map_gemini_finish_reason(reason: str) -> str:
-    mapping = {
-        "STOP": "stop",
-        "MAX_TOKENS": "length",
-        "SAFETY": "content_filter",
-        "RECITATION": "content_filter",
-        "OTHER": "stop",
-    }
-    return mapping.get(reason.upper(), "stop")
-
-
-# =============================================================================
-# Streaming SSE iterator
-# =============================================================================
-
-class _GeminiStreamChunk(SimpleNamespace):
-    """Mimics an OpenAI ChatCompletionChunk with .choices[0].delta."""
-    pass
-
-
-def _make_stream_chunk(
-    *,
-    model: str,
-    content: str = "",
-    tool_call_delta: Optional[Dict[str, Any]] = None,
-    finish_reason: Optional[str] = None,
-    reasoning: str = "",
-) -> _GeminiStreamChunk:
-    delta_kwargs: Dict[str, Any] = {
-        "role": "assistant",
-        "content": None,
-        "tool_calls": None,
-        "reasoning": None,
-        "reasoning_content": None,
-    }
-    if content:
-        delta_kwargs["content"] = content
-    if tool_call_delta is not None:
-        delta_kwargs["tool_calls"] = [SimpleNamespace(
-            index=tool_call_delta.get("index", 0),
-            id=tool_call_delta.get("id") or f"call_{uuid.uuid4().hex[:12]}",
-            type="function",
-            function=SimpleNamespace(
-                name=tool_call_delta.get("name") or "",
-                arguments=tool_call_delta.get("arguments") or "",
-            ),
-        )]
-    if reasoning:
-        delta_kwargs["reasoning"] = reasoning
-        delta_kwargs["reasoning_content"] = reasoning
-    delta = SimpleNamespace(**delta_kwargs)
-    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
-    return _GeminiStreamChunk(
-        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
-        object="chat.completion.chunk",
-        created=int(time.time()),
-        model=model,
-        choices=[choice],
-        usage=None,
-    )
-
-
-def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
-    """Parse Server-Sent Events from an httpx streaming response."""
-    buffer = ""
-    for chunk in response.iter_text():
-        if not chunk:
-            continue
-        buffer += chunk
-        while "\n" in buffer:
-            line, buffer = buffer.split("\n", 1)
-            line = line.rstrip("\r")
-            if not line:
-                continue
-            if line.startswith("data: "):
-                data = line[6:]
-                if data == "[DONE]":
-                    return
-                try:
-                    yield json.loads(data)
-                except json.JSONDecodeError:
-                    logger.debug("Non-JSON SSE line: %s", data[:200])
-
-
-def _translate_stream_event(
-    event: Dict[str, Any],
-    model: str,
-    tool_call_counter: List[int],
-) -> List[_GeminiStreamChunk]:
-    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s).
-
-    ``tool_call_counter`` is a single-element list used as a mutable counter
-    across events in the same stream. Each ``functionCall`` part gets a
-    fresh, unique OpenAI ``index`` — keying by function name would collide
-    whenever the model issues parallel calls to the same tool (e.g. reading
-    three files in one turn).
-    """
-    inner = event.get("response") if isinstance(event.get("response"), dict) else event
-    candidates = inner.get("candidates") or []
-    if not candidates:
-        return []
-    cand = candidates[0]
-    if not isinstance(cand, dict):
-        return []
-
-    chunks: List[_GeminiStreamChunk] = []
-
-    content = cand.get("content") or {}
-    parts = content.get("parts") if isinstance(content, dict) else []
-    for part in parts or []:
-        if not isinstance(part, dict):
-            continue
-        if part.get("thought") is True and isinstance(part.get("text"), str):
-            chunks.append(_make_stream_chunk(
-                model=model, reasoning=part["text"],
-            ))
-            continue
-        if isinstance(part.get("text"), str) and part["text"]:
-            chunks.append(_make_stream_chunk(model=model, content=part["text"]))
-        fc = part.get("functionCall")
-        if isinstance(fc, dict) and fc.get("name"):
-            name = str(fc["name"])
-            idx = tool_call_counter[0]
-            tool_call_counter[0] += 1
-            try:
-                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
-            except (TypeError, ValueError):
-                args_str = "{}"
-            chunks.append(_make_stream_chunk(
-                model=model,
-                tool_call_delta={
-                    "index": idx,
-                    "name": name,
-                    "arguments": args_str,
-                },
-            ))
-
-    finish_reason_raw = str(cand.get("finishReason") or "")
-    if finish_reason_raw:
-        mapped = _map_gemini_finish_reason(finish_reason_raw)
-        if tool_call_counter[0] > 0:
-            mapped = "tool_calls"
-        chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
-    return chunks
-
-
-# =============================================================================
-# GeminiCloudCodeClient — OpenAI-compatible facade
-# =============================================================================
-
-MARKER_BASE_URL = "cloudcode-pa://google"
-
-
-class _GeminiChatCompletions:
-    def __init__(self, client: "GeminiCloudCodeClient"):
-        self._client = client
-
-    def create(self, **kwargs: Any) -> Any:
-        return self._client._create_chat_completion(**kwargs)
-
-
-class _GeminiChatNamespace:
-    def __init__(self, client: "GeminiCloudCodeClient"):
-        self.completions = _GeminiChatCompletions(client)
-
-
-class GeminiCloudCodeClient:
-    """Minimal OpenAI-SDK-compatible facade over Code Assist v1internal."""
-
-    def __init__(
-        self,
-        *,
-        api_key: Optional[str] = None,
-        base_url: Optional[str] = None,
-        default_headers: Optional[Dict[str, str]] = None,
-        project_id: str = "",
-        **_: Any,
-    ):
-        # `api_key` here is a dummy — real auth is the OAuth access token
-        # fetched on every call via agent.google_oauth.get_valid_access_token().
-        # We accept the kwarg for openai.OpenAI interface parity.
-        self.api_key = api_key or "google-oauth"
-        self.base_url = base_url or MARKER_BASE_URL
-        self._default_headers = dict(default_headers or {})
-        self._configured_project_id = project_id
-        self._project_context: Optional[ProjectContext] = None
-        self._project_context_lock = False  # simple single-thread guard
-        self.chat = _GeminiChatNamespace(self)
-        self.is_closed = False
-        self._http = httpx.Client(timeout=httpx.Timeout(connect=15.0, read=600.0, write=30.0, pool=30.0))
-
-    def close(self) -> None:
-        self.is_closed = True
-        try:
-            self._http.close()
-        except Exception:
-            pass
-
-    # Support ``with GeminiCloudCodeClient(...) as client:`` like the OpenAI SDK client.
-    def __enter__(self):
-        return self
-
-    def __exit__(self, exc_type, exc_val, exc_tb):
-        self.close()
-
-    def _ensure_project_context(self, access_token: str, model: str) -> ProjectContext:
-        """Lazily resolve and cache the project context for this client."""
-        if self._project_context is not None:
-            return self._project_context
-
-        env_project = google_oauth.resolve_project_id_from_env()
-        creds = google_oauth.load_credentials()
-        stored_project = creds.project_id if creds else ""
-
-        # Prefer what's already baked into the creds
-        if stored_project:
-            self._project_context = ProjectContext(
-                project_id=stored_project,
-                managed_project_id=creds.managed_project_id if creds else "",
-                tier_id="",
-                source="stored",
-            )
-            return self._project_context
-
-        ctx = resolve_project_context(
-            access_token,
-            configured_project_id=self._configured_project_id,
-            env_project_id=env_project,
-            user_agent_model=model,
-        )
-        # Persist discovered project back to the creds file so the next
-        # session doesn't re-run the discovery.
-        if ctx.project_id or ctx.managed_project_id:
-            google_oauth.update_project_ids(
-                project_id=ctx.project_id,
-                managed_project_id=ctx.managed_project_id,
-            )
-        self._project_context = ctx
-        return ctx
-
-    def _create_chat_completion(
-        self,
-        *,
-        model: str = "gemini-2.5-flash",
-        messages: Optional[List[Dict[str, Any]]] = None,
-        stream: bool = False,
-        tools: Any = None,
-        tool_choice: Any = None,
-        temperature: Optional[float] = None,
-        max_tokens: Optional[int] = None,
-        top_p: Optional[float] = None,
-        stop: Any = None,
-        extra_body: Optional[Dict[str, Any]] = None,
-        timeout: Any = None,
-        **_: Any,
-    ) -> Any:
-        access_token = google_oauth.get_valid_access_token()
-        ctx = self._ensure_project_context(access_token, model)
-
-        thinking_config = None
-        if isinstance(extra_body, dict):
-            thinking_config = extra_body.get("thinking_config") or extra_body.get("thinkingConfig")
-
-        inner = build_gemini_request(
-            messages=messages or [],
-            tools=tools,
-            tool_choice=tool_choice,
-            temperature=temperature,
-            max_tokens=max_tokens,
-            top_p=top_p,
-            stop=stop,
-            thinking_config=thinking_config,
-        )
-        wrapped = wrap_code_assist_request(
-            project_id=ctx.project_id,
-            model=model,
-            inner_request=inner,
-        )
-
-        headers = {
-            "Content-Type": "application/json",
-            "Accept": "application/json",
-            "Authorization": f"Bearer {access_token}",
-            "User-Agent": "hermes-agent (gemini-cli-compat)",
-            "X-Goog-Api-Client": "gl-python/hermes",
-            "x-activity-request-id": str(uuid.uuid4()),
-        }
-        headers.update(self._default_headers)
-
-        if stream:
-            return self._stream_completion(model=model, wrapped=wrapped, headers=headers)
-
-        url = f"{CODE_ASSIST_ENDPOINT}/v1internal:generateContent"
-        response = self._http.post(url, json=wrapped, headers=headers)
-        if response.status_code != 200:
-            raise _gemini_http_error(response)
-        try:
-            payload = response.json()
-        except ValueError as exc:
-            raise CodeAssistError(
-                f"Invalid JSON from Code Assist: {exc}",
-                code="code_assist_invalid_json",
-            ) from exc
-        return _translate_gemini_response(payload, model=model)
-
-    def _stream_completion(
-        self,
-        *,
-        model: str,
-        wrapped: Dict[str, Any],
-        headers: Dict[str, str],
-    ) -> Iterator[_GeminiStreamChunk]:
-        """Generator that yields OpenAI-shaped streaming chunks."""
-        url = f"{CODE_ASSIST_ENDPOINT}/v1internal:streamGenerateContent?alt=sse"
-        stream_headers = dict(headers)
-        stream_headers["Accept"] = "text/event-stream"
-
-        def _generator() -> Iterator[_GeminiStreamChunk]:
-            try:
-                with self._http.stream("POST", url, json=wrapped, headers=stream_headers) as response:
-                    if response.status_code != 200:
-                        # Materialize error body for better diagnostics
-                        response.read()
-                        raise _gemini_http_error(response)
-                    tool_call_counter: List[int] = [0]
-                    for event in _iter_sse_events(response):
-                        for chunk in _translate_stream_event(event, model, tool_call_counter):
-                            yield chunk
-            except httpx.HTTPError as exc:
-                raise CodeAssistError(
-                    f"Streaming request failed: {exc}",
-                    code="code_assist_stream_error",
-                ) from exc
-
-        return _generator()
-
-
-def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
-    """Translate an httpx response into a CodeAssistError with rich metadata.
-
-    Parses Google's error envelope (``{"error": {"code", "message", "status",
-    "details": [...]}}``) so the agent's error classifier can reason about
-    the failure — ``status_code`` enables the rate_limit / auth classification
-    paths, and ``response`` lets the main loop honor ``Retry-After`` just
-    like it does for OpenAI SDK exceptions.
-
-    Also lifts a few recognizable Google conditions into human-readable
-    messages so the user sees something better than a 500-char JSON dump:
-
-        MODEL_CAPACITY_EXHAUSTED → "Gemini capacity exhausted for <model>
-            (Google-side throttle, not a Hermes issue)..."
-        RESOURCE_EXHAUSTED (any other reason) → quota-style message pointing at /gquota
-        404 → "Code Assist 404: <model> is not available at cloudcode-pa..."
-    """
-    status = response.status_code
-
-    # Parse the body once, surviving any weird encodings.
-    body_text = ""
-    body_json: Dict[str, Any] = {}
-    try:
-        body_text = response.text
-    except Exception:
-        body_text = ""
-    if body_text:
-        try:
-            parsed = json.loads(body_text)
-            if isinstance(parsed, dict):
-                body_json = parsed
-        except (ValueError, TypeError):
-            body_json = {}
-
-    # Dig into Google's error envelope.  Shape is:
-    #   {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED",
-    #              "details": [{"@type": ".../ErrorInfo", "reason": "MODEL_CAPACITY_EXHAUSTED",
-    #                           "metadata": {...}},
-    #                          {"@type": ".../RetryInfo", "retryDelay": "30s"}]}}
-    err_obj = body_json.get("error") if isinstance(body_json, dict) else None
-    if not isinstance(err_obj, dict):
-        err_obj = {}
-    err_status = str(err_obj.get("status") or "").strip()
-    err_message = str(err_obj.get("message") or "").strip()
-    _raw_details = err_obj.get("details")
-    err_details_list = _raw_details if isinstance(_raw_details, list) else []
-
-    # Extract google.rpc.ErrorInfo reason + metadata.  There may be more
-    # than one ErrorInfo (rare), so we pick the first one with a reason.
-    error_reason = ""
-    error_metadata: Dict[str, Any] = {}
-    retry_delay_seconds: Optional[float] = None
-    for detail in err_details_list:
-        if not isinstance(detail, dict):
-            continue
-        type_url = str(detail.get("@type") or "")
-        if not error_reason and type_url.endswith("/google.rpc.ErrorInfo"):
-            reason = detail.get("reason")
-            if isinstance(reason, str) and reason:
-                error_reason = reason
-            md = detail.get("metadata")
-            if isinstance(md, dict):
-                error_metadata = md
-        elif retry_delay_seconds is None and type_url.endswith("/google.rpc.RetryInfo"):
-            # retryDelay is a google.protobuf.Duration string like "30s" or "1.5s".
-            delay_raw = detail.get("retryDelay")
-            if isinstance(delay_raw, str) and delay_raw.endswith("s"):
-                try:
-                    retry_delay_seconds = float(delay_raw[:-1])
-                except ValueError:
-                    pass
-            elif isinstance(delay_raw, (int, float)):
-                retry_delay_seconds = float(delay_raw)
-
-    # Fall back to the Retry-After header if the body didn't include RetryInfo.
-    if retry_delay_seconds is None:
-        try:
-            header_val = response.headers.get("Retry-After") or response.headers.get("retry-after")
-        except Exception:
-            header_val = None
-        if header_val:
-            try:
-                retry_delay_seconds = float(header_val)
-            except (TypeError, ValueError):
-                retry_delay_seconds = None
-
-    # Classify the error code.  ``code_assist_rate_limited`` stays the default
-    # for 429s; a more specific reason tag helps downstream callers (e.g. tests,
-    # logs) without changing the rate_limit classification path.
-    code = f"code_assist_http_{status}"
-    if status == 401:
-        code = "code_assist_unauthorized"
-    elif status == 429:
-        code = "code_assist_rate_limited"
-        if error_reason == "MODEL_CAPACITY_EXHAUSTED":
-            code = "code_assist_capacity_exhausted"
-
-    # Build a human-readable message.  Keep the status + a raw-body tail for
-    # debugging, but lead with a friendlier summary when we recognize the
-    # Google signal.
-    model_hint = ""
-    if isinstance(error_metadata, dict):
-        model_hint = str(error_metadata.get("model") or error_metadata.get("modelId") or "").strip()
-
-    if status == 429 and error_reason == "MODEL_CAPACITY_EXHAUSTED":
-        target = model_hint or "this Gemini model"
-        message = (
-            f"Gemini capacity exhausted for {target} (Google-side throttle, "
-            f"not a Hermes issue). Try a different Gemini model or set a "
-            f"fallback_providers entry to a non-Gemini provider."
-        )
-        if retry_delay_seconds is not None:
-            message += f" Google suggests retrying in {retry_delay_seconds:g}s."
-    elif status == 429 and err_status == "RESOURCE_EXHAUSTED":
-        message = (
-            f"Gemini quota exhausted ({err_message or 'RESOURCE_EXHAUSTED'}). "
-            f"Check /gquota for remaining daily requests."
-        )
-        if retry_delay_seconds is not None:
-            message += f" Retry suggested in {retry_delay_seconds:g}s."
-    elif status == 404:
-        # Google returns 404 when a model has been retired or renamed.
-        target = model_hint or (err_message or "model")
-        message = (
-            f"Code Assist 404: {target} is not available at "
-            f"cloudcode-pa.googleapis.com. It may have been renamed or "
-            f"retired. Check hermes_cli/models.py for the current list."
-        )
-    elif err_message:
-        # Generic fallback with the parsed message.
-        message = f"Code Assist HTTP {status} ({err_status or 'error'}): {err_message}"
-    else:
-        # Last-ditch fallback — raw body snippet.
-        message = f"Code Assist returned HTTP {status}: {body_text[:500]}"
-
-    return CodeAssistError(
-        message,
-        code=code,
-        status_code=status,
-        response=response,
-        retry_after=retry_delay_seconds,
-        details={
-            "status": err_status,
-            "reason": error_reason,
-            "metadata": error_metadata,
-            "message": err_message,
-        },
-    )
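The envelope walk performed by `_gemini_http_error` can be sketched in isolation. The helper below, `parse_google_error_envelope`, is hypothetical and not part of Hermes; it assumes the `{"error": {"status", "details": [...]}}` shape quoted in the comments of that function and only understands `retryDelay` values in the `"<seconds>s"` string form.

```python
import json
from typing import Optional, Tuple


def parse_google_error_envelope(body_text: str) -> Tuple[str, str, Optional[float]]:
    """Return (status, reason, retry_delay_seconds) from a Google error body.

    Hypothetical illustration of the ErrorInfo / RetryInfo walk; not Hermes API.
    """
    try:
        parsed = json.loads(body_text) if body_text else {}
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        parsed = {}
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        return "", "", None

    status = str(err.get("status") or "")
    reason = ""
    delay: Optional[float] = None
    raw_details = err.get("details")
    details = raw_details if isinstance(raw_details, list) else []
    for detail in details:
        if not isinstance(detail, dict):
            continue
        type_url = str(detail.get("@type") or "")
        # Keep the first ErrorInfo reason and the first RetryInfo delay found.
        if not reason and type_url.endswith("/google.rpc.ErrorInfo"):
            reason = str(detail.get("reason") or "")
        elif delay is None and type_url.endswith("/google.rpc.RetryInfo"):
            raw = detail.get("retryDelay")
            # A protobuf Duration serialized to JSON is a decimal string ending in "s".
            if isinstance(raw, str) and raw.endswith("s"):
                try:
                    delay = float(raw[:-1])
                except ValueError:
                    pass
    return status, reason, delay


body = json.dumps({"error": {
    "code": 429,
    "status": "RESOURCE_EXHAUSTED",
    "details": [
        {"@type": "type.googleapis.com/google.rpc.ErrorInfo",
         "reason": "MODEL_CAPACITY_EXHAUSTED"},
        {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "1.5s"},
    ],
}})
print(parse_google_error_envelope(body))
# ('RESOURCE_EXHAUSTED', 'MODEL_CAPACITY_EXHAUSTED', 1.5)
```

Matching on the `@type` suffix rather than the full type URL keeps the check independent of the `type.googleapis.com/` prefix, which is the same choice the deleted code makes.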
--- a/agent/google_code_assist.py
+++ b/agent/google_code_assist.py
@ -1,451 +0,0 @@
-"""Google Code Assist API client — project discovery, onboarding, quota.
-
-The Code Assist API powers Google's official gemini-cli. It sits at
-``cloudcode-pa.googleapis.com`` and provides:
-
-- Free tier access (generous daily quota) for personal Google accounts
-- Paid tier access via GCP projects with billing / Workspace / Standard / Enterprise
-
-This module handles the control-plane dance needed before inference:
-
-1. ``load_code_assist()`` — probe the user's account to learn what tier they're on
-   and whether a ``cloudaicompanionProject`` is already assigned.
-2. ``onboard_user()`` — if the user hasn't been onboarded yet (new account, fresh
-   free tier, etc.), call this with the chosen tier + project id. Supports LRO
-   polling for slow provisioning.
-3. ``retrieve_user_quota()`` — fetch the ``buckets[]`` array showing remaining
-   quota per model, used by the ``/gquota`` slash command.
-
-VPC-SC handling: enterprise accounts under a VPC Service Controls perimeter
-will get ``SECURITY_POLICY_VIOLATED`` on ``load_code_assist``. We catch this
-and force the account to ``standard-tier`` so the call chain still succeeds.
-
-Derived from opencode-gemini-auth (MIT) and clawdbot/extensions/google. The
-request/response shapes are specific to Google's internal Code Assist API,
-which is not publicly documented, so we copy them from the reference implementations.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-import time
-import urllib.error
-import urllib.request
-import uuid
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional
-
-logger = logging.getLogger(__name__)
-
-
-# =============================================================================
-# Constants
-# =============================================================================
-
-CODE_ASSIST_ENDPOINT = "https://cloudcode-pa.googleapis.com"
-
-# Fallback endpoints tried when prod returns an error during project discovery
-FALLBACK_ENDPOINTS = [
-    "https://daily-cloudcode-pa.sandbox.googleapis.com",
-    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
-]
-
-# Tier identifiers that Google's API uses
-FREE_TIER_ID = "free-tier"
-LEGACY_TIER_ID = "legacy-tier"
-STANDARD_TIER_ID = "standard-tier"
-
-# Default HTTP headers matching gemini-cli's fingerprint.
-# Google may reject unrecognized User-Agents on these internal endpoints.
-_GEMINI_CLI_USER_AGENT = "google-api-nodejs-client/9.15.1 (gzip)"
-_X_GOOG_API_CLIENT = "gl-node/24.0.0"
-_DEFAULT_REQUEST_TIMEOUT = 30.0
-_ONBOARDING_POLL_ATTEMPTS = 12
-_ONBOARDING_POLL_INTERVAL_SECONDS = 5.0
-
-
-class CodeAssistError(RuntimeError):
-    """Exception raised by the Code Assist (``cloudcode-pa``) integration.
-
-    Carries HTTP status / response / retry-after metadata so the agent's
-    ``error_classifier._extract_status_code`` and the main loop's Retry-After
-    handling (which walks ``error.response.headers``) pick up the right
-    signals.  Without these, 429s from the OAuth path look like opaque
-    ``RuntimeError`` and skip the rate-limit path.
-    """
-
-    def __init__(
-        self,
-        message: str,
-        *,
-        code: str = "code_assist_error",
-        status_code: Optional[int] = None,
-        response: Any = None,
-        retry_after: Optional[float] = None,
-        details: Optional[Dict[str, Any]] = None,
-    ) -> None:
-        super().__init__(message)
-        self.code = code
-        # ``status_code`` is picked up by ``agent.error_classifier._extract_status_code``
-        # so a 429 from Code Assist classifies as FailoverReason.rate_limit and
-        # triggers the main loop's fallback_providers chain the same way SDK
-        # errors do.
-        self.status_code = status_code
-        # ``response`` is the underlying ``httpx.Response`` (or a shim with a
-        # ``.headers`` mapping and ``.json()`` method).  The main loop reads
-        # ``error.response.headers["Retry-After"]`` to honor Google's retry
-        # hints when the backend throttles us.
-        self.response = response
-        # Parsed ``Retry-After`` seconds (kept separately for convenience —
-        # Google returns retry hints in both the header and the error body's
-        # ``google.rpc.RetryInfo`` details, and we pick whichever we found).
-        self.retry_after = retry_after
-        # Parsed structured error details from the Google error envelope
-        # (e.g. ``{"reason": "MODEL_CAPACITY_EXHAUSTED", "status": "RESOURCE_EXHAUSTED"}``).
-        # Useful for logging and for tests that want to assert on specifics.
-        self.details = details or {}
-
-
-class ProjectIdRequiredError(CodeAssistError):
-    def __init__(self, message: str = "GCP project id required for this tier") -> None:
-        super().__init__(message, code="code_assist_project_id_required")
-
-
-# =============================================================================
-# HTTP primitive (auth via Bearer token passed per-call)
-# =============================================================================
-
-def _build_headers(access_token: str, *, user_agent_model: str = "") -> Dict[str, str]:
-    ua = _GEMINI_CLI_USER_AGENT
-    if user_agent_model:
-        ua = f"{ua} model/{user_agent_model}"
-    return {
-        "Content-Type": "application/json",
-        "Accept": "application/json",
-        "Authorization": f"Bearer {access_token}",
-        "User-Agent": ua,
-        "X-Goog-Api-Client": _X_GOOG_API_CLIENT,
-        "x-activity-request-id": str(uuid.uuid4()),
-    }
-
-
-def _client_metadata() -> Dict[str, str]:
-    """Match Google's gemini-cli exactly — unrecognized metadata may be rejected."""
-    return {
-        "ideType": "IDE_UNSPECIFIED",
-        "platform": "PLATFORM_UNSPECIFIED",
-        "pluginType": "GEMINI",
-    }
-
-
-def _post_json(
-    url: str,
-    body: Dict[str, Any],
-    access_token: str,
-    *,
-    timeout: float = _DEFAULT_REQUEST_TIMEOUT,
-    user_agent_model: str = "",
-) -> Dict[str, Any]:
-    data = json.dumps(body).encode("utf-8")
-    request = urllib.request.Request(
-        url, data=data, method="POST",
-        headers=_build_headers(access_token, user_agent_model=user_agent_model),
-    )
-    try:
-        with urllib.request.urlopen(request, timeout=timeout) as response:
-            raw = response.read().decode("utf-8", errors="replace")
-            return json.loads(raw) if raw else {}
-    except urllib.error.HTTPError as exc:
-        detail = ""
-        try:
-            detail = exc.read().decode("utf-8", errors="replace")
-        except Exception:
-            pass
-        # Special case: VPC-SC violation should be distinguishable
-        if _is_vpc_sc_violation(detail):
-            raise CodeAssistError(
-                f"VPC-SC policy violation: {detail}",
-                code="code_assist_vpc_sc",
-            ) from exc
-        raise CodeAssistError(
-            f"Code Assist HTTP {exc.code}: {detail or exc.reason}",
-            code=f"code_assist_http_{exc.code}",
-        ) from exc
-    except urllib.error.URLError as exc:
-        raise CodeAssistError(
-            f"Code Assist request failed: {exc}",
-            code="code_assist_network_error",
-        ) from exc
-
-
-def _is_vpc_sc_violation(body: str) -> bool:
-    """Detect a VPC Service Controls violation from a response body."""
-    if not body:
-        return False
-    try:
-        parsed = json.loads(body)
-    except (json.JSONDecodeError, ValueError):
-        return "SECURITY_POLICY_VIOLATED" in body
-    # Walk the nested error structure Google uses
-    error = parsed.get("error") if isinstance(parsed, dict) else None
-    if not isinstance(error, dict):
-        return False
-    details = error.get("details") or []
-    if isinstance(details, list):
-        for item in details:
-            if isinstance(item, dict):
-                reason = item.get("reason") or ""
-                if reason == "SECURITY_POLICY_VIOLATED":
-                    return True
-    msg = str(error.get("message", ""))
-    return "SECURITY_POLICY_VIOLATED" in msg
-
-
-# =============================================================================
-# load_code_assist — discovers current tier + assigned project
-# =============================================================================
-
-@dataclass
-class CodeAssistProjectInfo:
-    """Result from ``load_code_assist``."""
-    current_tier_id: str = ""
-    cloudaicompanion_project: str = ""   # Google-managed project (free tier)
-    allowed_tiers: List[str] = field(default_factory=list)
-    raw: Dict[str, Any] = field(default_factory=dict)
-
-
-def load_code_assist(
-    access_token: str,
-    *,
-    project_id: str = "",
-    user_agent_model: str = "",
-) -> CodeAssistProjectInfo:
-    """Call ``POST /v1internal:loadCodeAssist`` with prod → sandbox fallback.
-
-    Returns whatever tier + project info Google reports. On VPC-SC violations,
-    returns a synthetic ``standard-tier`` result so the chain can continue.
-    """
-    body: Dict[str, Any] = {
-        "metadata": {
-            "duetProject": project_id,
-            **_client_metadata(),
-        },
-    }
-    if project_id:
-        body["cloudaicompanionProject"] = project_id
-
-    endpoints = [CODE_ASSIST_ENDPOINT] + FALLBACK_ENDPOINTS
-    last_err: Optional[Exception] = None
-    for endpoint in endpoints:
-        url = f"{endpoint}/v1internal:loadCodeAssist"
-        try:
-            resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
-            return _parse_load_response(resp)
-        except CodeAssistError as exc:
-            if exc.code == "code_assist_vpc_sc":
-                logger.info("VPC-SC violation on %s — defaulting to standard-tier", endpoint)
-                return CodeAssistProjectInfo(
-                    current_tier_id=STANDARD_TIER_ID,
-                    cloudaicompanion_project=project_id,
-                )
-            last_err = exc
-            logger.warning("loadCodeAssist failed on %s: %s", endpoint, exc)
-            continue
-    if last_err:
-        raise last_err
-    return CodeAssistProjectInfo()
-
-
-def _parse_load_response(resp: Dict[str, Any]) -> CodeAssistProjectInfo:
-    current_tier = resp.get("currentTier") or {}
-    tier_id = str(current_tier.get("id") or "") if isinstance(current_tier, dict) else ""
-    project = str(resp.get("cloudaicompanionProject") or "")
-    allowed = resp.get("allowedTiers") or []
-    allowed_ids: List[str] = []
-    if isinstance(allowed, list):
-        for t in allowed:
-            if isinstance(t, dict):
-                tid = str(t.get("id") or "")
-                if tid:
-                    allowed_ids.append(tid)
-    return CodeAssistProjectInfo(
-        current_tier_id=tier_id,
-        cloudaicompanion_project=project,
-        allowed_tiers=allowed_ids,
-        raw=resp,
-    )
-
-
-# =============================================================================
-# onboard_user — provisions a new user on a tier (with LRO polling)
-# =============================================================================
-
-def onboard_user(
-    access_token: str,
-    *,
-    tier_id: str,
-    project_id: str = "",
-    user_agent_model: str = "",
-) -> Dict[str, Any]:
-    """Call ``POST /v1internal:onboardUser`` to provision the user.
-
-    For paid tiers, ``project_id`` is REQUIRED (raises ProjectIdRequiredError).
-    For free tiers, ``project_id`` is optional — Google will assign one.
-
-    Returns the final operation response. Polls ``/v1internal/<name>`` for up
-    to ``_ONBOARDING_POLL_ATTEMPTS`` × ``_ONBOARDING_POLL_INTERVAL_SECONDS``
-    (default: 12 × 5s = 1 min).
-    """
-    if tier_id != FREE_TIER_ID and tier_id != LEGACY_TIER_ID and not project_id:
-        raise ProjectIdRequiredError(
-            f"Tier {tier_id!r} requires a GCP project id. "
-            "Set HERMES_GEMINI_PROJECT_ID or GOOGLE_CLOUD_PROJECT."
-        )
-
-    body: Dict[str, Any] = {
-        "tierId": tier_id,
-        "metadata": _client_metadata(),
-    }
-    if project_id:
-        body["cloudaicompanionProject"] = project_id
-
-    endpoint = CODE_ASSIST_ENDPOINT
-    url = f"{endpoint}/v1internal:onboardUser"
-    resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
-
-    # Poll if LRO (long-running operation)
-    if not resp.get("done"):
-        op_name = resp.get("name", "")
-        if not op_name:
-            return resp
-        for attempt in range(_ONBOARDING_POLL_ATTEMPTS):
-            time.sleep(_ONBOARDING_POLL_INTERVAL_SECONDS)
-            poll_url = f"{endpoint}/v1internal/{op_name}"
-            try:
-                poll_resp = _post_json(poll_url, {}, access_token, user_agent_model=user_agent_model)
-            except CodeAssistError as exc:
-                logger.warning("Onboarding poll attempt %d failed: %s", attempt + 1, exc)
-                continue
-            if poll_resp.get("done"):
-                return poll_resp
-        logger.warning("Onboarding did not complete within %d attempts", _ONBOARDING_POLL_ATTEMPTS)
-    return resp
-
-
-# =============================================================================
-# retrieve_user_quota — for /gquota
-# =============================================================================
-
-@dataclass
-class QuotaBucket:
-    model_id: str
-    token_type: str = ""
-    remaining_fraction: float = 0.0
-    reset_time_iso: str = ""
-    raw: Dict[str, Any] = field(default_factory=dict)
-
-
-def retrieve_user_quota(
-    access_token: str,
-    *,
-    project_id: str = "",
-    user_agent_model: str = "",
-) -> List[QuotaBucket]:
-    """Call ``POST /v1internal:retrieveUserQuota`` and parse ``buckets[]``."""
-    body: Dict[str, Any] = {}
-    if project_id:
-        body["project"] = project_id
-    url = f"{CODE_ASSIST_ENDPOINT}/v1internal:retrieveUserQuota"
-    resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
-    raw_buckets = resp.get("buckets") or []
-    buckets: List[QuotaBucket] = []
-    if not isinstance(raw_buckets, list):
-        return buckets
-    for b in raw_buckets:
-        if not isinstance(b, dict):
-            continue
-        buckets.append(QuotaBucket(
-            model_id=str(b.get("modelId") or ""),
-            token_type=str(b.get("tokenType") or ""),
-            remaining_fraction=float(b.get("remainingFraction") or 0.0),
-            reset_time_iso=str(b.get("resetTime") or ""),
-            raw=b,
-        ))
-    return buckets
-
-
-# =============================================================================
-# Project context resolution
-# =============================================================================
-
-@dataclass
-class ProjectContext:
-    """Resolved state for a given OAuth session."""
-    project_id: str = ""           # effective project id sent on requests
-    managed_project_id: str = ""   # Google-assigned project (free tier)
-    tier_id: str = ""
-    source: str = ""               # "env", "config", "discovered", "onboarded"
-
-
-def resolve_project_context(
-    access_token: str,
-    *,
-    configured_project_id: str = "",
-    env_project_id: str = "",
-    user_agent_model: str = "",
-) -> ProjectContext:
-    """Figure out what project id + tier to use for requests.
-
-    Priority:
-      1. If configured_project_id or env_project_id is set, use that directly
-         and short-circuit (no discovery needed).
-      2. Otherwise call loadCodeAssist to see what Google says.
-      3. If no tier assigned yet, onboard the user (free tier default).
-    """
-    # Short-circuit: caller provided a project id
-    if configured_project_id:
-        return ProjectContext(
-            project_id=configured_project_id,
-            tier_id=STANDARD_TIER_ID,  # assume paid since they specified one
-            source="config",
-        )
-    if env_project_id:
-        return ProjectContext(
-            project_id=env_project_id,
-            tier_id=STANDARD_TIER_ID,
-            source="env",
-        )
-
-    # Discover via loadCodeAssist
-    info = load_code_assist(access_token, user_agent_model=user_agent_model)
-
-    effective_project = info.cloudaicompanion_project
-    tier = info.current_tier_id
-
-    if not tier:
-        # User hasn't been onboarded — provision them on free tier
-        onboard_resp = onboard_user(
-            access_token,
-            tier_id=FREE_TIER_ID,
-            project_id="",
-            user_agent_model=user_agent_model,
-        )
-        # Re-parse from the onboard response
-        response_body = onboard_resp.get("response") or {}
-        if isinstance(response_body, dict):
-            effective_project = (
-                effective_project
-                or str(response_body.get("cloudaicompanionProject") or "")
-            )
-        tier = FREE_TIER_ID
-        source = "onboarded"
-    else:
-        source = "discovered"
-
-    return ProjectContext(
-        project_id=effective_project,
-        managed_project_id=effective_project if tier == FREE_TIER_ID else "",
-        tier_id=tier,
-        source=source,
-    )
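The removed `load_code_assist` above tried the production endpoint first and then each fallback. It let one error class (VPC-SC) short-circuit into a synthetic result, and re-raised the last error only when every endpoint had failed. The sketch below shows that control flow in generic form. `EndpointError` and `call_with_fallback` are hypothetical names standing in for `CodeAssistError` and the inline loop; they are not part of Hermes.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class EndpointError(Exception):
    """Hypothetical stand-in for CodeAssistError: carries a machine-readable code."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


def call_with_fallback(
    endpoints: List[str],
    call: Callable[[str], T],
    short_circuit: Callable[[EndpointError], Optional[T]],
) -> Optional[T]:
    """Try each endpoint in order and return the first success.

    ``short_circuit`` may turn a specific error into a synthetic result (as the
    VPC-SC case did). Returning None means "not handled, try the next endpoint".
    """
    last_err: Optional[EndpointError] = None
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except EndpointError as exc:
            synthetic = short_circuit(exc)
            if synthetic is not None:
                return synthetic
            last_err = exc  # remember it, then move on to the next endpoint
    if last_err is not None:
        raise last_err  # every endpoint failed: surface the most recent error
    return None
```

Passing the short-circuit in as a callback keeps the retry loop independent of any one backend's error vocabulary.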
--- a/agent/google_oauth.py
+++ b/agent/google_oauth.py
--- a/agent/image_gen_provider.py
+++ b/agent/image_gen_provider.py
@ -11,6 +11,18 @@ Providers live in ``<repo>/plugins/image_gen/<name>/`` (built-in, auto-loaded
 as ``kind: backend``) or ``~/.hermes/plugins/image_gen/<name>/`` (user, opt-in
 via ``plugins.enabled``).

+Unified surface
+---------------
+One tool — ``image_generate`` — covers **text-to-image** and
+**image-to-image / image editing**. Routing is keyed on the presence of
+``image_url`` (and/or ``reference_image_urls``): if any source image is
+provided, the provider routes to its image-to-image / edit endpoint; if
+omitted, the provider routes to text-to-image. Users pick one **model**
+(e.g. nano-banana-pro, gpt-image-2, grok-imagine-image); the provider
+handles which underlying endpoint to hit. This mirrors the ``video_gen``
+provider design (``agent/video_gen_provider.py``) so the two surfaces
+stay learnable together.
+
 Response shape
 --------------
 All providers return a dict that :func:`success_response` / :func:`error_response`
@ -21,6 +33,7 @@ produce. The tool wrapper JSON-serializes it. Keys:
    model          str              provider-specific model identifier
    prompt         str              echoed prompt
    aspect_ratio   str              "landscape" | "square" | "portrait"
+    modality       str              "text" | "image" (which mode was used)
    provider       str              provider name (for diagnostics)
    error          str              only when success=False
    error_type     str              only when success=False
@ -127,19 +140,51 @@ class ImageGenProvider(abc.ABC):
            return models[0].get("id")
        return None

+    def capabilities(self) -> Dict[str, Any]:
+        """Return what this provider supports.
+
+        Returned dict (all keys optional)::
+
+            {
+                "modalities": ["text", "image"],   # which inputs the backend accepts
+                "max_reference_images": 9,          # cap for reference_image_urls
+            }
+
+        ``modalities`` declares whether the active backend/model supports
+        text-to-image (``"text"``), image-to-image / editing (``"image"``),
+        or both. The tool layer surfaces this in the dynamic schema so the
+        model knows when ``image_url`` is honored; ``hermes tools`` also uses it
+        for its picker. Default: text-only (backward compatible: a
+        provider that doesn't override this advertises text-to-image only).
+        """
+        return {
+            "modalities": ["text"],
+            "max_reference_images": 0,
+        }
+
    @abc.abstractmethod
    def generate(
        self,
        prompt: str,
        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
+        *,
+        image_url: Optional[str] = None,
+        reference_image_urls: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
-        """Generate an image.
+        """Generate an image from a text prompt, or edit/transform a source image.
+
+        Routing: if ``image_url`` (or any ``reference_image_urls``) is
+        provided, the provider should route to its image-to-image / edit
+        endpoint; otherwise text-to-image. ``image_url`` is the primary
+        source image to edit; ``reference_image_urls`` are additional
+        style/composition references (provider clamps to its declared
+        ``max_reference_images``).

        Implementations should return the dict from :func:`success_response`
        or :func:`error_response`. ``kwargs`` may contain forward-compat
-        parameters future versions of the schema will expose — implementations
-        should ignore unknown keys.
+        parameters future versions of the schema will expose —
+        implementations MUST ignore unknown keys (no TypeError).
        """


@ -162,6 +207,26 @@ def resolve_aspect_ratio(value: Optional[str]) -> str:
    return DEFAULT_ASPECT_RATIO


+def normalize_reference_images(value: Any) -> Optional[List[str]]:
+    """Coerce a reference-image argument into a clean list of URL/path strings.
+
+    Accepts a single string or a list; strips blanks and whitespace. Returns
+    ``None`` when nothing usable remains so providers can treat "no refs" as a
+    single sentinel.
+    """
+    if value is None:
+        return None
+    if isinstance(value, str):
+        value = [value]
+    if not isinstance(value, (list, tuple)):
+        return None
+    out: List[str] = []
+    for item in value:
+        if isinstance(item, str) and item.strip():
+            out.append(item.strip())
+    return out or None
+
+
 def _images_cache_dir() -> Path:
    """Return ``$HERMES_HOME/cache/images/``, creating parents as needed."""
    from hermes_constants import get_hermes_home
@ -280,13 +345,16 @@ def success_response(
    prompt: str,
    aspect_ratio: str,
    provider: str,
+    modality: str = "text",
    extra: Optional[Dict[str, Any]] = None,
 ) -> Dict[str, Any]:
    """Build a uniform success response dict.

    ``image`` may be an HTTP URL or an absolute filesystem path (for b64
-    providers like OpenAI). Callers that need to pass through additional
-    backend-specific fields can supply ``extra``.
+    providers like OpenAI). ``modality`` is ``"text"`` (text-to-image) or
+    ``"image"`` (image-to-image / editing) — indicates which endpoint was
+    actually hit, useful for diagnostics. Callers that need to pass through
+    additional backend-specific fields can supply ``extra``.
    """
    payload: Dict[str, Any] = {
        "success": True,
@ -294,6 +362,7 @@ def success_response(
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
+        "modality": modality,
        "provider": provider,
    }
    if extra:
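Under the unified surface, a provider routes on whether any source image was supplied, and it clamps reference images to its declared `max_reference_images`. The sketch below shows that decision, assuming a provider that receives `image_url` and `reference_image_urls` as documented. `route_request` is a hypothetical helper. How a text-only backend should react to image input is left to the provider. In this sketch the modality is decided before clamping, so references beyond the cap still count as "an image was supplied".

```python
from typing import List, Optional, Tuple


def route_request(
    image_url: Optional[str],
    reference_image_urls: Optional[List[str]],
    max_reference_images: int,
) -> Tuple[str, List[str]]:
    """Hypothetical helper: pick the endpoint modality and clamp references.

    Returns ("image", refs) when any source image was supplied, else ("text", []).
    """
    # Same cleanup idea as normalize_reference_images: drop blanks, strip whitespace.
    refs = [u.strip() for u in (reference_image_urls or []) if isinstance(u, str) and u.strip()]
    has_source = bool(image_url and image_url.strip()) or bool(refs)
    if not has_source:
        return "text", []
    # Decide the modality first, then clamp to the provider's declared cap.
    return "image", refs[: max(0, max_reference_images)]
```

The returned modality string is what a provider would pass on as `success_response(..., modality=...)`.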
--- a/agent/memory_manager.py
+++ b/agent/memory_manager.py
@ -721,9 +721,10 @@ class MemoryManager:
            try:
                provider.on_session_end(messages)
            except Exception as e:
-                logger.debug(
+                logger.warning(
                    "Memory provider '%s' on_session_end failed: %s",
                    provider.name, e,
+                    exc_info=True,
                )

    def on_session_switch(
--- a/agent/memory_provider.py
+++ b/agent/memory_provider.py
@ -28,6 +28,7 @@ Optional hooks (override to opt in):
  on_pre_compress(messages) -> str       — extract before context compression
  on_memory_write(action, target, content, metadata=None) — mirror built-in memory writes
  on_delegation(task, result, **kwargs)  — parent-side observation of subagent work
+  backup_paths() -> list[str]            — extra on-disk paths to include in `hermes backup`
 """

 from __future__ import annotations
@ -294,3 +295,21 @@ class MemoryProvider(ABC):

        Use to mirror built-in memory writes to your backend.
        """
+
+    def backup_paths(self) -> List[str]:
+        """Return extra on-disk paths this provider stores OUTSIDE HERMES_HOME.
+
+        ``hermes backup`` only walks HERMES_HOME, so any provider state kept
+        under ``~/.honcho``, ``~/.hindsight``, ``~/.openviking``, etc. is lost
+        across a backup/import cycle unless it's declared here.
+
+        Return a list of absolute path strings (files or directories). The
+        backup command resolves each, captures the ones that exist and live
+        under the user's home directory into a reserved ``_external/`` subtree
+        of the archive, and ``hermes import`` restores them to their original
+        locations. Paths outside the home directory are skipped for safety.
+
+        MUST be callable without ``initialize()`` and without network — resolve
+        from config/env only. Default returns an empty list (nothing external).
+        """
+        return []
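`backup_paths()` must resolve from config or environment only, and the backup command keeps only declared paths that exist under the user's home directory. The sketch below covers both halves under those rules. `ExampleProvider` and its `EXAMPLE_PROVIDER_HOME` variable are hypothetical. `select_backup_paths` is an illustrative filter, not the actual `hermes backup` code, which also copies the surviving paths into the archive's `_external/` subtree. `Path.is_relative_to` requires Python 3.9 or later.

```python
import os
from pathlib import Path
from typing import Iterable, List


class ExampleProvider:
    """Hypothetical provider that keeps state outside HERMES_HOME."""

    def backup_paths(self) -> List[str]:
        # Resolve from env/config only: no initialize(), no network calls.
        root = os.environ.get("EXAMPLE_PROVIDER_HOME") or str(Path.home() / ".example-provider")
        return [root]


def select_backup_paths(paths: Iterable[str], home: Path) -> List[Path]:
    """Keep declared paths that exist and resolve to somewhere under ``home``."""
    home = home.resolve()
    kept: List[Path] = []
    for raw in paths:
        path = Path(raw).resolve()  # collapses symlinks and ".." segments
        if not path.exists():
            continue  # nothing to capture
        if not path.is_relative_to(home):
            continue  # outside the home directory: skipped for safety
        kept.append(path)
    return kept
```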
--- a/agent/message_content.py
+++ b/agent/message_content.py
@ -0,0 +1,50 @@
+from __future__ import annotations
+
+from collections.abc import Mapping
+from typing import Any
+
+
+_NON_TEXT_PART_TYPES = {"image", "image_url", "input_image", "audio", "input_audio"}
+_TEXT_KEYS = ("text", "content", "input_text", "output_text", "summary_text")
+
+
+def _field(value: Any, key: str) -> Any:
+    if isinstance(value, Mapping):
+        return value.get(key)
+    return getattr(value, key, None)
+
+
+def _text_from_part(part: Any) -> str:
+    if part is None:
+        return ""
+    if isinstance(part, str):
+        return part
+
+    part_type = str(_field(part, "type") or "").strip().lower()
+    if part_type in _NON_TEXT_PART_TYPES:
+        return ""
+
+    for key in _TEXT_KEYS:
+        text = _field(part, key)
+        if isinstance(text, str):
+            return text
+    return ""
+
+
+def flatten_message_text(content: Any, *, sep: str = "\n") -> str:
+    """Return the visible text from common chat/Responses message content shapes."""
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        chunks = [_text_from_part(part) for part in content]
+        return sep.join(chunk for chunk in chunks if chunk)
+
+    text = _text_from_part(content)
+    if text:
+        return text
+    try:
+        return str(content)
+    except Exception:
+        return ""
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@ -238,6 +238,23 @@ KANBAN_GUIDANCE = (
    "of the decomposition. Do NOT execute the work yourself; your job is "
    "routing, not implementation.\n"
    "\n"
+    "## Reference details that change outcomes\n"
+    "\n"
+    "- **Workspace.** `cd $HERMES_KANBAN_WORKSPACE` first. For a `worktree` kind "
+    "with no `.git`, `git worktree add <path> "
+    "${HERMES_KANBAN_BRANCH:-wt/$HERMES_KANBAN_TASK}` from the main repo, then "
+    "cd there.\n"
+    "- **Deliverables.** Files a human wants go in "
+    "`kanban_complete(artifacts=[<absolute paths>])` (top-level param; paths in "
+    "`metadata` are NOT uploaded). Files must exist at completion.\n"
+    "- **Created cards.** List ids in `kanban_complete(created_cards=[...])` "
+    "ONLY when captured from a successful `kanban_create` return — never invent "
+    "or paste ids; the kernel rejects the completion on any phantom id.\n"
+    "- **Orchestrating: discover profiles first.** The dispatcher SILENTLY "
+    "drops a card with an unknown assignee (it sits in `ready` forever). Ground "
+    "every assignee in a real profile (`hermes profile list`, or ask the user), "
+    "and express dependencies via `parents=[...]` on `kanban_create`, not prose.\n"
+    "\n"
    "## Do NOT\n"
    "\n"
    "- Do not shell out to `hermes kanban <verb>` for board operations. Use "
--- a/agent/redact.py
+++ b/agent/redact.py
@ -120,9 +120,25 @@ _JSON_FIELD_RE = re.compile(
    re.IGNORECASE,
 )

-# Authorization headers
+# Authorization headers — any scheme (Bearer, Basic, Token, Digest, …) plus the
+# bare-credential form, and Proxy-Authorization. The credential token is masked
+# while the header name and scheme word are preserved for debuggability. The
+# previous rule only matched ``Bearer``, so ``Basic <base64 user:pass>`` and
+# ``token <pat>`` leaked verbatim into logs/transcripts.
 _AUTH_HEADER_RE = re.compile(
-    r"(Authorization:\s*Bearer\s+)(\S+)",
+    r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
+    re.IGNORECASE,
+)
+
+# API-key style auth headers carrying a single opaque value (no scheme word).
+# Anthropic and many providers authenticate with ``x-api-key``; values without
+# a known vendor prefix (custom/local backends) would otherwise leak when a
+# request or curl command is logged or echoed into tool output / transcripts.
+_SECRET_HEADER_NAMES = (
+    r"(?:x-api-key|x-goog-api-key|api-key|apikey|x-api-token|x-auth-token|x-access-token)"
+)
+_SECRET_HEADER_RE = re.compile(
+    rf"({_SECRET_HEADER_NAMES}\s*:\s*)(\S+)",
    re.IGNORECASE,
 )

@ -374,11 +390,19 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
                return f'{key}: "{_mask_token(value)}"'
            text = _JSON_FIELD_RE.sub(_redact_json, text)

-    # Authorization headers — _AUTH_HEADER_RE is "Authorization: Bearer ..."
-    # case-insensitive, so "uthorization" is the cheapest substring gate that
-    # covers both "Authorization" and "authorization" without a casefold().
+    # Authorization headers — _AUTH_HEADER_RE matches any scheme after
+    # "[Proxy-]Authorization:" case-insensitively, so "uthorization" is the
+    # cheapest substring gate that covers every casing without a casefold().
    if "uthorization" in text or "UTHORIZATION" in text:
        text = _AUTH_HEADER_RE.sub(
+            lambda m: m.group(1) + (m.group(2) or "") + _mask_token(m.group(3)),
+            text,
+        )
+
+    # API-key style headers (x-api-key, api-key, …) are written as
+    # "name: value" pairs, so gate on ":"; the regex itself is the precise filter.
+    if ":" in text:
+        text = _SECRET_HEADER_RE.sub(
            lambda m: m.group(1) + _mask_token(m.group(2)),
            text,
        )
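To see what the widened patterns catch, the sketch below runs them over a few header lines. The two regexes are copied from the hunk. `mask` is a hypothetical stand-in for `_mask_token`, whose real output format is not shown here, so the masked strings show which span gets replaced rather than the exact Hermes output.

```python
import re

# Copied from the hunk above.
_AUTH_HEADER_RE = re.compile(
    r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
    re.IGNORECASE,
)
_SECRET_HEADER_NAMES = (
    r"(?:x-api-key|x-goog-api-key|api-key|apikey|x-api-token|x-auth-token|x-access-token)"
)
_SECRET_HEADER_RE = re.compile(
    rf"({_SECRET_HEADER_NAMES}\s*:\s*)(\S+)",
    re.IGNORECASE,
)


def mask(token: str) -> str:
    """Hypothetical stand-in for _mask_token: keep 4 chars of long tokens."""
    return token[:4] + "***" if len(token) > 8 else "***"


def redact_headers(text: str) -> str:
    # Group 2 (the scheme word) is optional, so the bare-credential form yields None.
    text = _AUTH_HEADER_RE.sub(
        lambda m: m.group(1) + (m.group(2) or "") + mask(m.group(3)), text
    )
    return _SECRET_HEADER_RE.sub(lambda m: m.group(1) + mask(m.group(2)), text)


for line in (
    "Authorization: Basic dXNlcjpwYXNz",            # -> Authorization: Basic dXNl***
    "proxy-authorization: token ghp_abcdefgh1234",  # -> proxy-authorization: token ghp_***
    "x-api-key: sk-local-123456789",                # -> x-api-key: sk-l***
    "Authorization: abc",                           # -> Authorization: ***
):
    print(redact_headers(line))
```

The header name and scheme word survive, so a redacted log line still shows which authentication style was used.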
--- a/agent/secret_scope.py
+++ b/agent/secret_scope.py
@ -0,0 +1,205 @@
+"""Profile-scoped credential resolution for multi-profile gateway multiplexing.
+
+The multiplexing gateway serves many profiles from one process. Each profile
+has its own ``.env`` with its own provider keys and platform tokens, so we
+**cannot** union them into the process-global ``os.environ`` (that would leak
+profile A's keys to profile B's turns, and to every subprocess spawned with
+``env=dict(os.environ)``).
+
+This module provides a fail-closed, context-local secret scope:
+
+- ``set_secret_scope(mapping)`` installs the active profile's secrets for the
+  current task (a contextvar, so it propagates into the agent's worker thread
+  via ``copy_context()`` exactly like the HERMES_HOME override).
+- ``get_secret(name)`` reads from that scope. When multiplexing is **active**
+  and no scope is set, it RAISES rather than silently falling back to
+  ``os.environ`` — an un-migrated or newly-added call site fails loud at that
+  exact line instead of leaking another profile's value. When multiplexing is
+  **off** (the default), it transparently reads ``os.environ`` so the
+  single-profile gateway and every non-gateway caller behave exactly as before.
+
+Design rationale lives in ``docs/design/multiplexing-gateway.md`` (Workstream A).
+"""
+from __future__ import annotations
+
+import os
+from contextvars import ContextVar, Token
+from pathlib import Path
+from typing import Dict, Mapping, Optional
+
+
+# ── multiplex-active flag ────────────────────────────────────────────────
+# Process-global: set once at gateway startup when gateway.multiplex_profiles
+# is true. Governs whether get_secret() fails closed on an unscoped read.
+# A plain module global (not a contextvar): it describes the deployment mode,
+# not a per-task value.
+_MULTIPLEX_ACTIVE: bool = False
+
+
+def set_multiplex_active(active: bool) -> None:
+    """Mark whether the process is running as a profile multiplexer.
+
+    Called once at gateway startup. When True, ``get_secret`` fails closed on
+    an unscoped read instead of falling back to ``os.environ``.
+    """
+    global _MULTIPLEX_ACTIVE
+    _MULTIPLEX_ACTIVE = bool(active)
+
+
+def is_multiplex_active() -> bool:
+    """Return whether the process is running as a profile multiplexer."""
+    return _MULTIPLEX_ACTIVE
+
+
+# ── the secret scope contextvar ──────────────────────────────────────────
+_SECRET_SCOPE: ContextVar[Optional[Mapping[str, str]]] = ContextVar(
+    "_SECRET_SCOPE", default=None
+)
+
+
+class UnscopedSecretError(RuntimeError):
+    """Raised when a secret is read in multiplex mode with no scope installed.
+
+    This is the fail-closed signal: it means a credential read reached
+    ``get_secret`` without a profile scope active, which in a multiplexer would
+    otherwise leak whichever profile's value happened to be in ``os.environ``.
+    The fix is to wrap the call path in ``set_secret_scope(...)`` (the per-turn
+    / per-adapter profile scope), not to widen the allowlist.
+    """
+
+
+def set_secret_scope(secrets: Optional[Mapping[str, str]]) -> Token:
+    """Install the active profile's secret mapping for the current context.
+
+    Returns a token for ``reset_secret_scope``. Pass ``None`` to clear.
+    """
+    return _SECRET_SCOPE.set(secrets)
+
+
+def reset_secret_scope(token: Token) -> None:
+    """Restore the previous secret scope."""
+    _SECRET_SCOPE.reset(token)
+
+
+def current_secret_scope() -> Optional[Mapping[str, str]]:
+    """Return the active secret mapping, or None when no scope is installed."""
+    return _SECRET_SCOPE.get()
+
+
+# ── genuinely-global env vars (NOT per-profile secrets) ──────────────────
+# These are process/deployment-level settings, not profile credentials. They
+# legitimately live in os.environ and must keep reading from it even in
+# multiplex mode — routing them through the fail-closed path would wrongly
+# crash. Anything matching is read from os.environ regardless of scope.
+#
+# Membership test is by exact name OR prefix (see _is_global_env). Keep this
+# list tight: when in doubt a value is a profile secret, not a global.
+_GLOBAL_ENV_EXACT = frozenset({
+    # Hermes runtime / deployment
+    "HERMES_HOME", "HERMES_PROFILE", "HERMES_GATEWAY_LOCK_DIR",
+    "HERMES_MAX_ITERATIONS", "HERMES_MAX_TOKENS", "HERMES_API_TIMEOUT",
+    "HERMES_REDACT_SECRETS", "HERMES_NOUS_TIMEOUT_SECONDS",
+    "_HERMES_GATEWAY",
+    # OS / interpreter
+    "PATH", "HOME", "USER", "LANG", "LC_ALL", "TZ", "PWD", "SHELL", "TMPDIR",
+    "VIRTUAL_ENV", "PYTHONPATH", "SSL_CERT_FILE",
+    # Kanban paths (per-board, not per-profile-secret)
+    "HERMES_KANBAN_DB", "HERMES_KANBAN_WORKSPACES_ROOT", "HERMES_KANBAN_BOARD",
+})
+_GLOBAL_ENV_PREFIXES = (
+    "HERMES_KANBAN_",
+    "HERMES_TELEGRAM_",   # tuning knobs (batch delays, fallback toggles) — NOT the token
+    "TERMINAL_",          # terminal/sandbox backend settings
+)
+
+
+def _is_global_env(name: str) -> bool:
+    """Return True for genuinely process-global (non-profile-secret) env vars."""
+    if name in _GLOBAL_ENV_EXACT:
+        return True
+    return any(name.startswith(p) for p in _GLOBAL_ENV_PREFIXES)
+
+
+def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
+    """Resolve a credential by env-var name, honoring the active profile scope.
+
+    Resolution order:
+
+    1. Genuinely-global vars (``_is_global_env``) always read ``os.environ`` —
+       they are deployment settings, not profile secrets.
+    2. When a secret scope is installed (multiplexed turn), read from it; an
+       absent key returns ``default``. The scope is authoritative — we do NOT
+       fall through to ``os.environ``, because in a multiplexer ``os.environ``
+       may hold another profile's value.
+    3. No scope installed:
+       - multiplex INACTIVE (default deployment): read ``os.environ`` —
+         identical to the legacy ``os.getenv`` behavior every caller had before.
+       - multiplex ACTIVE: FAIL CLOSED. Raise ``UnscopedSecretError`` so the
+         missing scope is caught loudly instead of leaking a cross-profile value.
+    """
+    if _is_global_env(name):
+        val = os.environ.get(name)
+        return val if val is not None else default
+
+    scope = _SECRET_SCOPE.get()
+    if scope is not None:
+        val = scope.get(name)
+        return val if val is not None else default
+
+    if _MULTIPLEX_ACTIVE:
+        raise UnscopedSecretError(
+            f"get_secret({name!r}) called with no profile secret scope active "
+            f"while multiplexing is on. This credential read must run inside a "
+            f"set_secret_scope(...) block (the per-turn / per-adapter profile "
+            f"scope). Reading os.environ here would risk leaking another "
+            f"profile's value. See docs/design/multiplexing-gateway.md "
+            f"(Workstream A)."
+        )
+
+    val = os.environ.get(name)
+    return val if val is not None else default
+
+
+def load_env_file(env_path: Path) -> Dict[str, str]:
+    """Parse a ``.env`` file into a plain dict WITHOUT touching ``os.environ``.
+
+    Used to load a profile's secrets into an isolated mapping for
+    ``set_secret_scope``. Mirrors python-dotenv's basic parsing (KEY=VALUE,
+    ``export`` prefix, ``#`` comments, optional matching quotes) but never
+    mutates the process environment — that isolation is the whole point.
+    """
+    secrets: Dict[str, str] = {}
+    try:
+        text = env_path.read_text(encoding="utf-8")
+    except (FileNotFoundError, OSError, UnicodeDecodeError):
+        return secrets
+
+    for raw in text.splitlines():
+        line = raw.strip()
+        if not line or line.startswith("#"):
+            continue
+        if line.startswith("export "):
+            line = line[len("export "):].lstrip()
+        if "=" not in line:
+            continue
+        key, _, value = line.partition("=")
+        key = key.strip()
+        if not key:
+            continue
+        value = value.strip()
+        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
+            value = value[1:-1]
+        secrets[key] = value
+
+    return secrets
+
+
+def build_profile_secret_scope(hermes_home: Path) -> Dict[str, str]:
+    """Build a profile's secret mapping from its ``<home>/.env``.
+
+    Returns a fresh dict (safe to install via ``set_secret_scope``). Genuinely
+    global vars are intentionally NOT copied in — ``get_secret`` reads those
+    from ``os.environ`` directly, so the scope holds only profile secrets.
+    """
+    return load_env_file(Path(hermes_home) / ".env")
+
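The module docstring says the secret scope reaches the agent's worker thread through `copy_context()`, and that unscoped reads fail closed in multiplex mode. The sketch below reduces that mechanism to its essentials. It drops the `_is_global_env` bypass, hard-codes multiplex mode on, and uses a hypothetical `EXAMPLE_API_KEY` name. Worker threads are not guaranteed to run in the submitting thread's context, which is why the sketch hands `ctx.run` to the pool explicitly.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from contextvars import ContextVar, copy_context
from typing import Mapping, Optional

# Simplified model of agent/secret_scope.py: no global-env bypass,
# and multiplex mode is hard-coded on.
_SCOPE: ContextVar[Optional[Mapping[str, str]]] = ContextVar("_SCOPE", default=None)
MULTIPLEX_ACTIVE = True


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read with no profile scope installed."""


def get_secret(name: str) -> Optional[str]:
    scope = _SCOPE.get()
    if scope is not None:
        return scope.get(name)  # scope is authoritative: never fall back to os.environ
    if MULTIPLEX_ACTIVE:
        raise UnscopedSecretError(name)  # fail closed
    return os.environ.get(name)


def run_turn(profile_secrets: Mapping[str, str]) -> Optional[str]:
    """Install a profile's scope, then read a secret from a worker thread."""
    token = _SCOPE.set(profile_secrets)
    try:
        ctx = copy_context()  # snapshot taken after set(), so it carries the scope
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(ctx.run, get_secret, "EXAMPLE_API_KEY").result()
    finally:
        _SCOPE.reset(token)  # the next turn starts unscoped again
```

Each call to `run_turn` sees only the mapping it installed, and a read outside any turn raises instead of silently consulting `os.environ`.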
--- a/agent/shell_hooks.py
+++ b/agent/shell_hooks.py
@ -49,6 +49,58 @@ Wire protocol

    # Silent no-op:
    <empty or any non-matching JSON object>
+
+Per-event ``extra`` keys
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``extra`` object contains every kwarg that is **not** one of the
+top-level payload keys (``tool_name``, ``args``, ``session_id``,
+``parent_session_id``).  The tables below list the ``extra`` keys
+emitted by each built-in hook site.
+
+``post_tool_call`` (emitted from ``model_tools.py``)::
+
+    result          – tool return value (serialised string)
+    status          – "ok" | "error" | "blocked"
+    error_type      – error category (e.g. "ValueError"), or None
+    error_message   – human-readable error text, or None
+    duration_ms     – wall-clock time in milliseconds
+    task_id         – current task id (empty string if none)
+    tool_call_id    – provider tool-call id
+    turn_id         – current turn id
+    api_request_id  – current API request id
+    middleware_trace – list of dicts from tool middleware chain
+
+``pre_tool_call`` (emitted from ``model_tools.py``)::
+
+    task_id         – current task id (empty string if none)
+    tool_call_id    – provider tool-call id
+    turn_id         – current turn id
+    api_request_id  – current API request id
+    middleware_trace – list of dicts from tool middleware chain
+
+``on_session_start`` (emitted from ``agent/conversation_loop.py``)::
+
+    model           – model name (e.g. "claude-sonnet-4-20250514")
+    platform        – platform identifier (e.g. "cli", "whatsapp")
+
+``on_session_end`` (emitted from ``agent/turn_finalizer.py``)::
+
+    task_id         – current task id
+    turn_id         – current turn id
+    completed       – bool, True when the turn produced a final response
+    interrupted     – bool, True when the user interrupted
+    model           – model name
+    platform        – platform identifier
+
+``subagent_stop`` (emitted from ``tools/delegate_tool.py``)::
+
+    parent_turn_id  – parent agent's current turn id
+    child_session_id – child (subagent) session id
+    child_role      – role string of the child agent
+    child_summary   – summary of the child's work
+    child_status    – exit status string (e.g. "success", "error")
+    duration_ms     – wall-clock time of the child run in milliseconds
 """

 from __future__ import annotations
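For a hook author, the practical point is this: `tool_name`, `args`, `session_id` and `parent_session_id` sit at the top level of the payload, and everything event-specific lives under `extra`. Below is a minimal sketch of a `post_tool_call` consumer. It assumes the payload has already been read as one JSON object. The helper names are hypothetical, and the rest of the wire protocol is not shown in this hunk.

```python
import json
from typing import Any, Dict


def parse_payload(raw: str) -> Dict[str, Any]:
    """Decode the hook payload; treat empty input as an empty object."""
    return json.loads(raw) if raw.strip() else {}


def summarize_post_tool_call(payload: Dict[str, Any]) -> str:
    """One-line summary built from the top-level and ``extra`` keys."""
    extra = payload.get("extra") or {}
    status = extra.get("status", "ok")  # "ok" | "error" | "blocked"
    line = f"{payload.get('tool_name', '?')}: {status}"
    if extra.get("duration_ms") is not None:
        line += f" in {extra['duration_ms']} ms"
    if status == "error":
        # error_type / error_message are None unless the call failed.
        line += f" ({extra.get('error_type')}: {extra.get('error_message')})"
    return line
```

The docstring above describes the hook's response as empty or a non-matching JSON object, which makes it a silent no-op. That suggests stdout is read back. A script like this would therefore presumably log its summary to stderr rather than stdout.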
--- a/agent/skill_utils.py
+++ b/agent/skill_utils.py
@ -280,9 +280,9 @@ def skill_matches_environment(frontmatter: Dict[str, Any]) -> bool:
    This is an OFFER-time filter: it controls whether a skill shows up in the
    skills index / autocomplete / slash-command list. It is intentionally NOT
    enforced by ``skill_view`` or ``--skills`` preloading — an explicit load is
-    explicit consent, and load-bearing force-loads (e.g. the kanban dispatcher
-    injecting ``--skills kanban-worker``) must always succeed regardless of how
-    the offer surfaces filter the skill.
+    explicit consent, and load-bearing force-loads (e.g. a dispatcher pinning
+    a task to a specialist skill via ``--skills``) must always succeed
+    regardless of how the offer surfaces filter the skill.

    A skill matches when ANY of its declared environments is currently active
    (OR semantics, mirroring ``platforms``). Unknown env tags fail open.
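The offer-time rule combines two behaviours: OR semantics over the declared environments, and fail-open handling of unknown tags. Below is a minimal sketch of that rule, with some interpretive assumptions. It reads "fail open" as "an unrecognized tag counts as a match", and it treats a skill that declares no environments as unrestricted. `KNOWN_ENVS` and its tag names are hypothetical.

```python
from typing import Iterable, Set

# Hypothetical set of environment tags the host knows how to detect.
KNOWN_ENVS: Set[str] = {"docker", "ssh", "local"}


def matches_environment(declared: Iterable[str], active: Set[str]) -> bool:
    """Offer-time check: True if the skill should appear in the index."""
    tags = [t.strip().lower() for t in declared if t and t.strip()]
    if not tags:
        return True  # assumption: no declared environments means no restriction
    for tag in tags:
        if tag not in KNOWN_ENVS:
            return True  # unknown tag: fail open
        if tag in active:
            return True  # OR semantics: one active environment is enough
    return False
```

Because this is only an offer-time filter, an explicit `--skills` load would bypass it entirely.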
--- a/agent/title_generator.py
+++ b/agent/title_generator.py
@ -22,9 +22,31 @@ TitleCallback = Callable[[str], None]
 _TITLE_PROMPT = (
    "Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
    "following exchange. The title should capture the main topic or intent. "
+    "Write the title in the same language the user is writing in. "
    "Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
 )

+_TITLE_PROMPT_PINNED_LANGUAGE = (
+    "Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
+    "following exchange. The title should capture the main topic or intent. "
+    "Write the title in {language}. "
+    "Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
+)
+
+
+def _title_language() -> str:
+    """Return configured title language, or empty string to match the user."""
+    try:
+        from hermes_cli.config import load_config
+
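+        aux = (load_config() or {}).get("auxiliary") or {}
+        title_cfg = aux.get("title_generation") or {}
+        # ``or ""`` keeps a YAML ``language: null`` from becoming the
+        # literal string "None" and pinning titles to that "language".
+        return str(title_cfg.get("language") or "").strip()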
+    except Exception:
+        return ""
+

 def generate_title(
    user_message: str,
@ -48,8 +70,11 @@ def generate_title(
    user_snippet = user_message[:500] if user_message else ""
    assistant_snippet = assistant_response[:500] if assistant_response else ""

+    language = _title_language()
+    prompt = _TITLE_PROMPT_PINNED_LANGUAGE.format(language=language) if language else _TITLE_PROMPT
+
    messages = [
-        {"role": "system", "content": _TITLE_PROMPT},
+        {"role": "system", "content": prompt},
        {"role": "user", "content": f"User: {user_snippet}\n\nAssistant: {assistant_snippet}"},
    ]
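Title generation now works in one of two modes. If `auxiliary.title_generation.language` is set, the title is pinned to that language. Otherwise the prompt asks for the user's own language. Below is a minimal sketch of that selection, run against an already-loaded config dict. The prompts are abbreviated stand-ins for `_TITLE_PROMPT` and `_TITLE_PROMPT_PINNED_LANGUAGE`, and the function names are hypothetical. The sketch treats a null section or a null `language` the same as an empty one.

```python
from typing import Any, Mapping, Optional

# Abbreviated stand-ins for _TITLE_PROMPT / _TITLE_PROMPT_PINNED_LANGUAGE.
MATCH_USER_PROMPT = (
    "Generate a short title (3-7 words). "
    "Write the title in the same language the user is writing in."
)
PINNED_PROMPT = "Generate a short title (3-7 words). Write the title in {language}."


def title_language(config: Optional[Mapping[str, Any]]) -> str:
    """Read auxiliary.title_generation.language; '' means match the user."""
    aux = (config or {}).get("auxiliary") or {}
    title_cfg = aux.get("title_generation") or {}
    # `or ""` keeps a YAML null from turning into the string "None".
    return str(title_cfg.get("language") or "").strip()


def select_title_prompt(config: Optional[Mapping[str, Any]]) -> str:
    language = title_language(config)
    return PINNED_PROMPT.format(language=language) if language else MATCH_USER_PROMPT
```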

--- a/agent/tool_executor.py
+++ b/agent/tool_executor.py
@ -44,9 +44,26 @@ from tools.tool_result_storage import (
    maybe_persist_tool_result,
    enforce_turn_budget,
 )
+from tools.budget_config import BudgetConfig, DEFAULT_BUDGET, budget_for_context_window

 logger = logging.getLogger(__name__)

+
+def _budget_for_agent(agent) -> BudgetConfig:
+    """Resolve a tool-result BudgetConfig scaled to the agent's context window.
+
+    Large-context models keep the historical 100K/200K char defaults; small
+    models (e.g. a 65K-token local model switched into mid-session) get a budget
+    proportional to their window so a single large tool result can't push the
+    request past the model's limit (#23767). Falls back to the default budget
+    when the context length isn't resolvable.
+    """
+    try:
+        ctx = getattr(getattr(agent, "context_compressor", None), "context_length", None)
+        return budget_for_context_window(int(ctx)) if ctx else DEFAULT_BUDGET
+    except Exception:
+        return DEFAULT_BUDGET
+
 # Maximum number of concurrent worker threads for parallel tool execution.
 # Mirrors the constant in ``run_agent`` for tests/imports that look here.
 _MAX_TOOL_WORKERS = 8
@ -249,6 +266,10 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
    tool_calls = assistant_message.tool_calls
    num_tools = len(tool_calls)

+    # Resolve the context-scaled tool-output budget once per turn (cheap, but
+    # avoids rebuilding it per result inside the loop below).
+    _tool_budget = _budget_for_agent(agent)
+
    # ── Pre-flight: interrupt check ──────────────────────────────────
    if agent._interrupt_requested:
        print(f"{agent.log_prefix}⚡ Interrupt: skipping {num_tools} tool call(s)")
@ -725,6 +746,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
            tool_name=name,
            tool_use_id=tc.id,
            env=get_active_env(effective_task_id),
+            config=_tool_budget,
        ) if not _is_multimodal_tool_result(function_result) else function_result

        subdir_hints = agent._subdirectory_hints.check_tool_call(name, args)
@ -756,7 +778,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
    num_tools = len(parsed_calls)
    if num_tools > 0:
        turn_tool_msgs = messages[-num_tools:]
-        enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id))
+        enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id), config=_tool_budget)

    # ── /steer injection ──────────────────────────────────────────────
    # Append any pending user steer text to the last tool result so the
@ -769,6 +791,8 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe

 def execute_tool_calls_sequential(agent, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
    """Execute tool calls sequentially (original behavior). Used for single calls or interactive tools."""
+    # Resolve the context-scaled tool-output budget once per turn.
+    _tool_budget = _budget_for_agent(agent)
    for i, tool_call in enumerate(assistant_message.tool_calls, 1):
        # SAFETY: check interrupt BEFORE starting each tool.
        # If the user sent "stop" during a previous tool's execution,
@ -1377,6 +1401,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
            tool_name=function_name,
            tool_use_id=tool_call.id,
            env=get_active_env(effective_task_id),
+            config=_tool_budget,
        ) if not _is_multimodal_tool_result(function_result) else function_result

        # Discover subdirectory context files from tool arguments
@ -1425,7 +1450,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
    # ── Per-turn aggregate budget enforcement ─────────────────────────
    num_tools_seq = len(assistant_message.tool_calls)
    if num_tools_seq > 0:
-        enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id))
+        enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id), config=_tool_budget)

    # ── /steer injection ──────────────────────────────────────────────
    # See _execute_tool_calls_parallel for the rationale. Same hook,
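`budget_for_context_window` itself is not part of this diff. Below is a rough sketch of the idea described in `_budget_for_agent`'s docstring, which rests on several assumptions:

- The 100K/200K defaults are read as per-result and per-turn character caps.
- Tokens are converted at about 4 characters each.
- A small window lets one result use a fixed fraction of its capacity.

`Budget`, `CHARS_PER_TOKEN` and `WINDOW_FRACTION` are illustrative names and values, not the module's.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4      # assumption: rough chars-per-token ratio
WINDOW_FRACTION = 0.25   # assumption: share of the window one result may use


@dataclass(frozen=True)
class Budget:
    per_result_chars: int
    per_turn_chars: int


DEFAULT = Budget(per_result_chars=100_000, per_turn_chars=200_000)


def budget_for_window(context_tokens: int) -> Budget:
    """Scale caps down for small windows; never exceed the defaults."""
    if context_tokens <= 0:
        return DEFAULT  # unresolvable window: keep the historical budget
    window_chars = context_tokens * CHARS_PER_TOKEN
    per_result = min(DEFAULT.per_result_chars, int(window_chars * WINDOW_FRACTION))
    per_turn = min(DEFAULT.per_turn_chars, 2 * per_result)
    return Budget(per_result, per_turn)
```

With these numbers, a 65K-token model gets 65,000 characters per result and 130,000 per turn. Any window of roughly 100K tokens or more keeps the defaults unchanged.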
--- a/agent/transports/chat_completions.py
+++ b/agent/transports/chat_completions.py
@ -172,6 +172,7 @@ class ChatCompletionsTransport(ProviderTransport):
                "codex_reasoning_items" in msg
                or "codex_message_items" in msg
                or "tool_name" in msg
+                or "timestamp" in msg  # #47868 — strict providers reject this
            ):
                needs_sanitize = True
                break
@ -201,6 +202,7 @@ class ChatCompletionsTransport(ProviderTransport):
            msg.pop("codex_reasoning_items", None)
            msg.pop("codex_message_items", None)
            msg.pop("tool_name", None)
+            msg.pop("timestamp", None)  # #47868 — leak into strict providers
            # Drop all Hermes-internal scaffolding markers (``_``-prefixed).
            # OpenAI's message schema has no ``_``-prefixed fields, so this
            # is safe and future-proofs against new markers being added.
@ -435,10 +437,6 @@ class ChatCompletionsTransport(ProviderTransport):
                    extra_body["extra_body"] = openai_compat_extra
            elif raw_thinking_config:
                extra_body["thinking_config"] = raw_thinking_config
-        elif provider_name == "google-gemini-cli":
-            thinking_config = _build_gemini_thinking_config(model, reasoning_config)
-            if thinking_config:
-                extra_body["thinking_config"] = thinking_config

        # Merge any pre-built extra_body additions
        additions = params.get("extra_body_additions")
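The transport sanitizes in two phases. It first scans for Hermes-only keys (now including `timestamp`) and only then strips them, so histories that are already clean skip the copy. Below is a minimal sketch of that pattern. The function names are hypothetical. The hunk does not show whether the real transport copies messages shallowly or deeply.

```python
from typing import Any, Dict, List

# Keys named in this hunk; the real transport may strip others as well.
_HERMES_ONLY_KEYS = ("codex_reasoning_items", "codex_message_items", "tool_name", "timestamp")


def _has_internal_keys(msg: Dict[str, Any]) -> bool:
    return any(k in msg for k in _HERMES_ONLY_KEYS) or any(
        isinstance(k, str) and k.startswith("_") for k in msg
    )


def sanitize_for_strict_provider(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop Hermes-only fields before sending to an OpenAI-schema provider."""
    if not any(_has_internal_keys(m) for m in messages):
        return messages  # fast path: nothing to strip, no copies made
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        # Shallow copy so the caller's stored history keeps its metadata.
        out = {k: v for k, v in msg.items() if not (isinstance(k, str) and k.startswith("_"))}
        for key in _HERMES_ONLY_KEYS:
            out.pop(key, None)
        cleaned.append(out)
    return cleaned
```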
--- a/agent/turn_context.py
+++ b/agent/turn_context.py
@ -112,6 +112,24 @@ def build_turn_context(
    # Restore the primary runtime if the previous turn activated fallback.
    agent._restore_primary_runtime()

+    # Between-turns MCP refresh: an MCP server that finished connecting since
+    # the previous turn (slow HTTP/OAuth servers routinely take 2-6s on a cold
+    # connect, missing the bounded startup wait) lands in THIS turn's tool
+    # snapshot.  This is cache-safe by construction: it runs in the per-turn
+    # prologue, before this turn's first API call assembles ``tools=``, so it
+    # only ever extends a fresh request prefix — it never mutates the cached
+    # prefix of an in-flight turn.  No-op when no MCP servers are registered
+    # (the common case, gated by the cheap ``has_registered_mcp_tools`` check)
+    # or when the tool set is unchanged (``refresh_agent_mcp_tools`` diffs by
+    # name and leaves the snapshot untouched on no-change).
+    try:
+        if not getattr(agent, "_skip_mcp_refresh", False):
+            from tools.mcp_tool import has_registered_mcp_tools, refresh_agent_mcp_tools
+            if has_registered_mcp_tools():
+                refresh_agent_mcp_tools(agent, quiet_mode=True)
+    except Exception:
+        logger.debug("between-turns MCP tool refresh skipped", exc_info=True)
+
    # Sanitize surrogate characters from user input.
    if isinstance(user_message, str):
        user_message = sanitize_surrogates(user_message)
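The between-turns refresh depends on `refresh_agent_mcp_tools` diffing by name and leaving the snapshot untouched when nothing changed. Below is a minimal sketch of that no-change rule. It assumes tools are dicts with a `name` field. The helper is hypothetical, and it deliberately ignores schema changes to a tool whose name already exists.

```python
from typing import Any, Dict, List


def refresh_tool_snapshot(
    snapshot: List[Dict[str, Any]], available: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return the old snapshot object when the tool names are unchanged."""
    if {t["name"] for t in snapshot} == {t["name"] for t in available}:
        return snapshot  # no-op: the tools list stays exactly as it was
    return list(available)  # a late-connecting server's tools join this turn
```

Returning the identical object in the no-change case is what keeps the refresh cache-safe. The next request is assembled from the same tool list as before.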
--- a/agent/turn_finalizer.py
+++ b/agent/turn_finalizer.py
@ -128,19 +128,44 @@ def finalize_turn(
        and not failed
    )

+    # Post-loop cleanup must never lose the response.  Trajectory save,
+    # resource teardown, and session persistence all touch fallible
+    # surfaces — file I/O / JSON serialization (_save_trajectory), remote
+    # VM/browser teardown over the network (_cleanup_task_resources), and
+    # SQLite writes (_persist_session).  A raise from any of them used to
+    # propagate straight out of run_conversation, discarding the partial
+    # final_response the caller is waiting for (subprocess wrappers saw an
+    # empty stdout with no traceback — #8049).  Each step is now guarded
+    # independently so one failure can't skip the others, and any errors
+    # are surfaced on the result dict via ``cleanup_errors`` rather than
+    # killing the turn.
+    _cleanup_errors = []
+
    # Save trajectory if enabled.  ``user_message`` may be a multimodal
    # list of parts; the trajectory format wants a plain string.
-    agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
+    try:
+        agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
+    except Exception as _save_err:
+        _cleanup_errors.append(f"save_trajectory: {_save_err}")
+        logger.error("finalize_turn: _save_trajectory failed: %s", _save_err, exc_info=True)

    # Clean up VM and browser for this task after conversation completes
-    agent._cleanup_task_resources(effective_task_id)
+    try:
+        agent._cleanup_task_resources(effective_task_id)
+    except Exception as _cleanup_err:
+        _cleanup_errors.append(f"cleanup_task_resources: {_cleanup_err}")
+        logger.error("finalize_turn: _cleanup_task_resources failed: %s", _cleanup_err, exc_info=True)

    # Persist session to both JSON log and SQLite only after private retry
    # scaffolding has been removed. Otherwise a later user "continue" turn
    # can replay assistant("(empty)") / recovery nudges and fall into the
    # same empty-response loop again.
-    agent._drop_trailing_empty_response_scaffolding(messages)
-    agent._persist_session(messages, conversation_history)
+    try:
+        agent._drop_trailing_empty_response_scaffolding(messages)
+        agent._persist_session(messages, conversation_history)
+    except Exception as _persist_err:
+        _cleanup_errors.append(f"persist_session: {_persist_err}")
+        logger.error("finalize_turn: _persist_session failed: %s", _persist_err, exc_info=True)

    # ── Turn-exit diagnostic log ─────────────────────────────────────
    # Always logged at INFO so agent.log captures WHY every turn ended.
@ -354,6 +379,11 @@ def finalize_turn(
    }
    if agent._tool_guardrail_halt_decision is not None:
        result["guardrail"] = agent._tool_guardrail_halt_decision.to_metadata()
+    # Surface any post-loop cleanup failures so the caller can distinguish a
+    # clean turn from one whose trajectory/session/resource teardown raised
+    # (the response is still returned either way — #8049).
+    if _cleanup_errors:
+        result["cleanup_errors"] = _cleanup_errors
    # If a /steer landed after the final assistant turn (no more tool
    # batches to drain into), hand it back to the caller so it can be
    # delivered as the next user turn instead of being silently lost.
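`finalize_turn` now guards each cleanup step separately, collects failures as `"label: message"` strings, and still returns the response. Below is a minimal, self-contained sketch of that pattern. The step labels mirror the diff. `run_guarded_steps`, `finalize` and the `final_response` key are hypothetical names.

```python
import logging
from typing import Any, Callable, Dict, List, Sequence, Tuple

logger = logging.getLogger(__name__)

Step = Tuple[str, Callable[[], None]]


def run_guarded_steps(steps: Sequence[Step]) -> List[str]:
    """Run every cleanup step; collect failures instead of raising."""
    errors: List[str] = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:  # one failure must not skip later steps
            errors.append(f"{label}: {exc}")
            logger.error("cleanup step %s failed: %s", label, exc, exc_info=True)
    return errors


def finalize(final_response: str, steps: Sequence[Step]) -> Dict[str, Any]:
    """Always return the response; attach cleanup errors only when present."""
    result: Dict[str, Any] = {"final_response": final_response}
    errors = run_guarded_steps(steps)
    if errors:
        result["cleanup_errors"] = errors
    return result
```

Because `cleanup_errors` is only attached when something failed, a caller can use its absence to recognize a clean turn.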
--- a/agent/turn_retry_state.py
+++ b/agent/turn_retry_state.py
@ -58,6 +58,12 @@ class TurnRetryState:
    primary_recovery_attempted: bool = False
    has_retried_429: bool = False

+    # ── Auth-failure provider failover ───────────────────────────────────
+    # Set once a persistent 401/403 (one the per-provider credential refresh
+    # above failed to fix) has been escalated to the fallback chain, so we
+    # don't loop on the same auth failover within one attempt.
+    auth_failover_attempted: bool = False
+
    # ── Restart signals (read by the outer loop after the attempt) ───────
    restart_with_compressed_messages: bool = False
    restart_with_length_continuation: bool = False
--- a/agent/usage_pricing.py
+++ b/agent/usage_pricing.py
@ -451,6 +451,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
    ): PricingEntry(
        input_cost_per_million=Decimal("15.00"),
        output_cost_per_million=Decimal("75.00"),
+        cache_read_cost_per_million=Decimal("1.50"),
+        cache_write_cost_per_million=Decimal("18.75"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
@ -461,6 +463,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
+        cache_read_cost_per_million=Decimal("0.30"),
+        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
@ -471,6 +475,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
+        cache_read_cost_per_million=Decimal("0.30"),
+        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
@ -481,6 +487,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
    ): PricingEntry(
        input_cost_per_million=Decimal("0.80"),
        output_cost_per_million=Decimal("4.00"),
+        cache_read_cost_per_million=Decimal("0.08"),
+        cache_write_cost_per_million=Decimal("1.00"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
@ -584,6 +592,26 @@ def resolve_billing_route(
    return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")


+def _normalize_bedrock_model_name(model: str) -> str:
+    """Normalize a Bedrock model id to its bare foundation-model form.
+
+    Bedrock cross-region inference profiles prefix the foundation model id
+    with a region scope (``us.`` / ``global.`` / ``eu.`` / ``ap.`` / ``jp.``),
+    e.g. ``us.anthropic.claude-opus-4-7``.  The pricing table is keyed on the
+    bare ``anthropic.claude-*`` id, so the prefix must be stripped before the
+    lookup or every cross-region session prices as unknown.  Mirrors the
+    prefix list in ``bedrock_adapter.is_anthropic_bedrock_model``.  Also
+    normalizes dot-notation version numbers (``4.7`` → ``4-7``).
+    """
+    name = model.lower().strip()
+    for prefix in ("us.", "global.", "eu.", "ap.", "jp."):
+        if name.startswith(prefix):
+            name = name[len(prefix):]
+            break
+    name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
+    return name
+
+
 def _normalize_anthropic_model_name(model: str) -> str:
    """Normalize Anthropic model name variants to canonical form.

@ -614,6 +642,14 @@ def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]
            entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
            if entry:
                return entry
+    # Bedrock cross-region inference profiles carry a region prefix
+    # (us./global./eu./...) that the bare pricing keys don't have.
+    if route.provider == "bedrock":
+        normalized = _normalize_bedrock_model_name(model)
+        if normalized != model:
+            entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
+            if entry:
+                return entry
    return None


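The pricing changes do two things. First, the Bedrock lookup strips one cross-region prefix and rewrites dotted versions before it consults the table. Second, all four updated entries follow the same cache pattern: reads cost 0.1x the input rate and writes cost 1.25x. Below is a sketch of the normalization, copied from the diff, plus a hypothetical cost helper. The helper assumes usage reports cache reads and writes separately from plain input tokens; `estimate_cost_usd` and the bucket names are not the module's API.

```python
import re
from decimal import Decimal
from typing import Mapping

_REGION_PREFIXES = ("us.", "global.", "eu.", "ap.", "jp.")


def normalize_bedrock_model_name(model: str) -> str:
    """Strip one cross-region prefix and turn '4.7'-style versions into '4-7'."""
    name = model.lower().strip()
    for prefix in _REGION_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)


# Rates (USD per million tokens) from the $3 / $15 Bedrock entry in the diff.
SONNET_CLASS_RATES = {
    "input": Decimal("3.00"),
    "output": Decimal("15.00"),
    "cache_read": Decimal("0.30"),
    "cache_write": Decimal("3.75"),
}


def estimate_cost_usd(usage: Mapping[str, int], rates: Mapping[str, Decimal]) -> Decimal:
    """Assumes cache reads/writes are reported separately from plain input."""
    total = sum(
        (Decimal(usage.get(bucket, 0)) * rate for bucket, rate in rates.items()),
        Decimal(0),
    )
    return total / Decimal(1_000_000)
```

For example, 10,000 input, 2,000 output, 50,000 cache-read and 5,000 cache-write tokens cost $0.03 + $0.03 + $0.015 + $0.01875 = $0.09375.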
--- a/apps/bootstrap-installer/src-tauri/src/paths.rs
+++ b/apps/bootstrap-installer/src-tauri/src/paths.rs
@ -77,6 +77,19 @@ pub fn installer_dest() -> PathBuf {
    hermes_home().join(name)
 }

+/// Marker the updater writes for the duration of an in-app update and removes
+/// when it finishes (see update.rs `UpdateMarkerGuard`). A freshly-launched
+/// desktop checks this before spawning its own local backend: spawning one
+/// mid-update re-locks the venv shim and triggers `force_kill_other_hermes`,
+/// which then kills that legitimate backend in a respawn loop (#50238).
+///
+/// Lives directly under HERMES_HOME (same rationale as `installer_dest`) so the
+/// Electron desktop — which resolves HERMES_HOME identically and pins it into
+/// the updater's env — agrees on the exact path.
+pub fn update_in_progress_marker() -> PathBuf {
+    hermes_home().join(".hermes-update-in-progress")
+}
+
 /// Copy the currently-running installer binary to `installer_dest()` so it's
 /// available for future `--update` runs and shortcut launches.
 ///
--- a/apps/bootstrap-installer/src-tauri/src/update.rs
+++ b/apps/bootstrap-installer/src-tauri/src/update.rs
@ -103,9 +103,61 @@ pub async fn start_update(app: AppHandle) -> Result<(), String> {
    Ok(())
 }

+/// RAII guard that owns the "update in progress" marker (see
+/// `paths::update_in_progress_marker`). Created at the top of `run_update`;
+/// its `Drop` removes the marker on EVERY exit path — success, early
+/// `return Err`, or a panic that unwinds through `run_update` — so a crashed
+/// or aborted updater can never permanently strand the marker and block
+/// future desktop launches. The marker payload is `{pid}\n{started_at_unix}`
+/// so the desktop's launch gate can detect a stale marker (dead PID / past a
+/// hard ceiling) and self-heal rather than wait forever.
+struct UpdateMarkerGuard {
+    path: PathBuf,
+}
+
+impl UpdateMarkerGuard {
+    /// Write the marker. Best-effort: a write failure must NOT abort the
+    /// update (the gate degrades to "no marker => proceed", i.e. exactly the
+    /// pre-fix behavior), so we log and carry on with a guard that still
+    /// attempts cleanup of whatever may exist at the path.
+    fn acquire(path: PathBuf) -> Self {
+        let pid = std::process::id();
+        let started_at = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.as_secs())
+            .unwrap_or(0);
+        if let Some(parent) = path.parent() {
+            let _ = std::fs::create_dir_all(parent);
+        }
+        if let Err(err) = std::fs::write(&path, format!("{pid}\n{started_at}")) {
+            tracing::warn!(?path, %err, "could not write update-in-progress marker");
+        }
+        Self { path }
+    }
+}
+
+impl Drop for UpdateMarkerGuard {
+    fn drop(&mut self) {
+        if let Err(err) = std::fs::remove_file(&self.path) {
+            if err.kind() != std::io::ErrorKind::NotFound {
+                tracing::warn!(path = ?self.path, %err, "could not remove update-in-progress marker");
+            }
+        }
+    }
+}
+
 async fn run_update(app: AppHandle) -> Result<()> {
    let hermes_home = crate::paths::hermes_home();
    let install_root = hermes_home.join("hermes-agent");
+
+    // Mutual exclusion (#50238): publish an "update in progress" marker for the
+    // entire duration of this update. A desktop instance the user relaunches
+    // mid-update consults this before spawning its own local backend — without
+    // it, that backend re-locks the venv shim, our `force_kill_other_hermes`
+    // straggler-cleanup kills it, and the relaunch/kill cycle loops. The guard
+    // removes the marker on every exit path (incl. early returns / panics).
+    let _update_marker = UpdateMarkerGuard::acquire(crate::paths::update_in_progress_marker());
+
    let update_branch = update_branch_from_args(std::env::args().skip(1))
        .or_else(|| option_env_string("BUILD_PIN_BRANCH"))
        .unwrap_or_else(|| "main".to_string());
@ -518,11 +570,13 @@ fn format_locked_paths(paths: &[PathBuf]) -> String {
 /// taskkill, excluding our own PID.
 ///
 /// Safe w.r.t. our own update child: this runs inside the install-lock wait,
-/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. At this
-/// point no update-driven hermes.exe exists yet, so the only hermes.exe images
-/// are stragglers from the old desktop — exactly what we want gone. (`/FI PID
-/// ne <self>` also spares this Tauri process, though it isn't named
-/// hermes.exe.)
+/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. And a
+/// desktop the user relaunches mid-update will NOT have spawned a backend —
+/// `startHermes()` in the desktop gates local-backend startup on our
+/// update-in-progress marker and parks until we finish (#50238). So the only
+/// hermes.exe images here are stragglers from the old desktop — exactly what
+/// we want gone. (`/FI PID ne <self>` also spares this Tauri process, though it
+/// isn't named hermes.exe.)
 fn force_kill_other_hermes() {
    if !cfg!(target_os = "windows") {
        return;
@ -992,6 +1046,48 @@ mod tests {
        assert!(locked_paths(&probes).is_empty());
    }

+    #[test]
+    fn update_marker_guard_writes_then_removes_on_drop() {
+        let dir = unique_tmp_dir("marker-guard");
+        std::fs::create_dir_all(&dir).unwrap();
+        let marker = dir.join(".hermes-update-in-progress");
+
+        {
+            let _g = UpdateMarkerGuard::acquire(marker.clone());
+            assert!(marker.exists(), "marker must exist while the guard is held");
+            let body = std::fs::read_to_string(&marker).unwrap();
+            let pid_line = body.lines().next().unwrap();
+            assert_eq!(
+                pid_line.trim().parse::<u32>().unwrap(),
+                std::process::id(),
+                "marker records our pid so the desktop can probe liveness"
+            );
+            assert_eq!(body.lines().count(), 2, "marker is pid + started_at lines");
+        }
+
+        assert!(
+            !marker.exists(),
+            "Drop must remove the marker on every exit path (incl. early return / panic unwind)"
+        );
+        let _ = std::fs::remove_dir_all(&dir);
+    }
+
+    #[test]
+    fn update_marker_guard_drop_is_quiet_when_already_gone() {
+        let dir = unique_tmp_dir("marker-guard-gone");
+        std::fs::create_dir_all(&dir).unwrap();
+        let marker = dir.join(".hermes-update-in-progress");
+
+        let guard = UpdateMarkerGuard::acquire(marker.clone());
+        // Simulate an external cleanup (e.g. the desktop pruned a marker it
+        // judged stale) before our guard drops — Drop must not panic.
+        std::fs::remove_file(&marker).unwrap();
+        drop(guard);
+
+        assert!(!marker.exists());
+        let _ = std::fs::remove_dir_all(&dir);
+    }
+
    #[test]
    fn parses_update_branch_from_space_or_equals_args() {
        assert_eq!(
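The two halves of the #50238 fix fit together like this. The updater holds a `{pid}\n{started_at_unix}` marker for the whole update, and the desktop gate (in `electron/main.cjs`, not shown here) treats a dead PID or an age past a hard ceiling as stale. The real code is Rust and JS. Below is a Python sketch of both halves, assuming the PID liveness probe is injected as a callable and the ceiling is a caller-chosen number. `update_marker`, `marker_is_stale` and `ceiling_s` are hypothetical.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def update_marker(path: Path) -> Iterator[Path]:
    """Hold an update-in-progress marker (pid line, start-time line) for the block."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"{os.getpid()}\n{int(time.time())}")
    except OSError:
        pass  # best-effort: a failed write must not abort the update
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, like the Rust guard's Drop.
        try:
            path.unlink()
        except OSError:
            pass  # already gone (e.g. pruned as stale) or not removable


def marker_is_stale(
    body: str, now: float, pid_alive: Callable[[int], bool], ceiling_s: float
) -> bool:
    """Desktop-side check: dead owner PID or age past the ceiling means stale."""
    fields = body.split()
    try:
        pid, started_at = int(fields[0]), int(fields[1])
    except (IndexError, ValueError):
        return True  # assumption: an unreadable marker is treated as stale
    return not pid_alive(pid) or now - started_at > ceiling_s
```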
--- a/apps/desktop/README.md
+++ b/apps/desktop/README.md
@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig

 ### How it works

-The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
+The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.

 ### Verification

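The README gives the desktop's backend resolution as an ordered chain:

1. `HERMES_DESKTOP_HERMES_ROOT`
2. a completed managed install
3. a `hermes` found on `PATH` (skipped when `HERMES_DESKTOP_IGNORE_EXISTING=1`)
4. `HERMES_DESKTOP_HERMES`

Below is a Python sketch of that precedence; the real logic is JavaScript in `electron/main.cjs`. It makes three assumptions. `managed_install` is the path of a completed install or `None`. The PATH probe is reduced to `shutil.which`, whereas the real probe may also run the binary. When nothing resolves, the app falls through to its first-launch installer.

```python
import shutil
from typing import Callable, Mapping, Optional, Tuple


def resolve_backend(
    env: Mapping[str, str],
    managed_install: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[str, str]:
    """Return (source, location) following the README's precedence order."""
    root = env.get("HERMES_DESKTOP_HERMES_ROOT")
    if root:
        return ("root", root)
    if managed_install:  # path of a completed managed install, else None
        return ("managed", managed_install)
    if env.get("HERMES_DESKTOP_IGNORE_EXISTING") != "1":
        found = which("hermes")  # stand-in for the real PATH probe
        if found:
            return ("path", found)
    override = env.get("HERMES_DESKTOP_HERMES")
    if override:
        return ("override", override)
    return ("install", "")  # assumption: fall through to first-launch install
```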
--- a/apps/desktop/electron/backend-ready.cjs
+++ b/apps/desktop/electron/backend-ready.cjs
@ -1,5 +1,32 @@
 const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m

+// The announcement clock starts the instant the backend process is spawned —
+// before uvicorn binds its socket. On a cold install the child must first
+// compile and import the whole `hermes_cli.main` → `web_server` → FastAPI/
+// uvicorn chain, and on Windows real-time AV (Defender) scans every freshly
+// written `.pyc`. That pre-bind cost can run 30-60s on a slow disk, so a tight
+// 45s deadline kills a *healthy but still-starting* backend and respawns it,
+// piling up orphaned processes (issue #50209). A roomier default absorbs the
+// cold-start cost; a warm start still announces in well under a second.
+const DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS = 90_000
+// Never trust a deadline tighter than the warm-start path needs; floor at 45s
+// (the historical default) so a malformed override can't reintroduce the loop.
+const MIN_PORT_ANNOUNCE_TIMEOUT_MS = 45_000
+
+/**
+ * Resolve the port-announcement deadline. Honors the
+ * HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS env override (for users on slow
+ * disks / aggressive AV who need an even longer cold-start window), clamped
+ * to a sane floor so a bad value can't make boot flakier than the default.
+ */
+function resolvePortAnnounceTimeoutMs(env = process.env) {
+  const parsed = Number(env.HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS)
+  if (Number.isFinite(parsed) && parsed > 0) {
+    return Math.max(MIN_PORT_ANNOUNCE_TIMEOUT_MS, Math.round(parsed))
+  }
+  return DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS
+}
+
 /**
 * Watch a child process's stdout for the `HERMES_DASHBOARD_READY port=<N>`
 * line that web_server.py prints after uvicorn binds its socket.
@ -9,11 +36,15 @@ const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
 *   - the child emits an `error` event
 *   - no line arrives within the timeout
 *
+ * The default timeout is cold-start tolerant (see
+ * DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS) because the clock starts before the
+ * backend has even bound its port. Pass an explicit `timeoutMs` to override.
+ *
 * A single `cleanup()` tears down every listener (data/exit/error/timeout)
 * on every terminal path — resolve, reject, or timeout — so repeated
 * backend spawns don't leak listener slots on the child.
 */
-function waitForDashboardPort(child, timeoutMs = 45_000) {
+function waitForDashboardPort(child, timeoutMs = resolvePortAnnounceTimeoutMs()) {
  return new Promise((resolve, reject) => {
    let buf = ''
    let done = false
@ -63,4 +94,9 @@ function waitForDashboardPort(child, timeoutMs = 45_000) {
  })
 }

-module.exports = { waitForDashboardPort }
+module.exports = {
+  waitForDashboardPort,
+  resolvePortAnnounceTimeoutMs,
+  DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+  MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+}
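`resolvePortAnnounceTimeoutMs` and the READY-line watch are small enough to mirror in Python. The names below are hypothetical. `float()` only approximates JS `Number()` parsing; for example, it accepts `1_000` where JS yields `NaN`. `math.floor(x + 0.5)` stands in for `Math.round` because Python's `round()` rounds halves to even. Rather than matching against the whole buffer, this watcher only inspects newline-terminated lines, so a port whose digits arrive in two chunks is never read early. Whether the JS helper guards against that is not visible in this hunk.

```python
import math
import re
from typing import Mapping, Optional

DEFAULT_TIMEOUT_MS = 90_000
MIN_TIMEOUT_MS = 45_000
_READY_LINE = re.compile(r"^HERMES_DASHBOARD_READY port=(\d+)")


def resolve_timeout_ms(env: Mapping[str, str]) -> int:
    """Honor the env override, clamped to the 45s floor; else the 90s default."""
    raw = env.get("HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS", "")
    try:
        parsed = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_TIMEOUT_MS
    if math.isfinite(parsed) and parsed > 0:
        # floor(x + 0.5) matches JS Math.round for positive values.
        return max(MIN_TIMEOUT_MS, math.floor(parsed + 0.5))
    return DEFAULT_TIMEOUT_MS


class ReadyLineWatcher:
    """Buffer stdout chunks and report the announced port from complete lines."""

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, chunk: str) -> Optional[int]:
        self._buf += chunk
        *complete, self._buf = self._buf.split("\n")  # keep the partial tail
        for line in complete:
            match = _READY_LINE.match(line)
            if match:
                return int(match.group(1))
        return None
```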
--- a/apps/desktop/electron/backend-ready.test.cjs
+++ b/apps/desktop/electron/backend-ready.test.cjs
@ -0,0 +1,121 @@
+/**
+ * Tests for electron/backend-ready.cjs.
+ *
+ * Run with: node --test electron/backend-ready.test.cjs
+ * (Wired into npm test:desktop:platforms in package.json.)
+ *
+ * Covers the cold-start port-announcement deadline (issue #50209): the clock
+ * starts before the backend binds its port, so a tight 45s deadline killed a
+ * healthy-but-still-compiling backend on cold Windows installs. The default is
+ * now cold-start tolerant and overridable via
+ * HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const { EventEmitter } = require('node:events')
+
+const {
+  waitForDashboardPort,
+  resolvePortAnnounceTimeoutMs,
+  DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+  MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+} = require('./backend-ready.cjs')
+
+// A minimal stand-in for a spawned child process: an EventEmitter with a
+// stdout EventEmitter, matching the surface waitForDashboardPort consumes
+// (child.stdout.on('data'), child.on('exit'|'error') + the .off() teardown).
+function makeFakeChild() {
+  const child = new EventEmitter()
+  child.stdout = new EventEmitter()
+  return child
+}
+
+// ---------------------------------------------------------------------------
+// resolvePortAnnounceTimeoutMs
+// ---------------------------------------------------------------------------
+
+test('default is cold-start tolerant (> the historical 45s floor)', () => {
+  assert.equal(resolvePortAnnounceTimeoutMs({}), DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS)
+  assert.ok(
+    DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS > MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+    'cold-start default must exceed the warm-start floor'
+  )
+})
+
+test('honors a valid HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS override', () => {
+  const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '120000' }
+  assert.equal(resolvePortAnnounceTimeoutMs(env), 120_000)
+})
+
+test('clamps an override below the floor up to the 45s minimum', () => {
+  const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '1000' }
+  assert.equal(resolvePortAnnounceTimeoutMs(env), MIN_PORT_ANNOUNCE_TIMEOUT_MS)
+})
+
+test('rounds a fractional override', () => {
+  const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '60000.7' }
+  assert.equal(resolvePortAnnounceTimeoutMs(env), 60_001)
+})
+
+test('falls back to the default for malformed / non-positive overrides', () => {
+  for (const bad of ['', 'abc', '0', '-5', 'NaN', undefined]) {
+    const env = bad === undefined ? {} : { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: bad }
+    assert.equal(
+      resolvePortAnnounceTimeoutMs(env),
+      DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+      `override ${JSON.stringify(bad)} should fall through to the default`
+    )
+  }
+})
+
+// ---------------------------------------------------------------------------
+// waitForDashboardPort
+// ---------------------------------------------------------------------------
+
+test('resolves with the announced port', async () => {
+  const child = makeFakeChild()
+  const p = waitForDashboardPort(child, 1000)
+  child.stdout.emit('data', 'noise before\nHERMES_DASHBOARD_READY port=54321\n')
+  assert.equal(await p, 54321)
+})
+
+test('parses the port even when the line arrives split across chunks', async () => {
+  const child = makeFakeChild()
+  const p = waitForDashboardPort(child, 1000)
+  child.stdout.emit('data', 'HERMES_DASHBOARD_READY po')
+  child.stdout.emit('data', 'rt=8080\n')
+  assert.equal(await p, 8080)
+})
+
+test('rejects when the child exits before announcing', async () => {
+  const child = makeFakeChild()
+  const p = waitForDashboardPort(child, 1000)
+  child.emit('exit', 1, null)
+  await assert.rejects(p, /exited before port announcement/)
+})
+
+test('rejects on a child error event', async () => {
+  const child = makeFakeChild()
+  const p = waitForDashboardPort(child, 1000)
+  child.emit('error', new Error('spawn ENOENT'))
+  await assert.rejects(p, /spawn ENOENT/)
+})
+
+test('rejects with the timeout message after the deadline', async () => {
+  const child = makeFakeChild()
+  await assert.rejects(
+    waitForDashboardPort(child, 20),
+    /Timed out waiting for Hermes backend port announcement \(20ms\)/
+  )
+})
+
+test('a late announcement after timeout does not throw (listeners torn down)', async () => {
+  const child = makeFakeChild()
+  await assert.rejects(waitForDashboardPort(child, 20), /Timed out/)
+  // The orphaned backend may still print its READY line later; the watcher
+  // must have detached so this emit is a no-op rather than a double-settle.
+  assert.doesNotThrow(() => {
+    child.stdout.emit('data', 'HERMES_DASHBOARD_READY port=9999\n')
+  })
+})
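The tests above pin down the override rules for `HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS`: a positive numeric value is rounded and clamped up to the 45s floor, and anything unset, empty, non-numeric, zero, or negative falls back to the cold-start default. A minimal sketch that satisfies those cases follows. The function name is hypothetical, and because the real default is not shown in this diff, `DEFAULT_PORT_TIMEOUT_MS = 180_000` is an assumed placeholder.

```javascript
// Only the 45s floor is stated in the source; the cold-start default below is
// an assumed placeholder, and the function name is hypothetical.
const MIN_PORT_TIMEOUT_MS = 45_000
const DEFAULT_PORT_TIMEOUT_MS = 180_000

function resolvePortTimeoutSketch(env = process.env) {
  const n = Number(env.HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS)
  // Number(undefined) and Number('abc') are NaN and Number('') is 0, so unset,
  // empty, non-numeric, zero and negative values all take the default.
  if (!Number.isFinite(n) || n <= 0) return DEFAULT_PORT_TIMEOUT_MS
  // Round fractional milliseconds, then enforce the warm-start floor.
  return Math.max(MIN_PORT_TIMEOUT_MS, Math.round(n))
}
```

Checking `Number.isFinite` before clamping matters: `Math.max(45_000, NaN)` is `NaN`, so a malformed value would otherwise leak through as an unusable timeout.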
--- a/apps/desktop/electron/link-title-window.cjs
+++ b/apps/desktop/electron/link-title-window.cjs
@ -0,0 +1,42 @@
+'use strict'
+
+// Hidden BrowserWindow used by tier-2 link-title resolution: when curl can't
+// read a page <title> (bot walls, JS-rendered pages), we briefly load the URL
+// in an offscreen window and read its title. That window loads arbitrary
+// user-linked pages — including YouTube/`watch` URLs that autoplay — so it must
+// never be allowed to emit sound.
+
+function linkTitleWindowOptions(partitionSession) {
+  return {
+    show: false,
+    width: 1280,
+    height: 800,
+    webPreferences: {
+      backgroundThrottling: false,
+      contextIsolation: true,
+      javascript: true,
+      nodeIntegration: false,
+      sandbox: true,
+      session: partitionSession,
+      webSecurity: true
+    }
+  }
+}
+
+// Create the offscreen title-fetch window and immediately mute it. Without the
+// mute, autoplaying media on the loaded page (e.g. a YouTube link) leaks ~2s of
+// audio every time a session containing such links is re-rendered. See #49505.
+function createLinkTitleWindow(BrowserWindow, partitionSession) {
+  const window = new BrowserWindow(linkTitleWindowOptions(partitionSession))
+
+  try {
+    window.webContents.setAudioMuted(true)
+  } catch {
+    // webContents may be unavailable in degraded/headless environments; muting
+    // is best-effort and the window is destroyed within a few seconds anyway.
+  }
+
+  return window
+}
+
+module.exports = { createLinkTitleWindow, linkTitleWindowOptions }
--- a/apps/desktop/electron/link-title-window.test.cjs
+++ b/apps/desktop/electron/link-title-window.test.cjs
@ -0,0 +1,56 @@
+const assert = require('node:assert/strict')
+const test = require('node:test')
+
+const { createLinkTitleWindow, linkTitleWindowOptions } = require('./link-title-window.cjs')
+
+function makeFakeBrowserWindow() {
+  const calls = { audioMuted: [] }
+  const FakeBrowserWindow = function (options) {
+    this.options = options
+    this.webContents = {
+      setAudioMuted(value) {
+        calls.audioMuted.push(value)
+      }
+    }
+  }
+
+  return { FakeBrowserWindow, calls }
+}
+
+test('linkTitleWindowOptions keeps the offscreen, hardened defaults', () => {
+  const session = { id: 'link-titles' }
+  const options = linkTitleWindowOptions(session)
+
+  assert.equal(options.show, false)
+  assert.equal(options.webPreferences.session, session)
+  assert.equal(options.webPreferences.contextIsolation, true)
+  assert.equal(options.webPreferences.sandbox, true)
+  assert.equal(options.webPreferences.nodeIntegration, false)
+})
+
+test('createLinkTitleWindow mutes audio so historical links never autoplay sound', () => {
+  // Regression for #49505: the hidden title-fetch window loaded YouTube/watch
+  // URLs (to read their <title>) without muting, leaking ~2s of audio on every
+  // history re-render.
+  const { FakeBrowserWindow, calls } = makeFakeBrowserWindow()
+
+  const window = createLinkTitleWindow(FakeBrowserWindow, { id: 'link-titles' })
+
+  assert.ok(window instanceof FakeBrowserWindow)
+  assert.deepEqual(calls.audioMuted, [true])
+})
+
+test('createLinkTitleWindow still returns the window if muting throws', () => {
+  const ThrowingBrowserWindow = function (options) {
+    this.options = options
+    this.webContents = {
+      setAudioMuted() {
+        throw new Error('webContents unavailable')
+      }
+    }
+  }
+
+  const window = createLinkTitleWindow(ThrowingBrowserWindow, { id: 'link-titles' })
+
+  assert.ok(window instanceof ThrowingBrowserWindow)
+})
--- a/apps/desktop/electron/main.cjs
+++ b/apps/desktop/electron/main.cjs
@ -34,6 +34,7 @@ const {
  SESSION_WINDOW_MIN_WIDTH
 } = require('./session-windows.cjs')
 const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
+const { createLinkTitleWindow } = require('./link-title-window.cjs')
 const { probeGatewayWebSocket } = require('./gateway-ws-probe.cjs')
 const { adoptServedDashboardToken } = require('./dashboard-token.cjs')
 const { waitForDashboardPort } = require('./backend-ready.cjs')
@ -42,6 +43,16 @@ const { fetchMarketplaceThemes, searchMarketplaceThemes } = require('./vscode-ma
 const { buildDesktopBackendEnv, normalizeHermesHomeRoot } = require('./backend-env.cjs')
 const { readWindowsUserEnvVar } = require('./windows-user-env.cjs')
 const { readDirForIpc } = require('./fs-read-dir.cjs')
+const { readLiveUpdateMarker } = require('./update-marker.cjs')
+const {
+  resolveUnpackedRelease,
+  decideRelaunchOutcome,
+  sandboxPreflight,
+  sandboxFallbackFromEnv,
+  collectRelaunchArgs,
+  collectRelaunchEnv,
+  buildRelaunchScript
+} = require('./update-relaunch.cjs')
 const { gitRootForIpc } = require('./git-root.cjs')
 const { worktreesForIpc } = require('./git-worktrees.cjs')
 const { OFFICIAL_REPO_HTTPS_URL, isOfficialSshRemote } = require('./update-remote.cjs')
@ -150,6 +161,8 @@ if (REMOTE_DISPLAY_REASON) {
  )
 }

+ipcMain.handle('hermes:get-remote-display-reason', () => REMOTE_DISPLAY_REASON)
+
 // Keep the renderer running at full speed while the window is in the background
 // or occluded. The chat transcript streams to screen through a
 // requestAnimationFrame-gated flush; Chromium pauses rAF (and clamps timers)
@ -268,6 +281,23 @@ function resolveHermesHome() {
 }

 const HERMES_HOME = resolveHermesHome()
+
+function hermesManagedNodePathEntries() {
+  // NOTE: keep this ordering in sync with iter_hermes_node_dirs() in
+  // hermes_constants.py — this Node main process cannot import the Python
+  // module, so the platform-ordering rule is mirrored here.
+  const root = path.join(HERMES_HOME, 'node')
+  const bin = path.join(root, 'bin')
+  const entries = IS_WINDOWS ? [root, bin] : [bin, root]
+  return entries.filter(directoryExists)
+}
+
+function pathWithHermesManagedNode(...entries) {
+  return [...hermesManagedNodePathEntries(), ...entries, process.env.PATH]
+    .filter(Boolean)
+    .join(path.delimiter)
+}
+
 // ACTIVE_HERMES_ROOT — the canonical mutable Hermes install. Same path
 // install.ps1 / install.sh use, so a desktop-only user and a CLI-only user end
 // up with identical layouts and can share one install.
@ -1090,6 +1120,59 @@ function directoryExists(filePath) {
  }
 }

+// --- in-app update mutual exclusion (#50238) -------------------------------
+// The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
+// duration of an `--update` run (see update.rs UpdateMarkerGuard). If the user
+// relaunches the desktop mid-update — because the window vanished with no
+// progress and looks crashed — a fresh instance must NOT spawn its own local
+// backend: that backend re-locks the venv shim, the updater's straggler cleanup
+// (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
+// fails with the 45s "backend didn't come up" error, and the relaunch/kill
+// cycle loops. Instead the fresh instance parks until the update finishes, then
+// brings the backend up itself (it is the surviving instance — the updater's
+// own relaunch hits our single-instance lock and quits). Marker parsing +
+// staleness self-heal live in update-marker.cjs (unit-tested).
+
+// How long we'll park the launch waiting for a live update to finish before
+// giving up and starting the backend anyway (belt-and-suspenders alongside the
+// marker's own age ceiling; covers a stuck-but-alive updater).
+const UPDATE_WAIT_TIMEOUT_MS = 20 * 60 * 1000
+const UPDATE_WAIT_POLL_MS = 1000
+// How long the desktop lingers on the "updating, don't reopen" overlay after
+// spawning the detached updater, before it quits to release the venv shim. The
+// old 600ms was long enough to register the child process but far too short for
+// the user to READ the overlay — the window just vanished, looked like a crash,
+// and the user relaunched mid-update (the #50238 restart-loop trigger). A
+// couple of seconds lets the message land and bridges the gap until the
+// updater's own progress window appears. (#50419)
+const UPDATE_HANDOFF_DWELL_MS = 2500
+
+// Block until no live update is in progress (or we hit the wait timeout).
+// Emits a boot-progress phase so the renderer shows "Update in progress…"
+// rather than a frozen splash. Returns true if it parked at all.
+async function waitForUpdateToFinish() {
+  let marker = readLiveUpdateMarker(HERMES_HOME)
+  if (!marker) return false
+
+  rememberLog(`[updates] update in progress (pid=${marker.pid}); deferring backend start until it finishes`)
+  const deadline = Date.now() + UPDATE_WAIT_TIMEOUT_MS
+  while (marker && Date.now() < deadline) {
+    await advanceBootProgress(
+      'backend.update-wait',
+      'An update is finishing — Hermes will start automatically when it completes…',
+      12
+    )
+    await new Promise(r => setTimeout(r, UPDATE_WAIT_POLL_MS))
+    marker = readLiveUpdateMarker(HERMES_HOME)
+  }
+  if (marker) {
+    rememberLog('[updates] update still in progress after wait timeout; starting backend anyway')
+  } else {
+    rememberLog('[updates] update finished; proceeding with backend start')
+  }
+  return true
+}
+
 function unpackedPathFor(filePath) {
  return filePath.replace(/app\.asar(?=$|[\\/])/, 'app.asar.unpacked')
 }
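`waitForUpdateToFinish()` in the hunk above is a park-and-poll loop: read the marker, and while it still names a live update and the 20-minute wait deadline has not passed, report progress, sleep `UPDATE_WAIT_POLL_MS`, and re-read. Because it calls `readLiveUpdateMarker`, `advanceBootProgress`, and `setTimeout` directly, it is awkward to exercise in isolation. The sketch below is a hypothetical dependency-injected variant (`parkWhileUpdating` and its `readMarker`, `sleep`, `now`, and `onWait` parameters are assumptions) that makes the loop's two exits, update finished versus deadline reached, easy to test with a fake clock.

```javascript
// Hypothetical, dependency-injected variant of waitForUpdateToFinish() in
// main.cjs. Not the shipped code: readMarker/sleep/now/onWait stand in for
// readLiveUpdateMarker, setTimeout, Date.now and advanceBootProgress.
async function parkWhileUpdating({ readMarker, sleep, now, timeoutMs, pollMs, onWait }) {
  let marker = readMarker()
  // No live update: the caller can start the backend immediately.
  if (!marker) return { parked: false, timedOut: false }

  const deadline = now() + timeoutMs
  while (marker && now() < deadline) {
    onWait(marker) // e.g. emit the "update in progress" boot phase
    await sleep(pollMs)
    marker = readMarker()
  }
  // A marker still present here means we gave up on a stuck-but-alive updater.
  return { parked: true, timedOut: Boolean(marker) }
}

// Production wiring might look like:
// const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
// await parkWhileUpdating({ readMarker: () => readLiveUpdateMarker(HERMES_HOME),
//   sleep, now: Date.now, timeoutMs: 20 * 60 * 1000, pollMs: 1000, onWait: () => {} })
```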
@ -1801,7 +1884,11 @@ async function applyUpdates(opts = {}) {
      return { ok: true, manual: true, command, hermesRoot: updateRoot }
    }

-    emitUpdateProgress({ stage: 'restart', message: 'Handing off to the Hermes updater…', percent: 100 })
+    emitUpdateProgress({
+      stage: 'restart',
+      message: 'Updating Hermes — this window will close and the updater will open. Don’t reopen Hermes yourself; it restarts automatically when the update finishes.',
+      percent: 100
+    })
    repairMacUpdaterHelper(updater)

    const updateRoot = resolveUpdateRoot()
@ -1827,7 +1914,7 @@ async function applyUpdates(opts = {}) {
      env: {
        ...process.env,
        HERMES_HOME,
-        PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
+        PATH: pathWithHermesManagedNode(venvBin)
      },
      detached: true,
      stdio: 'ignore',
@ -1837,11 +1924,14 @@ async function applyUpdates(opts = {}) {

    rememberLog(`[updates] launched updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release venv shim`)

-    // Give the OS a beat to register the new process, then quit. The updater
-    // rebuilds and relaunches us when it's done.
+    // Linger on the "updating — don't reopen" overlay long enough for the user
+    // to actually read it (and to bridge the gap until the updater's own window
+    // appears), THEN quit to release the venv shim. The updater rebuilds and
+    // relaunches us when it's done. (#50419 — a 600ms quit looked like a crash
+    // and lured users into the #50238 relaunch loop.)
    setTimeout(() => {
      app.quit()
-    }, 600)
+    }, UPDATE_HANDOFF_DWELL_MS)

    return { ok: true, handedOff: true, updater }
  } finally {
@ -1871,7 +1961,7 @@ async function handOffWindowsBootstrapRecovery(reason) {
    env: {
      ...process.env,
      HERMES_HOME,
-      PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
+      PATH: pathWithHermesManagedNode(venvBin)
    },
    detached: true,
    stdio: 'ignore',
@ -1880,9 +1970,12 @@ async function handOffWindowsBootstrapRecovery(reason) {
  child.unref()

  rememberLog(`[bootstrap] handed off ${reason} recovery to updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release app.asar`)
+  // Same dwell as the in-app update hand-off (#50419): give the updater's
+  // window time to appear before we vanish, so the recovery doesn't look like
+  // a crash and provoke a mid-recovery relaunch.
  setTimeout(() => {
    app.quit()
-  }, 600)
+  }, UPDATE_HANDOFF_DWELL_MS)

  return true
 }
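The updater hand-offs above now build `PATH` through `pathWithHermesManagedNode()`, which prepends the managed Node directories (`node` before `node/bin` on Windows, the reverse elsewhere, keeping only directories that exist) ahead of the caller's entries and the inherited `PATH`. The following is a hypothetical pure variant for illustration: the name `managedNodePath` and its injected `isWindows`, `existingDirs`, and `envPath` options are assumptions, and it uses `path.win32`/`path.posix` explicitly so both orderings can be checked on any host.

```javascript
const path = require('node:path')

// Hypothetical pure variant of hermesManagedNodePathEntries() +
// pathWithHermesManagedNode(): platform, existence check and inherited PATH
// are injected instead of read from the process and filesystem.
function managedNodePath(hermesHome, extraEntries, { isWindows, existingDirs, envPath }) {
  const p = isWindows ? path.win32 : path.posix
  const root = p.join(hermesHome, 'node')
  const bin = p.join(root, 'bin')
  // Windows portable Node sits directly in node\; POSIX installs use node/bin.
  const managed = (isWindows ? [root, bin] : [bin, root]).filter(dir => existingDirs.has(dir))
  // filter(Boolean) drops an empty or missing inherited PATH.
  return [...managed, ...extraEntries, envPath].filter(Boolean).join(p.delimiter)
}
```

Filtering on existence keeps `PATH` free of dead entries on installs that never provisioned a managed Node.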
@ -1952,13 +2045,11 @@ async function applyUpdatesPosixInApp() {
  }

  // Put the Hermes-managed Node and the venv on PATH so `hermes desktop`'s
-  // npm build can find them on a machine with no system Node.
-  const extraPath = [path.join(HERMES_HOME, 'node', 'bin'), path.join(updateRoot, 'venv', 'bin')]
-    .filter(Boolean)
-    .join(path.delimiter)
+  // npm build can find them on a machine with no system Node. Windows portable
+  // Node lives directly under %LOCALAPPDATA%\hermes\node, not node\bin.
  const env = {
    HERMES_HOME,
-    PATH: [extraPath, process.env.PATH].filter(Boolean).join(path.delimiter)
+    PATH: pathWithHermesManagedNode(path.join(updateRoot, 'venv', 'bin'))
  }

  // `hermes update` reaps stale `hermes dashboard` backends (a code update
@ -2028,6 +2119,114 @@ async function applyUpdatesPosixInApp() {
    return { ok: false, backendUpdated: true, error: 'desktop rebuild failed' }
  }

+  // Linux in-app update terminal state (#45205). `hermes desktop --build-only`
+  // rebuilds the unpacked app in place under apps/desktop/release/<plat>-unpacked.
+  // We can only HONESTLY relaunch into the new GUI when the *running* binary IS
+  // that rebuilt one — i.e. execPath lives under release/<plat>-unpacked. The
+  // outcome is decided by three signals (see update-relaunch.cjs):
+  //
+  //   underUnpacked + sandboxOk  → 'relaunch': detached watcher re-execs us in
+  //       place (mirrors the macOS handoff). Without it the update succeeds but
+  //       the app never restarts and the overlay hangs on "applying" forever.
+  //   !underUnpacked             → 'guiSkew': the running shell is an AppImage/
+  //       .deb/.rpm/dev/unresolved binary we did NOT replace. Claiming "loads
+  //       next launch" is a lie (GUI/backend skew, #37541) — surface an
+  //       explicit closeable terminal state telling the user the GUI package
+  //       was NOT changed and must be updated/reinstalled.
+  //   underUnpacked + !sandboxOk → 'manual': we'd be relaunching the rebuilt
+  //       binary, but a fresh rebuild can leave chrome-sandbox without
+  //       root:root + setuid (mode 4755) and Electron then refuses to launch
+  //       ("quit and never came back"). DO NOT quit into a dead app — keep the
+  //       working window and surface the closeable manual-restart state.
+  if (!IS_MAC) {
+    const unpackedDir = resolveUnpackedRelease(process.execPath, updateRoot, process.platform)
+    const underUnpacked = unpackedDir !== null
+
+    const preflight = underUnpacked
+      ? sandboxPreflight(unpackedDir, p => fs.statSync(p))
+      : { ok: false, reason: 'not-under-unpacked', path: null }
+    const sandboxFallback = sandboxFallbackFromEnv(process.env, process.argv.slice(1))
+    const sandboxOk = preflight.ok || sandboxFallback
+    if (underUnpacked && !preflight.ok) {
+      rememberLog(
+        `[updates] sandbox preflight: not launchable (${preflight.reason}) at ${preflight.path}; ` +
+          `fallback=${sandboxFallback ? 'env/--no-sandbox' : 'none'}`
+      )
+    }
+
+    const outcome = decideRelaunchOutcome({ underUnpacked, sandboxOk })
+
+    if (outcome === 'relaunch') {
+      emitUpdateProgress({ stage: 'restart', message: 'Restarting Hermes…', percent: 100 })
+      // Preserve launch context across the re-exec: replay the original args
+      // (filtered of Electron internals) and the env/cwd that define which
+      // backend/profile/root this instance talks to. Without this the
+      // relaunched instance comes up with default context instead of the user's.
+      const relaunchArgs = collectRelaunchArgs(process.argv.slice(1))
+      const relaunchEnv = collectRelaunchEnv(process.env)
+      const relaunchScript = buildRelaunchScript({
+        pid: process.pid,
+        execPath: process.execPath,
+        args: relaunchArgs,
+        env: relaunchEnv,
+        cwd: process.cwd()
+      })
+      const scriptPath = path.join(app.getPath('temp'), `hermes-desktop-update-${Date.now()}.sh`)
+      try {
+        fs.writeFileSync(scriptPath, relaunchScript, { mode: 0o755 })
+        const child = spawn('/bin/bash', [scriptPath], { detached: true, stdio: 'ignore' })
+        child.unref()
+        rememberLog(
+          `[updates] launched linux relaunch: ${scriptPath} -> ${process.execPath} ` +
+            `(args=${relaunchArgs.length}, env=${Object.keys(relaunchEnv).length})`
+        )
+        setTimeout(() => app.quit(), UPDATE_HANDOFF_DWELL_MS)
+        return { ok: true, handedOff: true }
+      } catch (err) {
+        rememberLog(`[updates] linux relaunch failed: ${err.message}; falling back to manual restart`)
+        return {
+          ok: true,
+          backendUpdated: true,
+          guiUpdated: false,
+          manualRestart: true,
+          message: 'Backend updated. Quit and reopen Hermes to load the new version.'
+        }
+      }
+    }
+
+    if (outcome === 'guiSkew') {
+      emitUpdateProgress({
+        stage: 'guiSkew',
+        message:
+          'Backend updated, but the desktop app package was not changed. ' +
+          'Update or reinstall the Hermes desktop app to match.',
+        percent: 100
+      })
+      rememberLog(
+        `[updates] gui/backend skew: execPath ${process.execPath} not under release/*-unpacked; ` +
+          'backend updated, GUI package unchanged (AppImage/.deb/.rpm/dev/unresolved)'
+      )
+      return { ok: true, backendUpdated: true, guiUpdated: false, guiSkew: true }
+    }
+
+    // outcome === 'manual': we're the rebuilt binary, but its sandbox helper is
+    // not launchable and no fallback applies. Keep this working window alive.
+    rememberLog(
+      `[updates] sandbox not launchable (${preflight.reason}); skipping auto-relaunch, ` +
+        'returning manual-restart so the user keeps a working window'
+    )
+    return {
+      ok: true,
+      backendUpdated: true,
+      guiUpdated: false,
+      manualRestart: true,
+      sandboxBlocked: true,
+      message:
+        'Backend updated. The rebuilt app can’t relaunch automatically ' +
+        '(sandbox helper needs root). Quit and reopen Hermes to finish.'
+    }
+  }
+
  const rebuiltApp = [
    path.join(updateRoot, 'apps', 'desktop', 'release', 'mac-arm64', 'Hermes.app'),
    path.join(updateRoot, 'apps', 'desktop', 'release', 'mac', 'Hermes.app')
@ -2963,20 +3162,7 @@ function runRenderTitleJob(rawUrl) {
    }

    try {
-      window = new BrowserWindow({
-        show: false,
-        width: 1280,
-        height: 800,
-        webPreferences: {
-          backgroundThrottling: false,
-          contextIsolation: true,
-          javascript: true,
-          nodeIntegration: false,
-          sandbox: true,
-          session: partitionSession,
-          webSecurity: true
-        }
-      })
+      window = createLinkTitleWindow(BrowserWindow, partitionSession)
    } catch {
      return finish('')
    }
@ -4905,6 +5091,14 @@ async function startHermes() {
      }
    }

+    // Mutual exclusion with an in-app update (#50238). If this instance was
+    // relaunched while the Tauri updater is still applying an update, spawning
+    // a local backend now re-locks the venv shim and gets killed by the
+    // updater's straggler cleanup — looping. Park until the update finishes (or
+    // is detected stale), THEN start the backend. Local backends only; remote
+    // connections returned above and never touch the install tree.
+    await waitForUpdateToFinish()
+
    const token = crypto.randomBytes(32).toString('base64url')
    // --port 0: the OS assigns an ephemeral port; the child announces it on stdout.
    const dashboardArgs = ['dashboard', '--no-open', '--host', '127.0.0.1', '--port', '0']
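The #45205 block in `applyUpdatesPosixInApp` reduces the Linux terminal state to two signals: whether `process.execPath` sits inside the rebuilt `release/<plat>-unpacked` directory, and whether that build's sandbox helper can launch. The real helpers live in `update-relaunch.cjs` and are only partly visible in this diff, so the following are hypothetical sketches of the same three ideas: a path-segment-aware containment check, a `chrome-sandbox` ownership/setuid preflight that mirrors the `(unpackedDir, stat)` call shape and `{ ok, reason, path }` fields `main.cjs` reads, and the three-way outcome table. The helper's location directly under the unpacked directory and the `reason` strings are assumptions, and only POSIX paths are handled.

```javascript
const path = require('node:path')

// Hypothetical sketches, not the update-relaunch.cjs implementations.

// Segment-aware containment: ".../release/linux-unpacked-evil/hermes" must NOT
// count as being inside ".../release/linux-unpacked".
function isInsideDir(filePath, dir) {
  const rel = path.posix.relative(dir, filePath)
  return rel !== '' && rel.split('/')[0] !== '..' && !path.posix.isAbsolute(rel)
}

// Electron's SUID sandbox helper must be owned by root:root with the setuid bit
// set, or the rebuilt binary refuses to start. This sketch checks only owner,
// group and the setuid bit, not the full 4755 permission set.
function sandboxPreflightSketch(unpackedDir, stat) {
  const helper = path.posix.join(unpackedDir, 'chrome-sandbox')
  let st
  try {
    st = stat(helper)
  } catch {
    return { ok: false, reason: 'missing', path: helper }
  }
  const setuid = (st.mode & 0o4000) !== 0
  if (st.uid !== 0 || st.gid !== 0 || !setuid) {
    return { ok: false, reason: 'not-root-setuid', path: helper }
  }
  return { ok: true, reason: null, path: helper }
}

// Not running the rebuilt binary wins over everything else: even a working
// sandbox fallback cannot make an AppImage/.deb/.rpm shell the new GUI.
function decideOutcomeSketch({ underUnpacked, sandboxOk }) {
  if (!underUnpacked) return 'guiSkew'
  return sandboxOk ? 'relaunch' : 'manual'
}
```

Keeping `stat` injected lets the preflight be tested with fake ownership and mode bits instead of a real root-owned binary.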
--- a/apps/desktop/electron/preload.cjs
+++ b/apps/desktop/electron/preload.cjs
@ -166,6 +166,7 @@ contextBridge.exposeInMainWorld('hermesDesktop', {
    return () => ipcRenderer.removeListener('hermes:bootstrap:event', listener)
  },
  getVersion: () => ipcRenderer.invoke('hermes:version'),
+  getRemoteDisplayReason: () => ipcRenderer.invoke('hermes:get-remote-display-reason'),
  uninstall: {
    summary: () => ipcRenderer.invoke('hermes:uninstall:summary'),
    run: mode => ipcRenderer.invoke('hermes:uninstall:run', { mode })
--- a/apps/desktop/electron/update-marker.cjs
+++ b/apps/desktop/electron/update-marker.cjs
@ -0,0 +1,93 @@
+/**
+ * In-app update mutual-exclusion marker (#50238).
+ *
+ * The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
+ * duration of an `--update` run (see apps/bootstrap-installer/src-tauri/src/
+ * update.rs `UpdateMarkerGuard`). The marker body is two lines: the updater's
+ * pid and the unix-seconds it started.
+ *
+ * Why: if the user relaunches the desktop mid-update — the window vanished with
+ * no progress and looks crashed — a fresh instance must NOT spawn its own local
+ * backend. That backend re-locks the venv shim, the updater's straggler cleanup
+ * (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
+ * fails with the 45s "backend didn't come up" timeout, and the user relaunches
+ * into the same trap — an infinite respawn/kill loop. The desktop gates local
+ * backend startup on this marker and parks until the update finishes.
+ *
+ * This module holds the PURE, side-effect-light logic (path, pid liveness,
+ * parse + staleness) so it is unit-testable without booting Electron. The
+ * polling/boot-progress wrapper lives in main.cjs where the boot-progress and
+ * log sinks are.
+ */
+
+const fs = require('fs')
+const path = require('path')
+
+// Even with a live-looking PID, never treat a marker older than this as a live
+// update. A full update (git pull + pip + desktop rebuild) is minutes, not tens
+// of minutes; past this the marker is almost certainly stale (e.g. the OS
+// recycled the pid onto an unrelated process), so the gate self-heals.
+const UPDATE_MARKER_MAX_AGE_MS = 20 * 60 * 1000
+
+function markerPath(hermesHome) {
+  return path.join(hermesHome, '.hermes-update-in-progress')
+}
+
+// True only if a host process with this pid is currently alive. Signal 0 does
+// not deliver a signal — it just probes existence/permission. ESRCH => dead;
+// EPERM => alive but owned by another user (still "alive" for our purposes).
+// Injectable `kill` keeps it unit-testable.
+function isPidAlive(pid, kill = process.kill.bind(process)) {
+  if (!Number.isInteger(pid) || pid <= 0) return false
+  try {
+    kill(pid, 0)
+    return true
+  } catch (err) {
+    return Boolean(err && err.code === 'EPERM')
+  }
+}
+
+/**
+ * Read + interpret the marker.
+ *
+ * Returns `{ pid, ageMs }` only when an update is GENUINELY still running
+ * (parseable pid that is alive, within the age ceiling). Returns `null` for
+ * every "no live update" case — absent, unreadable, malformed, dead pid, or
+ * past the ceiling — and, when a stale marker file exists, deletes it so it
+ * cannot strand future launches.
+ *
+ * Pure-ish: file I/O against the given path, plus an injectable pid probe and
+ * clock for tests.
+ */
+function readLiveUpdateMarker(hermesHome, { kill, now = Date.now, maxAgeMs = UPDATE_MARKER_MAX_AGE_MS } = {}) {
+  const file = markerPath(hermesHome)
+  let raw
+  try {
+    raw = fs.readFileSync(file, 'utf8')
+  } catch {
+    return null // absent or unreadable => no live update
+  }
+
+  const [pidLine, startedLine] = String(raw).split('\n')
+  const pid = Number.parseInt((pidLine || '').trim(), 10)
+  const startedAt = Number.parseInt((startedLine || '').trim(), 10)
+  const ageMs = Number.isFinite(startedAt) ? now() - startedAt * 1000 : Infinity
+  const alive = Number.isInteger(pid) && isPidAlive(pid, kill)
+
+  if (!alive || ageMs > maxAgeMs) {
+    try {
+      fs.unlinkSync(file)
+    } catch {
+      void 0
+    }
+    return null
+  }
+  return { pid, ageMs }
+}
+
+module.exports = {
+  UPDATE_MARKER_MAX_AGE_MS,
+  markerPath,
+  isPidAlive,
+  readLiveUpdateMarker
+}
--- a/apps/desktop/electron/update-marker.test.cjs
+++ b/apps/desktop/electron/update-marker.test.cjs
@ -0,0 +1,92 @@
+/**
+ * Tests for electron/update-marker.cjs — the in-app update mutual-exclusion
+ * marker that prevents a desktop relaunched mid-update from spawning a backend
+ * the updater then kills in a loop (#50238).
+ *
+ * Run with: node --test electron/update-marker.test.cjs
+ * (Wired into npm test:desktop:platforms in package.json.)
+ *
+ * Why this matters: the gate must (a) report a live update only when the
+ * updater pid is alive AND the marker is fresh, (b) treat absent/malformed/
+ * dead-pid/expired markers as "no live update" so a crashed updater can't
+ * strand future launches, and (c) self-heal by deleting a stale marker file.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const fs = require('fs')
+const os = require('os')
+const path = require('path')
+
+const { markerPath, isPidAlive, readLiveUpdateMarker, UPDATE_MARKER_MAX_AGE_MS } = require('./update-marker.cjs')
+
+function tmpHome(tag) {
+  const dir = fs.mkdtempSync(path.join(os.tmpdir(), `hermes-marker-${tag}-`))
+  return dir
+}
+
+function writeMarker(home, pid, startedAtSec) {
+  fs.writeFileSync(markerPath(home), `${pid}\n${startedAtSec}`)
+}
+
+const ALIVE = () => true // injected kill that "succeeds" => pid alive
+const DEAD = () => {
+  const err = new Error('no such process')
+  err.code = 'ESRCH'
+  throw err
+}
+
+test('absent marker => no live update', () => {
+  const home = tmpHome('absent')
+  assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
+})
+
+test('live pid within age ceiling => live update reported', () => {
+  const home = tmpHome('live')
+  const now = 1_000_000_000_000
+  writeMarker(home, 4242, Math.floor(now / 1000) - 5) // 5s old
+  const res = readLiveUpdateMarker(home, { kill: ALIVE, now: () => now })
+  assert.ok(res, 'a fresh, alive marker is a live update')
+  assert.equal(res.pid, 4242)
+  assert.ok(res.ageMs >= 0 && res.ageMs < 10_000)
+  assert.ok(fs.existsSync(markerPath(home)), 'a live marker is NOT deleted')
+})
+
+test('dead pid => no live update and marker is pruned', () => {
+  const home = tmpHome('dead')
+  writeMarker(home, 999999, Math.floor(Date.now() / 1000))
+  assert.equal(readLiveUpdateMarker(home, { kill: DEAD }), null)
+  assert.ok(!fs.existsSync(markerPath(home)), 'a dead-pid marker self-heals (deleted)')
+})
+
+test('expired marker (past age ceiling) => no live update and pruned', () => {
+  const home = tmpHome('expired')
+  const now = 1_000_000_000_000
+  writeMarker(home, 4242, Math.floor((now - UPDATE_MARKER_MAX_AGE_MS - 60_000) / 1000))
+  // Even though the pid is "alive", the marker is too old to trust.
+  assert.equal(readLiveUpdateMarker(home, { kill: ALIVE, now: () => now }), null)
+  assert.ok(!fs.existsSync(markerPath(home)), 'an expired marker self-heals (deleted)')
+})
+
+test('malformed marker => no live update and pruned', () => {
+  const home = tmpHome('malformed')
+  fs.writeFileSync(markerPath(home), 'not-a-pid\nnonsense')
+  assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
+  assert.ok(!fs.existsSync(markerPath(home)))
+})
+
+test('isPidAlive: own pid is alive, impossible pid is dead', () => {
+  assert.equal(isPidAlive(process.pid), true)
+  assert.equal(isPidAlive(-1), false)
+  assert.equal(isPidAlive(0), false)
+  assert.equal(isPidAlive(NaN), false)
+})
+
+test('isPidAlive: EPERM counts as alive (process owned by another user)', () => {
+  const eperm = () => {
+    const err = new Error('operation not permitted')
+    err.code = 'EPERM'
+    throw err
+  }
+  assert.equal(isPidAlive(4242, eperm), true)
+})
--- a/apps/desktop/electron/update-relaunch.cjs
+++ b/apps/desktop/electron/update-relaunch.cjs
@ -0,0 +1,265 @@
+'use strict'
+
+/**
+ * update-relaunch.cjs — pure decision + script-generation helpers for the
+ * Linux in-app update relaunch (#45205).
+ *
+ * Extracted from main.cjs's `applyUpdatesPosixInApp` so the security- and
+ * correctness-critical "do we relaunch, or land on a manual terminal state?"
+ * decision is unit-testable without booting Electron (main.cjs calls
+ * `require('electron')` at load time).
+ *
+ * Background
+ * ----------
+ * After `hermes update` + `hermes desktop --build-only`, the freshly-rebuilt
+ * GUI lives under `apps/desktop/release/<plat>-unpacked`. We can only honestly
+ * relaunch into the new GUI when the *running* binary is that rebuilt one —
+ * i.e. its execPath is under the rebuilt `release/<plat>-unpacked` dir.
+ *
+ *   - Source / unpacked install (execPath under release/<plat>-unpacked):
+ *     the running binary IS the thing we just rebuilt → relaunch it in place.
+ *   - AppImage / .deb / .rpm / dev / unresolved (execPath elsewhere):
+ *     the backend was updated but THIS GUI shell was NOT replaced. Claiming
+ *     "the new version loads next launch" is a lie that produces GUI/backend
+ *     skew (#37541): the user keeps running the old GUI against new backend
+ *     code with no path to fix it from inside the app. Surface an explicit
+ *     terminal state telling them the GUI package must be reinstalled.
+ *
+ * Sandbox preflight (#3 in the review)
+ * ------------------------------------
+ * A fresh `release/<plat>-unpacked` rebuild can leave `chrome-sandbox` without
+ * the required `root:root` + setuid (mode 4755). Electron then refuses to
+ * launch with "The SUID sandbox helper binary was found, but is not configured
+ * correctly" and the relaunch yields "quit and never came back" — a dead app.
+ * Before we quit and hand off, we preflight the rebuilt sandbox helper; if it is NOT
+ * launchable (and no working non-interactive fallback applies — see
+ * sandboxFallbackFromEnv) we DO NOT quit. We keep the working window and return
+ * the closeable manual-restart terminal state instead.
+ */
+
+const path = require('node:path')
+
+// Map process.platform → electron-builder's `release/<dir>-unpacked` name.
+function unpackedDirName(platform) {
+  if (platform === 'darwin') return 'mac-unpacked' // not used (mac swaps bundles)
+  if (platform === 'win32') return 'win-unpacked'
+  return 'linux-unpacked'
+}
+
+/**
+ * If `execPath` lives under `<updateRoot>/apps/desktop/release/<plat>-unpacked`,
+ * return that unpacked dir; otherwise null. A null result means the running
+ * binary is NOT the thing we just rebuilt (AppImage/.deb/.rpm/dev), so we must
+ * not claim a GUI relaunch.
+ *
+ * Match is a path-segment-aware prefix check (not a bare string startsWith) so
+ * `.../release/linux-unpacked-evil` can't masquerade as `.../release/linux-unpacked`.
+ */
+function resolveUnpackedRelease(execPath, updateRoot, platform) {
+  if (!execPath || !updateRoot) return null
+  const releaseDir = path.join(updateRoot, 'apps', 'desktop', 'release')
+  const unpacked = path.join(releaseDir, unpackedDirName(platform))
+  const normalizedExec = path.resolve(String(execPath))
+  // execPath must be the unpacked dir itself or a descendant of it.
+  const withSep = unpacked.endsWith(path.sep) ? unpacked : unpacked + path.sep
+  if (normalizedExec === unpacked || normalizedExec.startsWith(withSep)) {
+    return unpacked
+  }
+  return null
+}
+
+/**
+ * Pure decision: given whether the running binary is under the rebuilt
+ * unpacked release AND whether its sandbox helper is launchable, choose the
+ * terminal outcome.
+ *
+ *   'relaunch' — quit + detached watcher re-execs the rebuilt binary in place.
+ *   'guiSkew'  — backend updated, GUI package NOT changed; user must reinstall
+ *                the GUI. Closeable terminal state; does NOT claim a GUI update.
+ *   'manual'   — running the rebuilt binary, but its sandbox helper is not
+ *                launchable and no fallback applies; do NOT quit into a dead
+ *                app. Closeable manual-restart terminal state.
+ */
+function decideRelaunchOutcome({ underUnpacked, sandboxOk }) {
+  if (!underUnpacked) return 'guiSkew'
+  if (!sandboxOk) return 'manual'
+  return 'relaunch'
+}
+
+/**
+ * Preflight the rebuilt sandbox helper. Returns
+ *   { ok: boolean, reason: string, path: string }
+ *
+ * `ok` is true when chrome-sandbox is owned by uid 0 AND has the setuid bit
+ * (mode & 0o4000) — i.e. Electron can launch it. If chrome-sandbox does not
+ * exist at all we treat it as ok: this Electron build does not use the SUID
+ * sandbox helper (e.g. it ships the namespace sandbox), so the relaunch is not
+ * blocked on it.
+ *
+ * `statSync` is injectable so this is testable without a real setuid file.
+ */
+function sandboxPreflight(unpackedDir, statSync) {
+  if (!unpackedDir) return { ok: false, reason: 'no-unpacked-dir', path: null }
+  const sandboxPath = path.join(unpackedDir, 'chrome-sandbox')
+  let st
+  try {
+    st = statSync(sandboxPath)
+  } catch {
+    // No chrome-sandbox helper present → this build doesn't rely on the SUID
+    // sandbox; nothing to block the relaunch.
+    return { ok: true, reason: 'no-sandbox-helper', path: sandboxPath }
+  }
+  const ownedByRoot = st.uid === 0
+  const hasSetuid = (st.mode & 0o4000) !== 0
+  if (ownedByRoot && hasSetuid) {
+    return { ok: true, reason: 'launchable', path: sandboxPath }
+  }
+  if (!ownedByRoot && !hasSetuid) {
+    return { ok: false, reason: 'not-root-not-setuid', path: sandboxPath }
+  }
+  if (!ownedByRoot) return { ok: false, reason: 'not-root', path: sandboxPath }
+  return { ok: false, reason: 'not-setuid', path: sandboxPath }
+}
+
+/**
+ * Detect a non-interactive sandbox fallback the user has opted into via the
+ * environment. The reviewer asked us to integrate with any existing
+ * `--no-sandbox` / chrome-sandbox handling. A repo grep found NO existing
+ * non-interactive sandbox fallback in the desktop app (the only chrome-sandbox
+ * reference is documentation in scripts/before-pack.cjs). The one signal that
+ * DOES exist is the standard Electron escape hatch: ELECTRON_DISABLE_SANDBOX=1
+ * (and the equivalent `--no-sandbox` already present in the launch args). If
+ * the user has set that, the rebuilt binary will start even with a broken
+ * chrome-sandbox, so the relaunch is safe.
+ *
+ * Returns true when a fallback makes the relaunch safe despite a failed
+ * sandbox preflight.
+ */
+function sandboxFallbackFromEnv(env, launchArgs) {
+  const disable = String((env && env.ELECTRON_DISABLE_SANDBOX) || '').trim()
+  if (disable === '1' || disable.toLowerCase() === 'true') return true
+  if (Array.isArray(launchArgs) && launchArgs.some(a => a === '--no-sandbox')) return true
+  return false
+}
+
+// POSIX single-quote a value for safe inclusion in the generated bash script.
+function shellQuote(value) {
+  return `'${String(value).replace(/'/g, `'\\''`)}'`
+}
+
+// Electron / Chromium internal switches that must NOT be replayed on re-exec:
+// they are runtime artifacts of THIS launch, not user intent, and re-passing
+// them can change sandbox/zygote behavior or point at stale fds/dirs.
+const INTERNAL_ARG_PREFIXES = [
+  '--type=', // renderer/gpu/zygote child markers
+  '--user-data-dir=',
+  '--enable-features=',
+  '--disable-features=',
+  '--field-trial-handle=',
+  '--enable-logging',
+  '--log-file=',
+  // NB: --no-sandbox is deliberately NOT stripped — it reflects the user's /
+  // environment's SUID-sandbox opt-out (some hardened kernels/containers require
+  // it) and is the signal sandboxFallbackFromEnv() uses to allow a relaunch when
+  // chrome-sandbox isn't setuid. Dropping it would make exactly that relaunch
+  // fail ("quit and never came back").
+  '--disable-gpu-sandbox',
+  '--lang=',
+  '--inspect',
+  '--remote-debugging-port='
+]
+
+/**
+ * Filter Electron internals out of the original launch args so we replay only
+ * meaningful user/launcher intent (deep-link URLs, app-specific flags).
+ * `argv` is expected to be process.argv.slice(1) for a PACKAGED app (argv[0] is
+ * the exec path itself; there is no entry-script arg as in a dev run).
+ */
+function collectRelaunchArgs(argv) {
+  if (!Array.isArray(argv)) return []
+  return argv.filter(arg => {
+    if (typeof arg !== 'string' || arg.length === 0) return false
+    return !INTERNAL_ARG_PREFIXES.some(prefix =>
+      prefix.endsWith('=') ? arg.startsWith(prefix) : arg === prefix || arg.startsWith(prefix + '=')
+    )
+  })
+}
+
+// Env keys whose values define the relaunched instance's context (which
+// backend/profile/root it talks to): every HERMES_DESKTOP_* key is preserved,
+// plus HERMES_HOME. We snapshot the values, not the live env, so the new
+// instance comes up pointed at the same place this one was.
+// ELECTRON_DISABLE_SANDBOX is preserved for the same reason --no-sandbox is kept
+// in the replayed args: if a relaunch is only safe because the user opted out of
+// the SUID sandbox, the relaunched instance must inherit that opt-out too.
+const PRESERVED_ENV_KEYS = ['HERMES_HOME', 'ELECTRON_DISABLE_SANDBOX']
+const PRESERVED_ENV_PREFIXES = ['HERMES_DESKTOP_']
+
+function collectRelaunchEnv(env) {
+  const out = {}
+  if (!env || typeof env !== 'object') return out
+  for (const [key, value] of Object.entries(env)) {
+    if (value == null) continue
+    if (PRESERVED_ENV_KEYS.includes(key) || PRESERVED_ENV_PREFIXES.some(p => key.startsWith(p))) {
+      out[key] = String(value)
+    }
+  }
+  return out
+}
+
+/**
+ * Build the detached bash watcher that waits for the parent to exit (graceful
+ * window then SIGKILL), self-deletes, and re-execs the rebuilt binary WITH the
+ * original launch context (cwd, env, args) restored.
+ *
+ * @param {object} o
+ * @param {number} o.pid       parent (this) process pid to wait on
+ * @param {string} o.execPath  binary to re-exec
+ * @param {string[]} o.args    filtered launch args to replay
+ * @param {object} o.env       env key→value to export before exec
+ * @param {string} o.cwd       working directory to restore
+ */
+function buildRelaunchScript({ pid, execPath, args, env, cwd }) {
+  const exports = Object.entries(env || {})
+    .map(([k, v]) => `export ${k}=${shellQuote(v)}`)
+    .join('\n')
+  const quotedArgs = (args || []).map(shellQuote).join(' ')
+  const cwdLine = cwd ? `cd ${shellQuote(cwd)} 2>/dev/null || true` : ''
+  // NOTE: `exec` replaces the watcher process with the relaunched app, so the
+  // re-exec inherits exactly the env/cwd we set above.
+  return `#!/bin/bash
+set -u
+APP_PID=${Number(pid)}
+# Wait up to ~30s for a graceful exit, then SIGKILL: a hung/zombie parent must
+# be gone before we relaunch, or the new instance bails on the single-instance
+# lock. (#45205)
+for _ in $(seq 1 60); do
+  kill -0 "$APP_PID" 2>/dev/null || break
+  sleep 0.5
+done
+if kill -0 "$APP_PID" 2>/dev/null; then
+  kill -9 "$APP_PID" 2>/dev/null || true
+  sleep 0.5
+fi
+# Self-delete so temp watchers don't accumulate across updates.
+rm -f -- "$0" 2>/dev/null || true
+${cwdLine}
+${exports}
+exec ${shellQuote(execPath)}${quotedArgs ? ' ' + quotedArgs : ''}
+`
+}
+
+module.exports = {
+  unpackedDirName,
+  resolveUnpackedRelease,
+  decideRelaunchOutcome,
+  sandboxPreflight,
+  sandboxFallbackFromEnv,
+  collectRelaunchArgs,
+  collectRelaunchEnv,
+  buildRelaunchScript,
+  shellQuote,
+  INTERNAL_ARG_PREFIXES,
+  PRESERVED_ENV_KEYS,
+  PRESERVED_ENV_PREFIXES
+}
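`resolveUnpackedRelease()` guards against a sibling directory such as `linux-unpacked-evil` by appending a separator before the prefix test. The contrast with a bare `startsWith` can be shown in isolation; `isUnderNaive` and `isUnderSegmentAware` are hypothetical helpers written for this illustration, and unlike the module this sketch resolves both paths:

```javascript
'use strict'
const path = require('node:path')

// Naive containment: a bare string prefix test.
function isUnderNaive(child, parent) {
  return child.startsWith(parent)
}

// Segment-aware containment (same idea as resolveUnpackedRelease): `child` must
// equal `parent` or start with `parent` followed by a path separator.
function isUnderSegmentAware(child, parent) {
  const c = path.resolve(child)
  const p = path.resolve(parent)
  const withSep = p.endsWith(path.sep) ? p : p + path.sep
  return c === p || c.startsWith(withSep)
}

const unpacked = path.join('/opt', 'hermes', 'release', 'linux-unpacked')
const evil = path.join('/opt', 'hermes', 'release', 'linux-unpacked-evil', 'hermes')
console.log(isUnderNaive(evil, unpacked)) // true (false positive)
console.log(isUnderSegmentAware(evil, unpacked)) // false
```

Resolving first also collapses `..` segments, so a path like `linux-unpacked/../linux-unpacked-evil/x` is judged by where it actually points.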
--- a/apps/desktop/electron/update-relaunch.test.cjs
+++ b/apps/desktop/electron/update-relaunch.test.cjs
@ -0,0 +1,231 @@
+/**
+ * Tests for electron/update-relaunch.cjs — the pure decision + script helpers
+ * behind the Linux in-app update relaunch (#45205).
+ *
+ * Run with: node --test electron/update-relaunch.test.cjs
+ * (Wired into the `test:desktop:platforms` npm script in package.json.)
+ *
+ * What this locks (review acceptance criteria for PR #45205):
+ *   1. The execPath split: only a binary under release/<plat>-unpacked may
+ *      relaunch/claim a GUI update; AppImage/.deb/.rpm/dev/unresolved paths land
+ *      on the guiSkew terminal state and do NOT claim the GUI was updated.
+ *   2. Launch context is replayed on re-exec (args filtered of Electron
+ *      internals; HERMES_HOME / HERMES_DESKTOP_* env + cwd preserved) and is
+ *      safely shell-quoted.
+ *   3. The sandbox preflight: chrome-sandbox must be root-owned + setuid to be
+ *      launchable; otherwise the decision degrades to a manual terminal state
+ *      (keep a working window) unless a non-interactive fallback applies.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const fs = require('node:fs')
+const os = require('node:os')
+const path = require('node:path')
+const { execFileSync } = require('node:child_process')
+
+const {
+  unpackedDirName,
+  resolveUnpackedRelease,
+  decideRelaunchOutcome,
+  sandboxPreflight,
+  sandboxFallbackFromEnv,
+  collectRelaunchArgs,
+  collectRelaunchEnv,
+  buildRelaunchScript,
+  shellQuote
+} = require('./update-relaunch.cjs')
+
+const ROOT = '/home/u/.hermes/hermes-agent'
+const UNPACKED = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked')
+
+// ---------------------------------------------------------------------------
+// 1) The execPath split — the heart of the GUI/backend skew guard.
+// ---------------------------------------------------------------------------
+
+test('unpackedDirName maps platform to the electron-builder dir', () => {
+  assert.equal(unpackedDirName('linux'), 'linux-unpacked')
+  assert.equal(unpackedDirName('win32'), 'win-unpacked')
+})
+
+test('resolveUnpackedRelease returns the dir for a binary UNDER release/<plat>-unpacked', () => {
+  const exec = path.join(UNPACKED, 'hermes')
+  assert.equal(resolveUnpackedRelease(exec, ROOT, 'linux'), UNPACKED)
+  // The unpacked dir itself also counts.
+  assert.equal(resolveUnpackedRelease(UNPACKED, ROOT, 'linux'), UNPACKED)
+})
+
+test('resolveUnpackedRelease is null for AppImage / .deb / .rpm / dev / unresolved paths', () => {
+  // AppImage mount
+  assert.equal(resolveUnpackedRelease('/tmp/.mount_Hermes12345/AppRun', ROOT, 'linux'), null)
+  // .deb / .rpm system install
+  assert.equal(resolveUnpackedRelease('/usr/lib/hermes/hermes', ROOT, 'linux'), null)
+  assert.equal(resolveUnpackedRelease('/opt/Hermes/hermes', ROOT, 'linux'), null)
+  // dev electron
+  assert.equal(resolveUnpackedRelease('/home/u/.hermes/hermes-agent/node_modules/electron/dist/electron', ROOT, 'linux'), null)
+  // empty / missing
+  assert.equal(resolveUnpackedRelease('', ROOT, 'linux'), null)
+  assert.equal(resolveUnpackedRelease(path.join(UNPACKED, 'hermes'), '', 'linux'), null)
+})
+
+test('resolveUnpackedRelease is not fooled by a sibling prefix dir', () => {
+  // `.../release/linux-unpacked-evil` must NOT match `.../release/linux-unpacked`.
+  const sneaky = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked-evil', 'hermes')
+  assert.equal(resolveUnpackedRelease(sneaky, ROOT, 'linux'), null)
+})
+
+test('decideRelaunchOutcome: only under-unpacked + sandbox-ok relaunches', () => {
+  assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: true }), 'relaunch')
+  // Under unpacked but sandbox not launchable → manual (keep a working window).
+  assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: false }), 'manual')
+  // Not under unpacked → guiSkew regardless of sandbox flag.
+  assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: true }), 'guiSkew')
+  assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: false }), 'guiSkew')
+})
+
+// ---------------------------------------------------------------------------
+// 3) Sandbox preflight
+// ---------------------------------------------------------------------------
+
+const fakeStat = (uid, mode) => () => ({ uid, mode })
+const throwStat = () => {
+  throw Object.assign(new Error('ENOENT'), { code: 'ENOENT' })
+}
+
+test('sandboxPreflight: root-owned + setuid is launchable', () => {
+  const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o4755))
+  assert.equal(r.ok, true)
+  assert.equal(r.reason, 'launchable')
+})
+
+test('sandboxPreflight: not root → not launchable', () => {
+  const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o4755))
+  assert.equal(r.ok, false)
+  assert.equal(r.reason, 'not-root')
+})
+
+test('sandboxPreflight: missing setuid bit → not launchable', () => {
+  const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o755))
+  assert.equal(r.ok, false)
+  assert.equal(r.reason, 'not-setuid')
+})
+
+test('sandboxPreflight: neither root nor setuid (the fresh-rebuild trap)', () => {
+  const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o755))
+  assert.equal(r.ok, false)
+  assert.equal(r.reason, 'not-root-not-setuid')
+})
+
+test('sandboxPreflight: no chrome-sandbox helper present → ok (build does not use SUID sandbox)', () => {
+  const r = sandboxPreflight(UNPACKED, throwStat)
+  assert.equal(r.ok, true)
+  assert.equal(r.reason, 'no-sandbox-helper')
+})
+
+test('sandboxFallbackFromEnv: ELECTRON_DISABLE_SANDBOX / --no-sandbox make a broken sandbox safe', () => {
+  assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: '1' }, []), true)
+  assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: 'true' }, []), true)
+  assert.equal(sandboxFallbackFromEnv({}, ['--no-sandbox']), true)
+  assert.equal(sandboxFallbackFromEnv({}, ['--foo']), false)
+  assert.equal(sandboxFallbackFromEnv({}, []), false)
+  assert.equal(sandboxFallbackFromEnv(null, null), false)
+})
+
+// ---------------------------------------------------------------------------
+// 2) Launch-context preservation
+// ---------------------------------------------------------------------------
+
+test('collectRelaunchArgs drops Electron internals, keeps user/launcher args', () => {
+  const argv = [
+    '--type=renderer',
+    '--user-data-dir=/tmp/x',
+    '--enable-features=Foo',
+    '--field-trial-handle=123',
+    '--no-sandbox', // sandbox opt-out — KEEP (user/env intent + relaunch fallback)
+    '--lang=en-US',
+    'hermes://open/agent/42', // deep link — keep
+    '--profile=work', // app flag — keep
+    '--remote-debugging-port=9222' // internal — drop
+  ]
+  assert.deepEqual(collectRelaunchArgs(argv), ['--no-sandbox', 'hermes://open/agent/42', '--profile=work'])
+  assert.deepEqual(collectRelaunchArgs(undefined), [])
+})
+
+test('collectRelaunchEnv preserves HERMES_HOME + HERMES_DESKTOP_* + sandbox opt-out only', () => {
+  const env = {
+    HERMES_HOME: '/home/u/.hermes',
+    HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
+    HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
+    HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
+    ELECTRON_DISABLE_SANDBOX: '1', // sandbox opt-out — preserved
+    PATH: '/usr/bin', // not preserved
+    HOME: '/home/u', // not preserved
+    UNRELATED: 'x'
+  }
+  assert.deepEqual(collectRelaunchEnv(env), {
+    HERMES_HOME: '/home/u/.hermes',
+    HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
+    HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
+    HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
+    ELECTRON_DISABLE_SANDBOX: '1'
+  })
+  assert.deepEqual(collectRelaunchEnv(null), {})
+})
+
+// ---------------------------------------------------------------------------
+// Generated watcher script: safe quoting + valid bash syntax.
+// ---------------------------------------------------------------------------
+
+test('shellQuote neutralizes single quotes and metacharacters', () => {
+  assert.equal(shellQuote(`a'b`), `'a'\\''b'`)
+  assert.equal(shellQuote('$(rm -rf /)'), `'$(rm -rf /)'`)
+})
+
+test('buildRelaunchScript embeds pid/exec/args/env/cwd and is valid bash', () => {
+  const script = buildRelaunchScript({
+    pid: 4242,
+    execPath: '/home/u/.hermes/hermes-agent/apps/desktop/release/linux-unpacked/Hermes',
+    args: ['hermes://open/agent/42', "--note=it's fine"],
+    env: { HERMES_HOME: '/home/u/.hermes', HERMES_DESKTOP_REMOTE_URL: 'http://box:9119' },
+    cwd: '/home/u/work dir'
+  })
+
+  // Structural assertions.
+  assert.match(script, /^#!\/bin\/bash/)
+  assert.match(script, /APP_PID=4242/)
+  assert.match(script, /kill -9 "\$APP_PID"/)
+  assert.match(script, /rm -f -- "\$0"/)
+  // env exports + cwd restore + args replay are present and quoted.
+  assert.match(script, /export HERMES_HOME='\/home\/u\/\.hermes'/)
+  assert.match(script, /export HERMES_DESKTOP_REMOTE_URL='http:\/\/box:9119'/)
+  assert.match(script, /cd '\/home\/u\/work dir'/)
+  assert.match(script, /exec '.*\/linux-unpacked\/Hermes' 'hermes:\/\/open\/agent\/42' '--note=it'\\''s fine'/)
+
+  // It must be syntactically valid bash (`bash -n`). Write to a temp file and lint.
+  const tmp = path.join(os.tmpdir(), `hermes-relaunch-test-${Date.now()}.sh`)
+  fs.writeFileSync(tmp, script)
+  try {
+    execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
+  } finally {
+    fs.rmSync(tmp, { force: true })
+  }
+})
+
+test('buildRelaunchScript with no args/env still lints clean', () => {
+  const script = buildRelaunchScript({
+    pid: 1,
+    execPath: '/opt/Hermes/Hermes',
+    args: [],
+    env: {},
+    cwd: ''
+  })
+  const tmp = path.join(os.tmpdir(), `hermes-relaunch-test2-${Date.now()}.sh`)
+  fs.writeFileSync(tmp, script)
+  try {
+    execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
+  } finally {
+    fs.rmSync(tmp, { force: true })
+  }
+  // exec line has no trailing args.
+  assert.match(script, /exec '\/opt\/Hermes\/Hermes'\n/)
+})
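The tests exercise each helper in isolation. The module's doc comment says a failed sandbox preflight should only block the relaunch when no non-interactive fallback applies, which suggests the caller folds the fallback into `sandboxOk` before deciding. A self-contained sketch of that glue, assuming this composition; `planRelaunch` is hypothetical (not the actual `main.cjs` code) and inlines simplified copies of the fallback and decision logic:

```javascript
'use strict'

// Hypothetical glue: a failed preflight is forgiven when the user opted out of
// the SUID sandbox (ELECTRON_DISABLE_SANDBOX=1/true or --no-sandbox).
function planRelaunch({ underUnpacked, preflightOk, env = {}, launchArgs = [] }) {
  const disable = String(env.ELECTRON_DISABLE_SANDBOX || '').trim().toLowerCase()
  const fallback = disable === '1' || disable === 'true' || launchArgs.includes('--no-sandbox')
  const sandboxOk = preflightOk || fallback
  // Same precedence as decideRelaunchOutcome(): guiSkew, then manual, then relaunch.
  if (!underUnpacked) return 'guiSkew'
  if (!sandboxOk) return 'manual'
  return 'relaunch'
}

console.log(planRelaunch({ underUnpacked: true, preflightOk: false })) // manual
console.log(planRelaunch({ underUnpacked: true, preflightOk: false, launchArgs: ['--no-sandbox'] })) // relaunch
```

Keeping the fallback outside `decideRelaunchOutcome()` leaves that function a pure two-flag truth table, which is what its test pins down.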
--- a/apps/desktop/package.json
+++ b/apps/desktop/package.json
@ -2,7 +2,7 @@
  "name": "hermes",
  "productName": "Hermes",
  "private": true,
-  "version": "0.15.1",
+  "version": "0.17.0",
  "description": "Native desktop shell for Hermes Agent.",
  "author": "Nous Research",
  "type": "module",
@ -37,7 +37,7 @@
    "test:desktop:nsis": "node scripts/test-desktop.mjs nsis",
    "test:desktop:existing": "node scripts/test-desktop.mjs existing",
    "test:desktop:fresh": "node scripts/test-desktop.mjs fresh",
-    "test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/windows-user-env.test.cjs",
+    "test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/backend-ready.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/link-title-window.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/update-marker.test.cjs electron/update-relaunch.test.cjs electron/windows-user-env.test.cjs",
    "typecheck": "tsc -p . --noEmit",
    "lint": "eslint src/ electron/",
    "lint:fix": "eslint src/ electron/ --fix",
--- a/apps/desktop/src/app/agents/index.tsx
+++ b/apps/desktop/src/app/agents/index.tsx
@ -357,7 +357,7 @@ function SubagentRow({ node, depth = 0, nowMs }: { node: SubagentNode; depth?: n
      </button>

      {visibleRows.length > 0 ? (
-        <div className="grid min-w-0 gap-1 pl-6">
+        <div className="grid min-w-0 gap-1 pl-6" data-selectable-text="true">
          {visibleRows.map((entry, i) => (
            <StreamLine
              active={running && i === visibleRows.length - 1}
@ -371,7 +371,7 @@ function SubagentRow({ node, depth = 0, nowMs }: { node: SubagentNode; depth?: n
      ) : null}

      {open && fileLines.length > 0 ? (
-        <div className="grid min-w-0 gap-0.5 pl-6">
+        <div className="grid min-w-0 gap-0.5 pl-6" data-selectable-text="true">
          <p className="text-[0.58rem] font-medium tracking-wider text-muted-foreground/60 uppercase">
            {t.agents.files}
          </p>
--- a/apps/desktop/src/app/chat/composer/attachments.test.tsx
+++ b/apps/desktop/src/app/chat/composer/attachments.test.tsx
@ -0,0 +1,69 @@
+import { cleanup, render, screen } from '@testing-library/react'
+import { afterEach, describe, expect, it } from 'vitest'
+
+import { I18nProvider } from '@/i18n/context'
+
+import { AttachmentList } from './attachments'
+import type { ComposerAttachment } from '@/store/composer'
+
+function makeAttachment(id: string, label = 'test.pdf'): ComposerAttachment {
+  return { id, kind: 'file', label }
+}
+
+function renderWithI18n(ui: React.ReactNode) {
+  return render(
+    <I18nProvider configClient={{ getConfig: async () => ({}), saveConfig: async () => ({ ok: true }) }}>
+      {ui}
+    </I18nProvider>
+  )
+}
+
+describe('AttachmentList', () => {
+  afterEach(() => {
+    cleanup()
+  })
+
+  it('renders valid attachments', () => {
+    const attachments = [makeAttachment('a', 'doc.pdf'), makeAttachment('b', 'img.png')]
+    renderWithI18n(<AttachmentList attachments={attachments} />)
+    expect(screen.getByText('doc.pdf')).toBeDefined()
+    expect(screen.getByText('img.png')).toBeDefined()
+  })
+
+  it('renders empty list without error', () => {
+    renderWithI18n(<AttachmentList attachments={[]} />)
+    const container = document.querySelector('[data-slot="composer-attachments"]') // root has data-slot, no data-testid
+    expect(container).not.toBeNull()
+  })
+
+  it('does not crash when attachments array contains undefined entries', () => {
+    // Repro: session switch can leave stale/undefined entries in the
+    // attachments array, causing a TypeError at attachment.refText.
+    const attachments = [
+      makeAttachment('a', 'good.pdf'),
+      undefined as unknown as ComposerAttachment,
+      makeAttachment('b', 'also-good.png')
+    ]
+
+    expect(() => {
+      renderWithI18n(<AttachmentList attachments={attachments} />)
+    }).not.toThrow()
+
+    // Only valid attachments should render
+    expect(screen.getByText('good.pdf')).toBeDefined()
+    expect(screen.getByText('also-good.png')).toBeDefined()
+  })
+
+  it('does not crash when attachments array contains null entries', () => {
+    const attachments = [
+      null as unknown as ComposerAttachment,
+      makeAttachment('a', 'valid.txt')
+    ]
+
+    expect(() => {
+      renderWithI18n(<AttachmentList attachments={attachments} />)
+    }).not.toThrow()
+
+    expect(screen.getByText('valid.txt')).toBeDefined()
+  })
+})
--- a/apps/desktop/src/app/chat/composer/attachments.tsx
+++ b/apps/desktop/src/app/chat/composer/attachments.tsx
@ -20,7 +20,7 @@ export function AttachmentList({
 }) {
  return (
    <div className="flex max-w-full flex-wrap gap-1.5 px-1 pt-1" data-slot="composer-attachments">
-      {attachments.map(attachment => (
+      {attachments.filter(Boolean).map(attachment => (
        <AttachmentPill attachment={attachment} key={attachment.id} onRemove={onRemove} />
      ))}
    </div>
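The `AttachmentList` fix guards the `.map()` with `filter(Boolean)`. The failure and the fix can be reproduced without React; the sample `attachments` array and the `labelsUnguarded` / `labelsGuarded` functions below are hypothetical stand-ins:

```javascript
'use strict'

// Stale entries like these are what the regression test simulates.
const attachments = [{ id: 'a', label: 'good.pdf' }, undefined, null, { id: 'b', label: 'also-good.png' }]

// Without the guard, reading a property of an undefined entry throws a TypeError.
function labelsUnguarded(list) {
  return list.map(a => a.label)
}

// With the guard, filter(Boolean) drops every falsy entry before the map.
function labelsGuarded(list) {
  return list.filter(Boolean).map(a => a.label)
}

console.log(labelsGuarded(attachments)) // [ 'good.pdf', 'also-good.png' ]
```

Because `filter(Boolean)` drops every falsy value, it is only a safe guard when valid entries are always truthy, as attachment objects are.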
--- a/apps/desktop/src/app/chat/composer/completion-drawer.tsx
+++ b/apps/desktop/src/app/chat/composer/completion-drawer.tsx
@ -2,21 +2,20 @@ import type { Unstable_TriggerAdapter } from '@assistant-ui/core'
 import { ComposerPrimitive } from '@assistant-ui/react'
 import type { ReactNode } from 'react'

-import { composerFusedDockCard } from '@/components/chat/composer-dock'
+import { composerPanelCard } from '@/components/chat/composer-dock'
 import { cn } from '@/lib/utils'

-// Same docked chrome as the queue/status stack, but its own thing: a narrow,
-// left-aligned card (not full width) that fuses to the composer's edge instead
-// of floating above it. `left-1` matches the stack's `mx-1` inset; the negative
-// margin overlaps the seam so the composer's (now-transparent) edge border reads
-// as shared. Fused (opaque) fill — the composer surface swaps to the same fill
-// while a drawer is open, so the two paint as one panel.
-const DRAWER_SHELL =
-  'absolute left-1 z-50 w-80 max-w-[calc(100%-0.5rem)] max-h-[min(22rem,calc(100vh-8rem))] overflow-y-auto overscroll-contain p-1 text-xs text-popover-foreground'
+// A standalone glassy panel floating just off the composer edge, inset from the
+// left. Skin is the shared composerPanelCard (also used by the attach menu).
+const DRAWER_SHELL = cn(
+  'absolute left-2 z-50 w-80 max-w-[calc(100%-1rem)] max-h-[min(22rem,calc(100vh-8rem))]',
+  'overflow-y-auto overscroll-contain p-1 text-popover-foreground',
+  composerPanelCard
+)

-export const COMPLETION_DRAWER_CLASS = cn(DRAWER_SHELL, 'bottom-full -mb-[9px]', composerFusedDockCard('top'))
+export const COMPLETION_DRAWER_CLASS = cn(DRAWER_SHELL, 'bottom-full mb-1')

-export const COMPLETION_DRAWER_BELOW_CLASS = cn(DRAWER_SHELL, 'top-full -mt-[9px]', composerFusedDockCard('bottom'))
+export const COMPLETION_DRAWER_BELOW_CLASS = cn(DRAWER_SHELL, 'top-full mt-1')

 export function ComposerCompletionDrawer({
  adapter,
--- a/apps/desktop/src/app/chat/composer/context-menu.tsx
+++ b/apps/desktop/src/app/chat/composer/context-menu.tsx
@ -1,5 +1,6 @@
 import { useState } from 'react'

+import { composerPanelCard } from '@/components/chat/composer-dock'
 import { Button } from '@/components/ui/button'
 import { Codicon } from '@/components/ui/codicon'
 import { Dialog, DialogContent, DialogDescription, DialogHeader, DialogTitle } from '@/components/ui/dialog'
@ -54,11 +55,11 @@ export function ContextMenu({
            type="button"
            variant="ghost"
          >
-            <Codicon name="add" size="1rem" />
+            <Codicon name="add" size="0.875rem" />
          </Button>
        </DropdownMenuTrigger>
-        <DropdownMenuContent align="start" className="w-60" side="top" sideOffset={10}>
-          <DropdownMenuLabel className="text-[0.7rem] font-medium uppercase tracking-wide text-muted-foreground/85">
+        <DropdownMenuContent align="start" className={cn('w-60', composerPanelCard)} side="top" sideOffset={6}>
+          <DropdownMenuLabel className="px-2 pb-0.5 pt-0.5 text-[0.625rem] font-semibold uppercase tracking-wider text-(--ui-text-tertiary)">
            {c.attachLabel}
          </DropdownMenuLabel>
          <ContextMenuItem disabled={!onPickFiles} icon={FileText} onSelect={onPickFiles}>
@ -142,7 +143,12 @@ function PromptSnippetsDialog({ onInsertText, onOpenChange, open }: PromptSnippe

 export function ContextMenuItem({ children, disabled, icon: Icon, onSelect }: ContextMenuItemProps) {
  return (
-    <DropdownMenuItem disabled={disabled} onSelect={onSelect}>
+    // Override font size + highlight to match the / · @ completion rows exactly.
+    <DropdownMenuItem
+      className="text-[length:var(--conversation-tool-font-size)] focus:bg-(--ui-bg-tertiary)"
+      disabled={disabled}
+      onSelect={onSelect}
+    >
      <Icon />
      <span>{children}</span>
    </DropdownMenuItem>
--- a/apps/desktop/src/app/chat/composer/controls.tsx
+++ b/apps/desktop/src/app/chat/composer/controls.tsx
@ -43,6 +43,7 @@ export function ComposerControls({
  busyAction,
  canSteer,
  canSubmit,
+  compactModelPill = false,
  conversation,
  disabled,
  hasComposerPayload,
@ -55,6 +56,7 @@ export function ComposerControls({
  busyAction: 'queue' | 'stop'
  canSteer: boolean
  canSubmit: boolean
+  compactModelPill?: boolean
  conversation: ConversationProps
  disabled: boolean
  hasComposerPayload: boolean
@ -83,7 +85,7 @@ export function ComposerControls({

  return (
    <div className="ml-auto flex shrink-0 items-center gap-(--composer-control-gap)">
-      <ModelPill disabled={disabled} model={state.model} />
+      <ModelPill compact={compactModelPill} disabled={disabled} model={state.model} />
      {/* While the agent runs and the user is typing, steer takes over the mic's
          slot rather than crowding the row with an extra button. */}
      {canSteer ? (
@ -97,7 +99,7 @@ export function ComposerControls({
            type="button"
            variant="ghost"
          >
-            <SteeringWheel size={16} />
+            <SteeringWheel size={14} />
          </Button>
        </Tip>
      ) : (
@ -116,7 +118,7 @@ export function ComposerControls({
            size="icon"
            type="button"
          >
-            <AudioLines size={17} />
+            <AudioLines size={15} />
          </Button>
        </Tip>
      ) : (
@ -129,12 +131,12 @@ export function ComposerControls({
          >
            {busy ? (
              busyAction === 'queue' ? (
-                <Layers3 size={16} />
+                <Layers3 size={14} />
              ) : (
-                <span className="block size-3 rounded-[0.1875rem] bg-current" />
+                <span className="block size-2.5 rounded-[0.1875rem] bg-current" />
              )
            ) : (
-              <Codicon name="arrow-up" size="1rem" />
+              <Codicon name="arrow-up" size="0.875rem" />
            )}
          </Button>
        </Tip>
@ -293,11 +295,11 @@ function DictationButton({
        variant="ghost"
      >
        {status === 'recording' ? (
-          <Square className="fill-current" size={12} />
+          <Square className="fill-current" size={11} />
        ) : status === 'transcribing' ? (
-          <Loader2 className="animate-spin" size={16} />
+          <Loader2 className="animate-spin" size={14} />
        ) : (
-          <Codicon name="mic" size="1rem" />
+          <Codicon name="mic" size="0.875rem" />
        )}
      </Button>
    </Tip>
--- a/apps/desktop/src/app/chat/composer/hooks/use-popout-drag.ts
+++ b/apps/desktop/src/app/chat/composer/hooks/use-popout-drag.ts
@ -0,0 +1,352 @@
+import {
+  type PointerEvent as ReactPointerEvent,
+  type RefObject,
+  useCallback,
+  useEffect,
+  useRef,
+  useState
+} from 'react'
+
+import {
+  POPOUT_ESTIMATED_HEIGHT,
+  POPOUT_WIDTH_REM,
+  setComposerPopoutPosition,
+  type PopoutPosition,
+  type PopoutSize
+} from '@/store/composer-popout'
+
+// Long-press delay before the floating composer's body becomes draggable (the 5px
+// frame drags instantly; this only applies when grabbing the composer body itself).
+const LONG_PRESS_MS = 360
+const LONG_PRESS_MOVE_TOLERANCE = 10
+// Upward drag distance from the docked composer that peels it off into a float.
+const PEEL_OUT_PX = 16
+const DOCK_ZONE_BOTTOM_PX = 72
+// Max horizontal offset (px) of the composer's center from the viewport center that
+// still counts as "over the dock". Kept tight so the bottom-left/right corners stay free.
+const DOCK_ZONE_CENTER_TOLERANCE_PX = 150
+// Falloff distances over which dock proximity ramps from 1 (in-zone) down to 0.
+const DOCK_VERTICAL_FALLOFF_PX = 260
+const DOCK_HORIZONTAL_FALLOFF_PX = 220
+
+interface PressState {
+  armed: boolean
+  mode: 'dock' | 'float'
+  pointerId: number
+  startBottom: number
+  startRight: number
+  startX: number
+  startY: number
+}
+
+interface ComposerPopoutGesturesOptions {
+  composerRef: RefObject<HTMLFormElement | null>
+  onDock: () => void
+  onPopOut: () => void
+  poppedOut: boolean
+  position: PopoutPosition
+}
+
+function gestureTargetOk(target: EventTarget | null) {
+  if (!(target instanceof Element)) {
+    return false
+  }
+
+  return !target.closest('button, a, input, textarea, select, [role="menuitem"], [data-radix-popper-content-wrapper]')
+}
+
+/** Floating composer's 5px outer frame — grab here to drag without long-press. */
+function isFloatDragPlatform(target: EventTarget | null) {
+  if (!(target instanceof Element)) {
+    return false
+  }
+
+  if (!target.closest('[data-slot="composer-root"][data-popped-out]')) {
+    return false
+  }
+
+  if (target.closest('[data-slot="composer-surface"], [data-slot="composer-rich-input"]')) {
+    return false
+  }
+
+  return gestureTargetOk(target)
+}
+
+/** 0 (far) → 1 (inside the dock zone). Drives both the dock glow and the
+ *  release-to-dock test (which fires at proximity 1). */
+function dockProximityOf(rect: DOMRect) {
+  const horizontalDist = Math.abs(rect.left + rect.width / 2 - window.innerWidth / 2)
+  const verticalGap = window.innerHeight - DOCK_ZONE_BOTTOM_PX - rect.bottom
+
+  const v = verticalGap <= 0 ? 1 : Math.max(0, 1 - verticalGap / DOCK_VERTICAL_FALLOFF_PX)
+  const h =
+    horizontalDist <= DOCK_ZONE_CENTER_TOLERANCE_PX
+      ? 1
+      : Math.max(0, 1 - (horizontalDist - DOCK_ZONE_CENTER_TOLERANCE_PX) / DOCK_HORIZONTAL_FALLOFF_PX)
+
+  return v * h
+}
+
+const clampOffset = (value: number, max: number) => Math.min(Math.max(0, value), max)
+
+/** Fixed-position composer uses bottom/right insets; keep the grab point under the pointer. */
+function popoutPositionUnderPointer(
+  clientX: number,
+  clientY: number,
+  grabX: number,
+  grabY: number,
+  boxWidth: number,
+  boxHeight: number
+): PopoutPosition {
+  return {
+    bottom: window.innerHeight - clientY + grabY - boxHeight,
+    right: window.innerWidth - clientX + grabX - boxWidth
+  }
+}
+
+/**
+ * Gesture pop-out / dock for the composer — fully gestural, no hold-to-toggle.
+ *
+ * Docked: drag the composer upward (off the dock) to peel it out into a float,
+ * then keep dragging in the same motion.
+ * Floating: drag the 5px frame to move instantly, or long-press the body then
+ * drag; release over the bottom-center dock band to snap back in.
+ */
+export function useComposerPopoutGestures({
+  composerRef,
+  onDock,
+  onPopOut,
+  poppedOut,
+  position
+}: ComposerPopoutGesturesOptions) {
+  const [dragging, setDragging] = useState(false)
+  const [dockProximity, setDockProximity] = useState(0)
+
+  const stateRef = useRef<PressState | null>(null)
+  const timerRef = useRef<number | null>(null)
+  const liveRef = useRef(position)
+  liveRef.current = position
+
+  const onPopOutRef = useRef(onPopOut)
+  onPopOutRef.current = onPopOut
+
+  const clearTimer = useCallback(() => {
+    if (timerRef.current !== null) {
+      window.clearTimeout(timerRef.current)
+      timerRef.current = null
+    }
+  }, [])
+
+  const resetGesture = useCallback(() => {
+    clearTimer()
+    stateRef.current = null
+    setDragging(false)
+    setDockProximity(0)
+  }, [clearTimer])
+
+  const beginFloatDrag = useCallback(
+    (state: PressState, clientX: number, clientY: number, next: PopoutPosition, size?: PopoutSize) => {
+      clearTimer()
+      const clamped = setComposerPopoutPosition(next, { size })
+      liveRef.current = clamped
+
+      state.mode = 'float'
+      state.armed = true
+      state.startBottom = clamped.bottom
+      state.startRight = clamped.right
+      state.startX = clientX
+      state.startY = clientY
+
+      setDragging(true)
+    },
+    [clearTimer]
+  )
+
+  const peelOffFromDock = useCallback(
+    (state: PressState, clientX: number, clientY: number) => {
+      const composer = composerRef.current
+
+      if (!composer) {
+        return
+      }
+
+      const rem = parseFloat(getComputedStyle(document.documentElement).fontSize) || 16
+      const rect = composer.getBoundingClientRect()
+      const boxWidth = POPOUT_WIDTH_REM * rem
+      const boxHeight = POPOUT_ESTIMATED_HEIGHT
+      const grabX = clampOffset(state.startX - rect.left, boxWidth)
+      const grabY = clampOffset(state.startY - rect.top, boxHeight)
+      const next = popoutPositionUnderPointer(clientX, clientY, grabX, grabY, boxWidth, boxHeight)
+
+      beginFloatDrag(state, clientX, clientY, next, { height: boxHeight, width: boxWidth })
+      onPopOutRef.current()
+    },
+    [beginFloatDrag, composerRef]
+  )
+
+  const onPointerDown = useCallback(
+    (event: ReactPointerEvent<HTMLElement>) => {
+      if (event.button !== 0 || !gestureTargetOk(event.target)) {
+        return
+      }
+
+      // Floating: grabbing the 5px platform drags immediately.
+      if (poppedOut && isFloatDragPlatform(event.target)) {
+        stateRef.current = {
+          armed: true,
+          mode: 'float',
+          pointerId: event.pointerId,
+          startBottom: liveRef.current.bottom,
+          startRight: liveRef.current.right,
+          startX: event.clientX,
+          startY: event.clientY
+        }
+        setDragging(true)
+
+        return
+      }
+
+      stateRef.current = {
+        armed: false,
+        mode: poppedOut ? 'float' : 'dock',
+        pointerId: event.pointerId,
+        startBottom: liveRef.current.bottom,
+        startRight: liveRef.current.right,
+        startX: event.clientX,
+        startY: event.clientY
+      }
+
+      clearTimer()
+
+      // Docked has NO timer — pop-out is purely the upward peel gesture (handled
+      // in pointermove). Floating arms a long-press to drag the body.
+      if (poppedOut) {
+        timerRef.current = window.setTimeout(() => {
+          const state = stateRef.current
+
+          if (!state || state.armed) {
+            return
+          }
+
+          state.armed = true
+          setDragging(true)
+        }, LONG_PRESS_MS)
+      }
+    },
+    [clearTimer, poppedOut]
+  )
+
+  useEffect(() => {
+    // Coalesce drag updates to one per frame — pointermove can fire several times
+    // between paints on high-Hz mice, and each update re-renders + clamps.
+    let raf: number | null = null
+    let pending: { x: number; y: number } | null = null
+
+    const cancelRaf = () => {
+      if (raf !== null) {
+        cancelAnimationFrame(raf)
+        raf = null
+      }
+    }
+
+    const flush = () => {
+      raf = null
+      const state = stateRef.current
+
+      if (!state?.armed || state.mode !== 'float' || !pending) {
+        return
+      }
+
+      const composer = composerRef.current
+      const size = composer ? { height: composer.offsetHeight, width: composer.offsetWidth } : undefined
+
+      liveRef.current = setComposerPopoutPosition(
+        {
+          bottom: state.startBottom - (pending.y - state.startY),
+          right: state.startRight - (pending.x - state.startX)
+        },
+        { size }
+      )
+
+      if (composer) {
+        setDockProximity(dockProximityOf(composer.getBoundingClientRect()))
+      }
+    }
+
+    const handleMove = (event: PointerEvent) => {
+      const state = stateRef.current
+
+      if (!state || event.pointerId !== state.pointerId) {
+        return
+      }
+
+      // Pre-arm: cheap threshold checks run inline (no per-frame work yet).
+      if (!state.armed) {
+        const deltaX = event.clientX - state.startX
+        const deltaY = event.clientY - state.startY
+
+        if (state.mode === 'dock') {
+          // Peel off only on a clear upward drag — not a sideways/down wiggle.
+          if (-deltaY > PEEL_OUT_PX && -deltaY > Math.abs(deltaX)) {
+            peelOffFromDock(state, event.clientX, event.clientY)
+          } else if (Math.abs(deltaX) > PEEL_OUT_PX || deltaY > LONG_PRESS_MOVE_TOLERANCE) {
+            resetGesture()
+          }
+        } else if (Math.abs(deltaX) > LONG_PRESS_MOVE_TOLERANCE || Math.abs(deltaY) > LONG_PRESS_MOVE_TOLERANCE) {
+          // Float body long-press pending: movement cancels the hold.
+          resetGesture()
+        }
+
+        return
+      }
+
+      if (state.mode !== 'float') {
+        return
+      }
+
+      event.preventDefault()
+      pending = { x: event.clientX, y: event.clientY }
+      raf ??= requestAnimationFrame(flush)
+    }
+
+    const handleUp = (event: PointerEvent) => {
+      const state = stateRef.current
+
+      if (!state || event.pointerId !== state.pointerId) {
+        return
+      }
+
+      cancelRaf()
+
+      if (state.armed && state.mode === 'float') {
+        const composer = composerRef.current
+        const rect = composer?.getBoundingClientRect()
+
+        if (rect && dockProximityOf(rect) >= 1) {
+          onDock()
+        } else {
+          // Persist the resting position once, on release — never per move.
+          const size = composer ? { height: composer.offsetHeight, width: composer.offsetWidth } : undefined
+          setComposerPopoutPosition(liveRef.current, { persist: true, size })
+        }
+      }
+
+      resetGesture()
+    }
+
+    window.addEventListener('pointermove', handleMove)
+    window.addEventListener('pointerup', handleUp)
+    window.addEventListener('pointercancel', handleUp)
+
+    return () => {
+      cancelRaf()
+      window.removeEventListener('pointermove', handleMove)
+      window.removeEventListener('pointerup', handleUp)
+      window.removeEventListener('pointercancel', handleUp)
+    }
+  }, [composerRef, onDock, peelOffFromDock, resetGesture])
+
+  useEffect(() => clearTimer, [clearTimer])
+
+  return { dockProximity, dragging, onPointerDown }
+}
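The geometry in `dockProximityOf` and `popoutPositionUnderPointer` can be checked in isolation. The sketch below is a hypothetical pure rewrite: the names `dockProximity` and `insetsUnderPointer` and the `Viewport`/`Box` types are illustrative, not part of the codebase. It takes the viewport size as a parameter instead of reading `window`, reuses the constants from the diff, and shows that proximity is the product of a vertical and a horizontal linear falloff, while the insets keep the grab point fixed under the pointer.

```typescript
// Hypothetical pure versions of dockProximityOf / popoutPositionUnderPointer.
// The viewport is passed in instead of read from `window`; constants match the diff.
interface Viewport {
  height: number
  width: number
}

interface Box {
  bottom: number // y of the box's bottom edge from the viewport top (like DOMRect.bottom)
  left: number
  width: number
}

const DOCK_ZONE_BOTTOM_PX = 72
const DOCK_ZONE_CENTER_TOLERANCE_PX = 150
const DOCK_VERTICAL_FALLOFF_PX = 260
const DOCK_HORIZONTAL_FALLOFF_PX = 220

// 1 when the box is inside the bottom-center dock zone, ramping linearly to 0.
function dockProximity(box: Box, vp: Viewport): number {
  const horizontalDist = Math.abs(box.left + box.width / 2 - vp.width / 2)
  const verticalGap = vp.height - DOCK_ZONE_BOTTOM_PX - box.bottom

  const v = verticalGap <= 0 ? 1 : Math.max(0, 1 - verticalGap / DOCK_VERTICAL_FALLOFF_PX)
  const h =
    horizontalDist <= DOCK_ZONE_CENTER_TOLERANCE_PX
      ? 1
      : Math.max(0, 1 - (horizontalDist - DOCK_ZONE_CENTER_TOLERANCE_PX) / DOCK_HORIZONTAL_FALLOFF_PX)

  // Product: the value is high only when the box is close on both axes.
  return v * h
}

// bottom/right insets for a fixed-position box so the grab point stays under the pointer.
function insetsUnderPointer(
  pointer: { x: number; y: number },
  grab: { x: number; y: number },
  size: { height: number; width: number },
  vp: Viewport
): { bottom: number; right: number } {
  return {
    bottom: vp.height - pointer.y + grab.y - size.height,
    right: vp.width - pointer.x + grab.x - size.width
  }
}

const vp: Viewport = { height: 800, width: 1200 }
console.log(dockProximity({ bottom: 780, left: 300, width: 600 }, vp)) // 1 (inside the zone)
console.log(dockProximity({ bottom: 598, left: 300, width: 600 }, vp)) // 0.5 (130px above the zone)
console.log(insetsUnderPointer({ x: 500, y: 400 }, { x: 100, y: 20 }, { height: 120, width: 480 }, vp))
// { bottom: 300, right: 320 }
```

Converting the insets back gives left = 1200 - 320 - 480 = 400 = pointer.x - grab.x and top = 800 - 300 - 120 = 380 = pointer.y - grab.y, which is why the composer does not jump when it peels off the dock.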
--- a/apps/desktop/src/app/chat/composer/index.tsx
+++ b/apps/desktop/src/app/chat/composer/index.tsx
@ -40,6 +40,13 @@ import {
  isBrowsingHistory,
  resetBrowseState
 } from '@/store/composer-input-history'
+import {
+  $composerPopoutPosition,
+  $composerPoppedOut,
+  POPOUT_WIDTH_REM,
+  setComposerPoppedOut,
+  setComposerPopoutPosition
+} from '@/store/composer-popout'
 import {
  $queuedPromptsBySession,
  enqueueQueuedPrompt,
@ -55,6 +62,7 @@ import { $statusItemsBySession } from '@/store/composer-status'
 import { notify } from '@/store/notifications'
 import { $gatewayState, $messages, setSessionPickerOpen } from '@/store/session'
 import { $threadScrolledUp } from '@/store/thread-scroll'
+import { isSecondaryWindow } from '@/store/windows'
 import { useTheme } from '@/themes'

 import { extractDroppedFiles, HERMES_PATHS_MIME, partitionDroppedFiles } from '../hooks/use-composer-actions'
@ -73,6 +81,7 @@ import {
 } from './focus'
 import { HelpHint } from './help-hint'
 import { useAtCompletions } from './hooks/use-at-completions'
+import { useComposerPopoutGestures } from './hooks/use-popout-drag'
 import { useSlashCompletions } from './hooks/use-slash-completions'
 import { useVoiceConversation } from './hooks/use-voice-conversation'
 import { useVoiceRecorder } from './hooks/use-voice-recorder'
@ -85,6 +94,7 @@ import {
 import { QueuePanel } from './queue-panel'
 import {
  composerPlainText,
+  deleteChipBeforeCaret,
  deleteSelectionInEditor,
  insertPlainTextAtCaret,
  normalizeComposerEditorDom,
@ -185,6 +195,13 @@ export function ChatBar({
  const queuedPromptsBySession = useStore($queuedPromptsBySession)
  const statusItemsBySession = useStore($statusItemsBySession)
  const scrolledUp = useStore($threadScrolledUp)
+  // Pop-out is a shared, persisted state — but secondary windows (the Ctrl+Shift+N
+  // tiny window, subagent watch windows) always start docked and can't pop out:
+  // a floating composer makes no sense in a single-session side window, and it
+  // would otherwise write the shared atom and yank the main window's composer out.
+  const popoutAllowed = !isSecondaryWindow()
+  const poppedOut = useStore($composerPoppedOut) && popoutAllowed
+  const popoutPosition = useStore($composerPopoutPosition)
  const activeQueueSessionKey = queueSessionKey || sessionId || null

  const queuedPrompts = useMemo(
@ -206,6 +223,32 @@ export function ChatBar({
  const composerRef = useRef<HTMLFormElement | null>(null)
  const composerSurfaceRef = useRef<HTMLDivElement | null>(null)
  const editorRef = useRef<HTMLDivElement | null>(null)
+
+  const handleComposerPopOut = useCallback(() => {
+    triggerHaptic('open')
+    setComposerPoppedOut(true)
+  }, [])
+
+  const handleComposerDock = useCallback(() => {
+    triggerHaptic('success')
+    setComposerPoppedOut(false)
+  }, [])
+
+  // Double-click the grab area toggles dock/float. Undocking restores the last
+  // position (the persisted atom is never cleared on dock).
+  const handleComposerToggle = useCallback(() => {
+    poppedOut ? handleComposerDock() : handleComposerPopOut()
+  }, [handleComposerDock, handleComposerPopOut, poppedOut])
+
+  const { dockProximity, dragging, onPointerDown: onComposerGesturePointerDown } =
+    useComposerPopoutGestures({
+      composerRef,
+      onDock: handleComposerDock,
+      onPopOut: handleComposerPopOut,
+      poppedOut,
+      position: popoutPosition
+    })
+
  const draftRef = useRef(draft)
  const pendingDraftPersistRef = useRef<{ scope: string | null; text: string } | null>(null)
  const activeQueueSessionKeyRef = useRef(activeQueueSessionKey)
@ -405,7 +448,10 @@ export function ChatBar({
      return
    }

-    if (draft.includes('\n')) {
+    // Only a non-trailing newline forces an immediate expand. A trailing newline
+    // (or phantom \n from contenteditable junk) is left to the ResizeObserver,
+    // which expands only when the editor's real height actually grows.
+    if (draft.trimEnd().includes('\n')) {
      setExpanded(true)
    }
  }, [draft, expanded])
@ -428,6 +474,20 @@ export function ChatBar({
      return
    }

+    // Floating composer is out of the thread's flow — it must not reserve any
+    // bottom clearance. Zero the measured vars so the thread reclaims the space.
+    // (Read globals here so the callback stays stable; mirror the popoutAllowed
+    // gate since secondary windows are forced docked.)
+    if ($composerPoppedOut.get() && !isSecondaryWindow()) {
+      const root = document.documentElement
+      lastBucketedHeightRef.current = 0
+      lastBucketedSurfaceHeightRef.current = 0
+      root.style.setProperty('--composer-measured-height', '0px')
+      root.style.setProperty('--composer-surface-measured-height', '0px')
+
+      return
+    }
+
    const { height, width } = composer.getBoundingClientRect()
    const surfaceHeight = composerSurfaceRef.current?.getBoundingClientRect().height
    const root = document.documentElement
@ -474,6 +534,35 @@ export function ChatBar({

  useResizeObserver(syncComposerMetrics, composerRef, composerSurfaceRef, editorRef)

+  // Toggling pop-out changes whether the composer reserves thread clearance.
+  // The ResizeObserver may not fire (the element's box size can stay the same), so
+  // re-sync explicitly: docked republishes the measured height, floating zeroes
+  // it so the thread reclaims the bottom space.
+  useEffect(() => {
+    syncComposerMetrics()
+  }, [poppedOut, syncComposerMetrics])
+
+  // Keep the floating box on-screen: re-clamp (with the real measured size) when
+  // it pops out and whenever the window resizes — so a position persisted on a
+  // bigger/other monitor, or a shrunk window, can never strand it out of reach.
+  useEffect(() => {
+    if (!poppedOut) {
+      return undefined
+    }
+
+    const reclamp = (persist: boolean) => {
+      const el = composerRef.current
+      const size = el ? { height: el.offsetHeight, width: el.offsetWidth } : undefined
+      setComposerPopoutPosition($composerPopoutPosition.get(), { persist, size })
+    }
+
+    reclamp(true)
+    const onResize = () => reclamp(false)
+    window.addEventListener('resize', onResize)
+
+    return () => window.removeEventListener('resize', onResize)
+  }, [poppedOut])
+
  useEffect(() => {
    return () => {
      const root = document.documentElement
@ -832,6 +921,22 @@ export function ChatBar({
      return
    }

+    // Plain Backspace right after a directive chip: remove the chip + its
+    // auto-inserted trailing space as one unit, so deleting a directive never
+    // leaves an orphaned space. (Modified backspaces stay native.)
+    if (
+      event.key === 'Backspace' &&
+      !event.metaKey &&
+      !event.ctrlKey &&
+      !event.altKey &&
+      deleteChipBeforeCaret(event.currentTarget)
+    ) {
+      event.preventDefault()
+      flushEditorToDraft(event.currentTarget)
+
+      return
+    }
+
    // Non-collapsed Backspace/Delete: native selection-delete is ~O(n²) on large
    // drafts (Ctrl+A → Delete froze ~1.3s). Collapsed carets fall through.
    if (
@ -1720,6 +1825,7 @@ export function ChatBar({
      busyAction={busyAction}
      canSteer={canSteer}
      canSubmit={canSubmit}
+      compactModelPill={poppedOut}
      conversation={{
        active: voiceConversationActive,
        level: conversation.level,
@ -1750,7 +1856,7 @@ export function ChatBar({
        autoCapitalize="off"
        autoCorrect="off"
        className={cn(
-          'min-h-(--composer-input-min-height) max-h-(--composer-input-max-height) overflow-y-auto whitespace-pre-wrap break-words [overflow-wrap:anywhere] bg-transparent pb-1 pr-1 pt-1 leading-normal text-foreground outline-none disabled:cursor-not-allowed',
+          'min-h-(--composer-input-min-height) max-h-(--composer-input-max-height) cursor-text overflow-y-auto whitespace-pre-wrap break-words [overflow-wrap:anywhere] bg-transparent pb-1 pr-1 pt-1 leading-normal text-foreground outline-none disabled:cursor-not-allowed',
          'empty:before:content-[attr(data-placeholder)] empty:before:text-muted-foreground/60',
          '**:data-ref-text:cursor-default',
          stacked && 'pl-3',
@ -1819,10 +1925,34 @@ export function ChatBar({

  return (
    <>
+      {dragging && poppedOut && (
+        <div
+          aria-hidden
+          className="pointer-events-none fixed inset-x-0 bottom-0 z-20 h-32"
+          style={{
+            // A bottom-centered radial glow — soft on every side by construction,
+            // so it reads as the dock target without any hard band edges. Its
+            // intensity tracks how close the composer is to the dock (1 = peak).
+            background:
+              'radial-gradient(64% 130% at 50% 100%, color-mix(in srgb, var(--color-primary) 26%, transparent) 0%, transparent 70%)',
+            // Scaled by --dock-glow-scale (lower in light mode — see styles.css).
+            opacity: `calc(${0.1 + dockProximity * 0.57} * var(--dock-glow-scale, 1))`
+          }}
+        />
+      )}
      <ComposerPrimitive.Unstable_TriggerPopoverRoot>
        <ComposerPrimitive.Root
-          className="group/composer absolute bottom-0 left-1/2 z-30 w-[min(var(--composer-width),calc(100%-2rem))] max-w-full -translate-x-1/2 rounded-2xl pt-2 pb-[var(--composer-shell-pad-block-end)]"
+          className={cn(
+            'group/composer z-30 overflow-visible rounded-2xl',
+            poppedOut
+              ? // Floating: the composer (with its own border) floats with an even
+                // 5px transparent grab margin around it — drag that to move it.
+                'fixed w-[var(--composer-popout-width)] max-w-[calc(100vw-1.5rem)] bg-transparent p-[5px]'
+              : 'absolute bottom-0 left-1/2 w-[min(var(--composer-width),calc(100%-2rem))] max-w-full -translate-x-1/2 pt-2 pb-[var(--composer-shell-pad-block-end)]',
+            dragging && 'cursor-grabbing select-none touch-none'
+          )}
          data-drag-active={dragActive ? '' : undefined}
+          data-popped-out={poppedOut ? '' : undefined}
          data-slot="composer-root"
          data-status-stack={statusStackVisible ? '' : undefined}
          data-thread-scrolled-up={scrolledUp ? '' : undefined}
@ -1830,6 +1960,7 @@ export function ChatBar({
          onDragLeave={handleDragLeave}
          onDragOver={handleDragOver}
          onDrop={handleDrop}
+          onPointerDown={popoutAllowed ? onComposerGesturePointerDown : undefined}
          onSubmit={e => {
            e.preventDefault()

@ -1840,6 +1971,16 @@ export function ChatBar({
            submitDraft()
          }}
          ref={composerRef}
+          style={
+            poppedOut
+              ? {
+                  bottom: `${popoutPosition.bottom}px`,
+                  right: `${popoutPosition.right}px`,
+                  // A compact one-sentence width when floating.
+                  ['--composer-popout-width' as string]: `${POPOUT_WIDTH_REM}rem`
+                }
+              : undefined
+          }
        >
          {showHelpHint && <HelpHint />}
          {trigger && !argStageEmpty && (
@ -1876,16 +2017,31 @@ export function ChatBar({
            }
            sessionId={statusSessionId}
          />
-          <div
-            className="pointer-events-none absolute inset-0 rounded-[inherit]"
-            style={{ background: COMPOSER_FADE_BACKGROUND }}
-          />
+          {!poppedOut && (
+            <div
+              className="pointer-events-none absolute inset-0 rounded-[inherit]"
+              style={{ background: COMPOSER_FADE_BACKGROUND }}
+            />
+          )}
+          {/* Drag region: covers the transparent grab margin around the surface.
+              The surface sits on top (z-4) so only the exposed ring receives this
+              element's hover/cursor — grab cursor + a diagonal hatch (/////)
+              appear when you hover the draggable margin, never over the input.
+              The hatch pattern + opacity ladder live in styles.css. */}
+          {popoutAllowed && (
+            <div
+              aria-hidden
+              className={cn('pointer-events-auto absolute inset-0', dragging ? 'cursor-grabbing' : 'cursor-grab')}
+              data-dragging={dragging ? '' : undefined}
+              data-slot="composer-drag-region"
+              onDoubleClick={handleComposerToggle}
+            />
+          )}
          <div className="relative w-full rounded-[inherit]">
            <div
              className={cn(
                'group/composer-surface relative z-4 isolate rounded-[inherit] border border-[color-mix(in_srgb,var(--dt-composer-ring)_calc(18%*var(--composer-ring-strength)),var(--dt-input))] transition-[border-color] duration-200 ease-out focus-within:border-[color-mix(in_srgb,var(--dt-composer-ring)_calc(45%*var(--composer-ring-strength)),transparent)]',
                COMPOSER_DROP_FADE_CLASS,
-                'group-has-data-[state=open]/composer:border-t-transparent',
                dragActive && COMPOSER_DROP_ACTIVE_CLASS
              )}
              data-slot="composer-surface"
@ -1941,7 +2097,7 @@ export function ChatBar({
                      : 'grid-cols-[auto_1fr_auto] items-center gap-(--composer-control-gap) [grid-template-areas:"menu_input_controls"]'
                  )}
                >
-                  <div className="flex items-center [grid-area:menu]">{contextMenu}</div>
+                  <div className="flex translate-y-[3px] items-start self-start [grid-area:menu]">{contextMenu}</div>
                  <div className="min-w-0 [grid-area:input]">{input}</div>
                  <div className="flex items-center justify-end [grid-area:controls]">{controls}</div>
                </div>
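Two small rules in `ChatBar` are easy to check in isolation: the auto-expand test (only a newline with content after it forces expansion) and the dock-glow opacity, which the inline style computes as `0.1 + dockProximity * 0.57` before CSS multiplies it by `--dock-glow-scale`. The helpers `forcesExpand` and `glowOpacity` below are hypothetical names for illustration, and `glowOpacity` folds the CSS scale factor into JavaScript as an assumption.

```typescript
// Hypothetical helper mirroring `draft.trimEnd().includes('\n')`:
// trailing newlines/whitespace are ignored, so only an interior line break expands.
function forcesExpand(draft: string): boolean {
  return draft.trimEnd().includes('\n')
}

// Hypothetical helper mirroring the glow's opacity expression. In the diff the
// scale comes from the CSS variable --dock-glow-scale (default 1).
// Ranges from 0.1 (far from the dock) to about 0.67 (inside the dock zone).
function glowOpacity(dockProximity: number, glowScale = 1): number {
  return (0.1 + dockProximity * 0.57) * glowScale
}

for (const draft of ['hello', 'hello\n', 'hello\n\n', 'a\nb', 'a\nb\n']) {
  console.log(JSON.stringify(draft), forcesExpand(draft))
}
// → false for "hello", "hello\n", "hello\n\n"; true for "a\nb", "a\nb\n"
```

A trailing Shift+Enter therefore leaves the expand decision to the ResizeObserver, which only expands once the editor's measured height actually grows.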
--- a/apps/desktop/src/app/chat/composer/model-pill.tsx
+++ b/apps/desktop/src/app/chat/composer/model-pill.tsx
@ -29,7 +29,15 @@ const PILL = cn(
 * `model.options` dropdown (`modelMenuContent`) verbatim; falls back to the
 * full picker when the gateway is closed and no live menu exists.
 */
-export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatBarState['model'] }) {
+export function ModelPill({
+  compact = false,
+  disabled,
+  model
+}: {
+  compact?: boolean
+  disabled: boolean
+  model: ChatBarState['model']
+}) {
  const copy = useI18n().t.shell.statusbar
  const currentModel = useStore($currentModel)
  const currentProvider = useStore($currentProvider)
@ -40,7 +48,9 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
  // The model resolves a beat after the gateway/session comes up. Rather than
  // flash a literal "No model", show a quiet loader (inherits the pill text
  // color at half opacity) until a model lands.
-  const label = (
+  const label = compact ? (
+    <ChevronDown className="size-3.5 shrink-0 opacity-70" />
+  ) : (
    <>
      {currentModel.trim() ? (
        <span className="truncate">{formatModelStatusLabel(currentModel, { fastMode, reasoningEffort })}</span>
@ -51,13 +61,22 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
    </>
  )

+  // Compact (floating composer): a snug square holding just the chevron — no pill
+  // padding, sized to match the other composer icon buttons.
+  const pillClass = compact
+    ? cn(
+        'size-(--composer-control-size) shrink-0 justify-center gap-0 rounded-md p-0',
+        'text-(--ui-text-tertiary) hover:bg-(--chrome-action-hover) hover:text-foreground'
+      )
+    : PILL
+
  const title = currentProvider ? copy.modelTitle(currentProvider, currentModel || copy.modelNone) : copy.switchModel

  if (!model.modelMenuContent) {
    return (
      <Button
        aria-label={copy.openModelPicker}
-        className={PILL}
+        className={pillClass}
        disabled={disabled}
        onClick={() => setModelPickerOpen(true)}
        title={copy.openModelPicker}
@ -72,7 +91,14 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
  return (
    <DropdownMenu onOpenChange={setOpen} open={open}>
      <DropdownMenuTrigger asChild>
-        <Button aria-label={title} className={PILL} disabled={disabled} title={title} type="button" variant="ghost">
+        <Button
+          aria-label={title}
+          className={pillClass}
+          disabled={disabled}
+          title={title}
+          type="button"
+          variant="ghost"
+        >
          {label}
        </Button>
      </DropdownMenuTrigger>
--- a/apps/desktop/src/app/chat/composer/rich-editor.ts
+++ b/apps/desktop/src/app/chat/composer/rich-editor.ts
@ -172,6 +172,60 @@ export function insertPlainTextAtCaret(editor: HTMLElement, text: string) {
  }
 }

+/** Backspace at a collapsed caret immediately after a chip: delete the chip AND
+ *  the single trailing space we auto-insert after it, atomically — so removing a
+ *  directive never strands an orphaned space (the contenteditable-driven cleanup
+ *  was unreliable). Returns whether it ran. */
+export function deleteChipBeforeCaret(editor: HTMLElement): boolean {
+  const hit = composerSelectionRange(editor)
+
+  if (!hit || !hit.range.collapsed) {
+    return false
+  }
+
+  const { startContainer, startOffset } = hit.range
+  let chip: ChildNode | null = null
+
+  if (startContainer === editor) {
+    chip = startOffset > 0 ? editor.childNodes[startOffset - 1] : null
+  } else if (startContainer.nodeType === Node.TEXT_NODE && startOffset === 0) {
+    chip = startContainer.previousSibling
+  }
+
+  if (chip?.nodeType !== Node.ELEMENT_NODE || !(chip as HTMLElement).dataset.refText) {
+    return false
+  }
+
+  const after = chip.nextSibling
+  chip.remove()
+
+  // Drop the auto-inserted trailing space; keep any real following text.
+  if (after?.nodeType === Node.TEXT_NODE) {
+    const text = after.textContent ?? ''
+
+    if (text === ' ') {
+      after.remove()
+    } else if (text.startsWith(' ')) {
+      after.textContent = text.slice(1)
+    }
+  }
+
+  const caret = document.createRange()
+
+  if (after?.isConnected) {
+    caret.setStartBefore(after)
+  } else {
+    caret.selectNodeContents(editor)
+    caret.collapse(false)
+  }
+
+  caret.collapse(true)
+  hit.selection.removeAllRanges()
+  hit.selection.addRange(caret)
+
+  return true
+}
+
 /** Remove a non-collapsed selection in-editor. Skips collapsed carets so word/
 *  line delete (Opt/Cmd+Backspace) stays native. Returns whether anything ran. */
 export function deleteSelectionInEditor(editor: HTMLElement) {
@ -242,35 +296,68 @@ export function placeCaretEnd(element: HTMLElement) {
  selection?.addRange(range)
 }

-/** Drop contenteditable junk that serializes as `\n` and falsely expands the composer. */
-export function normalizeComposerEditorDom(editor: HTMLElement) {
-  if (editor.childNodes.length === 1 && editor.firstChild?.nodeName === 'BR') {
-    editor.replaceChildren()
-
-    return
+/** Nothing but a break / whitespace (recursively) — i.e. no real text or chip. */
+function isBlankNode(node: ChildNode | null): boolean {
+  if (!node) {
+    return false
  }

+  if (node.nodeName === 'BR') {
+    return true
+  }
+
+  if (node.nodeType === Node.TEXT_NODE) {
+    return !(node.textContent || '').trim()
+  }
+
+  if (node.nodeType === Node.ELEMENT_NODE) {
+    const el = node as HTMLElement
+
+    return !el.dataset.refText && Array.from(el.childNodes).every(isBlankNode)
+  }
+
+  return false
+}
+
+/** Drop contenteditable junk that serializes as `\n` and falsely expands the
+ *  composer. Editing around a contenteditable=false chip makes Chromium wrap the
+ *  remainder in stray block <div>s / trailing <br>s — none of which our own
+ *  rendering emits (we use text nodes + <br> + chips). Real <br> line breaks
+ *  (Shift+Enter, which sit after actual text) are preserved. */
+export function normalizeComposerEditorDom(editor: HTMLElement) {
+  // A trailing block wrapper holding only a break/whitespace is the phantom
+  // "new line" Chromium adds after a chip on backspace — drop it.
+  const tailBlock = editor.lastChild as HTMLElement | null
+
+  if (
+    tailBlock?.nodeType === Node.ELEMENT_NODE &&
+    (tailBlock.tagName === 'DIV' || tailBlock.tagName === 'P') &&
+    isBlankNode(tailBlock)
+  ) {
+    editor.removeChild(tailBlock)
+  }
+
+  // Unwrap a lone block wrapper back to inline content.
  if (editor.childNodes.length === 1 && editor.firstChild?.nodeType === Node.ELEMENT_NODE) {
    const wrapper = editor.firstChild as HTMLElement

-    if (wrapper.tagName === 'DIV' && wrapper.dataset.slot !== RICH_INPUT_SLOT) {
+    if ((wrapper.tagName === 'DIV' || wrapper.tagName === 'P') && wrapper.dataset.slot !== RICH_INPUT_SLOT) {
      editor.replaceChildren(...Array.from(wrapper.childNodes))
    }
  }

+  // A trailing <br> preceded by nothing but whitespace, or by a chip, is a phantom line.
  const last = editor.lastChild

-  if (last?.nodeName !== 'BR') {
-    return
-  }
+  if (last?.nodeName === 'BR') {
+    let prev: ChildNode | null = last.previousSibling

-  let prev: ChildNode | null = last.previousSibling
+    while (prev?.nodeType === Node.TEXT_NODE && !(prev.textContent || '').trim()) {
+      prev = prev.previousSibling
+    }

-  while (prev?.nodeType === Node.TEXT_NODE && !(prev.textContent || '').trim()) {
-    prev = prev.previousSibling
-  }
-
-  if ((prev as HTMLElement | null)?.dataset.refText) {
-    editor.removeChild(last)
+    if (!prev || (prev as HTMLElement).dataset?.refText) {
+      editor.removeChild(last)
+    }
  }
 }
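The blank-node and phantom-break rules above are easier to check outside a browser. Below is a minimal, DOM-free sketch under a simplified node model. `EditorNode`, `isBlank`, `isBlockWrapper` and `normalize` are hypothetical stand-ins for DOM `ChildNode`s, `isBlankNode` and `normalizeComposerEditorDom`. A chip is assumed to be any element carrying `refText`, and the `RICH_INPUT_SLOT` wrapper exception is not modeled.

```typescript
// Hypothetical, DOM-free model of the composer's children (not the real DOM API).
type EditorNode =
  | { kind: 'br' }
  | { kind: 'text'; text: string }
  | { kind: 'element'; tag: string; refText?: string; children: EditorNode[] }

type ElementNode = Extract<EditorNode, { kind: 'element' }>

// Same rule as isBlankNode: a break, whitespace-only text, or a non-chip
// element whose children are all blank (an empty element counts as blank).
function isBlank(node: EditorNode | undefined): boolean {
  if (!node) {
    return false
  }
  if (node.kind === 'br') {
    return true
  }
  if (node.kind === 'text') {
    return node.text.trim() === ''
  }
  return !node.refText && node.children.every(isBlank)
}

const isBlockWrapper = (n: EditorNode | undefined): n is ElementNode =>
  n?.kind === 'element' && (n.tag === 'DIV' || n.tag === 'P')

const isWhitespaceText = (n: EditorNode): boolean => n.kind === 'text' && n.text.trim() === ''

// Same three steps as normalizeComposerEditorDom, applied to a copy of the children.
function normalize(children: EditorNode[]): EditorNode[] {
  let out = [...children]

  // 1. Drop a trailing DIV/P that holds only breaks/whitespace.
  const tail = out[out.length - 1]
  if (isBlockWrapper(tail) && isBlank(tail)) {
    out = out.slice(0, -1)
  }

  // 2. Unwrap a lone DIV/P back to inline content.
  const only = out.length === 1 ? out[0] : undefined
  if (isBlockWrapper(only)) {
    out = [...only.children]
  }

  // 3. Drop a trailing <br> whose nearest non-whitespace predecessor is
  //    missing (nothing before it) or is a chip.
  if (out[out.length - 1]?.kind === 'br') {
    let i = out.length - 2
    while (i >= 0 && isWhitespaceText(out[i])) {
      i--
    }
    const prev = i >= 0 ? out[i] : undefined
    if (!prev || (prev.kind === 'element' && prev.refText)) {
      out = out.slice(0, -1)
    }
  }

  return out
}
```

A real Shift+Enter break (a `<br>` after actual text) survives step 3, because its predecessor is a non-blank text node rather than a chip.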
--- a/apps/desktop/src/app/chat/composer/trigger-popover.tsx
+++ b/apps/desktop/src/app/chat/composer/trigger-popover.tsx
@ -137,7 +137,7 @@ export function ComposerTriggerPopover({
                        floating tooltip. */}
                    <span
                      className={cn(
-                        'text-[0.8125rem] font-medium leading-snug text-foreground',
+                        'font-medium leading-snug text-foreground',
                        active ? 'whitespace-normal break-words' : 'truncate'
                      )}
                    >
@ -146,7 +146,7 @@ export function ComposerTriggerPopover({
                    {description && (
                      <span
                        className={cn(
-                          'text-[0.6875rem] leading-snug text-(--ui-text-tertiary)',
+                          'leading-snug text-(--ui-text-tertiary)',
                          active ? 'whitespace-normal break-words' : 'truncate'
                        )}
                      >
--- a/apps/desktop/src/app/chat/sidebar/session-actions-menu.test.ts
+++ b/apps/desktop/src/app/chat/sidebar/session-actions-menu.test.ts
@ -0,0 +1,92 @@
+import { afterEach, describe, expect, it, vi } from 'vitest'
+
+import { $activeSessionId, $selectedStoredSessionId } from '@/store/session'
+
+import { renameSessionPreferringRpc } from './session-actions-menu'
+
+// The branched-session rename bug: a freshly branched session lives only in the
+// gateway's runtime _sessions map (no state.db row yet), so REST PATCH
+// /api/sessions/{id} 404s with "Session not found". renameSessionPreferringRpc
+// must route the ACTIVE row through the session.title RPC (runtime id), which
+// persists the row on demand, and otherwise fall back to REST.
+
+const renameSession = vi.fn(async () => ({ ok: true, title: 'rest-title' }))
+const request = vi.fn(async () => ({ title: 'rpc-title' }) as never)
+const activeGateway = vi.fn<() => { request: typeof request } | null>(() => ({ request }))
+
+vi.mock('@/hermes', () => ({
+  renameSession: (...args: unknown[]) => renameSession(...(args as [])),
+  HermesGateway: class {}
+}))
+
+vi.mock('@/store/gateway', () => ({
+  activeGateway: () => activeGateway()
+}))
+
+const RUNTIME_ID = 'rt-runtime-1'
+const STORED_ID = 'stored-branch-1'
+
+afterEach(() => {
+  renameSession.mockClear()
+  request.mockClear()
+  activeGateway.mockReset()
+  activeGateway.mockReturnValue({ request })
+  $activeSessionId.set(null)
+  $selectedStoredSessionId.set(null)
+})
+
+describe('renameSessionPreferringRpc', () => {
+  it('renames the active branched session via the session.title RPC, not REST', async () => {
+    $selectedStoredSessionId.set(STORED_ID)
+    $activeSessionId.set(RUNTIME_ID)
+
+    const result = await renameSessionPreferringRpc(STORED_ID, 'My branch')
+
+    expect(request).toHaveBeenCalledWith('session.title', { session_id: RUNTIME_ID, title: 'My branch' })
+    expect(renameSession).not.toHaveBeenCalled()
+    expect(result.title).toBe('rpc-title')
+  })
+
+  it('falls back to REST when the RPC fails (e.g. socket mid-reconnect)', async () => {
+    $selectedStoredSessionId.set(STORED_ID)
+    $activeSessionId.set(RUNTIME_ID)
+    request.mockRejectedValueOnce(new Error('not connected'))
+
+    const result = await renameSessionPreferringRpc(STORED_ID, 'My branch', 'work')
+
+    expect(request).toHaveBeenCalledOnce()
+    expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', 'work')
+    expect(result.title).toBe('rest-title')
+  })
+
+  it('uses REST for a non-active row (background/persisted session)', async () => {
+    $selectedStoredSessionId.set('some-other-active-session')
+    $activeSessionId.set(RUNTIME_ID)
+
+    await renameSessionPreferringRpc(STORED_ID, 'My branch', 'work')
+
+    expect(request).not.toHaveBeenCalled()
+    expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', 'work')
+  })
+
+  it('uses REST when clearing the title (RPC rejects empty titles)', async () => {
+    $selectedStoredSessionId.set(STORED_ID)
+    $activeSessionId.set(RUNTIME_ID)
+
+    await renameSessionPreferringRpc(STORED_ID, '')
+
+    expect(request).not.toHaveBeenCalled()
+    expect(renameSession).toHaveBeenCalledWith(STORED_ID, '', undefined)
+  })
+
+  it('uses REST when no gateway is connected', async () => {
+    $selectedStoredSessionId.set(STORED_ID)
+    $activeSessionId.set(RUNTIME_ID)
+    activeGateway.mockReturnValue(null)
+
+    await renameSessionPreferringRpc(STORED_ID, 'My branch')
+
+    expect(request).not.toHaveBeenCalled()
+    expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', undefined)
+  })
+})
--- a/apps/desktop/src/app/chat/sidebar/session-actions-menu.tsx
+++ b/apps/desktop/src/app/chat/sidebar/session-actions-menu.tsx
@ -19,10 +19,58 @@ import { renameSession } from '@/hermes'
 import { useI18n } from '@/i18n'
 import { triggerHaptic } from '@/lib/haptics'
 import { exportSession } from '@/lib/session-export'
+import { activeGateway } from '@/store/gateway'
 import { notify, notifyError } from '@/store/notifications'
-import { setSessions } from '@/store/session'
+import { $activeSessionId, $selectedStoredSessionId, setSessions } from '@/store/session'
 import { canOpenSessionWindow, openSessionInNewWindow } from '@/store/windows'

+import type { SessionTitleResponse } from '../../types'
+
+// Rename a session, preferring the gateway's session.title RPC over REST.
+//
+// A freshly *branched* session (and any brand-new chat) lives only in the
+// gateway's in-memory _sessions map keyed by its RUNTIME id — no row is
+// persisted to state.db until the first turn. REST PATCH /api/sessions/{id}
+// resolves against the stored sessions table, so it 404s ("Session not found")
+// on these runtime-only sessions. The session.title RPC resolves the live
+// runtime session AND persists the row on demand, so it succeeds where REST
+// cannot. This mirrors the /title slash command's fix (use-prompt-actions.ts).
+//
+// We only take the RPC path for the ACTIVE/selected session: its runtime id is
+// known ($activeSessionId) and it lives on the active gateway, so there is no
+// profile-routing ambiguity. Every other row (already persisted, possibly on a
+// background profile) keeps the REST path, which handles profile scoping. The
+// RPC also requires a non-empty title (it rejects clears), so clearing a title
+// stays on REST too.
+export async function renameSessionPreferringRpc(
+  storedSessionId: string,
+  title: string,
+  profile?: string
+): Promise<{ title?: string }> {
+  const isActiveRow = storedSessionId === $selectedStoredSessionId.get()
+  const runtimeId = isActiveRow ? $activeSessionId.get() : null
+  const gateway = activeGateway()
+
+  if (title && runtimeId && gateway) {
+    try {
+      const result = await gateway.request<SessionTitleResponse>('session.title', {
+        session_id: runtimeId,
+        title
+      })
+
+      return { title: result?.title ?? title }
+    } catch (err) {
+      // Fall through to REST — e.g. the socket is mid-reconnect. REST still
+      // works for any session that already has a persisted row. Log so a
+      // genuine RPC-side failure (which then surfaces a REST 404 for the
+      // runtime id) is at least diagnosable instead of silently swallowed.
+      console.warn('session.title RPC rename failed; falling back to REST', err)
+    }
+  }
+
+  return renameSession(storedSessionId, title, profile)
+}
+
 interface SessionActions {
  sessionId: string
  title: string
@ -235,7 +283,7 @@ function RenameSessionDialog({ open, onOpenChange, sessionId, currentTitle, prof
    setSubmitting(true)

    try {
-      const result = await renameSession(sessionId, next, profile)
+      const result = await renameSessionPreferringRpc(sessionId, next, profile)
      const finalTitle = result.title || next || ''
      setSessions(prev => prev.map(s => (s.id === sessionId ? { ...s, title: finalTitle || null } : s)))
      notify({ durationMs: 2_000, kind: 'success', message: r.renamed })
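The routing rule in `renameSessionPreferringRpc` reduces to a small predicate plus a try-then-fallback. A minimal sketch follows. It assumes plain values in place of the `$selectedStoredSessionId` / `$activeSessionId` stores and `activeGateway()`. `RenameContext`, `chooseRenameRoute` and `renameWithFallback` are hypothetical names, and the `rpc` / `rest` callbacks stand in for the `session.title` request and `renameSession`.

```typescript
// Hypothetical inputs: plain values standing in for $selectedStoredSessionId,
// $activeSessionId, and whether activeGateway() returned a gateway.
interface RenameContext {
  storedSessionId: string
  title: string
  selectedStoredId: string | null
  activeRuntimeId: string | null
  gatewayConnected: boolean
}

type RenameRoute = 'rpc' | 'rest'

// RPC only for a non-empty title on the active row with a live gateway;
// every other case (other rows, clears, no gateway) goes to REST.
function chooseRenameRoute(ctx: RenameContext): RenameRoute {
  const isActiveRow = ctx.storedSessionId === ctx.selectedStoredId
  const runtimeId = isActiveRow ? ctx.activeRuntimeId : null

  return ctx.title && runtimeId && ctx.gatewayConnected ? 'rpc' : 'rest'
}

// Try the RPC when chosen; any RPC error falls back to REST, as in the original.
async function renameWithFallback(
  ctx: RenameContext,
  rpc: (runtimeId: string, title: string) => Promise<string | undefined>,
  rest: (storedId: string, title: string) => Promise<string | undefined>
): Promise<string | undefined> {
  if (chooseRenameRoute(ctx) === 'rpc' && ctx.activeRuntimeId) {
    try {
      return (await rpc(ctx.activeRuntimeId, ctx.title)) ?? ctx.title
    } catch (err) {
      console.warn('RPC rename failed; falling back to REST', err)
    }
  }

  return rest(ctx.storedSessionId, ctx.title)
}
```

Keeping the decision pure lets the route for each case be asserted without mocking the gateway or the stores.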
--- a/apps/desktop/src/app/command-center/index.tsx
+++ b/apps/desktop/src/app/command-center/index.tsx
@ -395,7 +395,7 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
                      </div>
                      <div className="flex shrink-0 items-center gap-1.5 whitespace-nowrap">
                        <Button onClick={() => void runSystemAction('restart')} size="xs" variant="text">
-                          {cc.restartMessaging}
+                          {cc.restartGateway}
                        </Button>
                        <Button onClick={() => void runSystemAction('update')} size="xs" variant="textStrong">
                          {cc.updateHermes}
@ -426,7 +426,10 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
                    </span>
                  )}
                </div>
-                <pre className="min-h-0 flex-1 overflow-auto whitespace-pre-wrap wrap-break-word rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-3 font-mono text-[0.65rem] leading-relaxed text-(--ui-text-tertiary)">
+                <pre
+                  className="min-h-0 flex-1 overflow-auto whitespace-pre-wrap wrap-break-word rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-3 font-mono text-[0.65rem] leading-relaxed text-(--ui-text-tertiary)"
+                  data-selectable-text="true"
+                >
                  {logs.length ? logs.join('\n') : cc.noLogs}
                </pre>
              </div>
--- a/apps/desktop/src/app/command-palette/index.tsx
+++ b/apps/desktop/src/app/command-palette/index.tsx
@ -31,6 +31,7 @@ import {
  Palette,
  PawPrint,
  Plus,
+  RefreshCw,
  Settings,
  Settings2,
  Sun,
@ -42,6 +43,7 @@ import {
 import { cn } from '@/lib/utils'
 import { $commandPaletteOpen, $commandPalettePage, closeCommandPalette, setCommandPaletteOpen } from '@/store/command-palette'
 import { $bindings } from '@/store/keybinds'
+import { runGatewayRestart } from '@/store/system-actions'
 import { luminance } from '@/themes/color'
 import { type ThemeMode, useTheme } from '@/themes/context'
 import { isUserTheme, resolveTheme } from '@/themes/user-themes'
@ -371,6 +373,13 @@ export function CommandPalette() {
            keywords: ['command center', 'usage', 'tokens', 'cost'],
            label: cc.sections.usage,
            run: go(`${COMMAND_CENTER_ROUTE}?section=usage`)
+          },
+          {
+            icon: RefreshCw,
+            id: 'cc-restart-gateway',
+            keywords: ['gateway', 'restart', 'messaging', 'reconnect', 'system'],
+            label: cc.restartGateway,
+            run: () => void runGatewayRestart()
          }
        ]
      },
--- a/apps/desktop/src/app/desktop-controller.tsx
+++ b/apps/desktop/src/app/desktop-controller.tsx
@ -8,12 +8,14 @@ import { DesktopInstallOverlay } from '@/components/desktop-install-overlay'
 import { DesktopOnboardingOverlay } from '@/components/desktop-onboarding-overlay'
 import { GatewayConnectingOverlay } from '@/components/gateway-connecting-overlay'
 import { Pane, PaneMain } from '@/components/pane-shell'
+import { RemoteDisplayBanner } from '@/components/remote-display-banner'
 import { useMediaQuery } from '@/hooks/use-media-query'
 import { useSkinCommand } from '@/themes/use-skin-command'

 import { formatRefValue } from '../components/assistant-ui/directive-text'
 import { getCronJobs, getSessionMessages, listAllProfileSessions, type SessionInfo, triggerCronJob } from '../hermes'
 import { type ChatMessage, chatMessageText, preserveLocalAssistantErrors, toChatMessages } from '../lib/chat-messages'
+import { storedSessionIdForNotification } from '../lib/session-ids'
 import {
  isMessagingSource,
  LOCAL_SESSION_SOURCE_IDS,
@ -279,16 +281,20 @@ export function DesktopController() {
    }
  }, [])

-  // Notification click: the main process already focused the window; jump to its session.
+  // Notification click: the main process already focused the window; jump to its
+  // session. Notifications are tagged with the gateway *runtime* session id, but
+  // the chat route is keyed by the *stored* id — navigating with the runtime id
+  // resumes a non-existent stored session ("session not found") and strands the
+  // user. Translate runtime -> stored before navigating.
  useEffect(() => {
    const unsubscribe = window.hermesDesktop?.onFocusSession?.(sessionId => {
      if (sessionId) {
-        navigate(sessionRoute(sessionId))
+        navigate(sessionRoute(storedSessionIdForNotification(sessionId, runtimeIdByStoredSessionIdRef.current)))
      }
    })

    return () => unsubscribe?.()
-  }, [navigate])
+  }, [navigate, runtimeIdByStoredSessionIdRef])

  // Notification action button (Approve/Reject) — resolve in place, no navigation.
  useEffect(() => {
@ -1001,6 +1007,7 @@ export function DesktopController() {

  const overlays = (
    <>
+      <RemoteDisplayBanner />
      {!isSecondaryWindow() && <DesktopInstallOverlay />}
      {!isSecondaryWindow() && (
        <DesktopOnboardingOverlay
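The notification fix hinges on translating a gateway runtime id back to a stored id before building the route. `storedSessionIdForNotification` itself is not shown in this diff, so the following is a hypothetical sketch. It assumes the ref holds a `Map` from stored id to runtime id (as the name `runtimeIdByStoredSessionIdRef` suggests) and that an unmapped id is passed through unchanged.

```typescript
// Hypothetical sketch; the real storedSessionIdForNotification in
// ../lib/session-ids is not shown, so the map type here is an assumption.
function storedIdForRuntimeId(runtimeId: string, runtimeIdByStoredId: Map<string, string>): string {
  // Reverse lookup: find the stored id whose live runtime id matches.
  for (const [storedId, liveRuntimeId] of Array.from(runtimeIdByStoredId.entries())) {
    if (liveRuntimeId === runtimeId) {
      return storedId
    }
  }

  // Assumption: an id with no mapping is navigated to unchanged.
  return runtimeId
}
```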
--- a/apps/desktop/src/app/messaging/index.tsx
+++ b/apps/desktop/src/app/messaging/index.tsx
@ -17,6 +17,7 @@ import { type Translations, useI18n } from '@/i18n'
 import { AlertTriangle, ExternalLink, Save, Trash2 } from '@/lib/icons'
 import { cn } from '@/lib/utils'
 import { notify, notifyError } from '@/store/notifications'
+import { runGatewayRestart } from '@/store/system-actions'

 import { useRefreshHotkey } from '../hooks/use-refresh-hotkey'
 import { useRouteEnumParam } from '../hooks/use-route-enum-param'
@ -97,6 +98,8 @@ function fieldCopy(field: MessagingEnvVarInfo, m: Translations['messaging']) {
 export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, ...props }: MessagingViewProps) {
  const { t } = useI18n()
  const m = t.messaging
+  // Both save/toggle toasts offer the same one-click restart.
+  const restartGatewayAction = { label: t.commandCenter.restartGateway, onClick: () => void runGatewayRestart() }
  const [platforms, setPlatforms] = useState<MessagingPlatformInfo[] | null>(null)
  const [edits, setEdits] = useState<EditMap>({})
  const [query, setQuery] = useState('')
@ -197,7 +200,8 @@ export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, .
      notify({
        kind: 'success',
        title: enabled ? m.platformEnabled(platform.name) : m.platformDisabled(platform.name),
-        message: m.restartToApply
+        message: m.restartToApply,
+        action: restartGatewayAction
      })
    } catch (err) {
      notifyError(err, m.failedUpdate(platform.name))
@ -222,7 +226,8 @@ export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, .
      notify({
        kind: 'success',
        title: m.setupSaved(platform.name),
-        message: m.restartToReconnect
+        message: m.restartToReconnect,
+        action: restartGatewayAction
      })
    } catch (err) {
      notifyError(err, m.failedSave(platform.name))
--- a/apps/desktop/src/app/right-sidebar/index.tsx
+++ b/apps/desktop/src/app/right-sidebar/index.tsx
@ -173,6 +173,7 @@ function FilesystemTab({
          disabled={!hasCwd || loading}
          onClick={onRefresh}
          size="icon-xs"
+          title={r.refreshTree}
          variant="ghost"
        >
          <Codicon name="refresh" size="0.8125rem" spinning={loading} />
@ -182,6 +183,7 @@ function FilesystemTab({
          className={HEADER_ACTION_CLASS}
          onClick={() => void onChangeFolder()}
          size="icon-xs"
+          title={r.openFolder}
          variant="ghost"
        >
          <Codicon name="folder-opened" size="0.8125rem" />
@ -192,6 +194,7 @@ function FilesystemTab({
          disabled={!hasCwd || !canCollapse}
          onClick={onCollapseAll}
          size="icon-xs"
+          title={r.collapseAll}
          variant="ghost"
        >
          <Codicon name="collapse-all" size="0.8125rem" />
--- a/apps/desktop/src/app/session/hooks/use-prompt-actions.test.tsx
+++ b/apps/desktop/src/app/session/hooks/use-prompt-actions.test.tsx
@ -205,6 +205,67 @@ describe('usePromptActions /title', () => {
  })
 })

+describe('usePromptActions slash.exec dispatch payloads', () => {
+  afterEach(() => {
+    cleanup()
+    $busy.set(false)
+    vi.restoreAllMocks()
+  })
+
+  it('submits /goal send directives returned directly by slash.exec instead of rendering no output', async () => {
+    const calls: { method: string; params?: Record<string, unknown> }[] = []
+    const states: Record<string, unknown>[] = []
+    const requestGateway = vi.fn(async (method: string, params?: Record<string, unknown>) => {
+      calls.push({ method, params })
+
+      if (method === 'slash.exec') {
+        return {
+          type: 'send',
+          notice: '⊙ Goal set. Starting now.',
+          message: 'write the implementation plan'
+        } as never
+      }
+
+      return {} as never
+    })
+
+    let handle: HarnessHandle | null = null
+    render(
+      <Harness
+        onReady={h => (handle = h)}
+        onSeedState={s => states.push(s)}
+        refreshSessions={async () => undefined}
+        requestGateway={requestGateway}
+      />
+    )
+
+    await handle!.submitText('/goal write the implementation plan')
+
+    expect(calls.map(c => c.method)).toEqual(['slash.exec', 'prompt.submit'])
+    expect(calls[0]?.params).toEqual({
+      command: 'goal write the implementation plan',
+      session_id: RUNTIME_SESSION_ID
+    })
+    expect(calls[1]?.params).toEqual({
+      session_id: RUNTIME_SESSION_ID,
+      text: 'write the implementation plan'
+    })
+
+    const renderedText = states
+      .flatMap(state => {
+        const messages = Array.isArray(state.messages)
+          ? (state.messages as Array<{ parts?: Array<{ text?: string }> }>)
+          : []
+
+        return messages.flatMap(message => (message.parts ?? []).map(part => part.text ?? ''))
+      })
+      .join('\n')
+
+    expect(renderedText).toContain('⊙ Goal set. Starting now.')
+    expect(renderedText).not.toContain('/goal: no output')
+  })
+})
+
 describe('usePromptActions desktop slash pickers', () => {
  beforeEach(() => {
    setSessions(() => [sessionInfo({ id: '20260610_120000_abcdef', title: 'Loaded session' })])
--- a/apps/desktop/src/app/session/hooks/use-prompt-actions.ts
+++ b/apps/desktop/src/app/session/hooks/use-prompt-actions.ts
@ -33,6 +33,7 @@ import {
  clearComposerAttachments,
  type ComposerAttachment,
  setComposerAttachmentUploadState,
+  setComposerDraft,
  terminalContextBlocksFromDraft,
  updateComposerAttachment
 } from '@/store/composer'
@ -916,31 +917,7 @@ export function usePromptActions({
          return
        }

-        try {
-          const result = await requestGateway<SlashExecResponse>('slash.exec', {
-            session_id: sessionId,
-            command: command.replace(/^\/+/, '')
-          })
-
-          const body = result?.output || `/${name}: no output`
-          renderSlashOutput(result?.warning ? `warning: ${result.warning}\n${body}` : body)
-
-          return
-        } catch {
-          // Fall back to command.dispatch for skill/send/alias directives.
-        }
-
-        try {
-          const dispatch = parseCommandDispatch(
-            await requestGateway<unknown>('command.dispatch', { session_id: sessionId, name, arg })
-          )
-
-          if (!dispatch) {
-            renderSlashOutput('error: invalid response: command.dispatch')
-
-            return
-          }
-
+        const handleDispatch = async (dispatch: NonNullable<ReturnType<typeof parseCommandDispatch>>): Promise<void> => {
          if (dispatch.type === 'exec' || dispatch.type === 'plugin') {
            renderSlashOutput(dispatch.output ?? '(no output)')

@ -953,8 +930,26 @@ export function usePromptActions({
            return
          }

+          // send / prefill carry an optional `notice` (e.g. "⊙ Goal set …")
+          // that the backend wants shown as a system line before the message
+          // is acted on. Mirrors the TUI's createSlashHandler — without it a
+          // `/goal <text>` looked like it did nothing.
+          if ((dispatch.type === 'send' || dispatch.type === 'prefill') && dispatch.notice?.trim()) {
+            renderSlashOutput(dispatch.notice.trim())
+          }
+
          const message = ('message' in dispatch ? dispatch.message : '')?.trim() ?? ''

+          // /undo returns a prefill directive: drop the backed-up message into
+          // the composer for editing instead of submitting it immediately.
+          if (dispatch.type === 'prefill') {
+            if (message) {
+              setComposerDraft(message)
+            }
+
+            return
+          }
+
          if (!message) {
            renderSlashOutput(
              `/${name}: ${dispatch.type === 'skill' ? 'skill payload missing message' : 'empty message'}`
@ -974,6 +969,43 @@ export function usePromptActions({
          }

          await submitPromptText(message)
+        }
+
+        try {
+          const result = await requestGateway<unknown>('slash.exec', {
+            session_id: sessionId,
+            command: command.replace(/^\/+/, '')
+          })
+
+          const dispatch = parseCommandDispatch(result)
+
+          if (dispatch) {
+            await handleDispatch(dispatch)
+
+            return
+          }
+
+          const output = result && typeof result === 'object' ? (result as SlashExecResponse) : null
+          const body = output?.output || `/${name}: no output`
+          renderSlashOutput(output?.warning ? `warning: ${output.warning}\n${body}` : body)
+
+          return
+        } catch {
+          // Fall back to command.dispatch for skill/send/alias directives.
+        }
+
+        try {
+          const dispatch = parseCommandDispatch(
+            await requestGateway<unknown>('command.dispatch', { session_id: sessionId, name, arg })
+          )
+
+          if (!dispatch) {
+            renderSlashOutput('error: invalid response: command.dispatch')
+
+            return
+          }
+
+          await handleDispatch(dispatch)
        } catch (err) {
          renderSlashOutput(`error: ${err instanceof Error ? err.message : String(err)}`)
        }
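The order of effects in `handleDispatch` (notice first, then draft or submit) can be written as a pure planning function, which makes the `/goal` and `/undo` cases easy to check. This is a simplified sketch. `Dispatch`, `Step` and `planDispatch` are hypothetical, they cover only the branches visible in this diff, and the directive types handled in the lines elided between hunks are not modeled.

```typescript
// Hypothetical, simplified dispatch shapes (only the branches visible in the diff).
interface Directive<T extends 'send' | 'prefill' | 'skill'> {
  type: T
  message?: string
  notice?: string
}

type Dispatch =
  | { type: 'exec'; output?: string }
  | { type: 'plugin'; output?: string }
  | Directive<'send'>
  | Directive<'prefill'>
  | Directive<'skill'>

// What the UI should do, in order.
type Step =
  | { kind: 'render'; text: string } // system line in the transcript
  | { kind: 'draft'; text: string } // put text into the composer
  | { kind: 'submit'; text: string } // send as a user prompt

function planDispatch(name: string, d: Dispatch): Step[] {
  if (d.type === 'exec' || d.type === 'plugin') {
    return [{ kind: 'render', text: d.output ?? '(no output)' }]
  }

  const steps: Step[] = []
  const notice = d.notice?.trim()

  // send/prefill may carry a notice that is shown before anything else.
  if ((d.type === 'send' || d.type === 'prefill') && notice) {
    steps.push({ kind: 'render', text: notice })
  }

  const message = d.message?.trim() ?? ''

  // prefill (e.g. /undo) drafts the message instead of submitting it.
  if (d.type === 'prefill') {
    if (message) {
      steps.push({ kind: 'draft', text: message })
    }
    return steps
  }

  if (!message) {
    const reason = d.type === 'skill' ? 'skill payload missing message' : 'empty message'
    steps.push({ kind: 'render', text: `/${name}: ${reason}` })
    return steps
  }

  steps.push({ kind: 'submit', text: message })
  return steps
}
```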
--- a/apps/desktop/src/app/settings/about-settings.tsx
+++ b/apps/desktop/src/app/settings/about-settings.tsx
@ -13,7 +13,8 @@ import {
  $updateStatus,
  checkUpdates,
  openUpdatesWindow,
-  refreshDesktopVersion
+  refreshDesktopVersion,
+  startActiveUpdate
 } from '@/store/updates'

 import { ListRow, SectionHeading, SettingsContent } from './primitives'
@ -141,9 +142,14 @@ export function AboutSettings() {
            </Button>

            {behind > 0 && supported && !applying && (
-              <Button onClick={() => openUpdatesWindow()} size="sm">
-                {a.seeWhatsNew}
-              </Button>
+              <>
+                <Button onClick={() => startActiveUpdate()} size="sm">
+                  {a.updateNow}
+                </Button>
+                <Button onClick={() => openUpdatesWindow()} size="sm" variant="textStrong">
+                  {a.seeWhatsNew}
+                </Button>
+              </>
            )}

            <Button asChild className="ml-auto" size="sm" variant="text">
--- a/apps/desktop/src/app/settings/constants.ts
+++ b/apps/desktop/src/app/settings/constants.ts
@ -74,7 +74,6 @@ export const PROVIDER_GROUPS: ProviderPrefix[] = [
    priority: 4
  },
  { prefix: 'GEMINI_', name: 'Gemini', priority: 4 },
-  { prefix: 'HERMES_GEMINI_', name: 'Gemini', priority: 4 },
  {
    prefix: 'DEEPSEEK_',
    name: 'DeepSeek',
--- a/apps/desktop/src/app/settings/helpers.test.ts
+++ b/apps/desktop/src/app/settings/helpers.test.ts
@ -132,9 +132,9 @@ describe('settings helpers', () => {
      // KIMI_CN_ likewise must beat KIMI_.
      expect(providerGroup('KIMI_CN_API_KEY')).toBe('Kimi (China)')
      expect(providerGroup('KIMI_API_KEY')).toBe('Kimi / Moonshot')
-      // HERMES_QWEN_ and HERMES_GEMINI_ both share the HERMES_ stem.
+      // HERMES_QWEN_ shares the HERMES_ stem with other integrations.
      expect(providerGroup('HERMES_QWEN_BASE_URL')).toBe('DashScope (Qwen)')
-      expect(providerGroup('HERMES_GEMINI_CLIENT_ID')).toBe('Gemini')
+      expect(providerGroup('GEMINI_API_KEY')).toBe('Gemini')
    })

    it('falls back to "Other" for un-grouped env vars', () => {
--- a/apps/desktop/src/app/settings/providers-settings.test.tsx
+++ b/apps/desktop/src/app/settings/providers-settings.test.tsx
@ -2,7 +2,7 @@ import { cleanup, fireEvent, render, screen, waitFor } from '@testing-library/re
 import { atom } from 'nanostores'
 import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'

-import type { OAuthProvider } from '@/types/hermes'
+import type { EnvVarInfo, OAuthProvider } from '@/types/hermes'

 const listOAuthProviders = vi.fn()
 const disconnectOAuthProvider = vi.fn()
@ -36,6 +36,25 @@ function provider(id: string, loggedIn: boolean, patch: Partial<OAuthProvider> =
  }
 }

+// One `/api/env` row (an EnvVarInfo) for the API-keys view. Mirrors the
+// `provider()` factory above: a valid base + per-test overrides, typed against
+// the real response shape so it can't drift from EnvVarInfo.
+function keyVar(patch: Partial<EnvVarInfo> = {}): EnvVarInfo {
+  return {
+    advanced: false,
+    category: 'provider',
+    description: '',
+    is_password: true,
+    is_set: false,
+    provider: '',
+    provider_label: '',
+    redacted_value: null,
+    tools: [],
+    url: '',
+    ...patch
+  }
+}
+
 beforeEach(() => {
  onboarding.set({ manual: false })
  getEnvVars.mockResolvedValue({})
@ -97,4 +116,56 @@ describe('ProvidersSettings', () => {
    expect(screen.queryByRole('button', { name: 'Remove Qwen Code' })).toBeNull()
    expect(screen.getByText(/managed by its own CLI/)).toBeTruthy()
  })
+
+  it('renders a Keys card for a backend-tagged provider with no PROVIDER_GROUPS prefix', async () => {
+    // A provider the backend catalog tags (provider/provider_label) but that has
+    // no desktop PROVIDER_GROUPS prefix row must still render its own card —
+    // this is the GUI/CLI drift fix: membership comes from the backend, not
+    // from the hand-maintained prefix list.
+    getEnvVars.mockResolvedValue({
+      WIDGETAI_API_KEY: keyVar({
+        provider: 'widgetai',
+        provider_label: 'WidgetAI',
+        url: 'https://widgetai.example/keys'
+      })
+    })
+    listOAuthProviders.mockResolvedValue({ providers: [] })
+
+    const { ProvidersSettings } = await import('./providers-settings')
+    render(<ProvidersSettings onClose={vi.fn()} onViewChange={vi.fn()} view="keys" />)
+
+    expect(await screen.findByText('WidgetAI')).toBeTruthy()
+  })
+
+  it('orders API-key providers by priority then name, and filters them via search', async () => {
+    // These three providers have no curated PROVIDER_GROUPS priority, so they
+    // share the default priority and fall back to alphabetical among themselves
+    // (Acme, Middle, Zebra) — exercising the name tiebreak of the priority sort.
+    getEnvVars.mockResolvedValue({
+      ZEBRA_API_KEY: keyVar({ provider: 'zebra', provider_label: 'Zebra' }),
+      ACME_API_KEY: keyVar({ provider: 'acme', provider_label: 'Acme' }),
+      MIDDLE_API_KEY: keyVar({ provider: 'middle', provider_label: 'Middle' })
+    })
+    listOAuthProviders.mockResolvedValue({ providers: [] })
+
+    const { ProvidersSettings } = await import('./providers-settings')
+    render(<ProvidersSettings onClose={vi.fn()} onViewChange={vi.fn()} view="keys" />)
+
+    // Equal priority → alphabetical tiebreak: Acme, Middle, Zebra.
+    await screen.findByText('Acme')
+    const labels = screen.getAllByText(/Acme|Middle|Zebra/).map(el => el.textContent)
+    expect(labels).toEqual(['Acme', 'Middle', 'Zebra'])
+
+    // Typing narrows the list to matching providers only.
+    const search = screen.getByPlaceholderText('Search providers…')
+    fireEvent.change(search, { target: { value: 'mid' } })
+
+    await waitFor(() => expect(screen.queryByText('Acme')).toBeNull())
+    expect(screen.getByText('Middle')).toBeTruthy()
+    expect(screen.queryByText('Zebra')).toBeNull()
+
+    // A non-matching query shows the empty-state copy.
+    fireEvent.change(search, { target: { value: 'nonesuch-xyz' } })
+    expect(await screen.findByText('No providers match your search.')).toBeTruthy()
+  })
 })
--- a/apps/desktop/src/app/settings/providers-settings.tsx
+++ b/apps/desktop/src/app/settings/providers-settings.tsx
@ -12,6 +12,7 @@ import {
  sortProviders
 } from '@/components/desktop-onboarding-overlay'
 import { Button } from '@/components/ui/button'
+import { SearchField } from '@/components/ui/search-field'
 import { disconnectOAuthProvider, listOAuthProviders } from '@/hermes'
 import { useI18n } from '@/i18n'
 import { Check, ChevronDown, ChevronRight, KeyRound, Loader2, Terminal, Trash2 } from '@/lib/icons'
@ -45,8 +46,17 @@ export const PROVIDER_VIEWS = ['accounts', 'keys'] as const
 export type ProviderView = (typeof PROVIDER_VIEWS)[number]

 // Group the env catalog by provider — one ListRow per vendor plus optional
-// advanced overrides (base URL, region, etc.). Groups without a key field and
-// the "Other" bucket are skipped.
+// advanced overrides (base URL, region, etc.). Groups without a key field are
+// skipped.
+//
+// Grouping key precedence:
+//   1. Backend `provider_label` / `provider` (from the unified provider catalog
+//      in hermes_cli/provider_catalog.py) — the SAME provider identity
+//      `hermes model` uses. This is authoritative: a provider tagged by the
+//      backend always renders a card, even with no PROVIDER_GROUPS row.
+//   2. Desktop prefix match (`providerGroup`) — legacy fallback for provider
+//      env vars that predate the backend tagging.
+// Only entries that resolve to neither (the "Other" bucket) are skipped.
 function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGroup[] {
  const buckets = new Map<string, [string, EnvVarInfo][]>()

@ -55,7 +65,9 @@ function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGr
      continue
    }

-    const name = providerGroup(key)
+    // Prefer the backend-supplied provider label/id so the Keys tab groups by
+    // the same identity the CLI picker uses; fall back to the prefix guess.
+    const name = info.provider_label?.trim() || info.provider?.trim() || providerGroup(key)

    if (name === 'Other') {
      continue
@ -73,6 +85,9 @@ function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGr
      continue
    }

+    // Presentation overlay (priority, blurb, docs) is keyed by the prefix-based
+    // group name; when the backend introduced this provider it may have no
+    // overlay entry, so fall back to the backend/env metadata for display.
    const meta = providerMeta(name)

    groups.push({
@ -131,6 +146,7 @@ function OAuthPicker({
  const rest = featured ? ordered.filter(p => p.id !== FEATURED_ID) : ordered
  // Keep connected accounts grouped and always visible; only the unconnected
  // providers hide behind the disclosure, so the page leads with what's set up.
+  // Both lists preserve `sortProviders` order (curated priority, then name).
  const connected = rest.filter(p => p.status?.logged_in)
  const others = rest.filter(p => !p.status?.logged_in)
  const collapsible = others.length > 0
@ -284,6 +300,8 @@ export function ProvidersSettings({ onClose, onViewChange, view }: ProvidersSett
  const [oauthProviders, setOauthProviders] = useState<OAuthProvider[]>([])
  const [openProvider, setOpenProvider] = useState<null | string>(null)
  const [disconnecting, setDisconnecting] = useState<null | string>(null)
+  // Free-text filter for the API-keys view (provider name / env-var key / desc).
+  const [keyQuery, setKeyQuery] = useState('')
  // The onboarding overlay owns the OAuth flow. Watch its `manual` flag so we
  // re-read connection state when the user finishes (or dismisses) a sign-in
  // they launched from this page — otherwise the cards keep their stale status.
@ -372,20 +390,49 @@ export function ProvidersSettings({ onClose, onViewChange, view }: ProvidersSett
  const keyGroups = buildProviderKeyGroups(vars)

  if (showApiKeys) {
+    const q = keyQuery.trim().toLowerCase()
+    const visibleGroups = q
+      ? keyGroups.filter(group => {
+          const haystack = [
+            group.name,
+            group.description ?? '',
+            group.primary[0],
+            ...group.advanced.map(([k]) => k)
+          ]
+
+          return haystack.some(s => s.toLowerCase().includes(q))
+        })
+      : keyGroups
+
    return (
      <SettingsContent>
        {keyGroups.length > 0 ? (
-          <div className="grid gap-2">
-            {keyGroups.map(group => (
-              <ProviderKeyRows
-                expanded={openProvider === group.name}
-                group={group}
-                key={group.name}
-                onExpand={() => setOpenProvider(group.name)}
-                onToggle={() => setOpenProvider(prev => (prev === group.name ? null : group.name))}
-                rowProps={rowProps}
-              />
-            ))}
+          <div className="grid gap-3">
+            <SearchField
+              aria-label={t.settings.providers.searchKeys}
+              containerClassName="w-full"
+              onChange={setKeyQuery}
+              placeholder={t.settings.providers.searchKeys}
+              value={keyQuery}
+            />
+            {visibleGroups.length > 0 ? (
+              <div className="grid gap-2">
+                {visibleGroups.map(group => (
+                  <ProviderKeyRows
+                    expanded={openProvider === group.name}
+                    group={group}
+                    key={group.name}
+                    onExpand={() => setOpenProvider(group.name)}
+                    onToggle={() => setOpenProvider(prev => (prev === group.name ? null : group.name))}
+                    rowProps={rowProps}
+                  />
+                ))}
+              </div>
+            ) : (
+              <div className="grid min-h-24 place-items-center px-4 py-6 text-center text-[length:var(--conversation-caption-font-size)] text-muted-foreground">
+                {t.settings.providers.noKeysMatch}
+              </div>
+            )}
          </div>
        ) : (
          <NoProviderKeys />
--- a/apps/desktop/src/app/settings/toolset-config-panel.tsx
+++ b/apps/desktop/src/app/settings/toolset-config-panel.tsx
@ -272,7 +272,10 @@ function PostSetupRunner({ toolset, postSetupKey, onComplete }: PostSetupRunnerP
      </div>

      {status && (status.lines.length > 0 || status.running) && (
-        <pre className="max-h-48 overflow-y-auto rounded-md bg-background px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground whitespace-pre-wrap">
+        <pre
+          className="max-h-48 overflow-y-auto rounded-md bg-background px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground whitespace-pre-wrap"
+          data-selectable-text="true"
+        >
          {status.lines.length > 0 ? status.lines.join('\n') : copy.postSetupStarting}
        </pre>
      )}
--- a/apps/desktop/src/app/shell/hooks/use-statusbar-items.tsx
+++ b/apps/desktop/src/app/shell/hooks/use-statusbar-items.tsx
@ -4,6 +4,7 @@ import { useCallback, useMemo } from 'react'
 import type { CommandCenterSection } from '@/app/command-center'
 import { $terminalTakeover, setTerminalTakeover } from '@/app/right-sidebar/store'
 import { GatewayMenuPanel } from '@/app/shell/gateway-menu-panel'
+import { GlyphSpinner } from '@/components/ui/glyph-spinner'
 import { useI18n } from '@/i18n'
 import {
  Activity,
@ -35,6 +36,7 @@ import {
  setYoloActive
 } from '@/store/session'
 import { $subagentsBySession, activeSubagentCount } from '@/store/subagents'
+import { $gatewayRestarting } from '@/store/system-actions'
 import {
  $backendUpdateApply,
  $backendUpdateStatus,
@ -89,6 +91,7 @@ export function useStatusbarItems({
  const busy = useStore($busy)
  const currentUsage = useStore($currentUsage)
  const desktopActionTasks = useStore($desktopActionTasks)
+  const gatewayRestarting = useStore($gatewayRestarting)
  const previewServerRestartStatus = useStore($previewServerRestartStatus)
  const sessionStartedAt = useStore($sessionStartedAt)
  const turnStartedAt = useStore($turnStartedAt)
@ -299,9 +302,15 @@ export function useStatusbarItems({
        variant: 'action'
      },
      {
-        className: gatewayClassName,
-        detail: gatewayDetail,
-        icon: inferenceReady ? <Activity className="size-3" /> : <AlertCircle className="size-3" />,
+        className: gatewayRestarting ? undefined : gatewayClassName,
+        detail: gatewayRestarting ? copy.gatewayRestarting : gatewayDetail,
+        icon: gatewayRestarting ? (
+          <GlyphSpinner ariaLabel={copy.gatewayRestarting} className="size-3" />
+        ) : inferenceReady ? (
+          <Activity className="size-3" />
+        ) : (
+          <AlertCircle className="size-3" />
+        ),
        id: 'gateway-health',
        label: copy.gateway,
        menuClassName: 'w-72',
@ -354,6 +363,7 @@ export function useStatusbarItems({
      gatewayMenuContent,
      gatewayClassName,
      gatewayDetail,
+      gatewayRestarting,
      inferenceReady,
      inferenceStatus?.reason,
      openAgents,
--- a/apps/desktop/src/app/shell/model-menu-panel.tsx
+++ b/apps/desktop/src/app/shell/model-menu-panel.tsx
@ -1,5 +1,5 @@
 import { useStore } from '@nanostores/react'
-import { useQuery } from '@tanstack/react-query'
+import { useQuery, useQueryClient } from '@tanstack/react-query'
 import { createContext, useContext, useMemo, useState } from 'react'

 import { Codicon } from '@/components/ui/codicon'
@ -62,6 +62,8 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
  const copy = t.shell.modelMenu
  const closeMenu = useContext(ModelMenuCloseContext)
  const [search, setSearch] = useState('')
+  const [refreshing, setRefreshing] = useState(false)
+  const queryClient = useQueryClient()
  // Reactive session state is read from the stores here (not drilled in), so
  // toggling effort/fast/model re-renders this panel in place without forcing
  // the parent to rebuild the menu content (which would close the dropdown).
@ -110,6 +112,38 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
  // next session.create (see selectModel). The default lives in Settings → Model.
  const switchTo = (model: string, provider: string) => onSelectModel({ model, provider })

+  // Explicit "Refresh Models": re-fetch the catalog with refresh:true so the
+  // backend busts its 1h provider-model disk cache and re-pulls each provider's
+  // live list. Fixes live-only models (e.g. OpenCode Zen free tier) vanishing
+  // when the cache expires and falls back to the curated static list.
+  const refreshModels = async () => {
+    if (refreshing) {
+      return
+    }
+
+    setRefreshing(true)
+
+    try {
+      const queryKey = ['model-options', activeSessionId || 'global']
+
+      const next =
+        gateway && activeSessionId
+          ? await gateway.request<ModelOptionsResponse>('model.options', {
+              session_id: activeSessionId,
+              refresh: true
+            })
+          : await getGlobalModelOptions({ refresh: true })
+
+      queryClient.setQueryData<ModelOptionsResponse>(queryKey, next)
+    } catch {
+      // Network/backend hiccup — fall back to a plain invalidate so the next
+      // open re-fetches (still cached, but no worse than before).
+      void queryClient.invalidateQueries({ queryKey: ['model-options'] })
+    } finally {
+      setRefreshing(false)
+    }
+  }
+
  // Selecting a model row restores that model's remembered preset onto the
  // session (effort/fast), gated by capability. Unset → Hermes defaults.
  const selectFamily = async (family: ModelFamily, provider: ModelOptionProvider) => {
@ -173,7 +207,7 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
          {copy.noModels}
        </DropdownMenuItem>
      ) : (
-        <div className="max-h-80 overflow-y-auto py-0.5">
+        <div className="max-h-[max(150px,30dvh)] overflow-y-auto py-0.5">
          {groups.map(group => (
            <DropdownMenuGroup className="py-0.5" key={group.provider.slug}>
              <DropdownMenuLabel className={dropdownMenuSectionLabel}>{group.provider.name}</DropdownMenuLabel>
@ -268,10 +302,23 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model

      <DropdownMenuSeparator className="mx-0" />

+      <DropdownMenuItem
+        className={cn(dropdownMenuRow, 'text-(--ui-text-tertiary)')}
+        disabled={refreshing}
+        onSelect={event => {
+          event.preventDefault()
+          void refreshModels()
+        }}
+      >
+        <Codicon className={cn(refreshing && 'animate-spin')} name="sync" size="0.75rem" />
+        {copy.refreshModels}
+      </DropdownMenuItem>
+
      <DropdownMenuItem
        className={cn(dropdownMenuRow, 'text-(--ui-text-tertiary)')}
        onSelect={() => setModelVisibilityOpen(true)}
      >
+        <Codicon name="settings-gear" size="0.75rem" />
        {copy.editModels}
      </DropdownMenuItem>
    </>
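`refreshModels` follows a common pattern. It ignores re-entrant clicks, writes a forced (`refresh: true`) result straight into the query cache, and falls back to invalidation on failure. The sketch below shows that control flow without React. `OptionsCache` is a hypothetical stand-in for React Query's `QueryClient`: `set` plays the role of `setQueryData`, and prefix `invalidate` plays the role of `invalidateQueries({ queryKey: ['model-options'] })`. `fetchOptions` stands in for either `gateway.request('model.options', …)` or `getGlobalModelOptions`.

```typescript
// Hypothetical minimal cache standing in for React Query's QueryClient.
type ModelOptions = { models: string[] }

class OptionsCache {
  private data = new Map<string, ModelOptions>()
  private stale = new Set<string>()

  set(key: string, value: ModelOptions): void {
    this.data.set(key, value)
    this.stale.delete(key)
  }

  get(key: string): ModelOptions | undefined {
    return this.data.get(key)
  }

  // Mark every key under `prefix` stale without dropping its data.
  invalidate(prefix: string): void {
    for (const key of this.data.keys()) {
      if (key.startsWith(prefix)) this.stale.add(key)
    }
  }

  isStale(key: string): boolean {
    return this.stale.has(key)
  }
}

function makeRefresher(
  cache: OptionsCache,
  key: string,
  fetchOptions: (refresh: boolean) => Promise<ModelOptions>
): () => Promise<void> {
  let refreshing = false

  return async function refreshModels(): Promise<void> {
    // Ignore clicks while a refresh is already in flight.
    if (refreshing) return
    refreshing = true

    try {
      // Write the fresh list straight into the cache so an open menu re-renders.
      cache.set(key, await fetchOptions(true))
    } catch {
      // Keep the old data, but mark it stale so the next open re-fetches.
      cache.invalidate('model-options')
    } finally {
      refreshing = false
    }
  }
}
```

In this sketch, the closure flag makes the guard synchronous. In the diff, the guard is a React state flag, and the menu item is also rendered with `disabled={refreshing}`.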
--- a/apps/desktop/src/app/types.ts
+++ b/apps/desktop/src/app/types.ts
@ -106,6 +106,13 @@ export interface SkillCommandDispatchResponse {
 export interface SendCommandDispatchResponse {
  type: 'send'
  message: string
+  notice?: string
+}
+
+export interface PrefillCommandDispatchResponse {
+  type: 'prefill'
+  message: string
+  notice?: string
 }

 export type CommandDispatchResponse =
@ -113,6 +120,7 @@ export type CommandDispatchResponse =
  | AliasCommandDispatchResponse
  | SkillCommandDispatchResponse
  | SendCommandDispatchResponse
+  | PrefillCommandDispatchResponse

 export type SidebarNavId = 'artifacts' | 'command-center' | 'messaging' | 'new-session' | 'settings' | 'skills'
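Because every `CommandDispatchResponse` member carries a literal `type` tag, consumers can narrow it with a `switch`, and TypeScript can verify that the new `'prefill'` variant is handled. The sketch below uses local copies of the two shapes added here. `ComposerActions` is hypothetical, and the "prefill puts text in the composer without sending" behavior is an assumption inferred from the name, not something this diff shows.

```typescript
// Local copies of the two shapes shown in types.ts; the real
// CommandDispatchResponse union has more members (alias, skill, ...).
interface SendResponse {
  type: 'send'
  message: string
  notice?: string
}

interface PrefillResponse {
  type: 'prefill'
  message: string
  notice?: string
}

type DispatchSketch = SendResponse | PrefillResponse

// Hypothetical UI hooks the handler drives.
interface ComposerActions {
  submit(text: string): void
  setDraft(text: string): void
  toast(text: string): void
}

function handleDispatch(res: DispatchSketch, ui: ComposerActions): void {
  // Both variants may carry an optional user-facing notice.
  if (res.notice) ui.toast(res.notice)

  switch (res.type) {
    case 'send':
      ui.submit(res.message)
      return
    case 'prefill':
      // Assumed semantics: place the text in the composer for the user to edit.
      ui.setDraft(res.message)
      return
    default: {
      // Exhaustiveness check: adding a union member without a case here
      // makes this assignment a compile-time error.
      const unreachable: never = res
      throw new Error(`unhandled dispatch: ${JSON.stringify(unreachable)}`)
    }
  }
}
```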

--- a/apps/desktop/src/app/updates-overlay.tsx
+++ b/apps/desktop/src/app/updates-overlay.tsx
@ -61,14 +61,16 @@ export function UpdatesOverlay() {

  const behind = status?.behind ?? 0

-  const phase: 'idle' | 'applying' | 'manual' | 'error' =
+  const phase: 'idle' | 'applying' | 'manual' | 'guiSkew' | 'error' =
    apply.stage === 'manual'
      ? 'manual'
-      : apply.applying || apply.stage === 'restart'
-        ? 'applying'
-        : apply.stage === 'error'
-          ? 'error'
-          : 'idle'
+      : apply.stage === 'guiSkew'
+        ? 'guiSkew'
+        : apply.applying || apply.stage === 'restart'
+          ? 'applying'
+          : apply.stage === 'error'
+            ? 'error'
+            : 'idle'

  const handleClose = (next: boolean) => {
    if (phase === 'applying') {
@ -77,7 +79,13 @@ export function UpdatesOverlay() {

    setUpdateOverlayOpen(next)

-    if (!next && (apply.stage === 'error' || apply.stage === 'restart' || apply.stage === 'manual')) {
+    if (
+      !next &&
+      (apply.stage === 'error' ||
+        apply.stage === 'restart' ||
+        apply.stage === 'manual' ||
+        apply.stage === 'guiSkew')
+    ) {
      resetUpdateApplyState()
    }
  }
@ -95,7 +103,11 @@ export function UpdatesOverlay() {
        {phase === 'applying' && <ApplyingView apply={apply} isBackend={isBackend} />}

        {phase === 'manual' && (
-          <ManualView command={apply.command ?? 'hermes update'} onDone={() => handleClose(false)} />
+          <ManualView command={apply.command ?? null} message={apply.message} onDone={() => handleClose(false)} />
+        )}
+
+        {phase === 'guiSkew' && (
+          <GuiSkewView message={apply.message} onDone={() => handleClose(false)} />
        )}

        {phase === 'error' && (
@ -251,18 +263,48 @@ function IdleView({
  )
 }

-function ManualView({ command, onDone }: { command: string; onDone: () => void }) {
+function ManualView({
+  command,
+  message,
+  onDone
+}: {
+  command: string | null
+  message?: string
+  onDone: () => void
+}) {
  const { t } = useI18n()
  const u = t.updates
  const [copied, setCopied] = useState(false)

  const handleCopy = () => {
+    if (!command) return
    void writeClipboardText(command).then(() => {
      setCopied(true)
      window.setTimeout(() => setCopied(false), 1800)
    })
  }

+  // No command (e.g. the Linux sandbox-blocked relaunch): render the explanatory
+  // message + a Done button, not a copy-a-command box.
+  if (!command) {
+    return (
+      <div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
+        <div className="flex flex-col items-center gap-3 text-center">
+          <Terminal className="size-8 text-primary" />
+
+          <DialogTitle className="text-center text-xl">{u.manualTitle}</DialogTitle>
+          <DialogDescription className="text-center text-sm">
+            {message || u.manualPickedUp}
+          </DialogDescription>
+        </div>
+
+        <Button className="font-semibold" onClick={onDone} size="lg" variant="secondary">
+          {u.done}
+        </Button>
+      </div>
+    )
+  }
+
  return (
    <div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
      <div className="flex flex-col items-center gap-3 text-center">
@ -309,6 +351,32 @@ function ManualView({ command, onDone }: { command: string; onDone: () => void }
  )
 }

+// Linux GUI/backend skew (#45205): backend updated, but the running desktop app
+// package (AppImage/.deb/.rpm) was NOT changed. Closeable terminal state that
+// tells the user to update/reinstall the desktop app — never claims the GUI was
+// updated.
+function GuiSkewView({ message, onDone }: { message?: string; onDone: () => void }) {
+  const { t } = useI18n()
+  const u = t.updates
+
+  return (
+    <div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
+      <div className="flex flex-col items-center gap-3 text-center">
+        <AlertCircle className="size-8 text-amber-500" />
+
+        <DialogTitle className="text-center text-xl">{u.guiSkewTitle}</DialogTitle>
+        <DialogDescription className="max-w-prose text-center text-sm leading-5 text-muted-foreground">
+          {message || u.guiSkewBody}
+        </DialogDescription>
+      </div>
+
+      <Button className="font-semibold" onClick={onDone} size="lg" variant="secondary">
+        {u.done}
+      </Button>
+    </div>
+  )
+}
+
 function ApplyingView({ apply, isBackend }: { apply: UpdateApplyState; isBackend: boolean }) {
  const { t } = useI18n()
  const u = t.updates
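The `phase` nested ternary encodes a strict precedence: `manual`, then `guiSkew`, then `applying`/`restart`, then `error`, else `idle`. Written as early returns, the same precedence is easier to read and to test. The sketch below assumes a trimmed-down `ApplyStateSketch` with only the fields the derivation reads. It also restates the close-time reset condition from `handleClose`. It does not model the `phase === 'applying'` branch, whose body is not shown in this diff.

```typescript
// Hypothetical subset of UpdateApplyState: only the fields the phase
// derivation reads. Stage values beyond those named here are folded into `string`.
interface ApplyStateSketch {
  applying: boolean
  stage?: string
}

type Phase = 'idle' | 'applying' | 'manual' | 'guiSkew' | 'error'

// Same precedence as the nested ternary: manual and guiSkew win even while
// `applying` is still true.
function derivePhase(apply: ApplyStateSketch): Phase {
  if (apply.stage === 'manual') return 'manual'
  if (apply.stage === 'guiSkew') return 'guiSkew'
  if (apply.applying || apply.stage === 'restart') return 'applying'
  if (apply.stage === 'error') return 'error'
  return 'idle'
}

// Terminal stages whose state is reset when the overlay is closed.
const RESET_ON_CLOSE = new Set(['error', 'restart', 'manual', 'guiSkew'])

function shouldResetOnClose(nextOpen: boolean, apply: ApplyStateSketch): boolean {
  return !nextOpen && apply.stage !== undefined && RESET_ON_CLOSE.has(apply.stage)
}
```

Keeping the reset stages in one set, rather than an `||` chain, makes it harder to add a terminal stage such as `guiSkew` to the phase logic but forget it in the close handler.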
--- a/apps/desktop/src/components/assistant-ui/thread.tsx
+++ b/apps/desktop/src/components/assistant-ui/thread.tsx
@ -859,7 +859,10 @@ const ProcessNotificationNote: FC<{ text: string }> = ({ text }) => {
          <summary className="cursor-pointer select-none text-muted-foreground/45 hover:text-muted-foreground/70">
            output
          </summary>
-          <pre className="mt-0.5 max-h-48 overflow-auto whitespace-pre-wrap font-mono text-[0.625rem] leading-4 text-muted-foreground/55">
+          <pre
+            className="mt-0.5 max-h-48 overflow-auto whitespace-pre-wrap font-mono text-[0.625rem] leading-4 text-muted-foreground/55"
+            data-selectable-text="true"
+          >
            {detail}
          </pre>
        </details>
--- a/apps/desktop/src/components/assistant-ui/tool-approval.test.tsx
+++ b/apps/desktop/src/components/assistant-ui/tool-approval.test.tsx
@ -1,4 +1,4 @@
-import { cleanup, fireEvent, render, screen, waitFor } from '@testing-library/react'
+import { cleanup, fireEvent, render, screen, waitFor, within } from '@testing-library/react'
 import { afterEach, beforeAll, describe, expect, it, vi } from 'vitest'

 import type { HermesGateway } from '@/hermes'
@ -6,7 +6,7 @@ import { $gateway } from '@/store/gateway'
 import { $approvalRequest, clearAllPrompts, setApprovalRequest } from '@/store/prompts'
 import { $activeSessionId } from '@/store/session'

-import { PendingToolApproval } from './tool-approval'
+import { PendingApprovalFallback, PendingToolApproval } from './tool-approval'
 import type { ToolPart } from './tool-fallback-model'

 // Radix's DropdownMenu touches pointer-capture + scrollIntoView, which jsdom
@ -130,4 +130,30 @@ describe('PendingToolApproval', () => {
    expect(await screen.findByRole('menuitem', { name: /Allow this session/ })).toBeTruthy()
    expect(screen.queryByRole('menuitem', { name: /Always allow/ })).toBeNull()
  })
+
+  it('renders a floating fallback when no pending tool row is mounted', () => {
+    setRequest('rm /tmp/hermes_approval_test.txt')
+    const { container } = render(<PendingApprovalFallback />)
+    const fallback = container.querySelector('[data-slot="tool-approval-fallback"]')
+
+    expect(fallback).not.toBeNull()
+    expect(within(fallback as HTMLElement).getByRole('button', { name: /Run/ })).toBeTruthy()
+    expect(within(fallback as HTMLElement).getByRole('button', { name: /Reject/ })).toBeTruthy()
+  })
+
+  it('hides the floating fallback once the inline approval bar is mounted', async () => {
+    setRequest('rm /tmp/hermes_approval_test.txt')
+
+    const { container } = render(
+      <>
+        <PendingToolApproval part={part('terminal')} />
+        <PendingApprovalFallback />
+      </>
+    )
+
+    await waitFor(() => {
+      expect(container.querySelector('[data-slot="tool-approval-inline"]')).not.toBeNull()
+      expect(container.querySelector('[data-slot="tool-approval-fallback"]')).toBeNull()
+    })
+  })
 })
--- a/apps/desktop/src/components/assistant-ui/tool-approval.tsx
+++ b/apps/desktop/src/components/assistant-ui/tool-approval.tsx
@ -15,11 +15,17 @@ import {
 import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigger } from '@/components/ui/dropdown-menu'
 import { useI18n } from '@/i18n'
 import { triggerHaptic } from '@/lib/haptics'
-import { ChevronDown, Loader2 } from '@/lib/icons'
+import { AlertCircle, ChevronDown, Loader2 } from '@/lib/icons'
 import { cn } from '@/lib/utils'
 import { $gateway } from '@/store/gateway'
 import { notifyError } from '@/store/notifications'
-import { $approvalRequest, type ApprovalRequest, clearApprovalRequest } from '@/store/prompts'
+import {
+  $approvalInlineVisible,
+  $approvalRequest,
+  type ApprovalRequest,
+  clearApprovalRequest,
+  registerApprovalInlineAnchor
+} from '@/store/prompts'

 import type { ToolPart } from './tool-fallback-model'

@ -48,12 +54,47 @@ export const PendingToolApproval: FC<{ part: ToolPart }> = ({ part }) => {
    return null
  }

-  return <ApprovalBar request={request} />
+  return <InlineApprovalBar request={request} />
+}
+
+const InlineApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
+  useEffect(() => registerApprovalInlineAnchor(), [])
+
+  return <ApprovalBar request={request} surface="inline" />
+}
+
+export const PendingApprovalFallback: FC = () => {
+  const { t } = useI18n()
+  const request = useStore($approvalRequest)
+  const inlineVisible = useStore($approvalInlineVisible)
+
+  if (!request || inlineVisible) {
+    return null
+  }
+
+  return (
+    <div
+      className="pointer-events-none absolute left-1/2 z-30 w-[calc(100%-2rem)] max-w-2xl -translate-x-1/2"
+      data-slot="tool-approval-fallback"
+      style={{ bottom: 'calc(var(--composer-measured-height) + var(--status-stack-measured-height) + 0.875rem)' }}
+    >
+      <div className="pointer-events-auto rounded-xl border border-primary/30 bg-(--ui-chat-surface-background) px-3 py-2 shadow-lg backdrop-blur-xl [-webkit-backdrop-filter:blur(1rem)]">
+        <div className="flex min-w-0 items-center gap-2 text-sm text-primary">
+          <AlertCircle className="size-4 shrink-0" />
+          <span className="shrink-0 font-medium">{t.assistant.approval.jumpToApproval}</span>
+          {request.description && (
+            <span className="min-w-0 truncate text-(--ui-text-tertiary)">{request.description}</span>
+          )}
+        </div>
+        <ApprovalBar request={request} surface="floating" />
+      </div>
+    </div>
+  )
 }

 const isMac = typeof navigator !== 'undefined' && /Mac|iP(hone|ad|od)/.test(navigator.platform)

-const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
+const ApprovalBar: FC<{ request: ApprovalRequest; surface: 'floating' | 'inline' }> = ({ request, surface }) => {
  const { t } = useI18n()
  const copy = t.assistant.approval
  const gateway = useStore($gateway)
@ -99,7 +140,7 @@ const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
        setSubmitting(null)
      }
    },
-    [busy, gateway, request.sessionId]
+    [busy, copy.gatewayDisconnected, copy.sendFailed, gateway, request.sessionId]
  )

  // ⌘/Ctrl+Enter → Run, Esc → Reject.
@ -126,7 +167,10 @@ const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
  }, [confirmAlways, respond])

  return (
-    <div className="mt-1 ps-5" data-slot="tool-approval-inline">
+    <div
+      className={cn(surface === 'inline' ? 'mt-1 ps-5' : 'mt-2')}
+      data-slot={surface === 'inline' ? 'tool-approval-inline' : 'tool-approval-actions'}
+    >
      <div className="flex items-center gap-2.5">
        <div className="inline-flex h-6 items-stretch overflow-hidden rounded-md border border-primary/25 bg-primary/10 text-primary">
          <Button
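`InlineApprovalBar` calls `registerApprovalInlineAnchor()` inside `useEffect` and returns its result, so that result must be a cleanup function. `PendingApprovalFallback` hides whenever `$approvalInlineVisible` is true. The store implementation is not part of this diff. The sketch below shows one plausible design under that assumption: a reference count, so the fallback stays hidden while at least one inline bar is mounted, even if two bars briefly overlap. `createInlineAnchorRegistry` and its methods are hypothetical names.

```typescript
// Hypothetical stand-in for the store behind registerApprovalInlineAnchor /
// $approvalInlineVisible; the real implementation in @/store/prompts is not shown.
function createInlineAnchorRegistry() {
  let mounted = 0
  const listeners = new Set<(visible: boolean) => void>()

  const emit = (): void => {
    for (const listener of listeners) listener(mounted > 0)
  }

  return {
    // Returns a cleanup so it can be handed straight to useEffect.
    register(): () => void {
      mounted += 1
      emit()
      let released = false

      return () => {
        // Idempotent cleanup: a second call cannot drive the count negative.
        if (released) return
        released = true
        mounted -= 1
        emit()
      }
    },
    isInlineVisible(): boolean {
      return mounted > 0
    },
    subscribe(listener: (visible: boolean) => void): () => void {
      listeners.add(listener)
      return () => {
        listeners.delete(listener)
      }
    }
  }
}
```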
--- a/apps/desktop/src/components/assistant-ui/tool-fallback-model.test.ts
+++ b/apps/desktop/src/components/assistant-ui/tool-fallback-model.test.ts
@ -1,6 +1,11 @@
 import { describe, expect, it } from 'vitest'

-import { buildToolView, type ToolPart } from './tool-fallback-model'
+import {
+  buildToolView,
+  countDiffLineStats,
+  inlineDiffFromResult,
+  type ToolPart
+} from './tool-fallback-model'

 const part = (overrides: Partial<ToolPart>): ToolPart => ({
  args: {},
@ -64,3 +69,51 @@ describe('buildToolView terminal exit-code status', () => {
    )
  })
 })
+
+describe('buildToolView file edit diffs', () => {
+  const patchDiff = '--- a/src/demo.ts\n+++ b/src/demo.ts\n@@ -1 +1 @@\n-old\n+new'
+
+  it('reads inline_diff and diff fields from patch results', () => {
+    expect(inlineDiffFromResult({ inline_diff: patchDiff })).toBe(patchDiff)
+    expect(inlineDiffFromResult({ diff: patchDiff })).toBe(patchDiff)
+  })
+
+  it('suppresses raw patch args when a diff is available', () => {
+    const view = buildToolView(
+      part({
+        args: { context: 'src/demo.ts', mode: 'replace', new_string: 'new', path: 'src/demo.ts' },
+        result: { diff: patchDiff, success: true },
+        toolName: 'patch'
+      }),
+      patchDiff
+    )
+
+    expect(view.title).toBe('demo.ts')
+    expect(view.subtitle).toBe('src/demo.ts')
+    expect(view.detail).toBe('')
+    expect(view.inlineDiff).toBe(patchDiff)
+  })
+
+  it('shows path subtitle instead of patch args JSON while pending', () => {
+    const view = buildToolView(
+      part({
+        args: { context: 'src/demo.ts', mode: 'replace', new_string: 'new', path: 'src/demo.ts' },
+        result: undefined,
+        toolName: 'patch'
+      }),
+      ''
+    )
+
+    expect(view.title).toBe('demo.ts')
+    expect(view.subtitle).toBe('src/demo.ts')
+    expect(view.detail).toBe('')
+  })
+})
+
+describe('countDiffLineStats', () => {
+  it('counts added and removed lines', () => {
+    expect(
+      countDiffLineStats(`--- a/x\n+++ b/x\n@@\n-old\n+new\n context\n+another`)
+    ).toEqual({ added: 2, removed: 1 })
+  })
+})
--- a/apps/desktop/src/components/assistant-ui/tool-fallback-model.ts
+++ b/apps/desktop/src/components/assistant-ui/tool-fallback-model.ts
@ -72,6 +72,46 @@ export interface MessageRunningStateSlice {
  }
 }

+const FILE_EDIT_TOOL_NAMES = new Set(['edit_file', 'patch', 'write_file'])
+
+export function isFileEditTool(toolName: string): boolean {
+  return FILE_EDIT_TOOL_NAMES.has(toolName)
+}
+
+export interface DiffLineStats {
+  added: number
+  removed: number
+}
+
+export function countDiffLineStats(diff: string): DiffLineStats {
+  let added = 0
+  let removed = 0
+
+  for (const line of diff.split('\n')) {
+    if (line.startsWith('+') && !line.startsWith('+++')) {
+      added += 1
+    } else if (line.startsWith('-') && !line.startsWith('---')) {
+      removed += 1
+    }
+  }
+
+  return { added, removed }
+}
+
+function fileEditPath(args: Record<string, unknown>, result: Record<string, unknown>): string {
+  return (
+    firstStringField(args, ['path', 'file', 'filepath']) ||
+    firstStringField(result, ['path', 'file', 'filepath', 'resolved_path']) ||
+    htmlPathFromInlineDiff(firstStringField(result, ['inline_diff', 'diff']))
+  )
+}
+
+function fileEditBasename(path: string): string {
+  const normalized = path.replace(/\\/g, '/').trim()
+
+  return normalized.split('/').filter(Boolean).pop() || normalized
+}
+
 const TOOL_META: Record<string, ToolMeta> = {
  browser_click: { done: 'Clicked page element', pending: 'Clicking page element', icon: 'globe', tone: 'browser' },
  browser_fill: { done: 'Filled form field', pending: 'Filling form field', icon: 'globe', tone: 'browser' },
@ -95,7 +135,7 @@ const TOOL_META: Record<string, ToolMeta> = {
  execute_code: { done: 'Ran code', pending: 'Running code', icon: 'terminal', tone: 'terminal' },
  image_generate: { done: 'Generated image', pending: 'Generating image', icon: 'file-media', tone: 'image' },
  list_files: { done: 'Listed files', pending: 'Listing files', icon: 'files', tone: 'file' },
-  patch: { done: 'Patched file', pending: 'Patching file', icon: 'diff', tone: 'file' },
+  patch: { done: 'Patched file', pending: 'Patching file', icon: 'edit', tone: 'file' },
  read_file: { done: 'Read file', pending: 'Reading file', icon: 'file', tone: 'file' },
  search_files: { done: 'Searched files', pending: 'Searching files', icon: 'search', tone: 'file' },
  session_search_recall: {
@ -797,8 +837,8 @@ function toolPreviewTarget(toolName: string, args: Record<string, unknown>, resu
    return looksLikeUrl(explicit) ? explicit : findFirstUrl(args, result)
  }

-  if (toolName === 'write_file' || toolName === 'edit_file') {
-    return htmlPathFromInlineDiff(firstStringField(result, ['inline_diff']))
+  if (isFileEditTool(toolName)) {
+    return htmlPathFromInlineDiff(firstStringField(result, ['inline_diff', 'diff']))
  }

  return ''
@ -858,9 +898,17 @@ function stripDividerLines(value: string): string {
 }

 export function inlineDiffFromResult(result: unknown): string {
-  const value = parseMaybeObject(result).inline_diff
+  const record = parseMaybeObject(result)

-  return typeof value === 'string' ? stripInlineDiffChrome(value) : ''
+  for (const key of ['inline_diff', 'diff']) {
+    const value = record[key]
+
+    if (typeof value === 'string' && value.trim()) {
+      return stripInlineDiffChrome(value)
+    }
+  }
+
+  return ''
 }

 // Falls back to a string only when there's something concrete to render —
@ -1047,15 +1095,22 @@ function toolSubtitle(
    return command ? compactPreview(command, 120) : 'Executed command'
  }

-  if (toolName === 'read_file' || toolName === 'write_file' || toolName === 'edit_file') {
-    const path =
-      firstStringField(argsRecord, ['path', 'file', 'filepath']) ||
-      htmlPathFromInlineDiff(firstStringField(resultRecord, ['inline_diff']))
+  if (toolName === 'read_file' || isFileEditTool(toolName)) {
+    const isEdit = isFileEditTool(toolName)

-    return (
-      path ||
-      (firstStringField(resultRecord, ['inline_diff']) ? 'Changed file' : fallbackDetailText(argsRecord, resultRecord))
-    )
+    const path = isEdit
+      ? fileEditPath(argsRecord, resultRecord)
+      : firstStringField(argsRecord, ['path', 'file', 'filepath'])
+
+    if (path) {
+      return path
+    }
+
+    if (!isEdit) {
+      return fallbackDetailText(argsRecord, resultRecord)
+    }
+
+    return inlineDiffFromResult(resultRecord) ? 'Changed file' : ''
  }

  if (toolName === 'web_extract') {
@ -1153,8 +1208,22 @@ function toolDetailText(
    }
  }

-  if (part.toolName === 'write_file' || part.toolName === 'edit_file') {
-    return inlineDiffFromResult(part.result) ? '' : fallbackDetailText(argsRecord, resultRecord)
+  if (isFileEditTool(part.toolName)) {
+    if (inlineDiffFromResult(part.result)) {
+      return ''
+    }
+
+    const summary = firstStringField(resultRecord, ['message', 'summary'])
+
+    if (summary) {
+      return summary
+    }
+
+    if (fileEditPath(argsRecord, resultRecord)) {
+      return ''
+    }
+
+    return fallbackDetailText(argsRecord, resultRecord)
  }

  if (part.toolName === 'web_search') {
@ -1253,8 +1322,12 @@ export function toolCopyPayload(part: ToolPart, view: ToolView): { label: string
    }
  }

-  if (part.toolName === 'write_file' || part.toolName === 'edit_file') {
-    const path = firstStringField(args, ['path', 'file', 'filepath'])
+  if (isFileEditTool(part.toolName)) {
+    if (view.inlineDiff.trim()) {
+      return { label: copy.file, text: view.inlineDiff }
+    }
+
+    const path = fileEditPath(args, result)

    if (path) {
      return { label: copy.path, text: path }
@ -1304,6 +1377,14 @@ function dynamicTitle(
    }
  }

+  if (isFileEditTool(part.toolName)) {
+    const path = fileEditPath(args, result)
+
+    if (path) {
+      return fileEditBasename(path)
+    }
+  }
+
  return fallback
 }

@ -1317,7 +1398,12 @@ export function buildToolView(part: ToolPart, inlineDiff: string): ToolView {
  const title = dynamicTitle(part, argsRecord, resultRecord, baseTitle)
  const titleEnriched = title !== baseTitle
  const baseSubtitle = error || toolSubtitle(part, argsRecord, resultRecord)
-  const keepSubtitleWithTitle = part.toolName === 'terminal' || part.toolName === 'execute_code'
+
+  const keepSubtitleWithTitle =
+    part.toolName === 'terminal' ||
+    part.toolName === 'execute_code' ||
+    (isFileEditTool(part.toolName) && Boolean(baseSubtitle.trim()))
+
  const subtitle = titleEnriched && !error && !keepSubtitleWithTitle ? '' : baseSubtitle
  const detailBody = stripDividerLines(toolDetailText(part, argsRecord, resultRecord))

--- a/apps/desktop/src/components/assistant-ui/tool-fallback.tsx
+++ b/apps/desktop/src/components/assistant-ui/tool-fallback.tsx
@ -8,7 +8,7 @@ import { AnsiText } from '@/components/assistant-ui/ansi-text'
 import { useElapsedSeconds } from '@/components/chat/activity-timer'
 import { ActivityTimerText } from '@/components/chat/activity-timer-text'
 import { CompactMarkdown } from '@/components/chat/compact-markdown'
-import { DiffLines } from '@/components/chat/diff-lines'
+import { FileDiffPanel } from '@/components/chat/diff-lines'
 import { DisclosureRow } from '@/components/chat/disclosure-row'
 import { PreviewAttachment } from '@/components/chat/preview-attachment'
 import { ZoomableImage } from '@/components/chat/zoomable-image'
@ -16,6 +16,7 @@ import { Button } from '@/components/ui/button'
 import { Codicon } from '@/components/ui/codicon'
 import { CopyButton } from '@/components/ui/copy-button'
 import { FadeText } from '@/components/ui/fade-text'
+import { FileTypeIcon } from '@/components/ui/file-type-icon'
 import { GlyphSpinner } from '@/components/ui/glyph-spinner'
 import { ToolIcon } from '@/components/ui/tool-icon'
 import { Tip } from '@/components/ui/tooltip'
@ -32,7 +33,9 @@ import { PendingToolApproval } from './tool-approval'
 import {
  buildToolView,
  cleanVisibleText,
+  countDiffLineStats,
  inlineDiffFromResult,
+  isFileEditTool,
  isPreviewableTarget,
  looksRedundant,
  type SearchResultRow,
@ -133,9 +136,21 @@ function statusGlyph(status: ToolStatus, copy: ToolStatusCopy): ReactNode {
 // Leading glyph for any tool-row header. Status (running/error/warning)
 // takes precedence; otherwise falls back to the tool's codicon. Returns
 // null when neither applies so callers can render unconditionally.
-function ToolGlyph({ copy, icon, status }: { copy: ToolStatusCopy; icon?: string; status?: ToolStatus }) {
+function ToolGlyph({
+  copy,
+  filePath,
+  icon,
+  status
+}: {
+  copy: ToolStatusCopy
+  filePath?: string
+  icon?: string
+  status?: ToolStatus
+}) {
  const node = status ? (
    statusGlyph(status, copy)
+  ) : filePath ? (
+    <FileTypeIcon className="text-(--ui-text-tertiary)" path={filePath} size="0.875rem" />
  ) : icon ? (
    <ToolIcon className="text-(--ui-text-tertiary)" name={icon} size="0.875rem" />
  ) : null
@ -204,8 +219,13 @@ function ToolEntry({ part }: ToolEntryProps) {
  const toolViewMode = useStore($toolViewMode)
  const disclosureId = `tool-entry:${messageId}:${toolPartDisclosureId(part)}`
  const dismissed = useStore($toolRowDismissed(disclosureId))
-  const open = useDisclosureOpen(disclosureId)
  const isPending = messageRunning && part.result === undefined
+  const liveDiffs = useStore($toolInlineDiffs)
+  const sideDiff = part.toolCallId ? liveDiffs[part.toolCallId] || '' : ''
+  const inlineDiff = stripInlineDiffChrome(sideDiff) || inlineDiffFromResult(part.result)
+  const isFileEdit = isFileEditTool(part.toolName)
+  const defaultOpen = Boolean(inlineDiff)
+  const open = useDisclosureOpen(disclosureId, defaultOpen)
  const canDismiss = !isPending && !embedded
  // Only animate entries that mount while their message is actively
  // streaming — historical sessions mount with `messageRunning === false`,
@ -213,9 +233,6 @@ function ToolEntry({ part }: ToolEntryProps) {
  // handles its own enter animation, so embedded children skip it.
  const enterRef = useEnterAnimation(messageRunning && !embedded, `tool-entry:${disclosureId}`)
  const elapsed = useElapsedSeconds(isPending, `tool:${disclosureId}`)
-  const liveDiffs = useStore($toolInlineDiffs)
-  const sideDiff = part.toolCallId ? liveDiffs[part.toolCallId] || '' : ''
-  const inlineDiff = stripInlineDiffChrome(sideDiff) || inlineDiffFromResult(part.result)

  // Stale parts (no result, but message stopped running) get a synthetic
  // empty result so buildToolView treats them as completed-no-output.
@ -253,11 +270,12 @@ function ToolEntry({ part }: ToolEntryProps) {
  const detailMatchesSubtitle = looksRedundant(view.subtitle, view.detail)

  const showDetail =
-    (view.status === 'error' && Boolean(detailSections.summary || detailSections.body)) ||
-    (view.status !== 'error' &&
-      Boolean(view.detail) &&
-      !looksRedundant(view.title, view.detail) &&
-      !detailMatchesSubtitle)
+    !view.inlineDiff &&
+    ((view.status === 'error' && Boolean(detailSections.summary || detailSections.body)) ||
+      (view.status !== 'error' &&
+        Boolean(view.detail) &&
+        !looksRedundant(view.title, view.detail) &&
+        !detailMatchesSubtitle))

  const renderDetailAsCode =
    view.status !== 'error' &&
@ -283,6 +301,13 @@ function ToolEntry({ part }: ToolEntryProps) {

  const copyAction = useMemo(() => toolCopyPayload(part, view), [part, view])

+  const diffStats = useMemo(
+    () => (isFileEdit && view.inlineDiff ? countDiffLineStats(view.inlineDiff) : null),
+    [isFileEdit, view.inlineDiff]
+  )
+
+  const showDiffStats = !isPending && Boolean(diffStats && (diffStats.added > 0 || diffStats.removed > 0))
+
  // The header trailing slot only carries the live duration timer while the
  // tool is running. The copy control used to live here too, but an
  // `opacity-0` (yet still clickable) button straddling the caret/duration made
@ -299,7 +324,12 @@ function ToolEntry({ part }: ToolEntryProps) {
    <Tip label={statusCopy.dismiss}>
      <Button
        aria-label={statusCopy.dismiss}
-        className="size-5 rounded-md text-(--ui-text-tertiary) opacity-0 transition-opacity hover:text-(--ui-text-primary) hover:opacity-100 group-hover/disclosure-row:opacity-80 group-focus-within/disclosure-row:opacity-80"
+        className={cn(
+          'size-5 rounded-md text-(--ui-text-tertiary) transition-opacity hover:text-(--ui-text-primary) hover:opacity-100',
+          open
+            ? 'opacity-80'
+            : 'opacity-0 group-hover/disclosure-row:opacity-80 group-focus-within/disclosure-row:opacity-80'
+        )}
        onClick={event => {
          event.stopPropagation()
          dismissToolRow(disclosureId)
@ -317,13 +347,24 @@ function ToolEntry({ part }: ToolEntryProps) {
    return null
  }

+  // A completed file edit with no diff to review is a bare, unexpandable row.
+  // This is almost always a `write_file` create after a reload: only `patch`
+  // persists its diff in the tool result, so creates rehydrate diff-less and
+  // read like dead duplicates of the real diff row. Hide them — but keep
+  // in-flight writes (activity) and failures (errors) visible.
+  if (isFileEdit && !isPending && view.status !== 'error' && !view.inlineDiff) {
+    return null
+  }
+
  return (
    <div
      className={cn(
        'min-w-0 max-w-full overflow-hidden text-[length:var(--conversation-tool-font-size)] text-(--ui-text-tertiary)',
        open && 'rounded-[0.625rem] border border-(--ui-stroke-tertiary)'
      )}
+      data-file-edit={isFileEdit && open ? '' : undefined}
      data-slot="tool-block"
+      data-tool-row=""
      ref={enterRef}
    >
      <div className={cn(open && 'border-b border-(--ui-stroke-tertiary) px-2 py-1.5')}>
@ -333,8 +374,16 @@ function ToolEntry({ part }: ToolEntryProps) {
          open={open}
          trailing={trailing}
        >
-          <span className="flex min-w-0 items-center gap-1.5">
-            <ToolGlyph copy={copy} icon={view.icon} status={leadingStatus(isPending, view.status)} />
+          <span
+            className="flex min-w-0 items-center gap-1.5"
+            title={isFileEdit && view.subtitle ? view.subtitle : undefined}
+          >
+            <ToolGlyph
+              copy={copy}
+              filePath={isFileEdit ? view.subtitle : undefined}
+              icon={view.icon}
+              status={leadingStatus(isPending, view.status)}
+            />
            <FadeText
              className={cn(
                TOOL_HEADER_TITLE_CLASS,
@ -346,7 +395,17 @@ function ToolEntry({ part }: ToolEntryProps) {
              {view.title}
            </FadeText>
            {!isPending && view.countLabel && <span className={TOOL_HEADER_DURATION_CLASS}>{view.countLabel}</span>}
-            {!isPending && view.durationLabel && (
+            {showDiffStats && diffStats && (
+              <span className="flex shrink-0 items-center gap-1 font-mono text-[0.625rem] tabular-nums">
+                {diffStats.added > 0 && (
+                  <span className="text-emerald-600 dark:text-emerald-400">+{diffStats.added}</span>
+                )}
+                {diffStats.removed > 0 && (
+                  <span className="text-rose-600 dark:text-rose-400">−{diffStats.removed}</span>
+                )}
+              </span>
+            )}
+            {!isFileEdit && !isPending && view.durationLabel && (
              <span className={TOOL_HEADER_DURATION_CLASS}>{view.durationLabel}</span>
            )}
          </span>
@ -358,7 +417,7 @@ function ToolEntry({ part }: ToolEntryProps) {
          {copyAction.text && (
            <CopyButton
              appearance="inline"
-              className="absolute right-1.5 top-1.5 z-10 h-5 gap-0 rounded-md border border-(--ui-stroke-tertiary) bg-background/80 px-1 opacity-60 backdrop-blur-sm transition-opacity hover:opacity-100 focus-visible:opacity-100"
+              className="absolute right-1.5 top-1.5 z-10 h-5 gap-0 rounded-md border border-(--ui-stroke-tertiary) bg-background/80 px-1 opacity-100 backdrop-blur-sm transition-opacity hover:opacity-100 focus-visible:opacity-100"
              iconClassName="size-3"
              label={copyAction.label}
              showLabel={false}
@ -380,6 +439,7 @@ function ToolEntry({ part }: ToolEntryProps) {
              <SearchResultsList hits={view.searchHits} />
            </div>
          )}
+          {view.inlineDiff && <FileDiffPanel diff={view.inlineDiff} path={isFileEdit ? view.subtitle : undefined} />}
          {showDetail &&
            toolViewMode !== 'technical' &&
            (view.status === 'error' ? (
@ -448,14 +508,21 @@ function ToolEntry({ part }: ToolEntryProps) {
              </pre>
            </details>
          )}
-          {toolViewMode === 'technical' && (
+          {toolViewMode === 'technical' && !(isFileEdit && view.inlineDiff) && (
            <pre className={cn(TOOL_SECTION_PRE_CLASS, 'whitespace-pre-wrap wrap-anywhere')}>
              {rawTechnicalTrace(part.args, part.result)}
            </pre>
          )}
+          {toolViewMode === 'technical' && isFileEdit && view.inlineDiff && (
+            <details className="max-w-full">
+              <summary className={cn(TOOL_SECTION_LABEL_CLASS, 'mb-0 cursor-pointer')}>Tool payload</summary>
+              <pre className={cn(TOOL_SECTION_PRE_CLASS, 'mt-1 whitespace-pre-wrap wrap-anywhere')}>
+                {rawTechnicalTrace(part.args, part.result)}
+              </pre>
+            </details>
+          )}
        </div>
      )}
-      {open && view.inlineDiff && <DiffLines text={view.inlineDiff} />}
    </div>
  )
 }
@ -488,6 +555,7 @@ export const ToolGroupSlot: FC<PropsWithChildren<{ endIndex: number; startIndex:
      <div
        className="grid min-w-0 max-w-full gap-(--tool-row-gap) overflow-hidden"
        data-slot="tool-block"
+        data-tool-group=""
        ref={enterRef}
      >
        {children}
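The header badge in `ToolEntry` depends on `countDiffLineStats`, which this diff imports from the tool-view module but does not show. Below is a hypothetical sketch of such a counter, not the shipped helper. It assumes unified-diff input and uses the same add/remove rule as `diffKind` in diff-lines.tsx, where `+++`/`---` file headers are not counted as changes. `shouldShowDiffStats` is an illustrative name that restates the `showDiffStats` expression from this hunk.

```typescript
// Hypothetical sketch: the real countDiffLineStats is not shown in this diff.
interface DiffLineStats {
  added: number
  removed: number
}

function countDiffLineStats(diff: string): DiffLineStats {
  let added = 0
  let removed = 0

  for (const line of diff.split('\n')) {
    // `+++ b/path` / `--- a/path` are file headers, not changes
    // (same exclusion diffKind applies in diff-lines.tsx).
    if (line.startsWith('+++') || line.startsWith('---')) {
      continue
    }

    if (line.startsWith('+')) {
      added += 1
    } else if (line.startsWith('-')) {
      removed += 1
    }
  }

  return { added, removed }
}

// Restates `showDiffStats`: hide while the tool runs or when nothing changed.
function shouldShowDiffStats(isPending: boolean, stats: DiffLineStats | null): boolean {
  return !isPending && Boolean(stats && (stats.added > 0 || stats.removed > 0))
}
```

Because both helpers are pure, the badge rule can be checked without rendering the row.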
--- a/apps/desktop/src/components/chat/composer-dock.ts
+++ b/apps/desktop/src/components/chat/composer-dock.ts
@ -1,12 +1,9 @@
 import { cn } from '@/lib/utils'

 /**
- * The composer surface and everything docked to it (slash·@ popover, `?` help)
- * paint ONE shared `--composer-fill` var. The state ladder (rest / scrolled /
- * focused / drawer-open) lives in styles.css on `[data-slot='composer-root']`,
- * so the two layers can never disagree — drawer-open forces an opaque fill via
- * `:has()`, because translucent glass sampling different backdrops (thread vs
- * fade gradient) renders as different colors even with identical tints.
+ * The composer surface and the status/queue stack paint ONE shared
+ * `--composer-fill` var. The state ladder (rest / scrolled) lives in styles.css
+ * on `[data-slot='composer-root']`, so the layers can never disagree.
 */
 export const composerFill = 'bg-(--composer-fill)'

@ -26,6 +23,13 @@ const composerDockEdge = (edge: 'bottom' | 'top') =>
 export const composerDockCard = (edge: 'bottom' | 'top' = 'top') =>
  cn(composerDockEdge(edge), composerFill, composerSurfaceGlass)

-/** Fused docked card — completion drawers. Shares `--composer-fill` with the
- *  composer surface, which goes opaque while a drawer is open. */
-export const composerFusedDockCard = (edge: 'bottom' | 'top' = 'top') => cn(composerDockEdge(edge), composerFill)
+/** Floating composer panel skin — the `/`·`@`·`?` completion drawer and the
+ *  attach (`+`) menu. Glassy translucent card, hairline border, full radius,
+ *  smallest type, soft nous shadow. Uses an explicit fill (not `--composer-fill`)
+ *  so it renders identically whether mounted inside the composer or portaled out
+ *  of it. Visual skin only — consumers add their own size/position/padding. */
+export const composerPanelCard = cn(
+  'rounded-2xl border border-border/65 shadow-nous text-[length:var(--conversation-tool-font-size)]',
+  'bg-[color-mix(in_srgb,var(--dt-card)_72%,transparent)]',
+  composerSurfaceGlass
+)
--- a/apps/desktop/src/components/chat/diff-lines.tsx
+++ b/apps/desktop/src/components/chat/diff-lines.tsx
@ -1,33 +1,176 @@
-import * as React from 'react'
+'use client'

+import type { ReactNode } from 'react'
+import * as React from 'react'
+import { useShikiHighlighter } from 'react-shiki'
+import type { ShikiTransformer } from 'shiki'
+
+import { exceedsHighlightBudget, SHIKI_THEME } from '@/components/chat/shiki-highlighter'
+import { shikiLanguageForFilename } from '@/lib/markdown-code'
 import { cn } from '@/lib/utils'

 /**
- * Per-line classed renderer for unified diffs. Lives outside `CodeCard` so
- * tool-result panels (already nested inside a tool card) don't double-shell;
- * for markdown ` ```diff ` fences the standard `CodeCard` + Shiki path runs
- * instead and gives equivalent coloring.
+ * Renders a unified diff for a tool's file edit. Two paths share one parse:
+ *  - `SyntaxDiff` highlights the change *content* in the file's language via
+ *    Shiki, then a per-line transformer paints the add/remove tint on top.
+ *  - `DiffLines` is the color-only fallback (no language, over budget, or while
+ *    Shiki loads).
+ * Both drop git file-headers + `@@` hunk noise and the `+/-` gutter so changes
+ * read by color + a 2px gutter accent, the way Cursor does.
 */
-interface DiffLineKind {
-  className?: string
-  match: (line: string) => boolean
+type DiffKind = 'add' | 'context' | 'remove'
+
+interface DiffLine {
+  kind: DiffKind
+  text: string
 }

-const DIFF_LINE_KINDS: DiffLineKind[] = [
-  {
-    className: 'text-emerald-700 dark:text-emerald-300',
-    match: line => line.startsWith('+') && !line.startsWith('+++')
-  },
-  { className: 'text-rose-700 dark:text-rose-300', match: line => line.startsWith('-') && !line.startsWith('---') },
-  { className: 'text-sky-700 dark:text-sky-300', match: line => line.startsWith('@@') },
-  {
-    className: 'text-muted-foreground/70',
-    match: line => line.startsWith('---') || line.startsWith('+++') || / → /.test(line.slice(0, 60))
-  }
-]
+// Tint + 2px gutter accent per change kind. Text color is included for the
+// plain renderer; the Shiki path omits it so syntax colors win, layering only
+// the background + border.
+const DIFF_KIND_TINT: Record<DiffKind, string> = {
+  add: 'border-emerald-500 bg-emerald-500/12',
+  context: 'border-transparent',
+  remove: 'border-rose-500 bg-rose-500/12'
+}

-function classifyLine(line: string): string | undefined {
-  return DIFF_LINE_KINDS.find(kind => kind.match(line))?.className
+const DIFF_KIND_TEXT: Record<DiffKind, string> = {
+  add: 'text-emerald-800 dark:text-emerald-200',
+  context: '',
+  remove: 'text-rose-800 dark:text-rose-200'
+}
+
+const DIFF_LINE_BASE = 'block min-w-max whitespace-pre border-l-2 px-2.5 py-px'
+
+// Bleed out of the tool-card body's `p-1.5` so tints/borders run flush to the
+// card edges (rounded corners clip via the card's overflow); compact height
+// with internal scroll like a code block.
+const DIFF_BOX_CLASS =
+  '-mx-1.5 -mb-1.5 max-h-[12rem] max-w-none min-w-0 overflow-auto overscroll-contain font-mono text-[0.7rem] leading-relaxed text-(--ui-text-secondary)'
+
+function diffKind(line: string): DiffKind {
+  if (line.startsWith('+') && !line.startsWith('+++')) {
+    return 'add'
+  }
+
+  if (line.startsWith('-') && !line.startsWith('---')) {
+    return 'remove'
+  }
+
+  return 'context'
+}
+
+// Drop the leading +/-/space gutter so changes read by color alone, keeping the
+// rest of the indentation intact.
+function stripDiffMarker(line: string): string {
+  if (diffKind(line) !== 'context' || line.startsWith(' ')) {
+    return line.slice(1)
+  }
+
+  return line
+}
+
+// Git-style unified diffs arrive with a file-header preamble — `diff --git`,
+// `index …`, `--- a/path`, `+++ b/path`, and Hermes' own `a/path → b/path`
+// arrow line. That preamble just repeats the path (which the tool row already
+// shows) and reads especially badly for absolute paths (`a//Users/…`). Strip
+// the leading header zone up to the first hunk.
+const DIFF_HEADER_PREFIXES = ['diff --git', 'index ', '--- ', '+++ ', 'similarity ', 'rename ', 'new file', 'deleted file']
+
+function isArrowHeaderLine(line: string): boolean {
+  const trimmed = line.trim()
+
+  return trimmed.includes('→') && /^\S.*→\s*\S+$/.test(trimmed) && !/^[+\-@]/.test(trimmed)
+}
+
+/** Exported for tests. */
+export function stripDiffFileHeaders(diff: string): string {
+  const lines = diff.split('\n')
+  let start = 0
+
+  for (; start < lines.length; start += 1) {
+    const line = lines[start]
+
+    if (line.startsWith('@@')) {
+      break
+    }
+
+    if (line.trim() === '' || isArrowHeaderLine(line) || DIFF_HEADER_PREFIXES.some(prefix => line.startsWith(prefix))) {
+      continue
+    }
+
+    break
+  }
+
+  return lines.slice(start).join('\n')
+}
+
+// Cleaned diff → renderable lines: file-headers + `@@` hunks dropped (a blank
+// separator kept between hunks), markers stripped, kind recorded.
+function parseDiff(diff: string): DiffLine[] {
+  const out: DiffLine[] = []
+  let emitted = false
+
+  for (const line of stripDiffFileHeaders(diff).split('\n')) {
+    if (line.startsWith('@@')) {
+      if (emitted) {
+        out.push({ kind: 'context', text: '' })
+      }
+
+      continue
+    }
+
+    out.push({ kind: diffKind(line), text: stripDiffMarker(line) })
+    emitted = true
+  }
+
+  return out
+}
+
+function DiffBody({ lines, syntax }: { lines: DiffLine[]; syntax?: boolean }) {
+  return (
+    <>
+      {lines.map((line, index) => (
+        <span
+          className={cn(DIFF_LINE_BASE, DIFF_KIND_TINT[line.kind], !syntax && DIFF_KIND_TEXT[line.kind])}
+          key={`${index}-${line.text}`}
+        >
+          {line.text || ' '}
+        </span>
+      ))}
+    </>
+  )
+}
+
+// Shiki transformer: tag each `.line` with the diff tint for its kind, so the
+// syntax-highlighted output keeps add/remove backgrounds + the gutter accent.
+function diffLineTransformer(kinds: DiffKind[]): ShikiTransformer {
+  return {
+    line(node, line) {
+      const kind = kinds[line - 1] ?? 'context'
+
+      const existing = Array.isArray(node.properties.className)
+        ? (node.properties.className as string[])
+        : node.properties.className
+          ? [String(node.properties.className)]
+          : []
+
+      node.properties.className = [...existing, DIFF_LINE_BASE, DIFF_KIND_TINT[kind]]
+    }
+  }
+}
+
+function SyntaxDiff({ language, lines }: { language: string; lines: DiffLine[] }) {
+  const code = React.useMemo(() => lines.map(line => line.text).join('\n'), [lines])
+  const transformers = React.useMemo(() => [diffLineTransformer(lines.map(line => line.kind))], [lines])
+
+  const highlighted = useShikiHighlighter(code, language, SHIKI_THEME, {
+    defaultColor: 'light-dark()',
+    transformers
+  })
+
+  // Until Shiki resolves, show the plain colored diff so there's no flash.
+  return (highlighted as ReactNode) ?? <DiffBody lines={lines} />
 }

 interface DiffLinesProps extends Omit<React.ComponentProps<'pre'>, 'children'> {
@ -35,20 +178,28 @@ interface DiffLinesProps extends Omit<React.ComponentProps<'pre'>, 'children'> {
 }

 export function DiffLines({ className, text, ...props }: DiffLinesProps) {
+  const lines = React.useMemo(() => parseDiff(text), [text])
+
  return (
-    <pre
-      className={cn(
-        'mt-1 mb-1.5 max-h-96 max-w-full min-w-0 overflow-auto rounded-md border border-border/60 bg-muted/35 px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground',
-        className
-      )}
-      data-slot="diff-lines"
-      {...props}
-    >
-      {text.split('\n').map((line, index) => (
-        <span className={cn('block min-w-max whitespace-pre', classifyLine(line))} key={`${index}-${line}`}>
-          {line || ' '}
-        </span>
-      ))}
+    <pre className={cn(DIFF_BOX_CLASS, className)} data-slot="diff-lines" {...props}>
+      <DiffBody lines={lines} />
    </pre>
  )
 }
+
+interface FileDiffPanelProps {
+  diff: string
+  path?: string
+}
+
+export function FileDiffPanel({ diff, path }: FileDiffPanelProps) {
+  const lines = React.useMemo(() => parseDiff(diff), [diff])
+  const language = shikiLanguageForFilename(path)
+  const canHighlight = Boolean(language) && !exceedsHighlightBudget(diff)
+
+  return (
+    <div className={DIFF_BOX_CLASS} data-slot="file-diff-panel">
+      {canHighlight ? <SyntaxDiff language={language} lines={lines} /> : <DiffBody lines={lines} />}
+    </div>
+  )
+}
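The parsing path that `DiffLines` and `FileDiffPanel` share can be checked separately from React and Shiki. The sketch below is a standalone restatement of `diffKind`, `stripDiffMarker`, `stripDiffFileHeaders`, and `parseDiff` from the hunk above, with the rendering removed. Behavior is intended to match, but this is a copy for illustration, not the module itself. The file preamble, including the `a/path → b/path` arrow line, is dropped up to the first hunk. The first `@@` header is discarded, and each later one becomes a blank context separator.

```typescript
// Standalone copy of the diff-lines.tsx parsing path, without React/Shiki.
type DiffKind = 'add' | 'context' | 'remove'

interface DiffLine {
  kind: DiffKind
  text: string
}

const DIFF_HEADER_PREFIXES = ['diff --git', 'index ', '--- ', '+++ ', 'similarity ', 'rename ', 'new file', 'deleted file']

function diffKind(line: string): DiffKind {
  if (line.startsWith('+') && !line.startsWith('+++')) {
    return 'add'
  }

  if (line.startsWith('-') && !line.startsWith('---')) {
    return 'remove'
  }

  return 'context'
}

// Drop the +/-/space gutter; leave other lines (e.g. empty ones) untouched.
function stripDiffMarker(line: string): string {
  return diffKind(line) !== 'context' || line.startsWith(' ') ? line.slice(1) : line
}

function isArrowHeaderLine(line: string): boolean {
  const trimmed = line.trim()

  return trimmed.includes('→') && /^\S.*→\s*\S+$/.test(trimmed) && !/^[+\-@]/.test(trimmed)
}

// Skip blank, header-prefixed, and arrow lines until the first `@@` hunk.
function stripDiffFileHeaders(diff: string): string {
  const lines = diff.split('\n')
  let start = 0

  while (start < lines.length) {
    const line = lines[start]

    if (line.startsWith('@@')) {
      break
    }

    const isHeader =
      line.trim() === '' || isArrowHeaderLine(line) || DIFF_HEADER_PREFIXES.some(prefix => line.startsWith(prefix))

    if (!isHeader) {
      break
    }

    start += 1
  }

  return lines.slice(start).join('\n')
}

function parseDiff(diff: string): DiffLine[] {
  const out: DiffLine[] = []
  let emitted = false

  for (const line of stripDiffFileHeaders(diff).split('\n')) {
    if (line.startsWith('@@')) {
      // Later hunk headers become a blank separator; the first is dropped.
      if (emitted) {
        out.push({ kind: 'context', text: '' })
      }

      continue
    }

    out.push({ kind: diffKind(line), text: stripDiffMarker(line) })
    emitted = true
  }

  return out
}
```

For example, a two-hunk diff with a Hermes arrow header yields only content lines, with one empty context line at the hunk boundary.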
--- a/apps/desktop/src/components/chat/shiki-highlighter.tsx
+++ b/apps/desktop/src/components/chat/shiki-highlighter.tsx
@ -30,7 +30,10 @@ interface HermesSyntaxHighlighterProps extends SyntaxHighlighterProps {
  defer?: boolean
 }

-const SHIKI_THEME = { dark: 'github-dark-default', light: 'github-light-default' } as const
+// `github-dark-dimmed` is GitHub's lower-contrast dark palette — the vivid
+// `github-dark-default` tokens read harsh at our small code size. Shared by the
+// inline diff renderer too (see diff-lines.tsx) so code + diffs match.
+export const SHIKI_THEME = { dark: 'github-dark-dimmed', light: 'github-light-default' } as const

 /**
 * `github-light-default` colors comments `#6e7781` (~4.2:1 against the code
--- a/apps/desktop/src/components/chat/terminal-output.tsx
+++ b/apps/desktop/src/components/chat/terminal-output.tsx
@ -41,7 +41,11 @@ export function TerminalOutput({ className, text }: TerminalOutputProps) {
  }, [text])

  return (
-    <div className={cn('max-h-16 overflow-auto overscroll-contain', className)} ref={ref}>
+    <div
+      className={cn('max-h-16 overflow-auto overscroll-contain', className)}
+      data-selectable-text="true"
+      ref={ref}
+    >
      <pre className="w-max min-w-full font-mono text-[0.5625rem] leading-[0.85rem] whitespace-pre text-muted-foreground/70">
        {text}
      </pre>
--- a/apps/desktop/src/components/model-visibility-dialog.tsx
+++ b/apps/desktop/src/components/model-visibility-dialog.tsx
@ -14,10 +14,9 @@ import {
  $visibleModels,
  collapseModelFamilies,
  effectiveVisibleKeys,
-  emptyProviderSentinelKey,
-  isProviderSentinel,
  modelVisibilityKey,
-  setVisibleModels
+  setVisibleModels,
+  toggleModelVisibility
 } from '@/store/model-visibility'
 import type { ModelOptionProvider, ModelOptionsResponse } from '@/types/hermes'

@ -61,25 +60,7 @@ export function ModelVisibilityDialog({
  const visible = effectiveVisibleKeys(stored, providers)

  const toggle = (provider: ModelOptionProvider, model: string) => {
-    const next = new Set(effectiveVisibleKeys($visibleModels.get(), providers))
-    const key = modelVisibilityKey(provider.slug, model)
-    const sentinel = emptyProviderSentinelKey(provider.slug)
-
-    if (next.has(key)) {
-      next.delete(key)
-
-      // Check if this was the last real model for this provider.
-      const remainingForProvider = [...next].some(k => k.startsWith(`${provider.slug}::`) && !isProviderSentinel(k))
-
-      if (!remainingForProvider) {
-        next.add(sentinel)
-      }
-    } else {
-      next.delete(sentinel)
-      next.add(key)
-    }
-
-    setVisibleModels(next)
+    setVisibleModels(toggleModelVisibility($visibleModels.get(), providers, provider.slug, model))
  }

  const q = search.trim().toLowerCase()
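The inline toggle logic removed here now lives behind `toggleModelVisibility` in the store, and that implementation is not shown. Below is a pure-function sketch of the rule the removed lines encoded. Hiding a provider's last real model adds a per-provider sentinel key, and showing any model of that provider clears the sentinel. The `slug::model` key shape follows the `startsWith(\`${provider.slug}::\`)` check in the removed code. The sentinel suffix `__none__` and all helper names are hypothetical. This sketch also starts from an already-resolved visible set, which skips the `effectiveVisibleKeys` step. The sentinel presumably lets that step tell an intentionally emptied provider apart from one with no stored choice, but that reading is an assumption.

```typescript
// Hypothetical helpers: `slug::model` mirrors the removed prefix check;
// the sentinel suffix is an assumption, not the store's real encoding.
const SENTINEL_MODEL = '__none__'

function modelKey(slug: string, model: string): string {
  return `${slug}::${model}`
}

function sentinelKey(slug: string): string {
  return modelKey(slug, SENTINEL_MODEL)
}

function isSentinel(key: string): boolean {
  return key.endsWith(`::${SENTINEL_MODEL}`)
}

// Toggle one model in an already-resolved visible set (returns a new Set).
function toggleVisibleKey(visible: ReadonlySet<string>, slug: string, model: string): Set<string> {
  const next = new Set(visible)
  const key = modelKey(slug, model)
  const sentinel = sentinelKey(slug)

  if (next.has(key)) {
    next.delete(key)

    // If no real model of this provider remains, record that explicitly.
    const remaining = Array.from(next).some(k => k.startsWith(`${slug}::`) && !isSentinel(k))

    if (!remaining) {
      next.add(sentinel)
    }
  } else {
    // Showing any model clears the "provider emptied" marker.
    next.delete(sentinel)
    next.add(key)
  }

  return next
}
```

Keeping the rule in one pure function, as this change does with `toggleModelVisibility`, makes the sentinel bookkeeping testable without mounting the dialog.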
--- a/apps/desktop/src/components/notifications.tsx
+++ b/apps/desktop/src/components/notifications.tsx
@ -154,7 +154,10 @@ function NotificationDetail({ detail }: { detail: string }) {
    <details className="mt-2 text-xs text-muted-foreground">
      <summary className="select-none font-medium text-muted-foreground hover:text-foreground">{copy.details}</summary>
      <div className="mt-1 rounded-md bg-background/65 p-2">
-        <pre className="max-h-32 whitespace-pre-wrap wrap-break-word font-mono text-[0.6875rem] leading-relaxed">
+        <pre
+          className="max-h-32 whitespace-pre-wrap wrap-break-word font-mono text-[0.6875rem] leading-relaxed"
+          data-selectable-text="true"
+        >
          {detail}
        </pre>
        <CopyButton
--- a/apps/desktop/src/components/prompt-overlays.tsx
+++ b/apps/desktop/src/components/prompt-overlays.tsx
@ -3,6 +3,7 @@
 import { useStore } from '@nanostores/react'
 import { type FormEvent, useCallback, useEffect, useState } from 'react'

+import { PendingApprovalFallback } from '@/components/assistant-ui/tool-approval'
 import { Button } from '@/components/ui/button'
 import {
  Dialog,
@ -21,13 +22,12 @@ import { notifyError } from '@/store/notifications'
 import { $secretRequest, $sudoRequest, clearSecretRequest, clearSudoRequest } from '@/store/prompts'

 // Renders the modal mid-turn prompts the gateway raises and waits on: sudo
-// password and skill secret capture. (Dangerous-command / execute_code approval
-// is rendered INLINE on the pending tool row instead — see
-// components/assistant-ui/tool-approval.tsx — so it reads like an inline "Run"
-// affordance rather than a blocking modal.) Each Python-side caller blocks the
-// agent thread until the matching `*.respond` RPC lands; without a renderer the
-// agent stalls until its timeout and the tool is BLOCKED (the bug this fixes —
-// desktop handled clarify.request but not these). Any close path (Esc, backdrop
+// password and skill secret capture. Dangerous-command / execute_code approval
+// prefers the pending tool row, but also has a chat-level fallback when no row
+// is mounted (remote gateway sessions can raise the request before the matching
+// tool call is visible). Each Python-side caller blocks the agent thread until
+// the matching `*.respond` RPC lands; without a renderer the agent stalls until
+// its timeout and the tool is BLOCKED. Any close path (Esc, backdrop
 // click) funnels through Radix's single `onOpenChange(false)` and maps to a
 // refusal, so silence is never mistaken for consent, matching the TUI. We
 // deliberately do NOT add onEscapeKeyDown / onInteractOutside handlers — they'd
@ -227,6 +227,7 @@ function SecretDialog() {
 export function PromptOverlays() {
  return (
    <>
+      <PendingApprovalFallback />
      <SudoDialog />
      <SecretDialog />
    </>
--- a/apps/desktop/src/components/remote-display-banner.tsx
+++ b/apps/desktop/src/components/remote-display-banner.tsx
@ -0,0 +1,42 @@
+import { useEffect, useState } from 'react'
+
+import { Alert, AlertDescription } from '@/components/ui/alert'
+import { Button } from '@/components/ui/button'
+import { Codicon } from '@/components/ui/codicon'
+import { useI18n } from '@/i18n'
+import { Info } from '@/lib/icons'
+
+export function RemoteDisplayBanner() {
+  const { t } = useI18n()
+  const [reason, setReason] = useState<string | null>(null)
+  const [dismissed, setDismissed] = useState(false)
+
+  useEffect(() => {
+    void window.hermesDesktop?.getRemoteDisplayReason?.().then(result => setReason(result))
+  }, [])
+
+  if (!reason || dismissed) {
+    return null
+  }
+
+  return (
+    <div className="pointer-events-none fixed left-1/2 top-[calc(var(--titlebar-height,34px)+0.75rem)] z-[200] w-[min(32rem,calc(100%-2rem))] -translate-x-1/2">
+      <Alert className="pointer-events-auto grid-cols-[auto_minmax(0,1fr)_auto] border-(--stroke-nous) bg-popover/95 pr-2.5 shadow-nous backdrop-blur-md">
+        <Info className="text-muted-foreground" />
+        <AlertDescription className="col-start-2">
+          <p className="m-0">{t.remoteDisplayBanner.message(reason)}</p>
+        </AlertDescription>
+        <Button
+          aria-label={t.remoteDisplayBanner.dismiss}
+          className="col-start-3 -mr-1 text-muted-foreground"
+          onClick={() => setDismissed(true)}
+          size="icon-xs"
+          type="button"
+          variant="ghost"
+        >
+          <Codicon name="close" size="0.875rem" />
+        </Button>
+      </Alert>
+    </div>
+  )
+}