diff --git a/.dockerignore b/.dockerignore
index f6fbbc9f137..a5b50068f02 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -102,6 +102,3 @@ acp_registry/
.gitattributes
.hadolint.yaml
.mailmap
-
-# Top-level LICENSE (not matched by *.md); not needed inside the container
-LICENSE
diff --git a/.env.example b/.env.example
index 924146613c4..4c83db1f3b4 100644
--- a/.env.example
+++ b/.env.example
@@ -105,6 +105,7 @@
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
# HF_TOKEN=
+# HF_BASE_URL=https://router.huggingface.co/v1 # Override default base URL
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
@@ -411,6 +412,9 @@ IMAGE_TOOLS_DEBUG=false
# Groq API key (free tier — used for Whisper STT in voice mode)
# GROQ_API_KEY=
+# ElevenLabs API key (cloud STT/TTS — Scribe transcription)
+# ELEVENLABS_API_KEY=
+
# =============================================================================
# STT PROVIDER SELECTION
# =============================================================================
diff --git a/AGENTS.md b/AGENTS.md
index e032f765447..30deedf5bf1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -954,9 +954,10 @@ Enable/disable per platform via `hermes tools` (the curses UI) or the
## Delegation (`delegate_task`)
`tools/delegate_tool.py` spawns a subagent with an isolated
-context + terminal session. Synchronous: the parent waits for the
-child's summary before continuing its own loop — if the parent is
-interrupted, the child is cancelled.
+context + terminal session. By default the parent waits for the
+child's summary before continuing its own loop. With `background=true`,
+Hermes returns a delegation id immediately and the result re-enters the
+conversation later through the async-delegation completion queue.
Two shapes:
@@ -978,9 +979,9 @@ Key config knobs (under `delegation:` in `config.yaml`):
`orchestrator_enabled`, `subagent_auto_approve`, `inherit_mcp_toolsets`,
`max_iterations`.
-Synchronicity rule: delegate_task is **not** durable. For long-running
-work that must outlive the current turn, use `cronjob` or
-`terminal(background=True, notify_on_complete=True)` instead.
+Durability rule: background `delegate_task` is detached from the current
+turn but still process-local. For work that must survive process restart, use
+`cronjob` or `terminal(background=True, notify_on_complete=True)` instead.
---
@@ -1174,7 +1175,7 @@ automatically scope to the active profile.
a unique credential (bot token, API key), call `acquire_scoped_lock()` from
`gateway.status` in the `connect()`/`start()` method and `release_scoped_lock()` in
`disconnect()`/`stop()`. This prevents two profiles from using the same credential.
- See `gateway/platforms/telegram.py` for the canonical pattern.
+ See `plugins/platforms/irc/adapter.py` for the canonical pattern.
6. **Profile operations are HOME-anchored, not HERMES_HOME-anchored** — `_get_profiles_root()`
returns `Path.home() / ".hermes" / "profiles"`, NOT `get_hermes_home() / "profiles"`.
diff --git a/CONTRIBUTING.es.md b/CONTRIBUTING.es.md
new file mode 100644
index 00000000000..ab34206dd6c
--- /dev/null
+++ b/CONTRIBUTING.es.md
@@ -0,0 +1,602 @@
+# Contribuir a Hermes Agent
+
+¡Gracias por contribuir a Hermes Agent! Esta guía cubre todo lo que necesitas: configurar tu entorno de desarrollo, entender la arquitectura, decidir qué construir y conseguir que tu PR sea aceptado.
+
+---
+
+## Prioridades de Contribución
+
+Valoramos las contribuciones en este orden:
+
+1. **Correcciones de errores** — bloqueos, comportamiento incorrecto, pérdida de datos. Siempre la máxima prioridad.
+2. **Compatibilidad entre plataformas** — macOS, diferentes distribuciones de Linux y WSL2 en Windows. Queremos que Hermes funcione en todas partes.
+3. **Fortalecimiento de seguridad** — inyección de shell, inyección de prompts, traversal de rutas, escalada de privilegios. Ver [Consideraciones de Seguridad](#consideraciones-de-seguridad).
+4. **Rendimiento y robustez** — lógica de reintento, manejo de errores, degradación elegante.
+5. **Nuevas habilidades** — pero solo las ampliamente útiles. Ver [¿Debería ser una Habilidad o una Herramienta?](#debería-ser-una-habilidad-o-una-herramienta)
+6. **Nuevas herramientas** — raramente necesarias. La mayoría de las capacidades deberían ser habilidades. Ver más abajo.
+7. **Documentación** — correcciones, aclaraciones, nuevos ejemplos.
+
+---
+
+## ¿Debería ser una Habilidad o una Herramienta?
+
+Esta es la pregunta más común para los nuevos colaboradores. La respuesta casi siempre es **habilidad**.
+
+### Hazlo una Habilidad cuando:
+
+- La capacidad se puede expresar como instrucciones + comandos de shell + herramientas existentes
+- Envuelve una CLI externa o API que el agente puede llamar a través de `terminal` o `web_extract`
+- No necesita integración personalizada de Python ni gestión de claves API integrada en el agente
+- Ejemplos: búsqueda en arXiv, flujos de trabajo de git, gestión de Docker, procesamiento de PDF, email a través de herramientas CLI
+
+### Hazlo una Herramienta cuando:
+
+- Requiere integración de extremo a extremo con claves API, flujos de autenticación o configuración de múltiples componentes gestionada por el harness del agente
+- Necesita lógica de procesamiento personalizada que debe ejecutarse con precisión en cada ocasión (no "mejor esfuerzo" de la interpretación del LLM)
+- Maneja datos binarios, streaming o eventos en tiempo real que no pueden pasar por el terminal
+- Ejemplos: automatización de navegador (gestión de sesiones Browserbase), TTS (codificación de audio + entrega en plataforma), análisis de visión (manejo de imágenes base64)
+
+### ¿Debería la Habilidad estar incluida?
+
+Las habilidades incluidas (en `skills/`) se envían con cada instalación de Hermes. Deben ser **ampliamente útiles para la mayoría de los usuarios**:
+
+- Manejo de documentos, investigación web, flujos de trabajo de desarrollo comunes, administración de sistemas
+- Usadas regularmente por una amplia gama de personas
+
+Si tu habilidad es oficial y útil pero no universalmente necesaria (ej., una integración de servicio de pago, una dependencia pesada), ponla en **`optional-skills/`** — se envía con el repositorio pero no está activada por defecto. Los usuarios pueden descubrirla a través de `hermes skills browse` (etiquetada como "oficial") e instalarla con `hermes skills install` (sin advertencia de terceros, confianza integrada).
+
+Si tu habilidad es especializada, contribuida por la comunidad o de nicho, es mejor para un **Skills Hub** — súbela a un registro de habilidades y compártela en el [Discord de Nous Research](https://discord.gg/NousResearch). Los usuarios pueden instalarla con `hermes skills install`.
+
+---
+
+## Proveedores de Memoria: Publicar como Plugin Independiente
+
+**Ya no aceptamos nuevos proveedores de memoria en este repositorio.** El conjunto de proveedores integrados en `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) está cerrado. Si quieres añadir un nuevo backend de memoria, publícalo como un **repositorio de plugin independiente** que los usuarios instalen en `~/.hermes/plugins/` (o a través de un entry point de pip).
+
+Los plugins de memoria independientes:
+
+- Implementan el mismo ABC `MemoryProvider` (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown` y opcionalmente `post_setup(hermes_home, config)` para integración con el asistente de configuración
+- Usan el mismo sistema de descubrimiento — `discover_memory_providers()` los recoge desde directorios de plugins de usuario/proyecto y entry points de pip
+- Se integran con `hermes memory setup` a través de `post_setup()` — sin necesidad de tocar el código base
+- Pueden registrar sus propios subcomandos CLI a través de `register_cli(subparser)` en un archivo `cli.py`
+- Obtienen todos los mismos hooks de ciclo de vida y plomería de configuración que los proveedores incluidos en el árbol
+
+Los PRs que añadan un nuevo directorio bajo `plugins/memory/` serán cerrados con un puntero para publicar el proveedor como su propio repositorio. Los proveedores en árbol existentes se mantienen; las correcciones de errores para ellos son bienvenidas.
+
+Esto no es una barra de calidad — es una decisión de acoplamiento y mantenimiento. Los proveedores de memoria son el tipo de plugin más común y no deberían vivir todos en este árbol.
+
+---
+
+## Configuración del Desarrollo
+
+### Prerrequisitos
+
+| Requisito | Notas |
+|-----------|-------|
+| **Git** | Con la extensión `git-lfs` instalada |
+| **Python 3.11+** | uv lo instalará si falta |
+| **uv** | Gestor de paquetes Python rápido ([instalar](https://docs.astral.sh/uv/)) |
+| **Node.js 20+** | Opcional — necesario para herramientas de navegador y puente WhatsApp (coincide con los engines de `package.json` raíz) |
+
+### Clonar e instalar
+
+```bash
+git clone https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+
+# Crear venv con Python 3.11
+uv venv venv --python 3.11
+export VIRTUAL_ENV="$(pwd)/venv"
+
+# Instalar con todos los extras (mensajería, cron, menús CLI, herramientas de desarrollo)
+uv pip install -e ".[all,dev]"
+
+# Opcional: herramientas de navegador
+npm install
+```
+
+### Configurar para desarrollo
+
+```bash
+mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
+cp cli-config.yaml.example ~/.hermes/config.yaml
+touch ~/.hermes/.env
+
+# Añadir al menos una clave de proveedor LLM:
+echo "OPENROUTER_API_KEY=***" >> ~/.hermes/.env
+```
+
+### Ejecutar
+
+```bash
+# Enlace simbólico para acceso global
+mkdir -p ~/.local/bin
+ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
+
+# Verificar
+hermes doctor
+hermes chat -q "Hola"
+```
+
+### Ejecutar tests
+
+```bash
+# Preferido — coincide con CI (entorno hermético, 4 workers xdist); ver AGENTS.md
+scripts/run_tests.sh
+
+# Alternativa (activa el venv primero). El wrapper sigue recomendándose
+# para paridad con GitHub Actions antes de abrir un PR:
+pytest tests/ -v
+```
+
+---
+
+## Estructura del Proyecto
+
+```
+hermes-agent/
+├── run_agent.py # Clase AIAgent — bucle de conversación central, despacho de herramientas, persistencia de sesión
+├── cli.py # Clase HermesCLI — TUI interactiva, integración prompt_toolkit
+├── model_tools.py # Orquestación de herramientas (capa delgada sobre tools/registry.py)
+├── toolsets.py # Agrupaciones y presets de herramientas (hermes-cli, hermes-telegram, etc.)
+├── hermes_state.py # Base de datos de sesiones SQLite con búsqueda de texto completo FTS5, títulos de sesión
+├── batch_runner.py # Procesamiento en lote paralelo para generación de trayectorias
+│
+├── agent/ # Internos del agente (módulos extraídos)
+│ ├── prompt_builder.py # Ensamblaje del prompt del sistema (identidad, habilidades, archivos de contexto, memoria)
+│ ├── context_compressor.py # Auto-resumición al acercarse a los límites de contexto
+│ ├── auxiliary_client.py # Resuelve clientes OpenAI auxiliares (resumición, visión)
+│ ├── display.py # KawaiiSpinner, formateo del progreso de herramientas
+│ ├── model_metadata.py # Longitudes de contexto del modelo, estimación de tokens
+│ └── trajectory.py # Ayudantes para guardar trayectorias
+│
+├── hermes_cli/ # Implementaciones de comandos CLI
+│ ├── main.py # Punto de entrada, análisis de argumentos, despacho de comandos
+│ ├── config.py # Gestión de configuración, migración, definiciones de variables de entorno
+│ ├── setup.py # Asistente de configuración interactivo
+│ ├── auth.py # Resolución de proveedor, OAuth, Nous Portal
+│ ├── models.py # Listas de selección de modelos de OpenRouter
+│ ├── banner.py # Banner de bienvenida, arte ASCII
+│ ├── commands.py # Registro central de comandos de barra (CommandDef), autocompletado, ayudantes del gateway
+│ ├── callbacks.py # Callbacks interactivos (aclarar, sudo, aprobación)
+│ ├── doctor.py # Diagnósticos
+│ ├── skills_hub.py # CLI del Skills Hub + comando de barra /skills
+│ └── skin_engine.py # Motor de skins/temas — personalización visual de CLI basada en datos
+│
+├── tools/ # Implementaciones de herramientas (auto-registradas)
+│ ├── registry.py # Registro central de herramientas (esquemas, manejadores, despacho)
+│ ├── approval.py # Detección de comandos peligrosos + aprobación por sesión
+│ ├── terminal_tool.py # Orquestación del terminal (sudo, ciclo de vida del entorno, backends)
+│ ├── file_operations.py # read_file, write_file, búsqueda, patch, etc.
+│ ├── web_tools.py # web_search, web_extract (Paralelo/Firecrawl + resumición Gemini)
+│ ├── vision_tools.py # Análisis de imágenes a través de modelos multimodales
+│ ├── delegate_tool.py # Lanzamiento de subagentes y ejecución paralela de tareas
+│ ├── code_execution_tool.py # Python sandboxado con acceso a herramientas vía RPC
+│ ├── session_search_tool.py # Búsqueda en conversaciones pasadas con FTS5 + ventanas ancladas
+│ ├── cronjob_tools.py # Gestión de tareas programadas
+│ ├── skill_tools.py # Búsqueda, carga y gestión de habilidades
+│ └── environments/ # Backends de ejecución del terminal
+│ ├── base.py # ABC BaseEnvironment
+│ ├── local.py, docker.py, ssh.py, singularity.py, modal.py, daytona.py
+│
+├── gateway/ # Gateway de mensajería
+│ ├── run.py # GatewayRunner — ciclo de vida de plataformas, enrutamiento de mensajes, cron
+│ ├── config.py # Resolución de configuración de plataformas
+│ ├── session.py # Almacén de sesiones, prompts de contexto, políticas de reset
+│ └── platforms/ # Adaptadores de plataformas
+│ ├── telegram.py, discord_adapter.py, slack.py, whatsapp.py
+│
+├── scripts/ # Scripts del instalador y puente
+│ ├── install.sh # Instalador Linux/macOS
+│ ├── install.ps1 # Instalador Windows PowerShell
+│ └── whatsapp-bridge/ # Puente WhatsApp Node.js (Baileys)
+│
+├── skills/ # Habilidades incluidas (copiadas a ~/.hermes/skills/ en la instalación)
+├── optional-skills/ # Habilidades opcionales oficiales (descubribles vía hub, no activadas por defecto)
+├── tests/ # Suite de tests
+├── website/ # Sitio de documentación (hermes-agent.nousresearch.com)
+│
+├── cli-config.yaml.example # Configuración de ejemplo (copiada a ~/.hermes/config.yaml)
+└── AGENTS.md # Guía de desarrollo para asistentes de codificación IA
+```
+
+### Configuración del usuario (almacenada en `~/.hermes/`)
+
+| Ruta | Propósito |
+|------|-----------|
+| `~/.hermes/config.yaml` | Configuración (modelo, terminal, toolsets, compresión, etc.) |
+| `~/.hermes/.env` | Claves API y secretos |
+| `~/.hermes/auth.json` | Credenciales OAuth (Nous Portal) |
+| `~/.hermes/skills/` | Todas las habilidades activas (incluidas + instaladas desde hub + creadas por el agente) |
+| `~/.hermes/memories/` | Memoria persistente (MEMORY.md, USER.md) |
+| `~/.hermes/state.db` | Base de datos de sesiones SQLite |
+| `~/.hermes/sessions/` | Índice de enrutamiento del gateway (`sessions.json`), migas de pan de solicitudes, transcripciones `*.jsonl` del gateway y (opcionalmente) snapshots JSON por sesión cuando `sessions.write_json_snapshots: true` está configurado. Los snapshots por sesión están desactivados por defecto; state.db es canónica. |
+| `~/.hermes/cron/` | Datos de trabajos programados |
+| `~/.hermes/whatsapp/session/` | Credenciales del puente WhatsApp |
+
+---
+
+## Descripción General de la Arquitectura
+
+### Bucle Central
+
+```
+Mensaje del usuario → AIAgent._run_agent_loop()
+ ├── Construir prompt del sistema (prompt_builder.py)
+ ├── Construir kwargs de API (modelo, mensajes, herramientas, configuración de razonamiento)
+ ├── Llamar al LLM (API compatible con OpenAI)
+ ├── Si tool_calls en la respuesta:
+ │ ├── Ejecutar cada herramienta a través del despacho del registro
+ │ ├── Añadir resultados de herramientas a la conversación
+ │ └── Volver a la llamada al LLM
+ ├── Si respuesta de texto:
+ │ ├── Persistir sesión en DB
+ │ └── Devolver final_response
+ └── Compresión de contexto si se acerca al límite de tokens
+```
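
El diagrama anterior puede ilustrarse con un esbozo mínimo. No es el código real de `AIAgent`: los nombres `TOOLS`, `run_agent_loop` y `fake_llm`, así como la forma de los mensajes, son hipotéticos y solo sirven para mostrar el ciclo "llamar al LLM, ejecutar herramientas, volver a llamar". La persistencia de sesión y la compresión de contexto se omiten.

```python
import json
from typing import Callable

# Registro hipotético: nombre de herramienta -> manejador que recibe un dict de argumentos.
TOOLS: dict[str, Callable[[dict], str]] = {
    "sumar": lambda args: json.dumps({"resultado": args["a"] + args["b"]}),
}


def run_agent_loop(call_llm, user_message, max_iterations=8):
    """Repite el ciclo hasta obtener una respuesta de texto o agotar las iteraciones."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        reply = call_llm(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # Respuesta de texto: el bucle termina aquí.
            return reply["content"]
        for call in tool_calls:
            # Despacho a través del registro y resultado añadido a la conversación.
            result = TOOLS[call["name"]](call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("se alcanzó max_iterations sin respuesta final")


def fake_llm(messages):
    """LLM simulado: primero pide una herramienta y luego responde con su resultado."""
    last = messages[-1]
    if last["role"] == "tool":
        value = json.loads(last["content"])["resultado"]
        return {"role": "assistant", "content": f"El resultado es {value}"}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{"name": "sumar", "arguments": {"a": 2, "b": 3}}],
    }
```

Con el LLM simulado, `run_agent_loop(fake_llm, "¿Cuánto es 2 + 3?")` hace dos llamadas: la primera devuelve una llamada a herramienta y la segunda el texto final.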
+
+### Patrones de Diseño Clave
+
+- **Herramientas auto-registradas**: Cada archivo de herramienta llama a `registry.register()` en el momento de importación. `model_tools.py` activa el descubrimiento importando todos los módulos de herramientas.
+- **Agrupación en toolsets**: Las herramientas se agrupan en toolsets (`web`, `terminal`, `file`, `browser`, etc.) que pueden habilitarse/deshabilitarse por plataforma.
+- **Persistencia de sesión**: Todas las conversaciones se almacenan en SQLite (`hermes_state.py`) con búsqueda de texto completo y títulos de sesión únicos.
+- **Inyección efímera**: Los prompts del sistema y los mensajes de relleno se inyectan en el momento de la llamada API, nunca se persisten en la base de datos ni en los logs.
+- **Abstracción de proveedor**: El agente funciona con cualquier API compatible con OpenAI. La resolución del proveedor ocurre en el momento de la inicialización.
+- **Enrutamiento de proveedor**: Al usar OpenRouter, `provider_routing` en config.yaml controla la selección del proveedor.
+
+---
+
+## Estilo de Código
+
+- **PEP 8** con excepciones prácticas (no imponemos longitud de línea estricta)
+- **Comentarios**: Solo cuando se explica la intención no obvia, compromisos o peculiaridades de API. No narres lo que hace el código
+- **Manejo de errores**: Captura excepciones específicas. Registra con `logger.warning()`/`logger.error()` — usa `exc_info=True` para errores inesperados
+- **Multiplataforma**: Nunca asumas Unix. Ver [Compatibilidad Multiplataforma](#compatibilidad-multiplataforma)
+
+---
+
+## Añadir una Nueva Herramienta
+
+Antes de escribir una herramienta, pregúntate: [¿debería ser una habilidad en su lugar?](#debería-ser-una-habilidad-o-una-herramienta)
+
+Las herramientas se auto-registran en el registro central. Cada archivo de herramienta co-localiza su esquema, manejador y registro:
+
+```python
+"""my_tool — Breve descripción de lo que hace esta herramienta."""
+
+import json
+from tools.registry import registry
+
+
+def my_tool(param1: str, param2: int = 10, **kwargs) -> str:
+ """Manejador. Devuelve un resultado en cadena (a menudo JSON)."""
+ result = do_work(param1, param2)
+ return json.dumps(result)
+
+
+MY_TOOL_SCHEMA = {
+ "type": "function",
+ "function": {
+ "name": "my_tool",
+ "description": "Qué hace esta herramienta y cuándo debería usarla el agente.",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "param1": {"type": "string", "description": "Qué es param1"},
+ "param2": {"type": "integer", "description": "Qué es param2", "default": 10},
+ },
+ "required": ["param1"],
+ },
+ },
+}
+
+
+def _check_requirements() -> bool:
+ """Devuelve True si las dependencias de esta herramienta están disponibles."""
+ return True
+
+
+registry.register(
+ name="my_tool",
+ toolset="my_toolset",
+ schema=MY_TOOL_SCHEMA,
+ handler=lambda args, **kw: my_tool(**args, **kw),
+ check_fn=_check_requirements,
+)
+```
+
+**Conectar a un toolset (requerido):** Las herramientas integradas se auto-descubren: cualquier
+archivo `tools/*.py` que contenga una llamada de nivel superior `registry.register(...)` es
+importado por `discover_builtin_tools()` en `tools/registry.py` cuando `model_tools`
+se carga. **No** hay una lista de importaciones manual en `model_tools.py` que mantener.
+
+Todavía debes añadir el nombre de la herramienta a la lista apropiada en `toolsets.py`
+(por ejemplo `_HERMES_CORE_TOOLS` o un toolset dedicado); de lo contrario la herramienta
+se registra pero nunca se expone al agente.
+
+Consulta `AGENTS.md` (sección **Adding New Tools**) para rutas conscientes del perfil y
+orientación sobre plugins vs. núcleo.
+
+---
+
+## Añadir una Habilidad
+
+Las habilidades incluidas viven en `skills/` organizadas por categoría. Las habilidades opcionales oficiales usan la misma estructura en `optional-skills/`:
+
+```
+skills/
+├── research/
+│ └── arxiv/
+│ ├── SKILL.md # Requerido: instrucciones principales
+│ └── scripts/ # Opcional: scripts auxiliares
+│ └── search_arxiv.py
+├── productivity/
+│ └── ocr-and-documents/
+│ ├── SKILL.md
+│ ├── scripts/
+│ └── references/
+└── ...
+```
+
+### Formato de SKILL.md
+
+```markdown
+---
+name: my-skill
+description: Breve descripción (mostrada en los resultados de búsqueda de habilidades)
+version: 1.0.0
+author: Tu Nombre
+license: MIT
+platforms: [macos, linux] # Opcional — restringir a plataformas de SO específicas
+required_environment_variables: # Opcional — metadatos de configuración segura al cargar
+ - name: MY_API_KEY
+ prompt: Clave API
+ help: Dónde obtenerla
+ required_for: funcionalidad completa
+prerequisites: # Requisitos de tiempo de ejecución heredados opcionales
+ env_vars: [MY_API_KEY]
+ commands: [curl, jq]
+metadata:
+ hermes:
+ tags: [Categoría, Subcategoría, Palabras clave]
+ related_skills: [other-skill-name]
+ fallback_for_toolsets: [web]
+ requires_toolsets: [terminal]
+---
+
+# Título de la Habilidad
+
+Introducción breve.
+
+## Cuándo Usar
+Condiciones de activación — ¿cuándo debería el agente cargar esta habilidad?
+
+## Referencia Rápida
+Tabla de comandos o llamadas API comunes.
+
+## Procedimiento
+Instrucciones paso a paso que el agente sigue.
+
+## Problemas Conocidos
+Modos de fallo conocidos y cómo manejarlos.
+
+## Verificación
+Cómo confirma el agente que funcionó.
+```
+
+### Estándares de autoría de habilidades (OBLIGATORIOS)
+
+Todo skill nuevo o modernizado — incluido, opcional o contribuido — debe cumplir estos estándares antes del merge:
+
+1. **`description` ≤ 60 caracteres, una oración, termina con punto.** Las descripciones largas saturan la UI de listado de habilidades. Indica la capacidad, no la implementación. Sin palabras de marketing ("potente", "completo", "fluido", "avanzado").
+
+2. **Las herramientas referenciadas en el cuerpo de SKILL.md deben ser herramientas nativas de Hermes o servidores MCP que la habilidad espere explícitamente.** Usa los nombres de herramientas en comillas invertidas: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, etc.
+
+3. **El campo `platforms:` auditado contra las importaciones reales del script.** Las habilidades que usen primitivos solo de POSIX deben declarar sus plataformas soportadas.
+
+4. **`author` da crédito primero al colaborador humano.**
+
+5. **El cuerpo de SKILL.md usa el orden moderno de secciones:** título, intro de 2-3 oraciones, luego: `## Cuándo Usar`, `## Prerequisitos`, `## Cómo Ejecutar`, `## Referencia Rápida`, `## Procedimiento`, `## Problemas Conocidos`, `## Verificación`.
+
+6. **Los scripts van en `scripts/`, las referencias en `references/`, las plantillas en `templates/`.**
+
+7. **Los tests viven en `tests/skills/test__skill.py`** y usan solo stdlib + pytest + `unittest.mock`. Sin llamadas de red en vivo.
+
+8. **Las adiciones a `.env.example` están aisladas en un bloque claramente delimitado.**
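
La regla 1 de la lista anterior se presta a una comprobación automática. Lo que sigue es un esbozo hipotético (la función `check_description` no existe en el repositorio, hasta donde indica este documento) que verifica la longitud, el punto final y las palabras de marketing citadas como ejemplo. La condición de "una oración" no se comprueba aquí.

```python
import re

# Palabras tomadas de los ejemplos de la regla 1; la lista real podría ser más amplia.
MARKETING_WORDS = ("potente", "completo", "fluido", "avanzado")
MAX_DESCRIPTION_LENGTH = 60


def check_description(description: str) -> list[str]:
    """Devuelve una lista de problemas; una lista vacía significa que pasa la comprobación."""
    problems = []
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"demasiado larga: {len(description)} > {MAX_DESCRIPTION_LENGTH}")
    if not description.endswith("."):
        problems.append("debe terminar con punto")
    lowered = description.lower()
    for word in MARKETING_WORDS:
        # Coincidencia por palabra completa para no marcar, por ejemplo, "incompleto".
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            problems.append(f"palabra de marketing: {word}")
    return problems
```

Una comprobación así podría ejecutarse sobre el campo `description` del frontmatter antes de abrir el PR.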
+
+---
+
+## Añadir una Skin / Tema
+
+Hermes usa un sistema de skins basado en datos — no se necesitan cambios de código para añadir una nueva skin.
+
+**Opción A: Skin de usuario (archivo YAML)**
+
+Crea `~/.hermes/skins/.yaml`:
+
+```yaml
+name: mitema
+description: Breve descripción del tema
+
+colors:
+ banner_border: "#HEX"
+ banner_title: "#HEX"
+ banner_accent: "#HEX"
+ banner_dim: "#HEX"
+ banner_text: "#HEX"
+ response_border: "#HEX"
+
+spinner:
+ waiting_faces: ["(⚔)", "(⛨)"]
+ thinking_faces: ["(⚔)", "(⌁)"]
+ thinking_verbs: ["forjando", "planeando"]
+
+branding:
+ agent_name: "Mi Agente"
+ welcome: "Mensaje de bienvenida"
+ response_label: " ⚔ Agente "
+ prompt_symbol: "⚔"
+
+tool_prefix: "╎"
+```
+
+Todos los campos son opcionales — los valores faltantes se heredan de la skin predeterminada.
+
+**Opción B: Skin integrada**
+
+Añade al dict `_BUILTIN_SKINS` en `hermes_cli/skin_engine.py`. Usa el mismo esquema que arriba pero como dict de Python.
+
+**Activar:**
+- CLI: `/skin mitema` o establece `display.skin: mitema` en config.yaml
+
+---
+
+## Compatibilidad Multiplataforma
+
+Hermes se ejecuta en Linux, macOS y Windows nativo (además de WSL2). Al escribir código
+que toca el SO, asume que *cualquier* plataforma puede alcanzar tu ruta de código.
+
+> **Antes de hacer PR:** ejecuta `scripts/check-windows-footguns.py` para detectar
+> los patrones inseguros comunes de Windows en tu diff. Es basado en grep y barato;
+> CI también lo ejecuta en cada PR.
+
+### Reglas críticas
+
+1. **Nunca llames `os.kill(pid, 0)` para comprobaciones de liveness.** En Windows **NO es una operación sin efecto**. Usa `psutil.pid_exists(pid)` en su lugar.
+
+2. **Usa `shutil.which()` antes de hacer shell — no asumas que Windows tiene las herramientas que tiene Linux.** `ps`, `kill`, `grep`, `awk`, etc. simplemente no existen en Windows.
+
+3. **`termios` y `fcntl` son solo de Unix.** Siempre captura tanto `ImportError` como `NotImplementedError`.
+
+4. **Codificación de archivos.** Windows puede guardar archivos `.env` en `cp1252`. Siempre maneja errores de codificación.
+
+5. **Gestión de procesos.** `os.setsid()`, `os.killpg()`, `os.fork()`, `os.getuid()` y el manejo de señales POSIX difieren en Windows.
+
+6. **Señales que no existen en Windows:** `SIGALRM`, `SIGCHLD`, `SIGHUP`, `SIGUSR1`, `SIGUSR2`, etc.
+
+7. **Separadores de ruta.** Usa `pathlib.Path` en lugar de concatenación de cadenas con `/`.
+
+8. **Los enlaces simbólicos necesitan privilegios elevados en Windows** (a menos que el Modo Desarrollador esté activado).
+
+9. **Los modos de archivo POSIX (0o600, 0o644, etc.) NO se aplican en NTFS** por defecto.
+
+10. **Los daemons de fondo desacoplados en Windows necesitan `pythonw.exe`, NO `python.exe`.**
+
+---
+
+## Consideraciones de Seguridad
+
+Hermes tiene acceso al terminal. La seguridad importa.
+
+### Protecciones existentes
+
+| Capa | Implementación |
+|------|---------------|
+| **Piping de contraseña sudo** | Usa `shlex.quote()` para prevenir inyección de shell |
+| **Detección de comandos peligrosos** | Patrones regex en `tools/approval.py` con flujo de aprobación del usuario |
+| **Inyección de prompts en cron** | Escáner en `tools/cronjob_tools.py` bloquea patrones de anulación de instrucciones |
+| **Lista de denegación de escritura** | Rutas protegidas resueltas a través de `os.path.realpath()` para prevenir bypass de enlaces simbólicos |
+| **Skills Guard** | Escáner de seguridad para habilidades instaladas desde el hub (`tools/skills_guard.py`) |
+| **Sandbox de ejecución de código** | El proceso hijo `execute_code` se ejecuta con claves API eliminadas del entorno |
+| **Fortalecimiento de contenedor** | Docker: todas las capacidades eliminadas, sin escalada de privilegios, límites de PID, tmpfs de tamaño limitado |
+
+### Al contribuir código sensible a la seguridad
+
+- **Siempre usa `shlex.quote()`** al interpolar entrada del usuario en comandos de shell
+- **Resuelve enlaces simbólicos** con `os.path.realpath()` antes de comprobaciones de control de acceso basadas en rutas
+- **No registres secretos.** Las claves API, tokens y contraseñas nunca deben aparecer en la salida de log
+- **Captura excepciones amplias** alrededor de la ejecución de herramientas para que un solo fallo no bloquee el bucle del agente
+- **Prueba en todas las plataformas** si tu cambio toca rutas de archivos, gestión de procesos o comandos de shell
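
Las dos primeras recomendaciones de la lista anterior pueden ilustrarse con un esbozo mínimo. Las funciones `build_cat_command` e `is_inside` son hipotéticas; solo muestran el uso de `shlex.quote()` y `os.path.realpath()`, no la implementación real de Hermes.

```python
import os
import shlex


def build_cat_command(user_path: str) -> str:
    """Interpola entrada del usuario en un comando de shell de forma segura."""
    # "--" evita que una ruta que empiece por "-" se interprete como opción.
    return f"cat -- {shlex.quote(user_path)}"


def is_inside(base_dir: str, candidate: str) -> bool:
    """Comprueba si candidate queda dentro de base_dir tras resolver enlaces simbólicos."""
    base_real = os.path.realpath(base_dir)
    candidate_real = os.path.realpath(candidate)
    try:
        return os.path.commonpath([base_real, candidate_real]) == base_real
    except ValueError:
        # Por ejemplo, unidades distintas en Windows: no puede estar dentro.
        return False
```

Resolver ambas rutas antes de compararlas es lo que impide que un enlace simbólico o un `..` salte la comprobación.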
+
+### Política de fijación de dependencias (fortalecimiento de la cadena de suministro)
+
+Tras el [compromiso de la cadena de suministro de litellm](https://github.com/BerriAI/litellm/issues/24512) en marzo de 2026 y la [campaña del gusano Mini Shai-Hulud](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) en mayo de 2026, todas las dependencias deben seguir estas reglas:
+
+| Tipo de fuente | Tratamiento requerido | Justificación |
+|---|---|---|
+| **Paquete PyPI** | `>=suelo, # vX.Y.Z` |
+| **Instalaciones pip solo de CI** | `==exacto` | Builds de CI herméticos; el cambio es aceptable. |
+
+**Cada nueva dependencia de PyPI en un PR debe tener un límite superior.** Los PRs que añadan `>=X.Y.Z` sin límite superior serán rechazados.
+
+---
+
+## Proceso de Pull Request
+
+### Nomenclatura de ramas
+
+```
+fix/descripcion # Correcciones de errores
+feat/descripcion # Nuevas funcionalidades
+docs/descripcion # Documentación
+test/descripcion # Tests
+refactor/descripcion # Reestructuración de código
+```
+
+### Antes de enviar
+
+1. **Ejecutar tests**: `scripts/run_tests.sh` (recomendado; igual que CI) o `pytest tests/ -v` con el venv del proyecto activado
+2. **Probar manualmente**: Ejecuta `hermes` y ejercita la ruta de código que cambiaste
+3. **Verificar impacto multiplataforma**: Si tocas E/S de archivos, gestión de procesos o manejo del terminal, considera macOS, Linux y WSL2
+4. **Mantén los PRs enfocados**: Un cambio lógico por PR. No mezcles una corrección de error con una refactorización con una nueva funcionalidad.
+
+### Descripción del PR
+
+Incluye:
+- **Qué** cambió y **por qué**
+- **Cómo probarlo** (pasos de reproducción para errores, ejemplos de uso para funcionalidades)
+- **Qué plataformas** probaste
+- Referencia cualquier issue relacionado
+
+### Mensajes de commit
+
+Usamos [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+<tipo>(<alcance>): <descripción>
+```
+
+| Tipo | Usar para |
+|------|-----------|
+| `fix` | Correcciones de errores |
+| `feat` | Nuevas funcionalidades |
+| `docs` | Documentación |
+| `test` | Tests |
+| `refactor` | Reestructuración de código (sin cambio de comportamiento) |
+| `chore` | Build, CI, actualizaciones de dependencias |
+
+Alcances: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`, etc.
+
+Ejemplos:
+```
+fix(cli): prevenir bloqueo en save_config_value cuando el modelo es una cadena
+feat(gateway): añadir aislamiento de sesión multi-usuario de WhatsApp
+fix(security): prevenir inyección de shell en el piping de contraseña sudo
+test(tools): añadir tests unitarios para file_operations
+```
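
Los ejemplos anteriores pueden validarse con una expresión regular. Este esbozo es hipotético (ni `COMMIT_RE` ni `parse_commit_subject` provienen del repositorio) y asume que el asunto sigue la forma tipo, alcance entre paréntesis, dos puntos y descripción, con los tipos de la tabla. Conventional Commits permite omitir el alcance; aquí se exige solo porque todos los ejemplos lo incluyen.

```python
import re

# Tipos tomados de la tabla de tipos de commit.
COMMIT_TYPES = ("fix", "feat", "docs", "test", "refactor", "chore")

COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[a-z0-9_-]+)\): "
    r"(?P<description>\S.*)$"
)


def parse_commit_subject(subject: str):
    """Devuelve un dict con type, scope y description, o None si el asunto no encaja."""
    match = COMMIT_RE.match(subject)
    return match.groupdict() if match else None
```

Un hook local de `commit-msg` podría llamar a `parse_commit_subject` sobre la primera línea del mensaje y rechazar el commit cuando devuelva `None`.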
+
+---
+
+## Reportar Issues
+
+- Usa [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
+- Incluye: SO, versión de Python, versión de Hermes (`hermes version`), traza de error completa
+- Incluye pasos para reproducir
+- Verifica los issues existentes antes de crear duplicados
+- Para vulnerabilidades de seguridad, por favor reporta de forma privada
+
+---
+
+## Comunidad
+
+- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch) — para preguntas, mostrar proyectos y compartir habilidades
+- **GitHub Discussions**: Para propuestas de diseño y discusiones de arquitectura
+- **Skills Hub**: Sube habilidades especializadas a un registro y compártelas con la comunidad
+
+---
+
+## Licencia
+
+Al contribuir, aceptas que tus contribuciones serán licenciadas bajo la [Licencia MIT](LICENSE).
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1a70116548a..045d8097f88 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -18,6 +18,24 @@ We value contributions in this order:
---
+## Before You Start: Search First
+
+A quick search before you build saves your time and keeps the PR queue clean — duplicates are common here, so it's worth a minute up front.
+
+- **Search both open *and* merged PRs and issues** for your topic or error symptom — the duplicate-check in the PR template fires at review time, after you've already done the work:
+ ```bash
+ gh search issues --repo NousResearch/hermes-agent ""
+ gh search prs --repo NousResearch/hermes-agent ""
+ ```
+ Or use the web UI: [issues](https://github.com/NousResearch/hermes-agent/issues?q=) · [PRs (all states)](https://github.com/NousResearch/hermes-agent/pulls?q=is%3Apr).
+- **The issue tracker can lag the code.** Many requested features are already implemented in-tree, so also search the source (`search_files`, or your editor's grep) for the capability before proposing it.
+- **If an open PR already addresses it**, consider reviewing or improving that one instead of opening a competing duplicate.
+- **For larger work**, comment on the issue to signal you're working on it, so others don't start the same thing.
+
+Related: #38284 covers the agent-side analog — Hermes itself checking existing issues and PRs before deep self-troubleshooting. This section is the human-contributor complement.
+
+---
+
## Should it be a Skill or a Tool?
This is the most common question for new contributors. The answer is almost always **skill**.
@@ -412,6 +430,12 @@ Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
+## Prerequisites
+Env vars, install steps, MCP setup, API key sourcing.
+
+## How to Run
+Canonical invocation through the `terminal` tool.
+
## Quick Reference
Table of common commands or API calls.
diff --git a/README.es.md b/README.es.md
new file mode 100644
index 00000000000..af8558513c5
--- /dev/null
+++ b/README.es.md
@@ -0,0 +1,220 @@
+
+
+**El agente de IA con mejora continua creado por [Nous Research](https://nousresearch.com).** Es el único agente con un bucle de aprendizaje integrado: crea habilidades a partir de la experiencia, las mejora durante el uso, se impulsa a sí mismo a persistir el conocimiento, busca en sus propias conversaciones pasadas y construye un modelo cada vez más profundo de quién eres a lo largo de las sesiones. Ejecútalo en un VPS de $5, un clúster de GPUs o infraestructura sin servidor que cuesta casi nada cuando está inactivo. No está atado a tu laptop — habla con él desde Telegram mientras trabaja en una VM en la nube.
+
+Usa cualquier modelo que quieras — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (más de 200 modelos), [NovitaAI](https://novita.ai), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, o tu propio endpoint. Cambia con `hermes model` — sin cambios de código, sin dependencias.
+
+
+
+- **Una interfaz de terminal real.** TUI completa con edición multilínea, autocompletado de comandos, historial de conversaciones, interrupción y redirección, y salida de herramientas en streaming.
+- **Vive donde tú vives.** Telegram, Discord, Slack, WhatsApp, Signal y CLI, todo desde un único proceso gateway. Transcripción de notas de voz, continuidad de conversación entre plataformas.
+- **Un bucle de aprendizaje cerrado.** Memoria curada por el agente con recordatorios periódicos. Creación autónoma de habilidades tras tareas complejas. Las habilidades mejoran solas durante el uso. Búsqueda FTS5 de sesiones con resúmenes generados por LLM para recuperación entre sesiones. Modelado de usuario dialéctico Honcho. Compatible con el estándar abierto de agentskills.io.
+- **Automatizaciones programadas.** Planificador cron integrado con entrega a cualquier plataforma. Informes diarios, copias de seguridad nocturnas, auditorías semanales, todo en lenguaje natural y ejecutándose de forma autónoma.
+- **Delega y paraleliza.** Lanza subagentes aislados para flujos de trabajo paralelos. Escribe scripts de Python que llaman a herramientas vía RPC, convirtiendo pipelines de múltiples pasos en turnos de coste cero de contexto.
+- **Funciona en cualquier lugar, no solo en tu laptop.** Seis backends de terminal: local, Docker, SSH, Singularity, Modal y Daytona. Daytona y Modal ofrecen persistencia sin servidor: el entorno de tu agente hiberna cuando está inactivo y se activa bajo demanda, costando casi nada entre sesiones. Ejecútalo en un VPS de $5 o un clúster de GPUs.
+- **Listo para investigación.** Generación de trayectorias en lote, compresión de trayectorias para entrenar la próxima generación de modelos de llamadas a herramientas.
+
+
+---
+
+## Instalación rápida
+
+### Linux, macOS, WSL2, Termux
+
+```bash
+curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
+```
+
+### Windows (nativo, PowerShell)
+
+> **Nota:** En Windows nativo, Hermes funciona sin WSL — la CLI, el gateway, la TUI y las herramientas funcionan de forma nativa. Si prefieres usar WSL2, el comando de Linux/macOS de arriba también funciona allí. ¿Encontraste un error? Por favor [crea un issue](https://github.com/NousResearch/hermes-agent/issues).
+
+Ejecuta esto en PowerShell:
+
+```powershell
+iex (irm https://hermes-agent.nousresearch.com/install.ps1)
+```
+
+El instalador se encarga de todo: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **y un Git Bash portátil** (MinGit, descomprimido en `%LOCALAPPDATA%\hermes\git` — no requiere administrador, completamente aislado de cualquier instalación de Git del sistema). Hermes usa este Git Bash incluido para ejecutar comandos de shell.
+
+Si ya tienes Git instalado, el instalador lo detecta y lo usa en su lugar. De lo contrario, una descarga de ~45MB de MinGit es todo lo que necesitas — no tocará ni interferirá con ningún Git del sistema.
+
+> **Android / Termux:** La ruta manual probada está documentada en la [guía de Termux](https://hermes-agent.nousresearch.com/docs/getting-started/termux). En Termux, Hermes instala el extra `.[termux]` curado porque el extra completo `.[all]` actualmente incluye dependencias de voz incompatibles con Android.
+>
+> **Windows:** Windows nativo es totalmente compatible — el comando de PowerShell de arriba instala todo. Si prefieres usar WSL2, el comando de Linux también funciona allí. La instalación nativa de Windows se encuentra en `%LOCALAPPDATA%\hermes`; WSL2 instala en `~/.hermes` como en Linux.
+
+Después de la instalación:
+
+```bash
+source ~/.bashrc # recargar shell (o: source ~/.zshrc)
+hermes # ¡empieza a chatear!
+```
+
+---
+
+## Primeros pasos
+
+```bash
+hermes # CLI interactiva — inicia una conversación
+hermes model # Elige tu proveedor y modelo LLM
+hermes tools # Configura qué herramientas están habilitadas
+hermes config set # Establece valores de configuración individuales
+hermes gateway # Inicia el gateway de mensajería (Telegram, Discord, etc.)
+hermes setup # Ejecuta el asistente de configuración completo
+hermes claw migrate # Migra desde OpenClaw (si vienes de OpenClaw)
+hermes update # Actualiza a la última versión
+hermes doctor # Diagnostica cualquier problema
+```
+
+📖 **[Documentación completa →](https://hermes-agent.nousresearch.com/docs/)**
+
+---
+
+## Evita la colección de claves API — Nous Portal
+
+Hermes funciona con cualquier proveedor que quieras — eso no cambiará. Pero si prefieres no recopilar cinco claves API separadas para el modelo, búsqueda web, generación de imágenes, TTS y un navegador en la nube, **[Nous Portal](https://portal.nousresearch.com)** las cubre todas bajo una sola suscripción:
+
+- **Más de 300 modelos** — elige cualquiera con `/model `
+- **Tool Gateway** — búsqueda web (Firecrawl), generación de imágenes (FAL), texto a voz (OpenAI), navegador en la nube (Browser Use), todo enrutado a través de tu suscripción. Sin cuentas adicionales.
+
+Un comando desde una instalación nueva:
+
+```bash
+hermes setup --portal
+```
+
+Esto te autentica vía OAuth, establece Nous como tu proveedor y activa el Tool Gateway. Comprueba qué está conectado en cualquier momento con `hermes portal info`. Detalles completos en la [página de documentación del Tool Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
+
+Puedes seguir usando tus propias claves por herramienta cuando quieras — el gateway es por backend, no todo o nada.
+
+---
+
+## Referencia rápida: CLI vs Mensajería
+
+Hermes tiene dos puntos de entrada: inicia la interfaz de terminal con `hermes`, o ejecuta el gateway y habla con él desde Telegram, Discord, Slack, WhatsApp, Signal o Email. Una vez en una conversación, muchos comandos de barra son compartidos entre ambas interfaces.
+
+| Acción | CLI | Plataformas de mensajería |
+| ----------------------------------- | --------------------------------------------- | --------------------------------------------------------------------------------- |
+| Empezar a chatear | `hermes` | Ejecuta `hermes gateway setup` + `hermes gateway start`, luego envía un mensaje al bot |
+| Nueva conversación | `/new` o `/reset` | `/new` o `/reset` |
+| Cambiar modelo | `/model [proveedor:modelo]` | `/model [proveedor:modelo]` |
+| Establecer personalidad | `/personality [nombre]` | `/personality [nombre]` |
+| Reintentar o deshacer último turno | `/retry`, `/undo` | `/retry`, `/undo` |
+| Comprimir contexto / ver uso | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
+| Explorar habilidades | `/skills` o `/` | `/` |
+| Interrumpir trabajo actual | `Ctrl+C` o enviar un nuevo mensaje | `/stop` o enviar un nuevo mensaje |
+| Estado específico de plataforma | `/platforms` | `/status`, `/sethome` |
+
+Para las listas de comandos completas, consulta la [guía de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) y la [guía del Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
+
+---
+
+## Documentación
+
+Toda la documentación está en **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
+
+| Sección | Contenido |
+| --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
+| [Inicio rápido](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Instalar → configurar → primera conversación en 2 minutos |
+| [Uso de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Comandos, atajos de teclado, personalidades, sesiones |
+| [Configuración](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Archivo de configuración, proveedores, modelos, todas las opciones |
+| [Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
+| [Seguridad](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Aprobación de comandos, emparejamiento por DM, aislamiento en contenedor |
+| [Herramientas y Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | Más de 40 herramientas, sistema de toolsets, backends de terminal |
+| [Sistema de Habilidades](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Memoria procedimental, Skills Hub, creación de habilidades |
+| [Memoria](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory) | Memoria persistente, perfiles de usuario, mejores prácticas |
+| [Integración MCP](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp) | Conecta cualquier servidor MCP para capacidades extendidas |
+| [Programación Cron](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) | Tareas programadas con entrega a plataforma |
+| [Archivos de Contexto](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Contexto de proyecto que da forma a cada conversación |
+| [Arquitectura](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | Estructura del proyecto, bucle del agente, clases principales |
+| [Contribuir](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) | Configuración de desarrollo, proceso de PR, estilo de código |
+| [Referencia de CLI](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) | Todos los comandos y flags |
+| [Variables de Entorno](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Referencia completa de variables de entorno |
+
+---
+
+## Migración desde OpenClaw
+
+Si vienes de OpenClaw, Hermes puede importar automáticamente tu configuración, memorias, habilidades y claves API.
+
+**Durante la configuración inicial:** El asistente de configuración (`hermes setup`) detecta automáticamente `~/.openclaw` y ofrece migrar antes de que comience la configuración.
+
+**En cualquier momento después de instalar:**
+
+```bash
+hermes claw migrate # Migración interactiva (preset completo)
+hermes claw migrate --dry-run # Vista previa de qué se migraría
+hermes claw migrate --preset user-data # Migrar sin secretos
+hermes claw migrate --overwrite # Sobreescribir conflictos existentes
+```
+
+Qué se importa:
+
+- **SOUL.md** — archivo de personalidad
+- **Memorias** — entradas de MEMORY.md y USER.md
+- **Habilidades** — habilidades creadas por el usuario → `~/.hermes/skills/openclaw-imports/`
+- **Lista de comandos permitidos** — patrones de aprobación
+- **Configuración de mensajería** — configuración de plataformas, usuarios permitidos, directorio de trabajo
+- **Claves API** — secretos en lista de permitidos (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
+- **Assets de TTS** — archivos de audio del espacio de trabajo
+- **Instrucciones del espacio de trabajo** — AGENTS.md (con `--workspace-target`)
+
+Consulta `hermes claw migrate --help` para todas las opciones, o usa la habilidad `openclaw-migration` para una migración guiada interactiva por el agente con vistas previas de dry-run.
+
+---
+
+## Contribuir
+
+¡Las contribuciones son bienvenidas! Consulta la [Guía de Contribución](CONTRIBUTING.es.md) para la configuración del desarrollo, el estilo de código y el proceso de PR.
+
+Inicio rápido para colaboradores — clona y comienza con `setup-hermes.sh`:
+
+```bash
+git clone https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+./setup-hermes.sh # instala uv, crea venv, instala .[all], enlaza ~/.local/bin/hermes
+./hermes # detecta automáticamente el venv, no necesitas hacer `source` primero
+```
+
+Ruta manual (equivalente a lo anterior):
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+uv venv .venv --python 3.11
+source .venv/bin/activate
+uv pip install -e ".[all,dev]"
+scripts/run_tests.sh
+```
+
+---
+
+## Comunidad
+
+- 💬 [Discord](https://discord.gg/NousResearch)
+- 📚 [Skills Hub](https://agentskills.io)
+- 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
+- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — Servidor MCP de control de escritorio Linux para Hermes y otros hosts MCP, con árboles de accesibilidad AT-SPI, entrada Wayland/X11, capturas de pantalla y targeting de ventanas del compositor.
+- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Puente WeChat comunitario: Ejecuta Hermes Agent y OpenClaw en la misma cuenta de WeChat.
+
+---
+
+## Licencia
+
+MIT — ver [LICENSE](LICENSE).
+
+Creado por [Nous Research](https://nousresearch.com).
diff --git a/README.md b/README.md
index 5fb4e80082b..0d5a638e227 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@
+
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
@@ -64,6 +65,41 @@ source ~/.bashrc # reload shell (or: source ~/.zshrc)
hermes # start chatting!
```
+### Troubleshooting
+
+#### Windows Defender or antivirus flags `uv.exe` as malware
+
+If your antivirus (Bitdefender, Windows Defender, etc.) quarantines `uv.exe` from the Hermes `bin` folder (`%LOCALAPPDATA%\hermes\bin\uv.exe`), this is a **false positive**. The file is Astral's `uv`, the Rust-based Python package manager that Hermes bundles to manage its Python environment. ML-based antivirus engines commonly flag unsigned Rust binaries that download and install packages.
+
+**To verify your copy is authentic:**
+
+```powershell
+# Install GitHub CLI if needed
+winget install --id GitHub.cli
+
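+# Log in to GitHub
+gh auth login
+
+# Run verification
+$uv = "$env:LOCALAPPDATA\hermes\bin\uv.exe"
+# `uv --version` prints "uv <version> ..."; keep only the version field
+$ver = (& $uv --version).Split(' ')[1]
+[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
+$zip = "$env:TEMP\uv.zip"
+# Download the official release archive for that version (x86_64 Windows build)
+Invoke-WebRequest "https://github.com/astral-sh/uv/releases/download/$ver/uv-x86_64-pc-windows-msvc.zip" -OutFile $zip -UseBasicParsing
+# Verify the archive's attestation against the astral-sh/uv repository
+gh attestation verify $zip --repo astral-sh/uv
+Expand-Archive $zip "$env:TEMP\uv_x" -Force
+# Prints True when the bundled uv.exe matches the official binary
+(Get-FileHash "$env:TEMP\uv_x\uv.exe").Hash -eq (Get-FileHash $uv).Hash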
+```
+
+If attestation says "Verification succeeded" and the last line prints `True`, you're good.
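
For readers who would rather not use PowerShell for the last step, the hash comparison can be sketched in Python. This is a minimal illustration, not part of Hermes: the helper names `sha256_of` and `same_file_contents` are hypothetical, and it uses SHA-256 because that is the default algorithm of `Get-FileHash`.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(first, second):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(first) == sha256_of(second)
```

Calling `same_file_contents` with the extracted `uv.exe` and the copy in the Hermes `bin` folder plays the same role as the final `Get-FileHash` line. It does not replace the attestation check, which is what ties the archive to the upstream repository.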
+
+**To whitelist Hermes:**
+- **Windows Defender:** Run PowerShell as Admin → `Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\hermes\bin"`
+- **Bitdefender:** Add an exception in the Bitdefender console (Protection > Antivirus > Settings > Manage Exceptions)
+- Whitelist the **folder**, not the file hash — Hermes updates `uv` and the hash changes every version
+
+For more context, see the upstream Astral reports: [astral-sh/uv#13553](https://github.com/astral-sh/uv/issues/13553), [astral-sh/uv#15011](https://github.com/astral-sh/uv/issues/15011), [astral-sh/uv#10079](https://github.com/astral-sh/uv/issues/10079).
+
---
## Getting Started
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 2453739f917..5ebfe1a7c50 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -39,7 +39,11 @@ curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
> **Android / Termux:** 已测试的手动安装路径请参考 [Termux 指南](https://hermes-agent.nousresearch.com/docs/getting-started/termux)。在 Termux 上,Hermes 会安装精选的 `.[termux]` 扩展,因为完整的 `.[all]` 扩展会拉取 Android 不兼容的语音依赖。
>
-> **Windows:** 原生 Windows 不受支持。请安装 [WSL2](https://learn.microsoft.com/zh-cn/windows/wsl/install) 并运行上述命令。
+> **Windows:** 在 PowerShell 中运行:
+> ```powershell
+> iex (irm https://hermes-agent.nousresearch.com/install.ps1)
+> ```
+> 安装完成后,可能需要重启终端,然后运行 `hermes` 开始对话。
安装后:
diff --git a/SECURITY.es.md b/SECURITY.es.md
new file mode 100644
index 00000000000..30b43716ebb
--- /dev/null
+++ b/SECURITY.es.md
@@ -0,0 +1,322 @@
+# Política de Seguridad de Hermes Agent
+
+Este documento describe el modelo de confianza de Hermes Agent, identifica el
+único límite de seguridad que el proyecto trata como estructural y define el
+alcance para los informes de vulnerabilidades.
+
+## 1. Reportar una Vulnerabilidad
+
+Reporta de forma privada a través de [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
+o **security@nousresearch.com**. No abras issues públicos para
+vulnerabilidades de seguridad. **Hermes Agent no opera un programa de
+recompensas por errores.**
+
+Un informe útil incluye:
+
+- Una descripción concisa y evaluación de severidad.
+- El componente afectado, identificado por ruta de archivo y rango de líneas
+ (ej. `path/to/file.py:120-145`).
+- Detalles del entorno (`hermes version`, SHA del commit, SO, versión de Python).
+- Una reproducción contra `main` o el último release.
+- Una declaración de qué límite de confianza del §2 se cruza.
+
+Por favor lee el §2 y el §3 antes de enviar. Los informes que demuestren
+limitaciones de una heurística en proceso que esta política no trata como
+límite de seguridad serán cerrados como fuera de alcance bajo el §3. Aun así, consulta el §3.2:
+siguen siendo bienvenidos como issues o pull requests regulares, simplemente no
+a través del canal de seguridad privado.
+
+---
+
+## 2. Modelo de Confianza
+
+Hermes Agent es un agente personal de un solo inquilino. Su postura es
+por capas, y las capas no tienen el mismo peso. Los reportadores y
+operadores deben razonar sobre ellas en los mismos términos.
+
+### 2.1 Definiciones
+
+- **Proceso del agente.** El intérprete Python que ejecuta Hermes Agent,
+ incluyendo cualquier módulo Python que haya cargado (habilidades, plugins,
+ manejadores de hooks).
+- **Backend de terminal.** Un objetivo de ejecución conectado para la
+ herramienta `terminal()`. El predeterminado ejecuta comandos directamente en el host.
+ Otros backends ejecutan comandos dentro de un contenedor, sandbox en la nube o
+ host remoto.
+- **Superficie de entrada.** Cualquier canal a través del cual el contenido entra en el
+ contexto del agente: entrada del operador, fetches web, email, mensajes del gateway,
+ lecturas de archivos, respuestas del servidor MCP, resultados de herramientas.
+- **Envolvente de confianza.** El conjunto de recursos a los que un operador ha otorgado
+ implícitamente acceso a Hermes Agent al ejecutarlo — típicamente, todo lo que
+ la propia cuenta de usuario del operador puede alcanzar en el host.
+- **Postura.** Una declaración explícita en la documentación o código de Hermes Agent
+ sobre cómo una capa consumidora (adaptador, UI, escritor de archivos,
+ shell) debe tratar la salida del agente — ej. "el dashboard renderiza
+ la salida del agente como HTML inerte."
+
+### 2.2 El Límite: Aislamiento a Nivel de SO
+
+**El único límite de seguridad contra un LLM adversario es el
+sistema operativo.** Nada dentro del proceso del agente constituye
+contención — ni la puerta de aprobación, ni la redacción de salida, ni ningún
+escáner de patrones, ni ninguna lista de herramientas permitidas. Cualquier componente dentro
+del proceso que filtre la salida del LLM es una heurística operando sobre una
+cadena influenciada por el atacante, y esta política lo trata como tal.
+
+Hermes Agent admite dos posturas de aislamiento a nivel de SO. Abordan
+diferentes amenazas y un operador debe elegir deliberadamente.
+
+#### Aislamiento del backend de terminal
+
+Un backend de terminal no predeterminado ejecuta comandos de shell emitidos por el LLM
+dentro de un contenedor, host remoto o sandbox en la nube. Las herramientas de archivos
+(`read_file`, `write_file`, `patch`) también se ejecutan a través de este backend,
+ya que están implementadas sobre el contrato del shell — no pueden
+alcanzar rutas que el backend no exponga.
+
+Lo que confina: todo lo que el agente hace emitiendo operaciones de shell o
+de archivos. Lo que **no** confina: todo lo que el agente hace en su propio
+proceso Python. Eso incluye la herramienta de ejecución de código (lanzada como
+subproceso del host), subprocesos MCP (lanzados desde el entorno del agente),
+carga de plugins, despacho de hooks y carga de habilidades (todos importados en el
+intérprete del agente).
+
+El aislamiento del backend de terminal es la postura correcta cuando la preocupación es
+que el LLM emita comandos de shell destructivos o escrituras de herramientas de archivo no deseadas, y el
+operador es de confianza.
+
+#### Envoltura del proceso completo
+
+La envoltura del proceso completo ejecuta todo el árbol de procesos del agente dentro de un
+sandbox. Cada ruta de código — shell, ejecución de código, MCP, herramientas de archivos,
+plugins, hooks, carga de habilidades — está sujeta a la misma política de sistema de archivos,
+red, proceso e (donde sea aplicable) inferencia.
+
+Hermes Agent admite esto de dos maneras:
+
+- **La propia imagen Docker de Hermes Agent y la configuración de Compose.** Más
+ liviana; el agente se ejecuta en un contenedor estándar con montajes y
+ política de red configurados por el operador.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
+ OpenShell proporciona sandboxes por sesión con política declarativa
+ a través de capas de sistema de archivos, red (egreso L7), proceso/syscall y
+ enrutamiento de inferencia. Las políticas de red e inferencia son
+ recargables en caliente. Las credenciales se inyectan desde un almacén de Proveedor
+ y nunca tocan el sistema de archivos del sandbox.
+
+Bajo una envoltura de proceso completo, las heurísticas en proceso de Hermes Agent
+(§2.4) funcionan como prevención de accidentes en capas sobre un límite real.
+Esta es la postura soportada cuando el agente ingiere contenido de superficies
+que el operador no controla — la web abierta, email entrante, canales de
+múltiples usuarios, servidores MCP no confiables — y para despliegues en
+producción o compartidos.
+
+Los operadores que ejecuten el backend local predeterminado con superficies de entrada
+no confiables, o que ejecuten un sandbox de backend de terminal esperando que contenga
+rutas de código que no pasan por el shell, están operando fuera de la postura de
+seguridad soportada.
+
+### 2.3 Alcance de Credenciales
+
+Hermes Agent filtra el entorno que pasa a sus componentes en proceso de
+menor confianza: subprocesos de shell, subprocesos MCP y el proceso hijo
+de ejecución de código. Las credenciales como las claves API del proveedor y los
+tokens del gateway se eliminan por defecto; las variables declaradas explícitamente
+por el operador o por una habilidad cargada se pasan.
+
+Esto reduce la exfiltración casual. No es contención. Cualquier
+componente que se ejecute dentro del proceso del agente (habilidades, plugins, manejadores
+de hooks) puede leer lo que el agente mismo puede leer, incluidas las
+credenciales en memoria. La mitigación contra un componente en proceso comprometido
+es la revisión del operador antes de instalar (§2.4, §2.5), no el
+saneamiento del entorno.
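
Como ilustración del filtrado descrito arriba, el siguiente boceto mínimo en Python elimina del entorno las variables con aspecto de credencial antes de lanzar un subproceso y deja pasar las declaradas explícitamente. Los nombres `SECRET_SUFFIXES`, `filtered_env` y `run_child` son hipotéticos, igual que la heurística por sufijos; es una suposición con fines ilustrativos y no refleja la implementación real de Hermes Agent.

```python
import os
import subprocess

# Sufijos hipotéticos que marcan una variable como credencial (suposición ilustrativa).
SECRET_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET")


def filtered_env(env, passthrough=frozenset()):
    """Copia `env` sin las variables con aspecto de credencial.

    Las variables listadas en `passthrough` se conservan siempre.
    """
    return {
        name: value
        for name, value in env.items()
        if name in passthrough or not name.upper().endswith(SECRET_SUFFIXES)
    }


def run_child(argv, passthrough=frozenset()):
    """Lanza un subproceso que solo ve el entorno filtrado."""
    return subprocess.run(
        argv,
        env=filtered_env(os.environ, passthrough),
        capture_output=True,
        text=True,
        check=False,
    )
```

Como indica esta sección, un filtro así reduce la exfiltración casual pero no es contención: el código que se ejecuta dentro del proceso del agente sigue viendo `os.environ` completo.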
+
+### 2.4 Heurísticas en Proceso
+
+Los siguientes componentes filtran o advierten sobre el comportamiento del LLM. Son
+útiles. No son límites.
+
+- La **puerta de aprobación** detecta patrones de shell destructivos comunes
+ y le pide al operador confirmación antes de la ejecución. El shell es Turing-
+ completo; una lista de denegación sobre cadenas de shell es estructuralmente
+ incompleta. La puerta detecta errores en modo cooperativo, no salidas
+ adversariales.
+- **La redacción de salida** elimina patrones similares a secretos de la visualización.
+ Un productor de salida motivado la evitará.
+- **Skills Guard** escanea el contenido de habilidades instalables en busca de patrones
+ de inyección. Es una ayuda de revisión; el límite para habilidades de terceros
+ es la revisión del operador antes de instalar. Revisar una habilidad significa
+ leer su código Python y scripts, no solo su descripción SKILL.md —
+ las habilidades ejecutan Python arbitrario en el momento de importación.
+
+### 2.5 Modelo de Confianza de Plugins
+
+Los plugins se cargan en el proceso del agente y se ejecutan con todos los privilegios
+del agente: pueden leer las mismas credenciales, llamar a las mismas
+herramientas, registrar los mismos hooks e importar los mismos módulos que
+cualquier cosa incluida en el árbol. El límite para los plugins de terceros es
+la revisión del operador antes de instalar — la misma regla que las habilidades (§2.4),
+mencionada por separado porque los plugins son arquitectónicamente más pesados
+y a menudo incluyen sus propios servicios en segundo plano, oyentes de red
+y dependencias.
+
+Un plugin malicioso o con errores no es una vulnerabilidad en Hermes Agent
+en sí mismo. Los errores en la ruta de instalación o descubrimiento de plugins de Hermes Agent
+que impidan al operador ver lo que está instalando están en alcance bajo el §3.1.
+
+### 2.6 Superficies Externas
+
+Una **superficie externa** es cualquier canal fuera del proceso del agente local
+a través del cual un llamador puede despachar trabajo del agente, resolver
+aprobaciones o recibir salida del agente. Cada superficie tiene su propio
+modelo de autorización, pero las reglas a continuación se aplican uniformemente.
+
+**Superficies en Hermes Agent:**
+
+- **Adaptadores de plataforma del gateway.** Integraciones de mensajería en
+ `gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+ y adaptadores análogos incluidos como plugins.
+- **Superficies HTTP expuestas en red.** El adaptador del servidor API, el
+ plugin del dashboard, los endpoints HTTP del plugin kanban, y cualquier
+ otro plugin que vincule un socket de escucha.
+- **Adaptadores de Editor / IDE.** El adaptador ACP (`acp_adapter/`) e
+ integraciones equivalentes que aceptan solicitudes de un proceso cliente local.
+- **El gateway TUI (`tui_gateway/`).** Backend JSON-RPC para la
+ UI de terminal Ink, alcanzado a través de IPC local.
+
+**Reglas uniformes:**
+
+1. **Se requiere autorización en cada superficie que cruce un límite de confianza.** Para
+ superficies de mensajería y HTTP en red, el límite es la red: la autorización
+ significa una lista de llamadores permitidos configurada por el operador. Para superficies
+ de editor e IPC local (ACP, gateway TUI), el límite es la cuenta de usuario del host:
+ la autorización significa depender del control de acceso a nivel de SO (permisos
+ de archivos, vinculaciones solo a loopback) y no exponer la superficie más allá
+ del usuario local sin una capa de autenticación de red explícita.
+2. **Se requiere una lista de permitidos para cada adaptador de red habilitado.**
+ Los adaptadores deben rechazar despachar trabajo del agente, resolver
+ aprobaciones o transmitir salida hasta que se establezca una lista de permitidos. Las rutas
+ de código que fallan de forma abierta cuando no hay lista de permitidos configurada son errores de código en
+ alcance bajo el §3.1.
+3. **Los identificadores de sesión son manejadores de enrutamiento, no límites de autorización.**
+ Conocer el ID de sesión de otro llamador no otorga acceso a sus aprobaciones o salida;
+ la autorización siempre se vuelve a verificar contra la lista de permitidos (o equivalente
+ a nivel de SO).
+4. **Dentro del conjunto autorizado, todos los llamadores tienen la misma confianza.**
+ Hermes Agent no modela capacidades por llamador dentro de un único adaptador.
+ Los operadores que necesiten separación de capacidades deben ejecutar instancias
+ de agente separadas con listas de permitidos separadas.
+5. **Vincular una superficie solo local a una interfaz no-loopback es una decisión de
+ emergencia del operador (§3.2).** El dashboard y otros servidores HTTP de plugins
+ se vinculan a loopback por defecto; exponerlos a través de `--host 0.0.0.0` o equivalente
+ hace que el fortalecimiento de exposición pública (§4) sea responsabilidad del operador.
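
Las reglas 2 y 3 pueden esbozarse así, suponiendo un adaptador hipotético: sin lista de permitidos configurada se rechaza todo (fallo cerrado), y la autorización se vuelve a comprobar en cada despacho aunque el llamador conozca un ID de sesión válido. `is_authorized` y `dispatch` son nombres ilustrativos, no la API de Hermes Agent.

```python
from typing import Callable, Collection, Optional


def is_authorized(caller_id: str, allowlist: Optional[Collection[str]]) -> bool:
    """Fallo cerrado: sin lista de permitidos (None o vacía) nadie está autorizado."""
    if not allowlist:
        return False
    return caller_id in allowlist


def dispatch(
    caller_id: str,
    session_id: str,
    allowlist: Optional[Collection[str]],
    handler: Callable[[str], str],
) -> str:
    """Despacha trabajo solo tras verificar de nuevo al llamador.

    El ID de sesión se usa únicamente para enrutar; nunca concede acceso.
    """
    if not is_authorized(caller_id, allowlist):
        raise PermissionError(f"llamador no autorizado: {caller_id!r}")
    return handler(session_id)
```

La comprobación depende solo de `caller_id` y de la lista de permitidos, de modo que adivinar o filtrar un `session_id` ajeno no cambia el resultado.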
+
+---
+
+## 3. Alcance
+
+### 3.1 En Alcance
+
+- Escape de una postura de aislamiento a nivel de SO declarada (§2.2): una
+ ruta de código controlada por el atacante alcanzando estado que la postura
+ afirmó confinar.
+- Acceso no autorizado a superficie externa: un llamador fuera del conjunto de
+ autorización configurado (lista de permitidos, o equivalente a nivel de SO
+ para superficies de IPC local) despachando trabajo, recibiendo salida o
+ resolviendo aprobaciones (§2.6).
+- Exfiltración de credenciales: filtración de credenciales del operador o
+ material de autorización de sesión a un destino fuera de la envolvente de
+ confianza, a través de un mecanismo que debería haberlo prevenido
+ (error de saneamiento de entorno, registro del adaptador, error de transporte
+ que vacía credenciales a un upstream, etc.).
+- Violaciones de la documentación del modelo de confianza: código que se comporta
+ contrariamente a lo que esta política, la propia documentación de Hermes Agent o
+ las expectativas razonables del operador predecirían — incluyendo casos donde
+ Hermes Agent ha documentado una postura sobre cómo su salida debe ser
+ renderizada por una capa consumidora (dashboard, adaptador de gateway,
+ escritor de archivos, shell) y una ruta de código rompe esa postura.
+
+### 3.2 Fuera de Alcance
+
+"Fuera de alcance" aquí significa "no es una vulnerabilidad de seguridad bajo esta
+política." No significa "no vale la pena reportarlo." Las mejoras a las
+heurísticas en proceso, ideas de fortalecimiento y correcciones de UX son bienvenidas como
+issues o pull requests regulares — la puerta de aprobación siempre puede detectar
+más patrones, la redacción puede volverse más inteligente y el comportamiento del adaptador
+siempre puede hacerse más estricto. Estos elementos simplemente no pasan por el canal de
+divulgación privada y no reciben avisos.
+
+- **Bypasses de heurísticas en proceso (§2.4)** — bypasses de regex de la puerta de aprobación,
+ bypasses de redacción, bypasses de patrones de Skills Guard, e informes
+ análogos contra heurísticas futuras. Estos componentes no son límites;
+ vencerlos no es una vulnerabilidad bajo esta política.
+- **Inyección de prompts per se.** Hacer que el LLM emita salida inusual
+ — a través de contenido inyectado, alucinación, artefactos de entrenamiento,
+ o cualquier otra causa — no es en sí mismo una vulnerabilidad. "Logré
+ inyección de prompts" sin un resultado encadenado del §3.1 no es un informe
+ procesable bajo esta política.
+- **Consecuencias de una postura de aislamiento elegida.** Los informes de que
+ una ruta de código que opera dentro del alcance de su postura puede hacer lo que esa
+ postura permite no son vulnerabilidades. Ejemplos: herramientas de shell o archivos
+ que alcanzan estado del host bajo el backend local; subprocesos de ejecución de código
+ o MCP que alcanzan estado del host bajo aislamiento de backend de terminal que solo
+ sandboxea el shell; informes cuyas precondiciones requieren acceso de escritura preexistente
+ a archivos de configuración o credenciales propiedad del operador (esos ya están dentro
+ del envolvente de confianza).
+- **Configuraciones documentadas de emergencia.** Compensaciones seleccionadas por el operador
+ que deshabilitan explícitamente protecciones: `--insecure` y flags equivalentes
+ en el dashboard u otros componentes, aprobaciones deshabilitadas,
+ backend local en producción, perfiles de desarrollo que evitan
+ la seguridad de hermes-home, y similares. Los informes contra esas
+ configuraciones no son vulnerabilidades — eso es el trabajo del flag.
+- **Habilidades y plugins contribuidos por la comunidad.** Las habilidades de terceros
+ (incluyendo el repositorio de habilidades de la comunidad) y los plugins de terceros
+ están en la superficie de revisión del operador, no en la superficie de confianza de Hermes Agent
+ (§2.4, §2.5). Una habilidad o plugin que haga algo
+ malicioso es el modo de falla esperado de uno que no fue
+ revisado, no una vulnerabilidad en Hermes Agent. Los errores en la ruta de
+ instalación de habilidades o plugins de Hermes Agent que impidan al
+ operador ver lo que está instalando están en alcance bajo el §3.1.
+- **Exposición pública sin controles externos.** Exponer el
+ gateway o la API a la internet pública sin autenticación,
+ VPN o firewall.
+- **Restricciones de lectura/escritura a nivel de herramienta en una postura donde el shell está
+ permitido.** Si una ruta es alcanzable a través de la herramienta terminal, los informes
+ de que otras herramientas de archivos pueden alcanzarla no añaden nada.
+
+---
+
+## 4. Deployment Hardening
+
+The most important hardening decision is matching isolation
+(§2.2) to the trust level of the content the agent will ingest. Beyond that:
+
+- Run the agent as a non-root user. The provided container image
+ does this by default.
+- Keep credentials in the operator credential file with strict
+ permissions, never in the main configuration, never in version control.
+ Under OpenShell, use the Providers store instead of an
+ on-disk credentials file.
+- Do not expose the gateway or the API to the public internet without
+ a VPN, Tailscale, or firewall protection. Under OpenShell, use the
+ network policy layer to restrict egress.
+- Configure a caller allowlist for every exposed network adapter
+ you enable (§2.6).
+- Review third-party skills and plugins before installing (§2.4,
+ §2.5). For skills, this means reading the Python and the scripts,
+ not just SKILL.md. Skills Guard reports and the installation audit
+ log are the review surface.
+- Hermes Agent includes supply-chain guards for MCP server
+ launches and for dependency / bundled-package changes in CI; see
+ `CONTRIBUTING.es.md` for details.
+
+---
+
+## 5. Disclosure
+
+- **Coordinated disclosure window:** 90 days from the report, or until a
+ fix is published, whichever comes first.
+- **Channel:** the GHSA thread or email correspondence with
+ security@nousresearch.com.
+- **Credit:** reporters are credited in the release notes unless
+ anonymity is requested.
diff --git a/SECURITY.md b/SECURITY.md
index c58e348b579..2579c6eaec5 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -121,10 +121,11 @@ outside the supported security posture.
### 2.3 Credential Scoping
Hermes Agent filters the environment it passes to its lower-trust
-in-process components: shell subprocesses, MCP subprocesses, and
-the code-execution child. Credentials like provider API keys and
-gateway tokens are stripped by default; variables explicitly
-declared by the operator or by a loaded skill are passed through.
+in-process components: shell subprocesses, MCP subprocesses,
+cron job scripts, and the code-execution child. Credentials like
+provider API keys and gateway tokens are stripped by default;
+variables explicitly declared by the operator or by a loaded
+skill are passed through.
This reduces casual exfiltration. It is not containment. Any
component running inside the agent process (skills, plugins, hook
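Reviewer sketch, not code from this change: the "strip by default, pass through what was explicitly declared" behavior described in §2.3 could look roughly like the following. The function name `filter_child_env`, the suffix list, and the `passthrough` parameter are all hypothetical, chosen only for illustration; the actual Hermes Agent filter may apply different rules.

```python
from typing import Iterable, Mapping

# Hypothetical suffixes treated as credential-bearing (an assumption,
# not the list Hermes Agent actually uses).
SENSITIVE_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET")


def filter_child_env(
    env: Mapping[str, str],
    passthrough: Iterable[str] = (),
) -> dict[str, str]:
    """Return a copy of env with credential-like variables removed.

    Names listed in passthrough are kept even if they look sensitive,
    mirroring variables declared by the operator or a loaded skill.
    """
    allowed = set(passthrough)
    return {
        name: value
        for name, value in env.items()
        if name in allowed or not name.upper().endswith(SENSITIVE_SUFFIXES)
    }


if __name__ == "__main__":
    demo = {"PATH": "/usr/bin", "HF_TOKEN": "hf_x", "GROQ_API_KEY": "gk_x"}
    print(filter_child_env(demo, passthrough=["HF_TOKEN"]))
```

As the section itself notes, a filter like this only reduces casual exfiltration; it does not confine code that already runs inside the agent process.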
diff --git a/acp_adapter/session.py b/acp_adapter/session.py
index c124229bec8..bbe34b06789 100644
--- a/acp_adapter/session.py
+++ b/acp_adapter/session.py
@@ -617,6 +617,10 @@ class SessionManager:
_register_task_cwd(session_id, cwd)
agent = AIAgent(**kwargs)
+ # Codex app-server sessions are spawned lazily on the first turn. Stamp
+ # the ACP workspace onto the agent so the Codex runtime starts from the
+ # editor/session cwd instead of the Hermes daemon's process cwd.
+ agent.session_cwd = cwd
# ACP stdio transport requires stdout to remain protocol-only JSON-RPC.
# Route any incidental human-readable agent output to stderr instead.
agent._print_fn = _acp_stderr_print
diff --git a/acp_registry/agent.json b/acp_registry/agent.json
index 4d900075229..aaf14f5f5f2 100644
--- a/acp_registry/agent.json
+++ b/acp_registry/agent.json
@@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
- "version": "0.16.0",
+ "version": "0.17.0",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
- "package": "hermes-agent[acp]==0.16.0",
+ "package": "hermes-agent[acp]==0.17.0",
"args": ["hermes-acp"]
}
}
diff --git a/agent/agent_init.py b/agent/agent_init.py
index 555f930f559..ffefcee5eb7 100644
--- a/agent/agent_init.py
+++ b/agent/agent_init.py
@@ -50,7 +50,7 @@ from agent.tool_guardrails import (
from hermes_cli.config import cfg_get
from hermes_cli.timeouts import get_provider_request_timeout
from hermes_constants import get_hermes_home
-from utils import base_url_host_matches
+from utils import base_url_host_matches, is_truthy_value
# Use the same logger name as run_agent so tests patching ``run_agent.logger``
# capture our warnings. (run_agent.py also does
@@ -265,7 +265,8 @@ def init_agent(
output_config.format instead of a trailing-assistant prefill.
platform (str): The interface platform the user is on (e.g. "cli", "telegram", "discord", "whatsapp").
Used to inject platform-specific formatting hints into the system prompt.
- skip_context_files (bool): If True, skip auto-injection of SOUL.md, AGENTS.md, and .cursorrules
+ skip_context_files (bool): If True, skip auto-injection of project context files
+ (SOUL.md, .hermes.md, AGENTS.md, CLAUDE.md, .cursorrules) from the cwd / HERMES_HOME
into the system prompt. Use this for batch processing and data generation to avoid
polluting trajectories with user-specific persona or project instructions.
load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary
@@ -531,7 +532,14 @@ def init_agent(
agent._last_activity_desc: str = "initializing"
agent._current_tool: str | None = None
agent._api_call_count: int = 0
-
+ # Opt-out flag for the between-turns MCP tool refresh (build_turn_context).
+ # Set on internal forks (e.g. background_review) that must keep ``tools[]``
+ # byte-identical to a parent for provider cache parity.
+ agent._skip_mcp_refresh = False
+ # Registry generation the current tool snapshot was derived from. Lets a
+ # late/concurrent refresh reject a stale (older-generation) rebuild instead
+ # of clobbering a newer one. Set adjacent to the tool snapshot below.
+ agent._tool_snapshot_generation = 0
# Rate limit tracking — updated from x-ratelimit-* response headers
# after each API call. Accessed by /usage slash command.
agent._rate_limit_state: Optional["RateLimitState"] = None
@@ -800,6 +808,8 @@ def init_agent(
# _custom_headers; older/mocked clients may expose
# _default_headers instead.
_routed_headers = getattr(_routed_client, "_custom_headers", None)
+ if not _routed_headers:
+ _routed_headers = getattr(_routed_client, "default_headers", None)
if not _routed_headers:
_routed_headers = getattr(_routed_client, "_default_headers", None)
if _routed_headers:
@@ -853,6 +863,8 @@ def init_agent(
if _provider_timeout is not None:
client_kwargs["timeout"] = _provider_timeout
_fb_headers = getattr(_fb_client, "_custom_headers", None)
+ if not _fb_headers:
+ _fb_headers = getattr(_fb_client, "default_headers", None)
if not _fb_headers:
_fb_headers = getattr(_fb_client, "_default_headers", None)
if _fb_headers:
@@ -953,7 +965,14 @@ def init_agent(
print(f"🔄 Fallback chain ({len(agent._fallback_chain)} providers): " +
" → ".join(f"{f['model']} ({f['provider']})" for f in agent._fallback_chain))
- # Get available tools with filtering
+ # Get available tools with filtering. Capture the registry generation this
+ # snapshot is derived from FIRST, so a later concurrent refresh can tell
+ # whether it holds a newer or staler view (see refresh_agent_mcp_tools).
+ try:
+ from tools.registry import registry as _snapshot_registry
+ agent._tool_snapshot_generation = _snapshot_registry._generation
+ except Exception:
+ agent._tool_snapshot_generation = 0
agent.tools = _ra().get_tool_definitions(
enabled_toolsets=enabled_toolsets,
disabled_toolsets=disabled_toolsets,
@@ -1081,6 +1100,12 @@ def init_agent(
agent._parent_session_id = parent_session_id
agent._last_flushed_db_idx = 0 # tracks DB-write cursor to prevent duplicate writes
agent._session_db_created = False # DB row deferred to run_conversation()
+ # Most agents own their session row and should finalize it on close().
+ # Some temporary helper agents (manual compression / session-hygiene /
+ # background-review forks) rotate or share the session forward to a
+ # continuation row that must remain open after the helper is torn down;
+ # those callers explicitly set this flag to False.
+ agent._end_session_on_close = True
agent._session_init_model_config = {
"max_iterations": agent.max_iterations,
"reasoning_config": reasoning_config,
@@ -1325,6 +1350,14 @@ def init_agent(
compression_abort_on_summary_failure = str(
_compression_cfg.get("abort_on_summary_failure", False)
).lower() in {"true", "1", "yes"}
+ # In-place compaction: when True, compress_context() rewrites the message
+ # list + rebuilds the system prompt WITHOUT rotating the session id (no
+ # parent_session_id chain, no `name #N` renumber). See #38763 and
+ # agent/conversation_compression.py. Consumed by compress_context(), not the
+ # compressor, so it rides on the agent.
+ compression_in_place = is_truthy_value(
+ _compression_cfg.get("in_place"), default=False
+ )
# Read optional explicit context_length override for the auxiliary
# compression model. Custom endpoints often cannot report this via
@@ -1544,6 +1577,7 @@ def init_agent(
abort_on_summary_failure=compression_abort_on_summary_failure,
)
agent.compression_enabled = compression_enabled
+ agent.compression_in_place = compression_in_place
# Reject models whose context window is below the minimum required
# for reliable tool-calling workflows (64K tokens).
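Reviewer sketch, not code from this change: the comments above describe stamping the tool snapshot with the registry generation so that a late or concurrent refresh can reject a stale rebuild. A minimal illustration of that idea, assuming a toy `ToolRegistry`, a toy `Agent`, and a hypothetical `apply_snapshot` helper (none of these are the project's real classes), might be:

```python
import threading


class ToolRegistry:
    """Toy registry whose generation increases on every mutation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._generation = 0
        self._tools: list[str] = []

    def register(self, name: str) -> None:
        with self._lock:
            self._tools.append(name)
            self._generation += 1

    def snapshot(self) -> tuple[int, list[str]]:
        # Read the generation and the tool list under one lock so they agree.
        with self._lock:
            return self._generation, list(self._tools)


class Agent:
    """Toy agent holding a tool list and the generation it was built from."""

    def __init__(self) -> None:
        self.tools: list[str] = []
        self.tool_snapshot_generation = 0


def apply_snapshot(agent: Agent, generation: int, tools: list[str]) -> bool:
    """Install a snapshot unless the agent already holds a newer one."""
    if generation < agent.tool_snapshot_generation:
        return False  # stale rebuild: keep the newer snapshot
    agent.tools = tools
    agent.tool_snapshot_generation = generation
    return True
```

Capturing the generation before building the tool list, as the diff does, errs on the side of under-reporting freshness: a snapshot may be newer than its stamp, but never older.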
diff --git a/agent/agent_runtime_helpers.py b/agent/agent_runtime_helpers.py
index 4a267f95596..92d521b16d8 100644
--- a/agent/agent_runtime_helpers.py
+++ b/agent/agent_runtime_helpers.py
@@ -1050,6 +1050,11 @@ def restore_primary_runtime(agent) -> bool:
agent._fallback_activated = False
agent._fallback_index = 0
+ # Undo the fallback's identity rewrite so the prompt is
+ # byte-identical to the stored copy again (prefix cache match).
+ from agent.chat_completion_helpers import rewrite_prompt_model_identity
+ rewrite_prompt_model_identity(agent, rt["model"], rt["provider"])
+
logger.info(
"Primary runtime restored for new turn: %s (%s)",
agent.model, agent.provider,
@@ -1373,22 +1378,6 @@ def create_openai_client(agent, client_kwargs: dict, *, reason: str, shared: boo
agent._client_log_context(),
)
return client
- if agent.provider == "google-gemini-cli" or str(client_kwargs.get("base_url", "")).startswith("cloudcode-pa://"):
- from agent.gemini_cloudcode_adapter import GeminiCloudCodeClient
-
- # Strip OpenAI-specific kwargs the Gemini client doesn't accept
- safe_kwargs = {
- k: v for k, v in client_kwargs.items()
- if k in {"api_key", "base_url", "default_headers", "project_id", "timeout"}
- }
- client = GeminiCloudCodeClient(**safe_kwargs)
- _ra().logger.info(
- "Gemini Cloud Code Assist client created (%s, shared=%s) %s",
- reason,
- shared,
- agent._client_log_context(),
- )
- return client
if agent.provider == "gemini":
from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
@@ -2182,25 +2171,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
if source_msg.get("role") != "assistant":
return
- # 1. Explicit reasoning_content already set — preserve it verbatim
- # (includes DeepSeek/Kimi's own space-placeholder written at creation
- # time, and any valid reasoning content from the same provider).
+ needs_thinking_pad = agent._needs_thinking_reasoning_pad()
+
+ # 1. Explicit reasoning_content already set.
#
- # Exception: sessions persisted BEFORE #17341 have empty-string
- # placeholders pinned at creation time. DeepSeek V4 Pro rejects
- # those with HTTP 400. When the active provider enforces the
- # thinking-mode echo, upgrade "" → " " on replay so stale history
- # doesn't 400 the user on the next turn.
+ # When the active provider enforces the thinking-mode echo-back
+ # (DeepSeek / Kimi / MiMo), preserve it verbatim — that includes their
+ # own space-placeholder written at creation time and any valid reasoning
+ # from the same provider. Sessions persisted BEFORE #17341 have
+ # empty-string placeholders pinned at creation time; DeepSeek V4 Pro
+ # rejects those with HTTP 400, so upgrade "" → " " on replay.
+ #
+ # When the active provider does NOT enforce echo-back, strip the field
+ # entirely. Strict OpenAI-compatible providers (Mistral, Cerebras, Groq,
+ # SambaNova, …) reject ANY reasoning_content key in input messages with
+ # HTTP 400/422 ("Extra inputs are not permitted"), even an empty string
+ # or a single-space pad. This is the cross-provider fallback case: a
+ # reasoning primary (DeepSeek/Kimi/MiMo) pads history with " ", then a
+ # fallback to a strict provider replays that pad and 422s. Stripping
+ # here covers the rebuild path; reapply_reasoning_echo_for_provider()
+ # covers the already-built api_messages path. Refs #45655.
existing = source_msg.get("reasoning_content")
if isinstance(existing, str):
- if existing == "" and agent._needs_thinking_reasoning_pad():
+ if not needs_thinking_pad:
+ api_msg.pop("reasoning_content", None)
+ elif existing == "":
api_msg["reasoning_content"] = " "
else:
api_msg["reasoning_content"] = existing
return
- needs_thinking_pad = agent._needs_thinking_reasoning_pad()
-
# 2. Cross-provider poisoned history (#15748): on DeepSeek/Kimi,
# if the source turn has tool_calls AND a 'reasoning' field but no
# 'reasoning_content' key, the 'reasoning' text was written by a
@@ -2226,9 +2226,13 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
# for providers that use the internal 'reasoning' key.
# This must happen before the unconditional empty-string fallback so
# genuine reasoning content is not overwritten (#15812 regression in
- # PR #15478).
+ # PR #15478). Only promote for providers that enforce echo-back —
+ # strict providers reject the field (refs #45655).
if isinstance(normalized_reasoning, str) and normalized_reasoning:
- api_msg["reasoning_content"] = normalized_reasoning
+ if needs_thinking_pad:
+ api_msg["reasoning_content"] = normalized_reasoning
+ else:
+ api_msg.pop("reasoning_content", None)
return
# 4. DeepSeek / Kimi thinking mode: all assistant messages need
@@ -2249,34 +2253,53 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
- """Re-pad assistant turns with reasoning_content for the active provider.
+ """Re-pad (or strip) assistant turns' reasoning_content for the active provider.
``api_messages`` is built once, before the retry loop, while the *primary*
- provider is active. If a mid-conversation fallback then switches to a
- require-side provider (DeepSeek / Kimi / MiMo thinking mode), assistant
- turns that were built when the prior provider did NOT need the echo-back go
- out without ``reasoning_content`` and the new provider rejects them with
- HTTP 400 ("The reasoning_content in the thinking mode must be passed back").
+ provider is active. A mid-conversation fallback can then switch providers,
+ so the reasoning fields baked into ``api_messages`` are shaped for the
+ *prior* provider and must be reconciled against the *current* one:
- Calling this immediately before building the request kwargs re-applies the
- pad against the *current* provider. It is idempotent and a no-op unless
- ``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
- is safe to call every iteration and covers every fallback path.
+ * Switching TO a require-side provider (DeepSeek / Kimi / MiMo thinking
+ mode): assistant turns built when the prior provider did NOT need the
+ echo-back go out without ``reasoning_content`` and the new provider
+ rejects them with HTTP 400 ("The reasoning_content in the thinking mode
+ must be passed back"). Re-apply the pad.
- Returns the number of assistant turns that gained reasoning_content.
+ * Switching TO a strict provider that rejects the field (Mistral,
+ Cerebras, Groq, SambaNova, …): assistant turns built under a reasoning
+ primary carry a ``reasoning_content`` pad (often a single space ``" "``),
+ and the strict provider rejects it with HTTP 400/422 ("Extra inputs are
+ not permitted"). Strip the field. This is the exact cross-provider
+ fallback bug from #45655 — a DeepSeek primary pads history with ``" "``,
+ the request falls back to Mistral, and Mistral 422s on the stale pad.
+
+ Calling this immediately before building the request kwargs reconciles the
+ fields against the *current* provider. It is idempotent and safe to call
+ every iteration; it covers every fallback path.
+
+ Returns the number of assistant turns whose reasoning_content was added or
+ removed.
"""
- if not agent._needs_thinking_reasoning_pad():
- return 0
- padded = 0
+ needs_pad = agent._needs_thinking_reasoning_pad()
+ changed = 0
for api_msg in api_messages:
if api_msg.get("role") != "assistant":
continue
- if api_msg.get("reasoning_content"):
- continue
- copy_reasoning_content_for_api(agent, api_msg, api_msg)
- if api_msg.get("reasoning_content"):
- padded += 1
- return padded
+ if needs_pad:
+ if api_msg.get("reasoning_content"):
+ continue
+ copy_reasoning_content_for_api(agent, api_msg, api_msg)
+ if api_msg.get("reasoning_content"):
+ changed += 1
+ else:
+ # Strict provider — strip any stale reasoning_content pad left
+ # over from a reasoning primary so the fallback request doesn't
+ # 400/422 on it.
+ if "reasoning_content" in api_msg:
+ api_msg.pop("reasoning_content", None)
+ changed += 1
+ return changed
def _iter_pool_sockets(client: Any):
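Reviewer sketch, not code from this change: the two-way reconciliation in `reapply_reasoning_echo_for_provider` (pad when the active provider requires the echo-back, strip when it rejects the field) can be illustrated in isolation. The helper name `reconcile_reasoning_content` and the boolean `needs_pad` argument are hypothetical. Unlike the real function, which delegates to `copy_reasoning_content_for_api`, this simplified version always pads with a single space.

```python
def reconcile_reasoning_content(api_messages: list[dict], needs_pad: bool) -> int:
    """Pad or strip reasoning_content on assistant turns, in place.

    Returns how many assistant messages were changed, so a second call
    with the same arguments returns 0 (idempotent).
    """
    changed = 0
    for msg in api_messages:
        if msg.get("role") != "assistant":
            continue
        if needs_pad:
            # Require-side provider: every assistant turn must carry the field.
            if not msg.get("reasoning_content"):
                msg["reasoning_content"] = " "
                changed += 1
        elif "reasoning_content" in msg:
            # Strict provider: any reasoning_content key is rejected, even " ".
            del msg["reasoning_content"]
            changed += 1
    return changed


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello", "reasoning_content": " "},
    ]
    # Simulate a fallback from a reasoning primary to a strict provider.
    print(reconcile_reasoning_content(history, needs_pad=False), history)
```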
diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py
index 4a586d7f0fd..03e8b58e16c 100644
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@@ -2535,3 +2535,56 @@ def sanitize_anthropic_kwargs(api_kwargs: Any, *, log_prefix: str = "") -> Any:
sorted(leaked),
)
return api_kwargs
+
+
+def _is_stream_unavailable_error(exc: Exception) -> bool:
+ """Return True when an Anthropic stream call should fall back to create()."""
+ err_lower = str(exc).lower()
+ if "stream" in err_lower and "not supported" in err_lower:
+ return True
+ if "invokemodelwithresponsestream" in err_lower:
+ from agent.bedrock_adapter import is_streaming_access_denied_error
+
+ return is_streaming_access_denied_error(exc)
+ return False
+
+
+def create_anthropic_message(
+ client: Any,
+ api_kwargs: dict,
+ *,
+ log_prefix: str = "",
+ prefer_stream: bool = True,
+) -> Any:
+ """Create an Anthropic message, aggregating via stream when available.
+
+ Some Anthropic-compatible gateways are SSE-only: they ignore non-streaming
+ requests and return ``text/event-stream`` even for ``messages.create()``.
+ The SDK can surface that as raw text, so callers that expect a Message then
+ crash on ``.content``. Prefer ``messages.stream().get_final_message()`` to
+ match the main turn path, falling back to ``create()`` only for providers
+ that explicitly do not support streaming, such as restricted Bedrock roles.
+ """
+ sanitize_anthropic_kwargs(api_kwargs, log_prefix=log_prefix)
+
+ messages_api = getattr(client, "messages", None)
+ stream_fn = getattr(messages_api, "stream", None)
+ if prefer_stream and callable(stream_fn):
+ stream_kwargs = dict(api_kwargs)
+ stream_kwargs.pop("stream", None)
+ try:
+ with stream_fn(**stream_kwargs) as stream:
+ return stream.get_final_message()
+ except Exception as exc:
+ if not _is_stream_unavailable_error(exc):
+ raise
+ logger.debug(
+ "%sAnthropic Messages stream unavailable; falling back to "
+ "messages.create(): %s",
+ log_prefix,
+ exc,
+ )
+
+ create_kwargs = dict(api_kwargs)
+ create_kwargs.pop("stream", None)
+ return messages_api.create(**create_kwargs)
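Reviewer sketch, not code from this change: the stream-first, create-on-unsupported control flow of `create_anthropic_message` can be exercised without a network by using a stand-in client. `FakeMessages`, `is_stream_unavailable`, and `create_message` below are hypothetical names; the fake only mimics the shape of an SDK `messages` object and is not the Anthropic SDK.

```python
class FakeMessages:
    """Stand-in for client.messages on a backend that rejects streaming."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def stream(self, **kwargs):
        self.calls.append("stream")
        raise RuntimeError("Streaming is not supported on this endpoint")

    def create(self, **kwargs):
        self.calls.append("create")
        return {"content": "ok", "kwargs": kwargs}


def is_stream_unavailable(exc: Exception) -> bool:
    """Simplified check: only the 'stream ... not supported' message shape."""
    text = str(exc).lower()
    return "stream" in text and "not supported" in text


def create_message(messages_api, api_kwargs: dict, prefer_stream: bool = True):
    """Try stream() first; fall back to create() only when streaming is unavailable."""
    kwargs = dict(api_kwargs)
    kwargs.pop("stream", None)  # neither call path takes an explicit stream flag here
    stream_fn = getattr(messages_api, "stream", None)
    if prefer_stream and callable(stream_fn):
        try:
            with stream_fn(**kwargs) as stream:
                return stream.get_final_message()
        except Exception as exc:
            if not is_stream_unavailable(exc):
                raise  # unrelated failures must surface, not be masked
    return messages_api.create(**kwargs)
```

Re-raising everything except the "streaming unavailable" case keeps genuine errors (auth, rate limits, bad requests) from being silently retried through a second code path.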
diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py
index 86a1c765a78..0afb0add20b 100644
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -40,6 +40,7 @@ Payment / credit exhaustion fallback:
their OpenRouter balance but has Codex OAuth or another provider available.
"""
+import contextlib
import json
import logging
import os
@@ -102,11 +103,44 @@ OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
-from utils import base_url_host_matches, base_url_hostname, model_forces_max_completion_tokens, normalize_proxy_env_vars
+from utils import base_url_host_matches, base_url_hostname, env_float, model_forces_max_completion_tokens, normalize_proxy_env_vars
logger = logging.getLogger(__name__)
+# ── Interrupt protection for atomic auxiliary tasks ──────────────────────
+# Some auxiliary tasks must NOT be aborted mid-flight by a gateway interrupt
+# (e.g. an incoming user message while the agent is busy). Context
+# compression is the prime case: if the summary LLM call is interrupted
+# part-way, compression falls back to a static "summary unavailable" marker
+# and the real handoff is lost (#23975). A thread-local flag lets such a
+# task mark its in-flight LLM call as interrupt-protected; the Codex
+# Responses stream's cancellation check honors it. TIMEOUTS still fire
+# (a hung call must die), and all OTHER aux tasks (vision, web_extract,
+# title_generation, …) remain freely interruptible.
+_aux_interrupt_protection = threading.local()
+
+
+def _aux_interrupt_protected() -> bool:
+ return bool(getattr(_aux_interrupt_protection, "active", False))
+
+
+@contextlib.contextmanager
+def aux_interrupt_protection(active: bool = True):
+ """Mark the current thread's auxiliary LLM call as interrupt-protected.
+
+ Used by atomic aux tasks (compression) so a mid-flight gateway interrupt
+ doesn't abort the call and trigger a degraded fallback. Re-entrant-safe:
+ restores the previous value on exit.
+ """
+ prev = getattr(_aux_interrupt_protection, "active", False)
+ _aux_interrupt_protection.active = active
+ try:
+ yield
+ finally:
+ _aux_interrupt_protection.active = prev
+
+
def _safe_isinstance(obj: Any, maybe_type: Any) -> bool:
"""Return False instead of raising when a patched symbol is not a type."""
try:
@@ -631,6 +665,13 @@ def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
return str(url or "").strip().rstrip("/")
+def _nous_min_key_ttl_seconds() -> int:
+ try:
+ return max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
+ except (TypeError, ValueError):
+ return 1800
+
+
# ── Codex Responses → chat.completions adapter ─────────────────────────────
# All auxiliary consumers call client.chat.completions.create(**kwargs) and
# read response.choices[0].message.content. This adapter translates those
@@ -805,7 +846,11 @@ class _CodexCompletionsAdapter:
raise TimeoutError(_timeout_message())
try:
from tools.interrupt import is_interrupted
- if is_interrupted():
+ # Honor interrupt protection for atomic aux tasks (compression):
+ # a mid-flight gateway interrupt must NOT abort the summary call
+ # and trigger a degraded fallback marker (#23975). Timeouts above
+ # still fire; other aux tasks remain interruptible.
+ if is_interrupted() and not _aux_interrupt_protected():
raise InterruptedError("Codex auxiliary Responses stream interrupted")
except InterruptedError:
raise
@@ -997,7 +1042,7 @@ class _AnthropicCompletionsAdapter:
self._is_oauth = is_oauth
def create(self, **kwargs) -> Any:
- from agent.anthropic_adapter import build_anthropic_kwargs
+ from agent.anthropic_adapter import build_anthropic_kwargs, create_anthropic_message
from agent.transports import get_transport
messages = kwargs.get("messages", [])
@@ -1041,7 +1086,7 @@ class _AnthropicCompletionsAdapter:
if not _forbids_sampling_params(model):
anthropic_kwargs["temperature"] = temperature
- response = self._client.messages.create(**anthropic_kwargs)
+ response = create_anthropic_message(self._client, anthropic_kwargs)
_transport = get_transport("anthropic_messages")
_nr = _transport.normalize_response(
response, strip_tool_prefix=self._is_oauth
@@ -1300,6 +1345,57 @@ def _nous_base_url() -> str:
return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)
+def _resolve_nous_pool_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
+ """Resolve Nous auxiliary credentials from the selected pool entry."""
+ try:
+ from hermes_cli.auth import _agent_key_is_usable
+
+ pool = load_pool("nous")
+ except Exception as exc:
+ logger.debug("Auxiliary Nous pool credential resolution failed: %s", exc)
+ return None
+
+ if not pool or not pool.has_credentials():
+ return None
+
+ try:
+ entry = pool.select()
+ except Exception as exc:
+ logger.debug("Auxiliary Nous pool selection failed: %s", exc)
+ return None
+
+ if entry is None:
+ return None
+
+ state = {
+ "agent_key": getattr(entry, "agent_key", None),
+ "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+ "scope": getattr(entry, "scope", None),
+ }
+ if force_refresh or not _agent_key_is_usable(state, _nous_min_key_ttl_seconds()):
+ try:
+ refreshed = pool.try_refresh_current()
+ except Exception as exc:
+ logger.debug("Auxiliary Nous pool refresh failed: %s", exc)
+ refreshed = None
+ if refreshed is None:
+ return None
+ entry = refreshed
+
+ provider = {
+ "agent_key": getattr(entry, "agent_key", None),
+ "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+ "access_token": getattr(entry, "access_token", None),
+ "expires_at": getattr(entry, "expires_at", None),
+ "scope": getattr(entry, "scope", None),
+ }
+ api_key = _nous_api_key(provider)
+ base_url = _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL)
+ if not api_key or not base_url:
+ return None
+ return api_key, base_url
+
+
def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
"""Return fresh Nous runtime credentials when available.
@@ -1308,11 +1404,15 @@ def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[
relying only on whatever raw tokens happen to be sitting in auth.json
or the credential pool.
"""
+ pooled = _resolve_nous_pool_runtime_api(force_refresh=force_refresh)
+ if pooled is not None:
+ return pooled
+
try:
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
- timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
+ timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
force_refresh=force_refresh,
)
except Exception as exc:
@@ -2905,7 +3005,7 @@ def _refresh_provider_credentials(provider: str) -> bool:
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
- timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
+ timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
force_refresh=True,
)
if not str(creds.get("api_key", "") or "").strip():
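Reviewer sketch, not code from this change: the thread-local, restore-on-exit pattern behind `aux_interrupt_protection` can be tried standalone. The names `_protection`, `is_protected`, `interrupt_protection`, and `should_abort` are illustrative stand-ins, and `should_abort` is a hypothetical simplification of the cancellation check in the Codex adapter.

```python
import contextlib
import threading

_protection = threading.local()


def is_protected() -> bool:
    """True when the current thread is inside an active protection block."""
    return bool(getattr(_protection, "active", False))


@contextlib.contextmanager
def interrupt_protection(active: bool = True):
    """Set the per-thread flag and restore the previous value on exit."""
    prev = getattr(_protection, "active", False)
    _protection.active = active
    try:
        yield
    finally:
        _protection.active = prev


def should_abort(interrupted: bool) -> bool:
    """An interrupt aborts the call only when the thread is not protected."""
    return interrupted and not is_protected()


if __name__ == "__main__":
    with interrupt_protection():
        print(should_abort(True))   # protected: the interrupt is ignored
    print(should_abort(True))       # unprotected again: the interrupt aborts
```

Because the flag lives in `threading.local()`, protecting a compression call on one thread leaves auxiliary calls on other threads interruptible, and saving the previous value makes nested blocks behave correctly.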
diff --git a/agent/background_review.py b/agent/background_review.py
index ee4791d98d3..fa4de508e19 100644
--- a/agent/background_review.py
+++ b/agent/background_review.py
@@ -535,6 +535,13 @@ def _run_review_in_thread(
)
review_agent._memory_write_origin = "background_review"
review_agent._memory_write_context = "background_review"
+ # The review fork pins the parent's cached system prompt and keeps
+ # ``tools[]`` byte-identical to the parent so its outbound request
+ # hits the same provider cache prefix (see the toolset-parity note
+ # above). The between-turns MCP refresh in build_turn_context would
+ # add late-connecting MCP tools to this fork and break that parity,
+ # so opt the review fork out of it.
+ review_agent._skip_mcp_refresh = True
review_agent._memory_store = agent._memory_store
review_agent._memory_enabled = agent._memory_enabled
review_agent._user_profile_enabled = agent._user_profile_enabled
@@ -568,6 +575,13 @@ def _run_review_in_thread(
# if a future code path bypasses the cache.
review_agent.session_start = agent.session_start
review_agent.session_id = agent.session_id
+ # The fork shares the parent's live session_id (pinned above for
+ # prefix-cache parity). It is single-lifecycle and calls close()
+ # right after this run_conversation(); without opting out, close()
+ # would finalize the parent's still-active session row
+ # mid-conversation (the review fires every ~10 turns). Leave session
+ # finalization to the real owner (CLI close / gateway reset / cron).
+ review_agent._end_session_on_close = False
# Never let the review fork compress. It shares the parent's
# session_id, so if it won a compression race it would rotate the
# parent into a NEW child that the gateway never adopts (the fork
diff --git a/agent/chat_completion_helpers.py b/agent/chat_completion_helpers.py
index 1ee1702b45e..cee392caaba 100644
--- a/agent/chat_completion_helpers.py
+++ b/agent/chat_completion_helpers.py
@@ -34,7 +34,7 @@ from agent.message_sanitization import (
_repair_tool_call_arguments,
)
from tools.terminal_tool import is_persistent_env
-from utils import base_url_host_matches, base_url_hostname, env_int
+from utils import base_url_host_matches, base_url_hostname, env_float, env_int
logger = logging.getLogger(__name__)
@@ -1042,6 +1042,35 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
+def rewrite_prompt_model_identity(agent, model: str, provider: str) -> None:
+ """Point the cached system prompt's ``Model:``/``Provider:`` lines at
+ the active runtime after a provider switch.
+
+ The system prompt is session-stable and replayed verbatim for prefix-cache
+ warmth, but after a failover the new backend's cache is cold anyway —
+ while a stale identity line makes the agent misreport which model it is
+ when asked. Rewrite the lines in place WITHOUT persisting to the session
+ DB: the stored row keeps the primary's labels, so when the primary is
+ restored the prompt is byte-identical to the stored copy again and its
+ prefix cache still matches.
+
+ Only the LAST occurrence of each line is touched — the identity lines
+ live in the volatile tail of the prompt, and earlier matches could be
+ user content (memory snapshots, context files).
+ """
+ sp = getattr(agent, "_cached_system_prompt", None)
+ if not isinstance(sp, str) or not sp:
+ return
+ for label, value in (("Model", model), ("Provider", provider)):
+ if not value:
+ continue
+ matches = list(re.finditer(rf"(?m)^{label}: .*$", sp))
+ if matches:
+ last = matches[-1]
+ sp = f"{sp[:last.start()]}{label}: {value}{sp[last.end():]}"
+ agent._cached_system_prompt = sp
+
+
def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool:
"""Switch to the next fallback model/provider in the chain.
@@ -1287,6 +1316,10 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
api_mode=agent.api_mode,
)
+ # Keep the prompt's self-identity in sync with the model actually
+ # answering, so "what model are you?" doesn't report the primary.
+ rewrite_prompt_model_identity(agent, fb_model, fb_provider)
+
agent._buffer_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
@@ -1761,14 +1794,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_base_timeout = (
_provider_timeout_cfg
if _provider_timeout_cfg is not None
- else float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+ else env_float("HERMES_API_TIMEOUT", 1800.0)
)
# Read timeout: config wins here too. Otherwise use
# HERMES_STREAM_READ_TIMEOUT (default 120s) for cloud providers.
if _provider_timeout_cfg is not None:
_stream_read_timeout = _provider_timeout_cfg
else:
- _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+ _stream_read_timeout = env_float("HERMES_STREAM_READ_TIMEOUT", 120.0)
# Local providers (Ollama, llama.cpp, vLLM) can take minutes for
# prefill on large contexts before producing the first token.
# Auto-increase the httpx read timeout unless the user explicitly
@@ -2508,7 +2541,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
if _cfg_stale is not None:
_stream_stale_timeout_base = _cfg_stale
else:
- _stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
+ _stream_stale_timeout_base = env_float("HERMES_STREAM_STALE_TIMEOUT", 180.0)
# Local providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
# for prefill on large contexts. Disable the stale detector unless
# the user explicitly set HERMES_STREAM_STALE_TIMEOUT.
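Reviewer note on `rewrite_prompt_model_identity`: the "last occurrence only" rule in its docstring can be sketched in isolation as below. This is a minimal standalone illustration, not the code in the patch. The helper name `replace_last_labeled_line` and the sample prompt are hypothetical, and `re.escape` is an extra precaution that the patch itself does not apply (its labels are fixed literals).

```python
import re


def replace_last_labeled_line(text: str, label: str, value: str) -> str:
    """Replace only the LAST ``<label>: ...`` line in ``text`` (hypothetical helper)."""
    # (?m) makes ^ and $ match at line boundaries, so each hit is one full line.
    matches = list(re.finditer(rf"(?m)^{re.escape(label)}: .*$", text))
    if not matches:
        return text  # nothing to rewrite; leave the prompt untouched
    last = matches[-1]
    return f"{text[:last.start()]}{label}: {value}{text[last.end():]}"


# Hypothetical prompt: an earlier "Model:" line stands in for user content
# (for example a memory snapshot) that must not be rewritten.
prompt = (
    "Notes\n"
    "Model: quoted-in-memory\n"
    "\n"
    "Model: primary-model\n"
    "Provider: primary\n"
)
updated = replace_last_labeled_line(prompt, "Model", "fallback-model")
updated = replace_last_labeled_line(updated, "Provider", "fallback")
print(updated)
```

Only the trailing identity lines change; the earlier `Model:` line that mimics user content is left as it was.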
diff --git a/agent/codex_runtime.py b/agent/codex_runtime.py
index 7f175fff97f..e638a194159 100644
--- a/agent/codex_runtime.py
+++ b/agent/codex_runtime.py
@@ -25,6 +25,61 @@ from typing import Any, Dict, List
logger = logging.getLogger(__name__)
+def _codex_note_to_tool_progress(note: dict) -> tuple[str, str, dict] | None:
+ """Map a Codex app-server ``item/started`` notification to a Hermes
+ tool-progress event ``(tool_name, preview, args)``.
+
+ The Codex app-server runtime processes ``item/started`` notifications for
+ command execution, file changes, and MCP/dynamic tool calls, but previously
+ never surfaced them as Hermes tool-progress events, so gateways (Telegram,
+ etc.) showed no verbose "running X" breadcrumbs on this route while every
+ other provider did (#38835). Returns None for items that aren't tool-shaped.
+ """
+ if not isinstance(note, dict) or note.get("method") != "item/started":
+ return None
+ params = note.get("params") or {}
+ item = params.get("item") or {}
+ if not isinstance(item, dict):
+ return None
+
+ item_type = item.get("type") or ""
+ if item_type == "commandExecution":
+ command = item.get("command") or ""
+ return "exec_command", command, {"command": command, "cwd": item.get("cwd") or ""}
+
+ if item_type == "fileChange":
+ changes = item.get("changes") or []
+ preview = "file changes"
+ if isinstance(changes, list) and changes:
+ paths = [
+ str(change.get("path"))
+ for change in changes
+ if isinstance(change, dict) and change.get("path")
+ ]
+ if paths:
+ preview = ", ".join(paths[:3])
+ if len(paths) > 3:
+ preview += f", +{len(paths) - 3} more"
+ return "apply_patch", preview, {"changes": changes}
+
+ if item_type == "mcpToolCall":
+ server = item.get("server") or "mcp"
+ tool = item.get("tool") or "unknown"
+ args = item.get("arguments") or {}
+ if not isinstance(args, dict):
+ args = {"arguments": args}
+ return f"mcp.{server}.{tool}", tool, args
+
+ if item_type == "dynamicToolCall":
+ tool = item.get("tool") or "unknown"
+ args = item.get("arguments") or {}
+ if not isinstance(args, dict):
+ args = {"arguments": args}
+ return tool, tool, args
+
+ return None
+
+
def _coerce_usage_int(value: Any) -> int:
if isinstance(value, bool):
return 0
@@ -195,7 +250,9 @@ def run_codex_app_server_turn(
# Spawned on first turn, reused across turns, closed at AIAgent
# shutdown (see _cleanup hook).
if not hasattr(agent, "_codex_session") or agent._codex_session is None:
- cwd = getattr(agent, "session_cwd", None) or os.getcwd()
+ from agent.runtime_cwd import resolve_agent_cwd
+
+ cwd = getattr(agent, "session_cwd", None) or str(resolve_agent_cwd())
# Approval callback: defer to Hermes' standard prompt flow if a
# CLI thread has installed one. Gateway / cron contexts get the
# codex-side fail-closed default.
@@ -204,9 +261,27 @@ def run_codex_app_server_turn(
approval_callback = _get_approval_callback()
except Exception:
approval_callback = None
+
+ def _on_codex_event(note: dict) -> None:
+ # Bridge Codex app-server item/started notifications to Hermes
+ # tool-progress so gateways show verbose "running X" breadcrumbs
+ # on this route too (#38835).
+ progress_callback = getattr(agent, "tool_progress_callback", None)
+ if progress_callback is None:
+ return
+ mapped = _codex_note_to_tool_progress(note)
+ if mapped is None:
+ return
+ tool_name, preview, args = mapped
+ try:
+ progress_callback("tool.started", tool_name, preview, args)
+ except Exception:
+ logger.debug("codex tool-progress callback raised", exc_info=True)
+
agent._codex_session = CodexAppServerSession(
cwd=cwd,
approval_callback=approval_callback,
+ on_event=_on_codex_event,
)
# NOTE: the user message is ALREADY appended to messages by the
@@ -290,6 +365,7 @@ def run_codex_app_server_turn(
original_user_message=original_user_message,
final_response=turn.final_text,
interrupted=False,
+ messages=messages,
)
except Exception:
logger.debug("external memory sync raised", exc_info=True)
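Reviewer note on the `fileChange` branch of `_codex_note_to_tool_progress`: the preview shows at most three paths and summarizes the rest. The sketch below reproduces that truncation rule on its own. The function name `preview_paths` and the `limit` parameter are hypothetical; the patch hard-codes 3 inline.

```python
def preview_paths(paths: list[str], limit: int = 3) -> str:
    """Build a short preview of changed paths (hypothetical standalone helper)."""
    if not paths:
        # Same fallback text the patch uses when no usable paths are present.
        return "file changes"
    preview = ", ".join(paths[:limit])
    if len(paths) > limit:
        preview += f", +{len(paths) - limit} more"
    return preview


print(preview_paths(["a.py", "b.py"]))
print(preview_paths(["a.py", "b.py", "c.py", "d.py", "e.py"]))
```

With five paths the preview lists the first three and appends a count of the remaining two, which keeps gateway breadcrumbs short for large patches.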
diff --git a/agent/context_compressor.py b/agent/context_compressor.py
index 16db1bedc30..19bc0e5f0f1 100644
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -23,7 +23,7 @@ import re
import time
from typing import Any, Dict, List, Optional
-from agent.auxiliary_client import call_llm, _is_connection_error
+from agent.auxiliary_client import call_llm, _is_connection_error, aux_interrupt_protection
from agent.context_engine import ContextEngine
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
@@ -656,9 +656,8 @@ class ContextCompressor(ContextEngine):
self.provider = provider
self.api_mode = api_mode
self.context_length = context_length
- self.threshold_tokens = max(
- int(context_length * self.threshold_percent),
- MINIMUM_CONTEXT_LENGTH,
+ self.threshold_tokens = self._compute_threshold_tokens(
+ context_length, self.threshold_percent
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
@@ -668,6 +667,62 @@ class ContextCompressor(ContextEngine):
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
+ # Reset cross-call calibration state captured under the PREVIOUS model.
+ # These fields encode "the provider proved this prompt fit" / "preflight
+ # can be deferred" decisions that are only valid for the model that
+ # produced them. Carrying them across a switch to a smaller-context
+ # model would let should_defer_preflight_to_real_usage() suppress a
+ # preflight compression the new model actually needs — the exact
+ # oversized-send-after-switch failure in #23767. The new model's first
+ # response repopulates them via update_from_response(). Setting
+ # last_prompt_tokens to 0 (NOT -1) is deliberate: 0 is the documented
+ # "no real usage yet -> use the rough estimate" state, so the post-
+ # response should_compress path falls back to estimate_request_tokens_rough
+ # rather than skipping compression. -1 is a different sentinel
+ # (#36718, "compression just ran, await real usage") and must not be set here.
+ self.last_prompt_tokens = 0
+ self.last_completion_tokens = 0
+ self.last_total_tokens = 0
+ self.last_real_prompt_tokens = 0
+ self.last_rough_tokens_when_real_prompt_fit = 0
+ self.last_compression_rough_tokens = 0
+ self.awaiting_real_usage_after_compression = False
+ self._ineffective_compression_count = 0
+
+ # When the MINIMUM_CONTEXT_LENGTH floor meets/exceeds a small context
+ # window, compacting at the percentage (50% → 32K of a 64K window) wastes
+ # half the usable context. Trigger near the top of the window instead so a
+ # minimum-context model uses most of its budget before compacting — same
+ # rationale as the gpt-5.5/Codex 85% autoraise.
+ _MIN_CTX_TRIGGER_RATIO = 0.85
+
+ @staticmethod
+ def _compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
+ """Compute the compaction trigger threshold in tokens.
+
+ The base value is ``context_length * threshold_percent``, floored at
+ ``MINIMUM_CONTEXT_LENGTH`` so large-context models don't compress
+ prematurely at 50%. BUT that floor degenerates at small windows: for a
+ model whose ``context_length`` is at/below the minimum (e.g. a 64K
+ local model), ``max(0.5*64000, 64000) == 64000`` makes the threshold
+ equal the ENTIRE window — auto-compression can never fire because the
+ provider rejects the request before usage reaches 100% (#14690).
+
+ When the floor would meet or exceed the context window, trigger at
+ ``_MIN_CTX_TRIGGER_RATIO`` (85%) of the window — high enough that a
+ small model uses most of its context before compacting, but below
+ 100% so compaction fires before the provider rejects the request.
+ """
+ pct_value = int(context_length * threshold_percent)
+ floored = max(pct_value, MINIMUM_CONTEXT_LENGTH)
+ # If flooring pushed the threshold to or past the window, it can never be
+ # reached. Trigger at 85% of the window so a minimum-context model
+ # uses most of its budget before compacting instead of wasting half.
+ if context_length > 0 and floored >= context_length:
+ return max(1, min(int(context_length * ContextCompressor._MIN_CTX_TRIGGER_RATIO),
+ context_length - 1))
+ return floored
+
def __init__(
self,
model: str,
@@ -708,10 +763,11 @@ class ContextCompressor(ContextEngine):
# Floor: never compress below MINIMUM_CONTEXT_LENGTH tokens even if
# the percentage would suggest a lower value. This prevents premature
# compression on large-context models at 50% while keeping the % sane
- # for models right at the minimum.
- self.threshold_tokens = max(
- int(self.context_length * threshold_percent),
- MINIMUM_CONTEXT_LENGTH,
+ # for models right at the minimum. _compute_threshold_tokens also
+ # guards the degenerate case where the floor would equal/exceed the
+ # window (small models), so auto-compression can still fire (#14690).
+ self.threshold_tokens = self._compute_threshold_tokens(
+ self.context_length, threshold_percent
)
self.compression_count = 0
@@ -761,6 +817,14 @@ class ContextCompressor(ContextEngine):
# this flag to know "compression was attempted but aborted, freeze
# the chat until the user manually retries via /compress".
self._last_compress_aborted: bool = False
+ # Set True when the summary call failed with an authentication /
+ # permission error (HTTP 401/403). Auth failures are non-recoverable
+ # at the request level — the credential or endpoint is broken — so
+ # compress() must ABORT (preserve the session unchanged) rather than
+ # rotate into a degraded child session with a placeholder summary.
+ # This is independent of the abort_on_summary_failure config flag:
+ # rotating on a broken credential is never the right behavior.
+ self._last_summary_auth_failure: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
@@ -1245,7 +1309,10 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
Unknown from deterministic fallback. Inspect current repository/session state if needed.
{HISTORICAL_IN_PROGRESS_HEADING}
-{active_task}
+Unknown from deterministic fallback — the latest user ask is recorded once under
+"{HISTORICAL_TASK_HEADING}" above as historical context only. Do NOT treat it as an
+unfulfilled instruction to re-answer; verify current state and continue from the
+protected recent messages after this summary.
## Blocked
{_bullets(blockers, limit=5)}
@@ -1257,7 +1324,9 @@ None recoverable from deterministic fallback.
None recoverable from deterministic fallback.
{HISTORICAL_PENDING_ASKS_HEADING}
-{active_task}
+None recoverable from deterministic fallback. (The latest user ask is preserved once
+under "{HISTORICAL_TASK_HEADING}" as historical context — it is NOT necessarily
+outstanding.)
## Relevant Files
{_bullets(relevant_files, limit=12)}
@@ -1511,11 +1580,33 @@ This compaction should PRIORITISE preserving all information related to the focu
}
if self.summary_model:
call_kwargs["model"] = self.summary_model
- response = call_llm(**call_kwargs)
+ # Compression is atomic: protect the in-flight summary call from a
+ # mid-turn gateway interrupt. Without this, an incoming user message
+ # aborts the summary and compression falls back to a degraded static
+ # marker, losing the real handoff (#23975). Re-entrant: a main-model
+ # retry (_generate_summary recursion) re-enters harmlessly.
+ with aux_interrupt_protection():
+ response = call_llm(**call_kwargs)
content = response.choices[0].message.content
# Handle cases where content is not a string (e.g., dict from llama.cpp)
if not isinstance(content, str):
content = str(content) if content else ""
+ # Some OpenAI-compatible proxies (e.g. cmkey.cn, one-api channels)
+ # return a well-formed HTTP 200 with an empty or whitespace-only
+ # ``content`` instead of an error or empty ``choices``. That payload
+ # passes ``_validate_llm_response`` (a ``message`` exists), so it
+ # reaches here and would otherwise be stored as a prefix-only
+ # summary with no body — silently wiping the compacted turns and
+ # making the model forget the in-progress task (#11978, #11914).
+ # Treat empty content as a failure so it routes through the same
+ # main-model fallback + cooldown machinery as a transport error,
+ # rather than replacing real context with an empty summary.
+ if not content.strip():
+ raise RuntimeError(
+ "Context compression LLM returned empty content "
+ f"(provider={self.provider or 'auto'} "
+ f"model={self.summary_model or self.model})"
+ )
# Redact the summary output as well — the summarizer LLM may
# ignore prompt instructions and echo back secrets verbatim.
summary = redact_sensitive_text(content.strip())
@@ -1524,17 +1615,29 @@ This compaction should PRIORITISE preserving all information related to the focu
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
+ self._last_summary_auth_failure = False
return self._with_summary_prefix(summary)
- except RuntimeError:
- # No provider configured — long cooldown, unlikely to self-resolve
- self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
- self._last_summary_error = "no auxiliary LLM provider configured"
- logger.warning("Context compression: no provider available for "
- "summary. Middle turns will be dropped without summary "
- "for %d seconds.",
- _SUMMARY_FAILURE_COOLDOWN_SECONDS)
- return None
except Exception as e:
+ # ``call_llm`` raises ``RuntimeError`` for two very different cases:
+ # 1. No provider configured ("No LLM provider configured ...") —
+ # a permanent misconfiguration, long cooldown is correct.
+ # 2. An empty/invalid response from a configured provider
+ # (``_validate_llm_response`` empty-``choices``/``None``, or our
+ # empty-``content`` guard above) — a transient/proxy fault that
+ # should fall back to the main model first, exactly like the
+ # transport errors handled below.
+ # Only (1) belongs in the long no-provider cooldown; (2) and every
+ # other exception flow into the generic fallback logic so they get
+ # a main-model retry before any cooldown. (#11978, #11914)
+ if isinstance(e, RuntimeError) and "no llm provider configured" in str(e).lower():
+ # No provider configured — long cooldown, unlikely to self-resolve
+ self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
+ self._last_summary_error = "no auxiliary LLM provider configured"
+ logger.warning("Context compression: no provider available for "
+ "summary. Middle turns will be dropped without summary "
+ "for %d seconds.",
+ _SUMMARY_FAILURE_COOLDOWN_SECONDS)
+ return None
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
@@ -1571,6 +1674,26 @@ This compaction should PRIORITISE preserving all information related to the focu
# back to the main model instead of entering a 60-second cooldown.
# See issue #18458.
_is_streaming_closed = _is_connection_error(e)
+ # Authentication / permission failures (401/403) are NOT transient
+ # and NOT fixable by retrying the same request: the credential is
+ # invalid/blocked/expired or the endpoint is wrong (e.g. a prod
+ # token sent to a staging inference URL). Flag them so compress()
+ # aborts and preserves the session instead of rotating into a
+ # degraded child with a placeholder summary. We still allow the
+ # one-shot fallback to the MAIN model below when the failure came
+ # from a distinct auxiliary summary_model (its dedicated creds may
+ # be the only broken thing); only a failure on the main model — or
+ # a fallback that also auth-fails — makes the abort stick.
+ _is_auth_error = (
+ _status in {401, 403}
+ or "invalid api key" in _err_str
+ or "invalid x-api-key" in _err_str
+ or ("api key" in _err_str and ("invalid" in _err_str or "blocked" in _err_str))
+ or "unauthorized" in _err_str
+ or "authentication" in _err_str
+ )
+ if _is_auth_error:
+ self._last_summary_auth_failure = True
if _is_json_decode and not _is_model_not_found and not _is_timeout:
logger.error(
"Context compression failed: auxiliary LLM returned a "
@@ -1809,6 +1932,23 @@ This compaction should PRIORITISE preserving all information related to the focu
idx += 1
return idx
+ def _effective_protect_first_n(self) -> int:
+ """``protect_first_n`` decayed across compression cycles.
+
+ ``protect_first_n`` keeps the first N non-system messages verbatim so
+ the original task framing survives the FIRST compaction. But applying
+ it on every subsequent pass fossilizes those early turns — they're
+ re-copied into each child session and never summarized away, so old
+ user messages become immortal and grow the head unboundedly across a
+ long session (#11996). Once the session has been compressed at least
+ once, the early turns are already captured in the handoff summary, so
+ there's no need to keep re-protecting them: decay to 0 (the system
+ prompt is still always protected separately by _protect_head_size).
+ """
+ if self.compression_count >= 1 or self._previous_summary:
+ return 0
+ return self.protect_first_n
+
def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
"""Total count of head messages to protect.
@@ -1820,14 +1960,19 @@ This compaction should PRIORITISE preserving all information related to the focu
the ``messages`` list (e.g. the gateway ``/compress`` handler
strips it before calling compress()).
- Examples:
+ The ``protect_first_n`` portion DECAYS after the first compression
+ (see _effective_protect_first_n) so early user turns don't fossilize
+ across repeated compactions (#11996).
+
+ Examples (first compaction):
protect_first_n=0 → system prompt only (or nothing if no system msg)
protect_first_n=3 → system + first 3 non-system messages
+ After the first compaction: system prompt only.
"""
head = 0
if messages and messages[0].get("role") == "system":
head = 1
- return head + self.protect_first_n
+ return head + self._effective_protect_first_n()
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Pull a compress-end boundary backward to avoid splitting a
@@ -2178,6 +2323,7 @@ This compaction should PRIORITISE preserving all information related to the focu
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compress_aborted = False
+ self._last_summary_auth_failure = False
# Manual /compress (force=True) bypasses the failure cooldown so the
# user can retry immediately after an auto-compress abort. Without
@@ -2293,19 +2439,38 @@ This compaction should PRIORITISE preserving all information related to the focu
# _last_summary_dropped_count for gateway hygiene to
# surface a warning.
# Default is False (historical behavior).
- if not summary and self.abort_on_summary_failure:
+ #
+ # EXCEPTION — auth failures always abort. A 401/403 from the summary
+ # call means the credential or endpoint is broken (invalid/blocked
+ # key, or a token pointed at the wrong inference host). Rotating into
+ # a child session with a placeholder summary on a broken credential
+ # strands the user on a degraded session for zero benefit — every
+ # subsequent call fails the same way. So when the failure was an auth
+ # error we abort regardless of abort_on_summary_failure, preserving
+ # the conversation unchanged until the credential is fixed.
+ if not summary and (self.abort_on_summary_failure or self._last_summary_auth_failure):
n_skipped = compress_end - compress_start
self._last_summary_dropped_count = 0 # nothing actually dropped
self._last_summary_fallback_used = False
self._last_compress_aborted = True
if not self.quiet_mode:
- logger.warning(
- "Summary generation failed — aborting compression "
- "(compression.abort_on_summary_failure=true). "
- "%d message(s) preserved unchanged. Conversation is "
- "frozen until the next /compress or /new.",
- n_skipped,
- )
+ if self._last_summary_auth_failure:
+ logger.warning(
+ "Summary generation failed with an authentication "
+ "error — aborting compression. %d message(s) preserved "
+ "unchanged; the session was NOT rotated. Check your "
+ "provider credential / inference endpoint, then retry "
+ "with /compress or start fresh with /new.",
+ n_skipped,
+ )
+ else:
+ logger.warning(
+ "Summary generation failed — aborting compression "
+ "(compression.abort_on_summary_failure=true). "
+ "%d message(s) preserved unchanged. Conversation is "
+ "frozen until the next /compress or /new.",
+ n_skipped,
+ )
return messages
# Phase 4: Assemble compressed message list
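Reviewer note on `_compute_threshold_tokens`: the floor-versus-window behavior is easier to check with concrete numbers. The sketch below restates the arithmetic as a free function. `MINIMUM_CONTEXT_LENGTH = 64_000` is an assumption inferred from the 64K example in the docstring; the real constant lives in `agent.model_metadata` and is not shown in this diff.

```python
# Assumed value, inferred from the docstring's 64K example (not shown in the diff).
MINIMUM_CONTEXT_LENGTH = 64_000
# Mirrors ContextCompressor._MIN_CTX_TRIGGER_RATIO from the patch.
MIN_CTX_TRIGGER_RATIO = 0.85


def compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Standalone restatement of the threshold rule described in the patch."""
    floored = max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)
    # A floor at or above the window could never be reached, so trigger
    # at 85% of the window instead, and always strictly below the window.
    if context_length > 0 and floored >= context_length:
        return max(1, min(int(context_length * MIN_CTX_TRIGGER_RATIO), context_length - 1))
    return floored


for ctx in (200_000, 100_000, 64_000):
    print(ctx, compute_threshold_tokens(ctx, 0.5))
# 200000 100000  (percentage wins)
# 100000 64000   (floor wins, still inside the window)
# 64000 54400    (floor would equal the window, so 85% applies)
```

Under the old `max(pct, MINIMUM_CONTEXT_LENGTH)` rule the last case would have returned 64000, the entire window, which is the degenerate case from #14690 that the patch addresses.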
diff --git a/agent/conversation_compression.py b/agent/conversation_compression.py
index 5c7d299f0a4..94fff283893 100644
--- a/agent/conversation_compression.py
+++ b/agent/conversation_compression.py
@@ -328,6 +328,16 @@ def compress_context(
agent._compression_feasibility_checked = True
_pre_msg_count = len(messages)
+ # In-place compaction (config: compression.in_place, see #38763). When True,
+ # this compaction rewrites the message list + rebuilds the system prompt but
+ # keeps the SAME session_id — no end_session, no parent_session_id child, no
+ # `name #N` renumber, no contextvar/env/logging re-sync, no memory/context-
+ # engine session-switch. The conversation keeps one durable id for life,
+ # eliminating the session-rotation bug cluster. Default False during rollout.
+ in_place = bool(getattr(agent, "compression_in_place", False))
+ # Set True once the in-place DB write actually completes (the DB block can
+ # raise and skip it). Surfaced to the gateway via agent._last_compaction_in_place.
+ compacted_in_place = False
logger.info(
"context compression started: session=%s messages=%d tokens=~%s model=%s focus=%r",
agent.session_id or "none", _pre_msg_count,
@@ -508,125 +518,244 @@ def compress_context(
if agent._session_db:
try:
- # Propagate title to the new session with auto-numbering
- old_title = agent._session_db.get_session_title(agent.session_id)
- # Trigger memory extraction on the old session before it rotates.
+ # Trigger memory extraction on the current session before the
+ # transcript is rewritten (runs in BOTH modes — the logical
+ # conversation's pre-compaction turns are about to be summarized
+ # away regardless of whether the id rotates).
agent.commit_memory_session(messages)
- # Flush any un-persisted messages from the current turn to the
- # old session *before* rotating. compress_context() can be
- # called mid-turn (auto-compress when context exceeds threshold)
- # at a point when _flush_messages_to_session_db() has not yet
- # run. Without this, messages generated during the current turn
- # are silently lost on session rotation (#47202).
- try:
- agent._flush_messages_to_session_db(messages)
- except Exception:
- pass # best-effort — don't block compression on a flush error
- agent._session_db.end_session(agent.session_id, "compression")
- old_session_id = agent.session_id
- agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
- # Ordering contract: the agent thread updates the contextvar here;
- # the gateway propagates to SessionEntry after run_in_executor returns.
- try:
- from gateway.session_context import set_current_session_id
- set_current_session_id(agent.session_id)
- except Exception:
- os.environ["HERMES_SESSION_ID"] = agent.session_id
- # The gateway/tools session context (ContextVar + env) and the
- # logging session context are SEPARATE mechanisms. The call above
- # moves the former; the ``[session_id]`` tag on log lines comes
- # from ``hermes_logging._session_context`` (set once per turn in
- # conversation_loop.py). Without this, post-rotation log lines in
- # the same turn keep the STALE old id while the message/DB/gateway
- # state carry the new one — breaking log correlation exactly at the
- # compaction boundary (see #34089). Guarded separately so a logging
- # failure can never regress the routing update above.
- try:
- from hermes_logging import set_session_context
-
- set_session_context(agent.session_id)
- except Exception:
- pass
- agent._session_db_created = False
- agent._session_db.create_session(
- session_id=agent.session_id,
- source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
- model=agent.model,
- model_config=agent._session_init_model_config,
- parent_session_id=old_session_id,
- )
- agent._session_db_created = True
- # Auto-number the title for the continuation session
- if old_title:
+ if in_place:
+ # ── In-place compaction: keep the same session_id ──────────
+ # No end_session, no new row, no parent_session_id, no title
+ # renumber, no contextvar/env/logging re-sync. The session's
+ # id, title, cwd, /goal, and gateway routing all stay put.
+ #
+ # Durable, NON-DESTRUCTIVE replace: soft-archive the
+ # pre-compaction turns (active=0, kept on disk + FTS-searchable +
+ # recoverable) and insert `compressed` as the new live (active=1)
+ # set, atomically. `compressed` already carries the surviving
+ # tail (current-turn messages the compressor kept via
+ # protect_last_n), so we DON'T pre-flush here — a flush would
+ # INSERT current-turn rows that archive_and_compact would then
+ # archive alongside the rest (harmless but wasted writes). The
+ # live-context load filters active=1, so a resume reloads ONLY
+ # the compacted set; the original turns remain under the SAME id
+ # for search/recovery (Teknium review — keep one durable id
+ # WITHOUT destroying history, unlike a hard replace_messages).
+ # See #38763.
+ agent._session_db.archive_and_compact(agent.session_id, compressed)
+ # Reset the flush identity set so the next turn's appends are
+ # diffed against the COMPACTED transcript: the compacted dicts
+ # are passed as conversation_history next turn and skipped by
+ # identity, so only genuinely new turn messages get appended
+ # (no dup of the summary, no resurrection of dropped turns).
+ agent._flushed_db_message_ids = set()
+ # Rotation-independent signal: the conversation was compacted in
+ # place (id unchanged). The gateway reads this (NOT an id-change
+ # diff) to re-baseline transcript handling.
+ compacted_in_place = True
+ else:
+ # ── Rotation (legacy): end this session, fork a continuation ─
+ # Flush any un-persisted current-turn messages to the OLD
+ # session before ending it, so they survive in the preserved
+ # parent transcript (#47202). (In-place skips this — see above.)
try:
- new_title = agent._session_db.get_next_title_in_lineage(old_title)
- agent._session_db.set_session_title(agent.session_id, new_title)
- except (ValueError, Exception) as e:
- logger.debug("Could not propagate title on compression: %s", e)
+ agent._flush_messages_to_session_db(messages)
+ except Exception:
+ pass # best-effort — don't block compression on a flush error
+ # Propagate title to the new session with auto-numbering
+ old_title = agent._session_db.get_session_title(agent.session_id)
+ agent._session_db.end_session(agent.session_id, "compression")
+ old_session_id = agent.session_id
+ agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+ # Ordering contract: the agent thread updates the contextvar here;
+ # the gateway propagates to SessionEntry after run_in_executor returns.
+ try:
+ from gateway.session_context import set_current_session_id
+
+ set_current_session_id(agent.session_id)
+ except Exception:
+ os.environ["HERMES_SESSION_ID"] = agent.session_id
+ # The gateway/tools session context (ContextVar + env) and the
+ # logging session context are SEPARATE mechanisms. The call above
+ # moves the former; the ``[session_id]`` tag on log lines comes
+ # from ``hermes_logging._session_context`` (set once per turn in
+ # conversation_loop.py). Without this, post-rotation log lines in
+ # the same turn keep the STALE old id while the message/DB/gateway
+ # state carry the new one — breaking log correlation exactly at the
+ # compaction boundary (see #34089). Guarded separately so a logging
+ # failure can never regress the routing update above.
+ try:
+ from hermes_logging import set_session_context
+
+ set_session_context(agent.session_id)
+ except Exception:
+ pass
+ agent._session_db_created = False
+ try:
+ agent._session_db.create_session(
+ session_id=agent.session_id,
+ source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
+ model=agent.model,
+ model_config=agent._session_init_model_config,
+ parent_session_id=old_session_id,
+ )
+ except Exception as _cs_err:
+ # The child row could not be created (e.g. FK constraint,
+ # contended write). Previously the outer handler simply
+ # warned and let the agent continue on the NEW id — which
+ # has no row in state.db, producing an orphan: the parent
+ # is ended, the child is never indexed, and every
+ # subsequent message is attributed to a session that
+ # doesn't exist (#33906/#33907). Roll the live id back to
+ # the parent so the conversation stays attached to a real,
+ # indexed session instead of a phantom.
+ logger.warning(
+ "Compression child session create failed (%s) — "
+ "rolling back to parent session %s to avoid an orphan.",
+ _cs_err, old_session_id,
+ )
+ agent.session_id = old_session_id
+ try:
+ from gateway.session_context import set_current_session_id
+ set_current_session_id(agent.session_id)
+ except Exception:
+ os.environ["HERMES_SESSION_ID"] = agent.session_id
+ try:
+ from hermes_logging import set_session_context
+ set_session_context(agent.session_id)
+ except Exception:
+ pass
+ # Re-open the parent: it was ended above, but we're
+ # continuing on it, so it must not stay closed.
+ try:
+ agent._session_db.reopen_session(old_session_id)
+ except Exception:
+ pass
+ old_session_id = None # no rotation happened
+ # The parent row already exists in state.db, so mark the
+ # session as created — _ensure_db_session would otherwise
+ # retry a (harmless INSERT OR IGNORE) create next turn.
+ agent._session_db_created = True
+ raise
+ agent._session_db_created = True
+ # Carry a persistent /goal onto the continuation session.
+ # Compression mints a fresh child id; load_goal does a flat
+ # per-session lookup with no parent walk, so without this an
+ # active goal silently dies at the boundary (#33618).
+ try:
+ from hermes_cli.goals import migrate_goal_to_session
+ migrate_goal_to_session(old_session_id, agent.session_id, reason="compression")
+ except Exception as _goal_err:
+ logger.debug("Could not migrate goal on compression: %s", _goal_err)
+ # Auto-number the title for the continuation session
+ if old_title:
+ try:
+ new_title = agent._session_db.get_next_title_in_lineage(old_title)
+ agent._session_db.set_session_title(agent.session_id, new_title)
+ except Exception as e:
+ logger.debug("Could not propagate title on compression: %s", e)
+
+ # Shared post-write steps (both modes target agent.session_id, which
+ # in-place keeps and rotation has already reassigned to the new id):
+ # refresh the stored system prompt and reset the flush cursor so the
+ # next turn re-bases its append diff.
agent._session_db.update_system_prompt(agent.session_id, new_system_prompt)
- # Reset flush cursor — new session starts with no messages written
agent._last_flushed_db_idx = 0
except Exception as e:
- logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
+ # If the rotation rolled back to the parent (orphan-avoidance
+ # above), agent.session_id is the still-indexed parent and
+ # old_session_id was cleared — so this is recovery, not an
+ # un-indexed orphan. Otherwise an earlier step failed before the
+ # child was created and the warning's original meaning holds.
+ if locals().get("old_session_id") is None and not in_place:
+ logger.warning(
+ "Compression rotation aborted and rolled back to the "
+ "parent session (%s): %s", agent.session_id or "?", e,
+ )
+ else:
+ logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
- # Notify the context engine that the session_id rotated because of
- # compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
- # boundary_reason="compression" to preserve DAG lineage across the
- # rollover instead of re-initializing fresh per-session state.
- # See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
+ # Compaction-boundary bookkeeping, computed once. `old_session_id` is only
+ # bound in the rotation branch; in-place leaves it unset. `_boundary_parent`
+ # is the id the boundary notifications attribute the prior state to: the old
+ # id on rotation, the (unchanged) current id in-place.
+ _old_sid = locals().get("old_session_id")
+ _is_boundary = bool(_old_sid) or in_place
+ _boundary_parent = _old_sid or agent.session_id or ""
+
+ # Notify the context engine that a compaction boundary occurred. Plugin
+ # engines (e.g. hermes-lcm) use boundary_reason="compression" to preserve
+ # DAG lineage / checkpoint per-session state across the boundary instead of
+ # re-initializing fresh. See hermes-lcm#68. Built-in ContextCompressor
+ # ignores kwargs. Fires in BOTH modes: rotation passes old→new ids; in-place
+ # passes the SAME id (the boundary is real even though the id didn't move).
try:
- _old_sid = locals().get("old_session_id")
- if _old_sid and hasattr(agent.context_compressor, "on_session_start"):
+ if _is_boundary and hasattr(agent.context_compressor, "on_session_start"):
agent.context_compressor.on_session_start(
agent.session_id or "",
boundary_reason="compression",
- old_session_id=_old_sid,
+ old_session_id=_boundary_parent,
+ platform=getattr(agent, "platform", None) or "cli",
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
- # Notify memory providers of the compression-driven session_id rotation
- # so provider-cached per-session state (Hindsight's _document_id,
- # accumulated turn buffers, counters) refreshes. reset=False because
- # the logical conversation continues; only the id and DB row rolled
- # over. See #6672.
+ # Notify memory providers of the compaction boundary so provider-cached
+ # per-session state (Hindsight's _document_id, accumulated turn buffers,
+ # counters) refreshes. reset=False because the logical conversation
+ # continues. See #6672. Fires in BOTH modes: in-place uses the same id as
+ # parent (the conversation didn't fork, but the buffer must still be told
+ # the transcript was compacted so it doesn't double-count dropped turns).
try:
- _old_sid = locals().get("old_session_id")
- if _old_sid and agent._memory_manager:
+ if _is_boundary and agent._memory_manager:
agent._memory_manager.on_session_switch(
agent.session_id or "",
- parent_session_id=_old_sid,
+ parent_session_id=_boundary_parent,
reset=False,
reason="compression",
)
except Exception as _me_err:
logger.debug("memory manager on_session_switch (compression): %s", _me_err)
- # Warn on repeated compressions (quality degrades with each pass)
+ # Warn on repeated compressions (quality degrades with each pass).
+ # Route through _emit_status (like the other compression warnings above)
+ # so the warning reaches the TUI / Telegram / Discord via status_callback,
+ # not just CLI stdout. _emit_status still _vprints for the CLI, and
+ # storing it on _compression_warning lets replay_compression_warning
+ # re-deliver it once a late-bound gateway status_callback is wired (#36908).
_cc = agent.context_compressor.compression_count
if _cc >= 2:
- agent._vprint(
+ _cc_msg = (
f"{agent.log_prefix}⚠️ Session compressed {_cc} times — "
- f"accuracy may degrade. Consider /new to start fresh.",
- force=True,
+ f"accuracy may degrade. Consider /new to start fresh."
)
+ agent._compression_warning = _cc_msg
+ agent._emit_status(_cc_msg)
# Emit session:compress event so hooks (e.g. MemPalace sync) can ingest
- # the completed old session before its details are lost.
- _old_sid_for_event = locals().get("old_session_id")
+ # the completed old session before its details are lost. In in-place mode
+ # there is no old id (same session); ``in_place=True`` tells hooks the
+ # transcript was compacted on the same id rather than rotated.
if getattr(agent, "event_callback", None):
try:
agent.event_callback("session:compress", {
"platform": agent.platform or "",
"session_id": agent.session_id,
- "old_session_id": _old_sid_for_event or "",
+ "old_session_id": _old_sid or "",
+ "in_place": in_place,
"compression_count": agent.context_compressor.compression_count,
})
except Exception as e:
logger.debug("event_callback error on session:compress: %s", e)
+ # Surface the compaction mode to the caller (run_conversation / gateway)
+ # via a rotation-independent flag. The gateway uses this — NOT an
+ # id-change diff — to re-baseline transcript handling (history_offset=0 +
+ # rewrite on the same id) when compaction happened in place. See #38763.
+ agent._last_compaction_in_place = compacted_in_place
+
# Keep the post-compression rough estimate for diagnostics, but do not
# treat it as provider-reported prompt usage. Schema-heavy rough estimates
# can remain above threshold even after the next real API request fits.
@@ -712,33 +841,58 @@ def try_shrink_image_parts_in_messages(
# actually brought under the target.
unshrinkable_oversized = 0
- def _shrink_data_url(url: str) -> Optional[str]:
- """Return a smaller data URL, or None if shrink can't help."""
- if not isinstance(url, str) or not url.startswith("data:"):
+ def _decode_pixels(data_url: str) -> Optional[tuple]:
+ """Return ``(width, height)`` of a base64 data URL, or None on failure.
+
+ Soft-depends on Pillow; returns None (caller falls back to a
+ bytes-only check) if Pillow is missing or the payload is corrupt.
+ """
+ try:
+ import base64 as _b64_dim
+ import io as _io_dim
+ header_d, _, data_d = data_url.partition(",")
+ if not data_d or not data_url.startswith("data:"):
+ return None
+ from PIL import Image as _PILImage
+ with _PILImage.open(_io_dim.BytesIO(_b64_dim.b64decode(data_d))) as _img:
+ return _img.size
+ except Exception:
return None
- # Check both byte size AND pixel dimensions.
+ def _shrink_data_url(url: str) -> tuple:
+ """Return ``(resized_url, unshrinkable)`` for a data URL.
+
+ ``resized_url`` is a smaller/dimension-correct data URL, or None when
+ no rewrite was applied. ``unshrinkable`` is True only when the image
+ exceeded a constraint (byte-size or dimensions) and the resize failed
+ to satisfy *that same* constraint — so the caller knows retrying is
+ pointless even if a different image in the request shrank.
+ """
+ if not isinstance(url, str) or not url.startswith("data:"):
+ return None, False
+
+ # Determine which constraint is binding. The accept/reject gate below
+ # MUST be checked against the same axis that triggered the shrink: a
+ # downscaled screenshot PNG routinely re-encodes to *more* bytes than
+ # the original (PNG compression is non-monotonic in image size — a
+ # smaller raster with LANCZOS resampling noise compresses worse than a
+ # larger smooth one). Rejecting a pixel-correct downscale purely
+ # because its bytes grew permanently wedges sessions on the Anthropic
+ # many-image 2000px path (#48013).
needs_shrink = len(url) > target_bytes # over byte budget
+ triggered_by = "bytes" if needs_shrink else None
if not needs_shrink:
- # Even if bytes are fine, check pixel dimensions against the
- # provider's reported per-side cap. A screenshot can be tiny in
- # bytes yet too large in pixels.
- try:
- import base64 as _b64_dim
- header_d, _, data_d = url.partition(",")
- if not data_d:
- return None
- raw_d = _b64_dim.b64decode(data_d)
- from PIL import Image as _PILImage
- import io as _io_dim
- with _PILImage.open(_io_dim.BytesIO(raw_d)) as _img:
- if max(_img.size) <= max_dimension:
- return None # both bytes and pixels are fine
- needs_shrink = True # pixels exceed limit, force shrink
- except Exception:
- # If we can't check dimensions (Pillow unavailable, corrupt
- # image, etc.), fall back to byte-only check.
- return None
+ # Bytes are fine — check pixel dimensions against the provider's
+ # reported per-side cap. A screenshot can be tiny in bytes yet
+ # too large in pixels.
+ dims = _decode_pixels(url)
+ if dims is None:
+ # Pillow missing or corrupt data — fall back to byte-only.
+ return None, False
+ if max(dims) <= max_dimension:
+ return None, False # both bytes and pixels are within limits
+ needs_shrink = True
+ triggered_by = "dimension"
try:
header, _, data = url.partition(",")
@@ -770,13 +924,45 @@ def try_shrink_image_parts_in_messages(
Path(tmp.name).unlink(missing_ok=True)
except Exception:
pass
- if not resized or len(resized) >= len(url):
- # Shrink didn't help (or made it bigger — corrupt input?).
- return None
- return resized
+ if not resized:
+ # Resize returned nothing — Pillow couldn't help.
+ return None, True
+ if triggered_by == "bytes":
+ # Byte budget is the binding constraint — bytes must shrink.
+ if len(resized) >= len(url):
+ return None, True # re-encode made it bigger
+ # The per-side dimension cap is ALSO an active provider
+ # constraint on this request (the caller passes the parsed cap
+ # to both this helper and the resizer). _resize_image_for_vision
+ # returns a best-effort, possibly-over-cap blob when it
+ # exhausts its halving budget — it freezes the long side once
+ # the short side hits its 64px floor, so a very-high-aspect
+ # image can stay over the cap even after bytes shrank. If the
+ # output is still over the cap, retrying would re-400 on
+ # dimensions; treat it as unshrinkable. (Skip when dims can't
+ # be decoded — preserves historical byte-only behaviour.)
+ new_dims = _decode_pixels(resized)
+ if new_dims is not None and max(new_dims) > max_dimension:
+ return None, True
+ return resized, False
+ # triggered_by == "dimension": the per-side cap is binding. The
+ # re-encode may have grown in bytes; accept it as long as it is now
+ # within the dimension cap. Verify the new dimensions when we can.
+ new_dims = _decode_pixels(resized)
+ if new_dims is not None:
+ if max(new_dims) <= max_dimension:
+ return resized, False
+ # Still over the per-side cap — the resize didn't satisfy it.
+ return None, True
+ # Couldn't verify the re-encode's dimensions (corrupt output or
+ # Pillow gone mid-call). Fall back to the historical "bytes must
+ # shrink" gate so we never accept an unverifiable, byte-larger blob.
+ if len(resized) >= len(url):
+ return None, True
+ return resized, False
except Exception as exc:
logger.warning("image-shrink recovery: re-encode failed — %s", exc)
- return None
+ return None, triggered_by is not None
for msg in api_messages:
if not isinstance(msg, dict):
@@ -795,20 +981,18 @@ def try_shrink_image_parts_in_messages(
# OpenAI Responses: {"image_url": "data:..."}
if isinstance(image_value, dict):
url = image_value.get("url", "")
- resized = _shrink_data_url(url)
+ resized, unshrinkable = _shrink_data_url(url)
if resized:
image_value["url"] = resized
changed_count += 1
- elif isinstance(url, str) and url.startswith("data:") \
- and len(url) > target_bytes:
+ elif unshrinkable:
unshrinkable_oversized += 1
elif isinstance(image_value, str):
- resized = _shrink_data_url(image_value)
+ resized, unshrinkable = _shrink_data_url(image_value)
if resized:
part["image_url"] = resized
changed_count += 1
- elif image_value.startswith("data:") \
- and len(image_value) > target_bytes:
+ elif unshrinkable:
unshrinkable_oversized += 1
if changed_count:
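
Reviewer note: the accept/reject gate added to `_shrink_data_url` can be read as a pure decision table. The sketch below is a minimal illustration under assumptions, not the code in this patch: `accept_resize` and its parameters are hypothetical names, and the Pillow decode is replaced by a pre-computed `new_dims` tuple (or `None` when the dimensions could not be decoded).

```python
from typing import Optional, Tuple


def accept_resize(
    triggered_by: str,
    old_len: int,
    new_len: Optional[int],
    new_dims: Optional[Tuple[int, int]],
    max_dimension: int,
) -> Tuple[bool, bool]:
    """Return (accepted, unshrinkable) for one re-encoded image.

    Hypothetical helper: it mirrors the gate described in the hunk above,
    with lengths and dimensions passed in instead of decoded from a data URL.
    """
    if new_len is None:
        # The resizer produced nothing, so retrying cannot help.
        return False, True

    over_cap = new_dims is not None and max(new_dims) > max_dimension

    if triggered_by == "bytes":
        # Byte budget is binding: the payload must get smaller, and the
        # result must not still violate the per-side cap.
        if new_len >= old_len or over_cap:
            return False, True
        return True, False

    # triggered_by == "dimension": bytes may grow, pixels must fit.
    if new_dims is not None:
        return (not over_cap), over_cap

    # Dimensions unverifiable: fall back to the "bytes must shrink" rule.
    if new_len >= old_len:
        return False, True
    return True, False
```

The case this patch is aimed at is the first one a reader might get wrong: a dimension-triggered downscale whose byte length grew is accepted, while the same byte growth on a byte-triggered shrink is reported as unshrinkable.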
diff --git a/agent/conversation_loop.py b/agent/conversation_loop.py
index ef69ac68329..bbc379adf25 100644
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@@ -466,6 +466,32 @@ def _content_policy_blocked_result(
}
+def _sync_failover_system_message(agent, api_messages, active_system_prompt):
+ """Refresh the in-flight system message after a provider failover.
+
+ ``try_activate_fallback`` rewrites the ``Model:``/``Provider:`` identity
+ lines on ``agent._cached_system_prompt`` (see
+ ``rewrite_prompt_model_identity``) so the agent reports the model that is
+ actually answering. But the current call block's ``api_messages`` were
+ built from the pre-failover prompt, and the retry loop rebuilds
+ ``api_kwargs`` from that list each iteration — without this sync the
+ whole turn (and every gateway turn, since fallback re-activates per
+ message while the primary is down) ships the stale identity.
+
+ Mutates ``api_messages[0]`` in place and returns the prompt to use as
+ ``active_system_prompt`` for subsequent call-block rebuilds.
+ """
+ sp = getattr(agent, "_cached_system_prompt", None)
+ if not isinstance(sp, str) or not sp:
+ return active_system_prompt
+ if api_messages and api_messages[0].get("role") == "system":
+ effective = sp
+ if agent.ephemeral_system_prompt:
+ effective = (effective + "\n\n" + agent.ephemeral_system_prompt).strip()
+ api_messages[0]["content"] = effective
+ return sp
+
+
def run_conversation(
agent,
user_message: str,
@@ -940,6 +966,8 @@ def run_conversation(
)
agent._buffer_status(f"⏳ {_nous_msg}")
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -1265,6 +1293,8 @@ def run_conversation(
if agent._fallback_index < len(agent._fallback_chain):
agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -1336,6 +1366,8 @@ def run_conversation(
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -1479,6 +1511,8 @@ def run_conversation(
"⚠️ Model declined to respond (safety refusal) — trying fallback..."
)
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -2783,11 +2817,46 @@ def run_conversation(
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
continue
+ # ── Auth-failure provider failover ───────────────────────
+ # A 401/403 that survives the per-provider credential-refresh
+ # attempt above (each guarded by its own
+ # ``*_auth_retry_attempted`` flag) means the active provider's
+ # credential or endpoint is broken in a way refreshing can't
+ # fix (revoked OAuth, blocked/expired key, an account pinned to
+ # a dead/staging endpoint). Previously the loop only printed
+ # "switch providers manually" advice and fell through, so a
+ # user with a configured fallback chain kept thrashing on the
+ # same dead credential every turn instead of failing over.
+ # Escalate to the fallback chain here, mirroring the rate-
+ # limit/billing failover above. When no fallback is configured
+ # (or the chain is exhausted), _try_activate_fallback returns
+ # False and we fall through to the existing terminal handling
+ # + provider-specific troubleshooting guidance unchanged.
+ if (
+ classified.is_auth
+ and not _retry.auth_failover_attempted
+ and agent._fallback_index < len(agent._fallback_chain)
+ ):
+ _retry.auth_failover_attempted = True
+ agent._buffer_status(
+ "🔐 Authentication failed and could not be refreshed — "
+ "switching to fallback provider..."
+ )
+ if agent._try_activate_fallback(reason=classified.reason):
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
+ retry_count = 0
+ compression_attempts = 0
+ _retry.primary_recovery_attempted = False
+ continue
+
# ── Nous Portal: record rate limit & skip retries ─────
# When Nous returns a 429 that is a genuine account-
# level rate limit, record the reset time to a shared
@@ -2914,6 +2983,7 @@ def run_conversation(
agent._buffer_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
+ original_tokens = estimate_messages_tokens_rough(messages)
messages, active_system_prompt = agent._compress_context(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
@@ -2923,8 +2993,18 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
- if len(messages) < original_len:
- agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+ # Re-estimate tokens after compression. Same-message-count
+ # compression (tool-result pruning, in-place summarization)
+ # can materially reduce request size without reducing the
+ # message array. (#39550)
+ new_tokens = estimate_messages_tokens_rough(messages)
+ approx_tokens = new_tokens # update for downstream logging
+
+ if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95):
+ if len(messages) < original_len:
+ agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+ else:
+ agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
time.sleep(2) # Brief pause between compression retries
_retry.restart_with_compressed_messages = True
break
@@ -3070,6 +3150,7 @@ def run_conversation(
agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
original_len = len(messages)
+ original_tokens = estimate_messages_tokens_rough(messages)
messages, active_system_prompt = agent._compress_context(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
@@ -3079,9 +3160,18 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
- if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
+ # Re-estimate tokens after compression. Same-message-count
+ # compression (tool-result pruning, in-place summarization)
+ # can materially reduce request size without reducing the
+ # message array. (#39550)
+ new_tokens = estimate_messages_tokens_rough(messages)
+ approx_tokens = new_tokens # update for downstream logging
+
+ if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95) or (new_ctx and new_ctx < old_ctx):
if len(messages) < original_len:
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+ elif new_tokens > 0 and new_tokens < original_tokens * 0.95:
+ agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
time.sleep(2) # Brief pause between compression retries
_retry.restart_with_compressed_messages = True
break
@@ -3090,13 +3180,13 @@ def run_conversation(
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
- logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
+ logger.error(f"{agent.log_prefix}Context length exceeded: {new_tokens:,} tokens. Cannot compress further.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
"completed": False,
"api_calls": api_call_count,
- "error": f"Context length exceeded ({approx_tokens:,} tokens). Cannot compress further.",
+ "error": f"Context length exceeded ({new_tokens:,} tokens). Cannot compress further.",
"partial": True,
"failed": True,
"compression_exhausted": True,
@@ -3186,6 +3276,8 @@ def run_conversation(
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -3197,15 +3289,22 @@ def run_conversation(
# Terminal — flush buffered context so the user sees
# what was tried before the abort.
agent._flush_status_buffer()
+ # Summarize once: Cloudflare/proxy HTML challenge pages and
+ # other raw provider bodies must be collapsed to a short
+ # one-liner here, otherwise the full page leaks into the
+ # returned ``error`` field and downstream consumers deliver
+ # it verbatim (e.g. a cron failure notification dumped a
+ # ~60KB Cloudflare challenge page as 31 Discord messages).
+ _nonretryable_summary = agent._summarize_api_error(api_error)
if classified.reason == FailoverReason.content_policy_blocked:
agent._emit_status(
f"❌ Provider safety filter blocked this request: "
- f"{agent._summarize_api_error(api_error)}"
+ f"{_nonretryable_summary}"
)
else:
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
- f"{agent._summarize_api_error(api_error)}"
+ f"{_nonretryable_summary}"
)
agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
@@ -3290,18 +3389,17 @@ def run_conversation(
else:
agent._persist_session(messages, conversation_history)
if classified.reason == FailoverReason.content_policy_blocked:
- _summary = agent._summarize_api_error(api_error)
_policy_response = (
"⚠️ The model provider's safety filter blocked this request "
"(not a Hermes/gateway failure).\n\n"
- f"Provider message: {_summary}\n\n"
+ f"Provider message: {_nonretryable_summary}\n\n"
f"{_CONTENT_POLICY_RECOVERY_HINT}"
)
return _content_policy_blocked_result(
messages,
api_call_count,
final_response=_policy_response,
- error_detail=_summary,
+ error_detail=_nonretryable_summary,
)
return {
"final_response": None,
@@ -3309,7 +3407,7 @@ def run_conversation(
"api_calls": api_call_count,
"completed": False,
"failed": True,
- "error": str(api_error),
+ "error": _nonretryable_summary,
}
if retry_count >= max_retries:
@@ -3327,6 +3425,8 @@ def run_conversation(
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@@ -4273,6 +4373,8 @@ def run_conversation(
"switching to fallback provider..."
)
if agent._try_activate_fallback():
+ active_system_prompt = _sync_failover_system_message(
+ agent, api_messages, active_system_prompt)
agent._empty_content_retries = 0
agent._buffer_status(
f"↻ Switched to fallback: {agent.model} "
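
Reviewer note: the two compression-retry hunks in this file share one progress test, which now counts a token reduction as progress even when the message count is unchanged. The sketch below restates that test in isolation. It is an illustration under assumptions: `compression_made_progress` is a hypothetical name, the token counts stand in for the results of `estimate_messages_tokens_rough`, and the `new_ctx < old_ctx` clause from the context-length branch is left out.

```python
def compression_made_progress(
    original_len: int,
    new_len: int,
    original_tokens: int,
    new_tokens: int,
    min_ratio: float = 0.95,
) -> bool:
    """Return True when a compression pass is worth retrying the request.

    Hypothetical helper: progress means either fewer messages, or a rough
    token estimate that dropped below ``min_ratio`` of the original
    (the hunks above hard-code 0.95).
    """
    fewer_messages = new_len < original_len
    # A zero estimate is treated as "unknown" rather than as a reduction.
    fewer_tokens = new_tokens > 0 and new_tokens < original_tokens * min_ratio
    return fewer_messages or fewer_tokens


if __name__ == "__main__":
    # Same message count, but tool-result pruning removed ~10% of tokens.
    print(compression_made_progress(40, 40, 100_000, 90_000))
    # Same message count and a negligible token change: no retry.
    print(compression_made_progress(40, 40, 100_000, 99_000))
```

The 5% margin keeps estimator noise from being mistaken for progress, which would otherwise let the retry loop spin on an effectively unchanged payload.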
diff --git a/agent/credential_pool.py b/agent/credential_pool.py
index 04b22c76a68..4e883cffaa0 100644
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@@ -15,6 +15,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import load_env
+from agent.secret_scope import get_secret as _get_secret
from agent.credential_persistence import (
is_borrowed_credential_source,
sanitize_borrowed_credential_payload,
@@ -1666,7 +1667,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
_env_file = load_env()
def _env_val(key: str) -> str:
- return (_env_file.get(key) or os.environ.get(key) or "").strip()
+ return (_env_file.get(key) or _get_secret(key, "") or "").strip()
anthropic_api_key = _env_val("ANTHROPIC_API_KEY")
anthropic_oauth_env = (
@@ -1952,7 +1953,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
# changes to the .env file.
def _get_env_prefer_dotenv(key: str) -> str:
env_file = load_env()
- val = env_file.get(key) or os.environ.get(key) or ""
+ val = env_file.get(key) or _get_secret(key, "") or ""
return val.strip()
# Honour user suppression — `hermes auth remove ` for an
@@ -2061,19 +2062,34 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
return changed, active_sources
-def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources: Set[str]) -> bool:
+def _prune_stale_seeded_entries(
+ entries: List[PooledCredential],
+ active_sources: Set[str],
+ *,
+ prune_env_sources: bool = True,
+) -> bool:
+ def _is_prunable(entry: PooledCredential) -> bool:
+ # ``env:*`` entries are persisted references that get re-hydrated from
+ # the environment on every load. A process that merely lacks the env
+ # var on this call must NOT delete the on-disk entry for every other
+ # process — that destructive read is the bug behind #9331. Only prune
+ # an env source when ``prune_env_sources`` is explicitly requested
+ # (e.g. an `hermes auth` command that confirmed the source is gone).
+ if entry.source.startswith("env:"):
+ return prune_env_sources
+ # File-backed singletons (device-code OAuth, claude_code) and Hermes
+ # PKCE should disappear from the pool when their backing file is gone.
+ return (
+ is_borrowed_credential_source(entry.source, entry.provider)
+ or entry.source == "hermes_pkce"
+ )
+
retained = [
entry
for entry in entries
if _is_manual_source(entry.source)
or entry.source in active_sources
- or not (
- is_borrowed_credential_source(entry.source, entry.provider)
- # Hermes PKCE is Hermes-owned/persistable while present, but it is
- # still a file-backed singleton and should disappear from the pool
- # when the backing OAuth file is gone.
- or entry.source == "hermes_pkce"
- )
+ or not _is_prunable(entry)
]
if len(retained) == len(entries):
return False
@@ -2173,7 +2189,15 @@ def load_pool(provider: str) -> CredentialPool:
singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
env_changed, env_sources = _seed_from_env(provider, entries)
changed = raw_needs_sanitization or singleton_changed or env_changed
- changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
+ # ``load_pool()`` is a non-destructive read for env-seeded entries: a
+ # process missing a provider env var must not delete the persisted
+ # pool entry for every other process (#9331). File-backed singletons
+ # still prune when their backing file is gone.
+ changed |= _prune_stale_seeded_entries(
+ entries,
+ singleton_sources | env_sources,
+ prune_env_sources=False,
+ )
changed |= _normalize_pool_priorities(provider, entries)
if changed:
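
Reviewer note: the effect of `prune_env_sources=False` in `load_pool()` can be shown with a stripped-down model of the pool. This is a sketch under assumptions, not the module's code: `Entry`, `prune`, and `FILE_BACKED_SOURCES` are hypothetical stand-ins, and the real `_is_manual_source` and `is_borrowed_credential_source` checks are approximated by simple string tests.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical stand-in for the file-backed singleton sources.
FILE_BACKED_SOURCES = {"hermes_pkce", "claude_code"}


@dataclass
class Entry:
    source: str


def prune(
    entries: List[Entry],
    active_sources: Set[str],
    *,
    prune_env_sources: bool = True,
) -> List[Entry]:
    """Return the entries that survive pruning (simplified model)."""

    def is_prunable(entry: Entry) -> bool:
        if entry.source.startswith("env:"):
            # Env-seeded entries are only dropped on explicit request.
            return prune_env_sources
        return entry.source in FILE_BACKED_SOURCES

    return [
        e
        for e in entries
        if e.source.startswith("manual")  # assumed manual-source check
        or e.source in active_sources
        or not is_prunable(e)
    ]


if __name__ == "__main__":
    pool = [Entry("env:OPENAI_API_KEY"), Entry("hermes_pkce"), Entry("manual")]
    # A read from a process without the env var keeps the env entry.
    print([e.source for e in prune(pool, set(), prune_env_sources=False)])
    # An explicit destructive prune drops it.
    print([e.source for e in prune(pool, set())])
```

In this model a plain load never removes an `env:*` entry, while a file-backed entry whose source is no longer active is still pruned in both modes.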
diff --git a/agent/gemini_cloudcode_adapter.py b/agent/gemini_cloudcode_adapter.py
deleted file mode 100644
index 222327807be..00000000000
--- a/agent/gemini_cloudcode_adapter.py
+++ /dev/null
@@ -1,909 +0,0 @@
-"""OpenAI-compatible facade that talks to Google's Cloud Code Assist backend.
-
-This adapter lets Hermes use the ``google-gemini-cli`` provider as if it were
-a standard OpenAI-shaped chat completion endpoint, while the underlying HTTP
-traffic goes to ``cloudcode-pa.googleapis.com/v1internal:{generateContent,
-streamGenerateContent}`` with a Bearer access token obtained via OAuth PKCE.
-
-Architecture
-------------
-- ``GeminiCloudCodeClient`` exposes ``.chat.completions.create(**kwargs)``
- mirroring the subset of the OpenAI SDK that ``run_agent.py`` uses.
-- Incoming OpenAI ``messages[]`` / ``tools[]`` / ``tool_choice`` are translated
- to Gemini's native ``contents[]`` / ``tools[].functionDeclarations`` /
- ``toolConfig`` / ``systemInstruction`` shape.
-- The request body is wrapped ``{project, model, user_prompt_id, request}``
- per Code Assist API expectations.
-- Responses (``candidates[].content.parts[]``) are converted back to
- OpenAI ``choices[0].message`` shape with ``content`` + ``tool_calls``.
-- Streaming uses SSE (``?alt=sse``) and yields OpenAI-shaped delta chunks.
-
-Attribution
------------
-Translation semantics follow jenslys/opencode-gemini-auth (MIT) and the public
-Gemini API docs. Request envelope shape
-(``{project, model, user_prompt_id, request}``) is documented nowhere; it is
-reverse-engineered from the opencode-gemini-auth and clawdbot implementations.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-import time
-import uuid
-from types import SimpleNamespace
-from typing import Any, Dict, Iterator, List, Optional
-
-import httpx
-
-from agent import google_oauth
-from agent.gemini_schema import sanitize_gemini_tool_parameters
-from agent.google_code_assist import (
- CODE_ASSIST_ENDPOINT,
- CodeAssistError,
- ProjectContext,
- resolve_project_context,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# =============================================================================
-# Request translation: OpenAI → Gemini
-# =============================================================================
-
-_ROLE_MAP_OPENAI_TO_GEMINI = {
- "user": "user",
- "assistant": "model",
- "system": "user", # handled separately via systemInstruction
- "tool": "user", # functionResponse is wrapped in a user-role turn
- "function": "user",
-}
-
-
-def _coerce_content_to_text(content: Any) -> str:
- """OpenAI content may be str or a list of parts; reduce to plain text."""
- if content is None:
- return ""
- if isinstance(content, str):
- return content
- if isinstance(content, list):
- pieces: List[str] = []
- for p in content:
- if isinstance(p, str):
- pieces.append(p)
- elif isinstance(p, dict):
- if p.get("type") == "text" and isinstance(p.get("text"), str):
- pieces.append(p["text"])
- # Multimodal (image_url, etc.) — stub for now; log and skip
- elif p.get("type") in {"image_url", "input_audio"}:
- logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
- return "\n".join(pieces)
- return str(content)
-
-
-def _translate_tool_call_to_gemini(tool_call: Dict[str, Any]) -> Dict[str, Any]:
- """OpenAI tool_call -> Gemini functionCall part."""
- fn = tool_call.get("function") or {}
- args_raw = fn.get("arguments", "")
- try:
- args = json.loads(args_raw) if isinstance(args_raw, str) and args_raw else {}
- except json.JSONDecodeError:
- args = {"_raw": args_raw}
- if not isinstance(args, dict):
- args = {"_value": args}
- return {
- "functionCall": {
- "name": fn.get("name") or "",
- "args": args,
- },
- # Sentinel signature — matches opencode-gemini-auth's approach.
- # Without this, Code Assist rejects function calls that originated
- # outside its own chain.
- "thoughtSignature": "skip_thought_signature_validator",
- }
-
-
-def _translate_tool_result_to_gemini(message: Dict[str, Any]) -> Dict[str, Any]:
- """OpenAI tool-role message -> Gemini functionResponse part.
-
- The function name isn't in the OpenAI tool message directly; it must be
- passed via the assistant message that issued the call. For simplicity we
- look up ``name`` on the message (OpenAI SDK copies it there) or on the
- ``tool_call_id`` cross-reference.
- """
- name = str(message.get("name") or message.get("tool_call_id") or "tool")
- content = _coerce_content_to_text(message.get("content"))
- # Gemini expects the response as a dict under `response`. We wrap plain
- # text in {"output": "..."}.
- try:
- parsed = json.loads(content) if content.strip().startswith(("{", "[")) else None
- except json.JSONDecodeError:
- parsed = None
- response = parsed if isinstance(parsed, dict) else {"output": content}
- return {
- "functionResponse": {
- "name": name,
- "response": response,
- },
- }
-
-
-def _build_gemini_contents(
- messages: List[Dict[str, Any]],
-) -> tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
- """Convert OpenAI messages[] to Gemini contents[] + systemInstruction."""
- system_text_parts: List[str] = []
- contents: List[Dict[str, Any]] = []
-
- for msg in messages:
- if not isinstance(msg, dict):
- continue
- role = str(msg.get("role") or "user")
-
- if role == "system":
- system_text_parts.append(_coerce_content_to_text(msg.get("content")))
- continue
-
- # Tool result message — emit a user-role turn with functionResponse
- if role == "tool" or role == "function":
- contents.append({
- "role": "user",
- "parts": [_translate_tool_result_to_gemini(msg)],
- })
- continue
-
- gemini_role = _ROLE_MAP_OPENAI_TO_GEMINI.get(role, "user")
- parts: List[Dict[str, Any]] = []
-
- text = _coerce_content_to_text(msg.get("content"))
- if text:
- parts.append({"text": text})
-
- # Assistant messages can carry tool_calls
- tool_calls = msg.get("tool_calls") or []
- if isinstance(tool_calls, list):
- for tc in tool_calls:
- if isinstance(tc, dict):
- parts.append(_translate_tool_call_to_gemini(tc))
-
- if not parts:
- # Gemini rejects empty parts; skip the turn entirely
- continue
-
- contents.append({"role": gemini_role, "parts": parts})
-
- system_instruction: Optional[Dict[str, Any]] = None
- joined_system = "\n".join(p for p in system_text_parts if p).strip()
- if joined_system:
- system_instruction = {
- "role": "system",
- "parts": [{"text": joined_system}],
- }
-
- return contents, system_instruction
-
-
-def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
- """OpenAI tools[] -> Gemini tools[].functionDeclarations[]."""
- if not isinstance(tools, list) or not tools:
- return []
- declarations: List[Dict[str, Any]] = []
- for t in tools:
- if not isinstance(t, dict):
- continue
- fn = t.get("function") or {}
- if not isinstance(fn, dict):
- continue
- name = fn.get("name")
- if not name:
- continue
- decl = {"name": str(name)}
- if fn.get("description"):
- decl["description"] = str(fn["description"])
- params = fn.get("parameters")
- if isinstance(params, dict):
- decl["parameters"] = sanitize_gemini_tool_parameters(params)
- declarations.append(decl)
- if not declarations:
- return []
- return [{"functionDeclarations": declarations}]
-
-
-def _translate_tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
- """OpenAI tool_choice -> Gemini toolConfig.functionCallingConfig."""
- if tool_choice is None:
- return None
- if isinstance(tool_choice, str):
- if tool_choice == "auto":
- return {"functionCallingConfig": {"mode": "AUTO"}}
- if tool_choice == "required":
- return {"functionCallingConfig": {"mode": "ANY"}}
- if tool_choice == "none":
- return {"functionCallingConfig": {"mode": "NONE"}}
- if isinstance(tool_choice, dict):
- fn = tool_choice.get("function") or {}
- name = fn.get("name")
- if name:
- return {
- "functionCallingConfig": {
- "mode": "ANY",
- "allowedFunctionNames": [str(name)],
- },
- }
- return None
-
-
-def _normalize_thinking_config(config: Any) -> Optional[Dict[str, Any]]:
- """Accept thinkingBudget / thinkingLevel / includeThoughts (+ snake_case)."""
- if not isinstance(config, dict) or not config:
- return None
- budget = config.get("thinkingBudget", config.get("thinking_budget"))
- level = config.get("thinkingLevel", config.get("thinking_level"))
- include = config.get("includeThoughts", config.get("include_thoughts"))
- normalized: Dict[str, Any] = {}
- if isinstance(budget, (int, float)):
- normalized["thinkingBudget"] = int(budget)
- if isinstance(level, str) and level.strip():
- normalized["thinkingLevel"] = level.strip().lower()
- if isinstance(include, bool):
- normalized["includeThoughts"] = include
- return normalized or None
-
-
-def build_gemini_request(
- *,
- messages: List[Dict[str, Any]],
- tools: Any = None,
- tool_choice: Any = None,
- temperature: Optional[float] = None,
- max_tokens: Optional[int] = None,
- top_p: Optional[float] = None,
- stop: Any = None,
- thinking_config: Any = None,
-) -> Dict[str, Any]:
- """Build the inner Gemini request body (goes inside ``request`` wrapper)."""
- contents, system_instruction = _build_gemini_contents(messages)
-
- body: Dict[str, Any] = {"contents": contents}
- if system_instruction is not None:
- body["systemInstruction"] = system_instruction
-
- gemini_tools = _translate_tools_to_gemini(tools)
- if gemini_tools:
- body["tools"] = gemini_tools
- tool_cfg = _translate_tool_choice_to_gemini(tool_choice)
- if tool_cfg is not None:
- body["toolConfig"] = tool_cfg
-
- generation_config: Dict[str, Any] = {}
- if isinstance(temperature, (int, float)):
- generation_config["temperature"] = float(temperature)
- if isinstance(max_tokens, int) and max_tokens > 0:
- generation_config["maxOutputTokens"] = max_tokens
- if isinstance(top_p, (int, float)):
- generation_config["topP"] = float(top_p)
- if isinstance(stop, str) and stop:
- generation_config["stopSequences"] = [stop]
- elif isinstance(stop, list) and stop:
- generation_config["stopSequences"] = [str(s) for s in stop if s]
- normalized_thinking = _normalize_thinking_config(thinking_config)
- if normalized_thinking:
- generation_config["thinkingConfig"] = normalized_thinking
- if generation_config:
- body["generationConfig"] = generation_config
-
- return body
-
-
-def wrap_code_assist_request(
- *,
- project_id: str,
- model: str,
- inner_request: Dict[str, Any],
- user_prompt_id: Optional[str] = None,
-) -> Dict[str, Any]:
- """Wrap the inner Gemini request in the Code Assist envelope."""
- return {
- "project": project_id,
- "model": model,
- "user_prompt_id": user_prompt_id or str(uuid.uuid4()),
- "request": inner_request,
- }
-
-
-# =============================================================================
-# Response translation: Gemini → OpenAI
-# =============================================================================
-
-def _translate_gemini_response(
- resp: Dict[str, Any],
- model: str,
-) -> SimpleNamespace:
- """Non-streaming Gemini response -> OpenAI-shaped SimpleNamespace.
-
- Code Assist wraps the actual Gemini response inside ``response``, so we
- unwrap it first if present.
- """
- inner = resp.get("response") if isinstance(resp.get("response"), dict) else resp
-
- candidates = inner.get("candidates") or []
- if not isinstance(candidates, list) or not candidates:
- return _empty_response(model)
-
- cand = candidates[0]
- content_obj = cand.get("content") if isinstance(cand, dict) else {}
- parts = content_obj.get("parts") if isinstance(content_obj, dict) else []
-
- text_pieces: List[str] = []
- reasoning_pieces: List[str] = []
- tool_calls: List[SimpleNamespace] = []
-
- for i, part in enumerate(parts or []):
- if not isinstance(part, dict):
- continue
- # Thought parts are model's internal reasoning — surface as reasoning,
- # don't mix into content.
- if part.get("thought") is True:
- if isinstance(part.get("text"), str):
- reasoning_pieces.append(part["text"])
- continue
- if isinstance(part.get("text"), str):
- text_pieces.append(part["text"])
- continue
- fc = part.get("functionCall")
- if isinstance(fc, dict) and fc.get("name"):
- try:
- args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
- except (TypeError, ValueError):
- args_str = "{}"
- tool_calls.append(SimpleNamespace(
- id=f"call_{uuid.uuid4().hex[:12]}",
- type="function",
- index=i,
- function=SimpleNamespace(name=str(fc["name"]), arguments=args_str),
- ))
-
- finish_reason = "tool_calls" if tool_calls else _map_gemini_finish_reason(
- str(cand.get("finishReason") or "")
- )
-
- usage_meta = inner.get("usageMetadata") or {}
- usage = SimpleNamespace(
- prompt_tokens=int(usage_meta.get("promptTokenCount") or 0),
- completion_tokens=int(usage_meta.get("candidatesTokenCount") or 0),
- total_tokens=int(usage_meta.get("totalTokenCount") or 0),
- prompt_tokens_details=SimpleNamespace(
- cached_tokens=int(usage_meta.get("cachedContentTokenCount") or 0),
- ),
- )
-
- message = SimpleNamespace(
- role="assistant",
- content="".join(text_pieces) if text_pieces else None,
- tool_calls=tool_calls or None,
- reasoning="".join(reasoning_pieces) or None,
- reasoning_content="".join(reasoning_pieces) or None,
- reasoning_details=None,
- )
- choice = SimpleNamespace(
- index=0,
- message=message,
- finish_reason=finish_reason,
- )
- return SimpleNamespace(
- id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
- object="chat.completion",
- created=int(time.time()),
- model=model,
- choices=[choice],
- usage=usage,
- )
-
-
-def _empty_response(model: str) -> SimpleNamespace:
- message = SimpleNamespace(
- role="assistant", content="", tool_calls=None,
- reasoning=None, reasoning_content=None, reasoning_details=None,
- )
- choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
- usage = SimpleNamespace(
- prompt_tokens=0, completion_tokens=0, total_tokens=0,
- prompt_tokens_details=SimpleNamespace(cached_tokens=0),
- )
- return SimpleNamespace(
- id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
- object="chat.completion",
- created=int(time.time()),
- model=model,
- choices=[choice],
- usage=usage,
- )
-
-
-def _map_gemini_finish_reason(reason: str) -> str:
- mapping = {
- "STOP": "stop",
- "MAX_TOKENS": "length",
- "SAFETY": "content_filter",
- "RECITATION": "content_filter",
- "OTHER": "stop",
- }
- return mapping.get(reason.upper(), "stop")
-
-
-# =============================================================================
-# Streaming SSE iterator
-# =============================================================================
-
-class _GeminiStreamChunk(SimpleNamespace):
- """Mimics an OpenAI ChatCompletionChunk with .choices[0].delta."""
- pass
-
-
-def _make_stream_chunk(
- *,
- model: str,
- content: str = "",
- tool_call_delta: Optional[Dict[str, Any]] = None,
- finish_reason: Optional[str] = None,
- reasoning: str = "",
-) -> _GeminiStreamChunk:
- delta_kwargs: Dict[str, Any] = {
- "role": "assistant",
- "content": None,
- "tool_calls": None,
- "reasoning": None,
- "reasoning_content": None,
- }
- if content:
- delta_kwargs["content"] = content
- if tool_call_delta is not None:
- delta_kwargs["tool_calls"] = [SimpleNamespace(
- index=tool_call_delta.get("index", 0),
- id=tool_call_delta.get("id") or f"call_{uuid.uuid4().hex[:12]}",
- type="function",
- function=SimpleNamespace(
- name=tool_call_delta.get("name") or "",
- arguments=tool_call_delta.get("arguments") or "",
- ),
- )]
- if reasoning:
- delta_kwargs["reasoning"] = reasoning
- delta_kwargs["reasoning_content"] = reasoning
- delta = SimpleNamespace(**delta_kwargs)
- choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
- return _GeminiStreamChunk(
- id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
- object="chat.completion.chunk",
- created=int(time.time()),
- model=model,
- choices=[choice],
- usage=None,
- )
-
-
-def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
- """Parse Server-Sent Events from an httpx streaming response."""
- buffer = ""
- for chunk in response.iter_text():
- if not chunk:
- continue
- buffer += chunk
- while "\n" in buffer:
- line, buffer = buffer.split("\n", 1)
- line = line.rstrip("\r")
- if not line:
- continue
- if line.startswith("data: "):
- data = line[6:]
- if data == "[DONE]":
- return
- try:
- yield json.loads(data)
- except json.JSONDecodeError:
- logger.debug("Non-JSON SSE line: %s", data[:200])
-
-
-def _translate_stream_event(
- event: Dict[str, Any],
- model: str,
- tool_call_counter: List[int],
-) -> List[_GeminiStreamChunk]:
- """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s).
-
- ``tool_call_counter`` is a single-element list used as a mutable counter
- across events in the same stream. Each ``functionCall`` part gets a
- fresh, unique OpenAI ``index`` — keying by function name would collide
- whenever the model issues parallel calls to the same tool (e.g. reading
- three files in one turn).
- """
- inner = event.get("response") if isinstance(event.get("response"), dict) else event
- candidates = inner.get("candidates") or []
- if not candidates:
- return []
- cand = candidates[0]
- if not isinstance(cand, dict):
- return []
-
- chunks: List[_GeminiStreamChunk] = []
-
- content = cand.get("content") or {}
- parts = content.get("parts") if isinstance(content, dict) else []
- for part in parts or []:
- if not isinstance(part, dict):
- continue
- if part.get("thought") is True and isinstance(part.get("text"), str):
- chunks.append(_make_stream_chunk(
- model=model, reasoning=part["text"],
- ))
- continue
- if isinstance(part.get("text"), str) and part["text"]:
- chunks.append(_make_stream_chunk(model=model, content=part["text"]))
- fc = part.get("functionCall")
- if isinstance(fc, dict) and fc.get("name"):
- name = str(fc["name"])
- idx = tool_call_counter[0]
- tool_call_counter[0] += 1
- try:
- args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
- except (TypeError, ValueError):
- args_str = "{}"
- chunks.append(_make_stream_chunk(
- model=model,
- tool_call_delta={
- "index": idx,
- "name": name,
- "arguments": args_str,
- },
- ))
-
- finish_reason_raw = str(cand.get("finishReason") or "")
- if finish_reason_raw:
- mapped = _map_gemini_finish_reason(finish_reason_raw)
- if tool_call_counter[0] > 0:
- mapped = "tool_calls"
- chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
- return chunks
-
-
-# =============================================================================
-# GeminiCloudCodeClient — OpenAI-compatible facade
-# =============================================================================
-
-MARKER_BASE_URL = "cloudcode-pa://google"
-
-
-class _GeminiChatCompletions:
- def __init__(self, client: "GeminiCloudCodeClient"):
- self._client = client
-
- def create(self, **kwargs: Any) -> Any:
- return self._client._create_chat_completion(**kwargs)
-
-
-class _GeminiChatNamespace:
- def __init__(self, client: "GeminiCloudCodeClient"):
- self.completions = _GeminiChatCompletions(client)
-
-
-class GeminiCloudCodeClient:
- """Minimal OpenAI-SDK-compatible facade over Code Assist v1internal."""
-
- def __init__(
- self,
- *,
- api_key: Optional[str] = None,
- base_url: Optional[str] = None,
- default_headers: Optional[Dict[str, str]] = None,
- project_id: str = "",
- **_: Any,
- ):
- # `api_key` here is a dummy — real auth is the OAuth access token
- # fetched on every call via agent.google_oauth.get_valid_access_token().
- # We accept the kwarg for openai.OpenAI interface parity.
- self.api_key = api_key or "google-oauth"
- self.base_url = base_url or MARKER_BASE_URL
- self._default_headers = dict(default_headers or {})
- self._configured_project_id = project_id
- self._project_context: Optional[ProjectContext] = None
- self._project_context_lock = False # simple single-thread guard
- self.chat = _GeminiChatNamespace(self)
- self.is_closed = False
- self._http = httpx.Client(timeout=httpx.Timeout(connect=15.0, read=600.0, write=30.0, pool=30.0))
-
- def close(self) -> None:
- self.is_closed = True
- try:
- self._http.close()
- except Exception:
- pass
-
- # Implement the OpenAI SDK's context-manager-ish closure check
- def __enter__(self):
- return self
-
- def __exit__(self, exc_type, exc_val, exc_tb):
- self.close()
-
- def _ensure_project_context(self, access_token: str, model: str) -> ProjectContext:
- """Lazily resolve and cache the project context for this client."""
- if self._project_context is not None:
- return self._project_context
-
- env_project = google_oauth.resolve_project_id_from_env()
- creds = google_oauth.load_credentials()
- stored_project = creds.project_id if creds else ""
-
- # Prefer what's already baked into the creds
- if stored_project:
- self._project_context = ProjectContext(
- project_id=stored_project,
- managed_project_id=creds.managed_project_id if creds else "",
- tier_id="",
- source="stored",
- )
- return self._project_context
-
- ctx = resolve_project_context(
- access_token,
- configured_project_id=self._configured_project_id,
- env_project_id=env_project,
- user_agent_model=model,
- )
- # Persist discovered project back to the creds file so the next
- # session doesn't re-run the discovery.
- if ctx.project_id or ctx.managed_project_id:
- google_oauth.update_project_ids(
- project_id=ctx.project_id,
- managed_project_id=ctx.managed_project_id,
- )
- self._project_context = ctx
- return ctx
-
- def _create_chat_completion(
- self,
- *,
- model: str = "gemini-2.5-flash",
- messages: Optional[List[Dict[str, Any]]] = None,
- stream: bool = False,
- tools: Any = None,
- tool_choice: Any = None,
- temperature: Optional[float] = None,
- max_tokens: Optional[int] = None,
- top_p: Optional[float] = None,
- stop: Any = None,
- extra_body: Optional[Dict[str, Any]] = None,
- timeout: Any = None,
- **_: Any,
- ) -> Any:
- access_token = google_oauth.get_valid_access_token()
- ctx = self._ensure_project_context(access_token, model)
-
- thinking_config = None
- if isinstance(extra_body, dict):
- thinking_config = extra_body.get("thinking_config") or extra_body.get("thinkingConfig")
-
- inner = build_gemini_request(
- messages=messages or [],
- tools=tools,
- tool_choice=tool_choice,
- temperature=temperature,
- max_tokens=max_tokens,
- top_p=top_p,
- stop=stop,
- thinking_config=thinking_config,
- )
- wrapped = wrap_code_assist_request(
- project_id=ctx.project_id,
- model=model,
- inner_request=inner,
- )
-
- headers = {
- "Content-Type": "application/json",
- "Accept": "application/json",
- "Authorization": f"Bearer {access_token}",
- "User-Agent": "hermes-agent (gemini-cli-compat)",
- "X-Goog-Api-Client": "gl-python/hermes",
- "x-activity-request-id": str(uuid.uuid4()),
- }
- headers.update(self._default_headers)
-
- if stream:
- return self._stream_completion(model=model, wrapped=wrapped, headers=headers)
-
- url = f"{CODE_ASSIST_ENDPOINT}/v1internal:generateContent"
- response = self._http.post(url, json=wrapped, headers=headers)
- if response.status_code != 200:
- raise _gemini_http_error(response)
- try:
- payload = response.json()
- except ValueError as exc:
- raise CodeAssistError(
- f"Invalid JSON from Code Assist: {exc}",
- code="code_assist_invalid_json",
- ) from exc
- return _translate_gemini_response(payload, model=model)
-
- def _stream_completion(
- self,
- *,
- model: str,
- wrapped: Dict[str, Any],
- headers: Dict[str, str],
- ) -> Iterator[_GeminiStreamChunk]:
- """Generator that yields OpenAI-shaped streaming chunks."""
- url = f"{CODE_ASSIST_ENDPOINT}/v1internal:streamGenerateContent?alt=sse"
- stream_headers = dict(headers)
- stream_headers["Accept"] = "text/event-stream"
-
- def _generator() -> Iterator[_GeminiStreamChunk]:
- try:
- with self._http.stream("POST", url, json=wrapped, headers=stream_headers) as response:
- if response.status_code != 200:
- # Materialize error body for better diagnostics
- response.read()
- raise _gemini_http_error(response)
- tool_call_counter: List[int] = [0]
- for event in _iter_sse_events(response):
- for chunk in _translate_stream_event(event, model, tool_call_counter):
- yield chunk
- except httpx.HTTPError as exc:
- raise CodeAssistError(
- f"Streaming request failed: {exc}",
- code="code_assist_stream_error",
- ) from exc
-
- return _generator()
-
-
-def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
- """Translate an httpx response into a CodeAssistError with rich metadata.
-
- Parses Google's error envelope (``{"error": {"code", "message", "status",
- "details": [...]}}``) so the agent's error classifier can reason about
- the failure — ``status_code`` enables the rate_limit / auth classification
- paths, and ``response`` lets the main loop honor ``Retry-After`` just
- like it does for OpenAI SDK exceptions.
-
- Also lifts a few recognizable Google conditions into human-readable
- messages so the user sees something better than a 500-char JSON dump:
-
- MODEL_CAPACITY_EXHAUSTED → "Gemini model capacity exhausted for
- <model>. This is a Google-side throttle..."
- RESOURCE_EXHAUSTED w/o reason → quota-style message
- 404 → "Model not found at cloudcode-pa..."
- """
- status = response.status_code
-
- # Parse the body once, surviving any weird encodings.
- body_text = ""
- body_json: Dict[str, Any] = {}
- try:
- body_text = response.text
- except Exception:
- body_text = ""
- if body_text:
- try:
- parsed = json.loads(body_text)
- if isinstance(parsed, dict):
- body_json = parsed
- except (ValueError, TypeError):
- body_json = {}
-
- # Dig into Google's error envelope. Shape is:
- # {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED",
- # "details": [{"@type": ".../ErrorInfo", "reason": "MODEL_CAPACITY_EXHAUSTED",
- # "metadata": {...}},
- # {"@type": ".../RetryInfo", "retryDelay": "30s"}]}}
- err_obj = body_json.get("error") if isinstance(body_json, dict) else None
- if not isinstance(err_obj, dict):
- err_obj = {}
- err_status = str(err_obj.get("status") or "").strip()
- err_message = str(err_obj.get("message") or "").strip()
- _raw_details = err_obj.get("details")
- err_details_list = _raw_details if isinstance(_raw_details, list) else []
-
- # Extract google.rpc.ErrorInfo reason + metadata. There may be more
- # than one ErrorInfo (rare), so we pick the first one with a reason.
- error_reason = ""
- error_metadata: Dict[str, Any] = {}
- retry_delay_seconds: Optional[float] = None
- for detail in err_details_list:
- if not isinstance(detail, dict):
- continue
- type_url = str(detail.get("@type") or "")
- if not error_reason and type_url.endswith("/google.rpc.ErrorInfo"):
- reason = detail.get("reason")
- if isinstance(reason, str) and reason:
- error_reason = reason
- md = detail.get("metadata")
- if isinstance(md, dict):
- error_metadata = md
- elif retry_delay_seconds is None and type_url.endswith("/google.rpc.RetryInfo"):
- # retryDelay is a google.protobuf.Duration string like "30s" or "1.5s".
- delay_raw = detail.get("retryDelay")
- if isinstance(delay_raw, str) and delay_raw.endswith("s"):
- try:
- retry_delay_seconds = float(delay_raw[:-1])
- except ValueError:
- pass
- elif isinstance(delay_raw, (int, float)):
- retry_delay_seconds = float(delay_raw)
-
- # Fall back to the Retry-After header if the body didn't include RetryInfo.
- if retry_delay_seconds is None:
- try:
- header_val = response.headers.get("Retry-After") or response.headers.get("retry-after")
- except Exception:
- header_val = None
- if header_val:
- try:
- retry_delay_seconds = float(header_val)
- except (TypeError, ValueError):
- retry_delay_seconds = None
-
- # Classify the error code. ``code_assist_rate_limited`` stays the default
- # for 429s; a more specific reason tag helps downstream callers (e.g. tests,
- # logs) without changing the rate_limit classification path.
- code = f"code_assist_http_{status}"
- if status == 401:
- code = "code_assist_unauthorized"
- elif status == 429:
- code = "code_assist_rate_limited"
- if error_reason == "MODEL_CAPACITY_EXHAUSTED":
- code = "code_assist_capacity_exhausted"
-
- # Build a human-readable message. Keep the status + a raw-body tail for
- # debugging, but lead with a friendlier summary when we recognize the
- # Google signal.
- model_hint = ""
- if isinstance(error_metadata, dict):
- model_hint = str(error_metadata.get("model") or error_metadata.get("modelId") or "").strip()
-
- if status == 429 and error_reason == "MODEL_CAPACITY_EXHAUSTED":
- target = model_hint or "this Gemini model"
- message = (
- f"Gemini capacity exhausted for {target} (Google-side throttle, "
- f"not a Hermes issue). Try a different Gemini model or set a "
- f"fallback_providers entry to a non-Gemini provider."
- )
- if retry_delay_seconds is not None:
- message += f" Google suggests retrying in {retry_delay_seconds:g}s."
- elif status == 429 and err_status == "RESOURCE_EXHAUSTED":
- message = (
- f"Gemini quota exhausted ({err_message or 'RESOURCE_EXHAUSTED'}). "
- f"Check /gquota for remaining daily requests."
- )
- if retry_delay_seconds is not None:
- message += f" Retry suggested in {retry_delay_seconds:g}s."
- elif status == 404:
- # Google returns 404 when a model has been retired or renamed.
- target = model_hint or (err_message or "model")
- message = (
- f"Code Assist 404: {target} is not available at "
- f"cloudcode-pa.googleapis.com. It may have been renamed or "
- f"retired. Check hermes_cli/models.py for the current list."
- )
- elif err_message:
- # Generic fallback with the parsed message.
- message = f"Code Assist HTTP {status} ({err_status or 'error'}): {err_message}"
- else:
- # Last-ditch fallback — raw body snippet.
- message = f"Code Assist returned HTTP {status}: {body_text[:500]}"
-
- return CodeAssistError(
- message,
- code=code,
- status_code=status,
- response=response,
- retry_after=retry_delay_seconds,
- details={
- "status": err_status,
- "reason": error_reason,
- "metadata": error_metadata,
- "message": err_message,
- },
- )
diff --git a/agent/google_code_assist.py b/agent/google_code_assist.py
deleted file mode 100644
index eec6441f80e..00000000000
--- a/agent/google_code_assist.py
+++ /dev/null
@@ -1,451 +0,0 @@
-"""Google Code Assist API client — project discovery, onboarding, quota.
-
-The Code Assist API powers Google's official gemini-cli. It sits at
-``cloudcode-pa.googleapis.com`` and provides:
-
-- Free tier access (generous daily quota) for personal Google accounts
-- Paid tier access via GCP projects with billing / Workspace / Standard / Enterprise
-
-This module handles the control-plane dance needed before inference:
-
-1. ``load_code_assist()`` — probe the user's account to learn what tier they're on
- and whether a ``cloudaicompanionProject`` is already assigned.
-2. ``onboard_user()`` — if the user hasn't been onboarded yet (new account, fresh
- free tier, etc.), call this with the chosen tier + project id. Supports LRO
- polling for slow provisioning.
-3. ``retrieve_user_quota()`` — fetch the ``buckets[]`` array showing remaining
- quota per model, used by the ``/gquota`` slash command.
-
-VPC-SC handling: enterprise accounts under a VPC Service Controls perimeter
-will get ``SECURITY_POLICY_VIOLATED`` on ``load_code_assist``. We catch this
-and force the account to ``standard-tier`` so the call chain still succeeds.
-
-Derived from opencode-gemini-auth (MIT) and clawdbot/extensions/google. The
-request/response shapes are specific to Google's internal Code Assist API,
-documented nowhere public — we copy them from the reference implementations.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-import time
-import urllib.error
-import urllib.request
-import uuid
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional
-
-logger = logging.getLogger(__name__)
-
-
-# =============================================================================
-# Constants
-# =============================================================================
-
-CODE_ASSIST_ENDPOINT = "https://cloudcode-pa.googleapis.com"
-
-# Fallback endpoints tried when prod returns an error during project discovery
-FALLBACK_ENDPOINTS = [
- "https://daily-cloudcode-pa.sandbox.googleapis.com",
- "https://autopush-cloudcode-pa.sandbox.googleapis.com",
-]
-
-# Tier identifiers that Google's API uses
-FREE_TIER_ID = "free-tier"
-LEGACY_TIER_ID = "legacy-tier"
-STANDARD_TIER_ID = "standard-tier"
-
-# Default HTTP headers matching gemini-cli's fingerprint.
-# Google may reject unrecognized User-Agents on these internal endpoints.
-_GEMINI_CLI_USER_AGENT = "google-api-nodejs-client/9.15.1 (gzip)"
-_X_GOOG_API_CLIENT = "gl-node/24.0.0"
-_DEFAULT_REQUEST_TIMEOUT = 30.0
-_ONBOARDING_POLL_ATTEMPTS = 12
-_ONBOARDING_POLL_INTERVAL_SECONDS = 5.0
-
-
-class CodeAssistError(RuntimeError):
- """Exception raised by the Code Assist (``cloudcode-pa``) integration.
-
- Carries HTTP status / response / retry-after metadata so the agent's
- ``error_classifier._extract_status_code`` and the main loop's Retry-After
- handling (which walks ``error.response.headers``) pick up the right
- signals. Without these, 429s from the OAuth path look like opaque
- ``RuntimeError`` and skip the rate-limit path.
- """
-
- def __init__(
- self,
- message: str,
- *,
- code: str = "code_assist_error",
- status_code: Optional[int] = None,
- response: Any = None,
- retry_after: Optional[float] = None,
- details: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(message)
- self.code = code
- # ``status_code`` is picked up by ``agent.error_classifier._extract_status_code``
- # so a 429 from Code Assist classifies as FailoverReason.rate_limit and
- # triggers the main loop's fallback_providers chain the same way SDK
- # errors do.
- self.status_code = status_code
- # ``response`` is the underlying ``httpx.Response`` (or a shim with a
- # ``.headers`` mapping and ``.json()`` method). The main loop reads
- # ``error.response.headers["Retry-After"]`` to honor Google's retry
- # hints when the backend throttles us.
- self.response = response
- # Parsed ``Retry-After`` seconds (kept separately for convenience —
- # Google returns retry hints in both the header and the error body's
- # ``google.rpc.RetryInfo`` details, and we pick whichever we found).
- self.retry_after = retry_after
- # Parsed structured error details from the Google error envelope
- # (e.g. ``{"reason": "MODEL_CAPACITY_EXHAUSTED", "status": "RESOURCE_EXHAUSTED"}``).
- # Useful for logging and for tests that want to assert on specifics.
- self.details = details or {}
-
-
-class ProjectIdRequiredError(CodeAssistError):
- def __init__(self, message: str = "GCP project id required for this tier") -> None:
- super().__init__(message, code="code_assist_project_id_required")
-
-
-# =============================================================================
-# HTTP primitive (auth via Bearer token passed per-call)
-# =============================================================================
-
-def _build_headers(access_token: str, *, user_agent_model: str = "") -> Dict[str, str]:
- ua = _GEMINI_CLI_USER_AGENT
- if user_agent_model:
- ua = f"{ua} model/{user_agent_model}"
- return {
- "Content-Type": "application/json",
- "Accept": "application/json",
- "Authorization": f"Bearer {access_token}",
- "User-Agent": ua,
- "X-Goog-Api-Client": _X_GOOG_API_CLIENT,
- "x-activity-request-id": str(uuid.uuid4()),
- }
-
-
-def _client_metadata() -> Dict[str, str]:
- """Match Google's gemini-cli exactly — unrecognized metadata may be rejected."""
- return {
- "ideType": "IDE_UNSPECIFIED",
- "platform": "PLATFORM_UNSPECIFIED",
- "pluginType": "GEMINI",
- }
-
-
-def _post_json(
- url: str,
- body: Dict[str, Any],
- access_token: str,
- *,
- timeout: float = _DEFAULT_REQUEST_TIMEOUT,
- user_agent_model: str = "",
-) -> Dict[str, Any]:
- data = json.dumps(body).encode("utf-8")
- request = urllib.request.Request(
- url, data=data, method="POST",
- headers=_build_headers(access_token, user_agent_model=user_agent_model),
- )
- try:
- with urllib.request.urlopen(request, timeout=timeout) as response:
- raw = response.read().decode("utf-8", errors="replace")
- return json.loads(raw) if raw else {}
- except urllib.error.HTTPError as exc:
- detail = ""
- try:
- detail = exc.read().decode("utf-8", errors="replace")
- except Exception:
- pass
- # Special case: VPC-SC violation should be distinguishable
- if _is_vpc_sc_violation(detail):
- raise CodeAssistError(
- f"VPC-SC policy violation: {detail}",
- code="code_assist_vpc_sc",
- ) from exc
- raise CodeAssistError(
- f"Code Assist HTTP {exc.code}: {detail or exc.reason}",
- code=f"code_assist_http_{exc.code}",
- ) from exc
- except urllib.error.URLError as exc:
- raise CodeAssistError(
- f"Code Assist request failed: {exc}",
- code="code_assist_network_error",
- ) from exc
-
-
-def _is_vpc_sc_violation(body: str) -> bool:
- """Detect a VPC Service Controls violation from a response body."""
- if not body:
- return False
- try:
- parsed = json.loads(body)
- except (json.JSONDecodeError, ValueError):
- return "SECURITY_POLICY_VIOLATED" in body
- # Walk the nested error structure Google uses
- error = parsed.get("error") if isinstance(parsed, dict) else None
- if not isinstance(error, dict):
- return False
- details = error.get("details") or []
- if isinstance(details, list):
- for item in details:
- if isinstance(item, dict):
- reason = item.get("reason") or ""
- if reason == "SECURITY_POLICY_VIOLATED":
- return True
- msg = str(error.get("message", ""))
- return "SECURITY_POLICY_VIOLATED" in msg
-
-
-# =============================================================================
-# load_code_assist — discovers current tier + assigned project
-# =============================================================================
-
-@dataclass
-class CodeAssistProjectInfo:
- """Result from ``load_code_assist``."""
- current_tier_id: str = ""
- cloudaicompanion_project: str = "" # Google-managed project (free tier)
- allowed_tiers: List[str] = field(default_factory=list)
- raw: Dict[str, Any] = field(default_factory=dict)
-
-
-def load_code_assist(
- access_token: str,
- *,
- project_id: str = "",
- user_agent_model: str = "",
-) -> CodeAssistProjectInfo:
- """Call ``POST /v1internal:loadCodeAssist`` with prod → sandbox fallback.
-
- Returns whatever tier + project info Google reports. On VPC-SC violations,
- returns a synthetic ``standard-tier`` result so the chain can continue.
- """
- body: Dict[str, Any] = {
- "metadata": {
- "duetProject": project_id,
- **_client_metadata(),
- },
- }
- if project_id:
- body["cloudaicompanionProject"] = project_id
-
- endpoints = [CODE_ASSIST_ENDPOINT] + FALLBACK_ENDPOINTS
- last_err: Optional[Exception] = None
- for endpoint in endpoints:
- url = f"{endpoint}/v1internal:loadCodeAssist"
- try:
- resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
- return _parse_load_response(resp)
- except CodeAssistError as exc:
- if exc.code == "code_assist_vpc_sc":
- logger.info("VPC-SC violation on %s — defaulting to standard-tier", endpoint)
- return CodeAssistProjectInfo(
- current_tier_id=STANDARD_TIER_ID,
- cloudaicompanion_project=project_id,
- )
- last_err = exc
- logger.warning("loadCodeAssist failed on %s: %s", endpoint, exc)
- continue
- if last_err:
- raise last_err
- return CodeAssistProjectInfo()
-
-
-def _parse_load_response(resp: Dict[str, Any]) -> CodeAssistProjectInfo:
- current_tier = resp.get("currentTier") or {}
- tier_id = str(current_tier.get("id") or "") if isinstance(current_tier, dict) else ""
- project = str(resp.get("cloudaicompanionProject") or "")
- allowed = resp.get("allowedTiers") or []
- allowed_ids: List[str] = []
- if isinstance(allowed, list):
- for t in allowed:
- if isinstance(t, dict):
- tid = str(t.get("id") or "")
- if tid:
- allowed_ids.append(tid)
- return CodeAssistProjectInfo(
- current_tier_id=tier_id,
- cloudaicompanion_project=project,
- allowed_tiers=allowed_ids,
- raw=resp,
- )
-
-
-# =============================================================================
-# onboard_user — provisions a new user on a tier (with LRO polling)
-# =============================================================================
-
-def onboard_user(
- access_token: str,
- *,
- tier_id: str,
- project_id: str = "",
- user_agent_model: str = "",
-) -> Dict[str, Any]:
- """Call ``POST /v1internal:onboardUser`` to provision the user.
-
- For paid tiers, ``project_id`` is REQUIRED (raises ProjectIdRequiredError).
- For the free and legacy tiers, ``project_id`` is optional (Google assigns one on the free tier).
-
- Returns the final operation response. Polls ``/v1internal/{op_name}`` for up
- to ``_ONBOARDING_POLL_ATTEMPTS`` × ``_ONBOARDING_POLL_INTERVAL_SECONDS``
- (default: 12 × 5s = 1 min).
- """
- if tier_id != FREE_TIER_ID and tier_id != LEGACY_TIER_ID and not project_id:
- raise ProjectIdRequiredError(
- f"Tier {tier_id!r} requires a GCP project id. "
- "Set HERMES_GEMINI_PROJECT_ID or GOOGLE_CLOUD_PROJECT."
- )
-
- body: Dict[str, Any] = {
- "tierId": tier_id,
- "metadata": _client_metadata(),
- }
- if project_id:
- body["cloudaicompanionProject"] = project_id
-
- endpoint = CODE_ASSIST_ENDPOINT
- url = f"{endpoint}/v1internal:onboardUser"
- resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
-
- # Poll if LRO (long-running operation)
- if not resp.get("done"):
- op_name = resp.get("name", "")
- if not op_name:
- return resp
- for attempt in range(_ONBOARDING_POLL_ATTEMPTS):
- time.sleep(_ONBOARDING_POLL_INTERVAL_SECONDS)
- poll_url = f"{endpoint}/v1internal/{op_name}"
- try:
- poll_resp = _post_json(poll_url, {}, access_token, user_agent_model=user_agent_model)
- except CodeAssistError as exc:
- logger.warning("Onboarding poll attempt %d failed: %s", attempt + 1, exc)
- continue
- if poll_resp.get("done"):
- return poll_resp
- logger.warning("Onboarding did not complete within %d attempts", _ONBOARDING_POLL_ATTEMPTS)
- return resp
-
-
-# =============================================================================
-# retrieve_user_quota — for /gquota
-# =============================================================================
-
-@dataclass
-class QuotaBucket:
- model_id: str
- token_type: str = ""
- remaining_fraction: float = 0.0
- reset_time_iso: str = ""
- raw: Dict[str, Any] = field(default_factory=dict)
-
-
-def retrieve_user_quota(
- access_token: str,
- *,
- project_id: str = "",
- user_agent_model: str = "",
-) -> List[QuotaBucket]:
- """Call ``POST /v1internal:retrieveUserQuota`` and parse ``buckets[]``."""
- body: Dict[str, Any] = {}
- if project_id:
- body["project"] = project_id
- url = f"{CODE_ASSIST_ENDPOINT}/v1internal:retrieveUserQuota"
- resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
- raw_buckets = resp.get("buckets") or []
- buckets: List[QuotaBucket] = []
- if not isinstance(raw_buckets, list):
- return buckets
- for b in raw_buckets:
- if not isinstance(b, dict):
- continue
- buckets.append(QuotaBucket(
- model_id=str(b.get("modelId") or ""),
- token_type=str(b.get("tokenType") or ""),
- remaining_fraction=float(b.get("remainingFraction") or 0.0),
- reset_time_iso=str(b.get("resetTime") or ""),
- raw=b,
- ))
- return buckets
-
-
-# =============================================================================
-# Project context resolution
-# =============================================================================
-
-@dataclass
-class ProjectContext:
- """Resolved state for a given OAuth session."""
- project_id: str = "" # effective project id sent on requests
- managed_project_id: str = "" # Google-assigned project (free tier)
- tier_id: str = ""
- source: str = "" # "env", "config", "discovered", "onboarded"
-
-
-def resolve_project_context(
- access_token: str,
- *,
- configured_project_id: str = "",
- env_project_id: str = "",
- user_agent_model: str = "",
-) -> ProjectContext:
- """Figure out what project id + tier to use for requests.
-
- Priority:
- 1. If configured_project_id or env_project_id is set, use that directly
- and short-circuit (no discovery needed).
- 2. Otherwise call loadCodeAssist to see what Google says.
- 3. If no tier assigned yet, onboard the user (free tier default).
- """
- # Short-circuit: caller provided a project id
- if configured_project_id:
- return ProjectContext(
- project_id=configured_project_id,
- tier_id=STANDARD_TIER_ID, # assume paid since they specified one
- source="config",
- )
- if env_project_id:
- return ProjectContext(
- project_id=env_project_id,
- tier_id=STANDARD_TIER_ID,
- source="env",
- )
-
- # Discover via loadCodeAssist
- info = load_code_assist(access_token, user_agent_model=user_agent_model)
-
- effective_project = info.cloudaicompanion_project
- tier = info.current_tier_id
-
- if not tier:
- # User hasn't been onboarded — provision them on free tier
- onboard_resp = onboard_user(
- access_token,
- tier_id=FREE_TIER_ID,
- project_id="",
- user_agent_model=user_agent_model,
- )
- # Re-parse from the onboard response
- response_body = onboard_resp.get("response") or {}
- if isinstance(response_body, dict):
- effective_project = (
- effective_project
- or str(response_body.get("cloudaicompanionProject") or "")
- )
- tier = FREE_TIER_ID
- source = "onboarded"
- else:
- source = "discovered"
-
- return ProjectContext(
- project_id=effective_project,
- managed_project_id=effective_project if tier == FREE_TIER_ID else "",
- tier_id=tier,
- source=source,
- )
diff --git a/agent/google_oauth.py b/agent/google_oauth.py
deleted file mode 100644
index 9eb55ec19dc..00000000000
--- a/agent/google_oauth.py
+++ /dev/null
@@ -1,1067 +0,0 @@
-"""Google OAuth PKCE flow for the Gemini (google-gemini-cli) inference provider.
-
-This module implements Authorization Code + PKCE (S256) OAuth against Google's
-accounts.google.com endpoints. The resulting access token is used by
-``agent.gemini_cloudcode_adapter`` to talk to ``cloudcode-pa.googleapis.com``
-(Google's Code Assist backend that powers the Gemini CLI's free and paid tiers).
-
-Synthesized from:
-- jenslys/opencode-gemini-auth (MIT) — overall flow shape, public OAuth creds, request format
-- clawdbot/extensions/google/ — refresh-token rotation, VPC-SC handling reference
-- PRs #10176 (@sliverp) and #10779 (@newarthur) — PKCE module structure, cross-process lock
-
-Storage (``~/.hermes/auth/google_oauth.json``, chmod 0o600):
-
- {
- "refresh": "refreshToken|projectId|managedProjectId",
- "access": "...",
- "expires": 1744848000000, // unix MILLIseconds
- "email": "user@example.com"
- }
-
-The ``refresh`` field packs the refresh_token together with the resolved GCP
-project IDs so subsequent sessions don't need to re-discover the project.
-This matches opencode-gemini-auth's storage contract exactly.
-
-The packed format stays parseable even if no project IDs are present: a bare
-refresh_token is treated as "packed with empty IDs".
-
-Public client credentials
--------------------------
-The client_id and client_secret below are Google's PUBLIC desktop OAuth client
-for their own open-source gemini-cli. They are baked into every copy of the
-gemini-cli npm package and are NOT confidential — desktop OAuth clients have
-no secret-keeping requirement (PKCE provides the security). Shipping them here
-is consistent with opencode-gemini-auth and the official Google gemini-cli.
-
-Policy note: Google considers using this OAuth client with third-party software
-a policy violation. Users see an upfront warning with ``confirm(default=False)``
-before authorization begins.
-"""
-
-from __future__ import annotations
-
-import base64
-import contextlib
-import hashlib
-import http.server
-import json
-import logging
-import os
-import secrets
-import stat
-import threading
-import time
-import urllib.error
-import urllib.parse
-import urllib.request
-from dataclasses import dataclass
-from pathlib import Path
-from typing import Any, Dict, Optional, Tuple
-
-from hermes_constants import get_hermes_home, secure_parent_dir
-
-logger = logging.getLogger(__name__)
-
-
-# =============================================================================
-# OAuth client credential resolution.
-#
-# Resolution order:
-# 1. HERMES_GEMINI_CLIENT_ID / HERMES_GEMINI_CLIENT_SECRET env vars (power users)
-# 2. Shipped defaults — Google's public gemini-cli desktop OAuth client
-# (baked into every copy of Google's open-source gemini-cli; NOT
-# confidential — desktop OAuth clients use PKCE, not client_secret, for
-# security). Using these matches opencode-gemini-auth behavior.
-# 3. Fallback: scrape from a locally installed gemini-cli binary (helps forks
-# that deliberately wipe the shipped defaults).
-# 4. Fail with a helpful error.
-# =============================================================================
-
-ENV_CLIENT_ID = "HERMES_GEMINI_CLIENT_ID"
-ENV_CLIENT_SECRET = "HERMES_GEMINI_CLIENT_SECRET"
-
-# Public gemini-cli desktop OAuth client (shipped in Google's open-source
-# gemini-cli MIT repo). Composed piecewise to keep the constants readable and
-# to pair each piece with an explicit comment about why it is non-confidential.
-# See: https://github.com/google-gemini/gemini-cli/blob/main/packages/core/src/code_assist/oauth2.ts
-_PUBLIC_CLIENT_ID_PROJECT_NUM = "681255809395"
-_PUBLIC_CLIENT_ID_HASH = "oo8ft2oprdrnp9e3aqf6av3hmdib135j"
-_PUBLIC_CLIENT_SECRET_SUFFIX = "4uHgMPm-1o7Sk-geV6Cu5clXFsxl"
-
-_DEFAULT_CLIENT_ID = (
- f"{_PUBLIC_CLIENT_ID_PROJECT_NUM}-{_PUBLIC_CLIENT_ID_HASH}"
- ".apps.googleusercontent.com"
-)
-_DEFAULT_CLIENT_SECRET = f"GOCSPX-{_PUBLIC_CLIENT_SECRET_SUFFIX}"
-
-# Regex patterns for fallback scraping from an installed gemini-cli.
-import re as _re
-from utils import atomic_replace
-_CLIENT_ID_PATTERN = _re.compile(
- r"OAUTH_CLIENT_ID\s*=\s*['\"]([0-9]+-[a-z0-9]+\.apps\.googleusercontent\.com)['\"]"
-)
-_CLIENT_SECRET_PATTERN = _re.compile(
- r"OAUTH_CLIENT_SECRET\s*=\s*['\"](GOCSPX-[A-Za-z0-9_-]+)['\"]"
-)
-_CLIENT_ID_SHAPE = _re.compile(r"([0-9]{8,}-[a-z0-9]{20,}\.apps\.googleusercontent\.com)")
-_CLIENT_SECRET_SHAPE = _re.compile(r"(GOCSPX-[A-Za-z0-9_-]{20,})")
-
-
-# =============================================================================
-# Endpoints & constants
-# =============================================================================
-
-AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
-TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"
-USERINFO_ENDPOINT = "https://www.googleapis.com/oauth2/v1/userinfo"
-
-OAUTH_SCOPES = (
- "https://www.googleapis.com/auth/cloud-platform "
- "https://www.googleapis.com/auth/userinfo.email "
- "https://www.googleapis.com/auth/userinfo.profile"
-)
-
-DEFAULT_REDIRECT_PORT = 8085
-REDIRECT_HOST = "127.0.0.1"
-CALLBACK_PATH = "/oauth2callback"
-
-# 60-second clock skew buffer (matches opencode-gemini-auth).
-REFRESH_SKEW_SECONDS = 60
-
-TOKEN_REQUEST_TIMEOUT_SECONDS = 20.0
-CALLBACK_WAIT_SECONDS = 300
-LOCK_TIMEOUT_SECONDS = 30.0
-
-# Headless env detection
-_HEADLESS_ENV_VARS = ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY", "HERMES_HEADLESS")
-
-
-# =============================================================================
-# Error type
-# =============================================================================
-
-class GoogleOAuthError(RuntimeError):
- """Raised for any failure in the Google OAuth flow."""
-
- def __init__(self, message: str, *, code: str = "google_oauth_error") -> None:
- super().__init__(message)
- self.code = code
-
-
-# =============================================================================
-# File paths & cross-process locking
-# =============================================================================
-
-def _credentials_path() -> Path:
- return get_hermes_home() / "auth" / "google_oauth.json"
-
-
-def _lock_path() -> Path:
- return _credentials_path().with_suffix(".json.lock")
-
-
-_lock_state = threading.local()
-
-
-@contextlib.contextmanager
-def _credentials_lock(timeout_seconds: float = LOCK_TIMEOUT_SECONDS):
- """Cross-process lock around the credentials file (fcntl POSIX / msvcrt Windows)."""
- depth = getattr(_lock_state, "depth", 0)
- if depth > 0:
- _lock_state.depth = depth + 1
- try:
- yield
- finally:
- _lock_state.depth -= 1
- return
-
- lock_file_path = _lock_path()
- lock_file_path.parent.mkdir(parents=True, exist_ok=True)
- fd = os.open(str(lock_file_path), os.O_CREAT | os.O_RDWR, 0o600)
- acquired = False
- try:
- try:
- import fcntl
- except ImportError:
- fcntl = None
-
- if fcntl is not None:
- deadline = time.monotonic() + max(0.0, float(timeout_seconds))
- while True:
- try:
- fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
- acquired = True
- break
- except BlockingIOError:
- if time.monotonic() >= deadline:
- raise TimeoutError(
- f"Timed out acquiring Google OAuth credentials lock at {lock_file_path}."
- )
- time.sleep(0.05)
- else:
- try:
- import msvcrt # type: ignore[import-not-found]
-
- deadline = time.monotonic() + max(0.0, float(timeout_seconds))
- while True:
- try:
- msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
- acquired = True
- break
- except OSError:
- if time.monotonic() >= deadline:
- raise TimeoutError(
- f"Timed out acquiring Google OAuth credentials lock at {lock_file_path}."
- )
- time.sleep(0.05)
- except ImportError:
- acquired = True
-
- _lock_state.depth = 1
- yield
- finally:
- try:
- if acquired:
- try:
- import fcntl
-
- fcntl.flock(fd, fcntl.LOCK_UN)
- except ImportError:
- try:
- import msvcrt # type: ignore[import-not-found]
-
- try:
- msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
- except OSError:
- pass
- except ImportError:
- pass
- finally:
- os.close(fd)
- _lock_state.depth = 0
-
-
-# =============================================================================
-# Client ID resolution
-# =============================================================================
-
-_scraped_creds_cache: Dict[str, str] = {}
-
-
-def _locate_gemini_cli_oauth_js() -> Optional[Path]:
- """Walk the user's gemini binary install to find its oauth2.js.
-
- Returns None if gemini isn't installed. Supports both the npm install
- (``node_modules/@google/gemini-cli-core/dist/**/code_assist/oauth2.js``)
- and the Homebrew ``bundle/`` layout.
- """
- import shutil
-
- gemini = shutil.which("gemini")
- if not gemini:
- return None
-
- try:
- real = Path(gemini).resolve()
- except OSError:
- return None
-
- # Walk up from the binary to find npm install root
- search_dirs: list[Path] = []
- cur = real.parent
- for _ in range(8): # don't walk too far
- search_dirs.append(cur)
- if (cur / "node_modules").exists():
- search_dirs.append(cur / "node_modules" / "@google" / "gemini-cli-core")
- break
- if cur.parent == cur:
- break
- cur = cur.parent
-
- for root in search_dirs:
- if not root.exists():
- continue
- # Common known paths
- candidates = [
- root / "dist" / "src" / "code_assist" / "oauth2.js",
- root / "dist" / "code_assist" / "oauth2.js",
- root / "src" / "code_assist" / "oauth2.js",
- ]
- for c in candidates:
- if c.exists():
- return c
- # Recursive fallback: first oauth2.js found anywhere under root (rglob is not depth-limited)
- try:
- for path in root.rglob("oauth2.js"):
- return path
- except (OSError, ValueError):
- continue
-
- return None
-
-
-def _scrape_client_credentials() -> Tuple[str, str]:
- """Extract client_id + client_secret from the local gemini-cli install."""
- if _scraped_creds_cache.get("resolved"):
- return _scraped_creds_cache.get("client_id", ""), _scraped_creds_cache.get("client_secret", "")
-
- oauth_js = _locate_gemini_cli_oauth_js()
- if oauth_js is None:
- _scraped_creds_cache["resolved"] = "1" # Don't retry on every call
- return "", ""
-
- try:
- content = oauth_js.read_text(encoding="utf-8", errors="replace")
- except OSError as exc:
- logger.debug("Failed to read oauth2.js at %s: %s", oauth_js, exc)
- _scraped_creds_cache["resolved"] = "1"
- return "", ""
-
- # Precise pattern first, then fallback shape match
- cid_match = _CLIENT_ID_PATTERN.search(content) or _CLIENT_ID_SHAPE.search(content)
- cs_match = _CLIENT_SECRET_PATTERN.search(content) or _CLIENT_SECRET_SHAPE.search(content)
-
- client_id = cid_match.group(1) if cid_match else ""
- client_secret = cs_match.group(1) if cs_match else ""
-
- _scraped_creds_cache["client_id"] = client_id
- _scraped_creds_cache["client_secret"] = client_secret
- _scraped_creds_cache["resolved"] = "1"
-
- if client_id:
- logger.info("Scraped Gemini OAuth client from %s", oauth_js)
-
- return client_id, client_secret
-
-
-def _get_client_id() -> str:
- env_val = (os.getenv(ENV_CLIENT_ID) or "").strip()
- if env_val:
- return env_val
- if _DEFAULT_CLIENT_ID:
- return _DEFAULT_CLIENT_ID
- scraped, _ = _scrape_client_credentials()
- return scraped
-
-
-def _get_client_secret() -> str:
- env_val = (os.getenv(ENV_CLIENT_SECRET) or "").strip()
- if env_val:
- return env_val
- if _DEFAULT_CLIENT_SECRET:
- return _DEFAULT_CLIENT_SECRET
- _, scraped = _scrape_client_credentials()
- return scraped
-
-
-def _require_client_id() -> str:
- cid = _get_client_id()
- if not cid:
- raise GoogleOAuthError(
- "Google OAuth client ID is not available.\n"
- "Hermes looks for a locally installed gemini-cli to source the OAuth client. "
- "Either:\n"
- " 1. Install it: npm install -g @google/gemini-cli (or brew install gemini-cli)\n"
- " 2. Set HERMES_GEMINI_CLIENT_ID and HERMES_GEMINI_CLIENT_SECRET in ~/.hermes/.env\n"
- "\n"
- "Register a Desktop OAuth client at:\n"
- " https://console.cloud.google.com/apis/credentials\n"
- "(enable the Generative Language API on the project).",
- code="google_oauth_client_id_missing",
- )
- return cid
-
-
-# =============================================================================
-# PKCE
-# =============================================================================
-
-def _generate_pkce_pair() -> Tuple[str, str]:
- """Generate a (verifier, challenge) pair using S256."""
- verifier = secrets.token_urlsafe(64)
- digest = hashlib.sha256(verifier.encode("ascii")).digest()
- challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
- return verifier, challenge
-
-
-# =============================================================================
-# Packed refresh format: refresh_token[|project_id[|managed_project_id]]
-# =============================================================================
-
-@dataclass
-class RefreshParts:
- refresh_token: str
- project_id: str = ""
- managed_project_id: str = ""
-
- @classmethod
- def parse(cls, packed: str) -> "RefreshParts":
- if not packed:
- return cls(refresh_token="")
- parts = packed.split("|", 2)
- return cls(
- refresh_token=parts[0],
- project_id=parts[1] if len(parts) > 1 else "",
- managed_project_id=parts[2] if len(parts) > 2 else "",
- )
-
- def format(self) -> str:
- if not self.refresh_token:
- return ""
- if not self.project_id and not self.managed_project_id:
- return self.refresh_token
- return f"{self.refresh_token}|{self.project_id}|{self.managed_project_id}"
-
-
-# =============================================================================
-# Credentials (dataclass wrapping the on-disk format)
-# =============================================================================
-
-@dataclass
-class GoogleCredentials:
- access_token: str
- refresh_token: str
- expires_ms: int # unix milliseconds
- email: str = ""
- project_id: str = ""
- managed_project_id: str = ""
-
- def to_dict(self) -> Dict[str, Any]:
- return {
- "refresh": RefreshParts(
- refresh_token=self.refresh_token,
- project_id=self.project_id,
- managed_project_id=self.managed_project_id,
- ).format(),
- "access": self.access_token,
- "expires": int(self.expires_ms),
- "email": self.email,
- }
-
- @classmethod
- def from_dict(cls, data: Dict[str, Any]) -> "GoogleCredentials":
- refresh_packed = str(data.get("refresh", "") or "")
- parts = RefreshParts.parse(refresh_packed)
- return cls(
- access_token=str(data.get("access", "") or ""),
- refresh_token=parts.refresh_token,
- expires_ms=int(data.get("expires", 0) or 0),
- email=str(data.get("email", "") or ""),
- project_id=parts.project_id,
- managed_project_id=parts.managed_project_id,
- )
-
- def expires_unix_seconds(self) -> float:
- return self.expires_ms / 1000.0
-
- def access_token_expired(self, skew_seconds: int = REFRESH_SKEW_SECONDS) -> bool:
- if not self.access_token or not self.expires_ms:
- return True
- return (time.time() + max(0, skew_seconds)) * 1000 >= self.expires_ms
-
-
-# =============================================================================
-# Credential I/O (atomic + locked)
-# =============================================================================
-
-def load_credentials() -> Optional[GoogleCredentials]:
- """Load credentials from disk. Returns None if missing or corrupt."""
- path = _credentials_path()
- if not path.exists():
- return None
- try:
- with _credentials_lock():
- raw = path.read_text(encoding="utf-8")
- data = json.loads(raw)
- except (json.JSONDecodeError, OSError, IOError) as exc:
- logger.warning("Failed to read Google OAuth credentials at %s: %s", path, exc)
- return None
- if not isinstance(data, dict):
- return None
- creds = GoogleCredentials.from_dict(data)
- if not creds.access_token:
- return None
- return creds
-
-
-def save_credentials(creds: GoogleCredentials) -> Path:
- """Atomically write creds to disk with 0o600 permissions."""
- path = _credentials_path()
- path.parent.mkdir(parents=True, exist_ok=True)
- # Tighten parent dir to 0o700 so siblings can't traverse to the creds file.
- # On Windows this is a no-op (POSIX mode bits aren't enforced); ignore failures.
- # secure_parent_dir refuses to chmod / or top-level dirs (#25821).
- secure_parent_dir(path)
- payload = json.dumps(creds.to_dict(), indent=2, sort_keys=True) + "\n"
-
- with _credentials_lock():
- tmp_path = path.with_suffix(f".tmp.{os.getpid()}.{secrets.token_hex(4)}")
- try:
- # Create with 0o600 atomically to close the TOCTOU window where the
- # default file mode (often 0o644 under a 0o022 umask) would briefly expose tokens to other
- # local users between open() and chmod().
- fd = os.open(
- str(tmp_path),
- os.O_WRONLY | os.O_CREAT | os.O_EXCL,
- stat.S_IRUSR | stat.S_IWUSR,
- )
- with os.fdopen(fd, "w", encoding="utf-8") as fh:
- fh.write(payload)
- fh.flush()
- os.fsync(fh.fileno())
- atomic_replace(tmp_path, path)
- finally:
- try:
- if tmp_path.exists():
- tmp_path.unlink()
- except OSError:
- pass
- return path
-
-
-def clear_credentials() -> None:
- """Remove the creds file. Idempotent."""
- path = _credentials_path()
- with _credentials_lock():
- try:
- path.unlink()
- except FileNotFoundError:
- pass
- except OSError as exc:
- logger.warning("Failed to remove Google OAuth credentials at %s: %s", path, exc)
-
-
-# =============================================================================
-# HTTP helpers
-# =============================================================================
-
-def _post_form(url: str, data: Dict[str, str], timeout: float) -> Dict[str, Any]:
- """POST x-www-form-urlencoded and return parsed JSON response."""
- body = urllib.parse.urlencode(data).encode("ascii")
- request = urllib.request.Request(
- url,
- data=body,
- method="POST",
- headers={
- "Content-Type": "application/x-www-form-urlencoded",
- "Accept": "application/json",
- },
- )
- try:
- with urllib.request.urlopen(request, timeout=timeout) as response:
- raw = response.read().decode("utf-8", errors="replace")
- return json.loads(raw)
- except urllib.error.HTTPError as exc:
- detail = ""
- try:
- detail = exc.read().decode("utf-8", errors="replace")
- except Exception:
- pass
- # Detect invalid_grant to signal credential revocation
- code = "google_oauth_token_http_error"
- if "invalid_grant" in detail.lower():
- code = "google_oauth_invalid_grant"
- raise GoogleOAuthError(
- f"Google OAuth token endpoint returned HTTP {exc.code}: {detail or exc.reason}",
- code=code,
- ) from exc
- except urllib.error.URLError as exc:
- raise GoogleOAuthError(
- f"Google OAuth token request failed: {exc}",
- code="google_oauth_token_network_error",
- ) from exc
-
-
-def exchange_code(
- code: str,
- verifier: str,
- redirect_uri: str,
- *,
- client_id: Optional[str] = None,
- client_secret: Optional[str] = None,
- timeout: float = TOKEN_REQUEST_TIMEOUT_SECONDS,
-) -> Dict[str, Any]:
- """Exchange authorization code for access + refresh tokens."""
- cid = client_id if client_id is not None else _get_client_id()
- csecret = client_secret if client_secret is not None else _get_client_secret()
- data = {
- "grant_type": "authorization_code",
- "code": code,
- "code_verifier": verifier,
- "client_id": cid,
- "redirect_uri": redirect_uri,
- }
- if csecret:
- data["client_secret"] = csecret
- return _post_form(TOKEN_ENDPOINT, data, timeout)
-
-
-def refresh_access_token(
- refresh_token: str,
- *,
- client_id: Optional[str] = None,
- client_secret: Optional[str] = None,
- timeout: float = TOKEN_REQUEST_TIMEOUT_SECONDS,
-) -> Dict[str, Any]:
- """Refresh the access token."""
- if not refresh_token:
- raise GoogleOAuthError(
- "Cannot refresh: refresh_token is empty. Re-run OAuth login.",
- code="google_oauth_refresh_token_missing",
- )
- cid = client_id if client_id is not None else _get_client_id()
- csecret = client_secret if client_secret is not None else _get_client_secret()
- data = {
- "grant_type": "refresh_token",
- "refresh_token": refresh_token,
- "client_id": cid,
- }
- if csecret:
- data["client_secret"] = csecret
- return _post_form(TOKEN_ENDPOINT, data, timeout)
-
-
-def _fetch_user_email(access_token: str, timeout: float = TOKEN_REQUEST_TIMEOUT_SECONDS) -> str:
- """Best-effort userinfo fetch for display. Failures return empty string."""
- try:
- request = urllib.request.Request(
- USERINFO_ENDPOINT + "?alt=json",
- headers={"Authorization": f"Bearer {access_token}"},
- )
- with urllib.request.urlopen(request, timeout=timeout) as response:
- raw = response.read().decode("utf-8", errors="replace")
- data = json.loads(raw)
- return str(data.get("email", "") or "")
- except Exception as exc:
- logger.debug("Userinfo fetch failed (non-fatal): %s", exc)
- return ""
-
-
-# =============================================================================
-# In-flight refresh deduplication
-# =============================================================================
-
-_refresh_inflight: Dict[str, threading.Event] = {}
-_refresh_inflight_lock = threading.Lock()
-
-
-def get_valid_access_token(*, force_refresh: bool = False) -> str:
- """Load creds, refreshing if near expiry, and return a valid bearer token.
-
- Dedupes concurrent refreshes by refresh_token. On ``invalid_grant``, the
- credential file is wiped and a ``google_oauth_invalid_grant`` error is raised
- (caller is expected to trigger a re-login flow).
- """
- creds = load_credentials()
- if creds is None:
- raise GoogleOAuthError(
- "No Google OAuth credentials found. Run `hermes auth add google-gemini-cli` first.",
- code="google_oauth_not_logged_in",
- )
-
- if not force_refresh and not creds.access_token_expired():
- return creds.access_token
-
- # Dedupe concurrent refreshes by refresh_token
- rt = creds.refresh_token
- with _refresh_inflight_lock:
- event = _refresh_inflight.get(rt)
- if event is None:
- event = threading.Event()
- _refresh_inflight[rt] = event
- owner = True
- else:
- owner = False
-
- if not owner:
- # Another thread is refreshing — wait, then re-read from disk.
- event.wait(timeout=LOCK_TIMEOUT_SECONDS)
- fresh = load_credentials()
- if fresh is not None and not fresh.access_token_expired():
- return fresh.access_token
- # Fall through to do our own refresh if the other attempt failed
-
- try:
- try:
- resp = refresh_access_token(rt)
- except GoogleOAuthError as exc:
- if exc.code == "google_oauth_invalid_grant":
- logger.warning(
- "Google OAuth refresh token invalid (revoked/expired). "
- "Clearing credentials at %s — user must re-login.",
- _credentials_path(),
- )
- clear_credentials()
- raise
-
- new_access = str(resp.get("access_token", "") or "").strip()
- if not new_access:
- raise GoogleOAuthError(
- "Refresh response did not include an access_token.",
- code="google_oauth_refresh_empty",
- )
- # Google sometimes rotates refresh_token; preserve existing if omitted.
- new_refresh = str(resp.get("refresh_token", "") or "").strip() or creds.refresh_token
- expires_in = int(resp.get("expires_in", 0) or 0)
-
- creds.access_token = new_access
- creds.refresh_token = new_refresh
- creds.expires_ms = int((time.time() + max(60, expires_in)) * 1000)
- save_credentials(creds)
- return creds.access_token
- finally:
- if owner:
- with _refresh_inflight_lock:
- _refresh_inflight.pop(rt, None)
- event.set()
-
-
-# =============================================================================
-# Update project IDs on stored creds
-# =============================================================================
-
-def update_project_ids(project_id: str = "", managed_project_id: str = "") -> None:
- """Persist resolved/discovered project IDs back into the credential file."""
- creds = load_credentials()
- if creds is None:
- return
- if project_id:
- creds.project_id = project_id
- if managed_project_id:
- creds.managed_project_id = managed_project_id
- save_credentials(creds)
-
-
-# =============================================================================
-# Callback server
-# =============================================================================
-
-class _OAuthCallbackHandler(http.server.BaseHTTPRequestHandler):
- expected_state: str = ""
- captured_code: Optional[str] = None
- captured_error: Optional[str] = None
- ready: Optional[threading.Event] = None
-
- def log_message(self, format: str, *args: Any) -> None: # noqa: A002, N802
- logger.debug("OAuth callback: " + format, *args)
-
- def do_GET(self) -> None: # noqa: N802
- parsed = urllib.parse.urlparse(self.path)
- if parsed.path != CALLBACK_PATH:
- self.send_response(404)
- self.end_headers()
- return
-
- params = urllib.parse.parse_qs(parsed.query)
- state = (params.get("state") or [""])[0]
- error = (params.get("error") or [""])[0]
- code = (params.get("code") or [""])[0]
-
- if state != type(self).expected_state:
- type(self).captured_error = "state_mismatch"
- self._respond_html(400, _ERROR_PAGE.format(message="State mismatch — aborting for safety."))
- elif error:
- type(self).captured_error = error
- # Simple HTML-escape of the error value
- safe_err = (
- str(error)
- .replace("&", "&amp;")
- .replace("<", "&lt;")
- .replace(">", "&gt;")
- )
- self._respond_html(400, _ERROR_PAGE.format(message=f"Authorization denied: {safe_err}"))
- elif code:
- type(self).captured_code = code
- self._respond_html(200, _SUCCESS_PAGE)
- else:
- type(self).captured_error = "no_code"
- self._respond_html(400, _ERROR_PAGE.format(message="Callback received no authorization code."))
-
- if type(self).ready is not None:
- type(self).ready.set()
-
- def _respond_html(self, status: int, body: str) -> None:
- payload = body.encode("utf-8")
- self.send_response(status)
- self.send_header("Content-Type", "text/html; charset=utf-8")
- self.send_header("Content-Length", str(len(payload)))
- self.end_headers()
- self.wfile.write(payload)
-
-
-_SUCCESS_PAGE = """
-Hermes — signed in
-
-
- Signed in to Google.
-
- You can close this tab and return to your terminal.
- Return to your terminal — Hermes will walk you through a manual paste fallback.
-"""
-
-
-def _bind_callback_server(preferred_port: int = DEFAULT_REDIRECT_PORT) -> Tuple[http.server.HTTPServer, int]:
- try:
- server = http.server.HTTPServer((REDIRECT_HOST, preferred_port), _OAuthCallbackHandler)
- return server, preferred_port
- except OSError as exc:
- logger.info(
- "Preferred OAuth callback port %d unavailable (%s); requesting ephemeral port",
- preferred_port, exc,
- )
- server = http.server.HTTPServer((REDIRECT_HOST, 0), _OAuthCallbackHandler)
- return server, server.server_address[1]
-
-
-def _is_headless() -> bool:
- return any(os.getenv(k) for k in _HEADLESS_ENV_VARS)
-
-
-# =============================================================================
-# Main login flow
-# =============================================================================
-
-def start_oauth_flow(
- *,
- force_relogin: bool = False,
- open_browser: bool = True,
- callback_wait_seconds: float = CALLBACK_WAIT_SECONDS,
- project_id: str = "",
-) -> GoogleCredentials:
- """Run the interactive browser OAuth flow and persist credentials.
-
- Args:
- force_relogin: If False and valid creds already exist, return them.
- open_browser: If False, skip webbrowser.open and print the URL only.
- callback_wait_seconds: Max seconds to wait for the browser callback.
- project_id: Initial GCP project ID to bake into the stored creds.
- Can be discovered/updated later via update_project_ids().
- """
- if not force_relogin:
- existing = load_credentials()
- if existing and existing.access_token:
- logger.info("Google OAuth credentials already present; skipping login.")
- return existing
-
- client_id = _require_client_id() # raises GoogleOAuthError with install hints
- client_secret = _get_client_secret()
-
- verifier, challenge = _generate_pkce_pair()
- state = secrets.token_urlsafe(16)
-
- # If headless, skip the listener and go straight to paste mode
- if _is_headless() and open_browser:
- logger.info("Headless environment detected; using paste-mode OAuth fallback.")
- return _paste_mode_login(verifier, challenge, state, client_id, client_secret, project_id)
-
- server, port = _bind_callback_server(DEFAULT_REDIRECT_PORT)
- redirect_uri = f"http://{REDIRECT_HOST}:{port}{CALLBACK_PATH}"
-
- _OAuthCallbackHandler.expected_state = state
- _OAuthCallbackHandler.captured_code = None
- _OAuthCallbackHandler.captured_error = None
- ready = threading.Event()
- _OAuthCallbackHandler.ready = ready
-
- params = {
- "client_id": client_id,
- "redirect_uri": redirect_uri,
- "response_type": "code",
- "scope": OAUTH_SCOPES,
- "state": state,
- "code_challenge": challenge,
- "code_challenge_method": "S256",
- "access_type": "offline",
- "prompt": "consent",
- }
- auth_url = AUTH_ENDPOINT + "?" + urllib.parse.urlencode(params) + "#hermes"
-
- server_thread = threading.Thread(target=server.serve_forever, daemon=True)
- server_thread.start()
-
- print()
- print("Opening your browser to sign in to Google…")
- print(f"If it does not open automatically, visit:\n {auth_url}")
- print()
-
- if open_browser:
- try:
- import webbrowser
-
- try:
- from hermes_cli.auth import (
- _can_open_graphical_browser as _can_open_gui,
- )
- except Exception:
- _can_open_gui = lambda: True # noqa: E731
-
- if _can_open_gui():
- webbrowser.open(auth_url, new=1, autoraise=True)
- except Exception as exc:
- logger.debug("webbrowser.open failed: %s", exc)
-
- code: Optional[str] = None
- try:
- if ready.wait(timeout=callback_wait_seconds):
- code = _OAuthCallbackHandler.captured_code
- error = _OAuthCallbackHandler.captured_error
- if error:
- raise GoogleOAuthError(
- f"Authorization failed: {error}",
- code="google_oauth_authorization_failed",
- )
- else:
- logger.info("Callback server timed out — offering manual paste fallback.")
- code = _prompt_paste_fallback()
- finally:
- try:
- server.shutdown()
- except Exception:
- pass
- try:
- server.server_close()
- except Exception:
- pass
- server_thread.join(timeout=2.0)
-
- if not code:
- raise GoogleOAuthError(
- "No authorization code received. Aborting.",
- code="google_oauth_no_code",
- )
-
- token_resp = exchange_code(
- code, verifier, redirect_uri,
- client_id=client_id, client_secret=client_secret,
- )
- return _persist_token_response(token_resp, project_id=project_id)
-
-
-def _paste_mode_login(
- verifier: str,
- challenge: str,
- state: str,
- client_id: str,
- client_secret: str,
- project_id: str,
-) -> GoogleCredentials:
- """Run OAuth flow without a local callback server."""
- # Use a placeholder redirect URI; user will paste the full URL back
- redirect_uri = f"http://{REDIRECT_HOST}:{DEFAULT_REDIRECT_PORT}{CALLBACK_PATH}"
- params = {
- "client_id": client_id,
- "redirect_uri": redirect_uri,
- "response_type": "code",
- "scope": OAUTH_SCOPES,
- "state": state,
- "code_challenge": challenge,
- "code_challenge_method": "S256",
- "access_type": "offline",
- "prompt": "consent",
- }
- auth_url = AUTH_ENDPOINT + "?" + urllib.parse.urlencode(params) + "#hermes"
-
- print()
- print("Open this URL in a browser on any device:")
- print(f" {auth_url}")
- print()
- print("After signing in, Google will redirect to localhost (which won't load).")
- print("Copy the full URL from your browser and paste it below.")
- print()
-
- code = _prompt_paste_fallback()
- if not code:
- raise GoogleOAuthError("No authorization code provided.", code="google_oauth_no_code")
-
- token_resp = exchange_code(
- code, verifier, redirect_uri,
- client_id=client_id, client_secret=client_secret,
- )
- return _persist_token_response(token_resp, project_id=project_id)
-
-
-def _prompt_paste_fallback() -> Optional[str]:
- print()
- print("Paste the full redirect URL Google showed you, OR just the 'code=' parameter value.")
- raw = input("Callback URL or code: ").strip()
- if not raw:
- return None
- if raw.startswith("http://") or raw.startswith("https://"):
- parsed = urllib.parse.urlparse(raw)
- params = urllib.parse.parse_qs(parsed.query)
- return (params.get("code") or [""])[0] or None
- # Accept a bare query string as well
- if raw.startswith("?"):
- params = urllib.parse.parse_qs(raw[1:])
- return (params.get("code") or [""])[0] or None
- return raw
-
-
-def _persist_token_response(
- token_resp: Dict[str, Any],
- *,
- project_id: str = "",
-) -> GoogleCredentials:
- access_token = str(token_resp.get("access_token", "") or "").strip()
- refresh_token = str(token_resp.get("refresh_token", "") or "").strip()
- expires_in = int(token_resp.get("expires_in", 0) or 0)
- if not access_token or not refresh_token:
- raise GoogleOAuthError(
- "Google token response missing access_token or refresh_token.",
- code="google_oauth_incomplete_token_response",
- )
- creds = GoogleCredentials(
- access_token=access_token,
- refresh_token=refresh_token,
- expires_ms=int((time.time() + max(60, expires_in)) * 1000),
- email=_fetch_user_email(access_token),
- project_id=project_id,
- managed_project_id="",
- )
- save_credentials(creds)
- logger.info("Google OAuth credentials saved to %s", _credentials_path())
- return creds
-
-
-# =============================================================================
-# Pool-compatible variant
-# =============================================================================
-
-def run_gemini_oauth_login_pure() -> Dict[str, Any]:
- """Run the login flow and return a dict matching the credential pool shape."""
- creds = start_oauth_flow(force_relogin=True)
- return {
- "access_token": creds.access_token,
- "refresh_token": creds.refresh_token,
- "expires_at_ms": creds.expires_ms,
- "email": creds.email,
- "project_id": creds.project_id,
- }
-
-
-# =============================================================================
-# Project ID resolution
-# =============================================================================
-
-def resolve_project_id_from_env() -> str:
- """Return a GCP project ID from env vars, in priority order."""
- for var in (
- "HERMES_GEMINI_PROJECT_ID",
- "GOOGLE_CLOUD_PROJECT",
- "GOOGLE_CLOUD_PROJECT_ID",
- ):
- val = (os.getenv(var) or "").strip()
- if val:
- return val
- return ""
diff --git a/agent/image_gen_provider.py b/agent/image_gen_provider.py
index a7f1b8c31ff..a3eeb1e4c8c 100644
--- a/agent/image_gen_provider.py
+++ b/agent/image_gen_provider.py
@@ -11,6 +11,18 @@ Providers live in ``/plugins/image_gen//`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/image_gen//`` (user, opt-in
via ``plugins.enabled``).
+Unified surface
+---------------
+One tool — ``image_generate`` — covers **text-to-image** and
+**image-to-image / image editing**. The router is the presence of
+``image_url`` (and/or ``reference_image_urls``): if any source image is
+provided, the provider routes to its image-to-image / edit endpoint; if
+omitted, the provider routes to text-to-image. Users pick one **model**
+(e.g. nano-banana-pro, gpt-image-2, grok-imagine-image); the provider
+handles which underlying endpoint to hit. This mirrors the ``video_gen``
+provider design (``agent/video_gen_provider.py``) so the two surfaces
+stay learnable together.
+
Response shape
--------------
All providers return a dict that :func:`success_response` / :func:`error_response`
@@ -21,6 +33,7 @@ produce. The tool wrapper JSON-serializes it. Keys:
model str provider-specific model identifier
prompt str echoed prompt
aspect_ratio str "landscape" | "square" | "portrait"
+ modality str "text" | "image" (which mode was used)
provider str provider name (for diagnostics)
error str only when success=False
error_type str only when success=False
@@ -127,19 +140,51 @@ class ImageGenProvider(abc.ABC):
return models[0].get("id")
return None
+ def capabilities(self) -> Dict[str, Any]:
+ """Return what this provider supports.
+
+ Returned dict (all keys optional)::
+
+ {
+ "modalities": ["text", "image"], # which inputs the backend accepts
+ "max_reference_images": 9, # cap for reference_image_urls
+ }
+
+ ``modalities`` declares whether the active backend/model supports
+ text-to-image (``"text"``), image-to-image / editing (``"image"``),
+ or both. The tool layer surfaces this in the dynamic schema so the
+ model knows when ``image_url`` is honored. Used by ``hermes tools``
+ for the picker too. Default: text-only (backward compatible — a
+ provider that doesn't override this advertises text-to-image only).
+ """
+ return {
+ "modalities": ["text"],
+ "max_reference_images": 0,
+ }
+
@abc.abstractmethod
def generate(
self,
prompt: str,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
+ *,
+ image_url: Optional[str] = None,
+ reference_image_urls: Optional[List[str]] = None,
**kwargs: Any,
) -> Dict[str, Any]:
- """Generate an image.
+ """Generate an image from a text prompt, or edit/transform a source image.
+
+ Routing: if ``image_url`` (or any ``reference_image_urls``) is
+ provided, the provider should route to its image-to-image / edit
+ endpoint; otherwise text-to-image. ``image_url`` is the primary
+ source image to edit; ``reference_image_urls`` are additional
+ style/composition references (provider clamps to its declared
+ ``max_reference_images``).
Implementations should return the dict from :func:`success_response`
or :func:`error_response`. ``kwargs`` may contain forward-compat
- parameters future versions of the schema will expose — implementations
- should ignore unknown keys.
+ parameters future versions of the schema will expose —
+ implementations MUST ignore unknown keys (no TypeError).
"""
@@ -162,6 +207,26 @@ def resolve_aspect_ratio(value: Optional[str]) -> str:
return DEFAULT_ASPECT_RATIO
+def normalize_reference_images(value: Any) -> Optional[List[str]]:
+ """Coerce a reference-image argument into a clean list of URL/path strings.
+
+ Accepts a single string or a list; strips blanks and whitespace. Returns
+ ``None`` when nothing usable remains so providers can treat "no refs" as a
+ single sentinel.
+ """
+ if value is None:
+ return None
+ if isinstance(value, str):
+ value = [value]
+ if not isinstance(value, (list, tuple)):
+ return None
+ out: List[str] = []
+ for item in value:
+ if isinstance(item, str) and item.strip():
+ out.append(item.strip())
+ return out or None
+
+
def _images_cache_dir() -> Path:
"""Return ``$HERMES_HOME/cache/images/``, creating parents as needed."""
from hermes_constants import get_hermes_home
@@ -280,13 +345,16 @@ def success_response(
prompt: str,
aspect_ratio: str,
provider: str,
+ modality: str = "text",
extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Build a uniform success response dict.
``image`` may be an HTTP URL or an absolute filesystem path (for b64
- providers like OpenAI). Callers that need to pass through additional
- backend-specific fields can supply ``extra``.
+ providers like OpenAI). ``modality`` is ``"text"`` (text-to-image) or
+ ``"image"`` (image-to-image / editing) — indicates which endpoint was
+ actually hit, useful for diagnostics. Callers that need to pass through
+ additional backend-specific fields can supply ``extra``.
"""
payload: Dict[str, Any] = {
"success": True,
@@ -294,6 +362,7 @@ def success_response(
"model": model,
"prompt": prompt,
"aspect_ratio": aspect_ratio,
+ "modality": modality,
"provider": provider,
}
if extra:
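The routing rule described in the `generate()` docstring above can be sketched roughly as follows. This is only an illustration: `normalize_reference_images` repeats the logic added in this diff, while `pick_modality` and its `max_refs` argument are hypothetical names that do not appear in the change.

```python
from typing import Any, List, Optional, Tuple


def normalize_reference_images(value: Any) -> Optional[List[str]]:
    # Same logic as the helper added in this diff: accept a string or a
    # list/tuple, strip whitespace, drop blanks and non-strings.
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, (list, tuple)):
        return None
    out = [item.strip() for item in value if isinstance(item, str) and item.strip()]
    return out or None


def pick_modality(
    image_url: Optional[str],
    reference_image_urls: Any,
    max_refs: int,
) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (modality, clamped reference list)."""
    refs = normalize_reference_images(reference_image_urls) or []
    refs = refs[:max_refs]  # clamp to the provider's declared cap
    has_source = bool(image_url and image_url.strip()) or bool(refs)
    return ("image" if has_source else "text", refs)
```

In this sketch the clamp runs before routing, so with `max_refs=0` and no `image_url` the references are dropped and the call falls back to text-to-image. Whether a real provider should return an error instead is left to implementations.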
diff --git a/agent/memory_manager.py b/agent/memory_manager.py
index dcd50a2997a..c4baf44fe9a 100644
--- a/agent/memory_manager.py
+++ b/agent/memory_manager.py
@@ -721,9 +721,10 @@ class MemoryManager:
try:
provider.on_session_end(messages)
except Exception as e:
- logger.debug(
+ logger.warning(
"Memory provider '%s' on_session_end failed: %s",
provider.name, e,
+ exc_info=True,
)
def on_session_switch(
diff --git a/agent/memory_provider.py b/agent/memory_provider.py
index 89ac40effaa..4210a4c252e 100644
--- a/agent/memory_provider.py
+++ b/agent/memory_provider.py
@@ -28,6 +28,7 @@ Optional hooks (override to opt in):
on_pre_compress(messages) -> str — extract before context compression
on_memory_write(action, target, content, metadata=None) — mirror built-in memory writes
on_delegation(task, result, **kwargs) — parent-side observation of subagent work
+ backup_paths() -> list[str] — extra on-disk paths to include in `hermes backup`
"""
from __future__ import annotations
@@ -294,3 +295,21 @@ class MemoryProvider(ABC):
Use to mirror built-in memory writes to your backend.
"""
+
+ def backup_paths(self) -> List[str]:
+ """Return extra on-disk paths this provider stores OUTSIDE HERMES_HOME.
+
+ ``hermes backup`` only walks HERMES_HOME, so any provider state kept
+ under ``~/.honcho``, ``~/.hindsight``, ``~/.openviking``, etc. is lost
+ across a backup/import cycle unless it's declared here.
+
+ Return a list of absolute path strings (files or directories). The
+ backup command resolves each, captures the ones that exist and live
+ under the user's home directory into a reserved ``_external/`` subtree
+ of the archive, and ``hermes import`` restores them to their original
+ locations. Paths outside the home directory are skipped for safety.
+
+ MUST be callable without ``initialize()`` and without network — resolve
+ from config/env only. Default returns an empty list (nothing external).
+ """
+ return []
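A provider that keeps state outside HERMES_HOME might override the new hook along these lines. This is a minimal sketch: `MemoryProvider` here is a stand-in reduced to the one method, and `ExampleProvider`, `EXAMPLE_MEMORY_DIR`, and `~/.example_memory` are invented for illustration.

```python
import os
from typing import List


class MemoryProvider:
    """Stand-in for the real ABC, reduced to the hook under discussion."""

    def backup_paths(self) -> List[str]:
        return []


class ExampleProvider(MemoryProvider):
    """Hypothetical provider storing its data in a directory under $HOME."""

    def backup_paths(self) -> List[str]:
        # Resolve from env/config only: no initialize(), no network access.
        # EXAMPLE_MEMORY_DIR is an assumed variable name, not a real setting.
        root = os.environ.get("EXAMPLE_MEMORY_DIR") or "~/.example_memory"
        return [os.path.abspath(os.path.expanduser(root))]
```

Returning an absolute path matches the docstring's contract. Per that docstring, the existence check and the "under the home directory" filter are applied by the backup command, so the provider does not need to repeat them.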
diff --git a/agent/message_content.py b/agent/message_content.py
new file mode 100644
index 00000000000..c42bf408550
--- /dev/null
+++ b/agent/message_content.py
@@ -0,0 +1,50 @@
+from __future__ import annotations
+
+from collections.abc import Mapping
+from typing import Any
+
+
+_NON_TEXT_PART_TYPES = {"image", "image_url", "input_image", "audio", "input_audio"}
+_TEXT_KEYS = ("text", "content", "input_text", "output_text", "summary_text")
+
+
+def _field(value: Any, key: str) -> Any:
+ if isinstance(value, Mapping):
+ return value.get(key)
+ return getattr(value, key, None)
+
+
+def _text_from_part(part: Any) -> str:
+ if part is None:
+ return ""
+ if isinstance(part, str):
+ return part
+
+ part_type = str(_field(part, "type") or "").strip().lower()
+ if part_type in _NON_TEXT_PART_TYPES:
+ return ""
+
+ for key in _TEXT_KEYS:
+ text = _field(part, key)
+ if isinstance(text, str):
+ return text
+ return ""
+
+
+def flatten_message_text(content: Any, *, sep: str = "\n") -> str:
+ """Return the visible text from common chat/Responses message content shapes."""
+ if content is None:
+ return ""
+ if isinstance(content, str):
+ return content
+ if isinstance(content, list):
+ chunks = [_text_from_part(part) for part in content]
+ return sep.join(chunk for chunk in chunks if chunk)
+
+ text = _text_from_part(content)
+ if text:
+ return text
+ try:
+ return str(content)
+ except Exception:
+ return ""
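To see what the new module does with typical content shapes, the following condensed copy can be run on its own. It follows the logic of the file above but omits the `try/except` around `str(content)`; the sample messages are made up.

```python
from collections.abc import Mapping
from typing import Any

_NON_TEXT_PART_TYPES = {"image", "image_url", "input_image", "audio", "input_audio"}
_TEXT_KEYS = ("text", "content", "input_text", "output_text", "summary_text")


def _field(value: Any, key: str) -> Any:
    # Works for both dict-like parts and SDK objects with attributes.
    if isinstance(value, Mapping):
        return value.get(key)
    return getattr(value, key, None)


def _text_from_part(part: Any) -> str:
    if part is None:
        return ""
    if isinstance(part, str):
        return part
    if str(_field(part, "type") or "").strip().lower() in _NON_TEXT_PART_TYPES:
        return ""
    for key in _TEXT_KEYS:
        text = _field(part, key)
        if isinstance(text, str):
            return text
    return ""


def flatten_message_text(content: Any, *, sep: str = "\n") -> str:
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        chunks = [_text_from_part(part) for part in content]
        return sep.join(chunk for chunk in chunks if chunk)
    return _text_from_part(content) or str(content)


# Made-up multimodal message: the image part is skipped, text parts are joined.
parts = [
    {"type": "text", "text": "Describe this:"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    {"type": "output_text", "text": "A cat."},
]
flat = flatten_message_text(parts)
```

Note that a lone non-text mapping passed directly (not inside a list) yields no text from `_text_from_part` and therefore falls through to `str(content)`; callers that always pass lists are unaffected.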
diff --git a/agent/prompt_builder.py b/agent/prompt_builder.py
index 97836f27b05..92378512261 100644
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -238,6 +238,23 @@ KANBAN_GUIDANCE = (
"of the decomposition. Do NOT execute the work yourself; your job is "
"routing, not implementation.\n"
"\n"
+ "## Reference details that change outcomes\n"
+ "\n"
+ "- **Workspace.** `cd $HERMES_KANBAN_WORKSPACE` first. For a `worktree` kind "
+ "with no `.git`, `git worktree add "
+ "${HERMES_KANBAN_BRANCH:-wt/$HERMES_KANBAN_TASK}` from the main repo, then "
+ "cd there.\n"
+ "- **Deliverables.** Files a human wants go in "
+ "`kanban_complete(artifacts=[])` (top-level param; paths in "
+ "`metadata` are NOT uploaded). Files must exist at completion.\n"
+ "- **Created cards.** List ids in `kanban_complete(created_cards=[...])` "
+ "ONLY when captured from a successful `kanban_create` return — never invent "
+ "or paste ids; the kernel rejects the completion on any phantom id.\n"
+ "- **Orchestrating: discover profiles first.** The dispatcher SILENTLY "
+ "drops a card with an unknown assignee (it sits in `ready` forever). Ground "
+ "every assignee in a real profile (`hermes profile list`, or ask the user), "
+ "and express dependencies via `parents=[...]` on `kanban_create`, not prose.\n"
+ "\n"
"## Do NOT\n"
"\n"
"- Do not shell out to `hermes kanban ` for board operations. Use "
diff --git a/agent/redact.py b/agent/redact.py
index de247ec0ad2..06a7300a307 100644
--- a/agent/redact.py
+++ b/agent/redact.py
@@ -120,9 +120,25 @@ _JSON_FIELD_RE = re.compile(
re.IGNORECASE,
)
-# Authorization headers
+# Authorization headers — any scheme (Bearer, Basic, Token, Digest, …) plus the
+# bare-credential form, and Proxy-Authorization. The credential token is masked
+# while the header name and scheme word are preserved for debuggability. The
+# previous rule only matched ``Bearer``, so ``Basic `` and
+# ``token `` leaked verbatim into logs/transcripts.
_AUTH_HEADER_RE = re.compile(
- r"(Authorization:\s*Bearer\s+)(\S+)",
+ r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
+ re.IGNORECASE,
+)
+
+# API-key style auth headers carrying a single opaque value (no scheme word).
+# Anthropic and many providers authenticate with ``x-api-key``; values without
+# a known vendor prefix (custom/local backends) would otherwise leak when a
+# request or curl command is logged or echoed into tool output / transcripts.
+_SECRET_HEADER_NAMES = (
+ r"(?:x-api-key|x-goog-api-key|api-key|apikey|x-api-token|x-auth-token|x-access-token)"
+)
+_SECRET_HEADER_RE = re.compile(
+ rf"({_SECRET_HEADER_NAMES}\s*:\s*)(\S+)",
re.IGNORECASE,
)
@@ -374,11 +390,19 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
- # Authorization headers — _AUTH_HEADER_RE is "Authorization: Bearer ..."
- # case-insensitive, so "uthorization" is the cheapest substring gate that
- # covers both "Authorization" and "authorization" without a casefold().
+ # Authorization headers — _AUTH_HEADER_RE matches any scheme after
+ # "[Proxy-]Authorization:" case-insensitively, so "uthorization" is the
+ # cheapest substring gate that covers every casing without a casefold().
if "uthorization" in text or "UTHORIZATION" in text:
text = _AUTH_HEADER_RE.sub(
+ lambda m: m.group(1) + (m.group(2) or "") + _mask_token(m.group(3)),
+ text,
+ )
+
+ # API-key style headers (x-api-key, api-key, …). Header values are
+ # colon-separated, so gate on ":" — the regex itself is the precise filter.
+ if ":" in text:
+ text = _SECRET_HEADER_RE.sub(
lambda m: m.group(1) + _mask_token(m.group(2)),
text,
)
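The two header rules can be exercised in isolation with the sketch below. The regexes are copied from this diff; `_mask` is a placeholder for the real `_mask_token` (whose output format is not shown here), and `redact_headers` is a hypothetical wrapper that skips the substring gates.

```python
import re

_AUTH_HEADER_RE = re.compile(
    r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
    re.IGNORECASE,
)
_SECRET_HEADER_NAMES = (
    r"(?:x-api-key|x-goog-api-key|api-key|apikey|x-api-token|x-auth-token|x-access-token)"
)
_SECRET_HEADER_RE = re.compile(
    rf"({_SECRET_HEADER_NAMES}\s*:\s*)(\S+)",
    re.IGNORECASE,
)


def _mask(token: str) -> str:
    # Placeholder; the real _mask_token may keep a prefix/suffix of the value.
    return "***"


def redact_headers(text: str) -> str:
    """Hypothetical wrapper applying both header rules without the gates."""
    text = _AUTH_HEADER_RE.sub(
        lambda m: m.group(1) + (m.group(2) or "") + _mask(m.group(3)),
        text,
    )
    text = _SECRET_HEADER_RE.sub(
        lambda m: m.group(1) + _mask(m.group(2)),
        text,
    )
    return text
```

One behavior worth checking in review: because the scheme group is optional and greedy, a bare credential followed by further text on the same line (for example `Authorization: abc123 trailing`) is read as a scheme word, and the following token is masked instead.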
diff --git a/agent/secret_scope.py b/agent/secret_scope.py
new file mode 100644
index 00000000000..26022ca9b0e
--- /dev/null
+++ b/agent/secret_scope.py
@@ -0,0 +1,205 @@
+"""Profile-scoped credential resolution for multi-profile gateway multiplexing.
+
+The multiplexing gateway serves many profiles from one process. Each profile
+has its own ``.env`` with its own provider keys and platform tokens, so we
+**cannot** union them into the process-global ``os.environ`` (that would leak
+profile A's keys to profile B's turns, and to every subprocess spawned with
+``env=dict(os.environ)``).
+
+This module provides a fail-closed, context-local secret scope:
+
+- ``set_secret_scope(mapping)`` installs the active profile's secrets for the
+ current task (a contextvar, so it propagates into the agent's worker thread
+ via ``copy_context()`` exactly like the HERMES_HOME override).
+- ``get_secret(name)`` reads from that scope. When multiplexing is **active**
+ and no scope is set, it RAISES rather than silently falling back to
+ ``os.environ`` — an un-migrated or newly-added call site fails loud at that
+ exact line instead of leaking another profile's value. When multiplexing is
+ **off** (the default), it transparently reads ``os.environ`` so the
+ single-profile gateway and every non-gateway caller behave exactly as before.
+
+Design rationale lives in ``docs/design/multiplexing-gateway.md`` (Workstream A).
+"""
+from __future__ import annotations
+
+import os
+from contextvars import ContextVar, Token
+from pathlib import Path
+from typing import Dict, Mapping, Optional
+
+
+# ── multiplex-active flag ────────────────────────────────────────────────
+# Process-global: set once at gateway startup when gateway.multiplex_profiles
+# is true. Governs whether get_secret() fails closed on an unscoped read.
+# A plain module global (not a contextvar): it describes the deployment mode,
+# not a per-task value.
+_MULTIPLEX_ACTIVE: bool = False
+
+
+def set_multiplex_active(active: bool) -> None:
+ """Mark whether the process is running as a profile multiplexer.
+
+ Called once at gateway startup. When True, ``get_secret`` fails closed on
+ an unscoped read instead of falling back to ``os.environ``.
+ """
+ global _MULTIPLEX_ACTIVE
+ _MULTIPLEX_ACTIVE = bool(active)
+
+
+def is_multiplex_active() -> bool:
+ """Return whether the process is running as a profile multiplexer."""
+ return _MULTIPLEX_ACTIVE
+
+
+# ── the secret scope contextvar ──────────────────────────────────────────
+_SECRET_SCOPE: ContextVar[Optional[Mapping[str, str]]] = ContextVar(
+ "_SECRET_SCOPE", default=None
+)
+
+
+class UnscopedSecretError(RuntimeError):
+ """Raised when a secret is read in multiplex mode with no scope installed.
+
+ This is the fail-closed signal: it means a credential read reached
+ ``get_secret`` without a profile scope active, which in a multiplexer would
+ otherwise leak whichever profile's value happened to be in ``os.environ``.
+ The fix is to wrap the call path in ``set_secret_scope(...)`` (the per-turn
+ / per-adapter profile scope), not to widen the allowlist.
+ """
+
+
+def set_secret_scope(secrets: Optional[Mapping[str, str]]) -> Token:
+ """Install the active profile's secret mapping for the current context.
+
+ Returns a token for ``reset_secret_scope``. Pass ``None`` to clear.
+ """
+ return _SECRET_SCOPE.set(secrets)
+
+
+def reset_secret_scope(token: Token) -> None:
+ """Restore the previous secret scope."""
+ _SECRET_SCOPE.reset(token)
+
+
+def current_secret_scope() -> Optional[Mapping[str, str]]:
+ """Return the active secret mapping, or None when no scope is installed."""
+ return _SECRET_SCOPE.get()
+
+
+# ── genuinely-global env vars (NOT per-profile secrets) ──────────────────
+# These are process/deployment-level settings, not profile credentials. They
+# legitimately live in os.environ and must keep reading from it even in
+# multiplex mode — routing them through the fail-closed path would wrongly
+# crash. Anything matching is read from os.environ regardless of scope.
+#
+# Membership test is by exact name OR prefix (see _is_global_env). Keep this
+# list tight: when in doubt a value is a profile secret, not a global.
+_GLOBAL_ENV_EXACT = frozenset({
+ # Hermes runtime / deployment
+ "HERMES_HOME", "HERMES_PROFILE", "HERMES_GATEWAY_LOCK_DIR",
+ "HERMES_MAX_ITERATIONS", "HERMES_MAX_TOKENS", "HERMES_API_TIMEOUT",
+ "HERMES_REDACT_SECRETS", "HERMES_NOUS_TIMEOUT_SECONDS",
+ "_HERMES_GATEWAY",
+ # OS / interpreter
+ "PATH", "HOME", "USER", "LANG", "LC_ALL", "TZ", "PWD", "SHELL", "TMPDIR",
+ "VIRTUAL_ENV", "PYTHONPATH", "SSL_CERT_FILE",
+ # Kanban paths (per-board, not per-profile-secret)
+ "HERMES_KANBAN_DB", "HERMES_KANBAN_WORKSPACES_ROOT", "HERMES_KANBAN_BOARD",
+})
+_GLOBAL_ENV_PREFIXES = (
+ "HERMES_KANBAN_",
+ "HERMES_TELEGRAM_", # tuning knobs (batch delays, fallback toggles) — NOT the token
+ "TERMINAL_", # terminal/sandbox backend settings
+)
+
+
+def _is_global_env(name: str) -> bool:
+ """Return True for genuinely process-global (non-profile-secret) env vars."""
+ if name in _GLOBAL_ENV_EXACT:
+ return True
+ return any(name.startswith(p) for p in _GLOBAL_ENV_PREFIXES)
+
+
+def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
+ """Resolve a credential by env-var name, honoring the active profile scope.
+
+ Resolution order:
+
+ 1. Genuinely-global vars (``_is_global_env``) always read ``os.environ`` —
+ they are deployment settings, not profile secrets.
+ 2. When a secret scope is installed (multiplexed turn), read from it; an
+ absent key returns ``default``. The scope is authoritative — we do NOT
+ fall through to ``os.environ``, because in a multiplexer ``os.environ``
+ may hold another profile's value.
+ 3. No scope installed:
+ - multiplex INACTIVE (default deployment): read ``os.environ`` —
+ identical to the legacy ``os.getenv`` behavior every caller had before.
+ - multiplex ACTIVE: FAIL CLOSED. Raise ``UnscopedSecretError`` so the
+ missing scope is caught loudly instead of leaking a cross-profile value.
+ """
+ if _is_global_env(name):
+ val = os.environ.get(name)
+ return val if val is not None else default
+
+ scope = _SECRET_SCOPE.get()
+ if scope is not None:
+ val = scope.get(name)
+ return val if val is not None else default
+
+ if _MULTIPLEX_ACTIVE:
+ raise UnscopedSecretError(
+ f"get_secret({name!r}) called with no profile secret scope active "
+ f"while multiplexing is on. This credential read must run inside a "
+ f"set_secret_scope(...) block (the per-turn / per-adapter profile "
+ f"scope). Reading os.environ here would risk leaking another "
+ f"profile's value. See docs/design/multiplexing-gateway.md "
+ f"(Workstream A)."
+ )
+
+ val = os.environ.get(name)
+ return val if val is not None else default
+
+
+def load_env_file(env_path: Path) -> Dict[str, str]:
+ """Parse a ``.env`` file into a plain dict WITHOUT touching ``os.environ``.
+
+ Used to load a profile's secrets into an isolated mapping for
+ ``set_secret_scope``. Mirrors python-dotenv's basic parsing (KEY=VALUE,
+ ``export`` prefix, ``#`` comments, optional matching quotes) but never
+ mutates the process environment — that isolation is the whole point.
+ """
+ secrets: Dict[str, str] = {}
+ try:
+ text = env_path.read_text(encoding="utf-8")
+ except (FileNotFoundError, OSError, UnicodeDecodeError):
+ return secrets
+
+ for raw in text.splitlines():
+ line = raw.strip()
+ if not line or line.startswith("#"):
+ continue
+ if line.startswith("export "):
+ line = line[len("export "):].lstrip()
+ if "=" not in line:
+ continue
+ key, _, value = line.partition("=")
+ key = key.strip()
+ if not key:
+ continue
+ value = value.strip()
+ if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
+ value = value[1:-1]
+ secrets[key] = value
+
+ return secrets
+
+
+def build_profile_secret_scope(hermes_home: Path) -> Dict[str, str]:
+ """Build a profile's secret mapping from the ``.env`` file under ``hermes_home``.
+
+ Returns a fresh dict (safe to install via ``set_secret_scope``). Genuinely
+ global vars are intentionally NOT copied in — ``get_secret`` reads those
+ from ``os.environ`` directly, so the scope holds only profile secrets.
+ """
+ return load_env_file(Path(hermes_home) / ".env")
+
diff --git a/agent/shell_hooks.py b/agent/shell_hooks.py
index 4e2b2ddd7c3..97ba3862120 100644
--- a/agent/shell_hooks.py
+++ b/agent/shell_hooks.py
@@ -49,6 +49,58 @@ Wire protocol
# Silent no-op:
+
+Per-event ``extra`` keys
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``extra`` object contains every kwarg that is **not** one of the
+top-level payload keys (``tool_name``, ``args``, ``session_id``,
+``parent_session_id``). The tables below list the ``extra`` keys
+emitted by each built-in hook site.
+
+``post_tool_call`` (emitted from ``model_tools.py``)::
+
+ result – tool return value (serialised string)
+ status – "ok" | "error" | "blocked"
+ error_type – error category (e.g. "ValueError"), or None
+ error_message – human-readable error text, or None
+ duration_ms – wall-clock time in milliseconds
+ task_id – current task id (empty string if none)
+ tool_call_id – provider tool-call id
+ turn_id – current turn id
+ api_request_id – current API request id
+ middleware_trace – list of dicts from tool middleware chain
+
+``pre_tool_call`` (emitted from ``model_tools.py``)::
+
+ task_id – current task id (empty string if none)
+ tool_call_id – provider tool-call id
+ turn_id – current turn id
+ api_request_id – current API request id
+ middleware_trace – list of dicts from tool middleware chain
+
+``on_session_start`` (emitted from ``agent/conversation_loop.py``)::
+
+ model – model name (e.g. "claude-sonnet-4-20250514")
+ platform – platform identifier (e.g. "cli", "whatsapp")
+
+``on_session_end`` (emitted from ``agent/turn_finalizer.py``)::
+
+ task_id – current task id
+ turn_id – current turn id
+ completed – bool, True when the turn produced a final response
+ interrupted – bool, True when the user interrupted
+ model – model name
+ platform – platform identifier
+
+``subagent_stop`` (emitted from ``tools/delegate_tool.py``)::
+
+ parent_turn_id – parent agent's current turn id
+ child_session_id – child (subagent) session id
+ child_role – role string of the child agent
+ child_summary – summary of the child's work
+ child_status – exit status string (e.g. "success", "error")
+ duration_ms – wall-clock time of the child run in milliseconds
"""
from __future__ import annotations
diff --git a/agent/skill_utils.py b/agent/skill_utils.py
index 9f16534a450..338fa37cb85 100644
--- a/agent/skill_utils.py
+++ b/agent/skill_utils.py
@@ -280,9 +280,9 @@ def skill_matches_environment(frontmatter: Dict[str, Any]) -> bool:
This is an OFFER-time filter: it controls whether a skill shows up in the
skills index / autocomplete / slash-command list. It is intentionally NOT
enforced by ``skill_view`` or ``--skills`` preloading — an explicit load is
- explicit consent, and load-bearing force-loads (e.g. the kanban dispatcher
- injecting ``--skills kanban-worker``) must always succeed regardless of how
- the offer surfaces filter the skill.
+ explicit consent, and load-bearing force-loads (e.g. a dispatcher pinning
+ a task to a specialist skill via ``--skills``) must always succeed
+ regardless of how the offer surfaces filter the skill.
A skill matches when ANY of its declared environments is currently active
(OR semantics, mirroring ``platforms``). Unknown env tags fail open.
diff --git a/agent/title_generator.py b/agent/title_generator.py
index a7f1e158e1a..583a2cfc601 100644
--- a/agent/title_generator.py
+++ b/agent/title_generator.py
@@ -22,9 +22,31 @@ TitleCallback = Callable[[str], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
+ "Write the title in the same language the user is writing in. "
"Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
)
+_TITLE_PROMPT_PINNED_LANGUAGE = (
+ "Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
+ "following exchange. The title should capture the main topic or intent. "
+ "Write the title in {language}. "
+ "Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
+)
+
+
+def _title_language() -> str:
+ """Return configured title language, or empty string to match the user."""
+ try:
+ from hermes_cli.config import load_config
+
+ return str(
+ ((load_config() or {}).get("auxiliary") or {})
+ .get("title_generation", {})
+ .get("language", "")
+ ).strip()
+ except Exception:
+ return ""
+
def generate_title(
user_message: str,
@@ -48,8 +70,11 @@ def generate_title(
user_snippet = user_message[:500] if user_message else ""
assistant_snippet = assistant_response[:500] if assistant_response else ""
+ language = _title_language()
+ prompt = _TITLE_PROMPT_PINNED_LANGUAGE.format(language=language) if language else _TITLE_PROMPT
+
messages = [
- {"role": "system", "content": _TITLE_PROMPT},
+ {"role": "system", "content": prompt},
{"role": "user", "content": f"User: {user_snippet}\n\nAssistant: {assistant_snippet}"},
]
diff --git a/agent/tool_executor.py b/agent/tool_executor.py
index e7ba79db8b7..b79c29767e8 100644
--- a/agent/tool_executor.py
+++ b/agent/tool_executor.py
@@ -44,9 +44,26 @@ from tools.tool_result_storage import (
maybe_persist_tool_result,
enforce_turn_budget,
)
+from tools.budget_config import BudgetConfig, DEFAULT_BUDGET, budget_for_context_window
logger = logging.getLogger(__name__)
+
+def _budget_for_agent(agent) -> BudgetConfig:
+ """Resolve a tool-result BudgetConfig scaled to the agent's context window.
+
+ Large-context models keep the historical 100K/200K char defaults; small
+ models (e.g. a 65K-token local model switched to mid-session) get a budget
+ proportional to their window so a single large tool result can't push the
+ request past the model's limit (#23767). Falls back to the default budget
+ when the context length isn't resolvable.
+ """
+ try:
+ ctx = getattr(getattr(agent, "context_compressor", None), "context_length", None)
+ return budget_for_context_window(int(ctx)) if ctx else DEFAULT_BUDGET
+ except Exception:
+ return DEFAULT_BUDGET
+
# Maximum number of concurrent worker threads for parallel tool execution.
# Mirrors the constant in ``run_agent`` for tests/imports that look here.
_MAX_TOOL_WORKERS = 8
@@ -249,6 +266,10 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
tool_calls = assistant_message.tool_calls
num_tools = len(tool_calls)
+ # Resolve the context-scaled tool-output budget once per turn (cheap, but
+ # avoids rebuilding it per result inside the loop below).
+ _tool_budget = _budget_for_agent(agent)
+
# ── Pre-flight: interrupt check ──────────────────────────────────
if agent._interrupt_requested:
print(f"{agent.log_prefix}⚡ Interrupt: skipping {num_tools} tool call(s)")
@@ -725,6 +746,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
tool_name=name,
tool_use_id=tc.id,
env=get_active_env(effective_task_id),
+ config=_tool_budget,
) if not _is_multimodal_tool_result(function_result) else function_result
subdir_hints = agent._subdirectory_hints.check_tool_call(name, args)
@@ -756,7 +778,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
num_tools = len(parsed_calls)
if num_tools > 0:
turn_tool_msgs = messages[-num_tools:]
- enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id))
+ enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id), config=_tool_budget)
# ── /steer injection ──────────────────────────────────────────────
# Append any pending user steer text to the last tool result so the
@@ -769,6 +791,8 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
def execute_tool_calls_sequential(agent, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
"""Execute tool calls sequentially (original behavior). Used for single calls or interactive tools."""
+ # Resolve the context-scaled tool-output budget once per turn.
+ _tool_budget = _budget_for_agent(agent)
for i, tool_call in enumerate(assistant_message.tool_calls, 1):
# SAFETY: check interrupt BEFORE starting each tool.
# If the user sent "stop" during a previous tool's execution,
@@ -1377,6 +1401,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
tool_name=function_name,
tool_use_id=tool_call.id,
env=get_active_env(effective_task_id),
+ config=_tool_budget,
) if not _is_multimodal_tool_result(function_result) else function_result
# Discover subdirectory context files from tool arguments
@@ -1425,7 +1450,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
# ── Per-turn aggregate budget enforcement ─────────────────────────
num_tools_seq = len(assistant_message.tool_calls)
if num_tools_seq > 0:
- enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id))
+ enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id), config=_tool_budget)
# ── /steer injection ──────────────────────────────────────────────
# See _execute_tool_calls_parallel for the rationale. Same hook,
diff --git a/agent/transports/chat_completions.py b/agent/transports/chat_completions.py
index c0b2a13d250..42e81dc30e7 100644
--- a/agent/transports/chat_completions.py
+++ b/agent/transports/chat_completions.py
@@ -172,6 +172,7 @@ class ChatCompletionsTransport(ProviderTransport):
"codex_reasoning_items" in msg
or "codex_message_items" in msg
or "tool_name" in msg
+ or "timestamp" in msg # #47868 — strict providers reject this
):
needs_sanitize = True
break
@@ -201,6 +202,7 @@ class ChatCompletionsTransport(ProviderTransport):
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
msg.pop("tool_name", None)
+ msg.pop("timestamp", None) # #47868 — leak into strict providers
# Drop all Hermes-internal scaffolding markers (``_``-prefixed).
# OpenAI's message schema has no ``_``-prefixed fields, so this
# is safe and future-proofs against new markers being added.
@@ -435,10 +437,6 @@ class ChatCompletionsTransport(ProviderTransport):
extra_body["extra_body"] = openai_compat_extra
elif raw_thinking_config:
extra_body["thinking_config"] = raw_thinking_config
- elif provider_name == "google-gemini-cli":
- thinking_config = _build_gemini_thinking_config(model, reasoning_config)
- if thinking_config:
- extra_body["thinking_config"] = thinking_config
# Merge any pre-built extra_body additions
additions = params.get("extra_body_additions")
diff --git a/agent/turn_context.py b/agent/turn_context.py
index 8041eabdb7f..0bbdf73764e 100644
--- a/agent/turn_context.py
+++ b/agent/turn_context.py
@@ -112,6 +112,24 @@ def build_turn_context(
# Restore the primary runtime if the previous turn activated fallback.
agent._restore_primary_runtime()
+ # Between-turns MCP refresh: an MCP server that finished connecting since
+ # the previous turn (slow HTTP/OAuth servers routinely take 2-6s on a cold
+ # connect, missing the bounded startup wait) lands in THIS turn's tool
+ # snapshot. This is cache-safe by construction: it runs in the per-turn
+ # prologue, before this turn's first API call assembles ``tools=``, so it
+ # only ever extends a fresh request prefix — it never mutates the cached
+ # prefix of an in-flight turn. No-op when no MCP servers are registered
+ # (the common case, gated by the cheap ``has_registered_mcp_tools`` check)
+ # or when the tool set is unchanged (``refresh_agent_mcp_tools`` diffs by
+ # name and leaves the snapshot untouched on no-change).
+ try:
+ if not getattr(agent, "_skip_mcp_refresh", False):
+ from tools.mcp_tool import has_registered_mcp_tools, refresh_agent_mcp_tools
+ if has_registered_mcp_tools():
+ refresh_agent_mcp_tools(agent, quiet_mode=True)
+ except Exception:
+ logger.debug("between-turns MCP tool refresh skipped", exc_info=True)
+
# Sanitize surrogate characters from user input.
if isinstance(user_message, str):
user_message = sanitize_surrogates(user_message)
diff --git a/agent/turn_finalizer.py b/agent/turn_finalizer.py
index 20db3fcef9f..91496d72040 100644
--- a/agent/turn_finalizer.py
+++ b/agent/turn_finalizer.py
@@ -128,19 +128,44 @@ def finalize_turn(
and not failed
)
+ # Post-loop cleanup must never lose the response. Trajectory save,
+ # resource teardown, and session persistence all touch fallible
+ # surfaces — file I/O / JSON serialization (_save_trajectory), remote
+ # VM/browser teardown over the network (_cleanup_task_resources), and
+ # SQLite writes (_persist_session). A raise from any of them used to
+ # propagate straight out of run_conversation, discarding the partial
+ # final_response the caller is waiting for (subprocess wrappers saw an
+ # empty stdout with no traceback — #8049). Each step is now guarded
+ # independently so one failure can't skip the others, and any errors
+ # are surfaced on the result dict via ``cleanup_errors`` rather than
+ # killing the turn.
+ _cleanup_errors = []
+
# Save trajectory if enabled. ``user_message`` may be a multimodal
# list of parts; the trajectory format wants a plain string.
- agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
+ try:
+ agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
+ except Exception as _save_err:
+ _cleanup_errors.append(f"save_trajectory: {_save_err}")
+ logger.error("finalize_turn: _save_trajectory failed: %s", _save_err, exc_info=True)
# Clean up VM and browser for this task after conversation completes
- agent._cleanup_task_resources(effective_task_id)
+ try:
+ agent._cleanup_task_resources(effective_task_id)
+ except Exception as _cleanup_err:
+ _cleanup_errors.append(f"cleanup_task_resources: {_cleanup_err}")
+ logger.error("finalize_turn: _cleanup_task_resources failed: %s", _cleanup_err, exc_info=True)
# Persist session to both JSON log and SQLite only after private retry
# scaffolding has been removed. Otherwise a later user "continue" turn
# can replay assistant("(empty)") / recovery nudges and fall into the
# same empty-response loop again.
- agent._drop_trailing_empty_response_scaffolding(messages)
- agent._persist_session(messages, conversation_history)
+ try:
+ agent._drop_trailing_empty_response_scaffolding(messages)
+ agent._persist_session(messages, conversation_history)
+ except Exception as _persist_err:
+ _cleanup_errors.append(f"persist_session: {_persist_err}")
+ logger.error("finalize_turn: _persist_session failed: %s", _persist_err, exc_info=True)
# ── Turn-exit diagnostic log ─────────────────────────────────────
# Always logged at INFO so agent.log captures WHY every turn ended.
@@ -354,6 +379,11 @@ def finalize_turn(
}
if agent._tool_guardrail_halt_decision is not None:
result["guardrail"] = agent._tool_guardrail_halt_decision.to_metadata()
+ # Surface any post-loop cleanup failures so the caller can distinguish a
+ # clean turn from one whose trajectory/session/resource teardown raised
+ # (the response is still returned either way — #8049).
+ if _cleanup_errors:
+ result["cleanup_errors"] = _cleanup_errors
# If a /steer landed after the final assistant turn (no more tool
# batches to drain into), hand it back to the caller so it can be
# delivered as the next user turn instead of being silently lost.
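The guarded-cleanup pattern in this hunk generalizes to any ordered list of fallible teardown steps. A minimal sketch under stated assumptions: `run_cleanup_steps` and the step callables are hypothetical names for illustration, and the real `finalize_turn` inlines three separate `try` blocks rather than looping.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger(__name__)


def run_cleanup_steps(steps: Sequence[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every step, collecting failures instead of propagating them."""
    errors: List[str] = []
    for label, step in steps:
        try:
            step()
        except Exception as err:  # one failure must not skip later steps
            errors.append(f"{label}: {err}")
            logger.error("cleanup step %s failed: %s", label, err, exc_info=True)
    return errors


calls: List[str] = []


def save_ok() -> None:
    calls.append("save")


def teardown_fails() -> None:
    raise OSError("disk full")


def persist_ok() -> None:
    calls.append("persist")


errors = run_cleanup_steps([
    ("save_trajectory", save_ok),
    ("cleanup_task_resources", teardown_fails),
    ("persist_session", persist_ok),
])

# The response survives; failures ride along on the result dict.
result = {"final_response": "done"}
if errors:
    result["cleanup_errors"] = errors
```

Callers that care about teardown health can check for the `cleanup_errors` key; callers that only want the response are unaffected.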
diff --git a/agent/turn_retry_state.py b/agent/turn_retry_state.py
index 188fe3f1c16..34183bd06be 100644
--- a/agent/turn_retry_state.py
+++ b/agent/turn_retry_state.py
@@ -58,6 +58,12 @@ class TurnRetryState:
primary_recovery_attempted: bool = False
has_retried_429: bool = False
+ # ── Auth-failure provider failover ───────────────────────────────────
+ # Set once we've escalated a persistent 401/403 (after the per-provider
+ # credential-refresh attempt above failed) to the fallback chain, so we
+ # don't loop on the same auth failover within one attempt.
+ auth_failover_attempted: bool = False
+
# ── Restart signals (read by the outer loop after the attempt) ───────
restart_with_compressed_messages: bool = False
restart_with_length_continuation: bool = False
diff --git a/agent/usage_pricing.py b/agent/usage_pricing.py
index 95bb11df521..7c4416e5fb2 100644
--- a/agent/usage_pricing.py
+++ b/agent/usage_pricing.py
@@ -451,6 +451,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
+ cache_read_cost_per_million=Decimal("1.50"),
+ cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@@ -461,6 +463,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
+ cache_read_cost_per_million=Decimal("0.30"),
+ cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@@ -471,6 +475,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
+ cache_read_cost_per_million=Decimal("0.30"),
+ cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@@ -481,6 +487,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
+ cache_read_cost_per_million=Decimal("0.08"),
+ cache_write_cost_per_million=Decimal("1.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@@ -584,6 +592,26 @@ def resolve_billing_route(
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
+def _normalize_bedrock_model_name(model: str) -> str:
+ """Normalize a Bedrock model id to its bare foundation-model form.
+
+ Bedrock cross-region inference profiles prefix the foundation model id
+ with a region scope (``us.`` / ``global.`` / ``eu.`` / ``ap.`` / ``jp.``),
+ e.g. ``us.anthropic.claude-opus-4-7``. The pricing table is keyed on the
+ bare ``anthropic.claude-*`` id, so the prefix must be stripped before the
+ lookup or every cross-region session prices as unknown. Mirrors the
+ prefix list in ``bedrock_adapter.is_anthropic_bedrock_model``. Also
+ normalizes dot-notation version numbers (``4.7`` → ``4-7``).
+ """
+ name = model.lower().strip()
+ for prefix in ("us.", "global.", "eu.", "ap.", "jp."):
+ if name.startswith(prefix):
+ name = name[len(prefix):]
+ break
+ name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
+ return name
+
+
def _normalize_anthropic_model_name(model: str) -> str:
"""Normalize Anthropic model name variants to canonical form.
@@ -614,6 +642,14 @@ def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]
entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
if entry:
return entry
+ # Bedrock cross-region inference profiles carry a region prefix
+ # (us./global./eu./...) that the bare pricing keys don't have.
+ if route.provider == "bedrock":
+ normalized = _normalize_bedrock_model_name(model)
+ if normalized != model:
+ entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
+ if entry:
+ return entry
return None
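To illustrate the Bedrock normalization added above, here is a small self-contained sketch. The function body mirrors `_normalize_bedrock_model_name` from this diff; the `_DEMO_PRICING_KEYS` set and `has_pricing` helper are hypothetical and exist only to show why the region prefix must be stripped before the lookup.

```python
import re

_REGION_PREFIXES = ("us.", "global.", "eu.", "ap.", "jp.")


def normalize_bedrock_model_name(model: str) -> str:
    """Strip a cross-region prefix and convert dotted versions (4.7 -> 4-7)."""
    name = model.lower().strip()
    for prefix in _REGION_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break  # at most one region prefix is removed
    return re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)


# Illustrative key set only; the real table is _OFFICIAL_DOCS_PRICING.
_DEMO_PRICING_KEYS = {("bedrock", "anthropic.claude-opus-4-7")}


def has_pricing(provider: str, model: str) -> bool:
    """Try the raw id first, then the normalized Bedrock id."""
    if (provider, model) in _DEMO_PRICING_KEYS:
        return True
    if provider == "bedrock":
        return (provider, normalize_bedrock_model_name(model)) in _DEMO_PRICING_KEYS
    return False


bare = normalize_bedrock_model_name("us.anthropic.claude-opus-4-7")
dotted = normalize_bedrock_model_name("global.anthropic.claude-opus-4.7")
found = has_pricing("bedrock", "eu.anthropic.claude-opus-4-7")
```

Only one prefix is stripped because the loop breaks on the first match, so an id that already lacks a region scope passes through unchanged apart from lowercasing and version normalization.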
diff --git a/apps/bootstrap-installer/src-tauri/src/paths.rs b/apps/bootstrap-installer/src-tauri/src/paths.rs
index c9171f361ce..99ad16f6b88 100644
--- a/apps/bootstrap-installer/src-tauri/src/paths.rs
+++ b/apps/bootstrap-installer/src-tauri/src/paths.rs
@@ -77,6 +77,19 @@ pub fn installer_dest() -> PathBuf {
hermes_home().join(name)
}
+/// Marker the updater writes for the duration of an in-app update and removes
+/// when it finishes (see update.rs `UpdateMarkerGuard`). A freshly-launched
+/// desktop checks this before spawning its own local backend: spawning one
+/// mid-update re-locks the venv shim and triggers `force_kill_other_hermes`,
+/// which then kills that legitimate backend in a respawn loop (#50238).
+///
+/// Lives directly under HERMES_HOME (same rationale as `installer_dest`) so the
+/// Electron desktop — which resolves HERMES_HOME identically and pins it into
+/// the updater's env — agrees on the exact path.
+pub fn update_in_progress_marker() -> PathBuf {
+ hermes_home().join(".hermes-update-in-progress")
+}
+
/// Copy the currently-running installer binary to `installer_dest()` so it's
/// available for future `--update` runs and shortcut launches.
///
diff --git a/apps/bootstrap-installer/src-tauri/src/update.rs b/apps/bootstrap-installer/src-tauri/src/update.rs
index a42838293a1..539f69e9f78 100644
--- a/apps/bootstrap-installer/src-tauri/src/update.rs
+++ b/apps/bootstrap-installer/src-tauri/src/update.rs
@@ -103,9 +103,61 @@ pub async fn start_update(app: AppHandle) -> Result<(), String> {
Ok(())
}
+/// RAII guard that owns the "update in progress" marker (see
+/// `paths::update_in_progress_marker`). Created at the top of `run_update`;
+/// its `Drop` removes the marker on EVERY exit path — success, early
+/// `return Err`, or a panic that unwinds through `run_update` — so a crashed
+/// or aborted updater can never permanently strand the marker and block
+/// future desktop launches. The marker payload is `{pid}\n{started_at_unix}`
+/// so the desktop's launch gate can detect a stale marker (dead PID / past a
+/// hard ceiling) and self-heal rather than wait forever.
+struct UpdateMarkerGuard {
+ path: PathBuf,
+}
+
+impl UpdateMarkerGuard {
+ /// Write the marker. Best-effort: a write failure must NOT abort the
+ /// update (the gate degrades to "no marker => proceed", i.e. exactly the
+ /// pre-fix behavior), so we log and carry on with a guard that still
+ /// attempts cleanup of whatever may exist at the path.
+ fn acquire(path: PathBuf) -> Self {
+ let pid = std::process::id();
+ let started_at = std::time::SystemTime::now()
+ .duration_since(std::time::UNIX_EPOCH)
+ .map(|d| d.as_secs())
+ .unwrap_or(0);
+ if let Some(parent) = path.parent() {
+ let _ = std::fs::create_dir_all(parent);
+ }
+ if let Err(err) = std::fs::write(&path, format!("{pid}\n{started_at}")) {
+ tracing::warn!(?path, %err, "could not write update-in-progress marker");
+ }
+ Self { path }
+ }
+}
+
+impl Drop for UpdateMarkerGuard {
+ fn drop(&mut self) {
+ if let Err(err) = std::fs::remove_file(&self.path) {
+ if err.kind() != std::io::ErrorKind::NotFound {
+ tracing::warn!(path = ?self.path, %err, "could not remove update-in-progress marker");
+ }
+ }
+ }
+}
+
async fn run_update(app: AppHandle) -> Result<()> {
let hermes_home = crate::paths::hermes_home();
let install_root = hermes_home.join("hermes-agent");
+
+ // Mutual exclusion (#50238): publish an "update in progress" marker for the
+ // entire duration of this update. A desktop instance the user relaunches
+ // mid-update consults this before spawning its own local backend — without
+ // it, that backend re-locks the venv shim, our `force_kill_other_hermes`
+ // straggler-cleanup kills it, and the relaunch/kill cycle loops. The guard
+ // removes the marker on every exit path (incl. early returns / panics).
+ let _update_marker = UpdateMarkerGuard::acquire(crate::paths::update_in_progress_marker());
+
let update_branch = update_branch_from_args(std::env::args().skip(1))
.or_else(|| option_env_string("BUILD_PIN_BRANCH"))
.unwrap_or_else(|| "main".to_string());
@@ -518,11 +570,13 @@ fn format_locked_paths(paths: &[PathBuf]) -> String {
/// taskkill, excluding our own PID.
///
/// Safe w.r.t. our own update child: this runs inside the install-lock wait,
-/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. At this
-/// point no update-driven hermes.exe exists yet, so the only hermes.exe images
-/// are stragglers from the old desktop — exactly what we want gone. (`/FI PID
-/// ne ` also spares this Tauri process, though it isn't named
-/// hermes.exe.)
+/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. And a
+/// desktop the user relaunches mid-update will NOT have spawned a backend —
+/// `startHermes()` in the desktop gates local-backend startup on our
+/// update-in-progress marker and parks until we finish (#50238). So the only
+/// hermes.exe images here are stragglers from the old desktop — exactly what
+/// we want gone. (`/FI PID ne ` also spares this Tauri process, though it
+/// isn't named hermes.exe.)
fn force_kill_other_hermes() {
if !cfg!(target_os = "windows") {
return;
@@ -992,6 +1046,48 @@ mod tests {
assert!(locked_paths(&probes).is_empty());
}
+ #[test]
+ fn update_marker_guard_writes_then_removes_on_drop() {
+ let dir = unique_tmp_dir("marker-guard");
+ std::fs::create_dir_all(&dir).unwrap();
+ let marker = dir.join(".hermes-update-in-progress");
+
+ {
+ let _g = UpdateMarkerGuard::acquire(marker.clone());
+ assert!(marker.exists(), "marker must exist while the guard is held");
+ let body = std::fs::read_to_string(&marker).unwrap();
+ let pid_line = body.lines().next().unwrap();
+ assert_eq!(
+ pid_line.trim().parse::<u32>().unwrap(),
+ std::process::id(),
+ "marker records our pid so the desktop can probe liveness"
+ );
+ assert_eq!(body.lines().count(), 2, "marker is pid + started_at lines");
+ }
+
+ assert!(
+ !marker.exists(),
+ "Drop must remove the marker on every exit path (incl. early return / panic unwind)"
+ );
+ let _ = std::fs::remove_dir_all(&dir);
+ }
+
+ #[test]
+ fn update_marker_guard_drop_is_quiet_when_already_gone() {
+ let dir = unique_tmp_dir("marker-guard-gone");
+ std::fs::create_dir_all(&dir).unwrap();
+ let marker = dir.join(".hermes-update-in-progress");
+
+ let guard = UpdateMarkerGuard::acquire(marker.clone());
+ // Simulate an external cleanup (e.g. the desktop pruned a marker it
+ // judged stale) before our guard drops — Drop must not panic.
+ std::fs::remove_file(&marker).unwrap();
+ drop(guard);
+
+ assert!(!marker.exists());
+ let _ = std::fs::remove_dir_all(&dir);
+ }
+
#[test]
fn parses_update_branch_from_space_or_equals_args() {
assert_eq!(
diff --git a/apps/desktop/README.md b/apps/desktop/README.md
index 17d1cacee5b..8a6d3efe9bf 100644
--- a/apps/desktop/README.md
+++ b/apps/desktop/README.md
@@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig
### How it works
-The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
+The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification
diff --git a/apps/desktop/electron/backend-ready.cjs b/apps/desktop/electron/backend-ready.cjs
index 9af41e549c4..a4899e8657a 100644
--- a/apps/desktop/electron/backend-ready.cjs
+++ b/apps/desktop/electron/backend-ready.cjs
@@ -1,5 +1,32 @@
const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
+// The announcement clock starts the instant the backend process is spawned —
+// before uvicorn binds its socket. On a cold install the child must first
+// compile and import the whole `hermes_cli.main` → `web_server` → FastAPI/
+// uvicorn chain, and on Windows real-time AV (Defender) scans every freshly
+// written `.pyc`. That pre-bind cost can run 30-60s on a slow disk, so a tight
+// 45s deadline kills a *healthy but still-starting* backend and respawns it,
+// piling up orphaned processes (issue #50209). A roomier default absorbs the
+// cold-start cost; a warm start still announces in well under a second.
+const DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS = 90_000
+// Never trust a deadline tighter than the warm-start path needs; floor at 45s
+// (the historical default) so a malformed override can't reintroduce the loop.
+const MIN_PORT_ANNOUNCE_TIMEOUT_MS = 45_000
+
+/**
+ * Resolve the port-announcement deadline. Honors the
+ * HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS env override (for users on slow
+ * disks / aggressive AV who need an even longer cold-start window), clamped
+ * to a sane floor so a bad value can't make boot flakier than the default.
+ */
+function resolvePortAnnounceTimeoutMs(env = process.env) {
+ const parsed = Number(env.HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS)
+ if (Number.isFinite(parsed) && parsed > 0) {
+ return Math.max(MIN_PORT_ANNOUNCE_TIMEOUT_MS, Math.round(parsed))
+ }
+ return DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS
+}
+
/**
* Watch a child process's stdout for the `HERMES_DASHBOARD_READY port=`
* line that web_server.py prints after uvicorn binds its socket.
@@ -9,11 +36,15 @@ const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
* - the child emits an `error` event
* - no line arrives within the timeout
*
+ * The default timeout is cold-start tolerant (see
+ * DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS) because the clock starts before the
+ * backend has even bound its port. Pass an explicit `timeoutMs` to override.
+ *
* A single `cleanup()` tears down every listener (data/exit/error/timeout)
* on every terminal path — resolve, reject, or timeout — so repeated
* backend spawns don't leak listener slots on the child.
*/
-function waitForDashboardPort(child, timeoutMs = 45_000) {
+function waitForDashboardPort(child, timeoutMs = resolvePortAnnounceTimeoutMs()) {
return new Promise((resolve, reject) => {
let buf = ''
let done = false
@@ -63,4 +94,9 @@ function waitForDashboardPort(child, timeoutMs = 45_000) {
})
}
-module.exports = { waitForDashboardPort }
+module.exports = {
+ waitForDashboardPort,
+ resolvePortAnnounceTimeoutMs,
+ DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+ MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+}
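Reviewer note: the override handling in `resolvePortAnnounceTimeoutMs` above can be illustrated with a standalone sketch. The names `resolveTimeoutSketch`, `DEFAULT_MS`, and `FLOOR_MS` are hypothetical stand-ins for the exported function and constants; the sketch assumes the same parse, clamp, and fallback rules shown in the hunk and is not a replacement for it.

```javascript
// Hypothetical standalone copy of the clamp logic, for illustration only.
const DEFAULT_MS = 90_000 // mirrors DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS
const FLOOR_MS = 45_000 // mirrors MIN_PORT_ANNOUNCE_TIMEOUT_MS

function resolveTimeoutSketch(raw) {
  const parsed = Number(raw)
  // A finite, positive override is honored but never allowed below the floor.
  if (Number.isFinite(parsed) && parsed > 0) {
    return Math.max(FLOOR_MS, Math.round(parsed))
  }
  // Anything else (unset, empty, non-numeric, zero, negative) uses the default.
  return DEFAULT_MS
}

const resolved = ['120000', '1000', 'abc', undefined].map(raw => resolveTimeoutSketch(raw))
console.log(resolved) // [ 120000, 45000, 90000, 90000 ]
```

The design choice worth noting is that a too-small override clamps up to the floor, while an unparseable one falls back to the default rather than the floor.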
diff --git a/apps/desktop/electron/backend-ready.test.cjs b/apps/desktop/electron/backend-ready.test.cjs
new file mode 100644
index 00000000000..8f6267b7929
--- /dev/null
+++ b/apps/desktop/electron/backend-ready.test.cjs
@@ -0,0 +1,121 @@
+/**
+ * Tests for electron/backend-ready.cjs.
+ *
+ * Run with: node --test electron/backend-ready.test.cjs
+ * (Wired into npm test:desktop:platforms in package.json.)
+ *
+ * Covers the cold-start port-announcement deadline (issue #50209): the clock
+ * starts before the backend binds its port, so a tight 45s deadline killed a
+ * healthy-but-still-compiling backend on cold Windows installs. The default is
+ * now cold-start tolerant and overridable via
+ * HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const { EventEmitter } = require('node:events')
+
+const {
+ waitForDashboardPort,
+ resolvePortAnnounceTimeoutMs,
+ DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+ MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+} = require('./backend-ready.cjs')
+
+// A minimal stand-in for a spawned child process: an EventEmitter with a
+// stdout EventEmitter, matching the surface waitForDashboardPort consumes
+// (child.stdout.on('data'), child.on('exit'|'error') + the .off() teardown).
+function makeFakeChild() {
+ const child = new EventEmitter()
+ child.stdout = new EventEmitter()
+ return child
+}
+
+// ---------------------------------------------------------------------------
+// resolvePortAnnounceTimeoutMs
+// ---------------------------------------------------------------------------
+
+test('default is cold-start tolerant (> the historical 45s floor)', () => {
+ assert.equal(resolvePortAnnounceTimeoutMs({}), DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS)
+ assert.ok(
+ DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS > MIN_PORT_ANNOUNCE_TIMEOUT_MS,
+ 'cold-start default must exceed the warm-start floor'
+ )
+})
+
+test('honors a valid HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS override', () => {
+ const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '120000' }
+ assert.equal(resolvePortAnnounceTimeoutMs(env), 120_000)
+})
+
+test('clamps an override below the floor up to the 45s minimum', () => {
+ const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '1000' }
+ assert.equal(resolvePortAnnounceTimeoutMs(env), MIN_PORT_ANNOUNCE_TIMEOUT_MS)
+})
+
+test('rounds a fractional override', () => {
+ const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '60000.7' }
+ assert.equal(resolvePortAnnounceTimeoutMs(env), 60_001)
+})
+
+test('falls back to the default for malformed / non-positive overrides', () => {
+ for (const bad of ['', 'abc', '0', '-5', 'NaN', undefined]) {
+ const env = bad === undefined ? {} : { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: bad }
+ assert.equal(
+ resolvePortAnnounceTimeoutMs(env),
+ DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
+ `override ${JSON.stringify(bad)} should fall through to the default`
+ )
+ }
+})
+
+// ---------------------------------------------------------------------------
+// waitForDashboardPort
+// ---------------------------------------------------------------------------
+
+test('resolves with the announced port', async () => {
+ const child = makeFakeChild()
+ const p = waitForDashboardPort(child, 1000)
+ child.stdout.emit('data', 'noise before\nHERMES_DASHBOARD_READY port=54321\n')
+ assert.equal(await p, 54321)
+})
+
+test('parses the port even when the line arrives split across chunks', async () => {
+ const child = makeFakeChild()
+ const p = waitForDashboardPort(child, 1000)
+ child.stdout.emit('data', 'HERMES_DASHBOARD_READY po')
+ child.stdout.emit('data', 'rt=8080\n')
+ assert.equal(await p, 8080)
+})
+
+test('rejects when the child exits before announcing', async () => {
+ const child = makeFakeChild()
+ const p = waitForDashboardPort(child, 1000)
+ child.emit('exit', 1, null)
+ await assert.rejects(p, /exited before port announcement/)
+})
+
+test('rejects on a child error event', async () => {
+ const child = makeFakeChild()
+ const p = waitForDashboardPort(child, 1000)
+ child.emit('error', new Error('spawn ENOENT'))
+ await assert.rejects(p, /spawn ENOENT/)
+})
+
+test('rejects with the timeout message after the deadline', async () => {
+ const child = makeFakeChild()
+ await assert.rejects(
+ waitForDashboardPort(child, 20),
+ /Timed out waiting for Hermes backend port announcement \(20ms\)/
+ )
+})
+
+test('a late announcement after timeout does not throw (listeners torn down)', async () => {
+ const child = makeFakeChild()
+ await assert.rejects(waitForDashboardPort(child, 20), /Timed out/)
+ // The orphaned backend may still print its READY line later; the watcher
+ // must have detached so this emit is a no-op rather than a double-settle.
+ assert.doesNotThrow(() => {
+ child.stdout.emit('data', 'HERMES_DASHBOARD_READY port=9999\n')
+ })
+})
diff --git a/apps/desktop/electron/link-title-window.cjs b/apps/desktop/electron/link-title-window.cjs
new file mode 100644
index 00000000000..80b3af3976e
--- /dev/null
+++ b/apps/desktop/electron/link-title-window.cjs
@@ -0,0 +1,42 @@
+'use strict'
+
+// Hidden BrowserWindow used by tier-2 link-title resolution: when curl can't
+// read a page (bot walls, JS-rendered pages), we briefly load the URL
+// in an offscreen window and read its title. That window loads arbitrary
+// user-linked pages — including YouTube/`watch` URLs that autoplay — so it must
+// never be allowed to emit sound.
+
+function linkTitleWindowOptions(partitionSession) {
+ return {
+ show: false,
+ width: 1280,
+ height: 800,
+ webPreferences: {
+ backgroundThrottling: false,
+ contextIsolation: true,
+ javascript: true,
+ nodeIntegration: false,
+ sandbox: true,
+ session: partitionSession,
+ webSecurity: true
+ }
+ }
+}
+
+// Create the offscreen title-fetch window and immediately mute it. Without the
+// mute, autoplaying media on the loaded page (e.g. a YouTube link) leaks ~2s of
+// audio every time a session containing such links is re-rendered. See #49505.
+function createLinkTitleWindow(BrowserWindow, partitionSession) {
+ const window = new BrowserWindow(linkTitleWindowOptions(partitionSession))
+
+ try {
+ window.webContents.setAudioMuted(true)
+ } catch {
+ // webContents may be unavailable in degraded/headless environments; muting
+ // is best-effort and the window is destroyed within a few seconds anyway.
+ }
+
+ return window
+}
+
+module.exports = { createLinkTitleWindow, linkTitleWindowOptions }
diff --git a/apps/desktop/electron/link-title-window.test.cjs b/apps/desktop/electron/link-title-window.test.cjs
new file mode 100644
index 00000000000..87333efb69d
--- /dev/null
+++ b/apps/desktop/electron/link-title-window.test.cjs
@@ -0,0 +1,56 @@
+const assert = require('node:assert/strict')
+const test = require('node:test')
+
+const { createLinkTitleWindow, linkTitleWindowOptions } = require('./link-title-window.cjs')
+
+function makeFakeBrowserWindow() {
+ const calls = { audioMuted: [] }
+ const FakeBrowserWindow = function (options) {
+ this.options = options
+ this.webContents = {
+ setAudioMuted(value) {
+ calls.audioMuted.push(value)
+ }
+ }
+ }
+
+ return { FakeBrowserWindow, calls }
+}
+
+test('linkTitleWindowOptions keeps the offscreen, hardened defaults', () => {
+ const session = { id: 'link-titles' }
+ const options = linkTitleWindowOptions(session)
+
+ assert.equal(options.show, false)
+ assert.equal(options.webPreferences.session, session)
+ assert.equal(options.webPreferences.contextIsolation, true)
+ assert.equal(options.webPreferences.sandbox, true)
+ assert.equal(options.webPreferences.nodeIntegration, false)
+})
+
+test('createLinkTitleWindow mutes audio so historical links never autoplay sound', () => {
+ // Regression for #49505: the hidden title-fetch window loaded YouTube/watch
+ // URLs (to read their <title>) without muting, leaking ~2s of audio on every
+ // history re-render.
+ const { FakeBrowserWindow, calls } = makeFakeBrowserWindow()
+
+ const window = createLinkTitleWindow(FakeBrowserWindow, { id: 'link-titles' })
+
+ assert.ok(window instanceof FakeBrowserWindow)
+ assert.deepEqual(calls.audioMuted, [true])
+})
+
+test('createLinkTitleWindow still returns the window if muting throws', () => {
+ const ThrowingBrowserWindow = function (options) {
+ this.options = options
+ this.webContents = {
+ setAudioMuted() {
+ throw new Error('webContents unavailable')
+ }
+ }
+ }
+
+ const window = createLinkTitleWindow(ThrowingBrowserWindow, { id: 'link-titles' })
+
+ assert.ok(window instanceof ThrowingBrowserWindow)
+})
diff --git a/apps/desktop/electron/main.cjs b/apps/desktop/electron/main.cjs
index 42f81c38123..628edc8ef7a 100644
--- a/apps/desktop/electron/main.cjs
+++ b/apps/desktop/electron/main.cjs
@@ -34,6 +34,7 @@ const {
SESSION_WINDOW_MIN_WIDTH
} = require('./session-windows.cjs')
const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
+const { createLinkTitleWindow } = require('./link-title-window.cjs')
const { probeGatewayWebSocket } = require('./gateway-ws-probe.cjs')
const { adoptServedDashboardToken } = require('./dashboard-token.cjs')
const { waitForDashboardPort } = require('./backend-ready.cjs')
@@ -42,6 +43,16 @@ const { fetchMarketplaceThemes, searchMarketplaceThemes } = require('./vscode-ma
const { buildDesktopBackendEnv, normalizeHermesHomeRoot } = require('./backend-env.cjs')
const { readWindowsUserEnvVar } = require('./windows-user-env.cjs')
const { readDirForIpc } = require('./fs-read-dir.cjs')
+const { readLiveUpdateMarker } = require('./update-marker.cjs')
+const {
+ resolveUnpackedRelease,
+ decideRelaunchOutcome,
+ sandboxPreflight,
+ sandboxFallbackFromEnv,
+ collectRelaunchArgs,
+ collectRelaunchEnv,
+ buildRelaunchScript
+} = require('./update-relaunch.cjs')
const { gitRootForIpc } = require('./git-root.cjs')
const { worktreesForIpc } = require('./git-worktrees.cjs')
const { OFFICIAL_REPO_HTTPS_URL, isOfficialSshRemote } = require('./update-remote.cjs')
@@ -150,6 +161,8 @@ if (REMOTE_DISPLAY_REASON) {
)
}
+ipcMain.handle('hermes:get-remote-display-reason', () => REMOTE_DISPLAY_REASON)
+
// Keep the renderer running at full speed while the window is in the background
// or occluded. The chat transcript streams to screen through a
// requestAnimationFrame-gated flush; Chromium pauses rAF (and clamps timers)
@@ -268,6 +281,23 @@ function resolveHermesHome() {
}
const HERMES_HOME = resolveHermesHome()
+
+function hermesManagedNodePathEntries() {
+ // NOTE: keep this ordering in sync with iter_hermes_node_dirs() in
+ // hermes_constants.py — this Node main process cannot import the Python
+ // module, so the platform-ordering rule is mirrored here.
+ const root = path.join(HERMES_HOME, 'node')
+ const bin = path.join(root, 'bin')
+ const entries = IS_WINDOWS ? [root, bin] : [bin, root]
+ return entries.filter(directoryExists)
+}
+
+function pathWithHermesManagedNode(...entries) {
+ return [...hermesManagedNodePathEntries(), ...entries, process.env.PATH]
+ .filter(Boolean)
+ .join(path.delimiter)
+}
+
// ACTIVE_HERMES_ROOT — the canonical mutable Hermes install. Same path
// install.ps1 / install.sh use, so a desktop-only user and a CLI-only user end
// up with identical layouts and can share one install.
@@ -1090,6 +1120,59 @@ function directoryExists(filePath) {
}
}
+// --- in-app update mutual exclusion (#50238) -------------------------------
+// The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
+// duration of an `--update` run (see update.rs UpdateMarkerGuard). If the user
+// relaunches the desktop mid-update — because the window vanished with no
+// progress and looks crashed — a fresh instance must NOT spawn its own local
+// backend: that backend re-locks the venv shim, the updater's straggler cleanup
+// (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
+// fails with the 45s "backend didn't come up" error, and the relaunch/kill
+// cycle loops. Instead the fresh instance parks until the update finishes, then
+// brings the backend up itself (it is the surviving instance — the updater's
+// own relaunch hits our single-instance lock and quits). Marker parsing +
+// staleness self-heal live in update-marker.cjs (unit-tested).
+
+// How long we'll park the launch waiting for a live update to finish before
+// giving up and starting the backend anyway (belt-and-suspenders alongside the
+// marker's own age ceiling; covers a stuck-but-alive updater).
+const UPDATE_WAIT_TIMEOUT_MS = 20 * 60 * 1000
+const UPDATE_WAIT_POLL_MS = 1000
+// How long the desktop lingers on the "updating, don't reopen" overlay after
+// spawning the detached updater, before it quits to release the venv shim. The
+// old 600ms was long enough to register the child process but far too short for
+// the user to READ the overlay — the window just vanished, looked like a crash,
+// and the user relaunched mid-update (the #50238 restart-loop trigger). A
+// couple of seconds lets the message land and bridges the gap until the
+// updater's own progress window appears. (#50419)
+const UPDATE_HANDOFF_DWELL_MS = 2500
+
+// Block until no live update is in progress (or we hit the wait timeout).
+// Emits a boot-progress phase so the renderer shows "Update in progress…"
+// rather than a frozen splash. Returns true if it parked at all.
+async function waitForUpdateToFinish() {
+ let marker = readLiveUpdateMarker(HERMES_HOME)
+ if (!marker) return false
+
+ rememberLog(`[updates] update in progress (pid=${marker.pid}); deferring backend start until it finishes`)
+ const deadline = Date.now() + UPDATE_WAIT_TIMEOUT_MS
+ while (marker && Date.now() < deadline) {
+ await advanceBootProgress(
+ 'backend.update-wait',
+ 'An update is finishing — Hermes will start automatically when it completes…',
+ 12
+ )
+ await new Promise(r => setTimeout(r, UPDATE_WAIT_POLL_MS))
+ marker = readLiveUpdateMarker(HERMES_HOME)
+ }
+ if (marker) {
+ rememberLog('[updates] update still in progress after wait timeout; starting backend anyway')
+ } else {
+ rememberLog('[updates] update finished; proceeding with backend start')
+ }
+ return true
+}
+
function unpackedPathFor(filePath) {
return filePath.replace(/app\.asar(?=$|[\\/])/, 'app.asar.unpacked')
}
@@ -1801,7 +1884,11 @@ async function applyUpdates(opts = {}) {
return { ok: true, manual: true, command, hermesRoot: updateRoot }
}
- emitUpdateProgress({ stage: 'restart', message: 'Handing off to the Hermes updater…', percent: 100 })
+ emitUpdateProgress({
+ stage: 'restart',
+ message: 'Updating Hermes — this window will close and the updater will open. Don’t reopen Hermes yourself; it restarts automatically when the update finishes.',
+ percent: 100
+ })
repairMacUpdaterHelper(updater)
const updateRoot = resolveUpdateRoot()
@@ -1827,7 +1914,7 @@ async function applyUpdates(opts = {}) {
env: {
...process.env,
HERMES_HOME,
- PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
+ PATH: pathWithHermesManagedNode(venvBin)
},
detached: true,
stdio: 'ignore',
@@ -1837,11 +1924,14 @@ async function applyUpdates(opts = {}) {
rememberLog(`[updates] launched updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release venv shim`)
- // Give the OS a beat to register the new process, then quit. The updater
- // rebuilds and relaunches us when it's done.
+ // Linger on the "updating — don't reopen" overlay long enough for the user
+ // to actually read it (and to bridge the gap until the updater's own window
+ // appears), THEN quit to release the venv shim. The updater rebuilds and
+ // relaunches us when it's done. (#50419 — a 600ms quit looked like a crash
+ // and lured users into the #50238 relaunch loop.)
setTimeout(() => {
app.quit()
- }, 600)
+ }, UPDATE_HANDOFF_DWELL_MS)
return { ok: true, handedOff: true, updater }
} finally {
@@ -1871,7 +1961,7 @@ async function handOffWindowsBootstrapRecovery(reason) {
env: {
...process.env,
HERMES_HOME,
- PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
+ PATH: pathWithHermesManagedNode(venvBin)
},
detached: true,
stdio: 'ignore',
@@ -1880,9 +1970,12 @@ async function handOffWindowsBootstrapRecovery(reason) {
child.unref()
rememberLog(`[bootstrap] handed off ${reason} recovery to updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release app.asar`)
+ // Same dwell as the in-app update hand-off (#50419): give the updater's
+ // window time to appear before we vanish, so the recovery doesn't look like
+ // a crash and provoke a mid-recovery relaunch.
setTimeout(() => {
app.quit()
- }, 600)
+ }, UPDATE_HANDOFF_DWELL_MS)
return true
}
@@ -1952,13 +2045,11 @@ async function applyUpdatesPosixInApp() {
}
// Put the Hermes-managed Node and the venv on PATH so `hermes desktop`'s
- // npm build can find them on a machine with no system Node.
- const extraPath = [path.join(HERMES_HOME, 'node', 'bin'), path.join(updateRoot, 'venv', 'bin')]
- .filter(Boolean)
- .join(path.delimiter)
+ // npm build can find them on a machine with no system Node. Windows portable
+ // Node lives directly under %LOCALAPPDATA%\hermes\node, not node\bin.
const env = {
HERMES_HOME,
- PATH: [extraPath, process.env.PATH].filter(Boolean).join(path.delimiter)
+ PATH: pathWithHermesManagedNode(path.join(updateRoot, 'venv', 'bin'))
}
// `hermes update` reaps stale `hermes dashboard` backends (a code update
@@ -2028,6 +2119,114 @@ async function applyUpdatesPosixInApp() {
return { ok: false, backendUpdated: true, error: 'desktop rebuild failed' }
}
+ // Linux in-app update terminal state (#45205). `hermes desktop --build-only`
+ // rebuilds the unpacked app in place under apps/desktop/release/*-unpacked.
+ // We can only HONESTLY relaunch into the new GUI when the *running* binary IS
+ // that rebuilt one, i.e. execPath lives under release/*-unpacked. The
+ // outcome is decided by three signals (see update-relaunch.cjs):
+ //
+ // underUnpacked + sandboxOk → 'relaunch': detached watcher re-execs us in
+ // place (mirrors the macOS handoff). Without it the update succeeds but
+ // the app never restarts and the overlay hangs on "applying" forever.
+ // !underUnpacked → 'guiSkew': the running shell is an AppImage/
+ // .deb/.rpm/dev/unresolved binary we did NOT replace. Claiming "loads
+ // next launch" is a lie (GUI/backend skew, #37541) — surface an
+ // explicit closeable terminal state telling the user the GUI package
+ // was NOT changed and must be updated/reinstalled.
+ // underUnpacked + !sandboxOk → 'manual': we'd be relaunching the rebuilt
+ // binary, but a fresh rebuild can leave chrome-sandbox without
+ // root:root + setuid (mode 4755) and Electron then refuses to launch
+ // ("quit and never came back"). DO NOT quit into a dead app — keep the
+ // working window and surface the closeable manual-restart state.
+ if (!IS_MAC) {
+ const unpackedDir = resolveUnpackedRelease(process.execPath, updateRoot, process.platform)
+ const underUnpacked = unpackedDir !== null
+
+ const preflight = underUnpacked
+ ? sandboxPreflight(unpackedDir, p => fs.statSync(p))
+ : { ok: false, reason: 'not-under-unpacked', path: null }
+ const sandboxFallback = sandboxFallbackFromEnv(process.env, process.argv.slice(1))
+ const sandboxOk = preflight.ok || sandboxFallback
+ if (underUnpacked && !preflight.ok) {
+ rememberLog(
+ `[updates] sandbox preflight: not launchable (${preflight.reason}) at ${preflight.path}; ` +
+ `fallback=${sandboxFallback ? 'env/--no-sandbox' : 'none'}`
+ )
+ }
+
+ const outcome = decideRelaunchOutcome({ underUnpacked, sandboxOk })
+
+ if (outcome === 'relaunch') {
+ emitUpdateProgress({ stage: 'restart', message: 'Restarting Hermes…', percent: 100 })
+ // Preserve launch context across the re-exec: replay the original args
+ // (filtered of Electron internals) and the env/cwd that define which
+ // backend/profile/root this instance talks to. Without this the
+ // relaunched instance comes up with default context instead of the user's.
+ const relaunchArgs = collectRelaunchArgs(process.argv.slice(1))
+ const relaunchEnv = collectRelaunchEnv(process.env)
+ const relaunchScript = buildRelaunchScript({
+ pid: process.pid,
+ execPath: process.execPath,
+ args: relaunchArgs,
+ env: relaunchEnv,
+ cwd: process.cwd()
+ })
+ const scriptPath = path.join(app.getPath('temp'), `hermes-desktop-update-${Date.now()}.sh`)
+ try {
+ fs.writeFileSync(scriptPath, relaunchScript, { mode: 0o755 })
+ const child = spawn('/bin/bash', [scriptPath], { detached: true, stdio: 'ignore' })
+ child.unref()
+ rememberLog(
+ `[updates] launched linux relaunch: ${scriptPath} -> ${process.execPath} ` +
+ `(args=${relaunchArgs.length}, env=${Object.keys(relaunchEnv).length})`
+ )
+ setTimeout(() => app.quit(), UPDATE_HANDOFF_DWELL_MS)
+ return { ok: true, handedOff: true }
+ } catch (err) {
+ rememberLog(`[updates] linux relaunch failed: ${err.message}; falling back to manual restart`)
+ return {
+ ok: true,
+ backendUpdated: true,
+ guiUpdated: false,
+ manualRestart: true,
+ message: 'Backend updated. Quit and reopen Hermes to load the new version.'
+ }
+ }
+ }
+
+ if (outcome === 'guiSkew') {
+ emitUpdateProgress({
+ stage: 'guiSkew',
+ message:
+ 'Backend updated, but the desktop app package was not changed. ' +
+ 'Update or reinstall the Hermes desktop app to match.',
+ percent: 100
+ })
+ rememberLog(
+ `[updates] gui/backend skew: execPath ${process.execPath} not under release/*-unpacked; ` +
+ 'backend updated, GUI package unchanged (AppImage/.deb/.rpm/dev/unresolved)'
+ )
+ return { ok: true, backendUpdated: true, guiUpdated: false, guiSkew: true }
+ }
+
+ // outcome === 'manual': we're the rebuilt binary, but its sandbox helper is
+ // not launchable and no fallback applies. Keep this working window alive.
+ rememberLog(
+ `[updates] sandbox not launchable (${preflight.reason}); skipping auto-relaunch, ` +
+ 'returning manual-restart so the user keeps a working window'
+ )
+ return {
+ ok: true,
+ backendUpdated: true,
+ guiUpdated: false,
+ manualRestart: true,
+ sandboxBlocked: true,
+ message:
+ 'Backend updated. The rebuilt app can’t relaunch automatically ' +
+ '(sandbox helper needs root). Quit and reopen Hermes to finish.'
+ }
+ }
+
const rebuiltApp = [
path.join(updateRoot, 'apps', 'desktop', 'release', 'mac-arm64', 'Hermes.app'),
path.join(updateRoot, 'apps', 'desktop', 'release', 'mac', 'Hermes.app')
@@ -2963,20 +3162,7 @@ function runRenderTitleJob(rawUrl) {
}
try {
- window = new BrowserWindow({
- show: false,
- width: 1280,
- height: 800,
- webPreferences: {
- backgroundThrottling: false,
- contextIsolation: true,
- javascript: true,
- nodeIntegration: false,
- sandbox: true,
- session: partitionSession,
- webSecurity: true
- }
- })
+ window = createLinkTitleWindow(BrowserWindow, partitionSession)
} catch {
return finish('')
}
@@ -4905,6 +5091,14 @@ async function startHermes() {
}
}
+ // Mutual exclusion with an in-app update (#50238). If this instance was
+ // relaunched while the Tauri updater is still applying an update, spawning
+ // a local backend now re-locks the venv shim and gets killed by the
+ // updater's straggler cleanup — looping. Park until the update finishes (or
+ // is detected stale), THEN start the backend. Local backends only; remote
+ // connections returned above and never touch the install tree.
+ await waitForUpdateToFinish()
+
const token = crypto.randomBytes(32).toString('base64url')
// --port 0: the OS assigns an ephemeral port; the child announces it on stdout.
const dashboardArgs = ['dashboard', '--no-open', '--host', '127.0.0.1', '--port', '0']
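Reviewer note: as a reading aid for the Linux terminal-state comment in `applyUpdatesPosixInApp` above (#45205), the three outcomes can be sketched as a pure function. This is a minimal sketch, not the code in `update-relaunch.cjs`: the name `decideOutcomeSketch` is hypothetical, and it assumes the mapping is exactly the one that comment describes.

```javascript
// Hypothetical stand-in for decideRelaunchOutcome; it mirrors the comment in
// main.cjs, not the real module.
function decideOutcomeSketch({ underUnpacked, sandboxOk }) {
  // The running binary is not the rebuilt one: backend updated, GUI unchanged.
  if (!underUnpacked) return 'guiSkew'
  // The rebuilt binary is running: relaunch only if the sandbox is launchable
  // (or a no-sandbox fallback applies); otherwise keep the working window.
  return sandboxOk ? 'relaunch' : 'manual'
}

const outcomes = [
  decideOutcomeSketch({ underUnpacked: true, sandboxOk: true }),
  decideOutcomeSketch({ underUnpacked: false, sandboxOk: true }),
  decideOutcomeSketch({ underUnpacked: true, sandboxOk: false })
]
console.log(outcomes.join(', ')) // relaunch, guiSkew, manual
```

In this sketch `sandboxOk` is ignored whenever `underUnpacked` is false, which matches the comment's point that a packaged or dev shell is never relaunched in place.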
diff --git a/apps/desktop/electron/preload.cjs b/apps/desktop/electron/preload.cjs
index 93620facdf4..f2f348b1d36 100644
--- a/apps/desktop/electron/preload.cjs
+++ b/apps/desktop/electron/preload.cjs
@@ -166,6 +166,7 @@ contextBridge.exposeInMainWorld('hermesDesktop', {
return () => ipcRenderer.removeListener('hermes:bootstrap:event', listener)
},
getVersion: () => ipcRenderer.invoke('hermes:version'),
+ getRemoteDisplayReason: () => ipcRenderer.invoke('hermes:get-remote-display-reason'),
uninstall: {
summary: () => ipcRenderer.invoke('hermes:uninstall:summary'),
run: mode => ipcRenderer.invoke('hermes:uninstall:run', { mode })
diff --git a/apps/desktop/electron/update-marker.cjs b/apps/desktop/electron/update-marker.cjs
new file mode 100644
index 00000000000..a00a18baf00
--- /dev/null
+++ b/apps/desktop/electron/update-marker.cjs
@@ -0,0 +1,93 @@
+/**
+ * In-app update mutual-exclusion marker (#50238).
+ *
+ * The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
+ * duration of an `--update` run (see apps/bootstrap-installer/src-tauri/src/
+ * update.rs `UpdateMarkerGuard`). The marker body is two lines: the updater's
+ * pid and the unix-seconds it started.
+ *
+ * Why: if the user relaunches the desktop mid-update — the window vanished with
+ * no progress and looks crashed — a fresh instance must NOT spawn its own local
+ * backend. That backend re-locks the venv shim, the updater's straggler cleanup
+ * (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
+ * fails with the 45s "backend didn't come up" timeout, and the user relaunches
+ * into the same trap — an infinite respawn/kill loop. The desktop gates local
+ * backend startup on this marker and parks until the update finishes.
+ *
+ * This module holds the PURE, side-effect-light logic (path, pid liveness,
+ * parse + staleness) so it is unit-testable without booting Electron. The
+ * polling/boot-progress wrapper lives in main.cjs where the boot-progress and
+ * log sinks are.
+ */
+
+const fs = require('fs')
+const path = require('path')
+
+// Even with a live-looking PID, never treat a marker older than this as a live
+// update. A full update (git pull + pip + desktop rebuild) is minutes, not tens
+// of minutes; past this the marker is almost certainly stale (e.g. the OS
+// recycled the pid onto an unrelated process), so the gate self-heals.
+const UPDATE_MARKER_MAX_AGE_MS = 20 * 60 * 1000
+
+function markerPath(hermesHome) {
+ return path.join(hermesHome, '.hermes-update-in-progress')
+}
+
+// True only if a host process with this pid is currently alive. Signal 0 does
+// not deliver a signal — it just probes existence/permission. ESRCH => dead;
+// EPERM => alive but owned by another user (still "alive" for our purposes).
+// Injectable `kill` keeps it unit-testable.
+function isPidAlive(pid, kill = process.kill.bind(process)) {
+ if (!Number.isInteger(pid) || pid <= 0) return false
+ try {
+ kill(pid, 0)
+ return true
+ } catch (err) {
+ return Boolean(err && err.code === 'EPERM')
+ }
+}
+
+/**
+ * Read + interpret the marker.
+ *
+ * Returns `{ pid, ageMs }` only when an update is GENUINELY still running
+ * (parseable pid that is alive, within the age ceiling). Returns `null` for
+ * every "no live update" case — absent, unreadable, malformed, dead pid, or
+ * past the ceiling — and, when a stale marker file exists, deletes it so it
+ * cannot strand future launches.
+ *
+ * Pure-ish: file I/O against the given path, plus an injectable pid probe and
+ * clock for tests.
+ */
+function readLiveUpdateMarker(hermesHome, { kill, now = Date.now, maxAgeMs = UPDATE_MARKER_MAX_AGE_MS } = {}) {
+ const file = markerPath(hermesHome)
+ let raw
+ try {
+ raw = fs.readFileSync(file, 'utf8')
+ } catch {
+ return null // absent or unreadable => no live update
+ }
+
+ const [pidLine, startedLine] = String(raw).split('\n')
+ const pid = Number.parseInt((pidLine || '').trim(), 10)
+ const startedAt = Number.parseInt((startedLine || '').trim(), 10)
+ const ageMs = Number.isFinite(startedAt) ? now() - startedAt * 1000 : Infinity
+ const alive = Number.isInteger(pid) && isPidAlive(pid, kill)
+
+ if (!alive || ageMs > maxAgeMs) {
+ try {
+ fs.unlinkSync(file)
+ } catch {
+ void 0
+ }
+ return null
+ }
+ return { pid, ageMs }
+}
+
+module.exports = {
+ UPDATE_MARKER_MAX_AGE_MS,
+ markerPath,
+ isPidAlive,
+ readLiveUpdateMarker
+}
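Reviewer note: the two-line marker body (updater pid, then start time in unix seconds) can be illustrated without touching the filesystem. The helper `parseMarkerBodySketch` is hypothetical and covers only the parse and age arithmetic from `readLiveUpdateMarker`; pid liveness and stale-file deletion are deliberately left out.

```javascript
// Hypothetical helper: parses a marker body into { pid, ageMs }.
// Assumes the body format described in the module header: "<pid>\n<unix-seconds>".
function parseMarkerBodySketch(raw, nowMs) {
  const [pidLine, startedLine] = String(raw).split('\n')
  const pid = Number.parseInt((pidLine || '').trim(), 10)
  const startedAtSec = Number.parseInt((startedLine || '').trim(), 10)
  // An unparseable start time yields an infinite age, so it always reads as stale.
  const ageMs = Number.isFinite(startedAtSec) ? nowMs - startedAtSec * 1000 : Infinity
  return { pid, ageMs }
}

const nowMs = 1_000_000_000_000
const fresh = parseMarkerBodySketch('4242\n999999995', nowMs)
const malformed = parseMarkerBodySketch('not-a-pid\nnonsense', nowMs)

console.log(fresh.pid, fresh.ageMs) // 4242 5000
console.log(Number.isNaN(malformed.pid), malformed.ageMs) // true Infinity
```

A caller would then treat the marker as live only when the pid is a live process and `ageMs` is within the age ceiling, as the function above does.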
diff --git a/apps/desktop/electron/update-marker.test.cjs b/apps/desktop/electron/update-marker.test.cjs
new file mode 100644
index 00000000000..4de97dc2451
--- /dev/null
+++ b/apps/desktop/electron/update-marker.test.cjs
@@ -0,0 +1,92 @@
+/**
+ * Tests for electron/update-marker.cjs — the in-app update mutual-exclusion
+ * marker that prevents a desktop relaunched mid-update from spawning a backend
+ * the updater then kills in a loop (#50238).
+ *
+ * Run with: node --test electron/update-marker.test.cjs
+ * (Wired into npm test:desktop:platforms in package.json.)
+ *
+ * Why this matters: the gate must (a) report a live update only when the
+ * updater pid is alive AND the marker is fresh, (b) treat absent/malformed/
+ * dead-pid/expired markers as "no live update" so a crashed updater can't
+ * strand future launches, and (c) self-heal by deleting a stale marker file.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const fs = require('fs')
+const os = require('os')
+const path = require('path')
+
+const { markerPath, isPidAlive, readLiveUpdateMarker, UPDATE_MARKER_MAX_AGE_MS } = require('./update-marker.cjs')
+
+function tmpHome(tag) {
+ const dir = fs.mkdtempSync(path.join(os.tmpdir(), `hermes-marker-${tag}-`))
+ return dir
+}
+
+function writeMarker(home, pid, startedAtSec) {
+ fs.writeFileSync(markerPath(home), `${pid}\n${startedAtSec}`)
+}
+
+const ALIVE = () => true // injected kill that "succeeds" => pid alive
+const DEAD = () => {
+ const err = new Error('no such process')
+ err.code = 'ESRCH'
+ throw err
+}
+
+test('absent marker => no live update', () => {
+ const home = tmpHome('absent')
+ assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
+})
+
+test('live pid within age ceiling => live update reported', () => {
+ const home = tmpHome('live')
+ const now = 1_000_000_000_000
+ writeMarker(home, 4242, Math.floor(now / 1000) - 5) // 5s old
+ const res = readLiveUpdateMarker(home, { kill: ALIVE, now: () => now })
+ assert.ok(res, 'a fresh, alive marker is a live update')
+ assert.equal(res.pid, 4242)
+ assert.ok(res.ageMs >= 0 && res.ageMs < 10_000)
+ assert.ok(fs.existsSync(markerPath(home)), 'a live marker is NOT deleted')
+})
+
+test('dead pid => no live update and marker is pruned', () => {
+ const home = tmpHome('dead')
+ writeMarker(home, 999999, Math.floor(Date.now() / 1000))
+ assert.equal(readLiveUpdateMarker(home, { kill: DEAD }), null)
+ assert.ok(!fs.existsSync(markerPath(home)), 'a dead-pid marker self-heals (deleted)')
+})
+
+test('expired marker (past age ceiling) => no live update and pruned', () => {
+ const home = tmpHome('expired')
+ const now = 1_000_000_000_000
+ writeMarker(home, 4242, Math.floor((now - UPDATE_MARKER_MAX_AGE_MS - 60_000) / 1000))
+ // Even though the pid is "alive", the marker is too old to trust.
+ assert.equal(readLiveUpdateMarker(home, { kill: ALIVE, now: () => now }), null)
+ assert.ok(!fs.existsSync(markerPath(home)), 'an expired marker self-heals (deleted)')
+})
+
+test('malformed marker => no live update and pruned', () => {
+ const home = tmpHome('malformed')
+ fs.writeFileSync(markerPath(home), 'not-a-pid\nnonsense')
+ assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
+ assert.ok(!fs.existsSync(markerPath(home)))
+})
+
+test('isPidAlive: own pid is alive, impossible pid is dead', () => {
+ assert.equal(isPidAlive(process.pid), true)
+ assert.equal(isPidAlive(-1), false)
+ assert.equal(isPidAlive(0), false)
+ assert.equal(isPidAlive(NaN), false)
+})
+
+test('isPidAlive: EPERM counts as alive (process owned by another user)', () => {
+ const eperm = () => {
+ const err = new Error('operation not permitted')
+ err.code = 'EPERM'
+ throw err
+ }
+ assert.equal(isPidAlive(4242, eperm), true)
+})
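The liveness cases exercised by the tests above (ESRCH means the pid is gone, EPERM means it exists but belongs to another user) can be sketched with Node's signal-0 probe. This is a minimal illustration under stated assumptions, not the module's actual `isPidAlive`: the name `pidLooksAlive` and its injectable `kill` parameter are hypothetical and exist only for this example.

```javascript
// Hypothetical helper (NOT the real isPidAlive): probe a pid with signal 0.
// `kill` is injectable so the error paths can be exercised without real processes.
function pidLooksAlive(pid, kill = (p, sig) => process.kill(p, sig)) {
  // Reject non-integer and non-positive pids up front: pid 0 and negative
  // pids address process groups rather than a single process.
  if (!Number.isInteger(pid) || pid <= 0) return false
  try {
    kill(pid, 0) // signal 0 only performs the existence/permission check
    return true
  } catch (err) {
    // EPERM: the process exists but is owned by another user, so it is alive.
    // Any other error (typically ESRCH) is treated as "not running".
    return err?.code === 'EPERM'
  }
}

// Small fake that throws an errno-style error, mirroring the injected kills above.
const throwing = code => () => {
  throw Object.assign(new Error(code), { code })
}

console.log(pidLooksAlive(process.pid)) // true: our own process exists
console.log(pidLooksAlive(4242, throwing('ESRCH'))) // false
console.log(pidLooksAlive(4242, throwing('EPERM'))) // true
```

Treating EPERM as alive is the conservative choice for a lock-style marker: a process we are not allowed to signal is still a process that may be mid-update.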
diff --git a/apps/desktop/electron/update-relaunch.cjs b/apps/desktop/electron/update-relaunch.cjs
new file mode 100644
index 00000000000..62032cde8c9
--- /dev/null
+++ b/apps/desktop/electron/update-relaunch.cjs
@@ -0,0 +1,265 @@
+'use strict'
+
+/**
+ * update-relaunch.cjs — pure decision + script-generation helpers for the
+ * Linux in-app update relaunch (#45205).
+ *
+ * Extracted from main.cjs's `applyUpdatesPosixInApp` so the security- and
+ * correctness-critical "do we relaunch, or land on a manual terminal state?"
+ * decision is unit-testable without booting Electron (main.cjs calls
+ * `require('electron')` at load).
+ *
+ * Background
+ * ----------
+ * After `hermes update` + `hermes desktop --build-only`, the freshly-rebuilt
+ * GUI lives under `apps/desktop/release/<platform>-unpacked`. We can only honestly
+ * relaunch into the new GUI when the *running* binary is that rebuilt one —
+ * i.e. its execPath is under the rebuilt `release/<platform>-unpacked` dir.
+ *
+ * - Source / unpacked install (execPath under release/<platform>-unpacked):
+ * the running binary IS the thing we just rebuilt → relaunch it in place.
+ * - AppImage / .deb / .rpm / dev / unresolved (execPath elsewhere):
+ * the backend was updated but THIS GUI shell was NOT replaced. Claiming
+ * "the new version loads next launch" is a lie that produces GUI/backend
+ * skew (#37541): the user keeps running the old GUI against new backend
+ * code with no path to fix it from inside the app. Surface an explicit
+ * terminal state telling them the GUI package must be reinstalled.
+ *
+ * Sandbox preflight (#3 in the review)
+ * ------------------------------------
+ * A fresh `release/<platform>-unpacked` rebuild can leave `chrome-sandbox` without
+ * the required `root:root` + setuid (mode 4755). Electron then refuses to
+ * launch with "The SUID sandbox helper binary was found, but is not configured
+ * correctly" and the relaunch yields "quit and never came back" — a dead app.
+ * Before we quit+hand off we preflight the rebuilt sandbox helper; if it is NOT
+ * launchable (and no working non-interactive fallback applies — see
+ * sandboxFallbackFromEnv) we DO NOT quit. We keep the working window and return
+ * the closeable manual-restart terminal state instead.
+ */
+
+const path = require('node:path')
+
+// Map process.platform → electron-builder's `release/<platform>-unpacked` name.
+function unpackedDirName(platform) {
+ if (platform === 'darwin') return 'mac-unpacked' // not used (mac swaps bundles)
+ if (platform === 'win32') return 'win-unpacked'
+ return 'linux-unpacked'
+}
+
+/**
+ * If `execPath` lives under `<updateRoot>/apps/desktop/release/<platform>-unpacked`,
+ * return that unpacked dir; otherwise null. A null result means the running
+ * binary is NOT the thing we just rebuilt (AppImage/.deb/.rpm/dev), so we must
+ * not claim a GUI relaunch.
+ *
+ * Match is a path-segment-aware prefix check (not a bare string startsWith) so
+ * `.../release/linux-unpacked-evil` can't masquerade as `.../release/linux-unpacked`.
+ */
+function resolveUnpackedRelease(execPath, updateRoot, platform) {
+ if (!execPath || !updateRoot) return null
+ const releaseDir = path.join(updateRoot, 'apps', 'desktop', 'release')
+ const unpacked = path.join(releaseDir, unpackedDirName(platform))
+ const normalizedExec = path.resolve(String(execPath))
+ // execPath must be the unpacked dir itself or a descendant of it.
+ const withSep = unpacked.endsWith(path.sep) ? unpacked : unpacked + path.sep
+ if (normalizedExec === unpacked || normalizedExec.startsWith(withSep)) {
+ return unpacked
+ }
+ return null
+}
+
+/**
+ * Pure decision: given whether the running binary is under the rebuilt
+ * unpacked release AND whether its sandbox helper is launchable, choose the
+ * terminal outcome.
+ *
+ * 'relaunch' — quit + detached watcher re-execs the rebuilt binary in place.
+ * 'guiSkew' — backend updated, GUI package NOT changed; user must reinstall
+ * the GUI. Closeable terminal state; does NOT claim a GUI update.
+ * 'manual' — running the rebuilt binary, but its sandbox helper is not
+ * launchable and no fallback applies; do NOT quit into a dead
+ * app. Closeable manual-restart terminal state.
+ */
+function decideRelaunchOutcome({ underUnpacked, sandboxOk }) {
+ if (!underUnpacked) return 'guiSkew'
+ if (!sandboxOk) return 'manual'
+ return 'relaunch'
+}
+
+/**
+ * Preflight the rebuilt sandbox helper. Returns
+ * { ok: boolean, reason: string, path: string }
+ *
+ * `ok` is true when chrome-sandbox is owned by uid 0 AND has the setuid bit
+ * (mode & 0o4000) — i.e. Electron can launch it. If chrome-sandbox does not
+ * exist at all we treat it as ok: this Electron build does not use the SUID
+ * sandbox helper (e.g. it ships the namespace sandbox), so the relaunch is not
+ * blocked on it.
+ *
+ * `statSync` is injectable so this is testable without a real setuid file.
+ */
+function sandboxPreflight(unpackedDir, statSync) {
+ if (!unpackedDir) return { ok: false, reason: 'no-unpacked-dir', path: null }
+ const sandboxPath = path.join(unpackedDir, 'chrome-sandbox')
+ let st
+ try {
+ st = statSync(sandboxPath)
+ } catch {
+ // No chrome-sandbox helper present → this build doesn't rely on the SUID
+ // sandbox; nothing to block the relaunch.
+ return { ok: true, reason: 'no-sandbox-helper', path: sandboxPath }
+ }
+ const ownedByRoot = st.uid === 0
+ const hasSetuid = (st.mode & 0o4000) !== 0
+ if (ownedByRoot && hasSetuid) {
+ return { ok: true, reason: 'launchable', path: sandboxPath }
+ }
+ if (!ownedByRoot && !hasSetuid) {
+ return { ok: false, reason: 'not-root-not-setuid', path: sandboxPath }
+ }
+ if (!ownedByRoot) return { ok: false, reason: 'not-root', path: sandboxPath }
+ return { ok: false, reason: 'not-setuid', path: sandboxPath }
+}
+
+/**
+ * Detect a non-interactive sandbox fallback the user has opted into via the
+ * environment. The reviewer asked us to integrate with any existing
+ * `--no-sandbox` / chrome-sandbox handling. A repo grep found NO existing
+ * non-interactive sandbox fallback in the desktop app (the only chrome-sandbox
+ * reference is documentation in scripts/before-pack.cjs). The one signal that
+ * DOES exist is the standard Electron escape hatch: ELECTRON_DISABLE_SANDBOX=1
+ * (and the equivalent `--no-sandbox` already present in the launch args). If
+ * the user has set that, the rebuilt binary will start even with a broken
+ * chrome-sandbox, so the relaunch is safe.
+ *
+ * Returns true when a fallback makes the relaunch safe despite a failed
+ * sandbox preflight.
+ */
+function sandboxFallbackFromEnv(env, launchArgs) {
+ const disable = String((env && env.ELECTRON_DISABLE_SANDBOX) || '').trim()
+ if (disable === '1' || disable.toLowerCase() === 'true') return true
+ if (Array.isArray(launchArgs) && launchArgs.some(a => a === '--no-sandbox')) return true
+ return false
+}
+
+// POSIX single-quote a value for safe inclusion in the generated bash script.
+function shellQuote(value) {
+ return `'${String(value).replace(/'/g, `'\\''`)}'`
+}
+
+// Electron / Chromium internal switches that must NOT be replayed on re-exec:
+// they are runtime artifacts of THIS launch, not user intent, and re-passing
+// them can change sandbox/zygote behavior or point at stale fds/dirs.
+const INTERNAL_ARG_PREFIXES = [
+ '--type=', // renderer/gpu/zygote child markers
+ '--user-data-dir=',
+ '--enable-features=',
+ '--disable-features=',
+ '--field-trial-handle=',
+ '--enable-logging',
+ '--log-file=',
+ // NB: --no-sandbox is deliberately NOT stripped — it reflects the user's /
+ // environment's SUID-sandbox opt-out (some hardened kernels/containers require
+ // it) and is the signal sandboxFallbackFromEnv() uses to allow a relaunch when
+ // chrome-sandbox isn't setuid. Dropping it would make exactly that relaunch
+ // fail ("quit and never came back").
+ '--disable-gpu-sandbox',
+ '--lang=',
+ '--inspect',
+ '--remote-debugging-port='
+]
+
+/**
+ * Filter Electron internals out of the original launch args so we replay only
+ * meaningful user/launcher intent (deep-link URLs, app-specific flags).
+ * `argv` is expected to be process.argv.slice(1) for a PACKAGED app (argv[0] is
+ * the exec path itself; there is no entry-script arg as in a dev run).
+ */
+function collectRelaunchArgs(argv) {
+ if (!Array.isArray(argv)) return []
+ return argv.filter(arg => {
+ if (typeof arg !== 'string' || arg.length === 0) return false
+ return !INTERNAL_ARG_PREFIXES.some(prefix =>
+ prefix.endsWith('=') ? arg.startsWith(prefix) : arg === prefix || arg.startsWith(prefix + '=')
+ )
+ })
+}
+
+// Env keys whose values define the relaunched instance's context (which
+// backend/profile/root it talks to). Anything HERMES_DESKTOP_* is preserved
+// plus HERMES_HOME. We snapshot the values, not the live env, so the new
+// instance comes up pointed at the same place this one was.
+// ELECTRON_DISABLE_SANDBOX is preserved for the same reason --no-sandbox is kept
+// in the replayed args: if a relaunch is only safe because the user opted out of
+// the SUID sandbox, the relaunched instance must inherit that opt-out too.
+const PRESERVED_ENV_KEYS = ['HERMES_HOME', 'ELECTRON_DISABLE_SANDBOX']
+const PRESERVED_ENV_PREFIXES = ['HERMES_DESKTOP_']
+
+function collectRelaunchEnv(env) {
+ const out = {}
+ if (!env || typeof env !== 'object') return out
+ for (const [key, value] of Object.entries(env)) {
+ if (value == null) continue
+ if (PRESERVED_ENV_KEYS.includes(key) || PRESERVED_ENV_PREFIXES.some(p => key.startsWith(p))) {
+ out[key] = String(value)
+ }
+ }
+ return out
+}
+
+/**
+ * Build the detached bash watcher that waits for the parent to exit (graceful
+ * window then SIGKILL), self-deletes, and re-execs the rebuilt binary WITH the
+ * original launch context (cwd, env, args) restored.
+ *
+ * @param {object} o
+ * @param {number} o.pid parent (this) process pid to wait on
+ * @param {string} o.execPath binary to re-exec
+ * @param {string[]} o.args filtered launch args to replay
+ * @param {object} o.env env key→value to export before exec
+ * @param {string} o.cwd working directory to restore
+ */
+function buildRelaunchScript({ pid, execPath, args, env, cwd }) {
+ const exports = Object.entries(env || {})
+ .map(([k, v]) => `export ${k}=${shellQuote(v)}`)
+ .join('\n')
+ const quotedArgs = (args || []).map(shellQuote).join(' ')
+ const cwdLine = cwd ? `cd ${shellQuote(cwd)} 2>/dev/null || true` : ''
+ // NOTE: `exec` replaces the watcher process with the relaunched app, so the
+ // re-exec inherits exactly the env/cwd we set above.
+ return `#!/bin/bash
+set -u
+APP_PID=${Number(pid)}
+# Wait up to ~30s for a graceful exit, then SIGKILL: a hung/zombie parent must
+# be gone before we relaunch, or the new instance bails on the single-instance
+# lock. (#45205)
+for _ in $(seq 1 60); do
+ kill -0 "$APP_PID" 2>/dev/null || break
+ sleep 0.5
+done
+if kill -0 "$APP_PID" 2>/dev/null; then
+ kill -9 "$APP_PID" 2>/dev/null || true
+ sleep 0.5
+fi
+# Self-delete so temp watchers don't accumulate across updates.
+rm -f -- "$0" 2>/dev/null || true
+${cwdLine}
+${exports}
+exec ${shellQuote(execPath)}${quotedArgs ? ' ' + quotedArgs : ''}
+`
+}
+
+module.exports = {
+ unpackedDirName,
+ resolveUnpackedRelease,
+ decideRelaunchOutcome,
+ sandboxPreflight,
+ sandboxFallbackFromEnv,
+ collectRelaunchArgs,
+ collectRelaunchEnv,
+ buildRelaunchScript,
+ shellQuote,
+ INTERNAL_ARG_PREFIXES,
+ PRESERVED_ENV_KEYS,
+ PRESERVED_ENV_PREFIXES
+}
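The path-segment-aware prefix check described for `resolveUnpackedRelease` can be shown in isolation. The sketch below is an illustration only: `isWithinDir` and the `/srv/app/...` paths are hypothetical names chosen for the example, and the expected outputs assume a POSIX host.

```javascript
const path = require('node:path')

// Hypothetical helper: true when `candidate` is `dir` itself or lives below it.
function isWithinDir(candidate, dir) {
  const resolved = path.resolve(candidate)
  const base = path.resolve(dir)
  // Appending the separator is what makes the comparison segment-aware:
  // "linux-unpacked/" is not a prefix of "linux-unpacked-evil/...".
  const baseWithSep = base.endsWith(path.sep) ? base : base + path.sep
  return resolved === base || resolved.startsWith(baseWithSep)
}

const dir = '/srv/app/release/linux-unpacked'
const inside = '/srv/app/release/linux-unpacked/hermes'
const sibling = '/srv/app/release/linux-unpacked-evil/hermes'

console.log(isWithinDir(inside, dir)) // true
console.log(isWithinDir(dir, dir)) // true: the directory itself counts
console.log(isWithinDir(sibling, dir)) // false
// A bare string prefix test would wrongly accept the sibling directory:
console.log(sibling.startsWith(dir)) // true
```

The last line is the failure mode the separator guards against: without it, any directory whose name merely begins with the trusted directory's name would pass.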
diff --git a/apps/desktop/electron/update-relaunch.test.cjs b/apps/desktop/electron/update-relaunch.test.cjs
new file mode 100644
index 00000000000..0cccb1b20eb
--- /dev/null
+++ b/apps/desktop/electron/update-relaunch.test.cjs
@@ -0,0 +1,231 @@
+/**
+ * Tests for electron/update-relaunch.cjs — the pure decision + script helpers
+ * behind the Linux in-app update relaunch (#45205).
+ *
+ * Run with: node --test electron/update-relaunch.test.cjs
+ * (Wired into npm test:desktop:platforms in package.json.)
+ *
+ * What this locks (review acceptance criteria for PR #45205):
+ * 1. The execPath split: only a binary under release/<platform>-unpacked may
+ * relaunch/claim a GUI update; AppImage/.deb/.rpm/dev/unresolved paths land
+ * on the guiSkew terminal state and do NOT claim the GUI was updated.
+ * 2. Launch context is replayed on re-exec (args filtered of Electron
+ * internals; HERMES_HOME / HERMES_DESKTOP_* env + cwd preserved) and is
+ * safely shell-quoted.
+ * 3. The sandbox preflight: chrome-sandbox must be root-owned + setuid to be
+ * launchable; otherwise the decision degrades to a manual terminal state
+ * (keep a working window) unless a non-interactive fallback applies.
+ */
+
+const test = require('node:test')
+const assert = require('node:assert/strict')
+const fs = require('node:fs')
+const os = require('node:os')
+const path = require('node:path')
+const { execFileSync } = require('node:child_process')
+
+const {
+ unpackedDirName,
+ resolveUnpackedRelease,
+ decideRelaunchOutcome,
+ sandboxPreflight,
+ sandboxFallbackFromEnv,
+ collectRelaunchArgs,
+ collectRelaunchEnv,
+ buildRelaunchScript,
+ shellQuote
+} = require('./update-relaunch.cjs')
+
+const ROOT = '/home/u/.hermes/hermes-agent'
+const UNPACKED = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked')
+
+// ---------------------------------------------------------------------------
+// 1) The execPath split — the heart of the GUI/backend skew guard.
+// ---------------------------------------------------------------------------
+
+test('unpackedDirName maps platform to the electron-builder dir', () => {
+ assert.equal(unpackedDirName('linux'), 'linux-unpacked')
+ assert.equal(unpackedDirName('win32'), 'win-unpacked')
+})
+
+test('resolveUnpackedRelease returns the dir for a binary UNDER release/<platform>-unpacked', () => {
+ const exec = path.join(UNPACKED, 'hermes')
+ assert.equal(resolveUnpackedRelease(exec, ROOT, 'linux'), UNPACKED)
+ // The unpacked dir itself also counts.
+ assert.equal(resolveUnpackedRelease(UNPACKED, ROOT, 'linux'), UNPACKED)
+})
+
+test('resolveUnpackedRelease is null for AppImage / .deb / .rpm / dev / unresolved paths', () => {
+ // AppImage mount
+ assert.equal(resolveUnpackedRelease('/tmp/.mount_Hermes12345/AppRun', ROOT, 'linux'), null)
+ // .deb / .rpm system install
+ assert.equal(resolveUnpackedRelease('/usr/lib/hermes/hermes', ROOT, 'linux'), null)
+ assert.equal(resolveUnpackedRelease('/opt/Hermes/hermes', ROOT, 'linux'), null)
+ // dev electron
+ assert.equal(resolveUnpackedRelease('/home/u/.hermes/hermes-agent/node_modules/electron/dist/electron', ROOT, 'linux'), null)
+ // empty / missing
+ assert.equal(resolveUnpackedRelease('', ROOT, 'linux'), null)
+ assert.equal(resolveUnpackedRelease(path.join(UNPACKED, 'hermes'), '', 'linux'), null)
+})
+
+test('resolveUnpackedRelease is not fooled by a sibling prefix dir', () => {
+ // `.../release/linux-unpacked-evil` must NOT match `.../release/linux-unpacked`.
+ const sneaky = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked-evil', 'hermes')
+ assert.equal(resolveUnpackedRelease(sneaky, ROOT, 'linux'), null)
+})
+
+test('decideRelaunchOutcome: only under-unpacked + sandbox-ok relaunches', () => {
+ assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: true }), 'relaunch')
+ // Under unpacked but sandbox not launchable → manual (keep a working window).
+ assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: false }), 'manual')
+ // Not under unpacked → guiSkew regardless of sandbox flag.
+ assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: true }), 'guiSkew')
+ assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: false }), 'guiSkew')
+})
+
+// ---------------------------------------------------------------------------
+// 3) Sandbox preflight
+// ---------------------------------------------------------------------------
+
+const fakeStat = (uid, mode) => () => ({ uid, mode })
+const throwStat = () => {
+ throw Object.assign(new Error('ENOENT'), { code: 'ENOENT' })
+}
+
+test('sandboxPreflight: root-owned + setuid is launchable', () => {
+ const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o4755))
+ assert.equal(r.ok, true)
+ assert.equal(r.reason, 'launchable')
+})
+
+test('sandboxPreflight: not root → not launchable', () => {
+ const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o4755))
+ assert.equal(r.ok, false)
+ assert.equal(r.reason, 'not-root')
+})
+
+test('sandboxPreflight: missing setuid bit → not launchable', () => {
+ const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o755))
+ assert.equal(r.ok, false)
+ assert.equal(r.reason, 'not-setuid')
+})
+
+test('sandboxPreflight: neither root nor setuid (the fresh-rebuild trap)', () => {
+ const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o755))
+ assert.equal(r.ok, false)
+ assert.equal(r.reason, 'not-root-not-setuid')
+})
+
+test('sandboxPreflight: no chrome-sandbox helper present → ok (build does not use SUID sandbox)', () => {
+ const r = sandboxPreflight(UNPACKED, throwStat)
+ assert.equal(r.ok, true)
+ assert.equal(r.reason, 'no-sandbox-helper')
+})
+
+test('sandboxFallbackFromEnv: ELECTRON_DISABLE_SANDBOX / --no-sandbox make a broken sandbox safe', () => {
+ assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: '1' }, []), true)
+ assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: 'true' }, []), true)
+ assert.equal(sandboxFallbackFromEnv({}, ['--no-sandbox']), true)
+ assert.equal(sandboxFallbackFromEnv({}, ['--foo']), false)
+ assert.equal(sandboxFallbackFromEnv({}, []), false)
+ assert.equal(sandboxFallbackFromEnv(null, null), false)
+})
+
+// ---------------------------------------------------------------------------
+// 2) Launch-context preservation
+// ---------------------------------------------------------------------------
+
+test('collectRelaunchArgs drops Electron internals, keeps user/launcher args', () => {
+ const argv = [
+ '--type=renderer',
+ '--user-data-dir=/tmp/x',
+ '--enable-features=Foo',
+ '--field-trial-handle=123',
+ '--no-sandbox', // sandbox opt-out — KEEP (user/env intent + relaunch fallback)
+ '--lang=en-US',
+ 'hermes://open/agent/42', // deep link — keep
+ '--profile=work', // app flag — keep
+ '--remote-debugging-port=9222' // internal — drop
+ ]
+ assert.deepEqual(collectRelaunchArgs(argv), ['--no-sandbox', 'hermes://open/agent/42', '--profile=work'])
+ assert.deepEqual(collectRelaunchArgs(undefined), [])
+})
+
+test('collectRelaunchEnv preserves HERMES_HOME + HERMES_DESKTOP_* + sandbox opt-out only', () => {
+ const env = {
+ HERMES_HOME: '/home/u/.hermes',
+ HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
+ HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
+ HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
+ ELECTRON_DISABLE_SANDBOX: '1', // sandbox opt-out — preserved
+ PATH: '/usr/bin', // not preserved
+ HOME: '/home/u', // not preserved
+ UNRELATED: 'x'
+ }
+ assert.deepEqual(collectRelaunchEnv(env), {
+ HERMES_HOME: '/home/u/.hermes',
+ HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
+ HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
+ HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
+ ELECTRON_DISABLE_SANDBOX: '1'
+ })
+ assert.deepEqual(collectRelaunchEnv(null), {})
+})
+
+// ---------------------------------------------------------------------------
+// Generated watcher script: safe quoting + valid bash syntax.
+// ---------------------------------------------------------------------------
+
+test('shellQuote neutralizes single quotes and metacharacters', () => {
+ assert.equal(shellQuote(`a'b`), `'a'\\''b'`)
+ assert.equal(shellQuote('$(rm -rf /)'), `'$(rm -rf /)'`)
+})
+
+test('buildRelaunchScript embeds pid/exec/args/env/cwd and is valid bash', () => {
+ const script = buildRelaunchScript({
+ pid: 4242,
+ execPath: '/home/u/.hermes/hermes-agent/apps/desktop/release/linux-unpacked/Hermes',
+ args: ['hermes://open/agent/42', "--note=it's fine"],
+ env: { HERMES_HOME: '/home/u/.hermes', HERMES_DESKTOP_REMOTE_URL: 'http://box:9119' },
+ cwd: '/home/u/work dir'
+ })
+
+ // Structural assertions.
+ assert.match(script, /^#!\/bin\/bash/)
+ assert.match(script, /APP_PID=4242/)
+ assert.match(script, /kill -9 "\$APP_PID"/)
+ assert.match(script, /rm -f -- "\$0"/)
+ // env exports + cwd restore + args replay are present and quoted.
+ assert.match(script, /export HERMES_HOME='\/home\/u\/\.hermes'/)
+ assert.match(script, /export HERMES_DESKTOP_REMOTE_URL='http:\/\/box:9119'/)
+ assert.match(script, /cd '\/home\/u\/work dir'/)
+ assert.match(script, /exec '.*\/linux-unpacked\/Hermes' 'hermes:\/\/open\/agent\/42' '--note=it'\\''s fine'/)
+
+ // It must be syntactically valid bash (`bash -n`). Write to a temp file and lint.
+ const tmp = path.join(os.tmpdir(), `hermes-relaunch-test-${Date.now()}.sh`)
+ fs.writeFileSync(tmp, script)
+ try {
+ execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
+ } finally {
+ fs.rmSync(tmp, { force: true })
+ }
+})
+
+test('buildRelaunchScript with no args/env still lints clean', () => {
+ const script = buildRelaunchScript({
+ pid: 1,
+ execPath: '/opt/Hermes/Hermes',
+ args: [],
+ env: {},
+ cwd: ''
+ })
+ const tmp = path.join(os.tmpdir(), `hermes-relaunch-test2-${Date.now()}.sh`)
+ fs.writeFileSync(tmp, script)
+ try {
+ execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
+ } finally {
+ fs.rmSync(tmp, { force: true })
+ }
+ // exec line has no trailing args.
+ assert.match(script, /exec '\/opt\/Hermes\/Hermes'\n/)
+})
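Acceptance criteria 1 and 3 above combine into a single three-way outcome. The sketch below shows one plausible way a caller could fold the sandbox preflight and the opt-out fallback into that decision. It is an assumption for illustration: `chooseOutcome` and its parameter names are hypothetical, and the real call site in main.cjs is not part of this excerpt.

```javascript
// Hypothetical glue, NOT the real caller in main.cjs.
// underUnpacked: the running binary lives under the rebuilt unpacked release
// preflightOk:   chrome-sandbox passed the root-owned + setuid check
// fallback:      the user opted out of the SUID sandbox (env var or --no-sandbox)
function chooseOutcome({ underUnpacked, preflightOk, fallback }) {
  if (!underUnpacked) return 'guiSkew' // the GUI package was not rebuilt
  if (!(preflightOk || fallback)) return 'manual' // quitting would leave a dead app
  return 'relaunch'
}

const cases = [
  { underUnpacked: true, preflightOk: true, fallback: false },
  { underUnpacked: true, preflightOk: false, fallback: true },
  { underUnpacked: true, preflightOk: false, fallback: false },
  { underUnpacked: false, preflightOk: true, fallback: true }
]
for (const c of cases) console.log(JSON.stringify(c), '=>', chooseOutcome(c))
```

The four cases resolve to `relaunch`, `relaunch`, `manual`, and `guiSkew` respectively. The ordering matters: the skew check comes first, so a healthy sandbox never causes a packaged install to claim a GUI update.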
diff --git a/apps/desktop/package.json b/apps/desktop/package.json
index c1d2290e4cb..81e855451f8 100644
--- a/apps/desktop/package.json
+++ b/apps/desktop/package.json
@@ -2,7 +2,7 @@
"name": "hermes",
"productName": "Hermes",
"private": true,
- "version": "0.15.1",
+ "version": "0.17.0",
"description": "Native desktop shell for Hermes Agent.",
"author": "Nous Research",
"type": "module",
@@ -37,7 +37,7 @@
"test:desktop:nsis": "node scripts/test-desktop.mjs nsis",
"test:desktop:existing": "node scripts/test-desktop.mjs existing",
"test:desktop:fresh": "node scripts/test-desktop.mjs fresh",
- "test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/windows-user-env.test.cjs",
+ "test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/backend-ready.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/link-title-window.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/update-marker.test.cjs electron/update-relaunch.test.cjs electron/windows-user-env.test.cjs",
"typecheck": "tsc -p . --noEmit",
"lint": "eslint src/ electron/",
"lint:fix": "eslint src/ electron/ --fix",
diff --git a/apps/desktop/src/app/agents/index.tsx b/apps/desktop/src/app/agents/index.tsx
index ec8f186dd1b..6a1fbf9eeea 100644
--- a/apps/desktop/src/app/agents/index.tsx
+++ b/apps/desktop/src/app/agents/index.tsx
@@ -357,7 +357,7 @@ function SubagentRow({ node, depth = 0, nowMs }: { node: SubagentNode; depth?: n
{visibleRows.length > 0 ? (
-
+ )
+}
diff --git a/apps/desktop/src/components/ui/file-type-icon.tsx b/apps/desktop/src/components/ui/file-type-icon.tsx
new file mode 100644
index 00000000000..fe40c4f2437
--- /dev/null
+++ b/apps/desktop/src/components/ui/file-type-icon.tsx
@@ -0,0 +1,22 @@
+import { ToolIcon, type ToolIconProps } from '@/components/ui/tool-icon'
+import { codiconForFilename, codiconForLanguage } from '@/lib/markdown-code'
+
+export interface FileTypeIconProps extends Omit<ToolIconProps, 'name'> {
+ /** A code-fence language tag (e.g. `ts`, `json`). Used when no `path`. */
+ language?: string
+ /** A file path or bare name; its extension selects the icon. Wins over `language`. */
+ path?: string
+}
+
+/**
+ * Icon for a file or code language, resolved through the one mapping shared
+ * with code blocks (`codiconForFilename` / `codiconForLanguage`). Renders via
+ * `ToolIcon`, so it uses a filled glyph when one exists and falls back to the
+ * outline codicon font otherwise. Pass a `path` for file rows or a `language`
+ * for fenced code.
+ */
+export function FileTypeIcon({ language, path, ...props }: FileTypeIconProps) {
+ const name = path ? codiconForFilename(path) : codiconForLanguage(language)
+
+ return <ToolIcon name={name} {...props} />
+}
diff --git a/apps/desktop/src/components/ui/log-view.tsx b/apps/desktop/src/components/ui/log-view.tsx
index fcaad4d62b1..8ae191af8c0 100644
--- a/apps/desktop/src/components/ui/log-view.tsx
+++ b/apps/desktop/src/components/ui/log-view.tsx
@@ -4,6 +4,7 @@ import { cn } from '@/lib/utils'
// Shared raw-log viewer: no bg, hairline border, tight padding, small mono.
// One style everywhere we surface logs. Pass a max-h-* via className.
+// Selectable by default — logs exist to be read and copied.
export function LogView({ className, ...props }: ComponentProps<'div'>) {
return (
 <div
 className={cn(
'overflow-auto rounded-lg border border-(--ui-stroke-tertiary) px-2.5 py-1.5 font-mono text-[0.6875rem] leading-[1.5] whitespace-pre-wrap break-words text-(--ui-text-tertiary) [scrollbar-width:thin]',
className
)}
+ data-selectable-text="true"
{...props}
/>
)
diff --git a/apps/desktop/src/global.d.ts b/apps/desktop/src/global.d.ts
index 5e41d3e7423..1e90d3b10a0 100644
--- a/apps/desktop/src/global.d.ts
+++ b/apps/desktop/src/global.d.ts
@@ -123,6 +123,7 @@ declare global {
cancelBootstrap: () => Promise<{ ok: boolean; cancelled: boolean }>
onBootstrapEvent: (callback: (payload: DesktopBootstrapEvent) => void) => () => void
getVersion: () => Promise
+ getRemoteDisplayReason?: () => Promise
updates: {
check: () => Promise
apply: (opts?: DesktopUpdateApplyOptions) => Promise
@@ -249,9 +250,45 @@ export interface DesktopUpdateApplyResult {
manual?: boolean
command?: string
hermesRoot?: string
+ /** True when the backend was updated but the GUI couldn't be relaunched in
+ * place (AppImage / dev run): the new version loads on next launch. */
+ backendUpdated?: boolean
+ /** False when the running GUI package was NOT replaced by this update
+ * (Linux GUI/backend skew, or a sandbox-blocked relaunch). Distinguishes
+ * "backend only" outcomes from a real in-place GUI relaunch. (#45205) */
+ guiUpdated?: boolean
+ /** True for the Linux GUI/backend-skew terminal state: backend updated but
+ * the running AppImage/.deb/.rpm shell is unchanged and must be
+ * reinstalled. Renders a closeable "update the desktop app" message. */
+ guiSkew?: boolean
+ /** True when the update finished but the app must be quit + reopened by hand
+ * (e.g. the rebuilt sandbox helper isn't launchable): keep a working
+ * window, don't auto-quit into a dead app. (#45205) */
+ manualRestart?: boolean
+ /** True when the auto-relaunch was skipped specifically because the rebuilt
+ * chrome-sandbox helper is not launchable (not root:root + setuid). */
+ sandboxBlocked?: boolean
+ /** True when a detached relauncher took over (macOS bundle swap / Linux
+ * re-exec): the app is about to quit and reopen itself. */
+ handedOff?: boolean
}
-export type DesktopUpdateStage = 'idle' | 'prepare' | 'fetch' | 'pull' | 'pydeps' | 'restart' | 'manual' | 'error'
+export type DesktopUpdateStage =
+ | 'idle'
+ | 'prepare'
+ | 'fetch'
+ | 'pull'
+ | 'pydeps'
+ | 'update'
+ | 'rebuild'
+ | 'restart'
+ | 'done'
+ | 'manual'
+ /** Backend updated but the running GUI package (AppImage/.deb/.rpm) was NOT
+ * changed — the user must update/reinstall the desktop app. Terminal,
+ * closeable; never claims the GUI was updated. (#45205) */
+ | 'guiSkew'
+ | 'error'
export interface DesktopUpdateProgress {
stage: DesktopUpdateStage
diff --git a/apps/desktop/src/hermes.ts b/apps/desktop/src/hermes.ts
index 3b200a598f4..197e24611ab 100644
--- a/apps/desktop/src/hermes.ts
+++ b/apps/desktop/src/hermes.ts
@@ -660,10 +660,10 @@ export function getUsageAnalytics(days = 30): Promise {
})
}
-export function getGlobalModelOptions(): Promise {
+export function getGlobalModelOptions(opts?: { refresh?: boolean }): Promise {
return window.hermesDesktop.api({
...profileScoped(),
- path: '/api/model/options'
+ path: opts?.refresh ? '/api/model/options?refresh=1' : '/api/model/options'
})
}
diff --git a/apps/desktop/src/i18n/en.ts b/apps/desktop/src/i18n/en.ts
index e8be5a6dec8..2323558629e 100644
--- a/apps/desktop/src/i18n/en.ts
+++ b/apps/desktop/src/i18n/en.ts
@@ -146,6 +146,12 @@ export const en: Translations = {
}
},
+ remoteDisplayBanner: {
+ message: reason =>
+ `Software rendering active — remote display detected (${reason}). GPU acceleration is disabled to prevent flickering.`,
+ dismiss: 'Dismiss'
+ },
+
titlebar: {
hideSidebar: 'Hide sidebar',
showSidebar: 'Show sidebar',
@@ -403,6 +409,7 @@ export const en: Translations = {
checkNow: 'Check now',
checking: 'Checking…',
seeWhatsNew: "See what's new",
+ updateNow: 'Update now',
releaseNotes: 'Release notes',
onLatest: "You're on the latest version.",
installing: 'An update is currently installing.',
@@ -606,6 +613,8 @@ export const en: Translations = {
removedMessage: provider => `${provider} was removed.`,
failedRemove: provider => `Could not remove ${provider}`,
noProviderKeys: 'No provider API keys available.',
+ searchKeys: 'Search providers…',
+ noKeysMatch: 'No providers match your search.',
loading: 'Loading providers...'
},
sessions: {
@@ -800,7 +809,8 @@ export const en: Translations = {
gatewayRunning: 'Messaging gateway running',
gatewayStopped: 'Messaging gateway stopped',
hermesActiveSessions: (version, count) => `Hermes ${version} · Active sessions ${count}`,
- restartMessaging: 'Restart messaging',
+ restartGateway: 'Restart gateway',
+ gatewayRestartFailed: 'Gateway restart failed.',
updateHermes: 'Update Hermes',
actionRunning: 'running',
actionDone: 'done',
@@ -869,9 +879,9 @@ export const en: Translations = {
disableAria: name => `Disable ${name}`,
platformEnabled: name => `${name} enabled`,
platformDisabled: name => `${name} disabled`,
- restartToApply: 'Restart the gateway for this change to take effect.',
+ restartToApply: 'This change takes effect after a gateway restart.',
setupSaved: name => `${name} setup saved`,
- restartToReconnect: 'Restart the gateway to reconnect with the new credentials.',
+ restartToReconnect: 'New credentials take effect after a gateway restart.',
keyCleared: key => `${key} cleared`,
setupUpdated: name => `${name} setup was updated.`,
failedUpdate: name => `Failed to update ${name}`,
@@ -1384,8 +1394,12 @@ export const en: Translations = {
fetch: 'Downloading…',
pull: 'Almost there…',
pydeps: 'Finishing up…',
+ update: 'Updating Hermes…',
+ rebuild: 'Rebuilding the desktop app…',
restart: 'Restarting Hermes…',
+ done: 'Update complete',
manual: 'Update from your terminal',
+ guiSkew: 'Update the desktop app',
error: 'Update paused'
},
checking: 'Looking for updates…',
@@ -1408,13 +1422,17 @@ export const en: Translations = {
manualTitle: 'Update from your terminal',
manualBody: 'You installed Hermes from the command line, so updates run there too. Paste this into your terminal:',
manualPickedUp: 'Hermes will pick up the new version next time you launch it.',
+ guiSkewTitle: 'Update the desktop app',
+ guiSkewBody:
+ 'The backend was updated, but this desktop app package wasn’t changed. Update or reinstall the Hermes desktop app (your AppImage / .deb / .rpm) to match.',
copy: 'Copy',
copied: 'Copied',
done: 'Done',
- applyingBody: 'The Hermes updater will take over in its own window and reopen Hermes when it’s done.',
+ applyingBody:
+ 'The Hermes updater takes over in its own window and reopens Hermes automatically when it’s done. Please don’t reopen Hermes yourself while it’s updating.',
applyingBodyBackend:
'The remote backend is applying the update and will restart. Hermes reconnects automatically when it’s back.',
- applyingClose: 'Hermes will close to apply the update.',
+ applyingClose: 'This window will close while the update runs, then Hermes reopens on its own.',
errorTitle: 'Update didn’t finish',
errorBody: 'No worries — nothing was lost. You can try again now.',
notNow: 'Not now',
@@ -1571,6 +1589,7 @@ export const en: Translations = {
search: 'Search models',
noModels: 'No models found',
editModels: 'Edit Models…',
+ refreshModels: 'Refresh Models',
fast: 'Fast',
medium: 'Med'
},
@@ -1625,6 +1644,7 @@ export const en: Translations = {
gatewayChecking: 'checking',
gatewayConnecting: 'connecting',
gatewayOffline: 'offline',
+ gatewayRestarting: 'restarting…',
gatewayTitle: 'Hermes inference gateway status',
agents: 'Agents',
closeAgents: 'Close agents',
diff --git a/apps/desktop/src/i18n/ja.ts b/apps/desktop/src/i18n/ja.ts
index 3a28b50aac3..2f0535a6942 100644
--- a/apps/desktop/src/i18n/ja.ts
+++ b/apps/desktop/src/i18n/ja.ts
@@ -147,6 +147,12 @@ export const ja = defineLocale({
}
},
+ remoteDisplayBanner: {
+ message: reason =>
+ `ソフトウェアレンダリングが有効です — リモートディスプレイを検出しました(${reason})。ちらつきを防ぐため GPU アクセラレーションは無効化されています。`,
+ dismiss: '閉じる'
+ },
+
titlebar: {
hideSidebar: 'サイドバーを非表示',
showSidebar: 'サイドバーを表示',
@@ -525,6 +531,7 @@ export const ja = defineLocale({
checkNow: '今すぐ確認',
checking: '確認中…',
seeWhatsNew: '新機能を見る',
+ updateNow: '今すぐ更新',
releaseNotes: 'リリースノート',
onLatest: '最新バージョンです。',
installing: '更新をインストール中です。',
@@ -725,6 +732,8 @@ export const ja = defineLocale({
removedMessage: provider => `${provider} を削除しました。`,
failedRemove: provider => `${provider} を削除できませんでした`,
noProviderKeys: '利用可能なプロバイダー API キーがありません。',
+ searchKeys: 'プロバイダーを検索…',
+ noKeysMatch: '一致するプロバイダーがありません。',
loading: 'プロバイダーを読み込み中...'
},
sessions: {
@@ -920,7 +929,8 @@ export const ja = defineLocale({
gatewayRunning: 'メッセージングゲートウェイが実行中',
gatewayStopped: 'メッセージングゲートウェイが停止中',
hermesActiveSessions: (version, count) => `Hermes ${version} · アクティブセッション ${count}`,
- restartMessaging: 'メッセージングを再起動',
+ restartGateway: 'ゲートウェイを再起動',
+ gatewayRestartFailed: 'ゲートウェイの再起動に失敗しました。',
updateHermes: 'Hermes を更新',
actionRunning: '実行中',
actionDone: '完了',
@@ -990,9 +1000,9 @@ export const ja = defineLocale({
disableAria: name => `${name} を無効にする`,
platformEnabled: name => `${name} を有効にしました`,
platformDisabled: name => `${name} を無効にしました`,
- restartToApply: 'この変更を有効にするにはゲートウェイを再起動してください。',
+ restartToApply: 'この変更はゲートウェイの再起動後に有効になります。',
setupSaved: name => `${name} の設定を保存しました`,
- restartToReconnect: '新しい認証情報で再接続するにはゲートウェイを再起動してください。',
+ restartToReconnect: '新しい認証情報はゲートウェイの再起動後に有効になります。',
keyCleared: key => `${key} をクリアしました`,
setupUpdated: name => `${name} の設定が更新されました。`,
failedUpdate: name => `${name} の更新に失敗しました`,
@@ -1512,8 +1522,12 @@ export const ja = defineLocale({
fetch: 'ダウンロード中…',
pull: 'もうすぐ完了…',
pydeps: '仕上げ中…',
+ update: 'Hermes を更新中…',
+ rebuild: 'デスクトップアプリを再ビルド中…',
restart: 'Hermes を再起動中…',
+ done: '更新が完了しました',
manual: 'ターミナルから更新',
+ guiSkew: 'デスクトップアプリを更新してください',
error: '更新が一時停止中'
},
checking: '更新を確認中…',
@@ -1538,12 +1552,15 @@ export const ja = defineLocale({
manualBody:
'Hermes をコマンドラインからインストールしたため、更新もそこで実行されます。これをターミナルに貼り付けてください:',
manualPickedUp: 'Hermes は次回起動時に新しいバージョンを読み込みます。',
+ guiSkewTitle: 'デスクトップアプリを更新してください',
+ guiSkewBody:
+ 'バックエンドは更新されましたが、このデスクトップアプリのパッケージは変更されていません。一致させるために Hermes デスクトップアプリ(AppImage / .deb / .rpm)を更新または再インストールしてください。',
copy: 'コピー',
copied: 'コピーしました',
done: '完了',
- applyingBody: 'Hermes アップデーターが独自のウィンドウで引き継ぎ、完了後に Hermes を再度開きます。',
+ applyingBody: 'Hermes アップデーターが独自のウィンドウで引き継ぎ、完了後に自動的に Hermes を再度開きます。更新中はご自分で Hermes を開き直さないでください。',
applyingBodyBackend: 'リモートバックエンドが更新を適用して再起動します。復帰すると Hermes が自動的に再接続します。',
- applyingClose: 'Hermes は更新を適用するために閉じます。',
+ applyingClose: 'このウィンドウは更新中に閉じ、その後 Hermes が自動的に再度開きます。',
errorTitle: '更新が完了しませんでした',
errorBody: 'ご安心ください。何も失われていません。今すぐ再試行できます。',
notNow: '今は後で',
@@ -1701,6 +1718,7 @@ export const ja = defineLocale({
search: 'モデルを検索',
noModels: 'モデルが見つかりません',
editModels: 'モデルを編集…',
+ refreshModels: 'モデルを更新',
fast: '高速',
medium: '中'
},
@@ -1755,6 +1773,7 @@ export const ja = defineLocale({
gatewayChecking: '確認中',
gatewayConnecting: '接続中',
gatewayOffline: 'オフライン',
+ gatewayRestarting: '再起動中…',
gatewayTitle: 'Hermes 推論ゲートウェイのステータス',
agents: 'エージェント',
closeAgents: 'エージェントを閉じる',
diff --git a/apps/desktop/src/i18n/types.ts b/apps/desktop/src/i18n/types.ts
index 70807da8bf7..0ebc6c68d4b 100644
--- a/apps/desktop/src/i18n/types.ts
+++ b/apps/desktop/src/i18n/types.ts
@@ -159,6 +159,11 @@ export interface Translations {
}
}
+ remoteDisplayBanner: {
+ message: (reason: string) => string
+ dismiss: string
+ }
+
titlebar: {
hideSidebar: string
showSidebar: string
@@ -299,6 +304,7 @@ export interface Translations {
checkNow: string
checking: string
seeWhatsNew: string
+ updateNow: string
releaseNotes: string
onLatest: string
installing: string
@@ -485,6 +491,8 @@ export interface Translations {
removedMessage: (provider: string) => string
failedRemove: (provider: string) => string
noProviderKeys: string
+ searchKeys: string
+ noKeysMatch: string
loading: string
}
sessions: {
@@ -662,7 +670,8 @@ export interface Translations {
gatewayRunning: string
gatewayStopped: string
hermesActiveSessions: (version: string, count: number) => string
- restartMessaging: string
+ restartGateway: string
+ gatewayRestartFailed: string
updateHermes: string
actionRunning: string
actionDone: string
@@ -1077,6 +1086,10 @@ export interface Translations {
manualTitle: string
manualBody: string
manualPickedUp: string
+ /** GUI/backend skew (#45205): backend updated but the running desktop app
+ * package (AppImage/.deb/.rpm) was not changed and must be reinstalled. */
+ guiSkewTitle: string
+ guiSkewBody: string
copy: string
copied: string
done: string
@@ -1211,6 +1224,7 @@ export interface Translations {
search: string
noModels: string
editModels: string
+ refreshModels: string
fast: string
medium: string
}
@@ -1265,6 +1279,7 @@ export interface Translations {
gatewayChecking: string
gatewayConnecting: string
gatewayOffline: string
+ gatewayRestarting: string
gatewayTitle: string
agents: string
closeAgents: string
diff --git a/apps/desktop/src/i18n/zh-hant.ts b/apps/desktop/src/i18n/zh-hant.ts
index 3e1420d3414..c0eeb5ac08e 100644
--- a/apps/desktop/src/i18n/zh-hant.ts
+++ b/apps/desktop/src/i18n/zh-hant.ts
@@ -142,6 +142,11 @@ export const zhHant = defineLocale({
}
},
+ remoteDisplayBanner: {
+ message: reason => `軟體繪圖已啟用 — 偵測到遠端顯示(${reason})。為防止畫面閃爍,已停用 GPU 加速。`,
+ dismiss: '關閉'
+ },
+
titlebar: {
hideSidebar: '隱藏側邊欄',
showSidebar: '顯示側邊欄',
@@ -512,6 +517,7 @@ export const zhHant = defineLocale({
checkNow: '立即檢查',
checking: '檢查中…',
seeWhatsNew: '查看新增內容',
+ updateNow: '立即更新',
releaseNotes: '發行說明',
onLatest: '你已是最新版本。',
installing: '正在安裝更新。',
@@ -700,6 +706,8 @@ export const zhHant = defineLocale({
removedMessage: provider => `${provider} 已移除。`,
failedRemove: provider => `無法移除 ${provider}`,
noProviderKeys: '沒有可用的提供方 API 金鑰。',
+ searchKeys: '搜尋提供方…',
+ noKeysMatch: '沒有符合的提供方。',
loading: '正在載入提供方...'
},
sessions: {
@@ -891,7 +899,8 @@ export const zhHant = defineLocale({
gatewayRunning: '訊息閘道執行中',
gatewayStopped: '訊息閘道已停止',
hermesActiveSessions: (version, count) => `Hermes ${version} · 活躍工作階段 ${count}`,
- restartMessaging: '重新啟動訊息服務',
+ restartGateway: '重新啟動閘道',
+ gatewayRestartFailed: '閘道重新啟動失敗。',
updateHermes: '更新 Hermes',
actionRunning: '執行中',
actionDone: '完成',
@@ -960,9 +969,9 @@ export const zhHant = defineLocale({
disableAria: name => `停用 ${name}`,
platformEnabled: name => `${name} 已啟用`,
platformDisabled: name => `${name} 已停用`,
- restartToApply: '重新啟動閘道後此變更才會生效。',
+ restartToApply: '此變更將在閘道重新啟動後生效。',
setupSaved: name => `${name} 設定已儲存`,
- restartToReconnect: '重新啟動閘道以使用新憑證重新連線。',
+ restartToReconnect: '新憑證將在閘道重新啟動後生效。',
keyCleared: key => `${key} 已清除`,
setupUpdated: name => `${name} 設定已更新。`,
failedUpdate: name => `更新 ${name} 失敗`,
@@ -1464,8 +1473,12 @@ export const zhHant = defineLocale({
fetch: '下載中…',
pull: '快完成了…',
pydeps: '收尾中…',
+ update: '正在更新 Hermes…',
+ rebuild: '正在重新建置桌面應用程式…',
restart: '正在重新啟動 Hermes…',
+ done: '更新完成',
manual: '從終端機更新',
+ guiSkew: '請更新桌面應用程式',
error: '更新已暫停'
},
checking: '正在檢查更新…',
@@ -1488,12 +1501,15 @@ export const zhHant = defineLocale({
manualTitle: '從終端機更新',
manualBody: '您是從命令列安裝的 Hermes,因此更新也需要在那裡執行。請將此指令貼到終端機:',
manualPickedUp: '下次啟動 Hermes 時會使用新版本。',
+ guiSkewTitle: '請更新桌面應用程式',
+ guiSkewBody:
+ '後端已更新,但此桌面應用程式套件未變更。請更新或重新安裝 Hermes 桌面應用程式(你的 AppImage / .deb / .rpm)以保持一致。',
copy: '複製',
copied: '已複製',
done: '完成',
- applyingBody: 'Hermes 更新程式會在自己的視窗中接管,並在完成後重新開啟 Hermes。',
+ applyingBody: 'Hermes 更新程式會在自己的視窗中接管,並在完成後自動重新開啟 Hermes。更新期間請勿自行重新開啟 Hermes。',
applyingBodyBackend: '遠端後端正在套用更新並將重新啟動。恢復後 Hermes 會自動重新連線。',
- applyingClose: 'Hermes 將關閉以套用更新。',
+ applyingClose: '此視窗會在更新期間關閉,隨後 Hermes 會自動重新開啟。',
errorTitle: '更新未完成',
errorBody: '沒有資料遺失。您可以現在重試。',
notNow: '暫不',
@@ -1643,6 +1659,7 @@ export const zhHant = defineLocale({
search: '搜尋模型',
noModels: '找不到模型',
editModels: '編輯模型…',
+ refreshModels: '重新整理模型',
fast: '快速',
medium: '中'
},
@@ -1697,6 +1714,7 @@ export const zhHant = defineLocale({
gatewayChecking: '檢查中',
gatewayConnecting: '連線中',
gatewayOffline: '離線',
+ gatewayRestarting: '重新啟動中…',
gatewayTitle: 'Hermes 推論閘道狀態',
agents: '代理',
closeAgents: '關閉代理',
diff --git a/apps/desktop/src/i18n/zh.ts b/apps/desktop/src/i18n/zh.ts
index 34ddd474359..567c3dfe0d7 100644
--- a/apps/desktop/src/i18n/zh.ts
+++ b/apps/desktop/src/i18n/zh.ts
@@ -142,6 +142,11 @@ export const zh: Translations = {
}
},
+ remoteDisplayBanner: {
+ message: reason => `软件渲染已启用 — 检测到远程显示(${reason})。为防止画面闪烁,已禁用 GPU 加速。`,
+ dismiss: '关闭'
+ },
+
titlebar: {
hideSidebar: '隐藏侧边栏',
showSidebar: '显示侧边栏',
@@ -600,6 +605,7 @@ export const zh: Translations = {
checkNow: '立即检查',
checking: '检查中…',
seeWhatsNew: '查看新增内容',
+ updateNow: '立即更新',
releaseNotes: '发行说明',
onLatest: '你已是最新版本。',
installing: '正在安装更新。',
@@ -797,6 +803,8 @@ export const zh: Translations = {
removedMessage: provider => `${provider} 已移除。`,
failedRemove: provider => `无法移除 ${provider}`,
noProviderKeys: '没有可用的提供方 API 密钥。',
+ searchKeys: '搜索提供方…',
+ noKeysMatch: '没有匹配的提供方。',
loading: '正在加载提供方...'
},
sessions: {
@@ -988,7 +996,8 @@ export const zh: Translations = {
gatewayRunning: '消息网关运行中',
gatewayStopped: '消息网关已停止',
hermesActiveSessions: (version, count) => `Hermes ${version} · 活跃会话 ${count}`,
- restartMessaging: '重启消息服务',
+ restartGateway: '重启网关',
+ gatewayRestartFailed: '网关重启失败。',
updateHermes: '更新 Hermes',
actionRunning: '运行中',
actionDone: '完成',
@@ -1057,9 +1066,9 @@ export const zh: Translations = {
disableAria: name => `禁用 ${name}`,
platformEnabled: name => `${name} 已启用`,
platformDisabled: name => `${name} 已禁用`,
- restartToApply: '重启网关后此更改才会生效。',
+ restartToApply: '此更改将在网关重启后生效。',
setupSaved: name => `${name} 设置已保存`,
- restartToReconnect: '重启网关以使用新凭据重新连接。',
+ restartToReconnect: '新凭据将在网关重启后生效。',
keyCleared: key => `${key} 已清除`,
setupUpdated: name => `${name} 设置已更新。`,
failedUpdate: name => `更新 ${name} 失败`,
@@ -1569,8 +1578,12 @@ export const zh: Translations = {
fetch: '下载中…',
pull: '马上完成…',
pydeps: '收尾中…',
+ update: '正在更新 Hermes…',
+ rebuild: '正在重新构建桌面应用…',
restart: '正在重启 Hermes…',
+ done: '更新完成',
manual: '从终端更新',
+ guiSkew: '请更新桌面应用',
error: '更新已暂停'
},
checking: '正在检查更新…',
@@ -1593,12 +1606,14 @@ export const zh: Translations = {
manualTitle: '从终端更新',
manualBody: '你是从命令行安装的 Hermes,因此更新也需要在那里运行。请将此命令粘贴到终端:',
manualPickedUp: '下次启动 Hermes 时会使用新版本。',
+ guiSkewTitle: '请更新桌面应用',
+ guiSkewBody: '后端已更新,但此桌面应用包未更改。请更新或重新安装 Hermes 桌面应用(你的 AppImage / .deb / .rpm)以保持一致。',
copy: '复制',
copied: '已复制',
done: '完成',
- applyingBody: 'Hermes 更新器会在自己的窗口中接管,并在完成后重新打开 Hermes。',
+ applyingBody: 'Hermes 更新器会在自己的窗口中接管,并在完成后自动重新打开 Hermes。更新期间请不要自行重新打开 Hermes。',
applyingBodyBackend: '远程后端正在应用更新并将重启。恢复后 Hermes 会自动重新连接。',
- applyingClose: 'Hermes 将关闭以应用更新。',
+ applyingClose: '此窗口会在更新期间关闭,随后 Hermes 会自动重新打开。',
errorTitle: '更新未完成',
errorBody: '没有数据丢失。你可以现在重试。',
notNow: '暂不',
@@ -1749,6 +1764,7 @@ export const zh: Translations = {
search: '搜索模型',
noModels: '未找到模型',
editModels: '编辑模型…',
+ refreshModels: '刷新模型',
fast: '快速',
medium: '中'
},
@@ -1803,6 +1819,7 @@ export const zh: Translations = {
gatewayChecking: '检查中',
gatewayConnecting: '连接中',
gatewayOffline: '离线',
+ gatewayRestarting: '重启中…',
gatewayTitle: 'Hermes 推理网关状态',
agents: '代理',
closeAgents: '关闭代理',
diff --git a/apps/desktop/src/lib/chat-runtime.test.ts b/apps/desktop/src/lib/chat-runtime.test.ts
index c2a9099a1a8..1b4efb33ad5 100644
--- a/apps/desktop/src/lib/chat-runtime.test.ts
+++ b/apps/desktop/src/lib/chat-runtime.test.ts
@@ -2,7 +2,7 @@ import { describe, expect, it } from 'vitest'
import type { ComposerAttachment } from '@/store/composer'
-import { coerceThinkingText, optimisticAttachmentRef } from './chat-runtime'
+import { coerceThinkingText, optimisticAttachmentRef, parseCommandDispatch } from './chat-runtime'
const DATA_URL = 'data:image/png;base64,iVBORw0KGgoAAAANS'
@@ -52,3 +52,31 @@ describe('coerceThinkingText', () => {
).toBe('')
})
})
+
+describe('parseCommandDispatch', () => {
+ it('keeps the notice on a send directive (e.g. /goal set)', () => {
+ // The backend's /goal set returns {type:send, notice:"⊙ Goal set …", message}.
+ // Dropping the notice made /goal look like it did nothing in the desktop app.
+ const parsed = parseCommandDispatch({ type: 'send', notice: '⊙ Goal set', message: 'do the thing' })
+
+ expect(parsed).toEqual({ type: 'send', message: 'do the thing', notice: '⊙ Goal set' })
+ })
+
+ it('keeps message-only send directives working (no notice)', () => {
+ expect(parseCommandDispatch({ type: 'send', message: 'hi' })).toEqual({
+ type: 'send',
+ message: 'hi',
+ notice: undefined
+ })
+ })
+
+ it('parses a prefill directive with its notice (e.g. /undo)', () => {
+ const parsed = parseCommandDispatch({ type: 'prefill', notice: 'backed up 1 turn', message: 'edit me' })
+
+ expect(parsed).toEqual({ type: 'prefill', message: 'edit me', notice: 'backed up 1 turn' })
+ })
+
+ it('rejects a prefill directive missing its message', () => {
+ expect(parseCommandDispatch({ type: 'prefill', notice: 'x' })).toBeNull()
+ })
+})
diff --git a/apps/desktop/src/lib/chat-runtime.ts b/apps/desktop/src/lib/chat-runtime.ts
index ac5273a2236..c573a1e5899 100644
--- a/apps/desktop/src/lib/chat-runtime.ts
+++ b/apps/desktop/src/lib/chat-runtime.ts
@@ -238,7 +238,12 @@ export function parseCommandDispatch(raw: unknown): CommandDispatchResponse | nu
return typeof row.name === 'string' ? { type: 'skill', name: row.name, message: str(row.message) } : null
case 'send':
- return typeof row.message === 'string' ? { type: 'send', message: row.message } : null
+ return typeof row.message === 'string' ? { type: 'send', message: row.message, notice: str(row.notice) } : null
+
+ case 'prefill':
+ return typeof row.message === 'string'
+ ? { type: 'prefill', message: row.message, notice: str(row.notice) }
+ : null
default:
return null
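
Reviewer note: the `send` and `prefill` branches above now share one shape (a required `message` plus an optional `notice`). The sketch below restates that rule as a standalone function so the null-on-missing-message behavior is easy to check in isolation. `Dispatch`, `parseDispatchSketch` and the body of `str` are assumptions made for illustration; the real `CommandDispatchResponse` has more variants and the real `str` helper may differ.

```typescript
// Hypothetical, trimmed-down stand-in for CommandDispatchResponse.
type Dispatch =
  | { type: 'send'; message: string; notice?: string }
  | { type: 'prefill'; message: string; notice?: string }

// Assumed behavior of the `str` helper: keep strings, drop everything else.
const str = (value: unknown): string | undefined => (typeof value === 'string' ? value : undefined)

function parseDispatchSketch(raw: unknown): Dispatch | null {
  if (typeof raw !== 'object' || raw === null) {
    return null
  }

  const row = raw as Record<string, unknown>
  const message = row.message

  // Both directive kinds are rejected when the message is missing.
  if (typeof message !== 'string') {
    return null
  }

  switch (row.type) {
    case 'send':
      return { type: 'send', message, notice: str(row.notice) }

    case 'prefill':
      return { type: 'prefill', message, notice: str(row.notice) }

    default:
      return null
  }
}
```

A caller can surface `notice` to the user before sending or prefilling `message`, which is the behavior the new tests describe for `/goal set` and `/undo`.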
diff --git a/apps/desktop/src/lib/desktop-slash-commands.ts b/apps/desktop/src/lib/desktop-slash-commands.ts
index e1a0f2d773c..5f2b51f8d9a 100644
--- a/apps/desktop/src/lib/desktop-slash-commands.ts
+++ b/apps/desktop/src/lib/desktop-slash-commands.ts
@@ -152,7 +152,7 @@ const DESKTOP_COMMAND_SPECS: readonly DesktopCommandSpec[] = [
const NO_DESKTOP_SURFACE: Record = {
terminal: [
'/busy', '/clear', '/compact', '/config', '/copy', '/cron', '/details',
- '/exit', '/footer', '/gateway', '/gquota', '/history', '/image', '/indicator', '/logs',
+ '/exit', '/footer', '/gateway', '/history', '/image', '/indicator', '/logs',
'/mouse', '/paste', '/platforms', '/plugins', '/quit', '/redraw', '/reload', '/restart',
'/sb', '/set-home', '/sethome', '/snap', '/snapshot', '/statusbar', '/toolsets', '/update', '/verbose'
],
diff --git a/apps/desktop/src/lib/markdown-code.ts b/apps/desktop/src/lib/markdown-code.ts
index 0b105727490..3d9f3e5e1b6 100644
--- a/apps/desktop/src/lib/markdown-code.ts
+++ b/apps/desktop/src/lib/markdown-code.ts
@@ -108,6 +108,137 @@ export function codiconForLanguage(language: string | undefined): string {
return CODICON_BY_LANGUAGE[sanitizeLanguageTag(language || '')] || 'code'
}
+// File extension → language tag, so a filename can resolve to the same icon a
+// fenced code block of that language would get. Only extensions that map to a
+// non-generic codicon need an entry; everything else falls through to `code`.
+const LANGUAGE_BY_EXTENSION: Record<string, string> = {
+ bash: 'bash',
+ cfg: 'ini',
+ conf: 'ini',
+ css: 'css',
+ dockerfile: 'dockerfile',
+ env: 'env',
+ gql: 'graphql',
+ graphql: 'graphql',
+ ini: 'ini',
+ json: 'json',
+ json5: 'json',
+ less: 'less',
+ markdown: 'markdown',
+ md: 'markdown',
+ mdx: 'markdown',
+ mmd: 'mermaid',
+ ps1: 'powershell',
+ psql: 'sql',
+ sass: 'sass',
+ scss: 'scss',
+ sh: 'bash',
+ sql: 'sql',
+ svg: 'svg',
+ toml: 'toml',
+ yaml: 'yaml',
+ yml: 'yml',
+ zsh: 'zsh'
+}
+
+// Pick an icon for a file path by its extension (or bare name like
+// `Dockerfile`), reusing the language→codicon map so file-edit rows and code
+// blocks share one visual vocabulary. Unknown / generic code files get `code`.
+export function codiconForFilename(path: string | undefined): string {
+ const token = filenameExtToken(path)
+ const language = LANGUAGE_BY_EXTENSION[token] || token
+
+ return codiconForLanguage(language)
+}
+
+// Last path segment's extension (or the bare lowercased name for `Dockerfile`,
+// `Makefile`, …). Shared by the icon and Shiki-language resolvers.
+function filenameExtToken(path: string | undefined): string {
+ const base = (path || '').replace(/\\/g, '/').split('/').pop()?.trim().toLowerCase() || ''
+ const dot = base.lastIndexOf('.')
+
+ return dot > 0 ? base.slice(dot + 1) : base
+}
+
+// File extension → Shiki bundled-language id, for syntax-highlighting diffs in
+// the editing tool's own language. Unknown extensions return '' so callers fall
+// back to the plain color-only diff renderer.
+const SHIKI_LANGUAGE_BY_EXTENSION: Record<string, string> = {
+ astro: 'astro',
+ bash: 'bash',
+ c: 'c',
+ cc: 'cpp',
+ cjs: 'javascript',
+ clj: 'clojure',
+ cpp: 'cpp',
+ cs: 'csharp',
+ css: 'css',
+ cxx: 'cpp',
+ dart: 'dart',
+ dockerfile: 'docker',
+ ex: 'elixir',
+ exs: 'elixir',
+ fish: 'fish',
+ go: 'go',
+ gql: 'graphql',
+ graphql: 'graphql',
+ h: 'c',
+ hpp: 'cpp',
+ hs: 'haskell',
+ htm: 'html',
+ html: 'html',
+ ini: 'ini',
+ java: 'java',
+ jl: 'julia',
+ js: 'javascript',
+ json: 'json',
+ json5: 'json5',
+ jsonc: 'jsonc',
+ jsx: 'jsx',
+ kt: 'kotlin',
+ kts: 'kotlin',
+ less: 'less',
+ lua: 'lua',
+ makefile: 'make',
+ markdown: 'markdown',
+ md: 'markdown',
+ mdx: 'mdx',
+ mjs: 'javascript',
+ ml: 'ocaml',
+ mts: 'typescript',
+ nix: 'nix',
+ php: 'php',
+ pl: 'perl',
+ proto: 'proto',
+ ps1: 'powershell',
+ py: 'python',
+ pyi: 'python',
+ r: 'r',
+ rb: 'ruby',
+ rs: 'rust',
+ sass: 'sass',
+ scala: 'scala',
+ scss: 'scss',
+ sh: 'bash',
+ sql: 'sql',
+ svelte: 'svelte',
+ swift: 'swift',
+ tf: 'terraform',
+ toml: 'toml',
+ ts: 'typescript',
+ tsx: 'tsx',
+ vue: 'vue',
+ xml: 'xml',
+ yaml: 'yaml',
+ yml: 'yaml',
+ zig: 'zig',
+ zsh: 'bash'
+}
+
+export function shikiLanguageForFilename(path: string | undefined): string {
+ return SHIKI_LANGUAGE_BY_EXTENSION[filenameExtToken(path)] || ''
+}
+
function proseLineCount(body: string): number {
return body.split('\n').filter(line => {
const trimmed = line.trim()
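
Reviewer note: `filenameExtToken` in the hunk above decides which lookup key both resolvers see, so its edge cases are worth spelling out. The block below copies that function as it appears in the diff and lists a few inputs traced by hand. One consequence of the `dot > 0` check is that a dotfile such as `.env` yields the token `.env` rather than `env`; whether that is intended is not stated in the diff.

```typescript
// Copied from the hunk above so the examples run on their own.
function filenameExtToken(path: string | undefined): string {
  const base = (path || '').replace(/\\/g, '/').split('/').pop()?.trim().toLowerCase() || ''
  const dot = base.lastIndexOf('.')

  return dot > 0 ? base.slice(dot + 1) : base
}

// Extension is lowercased and taken after the last dot.
const tsx = filenameExtToken('src/App.TSX') // 'tsx'
const gz = filenameExtToken('archive.tar.gz') // 'gz'

// Bare names fall through as the whole lowercased name; backslashes are normalized.
const docker = filenameExtToken('C:\\repo\\Dockerfile') // 'dockerfile'

// A leading dot is at index 0, so the whole name is returned.
const dotfile = filenameExtToken('.env') // '.env'
const envFile = filenameExtToken('prod.env') // 'env'

// Missing input yields the empty token.
const none = filenameExtToken(undefined) // ''
```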
diff --git a/apps/desktop/src/lib/session-ids.test.ts b/apps/desktop/src/lib/session-ids.test.ts
new file mode 100644
index 00000000000..b5653c8eecd
--- /dev/null
+++ b/apps/desktop/src/lib/session-ids.test.ts
@@ -0,0 +1,44 @@
+import { describe, expect, it } from 'vitest'
+
+import { storedSessionIdForNotification } from './session-ids'
+
+describe('storedSessionIdForNotification', () => {
+ it('translates a runtime id back to its stored id', () => {
+ // The route is keyed by the stored id, but notifications carry the runtime
+ // id. Resolving runtime -> stored keeps notification-click navigation from
+ // resuming a non-existent stored session ("session not found").
+ const map = new Map([['stored-abc', 'runtime-123']])
+
+ expect(storedSessionIdForNotification('runtime-123', map)).toBe('stored-abc')
+ })
+
+ it('returns the id unchanged when no mapping is known', () => {
+ // A notification for a session this window never opened may already carry a
+ // stored id; let the resume/REST lookup handle it as-is.
+ const map = new Map([['stored-abc', 'runtime-123']])
+
+ expect(storedSessionIdForNotification('stored-xyz', map)).toBe('stored-xyz')
+ })
+
+ it('returns the id unchanged for an empty map', () => {
+ expect(storedSessionIdForNotification('runtime-123', new Map())).toBe('runtime-123')
+ })
+
+ it('resolves the correct stored id among several sessions', () => {
+ const map = new Map([
+ ['stored-1', 'runtime-1'],
+ ['stored-2', 'runtime-2'],
+ ['stored-3', 'runtime-3']
+ ])
+
+ expect(storedSessionIdForNotification('runtime-2', map)).toBe('stored-2')
+ })
+
+ it('does not treat a stored id as a runtime id (keys are not matched)', () => {
+ // The map is stored -> runtime. A value that only appears as a *key* must
+ // not be rewritten, otherwise an already-stored id could be mangled.
+ const map = new Map([['stored-1', 'runtime-1']])
+
+ expect(storedSessionIdForNotification('stored-1', map)).toBe('stored-1')
+ })
+})
diff --git a/apps/desktop/src/lib/session-ids.ts b/apps/desktop/src/lib/session-ids.ts
new file mode 100644
index 00000000000..c97cadc2628
--- /dev/null
+++ b/apps/desktop/src/lib/session-ids.ts
@@ -0,0 +1,26 @@
+// The gateway tags every event — and therefore every native notification —
+// with the *runtime* session id (the key under which the session lives in the
+// gateway's in-memory `_sessions` map). The chat route, however, is keyed by
+// the *stored* session id (`stored_session_id`), which is a different value:
+// a brand-new chat gets a runtime id immediately but its stored id is assigned
+// when the first turn persists. Navigating to a runtime id therefore tries to
+// resume a stored session that does not exist ("session not found") and
+// strands the user, who experiences it as the running session being destroyed.
+//
+// `runtimeIdByStoredSessionId` maps stored -> runtime; this resolves the
+// reverse so notification-click navigation lands on the real route. The id is
+// returned unchanged when no mapping is known — it may already be a stored id
+// (e.g. a notification for a session this window never opened), in which case
+// the normal resume/REST lookup handles it.
+export function storedSessionIdForNotification(
+ id: string,
+ runtimeIdByStoredSessionId: ReadonlyMap<string, string>
+): string {
+ for (const [storedId, runtimeId] of runtimeIdByStoredSessionId) {
+ if (runtimeId === id) {
+ return storedId
+ }
+ }
+
+ return id
+}
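
Reviewer note: the linear scan in `storedSessionIdForNotification` is fine for a handful of sessions. If the lookup ever ran on a hot path, one alternative would be to invert the map once and do constant-time lookups. The sketch below is only an illustration of that alternative; `invertSessionMap` and `storedIdViaIndex` are hypothetical names that do not exist in the diff.

```typescript
// Hypothetical helper: build runtime -> stored from the stored -> runtime map.
function invertSessionMap(runtimeIdByStoredSessionId: ReadonlyMap<string, string>): Map<string, string> {
  const storedIdByRuntimeId = new Map<string, string>()

  for (const [storedId, runtimeId] of runtimeIdByStoredSessionId) {
    // First mapping wins, matching the early return of the linear scan.
    if (!storedIdByRuntimeId.has(runtimeId)) {
      storedIdByRuntimeId.set(runtimeId, storedId)
    }
  }

  return storedIdByRuntimeId
}

// Hypothetical lookup: unknown ids pass through unchanged, as in the diff.
function storedIdViaIndex(id: string, storedIdByRuntimeId: ReadonlyMap<string, string>): string {
  return storedIdByRuntimeId.get(id) ?? id
}

const index = invertSessionMap(new Map([['stored-abc', 'runtime-123']]))
const resolved = storedIdViaIndex('runtime-123', index)
const untouched = storedIdViaIndex('stored-abc', index)
```

The trade-off is that the inverted index has to be rebuilt or kept in sync whenever the source map changes, which the scan in the diff avoids.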
diff --git a/apps/desktop/src/store/composer-popout.ts b/apps/desktop/src/store/composer-popout.ts
new file mode 100644
index 00000000000..66e758aa1f0
--- /dev/null
+++ b/apps/desktop/src/store/composer-popout.ts
@@ -0,0 +1,114 @@
+import { atom } from 'nanostores'
+
+import { persistBoolean, persistString, storedBoolean, storedString } from '@/lib/storage'
+
+const POPOUT_ENABLED_STORAGE_KEY = 'hermes.desktop.composerPopout.enabled'
+const POPOUT_POSITION_STORAGE_KEY = 'hermes.desktop.composerPopout.position'
+
+/** Where the floating composer's bottom-right corner sits, measured as an inset
+ * from the viewport's bottom/right edges. Anchoring to the bottom-right keeps
+ * the box visually pinned to its default corner as the window resizes and as
+ * the box grows upward while typing (the corner stays put, height climbs). */
+export interface PopoutPosition {
+ bottom: number
+ right: number
+}
+
+// Floating composer width (rem). Shared by the inline style that sets
+// --composer-popout-width and the peel-off drag math.
+export const POPOUT_WIDTH_REM = 19.5
+
+// Default pop-out placement: tucked into the bottom-right of the thread, clear
+// of the window chrome. Matches the brief's "default to the right bottom".
+const DEFAULT_POSITION: PopoutPosition = { bottom: 24, right: 24 }
+
+function readPosition(): PopoutPosition {
+ const raw = storedString(POPOUT_POSITION_STORAGE_KEY)
+
+ if (!raw) {
+ return DEFAULT_POSITION
+ }
+
+ try {
+ const parsed = JSON.parse(raw) as Partial<PopoutPosition>
+
+ if (typeof parsed.bottom === 'number' && typeof parsed.right === 'number') {
+ // Clamp on load — a position persisted on a larger/other monitor must not
+ // strand the box off-screen on this one.
+ return clampPosition({ bottom: parsed.bottom, right: parsed.right })
+ }
+ } catch {
+ // Corrupt value — fall back to the default corner.
+ }
+
+ return DEFAULT_POSITION
+}
+
+export interface PopoutSize {
+ height: number
+ width: number
+}
+
+interface SetPositionOptions {
+ persist?: boolean
+ /** Measured box size; falls back to the compact width + a min height so the
+ * box stays grabbable even when the caller can't measure it. */
+ size?: PopoutSize
+}
+
+// Keep at least this much space between the box and every viewport edge, so the
+// floating composer can never be dragged (or restored) out of reach.
+const EDGE_MARGIN = 8
+const TITLEBAR_HEIGHT_FALLBACK = 34
+const TITLEBAR_CLEARANCE_REM = 0.75
+// Height floor used when the real box height is unknown (init / load / peel-off).
+export const POPOUT_ESTIMATED_HEIGHT = 56
+const MIN_VISIBLE_HEIGHT = POPOUT_ESTIMATED_HEIGHT
+
+const clampRange = (value: number, lo: number, hi: number) => Math.min(Math.max(value, lo), Math.max(lo, hi))
+
+const rootFontSize = () => parseFloat(getComputedStyle(document.documentElement).fontSize) || 16
+
+function titlebarTopMargin() {
+ const raw = getComputedStyle(document.documentElement).getPropertyValue('--titlebar-height').trim()
+ const titlebarHeight = Number.parseFloat(raw)
+ const breathingRoom = TITLEBAR_CLEARANCE_REM * rootFontSize()
+
+ return Math.max(EDGE_MARGIN, (Number.isFinite(titlebarHeight) ? titlebarHeight : TITLEBAR_HEIGHT_FALLBACK) + breathingRoom)
+}
+
+// Bound the bottom-right inset so the WHOLE box stays on-screen — the corner
+// anchor alone would let the box's width/height push it past the left/top edges.
+function clampPosition({ bottom, right }: PopoutPosition, size?: PopoutSize): PopoutPosition {
+ const width = size?.width || POPOUT_WIDTH_REM * rootFontSize()
+ const height = size?.height || MIN_VISIBLE_HEIGHT
+ const topMargin = titlebarTopMargin()
+
+ return {
+ bottom: clampRange(bottom, EDGE_MARGIN, window.innerHeight - height - topMargin),
+ right: clampRange(right, EDGE_MARGIN, window.innerWidth - width - EDGE_MARGIN)
+ }
+}
+
+export const $composerPoppedOut = atom(storedBoolean(POPOUT_ENABLED_STORAGE_KEY, false))
+export const $composerPopoutPosition = atom(readPosition())
+
+export function setComposerPoppedOut(value: boolean) {
+ $composerPoppedOut.set(value)
+ persistBoolean(POPOUT_ENABLED_STORAGE_KEY, value)
+}
+
+/** Move the box (state only by default). Used per-frame during a drag — no IO
+ * unless `persist`. Returns the clamped position so callers can sync their live
+ * ref. Pass the measured `size` for exact bounds; otherwise a fallback keeps it
+ * on-screen. */
+export function setComposerPopoutPosition(position: PopoutPosition, { persist, size }: SetPositionOptions = {}): PopoutPosition {
+ const next = clampPosition(position, size)
+ $composerPopoutPosition.set(next)
+
+ if (persist) {
+ persistString(POPOUT_POSITION_STORAGE_KEY, JSON.stringify(next))
+ }
+
+ return next
+}
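
Reviewer note: `clampPosition` reads the viewport and titlebar height from the DOM, which makes the arithmetic hard to check by eye. The sketch below restates the same clamping rule as a pure function with the viewport, box size and top margin passed in. `clampInset`, `Inset`, `Box` and `Viewport` are hypothetical names for illustration; only `clampRange` and `EDGE_MARGIN` are taken from the diff.

```typescript
interface Inset { bottom: number; right: number }
interface Box { width: number; height: number }
interface Viewport { width: number; height: number }

const EDGE_MARGIN = 8

// Same helper as in the diff: if hi < lo, the range collapses to lo.
const clampRange = (value: number, lo: number, hi: number) => Math.min(Math.max(value, lo), Math.max(lo, hi))

// Pure variant of the clamp: all measurements are parameters.
function clampInset(pos: Inset, box: Box, viewport: Viewport, topMargin: number): Inset {
  return {
    bottom: clampRange(pos.bottom, EDGE_MARGIN, viewport.height - box.height - topMargin),
    right: clampRange(pos.right, EDGE_MARGIN, viewport.width - box.width - EDGE_MARGIN)
  }
}

// 19.5rem at a 16px root font is 312px wide; 34px titlebar + 0.75rem is a 46px top margin.
const box: Box = { width: 312, height: 56 }

// Dragged far above the top and past the right edge: both insets are pulled back.
const pulledBack = clampInset({ bottom: 900, right: -50 }, box, { width: 800, height: 600 }, 46)
// { bottom: 498, right: 8 }

// Viewport narrower than the box: the upper bound is negative, so right collapses to the margin.
const narrow = clampInset({ bottom: 24, right: 100 }, box, { width: 200, height: 600 }, 46)
// { bottom: 24, right: 8 }
```

The `Math.max(lo, hi)` inside `clampRange` is what keeps the result well-defined when the window is smaller than the box.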
diff --git a/apps/desktop/src/store/model-visibility.test.ts b/apps/desktop/src/store/model-visibility.test.ts
index 90eccdf457e..805493cd5bc 100644
--- a/apps/desktop/src/store/model-visibility.test.ts
+++ b/apps/desktop/src/store/model-visibility.test.ts
@@ -4,10 +4,13 @@ import type { ModelOptionProvider } from '@/types/hermes'
import {
collapseModelFamilies,
+ defaultVisibleKeys,
effectiveVisibleKeys,
emptyProviderSentinelKey,
isProviderSentinel,
- modelVisibilityKey
+ modelVisibilityKey,
+ resolveVisibleKeys,
+ toggleModelVisibility
} from './model-visibility'
const provider = (slug: string, models: string[]): ModelOptionProvider => ({
@@ -96,4 +99,133 @@ describe('model visibility', () => {
expect(isProviderSentinel('openai::')).toBe(true)
expect(isProviderSentinel('openai::gpt-4o')).toBe(false)
})
+
+ it('resolveVisibleKeys preserves sentinels that effectiveVisibleKeys strips', () => {
+ const stored = new Set([emptyProviderSentinelKey('nous')])
+ const providers = [provider('nous', ['hermes-x', 'hermes-y']), provider('ollama', ['qwen3:latest'])]
+
+ const resolved = resolveVisibleKeys(stored, providers)
+ expect(resolved.has(emptyProviderSentinelKey('nous'))).toBe(true)
+ expect(resolved.has(modelVisibilityKey('nous', 'hermes-x'))).toBe(false)
+ // Un-customized providers still expand to their defaults.
+ expect(resolved.has(modelVisibilityKey('ollama', 'qwen3:latest'))).toBe(true)
+
+ // Display variant drops the sentinel.
+ expect(effectiveVisibleKeys(stored, providers).has(emptyProviderSentinelKey('nous'))).toBe(false)
+ })
+})
+
+describe('toggleModelVisibility', () => {
+ const providers = [provider('openai', ['gpt-a', 'gpt-b']), provider('nous', ['hermes-x', 'hermes-y'])]
+
+ // Drive the handler the way the dialog does: feed each result back in as the
+ // next `stored`, so the persisted set is what the next toggle starts from.
+ const apply = (stored: Set<string> | null, slug: string, model: string) =>
+ toggleModelVisibility(stored, providers, slug, model)
+
+ it('records a hide-all sentinel when the last model of a provider is toggled off', () => {
+ let stored: Set<string> | null = null
+ stored = apply(stored, 'openai', 'gpt-a')
+ stored = apply(stored, 'openai', 'gpt-b')
+
+ expect(stored.has(emptyProviderSentinelKey('openai'))).toBe(true)
+ expect(effectiveVisibleKeys(stored, providers).has(modelVisibilityKey('openai', 'gpt-a'))).toBe(false)
+ expect(effectiveVisibleKeys(stored, providers).has(modelVisibilityKey('openai', 'gpt-b'))).toBe(false)
+ })
+
+ it('keeps a hidden provider hidden when a different provider is toggled (regression for #43485)', () => {
+ // Hide ALL of nous — its sentinel is now stored.
+ let stored: Set<string> | null = null
+ stored = apply(stored, 'nous', 'hermes-x')
+ stored = apply(stored, 'nous', 'hermes-y')
+ expect(stored.has(emptyProviderSentinelKey('nous'))).toBe(true)
+
+ // Toggle a model in another provider. nous must NOT snap back on.
+ stored = apply(stored, 'openai', 'gpt-a')
+
+ expect(stored.has(emptyProviderSentinelKey('nous'))).toBe(true)
+ const visible = effectiveVisibleKeys(stored, providers)
+ expect(visible.has(modelVisibilityKey('nous', 'hermes-x'))).toBe(false)
+ expect(visible.has(modelVisibilityKey('nous', 'hermes-y'))).toBe(false)
+ })
+
+ it('clears only the toggled provider sentinel when a model is re-enabled', () => {
+ let stored: Set<string> | null = new Set([emptyProviderSentinelKey('openai'), emptyProviderSentinelKey('nous')])
+
+ stored = apply(stored, 'openai', 'gpt-a')
+
+ expect(stored.has(emptyProviderSentinelKey('openai'))).toBe(false)
+ expect(stored.has(emptyProviderSentinelKey('nous'))).toBe(true)
+ const visible = effectiveVisibleKeys(stored, providers)
+ expect(visible.has(modelVisibilityKey('openai', 'gpt-a'))).toBe(true)
+ expect(visible.has(modelVisibilityKey('nous', 'hermes-x'))).toBe(false)
+ })
+
+ it('re-enabling one model of a hidden-all provider restores ONLY that model, not the curated defaults', () => {
+ // openai hidden-all, nous untouched.
+ let stored: Set<string> | null = new Set([emptyProviderSentinelKey('openai')])
+
+ stored = apply(stored, 'openai', 'gpt-a')
+
+ const visible = effectiveVisibleKeys(stored, providers)
+ expect(visible.has(modelVisibilityKey('openai', 'gpt-a'))).toBe(true)
+ // gpt-b is NOT restored — "you hid everything, you get back only what you re-enable".
+ expect(visible.has(modelVisibilityKey('openai', 'gpt-b'))).toBe(false)
+ })
+
+ it('re-hiding the last re-enabled model re-adds the sentinel (full round-trip)', () => {
+ let stored: Set<string> | null = new Set([emptyProviderSentinelKey('openai')])
+
+ // Re-enable gpt-a (clears sentinel, set = {gpt-a}), then toggle it back off.
+ stored = apply(stored, 'openai', 'gpt-a')
+ expect(stored.has(emptyProviderSentinelKey('openai'))).toBe(false)
+ stored = apply(stored, 'openai', 'gpt-a')
+
+ expect(stored.has(emptyProviderSentinelKey('openai'))).toBe(true)
+ expect(effectiveVisibleKeys(stored, providers).has(modelVisibilityKey('openai', 'gpt-a'))).toBe(false)
+ })
+
+ it('toggling from an empty (non-null) stored set adds the model without expanding defaults', () => {
+ // Empty-but-not-null = "everything hidden". resolveVisibleKeys short-circuits to {}.
+ const stored = new Set<string>()
+
+ const next = apply(stored, 'openai', 'gpt-a')
+
+ expect(next.has(modelVisibilityKey('openai', 'gpt-a'))).toBe(true)
+ // No curated defaults were expanded for any provider.
+ expect(next.has(modelVisibilityKey('openai', 'gpt-b'))).toBe(false)
+ expect(next.has(modelVisibilityKey('nous', 'hermes-x'))).toBe(false)
+ })
+
+ it('toggling off one default model from null stored keeps the rest of the curated defaults', () => {
+ // null = "never customized": resolveVisibleKeys expands all defaults first.
+ const next = apply(null, 'openai', 'gpt-a')
+
+ expect(next.has(modelVisibilityKey('openai', 'gpt-a'))).toBe(false)
+ expect(next.has(modelVisibilityKey('openai', 'gpt-b'))).toBe(true)
+ expect(next.has(modelVisibilityKey('nous', 'hermes-x'))).toBe(true)
+ // Other models remain, so no sentinel.
+ expect(next.has(emptyProviderSentinelKey('openai'))).toBe(false)
+ })
+
+ it('tolerates a provider with zero models (defensive — dialog filters these out)', () => {
+ const ps = [provider('empty', []), provider('openai', ['gpt-a'])]
+ const next = toggleModelVisibility(new Set([modelVisibilityKey('openai', 'gpt-a')]), ps, 'empty', 'ghost')
+
+ // No crash; the phantom key is recorded but no defaults are invented.
+ expect([...next].some(k => k.startsWith('empty::') && !isProviderSentinel(k))).toBe(true)
+ expect(next.has(modelVisibilityKey('openai', 'gpt-a'))).toBe(true)
+ })
+})
+
+describe('resolveVisibleKeys', () => {
+ const providers = [provider('openai', ['gpt-a', 'gpt-b']), provider('nous', ['hermes-x', 'hermes-y'])]
+
+ it('returns the curated defaults verbatim for null stored', () => {
+ expect(resolveVisibleKeys(null, providers)).toEqual(defaultVisibleKeys(providers))
+ })
+
+ it('returns an empty set for an empty (non-null) stored set', () => {
+ expect([...resolveVisibleKeys(new Set<string>(), providers)]).toEqual([])
+ })
})
diff --git a/apps/desktop/src/store/model-visibility.ts b/apps/desktop/src/store/model-visibility.ts
index 5c2b568c596..44f15b4c32a 100644
--- a/apps/desktop/src/store/model-visibility.ts
+++ b/apps/desktop/src/store/model-visibility.ts
@@ -106,19 +106,29 @@ export function defaultVisibleKeys(providers: readonly ModelOptionProvider[]): S
 const keys = new Set<string>()
for (const provider of providers) {
- const families = collapseModelFamilies(provider.models ?? [])
-
- for (const family of families.slice(0, DEFAULT_VISIBLE_PER_PROVIDER)) {
- keys.add(modelVisibilityKey(provider.slug, family.id))
- }
+ expandProviderDefaults(provider, keys)
}
return keys
}
-/** Resolve which keys are currently visible: the user's explicit set when
- * configured, otherwise the curated default for the given providers. */
-export function effectiveVisibleKeys(
+/** Add a provider's curated default model keys (top-N collapsed families) to
+ * `target`. Shared by `defaultVisibleKeys` and `resolveVisibleKeys` so the
+ * expansion rule lives in exactly one place. */
+function expandProviderDefaults(provider: ModelOptionProvider, target: Set<string>): void {
+ const families = collapseModelFamilies(provider.models ?? [])
+
+ for (const family of families.slice(0, DEFAULT_VISIBLE_PER_PROVIDER)) {
+ target.add(modelVisibilityKey(provider.slug, family.id))
+ }
+}
+
+/** Resolve the canonical working set: the user's stored keys plus the curated
+ * default expansion for any provider they haven't customized. Hide-all
+ * sentinels are PRESERVED here — this is the set the toggle handler mutates and
+ * persists, so dropping a sentinel would silently re-enable a provider the user
+ * emptied. Use `effectiveVisibleKeys` for display (sentinels stripped). */
+export function resolveVisibleKeys(
 stored: Set<string> | null,
 providers: readonly ModelOptionProvider[]
 ): Set<string> {
@@ -134,22 +144,31 @@ export function effectiveVisibleKeys(
for (const provider of providers) {
const providerPrefix = `${provider.slug}::`
+
const hasStoredProvider = [...stored].some(
key => key.startsWith(providerPrefix) && !isProviderSentinel(key)
)
+
const hasSentinel = stored.has(emptyProviderSentinelKey(provider.slug))
if (hasStoredProvider || hasSentinel) {
continue
}
- const families = collapseModelFamilies(provider.models ?? [])
-
- for (const family of families.slice(0, DEFAULT_VISIBLE_PER_PROVIDER)) {
- next.add(modelVisibilityKey(provider.slug, family.id))
- }
+ expandProviderDefaults(provider, next)
}
+ return next
+}
+
+/** Resolve which keys are currently visible for DISPLAY: the resolved working
+ * set with bookkeeping sentinels stripped (they are not real models). */
+export function effectiveVisibleKeys(
+ stored: Set<string> | null,
+ providers: readonly ModelOptionProvider[]
+): Set<string> {
+ const next = resolveVisibleKeys(stored, providers)
+
// Strip sentinel keys — they are bookkeeping, not real visibility entries.
for (const key of [...next]) {
if (isProviderSentinel(key)) {
@@ -159,3 +178,42 @@ export function effectiveVisibleKeys(
return next
}
+
+/** Compute the next persisted visibility set when one model row is toggled.
+ * Seeds from `resolveVisibleKeys` (NOT `effectiveVisibleKeys`) so other
+ * providers' hide-all sentinels survive the persist. When the last visible
+ * model of a provider is toggled off, a sentinel records the explicit
+ * hide-all; re-enabling a model clears THAT provider's sentinel (only). */
+export function toggleModelVisibility(
+ stored: Set<string> | null,
+ providers: readonly ModelOptionProvider[],
+ providerSlug: string,
+ model: string
+): Set<string> {
+ // `resolveVisibleKeys` always returns a fresh Set, so we can mutate it directly.
+ const next = resolveVisibleKeys(stored, providers)
+ const key = modelVisibilityKey(providerSlug, model)
+ const sentinel = emptyProviderSentinelKey(providerSlug)
+
+ if (next.has(key)) {
+ next.delete(key)
+
+ // Check if this was the last real model for this provider.
+ const remainingForProvider = [...next].some(
+ k => k.startsWith(`${providerSlug}::`) && !isProviderSentinel(k)
+ )
+
+ if (!remainingForProvider) {
+ next.add(sentinel)
+ }
+ } else {
+ // Re-enabling promotes a previously hidden-all provider to an explicit
+ // set of exactly the one re-enabled model — the curated defaults are NOT
+ // restored. Intentional: "you hid everything, you get back only what you
+ // re-enable." (Locked in by the sentinel-clear-on-re-enable test.)
+ next.delete(sentinel)
+ next.add(key)
+ }
+
+ return next
+}
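
The sentinel bookkeeping in the `model-visibility.ts` diff above can be modelled in isolation. The following is a minimal sketch, not the module itself: `Provider`, `keyOf`, `sentinelOf`, `isSentinel`, `resolve`, `toggle`, `display`, and the `::__none__` sentinel format are hypothetical stand-ins, and the default expansion here is simply "all models" rather than the top-N collapsed families the real code uses.

```typescript
// Hypothetical, simplified stand-ins for the helpers in model-visibility.ts.
type Provider = { slug: string; models: string[] }

const SENTINEL_SUFFIX = '::__none__' // assumed sentinel format, for illustration only
const keyOf = (slug: string, model: string): string => `${slug}::${model}`
const sentinelOf = (slug: string): string => `${slug}${SENTINEL_SUFFIX}`
const isSentinel = (key: string): boolean => key.endsWith(SENTINEL_SUFFIX)

// Working set: stored keys (sentinels kept) plus defaults for untouched providers.
function resolve(stored: Set<string> | null, providers: Provider[]): Set<string> {
  const next = new Set<string>(stored ?? [])

  // Empty but non-null means "everything hidden": expand nothing.
  if (stored !== null && stored.size === 0) {
    return next
  }

  for (const p of providers) {
    // A provider counts as customized if it has any stored key, sentinel included.
    const touched = stored !== null && Array.from(stored).some(k => k.startsWith(`${p.slug}::`))

    if (!touched) {
      p.models.forEach(m => next.add(keyOf(p.slug, m)))
    }
  }

  return next
}

// Next persisted set after toggling one model row.
function toggle(stored: Set<string> | null, providers: Provider[], slug: string, model: string): Set<string> {
  const next = resolve(stored, providers)
  const key = keyOf(slug, model)

  if (next.has(key)) {
    next.delete(key)
    const anyLeft = Array.from(next).some(k => k.startsWith(`${slug}::`) && !isSentinel(k))

    // Last real model gone: record the explicit hide-all.
    if (!anyLeft) {
      next.add(sentinelOf(slug))
    }
  } else {
    // Re-enable: clear only this provider's sentinel.
    next.delete(sentinelOf(slug))
    next.add(key)
  }

  return next
}

// Display view: the working set with sentinels stripped.
function display(stored: Set<string> | null, providers: Provider[]): Set<string> {
  return new Set(Array.from(resolve(stored, providers)).filter(k => !isSentinel(k)))
}
```

Feeding each `toggle` result back in as the next `stored` value, as the tests do, shows why the handler has to seed from the sentinel-preserving set: if it seeded from `display`, the other provider's sentinel would be lost on the next persist and its defaults would be re-expanded.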
diff --git a/apps/desktop/src/store/prompts.ts b/apps/desktop/src/store/prompts.ts
index a514556d102..2d7a74baa8b 100644
--- a/apps/desktop/src/store/prompts.ts
+++ b/apps/desktop/src/store/prompts.ts
@@ -87,10 +87,20 @@ export interface SecretRequest extends KeyedPrompt {
const approval = keyedPromptStore()
const sudo = keyedPromptStore()
const secret = keyedPromptStore()
+const $approvalInlineAnchorCount = atom(0)
export const $approvalRequest = approval.$active
export const setApprovalRequest = approval.set
export const clearApprovalRequest = approval.clear
+export const $approvalInlineVisible = computed($approvalInlineAnchorCount, count => count > 0)
+
+export function registerApprovalInlineAnchor(): () => void {
+ $approvalInlineAnchorCount.set($approvalInlineAnchorCount.get() + 1)
+
+ return () => {
+ $approvalInlineAnchorCount.set(Math.max(0, $approvalInlineAnchorCount.get() - 1))
+ }
+}
export const $sudoRequest = sudo.$active
export const setSudoRequest = sudo.set
@@ -107,6 +117,7 @@ export function clearAllPrompts(sessionId?: string | null): void {
approval.reset()
sudo.reset()
secret.reset()
+ $approvalInlineAnchorCount.set(0)
return
}
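
The `registerApprovalInlineAnchor` change above is a reference count with a disposer. Below is a minimal sketch of the same pattern without nanostores; `anchorCount`, `registerAnchor`, `isInlineVisible`, and `resetAnchors` are hypothetical names, and the `released` guard is an addition of this sketch that the diff does not include.

```typescript
// Hypothetical plain-variable version of the anchor counter.
let anchorCount = 0

// Register one mounted inline anchor; the returned function unregisters it.
function registerAnchor(): () => void {
  anchorCount += 1
  let released = false // sketch-only guard so a double dispose decrements once

  return () => {
    if (released) {
      return
    }

    released = true
    // Clamp at zero, as the diff does, so a reset followed by a late
    // dispose cannot drive the count negative.
    anchorCount = Math.max(0, anchorCount - 1)
  }
}

// Derived flag: inline approval UI is visible while any anchor is mounted.
function isInlineVisible(): boolean {
  return anchorCount > 0
}

// Mirrors the reset in clearAllPrompts().
function resetAnchors(): void {
  anchorCount = 0
}
```

A component would call `registerAnchor()` on mount and invoke the returned function on unmount. The clamp matters because a reset can run while anchors are still mounted, after which their disposers still fire.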
diff --git a/apps/desktop/src/store/system-actions.ts b/apps/desktop/src/store/system-actions.ts
new file mode 100644
index 00000000000..43a8d9b770e
--- /dev/null
+++ b/apps/desktop/src/store/system-actions.ts
@@ -0,0 +1,48 @@
+import { atom } from 'nanostores'
+
+import { getActionStatus, restartGateway } from '@/hermes'
+import { translateNow } from '@/i18n'
+import { notifyError } from '@/store/notifications'
+import type { ActionResponse } from '@/types/hermes'
+
+const POLL_ATTEMPTS = 18
+const POLL_INTERVAL_MS = 1200
+const POLL_TIMEOUT_S = 180
+
+// True while a gateway restart is in flight — drives the statusbar gateway
+// indicator (glyph spinner) so the restart shows up where users already look,
+// instead of a toast that vanishes or a generic "Agents running" counter.
+export const $gatewayRestarting = atom(false)
+
+// Poll a backend action to completion (or a bounded window), throwing on a
+// non-zero exit so the caller can surface the failure.
+async function awaitAction(started: ActionResponse): Promise<void> {
+ for (let attempt = 0; attempt < POLL_ATTEMPTS; attempt += 1) {
+ await new Promise(resolve => window.setTimeout(resolve, POLL_INTERVAL_MS))
+ const status = await getActionStatus(started.name, POLL_TIMEOUT_S)
+
+ if (!status.running) {
+ if (status.exit_code != null && status.exit_code !== 0) {
+ throw new Error(translateNow('commandCenter.gatewayRestartFailed'))
+ }
+
+ return
+ }
+ }
+}
+
+// Restart the messaging gateway, surfacing progress in the statusbar gateway
+// indicator. Self-contained and never rejects, so every trigger — Cmd+K, the
+// messaging save/toggle toasts — gets identical feedback from a plain
+// `void runGatewayRestart()`, and a failure is the only thing that toasts.
+export async function runGatewayRestart(): Promise<void> {
+ $gatewayRestarting.set(true)
+
+ try {
+ await awaitAction(await restartGateway())
+ } catch (err) {
+ notifyError(err, translateNow('commandCenter.gatewayRestartFailed'))
+ } finally {
+ $gatewayRestarting.set(false)
+ }
+}
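
The bounded poll in `awaitAction` above can be sketched with its dependencies injected, which makes it testable without a backend or real timers. `ActionStatus`, `PollOutcome`, `pollAction`, and the injected `getStatus`/`sleep` parameters are hypothetical. Unlike the diff, which returns silently when the window elapses, this sketch reports that case explicitly so a caller could distinguish it.

```typescript
// Hypothetical shapes; only `running` and `exit_code` come from the diff.
type ActionStatus = { running: boolean; exit_code?: number | null }
type PollOutcome = 'finished' | 'window-elapsed'

// Poll until the action stops running or the attempt budget is spent.
// Rejects on a non-zero exit code so the caller can surface the failure.
async function pollAction(
  getStatus: () => Promise<ActionStatus>,
  sleep: (ms: number) => Promise<void>,
  attempts: number,
  intervalMs: number
): Promise<PollOutcome> {
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    await sleep(intervalMs)
    const status = await getStatus()

    if (!status.running) {
      if (status.exit_code != null && status.exit_code !== 0) {
        throw new Error(`action failed with exit code ${status.exit_code}`)
      }

      return 'finished'
    }
  }

  // Budget exhausted while the action was still running.
  return 'window-elapsed'
}
```

With the constants from the diff (18 attempts at 1200 ms), the sleep time alone bounds the window at about 21.6 seconds, plus however long each status call takes.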
diff --git a/apps/desktop/src/store/updates.test.ts b/apps/desktop/src/store/updates.test.ts
index bb74cd650c1..25ceda7c22f 100644
--- a/apps/desktop/src/store/updates.test.ts
+++ b/apps/desktop/src/store/updates.test.ts
@@ -41,7 +41,18 @@ vi.mock('@/hermes', () => ({
getActionStatus: (...args: unknown[]) => getActionStatusSpy(...args)
}))
-const { maybeNotifyUpdateAvailable, checkBackendUpdates, $backendUpdateStatus, applyBackendUpdate, $backendUpdateApply, reportBackendContract } = await import('./updates')
+const {
+ maybeNotifyUpdateAvailable,
+ checkBackendUpdates,
+ $backendUpdateStatus,
+ applyBackendUpdate,
+ $backendUpdateApply,
+ reportBackendContract,
+ applyUpdates,
+ $updateApply,
+ $updateOverlayOpen,
+ resetUpdateApplyState
+} = await import('./updates')
const { setConnection } = await import('./session')
 const status = (over: Partial<DesktopUpdateStatus> = {}): DesktopUpdateStatus => ({
@@ -218,6 +229,119 @@ describe('checkBackendUpdates', () => {
})
})
+describe('applyUpdates terminal state', () => {
+ const applyMock = vi.fn()
+
+ beforeEach(() => {
+ storage.clear()
+ notifySpy.mockClear()
+ dismissSpy.mockClear()
+ applyMock.mockReset()
+ resetUpdateApplyState()
+ $updateOverlayOpen.set(true)
+ ;(globalThis as unknown as { window: unknown }).window = {
+ hermesDesktop: { updates: { apply: applyMock } }
+ }
+ vi.useRealTimers()
+ })
+
+ afterEach(() => {
+ delete (globalThis as unknown as { window?: unknown }).window
+ })
+
+ it('holds the restart view when a relauncher hands off (no close, no toast)', async () => {
+ applyMock.mockResolvedValue({ ok: true, handedOff: true })
+
+ const result = await applyUpdates()
+
+ expect(result.handedOff).toBe(true)
+ // The detached relauncher will quit + reopen us; keep "applying" until then.
+ expect($updateApply.get().applying).toBe(true)
+ expect($updateOverlayOpen.get()).toBe(true)
+ expect(notifySpy).not.toHaveBeenCalled()
+ })
+
+ it('closes the overlay + toasts when updated but not relaunched in place', async () => {
+ // The Linux AppImage / dev-run path: backend + GUI updated, no in-place
+ // relaunch. Must not strand the overlay on a spinner with no close button.
+ applyMock.mockResolvedValue({ ok: true, backendUpdated: true })
+
+ await applyUpdates()
+
+ expect($updateOverlayOpen.get()).toBe(false)
+ expect($updateApply.get().applying).toBe(false)
+ expect($updateApply.get().stage).toBe('idle')
+ expect(notifySpy).toHaveBeenCalledTimes(1)
+ expect(notifySpy.mock.calls[0]?.[0]).toMatchObject({ kind: 'success' })
+ })
+
+ it('lands on a closeable error state when the apply resolves not-ok', async () => {
+ applyMock.mockResolvedValue({ ok: false, error: 'rebuild-failed', message: 'rebuild failed' })
+
+ await applyUpdates()
+
+ expect($updateApply.get().applying).toBe(false)
+ expect($updateApply.get().stage).toBe('error')
+ expect($updateApply.get().error).toBe('rebuild-failed')
+ })
+
+ it('keeps the manual command state for CLI installs with no staged updater', async () => {
+ applyMock.mockResolvedValue({ ok: true, manual: true, command: 'hermes update' })
+
+ await applyUpdates()
+
+ expect($updateApply.get().stage).toBe('manual')
+ expect($updateApply.get().command).toBe('hermes update')
+ expect($updateOverlayOpen.get()).toBe(true)
+ expect(notifySpy).not.toHaveBeenCalled()
+ })
+
+ it('lands on the guiSkew terminal state for a GUI/backend skew (AppImage/.deb/.rpm), without claiming a GUI update', async () => {
+ // Linux: backend updated, but the running desktop package was NOT replaced.
+ // Must NOT toast "loads next launch" — that's the dishonest message #45205
+ // guards against. Lands on a closeable guiSkew view instead.
+ applyMock.mockResolvedValue({
+ ok: true,
+ backendUpdated: true,
+ guiUpdated: false,
+ guiSkew: true,
+ message: 'Backend updated, but the desktop app package was not changed.'
+ })
+
+ const result = await applyUpdates()
+
+ expect(result.guiUpdated).toBe(false)
+ expect($updateApply.get().stage).toBe('guiSkew')
+ expect($updateApply.get().applying).toBe(false)
+ expect($updateApply.get().message).toMatch(/desktop app package was not changed/)
+ // Overlay stays open on a closeable terminal view; no "all set" toast.
+ expect($updateOverlayOpen.get()).toBe(true)
+ expect(notifySpy).not.toHaveBeenCalled()
+ })
+
+ it('lands on a closeable manual-restart state when the rebuilt sandbox blocks auto-relaunch', async () => {
+ // Under release/*-unpacked but chrome-sandbox isn't launchable: don't quit
+ // into a dead app — keep a working window on a closeable manual state.
+ applyMock.mockResolvedValue({
+ ok: true,
+ backendUpdated: true,
+ guiUpdated: false,
+ manualRestart: true,
+ sandboxBlocked: true,
+ message: 'Backend updated. Quit and reopen Hermes to finish.'
+ })
+
+ const result = await applyUpdates()
+
+ expect(result.manualRestart).toBe(true)
+ expect($updateApply.get().stage).toBe('manual')
+ expect($updateApply.get().command).toBeNull()
+ expect($updateApply.get().message).toMatch(/Quit and reopen/)
+ expect($updateOverlayOpen.get()).toBe(true)
+ expect(notifySpy).not.toHaveBeenCalled()
+ })
+})
+
describe('applyBackendUpdate recovery', () => {
beforeEach(() => {
storage.clear()
diff --git a/apps/desktop/src/store/updates.ts b/apps/desktop/src/store/updates.ts
index b9338314e70..6b6aae9bea1 100644
--- a/apps/desktop/src/store/updates.ts
+++ b/apps/desktop/src/store/updates.ts
@@ -195,6 +195,20 @@ export function openUpdatesWindow(): void {
openUpdateOverlayFor(isRemoteMode() ? 'backend' : 'client')
}
+/**
+ * Start applying the available update for the active target right away. Opens
+ * the updates overlay first so the user sees apply progress (the overlay
+ * renders ApplyingView once `applying` flips true), then kicks off the install.
+ * Used by the "Update now" affordance on the About panel, which would otherwise
+ * only be able to open the changelog overlay.
+ */
+export function startActiveUpdate(): void {
+ const target: UpdateTarget = isRemoteMode() ? 'backend' : 'client'
+ $updateOverlayTarget.set(target)
+ $updateOverlayOpen.set(true)
+ void (target === 'backend' ? applyBackendUpdate() : applyUpdates())
+}
+
/** Re-read the running app's version from the Electron main process and
* publish it on `$desktopVersion`. Called when the About panel mounts, the
* update flow finishes, and the window regains focus, so the About text
@@ -328,6 +342,70 @@ export async function applyUpdates(opts: DesktopUpdateApplyOptions = {}): Promis
message: result.command ?? 'hermes update',
command: result.command ?? 'hermes update'
})
+
+ return result
+ }
+
+ // A detached relauncher took over (macOS bundle swap / Linux re-exec): the
+ // app is about to quit and reopen, so hold the "Restarting…" view until it
+ // does. Every other resolved outcome MUST land on a terminal, closeable
+ // state: the apply IPC resolves here, but the progress stream may have left
+ // us on a non-terminal stage (e.g. 'done'/'rebuild'), which renders as a
+ // spinner with no close button — the exact hang this guards against.
+ // Linux GUI/backend skew (#45205): the backend was updated but the running
+ // desktop app PACKAGE was not changed (AppImage/.deb/.rpm). We must NOT tell
+ // the user "the new version loads next launch" — that's false; this packaged
+ // shell keeps running old GUI code against the new backend. Land on the
+ // dedicated, closeable guiSkew terminal state telling them to update/reinstall
+ // the desktop app.
+ if (result?.guiSkew) {
+ $updateApply.set({
+ ...IDLE,
+ applying: false,
+ stage: 'guiSkew',
+ message: result.message ?? translateNow('updates.guiSkewBody')
+ })
+
+ return result
+ }
+
+ // Backend updated but the app couldn't auto-relaunch (e.g. the rebuilt
+ // sandbox helper isn't launchable): keep a closeable manual-restart state so
+ // the user keeps a working window instead of a dead app or a stuck spinner.
+ if (result?.ok && result?.manualRestart) {
+ $updateApply.set({
+ ...IDLE,
+ applying: false,
+ stage: 'manual',
+ message: result.message ?? translateNow('updates.manualPickedUp')
+ })
+
+ return result
+ }
+
+ if (!result?.handedOff) {
+ if (result?.ok) {
+ // Updated, but couldn't relaunch in place (AppImage / dev run). Dismiss
+ // the overlay and let the user know the new version loads next launch
+ // rather than stranding them on an un-closeable spinner.
+ setUpdateOverlayOpen(false)
+ resetUpdateApplyState()
+ notify({
+ durationMs: 8000,
+ id: UPDATE_TOAST_ID,
+ kind: 'success',
+ message: translateNow('updates.manualPickedUp'),
+ title: translateNow('updates.allSetTitle')
+ })
+ } else {
+ $updateApply.set({
+ ...$updateApply.get(),
+ applying: false,
+ stage: 'error',
+ error: result?.error ?? 'apply-failed',
+ message: result?.message ?? translateNow('updates.errorBody')
+ })
+ }
}
return result
@@ -443,7 +521,11 @@ export async function applyBackendUpdate(): Promise {
function ingestProgress(payload: DesktopUpdateProgress): void {
const current = $updateApply.get()
const log = [...current.log, { stage: payload.stage, message: payload.message, at: payload.at }].slice(-50)
- const terminal = payload.stage === 'error' || payload.stage === 'restart' || payload.stage === 'manual'
+ const terminal =
+ payload.stage === 'error' ||
+ payload.stage === 'restart' ||
+ payload.stage === 'manual' ||
+ payload.stage === 'guiSkew'
$updateApply.set({
applying: !terminal,
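
The branch order that `applyUpdates` now follows can be summarized as a pure classification of the resolved apply result. This is a sketch under assumptions: `ApplyResult`, `Outcome`, and `classifyApplyResult` are hypothetical names, the field list covers only what the diff and its tests show, and the condition for the first (manual command) branch is inferred from the test case rather than visible in the hunk.

```typescript
// Hypothetical result shape, limited to fields seen in the diff and tests.
type ApplyResult = {
  ok?: boolean
  manual?: boolean
  command?: string
  guiSkew?: boolean
  manualRestart?: boolean
  handedOff?: boolean
  error?: string
}

type Outcome =
  | { kind: 'manual-command'; command: string } // stage 'manual', overlay stays open
  | { kind: 'gui-skew' } // stage 'guiSkew', closeable
  | { kind: 'manual-restart' } // stage 'manual' with no command
  | { kind: 'handed-off' } // keep "applying" until the relauncher restarts the app
  | { kind: 'updated-closed' } // close overlay, reset state, success toast
  | { kind: 'error'; error: string } // stage 'error', closeable

// Order matters: the specific terminal states are checked before the
// generic "ok" fallback.
function classifyApplyResult(result: ApplyResult | null | undefined): Outcome {
  if (result?.ok && result?.manual) {
    // Assumed condition; the hunk shows only the body of this branch.
    return { kind: 'manual-command', command: result?.command ?? 'hermes update' }
  }

  if (result?.guiSkew) {
    return { kind: 'gui-skew' }
  }

  if (result?.ok && result?.manualRestart) {
    return { kind: 'manual-restart' }
  }

  if (result?.handedOff) {
    return { kind: 'handed-off' }
  }

  if (result?.ok) {
    return { kind: 'updated-closed' }
  }

  return { kind: 'error', error: result?.error ?? 'apply-failed' }
}
```

Every outcome except `handed-off` is terminal and closeable, which is the property the new `applyUpdates terminal state` tests check.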
diff --git a/apps/desktop/src/styles.css b/apps/desktop/src/styles.css
index 03b348c9d84..9487b636dfb 100644
--- a/apps/desktop/src/styles.css
+++ b/apps/desktop/src/styles.css
@@ -299,8 +299,11 @@
'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji', emoji;
/* Key caps always use the native UI face — never theme typography overrides. */
--dt-font-kbd: -apple-system, BlinkMacSystemFont, 'SF Pro Text', 'Segoe UI', system-ui, sans-serif;
+ /* JetBrains Mono first — the face we bundle (@font-face above) and the
+ terminal's primary — so code/diff match the terminal on every platform
+ instead of drifting to a system Cascadia Code where it's installed. */
--dt-font-mono:
- 'Cascadia Code', 'JetBrains Mono', 'SF Mono', ui-monospace, Menlo, Consolas, monospace, 'Apple Color Emoji',
+ 'JetBrains Mono', 'Cascadia Code', 'SF Mono', ui-monospace, Menlo, Consolas, monospace, 'Apple Color Emoji',
'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji', emoji;
--dt-base-size: 1rem;
--dt-line-height: 1.5;
@@ -337,8 +340,8 @@
--file-tree-row-height: 1.375rem;
--composer-width: 48.75rem;
- --composer-control-size: 1.75rem;
- --composer-control-primary-size: 1.875rem;
+ --composer-control-size: 1.5rem;
+ --composer-control-primary-size: 1.625rem;
--composer-control-gap: 0.25rem;
--composer-row-gap: 0.25rem;
--composer-ring-strength: 1;
@@ -680,6 +683,7 @@ textarea,
[contenteditable]:not([contenteditable='false']),
[data-slot='aui_user-message-root'],
[data-slot='aui_assistant-message-content'],
+[data-slot='aui_system-message-root'],
[data-selectable-text='true'],
[data-selectable-text='true'] * {
-webkit-user-select: text;
@@ -1001,10 +1005,55 @@ canvas {
}
[data-slot='composer-root'] {
- width: min(var(--composer-width), calc(100% - 2rem));
+ /* +10px width compensates the 5px side padding so the visible surface keeps
+ its exact width/position — the inline padding is just transparent grab space
+ for the peel-out drag, matching the floating composer's 5px platform. */
+ width: calc(min(var(--composer-width), calc(100% - 2rem)) + 10px);
+ padding-inline: 5px;
padding-bottom: var(--composer-shell-pad-block-end);
}
+/* Popped-out (floating) composer: compact width + an even 5px transparent grab
+ platform. The higher-specificity selector resets the base rule's padding-bottom
+ so the inset is equal on all four sides (not 5px sides / shell-pad bottom). */
+[data-slot='composer-root'][data-popped-out] {
+ width: var(--composer-popout-width, 24rem);
+ max-width: calc(100vw - 1.5rem);
+ padding: 5px;
+}
+
+/* Dock glow intensity scale — dimmer in light mode (the primary glow reads
+ much stronger over a light backdrop), full strength in dark mode. */
+:root {
+ --dock-glow-scale: 0.55;
+}
+
+.dark {
+ --dock-glow-scale: 1;
+}
+
+/* Drag-region hatch: a diagonal ///// pattern (Photoshop-style) that fades into
+ the transparent grab margin on hover (and stays while dragging) to signal the
+ composer is draggable. Its radius is set explicitly below to clip to the corners. */
+[data-slot='composer-drag-region'] {
+ /* Hatch frame radius (tuned by hand). */
+ border-radius: 0.4rem;
+ opacity: 0;
+ transition: opacity 150ms ease;
+ background-image: repeating-linear-gradient(
+ -45deg,
+ color-mix(in srgb, var(--ui-text-tertiary) 38%, transparent) 0,
+ color-mix(in srgb, var(--ui-text-tertiary) 38%, transparent) 1px,
+ transparent 1px,
+ transparent 3.5px
+ );
+}
+
+[data-slot='composer-drag-region']:hover,
+[data-slot='composer-drag-region'][data-dragging] {
+ opacity: 0.33;
+}
+
[data-slot='composer-root'] > .pointer-events-none {
background: linear-gradient(
to bottom,
@@ -1017,6 +1066,12 @@ canvas {
border-color: var(--ui-stroke-secondary) !important;
}
+/* On focus we don't change the fill — just shift the border ~15% toward the
+ foreground, which darkens it in light mode and lightens it in dark mode. */
+[data-slot='composer-surface']:focus-within {
+ border-color: color-mix(in srgb, var(--ui-stroke-secondary) 85%, var(--dt-foreground)) !important;
+}
+
[data-slot='composer-fade'] {
min-height: 2.375rem;
}
@@ -1050,14 +1105,6 @@ canvas {
--composer-fill: color-mix(in srgb, var(--dt-card) 48%, transparent);
}
-[data-slot='composer-root']:has([data-slot='composer-surface']:focus-within) {
- --composer-fill: var(--ui-chat-bubble-background);
-}
-
-[data-slot='composer-root']:has([data-slot='composer-completion-drawer']) {
- --composer-fill: color-mix(in srgb, var(--dt-card) 90%, var(--dt-background));
-}
-
/* Tool/thinking blocks now live at message-text alignment (no leading
chevron column to escape into), so their headers and bodies share a
common left edge with the model's text. */
@@ -1170,19 +1217,56 @@ canvas {
background: transparent !important;
}
-[data-slot='aui_assistant-message-content'] > :is([data-slot='tool-block'], [data-slot='aui_thinking-disclosure']) {
+/* Fade scaffolding so the prose reading column stays primary. Two targets:
+ a thinking disclosure fades as one block, and each *individual* tool row
+ (`[data-tool-row]`) fades on its own. We deliberately do NOT fade the tool
+ group wrapper (`[data-tool-group]`): a parent's opacity applies to its whole
+ subtree as one group, so a child row can never be more opaque than the group,
+ which made it impossible to keep one row lit (an open diff) while its
+ siblings faded. With the fade per-row, each row hovers/focuses independently. */
+[data-slot='aui_assistant-message-content'] > [data-slot='aui_thinking-disclosure'],
+[data-slot='aui_assistant-message-content'] [data-slot='tool-block'][data-tool-row] {
opacity: 0.67;
transition: opacity 120ms ease-out;
}
-[data-slot='aui_assistant-message-content']
- > :is([data-slot='tool-block'], [data-slot='aui_thinking-disclosure']):is(:hover, :focus-within) {
+/* Lift on hover or *keyboard* focus only. `:focus-within` also matches the
+ focus a mouse click leaves on the disclosure toggle, which kept a row lit
+ after you clicked to collapse it; `:has(:focus-visible)` excludes that. */
+[data-slot='aui_assistant-message-content'] > [data-slot='aui_thinking-disclosure']:is(:hover, :has(:focus-visible)),
+[data-slot='aui_assistant-message-content'] [data-slot='tool-block'][data-tool-row]:is(:hover, :has(:focus-visible)) {
opacity: 1;
}
-/* A generated image is the deliverable, not scaffolding — keep it at full
- strength instead of dimming it until hover. */
-[data-slot='aui_assistant-message-content'] > [data-slot='tool-block']:has([data-slot='aui_generated-image']) {
+/* Syntax-highlighted inline diff (Shiki): strip the theme's own surface +
+ default margins so context lines stay transparent and each changed line owns
+ its tint. `display: grid` on the code puts one `.line` per row and drops the
+ whitespace-only `\n` nodes between them — without it, full-width block lines
+ double up with the literal newlines (phantom blank rows). */
+[data-slot='file-diff-panel'] .shiki,
+[data-slot='file-diff-panel'] .shiki code {
+ margin: 0;
+ background: transparent !important;
+}
+
+[data-slot='file-diff-panel'] .shiki code {
+ display: grid;
+}
+
+/* The github-dark token palette reads candy-bright at our small code size.
+ `github-dark-dimmed` only dims the *background* (which we strip), so soften
+ the token *foregrounds* directly — a small saturation + brightness pullback,
+ hues preserved — for both code blocks and inline diffs. Dark mode only. */
+.dark .shiki {
+ filter: saturate(0.82) brightness(0.92);
+}
+
+/* File edits (write_file / edit_file / patch) are the deliverable, not
+ scaffolding — the diff is what the user reviews, like a PR. An *expanded*
+ edit stays at full strength; collapsed it fades like any other row. The
+ `data-file-edit` marker sits on the same row element and is only present
+ while the row is open. */
+[data-slot='aui_assistant-message-content'] [data-slot='tool-block'][data-tool-row][data-file-edit] {
opacity: 1;
}
diff --git a/apps/desktop/src/types/hermes.ts b/apps/desktop/src/types/hermes.ts
index a497e3f10a9..b67cc3041a7 100644
--- a/apps/desktop/src/types/hermes.ts
+++ b/apps/desktop/src/types/hermes.ts
@@ -108,6 +108,12 @@ export interface EnvVarInfo {
description: string
is_password: boolean
is_set: boolean
+ // Backend-derived provider grouping hints (from the unified provider catalog
+ // in hermes_cli/provider_catalog.py). When present, the Keys tab groups by
+ // this provider identity — the SAME one `hermes model` uses — instead of
+ // desktop-only env-var prefix guesses. Empty for non-provider env vars.
+ provider?: string
+ provider_label?: string
redacted_value: null | string
tools: string[]
url: null | string
diff --git a/cli-config.yaml.example b/cli-config.yaml.example
index 8d3525019c8..35f87b16c61 100644
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -98,7 +98,9 @@ model:
# ``stale_timeout_seconds`` controls the non-streaming stale-call detector and
# wins over the legacy HERMES_API_CALL_STALE_TIMEOUT env var. Leaving these
# unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s,
-# HERMES_API_CALL_STALE_TIMEOUT=300s, native Anthropic 900s).
+# HERMES_API_CALL_STALE_TIMEOUT=90s, native Anthropic 900s). The
+# implicit non-stream stale detector is auto-disabled for local endpoints,
+# and its timeout can scale upward for very large contexts.
#
# Not currently wired for AWS Bedrock (bedrock_converse + AnthropicBedrock
# SDK paths) — those use boto3 with its own timeout configuration.
@@ -164,6 +166,16 @@ model:
#
# worktree: true # Always create a worktree when in a git repo
# worktree: false # Default — only create when -w flag is passed
+#
+# By default a new worktree branches from the freshly-fetched remote tip
+# (the current branch's upstream, else the remote's default branch) so it
+# starts current with the project instead of from the local clone's
+# (possibly stale) HEAD. Set worktree_sync: false to branch from local HEAD
+# instead — useful when offline or when you deliberately want the clone's
+# exact current state as the base.
+#
+# worktree_sync: true # Default — branch from the fetched remote tip
+# worktree_sync: false # Branch from local HEAD (offline / pinned base)
# =============================================================================
# Terminal Tool Configuration
@@ -483,6 +495,10 @@ prompt_caching:
# # reasoning controls:
# # extra_body:
# # enable_thinking: false
+# # Some vLLM/Qwen deployments expect this nested:
+# # extra_body:
+# # chat_template_kwargs:
+# # enable_thinking: false
# =============================================================================
# Persistent Memory
@@ -724,7 +740,7 @@ platform_toolsets:
# # allowed_chats: ["-1001234567890"]
# extra:
# disable_link_previews: false # Set true to suppress Telegram URL previews in bot messages
-# rich_messages: false # Bot API 10.1 rich messages (tables/task lists/details/math); default true, set false to force legacy MarkdownV2
+# rich_messages: false # Bot API 10.1 rich messages (tables/task lists/details/math); default false for copyable legacy MarkdownV2, set true to opt in
#
# Discord-specific settings (config.yaml top-level, not under platforms:):
#
@@ -803,7 +819,7 @@ platform_toolsets:
# =============================================================================
# Connect to external MCP servers to add tools from the MCP ecosystem.
# Each server's tools are automatically discovered and registered.
-# See docs/mcp.md for full documentation.
+# See website/docs/user-guide/features/mcp.md for full documentation.
#
# Stdio servers (spawn a subprocess):
# command: the executable to run
@@ -817,6 +833,10 @@ platform_toolsets:
# Optional per-server settings:
# timeout: tool call timeout in seconds (default: 120)
# connect_timeout: initial connection timeout (default: 60)
+# keepalive_interval: liveness ping cadence in seconds (default: 180).
+# Lower it below the server's session TTL for servers that expire idle
+# sessions quickly (e.g. Unreal Engine editor MCP, ~15s), otherwise idle
+# tool calls hit an expired session and pay a slow reconnect. Floored at 5s.
#
# mcp_servers:
# time:
diff --git a/cli.py b/cli.py
index 4d5ac86994b..c0753881e0b 100644
--- a/cli.py
+++ b/cli.py
@@ -452,6 +452,7 @@ def load_cli_config() -> Dict[str, Any]:
"resume_max_assistant_lines": 3,
"resume_skip_tool_only": True,
"show_reasoning": False,
+ "reasoning_full": False,
"streaming": True,
"busy_input_mode": "interrupt",
"persistent_output": True,
@@ -562,6 +563,18 @@ def load_cli_config() -> Dict[str, Any]:
from hermes_cli.config import _expand_env_vars
defaults = _expand_env_vars(defaults)
+ # Managed scope: overlay administrator-pinned values LAST so they win over
+ # the user's config here too. cli.py builds its config independently of
+ # hermes_cli.config._load_config_impl (which has its own managed merge), so
+ # without this the entire interactive CLI/TUI surface — skin, display prefs,
+ # etc. read from CLI_CONFIG — would silently ignore managed scope while
+ # `hermes config`/`doctor`/guards (which use load_config) honor it. The
+ # shared helper mirrors _load_config_impl (env-only expansion, root-model
+ # normalization, leaf-merge) and is fail-open.
+ from hermes_cli import managed_scope
+
+ defaults = managed_scope.apply_managed_overlay(defaults)
+
# Apply terminal config to environment variables (so terminal_tool picks them up)
terminal_config = defaults.get("terminal", {})
@@ -608,6 +621,7 @@ def load_cli_config() -> Dict[str, Any]:
"container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
"docker_volumes": "TERMINAL_DOCKER_VOLUMES",
"docker_env": "TERMINAL_DOCKER_ENV",
+ "docker_extra_args": "TERMINAL_DOCKER_EXTRA_ARGS",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
@@ -1019,11 +1033,20 @@ def _run_cleanup(*, notify_session_finalize: bool = True):
# partially-initialised agents where the attribute is missing.
_session_msgs = getattr(_active_agent_ref, '_session_messages', None)
if isinstance(_session_msgs, list):
+ logger.info(
+ "CLI cleanup calling memory shutdown for session %s with %d message(s)",
+ getattr(_active_agent_ref, "session_id", None) or "",
+ len(_session_msgs),
+ )
_active_agent_ref.shutdown_memory_provider(_session_msgs)
else:
+ logger.info(
+ "CLI cleanup calling memory shutdown for session %s without session message list",
+ getattr(_active_agent_ref, "session_id", None) or "",
+ )
_active_agent_ref.shutdown_memory_provider()
- except Exception:
- pass
+ except Exception as e:
+ logger.warning("CLI cleanup memory shutdown failed: %s", e, exc_info=True)
def _should_emit_cleanup_session_finalize(session_id: str | None) -> bool:
@@ -1224,11 +1247,91 @@ def _path_is_within_root(path: Path, root: Path) -> bool:
return False
-def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
+def _resolve_worktree_base(repo_root: str) -> tuple:
+ """Resolve the freshest base ref to branch a new worktree from.
+
+ The standalone clone's ``HEAD`` can lag the remote by hundreds of commits
+ (the ``~/.hermes/hermes-agent`` clone is updated only by ``hermes update``,
+ not on every session). Branching a worktree from that stale ``HEAD`` roots
+ every new branch on an old base — so the PR diff GitHub computes against
+ current ``main`` balloons with unrelated changes, and the agent has to
+ discover the staleness via the pre-push gate and rebase. Branching from the
+ freshly-fetched remote tip instead means the worktree starts current.
+
+ Strategy (each step falls back to the next on failure):
+ 1. If the current branch tracks an upstream, fetch and use that upstream
+ ref — so a deliberate feature-branch worktree tracks its own remote,
+ not the default branch.
+ 2. Else fetch the remote's default branch (``origin/HEAD`` → e.g.
+ ``origin/main``) and use it.
+ 3. Else fall back to ``HEAD`` (offline, no remote, or detached) — the
+ old behavior, never worse than before.
+
+ Returns ``(base_ref, label)`` where *base_ref* is a git revision suitable
+ as the start point of ``git worktree add`` and *label* is a short
+ human-readable description for the session banner.
+ """
+ import subprocess
+
+ def _git(args, timeout=20):
+ return subprocess.run(
+ ["git", *args],
+ capture_output=True, text=True, timeout=timeout, cwd=repo_root,
+ )
+
+ # 1. Current branch's upstream, if it tracks one.
+ try:
+ up = _git(["rev-parse", "--abbrev-ref", "--symbolic-full-name", "@{upstream}"])
+ if up.returncode == 0:
+ upstream = up.stdout.strip() # e.g. "origin/main"
+ if upstream and "/" in upstream:
+ remote = upstream.split("/", 1)[0]
+ # Fetch just that branch; fail-soft if offline.
+ _git(["fetch", remote, upstream.split("/", 1)[1]], timeout=30)
+ return upstream, f"{upstream} (fetched)"
+ except Exception as e:
+ logger.debug("worktree base: upstream resolution failed: %s", e)
+
+ # 2. Remote default branch (origin/HEAD).
+ try:
+ # Resolve the remote's default branch symref.
+ head_ref = _git(["symbolic-ref", "--quiet", "refs/remotes/origin/HEAD"])
+ default_ref = ""
+ if head_ref.returncode == 0:
+ default_ref = head_ref.stdout.strip().replace("refs/remotes/", "", 1)
+ if not default_ref:
+ # origin/HEAD not set locally; ask the remote.
+ show = _git(["remote", "show", "origin"], timeout=30)
+ for line in show.stdout.splitlines():
+ line = line.strip()
+ if line.startswith("HEAD branch:"):
+ _branch = line.split(":", 1)[1].strip()
+ # A remote with no default branch reports "(unknown)";
+ # don't construct a bogus "origin/(unknown)" ref from it.
+ if _branch and _branch != "(unknown)":
+ default_ref = "origin/" + _branch
+ break
+ if default_ref and "/" in default_ref:
+ remote, branch = default_ref.split("/", 1)
+ _git(["fetch", remote, branch], timeout=30)
+ return default_ref, f"{default_ref} (fetched)"
+ except Exception as e:
+ logger.debug("worktree base: default-branch resolution failed: %s", e)
+
+ # 3. Fall back to local HEAD (offline / no remote / detached).
+ return "HEAD", "HEAD (local — could not reach remote)"
+
+
+def _setup_worktree(repo_root: str = None, sync_base: bool = True) -> Optional[Dict[str, str]]:
"""Create an isolated git worktree for this CLI session.
Returns a dict with worktree metadata on success, None on failure.
The dict contains: path, branch, repo_root.
+
+ When *sync_base* is True (default), the worktree branches from the
+ freshly-fetched remote tip rather than the (possibly stale) local ``HEAD``
+ — see ``_resolve_worktree_base``. Set ``worktree_sync: false`` in config to
+ branch from local ``HEAD`` (the pre-#10760-followup behavior).
"""
import subprocess
@@ -1260,15 +1363,37 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
except Exception as e:
logger.debug("Could not update .gitignore: %s", e)
+ # Resolve the base ref. By default branch from the freshly-fetched remote
+ # tip so the worktree starts current with the project, not from the
+ # (possibly stale) local HEAD of the standalone clone (#10760 follow-up).
+ if sync_base:
+ base_ref, base_label = _resolve_worktree_base(repo_root)
+ else:
+ base_ref, base_label = "HEAD", "HEAD (local — worktree_sync disabled)"
+
# Create the worktree
try:
result = subprocess.run(
- ["git", "worktree", "add", str(wt_path), "-b", branch_name, "HEAD"],
+ ["git", "worktree", "add", str(wt_path), "-b", branch_name, base_ref],
capture_output=True, text=True, timeout=30, cwd=repo_root,
)
if result.returncode != 0:
- print(f"\033[31m✗ Failed to create worktree: {result.stderr.strip()}\033[0m")
- return None
+ # If branching from the resolved remote ref failed for any reason
+ # (e.g. a partial fetch left the ref unusable), retry from local
+ # HEAD so worktree creation never hard-fails on a sync hiccup.
+ if base_ref != "HEAD":
+ logger.warning(
+ "worktree add from %s failed (%s); retrying from local HEAD",
+ base_ref, result.stderr.strip(),
+ )
+ base_ref, base_label = "HEAD", "HEAD (fallback — remote base failed)"
+ result = subprocess.run(
+ ["git", "worktree", "add", str(wt_path), "-b", branch_name, base_ref],
+ capture_output=True, text=True, timeout=30, cwd=repo_root,
+ )
+ if result.returncode != 0:
+ print(f"\033[31m✗ Failed to create worktree: {result.stderr.strip()}\033[0m")
+ return None
except Exception as e:
print(f"\033[31m✗ Failed to create worktree: {e}\033[0m")
return None
@@ -1340,14 +1465,27 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
except Exception as e:
logger.debug("Error copying .worktreeinclude entries: %s", e)
+ # Lock the worktree so other processes (and `git worktree remove`) can see
+ # it is actively in use. Fail-soft: a lock failure never blocks the session.
+ try:
+ subprocess.run(
+ ["git", "worktree", "lock", "--reason", f"hermes pid={os.getpid()}", str(wt_path)],
+ capture_output=True, text=True, timeout=10, cwd=repo_root,
+ )
+ logger.debug("Worktree locked: %s (pid=%s)", wt_path, os.getpid())
+ except Exception as e:
+ logger.debug("git worktree lock failed (non-fatal): %s", e)
+
info = {
"path": str(wt_path),
"branch": branch_name,
"repo_root": repo_root,
+ "base": base_ref,
}
print(f"\033[32m✓ Worktree created:\033[0m {wt_path}")
print(f" Branch: {branch_name}")
+ print(f" Base: {base_label}")
return info
@@ -1415,6 +1553,16 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
# Remove worktree (even if working tree is dirty — uncommitted
# changes without unpushed commits are just artifacts)
+ # Unlock first so `git worktree remove` isn't blocked by the lock we
+ # placed at creation time. Fail-soft — never block cleanup.
+ try:
+ subprocess.run(
+ ["git", "worktree", "unlock", wt_path],
+ capture_output=True, text=True, timeout=10, cwd=repo_root,
+ )
+ except Exception as e:
+ logger.debug("git worktree unlock failed (non-fatal): %s", e)
+
try:
subprocess.run(
["git", "worktree", "remove", wt_path, "--force"],
@@ -3259,6 +3407,9 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
# show_reasoning: display model thinking/reasoning before the response
self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
+ # reasoning_full: when reasoning display is on, print the post-response
+ # recap box uncollapsed instead of clamping to the first 10 lines.
+ self.reasoning_full = CLI_CONFIG["display"].get("reasoning_full", False)
_configure_output_history(
enabled=CLI_CONFIG["display"].get("persistent_output", True),
max_lines=CLI_CONFIG["display"].get("persistent_output_max_lines", 200),
@@ -3503,11 +3654,36 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self._last_turn_finished_at: Optional[float] = None # time.time() when the last agent loop finished
# Initialize SQLite session store early so /title works before first message
self._session_db = None
+ self._session_db_unavailable = False
try:
from hermes_state import SessionDB
self._session_db = SessionDB()
except Exception as e:
+ # #41386: a failed session store means the transcript is NOT
+ # persisted to state.db — the live chat looks healthy but resume
+ # later shows a truncated/empty session. A buried log line is not
+ # enough; surface it prominently so the user knows persistence is
+ # off for this run and can fix the store before relying on resume.
+ self._session_db_unavailable = True
logger.warning("Failed to initialize SessionDB — session will NOT be indexed for search: %s", e)
+ try:
+ # Console is imported at module scope; do NOT re-import it here.
+ # A function-local `import` would make `Console` a local name for
+ # the whole __init__ body and break the earlier `self.console =
+ # Console()` with UnboundLocalError.
+ Console(stderr=True).print(
+ "[bold yellow]⚠ Session store unavailable[/bold yellow] — "
+ "this conversation will [bold]NOT be saved[/bold] to disk and "
+ "cannot be resumed later. Searching past sessions is also disabled.\n"
+ f" Reason: {e}\n"
+ " Fix the state.db store (e.g. `hermes update` to rebuild the venv) to restore persistence."
+ )
+ except Exception:
+ # Never let the warning path itself break startup.
+ print(
+ "WARNING: Session store unavailable — this conversation will NOT be "
+ f"saved to disk and cannot be resumed later. Reason: {e}"
+ )
# Opportunistic state.db maintenance — runs at most once per
# min_interval_hours, tracked via state_meta in state.db itself so
@@ -3637,6 +3813,15 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self._resize_recovery_lock = threading.Lock()
self._resize_recovery_timer = None
self._resize_recovery_pending = False
+ # Debounced timer that clears the post-resize suppression once the
+ # terminal reflow settles, so the status bar returns during idle
+ # without waiting for the next submitted input.
+ self._status_bar_unsuppress_timer = None
+ # Last terminal width seen by the resize handler. Used to distinguish a
+ # width change (column reflow → possible ghost chrome, needs a viewport
+ # clear) from a rows-only change (no reflow). None until the first
+ # resize fires.
+ self._last_resize_width = None
# Background task tracking: {task_id: threading.Thread}
self._background_tasks: Dict[str, threading.Thread] = {}
@@ -3787,15 +3972,112 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
origin and can leave stale prompt glyphs after a narrow resize.
We also flag ``_status_bar_suppressed_after_resize`` so the dynamic
- status bar and input separator rules stay hidden until the next user
- input. On column shrink the terminal reflows already-rendered status
- bar rows into scrollback before prompt_toolkit can erase them; drawing
- a fresh full-width bar immediately makes the old and new versions
- look duplicated (#19280, #22976). Clearing the suppression on the
- next prompt restores the bar cleanly.
+ status bar and input separator rules stay hidden while the terminal
+ reflow settles. On column shrink the terminal reflows already-rendered
+ status bar rows into scrollback before prompt_toolkit can erase them;
+ drawing a fresh full-width bar immediately makes the old and new
+ versions look duplicated (#19280, #22976).
+
+ Suppression alone is not enough on a WIDTH change. prompt_toolkit's
+ ``renderer.erase()`` does ``cursor_up(_cursor_pos.y)`` + ``erase_down()``
+ using the ``_cursor_pos.y`` cached from the LAST render at the OLD
+ width (renderer.py). When the column count shrinks, the terminal
+ reflows each already-painted full-width chrome row into 2+ physical
+ rows, so the cached ``y`` undershoots: ``cursor_up`` does not climb
+ past the reflowed rows and ``erase_down`` leaves the stale bar stranded
+ ABOVE the live origin. The next paint then stacks a fresh bar below it
+ — the duplicated-status-bar report (two bars, two elapsed readings).
+ Suppression hides the *new* bar but never erases the already-reflowed
+ *old* one, so the ghost survives the whole suppression window.
+
+ Fix: on a width change, wipe the visible viewport with ``erase_screen``
+ (CSI 2J) BEFORE delegating to prompt_toolkit's resize, then let its
+ repaint redraw from a clean origin. This is banner-safe: 2J clears
+ only the visible screen, NOT scrollback history (that is CSI 3J, which
+ we do not send here — ``rebuild_scrollback=False``), so the startup
+ banner that scrolled into history is preserved and
+ ``_replay_output_history`` is not needed. Row-count-only changes skip
+ the clear (no reflow, so no ghost) to avoid an unnecessary repaint.
+
+ The suppression is transient: a short follow-up timer clears it and
+ repaints once the reflow has settled, so the bar returns on its own
+ during idle. Previously the flag was only cleared on the next
+ *submitted* user input, so a resize/reflow (tmux pane change, SSH
+ window restore, font zoom) followed by idle left the status bar hidden
+ indefinitely even while the refresh clock kept ticking (the dynamic
+ chrome rendered at height 0 on every repaint). The next-submit clear
+ at the input loop remains as a fast path.
"""
self._status_bar_suppressed_after_resize = True
+ # On a WIDTH change the terminal has already reflowed the old full-width
+ # chrome into extra physical rows that prompt_toolkit's stale-cursor
+ # erase (cursor_up(_cursor_pos.y) cached at the OLD width) will not
+ # reach, leaving a duplicated status bar stranded above the live origin.
+ # Ctrl+L / /redraw clears it cleanly, so route the resize path through
+ # the SAME recovery: wipe the visible viewport (banner-safe — CSI 2J
+ # only, never CSI 3J) and replay the transcript so nothing is lost.
+ # Row-count-only changes skip this (no reflow → no ghost) to avoid an
+ # unnecessary full repaint.
+ try:
+ new_width = self._get_tui_terminal_width()
+ except Exception:
+ new_width = None
+ prev_width = getattr(self, "_last_resize_width", None)
+ # First resize of the session has no prior width to compare against;
+ # treat it as a change so an initial maximize/restore is covered too.
+ width_changed = new_width is not None and new_width != prev_width
+ if width_changed:
+ try:
+ self._clear_prompt_toolkit_screen(app, rebuild_scrollback=False)
+ _replay_output_history()
+ except Exception:
+ pass
+ if new_width is not None:
+ self._last_resize_width = new_width
original_on_resize()
+ self._schedule_status_bar_unsuppress(app)
+
+ def _schedule_status_bar_unsuppress(self, app, delay: float = 0.35) -> None:
+ """Clear the post-resize status-bar suppression after the reflow settles.
+
+ Debounced: a fresh resize cancels the pending unsuppress and restarts
+ the timer, so a resize storm only repaints the bar once it stops.
+ """
+ try:
+ old_timer = getattr(self, "_status_bar_unsuppress_timer", None)
+ if old_timer is not None:
+ try:
+ old_timer.cancel()
+ except Exception:
+ pass
+
+ def _clear():
+ self._status_bar_suppressed_after_resize = False
+ try:
+ app.invalidate()
+ except Exception:
+ pass
+
+ def _fire():
+ try:
+ loop = getattr(app, "loop", None)
+ except Exception:
+ loop = None
+ if loop is not None:
+ try:
+ loop.call_soon_threadsafe(_clear)
+ return
+ except Exception:
+ pass
+ _clear()
+
+ timer = threading.Timer(delay, _fire)
+ timer.daemon = True
+ self._status_bar_unsuppress_timer = timer
+ timer.start()
+ except Exception:
+ # Fail open: never leave the bar stuck hidden.
+ self._status_bar_suppressed_after_resize = False
def _schedule_resize_recovery(self, app, original_on_resize, delay: float = 0.12) -> None:
"""Debounce resize redraws so footer chrome is not stamped into scrollback."""
@@ -5328,12 +5610,86 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
# Set skip flag (again) so the text-change event fired when the
# editor closes does not re-collapse the returned content.
self._skip_paste_collapse = True
- target_buffer.open_in_editor(validate_and_handle=False)
+ # Open the editor, then submit the saved draft on a clean exit —
+ # matching the TUI's Ctrl+G (openEditor), which sends the buffer
+ # instead of requiring a second Enter. Submission in this CLI is
+ # driven by the custom `enter` keybinding, NOT the buffer's
+ # accept_handler, so validate_and_handle can't route through it;
+ # chain a done-callback on the returned Task that re-uses the
+ # real submit pipeline via _submit_editor_buffer().
+ task = target_buffer.open_in_editor(validate_and_handle=False)
+ if task is not None and hasattr(task, "add_done_callback"):
+ task.add_done_callback(
+ lambda _t, b=target_buffer: self._submit_editor_buffer(b)
+ )
return True
except Exception as exc:
_cprint(f"{_DIM}Failed to open external editor: {exc}{_RST}")
return False
+ def _submit_editor_buffer(self, buffer) -> None:
+ """Submit the draft an external editor left in ``buffer``.
+
+ Invoked from the Ctrl+G done-callback so saving the editor sends the
+ prompt (TUI parity) instead of leaving it sitting in the input area.
+ Mirrors the idle/queue branches of the `enter` keybinding handler:
+ an empty save is ignored (never submits a blank turn), a slash command
+ is dispatched, otherwise the text is routed through the same input
+ queues the normal Enter path uses. Runs on the prompt_toolkit event
+ loop via the Task callback, so it must be cheap and non-blocking.
+ """
+ try:
+ text = (getattr(buffer, "text", "") or "").strip()
+ except Exception:
+ return
+ if not text:
+ # Editor saved empty / was cleared — match the TUI, which drops
+ # an empty draft instead of submitting a blank turn.
+ return
+
+ app = getattr(self, "_app", None)
+
+ # Slash commands: dispatch directly, same as the Enter handler's
+ # _looks_like_slash_command branch.
+ if _looks_like_slash_command(text):
+ try:
+ if not self.process_command(text):
+ self._should_exit = True
+ if app is not None and app.is_running:
+ app.exit()
+ except Exception as exc:
+ _cprint(f" {_DIM}Command failed: {exc}{_RST}")
+ finally:
+ self._reset_input_buffer(buffer)
+ if app is not None:
+ app.invalidate()
+ return
+
+ # Regular prompt: route through the same queues the Enter handler uses.
+ if self._agent_running:
+ # Agent busy → honour the configured busy-input behaviour by
+ # queueing for the next turn (the safe default; interrupt/steer
+ # remain reachable via the normal Enter path).
+ self._pending_input.put(text)
+ preview = text[:80] + ("..." if len(text) > 80 else "")
+ _cprint(f" Queued for the next turn: {preview}")
+ else:
+ self._pending_input.put(text)
+
+ self._reset_input_buffer(buffer)
+ if app is not None:
+ app.invalidate()
+
+ def _reset_input_buffer(self, buffer) -> None:
+ """Clear an input buffer after a programmatic submit (best-effort)."""
+ try:
+ buffer.reset(append_to_history=True)
+ except Exception:
+ try:
+ buffer.text = ""
+ except Exception:
+ pass
+
def _install_tool_callbacks(self) -> None:
@@ -6091,6 +6447,22 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
preview_limit = 400
visible_index = 0
hidden_tool_messages = 0
+ show_ts = bool(getattr(self, "show_timestamps", False))
+
+ def _ts_suffix(message: dict) -> str:
+ # Messages restored from SessionDB carry a unix `timestamp`; live
+ # unsaved turns may not. Only annotate when both the toggle is on
+ # and the turn actually has a stored time — never fabricate one.
+ if not show_ts:
+ return ""
+ ts = message.get("timestamp")
+ if not ts:
+ return ""
+ try:
+ from datetime import datetime
+ return f" [{datetime.fromtimestamp(float(ts)).strftime('%H:%M')}]"
+ except (ValueError, OSError, OverflowError, TypeError):
+ return ""
def flush_tool_summary():
nonlocal hidden_tool_messages
@@ -6124,13 +6496,13 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
content_text = "" if content is None else str(content)
if role == "user":
- print(f"\n [You #{visible_index}]")
+ print(f"\n [You #{visible_index}]{_ts_suffix(msg)}")
print(
f" {content_text[:preview_limit]}{'...' if len(content_text) > preview_limit else ''}"
)
continue
- print(f"\n [Hermes #{visible_index}]")
+ print(f"\n [Hermes #{visible_index}]{_ts_suffix(msg)}")
tool_calls = msg.get("tool_calls") or []
if content_text:
preview = content_text[:preview_limit]
@@ -6994,7 +7366,35 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
_cprint(f" ✗ {result.error_message}")
return
+ if self.agent is not None:
+ try:
+ from hermes_cli.context_switch_guard import merge_preflight_compression_warning
+
+ merge_preflight_compression_warning(
+ result,
+ agent=self.agent,
+ messages=list(self.conversation_history or []),
+ config_context_length=getattr(self.agent, "_config_context_length", None),
+ )
+ except Exception as exc:
+ logger.debug("preflight-compression switch warning failed: %s", exc)
+
old_model = self.model
+ # Snapshot the CLI-level credential/runtime fields BEFORE mutating them
+ # so a failed in-place agent swap can roll the whole CLI back to the old
+ # working model. Otherwise the broken credentials staged below leak into
+ # the next turn's resolution even though the agent itself rolled back
+ # (#50163).
+ _cli_snapshot = {
+ "model": self.model,
+ "provider": self.provider,
+ "requested_provider": self.requested_provider,
+ "_explicit_api_key": getattr(self, "_explicit_api_key", None),
+ "_explicit_base_url": getattr(self, "_explicit_base_url", None),
+ "api_key": self.api_key,
+ "base_url": self.base_url,
+ "api_mode": self.api_mode,
+ }
self.model = result.new_model
self.provider = result.target_provider
self.requested_provider = result.target_provider
@@ -7020,7 +7420,17 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
api_mode=result.api_mode,
)
except Exception as exc:
- _cprint(f" ⚠ Agent swap failed ({exc}); change applied to next session.")
+ # The agent rolled itself back to the old working model/client.
+ # Roll the CLI's own staged fields back too and abort the rest
+ # of the commit (note + success print) so a failed switch is a
+ # no-op rather than a dead session (#50163).
+ for _k, _v in _cli_snapshot.items():
+ setattr(self, _k, _v)
+ _cprint(
+ f" ⚠ Model switch to {result.new_model} failed ({exc}); "
+ f"staying on {old_model}."
+ )
+ return
self._pending_model_switch_note = (
f"[Note: model was just switched from {old_model} to {result.new_model} "
@@ -7144,24 +7554,43 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self._close_model_picker()
def _handle_model_switch(self, cmd_original: str):
- """Handle /model command — switch model for this session.
+ """Handle /model command — switch model.
Supports:
/model — show current model + usage hints
- /model — switch for this session only
- /model --global — switch and persist to config.yaml
+ /model — switch model (persists by default)
+ /model --session — switch for this session only
+ /model --global — switch and persist (explicit)
/model --provider — switch provider + model
/model --provider — switch to provider, auto-detect model
+
+ Persistence defaults to on (``model.persist_switch_by_default`` in
+ config.yaml, default True). Use ``--session`` for a one-off switch.
"""
- from hermes_cli.model_switch import switch_model, parse_model_flags
+ from hermes_cli.model_switch import (
+ switch_model,
+ parse_model_flags,
+ resolve_persist_behavior,
+ )
from hermes_cli.providers import get_label
# Parse args from the original command
parts = cmd_original.split(None, 1) # split off '/model'
raw_args = parts[1].strip() if len(parts) > 1 else ""
- # Parse --provider, --global, and --refresh flags
- model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
+ # Parse --provider, --global, --session, and --refresh flags
+ (
+ model_input,
+ explicit_provider,
+ is_global_flag,
+ force_refresh,
+ is_session,
+ ) = parse_model_flags(raw_args)
+ # Resolve the effective persistence once: --session overrides the
+ # config-gated default, --global forces persist, otherwise defer to
+ # model.persist_switch_by_default (defaults to True so /model survives
+ # across sessions).
+ persist_global = resolve_persist_behavior(is_global_flag, is_session)
# --refresh: wipe the on-disk picker cache before building the
# provider list. Forces a live re-fetch of every authed provider's
@@ -7209,7 +7638,8 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
if not providers:
_cprint(" No authenticated providers found.")
_cprint("")
- _cprint(" /model switch model")
+ _cprint(" /model switch model (persists)")
+ _cprint(" /model --session switch for this session only")
_cprint(" /model --provider switch provider")
_cprint(" /model --refresh re-fetch live model lists")
return
@@ -7240,6 +7670,19 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
_cprint(f" ✗ {result.error_message}")
return
+ if self.agent is not None:
+ try:
+ from hermes_cli.context_switch_guard import merge_preflight_compression_warning
+
+ merge_preflight_compression_warning(
+ result,
+ agent=self.agent,
+ messages=list(self.conversation_history or []),
+ config_context_length=getattr(self.agent, "_config_context_length", None),
+ )
+ except Exception as exc:
+ logger.debug("preflight-compression switch warning failed: %s", exc)
+
if not self._confirm_expensive_model_switch(result):
_cprint(" Model switch cancelled.")
return
@@ -7248,6 +7691,18 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
# Update requested_provider so _ensure_runtime_credentials() doesn't
# overwrite the switch on the next turn (it re-resolves from this).
old_model = self.model
+ # Snapshot CLI-level fields before mutation so a failed in-place swap
+ # rolls the whole CLI back to the old working model (#50163).
+ _cli_snapshot = {
+ "model": self.model,
+ "provider": self.provider,
+ "requested_provider": self.requested_provider,
+ "_explicit_api_key": getattr(self, "_explicit_api_key", None),
+ "_explicit_base_url": getattr(self, "_explicit_base_url", None),
+ "api_key": self.api_key,
+ "base_url": self.base_url,
+ "api_mode": self.api_mode,
+ }
self.model = result.new_model
self.provider = result.target_provider
self.requested_provider = result.target_provider
@@ -7274,7 +7729,15 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
api_mode=result.api_mode,
)
except Exception as exc:
- _cprint(f" ⚠ Agent swap failed ({exc}); change applied to next session.")
+ # Agent rolled itself back; roll the CLI back too and abort so a
+ # failed switch is a no-op rather than a dead session (#50163).
+ for _k, _v in _cli_snapshot.items():
+ setattr(self, _k, _v)
+ _cprint(
+ f" ⚠ Model switch to {result.new_model} failed ({exc}); "
+ f"staying on {old_model}."
+ )
+ return
# Store a note to prepend to the next user message so the model
# knows a switch occurred (avoids injecting system messages mid-history
@@ -7329,7 +7792,7 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
save_config_value("model.default", result.new_model)
if result.provider_changed:
save_config_value("model.provider", result.target_provider)
- _cprint(" Saved to config.yaml (--global)")
+ _cprint(" Saved to config.yaml")
else:
_cprint(" (session only — add --global to persist)")
@@ -7700,8 +8163,6 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self._handle_model_switch(cmd_original)
elif canonical == "codex-runtime":
self._handle_codex_runtime(cmd_original)
- elif canonical == "gquota":
- self._handle_gquota_command(cmd_original)
elif canonical == "personality":
# Use original case (handler lowercases the personality name itself)
@@ -7713,6 +8174,8 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
if retry_msg and hasattr(self, '_pending_input'):
# Re-queue the message so process_loop sends it to the agent
self._pending_input.put(retry_msg)
+ elif canonical == "prompt":
+ self._handle_prompt_compose_command(cmd_original)
elif canonical == "undo":
# Parse optional turn count: "/undo" → 1, "/undo 3" → 3.
_undo_n = 1
@@ -7764,6 +8227,8 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
self._status_bar_visible = not self._status_bar_visible
state = "visible" if self._status_bar_visible else "hidden"
self._console_print(f" Status bar {state}")
+ elif canonical == "timestamps":
+ self._handle_timestamps_command(cmd_original)
elif canonical == "verbose":
self._toggle_verbose()
elif canonical == "footer":
@@ -9710,16 +10175,35 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
else:
print(f" 🔧 {len(new_tools)} tool(s) available from {len(connected_servers)} server(s)")
- # Refresh the agent's tool list so the model can call new tools
+ # Refresh the agent's tool list so the model can call new tools.
+ # Route through the shared helper so this CLI /reload-mcp path stays
+ # in lockstep with the TUI RPC / gateway reload / late-binding paths
+ # (name-diff, thread-safe, and — critically — additive-preserving so
+ # memory-provider and context-engine tools survive the rebuild).
if self.agent is not None:
- self.agent.tools = get_tool_definitions(
- enabled_toolsets=self.agent.enabled_toolsets
- if hasattr(self.agent, "enabled_toolsets") else None,
+ from tools.mcp_tool import refresh_agent_mcp_tools
+ # Explicit reload: pick up MCP servers the user ENABLED in config
+ # this session. self.enabled_toolsets was resolved once at
+ # startup; merge in any now-connected server names (unless the
+ # user pinned `all`/`*`, which already includes everything) so a
+ # freshly-added server isn't filtered out. Mirrors startup, where
+ # MCP server names are part of enabled_toolsets (see __init__).
+ enabled_override = None
+ et = self.enabled_toolsets
+ if et and "all" not in et and "*" not in et:
+ merged = list(et)
+ for _name in sorted(connected_servers):
+ if _name not in merged:
+ merged.append(_name)
+ enabled_override = merged
+ refresh_agent_mcp_tools(
+ self.agent,
+ enabled_override=enabled_override,
quiet_mode=True,
)
- self.agent.valid_tool_names = {
- tool["function"]["name"] for tool in self.agent.tools
- } if self.agent.tools else set()
+ # Keep the CLI's own list in sync with what the agent now uses.
+ if enabled_override is not None:
+ self.enabled_toolsets = enabled_override
# Inject a message at the END of conversation history so the
# model knows tools changed. Appended after all existing
@@ -11400,11 +11884,12 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
r_fill = w - 2 - len(r_label)
r_top = f"{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}┐{_RST}"
r_bot = f"{_DIM}└{'─' * (w - 2)}┘{_RST}"
- # Collapse long reasoning: show first 10 lines
+ # Collapse long reasoning to the first 10 lines unless the
+ # user opted into full display via /reasoning full.
lines = reasoning.strip().splitlines()
- if len(lines) > 10:
+ if len(lines) > 10 and not getattr(self, "reasoning_full", False):
display_reasoning = "\n".join(lines[:10])
- display_reasoning += f"\n{_DIM} ... ({len(lines) - 10} more lines){_RST}"
+ display_reasoning += f"\n{_DIM} ... ({len(lines) - 10} more lines — /reasoning full to show){_RST}"
else:
display_reasoning = reasoning.strip()
_cprint(f"\n{r_top}\n{_DIM}{display_reasoning}{_RST}\n{r_bot}")
@@ -11554,6 +12039,36 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
except Exception:
pass
+ def _persist_active_session_before_close(self):
+ """Best-effort SQLite/JSON flush before the CLI marks a session closed.
+
+ ``run_conversation()`` normally persists at turn boundaries, but a
+ terminal close/SIGHUP/SIGTERM can unwind the prompt_toolkit app while
+ the agent thread still holds the current turn only in memory. Flush the
+ agent's live ``_session_messages`` before ``end_session()`` so resume,
+ session_search, and state.db do not lose the interrupted turn.
+ """
+ agent = getattr(self, "agent", None)
+ if not agent or not hasattr(agent, "_persist_session"):
+ return
+
+ messages = getattr(agent, "_session_messages", None)
+ if not isinstance(messages, list):
+ messages = getattr(self, "conversation_history", None)
+ if not isinstance(messages, list) or not messages:
+ return
+
+ conversation_history = getattr(self, "conversation_history", None)
+ if not isinstance(conversation_history, list):
+ conversation_history = messages
+
+ try:
+ agent._persist_session(messages, conversation_history)
+ if getattr(agent, "session_id", None):
+ self.session_id = agent.session_id
+ except (Exception, KeyboardInterrupt) as e:
+ logger.debug("Could not persist active CLI session before close: %s", e)
+
def _print_exit_summary(self):
"""Print session resume info on exit, similar to Claude Code."""
# Clear the screen + scrollback before printing the summary so the
@@ -12114,7 +12629,13 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
# --- /model picker modal ---
if self._model_picker_state:
try:
- self._handle_model_picker_selection()
+ # Picker selections persist by default (same default as
+ # /model); honour model.persist_switch_by_default.
+ from hermes_cli.model_switch import resolve_persist_behavior
+
+ self._handle_model_picker_selection(
+ persist_global=resolve_persist_behavior(False, False)
+ )
except Exception as _exc:
_cprint(f" ✗ Model selection failed: {_exc}")
self._close_model_picker()
@@ -13734,13 +14255,13 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
style=style,
full_screen=False,
mouse_support=False,
- # The status bar contains wall-clock read-outs (live prompt elapsed
- # and idle-since-last-turn). Once a turn finishes there may be no
- # further events to invalidate the app, so prompt_toolkit would keep
- # rendering the first post-turn value (usually ``✓ 0s``) forever.
- # A low-rate refresh keeps the clock honest without reintroducing a
- # custom repaint thread or touching conversation state.
- refresh_interval=1.0,
+ # Read from display.cli_refresh_interval (default 0 = disabled).
+ # When non-zero, prompt_toolkit redraws the UI on this cadence
+ # during idle, keeping wall-clock status-bar read-outs ticking.
+ # Set to 0 to suppress background redraws entirely — avoids
+ # fighting terminal auto-scroll in non-fullscreen mode (Xshell,
+ # iTerm2, Windows Terminal). See #48309.
+ refresh_interval=float(CLI_CONFIG.get("display", {}).get("cli_refresh_interval", 0)),
# Erase the live bottom chrome (status bar, input box, separator
# rules) on exit instead of freezing a final copy into scrollback.
# Without this, prompt_toolkit's render_as_done teardown repaints
@@ -14262,6 +14783,12 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
set_sudo_password_callback(None)
set_approval_callback(None)
set_secret_capture_callback(None)
+ # Flush any in-memory turn transcript before marking the session
+ # closed. On SIGHUP/SIGTERM/window close the agent thread may not
+ # reach its normal run_conversation() persistence path before the
+ # daemon thread is reaped.
+ self._persist_active_session_before_close()
+
# Close session in SQLite
if hasattr(self, '_session_db') and self._session_db and self.agent:
try:
@@ -14509,7 +15036,11 @@ def main(
_repo = _git_repo_root()
if _repo:
_prune_stale_worktrees(_repo)
- wt_info = _setup_worktree()
+ # Branch the worktree from the freshly-fetched remote tip by
+ # default so it starts current with the project. Opt out with
+ # worktree_sync: false to branch from local HEAD instead.
+ _sync_base = CLI_CONFIG.get("worktree_sync", True)
+ wt_info = _setup_worktree(sync_base=_sync_base)
if wt_info:
_active_worktree = wt_info
os.environ["TERMINAL_CWD"] = wt_info["path"]
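Reviewer note on the `/reload-mcp` hunk above: the merge rule that builds `enabled_override` can be read in isolation as a small pure function. The sketch below is illustrative only. `merge_enabled_toolsets` is a hypothetical name that does not appear in the patch, and it assumes toolset and server names are plain strings.

```python
from typing import Iterable, List, Optional


def merge_enabled_toolsets(
    enabled: Optional[List[str]],
    connected_servers: Iterable[str],
) -> Optional[List[str]]:
    """Return an override list, or None when no override is needed.

    Hypothetical helper mirroring the inline logic in the hunk.
    """
    # No startup list, or a wildcard pin: everything is already included,
    # so the caller should pass no override at all.
    if not enabled or "all" in enabled or "*" in enabled:
        return None

    # Keep the user's original order, then append newly connected servers
    # in sorted order so the result is deterministic.
    merged = list(enabled)
    for name in sorted(connected_servers):
        if name not in merged:
            merged.append(name)
    return merged


if __name__ == "__main__":
    print(merge_enabled_toolsets(["web", "files"], {"github", "web"}))
    print(merge_enabled_toolsets(["all"], {"github"}))
```

Sorting the connected server names keeps the appended order stable across reloads, which matches the `sorted(connected_servers)` loop in the hunk.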
diff --git a/cron/jobs.py b/cron/jobs.py
index 178bd0fad81..6ec6d5be123 100644
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -12,6 +12,7 @@ import logging
import shutil
import tempfile
import threading
+import time
import os
import re
import uuid
@@ -30,7 +31,7 @@ except ImportError: # pragma: no cover - non-Windows
msvcrt = None
from datetime import datetime, timedelta
from pathlib import Path
-from hermes_constants import get_hermes_home
+from hermes_constants import get_default_hermes_root, get_hermes_home
from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
@@ -48,9 +49,23 @@ except ImportError:
# Configuration
# =============================================================================
-HERMES_DIR = get_hermes_home().resolve()
+HERMES_DIR = get_default_hermes_root().resolve()
CRON_DIR = HERMES_DIR / "cron"
JOBS_FILE = CRON_DIR / "jobs.json"
+# Heartbeat file the in-process ticker touches on every loop iteration. The
+# gateway process and the (separate) ``hermes cron status`` process share it
+# so status can tell whether the ticker THREAD is alive, not just whether the
+# gateway PROCESS exists — a ticker that dies silently inside a live gateway
+# would otherwise report healthy (#32612, #32895).
+TICKER_HEARTBEAT_FILE = CRON_DIR / "ticker_heartbeat"
+# Last tick that completed WITHOUT raising. Distinguishing this from the plain
+# heartbeat lets status detect a ticker that is alive but failing every tick.
+TICKER_SUCCESS_FILE = CRON_DIR / "ticker_last_success"
+# Default ticker loop interval (seconds). The single source of truth shared by
+# the in-process ticker (cron/scheduler_provider.py) and the staleness
+# threshold in `hermes cron status` (hermes_cli/cron.py), so the two never
+# drift apart.
+TICKER_INTERVAL_SECONDS = 60
# In-process lock protecting load_jobs→modify→save_jobs cycles.
# Required when tick() runs jobs in parallel threads — without this,
@@ -394,6 +409,31 @@ def _ensure_aware(dt: datetime) -> datetime:
return dt.astimezone(target_tz)
+def _timezone_offset_mismatch(stored: datetime, current: datetime) -> bool:
+ """Return True when a stored aware timestamp uses a different UTC offset.
+
+ Naive stored timestamps return False: they carry no offset to compare, and
+ are normalized by ``_ensure_aware`` instead — they intentionally never take
+ the offset-repair path.
+ """
+ if stored.tzinfo is None or current.tzinfo is None:
+ return False
+ return stored.utcoffset() != current.utcoffset()
+
+
+def _stored_wall_clock_is_future(stored: datetime, current: datetime) -> bool:
+ """Return True when the stored local wall-clock time has not arrived yet.
+
+ Cron schedules express local wall-clock intent. If Hermes/system local time
+ changes after next_run_at was persisted, an old offset can make a future
+ wall-clock run look due at the converted absolute time (for example
+ 21:00+10 becomes 13:00+02). Comparing naive wall-clock values lets us
+ distinguish that migration case from a genuinely missed run whose scheduled
+ wall time has already passed.
+ """
+ return stored.replace(tzinfo=None) > current.replace(tzinfo=None)
+
+
def _recoverable_oneshot_run_at(
schedule: Dict[str, Any],
now: datetime,
@@ -499,14 +539,120 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
return None
+# =============================================================================
+# Ticker heartbeat (liveness signal for `hermes cron status`)
+# =============================================================================
+
+def _atomic_write_epoch(path: Path) -> None:
+ """Atomically write the current epoch time to ``path``.
+
+ Uses the same tmpfile + ``atomic_replace`` pattern as ``save_jobs`` so a
+ concurrent reader in another process (``hermes cron status``) never sees a
+ torn/truncated file. Best-effort: failures are swallowed by callers.
+ """
+ ensure_dirs()
+ fd, tmp_path = tempfile.mkstemp(dir=str(CRON_DIR), suffix=".tmp", prefix=".hb_")
+ try:
+ with os.fdopen(fd, "w", encoding="utf-8") as f:
+ f.write(str(time.time()))
+ f.flush()
+ os.fsync(f.fileno())
+ atomic_replace(tmp_path, path)
+ except BaseException:
+ try:
+ os.unlink(tmp_path)
+ except OSError:
+ pass
+ raise
+
+
+def record_ticker_heartbeat(success: bool = False) -> None:
+ """Record a ticker liveness signal, and optionally a successful-tick signal.
+
+ The ticker calls this once per loop iteration. ``success=True`` additionally
+ bumps the *last successful tick* marker. We track two distinct signals so
+ `hermes cron status` can tell a thread that is merely *alive and looping*
+ (heartbeat fresh, success stale) from one that is actually *firing jobs*
+ (both fresh) — a ticker stuck failing every tick would otherwise keep the
+ plain heartbeat fresh and falsely report healthy (#32612, #32895).
+
+ Best-effort: a write failure must never disrupt the tick loop.
+ """
+ try:
+ _atomic_write_epoch(TICKER_HEARTBEAT_FILE)
+ except Exception:
+ pass
+ if success:
+ try:
+ _atomic_write_epoch(TICKER_SUCCESS_FILE)
+ except Exception:
+ pass
+
+
+def _epoch_file_age(path: Path) -> Optional[float]:
+ try:
+ raw = path.read_text(encoding="utf-8").strip()
+ return max(0.0, time.time() - float(raw))
+ except Exception:
+ return None
+
+
+def get_ticker_heartbeat_age() -> Optional[float]:
+ """Seconds since the ticker loop last iterated, or None if unknown.
+
+ None = heartbeat file missing/unreadable (older build, never ran, or a
+ torn read). Callers treat None as "cannot determine", not "dead".
+ """
+ return _epoch_file_age(TICKER_HEARTBEAT_FILE)
+
+
+def get_ticker_success_age() -> Optional[float]:
+ """Seconds since the ticker last completed a tick WITHOUT raising, or None."""
+ return _epoch_file_age(TICKER_SUCCESS_FILE)
+
+
# =============================================================================
# Job CRUD Operations
# =============================================================================
+_WARNED_ORPHAN_STORE = False
+
+
+def _warn_if_orphaned_profile_store() -> None:
+ """Loudly warn (once) if the root store is empty but a profile-local
+ jobs.json exists from before #32091's root-anchoring fix.
+
+ Such a file is now unreachable (the store anchors at the default root, not
+ the active profile). The jobs in it were already orphaned pre-fix (the
+ profile-less gateway never read them), so this is not a regression — but a
+ user who could SEE them in `cron list` under their profile would otherwise
+ find them silently gone. Point them at the path instead of failing silently.
+ """
+ global _WARNED_ORPHAN_STORE
+ if _WARNED_ORPHAN_STORE:
+ return
+ try:
+ active = get_hermes_home().resolve()
+ if active == HERMES_DIR:
+ return # not in a profile; nothing could be orphaned
+ legacy = active / "cron" / "jobs.json"
+ if legacy.exists():
+ _WARNED_ORPHAN_STORE = True
+ logger.warning(
+ "Cron jobs now live at %s (shared across profiles). A legacy "
+ "profile-local store exists at %s and is no longer read; "
+ "re-create those jobs or move them into the root store. (#32091)",
+ JOBS_FILE, legacy,
+ )
+ except Exception:
+ pass # best-effort advisory; never block load_jobs
+
+
def load_jobs() -> List[Dict[str, Any]]:
"""Load all jobs from storage."""
ensure_dirs()
if not JOBS_FILE.exists():
+ _warn_if_orphaned_profile_store()
return []
_strict_retry = False # track whether we used the strict=False fallback
@@ -976,6 +1122,9 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
job["last_error"] = error if not success else None
# Track delivery failures separately — cleared on successful delivery
job["last_delivery_error"] = delivery_error
+ # Clear any external-fire claim so a re-armed recurring job can
+ # be claimed again on its next fire (Phase 4C CAS).
+ job["fire_claim"] = None
# Increment completed count
if job.get("repeat"):
@@ -1057,13 +1206,84 @@ def advance_next_run(job_id: str) -> bool:
return False
+def _machine_id() -> str:
+ """Stable-ish identifier for claim attribution/debugging (NOT correctness).
+
+ Uses ``HERMES_MACHINE_ID`` if set, else hostname + pid. The CAS correctness
+ comes from the file lock + the fresh-claim check, not from this value.
+ """
+ explicit = os.getenv("HERMES_MACHINE_ID", "").strip()
+ if explicit:
+ return explicit
+ try:
+ import socket
+ host = socket.gethostname()
+ except Exception:
+ host = "unknown"
+ return f"{host}:{os.getpid()}"
+
+
+def claim_job_for_fire(job_id: str, *, claim_ttl_seconds: int = 300) -> bool:
+ """Atomically claim a job for a single external 'fire' (multi-machine
+ at-most-once). Returns True iff THIS caller won the claim.
+
+ Used by the external-provider fire path (``CronScheduler.fire_due``) when an
+ external scheduler (Chronos) signals a job is due across N gateway replicas:
+ exactly one wins. Single-machine deployments always win.
+
+ Under the file lock: reject if the job is missing/disabled/paused. If a
+ fresh claim (younger than ``claim_ttl_seconds``) already exists, lose.
+ Otherwise stamp a ``fire_claim`` and, for recurring jobs, advance
+ ``next_run_at`` (mirrors ``advance_next_run``'s at-most-once bump so a stale
+ re-delivery for the old time can't re-fire). One-shots keep ``next_run_at``
+ but the fresh ``fire_claim`` blocks a duplicate retry for the same fire.
+ ``mark_job_run`` clears the claim on completion so a re-armed recurring job
+ is claimable again next fire.
+
+ The stale-claim TTL means a machine that crashed after claiming but before
+ completing doesn't wedge the job forever — after the TTL another fire can
+ reclaim it.
+ """
+ with _jobs_lock():
+ jobs = load_jobs()
+ for job in jobs:
+ if job["id"] != job_id:
+ continue
+ if not job.get("enabled", True) or job.get("state") == "paused":
+ return False
+ now = _hermes_now()
+ existing = job.get("fire_claim")
+ if existing:
+ try:
+ claimed_at = _ensure_aware(datetime.fromisoformat(existing["at"]))
+ if (now - claimed_at).total_seconds() < claim_ttl_seconds:
+ return False # someone holds a fresh claim
+ except Exception:
+ pass # malformed claim → overwrite
+ job["fire_claim"] = {"at": now.isoformat(), "by": _machine_id()}
+ kind = job.get("schedule", {}).get("kind")
+ if kind in {"cron", "interval"}:
+ nxt = compute_next_run(job["schedule"], now.isoformat())
+ if nxt:
+ job["next_run_at"] = nxt
+ save_jobs(jobs)
+ return True
+ return False
+
+
def get_due_jobs() -> List[Dict[str, Any]]:
"""Get all jobs that are due to run now.
- For recurring jobs (cron/interval), if the scheduled time is stale
- (more than one period in the past, e.g. because the gateway was down),
- the job is fast-forwarded to the next future run instead of firing
- immediately. This prevents a burst of missed jobs on gateway restart.
+ For recurring jobs (cron/interval), if the scheduled time is stale (more
+ than one period in the past, e.g. because the gateway was down OR because a
+ long-running previous execution overran the interval), the accumulated
+ missed runs are collapsed — ``next_run_at`` is fast-forwarded to the next
+ future occurrence so a backlog does NOT burst-fire on restart — but the job
+ still fires ONCE now. This prevents the perpetual-defer loop (#33315) where
+ a job whose runtime exceeds ``interval + grace`` would be skipped forever.
+
+ Note: firing once on catch-up flows through ``mark_job_run``, so a job with
+ a ``repeat.times`` limit consumes one of its runs on that catch-up fire.
"""
with _jobs_lock():
return _get_due_jobs_locked()
@@ -1121,35 +1341,84 @@ def _get_due_jobs_locked() -> List[Dict[str, Any]]:
needs_save = True
break
- next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
+ raw_next_run_dt = datetime.fromisoformat(next_run)
+ schedule = job.get("schedule", {})
+ kind = schedule.get("kind")
+
+ next_run_dt = _ensure_aware(raw_next_run_dt)
+ # Migration repair: a cron job persists next_run_at as an absolute
+ # instant, but the cron expr describes local wall-clock intent. If the
+ # configured/system timezone changed after persistence, the stored
+ # instant's offset no longer matches now's, and its converted time can
+ # look due hours early (21:00+10 -> 13:00+02). When the stored *wall
+ # clock* is still in the future, recompute from the schedule so we fire
+ # at the intended local time instead of early-then-again.
+ #
+ # TRADE-OFF: this cannot distinguish a config/host TZ migration from a
+ # legitimate DST offset change. A DST boundary that satisfies all four
+ # conditions will recompute (and thus SKIP the pending occurrence, no
+ # catch-up) rather than fire it. Accepted: in the pure-migration case
+ # the recompute lands on the same wall-clock time later the same period,
+ # and DST-boundary collisions with a still-future stored wall clock are
+ # rare relative to the double-fire bug this prevents (#28934).
+ if (
+ kind == "cron"
+ and next_run_dt <= now
+ and _timezone_offset_mismatch(raw_next_run_dt, now)
+ and _stored_wall_clock_is_future(raw_next_run_dt, now)
+ ):
+ new_next = compute_next_run(schedule, now.isoformat())
+ if new_next:
+ logger.info(
+ "Job '%s' next_run_at offset changed (%s -> %s). "
+ "Recomputing cron run to preserve local wall-clock intent: %s",
+ job.get("name", job["id"]),
+ raw_next_run_dt.utcoffset(),
+ now.utcoffset(),
+ new_next,
+ )
+ for rj in raw_jobs:
+ if rj["id"] == job["id"]:
+ rj["next_run_at"] = new_next
+ needs_save = True
+ break
+ continue
+
if next_run_dt <= now:
- schedule = job.get("schedule", {})
- kind = schedule.get("kind")
# For recurring jobs, check if the scheduled time is stale
# (gateway was down and missed the window). Fast-forward to
# the next future occurrence instead of firing a stale run.
grace = _compute_grace_seconds(schedule)
if kind in {"cron", "interval"} and (now - next_run_dt).total_seconds() > grace:
- # Job is past its catch-up grace window — this is a stale missed run.
- # Grace scales with schedule period: daily=2h, hourly=30m, 10min=5m.
+ # Job is past its catch-up grace window — skip accumulated
+ # missed runs but still execute once now to avoid deferring
+ # indefinitely (e.g. a long-running job just finished).
new_next = compute_next_run(schedule, now.isoformat())
if new_next:
logger.info(
"Job '%s' missed its scheduled time (%s, grace=%ds). "
- "Fast-forwarding to next run: %s",
+ "Running now; next run provisionally set to: %s "
+ "(re-anchored on completion)",
job.get("name", job["id"]),
next_run,
grace,
new_next,
)
- # Update the job in storage
+ # Persist the fast-forward to storage now (skip accumulated
+ # slots). In the built-in ticker path this is shortly
+ # overwritten by advance_next_run + mark_job_run, but it is
+ # NOT redundant: it (a) protects the crash window between
+ # here and mark_job_run, and (b) covers the external
+ # fire_due provider path, which does not call
+ # advance_next_run. mark_job_run re-anchors next_run_at off
+ # the actual completion time, so this value is provisional.
for rj in raw_jobs:
if rj["id"] == job["id"]:
rj["next_run_at"] = new_next
needs_save = True
break
- continue # Skip this run
+ # Fall through to due.append(job) — execute once now
due.append(job)
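Reviewer note on the migration-repair condition in `_get_due_jobs_locked`: the 21:00+10 versus 13:00+02 example from the comments can be checked with plain `datetime` values. This is a sketch under stated assumptions, not the patch's code. The helpers are restated locally under hypothetical names, the dates are made up, and fixed-offset timezones stand in for whatever `_hermes_now()` returns.

```python
from datetime import datetime, timedelta, timezone


def offset_mismatch(stored: datetime, current: datetime) -> bool:
    # Naive timestamps carry no offset, so they never count as a mismatch.
    if stored.tzinfo is None or current.tzinfo is None:
        return False
    return stored.utcoffset() != current.utcoffset()


def wall_clock_is_future(stored: datetime, current: datetime) -> bool:
    # Compare local wall-clock readings only, ignoring the offsets.
    return stored.replace(tzinfo=None) > current.replace(tzinfo=None)


def needs_wall_clock_recompute(stored: datetime, now: datetime) -> bool:
    """True when a run looks due by instant but not by local wall clock."""
    return (
        offset_mismatch(stored, now)
        and stored <= now
        and wall_clock_is_future(stored, now)
    )


PLUS_10 = timezone(timedelta(hours=10))
PLUS_02 = timezone(timedelta(hours=2))

# Both values are the same instant (11:00 UTC) with different offsets.
now = datetime(2025, 1, 10, 13, 0, tzinfo=PLUS_02)
stored = datetime(2025, 1, 10, 21, 0, tzinfo=PLUS_10)

# A genuinely missed run: its stored wall clock (09:00) has already passed.
missed = datetime(2025, 1, 10, 9, 0, tzinfo=PLUS_10)

if __name__ == "__main__":
    print(needs_wall_clock_recompute(stored, now))
    print(needs_wall_clock_recompute(missed, now))
```

The first case is the migration scenario the patch recomputes. The second falls through to the normal due and grace handling, because its wall-clock time is already in the past.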
diff --git a/cron/scheduler.py b/cron/scheduler.py
index 35906996619..b7d662e61a4 100644
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -15,6 +15,7 @@ import contextvars
import json
import logging
import os
+import re
import shutil
import subprocess
import sys
@@ -45,6 +46,59 @@ from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
+def _summarize_cron_failure_for_delivery(job: dict, error: str | None) -> str:
+ """Return a compact one-line failure message for chat delivery.
+
+ Full details stay in the cron output directory and the logs. Chat should
+ show the operator what broke without dumping provider JSON, retry noise, or
+ stack traces into the delivery channel.
+ """
+ job_name = job.get("name") or job.get("id") or "cron job"
+ text = (error or "unknown error").strip()
+ lower = text.lower()
+
+ # Provider/API failures are the common noisy path. Keep these short.
+ if "429" in text or "rate limit" in lower or "usage limit" in lower:
+ reason = "rate limit"
+ if "weekly usage limit" in lower:
+ reason = "weekly usage limit"
+ elif "quota" in lower:
+ reason = "quota limit"
+ return (
+ f"⚠️ Cron '{job_name}' failed: provider {reason}. "
+ "Fallback chain was exhausted or unavailable. "
+ "Full details saved in cron output."
+ )
+
+ if "readtimeout" in lower or "timed out" in lower or "timeout" in lower:
+ return (
+ f"⚠️ Cron '{job_name}' failed: provider timeout. "
+ "Fallback chain was exhausted or unavailable. "
+ "Full details saved in cron output."
+ )
+
+ # Match authentication/authorization wording by stem (so "unauthorized"
+ # still matches) and the 401/403 status codes as whole tokens, so "oauth",
+ # "4015" and similar do not trip a misleading auth message.
+ if re.search(r"authenticat|authoriz", lower) or re.search(r"\b(401|403)\b", text):
+ return (
+ f"⚠️ Cron '{job_name}' failed: provider authentication error. "
+ "Full details saved in cron output."
+ )
+
+ # Strip common exception wrappers and collapse provider payloads. Bound
+ # the input first so a multi-KB provider blob cannot slow the
+ # substitutions.
+ cleaned = re.sub(
+ r"^(RuntimeError|Exception|ValueError|HTTPStatusError):\s*",
+ "", text[:2000],
+ )
+ cleaned = re.sub(r"\s+", " ", cleaned).strip()
+ if len(cleaned) > 180:
+ cleaned = cleaned[:177].rstrip() + "..."
+ return f"⚠️ Cron '{job_name}' failed: {cleaned}"
+
+
class CronPromptInjectionBlocked(Exception):
"""Raised by _build_job_prompt when the fully-assembled prompt trips the
injection scanner. Caught in run_job so the operator sees a clean
@@ -229,9 +283,17 @@ def _get_hermes_home() -> Path:
def _get_lock_paths() -> tuple[Path, Path]:
- """Resolve cron lock paths at call time so profile/env changes are honored."""
- hermes_home = _get_hermes_home()
- lock_dir = hermes_home / "cron"
+ """Resolve cron lock paths at call time so profile/env changes are honored.
+
+ Anchored on the DEFAULT ROOT home (not the active profile), matching the
+ jobs store in cron.jobs (which uses get_default_hermes_root). The tick lock
+ is storage-coordination — it must live next to the single jobs.json so that
+ tickers running under different profiles share one lock and can't
+ double-fire the relocated store (#32091). Execution context (.env,
+ config.yaml, scripts) stays profile-aware via _get_hermes_home().
+ """
+ from hermes_constants import get_default_hermes_root
+ lock_dir = (_hermes_home or get_default_hermes_root()) / "cron"
return lock_dir, lock_dir / ".tick.lock"
@@ -656,6 +718,27 @@ def _send_media_via_adapter(
logger.warning("Job '%s': failed to send media %s: %s", job.get("id", "?"), media_path, e)
+def _confirm_adapter_delivery(send_result) -> bool:
+ """Return True only if ``send_result`` unambiguously confirms delivery.
+
+ A live adapter that returns ``None`` (e.g. a swallowed exception, a busy
+ platform, or a code path that returns early without producing a
+ ``SendResult``) must NOT be treated as success — doing so causes the
+ scheduler to log ``"delivered to via live adapter"`` while the
+ gateway never actually sees the message (#47056).
+
+ Likewise, an object missing a ``success`` attribute (e.g. a bare ``dict``
+ or a partial mock) is a contract violation: it does not actually tell us
+ whether the send succeeded. Require an explicit, truthy ``success``
+ attribute to count as confirmed.
+ """
+ if send_result is None:
+ return False
+ if not hasattr(send_result, "success"):
+ return False
+ return bool(getattr(send_result, "success"))
+
+
def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]:
"""
Deliver job output to the configured target(s) (origin chat, specific platform, etc.).
@@ -669,11 +752,25 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
"""
targets = _resolve_delivery_targets(job)
if not targets:
- if job.get("deliver", "local") != "local":
- msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}"
- logger.warning("Job '%s': %s", job["id"], msg)
- return msg
- return None # local-only jobs don't deliver — not a failure
+ deliver_value = _normalize_deliver_value(job.get("deliver", "local"))
+ if deliver_value == "local":
+ return None # local-only jobs don't deliver — not a failure
+ # deliver=origin with no resolvable origin and no configured home
+ # channels: treat as local rather than reporting an error. CLI-created
+ # jobs never capture a {platform, chat_id} origin, so failing here would
+ # make every CLI `deliver=origin` (or auto-detect) job emit a spurious
+ # "no delivery target resolved" error on every run (#43014). The output
+ # is still persisted in last_output for `cron list`/resume.
+ if deliver_value == "origin":
+ logger.info(
+ "Job '%s': deliver=origin but no origin or home channels — "
+ "skipping delivery (output saved in last_output)",
+ job.get("name", job.get("id", "?")),
+ )
+ return None
+ msg = f"no delivery target resolved for deliver={deliver_value}"
+ logger.warning("Job '%s': %s", job["id"], msg)
+ return msg
from tools.send_message_tool import _send_to_platform
from gateway.config import load_gateway_config, Platform
@@ -756,66 +853,226 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
# rooms (e.g. Matrix) where the standalone HTTP path cannot encrypt.
runtime_adapter = (adapters or {}).get(platform)
delivered = False
+ target_errors = []
if runtime_adapter is not None and loop is not None and getattr(loop, "is_running", lambda: False)():
- send_metadata = {"thread_id": thread_id} if thread_id else None
+ # Telegram three-mode topic routing (#22773): a private chat
+ # (positive chat_id) with a NUMERIC topic id is a Bot API Direct
+ # Messages topic and must be addressed via ``direct_messages_topic_id``
+ # — a bare ``message_thread_id`` is rejected/mis-routed by Bot API
+ # 10.0 and lands in General. Forum/supergroup targets (negative
+ # chat_id) and named DM-topic lanes keep the default thread_id
+ # handling. Compute the routed metadata ONCE so both the text send
+ # (via DeliveryRouter) and the media send use the same routing.
+ from gateway.delivery import (
+ DeliveryRouter,
+ DeliveryTarget,
+ _looks_like_int,
+ _looks_like_telegram_private_chat_id,
+ )
+
+ is_private_dm_topic = (
+ platform == Platform.TELEGRAM
+ and thread_id is not None
+ and _looks_like_telegram_private_chat_id(str(chat_id))
+ and _looks_like_int(str(thread_id))
+ )
+ if is_private_dm_topic:
+ # Routed via direct_messages_topic_id (mode 2), no bare thread_id.
+ route_thread_id = None
+ route_metadata = {
+ "direct_messages_topic_id": str(thread_id),
+ "job_id": job["id"],
+ }
+ # Media metadata mirrors the text routing so attachments land in
+ # the same DM topic instead of the General lane (#22773).
+ media_metadata = {"direct_messages_topic_id": str(thread_id)}
+ else:
+ route_thread_id = str(thread_id) if thread_id is not None else None
+ route_metadata = {"job_id": job["id"]}
+ media_metadata = {"thread_id": thread_id} if thread_id else None
+
try:
- # Send cleaned text (MEDIA tags stripped) — not the raw content
+ # Send cleaned text (MEDIA tags stripped) — not the raw content.
+ # Route through the gateway's DeliveryRouter so the live send
+ # gets the same platform-specific routing as live messages —
+ # in particular Telegram's three-mode topic routing. The
+ # standalone cron path lacked this, so DM-topic cron deliveries
+ # landed in the General topic or were rejected by Bot API 10.0
+ # (#22773).
text_to_send = cleaned_delivery_content.strip()
adapter_ok = True
+ timed_out = False
if text_to_send:
from agent.async_utils import safe_schedule_threadsafe
+
+ router = DeliveryRouter(config, adapters)
+ route_target = DeliveryTarget(
+ platform=platform,
+ chat_id=str(chat_id),
+ thread_id=route_thread_id,
+ is_explicit=True,
+ )
+ # Pass thread routing via the target (not a bare metadata
+ # "thread_id"): the router only applies its Telegram DM-topic
+ # detection when "thread_id"/"message_thread_id" are absent
+ # from metadata, deriving the routing from target.thread_id
+ # or the explicit direct_messages_topic_id above.
future = safe_schedule_threadsafe(
- runtime_adapter.send(chat_id, text_to_send, metadata=send_metadata),
+ router._deliver_to_platform(
+ route_target,
+ text_to_send,
+ route_metadata,
+ ),
loop,
)
if future is None:
adapter_ok = False
+ target_errors.append("live adapter event loop scheduling failed")
else:
+ send_result = None
+ timeout_handled = False
try:
send_result = future.result(timeout=60)
except TimeoutError:
- future.cancel()
+ # #38922: a slow confirmation does NOT necessarily
+ # mean the send failed — but we must distinguish two
+ # cases via future.cancel()'s return value:
+ #
+ # cancel() == False -> the coroutine was already
+ # running on the gateway loop when the timeout
+ # fired; the request is in flight on the wire and
+ # cannot be un-sent. Re-sending via standalone
+ # would be a guaranteed DUPLICATE, so treat it as
+ # delivered (assume-delivered).
+ #
+ # cancel() == True -> the scheduled callback never
+ # started executing (loop wedged/backlogged for
+ # the full 60s), so nothing was sent. We MUST
+ # fall through to the standalone path or the
+ # message is silently dropped (worse than a
+ # duplicate).
+ cancelled = future.cancel()
+ if cancelled:
+ msg = (
+ f"live adapter send to {platform_name}:{chat_id} "
+ "timed out before the coroutine was dispatched"
+ )
+ logger.warning(
+ "Job '%s': %s, falling back to standalone",
+ job["id"], msg,
+ )
+ target_errors.append(msg)
+ adapter_ok = False # fall through to standalone path
+ timeout_handled = True
+ else:
+ timed_out = True
+ timeout_handled = True
+ logger.warning(
+ "Job '%s': live adapter send to %s:%s timed out "
+ "after 60s; already dispatched (in flight), "
+ "assuming delivered (skipping standalone fallback "
+ "to avoid duplicate)",
+ job["id"], platform_name, chat_id,
+ )
+ except Exception as ex:
+ # A real send error (not a slow confirmation): record it and
+ # re-raise so the outer handler falls back to the standalone
+ # path and the message is still delivered.
+ target_errors.append(f"live adapter send failed: {ex}")
raise
- if send_result and not getattr(send_result, "success", True):
- err = getattr(send_result, "error", "unknown")
- logger.warning(
- "Job '%s': live adapter send to %s:%s failed (%s), falling back to standalone",
- job["id"], platform_name, chat_id, err,
- )
- adapter_ok = False # fall through to standalone path
- elif (
- send_result
- and thread_id
- and getattr(send_result, "raw_response", None)
- and send_result.raw_response.get("thread_fallback")
- ):
- requested_thread_id = send_result.raw_response.get("requested_thread_id") or thread_id
- msg = (
- f"configured thread_id {requested_thread_id} for "
- f"{platform_name}:{chat_id} was not found; delivered without thread_id"
- )
- logger.warning("Job '%s': %s", job["id"], msg)
- delivery_errors.append(msg)
- # Send extracted media files as native attachments via the live adapter
- if adapter_ok and media_files:
+ if timeout_handled:
+ # The timeout branch above already decided the
+ # outcome (assume-delivered if in flight, or
+ # adapter_ok=False to fall through if never
+ # dispatched). send_result is None, so skip the
+ # confirmation/thread-fallback inspection below.
+ pass
+ else:
+ # _deliver_to_platform returns either a SendResult
+ # (.success attr) or, when the silence-narration
+ # filter drops the message, a plain dict
+ # {"success": True, "delivered": False, ...}.
+ # Normalize both shapes so a getattr default doesn't
+ # misread a dict, and so a None / success-less object
+ # is NOT counted as delivered (#47056).
+ if isinstance(send_result, dict):
+ send_success = bool(send_result.get("success", False))
+ send_raw_response = send_result.get("raw_response")
+ else:
+ send_success = _confirm_adapter_delivery(send_result)
+ send_raw_response = getattr(send_result, "raw_response", None)
+
+ if not send_success:
+ if isinstance(send_result, dict):
+ err = send_result.get("error", "unknown")
+ shape = "dict"
+ elif send_result is not None:
+ err = getattr(send_result, "error", None)
+ shape = type(send_result).__name__
+ else:
+ err = "no response from adapter"
+ shape = "None"
+ msg = (
+ f"live adapter send to {platform_name}:{chat_id} "
+ f"returned unconfirmed result ({shape}, error={err})"
+ )
+ logger.warning(
+ "Job '%s': %s, falling back to standalone",
+ job["id"], msg,
+ )
+ target_errors.append(msg)
+ adapter_ok = False # fall through to standalone path
+ elif (
+ send_raw_response
+ and thread_id
+ and send_raw_response.get("thread_fallback")
+ ):
+ requested_thread_id = send_raw_response.get("requested_thread_id") or thread_id
+ msg = (
+ f"configured thread_id {requested_thread_id} for "
+ f"{platform_name}:{chat_id} was not found; delivered without thread_id"
+ )
+ logger.warning("Job '%s': %s", job["id"], msg)
+ delivery_errors.append(msg)
+
+ # Send extracted media files as native attachments via the live
+ # adapter, using the same DM-topic-aware routing as the text send
+ # (#22773 — media previously used a bare thread_id and landed in
+ # the General lane for private DM topics). Skip on an in-flight
+ # confirmation timeout: the gateway loop is contended, so each
+ # media send would also block its 30s budget, and the text
+ # payload is already assumed delivered (#38922). Record the
+ # skipped attachments so the drop is visible rather than silently
+ # lost.
+ if adapter_ok and not timed_out and media_files:
_send_media_via_adapter(
runtime_adapter,
chat_id,
media_files,
- send_metadata,
+ media_metadata,
loop,
job,
platform=platform,
)
+ elif timed_out and media_files:
+ msg = (
+ f"{len(media_files)} media attachment(s) not delivered to "
+ f"{platform_name}:{chat_id} (live adapter confirmation timed out)"
+ )
+ logger.warning("Job '%s': %s", job["id"], msg)
+ delivery_errors.append(msg)
if adapter_ok:
logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
delivered = True
except Exception as e:
+ err_msg = f"live adapter delivery to {platform_name}:{chat_id} failed: {e}"
+ if not any(err_msg in err for err in target_errors):
+ target_errors.append(err_msg)
logger.warning(
- "Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
- job["id"], platform_name, chat_id, e,
+ "Job '%s': %s, falling back to standalone",
+ job["id"], err_msg,
)
if not delivered:
@@ -835,13 +1092,15 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
except Exception as e:
msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
logger.error("Job '%s': %s", job["id"], msg)
- delivery_errors.append(msg)
+ target_errors.append(msg)
+ delivery_errors.extend(target_errors)
continue
if result and result.get("error"):
msg = f"delivery error: {result['error']}"
logger.error("Job '%s': %s", job["id"], msg)
- delivery_errors.append(msg)
+ target_errors.append(msg)
+ delivery_errors.extend(target_errors)
continue
logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
@@ -907,6 +1166,10 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
Shell support lets ``no_agent=True`` jobs ship classic bash watchdogs
(the `memory-watchdog.sh` pattern) without wrapping them in Python.
+ Subprocess environment is passed through ``_sanitize_subprocess_env`` so
+ provider credentials and other Hermes-managed secrets are not inherited
+ (SECURITY.md §2.3), matching terminal and MCP child processes.
+
Args:
script_path: Path to the script. Relative paths are resolved
against HERMES_HOME/scripts/. Absolute and ~-prefixed paths
@@ -968,6 +1231,8 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
argv = [sys.executable, str(path)]
try:
+ from tools.environments.local import _sanitize_subprocess_env
+
popen_kwargs = {"creationflags": windows_hide_flags()} if sys.platform == "win32" else {}
result = subprocess.run(
argv,
@@ -975,6 +1240,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
text=True,
timeout=script_timeout,
cwd=str(path.parent),
+ env=_sanitize_subprocess_env(os.environ.copy()),
**popen_kwargs,
)
stdout = (result.stdout or "").strip()
@@ -1577,6 +1843,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
else str(delivery_target["thread_id"])
)
+ # Model resolution precedence: per-job override > config.yaml ``model:``
+ # (string or ``{default: ...}``) > HERMES_MODEL env (fallback only). The
+ # per-job value is intentionally re-read from storage every tick so a
+ # ``cronjob action=update model=...`` after a failed run takes effect
+ # on the next tick; there is no in-memory cache.
model = job.get("model") or os.getenv("HERMES_MODEL") or ""
# Load config.yaml for model, reasoning, prefill, toolsets, provider routing
@@ -1587,16 +1858,44 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if os.path.exists(_cfg_path):
with open(_cfg_path, encoding="utf-8") as _f:
_cfg = yaml.safe_load(_f) or {}
+ # Managed scope: a scheduled job must honor administrator-pinned
+ # model / reasoning / toolsets / provider_routing too. This loader
+ # builds its own dict, so overlay managed values via the shared
+ # helper (fail-open, no-op when no managed scope).
+ try:
+ from hermes_cli import managed_scope
+ _cfg = managed_scope.apply_managed_overlay(_cfg)
+ except Exception:
+ pass
_cfg = _expand_env_vars(_cfg)
- _model_cfg = _cfg.get("model", {})
+ # Coerce null/missing to {} so a falsy default never
+ # clobbers an already-resolved env value with ``None``.
+ _model_cfg = _cfg.get("model") or {}
if not job.get("model"):
if isinstance(_model_cfg, str):
model = _model_cfg
elif isinstance(_model_cfg, dict):
- model = _model_cfg.get("default", model)
+ # Mirror the CLI/oneshot resolution: prefer ``default``,
+ # accept a ``model`` alias, overwrite only when truthy.
+ _default = _model_cfg.get("default") or _model_cfg.get("model")
+ if _default:
+ model = _default
except Exception as e:
logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)
+ # Fail fast if no model resolved from job / env / config.yaml: an empty
+ # model otherwise reaches the provider as an opaque 400 (#23979).
+ if not (isinstance(model, str) and model.strip()):
+ raise RuntimeError(
+ f"Cron job '{job_name}' has no model configured "
+ f"(job.model={job.get('model')!r}, "
+ f"HERMES_MODEL={os.getenv('HERMES_MODEL', '')!r}, "
+ "config.yaml model.default missing or empty). "
+ f"Set a per-job model via "
+ f"`cronjob action=update job_id={job_id} model=` or set a "
+ "default with `hermes model `."
+ )
+
# Apply IPv4 preference if configured.
try:
from hermes_constants import apply_ipv4_preference
@@ -1967,6 +2266,82 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
logger.debug("Job '%s': failed to reap stale auxiliary clients: %s", job_id, e)
+def run_one_job(job: dict, *, adapters=None, loop=None, verbose: bool = False) -> bool:
+ """Run ONE due job end-to-end: execute → save output → deliver → mark.
+
+ This is the shared firing body extracted from ``tick``'s per-job closure so
+ that BOTH the built-in ticker and an external provider's ``fire_due`` (e.g.
+ Chronos) run the identical sequence, with no second copy to keep correct.
+
+ It does NOT decide whether the job is due, claim it, or compute the next
+ run — those are the caller's concern (``tick`` advances ``next_run_at``
+ under the file lock before dispatch; an external provider claims via the
+ store CAS). This function only fires the given job once.
+
+ Returns True if the job was processed (even if the job itself failed —
+ failure is recorded via ``mark_job_run``), False only if processing raised.
+ """
+ try:
+ success, output, final_response, error = run_job(job)
+
+ output_file = save_job_output(job["id"], output)
+ if verbose:
+ logger.info("Output saved to: %s", output_file)
+
+ # Deliver the final response to the origin/target chat.
+ # If the agent responded with [SILENT], skip delivery (but
+ # output is already saved above). Failed jobs always deliver.
+ deliver_content = final_response if success else _summarize_cron_failure_for_delivery(job, error)
+ # Treat whitespace-only final responses the same as empty
+ # responses: do not deliver a blank message, and let the
+ # empty-response guard below mark the run as a soft failure.
+ should_deliver = bool(deliver_content.strip())
+ if should_deliver and success and SILENT_MARKER in deliver_content.strip().upper():
+ logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
+ should_deliver = False
+
+ delivery_error = None
+ if should_deliver:
+ try:
+ delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
+ except Exception as de:
+ delivery_error = str(de)
+ logger.error("Delivery failed for job %s: %s", job["id"], de)
+
+ # Treat empty final_response as a soft failure so last_status
+ # is not "ok" — the agent ran but produced nothing useful.
+ # (issue #8585)
+ if success and not final_response.strip():
+ success = False
+ error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
+
+ mark_job_run(job["id"], success, error, delivery_error=delivery_error)
+ return True
+
+ except Exception as e:
+ logger.error("Error processing job %s: %s", job['id'], e)
+ mark_job_run(job["id"], False, str(e))
+ return False
+
+
+def _notify_provider_jobs_changed() -> None:
+ """Best-effort: tell the active scheduler provider the job set changed.
+
+ Called by the consumer surfaces (model tool / CLI / REST) AFTER a
+ successful store mutation (create/update/remove/pause/resume) so an external
+ provider (Chronos) can re-provision/cancel the affected one-shot via NAS.
+ No-op for the built-in (it re-reads jobs.json each tick), so the default
+ path is unchanged. Lives here (not in cron/jobs.py) to keep the store free
+ of provider imports — avoids an import cycle and keeps jobs.py low-coupling.
+ Never raises into the caller.
+ """
+ try:
+ from cron.scheduler_provider import resolve_cron_scheduler
+ resolve_cron_scheduler().on_jobs_changed()
+ except Exception as e:
+ logger.debug("on_jobs_changed notify failed: %s", e)
+
+
def tick(verbose: bool = True, adapters=None, loop=None, sync: bool = True) -> int:
"""
Check and run all due jobs.
@@ -2045,48 +2420,11 @@ def tick(verbose: bool = True, adapters=None, loop=None, sync: bool = True) -> i
)
def _process_job(job: dict) -> bool:
- """Run one due job end-to-end: execute, save, deliver, mark."""
- try:
- success, output, final_response, error = run_job(job)
-
- output_file = save_job_output(job["id"], output)
- if verbose:
- logger.info("Output saved to: %s", output_file)
-
- # Deliver the final response to the origin/target chat.
- # If the agent responded with [SILENT], skip delivery (but
- # output is already saved above). Failed jobs always deliver.
- deliver_content = final_response if success else f"⚠️ Cron job '{job.get('name', job['id'])}' failed:\n{error}"
- # Treat whitespace-only final responses the same as empty
- # responses: do not deliver a blank message, and let the
- # empty-response guard below mark the run as a soft failure.
- should_deliver = bool(deliver_content.strip())
- if should_deliver and success and SILENT_MARKER in deliver_content.strip().upper():
- logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
- should_deliver = False
-
- delivery_error = None
- if should_deliver:
- try:
- delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
- except Exception as de:
- delivery_error = str(de)
- logger.error("Delivery failed for job %s: %s", job["id"], de)
-
- # Treat empty final_response as a soft failure so last_status
- # is not "ok" — the agent ran but produced nothing useful.
- # (issue #8585)
- if success and not final_response.strip():
- success = False
- error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
-
- mark_job_run(job["id"], success, error, delivery_error=delivery_error)
- return True
-
- except Exception as e:
- logger.error("Error processing job %s: %s", job['id'], e)
- mark_job_run(job["id"], False, str(e))
- return False
+ """Run one due job end-to-end. Thin wrapper around the shared
+ module-level ``run_one_job`` so ``tick`` and external providers
+ (Chronos ``fire_due``) use the identical execute→save→deliver→mark
+ body."""
+ return run_one_job(job, adapters=adapters, loop=loop, verbose=verbose)
# Partition due jobs: those with a per-job workdir mutate
# os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
@@ -2185,6 +2523,12 @@ def tick(verbose: bool = True, adapters=None, loop=None, sync: bool = True) -> i
def _on_done(_f: concurrent.futures.Future) -> None:
_remaining[0] -= 1
+ try:
+ _exc = _f.exception()
+ if _exc is not None:
+ logger.error("Cron job future failed in async mode: %s", _exc, exc_info=(type(_exc), _exc, _exc.__traceback__))
+ except Exception:
+ pass
if _remaining[0] <= 0:
_sweep_mcp_orphans()
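Review note on the `run_job` model-resolution hunk above: a minimal, self-contained sketch of the lookup as the changed code appears to execute it. The function name `resolve_cron_model`, its parameters, and the model names in the usage note are hypothetical and used only for illustration; the real logic lives inline in `run_job`. As read from the hunk, the environment value seeds `model`, but a truthy `config.yaml` model overwrites it whenever the job carries no override, so the effective order is job, then config, then `HERMES_MODEL`.

```python
from typing import Any, Mapping, Optional


def resolve_cron_model(
    job: Mapping[str, Any],
    env_model: Optional[str],
    model_cfg: Any,
) -> str:
    """Resolve the model for one cron run, raising if none is configured.

    Hypothetical stand-in for the inline logic in run_job; ``model_cfg`` plays
    the role of the ``model:`` value loaded from config.yaml.
    """
    # Seed with the per-job override, else the environment value.
    model = job.get("model") or env_model or ""
    # Coerce null/missing config to {} so it can never overwrite with None.
    model_cfg = model_cfg or {}
    if not job.get("model"):
        if isinstance(model_cfg, str):
            # Plain-string form: ``model: some-name``.
            model = model_cfg
        elif isinstance(model_cfg, dict):
            # Dict form: prefer ``default``, accept a ``model`` alias,
            # and overwrite only when the value is truthy.
            default = model_cfg.get("default") or model_cfg.get("model")
            if default:
                model = default
    # Fail fast instead of sending an empty model to the provider.
    if not (isinstance(model, str) and model.strip()):
        raise RuntimeError("cron job has no model configured")
    return model
```

For example, `resolve_cron_model({}, "env-m", {"default": "cfg-m"})` yields the config value, while the same call with `model_cfg=None` falls back to the environment value.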
diff --git a/cron/scheduler_provider.py b/cron/scheduler_provider.py
new file mode 100644
index 00000000000..6b5c838617a
--- /dev/null
+++ b/cron/scheduler_provider.py
@@ -0,0 +1,194 @@
+"""CronScheduler provider interface (Axis B — the trigger).
+
+⚠️ EXPERIMENTAL — this interface is validated by exactly ONE consumer (the
+built-in) until an external provider (Chronos, Phase 4) shakes it out. Until
+then the module path, method signatures, and start() kwargs MAY change without
+a deprecation cycle. Once a second provider validates the shape it becomes
+stable. Any growth MUST be additive (new optional method with a default), never
+a changed signature on start() or a new abstractmethod.
+
+A CronScheduler decides *when* a due job fires. It does NOT decide what firing
+means: execution + delivery stay in cron.scheduler.run_job / _deliver_result,
+shared by all providers. Providers must never reimplement agent construction or
+delivery.
+
+The built-in InProcessCronScheduler runs the historical 60s daemon-thread
+ticker. Alternative providers (e.g. Chronos, a NAS-mediated managed-cron
+provider for scale-to-zero deployments) live under plugins/cron// and are
+selected via the `cron.provider` config key (empty = built-in).
+"""
+from __future__ import annotations
+
+import threading
+from abc import ABC, abstractmethod
+from typing import Any
+
+
+class CronScheduler(ABC):
+ """Axis-B trigger provider. Decides WHEN a due cron job fires.
+
+ Required surface is intentionally minimal: ``name`` + ``start``. ``stop``
+ and ``is_available`` carry safe defaults. The three Phase-4 hooks
+ (``on_jobs_changed`` / ``fire_due`` / ``reconcile``) are defined below as
+ NON-abstract methods so the built-in keeps satisfying the ABC without
+ overriding them (see ``test_abc_growth_stays_additive``).
+ """
+
+ @property
+ @abstractmethod
+ def name(self) -> str:
+ """Short identifier, e.g. 'builtin', 'chronos'."""
+
+ def is_available(self) -> bool:
+ """Whether this provider can run in the current environment.
+
+ MUST NOT make network calls. The built-in is always available; an
+ external provider checks for configured endpoint/credentials. When a
+ named provider returns False, the resolver falls back to the built-in.
+ """
+ return True
+
+ @abstractmethod
+ def start(
+ self,
+ stop_event: threading.Event,
+ *,
+ adapters: Any = None,
+ loop: Any = None,
+ interval: int = 60,
+ ) -> None:
+ """Begin firing due jobs.
+
+ For the built-in this BLOCKS in the 60s loop until stop_event is set
+ (it is run inside a daemon thread by the caller, exactly as today).
+ An external provider may register a schedule/webhook and return
+ immediately; in that case it must still honor stop_event for teardown.
+ """
+
+ def stop(self) -> None:
+ """Optional eager teardown hook. Default no-op; setting the stop_event
+ is the primary stop signal. Override for providers holding external
+ resources (queue consumers, HTTP servers)."""
+ return None
+
+ # --- Optional hooks for external providers (added Phase 4). --------------
+ # All default-safe so the built-in inherits working behavior without
+ # overriding. Keep these NON-abstract — see test_abc_growth_stays_additive.
+
+ def on_jobs_changed(self) -> None:
+ """Called after a successful store mutation (create/update/remove/
+ pause/resume). External providers reconcile their registry here (e.g.
+ Chronos re-provisions/cancels the affected one-shot via NAS).
+ Built-in: no-op (it re-reads jobs.json on every tick)."""
+ return None
+
+ def fire_due(self, job_id: str, *, adapters: Any = None, loop: Any = None) -> bool:
+ """Run a single job NOW via the shared orchestrator. Called by the
+ inbound fire webhook when an external scheduler signals a job is due.
+
+ The default claims the job with a store-level compare-and-set
+ (multi-machine at-most-once), then runs it via the shared
+ ``run_one_job`` body. Built-in never calls this (it has its own tick
+ loop); an external provider routes its inbound fire here.
+
+ Returns True if THIS caller claimed and ran the job, False if the claim
+ was lost (another machine/retry won it) or the job no longer exists.
+ """
+ from cron.jobs import claim_job_for_fire, get_job
+ from cron.scheduler import run_one_job
+
+ if not claim_job_for_fire(job_id):
+ return False # another machine already claimed this fire
+ job = get_job(job_id)
+ if job is None:
+ return False # job removed (e.g. repeat-N exhausted) between arm and fire
+ return run_one_job(job, adapters=adapters, loop=loop)
+
+ def reconcile(self) -> None:
+ """Converge the external registry toward jobs.json (the desired state):
+ arm missing one-shots, cancel orphaned ones, re-arm changed times.
+ Built-in: no-op."""
+ return None
+
+
+def resolve_cron_scheduler() -> "CronScheduler":
+ """Return the active cron scheduler provider.
+
+ Reads ``cron.provider`` from config. Empty/absent → built-in. A named
+ provider that is missing, fails to load, or reports ``is_available() ==
+ False`` falls back to the built-in with a warning — cron must never be left
+ without a trigger.
+ """
+ import logging
+
+ logger = logging.getLogger("cron.scheduler_provider")
+
+ name = ""
+ try:
+ from hermes_cli.config import cfg_get, load_config
+ name = (cfg_get(load_config(), "cron", "provider", default="") or "").strip()
+ except Exception:
+ pass
+
+ if not name or name in ("builtin", "in-process", "inprocess"):
+ return InProcessCronScheduler()
+
+ try:
+ from plugins.cron import load_cron_scheduler
+ provider = load_cron_scheduler(name)
+ if provider is None:
+ logger.warning("cron.provider '%s' not found; using built-in ticker", name)
+ return InProcessCronScheduler()
+ if not provider.is_available():
+ logger.warning("cron.provider '%s' not available; using built-in ticker", name)
+ return InProcessCronScheduler()
+ logger.info("Using cron scheduler provider: %s", provider.name)
+ return provider
+ except Exception as e:
+ logger.warning(
+ "Failed to load cron.provider '%s' (%s); using built-in ticker", name, e
+ )
+ return InProcessCronScheduler()
+
+
+class InProcessCronScheduler(CronScheduler):
+ """Default provider: the historical in-process 60s ticker.
+
+ ``start()`` blocks in the tick loop until ``stop_event`` is set, identical
+ to the pre-refactor ``_start_cron_ticker`` core loop. The caller runs it in
+ a daemon thread.
+ """
+
+ @property
+ def name(self) -> str:
+ return "builtin"
+
+ def start(self, stop_event, *, adapters=None, loop=None, interval=60):
+ import logging
+ from cron.scheduler import tick as cron_tick
+ from cron.jobs import record_ticker_heartbeat
+
+ logger = logging.getLogger("cron.scheduler_provider")
+ logger.info("In-process cron scheduler started (interval=%ds)", interval)
+ # Heartbeat once before the first sleep so `hermes cron status` sees a
+ # live ticker immediately after startup, not only after the first tick.
+ record_ticker_heartbeat()
+ while not stop_event.is_set():
+ ok = False
+ try:
+ cron_tick(verbose=False, adapters=adapters, loop=loop, sync=False)
+ ok = True
+ except BaseException as e:
+ # Catch BaseException (not just Exception) so a SystemExit from
+ # a misbehaving provider SDK / agent retry path does not kill
+ # the ticker thread silently (#32612). KeyboardInterrupt is
+ # intentionally caught here too — gateway shutdown is driven by
+ # stop_event (set by the main thread's signal handler), not by
+ # an exception in this daemon thread, so swallowing it and
+ # re-checking stop_event keeps shutdown clean.
+ logger.error("Cron tick error: %s", e, exc_info=True)
+ # Record liveness every iteration; bump the success marker only on a
+ # clean tick, so status can tell "alive but failing every tick" from
+ # "actually firing jobs" (#32612, #32895).
+ record_ticker_heartbeat(success=ok)
+ stop_event.wait(interval)
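Review note on `cron/scheduler_provider.py` above: a simplified, self-contained sketch of the two ideas the module combines, namely an ABC that grows only through non-abstract hooks, and a resolver that always falls back to the built-in. Every class and function here (`SchedulerSketch`, `BuiltinSketch`, `WebhookSketch`, `resolve`) is a hypothetical stand-in, not the real `CronScheduler` API. The injected `loader` argument is an assumption made to keep the example runnable without the `plugins.cron` package.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable, Optional


class SchedulerSketch(ABC):
    """Simplified stand-in for the provider interface (not the real class)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier for the provider."""

    def is_available(self) -> bool:
        # Safe default: available. Must not perform network calls.
        return True

    @abstractmethod
    def start(self, stop_event: threading.Event, *, interval: int = 60) -> None:
        """Begin firing due jobs."""

    def on_jobs_changed(self) -> None:
        # Optional hook with a no-op default, so existing subclasses keep working.
        return None


class BuiltinSketch(SchedulerSketch):
    @property
    def name(self) -> str:
        return "builtin"

    def start(self, stop_event: threading.Event, *, interval: int = 60) -> None:
        # Blocks until stop_event is set; the caller runs this in a daemon thread.
        while not stop_event.is_set():
            stop_event.wait(interval)


class WebhookSketch(SchedulerSketch):
    """Hypothetical external provider that registers and returns immediately."""

    def __init__(self, endpoint: Optional[str]) -> None:
        self.endpoint = endpoint
        self.rearm_count = 0

    @property
    def name(self) -> str:
        return "webhook-sketch"

    def is_available(self) -> bool:
        # Configuration check only, no network call.
        return bool(self.endpoint)

    def start(self, stop_event: threading.Event, *, interval: int = 60) -> None:
        return None

    def on_jobs_changed(self) -> None:
        self.rearm_count += 1


def resolve(
    name: str, loader: Callable[[str], Optional[SchedulerSketch]]
) -> SchedulerSketch:
    """Return the named provider, or the built-in on any failure."""
    if not name or name in ("builtin", "in-process", "inprocess"):
        return BuiltinSketch()
    try:
        provider = loader(name)
    except Exception:
        return BuiltinSketch()  # failed to load
    if provider is None or not provider.is_available():
        return BuiltinSketch()  # missing or unusable in this environment
    return provider
```

The point of the fallback shape is that every failure path returns a working trigger, so a misconfigured provider name degrades to the in-process ticker rather than leaving cron without one.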
diff --git a/cron/suggestions.py b/cron/suggestions.py
index 636a0335cc3..6c10a4f5b28 100644
--- a/cron/suggestions.py
+++ b/cron/suggestions.py
@@ -36,13 +36,13 @@ import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional
-from hermes_constants import get_hermes_home
+from hermes_constants import get_default_hermes_root
from hermes_time import now as _hermes_now
from utils import atomic_replace
logger = logging.getLogger(__name__)
-CRON_DIR = get_hermes_home().resolve() / "cron"
+CRON_DIR = get_default_hermes_root().resolve() / "cron"
SUGGESTIONS_FILE = CRON_DIR / "suggestions.json"
# In-process lock protecting load->modify->save cycles (the background review
diff --git a/docker/s6-rc.d/dashboard/run b/docker/s6-rc.d/dashboard/run
index d6fd29cafd3..2eb0cf9cb18 100755
--- a/docker/s6-rc.d/dashboard/run
+++ b/docker/s6-rc.d/dashboard/run
@@ -30,26 +30,27 @@ cd /opt/data
dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"
dash_port="${HERMES_DASHBOARD_PORT:-9119}"
-# `--insecure` is opt-in via HERMES_DASHBOARD_INSECURE. The dashboard's
-# OAuth auth gate engages automatically on non-loopback binds when a
-# DashboardAuthProvider is registered (e.g. the bundled dashboard_auth/nous
-# provider, which auto-registers when HERMES_DASHBOARD_OAUTH_CLIENT_ID is
-# set). If no provider is registered, start_server fails closed with a
-# specific operator-facing error.
+# The dashboard's auth gate engages automatically on non-loopback binds and
+# REQUIRES a DashboardAuthProvider to be registered, else start_server fails
+# closed. Two zero-infra ways to satisfy it in a container:
+# • Password: set HERMES_DASHBOARD_BASIC_AUTH_USERNAME + _PASSWORD (bundled
+# dashboard_auth/basic provider — no external IDP).
+# • OAuth: set HERMES_DASHBOARD_OAUTH_CLIENT_ID (bundled nous provider).
#
-# This used to derive --insecure from the bind host ("anything non-loopback
-# implies insecure"), but that predates the OAuth gate and silently
-# disabled it on every container-deployed dashboard. The gate is now the
-# authority; operators on trusted LANs / behind a reverse proxy without
-# the OAuth contract opt in explicitly.
-insecure=""
+# HERMES_DASHBOARD_INSECURE no longer disables the gate (June 2026 hardening:
+# unauthenticated public dashboards were the entry point for the MCP-config
+# persistence campaign). It is accepted but ignored; warn if set so operators
+# migrate to a real provider.
case "${HERMES_DASHBOARD_INSECURE:-}" in
- 1|true|TRUE|True|yes|YES|Yes) insecure="--insecure" ;;
+ 1|true|TRUE|True|yes|YES|Yes)
+ echo "[dashboard] HERMES_DASHBOARD_INSECURE no longer disables the auth gate." >&2
+ echo "[dashboard] A non-loopback dashboard requires an auth provider:" >&2
+ echo "[dashboard] set HERMES_DASHBOARD_BASIC_AUTH_USERNAME + _PASSWORD (password)" >&2
+ echo "[dashboard] or HERMES_DASHBOARD_OAUTH_CLIENT_ID (OAuth)." >&2
+ ;;
esac
# Skip the drop when already non-root.
-# shellcheck disable=SC2086 # word-splitting of $insecure is intentional
-[ "$(id -u)" = 0 ] || exec hermes dashboard --host "$dash_host" --port "$dash_port" --no-open $insecure
-# shellcheck disable=SC2086 # word-splitting of $insecure is intentional
+[ "$(id -u)" = 0 ] || exec hermes dashboard --host "$dash_host" --port "$dash_port" --no-open
exec s6-setuidgid hermes hermes dashboard \
- --host "$dash_host" --port "$dash_port" --no-open $insecure
+ --host "$dash_host" --port "$dash_port" --no-open
diff --git a/docs/chronos-managed-cron-contract.md b/docs/chronos-managed-cron-contract.md
new file mode 100644
index 00000000000..64937a9c994
--- /dev/null
+++ b/docs/chronos-managed-cron-contract.md
@@ -0,0 +1,196 @@
+# Chronos managed-cron — agent ↔ NAS wire contract
+
+**Status:** authoritative wire spec for the Chronos cron provider.
+**Audience:** the NAS-side implementer of the `agent-cron` endpoints
+(`nous-account-service`) and anyone debugging the managed-cron path.
+
+Chronos lets a hosted Hermes gateway **scale to zero** while idle and still
+fire cron jobs. Instead of an in-process 60-second ticker, the agent asks NAS
+to arm exactly **one external one-shot per job at that job's real next-fire
+time**. NAS calls the agent back at fire time over an authenticated webhook;
+the agent runs the job and re-arms the next one-shot. Between fires the agent
+process can be fully stopped — it wakes only on a genuine fire.
+
+The external scheduler NAS uses to implement the one-shots is an **internal NAS
+implementation detail**. The agent never talks to it, never holds its
+credentials, and never names it. The agent only knows the three NAS endpoints
+below.
+
+```
+create/update/pause/resume/remove a cron job (agent side)
+ │
+ ▼
+ChronosCronScheduler.reconcile() ── agent computes next_run_at
+ │ POST {portal}/api/agent-cron/provision (auth: agent's Nous access token)
+ ▼
+NAS arms a one-shot for fire_at ── NAS owns the scheduler + its creds
+ │
+ ⏰ at fire_at
+ ▼
+scheduler → POST {portal}/api/agent-cron/relay (auth: scheduler signature, NAS-verified)
+ │
+ ▼
+NAS mints a short-lived agent-audience JWT (purpose=cron_fire)
+ │ POST {agent_callback_url}/api/cron/fire (auth: that JWT)
+ ▼
+agent verifies the NAS JWT → store CAS claim → run_one_job → re-arm next one-shot
+```
+
+## Trust model (read this first)
+
+| Hop | Who calls whom | Auth mechanism | Verified by |
+|---|---|---|---|
+| 1 | agent → NAS (`provision`/`cancel`/`list`) | the agent's existing **Nous Portal access token** (Bearer) | NAS (its normal agent-token path) |
+| 2 | scheduler → NAS (`relay`) | the scheduler's request **signature** | NAS (the signature path it already has) |
+| 3 | NAS → agent (`/api/cron/fire`) | a **short-lived NAS-minted JWT** (`aud=agent:{instance_id}`, `purpose=cron_fire`) | agent (PyJWT against NAS JWKS) |
+
+Why NAS-mediated rather than scheduler→agent direct: the scheduler signs with
+**NAS's** keys, which the agent does not (and should not) hold. The agent can
+only verify a **NAS-minted** token — a trust path it already has. This keeps
+all scheduler credentials inside NAS. (Full rationale: the plan's DQ-4.)
+
+No new secret is introduced on the agent: hop 1 reuses the token the agent
+already uses for the portal, and hop 3 reuses the NAS-JWT verification the agent
+already performs.
+
+---
+
+## Endpoint 1 — `POST /api/agent-cron/provision` (agent → NAS)
+
+Arm (or re-arm, idempotently) exactly one one-shot for a job.
+
+- **Auth:** `Authorization: Bearer `. NAS validates via
+ its normal agent-token path and scopes the row to the calling agent/org.
+- **Request body:**
+ ```json
+ {
+ "job_id": "ab12cd34",
+ "fire_at": "2026-06-18T12:34:56+00:00",
+ "agent_callback_url": "https://agent-xyz.fly.dev",
+ "dedup_key": "ab12cd34:2026-06-18T12:34:56+00:00"
+ }
+ ```
+ - `fire_at` — ISO 8601, **agent-computed**. May be sub-minute in the future;
+ NAS must honor second-granularity (the agent owns the time, so there is no
+ 1-minute scheduler floor).
+ - `agent_callback_url` — the agent's own publicly-reachable base URL. NAS
+ POSTs `{agent_callback_url}/api/cron/fire` at fire time.
+ - `dedup_key` — `"{job_id}:{fire_at}"`. NAS **upserts by `(agent_id, job_id)`**
+ so re-arming the same fire is idempotent (no duplicate one-shots). A new
+ `fire_at` for the same `job_id` replaces the prior arm.
+- **Action:** arm one one-shot to fire at `fire_at`, destined for the NAS
+ **relay** route (Endpoint 3) — NOT the agent directly, so NAS stays in the
+ loop to mint the agent JWT. Persist `(agent_id, job_id, schedule_id,
+ agent_callback_url)`.
+- **Response:** `200 {"schedule_id": ""}`.
+
+## Endpoint 2 — `POST /api/agent-cron/cancel` (agent → NAS)
+
+- **Auth:** same as Endpoint 1.
+- **Body:** `{"job_id": "ab12cd34"}`.
+- **Action:** cancel the armed one-shot for `(agent_id, job_id)` and delete the
+ row. Idempotent — cancelling an unknown job is a 200 no-op.
+- **Response:** `200 {"ok": true}`.
+
+## Endpoint 3 — `POST /api/agent-cron/relay` (scheduler → NAS, the fire relay)
+
+- **Auth:** the scheduler's request **signature**, verified by NAS with the
+ signature path it already has. This is the trust boundary for the fire — a
+ forged relay call must be rejected here.
+- **Action:**
+ 1. Look up `(agent_id, job_id) → agent_callback_url` from the persisted row.
+ 2. Mint a **short-lived** JWT: `aud = "agent:{instance_id}"`,
+ `iss = {portal_url}`, `purpose = "cron_fire"`, small `exp` (≈60–120s),
+ signed with NAS's normal asymmetric signing key (published via JWKS).
+ 3. `POST {agent_callback_url}/api/cron/fire` with
+ `Authorization: Bearer ` and body `{"job_id": "...", "fire_at": "..."}`.
+ 4. Treat a non-2xx agent response as a **retryable** failure (let the
+ scheduler retry the relay). The agent's store CAS de-dupes a double fire,
+ so retries are safe.
+- **Response to the scheduler:** 2xx once the agent POST is accepted (202), so
+ the scheduler does not retry a delivered fire.
+
+---
+
+## Inbound `POST /api/cron/fire` (NAS → agent) — agent side, already implemented
+
+This is the agent endpoint NAS calls in Endpoint 3 step 3. Served by the
+**dashboard app** (`hermes_cli/web_server.py`) — the agent's always-reachable
+public HTTP surface on hosted deployments (the gateway may be idle/scaled down);
+it is in `PUBLIC_API_PATHS` so the dashboard cookie gate lets the bearer-JWT
+callback through to the verifier. (Also registered on the optional
+`APIServerAdapter` for self-host API-server deployments.) The verifier is
+`plugins/cron/chronos/verify.py`.
+
+- **Auth:** `Authorization: Bearer `. The agent verifies:
+ - signature against the NAS JWKS (`cron.chronos.nas_jwks_url`),
+ - `aud` == `cron.chronos.expected_audience` (this agent's
+ `agent:{instance_id}`),
+ - `iss` == `cron.chronos.portal_url`,
+ - `exp` / `nbf` (30s leeway),
+ - `purpose == "cron_fire"` — a general agent JWT (no/other purpose) is
+ rejected so it can't be replayed against this endpoint.
+- **Body:** `{"job_id": "ab12cd34", "fire_at": "..."}` (only `job_id` is used).
+- **Behavior:**
+ - invalid/missing/forged/expired/wrong-aud/wrong-purpose token → **401**, no
+ execution.
+ - missing `job_id` → **400**.
+ - valid → **202 `{"status": "accepted", "job_id": "..."}`** immediately, and
+ the job runs in the background. 202-before-run means a long agent turn never
+ trips the relay's HTTP timeout.
+- **At-most-once:** the agent claims the job with a store-level compare-and-set
+ (`claim_job_for_fire`) before running. A relay/scheduler retry that arrives
+ while the first fire is in flight (or after it completed) loses the claim and
+ does not double-run.
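The claim checks listed under **Auth** can be sketched as a pure function over an already-decoded claims dict. This is only an illustration: it assumes signature verification against the JWKS has happened elsewhere, and the function name `check_fire_claims` and its parameters are hypothetical rather than taken from `plugins/cron/chronos/verify.py`.

```python
import time
from typing import Optional

LEEWAY_SECONDS = 30  # matches the 30s exp/nbf leeway described above


def check_fire_claims(
    claims: dict,
    expected_audience: str,
    portal_url: str,
    now: Optional[float] = None,
) -> bool:
    """Return True only if signature-verified claims are acceptable for a cron fire."""
    now = time.time() if now is None else now
    if claims.get("aud") != expected_audience:
        return False
    if claims.get("iss") != portal_url:
        return False
    # A general agent JWT (no purpose, or another purpose) must be rejected.
    if claims.get("purpose") != "cron_fire":
        return False
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + LEEWAY_SECONDS:
        return False
    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - LEEWAY_SECONDS:
        return False
    return True
```

A handler built on this would answer 401 whenever the function returns False and only then look at `job_id` in the body.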
+
+---
+
+## At-most-once & re-arm semantics
+
+- **Recurring (cron/interval):** on fire, the agent advances `next_run_at`
+ (under its store lock) as part of the claim, runs the job, then re-provisions
+ a one-shot for the new `next_run_at`. A duplicate relay for the old `fire_at`
+ finds the claim taken / time advanced and is dropped.
+- **One-shot (`30m`, `+90s`, etc.):** fires once; `mark_job_run` marks it
+ completed. No re-arm.
+- **`repeat.times = N`:** `mark_job_run` deletes the job at the limit, so
+ `get_job` returns `None` after the final fire → the agent does **not** re-arm
+ → the schedule stops cleanly with no orphaned one-shot.
+- **Multi-replica agents:** the store CAS makes the fire at-most-once across N
+ gateway replicas sharing one `HERMES_HOME` — exactly one replica runs each
+ fire.
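The compare-and-set claim behind these semantics can be illustrated with a toy in-memory store. Everything here is an assumption for illustration: the class `InMemoryJobStore`, the `arm` method, and the exact signature of `claim_job_for_fire` are hypothetical, and the real store is shared across replicas rather than held in one process.

```python
import threading


class InMemoryJobStore:
    """Toy stand-in for the agent's job store (not the real implementation)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_run_at = {}

    def arm(self, job_id, next_run_at):
        with self._lock:
            self._next_run_at[job_id] = next_run_at

    def claim_job_for_fire(self, job_id, fire_at, new_next_run_at):
        """Compare-and-set: succeed only if the job is still due at fire_at."""
        with self._lock:
            if self._next_run_at.get(job_id) != fire_at:
                return False  # already claimed, or the time has advanced
            # Advancing next_run_at is part of the claim itself.
            self._next_run_at[job_id] = new_next_run_at
            return True


store = InMemoryJobStore()
store.arm("ab12cd34", "2026-06-18T12:34:56+00:00")
first = store.claim_job_for_fire("ab12cd34", "2026-06-18T12:34:56+00:00", "2026-06-18T12:35:56+00:00")
retry = store.claim_job_for_fire("ab12cd34", "2026-06-18T12:34:56+00:00", "2026-06-18T12:35:56+00:00")
```

The first claim wins and the retry loses, which is why a relay or scheduler retry for the same `fire_at` is harmless.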
+
+## Reconcile (self-healing)
+
+The agent reconciles desired (`jobs.json`) vs armed on:
+- `start()` (gateway boot / wake),
+- every successful job mutation (`on_jobs_changed`),
+- piggybacked after each fire (re-arm).
+
+Reconcile arms missing/changed-time jobs and cancels orphans. A missed
+provision (transient NAS error) self-heals on the next reconcile. There is **no
+periodic wake** of a sleeping agent — that would negate scale-to-zero.
+
+## Config (agent side)
+
+All non-secret (`cron.chronos.*` in `config.yaml`); the agent holds no scheduler
+credentials. For hosted agents NAS sets these at provision time:
+
+| key | meaning |
+|---|---|
+| `cron.provider` | `"chronos"` to activate (empty = built-in ticker) |
+| `cron.chronos.portal_url` | NAS base URL (also the expected JWT `iss`) |
+| `cron.chronos.callback_url` | the agent's own public base URL for NAS→agent fires |
+| `cron.chronos.expected_audience` | this agent's JWT `aud` (`agent:{instance_id}`) |
+| `cron.chronos.nas_jwks_url` | NAS JWKS for verifying the fire JWT |
+
+If `callback_url` / `portal_url` is blank or the agent has no Nous login,
+`is_available()` returns False and the resolver falls back to the built-in
+in-process ticker — cron never loses its trigger.
+
+## Escape hatch (not default)
+
+The inbound `/api/cron/fire` verifier is pluggable (`get_fire_verifier()`). If
+relay volume through NAS ever saturates, a direct scheduler→agent mode with a
+per-job NAS-minted cron-key can replace the NAS-JWT verifier with **no change to
+the webhook handler**. NAS-mediated (this contract) is the default.
diff --git a/docs/relay-connector-contract.md b/docs/relay-connector-contract.md
index 39c86a5f839..4e20726197f 100644
--- a/docs/relay-connector-contract.md
+++ b/docs/relay-connector-contract.md
@@ -62,33 +62,80 @@ live platform adapter's capability methods.
The connector normalizes each platform wire event into a `MessageEvent`
(`gateway/platforms/base.py`) and delivers it to the gateway. **Inbound is
-delivered over a signed HTTP POST, not the outbound `/relay` WebSocket** (see
-the transport note below). The gateway keys the session via `build_session_key()`
+delivered over the gateway's OUTBOUND `/relay` WebSocket** (see the transport
+note below) — the connector pushes an `inbound` frame down the socket the
+gateway already dialed. The gateway keys the session via `build_session_key()`
from the embedded `SessionSource` — so populating the right discriminators is
the single highest-correctness responsibility of the connector.
-### Inbound transport (signed HTTP POST, not the outbound WS)
+### Inbound transport (WS back-channel, not HTTP)
The gateway dials **out** to the connector's `/relay` WebSocket for the
-handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound,
-however, is delivered the other way: the connector **POSTs** the normalized
-event to the gateway's inbound endpoint (`HttpGatewayDelivery` on the connector;
-`gateway/relay/inbound_receiver.py` on the gateway). The reason is
-multi-instance: the connector instance that owns a platform's socket (and thus
-produces inbound events) is generally **not** the instance a given gateway
-dialed its outbound WS into, so inbound must target a tenant **endpoint** (which
-may load-balance across gateway instances) rather than ride one gateway's
-outbound socket. Each delivery is HMAC-signed with the per-tenant **delivery
-key** (§6.1); the gateway verifies the signature over the exact raw bytes before
-accepting the event. Two POST targets:
+handshake + outbound actions (§4) + its own `/stop` egress (§5). Inbound rides
+the **same socket** in the other direction: the connector pushes an `inbound`
+frame (and `interrupt_inbound` for §5) down the gateway's outbound WS. There is
+**no gateway-side inbound HTTP endpoint** — a gateway need not (and, when hosted,
+cannot) expose any inbound port; everything flows over the connection it
+initiated.
+
+**Multi-instance routing.** The connector instance that owns a platform's socket
+(and thus produces inbound events) is generally **not** the instance the gateway
+dialed its outbound WS into. The producing instance therefore publishes the
+event on the connector's internal **relay bus** (Redis pub/sub; `RelayBus` in
+`src/core/relayBus.ts`) keyed by tenant. Every connector instance subscribes and
+routes each message to its **local** sessions for that tenant
+(`RelayServer.routeBusMessage`); the single instance that actually holds the
+gateway's socket delivers it, and instances with no local session for the tenant
+no-op. Cross-instance delivery is thus an in-cluster Redis hop, not a public
+HTTP call.
+
+Frames (connector → gateway, over the WS):
+
+- `{"type":"inbound", "event": , "bufferId"?}`
+- `{"type":"interrupt_inbound", "session_key", "chat_id"}` (§5)
+- `{"type":"passthrough_forward", "forward": , "bufferId"?}` (§5.1)
+
+`PassthroughForward` is the wire form of a forwarded passthrough-plane request
+(Class-2/3 webhooks — Discord interactions, Twilio): `{platform, botId, method,
+path, headers: [[k,v],…], bodyB64}`. The body is base64-encoded so arbitrary
+bytes survive the newline-delimited-JSON transport; the gateway base64-decodes
+back to the exact bytes the connector forwarded (the connector already verified
+the provider signature and stripped any shared-identity credential at the edge —
+§6 — so the gateway re-processes a sanitized, token-free body and acts on it via
+the token-less `follow_up` path). See §3.1.
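To make the base64 round-trip concrete, here is a minimal sketch of encoding and decoding a `passthrough_forward` frame. The function names are hypothetical, and the use of the standard base64 alphabet (rather than the URL-safe one) is an assumption, since the contract text only says the body is base64-encoded.

```python
import base64
import json


def encode_passthrough_forward(platform, bot_id, method, path, headers, body: bytes, buffer_id=None) -> str:
    """Serialize a forward as one newline-free JSON line (hypothetical helper)."""
    forward = {
        "platform": platform,
        "botId": bot_id,
        "method": method,
        "path": path,
        "headers": [[k, v] for k, v in headers],
        # Assumption: standard base64 alphabet with padding.
        "bodyB64": base64.b64encode(body).decode("ascii"),
    }
    frame = {"type": "passthrough_forward", "forward": forward}
    if buffer_id is not None:
        frame["bufferId"] = buffer_id
    return json.dumps(frame, separators=(",", ":"))


def decode_passthrough_body(line: str) -> bytes:
    """Recover the exact bytes the connector forwarded."""
    frame = json.loads(line)
    if frame.get("type") != "passthrough_forward":
        raise ValueError("not a passthrough_forward frame")
    return base64.b64decode(frame["forward"]["bodyB64"], validate=True)


raw_body = b'{"type":2}\n\x00\xff'
line = encode_passthrough_forward(
    "discord", "bot-1", "POST", "/interactions",
    [("content-type", "application/json")], raw_body,
)
```

The raw body here deliberately contains a newline and non-UTF-8 bytes; the encoded line contains neither, so it survives newline-delimited JSON framing.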
+
+**Trust.** The WS upgrade is authenticated with the gateway's per-gateway secret
+(§6.1), so the channel is trusted end to end — inbound frames are not separately
+HMAC-signed (the authenticated socket subsumes the per-delivery origin proof the
+old HTTP path needed). The relay-bus hop is inside the connector trust domain
+(same as the lease/buffer/capability stores).
+
+> Earlier drafts of this contract delivered inbound over a signed **HTTP POST**
+> to a `gatewayEndpoint` (`HttpGatewayDelivery` + a gateway-side
+> `inbound_receiver`), HMAC-signed with a per-tenant delivery key. That required
+> every gateway to expose a reachable inbound URL — impossible for hosted
+> gateways, which have no public IP. The WS back-channel above replaces it; the
+> per-tenant delivery key is retained at provision for forward-compat but is no
+> longer used for inbound. The **passthrough plane** (Class-2/3 webhooks like
+> Discord interactions / Twilio) historically still used `gatewayEndpoint` for
+> its post-ACK forward; Phase 5 §5.1 moves that forward onto the WS too (the
+> `passthrough_forward` frame above), so a hosted gateway needs zero public
+> inbound surface and `gatewayEndpoint` is retired once the cutover lands.
+
+### 3.1 Passthrough-plane forward (§5.1)
+
+The passthrough plane answers the provider's latency-critical ACK at the
+connector EDGE (e.g. Discord's deferred interaction response within ~3s), then
+does a **fire-and-forget** forward of the real request to the gateway. That
+forward needs no response back (the provider was already satisfied), so it rides
+the same outbound WS as `inbound` via a `passthrough_forward` frame rather than
+an HTTP POST. The gateway processes the decoded request through its normal agent
+path (a Discord interaction is decoded to a `MessageEvent` and handled like a
+message; the reply egresses over the outbound / `follow_up` path). `bufferId` is
+present when the forward was buffered (Phase 5 §5.3 buffered-only flip) and the
+gateway acks it after durable handoff.
-- `POST {gatewayEndpoint}` → `{"type":"message", "event": }`
-- `POST {gatewayEndpoint}/interrupt` → `{"type":"interrupt", "session_key", "reason"?}` (§5)
-> An earlier draft of this contract delivered inbound over the WS `inbound`
-> frame. That only works single-instance and predates the multi-instance
-> socket-ownership + channel-auth model; the signed-HTTP path above is the
-> shipped design.
### SessionSource fields (the wire surface)
@@ -178,13 +225,15 @@ gateway holds zero capability material). Source of truth:
mid-turn `/stop` over the outbound WS. The connector MUST forward it to the
gateway instance running that `session_key` (the routing invariant).
- **Connector → gateway:** an inbound interrupt for a `session_key` is delivered
- as a **signed HTTP POST** to `{gatewayEndpoint}/interrupt` (§3 transport note),
- and bridged by the adapter's `on_interrupt(session_key, chat_id)` into the
- existing per-session interrupt mechanism, cancelling exactly that turn
+ as an `interrupt_inbound` frame down the gateway's outbound WS (§3 transport
+ note) — routed cross-instance via the relay bus to whichever instance holds
+ the socket — and bridged by the adapter's `on_interrupt(session_key, chat_id)`
+ into the existing per-session interrupt mechanism, cancelling exactly that turn
(siblings untouched).
-The gateway→connector `/stop` rides the outbound WS; the connector→gateway
-interrupt rides the same signed-HTTP inbound path as a normalized event.
+Both directions ride the gateway's outbound WS: the gateway→connector `/stop`
+egresses over it, and the connector→gateway interrupt rides the same `inbound`
+back-channel as a normalized event.
---
@@ -231,20 +280,21 @@ only in transport. See `docs/capability-trust-boundary.md` (connector repo:
A2 makes the connector the sole holder of platform secrets while the gateway may
be **customer-managed and internet-exposed**, so the connector⇄gateway channel
-is itself authenticated. The gateway holds two enrollment-issued credentials
-(`hermes gateway enroll` → connector `/relay/enroll`): a **per-gateway secret**
-and a **per-tenant delivery key**. Both are HMAC-SHA256 schemes with a
-multi-secret rotation verify list (gateway side: `gateway/relay/auth.py`;
-connector side: `src/core/relayAuthToken.ts` + `src/core/deliverySigning.ts`).
+is itself authenticated. The gateway holds an enrollment- or provision-issued
+**per-gateway secret** (`hermes gateway enroll` → connector `/relay/enroll`, or
+managed self-provision → `/relay/provision`) that authenticates its outbound WS
+upgrade. It is an HMAC-SHA256 scheme with a multi-secret rotation verify list
+(gateway side: `gateway/relay/auth.py`; connector side:
+`src/core/relayAuthToken.ts`).
| Leg | Credential | Mechanism |
|-----|-----------|-----------|
| Gateway → connector WS upgrade | per-gateway secret | An `Authorization` bearer header on the `/relay` upgrade. The token is `base64url(payload:exp:sig)` where `payload = gatewayId` and `sig = HMAC(payload:exp, secret)`. Connector verifies and rejects the upgrade (**close 4401**) on mismatch/absence/revocation. The authenticated tenant comes from the connector's store, never the `hello` frame. |
-| Connector → gateway inbound POST | per-tenant delivery key | Two headers: `x-relay-timestamp` (unix seconds) and `x-relay-signature` (hex `HMAC(ts.rawBody, deliveryKey)`). Gateway verifies over the **exact raw bytes** within a ±300s replay window before accepting the event; rejects **401** otherwise. |
+| Connector → gateway inbound (`inbound` / `interrupt_inbound` frames) | — (rides the authenticated WS) | Inbound is pushed down the gateway's already-authenticated outbound socket (§3), so no per-message signature is needed. A **per-tenant delivery key** is still issued at enroll/provision and retained for forward-compat, but is no longer used to sign inbound. |
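The upgrade token described in the first table row can be sketched as follows. This is an illustration under stated assumptions, not the code in `gateway/relay/auth.py` or `src/core/relayAuthToken.ts`: the hex encoding of `sig`, the default TTL, and the function names are all hypothetical. Only the `base64url(payload:exp:sig)` layout, HMAC-SHA256, and the multi-secret verify list come from the text.

```python
import base64
import hashlib
import hmac
import time


def _sign(payload: str, exp: int, secret: bytes) -> str:
    # Assumption: the signature is hex-encoded HMAC-SHA256 over "payload:exp".
    return hmac.new(secret, f"{payload}:{exp}".encode(), hashlib.sha256).hexdigest()


def mint_token(gateway_id: str, secret: bytes, ttl_seconds: int = 300, now=None) -> str:
    exp = int(time.time() if now is None else now) + ttl_seconds
    raw = f"{gateway_id}:{exp}:{_sign(gateway_id, exp, secret)}"
    return base64.urlsafe_b64encode(raw.encode()).decode("ascii")


def verify_token(token: str, secrets: list, now=None):
    """Return the gateway id if the token verifies against any listed secret, else None."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii")).decode()
        payload, exp_text, sig = raw.rsplit(":", 2)
        exp = int(exp_text)
    except ValueError:  # also covers bad base64 and bad UTF-8
        return None
    if (time.time() if now is None else now) > exp:
        return None
    for secret in secrets:  # rotation: current and previous secrets both verify
        expected = _sign(payload, exp, secret)
        if hmac.compare_digest(sig.encode(), expected.encode()):
            return payload
    return None
```

Accepting a list of secrets lets the connector keep verifying tokens minted with the previous secret while a rotation is in progress.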
This is the **channel** authenticator — distinct from platform crypto, which the
relay path still sheds entirely (§6). The gateway holds zero platform secrets;
-these two keys authenticate only the connector link. Full threat model +
+the per-gateway secret authenticates only the connector link. Full threat model +
enrollment/rotation/kill-switch design: `docs/connector-gateway-auth-design.md`
(connector repo).
diff --git a/docs/session-lifecycle.md b/docs/session-lifecycle.md
new file mode 100644
index 00000000000..14ce1635927
--- /dev/null
+++ b/docs/session-lifecycle.md
@@ -0,0 +1,631 @@
+# Session Lifecycle
+
+> **Audience:** Gateway developers and maintainers
+> **Source files:** `gateway/session.py` (~1444 lines), `gateway/run.py` (~16800 lines), `gateway/config.py`
+> **Last updated:** 2026-06-16
+
+## Overview
+
+A **session** represents a continuous conversation between the agent and one or more users on a
+messaging platform. The session lifecycle governs when conversations persist, when they reset,
+how they survive gateway restarts, and how messages queue during concurrent operations.
+
+The session system lives primarily in two modules:
+
+- `gateway/session.py` — Data model (`SessionSource`, `SessionEntry`, `SessionContext`),
+ key generation (`build_session_key`), and the main store (`SessionStore`).
+- `gateway/run.py` — Gateway runner (`GatewayRunner`) that wires sessions into the message
+ processing pipeline: session expiry watching, agent caching, restart recovery, and message
+ queuing.
+
+---
+
+## 1. SessionSource — Message Origin Descriptor
+
+`SessionSource` is a frozen record of *where a message came from*. It is attached to every
+incoming `MessageEvent` and used for routing, isolation, and context injection.
+
+### Fields
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `platform` | `Platform` | *(required)* | Enum identifying the messaging platform (telegram, discord, slack, signal, whatsapp, matrix, local, etc.). |
+| `chat_id` | `str` | *(required)* | Platform-level chat/group/channel identifier. Routed through the adapter's `chat_id_key` transform. |
+| `chat_name` | `Optional[str]` | `None` | Human-readable name of the chat or group. |
+| `chat_type` | `str` | `"dm"` | One of `"dm"`, `"group"`, `"channel"`, `"thread"`. Controls session key generation and isolation. |
+| `user_id` | `Optional[str]` | `None` | Platform-specific user identifier. Used for authorization and per-user session isolation. |
+| `user_name` | `Optional[str]` | `None` | Display name of the message author. Injected into system prompt. |
+| `thread_id` | `Optional[str]` | `None` | Forum topic / Discord thread / Slack thread identifier. Differentiates threaded conversations. |
+| `chat_topic` | `Optional[str]` | `None` | Channel topic or description (Discord channel topic, Slack channel purpose). |
+| `user_id_alt` | `Optional[str]` | `None` | Platform-specific stable alternative ID (Signal UUID, Feishu union_id). Used when `user_id` is ephemeral. |
+| `chat_id_alt` | `Optional[str]` | `None` | Signal group internal ID — maps a Signal group V2 identifier to its canonical form. |
+| `is_bot` | `bool` | `False` | True when the message author is a bot or webhook (Discord bots). |
+| `guild_id` | `Optional[str]` | `None` | Discord guild / Slack workspace / Matrix server scope identifier. |
+| `parent_chat_id` | `Optional[str]` | `None` | Parent channel when `chat_id` refers to a thread. |
+| `message_id` | `Optional[str]` | `None` | ID of the triggering message. Used for pin/reply/react operations and Discord ID injection. |
+| `role_authorized` | `bool` | `False` | True when adapter granted access via a platform role (not individual user ID). |
+
+### Key Methods
+
+- **`description`** (property: `str`) — Human-readable summary e.g. `"DM with Alice"`,
+ `"group: My Group, thread: 12345"`.
+- **`to_dict()` / `from_dict()`** — Serialization round-trip for persistence in `sessions.json`.
+
+---
+
+## 2. SessionEntry — Active Session Record
+
+`SessionEntry` is the per-session metadata record stored in memory and persisted to
+`{sessions_dir}/sessions.json`. Each entry maps a `session_key` to its current `session_id`.
+
+### Fields
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `session_key` | `str` | *(required)* | Deterministic key identifying the conversation lane (see §4). |
+| `session_id` | `str` | *(required)* | Unique identifier for this specific conversation incarnation. Format: `YYYYMMDD_HHMMSS_<8hex>`. |
+| `created_at` | `datetime` | *(required)* | When this session incarnation was created. |
+| `updated_at` | `datetime` | *(required)* | Last activity timestamp. Used for idle timeout and expiry checks. |
+| `origin` | `Optional[SessionSource]` | `None` | The source that created this session, used for delivery routing. |
+| `display_name` | `Optional[str]` | `None` | Chat display name (sourced from `SessionSource.chat_name`). |
+| `platform` | `Optional[Platform]` | `None` | Platform enum, persisted for expiry policy lookup across restarts. |
+| `chat_type` | `str` | `"dm"` | Chat type, also persisted for policy lookup. |
+| `input_tokens` | `int` | `0` | Cumulative LLM input (prompt) tokens consumed. |
+| `output_tokens` | `int` | `0` | Cumulative LLM output (completion) tokens consumed. |
+| `cache_read_tokens` | `int` | `0` | Cumulative prompt cache read tokens. |
+| `cache_write_tokens` | `int` | `0` | Cumulative prompt cache write tokens. |
+| `total_tokens` | `int` | `0` | Total token count across all turns. |
+| `estimated_cost_usd` | `float` | `0.0` | Estimated cumulative USD cost. |
+| `cost_status` | `str` | `"unknown"` | Cost tracking status label. |
+| `last_prompt_tokens` | `int` | `0` | Last API-reported prompt token count. Used for accurate compression pre-check. |
+
+### Boolean Flags (State Machine)
+
+SessionEntry has several boolean flags that form a simple state machine governing session
+behavior on the next access.
+
+| Flag | Type | Default | Description |
+|---|---|---|---|
+| `was_auto_reset` | `bool` | `False` | Set when a session was auto-reset due to policy expiry (idle/daily). Consumed once to inject a context notice. |
+| `auto_reset_reason` | `Optional[str]` | `None` | `"idle"` or `"daily"` — why the previous session was auto-reset. |
+| `reset_had_activity` | `bool` | `False` | Whether the expired session had any messages (`total_tokens > 0`). |
+| `is_fresh_reset` | `bool` | `False` | Set by explicit `/new` or `/reset`. Triggers topic/channel skill re-injection on first message. Distinguished from `was_auto_reset` to avoid misleading "session expired" notices. |
+| `expiry_finalized` | `bool` | `False` | Set by background expiry watcher after invoking `on_session_finalize` hooks, cleaning tool resources, and evicting the cached agent. Prevents redundant finalization across restarts. |
+| `suspended` | `bool` | `False` | Hard force-wipe signal. Set by `/stop` or stuck-loop escalation (3+ consecutive restart failures). On next `get_or_create_session()`, forces a new `session_id` regardless of `resume_pending`. |
+| `resume_pending` | `bool` | `False` | Soft recovery marker. Set by `suspend_recently_active()` (crash recovery) or drain timeout. On next access, preserves the existing `session_id` — the user continues on the same transcript. Cleared after the next successful turn completes. |
+| `resume_reason` | `Optional[str]` | `None` | Why resume was marked: `"restart_timeout"`, `"shutdown_timeout"`, `"restart_interrupted"`. |
+| `last_resume_marked_at` | `Optional[datetime]` | `None` | Timestamp of the last resume-pending marking. |
+
+### State Transition Logic (get_or_create_session)
+
+```
+ ┌──────────┐
+ │ Incoming │
+ │ Message │
+ └────┬─────┘
+ │
+ ▼
+ ┌──────────────────────┐
+ │ session_key exists │──── No ──► Create fresh SessionEntry
+ │ AND !force_new │
+ └──────────┬───────────┘
+ │ Yes
+ ▼
+ ┌──────────────────────┐
+ │ entry.suspended? │──── Yes ──► Auto-reset: new session_id
+ └──────────┬───────────┘ (reason="suspended")
+ │ No
+ ▼
+ ┌──────────────────────┐
+ │ entry.resume_pending?│──── Yes ──► Return existing entry
+ └──────────┬───────────┘ (preserve session_id)
+ │ No Clear flag on next successful turn
+ ▼
+ ┌──────────────────────┐
+ │ Policy says reset? │──── Yes ──► Auto-reset: new session_id
+ └──────────┬───────────┘ (reason="idle"/"daily")
+ │ No
+ ▼
+ ┌──────────────────────┐
+ │ Return existing │
+ │ entry, bump │
+ │ updated_at │
+ └──────────────────────┘
+```
+
+**Priority order in `get_or_create_session()`:**
+1. `suspended=True` → always force-reset (hard wipe)
+2. `resume_pending=True` → preserve session_id (soft recovery)
+3. Policy expiry (idle/daily) → auto-reset
+4. No trigger → return existing entry (bump `updated_at`)
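The priority order can be expressed as a small decision function. The sketch below is illustrative only: `EntryFlags` and `resolve_action` are hypothetical names, the returned strings are made up for this example, and the real `get_or_create_session()` also creates and ends SQLite records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryFlags:
    """Only the two flags that matter for the priority order."""
    suspended: bool = False
    resume_pending: bool = False


def resolve_action(entry: Optional[EntryFlags], force_new: bool, policy_reason: Optional[str]) -> str:
    """Decide what happens to a session on the next incoming message."""
    if entry is None or force_new:
        return "create"
    if entry.suspended:  # hard wipe wins over everything else
        return "reset:suspended"
    if entry.resume_pending:  # soft recovery keeps the same session_id
        return "resume"
    if policy_reason in ("idle", "daily"):
        return f"reset:{policy_reason}"
    return "reuse"
```

Note that a session marked both `suspended` and `resume_pending` is still reset, and a `resume_pending` session is preserved even when the idle or daily policy would otherwise reset it.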
+
+---
+
+## 3. SessionStore — Storage and Operations
+
+`SessionStore` is the main storage layer. It maintains an in-memory dict (`_entries`) persisted
+to `sessions.json`, with SQLite (`SessionDB`) as the canonical store for session metadata and
+message transcripts.
+
+### Constructor
+
+```python
+SessionStore(sessions_dir: Path, config: GatewayConfig, has_active_processes_fn=None)
+```
+
+- `sessions_dir` — Directory where `sessions.json` lives.
+- `config` — `GatewayConfig` instance for reset policy lookups.
+- `has_active_processes_fn` — Optional callback keyed by `session_key` to check for running
+ background processes. Sessions with active processes are never expired or pruned.
+
+### Operations (Methods)
+
+| Method | Description |
+|---|---|
+| `get_or_create_session(source, force_new=False)` | Core entry point. Returns existing or creates new `SessionEntry`. Evaluates `suspended`, `resume_pending`, and reset policy. Creates/ends SQLite records. |
+| `update_session(session_key, last_prompt_tokens=None)` | Lightweight metadata update after an interaction. Bumps `updated_at`, optionally records `last_prompt_tokens`. |
+| `reset_session(session_key, display_name=None)` | Explicit reset (from `/new` or `/reset`). Creates new `session_id`, sets `is_fresh_reset=True`. Ends old SQLite session, creates new one. |
+| `switch_session(session_key, target_session_id)` | Switch to a different existing session ID (from `/resume`). Ends current SQLite session, reopens target. |
+| `suspend_session(session_key)` | Mark session as `suspended=True` (from `/stop`). Forces auto-reset on next access. |
+| `mark_resume_pending(session_key, reason)` | Mark session as `resume_pending=True` (from drain timeout). Preserves session_id on next access. Will NOT override `suspended=True`. |
+| `clear_resume_pending(session_key)` | Clear `resume_pending` after a successful resumed turn. Called from gateway after `run_conversation()` returns. |
+| `suspend_recently_active(max_age_seconds=120)` | Crash recovery: mark recently-active sessions as `resume_pending=True`. Skips already-pending and already-suspended entries. Called on startup after unclean shutdown. |
+| `prune_old_entries(max_age_days)` | Drop entries older than `max_age_days` (based on `updated_at`). Skips `suspended` entries and sessions with active processes. |
+| `list_sessions(active_minutes=None)` | Return all sessions, optionally filtered by recent activity. Sorted by `updated_at` descending. |
+| `lookup_by_session_id(session_id)` | Find the active `SessionEntry` for a persisted session ID. |
+| `has_any_sessions()` | Check if any sessions have ever been created (uses SQLite for history, not just in-memory dict). |
+| `append_to_transcript(session_id, message, skip_db=False)` | Append a message to SQLite transcript. `skip_db=True` prevents duplicate writes when the agent already persisted. |
+| `rewrite_transcript(session_id, messages)` | Full replacement of session transcript (used by `/retry`, `/undo`, `/compress`). |
+| `load_transcript(session_id)` | Load all messages from a session's SQLite transcript. |
+| `rewind_session(session_id, n=1)` | Back up `n` user turns via soft-delete (keeps audit trail). Returns `{rewound_count, turns_undone, target_text}`. |
+
+### Internal Helpers
+
+- `_ensure_loaded()` / `_ensure_loaded_locked()` — Load `sessions.json` into `_entries` dict.
+- `_save()` — Atomic write to `sessions.json` via temp file + `atomic_replace`.
+- `_generate_session_key(source)` — Delegates to `build_session_key()` with config params.
+- `_is_session_expired(entry)` — Policy check from entry alone (no source needed). Used by
+ background expiry watcher.
+- `_should_reset(entry, source)` — Policy check returning `"idle"`, `"daily"`, or `None`.
+
+### Storage Layout
+
+```
+{sessions_dir}/
+ sessions.json # In-memory _entries dict, persisted as JSON
+                  # Maps session_key → SessionEntry (metadata only)
+ {session_id}.jsonl # (Legacy, removed in spec 002)
+```
+
+The canonical transcript store is SQLite via `SessionDB` (from `hermes_state`). The
+`sessions.json` file persists the `session_key → session_id` mapping and entry metadata
+(flags, timestamps, token counts). If SQLite is unavailable, the store falls back to
+JSONL, but this is a degradation path.
+
+---
+
+## 4. SessionKey Generation Rules
+
+Session keys are deterministic strings that identify a conversation lane. They are generated
+by `build_session_key(source, group_sessions_per_user, thread_sessions_per_user)`.
+
+### Key Format
+
+```
+agent:main:{platform}:{chat_type}[:{chat_id}][:{thread_id}][:{participant_id}]
+```
+
+### DM Rules
+
+| Scenario | Key |
+|---|---|
+| DM with chat_id | `agent:main:telegram:dm:12345` |
+| DM with chat_id + thread | `agent:main:telegram:dm:12345:thread_678` |
+| DM without chat_id, with participant_id | `agent:main:signal:dm:user_abc` |
+| DM without chat_id or participant_id | `agent:main:telegram:dm` |
+| WhatsApp DM (canonicalized) | `agent:main:whatsapp:dm:{canonical_number}` |
+
+- DMs always include `chat_id` when present, isolating each private conversation.
+- `thread_id` further differentiates threaded DMs within the same DM chat.
+- Without `chat_id`, falls back to `user_id_alt` or `user_id` as participant_id.
+- Without any identifier, all DMs on that platform collapse to one shared session.
+
+### Group/Channel Rules
+
+| Scenario | Key |
+|---|---|
+| Group chat | `agent:main:telegram:group:-10012345` |
+| Group chat, per-user isolation | `agent:main:telegram:group:-10012345:user_abc` |
+| Thread in group, shared | `agent:main:discord:group:12345:thread_678` |
+| Thread in group, per-user | `agent:main:discord:group:12345:thread_678:user_abc` |
+| Channel | `agent:main:slack:channel:C12345` |
+| WhatsApp group (canonicalized) | `agent:main:whatsapp:group:{canonical_id}:{participant}` |
+
+- `chat_id` identifies the parent group/channel.
+- `thread_id` differentiates threads within that parent.
+- **Per-user isolation** (append `participant_id`) is controlled by:
+ - `group_sessions_per_user` (default: `True`) — group/channel sessions are isolated.
+ - `thread_sessions_per_user` (default: `False`) — threads are **shared** by default
+ (Telegram forum topics, Discord threads, Slack threads all share one session per thread).
+- `participant_id` = `user_id_alt` or `user_id` (in that priority).
+- WhatsApp identifiers are canonicalized to handle JID/LID alias flips.
+
+### Special Case: WhatsApp
+
+WhatsApp phone numbers go through `canonical_whatsapp_identifier()`, which strips the
+`@s.whatsapp.net` suffix and normalizes to E.164 format. This prevents session fragmentation
+when the bridge returns different alias forms of the same phone number.
+
+---
+
+## 5. Multi-User Isolation Strategy
+
+Multi-user isolation determines whether multiple users in the same chat share a conversation
+or each get their own private session.
+
+### Decision Logic (`is_shared_multi_user_session`)
+
+```python
+def is_shared_multi_user_session(source, *, group_sessions_per_user, thread_sessions_per_user):
+    if source.chat_type == "dm":
+        return False  # DMs are always private
+    if source.thread_id:
+        return not thread_sessions_per_user  # Threads: shared unless per-user
+    return not group_sessions_per_user  # Groups: isolated unless shared
+```
+
+### Summary
+
+| Chat Type | Default | Config Control |
+|---|---|---|
+| DM | Private (never shared) | N/A |
+| Group/Channel | Per-user isolation | `group_sessions_per_user` (default: True) |
+| Thread (forum, Discord) | Shared (all participants see same context) | `thread_sessions_per_user` (default: False) |
+
+### Impact on System Prompt
+
+When `shared_multi_user_session=True`, the system prompt omits a fixed user name and instead
+states: *"Multi-user {thread|session} — messages are prefixed with [sender name]. Multiple
+users may participate."* Individual sender names are prefixed on each user message by the
+gateway at runtime, preserving prompt caching (the system prompt doesn't change per-turn).
+
+---
+
+## 6. Reset Policy
+
+Reset policies control when a session automatically loses context (gets a new `session_id`).
+
+### Policy Modes (`SessionResetPolicy`)
+
+| Mode | Behavior | Default Config |
+|---|---|---|
+| `"none"` | Never auto-reset. Context managed only by compression. | — |
+| `"idle"` | Reset after N minutes of inactivity from `updated_at`. | `idle_minutes: 1440` (24h) |
+| `"daily"` | Reset at a specific hour each day (local time). | `at_hour: 4` (4 AM) |
+| `"both"` | Whichever triggers first — daily boundary OR idle timeout. | **(default)** |
+
+### Policy Evaluation
+
+```python
+# Idle check
+idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
+if now > idle_deadline: return "idle"
+
+# Daily check
+today_reset = now.replace(hour=policy.at_hour, minute=0, second=0, microsecond=0)
+if now.hour < policy.at_hour:
+ today_reset -= timedelta(days=1) # Reset hasn't happened yet today
+if entry.updated_at < today_reset: return "daily"
+```
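
As a rough illustration, the two checks can be wrapped into one runnable function. `evaluate_reset` and its parameters are hypothetical names for this sketch, and gating each check on `mode` is an assumption about how the four policy modes select between them; the idle-before-daily order follows the snippet above.

```python
from datetime import datetime, timedelta
from typing import Optional


def evaluate_reset(updated_at: datetime, now: datetime, *, mode: str = "both",
                   idle_minutes: int = 1440, at_hour: int = 4) -> Optional[str]:
    """Return "idle", "daily", or None. Hypothetical helper, not the gateway's API."""
    if mode in ("idle", "both"):
        if now > updated_at + timedelta(minutes=idle_minutes):
            return "idle"
    if mode in ("daily", "both"):
        boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
        if now.hour < at_hour:
            boundary -= timedelta(days=1)  # most recent boundary was yesterday
        if updated_at < boundary:
            return "daily"
    return None


now = datetime(2024, 5, 2, 9, 0)
crossed = evaluate_reset(datetime(2024, 5, 1, 23, 0), now)  # last activity before the 04:00 boundary
fresh = evaluate_reset(datetime(2024, 5, 2, 8, 30), now)    # activity after the boundary
stale = evaluate_reset(datetime(2024, 4, 30, 8, 0), now)    # idle for more than 24h
print(crossed, fresh, stale)
```

In the first case the session is well inside the 24h idle window, yet it still resets because the last activity predates the most recent 04:00 boundary.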
+
+### Per-Platform/Per-Type Policies
+
+Reset policies are configurable per platform and session type via `config.get_reset_policy()`.
+This allows different platforms to have different expiry rules (e.g., Telegram DMs reset
+after 24h idle, but Slack groups persist indefinitely).
+
+### Exclusions
+
+Sessions with active background processes are **never** expired or reset. The
+`has_active_processes_fn` callback checks for running processes when evaluating policies.
+
+### Reset Effects
+
+When a reset triggers:
+
+1. Old session is ended in SQLite (with reason `"session_reset"`).
+2. New `session_id` is generated (`YYYYMMDD_HHMMSS_<8hex>`).
+3. New `SessionEntry` is created with `was_auto_reset=True` and the reset reason.
+4. `reset_had_activity` is set if the old session had any turns (`total_tokens > 0`).
+5. The old AIAgent cache entry is evicted on the next expiry watcher pass.
+6. On the first message after reset, a context notice is injected: "Session expired due to inactivity / daily reset."
+
+---
+
+## 7. Restart Recovery Flow
+
+The restart recovery system ensures that in-flight sessions are preserved across gateway
+restarts, crashes, and drain timeouts. It is the solution to issue #7536.
+
+### Startup Recovery Sequence
+
+```
+Gateway starts
+ │
+ ▼
+┌───────────────────────────────┐
+│ Check for .clean_shutdown │── Exists? ──► Skip suspension (clean exit)
+│ marker │
+└───────────────────────────────┘
+ │ Missing
+ ▼
+┌───────────────────────────────┐
+│ session_store │── Marks sessions updated within
+│ .suspend_recently_active() │ last 120 seconds as resume_pending
+└───────────────────────────────┘
+ │
+ ▼
+┌───────────────────────────────┐
+│ _suspend_stuck_loop_sessions()│── Suspends sessions that have been
+│ │ active across 3+ restarts
+└───────────────────────────────┘
+ │
+ ▼
+┌───────────────────────────────┐
+│ Queue inbound messages while │
+│ startup restore runs │
+│ (_startup_restore_in_progress)│
+└───────────────────────────────┘
+ │
+ ▼
+┌───────────────────────────────┐
+│ For each adapter, find │
+│ resume_pending sessions → │
+│ synthesize MessageEvent and │
+│ run _handle_message to let │
+│ the agent auto-continue │
+└───────────────────────────────┘
+```
+
+### `suspend_recently_active(max_age_seconds=120)`
+
+Called on gateway startup when no `.clean_shutdown` marker exists (indicating a crash or
+unexpected exit). For each session updated within the last 120 seconds:
+
+- Sets `resume_pending=True`, `resume_reason="restart_interrupted"`,
+ `last_resume_marked_at=now`.
+- Skips entries already `resume_pending=True` (no double-mark).
+- Skips entries explicitly `suspended=True` (hard wipe should stay).
+
+### Stuck-Loop Detection (`_suspend_stuck_loop_sessions`)
+
+Counts consecutive restarts via a JSON file (`{HERMES_HOME}/restart_counts.json`). If a
+session has been active across 3+ consecutive restarts, it's auto-suspended so the user
+gets a clean slate.
+
+### Drain-Timeout Marking
+
+On graceful shutdown/restart, the drain system calls `mark_resume_pending()` for any
+session that was mid-turn when the drain timeout fired. Reasons:
+
+- `"restart_timeout"` — killed during restart drain
+- `"shutdown_timeout"` — killed during shutdown drain
+- `"restart_interrupted"` — crash recovery (from `suspend_recently_active`)
+
+All three reasons are in `_AUTO_RESUME_REASONS` and eligible for startup auto-resume.
+
+### Auto-Resume on Next Access
+
+When `get_or_create_session()` encounters `resume_pending=True`:
+
+1. It returns the existing entry **without** creating a new `session_id`.
+2. The existing transcript is loaded intact.
+3. The marking is not cleared here — it survives until the next successful turn
+ completes (`clear_resume_pending()` is called from the gateway after
+ `run_conversation()` returns a real response).
+4. If the resumed turn is interrupted again, the `resume_pending` flag remains set,
+ and the next restart will retry. The stuck-loop counter handles terminal escalation
+ (3 retries → suspended).
+
+### Clean Shutdown Marker (`.clean_shutdown`)
+
+Written at the end of a graceful shutdown. On next startup:
+
+- If present: skip `suspend_recently_active()` entirely. Active agents were already
+ drained, so no sessions are stuck.
+- Then delete the marker.
+
+This prevents unwanted auto-resets after `hermes update`, `hermes gateway restart`,
+or `/restart`.
+
+---
+
+## 8. Message Queuing Flow
+
+The message queuing system handles two scenarios:
+
+1. **Interrupt follow-ups** — When a user sends multiple messages while the agent is
+ processing, subsequent messages are queued as single-slot pending messages.
+2. **`/queue` FIFO** — Explicit `/queue` commands that must each produce their own full
+ agent turn, in order, without merging.
+
+### Data Structures
+
+```
+adapter._pending_messages: Dict[session_key, MessageEvent]
+ └── Single "next-up" slot per session. Overwritten on repeat sends
+ (burst collapse). Shared with photo-burst follow-ups.
+
+self._queued_events: Dict[session_key, List[MessageEvent]]
+ └── Overflow buffer. Each /queue invocation appends here when the
+ slot is occupied. Promoted one-at-a-time after each drain.
+```
+
+### Enqueue (`_enqueue_fifo`)
+
+```
+_enqueue_fifo(session_key, event, adapter)
+ │
+ ▼
+┌───────────────────────────────────────┐
+│ Is slot free? │
+│ (session_key NOT in _pending_messages)│── Yes ──► Place event in slot
+└───────────────────────────────────────┘
+ │ No
+ ▼
+Append to _queued_events[session_key] (overflow tail)
+```
+
+### Dequeue / Promotion (`_promote_queued_event`)
+
+Called at the drain site after the slot was consumed. If there's an overflow item:
+
+- When `pending_event is None` (slot was empty), return overflow head as the new event.
+- When `pending_event` exists, stage overflow head in the slot for the next recursion.
+- If no adapter available, push back to `_queued_events` (don't silently drop).
+
+### Queue Depth
+
+`_queue_depth(session_key, adapter)` returns `len(overflow) + (1 if slot occupied else 0)`.
+
+### Clearing
+
+Queued events for a session are cleared on `/new` and `/reset` (via `_handle_reset_command`).
+
+### FIFO Invariant
+
+Each `/queue` invocation produces exactly one full agent turn, in FIFO order, with no
+merging. The single-slot `_pending_messages` + overflow `_queued_events` design ensures
+that repeated sends during an active turn don't cause out-of-order processing.
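
A simplified model of the slot-plus-overflow design is sketched below. `QueueSketch` and `drain_one` are hypothetical names; the real logic lives in `_enqueue_fifo`, `_promote_queued_event`, and `_queue_depth`, and the adapter-unavailable path is left out.

```python
from typing import Any, Dict, List, Optional


class QueueSketch:
    """Single slot plus overflow list; a simplified model, not the gateway's classes."""

    def __init__(self) -> None:
        self.pending: Dict[str, Any] = {}         # stands in for adapter._pending_messages
        self.overflow: Dict[str, List[Any]] = {}  # stands in for self._queued_events

    def enqueue_fifo(self, key: str, event: Any) -> None:
        if key not in self.pending:
            self.pending[key] = event             # slot free: take it
        else:
            self.overflow.setdefault(key, []).append(event)

    def depth(self, key: str) -> int:
        return len(self.overflow.get(key, [])) + (1 if key in self.pending else 0)

    def drain_one(self, key: str) -> Optional[Any]:
        """Consume the slot, then promote the overflow head."""
        event = self.pending.pop(key, None)
        queued = self.overflow.get(key)
        if queued:
            head = queued.pop(0)
            if event is None:
                event = head                      # slot was empty: run the head now
            else:
                self.pending[key] = head          # stage the head for the next drain
            if not queued:
                del self.overflow[key]
        return event


q = QueueSketch()
for text in ("first", "second", "third"):
    q.enqueue_fifo("session-1", text)
depth_before = q.depth("session-1")
order = [q.drain_one("session-1") for _ in range(3)]
print(depth_before, order)
```

Each drain yields exactly one event, in arrival order, which is the invariant stated above.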
+
+---
+
+## 9. Session Context Injection
+
+`SessionContext` is built from a `SessionSource` and `GatewayConfig` and injected into the
+agent's system prompt. It tells the agent:
+
+- Where the current message came from
+- What platforms are connected
+- Where it can deliver scheduled task outputs
+- Whether this is a shared multi-user session
+
+### Construction (`build_session_context`)
+
+```python
+def build_session_context(source, config, session_entry=None) -> SessionContext
+```
+
+1. Collects connected platforms from config.
+2. Collects home channels for each platform.
+3. Determines `shared_multi_user_session` via `is_shared_multi_user_session()`.
+4. Attaches session metadata (key, id, timestamps) if `session_entry` is provided.
+
+### PII Redaction (`build_session_context_prompt`)
+
+The dynamic system prompt section (`## Current Session Context`) can optionally redact
+personally identifiable information before sending to the LLM:
+
+- User IDs → `user_<12hex>` (SHA-256 prefix)
+- Chat IDs → `:<12hex>` or just `<12hex>`
+- Platforms excluded from redaction: Discord (needs raw IDs for `@mentions`),
+ and any plugin-registered platform not marked `pii_safe`.
+
+Redaction applies only to the system prompt text. Routing, session keys, and adapter
+operations always use the original values.
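
A minimal sketch of the user-ID form of redaction follows. It assumes the hash input is the raw ID encoded as UTF-8 with no salt, which the text above does not specify; `redact_user_id` is a hypothetical name.

```python
import hashlib


def redact_user_id(user_id: str) -> str:
    """Return user_<12hex>, using the first 12 hex chars of a SHA-256 digest.

    Assumption: the raw ID is hashed as UTF-8 without a salt.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}"


redacted_a = redact_user_id("123456789")
redacted_b = redact_user_id("987654321")
print(redacted_a, redacted_b)
```

Because the mapping is deterministic, the same user keeps the same redacted label across turns, so the prompt text stays stable.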
+
+---
+
+## 10. Background Expiry Watcher
+
+The `_session_expiry_watcher` task runs in the gateway event loop every 300 seconds (5 min).
+
+### Responsibilities
+
+1. **Finalize expired sessions** — For each entry where `_is_session_expired()` returns
+ True and `expiry_finalized` is False:
+ - Invoke `on_session_finalize` plugin hooks (cleanup, notifications).
+ - Clean up cached AIAgent resources (close tool resources, shut down memory provider).
+ - Evict the cached agent entry.
+ - Clear per-session overrides (`_session_model_overrides`, reasoning overrides, etc.).
+ - Mark `expiry_finalized=True` and persist.
+
+2. **Sweep idle cached agents** — Calls `_sweep_idle_cached_agents()` to evict agents that
+ have been idle beyond `_AGENT_CACHE_IDLE_TTL_SECS` (3600s / 1h), regardless of session
+ reset policy. This prevents unbounded memory growth in gateways with long-lived sessions.
+
+3. **Prune stale entries** — Calls `session_store.prune_old_entries()` hourly based on
+ `config.session_store_max_age_days`. Prevents `sessions.json` from growing unbounded.
+
+### Failure Handling
+
+- Per-session retry count: each failed finalize is retried up to 3 consecutive times.
+- After 3 failures, the entry is force-marked `expiry_finalized=True` to prevent infinite
+ retry loops.
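
The bounded-retry behavior could look roughly like this. `try_finalize`, the two dictionaries, and the constant name are hypothetical; only the limit of 3 consecutive failures comes from the description above.

```python
from typing import Callable, Dict

MAX_FINALIZE_ATTEMPTS = 3  # limit described above


def try_finalize(key: str, finalize: Callable[[str], None],
                 failures: Dict[str, int], finalized: Dict[str, bool]) -> None:
    """Run one finalize attempt and track consecutive failures per session."""
    try:
        finalize(key)
    except Exception:
        failures[key] = failures.get(key, 0) + 1
        if failures[key] >= MAX_FINALIZE_ATTEMPTS:
            finalized[key] = True   # force-mark so the watcher stops retrying
            failures.pop(key, None)
        return
    failures.pop(key, None)         # success resets the consecutive count
    finalized[key] = True


def always_fails(key: str) -> None:
    raise RuntimeError("finalize hook failed")


failures: Dict[str, int] = {}
finalized: Dict[str, bool] = {}
states = []
for _ in range(3):  # three watcher passes
    try_finalize("session-1", always_fails, failures, finalized)
    states.append(finalized.get("session-1", False))
print(states)
```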
+
+---
+
+## 11. Agent Cache
+
+The gateway maintains an LRU cache of `AIAgent` instances keyed by `session_key` to
+preserve prompt caching across turns.
+
+### Cache Properties
+
+- **Max size:** 128 entries (`_AGENT_CACHE_MAX_SIZE`).
+- **Eviction policy:** Least-recently-used (LRU via `OrderedDict`).
+- **Idle TTL:** 3600s (1h) — enforced by `_session_expiry_watcher`.
+- **Lock:** `_agent_cache_lock` (threading) for thread safety.
+
+### Cache Lifecycle
+
+```
+Message arrives
+ │
+ ▼
+get_or_create_session() → session_key obtained
+ │
+ ▼
+Lookup _agent_cache[session_key]
+ │
+ ├── Hit → move_to_end(), reuse AIAgent (preserves prompt cache)
+ │
+ └── Miss → create new AIAgent, store in cache
+ (if at capacity, popitem(last=False) evicts LRU entry)
+ │
+ ▼
+run_conversation() → agent processes message
+ │
+ ▼
+Session expiry watcher evicts agent when session finalizes
+```
+
+### Cleanup Flow
+
+When a session expires:
+1. `_cleanup_agent_resources(agent)` — shuts down memory provider, closes tool resources.
+2. `_evict_cached_agent(key)` — removes from `_agent_cache` so the agent can be GC'd.
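
The hit, miss, and eviction paths can be illustrated with a small `OrderedDict`-based cache. `AgentCacheSketch` and its methods are hypothetical names, and resource cleanup of evicted agents is omitted here.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, List, Optional


class AgentCacheSketch:
    """LRU cache keyed by session_key; a simplified model of the agent cache."""

    def __init__(self, max_size: int = 128) -> None:
        self._max_size = max_size
        self._cache: "OrderedDict[str, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def get_or_create(self, key: str, factory: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)     # hit: mark as most recently used
                return self._cache[key]
            if len(self._cache) >= self._max_size:
                self._cache.popitem(last=False)  # at capacity: drop the LRU entry
            agent = factory()
            self._cache[key] = agent
            return agent

    def evict(self, key: str) -> Optional[Any]:
        with self._lock:
            return self._cache.pop(key, None)

    def keys(self) -> List[str]:
        with self._lock:
            return list(self._cache)


cache = AgentCacheSketch(max_size=2)
first_a = cache.get_or_create("a", object)
cache.get_or_create("b", object)
second_a = cache.get_or_create("a", object)  # hit: "a" becomes most recently used
cache.get_or_create("c", object)             # at capacity: evicts "b"
remaining = cache.keys()
print(remaining)
```

Reusing the same instance on a hit is what preserves the agent's prompt cache across turns.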
+
+---
+
+## Appendix: Key Configuration
+
+| Config Key | Type | Default | Description |
+|---|---|---|---|
+| `group_sessions_per_user` | `bool` | `true` | Isolate group/channel sessions per user |
+| `thread_sessions_per_user` | `bool` | `false` | Isolate thread sessions per user |
+| `session_store_max_age_days` | `int` | `0` | Prune sessions older than N days (0=disabled) |
+| `agent.gateway_auto_continue_freshness` | `int` | `3600` | Seconds for resume freshness window |
+| `agent.gateway_timeout` | `int` | `1800` | Agent turn timeout (30 min default) |
+
+### Reset Policy (per-platform/type, in config.yaml)
+
+```yaml
+session_reset:
+ mode: both # none | idle | daily | both
+ at_hour: 4 # daily reset hour (local time)
+ idle_minutes: 1440 # idle timeout (24h)
+ notify: true # notify user on auto-reset
+```
+
+Platform-specific overrides can be set under `platforms.<platform>.session_reset`.
diff --git a/gateway/authz_mixin.py b/gateway/authz_mixin.py
index 9ededa49130..bcefb4eecb4 100644
--- a/gateway/authz_mixin.py
+++ b/gateway/authz_mixin.py
@@ -457,14 +457,19 @@ class GatewayAuthorizationMixin:
Resolution order:
1. Explicit per-platform ``unauthorized_dm_behavior`` in config — always wins.
- 2. Explicit global ``unauthorized_dm_behavior`` in config — wins when no per-platform.
- 3. When an allowlist (``PLATFORM_ALLOWED_USERS``,
+ 2. Email defaults to ``"ignore"`` unless explicitly opted into
+ pairing. Inboxes may contain arbitrary unread human messages, so
+ replying with pairing codes is not a safe platform default.
+ 3. Explicit global ``unauthorized_dm_behavior`` in config — wins for
+ chat-shaped platforms when no per-platform override is set.
+ 4. When an adapter-level DM policy opts into pairing or silent drop, honor it.
+ 5. When an allowlist (``PLATFORM_ALLOWED_USERS``,
``PLATFORM_GROUP_ALLOWED_USERS`` / ``PLATFORM_GROUP_ALLOWED_CHATS``,
or ``GATEWAY_ALLOWED_USERS``) is configured, default to ``"ignore"`` —
the allowlist signals that the owner has deliberately restricted
access; spamming unknown contacts with pairing codes is both noisy
and a potential info-leak. (#9337)
- 4. No allowlist and no explicit config → ``"pair"`` (open-gateway default).
+ 6. No allowlist and no explicit config → ``"pair"`` (open-gateway default).
"""
config = getattr(self, "config", None)
@@ -475,6 +480,14 @@ class GatewayAuthorizationMixin:
# Operator explicitly configured behavior for this platform — respect it.
return config.get_unauthorized_dm_behavior(platform)
+ # Email is inbox-shaped, not chat-shaped: an agent mailbox may contain
+ # unrelated unread human email. Require an explicit per-platform
+ # ``unauthorized_dm_behavior: pair`` opt-in before replying to unknown
+ # senders with pairing codes. Keep this before the global fallback to
+ # match GatewayConfig.get_unauthorized_dm_behavior().
+ if platform == Platform.EMAIL:
+ return "ignore"
+
# Check for an explicit global config override.
if config and hasattr(config, "unauthorized_dm_behavior"):
if config.unauthorized_dm_behavior != "pair": # non-default → explicit override
diff --git a/gateway/config.py b/gateway/config.py
index 0ebf23e12d0..e1556b37d52 100644
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -17,7 +17,7 @@ from typing import Dict, List, Optional, Any, Callable
from enum import Enum
from hermes_cli.config import get_hermes_home
-from utils import is_truthy_value
+from utils import env_int, is_truthy_value
logger = logging.getLogger(__name__)
@@ -463,23 +463,15 @@ _PLATFORM_CONNECTED_CHECKERS: dict[Platform, Callable[[PlatformConfig], bool]] =
Platform.WEIXIN: lambda cfg: bool(
cfg.extra.get("account_id") and (cfg.token or cfg.extra.get("token"))
),
- Platform.WHATSAPP: lambda cfg: True, # bridge handles auth
Platform.WHATSAPP_CLOUD: lambda cfg: bool(
cfg.extra.get("phone_number_id") and cfg.extra.get("access_token")
),
Platform.SIGNAL: lambda cfg: bool(cfg.extra.get("http_url")),
- Platform.EMAIL: lambda cfg: bool(cfg.extra.get("address")),
- Platform.SMS: lambda cfg: bool(os.getenv("TWILIO_ACCOUNT_SID")),
Platform.API_SERVER: lambda cfg: True,
Platform.WEBHOOK: lambda cfg: True,
Platform.MSGRAPH_WEBHOOK: lambda cfg: bool(
str(cfg.extra.get("client_state") or "").strip()
),
- Platform.FEISHU: lambda cfg: bool(cfg.extra.get("app_id")),
- Platform.WECOM: lambda cfg: bool(cfg.extra.get("bot_id")),
- Platform.WECOM_CALLBACK: lambda cfg: bool(
- cfg.extra.get("corp_id") or cfg.extra.get("apps")
- ),
Platform.BLUEBUBBLES: lambda cfg: bool(
cfg.extra.get("server_url") and cfg.extra.get("password")
),
@@ -489,10 +481,6 @@ _PLATFORM_CONNECTED_CHECKERS: dict[Platform, Callable[[PlatformConfig], bool]] =
Platform.YUANBAO: lambda cfg: bool(
cfg.extra.get("app_id") and cfg.extra.get("app_secret")
),
- Platform.DINGTALK: lambda cfg: bool(
- (cfg.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID"))
- and (cfg.extra.get("client_secret") or os.getenv("DINGTALK_CLIENT_SECRET"))
- ),
# Relay dials OUT to a connector; it is "connected" once an endpoint URL is
# configured (extra["relay_url"] or extra["url"]). The capability descriptor
# is negotiated at handshake time, so the URL is the only config-level
@@ -545,6 +533,13 @@ class GatewayConfig:
thread_sessions_per_user: bool = False # When False (default), threads are shared across all participants
max_concurrent_sessions: Optional[int] = None # Positive int caps simultaneous active chat sessions
+ # Multi-profile multiplexing (opt-in; default off preserves one-gateway-per-profile).
+ # When True, the default profile's gateway serves inbound messages for every
+ # profile on the host: profiles are stamped into session keys and (in later
+ # phases) per-profile adapters/credentials are resolved. When False, the
+ # gateway behaves exactly as before — single HERMES_HOME, no profile stamping.
+ multiplex_profiles: bool = False
+
# Unauthorized DM policy
unauthorized_dm_behavior: str = "pair" # "pair" or "ignore"
@@ -587,9 +582,17 @@ class GatewayConfig:
if checker is not None:
return checker(config)
- # Plugin-registered platforms
+ # Plugin-registered platforms. Force plugin discovery first so this
+ # works even when GatewayConfig is constructed directly (e.g. in tests
+ # or callers that bypass load_gateway_config(), which is what triggers
+ # discovery in the normal path). discover_plugins() is idempotent.
try:
from gateway.platform_registry import platform_registry
+ try:
+ from hermes_cli.plugins import discover_plugins
+ discover_plugins()
+ except Exception:
+ pass
entry = platform_registry.get(platform.value)
if entry:
if entry.is_connected is not None:
@@ -650,6 +653,7 @@ class GatewayConfig:
"group_sessions_per_user": self.group_sessions_per_user,
"thread_sessions_per_user": self.thread_sessions_per_user,
"max_concurrent_sessions": self.max_concurrent_sessions,
+ "multiplex_profiles": self.multiplex_profiles,
"unauthorized_dm_behavior": self.unauthorized_dm_behavior,
"streaming": self.streaming.to_dict(),
"session_store_max_age_days": self.session_store_max_age_days,
@@ -695,7 +699,12 @@ class GatewayConfig:
group_sessions_per_user = data.get("group_sessions_per_user")
thread_sessions_per_user = data.get("thread_sessions_per_user")
+ multiplex_profiles = data.get("multiplex_profiles")
nested_gateway = data.get("gateway") if isinstance(data.get("gateway"), dict) else {}
+ if multiplex_profiles is None and isinstance(nested_gateway, dict):
+ # Also honor gateway.multiplex_profiles written by
+ # ``hermes config set gateway.multiplex_profiles true``.
+ multiplex_profiles = nested_gateway.get("multiplex_profiles")
if "max_concurrent_sessions" in data:
max_concurrent_raw = data.get("max_concurrent_sessions")
max_concurrent_key = "max_concurrent_sessions"
@@ -732,6 +741,7 @@ class GatewayConfig:
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
+ multiplex_profiles=_coerce_bool(multiplex_profiles, False),
max_concurrent_sessions=max_concurrent_sessions,
unauthorized_dm_behavior=unauthorized_dm_behavior,
streaming=StreamingConfig.from_dict(data.get("streaming", {})),
@@ -739,7 +749,12 @@ class GatewayConfig:
)
def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
- """Return the effective unauthorized-DM behavior for a platform."""
+ """Return the effective unauthorized-DM behavior for a platform.
+
+ Email is inbox-shaped, not chat-shaped, so it defaults to ``"ignore"``
+ unless ``platforms.email.unauthorized_dm_behavior`` explicitly opts
+ into pairing. A global default does not opt email into pairing.
+ """
if platform:
platform_cfg = self.platforms.get(platform)
if platform_cfg and "unauthorized_dm_behavior" in platform_cfg.extra:
@@ -747,6 +762,8 @@ class GatewayConfig:
platform_cfg.extra.get("unauthorized_dm_behavior"),
self.unauthorized_dm_behavior,
)
+ if platform == Platform.EMAIL:
+ return "ignore"
return self.unauthorized_dm_behavior
def get_notice_delivery(self, platform: Optional[Platform] = None) -> str:
@@ -796,6 +813,14 @@ def load_gateway_config() -> GatewayConfig:
with open(config_yaml_path, encoding="utf-8") as f:
yaml_cfg = yaml.safe_load(f) or {}
+ # Managed scope: overlay administrator-pinned values so the gateway
+ # honors them too. This loader builds its own dict instead of going
+ # through hermes_cli.config.load_config, so without this a managed
+ # session_reset / quick_commands / stt / model would be ignored by
+ # the messaging gateway. Fail-open via the shared helper.
+ from hermes_cli import managed_scope
+ yaml_cfg = managed_scope.apply_managed_overlay(yaml_cfg)
+
# Map config.yaml keys → GatewayConfig.from_dict() schema.
# Each key overwrites whatever gateway.json may have set.
sr = yaml_cfg.get("session_reset")
@@ -823,6 +848,13 @@ def load_gateway_config() -> GatewayConfig:
if "thread_sessions_per_user" in yaml_cfg:
gw_data["thread_sessions_per_user"] = yaml_cfg["thread_sessions_per_user"]
+ # Multiplexing flag: accept both the top-level key and the nested
+ # gateway.multiplex_profiles form (from_dict resolves the nested
+ # fallback, but surface the top-level key here for parity with the
+ # other session-scope flags above).
+ if "multiplex_profiles" in yaml_cfg:
+ gw_data["multiplex_profiles"] = yaml_cfg["multiplex_profiles"]
+
gateway_section = yaml_cfg.get("gateway")
if isinstance(gateway_section, dict) and "max_concurrent_sessions" in gateway_section:
gw_data["max_concurrent_sessions"] = gateway_section["max_concurrent_sessions"]
@@ -997,7 +1029,11 @@ def load_gateway_config() -> GatewayConfig:
plat_data, extra = _ensure_platform_extra_dict(platforms_data, plat.value)
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
- if plat == Platform.SLACK and enabled_was_explicit:
+ # Mark the explicit enable/disable so the registry-driven
+ # plugin-enable pass in _apply_env_overrides honors an
+ # explicit ``enabled: false`` for migrated plugin platforms
+ # (slack, telegram, matrix, dingtalk, whatsapp, feishu …)
+ # instead of re-enabling them on token/SDK presence. #41112.
extra["_enabled_explicit"] = True
extra.update(bridged)
@@ -1038,28 +1074,10 @@ def load_gateway_config() -> GatewayConfig:
_, extra = _ensure_platform_extra_dict(platforms_data, entry.name)
extra.update(seeded)
- # Slack settings → env vars (env vars take precedence)
- slack_cfg = yaml_cfg.get("slack", {})
- if isinstance(slack_cfg, dict):
- if "require_mention" in slack_cfg and not os.getenv("SLACK_REQUIRE_MENTION"):
- os.environ["SLACK_REQUIRE_MENTION"] = str(slack_cfg["require_mention"]).lower()
- if "strict_mention" in slack_cfg and not os.getenv("SLACK_STRICT_MENTION"):
- os.environ["SLACK_STRICT_MENTION"] = str(slack_cfg["strict_mention"]).lower()
- if "allow_bots" in slack_cfg and not os.getenv("SLACK_ALLOW_BOTS"):
- os.environ["SLACK_ALLOW_BOTS"] = str(slack_cfg["allow_bots"]).lower()
- frc = slack_cfg.get("free_response_channels")
- if frc is not None and not os.getenv("SLACK_FREE_RESPONSE_CHANNELS"):
- if isinstance(frc, list):
- frc = ",".join(str(v) for v in frc)
- os.environ["SLACK_FREE_RESPONSE_CHANNELS"] = str(frc)
- if "reactions" in slack_cfg and not os.getenv("SLACK_REACTIONS"):
- os.environ["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()
- # allowed_channels: if set, bot ONLY responds in these channels (whitelist)
- ac = slack_cfg.get("allowed_channels")
- if ac is not None and not os.getenv("SLACK_ALLOWED_CHANNELS"):
- if isinstance(ac, list):
- ac = ",".join(str(v) for v in ac)
- os.environ["SLACK_ALLOWED_CHANNELS"] = str(ac)
+ # Slack settings → env vars: migrated to the slack plugin's
+ # ``apply_yaml_config_fn`` hook (see plugins/platforms/slack/
+ # adapter.py::_apply_yaml_config), dispatched in the
+ # ``apply_yaml_config_fn`` loop above. #41112 / #3823.
# Bridge top-level require_mention to Telegram when the telegram: section
# does not already provide one. Users often write "require_mention: true"
@@ -1072,125 +1090,22 @@ def load_gateway_config() -> GatewayConfig:
_tg_plat = platforms_data.setdefault(Platform.TELEGRAM.value, {})
_tg_extra = _tg_plat.setdefault("extra", {})
_tg_extra.setdefault("require_mention", _tl_require_mention)
+ # Also bridge to the TELEGRAM_REQUIRE_MENTION env var that the
+ # adapter reads at runtime. This used to live in the telegram_cfg
+ # block in core; it stays in core because it keys off the TOP-LEVEL
+ # require_mention (not a telegram: block), so the telegram plugin's
+ # apply_yaml_config_fn hook — which only runs when a telegram config
+ # block exists — can't cover the no-telegram-block case (#3979).
+ if not os.getenv("TELEGRAM_REQUIRE_MENTION"):
+ os.environ["TELEGRAM_REQUIRE_MENTION"] = str(_tl_require_mention).lower()
- # Telegram settings → env vars (env vars take precedence)
- telegram_cfg = yaml_cfg.get("telegram", {})
- if isinstance(telegram_cfg, dict):
- # Bridge top-level legacy `telegram.disable_topic_auto_rename` into
- # gateway.platforms.telegram.extra so the runtime config sees it.
- # Read as a runtime-config flag, not env-var (no need for env override).
- if "disable_topic_auto_rename" in telegram_cfg:
- _tg_plat = platforms_data.setdefault(Platform.TELEGRAM.value, {})
- _tg_extra = _tg_plat.setdefault("extra", {})
- _tg_extra.setdefault(
- "disable_topic_auto_rename",
- telegram_cfg["disable_topic_auto_rename"],
- )
- # Prefer telegram.require_mention; fall back to the top-level shorthand.
- _effective_rm = telegram_cfg.get("require_mention", yaml_cfg.get("require_mention"))
- if _effective_rm is not None and not os.getenv("TELEGRAM_REQUIRE_MENTION"):
- os.environ["TELEGRAM_REQUIRE_MENTION"] = str(_effective_rm).lower()
- if "mention_patterns" in telegram_cfg and not os.getenv("TELEGRAM_MENTION_PATTERNS"):
- os.environ["TELEGRAM_MENTION_PATTERNS"] = json.dumps(telegram_cfg["mention_patterns"])
- if "exclusive_bot_mentions" in telegram_cfg and not os.getenv("TELEGRAM_EXCLUSIVE_BOT_MENTIONS"):
- os.environ["TELEGRAM_EXCLUSIVE_BOT_MENTIONS"] = str(telegram_cfg["exclusive_bot_mentions"]).lower()
- if "guest_mode" in telegram_cfg and not os.getenv("TELEGRAM_GUEST_MODE"):
- os.environ["TELEGRAM_GUEST_MODE"] = str(telegram_cfg["guest_mode"]).lower()
- if "observe_unmentioned_group_messages" in telegram_cfg and not os.getenv("TELEGRAM_OBSERVE_UNMENTIONED_GROUP_MESSAGES"):
- os.environ["TELEGRAM_OBSERVE_UNMENTIONED_GROUP_MESSAGES"] = str(telegram_cfg["observe_unmentioned_group_messages"]).lower()
- frc = telegram_cfg.get("free_response_chats")
- if frc is not None and not os.getenv("TELEGRAM_FREE_RESPONSE_CHATS"):
- if isinstance(frc, list):
- frc = ",".join(str(v) for v in frc)
- os.environ["TELEGRAM_FREE_RESPONSE_CHATS"] = str(frc)
- # allowed_chats: if set, bot ONLY responds in these group chats (whitelist)
- ac = telegram_cfg.get("allowed_chats")
- if ac is not None and not os.getenv("TELEGRAM_ALLOWED_CHATS"):
- if isinstance(ac, list):
- ac = ",".join(str(v) for v in ac)
- os.environ["TELEGRAM_ALLOWED_CHATS"] = str(ac)
- allowed_topics = telegram_cfg.get("allowed_topics")
- if allowed_topics is not None and not os.getenv("TELEGRAM_ALLOWED_TOPICS"):
- if isinstance(allowed_topics, list):
- allowed_topics = ",".join(str(v) for v in allowed_topics)
- os.environ["TELEGRAM_ALLOWED_TOPICS"] = str(allowed_topics)
- ignored_threads = telegram_cfg.get("ignored_threads")
- if ignored_threads is not None and not os.getenv("TELEGRAM_IGNORED_THREADS"):
- if isinstance(ignored_threads, list):
- ignored_threads = ",".join(str(v) for v in ignored_threads)
- os.environ["TELEGRAM_IGNORED_THREADS"] = str(ignored_threads)
- if "reactions" in telegram_cfg and not os.getenv("TELEGRAM_REACTIONS"):
- os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
- if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
- os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
- # reply_to_mode: top-level preferred, falls back to extra.reply_to_mode
- # YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
- _telegram_extra = telegram_cfg.get("extra") if isinstance(telegram_cfg.get("extra"), dict) else {}
- _telegram_rtm = (
- telegram_cfg["reply_to_mode"] if "reply_to_mode" in telegram_cfg
- else _telegram_extra.get("reply_to_mode")
- )
- if _telegram_rtm is not None and not os.getenv("TELEGRAM_REPLY_TO_MODE"):
- _rtm_str = "off" if _telegram_rtm is False else str(_telegram_rtm).lower()
- os.environ["TELEGRAM_REPLY_TO_MODE"] = _rtm_str
- allowed_users = telegram_cfg.get("allow_from")
- if allowed_users is not None and not os.getenv("TELEGRAM_ALLOWED_USERS"):
- if isinstance(allowed_users, list):
- allowed_users = ",".join(str(v) for v in allowed_users)
- os.environ["TELEGRAM_ALLOWED_USERS"] = str(allowed_users)
- group_allowed_users = telegram_cfg.get("group_allow_from")
- if group_allowed_users is not None and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
- if isinstance(group_allowed_users, list):
- group_allowed_users = ",".join(str(v) for v in group_allowed_users)
- os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(group_allowed_users)
- group_allowed_chats = telegram_cfg.get("group_allowed_chats")
- if group_allowed_chats is not None and not os.getenv("TELEGRAM_GROUP_ALLOWED_CHATS"):
- if isinstance(group_allowed_chats, list):
- group_allowed_chats = ",".join(str(v) for v in group_allowed_chats)
- os.environ["TELEGRAM_GROUP_ALLOWED_CHATS"] = str(group_allowed_chats)
- for _telegram_extra_key in ("guest_mode", "disable_link_previews", "observe_unmentioned_group_messages"):
- if _telegram_extra_key in telegram_cfg:
- plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
- if not isinstance(plat_data, dict):
- plat_data = {}
- platforms_data[Platform.TELEGRAM.value] = plat_data
- extra = plat_data.setdefault("extra", {})
- if not isinstance(extra, dict):
- extra = {}
- plat_data["extra"] = extra
- extra[_telegram_extra_key] = telegram_cfg[_telegram_extra_key]
- if _telegram_extra:
- _plat_data, _plat_extra = _ensure_platform_extra_dict(
- platforms_data, Platform.TELEGRAM.value
- )
- for _telegram_extra_key, _telegram_extra_value in _telegram_extra.items():
- _plat_extra.setdefault(_telegram_extra_key, _telegram_extra_value)
+ # Telegram settings → env vars / extra: migrated to the telegram
+ # plugin's apply_yaml_config_fn hook
+ # (plugins/platforms/telegram/adapter.py). #41112 / #3823.
- whatsapp_cfg = yaml_cfg.get("whatsapp", {})
- if isinstance(whatsapp_cfg, dict):
- if "require_mention" in whatsapp_cfg and not os.getenv("WHATSAPP_REQUIRE_MENTION"):
- os.environ["WHATSAPP_REQUIRE_MENTION"] = str(whatsapp_cfg["require_mention"]).lower()
- if "mention_patterns" in whatsapp_cfg and not os.getenv("WHATSAPP_MENTION_PATTERNS"):
- os.environ["WHATSAPP_MENTION_PATTERNS"] = json.dumps(whatsapp_cfg["mention_patterns"])
- frc = whatsapp_cfg.get("free_response_chats")
- if frc is not None and not os.getenv("WHATSAPP_FREE_RESPONSE_CHATS"):
- if isinstance(frc, list):
- frc = ",".join(str(v) for v in frc)
- os.environ["WHATSAPP_FREE_RESPONSE_CHATS"] = str(frc)
- if "dm_policy" in whatsapp_cfg and not os.getenv("WHATSAPP_DM_POLICY"):
- os.environ["WHATSAPP_DM_POLICY"] = str(whatsapp_cfg["dm_policy"]).lower()
- af = whatsapp_cfg.get("allow_from")
- if af is not None and not os.getenv("WHATSAPP_ALLOWED_USERS"):
- if isinstance(af, list):
- af = ",".join(str(v) for v in af)
- os.environ["WHATSAPP_ALLOWED_USERS"] = str(af)
- if "group_policy" in whatsapp_cfg and not os.getenv("WHATSAPP_GROUP_POLICY"):
- os.environ["WHATSAPP_GROUP_POLICY"] = str(whatsapp_cfg["group_policy"]).lower()
- gaf = whatsapp_cfg.get("group_allow_from")
- if gaf is not None and not os.getenv("WHATSAPP_GROUP_ALLOWED_USERS"):
- if isinstance(gaf, list):
- gaf = ",".join(str(v) for v in gaf)
- os.environ["WHATSAPP_GROUP_ALLOWED_USERS"] = str(gaf)
+ # WhatsApp settings → env vars: migrated to the whatsapp plugin's
+ # apply_yaml_config_fn hook (plugins/platforms/whatsapp/adapter.py).
+ # #41112 / #3823.
# Signal settings → env vars (env vars take precedence)
signal_cfg = yaml_cfg.get("signal", {})
@@ -1198,72 +1113,20 @@ def load_gateway_config() -> GatewayConfig:
if "require_mention" in signal_cfg and not os.getenv("SIGNAL_REQUIRE_MENTION"):
os.environ["SIGNAL_REQUIRE_MENTION"] = str(signal_cfg["require_mention"]).lower()
- # DingTalk settings → env vars (env vars take precedence)
- dingtalk_cfg = yaml_cfg.get("dingtalk", {})
- if isinstance(dingtalk_cfg, dict):
- if "require_mention" in dingtalk_cfg and not os.getenv("DINGTALK_REQUIRE_MENTION"):
- os.environ["DINGTALK_REQUIRE_MENTION"] = str(dingtalk_cfg["require_mention"]).lower()
- if "mention_patterns" in dingtalk_cfg and not os.getenv("DINGTALK_MENTION_PATTERNS"):
- os.environ["DINGTALK_MENTION_PATTERNS"] = json.dumps(dingtalk_cfg["mention_patterns"])
- frc = dingtalk_cfg.get("free_response_chats")
- if frc is not None and not os.getenv("DINGTALK_FREE_RESPONSE_CHATS"):
- if isinstance(frc, list):
- frc = ",".join(str(v) for v in frc)
- os.environ["DINGTALK_FREE_RESPONSE_CHATS"] = str(frc)
- # allowed_chats: if set, bot ONLY responds in these group chats (whitelist)
- ac = dingtalk_cfg.get("allowed_chats")
- if ac is not None and not os.getenv("DINGTALK_ALLOWED_CHATS"):
- if isinstance(ac, list):
- ac = ",".join(str(v) for v in ac)
- os.environ["DINGTALK_ALLOWED_CHATS"] = str(ac)
- allowed = dingtalk_cfg.get("allowed_users")
- if allowed is not None and not os.getenv("DINGTALK_ALLOWED_USERS"):
- if isinstance(allowed, list):
- allowed = ",".join(str(v) for v in allowed)
- os.environ["DINGTALK_ALLOWED_USERS"] = str(allowed)
+ # DingTalk settings → env vars: migrated to the dingtalk plugin's
+ # apply_yaml_config_fn hook (plugins/platforms/dingtalk/adapter.py).
+ # #41112 / #3823.
# Mattermost config bridge moved into plugins/platforms/mattermost/
# adapter.py::_apply_yaml_config — see #25443 (apply_yaml_config_fn).
- # Matrix settings → env vars (env vars take precedence)
- matrix_cfg = yaml_cfg.get("matrix", {})
- if isinstance(matrix_cfg, dict):
- if "require_mention" in matrix_cfg and not os.getenv("MATRIX_REQUIRE_MENTION"):
- os.environ["MATRIX_REQUIRE_MENTION"] = str(matrix_cfg["require_mention"]).lower()
- allowed_users = matrix_cfg.get("allowed_users")
- if allowed_users is not None and not os.getenv("MATRIX_ALLOWED_USERS"):
- if isinstance(allowed_users, list):
- allowed_users = ",".join(str(v) for v in allowed_users)
- os.environ["MATRIX_ALLOWED_USERS"] = str(allowed_users)
- allowed_rooms = matrix_cfg.get("allowed_rooms")
- if allowed_rooms is not None and not os.getenv("MATRIX_ALLOWED_ROOMS"):
- if isinstance(allowed_rooms, list):
- allowed_rooms = ",".join(str(v) for v in allowed_rooms)
- os.environ["MATRIX_ALLOWED_ROOMS"] = str(allowed_rooms)
- frc = matrix_cfg.get("free_response_rooms")
- if frc is not None and not os.getenv("MATRIX_FREE_RESPONSE_ROOMS"):
- if isinstance(frc, list):
- frc = ",".join(str(v) for v in frc)
- os.environ["MATRIX_FREE_RESPONSE_ROOMS"] = str(frc)
- ignore_patterns = matrix_cfg.get("ignore_user_patterns")
- if ignore_patterns is not None and not os.getenv("MATRIX_IGNORE_USER_PATTERNS"):
- if isinstance(ignore_patterns, list):
- ignore_patterns = ",".join(str(v) for v in ignore_patterns)
- os.environ["MATRIX_IGNORE_USER_PATTERNS"] = str(ignore_patterns)
- if "process_notices" in matrix_cfg and not os.getenv("MATRIX_PROCESS_NOTICES"):
- os.environ["MATRIX_PROCESS_NOTICES"] = str(matrix_cfg["process_notices"]).lower()
- if "session_scope" in matrix_cfg and not os.getenv("MATRIX_SESSION_SCOPE"):
- os.environ["MATRIX_SESSION_SCOPE"] = str(matrix_cfg["session_scope"]).lower()
- if "auto_thread" in matrix_cfg and not os.getenv("MATRIX_AUTO_THREAD"):
- os.environ["MATRIX_AUTO_THREAD"] = str(matrix_cfg["auto_thread"]).lower()
- if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
- os.environ["MATRIX_DM_MENTION_THREADS"] = str(matrix_cfg["dm_mention_threads"]).lower()
+ # Matrix settings → env vars: migrated to the matrix plugin's
+ # apply_yaml_config_fn hook (plugins/platforms/matrix/adapter.py).
+ # #41112 / #3823.
- # Feishu settings → env vars (env vars take precedence)
- feishu_cfg = yaml_cfg.get("feishu", {})
- if isinstance(feishu_cfg, dict):
- if "allow_bots" in feishu_cfg and not os.getenv("FEISHU_ALLOW_BOTS"):
- os.environ["FEISHU_ALLOW_BOTS"] = str(feishu_cfg["allow_bots"]).lower()
+ # Feishu settings → env vars: migrated to the feishu plugin's
+ # apply_yaml_config_fn hook (plugins/platforms/feishu/adapter.py).
+ # #41112 / #3823.
except Exception as e:
logger.warning(
@@ -1362,7 +1225,13 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
return config.platforms[platform]
platform_config = config.platforms[platform]
- enabled_was_explicit = bool(platform_config.extra.pop("_enabled_explicit", False))
+ # Read (don't pop) the explicit-enable marker: the registry-driven
+ # plugin-enable pass later in this function also needs it to avoid
+ # re-enabling a platform the user explicitly disabled (migrated plugin
+ # platforms — telegram, matrix — flow through here too, #41112). The
+ # flag is cleared once for all platforms in the final cleanup at the
+ # end of _apply_env_overrides.
+ enabled_was_explicit = bool(platform_config.extra.get("_enabled_explicit", False))
if not platform_config.enabled and not enabled_was_explicit:
platform_config.enabled = True
return platform_config
@@ -1505,7 +1374,12 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.SLACK].enabled = True
else:
slack_config = config.platforms[Platform.SLACK]
- enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
+ # Read (don't pop) the explicit-enable marker: the registry-driven
+ # plugin-enable pass below also needs it to avoid re-enabling a
+ # platform the user explicitly disabled (Slack is now a plugin
+ # entry — #41112). The flag is cleared once for all platforms in
+ # the final cleanup at the end of _apply_env_overrides.
+ enabled_was_explicit = bool(slack_config.extra.get("_enabled_explicit", False))
if not slack_config.enabled and not enabled_was_explicit:
# Top-level Slack settings such as channel prompts should not
# turn an env-token setup into a disabled platform. Only an
@@ -1831,7 +1705,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
"token": os.getenv("WECOM_CALLBACK_TOKEN", ""),
"encoding_aes_key": os.getenv("WECOM_CALLBACK_ENCODING_AES_KEY", ""),
"host": os.getenv("WECOM_CALLBACK_HOST", "0.0.0.0"),
- "port": int(os.getenv("WECOM_CALLBACK_PORT", "8645")),
+ "port": env_int("WECOM_CALLBACK_PORT", 8645),
})
# Weixin (personal WeChat via iLink Bot API)
@@ -1887,7 +1761,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
"server_url": bluebubbles_server_url.rstrip("/"),
"password": bluebubbles_password,
"webhook_host": os.getenv("BLUEBUBBLES_WEBHOOK_HOST", "127.0.0.1"),
- "webhook_port": int(os.getenv("BLUEBUBBLES_WEBHOOK_PORT", "8645")),
+ "webhook_port": env_int("BLUEBUBBLES_WEBHOOK_PORT", 8645),
"webhook_path": os.getenv("BLUEBUBBLES_WEBHOOK_PATH", "/bluebubbles-webhook"),
"send_read_receipts": os.getenv("BLUEBUBBLES_SEND_READ_RECEIPTS", "true").lower() in {"true", "1", "yes"},
})
@@ -2040,13 +1914,24 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
from gateway.platform_registry import platform_registry
for entry in platform_registry.plugin_entries():
try:
- if not entry.check_fn():
- continue
+ platform = Platform(entry.name)
except Exception as e:
- logger.debug("check_fn for %s raised: %s", entry.name, e)
+ logger.debug("unknown platform name %r: %s", entry.name, e)
continue
- platform = Platform(entry.name)
existing_cfg = config.platforms.get(platform)
+ # Respect an explicit ``enabled: false`` (YAML / gateway.json /
+ # dashboard PUT). ``_enabled_explicit`` is set in
+ # load_gateway_config() (via _merge_platform_map / the shared-key
+ # loop) when the user wrote ``enabled`` for this platform; if they
+ # explicitly disabled it, never re-enable here just because
+ # check_fn() / is_connected() pass (e.g. a token is present but the
+ # user set telegram.enabled: false). #41112.
+ if (
+ existing_cfg is not None
+ and not existing_cfg.enabled
+ and bool((existing_cfg.extra or {}).get("_enabled_explicit", False))
+ ):
+ continue
# Seed candidate extras from ``env_enablement_fn`` so plugins
# whose ``is_connected`` reads ``config.extra`` (e.g. Google
# Chat's ``_is_connected`` checks ``config.extra["project_id"]``)
@@ -2116,6 +2001,22 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
entry.name,
)
continue
+ # Verify dependencies LAST — only for platforms that are already
+ # enabled or passed the credential gate above. For adapter plugins
+ # ``check_fn`` lazy-INSTALLS the platform SDK (pip) as a side
+ # effect, so running it as an unconditional sweep over every
+ # registered platform made ``load_gateway_config()`` pip-install
+ # Discord/Telegram/Slack/Feishu/DingTalk on every call, including
+ # the desktop/dashboard readiness probe (``GET /api/status``, which
+ # awaits this synchronously), even when the user configured none
+ # of them. That blocked startup until every install finished and
+ # caused the desktop app to time out and boot-loop (stuck at 94%).
+ try:
+ if not entry.check_fn():
+ continue
+ except Exception as e:
+ logger.debug("check_fn for %s raised: %s", entry.name, e)
+ continue
if platform not in config.platforms:
config.platforms[platform] = PlatformConfig()
config.platforms[platform].enabled = True
@@ -2143,5 +2044,24 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
except Exception as e:
logger.debug("Plugin platform enable pass failed: %s", e)
+ # Relay (generic connector-fronted platform, EXPERIMENTAL). Enabled when a
+ # connector relay URL is configured via GATEWAY_RELAY_URL (env) or
+ # gateway.relay_url (config.yaml). The adapter is registered into the
+ # platform_registry at gateway startup (gateway.relay.register_relay_adapter)
+ # and dials OUT to the connector — so, like Telegram/Matrix, it has no public
+ # inbound port and just needs Platform.RELAY present+enabled in
+ # config.platforms for start_gateway()'s connect loop to bring it up. The
+ # connected-checker (Platform.RELAY in _PLATFORM_CONNECTED_CHECKERS) keys on
+ # extra["relay_url"], so mirror the URL into extra here.
+ relay_url_env = os.getenv("GATEWAY_RELAY_URL", "").strip()
+ relay_url_yaml = ""
+ existing_relay = config.platforms.get(Platform.RELAY)
+ if existing_relay is not None:
+ relay_url_yaml = str(existing_relay.extra.get("relay_url") or "").strip()
+ relay_url_val = relay_url_env or relay_url_yaml
+ if relay_url_val:
+ relay_config = _enable_from_env(Platform.RELAY)
+ relay_config.extra["relay_url"] = relay_url_val.rstrip("/")
+
for platform_config in config.platforms.values():
platform_config.extra.pop("_enabled_explicit", None)
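Review note on the `_enabled_explicit` change above: a minimal, self-contained sketch of why the marker is read with `.get()` in each pass and cleared only once at the end. The `PlatformConfig` dataclass and the helper names here are hypothetical stand-ins for illustration, not the gateway's real types; only the `_enabled_explicit` key and the read-then-clear-once ordering come from the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PlatformConfig:
    """Hypothetical stand-in for the gateway's PlatformConfig."""
    enabled: bool = False
    extra: Dict[str, Any] = field(default_factory=dict)


def explicitly_disabled(cfg: PlatformConfig) -> bool:
    """True when the platform is off AND the user wrote ``enabled`` themselves."""
    # .get() leaves the marker in place so later passes can still see it.
    return not cfg.enabled and bool(cfg.extra.get("_enabled_explicit", False))


def enable_from_env(cfg: PlatformConfig) -> None:
    """First pass: credentials in the environment enable the platform."""
    if not explicitly_disabled(cfg):
        cfg.enabled = True


def plugin_enable_pass(cfg: PlatformConfig) -> None:
    """Second pass: the registry-driven sweep applies the same guard."""
    if explicitly_disabled(cfg):
        return
    cfg.enabled = True


def apply_overrides(platforms: Dict[str, PlatformConfig]) -> None:
    """Run both passes, then clear the marker once for every platform."""
    for cfg in platforms.values():
        enable_from_env(cfg)
    for cfg in platforms.values():
        plugin_enable_pass(cfg)
    for cfg in platforms.values():
        cfg.extra.pop("_enabled_explicit", None)


if __name__ == "__main__":
    platforms = {
        "telegram": PlatformConfig(enabled=False, extra={"_enabled_explicit": True}),
        "slack": PlatformConfig(enabled=False),
    }
    apply_overrides(platforms)
    for name, cfg in platforms.items():
        print(name, cfg.enabled, cfg.extra)
```

Had the first pass used `pop()`, the second pass would no longer see the marker and would re-enable the platform the user turned off, which is the regression the diff guards against.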
diff --git a/gateway/kanban_watchers.py b/gateway/kanban_watchers.py
index 328cbd7fb5b..5bcf70c8d21 100644
--- a/gateway/kanban_watchers.py
+++ b/gateway/kanban_watchers.py
@@ -16,13 +16,97 @@ import os
import sqlite3
import time
from pathlib import Path
-from typing import Any, Optional
+from typing import Any, Callable, Optional
# Match the logger run.py uses (logging.getLogger(__name__) where __name__ ==
# "gateway.run") so extracted log records keep their original logger name.
logger = logging.getLogger("gateway.run")
+def _resolve_auto_decompose_settings(
+ load_config: Callable[[], Any],
+) -> "tuple[bool, int]":
+ """Resolve the live (enabled, per_tick) auto-decompose settings.
+
+ Read fresh from config on every dispatcher tick (#49638) so that flipping
+ ``kanban.auto_decompose: false`` to STOP runaway fan-out takes effect on the
+ next tick instead of requiring a gateway restart. Auto-decompose is a
+ safety toggle — a user who sees it create and launch tasks they didn't
+ intend reaches for this flag to halt it, and a stale boot-captured value
+ silently ignoring that change is the bug reported in #49638.
+
+ Fails **safe**: if the config read raises, return ``(False, 3)`` — a
+ transient read error must never re-enable a feature the user turned off,
+ nor fall back to the burst-prone default-on behaviour. ``per_tick`` is
+ clamped to ``>= 1``.
+ """
+ try:
+ cfg = load_config()
+ except Exception:
+ return False, 3
+ kcfg = cfg.get("kanban", {}) if isinstance(cfg, dict) else {}
+ enabled = bool(kcfg.get("auto_decompose", True))
+ try:
+ per_tick = int(kcfg.get("auto_decompose_per_tick", 3) or 3)
+ except (TypeError, ValueError):
+ per_tick = 3
+ if per_tick < 1:
+ per_tick = 1
+ return enabled, per_tick
+
+
+def _acquire_singleton_lock(lock_path) -> "tuple[Optional[object], str]":
+ """Take an exclusive, non-blocking advisory lock for the sole dispatcher.
+
+ Only one gateway process machine-wide may run the embedded kanban
+ dispatcher: concurrent dispatchers double the reclaim frequency (each
+ runs its own ``release_stale_claims`` → promote → dispatch loop), double
+ claim-attempt events in the event log, and — with ``wal_autocheckpoint=0`` —
+ concurrent manual WAL checkpoints can corrupt index pages. The
+ ``dispatch_in_gateway`` config flag is the primary control; this lock is the
+ backstop that survives config drift and same-profile restart races.
+
+ Delegates to :func:`gateway.status._try_acquire_file_lock` (``fcntl`` on
+ POSIX, ``msvcrt`` on Windows) so the guard is cross-platform.
+
+ Returns ``(handle, "held")`` on success — the caller keeps the file handle
+ for the process lifetime and **must** release it via
+ :func:`_release_singleton_lock` when done. ``(None, "contended")`` when
+ another process holds the lock (caller must NOT dispatch). ``(None,
+ "unavailable")`` when locking cannot be attempted (the lock file cannot
+ be opened, or the status.py helpers are unimportable); the caller falls
+ back to config-only control.
+ """
+ try:
+ from gateway.status import _try_acquire_file_lock # deferred; same package
+ except ImportError:
+ return None, "unavailable"
+ try:
+ Path(lock_path).parent.mkdir(parents=True, exist_ok=True)
+ handle = open(str(lock_path), "a+", encoding="utf-8")
+ except OSError:
+ return None, "unavailable"
+ if not _try_acquire_file_lock(handle):
+ handle.close()
+ return None, "contended"
+ return handle, "held"
+
+
+def _release_singleton_lock(handle) -> None:
+ """Release a dispatcher singleton lock acquired via :func:`_acquire_singleton_lock`."""
+ if handle is None:
+ return
+ try:
+ from gateway.status import _release_file_lock
+ _release_file_lock(handle)
+ except Exception:
+ pass
+ try:
+ handle.close()
+ except Exception:
+ pass
+
+
class GatewayKanbanWatchersMixin:
"""Kanban watcher / notifier / dispatcher loops for GatewayRunner."""
@@ -606,6 +690,31 @@ class GatewayKanbanWatchersMixin:
logger.warning("kanban dispatcher: kanban_db not importable; dispatcher disabled")
return
+ # Single-dispatcher backstop. dispatch_in_gateway defaults to true, so a
+ # new profile gateway (or a same-profile restart race) can silently
+ # start a second dispatcher; concurrent dispatchers double reclaim
+ # frequency, double claim-attempt events, and — with
+ # wal_autocheckpoint=0 — concurrent manual WAL checkpoints can corrupt
+ # index pages. The lock lives at the machine-global kanban root
+ # (shared across profiles by design), so it serialises ALL gateways.
+ self._kanban_dispatcher_lock_handle = None
+ _lock_path = _kb.kanban_home() / "kanban" / ".dispatcher.lock"
+ _lock_handle, _lock_state = _acquire_singleton_lock(_lock_path)
+ if _lock_state == "contended":
+ logger.info(
+ "kanban dispatcher: another gateway already holds the dispatcher "
+ "lock (%s); this gateway will NOT dispatch.", _lock_path,
+ )
+ return
+ if _lock_state == "held":
+ self._kanban_dispatcher_lock_handle = _lock_handle # hold for process lifetime
+ logger.info("kanban dispatcher: holding singleton dispatcher lock (%s)", _lock_path)
+ else:
+ logger.warning(
+ "kanban dispatcher: advisory lock unavailable at %s; proceeding "
+ "on config control alone.", _lock_path,
+ )
+
try:
interval = float(kanban_cfg.get("dispatch_interval_seconds", 60) or 60)
except (ValueError, TypeError):
@@ -908,17 +1017,20 @@ class GatewayKanbanWatchersMixin:
# ``kanban.auto_decompose_per_tick`` (default 3) so a bulk-load
# of triage tasks doesn't burst-spend the aux LLM in one tick;
# remainder defers to subsequent ticks.
- auto_decompose_enabled = bool(kanban_cfg.get("auto_decompose", True))
- try:
- auto_decompose_per_tick = int(
- kanban_cfg.get("auto_decompose_per_tick", 3) or 3
- )
- except (TypeError, ValueError):
- auto_decompose_per_tick = 3
- if auto_decompose_per_tick < 1:
- auto_decompose_per_tick = 1
+ #
+ # The flag is re-read from config EVERY tick (#49638) rather than
+ # captured once at boot. Auto-decompose is a safety toggle: a user who
+ # sees it fan out and run tasks they didn't intend reaches for
+ # ``kanban.auto_decompose: false`` to STOP it — and that must take
+ # effect on the next tick, not require a gateway restart. (Reported:
+ # auto-decompose created and launched destructive tasks while the user
+ # was still typing the task description, and the flag "couldn't be
+ # disabled" because the gateway had captured its boot-time value.)
+ def _read_auto_decompose_settings() -> tuple[bool, int]:
+ """Re-resolve (enabled, per_tick) from current config each tick."""
+ return _resolve_auto_decompose_settings(_load_config)
- def _auto_decompose_tick() -> int:
+ def _auto_decompose_tick(auto_decompose_per_tick: int) -> int:
"""Run the auto-decomposer for up to N triage tasks across all
boards. Returns the number of triage tasks that were
successfully decomposed or specified this tick.
@@ -1013,8 +1125,12 @@ class GatewayKanbanWatchersMixin:
logger.exception("kanban dispatcher: zombie reaper failed")
try:
- if auto_decompose_enabled:
- await asyncio.to_thread(_auto_decompose_tick)
+ # Re-read the auto-decompose toggle live each tick so a user
+ # flipping kanban.auto_decompose=false to STOP runaway fan-out
+ # takes effect on the next tick, not on gateway restart (#49638).
+ _ad_enabled, _ad_per_tick = _read_auto_decompose_settings()
+ if _ad_enabled:
+ await asyncio.to_thread(_auto_decompose_tick, _ad_per_tick)
results = await asyncio.to_thread(_tick_once)
any_spawned = False
for slug, res in (results or []):
@@ -1052,6 +1168,8 @@ class GatewayKanbanWatchersMixin:
last_warn_at = now
except asyncio.CancelledError:
logger.debug("kanban dispatcher: cancelled")
+ _release_singleton_lock(self._kanban_dispatcher_lock_handle)
+ self._kanban_dispatcher_lock_handle = None
raise
except Exception:
logger.exception("kanban dispatcher: unexpected watcher error")
@@ -1062,3 +1180,6 @@ class GatewayKanbanWatchersMixin:
while slept < interval and self._running:
await asyncio.sleep(min(1.0, interval - slept))
slept += 1.0
+
+ _release_singleton_lock(self._kanban_dispatcher_lock_handle)
+ self._kanban_dispatcher_lock_handle = None
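Review note on the singleton dispatcher lock above: a rough sketch of the three-state contract (`"held"`, `"contended"`, `"unavailable"`). It assumes a POSIX system and calls `fcntl.flock` directly, whereas the diff delegates to `gateway.status._try_acquire_file_lock`, whose implementation is not shown here. The function names below are illustrative and not part of the codebase.

```python
import fcntl
import os
import tempfile
from typing import IO, Optional, Tuple


def acquire_singleton_lock(lock_path: str) -> Tuple[Optional[IO[str]], str]:
    """Try to take an exclusive, non-blocking advisory lock on lock_path."""
    try:
        os.makedirs(os.path.dirname(lock_path) or ".", exist_ok=True)
        handle = open(lock_path, "a+", encoding="utf-8")
    except OSError:
        # The lock file itself cannot be created or opened.
        return None, "unavailable"
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Simplification for this sketch: any flock failure (typically
        # BlockingIOError when another holder exists) counts as contended.
        handle.close()
        return None, "contended"
    return handle, "held"


def release_singleton_lock(handle: Optional[IO[str]]) -> None:
    """Unlock and close a handle returned by acquire_singleton_lock."""
    if handle is None:
        return
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    finally:
        handle.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kanban", ".dispatcher.lock")
    first, first_state = acquire_singleton_lock(path)
    second, second_state = acquire_singleton_lock(path)
    print(first_state, second_state)
    release_singleton_lock(first)
    release_singleton_lock(second)
```

The caller keeps the returned handle open for as long as it dispatches; closing the handle (or process exit) drops the lock, so a crashed gateway does not leave a stale lock behind.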
diff --git a/gateway/platforms/api_server.py b/gateway/platforms/api_server.py
index da86952a09d..7970e704ba8 100644
--- a/gateway/platforms/api_server.py
+++ b/gateway/platforms/api_server.py
@@ -717,6 +717,16 @@ except ImportError:
_cron_resume = None
_cron_trigger = None
+
+def _notify_cron_provider_jobs_changed() -> None:
+ """Tell the active cron scheduler provider the job set changed after a REST
+ mutation (no-op for the built-in). Best-effort — never breaks the handler."""
+ try:
+ from cron.scheduler import _notify_provider_jobs_changed
+ _notify_provider_jobs_changed()
+ except Exception:
+ pass
+
# Defense-in-depth: mirror the agent-facing cronjob tool, which scans the
# user-supplied prompt for exfiltration/injection payloads at create/update
# time (tools/cronjob_tools.py). The REST cron endpoints are authenticated
@@ -739,6 +749,16 @@ class APIServerAdapter(BasePlatformAdapter):
and routes them through hermes-agent's AIAgent.
"""
+ # Stateless request/response: every route (the OpenAI-spec
+ # /v1/chat/completions and /v1/responses, and the proprietary /v1/runs SSE
+ # stream) tears down its channel when the turn ends. There is no persistent
+ # outbound channel to push a background completion to a client that already
+ # received its response, and ``send()`` is a no-op stub. So async-delivery
+ # tools (terminal notify_on_complete / watch_patterns, delegate_task
+ # background=True) must NOT promise delivery on this path — see
+ # ``async_delivery_supported()``.
+ supports_async_delivery: bool = False
+
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.API_SERVER)
extra = config.extra or {}
@@ -772,6 +792,15 @@ class APIServerAdapter(BasePlatformAdapter):
# in-flight run by run_id.
self._run_approval_sessions: Dict[str, str] = {}
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
+ # Concurrency cap shared across all agent-serving endpoints
+ # (/v1/chat/completions, /v1/responses, /v1/runs). Read from
+ # config.yaml gateway.api_server.max_concurrent_runs; 0 disables
+ # the cap. Bounds CPU / memory / upstream-LLM-quota exhaustion
+ # from a request flood (#7483).
+ self._max_concurrent_runs: int = self._resolve_max_concurrent_runs()
+ # Number of in-flight runs on the non-streaming chat/responses paths
+ # (the /v1/runs path tracks its own in-flight set via _run_streams).
+ self._inflight_agent_runs: int = 0
@staticmethod
def _parse_cors_origins(value: Any) -> tuple[str, ...]:
@@ -788,6 +817,30 @@ class APIServerAdapter(BasePlatformAdapter):
return tuple(str(item).strip() for item in items if str(item).strip())
+ @staticmethod
+ def _resolve_max_concurrent_runs() -> int:
+ """Read the concurrent-run cap from config.yaml (0 disables).
+
+ gateway.api_server.max_concurrent_runs. Falls back to the historical
+ default of 10 when unset or malformed. Negative values are clamped
+ to 0 (disabled).
+ """
+ default = 10
+ try:
+ from hermes_cli.config import cfg_get, load_config
+
+ raw = cfg_get(
+ load_config(),
+ "gateway",
+ "api_server",
+ "max_concurrent_runs",
+ default=default,
+ )
+ value = int(raw)
+ except Exception:
+ return default
+ return max(0, value)
+
@staticmethod
def _resolve_model_name(explicit: str) -> str:
"""Derive the advertised model name for /v1/models.
@@ -1033,7 +1086,13 @@ class APIServerAdapter(BasePlatformAdapter):
— matching the semantics of the native gateway's ``session_key``.
"""
from run_agent import AIAgent
- from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config, GatewayRunner
+ from gateway.run import (
+ _current_max_iterations,
+ _resolve_runtime_agent_kwargs,
+ _resolve_gateway_model,
+ _load_gateway_config,
+ GatewayRunner,
+ )
from hermes_cli.tools_config import _get_platform_tools
runtime_kwargs = _resolve_runtime_agent_kwargs()
@@ -1043,7 +1102,7 @@ class APIServerAdapter(BasePlatformAdapter):
user_config = _load_gateway_config()
enabled_toolsets = sorted(_get_platform_tools(user_config, "api_server"))
- max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
+ max_iterations = _current_max_iterations()
# Load fallback provider chain so the API server platform has the
# same fallback behaviour as Telegram/Discord/Slack (fixes #4954).
@@ -1087,16 +1146,35 @@ class APIServerAdapter(BasePlatformAdapter):
dashboard can display full status without needing a shared PID file or
/proc access. No authentication required.
"""
- from gateway.status import read_runtime_status
+ from gateway.status import (
+ derive_gateway_busy,
+ derive_gateway_drainable,
+ parse_active_agents,
+ read_runtime_status,
+ )
runtime = read_runtime_status() or {}
+ gw_state = runtime.get("gateway_state")
+ gw_active = parse_active_agents(runtime.get("active_agents", 0))
+ # This endpoint is served BY the gateway process, so it is by definition
+ # alive — gateway_running is True. Derive busy/drainable from the same
+ # shared contract /api/status uses so the two surfaces never disagree.
return web.json_response({
"status": "ok",
"platform": "hermes-agent",
"version": _hermes_version(),
- "gateway_state": runtime.get("gateway_state"),
+ "gateway_state": gw_state,
"platforms": runtime.get("platforms", {}),
- "active_agents": runtime.get("active_agents", 0),
+ "active_agents": gw_active,
+ "gateway_busy": derive_gateway_busy(
+ gateway_running=True,
+ gateway_state=gw_state,
+ active_agents=gw_active,
+ ),
+ "gateway_drainable": derive_gateway_drainable(
+ gateway_running=True,
+ gateway_state=gw_state,
+ ),
"exit_reason": runtime.get("exit_reason"),
"updated_at": runtime.get("updated_at"),
"pid": os.getpid(),
@@ -1732,6 +1810,11 @@ class APIServerAdapter(BasePlatformAdapter):
if auth_err:
return auth_err
+ # Bound total in-flight agent runs (configurable; #7483).
+ limited = self._concurrency_limited_response()
+ if limited is not None:
+ return limited
+
# Parse request body
try:
body = await request.json()
@@ -2801,6 +2884,11 @@ class APIServerAdapter(BasePlatformAdapter):
if auth_err:
return auth_err
+ # Bound total in-flight agent runs (configurable; #7483).
+ limited = self._concurrency_limited_response()
+ if limited is not None:
+ return limited
+
# Long-term memory scope header (see chat_completions for details).
gateway_session_key, key_err = self._parse_session_key_header(request)
if key_err is not None:
@@ -3206,6 +3294,7 @@ class APIServerAdapter(BasePlatformAdapter):
kwargs["repeat"] = repeat
job = _cron_create(**kwargs)
+ _notify_cron_provider_jobs_changed()
return web.json_response({"job": job})
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@@ -3262,6 +3351,7 @@ class APIServerAdapter(BasePlatformAdapter):
job = _cron_update(job_id, sanitized)
if not job:
return web.json_response({"error": "Job not found"}, status=404)
+ _notify_cron_provider_jobs_changed()
return web.json_response({"job": job})
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@@ -3281,6 +3371,7 @@ class APIServerAdapter(BasePlatformAdapter):
success = _cron_remove(job_id)
if not success:
return web.json_response({"error": "Job not found"}, status=404)
+ _notify_cron_provider_jobs_changed()
return web.json_response({"ok": True})
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@@ -3300,6 +3391,7 @@ class APIServerAdapter(BasePlatformAdapter):
job = _cron_pause(job_id)
if not job:
return web.json_response({"error": "Job not found"}, status=404)
+ _notify_cron_provider_jobs_changed()
return web.json_response({"job": job})
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@@ -3319,6 +3411,7 @@ class APIServerAdapter(BasePlatformAdapter):
job = _cron_resume(job_id)
if not job:
return web.json_response({"error": "Job not found"}, status=404)
+ _notify_cron_provider_jobs_changed()
return web.json_response({"job": job})
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@@ -3342,6 +3435,64 @@ class APIServerAdapter(BasePlatformAdapter):
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
+ async def _handle_cron_fire(self, request: "web.Request") -> "web.Response":
+ """POST /api/cron/fire — Chronos managed-cron fire webhook (NAS → agent).
+
+ Authenticated by a NAS-minted JWT (verified via the pluggable
+ fire-verifier), NOT API_SERVER_KEY — NAS holds no API server key, and
+ this is the only inbound that can trigger remote job execution, so it
+ gets its own purpose-scoped token check.
+
+ Returns 202 and runs the job in the background so a long agent turn never
+ trips NAS's HTTP timeout. The store CAS claim inside fire_due guards
+ against double-fire on a NAS/scheduler retry.
+ """
+ from hermes_cli.config import cfg_get, load_config
+ from plugins.cron.chronos.verify import get_fire_verifier
+
+ auth = request.headers.get("Authorization", "")
+ token = auth[7:].strip() if auth.startswith("Bearer ") else ""
+
+ cfg = load_config()
+ claims = get_fire_verifier()(
+ token=token,
+ expected_audience=cfg_get(cfg, "cron", "chronos", "expected_audience", default=""),
+ jwks_or_key=cfg_get(cfg, "cron", "chronos", "nas_jwks_url", default="") or None,
+ issuer=cfg_get(cfg, "cron", "chronos", "portal_url", default="") or None,
+ )
+ if claims is None:
+ logger.warning(
+ "cron fire: rejected invalid token: %s",
+ self._request_audit_log_suffix(request),
+ )
+ return web.json_response({"error": "invalid fire token"}, status=401)
+
+ try:
+ body = await request.json()
+ except Exception:
+ body = {}
+ job_id = (body or {}).get("job_id")
+ if not job_id:
+ return web.json_response({"error": "missing job_id"}, status=400)
+
+ from cron.scheduler_provider import resolve_cron_scheduler
+ provider = resolve_cron_scheduler()
+
+ loop = asyncio.get_running_loop()
+ # Fire in the background (202 immediately). fire_due claims via the
+ # store CAS, so a retry while this is in flight is de-duped.
+ task = asyncio.create_task(
+ asyncio.to_thread(provider.fire_due, job_id, adapters=None, loop=loop)
+ )
+ try:
+ self._background_tasks.add(task)
+ task.add_done_callback(self._background_tasks.discard)
+ except (TypeError, AttributeError):
+ pass
+
+ return web.json_response({"status": "accepted", "job_id": job_id}, status=202)
+
+
# ------------------------------------------------------------------
# Output extraction helper
# ------------------------------------------------------------------
@@ -3489,6 +3640,63 @@ class APIServerAdapter(BasePlatformAdapter):
# Agent execution
# ------------------------------------------------------------------
+ def _concurrency_limited_response(self) -> Optional["web.Response"]:
+ """Return a 429 response if the concurrent-run cap is reached, else None.
+
+ The cap bounds total in-flight agent activity across every
+ agent-serving endpoint: the non-streaming chat/responses paths
+ (tracked by ``_inflight_agent_runs``) plus the ``/v1/runs`` streaming
+ path (tracked by ``_run_streams``). A configured value of 0 disables
+ the cap entirely.
+ """
+ limit = self._max_concurrent_runs
+ if limit <= 0:
+ return None
+ inflight = self._inflight_agent_runs + len(self._run_streams)
+ if inflight >= limit:
+ return web.json_response(
+ _openai_error(
+ f"Too many concurrent runs (max {limit})",
+ err_type="rate_limit_error",
+ code="rate_limit_exceeded",
+ ),
+ status=429,
+ headers={"Retry-After": "1"},
+ )
+ return None
+
+ @staticmethod
+ def _bind_api_server_session(
+ *,
+ chat_id: str = "",
+ session_key: str = "",
+ session_id: str = "",
+ ) -> list:
+ """Bind session contextvars for an API-server agent run.
+
+ This is the SINGLE structural chokepoint every API-server agent-entry
+ path must use to seed session context — it hardwires
+ ``platform="api_server"`` and ``async_delivery=False`` so a new route
+ physically cannot reintroduce the silent-no-op bug (#10760) by
+ forgetting to mark the channel as non-delivering. There is no
+ ``async_delivery`` parameter to get wrong; the stateless HTTP path can
+ never wake the agent after the turn ends, on ANY route.
+
+ Returns reset tokens; pass them to ``clear_session_vars`` in a
+ ``finally`` block (the binding is request-scoped and must not outlive
+ the turn — a session resumed later on a delivering interface, e.g. the
+ CLI or a gateway platform, re-binds fresh and is NOT blocked).
+ """
+ from gateway.session_context import set_session_vars
+
+ return set_session_vars(
+ platform="api_server",
+ chat_id=chat_id,
+ session_key=session_key,
+ session_id=session_id,
+ async_delivery=False,
+ )
+
async def _run_agent(
self,
user_message: str,
@@ -3516,10 +3724,9 @@ class APIServerAdapter(BasePlatformAdapter):
loop = asyncio.get_running_loop()
def _run():
- from gateway.session_context import clear_session_vars, set_session_vars
+ from gateway.session_context import clear_session_vars
- tokens = set_session_vars(
- platform="api_server",
+ tokens = self._bind_api_server_session(
chat_id=session_id or "",
session_key=gateway_session_key or session_id or "",
session_id=session_id or "",
@@ -3557,13 +3764,16 @@ class APIServerAdapter(BasePlatformAdapter):
finally:
clear_session_vars(tokens)
- return await loop.run_in_executor(None, _run)
+ self._inflight_agent_runs += 1
+ try:
+ return await loop.run_in_executor(None, _run)
+ finally:
+ self._inflight_agent_runs -= 1
# ------------------------------------------------------------------
# /v1/runs — structured event streaming
# ------------------------------------------------------------------
- _MAX_CONCURRENT_RUNS = 10 # Prevent unbounded resource allocation
_RUN_STREAM_TTL = 300 # seconds before orphaned runs are swept
_RUN_STATUS_TTL = 3600 # seconds to retain terminal run status for polling
@@ -3639,12 +3849,11 @@ class APIServerAdapter(BasePlatformAdapter):
if key_err is not None:
return key_err
- # Enforce concurrency limit
- if len(self._run_streams) >= self._MAX_CONCURRENT_RUNS:
- return web.json_response(
- _openai_error(f"Too many concurrent runs (max {self._MAX_CONCURRENT_RUNS})", code="rate_limit_exceeded"),
- status=429,
- )
+ # Enforce concurrency limit (shared across all agent-serving
+ # endpoints; configurable via gateway.api_server.max_concurrent_runs).
+ limited = self._concurrency_limited_response()
+ if limited is not None:
+ return limited
try:
body = await request.json()
@@ -3772,7 +3981,7 @@ class APIServerAdapter(BasePlatformAdapter):
pass
def _run_sync():
- from gateway.session_context import clear_session_vars, set_session_vars
+ from gateway.session_context import clear_session_vars
from tools.approval import (
register_gateway_notify,
reset_current_session_key,
@@ -3788,8 +3997,7 @@ class APIServerAdapter(BasePlatformAdapter):
# contextvars so concurrent runs do not share process
# environment state.
approval_token = set_current_session_key(approval_session_key)
- session_tokens = set_session_vars(
- platform="api_server",
+ session_tokens = self._bind_api_server_session(
session_key=approval_session_key,
)
register_gateway_notify(approval_session_key, _approval_notify)
@@ -4196,6 +4404,11 @@ class APIServerAdapter(BasePlatformAdapter):
self._app.router.add_post("/api/jobs/{job_id}/pause", self._handle_pause_job)
self._app.router.add_post("/api/jobs/{job_id}/resume", self._handle_resume_job)
self._app.router.add_post("/api/jobs/{job_id}/run", self._handle_run_job)
+
+ # Chronos managed-cron fire webhook (NAS → agent). Authenticated by a
+ # NAS-minted JWT (NOT API_SERVER_KEY), so it has its own auth path.
+ if _CRON_AVAILABLE:
+ self._app.router.add_post("/api/cron/fire", self._handle_cron_fire)
# Structured event streaming
self._app.router.add_post("/v1/runs", self._handle_runs)
self._app.router.add_get("/v1/runs/{run_id}", self._handle_get_run)
@@ -4228,23 +4441,56 @@ class APIServerAdapter(BasePlatformAdapter):
)
return False
- # Refuse to start network-accessible with a placeholder key.
- # Ported from openclaw/openclaw#64586.
+ # Refuse to start network-accessible with a placeholder or weak key.
+ # Ported from openclaw/openclaw#64586; minimum length raised to 16 in
+ # the June 2026 hermes-0day hardening (an 8-char key dispatching
+ # terminal-capable agent work on a public bind is brute-forceable).
if is_network_accessible(self._host) and self._api_key:
try:
from hermes_cli.auth import has_usable_secret
- if not has_usable_secret(self._api_key, min_length=8):
+ if not has_usable_secret(self._api_key, min_length=16):
logger.error(
- "[%s] Refusing to start: API_SERVER_KEY is set to a "
- "placeholder value. Generate a real secret "
- "(e.g. `openssl rand -hex 32`) and set API_SERVER_KEY "
- "before exposing the API server on %s.",
+ "[%s] Refusing to start: API_SERVER_KEY is a "
+ "placeholder or too short (<16 chars) for a "
+ "network-accessible bind. This endpoint dispatches "
+ "terminal-capable agent work — a guessable key is "
+ "remote code execution. Generate a strong secret "
+ "(e.g. `openssl rand -hex 32`) and set "
+ "API_SERVER_KEY before exposing it on %s.",
self.name, self._host,
)
return False
except ImportError:
pass
+ # Loud warning when a network-accessible API server runs against an
+ # unsandboxed local terminal backend. The API server can drive the
+ # agent's terminal/file tools as the host user; on a public bind
+ # that is the exact surface the hermes-0day campaign abused to write
+ # ~/.hermes/config.yaml and plant persistence. Sandboxing (Docker /
+ # remote backend) contains the blast radius. Warn, don't refuse —
+ # the operator may have an external firewall / strong key.
+ if is_network_accessible(self._host):
+ try:
+ from hermes_cli.config import load_config as _load_cfg
+ _backend = (
+ ((_load_cfg() or {}).get("terminal") or {}).get(
+ "backend", "local"
+ )
+ )
+ except Exception:
+ _backend = "local"
+ if str(_backend).lower() == "local":
+ logger.warning(
+ "[%s] API server is network-accessible (%s) AND the "
+ "terminal backend is 'local' (unsandboxed). Agent work "
+ "dispatched through this endpoint runs as the host user "
+ "with full terminal/file access. Strongly consider a "
+ "sandboxed backend (terminal.backend: docker) and "
+ "firewalling this port to trusted networks only.",
+ self.name, self._host,
+ )
+
# Port conflict detection — fail fast if port is already in use
try:
with _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM) as _s:
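For reference, the startup key check in the hunk above can be approximated as follows. This is only a minimal sketch under stated assumptions: `looks_like_usable_secret` and the `_PLACEHOLDERS` set are hypothetical names invented for illustration, and the real `hermes_cli.auth.has_usable_secret` (not shown in this diff) may apply different rules.

```python
# Hypothetical placeholder list; the real one in hermes_cli.auth is not shown here.
_PLACEHOLDERS = {
    "changeme",
    "placeholder",
    "your-api-key-here",
    "replace-me-with-a-real-key",
}


def looks_like_usable_secret(value, min_length=16):
    """Return True when value is long enough and not an obvious placeholder."""
    if not isinstance(value, str):
        return False
    candidate = value.strip()
    # Length floor: short keys are brute-forceable on a public bind.
    if len(candidate) < min_length:
        return False
    # Known placeholder strings copied from example configs.
    if candidate.lower() in _PLACEHOLDERS:
        return False
    # A single repeated character is long but trivially guessable.
    if len(set(candidate)) == 1:
        return False
    return True
```

Under these assumptions, a key produced by `openssl rand -hex 32` (64 hex characters) passes, while an 8-character key that satisfied the old `min_length=8` floor is now rejected.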
diff --git a/gateway/platforms/base.py b/gateway/platforms/base.py
index cda3acc6e58..46339b81471 100644
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -567,6 +567,96 @@ async def _ssrf_redirect_guard(response):
# Default location: {HERMES_HOME}/cache/images/ (legacy: image_cache/)
IMAGE_CACHE_DIR = get_hermes_dir("cache/images", "image_cache")
+# ---------------------------------------------------------------------------
+# Inbound media size cap (#13145)
+#
+# Inbound image / audio / video payloads are buffered fully into process
+# memory before being written to the cache directory. With no cap, a single
+# large upload (Discord Nitro allows 500 MB) — or a remote URL in an inbound
+# message payload pointing at an arbitrarily large file — can spike RAM and
+# OOM-kill the gateway. The ``cache_*_from_bytes`` helpers (the shared funnel
+# every platform reaches eventually) and the ``cache_*_from_url`` downloaders
+# enforce this cap, so the protection holds regardless of which platform
+# adapter or code path produced the bytes.
+#
+# Configurable via ``gateway.max_inbound_media_bytes`` in config.yaml.
+# ``0`` disables the cap. Default 128 MiB — generous enough for ordinary
+# photos/voice notes/short clips while still bounding a hostile upload.
+# ---------------------------------------------------------------------------
+DEFAULT_INBOUND_MEDIA_MAX_BYTES = 128 * 1024 * 1024
+
+
+def get_inbound_media_max_bytes() -> int:
+ """Return the max inbound image/audio/video bytes allowed in memory.
+
+ Reads ``gateway.max_inbound_media_bytes`` from config.yaml. ``0`` (or a
+ negative value) disables the cap. Non-fatal if config is unreadable or
+ the value is unparseable: both cases fall back to the default.
+ """
+ try:
+ from hermes_cli.config import load_config as _load_config
+ cfg = _load_config()
+ except Exception:
+ return DEFAULT_INBOUND_MEDIA_MAX_BYTES
+ gw = cfg.get("gateway", {}) if isinstance(cfg, dict) else {}
+ if not isinstance(gw, dict) or "max_inbound_media_bytes" not in gw:
+ return DEFAULT_INBOUND_MEDIA_MAX_BYTES
+ try:
+ return max(0, int(gw["max_inbound_media_bytes"]))
+ except (TypeError, ValueError):
+ return DEFAULT_INBOUND_MEDIA_MAX_BYTES
+
+
+def validate_inbound_media_size(
+ size: int,
+ *,
+ media_type: str = "media",
+ max_bytes: Optional[int] = None,
+) -> None:
+ """Raise ``ValueError`` if an inbound media payload exceeds the cap.
+
+ A ``max_bytes`` of ``0`` (or the configured cap resolving to ``0``)
+ disables the check entirely. Passing ``max_bytes`` lets callers resolve
+ the limit once and reuse it across an incremental read.
+ """
+ limit = get_inbound_media_max_bytes() if max_bytes is None else max_bytes
+ if limit and size > limit:
+ raise ValueError(
+ f"Inbound {media_type} payload is too large "
+ f"({size} bytes > {limit} bytes)"
+ )
+
+
+async def _read_httpx_body_with_limit(response, *, media_type: str) -> bytes:
+ """Read an httpx streaming response body without exceeding the media cap.
+
+ Rejects early on an oversized ``Content-Length`` header, then re-checks
+ the running total as chunks arrive so a lying/absent header can't smuggle
+ an unbounded body past the cap.
+ """
+ max_bytes = get_inbound_media_max_bytes()
+ content_length = response.headers.get("content-length")
+ if content_length:
+ try:
+ declared_size = int(content_length)
+ except ValueError:
+ logger.debug(
+ "Ignoring invalid Content-Length for inbound %s: %r",
+ media_type, content_length,
+ )
+ else:
+ validate_inbound_media_size(
+ declared_size, media_type=media_type, max_bytes=max_bytes,
+ )
+
+ chunks: list[bytes] = []
+ total = 0
+ async for chunk in response.aiter_bytes():
+ total += len(chunk)
+ validate_inbound_media_size(total, media_type=media_type, max_bytes=max_bytes)
+ chunks.append(chunk)
+ return b"".join(chunks)
+
def get_image_cache_dir() -> Path:
"""Return the image cache directory, creating it if it doesn't exist."""
@@ -606,6 +696,7 @@ def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
ValueError: If *data* does not look like a valid image (e.g. an HTML
error page returned by the upstream server).
"""
+ validate_inbound_media_size(len(data), media_type="image")
if not _looks_like_image(data):
snippet = data[:80].decode("utf-8", errors="replace")
raise ValueError(
@@ -651,15 +742,19 @@ async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) ->
) as client:
for attempt in range(retries + 1):
try:
- response = await client.get(
+ async with client.stream(
+ "GET",
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
- )
- response.raise_for_status()
- return cache_image_from_bytes(response.content, ext)
+ ) as response:
+ response.raise_for_status()
+ content = await _read_httpx_body_with_limit(
+ response, media_type="image",
+ )
+ return cache_image_from_bytes(content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
@@ -726,6 +821,7 @@ def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
Returns:
Absolute path to the cached audio file as a string.
"""
+ validate_inbound_media_size(len(data), media_type="audio")
cache_dir = get_audio_cache_dir()
filename = f"audio_{uuid.uuid4().hex[:12]}{ext}"
filepath = cache_dir / filename
@@ -765,15 +861,19 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
) as client:
for attempt in range(retries + 1):
try:
- response = await client.get(
+ async with client.stream(
+ "GET",
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
- )
- response.raise_for_status()
- return cache_audio_from_bytes(response.content, ext)
+ ) as response:
+ response.raise_for_status()
+ content = await _read_httpx_body_with_limit(
+ response, media_type="audio",
+ )
+ return cache_audio_from_bytes(content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
@@ -818,6 +918,7 @@ def get_video_cache_dir() -> Path:
def cache_video_from_bytes(data: bytes, ext: str = ".mp4") -> str:
"""Save raw video bytes to the cache and return the absolute file path."""
+ validate_inbound_media_size(len(data), media_type="video")
cache_dir = get_video_cache_dir()
filename = f"video_{uuid.uuid4().hex[:12]}{ext}"
filepath = cache_dir / filename
@@ -1147,6 +1248,33 @@ SUPPORTED_DOCUMENT_TYPES = {
}
+# ---------------------------------------------------------------------------
+# Text-injection extension allowlist
+#
+# Files whose contents are safe to inline into the prompt (UTF-8 text) when
+# small enough. This is intentionally an extension/MIME gate, NOT a blind
+# UTF-8 decode: binary formats like PDF/zip/docx can begin with decodable
+# ASCII headers and must never be inlined. Any uploaded file is still cached
+# and surfaced to the agent regardless of whether it lands in this set —
+# this only controls inline-vs-path-pointer for the prompt.
+# ---------------------------------------------------------------------------
+
+_TEXT_INJECT_EXTENSIONS = {
+ ".txt", ".md", ".markdown", ".csv", ".tsv", ".log",
+ ".json", ".jsonl", ".ndjson", ".xml", ".yaml", ".yml", ".toml",
+ ".ini", ".cfg", ".conf", ".env", ".properties",
+ ".html", ".htm", ".css", ".scss", ".sass", ".less",
+ ".py", ".pyi", ".js", ".mjs", ".cjs", ".ts", ".tsx", ".jsx",
+ ".sh", ".bash", ".zsh", ".fish", ".ps1", ".bat",
+ ".c", ".h", ".cpp", ".cc", ".hpp", ".cs", ".java", ".kt",
+ ".go", ".rs", ".rb", ".php", ".pl", ".lua", ".r", ".jl",
+ ".swift", ".m", ".scala", ".clj", ".ex", ".exs", ".erl",
+ ".sql", ".graphql", ".proto", ".tf", ".hcl",
+ ".dockerfile", ".makefile", ".cmake", ".gradle",
+ ".rst", ".tex", ".srt", ".vtt", ".diff", ".patch",
+}
+
+
# ---------------------------------------------------------------------------
# Image document types
#
@@ -1353,9 +1481,10 @@ def cache_media_bytes(
``default_kind`` ("image"/"video"/"audio"/"document") biases classification
when the extension/MIME are ambiguous — e.g. a Telegram native photo whose
- file has no usable name. Unsupported document types return None so the
- caller can record an "unsupported" note. Images that fail validation
- (``cache_image_from_bytes`` raises ValueError) also return None.
+ file has no usable name. Any non-image/video/audio file is cached as a
+ document and surfaced to the agent (arbitrary types get
+ ``application/octet-stream``); only images that fail validation
+ (``cache_image_from_bytes`` raises ValueError) return None.
"""
from tools.credential_files import to_agent_visible_cache_path
@@ -1391,11 +1520,20 @@ def cache_media_bytes(
out_mime = mime if mime.startswith("audio/") else f"audio/{aud_ext.lstrip('.')}"
return CachedMedia(to_agent_visible_cache_path(path), out_mime, "audio", display)
- if ext not in SUPPORTED_DOCUMENT_TYPES:
- return None
-
- path = cache_document_from_bytes(data, filename or f"document{ext}")
- return CachedMedia(to_agent_visible_cache_path(path), SUPPORTED_DOCUMENT_TYPES[ext], "document", display or f"document{ext}")
+ # Any other file type is cached and surfaced to the agent as a local path
+ # so it can be inspected with terminal / read_file / etc. Authorization to
+ # talk to the agent is the gate that matters — once a user is allowed to
+ # message it, the file-extension allowlist must not silently drop their
+ # uploads. Known extensions keep their precise MIME; everything else is
+ # tagged application/octet-stream (or the caller-supplied MIME) so the
+ # agent knows it's an arbitrary file and reaches for terminal tools.
+ fallback_name = filename or (f"document{ext}" if ext else "document.bin")
+ path = cache_document_from_bytes(data, fallback_name)
+ if ext in SUPPORTED_DOCUMENT_TYPES:
+ out_mime = SUPPORTED_DOCUMENT_TYPES[ext]
+ else:
+ out_mime = mime if mime else "application/octet-stream"
+ return CachedMedia(to_agent_visible_cache_path(path), out_mime, "document", display or fallback_name)
class MessageType(Enum):
@@ -1454,6 +1592,9 @@ class MessageEvent:
# Reply context
reply_to_message_id: Optional[str] = None
reply_to_text: Optional[str] = None # Text of the replied-to message (for context injection)
+ reply_to_author_id: Optional[str] = None
+ reply_to_author_name: Optional[str] = None
+ reply_to_is_own_message: bool = False # True when the user replied to this bot/assistant's message
# Auto-loaded skill(s) for topic/channel bindings (e.g., Telegram DM Topics,
# Discord channel_skill_bindings). A single name or ordered list.
@@ -1570,6 +1711,105 @@ class SendResult:
# made up the full payload, in send order. Empty tuple for the common
# single-message case.
continuation_message_ids: tuple = ()
+ # Machine-readable failure category (set only when ``success`` is False).
+ # ``error`` stays the human-readable detail string; ``error_kind`` lets
+ # consumers branch deterministically instead of substring-matching the raw
+ # provider message. One of the values in :data:`SEND_ERROR_KINDS` or
+ # ``None`` (unset / not classified). Producers should set this via
+ # :func:`classify_send_error`.
+ error_kind: Optional[str] = None
+
+
+# Machine-readable send-failure categories. Kept platform-neutral so every
+# adapter can populate ``SendResult.error_kind`` from the same vocabulary and
+# the gateway can decide — once, in one place — whether a failure is worth
+# surfacing to the user.
+#
+# too_long content exceeded the platform's per-message size cap; the
+# adapter typically recovers via continuation/split, so this is
+# informational rather than a hard failure.
+# bad_format the platform rejected the message markup/entities (parse
+# error); a plain-text retry is the actionable fix.
+# forbidden the bot is blocked, kicked, or lacks permission to post to the
+# target — the bot CANNOT reach the user, so there is nowhere to
+# surface a notice.
+# not_found the target chat/thread/message no longer exists.
+# rate_limited the platform throttled the send (flood control).
+# transient a connection-level failure that is safe to retry.
+# unknown classification did not match any known shape.
+SEND_ERROR_KINDS = frozenset(
+ {
+ "too_long",
+ "bad_format",
+ "forbidden",
+ "not_found",
+ "rate_limited",
+ "transient",
+ "unknown",
+ }
+)
+
+
+def classify_send_error(exc: Optional[BaseException], error_text: str = "") -> str:
+ """Map a send exception / error string to a :data:`SEND_ERROR_KINDS` value.
+
+ Platform-neutral: matches on the lowercased text of ``exc`` (and/or the
+ explicit ``error_text``) against the substrings the major messaging APIs
+ use. Conservative — anything unrecognized returns ``"unknown"`` so callers
+ never mistake an unclassified failure for a benign one.
+ """
+ parts = []
+ if error_text:
+ parts.append(error_text)
+ if exc is not None:
+ parts.append(str(exc))
+ parts.append(exc.__class__.__name__)
+ blob = " ".join(parts).lower()
+ if not blob.strip():
+ return "unknown"
+ if "message_too_long" in blob or "too long" in blob or "message is too long" in blob:
+ return "too_long"
+ if (
+ "can't parse entities" in blob
+ or "cant parse entities" in blob
+ or "can't find end" in blob
+ or "unsupported start tag" in blob
+ or ("entity" in blob and "parse" in blob)
+ or ("bad request" in blob and "entit" in blob)
+ ):
+ return "bad_format"
+ if (
+ "forbidden" in blob
+ or "bot was blocked" in blob
+ or "blocked by the user" in blob
+ or "user is deactivated" in blob
+ or "not enough rights" in blob
+ or "have no rights" in blob
+ or "not a member" in blob
+ ):
+ return "forbidden"
+ if (
+ "chat not found" in blob
+ or "message to edit not found" in blob
+ or "message to reply not found" in blob
+ or "thread not found" in blob
+ or "topic_deleted" in blob
+ or "message_id_invalid" in blob
+ ):
+ return "not_found"
+ if (
+ "flood" in blob
+ or "too many requests" in blob
+ or "retry after" in blob
+ or "rate limit" in blob
+ ):
+ return "rate_limited"
+ for pat in _RETRYABLE_ERROR_PATTERNS:
+ if pat in blob:
+ return "transient"
+ if "connecttimeout" in blob:
+ return "transient"
+ return "unknown"
class EphemeralReply(str):
@@ -1821,6 +2061,22 @@ class BasePlatformAdapter(ABC):
# preview (see gateway/run.py progress_callback).
supports_code_blocks: bool = False
+ # Whether this adapter can deliver an ASYNC notification back to the agent
+ # AFTER a turn ends — i.e. wake a fresh turn to surface a background
+ # process completion (terminal notify_on_complete / watch_patterns) or a
+ # detached subagent result (delegate_task background=True).
+ #
+ # True for adapters that hold a persistent outbound channel (Telegram,
+ # Discord, Slack, ... — they have a real ``send()`` and the gateway runs
+ # the watcher/drain loops). False for stateless request/response adapters
+ # (the API server): every route closes its channel when the turn ends, so
+ # there is nowhere to push a later completion. The gateway propagates this
+ # into the ``HERMES_SESSION_ASYNC_DELIVERY`` contextvar at session-bind
+ # time; tools read it via ``async_delivery_supported()`` and refuse to make
+ # a delivery promise they can't keep. A new stateless adapter only needs to
+ # set this to False to stay correct-by-default.
+ supports_async_delivery: bool = True
+
# The command prefix users can always TYPE on this platform to reach
# Hermes commands. Default "/" (most platforms deliver "/approve" etc.
# as plain message text). Platforms where typing a leading "/" is
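The incremental size check that this file's diff adds in `_read_httpx_body_with_limit` can be illustrated without httpx. The sketch below is an assumption-laden stand-in, not the gateway's code: `read_with_limit` is a hypothetical helper that takes any iterable of byte chunks in place of `response.aiter_bytes()`, and it treats `0` as "cap disabled" to mirror the documented config semantics.

```python
from typing import Iterable, List


def read_with_limit(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Join chunks, raising ValueError once the running total exceeds max_bytes.

    A max_bytes of 0 disables the check.
    """
    collected: List[bytes] = []
    total = 0
    for chunk in chunks:
        total += len(chunk)
        # Check before buffering so an oversized body is never held in full.
        if max_bytes > 0 and total > max_bytes:
            raise ValueError(
                f"payload is too large ({total} bytes > {max_bytes} bytes)"
            )
        collected.append(chunk)
    return b"".join(collected)
```

Because the total is re-checked on every chunk, a missing or understated `Content-Length` header cannot get an oversized body past the cap. The read aborts at the first chunk that crosses the limit.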
diff --git a/gateway/platforms/signal.py b/gateway/platforms/signal.py
index 99153034848..f91dc96d60f 100644
--- a/gateway/platforms/signal.py
+++ b/gateway/platforms/signal.py
@@ -17,8 +17,12 @@ import json
import logging
import os
import random
+import shutil
+import subprocess
+import tempfile
import time
import uuid
+from collections import OrderedDict
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -39,6 +43,7 @@ from gateway.platforms.base import (
cache_image_from_url,
)
from gateway.platforms.helpers import redact_phone
+from gateway.platforms.signal_format import markdown_to_signal
from gateway.platforms.signal_rate_limit import (
SIGNAL_BATCH_PACING_NOTICE_THRESHOLD,
SIGNAL_MAX_ATTACHMENTS_PER_MSG,
@@ -76,7 +81,14 @@ def _parse_comma_list(value: str) -> List[str]:
def _guess_extension(data: bytes) -> str:
- """Guess file extension from magic bytes."""
+ """Guess file extension from magic bytes.
+
+ Android Signal delivers voice notes as raw ADTS AAC frames, which share
+ the ``0xFF 0xFx`` sync word with MPEG-1/2 Layer 3 (MP3). The byte-1
+ layout disambiguates: ADTS packs ``ID layer protection_absent`` into
+ bits 3-0, where ``ID`` selects MPEG-4 (0) or MPEG-2 (1) AAC and ``layer``
+ is always 0 for ADTS. A real MP3 frame has a non-zero ``layer`` field.
+ """
if data[:4] == b"\x89PNG":
return ".png"
if data[:2] == b"\xff\xd8":
@@ -92,6 +104,12 @@ def _guess_extension(data: bytes) -> str:
if data[:4] == b"OggS":
return ".ogg"
if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
+ # ``0xFF 0xFx`` is shared by MP3 and ADTS AAC. The discriminator
+ # is the layer field (bits 2-1 of byte 1): ADTS always has
+ # ``layer=00`` (mask 0xF6, target 0xF0); MP3 has ``layer`` in
+ # {01,10,11} (mask 0xF6, target in {0xF2, 0xF4, 0xF6}).
+ if (data[1] & 0xF6) == 0xF0:
+ return ".aac"
return ".mp3"
if data[:2] == b"PK":
return ".zip"
@@ -120,6 +138,61 @@ def _ext_to_mime(ext: str) -> str:
return _EXT_TO_MIME.get(ext.lower(), "application/octet-stream")
+def _remux_aac_to_m4a(aac_data: bytes) -> Optional[Tuple[bytes, str]]:
+ """Losslessly remux raw ADTS AAC bytes into an MP4 (.m4a) container.
+
+ Used by the Signal attachment cache so Android voice notes land on disk
+ in a container that every major STT API (Groq, OpenAI, xAI, Mistral
+ Voxtral) will accept. ``ffmpeg -c:a copy`` is a single demux/remux —
+ no re-encode, no quality loss, sub-100ms for typical voice-note sizes.
+
+ Returns ``(m4a_bytes, ".m4a")`` on success, or ``None`` if ffmpeg is
+ missing, input is invalid, or remux fails for any reason. Callers
+ must treat ``None`` as "pass through unchanged" and not raise.
+ """
+ ffmpeg = shutil.which("ffmpeg")
+ if not ffmpeg:
+ # Common Homebrew/local prefixes on macOS dev hosts.
+ for prefix in ("/opt/homebrew/bin/ffmpeg", "/usr/local/bin/ffmpeg"):
+ if os.path.isfile(prefix) and os.access(prefix, os.X_OK):
+ ffmpeg = prefix
+ break
+ if not ffmpeg:
+ logger.debug("Signal: ffmpeg not found, skipping AAC→M4A remux")
+ return None
+ try:
+ with tempfile.NamedTemporaryFile(suffix=".aac", delete=False) as src:
+ src.write(aac_data)
+ src_path = src.name
+ dst_path = src_path[:-4] + ".m4a"
+ try:
+ proc = subprocess.run(
+ [ffmpeg, "-y", "-loglevel", "error", "-i", src_path,
+ "-c:a", "copy", "-movflags", "+faststart", dst_path],
+ capture_output=True, timeout=10,
+ )
+ if proc.returncode != 0:
+ logger.warning(
+ "Signal: AAC→M4A remux failed (ffmpeg exit %d): %s",
+ proc.returncode, proc.stderr.decode("utf-8", "replace")[:300],
+ )
+ return None
+ with open(dst_path, "rb") as f:
+ return f.read(), ".m4a"
+ finally:
+ for p in (src_path, dst_path):
+ try:
+ os.unlink(p)
+ except OSError:
+ pass
+ except subprocess.TimeoutExpired:
+ logger.warning("Signal: AAC→M4A remux timed out (>10s)")
+ return None
+ except Exception:
+ logger.exception("Signal: AAC→M4A remux error")
+ return None
+
+
def _render_mentions(text: str, mentions: list) -> str:
"""Replace Signal mention placeholders (\\uFFFC) with readable @identifiers.
@@ -232,9 +305,24 @@ class SignalAdapter(BasePlatformAdapter):
self._account_normalized = self.account.strip()
# Track recently sent message timestamps to prevent echo-back loops
- # in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
- self._recent_sent_timestamps: set = set()
- self._max_recent_timestamps = 50
+ # in Note to Self / self-chat mode and linked-device group sync-sents.
+ # OrderedDict[timestamp_ms -> insertion_monotonic_seconds] gives us
+ # LRU eviction (popitem(last=False) drops oldest) plus a TTL so that
+ # under chatty groups a still-pending echo cannot be evicted just
+ # because >50 outbounds happened. With a 5-minute TTL the cap only
+ # matters for runaway producers, not normal traffic bursts.
+ self._recent_sent_timestamps: "OrderedDict[int, float]" = OrderedDict()
+ self._max_recent_timestamps = 512
+ self._recent_sent_ttl_seconds = 300.0
+ # Keep a separate bounded cache of outbound Signal message timestamps.
+ # Signal quote.id is the timestamp of the quoted message, so this lets
+ # inbound replies identify that the user replied to a message sent by
+ # this bot even after the self-sync echo was filtered above.
+ # OrderedDict (not set) so the cap evicts the OLDEST timestamp in FIFO
+ # order — a plain set.pop() removes an arbitrary element, which could
+ # drop a still-recent timestamp and miss a genuine reply-to-own-message.
+ self._sent_message_timestamps: "OrderedDict[str, None]" = OrderedDict()
+ self._max_sent_message_timestamps = 500
# Signal increasingly exposes ACI/PNI UUIDs as stable recipient IDs.
# Keep a best-effort mapping so outbound sends can upgrade from a
# phone number to the corresponding UUID when signal-cli prefers it.
@@ -458,8 +546,7 @@ class SignalAdapter(BasePlatformAdapter):
sent_msg_group_id = sent_msg_group_info.get("groupId") if sent_msg_group_info else None
if dest == self._account_normalized or sent_msg_group_id:
# Check if this is an echo of our own outbound reply
- if sent_ts and sent_ts in self._recent_sent_timestamps:
- self._recent_sent_timestamps.discard(sent_ts)
+ if self._consume_sent_timestamp(sent_ts):
return
# Genuine user Note to Self — promote to dataMessage
is_note_to_self = True
@@ -543,10 +630,37 @@ class SignalAdapter(BasePlatformAdapter):
)
return
- # Extract quote (reply-to) context from Signal dataMessage
+ # Strip the bot's own @mention from any group message so the agent
+ # doesn't misinterpret "@+155****4567 say hello" as a directive to
+ # contact that phone number. _render_mentions replaces the Signal
+ # \uFFFC placeholder with @identifier, which looks like an
+ # addressee to the LLM rather than a self-reference. Applies to every
+ # group (not just require_mention groups) so the self-mention is
+ # cleaned wherever it appears.
+ if is_group and text:
+ account_norm = self._account_normalized
+ if account_norm:
+ text = text.replace(f"@{account_norm}", "")
+ # Also strip if the mention was rendered using the bot's UUID
+ bot_uuid = self._recipient_uuid_by_number.get(account_norm)
+ if bot_uuid:
+ text = text.replace(f"@{bot_uuid}", "")
+ # Tidy the spacing the removed mention left behind: collapse the
+ # double-space at a mid-sentence removal and trim the ends.
+ # Only touches the doubled space the removal introduced, so
+ # intentional newlines in a multi-line message are preserved.
+ text = text.replace("  ", " ").strip()
+
+ # Extract quote (reply-to) context from Signal dataMessage. Signal's
+ # quote.id is the timestamp of the quoted message; quote.author points
+ # at the quoted sender when available. Preserve both so the gateway can
+ # tell the agent when the user replied to a specific assistant message.
quote_data = data_message.get("quote") or {}
reply_to_id = str(quote_data.get("id")) if quote_data.get("id") else None
reply_to_text = quote_data.get("text")
+ reply_to_author = self._extract_quote_author(quote_data)
+ reply_to_author_name = quote_data.get("authorName") or quote_data.get("authorProfileName")
+ reply_to_is_own = self._quote_references_own_message(reply_to_id, reply_to_author)
# Process attachments
attachments_data = data_message.get("attachments", [])
@@ -631,9 +745,16 @@ class SignalAdapter(BasePlatformAdapter):
media_urls=media_urls,
media_types=media_types,
timestamp=timestamp,
- raw_message={"sender": sender, "timestamp_ms": ts_ms},
+ raw_message={
+ "sender": sender,
+ "timestamp_ms": ts_ms,
+ "quote": quote_data if quote_data else None,
+ },
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
+ reply_to_author_id=reply_to_author,
+ reply_to_author_name=reply_to_author_name,
+ reply_to_is_own_message=reply_to_is_own,
)
logger.debug("Signal: message from %s in %s: %s",
@@ -648,6 +769,56 @@ class SignalAdapter(BasePlatformAdapter):
self._recipient_uuid_by_number[number] = service_id
self._recipient_number_by_uuid[service_id] = number
+ @staticmethod
+ def _extract_quote_author(quote_data: Any) -> Optional[str]:
+ """Return the best available Signal sender identifier from quote metadata."""
+ if not isinstance(quote_data, dict):
+ return None
+ for key in (
+ "author",
+ "authorNumber",
+ "authorUuid",
+ "authorAci",
+ "authorServiceId",
+ "authorServiceIdString",
+ ):
+ value = quote_data.get(key)
+ if value:
+ return str(value)
+ return None
+
+ def _quote_references_own_message(
+ self,
+ reply_to_id: Optional[str],
+ reply_to_author: Optional[str],
+ ) -> bool:
+ """True when a Signal quote points at this adapter's outbound message."""
+ if reply_to_id and str(reply_to_id) in self._sent_message_timestamps:
+ return True
+ if not reply_to_author:
+ return False
+ author = str(reply_to_author).strip()
+ if self._account_normalized and author == self._account_normalized:
+ return True
+ cached_uuid = self._recipient_uuid_by_number.get(self._account_normalized)
+ if cached_uuid and author == cached_uuid:
+ return True
+ cached_number = self._recipient_number_by_uuid.get(author)
+ return bool(cached_number and cached_number == self._account_normalized)
+
+ def _remember_sent_message_timestamp(self, timestamp: Any) -> None:
+ """Keep a bounded cache of outbound Signal timestamps for quote matching."""
+ if timestamp is None:
+ return
+ key = str(timestamp)
+ # Re-insert to mark most-recently-used so eviction drops genuinely old
+ # timestamps, not a recently re-seen one.
+ self._sent_message_timestamps.pop(key, None)
+ self._sent_message_timestamps[key] = None
+ # FIFO-evict the oldest entry once over the cap.
+ while len(self._sent_message_timestamps) > self._max_sent_message_timestamps:
+ self._sent_message_timestamps.popitem(last=False)
+
def _extract_contact_uuid(self, contact: Any, phone_number: str) -> Optional[str]:
"""Best-effort extraction of a Signal service ID from listContacts output."""
if not isinstance(contact, dict):
@@ -724,6 +895,18 @@ class SignalAdapter(BasePlatformAdapter):
raw_data = base64.b64decode(result)
ext = _guess_extension(raw_data)
+ # Android Signal voice notes are raw ADTS AAC streams. Most STT
+ # providers (Groq Whisper, OpenAI Whisper) reject raw ADTS — they
+ # require AAC to be muxed into an MP4 container. Remux losslessly
+ # with ``ffmpeg -c:a copy`` so the cached file is a normal .m4a.
+ # No re-encode, sub-100ms on a Pi 5. Graceful no-op if ffmpeg is
+ # absent: the raw ADTS file is cached as-is and STT may reject it
+ # (there is no downstream sniff-and-remux fallback).
+ if ext == ".aac":
+ remuxed: Optional[Tuple[bytes, str]] = await asyncio.to_thread(_remux_aac_to_m4a, raw_data)
+ if remuxed is not None:
+ raw_data, ext = remuxed
+
if _is_image_ext(ext):
path = cache_image_from_bytes(raw_data, ext)
elif _is_audio_ext(ext):
@@ -796,7 +979,16 @@ class SignalAdapter(BasePlatformAdapter):
logger.debug("Signal RPC error (%s): %s", method, err)
return None
- return data.get("result")
+ result = data.get("result")
+ if isinstance(result, dict) and raise_on_rate_limit:
+ results = result.get("results")
+ if isinstance(results, list):
+ for r in results:
+ if isinstance(r, dict) and r.get("type") == "RATE_LIMIT_FAILURE":
+ retry_after = r.get("retryAfterSeconds")
+ raise SignalRateLimitError("Rate limit exceeded for recipient", retry_after=retry_after)
+
+ return result
except SignalRateLimitError:
raise
@@ -812,144 +1004,9 @@ class SignalAdapter(BasePlatformAdapter):
# ------------------------------------------------------------------
@staticmethod
- def _markdown_to_signal(text: str) -> tuple:
- """Convert markdown to plain text + Signal textStyles list.
-
- Signal doesn't render markdown. Instead it uses ``bodyRanges``
- (exposed by signal-cli as ``textStyle`` / ``textStyles`` params)
- with the format ``start:length:STYLE``.
-
- Positions are measured in **UTF-16 code units** (not Python code
- points) because that's what the Signal protocol uses.
-
- Supported styles: BOLD, ITALIC, STRIKETHROUGH, MONOSPACE.
- (Signal's SPOILER style is not currently mapped — no standard
- markdown syntax for it; would need ``||spoiler||`` parsing.)
-
- Returns ``(plain_text, styles_list)`` where *styles_list* may be
- empty if there's nothing to format.
- """
- import re
-
- def _utf16_len(s: str) -> int:
- """Length of *s* in UTF-16 code units."""
- return len(s.encode("utf-16-le")) // 2
-
- # Pre-process: normalize whitespace before any position tracking
- # so later operations don't invalidate recorded offsets.
- text = re.sub(r"\n{3,}", "\n\n", text)
- text = text.strip()
-
- styles: list = []
-
- # --- Phase 1: fenced code blocks ```...``` → MONOSPACE ---
- _CB = re.compile(r"```[a-zA-Z0-9_+-]*\n?(.*?)```", re.DOTALL)
- while m := _CB.search(text):
- inner = m.group(1).rstrip("\n")
- start = m.start()
- text = text[: m.start()] + inner + text[m.end() :]
- styles.append((start, len(inner), "MONOSPACE"))
-
- # --- Phase 2: heading markers # Foo → Foo (BOLD) ---
- _HEADING = re.compile(r"^#{1,6}\s+", re.MULTILINE)
- new_text = ""
- last_end = 0
- for m in _HEADING.finditer(text):
- new_text += text[last_end : m.start()]
- last_end = m.end()
- eol = text.find("\n", m.end())
- if eol == -1:
- eol = len(text)
- heading_text = text[m.end() : eol]
- start = len(new_text)
- new_text += heading_text
- styles.append((start, len(heading_text), "BOLD"))
- last_end = eol
- new_text += text[last_end:]
- text = new_text
-
- # --- Phase 3: inline patterns (single-pass to avoid offset drift) ---
- # The old code processed each pattern sequentially, stripping markers
- # and recording positions per-pass. Later passes shifted text without
- # adjusting earlier positions → bold/italic landed mid-word.
- #
- # Fix: collect ALL non-overlapping matches first, then strip every
- # marker in one pass so positions are computed against the final text.
- _PATTERNS = [
- (re.compile(r"\*\*(.+?)\*\*", re.DOTALL), "BOLD"),
- (re.compile(r"__(.+?)__", re.DOTALL), "BOLD"),
- (re.compile(r"~~(.+?)~~", re.DOTALL), "STRIKETHROUGH"),
- (re.compile(r"`(.+?)`"), "MONOSPACE"),
- (re.compile(r"(? os for os, oe in occupied):
- all_matches.append((ms, me, m.start(1), m.end(1), style))
- occupied.append((ms, me))
- all_matches.sort()
-
- # Build removal list so we can adjust Phase 1/2 styles.
- # Each match removes its prefix markers (start..g1_start) and
- # suffix markers (g1_end..end).
- removals: list = [] # (position, length) sorted
- for ms, me, g1s, g1e, _ in all_matches:
- if g1s > ms:
- removals.append((ms, g1s - ms))
- if me > g1e:
- removals.append((g1e, me - g1e))
- removals.sort()
-
- # Adjust Phase 1/2 styles for characters about to be removed.
- def _adj(pos: int) -> int:
- shift = 0
- for rp, rl in removals:
- if rp < pos:
- shift += min(rl, pos - rp)
- else:
- break
- return pos - shift
-
- adjusted_prior: list = []
- for s, l, st in styles:
- ns = _adj(s)
- ne = _adj(s + l)
- if ne > ns:
- adjusted_prior.append((ns, ne - ns, st))
-
- # Strip all inline markers in one pass → positions are correct.
- result = ""
- last_end = 0
- inline_styles: list = []
- for ms, me, g1s, g1e, sty in all_matches:
- result += text[last_end:ms]
- pos = len(result)
- inner = text[g1s:g1e]
- result += inner
- inline_styles.append((pos, len(inner), sty))
- last_end = me
- result += text[last_end:]
- text = result
-
- styles = adjusted_prior + inline_styles
-
- # Convert code-point offsets → UTF-16 code-unit offsets
- style_strings = []
- for cp_start, cp_len, stype in sorted(styles):
- # Safety: skip any out-of-bounds styles
- if cp_start < 0 or cp_start + cp_len > len(text):
- continue
- u16_start = _utf16_len(text[:cp_start])
- u16_len = _utf16_len(text[cp_start : cp_start + cp_len])
- style_strings.append(f"{u16_start}:{u16_len}:{stype}")
-
- return text, style_strings
+ def _markdown_to_signal(text: str) -> tuple[str, list[str]]:
+ """Backward-compatible wrapper around shared Signal formatting helper."""
+ return markdown_to_signal(text)
def format_message(self, content: str) -> str:
"""Strip markdown for plain-text fallback (used by base class).
@@ -960,6 +1017,29 @@ class SignalAdapter(BasePlatformAdapter):
# Our send() override bypasses this entirely.
return content
+ def _validate_send_result(self, result: Any) -> tuple[bool, Optional[str]]:
+ """Validate signal-cli send response results.
+
+ Returns (success, error_message).
+ """
+ if not result or not isinstance(result, dict):
+ return True, None
+
+ results = result.get("results")
+ if isinstance(results, list):
+ for r in results:
+ if not isinstance(r, dict):
+ continue
+ rtype = r.get("type")
+ if rtype and rtype != "SUCCESS":
+ return False, str(rtype)
+ if "success" in r and not r.get("success"):
+ fail = r.get("failure")
+ if fail:
+ return False, str(fail)
+ return False, "Recipient delivery failed"
+ return True, None
+
# ------------------------------------------------------------------
# Sending
# ------------------------------------------------------------------
@@ -992,9 +1072,13 @@ class SignalAdapter(BasePlatformAdapter):
else:
params["recipient"] = [await self._resolve_recipient(chat_id)]
+ logger.info("[Signal] Sending response (%d chars) to %s", len(plain_text), chat_id)
result = await self._rpc("send", params)
if result is not None:
+ success, err_msg = self._validate_send_result(result)
+ if not success:
+ return SendResult(success=False, error=err_msg, raw_response=result)
self._track_sent_timestamp(result)
# Signal has no editable message identifier. Returning None keeps the
# stream consumer on the non-edit fallback path instead of pretending
@@ -1006,9 +1090,29 @@ class SignalAdapter(BasePlatformAdapter):
"""Record outbound message timestamp for echo-back filtering."""
ts = rpc_result.get("timestamp") if isinstance(rpc_result, dict) else None
if ts:
- self._recent_sent_timestamps.add(ts)
- if len(self._recent_sent_timestamps) > self._max_recent_timestamps:
- self._recent_sent_timestamps.pop()
+ self._remember_sent_message_timestamp(ts)
+ now = time.monotonic()
+ # Re-insert to mark as most-recently-used.
+ self._recent_sent_timestamps.pop(ts, None)
+ self._recent_sent_timestamps[ts] = now
+ # Drop entries older than TTL first (cheap O(k) where k=expired).
+ cutoff = now - self._recent_sent_ttl_seconds
+ while self._recent_sent_timestamps:
+ oldest_ts, oldest_at = next(iter(self._recent_sent_timestamps.items()))
+ if oldest_at < cutoff:
+ self._recent_sent_timestamps.popitem(last=False)
+ else:
+ break
+ # Hard cap as a last-resort guard against runaway producers.
+ while len(self._recent_sent_timestamps) > self._max_recent_timestamps:
+ self._recent_sent_timestamps.popitem(last=False)
+
+ def _consume_sent_timestamp(self, ts) -> bool:
+ """Pop a timestamp if it matches one we sent. Returns True on echo."""
+ if ts and ts in self._recent_sent_timestamps:
+ self._recent_sent_timestamps.pop(ts, None)
+ return True
+ return False
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send a typing indicator.
@@ -1171,14 +1275,33 @@ class SignalAdapter(BasePlatformAdapter):
)
_rpc_duration = time.monotonic() - _rpc_t0
if result is not None:
- self._track_sent_timestamp(result)
- await scheduler.report_rpc_duration(_rpc_duration, n)
- logger.info(
- "Signal batch %d/%d: %d attachments sent in %.1fs "
- "(attempt %d/%d)",
- idx + 1, len(att_batches), n, _rpc_duration,
- attempt, SIGNAL_RATE_LIMIT_MAX_ATTEMPTS,
- )
+ success, err_msg = self._validate_send_result(result)
+ if success:
+ self._track_sent_timestamp(result)
+ await scheduler.report_rpc_duration(_rpc_duration, n)
+ logger.info(
+ "Signal batch %d/%d: %d attachments sent in %.1fs "
+ "(attempt %d/%d)",
+ idx + 1, len(att_batches), n, _rpc_duration,
+ attempt, SIGNAL_RATE_LIMIT_MAX_ATTEMPTS,
+ )
+ else:
+ logger.error(
+ "Signal: RPC send failed for batch %d/%d (%d attachments, "
+ "attempt %d/%d, rpc_duration=%.1fs): %s",
+ idx + 1, len(att_batches), n,
+ attempt, SIGNAL_RATE_LIMIT_MAX_ATTEMPTS,
+ _rpc_duration, err_msg,
+ )
+ # Retry transient (non-rate-limit) failures with exponential backoff while attempts remain
+ if attempt < SIGNAL_RATE_LIMIT_MAX_ATTEMPTS:
+ backoff = 2.0 ** attempt
+ logger.info(
+ "Signal: retrying batch %d/%d after %.1fs backoff",
+ idx + 1, len(att_batches), backoff,
+ )
+ await asyncio.sleep(backoff)
+ continue
else:
 # Assume the server didn't accept the batch, don't deduct tokens
logger.error(
@@ -1277,6 +1400,9 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
+ success, err_msg = self._validate_send_result(result)
+ if not success:
+ return SendResult(success=False, error=err_msg, raw_response=result)
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send with attachment failed")
@@ -1316,6 +1442,9 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
+ success, err_msg = self._validate_send_result(result)
+ if not success:
+ return SendResult(success=False, error=err_msg, raw_response=result)
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error=f"RPC send {media_label.lower()} failed")
@@ -1385,8 +1514,29 @@ class SignalAdapter(BasePlatformAdapter):
await task
except asyncio.CancelledError:
pass
- # Reset per-chat typing backoff state so the next agent turn starts
- # fresh rather than inheriting a cooldown from a prior conversation.
+
+ # Send an explicit stop-typing RPC so the recipient's device drops the
+ # indicator immediately instead of waiting for Signal's ~5s built-in
+ # timeout. Failures are best-effort — the backoff state must still be
+ # cleared so the next agent turn starts clean.
+ try:
+ params: Dict[str, Any] = {"account": self.account}
+ if chat_id.startswith("group:"):
+ params["groupId"] = chat_id[6:]
+ else:
+ params["recipient"] = [await self._resolve_recipient(chat_id)]
+ params["stop"] = True
+ await self._rpc(
+ "sendTyping",
+ params,
+ rpc_id="typing-stop",
+ log_failures=False,
+ )
+ except Exception:
+ # Best-effort: any RPC failure (or recipient-resolution failure)
+ # must not prevent backoff cleanup.
+ pass
+
self._typing_failures.pop(chat_id, None)
self._typing_skip_until.pop(chat_id, None)
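Reviewer sketch for the echo-filter change in `_track_sent_timestamp` / `_consume_sent_timestamp` above: a minimal, standalone version of the same idea (re-insert on write, expire by TTL from the oldest end, then enforce a hard cap). The class name `RecentSentCache`, its constructor parameters, and the injectable `clock` are assumptions made for illustration; they are not part of the adapter.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class RecentSentCache:
    """Hypothetical stand-in for the adapter's recent-sent-timestamp map."""

    def __init__(
        self,
        ttl_seconds: float,
        max_entries: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock  # injectable so tests can control time
        self._entries: "OrderedDict[Hashable, float]" = OrderedDict()

    def remember(self, ts: Hashable) -> None:
        now = self._clock()
        # Re-insert so the key moves to the most-recent end of the dict.
        self._entries.pop(ts, None)
        self._entries[ts] = now
        # Expire from the oldest end until the head entry is still fresh.
        cutoff = now - self._ttl
        while self._entries:
            _, seen_at = next(iter(self._entries.items()))
            if seen_at >= cutoff:
                break
            self._entries.popitem(last=False)
        # Hard cap as a last-resort bound on memory.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)

    def consume(self, ts: Hashable) -> bool:
        """Return True exactly once for a timestamp that was remembered."""
        if ts in self._entries:
            del self._entries[ts]
            return True
        return False

    def __len__(self) -> int:
        return len(self._entries)
```

Because entries are always appended with a non-decreasing clock value, the oldest entry sits at the head, so TTL expiry only ever inspects the entries it removes plus one.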
diff --git a/gateway/platforms/signal_format.py b/gateway/platforms/signal_format.py
new file mode 100644
index 00000000000..e8539549bf1
--- /dev/null
+++ b/gateway/platforms/signal_format.py
@@ -0,0 +1,140 @@
+"""Shared Signal formatting helpers.
+
+Keep markdown → Signal native formatting conversion in one place so both the
+live Signal adapter and standalone send paths emit the same bodyRanges.
+"""
+
+from __future__ import annotations
+
+import re
+
+
+def markdown_to_signal(text: str) -> tuple[str, list[str]]:
+ """Convert markdown to plain text + Signal textStyles list.
+
+ Signal doesn't render markdown. Instead it uses ``bodyRanges`` (exposed by
+ signal-cli as ``textStyle`` / ``textStyles`` params) with the format
+ ``start:length:STYLE``.
+
+ Positions are measured in UTF-16 code units because that's what the Signal
+ protocol uses.
+
+ Supported styles: BOLD, ITALIC, STRIKETHROUGH, MONOSPACE.
+ """
+
+ def _utf16_len(s: str) -> int:
+ """Length of *s* in UTF-16 code units."""
+ return len(s.encode("utf-16-le")) // 2
+
+ def _normalize_bullet_markers(source: str) -> str:
+ """Replace Markdown bullet markers with plain Unicode bullets.
+
+ Signal does not render Markdown list syntax, so ``- item`` and
+ ``* item`` otherwise arrive as literal Markdown markers. Preserve
+ fenced code blocks byte-for-byte; list-looking lines inside code are
+ code, not prose bullets.
+ """
+ parts = re.split(r"(```.*?```)", source, flags=re.DOTALL)
+ for idx, part in enumerate(parts):
+ if idx % 2 == 1:
+ continue
+ parts[idx] = re.sub(r"(?m)^([ \t]{0,3})[-*+]\s+", r"\1• ", part)
+ return "".join(parts)
+
+ text = re.sub(r"\n{3,}", "\n\n", text)
+ text = text.strip()
+ text = _normalize_bullet_markers(text)
+
+ styles: list[tuple[int, int, str]] = []
+
+ code_block = re.compile(r"```[a-zA-Z0-9_+-]*\n?(.*?)```", re.DOTALL)
+ while match := code_block.search(text):
+ inner = match.group(1).rstrip("\n")
+ start = match.start()
+ text = text[: match.start()] + inner + text[match.end() :]
+ styles.append((start, len(inner), "MONOSPACE"))
+
+ heading = re.compile(r"^#{1,6}\s+", re.MULTILINE)
+ new_text = ""
+ last_end = 0
+ for match in heading.finditer(text):
+ new_text += text[last_end : match.start()]
+ last_end = match.end()
+ eol = text.find("\n", match.end())
+ if eol == -1:
+ eol = len(text)
+ heading_text = text[match.end() : eol]
+ start = len(new_text)
+ new_text += heading_text
+ styles.append((start, len(heading_text), "BOLD"))
+ last_end = eol
+ new_text += text[last_end:]
+ text = new_text
+
+ patterns = [
+ (re.compile(r"\*\*(.+?)\*\*", re.DOTALL), "BOLD"),
+ (re.compile(r"__(.+?)__", re.DOTALL), "BOLD"),
+ (re.compile(r"~~(.+?)~~", re.DOTALL), "STRIKETHROUGH"),
+ (re.compile(r"`(.+?)`"), "MONOSPACE"),
+ (re.compile(r"(? os for os, oe in occupied):
+ all_matches.append((ms, me, match.start(1), match.end(1), style))
+ occupied.append((ms, me))
+ all_matches.sort()
+
+ removals: list[tuple[int, int]] = []
+ for ms, me, g1s, g1e, _ in all_matches:
+ if g1s > ms:
+ removals.append((ms, g1s - ms))
+ if me > g1e:
+ removals.append((g1e, me - g1e))
+ removals.sort()
+
+ def _adjust(pos: int) -> int:
+ shift = 0
+ for remove_pos, remove_len in removals:
+ if remove_pos < pos:
+ shift += min(remove_len, pos - remove_pos)
+ else:
+ break
+ return pos - shift
+
+ adjusted_prior: list[tuple[int, int, str]] = []
+ for start, length, style in styles:
+ new_start = _adjust(start)
+ new_end = _adjust(start + length)
+ if new_end > new_start:
+ adjusted_prior.append((new_start, new_end - new_start, style))
+
+ result = ""
+ last_end = 0
+ inline_styles: list[tuple[int, int, str]] = []
+ for ms, me, g1s, g1e, style in all_matches:
+ result += text[last_end:ms]
+ pos = len(result)
+ inner = text[g1s:g1e]
+ result += inner
+ inline_styles.append((pos, len(inner), style))
+ last_end = me
+ result += text[last_end:]
+ text = result
+
+ styles = adjusted_prior + inline_styles
+
+ style_strings: list[str] = []
+ for cp_start, cp_len, style_type in sorted(styles):
+ if cp_start < 0 or cp_start + cp_len > len(text):
+ continue
+ u16_start = _utf16_len(text[:cp_start])
+ u16_len = _utf16_len(text[cp_start : cp_start + cp_len])
+ style_strings.append(f"{u16_start}:{u16_len}:{style_type}")
+
+ return text, style_strings
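Reviewer sketch for the final offset conversion in `markdown_to_signal` above: why style ranges are converted from Python code-point offsets to UTF-16 code units before being formatted as `start:length:STYLE`. The helper name `style_range` and the sample string are assumptions for illustration only; `utf16_len` mirrors the `_utf16_len` helper in the diff.

```python
def utf16_len(s: str) -> int:
    """Length of s in UTF-16 code units (astral characters count as 2)."""
    return len(s.encode("utf-16-le")) // 2


def style_range(text: str, cp_start: int, cp_len: int, style: str) -> str:
    """Format one style range, converting code-point offsets to UTF-16 units."""
    u16_start = utf16_len(text[:cp_start])
    u16_len = utf16_len(text[cp_start : cp_start + cp_len])
    return f"{u16_start}:{u16_len}:{style}"


# "bold" starts at code point 2, but the emoji occupies two UTF-16 code
# units, so the range handed to Signal has to start at 3.
sample = "\U0001F600 bold"
print(style_range(sample, 2, 4, "BOLD"))  # prints 3:4:BOLD
```

For text containing only Basic Multilingual Plane characters the two offset systems coincide, which is why a missing conversion typically only shows up once a message contains emoji.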
diff --git a/gateway/platforms/webhook.py b/gateway/platforms/webhook.py
index 222adf4c2ea..d9f98282a8d 100644
--- a/gateway/platforms/webhook.py
+++ b/gateway/platforms/webhook.py
@@ -57,6 +57,11 @@ from gateway.platforms.base import (
logger = logging.getLogger(__name__)
+# Sentinel returned by _resolve_request_profile when a /p/<profile>/ prefix
+# names a profile this gateway does not serve (→ 404). Distinct from None
+# (no prefix / multiplexing off → handle as the default profile).
+_PROFILE_REJECTED = object()
+
_BUILTIN_DELIVER_PLATFORMS = {
"telegram", "discord", "slack", "signal", "sms", "whatsapp",
"matrix", "mattermost", "homeassistant", "email", "dingtalk",
@@ -189,6 +194,14 @@ class WebhookAdapter(BasePlatformAdapter):
app = web.Application()
app.router.add_get("/health", self._handle_health)
app.router.add_post("/webhooks/{route_name}", self._handle_webhook)
+ # Multi-profile multiplexing: a /p/<profile>/webhooks/<route_name> prefix
+ # routes the inbound event to that profile. Same handler; the profile is
+ # captured from the path and stamped onto the SessionSource so the agent
+ # turn resolves that profile's config/skills/credentials. Only honored
+ # when gateway.multiplex_profiles is on (the handler validates).
+ app.router.add_post(
+ "/p/{profile}/webhooks/{route_name}", self._handle_webhook
+ )
# Port conflict detection — fail fast if port is already in use
import socket as _socket
@@ -397,6 +410,35 @@ class WebhookAdapter(BasePlatformAdapter):
except Exception as e:
logger.error("[webhook] Failed to reload dynamic routes: %s", e)
+ def _resolve_request_profile(self, request: "web.Request"):
+ """Resolve + validate the /p/<profile>/ URL prefix on a webhook request.
+
+ Returns:
+ - ``None`` when no profile prefix is present, or multiplexing is off
+ (the prefix is ignored, request handled as the default profile).
+ - the profile name (str) when present, multiplexing is on, and the
+ profile is one this gateway serves.
+ - ``_PROFILE_REJECTED`` when a prefix is present but the profile is
+ unknown/unconfigured (handler returns 404).
+ """
+ profile = (request.match_info.get("profile") or "").strip()
+ if not profile:
+ return None
+ runner = self.gateway_runner
+ cfg = getattr(runner, "config", None)
+ if not getattr(cfg, "multiplex_profiles", False):
+ # Prefix supplied but multiplexing is off — ignore it, behave as
+ # the single-profile gateway (don't 404 a would-be valid route).
+ return None
+ try:
+ from hermes_cli.profiles import profiles_to_serve
+ served = {name for name, _ in profiles_to_serve(multiplex=True)}
+ except Exception:
+ return _PROFILE_REJECTED
+ if profile not in served:
+ return _PROFILE_REJECTED
+ return profile
+
async def _handle_webhook(self, request: "web.Request") -> "web.Response":
"""POST /webhooks/{route_name} — receive and process a webhook event."""
# Hot-reload dynamic subscriptions on each request (mtime-gated, cheap)
@@ -405,6 +447,13 @@ class WebhookAdapter(BasePlatformAdapter):
route_name = request.match_info.get("route_name", "")
route_config = self._routes.get(route_name)
+ # Multi-profile: resolve + validate the /p/<profile>/ prefix if present.
+ profile = self._resolve_request_profile(request)
+ if profile is _PROFILE_REJECTED:
+ return web.json_response(
+ {"error": "Unknown or unconfigured profile"}, status=404
+ )
+
if not route_config:
return web.json_response(
{"error": f"Unknown route: {route_name}"}, status=404
@@ -641,6 +690,8 @@ class WebhookAdapter(BasePlatformAdapter):
user_id=f"webhook:{route_name}",
user_name=route_name,
)
+ if profile and isinstance(profile, str):
+ source.profile = profile
event = MessageEvent(
text=prompt,
message_type=MessageType.TEXT,
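Reviewer sketch for `_resolve_request_profile` above: the three-way result (no profile, accepted profile, rejected profile) reduced to a pure function so the sentinel pattern is easier to see. The function name `resolve_profile` and its parameters are assumptions for illustration; the real method reads the prefix from `request.match_info` and the served set from `profiles_to_serve(multiplex=True)`.

```python
from typing import Iterable, Optional, Union

# Unique sentinel: distinguishable from None ("use the default profile")
# and from any profile name, and compared with `is`.
PROFILE_REJECTED = object()


def resolve_profile(
    path_profile: Optional[str],
    multiplex_enabled: bool,
    served: Iterable[str],
) -> Union[None, str, object]:
    profile = (path_profile or "").strip()
    if not profile:
        return None  # no prefix: default profile
    if not multiplex_enabled:
        return None  # prefix ignored when multiplexing is off
    if profile not in set(served):
        return PROFILE_REJECTED  # caller should answer 404
    return profile


result = resolve_profile("work", True, ["default", "work"])
if result is PROFILE_REJECTED:
    print("404")
else:
    print(result or "default")
```

A plain `None` could not carry both "no prefix" and "unknown profile", which is the reason for the dedicated sentinel object.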
diff --git a/gateway/platforms/whatsapp_common.py b/gateway/platforms/whatsapp_common.py
index 6b56be3b8de..c6ed3da6e32 100644
--- a/gateway/platforms/whatsapp_common.py
+++ b/gateway/platforms/whatsapp_common.py
@@ -365,3 +365,56 @@ class WhatsAppBehaviorMixin:
result = result.replace(f"{_CODE_PH}{i}\x00", code)
return result
+
+
+# ---------------------------------------------------------------------------
+# Shared bridge directory resolution for CLI and adapter
+# ---------------------------------------------------------------------------
+
+def resolve_whatsapp_bridge_dir() -> Path:
+ """Resolve the WhatsApp bridge directory, mirroring to HERMES_HOME if needed.
+
+ When the install tree is read-only (e.g., Docker /opt/hermes), this function
+ mirrors the bridge source to a writable HERMES_HOME location and returns that
+ path. This ensures npm install works in Docker environments.
+
+ Returns the resolved bridge directory path.
+ """
+ import shutil
+ from pathlib import Path as _Path
+
+ # Default location in install tree (may be read-only)
+ from hermes_constants import get_hermes_home
+ install_bridge = _Path(__file__).resolve().parents[2] / "scripts" / "whatsapp-bridge"
+
+ # Writable fallback location under HERMES_HOME (used only when the install tree is read-only)
+ hermes_home = get_hermes_home()
+ hermes_home_bridge = hermes_home / "scripts" / "whatsapp-bridge"
+
+ # Check if install dir is writable
+ try:
+ test_file = install_bridge / ".write_test"
+ test_file.touch()
+ test_file.unlink()
+ install_writable = True
+ except (OSError, PermissionError):
+ install_writable = False
+
+ if install_writable:
+ return install_bridge
+
+ # Install dir is read-only, mirror to HERMES_HOME if needed
+ if hermes_home_bridge.exists():
+ return hermes_home_bridge
+
+ # Mirror the bridge source to HERMES_HOME
+ try:
+ hermes_home_bridge.parent.mkdir(parents=True, exist_ok=True)
+ shutil.copytree(
+ install_bridge,
+ hermes_home_bridge,
+ dirs_exist_ok=False,
+ )
+ return hermes_home_bridge
+ except Exception:
+ return install_bridge
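Reviewer sketch for `resolve_whatsapp_bridge_dir` above: the same "use the install tree if writable, otherwise mirror it once into a writable location" flow as a small testable function. The names `probe_writable` and `resolve_dir` and the injectable `is_writable` parameter are assumptions for illustration. The probe here uses `tempfile.TemporaryFile` rather than the fixed `.write_test` file in the diff, so concurrent callers cannot collide on one filename.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def probe_writable(directory: Path) -> bool:
    """Return True if a file can be created inside directory."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            pass
        return True
    except OSError:  # covers PermissionError and a missing directory
        return False


def resolve_dir(
    install_dir: Path,
    mirror_dir: Path,
    is_writable: Callable[[Path], bool] = probe_writable,
) -> Path:
    if is_writable(install_dir):
        return install_dir
    if mirror_dir.exists():
        return mirror_dir  # mirrored on an earlier run
    try:
        mirror_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(install_dir, mirror_dir)
        return mirror_dir
    except OSError:  # shutil.Error is a subclass of OSError
        return install_dir  # best effort: fall back to the original tree
```

Note that, as in the diff, an existing mirror is reused as-is, so a mirror copied from an older install is not refreshed by this logic.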
diff --git a/gateway/relay/__init__.py b/gateway/relay/__init__.py
index 421fe0ac240..4b3fdda8a8d 100644
--- a/gateway/relay/__init__.py
+++ b/gateway/relay/__init__.py
@@ -79,40 +79,6 @@ def relay_connection_auth() -> tuple[Optional[str], Optional[str]]:
return (gateway_id or None, secret or None)
-def relay_inbound_config() -> tuple[Optional[str], Optional[str], int]:
- """Resolve (delivery_key, bind_host, bind_port) for the inbound receiver.
-
- The connector delivers normalized inbound events to this gateway over a
- SIGNED HTTP POST (not the outbound WS), verified with the per-tenant delivery
- key issued at enrollment (``GATEWAY_RELAY_DELIVERY_KEY``). The receiver only
- starts when a delivery key AND a bind port are configured — a gateway with no
- public inbound URL (e.g. a purely outbound dev run) simply doesn't run it.
-
- Env first (Docker), then ``gateway.relay_delivery_key`` /
- ``gateway.relay_inbound_host`` / ``gateway.relay_inbound_port`` in config.yaml.
- Port 0 (default/unset) -> receiver disabled.
- """
- key = os.environ.get("GATEWAY_RELAY_DELIVERY_KEY", "").strip()
- host = os.environ.get("GATEWAY_RELAY_INBOUND_HOST", "").strip()
- port_raw = os.environ.get("GATEWAY_RELAY_INBOUND_PORT", "").strip()
- if not (key and port_raw):
- try:
- from gateway.run import _load_gateway_config # late import to avoid cycle
-
- cfg = (_load_gateway_config().get("gateway") or {})
- key = key or str(cfg.get("relay_delivery_key", "") or "").strip()
- host = host or str(cfg.get("relay_inbound_host", "") or "").strip()
- if not port_raw:
- port_raw = str(cfg.get("relay_inbound_port", "") or "").strip()
- except Exception: # noqa: BLE001 - config absence/parse must never crash registration
- pass
- try:
- port = int(port_raw) if port_raw else 0
- except ValueError:
- port = 0
- return (key or None, host or "0.0.0.0", port)
-
-
def relay_endpoint() -> Optional[str]:
"""The gateway's own PUBLIC inbound URL, asserted to the connector at provision.
@@ -238,21 +204,33 @@ def _post_provision(
return payload
-def self_provision_if_managed() -> bool:
- """Managed-boot self-provision: mint relay creds in-process, no human, no disk.
+def self_provision_relay() -> bool:
+ """Boot-time relay self-provision: mint relay creds in-process, no human, no disk.
- Fires only on a MANAGED boot (``is_managed()``) with relay configured
- (``relay_url()`` set) and NO per-gateway secret already present. In that case
- the runtime resolves the agent's own Nous access token (the same
+ Fires when relay is configured (``relay_url()`` set) and NO per-gateway secret
+ is already present, AND the agent can resolve its own Nous access token. In
+ that case the runtime resolves the agent's own Nous access token (the same
``resolve_nous_access_token()`` the enroll CLI / dashboard register use),
POSTs ``/relay/provision`` asserting its own endpoint + route keys, and sets
``GATEWAY_RELAY_ID`` / ``GATEWAY_RELAY_SECRET`` / ``GATEWAY_RELAY_DELIVERY_KEY``
into ``os.environ`` so the subsequent ``register_relay_adapter()`` picks them
- up. The creds live ONLY in process memory — never written to ``~/.hermes/.env``
- (``save_env_value`` refuses under managed anyway, and keeping the secret off
- any volume is the stronger posture).
+ up. The creds live ONLY in process memory — never written to ``~/.hermes/.env``.
- Stateless: process-env creds don't survive a restart, so a managed container
+ The trigger is deliberately NOT ``is_managed()``: that means
+ "package-manager/NixOS-managed" and is False on a NAS-hosted Fly agent (which
+ sets neither ``HERMES_MANAGED`` nor a ``.managed`` marker), so gating on it
+ blocked the exact hosted case this is for. The real signal is "you pointed me
+ at a connector and didn't pin a secret" — which is both NAS-independent and
+ self-guarding:
+
+ - A NAS-hosted agent: has ``GATEWAY_RELAY_URL``, no pinned secret, and a
+ bootstrapped NAS token -> self-provisions.
+ - A self-hosted operator who ran ``hermes gateway enroll``: has a PINNED
+ ``GATEWAY_RELAY_SECRET`` -> skipped (the secret-present guard below).
+ - A self-hosted box with a relay URL but no NAS identity:
+ ``resolve_nous_access_token()`` fails -> graceful no-op.
+
+ Stateless: process-env creds don't survive a restart, so a hosted container
re-provisions every boot; the connector's rotation window covers a still-
connected prior instance. An explicitly-pinned ``GATEWAY_RELAY_SECRET`` (env
or config) is RESPECTED — self-provision skips so an operator pin isn't
@@ -267,18 +245,12 @@ def self_provision_if_managed() -> bool:
logger = logging.getLogger("gateway.relay")
- try:
- from hermes_cli.config import is_managed
- except Exception: # noqa: BLE001
- return False
-
- if not is_managed():
- return False
dial_url = relay_url()
if not dial_url:
return False
- # Respect an already-present (pinned/stamped) secret — don't stomp it.
+ # Respect an already-present (pinned/stamped) secret — don't stomp it. This
+ # is also what makes a self-hosted, enrolled gateway skip self-provision.
existing_id, existing_secret = relay_connection_auth()
if existing_id and existing_secret:
logger.info("relay self-provision skipped: GATEWAY_RELAY_SECRET already set")
@@ -289,6 +261,8 @@ def self_provision_if_managed() -> bool:
access_token = resolve_nous_access_token()
except Exception as exc: # noqa: BLE001 - boot must survive a token failure
+ # No resolvable NAS identity (e.g. a self-hosted box that hasn't enrolled)
+ # -> nothing to provision with; skip quietly and let the gateway boot.
logger.warning("relay self-provision skipped: could not resolve Nous token (%s)", exc)
return False
@@ -318,8 +292,11 @@ def self_provision_if_managed() -> bool:
logger.warning("relay self-provision failed (%s); gateway will boot without relay auth", exc)
return False
- # Set creds in-process so register_relay_adapter() + relay_inbound_config()
- # read them from os.environ. Never logged.
+ # Set creds in-process so register_relay_adapter() reads them from os.environ
+ # (the per-gateway secret authenticates the outbound WS upgrade). The delivery
+ # key is still issued by the connector and persisted for forward-compat, but
+ # inbound now rides the WS (no HTTP receiver), so it is not consumed here.
+ # Never logged.
os.environ["GATEWAY_RELAY_ID"] = str(result.get("gatewayId") or gateway_id)
os.environ["GATEWAY_RELAY_SECRET"] = str(result.get("secret") or "")
os.environ["GATEWAY_RELAY_DELIVERY_KEY"] = str(result.get("deliveryKey") or "")
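Reviewer sketch for the trigger logic described in the `self_provision_relay` docstring above: the three guard checks expressed as a pure decision function. The name `self_provision_decision`, its parameters, and the returned strings are assumptions for illustration; the real function reads these values via `relay_url()`, `relay_connection_auth()` and `resolve_nous_access_token()`, and then performs the provisioning POST.

```python
from typing import Callable, Optional


def self_provision_decision(
    relay_url: Optional[str],
    gateway_id: Optional[str],
    secret: Optional[str],
    resolve_token: Callable[[], str],
) -> str:
    """Return "provision" or a "skip: ..." reason, mirroring the guards."""
    if not relay_url:
        return "skip: relay not configured"
    if gateway_id and secret:
        # An operator-pinned secret (for example after enrolling) is respected.
        return "skip: secret already set"
    try:
        resolve_token()
    except Exception:
        # No resolvable identity: boot continues without relay auth.
        return "skip: could not resolve token"
    return "provision"


def no_identity() -> str:
    raise RuntimeError("not logged in")


print(self_provision_decision("wss://relay.example", None, None, lambda: "tok"))
print(self_provision_decision("wss://relay.example", "gw-1", "s3cret", lambda: "tok"))
print(self_provision_decision("wss://relay.example", None, None, no_identity))
```

The three calls correspond to the hosted agent, the enrolled self-hosted operator, and the self-hosted box without an identity listed in the docstring; the URL and credential values are placeholders.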
diff --git a/gateway/relay/adapter.py b/gateway/relay/adapter.py
index b64f7abc517..9e44a34b421 100644
--- a/gateway/relay/adapter.py
+++ b/gateway/relay/adapter.py
@@ -22,9 +22,10 @@ import logging
from typing import Any, Callable, Dict, Optional
from gateway.config import Platform, PlatformConfig
-from gateway.platforms.base import BasePlatformAdapter, SendResult
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
from gateway.relay.descriptor import CapabilityDescriptor
from gateway.relay.transport import RelayTransport
+from gateway.session import SessionSource
logger = logging.getLogger(__name__)
@@ -57,11 +58,14 @@ class RelayAdapter(BasePlatformAdapter):
self._transport = transport
# Capability surface read by stream_consumer (getattr(..., 4096)).
self.MAX_MESSAGE_LENGTH = descriptor.max_message_length
+ # chat_id -> guild_id (Discord) / workspace scope, learned from inbound
+ # events. The connector's egress guard resolves the owning tenant from
+ # the OUTBOUND action's metadata.guild_id; the gateway's generic delivery
+ # path (run.py _thread_metadata_for_source) only carries thread_id, so we
+ # re-attach the scope here from what we saw inbound. Keyed by chat_id
+ # (channel) since that's what send() receives. See routedEgressGuard.ts.
+ self._scope_by_chat: Dict[str, str] = {}
self.supports_code_blocks = descriptor.markdown_dialect not in ("", "plain")
- # Inbound delivery receiver (signed connector→gateway HTTP POSTs). Built
- # lazily in connect() when a delivery key + bind port are configured; a
- # purely-outbound dev gateway runs without it. See inbound_receiver.py.
- self._inbound_runner: Any = None
# ── capability surface (from descriptor) ─────────────────────────────
@property
@@ -80,6 +84,19 @@ class RelayAdapter(BasePlatformAdapter):
if self._transport is None:
raise RuntimeError("RelayAdapter has no transport configured")
self._transport.set_inbound_handler(self._on_inbound)
+ # Inbound interrupts (connector -> owning gateway) arrive as
+ # interrupt_inbound frames over the SAME outbound WS; bridge them to the
+ # adapter's interrupt path. WS-only: there is no inbound HTTP receiver.
+ set_interrupt = getattr(self._transport, "set_interrupt_inbound_handler", None)
+ if callable(set_interrupt):
+ set_interrupt(self.on_interrupt)
+ # Passthrough-plane forwards (Discord interactions, Twilio, …) also ride
+ # the SAME outbound WS (Phase 5 §5.1) — the connector edge-ACKed and
+ # forwards the real request here, so a hosted gateway needs no public
+ # inbound port. Bridge them to the adapter's passthrough handler.
+ set_passthrough = getattr(self._transport, "set_passthrough_handler", None)
+ if callable(set_passthrough):
+ set_passthrough(self._on_passthrough)
ok = await self._transport.connect()
if not ok:
return False
@@ -92,40 +109,12 @@ class RelayAdapter(BasePlatformAdapter):
logger.warning("relay handshake failed: %s", exc)
return False
self._apply_descriptor(descriptor)
- # Start the signed inbound-delivery receiver if configured (the connector
- # POSTs normalized events to it over HTTP, verified with the tenant
- # delivery key). Non-fatal: a receiver bind failure must not fail the
- # outbound connection — the gateway can still send.
- await self._maybe_start_inbound_receiver()
+ # Inbound (messages + interrupts) is delivered over the outbound WS via
+ # the connector's relay bus — there is NO inbound HTTP endpoint (hosted
+ # gateways have no public IP). The transport's reader already dispatches
+ # `inbound` / `interrupt_inbound` frames to the handlers wired above.
return True
- async def _maybe_start_inbound_receiver(self) -> None:
- """Start the inbound HTTP receiver when a delivery key + port are set."""
- from gateway.relay import relay_inbound_config
-
- delivery_key, host, port = relay_inbound_config()
- if not (delivery_key and port):
- return # no inbound URL configured -> outbound-only gateway
- try:
- from aiohttp import web
-
- from gateway.relay.inbound_receiver import InboundDeliveryReceiver
-
- receiver = InboundDeliveryReceiver(
- delivery_key_verify_list=lambda: [delivery_key],
- on_message=self._on_inbound,
- on_interrupt=self.on_interrupt,
- )
- runner = web.AppRunner(receiver.build_app(), access_log=None)
- await runner.setup()
- site = web.TCPSite(runner, host, port)
- await site.start()
- self._inbound_runner = runner
- logger.info("relay inbound receiver listening on http://%s:%s", host, port)
- except Exception as exc: # noqa: BLE001 - inbound bind failure must not kill outbound
- logger.warning("relay inbound receiver failed to start: %s", exc)
- self._inbound_runner = None
-
def _apply_descriptor(self, descriptor: CapabilityDescriptor) -> None:
"""Adopt a (re)negotiated descriptor into the live capability surface."""
self.descriptor = descriptor
@@ -134,8 +123,35 @@ class RelayAdapter(BasePlatformAdapter):
async def _on_inbound(self, event) -> None:
"""Bridge a connector-delivered MessageEvent into the normal adapter path."""
+ self._capture_scope(event)
await self.handle_message(event)
+ def _capture_scope(self, event) -> None:
+ """Remember chat_id -> guild scope from an inbound event so our outbound
+ (the agent's reply) can re-assert it for the connector's egress tenant
+ resolution. Never raises — scope tracking must not break inbound."""
+ try:
+ src = getattr(event, "source", None)
+ scope = getattr(src, "guild_id", None) if src else None
+ chat = getattr(src, "chat_id", None) if src else None
+ if scope and chat:
+ self._scope_by_chat[str(chat)] = str(scope)
+ except Exception: # noqa: BLE001 - scope tracking must never break inbound
+ pass
+
+ def _with_scope(self, chat_id: str, metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
+ """Ensure the outbound metadata carries guild_id for the connector's
+ egress tenant resolution. The connector resolves the owning tenant from
+ metadata.guild_id (Discord); without it egress is declined as
+ 'target not routed to an onboarded tenant'. No-op when we have no scope
+ for this chat (e.g. DMs) or it's already present."""
+ meta: Dict[str, Any] = dict(metadata or {})
+ if not meta.get("guild_id"):
+ scope = self._scope_by_chat.get(str(chat_id))
+ if scope:
+ meta["guild_id"] = scope
+ return meta
+
async def on_interrupt(self, session_key: str, chat_id: str) -> None:
"""Bridge a connector-delivered /stop into the adapter's interrupt path.
@@ -147,13 +163,96 @@ class RelayAdapter(BasePlatformAdapter):
"""
await self.interrupt_session_activity(session_key, chat_id)
+ async def _on_passthrough(self, forward, buffer_id: Optional[str] = None) -> None:
+ """Handle a connector-forwarded passthrough request (Phase 5 §5.1).
+
+ The passthrough plane (Discord interactions, Twilio webhooks, …) answers
+ the provider's latency-critical ACK at the connector EDGE, then forwards
+ the real, ALREADY-SANITIZED request to this gateway over the outbound WS.
+ The connector is the trust boundary: it verified the provider signature
+ at the edge and stripped any shared-identity credential (e.g. a Discord
+ interaction follow-up token) into its vault — so this body carries no
+ token, and the agent later acts on it via the token-less ``follow_up``
+ path (``send_follow_up``), never holding the credential.
+
+ For a Discord interaction we decode the (JSON) body and convert it to a
+ normalized ``MessageEvent`` so it flows through the SAME agent path as a
+ chat message (``handle_message``); the agent's reply egresses over the
+ normal outbound/follow_up path. Non-JSON or non-interaction forwards are
+ logged and dropped for now (Twilio/SMS over the relay is a later unit).
+
+ NEVER raises: a malformed forward must not kill the read loop.
+
+ NOTE (open semantic sub-design, flagged for review): the interaction ->
+ MessageEvent mapping below is the v1 default. The exact agent UX for a
+ slash-command / button interaction (vs. a plain message) — command name
+ surfacing, option rendering, deferred-vs-immediate response — is the open
+ piece tracked in the spec; the TRANSPORT + receive mechanism (this whole
+ path) is settled.
+ """
+ try:
+ platform = getattr(forward, "platform", "") or ""
+ if platform == "discord":
+ event = self._discord_interaction_to_event(forward)
+ if event is not None:
+ self._capture_scope(event)
+ await self.handle_message(event)
+ return
+ logger.info(
+ "relay passthrough_forward dropped (no handler): platform=%s method=%s path=%s",
+ platform,
+ getattr(forward, "method", "?"),
+ getattr(forward, "path", "?"),
+ )
+ except Exception: # noqa: BLE001 - a bad forward must never break the reader
+ logger.warning("relay passthrough_forward handling failed", exc_info=True)
+
+ def _discord_interaction_to_event(self, forward):
+ """Convert a forwarded Discord interaction body to a MessageEvent, or None.
+
+ Builds the session source the same way the connector does for an
+ interaction (``interactionSessionSource`` on the connector side), so the
+ agent's session key matches the one the connector bound the follow-up
+ capability under. Returns None when the body isn't a usable interaction
+ (e.g. a PING, which the connector already answers at the edge and never
+ forwards).
+ """
+ import json
+
+ from gateway.platforms.base import MessageType
+
+ try:
+ payload = json.loads(bytes(getattr(forward, "body", b"")).decode("utf-8"))
+ except Exception: # noqa: BLE001 - an undecodable or non-JSON body is not a usable interaction
+ return None
+ if not isinstance(payload, dict):
+ return None
+ # type 1 = PING (answered at the edge, never forwarded); 2 = APPLICATION_COMMAND;
+ # 3 = MESSAGE_COMPONENT; 5 = MODAL_SUBMIT. Surface a best-effort text.
+ itype = payload.get("type")
+ data = payload.get("data") or {}
+ if itype == 2:
+ text = str(data.get("name") or "")
+ elif itype == 3:
+ text = str(data.get("custom_id") or "")
+ else:
+ text = ""
+ member = payload.get("member") or {}
+ user = (member.get("user") if isinstance(member, dict) else None) or payload.get("user") or {}
+ channel_id = str(payload.get("channel_id") or "")
+ guild_id = payload.get("guild_id")
+ source = SessionSource(
+ platform=Platform.RELAY,
+ chat_id=channel_id,
+ chat_type="channel" if guild_id else "dm",
+ user_id=str(user.get("id")) if isinstance(user, dict) and user.get("id") else None,
+ user_name=str(user.get("username")) if isinstance(user, dict) and user.get("username") else None,
+ guild_id=str(guild_id) if guild_id else None,
+ message_id=str(payload.get("id")) if payload.get("id") else None,
+ )
+ return MessageEvent(text=text, message_type=MessageType.TEXT, source=source)
+
async def disconnect(self) -> None:
- if self._inbound_runner is not None:
- try:
- await self._inbound_runner.cleanup()
- except Exception: # noqa: BLE001 - best-effort teardown
- pass
- self._inbound_runner = None
if self._transport is not None:
await self._transport.disconnect()
@@ -172,7 +271,7 @@ class RelayAdapter(BasePlatformAdapter):
"chat_id": chat_id,
"content": content,
"reply_to": reply_to,
- "metadata": metadata or {},
+ "metadata": self._with_scope(chat_id, metadata),
}
)
return SendResult(
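Reviewer note: the scope round-trip added to `RelayAdapter` in this file (`_capture_scope` on inbound, `_with_scope` on outbound) can be sketched in isolation as below. This is a minimal illustration, not the adapter itself: `ScopeTracker`, `capture`, and `with_scope` are hypothetical names, and the chat and guild ids are made up.

```python
from typing import Any, Dict, Optional


class ScopeTracker:
    """Hypothetical stand-in for the adapter's chat_id -> guild_id bookkeeping."""

    def __init__(self) -> None:
        self._scope_by_chat: Dict[str, str] = {}

    def capture(self, chat_id: Optional[str], guild_id: Optional[str]) -> None:
        # Record the scope only when both values are present (DMs carry no guild).
        if chat_id and guild_id:
            self._scope_by_chat[str(chat_id)] = str(guild_id)

    def with_scope(self, chat_id: str, metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        # Copy first so the caller's dict is never mutated.
        meta: Dict[str, Any] = dict(metadata or {})
        if not meta.get("guild_id"):
            scope = self._scope_by_chat.get(str(chat_id))
            if scope:
                meta["guild_id"] = scope
        return meta


tracker = ScopeTracker()
tracker.capture("123", "999")                                # inbound from a guild channel
guild_reply = tracker.with_scope("123", {"silent": True})    # scope re-asserted
dm_reply = tracker.with_scope("456", None)                   # unknown chat: no-op
explicit = tracker.with_scope("123", {"guild_id": "111"})    # caller's value wins
```

The copy-then-fill order matters: an explicit `guild_id` supplied by the caller is kept, and a chat with no recorded scope produces metadata without the key rather than an empty value.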
diff --git a/gateway/relay/inbound_receiver.py b/gateway/relay/inbound_receiver.py
deleted file mode 100644
index 733fe38c2c6..00000000000
--- a/gateway/relay/inbound_receiver.py
+++ /dev/null
@@ -1,204 +0,0 @@
-"""Gateway-side inbound delivery receiver. EXPERIMENTAL.
-
-The connector delivers normalized inbound events to a tenant's gateway over a
-**signed HTTP POST** (connector ``src/relay/httpGatewayDelivery.ts``), NOT over
-the gateway's outbound ``/relay`` WebSocket: the connector instance that owns a
-platform socket is generally not the instance a given gateway dialed out to, so
-inbound is delivered to a tenant ENDPOINT (which may load-balance across gateway
-instances). Each delivery is HMAC-signed with the per-tenant **delivery key**
-(``gateway/relay/auth.py``); this receiver verifies the signature over the EXACT
-raw request bytes before accepting the event.
-
-Two routes (mirroring the connector's two POST targets):
- POST {base} {"type":"message", "event": , ...}
- POST {base}/interrupt {"type":"interrupt","session_key": ..., "reason"?}
-
-The receiver:
- 1. reads the RAW body bytes (never a reparsed/re-serialized form — the HMAC is
- over the literal bytes the connector signed),
- 2. verifies ``x-relay-signature`` / ``x-relay-timestamp`` against the delivery
- key verify list (primary + secondary during rotation), within the replay
- window — rejects 401 on any failure,
- 3. parses the JSON and dispatches: a ``message`` to the inbound handler (the
- RelayAdapter's ``handle_message`` via the transport's normal path), an
- ``interrupt`` to the interrupt handler.
-
-EXPERIMENTAL: the transport protocol may change without a deprecation cycle
-until ≥2 Class-1 platforms validate it. See docs/relay-connector-contract.md.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-from typing import Any, Awaitable, Callable, Optional, Sequence
-
-from gateway.platforms.base import MessageEvent
-from gateway.relay.auth import (
- DELIVERY_SIG_HEADER,
- DELIVERY_TS_HEADER,
- verify_delivery_signature,
-)
-
-logger = logging.getLogger(__name__)
-
-# Callbacks the receiver dispatches verified deliveries to.
-InboundMessageHandler = Callable[[MessageEvent], Awaitable[None]]
-InboundInterruptHandler = Callable[[str, str], Awaitable[None]]
-
-try: # lazy/optional dep — mirrors the other HTTP-receiving adapters
- from aiohttp import web
-except ImportError: # pragma: no cover - exercised only when the extra is absent
- web = None # type: ignore[assignment]
-
-AIOHTTP_AVAILABLE = web is not None
-
-
-def _event_from_wire(raw: dict) -> MessageEvent:
- """Rebuild a MessageEvent from the connector's normalized inbound payload.
-
- Identical mapping to the WS transport's ``_event_from_wire`` (the wire shape
- is the same; only the transport differs). Kept here so the HTTP receiver has
- no import dependency on the WS transport module.
- """
- from gateway.config import Platform
- from gateway.platforms.base import MessageType
- from gateway.session import SessionSource
-
- src = raw.get("source", {}) or {}
- platform = src.get("platform", "relay")
- try:
- platform_enum = Platform(platform)
- except ValueError:
- platform_enum = Platform.RELAY
-
- source = SessionSource(
- platform=platform_enum,
- chat_id=src.get("chat_id", ""),
- chat_type=src.get("chat_type", "dm"),
- chat_name=src.get("chat_name"),
- user_id=src.get("user_id"),
- user_name=src.get("user_name"),
- thread_id=src.get("thread_id"),
- chat_topic=src.get("chat_topic"),
- user_id_alt=src.get("user_id_alt"),
- chat_id_alt=src.get("chat_id_alt"),
- guild_id=src.get("guild_id"),
- parent_chat_id=src.get("parent_chat_id"),
- message_id=src.get("message_id"),
- )
- try:
- msg_type = MessageType(raw.get("message_type", "text"))
- except ValueError:
- msg_type = MessageType.TEXT
-
- return MessageEvent(
- text=raw.get("text", ""),
- message_type=msg_type,
- source=source,
- message_id=raw.get("message_id"),
- reply_to_message_id=raw.get("reply_to_message_id"),
- media_urls=raw.get("media_urls") or [],
- )
-
-
-class InboundDeliveryReceiver:
- """Verifies + dispatches signed connector→gateway inbound deliveries.
-
- Transport-agnostic core: ``handle_raw`` takes the raw body bytes + headers +
- which route was hit and returns ``(status, body)``. The aiohttp wiring
- (``build_app`` / ``serve``) is a thin shell so the verify+dispatch logic is
- unit-testable without a live socket.
- """
-
- def __init__(
- self,
- *,
- delivery_key_verify_list: Callable[[], Sequence[str]],
- on_message: InboundMessageHandler,
- on_interrupt: Optional[InboundInterruptHandler] = None,
- max_skew_seconds: int = 300,
- ) -> None:
- # A callable (not a static list) so a rotated delivery key is picked up
- # without rebuilding the receiver — mirrors the connector's verify list.
- self._verify_list = delivery_key_verify_list
- self._on_message = on_message
- self._on_interrupt = on_interrupt
- self._max_skew_seconds = max_skew_seconds
-
- async def handle_raw(
- self, *, raw_body: bytes, timestamp: Optional[str], signature: Optional[str], is_interrupt: bool
- ) -> tuple[int, dict]:
- """Verify the signature over ``raw_body`` and dispatch. Returns (status, json).
-
- 401 on a missing/invalid/expired signature (never dispatches unverified).
- 400 on malformed JSON. 200 on a verified, dispatched delivery.
- """
- verify_keys = list(self._verify_list() or [])
- if not verify_keys:
- # No delivery key provisioned -> we cannot verify -> reject. A gateway
- # that hasn't enrolled must not accept inbound (fail closed).
- logger.warning("relay inbound: no delivery key configured; rejecting")
- return 401, {"error": "no delivery key configured"}
-
- # Verify over the EXACT raw bytes the connector signed. Decode to text
- # with the same UTF-8 the connector's JSON.stringify produced; a single
- # differing byte breaks the HMAC (raw-body-preservation discipline).
- body_text = raw_body.decode("utf-8", errors="strict")
- if not verify_delivery_signature(
- body_text, timestamp, signature, verify_keys, self._max_skew_seconds
- ):
- return 401, {"error": "invalid delivery signature"}
-
- try:
- payload = json.loads(body_text)
- except json.JSONDecodeError:
- return 400, {"error": "invalid JSON body"}
-
- if is_interrupt or payload.get("type") == "interrupt":
- session_key = str(payload.get("session_key", ""))
- chat_id = str(payload.get("chat_id", "") or payload.get("reason", "") or "")
- if self._on_interrupt is not None and session_key:
- await self._on_interrupt(session_key, chat_id)
- return 200, {"ok": True}
-
- # Default: a normalized inbound message event.
- event_raw = payload.get("event")
- if not isinstance(event_raw, dict):
- return 400, {"error": "missing event"}
- event = _event_from_wire(event_raw)
- await self._on_message(event)
- return 200, {"ok": True}
-
- # ── aiohttp wiring (thin shell over handle_raw) ──────────────────────
- def build_app(self) -> Any:
- """Build an aiohttp Application exposing the delivery + interrupt routes."""
- if not AIOHTTP_AVAILABLE:
- raise RuntimeError(
- "InboundDeliveryReceiver requires the 'aiohttp' package "
- "(install the messaging extra)."
- )
-
- async def _deliver(request: Any) -> Any:
- return await self._respond(request, is_interrupt=False)
-
- async def _interrupt(request: Any) -> Any:
- return await self._respond(request, is_interrupt=True)
-
- app = web.Application()
- app.router.add_get("/healthz", lambda _: web.Response(text="ok"))
- app.router.add_post("/", _deliver)
- app.router.add_post("/interrupt", _interrupt)
- return app
-
- async def _respond(self, request: Any, *, is_interrupt: bool) -> Any:
- # Read the RAW bytes — do NOT use request.json() (it reparses and we'd
- # verify over a re-serialized form, breaking the HMAC).
- raw_body = await request.read()
- status, body = await self.handle_raw(
- raw_body=raw_body,
- timestamp=request.headers.get(DELIVERY_TS_HEADER),
- signature=request.headers.get(DELIVERY_SIG_HEADER),
- is_interrupt=is_interrupt,
- )
- return web.json_response(body, status=status)
diff --git a/gateway/relay/transport.py b/gateway/relay/transport.py
index afe6f769f26..b557416c7ad 100644
--- a/gateway/relay/transport.py
+++ b/gateway/relay/transport.py
@@ -30,6 +30,13 @@ from gateway.relay.descriptor import CapabilityDescriptor
# Callback the transport invokes for each inbound normalized event.
InboundHandler = Callable[[MessageEvent], Awaitable[None]]
+# Callback the transport invokes for each forwarded passthrough request (§5.1).
+# The first arg is a PassthroughForward (gateway/relay/ws_transport.py) — typed
+# as Any here to keep this protocol module free of a concrete-transport import
+# (ws_transport imports FROM this module). The second is an optional bufferId
+# (Phase 5 §5.3 buffered flip) the handler acks after durable handoff.
+PassthroughHandler = Callable[[Any, Optional[str]], Awaitable[None]]
+
@runtime_checkable
class RelayTransport(Protocol):
@@ -51,6 +58,18 @@ class RelayTransport(Protocol):
"""Register the callback invoked with each inbound MessageEvent."""
...
+ def set_passthrough_handler(self, handler: "PassthroughHandler") -> None:
+ """Register the callback invoked with each forwarded passthrough request.
+
+ Phase 5 §5.1: the passthrough plane (Discord interactions, Twilio, …)
+ answers the provider's edge ACK at the connector, then forwards the real
+ request to the gateway over this same outbound socket (a hosted gateway
+ has no public inbound port). The transport invokes ``handler(forward,
+ buffer_id)`` for each ``passthrough_forward`` frame. Optional on a
+ transport (an in-memory stub may not implement it).
+ """
+ ...
+
async def send_outbound(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Carry an outbound action (send/edit/typing) to the connector.
diff --git a/gateway/relay/ws_transport.py b/gateway/relay/ws_transport.py
index b2e8eda09cd..eb17848e0b3 100644
--- a/gateway/relay/ws_transport.py
+++ b/gateway/relay/ws_transport.py
@@ -33,6 +33,7 @@ import asyncio
import json
import logging
import uuid
+from dataclasses import dataclass
from typing import Any, Dict, Optional
from gateway.platforms.base import MessageEvent, MessageType
@@ -54,6 +55,35 @@ _HANDSHAKE_TIMEOUT_S = 30.0
_OUTBOUND_TIMEOUT_S = 30.0
+def _ws_dial_url(url: str) -> str:
+ """Normalize a connector URL to the ``ws(s)://…/relay`` dial target.
+
+ The relay URL is configured once (``GATEWAY_RELAY_URL`` / ``gateway.relay_url``)
+ as the connector's BASE URL (e.g. ``https://connector.example``) and shared by
+ both the provision POST (which needs ``http(s)://…/relay/provision`` — see
+ ``_provision_url``) and the WS dial (which needs ``ws(s)://…/relay``, the path
+ the connector mounts its ``WebSocketServer`` on). Two normalizations, both
+ load-bearing:
+
+ - scheme: ``https -> wss``, ``http -> ws`` (``websockets.connect`` raises
+ "scheme isn't ws or wss" on an http(s) URL).
+ - path: ensure it ends in ``/relay`` (the connector returns HTTP 400 on an
+ upgrade to any other path, since the WS server is mounted at ``/relay``).
+
+ Idempotent: an already-``ws(s)://…/relay`` URL is returned unchanged, so a URL
+ configured WITH the scheme and/or ``/relay`` still works.
+ """
+ raw = (url or "").strip()
+ if raw.startswith("https://"):
+ raw = "wss://" + raw[len("https://"):]
+ elif raw.startswith("http://"):
+ raw = "ws://" + raw[len("http://"):]
+ raw = raw.rstrip("/")
+ if not raw.endswith("/relay"):
+ raw = f"{raw}/relay"
+ return raw
+
+
def _event_from_wire(raw: Dict[str, Any]) -> MessageEvent:
"""Rebuild a MessageEvent from the connector's normalized inbound payload.
@@ -99,6 +129,54 @@ def _event_from_wire(raw: Dict[str, Any]) -> MessageEvent:
)
+@dataclass
+class PassthroughForward:
+ """A connector-forwarded passthrough-plane request (Phase 5 §5.1).
+
+ The connector answered the provider's latency-critical ACK at its edge, then
+ forwarded the real (already-sanitized) request to this gateway over the WS.
+ ``body`` holds the exact bytes the connector forwarded, base64-decoded from
+ the wire (which encodes them for byte parity). ``headers`` preserves arrival order.
+ """
+
+ platform: str
+ bot_id: str
+ method: str
+ path: str
+ headers: list[tuple[str, str]]
+ body: bytes
+
+
+def _passthrough_from_wire(raw: Dict[str, Any]) -> PassthroughForward:
+ """Rebuild a PassthroughForward from the connector's wire frame.
+
+ Mirrors the connector's ``PassthroughForward`` (relay/protocol.ts): the body
+ is base64-decoded back to the exact bytes the connector forwarded, so the
+ gateway re-processes byte-identical content (the connector is the trust
+ boundary; it already verified at the edge).
+ """
+ import base64
+
+ body_b64 = raw.get("bodyB64", "") or ""
+ try:
+ body = base64.b64decode(body_b64)
+ except Exception: # noqa: BLE001 - a malformed body must not crash the reader
+ body = b""
+ headers_raw = raw.get("headers", []) or []
+ headers: list[tuple[str, str]] = []
+ for pair in headers_raw:
+ if isinstance(pair, (list, tuple)) and len(pair) == 2:
+ headers.append((str(pair[0]), str(pair[1])))
+ return PassthroughForward(
+ platform=str(raw.get("platform", "")),
+ bot_id=str(raw.get("botId", "")),
+ method=str(raw.get("method", "")),
+ path=str(raw.get("path", "")),
+ headers=headers,
+ body=body,
+ )
+
+
class WebSocketRelayTransport:
"""RelayTransport over a WebSocket connection the gateway dials to the connector."""
@@ -118,7 +196,7 @@ class WebSocketRelayTransport:
"WebSocketRelayTransport requires the 'websockets' package "
"(install the messaging extra)."
)
- self._url = url
+ self._url = _ws_dial_url(url)
self._platform = platform
self._bot_id = bot_id
self._connect_timeout_s = connect_timeout_s
@@ -289,6 +367,16 @@ class WebSocketRelayTransport:
handler = getattr(self, "_interrupt_inbound_handler", None)
if handler is not None:
await handler(frame.get("session_key", ""), frame.get("chat_id", ""))
+ elif ftype == "passthrough_forward":
+ # Phase 5 §5.1: a forwarded passthrough-plane request (Discord
+ # interaction, Twilio, …) the connector already edge-ACKed. It rides
+ # the SAME outbound WS as inbound messages so a hosted gateway needs
+ # no public inbound port. Dispatch to the adapter's handler; the
+ # bufferId (when present, §5.3 buffered flip) is passed for ack.
+ handler = getattr(self, "_passthrough_handler", None)
+ if handler is not None:
+ fwd = _passthrough_from_wire(frame.get("forward", {}))
+ await handler(fwd, frame.get("bufferId"))
else:
# hello/outbound/interrupt are gateway->connector; ignore if echoed.
pass
@@ -296,3 +384,12 @@ class WebSocketRelayTransport:
def set_interrupt_inbound_handler(self, handler: Any) -> None:
"""Register the callback for connector->gateway interrupt_inbound frames."""
self._interrupt_inbound_handler = handler
+
+ def set_passthrough_handler(self, handler: Any) -> None:
+ """Register the callback for connector->gateway passthrough_forward frames.
+
+ Mirrors set_interrupt_inbound_handler: the runner/adapter wires this so a
+ forwarded passthrough request (Phase 5 §5.1) reaches the adapter over the
+ same outbound WS the gateway already holds. ``handler(forward, buffer_id)``.
+ """
+ self._passthrough_handler = handler
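Reviewer note: the two normalizations described in the `_ws_dial_url` docstring can be checked quickly with the standalone copy below. The function body follows the patch; the name `ws_dial_url` (without the leading underscore) and the sample URLs are illustrative assumptions.

```python
def ws_dial_url(url: str) -> str:
    """Standalone copy of the patch's _ws_dial_url, for experimentation only."""
    raw = (url or "").strip()
    # Scheme: websockets.connect needs ws:// or wss://, not http(s)://.
    if raw.startswith("https://"):
        raw = "wss://" + raw[len("https://"):]
    elif raw.startswith("http://"):
        raw = "ws://" + raw[len("http://"):]
    # Path: drop trailing slashes, then make sure the URL ends in /relay.
    raw = raw.rstrip("/")
    if not raw.endswith("/relay"):
        raw = f"{raw}/relay"
    return raw


results = {
    "base": ws_dial_url("https://connector.example"),
    "trailing_slash": ws_dial_url("http://localhost:8080/"),
    "already_ws": ws_dial_url("wss://connector.example/relay"),
    "relay_with_slash": ws_dial_url("https://connector.example/relay/"),
    "empty": ws_dial_url(""),
}
```

The last case is worth noting: an empty string normalizes to `/relay`, so the function does not validate its input, and callers presumably check that a relay URL is configured before dialing.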
diff --git a/gateway/run.py b/gateway/run.py
index 8f139341793..a388f184ad6 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -195,6 +195,19 @@ def _gateway_platform_value(platform: Any) -> str:
return str(getattr(platform, "value", platform) or "").strip().lower()
+def _non_conversational_metadata(
+ metadata: Optional[Dict[str, Any]] = None,
+ *,
+ platform: Any = None,
+) -> Optional[Dict[str, Any]]:
+ """Mark Discord lifecycle/status sends without changing other platforms."""
+ if _gateway_platform_value(platform) != "discord":
+ return metadata
+ merged = dict(metadata or {})
+ merged["non_conversational"] = True
+ return merged
+
+
def _is_transient_network_error(exc: BaseException) -> bool:
"""Return True for transient network errors safe to log + swallow.
@@ -792,6 +805,13 @@ def _build_gateway_agent_history(
# tools that were killed mid-flight.
agent_history = _strip_interrupted_tool_tails(agent_history)
+ # Strip a dangling assistant(tool_calls) tail with no tool answers —
+ # the signature of a SIGKILL mid-tool-call (e.g. the tool itself ran
+ # `docker restart`/`kill` and took the gateway down before the result
+ # was persisted). Without this the model re-issues the unanswered call
+ # on resume and loops the restart forever (#49201).
+ agent_history = _strip_dangling_tool_call_tail(agent_history)
+
observed_context = "\n".join(observed_group_context).strip() or None
return agent_history, observed_context
@@ -917,6 +937,50 @@ def _strip_interrupted_tool_tails(
return cleaned
+def _strip_dangling_tool_call_tail(
+ agent_history: List[Dict[str, Any]],
+) -> List[Dict[str, Any]]:
+ """Strip a trailing ``assistant(tool_calls)`` block left with NO answers.
+
+ When a tool call itself kills the gateway process (``docker restart``,
+ ``systemctl restart``, ``kill``, ``hermes gateway restart``), the process
+ is terminated by SIGKILL *mid-call* — before the tool result is ever
+ written and before the orderly shutdown rewind
+ (``_drop_trailing_empty_response_scaffolding``) can run. The last thing
+ persisted is the ``assistant`` message that issued the ``tool_calls``,
+ with zero matching ``tool`` rows.
+
+ On resume the model sees an unanswered tool call at the tail and naturally
+ re-issues it — which restarts the gateway again, producing the infinite
+ reboot loop in #49201. ``_strip_interrupted_tool_tails`` does not catch
+ this because there is no tool result to inspect for an interrupt marker.
+
+ This strips that dangling tail at the source so there is nothing for the
+ model to re-execute. It only acts when the tail is an
+ ``assistant(tool_calls)`` whose calls have NO corresponding ``tool``
+ results — a completed assistant→tool pair (any tool answers present) is
+ left untouched so genuine mid-progress tool loops still resume.
+ """
+ if not agent_history:
+ return agent_history
+
+ last = agent_history[-1]
+ if not (
+ isinstance(last, dict)
+ and last.get("role") == "assistant"
+ and last.get("tool_calls")
+ ):
+ return agent_history
+
+ logger.debug(
+ "Stripping dangling unanswered assistant(tool_calls) tail "
+ "(%d call(s)) — process likely killed mid-tool-call by a "
+ "restart/shutdown command (#49201)",
+ len(last.get("tool_calls") or []),
+ )
+ return agent_history[:-1]
+
+
_AUTO_CONTINUE_NOTE_PREFIX = "[System note: Your previous turn"
_AUTO_CONTINUE_FALLBACK_PREFIX = "[System note: A new message"
@@ -1051,6 +1115,55 @@ def _collect_auto_append_media_tags(
return media_tags, has_voice_directive
+
+def _collect_history_media_paths(agent_history: List[Dict[str, Any]]) -> set:
+ """Collect every media path already delivered in prior tool results.
+
+ Used to dedup auto-appended MEDIA tags so the same file is not re-sent on
+ later turns. Must cover BOTH delivery shapes:
+ * ``MEDIA:`` text tags in tool results, and
+ * ``image_generate`` JSON-payload paths (``host_image`` / ``image`` /
+ ``agent_visible_image``), which carry no MEDIA: tag.
+
+ Missing the JSON-payload shape caused #46627: after a compression
+ boundary the auto-append fallback rescans full history, re-discovers an
+ earlier ``image_generate`` result whose path was never in the dedup set,
+ and re-emits the MEDIA tag every turn.
+ """
+ paths: set = set()
+ tool_name_by_call_id: Dict[str, str] = {}
+ for msg in agent_history:
+ if msg.get("role") == "assistant":
+ for call in msg.get("tool_calls") or []:
+ cid = call.get("id") or call.get("call_id")
+ fn = call.get("function") or {}
+ name = str(fn.get("name") or call.get("name") or "")
+ if cid and name:
+ tool_name_by_call_id[str(cid)] = name
+ for msg in agent_history:
+ if msg.get("role") not in {"tool", "function"}:
+ continue
+ content = str(msg.get("content", "") or "")
+ if "MEDIA:" in content:
+ for match in _TOOL_MEDIA_RE.finditer(content):
+ p = match.group(1).strip().rstrip('",}')
+ if p:
+ paths.add(p)
+ continue
+ cid = str(msg.get("tool_call_id") or msg.get("call_id") or "")
+ if tool_name_by_call_id.get(cid) == "image_generate":
+ try:
+ payload = json.loads(content)
+ except Exception: # non-JSON tool output carries no payload paths
+ payload = None
+ if isinstance(payload, dict) and payload.get("success"):
+ for field in _JSON_MEDIA_TOOL_PATH_FIELDS:
+ jp = payload.get(field)
+ if isinstance(jp, str) and jp:
+ paths.add(jp)
+ break
+ return paths
+
# ---------------------------------------------------------------------------
# SSL certificate auto-detection for NixOS and other non-standard systems.
# Must run BEFORE any HTTP library (discord, aiohttp, etc.) is imported.
@@ -1173,13 +1286,31 @@ def _reload_runtime_env_preserving_config_authority() -> None:
pick up rotated API keys. config.yaml remains authoritative for agent budget
settings such as agent.max_turns; otherwise a stale HERMES_MAX_ITERATIONS in
.env can replace the startup bridge on later turns.
+
+ In multiplex mode this is a NO-OP for the credential reload: secrets come
+ from the per-turn ``set_secret_scope`` (installed by ``_profile_runtime_scope``)
+ which loads the routed profile's ``.env`` into an isolated mapping. Mutating
+ the process-global ``os.environ`` here would defeat that isolation and leak
+ the default profile's keys to every profile's turns and subprocesses.
"""
+ from agent.secret_scope import is_multiplex_active
+ if is_multiplex_active():
+ # Credentials are resolved from the active profile's secret scope, not
+ # os.environ. Still honor config.yaml's agent.max_turns bridge below
+ # using the scoped home, but never reload .env into global env.
+ _bridge_max_turns_from_config(_hermes_home)
+ return
+
load_hermes_dotenv(
hermes_home=_hermes_home,
project_env=Path(__file__).resolve().parents[1] / '.env',
)
+ _bridge_max_turns_from_config(_hermes_home)
- config_path = _hermes_home / 'config.yaml'
+
+def _bridge_max_turns_from_config(home: "Path") -> None:
+ """Bridge config.yaml agent.max_turns into HERMES_MAX_ITERATIONS (a global)."""
+ config_path = home / 'config.yaml'
if not config_path.exists():
return
try:
@@ -1188,6 +1319,15 @@ def _reload_runtime_env_preserving_config_authority() -> None:
cfg = _yaml.safe_load(f) or {}
from hermes_cli.config import _expand_env_vars
cfg = _expand_env_vars(cfg)
+ # Managed scope: keep administrator-pinned values authoritative on every
+ # turn too. This per-turn reload re-bridges config→env, so without the
+ # overlay a managed agent.max_turns / timezone / redact_secrets would be
+ # replaced by the user's value after the first turn. Fail-open.
+ try:
+ from hermes_cli import managed_scope
+ cfg = managed_scope.apply_managed_overlay(cfg)
+ except Exception:
+ pass
except Exception:
return
@@ -1196,6 +1336,80 @@ def _reload_runtime_env_preserving_config_authority() -> None:
os.environ["HERMES_MAX_ITERATIONS"] = str(agent_cfg["max_turns"])
+def _current_max_iterations() -> int:
+ """Return the current per-turn iteration budget after runtime env refresh."""
+ _reload_runtime_env_preserving_config_authority()
+ try:
+ return int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
+ except (TypeError, ValueError):
+ return 90
+
+
+from contextlib import contextmanager as _contextmanager
+
+
+# Platforms that bind a host TCP port (HTTP/webhook listeners). In a profile
+# multiplexer the default profile owns the single shared listener and serves
+# every profile through the /p// URL prefix, so a SECONDARY profile
+# enabling one of these is always a misconfiguration: it would try to bind a
+# port already held by the default's listener. We hard-error on it rather than
+# silently dropping the adapter (see _start_one_profile_adapters).
+# Stored as platform .value strings since the Platform enum is imported below.
+_PORT_BINDING_PLATFORM_VALUES = frozenset({
+ "webhook",
+ "api_server",
+ "msgraph_webhook",
+ "feishu",
+ "wecom_callback",
+ "bluebubbles",
+ "sms",
+})
+
+
+class MultiplexConfigError(RuntimeError):
+ """A profile multiplexer config is invalid (fail-fast at startup).
+
+ Distinct from a transient adapter-connect failure: a transient error is
+ logged and the gateway stays alive to retry, but a config error means the
+ operator must fix config.yaml, so it aborts startup cleanly.
+ """
+
+
+@_contextmanager
+def _profile_runtime_scope(profile_home: "Path"):
+ """Scope config/skills/memory AND credentials to a profile for one turn.
+
+ Combines the two seams the multiplexer needs:
+ 1. ``set_hermes_home_override`` — redirects ``get_hermes_home()`` (config,
+ skills, memory, SOUL, sessions) to the profile's home. Contextvar, so
+ it propagates into the agent worker thread via ``copy_context()``.
+ 2. ``set_secret_scope`` — installs the profile's ``.env`` secrets as the
+ authoritative credential source, so ``get_secret`` reads this profile's
+ keys and never the process-global ``os.environ`` (which in a
+ multiplexer may hold another profile's values).
+
+ Only used on the multiplexed inbound path. Single-profile gateways never
+ enter this scope, so their behavior is unchanged. Loading the profile's
+ ``.env`` here does NOT mutate ``os.environ`` — ``build_profile_secret_scope``
+ returns an isolated dict — which is what keeps subprocesses (MCP, kanban)
+ from inheriting cross-profile secrets.
+ """
+ from hermes_constants import set_hermes_home_override, reset_hermes_home_override
+ from agent.secret_scope import (
+ build_profile_secret_scope,
+ set_secret_scope,
+ reset_secret_scope,
+ )
+
+ home_token = set_hermes_home_override(str(profile_home))
+ secret_token = set_secret_scope(build_profile_secret_scope(Path(profile_home)))
+ try:
+ yield
+ finally:
+ reset_secret_scope(secret_token)
+ reset_hermes_home_override(home_token)
+
+
_DOCKER_VOLUME_SPEC_RE = re.compile(r"^(?P.+):(?P/[^:]+?)(?::(?P[^:]+))?$")
_DOCKER_MEDIA_OUTPUT_CONTAINER_PATHS = {"/output", "/outputs"}
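Review note on `_profile_runtime_scope`: the set-token / reset-in-`finally` pattern it relies on can be sketched in isolation. This is a minimal illustration, assuming two stand-in context variables; `_home_override`, `_secret_scope`, `runtime_scope` and `current_scope` are hypothetical names, not the real `hermes_constants` or `agent.secret_scope` API.

```python
import contextvars
import threading
from contextlib import contextmanager

# Hypothetical stand-ins for the real home-override and secret-scope variables.
_home_override = contextvars.ContextVar("home_override", default=None)
_secret_scope = contextvars.ContextVar("secret_scope", default=None)


@contextmanager
def runtime_scope(home, secrets):
    """Install both values for the duration of the block, then restore them."""
    home_token = _home_override.set(home)
    secret_token = _secret_scope.set(dict(secrets))  # isolated copy, not os.environ
    try:
        yield
    finally:
        # Reset in reverse order of installation.
        _secret_scope.reset(secret_token)
        _home_override.reset(home_token)


def current_scope():
    return _home_override.get(), _secret_scope.get()


seen_in_thread = []
with runtime_scope("/profiles/a", {"TOKEN": "a-secret"}):
    inside = current_scope()
    with runtime_scope("/profiles/b", {"TOKEN": "b-secret"}):
        nested = current_scope()
    restored = current_scope()
    # A worker thread only sees the scope if it runs inside a copied context.
    ctx = contextvars.copy_context()
    worker = threading.Thread(
        target=ctx.run, args=(lambda: seen_in_thread.append(current_scope()),)
    )
    worker.start()
    worker.join()
outside = current_scope()
```

Because each `reset` restores the value that was current when its token was created, nested scopes unwind correctly and nothing leaks once the outer block exits.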
@@ -1210,6 +1424,17 @@ if _config_path.exists():
# Expand ${ENV_VAR} references before bridging to env vars.
from hermes_cli.config import _expand_env_vars
_cfg = _expand_env_vars(_cfg)
+ # Managed scope: overlay administrator-pinned values BEFORE bridging to
+ # env vars, so a managed timezone / redact_secrets / max_turns / terminal
+ # setting wins over the user's value at the env layer too. This bridge
+ # reads config.yaml directly (not via load_config), so without the
+ # overlay every HERMES_*/TERMINAL_* env var below would carry the user's
+ # value even when an administrator pinned it. Fail-open via the helper.
+ try:
+ from hermes_cli import managed_scope
+ _cfg = managed_scope.apply_managed_overlay(_cfg)
+ except Exception:
+ pass
# Top-level simple values (fallback only — don't override .env)
for _key, _val in _cfg.items():
if isinstance(_val, (str, int, float, bool)) and _key not in os.environ:
@@ -1239,6 +1464,7 @@ if _config_path.exists():
"container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
"docker_volumes": "TERMINAL_DOCKER_VOLUMES",
"docker_env": "TERMINAL_DOCKER_ENV",
+ "docker_extra_args": "TERMINAL_DOCKER_EXTRA_ARGS",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
@@ -1880,8 +2106,14 @@ def _load_gateway_config() -> dict:
Uses the module-level ``_hermes_home`` (so tests that monkeypatch it
still see their fixture) and shares the mtime-keyed raw-yaml cache
from ``hermes_cli.config.read_raw_config`` when the paths match.
+
+ Managed scope is overlaid on the result (via the shared helper) so the
+ gateway honors administrator-pinned values — neither read_raw_config nor a
+ direct yaml.safe_load carries the managed merge on its own. Fail-open.
"""
config_path = _hermes_home / 'config.yaml'
+ raw: dict = {}
+ used_canonical = False
try:
from hermes_cli.config import get_config_path, read_raw_config
# Fast path: if _hermes_home agrees with the canonical config
@@ -1889,18 +2121,31 @@ def _load_gateway_config() -> dict:
# direct read (keeps test fixtures with a monkeypatched
# _hermes_home working).
if config_path == get_config_path():
- return read_raw_config()
+ raw = read_raw_config()
+ used_canonical = True
except Exception:
pass
+ if not used_canonical:
+ try:
+ if config_path.exists():
+ import yaml
+ with open(config_path, 'r', encoding='utf-8') as f:
+ raw = yaml.safe_load(f) or {}
+ except Exception:
+ logger.debug("Could not load gateway config from %s", config_path)
+ raw = {}
+
+ # Overlay managed scope. read_raw_config() returns the user's raw YAML
+ # WITHOUT the managed merge (that lives in load_config/_load_config_impl),
+ # so the overlay is required on both paths for the gateway to honor pinned
+ # values. Helper is fail-open and a no-op when no managed scope exists.
try:
- if config_path.exists():
- import yaml
- with open(config_path, 'r', encoding='utf-8') as f:
- return yaml.safe_load(f) or {}
+ from hermes_cli import managed_scope
+ raw = managed_scope.apply_managed_overlay(raw if isinstance(raw, dict) else {})
except Exception:
- logger.debug("Could not load gateway config from %s", config_path)
- return {}
+ pass
+ return raw if isinstance(raw, dict) else {}
def _load_gateway_runtime_config() -> dict:
@@ -2240,7 +2485,22 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
def __init__(self, config: Optional[GatewayConfig] = None):
global _gateway_runner_ref
self.config = config or load_gateway_config()
+ # Mark the process as a profile multiplexer when configured. This flips
+ # agent.secret_scope.get_secret() to fail-closed on any unscoped
+ # credential read, so a missed migration crashes loudly instead of
+ # leaking a cross-profile value (Workstream A). Inert when off.
+ try:
+ from agent.secret_scope import set_multiplex_active
+ set_multiplex_active(bool(getattr(self.config, "multiplex_profiles", False)))
+ except Exception:
+ logger.debug("could not set multiplex-active flag", exc_info=True)
self.adapters: Dict[Platform, BasePlatformAdapter] = {}
+ # Multi-profile multiplexing: adapters for NON-default profiles live
+ # here, keyed by profile name then Platform. self.adapters stays the
+ # default/active profile's map so the ~93 existing self.adapters[...]
+ # sites are untouched when multiplexing is off (this dict is empty).
+ # Populated by _start_secondary_profile_adapters().
+ self._profile_adapters: Dict[str, Dict[Platform, BasePlatformAdapter]] = {}
self._warn_if_docker_media_delivery_is_risky()
_gateway_runner_ref = _weakref.ref(self)
@@ -2792,10 +3052,24 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
except Exception:
pass
config = getattr(self, "config", None)
+ # Mirror SessionStore._resolve_profile_for_key so this fallback path
+ # produces the same namespace as the primary path: None (legacy
+ # agent:main) unless multiplexing is on, then the active profile.
+ _profile = None
+ if getattr(config, "multiplex_profiles", False):
+ if source.profile:
+ _profile = source.profile
+ else:
+ try:
+ from hermes_cli.profiles import get_active_profile_name
+ _profile = get_active_profile_name() or "default"
+ except Exception:
+ _profile = None
return build_session_key(
source,
group_sessions_per_user=getattr(config, "group_sessions_per_user", True),
thread_sessions_per_user=getattr(config, "thread_sessions_per_user", False),
+ profile=_profile,
)
def _telegram_topic_mode_enabled(self, source: SessionSource) -> bool:
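Review note on the fallback session-key path: the profile resolution order in the hunk above can be restated as a small pure function. This is a sketch only; `resolve_profile` and its `get_active` callable are hypothetical, standing in for the inline logic and for `get_active_profile_name`.

```python
def resolve_profile(multiplex, source_profile, get_active):
    """Return the profile namespace for a session key, or None for legacy keys."""
    if not multiplex:
        return None  # legacy namespace when multiplexing is off
    if source_profile:
        return source_profile  # the adapter already stamped the event
    try:
        return get_active() or "default"
    except Exception:
        return None  # lookup failed: fall back to the legacy namespace


def _broken_lookup():
    raise RuntimeError("profiles module unavailable")


off = resolve_profile(False, "work", lambda: "default")
stamped = resolve_profile(True, "work", lambda: "default")
unstamped = resolve_profile(True, None, lambda: "")
failed = resolve_profile(True, None, _broken_lookup)
```

The point of the mirror is that both code paths produce the same key for the same event, so a session cannot split across two namespaces depending on which path built its key.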
@@ -3392,6 +3666,28 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
except Exception:
pass
+ def _persist_active_agents(self) -> None:
+ """Persist the live in-flight agent count to ``gateway_state.json``.
+
+ Called at every turn boundary (a running-agent slot is claimed or
+ released) so the dashboard ``/api/status`` readout reflects in-flight
+ gateway turns in near-real-time. Without this the file is only
+ rewritten on lifecycle transitions, so any ``active_agents`` read
+ between transitions is stale (a turn could start and finish without the
+ file ever moving).
+
+ Deliberately passes ONLY ``active_agents`` — ``gateway_state`` and the
+ other fields stay ``_UNSET`` so ``write_runtime_status``'s
+ read-merge-write preserves the current lifecycle state (``running`` /
+ ``draining`` / …). Passing ``gateway_state=None`` here would clobber it.
+ Best-effort: a failed status write must never disrupt a turn.
+ """
+ try:
+ from gateway.status import write_runtime_status
+ write_runtime_status(active_agents=self._running_agent_count())
+ except Exception:
+ pass
+
def _update_platform_runtime_status(
self,
platform: str,
@@ -3945,6 +4241,20 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if not adapter:
return False # let default path handle it
+ # --- Internal synthetic events must never interrupt/steer ---
+ # Async-delegation completions (delegate_task(background=true)) and
+ # background-process completions (terminal notify_on_complete) re-enter
+ # the originating session as internal MessageEvents. When the session
+ # is busy, treating them like a user TEXT message means interrupt-mode
+ # (the default busy_text_mode) aborts the active turn AND sends a "⚡
+ # Interrupting current task" ack — exactly the opposite of the design
+ # invariant that a completion surfaces as a NEW turn only when idle and
+ # never splices into a running turn. Fall through to the base adapter,
+ # which queues internal events silently (no interrupt, no ack) so they
+ # cascade after the current turn finishes.
+ if getattr(event, "internal", False):
+ return False
+
running_agent = self._running_agents.get(session_key)
effective_mode = self._busy_input_mode
@@ -4002,13 +4312,19 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# current run finishes (or is interrupted). Skip this for a
# successful steer — the text already landed inside the run and
# must NOT also be replayed as a next-turn user message.
+ #
+ # Route through _queue_or_replace_pending_event (the same FIFO
+ # infrastructure used by busy queue-mode and /queue) rather than a
+ # raw merge_pending_message_event(merge_text=True). The raw merge
+ # newline-joins consecutive TEXT follow-ups into a SINGLE pending
+ # turn, destroying message boundaries — so two separate user
+ # messages sent while the agent was busy (interrupt mode, or a
+ # steer that fell back to queue) arrived as one mashed-together
+ # turn (#43066 sub-bug 2). The FIFO path gives each text its own
+ # turn in arrival order while still preserving photo-burst / album
+ # merge semantics for media.
if not steered:
- merge_pending_message_event(
- adapter._pending_messages,
- session_key,
- event,
- merge_text=event.message_type == MessageType.TEXT,
- )
+ self._queue_or_replace_pending_event(session_key, event)
is_queue_mode = effective_mode == "queue"
is_steer_mode = effective_mode == "steer"
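Review note on the queueing change: the difference between the old newline-join merge and per-message FIFO queueing is easy to see on plain strings. This is a simplified, text-only sketch; `merge_pending` and `queue_pending` are hypothetical helpers, not the real `merge_pending_message_event` or `_queue_or_replace_pending_event`, and media merging is not modelled.

```python
from collections import defaultdict, deque


def merge_pending(pending, session_key, text):
    """Old behaviour (sketch): follow-ups collapse into one pending turn."""
    if session_key in pending:
        pending[session_key] = pending[session_key] + "\n" + text
    else:
        pending[session_key] = text


def queue_pending(queues, session_key, text):
    """New behaviour (sketch): each follow-up keeps its own slot, in order."""
    queues[session_key].append(text)


merged = {}
fifo = defaultdict(deque)
for message in ["fix the test", "also bump the version"]:
    merge_pending(merged, "s1", message)
    queue_pending(fifo, "s1", message)

merged_turns = [merged["s1"]]
fifo_turns = list(fifo["s1"])
```

With the merge, two user messages become one turn and the boundary between them is lost; with the queue they replay as two turns in arrival order.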
@@ -4359,6 +4675,40 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
def _finalize_shutdown_agents(self, active_agents: Dict[str, Any]) -> None:
for agent in active_agents.values():
+ # Persist any in-flight transcript to the SQLite session store
+ # before teardown (#13121). An agent forcibly interrupted by the
+ # drain-timeout escalation may never reach
+ # ``turn_finalizer.finalize_turn`` (the only place that flushes the
+ # turn to state.db) — e.g. it was blocked in a tool call that did
+ # not abort within the post-interrupt grace window. Its in-flight
+ # tool rounds live only in the in-memory ``_session_messages``
+ # (refreshed per tool round in ``conversation_loop`` but never
+ # written to SQLite mid-turn), so the immediate pre-restart turn is
+ # silently dropped from ``load_transcript()`` on resume. Flushing
+ # here closes that gap; the resume_pending / fresh-tool-tail
+ # branches in ``_handle_message_with_agent`` already expect a
+ # transcript whose tail may be a pending tool result. The flush is
+ # idempotent (identity-tracked in ``_flush_messages_to_session_db``),
+ # so agents that DID finish gracefully re-flush nothing.
+ try:
+ _flush = getattr(agent, "_flush_messages_to_session_db", None)
+ _session_messages = getattr(agent, "_session_messages", None)
+ if callable(_flush) and isinstance(_session_messages, list) and _session_messages:
+ # Strip private empty-response retry scaffolding from the
+ # tail first, mirroring the graceful ``_persist_session``
+ # path, so a resumed turn doesn't replay synthetic recovery
+ # nudges.
+ _strip = getattr(
+ agent, "_drop_trailing_empty_response_scaffolding", None
+ )
+ if callable(_strip):
+ try:
+ _strip(_session_messages)
+ except Exception:
+ pass
+ _flush(_session_messages)
+ except Exception as _e:
+ logger.debug("Shutdown transcript flush failed: %s", _e)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
@@ -4371,6 +4721,27 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
pass
self._cleanup_agent_resources(agent)
+ def _should_emit_long_running_notification(
+ self,
+ session_key: Optional[str],
+ agent: Any,
+ executor_task: Optional[Any],
+ ) -> bool:
+ """Only emit the heartbeat while this task still owns the live run.
+
+ Guards against a stale ``running: delegate_task`` heartbeat outliving the
+ run that started it: stop once the executor finishes, the agent is gone,
+ or the session key has been rebound to a different live agent (e.g. the
+ user sent ``/new`` and a fresh agent took the slot mid-run, #12029).
+ """
+ if agent is None:
+ return False
+ if executor_task is not None and executor_task.done():
+ return False
+ if session_key and self._running_agents.get(session_key) is not agent:
+ return False
+ return True
+
def _cleanup_agent_resources(self, agent: Any) -> None:
"""Best-effort cleanup for temporary or cached agent instances."""
if agent is None:
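Review note on `_should_emit_long_running_notification`: the three stop conditions can be exercised outside the runner. This sketch takes the running-agents map as a parameter instead of reading `self._running_agents`, and uses a `concurrent.futures.Future` as a stand-in for the executor task; both choices are assumptions made for illustration.

```python
from concurrent.futures import Future


def should_emit(running_agents, session_key, agent, executor_task):
    """True only while this agent still owns the live run for the session."""
    if agent is None:
        return False
    if executor_task is not None and executor_task.done():
        return False
    # Identity check: a different agent object in the slot means a rebind.
    if session_key and running_agents.get(session_key) is not agent:
        return False
    return True


old_agent, new_agent = object(), object()
task = Future()
running = {"s1": old_agent}

owns_run = should_emit(running, "s1", old_agent, task)

running["s1"] = new_agent  # e.g. a fresh agent took the slot mid-run
after_rebind = should_emit(running, "s1", old_agent, task)

running["s1"] = old_agent
task.set_result(None)  # executor finished
after_done = should_emit(running, "s1", old_agent, task)
```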
@@ -4894,6 +5265,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# instead of spinning up a duplicate AIAgent (#45456).
self._running_agents[entry.session_key] = _AGENT_PENDING_SENTINEL
self._running_agents_ts[entry.session_key] = time.time()
+ self._persist_active_agents()
# Empty-text internal event — the _is_resume_pending branch in
# _handle_message_with_agent prepends the proper reason-aware
@@ -5119,14 +5491,15 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
from gateway.relay import (
register_relay_adapter,
relay_url,
- self_provision_if_managed,
+ self_provision_relay,
)
- # Managed boot: self-provision relay creds in-process (resolve the
- # agent's NAS token -> POST /relay/provision -> set GATEWAY_RELAY_* in
- # os.environ) BEFORE registration reads them. No-op when not managed,
- # relay unconfigured, or a secret is already pinned. Never raises.
- self_provision_if_managed()
+ # Boot-time relay self-provision: resolve the agent's NAS token ->
+ # POST /relay/provision -> set GATEWAY_RELAY_* in os.environ BEFORE
+ # registration reads them. No-op when relay is unconfigured, a secret
+ # is already pinned, or no NAS token resolves (self-hosted, unenrolled).
+ # Never raises.
+ self_provision_relay()
if register_relay_adapter():
logger.info("relay adapter registered (connector at %s)", relay_url())
@@ -5334,7 +5707,30 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
"attempts": 1,
"next_retry": time.monotonic() + 30,
}
-
+
+ # Multi-profile multiplexing: bring up adapters for every OTHER profile
+ # this gateway serves. Each profile's adapters connect under that
+ # profile's home + credential scope and stamp their inbound events with
+ # the profile so the agent turn resolves correctly. No-op when off.
+ try:
+ _secondary_connected = await self._start_secondary_profile_adapters()
+ connected_count += _secondary_connected
+ except MultiplexConfigError as e:
+ # Invalid multiplexer config — abort startup cleanly so the operator
+ # fixes config.yaml rather than running a half-wired gateway.
+ reason = str(e)
+ logger.error("Gateway multiplexer config error: %s", reason)
+ try:
+ from gateway.status import write_runtime_status
+ write_runtime_status(gateway_state="startup_failed", exit_reason=reason)
+ except Exception:
+ pass
+ self._request_clean_exit(reason)
+ self._startup_restore_in_progress = False
+ return True
+ except Exception as e:
+ logger.error("Secondary-profile adapter startup failed: %s", e, exc_info=True)
+
if connected_count == 0:
if startup_nonretryable_errors:
reason = "; ".join(startup_nonretryable_errors)
@@ -6341,6 +6737,22 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
time.monotonic() - _adapter_started_at,
e,
)
+
+ # Disconnect secondary-profile adapters (multiplex mode).
+ for _prof, _amap in list(getattr(self, "_profile_adapters", {}).items()):
+ for platform, adapter in list(_amap.items()):
+ try:
+ await adapter.cancel_background_tasks()
+ except Exception as e:
+ logger.debug("✗ %s bg-cancel error (profile %s): %s", platform.value, _prof, e)
+ try:
+ await adapter.disconnect()
+ logger.info("✓ %s disconnected (profile: %s)", platform.value, _prof)
+ except Exception as e:
+ logger.error("✗ %s disconnect error (profile %s): %s", platform.value, _prof, e)
+ _amap.clear()
+ if hasattr(self, "_profile_adapters"):
+ self._profile_adapters.clear()
logger.info(
"Shutdown phase: all adapters disconnected at +%.2fs",
_phase_elapsed(),
@@ -6510,6 +6922,175 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
"""Wait for shutdown signal."""
await self._shutdown_event.wait()
+ async def _start_secondary_profile_adapters(self) -> int:
+ """Bring up adapters for every non-active profile this gateway serves.
+
+ Returns the number of secondary adapters that connected. No-op (returns
+ 0) unless ``gateway.multiplex_profiles`` is on.
+
+ Each profile's adapters are created and connected under that profile's
+ HERMES_HOME + secret scope (``_profile_runtime_scope``), stored in
+ ``self._profile_adapters[profile]``, and given a message handler that
+ stamps ``source.profile`` before delegating to the shared
+ ``_handle_message`` — so the agent turn resolves that profile's config,
+ skills, and credentials. Same-platform credential collisions (two
+ profiles polling the same bot token) are detected and refused here, the
+ only point that sees every profile's resolved credentials together.
+ """
+ if not getattr(self.config, "multiplex_profiles", False):
+ return 0
+
+ try:
+ from hermes_cli.profiles import profiles_to_serve, get_active_profile_name
+ except Exception:
+ return 0
+
+ active = get_active_profile_name() or "default"
+ connected = 0
+ # (platform, token-fingerprint) -> profile that claimed it. Detects two
+ # profiles trying to poll the same bot credential (impossible to do
+ # concurrently). Seed with the active profile's adapters.
+ claimed: Dict[tuple, str] = {}
+ for _plat, _ad in self.adapters.items():
+ fp = self._adapter_credential_fingerprint(_ad)
+ if fp is not None:
+ claimed[(_plat, fp)] = active
+
+ for profile_name, profile_home in profiles_to_serve(multiplex=True):
+ if profile_name == active:
+ continue # handled by the primary startup loop
+ try:
+ connected += await self._start_one_profile_adapters(
+ profile_name, profile_home, claimed
+ )
+ except MultiplexConfigError:
+ # Config error (e.g. a secondary profile binding a port) is not
+ # transient — propagate so startup aborts cleanly instead of
+ # limping along with a half-configured multiplexer.
+ raise
+ except Exception as e:
+ logger.error(
+ "Failed to start adapters for profile '%s': %s",
+ profile_name, e, exc_info=True,
+ )
+
+ # Record served profiles in runtime status for `hermes status`.
+ try:
+ from gateway.status import write_runtime_status
+ served = [active] + sorted(self._profile_adapters.keys())
+ write_runtime_status(served_profiles=served)
+ except Exception:
+ logger.debug("could not record served_profiles", exc_info=True)
+
+ return connected
+
+ async def _start_one_profile_adapters(
+ self, profile_name: str, profile_home: "Path", claimed: Dict[tuple, str]
+ ) -> int:
+ """Create+connect one profile's adapters under its runtime scope."""
+ from gateway.config import load_gateway_config
+
+ with _profile_runtime_scope(profile_home):
+ profile_cfg = load_gateway_config()
+
+ profile_map = self._profile_adapters.setdefault(profile_name, {})
+ connected = 0
+ for platform, platform_config in profile_cfg.platforms.items():
+ if not platform_config.enabled:
+ continue
+ # A secondary profile must NOT enable a port-binding platform: the
+ # default profile's listener already serves every profile via the
+ # /p/<profile>/ prefix, so a second bind can only collide. This is a
+ # config error, not a transient failure — fail fast and loud.
+ if platform.value in _PORT_BINDING_PLATFORM_VALUES:
+ raise MultiplexConfigError(
+ f"Profile '{profile_name}' enables the port-binding platform "
+ f"'{platform.value}', but gateway.multiplex_profiles is on. The "
+ f"default profile owns the single shared HTTP listener and "
+ f"serves every profile through the /p/{profile_name}/ URL "
+ f"prefix — a secondary profile cannot bind its own port. "
+ f"Remove platforms.{platform.value} from profile "
+ f"'{profile_name}'s config.yaml (configure it only on the "
+ f"default profile)."
+ )
+ with _profile_runtime_scope(profile_home):
+ adapter = self._create_adapter(platform, platform_config)
+ if not adapter:
+ continue
+
+ # Same-token conflict detection — refuse a duplicate poll.
+ fp = self._adapter_credential_fingerprint(adapter)
+ if fp is not None:
+ owner = claimed.get((platform, fp))
+ if owner is not None:
+ logger.error(
+ "Profiles '%s' and '%s' both configure %s with the same "
+ "credential — refusing to start the duplicate (a single "
+ "bot token cannot be polled twice). Give each profile its "
+ "own %s credential.",
+ owner, profile_name, platform.value, platform.value,
+ )
+ await self._safe_adapter_disconnect(adapter, platform)
+ continue
+ claimed[(platform, fp)] = profile_name
+
+ # Stamp every inbound event from this adapter with its profile so
+ # the agent turn (and session key) resolve to the right home.
+ adapter.set_message_handler(
+ self._make_profile_message_handler(profile_name)
+ )
+ adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
+ adapter.set_session_store(self.session_store)
+ adapter.set_busy_session_handler(self._handle_active_session_busy_message)
+ adapter.set_topic_recovery_fn(self._recover_telegram_topic_thread_id)
+ adapter._busy_text_mode = self._busy_text_mode
+
+ try:
+ with _profile_runtime_scope(profile_home):
+ success = await self._connect_adapter_with_timeout(adapter, platform)
+ if success:
+ profile_map[platform] = adapter
+ connected += 1
+ logger.info("✓ %s connected (profile: %s)", platform.value, profile_name)
+ else:
+ logger.warning("✗ %s failed to connect (profile: %s)", platform.value, profile_name)
+ await self._safe_adapter_disconnect(adapter, platform)
+ except Exception as e:
+ logger.error("✗ %s error (profile: %s): %s", platform.value, profile_name, e)
+ await self._safe_adapter_disconnect(adapter, platform)
+ return connected
+
+ def _make_profile_message_handler(self, profile_name: str):
+ """Return a message handler that stamps source.profile then delegates."""
+ async def _handler(event):
+ try:
+ if getattr(event, "source", None) is not None and not event.source.profile:
+ event.source.profile = profile_name
+ except Exception:
+ pass
+ return await self._handle_message(event)
+ return _handler
+
+ @staticmethod
+ def _adapter_credential_fingerprint(adapter: Any) -> Optional[str]:
+ """Return a stable, log-safe fingerprint of an adapter's credential.
+
+ Used only to detect two profiles claiming the same bot token. Returns a
+ prefixed, truncated SHA-256 digest (never the token itself) of the
+ adapter's primary credential, or None when no credential is discoverable
+ (in which case we don't attempt conflict detection for it).
+ """
+ token = None
+ for attr in ("token", "bot_token", "_token", "api_token", "_bot_token"):
+ val = getattr(adapter, attr, None)
+ if isinstance(val, str) and val.strip():
+ token = val.strip()
+ break
+ if not token:
+ return None
+ import hashlib
+ return hashlib.sha256(("hermes-mux:" + token).encode("utf-8")).hexdigest()[:16]
+
def _create_adapter(
self,
platform: Platform,
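Review note on same-token conflict detection: the fingerprint and the `claimed` map work together as first-claimant-wins. This is a minimal sketch, assuming plain strings for platform names; `fingerprint` and `claim` are hypothetical free functions, and only the hashing line follows the diff directly.

```python
import hashlib


def fingerprint(token):
    """Log-safe identifier for a credential; None when there is nothing to hash."""
    token = (token or "").strip()
    if not token:
        return None
    digest = hashlib.sha256(("hermes-mux:" + token).encode("utf-8")).hexdigest()
    return digest[:16]


def claim(claimed, platform, token, profile):
    """Return the profile that owns this (platform, credential) pair."""
    fp = fingerprint(token)
    if fp is None:
        return profile  # nothing to compare, so no conflict detection
    # setdefault keeps the first claimant and returns it on later attempts.
    return claimed.setdefault((platform, fp), profile)


claimed = {}
first = claim(claimed, "telegram", "123:abc", "default")
duplicate = claim(claimed, "telegram", " 123:abc ", "work")
other_platform = claim(claimed, "slack", "123:abc", "work")
```

A caller would treat a returned owner different from the requesting profile as the conflict case and refuse to start the duplicate adapter. The key includes the platform, so the same string used on two different platforms is not flagged.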
@@ -6555,43 +7136,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
logger.debug("Platform registry lookup for '%s' failed: %s", platform.value, e)
# Fall through to built-in adapters below
- if platform == Platform.TELEGRAM:
- from gateway.platforms.telegram import TelegramAdapter, check_telegram_requirements
- if not check_telegram_requirements():
- logger.warning("Telegram: python-telegram-bot not installed")
- return None
- adapter = TelegramAdapter(config)
- # Apply Telegram notification mode from config. Controls whether
- # intermediate messages (tool progress, streaming, status) trigger
- # push notifications. Supports ENV override for quick testing.
- _notify_mode = os.getenv("HERMES_TELEGRAM_NOTIFICATIONS", "")
- if not _notify_mode:
- try:
- _gw_cfg = _load_gateway_config()
- _raw = cfg_get(_gw_cfg, "display", "platforms", "telegram", "notifications")
- if _raw not in {None, ""}:
- _notify_mode = str(_raw).strip().lower()
- except Exception:
- pass
- _notify_mode = _notify_mode or "important"
- if _notify_mode not in {"all", "important"}:
- logger.warning(
- "Unknown telegram notifications mode '%s', "
- "defaulting to 'important' (valid: all, important)",
- _notify_mode,
- )
- _notify_mode = "important"
- adapter._notifications_mode = _notify_mode
- return adapter
-
- elif platform == Platform.WHATSAPP:
- from gateway.platforms.whatsapp import WhatsAppAdapter, check_whatsapp_requirements
- if not check_whatsapp_requirements():
- logger.warning("WhatsApp: Node.js not installed or bridge not configured")
- return None
- return WhatsAppAdapter(config)
-
- elif platform == Platform.WHATSAPP_CLOUD:
+ if platform == Platform.WHATSAPP_CLOUD:
from gateway.platforms.whatsapp_cloud import (
WhatsAppCloudAdapter,
check_whatsapp_cloud_requirements,
@@ -6603,13 +7148,6 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
return None
return WhatsAppCloudAdapter(config)
- elif platform == Platform.SLACK:
- from gateway.platforms.slack import SlackAdapter, check_slack_requirements
- if not check_slack_requirements():
- logger.warning("Slack: slack-bolt not installed. Run: pip install 'hermes-agent[slack]'")
- return None
- return SlackAdapter(config)
-
elif platform == Platform.SIGNAL:
from gateway.platforms.signal import SignalAdapter, check_signal_requirements
if not check_signal_requirements():
@@ -6617,51 +7155,6 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
return None
return SignalAdapter(config)
- elif platform == Platform.EMAIL:
- from gateway.platforms.email import EmailAdapter, check_email_requirements
- if not check_email_requirements():
- logger.warning("Email: EMAIL_ADDRESS, EMAIL_PASSWORD, EMAIL_IMAP_HOST, or EMAIL_SMTP_HOST not set")
- return None
- return EmailAdapter(config)
-
- elif platform == Platform.SMS:
- from gateway.platforms.sms import SmsAdapter, check_sms_requirements
- if not check_sms_requirements():
- logger.warning("SMS: aiohttp not installed or TWILIO_ACCOUNT_SID/TWILIO_AUTH_TOKEN not set")
- return None
- return SmsAdapter(config)
-
- elif platform == Platform.DINGTALK:
- from gateway.platforms.dingtalk import DingTalkAdapter, check_dingtalk_requirements
- if not check_dingtalk_requirements():
- logger.warning("DingTalk: dingtalk-stream not installed or DINGTALK_CLIENT_ID/SECRET not set")
- return None
- return DingTalkAdapter(config)
-
- elif platform == Platform.FEISHU:
- from gateway.platforms.feishu import FeishuAdapter, check_feishu_requirements
- if not check_feishu_requirements():
- logger.warning("Feishu: lark-oapi not installed or FEISHU_APP_ID/SECRET not set")
- return None
- return FeishuAdapter(config)
-
- elif platform == Platform.WECOM_CALLBACK:
- from gateway.platforms.wecom_callback import (
- WecomCallbackAdapter,
- check_wecom_callback_requirements,
- )
- if not check_wecom_callback_requirements():
- logger.warning("WeComCallback: aiohttp/httpx/defusedxml not installed")
- return None
- return WecomCallbackAdapter(config)
-
- elif platform == Platform.WECOM:
- from gateway.platforms.wecom import WeComAdapter, check_wecom_requirements
- if not check_wecom_requirements():
- logger.warning("WeCom: aiohttp not installed or WECOM_BOT_ID/SECRET not set")
- return None
- return WeComAdapter(config)
-
elif platform == Platform.WEIXIN:
from gateway.platforms.weixin import WeixinAdapter, check_weixin_requirements
if not check_weixin_requirements():
@@ -6669,13 +7162,6 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
return None
return WeixinAdapter(config)
- elif platform == Platform.MATRIX:
- from gateway.platforms.matrix import MatrixAdapter, check_matrix_requirements
- if not check_matrix_requirements():
- logger.warning("Matrix: mautrix not installed or credentials not set. Run: pip install 'mautrix[encryption]'")
- return None
- return MatrixAdapter(config)
-
elif platform == Platform.API_SERVER:
from gateway.platforms.api_server import APIServerAdapter, check_api_server_requirements
if not check_api_server_requirements():
@@ -7957,6 +8443,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
self._active_session_leases[_quick_key] = _active_session_lease
self._running_agents[_quick_key] = _AGENT_PENDING_SENTINEL
self._running_agents_ts[_quick_key] = time.time()
+ self._persist_active_agents()
_run_generation = self._begin_session_run_generation(_quick_key)
try:
@@ -8201,8 +8688,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
guessed, _ = _mimetypes.guess_type(path)
if guessed:
mtype = guessed
- if not mtype.startswith(("application/", "text/")):
- continue
+ else:
+ mtype = "application/octet-stream"
+ # Any accepted file gets a path-pointing context note — we accept
+ # all file types now, so a non-text/non-application MIME (font/*,
+ # model/*, etc.) must still tell the agent the file exists.
basename = os.path.basename(path)
parts = basename.split("_", 2)
@@ -8225,7 +8715,13 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# multiple times, and without an explicit pointer the agent has to
# guess (or answer for both subjects). Token overhead is minimal.
reply_snippet = event.reply_to_text[:500]
- message_text = f'[Replying to: "{reply_snippet}"]\n\n{message_text}'
+ if getattr(event, "reply_to_is_own_message", False):
+ message_text = (
+ f'[Replying to your previous message: "{reply_snippet}"]\n\n'
+ f"{message_text}"
+ )
+ else:
+ message_text = f'[Replying to: "{reply_snippet}"]\n\n{message_text}'
if "@" in message_text:
try:
@@ -8582,7 +9078,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
_hyg_model = "anthropic/claude-sonnet-4.6"
_hyg_threshold_pct = 0.85
_hyg_compression_enabled = True
- _hyg_hard_msg_limit = 400
+ _hyg_hard_msg_limit = 5000
_hyg_config_context_length = None
_hyg_provider = None
_hyg_base_url = None
@@ -8704,8 +9200,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# extreme, regardless of token estimates. This breaks the
# death spiral where API disconnects prevent token data
# collection, which prevents compression, which causes more
- # disconnects. 400 messages is well above normal sessions
- # but catches runaway growth before it becomes unrecoverable.
+ # disconnects. 5000 messages is far above any normal session
+ # but catches truly runaway growth before it becomes
+ # unrecoverable. Set well clear of legitimate large-context
+ # (1M+) sessions doing thousands of short turns — those
+ # compress on the token threshold, not this count-based floor.
# Threshold is configurable via
# compression.hygiene_hard_message_limit.
# (#2153)
@@ -8754,6 +9253,13 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
session_id=session_entry.session_id,
)
try:
+ # The hygiene agent rotates the session
+ # forward to a continuation id that becomes
+ # the gateway session's live row. It must
+ # never finalize on close() (today it has no
+ # session_db so close() no-ops, but this
+ # guards a future where one is wired in).
+ _hyg_agent._end_session_on_close = False
_hyg_agent._print_fn = lambda *a, **kw: None
loop = asyncio.get_running_loop()
@@ -8770,7 +9276,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# the NEW session so the old transcript stays intact
# and searchable via session_search.
_hyg_new_sid = _hyg_agent.session_id
- if _hyg_new_sid != session_entry.session_id:
+ _hyg_rotated = _hyg_new_sid != session_entry.session_id
+ _hyg_in_place = bool(
+ getattr(_hyg_agent, "compression_in_place", False)
+ )
+ if _hyg_rotated:
session_entry.session_id = _hyg_new_sid
self.session_store._save()
self._sync_telegram_topic_binding(
@@ -8778,16 +9288,41 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
reason="hygiene-compression",
)
- self.session_store.rewrite_transcript(
- session_entry.session_id, _compressed
- )
- # Reset stored token count — transcript was rewritten
- session_entry.last_prompt_tokens = 0
- history = _compressed
- _new_count = len(_compressed)
- _new_tokens = estimate_messages_tokens_rough(
- _compressed
- )
+ # Only rewrite the transcript when rotation produced
+ # a NEW session id OR in-place compaction succeeded.
+ # The danger this guards against (mirrors the
+ # /compress fix #44794/#39704): the hygiene agent is
+ # built WITHOUT a session_db, so _compress_context
+ # cannot rotate — if it also wasn't in-place, the
+ # session_id is unchanged for a FAILURE reason, and an
+ # unconditional rewrite_transcript() would DELETE the
+ # original messages and replace them with only the
+ # compressed summary (permanent data loss, #21301).
+ if _hyg_rotated or _hyg_in_place:
+ self.session_store.rewrite_transcript(
+ session_entry.session_id, _compressed
+ )
+ # Reset stored token count — transcript rewritten
+ session_entry.last_prompt_tokens = 0
+ history = _compressed
+ _new_count = len(_compressed)
+ _new_tokens = estimate_messages_tokens_rough(
+ _compressed
+ )
+ else:
+ # No rewrite happened — transcript preserved
+ # unchanged, so the post-compression counts equal
+ # the pre-compression ones.
+ _new_count = _msg_count
+ _new_tokens = _approx_tokens
+ logger.warning(
+ "Gateway hygiene compression for session %s "
+ "did not rotate or compact in place "
+ "(no session_db on the hygiene agent) — "
+ "preserving the original transcript instead "
+ "of overwriting it with the summary (#21301).",
+ session_entry.session_id,
+ )
logger.info(
"Session hygiene: compressed %s → %s msgs, "
@@ -10632,7 +11167,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
disabled_toolsets = agent_cfg.get("disabled_toolsets") or None
pr = self._provider_routing
- max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
+ max_iterations = _current_max_iterations()
reasoning_config = self._resolve_session_reasoning_config(source=source)
self._reasoning_config = reasoning_config
self._service_tier = self._load_service_tier()
@@ -11274,7 +11809,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# consented to the prompt-cache invalidation via the slash-confirm
# gate in _handle_reload_mcp_command before we reach this point.
try:
- from model_tools import get_tool_definitions
+ from tools.mcp_tool import refresh_agent_mcp_tools
_cache = getattr(self, "_agent_cache", None)
_cache_lock = getattr(self, "_agent_cache_lock", None)
if _cache_lock is not None and _cache:
@@ -11286,15 +11821,16 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
continue
if _agent is None:
continue
- new_defs = get_tool_definitions(
- enabled_toolsets=getattr(_agent, "enabled_toolsets", None),
- disabled_toolsets=getattr(_agent, "disabled_toolsets", None),
- quiet_mode=True,
- )
- _agent.tools = new_defs
- _agent.valid_tool_names = {
- t["function"]["name"] for t in new_defs
- } if new_defs else set()
+ # Preserve each cached agent's build-time toolset
+ # selection EXACTLY: a gateway session built with a
+ # restricted enabled_toolsets (e.g. ["safe"]) must
+ # NOT silently gain tools after a reload. This is the
+ # opposite of the interactive CLI/TUI /reload-mcp,
+ # which is a single user re-applying their own config
+ # edit; gateway agents are per-session and may be
+ # deliberately locked down. (Contract is asserted by
+ # test_reload_mcp_preserves_per_agent_toolset_overrides.)
+ refresh_agent_mcp_tools(_agent, quiet_mode=True)
except Exception as _exc:
logger.debug(
"Failed to update cached agent tools after MCP reload: %s",
@@ -11736,7 +12272,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
chunks = [clean[i:i + max_chunk] for i in range(0, len(clean), max_chunk)]
for chunk in chunks:
try:
- await adapter.send(chat_id, f"```\n{chunk}\n```", metadata=metadata)
+ await adapter.send(
+ chat_id,
+ f"```\n{chunk}\n```",
+ metadata=_non_conversational_metadata(metadata, platform=platform),
+ )
except Exception as e:
logger.debug("Update stream send failed: %s", e)
@@ -11759,12 +12299,16 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
exit_code_raw = exit_code_path.read_text().strip() or "1"
exit_code = int(exit_code_raw)
if exit_code == 0:
- await adapter.send(chat_id, "✅ Hermes update finished.", metadata=metadata)
+ await adapter.send(
+ chat_id,
+ "✅ Hermes update finished.",
+ metadata=_non_conversational_metadata(metadata, platform=platform),
+ )
else:
await adapter.send(
chat_id,
"❌ Hermes update failed (exit code {}).".format(exit_code),
- metadata=metadata,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
)
logger.info("Update finished (exit=%s), notified %s", exit_code, session_key)
except Exception as e:
@@ -11815,7 +12359,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
prompt=prompt_text,
default=default,
session_key=session_key,
- metadata=metadata,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
)
sent_buttons = True
except Exception as btn_err:
@@ -11829,7 +12373,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
f"{prompt_text}{default_hint}\n\n"
f"Reply `{_p}approve` (yes) or `{_p}deny` (no), "
f"or type your answer directly.",
- metadata=metadata,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
)
# Keep the prompt marker on disk until the user
# answers. If the gateway restarts mid-prompt, the
@@ -11853,7 +12397,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
await adapter.send(
chat_id,
"❌ Hermes update timed out after 30 minutes.",
- metadata=metadata,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
)
except Exception:
pass
@@ -11959,7 +12503,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
msg = "✅ Hermes update finished successfully."
else:
msg = "❌ Hermes update failed. Check the gateway logs or run `hermes update` manually for details."
- await adapter.send(chat_id, msg, metadata=metadata)
+ await adapter.send(
+ chat_id,
+ msg,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
+ )
logger.info(
"Sent post-update notification to %s:%s (exit=%s)",
platform_str,
@@ -12022,7 +12570,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
result = await adapter.send(
str(chat_id),
"♻ Gateway restarted successfully. Your session continues.",
- metadata=metadata,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
)
# adapter.send() catches provider errors (e.g. "Chat not found")
# and returns SendResult(success=False) rather than raising, so
@@ -12089,9 +12637,21 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
adapter=adapter,
)
if metadata:
- result = await adapter.send(str(home.chat_id), message, metadata=metadata)
+ result = await adapter.send(
+ str(home.chat_id),
+ message,
+ metadata=_non_conversational_metadata(metadata, platform=platform),
+ )
else:
- result = await adapter.send(str(home.chat_id), message)
+ _startup_meta = _non_conversational_metadata(platform=platform)
+ if _startup_meta:
+ result = await adapter.send(
+ str(home.chat_id),
+ message,
+ metadata=_startup_meta,
+ )
+ else:
+ result = await adapter.send(str(home.chat_id), message)
if result is not None and getattr(result, "success", True) is False:
logger.warning(
"Home-channel startup notification failed for %s:%s: %s",
@@ -12127,6 +12687,16 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
in a ``finally`` block.
"""
from gateway.session_context import set_session_vars
+ # Propagate the adapter's async-delivery capability so async tools
+ # (terminal notify_on_complete / watch_patterns, delegate_task
+ # background=True) know whether this channel can wake a later turn.
+ # Default True keeps CLI / unknown paths working; stateless adapters
+ # (api_server) declare supports_async_delivery=False. Use getattr so
+ # bare runners built via object.__new__ (tests) without self.adapters
+ # don't blow up — they simply default to supported.
+ _adapters = getattr(self, "adapters", None) or {}
+ _adapter = _adapters.get(context.source.platform)
+ _async_delivery = getattr(_adapter, "supports_async_delivery", True)
return set_session_vars(
platform=context.source.platform.value,
chat_id=context.source.chat_id,
@@ -12136,6 +12706,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
user_name=str(context.source.user_name) if context.source.user_name else "",
session_key=context.session_key,
message_id=str(context.source.message_id) if context.source.message_id else "",
+ async_delivery=_async_delivery,
)
def _clear_session_env(self, tokens: list) -> None:
@@ -12642,7 +13213,9 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if session.exited:
# --- Agent-triggered completion: inject synthetic message ---
- # Skip if the agent already consumed the result via wait/poll/log
+ # Skip if the agent already consumed the result via wait/log.
+ # poll() is read-only and intentionally does NOT mark consumed
+ # (#10156) — a status check must not suppress this delivery turn.
from tools.process_registry import format_process_notification, process_registry as _pr_check
if agent_notify and not _pr_check.is_completion_consumed(session_id):
from tools.ansi_strip import strip_ansi
@@ -12732,7 +13305,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if adapter and chat_id:
try:
send_meta = {"thread_id": thread_id} if thread_id else None
- await adapter.send(chat_id, message_text, metadata=send_meta)
+ await adapter.send(
+ chat_id,
+ message_text,
+ metadata=_non_conversational_metadata(send_meta, platform=platform_name),
+ )
except Exception as e:
logger.error("Watcher delivery error: %s", e)
break
@@ -12753,7 +13330,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if adapter and chat_id:
try:
send_meta = {"thread_id": thread_id} if thread_id else None
- await adapter.send(chat_id, message_text, metadata=send_meta)
+ await adapter.send(
+ chat_id,
+ message_text,
+ metadata=_non_conversational_metadata(send_meta, platform=platform_name),
+ )
except Exception as e:
logger.error("Watcher delivery error: %s", e)
@@ -13001,6 +13582,11 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
self._running_agents_ts.pop(session_key, None)
if hasattr(self, "_busy_ack_ts"):
self._busy_ack_ts.pop(session_key, None)
+ # Turn boundary: a running-agent slot was just released. Persist the
+ # new (lower) in-flight count so the dashboard readout stays current
+ # between lifecycle transitions. Preserves gateway_state (see
+ # _persist_active_agents).
+ self._persist_active_agents()
return True
def _clear_session_boundary_security_state(self, session_key: str) -> None:
@@ -13551,6 +14137,13 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
from gateway.stream_consumer import GatewayStreamConsumer, StreamConsumerConfig
_adapter = self.adapters.get(source.platform)
if _adapter:
+ _pause_typing_before_finalize = None
+ if source.platform == Platform.TELEGRAM and hasattr(_adapter, "pause_typing_for_chat"):
+ def _pause_typing_before_finalize(
+ _adapter=_adapter,
+ _chat_id=source.chat_id,
+ ) -> None:
+ _adapter.pause_typing_for_chat(_chat_id)
_adapter_supports_edit = getattr(_adapter, "SUPPORTS_MESSAGE_EDITING", True)
_effective_cursor = _scfg.cursor if _adapter_supports_edit else ""
_buffer_only = False
@@ -13580,6 +14173,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
chat_id=source.chat_id,
config=_consumer_cfg,
metadata=_thread_metadata,
+ on_before_finalize=_pause_typing_before_finalize,
initial_reply_to_id=event_message_id,
)
except Exception as _sc_err:
@@ -13739,6 +14333,64 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
channel_prompt: Optional[str] = None,
persist_user_message: Optional[str] = None,
persist_user_timestamp: Optional[float] = None,
+ ) -> Dict[str, Any]:
+ """Profile-scoping wrapper around the agent run.
+
+ When multiplexing is active, resolve the inbound source's profile and
+ run the whole turn inside ``_profile_runtime_scope`` so config/skills/
+ memory resolve to that profile's home AND credentials resolve from that
+ profile's secret scope (never the process-global ``os.environ``). When
+ multiplexing is off this is a transparent pass-through — zero behavior
+ change for single-profile gateways.
+ """
+ if not getattr(getattr(self, "config", None), "multiplex_profiles", False):
+ return await self._run_agent_inner(
+ message, context_prompt, history, source, session_id,
+ session_key=session_key, run_generation=run_generation,
+ _interrupt_depth=_interrupt_depth, event_message_id=event_message_id,
+ channel_prompt=channel_prompt, persist_user_message=persist_user_message,
+ persist_user_timestamp=persist_user_timestamp,
+ )
+
+ profile_home = self._resolve_profile_home_for_source(source)
+ with _profile_runtime_scope(profile_home):
+ return await self._run_agent_inner(
+ message, context_prompt, history, source, session_id,
+ session_key=session_key, run_generation=run_generation,
+ _interrupt_depth=_interrupt_depth, event_message_id=event_message_id,
+ channel_prompt=channel_prompt, persist_user_message=persist_user_message,
+ persist_user_timestamp=persist_user_timestamp,
+ )
+
+ def _resolve_profile_home_for_source(self, source: SessionSource) -> "Path":
+ """Resolve which profile's HERMES_HOME should serve this inbound source.
+
+ Prefers the profile the source was routed to (``source.profile`` — set
+ by the /p// URL prefix or a per-credential adapter), falling
+ back to the active profile (the multiplexer's own home).
+ """
+ from hermes_cli.profiles import get_active_profile_name, get_profile_dir
+ try:
+ name = (source.profile or "").strip() or get_active_profile_name() or "default"
+ return get_profile_dir(name)
+ except Exception:
+ from hermes_constants import get_hermes_home
+ return get_hermes_home()
+
+ async def _run_agent_inner(
+ self,
+ message: str,
+ context_prompt: str,
+ history: List[Dict[str, Any]],
+ source: SessionSource,
+ session_id: str,
+ session_key: str = None,
+ run_generation: Optional[int] = None,
+ _interrupt_depth: int = 0,
+ event_message_id: Optional[str] = None,
+ channel_prompt: Optional[str] = None,
+ persist_user_message: Optional[str] = None,
+ persist_user_timestamp: Optional[float] = None,
) -> Dict[str, Any]:
"""
Run the agent with the given message and context.
@@ -14134,6 +14786,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if _progress_thread_id == source.thread_id
else {"thread_id": _progress_thread_id}
) if _progress_thread_id else None
+ _progress_metadata = _non_conversational_metadata(_progress_metadata, platform=source.platform)
_progress_reply_to = (
event_message_id
if source.platform in (Platform.FEISHU, Platform.MATTERMOST) and source.thread_id and event_message_id
@@ -14580,9 +15233,6 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# session_key is now set via contextvars in _set_session_env()
# (concurrency-safe). Keep os.environ as fallback for CLI/cron.
os.environ["HERMES_SESSION_KEY"] = session_key or ""
-
- # Read from env var or use default (same as CLI)
- max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
# Map platform enum to the platform hint key the agent understands.
# Platform.LOCAL ("local") maps to "cli"; others pass through as-is.
@@ -14597,10 +15247,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if self._ephemeral_system_prompt:
combined_ephemeral = (combined_ephemeral + "\n\n" + self._ephemeral_system_prompt).strip()
- # Re-read .env and config for fresh credentials (gateway is long-lived,
- # keys may change without restart). Keep config.yaml authoritative for
- # runtime budget settings bridged into env vars.
- _reload_runtime_env_preserving_config_authority()
+ max_iterations = _current_max_iterations()
try:
model, runtime_kwargs = self._resolve_session_agent_runtime(
@@ -14655,6 +15302,13 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
from gateway.stream_consumer import GatewayStreamConsumer, StreamConsumerConfig
_adapter = self.adapters.get(source.platform)
if _adapter:
+ _pause_typing_before_finalize = None
+ if source.platform == Platform.TELEGRAM and hasattr(_adapter, "pause_typing_for_chat"):
+ def _pause_typing_before_finalize(
+ _adapter=_adapter,
+ _chat_id=source.chat_id,
+ ) -> None:
+ _adapter.pause_typing_for_chat(_chat_id)
# Platforms that don't support editing sent messages
# (e.g. QQ, WeChat) should skip streaming entirely —
# without edit support, the consumer sends a partial
@@ -14699,6 +15353,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if progress_queue is not None
else None
),
+ on_before_finalize=_pause_typing_before_finalize,
initial_reply_to_id=event_message_id,
)
if _want_stream_deltas:
@@ -14798,6 +15453,9 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
except KeyError:
pass
self._init_cached_agent_for_turn(agent, _interrupt_depth)
+ # Refresh agent max_iterations from current config
+ # (cached agent may have been created with old config)
+ agent.max_iterations = max_iterations
logger.debug("Reusing cached agent for session %s", session_key)
if agent is None:
@@ -14899,7 +15557,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
_status_adapter.send(
_status_chat_id,
message,
- metadata=_status_thread_metadata,
+ metadata=_non_conversational_metadata(_status_thread_metadata, platform=source.platform),
),
_loop_for_step,
logger=logger,
@@ -15055,22 +15713,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# Collect MEDIA paths already in history so we can exclude them
# from the current turn's extraction. This is compression-safe:
# even if the message list shrinks, we know which paths are old.
- _history_media_paths: set = set()
- for _hm in agent_history:
- if _hm.get("role") in {"tool", "function"}:
- _hc = _hm.get("content", "")
- if "MEDIA:" in _hc:
- _TOOL_MEDIA_RE = re.compile(
- r'MEDIA:((?:[A-Za-z]:[/\\]|/|~\/)\S+\.(?:png|jpe?g|gif|webp|'
- r'mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|'
- r'flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|'
- r'txt|csv|apk|ipa))',
- re.IGNORECASE
- )
- for _match in _TOOL_MEDIA_RE.finditer(_hc):
- _p = _match.group(1).strip().rstrip('",}')
- if _p:
- _history_media_paths.add(_p)
+ _history_media_paths: set = _collect_history_media_paths(agent_history)
# Register per-session gateway approval callback so dangerous
# command approval blocks the agent thread (mirrors CLI input()).
@@ -15230,14 +15873,28 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
else "a gateway interruption"
)
_persist_user_message_override = message
+ # The empty-message case is the auto-resume startup turn
+ # synthesized by _schedule_resume_pending_sessions — there is
+ # no NEW user message to address, so tell the model to report
+ # recovery instead of the (nonexistent) "new message".
+ if message:
+ _resume_guidance = (
+ "Address the user's NEW message below FIRST and focus "
+ "on what the user is asking now."
+ )
+ else:
+ _resume_guidance = (
+ "Report to the user that the session was restored "
+ "successfully and ask what they would like to do next."
+ )
message = (
- f"[System note: A new message has arrived. The previous turn "
- f"was interrupted by {_reason_phrase}. "
- f"Address the user's NEW message below FIRST. "
+ f"[System note: The previous turn was interrupted by "
+ f"{_reason_phrase}; the gateway is now back online. "
+ f"Any restart/shutdown command in the history has already "
+ f"run — do NOT re-execute or verify it. {_resume_guidance} "
f"Do NOT re-execute old tool calls — skip any unfinished "
- f"work from the conversation history and focus on what the "
- f"user is asking now.]\n\n"
- + message
+ f"work from the conversation history.]"
+ + (f"\n\n{message}" if message else "")
)
elif _has_fresh_tool_tail:
_persist_user_message_override = message
@@ -15348,6 +16005,13 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
# below must still point the gateway at the compressed child.
agent = agent_holder[0]
_session_was_split = False
+ # In-place compaction (compression.in_place / #38763) compacts the
+ # transcript WITHOUT rotating the id, so the id-change diff below
+ # can't detect it. compress_context() sets this rotation-independent
+ # flag on the agent; the gateway uses it to re-baseline transcript
+ # handling (history_offset=0 + rewrite the JSONL transcript) the
+ # same way a split would, even though the session_id is unchanged.
+ _compacted_in_place = bool(getattr(agent, "_last_compaction_in_place", False)) if agent else False
agent_session_id = getattr(agent, 'session_id', session_id) if agent else session_id
if agent and session_key and agent_session_id != session_id:
_session_was_split = True
@@ -15396,7 +16060,14 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
)
effective_session_id = agent_session_id
- _effective_history_offset = 0 if _session_was_split else len(agent_history)
+ # history_offset=0 whenever the agent's message list no longer has
+ # the original history prefix — i.e. on rotation (split) OR in-place
+ # compaction. In both cases the returned `messages` is the compacted
+ # set, so the gateway must persist all of it (offset 0), not slice
+ # past the pre-compaction length (which would drop everything).
+ _effective_history_offset = (
+ 0 if (_session_was_split or _compacted_in_place) else len(agent_history)
+ )
if not final_response:
error_msg = f"⚠️ {result['error']}" if result.get("error") else ""
@@ -15413,6 +16084,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
"compression_exhausted": result.get("compression_exhausted", False),
"tools": tools_holder[0] or [],
"history_offset": _effective_history_offset,
+ "compacted_in_place": _compacted_in_place,
"session_id": effective_session_id,
"last_prompt_tokens": _last_prompt_toks,
"input_tokens": _input_toks,
@@ -15513,6 +16185,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
"interrupt_message": result_holder[0].get("interrupt_message") if result_holder[0] else None,
"tools": tools_holder[0] or [],
"history_offset": _effective_history_offset,
+ "compacted_in_place": _compacted_in_place,
"last_prompt_tokens": _last_prompt_toks,
"input_tokens": _input_toks,
"output_tokens": _output_toks,
@@ -15694,6 +16367,20 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
_heartbeat_msg_id: Optional[str] = None
while True:
await asyncio.sleep(_NOTIFY_INTERVAL)
+ # Stop heartbeating once this run no longer owns the session
+ # slot or the executor has finished — otherwise a stale
+ # "running: delegate_task" bubble can outlive the run that
+ # spawned it (#12029). _executor_task is a closure var bound
+ # just after this task is scheduled; tolerate the brief window
+ # before then (the first wake is _NOTIFY_INTERVAL away anyway).
+ try:
+ _exec_ref = _executor_task
+ except NameError:
+ _exec_ref = None
+ if not self._should_emit_long_running_notification(
+ session_key, agent_holder[0], _exec_ref
+ ):
+ break
_elapsed_mins = int((time.time() - _notify_start) // 60)
# Include agent activity context if available. Default
# heartbeat is terse: elapsed + current tool. Verbose
@@ -15741,7 +16428,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
_notify_res = await _notify_adapter.send(
source.chat_id,
_heartbeat_text,
- metadata=_status_thread_metadata,
+ metadata=_non_conversational_metadata(_status_thread_metadata, platform=source.platform),
)
if getattr(_notify_res, "success", False) and getattr(
_notify_res, "message_id", None
@@ -16464,21 +17151,20 @@ def _run_planned_stop_watcher(
stop_event.wait(poll_interval)
-def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
- """
- Background thread that ticks the cron scheduler at a regular interval.
-
- Runs inside the gateway process so cronjobs fire automatically without
- needing a separate `hermes cron daemon` or system cron entry.
+def _start_gateway_housekeeping(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
+ """Background thread for gateway-only periodic chores (NOT cron).
- When ``adapters`` and ``loop`` are provided, passes them through to the
- cron delivery path so live adapters can be used for E2EE rooms.
+ Split out of the historical ``_start_cron_ticker`` so the cron *trigger*
+ can live behind the ``CronScheduler`` provider (built-in or external) while
+ these gateway-specific chores keep running independently of which provider
+ fires cron. An external scale-to-zero provider has no 60s loop at all, but
+ this housekeeping still wants its hourly cadence — so it owns its own loop.
- Also refreshes the channel directory every 5 minutes and prunes the
- image/audio/document cache + expired ``hermes debug share`` pastes
- once per hour.
+ Refreshes the channel directory every 5 minutes and prunes the
+ image/audio/document cache + expired ``hermes debug share`` pastes once per
+ hour, and polls the curator hourly (its inner gate enforces the real
+ weekly cadence).
"""
- from cron.scheduler import tick as cron_tick
from gateway.platforms.base import cleanup_image_cache, cleanup_document_cache
from hermes_cli.debug import _sweep_expired_pastes
@@ -16487,14 +17173,9 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
PASTE_SWEEP_EVERY = 60 # ticks — once per hour
CURATOR_EVERY = 60 # ticks — poll hourly (inner gate handles the real cadence)
- logger.info("Cron ticker started (interval=%ds)", interval)
+ logger.info("Gateway housekeeping started (interval=%ds)", interval)
tick_count = 0
while not stop_event.is_set():
- try:
- cron_tick(verbose=False, adapters=adapters, loop=loop, sync=False)
- except Exception as e:
- logger.debug("Cron tick error: %s", e)
-
tick_count += 1
if tick_count % CHANNEL_DIR_EVERY == 0 and adapters:
@@ -16502,9 +17183,9 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
from gateway.channel_directory import build_channel_directory
if loop is not None:
# build_channel_directory is async (Slack web calls), and
- # this ticker runs in a background thread. Schedule onto
- # the gateway event loop and wait briefly for completion
- # so refresh failures are still logged via the except.
+ # this runs in a background thread. Schedule onto the
+ # gateway event loop and wait briefly for completion so
+ # refresh failures are still logged via the except.
fut = safe_schedule_threadsafe(
build_channel_directory(adapters), loop,
logger=logger,
@@ -16540,7 +17221,7 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
except Exception as e:
logger.debug("Paste sweep error: %s", e)
- # Curator — piggy-back on the existing cron ticker so long-running
+ # Curator — piggy-back on the housekeeping loop so long-running
# gateways get weekly skill maintenance without needing restarts.
# maybe_run_curator() is internally gated by config.interval_hours
# (7 days by default), so CURATOR_EVERY is just the poll rate — the
@@ -16556,7 +17237,22 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
logger.debug("Curator tick error: %s", e)
stop_event.wait(timeout=interval)
- logger.info("Cron ticker stopped")
+ logger.info("Gateway housekeeping stopped")
+
+
+def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
+ """DEPRECATED shim — preserved for backward compatibility.
+
+ The cron trigger now lives behind the ``CronScheduler`` provider
+ (``cron.scheduler_provider``); the gateway resolves a provider and runs its
+ ``start()`` directly (see ``start_gateway``). This shim runs ONLY the
+ built-in in-process tick loop, exactly as before, for any external caller
+ or test that still references this symbol (e.g. hermes_cli/debug.py). It no
+ longer runs gateway housekeeping — that moved to
+ ``_start_gateway_housekeeping``.
+ """
+ from cron.scheduler_provider import InProcessCronScheduler
+ InProcessCronScheduler().start(stop_event, adapters=adapters, loop=loop, interval=interval)
async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool = False, verbosity: Optional[int] = 0) -> bool:
@@ -16722,6 +17418,24 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
from hermes_logging import setup_logging, _safe_stderr
setup_logging(hermes_home=_hermes_home, mode="gateway")
+ # Startup security posture audit — warn-on-load, never blocks. Surfaces
+ # root / weak-SSH / ephemeral-container / unauthenticated-listener posture
+ # so operators get the "you're exposed" signal the June 2026 MCP-config
+ # persistence campaign victims never had.
+ try:
+ from hermes_cli.security_audit_startup import log_startup_security_warnings
+
+ _audit_cfg = None
+ try:
+ from hermes_cli.config import read_raw_config
+
+ _audit_cfg = read_raw_config()
+ except Exception:
+ _audit_cfg = None
+ log_startup_security_warnings(hermes_home=_hermes_home, config=_audit_cfg)
+ except Exception as _audit_exc:
+ logger.debug("Startup security audit failed (non-fatal): %s", _audit_exc)
+
# Optional stderr handler — level driven by -v/-q flags on the CLI.
# verbosity=None (-q/--quiet): no stderr output
# verbosity=0 (default): WARNING and above
@@ -16928,6 +17642,13 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
atexit.register(remove_pid_file)
atexit.register(release_gateway_runtime_lock)
+ try:
+ from hermes_cli.nous_auth_keepalive import start_nous_auth_keepalive
+
+ start_nous_auth_keepalive()
+ except Exception as exc:
+ logger.debug("Nous auth keepalive did not start: %s", exc)
+
_ensure_windows_gateway_venv_imports()
# MCP tool discovery — run in an executor so the asyncio event loop
@@ -16952,29 +17673,58 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
logger.error("Gateway exiting cleanly: %s", runner.exit_reason)
return True
- # Start background cron ticker so scheduled jobs fire automatically.
- # Pass the event loop so cron delivery can use live adapters (E2EE support).
+ # Start the background cron scheduler via the resolved provider so
+ # scheduled jobs fire automatically. The built-in provider is the
+ # historical in-process 60s ticker; an external provider (e.g. chronos)
+ # may arm a schedule and return. Pass the event loop so cron delivery can
+ # use live adapters (E2EE support).
+ from cron.scheduler_provider import resolve_cron_scheduler
cron_stop = threading.Event()
+ cron_provider = resolve_cron_scheduler()
cron_thread = threading.Thread(
- target=_start_cron_ticker,
+ target=cron_provider.start,
args=(cron_stop,),
kwargs={"adapters": runner.adapters, "loop": asyncio.get_running_loop()},
daemon=True,
- name="cron-ticker",
+ name="cron-scheduler",
)
cron_thread.start()
+
+ # Gateway-only periodic housekeeping (channel dir, cache cleanup, paste
+ # sweep, curator) — runs independently of which cron provider is active.
+ # Shares cron_stop as the shutdown signal.
+ housekeeping_thread = threading.Thread(
+ target=_start_gateway_housekeeping,
+ args=(cron_stop,),
+ kwargs={"adapters": runner.adapters, "loop": asyncio.get_running_loop()},
+ daemon=True,
+ name="gateway-housekeeping",
+ )
+ housekeeping_thread.start()
# Wait for shutdown
await runner.wait_for_shutdown()
+ try:
+ from hermes_cli.nous_auth_keepalive import stop_nous_auth_keepalive
+
+ stop_nous_auth_keepalive()
+ except Exception:
+ pass
+
if runner.should_exit_with_failure:
if runner.exit_reason:
logger.error("Gateway exiting with failure: %s", runner.exit_reason)
return False
- # Stop cron ticker cleanly
+ # Stop cron scheduler + housekeeping cleanly
cron_stop.set()
+ try:
+ cron_provider.stop()
+ except Exception as e:
+ logger.debug("Cron provider stop() error: %s", e)
cron_thread.join(timeout=5)
+ housekeeping_thread.join(timeout=5)
# Stop the planned-stop watcher (daemon=True so this is belt-and-suspenders).
_planned_stop_watcher_stop.set()
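Reviewer note (illustrative only, not part of the patch): the shutdown ordering above can be sketched in isolation. This minimal sketch assumes a provider object exposing `start(stop_event)` and `stop()`; `TickerProvider`, `housekeeping`, and `run_then_shutdown` are hypothetical stand-ins, not the real `cron.scheduler_provider` API. It shows one `threading.Event` acting as the shared shutdown signal for both the scheduler thread and the housekeeping thread, followed by a guarded `stop()` and bounded joins.

```python
import threading
import time


class TickerProvider:
    """Hypothetical stand-in for whatever resolve_cron_scheduler() returns."""

    def __init__(self, interval: float = 0.01) -> None:
        self.interval = interval
        self.ticks = 0
        self.stopped = False

    def start(self, stop_event: threading.Event) -> None:
        # Event.wait() returns False on timeout and True once the event is set,
        # so this loop ticks until shutdown is signalled.
        while not stop_event.wait(self.interval):
            self.ticks += 1

    def stop(self) -> None:
        self.stopped = True


def housekeeping(stop_event: threading.Event, sweeps: list) -> None:
    """Hypothetical periodic housekeeping loop sharing the same stop signal."""
    while not stop_event.wait(0.01):
        sweeps.append("sweep")


def run_then_shutdown(run_for: float = 0.05):
    stop = threading.Event()
    provider = TickerProvider()
    sweeps: list = []
    workers = [
        threading.Thread(target=provider.start, args=(stop,),
                         daemon=True, name="cron-scheduler"),
        threading.Thread(target=housekeeping, args=(stop, sweeps),
                         daemon=True, name="gateway-housekeeping"),
    ]
    for worker in workers:
        worker.start()

    time.sleep(run_for)  # stand-in for awaiting the real shutdown signal

    stop.set()  # one signal stops both loops
    try:
        provider.stop()  # a failing stop() must not block the joins below
    except Exception:
        pass
    for worker in workers:
        worker.join(timeout=5)
    return provider, workers
```

Because both loops block on `Event.wait()` rather than `time.sleep()`, they wake as soon as the event is set, so the five-second join timeout is only a backstop.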
diff --git a/gateway/session.py b/gateway/session.py
index f48b83fed0c..68df8f2955d 100644
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -66,6 +66,28 @@ from .whatsapp_identity import (
)
from utils import atomic_replace
+# Session keys/ids flow into filesystem paths downstream (e.g.
+# ``sessions_dir / f"{session_id}.json"`` in hermes_state, request-dump
+# filenames in agent_runtime_helpers). Any value that could escape the
+# sessions directory as a path must be rejected at the entry boundary.
+# Rejects: parent traversal (``..``), a path separator anywhere (``/`` or
+# ``\``, so a non-leading Windows separator can't slip through), and a
+# leading Windows drive letter (``C:``). Legitimate session keys are
+# colon-delimited multi-segment ids (``agent:main:<platform>:...``) and
+# never contain these, so there are no false positives in practice.
+def _is_path_unsafe(value: object) -> bool:
+ """Return True if ``value`` could traverse outside the sessions dir."""
+ if not value:
+ return False
+ s = str(value)
+ if ".." in s or "/" in s or "\\" in s:
+ return True
+ # Leading Windows drive path, e.g. "C:\..." or "d:/...". A bare "x:"
+ # with no following separator isn't a usable absolute path, and the
+ # separator forms are already caught above — but keep an explicit guard
+ # for the drive-letter prefix in case a separator was normalized away.
+ return len(s) >= 2 and s[0].isalpha() and s[1] == ":"
+
@dataclass
class SessionSource:
@@ -92,6 +114,11 @@ class SessionSource:
parent_chat_id: Optional[str] = None # Parent channel when chat_id refers to a thread
message_id: Optional[str] = None # ID of the triggering message (for pin/reply/react)
role_authorized: bool = False # True when adapter granted access via role (not user ID)
+ # Profile this inbound message is routed to in a multiplexing gateway
+ # (from the /p/<profile>/ URL prefix or per-credential adapter ownership).
+ # None => the gateway's active/default profile. Drives both session-key
+ # namespacing and the per-turn config/credential scope.
+ profile: Optional[str] = None
@property
def description(self) -> str:
@@ -135,6 +162,8 @@ class SessionSource:
d["parent_chat_id"] = self.parent_chat_id
if self.message_id:
d["message_id"] = self.message_id
+ if self.profile:
+ d["profile"] = self.profile
return d
@classmethod
@@ -153,6 +182,7 @@ class SessionSource:
guild_id=data.get("guild_id"),
parent_chat_id=data.get("parent_chat_id"),
message_id=data.get("message_id"),
+ profile=data.get("profile"),
)
@@ -565,9 +595,19 @@ class SessionEntry:
except (TypeError, ValueError):
last_resume_marked_at = None
+ session_key = data["session_key"]
+ session_id = data["session_id"]
+
+ # Validate path-sensitive fields to prevent directory traversal (CWE-22)
+ for _field, _val in (("session_key", session_key), ("session_id", session_id)):
+ if _is_path_unsafe(_val):
+ raise ValueError(
+ f"Invalid {_field}: potential directory traversal detected"
+ )
+
return cls(
- session_key=data["session_key"],
- session_id=data["session_id"],
+ session_key=session_key,
+ session_id=session_id,
created_at=datetime.fromisoformat(data["created_at"]),
updated_at=datetime.fromisoformat(data["updated_at"]),
origin=origin,
@@ -615,15 +655,41 @@ def is_shared_multi_user_session(
return not group_sessions_per_user
+def _session_key_namespace(profile: Optional[str]) -> str:
+ """Return the ``agent:`` namespace prefix for a session key.
+
+ The historical key format is ``agent:main:<platform>:<chat_type>:...`` where
+ ``main`` is a static namespace literal (NOT a branch name — branching keys
+ off ``session_id``, not this slot). Multi-profile multiplexing reuses this
+ slot to carry the profile:
+
+ - default profile (or ``None``/``""``/``"default"``) → ``agent:main`` —
+ BYTE-IDENTICAL to every key ever generated, so existing sessions and all
+ positional parsers (``parts[2]`` == platform, etc.) are unaffected.
+ - named profile ``coder`` → ``agent:coder`` — keeps the same positional
+ layout, just a different namespace, so two profiles serving the same
+ platform/chat never collide.
+ """
+ if not profile or profile == "default":
+ return "agent:main"
+ return f"agent:{profile}"
+
+
def build_session_key(
source: SessionSource,
group_sessions_per_user: bool = True,
thread_sessions_per_user: bool = False,
+ profile: Optional[str] = None,
) -> str:
"""Build a deterministic session key from a message source.
This is the single source of truth for session key construction.
+ ``profile`` selects the key namespace (see :func:`_session_key_namespace`).
+ It defaults to ``None`` ⇒ the legacy ``agent:main`` namespace, so callers
+ that don't multiplex produce byte-identical keys to before. Only the
+ multiplexing gateway passes a non-default profile.
+
DM rules:
- DMs include chat_id when present, so each private conversation is isolated.
- thread_id further differentiates threaded DMs within the same DM chat.
@@ -643,6 +709,7 @@ def build_session_key(
shared session per chat.
- Without identifiers, messages fall back to one session per platform/chat_type.
"""
+ ns = _session_key_namespace(profile)
platform = source.platform.value
if source.chat_type == "dm":
dm_chat_id = source.chat_id
@@ -651,12 +718,12 @@ def build_session_key(
if dm_chat_id:
if source.thread_id:
- return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
- return f"agent:main:{platform}:dm:{dm_chat_id}"
+ return f"{ns}:{platform}:dm:{dm_chat_id}:{source.thread_id}"
+ return f"{ns}:{platform}:dm:{dm_chat_id}"
# No chat_id — fall back to the sender's own identifier before the
# bare per-platform sink. Without this, every DM from every user that
# arrives without a chat_id (non-standard adapters / synthetic sources)
- # collapses into one shared "agent:main:<platform>:dm" session, and a
+ # collapses into one shared "<ns>:<platform>:dm" session, and a
# single cached agent ends up serving multiple people's conversations —
# cross-user history bleed. participant_id keeps DMs isolated per user.
dm_participant_id = source.user_id_alt or source.user_id
@@ -667,11 +734,11 @@ def build_session_key(
)
if dm_participant_id:
if source.thread_id:
- return f"agent:main:{platform}:dm:{dm_participant_id}:{source.thread_id}"
- return f"agent:main:{platform}:dm:{dm_participant_id}"
+ return f"{ns}:{platform}:dm:{dm_participant_id}:{source.thread_id}"
+ return f"{ns}:{platform}:dm:{dm_participant_id}"
if source.thread_id:
- return f"agent:main:{platform}:dm:{source.thread_id}"
- return f"agent:main:{platform}:dm"
+ return f"{ns}:{platform}:dm:{source.thread_id}"
+ return f"{ns}:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
@@ -679,7 +746,7 @@ def build_session_key(
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
- key_parts = ["agent:main", platform, source.chat_type]
+ key_parts = [ns, platform, source.chat_type]
if source.chat_id:
key_parts.append(source.chat_id)
@@ -741,12 +808,11 @@ class SessionStore:
try:
with open(sessions_file, "r", encoding="utf-8") as f:
data = json.load(f)
- for key, entry_data in data.items():
- try:
- self._entries[key] = SessionEntry.from_dict(entry_data)
- except (ValueError, KeyError):
- # Skip entries with unknown/removed platform values
- continue
+ for key, entry_data in data.items():
+ try:
+ self._entries[key] = SessionEntry.from_dict(entry_data)
+ except (ValueError, KeyError) as e:
+ logger.warning("Skipping invalid session entry %r: %s", key, e)
except Exception as e:
print(f"[gateway] Warning: Failed to load sessions: {e}")
@@ -775,12 +841,32 @@ class SessionStore:
logger.debug("Could not remove temp file %s: %s", tmp_path, e)
raise
+ def _resolve_profile_for_key(self, source: Optional[SessionSource] = None) -> Optional[str]:
+ """Return the profile namespace for session keys, or None when off.
+
+ When ``multiplex_profiles`` is disabled (default), returns ``None`` so
+ keys stay in the legacy ``agent:main`` namespace — byte-identical to
+ before. When enabled, prefers the profile the inbound source was routed
+ to (``source.profile`` — set by the /p/<profile>/ URL prefix or
+ per-credential adapter), falling back to the active profile name.
+ """
+ if not getattr(self.config, "multiplex_profiles", False):
+ return None
+ if source is not None and source.profile:
+ return source.profile
+ try:
+ from hermes_cli.profiles import get_active_profile_name
+ return get_active_profile_name() or "default"
+ except Exception:
+ return None
+
def _generate_session_key(self, source: SessionSource) -> str:
"""Generate a session key from a source."""
return build_session_key(
source,
group_sessions_per_user=getattr(self.config, "group_sessions_per_user", True),
thread_sessions_per_user=getattr(self.config, "thread_sessions_per_user", False),
+ profile=self._resolve_profile_for_key(source),
)
def _is_session_expired(self, entry: SessionEntry) -> bool:
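Reviewer note (illustrative only, not part of the patch): the two helpers added to `gateway/session.py` can be exercised on their own. The sketch below restates their logic from the hunks above under public names; `dm_key` is a hypothetical helper covering only the simplest DM branch of `build_session_key`, and the sample platform and chat id are made up.

```python
from typing import Optional


def is_path_unsafe(value: object) -> bool:
    """Mirror of the diff's _is_path_unsafe: True if value could escape a dir."""
    if not value:
        return False
    s = str(value)
    if ".." in s or "/" in s or "\\" in s:
        return True
    # Leading drive-letter prefix such as "C:".
    return len(s) >= 2 and s[0].isalpha() and s[1] == ":"


def session_key_namespace(profile: Optional[str]) -> str:
    """Mirror of the diff's _session_key_namespace."""
    if not profile or profile == "default":
        return "agent:main"
    return f"agent:{profile}"


def dm_key(profile: Optional[str], platform: str, chat_id: str) -> str:
    """Hypothetical: only the 'DM with chat_id, no thread' branch."""
    return f"{session_key_namespace(profile)}:{platform}:dm:{chat_id}"


if __name__ == "__main__":
    for candidate in ("agent:main:telegram:dm:42", "../secrets", "a\\b", "C:evil"):
        print(repr(candidate), is_path_unsafe(candidate))
    print(dm_key(None, "telegram", "42"))
    print(dm_key("coder", "telegram", "42"))
```

One consequence of the drive-letter guard: any key whose first segment is a single letter followed by a colon is rejected. That is harmless while every key starts with `agent:`, but it matters if the namespace prefix ever changes.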
diff --git a/gateway/session_context.py b/gateway/session_context.py
index c8c5cf438c7..55f269df54d 100644
--- a/gateway/session_context.py
+++ b/gateway/session_context.py
@@ -49,6 +49,7 @@ _UNSET: Any = object()
# ---------------------------------------------------------------------------
_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET)
+_SESSION_SOURCE: ContextVar = ContextVar("HERMES_SESSION_SOURCE", default=_UNSET)
_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET)
_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET)
_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET)
@@ -61,6 +62,27 @@ _SESSION_ID: ContextVar = ContextVar("HERMES_SESSION_ID", default=_UNSET)
# private-chat topic (those lanes route only with thread id + reply anchor).
_SESSION_MESSAGE_ID: ContextVar = ContextVar("HERMES_SESSION_MESSAGE_ID", default=_UNSET)
+# Whether the current session's delivery channel can route an ASYNC completion
+# back to the agent AFTER the current turn ends (i.e. wake a fresh turn).
+#
+# True — CLI (in-process completion_queue drain) and the real gateway
+# platforms (Telegram/Discord/Slack/...), which hold a persistent
+# outbound channel and run the watcher/drain loops.
+# False — stateless request/response adapters (the API server: every route,
+# spec and proprietary, tears down its channel when the turn ends, so
+# a background completion that finishes later has nowhere to go).
+#
+# Tools that promise async delivery (terminal notify_on_complete /
+# watch_patterns, delegate_task background=True) read this via
+# ``async_delivery_supported()`` and refuse to hand out a promise the channel
+# can't keep — turning a silent no-op into an explicit contract.
+#
+# Default _UNSET => treated as supported, so CLI (which never sets a platform)
+# and any contextvar-unaware path keep working. Stateless adapters opt OUT by
+# setting ``supports_async_delivery = False`` on the adapter class; the gateway
+# propagates that into this contextvar at session-bind time.
+_SESSION_ASYNC_DELIVERY: ContextVar = ContextVar("HERMES_SESSION_ASYNC_DELIVERY", default=_UNSET)
+
# Cron auto-delivery vars — set per-job in run_job() so concurrent jobs
# don't clobber each other's delivery targets.
_CRON_AUTO_DELIVER_PLATFORM: ContextVar = ContextVar("HERMES_CRON_AUTO_DELIVER_PLATFORM", default=_UNSET)
@@ -69,6 +91,7 @@ _CRON_AUTO_DELIVER_THREAD_ID: ContextVar = ContextVar("HERMES_CRON_AUTO_DELIVER_
_VAR_MAP = {
"HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
+ "HERMES_SESSION_SOURCE": _SESSION_SOURCE,
"HERMES_SESSION_CHAT_ID": _SESSION_CHAT_ID,
"HERMES_SESSION_CHAT_NAME": _SESSION_CHAT_NAME,
"HERMES_SESSION_THREAD_ID": _SESSION_THREAD_ID,
@@ -100,6 +123,7 @@ def set_current_session_id(session_id: str) -> None:
def set_session_vars(
platform: str = "",
+ source: str = "",
chat_id: str = "",
chat_name: str = "",
thread_id: str = "",
@@ -109,6 +133,7 @@ def set_session_vars(
session_id: str = "",
message_id: str = "",
cwd: str = "",
+ async_delivery: bool = True,
) -> list:
"""Set all session context variables and return reset tokens.
@@ -119,9 +144,15 @@ def set_session_vars(
only for API compatibility.
``cwd`` pins the logical working directory for this context.
+
+ ``async_delivery`` declares whether this session's channel can route a
+ background completion back to the agent after the turn ends (see
+ ``_SESSION_ASYNC_DELIVERY`` / ``async_delivery_supported``). Stateless
+ request/response adapters (the API server) pass ``False``.
"""
tokens = [
_SESSION_PLATFORM.set(platform),
+ _SESSION_SOURCE.set(source),
_SESSION_CHAT_ID.set(chat_id),
_SESSION_CHAT_NAME.set(chat_name),
_SESSION_THREAD_ID.set(thread_id),
@@ -130,6 +161,7 @@ def set_session_vars(
_SESSION_KEY.set(session_key),
_SESSION_ID.set(session_id),
_SESSION_MESSAGE_ID.set(message_id),
+ _SESSION_ASYNC_DELIVERY.set(bool(async_delivery)),
]
try:
from agent.runtime_cwd import set_session_cwd
@@ -153,6 +185,7 @@ def clear_session_vars(tokens: list) -> None:
"""
for var in (
_SESSION_PLATFORM,
+ _SESSION_SOURCE,
_SESSION_CHAT_ID,
_SESSION_CHAT_NAME,
_SESSION_THREAD_ID,
@@ -163,6 +196,11 @@ def clear_session_vars(tokens: list) -> None:
_SESSION_MESSAGE_ID,
):
var.set("")
+ # Reset async-delivery capability to the "never set" sentinel rather than a
+ # falsy value: a cleared context should fall back to the default-supported
+ # behavior (CLI / unaware paths), not be mistaken for an opted-out
+ # stateless adapter.
+ _SESSION_ASYNC_DELIVERY.set(_UNSET)
try:
from agent.runtime_cwd import clear_session_cwd
@@ -195,3 +233,22 @@ def get_session_env(name: str, default: str = "") -> str:
return value
# Fall back to os.environ for CLI, cron, and test compatibility
return os.getenv(name, default)
+
+
+def async_delivery_supported() -> bool:
+ """Whether the current session can deliver a background completion later.
+
+ Returns ``False`` only when the active session was explicitly bound by a
+ stateless adapter (the API server) that cannot route a notification back to
+ the agent after the turn ends. CLI, cron, and the real gateway platforms —
+ and any path that never bound the contextvar — return ``True``.
+
+ Tools that promise async delivery (``terminal`` notify_on_complete /
+ watch_patterns, ``delegate_task`` background=True) consult this before
+ registering a watcher / dispatching a detached child, so they can refuse a
+ promise the channel can't keep instead of silently no-op'ing.
+ """
+ value = _SESSION_ASYNC_DELIVERY.get()
+ if value is _UNSET:
+ return True
+ return bool(value)
diff --git a/gateway/slash_commands.py b/gateway/slash_commands.py
index 04c3f4ca89f..ca519413a07 100644
--- a/gateway/slash_commands.py
+++ b/gateway/slash_commands.py
@@ -34,7 +34,7 @@ from agent.i18n import t
from gateway.config import HomeChannel, Platform, PlatformConfig
from gateway.platforms.base import EphemeralReply, MessageEvent, MessageType
from gateway.session import SessionSource, build_session_key
-from hermes_cli.config import cfg_get
+from hermes_cli.config import cfg_get, clear_model_endpoint_credentials
from utils import (
atomic_json_write,
atomic_yaml_write,
@@ -1030,12 +1030,13 @@ class GatewaySlashCommandsMixin:
)
async def _handle_model_command(self, event: MessageEvent) -> Optional[str]:
- """Handle /model command — switch model for this session.
+ """Handle /model command — switch model.
Supports:
/model — interactive picker (Telegram/Discord) or text list
- /model — switch for this session only
- /model --global — switch and persist to config.yaml
+ /model — switch model (persists by default)
+ /model --session — switch for this session only
+ /model --global — switch and persist (explicit)
/model --provider — switch provider + model
/model --provider — switch to provider, auto-detect model
"""
@@ -1043,6 +1044,7 @@ class GatewaySlashCommandsMixin:
import yaml
from hermes_cli.model_switch import (
switch_model as _switch_model, parse_model_flags,
+ resolve_persist_behavior,
list_authenticated_providers,
list_picker_providers,
)
@@ -1050,8 +1052,15 @@ class GatewaySlashCommandsMixin:
raw_args = event.get_command_args().strip()
- # Parse --provider, --global, and --refresh flags
- model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
+ # Parse --provider, --global, --session, and --refresh flags
+ (
+ model_input,
+ explicit_provider,
+ is_global_flag,
+ force_refresh,
+ is_session,
+ ) = parse_model_flags(raw_args)
+ persist_global = resolve_persist_behavior(is_global_flag, is_session)
# --refresh: bust the disk cache so the picker shows live data.
if force_refresh:
@@ -1143,7 +1152,7 @@ class GatewaySlashCommandsMixin:
current_model=_cur_model,
current_base_url=_cur_base_url,
current_api_key=_cur_api_key,
- is_global=False,
+ is_global=persist_global,
explicit_provider=provider_slug,
user_providers=user_provs,
custom_providers=custom_provs,
@@ -1151,6 +1160,22 @@ class GatewaySlashCommandsMixin:
if not result.success:
return t("gateway.model.error_prefix", error=result.error_message)
+ try:
+ from hermes_cli.context_switch_guard import (
+ enrich_model_switch_warnings_for_gateway,
+ )
+
+ enrich_model_switch_warnings_for_gateway(
+ result,
+ _self,
+ session_key=_session_key,
+ source=event.source,
+ custom_providers=custom_provs,
+ load_gateway_config=_load_gateway_config,
+ )
+ except Exception as exc:
+ logger.debug("preflight-compression switch warning failed: %s", exc)
+
# Update cached agent in-place
cached_entry = None
_cache_lock = getattr(_self, "_agent_cache_lock", None)
@@ -1168,7 +1193,25 @@ class GatewaySlashCommandsMixin:
api_mode=result.api_mode,
)
except Exception as exc:
- logger.warning("Picker model switch failed for cached agent: %s", exc)
+ # The in-place swap rolled the agent back to the
+ # OLD working model/client and re-raised. Abort
+ # the rest of the commit: do NOT persist the
+ # failed model to the DB, do NOT set a session
+ # override pointing at the broken model, and do
+ # NOT evict the working cached agent. Otherwise
+ # the next message rebuilds a dead agent from the
+ # broken override and the conversation is lost
+ # (#50163). A failed switch must be a no-op.
+ logger.warning(
+ "Picker model switch failed for cached agent: %s", exc
+ )
+ return t(
+ "gateway.model.error_prefix",
+ error=(
+ f"Model switch to {result.new_model} failed ({exc}); "
+ f"staying on {_cur_model}."
+ ),
+ )
# Persist the new model to the session DB so the
# dashboard shows the updated model (#34850).
@@ -1207,6 +1250,36 @@ class GatewaySlashCommandsMixin:
# stale cache signature to trigger a rebuild.
_self._evict_cached_agent(_session_key)
+ # Persist to config (default) unless --session opted out,
+ # mirroring the text /model command path above so a picked
+ # model survives across sessions like a typed one (#49066).
+ if persist_global:
+ try:
+ if config_path.exists():
+ with open(config_path, encoding="utf-8") as f:
+ _persist_cfg = yaml.safe_load(f) or {}
+ else:
+ _persist_cfg = {}
+ _raw_model = _persist_cfg.get("model")
+ if isinstance(_raw_model, dict):
+ _persist_model_cfg = _raw_model
+ elif isinstance(_raw_model, str) and _raw_model.strip():
+ _persist_model_cfg = {"default": _raw_model.strip()}
+ _persist_cfg["model"] = _persist_model_cfg
+ else:
+ _persist_model_cfg = {}
+ _persist_cfg["model"] = _persist_model_cfg
+ _persist_model_cfg["default"] = result.new_model
+ _persist_model_cfg["provider"] = result.target_provider
+ if result.base_url:
+ _persist_model_cfg["base_url"] = result.base_url
+ if str(result.target_provider or "").strip().lower() != "custom":
+ clear_model_endpoint_credentials(_persist_model_cfg)
+ from hermes_cli.config import save_config
+ save_config(_persist_cfg)
+ except Exception as e:
+ logger.warning("Failed to persist model switch: %s", e)
+
# Build confirmation text
plabel = result.provider_label or result.target_provider
lines = [t("gateway.model.switched", model=result.new_model)]
@@ -1240,7 +1313,12 @@ class GatewaySlashCommandsMixin:
if mi.has_cost_data():
lines.append(t("gateway.model.cost_label", cost=mi.format_cost()))
lines.append(t("gateway.model.capabilities_label", capabilities=mi.format_capabilities()))
- lines.append(t("gateway.model.session_only_hint"))
+ if result.warning_message:
+ lines.append(t("gateway.model.warning_prefix", warning=result.warning_message))
+ if persist_global:
+ lines.append(t("gateway.model.saved_global"))
+ else:
+ lines.append(t("gateway.model.session_only_hint"))
return "\n".join(lines)
metadata = self._thread_metadata_for_source(source, self._reply_anchor_for_event(event))
@@ -1303,6 +1381,22 @@ class GatewaySlashCommandsMixin:
if not result.success:
return t("gateway.model.error_prefix", error=result.error_message)
+ try:
+ from hermes_cli.context_switch_guard import (
+ enrich_model_switch_warnings_for_gateway,
+ )
+
+ enrich_model_switch_warnings_for_gateway(
+ result,
+ self,
+ session_key=session_key,
+ source=source,
+ custom_providers=custom_provs,
+ load_gateway_config=_load_gateway_config,
+ )
+ except Exception as exc:
+ logger.debug("preflight-compression switch warning failed: %s", exc)
+
async def _finish_switch() -> str:
"""Apply the resolved switch (agent, session, config) and build the reply."""
# If there's a cached agent, update it in-place
@@ -1323,7 +1417,20 @@ class GatewaySlashCommandsMixin:
api_mode=result.api_mode,
)
except Exception as exc:
+ # In-place swap rolled the agent back to the OLD working
+ # model/client and re-raised. Abort the commit: skip DB
+ # persist, session override, cache eviction, and config
+ # write so a failed switch is a no-op rather than a dead
+ # conversation (#50163). Without this early return the
+ # next message rebuilds a broken agent from the override.
logger.warning("In-place model switch failed for cached agent: %s", exc)
+ return t(
+ "gateway.model.error_prefix",
+ error=(
+ f"Model switch to {result.new_model} failed ({exc}); "
+ f"staying on {current_model}."
+ ),
+ )
# Persist the new model to the session DB so the dashboard
# shows the updated model (#34850).
@@ -1362,7 +1469,7 @@ class GatewaySlashCommandsMixin:
# override rather than relying on cache signature mismatch detection.
self._evict_cached_agent(session_key)
- # Persist to config if --global
+ # Persist to config (default) unless --session opted out
if persist_global:
try:
if config_path.exists():
@@ -1389,6 +1496,8 @@ class GatewaySlashCommandsMixin:
model_cfg["provider"] = result.target_provider
if result.base_url:
model_cfg["base_url"] = result.base_url
+ if str(result.target_provider or "").strip().lower() != "custom":
+ clear_model_endpoint_credentials(model_cfg)
from hermes_cli.config import save_config
save_config(cfg)
except Exception as e:
@@ -2583,12 +2692,14 @@ class GatewaySlashCommandsMixin:
if partial and tail:
compressed = rejoin_compressed_head_and_tail(compressed, tail)
- # _compress_context already calls end_session() on the old session
- # (preserving its full transcript in SQLite) and creates a new
- # session_id for the continuation. Write the compressed messages
- # into the NEW session so the original history stays searchable.
+ # _compress_context either rotated (legacy: ended the old
+ # session, created a continuation id — write compressed messages
+ # into the NEW session so the original stays searchable) or
+ # compacted in place (compression.in_place / #38763: same id,
+ # transcript replaced with the compacted set).
new_session_id = tmp_agent.session_id
rotated = new_session_id != session_entry.session_id
+ _in_place = bool(getattr(tmp_agent, "compression_in_place", False))
if rotated:
session_entry.session_id = new_session_id
self.session_store._save()
@@ -2596,20 +2707,27 @@ class GatewaySlashCommandsMixin:
source, session_entry, reason="compress-command",
)
- # Only rewrite the transcript when rotation actually produced a
- # NEW session id. If _compress_context could not rotate (e.g.
- # _session_db unavailable, or the DB split raised), session_id
- # is unchanged and rewrite_transcript() would DELETE the
- # original messages and replace them with only the compressed
- # summary — permanent data loss (#44794, #39704). In that case
- # leave the original transcript intact.
- if rotated:
- self.session_store.rewrite_transcript(new_session_id, compressed)
+ # Rewrite the transcript when EITHER rotation produced a new id
+ # OR in-place compaction succeeded. The danger this guards
+ # against is the THIRD case: _compress_context could NOT rotate
+ # AND was not in-place (e.g. legacy mode but _session_db
+ # unavailable / the DB split raised) — there session_id is
+ # unchanged for a FAILURE reason, and rewrite_transcript() would
+ # DELETE the original messages and replace them with only the
+ # compressed summary (permanent data loss #44794, #39704). In
+ # in-place mode the unchanged id is SUCCESS, so the rewrite is
+ # exactly right (and is the durable write when the throwaway
+ # /compress agent has no _session_db of its own).
+ if rotated or _in_place:
+ self.session_store.rewrite_transcript(
+ new_session_id, compressed
+ )
else:
logger.warning(
"Manual /compress: session rotation did not occur "
- "(session_id unchanged) — preserving original transcript "
- "instead of overwriting it (#44794)."
+ "(session_id unchanged) and in-place mode is off — "
+ "preserving original transcript instead of overwriting "
+ "it (#44794)."
)
# Reset stored token count — transcript changed, old value is stale
self.session_store.update_session(
@@ -2794,6 +2912,22 @@ class GatewaySlashCommandsMixin:
# Set the title
try:
if self._session_db.set_session_title(session_id, sanitized):
+ # Propagate the user-chosen title to the visible Telegram
+ # forum topic name too. Auto-generated titles already rename
+ # the topic; without this, /title only updated the DB title
+ # and the topic kept its auto-assigned name. No-ops off
+ # Telegram topic lanes and when auto-rename is disabled.
+ schedule_rename = getattr(
+ self, "_schedule_telegram_topic_title_rename", None
+ )
+ if callable(schedule_rename):
+ try:
+ schedule_rename(source, session_id, sanitized)
+ except Exception:
+ logger.debug(
+ "Failed to rename Telegram topic from /title",
+ exc_info=True,
+ )
return t("gateway.title.set_to", title=sanitized)
else:
return t("gateway.title.not_found")
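Reviewer note (illustrative only, not part of the patch): the picker path's config persistence normalizes three shapes of the `model` entry (a dict, a bare string, or missing/empty) before writing the new default. The sketch below isolates that normalization as a pure function. `apply_model_switch` and the sample provider names are hypothetical, and the `clear_model_endpoint_credentials` step is deliberately left out because its behavior is not shown in this diff.

```python
from typing import Any, Dict, Optional


def apply_model_switch(
    cfg: Dict[str, Any],
    new_model: str,
    provider: str,
    base_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Normalize cfg['model'] to a dict and record the switched model."""
    raw = cfg.get("model")
    if isinstance(raw, dict):
        model_cfg = raw                        # keep any extra keys already there
    elif isinstance(raw, str) and raw.strip():
        model_cfg = {"default": raw.strip()}   # legacy bare-string form
    else:
        model_cfg = {}                         # missing, empty, or unexpected type
    cfg["model"] = model_cfg

    model_cfg["default"] = new_model
    model_cfg["provider"] = provider
    if base_url:
        model_cfg["base_url"] = base_url
    return cfg


if __name__ == "__main__":
    print(apply_model_switch({"model": "old-model"}, "new-model", "example-provider"))
    print(apply_model_switch({}, "new-model", "custom", "http://localhost:8000/v1"))
```

Note that an existing `base_url` is left untouched when the switch result carries none, which matches the conditional assignment in the hunk.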
diff --git a/gateway/status.py b/gateway/status.py
index 367ac33c4d7..0f812c23e34 100644
--- a/gateway/status.py
+++ b/gateway/status.py
@@ -14,6 +14,7 @@ concurrently under distinct configurations).
import hashlib
import json
import os
+import shlex
import signal
import subprocess
import sys
@@ -109,12 +110,37 @@ def _get_scope_lock_path(scope: str, identity: str) -> Path:
def _get_process_start_time(pid: int) -> Optional[int]:
- """Return the kernel start time for a process when available."""
+ """Return a stable per-process start-time fingerprint, or None.
+
+ Used as a PID-reuse guard: a ``(pid, start_time)`` pair uniquely identifies
+ a process, so a recycled PID (same number, different process) yields a
+ different value and is never mistaken for the original.
+
+ On Linux this is field 22 of ``/proc/<pid>/stat`` (start time in clock
+ ticks since boot, an int). On platforms without ``/proc`` (macOS, Windows)
+ we fall back to ``psutil.Process(pid).create_time()`` — a float epoch
+ timestamp — quantized to an int (centiseconds) for stable equality.
+
+ The two sources are never mixed on a single platform: ``/proc`` always
+ succeeds first on Linux, and always fails on macOS/Windows so psutil is
+ always used there. Because the guard only compares the value recorded at
+ spawn against the live value *on the same host*, the differing units across
+ platforms are irrelevant — only same-source equality matters.
+ """
stat_path = Path(f"/proc/{pid}/stat")
try:
# Field 22 in /proc/<pid>/stat is process start time (clock ticks).
return int(stat_path.read_text(encoding="utf-8").split()[21])
except (FileNotFoundError, IndexError, PermissionError, ValueError, OSError):
+ pass
+
+ # No /proc (macOS / Windows): psutil is a hard dependency and exposes a
+ # cross-platform creation time. Quantize to centiseconds so repeated reads
+ # of the same process compare equal without float-precision fragility.
+ try:
+ import psutil # type: ignore
+ return int(round(psutil.Process(pid).create_time() * 100))
+ except Exception:
return None
@@ -164,20 +190,86 @@ def _read_process_cmdline(pid: int) -> Optional[str]:
return None
+def looks_like_gateway_command_line(command: str | None) -> bool:
+ """Return True only for a real ``gateway run`` process command line.
+
+ Lifecycle decisions (is the gateway up? did restart relaunch it?) must not
+ fire on loose substring matches. The previous ``"... gateway" in cmdline``
+ test also matched ``hermes_cli.main gateway status`` and even unrelated
+ processes like ``python -m tui_gateway`` -- which made ``restart()`` race
+ against a still-draining old process and ``status``/``start`` report false
+ positives. This requires the actual ``gateway`` subcommand followed by
+ ``run`` (or one of the gateway-dedicated entrypoints), excluding the other
+ ``gateway`` management subcommands and any process that merely contains the
+ word "gateway".
+
+ Tokenizes quote-aware (``shlex``) so quoted Windows paths with spaces
+ (``"C:\\Program Files\\...\\hermes-gateway.exe"``) survive, and strips
+ ``--profile``/``-p`` selectors from anywhere in argv -- Hermes's
+ ``_apply_profile_override`` removes them before argparse, so the profile
+ flag (and a profile literally named ``gateway``) can legally appear on
+ either side of the ``gateway`` subcommand.
+ """
+ if not command:
+ return False
+
+ try:
+ raw_tokens = shlex.split(command, posix=False)
+ except ValueError:
+ raw_tokens = command.split()
+ # Strip surrounding quotes, normalize slashes + case per token.
+ tokens = [t.strip("\"'").replace("\\", "/").lower() for t in raw_tokens]
+ if not tokens:
+ return False
+
+ # Gateway-dedicated entrypoints carry no subcommand to inspect.
+ for token in tokens:
+ if token == "gateway/run.py" or token.endswith("/gateway/run.py"):
+ return True
+ basename = token.rsplit("/", 1)[-1]
+ if basename in ("hermes-gateway", "hermes-gateway.exe"):
+ return True
+
+ joined = " ".join(tokens)
+ has_gateway_entry = (
+ "hermes_cli.main" in joined
+ or "hermes_cli/main.py" in joined
+ or any(t.rsplit("/", 1)[-1] in ("hermes", "hermes.exe") for t in tokens)
+ )
+ if not has_gateway_entry:
+ return False
+
+ # Drop profile selectors anywhere: --profile X / -p X / --profile=X / -p=X.
+ # This consumes a profile VALUE of "gateway" too, so the real subcommand
+ # token is the one we land on below.
+ filtered: list[str] = []
+ skip_next = False
+ for token in tokens:
+ if skip_next:
+ skip_next = False
+ continue
+ if token in ("--profile", "-p"):
+ skip_next = True
+ continue
+ if token.startswith("--profile=") or token.startswith("-p="):
+ continue
+ filtered.append(token)
+
+ for i, token in enumerate(filtered):
+ if token != "gateway":
+ continue
+ if i + 1 >= len(filtered):
+ return True # bare `hermes gateway` defaults to `run`
+ return filtered[i + 1] == "run"
+ return False
+
+
def _looks_like_gateway_process(pid: int) -> bool:
"""Return True when the live PID still looks like the Hermes gateway."""
cmdline = _read_process_cmdline(pid)
if not cmdline:
return False
-
- patterns = (
- "hermes_cli.main gateway",
- "hermes_cli/main.py gateway",
- "hermes gateway",
- "hermes-gateway",
- "gateway/run.py",
- )
- return any(pattern in cmdline for pattern in patterns)
+ return looks_like_gateway_command_line(cmdline)
def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
@@ -189,15 +281,8 @@ def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
if not isinstance(argv, list) or not argv:
return False
- # Normalize Windows backslashes so patterns match cross-platform.
- cmdline = " ".join(str(part) for part in argv).replace("\\", "/")
- patterns = (
- "hermes_cli.main gateway",
- "hermes_cli/main.py gateway",
- "hermes gateway",
- "gateway/run.py",
- )
- return any(pattern in cmdline for pattern in patterns)
+ cmdline = " ".join(str(part) for part in argv)
+ return looks_like_gateway_command_line(cmdline)
def _build_pid_record() -> dict:
@@ -515,6 +600,7 @@ def write_runtime_status(
platform_state: Any = _UNSET,
error_code: Any = _UNSET,
error_message: Any = _UNSET,
+ served_profiles: Any = _UNSET,
) -> None:
"""Persist gateway runtime health information for diagnostics/status."""
path = _get_runtime_status_path()
@@ -534,7 +620,12 @@ def write_runtime_status(
if restart_requested is not _UNSET:
payload["restart_requested"] = bool(restart_requested)
if active_agents is not _UNSET:
- payload["active_agents"] = max(0, int(active_agents))
+ payload["active_agents"] = parse_active_agents(active_agents)
+ if served_profiles is not _UNSET:
+ # Profiles this gateway multiplexes (multi-profile mode). Absent/empty
+ # for a single-profile gateway. Lets `hermes status` show per-profile
+ # coverage without a second probe.
+ payload["served_profiles"] = list(served_profiles or [])
if platform is not _UNSET:
platform_payload = payload["platforms"].get(platform, {})
@@ -555,6 +646,64 @@ def read_runtime_status() -> Optional[dict[str, Any]]:
return _read_json_file(_get_runtime_status_path())
+def parse_active_agents(raw: Any) -> int:
+ """Coerce a persisted ``active_agents`` value to a clamped non-negative int.
+
+ The shared coercion for the in-flight gateway-turn count. Used on the WRITE
+ side (``write_runtime_status``) and by both HTTP read surfaces
+ (``/api/status`` and ``/health/detailed``) so the count is clamped to a
+ single contract — never negative, never raising on a manually-edited or
+ otherwise non-numeric value (degrades to ``0``).
+ """
+ try:
+ return max(0, int(raw))
+ except (TypeError, ValueError, OverflowError):  # OverflowError: int(float("inf"))
+ return 0
+
+
+# States in which the gateway is alive and could be asked to drain. Anything
+# else (already draining, stopping, stopped, startup_failed, None) is NOT a
+# valid begin-drain target.
+_DRAINABLE_GATEWAY_STATES = frozenset({"running"})
+
+
+def derive_gateway_busy(
+ *, gateway_running: bool, gateway_state: Any, active_agents: Any
+) -> bool:
+ """Whether the gateway is actively processing in-flight turns.
+
+ The contract NAS gates lifecycle actions on. Busy iff the gateway is live
+ (``gateway_running``), in the ``running`` state, AND at least one agent is
+ mid-turn (``active_agents > 0``). Degrades to ``False`` whenever liveness
+ is unknown, the state is anything but ``running``, or the count is
+ absent/unparseable — i.e. a down or file-absent gateway reads "not busy",
+ never a spurious "busy".
+
+ NOTE: liveness keys off ``gateway_running`` (a live PID / health probe),
+ NEVER ``updated_at`` — a healthy idle gateway never advances that timestamp.
+ """
+ if not gateway_running:
+ return False
+ if gateway_state not in _DRAINABLE_GATEWAY_STATES:
+ return False
+ try:
+ return int(active_agents) > 0
+ except (TypeError, ValueError, OverflowError):  # OverflowError: int(float("inf"))
+ return False
+
+
+def derive_gateway_drainable(*, gateway_running: bool, gateway_state: Any) -> bool:
+ """Whether the gateway can accept a begin-drain request right now.
+
+ True iff the gateway is live and in the ``running`` state — i.e. not already
+ draining/stopping/stopped and not in a failed-start state. This is
+ independent of ``active_agents``: an idle running gateway is drainable (the
+ drain just completes immediately). Degrades to ``False`` for a down or
+ non-running gateway.
+ """
+ return bool(gateway_running) and gateway_state in _DRAINABLE_GATEWAY_STATES
+
+
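Review note (not part of the patch): a minimal sketch of the busy/drainable contract described in the docstrings above. The three helpers are condensed restatements of the added functions so the snippet runs on its own; the `cases` table and its labels are illustrative assumptions, not values taken from the repository.

```python
from typing import Any

_DRAINABLE_GATEWAY_STATES = frozenset({"running"})


def parse_active_agents(raw: Any) -> int:
    """Clamp a persisted count to a non-negative int; bad input degrades to 0."""
    try:
        return max(0, int(raw))
    except (TypeError, ValueError):
        return 0


def derive_gateway_busy(*, gateway_running: bool, gateway_state: Any, active_agents: Any) -> bool:
    """Busy only when live, in the running state, and at least one agent is mid-turn."""
    if not gateway_running:
        return False
    if gateway_state not in _DRAINABLE_GATEWAY_STATES:
        return False
    try:
        return int(active_agents) > 0
    except (TypeError, ValueError):
        return False


def derive_gateway_drainable(*, gateway_running: bool, gateway_state: Any) -> bool:
    """Drainable whenever live and running, regardless of the agent count."""
    return bool(gateway_running) and gateway_state in _DRAINABLE_GATEWAY_STATES


# Hypothetical status snapshots: (label, gateway_running, gateway_state, active_agents)
cases = [
    ("live, running, 2 turns", True, "running", 2),
    ("live, running, idle", True, "running", 0),
    ("live, draining, 2 turns", True, "draining", 2),
    ("down, stale file", False, "running", 2),
    ("live, running, bad count", True, "running", "n/a"),
]

for label, running, state, agents in cases:
    busy = derive_gateway_busy(gateway_running=running, gateway_state=state, active_agents=agents)
    drainable = derive_gateway_drainable(gateway_running=running, gateway_state=state)
    print(f"{label:28s} busy={busy!s:5s} drainable={drainable}")
```

The idle running gateway is the case worth noticing: it reads as not busy but still drainable, which matches the docstring's remark that such a drain just completes immediately.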
def get_runtime_status_running_pid(
runtime: Optional[dict[str, Any]] = None,
) -> Optional[int]:
diff --git a/gateway/stream_consumer.py b/gateway/stream_consumer.py
index f559d7ecd43..6c115e715e7 100644
--- a/gateway/stream_consumer.py
+++ b/gateway/stream_consumer.py
@@ -119,6 +119,7 @@ class GatewayStreamConsumer:
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
on_new_message: Optional[callable] = None,
+ on_before_finalize: Optional[Callable[[], Any]] = None,
initial_reply_to_id: Optional[str] = None,
):
self.adapter = adapter
@@ -133,6 +134,10 @@ class GatewayStreamConsumer:
# the content, not edit the old bubble above it.
# Called with no arguments. Exceptions are swallowed.
self._on_new_message = on_new_message
+ # Fired once when the stream transitions into its finalization path.
+ # Gateway callers use this to pause typing refreshes before a slow
+ # final rich-text edit (Telegram MarkdownV2 finalize, etc.).
+ self._on_before_finalize = on_before_finalize
self._initial_reply_to_id = initial_reply_to_id
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
@@ -196,6 +201,7 @@ class GatewayStreamConsumer:
# first failure we permanently disable drafts for the remainder of
# this response and route through edit-based for graceful degradation.
self._draft_failures = 0
+ self._before_finalize_notified = False
def _metadata_for_send(
self,
@@ -242,6 +248,20 @@ class GatewayStreamConsumer:
the subsequent cosmetic edit (cursor removal) failed."""
return self._final_content_delivered
+ async def _notify_before_finalize(self) -> None:
+ """Run the pre-finalize hook exactly once, swallowing hook errors."""
+ if self._before_finalize_notified:
+ return
+ self._before_finalize_notified = True
+ if self._on_before_finalize is None:
+ return
+ try:
+ result = self._on_before_finalize()
+ if inspect.isawaitable(result):
+ await result
+ except Exception:
+ pass
+
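Review note (not part of the patch): the once-only hook pattern used by `_notify_before_finalize` can be exercised in isolation. The sketch below assumes a stand-in class named `OnceHook` (hypothetical, not in the codebase) and shows that a sync hook, an async hook, and a raising hook each fire a single time without propagating errors.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Optional


class OnceHook:
    """Hypothetical stand-in for the consumer's pre-finalize notification logic."""

    def __init__(self, hook: Optional[Callable[[], Any]] = None) -> None:
        self._hook = hook
        self._notified = False

    async def notify(self) -> None:
        # Flip the flag first so a failing hook is never retried.
        if self._notified:
            return
        self._notified = True
        if self._hook is None:
            return
        try:
            result = self._hook()
            # Accept both plain callables and coroutine functions.
            if inspect.isawaitable(result):
                await result
        except Exception:
            pass


calls: List[str] = []


def sync_hook() -> None:
    calls.append("sync")


async def async_hook() -> None:
    calls.append("async")


def failing_hook() -> None:
    calls.append("fail")
    raise RuntimeError("boom")


async def main() -> List[str]:
    for hook in (sync_hook, async_hook, failing_hook):
        once = OnceHook(hook)
        await once.notify()
        await once.notify()  # second call is a no-op
    return calls


result = asyncio.run(main())
print(result)
```

Setting the flag before invoking the hook is the design choice that gives "exactly once" semantics even when the hook raises.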
async def _edit_message(
self,
*,
@@ -620,6 +640,8 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
+ if self._accumulated or self._message_id is not None or self._already_sent:
+ await self._notify_before_finalize()
# Final edit without cursor. If progressive editing failed
# mid-stream, send a single continuation/fallback message
# here instead of letting the base gateway path send the
@@ -1418,11 +1440,37 @@ class GatewayStreamConsumer:
# finalizing through edit would visibly downgrade a rich
# preview, so re-deliver as a fresh message + delete the
# preview instead.
+ #
+ # When the adapter exposes prefers_fresh_final_streaming
+ # and explicitly returns False, the time-based threshold
+ # must NOT override that decision. On Telegram the
+ # fresh-final path sends a Rich Message (sendRichMessage)
+ # that overlaps with the legacy MarkdownV2 preview already
+ # visible from streaming — both remain on screen because
+ # the old message is only best-effort deleted. Adapters
+ # without the hook still get the time-based fresh-final.
+ # (#47048)
+ # Check the *class* for the hook so MagicMock adapters
+ # (which auto-create attributes on access) are not
+ # falsely detected as having it. Also check instance
+ # __dict__ for test doubles that explicitly assign the
+ # attribute (e.g. adapter.prefers_fresh_final_streaming
+ # = MagicMock(return_value=False)).
+ _has_prefers_hook = (
+ hasattr(type(self.adapter),
+ "prefers_fresh_final_streaming")
+ or "prefers_fresh_final_streaming"
+ in getattr(self.adapter, "__dict__", {})
+ )
+ _prefers_fresh = self._adapter_prefers_fresh_final(text)
if (
finalize
and (
- self._should_send_fresh_final()
- or self._adapter_prefers_fresh_final(text)
+ _prefers_fresh
+ or (
+ not _has_prefers_hook
+ and self._should_send_fresh_final()
+ )
)
and await self._try_fresh_final(
text, is_turn_final=is_turn_final,
diff --git a/gateway/whatsapp_identity.py b/gateway/whatsapp_identity.py
index 9cd0a6f28be..7a0efe4e9f9 100644
--- a/gateway/whatsapp_identity.py
+++ b/gateway/whatsapp_identity.py
@@ -67,6 +67,57 @@ def normalize_whatsapp_identifier(value: str) -> str:
)
+# A target that is "just a phone number" — optional leading ``+`` then digits
+# and the usual human separators (spaces, dots, dashes, parens). Anything that
+# already carries an ``@`` is a fully-qualified JID and must pass through
+# untouched (group ``@g.us``, LID ``@lid``, ``status@broadcast`` etc.).
+_BARE_PHONE_RE = re.compile(r"^\+?[\d\s().\-]+$")
+
+
+def to_whatsapp_jid(value: str) -> str:
+ """Normalize an *outbound* WhatsApp target to a bridge-safe JID.
+
+ Baileys' ``jidDecode`` crashes on a bare phone number — it expects a
+ fully-qualified JID such as ``50766715226@s.whatsapp.net``. This helper
+ is the inverse of :func:`normalize_whatsapp_identifier`: instead of
+ stripping a JID down to its numeric core for comparison, it *builds* the
+ JID a send must use.
+
+ Behaviour:
+
+ - ``"+50766715226"`` / ``"50766715226"`` → ``"50766715226@s.whatsapp.net"``
+ - ``"50766715226@s.whatsapp.net"`` → unchanged
+ - ``"group-id@g.us"`` / ``"130631430344750@lid"`` → unchanged
+ - ``"user:device@s.whatsapp.net"`` (device suffix before ``@``) → ``"user@s.whatsapp.net"``
+ - anything that isn't a recognizable bare phone → returned unchanged so
+ the bridge can surface a meaningful error rather than us mangling it.
+
+ Returns ``""`` for an empty/whitespace input.
+ """
+ if not value:
+ return ""
+
+ normalized = str(value).strip()
+ # Drop a device suffix before the domain: ``user:device@domain`` is a
+ # legacy Baileys shape whose ``:device`` part is not addressable — collapse
+ # it to ``user@domain``. (Mirrors normalize_whatsapp_identifier, which
+ # splits the bare id on ``:`` for the same reason.)
+ if ":" in normalized and "@" in normalized:
+ prefix, _, domain = normalized.partition("@")
+ normalized = f"{prefix.split(':', 1)[0]}@{domain}"
+
+ # Already a fully-qualified JID — leave it alone.
+ if "@" in normalized:
+ return normalized
+
+ if _BARE_PHONE_RE.fullmatch(normalized):
+ digits = re.sub(r"\D+", "", normalized)
+ if digits:
+ return f"{digits}@s.whatsapp.net"
+
+ return normalized
+
+
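Review note (not part of the patch): a standalone check of the behaviours listed in the `to_whatsapp_jid` docstring. The function body is restated from the hunk above so the snippet runs without the `gateway` package; the sample inputs are illustrative.

```python
import re

_BARE_PHONE_RE = re.compile(r"^\+?[\d\s().\-]+$")


def to_whatsapp_jid(value: str) -> str:
    """Restatement of the added helper: build a bridge-safe JID for sending."""
    if not value:
        return ""
    normalized = str(value).strip()
    # Collapse the legacy ``user:device@domain`` shape to ``user@domain``.
    if ":" in normalized and "@" in normalized:
        prefix, _, domain = normalized.partition("@")
        normalized = f"{prefix.split(':', 1)[0]}@{domain}"
    # Fully-qualified JIDs pass through untouched.
    if "@" in normalized:
        return normalized
    # Bare phone numbers lose their separators and gain the user domain.
    if _BARE_PHONE_RE.fullmatch(normalized):
        digits = re.sub(r"\D+", "", normalized)
        if digits:
            return f"{digits}@s.whatsapp.net"
    return normalized


samples = [
    "+507 6671-5226",
    "50766715226:12@s.whatsapp.net",
    "group-id@g.us",
    "not a phone",
    "---",
    "   ",
]
for sample in samples:
    print(repr(sample), "->", repr(to_whatsapp_jid(sample)))
```

Inputs that contain only separators (such as `---`) match the phone pattern but yield no digits, so they fall through unchanged rather than producing an empty-user JID.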
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
diff --git a/hermes_cli/__init__.py b/hermes_cli/__init__.py
index 11f2fb6f867..68844329fec 100644
--- a/hermes_cli/__init__.py
+++ b/hermes_cli/__init__.py
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
-__version__ = "0.16.0"
-__release_date__ = "2026.6.5"
+__version__ = "0.17.0"
+__release_date__ = "2026.6.19"
def _ensure_utf8():
diff --git a/hermes_cli/auth.py b/hermes_cli/auth.py
index d0c70a48def..4271ec20417 100644
--- a/hermes_cli/auth.py
+++ b/hermes_cli/auth.py
@@ -46,7 +46,7 @@ import httpx
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_constants import OPENROUTER_BASE_URL, secure_parent_dir
from agent.credential_persistence import sanitize_borrowed_credential_payload
-from utils import atomic_replace, atomic_yaml_write, is_truthy_value
+from utils import atomic_replace, atomic_yaml_write, env_float, is_truthy_value
logger = logging.getLogger(__name__)
@@ -138,10 +138,6 @@ SERVICE_PROVIDER_NAMES: Dict[str, str] = {
"spotify": "Spotify",
}
-# Google Gemini OAuth (google-gemini-cli provider, Cloud Code Assist backend)
-DEFAULT_GEMINI_CLOUDCODE_BASE_URL = "cloudcode-pa://google"
-GEMINI_OAUTH_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 60 # refresh 60s before expiry
-
# LM Studio's default no-auth mode still requires *some* non-empty bearer for
# the API-key code paths (auxiliary_client, runtime resolver) to treat the
# provider as configured. This sentinel is sent only to LM Studio, never to
@@ -206,12 +202,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_QWEN_BASE_URL,
),
- "google-gemini-cli": ProviderConfig(
- id="google-gemini-cli",
- name="Google Gemini (OAuth)",
- auth_type="oauth_external",
- inference_base_url=DEFAULT_GEMINI_CLOUDCODE_BASE_URL,
- ),
"lmstudio": ProviderConfig(
id="lmstudio",
name="LM Studio",
@@ -1529,7 +1519,7 @@ def resolve_provider(
"github-models": "copilot", "github-model": "copilot",
"github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
"opencode": "opencode-zen", "zen": "opencode-zen",
- "qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth", "google-gemini-cli": "google-gemini-cli", "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
+ "qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub", "tokenhub": "tencent-tokenhub",
@@ -2155,97 +2145,6 @@ def get_qwen_auth_status() -> Dict[str, Any]:
# =============================================================================
-# Google Gemini OAuth (google-gemini-cli) — PKCE flow + Cloud Code Assist.
-#
-# Tokens live in ~/.hermes/auth/google_oauth.json (managed by agent.google_oauth).
-# The `base_url` here is the marker "cloudcode-pa://google" that run_agent.py
-# uses to construct a GeminiCloudCodeClient instead of the default OpenAI SDK.
-# Actual HTTP traffic goes to https://cloudcode-pa.googleapis.com/v1internal:*.
-# =============================================================================
-
-def _mark_google_gemini_cli_active(creds: Dict[str, Any]) -> None:
- """Set active_provider to google-gemini-cli in auth.json.
-
- The actual OAuth tokens live in the Google credential file managed by
- agent.google_oauth. This function only writes a minimal provider-state
- entry (email for display) and sets active_provider so that
- get_active_provider() and _model_section_has_credentials() detect the
- provider for the setup wizard and status commands.
- """
- with _auth_store_lock():
- auth_store = _load_auth_store()
- state: Dict[str, Any] = {}
- if creds.get("email"):
- state["email"] = str(creds["email"])
- _save_provider_state(auth_store, "google-gemini-cli", state)
- _save_auth_store(auth_store)
-
-
-def resolve_gemini_oauth_runtime_credentials(
- *,
- force_refresh: bool = False,
-) -> Dict[str, Any]:
- """Resolve runtime OAuth creds for google-gemini-cli."""
- try:
- from agent.google_oauth import (
- GoogleOAuthError,
- _credentials_path,
- get_valid_access_token,
- load_credentials,
- )
- except ImportError as exc:
- raise AuthError(
- f"agent.google_oauth is not importable: {exc}",
- provider="google-gemini-cli",
- code="google_oauth_module_missing",
- ) from exc
-
- try:
- access_token = get_valid_access_token(force_refresh=force_refresh)
- except GoogleOAuthError as exc:
- raise AuthError(
- str(exc),
- provider="google-gemini-cli",
- code=exc.code,
- ) from exc
-
- creds = load_credentials()
- base_url = DEFAULT_GEMINI_CLOUDCODE_BASE_URL
- return {
- "provider": "google-gemini-cli",
- "base_url": base_url,
- "api_key": access_token,
- "source": "google-oauth",
- "expires_at_ms": (creds.expires_ms if creds else None),
- "auth_file": str(_credentials_path()),
- "email": (creds.email if creds else "") or "",
- "project_id": (creds.project_id if creds else "") or "",
- }
-
-
-def get_gemini_oauth_auth_status() -> Dict[str, Any]:
- """Return a status dict for `hermes auth list` / `hermes status`."""
- try:
- from agent.google_oauth import _credentials_path, load_credentials
- except ImportError:
- return {"logged_in": False, "error": "agent.google_oauth unavailable"}
- auth_path = _credentials_path()
- creds = load_credentials()
- if creds is None or not creds.access_token:
- return {
- "logged_in": False,
- "auth_file": str(auth_path),
- "error": "not logged in",
- }
- return {
- "logged_in": True,
- "auth_file": str(auth_path),
- "source": "google-oauth",
- "api_key": creds.access_token,
- "expires_at_ms": creds.expires_ms,
- "email": creds.email,
- "project_id": creds.project_id,
- }
# Spotify auth — PKCE tokens stored in ~/.hermes/auth.json
# =============================================================================
@@ -2899,9 +2798,31 @@ def resolve_spotify_runtime_credentials(
if not should_refresh and refresh_if_expiring:
should_refresh = _is_expiring(state.get("expires_at"), refresh_skew_seconds)
if should_refresh:
- state = _refresh_spotify_oauth_state(state)
- _store_provider_state(auth_store, "spotify", state, set_active=False)
- _save_auth_store(auth_store)
+ try:
+ state = _refresh_spotify_oauth_state(state)
+ _store_provider_state(auth_store, "spotify", state, set_active=False)
+ _save_auth_store(auth_store)
+ except AuthError as exc:
+ if exc.relogin_required and state.get("refresh_token"):
+ # Terminal refresh failure — clear dead tokens from auth.json
+ # so subsequent calls fail fast without a network retry.
+ # Mirrors the Nous / xAI-OAuth / Codex-OAuth / MiniMax pattern.
+ for _k in ("access_token", "refresh_token", "expires_at", "expires_in", "obtained_at"):
+ state.pop(_k, None)
+ state["last_auth_error"] = {
+ "provider": "spotify",
+ "code": exc.code or "refresh_failed",
+ "message": str(exc),
+ "reason": "runtime_refresh_failure",
+ "relogin_required": True,
+ "at": datetime.now(timezone.utc).isoformat(),
+ }
+ try:
+ _store_provider_state(auth_store, "spotify", state, set_active=False)
+ _save_auth_store(auth_store)
+ except Exception as _save_exc:
+ logger.debug("Spotify OAuth: failed to persist quarantined state: %s", _save_exc)
+ raise
access_token = str(state.get("access_token", "") or "").strip()
if not access_token:
@@ -3838,7 +3759,7 @@ def resolve_codex_runtime_credentials(
tokens = dict(data["tokens"])
access_token = str(tokens.get("access_token", "") or "").strip()
- refresh_timeout_seconds = float(os.getenv("HERMES_CODEX_REFRESH_TIMEOUT_SECONDS", "20"))
+ refresh_timeout_seconds = env_float("HERMES_CODEX_REFRESH_TIMEOUT_SECONDS", 20)
should_refresh = bool(force_refresh)
if (not should_refresh) and refresh_if_expiring:
@@ -4475,7 +4396,7 @@ def resolve_xai_oauth_runtime_credentials(
data = _read_xai_oauth_tokens()
tokens = dict(data["tokens"])
access_token = str(tokens.get("access_token", "") or "").strip()
- refresh_timeout_seconds = float(os.getenv("HERMES_XAI_REFRESH_TIMEOUT_SECONDS", "20"))
+ refresh_timeout_seconds = env_float("HERMES_XAI_REFRESH_TIMEOUT_SECONDS", 20)
discovery = dict(data.get("discovery") or {})
token_endpoint = str(discovery.get("token_endpoint", "") or "").strip()
redirect_uri = str(data.get("redirect_uri", "") or "").strip()
@@ -5430,9 +5351,15 @@ def refresh_nous_oauth_pure(
state["refresh_token"] = refreshed.get("refresh_token") or refresh_token_value
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
+ # Heal a poisoned stored value: when the Portal-returned URL is
+ # rejected by the allowlist (returns None), reset to the production
+ # default instead of leaving a previously-persisted bad host (e.g. a
+ # stale staging URL) in place. Without this reset, an auth.json that
+ # was poisoned before the allowlist existed keeps re-validating to
+ # None on every refresh and silently re-uses the dead endpoint —
+ # the "falling back to default" warning never actually takes effect.
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
- if refreshed_url:
- state["inference_base_url"] = refreshed_url
+ state["inference_base_url"] = refreshed_url or DEFAULT_NOUS_INFERENCE_URL
state["obtained_at"] = now.isoformat()
state["expires_in"] = access_ttl
state["expires_at"] = datetime.fromtimestamp(
@@ -5705,9 +5632,13 @@ def resolve_nous_runtime_credentials(
state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
+ # Heal a poisoned stored value (see refresh_nous_oauth_pure):
+ # reject → reset to production default, don't keep a stale
+ # staging host that re-validates to None every refresh.
+ # The local inference_base_url is persisted to state below
+ # (and used for the client), so healing it here suffices.
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
- if refreshed_url:
- inference_base_url = refreshed_url
+ inference_base_url = refreshed_url or DEFAULT_NOUS_INFERENCE_URL
state["obtained_at"] = now.isoformat()
state["expires_in"] = access_ttl
state["expires_at"] = datetime.fromtimestamp(
@@ -6157,8 +6088,6 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
return get_xai_oauth_auth_status()
if target == "qwen-oauth":
return get_qwen_auth_status()
- if target == "google-gemini-cli":
- return get_gemini_oauth_auth_status()
if target == "minimax-oauth":
return get_minimax_oauth_auth_status()
if target == "copilot-acp":
@@ -6386,16 +6315,12 @@ def _update_config_for_provider(
# Clear stale base_url to prevent contamination when switching providers
model_cfg.pop("base_url", None)
- # Clear stale api_key/api_mode left over from a previous custom provider.
- # When the user switches from e.g. a MiniMax custom endpoint
- # (api_mode=anthropic_messages, api_key=mxp-...) to a built-in provider
- # (e.g. OpenRouter), the stale api_key/api_mode would override the new
- # provider's credentials and transport choice. Built-in providers that
- # need a specific api_mode (copilot, xai) set it at request-resolution
- # time via `_copilot_runtime_api_mode` / `_detect_api_mode_for_url`, so
- # removing the persisted value here is safe.
- model_cfg.pop("api_key", None)
- model_cfg.pop("api_mode", None)
+ # Clear stale endpoint credentials left over from a previous custom provider.
+ # Built-in providers resolve credentials from env/auth state, not inline
+ # model.api_key.
+ from hermes_cli.config import clear_model_endpoint_credentials
+
+ clear_model_endpoint_credentials(model_cfg)
# When switching to a non-OpenRouter provider, ensure model.default is
# valid for the new provider. An OpenRouter-formatted name like
diff --git a/hermes_cli/auth_commands.py b/hermes_cli/auth_commands.py
index f1f87c7703c..decf30dea0f 100644
--- a/hermes_cli/auth_commands.py
+++ b/hermes_cli/auth_commands.py
@@ -34,7 +34,7 @@ from hermes_cli.secret_prompt import masked_secret_prompt
# Providers that support OAuth login in addition to API keys.
-_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "xai-oauth", "qwen-oauth", "google-gemini-cli", "minimax-oauth"}
+_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "xai-oauth", "qwen-oauth", "minimax-oauth"}
def _get_custom_provider_names() -> list:
@@ -314,7 +314,7 @@ def auth_add_command(args) -> None:
_oauth_default_label(provider, len(pool.entries()) + 1),
)
# Add a distinct, self-contained pool entry per account (matching the
- # xai-oauth / google-gemini-cli / qwen-oauth patterns) instead of
+ # xai-oauth / qwen-oauth patterns) instead of
# routing through the singleton ``_save_codex_tokens`` save path.
# The singleton round-trip collapsed every added account into the
# latest login: a second ``hermes auth add openai-codex`` overwrote
@@ -364,28 +364,6 @@ def auth_add_command(args) -> None:
print(f'Saved {provider} OAuth credentials: "{shown_label}"')
return
- if provider == "google-gemini-cli":
- from agent.google_oauth import run_gemini_oauth_login_pure
-
- creds = run_gemini_oauth_login_pure()
- auth_mod._mark_google_gemini_cli_active(creds)
- label = (getattr(args, "label", None) or "").strip() or (
- creds.get("email") or _oauth_default_label(provider, len(pool.entries()) + 1)
- )
- entry = PooledCredential(
- provider=provider,
- id=uuid.uuid4().hex[:6],
- label=label,
- auth_type=AUTH_TYPE_OAUTH,
- priority=0,
- source=f"{SOURCE_MANUAL}:google_pkce",
- access_token=creds["access_token"],
- refresh_token=creds.get("refresh_token"),
- )
- pool.add_entry(entry)
- print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
- return
-
if provider == "qwen-oauth":
creds = auth_mod.resolve_qwen_runtime_credentials(refresh_if_expiring=False)
auth_mod._mark_qwen_oauth_active(creds)
diff --git a/hermes_cli/backup.py b/hermes_cli/backup.py
index 0064881c43f..702077f273a 100644
--- a/hermes_cli/backup.py
+++ b/hermes_cli/backup.py
@@ -34,14 +34,38 @@ logger = logging.getLogger(__name__)
# ``hermes-agent`` is special-cased to root level only in ``_should_exclude``
# so that skill directories like ``skills/autonomous-ai-agents/hermes-agent/``
# are not accidentally excluded.
+#
+# The dependency/cache entries below matter for more than tidiness: without
+# them a single plugin venv, MCP-server install, or pip/uv cache living under
+# HERMES_HOME gets walked file-by-file, ballooning a backup to hundreds of
+# thousands of entries that crawl for hours — the exact "backup stuck for
+# days / 426543 files" symptom users hit. The dependency/test-env names mostly
+# mirror ``agent.skill_utils.EXCLUDED_SKILL_DIRS`` (the project's canonical
+# "regeneratable dir" set); ``.cache`` is an additional backup-only entry, as
+# it names a broad regeneratable cache convention (pip/uv/etc.) that the skill
+# scanner doesn't need to prune but a backup walk does. We deliberately do NOT
+# exclude ``.archive`` here because the curator's ``skills/.archive/`` holds
+# restorable user skills that must survive a backup.
_EXCLUDED_DIRS = {
"hermes-agent", # the codebase repo — re-clone instead
"__pycache__", # bytecode caches — regenerated on import
".git", # nested git dirs (profiles shouldn't have these, but safety)
- "node_modules", # js deps if website/ somehow leaks in
+ "node_modules", # js deps — reinstalled on demand
"backups", # prior auto-backups — don't nest backups exponentially
"checkpoints", # session-local trajectory caches — regenerated per-session,
# session-hash-keyed so they don't port to another machine anyway
+ # Python dependency trees (plugin / MCP-server venvs under HERMES_HOME) —
+ # regenerated by reinstalling; never irreplaceable state.
+ ".venv",
+ "venv",
+ "site-packages",
+ # Tool / build caches — all regeneratable.
+ ".cache",
+ ".tox",
+ ".nox",
+ ".pytest_cache",
+ ".mypy_cache",
+ ".ruff_cache",
}
# File-name suffixes to skip
@@ -100,6 +124,89 @@ _IMPORT_SKIP_NAMES = {
# zipfile.open() drops Unix mode bits on extract; restore tightens these to 0600.
_SECRET_FILE_NAMES = {".env", "auth.json", "state.db"}
+# Reserved archive subtree for provider state that lives OUTSIDE HERMES_HOME
+# (e.g. ~/.honcho, ~/.hindsight). The active memory provider declares these via
+# MemoryProvider.backup_paths(); they're stored under this prefix encoded
+# relative to the user's home directory, and restored to their original
+# home-relative location on import. Anything not under home is skipped.
+_EXTERNAL_PREFIX = "_external/"
+
+
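Review note (not part of the patch): a minimal sketch of the `_external/` round trip described in the comment above, assuming two hypothetical helpers (`to_arcname`, `restore_target`) that isolate the path logic from `run_backup` and `run_import`. A temporary directory stands in for the user's home.

```python
import tempfile
from pathlib import Path
from typing import Optional

_EXTERNAL_PREFIX = "_external/"


def to_arcname(fpath: Path, home_dir: Path) -> Optional[str]:
    """Encode a file under home as an archive name; None if it lies outside home."""
    try:
        rel = fpath.resolve().relative_to(home_dir)
    except (ValueError, OSError):
        return None
    return _EXTERNAL_PREFIX + rel.as_posix()


def restore_target(member: str, home_dir: Path) -> Optional[Path]:
    """Decode an archive name to its home-relative target; None on traversal."""
    ext_rel = member[len(_EXTERNAL_PREFIX):]
    if not ext_rel:
        return None
    target = home_dir / ext_rel
    try:
        # The resolved target must stay under the home directory.
        target.resolve().relative_to(home_dir)
    except ValueError:
        return None
    return target


# A throwaway directory plays the role of Path.home() for the demo.
home = Path(tempfile.mkdtemp()).resolve()
config = home / ".honcho" / "config.json"
config.parent.mkdir(parents=True)
config.write_text("{}")

arcname = to_arcname(config, home)
print(arcname)
print(restore_target(arcname, home) == config)
print(restore_target("_external/../evil", home))
```

Encoding relative to home rather than storing absolute paths is what lets an archive made on one machine restore under a different user's home directory.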
+def _collect_memory_provider_external_paths() -> List[Path]:
+ """Return existing absolute paths the active memory provider stores
+ outside HERMES_HOME, resolved from config only (no network, no init).
+
+ Reads ``memory.provider`` from config, loads just that provider, and asks
+ it for ``backup_paths()``. Returns an empty list when no external provider
+ is active or the provider can't be loaded — backup must never fail because
+ of a flaky plugin.
+ """
+ try:
+ from plugins.memory import _get_active_memory_provider, load_memory_provider
+ except Exception:
+ return []
+
+ try:
+ active = _get_active_memory_provider()
+ except Exception:
+ active = None
+ if not active:
+ return []
+
+ try:
+ provider = load_memory_provider(active)
+ except Exception:
+ provider = None
+ if provider is None:
+ return []
+
+ try:
+ declared = provider.backup_paths() or []
+ except Exception as exc:
+ logger.warning("backup_paths() failed for memory provider %r: %s", active, exc)
+ return []
+
+ out: List[Path] = []
+ seen: set = set()
+ for raw in declared:
+ try:
+ p = Path(raw).expanduser()
+ except Exception:
+ continue
+ if not p.exists():
+ continue
+ try:
+ resolved = p.resolve()
+ except (OSError, ValueError):
+ continue
+ if resolved in seen:
+ continue
+ seen.add(resolved)
+ out.append(p)
+ return out
+
+
+def _iter_external_files(base: Path) -> List[Path]:
+ """Return a list of regular files under *base* (a file or a directory),
+ skipping symlinks, caches, and pyc files."""
+ files: List[Path] = []
+ if base.is_file() and not base.is_symlink():
+ files.append(base)
+ return files
+ if not base.is_dir():
+ return files
+ for dirpath, dirnames, filenames in os.walk(base, followlinks=False):
+ dp = Path(dirpath)
+ dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
+ for fname in filenames:
+ fpath = dp / fname
+ if fpath.is_symlink():
+ continue
+ if fpath.name in _EXCLUDED_NAMES or fpath.name.endswith(_EXCLUDED_SUFFIXES):
+ continue
+ files.append(fpath)
+ return files
+
def _should_exclude(rel_path: Path) -> bool:
"""Return True if *rel_path* (relative to hermes root) should be skipped."""
@@ -238,12 +345,36 @@ def run_backup(args) -> None:
files_to_add.append((fpath, rel))
- if not files_to_add:
+ # External memory-provider state (e.g. ~/.honcho, ~/.hindsight) lives
+ # outside HERMES_HOME, so the walk above never sees it. Ask the active
+ # provider for its declared paths and stage them under the reserved
+ # ``_external/`` arc prefix, encoded relative to the user's home dir.
+ # Only paths under home are captured (security + portability); anything
+ # else is skipped with a note.
+ home_dir = Path.home().resolve()
+ external_to_add: list[tuple[Path, str]] = [] # (absolute, arcname)
+ skipped_external: list[str] = []
+ for base in _collect_memory_provider_external_paths():
+ try:
+ base_resolved = base.resolve()
+ base_resolved.relative_to(home_dir)
+ except (ValueError, OSError):
+ skipped_external.append(str(base))
+ continue
+ for fpath in _iter_external_files(base):
+ try:
+ rel_to_home = fpath.resolve().relative_to(home_dir)
+ except (ValueError, OSError):
+ continue
+ arcname = _EXTERNAL_PREFIX + rel_to_home.as_posix()
+ external_to_add.append((fpath, arcname))
+
+ if not files_to_add and not external_to_add:
print("No files to back up.")
return
# Create the zip
- file_count = len(files_to_add)
+ file_count = len(files_to_add) + len(external_to_add)
print(f"Backing up {file_count} files ...")
total_bytes = 0
@@ -282,6 +413,17 @@ def run_backup(args) -> None:
if i % 500 == 0:
print(f" {i}/{file_count} files ...")
+ # External memory-provider state, stored under the ``_external/`` arc
+ # prefix. These never include ``.db`` files in practice (config/env
+ # blobs), so a straight zf.write is fine.
+ for abs_path, arcname in external_to_add:
+ try:
+ zf.write(abs_path, arcname=arcname)
+ total_bytes += abs_path.stat().st_size
+ except (PermissionError, OSError, ValueError) as exc:
+ errors.append(f" {arcname}: {exc}")
+ continue
+
elapsed = time.monotonic() - t0
zip_size = out_path.stat().st_size
@@ -293,6 +435,20 @@ def run_backup(args) -> None:
print(f" Compressed: {_format_size(zip_size)}")
print(f" Time: {elapsed:.1f}s")
+ if external_to_add:
+ print(
+ f"\n Included {len(external_to_add)} memory-provider file(s) "
+ f"stored outside {display_hermes_home()}."
+ )
+
+ if skipped_external:
+ print(
+ f"\n Skipped {len(skipped_external)} memory-provider path(s) "
+ f"outside your home directory (not portable):"
+ )
+ for p in sorted(skipped_external)[:10]:
+ print(f" {p}")
+
if skipped_dirs:
print(f"\n Excluded directories:")
for d in sorted(skipped_dirs):
@@ -418,10 +574,44 @@ def run_import(args) -> None:
errors = []
restored = 0
+ restored_external = 0
skipped_runtime: list[str] = []
+ home_dir = Path.home().resolve()
t0 = time.monotonic()
for member in members:
+ # External memory-provider state captured under the reserved
+ # ``_external/`` arc prefix restores to its original home-relative
+ # location (e.g. ~/.honcho/config.json), NOT under HERMES_HOME.
+ if member.startswith(_EXTERNAL_PREFIX):
+ ext_rel = member[len(_EXTERNAL_PREFIX):]
+ if not ext_rel:
+ continue
+ target = home_dir / ext_rel
+ # Security: the resolved target must stay under the home dir.
+ try:
+ target.resolve().relative_to(home_dir)
+ except (ValueError, OSError):
+ errors.append(f" {member}: path traversal blocked")
+ continue
+ try:
+ target.parent.mkdir(parents=True, exist_ok=True)
+ with zf.open(member) as src, open(target, "wb") as dst:
+ dst.write(src.read())
+ # External provider configs commonly hold credentials.
+ if target.suffix in {".json", ".env", ".conf"} or target.name in _SECRET_FILE_NAMES:
+ try:
+ os.chmod(target, 0o600)
+ except OSError:
+ pass
+ restored += 1
+ restored_external += 1
+ except (PermissionError, OSError) as exc:
+ errors.append(f" {member}: {exc}")
+ if restored % 500 == 0:
+ print(f" {restored}/{file_count} files ...")
+ continue
+
# Strip prefix if detected
if prefix and member.startswith(prefix):
rel = member[len(prefix):]
@@ -470,6 +660,12 @@ def run_import(args) -> None:
print(f"Import complete: {restored} files restored in {elapsed:.1f}s")
print(f" Target: {display_hermes_home()}")
+ if restored_external:
+ print(
+ f"\n Restored {restored_external} memory-provider file(s) to "
+ f"their original location(s) outside {display_hermes_home()}."
+ )
+
if errors:
print(f"\n Warnings ({len(errors)} files skipped):")
for e in errors[:10]:
@@ -704,8 +900,22 @@ def restore_quick_snapshot(
"""
home = hermes_home or get_hermes_home()
root = _quick_snapshot_root(home)
+
+ # Security: reject snapshot_id values that contain path separators or
+ # traversal sequences so that `root / snapshot_id` stays inside root.
+ if not snapshot_id or "/" in snapshot_id or "\\" in snapshot_id or snapshot_id in (".", ".."):
+ logger.error("Invalid snapshot_id: %s", snapshot_id)
+ return False
+
snap_dir = root / snapshot_id
+ # Confirm the resolved path is still inside root (handles symlinks etc.)
+ try:
+ snap_dir.resolve().relative_to(root.resolve())
+ except ValueError:
+ logger.error("Snapshot path traversal blocked for id: %s", snapshot_id)
+ return False
+
if not snap_dir.is_dir():
return False
@@ -718,11 +928,24 @@ def restore_quick_snapshot(
restored = 0
for rel in meta.get("files", {}):
+ # Security: reject absolute paths and traversals in manifest entries
src = snap_dir / rel
- if not src.exists():
+ try:
+ src.resolve().relative_to(snap_dir.resolve())
+ except ValueError:
+ logger.error("Manifest path traversal blocked: %s", rel)
continue
dst = home / rel
+ try:
+ dst.resolve().relative_to(home.resolve())
+ except ValueError:
+ logger.error("Manifest path traversal blocked: %s", rel)
+ continue
+
+ if not src.exists():
+ continue
+
dst.parent.mkdir(parents=True, exist_ok=True)
try:
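Reviewer note: the containment checks added in this file (external restore targets, `snapshot_id`, and manifest entries) share one pattern: resolve the candidate path, then require `relative_to()` against the resolved root to succeed. Below is a minimal standalone sketch of that pattern. The helper name `is_contained` is hypothetical and is not part of the patch; it also treats `OSError` from `resolve()` as "not contained", which is an assumption rather than what every call site in the diff does.

```python
import tempfile
from pathlib import Path


def is_contained(root: Path, candidate: Path) -> bool:
    """Return True when candidate, after resolving symlinks and '..', stays under root.

    Hypothetical helper for illustration only; not defined in the patch.
    """
    try:
        candidate.resolve().relative_to(root.resolve())
    except (ValueError, OSError):
        # ValueError: the resolved path is outside root.
        # OSError: the path could not be resolved at all.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A nested path under root is accepted even if it does not exist yet.
    inside = is_contained(root, root / "snapshots" / "abc")
    # A '..' component that climbs out of root is rejected after resolution.
    escaped = is_contained(root, root / ".." / "elsewhere")
```

Resolving both sides before comparing is what lets the check catch symlinks and `..` segments that a plain string-prefix comparison would miss.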
diff --git a/hermes_cli/banner.py b/hermes_cli/banner.py
index 952a09ef99f..62f9f40e7a6 100644
--- a/hermes_cli/banner.py
+++ b/hermes_cli/banner.py
@@ -575,6 +575,18 @@ def build_welcome_banner(console: "Console", model: str, cwd: str,
enabled_toolsets = enabled_toolsets or []
_, unavailable_toolsets = check_tool_availability(quiet=True)
+ # The availability check walks the GLOBAL toolset registry, so it includes
+ # toolsets that aren't part of this agent's platform set at all (e.g.
+ # `discord`, `feishu_doc` on a CLI session). Those must never surface in the
+ # banner's "Available Tools" — they aren't exposed to the agent. Restrict to
+ # toolsets actually enabled for this agent; a toolset that's enabled but
+ # currently has unmet deps legitimately shows as disabled/lazy below.
+ _enabled_ts = {str(t) for t in enabled_toolsets}
+ if _enabled_ts:
+ unavailable_toolsets = [
+ item for item in unavailable_toolsets
+ if str(item.get("id", item.get("name", ""))) in _enabled_ts
+ ]
disabled_tools = set()
# Tools whose toolset has a check_fn are lazy-initialized (e.g. honcho,
# homeassistant) — they show as unavailable at banner time because the
@@ -722,10 +734,21 @@ def build_welcome_banner(console: "Console", model: str, cwd: str,
right_lines.append("")
right_lines.append(f"[bold {accent}]Available Skills[/]")
- skills_by_category = get_available_skills()
- total_skills = sum(len(s) for s in skills_by_category.values())
+ # The skills catalog is only reachable when the `skills` toolset is enabled
+ # (it exposes skill_view / skill_manage). When it's disabled — e.g. a Blank
+ # Slate install — the agent literally cannot load any skill, so advertising
+ # the on-disk catalog here is misleading. Reflect the real state instead.
+ _skills_enabled = (not _enabled_ts) or ("skills" in _enabled_ts)
+ if _skills_enabled:
+ skills_by_category = get_available_skills()
+ total_skills = sum(len(s) for s in skills_by_category.values())
+ else:
+ skills_by_category = {}
+ total_skills = 0
- if skills_by_category:
+ if not _skills_enabled:
+ right_lines.append(f"[dim {dim}]Skills toolset disabled[/]")
+ elif skills_by_category:
for category in sorted(skills_by_category.keys()):
skill_names = sorted(skills_by_category[category])
if len(skill_names) > 8:
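Reviewer note: the banner change narrows the global "unavailable" list to the toolsets enabled for this agent, and leaves the list untouched when no enabled set is known. A small sketch of that filter, assuming each entry is a dict carrying an `id` or `name` key as the patch implies. The function name `filter_unavailable` and the sample entries are hypothetical.

```python
def filter_unavailable(unavailable, enabled_toolsets):
    """Keep only unavailable entries whose id/name is in the enabled set.

    Hypothetical helper mirroring the inline list comprehension in the patch.
    """
    enabled = {str(t) for t in enabled_toolsets}
    if not enabled:
        # No enabled set known: leave the list as-is, like the patch does.
        return list(unavailable)
    return [
        item for item in unavailable
        if str(item.get("id", item.get("name", ""))) in enabled
    ]


# Sample data is made up for illustration.
unavailable = [{"name": "discord"}, {"id": "honcho"}, {"name": "feishu_doc"}]
shown = filter_unavailable(unavailable, ["honcho", "skills"])
unfiltered = filter_unavailable(unavailable, [])
```

With these inputs only the `honcho` entry survives, because it is the one unavailable toolset that is also enabled for the agent.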
diff --git a/hermes_cli/cli_agent_setup_mixin.py b/hermes_cli/cli_agent_setup_mixin.py
index 1041e8fd0b5..a71d8835698 100644
--- a/hermes_cli/cli_agent_setup_mixin.py
+++ b/hermes_cli/cli_agent_setup_mixin.py
@@ -391,9 +391,17 @@ class CLIAgentSetupMixin:
notice_callback=self._on_notice,
notice_clear_callback=self._on_notice_clear,
)
- # Store reference for atexit memory provider shutdown
- global _active_agent_ref
- _active_agent_ref = self.agent
+ # Store reference for atexit memory provider shutdown.
+ # NOTE: this MUST write to the ``cli`` module's global, not a
+ # local module global. ``_run_cleanup`` (in cli.py) reads
+ # ``cli._active_agent_ref`` to decide whether to fire the memory
+ # provider's ``on_session_end`` hook. When this code lived in
+ # cli.py a bare ``global _active_agent_ref`` worked; after the
+ # god-file extraction into this mixin a ``global`` here would bind
+ # *this module's* namespace, leaving ``cli._active_agent_ref`` None
+ # forever — so memory shutdown never ran on /exit (#49287).
+ import cli as _cli
+ _cli._active_agent_ref = self.agent
# Route agent status output through prompt_toolkit so ANSI escape
# sequences aren't garbled by patch_stdout's StdoutProxy (#2262).
self.agent._print_fn = _cprint
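Reviewer note: the comment above relies on the fact that a `global` statement binds the namespace of the module where the function is defined, not the module that later reads the name. The sketch below reproduces that with two throwaway modules built via `types.ModuleType`. The module names `fake_cli` and `fake_mixin` are stand-ins, not the real `cli` or mixin modules.

```python
import types

# Two throwaway modules standing in for cli.py and the extracted mixin module.
cli_mod = types.ModuleType("fake_cli")
mixin_mod = types.ModuleType("fake_mixin")

# The reader-side global, as cli.py would define it.
exec("_active_agent_ref = None", cli_mod.__dict__)

# A bare ``global`` inside the mixin module binds the mixin's own namespace.
exec(
    "def store_with_global(agent):\n"
    "    global _active_agent_ref\n"
    "    _active_agent_ref = agent\n",
    mixin_mod.__dict__,
)
mixin_mod.store_with_global("agent-A")
after_global = cli_mod._active_agent_ref  # the cli-side global is untouched


def store_via_module(agent):
    # Writing through the module object updates the namespace the reader inspects.
    cli_mod._active_agent_ref = agent


store_via_module("agent-B")
after_attr = cli_mod._active_agent_ref
```

The first store lands in `fake_mixin` and leaves `fake_cli._active_agent_ref` as `None`, which matches the failure mode described for #49287; the attribute write on the module object is the form the patch adopts.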
diff --git a/hermes_cli/cli_commands_mixin.py b/hermes_cli/cli_commands_mixin.py
index a064321b4d1..50013371692 100644
--- a/hermes_cli/cli_commands_mixin.py
+++ b/hermes_cli/cli_commands_mixin.py
@@ -947,52 +947,6 @@ class CLICommandsMixin:
_cprint(f" Original session: {parent_session_id}")
_cprint(f" Branch session: {new_session_id}")
- def _handle_gquota_command(self, cmd_original: str) -> None:
- """Show Google Gemini Code Assist quota usage for the current OAuth account."""
- try:
- from agent.google_oauth import get_valid_access_token, GoogleOAuthError, load_credentials
- from agent.google_code_assist import retrieve_user_quota, CodeAssistError
- except ImportError as exc:
- self._console_print(f" [red]Gemini modules unavailable: {exc}[/]")
- return
-
- try:
- access_token = get_valid_access_token()
- except GoogleOAuthError as exc:
- self._console_print(f" [yellow]{exc}[/]")
- self._console_print(" Run [bold]/model[/] and pick 'Google Gemini (OAuth)' to sign in.")
- return
-
- creds = load_credentials()
- project_id = (creds.project_id if creds else "") or ""
-
- try:
- buckets = retrieve_user_quota(access_token, project_id=project_id)
- except CodeAssistError as exc:
- self._console_print(f" [red]Quota lookup failed:[/] {exc}")
- return
-
- if not buckets:
- self._console_print(" [dim]No quota buckets reported (account may be on legacy/unmetered tier).[/]")
- return
-
- # Sort for stable display, group by model
- buckets.sort(key=lambda b: (b.model_id, b.token_type))
- self._console_print()
- self._console_print(f" [bold]Gemini Code Assist quota[/] (project: {project_id or '(auto / free-tier)'})")
- self._console_print()
- for b in buckets:
- pct = max(0.0, min(1.0, b.remaining_fraction))
- width = 20
- filled = int(round(pct * width))
- bar = "▓" * filled + "░" * (width - filled)
- pct_str = f"{int(pct * 100):3d}%"
- header = b.model_id
- if b.token_type:
- header += f" [{b.token_type}]"
- self._console_print(f" {header:40s} {bar} {pct_str}")
- self._console_print()
-
def _handle_personality_command(self, cmd: str):
"""Handle the /personality command to set predefined personalities."""
from cli import save_config_value
@@ -2064,6 +2018,79 @@ class CLICommandsMixin:
if self._apply_tui_skin_style():
print(" Prompt + TUI colors updated.")
+ def _compose_in_editor(self, initial_text: str = "") -> str:
+ """Open ``$VISUAL``/``$EDITOR`` on a temp markdown file and return the
+ saved buffer (comment lines starting with ``#!`` stripped).
+
+ Returns the composed prompt text, or an empty string if the editor
+ could not be launched or the buffer was left empty. Factored out so
+ the read-back/strip logic is unit-testable without spawning an editor.
+ """
+ import os
+ import shlex
+ import subprocess
+ import tempfile
+
+ editor = os.environ.get("VISUAL") or os.environ.get("EDITOR")
+ if not editor:
+ editor = "notepad" if os.name == "nt" else "nano"
+
+ header = (
+ "#! Compose your prompt below. Lines starting with '#!' are ignored.\n"
+ "#! Save and quit to send; leave empty to cancel.\n\n"
+ )
+ fd, path = tempfile.mkstemp(suffix=".md", prefix="hermes_prompt_")
+ try:
+ with os.fdopen(fd, "w", encoding="utf-8") as fh:
+ fh.write(header)
+ if initial_text:
+ fh.write(initial_text)
+ try:
+ subprocess.call([*shlex.split(editor), path])
+ except Exception:
+ # Fall back to running the command line through the shell (the
+ # editor value may not be argv-splittable on some platforms).
+ subprocess.call(f"{editor} {shlex.quote(path)}", shell=True)
+ with open(path, "r", encoding="utf-8") as fh:
+ raw = fh.read()
+ finally:
+ try:
+ os.unlink(path)
+ except OSError:
+ pass
+
+ lines = [ln for ln in raw.splitlines() if not ln.startswith("#!")]
+ return "\n".join(lines).strip()
+
+ def _handle_prompt_compose_command(self, cmd_original: str) -> None:
+ """Handle /prompt — compose the next prompt in $EDITOR and send it.
+
+ Opens the user's editor on a temporary markdown file (optionally
+ seeded with text passed after the command), then queues the saved
+ buffer as the next agent turn via the one-shot ``_pending_agent_seed``
+ the interactive loop already consumes (same path as /blueprint).
+ """
+ from cli import _DIM, _RST, _cprint
+
+ initial = ""
+ parts = (cmd_original or "").strip().split(None, 1)
+ if len(parts) > 1:
+ initial = parts[1]
+
+ try:
+ composed = self._compose_in_editor(initial)
+ except Exception as exc:
+ _cprint(f" {_DIM}(>_<) Could not open editor: {exc}{_RST}")
+ return
+
+ if not composed:
+ _cprint(f" {_DIM}(._.) Empty prompt — nothing sent.{_RST}")
+ return
+
+ # One-shot seed: the interactive loop runs this as the next agent turn
+ # right after process_command() returns (see cli.py main loop).
+ self._pending_agent_seed = composed
+
def _handle_footer_command(self, cmd_original: str) -> None:
"""Toggle or inspect ``display.runtime_footer.enabled`` from the CLI.
@@ -2117,6 +2144,56 @@ class CLICommandsMixin:
else:
_cprint(" Failed to save runtime_footer setting to config.yaml")
+ def _handle_timestamps_command(self, cmd_original: str) -> None:
+ """Toggle or inspect ``display.timestamps`` from the CLI.
+
+ When on, submitted and streamed message labels carry an ``[HH:MM]``
+ suffix and ``/history`` prefixes each turn with its time (for turns
+ that carry a stored timestamp).
+
+ Usage:
+ /timestamps → toggle
+ /timestamps on|off → explicit
+ /timestamps status → show current state
+ """
+ from cli import _cprint, save_config_value
+ from hermes_cli.colors import Colors as _Colors
+
+ arg = ""
+ try:
+ parts = (cmd_original or "").strip().split(None, 1)
+ if len(parts) > 1:
+ arg = parts[1].strip().lower()
+ except Exception:
+ arg = ""
+
+ current = bool(getattr(self, "show_timestamps", False))
+
+ if arg in {"status", "?"}:
+ state = "ON" if current else "OFF"
+ _cprint(f" {_Colors.BOLD}Message timestamps:{_Colors.RESET} {state}")
+ return
+
+ if arg in {"on", "enable", "true", "1"}:
+ new_state = True
+ elif arg in {"off", "disable", "false", "0"}:
+ new_state = False
+ elif arg == "":
+ new_state = not current
+ else:
+ _cprint(" Usage: /timestamps [on|off|status]")
+ return
+
+ self.show_timestamps = new_state
+ if save_config_value("display.timestamps", new_state):
+ state = (
+ f"{_Colors.GREEN}ON{_Colors.RESET}" if new_state
+ else f"{_Colors.DIM}OFF{_Colors.RESET}"
+ )
+ _cprint(f" Message timestamps: {state}")
+ else:
+ _cprint(" Failed to save timestamps setting to config.yaml")
+
def _handle_reasoning_command(self, cmd: str):
"""Handle /reasoning — manage effort level and display toggle.
@@ -2125,6 +2202,8 @@ class CLICommandsMixin:
 /reasoning <level> Set reasoning effort (none, minimal, low, medium, high, xhigh)
/reasoning show|on Show model thinking/reasoning in output
/reasoning hide|off Hide model thinking/reasoning from output
+ /reasoning full Show complete thinking (no 10-line clamp)
+ /reasoning clamp Collapse long thinking to the first 10 lines
"""
from cli import _ACCENT, _DIM, _RST, _cprint, _parse_reasoning_config, save_config_value
parts = cmd.strip().split(maxsplit=1)
@@ -2139,9 +2218,10 @@ class CLICommandsMixin:
else:
level = rc.get("effort", "medium")
display_state = "on ✓" if self.show_reasoning else "off"
+ full_state = "full" if getattr(self, "reasoning_full", False) else "clamped to 10 lines"
_cprint(f" {_ACCENT}Reasoning effort: {level}{_RST}")
- _cprint(f" {_ACCENT}Reasoning display: {display_state}{_RST}")
- _cprint(f" {_DIM}Usage: /reasoning {_RST}")
+ _cprint(f" {_ACCENT}Reasoning display: {display_state} ({full_state}){_RST}")
+ _cprint(f" {_DIM}Usage: /reasoning {_RST}")
return
arg = parts[1].strip().lower()
@@ -2163,6 +2243,21 @@ class CLICommandsMixin:
_cprint(f" {_ACCENT}✓ Reasoning display: OFF (saved){_RST}")
return
+ # Full / clamped recap toggle
+ if arg in {"full", "all"}:
+ self.reasoning_full = True
+ save_config_value("display.reasoning_full", True)
+ _cprint(f" {_ACCENT}✓ Reasoning display: FULL (saved){_RST}")
+ _cprint(f" {_DIM} The post-response recap box will print complete thinking.{_RST}")
+ if not self.show_reasoning:
+ _cprint(f" {_DIM} Note: reasoning display is OFF — run /reasoning show to see it.{_RST}")
+ return
+ if arg in {"clamp", "collapse", "short"}:
+ self.reasoning_full = False
+ save_config_value("display.reasoning_full", False)
+ _cprint(f" {_ACCENT}✓ Reasoning display: CLAMPED to 10 lines (saved){_RST}")
+ return
+
# Effort level change
parsed = _parse_reasoning_config(arg)
if parsed is None:
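Reviewer note: the `_compose_in_editor` docstring says the read-back/strip logic was factored out so it can be unit-tested without spawning an editor. A minimal sketch of that step in isolation, using the same two lines the patch ends with. The function name `strip_editor_comments` is hypothetical, and the sample buffer is made up.

```python
def strip_editor_comments(raw: str) -> str:
    """Drop lines starting with '#!' and trim surrounding whitespace.

    Hypothetical standalone copy of the final two lines of _compose_in_editor.
    """
    lines = [ln for ln in raw.splitlines() if not ln.startswith("#!")]
    return "\n".join(lines).strip()


# Header lines use '#!' so ordinary markdown headings ('# ...') survive.
buffer = (
    "#! Compose your prompt below.\n"
    "#! Save and quit to send.\n"
    "\n"
    "# Heading\n"
    "Fix the bug\n"
)
composed = strip_editor_comments(buffer)
header_only = strip_editor_comments("#! only the header\n\n")
```

Using `#!` rather than a bare `#` as the comment marker is what keeps markdown headings in the composed prompt, and a buffer that contains only the header collapses to an empty string, which the handler treats as a cancel.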
diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py
index b7e19bdeebf..cf67efd2e36 100644
--- a/hermes_cli/commands.py
+++ b/hermes_cli/commands.py
@@ -78,6 +78,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("save", "Save the current conversation", "Session",
cli_only=True),
CommandDef("retry", "Retry the last message (resend to agent)", "Session"),
+ CommandDef("prompt", "Compose your next prompt in $EDITOR (markdown), then send it", "Session",
+ cli_only=True, args_hint="[initial text]", aliases=("compose",)),
CommandDef("undo", "Back up N user turns and re-prompt (default 1)", "Session",
args_hint="[N]"),
CommandDef("title", "Set a title for the current session", "Session",
@@ -123,18 +125,19 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
- CommandDef("model", "Switch model for this session", "Configuration",
- args_hint="[model] [--provider name] [--global] [--refresh]"),
+ CommandDef("model", "Switch model (persists by default)", "Configuration",
+ args_hint="[model] [--provider name] [--global|--session] [--refresh]"),
CommandDef("codex-runtime", "Toggle codex app-server runtime for OpenAI/Codex models",
"Configuration", aliases=("codex_runtime",),
args_hint="[auto|codex_app_server]"),
- CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
- cli_only=True),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
CommandDef("statusbar", "Toggle the context/model status bar", "Configuration",
cli_only=True, aliases=("sb",)),
+ CommandDef("timestamps", "Toggle [HH:MM] timestamps on messages and /history", "Configuration",
+ cli_only=True, args_hint="[on|off|status]",
+ subcommands=("on", "off", "status"), aliases=("ts",)),
CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
"Configuration", cli_only=True,
gateway_config_gate="display.tool_progress_command"),
@@ -144,8 +147,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("yolo", "Toggle YOLO mode (skip all dangerous command approvals)",
"Configuration"),
CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
- args_hint="[level|show|hide]",
- subcommands=("none", "minimal", "low", "medium", "high", "xhigh", "show", "hide", "on", "off")),
+ args_hint="[level|show|hide|full|clamp]",
+ subcommands=("none", "minimal", "low", "medium", "high", "xhigh", "show", "hide", "on", "off", "full", "clamp")),
CommandDef("fast", "Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode (Normal/Fast)", "Configuration",
args_hint="[normal|fast|status]",
subcommands=("normal", "fast", "status", "on", "off")),
@@ -217,7 +220,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
gateway_only=True),
CommandDef("usage", "Show token usage and rate limits for the current session", "Info"),
CommandDef("credits", "Show Nous credit balance and top up", "Info"),
- CommandDef("billing", "Manage Nous terminal billing — buy credits, auto-reload, limits", "Info"),
+ CommandDef("billing", "Manage Nous terminal billing — buy credits, auto-reload, limits", "Info",
+ cli_only=True),
CommandDef("insights", "Show usage insights and analytics", "Info",
args_hint="[days]"),
CommandDef("platforms", "Show gateway/messaging platform status", "Info",
diff --git a/hermes_cli/config.py b/hermes_cli/config.py
index a557899ae98..f688b565cdd 100644
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -169,8 +169,8 @@ _ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# the dashboard. ``config.yaml`` is the supported surface for these.
#
# IMPORTANT: ``HERMES_*`` overall is NOT blocked. Many legitimate
-# integration credentials follow that prefix (HERMES_GEMINI_CLIENT_ID,
-# HERMES_LANGFUSE_PUBLIC_KEY, HERMES_SPOTIFY_CLIENT_ID, ...). The
+# integration credentials follow that prefix (HERMES_LANGFUSE_PUBLIC_KEY,
+# HERMES_SPOTIFY_CLIENT_ID, ...). The
# denylist is name-by-name on purpose so the gate stays narrow and
# doesn't accidentally break provider setup wizards.
#
@@ -223,7 +223,10 @@ _LAST_EXPANDED_CONFIG_BY_PATH: Dict[str, Any] = {}
# save_config() + migrate_config() write via atomic_yaml_write which
# produces a fresh inode, so stat() sees a new mtime_ns and the next
# load repopulates automatically — no explicit invalidation hook.
-_LOAD_CONFIG_CACHE: Dict[str, Tuple[int, int, Dict[str, Any]]] = {}
+# Cached tuple is (user_mtime_ns, user_size, managed_mtime_ns, managed_size,
+# merged_value) — the managed-file signature is folded in so editing the
+# managed-scope config.yaml invalidates the cache (see managed_scope).
+_LOAD_CONFIG_CACHE: Dict[str, Tuple[int, int, int, int, Dict[str, Any]]] = {}
# (path, mtime_ns, size) -> cached raw yaml dict. Same pattern as
# _LOAD_CONFIG_CACHE but for read_raw_config() — used when callers want
# the user's on-disk values without defaults merged in.
@@ -1018,6 +1021,12 @@ DEFAULT_CONFIG = {
"modal_mode": "auto",
"cwd": ".", # Use current directory
"timeout": 180,
+ # Bounded grace period (seconds) between SIGTERM and an escalated
+ # SIGKILL when terminating a host process tree (browser daemons, etc.).
+ # A daemon that stalls in its SIGTERM handler is force-killed after this
+ # window so it can't leak indefinitely. 0 disables escalation (SIGTERM
+ # only — the historical behavior). Floored internally at 0.
+ "daemon_term_grace_seconds": 2.0,
# Environment variables to pass through to sandboxed execution
# (terminal and execute_code). Skill-declared required_environment_variables
# are passed through automatically; this list is for non-skill use cases.
@@ -1198,6 +1207,21 @@ DEFAULT_CONFIG = {
# 100K chars ≈ 25–35K tokens across typical tokenisers.
"file_read_max_chars": 100_000,
+ # Seconds to wait at agent-build time for in-flight MCP server discovery
+ # to finish before the agent snapshots its tool list. MCP discovery runs
+ # in a background thread so a slow/dead server can't freeze startup; this
+ # bounds how long the first agent build blocks on it. The wait returns
+ # the INSTANT discovery completes, so users with no MCP servers (the common
+ # case) or fast servers pay ~0s regardless of this value — the bound is
+ # only reached when a server is genuinely still connecting. The old 0.75s
+ # default was a touch short for HTTP/OAuth servers on a cold connect; a
+ # modest bump lets more of them land in the FIRST turn's snapshot. This is
+ # only a turn-1 latency/UX knob: a server that misses this window is still
+ # picked up automatically on the next turn by the between-turns refresh
+ # (see agent/turn_context.py), so correctness never depends on it. Keep it
+ # small so a slow/dead server adds little to first-response latency.
+ "mcp_discovery_timeout": 1.5,
+
# Tool-output truncation thresholds. When terminal output or a
# single read_file page exceeds these limits, Hermes truncates the
# payload sent to the model (keeping head + tail for terminal,
@@ -1241,7 +1265,7 @@ DEFAULT_CONFIG = {
"threshold": 0.50, # compress when context usage exceeds this ratio
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
- "hygiene_hard_message_limit": 400, # gateway session-hygiene force-compress threshold by message count
+ "hygiene_hard_message_limit": 5000, # gateway session-hygiene force-compress threshold by message count
"protect_first_n": 3, # non-system head messages always preserved
# verbatim, in ADDITION to the system prompt
# (which is always implicitly protected). Set to
@@ -1269,6 +1293,22 @@ DEFAULT_CONFIG = {
# exact route is affected — gpt-5.5 on OpenAI's
# direct API, OpenRouter, and Copilot keep the
# global threshold regardless.
+ "in_place": False, # When True, compaction rewrites the message
+ # list and rebuilds the system prompt WITHOUT
+ # rotating the session id — the conversation
+ # keeps one durable id for its whole life
+ # (no parent_session_id chain, no `name #N`
+ # renumbering). Eliminates the session-rotation
+ # bug cluster (#33618 /goal loss, #14238 lost
+ # response, #33907 orphans, #45117 search gaps,
+ # #42228 null cwd) — see #38763. Non-destructive:
+ # the live context is compacted (lossy for what
+ # the model reloads), but the pre-compaction
+ # turns are soft-archived under the same id
+ # (active=0, compacted=1) — still searchable via
+ # session_search and recoverable, not deleted.
+ # Default False during rollout; will flip on
+ # after live validation.
},
# Kanban subsystem (orchestrator workers + dispatcher-driven child tasks).
@@ -1420,6 +1460,7 @@ DEFAULT_CONFIG = {
"api_key": "",
"timeout": 30,
"extra_body": {},
+ "language": "",
},
"tts_audio_tags": {
"provider": "auto",
@@ -1532,6 +1573,10 @@ DEFAULT_CONFIG = {
"tui_agents_nudge": True,
"bell_on_complete": False,
"show_reasoning": False,
+ # When reasoning display is on, the post-response "Reasoning" recap box
+ # collapses long thinking to the first 10 lines. Set true to print the
+ # complete thinking text uncollapsed (live streaming is always full).
+ "reasoning_full": False,
# Background self-improvement review notifications surfaced in chat.
# "off" — no chat notification (the review still runs and writes)
# "on" — generic "💾 Memory updated" line (default)
@@ -1581,6 +1626,14 @@ DEFAULT_CONFIG = {
# TUI busy indicator style: kaomoji (default), emoji, unicode (braille
# spinner), or ascii. Live-swappable via `/indicator