Merge remote-tracking branch 'origin/main' into bb/pets

Brooklyn Nicholson 2026-06-22 05:25:49 -05:00
commit 5342eccf12
823 changed files with 58322 additions and 13772 deletions


@ -102,6 +102,3 @@ acp_registry/
.gitattributes
.hadolint.yaml
.mailmap
-# Top-level LICENSE (not matched by *.md); not needed inside the container
-LICENSE


@ -105,6 +105,7 @@
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
# HF_TOKEN=
# HF_BASE_URL=https://router.huggingface.co/v1 # Override default base URL
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
@ -411,6 +412,9 @@ IMAGE_TOOLS_DEBUG=false
# Groq API key (free tier — used for Whisper STT in voice mode)
# GROQ_API_KEY=
+# ElevenLabs API key (cloud STT/TTS — Scribe transcription)
+# ELEVENLABS_API_KEY=
# =============================================================================
# STT PROVIDER SELECTION
# =============================================================================


@ -954,9 +954,10 @@ Enable/disable per platform via `hermes tools` (the curses UI) or the
## Delegation (`delegate_task`)
`tools/delegate_tool.py` spawns a subagent with an isolated
-context + terminal session. Synchronous: the parent waits for the
-child's summary before continuing its own loop — if the parent is
-interrupted, the child is cancelled.
+context + terminal session. By default the parent waits for the
+child's summary before continuing its own loop. With `background=true`,
+Hermes returns a delegation id immediately and the result re-enters the
+conversation later through the async-delegation completion queue.
Two shapes:
@ -978,9 +979,9 @@ Key config knobs (under `delegation:` in `config.yaml`):
`orchestrator_enabled`, `subagent_auto_approve`, `inherit_mcp_toolsets`,
`max_iterations`.
-Synchronicity rule: delegate_task is **not** durable. For long-running
-work that must outlive the current turn, use `cronjob` or
-`terminal(background=True, notify_on_complete=True)` instead.
+Durability rule: background `delegate_task` is detached from the current
+turn but still process-local. For work that must survive process restart, use
+`cronjob` or `terminal(background=True, notify_on_complete=True)` instead.
---
@ -1174,7 +1175,7 @@ automatically scope to the active profile.
a unique credential (bot token, API key), call `acquire_scoped_lock()` from
`gateway.status` in the `connect()`/`start()` method and `release_scoped_lock()` in
`disconnect()`/`stop()`. This prevents two profiles from using the same credential.
-See `gateway/platforms/telegram.py` for the canonical pattern.
+See `plugins/platforms/irc/adapter.py` for the canonical pattern.
6. **Profile operations are HOME-anchored, not HERMES_HOME-anchored**: `_get_profiles_root()`
returns `Path.home() / ".hermes" / "profiles"`, NOT `get_hermes_home() / "profiles"`.

CONTRIBUTING.es.md (new file, 602 lines)

@ -0,0 +1,602 @@
# Contributing to Hermes Agent
Thank you for contributing to Hermes Agent! This guide covers everything you need: setting up your development environment, understanding the architecture, deciding what to build, and getting your PR merged.
---
## Contribution Priorities
We value contributions in this order:
1. **Bug fixes**: crashes, incorrect behavior, data loss. Always the top priority.
2. **Cross-platform compatibility**: macOS, different Linux distributions, and WSL2 on Windows. We want Hermes to work everywhere.
3. **Security hardening**: shell injection, prompt injection, path traversal, privilege escalation. See [Security Considerations](#security-considerations).
4. **Performance and robustness**: retry logic, error handling, graceful degradation.
5. **New skills**: but only broadly useful ones. See [Should it be a Skill or a Tool?](#should-it-be-a-skill-or-a-tool)
6. **New tools**: rarely needed. Most capabilities should be skills. See below.
7. **Documentation**: fixes, clarifications, new examples.
---
## Should it be a Skill or a Tool?
This is the most common question for new contributors. The answer is almost always **skill**.
### Make it a Skill when:
- The capability can be expressed as instructions + shell commands + existing tools
- It wraps an external CLI or API that the agent can call via `terminal` or `web_extract`
- It doesn't need custom Python integration or API key management built into the agent
- Examples: arXiv search, git workflows, Docker management, PDF processing, email via CLI tools
### Make it a Tool when:
- It requires end-to-end integration with API keys, auth flows, or multi-component configuration managed by the agent harness
- It needs custom processing logic that must execute precisely every time (not "best effort" from LLM interpretation)
- It handles binary data, streaming, or real-time events that can't go through the terminal
- Examples: browser automation (Browserbase session management), TTS (audio encoding + platform delivery), vision analysis (base64 image handling)
### Should the Skill be bundled?
Bundled skills (in `skills/`) ship with every Hermes install. They should be **broadly useful to most users**:
- Document handling, web research, common development workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`**: it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, built-in trust).
If your skill is specialized, community-contributed, or niche, it's better suited to a **Skills Hub**: upload it to a skills registry and share it in the [Nous Research Discord](https://discord.gg/NousResearch). Users can install it with `hermes skills install`.
---
## Memory Providers: Publish as a Standalone Plugin
**We no longer accept new memory providers in this repository.** The set of built-in providers in `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repository** that users install into `~/.hermes/plugins/` (or via a pip entry point).
Standalone memory plugins:
- Implement the same `MemoryProvider` ABC (`agent/memory_provider.py`): `sync_turn`, `prefetch`, `shutdown`, and optionally `post_setup(hermes_home, config)` for setup-wizard integration
- Use the same discovery system: `discover_memory_providers()` picks them up from user/project plugin directories and pip entry points
- Integrate with `hermes memory setup` via `post_setup()`, with no need to touch the core codebase
- Can register their own CLI subcommands via `register_cli(subparser)` in a `cli.py` file
- Get all the same lifecycle hooks and config plumbing as the in-tree providers
PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repository. Existing in-tree providers stay; bug fixes for them are welcome.
This is not a quality bar; it's a coupling and maintenance decision. Memory providers are the most common plugin type and shouldn't all live in this tree.
---
## Development Setup
### Prerequisites
| Requirement | Notes |
|-------------|-------|
| **Git** | With the `git-lfs` extension installed |
| **Python 3.11+** | uv will install it if missing |
| **uv** | Fast Python package manager ([install](https://docs.astral.sh/uv/)) |
| **Node.js 20+** | Optional: needed for browser tools and the WhatsApp bridge (matches the root `package.json` engines) |
### Clone and install
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Create a venv with Python 3.11
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
# Optional: browser tools
npm install
```
### Configure for development
```bash
mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
cp cli-config.yaml.example ~/.hermes/config.yaml
touch ~/.hermes/.env
# Add at least one LLM provider key:
echo "OPENROUTER_API_KEY=***" >> ~/.hermes/.env
```
### Run
```bash
# Symlink for global access
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
# Verify
hermes doctor
hermes chat -q "Hello"
```
### Run tests
```bash
# Preferred: matches CI (hermetic env, 4 xdist workers); see AGENTS.md
scripts/run_tests.sh
# Alternative (activate the venv first). The wrapper is still recommended
# for parity with GitHub Actions before opening a PR:
pytest tests/ -v
```
---
## Project Structure
```
hermes-agent/
├── run_agent.py # AIAgent class: core conversation loop, tool dispatch, session persistence
├── cli.py # HermesCLI class: interactive TUI, prompt_toolkit integration
├── model_tools.py # Tool orchestration (thin layer over tools/registry.py)
├── toolsets.py # Tool groupings and presets (hermes-cli, hermes-telegram, etc.)
├── hermes_state.py # SQLite session database with FTS5 full-text search, session titles
├── batch_runner.py # Parallel batch processing for trajectory generation
├── agent/ # Agent internals (extracted modules)
│   ├── prompt_builder.py # System prompt assembly (identity, skills, context files, memory)
│   ├── context_compressor.py # Auto-summarization when approaching context limits
│   ├── auxiliary_client.py # Resolves auxiliary OpenAI clients (summarization, vision)
│   ├── display.py # KawaiiSpinner, tool progress formatting
│   ├── model_metadata.py # Model context lengths, token estimation
│   └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI command implementations
│   ├── main.py # Entry point, argument parsing, command dispatch
│   ├── config.py # Config management, migration, env var definitions
│   ├── setup.py # Interactive setup wizard
│   ├── auth.py # Provider resolution, OAuth, Nous Portal
│   ├── models.py # OpenRouter model selection lists
│   ├── banner.py # Welcome banner, ASCII art
│   ├── commands.py # Central slash command registry (CommandDef), autocomplete, gateway helpers
│   ├── callbacks.py # Interactive callbacks (clarify, sudo, approval)
│   ├── doctor.py # Diagnostics
│   ├── skills_hub.py # Skills Hub CLI + /skills slash command
│   └── skin_engine.py # Skin/theme engine: data-driven CLI visual customization
├── tools/ # Tool implementations (self-registering)
│   ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│   ├── approval.py # Dangerous command detection + per-session approval
│   ├── terminal_tool.py # Terminal orchestration (sudo, environment lifecycle, backends)
│   ├── file_operations.py # read_file, write_file, search, patch, etc.
│   ├── web_tools.py # web_search, web_extract (Parallel/Firecrawl + Gemini summarization)
│   ├── vision_tools.py # Image analysis via multimodal models
│   ├── delegate_tool.py # Subagent spawning and parallel task execution
│   ├── code_execution_tool.py # Sandboxed Python with RPC tool access
│   ├── session_search_tool.py # Search past conversations with FTS5 + anchored windows
│   ├── cronjob_tools.py # Scheduled task management
│   ├── skill_tools.py # Skill search, loading, and management
│   └── environments/ # Terminal execution backends
│       ├── base.py # BaseEnvironment ABC
│       ├── local.py, docker.py, ssh.py, singularity.py, modal.py, daytona.py
├── gateway/ # Messaging gateway
│   ├── run.py # GatewayRunner: platform lifecycle, message routing, cron
│   ├── config.py # Platform config resolution
│   ├── session.py # Session store, context prompts, reset policies
│   └── platforms/ # Platform adapters
│       ├── telegram.py, discord_adapter.py, slack.py, whatsapp.py
├── scripts/ # Installer and bridge scripts
│   ├── install.sh # Linux/macOS installer
│   ├── install.ps1 # Windows PowerShell installer
│   └── whatsapp-bridge/ # Node.js WhatsApp bridge (Baileys)
├── skills/ # Bundled skills (copied to ~/.hermes/skills/ on install)
├── optional-skills/ # Official optional skills (discoverable via hub, not activated by default)
├── tests/ # Test suite
├── website/ # Documentation site (hermes-agent.nousresearch.com)
├── cli-config.yaml.example # Example configuration (copied to ~/.hermes/config.yaml)
└── AGENTS.md # Development guide for AI coding assistants
```
### User configuration (stored in `~/.hermes/`)
| Path | Purpose |
|------|---------|
| `~/.hermes/config.yaml` | Settings (model, terminal, toolsets, compression, etc.) |
| `~/.hermes/.env` | API keys and secrets |
| `~/.hermes/auth.json` | OAuth credentials (Nous Portal) |
| `~/.hermes/skills/` | All active skills (bundled + hub-installed + agent-created) |
| `~/.hermes/memories/` | Persistent memory (MEMORY.md, USER.md) |
| `~/.hermes/state.db` | SQLite session database |
| `~/.hermes/sessions/` | Gateway routing index (`sessions.json`), request breadcrumbs, gateway `*.jsonl` transcripts, and (optionally) per-session JSON snapshots when `sessions.write_json_snapshots: true` is set. Per-session snapshots are off by default; state.db is canonical. |
| `~/.hermes/cron/` | Scheduled job data |
| `~/.hermes/whatsapp/session/` | WhatsApp bridge credentials |
---
## Architecture Overview
### Core Loop
```
User message → AIAgent._run_agent_loop()
  ├── Build system prompt (prompt_builder.py)
  ├── Build API kwargs (model, messages, tools, reasoning config)
  ├── Call LLM (OpenAI-compatible API)
  ├── If tool_calls in response:
  │     ├── Execute each tool via registry dispatch
  │     ├── Add tool results to conversation
  │     └── Loop back to LLM call
  ├── If text response:
  │     ├── Persist session to DB
  │     └── Return final_response
  └── Context compression if approaching token limit
```
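
The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code in `run_agent.py`: `TOOLS`, `fake_llm`, and `run_agent_loop` are hypothetical names, the model is replaced by a stub that asks for one tool call and then answers, and persistence and context compression are left out.

```python
import json
from typing import Callable

# Hypothetical registry: tool name -> handler taking a dict of arguments.
TOOLS: dict[str, Callable[[dict], str]] = {
    "add": lambda args: json.dumps({"sum": args["a"] + args["b"]}),
}


def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the model call: requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "call_1", "name": "add",
                                "arguments": {"a": 2, "b": 3}}]}
    return {"role": "assistant", "content": "The sum is 5.", "tool_calls": []}


def run_agent_loop(user_message: str, llm=fake_llm, max_iterations: int = 8) -> str:
    messages = [{"role": "system", "content": "You are a helpful agent."},
                {"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        reply = llm(messages)
        messages.append(reply)
        if reply["tool_calls"]:
            # Execute each requested tool and append its result to the conversation.
            for call in reply["tool_calls"]:
                result = TOOLS[call["name"]](call["arguments"])
                messages.append({"role": "tool", "tool_call_id": call["id"],
                                 "content": result})
            continue  # loop back to the LLM call
        return reply["content"]  # text response ends the loop
    raise RuntimeError("max_iterations reached without a final response")
```

Calling `run_agent_loop("What is 2 + 3?")` walks one tool round-trip before the stub returns its text answer.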
### Key Design Patterns
- **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
- **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles.
- **Ephemeral injection**: System prompts and prefill messages are injected at API call time and are never persisted to the database or logs.
- **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at initialization time.
- **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection.
---
## Code Style
- **PEP 8** with practical exceptions (we don't enforce a strict line length)
- **Comments**: Only when explaining non-obvious intent, trade-offs, or API quirks. Don't narrate what the code does
- **Error handling**: Catch specific exceptions. Log with `logger.warning()`/`logger.error()`, and use `exc_info=True` for unexpected errors
- **Cross-platform**: Never assume Unix. See [Cross-Platform Compatibility](#cross-platform-compatibility)
---
## Adding a New Tool
Before writing a tool, ask yourself: [should this be a skill instead?](#should-it-be-a-skill-or-a-tool)
Tools self-register with the central registry. Each tool file co-locates its schema, handler, and registration:
```python
"""my_tool: Brief description of what this tool does."""
import json

from tools.registry import registry


def my_tool(param1: str, param2: int = 10, **kwargs) -> str:
    """Handler. Returns a string result (often JSON)."""
    # do_work is a placeholder for the tool's actual logic.
    result = do_work(param1, param2)
    return json.dumps(result)


MY_TOOL_SCHEMA = {
    "type": "function",
    "function": {
        "name": "my_tool",
        "description": "What this tool does and when the agent should use it.",
        "parameters": {
            "type": "object",
            "properties": {
                "param1": {"type": "string", "description": "What param1 is"},
                "param2": {"type": "integer", "description": "What param2 is", "default": 10},
            },
            "required": ["param1"],
        },
    },
}


def _check_requirements() -> bool:
    """Return True if this tool's dependencies are available."""
    return True


registry.register(
    name="my_tool",
    toolset="my_toolset",
    schema=MY_TOOL_SCHEMA,
    handler=lambda args, **kw: my_tool(**args, **kw),
    check_fn=_check_requirements,
)
```
**Wire into a toolset (required):** Built-in tools are auto-discovered: any
`tools/*.py` file that contains a top-level `registry.register(...)` call is
imported by `discover_builtin_tools()` in `tools/registry.py` when `model_tools`
loads. There is **no** manual import list in `model_tools.py` to maintain.
You still must add the tool's name to the appropriate list in `toolsets.py`
(for example `_HERMES_CORE_TOOLS` or a dedicated toolset); otherwise the tool
is registered but never exposed to the agent.
See `AGENTS.md` (section **Adding New Tools**) for profile-aware paths and
guidance on plugins vs. core.
---
## Adding a Skill
Bundled skills live in `skills/`, organized by category. Official optional skills use the same structure in `optional-skills/`:
```
skills/
├── research/
│   └── arxiv/
│       ├── SKILL.md # Required: main instructions
│       └── scripts/ # Optional: helper scripts
│           └── search_arxiv.py
├── productivity/
│   └── ocr-and-documents/
│       ├── SKILL.md
│       ├── scripts/
│       └── references/
└── ...
```
### SKILL.md format
```markdown
---
name: my-skill
description: Brief description (shown in skill search results)
version: 1.0.0
author: Your Name
license: MIT
platforms: [macos, linux] # Optional: restrict to specific OS platforms
required_environment_variables: # Optional: secure setup-on-load metadata
  - name: MY_API_KEY
    prompt: API key
    help: Where to get it
    required_for: full functionality
prerequisites: # Optional legacy runtime requirements
  env_vars: [MY_API_KEY]
  commands: [curl, jq]
metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
    fallback_for_toolsets: [web]
    requires_toolsets: [terminal]
---
# Skill Title
Brief intro.
## When to Use
Trigger conditions: when should the agent load this skill?
## Quick Reference
Table of common commands or API calls.
## Procedure
Step-by-step instructions the agent follows.
## Pitfalls
Known failure modes and how to handle them.
## Verification
How the agent confirms it worked.
```
### Skill authoring standards (MANDATORY)
Every new or modernized skill (bundled, optional, or contributed) must meet these standards before merge:
1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions clutter the skill listing UI. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced").
2. **Tools referenced in the SKILL.md body must be native Hermes tools or MCP servers the skill explicitly expects.** Write tool names in backticks: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, etc.
3. **The `platforms:` field is audited against the script's actual imports.** Skills that use POSIX-only primitives must declare their supported platforms.
4. **`author` credits the human contributor first.**
5. **The SKILL.md body uses the modern section order:** title, 2-3 sentence intro, then: `## When to Use`, `## Prerequisites`, `## How to Run`, `## Quick Reference`, `## Procedure`, `## Pitfalls`, `## Verification`.
6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.**
7. **Tests live in `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls.
8. **Additions to `.env.example` are isolated in a clearly delimited block.**
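
Standard 1 is mechanical enough to check before opening a PR. The sketch below is a hypothetical helper, not part of the repository: `check_description` and its constants are illustrative names, the word list uses English forms of the marketing words named in the standard, and the single-sentence test is a rough heuristic.

```python
import re

MAX_DESCRIPTION_LENGTH = 60
# English forms of the marketing words named in standard 1.
MARKETING_WORDS = {"powerful", "comprehensive", "seamless", "advanced"}


def check_description(description: str) -> list[str]:
    """Return a list of problems; an empty list means the description passes."""
    problems = []
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"too long: {len(description)} > {MAX_DESCRIPTION_LENGTH} characters"
        )
    if not description.endswith("."):
        problems.append("must end with a period")
    # Heuristic: sentence-ending punctuation followed by more text means 2+ sentences.
    if re.search(r"[.!?]\s+\S", description):
        problems.append("must be a single sentence")
    words = set(re.findall(r"[a-z]+", description.lower()))
    for word in sorted(words & MARKETING_WORDS):
        problems.append(f"marketing word: {word}")
    return problems
```

For example, `check_description("Search arXiv for papers by topic or author.")` returns an empty list, while a description containing "powerful" is flagged.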
---
## Adding a Skin / Theme
Hermes uses a data-driven skin system, so no code changes are needed to add a new skin.
**Option A: User skin (YAML file)**
Create `~/.hermes/skins/<name>.yaml`:
```yaml
name: mytheme
description: Short description of the theme
colors:
  banner_border: "#HEX"
  banner_title: "#HEX"
  banner_accent: "#HEX"
  banner_dim: "#HEX"
  banner_text: "#HEX"
  response_border: "#HEX"
spinner:
  waiting_faces: ["(⚔)", "(⛨)"]
  thinking_faces: ["(⚔)", "(⌁)"]
  thinking_verbs: ["forging", "plotting"]
branding:
  agent_name: "My Agent"
  welcome: "Welcome message"
  response_label: " ⚔ Agent "
  prompt_symbol: "⚔"
tool_prefix: "╎"
```
All fields are optional; missing values are inherited from the default skin.
**Option B: Built-in skin**
Add to the `_BUILTIN_SKINS` dict in `hermes_cli/skin_engine.py`. Use the same schema as above, but as a Python dict.
**Activate:**
- CLI: `/skin mytheme` or set `display.skin: mytheme` in config.yaml
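
The fallback rule (missing values are inherited from the default skin) amounts to a recursive dictionary merge. The sketch below shows one way that could work; it is not the code in `hermes_cli/skin_engine.py`, and `DEFAULT_SKIN`, its values, and `resolve_skin` are hypothetical.

```python
from copy import deepcopy

# Hypothetical default skin; the real defaults live in the skin engine.
DEFAULT_SKIN = {
    "name": "default",
    "colors": {"banner_border": "#CD7F32", "banner_title": "#FFD700"},
    "branding": {"agent_name": "Hermes Agent", "prompt_symbol": ">"},
}


def resolve_skin(user_skin: dict, default: dict = DEFAULT_SKIN) -> dict:
    """Overlay user values on the default; nested sections merge key by key."""
    merged = deepcopy(default)
    for key, value in user_skin.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_skin(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged
```

A user skin that sets only `colors.banner_title` keeps every other color and the whole `branding` section from the default, and the default dict itself is never mutated.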
---
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and native Windows (plus WSL2). When writing code
that touches the OS, assume *any* platform can reach your code path.
> **Before you open a PR:** run `scripts/check-windows-footguns.py` to catch
> common Windows-unsafe patterns in your diff. It's grep-based and cheap;
> CI also runs it on every PR.
### Critical rules
1. **Never call `os.kill(pid, 0)` for liveness checks.** On Windows it is **NOT a no-op**: it terminates the target process. Use `psutil.pid_exists(pid)` instead.
2. **Use `shutil.which()` before shelling out, and don't assume Windows has the tools Linux has.** `ps`, `kill`, `grep`, `awk`, etc. simply don't exist on Windows.
3. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError` and `NotImplementedError`.
4. **File encoding.** Windows may save `.env` files as `cp1252`. Always handle encoding errors.
5. **Process management.** `os.setsid()`, `os.killpg()`, `os.fork()`, `os.getuid()`, and POSIX signal handling are unavailable or behave differently on Windows.
6. **Signals that don't exist on Windows:** `SIGALRM`, `SIGCHLD`, `SIGHUP`, `SIGUSR1`, `SIGUSR2`, etc.
7. **Path separators.** Use `pathlib.Path` instead of string concatenation with `/`.
8. **Symlinks need elevated privileges on Windows** (unless Developer Mode is enabled).
9. **POSIX file modes (0o600, 0o644, etc.) are NOT enforced on NTFS** by default.
10. **Detached background daemons on Windows need `pythonw.exe`, NOT `python.exe`.**
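
Rules 2 to 4 translate into a few stdlib patterns. The following is a minimal sketch with hypothetical helper names (`find_tool`, `decode_env_bytes`); it only illustrates the shape of the guards and is not taken from the Hermes codebase.

```python
import shutil
from typing import Optional

# Rule 3: termios is Unix-only, so guard the import and fall back to None.
try:
    import termios
except (ImportError, NotImplementedError):
    termios = None  # callers must check for None before using it


def find_tool(name: str) -> Optional[str]:
    """Rule 2: return the executable's full path, or None if it is not on PATH."""
    return shutil.which(name)


def decode_env_bytes(raw: bytes) -> str:
    """Rule 4: try UTF-8 first, then fall back to cp1252 without raising."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")
```

Code that needs `grep` or `ps` would call `find_tool()` first and take a pure-Python path when it returns `None`.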
---
## Security Considerations
Hermes has terminal access. Security matters.
### Existing protections
| Layer | Implementation |
|-------|----------------|
| **Sudo password piping** | Uses `shlex.quote()` to prevent shell injection |
| **Dangerous command detection** | Regex patterns in `tools/approval.py` with a user approval flow |
| **Cron prompt injection** | Scanner in `tools/cronjob_tools.py` blocks instruction-override patterns |
| **Write deny list** | Protected paths resolved via `os.path.realpath()` to prevent symlink bypass |
| **Skills Guard** | Security scanner for hub-installed skills (`tools/skills_guard.py`) |
| **Code execution sandbox** | The `execute_code` child process runs with API keys stripped from the environment |
| **Container hardening** | Docker: all capabilities dropped, no privilege escalation, PID limits, size-limited tmpfs |
### When contributing security-sensitive code
- **Always use `shlex.quote()`** when interpolating user input into shell commands
- **Resolve symlinks** with `os.path.realpath()` before path-based access control checks
- **Don't log secrets.** API keys, tokens, and passwords must never appear in log output
- **Catch broad exceptions** around tool execution so a single failure doesn't crash the agent loop
- **Test on all platforms** if your change touches file paths, process management, or shell commands
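
The first two guidelines can be illustrated with stdlib calls only. This is a sketch under stated assumptions: `build_grep_command` and `is_write_allowed` are hypothetical helpers written for this example, not functions from `tools/approval.py` or the write deny list.

```python
import os
import shlex


def build_grep_command(pattern: str, filename: str) -> str:
    """Quote both user-supplied values before they reach the shell."""
    return f"grep -n -- {shlex.quote(pattern)} {shlex.quote(filename)}"


def is_write_allowed(path: str, protected_dirs: list[str]) -> bool:
    """Resolve symlinks and '..' first, then compare against protected roots."""
    real = os.path.realpath(path)
    for protected in protected_dirs:
        root = os.path.realpath(protected)
        try:
            if os.path.commonpath([real, root]) == root:
                return False
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            continue
    return True
```

Because the check runs on the resolved path, a request such as `/srv/protected/../protected/secrets.txt` is still recognized as living under `/srv/protected`.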
### Dependency pinning policy (supply-chain hardening)
Following the [litellm supply-chain compromise](https://github.com/BerriAI/litellm/issues/24512) in March 2026 and the [Mini Shai-Hulud worm campaign](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) in May 2026, all dependencies must follow these rules:
| Source type | Required treatment | Rationale |
|---|---|---|
| **PyPI package** | `>=floor,<next_major` | PyPI releases are immutable once published, but new versions can be pushed within your range. |
| **Git URL** | Full commit SHA | Branches and tags are mutable refs; the SHA is content-addressed. |
| **GitHub Actions** | Full commit SHA + version comment | Action tags are mutable refs. Pin as `uses: owner/action@<sha> # vX.Y.Z` |
| **CI-only pip installs** | `==exact` | Hermetic CI builds; churn is acceptable. |
**Every new PyPI dependency in a PR must have a `<next_major` upper bound.** PRs that add `>=X.Y.Z` specifiers without an upper bound will be rejected.
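
A reviewer could pre-screen a diff for the two most common violations with a pair of regular expressions. The sketch below is a hypothetical check, not a script that ships with the repository; it does not parse full PEP 508 requirement syntax, and the package name and the 40-character SHA in the usage note are placeholders.

```python
import re

_FLOOR = re.compile(r">=\s*\d+(?:\.\d+)*")
_CEILING = re.compile(r"<\s*\d+(?:\.\d+)*")  # strict upper bound, e.g. "<3"
_SHA_PINNED = re.compile(r"uses:\s*[\w.-]+/[\w./-]+@[0-9a-f]{40}\s+#\s*v\d")


def has_bounded_range(requirement: str) -> bool:
    """True if the requirement has both a >= floor and a < ceiling."""
    spec = requirement.split(";", 1)[0]  # ignore environment markers
    return bool(_FLOOR.search(spec)) and bool(_CEILING.search(spec))


def is_sha_pinned(uses_line: str) -> bool:
    """True if a workflow `uses:` line pins a full SHA with a version comment."""
    return bool(_SHA_PINNED.search(uses_line))
```

Under this check, `requests>=2.31,<3` passes and `requests>=2.31` fails; a `uses:` line ending in `@v4` fails because a tag is a mutable ref.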
---
## Pull Request Process
### Branch naming
```
fix/description # Bug fixes
feat/description # New features
docs/description # Documentation
test/description # Tests
refactor/description # Code restructuring
```
### Before submitting
1. **Run tests**: `scripts/run_tests.sh` (recommended; same as CI) or `pytest tests/ -v` with the project venv activated
2. **Test manually**: Run `hermes` and exercise the code path you changed
3. **Check cross-platform impact**: If you touch file I/O, process management, or terminal handling, consider macOS, Linux, and WSL2
4. **Keep PRs focused**: One logical change per PR. Don't mix a bug fix with a refactor and a new feature.
### PR description
Include:
- **What** changed and **why**
- **How to test it** (reproduction steps for bugs, usage examples for features)
- **Which platforms** you tested on
- References to any related issues
### Commit messages
We use [Conventional Commits](https://www.conventionalcommits.org/):
```
<type>(<scope>): <description>
```
| Type | Use for |
|------|---------|
| `fix` | Bug fixes |
| `feat` | New features |
| `docs` | Documentation |
| `test` | Tests |
| `refactor` | Code restructuring (no behavior change) |
| `chore` | Build, CI, dependency updates |
Scopes: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`, etc.
Examples:
```
fix(cli): prevent crash in save_config_value when model is a string
feat(gateway): add WhatsApp multi-user session isolation
fix(security): prevent shell injection in sudo password piping
test(tools): add unit tests for file_operations
```
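
The subject-line format above is easy to validate locally. The sketch below is a hypothetical helper (`parse_subject`), not a hook that ships with the repository. It assumes the scope is required, as in the format line above, although the Conventional Commits specification itself makes the scope optional.

```python
import re
from typing import Optional

COMMIT_TYPES = ("fix", "feat", "docs", "test", "refactor", "chore")
_SUBJECT = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"\((?P<scope>[a-z0-9_-]+)\): (?P<description>\S.*)$"
)


def parse_subject(subject: str) -> Optional[dict]:
    """Split '<type>(<scope>): <description>' into parts, or return None."""
    match = _SUBJECT.match(subject)
    return match.groupdict() if match else None
```

A subject such as `fix(cli): prevent crash in save_config_value when model is a string` parses into its three parts, while `fixed stuff` returns `None`.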
---
## Reporting Issues
- Use [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
- Include: OS, Python version, Hermes version (`hermes version`), full error traceback
- Include steps to reproduce
- Check existing issues before creating duplicates
- For security vulnerabilities, please report privately
---
## Community
- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch) for questions, showcasing projects, and sharing skills
- **GitHub Discussions**: For design proposals and architecture discussions
- **Skills Hub**: Upload specialized skills to a registry and share them with the community
---
## License
By contributing, you agree that your contributions will be licensed under the [MIT License](LICENSE).


@ -18,6 +18,24 @@ We value contributions in this order:
---
## Before You Start: Search First
A quick search before you build saves your time and keeps the PR queue clean — duplicates are common here, so it's worth a minute up front.
- **Search both open *and* merged PRs and issues** for your topic or error symptom — the duplicate-check in the PR template fires at review time, after you've already done the work:
```bash
gh search issues --repo NousResearch/hermes-agent "<your terms>"
gh search prs --repo NousResearch/hermes-agent "<your terms>"  # no --state filter, so open, closed, and merged PRs all match
```
Or use the web UI: [issues](https://github.com/NousResearch/hermes-agent/issues?q=) · [PRs (all states)](https://github.com/NousResearch/hermes-agent/pulls?q=is%3Apr).
- **The issue tracker can lag the code.** Many requested features are already implemented in-tree, so also search the source (`search_files`, or your editor's grep) for the capability before proposing it.
- **If an open PR already addresses it**, consider reviewing or improving that one instead of opening a competing duplicate.
- **For larger work**, comment on the issue to signal you're working on it, so others don't start the same thing.
Related: #38284 covers the agent-side analog — Hermes itself checking existing issues and PRs before deep self-troubleshooting. This section is the human-contributor complement.
---
## Should it be a Skill or a Tool?
This is the most common question for new contributors. The answer is almost always **skill**.
@ -412,6 +430,12 @@ Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
## Prerequisites
Env vars, install steps, MCP setup, API key sourcing.
## How to Run
Canonical invocation through the `terminal` tool.
## Quick Reference
Table of common commands or API calls.

220
README.es.md Normal file

@ -0,0 +1,220 @@
<p align="center">
<img src="assets/banner.png" alt="Hermes Agent" width="100%">
</p>
# Hermes Agent ☤
<p align="center">
<a href="https://hermes-agent.nousresearch.com/">Hermes Agent</a> | <a href="https://hermes-agent.nousresearch.com/">Hermes Desktop</a>
</p>
<p align="center">
<a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentación"></a>
<a href="https://discord.gg/NousResearch"><img src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
<a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/Licencia-MIT-green?style=for-the-badge" alt="Licencia: MIT"></a>
<a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Creado%20por-Nous%20Research-blueviolet?style=for-the-badge" alt="Creado por Nous Research"></a>
<a href="README.md"><img src="https://img.shields.io/badge/Lang-English-blue?style=for-the-badge" alt="English"></a>
<a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
<a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
</p>
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop: it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop: talk to it from Telegram while it works on a cloud VM.
Use any model you want: [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model`: no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
<tr><td><b>Lives where you do</b></td><td>Telegram, Discord, Slack, WhatsApp, Signal, and CLI, all from a single gateway process. Voice memo transcription, cross-platform conversation continuity.</td></tr>
<tr><td><b>A closed learning loop</b></td><td>Agent-curated memory with periodic nudges. Autonomous skill creation after complex tasks. Skills self-improve during use. FTS5 session search with LLM summarization for cross-session recall. <a href="https://github.com/plastic-labs/honcho">Honcho</a> dialectic user modeling. Compatible with the <a href="https://agentskills.io">agentskills.io</a> open standard.</td></tr>
<tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits, all in natural language, running unattended.</td></tr>
<tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
<tr><td><b>Runs anywhere, not just your laptop</b></td><td>Six terminal backends: local, Docker, SSH, Singularity, Modal, and Daytona. Daytona and Modal offer serverless persistence: your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, trajectory compression for training the next generation of tool-calling models.</td></tr>
</table>
---
## Quick Install
### Linux, macOS, WSL2, Termux
```bash
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
```
### Windows (native, PowerShell)
> **Note:** On native Windows, Hermes runs without WSL: the CLI, gateway, TUI, and tools all work natively. If you prefer WSL2, the Linux/macOS command above works there too. Found a bug? Please [file an issue](https://github.com/NousResearch/hermes-agent/issues).
Run this in PowerShell:
```powershell
iex (irm https://hermes-agent.nousresearch.com/install.ps1)
```
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git`; no admin rights required, fully isolated from any system Git installation). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses it instead. Otherwise, a ~45MB MinGit download is all you need; it will not touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs the curated `.[termux]` extra because the full `.[all]` extra currently pulls in Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is fully supported; the PowerShell command above installs everything. If you prefer WSL2, the Linux command works there too. The native Windows install lives in `%LOCALAPPDATA%\hermes`; WSL2 installs to `~/.hermes`, as on Linux.
After installation:
```bash
source ~/.bashrc # reload shell (or: source ~/.zshrc)
hermes # start chatting!
```
---
## Getting Started
```bash
hermes # Interactive CLI: start a conversation
hermes model # Choose your LLM provider and model
hermes tools # Configure which tools are enabled
hermes config set # Set individual config values
hermes gateway # Start the messaging gateway (Telegram, Discord, etc.)
hermes setup # Run the full setup wizard
hermes claw migrate # Migrate from OpenClaw (if coming from OpenClaw)
hermes update # Update to the latest version
hermes doctor # Diagnose any issues
```
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
---
## Skip the API Key Collection: Nous Portal
Hermes works with any provider you want, and that will not change. But if you would rather not collect five separate API keys for the model, web search, image generation, TTS, and a cloud browser, **[Nous Portal](https://portal.nousresearch.com)** covers them all under one subscription:
- **300+ models**: pick any with `/model <name>`
- **Tool Gateway**: web search (Firecrawl), image generation (FAL), text-to-speech (OpenAI), cloud browser (Browser Use), all routed through your subscription. No extra accounts.
One command from a fresh install:
```bash
hermes setup --portal
```
This authenticates you via OAuth, sets Nous as your provider, and enables the Tool Gateway. Check what is connected at any time with `hermes portal info`. Full details are on the [Tool Gateway documentation page](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
You can keep using your own keys per tool whenever you like; the gateway is per-backend, not all-or-nothing.
---
## Quick Reference: CLI vs Messaging
Hermes has two entry points: start the terminal interface with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you are in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
| ----------------------------------- | --------------------------------------------- | --------------------------------------------------------------------------------- |
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then message the bot |
| New conversation | `/new` or `/reset` | `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
---
## Documentation
All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| Section | Contents |
| --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → set up → first conversation in 2 minutes |
| [CLI Usage](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Commands, keybindings, personalities, sessions |
| [Configuration](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Config file, providers, models, all options |
| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| [Security](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Command approval, DM pairing, container isolation |
| [Tools & Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | 40+ tools, toolset system, terminal backends |
| [Skills System](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Procedural memory, Skills Hub, creating skills |
| [Memory](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory) | Persistent memory, user profiles, best practices |
| [MCP Integration](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp) | Connect any MCP server for extended capabilities |
| [Cron Scheduling](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) | Scheduled tasks with platform delivery |
| [Context Files](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Project context that shapes every conversation |
| [Architecture](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | Project structure, agent loop, key classes |
| [Contributing](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) | Development setup, PR process, code style |
| [CLI Reference](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) | All commands and flags |
| [Environment Variables](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Complete environment variable reference |
---
## Migrating from OpenClaw
If you are coming from OpenClaw, Hermes can automatically import your settings, memories, skills, and API keys.
**During first-time setup:** The setup wizard (`hermes setup`) automatically detects `~/.openclaw` and offers to migrate before configuration begins.
**Anytime after install:**
```bash
hermes claw migrate # Interactive migration (full preset)
hermes claw migrate --dry-run # Preview what would be migrated
hermes claw migrate --preset user-data # Migrate without secrets
hermes claw migrate --overwrite # Overwrite existing conflicts
```
What gets imported:
- **SOUL.md**: persona file
- **Memories**: MEMORY.md and USER.md entries
- **Skills**: user-created skills → `~/.hermes/skills/openclaw-imports/`
- **Command allowlist**: approval patterns
- **Messaging settings**: platform configs, allowed users, working directory
- **API keys**: allowlisted secrets (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
- **TTS assets**: workspace audio files
- **Workspace instructions**: AGENTS.md (with `--workspace-target`)
See `hermes claw migrate --help` for all options, or use the `openclaw-migration` skill for an interactive, agent-guided migration with dry-run previews.
---
## Contributing
Contributions are welcome! See the [Contributing Guide](CONTRIBUTING.es.md) for development setup, code style, and the PR process.
Quick start for contributors: clone the repository and get going with `setup-hermes.sh`:
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
./setup-hermes.sh # installs uv, creates venv, installs .[all], symlinks ~/.local/bin/hermes
./hermes # auto-detects the venv, no need to `source` first
```
Manual path (equivalent to the above):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
scripts/run_tests.sh
```
---
## Community
- 💬 [Discord](https://discord.gg/NousResearch)
- 📚 [Skills Hub](https://agentskills.io)
- 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux): Linux desktop-control MCP server for Hermes and other MCP hosts, with AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.
- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw): Community WeChat bridge. Run Hermes Agent and OpenClaw on the same WeChat account.
---
## License
MIT. See [LICENSE](LICENSE).
Built by [Nous Research](https://nousresearch.com).


@ -13,6 +13,7 @@
<a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
<a href="README.zh-CN.md"><img src="https://img.shields.io/badge/Lang-中文-red?style=for-the-badge" alt="中文"></a>
<a href="README.ur-pk.md"><img src="https://img.shields.io/badge/Lang-اردو-green?style=for-the-badge" alt="اردو"></a>
<a href="README.es.md"><img src="https://img.shields.io/badge/Lang-Español-orange?style=for-the-badge" alt="Español"></a>
</p>
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
@ -64,6 +65,41 @@ source ~/.bashrc # reload shell (or: source ~/.zshrc)
hermes # start chatting!
```
### Troubleshooting
#### Windows Defender or antivirus flags `uv.exe` as malware
If your antivirus (Bitdefender, Windows Defender, etc.) quarantines `uv.exe` from the Hermes `bin` folder (`%LOCALAPPDATA%\hermes\bin\uv.exe`), this is a **false positive**. The file is Astral's `uv`, the Rust-based Python package manager that Hermes bundles to manage its Python environment. ML-based antivirus engines commonly flag unsigned Rust binaries that download and install packages.
**To verify your copy is authentic:**
```powershell
# Install GitHub CLI if needed
winget install --id GitHub.cli
# Login to GitHub
gh auth login
# Run verification
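# Path to the uv binary that Hermes bundles
$uv = "$env:LOCALAPPDATA\hermes\bin\uv.exe"
# Installed version, taken from the second field of `uv --version`
$ver = (& $uv --version).Split(' ')[1]
# Use TLS 1.2 for the download
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Download the matching official release archive (x86_64 Windows build)
$zip = "$env:TEMP\uv.zip"
Invoke-WebRequest "https://github.com/astral-sh/uv/releases/download/$ver/uv-x86_64-pc-windows-msvc.zip" -OutFile $zip -UseBasicParsing
# Check the archive's build attestation against the astral-sh/uv repository
gh attestation verify $zip --repo astral-sh/uv
# Extract the official binary and compare its hash with the local copy
Expand-Archive $zip "$env:TEMP\uv_x" -Force
(Get-FileHash "$env:TEMP\uv_x\uv.exe").Hash -eq (Get-FileHash $uv).Hash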
```
If attestation says "Verification succeeded" and the last line prints `True`, you're good.
**To whitelist Hermes:**
- **Windows Defender:** Run PowerShell as Admin → `Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\hermes\bin"`
- **Bitdefender:** Add an exception in the Bitdefender console (Protection > Antivirus > Settings > Manage Exceptions)
- Whitelist the **folder**, not the file hash — Hermes updates `uv` and the hash changes every version
For more context, see the upstream Astral reports: [astral-sh/uv#13553](https://github.com/astral-sh/uv/issues/13553), [astral-sh/uv#15011](https://github.com/astral-sh/uv/issues/15011), [astral-sh/uv#10079](https://github.com/astral-sh/uv/issues/10079).
---
## Getting Started


@ -39,7 +39,11 @@ curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
> **Android / Termux:** The tested manual install path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs the curated `.[termux]` extra, because the full `.[all]` extra pulls in Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is not supported. Install [WSL2](https://learn.microsoft.com/zh-cn/windows/wsl/install) and run the command above.
> **Windows:** Run this in PowerShell:
> ```powershell
> iex (irm https://hermes-agent.nousresearch.com/install.ps1)
> ```
> After installation, you may need to restart your terminal, then run `hermes` to start chatting.
After installation:

322
SECURITY.es.md Normal file

@ -0,0 +1,322 @@
# Hermes Agent Security Policy
This document describes the trust model for Hermes Agent, identifies the
one security boundary the project treats as structural, and defines the
scope for vulnerability reports.
## 1. Reporting a Vulnerability
Report privately via [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
or **security@nousresearch.com**. Do not open public issues for
security vulnerabilities. **Hermes Agent does not operate a bug bounty
program.**
A useful report includes:
- A concise description and severity assessment.
- The affected component, identified by file path and line range
(e.g. `path/to/file.py:120-145`).
- Environment details (`hermes version`, commit SHA, OS, Python version).
- A reproduction against `main` or the latest release.
- A statement of which trust boundary in §2 is crossed.
Please read §2 and §3 before submitting. Reports that demonstrate the
limits of an in-process heuristic that this policy does not treat as a
boundary will be closed as out of scope under §3. See §3.2, though:
they are still welcome as regular issues or pull requests, just not
through the private security channel.
---
## 2. Trust Model
Hermes Agent is a single-tenant personal agent. Its posture is
layered, and the layers do not carry equal weight. Reporters and
operators should reason about them in the same terms.
### 2.1 Definitions
- **Agent process.** The Python interpreter running Hermes Agent,
including any Python modules it has loaded (skills, plugins,
hook handlers).
- **Terminal backend.** A pluggable execution target for the
`terminal()` tool. The default runs commands directly on the host.
Other backends run commands inside a container, cloud sandbox, or
remote host.
- **Input surface.** Any channel through which content enters the
agent's context: operator input, web fetches, email, gateway messages,
file reads, MCP server responses, tool results.
- **Trust envelope.** The set of resources an operator has implicitly
granted Hermes Agent access to by running it: typically, everything
the operator's own user account can reach on the host.
- **Posture.** An explicit statement in Hermes Agent's documentation or code
about how a consuming layer (adapter, UI, file writer,
shell) should treat the agent's output, e.g. "the dashboard renders
agent output as inert HTML."
### 2.2 The Boundary: OS-Level Isolation
**The only security boundary against an adversarial LLM is the
operating system.** Nothing inside the agent process constitutes
containment: not the approval gate, not output redaction, not any
pattern scanner, not any tool allowlist. Any in-process component
that filters LLM output is a heuristic operating on an
attacker-influenced string, and this policy treats it as such.
Hermes Agent supports two OS-level isolation postures. They address
different threats, and an operator should choose between them deliberately.
#### Terminal backend isolation
A non-default terminal backend runs LLM-emitted shell commands
inside a container, remote host, or cloud sandbox. The file tools
(`read_file`, `write_file`, `patch`) also run through this backend,
since they are implemented on top of the shell contract; they cannot
reach paths the backend does not expose.
What it confines: everything the agent does by issuing shell or
file operations. What it does **not** confine: everything the agent does in its own
Python process. That includes the code-execution tool (spawned as a
host subprocess), MCP subprocesses (spawned from the agent's environment),
plugin loading, hook dispatch, and skill loading (all imported into the
agent's interpreter).
Terminal backend isolation is the right posture when the concern is
the LLM emitting destructive shell commands or unwanted file-tool writes, and the
operator is trusted.
#### Whole-process wrapping
Whole-process wrapping runs the agent's entire process tree inside a
sandbox. Every code path (shell, code execution, MCP, file tools,
plugins, hooks, skill loading) is subject to the same filesystem,
network, process, and (where applicable) inference policy.
Hermes Agent supports this in two ways:
- **Hermes Agent's own Docker image and Compose configuration.** Lighter
weight; the agent runs in a standard container with operator-configured
mounts and network policy.
- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
OpenShell provides per-session sandboxes with declarative policy
across the filesystem, network (L7 egress), process/syscall, and
inference-routing layers. Network and inference policies are
hot-reloadable. Credentials are injected from a Provider store
and never touch the sandbox filesystem.
Under a whole-process wrap, Hermes Agent's in-process heuristics
(§2.4) act as accident prevention layered on top of a real boundary.
This is the supported posture when the agent ingests content from surfaces
the operator does not control (the open web, inbound email,
multi-user channels, untrusted MCP servers) and for production
or shared deployments.
Operators who run the default local backend with untrusted input
surfaces, or who run a terminal-backend sandbox expecting it to contain
code paths that do not go through the shell, are operating outside the
supported security posture.
### 2.3 Credential Scoping
Hermes Agent filters the environment it passes to its lower-trust
components: shell subprocesses, MCP subprocesses, and the
code-execution child process. Credentials such as provider API keys and
gateway tokens are stripped by default; variables explicitly declared
by the operator or by a loaded skill are passed through.
This reduces casual exfiltration. It is not containment. Any
component running inside the agent process (skills, plugins, hook
handlers) can read whatever the agent itself can read, including
in-memory credentials. The mitigation against a compromised in-process
component is operator review before install (§2.4, §2.5), not
environment sanitization.
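The environment filtering described in this section can be sketched roughly as follows. This is a minimal illustration, not Hermes Agent's actual implementation: the `SECRET_MARKERS` tuple, `scoped_env`, and `run_child` are hypothetical names, and matching on name fragments is an assumed heuristic.

```python
import os
import subprocess

# Hypothetical name fragments that mark a variable as a credential.
SECRET_MARKERS = ("API_KEY", "TOKEN", "SECRET", "PASSWORD")


def scoped_env(env, passthrough=()):
    """Return a copy of env without secret-looking variables.

    Names listed in passthrough are kept even if they look like secrets,
    mirroring variables that an operator or a loaded skill declares explicitly.
    """
    allowed = set(passthrough)
    return {
        name: value
        for name, value in env.items()
        if name in allowed
        or not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


def run_child(argv, passthrough=()):
    """Run a child process that only sees the scoped environment."""
    return subprocess.run(
        argv,
        env=scoped_env(dict(os.environ), passthrough),
        capture_output=True,
        text=True,
        check=False,
    )
```

Filtering of this kind only limits what child processes inherit. Code imported into the agent's own interpreter still sees the unfiltered `os.environ`, which is why the section calls this a reduction in casual exfiltration and not containment.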
### 2.4 In-Process Heuristics
The following components filter or warn about LLM behavior. They are
useful. They are not boundaries.
- The **approval gate** detects common destructive shell patterns
and asks the operator to confirm before execution. Shell is
Turing-complete; a denylist over shell strings is structurally
incomplete. The gate catches cooperative-mode mistakes, not
adversarial output.
- **Output redaction** strips secret-like patterns from display.
A motivated output producer will evade it.
- **Skills Guard** scans installable skill content for injection
patterns. It is a review aid; the boundary for third-party skills
is operator review before install. Reviewing a skill means
reading its Python code and scripts, not just its SKILL.md description,
because skills run arbitrary Python at import time.
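To make the "structurally incomplete" point concrete, here is a small sketch of a pattern-based gate. The patterns and the `needs_approval` helper are hypothetical and far simpler than anything a real gate would ship; they are not Hermes Agent's actual rules.

```python
import re

# Hypothetical denylist; illustrative patterns only.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f"),  # rm -rf and similar flag orders
    re.compile(r"\bmkfs(\.\w+)?\b"),              # filesystem creation
    re.compile(r"\bdd\s+if="),                    # raw block copies
]


def needs_approval(command: str) -> bool:
    """Return True if the command matches a known destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


# A recognised destructive command triggers the prompt.
flagged = needs_approval("rm -rf build/")
# A command with a comparable effect, phrased differently, does not.
missed = needs_approval("find build -delete")
```

Adding a pattern for `find ... -delete` closes that one gap and leaves many others open, which is the sense in which such a gate helps a cooperative model avoid mistakes without containing an adversarial one.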
### 2.5 Plugin Trust Model
Plugins load into the agent process and run with the agent's full
privileges: they can read the same credentials, call the same
tools, register the same hooks, and import the same modules as
anything shipped in-tree. The boundary for third-party plugins is
operator review before install. This is the same rule as for skills (§2.4),
stated separately because plugins are architecturally heavier
and often ship their own background services, network listeners,
and dependencies.
A malicious or buggy plugin is not a vulnerability in Hermes Agent
itself. Bugs in Hermes Agent's plugin install or discovery path
that prevent the operator from seeing what they are installing are in scope under §3.1.
### 2.6 External Surfaces
An **external surface** is any channel outside the local agent process
through which a caller can dispatch agent work, resolve
approvals, or receive agent output. Each surface has its own
authorization model, but the rules below apply uniformly.
**Surfaces in Hermes Agent:**
- **Gateway platform adapters.** Messaging integrations in
`gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
and analogous adapters shipped as plugins.
- **Network-exposed HTTP surfaces.** The API server adapter, the
dashboard plugin, the kanban plugin's HTTP endpoints, and any
other plugin that binds a listening socket.
- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
equivalent integrations that accept requests from a local client process.
- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
Ink terminal UI, reached over local IPC.
**Uniform rules:**
1. **Authorization is required on every surface that crosses a trust boundary.** For
networked messaging and HTTP surfaces, the boundary is the network: authorization
means an operator-configured allowlist of callers. For editor and local IPC
surfaces (ACP, TUI gateway), the boundary is the host user account:
authorization means relying on OS-level access control (file
permissions, loopback-only binds) and not exposing the surface beyond
the local user without an explicit network authentication layer.
2. **An allowlist is required for every enabled network adapter.**
Adapters must refuse to dispatch agent work, resolve
approvals, or stream output until an allowlist is set. Code paths
that fail open when no allowlist is configured are code bugs in
scope under §3.1.
3. **Session identifiers are routing handles, not authorization boundaries.**
Knowing another caller's session ID does not grant access to their approvals or output;
authorization is always re-checked against the allowlist (or its OS-level
equivalent).
4. **Within the authorized set, all callers are equally trusted.**
Hermes Agent does not model per-caller capabilities within a single adapter.
Operators who need capability separation should run separate agent
instances with separate allowlists.
5. **Binding a local-only surface to a non-loopback interface is a break-glass
operator decision (§3.2).** The dashboard and other plugin HTTP servers
default to loopback; exposing them via `--host 0.0.0.0` or equivalent
makes public-exposure hardening (§4) the operator's responsibility.
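Rules 2, 3, and 5 can be illustrated with a short sketch. The `NetworkAdapter` class and `is_loopback_bind` helper are hypothetical names invented for this example, assuming callers are identified by a plain string; they do not describe Hermes Agent's real adapter code.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkAdapter:
    """Hypothetical adapter; not Hermes Agent's real class."""

    allowlist: frozenset = frozenset()

    def is_authorized(self, caller_id: str) -> bool:
        # Fail closed: with no allowlist configured, nobody is authorized.
        return bool(self.allowlist) and caller_id in self.allowlist

    def read_output(self, caller_id: str, session_id: str) -> str:
        # The session id only routes the request; the caller is re-checked
        # against the allowlist on every call.
        if not self.is_authorized(caller_id):
            raise PermissionError(f"caller {caller_id!r} is not on the allowlist")
        return f"output for session {session_id}"


def is_loopback_bind(host: str) -> bool:
    """Return True if host refers to a loopback address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unknown hostnames are treated as non-loopback to stay conservative.
        return False
```

An adapter built without an allowlist rejects every caller, and a startup check such as `is_loopback_bind("0.0.0.0")` returning `False` is one place a deployment could warn that public-exposure hardening now rests with the operator.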
---
## 3. Scope
### 3.1 In Scope
- Escape from a declared OS-level isolation posture (§2.2): an
attacker-controlled code path reaching state that the posture
claimed to confine.
- Unauthorized external-surface access: a caller outside the configured
authorization set (allowlist, or OS-level equivalent
for local IPC surfaces) dispatching work, receiving output, or
resolving approvals (§2.6).
- Credential exfiltration: leakage of operator credentials or
session authorization material to a destination outside the trust
envelope, through a mechanism that should have prevented it
(environment sanitization bug, adapter logging, transport bug
that dumps credentials to an upstream, etc.).
- Trust-model documentation violations: code that behaves
contrary to what this policy, Hermes Agent's own documentation, or
reasonable operator expectations would predict, including cases where
Hermes Agent has documented a posture about how its output should be
rendered by a consuming layer (dashboard, gateway adapter,
file writer, shell) and a code path breaks that posture.
### 3.2 Out of Scope
"Out of scope" here means "not a security vulnerability under this
policy." It does not mean "not worth reporting." Improvements to
in-process heuristics, hardening ideas, and UX fixes are welcome as
regular issues or pull requests: the approval gate can always catch
more patterns, redaction can always get smarter, and adapter behavior
can always be tightened. These items simply do not go through the private
disclosure channel and do not receive advisories.
- **In-process heuristic bypasses (§2.4):** approval-gate regex bypasses,
redaction bypasses, Skills Guard pattern bypasses, and analogous
reports against future heuristics. These components are not boundaries;
defeating them is not a vulnerability under this policy.
- **Prompt injection per se.** Getting the LLM to emit unusual output
(through injected content, hallucination, training artifacts,
or any other cause) is not in itself a vulnerability. "I achieved
prompt injection" without a chained §3.1 outcome is not an actionable
report under this policy.
- **Consequences of a chosen isolation posture.** Reports that
a code path operating within its posture's scope can do what that
posture permits are not vulnerabilities. Examples: shell or file tools
reaching host state under the local backend; code-execution or MCP
subprocesses reaching host state under terminal-backend isolation that
sandboxes only the shell; reports whose preconditions require pre-existing write access
to operator-owned configuration or credential files (those are already inside
the trust envelope).
- **Documented break-glass configurations.** Operator-selected trade-offs
that explicitly disable protections: `--insecure` and equivalent flags
on the dashboard or other components, disabled approvals,
the local backend in production, development profiles that bypass
hermes-home safety, and the like. Reports against those
configurations are not vulnerabilities; that is the flag's job.
- **Community-contributed skills and plugins.** Third-party skills
(including the community skills repository) and third-party plugins
are on the operator's review surface, not Hermes Agent's trust surface
(§2.4, §2.5). A skill or plugin that does something
malicious is the expected failure mode of one that was not
reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes Agent's
skill or plugin installation path that prevent the
operator from seeing what they are installing are in scope under §3.1.
- **Public exposure without external controls.** Exposing the
gateway or API to the public internet without authentication,
VPN, or firewall.
- **Tool-level read/write restrictions in a posture where the shell is
permitted.** If a path is reachable through the terminal tool, reports
that other file tools can reach it add nothing.
---
## 4. Deployment Hardening
The most important hardening decision is matching isolation
(§2.2) to the trust of the content the agent will ingest. Beyond that:
- Run the agent as a non-root user. The provided container image
does this by default.
- Keep credentials in the operator credentials file with strict
permissions, never in the main configuration, never in version control.
Under OpenShell, use the Providers store instead of an on-disk
credentials file.
- Do not expose the gateway or API to the public internet without
VPN, Tailscale, or firewall protection. Under OpenShell, use the
network policy layer to restrict egress.
- Configure a caller allowlist for every exposed network adapter
you enable (§2.6).
- Review third-party skills and plugins before installing (§2.4,
§2.5). For skills, this means reading the Python and scripts,
not just SKILL.md. Skills Guard reports and the installation audit
log are the review surface.
- Hermes Agent ships supply-chain guards for MCP server launches
and for dependency / bundled-package changes in CI; see
`CONTRIBUTING.es.md` for details.
---
## 5. Disclosure
- **Coordinated disclosure window:** 90 days from the report, or until a
fix is released, whichever comes first.
- **Channel:** the GHSA thread or email correspondence with
security@nousresearch.com.
- **Credit:** reporters are credited in the release notes unless
anonymity is requested.

View file

@ -121,10 +121,11 @@ outside the supported security posture.
### 2.3 Credential Scoping
Hermes Agent filters the environment it passes to its lower-trust
in-process components: shell subprocesses, MCP subprocesses, and
the code-execution child. Credentials like provider API keys and
gateway tokens are stripped by default; variables explicitly
declared by the operator or by a loaded skill are passed through.
in-process components: shell subprocesses, MCP subprocesses,
cron job scripts, and the code-execution child. Credentials like
provider API keys and gateway tokens are stripped by default;
variables explicitly declared by the operator or by a loaded
skill are passed through.
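
As a rough illustration of the scoping rule described above, the sketch below strips credential-like variables from an environment mapping and keeps any name that was explicitly declared. The name pattern, the `scoped_env` helper, and the `passthrough` parameter are assumptions made for this example; the diff does not show how Hermes Agent actually implements its filter.

```python
import re

# Hypothetical name pattern; the real filter's rules are not shown in this diff.
_SECRET_NAME = re.compile(r"(API_KEY|TOKEN|SECRET|PASSWORD)$", re.IGNORECASE)


def scoped_env(base: dict[str, str], passthrough: frozenset[str] = frozenset()) -> dict[str, str]:
    """Return a copy of ``base`` without credential-like variables.

    Names listed in ``passthrough`` are kept even if they look like secrets.
    """
    return {
        name: value
        for name, value in base.items()
        if name in passthrough or not _SECRET_NAME.search(name)
    }


env = scoped_env(
    {"PATH": "/usr/bin", "HF_TOKEN": "hf_x", "GROQ_API_KEY": "gk", "MY_TOOL_TOKEN": "t"},
    passthrough=frozenset({"MY_TOOL_TOKEN"}),
)
```

The resulting mapping could be handed to `subprocess.run(..., env=env)`. As the policy text notes, a filter of this kind reduces casual exfiltration and is not containment.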
This reduces casual exfiltration. It is not containment. Any
component running inside the agent process (skills, plugins, hook

View file

@ -617,6 +617,10 @@ class SessionManager:
_register_task_cwd(session_id, cwd)
agent = AIAgent(**kwargs)
# Codex app-server sessions are spawned lazily on the first turn. Stamp
# the ACP workspace onto the agent so the Codex runtime starts from the
# editor/session cwd instead of the Hermes daemon's process cwd.
agent.session_cwd = cwd
# ACP stdio transport requires stdout to remain protocol-only JSON-RPC.
# Route any incidental human-readable agent output to stderr instead.
agent._print_fn = _acp_stderr_print

View file

@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
"version": "0.16.0",
"version": "0.17.0",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
"package": "hermes-agent[acp]==0.16.0",
"package": "hermes-agent[acp]==0.17.0",
"args": ["hermes-acp"]
}
}

View file

@ -50,7 +50,7 @@ from agent.tool_guardrails import (
from hermes_cli.config import cfg_get
from hermes_cli.timeouts import get_provider_request_timeout
from hermes_constants import get_hermes_home
from utils import base_url_host_matches
from utils import base_url_host_matches, is_truthy_value
# Use the same logger name as run_agent so tests patching ``run_agent.logger``
# capture our warnings. (run_agent.py also does
@ -265,7 +265,8 @@ def init_agent(
output_config.format instead of a trailing-assistant prefill.
platform (str): The interface platform the user is on (e.g. "cli", "telegram", "discord", "whatsapp").
Used to inject platform-specific formatting hints into the system prompt.
skip_context_files (bool): If True, skip auto-injection of SOUL.md, AGENTS.md, and .cursorrules
skip_context_files (bool): If True, skip auto-injection of project context files
(SOUL.md, .hermes.md, AGENTS.md, CLAUDE.md, .cursorrules) from the cwd / HERMES_HOME
into the system prompt. Use this for batch processing and data generation to avoid
polluting trajectories with user-specific persona or project instructions.
load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary
@ -531,7 +532,14 @@ def init_agent(
agent._last_activity_desc: str = "initializing"
agent._current_tool: str | None = None
agent._api_call_count: int = 0
# Opt-out flag for the between-turns MCP tool refresh (build_turn_context).
# Set on internal forks (e.g. background_review) that must keep ``tools[]``
# byte-identical to a parent for provider cache parity.
agent._skip_mcp_refresh = False
# Registry generation the current tool snapshot was derived from. Lets a
# late/concurrent refresh reject a stale (older-generation) rebuild instead
# of clobbering a newer one. Set adjacent to the tool snapshot below.
agent._tool_snapshot_generation = 0
# Rate limit tracking — updated from x-ratelimit-* response headers
# after each API call. Accessed by /usage slash command.
agent._rate_limit_state: Optional["RateLimitState"] = None
@ -800,6 +808,8 @@ def init_agent(
# _custom_headers; older/mocked clients may expose
# _default_headers instead.
_routed_headers = getattr(_routed_client, "_custom_headers", None)
if not _routed_headers:
_routed_headers = getattr(_routed_client, "default_headers", None)
if not _routed_headers:
_routed_headers = getattr(_routed_client, "_default_headers", None)
if _routed_headers:
@ -853,6 +863,8 @@ def init_agent(
if _provider_timeout is not None:
client_kwargs["timeout"] = _provider_timeout
_fb_headers = getattr(_fb_client, "_custom_headers", None)
if not _fb_headers:
_fb_headers = getattr(_fb_client, "default_headers", None)
if not _fb_headers:
_fb_headers = getattr(_fb_client, "_default_headers", None)
if _fb_headers:
@ -953,7 +965,14 @@ def init_agent(
print(f"🔄 Fallback chain ({len(agent._fallback_chain)} providers): " +
"".join(f"{f['model']} ({f['provider']})" for f in agent._fallback_chain))
# Get available tools with filtering
# Get available tools with filtering. Capture the registry generation this
# snapshot is derived from FIRST, so a later concurrent refresh can tell
# whether it holds a newer or staler view (see refresh_agent_mcp_tools).
try:
from tools.registry import registry as _snapshot_registry
agent._tool_snapshot_generation = _snapshot_registry._generation
except Exception:
agent._tool_snapshot_generation = 0
agent.tools = _ra().get_tool_definitions(
enabled_toolsets=enabled_toolsets,
disabled_toolsets=disabled_toolsets,
@ -1081,6 +1100,12 @@ def init_agent(
agent._parent_session_id = parent_session_id
agent._last_flushed_db_idx = 0 # tracks DB-write cursor to prevent duplicate writes
agent._session_db_created = False # DB row deferred to run_conversation()
# Most agents own their session row and should finalize it on close().
# Some temporary helper agents (manual compression / session-hygiene /
# background-review forks) rotate or share the session forward to a
# continuation row that must remain open after the helper is torn down;
# those callers explicitly set this flag to False.
agent._end_session_on_close = True
agent._session_init_model_config = {
"max_iterations": agent.max_iterations,
"reasoning_config": reasoning_config,
@ -1325,6 +1350,14 @@ def init_agent(
compression_abort_on_summary_failure = str(
_compression_cfg.get("abort_on_summary_failure", False)
).lower() in {"true", "1", "yes"}
# In-place compaction: when True, compress_context() rewrites the message
# list + rebuilds the system prompt WITHOUT rotating the session id (no
# parent_session_id chain, no `name #N` renumber). See #38763 and
# agent/conversation_compression.py. Consumed by compress_context(), not the
# compressor, so it rides on the agent.
compression_in_place = is_truthy_value(
_compression_cfg.get("in_place"), default=False
)
# Read optional explicit context_length override for the auxiliary
# compression model. Custom endpoints often cannot report this via
@ -1544,6 +1577,7 @@ def init_agent(
abort_on_summary_failure=compression_abort_on_summary_failure,
)
agent.compression_enabled = compression_enabled
agent.compression_in_place = compression_in_place
# Reject models whose context window is below the minimum required
# for reliable tool-calling workflows (64K tokens).

View file

@ -1050,6 +1050,11 @@ def restore_primary_runtime(agent) -> bool:
agent._fallback_activated = False
agent._fallback_index = 0
# Undo the fallback's identity rewrite so the prompt is
# byte-identical to the stored copy again (prefix cache match).
from agent.chat_completion_helpers import rewrite_prompt_model_identity
rewrite_prompt_model_identity(agent, rt["model"], rt["provider"])
logger.info(
"Primary runtime restored for new turn: %s (%s)",
agent.model, agent.provider,
@ -1373,22 +1378,6 @@ def create_openai_client(agent, client_kwargs: dict, *, reason: str, shared: boo
agent._client_log_context(),
)
return client
if agent.provider == "google-gemini-cli" or str(client_kwargs.get("base_url", "")).startswith("cloudcode-pa://"):
from agent.gemini_cloudcode_adapter import GeminiCloudCodeClient
# Strip OpenAI-specific kwargs the Gemini client doesn't accept
safe_kwargs = {
k: v for k, v in client_kwargs.items()
if k in {"api_key", "base_url", "default_headers", "project_id", "timeout"}
}
client = GeminiCloudCodeClient(**safe_kwargs)
_ra().logger.info(
"Gemini Cloud Code Assist client created (%s, shared=%s) %s",
reason,
shared,
agent._client_log_context(),
)
return client
if agent.provider == "gemini":
from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
@ -2182,25 +2171,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
if source_msg.get("role") != "assistant":
return
# 1. Explicit reasoning_content already set — preserve it verbatim
# (includes DeepSeek/Kimi's own space-placeholder written at creation
# time, and any valid reasoning content from the same provider).
needs_thinking_pad = agent._needs_thinking_reasoning_pad()
# 1. Explicit reasoning_content already set.
#
# Exception: sessions persisted BEFORE #17341 have empty-string
# placeholders pinned at creation time. DeepSeek V4 Pro rejects
# those with HTTP 400. When the active provider enforces the
# thinking-mode echo, upgrade "" → " " on replay so stale history
# doesn't 400 the user on the next turn.
# When the active provider enforces the thinking-mode echo-back
# (DeepSeek / Kimi / MiMo), preserve it verbatim — that includes their
# own space-placeholder written at creation time and any valid reasoning
# from the same provider. Sessions persisted BEFORE #17341 have
# empty-string placeholders pinned at creation time; DeepSeek V4 Pro
# rejects those with HTTP 400, so upgrade "" → " " on replay.
#
# When the active provider does NOT enforce echo-back, strip the field
# entirely. Strict OpenAI-compatible providers (Mistral, Cerebras, Groq,
# SambaNova, …) reject ANY reasoning_content key in input messages with
# HTTP 400/422 ("Extra inputs are not permitted"), even an empty string
# or a single-space pad. This is the cross-provider fallback case: a
# reasoning primary (DeepSeek/Kimi/MiMo) pads history with " ", then a
# fallback to a strict provider replays that pad and 422s. Stripping
# here covers the rebuild path; reapply_reasoning_echo_for_provider()
# covers the already-built api_messages path. Refs #45655.
existing = source_msg.get("reasoning_content")
if isinstance(existing, str):
if existing == "" and agent._needs_thinking_reasoning_pad():
if not needs_thinking_pad:
api_msg.pop("reasoning_content", None)
elif existing == "":
api_msg["reasoning_content"] = " "
else:
api_msg["reasoning_content"] = existing
return
needs_thinking_pad = agent._needs_thinking_reasoning_pad()
# 2. Cross-provider poisoned history (#15748): on DeepSeek/Kimi,
# if the source turn has tool_calls AND a 'reasoning' field but no
# 'reasoning_content' key, the 'reasoning' text was written by a
@ -2226,9 +2226,13 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
# for providers that use the internal 'reasoning' key.
# This must happen before the unconditional empty-string fallback so
# genuine reasoning content is not overwritten (#15812 regression in
# PR #15478).
# PR #15478). Only promote for providers that enforce echo-back —
# strict providers reject the field (refs #45655).
if isinstance(normalized_reasoning, str) and normalized_reasoning:
api_msg["reasoning_content"] = normalized_reasoning
if needs_thinking_pad:
api_msg["reasoning_content"] = normalized_reasoning
else:
api_msg.pop("reasoning_content", None)
return
# 4. DeepSeek / Kimi thinking mode: all assistant messages need
@ -2249,34 +2253,53 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
"""Re-pad assistant turns with reasoning_content for the active provider.
"""Re-pad (or strip) assistant turns' reasoning_content for the active provider.
``api_messages`` is built once, before the retry loop, while the *primary*
provider is active. If a mid-conversation fallback then switches to a
require-side provider (DeepSeek / Kimi / MiMo thinking mode), assistant
turns that were built when the prior provider did NOT need the echo-back go
out without ``reasoning_content`` and the new provider rejects them with
HTTP 400 ("The reasoning_content in the thinking mode must be passed back").
provider is active. A mid-conversation fallback can then switch providers,
so the reasoning fields baked into ``api_messages`` are shaped for the
*prior* provider and must be reconciled against the *current* one:
Calling this immediately before building the request kwargs re-applies the
pad against the *current* provider. It is idempotent and a no-op unless
``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
is safe to call every iteration and covers every fallback path.
* Switching TO a require-side provider (DeepSeek / Kimi / MiMo thinking
mode): assistant turns built when the prior provider did NOT need the
echo-back go out without ``reasoning_content`` and the new provider
rejects them with HTTP 400 ("The reasoning_content in the thinking mode
must be passed back"). Re-apply the pad.
Returns the number of assistant turns that gained reasoning_content.
* Switching TO a strict provider that rejects the field (Mistral,
Cerebras, Groq, SambaNova, …): assistant turns built under a reasoning
primary carry a ``reasoning_content`` pad (often a single space ``" "``),
and the strict provider rejects it with HTTP 400/422 ("Extra inputs are
not permitted"). Strip the field. This is the exact cross-provider
fallback bug from #45655 — a DeepSeek primary pads history with ``" "``,
the request falls back to Mistral, and Mistral 422s on the stale pad.
Calling this immediately before building the request kwargs reconciles the
fields against the *current* provider. It is idempotent and safe to call
every iteration; it covers every fallback path.
Returns the number of assistant turns whose reasoning_content was added or
removed.
"""
if not agent._needs_thinking_reasoning_pad():
return 0
padded = 0
needs_pad = agent._needs_thinking_reasoning_pad()
changed = 0
for api_msg in api_messages:
if api_msg.get("role") != "assistant":
continue
if api_msg.get("reasoning_content"):
continue
copy_reasoning_content_for_api(agent, api_msg, api_msg)
if api_msg.get("reasoning_content"):
padded += 1
return padded
if needs_pad:
if api_msg.get("reasoning_content"):
continue
copy_reasoning_content_for_api(agent, api_msg, api_msg)
if api_msg.get("reasoning_content"):
changed += 1
else:
# Strict provider — strip any stale reasoning_content pad left
# over from a reasoning primary so the fallback request doesn't
# 400/422 on it.
if "reasoning_content" in api_msg:
api_msg.pop("reasoning_content", None)
changed += 1
return changed
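
The reconcile step above can be read as a two-way rule: pad when the active provider requires the echo-back, strip when it does not. The following is a simplified, standalone sketch of that rule. `reconcile_reasoning` and its `needs_pad` argument are illustrative names, and the padding branch here only writes the single-space placeholder, whereas the real code delegates to `copy_reasoning_content_for_api`, which handles more cases.

```python
def reconcile_reasoning(api_messages: list[dict], needs_pad: bool) -> int:
    """Pad or strip ``reasoning_content`` on assistant turns; return the change count."""
    changed = 0
    for msg in api_messages:
        if msg.get("role") != "assistant":
            continue
        if needs_pad:
            if not msg.get("reasoning_content"):
                msg["reasoning_content"] = " "  # single-space placeholder
                changed += 1
        elif "reasoning_content" in msg:
            del msg["reasoning_content"]  # strict providers reject the key outright
            changed += 1
    return changed


history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello", "reasoning_content": " "},
    {"role": "assistant", "content": "done"},
]
stripped = reconcile_reasoning(history, needs_pad=False)
```

Calling the function a second time with the same `needs_pad` value changes nothing, which is the idempotence the docstring relies on when it says the call is safe on every iteration.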
def _iter_pool_sockets(client: Any):

View file

@ -2535,3 +2535,56 @@ def sanitize_anthropic_kwargs(api_kwargs: Any, *, log_prefix: str = "") -> Any:
sorted(leaked),
)
return api_kwargs
def _is_stream_unavailable_error(exc: Exception) -> bool:
"""Return True when an Anthropic stream call should fall back to create()."""
err_lower = str(exc).lower()
if "stream" in err_lower and "not supported" in err_lower:
return True
if "invokemodelwithresponsestream" in err_lower:
from agent.bedrock_adapter import is_streaming_access_denied_error
return is_streaming_access_denied_error(exc)
return False
def create_anthropic_message(
client: Any,
api_kwargs: dict,
*,
log_prefix: str = "",
prefer_stream: bool = True,
) -> Any:
"""Create an Anthropic message, aggregating via stream when available.
Some Anthropic-compatible gateways are SSE-only: they ignore non-streaming
requests and return ``text/event-stream`` even for ``messages.create()``.
The SDK can surface that as raw text, so callers that expect a Message then
crash on ``.content``. Prefer ``messages.stream().get_final_message()`` to
match the main turn path, falling back to ``create()`` only for providers
that explicitly do not support streaming, such as restricted Bedrock roles.
"""
sanitize_anthropic_kwargs(api_kwargs, log_prefix=log_prefix)
messages_api = getattr(client, "messages", None)
stream_fn = getattr(messages_api, "stream", None)
if prefer_stream and callable(stream_fn):
stream_kwargs = dict(api_kwargs)
stream_kwargs.pop("stream", None)
try:
with stream_fn(**stream_kwargs) as stream:
return stream.get_final_message()
except Exception as exc:
if not _is_stream_unavailable_error(exc):
raise
logger.debug(
"%sAnthropic Messages stream unavailable; falling back to "
"messages.create(): %s",
log_prefix,
exc,
)
create_kwargs = dict(api_kwargs)
create_kwargs.pop("stream", None)
return messages_api.create(**create_kwargs)
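
To make the stream-first, create-on-fallback flow easier to follow, here is a minimal sketch that runs against a fake client. `FakeMessages` and `create_message` are stand-ins written for this example and are not part of the Anthropic SDK or of Hermes Agent. The error check is reduced to the "stream ... not supported" string match and omits the Bedrock-specific branch.

```python
from contextlib import contextmanager


class FakeMessages:
    """Stand-in for a client's ``messages`` API; not the real Anthropic SDK."""

    def __init__(self, stream_supported: bool):
        self.stream_supported = stream_supported
        self.calls: list[str] = []

    @contextmanager
    def stream(self, **kwargs):
        self.calls.append("stream")
        if not self.stream_supported:
            raise RuntimeError("stream is not supported for this model")
        yield self

    def get_final_message(self):
        return {"via": "stream"}

    def create(self, **kwargs):
        self.calls.append("create")
        return {"via": "create"}


def create_message(messages_api, api_kwargs: dict):
    """Prefer the streaming path; fall back to create() only when streaming is unavailable."""
    kwargs = {k: v for k, v in api_kwargs.items() if k != "stream"}
    try:
        with messages_api.stream(**kwargs) as stream:
            return stream.get_final_message()
    except Exception as exc:
        text = str(exc).lower()
        if not ("stream" in text and "not supported" in text):
            raise  # unrelated failures must surface unchanged
    return messages_api.create(**kwargs)


ok = FakeMessages(stream_supported=True)
restricted = FakeMessages(stream_supported=False)
first = create_message(ok, {"model": "m", "stream": True})
second = create_message(restricted, {"model": "m"})
```

Any error that does not look like "streaming unavailable" is re-raised, so a genuine request failure is never masked by a silent retry through `create()`.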

View file

@ -40,6 +40,7 @@ Payment / credit exhaustion fallback:
their OpenRouter balance but has Codex OAuth or another provider available.
"""
import contextlib
import json
import logging
import os
@ -102,11 +103,44 @@ OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
from utils import base_url_host_matches, base_url_hostname, model_forces_max_completion_tokens, normalize_proxy_env_vars
from utils import base_url_host_matches, base_url_hostname, env_float, model_forces_max_completion_tokens, normalize_proxy_env_vars
logger = logging.getLogger(__name__)
# ── Interrupt protection for atomic auxiliary tasks ──────────────────────
# Some auxiliary tasks must NOT be aborted mid-flight by a gateway interrupt
# (e.g. an incoming user message while the agent is busy). Context
# compression is the prime case: if the summary LLM call is interrupted
# part-way, compression falls back to a static "summary unavailable" marker
# and the real handoff is lost (#23975). A thread-local flag lets such a
# task mark its in-flight LLM call as interrupt-protected; the Codex
# Responses stream's cancellation check honors it. TIMEOUTS still fire
# (a hung call must die), and all OTHER aux tasks (vision, web_extract,
# title_generation, …) remain freely interruptible.
_aux_interrupt_protection = threading.local()
def _aux_interrupt_protected() -> bool:
return bool(getattr(_aux_interrupt_protection, "active", False))
@contextlib.contextmanager
def aux_interrupt_protection(active: bool = True):
"""Mark the current thread's auxiliary LLM call as interrupt-protected.
Used by atomic aux tasks (compression) so a mid-flight gateway interrupt
doesn't abort the call and trigger a degraded fallback. Re-entrant-safe:
restores the previous value on exit.
"""
prev = getattr(_aux_interrupt_protection, "active", False)
_aux_interrupt_protection.active = active
try:
yield
finally:
_aux_interrupt_protection.active = prev
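
The re-entrancy and per-thread behavior of this flag can be checked with a small standalone copy. The names `_protection`, `is_protected`, and `interrupt_protection` below are renamed stand-ins for this sketch, not imports from the module.

```python
import contextlib
import threading

_protection = threading.local()


def is_protected() -> bool:
    return bool(getattr(_protection, "active", False))


@contextlib.contextmanager
def interrupt_protection(active: bool = True):
    prev = getattr(_protection, "active", False)
    _protection.active = active
    try:
        yield
    finally:
        _protection.active = prev  # restore, so nesting unwinds correctly


seen = []
with interrupt_protection():
    seen.append(is_protected())      # inside the protected region
    with interrupt_protection(False):
        seen.append(is_protected())  # temporarily lifted by the inner block
    seen.append(is_protected())      # outer value restored
seen.append(is_protected())          # back to the default

other_thread = []
with interrupt_protection():
    worker = threading.Thread(target=lambda: other_thread.append(is_protected()))
    worker.start()
    worker.join()
```

Because the flag lives in `threading.local()`, a compression call that is protected on one thread leaves interrupt handling on every other thread untouched.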
def _safe_isinstance(obj: Any, maybe_type: Any) -> bool:
"""Return False instead of raising when a patched symbol is not a type."""
try:
@ -631,6 +665,13 @@ def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
return str(url or "").strip().rstrip("/")
def _nous_min_key_ttl_seconds() -> int:
try:
return max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
except (TypeError, ValueError):
return 1800
# ── Codex Responses → chat.completions adapter ─────────────────────────────
# All auxiliary consumers call client.chat.completions.create(**kwargs) and
# read response.choices[0].message.content. This adapter translates those
@ -805,7 +846,11 @@ class _CodexCompletionsAdapter:
raise TimeoutError(_timeout_message())
try:
from tools.interrupt import is_interrupted
if is_interrupted():
# Honor interrupt protection for atomic aux tasks (compression):
# a mid-flight gateway interrupt must NOT abort the summary call
# and trigger a degraded fallback marker (#23975). Timeouts above
# still fire; other aux tasks remain interruptible.
if is_interrupted() and not _aux_interrupt_protected():
raise InterruptedError("Codex auxiliary Responses stream interrupted")
except InterruptedError:
raise
@ -997,7 +1042,7 @@ class _AnthropicCompletionsAdapter:
self._is_oauth = is_oauth
def create(self, **kwargs) -> Any:
from agent.anthropic_adapter import build_anthropic_kwargs
from agent.anthropic_adapter import build_anthropic_kwargs, create_anthropic_message
from agent.transports import get_transport
messages = kwargs.get("messages", [])
@ -1041,7 +1086,7 @@ class _AnthropicCompletionsAdapter:
if not _forbids_sampling_params(model):
anthropic_kwargs["temperature"] = temperature
response = self._client.messages.create(**anthropic_kwargs)
response = create_anthropic_message(self._client, anthropic_kwargs)
_transport = get_transport("anthropic_messages")
_nr = _transport.normalize_response(
response, strip_tool_prefix=self._is_oauth
@ -1300,6 +1345,57 @@ def _nous_base_url() -> str:
return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)
def _resolve_nous_pool_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
"""Resolve Nous auxiliary credentials from the selected pool entry."""
try:
from hermes_cli.auth import _agent_key_is_usable
pool = load_pool("nous")
except Exception as exc:
logger.debug("Auxiliary Nous pool credential resolution failed: %s", exc)
return None
if not pool or not pool.has_credentials():
return None
try:
entry = pool.select()
except Exception as exc:
logger.debug("Auxiliary Nous pool selection failed: %s", exc)
return None
if entry is None:
return None
state = {
"agent_key": getattr(entry, "agent_key", None),
"agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
"scope": getattr(entry, "scope", None),
}
if force_refresh or not _agent_key_is_usable(state, _nous_min_key_ttl_seconds()):
try:
refreshed = pool.try_refresh_current()
except Exception as exc:
logger.debug("Auxiliary Nous pool refresh failed: %s", exc)
refreshed = None
if refreshed is None:
return None
entry = refreshed
provider = {
"agent_key": getattr(entry, "agent_key", None),
"agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
"access_token": getattr(entry, "access_token", None),
"expires_at": getattr(entry, "expires_at", None),
"scope": getattr(entry, "scope", None),
}
api_key = _nous_api_key(provider)
base_url = _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL)
if not api_key or not base_url:
return None
return api_key, base_url
def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
"""Return fresh Nous runtime credentials when available.
@ -1308,11 +1404,15 @@ def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[
relying only on whatever raw tokens happen to be sitting in auth.json
or the credential pool.
"""
pooled = _resolve_nous_pool_runtime_api(force_refresh=force_refresh)
if pooled is not None:
return pooled
try:
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
force_refresh=force_refresh,
)
except Exception as exc:
@ -2905,7 +3005,7 @@ def _refresh_provider_credentials(provider: str) -> bool:
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
timeout_seconds=env_float("HERMES_NOUS_TIMEOUT_SECONDS", 15),
force_refresh=True,
)
if not str(creds.get("api_key", "") or "").strip():

View file

@ -535,6 +535,13 @@ def _run_review_in_thread(
)
review_agent._memory_write_origin = "background_review"
review_agent._memory_write_context = "background_review"
# The review fork pins the parent's cached system prompt and keeps
# ``tools[]`` byte-identical to the parent so its outbound request
# hits the same provider cache prefix (see the toolset-parity note
# above). The between-turns MCP refresh in build_turn_context would
# add late-connecting MCP tools to this fork and break that parity,
# so opt the review fork out of it.
review_agent._skip_mcp_refresh = True
review_agent._memory_store = agent._memory_store
review_agent._memory_enabled = agent._memory_enabled
review_agent._user_profile_enabled = agent._user_profile_enabled
@ -568,6 +575,13 @@ def _run_review_in_thread(
# if a future code path bypasses the cache.
review_agent.session_start = agent.session_start
review_agent.session_id = agent.session_id
# The fork shares the parent's live session_id (pinned above for
# prefix-cache parity). It is single-lifecycle and calls close()
# right after this run_conversation(); without opting out, close()
# would finalize the parent's still-active session row mid
# conversation (the review fires every ~10 turns). Leave session
# finalization to the real owner (CLI close / gateway reset / cron).
review_agent._end_session_on_close = False
# Never let the review fork compress. It shares the parent's
# session_id, so if it won a compression race it would rotate the
# parent into a NEW child that the gateway never adopts (the fork

View file

@ -34,7 +34,7 @@ from agent.message_sanitization import (
_repair_tool_call_arguments,
)
from tools.terminal_tool import is_persistent_env
from utils import base_url_host_matches, base_url_hostname, env_int
from utils import base_url_host_matches, base_url_hostname, env_float, env_int
logger = logging.getLogger(__name__)
@ -1042,6 +1042,35 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
def rewrite_prompt_model_identity(agent, model: str, provider: str) -> None:
"""Point the cached system prompt's ``Model:``/``Provider:`` lines at
the active runtime after a provider switch.
The system prompt is session-stable and replayed verbatim for prefix-cache
warmth, but after a failover the new backend's cache is cold anyway —
while a stale identity line makes the agent misreport which model it is
when asked. Rewrite the lines in place WITHOUT persisting to the session
DB: the stored row keeps the primary's labels, so when the primary is
restored the prompt is byte-identical to the stored copy again and its
prefix cache still matches.
Only the LAST occurrence of each line is touched: the identity lines
live in the volatile tail of the prompt, and earlier matches could be
user content (memory snapshots, context files).
"""
sp = getattr(agent, "_cached_system_prompt", None)
if not isinstance(sp, str) or not sp:
return
for label, value in (("Model", model), ("Provider", provider)):
if not value:
continue
matches = list(re.finditer(rf"(?m)^{label}: .*$", sp))
if matches:
last = matches[-1]
sp = f"{sp[:last.start()]}{label}: {value}{sp[last.end():]}"
agent._cached_system_prompt = sp
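
The last-occurrence rule is the part of this helper that is easiest to get wrong, so a standalone sketch may help. `rewrite_last_line` is a hypothetical reduction of the loop body above and the sample prompt is invented for illustration. The sketch adds `re.escape` around the label, which the original does not need because its labels are fixed literals.

```python
import re


def rewrite_last_line(prompt: str, label: str, value: str) -> str:
    """Replace only the final ``<label>: ...`` line in ``prompt``."""
    matches = list(re.finditer(rf"(?m)^{re.escape(label)}: .*$", prompt))
    if not matches:
        return prompt
    last = matches[-1]
    return f"{prompt[:last.start()]}{label}: {value}{prompt[last.end():]}"


prompt = (
    "Notes:\n"
    "Model: quoted from a user file\n"
    "\n"
    "Model: primary-model\n"
    "Provider: primary\n"
)
updated = rewrite_last_line(prompt, "Model", "fallback-model")
```

The earlier `Model:` line, which stands in for user content such as a memory snapshot, is left alone. Only the identity line in the tail of the prompt changes.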
def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool:
"""Switch to the next fallback model/provider in the chain.
@ -1287,6 +1316,10 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
api_mode=agent.api_mode,
)
# Keep the prompt's self-identity in sync with the model actually
# answering, so "what model are you?" doesn't report the primary.
rewrite_prompt_model_identity(agent, fb_model, fb_provider)
agent._buffer_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
@ -1761,14 +1794,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_base_timeout = (
_provider_timeout_cfg
if _provider_timeout_cfg is not None
else float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
else env_float("HERMES_API_TIMEOUT", 1800.0)
)
# Read timeout: config wins here too. Otherwise use
# HERMES_STREAM_READ_TIMEOUT (default 120s) for cloud providers.
if _provider_timeout_cfg is not None:
_stream_read_timeout = _provider_timeout_cfg
else:
_stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
_stream_read_timeout = env_float("HERMES_STREAM_READ_TIMEOUT", 120.0)
# Local providers (Ollama, llama.cpp, vLLM) can take minutes for
# prefill on large contexts before producing the first token.
# Auto-increase the httpx read timeout unless the user explicitly
@ -2508,7 +2541,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
if _cfg_stale is not None:
_stream_stale_timeout_base = _cfg_stale
else:
_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
_stream_stale_timeout_base = env_float("HERMES_STREAM_STALE_TIMEOUT", 180.0)
# Local providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
# for prefill on large contexts. Disable the stale detector unless
# the user explicitly set HERMES_STREAM_STALE_TIMEOUT.
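The hunks above swap `float(os.getenv(...))` for `env_float(...)`, but the diff does not show what `env_float` does. One plausible motivation is that `float()` raises `ValueError` when the variable holds a malformed value. The sketch below illustrates that idea only; `env_float_sketch` is a hypothetical stand-in, and its fallback behaviour is an assumption rather than the confirmed behaviour of `utils.env_float`.

```python
import os


def env_float_sketch(name: str, default: float) -> float:
    """Hypothetical stand-in for ``utils.env_float`` (real body not shown in the diff)."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return float(raw)
    except ValueError:
        # Assumed behaviour: a malformed value falls back to the default
        # instead of raising in the middle of a streaming call.
        return default


os.environ["HERMES_API_TIMEOUT"] = "not-a-number"
timeout = env_float_sketch("HERMES_API_TIMEOUT", 1800.0)
```

With the plain `float(os.getenv(...))` form, the same malformed value would raise before the request is sent.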

View file

@ -25,6 +25,61 @@ from typing import Any, Dict, List
logger = logging.getLogger(__name__)
def _codex_note_to_tool_progress(note: dict) -> tuple[str, str, dict] | None:
"""Map a Codex app-server ``item/started`` notification to a Hermes
tool-progress event ``(tool_name, preview, args)``.
The Codex app-server runtime processes ``item/started`` notifications for
command execution, file changes, and MCP/dynamic tool calls, but never
surfaced them as Hermes tool-progress events, so gateways (Telegram, etc.)
showed no verbose "running X" breadcrumbs on this route while every other
provider did (#38835). Returns None for items that aren't tool-shaped.
"""
if not isinstance(note, dict) or note.get("method") != "item/started":
return None
params = note.get("params") or {}
item = params.get("item") or {}
if not isinstance(item, dict):
return None
item_type = item.get("type") or ""
if item_type == "commandExecution":
command = item.get("command") or ""
return "exec_command", command, {"command": command, "cwd": item.get("cwd") or ""}
if item_type == "fileChange":
changes = item.get("changes") or []
preview = "file changes"
if isinstance(changes, list) and changes:
paths = [
str(change.get("path"))
for change in changes
if isinstance(change, dict) and change.get("path")
]
if paths:
preview = ", ".join(paths[:3])
if len(paths) > 3:
preview += f", +{len(paths) - 3} more"
return "apply_patch", preview, {"changes": changes}
if item_type == "mcpToolCall":
server = item.get("server") or "mcp"
tool = item.get("tool") or "unknown"
args = item.get("arguments") or {}
if not isinstance(args, dict):
args = {"arguments": args}
return f"mcp.{server}.{tool}", tool, args
if item_type == "dynamicToolCall":
tool = item.get("tool") or "unknown"
args = item.get("arguments") or {}
if not isinstance(args, dict):
args = {"arguments": args}
return tool, tool, args
return None
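The `fileChange` branch above builds its preview from at most three paths plus a remainder count. The sketch below lifts just that step into a standalone helper so the truncation can be tried out; `file_change_preview` and the sample payload are hypothetical, shaped only after the fields the mapper reads.

```python
def file_change_preview(changes, limit: int = 3) -> str:
    """Build a short preview string from a list of change dicts."""
    preview = "file changes"
    if isinstance(changes, list) and changes:
        paths = [
            str(change.get("path"))
            for change in changes
            if isinstance(change, dict) and change.get("path")
        ]
        if paths:
            preview = ", ".join(paths[:limit])
            if len(paths) > limit:
                preview += f", +{len(paths) - limit} more"
    return preview


# Hypothetical notification with five changed files.
note = {
    "method": "item/started",
    "params": {
        "item": {
            "type": "fileChange",
            "changes": [{"path": p} for p in ("a.py", "b.py", "c.py", "d.py", "e.py")],
        }
    },
}
preview = file_change_preview(note["params"]["item"]["changes"])
```

Entries without a usable `path` are skipped, and an empty or malformed list falls back to the generic `"file changes"` label.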
def _coerce_usage_int(value: Any) -> int:
if isinstance(value, bool):
return 0
@ -195,7 +250,9 @@ def run_codex_app_server_turn(
# Spawned on first turn, reused across turns, closed at AIAgent
# shutdown (see _cleanup hook).
if not hasattr(agent, "_codex_session") or agent._codex_session is None:
cwd = getattr(agent, "session_cwd", None) or os.getcwd()
from agent.runtime_cwd import resolve_agent_cwd
cwd = getattr(agent, "session_cwd", None) or str(resolve_agent_cwd())
# Approval callback: defer to Hermes' standard prompt flow if a
# CLI thread has installed one. Gateway / cron contexts get the
# codex-side fail-closed default.
@ -204,9 +261,27 @@ def run_codex_app_server_turn(
approval_callback = _get_approval_callback()
except Exception:
approval_callback = None
def _on_codex_event(note: dict) -> None:
# Bridge Codex app-server item/started notifications to Hermes
# tool-progress so gateways show verbose "running X" breadcrumbs
# on this route too (#38835).
progress_callback = getattr(agent, "tool_progress_callback", None)
if progress_callback is None:
return
mapped = _codex_note_to_tool_progress(note)
if mapped is None:
return
tool_name, preview, args = mapped
try:
progress_callback("tool.started", tool_name, preview, args)
except Exception:
logger.debug("codex tool-progress callback raised", exc_info=True)
agent._codex_session = CodexAppServerSession(
cwd=cwd,
approval_callback=approval_callback,
on_event=_on_codex_event,
)
# NOTE: the user message is ALREADY appended to messages by the
@ -290,6 +365,7 @@ def run_codex_app_server_turn(
original_user_message=original_user_message,
final_response=turn.final_text,
interrupted=False,
messages=messages,
)
except Exception:
logger.debug("external memory sync raised", exc_info=True)

View file

@ -23,7 +23,7 @@ import re
import time
from typing import Any, Dict, List, Optional
from agent.auxiliary_client import call_llm, _is_connection_error
from agent.auxiliary_client import call_llm, _is_connection_error, aux_interrupt_protection
from agent.context_engine import ContextEngine
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
@ -656,9 +656,8 @@ class ContextCompressor(ContextEngine):
self.provider = provider
self.api_mode = api_mode
self.context_length = context_length
self.threshold_tokens = max(
int(context_length * self.threshold_percent),
MINIMUM_CONTEXT_LENGTH,
self.threshold_tokens = self._compute_threshold_tokens(
context_length, self.threshold_percent
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
@ -668,6 +667,62 @@ class ContextCompressor(ContextEngine):
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
# Reset cross-call calibration state captured under the PREVIOUS model.
# These fields encode "the provider proved this prompt fit" / "preflight
# can be deferred" decisions that are only valid for the model that
# produced them. Carrying them across a switch to a smaller-context
# model would let should_defer_preflight_to_real_usage() suppress a
# preflight compression the new model actually needs — the exact
# oversized-send-after-switch failure in #23767. The new model's first
# response repopulates them via update_from_response(). Setting
# last_prompt_tokens to 0 (NOT -1) is deliberate: 0 is the documented
# "no real usage yet -> use the rough estimate" state, so the post-
# response should_compress path falls back to estimate_request_tokens_rough
# rather than skipping compression. -1 is a different sentinel
# (#36718, "compression just ran, await real usage") and must not be set here.
self.last_prompt_tokens = 0
self.last_completion_tokens = 0
self.last_total_tokens = 0
self.last_real_prompt_tokens = 0
self.last_rough_tokens_when_real_prompt_fit = 0
self.last_compression_rough_tokens = 0
self.awaiting_real_usage_after_compression = False
self._ineffective_compression_count = 0
# When the MINIMUM_CONTEXT_LENGTH floor meets/exceeds a small context
# window, compacting at the percentage (50% → 32K of a 64K window) wastes
# half the usable context. Trigger near the top of the window instead so a
# minimum-context model uses most of its budget before compacting — same
# rationale as the gpt-5.5/Codex 85% autoraise.
_MIN_CTX_TRIGGER_RATIO = 0.85
@staticmethod
def _compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
"""Compute the compaction trigger threshold in tokens.
The base value is ``context_length * threshold_percent``, floored at
``MINIMUM_CONTEXT_LENGTH`` so large-context models don't compress
prematurely at 50%. BUT that floor degenerates at small windows: for a
model whose ``context_length`` is at/below the minimum (e.g. a 64K
local model), ``max(0.5*64000, 64000) == 64000`` makes the threshold
equal the ENTIRE window, so auto-compression can never fire because the
provider rejects the request before usage reaches 100% (#14690).
When the floor would meet or exceed the context window, trigger at
``_MIN_CTX_TRIGGER_RATIO`` (85%) of the window: high enough that a
small model uses most of its context before compacting, but below
100% so compaction fires before the provider rejects the request.
"""
pct_value = int(context_length * threshold_percent)
floored = max(pct_value, MINIMUM_CONTEXT_LENGTH)
# If flooring pushed the threshold to/over the window it can never be
# reached. Trigger at 85% of the window so a minimum-context model
# rides most of its budget before compacting instead of wasting half.
if context_length > 0 and floored >= context_length:
return max(1, min(int(context_length * ContextCompressor._MIN_CTX_TRIGGER_RATIO),
context_length - 1))
return floored
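The threshold arithmetic is easiest to follow with concrete numbers. The sketch below restates the same computation as a free function. `MINIMUM_CONTEXT_LENGTH = 64_000` is an assumption inferred from the docstring's 64K example; the real constant lives in `agent.model_metadata` and is not shown in this diff.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # assumed value, inferred from the 64K example
MIN_CTX_TRIGGER_RATIO = 0.85


def compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Standalone restatement of the trigger-threshold computation."""
    pct_value = int(context_length * threshold_percent)
    floored = max(pct_value, MINIMUM_CONTEXT_LENGTH)
    # A threshold at or above the window can never be reached, so fall back
    # to a ratio of the window, capped strictly below the window size.
    if context_length > 0 and floored >= context_length:
        return max(1, min(int(context_length * MIN_CTX_TRIGGER_RATIO), context_length - 1))
    return floored


large = compute_threshold_tokens(200_000, 0.5)     # percentage wins
mid = compute_threshold_tokens(128_000, 0.5)       # floor applies, still below the window
at_minimum = compute_threshold_tokens(64_000, 0.5)  # floor would equal the window
```

Under that assumed minimum, a 200K window triggers at 100,000 tokens, a 128K window at 64,000, and a 64K window at 54,400 instead of the unreachable 64,000.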
def __init__(
self,
model: str,
@ -708,10 +763,11 @@ class ContextCompressor(ContextEngine):
# Floor: never compress below MINIMUM_CONTEXT_LENGTH tokens even if
# the percentage would suggest a lower value. This prevents premature
# compression on large-context models at 50% while keeping the % sane
# for models right at the minimum.
self.threshold_tokens = max(
int(self.context_length * threshold_percent),
MINIMUM_CONTEXT_LENGTH,
# for models right at the minimum. _compute_threshold_tokens also
# guards the degenerate case where the floor would equal/exceed the
# window (small models), so auto-compression can still fire (#14690).
self.threshold_tokens = self._compute_threshold_tokens(
self.context_length, threshold_percent
)
self.compression_count = 0
@ -761,6 +817,14 @@ class ContextCompressor(ContextEngine):
# this flag to know "compression was attempted but aborted, freeze
# the chat until the user manually retries via /compress".
self._last_compress_aborted: bool = False
# Set True when the summary call failed with an authentication /
# permission error (HTTP 401/403). Auth failures are non-recoverable
# at the request level — the credential or endpoint is broken — so
# compress() must ABORT (preserve the session unchanged) rather than
# rotate into a degraded child session with a placeholder summary.
# This is independent of the abort_on_summary_failure config flag:
# rotating on a broken credential is never the right behavior.
self._last_summary_auth_failure: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
@ -1245,7 +1309,10 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
Unknown from deterministic fallback. Inspect current repository/session state if needed.
{HISTORICAL_IN_PROGRESS_HEADING}
{active_task}
Unknown from deterministic fallback; the latest user ask is recorded once under
"{HISTORICAL_TASK_HEADING}" above as historical context only. Do NOT treat it as an
unfulfilled instruction to re-answer; verify current state and continue from the
protected recent messages after this summary.
## Blocked
{_bullets(blockers, limit=5)}
@ -1257,7 +1324,9 @@ None recoverable from deterministic fallback.
None recoverable from deterministic fallback.
{HISTORICAL_PENDING_ASKS_HEADING}
{active_task}
None recoverable from deterministic fallback. (The latest user ask is preserved once
under "{HISTORICAL_TASK_HEADING}" as historical context; it is NOT necessarily
outstanding.)
## Relevant Files
{_bullets(relevant_files, limit=12)}
@ -1511,11 +1580,33 @@ This compaction should PRIORITISE preserving all information related to the focu
}
if self.summary_model:
call_kwargs["model"] = self.summary_model
response = call_llm(**call_kwargs)
# Compression is atomic: protect the in-flight summary call from a
# mid-turn gateway interrupt. Without this, an incoming user message
# aborts the summary and compression falls back to a degraded static
# marker, losing the real handoff (#23975). Re-entrant: a main-model
# retry (_generate_summary recursion) re-enters harmlessly.
with aux_interrupt_protection():
response = call_llm(**call_kwargs)
content = response.choices[0].message.content
# Handle cases where content is not a string (e.g., dict from llama.cpp)
if not isinstance(content, str):
content = str(content) if content else ""
# Some OpenAI-compatible proxies (e.g. cmkey.cn, one-api channels)
# return a well-formed HTTP 200 with an empty or whitespace-only
# ``content`` instead of an error or empty ``choices``. That payload
# passes ``_validate_llm_response`` (a ``message`` exists), so it
# reaches here and would otherwise be stored as a prefix-only
# summary with no body — silently wiping the compacted turns and
# making the model forget the in-progress task (#11978, #11914).
# Treat empty content as a failure so it routes through the same
# main-model fallback + cooldown machinery as a transport error,
# rather than replacing real context with an empty summary.
if not content.strip():
raise RuntimeError(
"Context compression LLM returned empty content "
f"(provider={self.provider or 'auto'} "
f"model={self.summary_model or self.model})"
)
# Redact the summary output as well — the summarizer LLM may
# ignore prompt instructions and echo back secrets verbatim.
summary = redact_sensitive_text(content.strip())
@ -1524,17 +1615,29 @@ This compaction should PRIORITISE preserving all information related to the focu
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
self._last_summary_auth_failure = False
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logger.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
return None
except Exception as e:
# ``call_llm`` raises ``RuntimeError`` for two very different cases:
# 1. No provider configured ("No LLM provider configured ...") —
# a permanent misconfiguration, long cooldown is correct.
# 2. An empty/invalid response from a configured provider
# (``_validate_llm_response`` empty-``choices``/``None``, or our
# empty-``content`` guard above) — a transient/proxy fault that
# should fall back to the main model first, exactly like the
# transport errors handled below.
# Only (1) belongs in the long no-provider cooldown; (2) and every
# other exception flow into the generic fallback logic so they get
# a main-model retry before any cooldown. (#11978, #11914)
if isinstance(e, RuntimeError) and "no llm provider configured" in str(e).lower():
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logger.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
return None
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
@ -1571,6 +1674,26 @@ This compaction should PRIORITISE preserving all information related to the focu
# back to the main model instead of entering a 60-second cooldown.
# See issue #18458.
_is_streaming_closed = _is_connection_error(e)
# Authentication / permission failures (401/403) are NOT transient
# and NOT fixable by retrying the same request: the credential is
# invalid/blocked/expired or the endpoint is wrong (e.g. a prod
# token sent to a staging inference URL). Flag them so compress()
# aborts and preserves the session instead of rotating into a
# degraded child with a placeholder summary. We still allow the
# one-shot fallback to the MAIN model below when the failure came
# from a distinct auxiliary summary_model (its dedicated creds may
# be the only broken thing); only a failure on the main model — or
# a fallback that also auth-fails — makes the abort stick.
_is_auth_error = (
_status in {401, 403}
or "invalid api key" in _err_str
or "invalid x-api-key" in _err_str
or ("api key" in _err_str and ("invalid" in _err_str or "blocked" in _err_str))
or "unauthorized" in _err_str
or "authentication" in _err_str
)
if _is_auth_error:
self._last_summary_auth_failure = True
if _is_json_decode and not _is_model_not_found and not _is_timeout:
logger.error(
"Context compression failed: auxiliary LLM returned a "
@ -1809,6 +1932,23 @@ This compaction should PRIORITISE preserving all information related to the focu
idx += 1
return idx
def _effective_protect_first_n(self) -> int:
"""``protect_first_n`` decayed across compression cycles.
``protect_first_n`` keeps the first N non-system messages verbatim so
the original task framing survives the FIRST compaction. But applying
it on every subsequent pass fossilizes those early turns: they're
re-copied into each child session and never summarized away, so old
user messages become immortal and grow the head unboundedly across a
long session (#11996). Once the session has been compressed at least
once, the early turns are already captured in the handoff summary, so
there's no need to keep re-protecting them: decay to 0 (the system
prompt is still always protected separately by _protect_head_size).
"""
if self.compression_count >= 1 or self._previous_summary:
return 0
return self.protect_first_n
def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
"""Total count of head messages to protect.
@ -1820,14 +1960,19 @@ This compaction should PRIORITISE preserving all information related to the focu
the ``messages`` list (e.g. the gateway ``/compress`` handler
strips it before calling compress()).
Examples:
The ``protect_first_n`` portion DECAYS after the first compression
(see _effective_protect_first_n) so early user turns don't fossilize
across repeated compactions (#11996).
Examples (first compaction):
protect_first_n=0 → system prompt only (or nothing if no system msg)
protect_first_n=3 → system + first 3 non-system messages
After the first compaction: system prompt only.
"""
head = 0
if messages and messages[0].get("role") == "system":
head = 1
return head + self.protect_first_n
return head + self._effective_protect_first_n()
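The decay rule and the head-size computation combine into a small pure function. The sketch below is a simplified approximation with a hypothetical name (`protect_head_size_sketch`) and plain arguments in place of the compressor's attributes.

```python
from typing import Any, Dict, List


def protect_head_size_sketch(
    messages: List[Dict[str, Any]],
    protect_first_n: int,
    compression_count: int,
    previous_summary: str = "",
) -> int:
    """Count of head messages to protect, with protect_first_n decayed to 0
    once the session has been compressed at least once."""
    effective_n = 0 if (compression_count >= 1 or previous_summary) else protect_first_n
    head = 1 if messages and messages[0].get("role") == "system" else 0
    return head + effective_n


messages = [{"role": "system"}] + [{"role": "user"} for _ in range(5)]
first_pass = protect_head_size_sketch(messages, protect_first_n=3, compression_count=0)
later_pass = protect_head_size_sketch(messages, protect_first_n=3, compression_count=1)
```

On the first compaction the system prompt plus three early messages are protected; on every later pass only the system prompt is, so the early turns can be summarized away.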
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Pull a compress-end boundary backward to avoid splitting a
@ -2178,6 +2323,7 @@ This compaction should PRIORITISE preserving all information related to the focu
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compress_aborted = False
self._last_summary_auth_failure = False
# Manual /compress (force=True) bypasses the failure cooldown so the
# user can retry immediately after an auto-compress abort. Without
@ -2293,19 +2439,38 @@ This compaction should PRIORITISE preserving all information related to the focu
# _last_summary_dropped_count for gateway hygiene to
# surface a warning.
# Default is False (historical behavior).
if not summary and self.abort_on_summary_failure:
#
# EXCEPTION — auth failures always abort. A 401/403 from the summary
# call means the credential or endpoint is broken (invalid/blocked
# key, or a token pointed at the wrong inference host). Rotating into
# a child session with a placeholder summary on a broken credential
# strands the user on a degraded session for zero benefit — every
# subsequent call fails the same way. So when the failure was an auth
# error we abort regardless of abort_on_summary_failure, preserving
# the conversation unchanged until the credential is fixed.
if not summary and (self.abort_on_summary_failure or self._last_summary_auth_failure):
n_skipped = compress_end - compress_start
self._last_summary_dropped_count = 0 # nothing actually dropped
self._last_summary_fallback_used = False
self._last_compress_aborted = True
if not self.quiet_mode:
logger.warning(
"Summary generation failed — aborting compression "
"(compression.abort_on_summary_failure=true). "
"%d message(s) preserved unchanged. Conversation is "
"frozen until the next /compress or /new.",
n_skipped,
)
if self._last_summary_auth_failure:
logger.warning(
"Summary generation failed with an authentication "
"error — aborting compression. %d message(s) preserved "
"unchanged; the session was NOT rotated. Check your "
"provider credential / inference endpoint, then retry "
"with /compress or start fresh with /new.",
n_skipped,
)
else:
logger.warning(
"Summary generation failed — aborting compression "
"(compression.abort_on_summary_failure=true). "
"%d message(s) preserved unchanged. Conversation is "
"frozen until the next /compress or /new.",
n_skipped,
)
return messages
# Phase 4: Assemble compressed message list
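The auth-failure handling in this file comes down to two decisions: classify the error, then decide whether to abort. The sketch below restates both as pure functions. The names are hypothetical, and lowercasing the error text inside the helper is an assumption, since the diff does not show how `_status` and `_err_str` are derived.

```python
from typing import Optional


def looks_like_auth_error(status: Optional[int], err_text: str) -> bool:
    """Approximate the 401/403 and message-based classification."""
    err = err_text.lower()  # assumption: the original compares lowercased text
    return (
        status in {401, 403}
        or "invalid api key" in err
        or "invalid x-api-key" in err
        or ("api key" in err and ("invalid" in err or "blocked" in err))
        or "unauthorized" in err
        or "authentication" in err
    )


def should_abort_compression(summary: Optional[str], abort_on_summary_failure: bool,
                             auth_failure: bool) -> bool:
    """Abort when there is no summary and either the config flag or an auth failure demands it."""
    return (not summary) and (abort_on_summary_failure or auth_failure)


auth = looks_like_auth_error(None, "API key blocked by administrator")
abort = should_abort_compression(None, abort_on_summary_failure=False, auth_failure=auth)
```

Even with `abort_on_summary_failure` left at its default of `False`, an auth-classified failure forces the abort path, matching the exception described in the comments above.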

View file

@ -328,6 +328,16 @@ def compress_context(
agent._compression_feasibility_checked = True
_pre_msg_count = len(messages)
# In-place compaction (config: compression.in_place, see #38763). When True,
# this compaction rewrites the message list + rebuilds the system prompt but
# keeps the SAME session_id — no end_session, no parent_session_id child, no
# `name #N` renumber, no contextvar/env/logging re-sync, no memory/context-
# engine session-switch. The conversation keeps one durable id for life,
# eliminating the session-rotation bug cluster. Default False during rollout.
in_place = bool(getattr(agent, "compression_in_place", False))
# Set True once the in-place DB write actually completes (the DB block can
# raise and skip it). Surfaced to the gateway via agent._last_compaction_in_place.
compacted_in_place = False
logger.info(
"context compression started: session=%s messages=%d tokens=~%s model=%s focus=%r",
agent.session_id or "none", _pre_msg_count,
@ -508,125 +518,244 @@ def compress_context(
if agent._session_db:
try:
# Propagate title to the new session with auto-numbering
old_title = agent._session_db.get_session_title(agent.session_id)
# Trigger memory extraction on the old session before it rotates.
# Trigger memory extraction on the current session before the
# transcript is rewritten (runs in BOTH modes — the logical
# conversation's pre-compaction turns are about to be summarized
# away regardless of whether the id rotates).
agent.commit_memory_session(messages)
# Flush any un-persisted messages from the current turn to the
# old session *before* rotating. compress_context() can be
# called mid-turn (auto-compress when context exceeds threshold)
# at a point when _flush_messages_to_session_db() has not yet
# run. Without this, messages generated during the current turn
# are silently lost on session rotation (#47202).
try:
agent._flush_messages_to_session_db(messages)
except Exception:
pass # best-effort — don't block compression on a flush error
agent._session_db.end_session(agent.session_id, "compression")
old_session_id = agent.session_id
agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
# Ordering contract: the agent thread updates the contextvar here;
# the gateway propagates to SessionEntry after run_in_executor returns.
try:
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
os.environ["HERMES_SESSION_ID"] = agent.session_id
# The gateway/tools session context (ContextVar + env) and the
# logging session context are SEPARATE mechanisms. The call above
# moves the former; the ``[session_id]`` tag on log lines comes
# from ``hermes_logging._session_context`` (set once per turn in
# conversation_loop.py). Without this, post-rotation log lines in
# the same turn keep the STALE old id while the message/DB/gateway
# state carry the new one — breaking log correlation exactly at the
# compaction boundary (see #34089). Guarded separately so a logging
# failure can never regress the routing update above.
try:
from hermes_logging import set_session_context
set_session_context(agent.session_id)
except Exception:
pass
agent._session_db_created = False
agent._session_db.create_session(
session_id=agent.session_id,
source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
model=agent.model,
model_config=agent._session_init_model_config,
parent_session_id=old_session_id,
)
agent._session_db_created = True
# Auto-number the title for the continuation session
if old_title:
if in_place:
# ── In-place compaction: keep the same session_id ──────────
# No end_session, no new row, no parent_session_id, no title
# renumber, no contextvar/env/logging re-sync. The session's
# id, title, cwd, /goal, and gateway routing all stay put.
#
# Durable, NON-DESTRUCTIVE replace: soft-archive the
# pre-compaction turns (active=0, kept on disk + FTS-searchable +
# recoverable) and insert `compressed` as the new live (active=1)
# set, atomically. `compressed` already carries the surviving
# tail (current-turn messages the compressor kept via
# protect_last_n), so we DON'T pre-flush here — a flush would
# INSERT current-turn rows that archive_and_compact would then
# archive alongside the rest (harmless but wasted writes). The
# live-context load filters active=1, so a resume reloads ONLY
# the compacted set; the original turns remain under the SAME id
# for search/recovery (Teknium review — keep one durable id
# WITHOUT destroying history, unlike a hard replace_messages).
# See #38763.
agent._session_db.archive_and_compact(agent.session_id, compressed)
# Reset the flush identity set so the next turn's appends are
# diffed against the COMPACTED transcript: the compacted dicts
# are passed as conversation_history next turn and skipped by
# identity, so only genuinely new turn messages get appended
# (no dup of the summary, no resurrection of dropped turns).
agent._flushed_db_message_ids = set()
# Rotation-independent signal: the conversation was compacted in
# place (id unchanged). The gateway reads this (NOT an id-change
# diff) to re-baseline transcript handling.
compacted_in_place = True
else:
# ── Rotation (legacy): end this session, fork a continuation ─
# Flush any un-persisted current-turn messages to the OLD
# session before ending it, so they survive in the preserved
# parent transcript (#47202). (In-place skips this — see above.)
try:
new_title = agent._session_db.get_next_title_in_lineage(old_title)
agent._session_db.set_session_title(agent.session_id, new_title)
except (ValueError, Exception) as e:
logger.debug("Could not propagate title on compression: %s", e)
agent._flush_messages_to_session_db(messages)
except Exception:
pass # best-effort — don't block compression on a flush error
# Propagate title to the new session with auto-numbering
old_title = agent._session_db.get_session_title(agent.session_id)
agent._session_db.end_session(agent.session_id, "compression")
old_session_id = agent.session_id
agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
# Ordering contract: the agent thread updates the contextvar here;
# the gateway propagates to SessionEntry after run_in_executor returns.
try:
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
os.environ["HERMES_SESSION_ID"] = agent.session_id
# The gateway/tools session context (ContextVar + env) and the
# logging session context are SEPARATE mechanisms. The call above
# moves the former; the ``[session_id]`` tag on log lines comes
# from ``hermes_logging._session_context`` (set once per turn in
# conversation_loop.py). Without this, post-rotation log lines in
# the same turn keep the STALE old id while the message/DB/gateway
# state carry the new one — breaking log correlation exactly at the
# compaction boundary (see #34089). Guarded separately so a logging
# failure can never regress the routing update above.
try:
from hermes_logging import set_session_context
set_session_context(agent.session_id)
except Exception:
pass
agent._session_db_created = False
try:
agent._session_db.create_session(
session_id=agent.session_id,
source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
model=agent.model,
model_config=agent._session_init_model_config,
parent_session_id=old_session_id,
)
except Exception as _cs_err:
# The child row could not be created (e.g. FK constraint,
# contended write). Previously the outer handler simply
# warned and let the agent continue on the NEW id — which
# has no row in state.db, producing an orphan: the parent
# is ended, the child is never indexed, and every
# subsequent message is attributed to a session that
# doesn't exist (#33906/#33907). Roll the live id back to
# the parent so the conversation stays attached to a real,
# indexed session instead of a phantom.
logger.warning(
"Compression child session create failed (%s) — "
"rolling back to parent session %s to avoid an orphan.",
_cs_err, old_session_id,
)
agent.session_id = old_session_id
try:
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
os.environ["HERMES_SESSION_ID"] = agent.session_id
try:
from hermes_logging import set_session_context
set_session_context(agent.session_id)
except Exception:
pass
# Re-open the parent: it was ended above, but we're
# continuing on it, so it must not stay closed.
try:
agent._session_db.reopen_session(old_session_id)
except Exception:
pass
old_session_id = None # no rotation happened
# The parent row already exists in state.db, so mark the
# session as created — _ensure_db_session would otherwise
# retry a (harmless INSERT OR IGNORE) create next turn.
agent._session_db_created = True
raise
agent._session_db_created = True
# Carry a persistent /goal onto the continuation session.
# Compression mints a fresh child id; load_goal does a flat
# per-session lookup with no parent walk, so without this an
# active goal silently dies at the boundary (#33618).
try:
from hermes_cli.goals import migrate_goal_to_session
migrate_goal_to_session(old_session_id, agent.session_id, reason="compression")
except Exception as _goal_err:
logger.debug("Could not migrate goal on compression: %s", _goal_err)
# Auto-number the title for the continuation session
if old_title:
try:
new_title = agent._session_db.get_next_title_in_lineage(old_title)
agent._session_db.set_session_title(agent.session_id, new_title)
except (ValueError, Exception) as e:
logger.debug("Could not propagate title on compression: %s", e)
# Shared post-write steps (both modes target agent.session_id, which
# in-place keeps and rotation has already reassigned to the new id):
# refresh the stored system prompt and reset the flush cursor so the
# next turn re-bases its append diff.
agent._session_db.update_system_prompt(agent.session_id, new_system_prompt)
# Reset flush cursor — new session starts with no messages written
agent._last_flushed_db_idx = 0
except Exception as e:
logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
# If the rotation rolled back to the parent (orphan-avoidance
# above), agent.session_id is the still-indexed parent and
# old_session_id was cleared — so this is recovery, not an
# un-indexed orphan. Otherwise an earlier step failed before the
# child was created and the warning's original meaning holds.
if locals().get("old_session_id") is None and not in_place:
logger.warning(
"Compression rotation aborted and rolled back to the "
"parent session (%s): %s", agent.session_id or "?", e,
)
else:
logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
# Notify the context engine that the session_id rotated because of
# compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
# boundary_reason="compression" to preserve DAG lineage across the
# rollover instead of re-initializing fresh per-session state.
# See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
# Compaction-boundary bookkeeping, computed once. `old_session_id` is only
# bound in the rotation branch; in-place leaves it unset. `_boundary_parent`
# is the id the boundary notifications attribute the prior state to: the old
# id on rotation, the (unchanged) current id in-place.
_old_sid = locals().get("old_session_id")
_is_boundary = bool(_old_sid) or in_place
_boundary_parent = _old_sid or agent.session_id or ""
# Notify the context engine that a compaction boundary occurred. Plugin
# engines (e.g. hermes-lcm) use boundary_reason="compression" to preserve
# DAG lineage / checkpoint per-session state across the boundary instead of
# re-initializing fresh. See hermes-lcm#68. Built-in ContextCompressor
# ignores kwargs. Fires in BOTH modes: rotation passes old→new ids; in-place
# passes the SAME id (the boundary is real even though the id didn't move).
try:
_old_sid = locals().get("old_session_id")
if _old_sid and hasattr(agent.context_compressor, "on_session_start"):
if _is_boundary and hasattr(agent.context_compressor, "on_session_start"):
agent.context_compressor.on_session_start(
agent.session_id or "",
boundary_reason="compression",
old_session_id=_old_sid,
old_session_id=_boundary_parent,
platform=getattr(agent, "platform", None) or "cli",
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
# Notify memory providers of the compression-driven session_id rotation
# so provider-cached per-session state (Hindsight's _document_id,
# accumulated turn buffers, counters) refreshes. reset=False because
# the logical conversation continues; only the id and DB row rolled
# over. See #6672.
# Notify memory providers of the compaction boundary so provider-cached
# per-session state (Hindsight's _document_id, accumulated turn buffers,
# counters) refreshes. reset=False because the logical conversation
# continues. See #6672. Fires in BOTH modes: in-place uses the same id as
# parent (the conversation didn't fork, but the buffer must still be told
# the transcript was compacted so it doesn't double-count dropped turns).
try:
_old_sid = locals().get("old_session_id")
if _old_sid and agent._memory_manager:
if _is_boundary and agent._memory_manager:
agent._memory_manager.on_session_switch(
agent.session_id or "",
parent_session_id=_old_sid,
parent_session_id=_boundary_parent,
reset=False,
reason="compression",
)
except Exception as _me_err:
logger.debug("memory manager on_session_switch (compression): %s", _me_err)
# Warn on repeated compressions (quality degrades with each pass)
# Warn on repeated compressions (quality degrades with each pass).
# Route through _emit_status (like the other compression warnings above)
# so the warning reaches the TUI / Telegram / Discord via status_callback,
# not just CLI stdout. _emit_status still _vprints for the CLI, and
# storing it on _compression_warning lets replay_compression_warning
# re-deliver it once a late-bound gateway status_callback is wired (#36908).
_cc = agent.context_compressor.compression_count
if _cc >= 2:
agent._vprint(
_cc_msg = (
f"{agent.log_prefix}⚠️ Session compressed {_cc} times — "
f"accuracy may degrade. Consider /new to start fresh.",
force=True,
f"accuracy may degrade. Consider /new to start fresh."
)
agent._compression_warning = _cc_msg
agent._emit_status(_cc_msg)
# Emit session:compress event so hooks (e.g. MemPalace sync) can ingest
# the completed old session before its details are lost.
_old_sid_for_event = locals().get("old_session_id")
# the completed old session before its details are lost. In in-place mode
# there is no old id (same session); ``in_place=True`` tells hooks the
# transcript was compacted on the same id rather than rotated.
if getattr(agent, "event_callback", None):
try:
agent.event_callback("session:compress", {
"platform": agent.platform or "",
"session_id": agent.session_id,
"old_session_id": _old_sid_for_event or "",
"old_session_id": _old_sid or "",
"in_place": in_place,
"compression_count": agent.context_compressor.compression_count,
})
except Exception as e:
logger.debug("event_callback error on session:compress: %s", e)
# Surface the compaction mode to the caller (run_conversation / gateway)
# via a rotation-independent flag. The gateway uses this — NOT an
# id-change diff — to re-baseline transcript handling (history_offset=0 +
# rewrite on the same id) when compaction happened in place. See #38763.
agent._last_compaction_in_place = compacted_in_place
# Keep the post-compression rough estimate for diagnostics, but do not
# treat it as provider-reported prompt usage. Schema-heavy rough estimates
# can remain above threshold even after the next real API request fits.
@ -712,33 +841,58 @@ def try_shrink_image_parts_in_messages(
# actually brought under the target.
unshrinkable_oversized = 0
def _shrink_data_url(url: str) -> Optional[str]:
"""Return a smaller data URL, or None if shrink can't help."""
if not isinstance(url, str) or not url.startswith("data:"):
def _decode_pixels(data_url: str) -> Optional[tuple]:
"""Return ``(width, height)`` of a base64 data URL, or None on failure.
Soft-depends on Pillow; returns None (caller falls back to a
bytes-only check) if Pillow is missing or the payload is corrupt.
"""
try:
import base64 as _b64_dim
import io as _io_dim
header_d, _, data_d = data_url.partition(",")
if not data_d or not data_url.startswith("data:"):
return None
from PIL import Image as _PILImage
with _PILImage.open(_io_dim.BytesIO(_b64_dim.b64decode(data_d))) as _img:
return _img.size
except Exception:
return None
# Check both byte size AND pixel dimensions.
def _shrink_data_url(url: str) -> tuple:
"""Return ``(resized_url, unshrinkable)`` for a data URL.
``resized_url`` is a smaller/dimension-correct data URL, or None when
no rewrite was applied. ``unshrinkable`` is True only when the image
exceeded a constraint (byte-size or dimensions) and the resize failed
to satisfy *that same* constraint, so the caller knows retrying is
pointless even if a different image in the request shrank.
"""
if not isinstance(url, str) or not url.startswith("data:"):
return None, False
# Determine which constraint is binding. The accept/reject gate below
# MUST be checked against the same axis that triggered the shrink: a
# downscaled screenshot PNG routinely re-encodes to *more* bytes than
# the original (PNG compression is non-monotonic in image size — a
# smaller raster with LANCZOS resampling noise compresses worse than a
# larger smooth one). Rejecting a pixel-correct downscale purely
# because its bytes grew permanently wedges sessions on the Anthropic
# many-image 2000px path (#48013).
needs_shrink = len(url) > target_bytes # over byte budget
triggered_by = "bytes" if needs_shrink else None
if not needs_shrink:
# Even if bytes are fine, check pixel dimensions against the
# provider's reported per-side cap. A screenshot can be tiny in
# bytes yet too large in pixels.
try:
import base64 as _b64_dim
header_d, _, data_d = url.partition(",")
if not data_d:
return None
raw_d = _b64_dim.b64decode(data_d)
from PIL import Image as _PILImage
import io as _io_dim
with _PILImage.open(_io_dim.BytesIO(raw_d)) as _img:
if max(_img.size) <= max_dimension:
return None # both bytes and pixels are fine
needs_shrink = True # pixels exceed limit, force shrink
except Exception:
# If we can't check dimensions (Pillow unavailable, corrupt
# image, etc.), fall back to byte-only check.
return None
# Bytes are fine — check pixel dimensions against the provider's
# reported per-side cap. A screenshot can be tiny in bytes yet
# too large in pixels.
dims = _decode_pixels(url)
if dims is None:
# Pillow missing or corrupt data — fall back to byte-only.
return None, False
if max(dims) <= max_dimension:
return None, False # both bytes and pixels are within limits
needs_shrink = True
triggered_by = "dimension"
try:
header, _, data = url.partition(",")
@ -770,13 +924,45 @@ def try_shrink_image_parts_in_messages(
Path(tmp.name).unlink(missing_ok=True)
except Exception:
pass
if not resized or len(resized) >= len(url):
# Shrink didn't help (or made it bigger — corrupt input?).
return None
return resized
if not resized:
# Resize returned nothing — Pillow couldn't help.
return None, True
if triggered_by == "bytes":
# Byte budget is the binding constraint — bytes must shrink.
if len(resized) >= len(url):
return None, True # re-encode made it bigger
# The per-side dimension cap is ALSO an active provider
# constraint on this request (the caller passes the parsed cap
# to both this helper and the resizer). _resize_image_for_vision
# returns a best-effort, possibly-over-cap blob when it
# exhausts its halving budget — it freezes the long side once
# the short side hits its 64px floor, so a very-high-aspect
# image can stay over the cap even after bytes shrank. If the
# output is still over the cap, retrying would re-400 on
# dimensions; treat it as unshrinkable. (Skip when dims can't
# be decoded — preserves historical byte-only behaviour.)
new_dims = _decode_pixels(resized)
if new_dims is not None and max(new_dims) > max_dimension:
return None, True
return resized, False
# triggered_by == "dimension": the per-side cap is binding. The
# re-encode may have grown in bytes; accept it as long as it is now
# within the dimension cap. Verify the new dimensions when we can.
new_dims = _decode_pixels(resized)
if new_dims is not None:
if max(new_dims) <= max_dimension:
return resized, False
# Still over the per-side cap — the resize didn't satisfy it.
return None, True
# Couldn't verify the re-encode's dimensions (corrupt output or
# Pillow gone mid-call). Fall back to the historical "bytes must
# shrink" gate so we never accept an unverifiable, byte-larger blob.
if len(resized) >= len(url):
return None, True
return resized, False
except Exception as exc:
logger.warning("image-shrink recovery: re-encode failed — %s", exc)
return None
return None, triggered_by is not None
for msg in api_messages:
if not isinstance(msg, dict):
@ -795,20 +981,18 @@ def try_shrink_image_parts_in_messages(
# OpenAI Responses: {"image_url": "data:..."}
if isinstance(image_value, dict):
url = image_value.get("url", "")
resized = _shrink_data_url(url)
resized, unshrinkable = _shrink_data_url(url)
if resized:
image_value["url"] = resized
changed_count += 1
elif isinstance(url, str) and url.startswith("data:") \
and len(url) > target_bytes:
elif unshrinkable:
unshrinkable_oversized += 1
elif isinstance(image_value, str):
resized = _shrink_data_url(image_value)
resized, unshrinkable = _shrink_data_url(image_value)
if resized:
part["image_url"] = resized
changed_count += 1
elif image_value.startswith("data:") \
and len(image_value) > target_bytes:
elif unshrinkable:
unshrinkable_oversized += 1
if changed_count:
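The accept/reject gate in `_shrink_data_url` above can be sketched in isolation. This is a minimal illustration, not the project's implementation: `accept_resize` and its parameters are hypothetical names, and it assumes the resizer already produced a non-empty result.

```python
from typing import Optional, Tuple


def accept_resize(
    triggered_by: str,
    old_len: int,
    new_len: int,
    new_dims: Optional[Tuple[int, int]],
    max_dimension: int,
) -> Tuple[bool, bool]:
    """Return (accepted, unshrinkable) for a re-encoded image.

    Hypothetical helper: mirrors the idea that the gate is checked against
    the same axis ("bytes" or "dimension") that triggered the shrink.
    """
    # None means the dimensions could not be decoded (e.g. Pillow missing).
    over_cap = new_dims is not None and max(new_dims) > max_dimension

    if triggered_by == "bytes":
        # Byte budget is binding: bytes must shrink, and a verifiably
        # over-cap result is still rejected.
        if new_len >= old_len or over_cap:
            return False, True
        return True, False

    # triggered_by == "dimension": byte growth is tolerated as long as the
    # new dimensions are verified to be within the cap.
    if new_dims is not None:
        return (not over_cap), over_cap

    # Dimensions unverifiable: fall back to the "bytes must shrink" gate.
    if new_len >= old_len:
        return False, True
    return True, False
```

A dimension-triggered downscale that grew in bytes is accepted here, while the same byte growth under a byte-triggered shrink is reported as unshrinkable.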

View file

@ -466,6 +466,32 @@ def _content_policy_blocked_result(
}
def _sync_failover_system_message(agent, api_messages, active_system_prompt):
"""Refresh the in-flight system message after a provider failover.
``try_activate_fallback`` rewrites the ``Model:``/``Provider:`` identity
lines on ``agent._cached_system_prompt`` (see
``rewrite_prompt_model_identity``) so the agent reports the model that is
actually answering. But the current call block's ``api_messages`` were
built from the pre-failover prompt, and the retry loop rebuilds
``api_kwargs`` from that list each iteration. Without this sync, the
whole turn (and every gateway turn, since fallback re-activates per
message while the primary is down) ships the stale identity.
Mutates ``api_messages[0]`` in place and returns the prompt to use as
``active_system_prompt`` for subsequent call-block rebuilds.
"""
sp = getattr(agent, "_cached_system_prompt", None)
if not isinstance(sp, str) or not sp:
return active_system_prompt
if api_messages and api_messages[0].get("role") == "system":
effective = sp
if agent.ephemeral_system_prompt:
effective = (effective + "\n\n" + agent.ephemeral_system_prompt).strip()
api_messages[0]["content"] = effective
return sp
def run_conversation(
agent,
user_message: str,
@ -940,6 +966,8 @@ def run_conversation(
)
agent._buffer_status(f"{_nous_msg}")
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
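The effect of the sync call at each failover site can be seen with a stub agent. This is a sketch under stated assumptions: `sync_system_message` restates the helper's logic from the hunk above, and the `SimpleNamespace` agent, model names, and prompts are made up for illustration.

```python
from types import SimpleNamespace


def sync_system_message(agent, api_messages, active_system_prompt):
    """Copy the post-failover cached prompt into the in-flight system message."""
    sp = getattr(agent, "_cached_system_prompt", None)
    if not isinstance(sp, str) or not sp:
        # Nothing cached: keep whatever prompt the caller was using.
        return active_system_prompt
    if api_messages and api_messages[0].get("role") == "system":
        effective = sp
        if agent.ephemeral_system_prompt:
            effective = (effective + "\n\n" + agent.ephemeral_system_prompt).strip()
        # Mutate in place so the next api_kwargs rebuild sees the new identity.
        api_messages[0]["content"] = effective
    return sp


# Hypothetical state right after a fallback was activated.
agent = SimpleNamespace(
    _cached_system_prompt="Model: fallback-model",
    ephemeral_system_prompt="Be brief.",
)
api_messages = [
    {"role": "system", "content": "Model: primary-model\n\nBe brief."},
    {"role": "user", "content": "hi"},
]
active = sync_system_message(agent, api_messages, "Model: primary-model")
```

Only the first message is rewritten; the returned value is the cached prompt without the ephemeral suffix, matching what the helper hands back as `active_system_prompt`.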
@ -1265,6 +1293,8 @@ def run_conversation(
if agent._fallback_index < len(agent._fallback_chain):
agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@ -1336,6 +1366,8 @@ def run_conversation(
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@ -1479,6 +1511,8 @@ def run_conversation(
"⚠️ Model declined to respond (safety refusal) — trying fallback..."
)
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@ -2783,11 +2817,46 @@ def run_conversation(
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
continue
# ── Auth-failure provider failover ───────────────────────
# A 401/403 that survives the per-provider credential-refresh
# attempt above (each guarded by its own
# ``*_auth_retry_attempted`` flag) means the active provider's
# credential or endpoint is broken in a way refreshing can't
# fix (revoked OAuth, blocked/expired key, an account pinned to
# a dead/staging endpoint). Previously the loop only printed
# "switch providers manually" advice and fell through, so a
# user with a configured fallback chain kept thrashing on the
# same dead credential every turn instead of failing over.
# Escalate to the fallback chain here, mirroring the rate-
# limit/billing failover above. When no fallback is configured
# (or the chain is exhausted), _try_activate_fallback returns
# False and we fall through to the existing terminal handling
# + provider-specific troubleshooting guidance unchanged.
if (
classified.is_auth
and not _retry.auth_failover_attempted
and agent._fallback_index < len(agent._fallback_chain)
):
_retry.auth_failover_attempted = True
agent._buffer_status(
"🔐 Authentication failed and could not be refreshed — "
"switching to fallback provider..."
)
if agent._try_activate_fallback(reason=classified.reason):
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
continue
# ── Nous Portal: record rate limit & skip retries ─────
# When Nous returns a 429 that is a genuine account-
# level rate limit, record the reset time to a shared
@ -2914,6 +2983,7 @@ def run_conversation(
agent._buffer_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
original_tokens = estimate_messages_tokens_rough(messages)
messages, active_system_prompt = agent._compress_context(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
@ -2923,8 +2993,18 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
if len(messages) < original_len:
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
# can materially reduce request size without reducing the
# message array. (#39550)
new_tokens = estimate_messages_tokens_rough(messages)
approx_tokens = new_tokens # update for downstream logging
if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95):
if len(messages) < original_len:
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
else:
agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
time.sleep(2) # Brief pause between compression retries
_retry.restart_with_compressed_messages = True
break
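The retry condition introduced in this hunk treats compression as progress when either the message count drops or the rough token estimate falls by more than 5%. A minimal sketch of that predicate, with `compression_made_progress` and `threshold` as hypothetical names:

```python
def compression_made_progress(
    original_len: int,
    new_len: int,
    original_tokens: int,
    new_tokens: int,
    threshold: float = 0.95,
) -> bool:
    """Return True when a compression pass is worth retrying on.

    Hypothetical helper mirroring the condition in the diff: fewer
    messages, or a token estimate below ``threshold`` of the original.
    """
    if new_len < original_len:
        return True
    # Same message count: require a material drop in estimated tokens.
    # A zero estimate is treated as "unknown", not as progress.
    return new_tokens > 0 and new_tokens < original_tokens * threshold
```

With this shape, tool-result pruning that keeps the message count unchanged still triggers a retry as long as the estimate shrinks enough.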
@ -3070,6 +3150,7 @@ def run_conversation(
agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
original_len = len(messages)
original_tokens = estimate_messages_tokens_rough(messages)
messages, active_system_prompt = agent._compress_context(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
@ -3079,9 +3160,18 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
# can materially reduce request size without reducing the
# message array. (#39550)
new_tokens = estimate_messages_tokens_rough(messages)
approx_tokens = new_tokens # update for downstream logging
if len(messages) < original_len or (new_tokens > 0 and new_tokens < original_tokens * 0.95) or (new_ctx and new_ctx < old_ctx):
if len(messages) < original_len:
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
elif new_tokens > 0 and new_tokens < original_tokens * 0.95:
agent._buffer_status(f"🗜️ Compressed ~{original_tokens:,} → ~{new_tokens:,} tokens, retrying...")
time.sleep(2) # Brief pause between compression retries
_retry.restart_with_compressed_messages = True
break
@ -3090,13 +3180,13 @@ def run_conversation(
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
logger.error(f"{agent.log_prefix}Context length exceeded: {new_tokens:,} tokens. Cannot compress further.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
"completed": False,
"api_calls": api_call_count,
"error": f"Context length exceeded ({approx_tokens:,} tokens). Cannot compress further.",
"error": f"Context length exceeded ({new_tokens:,} tokens). Cannot compress further.",
"partial": True,
"failed": True,
"compression_exhausted": True,
@ -3186,6 +3276,8 @@ def run_conversation(
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@ -3197,15 +3289,22 @@ def run_conversation(
# Terminal — flush buffered context so the user sees
# what was tried before the abort.
agent._flush_status_buffer()
# Summarize once: Cloudflare/proxy HTML challenge pages and
# other raw provider bodies must be collapsed to a short
# one-liner here, otherwise the full page leaks into the
# returned ``error`` field and downstream consumers deliver
# it verbatim (e.g. a cron failure notification dumped a
# ~60KB Cloudflare challenge page as 31 Discord messages).
_nonretryable_summary = agent._summarize_api_error(api_error)
if classified.reason == FailoverReason.content_policy_blocked:
agent._emit_status(
f"❌ Provider safety filter blocked this request: "
f"{agent._summarize_api_error(api_error)}"
f"{_nonretryable_summary}"
)
else:
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
f"{agent._summarize_api_error(api_error)}"
f"{_nonretryable_summary}"
)
agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
@ -3290,18 +3389,17 @@ def run_conversation(
else:
agent._persist_session(messages, conversation_history)
if classified.reason == FailoverReason.content_policy_blocked:
_summary = agent._summarize_api_error(api_error)
_policy_response = (
"⚠️ The model provider's safety filter blocked this request "
"(not a Hermes/gateway failure).\n\n"
f"Provider message: {_summary}\n\n"
f"Provider message: {_nonretryable_summary}\n\n"
f"{_CONTENT_POLICY_RECOVERY_HINT}"
)
return _content_policy_blocked_result(
messages,
api_call_count,
final_response=_policy_response,
error_detail=_summary,
error_detail=_nonretryable_summary,
)
return {
"final_response": None,
@ -3309,7 +3407,7 @@ def run_conversation(
"api_calls": api_call_count,
"completed": False,
"failed": True,
"error": str(api_error),
"error": _nonretryable_summary,
}
if retry_count >= max_retries:
@ -3327,6 +3425,8 @@ def run_conversation(
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
@ -4273,6 +4373,8 @@ def run_conversation(
"switching to fallback provider..."
)
if agent._try_activate_fallback():
active_system_prompt = _sync_failover_system_message(
agent, api_messages, active_system_prompt)
agent._empty_content_retries = 0
agent._buffer_status(
f"↻ Switched to fallback: {agent.model} "

View file

@ -15,6 +15,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import load_env
from agent.secret_scope import get_secret as _get_secret
from agent.credential_persistence import (
is_borrowed_credential_source,
sanitize_borrowed_credential_payload,
@ -1666,7 +1667,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
_env_file = load_env()
def _env_val(key: str) -> str:
return (_env_file.get(key) or os.environ.get(key) or "").strip()
return (_env_file.get(key) or _get_secret(key, "") or "").strip()
anthropic_api_key = _env_val("ANTHROPIC_API_KEY")
anthropic_oauth_env = (
@ -1952,7 +1953,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
# changes to the .env file.
def _get_env_prefer_dotenv(key: str) -> str:
env_file = load_env()
val = env_file.get(key) or os.environ.get(key) or ""
val = env_file.get(key) or _get_secret(key, "") or ""
return val.strip()
# Honour user suppression — `hermes auth remove <provider> <N>` for an
@ -2061,19 +2062,34 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
return changed, active_sources
def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources: Set[str]) -> bool:
def _prune_stale_seeded_entries(
entries: List[PooledCredential],
active_sources: Set[str],
*,
prune_env_sources: bool = True,
) -> bool:
def _is_prunable(entry: PooledCredential) -> bool:
# ``env:*`` entries are persisted references that get re-hydrated from
# the environment on every load. A process that merely lacks the env
var during this call must NOT delete the on-disk entry for every other
# process — that destructive read is the bug behind #9331. Only prune
# an env source when ``prune_env_sources`` is explicitly requested
# (e.g. an `hermes auth` command that confirmed the source is gone).
if entry.source.startswith("env:"):
return prune_env_sources
# File-backed singletons (device-code OAuth, claude_code) and Hermes
# PKCE should disappear from the pool when their backing file is gone.
return (
is_borrowed_credential_source(entry.source, entry.provider)
or entry.source == "hermes_pkce"
)
retained = [
entry
for entry in entries
if _is_manual_source(entry.source)
or entry.source in active_sources
or not (
is_borrowed_credential_source(entry.source, entry.provider)
# Hermes PKCE is Hermes-owned/persistable while present, but it is
# still a file-backed singleton and should disappear from the pool
# when the backing OAuth file is gone.
or entry.source == "hermes_pkce"
)
or not _is_prunable(entry)
]
if len(retained) == len(entries):
return False
@ -2173,7 +2189,15 @@ def load_pool(provider: str) -> CredentialPool:
singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
env_changed, env_sources = _seed_from_env(provider, entries)
changed = raw_needs_sanitization or singleton_changed or env_changed
changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
# ``load_pool()`` is a non-destructive read for env-seeded entries: a
# process missing a provider env var must not delete the persisted
# pool entry for every other process (#9331). File-backed singletons
# still prune when their backing file is gone.
changed |= _prune_stale_seeded_entries(
entries,
singleton_sources | env_sources,
prune_env_sources=False,
)
changed |= _normalize_pool_priorities(provider, entries)
if changed:
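The non-destructive read in `load_pool()` can be illustrated with a simplified pruning function. This is a sketch, not the project's code: `Entry`, `FILE_BACKED_SOURCES`, and `prune` are hypothetical stand-ins, the manual-source and borrowed-source checks are collapsed into one set lookup, and the function returns the retained list instead of mutating in place and returning a bool.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Entry:
    source: str


# Hypothetical stand-in for the file-backed singleton sources.
FILE_BACKED_SOURCES = {"hermes_pkce", "claude_code"}


def prune(
    entries: List[Entry],
    active_sources: Set[str],
    *,
    prune_env_sources: bool = True,
) -> List[Entry]:
    """Return the entries that survive pruning."""

    def is_prunable(entry: Entry) -> bool:
        # env:* entries are only pruned when explicitly requested.
        if entry.source.startswith("env:"):
            return prune_env_sources
        # File-backed singletons go away when their backing file is gone.
        return entry.source in FILE_BACKED_SOURCES

    return [
        e for e in entries
        if e.source in active_sources or not is_prunable(e)
    ]


entries = [Entry("env:OPENAI_API_KEY"), Entry("hermes_pkce"), Entry("manual")]
# A read path, as in load_pool(): the env entry survives even though the
# variable is absent from this process.
kept_on_read = prune(entries, set(), prune_env_sources=False)
# An explicit removal path: the env entry is dropped.
kept_on_remove = prune(entries, set())
```

Keeping the flag keyword-only makes the destructive behavior an explicit choice at each call site.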

View file

@ -1,909 +0,0 @@
"""OpenAI-compatible facade that talks to Google's Cloud Code Assist backend.
This adapter lets Hermes use the ``google-gemini-cli`` provider as if it were
a standard OpenAI-shaped chat completion endpoint, while the underlying HTTP
traffic goes to ``cloudcode-pa.googleapis.com/v1internal:{generateContent,
streamGenerateContent}`` with a Bearer access token obtained via OAuth PKCE.
Architecture
------------
- ``GeminiCloudCodeClient`` exposes ``.chat.completions.create(**kwargs)``
mirroring the subset of the OpenAI SDK that ``run_agent.py`` uses.
- Incoming OpenAI ``messages[]`` / ``tools[]`` / ``tool_choice`` are translated
to Gemini's native ``contents[]`` / ``tools[].functionDeclarations`` /
``toolConfig`` / ``systemInstruction`` shape.
- The request body is wrapped ``{project, model, user_prompt_id, request}``
per Code Assist API expectations.
- Responses (``candidates[].content.parts[]``) are converted back to
OpenAI ``choices[0].message`` shape with ``content`` + ``tool_calls``.
- Streaming uses SSE (``?alt=sse``) and yields OpenAI-shaped delta chunks.
Attribution
-----------
Translation semantics follow jenslys/opencode-gemini-auth (MIT) and the public
Gemini API docs. Request envelope shape
(``{project, model, user_prompt_id, request}``) is documented nowhere; it is
reverse-engineered from the opencode-gemini-auth and clawdbot implementations.
"""
from __future__ import annotations
import json
import logging
import time
import uuid
from types import SimpleNamespace
from typing import Any, Dict, Iterator, List, Optional
import httpx
from agent import google_oauth
from agent.gemini_schema import sanitize_gemini_tool_parameters
from agent.google_code_assist import (
CODE_ASSIST_ENDPOINT,
CodeAssistError,
ProjectContext,
resolve_project_context,
)
logger = logging.getLogger(__name__)
# =============================================================================
# Request translation: OpenAI → Gemini
# =============================================================================
_ROLE_MAP_OPENAI_TO_GEMINI = {
"user": "user",
"assistant": "model",
"system": "user", # handled separately via systemInstruction
"tool": "user", # functionResponse is wrapped in a user-role turn
"function": "user",
}
def _coerce_content_to_text(content: Any) -> str:
"""OpenAI content may be str or a list of parts; reduce to plain text."""
if content is None:
return ""
if isinstance(content, str):
return content
if isinstance(content, list):
pieces: List[str] = []
for p in content:
if isinstance(p, str):
pieces.append(p)
elif isinstance(p, dict):
if p.get("type") == "text" and isinstance(p.get("text"), str):
pieces.append(p["text"])
# Multimodal (image_url, etc.) — stub for now; log and skip
elif p.get("type") in {"image_url", "input_audio"}:
logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
return "\n".join(pieces)
return str(content)
def _translate_tool_call_to_gemini(tool_call: Dict[str, Any]) -> Dict[str, Any]:
"""OpenAI tool_call -> Gemini functionCall part."""
fn = tool_call.get("function") or {}
args_raw = fn.get("arguments", "")
try:
args = json.loads(args_raw) if isinstance(args_raw, str) and args_raw else {}
except json.JSONDecodeError:
args = {"_raw": args_raw}
if not isinstance(args, dict):
args = {"_value": args}
return {
"functionCall": {
"name": fn.get("name") or "",
"args": args,
},
# Sentinel signature — matches opencode-gemini-auth's approach.
# Without this, Code Assist rejects function calls that originated
# outside its own chain.
"thoughtSignature": "skip_thought_signature_validator",
}
def _translate_tool_result_to_gemini(message: Dict[str, Any]) -> Dict[str, Any]:
"""OpenAI tool-role message -> Gemini functionResponse part.
The function name isn't in the OpenAI tool message directly; it must be
passed via the assistant message that issued the call. For simplicity we
look up ``name`` on the message (OpenAI SDK copies it there) or on the
``tool_call_id`` cross-reference.
"""
name = str(message.get("name") or message.get("tool_call_id") or "tool")
content = _coerce_content_to_text(message.get("content"))
# Gemini expects the response as a dict under `response`. We wrap plain
# text in {"output": "..."}.
try:
parsed = json.loads(content) if content.strip().startswith(("{", "[")) else None
except json.JSONDecodeError:
parsed = None
response = parsed if isinstance(parsed, dict) else {"output": content}
return {
"functionResponse": {
"name": name,
"response": response,
},
}
def _build_gemini_contents(
messages: List[Dict[str, Any]],
) -> tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
"""Convert OpenAI messages[] to Gemini contents[] + systemInstruction."""
system_text_parts: List[str] = []
contents: List[Dict[str, Any]] = []
for msg in messages:
if not isinstance(msg, dict):
continue
role = str(msg.get("role") or "user")
if role == "system":
system_text_parts.append(_coerce_content_to_text(msg.get("content")))
continue
# Tool result message — emit a user-role turn with functionResponse
if role == "tool" or role == "function":
contents.append({
"role": "user",
"parts": [_translate_tool_result_to_gemini(msg)],
})
continue
gemini_role = _ROLE_MAP_OPENAI_TO_GEMINI.get(role, "user")
parts: List[Dict[str, Any]] = []
text = _coerce_content_to_text(msg.get("content"))
if text:
parts.append({"text": text})
# Assistant messages can carry tool_calls
tool_calls = msg.get("tool_calls") or []
if isinstance(tool_calls, list):
for tc in tool_calls:
if isinstance(tc, dict):
parts.append(_translate_tool_call_to_gemini(tc))
if not parts:
# Gemini rejects empty parts; skip the turn entirely
continue
contents.append({"role": gemini_role, "parts": parts})
system_instruction: Optional[Dict[str, Any]] = None
joined_system = "\n".join(p for p in system_text_parts if p).strip()
if joined_system:
system_instruction = {
"role": "system",
"parts": [{"text": joined_system}],
}
return contents, system_instruction
def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
"""OpenAI tools[] -> Gemini tools[].functionDeclarations[]."""
if not isinstance(tools, list) or not tools:
return []
declarations: List[Dict[str, Any]] = []
for t in tools:
if not isinstance(t, dict):
continue
fn = t.get("function") or {}
if not isinstance(fn, dict):
continue
name = fn.get("name")
if not name:
continue
decl = {"name": str(name)}
if fn.get("description"):
decl["description"] = str(fn["description"])
params = fn.get("parameters")
if isinstance(params, dict):
decl["parameters"] = sanitize_gemini_tool_parameters(params)
declarations.append(decl)
if not declarations:
return []
return [{"functionDeclarations": declarations}]
def _translate_tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
"""OpenAI tool_choice -> Gemini toolConfig.functionCallingConfig."""
if tool_choice is None:
return None
if isinstance(tool_choice, str):
if tool_choice == "auto":
return {"functionCallingConfig": {"mode": "AUTO"}}
if tool_choice == "required":
return {"functionCallingConfig": {"mode": "ANY"}}
if tool_choice == "none":
return {"functionCallingConfig": {"mode": "NONE"}}
if isinstance(tool_choice, dict):
fn = tool_choice.get("function") or {}
name = fn.get("name")
if name:
return {
"functionCallingConfig": {
"mode": "ANY",
"allowedFunctionNames": [str(name)],
},
}
return None
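The mapping above is small enough to express as a lookup table. The following is a sketch under that assumption; `tool_choice_to_gemini` and `_MODES` are hypothetical names, not part of the module.

```python
from typing import Any, Dict, Optional

# OpenAI string values and the Gemini mode each one corresponds to.
_MODES = {"auto": "AUTO", "required": "ANY", "none": "NONE"}


def tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
    """Table-driven sketch of the tool_choice translation."""
    if isinstance(tool_choice, str):
        mode = _MODES.get(tool_choice)
        return {"functionCallingConfig": {"mode": mode}} if mode else None
    if isinstance(tool_choice, dict):
        fn = tool_choice.get("function")
        name = fn.get("name") if isinstance(fn, dict) else None
        if name:
            # Forcing one function is expressed as ANY plus an allow-list.
            return {
                "functionCallingConfig": {
                    "mode": "ANY",
                    "allowedFunctionNames": [str(name)],
                },
            }
    return None
```

Unknown strings and malformed dicts fall through to `None`, which leaves `toolConfig` unset in the request body.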
def _normalize_thinking_config(config: Any) -> Optional[Dict[str, Any]]:
"""Accept thinkingBudget / thinkingLevel / includeThoughts (+ snake_case)."""
if not isinstance(config, dict) or not config:
return None
budget = config.get("thinkingBudget", config.get("thinking_budget"))
level = config.get("thinkingLevel", config.get("thinking_level"))
include = config.get("includeThoughts", config.get("include_thoughts"))
normalized: Dict[str, Any] = {}
if isinstance(budget, (int, float)):
normalized["thinkingBudget"] = int(budget)
if isinstance(level, str) and level.strip():
normalized["thinkingLevel"] = level.strip().lower()
if isinstance(include, bool):
normalized["includeThoughts"] = include
return normalized or None
def build_gemini_request(
*,
messages: List[Dict[str, Any]],
tools: Any = None,
tool_choice: Any = None,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
top_p: Optional[float] = None,
stop: Any = None,
thinking_config: Any = None,
) -> Dict[str, Any]:
"""Build the inner Gemini request body (goes inside ``request`` wrapper)."""
contents, system_instruction = _build_gemini_contents(messages)
body: Dict[str, Any] = {"contents": contents}
if system_instruction is not None:
body["systemInstruction"] = system_instruction
gemini_tools = _translate_tools_to_gemini(tools)
if gemini_tools:
body["tools"] = gemini_tools
tool_cfg = _translate_tool_choice_to_gemini(tool_choice)
if tool_cfg is not None:
body["toolConfig"] = tool_cfg
generation_config: Dict[str, Any] = {}
if isinstance(temperature, (int, float)):
generation_config["temperature"] = float(temperature)
if isinstance(max_tokens, int) and max_tokens > 0:
generation_config["maxOutputTokens"] = max_tokens
if isinstance(top_p, (int, float)):
generation_config["topP"] = float(top_p)
if isinstance(stop, str) and stop:
generation_config["stopSequences"] = [stop]
elif isinstance(stop, list) and stop:
generation_config["stopSequences"] = [str(s) for s in stop if s]
normalized_thinking = _normalize_thinking_config(thinking_config)
if normalized_thinking:
generation_config["thinkingConfig"] = normalized_thinking
if generation_config:
body["generationConfig"] = generation_config
return body
def wrap_code_assist_request(
*,
project_id: str,
model: str,
inner_request: Dict[str, Any],
user_prompt_id: Optional[str] = None,
) -> Dict[str, Any]:
"""Wrap the inner Gemini request in the Code Assist envelope."""
return {
"project": project_id,
"model": model,
"user_prompt_id": user_prompt_id or str(uuid.uuid4()),
"request": inner_request,
}
# =============================================================================
# Response translation: Gemini → OpenAI
# =============================================================================
def _translate_gemini_response(
resp: Dict[str, Any],
model: str,
) -> SimpleNamespace:
"""Non-streaming Gemini response -> OpenAI-shaped SimpleNamespace.
Code Assist wraps the actual Gemini response inside ``response``, so we
unwrap it first if present.
"""
inner = resp.get("response") if isinstance(resp.get("response"), dict) else resp
candidates = inner.get("candidates") or []
if not isinstance(candidates, list) or not candidates:
return _empty_response(model)
cand = candidates[0]
content_obj = cand.get("content") if isinstance(cand, dict) else {}
parts = content_obj.get("parts") if isinstance(content_obj, dict) else []
text_pieces: List[str] = []
reasoning_pieces: List[str] = []
tool_calls: List[SimpleNamespace] = []
for i, part in enumerate(parts or []):
if not isinstance(part, dict):
continue
# Thought parts are model's internal reasoning — surface as reasoning,
# don't mix into content.
if part.get("thought") is True:
if isinstance(part.get("text"), str):
reasoning_pieces.append(part["text"])
continue
if isinstance(part.get("text"), str):
text_pieces.append(part["text"])
continue
fc = part.get("functionCall")
if isinstance(fc, dict) and fc.get("name"):
try:
args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
except (TypeError, ValueError):
args_str = "{}"
tool_calls.append(SimpleNamespace(
id=f"call_{uuid.uuid4().hex[:12]}",
type="function",
index=i,
function=SimpleNamespace(name=str(fc["name"]), arguments=args_str),
))
finish_reason = "tool_calls" if tool_calls else _map_gemini_finish_reason(
str(cand.get("finishReason") or "")
)
usage_meta = inner.get("usageMetadata") or {}
usage = SimpleNamespace(
prompt_tokens=int(usage_meta.get("promptTokenCount") or 0),
completion_tokens=int(usage_meta.get("candidatesTokenCount") or 0),
total_tokens=int(usage_meta.get("totalTokenCount") or 0),
prompt_tokens_details=SimpleNamespace(
cached_tokens=int(usage_meta.get("cachedContentTokenCount") or 0),
),
)
message = SimpleNamespace(
role="assistant",
content="".join(text_pieces) if text_pieces else None,
tool_calls=tool_calls or None,
reasoning="".join(reasoning_pieces) or None,
reasoning_content="".join(reasoning_pieces) or None,
reasoning_details=None,
)
choice = SimpleNamespace(
index=0,
message=message,
finish_reason=finish_reason,
)
return SimpleNamespace(
id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
object="chat.completion",
created=int(time.time()),
model=model,
choices=[choice],
usage=usage,
)
def _empty_response(model: str) -> SimpleNamespace:
message = SimpleNamespace(
role="assistant", content="", tool_calls=None,
reasoning=None, reasoning_content=None, reasoning_details=None,
)
choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
usage = SimpleNamespace(
prompt_tokens=0, completion_tokens=0, total_tokens=0,
prompt_tokens_details=SimpleNamespace(cached_tokens=0),
)
return SimpleNamespace(
id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
object="chat.completion",
created=int(time.time()),
model=model,
choices=[choice],
usage=usage,
)
def _map_gemini_finish_reason(reason: str) -> str:
    mapping = {
        "STOP": "stop",
        "MAX_TOKENS": "length",
        "SAFETY": "content_filter",
        "RECITATION": "content_filter",
        "OTHER": "stop",
    }
    return mapping.get(reason.upper(), "stop")
# =============================================================================
# Streaming SSE iterator
# =============================================================================
class _GeminiStreamChunk(SimpleNamespace):
"""Mimics an OpenAI ChatCompletionChunk with .choices[0].delta."""
pass
def _make_stream_chunk(
*,
model: str,
content: str = "",
tool_call_delta: Optional[Dict[str, Any]] = None,
finish_reason: Optional[str] = None,
reasoning: str = "",
) -> _GeminiStreamChunk:
delta_kwargs: Dict[str, Any] = {
"role": "assistant",
"content": None,
"tool_calls": None,
"reasoning": None,
"reasoning_content": None,
}
if content:
delta_kwargs["content"] = content
if tool_call_delta is not None:
delta_kwargs["tool_calls"] = [SimpleNamespace(
index=tool_call_delta.get("index", 0),
id=tool_call_delta.get("id") or f"call_{uuid.uuid4().hex[:12]}",
type="function",
function=SimpleNamespace(
name=tool_call_delta.get("name") or "",
arguments=tool_call_delta.get("arguments") or "",
),
)]
if reasoning:
delta_kwargs["reasoning"] = reasoning
delta_kwargs["reasoning_content"] = reasoning
delta = SimpleNamespace(**delta_kwargs)
choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
return _GeminiStreamChunk(
id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
object="chat.completion.chunk",
created=int(time.time()),
model=model,
choices=[choice],
usage=None,
)
def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
"""Parse Server-Sent Events from an httpx streaming response."""
buffer = ""
for chunk in response.iter_text():
if not chunk:
continue
buffer += chunk
while "\n" in buffer:
line, buffer = buffer.split("\n", 1)
line = line.rstrip("\r")
if not line:
continue
if line.startswith("data: "):
data = line[6:]
if data == "[DONE]":
return
try:
yield json.loads(data)
except json.JSONDecodeError:
logger.debug("Non-JSON SSE line: %s", data[:200])
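The buffering logic is easier to follow when it runs over plain strings instead of an `httpx` response. This sketch assumes the same `data: ` prefix and `[DONE]` sentinel; `iter_sse_json` is an illustrative name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_json(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield decoded JSON payloads from arbitrarily split SSE text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are processed; a partial line stays buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if not line.startswith("data: "):
                continue
            data = line[6:]
            if data == "[DONE]":
                return
            try:
                yield json.loads(data)
            except json.JSONDecodeError:
                continue  # skip non-JSON payloads


events = list(iter_sse_json([
    'data: {"a"',                 # event split mid-payload
    ': 1}\r\n\r\nda',             # CRLF line ending, blank separator
    'ta: {"b": 2}\n',
    'data: [DONE]\n',
    'data: {"c": 3}\n',           # ignored: arrives after the sentinel
]))
```

One consequence of this line-based design is that a final `data:` line with no trailing newline is never emitted, which is usually acceptable because SSE events end with a blank line.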
def _translate_stream_event(
event: Dict[str, Any],
model: str,
tool_call_counter: List[int],
) -> List[_GeminiStreamChunk]:
"""Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s).
``tool_call_counter`` is a single-element list used as a mutable counter
across events in the same stream. Each ``functionCall`` part gets a
fresh, unique OpenAI ``index``; keying by function name would collide
whenever the model issues parallel calls to the same tool (e.g. reading
three files in one turn).
"""
inner = event.get("response") if isinstance(event.get("response"), dict) else event
candidates = inner.get("candidates") or []
if not candidates:
return []
cand = candidates[0]
if not isinstance(cand, dict):
return []
chunks: List[_GeminiStreamChunk] = []
content = cand.get("content") or {}
parts = content.get("parts") if isinstance(content, dict) else []
for part in parts or []:
if not isinstance(part, dict):
continue
if part.get("thought") is True and isinstance(part.get("text"), str):
chunks.append(_make_stream_chunk(
model=model, reasoning=part["text"],
))
continue
if isinstance(part.get("text"), str) and part["text"]:
chunks.append(_make_stream_chunk(model=model, content=part["text"]))
fc = part.get("functionCall")
if isinstance(fc, dict) and fc.get("name"):
name = str(fc["name"])
idx = tool_call_counter[0]
tool_call_counter[0] += 1
try:
args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
except (TypeError, ValueError):
args_str = "{}"
chunks.append(_make_stream_chunk(
model=model,
tool_call_delta={
"index": idx,
"name": name,
"arguments": args_str,
},
))
finish_reason_raw = str(cand.get("finishReason") or "")
if finish_reason_raw:
mapped = _map_gemini_finish_reason(finish_reason_raw)
if tool_call_counter[0] > 0:
mapped = "tool_calls"
chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
return chunks
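The shared counter can be demonstrated on simplified events. In this sketch each event is reduced to a bare list of parts, and `collect_tool_call_indexes` is a hypothetical helper, so it shows only the indexing behaviour, not the full chunk translation.

```python
from typing import Any, Dict, List


def collect_tool_call_indexes(events: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Assign one stream-wide index per functionCall part."""
    counter = [0]  # single-element list: mutable state shared across events
    deltas: List[Dict[str, Any]] = []
    for parts in events:
        for part in parts:
            fc = part.get("functionCall")
            if isinstance(fc, dict) and fc.get("name"):
                deltas.append({"index": counter[0], "name": str(fc["name"])})
                counter[0] += 1
    return deltas


deltas = collect_tool_call_indexes([
    [{"functionCall": {"name": "read_file", "args": {"path": "a.txt"}}},
     {"functionCall": {"name": "read_file", "args": {"path": "b.txt"}}}],
    [{"text": "working"}],
    [{"functionCall": {"name": "read_file", "args": {"path": "c.txt"}}}],
])
```

All three calls target the same tool, yet each receives a distinct index, which is what lets a downstream accumulator keep them apart.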
# =============================================================================
# GeminiCloudCodeClient — OpenAI-compatible facade
# =============================================================================
MARKER_BASE_URL = "cloudcode-pa://google"
class _GeminiChatCompletions:
def __init__(self, client: "GeminiCloudCodeClient"):
self._client = client
def create(self, **kwargs: Any) -> Any:
return self._client._create_chat_completion(**kwargs)
class _GeminiChatNamespace:
def __init__(self, client: "GeminiCloudCodeClient"):
self.completions = _GeminiChatCompletions(client)
class GeminiCloudCodeClient:
"""Minimal OpenAI-SDK-compatible facade over Code Assist v1internal."""
def __init__(
self,
*,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
default_headers: Optional[Dict[str, str]] = None,
project_id: str = "",
**_: Any,
):
# `api_key` here is a dummy — real auth is the OAuth access token
# fetched on every call via agent.google_oauth.get_valid_access_token().
# We accept the kwarg for openai.OpenAI interface parity.
self.api_key = api_key or "google-oauth"
self.base_url = base_url or MARKER_BASE_URL
self._default_headers = dict(default_headers or {})
self._configured_project_id = project_id
self._project_context: Optional[ProjectContext] = None
self._project_context_lock = False # simple single-thread guard
self.chat = _GeminiChatNamespace(self)
self.is_closed = False
self._http = httpx.Client(timeout=httpx.Timeout(connect=15.0, read=600.0, write=30.0, pool=30.0))
def close(self) -> None:
self.is_closed = True
try:
self._http.close()
except Exception:
pass
# Implement the OpenAI SDK's context-manager-ish closure check
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
def _ensure_project_context(self, access_token: str, model: str) -> ProjectContext:
"""Lazily resolve and cache the project context for this client."""
if self._project_context is not None:
return self._project_context
env_project = google_oauth.resolve_project_id_from_env()
creds = google_oauth.load_credentials()
stored_project = creds.project_id if creds else ""
# Prefer what's already baked into the creds
if stored_project:
self._project_context = ProjectContext(
project_id=stored_project,
managed_project_id=creds.managed_project_id if creds else "",
tier_id="",
source="stored",
)
return self._project_context
ctx = resolve_project_context(
access_token,
configured_project_id=self._configured_project_id,
env_project_id=env_project,
user_agent_model=model,
)
# Persist discovered project back to the creds file so the next
# session doesn't re-run the discovery.
if ctx.project_id or ctx.managed_project_id:
google_oauth.update_project_ids(
project_id=ctx.project_id,
managed_project_id=ctx.managed_project_id,
)
self._project_context = ctx
return ctx
def _create_chat_completion(
self,
*,
model: str = "gemini-2.5-flash",
messages: Optional[List[Dict[str, Any]]] = None,
stream: bool = False,
tools: Any = None,
tool_choice: Any = None,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
top_p: Optional[float] = None,
stop: Any = None,
extra_body: Optional[Dict[str, Any]] = None,
timeout: Any = None,
**_: Any,
) -> Any:
access_token = google_oauth.get_valid_access_token()
ctx = self._ensure_project_context(access_token, model)
thinking_config = None
if isinstance(extra_body, dict):
thinking_config = extra_body.get("thinking_config") or extra_body.get("thinkingConfig")
inner = build_gemini_request(
messages=messages or [],
tools=tools,
tool_choice=tool_choice,
temperature=temperature,
max_tokens=max_tokens,
top_p=top_p,
stop=stop,
thinking_config=thinking_config,
)
wrapped = wrap_code_assist_request(
project_id=ctx.project_id,
model=model,
inner_request=inner,
)
headers = {
"Content-Type": "application/json",
"Accept": "application/json",
"Authorization": f"Bearer {access_token}",
"User-Agent": "hermes-agent (gemini-cli-compat)",
"X-Goog-Api-Client": "gl-python/hermes",
"x-activity-request-id": str(uuid.uuid4()),
}
headers.update(self._default_headers)
if stream:
return self._stream_completion(model=model, wrapped=wrapped, headers=headers)
url = f"{CODE_ASSIST_ENDPOINT}/v1internal:generateContent"
response = self._http.post(url, json=wrapped, headers=headers)
if response.status_code != 200:
raise _gemini_http_error(response)
try:
payload = response.json()
except ValueError as exc:
raise CodeAssistError(
f"Invalid JSON from Code Assist: {exc}",
code="code_assist_invalid_json",
) from exc
return _translate_gemini_response(payload, model=model)
def _stream_completion(
self,
*,
model: str,
wrapped: Dict[str, Any],
headers: Dict[str, str],
) -> Iterator[_GeminiStreamChunk]:
"""Generator that yields OpenAI-shaped streaming chunks."""
url = f"{CODE_ASSIST_ENDPOINT}/v1internal:streamGenerateContent?alt=sse"
stream_headers = dict(headers)
stream_headers["Accept"] = "text/event-stream"
def _generator() -> Iterator[_GeminiStreamChunk]:
try:
with self._http.stream("POST", url, json=wrapped, headers=stream_headers) as response:
if response.status_code != 200:
# Materialize error body for better diagnostics
response.read()
raise _gemini_http_error(response)
tool_call_counter: List[int] = [0]
for event in _iter_sse_events(response):
for chunk in _translate_stream_event(event, model, tool_call_counter):
yield chunk
except httpx.HTTPError as exc:
raise CodeAssistError(
f"Streaming request failed: {exc}",
code="code_assist_stream_error",
) from exc
return _generator()
def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
"""Translate an httpx response into a CodeAssistError with rich metadata.
Parses Google's error envelope (``{"error": {"code", "message", "status",
"details": [...]}}``) so the agent's error classifier can reason about
the failure: ``status_code`` enables the rate_limit / auth classification
paths, and ``response`` lets the main loop honor ``Retry-After`` just
like it does for OpenAI SDK exceptions.
Also lifts a few recognizable Google conditions into human-readable
messages so the user sees something better than a 500-char JSON dump:
MODEL_CAPACITY_EXHAUSTED → "Gemini model capacity exhausted for
<model>. This is a Google-side throttle..."
RESOURCE_EXHAUSTED w/o reason → quota-style message
404 → "Model <name> not found at cloudcode-pa..."
"""
status = response.status_code
# Parse the body once, surviving any weird encodings.
body_text = ""
body_json: Dict[str, Any] = {}
try:
body_text = response.text
except Exception:
body_text = ""
if body_text:
try:
parsed = json.loads(body_text)
if isinstance(parsed, dict):
body_json = parsed
except (ValueError, TypeError):
body_json = {}
# Dig into Google's error envelope. Shape is:
# {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED",
# "details": [{"@type": ".../ErrorInfo", "reason": "MODEL_CAPACITY_EXHAUSTED",
# "metadata": {...}},
# {"@type": ".../RetryInfo", "retryDelay": "30s"}]}}
err_obj = body_json.get("error") if isinstance(body_json, dict) else None
if not isinstance(err_obj, dict):
err_obj = {}
err_status = str(err_obj.get("status") or "").strip()
err_message = str(err_obj.get("message") or "").strip()
_raw_details = err_obj.get("details")
err_details_list = _raw_details if isinstance(_raw_details, list) else []
# Extract google.rpc.ErrorInfo reason + metadata. There may be more
# than one ErrorInfo (rare), so we pick the first one with a reason.
error_reason = ""
error_metadata: Dict[str, Any] = {}
retry_delay_seconds: Optional[float] = None
for detail in err_details_list:
if not isinstance(detail, dict):
continue
type_url = str(detail.get("@type") or "")
if not error_reason and type_url.endswith("/google.rpc.ErrorInfo"):
reason = detail.get("reason")
if isinstance(reason, str) and reason:
error_reason = reason
md = detail.get("metadata")
if isinstance(md, dict):
error_metadata = md
elif retry_delay_seconds is None and type_url.endswith("/google.rpc.RetryInfo"):
# retryDelay is a google.protobuf.Duration string like "30s" or "1.5s".
delay_raw = detail.get("retryDelay")
if isinstance(delay_raw, str) and delay_raw.endswith("s"):
try:
retry_delay_seconds = float(delay_raw[:-1])
except ValueError:
pass
elif isinstance(delay_raw, (int, float)):
retry_delay_seconds = float(delay_raw)
# Fall back to the Retry-After header if the body didn't include RetryInfo.
if retry_delay_seconds is None:
try:
header_val = response.headers.get("Retry-After") or response.headers.get("retry-after")
except Exception:
header_val = None
if header_val:
try:
retry_delay_seconds = float(header_val)
except (TypeError, ValueError):
retry_delay_seconds = None
# Classify the error code. ``code_assist_rate_limited`` stays the default
# for 429s; a more specific reason tag helps downstream callers (e.g. tests,
# logs) without changing the rate_limit classification path.
code = f"code_assist_http_{status}"
if status == 401:
code = "code_assist_unauthorized"
elif status == 429:
code = "code_assist_rate_limited"
if error_reason == "MODEL_CAPACITY_EXHAUSTED":
code = "code_assist_capacity_exhausted"
# Build a human-readable message. Keep the status + a raw-body tail for
# debugging, but lead with a friendlier summary when we recognize the
# Google signal.
model_hint = ""
if isinstance(error_metadata, dict):
model_hint = str(error_metadata.get("model") or error_metadata.get("modelId") or "").strip()
if status == 429 and error_reason == "MODEL_CAPACITY_EXHAUSTED":
target = model_hint or "this Gemini model"
message = (
f"Gemini capacity exhausted for {target} (Google-side throttle, "
f"not a Hermes issue). Try a different Gemini model or set a "
f"fallback_providers entry to a non-Gemini provider."
)
if retry_delay_seconds is not None:
message += f" Google suggests retrying in {retry_delay_seconds:g}s."
elif status == 429 and err_status == "RESOURCE_EXHAUSTED":
message = (
f"Gemini quota exhausted ({err_message or 'RESOURCE_EXHAUSTED'}). "
f"Check /gquota for remaining daily requests."
)
if retry_delay_seconds is not None:
message += f" Retry suggested in {retry_delay_seconds:g}s."
elif status == 404:
# Google returns 404 when a model has been retired or renamed.
target = model_hint or (err_message or "model")
message = (
f"Code Assist 404: {target} is not available at "
f"cloudcode-pa.googleapis.com. It may have been renamed or "
f"retired. Check hermes_cli/models.py for the current list."
)
elif err_message:
# Generic fallback with the parsed message.
message = f"Code Assist HTTP {status} ({err_status or 'error'}): {err_message}"
else:
# Last-ditch fallback — raw body snippet.
message = f"Code Assist returned HTTP {status}: {body_text[:500]}"
return CodeAssistError(
message,
code=code,
status_code=status,
response=response,
retry_after=retry_delay_seconds,
details={
"status": err_status,
"reason": error_reason,
"metadata": error_metadata,
"message": err_message,
},
)
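The `RetryInfo` extraction can be exercised on its own with a sample `details` array. This is a sketch: `retry_delay_from_details` is an illustrative name, and the sample payload follows the envelope shape described in the comments above rather than a captured response.

```python
from typing import Any, List, Optional


def retry_delay_from_details(details: List[Any]) -> Optional[float]:
    """Return the first parseable google.rpc.RetryInfo delay, in seconds."""
    for detail in details:
        if not isinstance(detail, dict):
            continue
        if not str(detail.get("@type") or "").endswith("/google.rpc.RetryInfo"):
            continue
        raw = detail.get("retryDelay")
        if isinstance(raw, str) and raw.endswith("s"):
            # Duration strings look like "30s" or "1.5s".
            try:
                return float(raw[:-1])
            except ValueError:
                continue
        if isinstance(raw, (int, float)):
            return float(raw)
    return None


sample_details = [
    {"@type": "type.googleapis.com/google.rpc.ErrorInfo",
     "reason": "MODEL_CAPACITY_EXHAUSTED"},
    {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "1.5s"},
]
delay = retry_delay_from_details(sample_details)
```

Formatting the result with `:g`, as the error messages do, keeps whole-second values free of a trailing `.0`.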

View file

@ -1,451 +0,0 @@
"""Google Code Assist API client — project discovery, onboarding, quota.
The Code Assist API powers Google's official gemini-cli. It sits at
``cloudcode-pa.googleapis.com`` and provides:
- Free tier access (generous daily quota) for personal Google accounts
- Paid tier access via GCP projects with billing / Workspace / Standard / Enterprise
This module handles the control-plane dance needed before inference:
1. ``load_code_assist()``: probe the user's account to learn what tier they're on
and whether a ``cloudaicompanionProject`` is already assigned.
2. ``onboard_user()``: if the user hasn't been onboarded yet (new account, fresh
free tier, etc.), call this with the chosen tier + project id. Supports LRO
polling for slow provisioning.
3. ``retrieve_user_quota()``: fetch the ``buckets[]`` array showing remaining
quota per model, used by the ``/gquota`` slash command.
VPC-SC handling: enterprise accounts under a VPC Service Controls perimeter
will get ``SECURITY_POLICY_VIOLATED`` on ``load_code_assist``. We catch this
and force the account to ``standard-tier`` so the call chain still succeeds.
Derived from opencode-gemini-auth (MIT) and clawdbot/extensions/google. The
request/response shapes are specific to Google's internal Code Assist API,
documented nowhere public; we copy them from the reference implementations.
"""
from __future__ import annotations
import json
import logging
import time
import urllib.error
import urllib.request
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
# =============================================================================
# Constants
# =============================================================================
CODE_ASSIST_ENDPOINT = "https://cloudcode-pa.googleapis.com"
# Fallback endpoints tried when prod returns an error during project discovery
FALLBACK_ENDPOINTS = [
"https://daily-cloudcode-pa.sandbox.googleapis.com",
"https://autopush-cloudcode-pa.sandbox.googleapis.com",
]
# Tier identifiers that Google's API uses
FREE_TIER_ID = "free-tier"
LEGACY_TIER_ID = "legacy-tier"
STANDARD_TIER_ID = "standard-tier"
# Default HTTP headers matching gemini-cli's fingerprint.
# Google may reject unrecognized User-Agents on these internal endpoints.
_GEMINI_CLI_USER_AGENT = "google-api-nodejs-client/9.15.1 (gzip)"
_X_GOOG_API_CLIENT = "gl-node/24.0.0"
_DEFAULT_REQUEST_TIMEOUT = 30.0
_ONBOARDING_POLL_ATTEMPTS = 12
_ONBOARDING_POLL_INTERVAL_SECONDS = 5.0
class CodeAssistError(RuntimeError):
"""Exception raised by the Code Assist (``cloudcode-pa``) integration.
Carries HTTP status / response / retry-after metadata so the agent's
``error_classifier._extract_status_code`` and the main loop's Retry-After
handling (which walks ``error.response.headers``) pick up the right
signals. Without these, a 429 from the OAuth path looks like an opaque
``RuntimeError`` and skips the rate-limit path.
"""
def __init__(
self,
message: str,
*,
code: str = "code_assist_error",
status_code: Optional[int] = None,
response: Any = None,
retry_after: Optional[float] = None,
details: Optional[Dict[str, Any]] = None,
) -> None:
super().__init__(message)
self.code = code
# ``status_code`` is picked up by ``agent.error_classifier._extract_status_code``
# so a 429 from Code Assist classifies as FailoverReason.rate_limit and
# triggers the main loop's fallback_providers chain the same way SDK
# errors do.
self.status_code = status_code
# ``response`` is the underlying ``httpx.Response`` (or a shim with a
# ``.headers`` mapping and ``.json()`` method). The main loop reads
# ``error.response.headers["Retry-After"]`` to honor Google's retry
# hints when the backend throttles us.
self.response = response
# Parsed ``Retry-After`` seconds (kept separately for convenience —
# Google returns retry hints in both the header and the error body's
# ``google.rpc.RetryInfo`` details, and we pick whichever we found).
self.retry_after = retry_after
# Parsed structured error details from the Google error envelope
# (e.g. ``{"reason": "MODEL_CAPACITY_EXHAUSTED", "status": "RESOURCE_EXHAUSTED"}``).
# Useful for logging and for tests that want to assert on specifics.
self.details = details or {}
class ProjectIdRequiredError(CodeAssistError):
def __init__(self, message: str = "GCP project id required for this tier") -> None:
super().__init__(message, code="code_assist_project_id_required")
# =============================================================================
# HTTP primitive (auth via Bearer token passed per-call)
# =============================================================================
def _build_headers(access_token: str, *, user_agent_model: str = "") -> Dict[str, str]:
ua = _GEMINI_CLI_USER_AGENT
if user_agent_model:
ua = f"{ua} model/{user_agent_model}"
return {
"Content-Type": "application/json",
"Accept": "application/json",
"Authorization": f"Bearer {access_token}",
"User-Agent": ua,
"X-Goog-Api-Client": _X_GOOG_API_CLIENT,
"x-activity-request-id": str(uuid.uuid4()),
}
def _client_metadata() -> Dict[str, str]:
"""Match Google's gemini-cli exactly — unrecognized metadata may be rejected."""
return {
"ideType": "IDE_UNSPECIFIED",
"platform": "PLATFORM_UNSPECIFIED",
"pluginType": "GEMINI",
}
def _post_json(
url: str,
body: Dict[str, Any],
access_token: str,
*,
timeout: float = _DEFAULT_REQUEST_TIMEOUT,
user_agent_model: str = "",
) -> Dict[str, Any]:
data = json.dumps(body).encode("utf-8")
request = urllib.request.Request(
url, data=data, method="POST",
headers=_build_headers(access_token, user_agent_model=user_agent_model),
)
try:
with urllib.request.urlopen(request, timeout=timeout) as response:
raw = response.read().decode("utf-8", errors="replace")
return json.loads(raw) if raw else {}
except urllib.error.HTTPError as exc:
detail = ""
try:
detail = exc.read().decode("utf-8", errors="replace")
except Exception:
pass
# Special case: VPC-SC violation should be distinguishable
if _is_vpc_sc_violation(detail):
raise CodeAssistError(
f"VPC-SC policy violation: {detail}",
code="code_assist_vpc_sc",
) from exc
raise CodeAssistError(
f"Code Assist HTTP {exc.code}: {detail or exc.reason}",
code=f"code_assist_http_{exc.code}",
) from exc
except urllib.error.URLError as exc:
raise CodeAssistError(
f"Code Assist request failed: {exc}",
code="code_assist_network_error",
) from exc
def _is_vpc_sc_violation(body: str) -> bool:
"""Detect a VPC Service Controls violation from a response body."""
if not body:
return False
try:
parsed = json.loads(body)
except (json.JSONDecodeError, ValueError):
return "SECURITY_POLICY_VIOLATED" in body
# Walk the nested error structure Google uses
error = parsed.get("error") if isinstance(parsed, dict) else None
if not isinstance(error, dict):
return False
details = error.get("details") or []
if isinstance(details, list):
for item in details:
if isinstance(item, dict):
reason = item.get("reason") or ""
if reason == "SECURITY_POLICY_VIOLATED":
return True
msg = str(error.get("message", ""))
return "SECURITY_POLICY_VIOLATED" in msg
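A few sample bodies show which inputs the detector accepts. The function below is a condensed sketch of the same checks under the name `looks_like_vpc_sc` (hypothetical), and the payloads are made up for illustration rather than captured from Google.

```python
import json


def looks_like_vpc_sc(body: str) -> bool:
    """Condensed sketch: structured check first, substring check as fallback."""
    if not body:
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        # Not JSON at all: fall back to a plain substring search.
        return "SECURITY_POLICY_VIOLATED" in body
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        return False
    details = error.get("details")
    if isinstance(details, list):
        for item in details:
            if isinstance(item, dict) and item.get("reason") == "SECURITY_POLICY_VIOLATED":
                return True
    return "SECURITY_POLICY_VIOLATED" in str(error.get("message", ""))


structured = json.dumps(
    {"error": {"code": 403, "details": [{"reason": "SECURITY_POLICY_VIOLATED"}]}}
)
in_message = json.dumps({"error": {"message": "blocked: SECURITY_POLICY_VIOLATED"}})
plain_text = "403 SECURITY_POLICY_VIOLATED"
unrelated = json.dumps({"error": {"code": 429, "message": "quota"}})
```

The substring fallback trades precision for robustness: a proxy that rewrites the error into HTML or plain text is still recognized.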
# =============================================================================
# load_code_assist — discovers current tier + assigned project
# =============================================================================
@dataclass
class CodeAssistProjectInfo:
"""Result from ``load_code_assist``."""
current_tier_id: str = ""
cloudaicompanion_project: str = "" # Google-managed project (free tier)
allowed_tiers: List[str] = field(default_factory=list)
raw: Dict[str, Any] = field(default_factory=dict)
def load_code_assist(
access_token: str,
*,
project_id: str = "",
user_agent_model: str = "",
) -> CodeAssistProjectInfo:
"""Call ``POST /v1internal:loadCodeAssist`` with prod → sandbox fallback.
Returns whatever tier + project info Google reports. On VPC-SC violations,
returns a synthetic ``standard-tier`` result so the chain can continue.
"""
body: Dict[str, Any] = {
"metadata": {
"duetProject": project_id,
**_client_metadata(),
},
}
if project_id:
body["cloudaicompanionProject"] = project_id
endpoints = [CODE_ASSIST_ENDPOINT] + FALLBACK_ENDPOINTS
last_err: Optional[Exception] = None
for endpoint in endpoints:
url = f"{endpoint}/v1internal:loadCodeAssist"
try:
resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
return _parse_load_response(resp)
except CodeAssistError as exc:
if exc.code == "code_assist_vpc_sc":
logger.info("VPC-SC violation on %s — defaulting to standard-tier", endpoint)
return CodeAssistProjectInfo(
current_tier_id=STANDARD_TIER_ID,
cloudaicompanion_project=project_id,
)
last_err = exc
logger.warning("loadCodeAssist failed on %s: %s", endpoint, exc)
continue
if last_err:
raise last_err
return CodeAssistProjectInfo()
def _parse_load_response(resp: Dict[str, Any]) -> CodeAssistProjectInfo:
current_tier = resp.get("currentTier") or {}
tier_id = str(current_tier.get("id") or "") if isinstance(current_tier, dict) else ""
project = str(resp.get("cloudaicompanionProject") or "")
allowed = resp.get("allowedTiers") or []
allowed_ids: List[str] = []
if isinstance(allowed, list):
for t in allowed:
if isinstance(t, dict):
tid = str(t.get("id") or "")
if tid:
allowed_ids.append(tid)
return CodeAssistProjectInfo(
current_tier_id=tier_id,
cloudaicompanion_project=project,
allowed_tiers=allowed_ids,
raw=resp,
)
# =============================================================================
# onboard_user — provisions a new user on a tier (with LRO polling)
# =============================================================================
def onboard_user(
access_token: str,
*,
tier_id: str,
project_id: str = "",
user_agent_model: str = "",
) -> Dict[str, Any]:
"""Call ``POST /v1internal:onboardUser`` to provision the user.
For paid tiers, ``project_id`` is REQUIRED (raises ProjectIdRequiredError).
For free tiers, ``project_id`` is optional; Google will assign one.
Returns the final operation response. Polls ``/v1internal/<name>`` for up
to ``_ONBOARDING_POLL_ATTEMPTS`` × ``_ONBOARDING_POLL_INTERVAL_SECONDS``
(default: 12 × 5s = 1 min).
"""
if tier_id != FREE_TIER_ID and tier_id != LEGACY_TIER_ID and not project_id:
raise ProjectIdRequiredError(
f"Tier {tier_id!r} requires a GCP project id. "
"Set HERMES_GEMINI_PROJECT_ID or GOOGLE_CLOUD_PROJECT."
)
body: Dict[str, Any] = {
"tierId": tier_id,
"metadata": _client_metadata(),
}
if project_id:
body["cloudaicompanionProject"] = project_id
endpoint = CODE_ASSIST_ENDPOINT
url = f"{endpoint}/v1internal:onboardUser"
resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
# Poll if LRO (long-running operation)
if not resp.get("done"):
op_name = resp.get("name", "")
if not op_name:
return resp
for attempt in range(_ONBOARDING_POLL_ATTEMPTS):
time.sleep(_ONBOARDING_POLL_INTERVAL_SECONDS)
poll_url = f"{endpoint}/v1internal/{op_name}"
try:
poll_resp = _post_json(poll_url, {}, access_token, user_agent_model=user_agent_model)
except CodeAssistError as exc:
logger.warning("Onboarding poll attempt %d failed: %s", attempt + 1, exc)
continue
if poll_resp.get("done"):
return poll_resp
logger.warning("Onboarding did not complete within %d attempts", _ONBOARDING_POLL_ATTEMPTS)
return resp
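The bounded polling in `onboard_user` can be sketched in isolation. This is a minimal illustration, not the module's code: `poll_until_done`, `fetch`, and `sleep` are hypothetical names, `fetch` stands in for the `_post_json` call against the operation URL, and `RuntimeError` stands in for `CodeAssistError`. Unlike the function above, which returns the initial response on timeout, this sketch returns an empty dict.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]],
    *,
    attempts: int = 12,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll ``fetch`` until it reports ``done`` or the attempts run out.

    Hypothetical helper: returns the first response whose ``done`` flag is
    truthy, or an empty dict when every attempt failed or stayed pending.
    """
    for _ in range(attempts):
        sleep(interval_seconds)  # wait first, as the loop above does
        try:
            resp = fetch()
        except RuntimeError:
            # A failed poll is not fatal; try again on the next attempt.
            continue
        if resp.get("done"):
            return resp
    return {}
```

Injecting `sleep` keeps the helper testable without real delays; a caller that wants the 12 × 5s default simply omits it.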
# =============================================================================
# retrieve_user_quota — for /gquota
# =============================================================================
@dataclass
class QuotaBucket:
model_id: str
token_type: str = ""
remaining_fraction: float = 0.0
reset_time_iso: str = ""
raw: Dict[str, Any] = field(default_factory=dict)
def retrieve_user_quota(
access_token: str,
*,
project_id: str = "",
user_agent_model: str = "",
) -> List[QuotaBucket]:
"""Call ``POST /v1internal:retrieveUserQuota`` and parse ``buckets[]``."""
body: Dict[str, Any] = {}
if project_id:
body["project"] = project_id
url = f"{CODE_ASSIST_ENDPOINT}/v1internal:retrieveUserQuota"
resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
raw_buckets = resp.get("buckets") or []
buckets: List[QuotaBucket] = []
if not isinstance(raw_buckets, list):
return buckets
for b in raw_buckets:
if not isinstance(b, dict):
continue
buckets.append(QuotaBucket(
model_id=str(b.get("modelId") or ""),
token_type=str(b.get("tokenType") or ""),
remaining_fraction=float(b.get("remainingFraction") or 0.0),
reset_time_iso=str(b.get("resetTime") or ""),
raw=b,
))
return buckets
# =============================================================================
# Project context resolution
# =============================================================================
@dataclass
class ProjectContext:
"""Resolved state for a given OAuth session."""
project_id: str = "" # effective project id sent on requests
managed_project_id: str = "" # Google-assigned project (free tier)
tier_id: str = ""
source: str = "" # "env", "config", "discovered", "onboarded"
def resolve_project_context(
access_token: str,
*,
configured_project_id: str = "",
env_project_id: str = "",
user_agent_model: str = "",
) -> ProjectContext:
"""Figure out what project id + tier to use for requests.
Priority:
1. If configured_project_id or env_project_id is set, use that directly
and short-circuit (no discovery needed).
2. Otherwise call loadCodeAssist to see what Google says.
3. If no tier assigned yet, onboard the user (free tier default).
"""
# Short-circuit: caller provided a project id
if configured_project_id:
return ProjectContext(
project_id=configured_project_id,
tier_id=STANDARD_TIER_ID, # assume paid since they specified one
source="config",
)
if env_project_id:
return ProjectContext(
project_id=env_project_id,
tier_id=STANDARD_TIER_ID,
source="env",
)
# Discover via loadCodeAssist
info = load_code_assist(access_token, user_agent_model=user_agent_model)
effective_project = info.cloudaicompanion_project
tier = info.current_tier_id
if not tier:
# User hasn't been onboarded — provision them on free tier
onboard_resp = onboard_user(
access_token,
tier_id=FREE_TIER_ID,
project_id="",
user_agent_model=user_agent_model,
)
# Re-parse from the onboard response
response_body = onboard_resp.get("response") or {}
if isinstance(response_body, dict):
effective_project = (
effective_project
or str(response_body.get("cloudaicompanionProject") or "")
)
tier = FREE_TIER_ID
source = "onboarded"
else:
source = "discovered"
return ProjectContext(
project_id=effective_project,
managed_project_id=effective_project if tier == FREE_TIER_ID else "",
tier_id=tier,
source=source,
)
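The priority order documented in `resolve_project_context` (config, then env, then discovery) can be reduced to a small sketch. `pick_project` and its `discover` callable are hypothetical names used only for illustration; `discover` stands in for the `load_code_assist` call plus the optional onboarding step.

```python
from typing import Callable, Tuple


def pick_project(
    configured: str,
    env: str,
    discover: Callable[[], str],
) -> Tuple[str, str]:
    """Return ``(project_id, source)`` using config > env > discovery."""
    if configured:
        return configured, "config"
    if env:
        return env, "env"
    # Only reach out to the service when nothing was supplied locally.
    return discover(), "discovered"
```

Passing discovery as a callable makes the short-circuit explicit: the network call is never made when a project id is already known.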

File diff suppressed because it is too large

View file

@ -11,6 +11,18 @@ Providers live in ``<repo>/plugins/image_gen/<name>/`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/image_gen/<name>/`` (user, opt-in
via ``plugins.enabled``).
Unified surface
---------------
One tool ``image_generate`` covers **text-to-image** and
**image-to-image / image editing**. The router is the presence of
``image_url`` (and/or ``reference_image_urls``): if any source image is
provided, the provider routes to its image-to-image / edit endpoint; if
omitted, the provider routes to text-to-image. Users pick one **model**
(e.g. nano-banana-pro, gpt-image-2, grok-imagine-image); the provider
handles which underlying endpoint to hit. This mirrors the ``video_gen``
provider design (``agent/video_gen_provider.py``) so the two surfaces
stay learnable together.
Response shape
--------------
All providers return a dict that :func:`success_response` / :func:`error_response`
@ -21,6 +33,7 @@ produce. The tool wrapper JSON-serializes it. Keys:
model str provider-specific model identifier
prompt str echoed prompt
aspect_ratio str "landscape" | "square" | "portrait"
modality str "text" | "image" (which mode was used)
provider str provider name (for diagnostics)
error str only when success=False
error_type str only when success=False
@ -127,19 +140,51 @@ class ImageGenProvider(abc.ABC):
return models[0].get("id")
return None
def capabilities(self) -> Dict[str, Any]:
"""Return what this provider supports.
Returned dict (all keys optional)::
{
"modalities": ["text", "image"], # which inputs the backend accepts
"max_reference_images": 9, # cap for reference_image_urls
}
``modalities`` declares whether the active backend/model supports
text-to-image (``"text"``), image-to-image / editing (``"image"``),
or both. The tool layer surfaces this in the dynamic schema so the
model knows when ``image_url`` is honored. Used by ``hermes tools``
for the picker too. Default: text-only (backward compatible: a
provider that doesn't override this advertises text-to-image only).
"""
return {
"modalities": ["text"],
"max_reference_images": 0,
}
@abc.abstractmethod
def generate(
self,
prompt: str,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
*,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
**kwargs: Any,
) -> Dict[str, Any]:
"""Generate an image.
"""Generate an image from a text prompt, or edit/transform a source image.
Routing: if ``image_url`` (or any ``reference_image_urls``) is
provided, the provider should route to its image-to-image / edit
endpoint; otherwise text-to-image. ``image_url`` is the primary
source image to edit; ``reference_image_urls`` are additional
style/composition references (provider clamps to its declared
``max_reference_images``).
Implementations should return the dict from :func:`success_response`
or :func:`error_response`. ``kwargs`` may contain forward-compat
parameters future versions of the schema will expose; implementations
should ignore unknown keys.
parameters future versions of the schema will expose;
implementations MUST ignore unknown keys (no TypeError).
"""
@ -162,6 +207,26 @@ def resolve_aspect_ratio(value: Optional[str]) -> str:
return DEFAULT_ASPECT_RATIO
def normalize_reference_images(value: Any) -> Optional[List[str]]:
    """Coerce a reference-image argument into a clean list of URL/path strings.

    Accepts a single string or a list; strips blanks and whitespace. Returns
    ``None`` when nothing usable remains so providers can treat "no refs" as a
    single sentinel.
    """
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, (list, tuple)):
        return None
    out: List[str] = []
    for item in value:
        if isinstance(item, str) and item.strip():
            out.append(item.strip())
    return out or None
def _images_cache_dir() -> Path:
"""Return ``$HERMES_HOME/cache/images/``, creating parents as needed."""
from hermes_constants import get_hermes_home
@ -280,13 +345,16 @@ def success_response(
prompt: str,
aspect_ratio: str,
provider: str,
modality: str = "text",
extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Build a uniform success response dict.
``image`` may be an HTTP URL or an absolute filesystem path (for b64
providers like OpenAI). Callers that need to pass through additional
backend-specific fields can supply ``extra``.
providers like OpenAI). ``modality`` is ``"text"`` (text-to-image) or
``"image"`` (image-to-image / editing) and indicates which endpoint was
actually hit, useful for diagnostics. Callers that need to pass through
additional backend-specific fields can supply ``extra``.
"""
payload: Dict[str, Any] = {
"success": True,
@ -294,6 +362,7 @@ def success_response(
"model": model,
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"modality": modality,
"provider": provider,
}
if extra:
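The routing rule from the `generate` docstring (any source image selects the image-to-image path, and references are clamped to the declared cap) might look like the sketch below. `plan_request` is a hypothetical helper, not part of the provider API; it assumes a capabilities dict shaped like the one `capabilities()` returns, and it decides the modality from the references that survive clamping, which is one possible reading of the rule.

```python
from typing import Any, Dict, List, Optional, Tuple


def plan_request(
    capabilities: Dict[str, Any],
    image_url: Optional[str] = None,
    reference_image_urls: Optional[List[str]] = None,
) -> Tuple[str, List[str]]:
    """Return ``(modality, references)`` for one generate call.

    Hypothetical helper: picks "image" when any source image is present
    and clamps references to the provider's declared cap.
    """
    # Drop blank entries before applying the cap.
    refs = [r.strip() for r in (reference_image_urls or []) if r and r.strip()]
    cap = int(capabilities.get("max_reference_images", 0))
    refs = refs[:cap]
    modality = "image" if (image_url or refs) else "text"
    return modality, refs
```

With the default capabilities (`max_reference_images` of 0) every reference is dropped, so a text-only provider stays on the text-to-image path unless `image_url` is set.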

View file

@ -721,9 +721,10 @@ class MemoryManager:
try:
provider.on_session_end(messages)
except Exception as e:
logger.debug(
logger.warning(
"Memory provider '%s' on_session_end failed: %s",
provider.name, e,
exc_info=True,
)
def on_session_switch(

View file

@ -28,6 +28,7 @@ Optional hooks (override to opt in):
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
backup_paths() -> list[str] extra on-disk paths to include in `hermes backup`
"""
from __future__ import annotations
@ -294,3 +295,21 @@ class MemoryProvider(ABC):
Use to mirror built-in memory writes to your backend.
"""
def backup_paths(self) -> List[str]:
"""Return extra on-disk paths this provider stores OUTSIDE HERMES_HOME.
``hermes backup`` only walks HERMES_HOME, so any provider state kept
under ``~/.honcho``, ``~/.hindsight``, ``~/.openviking``, etc. is lost
across a backup/import cycle unless it's declared here.
Return a list of absolute path strings (files or directories). The
backup command resolves each, captures the ones that exist and live
under the user's home directory into a reserved ``_external/`` subtree
of the archive, and ``hermes import`` restores them to their original
locations. Paths outside the home directory are skipped for safety.
MUST be callable without ``initialize()`` and without network; resolve
from config/env only. Default returns an empty list (nothing external).
"""
return []
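A provider override of `backup_paths` could be as small as the sketch below. Everything named here is invented for illustration: `ExampleProvider`, the `~/.example-memory` directory, and the `EXAMPLE_MEMORY_DIR` variable are assumptions, and the class deliberately does not subclass `MemoryProvider` so that the snippet runs on its own.

```python
import os
from pathlib import Path
from typing import List


class ExampleProvider:
    """Hypothetical provider that keeps state in ``~/.example-memory``."""

    def backup_paths(self) -> List[str]:
        # Resolve from env only: no initialize(), no network access.
        # EXAMPLE_MEMORY_DIR is an invented variable for this sketch.
        override = os.environ.get("EXAMPLE_MEMORY_DIR", "")
        base = Path(override) if override else Path.home() / ".example-memory"
        # Return absolute path strings, as the docstring above requires.
        return [str(base.expanduser().resolve())]
```

The method does not check whether the path exists; per the docstring, the backup command is the one that filters for existing paths under the home directory.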

50
agent/message_content.py Normal file
View file

@ -0,0 +1,50 @@
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

_NON_TEXT_PART_TYPES = {"image", "image_url", "input_image", "audio", "input_audio"}
_TEXT_KEYS = ("text", "content", "input_text", "output_text", "summary_text")


def _field(value: Any, key: str) -> Any:
    if isinstance(value, Mapping):
        return value.get(key)
    return getattr(value, key, None)


def _text_from_part(part: Any) -> str:
    if part is None:
        return ""
    if isinstance(part, str):
        return part
    part_type = str(_field(part, "type") or "").strip().lower()
    if part_type in _NON_TEXT_PART_TYPES:
        return ""
    for key in _TEXT_KEYS:
        text = _field(part, key)
        if isinstance(text, str):
            return text
    return ""


def flatten_message_text(content: Any, *, sep: str = "\n") -> str:
    """Return the visible text from common chat/Responses message content shapes."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        chunks = [_text_from_part(part) for part in content]
        return sep.join(chunk for chunk in chunks if chunk)
    text = _text_from_part(content)
    if text:
        return text
    try:
        return str(content)
    except Exception:
        return ""

View file

@ -238,6 +238,23 @@ KANBAN_GUIDANCE = (
"of the decomposition. Do NOT execute the work yourself; your job is "
"routing, not implementation.\n"
"\n"
"## Reference details that change outcomes\n"
"\n"
"- **Workspace.** `cd $HERMES_KANBAN_WORKSPACE` first. For a `worktree` kind "
"with no `.git`, `git worktree add <path> "
"${HERMES_KANBAN_BRANCH:-wt/$HERMES_KANBAN_TASK}` from the main repo, then "
"cd there.\n"
"- **Deliverables.** Files a human wants go in "
"`kanban_complete(artifacts=[<absolute paths>])` (top-level param; paths in "
"`metadata` are NOT uploaded). Files must exist at completion.\n"
"- **Created cards.** List ids in `kanban_complete(created_cards=[...])` "
"ONLY when captured from a successful `kanban_create` return — never invent "
"or paste ids; the kernel rejects the completion on any phantom id.\n"
"- **Orchestrating: discover profiles first.** The dispatcher SILENTLY "
"drops a card with an unknown assignee (it sits in `ready` forever). Ground "
"every assignee in a real profile (`hermes profile list`, or ask the user), "
"and express dependencies via `parents=[...]` on `kanban_create`, not prose.\n"
"\n"
"## Do NOT\n"
"\n"
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "

View file

@ -120,9 +120,25 @@ _JSON_FIELD_RE = re.compile(
re.IGNORECASE,
)
# Authorization headers
# Authorization headers — any scheme (Bearer, Basic, Token, Digest, …) plus the
# bare-credential form, and Proxy-Authorization. The credential token is masked
# while the header name and scheme word are preserved for debuggability. The
# previous rule only matched ``Bearer``, so ``Basic <base64 user:pass>`` and
# ``token <pat>`` leaked verbatim into logs/transcripts.
_AUTH_HEADER_RE = re.compile(
r"(Authorization:\s*Bearer\s+)(\S+)",
r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
re.IGNORECASE,
)
# API-key style auth headers carrying a single opaque value (no scheme word).
# Anthropic and many providers authenticate with ``x-api-key``; values without
# a known vendor prefix (custom/local backends) would otherwise leak when a
# request or curl command is logged or echoed into tool output / transcripts.
_SECRET_HEADER_NAMES = (
r"(?:x-api-key|x-goog-api-key|api-key|apikey|x-api-token|x-auth-token|x-access-token)"
)
_SECRET_HEADER_RE = re.compile(
rf"({_SECRET_HEADER_NAMES}\s*:\s*)(\S+)",
re.IGNORECASE,
)
@ -374,11 +390,19 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
# Authorization headers — _AUTH_HEADER_RE is "Authorization: Bearer ..."
# case-insensitive, so "uthorization" is the cheapest substring gate that
# covers both "Authorization" and "authorization" without a casefold().
# Authorization headers — _AUTH_HEADER_RE matches any scheme after
# "[Proxy-]Authorization:" case-insensitively, so "uthorization" is the
# cheapest substring gate that covers every casing without a casefold().
if "uthorization" in text or "UTHORIZATION" in text:
text = _AUTH_HEADER_RE.sub(
lambda m: m.group(1) + (m.group(2) or "") + _mask_token(m.group(3)),
text,
)
# API-key style headers (x-api-key, api-key, …). Header values are
# colon-separated, so gate on ":" — the regex itself is the precise filter.
if ":" in text:
text = _SECRET_HEADER_RE.sub(
lambda m: m.group(1) + _mask_token(m.group(2)),
text,
)
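The widened Authorization rule can be exercised on its own. In this sketch the pattern is copied from the hunk above, while `mask` is a stand-in for `_mask_token` (whose real output format is not shown in this diff) and `redact_auth` is a hypothetical wrapper name.

```python
import re

# Pattern copied from the hunk above.
AUTH_HEADER_RE = re.compile(
    r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
    re.IGNORECASE,
)


def mask(token: str) -> str:
    """Stand-in for ``_mask_token``; the real masking format may differ."""
    return "***"


def redact_auth(text: str) -> str:
    # Keep the header name and scheme word, mask only the credential.
    return AUTH_HEADER_RE.sub(
        lambda m: m.group(1) + (m.group(2) or "") + mask(m.group(3)),
        text,
    )
```

Group 2 is optional, so the bare-credential form (`Authorization: <token>` with nothing after it) falls through to group 3 and is masked as well.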

205
agent/secret_scope.py Normal file
View file

@ -0,0 +1,205 @@
"""Profile-scoped credential resolution for multi-profile gateway multiplexing.
The multiplexing gateway serves many profiles from one process. Each profile
has its own ``.env`` with its own provider keys and platform tokens, so we
**cannot** union them into the process-global ``os.environ`` (that would leak
profile A's keys to profile B's turns, and to every subprocess spawned with
``env=dict(os.environ)``).
This module provides a fail-closed, context-local secret scope:
- ``set_secret_scope(mapping)`` installs the active profile's secrets for the
current task (a contextvar, so it propagates into the agent's worker thread
via ``copy_context()`` exactly like the HERMES_HOME override).
- ``get_secret(name)`` reads from that scope. When multiplexing is **active**
and no scope is set, it RAISES rather than silently falling back to
``os.environ``, so an un-migrated or newly-added call site fails loud at that
exact line instead of leaking another profile's value. When multiplexing is
**off** (the default), it transparently reads ``os.environ`` so the
single-profile gateway and every non-gateway caller behave exactly as before.
Design rationale lives in ``docs/design/multiplexing-gateway.md`` (Workstream A).
"""
from __future__ import annotations
import os
from contextvars import ContextVar, Token
from pathlib import Path
from typing import Dict, Mapping, Optional
# ── multiplex-active flag ────────────────────────────────────────────────
# Process-global: set once at gateway startup when gateway.multiplex_profiles
# is true. Governs whether get_secret() fails closed on an unscoped read.
# A plain module global (not a contextvar): it describes the deployment mode,
# not a per-task value.
_MULTIPLEX_ACTIVE: bool = False
def set_multiplex_active(active: bool) -> None:
"""Mark whether the process is running as a profile multiplexer.
Called once at gateway startup. When True, ``get_secret`` fails closed on
an unscoped read instead of falling back to ``os.environ``.
"""
global _MULTIPLEX_ACTIVE
_MULTIPLEX_ACTIVE = bool(active)
def is_multiplex_active() -> bool:
"""Return whether the process is running as a profile multiplexer."""
return _MULTIPLEX_ACTIVE
# ── the secret scope contextvar ──────────────────────────────────────────
_SECRET_SCOPE: ContextVar[Optional[Mapping[str, str]]] = ContextVar(
"_SECRET_SCOPE", default=None
)
class UnscopedSecretError(RuntimeError):
"""Raised when a secret is read in multiplex mode with no scope installed.
This is the fail-closed signal: it means a credential read reached
``get_secret`` without a profile scope active, which in a multiplexer would
otherwise leak whichever profile's value happened to be in ``os.environ``.
The fix is to wrap the call path in ``set_secret_scope(...)`` (the per-turn
/ per-adapter profile scope), not to widen the allowlist.
"""
def set_secret_scope(secrets: Optional[Mapping[str, str]]) -> Token:
"""Install the active profile's secret mapping for the current context.
Returns a token for ``reset_secret_scope``. Pass ``None`` to clear.
"""
return _SECRET_SCOPE.set(secrets)
def reset_secret_scope(token: Token) -> None:
"""Restore the previous secret scope."""
_SECRET_SCOPE.reset(token)
def current_secret_scope() -> Optional[Mapping[str, str]]:
"""Return the active secret mapping, or None when no scope is installed."""
return _SECRET_SCOPE.get()
# ── genuinely-global env vars (NOT per-profile secrets) ──────────────────
# These are process/deployment-level settings, not profile credentials. They
# legitimately live in os.environ and must keep reading from it even in
# multiplex mode — routing them through the fail-closed path would wrongly
# crash. Anything matching is read from os.environ regardless of scope.
#
# Membership test is by exact name OR prefix (see _is_global_env). Keep this
# list tight: when in doubt a value is a profile secret, not a global.
_GLOBAL_ENV_EXACT = frozenset({
# Hermes runtime / deployment
"HERMES_HOME", "HERMES_PROFILE", "HERMES_GATEWAY_LOCK_DIR",
"HERMES_MAX_ITERATIONS", "HERMES_MAX_TOKENS", "HERMES_API_TIMEOUT",
"HERMES_REDACT_SECRETS", "HERMES_NOUS_TIMEOUT_SECONDS",
"_HERMES_GATEWAY",
# OS / interpreter
"PATH", "HOME", "USER", "LANG", "LC_ALL", "TZ", "PWD", "SHELL", "TMPDIR",
"VIRTUAL_ENV", "PYTHONPATH", "SSL_CERT_FILE",
# Kanban paths (per-board, not per-profile-secret)
"HERMES_KANBAN_DB", "HERMES_KANBAN_WORKSPACES_ROOT", "HERMES_KANBAN_BOARD",
})
_GLOBAL_ENV_PREFIXES = (
"HERMES_KANBAN_",
"HERMES_TELEGRAM_", # tuning knobs (batch delays, fallback toggles) — NOT the token
"TERMINAL_", # terminal/sandbox backend settings
)
def _is_global_env(name: str) -> bool:
"""Return True for genuinely process-global (non-profile-secret) env vars."""
if name in _GLOBAL_ENV_EXACT:
return True
return any(name.startswith(p) for p in _GLOBAL_ENV_PREFIXES)
def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
"""Resolve a credential by env-var name, honoring the active profile scope.
Resolution order:
1. Genuinely-global vars (``_is_global_env``) always read ``os.environ``;
they are deployment settings, not profile secrets.
2. When a secret scope is installed (multiplexed turn), read from it; an
absent key returns ``default``. The scope is authoritative: we do NOT
fall through to ``os.environ``, because in a multiplexer ``os.environ``
may hold another profile's value.
3. No scope installed:
- multiplex INACTIVE (default deployment): read ``os.environ``,
identical to the legacy ``os.getenv`` behavior every caller had before.
- multiplex ACTIVE: FAIL CLOSED. Raise ``UnscopedSecretError`` so the
missing scope is caught loudly instead of leaking a cross-profile value.
"""
if _is_global_env(name):
val = os.environ.get(name)
return val if val is not None else default
scope = _SECRET_SCOPE.get()
if scope is not None:
val = scope.get(name)
return val if val is not None else default
if _MULTIPLEX_ACTIVE:
raise UnscopedSecretError(
f"get_secret({name!r}) called with no profile secret scope active "
f"while multiplexing is on. This credential read must run inside a "
f"set_secret_scope(...) block (the per-turn / per-adapter profile "
f"scope). Reading os.environ here would risk leaking another "
f"profile's value. See docs/design/multiplexing-gateway.md "
f"(Workstream A)."
)
val = os.environ.get(name)
return val if val is not None else default
def load_env_file(env_path: Path) -> Dict[str, str]:
"""Parse a ``.env`` file into a plain dict WITHOUT touching ``os.environ``.
Used to load a profile's secrets into an isolated mapping for
``set_secret_scope``. Mirrors python-dotenv's basic parsing (KEY=VALUE,
``export`` prefix, ``#`` comments, optional matching quotes) but never
mutates the process environment; that isolation is the whole point.
"""
secrets: Dict[str, str] = {}
try:
text = env_path.read_text(encoding="utf-8")
except (FileNotFoundError, OSError, UnicodeDecodeError):
return secrets
for raw in text.splitlines():
line = raw.strip()
if not line or line.startswith("#"):
continue
if line.startswith("export "):
line = line[len("export "):].lstrip()
if "=" not in line:
continue
key, _, value = line.partition("=")
key = key.strip()
if not key:
continue
value = value.strip()
if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
value = value[1:-1]
secrets[key] = value
return secrets
def build_profile_secret_scope(hermes_home: Path) -> Dict[str, str]:
"""Build a profile's secret mapping from its ``<home>/.env``.
Returns a fresh dict (safe to install via ``set_secret_scope``). Genuinely
global vars are intentionally NOT copied in; ``get_secret`` reads those
from ``os.environ`` directly, so the scope holds only profile secrets.
"""
return load_env_file(Path(hermes_home) / ".env")
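Call sites will typically want the set/reset pair wrapped so the previous scope is always restored. The sketch below shows that pattern with simplified stand-ins: `_SCOPE`, `secret_scope`, and `read_secret` are hypothetical names that mirror, but do not reproduce, `_SECRET_SCOPE`, `set_secret_scope`/`reset_secret_scope`, and `get_secret`. The global allowlist and the `os.environ` fallback are left out.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Mapping, Optional

# Simplified stand-in for the module's private contextvar.
_SCOPE: ContextVar[Optional[Mapping[str, str]]] = ContextVar("_SCOPE", default=None)


@contextmanager
def secret_scope(secrets: Mapping[str, str]) -> Iterator[None]:
    """Install ``secrets`` for the duration of a ``with`` block (hypothetical)."""
    token = _SCOPE.set(secrets)
    try:
        yield
    finally:
        # Always restore the previous scope, even if the body raised.
        _SCOPE.reset(token)


def read_secret(name: str, *, multiplex_active: bool) -> Optional[str]:
    """Scoped read that fails closed when multiplexing has no scope."""
    scope = _SCOPE.get()
    if scope is not None:
        return scope.get(name)  # scope is authoritative; no env fallback
    if multiplex_active:
        raise RuntimeError(f"unscoped read of {name!r} in multiplex mode")
    return None  # the real function would consult os.environ here
```

Because the token is reset in `finally`, nested scopes unwind in order and an exception in one profile's turn cannot leave its secrets installed for the next.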

View file

@ -49,6 +49,58 @@ Wire protocol
# Silent no-op:
<empty or any non-matching JSON object>
Per-event ``extra`` keys
~~~~~~~~~~~~~~~~~~~~~~~~
The ``extra`` object contains every kwarg that is **not** one of the
top-level payload keys (``tool_name``, ``args``, ``session_id``,
``parent_session_id``). The tables below list the ``extra`` keys
emitted by each built-in hook site.
``post_tool_call`` (emitted from ``model_tools.py``)::
result tool return value (serialised string)
status "ok" | "error" | "blocked"
error_type error category (e.g. "ValueError"), or None
error_message human-readable error text, or None
duration_ms wall-clock time in milliseconds
task_id current task id (empty string if none)
tool_call_id provider tool-call id
turn_id current turn id
api_request_id current API request id
middleware_trace list of dicts from tool middleware chain
``pre_tool_call`` (emitted from ``model_tools.py``)::
task_id current task id (empty string if none)
tool_call_id provider tool-call id
turn_id current turn id
api_request_id current API request id
middleware_trace list of dicts from tool middleware chain
``on_session_start`` (emitted from ``agent/conversation_loop.py``)::
model model name (e.g. "claude-sonnet-4-20250514")
platform platform identifier (e.g. "cli", "whatsapp")
``on_session_end`` (emitted from ``agent/turn_finalizer.py``)::
task_id current task id
turn_id current turn id
completed bool, True when the turn produced a final response
interrupted bool, True when the user interrupted
model model name
platform platform identifier
``subagent_stop`` (emitted from ``tools/delegate_tool.py``)::
parent_turn_id parent agent's current turn id
child_session_id child (subagent) session id
child_role role string of the child agent
child_summary summary of the child's work
child_status exit status string (e.g. "success", "error")
duration_ms wall-clock time of the child run in milliseconds
"""
from __future__ import annotations
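A hook consumer might read the `post_tool_call` keys listed above roughly as follows. This is a sketch under assumptions: it takes the payload to be a JSON object with `tool_name` at the top level and the per-event keys nested under `extra`, as the docstring describes, and `summarize_post_tool_call` is a hypothetical name.

```python
import json
from typing import Any, Dict


def summarize_post_tool_call(raw: str) -> str:
    """Build a one-line summary from a ``post_tool_call`` payload.

    Assumes ``tool_name`` sits at the top level and the per-event keys
    (status, duration_ms, error_type, error_message) live under ``extra``.
    """
    payload: Dict[str, Any] = json.loads(raw)
    extra = payload.get("extra") or {}
    status = extra.get("status", "unknown")
    duration = extra.get("duration_ms")
    line = f"{payload.get('tool_name', '?')}: {status}"
    if duration is not None:
        line += f" in {duration} ms"
    if status == "error":
        # error_type / error_message are None unless the call failed.
        line += f" ({extra.get('error_type')}: {extra.get('error_message')})"
    return line
```

Reading every key with `.get` keeps the consumer tolerant of payloads from other hook sites, which carry a different set of `extra` keys.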

View file

@ -280,9 +280,9 @@ def skill_matches_environment(frontmatter: Dict[str, Any]) -> bool:
This is an OFFER-time filter: it controls whether a skill shows up in the
skills index / autocomplete / slash-command list. It is intentionally NOT
enforced by ``skill_view`` or ``--skills`` preloading: an explicit load is
explicit consent, and load-bearing force-loads (e.g. the kanban dispatcher
injecting ``--skills kanban-worker``) must always succeed regardless of how
the offer surfaces filter the skill.
explicit consent, and load-bearing force-loads (e.g. a dispatcher pinning
a task to a specialist skill via ``--skills``) must always succeed
regardless of how the offer surfaces filter the skill.
A skill matches when ANY of its declared environments is currently active
(OR semantics, mirroring ``platforms``). Unknown env tags fail open.

View file

@ -22,9 +22,31 @@ TitleCallback = Callable[[str], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
"Write the title in the same language the user is writing in. "
"Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
)
_TITLE_PROMPT_PINNED_LANGUAGE = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
"Write the title in {language}. "
"Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
)
def _title_language() -> str:
    """Return configured title language, or empty string to match the user."""
    try:
        from hermes_cli.config import load_config

        title_cfg = ((load_config() or {}).get("auxiliary") or {}).get("title_generation") or {}
        # ``or ""`` keeps an explicit ``language: null`` from becoming "None".
        return str(title_cfg.get("language") or "").strip()
    except Exception:
        return ""
def generate_title(
user_message: str,
@ -48,8 +70,11 @@ def generate_title(
user_snippet = user_message[:500] if user_message else ""
assistant_snippet = assistant_response[:500] if assistant_response else ""
language = _title_language()
prompt = _TITLE_PROMPT_PINNED_LANGUAGE.format(language=language) if language else _TITLE_PROMPT
messages = [
{"role": "system", "content": _TITLE_PROMPT},
{"role": "system", "content": prompt},
{"role": "user", "content": f"User: {user_snippet}\n\nAssistant: {assistant_snippet}"},
]

View file

@ -44,9 +44,26 @@ from tools.tool_result_storage import (
maybe_persist_tool_result,
enforce_turn_budget,
)
from tools.budget_config import BudgetConfig, DEFAULT_BUDGET, budget_for_context_window
logger = logging.getLogger(__name__)
def _budget_for_agent(agent) -> BudgetConfig:
    """Resolve a tool-result BudgetConfig scaled to the agent's context window.

    Large-context models keep the historical 100K/200K char defaults; small
    models (e.g. a 65K-token local model switched into mid-session) get a budget
    proportional to their window so a single large tool result can't push the
    request past the model's limit (#23767). Falls back to the default budget
    when the context length isn't resolvable.
    """
    try:
        ctx = getattr(getattr(agent, "context_compressor", None), "context_length", None)
        return budget_for_context_window(int(ctx)) if ctx else DEFAULT_BUDGET
    except Exception:
        return DEFAULT_BUDGET
# Maximum number of concurrent worker threads for parallel tool execution.
# Mirrors the constant in ``run_agent`` for tests/imports that look here.
_MAX_TOOL_WORKERS = 8
@ -249,6 +266,10 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
tool_calls = assistant_message.tool_calls
num_tools = len(tool_calls)
# Resolve the context-scaled tool-output budget once per turn (cheap, but
# avoids rebuilding it per result inside the loop below).
_tool_budget = _budget_for_agent(agent)
# ── Pre-flight: interrupt check ──────────────────────────────────
if agent._interrupt_requested:
print(f"{agent.log_prefix}⚡ Interrupt: skipping {num_tools} tool call(s)")
@ -725,6 +746,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
tool_name=name,
tool_use_id=tc.id,
env=get_active_env(effective_task_id),
config=_tool_budget,
) if not _is_multimodal_tool_result(function_result) else function_result
subdir_hints = agent._subdirectory_hints.check_tool_call(name, args)
@ -756,7 +778,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
num_tools = len(parsed_calls)
if num_tools > 0:
turn_tool_msgs = messages[-num_tools:]
enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id))
enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id), config=_tool_budget)
# ── /steer injection ──────────────────────────────────────────────
# Append any pending user steer text to the last tool result so the
@ -769,6 +791,8 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
def execute_tool_calls_sequential(agent, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
"""Execute tool calls sequentially (original behavior). Used for single calls or interactive tools."""
# Resolve the context-scaled tool-output budget once per turn.
_tool_budget = _budget_for_agent(agent)
for i, tool_call in enumerate(assistant_message.tool_calls, 1):
# SAFETY: check interrupt BEFORE starting each tool.
# If the user sent "stop" during a previous tool's execution,
@ -1377,6 +1401,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
tool_name=function_name,
tool_use_id=tool_call.id,
env=get_active_env(effective_task_id),
config=_tool_budget,
) if not _is_multimodal_tool_result(function_result) else function_result
# Discover subdirectory context files from tool arguments
@ -1425,7 +1450,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
# ── Per-turn aggregate budget enforcement ─────────────────────────
num_tools_seq = len(assistant_message.tool_calls)
if num_tools_seq > 0:
enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id))
enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id), config=_tool_budget)
# ── /steer injection ──────────────────────────────────────────────
# See _execute_tool_calls_parallel for the rationale. Same hook,

View file

@ -172,6 +172,7 @@ class ChatCompletionsTransport(ProviderTransport):
"codex_reasoning_items" in msg
or "codex_message_items" in msg
or "tool_name" in msg
or "timestamp" in msg # #47868 — strict providers reject this
):
needs_sanitize = True
break
@ -201,6 +202,7 @@ class ChatCompletionsTransport(ProviderTransport):
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
msg.pop("tool_name", None)
msg.pop("timestamp", None) # #47868 — leak into strict providers
# Drop all Hermes-internal scaffolding markers (``_``-prefixed).
# OpenAI's message schema has no ``_``-prefixed fields, so this
# is safe and future-proofs against new markers being added.
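The sanitization rule in this hunk can be sketched in isolation. This is a minimal illustration, not the transport's code: `sanitize_message` and `_INTERNAL_KEYS` are hypothetical names, the key list is copied from the lines visible in the diff, and the sketch returns a copy whereas the hunk pops keys in place.

```python
from typing import Any, Dict

# Keys named in the hunk above (copied from the diff, not from the full file).
_INTERNAL_KEYS = (
    "codex_reasoning_items",
    "codex_message_items",
    "tool_name",
    "timestamp",
)


def sanitize_message(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``msg`` without Hermes-internal fields.

    Drops the explicitly listed keys plus anything ``_``-prefixed, which the
    comment above treats as internal scaffolding markers.
    """
    return {
        key: value
        for key, value in msg.items()
        if key not in _INTERNAL_KEYS and not key.startswith("_")
    }


if __name__ == "__main__":
    raw = {"role": "user", "content": "hi", "timestamp": 1.0, "_retry": True}
    print(sanitize_message(raw))
```

Filtering by prefix rather than by an exhaustive list is what makes the rule tolerant of markers added later.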
@ -435,10 +437,6 @@ class ChatCompletionsTransport(ProviderTransport):
extra_body["extra_body"] = openai_compat_extra
elif raw_thinking_config:
extra_body["thinking_config"] = raw_thinking_config
elif provider_name == "google-gemini-cli":
thinking_config = _build_gemini_thinking_config(model, reasoning_config)
if thinking_config:
extra_body["thinking_config"] = thinking_config
# Merge any pre-built extra_body additions
additions = params.get("extra_body_additions")

View file

@ -112,6 +112,24 @@ def build_turn_context(
# Restore the primary runtime if the previous turn activated fallback.
agent._restore_primary_runtime()
# Between-turns MCP refresh: an MCP server that finished connecting since
# the previous turn (slow HTTP/OAuth servers routinely take 2-6s on a cold
# connect, missing the bounded startup wait) lands in THIS turn's tool
# snapshot. This is cache-safe by construction: it runs in the per-turn
# prologue, before this turn's first API call assembles ``tools=``, so it
# only ever extends a fresh request prefix — it never mutates the cached
# prefix of an in-flight turn. No-op when no MCP servers are registered
# (the common case, gated by the cheap ``has_registered_mcp_tools`` check)
# or when the tool set is unchanged (``refresh_agent_mcp_tools`` diffs by
# name and leaves the snapshot untouched on no-change).
try:
if not getattr(agent, "_skip_mcp_refresh", False):
from tools.mcp_tool import has_registered_mcp_tools, refresh_agent_mcp_tools
if has_registered_mcp_tools():
refresh_agent_mcp_tools(agent, quiet_mode=True)
except Exception:
logger.debug("between-turns MCP tool refresh skipped", exc_info=True)
# Sanitize surrogate characters from user input.
if isinstance(user_message, str):
user_message = sanitize_surrogates(user_message)

View file

@ -128,19 +128,44 @@ def finalize_turn(
and not failed
)
# Post-loop cleanup must never lose the response. Trajectory save,
# resource teardown, and session persistence all touch fallible
# surfaces — file I/O / JSON serialization (_save_trajectory), remote
# VM/browser teardown over the network (_cleanup_task_resources), and
# SQLite writes (_persist_session). A raise from any of them used to
# propagate straight out of run_conversation, discarding the partial
# final_response the caller is waiting for (subprocess wrappers saw an
# empty stdout with no traceback — #8049). Each step is now guarded
# independently so one failure can't skip the others, and any errors
# are surfaced on the result dict via ``cleanup_errors`` rather than
# killing the turn.
_cleanup_errors = []
# Save trajectory if enabled. ``user_message`` may be a multimodal
# list of parts; the trajectory format wants a plain string.
agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
try:
agent._save_trajectory(messages, _summarize_user_message_for_log(user_message), completed)
except Exception as _save_err:
_cleanup_errors.append(f"save_trajectory: {_save_err}")
logger.error("finalize_turn: _save_trajectory failed: %s", _save_err, exc_info=True)
# Clean up VM and browser for this task after conversation completes
agent._cleanup_task_resources(effective_task_id)
try:
agent._cleanup_task_resources(effective_task_id)
except Exception as _cleanup_err:
_cleanup_errors.append(f"cleanup_task_resources: {_cleanup_err}")
logger.error("finalize_turn: _cleanup_task_resources failed: %s", _cleanup_err, exc_info=True)
# Persist session to both JSON log and SQLite only after private retry
# scaffolding has been removed. Otherwise a later user "continue" turn
# can replay assistant("(empty)") / recovery nudges and fall into the
# same empty-response loop again.
agent._drop_trailing_empty_response_scaffolding(messages)
agent._persist_session(messages, conversation_history)
try:
agent._drop_trailing_empty_response_scaffolding(messages)
agent._persist_session(messages, conversation_history)
except Exception as _persist_err:
_cleanup_errors.append(f"persist_session: {_persist_err}")
logger.error("finalize_turn: _persist_session failed: %s", _persist_err, exc_info=True)
# ── Turn-exit diagnostic log ─────────────────────────────────────
# Always logged at INFO so agent.log captures WHY every turn ended.
@ -354,6 +379,11 @@ def finalize_turn(
}
if agent._tool_guardrail_halt_decision is not None:
result["guardrail"] = agent._tool_guardrail_halt_decision.to_metadata()
# Surface any post-loop cleanup failures so the caller can distinguish a
# clean turn from one whose trajectory/session/resource teardown raised
# (the response is still returned either way — #8049).
if _cleanup_errors:
result["cleanup_errors"] = _cleanup_errors
# If a /steer landed after the final assistant turn (no more tool
# batches to drain into), hand it back to the caller so it can be
# delivered as the next user turn instead of being silently lost.
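The guarded-cleanup pattern described in the comments above can be sketched as follows. This is a simplified illustration under stated assumptions: `finalize_with_guards` and the `(label, callable)` step list are hypothetical, and only the `cleanup_errors` key and the `label: error` message shape come from the diff.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("finalize_sketch")

Step = Tuple[str, Callable[[], None]]


def finalize_with_guards(final_response: str, steps: List[Step]) -> Dict[str, Any]:
    """Run every cleanup step independently and never lose the response."""
    cleanup_errors: List[str] = []
    for label, step in steps:
        try:
            step()
        except Exception as err:  # one failing step must not skip the others
            cleanup_errors.append(f"{label}: {err}")
            logger.error("finalize: %s failed: %s", label, err, exc_info=True)

    result: Dict[str, Any] = {"final_response": final_response}
    if cleanup_errors:
        # Only present when something raised, so a clean turn stays unchanged.
        result["cleanup_errors"] = cleanup_errors
    return result
```

A caller can then tell a clean turn from a degraded one by checking for the `cleanup_errors` key, while still receiving the response in both cases.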

View file

@ -58,6 +58,12 @@ class TurnRetryState:
primary_recovery_attempted: bool = False
has_retried_429: bool = False
# ── Auth-failure provider failover ───────────────────────────────────
# Set once we've escalated a persistent 401/403 (after the per-provider
# credential-refresh attempt above failed) to the fallback chain, so we
# don't loop on the same auth failover within one attempt.
auth_failover_attempted: bool = False
# ── Restart signals (read by the outer loop after the attempt) ───────
restart_with_compressed_messages: bool = False
restart_with_length_continuation: bool = False

View file

@ -451,6 +451,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@ -461,6 +463,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@ -471,6 +475,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
@ -481,6 +487,8 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
cache_read_cost_per_million=Decimal("0.08"),
cache_write_cost_per_million=Decimal("1.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
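To see what the added cache rates change, here is a worked cost calculation. It is a sketch only: `Rates` and `estimate_cost` are hypothetical stand-ins, not the module's `PricingEntry` or its billing code, and it assumes each token class is billed at its own per-million rate. The rates are the 3.00 / 15.00 / 0.30 / 3.75 values from one of the entries above.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class Rates:
    """Hypothetical subset of a pricing entry (USD per million tokens)."""
    input_cost_per_million: Decimal
    output_cost_per_million: Decimal
    cache_read_cost_per_million: Decimal
    cache_write_cost_per_million: Decimal


def estimate_cost(
    rates: Rates,
    *,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> Decimal:
    """Sum each token class at its own rate, then scale down from per-million."""
    total = (
        Decimal(input_tokens) * rates.input_cost_per_million
        + Decimal(output_tokens) * rates.output_cost_per_million
        + Decimal(cache_read_tokens) * rates.cache_read_cost_per_million
        + Decimal(cache_write_tokens) * rates.cache_write_cost_per_million
    )
    return total / MILLION


RATES = Rates(Decimal("3.00"), Decimal("15.00"), Decimal("0.30"), Decimal("3.75"))

if __name__ == "__main__":
    cost = estimate_cost(
        RATES,
        input_tokens=200_000,
        output_tokens=50_000,
        cache_read_tokens=800_000,
        cache_write_tokens=100_000,
    )
    print(cost)  # 0.60 + 0.75 + 0.24 + 0.375 = 1.965 USD
```

With these numbers the cached reads cost a tenth of the plain input rate, which is why a missing `cache_read_cost_per_million` would noticeably misprice a cache-heavy session.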
@ -584,6 +592,26 @@ def resolve_billing_route(
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
def _normalize_bedrock_model_name(model: str) -> str:
    """Normalize a Bedrock model id to its bare foundation-model form.

    Bedrock cross-region inference profiles prefix the foundation model id
    with a region scope (``us.`` / ``global.`` / ``eu.`` / ``ap.`` / ``jp.``),
    e.g. ``us.anthropic.claude-opus-4-7``. The pricing table is keyed on the
    bare ``anthropic.claude-*`` id, so the prefix must be stripped before the
    lookup or every cross-region session prices as unknown. Mirrors the
    prefix list in ``bedrock_adapter.is_anthropic_bedrock_model``. Also
    normalizes dot-notation version numbers (``4.7`` -> ``4-7``).
    """
    name = model.lower().strip()
    for prefix in ("us.", "global.", "eu.", "ap.", "jp."):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
    return name
def _normalize_anthropic_model_name(model: str) -> str:
"""Normalize Anthropic model name variants to canonical form.
@ -614,6 +642,14 @@ def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]
entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
if entry:
return entry
# Bedrock cross-region inference profiles carry a region prefix
# (us./global./eu./...) that the bare pricing keys don't have.
if route.provider == "bedrock":
normalized = _normalize_bedrock_model_name(model)
if normalized != model:
entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
if entry:
return entry
return None
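The prefix-stripping fallback can be tried on its own. In this minimal sketch the normalization logic follows the function in the diff, but `PRICING` is a hypothetical one-entry table, `lookup_input_price` is an illustrative helper that returns only an input rate, and the sketch omits the diff's `normalized != model` guard and simply retries the lookup.

```python
import re
from decimal import Decimal
from typing import Optional

_REGION_PREFIXES = ("us.", "global.", "eu.", "ap.", "jp.")


def normalize_bedrock_model_name(model: str) -> str:
    """Strip one region-scope prefix and turn ``4.7`` into ``4-7``."""
    name = model.lower().strip()
    for prefix in _REGION_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)


# Hypothetical table keyed on (provider, bare model id), for illustration only.
PRICING = {("bedrock", "anthropic.claude-opus-4-7"): Decimal("15.00")}


def lookup_input_price(provider: str, model: str) -> Optional[Decimal]:
    """Try the id as given, then retry with the Bedrock-normalized form."""
    entry = PRICING.get((provider, model))
    if entry is None and provider == "bedrock":
        entry = PRICING.get((provider, normalize_bedrock_model_name(model)))
    return entry


if __name__ == "__main__":
    print(lookup_input_price("bedrock", "us.anthropic.claude-opus-4.7"))
```

Trying the exact id first keeps the common path to a single dictionary lookup and confines the normalization cost to cross-region profile ids.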

View file

@ -77,6 +77,19 @@ pub fn installer_dest() -> PathBuf {
hermes_home().join(name)
}
/// Marker the updater writes for the duration of an in-app update and removes
/// when it finishes (see update.rs `UpdateMarkerGuard`). A freshly-launched
/// desktop checks this before spawning its own local backend: spawning one
/// mid-update re-locks the venv shim and triggers `force_kill_other_hermes`,
/// which then kills that legitimate backend in a respawn loop (#50238).
///
/// Lives directly under HERMES_HOME (same rationale as `installer_dest`) so the
/// Electron desktop — which resolves HERMES_HOME identically and pins it into
/// the updater's env — agrees on the exact path.
pub fn update_in_progress_marker() -> PathBuf {
hermes_home().join(".hermes-update-in-progress")
}
/// Copy the currently-running installer binary to `installer_dest()` so it's
/// available for future `--update` runs and shortcut launches.
///
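The marker written at this path carries a `{pid}\n{started_at_unix}` payload so that a reader can tell a live update from a stranded file. The real reader lives in the desktop's `update-marker.cjs`, which this diff does not show, so the following Python sketch only illustrates the idea: `parse_marker`, `is_live`, the injected `pid_alive` probe, and the 30-minute ceiling are all assumptions.

```python
from typing import Callable, NamedTuple, Optional


class UpdateMarker(NamedTuple):
    pid: int
    started_at: int  # seconds since the Unix epoch


# Hypothetical hard ceiling; the actual value is not visible in the diff.
MAX_MARKER_AGE_S = 30 * 60


def parse_marker(body: str) -> Optional[UpdateMarker]:
    """Parse the two-line payload; return None for anything malformed."""
    lines = body.splitlines()
    if len(lines) != 2:
        return None
    try:
        pid = int(lines[0].strip())
        started_at = int(lines[1].strip())
    except ValueError:
        return None
    if pid <= 0:
        return None
    return UpdateMarker(pid, started_at)


def is_live(
    marker: UpdateMarker,
    *,
    now: int,
    pid_alive: Callable[[int], bool],
    max_age_s: int = MAX_MARKER_AGE_S,
) -> bool:
    """A marker counts as live only if its writer is running and it is recent."""
    if not pid_alive(marker.pid):
        return False
    return now - marker.started_at <= max_age_s
```

Passing `pid_alive` in as a callable keeps the platform-specific liveness probe out of the decision logic and makes the staleness rule easy to unit-test.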

View file

@ -103,9 +103,61 @@ pub async fn start_update(app: AppHandle) -> Result<(), String> {
Ok(())
}
/// RAII guard that owns the "update in progress" marker (see
/// `paths::update_in_progress_marker`). Created at the top of `run_update`;
/// its `Drop` removes the marker on EVERY exit path — success, early
/// `return Err`, or a panic that unwinds through `run_update` — so a crashed
/// or aborted updater can never permanently strand the marker and block
/// future desktop launches. The marker payload is `{pid}\n{started_at_unix}`
/// so the desktop's launch gate can detect a stale marker (dead PID / past a
/// hard ceiling) and self-heal rather than wait forever.
struct UpdateMarkerGuard {
path: PathBuf,
}
impl UpdateMarkerGuard {
/// Write the marker. Best-effort: a write failure must NOT abort the
/// update (the gate degrades to "no marker => proceed", i.e. exactly the
/// pre-fix behavior), so we log and carry on with a guard that still
/// attempts cleanup of whatever may exist at the path.
fn acquire(path: PathBuf) -> Self {
let pid = std::process::id();
let started_at = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0);
if let Some(parent) = path.parent() {
let _ = std::fs::create_dir_all(parent);
}
if let Err(err) = std::fs::write(&path, format!("{pid}\n{started_at}")) {
tracing::warn!(?path, %err, "could not write update-in-progress marker");
}
Self { path }
}
}
impl Drop for UpdateMarkerGuard {
fn drop(&mut self) {
if let Err(err) = std::fs::remove_file(&self.path) {
if err.kind() != std::io::ErrorKind::NotFound {
tracing::warn!(path = ?self.path, %err, "could not remove update-in-progress marker");
}
}
}
}
async fn run_update(app: AppHandle) -> Result<()> {
let hermes_home = crate::paths::hermes_home();
let install_root = hermes_home.join("hermes-agent");
// Mutual exclusion (#50238): publish an "update in progress" marker for the
// entire duration of this update. A desktop instance the user relaunches
// mid-update consults this before spawning its own local backend — without
// it, that backend re-locks the venv shim, our `force_kill_other_hermes`
// straggler-cleanup kills it, and the relaunch/kill cycle loops. The guard
// removes the marker on every exit path (incl. early returns / panics).
let _update_marker = UpdateMarkerGuard::acquire(crate::paths::update_in_progress_marker());
let update_branch = update_branch_from_args(std::env::args().skip(1))
.or_else(|| option_env_string("BUILD_PIN_BRANCH"))
.unwrap_or_else(|| "main".to_string());
@ -518,11 +570,13 @@ fn format_locked_paths(paths: &[PathBuf]) -> String {
/// taskkill, excluding our own PID.
///
/// Safe w.r.t. our own update child: this runs inside the install-lock wait,
/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. At this
/// point no update-driven hermes.exe exists yet, so the only hermes.exe images
/// are stragglers from the old desktop — exactly what we want gone. (`/FI PID
/// ne <self>` also spares this Tauri process, though it isn't named
/// hermes.exe.)
/// which completes BEFORE we spawn `venv\Scripts\hermes.exe update`. And a
/// desktop the user relaunches mid-update will NOT have spawned a backend —
/// `startHermes()` in the desktop gates local-backend startup on our
/// update-in-progress marker and parks until we finish (#50238). So the only
/// hermes.exe images here are stragglers from the old desktop — exactly what
/// we want gone. (`/FI PID ne <self>` also spares this Tauri process, though it
/// isn't named hermes.exe.)
fn force_kill_other_hermes() {
if !cfg!(target_os = "windows") {
return;
@ -992,6 +1046,48 @@ mod tests {
assert!(locked_paths(&probes).is_empty());
}
#[test]
fn update_marker_guard_writes_then_removes_on_drop() {
let dir = unique_tmp_dir("marker-guard");
std::fs::create_dir_all(&dir).unwrap();
let marker = dir.join(".hermes-update-in-progress");
{
let _g = UpdateMarkerGuard::acquire(marker.clone());
assert!(marker.exists(), "marker must exist while the guard is held");
let body = std::fs::read_to_string(&marker).unwrap();
let pid_line = body.lines().next().unwrap();
assert_eq!(
pid_line.trim().parse::<u32>().unwrap(),
std::process::id(),
"marker records our pid so the desktop can probe liveness"
);
assert_eq!(body.lines().count(), 2, "marker is pid + started_at lines");
}
assert!(
!marker.exists(),
"Drop must remove the marker on every exit path (incl. early return / panic unwind)"
);
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn update_marker_guard_drop_is_quiet_when_already_gone() {
let dir = unique_tmp_dir("marker-guard-gone");
std::fs::create_dir_all(&dir).unwrap();
let marker = dir.join(".hermes-update-in-progress");
let guard = UpdateMarkerGuard::acquire(marker.clone());
// Simulate an external cleanup (e.g. the desktop pruned a marker it
// judged stale) before our guard drops — Drop must not panic.
std::fs::remove_file(&marker).unwrap();
drop(guard);
assert!(!marker.exists());
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn parses_update_branch_from_space_or_equals_args() {
assert_eq!(

View file

@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows), the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honors `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
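The resolution order can be read as a short decision chain. The sketch below is an illustration in Python, not the code in `electron/main.cjs`: `resolve_backend`, its parameters, and the returned labels are hypothetical, and returning `None` when nothing resolves (leaving the first-launch install to take over) is an assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_backend(
    env: Mapping[str, str],
    *,
    managed_install_ready: bool,
    hermes_on_path: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Return (source, value) for the first backend candidate that applies."""
    root = env.get("HERMES_DESKTOP_HERMES_ROOT")
    if root:
        return ("root-override", root)
    if managed_install_ready:
        return ("managed-install", "HERMES_HOME")
    # A probed `hermes` on PATH is skipped when the user opts out of reuse.
    if hermes_on_path and env.get("HERMES_DESKTOP_IGNORE_EXISTING") != "1":
        return ("path", hermes_on_path)
    command = env.get("HERMES_DESKTOP_HERMES")
    if command:
        return ("command-override", command)
    return None  # assumption: the caller falls back to installing the runtime


if __name__ == "__main__":
    print(resolve_backend({}, managed_install_ready=False, hermes_on_path="/usr/bin/hermes"))
```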
### Verification

View file

@ -1,5 +1,32 @@
const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
// The announcement clock starts the instant the backend process is spawned —
// before uvicorn binds its socket. On a cold install the child must first
// compile and import the whole `hermes_cli.main` → `web_server` → FastAPI/
// uvicorn chain, and on Windows real-time AV (Defender) scans every freshly
// written `.pyc`. That pre-bind cost can run 30-60s on a slow disk, so a tight
// 45s deadline kills a *healthy but still-starting* backend and respawns it,
// piling up orphaned processes (issue #50209). A roomier default absorbs the
// cold-start cost; a warm start still announces in well under a second.
const DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS = 90_000
// Never trust a deadline tighter than the warm-start path needs; floor at 45s
// (the historical default) so a malformed override can't reintroduce the loop.
const MIN_PORT_ANNOUNCE_TIMEOUT_MS = 45_000
/**
* Resolve the port-announcement deadline. Honors the
* HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS env override (for users on slow
* disks / aggressive AV who need an even longer cold-start window), clamped
* to a sane floor so a bad value can't make boot flakier than the default.
*/
function resolvePortAnnounceTimeoutMs(env = process.env) {
const parsed = Number(env.HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS)
if (Number.isFinite(parsed) && parsed > 0) {
return Math.max(MIN_PORT_ANNOUNCE_TIMEOUT_MS, Math.round(parsed))
}
return DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS
}
/**
* Watch a child process's stdout for the `HERMES_DASHBOARD_READY port=<N>`
* line that web_server.py prints after uvicorn binds its socket.
@ -9,11 +36,15 @@ const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
* - the child emits an `error` event
* - no line arrives within the timeout
*
* The default timeout is cold-start tolerant (see
* DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS) because the clock starts before the
* backend has even bound its port. Pass an explicit `timeoutMs` to override.
*
* A single `cleanup()` tears down every listener (data/exit/error/timeout)
* on every terminal path (resolve, reject, or timeout) so repeated
* backend spawns don't leak listener slots on the child.
*/
function waitForDashboardPort(child, timeoutMs = 45_000) {
function waitForDashboardPort(child, timeoutMs = resolvePortAnnounceTimeoutMs()) {
return new Promise((resolve, reject) => {
let buf = ''
let done = false
@ -63,4 +94,9 @@ function waitForDashboardPort(child, timeoutMs = 45_000) {
})
}
module.exports = { waitForDashboardPort }
module.exports = {
waitForDashboardPort,
resolvePortAnnounceTimeoutMs,
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS,
}
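The clamp rule in `resolvePortAnnounceTimeoutMs` can be restated as a few lines of arithmetic. This is a Python rendering for illustration only: the constants and the environment variable name come from the diff, while `resolve_port_announce_timeout_ms` is a hypothetical name and `float()` does not accept exactly the same strings as JavaScript's `Number()`.

```python
import math
from typing import Mapping

DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS = 90_000
MIN_PORT_ANNOUNCE_TIMEOUT_MS = 45_000
ENV_VAR = "HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS"


def resolve_port_announce_timeout_ms(env: Mapping[str, str]) -> int:
    """Use a positive finite override, floored at 45s; otherwise the default."""
    raw = env.get(ENV_VAR)
    try:
        parsed = float(raw) if raw is not None else math.nan
    except ValueError:
        parsed = math.nan  # malformed or empty values fall through to the default
    if math.isfinite(parsed) and parsed > 0:
        # floor(x + 0.5) rounds half up, like Math.round for positive inputs.
        return max(MIN_PORT_ANNOUNCE_TIMEOUT_MS, math.floor(parsed + 0.5))
    return DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS
```

The floor applies only to overrides: an unset or invalid value yields the 90s default, never the 45s minimum.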

View file

@ -0,0 +1,121 @@
/**
* Tests for electron/backend-ready.cjs.
*
* Run with: node --test electron/backend-ready.test.cjs
* (Wired into npm test:desktop:platforms in package.json.)
*
* Covers the cold-start port-announcement deadline (issue #50209): the clock
* starts before the backend binds its port, so a tight 45s deadline killed a
* healthy-but-still-compiling backend on cold Windows installs. The default is
* now cold-start tolerant and overridable via
* HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor.
*/
const test = require('node:test')
const assert = require('node:assert/strict')
const { EventEmitter } = require('node:events')
const {
waitForDashboardPort,
resolvePortAnnounceTimeoutMs,
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS,
} = require('./backend-ready.cjs')
// A minimal stand-in for a spawned child process: an EventEmitter with a
// stdout EventEmitter, matching the surface waitForDashboardPort consumes
// (child.stdout.on('data'), child.on('exit'|'error') + the .off() teardown).
function makeFakeChild() {
const child = new EventEmitter()
child.stdout = new EventEmitter()
return child
}
// ---------------------------------------------------------------------------
// resolvePortAnnounceTimeoutMs
// ---------------------------------------------------------------------------
test('default is cold-start tolerant (> the historical 45s floor)', () => {
assert.equal(resolvePortAnnounceTimeoutMs({}), DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS)
assert.ok(
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS > MIN_PORT_ANNOUNCE_TIMEOUT_MS,
'cold-start default must exceed the warm-start floor'
)
})
test('honors a valid HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS override', () => {
const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '120000' }
assert.equal(resolvePortAnnounceTimeoutMs(env), 120_000)
})
test('clamps an override below the floor up to the 45s minimum', () => {
const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '1000' }
assert.equal(resolvePortAnnounceTimeoutMs(env), MIN_PORT_ANNOUNCE_TIMEOUT_MS)
})
test('rounds a fractional override', () => {
const env = { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: '60000.7' }
assert.equal(resolvePortAnnounceTimeoutMs(env), 60_001)
})
test('falls back to the default for malformed / non-positive overrides', () => {
for (const bad of ['', 'abc', '0', '-5', 'NaN', undefined]) {
const env = bad === undefined ? {} : { HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS: bad }
assert.equal(
resolvePortAnnounceTimeoutMs(env),
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
`override ${JSON.stringify(bad)} should fall through to the default`
)
}
})
// ---------------------------------------------------------------------------
// waitForDashboardPort
// ---------------------------------------------------------------------------
test('resolves with the announced port', async () => {
const child = makeFakeChild()
const p = waitForDashboardPort(child, 1000)
child.stdout.emit('data', 'noise before\nHERMES_DASHBOARD_READY port=54321\n')
assert.equal(await p, 54321)
})
test('parses the port even when the line arrives split across chunks', async () => {
const child = makeFakeChild()
const p = waitForDashboardPort(child, 1000)
child.stdout.emit('data', 'HERMES_DASHBOARD_READY po')
child.stdout.emit('data', 'rt=8080\n')
assert.equal(await p, 8080)
})
test('rejects when the child exits before announcing', async () => {
const child = makeFakeChild()
const p = waitForDashboardPort(child, 1000)
child.emit('exit', 1, null)
await assert.rejects(p, /exited before port announcement/)
})
test('rejects on a child error event', async () => {
const child = makeFakeChild()
const p = waitForDashboardPort(child, 1000)
child.emit('error', new Error('spawn ENOENT'))
await assert.rejects(p, /spawn ENOENT/)
})
test('rejects with the timeout message after the deadline', async () => {
const child = makeFakeChild()
await assert.rejects(
waitForDashboardPort(child, 20),
/Timed out waiting for Hermes backend port announcement \(20ms\)/
)
})
test('a late announcement after timeout does not throw (listeners torn down)', async () => {
const child = makeFakeChild()
await assert.rejects(waitForDashboardPort(child, 20), /Timed out/)
// The orphaned backend may still print its READY line later; the watcher
// must have detached so this emit is a no-op rather than a double-settle.
assert.doesNotThrow(() => {
child.stdout.emit('data', 'HERMES_DASHBOARD_READY port=9999\n')
})
})

View file

@ -0,0 +1,42 @@
'use strict'
// Hidden BrowserWindow used by tier-2 link-title resolution: when curl can't
// read a page <title> (bot walls, JS-rendered pages), we briefly load the URL
// in an offscreen window and read its title. That window loads arbitrary
// user-linked pages — including YouTube/`watch` URLs that autoplay — so it must
// never be allowed to emit sound.
function linkTitleWindowOptions(partitionSession) {
return {
show: false,
width: 1280,
height: 800,
webPreferences: {
backgroundThrottling: false,
contextIsolation: true,
javascript: true,
nodeIntegration: false,
sandbox: true,
session: partitionSession,
webSecurity: true
}
}
}
// Create the offscreen title-fetch window and immediately mute it. Without the
// mute, autoplaying media on the loaded page (e.g. a YouTube link) leaks ~2s of
// audio every time a session containing such links is re-rendered. See #49505.
function createLinkTitleWindow(BrowserWindow, partitionSession) {
const window = new BrowserWindow(linkTitleWindowOptions(partitionSession))
try {
window.webContents.setAudioMuted(true)
} catch {
// webContents may be unavailable in degraded/headless environments; muting
// is best-effort and the window is destroyed within a few seconds anyway.
}
return window
}
module.exports = { createLinkTitleWindow, linkTitleWindowOptions }

View file

@ -0,0 +1,56 @@
const assert = require('node:assert/strict')
const test = require('node:test')
const { createLinkTitleWindow, linkTitleWindowOptions } = require('./link-title-window.cjs')
function makeFakeBrowserWindow() {
const calls = { audioMuted: [] }
const FakeBrowserWindow = function (options) {
this.options = options
this.webContents = {
setAudioMuted(value) {
calls.audioMuted.push(value)
}
}
}
return { FakeBrowserWindow, calls }
}
test('linkTitleWindowOptions keeps the offscreen, hardened defaults', () => {
const session = { id: 'link-titles' }
const options = linkTitleWindowOptions(session)
assert.equal(options.show, false)
assert.equal(options.webPreferences.session, session)
assert.equal(options.webPreferences.contextIsolation, true)
assert.equal(options.webPreferences.sandbox, true)
assert.equal(options.webPreferences.nodeIntegration, false)
})
test('createLinkTitleWindow mutes audio so historical links never autoplay sound', () => {
// Regression for #49505: the hidden title-fetch window loaded YouTube/watch
// URLs (to read their <title>) without muting, leaking ~2s of audio on every
// history re-render.
const { FakeBrowserWindow, calls } = makeFakeBrowserWindow()
const window = createLinkTitleWindow(FakeBrowserWindow, { id: 'link-titles' })
assert.ok(window instanceof FakeBrowserWindow)
assert.deepEqual(calls.audioMuted, [true])
})
test('createLinkTitleWindow still returns the window if muting throws', () => {
const ThrowingBrowserWindow = function (options) {
this.options = options
this.webContents = {
setAudioMuted() {
throw new Error('webContents unavailable')
}
}
}
const window = createLinkTitleWindow(ThrowingBrowserWindow, { id: 'link-titles' })
assert.ok(window instanceof ThrowingBrowserWindow)
})

View file

@ -34,6 +34,7 @@ const {
SESSION_WINDOW_MIN_WIDTH
} = require('./session-windows.cjs')
const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
const { createLinkTitleWindow } = require('./link-title-window.cjs')
const { probeGatewayWebSocket } = require('./gateway-ws-probe.cjs')
const { adoptServedDashboardToken } = require('./dashboard-token.cjs')
const { waitForDashboardPort } = require('./backend-ready.cjs')
@ -42,6 +43,16 @@ const { fetchMarketplaceThemes, searchMarketplaceThemes } = require('./vscode-ma
const { buildDesktopBackendEnv, normalizeHermesHomeRoot } = require('./backend-env.cjs')
const { readWindowsUserEnvVar } = require('./windows-user-env.cjs')
const { readDirForIpc } = require('./fs-read-dir.cjs')
const { readLiveUpdateMarker } = require('./update-marker.cjs')
const {
resolveUnpackedRelease,
decideRelaunchOutcome,
sandboxPreflight,
sandboxFallbackFromEnv,
collectRelaunchArgs,
collectRelaunchEnv,
buildRelaunchScript
} = require('./update-relaunch.cjs')
const { gitRootForIpc } = require('./git-root.cjs')
const { worktreesForIpc } = require('./git-worktrees.cjs')
const { OFFICIAL_REPO_HTTPS_URL, isOfficialSshRemote } = require('./update-remote.cjs')
@ -150,6 +161,8 @@ if (REMOTE_DISPLAY_REASON) {
)
}
ipcMain.handle('hermes:get-remote-display-reason', () => REMOTE_DISPLAY_REASON)
// Keep the renderer running at full speed while the window is in the background
// or occluded. The chat transcript streams to screen through a
// requestAnimationFrame-gated flush; Chromium pauses rAF (and clamps timers)
@ -268,6 +281,23 @@ function resolveHermesHome() {
}
const HERMES_HOME = resolveHermesHome()
function hermesManagedNodePathEntries() {
// NOTE: keep this ordering in sync with iter_hermes_node_dirs() in
// hermes_constants.py — this Node main process cannot import the Python
// module, so the platform-ordering rule is mirrored here.
const root = path.join(HERMES_HOME, 'node')
const bin = path.join(root, 'bin')
const entries = IS_WINDOWS ? [root, bin] : [bin, root]
return entries.filter(directoryExists)
}
function pathWithHermesManagedNode(...entries) {
return [...hermesManagedNodePathEntries(), ...entries, process.env.PATH]
.filter(Boolean)
.join(path.delimiter)
}
// ACTIVE_HERMES_ROOT — the canonical mutable Hermes install. Same path
// install.ps1 / install.sh use, so a desktop-only user and a CLI-only user end
// up with identical layouts and can share one install.
@ -1090,6 +1120,59 @@ function directoryExists(filePath) {
}
}
// --- in-app update mutual exclusion (#50238) -------------------------------
// The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
// duration of an `--update` run (see update.rs UpdateMarkerGuard). If the user
// relaunches the desktop mid-update — because the window vanished with no
// progress and looks crashed — a fresh instance must NOT spawn its own local
// backend: that backend re-locks the venv shim, the updater's straggler cleanup
// (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
// fails with the 45s "backend didn't come up" error, and the relaunch/kill
// cycle loops. Instead the fresh instance parks until the update finishes, then
// brings the backend up itself (it is the surviving instance — the updater's
// own relaunch hits our single-instance lock and quits). Marker parsing +
// staleness self-heal live in update-marker.cjs (unit-tested).
// How long we'll park the launch waiting for a live update to finish before
// giving up and starting the backend anyway (belt-and-suspenders alongside the
// marker's own age ceiling; covers a stuck-but-alive updater).
const UPDATE_WAIT_TIMEOUT_MS = 20 * 60 * 1000
const UPDATE_WAIT_POLL_MS = 1000
// How long the desktop lingers on the "updating, don't reopen" overlay after
// spawning the detached updater, before it quits to release the venv shim. The
// old 600ms was long enough to register the child process but far too short for
// the user to READ the overlay — the window just vanished, looked like a crash,
// and the user relaunched mid-update (the #50238 restart-loop trigger). A
// couple of seconds lets the message land and bridges the gap until the
// updater's own progress window appears. (#50419)
const UPDATE_HANDOFF_DWELL_MS = 2500
// Block until no live update is in progress (or we hit the wait timeout).
// Emits a boot-progress phase so the renderer shows "Update in progress…"
// rather than a frozen splash. Returns true if it parked at all.
async function waitForUpdateToFinish() {
let marker = readLiveUpdateMarker(HERMES_HOME)
if (!marker) return false
rememberLog(`[updates] update in progress (pid=${marker.pid}); deferring backend start until it finishes`)
const deadline = Date.now() + UPDATE_WAIT_TIMEOUT_MS
while (marker && Date.now() < deadline) {
await advanceBootProgress(
'backend.update-wait',
'An update is finishing — Hermes will start automatically when it completes…',
12
)
await new Promise(r => setTimeout(r, UPDATE_WAIT_POLL_MS))
marker = readLiveUpdateMarker(HERMES_HOME)
}
if (marker) {
rememberLog('[updates] update still in progress after wait timeout; starting backend anyway')
} else {
rememberLog('[updates] update finished; proceeding with backend start')
}
return true
}
function unpackedPathFor(filePath) {
return filePath.replace(/app\.asar(?=$|[\\/])/, 'app.asar.unpacked')
}
@ -1801,7 +1884,11 @@ async function applyUpdates(opts = {}) {
return { ok: true, manual: true, command, hermesRoot: updateRoot }
}
emitUpdateProgress({ stage: 'restart', message: 'Handing off to the Hermes updater…', percent: 100 })
emitUpdateProgress({
stage: 'restart',
message: 'Updating Hermes: this window will close and the updater will open. Don’t reopen Hermes yourself; it restarts automatically when the update finishes.',
percent: 100
})
repairMacUpdaterHelper(updater)
const updateRoot = resolveUpdateRoot()
@ -1827,7 +1914,7 @@ async function applyUpdates(opts = {}) {
env: {
...process.env,
HERMES_HOME,
PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
PATH: pathWithHermesManagedNode(venvBin)
},
detached: true,
stdio: 'ignore',
@ -1837,11 +1924,14 @@ async function applyUpdates(opts = {}) {
rememberLog(`[updates] launched updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release venv shim`)
// Give the OS a beat to register the new process, then quit. The updater
// rebuilds and relaunches us when it's done.
// Linger on the "updating — don't reopen" overlay long enough for the user
// to actually read it (and to bridge the gap until the updater's own window
// appears), THEN quit to release the venv shim. The updater rebuilds and
// relaunches us when it's done. (#50419 — a 600ms quit looked like a crash
// and lured users into the #50238 relaunch loop.)
setTimeout(() => {
app.quit()
}, 600)
}, UPDATE_HANDOFF_DWELL_MS)
return { ok: true, handedOff: true, updater }
} finally {
@ -1871,7 +1961,7 @@ async function handOffWindowsBootstrapRecovery(reason) {
env: {
...process.env,
HERMES_HOME,
PATH: [path.join(HERMES_HOME, 'node', 'bin'), venvBin, process.env.PATH].filter(Boolean).join(path.delimiter)
PATH: pathWithHermesManagedNode(venvBin)
},
detached: true,
stdio: 'ignore',
@ -1880,9 +1970,12 @@ async function handOffWindowsBootstrapRecovery(reason) {
child.unref()
rememberLog(`[bootstrap] handed off ${reason} recovery to updater: ${updater} ${updaterArgs.join(' ')}; exiting desktop to release app.asar`)
// Same dwell as the in-app update hand-off (#50419): give the updater's
// window time to appear before we vanish, so the recovery doesn't look like
// a crash and provoke a mid-recovery relaunch.
setTimeout(() => {
app.quit()
}, 600)
}, UPDATE_HANDOFF_DWELL_MS)
return true
}
@ -1952,13 +2045,11 @@ async function applyUpdatesPosixInApp() {
}
// Put the Hermes-managed Node and the venv on PATH so `hermes desktop`'s
// npm build can find them on a machine with no system Node.
const extraPath = [path.join(HERMES_HOME, 'node', 'bin'), path.join(updateRoot, 'venv', 'bin')]
.filter(Boolean)
.join(path.delimiter)
// npm build can find them on a machine with no system Node. Windows portable
// Node lives directly under %LOCALAPPDATA%\hermes\node, not node\bin.
const env = {
HERMES_HOME,
PATH: [extraPath, process.env.PATH].filter(Boolean).join(path.delimiter)
PATH: pathWithHermesManagedNode(path.join(updateRoot, 'venv', 'bin'))
}
// `hermes update` reaps stale `hermes dashboard` backends (a code update
@ -2028,6 +2119,114 @@ async function applyUpdatesPosixInApp() {
return { ok: false, backendUpdated: true, error: 'desktop rebuild failed' }
}
// Linux in-app update terminal state (#45205). `hermes desktop --build-only`
// rebuilds the unpacked app in place under apps/desktop/release/<plat>-unpacked.
// We can only HONESTLY relaunch into the new GUI when the *running* binary IS
// that rebuilt one — i.e. execPath lives under release/<plat>-unpacked. The
// outcome is decided by three signals (see update-relaunch.cjs):
//
// underUnpacked + sandboxOk → 'relaunch': detached watcher re-execs us in
// place (mirrors the macOS handoff). Without it the update succeeds but
// the app never restarts and the overlay hangs on "applying" forever.
// !underUnpacked → 'guiSkew': the running shell is an AppImage/
// .deb/.rpm/dev/unresolved binary we did NOT replace. Claiming "loads
// next launch" is a lie (GUI/backend skew, #37541) — surface an
// explicit closeable terminal state telling the user the GUI package
// was NOT changed and must be updated/reinstalled.
// underUnpacked + !sandboxOk → 'manual': we'd be relaunching the rebuilt
// binary, but a fresh rebuild can leave chrome-sandbox without
// root:root + setuid (mode 4755) and Electron then refuses to launch
// ("quit and never came back"). DO NOT quit into a dead app — keep the
// working window and surface the closeable manual-restart state.
if (!IS_MAC) {
const unpackedDir = resolveUnpackedRelease(process.execPath, updateRoot, process.platform)
const underUnpacked = unpackedDir !== null
const preflight = underUnpacked
? sandboxPreflight(unpackedDir, p => fs.statSync(p))
: { ok: false, reason: 'not-under-unpacked', path: null }
const sandboxFallback = sandboxFallbackFromEnv(process.env, process.argv.slice(1))
const sandboxOk = preflight.ok || sandboxFallback
if (underUnpacked && !preflight.ok) {
rememberLog(
`[updates] sandbox preflight: not launchable (${preflight.reason}) at ${preflight.path}; ` +
`fallback=${sandboxFallback ? 'env/--no-sandbox' : 'none'}`
)
}
const outcome = decideRelaunchOutcome({ underUnpacked, sandboxOk })
if (outcome === 'relaunch') {
emitUpdateProgress({ stage: 'restart', message: 'Restarting Hermes…', percent: 100 })
// Preserve launch context across the re-exec: replay the original args
// (filtered of Electron internals) and the env/cwd that define which
// backend/profile/root this instance talks to. Without this the
// relaunched instance comes up with default context instead of the user's.
const relaunchArgs = collectRelaunchArgs(process.argv.slice(1))
const relaunchEnv = collectRelaunchEnv(process.env)
const relaunchScript = buildRelaunchScript({
pid: process.pid,
execPath: process.execPath,
args: relaunchArgs,
env: relaunchEnv,
cwd: process.cwd()
})
const scriptPath = path.join(app.getPath('temp'), `hermes-desktop-update-${Date.now()}.sh`)
try {
fs.writeFileSync(scriptPath, relaunchScript, { mode: 0o755 })
const child = spawn('/bin/bash', [scriptPath], { detached: true, stdio: 'ignore' })
child.unref()
rememberLog(
`[updates] launched linux relaunch: ${scriptPath} -> ${process.execPath} ` +
`(args=${relaunchArgs.length}, env=${Object.keys(relaunchEnv).length})`
)
setTimeout(() => app.quit(), UPDATE_HANDOFF_DWELL_MS)
return { ok: true, handedOff: true }
} catch (err) {
rememberLog(`[updates] linux relaunch failed: ${err.message}; falling back to manual restart`)
return {
ok: true,
backendUpdated: true,
guiUpdated: false,
manualRestart: true,
message: 'Backend updated. Quit and reopen Hermes to load the new version.'
}
}
}
if (outcome === 'guiSkew') {
emitUpdateProgress({
stage: 'guiSkew',
message:
'Backend updated, but the desktop app package was not changed. ' +
'Update or reinstall the Hermes desktop app to match.',
percent: 100
})
rememberLog(
`[updates] gui/backend skew: execPath ${process.execPath} not under release/*-unpacked; ` +
'backend updated, GUI package unchanged (AppImage/.deb/.rpm/dev/unresolved)'
)
return { ok: true, backendUpdated: true, guiUpdated: false, guiSkew: true }
}
// outcome === 'manual': we're the rebuilt binary, but its sandbox helper is
// not launchable and no fallback applies. Keep this working window alive.
rememberLog(
`[updates] sandbox not launchable (${preflight.reason}); skipping auto-relaunch, ` +
'returning manual-restart so the user keeps a working window'
)
return {
ok: true,
backendUpdated: true,
guiUpdated: false,
manualRestart: true,
sandboxBlocked: true,
message:
'Backend updated. The rebuilt app can’t relaunch automatically ' +
'(sandbox helper needs root). Quit and reopen Hermes to finish.'
}
}
const rebuiltApp = [
path.join(updateRoot, 'apps', 'desktop', 'release', 'mac-arm64', 'Hermes.app'),
path.join(updateRoot, 'apps', 'desktop', 'release', 'mac', 'Hermes.app')
@ -2963,20 +3162,7 @@ function runRenderTitleJob(rawUrl) {
}
try {
window = new BrowserWindow({
show: false,
width: 1280,
height: 800,
webPreferences: {
backgroundThrottling: false,
contextIsolation: true,
javascript: true,
nodeIntegration: false,
sandbox: true,
session: partitionSession,
webSecurity: true
}
})
window = createLinkTitleWindow(BrowserWindow, partitionSession)
} catch {
return finish('')
}
@ -4905,6 +5091,14 @@ async function startHermes() {
}
}
// Mutual exclusion with an in-app update (#50238). If this instance was
// relaunched while the Tauri updater is still applying an update, spawning
// a local backend now re-locks the venv shim and gets killed by the
// updater's straggler cleanup — looping. Park until the update finishes (or
// is detected stale), THEN start the backend. Local backends only; remote
// connections returned above and never touch the install tree.
await waitForUpdateToFinish()
const token = crypto.randomBytes(32).toString('base64url')
// --port 0: the OS assigns an ephemeral port; the child announces it on stdout.
const dashboardArgs = ['dashboard', '--no-open', '--host', '127.0.0.1', '--port', '0']
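The park-then-proceed gate that `startHermes` awaits can be exercised in isolation. The sketch below is a hypothetical generalization of `waitForUpdateToFinish`, not code from this commit: `pollUntilClear`, `probe`, `sleep`, and the returned object shape are assumed names, and both the clock and the sleep are injectable so a test never waits in real time.

```javascript
// Hypothetical sketch (not part of main.cjs): poll `probe` until it returns a
// falsy value or the deadline passes. `now` and `sleep` are injectable.
async function pollUntilClear(probe, opts = {}) {
  const {
    timeoutMs = 20 * 60 * 1000,
    pollMs = 1000,
    now = Date.now,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  } = opts
  let state = probe()
  if (!state) return { parked: false, timedOut: false }
  const deadline = now() + timeoutMs
  while (state && now() < deadline) {
    await sleep(pollMs)
    state = probe()
  }
  // A truthy `state` here means the deadline expired with the update still live.
  return { parked: true, timedOut: Boolean(state) }
}

// Usage: a fake marker that clears on the third probe, driven by a fake clock.
let probes = 0
let fakeNow = 0
const result = pollUntilClear(() => (++probes < 3 ? { pid: 4242 } : null), {
  timeoutMs: 10_000,
  pollMs: 1000,
  now: () => fakeNow,
  sleep: async ms => {
    fakeNow += ms
  }
})
```

Returning `timedOut` separately from `parked` is a design choice of this sketch; the function in the diff only returns whether it parked and logs the timeout instead.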

@ -166,6 +166,7 @@ contextBridge.exposeInMainWorld('hermesDesktop', {
return () => ipcRenderer.removeListener('hermes:bootstrap:event', listener)
},
getVersion: () => ipcRenderer.invoke('hermes:version'),
getRemoteDisplayReason: () => ipcRenderer.invoke('hermes:get-remote-display-reason'),
uninstall: {
summary: () => ipcRenderer.invoke('hermes:uninstall:summary'),
run: mode => ipcRenderer.invoke('hermes:uninstall:run', { mode })

@ -0,0 +1,93 @@
/**
* In-app update mutual-exclusion marker (#50238).
*
* The Tauri updater writes HERMES_HOME/.hermes-update-in-progress for the whole
* duration of an `--update` run (see apps/bootstrap-installer/src-tauri/src/
* update.rs `UpdateMarkerGuard`). The marker body is two lines: the updater's
* pid and the Unix timestamp (in seconds) at which it started.
*
* Why: if the user relaunches the desktop mid-update (the window vanished with
* no progress and looks crashed), a fresh instance must NOT spawn its own local
* backend. That backend re-locks the venv shim, the updater's straggler cleanup
* (`force_kill_other_hermes`, taskkill /IM hermes.exe) kills it, the launch
* fails with the 45s "backend didn't come up" timeout, and the user relaunches
* into the same trap: an infinite respawn/kill loop. The desktop gates local
* backend startup on this marker and parks until the update finishes.
*
* This module holds the PURE, side-effect-light logic (path, pid liveness,
* parse + staleness) so it is unit-testable without booting Electron. The
* polling/boot-progress wrapper lives in main.cjs where the boot-progress and
* log sinks are.
*/
const fs = require('fs')
const path = require('path')
// Even with a live-looking PID, never treat a marker older than this as a live
// update. A full update (git pull + pip + desktop rebuild) is minutes, not tens
// of minutes; past this the marker is almost certainly stale (e.g. the OS
// recycled the pid onto an unrelated process), so the gate self-heals.
const UPDATE_MARKER_MAX_AGE_MS = 20 * 60 * 1000
function markerPath(hermesHome) {
return path.join(hermesHome, '.hermes-update-in-progress')
}
// True only if a host process with this pid is currently alive. Signal 0 does
// not deliver a signal — it just probes existence/permission. ESRCH => dead;
// EPERM => alive but owned by another user (still "alive" for our purposes).
// Injectable `kill` keeps it unit-testable.
function isPidAlive(pid, kill = process.kill.bind(process)) {
if (!Number.isInteger(pid) || pid <= 0) return false
try {
kill(pid, 0)
return true
} catch (err) {
return Boolean(err && err.code === 'EPERM')
}
}
/**
* Read + interpret the marker.
*
* Returns `{ pid, ageMs }` only when an update is GENUINELY still running
* (parseable pid that is alive, within the age ceiling). Returns `null` for
* every "no live update" case (absent, unreadable, malformed, dead pid, or
* past the ceiling). When a stale marker file exists, deletes it so it
* cannot strand future launches.
*
* Pure-ish: file I/O against the given path, plus an injectable pid probe and
* clock for tests.
*/
function readLiveUpdateMarker(hermesHome, { kill, now = Date.now, maxAgeMs = UPDATE_MARKER_MAX_AGE_MS } = {}) {
const file = markerPath(hermesHome)
let raw
try {
raw = fs.readFileSync(file, 'utf8')
} catch {
return null // absent or unreadable => no live update
}
const [pidLine, startedLine] = String(raw).split('\n')
const pid = Number.parseInt((pidLine || '').trim(), 10)
const startedAt = Number.parseInt((startedLine || '').trim(), 10)
const ageMs = Number.isFinite(startedAt) ? now() - startedAt * 1000 : Infinity
const alive = Number.isInteger(pid) && isPidAlive(pid, kill)
if (!alive || ageMs > maxAgeMs) {
try {
fs.unlinkSync(file)
} catch {
void 0
}
return null
}
return { pid, ageMs }
}
module.exports = {
UPDATE_MARKER_MAX_AGE_MS,
markerPath,
isPidAlive,
readLiveUpdateMarker
}
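The two-line marker body and the liveness rule can be illustrated without touching the filesystem. The sketch below is a hypothetical pure helper: `interpretMarker`, its `pidAlive` parameter, and `MAX_AGE_MS` are assumed names, not exports of this module. It mirrors the parse and staleness checks of `readLiveUpdateMarker` but leaves out the file read and the stale-file cleanup.

```javascript
// Hypothetical pure helper (not exported by update-marker.cjs).
const MAX_AGE_MS = 20 * 60 * 1000 // same value as UPDATE_MARKER_MAX_AGE_MS

function interpretMarker(raw, nowMs, pidAlive) {
  const [pidLine = '', startedLine = ''] = String(raw).split('\n')
  const pid = Number.parseInt(pidLine.trim(), 10)
  const startedAtSec = Number.parseInt(startedLine.trim(), 10)
  // An unparseable start time yields an infinite age, so it always counts as stale.
  const ageMs = Number.isFinite(startedAtSec) ? nowMs - startedAtSec * 1000 : Infinity
  const alive = Number.isInteger(pid) && pid > 0 && pidAlive(pid)
  return alive && ageMs <= MAX_AGE_MS ? { pid, ageMs } : null
}

const now = 1_000_000_000_000
// Started 5 seconds ago with a live pid: reported as a live update.
const fresh = interpretMarker(`4242\n${now / 1000 - 5}`, now, () => true)
// Started 21 minutes ago: past the ceiling even though the pid looks alive.
const expired = interpretMarker(`4242\n${now / 1000 - 21 * 60}`, now, () => true)
// Garbage body: no parseable pid, so never a live update.
const malformed = interpretMarker('not-a-pid\nnonsense', now, () => true)
```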

@ -0,0 +1,92 @@
/**
* Tests for electron/update-marker.cjs: the in-app update mutual-exclusion
* marker that prevents a desktop relaunched mid-update from spawning a backend
* the updater then kills in a loop (#50238).
*
* Run with: node --test electron/update-marker.test.cjs
* (Wired into npm test:desktop:platforms in package.json.)
*
* Why this matters: the gate must (a) report a live update only when the
* updater pid is alive AND the marker is fresh, (b) treat absent/malformed/
* dead-pid/expired markers as "no live update" so a crashed updater can't
* strand future launches, and (c) self-heal by deleting a stale marker file.
*/
const test = require('node:test')
const assert = require('node:assert/strict')
const fs = require('fs')
const os = require('os')
const path = require('path')
const { markerPath, isPidAlive, readLiveUpdateMarker, UPDATE_MARKER_MAX_AGE_MS } = require('./update-marker.cjs')
function tmpHome(tag) {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), `hermes-marker-${tag}-`))
return dir
}
function writeMarker(home, pid, startedAtSec) {
fs.writeFileSync(markerPath(home), `${pid}\n${startedAtSec}`)
}
const ALIVE = () => true // injected kill that "succeeds" => pid alive
const DEAD = () => {
const err = new Error('no such process')
err.code = 'ESRCH'
throw err
}
test('absent marker => no live update', () => {
const home = tmpHome('absent')
assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
})
test('live pid within age ceiling => live update reported', () => {
const home = tmpHome('live')
const now = 1_000_000_000_000
writeMarker(home, 4242, Math.floor(now / 1000) - 5) // 5s old
const res = readLiveUpdateMarker(home, { kill: ALIVE, now: () => now })
assert.ok(res, 'a fresh, alive marker is a live update')
assert.equal(res.pid, 4242)
assert.ok(res.ageMs >= 0 && res.ageMs < 10_000)
assert.ok(fs.existsSync(markerPath(home)), 'a live marker is NOT deleted')
})
test('dead pid => no live update and marker is pruned', () => {
const home = tmpHome('dead')
writeMarker(home, 999999, Math.floor(Date.now() / 1000))
assert.equal(readLiveUpdateMarker(home, { kill: DEAD }), null)
assert.ok(!fs.existsSync(markerPath(home)), 'a dead-pid marker self-heals (deleted)')
})
test('expired marker (past age ceiling) => no live update and pruned', () => {
const home = tmpHome('expired')
const now = 1_000_000_000_000
writeMarker(home, 4242, Math.floor((now - UPDATE_MARKER_MAX_AGE_MS - 60_000) / 1000))
// Even though the pid is "alive", the marker is too old to trust.
assert.equal(readLiveUpdateMarker(home, { kill: ALIVE, now: () => now }), null)
assert.ok(!fs.existsSync(markerPath(home)), 'an expired marker self-heals (deleted)')
})
test('malformed marker => no live update and pruned', () => {
const home = tmpHome('malformed')
fs.writeFileSync(markerPath(home), 'not-a-pid\nnonsense')
assert.equal(readLiveUpdateMarker(home, { kill: ALIVE }), null)
assert.ok(!fs.existsSync(markerPath(home)))
})
test('isPidAlive: own pid is alive, impossible pid is dead', () => {
assert.equal(isPidAlive(process.pid), true)
assert.equal(isPidAlive(-1), false)
assert.equal(isPidAlive(0), false)
assert.equal(isPidAlive(NaN), false)
})
test('isPidAlive: EPERM counts as alive (process owned by another user)', () => {
const eperm = () => {
const err = new Error('operation not permitted')
err.code = 'EPERM'
throw err
}
assert.equal(isPidAlive(4242, eperm), true)
})

@ -0,0 +1,265 @@
'use strict'
/**
* update-relaunch.cjs: pure decision + script-generation helpers for the
* Linux in-app update relaunch (#45205).
*
* Extracted from main.cjs's `applyUpdatesPosixInApp` so the security- and
* correctness-critical "do we relaunch, or land on a manual terminal state?"
* decision is unit-testable without booting Electron (main.cjs calls
* `require('electron')` at load).
*
* Background
* ----------
* After `hermes update` + `hermes desktop --build-only`, the freshly-rebuilt
* GUI lives under `apps/desktop/release/<plat>-unpacked`. We can only honestly
* relaunch into the new GUI when the *running* binary is that rebuilt one,
* i.e. its execPath is under the rebuilt `release/<plat>-unpacked` dir.
*
* - Source / unpacked install (execPath under release/<plat>-unpacked):
* the running binary IS the thing we just rebuilt, so relaunch it in place.
* - AppImage / .deb / .rpm / dev / unresolved (execPath elsewhere):
* the backend was updated but THIS GUI shell was NOT replaced. Claiming
* "the new version loads next launch" is a lie that produces GUI/backend
* skew (#37541): the user keeps running the old GUI against new backend
* code with no path to fix it from inside the app. Surface an explicit
* terminal state telling them the GUI package must be reinstalled.
*
* Sandbox preflight (#3 in the review)
* ------------------------------------
* A fresh `release/<plat>-unpacked` rebuild can leave `chrome-sandbox` without
* the required `root:root` + setuid (mode 4755). Electron then refuses to
* launch with "The SUID sandbox helper binary was found, but is not configured
* correctly" and the relaunch yields "quit and never came back": a dead app.
* Before we quit+hand off we preflight the rebuilt sandbox helper; if it is NOT
* launchable (and no working non-interactive fallback applies; see
* sandboxFallbackFromEnv) we DO NOT quit. We keep the working window and return
* the closeable manual-restart terminal state instead.
*/
const path = require('node:path')
// Map process.platform → electron-builder's `release/<dir>-unpacked` name.
function unpackedDirName(platform) {
if (platform === 'darwin') return 'mac-unpacked' // not used (mac swaps bundles)
if (platform === 'win32') return 'win-unpacked'
return 'linux-unpacked'
}
/**
* If `execPath` lives under `<updateRoot>/apps/desktop/release/<plat>-unpacked`,
* return that unpacked dir; otherwise null. A null result means the running
* binary is NOT the thing we just rebuilt (AppImage/.deb/.rpm/dev), so we must
* not claim a GUI relaunch.
*
* Match is a path-segment-aware prefix check (not a bare string startsWith) so
* `.../release/linux-unpacked-evil` can't masquerade as `.../release/linux-unpacked`.
*/
function resolveUnpackedRelease(execPath, updateRoot, platform) {
if (!execPath || !updateRoot) return null
const releaseDir = path.join(updateRoot, 'apps', 'desktop', 'release')
const unpacked = path.join(releaseDir, unpackedDirName(platform))
const normalizedExec = path.resolve(String(execPath))
// execPath must be the unpacked dir itself or a descendant of it.
const withSep = unpacked.endsWith(path.sep) ? unpacked : unpacked + path.sep
if (normalizedExec === unpacked || normalizedExec.startsWith(withSep)) {
return unpacked
}
return null
}
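The reason for appending the separator before `startsWith` is easiest to see next to the naive check. In the sketch below, `naiveUnder` and `segmentAwareUnder` are illustrative names, not functions from this module, the example paths are made up, and `path.posix` is used so the result does not depend on the host platform.

```javascript
const path = require('node:path').posix

// Illustrative only: a bare string prefix check accepts sibling directories.
function naiveUnder(child, parent) {
  return child.startsWith(parent)
}

// Same idea as resolveUnpackedRelease: compare against `parent + sep` so only
// the directory itself or a true descendant matches.
function segmentAwareUnder(child, parent) {
  const resolved = path.resolve(child)
  const withSep = parent.endsWith(path.sep) ? parent : parent + path.sep
  return resolved === parent || resolved.startsWith(withSep)
}

const unpacked = '/opt/hermes/apps/desktop/release/linux-unpacked'
const sibling = '/opt/hermes/apps/desktop/release/linux-unpacked-evil/hermes'
const real = '/opt/hermes/apps/desktop/release/linux-unpacked/hermes'

console.log(naiveUnder(sibling, unpacked)) // the naive check is fooled
console.log(segmentAwareUnder(sibling, unpacked)) // the segment-aware check is not
console.log(segmentAwareUnder(real, unpacked))
```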
/**
* Pure decision: given whether the running binary is under the rebuilt
* unpacked release AND whether its sandbox helper is launchable, choose the
* terminal outcome.
*
* 'relaunch': quit + detached watcher re-execs the rebuilt binary in place.
* 'guiSkew': backend updated, GUI package NOT changed; user must reinstall
* the GUI. Closeable terminal state; does NOT claim a GUI update.
* 'manual': running the rebuilt binary, but its sandbox helper is not
* launchable and no fallback applies; do NOT quit into a dead
* app. Closeable manual-restart terminal state.
*/
function decideRelaunchOutcome({ underUnpacked, sandboxOk }) {
if (!underUnpacked) return 'guiSkew'
if (!sandboxOk) return 'manual'
return 'relaunch'
}
/**
* Preflight the rebuilt sandbox helper. Returns
* { ok: boolean, reason: string, path: string }
*
* `ok` is true when chrome-sandbox is owned by uid 0 AND has the setuid bit
* (mode & 0o4000), i.e. Electron can launch it. If chrome-sandbox does not
* exist at all we treat it as ok: this Electron build does not use the SUID
* sandbox helper (e.g. it ships the namespace sandbox), so the relaunch is not
* blocked on it.
*
* `statSync` is injectable so this is testable without a real setuid file.
*/
function sandboxPreflight(unpackedDir, statSync) {
if (!unpackedDir) return { ok: false, reason: 'no-unpacked-dir', path: null }
const sandboxPath = path.join(unpackedDir, 'chrome-sandbox')
let st
try {
st = statSync(sandboxPath)
} catch {
// No chrome-sandbox helper present → this build doesn't rely on the SUID
// sandbox; nothing to block the relaunch.
return { ok: true, reason: 'no-sandbox-helper', path: sandboxPath }
}
const ownedByRoot = st.uid === 0
const hasSetuid = (st.mode & 0o4000) !== 0
if (ownedByRoot && hasSetuid) {
return { ok: true, reason: 'launchable', path: sandboxPath }
}
if (!ownedByRoot && !hasSetuid) {
return { ok: false, reason: 'not-root-not-setuid', path: sandboxPath }
}
if (!ownedByRoot) return { ok: false, reason: 'not-root', path: sandboxPath }
return { ok: false, reason: 'not-setuid', path: sandboxPath }
}
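The mode test in `sandboxPreflight` relies on the setuid bit being octal 4000 in `st.mode`. A minimal sketch of that bit arithmetic follows; `describeMode` and the local `S_ISUID` constant are hypothetical names introduced here for illustration, and the two sample modes assume a regular file (type bits `0o100000`).

```javascript
// Hypothetical helper: decode the permission bits that sandboxPreflight inspects.
const S_ISUID = 0o4000 // setuid bit

function describeMode(mode) {
  return {
    setuid: (mode & S_ISUID) !== 0,
    // The low 12 bits hold setuid/setgid/sticky plus rwx for user/group/other.
    octal: (mode & 0o7777).toString(8)
  }
}

const launchable = describeMode(0o104755) // regular file, mode 4755
const freshBuild = describeMode(0o100755) // regular file, mode 755 (no setuid)
```

Ownership is a separate check (`st.uid === 0`); the setuid bit alone on a file owned by a normal user still fails the preflight.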
/**
* Detect a non-interactive sandbox fallback the user has opted into via the
* environment. The reviewer asked us to integrate with any existing
* `--no-sandbox` / chrome-sandbox handling. A repo grep found NO existing
* non-interactive sandbox fallback in the desktop app (the only chrome-sandbox
* reference is documentation in scripts/before-pack.cjs). The one signal that
* DOES exist is the standard Electron escape hatch: ELECTRON_DISABLE_SANDBOX=1
* (and the equivalent `--no-sandbox` already present in the launch args). If
* the user has set that, the rebuilt binary will start even with a broken
* chrome-sandbox, so the relaunch is safe.
*
* Returns true when a fallback makes the relaunch safe despite a failed
* sandbox preflight.
*/
function sandboxFallbackFromEnv(env, launchArgs) {
const disable = String((env && env.ELECTRON_DISABLE_SANDBOX) || '').trim()
if (disable === '1' || disable.toLowerCase() === 'true') return true
if (Array.isArray(launchArgs) && launchArgs.some(a => a === '--no-sandbox')) return true
return false
}
// POSIX single-quote a value for safe inclusion in the generated bash script.
function shellQuote(value) {
return `'${String(value).replace(/'/g, `'\\''`)}'`
}
// Electron / Chromium internal switches that must NOT be replayed on re-exec:
// they are runtime artifacts of THIS launch, not user intent, and re-passing
// them can change sandbox/zygote behavior or point at stale fds/dirs.
const INTERNAL_ARG_PREFIXES = [
'--type=', // renderer/gpu/zygote child markers
'--user-data-dir=',
'--enable-features=',
'--disable-features=',
'--field-trial-handle=',
'--enable-logging',
'--log-file=',
// NB: --no-sandbox is deliberately NOT stripped — it reflects the user's /
// environment's SUID-sandbox opt-out (some hardened kernels/containers require
// it) and is the signal sandboxFallbackFromEnv() uses to allow a relaunch when
// chrome-sandbox isn't setuid. Dropping it would make exactly that relaunch
// fail ("quit and never came back").
'--disable-gpu-sandbox',
'--lang=',
'--inspect',
'--remote-debugging-port='
]
/**
* Filter Electron internals out of the original launch args so we replay only
* meaningful user/launcher intent (deep-link URLs, app-specific flags).
* `argv` is expected to be process.argv.slice(1) for a PACKAGED app (argv[0] is
* the exec path itself; there is no entry-script arg as in a dev run).
*/
function collectRelaunchArgs(argv) {
if (!Array.isArray(argv)) return []
return argv.filter(arg => {
if (typeof arg !== 'string' || arg.length === 0) return false
return !INTERNAL_ARG_PREFIXES.some(prefix =>
prefix.endsWith('=') ? arg.startsWith(prefix) : arg === prefix || arg.startsWith(prefix + '=')
)
})
}
// Env keys whose values define the relaunched instance's context (which
// backend/profile/root it talks to). Anything HERMES_DESKTOP_* is preserved
// plus HERMES_HOME. We snapshot the values, not the live env, so the new
// instance comes up pointed at the same place this one was.
// ELECTRON_DISABLE_SANDBOX is preserved for the same reason --no-sandbox is kept
// in the replayed args: if a relaunch is only safe because the user opted out of
// the SUID sandbox, the relaunched instance must inherit that opt-out too.
const PRESERVED_ENV_KEYS = ['HERMES_HOME', 'ELECTRON_DISABLE_SANDBOX']
const PRESERVED_ENV_PREFIXES = ['HERMES_DESKTOP_']
function collectRelaunchEnv(env) {
const out = {}
if (!env || typeof env !== 'object') return out
for (const [key, value] of Object.entries(env)) {
if (value == null) continue
if (PRESERVED_ENV_KEYS.includes(key) || PRESERVED_ENV_PREFIXES.some(p => key.startsWith(p))) {
out[key] = String(value)
}
}
return out
}
/**
* Build the detached bash watcher that waits for the parent to exit (graceful
* window then SIGKILL), self-deletes, and re-execs the rebuilt binary WITH the
* original launch context (cwd, env, args) restored.
*
* @param {object} o
* @param {number} o.pid parent (this) process pid to wait on
* @param {string} o.execPath binary to re-exec
* @param {string[]} o.args filtered launch args to replay
* @param {object} o.env env key/value pairs to export before exec
* @param {string} o.cwd working directory to restore
*/
function buildRelaunchScript({ pid, execPath, args, env, cwd }) {
const exports = Object.entries(env || {})
.map(([k, v]) => `export ${k}=${shellQuote(v)}`)
.join('\n')
const quotedArgs = (args || []).map(shellQuote).join(' ')
const cwdLine = cwd ? `cd ${shellQuote(cwd)} 2>/dev/null || true` : ''
// NOTE: `exec` replaces the watcher process with the relaunched app, so the
// re-exec inherits exactly the env/cwd we set above.
return `#!/bin/bash
set -u
APP_PID=${Number(pid)}
# Wait up to ~30s for a graceful exit, then SIGKILL: a hung/zombie parent must
# be gone before we relaunch, or the new instance bails on the single-instance
# lock. (#45205)
for _ in $(seq 1 60); do
kill -0 "$APP_PID" 2>/dev/null || break
sleep 0.5
done
if kill -0 "$APP_PID" 2>/dev/null; then
kill -9 "$APP_PID" 2>/dev/null || true
sleep 0.5
fi
# Self-delete so temp watchers don't accumulate across updates.
rm -f -- "$0" 2>/dev/null || true
${cwdLine}
${exports}
exec ${shellQuote(execPath)}${quotedArgs ? ' ' + quotedArgs : ''}
`
}
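Because the generated watcher interpolates user-controlled values (args, env, cwd) into bash source, the quoting rule carries the safety of the whole script. The sketch below copies `shellQuote` and round-trips a hostile value through a real shell; `roundTrip` is a hypothetical helper, and the example assumes a POSIX `/bin/sh` is available on the machine running it.

```javascript
const { execFileSync } = require('node:child_process')

// Copy of the module's quoting rule: close the quote, emit an escaped quote, reopen.
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`
}

// Hypothetical helper: ask the shell to print the quoted value back verbatim.
// Assumes /bin/sh exists; nothing inside the quotes should be expanded or run.
function roundTrip(value) {
  return execFileSync('/bin/sh', ['-c', `printf '%s' ${shellQuote(value)}`], { encoding: 'utf8' })
}

// Single quote, variable expansion, command substitution, and a command separator.
const tricky = 'it\'s $HOME `id` "x"; echo pwned'
const echoed = roundTrip(tricky)
```

If the quoting is correct, `echoed` is byte-for-byte equal to `tricky`, which is what a test for `buildRelaunchScript` would want to hold for every replayed argument.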
module.exports = {
unpackedDirName,
resolveUnpackedRelease,
decideRelaunchOutcome,
sandboxPreflight,
sandboxFallbackFromEnv,
collectRelaunchArgs,
collectRelaunchEnv,
buildRelaunchScript,
shellQuote,
INTERNAL_ARG_PREFIXES,
PRESERVED_ENV_KEYS,
PRESERVED_ENV_PREFIXES
}

@ -0,0 +1,231 @@
/**
* Tests for electron/update-relaunch.cjs: the pure decision + script helpers
* behind the Linux in-app update relaunch (#45205).
*
* Run with: node --test electron/update-relaunch.test.cjs
* (Wired into npm test:desktop:platforms in package.json.)
*
* What this locks (review acceptance criteria for PR #45205):
* 1. The execPath split: only a binary under release/<plat>-unpacked may
* relaunch/claim a GUI update; AppImage/.deb/.rpm/dev/unresolved paths land
* on the guiSkew terminal state and do NOT claim the GUI was updated.
* 2. Launch context is replayed on re-exec (args filtered of Electron
* internals; HERMES_HOME / HERMES_DESKTOP_* env + cwd preserved) and is
* safely shell-quoted.
* 3. The sandbox preflight: chrome-sandbox must be root-owned + setuid to be
* launchable; otherwise the decision degrades to a manual terminal state
* (keep a working window) unless a non-interactive fallback applies.
*/
const test = require('node:test')
const assert = require('node:assert/strict')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { execFileSync } = require('node:child_process')
const {
unpackedDirName,
resolveUnpackedRelease,
decideRelaunchOutcome,
sandboxPreflight,
sandboxFallbackFromEnv,
collectRelaunchArgs,
collectRelaunchEnv,
buildRelaunchScript,
shellQuote
} = require('./update-relaunch.cjs')
const ROOT = '/home/u/.hermes/hermes-agent'
const UNPACKED = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked')
// ---------------------------------------------------------------------------
// 1) The execPath split — the heart of the GUI/backend skew guard.
// ---------------------------------------------------------------------------
test('unpackedDirName maps platform to the electron-builder dir', () => {
assert.equal(unpackedDirName('linux'), 'linux-unpacked')
assert.equal(unpackedDirName('win32'), 'win-unpacked')
})
test('resolveUnpackedRelease returns the dir for a binary UNDER release/<plat>-unpacked', () => {
const exec = path.join(UNPACKED, 'hermes')
assert.equal(resolveUnpackedRelease(exec, ROOT, 'linux'), UNPACKED)
// The unpacked dir itself also counts.
assert.equal(resolveUnpackedRelease(UNPACKED, ROOT, 'linux'), UNPACKED)
})
test('resolveUnpackedRelease is null for AppImage / .deb / .rpm / dev / unresolved paths', () => {
// AppImage mount
assert.equal(resolveUnpackedRelease('/tmp/.mount_Hermes12345/AppRun', ROOT, 'linux'), null)
// .deb / .rpm system install
assert.equal(resolveUnpackedRelease('/usr/lib/hermes/hermes', ROOT, 'linux'), null)
assert.equal(resolveUnpackedRelease('/opt/Hermes/hermes', ROOT, 'linux'), null)
// dev electron
assert.equal(resolveUnpackedRelease('/home/u/.hermes/hermes-agent/node_modules/electron/dist/electron', ROOT, 'linux'), null)
// empty / missing
assert.equal(resolveUnpackedRelease('', ROOT, 'linux'), null)
assert.equal(resolveUnpackedRelease(path.join(UNPACKED, 'hermes'), '', 'linux'), null)
})
test('resolveUnpackedRelease is not fooled by a sibling prefix dir', () => {
// `.../release/linux-unpacked-evil` must NOT match `.../release/linux-unpacked`.
const sneaky = path.join(ROOT, 'apps', 'desktop', 'release', 'linux-unpacked-evil', 'hermes')
assert.equal(resolveUnpackedRelease(sneaky, ROOT, 'linux'), null)
})
test('decideRelaunchOutcome: only under-unpacked + sandbox-ok relaunches', () => {
assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: true }), 'relaunch')
// Under unpacked but sandbox not launchable → manual (keep a working window).
assert.equal(decideRelaunchOutcome({ underUnpacked: true, sandboxOk: false }), 'manual')
// Not under unpacked → guiSkew regardless of sandbox flag.
assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: true }), 'guiSkew')
assert.equal(decideRelaunchOutcome({ underUnpacked: false, sandboxOk: false }), 'guiSkew')
})
// ---------------------------------------------------------------------------
// 3) Sandbox preflight
// ---------------------------------------------------------------------------
const fakeStat = (uid, mode) => () => ({ uid, mode })
const throwStat = () => {
throw Object.assign(new Error('ENOENT'), { code: 'ENOENT' })
}
test('sandboxPreflight: root-owned + setuid is launchable', () => {
const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o4755))
assert.equal(r.ok, true)
assert.equal(r.reason, 'launchable')
})
test('sandboxPreflight: not root → not launchable', () => {
const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o4755))
assert.equal(r.ok, false)
assert.equal(r.reason, 'not-root')
})
test('sandboxPreflight: missing setuid bit → not launchable', () => {
const r = sandboxPreflight(UNPACKED, fakeStat(0, 0o755))
assert.equal(r.ok, false)
assert.equal(r.reason, 'not-setuid')
})
test('sandboxPreflight: neither root nor setuid (the fresh-rebuild trap)', () => {
const r = sandboxPreflight(UNPACKED, fakeStat(1000, 0o755))
assert.equal(r.ok, false)
assert.equal(r.reason, 'not-root-not-setuid')
})
test('sandboxPreflight: no chrome-sandbox helper present → ok (build does not use SUID sandbox)', () => {
const r = sandboxPreflight(UNPACKED, throwStat)
assert.equal(r.ok, true)
assert.equal(r.reason, 'no-sandbox-helper')
})
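
The five `sandboxPreflight` cases above amount to a small decision table over the owner uid and the setuid bit. A minimal sketch of that table follows, assuming a stat callback that returns `{ uid, mode }` like `fakeStat`. The name `sandboxPreflightSketch`, the omitted directory argument, and the choice to treat any stat error as a missing helper are assumptions for illustration, not the module's actual code.

```typescript
// Hypothetical sketch: shapes and reason strings are inferred from the tests,
// not copied from the real module.
type StatLike = { uid: number; mode: number }
type Preflight = { ok: boolean; reason: string }

// POSIX setuid permission bit.
const SETUID_BIT = 0o4000

function sandboxPreflightSketch(stat: () => StatLike): Preflight {
  let info: StatLike
  try {
    info = stat()
  } catch {
    // Assumption: any stat failure means the helper is absent, so the build
    // does not rely on the SUID sandbox.
    return { ok: true, reason: 'no-sandbox-helper' }
  }
  const isRoot = info.uid === 0
  const isSetuid = (info.mode & SETUID_BIT) !== 0
  if (isRoot && isSetuid) return { ok: true, reason: 'launchable' }
  if (!isRoot && !isSetuid) return { ok: false, reason: 'not-root-not-setuid' }
  return { ok: false, reason: isRoot ? 'not-setuid' : 'not-root' }
}
```

Taking the stat function as a parameter is what lets the tests exercise every branch without touching the filesystem.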
test('sandboxFallbackFromEnv: ELECTRON_DISABLE_SANDBOX / --no-sandbox make a broken sandbox safe', () => {
assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: '1' }, []), true)
assert.equal(sandboxFallbackFromEnv({ ELECTRON_DISABLE_SANDBOX: 'true' }, []), true)
assert.equal(sandboxFallbackFromEnv({}, ['--no-sandbox']), true)
assert.equal(sandboxFallbackFromEnv({}, ['--foo']), false)
assert.equal(sandboxFallbackFromEnv({}, []), false)
assert.equal(sandboxFallbackFromEnv(null, null), false)
})
// ---------------------------------------------------------------------------
// 2) Launch-context preservation
// ---------------------------------------------------------------------------
test('collectRelaunchArgs drops Electron internals, keeps user/launcher args', () => {
const argv = [
'--type=renderer',
'--user-data-dir=/tmp/x',
'--enable-features=Foo',
'--field-trial-handle=123',
'--no-sandbox', // sandbox opt-out — KEEP (user/env intent + relaunch fallback)
'--lang=en-US',
'hermes://open/agent/42', // deep link — keep
'--profile=work', // app flag — keep
'--remote-debugging-port=9222' // internal — drop
]
assert.deepEqual(collectRelaunchArgs(argv), ['--no-sandbox', 'hermes://open/agent/42', '--profile=work'])
assert.deepEqual(collectRelaunchArgs(undefined), [])
})
test('collectRelaunchEnv preserves HERMES_HOME + HERMES_DESKTOP_* + sandbox opt-out only', () => {
const env = {
HERMES_HOME: '/home/u/.hermes',
HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
ELECTRON_DISABLE_SANDBOX: '1', // sandbox opt-out — preserved
PATH: '/usr/bin', // not preserved
HOME: '/home/u', // not preserved
UNRELATED: 'x'
}
assert.deepEqual(collectRelaunchEnv(env), {
HERMES_HOME: '/home/u/.hermes',
HERMES_DESKTOP_REMOTE_URL: 'http://box:9119',
HERMES_DESKTOP_REMOTE_TOKEN: 'secret',
HERMES_DESKTOP_HERMES_ROOT: '/home/u/dev/hermes',
ELECTRON_DISABLE_SANDBOX: '1'
})
assert.deepEqual(collectRelaunchEnv(null), {})
})
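
The allowlist that the `collectRelaunchEnv` test pins down can be sketched as a simple key filter. This is an illustrative reconstruction from the expected values only: `collectRelaunchEnvSketch` is a hypothetical name, and the real function may apply further rules that the test does not exercise.

```typescript
// Hypothetical sketch of the env allowlist implied by the test above.
type EnvLike = Record<string, string | undefined>

function collectRelaunchEnvSketch(env: EnvLike | null | undefined): Record<string, string> {
  const kept: Record<string, string> = {}
  for (const [key, value] of Object.entries(env ?? {})) {
    if (typeof value !== 'string') continue
    const allowed =
      key === 'HERMES_HOME' ||
      key === 'ELECTRON_DISABLE_SANDBOX' ||
      key.startsWith('HERMES_DESKTOP_')
    // Everything else (PATH, HOME, unrelated vars) is left out.
    if (allowed) kept[key] = value
  }
  return kept
}
```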
// ---------------------------------------------------------------------------
// Generated watcher script: safe quoting + valid bash syntax.
// ---------------------------------------------------------------------------
test('shellQuote neutralizes single quotes and metacharacters', () => {
assert.equal(shellQuote(`a'b`), `'a'\\''b'`)
assert.equal(shellQuote('$(rm -rf /)'), `'$(rm -rf /)'`)
})
test('buildRelaunchScript embeds pid/exec/args/env/cwd and is valid bash', () => {
const script = buildRelaunchScript({
pid: 4242,
execPath: '/home/u/.hermes/hermes-agent/apps/desktop/release/linux-unpacked/Hermes',
args: ['hermes://open/agent/42', "--note=it's fine"],
env: { HERMES_HOME: '/home/u/.hermes', HERMES_DESKTOP_REMOTE_URL: 'http://box:9119' },
cwd: '/home/u/work dir'
})
// Structural assertions.
assert.match(script, /^#!\/bin\/bash/)
assert.match(script, /APP_PID=4242/)
assert.match(script, /kill -9 "\$APP_PID"/)
assert.match(script, /rm -f -- "\$0"/)
// env exports + cwd restore + args replay are present and quoted.
assert.match(script, /export HERMES_HOME='\/home\/u\/\.hermes'/)
assert.match(script, /export HERMES_DESKTOP_REMOTE_URL='http:\/\/box:9119'/)
assert.match(script, /cd '\/home\/u\/work dir'/)
assert.match(script, /exec '.*\/linux-unpacked\/Hermes' 'hermes:\/\/open\/agent\/42' '--note=it'\\''s fine'/)
// It must be syntactically valid bash (`bash -n`). Write to a temp file and lint.
const tmp = path.join(os.tmpdir(), `hermes-relaunch-test-${Date.now()}.sh`)
fs.writeFileSync(tmp, script)
try {
execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
} finally {
fs.rmSync(tmp, { force: true })
}
})
test('buildRelaunchScript with no args/env still lints clean', () => {
const script = buildRelaunchScript({
pid: 1,
execPath: '/opt/Hermes/Hermes',
args: [],
env: {},
cwd: ''
})
const tmp = path.join(os.tmpdir(), `hermes-relaunch-test2-${Date.now()}.sh`)
fs.writeFileSync(tmp, script)
try {
execFileSync('bash', ['-n', tmp], { stdio: 'pipe' })
} finally {
fs.rmSync(tmp, { force: true })
}
// exec line has no trailing args.
assert.match(script, /exec '\/opt\/Hermes\/Hermes'\n/)
})
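
For reference, the quoting behaviour that the `shellQuote` test expects matches the usual POSIX single-quote idiom: wrap the value in single quotes and rewrite each embedded `'` as `'\''`. The sketch below is an assumption about how such a helper could look (the name `shellQuoteSketch` is hypothetical) and is not the implementation under test.

```typescript
// Hypothetical sketch of POSIX single-quote escaping.
// Inside single quotes the shell expands nothing, so only the quote
// character itself needs handling: close the quote, emit an escaped
// quote, and reopen.
function shellQuoteSketch(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`
}
```

Because `$(...)`, backticks and spaces are inert inside single quotes, a value such as `$(rm -rf /)` is passed through as literal text rather than evaluated.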

View file

@ -2,7 +2,7 @@
"name": "hermes",
"productName": "Hermes",
"private": true,
"version": "0.15.1",
"version": "0.17.0",
"description": "Native desktop shell for Hermes Agent.",
"author": "Nous Research",
"type": "module",
@ -37,7 +37,7 @@
"test:desktop:nsis": "node scripts/test-desktop.mjs nsis",
"test:desktop:existing": "node scripts/test-desktop.mjs existing",
"test:desktop:fresh": "node scripts/test-desktop.mjs fresh",
"test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/windows-user-env.test.cjs",
"test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/backend-ready.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/link-title-window.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-rebuild.test.cjs electron/update-marker.test.cjs electron/update-relaunch.test.cjs electron/windows-user-env.test.cjs",
"typecheck": "tsc -p . --noEmit",
"lint": "eslint src/ electron/",
"lint:fix": "eslint src/ electron/ --fix",

View file

@ -357,7 +357,7 @@ function SubagentRow({ node, depth = 0, nowMs }: { node: SubagentNode; depth?: n
</button>
{visibleRows.length > 0 ? (
<div className="grid min-w-0 gap-1 pl-6">
<div className="grid min-w-0 gap-1 pl-6" data-selectable-text="true">
{visibleRows.map((entry, i) => (
<StreamLine
active={running && i === visibleRows.length - 1}
@ -371,7 +371,7 @@ function SubagentRow({ node, depth = 0, nowMs }: { node: SubagentNode; depth?: n
) : null}
{open && fileLines.length > 0 ? (
<div className="grid min-w-0 gap-0.5 pl-6">
<div className="grid min-w-0 gap-0.5 pl-6" data-selectable-text="true">
<p className="text-[0.58rem] font-medium tracking-wider text-muted-foreground/60 uppercase">
{t.agents.files}
</p>

View file

@ -0,0 +1,69 @@
import { cleanup, render, screen } from '@testing-library/react'
import { afterEach, describe, expect, it } from 'vitest'
import { I18nProvider } from '@/i18n/context'
import { AttachmentList } from './attachments'
import type { ComposerAttachment } from '@/store/composer'
function makeAttachment(id: string, label = 'test.pdf'): ComposerAttachment {
return { id, kind: 'file', label }
}
function renderWithI18n(ui: React.ReactNode) {
return render(
<I18nProvider configClient={{ getConfig: async () => ({}), saveConfig: async () => ({ ok: true }) }}>
{ui}
</I18nProvider>
)
}
describe('AttachmentList', () => {
afterEach(() => {
cleanup()
})
it('renders valid attachments', () => {
const attachments = [makeAttachment('a', 'doc.pdf'), makeAttachment('b', 'img.png')]
renderWithI18n(<AttachmentList attachments={attachments} />)
expect(screen.getByText('doc.pdf')).toBeDefined()
expect(screen.getByText('img.png')).toBeDefined()
})
it('renders empty list without error', () => {
renderWithI18n(<AttachmentList attachments={[]} />)
const container = screen.queryByTestId('composer-attachments') ?? document.querySelector('[data-slot="composer-attachments"]')
expect(container).not.toBeNull()
})
it('does not crash when attachments array contains undefined entries', () => {
// Repro: session switch can leave stale/undefined entries in the
// attachments array, causing a TypeError at attachment.refText.
const attachments = [
makeAttachment('a', 'good.pdf'),
undefined as unknown as ComposerAttachment,
makeAttachment('b', 'also-good.png')
]
expect(() => {
renderWithI18n(<AttachmentList attachments={attachments} />)
}).not.toThrow()
// Only valid attachments should render
expect(screen.getByText('good.pdf')).toBeDefined()
expect(screen.getByText('also-good.png')).toBeDefined()
})
it('does not crash when attachments array contains null entries', () => {
const attachments = [
null as unknown as ComposerAttachment,
makeAttachment('a', 'valid.txt')
]
expect(() => {
renderWithI18n(<AttachmentList attachments={attachments} />)
}).not.toThrow()
expect(screen.getByText('valid.txt')).toBeDefined()
})
})

View file

@ -20,7 +20,7 @@ export function AttachmentList({
}) {
return (
<div className="flex max-w-full flex-wrap gap-1.5 px-1 pt-1" data-slot="composer-attachments">
{attachments.map(attachment => (
{attachments.filter(Boolean).map(attachment => (
<AttachmentPill attachment={attachment} key={attachment.id} onRemove={onRemove} />
))}
</div>
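
`filter(Boolean)` removes the `null`/`undefined` entries at runtime, but it does not narrow the element type for the compiler. If narrowing is wanted, a type-predicate filter is one option. The sketch below uses a hypothetical `AttachmentLike` shape and helper names (`isPresent`, `visibleAttachments`) that are not part of this change.

```typescript
// Hypothetical stand-in for ComposerAttachment, for illustration only.
type AttachmentLike = { id: string; kind: string; label: string }

// Type predicate: true for everything except null and undefined.
function isPresent<T>(value: T | null | undefined): value is T {
  return value != null
}

// Returns only the real entries, typed without the nullish members.
function visibleAttachments(list: ReadonlyArray<AttachmentLike | null | undefined>): AttachmentLike[] {
  return list.filter(isPresent)
}
```

For object entries the two approaches keep the same items; the predicate version only adds the compile-time guarantee.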

View file

@ -2,21 +2,20 @@ import type { Unstable_TriggerAdapter } from '@assistant-ui/core'
import { ComposerPrimitive } from '@assistant-ui/react'
import type { ReactNode } from 'react'
import { composerFusedDockCard } from '@/components/chat/composer-dock'
import { composerPanelCard } from '@/components/chat/composer-dock'
import { cn } from '@/lib/utils'
// Same docked chrome as the queue/status stack, but its own thing: a narrow,
// left-aligned card (not full width) that fuses to the composer's edge instead
// of floating above it. `left-1` matches the stack's `mx-1` inset; the negative
// margin overlaps the seam so the composer's (now-transparent) edge border reads
// as shared. Fused (opaque) fill — the composer surface swaps to the same fill
// while a drawer is open, so the two paint as one panel.
const DRAWER_SHELL =
'absolute left-1 z-50 w-80 max-w-[calc(100%-0.5rem)] max-h-[min(22rem,calc(100vh-8rem))] overflow-y-auto overscroll-contain p-1 text-xs text-popover-foreground'
// A standalone glassy panel floating just off the composer edge, inset from the
// left. Skin is the shared composerPanelCard (also used by the attach menu).
const DRAWER_SHELL = cn(
'absolute left-2 z-50 w-80 max-w-[calc(100%-1rem)] max-h-[min(22rem,calc(100vh-8rem))]',
'overflow-y-auto overscroll-contain p-1 text-popover-foreground',
composerPanelCard
)
export const COMPLETION_DRAWER_CLASS = cn(DRAWER_SHELL, 'bottom-full -mb-[9px]', composerFusedDockCard('top'))
export const COMPLETION_DRAWER_CLASS = cn(DRAWER_SHELL, 'bottom-full mb-1')
export const COMPLETION_DRAWER_BELOW_CLASS = cn(DRAWER_SHELL, 'top-full -mt-[9px]', composerFusedDockCard('bottom'))
export const COMPLETION_DRAWER_BELOW_CLASS = cn(DRAWER_SHELL, 'top-full mt-1')
export function ComposerCompletionDrawer({
adapter,

View file

@ -1,5 +1,6 @@
import { useState } from 'react'
import { composerPanelCard } from '@/components/chat/composer-dock'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { Dialog, DialogContent, DialogDescription, DialogHeader, DialogTitle } from '@/components/ui/dialog'
@ -54,11 +55,11 @@ export function ContextMenu({
type="button"
variant="ghost"
>
<Codicon name="add" size="1rem" />
<Codicon name="add" size="0.875rem" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="start" className="w-60" side="top" sideOffset={10}>
<DropdownMenuLabel className="text-[0.7rem] font-medium uppercase tracking-wide text-muted-foreground/85">
<DropdownMenuContent align="start" className={cn('w-60', composerPanelCard)} side="top" sideOffset={6}>
<DropdownMenuLabel className="px-2 pb-0.5 pt-0.5 text-[0.625rem] font-semibold uppercase tracking-wider text-(--ui-text-tertiary)">
{c.attachLabel}
</DropdownMenuLabel>
<ContextMenuItem disabled={!onPickFiles} icon={FileText} onSelect={onPickFiles}>
@ -142,7 +143,12 @@ function PromptSnippetsDialog({ onInsertText, onOpenChange, open }: PromptSnippe
export function ContextMenuItem({ children, disabled, icon: Icon, onSelect }: ContextMenuItemProps) {
return (
<DropdownMenuItem disabled={disabled} onSelect={onSelect}>
// Override font size + highlight to match the / · @ completion rows exactly.
<DropdownMenuItem
className="text-[length:var(--conversation-tool-font-size)] focus:bg-(--ui-bg-tertiary)"
disabled={disabled}
onSelect={onSelect}
>
<Icon />
<span>{children}</span>
</DropdownMenuItem>

View file

@ -43,6 +43,7 @@ export function ComposerControls({
busyAction,
canSteer,
canSubmit,
compactModelPill = false,
conversation,
disabled,
hasComposerPayload,
@ -55,6 +56,7 @@ export function ComposerControls({
busyAction: 'queue' | 'stop'
canSteer: boolean
canSubmit: boolean
compactModelPill?: boolean
conversation: ConversationProps
disabled: boolean
hasComposerPayload: boolean
@ -83,7 +85,7 @@ export function ComposerControls({
return (
<div className="ml-auto flex shrink-0 items-center gap-(--composer-control-gap)">
<ModelPill disabled={disabled} model={state.model} />
<ModelPill compact={compactModelPill} disabled={disabled} model={state.model} />
{/* While the agent runs and the user is typing, steer takes over the mic's
slot rather than crowding the row with an extra button. */}
{canSteer ? (
@ -97,7 +99,7 @@ export function ComposerControls({
type="button"
variant="ghost"
>
<SteeringWheel size={16} />
<SteeringWheel size={14} />
</Button>
</Tip>
) : (
@ -116,7 +118,7 @@ export function ComposerControls({
size="icon"
type="button"
>
<AudioLines size={17} />
<AudioLines size={15} />
</Button>
</Tip>
) : (
@ -129,12 +131,12 @@ export function ComposerControls({
>
{busy ? (
busyAction === 'queue' ? (
<Layers3 size={16} />
<Layers3 size={14} />
) : (
<span className="block size-3 rounded-[0.1875rem] bg-current" />
<span className="block size-2.5 rounded-[0.1875rem] bg-current" />
)
) : (
<Codicon name="arrow-up" size="1rem" />
<Codicon name="arrow-up" size="0.875rem" />
)}
</Button>
</Tip>
@ -293,11 +295,11 @@ function DictationButton({
variant="ghost"
>
{status === 'recording' ? (
<Square className="fill-current" size={12} />
<Square className="fill-current" size={11} />
) : status === 'transcribing' ? (
<Loader2 className="animate-spin" size={16} />
<Loader2 className="animate-spin" size={14} />
) : (
<Codicon name="mic" size="1rem" />
<Codicon name="mic" size="0.875rem" />
)}
</Button>
</Tip>

View file

@ -0,0 +1,352 @@
import {
type PointerEvent as ReactPointerEvent,
type RefObject,
useCallback,
useEffect,
useRef,
useState
} from 'react'
import {
POPOUT_ESTIMATED_HEIGHT,
POPOUT_WIDTH_REM,
setComposerPopoutPosition,
type PopoutPosition,
type PopoutSize
} from '@/store/composer-popout'
// Long-press duration (ms) before the floating composer body becomes draggable
// (the 5px platform drags instantly; this only covers grabbing the body itself).
const LONG_PRESS_MS = 360
const LONG_PRESS_MOVE_TOLERANCE = 10
// Upward drag distance from the docked composer that peels it off into a float.
const PEEL_OUT_PX = 16
const DOCK_ZONE_BOTTOM_PX = 72
// How close the composer's center must be to the viewport center (px) to count as
// "over the dock". Kept tight so the bottom-left/right corners stay free.
const DOCK_ZONE_CENTER_TOLERANCE_PX = 150
// Falloff distances over which dock proximity ramps from 1 (in-zone) down to 0.
const DOCK_VERTICAL_FALLOFF_PX = 260
const DOCK_HORIZONTAL_FALLOFF_PX = 220
interface PressState {
armed: boolean
mode: 'dock' | 'float'
pointerId: number
startBottom: number
startRight: number
startX: number
startY: number
}
interface ComposerPopoutGesturesOptions {
composerRef: RefObject<HTMLFormElement | null>
onDock: () => void
onPopOut: () => void
poppedOut: boolean
position: PopoutPosition
}
function gestureTargetOk(target: EventTarget | null) {
if (!(target instanceof Element)) {
return false
}
return !target.closest('button, a, input, textarea, select, [role="menuitem"], [data-radix-popper-content-wrapper]')
}
/** Floating composer's 5px outer frame — grab here to drag without long-press. */
function isFloatDragPlatform(target: EventTarget | null) {
if (!(target instanceof Element)) {
return false
}
if (!target.closest('[data-slot="composer-root"][data-popped-out]')) {
return false
}
if (target.closest('[data-slot="composer-surface"], [data-slot="composer-rich-input"]')) {
return false
}
return gestureTargetOk(target)
}
/** Ranges from 0 (far) to 1 (inside the dock zone). Drives both the dock glow
* and the release-to-dock test (which fires at proximity 1). */
function dockProximityOf(rect: DOMRect) {
const horizontalDist = Math.abs(rect.left + rect.width / 2 - window.innerWidth / 2)
const verticalGap = window.innerHeight - DOCK_ZONE_BOTTOM_PX - rect.bottom
const v = verticalGap <= 0 ? 1 : Math.max(0, 1 - verticalGap / DOCK_VERTICAL_FALLOFF_PX)
const h =
horizontalDist <= DOCK_ZONE_CENTER_TOLERANCE_PX
? 1
: Math.max(0, 1 - (horizontalDist - DOCK_ZONE_CENTER_TOLERANCE_PX) / DOCK_HORIZONTAL_FALLOFF_PX)
return v * h
}
const clampOffset = (value: number, max: number) => Math.min(Math.max(0, value), max)
/** Fixed-position composer uses bottom/right insets; keep the grab point under the pointer. */
function popoutPositionUnderPointer(
clientX: number,
clientY: number,
grabX: number,
grabY: number,
boxWidth: number,
boxHeight: number
): PopoutPosition {
return {
bottom: window.innerHeight - clientY + grabY - boxHeight,
right: window.innerWidth - clientX + grabX - boxWidth
}
}
/**
* Gesture pop-out / dock for the composer: fully gestural, no hold-to-toggle.
*
* Docked: drag the composer upward (off the dock) to peel it out into a float,
* then keep dragging in the same motion.
* Floating: drag the 5px frame to move instantly, or long-press the body then
* drag; release over the bottom-center dock band to snap back in.
*/
export function useComposerPopoutGestures({
composerRef,
onDock,
onPopOut,
poppedOut,
position
}: ComposerPopoutGesturesOptions) {
const [dragging, setDragging] = useState(false)
const [dockProximity, setDockProximity] = useState(0)
const stateRef = useRef<PressState | null>(null)
const timerRef = useRef<number | null>(null)
const liveRef = useRef(position)
liveRef.current = position
const onPopOutRef = useRef(onPopOut)
onPopOutRef.current = onPopOut
const clearTimer = useCallback(() => {
if (timerRef.current !== null) {
window.clearTimeout(timerRef.current)
timerRef.current = null
}
}, [])
const resetGesture = useCallback(() => {
clearTimer()
stateRef.current = null
setDragging(false)
setDockProximity(0)
}, [clearTimer])
const beginFloatDrag = useCallback(
(state: PressState, clientX: number, clientY: number, next: PopoutPosition, size?: PopoutSize) => {
clearTimer()
const clamped = setComposerPopoutPosition(next, { size })
liveRef.current = clamped
state.mode = 'float'
state.armed = true
state.startBottom = clamped.bottom
state.startRight = clamped.right
state.startX = clientX
state.startY = clientY
setDragging(true)
},
[clearTimer]
)
const peelOffFromDock = useCallback(
(state: PressState, clientX: number, clientY: number) => {
const composer = composerRef.current
if (!composer) {
return
}
const rem = parseFloat(getComputedStyle(document.documentElement).fontSize) || 16
const rect = composer.getBoundingClientRect()
const boxWidth = POPOUT_WIDTH_REM * rem
const boxHeight = POPOUT_ESTIMATED_HEIGHT
const grabX = clampOffset(state.startX - rect.left, boxWidth)
const grabY = clampOffset(state.startY - rect.top, boxHeight)
const next = popoutPositionUnderPointer(clientX, clientY, grabX, grabY, boxWidth, boxHeight)
beginFloatDrag(state, clientX, clientY, next, { height: boxHeight, width: boxWidth })
onPopOutRef.current()
},
[beginFloatDrag, composerRef]
)
const onPointerDown = useCallback(
(event: ReactPointerEvent<HTMLElement>) => {
if (event.button !== 0 || !gestureTargetOk(event.target)) {
return
}
// Floating: grabbing the 5px platform drags immediately.
if (poppedOut && isFloatDragPlatform(event.target)) {
stateRef.current = {
armed: true,
mode: 'float',
pointerId: event.pointerId,
startBottom: liveRef.current.bottom,
startRight: liveRef.current.right,
startX: event.clientX,
startY: event.clientY
}
setDragging(true)
return
}
stateRef.current = {
armed: false,
mode: poppedOut ? 'float' : 'dock',
pointerId: event.pointerId,
startBottom: liveRef.current.bottom,
startRight: liveRef.current.right,
startX: event.clientX,
startY: event.clientY
}
clearTimer()
// Docked has NO timer — pop-out is purely the upward peel gesture (handled
// in pointermove). Floating arms a long-press to drag the body.
if (poppedOut) {
timerRef.current = window.setTimeout(() => {
const state = stateRef.current
if (!state || state.armed) {
return
}
state.armed = true
setDragging(true)
}, LONG_PRESS_MS)
}
},
[clearTimer, poppedOut]
)
useEffect(() => {
// Coalesce drag updates to one per frame — pointermove can fire several times
// between paints on high-Hz mice, and each update re-renders + clamps.
let raf: number | null = null
let pending: { x: number; y: number } | null = null
const cancelRaf = () => {
if (raf !== null) {
cancelAnimationFrame(raf)
raf = null
}
}
const flush = () => {
raf = null
const state = stateRef.current
if (!state?.armed || state.mode !== 'float' || !pending) {
return
}
const composer = composerRef.current
const size = composer ? { height: composer.offsetHeight, width: composer.offsetWidth } : undefined
liveRef.current = setComposerPopoutPosition(
{
bottom: state.startBottom - (pending.y - state.startY),
right: state.startRight - (pending.x - state.startX)
},
{ size }
)
if (composer) {
setDockProximity(dockProximityOf(composer.getBoundingClientRect()))
}
}
const handleMove = (event: PointerEvent) => {
const state = stateRef.current
if (!state || event.pointerId !== state.pointerId) {
return
}
// Pre-arm: cheap threshold checks run inline (no per-frame work yet).
if (!state.armed) {
const deltaX = event.clientX - state.startX
const deltaY = event.clientY - state.startY
if (state.mode === 'dock') {
// Peel off only on a clear upward drag — not a sideways/down wiggle.
if (-deltaY > PEEL_OUT_PX && -deltaY > Math.abs(deltaX)) {
peelOffFromDock(state, event.clientX, event.clientY)
} else if (Math.abs(deltaX) > PEEL_OUT_PX || deltaY > LONG_PRESS_MOVE_TOLERANCE) {
resetGesture()
}
} else if (Math.abs(deltaX) > LONG_PRESS_MOVE_TOLERANCE || Math.abs(deltaY) > LONG_PRESS_MOVE_TOLERANCE) {
// Float body long-press pending: movement cancels the hold.
resetGesture()
}
return
}
if (state.mode !== 'float') {
return
}
event.preventDefault()
pending = { x: event.clientX, y: event.clientY }
raf ??= requestAnimationFrame(flush)
}
const handleUp = (event: PointerEvent) => {
const state = stateRef.current
if (!state || event.pointerId !== state.pointerId) {
return
}
cancelRaf()
if (state.armed && state.mode === 'float') {
const composer = composerRef.current
const rect = composer?.getBoundingClientRect()
if (rect && dockProximityOf(rect) >= 1) {
onDock()
} else {
// Persist the resting position once, on release — never per move.
const size = composer ? { height: composer.offsetHeight, width: composer.offsetWidth } : undefined
setComposerPopoutPosition(liveRef.current, { persist: true, size })
}
}
resetGesture()
}
window.addEventListener('pointermove', handleMove)
window.addEventListener('pointerup', handleUp)
window.addEventListener('pointercancel', handleUp)
return () => {
cancelRaf()
window.removeEventListener('pointermove', handleMove)
window.removeEventListener('pointerup', handleUp)
window.removeEventListener('pointercancel', handleUp)
}
}, [composerRef, onDock, peelOffFromDock, resetGesture])
useEffect(() => clearTimer, [clearTimer])
return { dockProximity, dragging, onPointerDown }
}
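
To make the dock-proximity falloff concrete, here is a pure restatement of `dockProximityOf` that takes the viewport as a parameter instead of reading `window`, so it can be evaluated outside a browser. The constants mirror the ones declared in the hook. The `Box`/`Viewport` types and the name `dockProximitySketch` are assumptions for illustration.

```typescript
// Illustrative, DOM-free restatement of the proximity formula.
type Box = { left: number; width: number; bottom: number }
type Viewport = { width: number; height: number }

const ZONE_BOTTOM_PX = 72
const CENTER_TOLERANCE_PX = 150
const VERTICAL_FALLOFF_PX = 260
const HORIZONTAL_FALLOFF_PX = 220

function dockProximitySketch(box: Box, viewport: Viewport): number {
  const horizontalDist = Math.abs(box.left + box.width / 2 - viewport.width / 2)
  const verticalGap = viewport.height - ZONE_BOTTOM_PX - box.bottom
  // Vertical term: 1 once the box bottom reaches the dock band, ramping to 0 above it.
  const v = verticalGap <= 0 ? 1 : Math.max(0, 1 - verticalGap / VERTICAL_FALLOFF_PX)
  // Horizontal term: 1 within the center tolerance, ramping to 0 toward the sides.
  const h =
    horizontalDist <= CENTER_TOLERANCE_PX
      ? 1
      : Math.max(0, 1 - (horizontalDist - CENTER_TOLERANCE_PX) / HORIZONTAL_FALLOFF_PX)
  return v * h
}
```

In a 1000×800 viewport, a centered 400px-wide box whose bottom edge sits at y = 728 is exactly at the band (gap 0), so the result is 1. Raising it 130px gives a vertical term of 1 − 130/260 = 0.5. Because the two terms multiply, a release docks only when both are 1, which keeps the bottom corners free.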

View file

@ -40,6 +40,13 @@ import {
isBrowsingHistory,
resetBrowseState
} from '@/store/composer-input-history'
import {
$composerPopoutPosition,
$composerPoppedOut,
POPOUT_WIDTH_REM,
setComposerPoppedOut,
setComposerPopoutPosition
} from '@/store/composer-popout'
import {
$queuedPromptsBySession,
enqueueQueuedPrompt,
@ -55,6 +62,7 @@ import { $statusItemsBySession } from '@/store/composer-status'
import { notify } from '@/store/notifications'
import { $gatewayState, $messages, setSessionPickerOpen } from '@/store/session'
import { $threadScrolledUp } from '@/store/thread-scroll'
import { isSecondaryWindow } from '@/store/windows'
import { useTheme } from '@/themes'
import { extractDroppedFiles, HERMES_PATHS_MIME, partitionDroppedFiles } from '../hooks/use-composer-actions'
@ -73,6 +81,7 @@ import {
} from './focus'
import { HelpHint } from './help-hint'
import { useAtCompletions } from './hooks/use-at-completions'
import { useComposerPopoutGestures } from './hooks/use-popout-drag'
import { useSlashCompletions } from './hooks/use-slash-completions'
import { useVoiceConversation } from './hooks/use-voice-conversation'
import { useVoiceRecorder } from './hooks/use-voice-recorder'
@ -85,6 +94,7 @@ import {
import { QueuePanel } from './queue-panel'
import {
composerPlainText,
deleteChipBeforeCaret,
deleteSelectionInEditor,
insertPlainTextAtCaret,
normalizeComposerEditorDom,
@ -185,6 +195,13 @@ export function ChatBar({
const queuedPromptsBySession = useStore($queuedPromptsBySession)
const statusItemsBySession = useStore($statusItemsBySession)
const scrolledUp = useStore($threadScrolledUp)
// Pop-out is a shared, persisted state — but secondary windows (the Ctrl+Shift+N
// tiny window, subagent watch windows) always start docked and can't pop out:
// a floating composer makes no sense in a single-session side window, and it
// would otherwise write the shared atom and yank the main window's composer out.
const popoutAllowed = !isSecondaryWindow()
const poppedOut = useStore($composerPoppedOut) && popoutAllowed
const popoutPosition = useStore($composerPopoutPosition)
const activeQueueSessionKey = queueSessionKey || sessionId || null
const queuedPrompts = useMemo(
@ -206,6 +223,32 @@ export function ChatBar({
const composerRef = useRef<HTMLFormElement | null>(null)
const composerSurfaceRef = useRef<HTMLDivElement | null>(null)
const editorRef = useRef<HTMLDivElement | null>(null)
const handleComposerPopOut = useCallback(() => {
triggerHaptic('open')
setComposerPoppedOut(true)
}, [])
const handleComposerDock = useCallback(() => {
triggerHaptic('success')
setComposerPoppedOut(false)
}, [])
// Double-click the grab area toggles dock/float. Undocking restores the last
// position (the persisted atom is never cleared on dock).
const handleComposerToggle = useCallback(() => {
poppedOut ? handleComposerDock() : handleComposerPopOut()
}, [handleComposerDock, handleComposerPopOut, poppedOut])
const { dockProximity, dragging, onPointerDown: onComposerGesturePointerDown } =
useComposerPopoutGestures({
composerRef,
onDock: handleComposerDock,
onPopOut: handleComposerPopOut,
poppedOut,
position: popoutPosition
})
const draftRef = useRef(draft)
const pendingDraftPersistRef = useRef<{ scope: string | null; text: string } | null>(null)
const activeQueueSessionKeyRef = useRef(activeQueueSessionKey)
@ -405,7 +448,10 @@ export function ChatBar({
return
}
if (draft.includes('\n')) {
// Only a non-trailing newline forces an immediate expand. A trailing newline
// (or phantom \n from contenteditable junk) is left to the ResizeObserver,
// which expands only when the editor's real height actually grows.
if (draft.trimEnd().includes('\n')) {
setExpanded(true)
}
}, [draft, expanded])
@ -428,6 +474,20 @@ export function ChatBar({
return
}
// Floating composer is out of the thread's flow — it must not reserve any
// bottom clearance. Zero the measured vars so the thread reclaims the space.
// (Read globals here so the callback stays stable; mirror the popoutAllowed
// gate since secondary windows are forced docked.)
if ($composerPoppedOut.get() && !isSecondaryWindow()) {
const root = document.documentElement
lastBucketedHeightRef.current = 0
lastBucketedSurfaceHeightRef.current = 0
root.style.setProperty('--composer-measured-height', '0px')
root.style.setProperty('--composer-surface-measured-height', '0px')
return
}
const { height, width } = composer.getBoundingClientRect()
const surfaceHeight = composerSurfaceRef.current?.getBoundingClientRect().height
const root = document.documentElement
@ -474,6 +534,35 @@ export function ChatBar({
useResizeObserver(syncComposerMetrics, composerRef, composerSurfaceRef, editorRef)
// Toggling pop-out changes whether the composer reserves thread clearance.
// The ResizeObserver may not fire (the box can keep the same box size), so
// re-sync explicitly: docked republishes the measured height, floating zeroes
// it so the thread reclaims the bottom space.
useEffect(() => {
syncComposerMetrics()
}, [poppedOut, syncComposerMetrics])
// Keep the floating box on-screen: re-clamp (with the real measured size) when
// it pops out and whenever the window resizes — so a position persisted on a
// bigger/other monitor, or a shrunk window, can never strand it out of reach.
useEffect(() => {
if (!poppedOut) {
return undefined
}
const reclamp = (persist: boolean) => {
const el = composerRef.current
const size = el ? { height: el.offsetHeight, width: el.offsetWidth } : undefined
setComposerPopoutPosition($composerPopoutPosition.get(), { persist, size })
}
reclamp(true)
const onResize = () => reclamp(false)
window.addEventListener('resize', onResize)
return () => window.removeEventListener('resize', onResize)
}, [poppedOut])
useEffect(() => {
return () => {
const root = document.documentElement
@ -832,6 +921,22 @@ export function ChatBar({
return
}
// Plain Backspace right after a directive chip: remove the chip + its
// auto-inserted trailing space as one unit, so deleting a directive never
// leaves an orphaned space. (Modified backspaces stay native.)
if (
event.key === 'Backspace' &&
!event.metaKey &&
!event.ctrlKey &&
!event.altKey &&
deleteChipBeforeCaret(event.currentTarget)
) {
event.preventDefault()
flushEditorToDraft(event.currentTarget)
return
}
// Non-collapsed Backspace/Delete: native selection-delete is ~O(n²) on large
// drafts (Ctrl+A → Delete froze ~1.3s). Collapsed carets fall through.
if (
@ -1720,6 +1825,7 @@ export function ChatBar({
busyAction={busyAction}
canSteer={canSteer}
canSubmit={canSubmit}
compactModelPill={poppedOut}
conversation={{
active: voiceConversationActive,
level: conversation.level,
@ -1750,7 +1856,7 @@ export function ChatBar({
autoCapitalize="off"
autoCorrect="off"
className={cn(
'min-h-(--composer-input-min-height) max-h-(--composer-input-max-height) overflow-y-auto whitespace-pre-wrap break-words [overflow-wrap:anywhere] bg-transparent pb-1 pr-1 pt-1 leading-normal text-foreground outline-none disabled:cursor-not-allowed',
'min-h-(--composer-input-min-height) max-h-(--composer-input-max-height) cursor-text overflow-y-auto whitespace-pre-wrap break-words [overflow-wrap:anywhere] bg-transparent pb-1 pr-1 pt-1 leading-normal text-foreground outline-none disabled:cursor-not-allowed',
'empty:before:content-[attr(data-placeholder)] empty:before:text-muted-foreground/60',
'**:data-ref-text:cursor-default',
stacked && 'pl-3',
@ -1819,10 +1925,34 @@ export function ChatBar({
return (
<>
{dragging && poppedOut && (
<div
aria-hidden
className="pointer-events-none fixed inset-x-0 bottom-0 z-20 h-32"
style={{
// A bottom-centered radial glow — soft on every side by construction,
// so it reads as the dock target without any hard band edges. Its
// intensity tracks how close the composer is to the dock (1 = peak).
background:
'radial-gradient(64% 130% at 50% 100%, color-mix(in srgb, var(--color-primary) 26%, transparent) 0%, transparent 70%)',
// Scaled by --dock-glow-scale (lower in light mode — see styles.css).
opacity: `calc(${0.1 + dockProximity * 0.57} * var(--dock-glow-scale, 1))`
}}
/>
)}
<ComposerPrimitive.Unstable_TriggerPopoverRoot>
<ComposerPrimitive.Root
className="group/composer absolute bottom-0 left-1/2 z-30 w-[min(var(--composer-width),calc(100%-2rem))] max-w-full -translate-x-1/2 rounded-2xl pt-2 pb-[var(--composer-shell-pad-block-end)]"
className={cn(
'group/composer z-30 overflow-visible rounded-2xl',
poppedOut
? // Floating: the composer (with its own border) floats with an even
// 5px transparent grab margin around it — drag that to move it.
'fixed w-[var(--composer-popout-width)] max-w-[calc(100vw-1.5rem)] bg-transparent p-[5px]'
: 'absolute bottom-0 left-1/2 w-[min(var(--composer-width),calc(100%-2rem))] max-w-full -translate-x-1/2 pt-2 pb-[var(--composer-shell-pad-block-end)]',
dragging && 'cursor-grabbing select-none touch-none'
)}
data-drag-active={dragActive ? '' : undefined}
data-popped-out={poppedOut ? '' : undefined}
data-slot="composer-root"
data-status-stack={statusStackVisible ? '' : undefined}
data-thread-scrolled-up={scrolledUp ? '' : undefined}
@ -1830,6 +1960,7 @@ export function ChatBar({
onDragLeave={handleDragLeave}
onDragOver={handleDragOver}
onDrop={handleDrop}
onPointerDown={popoutAllowed ? onComposerGesturePointerDown : undefined}
onSubmit={e => {
e.preventDefault()
@ -1840,6 +1971,16 @@ export function ChatBar({
submitDraft()
}}
ref={composerRef}
style={
poppedOut
? {
bottom: `${popoutPosition.bottom}px`,
right: `${popoutPosition.right}px`,
// A compact one-sentence width when floating.
['--composer-popout-width' as string]: `${POPOUT_WIDTH_REM}rem`
}
: undefined
}
>
{showHelpHint && <HelpHint />}
{trigger && !argStageEmpty && (
@ -1876,16 +2017,31 @@ export function ChatBar({
}
sessionId={statusSessionId}
/>
<div
className="pointer-events-none absolute inset-0 rounded-[inherit]"
style={{ background: COMPOSER_FADE_BACKGROUND }}
/>
{!poppedOut && (
<div
className="pointer-events-none absolute inset-0 rounded-[inherit]"
style={{ background: COMPOSER_FADE_BACKGROUND }}
/>
)}
{/* Drag region: covers the transparent grab margin around the surface.
The surface sits on top (z-4), so only the exposed ring receives this
element's hover/cursor handling: the grab cursor and a diagonal hatch
(/////) appear when you hover the draggable margin, never over the input.
The hatch pattern + opacity ladder live in styles.css. */}
{popoutAllowed && (
<div
aria-hidden
className={cn('pointer-events-auto absolute inset-0', dragging ? 'cursor-grabbing' : 'cursor-grab')}
data-dragging={dragging ? '' : undefined}
data-slot="composer-drag-region"
onDoubleClick={handleComposerToggle}
/>
)}
<div className="relative w-full rounded-[inherit]">
<div
className={cn(
'group/composer-surface relative z-4 isolate rounded-[inherit] border border-[color-mix(in_srgb,var(--dt-composer-ring)_calc(18%*var(--composer-ring-strength)),var(--dt-input))] transition-[border-color] duration-200 ease-out focus-within:border-[color-mix(in_srgb,var(--dt-composer-ring)_calc(45%*var(--composer-ring-strength)),transparent)]',
COMPOSER_DROP_FADE_CLASS,
'group-has-data-[state=open]/composer:border-t-transparent',
dragActive && COMPOSER_DROP_ACTIVE_CLASS
)}
data-slot="composer-surface"
@ -1941,7 +2097,7 @@ export function ChatBar({
: 'grid-cols-[auto_1fr_auto] items-center gap-(--composer-control-gap) [grid-template-areas:"menu_input_controls"]'
)}
>
<div className="flex items-center [grid-area:menu]">{contextMenu}</div>
<div className="flex translate-y-[3px] items-start self-start [grid-area:menu]">{contextMenu}</div>
<div className="min-w-0 [grid-area:input]">{input}</div>
<div className="flex items-center justify-end [grid-area:controls]">{controls}</div>
</div>
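
The dock-glow opacity set in the ChatBar hunk above is a linear ramp from 0.1 to roughly 0.67, multiplied by a CSS custom property. A minimal sketch of that arithmetic follows; the helper name `dockGlowOpacity` and the clamping step are assumptions for illustration and are not part of the diff.

```typescript
// Hypothetical helper mirroring the inline `opacity` style in the diff.
// Assumption: `proximity` is meant to be normalized to [0, 1] (1 = at the dock);
// the clamp below is added for the sketch and is not in the original code.
function dockGlowOpacity(proximity: number): string {
  const clamped = Math.min(1, Math.max(0, proximity))
  // Same ramp as the diff: a 0.1 floor plus up to 0.57 as the composer nears the dock.
  const base = 0.1 + clamped * 0.57
  // The CSS variable scales the glow down in light mode (see styles.css in the diff).
  return `calc(${base} * var(--dock-glow-scale, 1))`
}
```

Keeping the floor above zero means the dock target stays faintly visible as soon as a drag starts, while the scale variable lets each theme dim it without touching the component.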

View file

@ -29,7 +29,15 @@ const PILL = cn(
* `model.options` dropdown (`modelMenuContent`) verbatim; falls back to the
* full picker when the gateway is closed and no live menu exists.
*/
export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatBarState['model'] }) {
export function ModelPill({
compact = false,
disabled,
model
}: {
compact?: boolean
disabled: boolean
model: ChatBarState['model']
}) {
const copy = useI18n().t.shell.statusbar
const currentModel = useStore($currentModel)
const currentProvider = useStore($currentProvider)
@ -40,7 +48,9 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
// The model resolves a beat after the gateway/session comes up. Rather than
// flash a literal "No model", show a quiet loader (inherits the pill text
// color at half opacity) until a model lands.
const label = (
const label = compact ? (
<ChevronDown className="size-3.5 shrink-0 opacity-70" />
) : (
<>
{currentModel.trim() ? (
<span className="truncate">{formatModelStatusLabel(currentModel, { fastMode, reasoningEffort })}</span>
@ -51,13 +61,22 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
</>
)
// Compact (floating composer): a snug square holding just the chevron — no pill
// padding, sized to match the other composer icon buttons.
const pillClass = compact
? cn(
'size-(--composer-control-size) shrink-0 justify-center gap-0 rounded-md p-0',
'text-(--ui-text-tertiary) hover:bg-(--chrome-action-hover) hover:text-foreground'
)
: PILL
const title = currentProvider ? copy.modelTitle(currentProvider, currentModel || copy.modelNone) : copy.switchModel
if (!model.modelMenuContent) {
return (
<Button
aria-label={copy.openModelPicker}
className={PILL}
className={pillClass}
disabled={disabled}
onClick={() => setModelPickerOpen(true)}
title={copy.openModelPicker}
@ -72,7 +91,14 @@ export function ModelPill({ disabled, model }: { disabled: boolean; model: ChatB
return (
<DropdownMenu onOpenChange={setOpen} open={open}>
<DropdownMenuTrigger asChild>
<Button aria-label={title} className={PILL} disabled={disabled} title={title} type="button" variant="ghost">
<Button
aria-label={title}
className={pillClass}
disabled={disabled}
title={title}
type="button"
variant="ghost"
>
{label}
</Button>
</DropdownMenuTrigger>

View file

@ -172,6 +172,60 @@ export function insertPlainTextAtCaret(editor: HTMLElement, text: string) {
}
}
/** Backspace at a collapsed caret immediately after a chip: delete the chip AND
* the single trailing space we auto-insert after it, atomically so removing a
* directive never strands an orphaned space (the contenteditable-driven cleanup
* was unreliable). Returns whether it ran. */
export function deleteChipBeforeCaret(editor: HTMLElement): boolean {
const hit = composerSelectionRange(editor)
if (!hit || !hit.range.collapsed) {
return false
}
const { startContainer, startOffset } = hit.range
let chip: ChildNode | null = null
if (startContainer === editor) {
chip = startOffset > 0 ? editor.childNodes[startOffset - 1] : null
} else if (startContainer.nodeType === Node.TEXT_NODE && startOffset === 0) {
chip = startContainer.previousSibling
}
if (chip?.nodeType !== Node.ELEMENT_NODE || !(chip as HTMLElement).dataset.refText) {
return false
}
const after = chip.nextSibling
chip.remove()
// Drop the auto-inserted trailing space; keep any real following text.
if (after?.nodeType === Node.TEXT_NODE) {
const text = after.textContent ?? ''
if (text === ' ') {
after.remove()
} else if (text.startsWith(' ')) {
after.textContent = text.slice(1)
}
}
const caret = document.createRange()
if (after?.isConnected) {
caret.setStartBefore(after)
} else {
caret.selectNodeContents(editor)
caret.collapse(false)
}
caret.collapse(true)
hit.selection.removeAllRanges()
hit.selection.addRange(caret)
return true
}
/** Remove a non-collapsed selection in-editor. Skips collapsed carets so word/
* line delete (Opt/Cmd+Backspace) stays native. Returns whether anything ran. */
export function deleteSelectionInEditor(editor: HTMLElement) {
@ -242,35 +296,68 @@ export function placeCaretEnd(element: HTMLElement) {
selection?.addRange(range)
}
/** Drop contenteditable junk that serializes as `\n` and falsely expands the composer. */
export function normalizeComposerEditorDom(editor: HTMLElement) {
if (editor.childNodes.length === 1 && editor.firstChild?.nodeName === 'BR') {
editor.replaceChildren()
return
/** Nothing but a break / whitespace (recursively) — i.e. no real text or chip. */
function isBlankNode(node: ChildNode | null): boolean {
if (!node) {
return false
}
if (node.nodeName === 'BR') {
return true
}
if (node.nodeType === Node.TEXT_NODE) {
return !(node.textContent || '').trim()
}
if (node.nodeType === Node.ELEMENT_NODE) {
const el = node as HTMLElement
return !el.dataset.refText && Array.from(el.childNodes).every(isBlankNode)
}
return false
}
/** Drop contenteditable junk that serializes as `\n` and falsely expands the
* composer. Editing around a contenteditable=false chip makes Chromium wrap the
* remainder in stray block <div>s / trailing <br>s, none of which our own
* rendering emits (we use text nodes + <br> + chips). Real <br> line breaks
* (Shift+Enter, which sit after actual text) are preserved. */
export function normalizeComposerEditorDom(editor: HTMLElement) {
// A trailing block wrapper holding only a break/whitespace is the phantom
// "new line" Chromium adds after a chip on backspace — drop it.
const tailBlock = editor.lastChild as HTMLElement | null
if (
tailBlock?.nodeType === Node.ELEMENT_NODE &&
(tailBlock.tagName === 'DIV' || tailBlock.tagName === 'P') &&
isBlankNode(tailBlock)
) {
editor.removeChild(tailBlock)
}
// Unwrap a lone block wrapper back to inline content.
if (editor.childNodes.length === 1 && editor.firstChild?.nodeType === Node.ELEMENT_NODE) {
const wrapper = editor.firstChild as HTMLElement
if (wrapper.tagName === 'DIV' && wrapper.dataset.slot !== RICH_INPUT_SLOT) {
if ((wrapper.tagName === 'DIV' || wrapper.tagName === 'P') && wrapper.dataset.slot !== RICH_INPUT_SLOT) {
editor.replaceChildren(...Array.from(wrapper.childNodes))
}
}
// A trailing <br> right after a chip / only whitespace is a phantom line.
const last = editor.lastChild
if (last?.nodeName !== 'BR') {
return
}
if (last?.nodeName === 'BR') {
let prev: ChildNode | null = last.previousSibling
let prev: ChildNode | null = last.previousSibling
while (prev?.nodeType === Node.TEXT_NODE && !(prev.textContent || '').trim()) {
prev = prev.previousSibling
}
while (prev?.nodeType === Node.TEXT_NODE && !(prev.textContent || '').trim()) {
prev = prev.previousSibling
}
if ((prev as HTMLElement | null)?.dataset.refText) {
editor.removeChild(last)
if (!prev || (prev as HTMLElement).dataset?.refText) {
editor.removeChild(last)
}
}
}
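
The chip-plus-trailing-space rule in `deleteChipBeforeCaret` above can be sketched without a DOM. The model below is an assumption-laden simplification: the editor is a flat array of nodes, the caret is a child offset (the `startContainer === editor` case only), and the names `EditorNode` and `deleteChipBeforeCaretModel` are hypothetical.

```typescript
// DOM-free model of the rule, for illustration only.
type EditorNode = { kind: 'chip'; refText: string } | { kind: 'text'; text: string }

interface DeleteResult {
  nodes: EditorNode[]
  caret: number
  handled: boolean
}

function deleteChipBeforeCaretModel(nodes: EditorNode[], caret: number): DeleteResult {
  // The node immediately before the caret must be a chip, otherwise do nothing.
  const chip = caret > 0 ? nodes[caret - 1] : undefined
  if (!chip || chip.kind !== 'chip') {
    return { nodes, caret, handled: false }
  }

  const next = nodes.slice()
  next.splice(caret - 1, 1) // remove the chip itself

  // Drop the auto-inserted trailing space; keep any real following text.
  const after = next[caret - 1]
  if (after && after.kind === 'text') {
    if (after.text === ' ') {
      next.splice(caret - 1, 1)
    } else if (after.text.startsWith(' ')) {
      next[caret - 1] = { kind: 'text', text: after.text.slice(1) }
    }
  }

  // The caret ends up where the chip used to start.
  return { nodes: next, caret: caret - 1, handled: true }
}
```

Returning `handled` mirrors the boolean contract of the real function: the keydown handler only calls `preventDefault()` when the helper actually consumed the Backspace.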

View file

@ -137,7 +137,7 @@ export function ComposerTriggerPopover({
floating tooltip. */}
<span
className={cn(
'text-[0.8125rem] font-medium leading-snug text-foreground',
'font-medium leading-snug text-foreground',
active ? 'whitespace-normal break-words' : 'truncate'
)}
>
@ -146,7 +146,7 @@ export function ComposerTriggerPopover({
{description && (
<span
className={cn(
'text-[0.6875rem] leading-snug text-(--ui-text-tertiary)',
'leading-snug text-(--ui-text-tertiary)',
active ? 'whitespace-normal break-words' : 'truncate'
)}
>

View file

@ -0,0 +1,92 @@
import { afterEach, describe, expect, it, vi } from 'vitest'
import { $activeSessionId, $selectedStoredSessionId } from '@/store/session'
import { renameSessionPreferringRpc } from './session-actions-menu'
// The branched-session rename bug: a freshly branched session lives only in the
// gateway's runtime _sessions map (no state.db row yet), so REST PATCH
// /api/sessions/{id} 404s with "Session not found". renameSessionPreferringRpc
// must route the ACTIVE row through the session.title RPC (runtime id), which
// persists the row on demand, and otherwise fall back to REST.
const renameSession = vi.fn(async () => ({ ok: true, title: 'rest-title' }))
const request = vi.fn(async () => ({ title: 'rpc-title' }) as never)
const activeGateway = vi.fn<() => { request: typeof request } | null>(() => ({ request }))
vi.mock('@/hermes', () => ({
renameSession: (...args: unknown[]) => renameSession(...(args as [])),
HermesGateway: class {}
}))
vi.mock('@/store/gateway', () => ({
activeGateway: () => activeGateway()
}))
const RUNTIME_ID = 'rt-runtime-1'
const STORED_ID = 'stored-branch-1'
afterEach(() => {
renameSession.mockClear()
request.mockClear()
activeGateway.mockReset()
activeGateway.mockReturnValue({ request })
$activeSessionId.set(null)
$selectedStoredSessionId.set(null)
})
describe('renameSessionPreferringRpc', () => {
it('renames the active branched session via the session.title RPC, not REST', async () => {
$selectedStoredSessionId.set(STORED_ID)
$activeSessionId.set(RUNTIME_ID)
const result = await renameSessionPreferringRpc(STORED_ID, 'My branch')
expect(request).toHaveBeenCalledWith('session.title', { session_id: RUNTIME_ID, title: 'My branch' })
expect(renameSession).not.toHaveBeenCalled()
expect(result.title).toBe('rpc-title')
})
it('falls back to REST when the RPC fails (e.g. socket mid-reconnect)', async () => {
$selectedStoredSessionId.set(STORED_ID)
$activeSessionId.set(RUNTIME_ID)
request.mockRejectedValueOnce(new Error('not connected'))
const result = await renameSessionPreferringRpc(STORED_ID, 'My branch', 'work')
expect(request).toHaveBeenCalledOnce()
expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', 'work')
expect(result.title).toBe('rest-title')
})
it('uses REST for a non-active row (background/persisted session)', async () => {
$selectedStoredSessionId.set('some-other-active-session')
$activeSessionId.set(RUNTIME_ID)
await renameSessionPreferringRpc(STORED_ID, 'My branch', 'work')
expect(request).not.toHaveBeenCalled()
expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', 'work')
})
it('uses REST when clearing the title (RPC rejects empty titles)', async () => {
$selectedStoredSessionId.set(STORED_ID)
$activeSessionId.set(RUNTIME_ID)
await renameSessionPreferringRpc(STORED_ID, '')
expect(request).not.toHaveBeenCalled()
expect(renameSession).toHaveBeenCalledWith(STORED_ID, '', undefined)
})
it('uses REST when no gateway is connected', async () => {
$selectedStoredSessionId.set(STORED_ID)
$activeSessionId.set(RUNTIME_ID)
activeGateway.mockReturnValue(null)
await renameSessionPreferringRpc(STORED_ID, 'My branch')
expect(request).not.toHaveBeenCalled()
expect(renameSession).toHaveBeenCalledWith(STORED_ID, 'My branch', undefined)
})
})

View file

@ -19,10 +19,58 @@ import { renameSession } from '@/hermes'
import { useI18n } from '@/i18n'
import { triggerHaptic } from '@/lib/haptics'
import { exportSession } from '@/lib/session-export'
import { activeGateway } from '@/store/gateway'
import { notify, notifyError } from '@/store/notifications'
import { setSessions } from '@/store/session'
import { $activeSessionId, $selectedStoredSessionId, setSessions } from '@/store/session'
import { canOpenSessionWindow, openSessionInNewWindow } from '@/store/windows'
import type { SessionTitleResponse } from '../../types'
// Rename a session, preferring the gateway's session.title RPC over REST.
//
// A freshly *branched* session (and any brand-new chat) lives only in the
// gateway's in-memory _sessions map keyed by its RUNTIME id — no row is
// persisted to state.db until the first turn. REST PATCH /api/sessions/{id}
// resolves against the stored sessions table, so it 404s ("Session not found")
// on these runtime-only sessions. The session.title RPC resolves the live
// runtime session AND persists the row on demand, so it succeeds where REST
// cannot. This mirrors the /title slash command's fix (use-prompt-actions.ts).
//
// We only take the RPC path for the ACTIVE/selected session: its runtime id is
// known ($activeSessionId) and it lives on the active gateway, so there is no
// profile-routing ambiguity. Every other row (already persisted, possibly on a
// background profile) keeps the REST path, which handles profile scoping. The
// RPC also requires a non-empty title (it rejects clears), so clears stay on
// REST too.
export async function renameSessionPreferringRpc(
storedSessionId: string,
title: string,
profile?: string
): Promise<{ title?: string }> {
const isActiveRow = storedSessionId === $selectedStoredSessionId.get()
const runtimeId = isActiveRow ? $activeSessionId.get() : null
const gateway = activeGateway()
if (title && runtimeId && gateway) {
try {
const result = await gateway.request<SessionTitleResponse>('session.title', {
session_id: runtimeId,
title
})
return { title: result?.title ?? title }
} catch (err) {
// Fall through to REST — e.g. the socket is mid-reconnect. REST still
// works for any session that already has a persisted row. Log so a
// genuine RPC-side failure (which then surfaces a REST 404 for the
// runtime id) is at least diagnosable instead of silently swallowed.
console.warn('session.title RPC rename failed; falling back to REST', err)
}
}
return renameSession(storedSessionId, title, profile)
}
interface SessionActions {
sessionId: string
title: string
@ -235,7 +283,7 @@ function RenameSessionDialog({ open, onOpenChange, sessionId, currentTitle, prof
setSubmitting(true)
try {
const result = await renameSession(sessionId, next, profile)
const result = await renameSessionPreferringRpc(sessionId, next, profile)
const finalTitle = result.title || next || ''
setSessions(prev => prev.map(s => (s.id === sessionId ? { ...s, title: finalTitle || null } : s)))
notify({ durationMs: 2_000, kind: 'success', message: r.renamed })
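
The routing rule inside `renameSessionPreferringRpc` reduces to a small pure decision, sketched below under stated assumptions. `chooseRenameRoute` and `RenameContext` are hypothetical names, and the sketch models only the initial choice: in the real function a failed RPC still falls through to REST.

```typescript
// Illustrative only: the first-attempt route, not the fallback on RPC failure.
type RenameRoute = 'rpc' | 'rest'

interface RenameContext {
  storedSessionId: string
  selectedStoredSessionId: string | null
  activeRuntimeId: string | null
  gatewayConnected: boolean
  title: string
}

function chooseRenameRoute(ctx: RenameContext): RenameRoute {
  // Only the active/selected row has a known runtime id.
  const isActiveRow = ctx.storedSessionId === ctx.selectedStoredSessionId
  const runtimeId = isActiveRow ? ctx.activeRuntimeId : null
  // RPC needs a non-empty title, a runtime id, and a live gateway.
  return ctx.title && runtimeId && ctx.gatewayConnected ? 'rpc' : 'rest'
}
```

The four REST outcomes in the accompanying test file (non-active row, cleared title, no gateway, RPC failure) map onto this: the first three are decided here, the fourth by the `catch` in the real function.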

View file

@ -395,7 +395,7 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
</div>
<div className="flex shrink-0 items-center gap-1.5 whitespace-nowrap">
<Button onClick={() => void runSystemAction('restart')} size="xs" variant="text">
{cc.restartMessaging}
{cc.restartGateway}
</Button>
<Button onClick={() => void runSystemAction('update')} size="xs" variant="textStrong">
{cc.updateHermes}
@ -426,7 +426,10 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
</span>
)}
</div>
<pre className="min-h-0 flex-1 overflow-auto whitespace-pre-wrap wrap-break-word rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-3 font-mono text-[0.65rem] leading-relaxed text-(--ui-text-tertiary)">
<pre
className="min-h-0 flex-1 overflow-auto whitespace-pre-wrap wrap-break-word rounded-lg border border-(--ui-stroke-tertiary) bg-(--ui-bg-quinary) p-3 font-mono text-[0.65rem] leading-relaxed text-(--ui-text-tertiary)"
data-selectable-text="true"
>
{logs.length ? logs.join('\n') : cc.noLogs}
</pre>
</div>

View file

@ -31,6 +31,7 @@ import {
Palette,
PawPrint,
Plus,
RefreshCw,
Settings,
Settings2,
Sun,
@ -42,6 +43,7 @@ import {
import { cn } from '@/lib/utils'
import { $commandPaletteOpen, $commandPalettePage, closeCommandPalette, setCommandPaletteOpen } from '@/store/command-palette'
import { $bindings } from '@/store/keybinds'
import { runGatewayRestart } from '@/store/system-actions'
import { luminance } from '@/themes/color'
import { type ThemeMode, useTheme } from '@/themes/context'
import { isUserTheme, resolveTheme } from '@/themes/user-themes'
@ -371,6 +373,13 @@ export function CommandPalette() {
keywords: ['command center', 'usage', 'tokens', 'cost'],
label: cc.sections.usage,
run: go(`${COMMAND_CENTER_ROUTE}?section=usage`)
},
{
icon: RefreshCw,
id: 'cc-restart-gateway',
keywords: ['gateway', 'restart', 'messaging', 'reconnect', 'system'],
label: cc.restartGateway,
run: () => void runGatewayRestart()
}
]
},

View file

@ -8,12 +8,14 @@ import { DesktopInstallOverlay } from '@/components/desktop-install-overlay'
import { DesktopOnboardingOverlay } from '@/components/desktop-onboarding-overlay'
import { GatewayConnectingOverlay } from '@/components/gateway-connecting-overlay'
import { Pane, PaneMain } from '@/components/pane-shell'
import { RemoteDisplayBanner } from '@/components/remote-display-banner'
import { useMediaQuery } from '@/hooks/use-media-query'
import { useSkinCommand } from '@/themes/use-skin-command'
import { formatRefValue } from '../components/assistant-ui/directive-text'
import { getCronJobs, getSessionMessages, listAllProfileSessions, type SessionInfo, triggerCronJob } from '../hermes'
import { type ChatMessage, chatMessageText, preserveLocalAssistantErrors, toChatMessages } from '../lib/chat-messages'
import { storedSessionIdForNotification } from '../lib/session-ids'
import {
isMessagingSource,
LOCAL_SESSION_SOURCE_IDS,
@ -279,16 +281,20 @@ export function DesktopController() {
}
}, [])
// Notification click: the main process already focused the window; jump to its session.
// Notification click: the main process already focused the window; jump to its
// session. Notifications are tagged with the gateway *runtime* session id, but
// the chat route is keyed by the *stored* id — navigating with the runtime id
// resumes a non-existent stored session ("session not found") and strands the
// user. Translate runtime -> stored before navigating.
useEffect(() => {
const unsubscribe = window.hermesDesktop?.onFocusSession?.(sessionId => {
if (sessionId) {
navigate(sessionRoute(sessionId))
navigate(sessionRoute(storedSessionIdForNotification(sessionId, runtimeIdByStoredSessionIdRef.current)))
}
})
return () => unsubscribe?.()
}, [navigate])
}, [navigate, runtimeIdByStoredSessionIdRef])
// Notification action button (Approve/Reject) — resolve in place, no navigation.
useEffect(() => {
@ -1001,6 +1007,7 @@ export function DesktopController() {
const overlays = (
<>
<RemoteDisplayBanner />
{!isSecondaryWindow() && <DesktopInstallOverlay />}
{!isSecondaryWindow() && (
<DesktopOnboardingOverlay
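
The runtime-to-stored translation used by the notification-click handler above is not shown in this diff. A minimal sketch of what such a lookup could look like follows, assuming the ref holds a map from stored id to runtime id; the function name and the `ReadonlyMap` shape are assumptions, not the actual `storedSessionIdForNotification` implementation.

```typescript
// Hypothetical reverse lookup: find the stored id whose runtime id matches the
// id carried by the notification; otherwise pass the id through unchanged.
function storedIdForNotificationSketch(
  sessionId: string,
  runtimeIdByStoredId: ReadonlyMap<string, string>
): string {
  let stored = sessionId
  runtimeIdByStoredId.forEach((runtimeId, storedId) => {
    if (runtimeId === sessionId) {
      stored = storedId
    }
  })
  return stored
}
```

Falling back to the input id keeps navigation working for notifications that already carry a stored id, or for sessions the map has not seen.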

View file

@ -17,6 +17,7 @@ import { type Translations, useI18n } from '@/i18n'
import { AlertTriangle, ExternalLink, Save, Trash2 } from '@/lib/icons'
import { cn } from '@/lib/utils'
import { notify, notifyError } from '@/store/notifications'
import { runGatewayRestart } from '@/store/system-actions'
import { useRefreshHotkey } from '../hooks/use-refresh-hotkey'
import { useRouteEnumParam } from '../hooks/use-route-enum-param'
@ -97,6 +98,8 @@ function fieldCopy(field: MessagingEnvVarInfo, m: Translations['messaging']) {
export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, ...props }: MessagingViewProps) {
const { t } = useI18n()
const m = t.messaging
// Both save/toggle toasts offer the same one-click restart.
const restartGatewayAction = { label: t.commandCenter.restartGateway, onClick: () => void runGatewayRestart() }
const [platforms, setPlatforms] = useState<MessagingPlatformInfo[] | null>(null)
const [edits, setEdits] = useState<EditMap>({})
const [query, setQuery] = useState('')
@ -197,7 +200,8 @@ export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, .
notify({
kind: 'success',
title: enabled ? m.platformEnabled(platform.name) : m.platformDisabled(platform.name),
message: m.restartToApply
message: m.restartToApply,
action: restartGatewayAction
})
} catch (err) {
notifyError(err, m.failedUpdate(platform.name))
@ -222,7 +226,8 @@ export function MessagingView({ setStatusbarItemGroup: _setStatusbarItemGroup, .
notify({
kind: 'success',
title: m.setupSaved(platform.name),
message: m.restartToReconnect
message: m.restartToReconnect,
action: restartGatewayAction
})
} catch (err) {
notifyError(err, m.failedSave(platform.name))

View file

@ -173,6 +173,7 @@ function FilesystemTab({
disabled={!hasCwd || loading}
onClick={onRefresh}
size="icon-xs"
title={r.refreshTree}
variant="ghost"
>
<Codicon name="refresh" size="0.8125rem" spinning={loading} />
@ -182,6 +183,7 @@ function FilesystemTab({
className={HEADER_ACTION_CLASS}
onClick={() => void onChangeFolder()}
size="icon-xs"
title={r.openFolder}
variant="ghost"
>
<Codicon name="folder-opened" size="0.8125rem" />
@ -192,6 +194,7 @@ function FilesystemTab({
disabled={!hasCwd || !canCollapse}
onClick={onCollapseAll}
size="icon-xs"
title={r.collapseAll}
variant="ghost"
>
<Codicon name="collapse-all" size="0.8125rem" />

View file

@ -205,6 +205,67 @@ describe('usePromptActions /title', () => {
})
})
describe('usePromptActions slash.exec dispatch payloads', () => {
afterEach(() => {
cleanup()
$busy.set(false)
vi.restoreAllMocks()
})
it('submits /goal send directives returned directly by slash.exec instead of rendering no output', async () => {
const calls: { method: string; params?: Record<string, unknown> }[] = []
const states: Record<string, unknown>[] = []
const requestGateway = vi.fn(async (method: string, params?: Record<string, unknown>) => {
calls.push({ method, params })
if (method === 'slash.exec') {
return {
type: 'send',
notice: '⊙ Goal set. Starting now.',
message: 'write the implementation plan'
} as never
}
return {} as never
})
let handle: HarnessHandle | null = null
render(
<Harness
onReady={h => (handle = h)}
onSeedState={s => states.push(s)}
refreshSessions={async () => undefined}
requestGateway={requestGateway}
/>
)
await handle!.submitText('/goal write the implementation plan')
expect(calls.map(c => c.method)).toEqual(['slash.exec', 'prompt.submit'])
expect(calls[0]?.params).toEqual({
command: 'goal write the implementation plan',
session_id: RUNTIME_SESSION_ID
})
expect(calls[1]?.params).toEqual({
session_id: RUNTIME_SESSION_ID,
text: 'write the implementation plan'
})
const renderedText = states
.flatMap(state => {
const messages = Array.isArray(state.messages)
? (state.messages as Array<{ parts?: Array<{ text?: string }> }>)
: []
return messages.flatMap(message => (message.parts ?? []).map(part => part.text ?? ''))
})
.join('\n')
expect(renderedText).toContain('⊙ Goal set. Starting now.')
expect(renderedText).not.toContain('/goal: no output')
})
})
describe('usePromptActions desktop slash pickers', () => {
beforeEach(() => {
setSessions(() => [sessionInfo({ id: '20260610_120000_abcdef', title: 'Loaded session' })])

View file

@ -33,6 +33,7 @@ import {
clearComposerAttachments,
type ComposerAttachment,
setComposerAttachmentUploadState,
setComposerDraft,
terminalContextBlocksFromDraft,
updateComposerAttachment
} from '@/store/composer'
@ -916,31 +917,7 @@ export function usePromptActions({
return
}
try {
const result = await requestGateway<SlashExecResponse>('slash.exec', {
session_id: sessionId,
command: command.replace(/^\/+/, '')
})
const body = result?.output || `/${name}: no output`
renderSlashOutput(result?.warning ? `warning: ${result.warning}\n${body}` : body)
return
} catch {
// Fall back to command.dispatch for skill/send/alias directives.
}
try {
const dispatch = parseCommandDispatch(
await requestGateway<unknown>('command.dispatch', { session_id: sessionId, name, arg })
)
if (!dispatch) {
renderSlashOutput('error: invalid response: command.dispatch')
return
}
const handleDispatch = async (dispatch: NonNullable<ReturnType<typeof parseCommandDispatch>>): Promise<void> => {
if (dispatch.type === 'exec' || dispatch.type === 'plugin') {
renderSlashOutput(dispatch.output ?? '(no output)')
@ -953,8 +930,26 @@ export function usePromptActions({
return
}
// send / prefill carry an optional `notice` (e.g. "⊙ Goal set …")
// that the backend wants shown as a system line before the message
// is acted on. Mirrors the TUI's createSlashHandler — without it a
// `/goal <text>` looked like it did nothing.
if ((dispatch.type === 'send' || dispatch.type === 'prefill') && dispatch.notice?.trim()) {
renderSlashOutput(dispatch.notice.trim())
}
const message = ('message' in dispatch ? dispatch.message : '')?.trim() ?? ''
// /undo returns a prefill directive: drop the backed-up message into
// the composer for editing instead of submitting it immediately.
if (dispatch.type === 'prefill') {
if (message) {
setComposerDraft(message)
}
return
}
if (!message) {
renderSlashOutput(
`/${name}: ${dispatch.type === 'skill' ? 'skill payload missing message' : 'empty message'}`
@ -974,6 +969,43 @@ export function usePromptActions({
}
await submitPromptText(message)
}
try {
const result = await requestGateway<unknown>('slash.exec', {
session_id: sessionId,
command: command.replace(/^\/+/, '')
})
const dispatch = parseCommandDispatch(result)
if (dispatch) {
await handleDispatch(dispatch)
return
}
const output = result && typeof result === 'object' ? (result as SlashExecResponse) : null
const body = output?.output || `/${name}: no output`
renderSlashOutput(output?.warning ? `warning: ${output.warning}\n${body}` : body)
return
} catch {
// Fall back to command.dispatch for skill/send/alias directives.
}
try {
const dispatch = parseCommandDispatch(
await requestGateway<unknown>('command.dispatch', { session_id: sessionId, name, arg })
)
if (!dispatch) {
renderSlashOutput('error: invalid response: command.dispatch')
return
}
await handleDispatch(dispatch)
} catch (err) {
renderSlashOutput(`error: ${err instanceof Error ? err.message : String(err)}`)
}
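
The reordered flow above first checks whether the `slash.exec` result is itself a dispatch directive and only then treats it as plain output. The sketch below models that decision as a pure function returning the actions to perform. The types are simplified assumptions: only `send` and `prefill` directives are modeled, and `planSlashResult`, `SlashResult`, and `Action` are hypothetical names rather than the real `parseCommandDispatch` shapes.

```typescript
// Simplified, illustrative model of the slash.exec handling order.
type Action =
  | { kind: 'render'; text: string }
  | { kind: 'prefill'; text: string }
  | { kind: 'submit'; text: string }

interface SlashResult {
  type?: string
  output?: string
  warning?: string
  notice?: string
  message?: string
}

function planSlashResult(name: string, result: SlashResult | null): Action[] {
  if (result && (result.type === 'send' || result.type === 'prefill')) {
    const actions: Action[] = []
    // Show the backend's notice (e.g. "Goal set") before acting on the message.
    const notice = (result.notice ?? '').trim()
    if (notice) {
      actions.push({ kind: 'render', text: notice })
    }
    const message = (result.message ?? '').trim()
    if (result.type === 'prefill') {
      // Prefill drops the text into the composer instead of submitting it.
      if (message) {
        actions.push({ kind: 'prefill', text: message })
      }
      return actions
    }
    if (message) {
      actions.push({ kind: 'submit', text: message })
    } else {
      actions.push({ kind: 'render', text: `/${name}: empty message` })
    }
    return actions
  }
  // Not a directive: render the plain output, prefixed by any warning.
  const body = (result && result.output) || `/${name}: no output`
  const text = result && result.warning ? `warning: ${result.warning}\n${body}` : body
  return [{ kind: 'render', text }]
}
```

Checking for a directive before reading `output` is what keeps a `/goal <text>` response from rendering as "no output", which is the regression the new test in this diff guards against.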

View file

@ -13,7 +13,8 @@ import {
$updateStatus,
checkUpdates,
openUpdatesWindow,
refreshDesktopVersion
refreshDesktopVersion,
startActiveUpdate
} from '@/store/updates'
import { ListRow, SectionHeading, SettingsContent } from './primitives'
@ -141,9 +142,14 @@ export function AboutSettings() {
</Button>
{behind > 0 && supported && !applying && (
<Button onClick={() => openUpdatesWindow()} size="sm">
{a.seeWhatsNew}
</Button>
<>
<Button onClick={() => startActiveUpdate()} size="sm">
{a.updateNow}
</Button>
<Button onClick={() => openUpdatesWindow()} size="sm" variant="textStrong">
{a.seeWhatsNew}
</Button>
</>
)}
<Button asChild className="ml-auto" size="sm" variant="text">

View file

@ -74,7 +74,6 @@ export const PROVIDER_GROUPS: ProviderPrefix[] = [
priority: 4
},
{ prefix: 'GEMINI_', name: 'Gemini', priority: 4 },
{ prefix: 'HERMES_GEMINI_', name: 'Gemini', priority: 4 },
{
prefix: 'DEEPSEEK_',
name: 'DeepSeek',

View file

@ -132,9 +132,9 @@ describe('settings helpers', () => {
// KIMI_CN_ likewise must beat KIMI_.
expect(providerGroup('KIMI_CN_API_KEY')).toBe('Kimi (China)')
expect(providerGroup('KIMI_API_KEY')).toBe('Kimi / Moonshot')
// HERMES_QWEN_ and HERMES_GEMINI_ both share the HERMES_ stem.
// HERMES_QWEN_ shares the HERMES_ stem with other integrations.
expect(providerGroup('HERMES_QWEN_BASE_URL')).toBe('DashScope (Qwen)')
expect(providerGroup('HERMES_GEMINI_CLIENT_ID')).toBe('Gemini')
expect(providerGroup('GEMINI_API_KEY')).toBe('Gemini')
})
it('falls back to "Other" for un-grouped env vars', () => {

View file

@ -2,7 +2,7 @@ import { cleanup, fireEvent, render, screen, waitFor } from '@testing-library/re
import { atom } from 'nanostores'
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'
import type { OAuthProvider } from '@/types/hermes'
import type { EnvVarInfo, OAuthProvider } from '@/types/hermes'
const listOAuthProviders = vi.fn()
const disconnectOAuthProvider = vi.fn()
@ -36,6 +36,25 @@ function provider(id: string, loggedIn: boolean, patch: Partial<OAuthProvider> =
}
}
// One `/api/env` row (an EnvVarInfo) for the API-keys view. Mirrors the
// `provider()` factory above: a valid base + per-test overrides, typed against
// the real response shape so it can't drift from EnvVarInfo.
function keyVar(patch: Partial<EnvVarInfo> = {}): EnvVarInfo {
return {
advanced: false,
category: 'provider',
description: '',
is_password: true,
is_set: false,
provider: '',
provider_label: '',
redacted_value: null,
tools: [],
url: '',
...patch
}
}
beforeEach(() => {
onboarding.set({ manual: false })
getEnvVars.mockResolvedValue({})
@ -97,4 +116,56 @@ describe('ProvidersSettings', () => {
expect(screen.queryByRole('button', { name: 'Remove Qwen Code' })).toBeNull()
expect(screen.getByText(/managed by its own CLI/)).toBeTruthy()
})
it('renders a Keys card for a backend-tagged provider with no PROVIDER_GROUPS prefix', async () => {
// A provider the backend catalog tags (provider/provider_label) but that has
// no desktop PROVIDER_GROUPS prefix row must still render its own card —
// this is the GUI/CLI drift fix: membership comes from the backend, not
// from the hand-maintained prefix list.
getEnvVars.mockResolvedValue({
WIDGETAI_API_KEY: keyVar({
provider: 'widgetai',
provider_label: 'WidgetAI',
url: 'https://widgetai.example/keys'
})
})
listOAuthProviders.mockResolvedValue({ providers: [] })
const { ProvidersSettings } = await import('./providers-settings')
render(<ProvidersSettings onClose={vi.fn()} onViewChange={vi.fn()} view="keys" />)
expect(await screen.findByText('WidgetAI')).toBeTruthy()
})
it('orders API-key providers by priority then name, and filters them via search', async () => {
// These three providers have no curated PROVIDER_GROUPS priority, so they
// share the default priority and fall back to alphabetical among themselves
// (Acme, Middle, Zebra) — exercising the name tiebreak of the priority sort.
getEnvVars.mockResolvedValue({
ZEBRA_API_KEY: keyVar({ provider: 'zebra', provider_label: 'Zebra' }),
ACME_API_KEY: keyVar({ provider: 'acme', provider_label: 'Acme' }),
MIDDLE_API_KEY: keyVar({ provider: 'middle', provider_label: 'Middle' })
})
listOAuthProviders.mockResolvedValue({ providers: [] })
const { ProvidersSettings } = await import('./providers-settings')
render(<ProvidersSettings onClose={vi.fn()} onViewChange={vi.fn()} view="keys" />)
// Equal priority → alphabetical tiebreak: Acme, Middle, Zebra.
await screen.findByText('Acme')
const labels = screen.getAllByText(/Acme|Middle|Zebra/).map(el => el.textContent)
expect(labels).toEqual(['Acme', 'Middle', 'Zebra'])
// Typing narrows the list to matching providers only.
const search = screen.getByPlaceholderText('Search providers…')
fireEvent.change(search, { target: { value: 'mid' } })
await waitFor(() => expect(screen.queryByText('Acme')).toBeNull())
expect(screen.getByText('Middle')).toBeTruthy()
expect(screen.queryByText('Zebra')).toBeNull()
// A non-matching query shows the empty-state copy.
fireEvent.change(search, { target: { value: 'nonesuch-xyz' } })
expect(await screen.findByText('No providers match your search.')).toBeTruthy()
})
})
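
Review note: the ordering test above relies on a "priority, then name" comparator whose implementation is not part of this hunk. A minimal sketch of such a comparator follows. `KeyGroupLike`, `DEFAULT_PRIORITY`, and `sortKeyGroups` are hypothetical names, and the "lower number sorts first" convention is an assumption, not the project's confirmed behavior.

```typescript
// Hypothetical shape: the real ProviderKeyGroup and its priority source are not shown in this diff.
interface KeyGroupLike {
  name: string
  // Assumption: lower values sort first; undefined means "no curated priority".
  priority?: number
}

// Assumed default shared by every group without a curated priority.
const DEFAULT_PRIORITY = 100

// Sort by priority first, then alphabetically by name as the tiebreak.
function sortKeyGroups<T extends KeyGroupLike>(groups: readonly T[]): T[] {
  return [...groups].sort((a, b) => {
    const byPriority = (a.priority ?? DEFAULT_PRIORITY) - (b.priority ?? DEFAULT_PRIORITY)

    return byPriority !== 0 ? byPriority : a.name.localeCompare(b.name)
  })
}

// Three groups with equal (default) priority fall back to name order.
const sortedNames = sortKeyGroups([{ name: 'Zebra' }, { name: 'Acme' }, { name: 'Middle' }]).map(g => g.name)
```

Copying the input before sorting keeps the caller's array untouched, which matters when the same list feeds both the rendered cards and the search filter.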

View file

@ -12,6 +12,7 @@ import {
sortProviders
} from '@/components/desktop-onboarding-overlay'
import { Button } from '@/components/ui/button'
import { SearchField } from '@/components/ui/search-field'
import { disconnectOAuthProvider, listOAuthProviders } from '@/hermes'
import { useI18n } from '@/i18n'
import { Check, ChevronDown, ChevronRight, KeyRound, Loader2, Terminal, Trash2 } from '@/lib/icons'
@ -45,8 +46,17 @@ export const PROVIDER_VIEWS = ['accounts', 'keys'] as const
export type ProviderView = (typeof PROVIDER_VIEWS)[number]
// Group the env catalog by provider — one ListRow per vendor plus optional
// advanced overrides (base URL, region, etc.). Groups without a key field and
// the "Other" bucket are skipped.
// advanced overrides (base URL, region, etc.). Groups without a key field are
// skipped.
//
// Grouping key precedence:
// 1. Backend `provider_label` / `provider` (from the unified provider catalog
// in hermes_cli/provider_catalog.py) — the SAME provider identity
// `hermes model` uses. This is authoritative: a provider tagged by the
// backend always renders a card, even with no PROVIDER_GROUPS row.
// 2. Desktop prefix match (`providerGroup`) — legacy fallback for provider
// env vars that predate the backend tagging.
// Only entries that resolve to neither (the "Other" bucket) are skipped.
function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGroup[] {
const buckets = new Map<string, [string, EnvVarInfo][]>()
@ -55,7 +65,9 @@ function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGr
continue
}
const name = providerGroup(key)
// Prefer the backend-supplied provider label/id so the Keys tab groups by
// the same identity the CLI picker uses; fall back to the prefix guess.
const name = info.provider_label?.trim() || info.provider?.trim() || providerGroup(key)
if (name === 'Other') {
continue
@ -73,6 +85,9 @@ function buildProviderKeyGroups(vars: Record<string, EnvVarInfo>): ProviderKeyGr
continue
}
// Presentation overlay (priority, blurb, docs) is keyed by the prefix-based
// group name; when the backend introduced this provider it may have no
// overlay entry, so fall back to the backend/env metadata for display.
const meta = providerMeta(name)
groups.push({
@ -131,6 +146,7 @@ function OAuthPicker({
const rest = featured ? ordered.filter(p => p.id !== FEATURED_ID) : ordered
// Keep connected accounts grouped and always visible; only the unconnected
// providers hide behind the disclosure, so the page leads with what's set up.
// Both lists preserve `sortProviders` order (curated priority, then name).
const connected = rest.filter(p => p.status?.logged_in)
const others = rest.filter(p => !p.status?.logged_in)
const collapsible = others.length > 0
@ -284,6 +300,8 @@ export function ProvidersSettings({ onClose, onViewChange, view }: ProvidersSett
const [oauthProviders, setOauthProviders] = useState<OAuthProvider[]>([])
const [openProvider, setOpenProvider] = useState<null | string>(null)
const [disconnecting, setDisconnecting] = useState<null | string>(null)
// Free-text filter for the API-keys view (provider name / env-var key / desc).
const [keyQuery, setKeyQuery] = useState('')
// The onboarding overlay owns the OAuth flow. Watch its `manual` flag so we
// re-read connection state when the user finishes (or dismisses) a sign-in
// they launched from this page — otherwise the cards keep their stale status.
@ -372,20 +390,49 @@ export function ProvidersSettings({ onClose, onViewChange, view }: ProvidersSett
const keyGroups = buildProviderKeyGroups(vars)
if (showApiKeys) {
const q = keyQuery.trim().toLowerCase()
const visibleGroups = q
? keyGroups.filter(group => {
const haystack = [
group.name,
group.description ?? '',
group.primary[0],
...group.advanced.map(([k]) => k)
]
return haystack.some(s => s.toLowerCase().includes(q))
})
: keyGroups
return (
<SettingsContent>
{keyGroups.length > 0 ? (
<div className="grid gap-2">
{keyGroups.map(group => (
<ProviderKeyRows
expanded={openProvider === group.name}
group={group}
key={group.name}
onExpand={() => setOpenProvider(group.name)}
onToggle={() => setOpenProvider(prev => (prev === group.name ? null : group.name))}
rowProps={rowProps}
/>
))}
<div className="grid gap-3">
<SearchField
aria-label={t.settings.providers.searchKeys}
containerClassName="w-full"
onChange={setKeyQuery}
placeholder={t.settings.providers.searchKeys}
value={keyQuery}
/>
{visibleGroups.length > 0 ? (
<div className="grid gap-2">
{visibleGroups.map(group => (
<ProviderKeyRows
expanded={openProvider === group.name}
group={group}
key={group.name}
onExpand={() => setOpenProvider(group.name)}
onToggle={() => setOpenProvider(prev => (prev === group.name ? null : group.name))}
rowProps={rowProps}
/>
))}
</div>
) : (
<div className="grid min-h-24 place-items-center px-4 py-6 text-center text-[length:var(--conversation-caption-font-size)] text-muted-foreground">
{t.settings.providers.noKeysMatch}
</div>
)}
</div>
) : (
<NoProviderKeys />
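
Review note: the two behaviors this file gains (group-name precedence and free-text filtering) can be sketched in isolation as below. `EnvVarLike`, `SearchableGroup`, `resolveGroupName`, and `filterGroups` are hypothetical stand-ins for illustration; the real `EnvVarInfo` and `ProviderKeyGroup` types carry more fields than shown here.

```typescript
// Hypothetical, trimmed-down stand-in for EnvVarInfo.
interface EnvVarLike {
  provider?: string
  provider_label?: string
}

// Precedence from the hunk: backend label, then backend id, then the prefix guess.
function resolveGroupName(key: string, info: EnvVarLike, prefixGuess: (key: string) => string): string {
  return info.provider_label?.trim() || info.provider?.trim() || prefixGuess(key)
}

// Hypothetical, flattened stand-in for ProviderKeyGroup.
interface SearchableGroup {
  name: string
  description?: string
  primaryKey: string
  advancedKeys: string[]
}

// Case-insensitive substring match over name, description, and env-var keys.
function filterGroups(groups: readonly SearchableGroup[], query: string): SearchableGroup[] {
  const q = query.trim().toLowerCase()

  if (!q) {
    return [...groups]
  }

  return groups.filter(group => {
    const haystack = [group.name, group.description ?? '', group.primaryKey, ...group.advancedKeys]

    return haystack.some(s => s.toLowerCase().includes(q))
  })
}

const sampleGroups: SearchableGroup[] = [
  { name: 'Acme', primaryKey: 'ACME_API_KEY', advancedKeys: [] },
  { name: 'Middle', primaryKey: 'MIDDLE_API_KEY', advancedKeys: ['MIDDLE_BASE_URL'] }
]
const matchedNames = filterGroups(sampleGroups, ' Mid ').map(g => g.name)
```

Because a whitespace-only label trims to an empty string, `||` lets it fall through to the next candidate, which is why the hunk uses `||` rather than `??`.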

View file

@ -272,7 +272,10 @@ function PostSetupRunner({ toolset, postSetupKey, onComplete }: PostSetupRunnerP
</div>
{status && (status.lines.length > 0 || status.running) && (
<pre className="max-h-48 overflow-y-auto rounded-md bg-background px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground whitespace-pre-wrap">
<pre
className="max-h-48 overflow-y-auto rounded-md bg-background px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground whitespace-pre-wrap"
data-selectable-text="true"
>
{status.lines.length > 0 ? status.lines.join('\n') : copy.postSetupStarting}
</pre>
)}

View file

@ -4,6 +4,7 @@ import { useCallback, useMemo } from 'react'
import type { CommandCenterSection } from '@/app/command-center'
import { $terminalTakeover, setTerminalTakeover } from '@/app/right-sidebar/store'
import { GatewayMenuPanel } from '@/app/shell/gateway-menu-panel'
import { GlyphSpinner } from '@/components/ui/glyph-spinner'
import { useI18n } from '@/i18n'
import {
Activity,
@ -35,6 +36,7 @@ import {
setYoloActive
} from '@/store/session'
import { $subagentsBySession, activeSubagentCount } from '@/store/subagents'
import { $gatewayRestarting } from '@/store/system-actions'
import {
$backendUpdateApply,
$backendUpdateStatus,
@ -89,6 +91,7 @@ export function useStatusbarItems({
const busy = useStore($busy)
const currentUsage = useStore($currentUsage)
const desktopActionTasks = useStore($desktopActionTasks)
const gatewayRestarting = useStore($gatewayRestarting)
const previewServerRestartStatus = useStore($previewServerRestartStatus)
const sessionStartedAt = useStore($sessionStartedAt)
const turnStartedAt = useStore($turnStartedAt)
@ -299,9 +302,15 @@ export function useStatusbarItems({
variant: 'action'
},
{
className: gatewayClassName,
detail: gatewayDetail,
icon: inferenceReady ? <Activity className="size-3" /> : <AlertCircle className="size-3" />,
className: gatewayRestarting ? undefined : gatewayClassName,
detail: gatewayRestarting ? copy.gatewayRestarting : gatewayDetail,
icon: gatewayRestarting ? (
<GlyphSpinner ariaLabel={copy.gatewayRestarting} className="size-3" />
) : inferenceReady ? (
<Activity className="size-3" />
) : (
<AlertCircle className="size-3" />
),
id: 'gateway-health',
label: copy.gateway,
menuClassName: 'w-72',
@ -354,6 +363,7 @@ export function useStatusbarItems({
gatewayMenuContent,
gatewayClassName,
gatewayDetail,
gatewayRestarting,
inferenceReady,
inferenceStatus?.reason,
openAgents,

View file

@ -1,5 +1,5 @@
import { useStore } from '@nanostores/react'
import { useQuery } from '@tanstack/react-query'
import { useQuery, useQueryClient } from '@tanstack/react-query'
import { createContext, useContext, useMemo, useState } from 'react'
import { Codicon } from '@/components/ui/codicon'
@ -62,6 +62,8 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
const copy = t.shell.modelMenu
const closeMenu = useContext(ModelMenuCloseContext)
const [search, setSearch] = useState('')
const [refreshing, setRefreshing] = useState(false)
const queryClient = useQueryClient()
// Reactive session state is read from the stores here (not drilled in), so
// toggling effort/fast/model re-renders this panel in place without forcing
// the parent to rebuild the menu content (which would close the dropdown).
@ -110,6 +112,38 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
// next session.create (see selectModel). The default lives in Settings → Model.
const switchTo = (model: string, provider: string) => onSelectModel({ model, provider })
// Explicit "Refresh Models": re-fetch the catalog with refresh:true so the
// backend busts its 1h provider-model disk cache and re-pulls each provider's
// live list. Fixes live-only models (e.g. OpenCode Zen free tier) vanishing
// when the cache expires and falls back to the curated static list.
const refreshModels = async () => {
if (refreshing) {
return
}
setRefreshing(true)
try {
const queryKey = ['model-options', activeSessionId || 'global']
const next =
gateway && activeSessionId
? await gateway.request<ModelOptionsResponse>('model.options', {
session_id: activeSessionId,
refresh: true
})
: await getGlobalModelOptions({ refresh: true })
queryClient.setQueryData<ModelOptionsResponse>(queryKey, next)
} catch {
// Network/backend hiccup — fall back to a plain invalidate so the next
// open re-fetches (still cached, but no worse than before).
void queryClient.invalidateQueries({ queryKey: ['model-options'] })
} finally {
setRefreshing(false)
}
}
// Selecting a model row restores that model's remembered preset onto the
// session (effort/fast), gated by capability. Unset → Hermes defaults.
const selectFamily = async (family: ModelFamily, provider: ModelOptionProvider) => {
@ -173,7 +207,7 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
{copy.noModels}
</DropdownMenuItem>
) : (
<div className="max-h-80 overflow-y-auto py-0.5">
<div className="max-h-[max(150px,30dvh)] overflow-y-auto py-0.5">
{groups.map(group => (
<DropdownMenuGroup className="py-0.5" key={group.provider.slug}>
<DropdownMenuLabel className={dropdownMenuSectionLabel}>{group.provider.name}</DropdownMenuLabel>
@ -268,10 +302,23 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
<DropdownMenuSeparator className="mx-0" />
<DropdownMenuItem
className={cn(dropdownMenuRow, 'text-(--ui-text-tertiary)')}
disabled={refreshing}
onSelect={event => {
event.preventDefault()
void refreshModels()
}}
>
<Codicon className={cn(refreshing && 'animate-spin')} name="sync" size="0.75rem" />
{copy.refreshModels}
</DropdownMenuItem>
<DropdownMenuItem
className={cn(dropdownMenuRow, 'text-(--ui-text-tertiary)')}
onSelect={() => setModelVisibilityOpen(true)}
>
<Codicon name="settings-gear" size="0.75rem" />
{copy.editModels}
</DropdownMenuItem>
</>

View file

@ -106,6 +106,13 @@ export interface SkillCommandDispatchResponse {
export interface SendCommandDispatchResponse {
type: 'send'
message: string
notice?: string
}
export interface PrefillCommandDispatchResponse {
type: 'prefill'
message: string
notice?: string
}
export type CommandDispatchResponse =
@ -113,6 +120,7 @@ export type CommandDispatchResponse =
| AliasCommandDispatchResponse
| SkillCommandDispatchResponse
| SendCommandDispatchResponse
| PrefillCommandDispatchResponse
export type SidebarNavId = 'artifacts' | 'command-center' | 'messaging' | 'new-session' | 'settings' | 'skills'
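
Review note: adding `PrefillCommandDispatchResponse` to the union means every consumer that switches on `type` needs a new branch. The sketch below shows one way to keep that exhaustive. It assumes, based only on the names, that `'send'` submits the message and `'prefill'` places it in the composer without submitting. `ComposerAction` and `toComposerAction` are hypothetical and do not appear in this diff.

```typescript
// Local copies of the two response shapes from the hunk above.
interface SendResponse {
  type: 'send'
  message: string
  notice?: string
}

interface PrefillResponse {
  type: 'prefill'
  message: string
  notice?: string
}

type DispatchLike = SendResponse | PrefillResponse

// Hypothetical result type: what the composer should do with the response.
interface ComposerAction {
  draft: string
  submit: boolean
  notice: string | null
}

function toComposerAction(response: DispatchLike): ComposerAction {
  switch (response.type) {
    case 'send':
      // Assumption: 'send' submits the message immediately.
      return { draft: response.message, notice: response.notice ?? null, submit: true }
    case 'prefill':
      // Assumption: 'prefill' only fills the composer and waits for the user.
      return { draft: response.message, notice: response.notice ?? null, submit: false }
    default: {
      // Compile-time exhaustiveness check: adding a union member without a case fails here.
      const unreachable: never = response

      throw new Error(`Unhandled dispatch type: ${String(unreachable)}`)
    }
  }
}

const prefillAction = toComposerAction({ message: '/review src/', type: 'prefill' })
```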

View file

@ -61,14 +61,16 @@ export function UpdatesOverlay() {
const behind = status?.behind ?? 0
const phase: 'idle' | 'applying' | 'manual' | 'error' =
const phase: 'idle' | 'applying' | 'manual' | 'guiSkew' | 'error' =
apply.stage === 'manual'
? 'manual'
: apply.applying || apply.stage === 'restart'
? 'applying'
: apply.stage === 'error'
? 'error'
: 'idle'
: apply.stage === 'guiSkew'
? 'guiSkew'
: apply.applying || apply.stage === 'restart'
? 'applying'
: apply.stage === 'error'
? 'error'
: 'idle'
const handleClose = (next: boolean) => {
if (phase === 'applying') {
@ -77,7 +79,13 @@ export function UpdatesOverlay() {
setUpdateOverlayOpen(next)
if (!next && (apply.stage === 'error' || apply.stage === 'restart' || apply.stage === 'manual')) {
if (
!next &&
(apply.stage === 'error' ||
apply.stage === 'restart' ||
apply.stage === 'manual' ||
apply.stage === 'guiSkew')
) {
resetUpdateApplyState()
}
}
@ -95,7 +103,11 @@ export function UpdatesOverlay() {
{phase === 'applying' && <ApplyingView apply={apply} isBackend={isBackend} />}
{phase === 'manual' && (
<ManualView command={apply.command ?? 'hermes update'} onDone={() => handleClose(false)} />
<ManualView command={apply.command ?? null} message={apply.message} onDone={() => handleClose(false)} />
)}
{phase === 'guiSkew' && (
<GuiSkewView message={apply.message} onDone={() => handleClose(false)} />
)}
{phase === 'error' && (
@ -251,18 +263,48 @@ function IdleView({
)
}
function ManualView({ command, onDone }: { command: string; onDone: () => void }) {
function ManualView({
command,
message,
onDone
}: {
command: string | null
message?: string
onDone: () => void
}) {
const { t } = useI18n()
const u = t.updates
const [copied, setCopied] = useState(false)
const handleCopy = () => {
if (!command) return
void writeClipboardText(command).then(() => {
setCopied(true)
window.setTimeout(() => setCopied(false), 1800)
})
}
// No command (e.g. the Linux sandbox-blocked relaunch): render the explanatory
// message + a Done button, not a copy-a-command box.
if (!command) {
return (
<div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
<div className="flex flex-col items-center gap-3 text-center">
<Terminal className="size-8 text-primary" />
<DialogTitle className="text-center text-xl">{u.manualTitle}</DialogTitle>
<DialogDescription className="text-center text-sm">
{message || u.manualPickedUp}
</DialogDescription>
</div>
<Button className="font-semibold" onClick={onDone} size="lg" variant="secondary">
{u.done}
</Button>
</div>
)
}
return (
<div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
<div className="flex flex-col items-center gap-3 text-center">
@ -309,6 +351,32 @@ function ManualView({ command, onDone }: { command: string; onDone: () => void }
)
}
// Linux GUI/backend skew (#45205): backend updated, but the running desktop app
// package (AppImage/.deb/.rpm) was NOT changed. Closeable terminal state that
// tells the user to update/reinstall the desktop app — never claims the GUI was
// updated.
function GuiSkewView({ message, onDone }: { message?: string; onDone: () => void }) {
const { t } = useI18n()
const u = t.updates
return (
<div className="grid gap-5 px-6 pb-6 pt-7 pr-8">
<div className="flex flex-col items-center gap-3 text-center">
<AlertCircle className="size-8 text-amber-500" />
<DialogTitle className="text-center text-xl">{u.guiSkewTitle}</DialogTitle>
<DialogDescription className="max-w-prose text-center text-sm leading-5 text-muted-foreground">
{message || u.guiSkewBody}
</DialogDescription>
</div>
<Button className="font-semibold" onClick={onDone} size="lg" variant="secondary">
{u.done}
</Button>
</div>
)
}
function ApplyingView({ apply, isBackend }: { apply: UpdateApplyState; isBackend: boolean }) {
const { t } = useI18n()
const u = t.updates
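
Review note: the nested ternary that derives `phase` in `UpdatesOverlay` encodes a precedence order (manual, then guiSkew, then applying, then error, then idle). The same order written as early returns is sketched below. `ApplyLike` and `derivePhase` are hypothetical names; the real `UpdateApplyState` has more fields than the two used here.

```typescript
type UpdatePhase = 'idle' | 'applying' | 'manual' | 'guiSkew' | 'error'

// Hypothetical, trimmed-down stand-in for UpdateApplyState.
interface ApplyLike {
  applying: boolean
  stage: string
}

// Same precedence as the ternary chain in the hunk, one rule per line.
function derivePhase(apply: ApplyLike): UpdatePhase {
  if (apply.stage === 'manual') {
    return 'manual'
  }

  if (apply.stage === 'guiSkew') {
    return 'guiSkew'
  }

  if (apply.applying || apply.stage === 'restart') {
    return 'applying'
  }

  if (apply.stage === 'error') {
    return 'error'
  }

  return 'idle'
}

// guiSkew is checked before `applying`, so a stale applying flag cannot mask it.
const skewPhase = derivePhase({ applying: true, stage: 'guiSkew' })
```

Placing `guiSkew` ahead of the `applying` check is what makes it a closeable terminal state: `handleClose` refuses to close only while the phase is `'applying'`.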

View file

@ -859,7 +859,10 @@ const ProcessNotificationNote: FC<{ text: string }> = ({ text }) => {
<summary className="cursor-pointer select-none text-muted-foreground/45 hover:text-muted-foreground/70">
output
</summary>
<pre className="mt-0.5 max-h-48 overflow-auto whitespace-pre-wrap font-mono text-[0.625rem] leading-4 text-muted-foreground/55">
<pre
className="mt-0.5 max-h-48 overflow-auto whitespace-pre-wrap font-mono text-[0.625rem] leading-4 text-muted-foreground/55"
data-selectable-text="true"
>
{detail}
</pre>
</details>

View file

@ -1,4 +1,4 @@
import { cleanup, fireEvent, render, screen, waitFor } from '@testing-library/react'
import { cleanup, fireEvent, render, screen, waitFor, within } from '@testing-library/react'
import { afterEach, beforeAll, describe, expect, it, vi } from 'vitest'
import type { HermesGateway } from '@/hermes'
@ -6,7 +6,7 @@ import { $gateway } from '@/store/gateway'
import { $approvalRequest, clearAllPrompts, setApprovalRequest } from '@/store/prompts'
import { $activeSessionId } from '@/store/session'
import { PendingToolApproval } from './tool-approval'
import { PendingApprovalFallback, PendingToolApproval } from './tool-approval'
import type { ToolPart } from './tool-fallback-model'
// Radix's DropdownMenu touches pointer-capture + scrollIntoView, which jsdom
@ -130,4 +130,30 @@ describe('PendingToolApproval', () => {
expect(await screen.findByRole('menuitem', { name: /Allow this session/ })).toBeTruthy()
expect(screen.queryByRole('menuitem', { name: /Always allow/ })).toBeNull()
})
it('renders a floating fallback when no pending tool row is mounted', () => {
setRequest('rm /tmp/hermes_approval_test.txt')
const { container } = render(<PendingApprovalFallback />)
const fallback = container.querySelector('[data-slot="tool-approval-fallback"]')
expect(fallback).not.toBeNull()
expect(within(fallback as HTMLElement).getByRole('button', { name: /Run/ })).toBeTruthy()
expect(within(fallback as HTMLElement).getByRole('button', { name: /Reject/ })).toBeTruthy()
})
it('hides the floating fallback once the inline approval bar is mounted', async () => {
setRequest('rm /tmp/hermes_approval_test.txt')
const { container } = render(
<>
<PendingToolApproval part={part('terminal')} />
<PendingApprovalFallback />
</>
)
await waitFor(() => {
expect(container.querySelector('[data-slot="tool-approval-inline"]')).not.toBeNull()
expect(container.querySelector('[data-slot="tool-approval-fallback"]')).toBeNull()
})
})
})

View file

@ -15,11 +15,17 @@ import {
import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigger } from '@/components/ui/dropdown-menu'
import { useI18n } from '@/i18n'
import { triggerHaptic } from '@/lib/haptics'
import { ChevronDown, Loader2 } from '@/lib/icons'
import { AlertCircle, ChevronDown, Loader2 } from '@/lib/icons'
import { cn } from '@/lib/utils'
import { $gateway } from '@/store/gateway'
import { notifyError } from '@/store/notifications'
import { $approvalRequest, type ApprovalRequest, clearApprovalRequest } from '@/store/prompts'
import {
$approvalInlineVisible,
$approvalRequest,
type ApprovalRequest,
clearApprovalRequest,
registerApprovalInlineAnchor
} from '@/store/prompts'
import type { ToolPart } from './tool-fallback-model'
@ -48,12 +54,47 @@ export const PendingToolApproval: FC<{ part: ToolPart }> = ({ part }) => {
return null
}
return <ApprovalBar request={request} />
return <InlineApprovalBar request={request} />
}
const InlineApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
useEffect(() => registerApprovalInlineAnchor(), [])
return <ApprovalBar request={request} surface="inline" />
}
export const PendingApprovalFallback: FC = () => {
const { t } = useI18n()
const request = useStore($approvalRequest)
const inlineVisible = useStore($approvalInlineVisible)
if (!request || inlineVisible) {
return null
}
return (
<div
className="pointer-events-none absolute left-1/2 z-30 w-[calc(100%-2rem)] max-w-2xl -translate-x-1/2"
data-slot="tool-approval-fallback"
style={{ bottom: 'calc(var(--composer-measured-height) + var(--status-stack-measured-height) + 0.875rem)' }}
>
<div className="pointer-events-auto rounded-xl border border-primary/30 bg-(--ui-chat-surface-background) px-3 py-2 shadow-lg backdrop-blur-xl [-webkit-backdrop-filter:blur(1rem)]">
<div className="flex min-w-0 items-center gap-2 text-sm text-primary">
<AlertCircle className="size-4 shrink-0" />
<span className="shrink-0 font-medium">{t.assistant.approval.jumpToApproval}</span>
{request.description && (
<span className="min-w-0 truncate text-(--ui-text-tertiary)">{request.description}</span>
)}
</div>
<ApprovalBar request={request} surface="floating" />
</div>
</div>
)
}
const isMac = typeof navigator !== 'undefined' && /Mac|iP(hone|ad|od)/.test(navigator.platform)
const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
const ApprovalBar: FC<{ request: ApprovalRequest; surface: 'floating' | 'inline' }> = ({ request, surface }) => {
const { t } = useI18n()
const copy = t.assistant.approval
const gateway = useStore($gateway)
@ -99,7 +140,7 @@ const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
setSubmitting(null)
}
},
[busy, gateway, request.sessionId]
[busy, copy.gatewayDisconnected, copy.sendFailed, gateway, request.sessionId]
)
// ⌘/Ctrl+Enter → Run, Esc → Reject.
@ -126,7 +167,10 @@ const ApprovalBar: FC<{ request: ApprovalRequest }> = ({ request }) => {
}, [confirmAlways, respond])
return (
<div className="mt-1 ps-5" data-slot="tool-approval-inline">
<div
className={cn(surface === 'inline' ? 'mt-1 ps-5' : 'mt-2')}
data-slot={surface === 'inline' ? 'tool-approval-inline' : 'tool-approval-actions'}
>
<div className="flex items-center gap-2.5">
<div className="inline-flex h-6 items-stretch overflow-hidden rounded-md border border-primary/25 bg-primary/10 text-primary">
<Button

View file

@ -1,6 +1,11 @@
import { describe, expect, it } from 'vitest'
import { buildToolView, type ToolPart } from './tool-fallback-model'
import {
buildToolView,
countDiffLineStats,
inlineDiffFromResult,
type ToolPart
} from './tool-fallback-model'
const part = (overrides: Partial<ToolPart>): ToolPart => ({
args: {},
@ -64,3 +69,51 @@ describe('buildToolView terminal exit-code status', () => {
)
})
})
describe('buildToolView file edit diffs', () => {
const patchDiff = '--- a/src/demo.ts\n+++ b/src/demo.ts\n@@ -1 +1 @@\n-old\n+new'
it('reads inline_diff and diff fields from patch results', () => {
expect(inlineDiffFromResult({ inline_diff: patchDiff })).toBe(patchDiff)
expect(inlineDiffFromResult({ diff: patchDiff })).toBe(patchDiff)
})
it('suppresses raw patch args when a diff is available', () => {
const view = buildToolView(
part({
args: { context: 'src/demo.ts', mode: 'replace', new_string: 'new', path: 'src/demo.ts' },
result: { diff: patchDiff, success: true },
toolName: 'patch'
}),
patchDiff
)
expect(view.title).toBe('demo.ts')
expect(view.subtitle).toBe('src/demo.ts')
expect(view.detail).toBe('')
expect(view.inlineDiff).toBe(patchDiff)
})
it('shows path subtitle instead of patch args JSON while pending', () => {
const view = buildToolView(
part({
args: { context: 'src/demo.ts', mode: 'replace', new_string: 'new', path: 'src/demo.ts' },
result: undefined,
toolName: 'patch'
}),
''
)
expect(view.title).toBe('demo.ts')
expect(view.subtitle).toBe('src/demo.ts')
expect(view.detail).toBe('')
})
})
describe('countDiffLineStats', () => {
it('counts added and removed lines', () => {
expect(
countDiffLineStats(`--- a/x\n+++ b/x\n@@\n-old\n+new\n context\n+another`)
).toEqual({ added: 2, removed: 1 })
})
})
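
Review note: for readers who want to try the counting rule outside the test runner, here is a standalone copy of the logic that `countDiffLineStats` adds in `tool-fallback-model`. The function body mirrors the hunk; the names `countLines` and `sampleStats` and the sample diff are illustrative only.

```typescript
interface DiffLineStats {
  added: number
  removed: number
}

// Count '+' and '-' lines, skipping the '+++' / '---' file headers.
function countLines(diff: string): DiffLineStats {
  let added = 0
  let removed = 0

  for (const line of diff.split('\n')) {
    if (line.startsWith('+') && !line.startsWith('+++')) {
      added += 1
    } else if (line.startsWith('-') && !line.startsWith('---')) {
      removed += 1
    }
  }

  return { added, removed }
}

const sampleStats = countLines('--- a/x\n+++ b/x\n@@\n-old\n+new\n context\n+another')
```

One consequence of the header check worth keeping in mind: a changed line whose own content begins with `++` or `--` renders as `+++…` or `---…` in the diff and is therefore skipped by this rule.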

View file

@ -72,6 +72,46 @@ export interface MessageRunningStateSlice {
}
}
const FILE_EDIT_TOOL_NAMES = new Set(['edit_file', 'patch', 'write_file'])
export function isFileEditTool(toolName: string): boolean {
return FILE_EDIT_TOOL_NAMES.has(toolName)
}
export interface DiffLineStats {
added: number
removed: number
}
export function countDiffLineStats(diff: string): DiffLineStats {
let added = 0
let removed = 0
for (const line of diff.split('\n')) {
if (line.startsWith('+') && !line.startsWith('+++')) {
added += 1
} else if (line.startsWith('-') && !line.startsWith('---')) {
removed += 1
}
}
return { added, removed }
}
function fileEditPath(args: Record<string, unknown>, result: Record<string, unknown>): string {
return (
firstStringField(args, ['path', 'file', 'filepath']) ||
firstStringField(result, ['path', 'file', 'filepath', 'resolved_path']) ||
htmlPathFromInlineDiff(firstStringField(result, ['inline_diff', 'diff']))
)
}
function fileEditBasename(path: string): string {
const normalized = path.replace(/\\/g, '/').trim()
return normalized.split('/').filter(Boolean).pop() || normalized
}
const TOOL_META: Record<string, ToolMeta> = {
browser_click: { done: 'Clicked page element', pending: 'Clicking page element', icon: 'globe', tone: 'browser' },
browser_fill: { done: 'Filled form field', pending: 'Filling form field', icon: 'globe', tone: 'browser' },
@ -95,7 +135,7 @@ const TOOL_META: Record<string, ToolMeta> = {
execute_code: { done: 'Ran code', pending: 'Running code', icon: 'terminal', tone: 'terminal' },
image_generate: { done: 'Generated image', pending: 'Generating image', icon: 'file-media', tone: 'image' },
list_files: { done: 'Listed files', pending: 'Listing files', icon: 'files', tone: 'file' },
patch: { done: 'Patched file', pending: 'Patching file', icon: 'diff', tone: 'file' },
patch: { done: 'Patched file', pending: 'Patching file', icon: 'edit', tone: 'file' },
read_file: { done: 'Read file', pending: 'Reading file', icon: 'file', tone: 'file' },
search_files: { done: 'Searched files', pending: 'Searching files', icon: 'search', tone: 'file' },
session_search_recall: {
@ -797,8 +837,8 @@ function toolPreviewTarget(toolName: string, args: Record<string, unknown>, resu
return looksLikeUrl(explicit) ? explicit : findFirstUrl(args, result)
}
if (toolName === 'write_file' || toolName === 'edit_file') {
return htmlPathFromInlineDiff(firstStringField(result, ['inline_diff']))
if (isFileEditTool(toolName)) {
return htmlPathFromInlineDiff(firstStringField(result, ['inline_diff', 'diff']))
}
return ''
@ -858,9 +898,17 @@ function stripDividerLines(value: string): string {
}
export function inlineDiffFromResult(result: unknown): string {
const value = parseMaybeObject(result).inline_diff
const record = parseMaybeObject(result)
return typeof value === 'string' ? stripInlineDiffChrome(value) : ''
for (const key of ['inline_diff', 'diff']) {
const value = record[key]
if (typeof value === 'string' && value.trim()) {
return stripInlineDiffChrome(value)
}
}
return ''
}
// Falls back to a string only when there's something concrete to render —
@ -1047,15 +1095,22 @@ function toolSubtitle(
return command ? compactPreview(command, 120) : 'Executed command'
}
if (toolName === 'read_file' || toolName === 'write_file' || toolName === 'edit_file') {
const path =
firstStringField(argsRecord, ['path', 'file', 'filepath']) ||
htmlPathFromInlineDiff(firstStringField(resultRecord, ['inline_diff']))
if (toolName === 'read_file' || isFileEditTool(toolName)) {
const isEdit = isFileEditTool(toolName)
return (
path ||
(firstStringField(resultRecord, ['inline_diff']) ? 'Changed file' : fallbackDetailText(argsRecord, resultRecord))
)
const path = isEdit
? fileEditPath(argsRecord, resultRecord)
: firstStringField(argsRecord, ['path', 'file', 'filepath'])
if (path) {
return path
}
if (!isEdit) {
return fallbackDetailText(argsRecord, resultRecord)
}
return inlineDiffFromResult(resultRecord) ? 'Changed file' : ''
}
if (toolName === 'web_extract') {
@ -1153,8 +1208,22 @@ function toolDetailText(
}
}
if (part.toolName === 'write_file' || part.toolName === 'edit_file') {
return inlineDiffFromResult(part.result) ? '' : fallbackDetailText(argsRecord, resultRecord)
if (isFileEditTool(part.toolName)) {
if (inlineDiffFromResult(part.result)) {
return ''
}
const summary = firstStringField(resultRecord, ['message', 'summary'])
if (summary) {
return summary
}
if (fileEditPath(argsRecord, resultRecord)) {
return ''
}
return fallbackDetailText(argsRecord, resultRecord)
}
if (part.toolName === 'web_search') {
@ -1253,8 +1322,12 @@ export function toolCopyPayload(part: ToolPart, view: ToolView): { label: string
}
}
if (part.toolName === 'write_file' || part.toolName === 'edit_file') {
const path = firstStringField(args, ['path', 'file', 'filepath'])
if (isFileEditTool(part.toolName)) {
if (view.inlineDiff.trim()) {
return { label: copy.file, text: view.inlineDiff }
}
const path = fileEditPath(args, result)
if (path) {
return { label: copy.path, text: path }
@ -1304,6 +1377,14 @@ function dynamicTitle(
}
}
if (isFileEditTool(part.toolName)) {
const path = fileEditPath(args, result)
if (path) {
return fileEditBasename(path)
}
}
return fallback
}
@ -1317,7 +1398,12 @@ export function buildToolView(part: ToolPart, inlineDiff: string): ToolView {
const title = dynamicTitle(part, argsRecord, resultRecord, baseTitle)
const titleEnriched = title !== baseTitle
const baseSubtitle = error || toolSubtitle(part, argsRecord, resultRecord)
const keepSubtitleWithTitle = part.toolName === 'terminal' || part.toolName === 'execute_code'
const keepSubtitleWithTitle =
part.toolName === 'terminal' ||
part.toolName === 'execute_code' ||
(isFileEditTool(part.toolName) && Boolean(baseSubtitle.trim()))
const subtitle = titleEnriched && !error && !keepSubtitleWithTitle ? '' : baseSubtitle
const detailBody = stripDividerLines(toolDetailText(part, argsRecord, resultRecord))

View file

@ -8,7 +8,7 @@ import { AnsiText } from '@/components/assistant-ui/ansi-text'
import { useElapsedSeconds } from '@/components/chat/activity-timer'
import { ActivityTimerText } from '@/components/chat/activity-timer-text'
import { CompactMarkdown } from '@/components/chat/compact-markdown'
import { DiffLines } from '@/components/chat/diff-lines'
import { FileDiffPanel } from '@/components/chat/diff-lines'
import { DisclosureRow } from '@/components/chat/disclosure-row'
import { PreviewAttachment } from '@/components/chat/preview-attachment'
import { ZoomableImage } from '@/components/chat/zoomable-image'
@ -16,6 +16,7 @@ import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { CopyButton } from '@/components/ui/copy-button'
import { FadeText } from '@/components/ui/fade-text'
import { FileTypeIcon } from '@/components/ui/file-type-icon'
import { GlyphSpinner } from '@/components/ui/glyph-spinner'
import { ToolIcon } from '@/components/ui/tool-icon'
import { Tip } from '@/components/ui/tooltip'
@ -32,7 +33,9 @@ import { PendingToolApproval } from './tool-approval'
import {
buildToolView,
cleanVisibleText,
countDiffLineStats,
inlineDiffFromResult,
isFileEditTool,
isPreviewableTarget,
looksRedundant,
type SearchResultRow,
@ -133,9 +136,21 @@ function statusGlyph(status: ToolStatus, copy: ToolStatusCopy): ReactNode {
// Leading glyph for any tool-row header. Status (running/error/warning)
// takes precedence; otherwise falls back to the tool's codicon. Returns
// null when neither applies so callers can render unconditionally.
function ToolGlyph({ copy, icon, status }: { copy: ToolStatusCopy; icon?: string; status?: ToolStatus }) {
function ToolGlyph({
copy,
filePath,
icon,
status
}: {
copy: ToolStatusCopy
filePath?: string
icon?: string
status?: ToolStatus
}) {
const node = status ? (
statusGlyph(status, copy)
) : filePath ? (
<FileTypeIcon className="text-(--ui-text-tertiary)" path={filePath} size="0.875rem" />
) : icon ? (
<ToolIcon className="text-(--ui-text-tertiary)" name={icon} size="0.875rem" />
) : null
@ -204,8 +219,13 @@ function ToolEntry({ part }: ToolEntryProps) {
const toolViewMode = useStore($toolViewMode)
const disclosureId = `tool-entry:${messageId}:${toolPartDisclosureId(part)}`
const dismissed = useStore($toolRowDismissed(disclosureId))
const open = useDisclosureOpen(disclosureId)
const isPending = messageRunning && part.result === undefined
const liveDiffs = useStore($toolInlineDiffs)
const sideDiff = part.toolCallId ? liveDiffs[part.toolCallId] || '' : ''
const inlineDiff = stripInlineDiffChrome(sideDiff) || inlineDiffFromResult(part.result)
const isFileEdit = isFileEditTool(part.toolName)
const defaultOpen = Boolean(inlineDiff)
const open = useDisclosureOpen(disclosureId, defaultOpen)
const canDismiss = !isPending && !embedded
// Only animate entries that mount while their message is actively
// streaming — historical sessions mount with `messageRunning === false`,
@ -213,9 +233,6 @@ function ToolEntry({ part }: ToolEntryProps) {
// handles its own enter animation, so embedded children skip it.
const enterRef = useEnterAnimation(messageRunning && !embedded, `tool-entry:${disclosureId}`)
const elapsed = useElapsedSeconds(isPending, `tool:${disclosureId}`)
const liveDiffs = useStore($toolInlineDiffs)
const sideDiff = part.toolCallId ? liveDiffs[part.toolCallId] || '' : ''
const inlineDiff = stripInlineDiffChrome(sideDiff) || inlineDiffFromResult(part.result)
// Stale parts (no result, but message stopped running) get a synthetic
// empty result so buildToolView treats them as completed-no-output.
@ -253,11 +270,12 @@ function ToolEntry({ part }: ToolEntryProps) {
const detailMatchesSubtitle = looksRedundant(view.subtitle, view.detail)
const showDetail =
(view.status === 'error' && Boolean(detailSections.summary || detailSections.body)) ||
(view.status !== 'error' &&
Boolean(view.detail) &&
!looksRedundant(view.title, view.detail) &&
!detailMatchesSubtitle)
!view.inlineDiff &&
((view.status === 'error' && Boolean(detailSections.summary || detailSections.body)) ||
(view.status !== 'error' &&
Boolean(view.detail) &&
!looksRedundant(view.title, view.detail) &&
!detailMatchesSubtitle))
const renderDetailAsCode =
view.status !== 'error' &&
@ -283,6 +301,13 @@ function ToolEntry({ part }: ToolEntryProps) {
const copyAction = useMemo(() => toolCopyPayload(part, view), [part, view])
const diffStats = useMemo(
() => (isFileEdit && view.inlineDiff ? countDiffLineStats(view.inlineDiff) : null),
[isFileEdit, view.inlineDiff]
)
const showDiffStats = !isPending && Boolean(diffStats && (diffStats.added > 0 || diffStats.removed > 0))
// The header trailing slot only carries the live duration timer while the
// tool is running. The copy control used to live here too, but an
// `opacity-0` (yet still clickable) button straddling the caret/duration made
@ -299,7 +324,12 @@ function ToolEntry({ part }: ToolEntryProps) {
<Tip label={statusCopy.dismiss}>
<Button
aria-label={statusCopy.dismiss}
className="size-5 rounded-md text-(--ui-text-tertiary) opacity-0 transition-opacity hover:text-(--ui-text-primary) hover:opacity-100 group-hover/disclosure-row:opacity-80 group-focus-within/disclosure-row:opacity-80"
className={cn(
'size-5 rounded-md text-(--ui-text-tertiary) transition-opacity hover:text-(--ui-text-primary) hover:opacity-100',
open
? 'opacity-80'
: 'opacity-0 group-hover/disclosure-row:opacity-80 group-focus-within/disclosure-row:opacity-80'
)}
onClick={event => {
event.stopPropagation()
dismissToolRow(disclosureId)
@ -317,13 +347,24 @@ function ToolEntry({ part }: ToolEntryProps) {
return null
}
// A completed file edit with no diff to review is a bare, unexpandable row.
// This is almost always a `write_file` create after a reload: only `patch`
// persists its diff in the tool result, so creates rehydrate diff-less and
// read like dead duplicates of the real diff row. Hide them — but keep
// in-flight writes (activity) and failures (errors) visible.
if (isFileEdit && !isPending && view.status !== 'error' && !view.inlineDiff) {
return null
}
return (
<div
className={cn(
'min-w-0 max-w-full overflow-hidden text-[length:var(--conversation-tool-font-size)] text-(--ui-text-tertiary)',
open && 'rounded-[0.625rem] border border-(--ui-stroke-tertiary)'
)}
data-file-edit={isFileEdit && open ? '' : undefined}
data-slot="tool-block"
data-tool-row=""
ref={enterRef}
>
<div className={cn(open && 'border-b border-(--ui-stroke-tertiary) px-2 py-1.5')}>
@ -333,8 +374,16 @@ function ToolEntry({ part }: ToolEntryProps) {
open={open}
trailing={trailing}
>
<span className="flex min-w-0 items-center gap-1.5">
<ToolGlyph copy={copy} icon={view.icon} status={leadingStatus(isPending, view.status)} />
<span
className="flex min-w-0 items-center gap-1.5"
title={isFileEdit && view.subtitle ? view.subtitle : undefined}
>
<ToolGlyph
copy={copy}
filePath={isFileEdit ? view.subtitle : undefined}
icon={view.icon}
status={leadingStatus(isPending, view.status)}
/>
<FadeText
className={cn(
TOOL_HEADER_TITLE_CLASS,
@ -346,7 +395,17 @@ function ToolEntry({ part }: ToolEntryProps) {
{view.title}
</FadeText>
{!isPending && view.countLabel && <span className={TOOL_HEADER_DURATION_CLASS}>{view.countLabel}</span>}
{!isPending && view.durationLabel && (
{showDiffStats && diffStats && (
<span className="flex shrink-0 items-center gap-1 font-mono text-[0.625rem] tabular-nums">
{diffStats.added > 0 && (
<span className="text-emerald-600 dark:text-emerald-400">+{diffStats.added}</span>
)}
{diffStats.removed > 0 && (
<span className="text-rose-600 dark:text-rose-400">-{diffStats.removed}</span>
)}
</span>
)}
{!isFileEdit && !isPending && view.durationLabel && (
<span className={TOOL_HEADER_DURATION_CLASS}>{view.durationLabel}</span>
)}
</span>
@ -358,7 +417,7 @@ function ToolEntry({ part }: ToolEntryProps) {
{copyAction.text && (
<CopyButton
appearance="inline"
className="absolute right-1.5 top-1.5 z-10 h-5 gap-0 rounded-md border border-(--ui-stroke-tertiary) bg-background/80 px-1 opacity-60 backdrop-blur-sm transition-opacity hover:opacity-100 focus-visible:opacity-100"
className="absolute right-1.5 top-1.5 z-10 h-5 gap-0 rounded-md border border-(--ui-stroke-tertiary) bg-background/80 px-1 opacity-100 backdrop-blur-sm transition-opacity hover:opacity-100 focus-visible:opacity-100"
iconClassName="size-3"
label={copyAction.label}
showLabel={false}
@ -380,6 +439,7 @@ function ToolEntry({ part }: ToolEntryProps) {
<SearchResultsList hits={view.searchHits} />
</div>
)}
{view.inlineDiff && <FileDiffPanel diff={view.inlineDiff} path={isFileEdit ? view.subtitle : undefined} />}
{showDetail &&
toolViewMode !== 'technical' &&
(view.status === 'error' ? (
@ -448,14 +508,21 @@ function ToolEntry({ part }: ToolEntryProps) {
</pre>
</details>
)}
{toolViewMode === 'technical' && (
{toolViewMode === 'technical' && !(isFileEdit && view.inlineDiff) && (
<pre className={cn(TOOL_SECTION_PRE_CLASS, 'whitespace-pre-wrap wrap-anywhere')}>
{rawTechnicalTrace(part.args, part.result)}
</pre>
)}
{toolViewMode === 'technical' && isFileEdit && view.inlineDiff && (
<details className="max-w-full">
<summary className={cn(TOOL_SECTION_LABEL_CLASS, 'mb-0 cursor-pointer')}>Tool payload</summary>
<pre className={cn(TOOL_SECTION_PRE_CLASS, 'mt-1 whitespace-pre-wrap wrap-anywhere')}>
{rawTechnicalTrace(part.args, part.result)}
</pre>
</details>
)}
</div>
)}
{open && view.inlineDiff && <DiffLines text={view.inlineDiff} />}
</div>
)
}
@ -488,6 +555,7 @@ export const ToolGroupSlot: FC<PropsWithChildren<{ endIndex: number; startIndex:
<div
className="grid min-w-0 max-w-full gap-(--tool-row-gap) overflow-hidden"
data-slot="tool-block"
data-tool-group=""
ref={enterRef}
>
{children}
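
The header's `+N` / `-N` badge depends on `countDiffLineStats`, which is imported but not shown in this diff. A minimal sketch of how such a counter might work, assuming it skips the `+++` / `---` file-header lines the same way `diffKind` does; `countDiffLineStatsSketch` and `formatDiffStats` are hypothetical names, not the project's API:

```typescript
interface DiffLineStats {
  added: number
  removed: number
}

// Count added/removed lines in a unified diff, ignoring `+++` / `---` headers.
function countDiffLineStatsSketch(diff: string): DiffLineStats {
  const stats: DiffLineStats = { added: 0, removed: 0 }

  for (const line of diff.split('\n')) {
    if (line.startsWith('+') && !line.startsWith('+++')) {
      stats.added += 1
    } else if (line.startsWith('-') && !line.startsWith('---')) {
      stats.removed += 1
    }
  }

  return stats
}

// Render only the non-zero sides, matching the conditional spans in the header.
function formatDiffStats(stats: DiffLineStats): string {
  const parts: string[] = []
  if (stats.added > 0) parts.push(`+${stats.added}`)
  if (stats.removed > 0) parts.push(`-${stats.removed}`)
  return parts.join(' ')
}

const statsSample = ['--- a/x.ts', '+++ b/x.ts', '@@ -1,2 +1,3 @@', '+a', '+b', '-c', ' d'].join('\n')
const statsResult = countDiffLineStatsSketch(statsSample)
const statsLabel = formatDiffStats(statsResult)
```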

View file

@ -1,12 +1,9 @@
import { cn } from '@/lib/utils'
/**
* The composer surface and everything docked to it (slash·@ popover, `?` help)
* paint ONE shared `--composer-fill` var. The state ladder (rest / scrolled /
* focused / drawer-open) lives in styles.css on `[data-slot='composer-root']`,
* so the two layers can never disagree; drawer-open forces an opaque fill via
* `:has()`, because translucent glass sampling different backdrops (thread vs
* fade gradient) renders as different colors even with identical tints.
* The composer surface and the status/queue stack paint ONE shared
* `--composer-fill` var. The state ladder (rest / scrolled) lives in styles.css
* on `[data-slot='composer-root']`, so the layers can never disagree.
*/
export const composerFill = 'bg-(--composer-fill)'
@ -26,6 +23,13 @@ const composerDockEdge = (edge: 'bottom' | 'top') =>
export const composerDockCard = (edge: 'bottom' | 'top' = 'top') =>
cn(composerDockEdge(edge), composerFill, composerSurfaceGlass)
/** Fused docked card: completion drawers. Shares `--composer-fill` with the
* composer surface, which goes opaque while a drawer is open. */
export const composerFusedDockCard = (edge: 'bottom' | 'top' = 'top') => cn(composerDockEdge(edge), composerFill)
/** Floating composer panel skin — the `/`·`@`·`?` completion drawer and the
* attach (`+`) menu. Glassy translucent card, hairline border, full radius,
* smallest type, soft nous shadow. Uses an explicit fill (not `--composer-fill`)
* so it renders identically whether mounted inside the composer or portaled out
* of it. Visual skin only; consumers add their own size/position/padding. */
export const composerPanelCard = cn(
'rounded-2xl border border-border/65 shadow-nous text-[length:var(--conversation-tool-font-size)]',
'bg-[color-mix(in_srgb,var(--dt-card)_72%,transparent)]',
composerSurfaceGlass
)

View file

@ -1,33 +1,176 @@
import * as React from 'react'
'use client'
import type { ReactNode } from 'react'
import * as React from 'react'
import { useShikiHighlighter } from 'react-shiki'
import type { ShikiTransformer } from 'shiki'
import { exceedsHighlightBudget, SHIKI_THEME } from '@/components/chat/shiki-highlighter'
import { shikiLanguageForFilename } from '@/lib/markdown-code'
import { cn } from '@/lib/utils'
/**
* Per-line classed renderer for unified diffs. Lives outside `CodeCard` so
* tool-result panels (already nested inside a tool card) don't double-shell;
* for markdown ` ```diff ` fences the standard `CodeCard` + Shiki path runs
* instead and gives equivalent coloring.
* Renders a unified diff for a tool's file edit. Two paths share one parse:
* - `SyntaxDiff` highlights the change *content* in the file's language via
* Shiki, then a per-line transformer paints the add/remove tint on top.
* - `DiffLines` is the color-only fallback (no language, over budget, or while
* Shiki loads).
* Both drop git file-headers + `@@` hunk noise and the `+/-` gutter so changes
* read by color + a 2px gutter accent, the way Cursor does.
*/
interface DiffLineKind {
className?: string
match: (line: string) => boolean
type DiffKind = 'add' | 'context' | 'remove'
interface DiffLine {
kind: DiffKind
text: string
}
const DIFF_LINE_KINDS: DiffLineKind[] = [
{
className: 'text-emerald-700 dark:text-emerald-300',
match: line => line.startsWith('+') && !line.startsWith('+++')
},
{ className: 'text-rose-700 dark:text-rose-300', match: line => line.startsWith('-') && !line.startsWith('---') },
{ className: 'text-sky-700 dark:text-sky-300', match: line => line.startsWith('@@') },
{
className: 'text-muted-foreground/70',
match: line => line.startsWith('---') || line.startsWith('+++') || / → /.test(line.slice(0, 60))
}
]
// Tint + 2px gutter accent per change kind. Text color is included for the
// plain renderer; the Shiki path omits it so syntax colors win, layering only
// the background + border.
const DIFF_KIND_TINT: Record<DiffKind, string> = {
add: 'border-emerald-500 bg-emerald-500/12',
context: 'border-transparent',
remove: 'border-rose-500 bg-rose-500/12'
}
function classifyLine(line: string): string | undefined {
return DIFF_LINE_KINDS.find(kind => kind.match(line))?.className
const DIFF_KIND_TEXT: Record<DiffKind, string> = {
add: 'text-emerald-800 dark:text-emerald-200',
context: '',
remove: 'text-rose-800 dark:text-rose-200'
}
const DIFF_LINE_BASE = 'block min-w-max whitespace-pre border-l-2 px-2.5 py-px'
// Bleed out of the tool-card body's `p-1.5` so tints/borders run flush to the
// card edges (rounded corners clip via the card's overflow); compact height
// with internal scroll like a code block.
const DIFF_BOX_CLASS =
'-mx-1.5 -mb-1.5 max-h-[12rem] max-w-none min-w-0 overflow-auto overscroll-contain font-mono text-[0.7rem] leading-relaxed text-(--ui-text-secondary)'
function diffKind(line: string): DiffKind {
if (line.startsWith('+') && !line.startsWith('+++')) {
return 'add'
}
if (line.startsWith('-') && !line.startsWith('---')) {
return 'remove'
}
return 'context'
}
// Drop the leading +/-/space gutter so changes read by color alone, keeping the
// rest of the indentation intact.
function stripDiffMarker(line: string): string {
if (diffKind(line) !== 'context' || line.startsWith(' ')) {
return line.slice(1)
}
return line
}
// Git-style unified diffs arrive with a file-header preamble — `diff --git`,
// `index …`, `--- a/path`, `+++ b/path`, and Hermes' own `a/path → b/path`
// arrow line. That preamble just repeats the path (which the tool row already
// shows) and reads especially badly for absolute paths (`a//Users/…`). Strip
// the leading header zone up to the first hunk.
const DIFF_HEADER_PREFIXES = ['diff --git', 'index ', '--- ', '+++ ', 'similarity ', 'rename ', 'new file', 'deleted file']
function isArrowHeaderLine(line: string): boolean {
const trimmed = line.trim()
return trimmed.includes('→') && /^\S.*→\s*\S+$/.test(trimmed) && !/^[+\-@]/.test(trimmed)
}
/** Exported for tests. */
export function stripDiffFileHeaders(diff: string): string {
const lines = diff.split('\n')
let start = 0
for (; start < lines.length; start += 1) {
const line = lines[start]
if (line.startsWith('@@')) {
break
}
if (line.trim() === '' || isArrowHeaderLine(line) || DIFF_HEADER_PREFIXES.some(prefix => line.startsWith(prefix))) {
continue
}
break
}
return lines.slice(start).join('\n')
}
// Cleaned diff → renderable lines: file-headers + `@@` hunks dropped (a blank
// separator kept between hunks), markers stripped, kind recorded.
function parseDiff(diff: string): DiffLine[] {
const out: DiffLine[] = []
let emitted = false
for (const line of stripDiffFileHeaders(diff).split('\n')) {
if (line.startsWith('@@')) {
if (emitted) {
out.push({ kind: 'context', text: '' })
}
continue
}
out.push({ kind: diffKind(line), text: stripDiffMarker(line) })
emitted = true
}
return out
}
function DiffBody({ lines, syntax }: { lines: DiffLine[]; syntax?: boolean }) {
return (
<>
{lines.map((line, index) => (
<span
className={cn(DIFF_LINE_BASE, DIFF_KIND_TINT[line.kind], !syntax && DIFF_KIND_TEXT[line.kind])}
key={`${index}-${line.text}`}
>
{line.text || ' '}
</span>
))}
</>
)
}
// Shiki transformer: tag each `.line` with the diff tint for its kind, so the
// syntax-highlighted output keeps add/remove backgrounds + the gutter accent.
function diffLineTransformer(kinds: DiffKind[]): ShikiTransformer {
return {
line(node, line) {
const kind = kinds[line - 1] ?? 'context'
const existing = Array.isArray(node.properties.className)
? (node.properties.className as string[])
: node.properties.className
? [String(node.properties.className)]
: []
node.properties.className = [...existing, DIFF_LINE_BASE, DIFF_KIND_TINT[kind]]
}
}
}
function SyntaxDiff({ language, lines }: { language: string; lines: DiffLine[] }) {
const code = React.useMemo(() => lines.map(line => line.text).join('\n'), [lines])
const transformers = React.useMemo(() => [diffLineTransformer(lines.map(line => line.kind))], [lines])
const highlighted = useShikiHighlighter(code, language, SHIKI_THEME, {
defaultColor: 'light-dark()',
transformers
})
// Until Shiki resolves, show the plain colored diff so there's no flash.
return (highlighted as ReactNode) ?? <DiffBody lines={lines} />
}
interface DiffLinesProps extends Omit<React.ComponentProps<'pre'>, 'children'> {
@ -35,20 +178,28 @@ interface DiffLinesProps extends Omit<React.ComponentProps<'pre'>, 'children'> {
}
export function DiffLines({ className, text, ...props }: DiffLinesProps) {
const lines = React.useMemo(() => parseDiff(text), [text])
return (
<pre
className={cn(
'mt-1 mb-1.5 max-h-96 max-w-full min-w-0 overflow-auto rounded-md border border-border/60 bg-muted/35 px-2.5 py-1.5 font-mono text-[0.7rem] leading-relaxed text-muted-foreground',
className
)}
data-slot="diff-lines"
{...props}
>
{text.split('\n').map((line, index) => (
<span className={cn('block min-w-max whitespace-pre', classifyLine(line))} key={`${index}-${line}`}>
{line || ' '}
</span>
))}
<pre className={cn(DIFF_BOX_CLASS, className)} data-slot="diff-lines" {...props}>
<DiffBody lines={lines} />
</pre>
)
}
interface FileDiffPanelProps {
diff: string
path?: string
}
export function FileDiffPanel({ diff, path }: FileDiffPanelProps) {
const lines = React.useMemo(() => parseDiff(diff), [diff])
const language = shikiLanguageForFilename(path)
const canHighlight = Boolean(language) && !exceedsHighlightBudget(diff)
return (
<div className={DIFF_BOX_CLASS} data-slot="file-diff-panel">
{canHighlight ? <SyntaxDiff language={language} lines={lines} /> : <DiffBody lines={lines} />}
</div>
)
}
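
To make the parse step concrete, here is a condensed, standalone sketch of the pipeline that `DiffLines` and `FileDiffPanel` share. It follows the `diffKind` / `parseDiff` logic above but simplifies header stripping: the hypothetical `dropPreamble` skips everything before the first `@@` line, whereas the real `stripDiffFileHeaders` only skips recognised header lines. The sample diff and the `parseDiffSketch` name are made up for illustration.

```typescript
type DiffKind = 'add' | 'context' | 'remove'

interface DiffLine {
  kind: DiffKind
  text: string
}

// Classify by leading marker; `+++` / `---` file headers are not changes.
function diffKind(line: string): DiffKind {
  if (line.startsWith('+') && !line.startsWith('+++')) return 'add'
  if (line.startsWith('-') && !line.startsWith('---')) return 'remove'
  return 'context'
}

// Simplification (assumption): drop everything before the first hunk header.
function dropPreamble(lines: string[]): string[] {
  const first = lines.findIndex(line => line.startsWith('@@'))
  return first === -1 ? lines : lines.slice(first)
}

function parseDiffSketch(diff: string): DiffLine[] {
  const out: DiffLine[] = []
  let emitted = false

  for (const line of dropPreamble(diff.split('\n'))) {
    if (line.startsWith('@@')) {
      // Hunk headers are dropped; a blank separator is kept between hunks.
      if (emitted) out.push({ kind: 'context', text: '' })
      continue
    }

    const kind = diffKind(line)
    // Strip the one-character gutter (+, -, or space) and keep the indentation.
    const text = kind !== 'context' || line.startsWith(' ') ? line.slice(1) : line
    out.push({ kind, text })
    emitted = true
  }

  return out
}

const sampleDiff = [
  'diff --git a/app.ts b/app.ts',
  '--- a/app.ts',
  '+++ b/app.ts',
  '@@ -1,2 +1,2 @@',
  ' const a = 1',
  '-const b = 2',
  '+const b = 3'
].join('\n')

const parsedLines = parseDiffSketch(sampleDiff)
```

Because the kind is recorded separately from the text, the same `DiffLine[]` can feed either the plain renderer or a Shiki transformer that indexes kinds by line number.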

View file

@ -30,7 +30,10 @@ interface HermesSyntaxHighlighterProps extends SyntaxHighlighterProps {
defer?: boolean
}
const SHIKI_THEME = { dark: 'github-dark-default', light: 'github-light-default' } as const
// `github-dark-dimmed` is GitHub's lower-contrast dark palette — the vivid
// `github-dark-default` tokens read harsh at our small code size. Shared by the
// inline diff renderer too (see diff-lines.tsx) so code + diffs match.
export const SHIKI_THEME = { dark: 'github-dark-dimmed', light: 'github-light-default' } as const
/**
* `github-light-default` colors comments `#6e7781` (~4.2:1 against the code

View file

@ -41,7 +41,11 @@ export function TerminalOutput({ className, text }: TerminalOutputProps) {
}, [text])
return (
<div className={cn('max-h-16 overflow-auto overscroll-contain', className)} ref={ref}>
<div
className={cn('max-h-16 overflow-auto overscroll-contain', className)}
data-selectable-text="true"
ref={ref}
>
<pre className="w-max min-w-full font-mono text-[0.5625rem] leading-[0.85rem] whitespace-pre text-muted-foreground/70">
{text}
</pre>

View file

@ -14,10 +14,9 @@ import {
$visibleModels,
collapseModelFamilies,
effectiveVisibleKeys,
emptyProviderSentinelKey,
isProviderSentinel,
modelVisibilityKey,
setVisibleModels
setVisibleModels,
toggleModelVisibility
} from '@/store/model-visibility'
import type { ModelOptionProvider, ModelOptionsResponse } from '@/types/hermes'
@ -61,25 +60,7 @@ export function ModelVisibilityDialog({
const visible = effectiveVisibleKeys(stored, providers)
const toggle = (provider: ModelOptionProvider, model: string) => {
const next = new Set(effectiveVisibleKeys($visibleModels.get(), providers))
const key = modelVisibilityKey(provider.slug, model)
const sentinel = emptyProviderSentinelKey(provider.slug)
if (next.has(key)) {
next.delete(key)
// Check if this was the last real model for this provider.
const remainingForProvider = [...next].some(k => k.startsWith(`${provider.slug}::`) && !isProviderSentinel(k))
if (!remainingForProvider) {
next.add(sentinel)
}
} else {
next.delete(sentinel)
next.add(key)
}
setVisibleModels(next)
setVisibleModels(toggleModelVisibility($visibleModels.get(), providers, provider.slug, model))
}
const q = search.trim().toLowerCase()
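
The inline toggle removed in this hunk moved into the store as `toggleModelVisibility`. Its implementation is not part of this diff, so the following is only a sketch of the behaviour the removed code describes: when the last real model of a provider is hidden, a per-provider sentinel key records "provider intentionally empty". The `slug::model` key shape comes from the removed `startsWith` check; the sentinel format, the helper names, and the `acme` provider are assumptions.

```typescript
const keyFor = (slug: string, model: string): string => `${slug}::${model}`

// Hypothetical sentinel format; the real one lives in store/model-visibility.
const sentinelFor = (slug: string): string => `${slug}::__none__`
const isSentinel = (key: string): boolean => key.endsWith('::__none__')

// Pure toggle: returns a new set and never mutates the input.
function toggleVisibilitySketch(visible: ReadonlySet<string>, slug: string, model: string): Set<string> {
  const next = new Set<string>(Array.from(visible))
  const key = keyFor(slug, model)
  const sentinel = sentinelFor(slug)

  if (next.has(key)) {
    next.delete(key)
    // If that was the provider's last real model, mark the provider as empty.
    const hasOther = Array.from(next).some(k => k.startsWith(`${slug}::`) && !isSentinel(k))
    if (!hasOther) next.add(sentinel)
  } else {
    next.delete(sentinel)
    next.add(key)
  }

  return next
}

const startVisible = new Set<string>([keyFor('acme', 'model-a')])
const afterHide = toggleVisibilitySketch(startVisible, 'acme', 'model-a')
const afterShow = toggleVisibilitySketch(afterHide, 'acme', 'model-a')
```

Moving the logic into a pure function like this keeps the dialog's `toggle` handler to a single line and makes the sentinel behaviour testable without rendering the dialog.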

View file

@ -154,7 +154,10 @@ function NotificationDetail({ detail }: { detail: string }) {
<details className="mt-2 text-xs text-muted-foreground">
<summary className="select-none font-medium text-muted-foreground hover:text-foreground">{copy.details}</summary>
<div className="mt-1 rounded-md bg-background/65 p-2">
<pre className="max-h-32 whitespace-pre-wrap wrap-break-word font-mono text-[0.6875rem] leading-relaxed">
<pre
className="max-h-32 whitespace-pre-wrap wrap-break-word font-mono text-[0.6875rem] leading-relaxed"
data-selectable-text="true"
>
{detail}
</pre>
<CopyButton

View file

@ -3,6 +3,7 @@
import { useStore } from '@nanostores/react'
import { type FormEvent, useCallback, useEffect, useState } from 'react'
import { PendingApprovalFallback } from '@/components/assistant-ui/tool-approval'
import { Button } from '@/components/ui/button'
import {
Dialog,
@ -21,13 +22,12 @@ import { notifyError } from '@/store/notifications'
import { $secretRequest, $sudoRequest, clearSecretRequest, clearSudoRequest } from '@/store/prompts'
// Renders the modal mid-turn prompts the gateway raises and waits on: sudo
// password and skill secret capture. (Dangerous-command / execute_code approval
// is rendered INLINE on the pending tool row instead — see
// components/assistant-ui/tool-approval.tsx — so it reads like an inline "Run"
// affordance rather than a blocking modal.) Each Python-side caller blocks the
// agent thread until the matching `*.respond` RPC lands; without a renderer the
// agent stalls until its timeout and the tool is BLOCKED (the bug this fixes —
// desktop handled clarify.request but not these). Any close path (Esc, backdrop
// password and skill secret capture. Dangerous-command / execute_code approval
// prefers the pending tool row, but also has a chat-level fallback when no row
// is mounted (remote gateway sessions can raise the request before the matching
// tool call is visible). Each Python-side caller blocks the agent thread until
// the matching `*.respond` RPC lands; without a renderer the agent stalls until
// its timeout and the tool is BLOCKED. Any close path (Esc, backdrop
// click) funnels through Radix's single `onOpenChange(false)` and maps to a
// refusal, so silence is never mistaken for consent, matching the TUI. We
// deliberately do NOT add onEscapeKeyDown / onInteractOutside handlers — they'd
@ -227,6 +227,7 @@ function SecretDialog() {
export function PromptOverlays() {
return (
<>
<PendingApprovalFallback />
<SudoDialog />
<SecretDialog />
</>

View file

@ -0,0 +1,42 @@
import { useEffect, useState } from 'react'
import { Alert, AlertDescription } from '@/components/ui/alert'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { useI18n } from '@/i18n'
import { Info } from '@/lib/icons'
export function RemoteDisplayBanner() {
const { t } = useI18n()
const [reason, setReason] = useState<string | null>(null)
const [dismissed, setDismissed] = useState(false)
useEffect(() => {
void window.hermesDesktop?.getRemoteDisplayReason?.().then(result => setReason(result))
}, [])
if (!reason || dismissed) {
return null
}
return (
<div className="pointer-events-none fixed left-1/2 top-[calc(var(--titlebar-height,34px)+0.75rem)] z-[200] w-[min(32rem,calc(100%-2rem))] -translate-x-1/2">
<Alert className="pointer-events-auto grid-cols-[auto_minmax(0,1fr)_auto] border-(--stroke-nous) bg-popover/95 pr-2.5 shadow-nous backdrop-blur-md">
<Info className="text-muted-foreground" />
<AlertDescription className="col-start-2">
<p className="m-0">{t.remoteDisplayBanner.message(reason)}</p>
</AlertDescription>
<Button
aria-label={t.remoteDisplayBanner.dismiss}
className="col-start-3 -mr-1 text-muted-foreground"
onClick={() => setDismissed(true)}
size="icon-xs"
type="button"
variant="ghost"
>
<Codicon name="close" size="0.875rem" />
</Button>
</Alert>
</div>
)
}

Some files were not shown because too many files have changed in this diff.