feat(api_server): expose run status for external UIs (#17085)

Adds two API server endpoints for external UIs and orchestrators: - GET /v1/capabilities — machine-readable feature discovery so clients can detect which Runs API / SSE / auth features this Hermes version supports before depending on them. - GET /v1/runs/{run_id} — pollable run status so dashboards can check queued/running/completed/failed/cancelled/stopping state without holding an SSE connection open. Also moves request validation ahead of run allocation so invalid payloads no longer leave orphaned entries in _run_streams waiting for the TTL sweep. task_id is intentionally kept as "default" for the Runs API to preserve the shared-sandbox model used by CLI, gateway, and the existing _run_agent_with_callbacks path. session_id is surfaced in run status for external-UI correlation only. Salvage of PR #17085 by @Magaav.
2026-05-01 01:51:44 +00:00 · 2026-04-29 06:36:56 -07:00 · 2026-04-29 06:36:56 -07:00 · 810d98e892
commit 810d98e892
parent 83c288da01
4 changed files with 362 additions and 23 deletions
--- a/website/docs/user-guide/features/api-server.md
+++ b/website/docs/user-guide/features/api-server.md
@ -194,6 +194,29 @@ Delete a stored response.

 Lists the agent as an available model. The advertised model name defaults to the [profile](/docs/user-guide/profiles) name (or `hermes-agent` for the default profile). Required by most frontends for model discovery.

+### GET /v1/capabilities
+
+Returns a machine-readable description of the API server's stable surface for external UIs, orchestrators, and plugin bridges.
+
+```json
+{
+  "object": "hermes.api_server.capabilities",
+  "platform": "hermes-agent",
+  "model": "hermes-agent",
+  "auth": {"type": "bearer", "required": true},
+  "features": {
+    "chat_completions": true,
+    "responses_api": true,
+    "run_submission": true,
+    "run_status": true,
+    "run_events_sse": true,
+    "run_stop": true
+  }
+}
+```
+
+Use this endpoint when integrating dashboards, browser UIs, or control planes so they can discover whether the running Hermes version supports runs, streaming, cancellation, and session continuity without depending on private Python internals.
+
 ### GET /health

 Health check. Returns `{"status": "ok"}`. Also available at **GET /v1/health** for OpenAI-compatible clients that expect the `/v1/` prefix.
@ -210,10 +233,41 @@ In addition to `/v1/chat/completions` and `/v1/responses`, the server exposes a

 Create a new agent run. Returns a `run_id` that can be used to subscribe to progress events.

+```json
+{
+  "run_id": "run_abc123",
+  "status": "started"
+}
+```
+
+Runs accept a simple `input` string and optional `session_id`, `instructions`, `conversation_history`, or `previous_response_id`. When `session_id` is provided, Hermes surfaces it in the run status so external UIs can correlate runs with their own conversation IDs.
+
+### GET /v1/runs/\{run_id\}
+
+Poll the current run state. This is useful for dashboards that need status without holding an SSE connection open, or for UIs that reconnect after navigation.
+
+```json
+{
+  "object": "hermes.run",
+  "run_id": "run_abc123",
+  "status": "completed",
+  "session_id": "space-session",
+  "model": "hermes-agent",
+  "output": "Done.",
+  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
+}
+```
+
+Statuses are retained briefly after terminal states (`completed`, `failed`, or `cancelled`) for polling and UI reconciliation.
+
 ### GET /v1/runs/\{run_id\}/events

 Server-Sent Events stream of the run's tool-call progress, token deltas, and lifecycle events. Designed for dashboards and thick clients that want to attach/detach without losing state.

+### POST /v1/runs/\{run_id\}/stop
+
+Interrupt a running agent turn. The endpoint returns immediately with `{"status": "stopping"}` while Hermes asks the active agent to stop at the next safe interruption point.
+
 ## Jobs API (background scheduled work)

 The server exposes a lightweight jobs CRUD surface for managing scheduled / background agent runs from a remote client. All endpoints are gated behind the same bearer auth.