Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.
Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
Sweep ~74 redundant local imports across 21 files where the same module
was already imported at the top level. Also includes type fixes and lint
cleanups on the same branch.
OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision
requests to the API server. Both endpoints accept the canonical OpenAI
multimodal shape:
Chat Completions: {type: text|image_url, image_url: {url, detail?}}
Responses: {type: input_text|input_image, image_url: <str>, detail?}
The server validates and converts both into a single internal shape that the
existing agent pipeline already handles (Anthropic adapter converts,
OpenAI-wire providers pass through). Remote http(s) URLs and data:image/*
URLs are supported.
Uploaded files (file, input_file, file_id) and non-image data: URLs are
rejected with 400 unsupported_content_type.
Changes:
- gateway/platforms/api_server.py
- _normalize_multimodal_content(): validates + normalizes both Chat and
Responses content shapes. Returns a plain string for text-only content
(preserves prompt-cache behavior on existing callers) or a canonical
[{type:text|image_url,...}] list when images are present.
- _content_has_visible_payload(): replaces the bare truthy check so a
user turn with only an image no longer rejects as 'No user message'.
- _handle_chat_completions and _handle_responses both call the new helper
for user/assistant content; system messages continue to flatten to text.
- Codex conversation_history, input[], and inline history paths all share
the same validator. No duplicated normalizers.
- run_agent.py
- _summarize_user_message_for_log(): produces a short string summary
('[1 image] describe this') from list content for logging, spinner
previews, and trajectory writes. Fixes AttributeError when list
user_message hit user_message[:80] + '...' / .replace().
- _chat_content_to_responses_parts(): module-level helper that converts
chat-style multimodal content to Responses 'input_text'/'input_image'
parts. Used in _chat_messages_to_responses_input for Codex routing.
- _preflight_codex_input_items() now validates and passes through list
content parts for user/assistant messages instead of stringifying.
- tests/gateway/test_api_server_multimodal.py (new, 38 tests)
- Unit coverage for _normalize_multimodal_content, including both part
formats, data URL gating, and all reject paths.
- Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses
verifying multimodal payloads reach _run_agent intact.
- 400 coverage for file / input_file / non-image data URL.
- tests/run_agent/test_run_agent_multimodal_prologue.py (new)
- Regression coverage for the prologue no-crash contract.
- _chat_content_to_responses_parts round-trip coverage.
- website/docs/user-guide/features/api-server.md
- Inline image examples for both endpoints.
- Updated Limitations: files still unsupported, images now supported.
Validated live against openrouter/anthropic/claude-opus-4.6:
POST /v1/chat/completions → 200, vision-accurate description
POST /v1/responses → 200, same image, clean output_text
POST /v1/chat/completions [file] → 400 unsupported_content_type
POST /v1/responses [input_file] → 400 unsupported_content_type
POST /v1/responses [non-image data URL] → 400 unsupported_content_type
Closes#5621, #8253, #4046, #6632.
Co-authored-by: Paul Bergeron <paul@gamma.app>
Co-authored-by: zhangxicen <zhangxicen@example.com>
Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com>
Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>
All 10 call sites in gateway/run.py and gateway/platforms/api_server.py
are inside async functions where a loop is guaranteed to be running.
get_event_loop() is deprecated since Python 3.10 — it can silently
create a new loop when none is running, masking bugs.
get_running_loop() raises RuntimeError instead, which is safer.
Surfaced during review of PRs #10533 and #10647.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
The /v1/responses endpoint generated a new UUID session_id for every
request, even when previous_response_id was provided. This caused each
turn of a multi-turn conversation to appear as a separate session on the
web dashboard, despite the conversation history being correctly chained.
Fix: store session_id alongside the response in the ResponseStore, and
reuse it when a subsequent request chains via previous_response_id.
Applies to both the non-streaming /v1/responses path and the streaming
SSE path. The /v1/runs endpoint also gains session continuity from
stored responses (explicit body.session_id still takes priority).
Adds test verifying session_id is preserved across chained requests.
The streaming path emits output as content-part arrays for Open WebUI
compatibility, but the batch (non-streaming) Responses API path must
return output as a plain string per the OpenAI Responses API spec.
Reverts the _extract_output_items change from the cherry-picked commits
while preserving the streaming path's array format.
The dashboard's gateway status detection relied solely on local PID checks
(os.kill + /proc), which fails when the gateway runs in a separate container.
Changes:
- web_server.py: Add _probe_gateway_health() that queries the gateway's HTTP
/health/detailed endpoint when the local PID check fails. Activated by
setting the GATEWAY_HEALTH_URL env var (e.g. http://gateway:8642/health).
Falls back to standard PID check when the env var is not set.
- api_server.py: Add GET /health/detailed endpoint that returns full gateway
state (platforms, gateway_state, active_agents, pid, etc.) without auth.
The existing GET /health remains unchanged for backwards compatibility.
- StatusPage.tsx: Handle the case where gateway_pid is null but the gateway
is running remotely, displaying 'Running (remote)' instead of 'PID null'.
Environment variables:
- GATEWAY_HEALTH_URL: URL of the gateway health endpoint (e.g.
http://gateway-container:8642/health). Unset = local PID check only.
- GATEWAY_HEALTH_TIMEOUT: Probe timeout in seconds (default: 3).
Port from openclaw/openclaw#64586: users who copy .env.example without
changing placeholder values now get a clear error at startup instead of
a confusing auth failure from the platform API. Also rejects placeholder
API_SERVER_KEY when binding to a network-accessible address.
Cherry-picked from PR #8677.
Some OpenAI-compatible clients (Open WebUI, LobeChat, etc.) send
message content as an array of typed parts instead of a plain string:
[{"type": "text", "text": "hello"}]
The agent pipeline expects strings, so these array payloads caused
silent failures or empty messages.
Add _normalize_chat_content() with defensive limits (recursion depth,
list size, output length) and apply it to both the Chat Completions
and Responses API endpoints. The Responses path had inline
normalization that only handled input_text/output_text — the shared
function also handles the standard 'text' type.
Salvaged from PR #7980 (ikelvingo) — only the content normalization;
the SSE and Weixin changes in that PR were regressions and are not
included.
Co-authored-by: ikelvingo <ikelvingo@users.noreply.github.com>
Tool progress markers (e.g. `⏰ list`) were injected directly into
SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI,
LobeChat, etc.) store delta.content verbatim as the assistant message
and send it back on subsequent requests. After enough turns, the model
learns to emit these markers as plain text instead of issuing real tool
calls — silently hallucinating tool results without ever running them.
Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE
event instead of mixing it into delta.content. Per the SSE spec, clients
that don't understand a custom event type silently ignore it, so this is
backward-compatible. Frontends that want to render progress indicators
can listen for the custom event without persisting it to conversation
history.
The /v1/runs endpoint already uses structured events — this aligns the
/v1/chat/completions streaming path with the same principle.
Closes#6972
Add is_network_accessible() helper using Python's ipaddress module to
robustly classify bind addresses (IPv4/IPv6 loopback, wildcards,
mapped addresses, hostname resolution with DNS-failure-fails-closed).
The API server connect() now refuses to start when the bind address is
network-accessible and no API_SERVER_KEY is set, preventing RCE from
other machines on the network.
Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>
The API server's _run_agent() was not passing task_id to
run_conversation(), causing a fresh random UUID per request. This meant
every Open WebUI message spun up a new Docker container and tore it down
afterward — making persistent filesystem state impossible.
Two fixes:
1. Pass task_id="default" so all API server conversations share the same
Docker container (matching the design intent: one configured Docker
environment, always the same container).
2. Derive a stable session_id from the system prompt + first user message
hash instead of uuid4(). This stops hermes sessions list from being
polluted with single-message throwaway sessions.
Fixes#3438.
dingtalk.py: The session_webhook URL from incoming DingTalk messages is POSTed to
without any origin validation (line 290), enabling SSRF attacks via crafted webhook
URLs (e.g. http://169.254.169.254/ to reach cloud metadata). Add a regex check
that only accepts the official DingTalk API origin (https://api.dingtalk.com/).
Also cap _session_webhooks dict at 500 entries with FIFO eviction to prevent
unbounded memory growth from long-running gateway instances.
api_server.py: The X-Hermes-Session-Id request header is accepted and echoed back
into response headers (lines 675, 697) without sanitization. A session ID
containing \r\n enables HTTP response splitting / header injection. Add a check
that rejects session IDs containing control characters (\r, \n, \x00).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Two security hardening changes for the API server:
1. **Startup warning when no API key is configured.**
When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated
requests. This is the default configuration, but operators may not
realize the security implications. A prominent warning at startup
makes the risk visible.
2. **Require authentication for session continuation.**
The `X-Hermes-Session-Id` header allows callers to load and continue
any session stored in state.db. Without authentication, an attacker
who can reach the API server (e.g. via CORS from a malicious page,
or on a shared host) could enumerate session IDs and read conversation
history — which may contain API keys, passwords, code, or other
sensitive data shared with the agent.
Session continuation now returns 403 when no API key is configured,
with a clear error message explaining how to enable the feature.
When a key IS configured, the existing Bearer token check already
gates access.
This is defense-in-depth: the API server is intended for local use,
but defense against cross-origin and shared-host attacks is important
since the default binding is 127.0.0.1 which is reachable from
browsers via DNS rebinding or localhost CORS.
* feat: API server model name derived from profile name
For multi-user setups (e.g. OpenWebUI), each profile's API server now
advertises a distinct model name on /v1/models:
- Profile 'lucas' -> model ID 'lucas'
- Profile 'admin' -> model ID 'admin'
- Default profile -> 'hermes-agent' (unchanged)
Explicit override via API_SERVER_MODEL_NAME env var or
platforms.api_server.model_name config for custom names.
Resolves friction where OpenWebUI couldn't distinguish multiple
hermes-agent connections all advertising the same model name.
* docs: multi-user setup with profiles for API server + Open WebUI
- api-server.md: added Multi-User Setup section, API_SERVER_MODEL_NAME
to config table, updated /v1/models description
- open-webui.md: added Multi-User Setup with Profiles section with
step-by-step guide, updated model name references
- environment-variables.md: added API_SERVER_MODEL_NAME entry
When /v1/runs receives an OpenAI-style array of messages as input, all
messages except the last user turn are now extracted as conversation_history.
Previously only the last message was kept, silently discarding earlier
context in multi-turn conversations.
Handles multi-part content blocks by flattening text portions. Only fires
when no explicit conversation_history was provided.
Based on PR #5837 by pradeep7127.
Allow clients to pass explicit conversation_history in /v1/responses and
/v1/runs request bodies instead of relying on server-side response chaining
via previous_response_id. Solves problems with stateless deployments where
the in-memory ResponseStore is lost on restart.
Adds input validation (must be array of {role, content} objects) and clear
precedence: explicit conversation_history > previous_response_id.
Based on PR #5805 by VanBladee, with added input validation.
Commit cc2b56b2 changed the tool_progress_callback signature from
(name, preview, args) to (event_type, name, preview, args, **kwargs)
but the API server's chat completion streaming callback was not updated.
This caused tool calls to not display in Open WebUI because the
callback received arguments in wrong positions.
- Update _on_tool_progress to use new 4-arg signature
- Add event_type filter to only show tool.started events
- Add **kwargs for optional duration/is_error parameters
Plain functions imported as class attributes in APIServerAdapter get
auto-bound as methods via Python's descriptor protocol. Every
self._cron_*() call injected self as the first positional argument,
causing TypeError on all 8 cron API endpoints at runtime.
Wrap each import with staticmethod() so self._cron_*() calls dispatch
correctly without modifying any call sites.
Co-authored-by: teknium <teknium@nousresearch.com>
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
The API server platform never passed fallback_model to AIAgent(),
so the fallback provider chain was always empty for requests through
the OpenAI-compatible endpoint. Load it via GatewayApp._load_fallback_model()
to match the behavior of Telegram/Discord/Slack platforms.
The API server adapter created AIAgent instances without passing
session_db, so conversations via Open WebUI and other OpenAI-compatible
frontends were never persisted to state.db. This meant 'hermes sessions
list' showed no API server sessions — they were effectively stateless.
Changes:
- Add _ensure_session_db() helper for lazy SessionDB initialization
- Pass session_db=self._ensure_session_db() in _create_agent()
- Refactor existing X-Hermes-Session-Id handler to use the shared helper
Sessions now persist with source='api_server' and are visible alongside
CLI and gateway sessions in hermes sessions list/search.
Reuse a single SessionDB across requests by caching on self._session_db
with lazy initialization. Avoids creating a new SQLite connection per
request when X-Hermes-Session-Id is used. Updated tests to set
adapter._session_db directly instead of patching the constructor.
Allow callers to pass X-Hermes-Session-Id in request headers to continue
an existing conversation. When provided, history is loaded from SessionDB
instead of the request body, and the session_id is echoed in the response
header. Without the header, existing behavior is preserved (new uuid per
request).
This enables web UI clients to maintain thread continuity without modifying
any session state themselves — the same mechanism the gateway uses for IM
platforms (Telegram, Discord, etc.).
Wire the existing tool_progress_callback through the API server's
streaming handler so Open WebUI users see what tool is running.
Uses the existing 3-arg callback signature (name, preview, args)
that fires at tool start — no changes to run_agent.py needed.
Progress appears as inline markdown in the SSE content stream.
Inspired by PR #4032 by sroecker, reimplemented to avoid breaking
the callback signature used by CLI and gateway consumers.
Add X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer
to all API server responses via a new security_headers_middleware.
Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
Adds Access-Control-Max-Age: 600 to CORS preflight responses, telling
browsers to cache the preflight for 10 minutes. Reduces redundant OPTIONS
requests and improves perceived latency for browser-based API clients.
Salvaged from PR #3514 by aydnOktay.
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
StreamResponse headers are flushed on prepare() before the CORS
middleware can inject them. Resolve CORS headers up front using
_cors_headers_for_origin() so the full set (including
Access-Control-Allow-Origin) is present on SSE streams.
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Add GET /v1/health as an alias to the existing /health endpoint so
OpenAI-compatible health checks work out of the box.
Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
Browser clients using the Idempotency-Key header for request
deduplication were blocked by CORS preflight because the header
was not listed in Access-Control-Allow-Headers.
Add Idempotency-Key to _CORS_HEADERS and add tests for both the
new header allowance and the existing Vary: Origin behavior.
Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Salvage of #3399 by @binhnt92 with true agent interruption added on top.
When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled.
Closes#3399.
The API server adapter was creating agents without specifying
enabled_toolsets, causing ALL tools to load — including clarify,
send_message, and text_to_speech which don't work without interactive
callbacks or gateway dispatch.
Changes:
- toolsets.py: Add hermes-api-server toolset (core tools minus clarify,
send_message, text_to_speech)
- api_server.py: Resolve toolsets from config.yaml platform_toolsets
via _get_platform_tools() — same path as all other gateway platforms.
Falls back to hermes-api-server default when no override configured.
- tools_config.py: Add api_server to PLATFORMS dict so users can
customize via 'hermes tools' or platform_toolsets.api_server in
config.yaml
- 12 tests covering toolset definition, config resolution, and
user override
Reported by thatwolfieguy on Discord.
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events
Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
* fix(run_agent): improve timeout handling for chat completions
Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
* fix(run_agent): reduce default stream read timeout for chat completions
Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
* fix(run_agent): enhance streaming error handling and retry logic
Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
* fix(api_server): streaming breaks when agent makes tool calls
The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.
After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.
Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.
Reported by Rohit Paul on X.
The /v1/responses endpoint used an in-memory OrderedDict that lost
all conversation state on gateway restart. Replace with SQLite-backed
storage at ~/.hermes/response_store.db.
- Responses and conversation name mappings survive restarts
- Same LRU eviction behavior (configurable max_size)
- WAL mode for concurrent read performance
- Falls back to in-memory SQLite if disk path unavailable
- Conversation name→response_id mapping moved into the store
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
CRUD + actions for cron jobs on the existing API server (port 8642):
GET /api/jobs — list jobs
POST /api/jobs — create job
GET /api/jobs/{id} — get job
PATCH /api/jobs/{id} — update job
DELETE /api/jobs/{id} — delete job
POST /api/jobs/{id}/pause — pause job
POST /api/jobs/{id}/resume — resume job
POST /api/jobs/{id}/run — trigger immediate run
All endpoints use existing API_SERVER_KEY auth. Job ID format
validated (12 hex chars). Logic ported from PR #2111 by nock4,
adapted from FastAPI to aiohttp on the existing API server.
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>