hermes-agent/gateway/platforms
Teknium f683132c1d
feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969)
OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision
requests to the API server. Both endpoints accept the canonical OpenAI
multimodal shape:

  Chat Completions: {type: text|image_url, image_url: {url, detail?}}
  Responses:        {type: input_text|input_image, image_url: <str>, detail?}

The server validates and converts both into a single internal shape that the
existing agent pipeline already handles (Anthropic adapter converts,
OpenAI-wire providers pass through). Remote http(s) URLs and data:image/*
URLs are supported.

Uploaded files (file, input_file, file_id) and non-image data: URLs are
rejected with 400 unsupported_content_type.

Changes:

- gateway/platforms/api_server.py
  - _normalize_multimodal_content(): validates + normalizes both Chat and
    Responses content shapes. Returns a plain string for text-only content
    (preserves prompt-cache behavior on existing callers) or a canonical
    [{type:text|image_url,...}] list when images are present.
  - _content_has_visible_payload(): replaces the bare truthy check so a
    user turn with only an image no longer rejects as 'No user message'.
  - _handle_chat_completions and _handle_responses both call the new helper
    for user/assistant content; system messages continue to flatten to text.
  - Codex conversation_history, input[], and inline history paths all share
    the same validator. No duplicated normalizers.

- run_agent.py
  - _summarize_user_message_for_log(): produces a short string summary
    ('[1 image] describe this') from list content for logging, spinner
    previews, and trajectory writes. Fixes AttributeError when list
    user_message hit user_message[:80] + '...' / .replace().
  - _chat_content_to_responses_parts(): module-level helper that converts
    chat-style multimodal content to Responses 'input_text'/'input_image'
    parts. Used in _chat_messages_to_responses_input for Codex routing.
  - _preflight_codex_input_items() now validates and passes through list
    content parts for user/assistant messages instead of stringifying.

- tests/gateway/test_api_server_multimodal.py (new, 38 tests)
  - Unit coverage for _normalize_multimodal_content, including both part
    formats, data URL gating, and all reject paths.
  - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses
    verifying multimodal payloads reach _run_agent intact.
  - 400 coverage for file / input_file / non-image data URL.

- tests/run_agent/test_run_agent_multimodal_prologue.py (new)
  - Regression coverage for the prologue no-crash contract.
  - _chat_content_to_responses_parts round-trip coverage.

- website/docs/user-guide/features/api-server.md
  - Inline image examples for both endpoints.
  - Updated Limitations: files still unsupported, images now supported.

Validated live against openrouter/anthropic/claude-opus-4.6:
  POST /v1/chat/completions  → 200, vision-accurate description
  POST /v1/responses         → 200, same image, clean output_text
  POST /v1/chat/completions [file] → 400 unsupported_content_type
  POST /v1/responses [input_file]  → 400 unsupported_content_type
  POST /v1/responses [non-image data URL] → 400 unsupported_content_type

Closes #5621, #8253, #4046, #6632.

Co-authored-by: Paul Bergeron <paul@gamma.app>
Co-authored-by: zhangxicen <zhangxicen@example.com>
Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com>
Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>
2026-04-20 04:16:13 -07:00
..
qqbot fix(qqbot): add back-compat for env var rename; drop qrcode core dep 2026-04-17 15:31:14 -07:00
__init__.py feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions 2026-04-14 00:11:49 -07:00
ADDING_A_PLATFORM.md docs: finish cron terminology cleanup 2026-03-14 19:20:58 -07:00
api_server.py feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969) 2026-04-20 04:16:13 -07:00
base.py fix: tighten gateway interrupt salvage follow-ups 2026-04-19 03:03:57 -07:00
bluebubbles.py fix(gateway/bluebubbles): embed password in registered webhook URL for inbound auth 2026-04-14 11:02:48 -07:00
dingtalk.py feat(dingtalk): AI Cards streaming, emoji reactions, and media handling 2026-04-17 19:26:53 -07:00
discord.py fix(gateway): accept finalize kwarg in all platform edit_message overrides 2026-04-19 22:46:47 -07:00
email.py fix(gateway): validate Slack image downloads before caching 2026-04-10 03:53:09 -07:00
feishu.py feat(feishu): show processing state via reactions on user messages 2026-04-20 02:04:57 -07:00
feishu_comment.py feat: add Feishu document comment intelligent reply with 3-tier access control 2026-04-17 19:04:11 -07:00
feishu_comment_rules.py fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP 2026-04-17 19:04:11 -07:00
helpers.py fix: enforce TTL in MessageDeduplicator + use yaml for gateway --config (#10306, #10216) (#10509) 2026-04-15 13:35:40 -07:00
homeassistant.py fix(gateway): add request timeouts to HA, Email, Mattermost, SMS adapters (#3258) 2026-03-26 14:36:07 -07:00
matrix.py fix(gateway): accept finalize kwarg in all platform edit_message overrides 2026-04-19 22:46:47 -07:00
mattermost.py fix(gateway): accept finalize kwarg in all platform edit_message overrides 2026-04-19 22:46:47 -07:00
signal.py fix(gateway): prevent scoped lock and resource leaks on connection failure 2026-04-20 01:44:36 -07:00
slack.py fix(gateway): prevent scoped lock and resource leaks on connection failure 2026-04-20 01:44:36 -07:00
sms.py remove unused import and fix misleading log 2026-04-11 14:05:38 -07:00
telegram.py fix(gateway): make Telegram DM topic config writes atomic 2026-04-20 00:57:53 -07:00
telegram_network.py feat(telegram): add dedicated TELEGRAM_PROXY env var and config.yaml proxy_url support 2026-04-15 22:13:11 -07:00
webhook.py fix(webhook): validate HMAC signature before rate limiting (#12544) 2026-04-19 22:45:08 -07:00
wecom.py fix(wecom): bound req_id cache, revert undocumented is_group change, add tests 2026-04-17 19:03:29 -07:00
wecom_callback.py fix: activate WeCom callback message deduplication (#10305) (#10588) 2026-04-15 17:22:58 -07:00
wecom_crypto.py feat(gateway): add WeCom callback-mode adapter for self-built apps 2026-04-11 15:22:49 -07:00
weixin.py Fix Weixin media uploads and refresh lockfile 2026-04-17 06:50:36 -07:00
whatsapp.py fix(gateway): prevent scoped lock and resource leaks on connection failure 2026-04-20 01:44:36 -07:00