hermes-agent/website/docs/user-guide/features
Teknium f683132c1d
feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969)
OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision
requests to the API server. Both endpoints accept the canonical OpenAI
multimodal shape:

  Chat Completions: {type: text|image_url, image_url: {url, detail?}}
  Responses:        {type: input_text|input_image, image_url: <str>, detail?}

The server validates and converts both into a single internal shape that the
existing agent pipeline already handles (Anthropic adapter converts,
OpenAI-wire providers pass through). Remote http(s) URLs and data:image/*
URLs are supported.

Uploaded files (file, input_file, file_id) and non-image data: URLs are
rejected with 400 unsupported_content_type.

Changes:

- gateway/platforms/api_server.py
  - _normalize_multimodal_content(): validates + normalizes both Chat and
    Responses content shapes. Returns a plain string for text-only content
    (preserves prompt-cache behavior on existing callers) or a canonical
    [{type:text|image_url,...}] list when images are present.
  - _content_has_visible_payload(): replaces the bare truthy check so a
    user turn with only an image no longer rejects as 'No user message'.
  - _handle_chat_completions and _handle_responses both call the new helper
    for user/assistant content; system messages continue to flatten to text.
  - Codex conversation_history, input[], and inline history paths all share
    the same validator. No duplicated normalizers.

- run_agent.py
  - _summarize_user_message_for_log(): produces a short string summary
    ('[1 image] describe this') from list content for logging, spinner
    previews, and trajectory writes. Fixes AttributeError when list
    user_message hit user_message[:80] + '...' / .replace().
  - _chat_content_to_responses_parts(): module-level helper that converts
    chat-style multimodal content to Responses 'input_text'/'input_image'
    parts. Used in _chat_messages_to_responses_input for Codex routing.
  - _preflight_codex_input_items() now validates and passes through list
    content parts for user/assistant messages instead of stringifying.

- tests/gateway/test_api_server_multimodal.py (new, 38 tests)
  - Unit coverage for _normalize_multimodal_content, including both part
    formats, data URL gating, and all reject paths.
  - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses
    verifying multimodal payloads reach _run_agent intact.
  - 400 coverage for file / input_file / non-image data URL.

- tests/run_agent/test_run_agent_multimodal_prologue.py (new)
  - Regression coverage for the prologue no-crash contract.
  - _chat_content_to_responses_parts round-trip coverage.

- website/docs/user-guide/features/api-server.md
  - Inline image examples for both endpoints.
  - Updated Limitations: files still unsupported, images now supported.

Validated live against openrouter/anthropic/claude-opus-4.6:
  POST /v1/chat/completions  → 200, vision-accurate description
  POST /v1/responses         → 200, same image, clean output_text
  POST /v1/chat/completions [file] → 400 unsupported_content_type
  POST /v1/responses [input_file]  → 400 unsupported_content_type
  POST /v1/responses [non-image data URL] → 400 unsupported_content_type

Closes #5621, #8253, #4046, #6632.

Co-authored-by: Paul Bergeron <paul@gamma.app>
Co-authored-by: zhangxicen <zhangxicen@example.com>
Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com>
Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>
2026-04-20 04:16:13 -07:00
..
_category_.json feat: add documentation website (Docusaurus) 2026-03-05 05:24:55 -08:00
acp.md docs(acp): fix zed config 2026-04-03 01:46:45 -07:00
api-server.md feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969) 2026-04-20 04:16:13 -07:00
batch-processing.md fix: normalize remaining reasoning effort orderings and add missing 'minimal' 2026-04-09 14:20:16 -07:00
browser.md feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369) 2026-04-19 00:03:10 -07:00
code-execution.md docs(execute_code): document project/strict execution modes (#12073) 2026-04-18 01:53:09 -07:00
context-files.md feat: progressive subdirectory hint discovery (#5291) 2026-04-05 12:33:47 -07:00
context-references.md docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393) 2026-04-05 19:45:50 -07:00
credential-pools.md docs: comprehensive docs audit — cover 13 features from last week's PRs (#5815) 2026-04-07 10:21:03 -07:00
cron.md feat(skills): consolidate find-nearby into maps as a single location skill 2026-04-19 05:19:22 -07:00
dashboard-plugins.md docs: add dashboard themes and plugins documentation 2026-04-16 04:10:06 -07:00
delegation.md docs: comprehensive docs audit — cover 13 features from last week's PRs (#5815) 2026-04-07 10:21:03 -07:00
fallback-providers.md docs(config): document session_search auxiliary controls 2026-04-20 00:47:39 -07:00
honcho.md feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback 2026-04-18 22:50:55 -07:00
hooks.md docs: correctness audit — fix wrong values, add missing coverage (#11972) 2026-04-18 01:45:48 -07:00
image-generation.md feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406) 2026-04-16 22:05:41 -07:00
mcp.md docs: deep quality pass — expand 10 thin pages, fix specific issues (#4134) 2026-03-30 20:30:11 -07:00
memory-providers.md feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback 2026-04-18 22:50:55 -07:00
memory.md docs: add Supermemory to memory providers docs, env vars, CLI reference 2026-04-06 22:15:58 -07:00
overview.md feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406) 2026-04-16 22:05:41 -07:00
personality.md docs: document SOUL.md as primary agent identity (#1927) 2026-03-18 04:18:08 -07:00
plugins.md docs: document register_command() for plugin slash commands (#10671) 2026-04-15 19:55:25 -07:00
provider-routing.md docs: fallback providers + /background command documentation 2026-03-15 06:24:28 -07:00
rl-training.md docs: fix stale and incorrect documentation across 18 files 2026-03-24 07:53:07 -07:00
skills.md feat(skills): add 'hermes skills reset' to un-stick bundled skills (#11468) 2026-04-17 00:41:31 -07:00
skins.md feat(skin): add warm-lightmode skin from PR #4811 2026-04-13 23:51:21 -07:00
tool-gateway.md feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406) 2026-04-16 22:05:41 -07:00
tools.md docs: add Nous Tool Gateway documentation 2026-04-16 12:36:49 -07:00
tts.md docs: correctness audit — fix wrong values, add missing coverage (#11972) 2026-04-18 01:45:48 -07:00
vision.md docs: add Vision & Image Paste guide with platform compatibility 2026-03-05 23:51:46 -08:00
voice-mode.md docs: fix 40+ discrepancies between documentation and codebase (#5818) 2026-04-07 10:17:44 -07:00
web-dashboard.md docs: add dashboard themes and plugins documentation 2026-04-16 04:10:06 -07:00