hermes-agent/tests/gateway/test_api_server_multimodal.py
Teknium f683132c1d
feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969)
OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision
requests to the API server. Both endpoints accept the canonical OpenAI
multimodal shape:

  Chat Completions: {type: text|image_url, image_url: {url, detail?}}
  Responses:        {type: input_text|input_image, image_url: <str>, detail?}

The server validates and converts both into a single internal shape that the
existing agent pipeline already handles (Anthropic adapter converts,
OpenAI-wire providers pass through). Remote http(s) URLs and data:image/*
URLs are supported.

Uploaded files (file, input_file, file_id) and non-image data: URLs are
rejected with 400 unsupported_content_type.

Changes:

- gateway/platforms/api_server.py
  - _normalize_multimodal_content(): validates + normalizes both Chat and
    Responses content shapes. Returns a plain string for text-only content
    (preserves prompt-cache behavior on existing callers) or a canonical
    [{type:text|image_url,...}] list when images are present.
  - _content_has_visible_payload(): replaces the bare truthy check so a
    user turn with only an image no longer rejects as 'No user message'.
  - _handle_chat_completions and _handle_responses both call the new helper
    for user/assistant content; system messages continue to flatten to text.
  - Codex conversation_history, input[], and inline history paths all share
    the same validator. No duplicated normalizers.

- run_agent.py
  - _summarize_user_message_for_log(): produces a short string summary
    ('[1 image] describe this') from list content for logging, spinner
    previews, and trajectory writes. Fixes AttributeError when list
    user_message hit user_message[:80] + '...' / .replace().
  - _chat_content_to_responses_parts(): module-level helper that converts
    chat-style multimodal content to Responses 'input_text'/'input_image'
    parts. Used in _chat_messages_to_responses_input for Codex routing.
  - _preflight_codex_input_items() now validates and passes through list
    content parts for user/assistant messages instead of stringifying.

- tests/gateway/test_api_server_multimodal.py (new, 38 tests)
  - Unit coverage for _normalize_multimodal_content, including both part
    formats, data URL gating, and all reject paths.
  - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses
    verifying multimodal payloads reach _run_agent intact.
  - 400 coverage for file / input_file / non-image data URL.

- tests/run_agent/test_run_agent_multimodal_prologue.py (new)
  - Regression coverage for the prologue no-crash contract.
  - _chat_content_to_responses_parts round-trip coverage.

- website/docs/user-guide/features/api-server.md
  - Inline image examples for both endpoints.
  - Updated Limitations: files still unsupported, images now supported.

Validated live against openrouter/anthropic/claude-opus-4.6:
  POST /v1/chat/completions  → 200, vision-accurate description
  POST /v1/responses         → 200, same image, clean output_text
  POST /v1/chat/completions [file] → 400 unsupported_content_type
  POST /v1/responses [input_file]  → 400 unsupported_content_type
  POST /v1/responses [non-image data URL] → 400 unsupported_content_type

Closes #5621, #8253, #4046, #6632.

Co-authored-by: Paul Bergeron <paul@gamma.app>
Co-authored-by: zhangxicen <zhangxicen@example.com>
Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com>
Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>
2026-04-20 04:16:13 -07:00

308 lines
12 KiB
Python

"""End-to-end tests for inline image inputs on /v1/chat/completions and /v1/responses.
Covers the multimodal normalization path added to the API server. Unlike the
adapter-level tests that patch ``_run_agent``, these tests patch
``AIAgent.run_conversation`` instead so the adapter's full request-handling
path (including the ``run_agent`` prologue that used to crash on list content)
executes against a real aiohttp app.
"""
from unittest.mock import MagicMock, patch
import pytest
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer
from gateway.config import PlatformConfig
from gateway.platforms.api_server import (
APIServerAdapter,
_content_has_visible_payload,
_normalize_multimodal_content,
cors_middleware,
security_headers_middleware,
)
# ---------------------------------------------------------------------------
# Pure-function tests for _normalize_multimodal_content
# ---------------------------------------------------------------------------
class TestNormalizeMultimodalContent:
def test_string_passthrough(self):
assert _normalize_multimodal_content("hello") == "hello"
def test_none_returns_empty_string(self):
assert _normalize_multimodal_content(None) == ""
def test_text_only_list_collapses_to_string(self):
content = [{"type": "text", "text": "hi"}, {"type": "text", "text": "there"}]
assert _normalize_multimodal_content(content) == "hi\nthere"
def test_responses_input_text_canonicalized(self):
content = [{"type": "input_text", "text": "hello"}]
assert _normalize_multimodal_content(content) == "hello"
def test_image_url_preserved_with_text(self):
content = [
{"type": "text", "text": "describe this"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}},
]
out = _normalize_multimodal_content(content)
assert isinstance(out, list)
assert out == [
{"type": "text", "text": "describe this"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}},
]
def test_input_image_converted_to_canonical_shape(self):
content = [
{"type": "input_text", "text": "hi"},
{"type": "input_image", "image_url": "https://example.com/cat.png"},
]
out = _normalize_multimodal_content(content)
assert out == [
{"type": "text", "text": "hi"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
def test_data_image_url_accepted(self):
content = [{"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}]
out = _normalize_multimodal_content(content)
assert out == [{"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}]
def test_non_image_data_url_rejected(self):
content = [{"type": "image_url", "image_url": {"url": "data:text/plain;base64,SGVsbG8="}}]
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content(content)
assert str(exc.value).startswith("unsupported_content_type:")
def test_file_part_rejected(self):
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content([{"type": "file", "file": {"file_id": "f_1"}}])
assert str(exc.value).startswith("unsupported_content_type:")
def test_input_file_part_rejected(self):
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content([{"type": "input_file", "file_id": "f_1"}])
assert str(exc.value).startswith("unsupported_content_type:")
def test_missing_url_rejected(self):
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content([{"type": "image_url", "image_url": {}}])
assert str(exc.value).startswith("invalid_image_url:")
def test_bad_scheme_rejected(self):
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content([{"type": "image_url", "image_url": {"url": "ftp://example.com/x.png"}}])
assert str(exc.value).startswith("invalid_image_url:")
def test_unknown_part_type_rejected(self):
with pytest.raises(ValueError) as exc:
_normalize_multimodal_content([{"type": "audio", "audio": {}}])
assert str(exc.value).startswith("unsupported_content_type:")
class TestContentHasVisiblePayload:
def test_non_empty_string(self):
assert _content_has_visible_payload("hello")
def test_whitespace_only_string(self):
assert not _content_has_visible_payload(" ")
def test_list_with_image_only(self):
assert _content_has_visible_payload([{"type": "image_url", "image_url": {"url": "x"}}])
def test_list_with_only_empty_text(self):
assert not _content_has_visible_payload([{"type": "text", "text": ""}])
# ---------------------------------------------------------------------------
# HTTP integration — real aiohttp client hitting the adapter handlers
# ---------------------------------------------------------------------------
def _make_adapter() -> APIServerAdapter:
return APIServerAdapter(PlatformConfig(enabled=True))
def _create_app(adapter: APIServerAdapter) -> web.Application:
mws = [mw for mw in (cors_middleware, security_headers_middleware) if mw is not None]
app = web.Application(middlewares=mws)
app["api_server_adapter"] = adapter
app.router.add_post("/v1/chat/completions", adapter._handle_chat_completions)
app.router.add_post("/v1/responses", adapter._handle_responses)
app.router.add_get("/v1/responses/{response_id}", adapter._handle_get_response)
return app
@pytest.fixture
def adapter():
return _make_adapter()
class TestChatCompletionsMultimodalHTTP:
@pytest.mark.asyncio
async def test_inline_image_preserved_to_run_agent(self, adapter):
"""Multimodal user content reaches _run_agent as a list of parts."""
image_payload = [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}},
]
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
with patch.object(
adapter,
"_run_agent",
new=MagicMock(),
) as mock_run:
async def _stub(**kwargs):
mock_run.captured = kwargs
return (
{"final_response": "A cat.", "messages": [], "api_calls": 1},
{"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
)
mock_run.side_effect = _stub
resp = await cli.post(
"/v1/chat/completions",
json={
"model": "hermes-agent",
"messages": [{"role": "user", "content": image_payload}],
},
)
assert resp.status == 200, await resp.text()
assert mock_run.captured["user_message"] == image_payload
@pytest.mark.asyncio
async def test_text_only_array_collapses_to_string(self, adapter):
"""Text-only array becomes a plain string so logging stays unchanged."""
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
with patch.object(adapter, "_run_agent", new=MagicMock()) as mock_run:
async def _stub(**kwargs):
mock_run.captured = kwargs
return (
{"final_response": "ok", "messages": [], "api_calls": 1},
{"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
)
mock_run.side_effect = _stub
resp = await cli.post(
"/v1/chat/completions",
json={
"model": "hermes-agent",
"messages": [
{"role": "user", "content": [{"type": "text", "text": "hello"}]},
],
},
)
assert resp.status == 200, await resp.text()
assert mock_run.captured["user_message"] == "hello"
@pytest.mark.asyncio
async def test_file_part_returns_400(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.post(
"/v1/chat/completions",
json={
"model": "hermes-agent",
"messages": [
{"role": "user", "content": [{"type": "file", "file": {"file_id": "f_1"}}]},
],
},
)
assert resp.status == 400
body = await resp.json()
assert body["error"]["code"] == "unsupported_content_type"
assert body["error"]["param"] == "messages[0].content"
@pytest.mark.asyncio
async def test_non_image_data_url_returns_400(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.post(
"/v1/chat/completions",
json={
"model": "hermes-agent",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": "data:text/plain;base64,SGVsbG8="},
},
],
},
],
},
)
assert resp.status == 400
body = await resp.json()
assert body["error"]["code"] == "unsupported_content_type"
class TestResponsesMultimodalHTTP:
@pytest.mark.asyncio
async def test_input_image_canonicalized_and_forwarded(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
with patch.object(adapter, "_run_agent", new=MagicMock()) as mock_run:
async def _stub(**kwargs):
mock_run.captured = kwargs
return (
{"final_response": "ok", "messages": [], "api_calls": 1},
{"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
)
mock_run.side_effect = _stub
resp = await cli.post(
"/v1/responses",
json={
"model": "hermes-agent",
"input": [
{
"role": "user",
"content": [
{"type": "input_text", "text": "Describe."},
{
"type": "input_image",
"image_url": "https://example.com/cat.png",
},
],
}
],
},
)
assert resp.status == 200, await resp.text()
expected = [
{"type": "text", "text": "Describe."},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
assert mock_run.captured["user_message"] == expected
@pytest.mark.asyncio
async def test_input_file_returns_400(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.post(
"/v1/responses",
json={
"model": "hermes-agent",
"input": [
{
"role": "user",
"content": [{"type": "input_file", "file_id": "f_1"}],
}
],
},
)
assert resp.status == 400
body = await resp.json()
assert body["error"]["code"] == "unsupported_content_type"