fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)

xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them.  The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image.  Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.

Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal.  Mirrors the audio_cache pattern used by text_to_speech.
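
A minimal sketch of the behaviour described above, not the helper as committed. The names save_url_image_sketch and infer_extension, the content-type table, the 64 KiB chunk size, the timeout, and the use of urllib are all assumptions made for illustration. Only the cache location, the 25MB default cap, the extension fallback order, and the cleanup/empty-body rules come from this message.

```python
import os
import urllib.request
import uuid
from pathlib import Path
from urllib.parse import urlparse

# Assumed table; the real helper may recognise a different set of types.
_EXT_BY_CONTENT_TYPE = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/webp": ".webp",
    "image/gif": ".gif",
}
_KNOWN_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
DEFAULT_MAX_BYTES = 25 * 1024 * 1024  # 25MB default cap, per the commit message


def infer_extension(content_type, url):
    """Prefer the Content-Type header, then the URL suffix, then .png."""
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    if mime in _EXT_BY_CONTENT_TYPE:
        return _EXT_BY_CONTENT_TYPE[mime]
    suffix = Path(urlparse(url).path).suffix.lower()
    if suffix in _KNOWN_SUFFIXES:
        return suffix
    return ".png"


def save_url_image_sketch(url, prefix="image", cache_dir=None,
                          max_bytes=DEFAULT_MAX_BYTES, timeout=30.0):
    """Download url into the image cache and return the local Path."""
    if cache_dir is None:
        cache_dir = Path(os.environ.get("HERMES_HOME", ".")) / "cache" / "images"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # urlopen raises urllib.error.HTTPError for a 404, so errors propagate.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        ext = infer_extension(resp.headers.get("Content-Type"), url)
        # A random component keeps filenames unique across calls.
        dest = cache_dir / f"{prefix}_{uuid.uuid4().hex}{ext}"
        written = 0
        try:
            with open(dest, "wb") as fh:
                while True:
                    chunk = resp.read(64 * 1024)
                    if not chunk:
                        break
                    written += len(chunk)
                    if written > max_bytes:
                        raise ValueError(f"image exceeds {max_bytes} bytes")
                    fh.write(chunk)
            if written == 0:
                raise ValueError("image response body was empty")
        except BaseException:
            # Never leave a partial or empty file behind in the cache.
            dest.unlink(missing_ok=True)
            raise
    return dest
```

Streaming in chunks lets the cap trigger before an oversized body is fully written, and the single cleanup path covers both the oversize and the empty-body case.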

Wires save_url_image into both the xAI and OpenAI providers' URL
branches.  When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.

Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
  in-process HTTP server: bytes round-trip, content-type → extension,
  URL-suffix fallback, default-to-png, 404 propagation, empty-body
  refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
  test_successful_url_response (was asserting the bug), add
  test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.

All 160 tests in the broader image_gen test surface pass (160/160).
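
For readers unfamiliar with the "real in-process HTTP server" approach in the test plan, here is a rough sketch of how such a fixture might look. The serve and fetch helpers and the route-table shape are assumptions for illustration and are not taken from tests/agent/test_save_url_image.py; fetch stands in for the helper under test.

```python
import contextlib
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


@contextlib.contextmanager
def serve(routes):
    """Serve {path: (status, content_type, body)} on an ephemeral local port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, content_type, body = routes.get(
                self.path, (404, "text/plain", b"gone")
            )
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_address[1]}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def fetch(url):
    """Fetch url without proxies; return (content_type, body)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.headers.get("Content-Type"), resp.read()


def test_bytes_round_trip():
    with serve({"/ok.png": (200, "image/png", b"\x89PNG-bytes")}) as base:
        assert fetch(base + "/ok.png") == ("image/png", b"\x89PNG-bytes")


def test_404_propagates():
    with serve({}) as base:
        try:
            fetch(base + "/xai-tmp-expired")
        except urllib.error.HTTPError as exc:
            assert exc.code == 404
        else:
            raise AssertionError("expected HTTPError")
```

Binding to port 0 avoids collisions between parallel test runs, and going through a real socket exercises header parsing and status handling that a mocked client would skip.
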
Teknium 2026-05-24 18:10:47 -07:00 committed by GitHub
parent af973e4071
commit 031f9c9edc
GPG key ID: B5690EEEBB952194
6 changed files with 375 additions and 10 deletions

@@ -33,6 +33,7 @@ from agent.image_gen_provider import (
     error_response,
     resolve_aspect_ratio,
     save_b64_image,
+    save_url_image,
     success_response,
 )
@@ -266,9 +267,21 @@ class OpenAIImageGenProvider(ImageGenProvider):
             )
             image_ref = str(saved_path)
         elif url:
-            # Defensive — gpt-image-2 returns b64 today, but fall back
-            # gracefully if the API ever changes.
-            image_ref = url
+            # Defensive — gpt-image-2 returns b64 today, but OpenAI's API
+            # has previously returned URLs. Cache the bytes locally so the
+            # gateway never tries to fetch an ephemeral / signed URL after
+            # it expires — same rationale as the xAI provider (#26942).
+            try:
+                saved_path = save_url_image(url, prefix=f"openai_{tier_id}")
+            except Exception as exc:
+                logger.warning(
+                    "OpenAI image URL %s could not be cached (%s); falling back to bare URL.",
+                    url,
+                    exc,
+                )
+                image_ref = url
+            else:
+                image_ref = str(saved_path)
         else:
             return error_response(
                 error="OpenAI response contained neither b64_json nor URL",

@@ -29,6 +29,7 @@ from agent.image_gen_provider import (
     error_response,
     resolve_aspect_ratio,
     save_b64_image,
+    save_url_image,
     success_response,
 )
 from tools.xai_http import hermes_xai_user_agent, resolve_xai_http_credentials
@@ -281,7 +282,24 @@ class XAIImageGenProvider(ImageGenProvider):
             )
             image_ref = str(saved_path)
         elif url:
-            image_ref = url
+            # xAI's grok-imagine-image returns ephemeral ``imgen.x.ai/xai-tmp-*``
+            # URLs that 404 within minutes — by the time Telegram's
+            # ``send_photo`` or any downstream consumer fetches them, the
+            # asset is gone (#26942). Materialise the bytes locally at
+            # tool-completion time so the gateway has a stable file path to
+            # upload, mirroring the b64 branch above and the audio_cache
+            # pattern used by text_to_speech.
+            try:
+                saved_path = save_url_image(url, prefix=f"xai_{model_id}")
+            except Exception as exc:
+                logger.warning(
+                    "xAI image URL %s could not be cached (%s); falling back to bare URL.",
+                    url,
+                    exc,
+                )
+                image_ref = url
+            else:
+                image_ref = str(saved_path)
         else:
             return error_response(
                 error="xAI response contained neither b64_json nor URL",