When the user's provider (e.g. ollama-cloud) is not in _PROVIDER_VISION_MODELS,
auto-detection falls through to the main model, which may not be
vision-capable (e.g. llama3). The image payload is then sent to a text-only
model, and the request fails.

Add an _is_likely_vision_model() heuristic that checks the model name for
known vision/multimodal indicators. When no explicit vision override exists
and the main model does not appear vision-capable, skip directly to the
aggregator fallbacks instead of attempting a doomed request.