fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks

GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:

    model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
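The gating helper `_should_use_copilot_responses_api` lives in `hermes_cli.models` and is not shown in this diff; a minimal sketch of the rule described above (GPT-5+ models need the Responses API, with `gpt-5-mini` as the lone exception) might look like this. The regex and function body here are assumptions, not the actual implementation:

```python
import re

def should_use_copilot_responses_api(model: str) -> bool:
    """Sketch of the gating rule: Copilot GPT-5+ models (except
    gpt-5-mini) are only reachable via the Responses API."""
    if model == "gpt-5-mini":
        # The one GPT-5 model still served by /chat/completions.
        return False
    m = re.match(r"gpt-(\d+)", model)
    return bool(m) and int(m.group(1)) >= 5
```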

resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.

Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
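The two paths those tests exercise can be sketched with a stub in place of the real `CodexAuxiliaryClient` (the stub, helper name, and predicate below are illustrative assumptions, not the project's API):

```python
class CodexAuxiliaryStub:
    """Hypothetical stand-in for CodexAuxiliaryClient: wraps a plain
    client so calls can be routed through responses.stream()."""
    def __init__(self, inner, model):
        self.inner = inner
        self.model = model

def wrap_if_needed(client, provider, model, needs_responses_api):
    # Mirrors the copilot branch added to resolve_provider_client():
    # only copilot models that require the Responses API get wrapped.
    if provider == "copilot" and needs_responses_api(model):
        return CodexAuxiliaryStub(client, model)
    return client

predicate = lambda m: m.startswith("gpt-5") and m != "gpt-5-mini"

# Wrapping path: gpt-5.4-mini is routed through the wrapper.
wrapped = wrap_if_needed("plain-client", "copilot", "gpt-5.4-mini", predicate)
# Non-wrapping path: gpt-4.1-mini keeps the plain client.
plain = wrap_if_needed("plain-client", "copilot", "gpt-4.1-mini", predicate)
```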
hermes-agent-dhabibi 2026-04-10 04:35:07 +00:00 committed by Teknium
parent 718e8ad6fa
commit c1af614289
2 changed files with 80 additions and 0 deletions

@@ -1425,6 +1425,23 @@ def resolve_provider_client(
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
# Copilot GPT-5+ models (except gpt-5-mini) require the Responses
# API — they are not accessible via /chat/completions. Wrap the
# plain client in CodexAuxiliaryClient so call_llm() transparently
# routes through responses.stream().
if provider == "copilot" and final_model and not raw_codex:
try:
from hermes_cli.models import _should_use_copilot_responses_api
if _should_use_copilot_responses_api(final_model):
logger.debug(
"resolve_provider_client: copilot model %s needs "
"Responses API — wrapping with CodexAuxiliaryClient",
final_model)
client = CodexAuxiliaryClient(client, final_model)
except ImportError:
pass
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))