fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint

Follow-up improvements on top of @konsisumer's cherry-picked fix for #10648:

1. Deprecation patterns now require BOTH a product fingerprint ('gh-copilot')
   and a deprecation marker. The previous list included 'copilot-cli' and bare
   'deprecation' as standalone patterns, which would false-positive on stderr
   from the NEW @github/copilot CLI, whose repo is literally
   github.com/github/copilot-cli and which legitimately surfaces those
   substrings in its own messages.

2. Replace the deprecation hint. The user in #10648 installed
   'gh extension install github/gh-copilot' (the deprecated extension)
   thinking that's what ACP mode uses, when ACP actually spawns the new
   'copilot' binary from '@github/copilot'. The hint now points users at the
   correct install command ('npm install -g @github/copilot') with the new
   CLI's repo URL, and demotes provider-switching to a fallback.

3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the
   'github-models' alias to the canonical 'copilot' provider id, matching the
   convention used by every other entry in the table.

4. Sharpen the 413 hint message. The free tier's ~8K cap is below the floor
   set by the system prompt plus tool schemas, so this endpoint is
   fundamentally incompatible with an agentic loop. It is not a 'use a
   different URL' problem.
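The two-condition rule in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this commit. The function name `looks_like_gh_copilot_deprecation` and the exact marker list are hypothetical. Only the 'gh-copilot' fingerprint and the bare 'deprecation' substring come from the message above.

```python
# Minimal sketch of "fingerprint AND marker" deprecation matching.
# Hypothetical names and marker list; only 'gh-copilot' and 'deprecation'
# are taken from the commit message.
PRODUCT_FINGERPRINTS = ("gh-copilot",)
DEPRECATION_MARKERS = ("deprecated", "deprecation", "no longer supported")


def looks_like_gh_copilot_deprecation(stderr: str) -> bool:
    """Return True only when stderr names the old extension AND flags it deprecated."""
    text = stderr.lower()
    has_product = any(fp in text for fp in PRODUCT_FINGERPRINTS)
    has_marker = any(marker in text for marker in DEPRECATION_MARKERS)
    # Requiring both keeps stderr from the new @github/copilot CLI, which may
    # mention 'copilot-cli' or 'deprecation' on its own, from matching.
    return has_product and has_marker
```

With this rule, "The gh-copilot extension has been deprecated." matches. A line such as "See https://github.com/github/copilot-cli for details." does not match, because it lacks the 'gh-copilot' fingerprint.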

Tests:
- New parametrized false-positive coverage for the new CLI's stderr shape.
- Updated assertion to require canonical 'copilot' provider mapping.
- All 14 deprecation/URL tests pass.
teknium1 2026-05-16 01:58:13 -07:00 committed by Teknium
parent b85b938b1f
commit 374dc81c23
4 changed files with 84 additions and 38 deletions

@@ -14185,29 +14185,35 @@ class AIAgent:
 }
 # Actionable hint for GitHub Models (Azure) 413 errors.
-# The free tier enforces a hard 8K token limit per request,
-# which Hermes' system prompt alone can exceed. Compression
-# won't help — surface a clear message so the user doesn't
-# wait through three futile compression attempts.
+# The free tier enforces a hard 8K token cap per request,
+# which Hermes' system prompt + tool schemas alone exceed.
+# Compression can't help — the floor is the system prompt
+# itself, not the conversation — so surface a clear "not
+# compatible" message instead of looping into three futile
+# compression attempts.
 if (
     status_code == 413
     and isinstance(_base, str)
     and "models.inference.ai.azure.com" in _base
 ):
     self._vprint(
-        f"{self.log_prefix} 💡 GitHub Models (Azure) enforces a hard per-request token limit (often 8K).",
+        f"{self.log_prefix} 💡 GitHub Models free tier (models.inference.ai.azure.com) caps every",
         force=True,
     )
     self._vprint(
-        f"{self.log_prefix} Hermes' system prompt alone may exceed this limit. This endpoint is not",
+        f"{self.log_prefix} request at ~8K tokens. Hermes' system prompt + tool schemas baseline",
         force=True,
     )
     self._vprint(
-        f"{self.log_prefix} compatible with Hermes Agent. Use https://models.github.ai or the GitHub",
+        f"{self.log_prefix} exceeds that floor, so this endpoint cannot run an agentic loop.",
         force=True,
     )
     self._vprint(
-        f"{self.log_prefix} Copilot provider instead, which have higher token limits.",
+        f"{self.log_prefix} Use the `copilot` provider with a Copilot subscription token (`hermes",
         force=True,
     )
+    self._vprint(
+        f"{self.log_prefix} setup` → GitHub Copilot), or pick any other provider.",
+        force=True,
+    )
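The guard in this hunk and the reasoning in its comment can be modeled roughly as below. The comment's point is that compression can shrink the conversation but never the system-prompt-plus-tool-schemas floor. This is a toy sketch under stated assumptions. `is_github_models_413` mirrors the diff's condition. `simulate_compression` and its halve-the-history strategy are hypothetical stand-ins, not Hermes' actual compression logic.

```python
# Toy model of why compression cannot rescue a GitHub Models free-tier 413.
# is_github_models_413 mirrors the condition in the diff; simulate_compression
# is a hypothetical stand-in that simply halves the history on each attempt.
FREE_TIER_HOST = "models.inference.ai.azure.com"


def is_github_models_413(status_code: int, base_url: object) -> bool:
    """True for a 413 returned by the GitHub Models free-tier endpoint."""
    return (
        status_code == 413
        and isinstance(base_url, str)
        and FREE_TIER_HOST in base_url
    )


def simulate_compression(baseline: int, history: int, cap: int, attempts: int = 3) -> list[int]:
    """Return the request size (in tokens) after each compression attempt.

    baseline: system prompt + tool schemas, which compression never touches.
    history:  conversation tokens, halved on every attempt.
    Stops early once a request fits under cap.
    """
    sizes = []
    for _ in range(attempts):
        history //= 2
        sizes.append(baseline + history)
        if sizes[-1] <= cap:
            break
    return sizes
```

Consider an illustrative case (not a measured value) with a 9,000-token baseline, 4,000 tokens of history, and an 8,000-token cap. Here `simulate_compression(9000, 4000, 8000)` returns `[11000, 10000, 9500]`, so every attempt still overflows because the baseline alone exceeds the cap. This is the "three futile compression attempts" situation the comment describes.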