fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
This commit is contained in:
Teknium 2026-04-20 21:17:28 -07:00 committed by Teknium
parent cecf84daf7
commit dbb7e00e7e
13 changed files with 184 additions and 76 deletions

View file

@ -40,6 +40,8 @@ from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple, Callable
from dataclasses import dataclass, field
from datetime import datetime
from utils import base_url_host_matches, base_url_hostname
import fire
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn, TimeRemainingColumn
from rich.console import Console
@ -432,22 +434,29 @@ class TrajectoryCompressor:
def _detect_provider(self) -> str:
"""Detect the provider name from the configured base_url."""
url = (self.config.base_url or "").lower()
if "openrouter" in url:
url = self.config.base_url or ""
if base_url_host_matches(url, "openrouter.ai"):
return "openrouter"
if "nousresearch.com" in url:
if base_url_host_matches(url, "nousresearch.com"):
return "nous"
if "chatgpt.com/backend-api/codex" in url:
if (
base_url_hostname(url) == "chatgpt.com"
and "/backend-api/codex" in url.lower()
):
return "codex"
if "api.z.ai" in url:
if base_url_host_matches(url, "z.ai"):
return "zai"
if "moonshot.ai" in url or "moonshot.cn" in url or "api.kimi.com" in url:
if (
base_url_host_matches(url, "moonshot.ai")
or base_url_host_matches(url, "moonshot.cn")
or base_url_host_matches(url, "api.kimi.com")
):
return "kimi-coding"
if "arcee.ai" in url:
if base_url_host_matches(url, "arcee.ai"):
return "arcee"
if "minimaxi.com" in url:
if base_url_host_matches(url, "minimaxi.com"):
return "minimax-cn"
if "minimax.io" in url:
if base_url_host_matches(url, "minimax.io"):
return "minimax"
# Unknown base_url — not a known provider
return ""