mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-02 07:11:49 +00:00
fix(agent): fallback immediately on provider content-policy blocks (#33883)
* fix(agent): fallback immediately on provider content-policy blocks
Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible
cybersecurity risk', OpenAI moderation 'violates our usage policies',
Anthropic safety-system rejections, Azure content_filter) are
deterministic decisions about a specific prompt. Retrying the same
prompt up to api_max_retries times just reproduces the same refusal and
burns paid attempts before surfacing the generic 'API failed after 3
retries — <provider message>' to Telegram / cron with no indication that
the failure came from the model provider rather than Hermes itself.
Classify these as a new FailoverReason.content_policy_blocked
(non-retryable, should_fallback=True) and route them through the
existing is_client_error path so the loop:
- skips the 3x retry backoff
- activates a configured fallback model immediately
- emits a clear provider-safety message to the user (not the generic
'Non-retryable error (HTTP None)') and surfaces actionable guidance
when no fallback is configured (rephrase, narrow context, or set
fallback_model in hermes config)
- returns a final_response that explicitly tells the user this came
from the model provider, so gateway delivery is unambiguous and
cron last_status reflects the safety block rather than a vague
'agent reported failure'
Patterns are intentionally narrow — verbatim refusal phrasings keyed to
specific provider safety pipelines, not generic words like 'policy' or
'violation' that would collide with billing / format / auth errors.
Regression guards in test_18028_content_policy_blocked.py verify
billing 402s, generic 400s, and OpenRouter account-level
provider_policy_blocked remain distinct classifications.
Salvaged from #18164 onto current main (file restructure: loop logic
moved from run_agent.py to agent/conversation_loop.py, _emit_status →
_buffer_status), broadened patterns beyond the original OpenAI Codex
cybersecurity case to cover OpenAI moderation, Anthropic safety system,
and Azure content_filter; added user-actionable guidance and a clear
final_response so cron/gateway surfaces the policy block instead of a
generic non-retryable error, and added a regression-guard test module
mirroring the is_client_error predicate.
Addresses #18028.
Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>
* chore: add kchuang1015 to AUTHOR_MAP
---------
Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>
This commit is contained in:
parent
a82c88bac0
commit
0554ef1aa3
5 changed files with 334 additions and 6 deletions
|
|
@ -44,9 +44,10 @@ class FailoverReason(enum.Enum):
|
|||
payload_too_large = "payload_too_large" # 413 — compress payload
|
||||
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
|
||||
|
||||
# Model
|
||||
# Model / provider policy
|
||||
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
|
||||
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
|
||||
content_policy_blocked = "content_policy_blocked" # Provider safety filter rejected this prompt — deterministic per-request, don't retry unchanged
|
||||
|
||||
# Request format
|
||||
format_error = "format_error" # 400 bad request — abort or strip + retry
|
||||
|
|
@ -289,6 +290,45 @@ _PROVIDER_POLICY_BLOCKED_PATTERNS = [
|
|||
"no endpoints found matching your data policy",
|
||||
]
|
||||
|
||||
# Provider content-policy / safety-filter blocks. Distinct from
|
||||
# ``provider_policy_blocked`` above (which is an OpenRouter *account*-level
|
||||
# data/privacy guardrail) — these are *per-prompt* safety decisions made by
|
||||
# the upstream model provider. They are deterministic for the unchanged
|
||||
# request, so retrying the same prompt three times just reproduces the same
|
||||
# block and burns paid attempts on a refusal. The recovery is to switch to a
|
||||
# configured fallback model/provider immediately, or surface the block to
|
||||
# the user with actionable guidance if no fallback exists.
|
||||
#
|
||||
# Patterns are intentionally narrow — each phrase is a verbatim string from
|
||||
# a specific provider's safety pipeline, not a generic word like "policy" or
|
||||
# "violation" that could collide with billing/auth/format errors:
|
||||
# • OpenAI Codex cybersecurity refusal (gpt-5.5, the case from #18028)
|
||||
# • OpenAI moderation refusal ("violates our usage policies", with
|
||||
# "usage policies" disambiguating from billing's "exceeded ... policy")
|
||||
# • Anthropic safety refusal ("prompt was flagged by ... safety system")
|
||||
# • OpenAI Responses content filter
|
||||
_CONTENT_POLICY_BLOCKED_PATTERNS = [
|
||||
# OpenAI Codex (#18028) — message may arrive without an HTTP status
|
||||
"flagged for possible cybersecurity risk",
|
||||
"trusted access for cyber",
|
||||
# OpenAI moderation — chat completions / responses
|
||||
"violates our usage policies",
|
||||
"violates openai's usage policies",
|
||||
"your request was flagged by",
|
||||
# Anthropic safety system
|
||||
"prompt was flagged by our safety",
|
||||
"responses cannot be generated due to safety",
|
||||
# Generic content-filter wording seen on Azure / OpenAI Responses.
|
||||
# ``content_filter`` (underscore) is the OpenAI-standard error/finish
|
||||
# token surfaced verbatim by their SDKs when a request is blocked.
|
||||
# ``responsibleaipolicyviolation`` is Azure OpenAI's error code.
|
||||
# Deliberately NOT matching the space variant ("content filter") — it
|
||||
# appears in benign config descriptions and tooltip text that providers
|
||||
# echo back; the underscore form is provider-specific enough.
|
||||
"content_filter",
|
||||
"responsibleaipolicyviolation",
|
||||
]
|
||||
|
||||
# Auth patterns (non-status-code signals)
|
||||
_AUTH_PATTERNS = [
|
||||
"invalid api key",
|
||||
|
|
@ -492,6 +532,20 @@ def classify_api_error(
|
|||
|
||||
# ── 1. Provider-specific patterns (highest priority) ────────────
|
||||
|
||||
# Provider content-policy / safety-filter block. The provider has made a
|
||||
# deterministic refusal decision about THIS prompt — retrying unchanged
|
||||
# just reproduces the same refusal and burns paid attempts. Must run
|
||||
# before status-based classification so a 400 safety block isn't
|
||||
# downgraded to a generic ``format_error`` and a status-less block
|
||||
# (OpenAI Codex SDK can raise without one) isn't left in the retryable
|
||||
# ``unknown`` bucket. See issue #18028.
|
||||
if any(p in error_msg for p in _CONTENT_POLICY_BLOCKED_PATTERNS):
|
||||
return _result(
|
||||
FailoverReason.content_policy_blocked,
|
||||
retryable=False,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
# Anthropic thinking block signature invalid (400).
|
||||
# Don't gate on provider — OpenRouter proxies Anthropic errors, so the
|
||||
# provider may be "openrouter" even though the error is Anthropic-specific.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue