Commit graph

554 commits

Author SHA1 Message Date
kshitijk4poor
43de1ca8c2 refactor: remove _nr_to_assistant_message shim + fix flush_memories guard
NormalizedResponse and ToolCall now have backward-compat properties
so the agent loop can read them directly without the shim:

  ToolCall: .type, .function (returns self), .call_id, .response_item_id
  NormalizedResponse: .reasoning_content, .reasoning_details,
                      .codex_reasoning_items

This eliminates the 35-line shim and its 4 call sites in run_agent.py.

Also changes flush_memories guard from hasattr(response, 'choices')
to self.api_mode in ('chat_completions', 'bedrock_converse') so it
works with raw boto3 dicts too.

WS1 items 3+4 of Cycle 2 (#14418).
2026-04-23 02:30:05 -07:00
kshitijk4poor
f4612785a4 refactor: collapse normalize_anthropic_response to return NormalizedResponse directly
3-layer chain (transport → v2 → v1) was collapsed to 2-layer in PR 7.
This collapses the remaining 2-layer (transport → v1 → NR mapping in
transport) to 1-layer: v1 now returns NormalizedResponse directly.

Before: adapter returns (SimpleNamespace, finish_reason) tuple,
  transport unpacks and maps to NormalizedResponse (22 lines).
After: adapter returns NormalizedResponse, transport is a
  1-line passthrough.

Also updates ToolCall construction — adapter now creates ToolCall
dataclass directly instead of SimpleNamespace(id, type, function).

WS1 item 1 of Cycle 2 (#14418).
2026-04-23 02:30:05 -07:00
kshitijk4poor
738d0900fd refactor: migrate auxiliary_client Anthropic path to use transport
Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().

Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.

After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.

The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).

WS1 item 2 of Cycle 2 (#14418).
2026-04-23 02:30:05 -07:00
zhzouxiaoya12
3d90292eda fix: normalize provider in list_provider_models to support aliases 2026-04-23 01:59:20 -07:00
Siddharth Balyan
d1ce358646
feat(agent): add PLATFORM_HINTS for matrix, mattermost, and feishu (#14428)
* feat(agent): add PLATFORM_HINTS for matrix, mattermost, and feishu

These platform adapters fully support media delivery (send_image,
send_document, send_voice, send_video) but were missing from
PLATFORM_HINTS, leaving agents unaware of their platform context,
markdown rendering, and MEDIA: tag support.

Salvaged from PR #7370 by Rutimka — wecom excluded since main already
has a more detailed version.

Co-Authored-By: Marco Rutsch <marco@rutimka.de>

* test: add missing Markdown assertion for feishu platform hint

---------

Co-authored-by: Marco Rutsch <marco@rutimka.de>
2026-04-23 12:50:22 +05:30
iborazzi
f41031af3a fix: increase max_tokens for GLM 5.1 reasoning headroom 2026-04-22 18:44:07 -07:00
kshitijk4poor
d30ee2e545 refactor: unify transport dispatch + collapse normalize shims
Consolidate 4 per-transport lazy singleton helpers (_get_anthropic_transport,
_get_codex_transport, _get_chat_completions_transport, _get_bedrock_transport)
into one generic _get_transport(api_mode) with a shared dict cache.

Collapse the 65-line main normalize block (3 api_mode branches, each with
its own SimpleNamespace shim) into 7 lines: one _get_transport() call +
one _nr_to_assistant_message() shared shim. The shim extracts provider_data
fields (codex_reasoning_items, reasoning_details, call_id, response_item_id)
into the SimpleNamespace shape downstream code expects.

Wire chat_completions and bedrock_converse normalize through their transports
for the first time — these were previously falling into the raw
response.choices[0].message else branch.

Remove 8 dead codex adapter imports that have zero callers after PRs 1-6.

Transport lifecycle improvements:
- Eagerly warm transport cache at __init__ (surfaces import errors early)
- Invalidate transport cache on api_mode change (switch_model, fallback
  activation, fallback restore, transport recovery) — prevents stale
  transport after mid-session provider switch

run_agent.py: -32 net lines (11,988 -> 11,956).

PR 7 of the provider transport refactor.
2026-04-22 18:34:25 -07:00
Teknium
c9c6182839 fix(anthropic): guard max_tokens against non-positive values
Port from openclaw/openclaw#66664. The build_anthropic_kwargs call site
used 'max_tokens or _get_anthropic_max_output(model)', which correctly
falls back when max_tokens is 0 or None (falsy) but lets negative ints
(-1, -500), fractional floats (0.5, 8192.7), NaN, and infinity leak
through to the Anthropic API. Anthropic rejects these with HTTP 400
('max_tokens: must be greater than or equal to 1'), turning a local
config error into a surprise mid-conversation failure.

Add two resolver helpers matching OpenClaw's:
  _resolve_positive_anthropic_max_tokens — returns int(value) only if
    value is a finite positive number; excludes bools, strings, NaN,
    infinity, sub-one positives (floor to 0).
  _resolve_anthropic_messages_max_tokens — prefers a positive requested
    value, else falls back to the model's output ceiling; raises
    ValueError only if no positive budget can be resolved.

The context-window clamp at the call site (max_tokens > context_length)
is preserved unchanged — it handles oversized values; the new resolver
handles non-positive values. These concerns are now cleanly separated.

Tests: 17 new cases covering positive/zero/negative ints, fractional
floats (both >1 and <1), NaN, infinity, booleans, strings, None, and
integration via build_anthropic_kwargs.

Refs: openclaw/openclaw#66664
2026-04-22 18:04:47 -07:00
sicnuyudidi
c03858733d fix: pass correct arguments in summary model fallback retry
_generate_summary() takes (turns_to_summarize, focus_topic) but the
summary model fallback path passed (messages, summary_budget) — where
'messages' is not even in scope, causing a NameError.

Fix the recursive call to pass the correct variables so the fallback
to the main model actually works when the summary model is unavailable.

Fixes: #10721
2026-04-22 17:57:13 -07:00
Teknium
d74eaef5f9 fix(error_classifier): retry mid-stream SSL/TLS alert errors as transport
Mid-stream SSL alerts (bad_record_mac, tls_alert_internal_error, handshake
failures) previously fell through the classifier pipeline to the 'unknown'
bucket because:

  - ssl.SSLError type names weren't in _TRANSPORT_ERROR_TYPES (the
    isinstance(OSError) catch picks up some but not all SDK-wrapped forms)
  - the message-pattern list had no SSL alert substrings

The 'unknown' bucket is still retryable, but: (a) logs tell the user
'unknown' instead of identifying the cause, (b) it bypasses the
transport-specific backoff/fallback logic, and (c) if the SSL error
happens on a large session with a generic 'connection closed' wrapper,
the existing disconnect-on-large-session heuristic would incorrectly
trigger context compression — expensive, and never fixes a transport
hiccup.

Changes:
  - Add ssl.SSLError and its subclass type names to _TRANSPORT_ERROR_TYPES
  - New _SSL_TRANSIENT_PATTERNS list (separate from _SERVER_DISCONNECT_PATTERNS
    so SSL alerts route to timeout, not context_overflow+compress)
  - New step 5 in the classifier pipeline: SSL pattern check runs BEFORE
    the disconnect check to pre-empt the large-session-compress path

Patterns cover both space-separated ('ssl alert', 'bad record mac')
and underscore-separated ('ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC')
forms.  This is load-bearing because OpenSSL 3.x changed the error-code
separator from underscore to slash (e.g. SSLV3_ALERT_BAD_RECORD_MAC →
SSL/TLS_ALERT_BAD_RECORD_MAC) and will likely churn again — matching on
stable alert reason substrings survives future format changes.

Tests (8 new):
  - BAD_RECORD_MAC in Python ssl.c format
  - OpenSSL 3.x underscore format
  - TLSV1_ALERT_INTERNAL_ERROR
  - ssl handshake failure
  - [SSL: ...] prefix fallback
  - Real ssl.SSLError instance
  - REGRESSION GUARD: SSL on large session does NOT compress
  - REGRESSION GUARD: plain disconnect on large session STILL compresses
2026-04-22 17:44:50 -07:00
Anders Bell
02aba4a728 fix(skills): follow symlinks in iter_skill_index_files
os.walk() by default does not follow symlinks, causing skills
linked via symlinks to be invisible to the skill discovery system.
Add followlinks=True so that symlinked skill directories are scanned.
2026-04-22 17:43:30 -07:00
Teknium
b9463e32c6 fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies
Port from cline/cline#10266.

When OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline)
route Claude models, they sometimes surface the Anthropic-native cache
counters (`cache_read_input_tokens`, `cache_creation_input_tokens`) at
the top level of the `usage` object instead of nesting them inside
`prompt_tokens_details`. Our chat-completions branch of
`normalize_usage()` only read the nested `prompt_tokens_details` fields,
so those responses:

- reported `cache_write_tokens = 0` even when the model actually did a
  prompt-cache write,
- reported only some of the cache-read tokens when the proxy exposed them
  top-level only,
- overstated `input_tokens` by the missed cache-write amount, which in
  turn made cost estimation and the status-bar cache-hit percentage wrong
  for Claude traffic going through these gateways.

Now the chat-completions branch tries the OpenAI-standard
`prompt_tokens_details` first and falls back to the top-level
Anthropic-shape fields only if the nested values are absent/zero. The
Anthropic and Codex Responses branches are unchanged.

Regression guards added for three shapes: top-level write + nested read,
top-level-only, and both-present (nested wins).
2026-04-22 17:40:49 -07:00
wujhsu
276ef49c96 fix(provider): recognize open.bigmodel.cn as Zhipu/ZAI provider
Zhipu AI (智谱) serves both international users via api.z.ai and
China-based users via open.bigmodel.cn. The domestic endpoint was not
mapped in _URL_TO_PROVIDER, causing Hermes to treat it as an unknown
custom endpoint and fall back to the default 128K context length
instead of resolving the correct 200K+ context via models.dev or the
hardcoded GLM defaults.

This affects users of both the standard API
(https://open.bigmodel.cn/api/paas/v4) and the Coding Plan
(https://open.bigmodel.cn/api/coding/paas/v4).
2026-04-22 17:35:55 -07:00
Clifford Garwood
27621ef836 feat: add ctx_size to context length keys for Lemonade server support
- Adds 'ctx_size' field to _CONTEXT_LENGTH_KEYS tuple
- Enables hermes agent to correctly detect context size from custom LLMs
  running on Lemonade server that use this field name instead of the
  standard keys (max_seq_len, n_ctx_train, n_ctx)
2026-04-22 17:25:04 -07:00
Feranmi
66d2d7090e fix(model_metadata): add gemma-4 and gemma4 context length entries
Fixes #12976

The generic "gemma": 8192 fallback was incorrectly matching gemma4:31b-cloud
before the more specific Gemma 4 entries could match, causing Hermes to assign
only 8K context instead of 262K. Added "gemma-4" and "gemma4" entries before
the fallback to correctly handle Gemma 4 model naming conventions.
2026-04-22 16:33:25 -07:00
Teknium
c96a548bde
feat(models): add xiaomi/mimo-v2.5-pro and mimo-v2.5 to openrouter + nous (#14184)
Replace xiaomi/mimo-v2-pro with xiaomi/mimo-v2.5-pro and xiaomi/mimo-v2.5
in the OpenRouter fallback catalog and the nous provider model list.
Add matching DEFAULT_CONTEXT_LENGTHS entries (1M tokens each).
2026-04-22 16:12:39 -07:00
Yukipukii1
1e8254e599 fix(agent): guard context compressor against structured message content 2026-04-22 14:46:51 -07:00
ismell0992-afk
6513138f26 fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts
`is_local_endpoint()` leaned on `ipaddress.is_private`, which classifies
RFC-1918 ranges and link-local as private but deliberately excludes the
RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its
mesh IPs. As a result, Ollama reached over Tailscale (e.g.
`http://100.77.243.5:11434`) was treated as remote and missed the
automatic stream-read / stale-stream timeout bumps, so cold model load
plus long prefill would trip the 300 s watchdog before the first token.

Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")`
(built once) and extend `is_local_endpoint()` to match the block both
via the parsed-`IPv4Address` path and the existing bare-string fallback
(for symmetry with the 10/172/192 checks). Also hoist the previously
function-local `import ipaddress` to module scope now that it's used by
the constant.

Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound,
representative host, MagicDNS anchor, upper bound) and a near-miss
negative set (just below 100.64.0.0, just above 100.127.255.255, well
outside the block, and first-octet-wrong).
2026-04-22 14:46:10 -07:00
bobashopcashier
b49a1b71a7 fix(agent): accept empty content with stop_reason=end_turn as valid anthropic response
Anthropic's API can legitimately return content=[] with stop_reason="end_turn"
when the model has nothing more to add after a turn that already delivered the
user-facing text alongside a trivial tool call (e.g. memory write). The transport
validator was treating that as an invalid response, triggering 3 retries that
each returned the same valid-but-empty response, then failing the run with
"Invalid API response after 3 retries."

The downstream normalizer already handles empty content correctly (empty loop
over response.content, content=None, finish_reason="stop"), so the only fix
needed is at the validator boundary.

Tests:
- Empty content + stop_reason="end_turn" → valid (the fix)
- Empty content + stop_reason="tool_use" → still invalid (regression guard)
- Empty content without stop_reason → still invalid (existing behavior preserved)
2026-04-22 14:26:23 -07:00
kshitijk4poor
04e039f687 fix: Kimi /coding thinking block survival + empty reasoning_content + block ordering
Follow-up to the cherry-picked PR #13897 fix. Three issues found:

1. CRITICAL: The thinking block synthesised from reasoning_content was
   immediately stripped by the third-party signature management code
   (Kimi is classified as _is_third_party_anthropic_endpoint). Added a
   Kimi-specific carve-out that preserves unsigned thinking blocks while
   still stripping Anthropic-signed blocks Kimi can't validate.

2. Empty-string reasoning_content was silently dropped because the
   truthiness check ('if reasoning_content and ...') evaluates to False
   for ''. Changed to 'isinstance(reasoning_content, str)' so the
   tier-3 fallback from _copy_reasoning_content_for_api (which injects
   '' for Kimi tool-call messages with no reasoning) actually produces
   a thinking block.

3. The thinking block was appended AFTER tool_use blocks. Anthropic
   protocol requires thinking -> text -> tool_use ordering. Changed to
   blocks.insert(0, ...) to prepend.
2026-04-22 08:21:23 -07:00
Jerome
2efb0eea21 fix(anthropic_adapter): preserve reasoning_content on assistant tool-call messages for Kimi /coding
Fixes NousResearch/hermes-agent#13848

Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its
own thinking semantics: when thinking is enabled, Kimi validates message
history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content.

The Anthropic path never populated that field, and
convert_messages_to_anthropic strips all Anthropic thinking blocks on
third-party endpoints — so the request failed with HTTP 400:
  "thinking is enabled but reasoning_content is missing in assistant
tool call message at index N"

Now, when an assistant message contains tool_calls and a
reasoning_content string, we append a {"type": "thinking", ...} block
to the Anthropic content so Kimi can validate the history.  This only
affects assistant messages with tool_calls + reasoning_content; plain
text assistant messages are unchanged.
2026-04-22 08:21:23 -07:00
Teknium
77e04a29d5
fix(error_classifier): don't classify generic 404 as model_not_found (#14013)
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.

This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.

Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.

Test updated to match the new behavior. 103 error_classifier tests pass.
2026-04-22 06:11:47 -07:00
hengm3467
c6b1ef4e58 feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
2026-04-22 02:59:58 -07:00
Teknium
ff9752410a
feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)
* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
  user-configured' — and notes the 'image' field can be a URL or an
  absolute path, which the gateway delivers either way via
  extract_local_files().
2026-04-21 21:30:10 -07:00
Teknium
410f33a728
fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:

  HTTP 400: thinking is enabled but reasoning_content is missing in
  assistant tool call message at index N

Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.

build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).

Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
2026-04-21 21:19:14 -07:00
kshitijk4poor
57411fca24 feat: add BedrockTransport + wire all Bedrock transport paths
Fourth and final transport — completes the transport layer with all four
api_modes covered.  Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response.  Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it.  Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.

Transport coverage — complete:
| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       |             |          |         |
| codex_responses    | ResponsesApiTransport    |             |          |         |
| chat_completions   | ChatCompletionsTransport |             |          |         |
| bedrock_converse   | BedrockTransport         |             |          |         |

17 new BedrockTransport tests pass.  117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass.  Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
2026-04-21 20:58:37 -07:00
kshitijk4poor
83d86ce344 feat: add ChatCompletionsTransport + wire all default paths
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.

Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
  ToolCall.provider_data — the original shim stripped it, causing 400 errors
  on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
  the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
  extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
  transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
  chat_completions, response.choices[0].message is already the right shape
  with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
  and per-tool-call .extra_content from the OpenAI SDK.

run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.

39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
2026-04-21 20:50:02 -07:00
emozilla
29693f9d8e feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
2026-04-21 20:35:16 -07:00
kshitijk4poor
c832ebd67c feat: add ResponsesApiTransport + wire all Codex transport paths
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().

Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)

Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.

Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.

24 new tests. 343 codex/responses/transport tests pass (0 failures).

PR 4 of the provider transport refactor.
2026-04-21 19:48:56 -07:00
王强
2a026eb762 fix: Update Kimi Coding API endpoint and User-Agent 2026-04-21 19:48:39 -07:00
王强
de181dfd22 fix: add User-Agent claude-code/0.1.0 for Kimi /coding endpoint
- Add _is_kimi_coding_endpoint() to detect Kimi coding API
- Place Kimi check BEFORE _requires_bearer_auth to ensure User-Agent header is set
- Without this header, Kimi returns 403 on /coding/v1/messages
- Fixes kimi-2.5, kimi-for-coding, kimi-k2.6-code-preview all returning 403
2026-04-21 19:48:39 -07:00
Teknium
84449d9afe
fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766)
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.

The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.

Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.

Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
2026-04-21 19:36:05 -07:00
Teknium
52cbceea44
fix(vision): restore tier-aware Nous vision model selection (#13703)
Revert two overreaches from #13699 that forced paid Nous vision to
xiaomi/mimo-v2-omni instead of the tier-appropriate gemini-3-flash-preview:

1. Remove "nous": "xiaomi/mimo-v2-omni" from _PROVIDER_VISION_MODELS —
   #13696 already routes nous main-provider vision through the strict
   backend, and this entry caused any direct resolve_provider_client(
   "nous", ...) aggregator-lookup path to pick the wrong model for paid.

2. Drop the 'elif vision' paid override in _try_nous() that forced
   mimo-v2-omni on every Nous vision call regardless of tier. Paid
   accounts now keep gemini-3-flash-preview for vision as well as text.

Free-tier behavior unchanged: still uses mimo-v2-omni for vision,
mimo-v2-pro for text (check_nous_free_tier() branch).

E2E verified:
  paid vision → google/gemini-3-flash-preview
  free vision → xiaomi/mimo-v2-omni
  paid text   → google/gemini-3-flash-preview
  free text   → xiaomi/mimo-v2-pro
2026-04-21 14:43:55 -07:00
helix4u
7ba9c22cde fix(vision): route Nous main-provider vision through tier-aware backend 2026-04-21 14:42:32 -07:00
Esteban
0301787653 fix(vision): resolve Nous vision model correctly in auto-detect path
Two changes:
1. _PROVIDER_VISION_MODELS: add 'nous' -> 'xiaomi/mimo-v2-omni' entry
   so the vision auto-detect chain picks the correct multimodal model.

2. resolve_provider_client: detect when the requested model is a vision
   model (from _PROVIDER_VISION_MODELS or known vision model names) and
   pass vision=True to _try_nous().  Previously, _try_nous() was always
   called without vision=True in resolve_provider_client(), causing it to
   return the default text model (gemini-3-flash-preview or mimo-v2-pro)
   instead of the vision-capable mimo-v2-omni.

The _try_nous() function already handled free-tier vision correctly, but
the resolve_provider_client() path (used by the auto-detect vision chain)
never signaled that a vision task was in progress.

Verified: xiaomi/mimo-v2-omni returns HTTP 200 with image inputs on Nous
inference API. google/gemini-3-flash-preview returns 404 with images.
2026-04-21 14:27:41 -07:00
helix4u
392b2bb17b fix(auxiliary): refresh Nous runtime credentials after aux 401s 2026-04-21 14:25:57 -07:00
unlinearity
155b619867 fix(agent): normalize socks:// env proxies for httpx/anthropic
WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed.

Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them:
  - run_agent._get_proxy_from_env
  - agent.auxiliary_client._validate_proxy_env_urls
  - agent.anthropic_adapter.build_anthropic_client
  - gateway.platforms.base.resolve_proxy_url

Regression coverage:
  - run_agent proxy env resolution
  - auxiliary proxy env normalization
  - gateway proxy URL resolution

Verified with:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py

39 passed.
2026-04-21 05:52:46 -07:00
kshitijk4poor
8a11b0a204 feat(account-usage): add per-provider account limits module
Ports agent/account_usage.py and its tests from the original PR #2486
branch. Defines AccountUsageSnapshot / AccountUsageWindow dataclasses,
a shared renderer, and provider-specific fetchers for OpenAI Codex
(wham/usage), Anthropic OAuth (oauth/usage), and OpenRouter (/credits
and /key). Wiring into /usage lands in a follow-up salvage commit.

Authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-21 01:56:35 -07:00
Teknium
2c69b3eca8
fix(auth): unify credential source removal — every source sticks (#13427)
Every credential source Hermes reads from now behaves identically on
`hermes auth remove`: the pool entry stays gone across fresh load_pool()
calls, even when the underlying external state (env var, OAuth file,
auth.json block, config entry) is still present.

Before this, auth_remove_command was a 110-line if/elif with five
special cases, and three more sources (qwen-cli, copilot, custom
config) had no removal handler at all — their pool entries silently
resurrected on the next invocation.  Even the handled cases diverged:
codex suppressed, anthropic deleted-without-suppressing, nous cleared
without suppressing.  Each new provider added a new gap.

What's new:
  agent/credential_sources.py — RemovalStep registry, one entry per
  source (env, claude_code, hermes_pkce, nous device_code, codex
  device_code, qwen-cli, copilot gh_cli + env vars, custom config).
  auth_remove_command dispatches uniformly via find_removal_step().

Changes elsewhere:
  agent/credential_pool.py — every upsert in _seed_from_env,
  _seed_from_singletons, and _seed_custom_pool now gates on
  is_source_suppressed(provider, source) via a shared helper.
  hermes_cli/auth_commands.py — auth_remove_command reduced to 25
  lines of dispatch; auth_add_command now clears ALL suppressions for
  the provider on re-add (was env:* only).

Copilot is special: the same token is seeded twice (gh_cli via
_seed_from_singletons + env:<VAR> via _seed_from_env), so removing one
entry without suppressing the other variants lets the duplicate
resurrect.  The copilot RemovalStep suppresses gh_cli + all three env
variants (COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN) at once.

Tests: 11 new unit tests + 4059 existing pass.  12 E2E scenarios cover
every source in isolated HERMES_HOME with simulated fresh processes.
2026-04-21 01:52:49 -07:00
Teknium
b341b19fff
fix(auth): hermes auth remove sticks for shell-exported env vars (#13418)
Removing an env-seeded credential only cleared ~/.hermes/.env and the
current process's os.environ, leaving shell-exported vars (shell profile,
systemd EnvironmentFile, launchd plist) to resurrect the entry on the
next load_pool() call.  This matched the pre-#11485 codex behaviour.

Now we suppress env:<VAR> in auth.json on remove, gate _seed_from_env()
behind is_source_suppressed(), clear env:* suppressions on auth add,
and print a diagnostic pointing at the shell when the var lives there.

Applies to every env:* seeded credential (xai, deepseek, moonshot, zai,
nvidia, openrouter, anthropic, etc.), not just xai.

Reported by @teknium1 from community user 'Artificial Brain' — couldn't
remove their xAI key via hermes auth remove.
2026-04-21 01:34:50 -07:00
ifrederico
9b36636363 fix(security): apply file safety to copilot acp fs 2026-04-21 01:31:58 -07:00
kshitijk4poor
731f4fbae6 feat: add transport ABC + AnthropicTransport wired to all paths
Add ProviderTransport ABC (4 abstract methods: convert_messages,
convert_tools, build_kwargs, normalize_response) plus optional hooks
(validate_response, extract_cache_stats, map_finish_reason).

Add transport registry with lazy discovery — get_transport() auto-imports
transport modules on first call.

Add AnthropicTransport — delegates to existing anthropic_adapter.py
functions, wired to ALL Anthropic code paths in run_agent.py:
- Main normalize loop (L10775)
- Main build_kwargs (L6673)
- Response validation (L9366)
- Finish reason mapping (L9534)
- Cache stats extraction (L9827)
- Truncation normalize (L9565)
- Memory flush build_kwargs + normalize (L7363, L7395)
- Iteration-limit summary + retry (L8465, L8498)

Zero direct adapter imports remain for transport methods. Client lifecycle,
streaming, auth, and credential management stay on AIAgent.

20 new tests (ABC contract, registry, AnthropicTransport methods).
359 anthropic-related tests pass (0 failures).

PR 3 of the provider transport refactor.
2026-04-21 01:27:01 -07:00
alt-glitch
1010e5fa3c refactor: remove redundant local imports already available at module level
Sweep ~74 redundant local imports across 21 files where the same module
was already imported at the top level. Also includes type fixes and lint
cleanups on the same branch.
2026-04-21 00:50:58 -07:00
Teknium
328223576b
feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384)
* feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates

When a skill loads, the activation message now exposes the absolute
skill directory and substitutes ${HERMES_SKILL_DIR} /
${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with
bundled scripts can instruct the agent to run them by absolute path
without an extra skill_view round-trip.

Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md
are pre-executed (with the skill directory as CWD) and their stdout is
inlined into the message before the agent reads it. Off by default —
enable via skills.inline_shell in config.yaml — because any snippet
runs on the host without approval.

Changes:
- agent/skill_commands.py: template substitution, inline-shell
  expansion, absolute skill-dir header, supporting-files list now
  shows both relative and absolute forms.
- hermes_cli/config.py: new skills.template_vars,
  skills.inline_shell, skills.inline_shell_timeout knobs.
- tests/agent/test_skill_commands.py: coverage for header, both
  template tokens (present and missing session id), template_vars
  disable, inline-shell default-off, enabled, CWD, and timeout.
- website/docs/developer-guide/creating-skills.md: documents the
  template tokens, the absolute-path header, and the opt-in inline
  shell with its security caveat.

Validation: tests/agent/ 1591 passed (includes 9 new tests).
E2E: loaded a real skill in an isolated HERMES_HOME; confirmed
${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID}
resolves to the passed task_id, !`date` runs when opt-in is set, and
stays literal when it isn't.

* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot

bash login shells don't source ~/.bashrc, so tools that install themselves
there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to
the environment snapshot Hermes builds once per session.  Under systemd
or any context with a minimal parent env, that surfaces as
'node: command not found' in the terminal tool even though the binary
is reachable from every interactive shell on the machine.

Changes:
- tools/environments/local.py: before the login-shell snapshot bootstrap
  runs, prepend guarded 'source <file>' lines for each resolved init
  file.  Missing files are skipped, each source is wrapped with a
  '[ -r ... ] && . ... || true' guard so a broken rc can't abort the
  bootstrap.
- hermes_cli/config.py: new terminal.shell_init_files (explicit list,
  supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on)
  knobs.  When shell_init_files is set it takes precedence; when it's
  empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced.
- tests/tools/test_local_shell_init.py: 10 tests covering the resolver
  (auto-bashrc, missing file, explicit override, ~/${VAR} expansion,
  opt-out) and the prelude builder (quoting, guarded sourcing), plus
  a real-LocalEnvironment snapshot test that confirms exports in the
  init file land in subsequent commands' environment.
- website/docs/reference/faq.md: documents the fix in Troubleshooting,
  including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh
  directly via shell_init_files.

Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40
pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py
50/50 pass.  E2E in an isolated HERMES_HOME: confirmed that a fake
~/.bashrc setting a marker var and PATH addition shows up in a real
LocalEnvironment().execute() call, that auto_source_bashrc=false
suppresses it, that an explicit shell_init_files entry wins over the
auto default, and that a missing bashrc is silently skipped.
2026-04-21 00:39:19 -07:00
Teknium
62cbeb6367
test: stop testing mutable data — convert change-detectors to invariants (#13363)
Catalog snapshots, config version literals, and enumeration counts are data
that changes as designed. Tests that assert on those values add no
behavioral coverage — they just break CI on every routine update and cost
engineering time to 'fix.'

Replace with invariants where one exists, delete where none does.

Deleted (pure snapshots):
- TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al
- TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models'
- test_browser_camofox_state::test_config_version_matches_current_schema
  (docstring literally said it would break on unrelated bumps)

Relaxed (keep plumbing check, drop snapshot):
- Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists:
  now assert 'provider exists and has >= 1 entry' instead of specific names
- HuggingFace main/models.py consistency test: drop 'len >= 6' floor

Dynamicized (follow source, not a literal):
- 3x test_config.py migration tests: raw['_config_version'] ==
  DEFAULT_CONFIG['_config_version'] instead of hardcoded 21

Fixed stale tests against intentional behavior changes:
- test_insights::test_gateway_format_hides_cost: name matches new behavior
  (no dollar figures); remove contradicting '$' in text assertion
- test_config::prefers_api_then_url_then_base_url: flipped per PR #9332;
  rename + update to base_url > url > api
- test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to
  assert called — contract is 'credential flowed through'
- test_interrupt_propagation: add provider/model/_base_url to bare-agent
  fixture so the stale-timeout code path resolves

Fixed stale integration tests against opt-in plugin gate:
- transform_tool_result + transform_terminal_output: write plugins.enabled
  allow-list to config.yaml and reset the plugin manager singleton

Source fix (real consistency invariant):
- agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length
  (262144, same as K2.5). test_model_metadata_has_context_lengths was
  correctly catching the gap.

Policy:
- AGENTS.md Testing section: new subsection 'Don't write change-detector
  tests' with do/don't examples. Reviewers should reject catalog-snapshot
  assertions in new tests.

Covers every test that failed on the last completed main CI run
(24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present
+ test_terminal_and_file_toolsets_resolve_all_tools, which now pass both
alone and with the full tests/tools/ directory (xdist ordering flake that
resolved itself).
2026-04-20 23:20:33 -07:00
kshitijk4poor
7ab5eebd03 feat: add transport types + migrate Anthropic normalize path
Add agent/transports/types.py with three shared dataclasses:
- NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data
- ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata)
- Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens

Add normalize_anthropic_response_v2() to anthropic_adapter.py — wraps the
existing v1 function and maps its output to NormalizedResponse. One call site
in run_agent.py (the main normalize branch) uses v2 with a back-compat shim
to SimpleNamespace for downstream code.

No ABC, no registry, no streaming, no client lifecycle. Those land in PR 3
with the first concrete transport (AnthropicTransport).

46 new tests:
- test_types.py: dataclass construction, build_tool_call, map_finish_reason
- test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools,
  thinking, mixed, stop reasons, mcp prefix stripping, edge cases)

Part of the provider transport refactor (PR 2 of 9).
2026-04-20 23:06:00 -07:00
Teknium
dbb7e00e7e fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 22:14:29 -07:00
Teknium
cecf84daf7 fix: extend hostname-match provider detection across remaining call sites
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
2026-04-20 22:14:29 -07:00
jerilynzheng
b117538798 feat: attribution default_headers for ai-gateway provider
Requests through Vercel AI Gateway now carry referrerUrl / appName /
User-Agent attribution so traffic shows up in the gateway's analytics.
Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new
ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.
2026-04-20 21:02:28 -07:00
Peter Fontana
3988c3c245 feat: shell hooks — wire shell scripts as Hermes hook callbacks
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.

Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
  to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted

Cherry-picked from PR #13143 by @pefontana.
2026-04-20 20:53:51 -07:00