Commit graph

518 commits

Author SHA1 Message Date
unlinearity
155b619867 fix(agent): normalize socks:// env proxies for httpx/anthropic
WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed.

Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them:
  - run_agent._get_proxy_from_env
  - agent.auxiliary_client._validate_proxy_env_urls
  - agent.anthropic_adapter.build_anthropic_client
  - gateway.platforms.base.resolve_proxy_url

Regression coverage:
  - run_agent proxy env resolution
  - auxiliary proxy env normalization
  - gateway proxy URL resolution

Verified with:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py

39 passed.
2026-04-21 05:52:46 -07:00
kshitijk4poor
8a11b0a204 feat(account-usage): add per-provider account limits module
Ports agent/account_usage.py and its tests from the original PR #2486
branch. Defines AccountUsageSnapshot / AccountUsageWindow dataclasses,
a shared renderer, and provider-specific fetchers for OpenAI Codex
(wham/usage), Anthropic OAuth (oauth/usage), and OpenRouter (/credits
and /key). Wiring into /usage lands in a follow-up salvage commit.

Authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-21 01:56:35 -07:00
Teknium
2c69b3eca8
fix(auth): unify credential source removal — every source sticks (#13427)
Every credential source Hermes reads from now behaves identically on
`hermes auth remove`: the pool entry stays gone across fresh load_pool()
calls, even when the underlying external state (env var, OAuth file,
auth.json block, config entry) is still present.

Before this, auth_remove_command was a 110-line if/elif with five
special cases, and three more sources (qwen-cli, copilot, custom
config) had no removal handler at all — their pool entries silently
resurrected on the next invocation.  Even the handled cases diverged:
codex suppressed, anthropic deleted-without-suppressing, nous cleared
without suppressing.  Each new provider added a new gap.

What's new:
  agent/credential_sources.py — RemovalStep registry, one entry per
  source (env, claude_code, hermes_pkce, nous device_code, codex
  device_code, qwen-cli, copilot gh_cli + env vars, custom config).
  auth_remove_command dispatches uniformly via find_removal_step().

Changes elsewhere:
  agent/credential_pool.py — every upsert in _seed_from_env,
  _seed_from_singletons, and _seed_custom_pool now gates on
  is_source_suppressed(provider, source) via a shared helper.
  hermes_cli/auth_commands.py — auth_remove_command reduced to 25
  lines of dispatch; auth_add_command now clears ALL suppressions for
  the provider on re-add (was env:* only).

Copilot is special: the same token is seeded twice (gh_cli via
_seed_from_singletons + env:<VAR> via _seed_from_env), so removing one
entry without suppressing the other variants lets the duplicate
resurrect.  The copilot RemovalStep suppresses gh_cli + all three env
variants (COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN) at once.

Tests: 11 new unit tests + 4059 existing pass.  12 E2E scenarios cover
every source in isolated HERMES_HOME with simulated fresh processes.
2026-04-21 01:52:49 -07:00
Teknium
b341b19fff
fix(auth): hermes auth remove sticks for shell-exported env vars (#13418)
Removing an env-seeded credential only cleared ~/.hermes/.env and the
current process's os.environ, leaving shell-exported vars (shell profile,
systemd EnvironmentFile, launchd plist) to resurrect the entry on the
next load_pool() call.  This matched the pre-#11485 codex behaviour.

Now we suppress env:<VAR> in auth.json on remove, gate _seed_from_env()
behind is_source_suppressed(), clear env:* suppressions on auth add,
and print a diagnostic pointing at the shell when the var lives there.

Applies to every env:* seeded credential (xai, deepseek, moonshot, zai,
nvidia, openrouter, anthropic, etc.), not just xai.

Reported by @teknium1 from community user 'Artificial Brain' — couldn't
remove their xAI key via hermes auth remove.
2026-04-21 01:34:50 -07:00
ifrederico
9b36636363 fix(security): apply file safety to copilot acp fs 2026-04-21 01:31:58 -07:00
kshitijk4poor
731f4fbae6 feat: add transport ABC + AnthropicTransport wired to all paths
Add ProviderTransport ABC (4 abstract methods: convert_messages,
convert_tools, build_kwargs, normalize_response) plus optional hooks
(validate_response, extract_cache_stats, map_finish_reason).

Add transport registry with lazy discovery — get_transport() auto-imports
transport modules on first call.

Add AnthropicTransport — delegates to existing anthropic_adapter.py
functions, wired to ALL Anthropic code paths in run_agent.py:
- Main normalize loop (L10775)
- Main build_kwargs (L6673)
- Response validation (L9366)
- Finish reason mapping (L9534)
- Cache stats extraction (L9827)
- Truncation normalize (L9565)
- Memory flush build_kwargs + normalize (L7363, L7395)
- Iteration-limit summary + retry (L8465, L8498)

Zero direct adapter imports remain for transport methods. Client lifecycle,
streaming, auth, and credential management stay on AIAgent.

20 new tests (ABC contract, registry, AnthropicTransport methods).
359 anthropic-related tests pass (0 failures).

PR 3 of the provider transport refactor.
2026-04-21 01:27:01 -07:00
alt-glitch
1010e5fa3c refactor: remove redundant local imports already available at module level
Sweep ~74 redundant local imports across 21 files where the same module
was already imported at the top level. Also includes type fixes and lint
cleanups on the same branch.
2026-04-21 00:50:58 -07:00
Teknium
328223576b
feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384)
* feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates

When a skill loads, the activation message now exposes the absolute
skill directory and substitutes ${HERMES_SKILL_DIR} /
${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with
bundled scripts can instruct the agent to run them by absolute path
without an extra skill_view round-trip.

Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md
are pre-executed (with the skill directory as CWD) and their stdout is
inlined into the message before the agent reads it. Off by default —
enable via skills.inline_shell in config.yaml — because any snippet
runs on the host without approval.

Changes:
- agent/skill_commands.py: template substitution, inline-shell
  expansion, absolute skill-dir header, supporting-files list now
  shows both relative and absolute forms.
- hermes_cli/config.py: new skills.template_vars,
  skills.inline_shell, skills.inline_shell_timeout knobs.
- tests/agent/test_skill_commands.py: coverage for header, both
  template tokens (present and missing session id), template_vars
  disable, inline-shell default-off, enabled, CWD, and timeout.
- website/docs/developer-guide/creating-skills.md: documents the
  template tokens, the absolute-path header, and the opt-in inline
  shell with its security caveat.

Validation: tests/agent/ 1591 passed (includes 9 new tests).
E2E: loaded a real skill in an isolated HERMES_HOME; confirmed
${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID}
resolves to the passed task_id, !`date` runs when opt-in is set, and
stays literal when it isn't.

* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot

bash login shells don't source ~/.bashrc, so tools that install themselves
there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to
the environment snapshot Hermes builds once per session.  Under systemd
or any context with a minimal parent env, that surfaces as
'node: command not found' in the terminal tool even though the binary
is reachable from every interactive shell on the machine.

Changes:
- tools/environments/local.py: before the login-shell snapshot bootstrap
  runs, prepend guarded 'source <file>' lines for each resolved init
  file.  Missing files are skipped, each source is wrapped with a
  '[ -r ... ] && . ... || true' guard so a broken rc can't abort the
  bootstrap.
- hermes_cli/config.py: new terminal.shell_init_files (explicit list,
  supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on)
  knobs.  When shell_init_files is set it takes precedence; when it's
  empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced.
- tests/tools/test_local_shell_init.py: 10 tests covering the resolver
  (auto-bashrc, missing file, explicit override, ~/${VAR} expansion,
  opt-out) and the prelude builder (quoting, guarded sourcing), plus
  a real-LocalEnvironment snapshot test that confirms exports in the
  init file land in subsequent commands' environment.
- website/docs/reference/faq.md: documents the fix in Troubleshooting,
  including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh
  directly via shell_init_files.

Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40
pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py
50/50 pass.  E2E in an isolated HERMES_HOME: confirmed that a fake
~/.bashrc setting a marker var and PATH addition shows up in a real
LocalEnvironment().execute() call, that auto_source_bashrc=false
suppresses it, that an explicit shell_init_files entry wins over the
auto default, and that a missing bashrc is silently skipped.
2026-04-21 00:39:19 -07:00
Teknium
62cbeb6367
test: stop testing mutable data — convert change-detectors to invariants (#13363)
Catalog snapshots, config version literals, and enumeration counts are data
that changes as designed. Tests that assert on those values add no
behavioral coverage — they just break CI on every routine update and cost
engineering time to 'fix.'

Replace with invariants where one exists, delete where none does.

Deleted (pure snapshots):
- TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al
- TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models'
- test_browser_camofox_state::test_config_version_matches_current_schema
  (docstring literally said it would break on unrelated bumps)

Relaxed (keep plumbing check, drop snapshot):
- Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists:
  now assert 'provider exists and has >= 1 entry' instead of specific names
- HuggingFace main/models.py consistency test: drop 'len >= 6' floor

Dynamicized (follow source, not a literal):
- 3x test_config.py migration tests: raw['_config_version'] ==
  DEFAULT_CONFIG['_config_version'] instead of hardcoded 21

Fixed stale tests against intentional behavior changes:
- test_insights::test_gateway_format_hides_cost: name matches new behavior
  (no dollar figures); remove contradicting '$' in text assertion
- test_config::prefers_api_then_url_then_base_url: flipped per PR #9332;
  rename + update to base_url > url > api
- test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to
  assert called — contract is 'credential flowed through'
- test_interrupt_propagation: add provider/model/_base_url to bare-agent
  fixture so the stale-timeout code path resolves

Fixed stale integration tests against opt-in plugin gate:
- transform_tool_result + transform_terminal_output: write plugins.enabled
  allow-list to config.yaml and reset the plugin manager singleton

Source fix (real consistency invariant):
- agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length
  (262144, same as K2.5). test_model_metadata_has_context_lengths was
  correctly catching the gap.

Policy:
- AGENTS.md Testing section: new subsection 'Don't write change-detector
  tests' with do/don't examples. Reviewers should reject catalog-snapshot
  assertions in new tests.

Covers every test that failed on the last completed main CI run
(24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present
+ test_terminal_and_file_toolsets_resolve_all_tools, which now pass both
alone and with the full tests/tools/ directory (xdist ordering flake that
resolved itself).
2026-04-20 23:20:33 -07:00
kshitijk4poor
7ab5eebd03 feat: add transport types + migrate Anthropic normalize path
Add agent/transports/types.py with three shared dataclasses:
- NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data
- ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata)
- Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens

Add normalize_anthropic_response_v2() to anthropic_adapter.py — wraps the
existing v1 function and maps its output to NormalizedResponse. One call site
in run_agent.py (the main normalize branch) uses v2 with a back-compat shim
to SimpleNamespace for downstream code.

No ABC, no registry, no streaming, no client lifecycle. Those land in PR 3
with the first concrete transport (AnthropicTransport).

46 new tests:
- test_types.py: dataclass construction, build_tool_call, map_finish_reason
- test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools,
  thinking, mixed, stop reasons, mcp prefix stripping, edge cases)

Part of the provider transport refactor (PR 2 of 9).
2026-04-20 23:06:00 -07:00
Teknium
dbb7e00e7e fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 22:14:29 -07:00
Teknium
cecf84daf7 fix: extend hostname-match provider detection across remaining call sites
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
2026-04-20 22:14:29 -07:00
jerilynzheng
b117538798 feat: attribution default_headers for ai-gateway provider
Requests through Vercel AI Gateway now carry referrerUrl / appName /
User-Agent attribution so traffic shows up in the gateway's analytics.
Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new
ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.
2026-04-20 21:02:28 -07:00
Peter Fontana
3988c3c245 feat: shell hooks — wire shell scripts as Hermes hook callbacks
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.

Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
  to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted

Cherry-picked from PR #13143 by @pefontana.
2026-04-20 20:53:51 -07:00
Tanner Fokkens
cde7283821 fix: forward auth when probing local model metadata
Pass the user's configured api_key through local-server detection and
context-length probes (detect_local_server_type, _query_local_context_length,
query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in
fetch_endpoint_model_metadata when a loaded instance is present — so the
probed context length is the actual runtime value the user loaded the model
at, not just the model's theoretical max.

Helps local-LLM users whose auto-detected context length was wrong, causing
compression failures and context-overrun crashes.
2026-04-20 20:51:56 -07:00
entropidelic
3368814a3d fix(security): redact secrets from context compaction input and output
Three-layer defense against secrets leaking into compaction summaries:
1. Input redaction: redact_sensitive_text() on message content and tool
   call arguments in _serialize_for_summary() before sending to summarizer
2. Prompt instructions: NEVER include API keys/tokens/passwords in the
   summarizer preamble, template Critical Context section, and focus topic
3. Output redaction: redact_sensitive_text() on the summary output and
   _previous_summary for iterative updates

Reuses existing agent/redact.py patterns (sk-*, ghp_*, key=value, etc).

Cherry-picked from PR #9200 by @entropidelic.
2026-04-20 16:07:13 -07:00
Teknium
3cba81ebed
fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)
Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
2026-04-20 12:23:05 -07:00
kshitijk4poor
ff56bebdf3 refactor: extract codex_responses logic into dedicated adapter
Extract 12 Codex Responses API format-conversion and normalization functions
from run_agent.py into agent/codex_responses_adapter.py, following the
existing pattern of anthropic_adapter.py and bedrock_adapter.py.

run_agent.py: 12,550 → 11,865 lines (-685 lines)

Functions moved:
- _chat_content_to_responses_parts (multimodal content conversion)
- _summarize_user_message_for_log (multimodal message logging)
- _deterministic_call_id (cache-safe fallback IDs)
- _split_responses_tool_id (composite ID splitting)
- _derive_responses_function_call_id (fc_ prefix conversion)
- _responses_tools (schema format conversion)
- _chat_messages_to_responses_input (message format conversion)
- _preflight_codex_input_items (input validation)
- _preflight_codex_api_kwargs (API kwargs validation)
- _extract_responses_message_text (response text extraction)
- _extract_responses_reasoning_text (reasoning extraction)
- _normalize_codex_response (full response normalization)

All functions are stateless module-level functions. AIAgent methods remain
as thin one-line wrappers. Both module-level helpers are re-exported from
run_agent.py for backward compatibility with existing test imports.

Includes multimodal inline image support (PR #12969) that the original PR
was missing.

Based on PR #12975 by @kshitijk4poor.
2026-04-20 11:53:17 -07:00
Teknium
d587d62eba
feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal (#13148)
* feat(security): URL query param + userinfo + form body redaction

Port from nearai/ironclaw#2529.

Hermes already has broad value-shape coverage in agent/redact.py
(30+ vendor prefixes, JWTs, DB connstrs, etc.) but missed three
key-name-based patterns that catch opaque tokens without recognizable
prefixes:

1. URL query params - OAuth callback codes (?code=...),
   access_token, refresh_token, signature, etc. These are opaque and
   won't match any prefix regex. Now redacted by parameter NAME.

2. URL userinfo (https://user:pass@host) - for non-DB schemes. DB
   schemes were already handled by _DB_CONNSTR_RE.

3. Form-urlencoded body (k=v pairs joined by ampersands) -
   conservative, only triggers on clean pure-form inputs with no
   other text.

Sensitive key allowlist matches ironclaw's (exact case-insensitive,
NOT substring - so token_count and session_id pass through).

Tests: +20 new test cases across 3 test classes. All 75 redact tests
pass; gateway/test_pii_redaction and tools/test_browser_secret_exfil
also green.

Known pre-existing limitation: _ENV_ASSIGN_RE greedy match swallows
whole all-caps ENV-style names + trailing text when followed by
another assignment. Left untouched here (out of scope); URL query
redaction handles the lowercase case.

* feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal

Update model catalogs for OpenRouter (fallback snapshot), Nous Portal,
and NVIDIA NIM to reference moonshotai/kimi-k2.6.  Add kimi-k2.6 to
the fixed-temperature frozenset in auxiliary_client.py so the 0.6
contract is enforced on aggregator routings.

Native Moonshot provider lists (kimi-coding, kimi-coding-cn, moonshot,
opencode-zen, opencode-go) are unchanged — those use Moonshot's own
model IDs which are unaffected.
2026-04-20 11:49:54 -07:00
Austin Pickett
720e1c65b2
Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
kshitijk4poor
bc2559c44d fix: remove codex spark model support
Drop gpt-5.3-codex-spark from Codex forward-compat synthesis,
provider catalogs, and context metadata now that the API no longer
supports it.
2026-04-20 04:51:44 -07:00
Linux2010
b869bf206c fix(error_classifier): handle dict-typed message fields without crashing
When API providers return Pydantic-style validation errors where
body['message'] or body['error']['message'] is a dict (e.g.
{"detail": [...]}), the error classifier was crashing with
AttributeError: 'dict' object has no attribute 'lower'.

The 'or ""' fallback only handles None/falsy values. A non-empty
dict is truthy and passes through to .lower(), which fails.

Fix: Wrap all 5 call sites with str() before calling .lower().
This is a no-op for strings and safely converts dicts to their
repr for pattern matching (no false positives on classification
patterns like 'rate limit', 'context length', etc.).

Closes #11233
2026-04-20 02:40:20 -07:00
haileymarshall
49282b6e04 fix(gemini): assign unique stream indices to parallel tool calls
The streaming translator in agent/gemini_cloudcode_adapter.py keyed OpenAI
tool-call indices by function name, so when the model emitted multiple
parallel functionCall parts with the same name in a single turn (e.g.
three read_file calls in one response), they all collapsed onto index 0.
Downstream aggregators that key chunks by index would overwrite or drop
all but the first call.

Replace the name-keyed dict with a per-stream counter that persists across
SSE events. Each functionCall part now gets a fresh, unique index,
matching the non-streaming path which already uses enumerate(parts).

Add TestTranslateStreamEvent covering parallel-same-name calls, index
persistence across events, and finish-reason promotion to tool_calls.
2026-04-20 02:10:53 -07:00
Ruzzgar
60236862ee fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
helix4u
6ab78401c9 fix(aux): add session_search extra_body and concurrency controls
Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>
2026-04-20 00:47:39 -07:00
kagura-agent
9b60ffc47f fix: include api.moonshot.cn in public API temperature override (#12745)
kimi-k2.5 on api.moonshot.cn/v1 rejects temperature=0.6 with HTTP 400, same
as api.moonshot.ai. The public API check now matches both domains.
2026-04-20 00:32:06 -07:00
helix4u
8155ebd7c4 fix(gemini): sanitize tool schemas for Google providers 2026-04-20 00:26:18 -07:00
Teknium
fc5fda5e38
fix(display): render <missing old_text> in memory previews instead of empty quotes (#12852)
When the model omits old_text on memory replace/remove, the tool preview
rendered as '~memory: ""' / '-memory: ""', which obscured what went wrong.
Render '<missing old_text>' in that case so the failure mode is legible
in the activity feed.

Narrow salvage from #12456 / #12831 — only the display-layer fix, not the
schema/API changes.
2026-04-19 22:45:47 -07:00
Teknium
65a31ee0d5
fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)
Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers.  Synthesizes
eight stale community PRs into one consolidated change.

Five fixes:

- URL detection: consolidate three inline `endswith("/anthropic")`
  checks in runtime_provider.py into the shared _detect_api_mode_for_url
  helper.  Third-party /anthropic endpoints now auto-resolve to
  api_mode=anthropic_messages via one code path instead of three.

- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
  (__init__, switch_model, _try_refresh_anthropic_client_credentials,
  _swap_credential, _try_activate_fallback) now gate on
  `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
  Claude-Code identity injection on third-party endpoints.  Previously
  only 2 of 5 sites were guarded.

- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
  `(should_cache, use_native_layout)` per endpoint.  Replaces three
  inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
  call-site flag.  Native Anthropic and third-party Anthropic gateways
  both get the native cache_control layout; OpenRouter gets envelope
  layout.  Layout is persisted in `_primary_runtime` so fallback
  restoration preserves the per-endpoint choice.

- Auxiliary client: `_try_custom_endpoint` honors
  `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
  instead of silently downgrading to an OpenAI-wire client.  Degrades
  gracefully to OpenAI-wire when the anthropic SDK isn't installed.

- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
  clears stale `api_key`/`api_mode` when switching to a built-in
  provider, so a previous MiniMax custom endpoint's credentials can't
  leak into a later OpenRouter session.

- Truncation continuation: length-continuation and tool-call-truncation
  retry now cover `anthropic_messages` in addition to `chat_completions`
  and `bedrock_converse`.  Reuses the existing `_build_assistant_message`
  path via `normalize_anthropic_response()` so the interim message
  shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases.  Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):
  #7410  @nocoo           — URL detection helper
  #7393  @keyuyuan        — OAuth 5-site guard
  #7367  @n-WN            — OAuth guard (narrower cousin, kept comment)
  #8636  @sgaofen         — caching helper + native-vs-proxy layout split
  #10954 @Only-Code-A     — caching on anthropic_messages+Claude
  #7648  @zhongyueming1121 — aux client anthropic_messages branch
  #6096  @hansnow         — /model switch clears stale api_mode
  #9691  @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects:    #9621 (OpenAI-wire caching with incomplete blocklist — risky),
            #7242 (superseded by #9691, stale branch),
            #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
2026-04-19 22:43:09 -07:00
taeng0204
6f79b8f01d fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai
Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.
2026-04-19 18:54:35 -07:00
Teknium
424e9f36b0
refactor: remove smart_model_routing feature (#12732)
Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default.  This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.

The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.

Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
  example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
  HermesCLI (signature kept; just no longer initialised through the
  smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
  routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md

Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
  simplified _resolve_turn_agent_config directly (preserves credential
  pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
  — they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
  helpers

Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
2026-04-19 18:12:55 -07:00
kshitijk4poor
d393104bad fix(gemini): tighten native routing and streaming replay
- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks
2026-04-19 12:40:08 -07:00
kshitijk4poor
3dea497b20 feat(providers): route gemini through the native AI Studio API
- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks
2026-04-19 12:40:08 -07:00
Teknium
db60c98276
docs(memory): steer agents to save declarative facts, not instructions (#12665)
Imperative memory entries ('Always respond concisely', 'Run tests with
pytest -n 4') get re-read as directives in future sessions, causing
repeated work or overriding the user's current request. Add a short
phrasing guideline to MEMORY_GUIDANCE so the model writes declarative
facts instead ('User prefers concise responses', 'Project uses pytest
with xdist').

Credit: observation from @Mariandipietra on X.
2026-04-19 12:00:53 -07:00
Teknium
cca3278079 fix(codex): pin correct Cloudflare headers and extend to auxiliary client
The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers
2026-04-19 11:59:25 -07:00
Teknium
f1fe29d1c3 feat(providers): extend request_timeout_seconds to all client paths
Follow-up on top of mvanhorn's cherry-picked commit. Original PR only
wired request_timeout_seconds into the explicit-creds OpenAI branch at
run_agent.py init; router-based implicit auth, native Anthropic, and the
fallback chain were still hardcoded to SDK defaults.

- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional
  timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply
  to Anthropic native init + post-refresh rebuild + stale/interrupt
  rebuilds + switch_model + _restore_primary_runtime + the OpenAI
  implicit-auth path + _try_activate_fallback (with immediate client
  rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures
  to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport,
  the fallback chain, and rebuilds after credential rotation.
2026-04-19 11:23:00 -07:00
Dusk1e
fd119a1c4a fix(agent): refresh skills prompt cache when disabled skills change 2026-04-19 11:16:24 -07:00
Teknium
13294c2d18 feat(compression): summaries now respect the conversation's language
Context compaction summaries were always produced in English regardless
of the conversation language, which injected English context into
non-English conversations and muddied the continuation experience.

Adds a one-sentence instruction to the shared `_summarizer_preamble`
used by both the initial-compaction and iterative-update prompt paths.
Placing it in the preamble (rather than adding it separately to each
prompt) means both code paths stay in sync with one edit.

Ported from anomalyco/opencode#20581. The original PR (#4670) landed
before main's prompt templates were refactored to share the
`_summarizer_preamble` and `_template_sections` blocks, so the
cherry-pick conflicted on the now-obsolete inline sections; re-applied
the essential one-line change on top of the current structure.

Verified: 48/48 existing compressor tests pass.
2026-04-19 11:05:14 -07:00
Teknium
b02833f32d
fix(codex): Hermes owns its own Codex auth; stop touching ~/.codex/auth.json (#12360)
Codex OAuth refresh tokens are single-use and rotate on every refresh.
Sharing them with the Codex CLI / VS Code via ~/.codex/auth.json made
concurrent use of both tools a race: whoever refreshed last invalidated
the other side's refresh_token.  On top of that, the silent auto-import
path picked up placeholder / aborted-auth data from ~/.codex/auth.json
(e.g. literal {"access_token":"access-new","refresh_token":"refresh-new"})
and seeded it into the Hermes pool as an entry the selector could
eventually pick.

Hermes now owns its own Codex auth state end-to-end:

Removed
- agent/credential_pool.py: _sync_codex_entry_from_cli() method,
  its pre-refresh + retry + _available_entries call sites, and the
  post-refresh write-back to ~/.codex/auth.json.
- agent/credential_pool.py: auto-import from ~/.codex/auth.json in
  _seed_from_singletons() — users now run `hermes auth openai-codex`
  explicitly.
- hermes_cli/auth.py: silent runtime migration in
  resolve_codex_runtime_credentials() — now surfaces
  `codex_auth_missing` directly (message already points to `hermes auth`).
- hermes_cli/auth.py: post-refresh write-back in
  _refresh_codex_auth_tokens().
- hermes_cli/auth.py: dead helper _write_codex_cli_tokens() and its 4
  tests in test_auth_codex_provider.py.

Kept
- hermes_cli/auth.py: _import_codex_cli_tokens() — still used by the
  interactive `hermes auth openai-codex` setup flow for a user-gated
  one-time import (with "a separate login is recommended" messaging).

User-visible impact
- On existing installs with Hermes auth already present: no change.
- On a fresh install where the user has only logged in via Codex CLI:
  `hermes chat --provider openai-codex` now fails with "No Codex
  credentials stored. Run `hermes auth` to authenticate." The
  interactive setup flow then detects ~/.codex/auth.json and offers a
  one-time import.
- On an install where Codex CLI later refreshes its token: Hermes is
  unaffected (we no longer read from that file at runtime).

Tests
- tests/hermes_cli/test_auth_codex_provider.py: 15/15 pass.
- tests/hermes_cli/test_auth_commands.py: 20/20 pass.
- tests/agent/test_credential_pool.py: 31/31 pass.
- Live E2E on openai-codex/gpt-5.4: 1 API call, 1.7s latency,
  3 log lines, no refresh events, no auth drama.

The related 14:52 refresh-loop bug (hundreds of rotations/minute on a
single entry) is a separate issue — that requires a refresh-attempt
cap on the auth-recovery path in run_agent.py, which remains open.
2026-04-18 19:19:46 -07:00
helix4u
ca32a2a60b fix(gemini): restore bearer auth on openai route 2026-04-18 12:52:01 -07:00
helix4u
a7dd6a3449 fix(gemini): hide stale and low-TPM Google models 2026-04-18 12:52:01 -07:00
helix4u
2eab7ee15f fix(gemini): hide low-TPM Gemma models from exposed lists 2026-04-18 12:52:01 -07:00
Honghua Yang
3128d9fcd2 fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::

    {"path": "/foo.md", "content": "# Long markdown
    ...[truncated]

— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.

Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.

Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.

Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
  output, non-JSON pass-through, nested structures, non-string leaves,
  scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
  reproduces the exact failure payload shape from the incident

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:40:56 -07:00
kshitij
c14b3b5880
fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)
* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
2026-04-18 09:35:51 -07:00
AviArora02-commits
994faacce8 fix: suppress Authorization: Bearer for Gemini provider to prevent HTTP 400 (#7893) 2026-04-17 21:30:17 -07:00
Teknium
2297c5f5ce fix(auth): restore --label for hermes auth add nous --type oauth
persist_nous_credentials() now accepts an optional label kwarg which
gets embedded in providers.nous under the 'label' key.
_seed_from_singletons() prefers the embedded label over the
auto-derived label_from_token() fingerprint when materialising the
pool entry, so re-seeding on every load_pool('nous') preserves the
user's chosen label.

auth_commands.py threads --label through to the helper, restoring
parity with how other OAuth providers (anthropic, codex, google,
qwen) honor the flag.

Tests: 4 new (embed, reseed-survives, no-label fallback, end-to-end
through auth_add_command). All 390 nous/auth/credential_pool tests
pass.
2026-04-17 19:13:40 -07:00
Teknium
a155b4a159
feat(auxiliary): default 'auto' routing to main model for all users (#11900)
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
2026-04-17 19:13:23 -07:00
Michel Belleau
d465fc5869 fix(skills): use frontmatter name in skills index instead of directory name
build_skills_system_prompt() was using the skill directory name (skill_name)
when appending to skills_by_category in all three code paths (snapshot cache,
cold filesystem scan, external dirs). This meant any skill whose directory name
differed from its frontmatter `name` field would appear under the wrong name in
the system prompt, causing LLM routing failures.

The snapshot entry already stores both skill_name (dir) and frontmatter_name
(declared); switch the three tuple appends to use frontmatter_name. Also fix
the external-dir dedup set (seen_skill_names) to track frontmatter names for
consistency with the local-skill tuples now stored under frontmatter_name.

Fixes #11777

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:56:37 -07:00
helix4u
2b60478fc2 fix(kimi): force kimi-for-coding temperature to 0.6 2026-04-17 15:49:14 -07:00
Teknium
c6fd2619f7
fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833)
Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit
path (status_code on the exception, Retry-After preserved via error.response)
instead of being opaque RuntimeErrors. User sees a one-line capacity message
instead of a 500-char JSON dump.

Changes
- CodeAssistError grows status_code / response / retry_after / details attrs.
  _extract_status_code in error_classifier picks up status_code and classifies
  429 as FailoverReason.rate_limit, so fallback_providers triggers the same
  way it does for SDK errors. run_agent.py line ~10428 already walks
  error.response.headers for Retry-After — preserving the response means that
  path just works.
- _gemini_http_error parses the Google error envelope (error.status +
  error.details[].reason from google.rpc.ErrorInfo, retryDelay from
  google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404
  model-not-found each produce a human-readable message; unknown shapes fall
  back to the previous raw-body format.
- Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and
  agent/model_metadata.py — Google returned 404 for it today in local repro.
  Kept gemma-4-31b-it (capacity-constrained but not retired).

Validation
|                           | Before                         | After                                     |
|---------------------------|--------------------------------|-------------------------------------------|
| Error message             | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error      | None (opaque RuntimeError)     | 429                                       |
| Classifier reason         | unknown (string-match fallback) | FailoverReason.rate_limit                |
| Retry-After honored       | ignored                        | extracted from RetryInfo or header        |
| gemma-4-26b-it picker     | advertised (404s on Google)    | removed                                   |

Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found,
Retry-After header fallback, malformed body, and classifier integration.
Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full
tests/hermes_cli (2203 tests) green.

Co-authored-by: teknium1 <teknium@nousresearch.com>
2026-04-17 15:34:12 -07:00