hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-17 09:41:58 +00:00

Author	SHA1	Message	Date
vlwkaos	f7f7588893	fix(agent): only set rate-limit cooldown when leaving primary; add tests	2026-04-24 05:35:43 -07:00
LeonSGP43	a9fd8d7c88	fix(agent): default missing fallback chain on switch	2026-04-24 05:35:43 -07:00
Teknium	a1caec1088	fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124 ) Claude-style and some Anthropic-tuned models occasionally emit tool names as class-like identifiers: TodoTool_tool, Patch_tool, BrowserClick_tool, PatchTool. These failed strict-dict lookup in valid_tool_names and triggered the 'Unknown tool' self-correction loop, wasting a full turn of iteration and tokens. _repair_tool_call already handled lowercase / separator / fuzzy matches but couldn't bridge the CamelCase-to-snake_case gap or the trailing '_tool' suffix that Claude sometimes tacks on. Extend it with two bounded normalization passes: 1. CamelCase -> snake_case (via regex lookbehind). 2. Strip trailing _tool / -tool / tool suffix (case-insensitive, applied twice so TodoTool_tool reduces all the way: strip _tool -> TodoTool, snake -> todo_tool, strip 'tool' -> todo). Cheap fast-paths (lowercase / separator-normalized) still run first so the common case stays zero-cost. Fuzzy match remains the last resort unchanged. Tests: tests/run_agent/test_repair_tool_call_name.py covers the three original reports (TodoTool_tool, Patch_tool, BrowserClick_tool), plus PatchTool, WriteFileTool, ReadFile_tool, write-file_Tool, patch-tool, and edge cases (empty, None, '_tool' alone, genuinely unknown names). 18 new tests + 17 existing arg-repair tests = 35/35 pass. Closes #14784	2026-04-24 05:32:08 -07:00
Prasad Subrahmanya	1fc77f995b	fix(agent): fall back on rate limit when pool has no rotation room Extracts pool-rotation-room logic into `_pool_may_recover_from_rate_limit` so single-credential pools no longer block the eager-fallback path on 429. The existing check `pool is not None and pool.has_available()` lets fallback fire only after the pool marks every entry as exhausted. With exactly one credential in the pool (the common shape for Gemini OAuth, Vertex service accounts, and any personal-key setup), `has_available()` flips back to True as soon as the cooldown expires — Hermes retries against the same entry, hits the same daily-quota 429, and burns the retry budget in a tight loop before ever reaching the configured `fallback_model`. Observed in the wild as 4+ hours of 429 noise on a single Gemini key instead of falling through to Vertex as configured. Rotation is only meaningful with more than one credential — gate on `len(pool.entries()) > 1`. Multi-credential pools keep the current wait-for-rotation behaviour unchanged. Fixes #11314. Related to #8947, #10210, #7230. Narrower scope than open PRs #8023 (classifier change) and #11492 (503/529 credential-pool bypass) — this addresses the single-credential 429 case specifically and does not conflict with either. Tests: 6 new unit tests in tests/run_agent/test_provider_fallback.py covering (a) None pool, (b) single-cred available, (c) single-cred in cooldown, (d) 2-cred available rotates, (e) multi-cred all cooling-down falls back, (f) many-cred available rotates. All 18 tests in the file pass.	2026-04-24 05:20:05 -07:00
l0hde	2cab8129d1	feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that: 1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token) 2. Rebuilds the OpenAI client with fresh credentials and Copilot headers 3. Retries the failed request before falling back The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions). Key design decisions: - Always rebuild client even if token string unchanged (recovers stale state) - Uses _apply_client_headers_for_base_url() for canonical header management - One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers) - No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably) Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios. Docs: Updated providers.md with Copilot auth behavior section.	2026-04-24 05:09:08 -07:00
Teknium	c2b3db48f5	fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107 ) json.JSONDecodeError inherits from ValueError. The agent loop's non-retryable classifier at run_agent.py ~L10782 treated any ValueError/TypeError as a local programming bug and short-circuited retry. Without a carve-out, a transient JSONDecodeError from a provider that returned a malformed response body, a truncated stream, or a router-layer corruption would fail the turn immediately. Add JSONDecodeError to the existing UnicodeEncodeError exclusion tuple so the classified-retry logic (which already handles 429/529/ context-overflow/etc.) gets to run on bad-JSON errors. Tests (tests/run_agent/test_jsondecodeerror_retryable.py): - JSONDecodeError: NOT local validation - UnicodeEncodeError: NOT local validation (existing carve-out) - bare ValueError: IS local validation (programming bug) - bare TypeError: IS local validation (programming bug) - source-level assertion that run_agent.py still carries the carve-out (guards against accidental revert) Closes #14782	2026-04-24 05:02:58 -07:00
Teknium	18f3fc8a6f	fix(tests): resolve 17 persistent CI test failures (#15084 ) Make the main-branch test suite pass again. Most failures were tests still asserting old shapes after recent refactors; two were real source bugs. Source fixes: - tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty. - hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150. Test fixes (mostly stale mock targets / missing fixture fields): - test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm (the local name bound at import), not tools.terminal_tool.cleanup_vm. - test_browser_camofox: patch tools.browser_camofox.load_config, not hermes_cli.config.load_config (the source module, not the resolved one). - test_flush_memories_codex._chat_response_with_memory_call: add finish_reason, tool_call.id, tool_call.type so the chat_completions transport normalizer doesn't AttributeError. - test_concurrent_interrupt: polling_tool signature now accepts messages= kwarg that _invoke_tool() passes through. - test_minimax_provider: add _fallback_chain=[] to the __new__'d agent so switch_model() doesn't AttributeError. - test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working after the scanner switched to agent.skill_utils.iter_skill_index_files (os.walk-based). Point SKILLS_DIR at a real tmp_path and patch agent.skill_utils.get_external_skills_dirs. - test_browser_cdp_tool: browser_cdp toolset was intentionally split into 'browser-cdp' (commit `96b0f3700`) so its stricter check_fn doesn't gate the whole browser toolset; test now expects 'browser-cdp'. - test_registry: add tools.browser_dialog_tool to the expected builtin-discovery set (PR #14540 added it). - test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint' key on the JSON payload, not inline '[Hint: ...' text. - test_write_deny test_hermes_env: resolve .env via get_hermes_home() so the path matches the profile-aware denylist under hermetic HERMES_HOME. - test_checkpoint_manager test_falls_back_to_parent: guard the walk-up so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the project root. - test_quick_commands: set cli.session_id in the __new__'d CLI so the alias-args path doesn't trip AttributeError when fuzzy-matching leaks a skill command across xdist test distribution.	2026-04-24 03:46:46 -07:00
WildCat Eng Manager	7626f3702e	feat: read prompt caching cache_ttl from config - Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in) - Document DEFAULT_CONFIG and developer guide example - Add unit tests for default, 1h, and invalid TTL fallback Made-with: Cursor	2026-04-24 03:21:29 -07:00
luyao618	bc15f526fb	fix(agent): exclude prior-history tool messages from background review summary Cherry-pick-of: `27b6a217b` (PR #14967 by @luyao618) Co-authored-by: luyao618 <364939526@qq.com>	2026-04-24 03:10:19 -07:00
Teknium	166b960fe4	test(proxy): regression tests for NO_PROXY bypass on keepalive client Pin the behaviour added in the preceding commit — `_get_proxy_for_base_url()` must return None for hosts covered by NO_PROXY and the HTTPS_PROXY otherwise, and the full `_create_openai_client()` path must NOT mount HTTPProxy for a NO_PROXY host. Refs: #14966	2026-04-24 03:04:42 -07:00
Reginaldas	3e10f339fd	fix(providers): send user agent to routermint endpoints	2026-04-24 03:02:16 -07:00
Teknium	a9a4416c7c	fix(compress): don't reach into ContextCompressor privates from /compress (#15039 ) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	2026-04-24 02:55:43 -07:00
Teknium	6a20e187dd	test,chore: cover stringified array/object coercion + AUTHOR_MAP entry Follow-up to the cherry-picked coercion commit: adds 9 regression tests covering array/object parsing, invalid-JSON passthrough, wrong-shape preservation, and the issue #3947 gmail-mcp scenario end-to-end. Adds dan@danlynn.com -> danklynn to scripts/release.py AUTHOR_MAP so the salvage PR's contributor attribution doesn't break CI.	2026-04-23 16:38:38 -07:00
maelrx	e020f46bec	fix(agent): preserve MiniMax context length on delta-only overflow	2026-04-23 14:06:37 -07:00
helix4u	1dfcda4e3c	fix(approval): guard env and config overwrites	2026-04-23 14:05:36 -07:00
Teknium	165b2e481a	feat(agent): make API retry count configurable via agent.api_max_retries (#14730 ) Closes #11616. The agent's API retry loop hardcoded max_retries = 3, so users with fallback providers on flaky primaries burned through ~3 × provider timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a chance to kick in. Expose a new config key: agent: api_max_retries: 3 # default unchanged Set it to 1 for fast failover when you have fallback providers, or raise it if you prefer longer tolerance on a single provider. Values < 1 are clamped to 1 (single attempt, no retry); non-integer values fall back to the default. This wraps the Hermes-level retry loop only — the OpenAI SDK's own low-level retries (max_retries=2 default) still run beneath this for transient network errors. Changes: - hermes_cli/config.py: add agent.api_max_retries default 3 with comment. - run_agent.py: read self._api_max_retries in AIAgent.__init__; replace hardcoded max_retries = 3 in the retry loop with self._api_max_retries. - cli-config.yaml.example: documented example entry. - hermes_cli/tips.py: discoverable tip line. - tests/run_agent/test_api_max_retries_config.py: 4 tests covering default, override, clamp-to-one, and invalid-value fallback.	2026-04-23 13:59:32 -07:00
kshitijk4poor	43de1ca8c2	refactor: remove _nr_to_assistant_message shim + fix flush_memories guard NormalizedResponse and ToolCall now have backward-compat properties so the agent loop can read them directly without the shim: ToolCall: .type, .function (returns self), .call_id, .response_item_id NormalizedResponse: .reasoning_content, .reasoning_details, .codex_reasoning_items This eliminates the 35-line shim and its 4 call sites in run_agent.py. Also changes flush_memories guard from hasattr(response, 'choices') to self.api_mode in ('chat_completions', 'bedrock_converse') so it works with raw boto3 dicts too. WS1 items 3+4 of Cycle 2 (#14418).	2026-04-23 02:30:05 -07:00
kshitijk4poor	f4612785a4	refactor: collapse normalize_anthropic_response to return NormalizedResponse directly 3-layer chain (transport → v2 → v1) was collapsed to 2-layer in PR 7. This collapses the remaining 2-layer (transport → v1 → NR mapping in transport) to 1-layer: v1 now returns NormalizedResponse directly. Before: adapter returns (SimpleNamespace, finish_reason) tuple, transport unpacks and maps to NormalizedResponse (22 lines). After: adapter returns NormalizedResponse, transport is a 1-line passthrough. Also updates ToolCall construction — adapter now creates ToolCall dataclass directly instead of SimpleNamespace(id, type, function). WS1 item 1 of Cycle 2 (#14418).	2026-04-23 02:30:05 -07:00
kshitijk4poor	d30ee2e545	refactor: unify transport dispatch + collapse normalize shims Consolidate 4 per-transport lazy singleton helpers (_get_anthropic_transport, _get_codex_transport, _get_chat_completions_transport, _get_bedrock_transport) into one generic _get_transport(api_mode) with a shared dict cache. Collapse the 65-line main normalize block (3 api_mode branches, each with its own SimpleNamespace shim) into 7 lines: one _get_transport() call + one _nr_to_assistant_message() shared shim. The shim extracts provider_data fields (codex_reasoning_items, reasoning_details, call_id, response_item_id) into the SimpleNamespace shape downstream code expects. Wire chat_completions and bedrock_converse normalize through their transports for the first time — these were previously falling into the raw response.choices[0].message else branch. Remove 8 dead codex adapter imports that have zero callers after PRs 1-6. Transport lifecycle improvements: - Eagerly warm transport cache at __init__ (surfaces import errors early) - Invalidate transport cache on api_mode change (switch_model, fallback activation, fallback restore, transport recovery) — prevents stale transport after mid-session provider switch run_agent.py: -32 net lines (11,988 -> 11,956). PR 7 of the provider transport refactor.	2026-04-22 18:34:25 -07:00
Teknium	c345ec9a63	fix(display): strip standalone tool-call XML tags from visible text Port from openclaw/openclaw#67318. Some open models (notably Gemma variants served via OpenRouter) emit tool calls as XML blocks inside assistant content instead of via the structured tool_calls field: <function name="read_file"><parameter name="path">/tmp/x</parameter></function> <tool_call>{"name":"x"}</tool_call> <function_calls>[{...}]</function_calls> Left unstripped, this raw XML leaked to gateway users (Discord, Telegram, Matrix, Feishu, Signal, WhatsApp, etc.) and the CLI, since hermes-agent's existing reasoning-tag stripper handled only <think>/<thinking>/<thought> variants. Extend _strip_think_blocks (run_agent.py) and _strip_reasoning_tags (cli.py) to cover: * <tool_call>, <tool_calls>, <tool_result> * <function_call>, <function_calls> * <function name="..."> ... </function> (Gemma-style) The <function> variant is boundary-gated (only strips when the tag sits at start-of-line or after sentence punctuation AND carries a name="..." attribute) so prose mentions like 'Use <function> declarations in JS' are preserved. Dangling <function name="..."> with no close is intentionally left visible — matches OpenClaw's asymmetry so a truncated streaming tail still reaches the user. Tests: 9 new cases in TestStripThinkBlocks (run_agent) + 9 in new file tests/run_agent/test_strip_reasoning_tags_cli.py. Covers Qwen-style <tool_call>, Gemma-style <function name="...">, multi-line payloads, prose preservation, stray close tags, dangling open tags, and mixed reasoning+tool_call content. Note: this port covers the post-streaming final-text path, which is what gateway adapters and CLI display consume. Extending the per-delta stream filter in gateway/stream_consumer.py to hide these tags live as they stream is a separate follow-up; for now users may see raw XML briefly during a stream before the final cleaned text replaces it. Refs: openclaw/openclaw#67318	2026-04-22 18:12:42 -07:00
LeonSGP43	4ac1c959b2	fix(agent): resolve fallback provider key_env secrets	2026-04-22 14:42:48 -07:00
Teknium	ea67e49574	fix(streaming): silent retry when stream dies mid tool-call (#14151 ) When the streaming connection dropped AFTER user-visible text was delivered but a tool call was in flight, we stubbed the turn with a '⚠ Stream stalled mid tool-call; Ask me to retry' warning — costing an iteration and breaking the flow. Users report this happening increasingly often on long SSE streams through flaky provider routes. Fix: in the existing inner stream-retry loop, relax the deltas_were_sent short-circuit. If a tool call was in flight (partial_tool_names populated) AND the error is a transient connection error (timeout, RemoteProtocolError, SSE 'connection lost', etc.), silently retry instead of bailing out. Fire a brief 'Connection dropped mid tool-call; reconnecting…' marker so the user understands the preamble is about to be re-streamed. Researched how Claude Code (tombstone + non-streaming fallback), OpenCode (blind Effect.retry wrapping whole stream), and Clawdbot (4-way gate: stopReason==error + output==0 + !hadPotentialSideEffects) handle this. Chose the narrow Clawdbot-style gate: retry only when (a) a tool call was actually in flight (otherwise the existing stub-with-recovered-text is correct for pure-text stalls) and (b) the error is transient. Side-effect safety is automatic — no tool has been dispatched within this single API call yet. UX trade-off: user sees preamble text twice on retry (OpenCode-style). Strictly better than a lost action with a 'retry manually' message. If retries exhaust, falls through to the existing stub-with-warning path so the user isn't left with zero signal. Tests: 3 new tests in TestSilentRetryMidToolCall covering (1) silent retry recovers tool call; (2) exhausted retries fall back to stub; (3) text-only stalls don't trigger retry. 30/30 pass.	2026-04-22 13:47:33 -07:00
helix4u	a7d78d3bfd	fix: preserve reasoning_content on Kimi replay	2026-04-22 04:31:59 -07:00
kshitijk4poor	c832ebd67c	feat: add ResponsesApiTransport + wire all Codex transport paths Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the ProviderTransport ABC. Auto-registered via _discover_transports(). Wire ALL Codex transport methods to production paths in run_agent.py: - build_kwargs: main _build_api_kwargs codex branch (50 lines extracted) - normalize_response: main loop + flush + summary + retry (4 sites) - convert_tools: memory flush tool override - convert_messages: called internally via build_kwargs - validate_response: response validation gate - preflight_kwargs: request sanitization (2 sites) Remove 7 dead legacy wrappers from AIAgent (_responses_tools, _chat_messages_to_responses_input, _normalize_codex_response, _preflight_codex_api_kwargs, _preflight_codex_input_items, _extract_responses_message_text, _extract_responses_reasoning_text). Keep 3 ID manipulation methods still used by _build_assistant_message. Update 18 test call sites across 3 test files to call adapter functions directly instead of through deleted AIAgent wrappers. 24 new tests. 343 codex/responses/transport tests pass (0 failures). PR 4 of the provider transport refactor.	2026-04-21 19:48:56 -07:00
Brooklyn Nicholson	f0b763c74f	fix(model-switch): drop stale provider from fallback chain and env after /model Reported during the TUI v2 blitz test: switching from openrouter to anthropic via `/model <name> --provider anthropic` appeared to succeed, but the next turn kept hitting openrouter — the provider the user was deliberately moving away from. Two gaps caused this: 1. `Agent.switch_model` reset `_fallback_activated` / `_fallback_index` but left `_fallback_chain` intact. The chain was seeded from `fallback_providers:` at agent init for the original primary, so when the new primary returned 401 (invalid/expired Anthropic key), `_try_activate_fallback()` picked the old provider back up without informing the user. Prune entries matching either the old primary (user is moving away) or the new primary (redundant) whenever the primary provider actually changes. 2. `_apply_model_switch` persisted `HERMES_MODEL` but never updated `HERMES_INFERENCE_PROVIDER`. Any ambient re-resolution of the runtime (credential pool refresh, compressor rebuild, aux clients) falls through to that env var in `resolve_requested_provider`, so it kept reporting the original provider even after an in-memory switch. Adds three regression tests: fallback-chain prune on primary change, no-op on same-provider model swap, and env-var sync on explicit switch.	2026-04-21 14:31:47 -05:00
Teknium	5e0eed470f	fix(cache): enable prompt caching for Qwen on OpenCode/OpenCode-Go/Alibaba (#13528 ) Qwen models on OpenCode, OpenCode Go, and direct DashScope accept Anthropic-style cache_control markers on OpenAI-wire chat completions, but hermes only injected markers for Claude-named models. Result: zero cache hits on every turn, full prompt re-billed — a community user reported burning through their OpenCode Go subscription on Qwen3.6. Extend _anthropic_prompt_cache_policy to return (True, False) — envelope layout, not native — for the Alibaba provider family when the model name contains 'qwen'. Envelope layout places markers on inner content blocks (matching pi-mono's 'alibaba' cacheControlFormat) and correctly skips top-level markers on tool-role messages (which OpenCode rejects). Non-Qwen models on these providers (GLM, Kimi) keep their existing behaviour — they have automatic server-side caching and don't need client markers. Upstream reference: pi-mono #3392 / #3393 documented this contract for opencode-go Qwen models. Adds 7 regression tests covering Qwen3.5/3.6/coder on each affected provider plus negative cases for GLM/Kimi/OpenRouter-Qwen.	2026-04-21 06:40:58 -07:00
unlinearity	155b619867	fix(agent): normalize socks:// env proxies for httpx/anthropic WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed. Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them: - run_agent._get_proxy_from_env - agent.auxiliary_client._validate_proxy_env_urls - agent.anthropic_adapter.build_anthropic_client - gateway.platforms.base.resolve_proxy_url Regression coverage: - run_agent proxy env resolution - auxiliary proxy env normalization - gateway proxy URL resolution Verified with: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py 39 passed.	2026-04-21 05:52:46 -07:00
Kian Meng	063bc3c1e2	fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot Kimi/Moonshot endpoints require explicit parameters that Hermes was not sending, causing 'Response truncated due to output length limit' errors and inconsistent reasoning behavior. Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli, packages/kosong/src/kosong/chat_provider/kimi.py): 1. max_tokens: Kimi's API defaults to a very low value when omitted. Reasoning tokens share the output budget — the model exhausts it on thinking alone. Send 32000, matching Kimi CLI's generate() default. 2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not inside extra_body). Hermes was not sending it at all because _supports_reasoning_extra_body() returns False for non-OpenRouter endpoints. 3. extra_body.thinking: Kimi CLI uses with_thinking() which sets extra_body.thinking={"type":"enabled"} alongside reasoning_effort. This is a separate control from the OpenAI-style reasoning extra_body that Hermes sends for OpenRouter/GitHub. Without it, the Kimi gateway may not activate reasoning mode correctly. Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot). Tests: 6 new test cases for max_tokens, reasoning_effort, and extra_body.thinking under various configs.	2026-04-21 05:32:27 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363 ) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
Teknium	70d7f79bef	refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340 ) The mid-run steer marker was '[USER STEER (injected mid-run, not tool output): <text>]'. Replaced with a plain two-newline-prefixed 'User guidance: <text>' suffix. Rationale: the marker lives inside the tool result's content string regardless of whether the tool returned JSON, plain text, an MCP result, or a plugin result. The bracketed tag read like structured metadata that some tools (terminal, execute_code) could confuse with their own output formatting. A plain labelled suffix works uniformly across every content shape we produce. Behavior unchanged: - Still injected into the last tool-role message's content. - Still preserves multimodal (Anthropic) content-block lists by appending a text block. - Still drained at both sites added in #12959 and #13205 — per-tool drain between individual calls, and pre-API-call drain at the top of each main-loop iteration. Checked Codex's equivalent (pending_input / inject_user_message_without_turn in codex-rs/core): they record mid-turn user input as a real role:user message via record_user_prompt_and_emit_turn_item(). That's cleaner for their Responses-API model but not portable to Chat Completions where role alternation after tool_calls is strict. Embedding the guidance in the last tool result remains the correct placement for us. Validation: all 21 tests in tests/run_agent/test_steer.py pass.	2026-04-20 22:18:49 -07:00
jerilynzheng	b117538798	feat: attribution default_headers for ai-gateway provider Requests through Vercel AI Gateway now carry referrerUrl / appName / User-Agent attribution so traffic shows up in the gateway's analytics. Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.	2026-04-20 21:02:28 -07:00
Teknium	999dc43899	fix(steer): drain pending steer before each API call, not just after tool execution (#13205 ) When /steer is sent during an API call (model thinking), the steer text sits in _pending_steer until after the next tool batch — which may never come if the model returns a final response. In that case the steer is only delivered as a post-run follow-up, defeating the purpose. Add a pre-API-call drain at the top of the main loop: before building api_messages, check _pending_steer and inject into the last tool result in the messages list. This ensures steers sent during model thinking are visible on the very next API call. If no tool result exists yet (first iteration), the steer is restashed for the post-tool drain to pick up — injecting into a user message would break role alternation. Three new tests cover the pre-API-call drain: injection into last tool result, restash when no tool message exists, and backward scan past non-tool messages.	2026-04-20 16:06:17 -07:00
Teknium	3cba81ebed	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157 ) Kimi's gateway selects the correct temperature server-side based on the active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any temperature value — even the previously "correct" one — conflicts with gateway-managed defaults. Replaces the old approach of forcing specific temperature values (0.6 for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel that tells all call sites to strip the temperature key from API kwargs entirely. Changes: - agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model() prefix check (covers all kimi-* models), _fixed_temperature_for_model() returns sentinel for kimi models. _build_call_kwargs() strips temp. - run_agent.py: _build_api_kwargs, flush_memories, and summary generation paths all handle the sentinel by popping/omitting temperature. - trajectory_compressor.py: _effective_temperature_for_model returns None for kimi (sentinel mapped), direct client calls use kwargs dict to conditionally include temperature. - mini_swe_runner.py: same sentinel handling via wrapper function. - 6 test files updated: all 'forces temperature X' assertions replaced with 'temperature not in kwargs' assertions. Net: -76 lines (171 added, 247 removed). Inspired by PR #13137 (@kshitijk4poor).	2026-04-20 12:23:05 -07:00
Teknium	9725b452a1	fix: extract _repair_tool_call_arguments helper, add tests, bound loop Follow-up for PR #12252 salvage: - Extract 75-line inline repair block to _repair_tool_call_arguments() module-level helper for testability and readability - Remove redundant 'import re as _re' (re already imported at line 33) - Bound the while-True excess-delimiter removal loop to 50 iterations - Add 17 tests covering all 6 repair stages - Add sirEven to AUTHOR_MAP in release.py	2026-04-20 05:12:55 -07:00
Sanjays2402	570f8bab8f	fix(compression): exclude completion tokens from compression trigger (#12026 ) Cherry-picked from PR #12481 by @Sanjays2402. Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% actual context usage instead of the configured 50% threshold. Now uses only prompt_tokens — completion tokens don't consume context window space for the next API call. - 3 new regression tests - Added AUTHOR_MAP entry for @Sanjays2402 Closes #12026	2026-04-20 05:12:10 -07:00
Teknium	f683132c1d	feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969 ) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text\|image_url, image_url: {url, detail?}} Responses: {type: input_text\|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text\|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. - _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>	2026-04-20 04:16:13 -07:00
Teknium	4f24db4258	fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898 ) Context compression silently failed when the auxiliary compression model's context window was smaller than the main model's compression threshold (e.g. GLM-4.5-air at 131k paired with a 150k threshold). The feasibility check warned but the session kept running and compression attempts errored out mid-conversation. Two changes in _check_compression_model_feasibility(): 1. Hard floor: if detected aux context < MINIMUM_CONTEXT_LENGTH (64k), raise ValueError so the session refuses to start. Mirrors the existing main-model rejection at AIAgent.__init__ line 1600. A compression model below 64k cannot summarise a full threshold-sized window. 2. Auto-correct: when aux context is >= 64k but below the computed threshold, lower the live compressor's threshold_tokens to aux_context (and update threshold_percent to match so later update_model() calls stay in sync). Warning reworded to say what was done and how to persist the fix in config.yaml. Only ValueError re-raises; other exceptions in the check remain swallowed as non-fatal.	2026-04-20 00:56:04 -07:00
kshitijk4poor	e485bc60cd	test(kimi): cover api.moonshot.cn direct-call regressions\n\n- add run_agent coverage for the Moonshot China endpoint\n- add sync/async trajectory compressor coverage for api.moonshot.cn	2026-04-20 00:32:06 -07:00
Teknium	65a31ee0d5	fix(anthropic): complete third-party Anthropic-compatible provider support (#12846 ) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): #7410 @nocoo — URL detection helper #7393 @keyuyuan — OAuth 5-site guard #7367 @n-WN — OAuth guard (narrower cousin, kept comment) #8636 @sgaofen — caching helper + native-vs-proxy layout split #10954 @Only-Code-A — caching on anthropic_messages+Claude #7648 @zhongyueming1121 — aux client anthropic_messages branch #6096 @hansnow — /model switch clears stale api_mode #9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: #7366, #8294 (third-party Anthropic identity + caching). Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691. Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>	2026-04-19 22:43:09 -07:00
Teknium	c9b833feb3	fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in `9e844160` (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR #12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR #11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)	2026-04-19 19:18:19 -07:00
kshitijk4poor	50d6799389	fix: propagate kimi base-url temperature overrides Follow up salvaged PR #12668 by threading base_url through the remaining direct-call sites so kimi-k2.5 uses temperature=1.0 on api.moonshot.ai and keeps 0.6 on api.kimi.com/coding. Add focused regression tests for run_agent, trajectory_compressor, and mini_swe_runner.	2026-04-19 18:54:35 -07:00
Teknium	aa5bd09232	fix(tests): unstick CI — sweep stale tests from recent merges (#12670 ) One source fix (web_server category merge) + five test updates that didn't travel with their feature PRs. All 13 failures on the 04-19 CI run on main are now accounted for (5 already self-healed on main; 8 fixed here). Changes - web_server.py: add code_execution → agent to _CATEGORY_MERGE (new singleton section from #11971 broke no-single-field-category invariant). - test_browser_camofox_state: bump hardcoded _config_version 18 → 19 (also from #11971). - test_registry: add browser_cdp_tool (#12369) and discord_tool (#4753) to the expected built-in tool set. - test_run_agent::test_tool_call_accumulation: rewrite fragment chunks — #`0f778f77` switched streaming name-accumulation from += to = to fix MiniMax/NIM duplication; the test still encoded the old fragment-per-chunk premise. - test_concurrent_interrupt::_Stub: no-op _apply_pending_steer_to_tool_results — #12116 added this call after concurrent tool batches; the hand-rolled stub was missing it. - test_codex_cli_model_picker: drop the two obsolete tests that asserted auto-import from ~/.codex/auth.json into the Hermes auth store. #12360 explicitly removed that behavior (refresh-token reuse races with Codex CLI / VS Code); adoption is now explicit via `hermes auth openai-codex`. Remaining 3 tests in the file (normal path, Claude Code fallback, negative case) still cover the picker. Validation - scripts/run_tests.sh across all 6 affected files + surrounding tests (54 tests total) all green locally.	2026-04-19 12:39:58 -07:00
Teknium	d48d6fadff	test(run_agent): pin proxy-env forwarding through keepalive transport Adds a regression guard for the #11277 → proxy-bypass regression fixed in `42b394c3`. With HTTPS_PROXY / HTTP_PROXY / ALL_PROXY set, the custom httpx transport used for TCP keepalives must still route requests through an HTTPProxy pool; without proxy env, no HTTPProxy mount should exist. Also maps zrc <zhurongcheng@rcrai.com> → heykb in scripts/release.py AUTHOR_MAP so the salvage PR passes the author-attribution CI check.	2026-04-19 11:44:43 -07:00
Teknium	f1fe29d1c3	feat(providers): extend request_timeout_seconds to all client paths Follow-up on top of mvanhorn's cherry-picked commit. Original PR only wired request_timeout_seconds into the explicit-creds OpenAI branch at run_agent.py init; router-based implicit auth, native Anthropic, and the fallback chain were still hardcoded to SDK defaults. - agent/anthropic_adapter.py: build_anthropic_client() accepts an optional timeout kwarg (default 900s preserved when unset/invalid). - run_agent.py: resolve per-provider/per-model timeout once at init; apply to Anthropic native init + post-refresh rebuild + stale/interrupt rebuilds + switch_model + _restore_primary_runtime + the OpenAI implicit-auth path + _try_activate_fallback (with immediate client rebuild so the first fallback request carries the configured timeout). - tests: cover anthropic adapter kwarg honoring; widen mock signatures to accept the new timeout kwarg. - docs/example: clarify that the knob now applies to every transport, the fallback chain, and rebuilds after credential rotation.	2026-04-19 11:23:00 -07:00
kshitijk4poor	7bd1a3a4b1	test(compression): cover real init feasibility override	2026-04-19 10:40:26 -07:00
kshitijk4poor	045b28733e	fix(compression): resolve missing config attribute in feasibility check Commit `4a9c3565` added a reference to `self.config` in `_check_compression_model_feasibility()` to pass the user-configured `auxiliary.compression.context_length` to `get_model_context_length()`. However, `AIAgent` never stores the loaded config dict as an instance attribute — the config is loaded into a local variable `_agent_cfg` in `__init__()` and discarded after init. This causes an `AttributeError: 'AIAgent' object has no attribute 'config'` on every session start when compression is enabled, caught by the try/except and logged as a non-fatal DEBUG message. Fix: store the loaded config as `self._config` in `__init__()` and update the reference in the feasibility check to use `self._config`.	2026-04-19 10:40:26 -07:00
helix4u	cd59af17cc	fix(agent): silence quiet_mode in python library use	2026-04-19 00:28:25 -07:00
helix4u	7b1a11b971	fix(memory): keep Honcho provider opt-in	2026-04-18 22:50:55 -07:00
Tranquil-Flow	ec48ec5530	fix(agent): strip <think> blocks from stored assistant content Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression. _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant. Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed. Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction. 3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip. Resolves #8878 and #9568. Originally proposed as PR #9250.	2026-04-18 19:19:24 -07:00
Teknium	9489d1577d	fix(agent): strip unterminated <think> blocks from visible content Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field. _strip_think_blocks()'s closed-pair regex is non-greedy so it only matches complete blocks — any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408). Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close. Everything from that tag to end of string is stripped. The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped. Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly — previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it. 6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs. The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom. This commit does not include the 💭/🧠 emoji regexes from that PR — those glyphs are Hermes CLI display decorations, not model content markers.	2026-04-18 19:19:24 -07:00

1 2 3

128 commits