hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
unlinearity	155b619867	fix(agent): normalize socks:// env proxies for httpx/anthropic WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed. Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them: - run_agent._get_proxy_from_env - agent.auxiliary_client._validate_proxy_env_urls - agent.anthropic_adapter.build_anthropic_client - gateway.platforms.base.resolve_proxy_url Regression coverage: - run_agent proxy env resolution - auxiliary proxy env normalization - gateway proxy URL resolution Verified with: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py 39 passed.	2026-04-21 05:52:46 -07:00
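A minimal sketch of the normalization this commit describes. The helper names come from the commit message, but the bodies and the list of environment variables are assumptions, not the code in utils.py.

```python
# Hypothetical list of proxy variables; the real helper may cover a different set.
PROXY_ENV_VARS = (
    "ALL_PROXY", "HTTP_PROXY", "HTTPS_PROXY",
    "all_proxy", "http_proxy", "https_proxy",
)


def normalize_proxy_url(url):
    """Rewrite the socks:// alias to socks5://; leave every other value alone."""
    if url and url.lower().startswith("socks://"):
        return "socks5://" + url[len("socks://"):]
    return url


def normalize_proxy_env_vars(env):
    """Normalize proxy variables in a mapping such as os.environ, in place."""
    for name in PROXY_ENV_VARS:
        value = env.get(name)
        if value:
            env[name] = normalize_proxy_url(value)
    return env
```

Passing a plain dict instead of os.environ keeps the helper easy to test.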
Kian Meng	063bc3c1e2	fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot Kimi/Moonshot endpoints require explicit parameters that Hermes was not sending, causing 'Response truncated due to output length limit' errors and inconsistent reasoning behavior. Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli, packages/kosong/src/kosong/chat_provider/kimi.py): 1. max_tokens: Kimi's API defaults to a very low value when omitted. Reasoning tokens share the output budget — the model exhausts it on thinking alone. Send 32000, matching Kimi CLI's generate() default. 2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not inside extra_body). Hermes was not sending it at all because _supports_reasoning_extra_body() returns False for non-OpenRouter endpoints. 3. extra_body.thinking: Kimi CLI uses with_thinking() which sets extra_body.thinking={"type":"enabled"} alongside reasoning_effort. This is a separate control from the OpenAI-style reasoning extra_body that Hermes sends for OpenRouter/GitHub. Without it, the Kimi gateway may not activate reasoning mode correctly. Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot). Tests: 6 new test cases for max_tokens, reasoning_effort, and extra_body.thinking under various configs.	2026-04-21 05:32:27 -07:00
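The three parameters named in this commit could be assembled roughly as follows. The function name, its signature, and the way thinking is toggled are illustrative assumptions; only the values (max_tokens of 32000, top-level reasoning_effort, extra_body.thinking) come from the commit message.

```python
def build_kimi_kwargs(model, messages, reasoning_effort="medium", thinking=True):
    """Hypothetical helper: request kwargs for a Kimi/Moonshot endpoint."""
    kwargs = {
        "model": model,
        "messages": messages,
        # Explicit output budget, since reasoning tokens share it.
        "max_tokens": 32000,
    }
    if thinking:
        # Top-level parameter, not nested inside extra_body.
        kwargs["reasoning_effort"] = reasoning_effort
        # Separate switch that enables reasoning mode on the gateway.
        kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
    return kwargs
```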
kshitijk4poor	731f4fbae6	feat: add transport ABC + AnthropicTransport wired to all paths Add ProviderTransport ABC (4 abstract methods: convert_messages, convert_tools, build_kwargs, normalize_response) plus optional hooks (validate_response, extract_cache_stats, map_finish_reason). Add transport registry with lazy discovery — get_transport() auto-imports transport modules on first call. Add AnthropicTransport — delegates to existing anthropic_adapter.py functions, wired to ALL Anthropic code paths in run_agent.py: - Main normalize loop (L10775) - Main build_kwargs (L6673) - Response validation (L9366) - Finish reason mapping (L9534) - Cache stats extraction (L9827) - Truncation normalize (L9565) - Memory flush build_kwargs + normalize (L7363, L7395) - Iteration-limit summary + retry (L8465, L8498) Zero direct adapter imports remain for transport methods. Client lifecycle, streaming, auth, and credential management stay on AIAgent. 20 new tests (ABC contract, registry, AnthropicTransport methods). 359 anthropic-related tests pass (0 failures). PR 3 of the provider transport refactor.	2026-04-21 01:27:01 -07:00
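A sketch of the shape such a transport ABC and registry might take. The method names are the ones listed in the commit; the signatures, the default hook behavior, the `register_transport` helper, and the `EchoTransport` class are assumptions made for illustration. Lazy module discovery is left out.

```python
from abc import ABC, abstractmethod


class ProviderTransport(ABC):
    """Four required conversions plus optional hooks with default behavior."""

    @abstractmethod
    def convert_messages(self, messages): ...

    @abstractmethod
    def convert_tools(self, tools): ...

    @abstractmethod
    def build_kwargs(self, model, messages, tools=None, **params): ...

    @abstractmethod
    def normalize_response(self, response): ...

    # Optional hooks: subclasses override only what they need.
    def validate_response(self, response):
        return response is not None

    def extract_cache_stats(self, response):
        return None

    def map_finish_reason(self, raw_reason):
        return raw_reason


_REGISTRY = {}


def register_transport(name, cls):
    _REGISTRY[name] = cls


def get_transport(name):
    """Return a new transport instance for a registered name."""
    return _REGISTRY[name]()


class EchoTransport(ProviderTransport):
    """Toy transport that passes everything through unchanged."""

    def convert_messages(self, messages):
        return list(messages)

    def convert_tools(self, tools):
        return list(tools or [])

    def build_kwargs(self, model, messages, tools=None, **params):
        return {"model": model, "messages": self.convert_messages(messages),
                "tools": self.convert_tools(tools), **params}

    def normalize_response(self, response):
        return response


register_transport("echo", EchoTransport)
```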
alt-glitch	28b3f49aaa	refactor: remove remaining redundant local imports (comprehensive sweep) Full AST-based scan of all .py files to find every case where a module or name is imported locally inside a function body but is already available at module level. This is the second pass — the first commit handled the known cases from the lint report; this one catches everything else. Files changed (19): cli.py — 16 removals: time as _time/_t/_tmod (×10), re / re as _re (×2), os as _os, sys, partial os from combo import, from model_tools import get_tool_definitions gateway/run.py — 8 removals: MessageEvent as _ME / MessageType as _MT (×3), os as _os2, MessageEvent+MessageType (×2), Platform, BasePlatformAdapter as _BaseAdapter run_agent.py — 6 removals: get_hermes_home as _ghh, partial (contextlib, os as _os), cleanup_vm, cleanup_browser, set_interrupt as _sif (×2), partial get_toolset_for_tool hermes_cli/main.py — 4 removals: get_hermes_home, time as _time, logging as _log, shutil hermes_cli/config.py — 1 removal: get_hermes_home as _ghome hermes_cli/runtime_provider.py — 1 removal: load_config as _load_bedrock_config hermes_cli/setup.py — 2 removals: importlib.util (×2) hermes_cli/nous_subscription.py — 1 removal: from hermes_cli.config import load_config hermes_cli/tools_config.py — 1 removal: from hermes_cli.config import load_config, save_config cron/scheduler.py — 3 removals: concurrent.futures, json as _json, from hermes_cli.config import load_config batch_runner.py — 1 removal: list_distributions as get_all_dists (kept print_distribution_info, not at top level) tools/send_message_tool.py — 2 removals: import os (×2) tools/skills_tool.py — 1 removal: logging as _logging tools/browser_camofox.py — 1 removal: from hermes_cli.config import load_config tools/image_generation_tool.py — 1 removal: import fal_client environments/tool_context.py — 1 removal: concurrent.futures gateway/platforms/bluebubbles.py — 1 removal: httpx as _httpx gateway/platforms/whatsapp.py — 1 
removal: import asyncio tui_gateway/server.py — 2 removals: from datetime import datetime, import time All alias references (_time, _t, _tmod, _re, _os, _os2, _json, _ghh, _ghome, _sif, _ME, _MT, _BaseAdapter, _load_bedrock_config, _httpx, _logging, _log, get_all_dists) updated to use the top-level names.	2026-04-21 00:50:58 -07:00
alt-glitch	1010e5fa3c	refactor: remove redundant local imports already available at module level Sweep ~74 redundant local imports across 21 files where the same module was already imported at the top level. Also includes type fixes and lint cleanups on the same branch.	2026-04-21 00:50:58 -07:00
kshitijk4poor	7ab5eebd03	feat: add transport types + migrate Anthropic normalize path Add agent/transports/types.py with three shared dataclasses: - NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data - ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata) - Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens Add normalize_anthropic_response_v2() to anthropic_adapter.py — wraps the existing v1 function and maps its output to NormalizedResponse. One call site in run_agent.py (the main normalize branch) uses v2 with a back-compat shim to SimpleNamespace for downstream code. No ABC, no registry, no streaming, no client lifecycle. Those land in PR 3 with the first concrete transport (AnthropicTransport). 46 new tests: - test_types.py: dataclass construction, build_tool_call, map_finish_reason - test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools, thinking, mixed, stop reasons, mcp prefix stripping, edge cases) Part of the provider transport refactor (PR 2 of 9).	2026-04-20 23:06:00 -07:00
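The three dataclasses could look something like this. The field names are taken from the commit message; the field types and defaults are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str  # assumed to be a JSON-encoded string
    provider_data: dict = field(default_factory=dict)  # per-tool-call protocol metadata


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cached_tokens: int = 0


@dataclass
class NormalizedResponse:
    content: Optional[str] = None
    tool_calls: list = field(default_factory=list)
    finish_reason: Optional[str] = None
    reasoning: Optional[str] = None
    usage: Optional[Usage] = None
    provider_data: dict = field(default_factory=dict)
```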
Teknium	70d7f79bef	refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340) The mid-run steer marker was '[USER STEER (injected mid-run, not tool output): <text>]'. Replaced with a plain two-newline-prefixed 'User guidance: <text>' suffix. Rationale: the marker lives inside the tool result's content string regardless of whether the tool returned JSON, plain text, an MCP result, or a plugin result. The bracketed tag read like structured metadata that some tools (terminal, execute_code) could confuse with their own output formatting. A plain labelled suffix works uniformly across every content shape we produce. Behavior unchanged: - Still injected into the last tool-role message's content. - Still preserves multimodal (Anthropic) content-block lists by appending a text block. - Still drained at both sites added in #12959 and #13205: per-tool drain between individual calls, and pre-API-call drain at the top of each main-loop iteration. Checked Codex's equivalent (pending_input / inject_user_message_without_turn in codex-rs/core): they record mid-turn user input as a real role:user message via record_user_prompt_and_emit_turn_item(). That's cleaner for their Responses-API model but not portable to Chat Completions where role alternation after tool_calls is strict. Embedding the guidance in the last tool result remains the correct placement for us. Validation: all 21 tests in tests/run_agent/test_steer.py pass.	2026-04-20 22:18:49 -07:00
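A rough sketch of the injection behavior described here and in the earlier steer commits. The function name and return convention are hypothetical, and the exact text placed in the appended block for list content is an assumption.

```python
def apply_pending_steer(messages, steer_text):
    """Append the steer to the most recent tool-role message.

    Returns True if injected, or False when no tool message exists and
    the caller should keep the steer pending.
    """
    for msg in reversed(messages):  # backward scan past non-tool messages
        if msg.get("role") != "tool":
            continue
        content = msg.get("content")
        if isinstance(content, list):
            # Multimodal content-block list: add a text block, keep the rest.
            content.append({"type": "text", "text": f"User guidance: {steer_text}"})
        else:
            msg["content"] = (content or "") + f"\n\nUser guidance: {steer_text}"
        return True
    return False
```

Returning False rather than falling back to a user message reflects the role-alternation constraint the commit mentions.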
Teknium	dbb7e00e7e	fix: sweep remaining provider-URL substring checks across codebase Completes the hostname-hardening sweep — every substring check against a provider host in live-routing code is now hostname-based. This closes the same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen, ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI, Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI, and Anthropic. New helper: - utils.base_url_host_matches(base_url, domain) — safe counterpart to 'domain in base_url'. Accepts hostname equality and subdomain matches; rejects path segments, host suffixes, and prefix collisions. Call sites converted (real-code only; tests, optional-skills, red-teaming scripts untouched): run_agent.py (10 sites): - AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check) - header cascade for openrouter / copilot / kimi / qwen / chatgpt - interleaved-thinking trigger (openrouter + claude) - _is_openrouter_url(), _is_qwen_portal() - is_native_anthropic check - github-models-vs-copilot detection (3 sites) - reasoning-capable route gate (nousresearch, vercel, github) - codex-backend detection in API kwargs build - fallback api_mode Bedrock detection agent/auxiliary_client.py (7 sites): - extra-headers cascades in 4 distinct client-construction paths (resolve custom, resolve auto, OpenRouter-fallback-to-custom, _async_client_from_sync, resolve_provider_client explicit-custom, resolve_auto_with_codex) - _is_openrouter_client() base_url sniff agent/usage_pricing.py: - resolve_billing_route openrouter branch agent/model_metadata.py: - _is_openrouter_base_url(), Bedrock context-length lookup hermes_cli/providers.py: - determine_api_mode Bedrock heuristic hermes_cli/runtime_provider.py: - _is_openrouter_url flag for API-key preference (issues #420, #560) hermes_cli/doctor.py: - Kimi User-Agent header for /models probes tools/delegate_tool.py: - subagent Codex endpoint detection 
trajectory_compressor.py: - _detect_provider() cascade (8 providers: openrouter, nous, codex, zai, kimi-coding, arcee, minimax-cn, minimax) cli.py, gateway/run.py: - /model-switch cache-enabled hint (openrouter + claude) Bedrock detection tightened from 'bedrock-runtime in url' to 'hostname starts with bedrock-runtime. AND host is under amazonaws.com'. ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'. Tests: - tests/test_base_url_hostname.py extended with a base_url_host_matches suite (exact match, subdomain, path-segment rejection, host-suffix rejection, host-prefix rejection, empty-input, case-insensitivity, trailing dot). Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock, gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback, fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution, delegate, credential_pool, context_compressor, plus the 4 hostname test modules). 26-assertion E2E call-site verification across 6 modules passes.	2026-04-20 22:14:29 -07:00
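The difference between a substring check and a hostname check can be sketched with the standard library. The helper names match the commit messages, but these bodies are an illustration under that assumption rather than the code in utils.py.

```python
from urllib.parse import urlparse


def base_url_hostname(base_url):
    """Lowercased hostname of a base URL, without port or trailing dot."""
    if not base_url:
        return ""
    # urlparse only fills in the host when a scheme or leading "//" is present.
    candidate = base_url if "://" in base_url else "//" + base_url
    return (urlparse(candidate).hostname or "").lower().rstrip(".")


def base_url_host_matches(base_url, domain):
    """True for the domain itself or any subdomain of it, and nothing else."""
    host = base_url_hostname(base_url)
    domain = (domain or "").lower().rstrip(".")
    if not host or not domain:
        return False
    return host == domain or host.endswith("." + domain)
```

With this shape, `https://proxy/api.openai.com/v1` and `https://api.openai.com.evil/v1` both fail the match for `api.openai.com`, whereas `"api.openai.com" in base_url` would accept them.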
Teknium	cecf84daf7	fix: extend hostname-match provider detection across remaining call sites Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the two openai/xai sites in run_agent.py. This finishes the sweep: the same substring-match false-positive class (e.g. https://api.openai.com.evil/v1, https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1) existed in eight more call sites, and the hostname helper was duplicated in two modules. - utils: add shared base_url_hostname() (single source of truth). - hermes_cli/runtime_provider, run_agent: drop local duplicates, import from utils. Reuse the cached AIAgent._base_url_hostname attribute everywhere it's already populated. - agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg selection to hostname equality. - run_agent: native-anthropic check in the Claude-style model branch and in the AIAgent init provider-auto-detect branch. - agent/model_metadata: Anthropic /v1/models context-length lookup. - hermes_cli/providers.determine_api_mode: anthropic / openai URL heuristics for custom/unknown providers (the /anthropic path-suffix convention for third-party gateways is preserved). - tools/delegate_tool: anthropic detection for delegated subagent runtimes. - hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint native-OpenAI detection (paired with deduping the repeated check into a single is_native_openai boolean per branch). Tests: - tests/test_base_url_hostname.py covers the helper directly (path-containing-host, host-suffix, trailing dot, port, case). - tests/hermes_cli/test_determine_api_mode_hostname.py adds the same regression class for determine_api_mode, plus a test that the /anthropic third-party gateway convention still wins. Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.	2026-04-20 22:14:29 -07:00
Aslaaen	5356797f1b	fix: restrict provider URL detection to exact hostname matches	2026-04-20 22:14:29 -07:00
jerilynzheng	b117538798	feat: attribution default_headers for ai-gateway provider Requests through Vercel AI Gateway now carry referrerUrl / appName / User-Agent attribution so traffic shows up in the gateway's analytics. Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.	2026-04-20 21:02:28 -07:00
Teknium	e00d9630c5	fix: thread api_key through ollama num_ctx probe + author map Follow-up for salvaged PR #3185: - run_agent.py: pass self.api_key to query_ollama_num_ctx() so Ollama behind an auth proxy (same issue class as the LM Studio fix) can be probed successfully. - scripts/release.py AUTHOR_MAP: map @tannerfokkens-maker's local-hostname commit email.	2026-04-20 20:51:56 -07:00
Teknium	999dc43899	fix(steer): drain pending steer before each API call, not just after tool execution (#13205) When /steer is sent during an API call (model thinking), the steer text sits in _pending_steer until after the next tool batch, which may never come if the model returns a final response. In that case the steer is only delivered as a post-run follow-up, defeating the purpose. Add a pre-API-call drain at the top of the main loop: before building api_messages, check _pending_steer and inject into the last tool result in the messages list. This ensures steers sent during model thinking are visible on the very next API call. If no tool result exists yet (first iteration), the steer is restashed for the post-tool drain to pick up, because injecting into a user message would break role alternation. Three new tests cover the pre-API-call drain: injection into last tool result, restash when no tool message exists, and backward scan past non-tool messages.	2026-04-20 16:06:17 -07:00
Teknium	3cba81ebed	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157) Kimi's gateway selects the correct temperature server-side based on the active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any temperature value, even the previously "correct" one, conflicts with gateway-managed defaults. Replaces the old approach of forcing specific temperature values (0.6 for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel that tells all call sites to strip the temperature key from API kwargs entirely. Changes: - agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model() prefix check (covers all kimi-* models), _fixed_temperature_for_model() returns sentinel for kimi models. _build_call_kwargs() strips temp. - run_agent.py: _build_api_kwargs, flush_memories, and summary generation paths all handle the sentinel by popping/omitting temperature. - trajectory_compressor.py: _effective_temperature_for_model returns None for kimi (sentinel mapped), direct client calls use kwargs dict to conditionally include temperature. - mini_swe_runner.py: same sentinel handling via wrapper function. - 6 test files updated: all 'forces temperature X' assertions replaced with 'temperature not in kwargs' assertions. Net: -76 lines (171 added, 247 removed). Inspired by PR #13137 (@kshitijk4poor).	2026-04-20 12:23:05 -07:00
kshitijk4poor	ff56bebdf3	refactor: extract codex_responses logic into dedicated adapter Extract 12 Codex Responses API format-conversion and normalization functions from run_agent.py into agent/codex_responses_adapter.py, following the existing pattern of anthropic_adapter.py and bedrock_adapter.py. run_agent.py: 12,550 → 11,865 lines (-685 lines) Functions moved: - _chat_content_to_responses_parts (multimodal content conversion) - _summarize_user_message_for_log (multimodal message logging) - _deterministic_call_id (cache-safe fallback IDs) - _split_responses_tool_id (composite ID splitting) - _derive_responses_function_call_id (fc_ prefix conversion) - _responses_tools (schema format conversion) - _chat_messages_to_responses_input (message format conversion) - _preflight_codex_input_items (input validation) - _preflight_codex_api_kwargs (API kwargs validation) - _extract_responses_message_text (response text extraction) - _extract_responses_reasoning_text (reasoning extraction) - _normalize_codex_response (full response normalization) All functions are stateless module-level functions. AIAgent methods remain as thin one-line wrappers. Both module-level helpers are re-exported from run_agent.py for backward compatibility with existing test imports. Includes multimodal inline image support (PR #12969) that the original PR was missing. Based on PR #12975 by @kshitijk4poor.	2026-04-20 11:53:17 -07:00
Teknium	9725b452a1	fix: extract _repair_tool_call_arguments helper, add tests, bound loop Follow-up for PR #12252 salvage: - Extract 75-line inline repair block to _repair_tool_call_arguments() module-level helper for testability and readability - Remove redundant 'import re as _re' (re already imported at line 33) - Bound the while-True excess-delimiter removal loop to 50 iterations - Add 17 tests covering all 6 repair stages - Add sirEven to AUTHOR_MAP in release.py	2026-04-20 05:12:55 -07:00
Severin Bretscher	9eeaaa4f1b	fix(agent): repair malformed tool_call arguments before API send Cherry-picked from PR #12252 by @sirEven. Models like GLM-5.1 via Ollama can produce malformed tool_call arguments (truncated JSON, trailing commas, Python None). The existing except Exception: pass silently passes broken args to the API, which rejects them with HTTP 400, crashing the session. Adds a multi-stage repair pipeline at the pre-send normalization point: 1. Empty/whitespace-only → {} 2. Python None literal → {} 3. Strip trailing commas 4. Auto-close unclosed brackets 5. Remove excess closing delimiters 6. Last resort: replace with {} (logged at WARNING)	2026-04-20 05:12:55 -07:00
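The six repair stages listed above, together with the 50-iteration bound from the follow-up commit, might be sketched like this. It is a simplified illustration, not the body of `_repair_tool_call_arguments()`; in particular the trailing-comma regex does not protect commas inside string values.

```python
import json
import re


def _unclosed(text):
    """Return (stack of unclosed brackets, whether a string is left open)."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()
    return stack, in_string


def repair_tool_call_arguments(raw, max_trims=50):
    text = (raw or "").strip()
    if not text or text == "None":          # stages 1 and 2
        return "{}"
    try:
        json.loads(text)
        return text                          # already valid, leave untouched
    except ValueError:
        pass
    text = re.sub(r",\s*([}\]])", r"\1", text)   # stage 3: trailing commas
    text = re.sub(r",\s*$", "", text)
    stack, in_string = _unclosed(text)       # stage 4: close what is open
    if in_string:
        text += '"'
    text += "".join("}" if ch == "{" else "]" for ch in reversed(stack))
    for _ in range(max_trims):               # stage 5: bounded trimming
        try:
            json.loads(text)
            return text
        except ValueError:
            if text.endswith(("}", "]")):
                text = text[:-1].rstrip()
            else:
                break
    return "{}"                              # stage 6: last resort
```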
Sanjays2402	570f8bab8f	fix(compression): exclude completion tokens from compression trigger (#12026) Cherry-picked from PR #12481 by @Sanjays2402. Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% actual context usage instead of the configured 50% threshold. The trigger now uses only prompt_tokens, because completion tokens don't consume context window space for the next API call. - 3 new regression tests - Added AUTHOR_MAP entry for @Sanjays2402 Closes #12026	2026-04-20 05:12:10 -07:00
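The effect of the change can be shown with a small comparison. Both functions and the token counts are illustrative assumptions; only the 50% threshold and the roughly 42% figure come from the commit message.

```python
def should_compress_old(prompt_tokens, completion_tokens, context_length, threshold=0.5):
    # Previous behavior: thinking tokens counted toward the trigger.
    return prompt_tokens + completion_tokens >= context_length * threshold


def should_compress_new(prompt_tokens, completion_tokens, context_length, threshold=0.5):
    # Fixed behavior: only the prompt side is compared with the threshold.
    return prompt_tokens >= context_length * threshold


# Hypothetical turn: 42% of a 100k window in the prompt, 9k reasoning-heavy output.
usage = dict(prompt_tokens=42_000, completion_tokens=9_000, context_length=100_000)
```

With these numbers the old check fires at 51,000 summed tokens while the new one waits until the prompt alone reaches 50,000.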
Teknium	f683132c1d	feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969 ) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text\|image_url, image_url: {url, detail?}} Responses: {type: input_text\|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text\|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. 
- _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>	2026-04-20 04:16:13 -07:00
Teknium	eba7c869bb	fix(steer): drain /steer between individual tool calls, not at batch end (#12959) Previously, /steer text was only injected after an entire tool batch completed (_execute_tool_calls_sequential/concurrent returned). If the batch had a long-running tool (delegate_task, terminal build), the steer waited for ALL tools to finish before landing, which was functionally identical to /queue from the user's perspective. Now _apply_pending_steer_to_tool_results() is called after EACH individual tool result is appended to messages, in both the sequential and concurrent paths. A steer arriving during Tool 1 lands in Tool 1's result before Tool 2 starts executing. Also handles leftover steers in the gateway: if a steer arrives during the final API call (no tool batch to drain into), it's now delivered as the next user turn instead of being silently dropped. Fixes user report from Utku.	2026-04-20 03:08:04 -07:00
Teknium	4f24db4258	fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898) Context compression silently failed when the auxiliary compression model's context window was smaller than the main model's compression threshold (e.g. GLM-4.5-air at 131k paired with a 150k threshold). The feasibility check warned but the session kept running and compression attempts errored out mid-conversation. Two changes in _check_compression_model_feasibility(): 1. Hard floor: if detected aux context < MINIMUM_CONTEXT_LENGTH (64k), raise ValueError so the session refuses to start. Mirrors the existing main-model rejection at AIAgent.__init__ line 1600. A compression model below 64k cannot summarise a full threshold-sized window. 2. Auto-correct: when aux context is >= 64k but below the computed threshold, lower the live compressor's threshold_tokens to aux_context (and update threshold_percent to match so later update_model() calls stay in sync). Warning reworded to say what was done and how to persist the fix in config.yaml. Only ValueError re-raises; other exceptions in the check remain swallowed as non-fatal.	2026-04-20 00:56:04 -07:00
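The two rules can be sketched as a pure function. The function name, its signature, and the exact value of the 64k constant are assumptions; the real check operates on the live compressor object.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # assumed value for "64k"


def check_compression_feasibility(aux_context, main_context, threshold_percent):
    """Return (threshold_tokens, threshold_percent) after applying both rules."""
    # Rule 1: hard floor, refuse to start.
    if aux_context < MINIMUM_CONTEXT_LENGTH:
        raise ValueError(
            f"compression model context {aux_context} is below "
            f"the {MINIMUM_CONTEXT_LENGTH} minimum"
        )
    threshold_tokens = int(main_context * threshold_percent)
    # Rule 2: auto-correct so the aux model can hold a full threshold-sized window.
    if aux_context < threshold_tokens:
        threshold_tokens = aux_context
        threshold_percent = aux_context / main_context
    return threshold_tokens, threshold_percent
```

For a hypothetical 200k main model at 75% (a 150k threshold) paired with a 131k auxiliary model, the threshold drops to 131,000 tokens, or 65.5%.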
helix4u	03e3c22e86	fix(config): add stale timeout settings	2026-04-20 00:52:50 -07:00
Teknium	65a31ee0d5	fix(anthropic): complete third-party Anthropic-compatible provider support (#12846 ) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. 
- Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): #7410 @nocoo — URL detection helper #7393 @keyuyuan — OAuth 5-site guard #7367 @n-WN — OAuth guard (narrower cousin, kept comment) #8636 @sgaofen — caching helper + native-vs-proxy layout split #10954 @Only-Code-A — caching on anthropic_messages+Claude #7648 @zhongyueming1121 — aux client anthropic_messages branch #6096 @hansnow — /model switch clears stale api_mode #9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: #7366, #8294 (third-party Anthropic identity + caching). Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691. Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>	2026-04-19 22:43:09 -07:00
Brian D. Evans	1cf1016e72	fix(run_agent): preserve dotted Bedrock inference-profile model IDs (#11976 ) Bedrock rejects ``global-anthropic-claude-opus-4-7`` with ``HTTP 400: The provided model identifier is invalid`` because its inference profile IDs embed structural dots (``global.anthropic.claude-opus-4-7``) that ``normalize_model_name`` was converting to hyphens. ``AIAgent._anthropic_preserve_dots`` did not include ``bedrock`` in its provider allowlist, so every Claude-on- Bedrock request through the AnthropicBedrock SDK path shipped with the mangled model ID and failed. Root cause ---------- ``run_agent.py:_anthropic_preserve_dots`` (previously line 6589) controls whether ``agent.anthropic_adapter.normalize_model_name`` converts dots to hyphens. The function listed Alibaba, MiniMax, OpenCode Go/Zen and ZAI but not Bedrock, so when a user set ``provider: bedrock`` with a dotted inference-profile model the flag returned False and ``normalize_model_name`` mangled every dot in the ID. All four call sites in run_agent.py (``build_anthropic_kwargs`` + three fallback / review / summary paths at lines 6707, 7343, 8408, 8440) read from this same helper. The bug shape matches #5211 for opencode-go, which was fixed in commit `f77be22c` by extending this same allowlist. Fix --- * Add ``"bedrock"`` to the provider allowlist. * Add ``"bedrock-runtime."`` to the base-URL heuristic as defense-in-depth, so a custom-provider-shaped config with ``base_url: https://bedrock-runtime.<region>.amazonaws.com`` also takes the preserve-dots path even if ``provider`` isn't explicitly set to ``"bedrock"``. This mirrors how the code downstream at run_agent.py:759 already treats either signal as "this is Bedrock". 
Bedrock model ID shapes covered ------------------------------- \| Shape \| Preserved \| \| --- \| --- \| \| ``global.anthropic.claude-opus-4-7`` (reporter's exact ID) \| ✓ \| \| ``us.anthropic.claude-sonnet-4-5-20250929-v1:0`` \| ✓ \| \| ``apac.anthropic.claude-haiku-4-5`` \| ✓ \| \| ``anthropic.claude-3-5-sonnet-20241022-v2:0`` (foundation) \| ✓ \| \| ``eu.anthropic.claude-3-5-sonnet`` (regional inference profile) \| ✓ \| Non-Claude Bedrock models (Nova, Llama, DeepSeek) take the ``bedrock_converse`` / boto3 path which does not call ``normalize_model_name``, so they were never affected by this bug and remain unaffected by the fix. Narrow scope — explicitly not changed ------------------------------------- * ``bedrock_converse`` path (non-Claude Bedrock models) — already correct; no ``normalize_model_name`` in that pipeline. * Provider aliases (``aws``, ``aws-bedrock``, ``amazon``, ``amazon-bedrock``) — if a user bypasses the alias-normalization pipeline and passes ``provider="aws"`` directly, the base-URL heuristic still catches it because Bedrock always uses a ``bedrock-runtime.`` endpoint. Adding the aliases themselves to the provider set is cheap but would be scope creep for this fix. * No other places in ``agent/anthropic_adapter.py`` mangle dots, so the fix is confined to ``_anthropic_preserve_dots``. Regression coverage ------------------- ``tests/agent/test_bedrock_integration.py`` gains three new classes: * ``TestBedrockPreserveDotsFlag`` (5 tests): flag returns True for ``provider="bedrock"`` and for Bedrock runtime URLs (us-east-1 and ap-northeast-2 — the reporter's region); returns False for non- Bedrock AWS URLs like ``s3.us-east-1.amazonaws.com``; canary that Anthropic-native still returns False. 
* ``TestBedrockModelNameNormalization`` (5 tests): every documented Bedrock model-ID shape survives ``normalize_model_name`` with the flag on; inverse canary pins that ``preserve_dots=False`` still mangles (so a future refactor can't decouple the flag from its effect). * ``TestBedrockBuildAnthropicKwargsEndToEnd`` (2 tests): integration through ``build_anthropic_kwargs`` shows the reporter's exact model ID ends up unmangled in the outgoing kwargs. Three of the new flag tests fail on unpatched ``origin/main`` with ``assert False is True`` (preserve-dots returning False for Bedrock), confirming the regression is caught. Validation ---------- ``source venv/bin/activate && python -m pytest tests/agent/test_bedrock_integration.py tests/agent/test_minimax_provider.py -q`` -> 84 passed (40 new bedrock tests + 44 pre-existing, including the minimax canaries that pin the pattern this fix mirrors). CI-aligned broad suite: 12827 passed, 39 skipped, 19 pre-existing baseline failures (all reproduce on clean ``origin/main``; none in the touched code path). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-19 20:30:44 -07:00
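The preserve-dots decision described in the commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in ``run_agent.py``: the names ``should_preserve_dots`` and ``normalize_model_name_sketch`` are hypothetical, and the allowlist here contains only ``bedrock`` because the exact identifiers of the other providers are not given in the message.

```python
from typing import Optional

# Hypothetical sketch; the real allowlist also covers other providers.
PRESERVE_DOTS_PROVIDERS = {"bedrock"}
# Base-URL fragment used as a second, defense-in-depth signal.
PRESERVE_DOTS_URL_HINTS = ("bedrock-runtime.",)


def should_preserve_dots(provider: Optional[str], base_url: Optional[str]) -> bool:
    """Return True when dots in the model ID are structural and must be kept."""
    if provider and provider.lower() in PRESERVE_DOTS_PROVIDERS:
        return True
    url = (base_url or "").lower()
    return any(hint in url for hint in PRESERVE_DOTS_URL_HINTS)


def normalize_model_name_sketch(model: str, preserve_dots: bool) -> str:
    """Convert dots to hyphens unless the caller asked to preserve them."""
    return model if preserve_dots else model.replace(".", "-")


if __name__ == "__main__":
    model = "global.anthropic.claude-opus-4-7"
    flag = should_preserve_dots("bedrock", None)
    print(normalize_model_name_sketch(model, flag))
```

Checking both the provider name and the endpoint means a config that reaches Bedrock through a custom base URL still keeps its dotted ID.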
kshitijk4poor	50d6799389	fix: propagate kimi base-url temperature overrides Follow up salvaged PR #12668 by threading base_url through the remaining direct-call sites so kimi-k2.5 uses temperature=1.0 on api.moonshot.ai and keeps 0.6 on api.kimi.com/coding. Add focused regression tests for run_agent, trajectory_compressor, and mini_swe_runner.	2026-04-19 18:54:35 -07:00
kshitijk4poor	d393104bad	fix(gemini): tighten native routing and streaming replay - only use the native adapter for the canonical Gemini native endpoint - keep custom and /openai base URLs on the OpenAI-compatible path - preserve Hermes keepalive transport injection for native Gemini clients - stabilize streaming tool-call replay across repeated SSE events - add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks	2026-04-19 12:40:08 -07:00
kshitijk4poor	3dea497b20	feat(providers): route gemini through the native AI Studio API - add a native Gemini adapter over generateContent/streamGenerateContent - switch the built-in gemini provider off the OpenAI-compatible endpoint - preserve thought signatures and native functionResponse replay - route auxiliary Gemini clients through the same adapter - add focused unit coverage plus native-provider integration checks	2026-04-19 12:40:08 -07:00
Teknium	cca3278079	fix(codex): pin correct Cloudflare headers and extend to auxiliary client The cherry-picked salvage (admin28980's commit) added codex headers only on the primary chat client path, with two inaccuracies: - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs, codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on the list, so the header had no mitigating effect on the 403 (the account-id header alone may have been carrying the fix). - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID). Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex branch) constructs OpenAI clients against the same chatgpt.com endpoint with no default headers at all — so compression, title generation, vision, session search, and web_extract all still 403 from VPS IPs. Consolidate the header set into _codex_cloudflare_headers() in agent/auxiliary_client.py (natural home next to _read_codex_access_token and the existing JWT decode logic) and call it from all four insertion points: - run_agent.py: AIAgent.__init__ (initial construction) - run_agent.py: _apply_client_headers_for_base_url (credential rotation) - agent/auxiliary_client.py: _try_codex (aux client) - agent/auxiliary_client.py: resolve_provider_client raw_codex branch Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to match the codex-rs shape while keeping product attribution. 
Tests in tests/agent/test_codex_cloudflare_headers.py cover: - originator value, User-Agent shape, canonical header casing - account-ID extraction from a real JWT fixture - graceful handling of malformed / non-string / claim-missing tokens - wiring at all four insertion points (primary init, rotation, both aux paths) - non-chatgpt base URLs (openrouter) do NOT get codex headers - switching away from chatgpt.com drops the headers	2026-04-19 11:59:25 -07:00
admin28980	4d0846b640	Fix Cloudflare 403s for openai-codex provider on server IPs Add ChatGPT-Account-Id and originator headers when using chatgpt.com backend-api endpoint. Matches official codex-rs CLI behavior to prevent Cloudflare JavaScript challenges on non-residential IPs (VPS, Mac Mini, always-on servers). Applied in AIAgent.__init__ and _update_base_url_headers to cover both initial setup and credential rotation paths.	2026-04-19 11:59:25 -07:00
zrc	023208b17a	fix(agent): respect HTTP_PROXY/HTTPS_PROXY when using custom httpx transport When creating httpx.Client with a custom transport for TCP keepalive, proxy environment variables (HTTP_PROXY, HTTPS_PROXY) were ignored because httpx only auto-reads them when transport=None. Add _get_proxy_from_env() to explicitly read proxy settings and pass them to httpx.Client, ensuring providers like kimi-coding-cn work correctly when behind a proxy. Fixes connection errors when HTTP_PROXY/HTTPS_PROXY are set.	2026-04-19 11:44:43 -07:00
Teknium	c11ab6f64d	feat(providers): enforce request_timeout_seconds on OpenAI-wire primary calls Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the initial wiring was insufficient: run_agent.py was overriding the client-level timeout on every call via hardcoded per-request kwargs. Root cause: run_agent.py had two sites that pass an explicit timeout= kwarg into chat.completions.create() — api_kwargs['timeout'] at line 7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's _httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line 5760. Both override the per-provider config value the client was constructed with, so a 0.5s config timeout would silently not enforce. This commit: - Adds AIAgent._resolved_api_call_timeout() — config > HERMES_API_TIMEOUT env > 1800s default. - Uses it for the non-streaming api_kwargs['timeout'] field. - Uses it for the streaming path's httpx.Timeout(connect, read, write, pool) so both connect and read respect the configured value when set. Local-provider auto-bump (Ollama/vLLM cold-start) only applies when no explicit config value is set. - New test: test_resolved_api_call_timeout_priority covers all three precedence cases (config, env, default). Live verified: 0.5s config on claude-sonnet-4.6 now triggers APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total (was: 29-47s success with timeout ignored). Positive case (60s config + gpt-4o-mini) still succeeds at 1.3s.	2026-04-19 11:23:00 -07:00
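The precedence stated in the commit above (config, then the HERMES_API_TIMEOUT environment variable, then the 1800s default) could look something like the sketch below. The function name ``resolve_api_call_timeout`` and the rule that non-positive or unparsable values are skipped are assumptions for illustration; the actual ``AIAgent._resolved_api_call_timeout()`` may validate differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_API_TIMEOUT = 1800.0  # default named in the commit message


def resolve_api_call_timeout(
    config_timeout: Optional[float],
    env: Optional[Mapping[str, str]] = None,
) -> float:
    """Pick a timeout: explicit config > HERMES_API_TIMEOUT env > default."""
    env = os.environ if env is None else env

    # 1. A usable per-provider/per-model config value wins outright.
    is_number = isinstance(config_timeout, (int, float)) and not isinstance(config_timeout, bool)
    if is_number and config_timeout > 0:
        return float(config_timeout)

    # 2. Otherwise fall back to the environment variable, if it parses.
    raw = env.get("HERMES_API_TIMEOUT")
    if raw:
        try:
            value = float(raw)
        except ValueError:
            value = 0.0
        if value > 0:
            return value

    # 3. Nothing configured: use the built-in default.
    return DEFAULT_API_TIMEOUT
```

Passing ``env`` explicitly keeps the helper easy to test without mutating the process environment.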
Teknium	f1fe29d1c3	feat(providers): extend request_timeout_seconds to all client paths Follow-up on top of mvanhorn's cherry-picked commit. Original PR only wired request_timeout_seconds into the explicit-creds OpenAI branch at run_agent.py init; router-based implicit auth, native Anthropic, and the fallback chain were still hardcoded to SDK defaults. - agent/anthropic_adapter.py: build_anthropic_client() accepts an optional timeout kwarg (default 900s preserved when unset/invalid). - run_agent.py: resolve per-provider/per-model timeout once at init; apply to Anthropic native init + post-refresh rebuild + stale/interrupt rebuilds + switch_model + _restore_primary_runtime + the OpenAI implicit-auth path + _try_activate_fallback (with immediate client rebuild so the first fallback request carries the configured timeout). - tests: cover anthropic adapter kwarg honoring; widen mock signatures to accept the new timeout kwarg. - docs/example: clarify that the knob now applies to every transport, the fallback chain, and rebuilds after credential rotation.	2026-04-19 11:23:00 -07:00
Matt Van Horn	3143d32330	feat(providers): add per-provider and per-model request_timeout_seconds config Adds optional providers.<id>.request_timeout_seconds and providers.<id>.models.<model>.timeout_seconds config, resolved via a new hermes_cli/timeouts.py helper and applied where client_kwargs is built in run_agent.py. Zero default behavior change: when both keys are unset, the openai SDK default takes over. Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py for auxiliary tasks - the primary turn path just never got the equivalent knob. Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for exactly this config - specifically calls out Ollama cold-start hanging the client.	2026-04-19 11:23:00 -07:00
kshitijk4poor	7bd1a3a4b1	test(compression): cover real init feasibility override	2026-04-19 10:40:26 -07:00
kshitijk4poor	045b28733e	fix(compression): resolve missing config attribute in feasibility check Commit `4a9c3565` added a reference to `self.config` in `_check_compression_model_feasibility()` to pass the user-configured `auxiliary.compression.context_length` to `get_model_context_length()`. However, `AIAgent` never stores the loaded config dict as an instance attribute — the config is loaded into a local variable `_agent_cfg` in `__init__()` and discarded after init. This causes an `AttributeError: 'AIAgent' object has no attribute 'config'` on every session start when compression is enabled, caught by the try/except and logged as a non-fatal DEBUG message. Fix: store the loaded config as `self._config` in `__init__()` and update the reference in the feasibility check to use `self._config`.	2026-04-19 10:40:26 -07:00
kshitijk4poor	175cf7e6bb	fix: tighten quiet-mode salvage follow-ups Follow-up for the helix4u easy-fix salvage batch: - route remaining context-engine quiet-mode output through _should_emit_quiet_tool_messages() so non-CLI/library callers stay silent consistently - drop the extra senderAliases computation from WhatsApp allowlist-drop logging and remove the now-unused import This keeps the batch scoped to the intended fixes while avoiding leaked quiet-mode output and unnecessary duplicate work in the bridge.	2026-04-19 00:28:25 -07:00
helix4u	cd59af17cc	fix(agent): silence quiet_mode in python library use	2026-04-19 00:28:25 -07:00
helix4u	7b1a11b971	fix(memory): keep Honcho provider opt-in	2026-04-18 22:50:55 -07:00
Tranquil-Flow	ec48ec5530	fix(agent): strip <think> blocks from stored assistant content Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression. _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant. Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed. Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction. 3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip. Resolves #8878 and #9568. Originally proposed as PR #9250.	2026-04-18 19:19:24 -07:00
Teknium	9489d1577d	fix(agent): strip unterminated <think> blocks from visible content Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field. _strip_think_blocks()'s closed-pair regex is non-greedy so it only matches complete blocks — any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408). Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close. Everything from that tag to end of string is stripped. The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped. Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly — previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it. 6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs. The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom. This commit does not include the 💭/🧠 emoji regexes from that PR — those glyphs are Hermes CLI display decorations, not model content markers.	2026-04-18 19:19:24 -07:00
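The two-pass stripping described in the commit above (a case-insensitive closed-pair pass, then an unterminated-tag pass that only fires at a block boundary) can be sketched like this. It is a simplified illustration, not ``_strip_think_blocks()`` itself: the function name, the tag list, and the final ``strip()`` are assumptions.

```python
import re

# Assumed tag names for the example; the real list may differ.
_TAGS = r"(?:think|thinking|thought)"

# Pass 1: complete <tag>...</tag> blocks, non-greedy, any letter case.
_CLOSED_BLOCK = re.compile(rf"<{_TAGS}>.*?</{_TAGS}>", re.IGNORECASE | re.DOTALL)

# Pass 2: an open tag at the start of the text or of a line with no close;
# everything from that tag to the end of the string is dropped.
_UNTERMINATED = re.compile(
    rf"^[ \t]*<{_TAGS}>.*\Z",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def strip_think_blocks_sketch(text: str) -> str:
    """Remove closed reasoning blocks, then any orphan block at a line start."""
    text = _CLOSED_BLOCK.sub("", text)
    text = _UNTERMINATED.sub("", text)
    return text.strip()
```

Because the second pattern anchors on a line start, a sentence that merely mentions ``<think>`` in running prose is left alone.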
Teknium	1e5f0439d9	docs: update Anthropic console URLs to platform.claude.com Anthropic migrated their developer console from console.anthropic.com to platform.claude.com. Two user-facing display URLs were still pointing to the old domain: - hermes_cli/main.py — API key prompt in the Anthropic model flow - run_agent.py — 401 troubleshooting output The OAuth token refresh endpoint was already migrated in PR #3246 (with fallback). Spotted by @LucidPaths in PR #3237. (Salvage of #3758 — dropped the setup.py hunk since that section was refactored away and no longer contains the stale URL.)	2026-04-18 18:55:58 -07:00
helix4u	ca32a2a60b	fix(gemini): restore bearer auth on openai route	2026-04-18 12:52:01 -07:00
LVT382009	f7af90e2da	fix: wire _ephemeral_max_output_tokens into chat_completions and add NVIDIA NIM default Based on #12152 by @LVT382009. Two fixes to run_agent.py: 1. _ephemeral_max_output_tokens consumption in chat_completions path: The error-recovery ephemeral override was only consumed in the anthropic_messages branch of _build_api_kwargs. All chat_completions providers (OpenRouter, NVIDIA NIM, Qwen, Alibaba, custom, etc.) silently ignored it. Now consumed at highest priority, matching the anthropic pattern. 2. NVIDIA NIM max_tokens default (16384): NVIDIA NIM falls back to a very low internal default when max_tokens is omitted, causing models like GLM-4.7 to truncate immediately (thinking tokens exhaust the budget before the response starts). 3. Progressive length-continuation boost: When finish_reason='length' triggers a continuation retry, the output budget now grows progressively (2x base on retry 1, 3x on retry 2, capped at 32768) via _ephemeral_max_output_tokens. Previously the retry loop just re-sent the same token limit on all 3 attempts.	2026-04-18 12:51:30 -07:00
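The progressive length-continuation boost in point 3 above amounts to a small piece of arithmetic, sketched here. The function name ``boosted_max_tokens`` and the 1-based ``retry`` convention are assumptions for the example; only the multipliers (2x on retry 1, 3x on retry 2) and the 32768 cap come from the commit message.

```python
MAX_OUTPUT_TOKENS_CAP = 32768  # cap named in the commit message


def boosted_max_tokens(base: int, retry: int) -> int:
    """Output budget for a continuation retry (retry is 1-based).

    retry 1 -> 2x base, retry 2 -> 3x base, never above the cap.
    A retry below 1 means no boost is applied.
    """
    multiplier = retry + 1 if retry >= 1 else 1
    return min(base * multiplier, MAX_OUTPUT_TOKENS_CAP)


if __name__ == "__main__":
    for attempt in (1, 2):
        print(attempt, boosted_max_tokens(4096, attempt))
```

With the NVIDIA NIM default of 16384 from the same commit, the first retry already reaches the cap, so later retries stay at 32768.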
jarvischer	0f778f7768	fix: prevent tool name duplication in streaming accumulator (MiniMax/NVIDIA NIM) Based on #11984 by @maxchernin. Fixes #8259. Some providers (MiniMax M2.7 via NVIDIA NIM) resend the full function name in every streaming chunk instead of only the first. The old accumulator used += which concatenated them into 'read_fileread_file'. Changed to simple assignment (=), matching the OpenAI Node SDK, LiteLLM, and Vercel AI SDK patterns. Function names are atomic identifiers delivered complete — no provider splits them across chunks, so concatenation was never correct semantics.	2026-04-18 12:50:32 -07:00
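The difference between assigning and concatenating the function name can be shown with a small accumulator. This is a sketch under assumptions: the delta dictionaries are a simplified, hypothetical shape rather than the SDK's streaming objects, and ``accumulate_tool_call`` is not a function from the repository.

```python
from typing import Dict, Iterable, Optional


def accumulate_tool_call(deltas: Iterable[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Merge streamed tool-call deltas into one call.

    Names are assigned (they arrive whole, possibly repeated in every chunk);
    argument text is appended (it arrives in fragments).
    """
    call = {"name": "", "arguments": ""}
    for delta in deltas:
        name = delta.get("name")
        if name:
            call["name"] = name  # '=' not '+=': avoids 'read_fileread_file'
        call["arguments"] += delta.get("arguments") or ""
    return call


if __name__ == "__main__":
    chunks = [
        {"name": "read_file", "arguments": '{"pa'},
        {"name": "read_file", "arguments": 'th": "a.txt"}'},
    ]
    print(accumulate_tool_call(chunks))
```

A provider that sends the name only in the first chunk produces the same result, so assignment is safe for both behaviours.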
Teknium	2edebedc9e	feat(steer): /steer <prompt> injects a mid-run note after the next tool call (#12116) * feat(steer): /steer <prompt> injects a mid-run note after the next tool call Adds a new slash command that sits between /queue (turn boundary) and interrupt. /steer <text> stashes the message on the running agent and the agent loop appends it to the LAST tool result's content once the current tool batch finishes. The model sees it as part of the tool output on its next iteration. No interrupt is fired, no new user turn is inserted, and no prompt cache invalidation happens beyond the normal per-turn tool-result churn. Message-role alternation is preserved — we only modify an existing role:"tool" message's content. Wiring ------ - hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS. - run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(), _apply_pending_steer_to_tool_results(); drain at end of both parallel and sequential tool executors; clear on interrupt; return leftover as result['pending_steer'] if the agent exits before another tool batch. - cli.py: /steer handler — route to agent.steer() when running, fall back to the regular queue otherwise; deliver result['pending_steer'] as next turn. - gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent path strips the prefix and forwards as a regular user message. - tui_gateway/server.py: new session.steer JSON-RPC method. - ui-tui: SessionSteerResponse type + local /steer slash command that calls session.steer when ui.busy, otherwise enqueues for the next turn. Fallbacks --------- - Agent exits mid-steer → surfaces in run_conversation result as pending_steer so CLI/gateway deliver it as the next user turn instead of silently dropping it. - All tools skipped after interrupt → re-stashes pending_steer for the caller. - No active agent → /steer reduces to sending the text as a normal message.
Tests ----- - tests/run_agent/test_steer.py — accept/reject, concatenation, drain, last-tool-result injection, multimodal list content, thread safety, cleared-on-interrupt, registry membership, bypass-set membership. - tests/gateway/test_steer_command.py — running agent, pending sentinel, missing steer() method, rejected payload, empty payload. - tests/gateway/test_command_bypass_active_session.py — /steer bypasses the Level-1 base adapter guard. - tests/test_tui_gateway_server.py — session.steer RPC paths. 72/72 targeted tests pass under scripts/run_tests.sh. * feat(steer): register /steer in Discord's native slash tree Discord's app_commands tree is a curated subset of slash commands (not derived from COMMAND_REGISTRY like Telegram/Slack). /steer already works there as plain text (routes through handle_message → base adapter bypass → runner), but registering it here adds Discord's native autocomplete + argument hint UI so users can discover and type it like any other first-class command.	2026-04-18 04:17:18 -07:00
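The injection step described in the /steer commit above (append the note to the last role:"tool" message rather than adding a new turn) might be sketched as follows. The function name, the ``[user steer]`` prefix, and the text-part shape used for list content are all assumptions for illustration; the real ``_apply_pending_steer_to_tool_results()`` may format the note differently.

```python
from typing import Any, Dict, List


def apply_steer_to_tool_results(messages: List[Dict[str, Any]], steer_text: str) -> bool:
    """Append steer_text to the most recent tool message; return True if applied.

    Only an existing role:'tool' message is modified, so message-role
    alternation is unchanged.
    """
    note = f"\n\n[user steer] {steer_text}"  # hypothetical note format
    for msg in reversed(messages):
        if msg.get("role") != "tool":
            continue
        content = msg.get("content")
        if isinstance(content, list):
            # Multimodal content: add the note as an extra text part.
            content.append({"type": "text", "text": note})
        else:
            msg["content"] = (content or "") + note
        return True
    # No tool result in this batch: the caller keeps the steer pending.
    return False
```

Returning ``False`` when no tool message exists lets the caller surface the text later, matching the fallback behaviour the commit describes.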
Teknium	8322b42c6c	fix(streaming): surface dropped tool-call on mid-stream stall (#12072) When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal. Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown. Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content — and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning). Validation:

| Scenario | Before | After |
| --- | --- | --- |
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream | Bare recovered text | Unchanged |
| tests/run_agent/test_streaming.py | 24 passed | 26 passed (2 new) |

	2026-04-18 01:52:06 -07:00
AviArora02-commits	994faacce8	fix: suppress Authorization: Bearer for Gemini provider to prevent HTTP 400 (#7893)	2026-04-17 21:30:17 -07:00
Teknium	20f2258f34	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) * fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace interrupt() previously only flagged the agent's _execution_thread_id. Tools running inside _execute_tool_calls_concurrent execute on ThreadPoolExecutor worker threads whose tids are distinct from the agent's, so is_interrupted() inside those tools returned False no matter how many times the gateway called .interrupt() — hung ssh / curl / long make-builds ran to their own timeout. Changes: - run_agent.py: track concurrent-tool worker tids in a per-agent set, fan interrupt()/clear_interrupt() out to them, and handle the register-after-interrupt race at _run_tool entry. getattr fallback for the tracker so test stubs built via object.__new__ keep working. - tools/environments/base.py: opt-in _wait_for_process trace (ENTER, per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1. - tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target tid, set snapshot) behind the same env flag. - tests: new regression test runs a polling tool on a concurrent worker and asserts is_interrupted() flips to True within ~1s of interrupt(). Second new test guards clear_interrupt() clearing tracked worker bits. Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env subset 216 pass. * fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when quiet_mode=True (the CLI default). This would silently swallow every INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation added in the parent commit — confirmed by running hermes chat -q with the flag and finding zero trace lines in agent.log even though _wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets its own logger level to INFO at import time, overriding the 'tools' parent-level filter. Scoped to the opt-in case only, so production (quiet_mode default) logs stay quiet as designed. Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes '_wait_for_process ENTER/EXIT' lines to agent.log as expected. * fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses Tool subprocesses spawned by the local environment backend use os.setsid so they run in their own process group. Before this fix, SIGTERM/SIGHUP to the hermes CLI killed the main thread via KeyboardInterrupt but the worker thread running _wait_for_process never got a chance to call _kill_process — Python exited, the child was reparented to init (PPID=1), and the subprocess ran to its natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM to the agent until manual cleanup). Changes: - cli.py _signal_handler (interactive) + _signal_handler_q (-q mode): route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll loop sees the per-thread interrupt flag and calls _kill_process (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default 1.5s) gives the worker time to complete its SIGTERM+SIGKILL escalation before KeyboardInterrupt unwinds main. - tools/environments/base.py _wait_for_process: wrap the poll loop in try/except (KeyboardInterrupt, SystemExit) so the cleanup fires even on paths the signal handlers don't cover (direct sys.exit, unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace line when HERMES_DEBUG_INTERRUPT=1. - New regression test: injects KeyboardInterrupt into a running _wait_for_process via PyThreadState_SetAsyncExc, verifies the subprocess process group is dead within 3s of the exception and that KeyboardInterrupt re-raises cleanly afterward. 
Validation:

| Before | After |
| --- | --- |
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |

	2026-04-17 20:39:25 -07:00
helix4u	016ae5c334	fix(kimi): force 0.6 on main chat path	2026-04-17 18:47:01 -07:00
Brooklyn Nicholson	aa583cb14e	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 17:51:40 -05:00


703 commits