hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
Teknium	0d137f1039	feat(errors): actionable guidance for Nous OAuth 401s (#32082 ) Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path), but the non-retryable-401 guidance branch only covered openai-codex and xai-oauth. A Nous 401 fell through to the generic 'Your API key was rejected... run hermes setup' message, which is wrong advice — the user needs hermes auth add nous --type oauth, not an API key. Also flag the case where the failing model slug ends in :free (OpenRouter syntax) while provider is nous. Without that hint, users re-OAuth successfully and then hit the same 401 on the next message because Nous Portal doesn't carry the OpenRouter free-tier slug. Reported by ashh — debug dump showed Nous device_code exhausted + deepseek/deepseek-v4-flash:free as the model.	2026-05-25 06:06:51 -07:00
daimon-nous[bot]	ac5359a3f3	fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998 ) (#32012 ) * fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) When a stream stalls mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery used finish_reason='stop' which caused the conversation loop to treat the turn as complete, returning only the warning text. When users said 'continue', the model retried the same large tool call, hit the same stale timeout, and looped indefinitely. Changes: - chat_completion_helpers.py: change _stub_finish_reason from 'stop' to 'length' for mid-tool-call partials. The stub still has tool_calls=None so no tool auto-executes — the model gets a fresh API call through the existing length-continuation machinery (bounded to 3 retries). Also attach _dropped_tool_names to the stub for downstream use. - conversation_loop.py: add a third continuation prompt branch for partial-stream-stubs with dropped tool calls. Instead of the generic 'continue where you left off' (which would retry the same large call), tell the model to break the output into smaller tool calls (~8K tokens each) to avoid stream timeouts. - test_partial_stream_finish_reason.py: update existing test from finish_reason='stop' to 'length', add _dropped_tool_names assertion, add new test_dropped_tool_call_uses_chunking_prompt for the 3-way prompt branching. Safety: tool_calls=None is preserved on the stub, so the conversation loop enters the text-continuation branch (line 1513), NOT the tool-call execution branch (line 3246). No tool auto-executes. The model simply gets another API call with targeted guidance. * refactor: extract constants and continuation prompt helper - Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH) - Extract _get_continuation_prompt() in conversation_loop.py — DRYs the 3-way prompt branching and lets tests import the real function - Trim verbose inline comments in chat_completion_helpers.py - Tests import constants + helper instead of duplicating logic --------- Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-05-25 17:43:10 +05:30
Teknium	11c40d6a42	test+polish(compression): pin anti-thrash gate and gateway session_id persistence Follow-up to @someaka's fix. Polish: - Drop the redundant `_preflight_tokens >= threshold_tokens` clause. `should_compress(tokens)` already short-circuits when tokens < threshold, so the explicit comparison was dead code on the True branch. Tests: - Preflight: pin that should_compress() is called (anti-thrash has a vote). Mocks should_compress to return False even with tokens past the raw threshold and asserts no compression runs — exact bug shape from #29335. - Gateway: AST scan of gateway/run.py asserts every `session_entry.session_id = ...` assignment is followed by a `session_store._save()` call within the same block. Three sites mutate the session_id after compression; all three must persist or the next turn loads the pre-compression transcript and re-loops. Empirically verified the test catches the bug (drops the new _save() line → red). AUTHOR_MAP: - Map ed@bebop.crew -> someaka so the salvaged commit resolves to @someaka in release notes.	2026-05-25 01:44:46 -07:00
Radical Edward	3914089d52	fix(compression): 3-line fix for infinite compression loop (#29335 ) Three compounding root causes: A) run_conversation() result dict missing session_id — gateway's dead-code guard at gateway/run.py:8700 never triggers B) preflight compression bypasses should_compress() anti-thrashing — re-triggers every turn when tool schemas dominate token budget C) gateway updates session_entry.session_id in memory but doesn't persist via session_store._save() Fixes: #29335	2026-05-25 01:44:46 -07:00
teknium1	af144cd60d	fix(model): include Premium+ in xAI OAuth label X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription' wording suggested SuperGrok was the only entitlement path. Updated to 'SuperGrok / Premium+' across the picker label, setup wizard, auth flows, and docs so Premium+ subscribers know the row applies to them too.	2026-05-24 18:12:16 -07:00
Teknium	8065e70274	fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443 ) Closes #31273. HTTP 402 (insufficient credits) was retried up to agent.api_max_retries times (default 3), burning paid requests against an exhausted balance. Real-world impact: ~$40 in 48h on a 24/7 Telegram+Discord gateway. Root cause: FailoverReason.billing was in the is_client_error exclusion set in agent/conversation_loop.py, which prevents the non-retryable-abort branch from firing. By the time control reaches that predicate: * credential-pool rotation has already run for billing and either continued the loop or returned False (pool exhausted/absent) * the eager-fallback branch has also fired on billing and either continued the loop or fell through (no fallback configured) Falling through to the backoff retry from here has no recovery mechanism left — it just burns more paid requests. Removing billing from the exclusion set makes 402 abort cleanly once pool+fallback recovery has failed, mirroring how 401/403 (also should_fallback=True) already behave. Added tests/run_agent/test_31273_402_not_retried.py which mirrors the is_client_error predicate shape from the source and asserts the invariant (plus a source-inspection guard against accidental re-introduction).	2026-05-24 15:14:13 -07:00
annguyenNous	38b8d0da85	fix: emit guardrail halt message to client before closing stream When the tool loop guardrail fires (max_tool_failures, etc.), the turn exits with guardrail_halt but no final assistant message was emitted to the client. The SSE stream closed silently — indistinguishable from a crash. The stream_delta_callback(None) before tool execution is a display flush, not a hard close. After generating the halt response, emit it through both _safe_print (CLI) and stream_delta_callback (SSE) so clients see the explanation. Fixes #30770	2026-05-24 07:38:24 -07:00
xxxigm	20b3703a42	fix(conversation-loop): tailor length-continuation prompt for partial stream The length-continue path's user-facing vprint and continuation prompt both told the model "your response was truncated by the output length limit." That's a lie when the stub came from a partial-stream network error (issue #30963) — and a lie the model can detect, leading to "I wasn't truncated, I'm done" no-op responses that defeat the continuation entirely. Detect the partial-stream-stub via response.id and swap in: - vprint: "Stream interrupted by network error (finish_reason='length' on partial-stream-stub)" - prompt: "[System: The previous response was cut off by a network error mid-stream. Continue exactly where you left off. Do not restart or repeat prior text. Finish the answer directly.]" Real length truncations still see the original "truncated by output length limit" prompt — the model needs to know which class of failure it's recovering from. Same length_continue_retries=3 budget, truncated_response_parts merging, and final-response stitching infrastructure on both branches. Refs: NousResearch/hermes-agent#30963	2026-05-24 04:35:15 -07:00
teknium1	b9f533af0a	test(gateway): regression for plugin-transformed response after streaming Adds a test that fails without the gateway fix, exercising the response_transformed=True branch in _finalize_response: a streamed response whose final text was modified by a transform_llm_output plugin hook must be edit_message'd in place (not duplicate-sent), with already_sent=True so the normal final-send is skipped. Also drops two minor leftovers from the salvaged PR #29119: * accumulated_text property on GatewayStreamConsumer (unused) * duplicate _response_transformed=False inside the hook try block	2026-05-24 04:31:13 -07:00
kenyonxu	8edeebe6d7	fix: propagate response_transformed flag — plugin hook output survives streaming suppression When a transform_llm_output hook modifies final_response after streaming, the gateway was silently discarding the transformed content because streamed=True / content_delivered=True triggered the final-send suppression. Three changes: 1. conversation_loop: set `_response_transformed=True` when a transform_llm_output hook returns a non-empty string, and expose it as `response_transformed` in the result dict. 2. gateway/run: skip the final-send suppression when `response_transformed` is True — the transformed response must reach the client even if streaming already sent the original text. 3. acp_adapter/server: remove `not streamed_message` guard so final_response is always delivered (ACP path fixed separately).	2026-05-24 04:31:13 -07:00
0z1-ghb	dcbcdd6526	fix(compressor): propagate api_mode and fix root logger calls - Add api_mode to 4 update_model() call sites: - conversation_loop.py: long_context failover and probe stepping - agent_runtime_helpers.py: rollback restore (also saves compressor_api_mode) - chat_completion_helpers.py: fallback activation - Fix 31 root-logger calls across 5 files (logging.warning/error/info -> logger.warning/error/info) to respect module-level log filtering	2026-05-23 17:38:19 -07:00
Teknium	c769be344a	fix(agent): recover from providers rejecting list-type tool content (#27344 ) (#30259 ) Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible servers) follow the OpenAI spec strictly and require tool message `content` to be a string — they reject our list-type content (text + image_url parts) with HTTP 400 'text is not set' / 'tool message content must be a string'. Instead of an allowlist of known-good providers (maintenance burden, guaranteed to miss aggregators like OpenRouter where the underlying model determines support, not the aggregator name), this lands a reactive recovery: 1. New `FailoverReason.multimodal_tool_content_unsupported` with a small pattern list covering the common 400 wordings. 2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API message list, downgrades any `role:tool` message whose content is list-with-image to a plain text summary (preserves text parts) in place, AND records the active (provider, model) in a session-scoped `_no_list_tool_content_models` set. 3. `_tool_result_content_for_active_model` short-circuits to a text summary when (provider, model) is in the cache — so after the first 400 + retry, subsequent screenshots in the same session skip the round trip entirely. 4. Retry hook in `agent.conversation_loop` mirrors the existing `image_too_large` recovery: detect the reason, run the helper, retry once, fall through to the normal error path if no list-type tool content was actually present. Cache is transient (per-session) by design — next session retries in case the provider added support, no persistent state to maintain. Fixes #27344. Closes #27351 (allowlist approach superseded by reactive recovery).	2026-05-21 23:40:16 -07:00
helix4u	f6f25b9449	fix(agent): fail fast on small Ollama runtime context	2026-05-21 23:25:01 -07:00
yoniebans	ce26785187	refactor(session-log): delete _save_session_log and all callers state.db now stores every message field the JSON snapshot stored. Removed the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O. Body is in git history if it ever needs to come back.	2026-05-20 11:44:10 -07:00
AceWattGit	25e0f4d465	fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.	2026-05-18 20:04:57 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
teknium1	4a3f13b47b	perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR #20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes #20451 (date-only), #18547 (prefix stabilization), #8689 (stabilize timestamp across compression), #15866 (timestamp caching question), #8687 (compression timestamp), #27339 (claim #3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>	2026-05-17 23:20:37 -07:00
teknium1	df22d29522	fix(copilot): GitHub Models 413 hint — port to extracted conversation_loop Original commits `4ded3ede3` (@konsisumer) + `374dc81c2` (Teknium) added a 413 hint to run_agent.py's agent loop. Final-state version (the sharpened `374dc81c2` wording) ported to agent/conversation_loop.py, where the payload_too_large branch now lives. The deprecation detection + _URL_TO_PROVIDER changes from both commits landed in agent/copilot_acp_client.py and agent/model_metadata.py via the prior merge. Closes #10648 Co-authored-by: konsisumer <der@konsi.org> Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-16 23:38:45 -07:00
teknium1	b07524e53a	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider — port to extracted modules Original commit `b62c99797` by Jaaneek targeted six locations in pre-refactor run_agent.py. Re-applied to the extracted post-PR locations: - api_mode dispatch → agent/agent_init.py - is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py - codex_auth_retry block + 401 hint → agent/conversation_loop.py - _try_refresh_codex_client_credentials body → run_agent.py (kept) The non-run_agent.py portions of the commit (auxiliary_client, codex transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly from main via the prior merge commit. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-16 23:23:38 -07:00
teknium1	7d221aa1f2	fix(langfuse): complete observability fix — port to extracted conversation_loop Original commit `db84a78e6` by kshitij targeted run_conversation()'s pre_api_request and post_api_request hooks in pre-refactor run_agent.py. Re-applied to the extracted location in agent/conversation_loop.py. Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com> Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>	2026-05-16 23:21:51 -07:00
teknium1	a77ca9295e	perf(run_agent): accumulate length-continuation prefix via list+join Original commit `4f8aaf104` by InB4DevOps targeted run_conversation() in the pre-refactor run_agent.py. Re-applied to the extracted location in agent/conversation_loop.py. Co-authored-by: InB4DevOps <tolle.lege+github@gmail.com>	2026-05-16 23:20:27 -07:00
teknium1	0530252384	refactor(run_agent): extract run_conversation to agent/conversation_loop.py The 3,877-line run_conversation body — the agent loop itself — moves out of run_agent.py into a dedicated module. AIAgent.run_conversation is now a thin forwarder that delegates to agent.conversation_loop.run_conversation with the AIAgent instance as the first argument. This is the largest single extraction in the run_agent.py refactor. The body keeps all 163 self.X references intact (rewritten as agent.X), all nested closures, all retry/backoff/compression machinery. Symbols that tests or callers patch on run_agent (_set_interrupt, handle_function_call, AIAgent class attrs) are resolved through _ra() inside the extracted module so the patch surface is preserved. Five tests doing inspect.getsource(AIAgent.run_conversation) updated to scan agent.conversation_loop.run_conversation. Two source-introspection tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart) updated to accept either self.X (legacy) or agent.X (extracted form) in the matched assertions. Live E2E verified on three model paths: * openai/gpt-5.4 (OpenAI chat completions via OpenRouter) * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter) * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path) Plus read_file tool execution, terminal tool, web_search. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client::test_custom_endpoint... — same as on main). run_agent.py: 9800 -> 5944 lines (-3856). Total reduction since baseline: 16083 -> 5944 (-10139, 63%).	2026-05-16 19:26:52 -07:00

22 commits