hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Chaz Dinkle	1dde7e2f2a	fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on its own schedule, so by the time Hermes notices an expired token Claude Code may have already rotated it. Re-read live credential sources first and adopt a valid token rather than POSTing a possibly-stale refresh token. Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle) on top of the keychain/file reconciliation from PR #21112 (nodejun). Adds AUTHOR_MAP entry for nodejun.	2026-06-27 19:14:43 -07:00
jun	5a5396aecb	fix(anthropic): reconcile keychain/file credentials when one is expired read_claude_code_credentials() previously returned the macOS Keychain entry as soon as one existed, even if its OAuth token was already expired. Callers then ran is_claude_code_token_valid() on the result and got False, so resolve_anthropic_token() returned None — surfacing the misleading 'No Anthropic credentials found' error even when ~/.claude/.credentials.json held a perfectly valid token. Now reads both sources and prefers the non-expired one. When both are valid (or both expired), prefers the later expiresAt so any subsequent refresh uses the freshest refresh_token. Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation cases. The existing 'keychain wins' priority test still passes because both fixtures share the same expiresAt and the tiebreaker is >=.	2026-06-27 19:14:43 -07:00
linyubin	c946e6709f	fix(agent): activate fallback on persistent transport failures (#22277) Eager fallback previously fired only on rate_limit/billing. A stale-detector-killed hung stream classifies as FailoverReason.timeout (retryable=True) and the retry loop re-hit the same dead primary until the budget exhausted -- 3 x ~180-300s stale kills compounding into a 15+ min silent hang while the configured fallback chain sat idle. Extend the existing eager-fallback gate to also cover timeout and overloaded, but only after one real retry (retry_count >= 2) so genuine transient hiccups still recover on the primary. Reuses the same pool-recovery guard and state-reset as the rate_limit branch -- no new config flag, no change to the rate-limit intent. Salvaged from PR #50228 by @linyubin. Closes #22277. Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>	2026-06-27 19:12:21 -07:00
LeonSGP43	c56b39c11e	fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted _try_openrouter() returned (None, None) whenever an OpenRouter credential pool existed but was exhausted (_select_pool_entry -> (True, None)), making the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks (compression, vision, web_extract) silently failed even with a valid env key. Now the pool-present branch only returns early when it successfully builds a client; an exhausted pool falls through to the env-var path. The final failure (pool exhausted AND no env var) still marks the provider unhealthy. Fixes #23452. Co-authored-by: ambition0802 <noreply@github.com>	2026-06-27 19:09:27 -07:00
qWaitCrypto	46e18804ad	fix(auxiliary): fall back on 401 auth errors in auto mode (#21165) When the primary provider returns 401 and the auth-refresh path is unavailable or fails, both call_llm() and async_call_llm() reached the should_fallback gate without _is_auth_error in the condition, so the auxiliary task (e.g. compression) was dropped silently — losing message history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in both sync and async paths, plus an 'auth error' reason branch. Auth stays a non-capacity error: it falls back in auto mode via the is_auto gate, but on an explicitly-configured provider it still respects the user's choice and raises rather than silently switching providers.	2026-06-27 19:07:04 -07:00
Teknium	1a570dae00	fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901) The agent's image-rejection fallback strips images and retries text-only when a provider rejects image content, which is what lets the gateway drain its queued messages. The fallback only fires on a hardcoded phrase list, and the OpenRouter wording — HTTP 404 'No endpoints found that support image input' — was missing. For OpenRouter-routed non-vision models the fallback never fired, the retry loop re-sent the same rejected request until exhaustion, and every subsequent message (including plain text) stayed queued behind the stuck turn. Add the phrase to _IMAGE_REJECTION_PHRASES (the 404 already passes the 4xx gate). Add a positive test and a guard test so the sibling OpenRouter 'no endpoints ... data policy / guardrail' 404s do NOT get their images stripped. Fixes #21160. Reported by @liu14goal14-ux; PR #21198 by @ygd58.	2026-06-27 19:07:02 -07:00
konsisumer	8b4c29f0f0	fix(auth): preserve concurrently-added credentials on pool rewrite	2026-06-27 19:01:37 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
Teknium	3ac96d3308	fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827) On a MoA session, auxiliary tasks (title generation, compression, vision, …) ran through _resolve_auto with provider='moa' / model='<preset>', which sent the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client — producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible as the title-generation warning). MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the reference fan-out. _resolve_auto now resolves a 'moa' main provider to the preset's aggregator slot (its acting model) and continues Step 1 with that real provider+model, dropping the virtual moa://local base_url + placeholder key so the aggregator resolves via its own provider credentials. Mirrors the MoA context-length resolution. Verified live: a MoA turn no longer emits the 'not a valid model ID' warning. Test: tests/agent/test_auxiliary_main_first.py (19 pass).	2026-06-27 14:21:26 -07:00
Gille	e7bb67332d	fix(moa): preserve Codex slot routing	2026-06-27 14:20:51 -07:00
Gille	66aeda3550	fix(moa): keep virtual provider on MoA client	2026-06-27 14:20:51 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/redirect/check_output exempt) and POSIX-only-program-aware (launchctl/systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Teknium	3b44a3c8bb	feat(moa): show each reference model's output as a labelled block before the aggregator (#53793) When a MoA preset is selected, each reference model's answer now renders in the CLI as a thinking-style block labelled with its source model, BEFORE the aggregator responds — so the mixture-of-agents process is visible instead of a silent pause. The aggregator's response (and its tool actions) follow as normal. Mechanism (shared seam, all surfaces): - MoAChatCompletions/MoAClient take an optional reference_callback and emit 'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating' (aggregator label) once. agent_init wires this to the agent's tool_progress_callback, which every surface already consumes — so the events reach CLI/TUI/desktop/gateway with no new plumbing. - CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_preview), and 'moa.aggregating' as a spinner transition. Display-only; never touches message history (cache-safe). Turn-scoped reference cache: the agent loop calls the facade once per tool-loop iteration, but the advisory message view is identical across iterations within a turn, so references are now run AND displayed once per user turn (keyed by the advisory view's signature) instead of re-running/re-spamming on every iteration. This also cuts reference API cost from O(iterations) back to O(turns). Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs): reference blocks render once per turn, labelled by model, before the aggregator; fresh blocks on each new turn; aggregator tool actions still execute. Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive the events via tool_progress_callback; their surface-specific renderers are a separate change.	2026-06-27 12:45:23 -07:00
Teknium	227e6c0143	fix(moa): resolve context window from the aggregator, not the 256K default (#53780) A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is the virtual local endpoint, so get_model_context_length() missed every probe and fell through to the 256K fallback — even when the aggregator is a 1M-context model. The acting model in MoA IS the aggregator, so resolve the context window from the aggregator slot's real provider+model. - model_metadata.get_model_context_length: when provider=='moa', resolve the preset's aggregator slot through resolve_runtime_provider and recurse with the aggregator's real provider/model/base_url. Explicit model.context_length still wins (checked first); falls through to the generic default if resolution fails. Tests: opus-gpt preset now reports 1M (the aggregator window), config override still honored.	2026-06-27 12:08:09 -07:00
konsisumer	1b6ebb24c0	fix(agent): validate OpenRouter provider sort before request dispatch	2026-06-27 11:43:08 -07:00
Teknium	190e1ffac9	fix(redact): mask passwords in lowercase/dotted config keys (#53590) The secret redactor only matched uppercase env-style keys ([A-Z0-9_]), so config-file assignments like spring.datasource.password=secret, app.api.key=xyz, and YAML password: secret leaked verbatim when the agent ran cat/grep on application.properties or .env files (issue #16413). Adds three case-insensitive config-key matchers that run only in a config-file context, preserving the existing #4367 (lowercase code/prose) and web-URL-passthrough carve-outs: - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export) - _YAML_ASSIGN_RE: unquoted colon config (password: value) Value capture stops at whitespace and '&' so form bodies stay pair-wise; the '://' guard keeps intentional web-URL query-param passthrough intact. Reported-by: Murtaza1211	2026-06-27 04:43:28 -07:00
Teknium	02b32e2d7c	fix(moa): call reference + aggregator models through their provider's real route (#53580) MoA was calling reference and aggregator models through a bare call_llm(provider=slot["provider"], model=slot["model"]) with a forced temperature and a forced max_tokens (the preset's hardcoded 4096). That left base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed the API surface instead of using the provider's real runtime, and the 4096 cap truncated long aggregator syntheses. A MoA slot is just a model selection and must be called the same way any model is called elsewhere. Each slot is now resolved through resolve_runtime_provider (the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and delegate_task all use) via a new _slot_runtime() helper, and the resolved endpoint is passed into call_llm. So a reference/aggregator gets its provider's actual API surface — MiniMax → anthropic_messages, GPT-5/o-series → max_completion_tokens, custom endpoints → their base_url — identical to how that model is handled as the acting model. MoA also no longer imposes its own output cap: max_tokens defaults to None (omitted → the model's real maximum) for references and is passed through from the caller for the aggregator. The preset's hardcoded 4096 is gone. The max_tokens preset config field is left in place (config/web/desktop unchanged); it is simply no longer applied as a forced cap. Tests: slots route through resolve_runtime_provider with resolved base_url/api_key; resolution errors fall back to bare provider/model; neither call carries an output cap even when the preset config still contains max_tokens.	2026-06-27 04:39:42 -07:00
herbalizer404	3fe16e3cd5	fix(fallback): attach credential pool after provider switch When automatic fallback activates a provider that differs from the primary, try_activate_fallback() cleared the primary's pool (to avoid cross-provider base_url contamination, #33163) but never loaded the fallback provider's own pool. The fallback then ran with no pool, so rate_limit/billing/auth recovery couldn't rotate its credentials. After clearing a mismatched pool, load_pool(fb_provider) and attach it when it has credentials, so provider-specific rotation continues to work on the fallback target.	2026-06-27 04:39:26 -07:00
Tranquil-Flow	635841d210	fix(agent): reload credential pool on switch_model provider change (#52727) switch_model() swapped model/provider/base_url/api_key but never refreshed agent._credential_pool, which stays bound to the original provider. recover_with_credential_pool() then sees a pool.provider != agent.provider mismatch and short-circuits — so a 429/401 on the new provider gets no rotation and falls through to fallback instead. Reload load_pool(new_provider) inside switch_model when the provider changes (or the pool is missing). The reload is inside the protected swap block and the pool is added to the rollback snapshot, so a failed client rebuild restores the original pool. Fixes #16678, #52727.	2026-06-27 04:39:26 -07:00
teknium1	38e7bd8a08	fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status path classified these unconditionally as rate_limit with should_rotate_credential=True, so an overloaded provider exhausted the credential pool after two errors — fatal for a single-key user, who has nothing to rotate to. The credential is valid; the server is just busy. Disambiguate the 429 body against a shared _OVERLOADED_PATTERNS list and route overload language to FailoverReason.overloaded (retryable, no rotation), matching the existing 503/529 path and the message-only path (#52890). Genuine rate limits (no overload language) still rotate. Extracted the inline overloaded tuple #52890 added into the shared _OVERLOADED_PATTERNS constant so the status-code and message paths use one list. Closes #14038.	2026-06-27 04:16:54 -07:00
LeonSGP43	e7c013494d	fix(agent): preserve nested API error bodies	2026-06-27 04:13:53 -07:00
Teknium	68a65ed7a1	fix(agent_init): correct misleading sub-64K context_length error message (#53569) The error raised when a model's context window is below the 64K minimum advertised "or set model.context_length in config.yaml to override" — but the guard intentionally has no sub-64K escape hatch. Sub-64K models are rejected by design (tool schemas + system prompt need the headroom). The misleading clause invited a cluster of dup PRs (#11097, #11110, #8962, #9142, #37548) all trying to wire an override that we don't want. Reword to state the real options: pick a >=64K model, or — if your local server under-reports its true window — declare the real value (which must itself be >=64K). Guard behavior is unchanged.	2026-06-27 03:56:25 -07:00
Bartok9	45ce35ed72	fix(agent): classify message-only 'overloaded' as server overload Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the overloaded-classification fix, with a regression test that fails without it.	2026-06-27 03:52:52 -07:00
Teknium	ec769e49d2	fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564) The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not use markdown as it does not render' — factually wrong. Both adapters actively convert markdown to native formatting: - whatsapp_common.format_message(): bold, ~~strike~~, # headers, links, code blocks -> WhatsApp native syntax - signal_format.markdown_to_signal(): same conversions via bodyRanges, plus '- item' / '* item' bullets -> '• ' Unicode bullets The wrong hint made the agent strip bullets and bold the adapter would have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud: markdown is auto-converted, bullet lists work, tables are not supported. Added a contract test asserting markdown-converting platforms never forbid markdown in their hint.	2026-06-27 03:46:41 -07:00
Teknium	60f58a2b95	feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552) The verify-on-stop guard fired too eagerly — including on doc/markdown/skill edits with nothing to verify, where it pushed a pointless /tmp verification script. Three changes: 1. Default OFF for new installs: agent.verify_on_stop defaults to false (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31. 2. One-time migration (v30 -> v31): existing installs are switched off once, but only when the value is missing or still the "auto" sentinel — an explicit true/false the user set is preserved. 3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on the code paths. The legacy "auto" sentinel is still honored when set explicitly (ON for interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env override unchanged.	2026-06-27 03:23:22 -07:00
diamondeyesfox	8df231c941	fix(agent): rebaseline in-place compression flushes	2026-06-27 03:04:26 -07:00
kshitijk4poor	cdb1dfbc49	fix: use os.pathsep, add tests, update tips for multi-root support - Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and the Windows separator ';' work correctly. - Add 9 tests covering multi-root behavior: writes inside first/second root, writes outside all roots, trailing/leading/double separators, all-separators edge case, static deny priority, duplicate dedup. - Update hermes_cli/tips.py tip string to mention multiple paths. - Update docs to mention os.pathsep / ; on Windows. Follow-up for salvaged PR #49557.	2026-06-27 04:01:12 +05:30
Zheng Tao	fa8f1517da	feat(file_safety): support multiple HERMES_WRITE_SAFE_ROOT dirs Supports multiple directories separated by ':' (Unix PATH-style). E.g., HERMES_WRITE_SAFE_ROOT=/opt/data:/var/www/html Fixes #49535	2026-06-27 04:01:12 +05:30
Teknium	217047de2d	fix(agent): silence verification-stop loop status line (#53223) The verify-on-stop guard (#52296) printed '↻ Verification required before finishing' to the terminal on every internal nudge turn, adding noise to CLI/gateway sessions whenever code was edited without fresh passing checks. Demote the user-facing status emit to a logger.debug breadcrumb — the loop still nudges the model to verify before finishing, just silently.	2026-06-26 11:52:11 -07:00
helix4u	063fe4f6ef	fix(auxiliary): fallback on invalid provider responses	2026-06-26 13:49:46 +05:30
brooklyn!	a2b49e60b6	Merge pull request #52412 from GodsBoy/fix/verify-on-stop-messaging-surface-leak fix(agent): gate verify-on-stop nudge off for messaging surfaces	2026-06-26 02:30:08 -05:00
Moonsong	4e66bf1f80	fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608) When operator config has provider=anthropic with model.base_url pointing at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic), the auxiliary Anthropic path was unconditionally applying that override. Main-session traffic routed correctly because the main path attaches the right credential for the actual destination, but every side-channel call (memory extractors, reflection, vision, title generation, janus extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd. Gate the override on hostname == api.anthropic.com. Operators routing main through a non-Anthropic provider must use that provider's own auxiliary client; the Anthropic aux path now stays pointed at api.anthropic.com. Regression tests cover openrouter, openai, anthropic-with-path, empty, and anthropic-default-base_url cases.	2026-06-26 11:21:05 +05:30
DavidMetcalfe	27c486e3b1	feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors Wire get_reasoning_stale_timeout_floor() into both stale detectors so known reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of the upstream gateway idle-killing the socket (BrokenPipeError) before first token. Applied as max(default, floor) — never overrides explicit user config, never lowers an existing threshold. The reasoning_timeouts.py allowlist module already landed on main via #52795, so this salvage carries only the wiring + tests (the duplicate module and the stale-base MoA reverts from the original PR branch are dropped). Salvaged from #52238. Fixes #52217.	2026-06-25 22:12:06 -07:00
brooklyn!	f4c656b0a0	Merge pull request #52854 from NousResearch/bb/fix-interrupt-partial-reply fix(interrupt): keep partial streamed reply when stopped mid-response	2026-06-26 00:04:37 -05:00
teknium1	4d04c652f2	fix(curator): make external-skill write guard actually fire during curation The salvaged #51875 added a background-review write guard in skill_manage that refuses mutations to skills.external_dirs skills — but it only fires when is_background_review() is true. The curator's LLM review fork ran with the default _memory_write_origin='assistant_tool', so the guard never triggered during the exact curation pass it exists to protect against (GH-47688). - Set _memory_write_origin='background_review' on the curator review fork so turn_context binds it onto the write-origin ContextVar and the guard fires. - Add a regression test asserting the fork runs under the background_review origin (the invariant linking the fork to the guard). - AUTHOR_MAP: map yu-xin-c for the salvaged commit.	2026-06-25 22:03:02 -07:00
yu-xin-c	96bc524a71	fix(curator): protect external skills from background curation	2026-06-25 22:03:02 -07:00
teknium1	6c58878e7d	fix(browser): force secret-pattern redaction on browser_type display Force redact_sensitive_text(force=True) on the browser_type text arg so recognized credentials (API keys, tokens, JWTs) are masked in tool progress, previews, callbacks, and return payloads even when the global security.redact_secrets opt-out is set — a typed credential reaching chat history is a security boundary, not log hygiene. Normal typed text matches no pattern and stays fully readable for debuggability. Tests assert the API-key-shaped secret is masked across every surface and that normal text passes through unchanged.	2026-06-25 22:02:22 -07:00
rebel	8ff426e53b	fix: redact browser typed text surfaces	2026-06-25 22:02:22 -07:00
Brooklyn Nicholson	8233598e64	fix(interrupt): keep partial streamed reply when stopped mid-response Stopping a turn while the model is streaming (stop/esc to redirect) raised InterruptedError, set final_response to the throwaway "waiting for model response" sentinel, and persisted messages WITHOUT the assistant text that was already streamed to the screen. The next turn then had no record of the half-finished reply, so the model appeared to "forget" what it just said. Recover the on-screen text from _current_streamed_assistant_text in the InterruptedError branch and append it as the assistant turn (and surface it as final_response). The metadata sentinel is kept only when nothing was streamed yet, preserving the ACP/client suppression behavior. Completes the partial-stream recovery from `397eae5d9` (which wired the same _current_streamed_assistant_text salvage into the connection-failure twin but missed the user-interrupt path). The lossy handler dates to `c98ee9852`.	2026-06-25 23:54:20 -05:00
Teknium	a4091e49f1	fix(auth): write rotated Codex/xAI pool grant through to global root (#48415) (#52760) CredentialPool._sync_device_code_entry_to_auth_store rotated single-use OAuth refresh tokens but wrote the new chain only into the active profile store. When a profile resolves a grant from the global-root fallback (read_credential_pool, #18594) and the pool then refreshes it, root was left holding a now-revoked refresh token — every other profile reading the stale root grant subsequently died with refresh_token_reused / invalid_grant once its access token expired. This is the credential-pool analog of #43589 (which fixed the non-pool xAI refresh path in _save_xai_oauth_tokens). Detect the read-from-root case (profile lacks its own providers.<id> block) BEFORE the profile save and, after it, write the rotated chain back to the global root via a best-effort, seat-belted write-through. A profile that genuinely shadows root (owns the block) is untouched; classic mode (profile == root) is a no-op; a failed root write never breaks the profile's own save. Covers openai-codex (reported), xai-oauth, and nous through the shared sync path.	2026-06-25 19:14:06 -07:00
DavidMetcalfe	865a09a610	fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice Two-part fix: Part 1 (classifier override at agent/error_classifier.py:720-738): A transport disconnect on a reasoning model — even on a large session — now routes to FailoverReason.timeout instead of context_overflow. Without this, large-session reasoning-model disconnects route to the compression branch and silently delete conversation history on a phantom context-length error. The override is strictly targeted: non-reasoning models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to context_overflow on large sessions — the existing intentional behavior for chat models whose proxy doesn't idle-kill during prefill/generation. Part 2 (new agent/thinking_timeout_guidance.py + integration at agent/conversation_loop.py:3488-3567): New is_thinking_timeout() and build_thinking_timeout_guidance() helpers. When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3, Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning) hits a transport-kill on a small session (classifier says timeout directly) or after Part 1 routes correctly (large session), the user now sees reasoning-specific guidance with three actionable workarounds in priority order: 1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900 in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s for known reasoning models; raise further if upstream is even tighter). 2. Lower reasoning_budget or set reasoning_effort: medium on this model if the provider supports it. 3. Use a smaller / faster reasoning model if the task doesn't require deep thinking. 
The new guidance takes precedence via if/elif over the existing _is_stream_drop block, so a reasoning-model user with a transport-kill message sees actionable advice instead of the misleading "try execute_code with Python's open() for large files" advice (which is correct for the unrelated large-file-write stream-drop case but actively wrong for the thinking-timeout case). Verified: - 478 tests passing across 9 directly-relevant files (49 new + 429 existing, zero regressions). - Ruff lint clean on all 4 modified/new files. - Negative test: 6 parametrized regression guards confirm non-reasoning models still route to context_overflow on large sessions; 4 parametrized gates confirm non-timeout classifier reasons never trigger the guidance; 5 parametrized cases confirm non-transport messages never trigger it. - Regression guard: new guidance message does NOT contain "execute_code" or "open()" — the misleading advice is fully replaced, not appended alongside. - Cross-vendor dual review via agy -p: - Gemini 3.5 Flash (Medium) — passed: true, zero blockers, one SHOULD-FIX (vprint block duplication — fixed by extracting detection into a helper module). - GPT-OSS 120B (Medium) — passed: true, zero blockers, two nits (test placement — adopted at tests/agent/test_thinking_timeout_guidance.py; primary-model capture — accepted as non-issue per Flash's nit). Dependency note for maintainers: This PR includes agent/reasoning_timeouts.py (the reasoning-model allowlist module from PR #52238) because the Layer 1 override is load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238 lands on main, this PR's duplicate agent/reasoning_timeouts.py should be rebased away. Either PR can land first; the other rebase is mechanical. Fixes #52271.	2026-06-25 19:00:48 -07:00
brooklyn!	ffa3d3c811	Merge pull request #49037 from NousResearch/bb/projects-paradigm feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools	2026-06-25 17:49:05 -05:00
x7peeps	c7e934a5b4	fix(hermes_state): persist billing provider/base_url after mid-session /model switch The session database records billing_provider and billing_base_url using COALESCE(column, ?) in update_token_counts(), making them write-once. When a user switches models mid-session via /model, the runtime (agent.provider, agent.base_url) updates correctly, but the session row never reflects the new provider. This causes the dashboard Models page to display a stale provider badge and misattributes token usage / cost analytics. Fix: add update_session_billing_route() that unconditionally sets billing_provider, billing_base_url, and billing_mode (no COALESCE), and call it from switch_model() in agent_runtime_helpers.py after the swap succeeds. This follows the same pattern as update_session_model() which already unconditionally updates the model column (added for the identical COALESCE problem on the model field). Closes #48248	2026-06-25 14:44:00 -07:00
Brooklyn Nicholson	86e748df13	fix(agent): require code for coding posture	2026-06-25 16:40:27 -05:00
Brooklyn Nicholson	4ffdedd369	feat(tools): add project workspace tools	2026-06-25 16:40:27 -05:00
Teknium	c6575df927	feat(moa): expose MoA presets as selectable virtual models (#46081) * feat(moa): expose MoA presets as selectable virtual models Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools. Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses): all references are dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved; failures and the MoA-recursion guard stay isolated per reference. - Removed the old mixture_of_agents model tool and moa toolset. - Added moa as a virtual provider in the provider/model inventory. - /moa is a shortcut over model selection (default preset / named preset / one-shot prompt). - Dashboard + Desktop manage named presets; presets appear in model pickers. - Parallel reference fan-out in agent/moa_loop.py with regression test. * fix(moa): thread moa_config through _run_agent to _run_agent_inner The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it: _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path. Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars — tripping two catalog invariants: - test_provider_catalog: api_key providers must expose a credential env var - test_provider_parity: every hermes-model provider must be desktop-configurable moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt — derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.	2026-06-25 13:52:06 -07:00
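A minimal sketch of the bounded parallel fan-out the MoA commit describes, using the standard-library ThreadPoolExecutor. The function names, the (answer, error) result shape, and the worker cap are assumptions for illustration, not the code in agent/moa_loop.py.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Result = Tuple[Optional[str], Optional[str]]  # (answer, error)


def fan_out_references(
    prompt: str,
    references: List[Callable[[str], str]],
    max_workers: int = 4,
) -> List[Result]:
    """Run every reference model in parallel; return one result per reference, in input order."""

    def run_one(ref: Callable[[str], str]) -> Result:
        try:
            return ref(prompt), None
        except Exception as exc:  # isolate failures: one bad reference doesn't sink the batch
            return None, f"{type(exc).__name__}: {exc}"

    # Bounded pool (at least one worker); every reference is submitted up front.
    workers = max(1, min(max_workers, len(references)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, ref) for ref in references]
        # Collect in submission order, so output order matches input order
        # regardless of which reference finishes first.
        return [future.result() for future in futures]


def aggregate(prompt: str, results: List[Result],
              aggregator: Callable[[str, List[str]], str]) -> str:
    # Only successful reference answers are handed to the aggregator.
    answers = [answer for answer, error in results if error is None]
    return aggregator(prompt, answers)
```

Collecting futures in submission order, rather than with as_completed(), is what keeps the output order stable.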
teknium1	0d777453fa	fix(auxiliary): fall back when a route can't run the model at all (400 capability mismatch) The salvaged context-window screen (#52392) skips fallback candidates that are too small, and the rate-limit/403 fixes skip candidates that are at capacity. A third hard failure remained uncovered: a fallback that builds a client fine but returns a 400 because it structurally cannot run the model. The canonical case is a configured openai-codex / ChatGPT-account fallback asked to compress a glm-5.2 conversation: 400 - {'detail': "The 'glm-5.2' model is not supported when using Codex with a ChatGPT account."} This is a request-validation error, so should_fallback was False and the explicit-provider gate blocked it — the auxiliary task (compression) aborted every turn, dropping middle turns without a summary and churning the session, which is exactly what destroys the prompt cache. Adds _is_model_incompatible_error() (400 + capability phrasing, excluding not-found and billing 400s which the sibling predicates own) and treats it as a fallback-worthy capacity error in both sync and async call_llm, so the chain skips the incapable route and continues to the next viable candidate.	2026-06-25 13:08:18 -07:00
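A minimal sketch of a predicate in the spirit of _is_model_incompatible_error(). The phrase lists are hypothetical, because the commit specifies only "400 + capability phrasing" with not-found and billing 400s excluded. The sample message is the one quoted in the commit.

```python
import re

# Hypothetical phrase lists: illustrative, not the real predicate's patterns.
_CAPABILITY_PATTERNS = (
    r"model is not supported",
    r"does not support",
    r"not available (?:for|with|when)",
)
_NOT_FOUND_PATTERNS = (r"model not found", r"does not exist", r"unknown model")
_BILLING_PATTERNS = (r"insufficient (?:credits|quota|balance)", r"billing", r"payment")


def _matches(patterns, text: str) -> bool:
    return any(re.search(p, text) for p in patterns)


def is_model_incompatible_error(status_code: int, message: str) -> bool:
    """True for a 400 saying this route structurally cannot run the model."""
    if status_code != 400:
        return False
    text = message.lower()
    # Not-found and billing 400s belong to sibling predicates, so don't claim them.
    if _matches(_NOT_FOUND_PATTERNS, text) or _matches(_BILLING_PATTERNS, text):
        return False
    return _matches(_CAPABILITY_PATTERNS, text)
```

In the fallback loop, a True result would be handled like a capacity error: skip this candidate and continue down the chain.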
Tranquil-Flow	e4d026aa3b	fix(auxiliary): screen fallback chain by context window for compression (#52392) The runtime auxiliary fallback chain (_try_configured_fallback_chain and _try_main_fallback_chain) returned the first reachable candidate without checking whether the candidate's context window was large enough for the task. For task='compression' this meant a reachable but undersized fallback (e.g. 32K) could be selected and then fail, even when a later larger-context fallback was available. This adds two small helpers: _task_minimum_context_length(task): returns MINIMUM_CONTEXT_LENGTH (64K) for compression and None for other tasks (vision, web_extract, etc.). _candidate_context_window(provider, model, ...): a thin wrapper around get_model_context_length that returns None on probe failure, so unknown/custom endpoints pass through unchanged (preserving the existing fallback surface). Both fallback loops now skip reachable candidates whose resolved context is below the task minimum and continue iterating. The success path (first viable candidate wins) is unchanged, and return shape and ordering for healthy candidates are preserved. Six regression tests cover: L2 configured chain skips too-small candidate; L2 chain continues after skipping, returns last viable; L3 main chain skips too-small candidate; L4 unknown-context candidate passes through; L5 non-compression task is not filtered; L6 minimum constant matches MINIMUM_CONTEXT_LENGTH (64K). 3/6 fail on upstream/main without the production change (verified); all 6 pass with the fix. Full test_auxiliary_client.py suite (231 tests) and related compression tests (130 tests) remain green.	2026-06-25 13:08:18 -07:00
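A minimal sketch of the context-window screen described in this commit, assuming every candidate is already reachable and that the context probe is passed in as a callable. The names mirror the commit's helpers without the leading underscore but are hypothetical. The exact value of MINIMUM_CONTEXT_LENGTH is also an assumption, because the commit says only "64K".

```python
from typing import Callable, List, Optional, Tuple

# Assumption: the commit says "64K"; the exact numeric value is not given there.
MINIMUM_CONTEXT_LENGTH = 64_000

Candidate = Tuple[str, str]  # (provider, model)


def task_minimum_context_length(task: str) -> Optional[int]:
    # Only compression is screened; other tasks return None (no filtering).
    return MINIMUM_CONTEXT_LENGTH if task == "compression" else None


def candidate_context_window(probe: Callable[[str, str], int],
                             provider: str, model: str) -> Optional[int]:
    # Probe failure -> None, so unknown/custom endpoints pass through unchanged.
    try:
        return probe(provider, model)
    except Exception:
        return None


def pick_fallback(task: str, candidates: List[Candidate],
                  probe: Callable[[str, str], int]) -> Optional[Candidate]:
    minimum = task_minimum_context_length(task)
    for provider, model in candidates:
        if minimum is not None:
            window = candidate_context_window(probe, provider, model)
            if window is not None and window < minimum:
                continue  # reachable but too small: keep looking
        return provider, model  # first viable candidate wins
    return None
```

Treating an unknown window as viable, rather than as too small, is what keeps custom endpoints usable as fallbacks.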
herbalizer404	b82c83d320	fix(auxiliary): honor fallback chain when compression provider auth is unavailable When an explicit aux provider cannot build a client before any request is sent (missing raw env key, exhausted/unavailable OAuth or credential-pool auth, resolver returning (None, None)), call_llm raised a misleading "no API key was found" error and bypassed the configured fallback_chain entirely. A provider authenticated through Hermes auth / the credential pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so compression failed instead of routing to the configured fallback. Adds _try_configured_fallback_for_unavailable_client() and wires it into both sync and async call_llm before the raise, and into the startup compression feasibility check. Salvaged from #51835 by @herbalizer404.	2026-06-25 13:08:18 -07:00
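A minimal synchronous sketch of the "try the configured chain before raising" behavior. The build_client and send callables, the exception name, and the error text are hypothetical. The commit wires the real logic into sync and async call_llm and into the startup compression feasibility check, none of which is modeled here.

```python
from typing import Callable, List, Optional


class NoClientAvailable(RuntimeError):
    """Hypothetical error raised only after the whole fallback chain is exhausted."""


def call_with_fallback(
    build_client: Callable[[str], Optional[object]],
    primary: str,
    fallback_chain: List[str],
    send: Callable[[object], str],
) -> str:
    client = build_client(primary)
    if client is None:
        # The primary has no usable auth (missing key, exhausted pool, ...):
        # walk the configured chain before giving up.
        for provider in fallback_chain:
            client = build_client(provider)
            if client is not None:
                break
    if client is None:
        raise NoClientAvailable(
            f"no usable credentials for {primary!r} or its fallback chain"
        )
    return send(client)
```

The error is raised only after every configured fallback has also failed to produce a client, so a misleading "no API key" message no longer short-circuits the chain.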


1462 commits