hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-22 10:32:00 +00:00

Author	SHA1	Message	Date
teknium1	0a2b712965	test(chat-completions): cover timestamp strip + add AUTHOR_MAP entry Add a regression test for #47868 asserting convert_messages strips the internal per-message timestamp field, plus the identity-return path for timestamp-free message lists. Map x7peeps for the release attribution gate.	2026-06-20 17:05:17 -07:00
teknium1	5a53e0f0f4	fix(compression): abort on auth failure instead of rotating into a degraded session When the auxiliary summary call fails with an authentication/permission error (HTTP 401/403), context compression now ABORTS and preserves the session unchanged instead of rotating into a child session with a placeholder summary. Before: a 401 (invalid/blocked key, or a token pointed at the wrong inference host) fell through every transient-error check to 'return None', and because compression.abort_on_summary_failure defaults False, compress() took the static-fallback path and rotated the session anyway (messages N->N). The user landed on a fresh-but-broken session that kept failing the same way — paying for a full-context API call each turn with no useful compression. After: _generate_summary classifies 401/403 as a non-recoverable auth failure (_last_summary_auth_failure) and compress() aborts on it regardless of abort_on_summary_failure. A distinct auxiliary summary_model that 401s still retries once on the main model first (its dedicated creds may be the only broken thing); the abort only sticks when the main model itself auth-fails or the fallback also auth-fails. The existing _last_compress_aborted handling in conversation_compression.py already skips rotation and emits a warning, so no session rotation occurs. Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite flag=False, non-auth failures keep the historical fallback path, and aux-model auth failure recovers on main without aborting.	2026-06-20 11:38:21 -07:00
teknium1	f22dd8a75a	fix(agent): fail over to fallback provider on persistent auth failure (401/403) When the active provider returns a 401/403 that survives its per-provider credential-refresh attempt (revoked OAuth, blocked/expired key, or an account pinned to a dead/staging inference endpoint), the conversation loop now escalates to the configured fallback chain instead of dead-ending. Before: the generic failover dispatch fired only for {rate_limit, billing}; auth/auth_permanent fell through to 'switch providers manually' advice and never called _try_activate_fallback(). A user whose primary credential was broken kept thrashing on the same dead credential every turn — the main agent appeared 'stuck in fallback mode' while never actually failing over. This also affected auxiliary tasks (compression, vision, title-gen), since auto-resolved aux follows the main provider. After: a persistent auth failure with a configured fallback chain switches to the next provider (mirroring the rate-limit/billing failover path), guarded one-shot per attempt by TurnRetryState.auth_failover_attempted. When no fallback is configured the behavior is unchanged — it falls through to the existing terminal handling and provider-specific troubleshooting guidance. Tests: test_auth_provider_failover.py — 401/403 classify as auth, the gating condition fires only with a chain present + guard unset, the guard blocks repeats, and non-auth (500) errors do not trigger auth failover.	2026-06-20 11:38:01 -07:00
Sancho	c884ff64ea	fix(agent): keep system-prompt model identity in sync across provider failover The session-stable system prompt embeds Model:/Provider: identity lines, but mid-turn failover (try_activate_fallback) swaps the runtime without touching them, so a fallback model misreports itself as the primary when asked "what model are you?". rewrite_prompt_model_identity() rewrites the last occurrence of each line on _cached_system_prompt when a fallback activates (and back on restore, byte-identical so the primary's prefix cache still hits). The rewrite is never persisted to the session DB. _sync_failover_system_message() patches the in-flight api_messages[0] at all 8 failover sites so the current turn ships the corrected identity. Cache-safe: the fallback's prefix cache is cold on a model switch anyway. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-20 10:46:01 -07:00
Teknium	cf58f1a520	feat(titles): support language-aware title generation (#45296 ) Make auxiliary title prompts match the user language by default, with an optional pinned `auxiliary.title_generation.language` config.	2026-06-19 17:15:52 -07:00
alt-glitch	88d523220f	fix(mcp): address adversarial review round 2 (stale-publish race, parity holes) Second review pass (Codex + Hermes subagent). Codex reproduced a real race with a two-thread harness; both converged on the remaining issues. - Generation-aware publish (fixes a lost-update race): two refresh callers (the late-refresh daemon and the between-turns prologue around turn 1) could each compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry generation could acquire the publish lock after a newer caller and clobber it, deleting just-landed tools. refresh_agent_mcp_tools now captures registry._generation before computing and refuses to publish a stale set; agent._tool_snapshot_generation tracks the published generation. - Context-engine routing names (_context_engine_tool_names) are now staged on a local and published atomically with the snapshot, and only claimed when this rebuild actually appended the schema — matching agent_init's dedup so a registry/plugin tool of the same name keeps its own dispatch. (Previously mutated live, before the publish lock, and on no-change refreshes.) - CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a server newly ENABLED in config mid-session wasn't picked up (TUI already re-resolved). Merge now-connected MCP server names into the override (unless the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in sync. Closes the CLI/TUI parity hole. - ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th sibling rebuild that re-injected memory tools but NOT context-engine tools and bypassed the atomic/name-diff path (inert today, fragile). - mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG (single source of truth) instead of a stale hardcoded 5.0 literal. - Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout fallback uses DEFAULT_CONFIG.	2026-06-19 11:57:43 -07:00
alt-glitch	3713483874	fix(mcp): refresh agent tool snapshot between turns (cache-safe late-binding) A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting after the agent's one-time tool snapshot was uncallable for the rest of the session. The merged pre-first-turn late-refresh only helps during the dead air before the user's first keystroke; once a turn starts it bails to protect the prompt cache, so a user who types before the server connects never gets the tools without a manual /reload-mcp. Refresh the snapshot in the per-turn prologue (build_turn_context), before this turn's first API call assembles tools=. This is cache-safe by construction: the refresh only ever extends a fresh request prefix at a turn boundary, never mutates the cached prefix of an in-flight turn. So late tools become callable on the user's NEXT turn automatically, with no /reload-mcp and no cache cost. - tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with no MCP servers (the common case) skip the rebuild entirely. - agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the top of the prologue when MCP servers are registered. - tests: 3 contract tests through the real build_turn_context (adds late tool; skipped when no servers; no snapshot churn when unchanged). .hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.	2026-06-19 11:57:43 -07:00
Ben Barclay	f538470cf4	feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A) The credential gate. When multiplexing is active, a profile's secrets resolve from a context-local scope, never the process-global os.environ (which in a multiplexer may hold another profile's keys, and is inherited by every subprocess spawned with env=dict(os.environ)). - agent/secret_scope.py: get_secret() backed by a secret-scope contextvar. FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped read RAISES UnscopedSecretError instead of falling back to os.environ — a missed/new call site crashes loudly at that line rather than leaking a cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths, …) keep reading os.environ via an allowlist. load_env_file/build_profile_ secret_scope parse a profile .env into an isolated dict WITHOUT mutating os.environ. Off by default => transparent os.getenv behavior. - hermes_cli/runtime_provider.py: all credential/provider/base-url reads go through _getenv -> get_secret. - agent/credential_pool.py: env fallbacks route through get_secret (the ~/.hermes/.env-first preference is preserved and already profile-correct via the home override). - tools/mcp_tool.py: MCP config interpolation resolves through get_secret, so a server's picks up the routed profile's value. - gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env reload is a no-op for credentials in multiplex mode (secrets come from the scope, not global env); _profile_runtime_scope context manager combines the HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in that scope (resolved via _resolve_profile_home_for_source) when multiplexing. Propagates into the agent worker thread for free via the existing copy_context() in _run_in_executor_with_context. Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove two profiles isolated, unscoped read raises, globals still read environ).	2026-06-19 07:34:15 -07:00
tt-a1i	46f9d53468	fix(agent): aggregate anthropic aux calls via stream	2026-06-19 17:32:13 +05:30
Hao Zhe	d7cd0bc086	fix(openviking): preserve structured sync attribution	2026-06-19 15:23:41 +08:00
Victor Kyriazakos	3ead2bdd0d	feat(prompt): configurable per-platform system-prompt hint overrides Add platform_hints config so an admin can append to or replace Hermes' built-in platform hint for a single messaging platform (WhatsApp, Slack, Telegram, ...) without affecting other platforms. Enables enterprise managed profiles to steer platform-aware skills (e.g. invoke a custom table-formatting skill on WhatsApp where Markdown tables don't render) while leaving Telegram/Slack/CLI behavior unchanged. - hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG - agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides - agent/system_prompt.py: _resolve_platform_hint() applies append/replace (replace wins; bare string = append shorthand); defensive on bad config - tests: 16 cases covering append/replace/shorthand/isolation/malformed Override only affects the platform-hint segment of the system prompt; SOUL/context/memory tiers and general instructions are unchanged.	2026-06-18 14:28:01 -07:00
Siddharth Balyan	73cd8622f9	feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449 ) * feat(billing): nous_billing http client + BillingState core (phase 2b) Phase 2b terminal-billing client foundation: - hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints (state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired, BillingRateLimited, BillingAuthError) mapped from the live-verified contract; fail-open is the caller's job. Idempotency-Key enforced client-side. - agent/billing_view.py: surface-agnostic BillingState core + Decimal money parsing (server emits decimal strings, not 2dp), fail-open builder, idempotency-key gen, custom-amount validation. - 51 unit tests (decimal parse/format, payload tiering, error->exception matrix, fail-open, amount validation). Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md * feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b) - NOUS_BILLING_MANAGE_SCOPE constant. - nous_token_has_billing_scope(): split-based scope check (no false-positive substring match). - step_up_nous_billing_scope(): re-runs the device flow requesting billing:manage, reusing the held credential's portal/inference URLs + client_id (so a preview stays a preview), persists like _login_nous but WITHOUT the model picker. Returns True iff the minted token carries the scope (False when NAS silently downscopes a non-admin / unticked grant). Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope from a billing call triggers this. 7 unit tests. * feat(billing): billing JSON-RPC methods for the TUI (phase 2b) billing.state / charge / charge_status / auto_reload / step_up in tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok + result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise always resolves and the TUI branches on the typed billing error code (insufficient_scope, rate_limited, no_payment_method, …) to render the right affordance. Money serialized as decimal STRINGS + display strings. charge mints + echoes an idempotency_key for retry reuse. 16 unit tests. * feat(billing): /billing CLI handler + command registry (phase 2b) - CommandDef("billing", subcommands=buy\|auto-reload\|limit), added to _SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap parity test green, same as /credits). - cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→ poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal / _prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on 403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal. - 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green. * feat(billing): /billing Ink TUI screens + tests (phase 2b) - ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5 screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/ 5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> → ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay (D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to portal. Client-side amount validation mirrors the server (bounds + 2dp). - gatewayTypes.ts: Billing* response interfaces. - registry.ts: register billingCommands. - billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll- settled/no_payment_method/step-up/limit/auto-reload/validation). TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built. * docs(billing): scrub private cross-repo references NAS is a private repo — remove all references to it from the public PR: - drop the cross-repo planning doc (planning scaffolding, not a deliverable; the PR description documents the design) - replace 'NAS' / 'PR #412 preview' mentions in code + test comments with generic 'the server' / 'a preview deployment' * docs(billing): scrub final NAS reference in step-up docstring * docs(billing): drop dangling plan-doc refs The phase-2b plan doc was removed in the cross-repo scrub (`300afcc0b`) but two module docstrings still pointed at it. Drop the dead refs. * feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes Adds the interactive /billing TUI overlay and hardens the terminal-billing client across CLI and TUI. - TUI: full /billing overlay state machine (overview to buy to confirm, auto-reload, read-only monthly limit) reusing the existing confirm overlay. - Step-up: surface the verification link in-transcript and open the browser via the TUI's own opener (the device flow runs in the headless gateway, so a printed URL was being dropped); run the step-up handler off the main loop and emit the link as an out-of-band event so the gateway stays responsive. - Step-up copy is scope-accurate ("Billing permission granted") and re-checks /state so it never claims "enabled" when the org kill-switch is still off. - Portal deep-links resolve to absolute URLs against the active portal base (the server emits them relative) - fixes a bare "/billing?topup=open" link. - Billing calls refresh an expired access token via the stored refresh token instead of reporting a false "not logged in". - Optimistic funnel: advise "set up a saved card on the portal" up front when no card is on file (advisory, not a hard gate). - Token resolution is cached briefly so the 2s charge poll loop stops re-locking + re-reading the auth store on every tick; 401 re-resolves fresh. - Remove the temporary demo-mode shims. Validation: 87 Python billing tests, 88 TS tests (billing command + gateway event handler), tsc clean, ink + ui-tui builds green. * docs(billing): add /billing TUI screenshots for PR * fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test The UI-invalidate throttle read self._last_invalidate unconditionally, which raised AttributeError on HermesCLI instances built without __init__ (the thread-safety test's object.__new__ shell). Guard the read with getattr. The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel cleanly to None instead of falling back to a bare input() that would hang on the slash-worker thread; the test still asserted the old direct-input fallback. Update it to assert the current intended behavior: returns None, calls neither run_in_terminal nor input(), and does not hang.	2026-06-19 01:53:32 +05:30
Brooklyn Nicholson	07e785d60a	fix(prompt): dedupe parallel-tool-call steer; correct its rationale The universal PARALLEL_TOOL_CALL_GUIDANCE block already lives on main, but it shipped with two rough edges this change cleans up: - It duplicated the batching steer for Google models. The GOOGLE_MODEL_OPERATIONAL_GUIDANCE block still carried its own "Parallel tool calls" bullet, so Gemini/Gemma received the instruction twice in one prompt. Drop the redundant bullet — the universal block is now the single source. - Its comment claimed "nothing in the open-source system prompt encouraged batching," which was wrong: the steer existed for Google models only. Reword to say the gap was that every other model got nothing. - Tighten the test that asserts the steer (precedence-correct), and add an invariant guarding against re-introducing the Google duplicate.	2026-06-18 13:22:12 -05:00
Teknium	0fa7d6f660	fix(desktop): never persist or restore a named custom provider as bare "custom" (#48547 ) * Port from cline/cline#11514: encourage parallel tool calls Add a universal system-prompt guidance block telling the model to batch independent tool calls (reads, searches, web fetches, read-only commands) into a single assistant turn instead of one call per turn. The runtime already executes independent batches concurrently (read-only tools always; non-overlapping path-scoped file ops); the open-source system prompt had nothing steering the model to PRODUCE the batch. Fewer round-trips means less resent context, which compounds over a long conversation. - prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static, cache-amortised) modeled on TASK_COMPLETION_GUIDANCE. - system_prompt.py: inject right after the task-completion block, gated by agent.valid_tool_names + the new toggle. - agent_init.py: read agent.parallel_tool_call_guidance (default True). - config.py: add the default under the agent section. - test_prompt_builder.py: behavior-contract tests (batching steer, dependent carve-out, length bound) — invariants, not wording snapshots. Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's Python prompt-assembly architecture and config-over-env conventions. * fix(desktop): never persist or restore a named custom provider as bare "custom" Custom providers vanish from the Desktop/TUI model picker with "No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578) and repeatedly regressed (#44022, #47714) because every fix only recovered the entry identity from a persisted base_url. When a session is persisted/restored with the resolved provider "custom" and NO base_url, bare "custom" leaked through verbatim; resolve_runtime_provider("custom") routes to the OpenRouter default URL with no api_key, so the next turn/resume dies. Bare "custom" is the resolved billing class shared by every named providers:/ custom_providers: entry — it is not a routable identity. Centralize the "never let bare custom escape" invariant in one helper, runtime_provider.canonical_custom_identity(), and apply it at all four leak sites in tui_gateway/server.py: - _ensure_session_db_row — the ORIGIN: first DB write seeds the bad row - _runtime_model_config — live persist - _stored_session_runtime_overrides — resume restore (heals old rows; drops unrecoverable bare custom so resume falls back to config default) - _make_agent — rebuild / per-turn The helper recovers custom:<name> from the endpoint URL when present, else from config.model.provider (the durable identity left when no base_url survived). Regression tests in test_custom_provider_session_persistence.py lock the no-base_url vector at every site so it cannot regress again.	2026-06-18 11:11:51 -07:00
teknium1	c5eb64b9f7	fix(xai): scope native web_search to swap-only + reconcile composer ctx to 200k Salvage corrections on top of @XVVH's #44341: - Make native web_search injection a 1:1 swap for an already-present client web_search function, NOT an additive grant. The original unconditionally appended {"type":"web_search"} on every is_xai_responses turn with any tools, force-enabling Grok server-side search even when the user never enabled the web toolset (bypassing Hermes web-provider config + tool-trace plumbing). Now gated on a client web_search actually being present. - Reconcile grok-composer context to 200000 (merged in #47908) rather than 262144; 200k is xAI's published usable context window for Composer 2.5, 262144 is the /v1/responses input+output budget. - Update tests to match scoped behavior + add a no-web-toolset guard test. - AUTHOR_MAP entry for #44341 salvage. Incomplete-guard (server-side *_call items at in_progress no longer flip has_incomplete_items) and preflight built-in-tool allowlist kept as-is.	2026-06-17 17:33:32 -07:00
XVVH	6f89e17a33	fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context - model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models) - codex transport: inject native {"type":"web_search"} for is_xai_responses; drop client web_search to avoid duplicate-name 400s - codex adapter: do not treat in-progress server-side *_call items as incomplete - tests: adapter, transport build_kwargs, model_metadata, oauth recovery	2026-06-17 17:33:32 -07:00
Teknium	020e59d3cf	fix(agent): dampen empty-name phantom tool-call loop (#47967 ) (#48109 ) Weak open models (mimo, nemotron-class) that see tool-call XML/JSON sitting in file contents or tool output get primed and emit their own structured tool calls mimicking the payload — usually with an empty/whitespace name. Those calls can't be fuzzy-repaired toward a real tool, so the dispatch loop returns an error and the model retries. Before this fix, every empty-name error dumped the full tool catalog back to the model, which fed the priming loop more names to mimic and inflated context 3-4x across the retry budget. A blank/whitespace-only tool name now gets a terse anti-priming error that tells the model in-context tool-call syntax is DATA, with no catalog dump. A genuinely-wrong-but-nonempty name (a real typo) still gets the full catalog so the model can self-correct. Not a sandbox/auth boundary issue: Hermes never parses tool-call text from content into executable calls (structured tool_calls only; the lone text->call parser is the Copilot ACP transport and it also rejects empty names). The reporter's own debug dump confirms the injection never executed. Behavior-contract test added: empty-name -> terse error, no catalog; nonempty unknown -> catalog preserved. Exercised end-to-end via run_conversation against an in-process mock provider.	2026-06-17 17:32:14 -07:00
Teknium	f80381c456	feat(prompt): scale context-file cap to model window + point agent at truncated file (#47846 ) Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were hard-capped at a flat 20K chars before head/tail truncation. Among the agent harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code, OpenCode, and Cline load them whole. The flat 20K predates large context windows and silently truncates real-world AGENTS.md files. B — dynamic cap: when context_file_max_chars is unset (now the shipped default), the cap scales with the model's context window (ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at the historical 20K; a 200K model gets 48K; large models stop truncating real docs. An explicit context_file_max_chars still wins. Context length is resolved once per conversation (stable -> prompt cache untouched). C — when truncation does happen, the marker now names the concrete file path and tells the agent to read_file it for the full content. Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config (0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate with the path-bearing marker, large windows load whole, and the system prompt is byte-stable across rebuilds.	2026-06-17 05:40:26 -07:00
Max Freedom Pollard	fc1119ca66	fix(curator): stop the rollback safety snapshot from pruning its target Rolling back to the oldest curator snapshot failed and deleted that snapshot. rollback() takes a safety snapshot first, and snapshot_skills() ends by pruning the backups directory down to keep (5 by default). At the steady keep limit that prune removed the oldest snapshot, which is the very one being restored, so the extract found no skills.tar.gz and the rollback stopped with "snapshot extract failed (state restored)". Thread an optional protect set through snapshot_skills() into _prune_old() so the pre rollback safety snapshot can never evict the snapshot being restored. Add two regression tests covering restore of the oldest snapshot at the keep limit. Fixes #47612	2026-06-17 05:40:05 -07:00
Teknium	7bbffceb9c	feat(curator): make skill consolidation opt-in (prune stays default-on) (#47840 ) The curator now defaults to prune-only: the deterministic inactivity pass (mark stale / archive long-unused skills) still runs whenever the curator is enabled, but the opinionated LLM umbrella-building consolidation fork is OFF by default. - agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate the forked aux-model review in run_curator_review behind it (new consolidate param, None=read config). When off, the LLM pass is skipped entirely (no aux-model cost); the run is still recorded and reported. - config.py: add curator.consolidate (default false); v29->v30 migration seeds the key for existing installs without clobbering a user-set value. - hermes_cli/curator.py: 'hermes curator run --consolidate' override; status shows consolidate state; prune-only notice on run. - docs + tests.	2026-06-17 05:20:32 -07:00
kyssta-exe	4d39a603d1	fix(codex): restore session_id/x-client-request-id HTTP headers for cache routing (#47335 )	2026-06-17 05:13:12 -07:00
kshitijk4poor	b70a4e7533	fix(anthropic): also normalize MCP-server tool names to mcp__ on OAuth wire The double-underscore prefix swap fixed bare native tools but SKIPPED tools already named mcp_<server>_<tool> (real MCP servers, e.g. mcp_linear_get_issue): they went on the OAuth wire single-underscore and still tripped Anthropic's third-party billing classifier -> HTTP 400 'extra usage, not plan limits'. Verified empirically against a live Max subscription: a single mcp_ tool flips the whole request to the extra-usage lane; mcp__ is accepted. - build_anthropic_kwargs: promote ANY leading single-underscore mcp_ to mcp__ (bare names -> mcp__name; mcp_<server>_<tool> -> mcp__<server>_<tool>), never double-prefixing an already-mcp__ name. Same for tool_use blocks in history. - normalize_response: reverse the mcp__ wire name back to whichever original the registry knows — the single-underscore mcp_<server>_<tool> form for MCP server tools, or the bare name for native tools — preferring a name that already resolves natively. - Tests rewritten to assert the invariant: ZERO single-underscore mcp_ names reach the OAuth wire, and the mcp__ round-trip resolves back to the registered name for both native and MCP-server tools. Builds on liuhao1024's mcp__ prefix commit (cherry-picked). Closes the MCP-server gap that left any session with an MCP server configured still billing to extra usage.	2026-06-17 13:20:29 +05:30
liuhao1024	3d37869295	fix(anthropic): use double-underscore mcp__ prefix for OAuth tool names Anthropic's Claude-Code request classifier treats tool names with a single-underscore `mcp_<x>` prefix as non-Claude-Code / third-party, routing the request to extra-usage billing (HTTP 400). Real Claude Code uses double underscores: `mcp__<server>__<tool>`. Change the tool-name prefix from `mcp_` to `mcp__` in both the outgoing path (build_anthropic_kwargs) and the incoming path (normalize_response). Update the skip-guard to check for both `mcp_` and `mcp__` prefixes so native MCP server tools (which use the legacy single-underscore format) are not double-prefixed. Fixes #46675	2026-06-17 13:12:23 +05:30
Wolfram Ravenwolf	9137b86a52	fix(skills): ignore support docs in skill discovery Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks. This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references. Tests cover: - pruning nested SKILL.md files inside skill support directories - preserving support-named top-level categories - avoiding skill_view collisions from support markdown - keeping archived package SKILL.md files accessible only through file_path	2026-06-16 13:08:34 -07:00
teknium	6ebc449915	fix(prompt): isolate truncation warnings per context Follow-up to salvaged PR #41619: replace the module-global _truncation_warnings list with a contextvars.ContextVar so concurrent gateway-session prompt builds can't drain or clear each other's pending warnings (cross-session leak). Adds a context-isolation test.	2026-06-16 11:28:35 -07:00
Wolfram Ravenwolf	f6a42b1acf	feat(prompt): make context-file truncation limit configurable PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful. SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.	2026-06-16 11:28:35 -07:00
Teknium	c2c55c4443	fix(memory): strip skill scaffolding for all providers, not just openviking Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution affected every auto-syncing memory provider — mem0, hindsight, retaindb, byterover, honcho, supermemory all store/embed the raw user turn, so a /skill invocation poisoned their stores with the full skill body, not just openviking. - Lift the contributor's parser into agent/skill_commands.py as the canonical extract_user_instruction_from_skill_message(), co-located with the message builders so the markers can't drift. - Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} — fixes the whole provider fan-out, bare /skill turns are skipped entirely. - OpenViking's _derive_openviking_user_text() now delegates to the shared helper as defense-in-depth (no duplicated marker literals). - Marker-drift regression now asserts against the canonical skill_commands constants; add manager-level coverage proving every provider gets clean text.	2026-06-16 10:37:37 -07:00
brooklyn!	c6e99ab375	Merge pull request #46959 from NousResearch/bb/composer-model-selector feat(desktop): composer model selector, per-model presets & external-provider disconnect	2026-06-16 09:55:57 -05:00
Brooklyn Nicholson	7d938cc5c9	fix(desktop): keep live model switch metadata truthful A live config.set model switch already moved the next API call to the new model, but the conversation could still restore an old sessions.system_prompt snapshot whose Model/Provider lines named the previous runtime. That made "what model are you?" answer from stale metadata even while inference ran on the new model. After a live switch we now refresh the stored system prompt and append a real system-history pivot (not a fake user turn) so the transcript itself records the new model/provider. Restore also rejects already-stale prompt snapshots when their Model/Provider lines disagree with the runtime, so existing bad sessions self-heal.	2026-06-16 09:50:17 -05:00
Teknium	4858942c55	fix(auxiliary): honor main fallback chain for auto tasks (#47235 )	2026-06-16 06:23:24 -07:00
Teknium	3e7e9b24d4	fix: harden salvaged session and browser improvements Polish salvaged contributor work before PR review: - read browser inactivity timeout from config with documented fallback - skip redundant v10 trigram backfill before v11 FTS rebuild - show delegate_task goals safely in progress previews - show gateway status model/context without redundant token wording - wire gateway /sessions to shared session-listing helpers - map Ravenwolf author emails for release attribution Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de> Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>	2026-06-15 07:46:34 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
liuhao1024	2cddc9c895	fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream converse() and converse_stream() were added in boto3 1.34.59. When Hermes is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46), the system boto3 takes precedence and calls to converse_stream fail with AttributeError. Add an early version check in _require_boto3() that raises a clear RuntimeError with upgrade instructions.	2026-06-15 05:25:17 -07:00
Teknium	f3fe99863d	revert(web): remove keyless Parallel search fallback (#46350 ) Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced. Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.	2026-06-14 16:47:57 -07:00
mr-r0b0t	bff78a34dc	feat(zai): add GLM-5.2 with verified 1M context window GLM-5.2 ships with a 1M (1,048,576) token context window. Without this entry, Hermes falls through to the generic 'glm' key (202,752 tokens), under-reporting the context bar and prematurely compressing conversations. The 1M limit was verified empirically via needle-in-a-haystack retrieval at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors, zero truncation, correct retrieval at every tested size (25K through 789K). Changes: - agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback - hermes_cli/models.py: add glm-5.2 to zai curated models - hermes_cli/setup.py: add glm-5.2 to setup wizard zai list - hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes - plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models - tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests	2026-06-14 13:50:36 -07:00
Teknium	4e6d05c6a5	perf(skills): share raw config cache in skill utils (#46149 )	2026-06-14 11:14:58 -07:00
Teknium	13a1bd0f83	perf(model-metadata): persist OpenRouter metadata cache (#46114 )	2026-06-14 04:45:46 -07:00
Teknium	723c2331bd	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
helix4u	85e6232a07	fix(providers): support anthropic proxy v1 endpoints	2026-06-14 02:09:16 -07:00
kshitijk4poor	12c84d6c77	fix(transports): only treat a refusal as terminal when it is the sole payload A chat-completions response that carries real text or tool calls alongside a `message.refusal` note is a normal, usable turn — the model did work. The prior logic flipped finish_reason to `content_filter` whenever a refusal string was present, so the conversation loop reframed a content-bearing turn as a failed safety refusal (failed=True) and buried the model's actual output inside the "model declined" template, or dropped tool calls entirely. Only promote to a terminal `content_filter` when the refusal is the sole payload (no visible text AND no tool calls). The refusal explanation is still recorded in provider_data in every case for observability. Refusal-only responses (the bug this feature targets) are unaffected and still surface terminally; the empty+refusal, bare content_filter passthrough, and no-refusal common cases are byte-identical to before. Updates the partial-content test to the corrected contract and adds a tool_calls-alongside-refusal regression guard.	2026-06-14 12:12:52 +05:30
SHL0MS	ab26541b9a	test(transports): lock in content_filter passthrough for OpenRouter OpenRouter (and every other OpenAI-compatible provider) uses the default chat_completions transport, so it is already covered by the refusal fix: an upstream Claude / moderation refusal arrives as finish_reason="content_filter" (often empty content, no message.refusal). Add a regression test asserting the transport passes that finish reason straight through to the loop's content_filter handler. (cherry picked from commit `60168a513b`)	2026-06-14 12:10:08 +05:30
SHL0MS	bb46bf8ce4	fix(agent): surface model refusals instead of retrying them as errors A Claude refusal (HTTP 200, stop_reason="refusal", empty content) was laundered into a generic retry loop and surfaced as a misleading "rate limited / invalid response" or "no content after retries" error, burning paid attempts reproducing a deterministic refusal. This hit two distinct paths: - Direct Anthropic (anthropic_messages): validate_response rejected the empty-content refusal before normalize_response mapped refusal -> content_filter, so it fell into the invalid-response retry loop. - Nous Portal / OpenAI-compatible (chat_completions): the portal surfaces a Claude refusal via message.refusal with empty content, which sailed past validation and died in the empty-response retry loop. Fix (one unified content_filter dispatch for all backends): - AnthropicTransport.validate_response: accept empty content when stop_reason == "refusal" so it flows to normalize_response. - ChatCompletionsTransport.normalize_response: promote message.refusal to content + a content_filter finish reason. - conversation_loop: handle finish_reason == "content_filter" - fire the api_request_error hook (content_policy_blocked), try a configured fallback once, else return a clear terminal refusal message. Never retry a deterministic refusal. Supersedes #43084, which fixed only the direct-Anthropic path and could not reach the chat_completions/portal path. Tests: transport-level (validate_response refusal, message.refusal promotion) + end-to-end loop (refusal surfaced, exactly one API call). (cherry picked from commit `01f546f92c`)	2026-06-14 12:10:08 +05:30
Teknium	7aaae7acd0	fix(ssl): align guard docs and escape hatch	2026-06-13 21:14:32 -07:00
Teknium	af5b526472	fix(ssl): validate CA bundle paths before provider calls	2026-06-13 21:14:32 -07:00
chromalinx	b42c5bf652	test(ssl_guard): fix macOS fallback test that passed for the wrong reason The previous test patched ssl.create_default_context globally with a bare SSLContext that has zero CA certs. Both verify_ca_bundle() and the macOS fallback got the same mocked context, so the test verified nothing useful: both paths produced empty get_ca_certs() and the assertion that no exception escaped was vacuously satisfied. Only mock the fallback call (no cafile) — let the certifi call hit the real SSL stack and fail with SSLError on the broken PEM. The mock fallback returns a context with load_default_certs() so the test now verifies the real scenario: broken certifi → SSLConfigurationError, macOS system trust store → success. Also pads the broken PEM past the 1 KB size guard so the size check doesn't short-circuit before ssl.create_default_context(cafile=...) runs. Reported by @liuhao1024 in PR review.	2026-06-13 21:14:32 -07:00
chromalinx	a218a0f156	fix(agent,gateway,doctor): add SSL CA cert bundle fail-fast guard A stale certifi CA bundle after a partial `hermes update` used to crash the agent on the first outbound HTTPS call with a raw traceback and trap the gateway in a retry loop. This patch: * Adds `agent/errors.py` with a typed `SSLConfigurationError` * Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight that asserts the bundle exists, is non-trivial in size, and can build a working SSLContext. On macOS, it falls back to the system trust store when the bundle is empty but the system store is healthy (covers corporate proxies / MDM setups). * Wires the guard into `run_agent.py` and `gateway/run.py` right after the `hermes_bootstrap` import, inside a try/except so a bug in the guard itself can never prevent startup. * Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so users can detect the failure with one command. * Adds unit tests covering the healthy, missing, empty, skip-env, and macOS-fallback paths. * Adds an RCA document describing the failure mode and the recovery path (`pip install -e .`). When the bundle is broken the user sees: \u26a0\ufe0f SSL certificate bundle issue detected. Run: pip install -e . `HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed environments that ship their own trust store.	2026-06-13 21:14:32 -07:00
Teknium	c8e5f34f24	fix(gemini): strip native self prefixes before generateContent (#36141 ) Strip `google/` and `gemini/` self-prefixes before native Gemini generateContent calls, and keep provider-normalization expectations aligned.	2026-06-13 13:47:08 -07:00
ITheEqualizer	7c0605bf22	fix(telegram): preserve rich formatting on stream final	2026-06-13 13:44:45 -07:00
achaljhawar	819def44c7	fix(agent): scope Nous tags to Nous auxiliary calls	2026-06-13 13:24:40 -07:00
ashishpatel26	957a8ffa88	fix(bedrock): omit sampling params for restricted Claude models Bedrock Converse rejects non-default sampling parameters for Opus 4.7 and 4.8 with a ValidationException. Reuse the Anthropic-native sampling-param guard in the Bedrock kwargs builder so those models omit temperature/topP while older Claude and non-Claude models keep existing behavior. Includes the stop-sequence regression from the parallel fix to ensure stopSequences still pass through for restricted Opus models. Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-13 10:45:56 -07:00

1 2 3 4 5 ...

724 commits