hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-21 10:22:18 +00:00

Author	SHA1	Message	Date
Victor Kyriazakos	3ead2bdd0d	feat(prompt): configurable per-platform system-prompt hint overrides Add platform_hints config so an admin can append to or replace Hermes' built-in platform hint for a single messaging platform (WhatsApp, Slack, Telegram, ...) without affecting other platforms. Enables enterprise managed profiles to steer platform-aware skills (e.g. invoke a custom table-formatting skill on WhatsApp where Markdown tables don't render) while leaving Telegram/Slack/CLI behavior unchanged. - hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG - agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides - agent/system_prompt.py: _resolve_platform_hint() applies append/replace (replace wins; bare string = append shorthand); defensive on bad config - tests: 16 cases covering append/replace/shorthand/isolation/malformed Override only affects the platform-hint segment of the system prompt; SOUL/context/memory tiers and general instructions are unchanged.	2026-06-18 14:28:01 -07:00
kyssta-exe	81ff916e57	fix(agent): flush un-persisted messages before session rotation (#47202) compress_context() rotates the session (end_session -> create_session) mid-turn when auto-compress triggers, but never called _flush_messages_to_session_db() first. Messages generated during the current turn that hadn't been persisted to state.db were silently lost. The same bug existed in cli.py:new_session() (/new command). Both paths now flush un-persisted messages before ending the old session.	2026-06-18 13:38:35 -07:00
Siddharth Balyan	73cd8622f9	feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449 ) * feat(billing): nous_billing http client + BillingState core (phase 2b) Phase 2b terminal-billing client foundation: - hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints (state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired, BillingRateLimited, BillingAuthError) mapped from the live-verified contract; fail-open is the caller's job. Idempotency-Key enforced client-side. - agent/billing_view.py: surface-agnostic BillingState core + Decimal money parsing (server emits decimal strings, not 2dp), fail-open builder, idempotency-key gen, custom-amount validation. - 51 unit tests (decimal parse/format, payload tiering, error->exception matrix, fail-open, amount validation). Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md * feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b) - NOUS_BILLING_MANAGE_SCOPE constant. - nous_token_has_billing_scope(): split-based scope check (no false-positive substring match). - step_up_nous_billing_scope(): re-runs the device flow requesting billing:manage, reusing the held credential's portal/inference URLs + client_id (so a preview stays a preview), persists like _login_nous but WITHOUT the model picker. Returns True iff the minted token carries the scope (False when NAS silently downscopes a non-admin / unticked grant). Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope from a billing call triggers this. 7 unit tests. * feat(billing): billing JSON-RPC methods for the TUI (phase 2b) billing.state / charge / charge_status / auto_reload / step_up in tui_gateway/server.py. 
Return STRUCTURED success envelopes (result.ok + result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise always resolves and the TUI branches on the typed billing error code (insufficient_scope, rate_limited, no_payment_method, …) to render the right affordance. Money serialized as decimal STRINGS + display strings. charge mints + echoes an idempotency_key for retry reuse. 16 unit tests. * feat(billing): /billing CLI handler + command registry (phase 2b) - CommandDef("billing", subcommands=buy\|auto-reload\|limit), added to _SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap parity test green, same as /credits). - cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→ poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal / _prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on 403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal. - 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green. * feat(billing): /billing Ink TUI screens + tests (phase 2b) - ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5 screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/ 5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> → ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay (D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to portal. Client-side amount validation mirrors the server (bounds + 2dp). - gatewayTypes.ts: Billing* response interfaces. - registry.ts: register billingCommands. 
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll- settled/no_payment_method/step-up/limit/auto-reload/validation). TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built. * docs(billing): scrub private cross-repo references NAS is a private repo — remove all references to it from the public PR: - drop the cross-repo planning doc (planning scaffolding, not a deliverable; the PR description documents the design) - replace 'NAS' / 'PR #412 preview' mentions in code + test comments with generic 'the server' / 'a preview deployment' * docs(billing): scrub final NAS reference in step-up docstring * docs(billing): drop dangling plan-doc refs The phase-2b plan doc was removed in the cross-repo scrub (`300afcc0b`) but two module docstrings still pointed at it. Drop the dead refs. * feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes Adds the interactive /billing TUI overlay and hardens the terminal-billing client across CLI and TUI. - TUI: full /billing overlay state machine (overview to buy to confirm, auto-reload, read-only monthly limit) reusing the existing confirm overlay. - Step-up: surface the verification link in-transcript and open the browser via the TUI's own opener (the device flow runs in the headless gateway, so a printed URL was being dropped); run the step-up handler off the main loop and emit the link as an out-of-band event so the gateway stays responsive. - Step-up copy is scope-accurate ("Billing permission granted") and re-checks /state so it never claims "enabled" when the org kill-switch is still off. - Portal deep-links resolve to absolute URLs against the active portal base (the server emits them relative) - fixes a bare "/billing?topup=open" link. - Billing calls refresh an expired access token via the stored refresh token instead of reporting a false "not logged in". 
- Optimistic funnel: advise "set up a saved card on the portal" up front when no card is on file (advisory, not a hard gate). - Token resolution is cached briefly so the 2s charge poll loop stops re-locking + re-reading the auth store on every tick; 401 re-resolves fresh. - Remove the temporary demo-mode shims. Validation: 87 Python billing tests, 88 TS tests (billing command + gateway event handler), tsc clean, ink + ui-tui builds green. * docs(billing): add /billing TUI screenshots for PR * fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test The UI-invalidate throttle read self._last_invalidate unconditionally, which raised AttributeError on HermesCLI instances built without __init__ (the thread-safety test's object.__new__ shell). Guard the read with getattr. The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel cleanly to None instead of falling back to a bare input() that would hang on the slash-worker thread; the test still asserted the old direct-input fallback. Update it to assert the current intended behavior: returns None, calls neither run_in_terminal nor input(), and does not hang.	2026-06-19 01:53:32 +05:30
Brooklyn Nicholson	07e785d60a	fix(prompt): dedupe parallel-tool-call steer; correct its rationale The universal PARALLEL_TOOL_CALL_GUIDANCE block already lives on main, but it shipped with two rough edges this change cleans up: - It duplicated the batching steer for Google models. The GOOGLE_MODEL_OPERATIONAL_GUIDANCE block still carried its own "Parallel tool calls" bullet, so Gemini/Gemma received the instruction twice in one prompt. Drop the redundant bullet — the universal block is now the single source. - Its comment claimed "nothing in the open-source system prompt encouraged batching," which was wrong: the steer existed for Google models only. Reword to say the gap was that every other model got nothing. - Tighten the test that asserts the steer (precedence-correct), and add an invariant guarding against re-introducing the Google duplicate.	2026-06-18 13:22:12 -05:00
Teknium	0fa7d6f660	fix(desktop): never persist or restore a named custom provider as bare "custom" (#48547 ) * Port from cline/cline#11514: encourage parallel tool calls Add a universal system-prompt guidance block telling the model to batch independent tool calls (reads, searches, web fetches, read-only commands) into a single assistant turn instead of one call per turn. The runtime already executes independent batches concurrently (read-only tools always; non-overlapping path-scoped file ops); the open-source system prompt had nothing steering the model to PRODUCE the batch. Fewer round-trips means less resent context, which compounds over a long conversation. - prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static, cache-amortised) modeled on TASK_COMPLETION_GUIDANCE. - system_prompt.py: inject right after the task-completion block, gated by agent.valid_tool_names + the new toggle. - agent_init.py: read agent.parallel_tool_call_guidance (default True). - config.py: add the default under the agent section. - test_prompt_builder.py: behavior-contract tests (batching steer, dependent carve-out, length bound) — invariants, not wording snapshots. Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's Python prompt-assembly architecture and config-over-env conventions. * fix(desktop): never persist or restore a named custom provider as bare "custom" Custom providers vanish from the Desktop/TUI model picker with "No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578) and repeatedly regressed (#44022, #47714) because every fix only recovered the entry identity from a persisted base_url. When a session is persisted/restored with the resolved provider "custom" and NO base_url, bare "custom" leaked through verbatim; resolve_runtime_provider("custom") routes to the OpenRouter default URL with no api_key, so the next turn/resume dies. 
Bare "custom" is the resolved billing class shared by every named providers:/ custom_providers: entry — it is not a routable identity. Centralize the "never let bare custom escape" invariant in one helper, runtime_provider.canonical_custom_identity(), and apply it at all four leak sites in tui_gateway/server.py: - _ensure_session_db_row — the ORIGIN: first DB write seeds the bad row - _runtime_model_config — live persist - _stored_session_runtime_overrides — resume restore (heals old rows; drops unrecoverable bare custom so resume falls back to config default) - _make_agent — rebuild / per-turn The helper recovers custom:<name> from the endpoint URL when present, else from config.model.provider (the durable identity left when no base_url survived). Regression tests in test_custom_provider_session_persistence.py lock the no-base_url vector at every site so it cannot regress again.	2026-06-18 11:11:51 -07:00
Teknium	38c8a9c10f	feat(memory): batch operations for single-turn memory updates (#48507) The memory tool was strictly one-op-per-call. With the store running near its char limit by design, a new add that would overflow gets rejected with 'consolidate now, then retry' -- but the model could not consolidate and add in one call. It had to remove/replace across several turns, then retry the add, each turn re-sending the whole conversation context. Expensive thrash. Add an 'operations' array: a list of add/replace/remove ops applied atomically against the FINAL char budget. The model frees space and adds new entries in ONE call, even when an add alone would overflow. All-or-nothing: any bad op aborts the whole batch, nothing written. Root-cause note: the two agent-level memory interception sites (agent_runtime_helpers.py, tool_executor.py) silently dropped any param not in their explicit kwarg list, so 'operations' never reached the handler and batch calls failed with 'Unknown action None'. Both now pass it through and bridge each add/replace op to external memory providers. Also: success response is now terminal (done=true + 'do not repeat' note, no full-entries echo that invited re-edits); schema rewritten to lead with the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars). Live-verified: near-full consolidate-and-add went 7 calls -> 1 call, stable across 3 reps. 103 memory/approval tests + 398 background-review/run_agent tests green; 6 new batch tests added.	2026-06-18 10:19:33 -07:00
kshitijk4poor	1153b42b24	Merge upstream/main into OpenViking setup-UX (salvage #32445) Resolves conflicts from the OpenViking churn that merged after #32445 was opened (#48042/#47662 session-switch + write hardening, #47311/#47973): - plugins/memory/openviking/__init__.py: keep both __init__ field groups (the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down). - tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new setup-validation tests and main's session-switch/concurrency tests (disjoint additions to the same region). Two fixes layered while reconciling (contributor work otherwise preserved): - Restore the merged tenant-header contract (#22414/#21232). The PR had changed _VikingClient defaults to '' and made empty account/user OMIT the tenant headers; main's contract is that empty falls back to 'default' and the X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them). Reverted the constructor to 'account or os.environ.get(..., "default")' and updated the two PR tests that asserted the omit-when-empty behavior. - Close a secret-file TOCTOU in the setup writers. _write_env_vars and _write_ovcli_config wrote the api_key/root_api_key file and chmod 0600 AFTERWARD, leaving a world-readable window on newly-created files. Added _precreate_secret_file() to create with 0600 before any secret bytes land.	2026-06-18 11:28:51 +05:30
teknium1	c5eb64b9f7	fix(xai): scope native web_search to swap-only + reconcile composer ctx to 200k Salvage corrections on top of @XVVH's #44341: - Make native web_search injection a 1:1 swap for an already-present client web_search function, NOT an additive grant. The original unconditionally appended {"type":"web_search"} on every is_xai_responses turn with any tools, force-enabling Grok server-side search even when the user never enabled the web toolset (bypassing Hermes web-provider config + tool-trace plumbing). Now gated on a client web_search actually being present. - Reconcile grok-composer context to 200000 (merged in #47908) rather than 262144; 200k is xAI's published usable context window for Composer 2.5, 262144 is the /v1/responses input+output budget. - Update tests to match scoped behavior + add a no-web-toolset guard test. - AUTHOR_MAP entry for #44341 salvage. Incomplete-guard (server-side *_call items at in_progress no longer flip has_incomplete_items) and preflight built-in-tool allowlist kept as-is.	2026-06-17 17:33:32 -07:00
XVVH	6f89e17a33	fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context - model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models) - codex transport: inject native {"type":"web_search"} for is_xai_responses; drop client web_search to avoid duplicate-name 400s - codex adapter: do not treat in-progress server-side *_call items as incomplete - tests: adapter, transport build_kwargs, model_metadata, oauth recovery	2026-06-17 17:33:32 -07:00
Teknium	020e59d3cf	fix(agent): dampen empty-name phantom tool-call loop (#47967) (#48109) Weak open models (mimo, nemotron-class) that see tool-call XML/JSON sitting in file contents or tool output get primed and emit their own structured tool calls mimicking the payload — usually with an empty/whitespace name. Those calls can't be fuzzy-repaired toward a real tool, so the dispatch loop returns an error and the model retries. Before this fix, every empty-name error dumped the full tool catalog back to the model, which fed the priming loop more names to mimic and inflated context 3-4x across the retry budget. A blank/whitespace-only tool name now gets a terse anti-priming error that tells the model in-context tool-call syntax is DATA, with no catalog dump. A genuinely-wrong-but-nonempty name (a real typo) still gets the full catalog so the model can self-correct. Not a sandbox/auth boundary issue: Hermes never parses tool-call text from content into executable calls (structured tool_calls only; the lone text->call parser is the Copilot ACP transport and it also rejects empty names). The reporter's own debug dump confirms the injection never executed. Behavior-contract test added: empty-name -> terse error, no catalog; nonempty unknown -> catalog preserved. Exercised end-to-end via run_conversation against an in-process mock provider.	2026-06-17 17:32:14 -07:00
definitelynotguru	eaddeaf2e6	feat(xai): add grok-composer-2.5-fast to xAI OAuth model picker The model is callable via xAI OAuth but omitted from models.dev and /v1/models listings. Merge it into the curated xAI catalog so it appears in `hermes model` without requiring a custom model name.	2026-06-17 09:49:46 -07:00
Teknium	f80381c456	feat(prompt): scale context-file cap to model window + point agent at truncated file (#47846) Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were hard-capped at a flat 20K chars before head/tail truncation. Among the agent harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code, OpenCode, and Cline load them whole. The flat 20K predates large context windows and silently truncates real-world AGENTS.md files. B — dynamic cap: when context_file_max_chars is unset (now the shipped default), the cap scales with the model's context window (ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at the historical 20K; a 200K model gets 48K; large models stop truncating real docs. An explicit context_file_max_chars still wins. Context length is resolved once per conversation (stable -> prompt cache untouched). C — when truncation does happen, the marker now names the concrete file path and tells the agent to read_file it for the full content. Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config (0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate with the path-bearing marker, large windows load whole, and the system prompt is byte-stable across rebuilds.	2026-06-17 05:40:26 -07:00
Max Freedom Pollard	fc1119ca66	fix(curator): stop the rollback safety snapshot from pruning its target Rolling back to the oldest curator snapshot failed and deleted that snapshot. rollback() takes a safety snapshot first, and snapshot_skills() ends by pruning the backups directory down to keep (5 by default). At the steady keep limit that prune removed the oldest snapshot, which is the very one being restored, so the extract found no skills.tar.gz and the rollback stopped with "snapshot extract failed (state restored)". Thread an optional protect set through snapshot_skills() into _prune_old() so the pre rollback safety snapshot can never evict the snapshot being restored. Add two regression tests covering restore of the oldest snapshot at the keep limit. Fixes #47612	2026-06-17 05:40:05 -07:00
Teknium	7bbffceb9c	feat(curator): make skill consolidation opt-in (prune stays default-on) (#47840) The curator now defaults to prune-only: the deterministic inactivity pass (mark stale / archive long-unused skills) still runs whenever the curator is enabled, but the opinionated LLM umbrella-building consolidation fork is OFF by default. - agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate the forked aux-model review in run_curator_review behind it (new consolidate param, None=read config). When off, the LLM pass is skipped entirely (no aux-model cost); the run is still recorded and reported. - config.py: add curator.consolidate (default false); v29->v30 migration seeds the key for existing installs without clobbering a user-set value. - hermes_cli/curator.py: 'hermes curator run --consolidate' override; status shows consolidate state; prune-only notice on run. - docs + tests.	2026-06-17 05:20:32 -07:00
kyssta-exe	4d39a603d1	fix(codex): restore session_id/x-client-request-id HTTP headers for cache routing (#47335)	2026-06-17 05:13:12 -07:00
kshitijk4poor	b70a4e7533	fix(anthropic): also normalize MCP-server tool names to mcp__ on OAuth wire The double-underscore prefix swap fixed bare native tools but SKIPPED tools already named mcp_<server>_<tool> (real MCP servers, e.g. mcp_linear_get_issue): they went on the OAuth wire single-underscore and still tripped Anthropic's third-party billing classifier -> HTTP 400 'extra usage, not plan limits'. Verified empirically against a live Max subscription: a single mcp_ tool flips the whole request to the extra-usage lane; mcp__ is accepted. - build_anthropic_kwargs: promote ANY leading single-underscore mcp_ to mcp__ (bare names -> mcp__name; mcp_<server>_<tool> -> mcp__<server>_<tool>), never double-prefixing an already-mcp__ name. Same for tool_use blocks in history. - normalize_response: reverse the mcp__ wire name back to whichever original the registry knows — the single-underscore mcp_<server>_<tool> form for MCP server tools, or the bare name for native tools — preferring a name that already resolves natively. - Tests rewritten to assert the invariant: ZERO single-underscore mcp_ names reach the OAuth wire, and the mcp__ round-trip resolves back to the registered name for both native and MCP-server tools. Builds on liuhao1024's mcp__ prefix commit (cherry-picked). Closes the MCP-server gap that left any session with an MCP server configured still billing to extra usage.	2026-06-17 13:20:29 +05:30
liuhao1024	3d37869295	fix(anthropic): use double-underscore mcp__ prefix for OAuth tool names Anthropic's Claude-Code request classifier treats tool names with a single-underscore `mcp_<x>` prefix as non-Claude-Code / third-party, routing the request to extra-usage billing (HTTP 400). Real Claude Code uses double underscores: `mcp__<server>__<tool>`. Change the tool-name prefix from `mcp_` to `mcp__` in both the outgoing path (build_anthropic_kwargs) and the incoming path (normalize_response). Update the skip-guard to check for both `mcp_` and `mcp__` prefixes so native MCP server tools (which use the legacy single-underscore format) are not double-prefixed. Fixes #46675	2026-06-17 13:12:23 +05:30
Wolfram Ravenwolf	bd7fc8fdcd	feat(gateway): inject stable human-readable message timestamps Consolidates these related Amy fork patches: - 429830f39 feat(gateway): inject message timestamps into user messages for LLM context - 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay - 2874f7725 feat: human-friendly timestamp format with weekday and timezone name - 3735f4c8b fix: render gateway message timestamps once	2026-06-16 15:49:59 -07:00
Wolfram Ravenwolf	9137b86a52	fix(skills): ignore support docs in skill discovery Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks. This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references. Tests cover: - pruning nested SKILL.md files inside skill support directories - preserving support-named top-level categories - avoiding skill_view collisions from support markdown - keeping archived package SKILL.md files accessible only through file_path	2026-06-16 13:08:34 -07:00
Wolfram Ravenwolf	e76e7b5073	feat(hooks): session:compress event_callback for MemPalace sync	2026-06-16 11:45:36 -07:00
teknium	6ebc449915	fix(prompt): isolate truncation warnings per context Follow-up to salvaged PR #41619: replace the module-global _truncation_warnings list with a contextvars.ContextVar so concurrent gateway-session prompt builds can't drain or clear each other's pending warnings (cross-session leak). Adds a context-isolation test.	2026-06-16 11:28:35 -07:00
Wolfram Ravenwolf	f6a42b1acf	feat(prompt): make context-file truncation limit configurable PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful. SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.	2026-06-16 11:28:35 -07:00
Teknium	c2c55c4443	fix(memory): strip skill scaffolding for all providers, not just openviking Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution affected every auto-syncing memory provider — mem0, hindsight, retaindb, byterover, honcho, supermemory all store/embed the raw user turn, so a /skill invocation poisoned their stores with the full skill body, not just openviking. - Lift the contributor's parser into agent/skill_commands.py as the canonical extract_user_instruction_from_skill_message(), co-located with the message builders so the markers can't drift. - Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} — fixes the whole provider fan-out, bare /skill turns are skipped entirely. - OpenViking's _derive_openviking_user_text() now delegates to the shared helper as defense-in-depth (no duplicated marker literals). - Marker-drift regression now asserts against the canonical skill_commands constants; add manager-level coverage proving every provider gets clean text.	2026-06-16 10:37:37 -07:00
Hao Zhe	2c2ca0443b	feat(memory): improve OpenViking setup UX	2026-06-17 01:04:26 +08:00
brooklyn!	c6e99ab375	Merge pull request #46959 from NousResearch/bb/composer-model-selector feat(desktop): composer model selector, per-model presets & external-provider disconnect	2026-06-16 09:55:57 -05:00
Brooklyn Nicholson	7d938cc5c9	fix(desktop): keep live model switch metadata truthful A live config.set model switch already moved the next API call to the new model, but the conversation could still restore an old sessions.system_prompt snapshot whose Model/Provider lines named the previous runtime. That made "what model are you?" answer from stale metadata even while inference ran on the new model. After a live switch we now refresh the stored system prompt and append a real system-history pivot (not a fake user turn) so the transcript itself records the new model/provider. Restore also rejects already-stale prompt snapshots when their Model/Provider lines disagree with the runtime, so existing bad sessions self-heal.	2026-06-16 09:50:17 -05:00
Teknium	4858942c55	fix(auxiliary): honor main fallback chain for auto tasks (#47235)	2026-06-16 06:23:24 -07:00
Wolfram Ravenwolf	4cf9d80fba	feat(display): verbose skill change notifications with content previews When display.memory_notifications is set to 'verbose', skill_manage notifications now show meaningful change details instead of just the generic tool message. Before (verbose mode): 💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement). After (verbose mode): 💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..." Changes: - skill_manager_tool.py: _patch_skill() now includes old/new string previews (truncated to 200 chars) in the result via '_change' key. _create_skill() and _edit_skill() include skill description from frontmatter for verbose create/edit notifications. - run_agent.py: Background review notification builder now reads the '_change' dict from skill tool results and formats descriptive notifications per action type (patch → old→new diff, create/edit → description preview). Falls back to generic message when _change data is unavailable (backwards compatible). This is especially useful when subagents patch skills, since neither the user nor the parent agent can see what the subagent changed.	2026-06-16 05:45:40 -07:00
Wolfram Ravenwolf	20b1f4f3fb	feat(memory): configurable background memory update notifications Background memory reviews now support three notification modes, configured via display.memory_notifications in config.yaml: off — no chat notification (still logged to stdout/HA log) on — generic '💾 Memory updated' (default, unchanged behavior) verbose — content preview with action indicators: 💾 Memory ➕ Hermes Repo liegt unter /config/amy/hermes-agent/... 💾 Memory ✏️ Updated repo path from claude-code to hermes-agent... 💾 Memory ➖ old entry about claude-code path... Previews are truncated to 120 chars for adds/replaces, 60 for removes. Each action gets its own line in verbose mode for readability. Files: run_agent.py, gateway/run.py	2026-06-16 05:45:40 -07:00
Teknium	3e7e9b24d4	fix: harden salvaged session and browser improvements Polish salvaged contributor work before PR review: - read browser inactivity timeout from config with documented fallback - skip redundant v10 trigram backfill before v11 FTS rebuild - show delegate_task goals safely in progress previews - show gateway status model/context without redundant token wording - wire gateway /sessions to shared session-listing helpers - map Ravenwolf author emails for release attribution Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de> Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>	2026-06-15 07:46:34 -07:00
Amy Ravenwolf	5035fa9029	feat(display): show delegate_task goals in tool progress notifications Previously, delegate_task in batch mode only showed '3 parallel tasks' without revealing what the tasks actually are. Single-task mode showed the goal via the primary_args fallback, but batch mode had no goal extraction. Changes: - build_tool_preview(): Add dedicated delegate_task handler that extracts individual task goals from both single and batch modes. Batch shows '3 tasks: Goal A | Goal B | Goal C'. - _get_cute_tool_message_impl(): Show individual goals in CLI cute messages for batch delegate calls ('3x: Goal A | Goal B'). - Add 4 tests covering single goal, batch goals, missing goals, and no-goal edge case.	2026-06-15 07:46:34 -07:00
Teknium	49e743985a	fix: route minimax m3 reasoning controls through profile Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes. Co-authored-by: goku94123 <gooku94123@gmail.com>	2026-06-15 07:08:43 -07:00
Teknium	733472952a	fix: complete cron jobs lock salvage Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.	2026-06-15 06:29:00 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
xtymac	ad58dd51ac	redact secrets in API request debug dumps dump_api_request_debug() masks the provider Authorization header but writes the request `body` (system prompt, tool defs, context-embedded values) and the error message raw via atomic_json_write. This path also fires unconditionally on API errors (not only under HERMES_DUMP_REQUESTS), so any secret surfaced into context (e.g. an integration token) lands in cleartext at request_dump_*.json on every failed call. Run the serialized dump through the existing redact_sensitive_text() scrubber (already used for logs/tool output) before persisting and before the HERMES_DUMP_REQUEST_STDOUT print; preserve atomicity via temp-file + Path.replace. Also add the Notion internal-integration prefix (ntn_) to _PREFIX_PATTERNS so bare values are caught. Per SECURITY.md §3.2 this is a redaction (in-process heuristic) hardening, not a §3.1 vulnerability. Refs #46583.	2026-06-15 05:31:21 -07:00
liuhao1024	2cddc9c895	fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream converse() and converse_stream() were added in boto3 1.34.59. When Hermes is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46), the system boto3 takes precedence and calls to converse_stream fail with AttributeError. Add an early version check in _require_boto3() that raises a clear RuntimeError with upgrade instructions.	2026-06-15 05:25:17 -07:00
Teknium	f3fe99863d	revert(web): remove keyless Parallel search fallback (#46350) Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced. Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.	2026-06-14 16:47:57 -07:00
mr-r0b0t	bff78a34dc	feat(zai): add GLM-5.2 with verified 1M context window GLM-5.2 ships with a 1M (1,048,576) token context window. Without this entry, Hermes falls through to the generic 'glm' key (202,752 tokens), under-reporting the context bar and prematurely compressing conversations. The 1M limit was verified empirically via needle-in-a-haystack retrieval at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors, zero truncation, correct retrieval at every tested size (25K through 789K). Changes: - agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback - hermes_cli/models.py: add glm-5.2 to zai curated models - hermes_cli/setup.py: add glm-5.2 to setup wizard zai list - hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes - plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models - tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests	2026-06-14 13:50:36 -07:00
Teknium	4e6d05c6a5	perf(skills): share raw config cache in skill utils (#46149)	2026-06-14 11:14:58 -07:00
kshitijk4poor	ce19fdb7ce	fix(skills): apply global|platform disabled union to all resolution sites The platform-disabled fix landed only in agent.skill_utils.get_disabled_skill_names (the system-prompt path). Two sibling resolvers still used the old replace-not-union semantics, so the same skill could be hidden from the <available_skills> prompt yet reported enabled elsewhere: - hermes_cli/skills_config.get_disabled_skills (the 'hermes skills config' UI) returned only the platform list, so a globally-disabled skill showed as enabled (unchecked) on any platform with a platform_disabled entry. - tools/skills_tool._is_skill_disabled (gates whether skill_view loads a skill) ignored the global list when a platform list existed, so a globally-disabled skill could still be loaded on such a platform. Both now union the global list with the platform list, matching get_disabled_skill_names. An explicit empty platform list no longer re-enables a globally-disabled skill: global disables hold on every platform (#46201). Also: fix the now-stale get_disabled_skill_names docstring and drop a stray blank line. Regression tests added for both sites (proven to fail on the old replace semantics).	2026-06-14 22:54:54 +05:30
ibrahim özsaraç	7bbe7024c2	fix: filter platform-disabled skills from <available_skills> prompt (#46201) build_skills_system_prompt() already resolved _platform_hint but called get_disabled_skill_names() with no argument, so the resolved platform never reached the filter and the prompt cache_key varied by platform while the disabled set did not. Pass _platform_hint or None. get_disabled_skill_names() also fully ignored the global 'disabled' list once a platform-specific list was found. Return the union (global | platform) so a globally-disabled skill stays disabled on every platform. Salvaged from #46203 by @iborazzi; the unrelated apps/shared/tsconfig.json ES2023 bump is intentionally dropped (one concern per PR).	2026-06-14 22:52:57 +05:30
Teknium	13a1bd0f83	perf(model-metadata): persist OpenRouter metadata cache (#46114)	2026-06-14 04:45:46 -07:00
Teknium	723c2331bd	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
zccyman	b00060ce54	fix(agent): expose HERMES_REAL_HOME in subprocess envs for profile isolation When profile isolation activates ({HERMES_HOME}/home/ exists), child processes receive HOME={HERMES_HOME}/home/ for tool config isolation (git, ssh, gh). However, scripts using Path.home() to locate ~/.hermes/ would incorrectly resolve to the isolated profile home, breaking helpers that rely on the real user home directory. New get_real_home() helper in hermes_constants resolves the actual user home independently of profile isolation. All four subprocess spawners now inject HERMES_REAL_HOME alongside the profile HOME: - tools/code_execution_tool.py (execute_code) - tools/environments/local.py (terminal background, run_env) - agent/copilot_acp_client.py (Copilot ACP) Child scripts can now use: Path(os.environ.get("HERMES_REAL_HOME", os.environ.get("HOME", ""))) to reliably find the real user home regardless of profile isolation. Closes #25114	2026-06-14 03:20:21 -07:00
helix4u	85e6232a07	fix(providers): support anthropic proxy v1 endpoints	2026-06-14 02:09:16 -07:00
Teknium	81e42335a1	fix(file-safety): relax user-write deny policy (#45947) Allow file tools to edit shell startup files, user package-manager configs, and Hermes control files that the user can already modify directly. Keep hard blocks for SSH keys, .env/OAuth token stores, mcp-tokens, pairing files, and system privilege files.	2026-06-14 02:07:32 -07:00
Brooklyn Nicholson	715b691723	fix(desktop): show summarizing indicator during auto-compaction Auto-compression rewrites history mid-turn, which made long threads look like they reset. Re-tag the gateway lifecycle status as compacting and surface it in the desktop thread loading indicators.	2026-06-14 02:28:07 -05:00
kshitijk4poor	10bd01972b	refactor(agent): share the content_policy_blocked result builder + recovery hint The HTTP-200 refusal handler (finish_reason=content_filter) and the exception-path handler (a provider moderation error classified as content_policy_blocked) independently built the same terminal turn result (the same {final_response, messages, api_calls, completed:False, failed:True, error:'content_policy_blocked: ...'} dict) and ended their user-facing message with the same 'Try rephrasing... hermes fallback add' trailer, copied verbatim. The two copies could drift. Funnel both through a shared _content_policy_blocked_result() builder and a shared _CONTENT_POLICY_RECOVERY_HINT constant. Also collapse the HTTP-200 path's two near-identical with/without-explanation templates into one (compute the detail fragment once) and pass reason=FailoverReason.content_policy_blocked.value to the error hook instead of a hand-written string literal, matching the sibling hook call. Behavior-preserving: the provider/refusal lead-in wording stays distinct (a provider safety filter vs the model declining are genuinely different signals), the with-text and exception messages are byte-identical to before, and the no-explanation case only gains a paragraph break for consistency. Surfaced by the simplify-code reuse/quality reviewers. The efficiency reviewer's 'redundant normalize_response' flag was deliberately NOT applied: that branch is cold (refusal-only) and pure-CPU, and reusing the sibling-branch normalized locals would risk a NameError on the codex_responses path (which sets finish_reason without normalizing); re-normalizing is the robust choice.	2026-06-14 12:19:19 +05:30
kshitijk4poor	12c84d6c77	fix(transports): only treat a refusal as terminal when it is the sole payload A chat-completions response that carries real text or tool calls alongside a `message.refusal` note is a normal, usable turn — the model did work. The prior logic flipped finish_reason to `content_filter` whenever a refusal string was present, so the conversation loop reframed a content-bearing turn as a failed safety refusal (failed=True) and buried the model's actual output inside the "model declined" template, or dropped tool calls entirely. Only promote to a terminal `content_filter` when the refusal is the sole payload (no visible text AND no tool calls). The refusal explanation is still recorded in provider_data in every case for observability. Refusal-only responses (the bug this feature targets) are unaffected and still surface terminally; the empty+refusal, bare content_filter passthrough, and no-refusal common cases are byte-identical to before. Updates the partial-content test to the corrected contract and adds a tool_calls-alongside-refusal regression guard.	2026-06-14 12:12:52 +05:30
SHL0MS	bb46bf8ce4	fix(agent): surface model refusals instead of retrying them as errors A Claude refusal (HTTP 200, stop_reason="refusal", empty content) was laundered into a generic retry loop and surfaced as a misleading "rate limited / invalid response" or "no content after retries" error, burning paid attempts reproducing a deterministic refusal. This hit two distinct paths: - Direct Anthropic (anthropic_messages): validate_response rejected the empty-content refusal before normalize_response mapped refusal -> content_filter, so it fell into the invalid-response retry loop. - Nous Portal / OpenAI-compatible (chat_completions): the portal surfaces a Claude refusal via message.refusal with empty content, which sailed past validation and died in the empty-response retry loop. Fix (one unified content_filter dispatch for all backends): - AnthropicTransport.validate_response: accept empty content when stop_reason == "refusal" so it flows to normalize_response. - ChatCompletionsTransport.normalize_response: promote message.refusal to content + a content_filter finish reason. - conversation_loop: handle finish_reason == "content_filter" - fire the api_request_error hook (content_policy_blocked), try a configured fallback once, else return a clear terminal refusal message. Never retry a deterministic refusal. Supersedes #43084, which fixed only the direct-Anthropic path and could not reach the chat_completions/portal path. Tests: transport-level (validate_response refusal, message.refusal promotion) + end-to-end loop (refusal surfaced, exactly one API call). (cherry picked from commit `01f546f92c`)	2026-06-14 12:10:08 +05:30
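
The rule stated in commit 12c84d6c77 (a refusal is terminal only when it is the sole payload) can be illustrated with a minimal sketch. The function name, signature, and the treatment of whitespace-only text as "no visible text" are assumptions made for illustration; this is not the code in ChatCompletionsTransport.normalize_response.

```python
from typing import Optional, Sequence


def resolve_finish_reason(
    finish_reason: str,
    content: Optional[str],
    tool_calls: Optional[Sequence[dict]],
    refusal: Optional[str],
) -> str:
    """Hypothetical helper: promote to "content_filter" only for refusal-only turns."""
    # Assumption: whitespace-only content counts as "no visible text".
    has_text = bool(content and content.strip())
    has_tool_calls = bool(tool_calls)

    # The refusal is the sole payload: surface it as a terminal content filter.
    if refusal and not has_text and not has_tool_calls:
        return "content_filter"

    # Otherwise the turn carried real work (or no refusal at all), so the
    # provider's finish reason passes through unchanged.
    return finish_reason


if __name__ == "__main__":
    print(resolve_finish_reason("stop", "", None, "I can't help with that"))
    print(resolve_finish_reason("tool_calls", None, [{"id": "call_1"}], "partial note"))
```

In this sketch a response with tool calls alongside a refusal note keeps its original finish reason, which matches the regression guard the commit message describes.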

1305 commits
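
Commits ce19fdb7ce and 7bbe7024c2 in the log above replace "platform list wins" with "global | platform" union semantics for disabled skills. The sketch below shows the difference under an assumed config shape: the `disabled` and `platform_disabled` key names follow the commit messages, but the dict layout and the function name are hypothetical.

```python
from typing import Mapping, Optional, Set


def resolve_disabled_skills(
    skills_config: Mapping[str, object],
    platform: Optional[str] = None,
) -> Set[str]:
    """Hypothetical resolver: union of the global and per-platform disabled lists."""
    # Global disables apply on every platform.
    global_disabled = set(skills_config.get("disabled") or [])

    # Assumed layout: {"platform_disabled": {"whatsapp": ["skill-a"], ...}}
    per_platform = skills_config.get("platform_disabled") or {}
    platform_disabled = set(per_platform.get(platform) or []) if platform else set()

    # Union, not replace: an empty platform list cannot re-enable a
    # globally disabled skill.
    return global_disabled | platform_disabled


if __name__ == "__main__":
    cfg = {
        "disabled": ["legacy-search"],
        "platform_disabled": {"whatsapp": ["markdown-tables"], "slack": []},
    }
    print(sorted(resolve_disabled_skills(cfg, "whatsapp")))
    print(sorted(resolve_disabled_skills(cfg, "slack")))
```

Under the old replace semantics the `slack` lookup would have returned an empty set; with the union, the globally disabled skill stays disabled there too.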