hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-13 09:01:54 +00:00

Author	SHA1	Message	Date
Teknium	7e46533d9f	test: compressed-summary metadata flag set in-process, stripped on wire	2026-06-12 16:47:15 -07:00
konsisumer	aec38855b5	fix(agent): preserve recent turns during compression	2026-06-12 16:26:58 -07:00
xxxigm	68536d4375	test(compressor): regression coverage for assistant-tail anchor + compaction rollup (#29824 ) 21 cases pinning the new ``_ensure_last_assistant_message_in_tail`` anchor and its interaction with the existing tail-cut path: * ``TestFindLastAssistantMessageIdx`` — helper contract: prefers a content-bearing assistant message, skips ``tool_calls``-only stubs, multimodal text-block content counts, falls back to "any assistant" when no content-bearing reply exists, honours ``head_end``, returns -1 when there's none. * ``TestEnsureLastAssistantMessageInTail`` — direct: no-op when already in the tail, walks ``cut_idx`` back when the reply is in the compressed middle, never crosses into the head region, re-aligns through a preceding ``tool_call`` / ``tool_result`` group instead of orphaning it. * ``TestFindTailCutByTokensAnchorsAssistant`` — integration: reporter repro (long tool-output run after the visible reply) now preserves the reply; user and assistant anchors compose in a single tail-cut call; a soft-ceiling-overrunning oversized tool result no longer strands the prior reply. * ``TestCompactionRollupReproduction`` — end-to-end through ``compress()`` with a stubbed ``_generate_summary``: the visible reply text survives either as its own standalone assistant message (normal path) or concatenated onto the merged summary tail (double-collision path the WebUI then re-splits). The standalone-summary case is asserted strictly (exactly one summary row, exactly one separate assistant row carrying the reply) — that's the dominant path and any drift there reintroduces the original bug. * ``TestSourceGuardrail`` — static asserts on ``agent/context_compressor.py``: the helper exists, the anchor is wired into ``_find_tail_cut_by_tokens`` AFTER the user-message anchor (so chaining is monotonic), the content-bearing preference is preserved, and the issue number is referenced so future bisects can find this fix.	2026-06-12 15:41:57 -07:00
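The helper contract pinned by ``TestFindLastAssistantMessageIdx`` can be sketched as follows. This is only an illustration of the listed rules, not the code in ``agent/context_compressor.py``: the function names, the signature, the OpenAI-style message dicts, and the reading of ``head_end`` as "first index outside the head region" are all assumptions.

```python
from typing import Any


def has_visible_content(message: dict[str, Any]) -> bool:
    """Return True when an assistant message carries user-visible text."""
    content = message.get("content")
    if isinstance(content, str):
        return bool(content.strip())
    if isinstance(content, list):
        # Multimodal content: any non-empty text block counts.
        return any(
            isinstance(part, dict)
            and part.get("type") == "text"
            and str(part.get("text") or "").strip()
            for part in content
        )
    return False


def find_last_assistant_idx(messages: list[dict[str, Any]], head_end: int = 0) -> int:
    """Hypothetical helper: index of the last assistant reply at or after head_end.

    Prefers a content-bearing reply, falls back to any assistant message
    (for example a tool_calls-only stub), and returns -1 when none exists.
    """
    fallback = -1
    for idx in range(len(messages) - 1, head_end - 1, -1):
        message = messages[idx]
        if message.get("role") != "assistant":
            continue
        if has_visible_content(message):
            return idx
        if fallback == -1:
            fallback = idx  # latest stub, used only if nothing better is found
    return fallback
```

Scanning from the end and remembering only the first stub keeps the search to a single pass while still preferring a visible reply over a later ``tool_calls``-only message.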
Tranquil-Flow	749b7219c4	fix(compression): always append END OF CONTEXT SUMMARY marker to standalone summaries regardless of role When the compression summary lands as an assistant-role message (head ends with user), the end marker was not appended. Models may regurgitate the summary text as their own visible output when there's no clear boundary signal (#33256). The end marker was already appended for user-role summaries (#11475, #14521) but the assistant-role path was missed in the original fix. This ensures ALL standalone summary messages carry the boundary marker, preventing summary text from leaking into user-visible chat output.	2026-06-12 15:05:00 -07:00
Teknium	652dd9c9f2	fix: rich messages follow-ups — reply_parameters, send latch, opt-in default - Use reply_parameters per the sendRichMessage spec instead of the undocumented reply_to_message_id scalar (silently ignored -> reply anchor quietly dropped). - Latch rich sends off after an endpoint-capability failure (old PTB / server without sendRichMessage) so every later reply doesn't pay a doomed extra roundtrip; per-message BadRequests do NOT latch. - Default rich_messages to OFF (opt-in) while the day-old Bot API 10.1 endpoint is validated live; revert the prompt-hint table guidance until the default flips on. - Tests: reply_parameters shape, send-latch behavior, BadRequest non-latch; rich tests opt in explicitly via extra.	2026-06-12 11:47:54 -07:00
ITheEqualizer	05b9c84ca4	Add Telegram Bot API 10.1 rich message support Introduce opportunistic support for Telegram Bot API 10.1 rich messages by sending raw agent Markdown via sendRichMessage and streaming previews via sendRichMessageDraft. Implements a rich-path fast‑path in gateway/platforms/telegram.py (RICH_MESSAGE_MAX_BYTES=32768, feature gate platforms.telegram.extra.rich_messages, bot capability checks, routing/thread handling, and conservative fallback rules: permanent/capability errors fall back to the legacy MarkdownV2 path, transient/network errors are surfaced without legacy-resend). Also add a latch for draft capability failures (_rich_draft_disabled) and preserve legacy chunking and draft behavior when needed. Update agent prompt hints (telegram encourages rich Markdown/tables), add CLI config example option, update English and Chinese docs to describe rich messages and fallbacks, and add/adjust tests for rich send and draft behavior.	2026-06-12 11:47:54 -07:00
Siddharth Balyan	7ba5df0d52	feat(billing): /credits command — balance + portal top-up handoff (#44776 ) * feat(billing): /usage → portal top-up browser handoff Add the terminal side of the billing slice (phase 2a): start a top-up by throwing the user to the portal billing page with the top-up modal open. The terminal does not confirm, poll, or track payment — checkout completes in the browser and the next /usage shows the new balance. - nous_account.py: parse organisation.slug/name from /api/oauth/account into NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned {base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy {base}/billing?topup=open (never /orgs/None/...). - portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line (Topping up as <email> / org <name>), browser open with printed-URL fallback, no-wait closing copy. No polling/confirmation (deferred to 2b). - account_usage.py: the shared /usage credits block now links the org-pinned top-up URL (auto-opens the modal) + points to the command. Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until that is live on the target env; until then /api/oauth/account returns organisation: { id } only and the URL falls back to legacy. * feat(billing): /credits command for balance + top-up handoff Replace the standalone `hermes portal topup` subcommand with an in-session /credits slash command — a focused money surface (balance in, top-up out) that works in the CLI, TUI, and every messaging platform from one registry entry. - commands.py: register /credits (Info category). Slack is at its 50-slash cap, so /credits is routed via /hermes credits on Slack only (new _SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the native list and breaking Telegram parity; native everywhere else. 
- account_usage.py: build_credits_view() — one portal fetch → balance lines + identity line + org-pinned top-up URL + depleted flag, consumed by all surfaces. Reuses the same snapshot/URL builder as /usage so numbers match. - cli.py: _show_credits() — balance block + identity line + 3-button panel (Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal. ASK, never auto-launch; headless falls back to printing the URL. - gateway/slash_commands.py: _handle_credits_command() — renders the block + tappable top-up URL + no-wait copy; works on button and plain-text platforms. - /usage credits line now points to /credits. - Retire `hermes portal topup` (portal_cli.py back to baseline); the engine (slug/name parse + nous_portal_topup_url) stays as the shared core. No polling, no payment confirmation (billing phase 2a). Depends on NAS #409. * fix(credits): /credits works in the TUI slash-worker (non-interactive) In the TUI, /credits runs in the slash-worker subprocess where there is no live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called the 3-button modal unconditionally, which fell back to reading stdin → exception → slash.exec rejected → the command produced no output (only the pre-existing 'Credit access paused' banner showed). - _show_credits: when self._app is None (TUI worker / piped / non-interactive), render the text variant — balance block + tappable top-up URL + no-wait line, same affordance as the messaging surfaces — and skip the modal entirely. The 3-button panel still renders in the interactive CLI. - Depleted banner copy: 'run /usage for balance' → 'run /credits to top up' now that /credits is the dedicated money surface (+ tests). - Regression tests: _show_credits with self._app=None renders text and never invokes the modal; logged-out path. 
* feat(tui): credits.view RPC for the /credits tappable top-up button Add a credits.view JSON-RPC method returning the structured CreditsView (logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can render a clickable <Link> top-up button instead of plain text. Account- independent (portal fetch gated on a logged-in Nous account), fail-open to {logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern. Frontend (TUI-local /credits command + Ink component) lands separately. * feat(tui): /credits command with keyboard-driven top-up confirm TUI-local /credits: fetches the structured balance via the credits.view RPC, prints the balance + identity + top-up URL, then arms the EXISTING confirm overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel). Reuses ConfirmReq — no new overlay component/state/input handler. Headless (openExternalUrl returns false) falls back to printing the URL. - gatewayTypes.ts: CreditsViewResponse. - commands/credits.ts: the command (mirrors /status's rpc+guarded pattern). - registry.ts: register creditsCommands. - test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases). Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.	2026-06-12 08:51:10 +00:00
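The org-pinned top-up URL with its null-slug fallback, as described for nous_portal_topup_url(), might look like the sketch below. The function name ``topup_url``, its signature, and the example base URL are hypothetical; only the two URL shapes come from the commit message.

```python
from typing import Optional


def topup_url(base: str, slug: Optional[str]) -> str:
    """Hypothetical stand-in for nous_portal_topup_url()."""
    base = base.rstrip("/")
    if slug:
        # Org-pinned billing page with the top-up modal open.
        return f"{base}/orgs/{slug}/billing?topup=open"
    # Null or empty slug: legacy URL, never /orgs/None/...
    return f"{base}/billing?topup=open"


# Placeholder base URL, not the real portal address.
print(topup_url("https://portal.example", "acme"))
print(topup_url("https://portal.example", None))
```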
Tranquil-Flow	286ecd26d8	fix(agent): strip MEDIA directives from compressor summarizer input (#14665)	2026-06-12 01:14:28 -07:00
Teknium	c196269d8d	fix(credits): suppress usage gauge when top-up funds exist + add display.credits_notices toggle (#44716)
The subscription-cap usage gauge (50/75/90% bands) ignored purchased (top-up) credits: a sub user with top-up funds got a sticky warn banner at 90% of their cap (permanently at >=100%, alongside grant_spent) despite being fully able to keep inferencing. The cap is the wrong denominator for an account that can keep spending.
- evaluate_credits_notices: purchased_micros > 0 suppresses the usage band (grant_spent already covers the cap-reached + top-up case with the remaining balance). A top-up landing mid-session clears any showing band; spending top-up down to 0 resumes the gauge.
- New display.credits_notices config (default true): false silences all credits notices. State capture and /usage are unaffected. Read once per agent (cached) in _emit_credits_notices, fail-open true.
- Docs: configuration.md display block.	2026-06-12 01:06:46 -07:00
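A minimal sketch of the suppression rule, assuming the gauge is a percent-of-cap band selection. ``usage_band``, its parameters other than ``purchased_micros``, and the return convention are hypothetical; the 50/75/90 bands and the ``purchased_micros > 0`` condition come from the commit message.

```python
from typing import Optional

USAGE_BANDS = (90, 75, 50)  # percent-of-cap thresholds, highest first


def usage_band(spent_micros: int, cap_micros: int, purchased_micros: int) -> Optional[int]:
    """Return the usage band to show, or None when no band applies."""
    if purchased_micros > 0:
        # Top-up funds exist: the cap is the wrong denominator, so show nothing.
        return None
    if cap_micros <= 0:
        return None
    percent = spent_micros * 100 / cap_micros
    for band in USAGE_BANDS:
        if percent >= band:
            return band
    return None
```

Because the check reads the current balance on every evaluation, a top-up landing mid-session clears the band and spending it down to 0 brings the gauge back, matching the behavior described above.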
kshitijk4poor	15439bee47	refactor(memory): reuse _summarize_user_message_for_log instead of forking it The original fix added agent/memory_manager.py:flatten_message_content, but that helper was a near-exact duplicate of agent/codex_responses_adapter.py:_summarize_user_message_for_log — same None/str/list dispatch, same {text,input_text,output_text}/{image_url,input_image} part sets, the identical [N image(s)] marker, and the same str() fallback. The only difference was the join separator (newline for memory vs space for the log/trajectory previews the existing helper already serves), and that helper is already imported into agent/turn_finalizer.py — the same file whose call site the memory fix touches. Parameterize the existing helper with sep=' ' (default preserves every current logging/trajectory caller byte-for-byte) and call it with sep='\n' at the memory boundary; drop the forked flatten_message_content. Repoints the unit tests to the consolidated helper and adds a case locking the default space-join. Single source of truth for multimodal-content flattening; no behavior change for the fix or for existing callers.	2026-06-12 12:49:18 +05:30
Erosika	87893fe4cb	fix(memory): flatten multimodal content before provider sync Multimodal turns carry message content as a list of typed parts ({type: "text"|"image_url", ...}). _sync_external_memory_for_turn passed that list straight into MemoryManager.sync_all, and providers feed it to regexes; Honcho's sync_turn calls sanitize_context, where re.sub raised 'expected string or bytes-like object, got list'. Every turn with an attached image silently never synced. Flatten to plain text at the boundary: text parts joined, images noted as an [N image(s)] marker so the attachment isn't erased from recall. Fixing here covers all providers instead of patching each plugin. (cherry picked from commit 705bdb6ffe)	2026-06-12 12:46:28 +05:30
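The flattening described in this commit and in the follow-up refactor (None/str/list dispatch, text and image part sets, an [N image(s)] marker, a configurable separator) can be sketched as below. ``flatten_content`` is a hypothetical name, and the exact marker format and handling of malformed parts are assumptions rather than the behavior of _summarize_user_message_for_log.

```python
import re
from typing import Any

TEXT_TYPES = {"text", "input_text", "output_text"}
IMAGE_TYPES = {"image_url", "input_image"}


def flatten_content(content: Any, sep: str = " ") -> str:
    """Flatten message content (None, str, or a list of typed parts) to text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts: list[str] = []
        images = 0
        for part in content:
            if not isinstance(part, dict):
                continue  # sketch choice: ignore malformed parts
            kind = part.get("type")
            if kind in TEXT_TYPES and part.get("text"):
                texts.append(str(part["text"]))
            elif kind in IMAGE_TYPES:
                images += 1
        if images:
            # Keep a trace of the attachment so it is not erased from recall.
            texts.append(f"[{images} image(s)]")
        return sep.join(texts)
    return str(content)


turn = [
    {"type": "text", "text": "what is in this photo?"},
    {"type": "image_url", "image_url": {"url": "https://example.invalid/cat.png"}},
]
flat = flatten_content(turn, sep="\n")
# A regex-based sanitizer now receives a str instead of a list.
print(re.sub(r"\s+", " ", flat))
```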
Teknium	c7bee8f961	refactor(agent): drop unused tail_start param from _derive_auto_focus_topic The parameter was reserved-but-unused (del'd immediately); YAGNI. Test call site updated.	2026-06-11 23:03:52 -07:00
konsisumer	434c684bfa	fix(agent): focus automatic compression on recent user turns	2026-06-11 23:03:52 -07:00
Teknium	db7714d5f1	Merge pull request #44331 from NousResearch/hermes/hermes-6b48295e feat(whatsapp): WhatsApp Business Cloud API adapter (salvage #43921)	2026-06-11 22:48:06 -07:00
Brooklyn Nicholson	ab06ef8ed6	fix(coding): teach agents terminal env state persists Tell coding agents to activate shell setup once per session instead of re-sourcing it before every command, and pin the existing LocalEnvironment env-snapshot behavior with regression tests.	2026-06-11 19:50:08 -05:00
ethernet	96cc7ee1e3	fix(coding): don't provide worktree root in context this makes the agent frequently edit files in the wrong worktree. what the agent doesn't know can't hurt it.	2026-06-11 20:27:06 -04:00
Teknium	acb2954d82	fix(agent): freeze carveout-era SUMMARY_PREFIX for renormalization The prompt consolidation above retires the carveout-era prefix. Without a frozen copy in _HISTORICAL_SUMMARY_PREFIXES, summaries persisted by pre-upgrade builds would lose detection (_is_context_summary_content) and renormalization (_strip_summary_prefix) — the exact regression class the tuple exists to prevent. Adds contract tests covering every frozen prefix. Refs #41607 #38364 #42812	2026-06-11 13:57:13 -07:00
kyssta-exe	8f8cad7ec5	fix(agent): strengthen compression preamble against stale task execution (#41607)	2026-06-11 13:57:13 -07:00
konsisumer	d5e2fbf244	fix(agent): frame compaction handoff sections as historical context	2026-06-11 13:57:13 -07:00
brooklyn!	a4f179c509	fix(agent): steer GPT/Codex family to V4A for single-file edits too (#44411 ) The coding-posture brief told GPT/Codex models to use patch mode='patch' (V4A) for structured/multi-file changes but mode='replace' "for a single small swap". That second nudge points those models at a format their first-party harness never taught them. Verified against openai/codex (current main): apply_patch is the ONLY file editor in codex-rs — zero occurrences of str_replace/old_string anywhere in the repo; the grammar (core/src/tools/handlers/apply_patch.lark) is exactly the V4A dialect our patch_parser implements; the shipped model prompts (gpt_5_codex, gpt-5.2-codex, gpt-5.1-codex-max + instruction templates) explicitly say to use apply_patch "for single file edits"; and the tool is gated per model via ModelInfo.apply_patch_tool_type, i.e. OpenAI ships V4A-for-everything as model metadata. The GPT-family line now steers to mode='patch' for all edits, single-file included. The replace-family line (Claude + open-weight) is unchanged — Claude Code's FileEdit is old_string/new_string/replace_all exact string replacement (confirmed from Anthropic's shipped sdk-tools.d.ts, the only file editor in its tool union), matching our mode='replace'.	2026-06-11 17:52:52 +00:00
Teknium	4d6a133a9f	fix(agent): gate skill-index demotion behind the opt-in focus mode (#44387) The coding posture's names-only demotion of non-coding skill categories (#44342) applied under the default auto mode, silently changing the skill index for every user in a git repo. Index changes must be opt-in: demotion now only fires under agent.coding_context=focus, alongside the toolset collapse. auto/on leave the skill index untouched; focus semantics are unchanged (demoted, never hidden; deny-list keeps coding-adjacent and custom categories at full entries).	2026-06-11 10:00:57 -07:00
brooklyn!	ee1a744ace	fix(agent): demote non-coding skill categories to names-only — never hide skills (#44342 ) Real-world failure with the original index pruning: under the default auto posture, an agent-created ops skill in a demoted category vanished from the prompt's skill index mid-project, and the agent silently fell back to a stale sibling skill instead. The "discovery-only" premise didn't hold — models do not reach for skills_list to rediscover what the index stops showing them, and agent-created skills are the model's accumulated project memory (runbooks, pitfalls, operating rules). Gating pruning behind the opt-in focus mode was the wrong fix too: users opening a worktree don't know the config exists, so the index-noise win would effectively never ship. Instead, the coding posture now DEMOTES non-coding categories rather than hiding them: each demoted category renders as a single names-only line ("gaming [names only]: allthemons10-ops, mc-backup") with a footer note explaining the omitted descriptions. Every skill name stays in the prompt, so memory-anchored recall ("load <name>") keeps working in every mode, while the description noise is still cut. Applies in auto/on/focus alike; the general posture demotes nothing. Deny-list semantics unchanged — unknown/custom categories and coding-adjacent ones keep full entries. API renamed to match the honest semantics: hidden_skill_categories → compact_skill_categories, build_skills_system_prompt(hidden_categories=) → compact_categories=.	2026-06-11 10:25:42 -05:00
Teknium	2ecb4e62bb	Merge remote-tracking branch 'origin/main' into hermes/hermes-6b48295e	2026-06-11 07:38:25 -07:00
Teknium	e24c935cf3	fix(bedrock): fall back to non-streaming InvokeModel when IAM denies InvokeModelWithResponseStream (#44293 ) IAM policies scoped to bedrock:InvokeModel only (a common least-privilege setup) reject converse_stream() with AccessDeniedException. The agent loop hard-prefers streaming and the denial never matched the 'stream not supported' auto-fallback, so InvokeModel-only users looped on AccessDenied forever. - agent/bedrock_adapter.py: new is_streaming_access_denied_error() detector (ClientError code check + wrapped-SDK message match); call_converse_stream() falls back to converse() on denial. - agent/chat_completion_helpers.py: bedrock_converse streaming branch retries inline via converse() and sets _disable_streaming so later turns skip the doomed stream attempt; the chat-completions retry block also recognizes the denial for the AnthropicBedrock SDK path (message pre-check avoids importing bedrock_adapter — and its lazy boto3 install — for unrelated providers). Both paths print a one-line notice telling the user which IAM action restores streaming.	2026-06-11 07:15:30 -07:00
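The detect-and-fall-back flow might be sketched as follows. Both function names are hypothetical stand-ins for is_streaming_access_denied_error() and the converse() fallback, and the matching rules are assumptions. The only library detail relied on is that botocore's ClientError exposes a ``response`` dict with an ``Error`` / ``Code`` entry; the sketch duck-types that attribute so it runs without boto3 installed.

```python
from typing import Any, Callable


def is_streaming_access_denied(exc: Exception) -> bool:
    """Hypothetical detector for an IAM denial of the streaming action."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        # Shape of botocore ClientError.response: {"Error": {"Code": ..., "Message": ...}}
        if response.get("Error", {}).get("Code") == "AccessDeniedException":
            return True
    # Wrapped-SDK path: only the message text survives.
    text = str(exc)
    return "AccessDenied" in text and "InvokeModelWithResponseStream" in text


def call_with_fallback(stream_call: Callable[[], Any], plain_call: Callable[[], Any]) -> Any:
    """Try the streaming call; on an IAM streaming denial use the non-streaming one."""
    try:
        return stream_call()
    except Exception as exc:
        if is_streaming_access_denied(exc):
            return plain_call()
        raise
```

Unrelated errors are re-raised unchanged, so the fallback only covers the denial case.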
brooklyn!	3e74f75e41	feat(agent): coding-context posture across CLI/TUI/desktop/ACP (#43316 ) * feat(agent): coding-context posture with per-model edit-format tuning Hermes detects when it's running in a coding context — an interactive surface (CLI, TUI, ACP, desktop) sitting in a code workspace (git repo or recognised project root) — and shifts into a coding posture. Outside that (chat platforms, non-workspaces) nothing changes. The posture is modelled as a frozen RuntimeMode selected from a small ContextProfile registry (coding/general). A profile is data: the toolset to collapse to, the operating brief to inject, and seams for model routing and memory. Every domain reads the same resolved object instead of re-probing git/config on its own: - System prompt — RuntimeMode.system_blocks(): an operating brief (gather context before editing, edit through tools not chat, verify with terminal, cap retry loops) plus a live git/workspace snapshot, built once and baked into the stable prompt tier so per-conversation caching is preserved. - Per-model edit-format tuning — the brief nudges each model family toward the patch mode it handles best: OpenAI/Codex toward mode='patch' (V4A multi-file diffs), Anthropic toward mode='replace' (string replacement). The model id rides on RuntimeMode; unknown families keep neutral wording. - Skill index — non-coding skill categories are pruned from the prompt's skill index (discovery-only; skills_list/skill_view still reach the full catalog, with a disclosure note). - Toolset — only under the opt-in 'focus' mode does the posture collapse to the coding toolset + enabled MCP servers; the default posture is prompt-only and never overrides configured toolsets. Activation via agent.coding_context: auto (default), focus, on, off. Subagents inherit the posture for free via toolset inheritance + the shared prompt builder. Detection is not memoized so a long-lived gateway/TUI process can't pin a stale posture across working directories. 
* feat(agent): cover new-file authoring in the coding edit-format nudge The per-model edit-format guidance only addressed editing existing code (patch mode='patch' vs 'replace'), but authoring a brand-new file — write_file, not patch — is a large fraction of real coding work and the nudge was silent on it. Surfaced when building a single-file artifact where the dominant operation was write_file and the steering offered no guidance. Both family lines now lead with "author new files with write_file; for edits to existing code prefer ...". Tests assert write_file appears in each family's brief; unknown families still get neutral wording. * docs(agent): correct memoization docstring + clarify TUI config-load asymmetry * feat(agent): sharpen the coding posture — verify-loop facts, wider edit steering, $HOME guard Tuning pass on the coding posture from dogfooding it as a harness: - Workspace snapshot now hands the model its verify loop up front: detected manifests + package manager (lockfile sniff), the exact verify commands (package.json scripts, Makefile targets, scripts/run_tests.sh, pytest config), and which context files (AGENTS.md / CLAUDE.md / .cursorrules) exist at the root. Marker-only (non-git) projects get the snapshot too instead of nothing. The "verify before claiming done" brief line was the highest-value piece in evals — this turns it from advice into an executable loop instead of making the model rediscover the test command every session. Still stat-cheap, size-guarded reads, built once at prompt time. - Edit-format steering covers the families Hermes actually serves: Gemini and open-weight coding models (DeepSeek, Qwen, Kimi, GLM, Grok, Hermes, Llama, Mistral, Devstral, MiniMax) steer to mode='replace' — their RL scaffolds use str_replace-style editors. Previously only GPT/Codex and Claude families got steering; the models Hermes users disproportionately run all fell to neutral. 
- Operating brief gains four behaviors elite harnesses encode: batch independent reads/searches in one turn; fix root causes and the bug class (sibling call paths), not the reported site; no drive-by refactors/renames/reformatting; never read, print, or commit secrets. Plus a patch-failure escalation ladder: after the same region fails twice, rewrite the enclosing function/file with write_file instead of a third patch attempt. - $HOME dotfiles guard: a git repo rooted exactly at the home directory (or a marker sitting in it, e.g. a global ~/AGENTS.md) is user config, not a code workspace — without the guard, every session anywhere under a dotfiles-managed home silently flipped to the coding posture. Real projects under such a home still detect via their own markers/repos; 'on' mode bypasses the guard.	2026-06-10 23:06:44 -05:00
teknium1	efcbbde48c	refactor: keep anthropic_content_blocks in-memory only (no state.db column) Drop the hermes_state.py column + persistence plumbing from the salvaged interleaved-thinking fix. The ordered-block channel covers the failure window in-memory (turn replayed within the live conversation loop). A session reloaded from disk after a crash falls back to reconstruction; if that replay 400s, the thinking-signature recovery (#43667) strips reasoning_details and retries — one degraded call in a rare resume path instead of a schema column. Replaces the DB-roundtrip test with a fallback-shape test.	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	7a1eed8268	fix(anthropic): redact replayed tool inputs and broaden thinking-replay 400 recovery Two additive hardening changes on the interleaved-thinking replay path introduced by this PR's anthropic_content_blocks channel. Both are scoped to that channel's blast radius; neither changes correct behavior. 1. Replay-time tool-input re-sourcing (credential safety). The ordered-block channel captures each tool_use `input` from the RAW API response in normalize_response, which is NOT credential-redacted. The parallel tool_calls[].function.arguments IS redacted at storage time (build_assistant_message, #19798). The verbatim-replay fast path in _convert_assistant_message replayed the raw block input, so a secret a model inlined into a tool call (e.g. an Authorization header value passed inside a terminal command) would ride back onto the wire even though it is redacted everywhere else in history. Re-source tool_use input from the redacted tool_calls map by sanitized id; interleave order (the reason this channel exists) is unaffected. Adapted from #36071, which re-sources tool inputs the same way on its replay path. 2. Broaden the thinking-replay 400 classifier (defense-in-depth). error_classifier only matched "signature" + "thinking", so the frozen-block variant — "thinking ... blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response." — carried no "signature" token and fell through to a non-retryable abort. The anthropic_content_blocks channel prevents the reorder that triggers this 400 at the source, but if any future mutator reintroduces it, the turn now self-heals via the existing strip-reasoning-and-retry recovery instead of crash-looping. A negative case ensures an unrelated "cannot be modified" 400 (no "thinking") is not swept in. Mirrors the classifier broadening in #36087 and #36071. 
Tests - tests/agent/test_anthropic_thinking_block_order.py: a replay test asserting an inlined secret is redacted on the wire while interleave order is preserved. - tests/agent/test_error_classifier.py: three cases — frozen-block 400 native and via OpenRouter route to thinking_signature/retryable; an unrelated "cannot be modified" 400 does not. Both grafts verified RED (tests fail with the change reverted) then GREEN. Full adapter, transport, classifier and output-field-leak suites pass. Co-authored-by: AlexanderBFoley <92330381+AlexanderBFoley@users.noreply.github.com>	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	529bb1c3d5	fix(anthropic): strip output-only SDK fields from replayed content blocks HTTP 400 "messages.N.content.M.text.parsed_output: Extra inputs are not permitted" on the native Anthropic transport. Anthropic SDK 0.87.0 response blocks carry output-only attributes the Messages input schema forbids: text blocks get `parsed_output` and `citations=None`, tool_use blocks get `caller`. normalize_response captured blocks verbatim via _to_plain_data and replayed them as request input on the next turn, so the forbidden fields leaked back -> 400. Like the earlier thinking-block bug, one poisoned turn wedges every subsequent request in the session (even the diagnostic turn), recoverable only by switching models or deleting the session. This is a defect in the anthropic_content_blocks channel added for the interleaved-thinking fix: it preserved block ORDER correctly but copied every SDK attribute, including output-only ones. Fix — whitelist input-permitted fields per block type at all three leak points: - agent/transports/anthropic.py normalize_response: sanitize at CAPTURE so the poison never persists to state.db (defence-in-depth). - agent/anthropic_adapter.py _sanitize_replay_block (new): whitelist used on the ordered-blocks replay path; also recovers already-poisoned stored sessions. - agent/anthropic_adapter.py _convert_content_part_to_anthropic: a stored `text` part is rebuilt from whitelisted fields instead of dict(part) verbatim (this was the exact content.N.text.parsed_output failure locus). Whitelist not blacklist, so future SDK output-only fields can't reintroduce it. Block order and thinking-block signatures are preserved (the reason the channel exists). Adds tests/agent/test_anthropic_output_field_leak.py; full adapter suite green (163 tests). Existing poisoned state.db rows scrubbed out-of-band.	2026-06-10 20:45:16 -07:00
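A whitelist of this kind can be sketched as below. The field lists are an illustrative assumption, not the allow-list shipped in _sanitize_replay_block, and ``sanitize_block`` is a hypothetical name; passing unknown block types through unchanged is a choice made only for this sketch.

```python
from typing import Any

# Assumed allow-list for illustration; the real per-type lists may differ.
ALLOWED_FIELDS = {
    "text": ("type", "text"),
    "tool_use": ("type", "id", "name", "input"),
    "thinking": ("type", "thinking", "signature"),
    "redacted_thinking": ("type", "data"),
}


def sanitize_block(block: dict[str, Any]) -> dict[str, Any]:
    """Rebuild a content block from whitelisted fields only."""
    allowed = ALLOWED_FIELDS.get(block.get("type"))
    if allowed is None:
        return dict(block)  # sketch choice: unknown types pass through
    return {key: block[key] for key in allowed if key in block}


captured = {"type": "text", "text": "done", "parsed_output": None, "citations": None}
print(sanitize_block(captured))
```

Rebuilding from an allow-list instead of deleting known bad keys means an output-only field added by a later SDK release is dropped without a code change.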
RaumfahrerSpiffy	aaccaada28	fix(anthropic): preserve interleaved thinking/tool_use block order on replay Interleaved-thinking turns (adaptive thinking, Claude 4.6+/Opus 4.8) emit content blocks like: thinking_1(signed) tool_use_1 thinking_2(signed) tool_use_2 Anthropic signs each thinking block against the turn content preceding it at its position. normalize_response split the turn into two parallel lists (reasoning_details + tool_calls), discarding cross-type order, and _convert_assistant_message rebuilt it as [all thinking][text][all tool_use]. That moved thinking_2 ahead of tool_use_1, invalidating its signature, so Anthropic rejected the latest assistant message with HTTP 400: messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. Observed repeatedly in agent.conversation_loop against api.anthropic.com / claude-opus-4-8, recurring across sessions on multi-thinking-block turns. Fix: carry a verbatim, order-preserving copy of the turn's content blocks (anthropic_content_blocks) end-to-end - capture in normalize_response, persist/restore through state.db, and replay unchanged for the latest assistant message. Gated to turns that actually interleave signed thinking with tool_use, so normal turns are unaffected. Adds 3 regression tests including a SQLite round-trip covering the crash-recovery reload path.	2026-06-10 20:45:16 -07:00
Matt Harris	e0e2571711	feat(web): Parallel-backed web search & extract — free Search MCP when keyless, v1 REST when keyed Make Parallel the web search/extract backend with a zero-setup free tier: - Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel becomes the default backend when no other web credentials are configured (ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing web_search/web_extract tools are the only tools registered. - Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints (client.search / client.extract with advanced_settings.full_content) — no beta. Bumps parallel-web 0.4.2 -> 0.6.0. - Attribution: on the free path only, results carry provider/attribution and the CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is unbranded. - Selection/registration: web tools register unconditionally (free MCP backstop) while check_web_api_key remains a real usability probe; explicit per-capability backends are honored (so misconfig surfaces) rather than masked by the fallback. Tested: live web_search/web_extract against search.parallel.ai in keyless and keyed modes; unit suites for the MCP client, backend selection, and display labeling; full agent run shows the "Parallel search" label on the free path.	2026-06-10 19:54:38 -07:00
emozilla	bfcc9f92b4	Merge commit '6110aed9b' into feat/whatsapp-cloud-api	2026-06-10 21:39:22 -04:00
xxxigm	615ad97928	fix(streaming): stop socket read timeout from preempting stale-stream detector (#43570)
* fix(streaming): stop socket read timeout from preempting stale-stream detector The stale-stream detector is deliberately scaled to 180-300s so reasoning models (e.g. Opus) can pause mid-stream during extended thinking. But the httpx socket read timeout stayed at a flat 120s for cloud providers and fired first, tearing down healthy reasoning streams before the detector (which owns retry + diagnostics) could act. Symptom: every Copilot/Opus turn dies with ReadTimeout at a consistent ~125s and never completes. Floor the cloud socket read timeout at the stale-stream timeout so it can no longer fire before the detector. Local providers and explicit HERMES_STREAM_READ_TIMEOUT / request_timeout_seconds overrides are unchanged.
* test(streaming): pin read-timeout >= stale-stream invariant for cloud reasoning streams Cover the contract that the httpx socket read timeout is never shorter than the stale-stream detector for cloud providers on the default: small contexts floor to 180s, >=50K to 240s, >=100K to 300s; explicit overrides win; local providers and the unresolved-value fallback are unaffected.	2026-06-10 20:21:38 -05:00
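The invariant (read timeout never shorter than the stale-stream timeout for cloud providers) can be sketched as below. The function names and parameters are hypothetical, and treating the 50K / 100K thresholds as context sizes in tokens is an assumption; the 180/240/300s tiers and the 120s default come from the commit message.

```python
from typing import Optional


def stale_stream_timeout(context_tokens: int) -> float:
    """Stale-stream detector timeout, scaled by context size."""
    if context_tokens >= 100_000:
        return 300.0
    if context_tokens >= 50_000:
        return 240.0
    return 180.0


def effective_read_timeout(
    base_timeout: float,
    context_tokens: int,
    is_local: bool = False,
    override: Optional[float] = None,
) -> float:
    """Hypothetical resolver for the socket read timeout."""
    if override is not None:
        return override  # explicit overrides win
    if is_local:
        return base_timeout  # local providers are left unchanged
    # Cloud default: never let the socket timeout fire before the detector.
    return max(base_timeout, stale_stream_timeout(context_tokens))


print(effective_read_timeout(120.0, context_tokens=8_000))
```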
Ian Culling	86e10dd874	fix(agent): route 'thinking blocks cannot be modified' 400 to recovery Anthropic returns a 400 when the thinking/redacted_thinking blocks in the latest assistant message are mutated upstream: 'thinking or redacted_thinking blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.' The classifier's thinking_signature branch only matched on the substring 'signature', so this variant fell through to a non-retryable client error and hard-aborted the turn -- even though the existing strip-reasoning_details-and-retry recovery would have healed it. Broaden the 400 match to also catch 'cannot be modified' / 'must remain as they were' (still gated on 'thinking'), routing it to the same recovery. Adds a negative-case test so unrelated 'cannot be modified' 400s are not swept in. Defense-in-depth, orthogonal to the root-cause work in #35975 / #17861 (which prevent the block mutation in the first place). Only changes a terminal-failure into a one-shot recovery. Signed-off-by: Ian Culling <ian@culling.ca>	2026-06-10 12:39:44 -07:00
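A rough sketch of the broadened match described above, under stated assumptions: the function name and signature are hypothetical, and the real classifier's structure is not shown in the log. Only the three marker substrings and the gating on 'thinking' are taken from the commit message.

```python
# Markers named in the commit message; matching is case-insensitive here,
# which is an assumption.
THINKING_ERROR_MARKERS = (
    "signature",
    "cannot be modified",
    "must remain as they were",
)


def is_recoverable_thinking_400(status_code: int, message: str) -> bool:
    """Hypothetical predicate: should this 400 go to the strip-and-retry recovery?"""
    if status_code != 400:
        return False
    text = message.lower()
    # Gate on 'thinking' so unrelated 'cannot be modified' errors are not swept in.
    if "thinking" not in text:
        return False
    return any(marker in text for marker in THINKING_ERROR_MARKERS)
```

The gate is what the negative-case test in the commit protects: a 400 that says "cannot be modified" about some other field stays a terminal client error.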
rob-maron	6110aed9be	Suppress "Credit access paused" notice on free models (#43669) * don't show credits message on free model * PR comments	2026-06-10 23:55:06 +05:30
Gille	47e77ae166	fix(curator): use shared atomic state writer	2026-06-10 03:04:54 -07:00
Robin Fernandes	af978ecb17	fix(model): require confirmation for expensive model selections Rebased onto current main and re-ported across the restructured surfaces: model flows now thread confirm_provider/base_url/api_key through hermes_cli/model_setup_flows.py, the Discord picker lives in plugins/platforms/discord/adapter.py, and the web dashboard picker applies chat-mode switches via config.set so the expensive-model confirmation can ride the response. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 00:24:06 -07:00
teknium	2ce3ae3d16	fix(error-classifier): don't misclassify unsupported-param 400s as context overflow A GPT-5 model rejecting max_tokens returns a 400 whose message contains the literal substring 'max_tokens' — one of the _CONTEXT_OVERFLOW_PATTERNS. The 400 path in _classify_400 checked overflow patterns before any request-validation check (which only existed on the 5xx path), so the parameter error was routed into the compression loop, re-sent with the same bad param, and ended in 'Cannot compress further' on a tiny context. Hoist a request-validation guard (unsupported/unknown parameter) above the context-overflow check in _classify_400. Deliberately excludes the generic invalid_request_error code, which OpenAI also stamps on real overflow 400s, so genuine overflows still compress. Pairs with the max_completion_tokens param fix that stops the bad request at the source. Also adds AUTHOR_MAP entry for the salvaged PR #13902 commit.	2026-06-09 23:22:10 -07:00
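The ordering change in the commit above (check request validation before context overflow) might look roughly like this. It is a sketch only: the pattern tuples and return labels are illustrative assumptions, and 'max_tokens' is the single overflow pattern the commit message actually names.

```python
# Illustrative patterns. Only 'max_tokens' is confirmed by the commit message
# as an overflow pattern; the others are assumptions for the sketch.
CONTEXT_OVERFLOW_PATTERNS = ("max_tokens", "context length")
REQUEST_VALIDATION_PATTERNS = ("unsupported parameter", "unknown parameter")


def classify_400(message: str) -> str:
    """Hypothetical classifier for a 400 response body."""
    text = message.lower()
    # Guard first: a rejected parameter must not be mistaken for overflow,
    # even when the parameter's name contains an overflow pattern.
    if any(p in text for p in REQUEST_VALIDATION_PATTERNS):
        return "request_validation"
    if any(p in text for p in CONTEXT_OVERFLOW_PATTERNS):
        return "context_overflow"
    return "client_error"
```

If the two checks were swapped, the first example in the commit (a model rejecting `max_tokens`) would match the overflow pattern and re-enter the compression loop.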
Xiangji	19c07c4037	fix(params): send max_completion_tokens for newer OpenAI families on custom endpoints Third-party OpenAI-compatible endpoints (self-hosted gateways, OpenRouter, Azure proxies) fronting gpt-4o / gpt-4.1 / gpt-5+ / o1-o4 models silently received max_tokens and 400'd with unsupported_parameter, because the three kwarg-selection sites only checked base_url_hostname(...) == "api.openai.com" and fell through to max_tokens on every other host. The constraint is enforced server-side by the model family, not by the URL, so name-based detection is required as a fallback. Changes: - utils.py: new shared helper model_forces_max_completion_tokens(model) that prefix-matches gpt-4o, gpt-4.1, gpt-5, o1, o3, o4 families on normalized (lowercased, vendor-prefix-stripped) names. - run_agent.py: _max_tokens_param ORs the helper into the URL check. - agent/auxiliary_client.py: - auxiliary_max_tokens_param gains an optional keyword-only model arg. - _build_call_kwargs inline branch applies the same check for both provider == "custom" and non-custom paths. Tests: - tests/test_model_forces_max_completion_tokens.py: 31 new cases covering positive families, negatives (classic gpt-4, claude, llama, mistral, qwen, deepseek), vendor prefixes, case-insensitivity, whitespace, None/empty, and substring-not-prefix guards. - tests/run_agent/test_run_agent.py::TestMaxTokensParam: 5 new model-based cases (custom + gpt-5.4, openrouter + gpt-4o-mini, custom + o1-preview, classic gpt-4-turbo keeps max_tokens, llama3 keeps max_tokens). - tests/agent/test_auxiliary_client.py::TestAuxiliaryMaxTokensParam: new class, 7 tests covering the URL x model matrix.	2026-06-09 23:22:10 -07:00
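The name-based detection described above can be approximated as below. This is a guess at the behaviour from the commit message, not the helper in utils.py: the family prefixes and the normalization steps (lowercasing, stripping a vendor prefix) come from the message, while treating everything before the last "/" as the vendor prefix is an assumption.

```python
from typing import Optional

# Model families named in the commit message.
_MAX_COMPLETION_TOKENS_PREFIXES = ("gpt-4o", "gpt-4.1", "gpt-5", "o1", "o3", "o4")


def model_forces_max_completion_tokens(model: Optional[str]) -> bool:
    """Sketch: True if the model family is expected to reject max_tokens."""
    if not model:
        return False
    name = model.strip().lower()
    # Assumption: a vendor prefix is whatever precedes the last '/'.
    name = name.rsplit("/", 1)[-1]
    # Prefix match, not substring match, so 'my-gpt-5' does not qualify.
    return name.startswith(_MAX_COMPLETION_TOKENS_PREFIXES)
```

A caller could then OR this result with the existing hostname check when choosing between `max_tokens` and `max_completion_tokens`, which is how the commit message describes the `_max_tokens_param` change.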
Brooklyn Nicholson	7ffc216bc0	fix(agent): make a binary @file: reference actionable instead of a dead end A binary @file: ref (PDF, docx, spreadsheet, …) expanded to a bare "binary files are not supported" warning with no content. The model saw a failure and gave up — e.g. a dropped PDF came back as a text note claiming the type was unsupported, even though the file was staged on disk right next to it. Inject an actionable content block instead: the path, mime type, size, and a nudge to use its tools to read/convert/view the file (and explicitly not to tell the user the type is unsupported). General across every binary type — not PDF-specific. The file already resolves where the agent's tools run (local cwd or the staged copy in a remote session workspace), so it can act on it directly.	2026-06-09 19:16:46 -05:00
Siddharth Balyan	1febb08240	fix(anthropic): default new Claude models to the modern thinking contract (#42991) New Anthropic models without a recognized version substring (claude-fable-5 and future named/numbered releases) were classified as legacy and routed down the manual-thinking path, which made OpenRouter emit thinking.type.disabled — a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400. Invert the brittle version-substring allowlists to default-to-modern (mirroring _get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/no-sampling contract, with an explicit legacy list for older families. Non-Claude Anthropic-Messages models (minimax, qwen3, …) keep the manual path. - anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort / _forbids_sampling_params now default unknown Claude models to modern; legacy families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS. - openrouter profile: omit reasoning entirely (→ adaptive default) instead of forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy Anthropic + all non-Anthropic models still pass the disable form through. - model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out). Tests assert the invariant ("unknown Claude model -> modern contract; legacy stays manual; non-Claude unaffected"), not specific model names.	2026-06-09 23:37:23 +05:30
Teknium	967c325da8	fix(models): read OpenRouter live context_length before hardcoded catch-all (#42986) OpenRouter-routed slugs that are absent from models.dev (e.g. a freshly shipped anthropic/claude-fable-5) fell through to the generic DEFAULT_CONTEXT_LENGTHS["claude"]=200K entry and under-reported their real 1M window. The step-6 OpenRouter live-metadata fallback was gated on `not effective_provider`, but an OpenRouter selection sets effective_provider="openrouter" (inferred from the base URL), so that branch was dead code for every OR model. Add a dedicated step-5 OpenRouter branch that consults the live /models catalog (authoritative, refreshes as new slugs ship) before models.dev and the hardcoded family defaults — mirroring the existing Nous/Copilot/GMI branches. Keeps the Kimi-family 32k underreport guard. Per-model values are respected (claude-haiku-4.5 stays 200K), so it does not blanket-bump to 1M. Regression tests cover the fable-5 case, the genuinely-200k case, and the Kimi guard.	2026-06-09 10:49:32 -07:00
JP Lew	cb4cc08b0a	fix(codex): record app-server token usage in session accounting	2026-06-09 02:46:04 -07:00
Teknium	399b8ee5f0	fix(anthropic): strip Responses-only kwargs before Messages SDK call (#31673) (#42155) A Responses-API-shaped payload carrying instructions=/input=/store=/parallel_tool_calls= can reach the native Anthropic messages.stream() / messages.create() call under a rare api_mode-flip race (e.g. a concurrent auxiliary vision call mutating a shared agent between the kwargs build and the stream dispatch). The Anthropic SDK rejects these with a non-retryable TypeError that kills the whole turn and propagates the entire fallback chain. Add sanitize_anthropic_kwargs() at both Anthropic dispatch sites: it drops the Responses-only keys in place and logs a WARNING (with #31673 breadcrumb) when one is present, so the underlying race stays visible in the wild instead of being silently papered over.	2026-06-08 09:36:38 -07:00
teknium1	dd0d1222a2	fix(agent): don't retry interrupt-induced transport errors (cascading-interrupt hang) When agent.interrupt() fires during an active LLM call, the main poll loop force-closes the worker-local httpx client to stop token generation. That raises a transport error (RemoteProtocolError) on the worker thread — the EXPECTED consequence of our own close, not a network bug. The streaming retry loop misclassified it as a transient connection error and retried; each doomed retry stalled for the full stream-stale timeout (up to 300s). Because the gateway caches AIAgent instances per session, the stale worker outlived the interrupted turn and raced the next turn's request on shared client state — the root of the multi-minute cascading-interrupt hang reported in the wild. Fix: a request-local _request_cancelled token set by the poll loop right before the force-close, in both interruptible_api_call (non-streaming) and interruptible_streaming_api_call. The worker's exception handler checks the token and exits cleanly — no retry, no fallback, no 'reconnecting' status — instead of treating the forced error as transient. The token is request-local (not agent._interrupt_requested, which is cleared at turn boundaries) so a stale worker outliving its turn still recognizes its own forced close. Original diagnosis and fix by @kristianvast (PR #6600), against the then-inline methods in run_agent.py. Those were since extracted into agent/chat_completion_helpers.py, so the fix is reapplied there. Co-authored-by: Kristian Vastveit <kristianvast@users.noreply.github.com>	2026-06-08 02:19:13 -07:00
Teknium	aa6f2775fa	fix(memory): run end-of-turn sync off the turn thread (#41945) A misconfigured/slow external memory provider could hold the agent in the 'running' state for minutes after the final response was delivered. MemoryManager.sync_all / queue_prefetch_all looped provider.sync_turn / queue_prefetch INLINE on the turn-completion path; a provider making a blocking network/daemon call (a broken Hindsight daemon was observed blocking ~298s before failing) blocked run_conversation from returning. Because every interface (CLI, TUI, gateway) marks the agent 'running' until run_conversation returns, the agent stayed busy for the full block and any follow-up message triggered an aggressive interrupt that dropped the message. Dispatch provider sync/prefetch to a lazily-created single-worker background executor. sync_all / queue_prefetch_all return immediately; work completes (or fails, logged) in the background. A single worker serializes writes so turn N lands before turn N+1. flush_pending() provides a barrier for session boundaries and deterministic tests. shutdown_all() drains the executor with a bounded timeout so a wedged provider can never hang teardown. Builtin-only / no-provider sessions spawn no executor (zero new threads in the common case).	2026-06-08 02:18:59 -07:00
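The background-dispatch pattern in the commit above can be sketched with the standard library. Everything here is a hypothetical stand-in for MemoryManager: the class name, the callable-provider interface, and the timeout defaults are assumptions; the lazily created single worker, the `flush_pending()` barrier, and the bounded `shutdown_all()` follow the commit message.

```python
import logging
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


class BackgroundSync:
    """Hypothetical stand-in: runs provider syncs off the caller's thread."""

    def __init__(self, providers: List[Callable[[object], None]]) -> None:
        self._providers = list(providers)
        self._executor: Optional[ThreadPoolExecutor] = None  # created lazily
        self._pending: List[Future] = []
        self._lock = threading.Lock()

    def sync_all(self, turn: object) -> None:
        """Queue a sync and return immediately."""
        if not self._providers:
            return  # no providers: no executor, no extra thread
        with self._lock:
            if self._executor is None:
                # One worker serializes writes: turn N runs before turn N+1.
                self._executor = ThreadPoolExecutor(max_workers=1)
            self._pending.append(self._executor.submit(self._run, turn))

    def _run(self, turn: object) -> None:
        for provider in self._providers:
            try:
                provider(turn)
            except Exception:
                logger.exception("memory provider sync failed")

    def flush_pending(self, timeout: Optional[float] = None) -> None:
        """Barrier: wait for queued work, up to `timeout` seconds."""
        with self._lock:
            pending, self._pending = self._pending, []
        wait(pending, timeout=timeout)

    def shutdown_all(self, timeout: float = 5.0) -> None:
        """Wait a bounded time for queued work, then stop accepting more."""
        self.flush_pending(timeout=timeout)
        with self._lock:
            if self._executor is not None:
                self._executor.shutdown(wait=False)
                self._executor = None
```

Using one worker rather than a pool trades throughput for ordering, which matches the commit's stated goal that writes for one turn land before the next.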
teknium1	02a4d66951	fix(auxiliary): retry transient transport error once before fallback (#16587) A one-off transient transport failure (streaming-close / incomplete chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to provider/model fallback (or, for context compression, dropped the summary and entered cooldown), even when an immediate retry on the same provider would have succeeded. Add a single same-target retry at the top of call_llm() and async_call_llm() — before the existing except-chain — gated on a new _is_transient_transport_error() that reuses the canonical _is_connection_error() detector plus a 5xx/408 status check. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged. This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface, rather than each caller re-implementing it. The context compressor needs no change — it calls call_llm and inherits the retry; its existing fallback-to-main path (#18458) now composes naturally (retry the aux model once, then fall back to main only if the retry also fails). Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>	2026-06-08 01:05:45 -07:00
teknium1	524453dab5	refactor(agent): consolidate inner-retry-loop recovery flags into TurnRetryState (god-file Phase 1b) run_conversation's inner retry loop tracked recovery state in ~15 scattered bare booleans (per-provider OAuth refresh guards, format-recovery guards, restart signals). They are now fields on a single TurnRetryState dataclass the loop mutates in place (_retry.<flag>), giving the recovery bookkeeping a named, testable home. Loop-control vars (retry_count, max_retries, max_compression_attempts) stay as plain locals — they're while-mechanics, not recovery bookkeeping. Behavior-neutral: pure local→attribute rewrite of 42 references; kwarg NAMES preserved (e.g. has_retried_429=_retry.has_retried_429). Live simple + tool turns OK. Validation: tests/run_agent/ 1615 passed / 0 failed under per-file process isolation; new test_turn_retry_state.py pins the field contract.	2026-06-07 22:42:05 -07:00
JimStenstrom	cb5c24e37d	fix(agent): sync logging session context on compaction id rotation When context compaction rotates agent.session_id, it updates the gateway/tools session context (set_current_session_id -> HERMES_SESSION_ID env + ContextVar) but never updates the separate logging session context. The [session_id] tag on log lines comes from hermes_logging._session_context (set once per turn in conversation_loop.py), so post-compaction log lines in the same turn carry the STALE old id while the message/DB/gateway state carry the new one — breaking log correlation exactly at the compaction boundary. Call hermes_logging.set_session_context(agent.session_id) alongside the existing set_current_session_id, guarded so a logging failure can't regress the routing update. Logs-only; no runtime or caching impact. Refs #34089	2026-06-07 22:30:02 -07:00
Teknium	8e223b36ed	fix(curator): protect load-bearing built-in skills from archival/consolidation (#41817) The curator's idle-archival path (apply_automatic_transitions under prune_builtins) could archive the bundled `plan` skill, killing the /plan slash command silently — typing /plan then returned 'Unknown command' with no signal that a skill had vanished. The archived skill's hash stays in .bundled_manifest, so 'hermes update' wouldn't re-seed it. Add PROTECTED_BUILTIN_SKILLS ({plan}) enforced at the master gate is_curation_eligible() (covers archive_skill + the transition walk) and in the candidate enumerator (so the LLM consolidation pass never sees them). Immune to prune_builtins, pin state, and LLM judgment.	2026-06-07 22:23:29 -07:00
Teknium	2789bf4e25	fix(auxiliary): route Codex Responses path through shared converter (#5709) The auxiliary Codex adapter maintained its own chat->Responses conversion loop that forwarded every non-system message's role verbatim into Responses input[]. When flush_memories()/compression replayed session history containing assistant tool_calls + role=tool results, those tool messages leaked into the request and the Responses API rejected them with HTTP 400: Invalid value: 'tool'. Route _CodexCompletionsAdapter.create() through the same shared converter the main agent transport uses (_chat_messages_to_responses_input), so tool calls become function_call items and tool results become function_call_output items with a valid call_id. Single conversion path means no future drift. Also remove the now-dead _convert_content_for_responses() helper — its only caller was the private conversion loop this change deletes. Co-authored-by: ProgramCaiCai <techxacm@gmail.com>	2026-06-07 22:18:31 -07:00

667 commits