hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-13 09:01:54 +00:00

Author	SHA1	Message	Date
xxxigm	691ff7c188	fix(compressor): keep last visible assistant reply out of compaction summary + label handoffs in WebUI (#29824) Two-pronged fix for the WebUI "context compaction block in place of last assistant response" regression. Agent layer (the real fix). ``_find_tail_cut_by_tokens`` already had ``_ensure_last_user_message_in_tail`` to keep the most recent user request out of the compressed middle (#10896), but no symmetric anchor for the assistant side. When the conversation has an oversized recent tool result or a long stretch of tool-call/result pairs after the assistant's last visible reply, the token-budget walk can stop with the previously-visible reply on the wrong side of ``cut_idx``. The summariser then rolls it into the single ``[CONTEXT COMPACTION — REFERENCE ONLY]`` block persisted as ``role="user"`` or ``role="assistant"``, and from the operator's perspective the WebUI session viewer (``web/src/pages/SessionsPage.tsx``) and the TUI chat panel both suddenly show the opaque "Context compaction" block in the slot where they were just reading the actual answer: User: "i cant see the output of the last message you sent, i did see it previously, however now see 'context compaction'" Added ``_ensure_last_assistant_message_in_tail``, a mirror of the user-side anchor. It looks for the most recent assistant message with non-empty text content (skipping tool-call-only assistant "stubs" which the UI renders as small "calling tool X" indicators rather than a readable bubble) and walks ``cut_idx`` back through the standard ``_align_boundary_backward`` so we don't split a tool_call/result group that immediately precedes it. The two anchors are chained: each only walks ``cut_idx`` backward, so the tail can only grow. 
Falls back to "most recent assistant of any kind" only when no content-bearing reply exists in the compressible region (fresh multi-step tool sequence with no prior reply) — in that case the agent-side fix is effectively a no-op and the existing user-message anchor carries the load. WebUI layer (clarity). Added ``isCompactionMessage`` detector that recognises the ``[CONTEXT COMPACTION — REFERENCE ONLY]`` (current) and ``[CONTEXT SUMMARY]:`` (legacy) prefixes from ``agent/context_compressor.py``, and a new ``compaction`` entry in ``MessageBubble``'s ``ROLE_STYLES`` map. Compaction blocks now render as muted, italicised system-style rows labelled ``Context handoff`` — clearly metadata, not the assistant's actual reply — so an operator scrolling back through a long session can't mistake the summary for a real answer. Keeping the detected prefixes inline (rather than importing them) because the WebUI bundle has no Python interop. A guardrail comment points readers at the source-of-truth constants in ``agent/context_compressor.py``.	2026-06-12 15:41:57 -07:00
Teknium	0db5cb8e75	refactor(agent): hoist summary end marker to _SUMMARY_END_MARKER; strip it on rehydration Follow-up to the #33346 cherry-pick: - the marker string was duplicated at both insertion sites (standalone + merged-into-tail); hoist to a module constant - _strip_summary_prefix now also strips a trailing end marker so a rehydrated handoff body doesn't leak the boundary directive into the iterative-update summarizer prompt (it is re-appended on insertion)	2026-06-12 15:05:00 -07:00
Tranquil-Flow	749b7219c4	fix(compression): always append END OF CONTEXT SUMMARY marker to standalone summaries regardless of role When the compression summary lands as an assistant-role message (head ends with user), the end marker was not appended. Models may regurgitate the summary text as their own visible output when there's no clear boundary signal (#33256). The end marker was already appended for user-role summaries (#11475, #14521) but the assistant-role path was missed in the original fix. This ensures ALL standalone summary messages carry the boundary marker, preventing summary text from leaking into user-visible chat output.	2026-06-12 15:05:00 -07:00
Teknium	8e5b7592f8	refactor(agent): hoist MEDIA-directive regex to module level Avoid recompiling the pattern on every _serialize_for_summary call; name it beside _PATH_MENTION_RE with the #14665 rationale.	2026-06-12 01:14:28 -07:00
Tranquil-Flow	286ecd26d8	fix(agent): strip MEDIA directives from compressor summarizer input (#14665)	2026-06-12 01:14:28 -07:00
Teknium	c7bee8f961	refactor(agent): drop unused tail_start param from _derive_auto_focus_topic The parameter was reserved-but-unused (del'd immediately); YAGNI. Test call site updated.	2026-06-11 23:03:52 -07:00
konsisumer	434c684bfa	fix(agent): focus automatic compression on recent user turns	2026-06-11 23:03:52 -07:00
Teknium	6c752ca3a5	refactor(agent): tighten SUMMARY_PREFIX wording and fix stale doc references Legibility pass on the consolidated prefix: collapse the topic-overlap rule from three overlapping sentences into one WINS sentence + one discard/no-wrap-up sentence (same constraints, less dilution), fix the module docstring to describe the headings that actually shipped, and correct the #10896 comment's heading name (Historical Pending User Asks).	2026-06-11 13:57:13 -07:00
Teknium	acb2954d82	fix(agent): freeze carveout-era SUMMARY_PREFIX for renormalization The prompt consolidation above retires the carveout-era prefix. Without a frozen copy in _HISTORICAL_SUMMARY_PREFIXES, summaries persisted by pre-upgrade builds would lose detection (_is_context_summary_content) and renormalization (_strip_summary_prefix) — the exact regression class the tuple exists to prevent. Adds contract tests covering every frozen prefix. Refs #41607 #38364 #42812	2026-06-11 13:57:13 -07:00
kyssta-exe	8f8cad7ec5	fix(agent): strengthen compression preamble against stale task execution (#41607)	2026-06-11 13:57:13 -07:00
konsisumer	d5e2fbf244	fix(agent): frame compaction handoff sections as historical context	2026-06-11 13:57:13 -07:00
dusterbloom	cca3b77a4b	fix(compression): clear _previous_summary on session end (defense-in-depth) ContextCompressor inherited a no-op on_session_end() from ContextEngine, so per-session iterative-summary state (_previous_summary) survived a real session boundary on a reused compressor instance. Override it to clear the summary the moment the owning session ends, complementing the point-of-use guard in compress(). Closes the cross-session contamination path in #38788. Co-authored-by: dusterbloom <32869278+dusterbloom@users.noreply.github.com>	2026-06-07 22:09:45 -07:00
Basil Al Shukaili	8513a6aec7	fix(compression): guard against cross-session stale _previous_summary contamination When a cron or background session compacts, it sets _previous_summary for iterative updates. If that session ends without /new or /reset (which calls on_session_reset()), the stale summary survives on the ContextCompressor instance. A subsequent live messaging session's compaction then injects it as 'PREVIOUS SUMMARY:' into the summarizer prompt — contaminating the live session with unrelated content from the prior session. Add an else guard in compress(): when no handoff summary is found in the current messages but _previous_summary is non-empty, discard it so _generate_summary() starts fresh instead of iteratively updating a stale cross-session summary. Fixes #38788	2026-06-07 22:09:45 -07:00
islam666	b18490b890	fix(compaction): prevent infinite loop when transcript fits in tail budget When summary_target_ratio is large (e.g. 0.45) and the context_length is moderate (e.g. 96000), the soft_ceiling (token_budget * 1.5) can exceed the total transcript size. _find_tail_cut_by_tokens walks the entire transcript without breaking early, and the resulting compress window is either empty (compress_start >= compress_end) or a single message whose summary-of-one overhead saves ~0 tokens. Both outcomes cause a no-op compression that does not increment _ineffective_compression_count, so should_compress() returns True on every subsequent turn and the loop repeats endlessly. Fix (two layers): 1. _find_tail_cut_by_tokens: when the backward walk consumed the entire transcript without breaking (cut_idx <= head_end and accumulated <= soft_ceiling), re-walk with the raw (non-inflated) token budget to find a meaningful cut that gives the summarizer a useful middle window. 2. compress(): when compress_start >= compress_end, increment _ineffective_compression_count and log a warning so the existing anti-thrashing guard in should_compress() can break the loop. Fixes #40803	2026-06-07 21:50:57 -07:00
Teknium	d87f293972	feat(compression): temporal anchoring in compaction summaries (#41102) Compaction summaries now receive the current date and instruct the summarizer to rewrite completed actions as absolute, dated, past-tense facts (e.g. "email John about the proposal" -> "Sent the proposal email to John on 2026-06-07"). A resumed conversation no longer re-issues work that already happened or treats a finished action as still pending. The date is resolved via hermes_time.now() (date-only, user-configured timezone) inside _generate_summary. The compaction summary is a mid-conversation message that is never part of the cached prefix, so the date does not affect prompt-cache stability. Date resolution is best-effort: a clock failure omits the rule rather than blocking compaction. The rule rides the shared template, so both first-compaction and iterative-update prompts carry it. Inspired by Poke's summarization (temporal anchoring + semantic preservation).	2026-06-07 08:36:45 -07:00
Teknium	42bbd221e8	fix(compressor): strip stale handoff prefix on resume; reconcile #26290+#32787 (#35344) A handoff persisted under an older SUMMARY_PREFIX can be inherited into a resumed lineage. _strip_summary_prefix only matched the current/legacy literal, so on re-compaction the old 'resume exactly from Active Task' directive stayed embedded in the body and kept hijacking replies to new, unrelated user messages. - Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize them in _strip_summary_prefix + _is_context_summary_content so resumed stale handoffs are re-normalized to the current latest-message-wins prefix. - Reconcile the overlapping Active Task template edits from the salvaged #26290 (reverse-signal cancellation) and #32787 (capture open questions / decisions, don't write None too eagerly); both intents kept. - Regression coverage in tests/agent/test_resume_stale_active_task.py. - AUTHOR_MAP entries for both salvaged contributors.	2026-05-30 07:29:21 -07:00
Mathijs van den Hurk	56b8dccf25	fix(compressor): treat unanswered user questions as Active Task, not 'None' The Active Task field in compression summaries is the single most important field for task continuity across context boundaries. The previous template described it narrowly as a 'task assignment' or 'request', which caused the summary LLM to write 'None' whenever the user's most recent input was a question, a decision request, or a discussion turn rather than an imperative command. The assistant on the other side of the compaction then treated the conversation as resolved and gave a generic recap instead of answering the still-open question. Expand the template guidance to cover: * explicit task assignments * questions awaiting an answer * decisions awaiting input (A vs B) * ongoing discussions where the assistant owes the next substantive reply Reserve 'None' for the rare case where the last exchange was fully resolved (e.g. user said 'thanks, that's all'). Also tighten the trailing CRITICAL instruction in the summary prompt so the LLM cannot fall back to the old 'no imperative command → None' heuristic. No behavioural code changes — template strings only. All 83 existing compressor tests pass.	2026-05-30 07:29:21 -07:00
Zhipeng Li	020601d41e	fix(compression): drop conflicting 'resume Active Task' directive in summary prefix SUMMARY_PREFIX previously contained two contradictory directives: 1. "treat it as background reference, NOT as active instructions" "Do NOT answer questions or fulfill requests mentioned in this summary" "Respond ONLY to the latest user message that appears AFTER this summary" 2. "Your current task is identified in the '## Active Task' section of the summary — resume exactly from there." When the latest user message contradicted Active Task (e.g. 'stop the i18n refactor', 'never mind, look at grafana instead'), models tended to follow (2) anyway because 'resume exactly' is a strong, unambiguous directive — leading to repeated re-surfacing of already-cancelled work across turns, even after explicit 'stop'/'don't keep bringing that up' messages from the user. This change: - Removes the conflicting 'resume exactly from Active Task' clause. - Makes the precedence explicit: latest user message is the single source of truth; it WINS on conflict; cancelled Active Task / In Progress / Pending User Asks / Remaining Work must be discarded entirely (no 'wrap up the old task first'). - Names canonical reverse signals (stop, undo, roll back, never mind, just verify, topic change) so the model recognizes them as cancellation triggers, not background context. - Updates the summarizer template instruction so the LLM doesn't mechanically copy a cancelled task into Active Task on the next compaction (it's instructed to copy the reverse signal verbatim). - Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and the 'don't repeat work already reflected in session state' clause. Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so the conflict can't regress. 
Tested: - All compaction tests pass: tests/agent/test_context_compressor.py, tests/agent/test_context_compressor_summary_continuity.py, tests/run_agent/test_413_compression.py, tests/run_agent/test_compression_persistence.py, tests/run_agent/test_compression_boundary_hook.py, tests/cli/test_manual_compress.py — 117/117 passing. - Tested on macOS.	2026-05-30 07:29:21 -07:00
helix4u	e38b0b55d1	fix(compression): avoid repeat preflight compaction from rough estimates	2026-05-29 19:05:03 -07:00
hinotoi-agent	042c1d6bb0	test: cover fallback dropped-turn handoff	2026-05-28 20:34:40 -07:00
Hinotoi Agent	6dc068ef04	fix: broaden deterministic compression fallback coverage	2026-05-28 20:34:40 -07:00
Hinotoi Agent	e785c0ad70	fix: preserve context when summary generation fails	2026-05-28 20:34:40 -07:00
0z1-ghb	8b2adead78	fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency	2026-05-23 17:38:19 -07:00
Teknium	9aae59feab	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117) PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact: those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled.	2026-05-18 10:28:20 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely: it returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). 
New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
Teknium	3b39096904	Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189) After context compression, the protected tail messages retain their original image parts. When those include multi-MB pasted screenshots, every subsequent API request re-ships the same base-64 blobs forever, which can push the request past provider body-size limits and wedge the session even though compression 'succeeded'. Add _strip_historical_media() to agent/context_compressor.py. After the summary is built, find the newest user message that carries an image part and replace image parts in every earlier message with a short text placeholder ('[Attached image — stripped after compression]'). The newest image-bearing user turn keeps its media so the model can still analyse what the user just sent. Handles all three multimodal shapes: - OpenAI chat.completions image_url - OpenAI Responses API input_image - Anthropic native {type: image, source: ...} Includes 27 unit tests covering the helpers and the end-to-end compress() integration, plus a manual E2E check confirming a ~4MB two-image conversation shrinks to ~2MB after compression.	2026-05-16 17:18:25 -07:00
Stephen Schoettler	5ce0067c08	fix(ci): stabilize shared test state after 21012	2026-05-14 14:28:14 -07:00
teknium1	4ceab16893	fix(compression): keep default protect_first_n at 3 + align ABC Follow-up on the salvaged feat commit: - Keep the constructor / config / yaml-example default at 3 so existing gateway and CLI users see no behavioural change. PR #13754 (which this builds on) had lowered the default to 2 to chase pre-feature parity in the system-prompt-present case, at the cost of quietly halving the protected head for the gateway path (which strips the system prompt before calling compress()). With the new "system prompt is implicit" semantics, default 3 gives every caller a stable head shape. - agent/context_engine.py: bring the ABC's protect_first_n docstring in line with the new semantics so plugin context engines interpret the config key the same way the built-in compressor does. - tests: adjust the default-value test (3, not 2) and a stale comment; per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is since those tests fix concrete head shapes.	2026-05-13 22:25:16 -07:00
snav	dee71a31e5	feat(compression): make protect_first_n configurable The number of head messages preserved verbatim across context compactions was previously hardcoded to 3 in AIAgent.__init__. Expose it as `compression.protect_first_n` in config, matching the existing `protect_last_n` pattern. Motivation: users who rely on rolling compaction for long-running sessions had the opening user/assistant exchange pinned as head forever, which doesn't always match how they want the session framed after many compactions. Lowering to 1 preserves the system prompt + first non-system message; lowering to 0 preserves only the system prompt and lets the entire first exchange age out naturally through the summary. Semantics: `protect_first_n` counts non-system head messages protected in addition to the system prompt, which is always implicitly protected when present. Same meaning across both code paths: protect_first_n=0 → system prompt only (or nothing if no system message) protect_first_n=2 → system prompt + first 2 non-system messages (default) This unifies the CLI path (which reads messages with the system prompt at position 0) and the gateway path (where the gateway /compress handler strips the system prompt before calling compress() — see gateway/run.py L9150-9154 on the parent fork). Previously these two paths disagreed: CLI path: protect_first_n=1 → protect system prompt only Gateway path: protect_first_n=1 → protect first USER turn forever In practice on long-running gateway sessions the old semantics pinned whatever stale aside happened to be the first user message, reinserting it into every compaction summary indefinitely. Default chosen as 2 (not 3) so that the effective protected head count remains 3 messages in the common case — assuming a system prompt is present, default protection becomes system + 2 non-system = 3 total, matching the pre-feature behaviour where `protect_first_n` was hardcoded to protect 3 messages total. 
Sessions without a system prompt will see a small behaviour change (2 protected head messages instead of 3), but this is the rare path and the new semantics make the system-prompt-present case the well-defined one. Changes: - agent/context_compressor.py: redefine protect_first_n as the count of non-system head messages protected beyond the implicit system-prompt guarantee; both paths converge. Constructor default updated to 2. - hermes_cli/config.py: add `compression.protect_first_n` default (2), matching the new semantics. `show_config` label tweaked to 'Protect first: N non-system head messages' for clarity. - run_agent.py: read protect_first_n from config; 0 is now valid (system prompt is always implicitly protected). - cli-config.yaml.example: document the new key and rationale. - tests/agent/test_context_compressor.py: cover default, override, the end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour, the no-system-prompt (gateway) path, and the new shared-semantics regression test. Fixes #13751 Tested on Ubuntu 24.04.	2026-05-13 22:25:16 -07:00
kshitij	2ec8d2b42f	chore: ruff auto-fix PLR6201: tuple → set in membership tests (#23937) Replace tuple literals with set literals in all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple, a consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).	2026-05-11 11:13:25 -07:00
kshitij	657874460f	chore: ruff auto-fixes: collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) PLR5501 (collapsible-else-if): 28 instances, `else: if:` → `elif:`. PLR1730 (if-stmt-min-max): 15 instances, `if x<y: x=y` → `x=max(x,y)`. C420 (dict.fromkeys): 2 instances, dictcomp → `dict.fromkeys`. PLR1704 (redefined-argument): 1 instance, `reason` → `err_msg` (shadow fix). C414 (unnecessary-list): 1 instance, `sorted(list(x))` → `sorted(x)`. 28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions.	2026-05-11 11:03:29 -07:00
Wesley Simplicio	35f773c459	fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:52:51 -07:00
kshitij	c7e8add120	fix(context): handle JSON decode errors in compression (salvage of #22248) (#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` (e.g. an HTML 502 page from a misconfigured gateway), the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of #22248 by @0xharryriddle. Closes #22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>	2026-05-09 01:47:15 -07:00
Teknium	850413f120	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md`: universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919`: deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions; the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-05-08 11:07:38 -07:00
LeonSGP43	fc88eec926	fix(compressor): soften summary prompt for content filters	2026-05-07 06:42:32 -07:00
kshitijk4poor	aa88dcc57b	fix: salvage batch — compaction guidance, memory authority, cache eviction after compression - Fix /compact → /compress in context-overflow tips (closes #20020) - Evict cached agent after session hygiene and /compress so system prompt refreshes with current SOUL.md, memory, and skills - Restore memory authority across compaction: change 'informational background data' to 'authoritative reference data' in memory block and SUMMARY_PREFIX, with backward-compatible regex Based on: - PR #20027 by @LeonSGP43 - PR #18767 by @MacroAnarchy - PR #17380 by @vominh1919 PR #17121 boundary marker fix already merged to main (`2eef395e1`). PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail().	2026-05-05 22:33:45 -07:00
wmagev	2eef395e1c	fix(compaction): mark end of context summary in role=user fallback When the head ends with assistant/tool and the tail starts with assistant, the summary is inserted as a standalone role="user" message. The body's verbatim "## Active Task" quote then gets read as fresh user input by weak/local models (#11475, #14521). The merge-into-tail path already appends an explicit end-of-summary marker for this reason. Mirror it on the standalone path so both insertion routes give the model the same "summary above, not new input" signal.	2026-05-05 04:51:29 -07:00
revaraver	4a3e3e20e5	fix(compression): preserve iterative summary continuity	2026-05-05 04:42:44 -07:00
JasonOA888	a7417f8a4a	fix(compressor): skip non-string tool content in summarization pass to prevent AttributeError Commit `408dd8aa` added a non-string guard for Pass 1 (dedup), but the same pattern exists in Pass 2 (summarization/pruning) where content.startswith() and len() are called on potentially non-string tool content. When a provider returns tool results with non-string content (e.g. dict or int from llama.cpp or similar), the pruning pass crashes with AttributeError. Add the same isinstance(content, str) guard to Pass 2 for consistency.	2026-05-04 06:23:52 -07:00
swithek	b7bbc62503	fix(compressor): _prune_old_tool_results boundary direction	2026-05-04 05:05:18 -07:00
pander	6b88f46c54	fix(compressor): trigger fallback on timeout errors alongside model-not-found Previously only HTTP 404/503 and specific error strings triggered a fallback to the main model when the summary model was unavailable. Timeout errors (HTTP 408/429/502/504, or error strings containing 'timeout') entered a short cooldown instead, leaving context to grow unbounded for the rest of the session. Add _is_timeout detection alongside _is_model_not_found so that transient timeout errors on the summary model also trigger immediate fallback to the main model, preventing compression failure from cascading. Closes #15935	2026-05-04 03:10:53 -07:00
nftpoetrist	e2211b2683	fix(compressor): reset _summary_failure_cooldown_until in on_session_reset() on_session_reset() cleared _previous_summary, _last_summary_error, and _ineffective_compression_count but left _summary_failure_cooldown_until intact. When a transient summary error sets a 60 s cooldown (or 600 s for a missing-provider RuntimeError) and the user immediately runs /reset or /new, the cooldown carries into the new session. If the new session reaches the compression threshold before the cooldown expires, _generate_summary() returns None early, middle turns are silently dropped without a summary, and the agent continues with no indication that compaction was skipped. Fix: set _summary_failure_cooldown_until = 0.0 in on_session_reset(), matching the value assigned in __init__ and symmetric with the other per-session fields already cleared there. Fixes #15547	2026-05-04 02:30:31 -07:00
sprmn24	408dd8aa28	fix(compressor): skip non-string tool content in dedup pass to prevent AttributeError	2026-05-03 15:28:30 -07:00
0z!	b194617d00	fix(context_compressor): off-by-one in tail protection for short conversations	2026-04-30 20:00:01 -07:00
Stephen Schoettler	b29b709a71	fix(agent): sanitize Codex tool-call history summaries	2026-04-30 19:58:46 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919) (#16927) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md`: universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919`: deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions; the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
Teknium	6ea5699e3f	fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775) A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures. Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields: - _last_aux_model_failure_model: str | None - _last_aux_model_failure_error: str | None Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings. Surface at three places: - gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved) - gateway /compress command: ℹ line appended to the reply - CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam Distinct from the existing ⚠️ dropped-turns warning: different severity, different emoji, explicit 'context is intact' reassurance.	2026-04-27 20:08:23 -07:00
Teknium	94b26f3ec9	fix(compression): retry summary on main model for unknown errors before giving up (#16774) The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors (400s from aggregators, provider-specific 'no route' strings, opaque rejections) fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.	2026-04-27 19:25:57 -07:00
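
The retry-on-main invariants described in 94b26f3ec9 can be illustrated with a minimal sketch. This is not the code from `agent/context_compressor.py`: the class `SummaryRetrySketch`, the `call_model` callable, and the `calls` counter are hypothetical names introduced here for illustration; only `summary_model` and `_summary_model_fallen_back` come from the commit message. It assumes any exception from the auxiliary model is treated as grounds for a single retry on the main model.

```python
from typing import Callable, Optional


class SummaryRetrySketch:
    """Hypothetical stand-in for the compressor's summary retry logic."""

    def __init__(
        self,
        main_model: str,
        summary_model: Optional[str],
        call_model: Callable[[str], str],
    ) -> None:
        self.main_model = main_model
        # Assumption: an unset aux model means "use the main model".
        self.summary_model = summary_model or main_model
        # Assumption: call_model raises on any provider error.
        self.call_model = call_model
        # Flag name taken from the commit message; set at most once.
        self._summary_model_fallen_back = False
        self.calls = 0  # illustrative counter, not part of the real class

    def generate_summary(self) -> Optional[str]:
        model = self.summary_model
        try:
            self.calls += 1
            return self.call_model(model)
        except Exception:
            # Retry only when the aux model differs from main, and only once
            # per compressor, so a failing main model cannot loop forever.
            if model == self.main_model or self._summary_model_fallen_back:
                return None
            self._summary_model_fallen_back = True
            self.summary_model = self.main_model

        try:
            self.calls += 1
            return self.call_model(self.main_model)
        except Exception:
            # Double failure (aux + main): give up after two calls.
            return None
```

With a `call_model` that rejects the auxiliary model but accepts the main one, `generate_summary()` returns the main model's summary after two calls; with a same-model configuration it stops after one call, matching the test cases listed in the commit message.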

112 commits