hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
brooklyn!	f4c656b0a0	Merge pull request #52854 from NousResearch/bb/fix-interrupt-partial-reply fix(interrupt): keep partial streamed reply when stopped mid-response	2026-06-26 00:04:37 -05:00
teknium1	6c58878e7d	fix(browser): force secret-pattern redaction on browser_type display Force redact_sensitive_text(force=True) on the browser_type text arg so recognized credentials (API keys, tokens, JWTs) are masked in tool progress, previews, callbacks, and return payloads even when the global security.redact_secrets opt-out is set — a typed credential reaching chat history is a security boundary, not log hygiene. Normal typed text matches no pattern and stays fully readable for debuggability. Tests assert the API-key-shaped secret is masked across every surface and that normal text passes through unchanged.	2026-06-25 22:02:22 -07:00
rebel	8ff426e53b	fix: redact browser typed text surfaces	2026-06-25 22:02:22 -07:00
Brooklyn Nicholson	8233598e64	fix(interrupt): keep partial streamed reply when stopped mid-response Stopping a turn while the model is streaming (stop/esc to redirect) raised InterruptedError, set final_response to the throwaway "waiting for model response" sentinel, and persisted messages WITHOUT the assistant text that was already streamed to the screen. The next turn then had no record of the half-finished reply, so the model appeared to "forget" what it just said. Recover the on-screen text from _current_streamed_assistant_text in the InterruptedError branch and append it as the assistant turn (and surface it as final_response). The metadata sentinel is kept only when nothing was streamed yet, preserving the ACP/client suppression behavior. Completes the partial-stream recovery from `397eae5d9` (which wired the same _current_streamed_assistant_text salvage into the connection-failure twin but missed the user-interrupt path). The lossy handler dates to `c98ee9852`.	2026-06-25 23:54:20 -05:00
Teknium	c6575df927	feat(moa): expose MoA presets as selectable virtual models (#46081 ) * feat(moa): expose MoA presets as selectable virtual models Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools. Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses) — all references dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved, failures and the MoA-recursion guard stay isolated per reference. - Removed the old mixture_of_agents model tool and moa toolset. - Added moa as a virtual provider in the provider/model inventory. - /moa is shortcut behavior over model selection (default preset / named preset / one-shot prompt). - Dashboard + Desktop manage named presets; presets appear in model pickers. - Parallel reference fan-out in agent/moa_loop.py with regression test. * fix(moa): thread moa_config through _run_agent to _run_agent_inner The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it — _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path. Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4). 
* fix(moa): classify moa as a virtual provider in the catalog The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars — tripping two catalog invariants: - test_provider_catalog: api_key providers must expose a credential env var - test_provider_parity: every hermes-model provider must be desktop-configurable moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt — derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.	2026-06-25 13:52:06 -07:00
herbalizer404	b82c83d320	fix(auxiliary): honor fallback chain when compression provider auth is unavailable When an explicit aux provider cannot build a client before any request is sent (missing raw env key, exhausted/unavailable OAuth or credential-pool auth, resolver returning (None, None)), call_llm raised a misleading "no API key was found" error and bypassed the configured fallback_chain entirely. A provider authenticated through Hermes auth / the credential pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so compression failed instead of routing to the configured fallback. Adds _try_configured_fallback_for_unavailable_client() and wires it into both sync and async call_llm before the raise, and into the startup compression feasibility check. Salvaged from #51835 by @herbalizer404.	2026-06-25 13:08:18 -07:00
kshitijk4poor	d9bd7ce827	test(compression): pin rotation-fallback tests to in_place=False ahead of default flip These 7 test sites assert rotation behavior (fork, child sessions, lock contention, logging session-context follows id rotation, boundary hooks fire on rotation). Pin each builder to in_place=False explicitly so they keep exercising the retained rotation fallback regardless of the global default (flipped to True in #38763). Rotation stays a working opt-out fallback and deserves continued coverage — these are NOT deleted. Pinned sites: - test_compression_concurrent_fork._build_agent_with_db - test_compression_logging_session_context._build_agent_with_db - test_compression_rotation_state._build_agent_with_db - test_compression_boundary_hook._make_agent (2 helpers: CompressionBoundaryHook + SessionCompressEvent) - test_compression_concurrent_sessions._build_agent_with_db	2026-06-25 12:56:05 -07:00
Brooklyn Nicholson	2f1a47b90e	feat(agent): require verification before finishing edits Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.	2026-06-24 23:02:48 -05:00
kshitijk4poor	e0272cfef2	Revert "fix(compression): make minimum context floor configurable (#31600)" This reverts commit `cae1ee44a7`.	2026-06-25 01:04:44 +05:30
Tranquil-Flow	cae1ee44a7	fix(compression): make minimum context floor configurable (#31600) Add compression.minimum_context_floor config key that allows users to lower the compression threshold floor below the hardcoded 64K default, preventing infinite tool-call loops on models whose structured output degrades well before 64K tokens. - agent/model_metadata.py: add get_configurable_minimum_context() helper with 16K hard safety limit - agent/context_compressor.py: accept minimum_context_floor param, thread it through _compute_threshold_tokens - agent/conversation_compression.py: use compressor's floor for aux model context validation - agent/agent_init.py: read compression.minimum_context_floor from config and pass to ContextCompressor - gateway/run.py: cache-busting includes new key Salvaged from #31686 by @Tranquil-Flow onto current main. Resolves conflicts with in-place compaction (#38763) and max_tokens threshold computation (#43547) that landed after the original PR. Closes #31600	2026-06-25 00:56:04 +05:30
helix4u	292a456c06	fix(agent): handle concurrent tool submit shutdown	2026-06-24 02:56:56 +05:30
konsisumer	190b01c553	fix(agent): persist tool calls before turn-end flush Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-24 02:15:57 +05:30
Brooklyn Nicholson	88e136448d	fix(agent): shrink anthropic-native image history Retry image-size rejections by rewriting Anthropic base64 image source blocks, not just OpenAI-style image_url parts.	2026-06-22 18:23:21 -05:00
Teknium	87c4a5ebb8	feat(background-review): aux-model selector for the self-improvement review (#49252 ) Adds auxiliary.background_review.{provider,model} (default auto = main chat model — unchanged). Set it to a different, cheaper model and the post-turn self-improvement review runs there for ~3-5x lower cost. Cache-aware by design: the main chat is warm in the prompt cache, so the default full-history replay on the main model is cheap cache reads — left exactly as-is. A different model can't reuse that cache (different key), so when (and only when) routed to a different model the fork replays a compact digest instead of the full transcript, minimising what it cold-writes on the aux model. Same model -> full replay; different model -> digest. Quality holds in benchmarks: memory capture identical, skill near-identical. Nothing changes unless you opt in by naming a different model. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-22 14:54:53 -07:00
Teknium	b1b20270c4	refactor(memory): move write-mirror gating behind MemoryManager interface The success/staged gating and op-expansion for mirroring built-in memory writes to external providers lived in a standalone agent/memory_write_bridge.py helper called inline from two core call sites (tool_executor.py, agent_runtime_helpers.py). That left the mirror decision-making in the agent loop, outside the memory-provider interface. Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the loop now hands over the raw tool result + args and a metadata callback, and the manager decides whether/what to mirror. Both core call sites collapse to a single call; the orphan module is removed. No MemoryProvider ABC change. Tests rewritten as behavior tests against the manager method.	2026-06-22 07:00:42 -07:00
Hao Zhe	70e7132e2f	fix(openviking): gate memory writes and add viking_forget Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove. Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.	2026-06-22 07:00:42 -07:00
kshitijk4poor	ebd38e1280	test(agent): regression for token-only compression progress (#39550, #23767) Adds test_413_retries_on_token_only_compression: same message count but materially fewer tokens after compaction must count as progress and retry, not abort. Fails on main without the salvaged fix, passes with it.	2026-06-22 15:26:29 +05:30
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	2b3a4f0af8	fix(agent): strip stale reasoning_content when falling back to a strict provider (#50480 ) * fix(agent): strip stale reasoning_content when falling back to a strict provider A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content on every assistant tool-call turn (a single space " " pad). api_messages is built once under the primary; on a mid-session fallback to a strict OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale pads were replayed verbatim and rejected with HTTP 400/422: body.messages.2.assistant.reasoning_content: Extra inputs are not permitted (input: ' ') reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never reconciled history built under a reasoning primary against a strict fallback. copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only shapes to non-pad providers. Fix both sites: when the active provider does not enforce echo-back, strip reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when switching TO a reasoning provider is preserved. Covers the Cerebras 400 from #45655 and the DeepSeek->Mistral 422 fallback report. Refs #45655. * test: update reasoning-replay tests for strict-provider stripping test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning provider so the precedence it checks is observable. Add a positive strict-provider test asserting reasoning_content is stripped on replay.	2026-06-21 18:05:07 -07:00
JP Lew	c11ae8261b	fix(codex): seed app-server sessions with configured cwd	2026-06-21 16:39:02 -07:00
teknium1	9e4fe32d36	fix(session): opt the background-review fork out of session finalization The background-review fork (fires ~every 10 turns) pins review_agent.session_id = agent.session_id — the parent's LIVE id — for prefix-cache parity, then calls close(). With session finalization now in close(), that would end the still-active parent session mid-conversation. Set _end_session_on_close = False on the fork so the real owner (CLI close / gateway reset / cron) finalizes the session instead. Follow-up to the #12029 fix.	2026-06-21 11:35:09 -07:00
konsisumer	3e354b61db	fix(agent): preserve copilot routed headers	2026-06-21 11:29:49 -07:00
teknium1	2f4f23fbfb	fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835 ) When the main provider is the Codex app-server runtime (api_mode codex_app_server), the gateway showed no verbose 'running X' tool-progress breadcrumbs on Telegram while every other provider did. The app-server session processes item/started notifications (command execution, file changes, MCP/dynamic tool calls) but never surfaced them as Hermes tool-progress events — the session was constructed without an on_event hook, so the agent's tool_progress_callback was never invoked on this route. Add _codex_note_to_tool_progress() mapping item/started → (tool_name, preview, args) for commandExecution / fileChange / mcpToolCall / dynamicToolCall, and wire an on_event hook into CodexAppServerSession that forwards mapped events to agent.tool_progress_callback('tool.started', ...) — the same signature the chat_completions path uses (tool_executor.py). Non-tool items (agentMessage/reasoning) and non-item/started methods map to None and are ignored. Co-authored-by: jplew <462836+jplew@users.noreply.github.com>	2026-06-21 08:46:06 -07:00
teknium1	f22dd8a75a	fix(agent): fail over to fallback provider on persistent auth failure (401/403) When the active provider returns a 401/403 that survives its per-provider credential-refresh attempt (revoked OAuth, blocked/expired key, or an account pinned to a dead/staging inference endpoint), the conversation loop now escalates to the configured fallback chain instead of dead-ending. Before: the generic failover dispatch fired only for {rate_limit, billing}; auth/auth_permanent fell through to 'switch providers manually' advice and never called _try_activate_fallback(). A user whose primary credential was broken kept thrashing on the same dead credential every turn — the main agent appeared 'stuck in fallback mode' while never actually failing over. This also affected auxiliary tasks (compression, vision, title-gen), since auto-resolved aux follows the main provider. After: a persistent auth failure with a configured fallback chain switches to the next provider (mirroring the rate-limit/billing failover path), guarded one-shot per attempt by TurnRetryState.auth_failover_attempted. When no fallback is configured the behavior is unchanged — it falls through to the existing terminal handling and provider-specific troubleshooting guidance. Tests: test_auth_provider_failover.py — 401/403 classify as auth, the gating condition fires only with a chain present + guard unset, the guard blocks repeats, and non-auth (500) errors do not trigger auth failover.	2026-06-20 11:38:01 -07:00
kshitijk4poor	854d75723f	fix(compression): keep compaction-archived turns discoverable in session_search Follow-up to the soft-archive durability fix. Reusing the rewind/undo active=0 flag for compaction-archived turns inherited the wrong search semantics: undo rows are intentionally HIDDEN from session_search (the user took them back), but compaction-archived turns must stay DISCOVERABLE — that is the whole point of Teknium's "searchable / recoverable" requirement. As built, search_messages defaulted to WHERE active=1, so after in-place compaction the pre-compaction turns were in the FTS index but filtered out of the default search. (The earlier "searchable" claim only held for a raw FTS query / include_inactive=True, not the actual session_search tool.) Empirically confirmed the gap: search 'HMAC' returned 2 hits before compaction, 1 after (only the summary's mention) — the originals were hidden. Fix — a `compacted` flag distinct from `active`, giving a 3-way state: - active=1, compacted=0 → live context (normal) - active=0, compacted=1 → compaction-archived: OUT of live context, IN search - active=0, compacted=0 → rewind/undo: OUT of live context, OUT of search Changes: - messages.compacted INTEGER NOT NULL DEFAULT 0 added to SCHEMA_SQL. Declarative _reconcile_columns adds it on existing DBs — no version bump (plain column add). - archive_and_compact: UPDATE … SET active=0, compacted=1 (was active=0 only). - search_messages: default WHERE active=1 → (active=1 OR compacted=1), on BOTH the main FTS5 path and the trigram CJK path. include_inactive=True still returns everything. The short-CJK LIKE fallback already returns all rows (no active filter) — unchanged. - Docstrings on archive_and_compact + search_messages document the 3-way state. Verified: after compaction, session_search default finds the archived originals (ids 1 & 4); rewind/undo rows stay hidden by default (recoverable via include_inactive); live context still excludes both. 
322 in-place + hermes_state tests and 46 session_search tests green; ruff clean. Mutation check: reverting the search WHERE to active-only fails the new searchable test. (Surfaced by the question "is search semantic or only FTS?" — answer: session search is FTS5 keyword/BM25 only, no embeddings over the transcript; semantic retrieval lives in the optional memory-provider layer. Tracing that confirmed the active-only filter gap above.)	2026-06-20 10:57:07 -07:00
kshitijk4poor	4663456996	fix(compression): in-place compaction is non-destructive (soft-archive, not delete) Teknium review: keeping one durable session id must NOT come at the cost of destroying history. The prior in-place implementation used replace_messages, which hard-DELETEs the pre-compaction turns (they also drop out of the FTS index) — same id, but the original conversation is gone with no recovery path and the summary becomes the only record. Rotation today is non-destructive (the old session's full transcript survives under the old id); in-place must match that durability contract, not weaken it. Fix: compact in place by SOFT-ARCHIVING, reusing the existing messages.active flag (the /undo soft-delete mechanic), instead of deleting: - New SessionDB.archive_and_compact(session_id, compacted): in one atomic write, UPDATE messages SET active=0 on the live turns, then insert the compacted set as fresh active=1 rows. Nothing is deleted. - The insert loop is extracted into a shared _insert_message_rows() helper so archive_and_compact and replace_messages don't duplicate the 60-line column/encoding block (extend-don't-duplicate). - Agent in-place branch calls archive_and_compact instead of replace_messages. Durability outcome (proven by test + E2E across repeated compactions): - Live context load (get_messages_as_conversation / get_messages) filters active=1, so a resume reloads ONLY the compacted set — compaction still shrinks the live session. - The pre-compaction turns stay on disk at active=0, recoverable via get_messages(include_inactive=True) / restore_rewound. - They remain FTS-searchable: the messages_fts* triggers index on INSERT and remove on DELETE only — they do NOT key on active, and active=0 is a content-preserving UPDATE. session_search still finds them. 
- Verified across TWO successive compactions: the 1st compaction's originals are still recoverable + searchable after the 2nd (answers the "no recovery path after the next compaction" concern directly). message_count now reflects the LIVE (active/compacted) count, matching the live load. replace_messages keeps its DELETE semantics (still correct for /retry, /undo) and gains a docstring note pointing compaction at the non-destructive method. Tests: test_in_place_keeps_same_session_id strengthened to assert the 8 seeded originals survive at active=0 alongside the 2 compacted rows AND stay FTS-searchable. Mutation check: swapping archive_and_compact back to a hard DELETE fails the test, so the non-destructive contract is bound. 285 hermes_state + in-place tests green; rotation/persistence/compress-command/cli suites green; ruff clean.	2026-06-20 10:57:07 -07:00
kshitijk4poor	4f9485a95d	refactor(compression): tidy in-place compaction path (simplify pass) Parallel 3-reviewer cleanup of the in-place compaction code. Findings applied: - perf: in-place mode no longer pre-flushes current-turn messages. The flush ran INSERTs that the immediately-following replace_messages(compressed) DELETE+reinsert discarded -- pure wasted writes per compaction. The current-turn tail survives via the compressor's compressed output (protect_last_n), not the flush. Verified no data loss; rotation still pre-flushes (its old session row is preserved, so the flush is real there). - quality: hoist the two shared post-write steps (update_system_prompt + _last_flushed_db_idx = 0) below the if/else -- they ran in both branches against agent.session_id. Removes the easiest divergence bug. - quality: compute the compaction-boundary locals (_old_sid, _is_boundary, _boundary_parent) ONCE instead of recomputing locals().get('old_session_id') and the "_old_sid or agent.session_id or ''" chain three times. - quality: initialize compacted_in_place up front and assign agent._last_compaction_in_place directly, dropping the fragile locals().get('compacted_in_place') reflection. - reuse: parse the in_place config flag with utils.is_truthy_value (the project's canonical truthy coerce) instead of a hand-rolled str().lower() in {...} (agent_init already imports from utils). Dropped as false positives / out of scope: gateway getattr of agent internals (established session_id pattern), dual result-dict carry (mirrors history_offset etc.), stringly-typed "compression" (codebase-wide convention, no constant). Behavior-preserving: 7 in-place tests (incl. 2 new flush-guard tests) + 26 rotation/boundary/persistence/command tests green; mutation check confirms the durable-replace guard still binds (removing replace_messages fails the test); ruff clean. Added test_in_place_skips_redundant_preflush / test_rotation_still_preflushes to guard the perf change.	2026-06-20 10:57:07 -07:00
kshitijk4poor	1fbf48d4ad	fix(compression): make in-place compaction durable + rotation-independent end-to-end Review (Codex + 3-agent parallel) found the first cut of in-place mode was incomplete: it only updated the system prompt, so the persisted transcript stayed 'full history + summary' and the next turn/resume reloaded the full history and immediately re-compacted (a loop), and every downstream layer that keyed off session-id rotation silently no-op'd. The session_id was doing double duty as the 'compaction happened' signal. This wires the whole path so removing rotation is actually complete: Agent (agent/conversation_compression.py): - In-place now DURABLY replaces the transcript: replace_messages(session_id, compressed) on the same row (the canonical store the gateway reloads from), not just update_system_prompt. Resume reloads the compacted set; no loop. - Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids cleared) so next-turn appends diff against the compacted transcript. - Expose a rotation-independent signal: agent._last_compaction_in_place, and in_place=True on the session:compress event. - Fire the compaction-boundary hooks (context-engine on_session_start, memory manager on_session_switch, reason='compression') in BOTH modes — in-place passes the same id as parent so DAG/buffer state still checkpoints. Without this, memory/context plugins miss every in-place compaction. Gateway auto-compress (gateway/run.py): - Read agent._last_compaction_in_place; set history_offset=0 on rotation OR in-place (both return the compacted set, so slicing past the pre-compaction length would drop everything). Carry compacted_in_place in the result dict. - No extra rewrite needed: the agent shares the gateway's SessionDB, so its replace_messages already updated the canonical store load_transcript reads. Manual /compress (gateway/slash_commands.py): - The throwaway /compress agent has no _session_db, so rewrite_transcript is the durable write. 
Previously gated behind 'if rotated:' which treated 'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite — making /compress a silent no-op in in-place mode. Now rewrites on rotated OR in_place; the data-loss guard still fires only for the genuine no-rotation-AND-not-in-place failure. Hygiene auto-compress already writes _compressed to the same id unconditionally (its agent has no _session_db, can't rotate) — correct for in-place, no change. Tests (tests/run_agent/test_in_place_compaction.py): - Assert the DURABLE transcript IS the compacted set after reload (get_messages_as_conversation == compacted), message_count==2, flush identity reset, and the rotation-independent signal set on in-place / unset on rotation. Rotation regression guard unchanged. Verified: 64 tests green across in-place + rotation/persistence/boundary/ concurrent/failure-sync/command/cli suites; E2E both modes (durable replace, gateway offset=0, rotation preserves old transcript); ruff clean. Still default-off.	2026-06-20 10:57:07 -07:00
kshitijk4poor	47fadc24d7	feat(compression): in-place compaction option that keeps one session id (#38763 ) Context compression today rewrites the message list AND rotates the session id — it ends the session, forks a parent_session_id child, and renumbers the title (name -> name #2). That moving identity key is the root cause of a whole bug cluster: /goal lost (#33618), pending response lost at the split (#14238), orphan sessions (#33907), TUI sid desync (#36777), FTS search gaps + duplicate sidebar entries (#45117), null continuation cwd (#42228), and title-rename dead-ends (#48989). It also forced a large defensive apparatus (compression lock, contextvar/env/ logging triple-sync, orphan finalization, gateway SessionEntry re-propagation, tip projection) whose only job is surviving a mid-conversation id change. Add a compression.in_place config flag (default False during rollout). When True, compaction rewrites the transcript and rebuilds the system prompt but keeps the SAME session_id: no end_session, no child row, no title renumber, no contextvar/logging re-sync, no memory/context-engine session-switch. The conversation keeps one durable id for life, like Claude Code / Codex. Compaction is lossy by design — the pre-compaction transcript is summarized away, not archived. The rotation path is unchanged when the flag is off (moved verbatim into an else branch). Staged rollout: this PR ships the option behind a default-off flag for live validation; a follow-up flips the default and deletes the now-redundant rotation machinery, superseding the 14 open band-aid PRs in this area. 
- hermes_cli/config.py: add compression.in_place (default False), documented - agent/agent_init.py: resolve the flag -> agent.compression_in_place - agent/conversation_compression.py: branch compress_context() on the flag - tests/run_agent/test_in_place_compaction.py: in-place invariants + rotation regression guard + config default The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no boundary data loss. Prompt-cache invariant preserved: the system-prompt rebuild is the same single sanctioned invalidation that already happens during compaction — no NEW invalidation. Message alternation preserved.	2026-06-20 10:57:07 -07:00
Gille	a7983d5ad7	fix(dashboard): hide sidecar sessions from history (#49269) * fix(dashboard): hide sidecar sessions from history * test(dashboard): allow sidecar source in session payload	2026-06-19 18:06:38 -04:00
alt-glitch	990273d90a	fix(agent): accept pixel-correct image downscale when bytes grow (#48013 ) The image-too-large reactive shrink (try_shrink_image_parts_in_messages) conflated two independent constraints: it always rejected a resize whose re-encoded bytes were >= the original, even when the shrink was driven by a PIXEL-DIMENSION cap (Anthropic many-image 2000px) rather than the byte budget. Downscaled screenshot PNGs routinely re-encode LARGER in bytes, so the dimension-correct result was discarded and the image left oversized -> the provider re-rejected on retry and the session wedged forever. Fix: track which constraint triggered the shrink (bytes vs dimension) and gate the accept on the SAME axis. * dimension path: accept the result as long as it is now within max_dimension, regardless of byte size (verify via Pillow; fall back to the byte gate only when the re-encode can't be decoded). * bytes path: still require bytes to shrink, but ALSO re-check the per-side cap when it's active — _resize_image_for_vision returns a best-effort, possibly over-cap blob when it exhausts its halving budget on a very-high-aspect image, so a byte-shrink alone can leave it over the dimension cap and re-brick on retry. Extend the unshrinkable-oversized guard to the pixel axis so a partial shrink doesn't burn the one-shot retry. Single shared agent path -> fixes CLI, TUI, and gateway alike. Adds a real-Pillow runnable proof (repro_48013_image_shrink_brick.py) that reproduces the issue's per-image table (bricks 3/5 before, passes 5/5 after) plus unit invariants for the dimension and bytes accept/reject paths, partial-progress accounting, and the bytes-path still-over-cap regression surfaced by adversarial review. Closes #48013	2026-06-19 11:37:51 -07:00
tt-a1i	46f9d53468	fix(agent): aggregate anthropic aux calls via stream	2026-06-19 17:32:13 +05:30
kshitij	226ec2801a	Merge pull request #48367 from kshitijk4poor/salvage-47289 fix(agent): summarize non-retryable API errors so raw HTML never leaks to delivery	2026-06-19 14:30:04 +05:30
Hao Zhe	d7cd0bc086	fix(openviking): preserve structured sync attribution	2026-06-19 15:23:41 +08:00
Gille	e4452ffb8a	fix(agent): summarize structured provider error messages	2026-06-18 21:37:52 -07:00
xxxigm	f18f31ebf6	test(agent): cover non-retryable error HTML summarization Locks the contract that a non-retryable failure (a Cloudflare 403 "managed challenge" page) returns a short, HTML-free `error` field — guarding the field path where the raw page was dumped to Discord as ~31 messages. The test drives the standard chat-completions path with a concrete model so the turn actually reaches `client.chat.completions.create`, where the mocked 403 is raised. It asserts the create call happened (guarding against a vacuous pass — an empty model on the Codex Responses path would otherwise abort on a validation ValueError before any API call) and that the summarized error includes "403" while excluding <html> / _cf_chl_opt. The non-retryable abort path is provider-agnostic; a Cloudflare managed-challenge 403 can surface on any provider behind Cloudflare.	2026-06-18 15:46:19 +05:30
kshitijk4poor	1153b42b24	Merge upstream/main into OpenViking setup-UX (salvage #32445 ) Resolves conflicts from the OpenViking churn that merged after #32445 was opened (#48042/#47662 session-switch + write hardening, #47311/#47973): - plugins/memory/openviking/__init__.py: keep both __init__ field groups (the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down). - tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new setup-validation tests and main's session-switch/concurrency tests (disjoint additions to the same region). Two fixes layered while reconciling (contributor work otherwise preserved): - Restore the merged tenant-header contract (#22414/#21232). The PR had changed _VikingClient defaults to '' and made empty account/user OMIT the tenant headers; main's contract is that empty falls back to 'default' and the X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them). Reverted the constructor to 'account or os.environ.get(..., "default")' and updated the two PR tests that asserted the omit-when-empty behavior. - Close a secret-file TOCTOU in the setup writers. _write_env_vars and _write_ovcli_config wrote the api_key/root_api_key file and chmod 0600 AFTERWARD, leaving a world-readable window on newly-created files. Added _precreate_secret_file() to create with 0600 before any secret bytes land.	2026-06-18 11:28:51 +05:30
teknium1	c5eb64b9f7	fix(xai): scope native web_search to swap-only + reconcile composer ctx to 200k Salvage corrections on top of @XVVH's #44341: - Make native web_search injection a 1:1 swap for an already-present client web_search function, NOT an additive grant. The original unconditionally appended {"type":"web_search"} on every is_xai_responses turn with any tools, force-enabling Grok server-side search even when the user never enabled the web toolset (bypassing Hermes web-provider config + tool-trace plumbing). Now gated on a client web_search actually being present. - Reconcile grok-composer context to 200000 (merged in #47908) rather than 262144; 200k is xAI's published usable context window for Composer 2.5, 262144 is the /v1/responses input+output budget. - Update tests to match scoped behavior + add a no-web-toolset guard test. - AUTHOR_MAP entry for #44341 salvage. Incomplete-guard (server-side *_call items at in_progress no longer flip has_incomplete_items) and preflight built-in-tool allowlist kept as-is.	2026-06-17 17:33:32 -07:00
XVVH	6f89e17a33	fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context - model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models) - codex transport: inject native {"type":"web_search"} for is_xai_responses; drop client web_search to avoid duplicate-name 400s - codex adapter: do not treat in-progress server-side *_call items as incomplete - tests: adapter, transport build_kwargs, model_metadata, oauth recovery	2026-06-17 17:33:32 -07:00
Reiji Kisaragi	3d21666b2f	fix: preserve multimodal user content during persistence Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch. Add regression coverage for both text override behavior and multimodal preservation. Closes #44242	2026-06-17 09:49:39 -07:00
teknium	28f92478e3	test(hooks): cover session:compress event; drop dead import Follow-up to salvaged PR #41624: - Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused) - Add tests: session:compress emits with correct context, no-callback is safe, and a callback exception does not break compression	2026-06-16 11:45:36 -07:00
Hao Zhe	e3adbb5ae9	fix(openviking): sanitize skill memory input	2026-06-16 10:37:37 -07:00
Hao Zhe	2c2ca0443b	feat(memory): improve OpenViking setup UX	2026-06-17 01:04:26 +08:00
Teknium	4858942c55	fix(auxiliary): honor main fallback chain for auto tasks (#47235)	2026-06-16 06:23:24 -07:00
teknium	98ae28657f	feat(display): document and test memory_notifications setting Follow-up to salvaged PR #4684: - Add display.memory_notifications to DEFAULT_CONFIG (off|on|verbose, default on) - Document the setting in docs/user-guide/features/memory.md - Add resolver tests for off/on/verbose memory + skill paths	2026-06-16 05:45:40 -07:00
Wolfram Ravenwolf	4cf9d80fba	feat(display): verbose skill change notifications with content previews When display.memory_notifications is set to 'verbose', skill_manage notifications now show meaningful change details instead of just the generic tool message. Before (verbose mode): 💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement). After (verbose mode): 💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..." Changes: - skill_manager_tool.py: _patch_skill() now includes old/new string previews (truncated to 200 chars) in the result via '_change' key. _create_skill() and _edit_skill() include skill description from frontmatter for verbose create/edit notifications. - run_agent.py: Background review notification builder now reads the '_change' dict from skill tool results and formats descriptive notifications per action type (patch → old→new diff, create/edit → description preview). Falls back to generic message when _change data is unavailable (backwards compatible). This is especially useful when subagents patch skills, since neither the user nor the parent agent can see what the subagent changed.	2026-06-16 05:45:40 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
Teknium	2b4873f7fb	fix(agent): persist repaired-turn responses (#46071)	2026-06-14 03:20:25 -07:00
SHL0MS	bb46bf8ce4	fix(agent): surface model refusals instead of retrying them as errors A Claude refusal (HTTP 200, stop_reason="refusal", empty content) was laundered into a generic retry loop and surfaced as a misleading "rate limited / invalid response" or "no content after retries" error, burning paid attempts reproducing a deterministic refusal. This hit two distinct paths: - Direct Anthropic (anthropic_messages): validate_response rejected the empty-content refusal before normalize_response mapped refusal -> content_filter, so it fell into the invalid-response retry loop. - Nous Portal / OpenAI-compatible (chat_completions): the portal surfaces a Claude refusal via message.refusal with empty content, which sailed past validation and died in the empty-response retry loop. Fix (one unified content_filter dispatch for all backends): - AnthropicTransport.validate_response: accept empty content when stop_reason == "refusal" so it flows to normalize_response. - ChatCompletionsTransport.normalize_response: promote message.refusal to content + a content_filter finish reason. - conversation_loop: handle finish_reason == "content_filter" - fire the api_request_error hook (content_policy_blocked), try a configured fallback once, else return a clear terminal refusal message. Never retry a deterministic refusal. Supersedes #43084, which fixed only the direct-Anthropic path and could not reach the chat_completions/portal path. Tests: transport-level (validate_response refusal, message.refusal promotion) + end-to-end loop (refusal surfaced, exactly one API call). (cherry picked from commit `01f546f92c`)	2026-06-14 12:10:08 +05:30
brooklyn!	4b5ba112ad	fix: shrink images to reported provider dimension limit (#45979) Parse provider-reported image pixel ceilings so many-image Anthropic requests can recover by shrinking Retina screenshots below the stricter limit instead of retrying the same rejected payload.	2026-06-14 01:07:43 -05:00

431 commits