hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-21 10:22:18 +00:00

Author	SHA1	Message	Date
tt-a1i	46f9d53468	fix(agent): aggregate anthropic aux calls via stream	2026-06-19 17:32:13 +05:30
kshitij	226ec2801a	Merge pull request #48367 from kshitijk4poor/salvage-47289 fix(agent): summarize non-retryable API errors so raw HTML never leaks to delivery	2026-06-19 14:30:04 +05:30
Hao Zhe	d7cd0bc086	fix(openviking): preserve structured sync attribution	2026-06-19 15:23:41 +08:00
Gille	e4452ffb8a	fix(agent): summarize structured provider error messages	2026-06-18 21:37:52 -07:00
xxxigm	f18f31ebf6	test(agent): cover non-retryable error HTML summarization Locks the contract that a non-retryable failure (a Cloudflare 403 "managed challenge" page) returns a short, HTML-free `error` field — guarding the field path where the raw page was dumped to Discord as ~31 messages. The test drives the standard chat-completions path with a concrete model so the turn actually reaches `client.chat.completions.create`, where the mocked 403 is raised. It asserts the create call happened (guarding against a vacuous pass — an empty model on the Codex Responses path would otherwise abort on a validation ValueError before any API call) and that the summarized error includes "403" while excluding <html> / _cf_chl_opt. The non-retryable abort path is provider-agnostic; a Cloudflare managed-challenge 403 can surface on any provider behind Cloudflare.	2026-06-18 15:46:19 +05:30
kshitijk4poor	1153b42b24	Merge upstream/main into OpenViking setup-UX (salvage #32445 ) Resolves conflicts from the OpenViking churn that merged after #32445 was opened (#48042/#47662 session-switch + write hardening, #47311/#47973): - plugins/memory/openviking/__init__.py: keep both __init__ field groups (the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down). - tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new setup-validation tests and main's session-switch/concurrency tests (disjoint additions to the same region). Two fixes layered while reconciling (contributor work otherwise preserved): - Restore the merged tenant-header contract (#22414/#21232). The PR had changed _VikingClient defaults to '' and made empty account/user OMIT the tenant headers; main's contract is that empty falls back to 'default' and the X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them). Reverted the constructor to 'account or os.environ.get(..., "default")' and updated the two PR tests that asserted the omit-when-empty behavior. - Close a secret-file TOCTOU in the setup writers. _write_env_vars and _write_ovcli_config wrote the api_key/root_api_key file and chmod 0600 AFTERWARD, leaving a world-readable window on newly-created files. Added _precreate_secret_file() to create with 0600 before any secret bytes land.	2026-06-18 11:28:51 +05:30
teknium1	c5eb64b9f7	fix(xai): scope native web_search to swap-only + reconcile composer ctx to 200k Salvage corrections on top of @XVVH's #44341: - Make native web_search injection a 1:1 swap for an already-present client web_search function, NOT an additive grant. The original unconditionally appended {"type":"web_search"} on every is_xai_responses turn with any tools, force-enabling Grok server-side search even when the user never enabled the web toolset (bypassing Hermes web-provider config + tool-trace plumbing). Now gated on a client web_search actually being present. - Reconcile grok-composer context to 200000 (merged in #47908) rather than 262144; 200k is xAI's published usable context window for Composer 2.5, 262144 is the /v1/responses input+output budget. - Update tests to match scoped behavior + add a no-web-toolset guard test. - AUTHOR_MAP entry for #44341 salvage. Incomplete-guard (server-side *_call items at in_progress no longer flip has_incomplete_items) and preflight built-in-tool allowlist kept as-is.	2026-06-17 17:33:32 -07:00
XVVH	6f89e17a33	fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context - model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models) - codex transport: inject native {"type":"web_search"} for is_xai_responses; drop client web_search to avoid duplicate-name 400s - codex adapter: do not treat in-progress server-side *_call items as incomplete - tests: adapter, transport build_kwargs, model_metadata, oauth recovery	2026-06-17 17:33:32 -07:00
Reiji Kisaragi	3d21666b2f	fix: preserve multimodal user content during persistence Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242	2026-06-17 09:49:39 -07:00
teknium	28f92478e3	test(hooks): cover session:compress event; drop dead import Follow-up to salvaged PR #41624: - Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused) - Add tests: session:compress emits with correct context, no-callback is safe, and a callback exception does not break compression	2026-06-16 11:45:36 -07:00
Hao Zhe	e3adbb5ae9	fix(openviking): sanitize skill memory input	2026-06-16 10:37:37 -07:00
Hao Zhe	2c2ca0443b	feat(memory): improve OpenViking setup UX	2026-06-17 01:04:26 +08:00
Teknium	4858942c55	fix(auxiliary): honor main fallback chain for auto tasks (#47235 )	2026-06-16 06:23:24 -07:00
teknium	98ae28657f	feat(display): document and test memory_notifications setting Follow-up to salvaged PR #4684: - Add display.memory_notifications to DEFAULT_CONFIG (off\|on\|verbose, default on) - Document the setting in docs/user-guide/features/memory.md - Add resolver tests for off/on/verbose memory + skill paths	2026-06-16 05:45:40 -07:00
Wolfram Ravenwolf	4cf9d80fba	feat(display): verbose skill change notifications with content previews When display.memory_notifications is set to 'verbose', skill_manage notifications now show meaningful change details instead of just the generic tool message. Before (verbose mode): 💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement). After (verbose mode): 💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..." Changes: - skill_manager_tool.py: _patch_skill() now includes old/new string previews (truncated to 200 chars) in the result via '_change' key. _create_skill() and _edit_skill() include skill description from frontmatter for verbose create/edit notifications. - run_agent.py: Background review notification builder now reads the '_change' dict from skill tool results and formats descriptive notifications per action type (patch → old→new diff, create/edit → description preview). Falls back to generic message when _change data is unavailable (backwards compatible). This is especially useful when subagents patch skills, since neither the user nor the parent agent can see what the subagent changed.	2026-06-16 05:45:40 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
Teknium	2b4873f7fb	fix(agent): persist repaired-turn responses (#46071 )	2026-06-14 03:20:25 -07:00
SHL0MS	bb46bf8ce4	fix(agent): surface model refusals instead of retrying them as errors A Claude refusal (HTTP 200, stop_reason="refusal", empty content) was laundered into a generic retry loop and surfaced as a misleading "rate limited / invalid response" or "no content after retries" error, burning paid attempts reproducing a deterministic refusal. This hit two distinct paths: - Direct Anthropic (anthropic_messages): validate_response rejected the empty-content refusal before normalize_response mapped refusal -> content_filter, so it fell into the invalid-response retry loop. - Nous Portal / OpenAI-compatible (chat_completions): the portal surfaces a Claude refusal via message.refusal with empty content, which sailed past validation and died in the empty-response retry loop. Fix (one unified content_filter dispatch for all backends): - AnthropicTransport.validate_response: accept empty content when stop_reason == "refusal" so it flows to normalize_response. - ChatCompletionsTransport.normalize_response: promote message.refusal to content + a content_filter finish reason. - conversation_loop: handle finish_reason == "content_filter" - fire the api_request_error hook (content_policy_blocked), try a configured fallback once, else return a clear terminal refusal message. Never retry a deterministic refusal. Supersedes #43084, which fixed only the direct-Anthropic path and could not reach the chat_completions/portal path. Tests: transport-level (validate_response refusal, message.refusal promotion) + end-to-end loop (refusal surfaced, exactly one API call). (cherry picked from commit `01f546f92c`)	2026-06-14 12:10:08 +05:30
brooklyn!	4b5ba112ad	fix: shrink images to reported provider dimension limit (#45979 ) Parse provider-reported image pixel ceilings so many-image Anthropic requests can recover by shrinking Retina screenshots below the stricter limit instead of retrying the same rejected payload.	2026-06-14 01:07:43 -05:00
briandevans	1d584a301e	fix(agent): treat Codex reasoning items as thinking-only	2026-06-13 14:35:00 -07:00
briandevans	7d11fa4e9e	fix(codex-responses): let final_answer complete top-level incomplete responses	2026-06-13 13:45:29 -07:00
Teknium	8905ee6b8a	fix(agent): rewind flush cursor exactly when repair compacts before the cursor Follow-up to the #44837 clamp: a min() clamp only fixes cursor overshoot past the new end of the list. When repair_message_sequence drops/merges messages at indexes below the cursor, the clamp leaves the cursor pointing past unflushed rows and the turn-end flush silently skips them. Extract repair_message_sequence_with_cursor(): snapshot the flushed prefix by object identity before repair, then recompute the cursor as the count of surviving flushed messages. Falls back to the clamp when no snapshot is available. Keeps the safety guard in _flush_messages_to_session_db. Adds targeted tests for overshoot, before-cursor compaction, no-repair, bare-agent, and the flush guard.	2026-06-12 16:29:01 -07:00
Teknium	fca84fe20b	test: regression guard for Nous 429 fallback re-entry; AUTHOR_MAP entry	2026-06-12 12:21:29 -07:00
Teknium	c196269d8d	fix(credits): suppress usage gauge when top-up funds exist + add display.credits_notices toggle (#44716 ) The subscription-cap usage gauge (50/75/90% bands) ignored purchased (top-up) credits: a sub user with top-up funds got a sticky warn banner at 90% of their cap — permanently at >=100%, alongside grant_spent — despite being fully able to keep inferencing. The cap is the wrong denominator for an account that can keep spending. - evaluate_credits_notices: purchased_micros > 0 suppresses the usage band (grant_spent already covers the cap-reached + top-up case with the remaining balance). A top-up landing mid-session clears any showing band; spending top-up down to 0 resumes the gauge. - New display.credits_notices config (default true): false silences all credits notices. State capture and /usage are unaffected. Read once per agent (cached) in _emit_credits_notices, fail-open true. - Docs: configuration.md display block.	2026-06-12 01:06:46 -07:00
Erosika	87893fe4cb	fix(memory): flatten multimodal content before provider sync Multimodal turns carry message content as a list of typed parts ({type: "text"\|"image_url", ...}). _sync_external_memory_for_turn passed that list straight into MemoryManager.sync_all, and providers feed it to regexes — Honcho's sync_turn calls sanitize_context, where re.sub raised 'expected string or bytes-like object, got list'. Every turn with an attached image silently never synced. Flatten to plain text at the boundary: text parts joined, images noted as an [N image(s)] marker so the attachment isn't erased from recall. Fixing here covers all providers instead of patching each plugin. (cherry picked from commit `705bdb6ffe`)	2026-06-12 12:46:28 +05:30
Teknium	e24c935cf3	fix(bedrock): fall back to non-streaming InvokeModel when IAM denies InvokeModelWithResponseStream (#44293 ) IAM policies scoped to bedrock:InvokeModel only (a common least-privilege setup) reject converse_stream() with AccessDeniedException. The agent loop hard-prefers streaming and the denial never matched the 'stream not supported' auto-fallback, so InvokeModel-only users looped on AccessDenied forever. - agent/bedrock_adapter.py: new is_streaming_access_denied_error() detector (ClientError code check + wrapped-SDK message match); call_converse_stream() falls back to converse() on denial. - agent/chat_completion_helpers.py: bedrock_converse streaming branch retries inline via converse() and sets _disable_streaming so later turns skip the doomed stream attempt; the chat-completions retry block also recognizes the denial for the AnthropicBedrock SDK path (message pre-check avoids importing bedrock_adapter — and its lazy boto3 install — for unrelated providers). Both paths print a one-line notice telling the user which IAM action restores streaming.	2026-06-11 07:15:30 -07:00
0xyg3n	9f95f72b98	fix(agent): strip api_messages in thinking-signature recovery so the retry actually omits thinking blocks The thinking-signature recovery in agent/conversation_loop.py popped reasoning_details from messages, then continued to retry. That had two defects. First, the strip never reached the wire payload. api_messages is built once at the start of the turn by shallow-copying every entry in messages (line 919 area). Each api_messages entry has its own reference to the same reasoning_details list. When build_api_kwargs runs on every retry iteration of the inner while-loop, it consumes api_messages, not messages. Popping reasoning_details from messages left api_messages untouched, so the retry's request still carried the same thinking blocks Anthropic had just rejected. The classifier latched thinking_sig_retry_attempted = True after the first attempt, and the loop terminated with max_retries_exhausted on the same 400. Second, the pop mutated the canonical message list. messages is the same list _persist_session writes to state.db and the session transcript, so a single recovery permanently wiped every signed thinking block from the stored conversation. Subsequent turns reloaded the stripped state, hit the same 400 ('invalid signature' or 'cannot be modified', see #24107), and the agent stopped responding entirely. Cascading compaction-ended sessions then chained off the corrupted parent and the affected chat could not produce a response on any future turn. Move the strip onto api_messages, which is the API-call-time list rebuilt into kwargs on every retry. messages is no longer touched, so disk I/O stays clean and the recovery actually reaches the wire. Observed against the native Anthropic Messages API on claude-opus-4-7 and claude-opus-4-8 with the interleaved-thinking-2025-05-14 beta on hermes-agent 0.12.0 and 0.14.0. PR #24107 narrows the trigger; this change makes the recovery do what it always claimed to do, and prevents the destructive aftermath. Tests cover the api_messages strip in isolation: pop on a shallow copy does not affect the source, the canonical messages list survives the strip, idempotency on a duplicate firing path, and a no-op when no reasoning_details exist on the messages. Related: #24107, #26959, #17861.	2026-06-10 12:39:44 -07:00
Xiangji	19c07c4037	fix(params): send max_completion_tokens for newer OpenAI families on custom endpoints Third-party OpenAI-compatible endpoints (self-hosted gateways, OpenRouter, Azure proxies) fronting gpt-4o / gpt-4.1 / gpt-5+ / o1-o4 models silently received max_tokens and 400'd with unsupported_parameter, because the three kwarg-selection sites only checked base_url_hostname(...) == "api.openai.com" and fell through to max_tokens on every other host. The constraint is enforced server-side by the model family, not by the URL, so name-based detection is required as a fallback. Changes: - utils.py: new shared helper model_forces_max_completion_tokens(model) that prefix-matches gpt-4o, gpt-4.1, gpt-5, o1, o3, o4 families on normalized (lowercased, vendor-prefix-stripped) names. - run_agent.py: _max_tokens_param ORs the helper into the URL check. - agent/auxiliary_client.py: - auxiliary_max_tokens_param gains an optional keyword-only model arg. - _build_call_kwargs inline branch applies the same check for both provider == "custom" and non-custom paths. Tests: - tests/test_model_forces_max_completion_tokens.py: 31 new cases covering positive families, negatives (classic gpt-4, claude, llama, mistral, qwen, deepseek), vendor prefixes, case-insensitivity, whitespace, None/empty, and substring-not-prefix guards. - tests/run_agent/test_run_agent.py::TestMaxTokensParam: 5 new model-based cases (custom + gpt-5.4, openrouter + gpt-4o-mini, custom + o1-preview, classic gpt-4-turbo keeps max_tokens, llama3 keeps max_tokens). - tests/agent/test_auxiliary_client.py::TestAuxiliaryMaxTokensParam: new class, 7 tests covering the URL x model matrix.	2026-06-09 23:22:10 -07:00
JP Lew	cb4cc08b0a	fix(codex): record app-server token usage in session accounting	2026-06-09 02:46:04 -07:00
Teknium	d6c11a4575	test(run_agent): fix racy ordering in test_concurrent_handles_tool_error (#42356 ) The test keyed the 'which call raises' decision on a shared invocation counter (first call → raise, second → success), then asserted the error landed in messages[0] (c1) and success in messages[1] (c2). But _execute_tool_calls_concurrent runs the two web_search calls on a thread pool with no ordering guarantee — c2's handler can be invoked first, take the 'first call raises' branch, and the error ends up in messages[1]. Results are ordered by tool_call_id, so messages[0] (c1) was then 'success' and the assertion failed. It passed in isolation but reliably failed under CI's full parallel slice (8 xdist workers) where the scheduler actually interleaves the two handlers. Fix: tie the raise to a specific tool call via its arguments (q=boom raises, q=ok succeeds) instead of invocation order, and assert tool_call_id ↔ content pairing explicitly. Deterministic regardless of thread scheduling — verified 10/10 in isolation and the full TestConcurrentToolExecution class (32) green.	2026-06-08 14:40:39 -07:00
Teknium	c9094f5e5f	fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314 ) * fix(stream): don't report dropped mid-tool-call streams as output truncation A streaming tool call whose SSE ends with no finish_reason (the upstream delivers the tool name + opening '{' then closes the connection cleanly, no terminator, no [DONE]) was stamped finish_reason='length' by the mock builder. That routed it through the output-cap truncation path: 3 useless max_tokens-boosted retries, then the misleading 'Response truncated due to output length limit' error — even though the model never reported hitting any cap. Reproduced live on nvidia/nemotron-3-ultra:free via the Nous dedicated endpoint, which stalls/drops during large tool-arg generation (50s-4m41s). Now: when tool args are incomplete AND the provider sent no finish_reason, tag the response as a partial-stream stub so the loop reports an honest mid-tool-call drop and asks the model to chunk its output (existing continuation machinery), instead of escalating output budget and lying. A provider-reported finish_reason='length' still takes the real-truncation path unchanged. * test(stream): update truncated-tool-args test for drop-vs-cap split test_truncated_tool_call_args_upgrade_finish_reason_to_length pinned the old behaviour where ANY incomplete tool args → finish_reason='length' with tool_calls preserved. That single-chunk-no-finish_reason scenario is exactly the mid-tool-call stream drop now reclassified as a partial-stream stub. Split into two tests matching the new contract: - no finish_reason + incomplete args → PARTIAL_STREAM_STUB_ID, tool_calls=None, _dropped_tool_names set (the drop path) - explicit finish_reason='length' + incomplete args → tool_calls preserved, 'length' upgrade unchanged (the genuine output-cap path)	2026-06-08 11:56:10 -07:00
teknium1	094aa85c37	refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) Lift the 5 agent-construction/session-resume methods out of HermesCLI into hermes_cli/cli_agent_setup_mixin.py:CLIAgentSetupMixin. Behavior-neutral; cli.py 14139 -> 13492 LOC. Methods moved (~647 LOC): _ensure_runtime_credentials, _resolve_turn_agent_config, _init_agent, _preload_resumed_session, _display_resumed_history. All self.* calls resolve unchanged via the MRO (HermesCLI(CLIAgentSetupMixin, CLICommandsMixin)). Import split (same recipe as #41942): 2 neutral deps (sys, _escape) imported at the mixin module top; 12 cli.py-internal helpers/constants (AIAgent, ChatConsole, CLI_CONFIG, _cprint, _DIM, _RST, _accent_hex, ...) imported lazily per-method (from cli import ...) so the mixin never imports cli at module scope -> no cycle. Repointed one source-inspection change-detector (test_callable_api_key.py) to read the mixin file where the method now lives.	2026-06-08 09:41:34 -07:00
teknium1	7a5827c8b0	test: repoint percentage-clamp source guard to gateway/slash_commands.py test_gateway_run_clamped read gateway/run.py asserting the /usage stats handler clamps pct with min(100, ...). That handler moved to gateway/slash_commands.py in this PR's extraction; repoint the guard so it still fires on clamp removal. tests/run_agent/ + tests/gateway/ 8024 passed / 0 failed.	2026-06-08 01:25:35 -07:00
Martín Alcalá Rubí	132d6fe6d6	fix(volcengine): strip XML attribute fragments from tool_use.name (#33007 ) VolcEngine's api/plan endpoint occasionally leaks raw XML attribute fragments into tool_use.name when its protocol-translation layer converts the model's native XML-style tool emission to Anthropic Messages tool_use blocks, producing names like: terminal" parameter="command" string="true execute_code" parameter="code" string="true session_search" parameter="session_id" string="true The corruption happens server-side at the provider, but it breaks every tool call for affected users — no normalization rule in repair_tool_call can rescue them, so each request runs through three retries and then aborts as partial. Add an early sanitizer in agent_runtime_helpers.repair_tool_call that trims at the first ' " ', " ' ", '<', or '>' character (idx > 0 only) so the rest of the existing repair pipeline (lowercase / snake_case / fuzzy match) can resolve the cleaned name normally. Whitespace is deliberately NOT a separator — the legitimate "write file" -> write_file repair path (covered by test_space_to_underscore) must keep working. Tests: 11 new regression cases in TestVolcEngineXmlPollution covering all three observed polluted names, CamelCase + pollution mix, single-quote variants, angle-bracket variants, clean-name passthrough, and the whitespace-preservation guard. All 18 pre- existing repair tests still pass (29 total in the file).	2026-06-07 22:22:01 -07:00
teknium1	54870847cb	refactor(agent): extract run_conversation prologue into agent/turn_context.py Phase 1 of the god-file decomposition plan. run_conversation's ~470-line once-per-turn setup block (stdio guarding, retry-counter resets, user-message sanitization, todo/nudge hydration, system-prompt restore-or-build, crash-resilience persistence, preflight compression, the pre_llm_call hook, and external-memory prefetch) is moved verbatim into build_turn_context(), which returns a TurnContext dataclass the loop unpacks. Behavior-neutral move-and-name refactor: the builder mutates `agent` exactly as the inline code did; only the locals the loop reads back are returned. - run_conversation: 4602 -> 4217 LOC (-385) - agent/conversation_loop.py: 4965 -> ~4580 LOC - new agent/turn_context.py: focused, dependency-injected, unit-tested in isolation Tests: tests/run_agent/ 1570 passed / 0 failed under per-file process isolation. Relocation follow-ups: 413_compression mocks now patch both module references; nudge/on_turn_start source-inspection guards point at the extracted module.	2026-06-07 22:17:35 -07:00
islam666	9513793ad7	fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072 ) Xiaomi MiMo (and potentially other providers) support multimodal user messages but reject list-type tool message content with 400 'text is not set'. Previously this was handled reactively — the API call would fail, images would be stripped, and the request retried, losing visual info. Fix: add supports_vision_tool_messages field to ProviderProfile (default True). Xiaomi sets it to False. _tool_result_content_for_active_model now checks this field proactively and returns a text summary instead of list content, avoiding the round-trip failure entirely.	2026-06-07 21:50:57 -07:00
islam666	b18490b890	fix(compaction): prevent infinite loop when transcript fits in tail budget When summary_target_ratio is large (e.g. 0.45) and the context_length is moderate (e.g. 96000), the soft_ceiling (token_budget * 1.5) can exceed the total transcript size. _find_tail_cut_by_tokens walks the entire transcript without breaking early, and the resulting compress window is either empty (compress_start >= compress_end) or a single message whose summary-of-one overhead saves ~0 tokens. Both outcomes cause a no-op compression that does not increment _ineffective_compression_count, so should_compress() returns True on every subsequent turn and the loop repeats endlessly. Fix (two layers): 1. _find_tail_cut_by_tokens: when the backward walk consumed the entire transcript without breaking (cut_idx <= head_end and accumulated <= soft_ceiling), re-walk with the raw (non-inflated) token budget to find a meaningful cut that gives the summarizer a useful middle window. 2. compress(): when compress_start >= compress_end, increment _ineffective_compression_count and log a warning so the existing anti-thrashing guard in should_compress() can break the loop. Fixes #40803	2026-06-07 21:50:57 -07:00
bmoore210	330ca4585b	fix: harden gateway startup and turn persistence Persist the inbound user turn before provider/tool execution so a crash before run_conversation() (e.g. provider/httpx client init failure) keeps the inbound message in the transcript. Repair stale/missing SSL_CERT_FILE state on gateway startup, and avoid duplicate gateway fallback writes.	2026-06-07 02:15:23 -07:00
Sanidhya Singh	a216ff839b	fix(agent): honor model.default_headers for custom OpenAI-compatible providers (#40033 ) Custom OpenAI-compatible endpoints sitting behind a gateway/WAF can reject the OpenAI Python SDK's default identifying headers (User-Agent: OpenAI/Python, X-Stainless-*) and return an opaque 502/4xx even though the same request body succeeds under curl. There was no supported way to override those headers. Add a model.default_headers config key whose values are merged onto the OpenAI client's default_headers, taking precedence over provider- and SDK-supplied defaults. Applied at client construction and on every credential swap / client rebuild so the override survives reconnects. No-op for native Anthropic / Bedrock modes and when unconfigured.	2026-06-07 02:02:40 -07:00
Teknium	e18f14d928	test(kimi): align stale parity/profile tests with thinking-xor-effort contract (#41095 ) * test(kimi): align stale parity/profile tests with thinking-xor-effort contract `ce4e74b3` (fix(kimi): send thinking xor reasoning_effort, never both) changed the Kimi profile to emit at most one of extra_body.thinking or a top-level reasoning_effort, and added tests/plugins/model_providers/test_kimi_profile.py to pin it — but left two older test files still asserting the removed 'send both' behavior, turning main red for every PR branched after it. Update the stale assertions to the xor contract: - explicit recognized effort (low\|medium\|high) -> reasoning_effort only, no thinking - enabled w/o effort, or no reasoning_config -> thinking:enabled only, no reasoning_effort - disabled -> thinking:disabled only No production change. * test(kimi): cover remaining xor stale assertions (profile_wiring, run_agent) Two more test files asserted the pre-ce4e74b3 'thinking + reasoning_effort together' behavior — landed in a different CI shard so they surfaced only after the first batch went green: - tests/providers/test_profile_wiring.py::TestKimiProfileParity (2) - tests/run_agent/test_run_agent.py::TestBuildApiKwargs (3: kimi-coding, moonshot, moonshot-cn) Same realignment to the xor contract: default/enabled-without-effort emits thinking:enabled and no reasoning_effort; explicit effort emits reasoning_effort only. Verified by running the full provider + TestBuildApiKwargs Kimi surface (202 passed) plus a codebase-wide grep for any remaining paired thinking+effort assertion (none).	2026-06-07 01:52:49 -07:00
Frowtek	40cea4d58d	fix(agent): import SimpleNamespace for hook payload sanitization _hook_jsonable() referenced SimpleNamespace without importing it, so sanitizing any hook payload that contained one raised NameError: name 'SimpleNamespace' is not defined. Bedrock, Codex-responses, and the auxiliary client build their response / message / tool_call objects as SimpleNamespace and hand the raw objects to the post_api_request hook. The hook call sites swallow exceptions (except Exception: pass), so the crash silently dropped the observability hook for those providers. Add the missing `from types import SimpleNamespace` and a regression test covering the SimpleNamespace sanitization path.	2026-06-06 19:32:36 -07:00
Teknium	fe8920db18	fix(memory): reject memory tools that shadow core tool names (#40902 ) A memory provider tool whose name collides with a built-in core tool (e.g. clarify, delegate_task) was skipped from agent.tools at init but lingered in MemoryManager._tool_to_provider, where the has_tool dispatch branch could route a call to a tool that was never registered (#40466). Block the collision at registration instead of patching dispatch: - MemoryManager.add_provider rejects any tool whose name is in _HERMES_CORE_TOOLS (warn + skip), so it never enters the routing table. - get_all_tool_schemas applies the same filter, so the manager never advertises a schema it would refuse to route. Built-ins always win, matching the invariant used by the TTS/browser/ search provider registries. Makes the dispatch-hijack structurally impossible regardless of branch ordering. Closes #40466.	2026-06-06 18:44:09 -07:00
kshitij	d4a7bfd3aa	Merge pull request #29724 from bbednarski9/bbednarski/nmf-41B-nemoflow-plugin feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay	2026-06-06 10:46:41 -07:00
Siddharth Balyan	fcb1944b4f	feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix Lockfile Fix / auto-fix-main (push) Waiting to run Details Nix Lockfile Fix / fix (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details * feat(tui): HERMES_DEV_CREDITS live-spend dev readout (L0 tracer for usage-aware credits) L0 of the usage-aware-credits feature: a dev-only, env-gated tracer that exercises the real header -> CreditsState -> TUI pipe end-to-end behind HERMES_DEV_CREDITS, de-risking the L1/L5 build before the notice policy exists. - agent/credits_tracker.py: CreditsState + parse_credits_headers (headers are strings -> paid_access via == "true", never bool(); retain-last-known; only subscription_micros may be negative; _usd kept verbatim). - run_agent.py: _capture_credits / get_credits_state / get_credits_spent_micros, session-start baseline latch, + dev-gated "credits" capture log. - agent/chat_completion_helpers.py: capture on the streaming response. - agent/agent_init.py: init _credits_state + _credits_session_start_micros. - tui_gateway/server.py: _get_usage emits dev_credits_spent_micros only when flagged. - ui-tui appChrome.tsx / types.ts: cents delta status segment + "(dev credits)" banner. Off by default; silent for normal users. Validated live against staging (capture log delta matches the TUI segment). Throwaway consumer (readout/log/ banner); credits_tracker + the capture plumbing are the real feature foundation. test(credits): lock parser under 9-state matrix + harden validation (L2) Add tests/agent/test_credits_tracker.py with 92 tests covering the 9-state matrix (healthy, sub_90pct, grant_exhausted, purchased_only, tool_pool_free, depleted, debt, missing, no_org) plus validation edge cases: version strict==1 with warn-once latch for v>1, bool-string trap (paid_access/tool_pool_gated_off == "true"/"false", never bool()), half-pair subscription limit treated as both-absent while parse succeeds, USD regex ^-?\d+\.\d{2}$, non-int micros → None, negative non-subscription micros → None, as_of_ms junk → None, zero limit ZeroDivision guard. Harden agent/credits_tracker.py to match the spec: - Add tool_pool_micros/tool_pool_gated_off/from_header fields to CreditsState - Add depleted property (== not paid_access, never remaining==0) - Change used_fraction guard to key off subscription_limit_micros (the actual denominator) not denominator_kind (metadata) - Replace fail-soft _safe_int with a sentinel-returning variant; full validation now returns None on any malformed field rather than silently defaulting - Add module-level warn-once latch for version > 1 - Add USD regex validation; add denominator_kind allow-list check - Parse x-nous-tool-pool-* prefix headers (not x-nous-credits-tool-pool-) feat(credits): notice spine — AgentNotice + notice_callback/notice_clear_callback + TUI binding (L1) L1 of usage-aware credits: the driver-agnostic notice delivery spine that L4's policy will fire through and L5's TUI render will consume. - agent/credits_tracker.py: AgentNotice dataclass (text/level/kind/ttl_ms/key/id; kind defaults "sticky", kept TTL-expressive for a future config seam). - run_agent.py: AIAgent gains notice_callback + notice_clear_callback slots and _emit_notice / _emit_notice_clear emitters (swallow all callback errors — a notice must never break the agent loop; no-op when unbound). - agent/agent_init.py: thread both callbacks through init_agent. - tui_gateway/server.py: bind both in _agent_cbs → notification.show / notification.clear WS events (snake_case payload, matching the existing gateway-event convention). - ui-tui/src/gatewayTypes.ts: notification.show / notification.clear arms on GatewayEvent. - tests/run_agent/test_notice_spine.py: 15 tests (emitter fire + fail-open + no-op, signature threading, TUI binding payload shape). Messaging push is out of v1 (binds neither callback). CLI binding + the TUI render/ decode land with L4 (firing) and L5 (render) so turn-end flush is wired correctly. * feat(credits): threshold reconciliation policy + tests (L4.1) * feat(credits): wire threshold policy into capture + latch (L4.2) After a fresh header parse, _capture_credits runs evaluate_credits_notices against the agent's _credits_latch and emits the result — clears first, then shows (so a recovered depletion clears before the "restored" success lands, and depleted wins the latest-wins slot). Gated on a bound notice_callback: messaging (no callbacks) still caches state for /usage but runs no policy. Parse stays fail-open (miss → keep last-known); the eval/emit path warns on failure rather than swallowing, so a depletion-notice bug can't vanish silently. - run_agent.py: _capture_credits split into parse (swallow→miss) + policy (warn); latch lazy-guarded (object.__new__ safety). - agent/agent_init.py: init agent._credits_latch = {"active": set(), "seen_below_90": False}. * feat(tui): render credits notices in the status bar (L5, Strategy B) The TUI now renders the notification.show / notification.clear gateway events the agent emits — a level-colored notice overrides the status/verb slot when not busy. - Notice state machine on turnController (pendingNotice + dedicated noticeTimer + show/clear/applyNotice/flushPendingNotice/clearNoticeState). createGatewayEventHandler decodes the events and delegates. - Render priority busy > notice > status (appChrome StatusRule); notice text rendered verbatim (its glyph comes from the policy), shrinkable so it never clips model│ctx; dev-credits banner + Δ segment preserved. UiState.notice is snake_case (matches wire). - Busy-wins: a notice arriving mid-turn is held and flushed at the THREE turn-end sites (recordMessageComplete / interruptTurn / recordError) — never idle(), which reset() also calls (would leak across sessions); reset() clears instead. - Dedicated noticeTimer (never statusTimer); TTL starts on visibility with an id-guard; latest-wins cancels the prior timer; clear is key-matched (no-op on mismatch); a sticky survives a turn (flush no-ops with no pending); session reset clears (no cross-session leak). - 20 tests (handler/turnController logic incl. R3-C2 timer isolation + render priority). * feat(credits): cold-start seed for new Nous sessions (L3) A genuinely-new Nous session has no inference header yet, so seed credits state from the authoritative GET /api/oauth/account snapshot at session start (in the new-session branch of _restore_or_build_system_prompt — inline, since the on_session_start plugin hook gets no agent reference). The seed runs the shared notice policy, so a session that opens already depleted warns IMMEDIATELY rather than only after the first turn. - Maps the nested account fields (paid_service_access → paid_access; total_usable / subscription / purchased on paid_service_access_info; rollover on subscription), each None-guarded; float dollars → micros via round(d1e6), _usd left "" (render formats from micros — never synthesize a verbatim usd from a float). - Magnitudes-only: no monthlyCredits on the endpoint → subscription_limit_* unset → used_fraction None → no warn90 from the seed (% only once a header lands, per D-E). - Provider-guarded to Nous; fail-open (any error leaves _credits_state None, never blocks startup); paid_access unknown ⇒ True (never falsely depleted). - run_agent.py: extracted the warm-path policy/emit block into a shared _emit_credits_notices() so capture and the seed fire notices identically. * feat(credits): /usage Nous credits magnitudes view + recovery trigger (L6) Add Nous credit dollar magnitudes to /usage (subscription / top-up / total + rollover + renewal + portal CTA), magnitudes-only per v1 (no % until the account endpoint exposes a denominator). Reuses the existing account-usage render machinery via a new pure build_nous_credits_snapshot() that maps a NousPortalAccountInfo to an AccountUsageSnapshot; no nous branch is added to fetch_account_usage (keeps the per-provider boundary intact). CLI /usage also doubles as a depletion-recovery trigger: a force_fresh account fetch, kept in a SEPARATE local so it never clobbers the header-sourced agent._credits_state (which alone carries used_fraction). If paid access recovered while credits.depleted is latched and a notice consumer is bound, it reuses agent._emit_credits_notices() to clear it. Gateway /usage displays magnitudes only — messaging binds no notice consumer, so it performs no recovery emit. Fail-open throughout: any portal hiccup leaves /usage unaffected. * refactor(credits): dedupe HERMES_DEV_CREDITS flag parse via shared helpers The dev-flag truthy check was inlined in three places. Replace with the shared utils.is_truthy_value (run_agent.py, tui_gateway/server.py — also drops a redundant inline `import os`) and a hoisted DEV_CREDITS_MODE export in ui-tui/src/config/env.ts (consumed by appChrome, which also stops recomputing the env check on every render). Behaviour-preserving; identical truthy set. * fix(credits): cut dead /usage recovery trigger + bound portal fetches (L6 review) Adversarial review found the /usage depletion-recovery trigger dead AND broken: the CLI binds no notice_clear_callback, the TUI runs /usage in a separate slash-worker subprocess (its own agent/latch), and the no-clobber rule made it evaluate stale paid_access anyway. Recovery already happens on the next inference (warm path), so the trigger was redundant — remove it and stop the depleted notice over-promising. - cli.py: remove the dead recovery block; bound the /usage portal fetch with a 10s wall-clock timeout (ThreadPoolExecutor) like the per-provider fetch — urllib's per-socket timeout is not a wall-clock guarantee. - agent/credits_tracker.py: reword the depleted CTA to "run /usage for balance" (no false recovery promise; /usage shows fresh magnitudes, sticky clears next turn). - agent/conversation_loop.py: same wall-clock timeout on the cold-start seed fetch so a stalled portal can't hang session startup; tidy its time import. * chore(credits): dev notice-state fixtures (HERMES_DEV_CREDITS_FIXTURE) Throwaway dev scaffolding to exercise the notice pipeline without real spend or Redis seeding. Set HERMES_DEV_CREDITS_FIXTURE to a state name (healthy / sub_90pct / grant_exhausted / depleted / clear) or a file path whose contents name a state (re-read each turn → flip states live for recovery testing). _capture_credits injects the chosen CreditsState instead of parsing real headers and runs the shared notice policy. Deletable with the rest of the HERMES_DEV_CREDITS scaffolding. * feat(credits): /usage monthly-grant % gauge The portal /api/oauth/account subscription block now carries monthly_credits (the per-period grant allowance, the % denominator). The consumer parsed monthly_charge but dropped monthly_credits, so /usage stayed magnitudes-only. Capture monthly_credits into NousPortalSubscriptionInfo + _subscription_from_payload. build_nous_credits_snapshot emits a Subscription usage window (real % used, routed through the existing render machinery) when monthly_credits is a finite positive denominator and credits_remaining is finite and <= cap; otherwise it degrades to magnitudes-only (older portals, rollover-over-cap, or non-finite payloads). Guards (adversarial-review-driven): reject non-finite operands (json.loads parses bare NaN/Infinity by default → would render $nan + a false 100% used), reject bools, guard div-by-zero (cap>0), and suppress the gauge when remaining > cap (rollover spanning the period makes the cap a nonsensical denominator → the $X-of-$Y detail would read as a contradiction). Debt (remaining<0) clamps to 100%. Money rule preserved: the ratio + magnitudes are computed from numeric float account fields via display formatting, never by parsing a server _usd string (there are none on these dataclasses). 13 gauge tests added (tests/agent/test_nous_credits_gauge.py). fix(credits): show /usage Nous block whenever a Nous account is present /usage runs in a slash-worker subprocess whose resolved inference provider is often not "nous" even when the user has a Nous account, so gating the Nous credits block on (provider == "nous") hid it entirely — the account data was fully available but never rendered. Gate instead on "a Nous account is logged in": a cheap local auth-state lookup (get_provider_auth_state('nous') has an access_token) decides whether to attempt the portal fetch, regardless of which provider inference runs on. In the gateway the block is also lifted out of the 'if provider:' scope so a Nous-credentialled user with another (or no) resident inference provider still sees their balance. Fail-open and the per-fetch wall-clock timeout are preserved. * fix(credits): show /usage Nous block when there's no live agent (TUI slash-worker) In the TUI, /usage runs in a slash-worker subprocess that resumes the session WITHOUT building an agent (self.agent is None), so _show_usage early-returned "(._.) No active agent" before ever reaching the Nous credits block — which is agent-independent (a portal fetch gated on Nous auth-state). Extract the block into _print_nous_credits_block() and run it at the no-agent / no-calls early-returns too (returns True if it printed, so the fallback message only shows when there's genuinely nothing). Verified live against staging: the block + monthly-grant gauge now render in the slash-worker /usage path (previously hidden). The plain CLI REPL + messaging paths are unchanged (they have a live agent). * feat(credits): escalating 50/75/90 usage bands (single status line) Replace the lone 90%-used warning with three escalating bands (50 info, 75 warn, 90 warn) shown as ONE status-bar line: it displays the highest band the subscription grant has crossed, replaces the line as usage climbs, steps back down on recovery, and clears below 50%. No stacking, no per-turn churn. Bands live in a tunable CREDITS_USAGE_BANDS list; the policy derives everything from it. Single notice key (credits.usage) with a usage_band latch field so the notice only re-emits when the band actually changes. The crossing gate (seen_below_90) is preserved so a fresh live session that opens mid-range stays quiet until it has been observed below the lowest band (cold-start primes it when it wants an open-high warning). Denominator math unchanged: % = subscription grant burn (cap - grant_remaining)/cap, clamped [0,1]; top-up never moves the %. Migrated test_credits_policy.py to the new key + added TestUsageBands (climb, step-down, recovery-clear, idempotent, inclusive boundaries). * feat(credits): hydrate notices at session OPEN via shared seed (TUI + first-turn) Notices previously only fired inside a conversation turn (first message), so a session that opened already depleted / past a usage band showed nothing at 'ready'. Extract the cold-start seed into a shared seed_credits_at_session_start() and call it (a) in the TUI/desktop agent build right after the notice callback is wired (fires at 'ready', before any message) and (b) as the first-turn fallback in conversation_loop. Idempotent (skips once _credits_state exists) and fail-open. The seed now maps monthly_credits -> subscription_limit_micros + denominator_kind='subscription_cap', so used_fraction is computable at seed time and usage-band warnings (not just depletion) hydrate on open. Primes the crossing latch so a session opening already in a band warns immediately. Degrades to depletion-only when monthly_credits is absent (older portals). Adds test_credits_cold_start.py covering open-at-band, depletion, debt, no-cap degradation, and the shared seed (fires/idempotent/skips-non-nous). * feat(credits): /usage monthly-grant % gauge + fixture support + TUI surfacing agent/account_usage.py: build_nous_credits_snapshot emits a subscription %% gauge when the portal supplies a positive, finite monthly_credits denominator with remaining <= cap (guards reject NaN/Infinity and rollover-over-cap, which would render $nan or a contradictory $X-of-$Y); degrades to magnitudes-only otherwise. Adds shared nous_credits_lines() (auth-gated, wall-clock-bounded portal fetch) so the CLI and TUI /usage render the same block, and _snapshot_from_credits_state() so HERMES_DEV_CREDITS_FIXTURE drives /usage offline too. TUI: session.usage RPC carries credits_lines (agent-independent) and the /usage panel renders them regardless of API-call count or resume state — previously the TUI's separate /usage implementation only showed token counts. Money rule preserved: %% and magnitudes come from numeric float account fields via display formatting, never by parsing a server _usd string. feat(credits): CLI REPL inline notices (parity with TUI) The plain CLI agent bound no notice callbacks, so credit notices were TUI-only. Bind notice_callback/notice_clear_callback on the CLI AIAgent; _on_notice renders a single level-colored line above the prompt (error red / warn yellow / success green / info dim) via _cprint, and seed credits at session open so a depletion or usage-band warning shows before the first message — the same hydration the TUI got. _on_notice_clear is a no-op (the REPL prints lines, no persistent slot). * test(credits): add sub_50pct + sub_75pct dev fixtures for the new usage bands The fixture set jumped 10%% -> 90%%; add sub_50pct (uf 0.5 -> band 50 info) and sub_75pct (uf 0.75 -> band 75 warn) so the new escalating bands are exercisable via HERMES_DEV_CREDITS_FIXTURE across all three surfaces (notice, session-open seed, /usage gauge). * fix(credits): usage-band notice clears on next prompt (not sticky-forever) A 50/75/90 usage heads-up was sticky and camped the status bar indefinitely. Clear the visible credits.usage notice when a new turn starts (startMessage), so it shows until your next prompt then yields. The server latch is unchanged, so it won't re-nag at the same band — it only re-shows when the band actually changes (climb) or clears when usage drops below the lowest band. Depletion stays sticky. * refactor(credits): consolidate the /usage credits block behind nous_credits_lines() The CLI (_print_nous_credits_block) and the messaging gateway (_handle_usage_command) each re-implemented the auth-gate + portal fetch + render, and both bypassed the dev-fixture short-circuit that only the TUI honored — so /usage ignored HERMES_DEV_CREDITS_FIXTURE on the CLI and in chat. Route both through the shared agent.account_usage.nous_credits_lines() helper: one fetch/render path, one auth gate, and the fixture works on every surface (~60 fewer duplicated lines). The gateway usage test recorded only the last asyncio.to_thread call; /usage now dispatches both the account fetch and the credits fetch, so it records every call and matches the account fetch by its provider arg. * fix(credits): keep the /usage gauge type-safe and log its fail-open path _is_finite_num is now a TypeGuard[float], so the type checker narrows the gauge operands (monthly_credits / credits_remaining) and the magnitudes passed to _fmt_usd through it — no more None-operand warnings on the arithmetic. Add a debug breadcrumb on the nous_credits_lines portal-fetch fail-open so a dead /usage block is diagnosable in agent.log without a dev flag. * fix(credits): harden the header tracker — prod-leak gate, hot-path probe, fire-and-forget seed - Prod-leak guard: dev fixtures (HERMES_DEV_CREDITS_FIXTURE) now also require HERMES_DEV_CREDITS, so a stray fixture var can't surface fabricated balances on a real account. Matches the documented run workflow (both vars set together). - Hot-path probe: parse_credits_headers checks for the version sentinel header before allocating a lowercased copy of the response headers — skips that work on every non-Nous API call. Behaviour-identical and still case-insensitive. - Fire-and-forget seed: the real portal fetch in seed_credits_at_session_start now runs in a daemon thread, so a slow/unreachable portal never delays session "ready" (previously blocked up to 10s). The dev-fixture path stays synchronous; the thread re-checks idempotency before hydrating (a live header may land first). - Diagnostics: debug breadcrumbs on the parse and seed fail-open paths so a crashed parser / dead seed is distinguishable from a legitimate no-headers miss. Cold-start tests set HERMES_DEV_CREDITS alongside the fixture to match the gate. * test(tui): fix env-timing in the StatusRule dev-credits assertion DEV_CREDITS_MODE is read once at module load (config/env), so mutating process.env.HERMES_DEV_CREDITS inside the test couldn't flip it — the dev-banner assertion only passed if the env was exported before vitest started, and failed in a normal run. Move that assertion to a sibling file that mocks config/env with DEV_CREDITS_MODE: true (scoped, no module-reset / React-identity hazard). * test(credits): cover the dev-fixture /usage render and usage-band clear-on-prompt - _snapshot_from_credits_state (the offline /usage renderer) had no direct test: lock the gauge math, the verbatim _usd magnitudes, the depletion line and the fixture marker, plus the no-cap (no gauge) and None-state cases. - turnController.startMessage had no test for clearing the credits.usage notice on the next prompt while leaving credits.depleted sticky. feat(credits): deliver credit notices over messaging gateways Bind notice_callback/notice_clear_callback on the per-turn gateway agent so usage-band / depletion / restored notices reach Telegram/Discord/Slack/ etc. Previously the messaging gateway bound neither callback, so the agent's _emit_credits_notices early-returned and a chat user crossing a band got nothing unless they ran /usage manually. - render_notice_line(): AgentNotice -> single plaintext line (level glyph + text), plaintext-only so it renders uniformly without per-platform escaping. Fail-soft on malformed/empty notices. - Standalone push for every notice (messaging has no persistent status bar): route through the shared _deliver_platform_notice rail (honors private/ public delivery + thread metadata), scheduled onto the gateway loop via safe_schedule_threadsafe from the agent's sync worker thread — same pattern as _status_callback_sync. - The fired-once latch lives on the cached (reused-in-place) agent and persists across turns, so a band crosses once -> one push, no per-turn re-nag. Re-fires only after idle-eviction rebuilds the agent (a reminder). - Recovery ('Credit access restored') rides the show path (emitted as a success notice, not a clear). notice_clear_callback is a no-op: a sent platform message can't be cleanly retracted. Tests: render glyph/levels/fail-soft + public/private delivery seam through _deliver_platform_notice + no-adapter no-op. * fix(credits): don't double the glyph on messaging notices render_notice_line prepended a per-level glyph, but the notice policy already bakes the glyph into the text (and the TUI + CLI render it verbatim) — so every credit notice over messaging came out doubled ("⚠ ⚠ Credits 90% used", "⛔ ✕ Credit access paused"). Emit the text verbatim instead; drop the now-dead level→glyph map. The render tests fed glyph-less text (and the success case only checked startswith), so the doubling slipped through. Rework them around the verbatim contract and add an end-to-end regression that runs real evaluate_credits_notices output through render_notice_line and asserts the line is returned unchanged.	2026-06-06 13:18:18 +05:30
Brooklyn Nicholson	0f45509daf	fix(agent): make mid-turn /steer trusted, not read as injection A steer rides inside a tool result (the only role-alternation-safe slot mid-turn), so a bare "User guidance:" line reads as untrusted tool content — well-behaved models refuse it as suspected prompt injection (observed live: "I only follow instructions from you directly, not ones injected through command results"). - Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker (prompt_builder.format_steer_marker), shared by both drain sites. - Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this exact marker and trusts it as a genuine user message — while still ignoring lookalikes buried in tool/web/file output. Static text → byte-stable prompt, no prompt-cache regression; gated on the agent having tools. - Desktop: steer ack is now an inline transcript note (⏩ steered · …) instead of a toast. Marker is intentionally static (not a per-session nonce) to honor the byte-stable system-prompt caching policy; nonce hardening noted as follow-up.	2026-06-05 20:59:36 -05:00
Teknium	ec46f5912e	fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints (#39730 ) * fix: respect disabled auto-compaction on context overflow Port from anomalyco/opencode#30749. When compression.enabled is false, NO automatic compaction trigger may fire. The proactive token-threshold paths (preflight + post-response should_compress gate) already honoured the setting, but the three provider-overflow recovery paths in the agent loop — long-context-tier 429, 413 payload-too-large, and context-overflow — called _compress_context() unconditionally, silently compressing and rotating the session against the user's explicit choice. Add a single guard at the top of the overflow-recovery dispatch: when compression is disabled and the error is one of those three overflow classes, surface a terminal error (compaction_disabled: True) telling the user to /compress manually, /new, switch to a larger-context model, or reduce attachments. Manual /compress (force=True) is unaffected — it never enters this loop. Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't compress when disabled; control case still compresses when enabled). Existing overflow-recovery tests updated to enable compaction explicitly (they verify the recovery fires); fixture defaults flipped to True to match production (compression.enabled defaults to True). * fix(gemini): default native maxOutputTokens + strip OpenAI extra_body on Gemini endpoints Two distinct failures hit users on the gemini provider with only Google AI Studio keys set. 1. Truncation loop: build_gemini_request() only set maxOutputTokens when max_tokens was non-None. Hermes passes None to mean "unlimited", but Gemini's native generateContent does NOT treat an absent maxOutputTokens as full budget — it applies a low internal default and stops early with finishReason=MAX_TOKENS, truncating tool calls. The agent then retries 3x and refuses the incomplete call. Now default to the published 65,535 ceiling (shared by all current Gemini text models) when max_tokens=None. 2. HTTP 400 on Gemini endpoint: the chat_completions transport assembles profile extra_body (Nous portal 'tags', reasoning, provider prefs) and sends it via the OpenAI client to whatever base_url is resolved. When a profile that emits extra_body (e.g. Nous) is active but the endpoint is a native Gemini base_url — typical when only Google creds exist and a fallback/aux call lands on Gemini — Google rejects the unknown 'tags' field with a non-retryable 400. Strip all non-thinking_config extra_body keys when the resolved endpoint is native Gemini. Verified E2E against real transport code: tags stripped on native Gemini, preserved on Nous and the /openai compat endpoint; maxOutputTokens=65535 on None, explicit values respected.	2026-06-05 03:53:59 -07:00
kyssta-exe	6bdbe30763	fix(vision): guard image pixel dimensions, not just bytes (#37677 ) Anthropic enforces two independent ceilings per image: 1. 5 MB encoded byte size 2. 8000 px longest side Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB) passes every byte check but fails the pixel check, returning a non-retryable HTTP 400 that permanently bricks the conversation thread. Fixes: - error_classifier: add 'image dimensions exceed' pattern to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large and triggers the shrink/retry path instead of falling through to non-retryable error. - conversation_compression: check pixel dimensions (via Pillow) even when byte size is under the 4 MB target. If max(dims) > 8000, force shrink. - vision_tools._resize_image_for_vision: add optional max_dimension param. When set, images exceeding the pixel cap are downscaled even if they're under the byte budget. The resize loop now checks both byte AND pixel limits before accepting a candidate. Closes #37677	2026-06-04 06:16:45 -07:00
Nate George	e8c3ac2f5c	fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral) Fireworks/Mistral reject HTTP 400 'Extra inputs are not permitted, field: messages[N].tool_calls[M].extra_content' on any session whose history contains prior Gemini tool calls. Gemini 3 thinking models attach extra_content (thought_signature) to tool_calls; it survived to the wire because the sanitize paths only stripped call_id/response_item_id. Strip extra_content from the outgoing wire copy in both sanitize paths (ChatCompletionsTransport.convert_messages + _sanitize_tool_calls_for_strict_api), but gate it on the target model: keep extra_content for Gemini-family targets (the thought_signature MUST be replayed or Gemini 400s), strip it for everyone else — including non-Gemini models that inherit a stale Gemini signature earlier in a mixed-provider session. Native Gemini is unaffected (GeminiNativeClient bypasses these paths). Original stored history is never mutated (only the per-call copy). Fixes #17986.	2026-06-03 16:42:52 -07:00
Bryan Bednarski	2e0c9083db	feat(middleware): add adaptive execution intercepts Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>	2026-06-03 11:22:06 -07:00
teknium1	827f251426	perf(observability): gate tool-hook emit on has_hook; slim per-tool footprint The salvaged observer contract gated the API-request hot path on has_hook() but left the per-tool emit ungated: every tool call ran result-field derivation + payload dict build + invoke_hook dispatch even with zero plugins registered. - _emit_post_tool_call_hook now short-circuits on has_hook("post_tool_call") and derives status/error fields lazily (after the gate, only when a listener will consume them). status defaults to None -> derived; explicit blocked/cancelled callers still pass status through. - transform_tool_result emit (pre-existing hook) likewise gated on has_hook(); skips _tool_result_observer_fields when no listener. - Removed the now-redundant _tool_result_observer_fields pre-computation at the three ok-path call sites (model_tools, agent_runtime_helpers, tool_executor) — the helper derives them, so the no-listener path costs one dict lookup and the call sites shrink. - Tests: stub has_hook=True where payload correctness is asserted; add a no-listener regression proving post_tool_call/transform_tool_result emit is skipped when nothing is registered.	2026-06-03 06:36:46 -07:00

1 2 3 4 5 ...

400 commits