hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	a5849917a8	test(pets): make slow pet generation suite opt-in The pet generation image-processing suite is deterministic but expensive enough to blow the per-file CI timeout on Linux (140s), and it is not relevant to the fast timeout PR's normal signal. Keep it available for manual validation, but do not run it by default. Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper now preserves that opt-in variable through its hermetic env.	2026-06-25 00:44:53 -05:00
Teknium	7a65800fed	fix(cache): content-address prompt_cache_key so recurring cron jobs reuse the warm prefix (#52295 ) Recurring cron jobs were prompt-cache-cold on every fire. session_id is built as cron_<job_id>_<timestamp>, and the Codex/Responses transport used session_id directly as prompt_cache_key — so the timestamp changed the cache key on every run and the static prefix (agent identity + tool schemas) was re-paid each tick. Derive prompt_cache_key from a SHA-256 of the static prefix (instructions + sorted tool schemas) instead. Repeated fires of the same job share one content-addressed key (pck_<hash>) and reuse the warm prefix within the provider's cache TTL. The key changes exactly when the prefix changes — edit the job's prompt or toolset and it re-keys; leave it alone and it stays stable. session_id is left untouched for transcript isolation, log correlation, and the Codex/xAI session-scope routing headers (session_id, x-client-request-id, x-grok-conv-id) — those are the per-fire identity, not the cache key. Only the prompt_cache_key body field (standard OpenAI/Codex path and the xAI extra_body field) is content-addressed. Closes #51395. Co-authored-by: spiky02plateau <spiky02plateau@users.noreply.github.com> Co-authored-by: JoaoMarcos44 <JoaoMarcos44@users.noreply.github.com>	2026-06-24 21:46:30 -07:00
brooklyn!	0c442fa1d3	Merge pull request #52303 from NousResearch/bb/pets-gen-qa feat(pets): quality-first OpenRouter chain, stronger atlas gates, global pet-gen notifications	2026-06-24 23:16:40 -05:00
Brooklyn Nicholson	e92b5c6af8	feat(pets): quality-first OpenRouter model chain + stronger atlas gates + global pet-gen notifications OpenRouter/Nous image gen now runs a quality-first model chain by default: attempt the highest-fidelity OpenAI image model first, then fall back to Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback. Atlas validation rejects malformed model output instead of shipping it: adds a per-state collapse guard (a single sliver/fragment row no longer passes because other rows are healthy), on top of the existing postage-stamp + multi-pose checks. Desktop: pet-gen native notifications are now "global" (not tied to a chat session), so a background generation started from the command center fires an OS notification when the user is away even with no active session. Adds a neutral "This can take up to 5 minutes." banner on step 1, and lets the provider picker auto-size. Tests updated/added for the OpenRouter fallback chain, the collapse guard, and the global notification path.	2026-06-24 23:11:21 -05:00
Brooklyn Nicholson	a5a2edd451	feat(agent): recognize focused ad-hoc verification scripts Allow focused temporary scripts to satisfy verification when no canonical suite is detected, while keeping suite evidence distinct from ad-hoc proof.	2026-06-24 23:03:45 -05:00
Brooklyn Nicholson	2f1a47b90e	feat(agent): require verification before finishing edits Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.	2026-06-24 23:02:48 -05:00
Brooklyn Nicholson	f0beb6f617	test(agent): cover verification evidence ledger Exercise command classification, session scoping, stale edits, bounded retention, and natural expiry for recorded verification evidence.	2026-06-24 22:35:27 -05:00
brooklyn!	7157b213f5	Merge pull request #47959 from NousResearch/bb/pets-gen Pet generation: frame-perfect hatch flow, backend picker, CPU-safe chroma, and CI-hardening	2026-06-24 19:41:34 -05:00
Brooklyn Nicholson	1fe013ee16	feat(pets): polish generate flow and reduce hatch CPU pressure Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.	2026-06-24 19:08:06 -05:00
kshitij	cedbb4cfa2	Merge pull request #52140 from NousResearch/salvage/47707-tool-schema-validation fix(agent): validate context/memory tool schemas before wrapping (#47707)	2026-06-25 02:36:19 +05:30
Bartok9	710cd48fb1	fix(agent): validate context/memory tool schemas before wrapping Closes #47707 Context engines and memory providers expose tool schemas via get_tool_schemas(). agent_init.py wrapped each as {"type":"function","function":_schema} without validating that _schema carries a top-level name. A provider returning an entry already in OpenAI tool form ({"type":"function","function":{...}}) was then double-wrapped into a tool whose function has no name. Strict providers (e.g. DeepSeek) reject the entire request with HTTP 400 'tools[N].function: missing field name', so one malformed schema silently disables the whole toolset and breaks every turn. The schema was also never added to valid_tool_names, so even lenient providers could not call it. Add a shared normalize_tool_schema() helper that unwraps an already-wrapped entry and returns None for anything lacking a resolvable string name. Wire it into the agent_init context-engine loop and all three memory_manager surfaces (inject_memory_provider_tools, add_provider routing index, get_all_tool_schemas), so a single bad plugin schema is skipped with a warning instead of poisoning the request. Verification: 209 targeted agent/memory tests pass (incl. 9 new). New tests assert the unwrap + skip-nameless behavior and fail without the fix.	2026-06-25 02:17:29 +05:30
liuhao1024	8d1f6debfd	fix(agent): deepcopy plugin context engine to prevent parent corruption on delegate_task (#42449 ) When delegate_task spawns a child agent with a different model/provider, the child's init_agent loaded the plugin context-engine GLOBAL singleton by reference (`_selected_engine = _candidate`) and then called update_model() on it with the child's (smaller) context_length. Because parent and child shared the same object, this mutated the PARENT's compressor: e.g. DeepSeek 1M ctx silently dropped to 204800 and the compression threshold from 200K to 40K after any delegate_task with a different model. Deepcopy the singleton before assigning/mutating it (agent_init.py) so the child gets its own instance and the parent's compressor is untouched. Salvaged from #42452 by @liuhao1024 (authorship preserved). Added a source-pin regression test that fails if the production line reverts to the bare alias, plus an end-to-end test driving get_plugin_context_engine() and a StubEngine.update_model() — the original PR's tests exercised copy.deepcopy in isolation but did not guard the actual agent_init code path. Closes #42449. Supersedes #42469, #42474 (same one-line fix, no test).	2026-06-25 02:13:26 +05:30
texhy	aacc6bb0a8	fix(agent): trigger preflight compression on few-but-huge sessions (#27405 ) The preflight-compression gate only ran the (expensive) token estimate when the message COUNT exceeded protect_first_n + protect_last_n + 1. A session with a handful of very large messages never tripped the count condition, so compression was never attempted and the turn eventually hit a hard context-overflow error. Add _should_run_preflight_estimate() with OR semantics: run the estimate when either the message count exceeds the protected ranges (the historical gate) OR a cheap char-based estimate already crosses the configured threshold. The downstream estimate_request_tokens_rough() stays authoritative — this is only a hint that decides whether to pay for the full estimate. Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current main: the preflight gate moved from conversation_loop.py to turn_context.py since the PR was opened, so the helper + gate are placed there; the test imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal. Closes #27405.	2026-06-25 01:20:23 +05:30
Brooklyn Nicholson	32f837add1	feat(pets): prompt → atlas sprite-generation engine Turn a text prompt into a petdex-spec spritesheet (8×9 grid of 192×208 cells), grounded so every animation row stays the same creature: - orchestrate: base drafts (distinct variation nudges) → per-row grounded generation → atlas compose; one image call per row, rows fan out in parallel. - atlas: frame-perfect registration in normalize_cells — 1-D cross-correlation of each frame's column-mass profile locks the body (robust to limbs/cape), one shared per-state scale, bottom-anchored; plus alpha-hole repair, gutter severing, and interior-seeded chroma-pocket clearing. - prompts: pixel-art-by-default style hints + registration constraints. - store: local pet write (register_local_pet), slugify/unique_slug, export_pet, slug-realigning rename_pet, createdBy provenance.	2026-06-24 13:48:29 -05:00
kshitijk4poor	ac822e4d36	fix(compression): abort (preserve context) on transient network summary failure (#29559 , #25585 ) When context compaction's summary generation fails, the compressor's default path (abort_on_summary_failure=False) drops the middle window and inserts a static 'summary unavailable' marker — destroying the compacted turns. #29559 reported the field impact: a Connection error at the compaction moment dropped 124->15 messages (110 lost) for a long browser-automation task; #25585 is the same failure mode (failed summary commits a destructive compaction anyway). compress() already has an EXCEPTION to the historical drop default: auth failures (401/403) ALWAYS abort and preserve the session, because rotating into a placeholder-summary child on a broken credential strands the user. A transient network/connection error is the same situation in reverse: it WILL recover, and retrying then is strictly better than discarding context for a momentary blip. Extend the always-abort carve-out to terminal connection/network failures: - new _last_summary_network_failure flag, set in _generate_summary's terminal failure branch when _is_connection_error(e) (reached only after any main-model fallback is exhausted), reset alongside the auth flag; - compress() aborts when it's set (returns messages unchanged, _last_compress_aborted=True), independent of abort_on_summary_failure; - a network-specific operator warning (distinct from the auth + config-flag messages). Scoped to connection errors only: a generic 500/400 still takes the historical fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green). Tests: network-failure detection + abort-despite-flag-false, both mutation-checked (removing the flag-set fails detection; removing the carve-out fails the abort).	2026-06-24 18:31:51 +05:30
teknium1	ba50787180	test(anthropic-oauth): cover login token-endpoint host + fallback Add two regression tests for the salvaged #48706 fix: - login token exchange targets platform.claude.com first - falls back to console.anthropic.com when the new host is unreachable Also map the salvaged contributor's noreply email in release.py AUTHOR_MAP (CI author-map gate).	2026-06-23 23:59:40 -07:00
Teknium	0957d77187	test(agent): cover interrupt tool-tail alternation close (#48879 ) Regression coverage for the synthetic-assistant close: interrupt after a successful tool must persist an assistant tail (placeholder when no delivered text), real delivered text is preserved, and non-interrupted or non-tool tails are left untouched.	2026-06-23 23:52:28 -07:00
Teknium	8e7e104521	fix(cron): tell the user TUI/CLI cron jobs are local-only at create time (#51683 ) deliver=origin (or omitted) from a TUI or classic-CLI session produces a job with origin=null, because those sessions never populate the HERMES_SESSION_PLATFORM/CHAT_ID context vars that _origin_from_env reads. The scheduler then resolves no delivery target and skips delivery — the job runs and saves output to last_output, but nothing reaches the user and they only find out by polling cronjob(action='list') (#51568). This is by design (local sessions have no live-delivery channel), so the fix surfaces it instead of silently dropping the intent: - cronjob create now appends an informational notice to its result when a created job resolves to zero delivery targets and the user did not explicitly ask for deliver='local'. The check uses the scheduler's own _resolve_delivery_targets so it accounts for origin, home channels, 'all', and explicit platform targets — no false positives. - PLATFORM_HINTS gains a 'tui' entry (the TUI had none) and the 'cli' hint now states that cron jobs from these sessions are local-only and that deliver must target a gateway-connected platform to notify the user. This stops the agent promising a delivery that never happens. No scheduler/delivery behavior change; no new env var; cron isolation invariant untouched.	2026-06-23 23:27:48 -07:00
Brooklyn Nicholson	e495b33bf1	Merge remote-tracking branch 'origin/main' into bb/pets-merge # Conflicts: # hermes_cli/commands.py # tui_gateway/server.py	2026-06-23 19:05:22 -05:00
Teknium	e32ebc6aa2	feat(skills): /learn — distill a reusable skill from anything you describe (#51506 ) Open-ended skill learning across every surface. /learn <free text> takes a description of any source — a directory, a URL, the workflow you just walked the agent through, or pasted notes — and the live agent gathers it with the tools it already has (read_file/search_files, web_extract, the conversation, the pasted text), then authors a SKILL.md via skill_manage following the house authoring standards (<=60-char description, the standard section order, Hermes-tool framing, no invented commands). No engine, no model-tool footprint, works on any terminal backend (local, Docker, remote): /learn builds a standards-guided prompt and hands it to the agent as a normal turn. - agent/learn_prompt.py: shared standards-guided prompt builder - /learn registry entry (both surfaces) + CLI handler (inject onto input queue) + gateway handler (rewrite turn, fall through, /blueprint pattern) - tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat - dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text) composes a /learn request and runs it in chat - docs (slash-commands ref + skills feature page), 11 targeted tests Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234 (dir-distillation engine); reworked to be open-ended and engine-free per review.	2026-06-23 13:51:28 -07:00
brooklyn!	211ba9c7d3	feat(agent): one-shot LLM helper + llm.oneshot gateway RPC (#51261 ) A "one-shot" is a single stateless model call that runs OUTSIDE any conversation: it never touches session history, never breaks prompt caching, and returns plain text. UI surfaces need this for small generative chores — a commit message from a diff, a rename suggestion, a summary — where an agent turn would pollute the thread and hand-rolling an LLM call at every call site would be worse. - `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client plumbing (same path as title generation). Two call shapes: explicit instructions/input, or a registered `template` + `variables` (templates own the prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a `commit_message` template. Model selection inherits the live session via `main_runtime`, else the configured aux `task` backend. - `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the session's model when `session_id` resolves. Stateless by construction — no session mutation, cache untouched.	2026-06-23 08:01:50 +00:00
brooklyn!	af7b7f6322	feat(agent): expose coding-context project facts as structured data + project.facts RPC (#51259 ) Follow-up to the coding-context posture (#43316): that PR detects each repo's verify loop (manifests, package manager, exact test/lint/build commands, context files) and bakes it into the system-prompt snapshot — but only as a string, for the model. Non-prompt consumers (the desktop verify UI) had no way to read it without re-sniffing and drifting from the prompt. Split detection from rendering, keeping one source of truth: - `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured facts; `_project_facts()` now renders it into the same snapshot lines, so the prompt block stays byte-identical (cache-safe). - `project_facts_for(cwd)` resolves the workspace root (git, else marker) and returns the structured facts, or None outside a workspace. - `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP). Tests assert the structured output and that the UI-facing commands never drift from what the prompt block renders (one detector feeds both).	2026-06-23 08:00:01 +00:00
helix4u	3972701424	fix(agent): complete final text on last turn	2026-06-22 13:57:59 -07:00
Teknium	b1b20270c4	refactor(memory): move write-mirror gating behind MemoryManager interface The success/staged gating and op-expansion for mirroring built-in memory writes to external providers lived in a standalone agent/memory_write_bridge.py helper called inline from two core call sites (tool_executor.py, agent_runtime_helpers.py). That left the mirror decision-making in the agent loop, outside the memory-provider interface. Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the loop now hands over the raw tool result + args and a metadata callback, and the manager decides whether/what to mirror. Both core call sites collapse to a single call; the orphan module is removed. No MemoryProvider ABC change. Tests rewritten as behavior tests against the manager method.	2026-06-22 07:00:42 -07:00
Hao Zhe	027cb649ef	fix(memory): fail closed on unclear write results	2026-06-22 07:00:42 -07:00
Hao Zhe	70e7132e2f	fix(openviking): gate memory writes and add viking_forget Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove. Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.	2026-06-22 07:00:42 -07:00
kshitijk4poor	623b21bf24	fix(compress): reserve output tokens in the compaction threshold (#23767 , #43547 ) The compaction trigger compared estimated input against context_length * threshold, but the provider reserves max_tokens of OUTPUT out of the same window. With a large max_tokens (e.g. 65536 on a custom provider) the usable input budget is materially smaller than the raw window, so sessions hit a provider 400 before compaction ever fired. _compute_threshold_tokens now subtracts the output reservation (context_length - max_tokens) before applying the percentage and the small-window 85% guard. max_tokens is stored on the compressor (threaded from agent.max_tokens at construction) and reused across update_model() switches; None = provider default = no reservation (full-window behavior, unchanged). Reimplemented on the current _compute_threshold_tokens surface (the inline threshold calc the original PR targeted was since refactored for the small-window #14690 fix); composes with that 85% guard on the effective budget. Credit: @kyssta-exe (#43651) — original design for the output-token reservation in the compaction threshold. Closes #43547.	2026-06-22 17:26:17 +05:30
kshitijk4poor	b2c84a1626	fix(agent): defer preflight compaction until real usage after a compaction (#23767 , #36718 ) After a compaction, the post-compression path parks last_prompt_tokens=-1 and sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens still holds the stale pre-compression value (above threshold). should_defer_ preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False' short-circuit and let preflight fire a SECOND compaction before the provider reported real post-compaction usage. Add an early-return on the awaiting flag so deferral holds for exactly one turn; update_from_response() clears it. The flag-setting half (#36718) already landed on main via the in-place compaction path (conversation_compression.py); this adds the missing should_defer guard that consumes it. Credit: - @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design - @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement Closes #36718.	2026-06-22 16:33:18 +05:30
Basil Al Shukaili	72f75f8456	fix(compressor): count tool_call envelope in tail-budget token estimate (#28053 ) The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio — context kept growing. Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere. Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop) — 209 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 16:26:56 +05:30
kshitij	aa83213c53	Merge pull request #50740 from NousResearch/salvage/preflight-token-progress Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run Details Typecheck / typecheck (apps/desktop) (push) Waiting to run Details Typecheck / typecheck (apps/shared) (push) Waiting to run Details Typecheck / typecheck (ui-tui) (push) Waiting to run Details Typecheck / typecheck (web) (push) Waiting to run Details Typecheck / desktop-build (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Has been cancelled Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Has been cancelled Details fix(agent): count tokens, not just rows, as preflight compression progress (#23767, #39548)	2026-06-22 15:58:58 +05:30
kshitij	21541ce6e9	Merge pull request #50108 from NousResearch/salvage/f4m1-anthropic-pool fix(auth): consult credential_pool in resolve_anthropic_token (#26344)	2026-06-22 15:58:01 +05:30
Brooklyn Nicholson	5342eccf12	Merge remote-tracking branch 'origin/main' into bb/pets	2026-06-22 05:25:49 -05:00
kshitijk4poor	69de0360a1	fix(agent): align preflight token-progress floor to 5% (#23767 , #39548 ) Follow-up to the salvaged preflight token-progress fix: require a material (>5%) token reduction to count as progress, matching the overflow-handler retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the 3-pass preflight loop spinning. Adds boundary + zero-token regression tests.	2026-06-22 15:51:52 +05:30
kshitijk4poor	3545d29422	refactor(auth): drop dead select() fallback in anthropic pool resolver /simplify-code QUALITY finding: the `if callable(_available_entries): ... else: pool.select()` ladder was dead for the real CredentialPool type (`_available_entries` is always a bound method) AND the select() fallback violated the helper's read-only contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True, refresh=True), which persists to auth.json and triggers a network refresh. Call _available_entries(clear_expired=False, refresh=False) directly inside the existing try/except instead. Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the read-only / null-token contract tests were mutation-checked (flipping the flags / removing the None-guard fails the respective test).	2026-06-22 15:50:26 +05:30
JackJin	b08ee8ad04	fix(agent): count tokens, not just rows, as preflight compression progress Rebased onto god-file Phase 1 refactor — preflight compression has moved from agent/conversation_loop.py to agent/turn_context.py (no semantic change in the refactor itself; the bug below was carried over verbatim). The preflight compression loop in ``turn_context.py`` uses ``len(messages) >= _orig_len`` to decide whether a compression pass has made progress. That conflates two different conditions: a true no-op (transcript materially unchanged) and effective token compression that summarises message contents but keeps the same number of rows. The second case is misread as "Cannot compress further" — the session then surfaces ``Context length exceeded`` and auto-resets even when the post-compression estimate is far below the model context window. Observed example from #39548: a Telegram session on GPT-5.5 with a 1M context dropped from ~288k → ~183k tokens (a 36% reduction) while preserving 220 messages. The loop treats that as exhaustion and the gateway auto-resets the session. Fix --- Add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)`` and call it after the post-pass ``estimate_request_tokens_rough`` (which is moved up to run before the progress check instead of after it). Either a row-count reduction OR a token-count reduction now counts as progress; only when neither moves do we break out as "stuck". Fixes #39548	2026-06-22 15:49:19 +05:30
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
Teknium	b7a912ea45	fix(antigravity): bake in public OAuth client + default project fallback Salvage follow-up on top of @pmos69's #29474. The PR resolved the Antigravity OAuth client purely by discovering it from an installed `agy` binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without agy installed hit a hard 'client ID not available' error. Antigravity's desktop OAuth client is a public, non-confidential installed-app client (PKCE provides the security), baked into every copy of the Antigravity CLI — same posture as the gemini-cli credentials Hermes already ships in google_oauth.py. Bake it in as the final fallback (env -> discovery -> public default) and add the public default Code Assist project as the discovery fallback, matching the reference Antigravity flow. Now consumers can authenticate directly without agy installed.	2026-06-21 16:41:30 -07:00
pmos69	8baa4e9976	feat(cli): add native Antigravity OAuth provider	2026-06-21 16:41:30 -07:00
devorun	6f0ecf37da	fix(redact): mask all Authorization schemes and x-api-key style headers Secret redaction only matched `Authorization: Bearer <token>`. Other auth headers passed through verbatim into logs, tool output, and transcripts: - `Authorization: Basic <base64>` — leaks base64(user:password) - `Authorization: token <pat>` / any non-Bearer scheme - `Proxy-Authorization: ...` - `x-api-key: <key>` (Anthropic and many providers) and `api-key`, `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with no known vendor prefix were caught by nothing A logged request or an echoed `curl -H "x-api-key: ..."` command therefore leaked live credentials. Generalize the Authorization rule to mask the credential for any scheme (and Proxy-Authorization) while preserving the header name and scheme word for debuggability, and add an api-key header rule for the single-opaque-value headers. Bearer behavior is unchanged; plain prose containing the word "authorization" (no colon-delimited value) is left untouched. Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key headers, including inside a curl command.	2026-06-21 14:08:06 -07:00
Teknium	e0498bd305	fix(bedrock): price Claude prompt-cache tokens in /usage (#50307 ) Bedrock Claude routes through the AnthropicBedrock SDK and injects cache_control, so cached tokens are always reported — but the pricing table had no cache cost fields for any Bedrock model, so /usage showed "cost unknown" on every cached session. Also, cross-region inference profiles (us./global./eu. prefixes) never matched the bare pricing keys. - Add cache_read/cache_write rates to the four Bedrock Claude rows (read 0.1x input, write 1.25x input per the Bedrock pricing page). - Normalize the cross-region prefix in the Bedrock pricing lookup, mirroring is_anthropic_bedrock_model's prefix list. Closes #50295.	2026-06-21 11:48:43 -07:00
teknium1	41e0c10f7e	fix(agent): route repeated-compression warning through _emit_status (#36908 ) The 'Session compressed N times — accuracy may degrade' warning went through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord never saw it — unlike the two other compression warnings in the same module, which route through _emit_status (and store _compression_warning for late-bound gateway status_callback replay). Set agent._compression_warning + call agent._emit_status() for this warning too, matching the sibling pattern. _emit_status still _vprints for the CLI, so CLI output is unchanged; TUI / gateway surfaces now receive it via status_callback (and replay_compression_warning can re-deliver it once a late-bound gateway callback is wired). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-21 11:34:47 -07:00
Teknium	b6a4638b6d	fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297 ) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/ whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary — silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty content. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback then transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.	2026-06-21 11:27:07 -07:00
yeyitech	8a506ed3ac	fix(auth): make load_pool() non-destructive for env-seeded credentials load_pool() is meant to be a read, but it persistently pruned env-seeded pool entries whenever the calling process's os.environ lacked the seeding var. A process without MINIMAX_API_KEY would delete the persisted env:MINIMAX_API_KEY entry from auth.json for every other process, causing auth.json to oscillate and auxiliary auto-detect to fall through to the wrong provider. env:* entries are persisted references re-hydrated from the environment on each load — a missing var means "cannot re-seed right now", not "source is gone forever". _prune_stale_seeded_entries now gates env-source removal behind prune_env_sources (default True for explicit cleanup paths); load_pool() passes prune_env_sources=False. File-backed singletons (device-code OAuth, hermes_pkce) still prune when their backing file is gone, and explicit removal via `hermes auth remove` (source suppression) is unaffected. Fixes #9331. Co-authored-by: houko <suzukaze.haduki@gmail.com>	2026-06-21 08:26:37 -07:00
teknium1	3509be7124	fix(compression): auto-compression triggers at minimum context length (#14690 ) The compaction threshold is max(context_length * threshold_percent, MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on large models, but degenerates at small windows: a model at exactly 64000 ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE window. should_compress() can then never fire, because the provider rejects the request before usage reaches 100%. Auto-compression silently never triggers for any model whose context_length <= MINIMUM / threshold_percent (e.g. 64K-per-slot local models). Centralize the calc in _compute_threshold_tokens(). When the floor would meet or exceed the context window, trigger at 85% of the window (_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses most of its budget before compacting (compacting at the 50% percentage would waste half the small window), but below 100% so compaction actually fires before the provider rejects the request. This mirrors the existing gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at 64000) is unchanged; both call sites (__init__ and update_model) use the shared helper. Co-authored-by: soynchux <soynchuux@gmail.com> Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com> Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-21 07:53:14 -07:00
kshitij	c6a0929875	Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch fix(agent): reset stale token calibration on model switch (#23767)	2026-06-21 20:02:08 +05:30
Teknium	9f67ba1b01	fix(agent): guard finalize_turn cleanup chain so it never drops the response (#50009 ) When a turn hit max_iterations, finalize_turn ran three unguarded cleanup steps after the model's summary — _save_trajectory (file I/O), _cleanup_task_resources (remote VM/browser teardown), and _persist_session (SQLite write). Any raise there propagated out of run_conversation, discarding the partial final_response the caller was waiting for; subprocess wrappers saw an empty stdout with no traceback (#8049). Each step is now guarded independently so one failure can't skip the others. Failures log at ERROR with a traceback and are surfaced on the result dict via cleanup_errors; the partial response is always returned. Closes #8049.	2026-06-21 07:25:42 -07:00
kshitijk4poor	1e0b3a2bcc	fix(agent): reset stale token calibration on model switch (#23767 ) ContextCompressor.update_model() recomputed context_length/threshold/budgets but kept the cross-call calibration state (last_real_prompt_tokens, last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens, awaiting_real_usage_after_compression, _ineffective_compression_count) from the PREVIOUS model. Those fields encode 'the provider proved this prompt fit' / 'preflight can be deferred' decisions valid only for the model that produced them. Carried across a switch to a smaller-context model, should_defer_preflight_to_real_usage() used the old model's 'it fit' history to SKIP a preflight compression the new model actually needed — sending an oversized prompt the provider rejects (#23767). update_model() now clears that state; the new model's first response repopulates it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer no longer suppresses and should_compress fires on an over-threshold estimate.	2026-06-21 17:46:58 +05:30
LeonSGP43	3463188512	fix(auth): honor anthropic credential pool oauth Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-21 16:20:50 +05:30
teknium1	14ef6312b5	fix(compression): decay protect_first_n so early turns don't fossilize (#11996 ) protect_first_n keeps the first N non-system messages verbatim through compaction so the original task framing survives. But it was applied on EVERY compression pass: the same early user turns were re-copied into each child session and never summarized away, so across a long, repeatedly- compressed session those old messages became immortal and grew the protected head unboundedly (#11996, P1). Decay it: protect_first_n applies on the FIRST compaction only. Once the session has been compressed at least once (compression_count >= 1, or a handoff summary already exists), the early turns are captured in the summary, so _effective_protect_first_n() returns 0 and only the system prompt stays protected. The decay is read at compress_start computation time, before compression_count/_previous_summary are mutated at the end of compress(), so the first pass still protects correctly. Co-authored-by: truenorth-lj <liliangjya@gmail.com> Co-authored-by: davidvv <david.vv@icloud.com>	2026-06-21 00:06:58 -07:00

1 2 3 4 5 ...

778 commits