hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 10:52:21 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	88e136448d	fix(agent): shrink anthropic-native image history Retry image-size rejections by rewriting Anthropic base64 image source blocks, not just OpenAI-style image_url parts.	2026-06-22 18:23:21 -05:00
Teknium	87c4a5ebb8	feat(background-review): aux-model selector for the self-improvement review (#49252) Adds auxiliary.background_review.{provider,model} (default auto = main chat model, unchanged). Set it to a different, cheaper model and the post-turn self-improvement review runs there for ~3-5x lower cost. Cache-aware by design: the main chat is warm in the prompt cache, so the default full-history replay on the main model is cheap cache reads and is left exactly as-is. A different model can't reuse that cache (different key), so when (and only when) routed to a different model the fork replays a compact digest instead of the full transcript, minimising what it cold-writes on the aux model. Same model -> full replay; different model -> digest. Quality holds in benchmarks: memory capture identical, skill near-identical. Nothing changes unless you opt in by naming a different model. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-22 14:54:53 -07:00
helix4u	3972701424	fix(agent): complete final text on last turn	2026-06-22 13:57:59 -07:00
Teknium	b1b20270c4	refactor(memory): move write-mirror gating behind MemoryManager interface The success/staged gating and op-expansion for mirroring built-in memory writes to external providers lived in a standalone agent/memory_write_bridge.py helper called inline from two core call sites (tool_executor.py, agent_runtime_helpers.py). That left the mirror decision-making in the agent loop, outside the memory-provider interface. Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the loop now hands over the raw tool result + args and a metadata callback, and the manager decides whether/what to mirror. Both core call sites collapse to a single call; the orphan module is removed. No MemoryProvider ABC change. Tests rewritten as behavior tests against the manager method.	2026-06-22 07:00:42 -07:00
Hao Zhe	027cb649ef	fix(memory): fail closed on unclear write results	2026-06-22 07:00:42 -07:00
Hao Zhe	70e7132e2f	fix(openviking): gate memory writes and add viking_forget Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove. Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.	2026-06-22 07:00:42 -07:00
Francesco Bonacci	f2e37549c6	feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux) Make the computer_use toolset platform-agnostic by driving cua-driver on macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces (capability discovery, structuredContent AX tree, opaque element_token, click button enum, explicit mimeType, machine-readable manifest, structured list_windows, structured health_report), each degrading gracefully on older drivers. Adds `hermes computer-use doctor` (drives cua-driver health_report with a per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full typed wrappers for the previously-uncovered cua-driver tools plus a generic call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware system-prompt guidance (host-deterministic, cache-safe), and honors HERMES_CUA_DRIVER_CMD end-to-end. Replaces the macOS-only skills/apple/macos-computer-use skill with a cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans docs. Supersedes #44221 (Windows-enablement salvage of #30660). Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-22 06:42:30 -07:00
kshitijk4poor	623b21bf24	fix(compress): reserve output tokens in the compaction threshold (#23767, #43547) The compaction trigger compared estimated input against context_length * threshold, but the provider reserves max_tokens of OUTPUT out of the same window. With a large max_tokens (e.g. 65536 on a custom provider) the usable input budget is materially smaller than the raw window, so sessions hit a provider 400 before compaction ever fired. _compute_threshold_tokens now subtracts the output reservation first and applies the percentage and the small-window 85% guard to the effective input budget (context_length - max_tokens). max_tokens is stored on the compressor (threaded from agent.max_tokens at construction) and reused across update_model() switches; None = provider default = no reservation (full-window behavior, unchanged). Reimplemented on the current _compute_threshold_tokens surface (the inline threshold calc the original PR targeted was since refactored for the small-window #14690 fix); composes with that 85% guard on the effective budget. Credit: @kyssta-exe (#43651), original design for the output-token reservation in the compaction threshold. Closes #43547.	2026-06-22 17:26:17 +05:30
kshitijk4poor	b2c84a1626	fix(agent): defer preflight compaction until real usage after a compaction (#23767, #36718) After a compaction, the post-compression path parks last_prompt_tokens=-1 and sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens still holds the stale pre-compression value (above threshold). should_defer_preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False' short-circuit and let preflight fire a SECOND compaction before the provider reported real post-compaction usage. Add an early-return on the awaiting flag so deferral holds for exactly one turn; update_from_response() clears it. The flag-setting half (#36718) already landed on main via the in-place compaction path (conversation_compression.py); this adds the missing should_defer guard that consumes it. Credit: - @ashishpatel26 (#38133): diagnosis + the should_defer early-return design - @Tranquil-Flow (#36769): same #36718 fix, identical guard placement Closes #36718.	2026-06-22 16:33:18 +05:30
Basil Al Shukaili	72f75f8456	fix(compressor): count tool_call envelope in tail-budget token estimate (#28053) The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio; context kept growing. Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere. Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop), 209 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 16:26:56 +05:30
kshitij	aa83213c53	Merge pull request #50740 from NousResearch/salvage/preflight-token-progress fix(agent): count tokens, not just rows, as preflight compression progress (#23767, #39548)	2026-06-22 15:58:58 +05:30
kshitij	21541ce6e9	Merge pull request #50108 from NousResearch/salvage/f4m1-anthropic-pool fix(auth): consult credential_pool in resolve_anthropic_token (#26344)	2026-06-22 15:58:01 +05:30
kshitijk4poor	69de0360a1	fix(agent): align preflight token-progress floor to 5% (#23767, #39548) Follow-up to the salvaged preflight token-progress fix: require a material (>5%) token reduction to count as progress, matching the overflow-handler retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the 3-pass preflight loop spinning. Adds boundary + zero-token regression tests.	2026-06-22 15:51:52 +05:30
kshitijk4poor	3545d29422	refactor(auth): drop dead select() fallback in anthropic pool resolver /simplify-code QUALITY finding: the `if callable(_available_entries): ... else: pool.select()` ladder was dead for the real CredentialPool type (`_available_entries` is always a bound method) AND the select() fallback violated the helper's read-only contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True, refresh=True), which persists to auth.json and triggers a network refresh. Call _available_entries(clear_expired=False, refresh=False) directly inside the existing try/except instead. Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the read-only / null-token contract tests were mutation-checked (flipping the flags / removing the None-guard fails the respective test).	2026-06-22 15:50:26 +05:30
JackJin	b08ee8ad04	fix(agent): count tokens, not just rows, as preflight compression progress Rebased onto god-file Phase 1 refactor: preflight compression has moved from agent/conversation_loop.py to agent/turn_context.py (no semantic change in the refactor itself; the bug below was carried over verbatim). The preflight compression loop in ``turn_context.py`` uses ``len(messages) >= _orig_len`` to decide whether a compression pass has made progress. That conflates two different conditions: a true no-op (transcript materially unchanged) and effective token compression that summarises message contents but keeps the same number of rows. The second case is misread as "Cannot compress further"; the session then surfaces ``Context length exceeded`` and auto-resets even when the post-compression estimate is far below the model context window. Observed example from #39548: a Telegram session on GPT-5.5 with a 1M context dropped from ~288k → ~183k tokens (a 36% reduction) while preserving 220 messages. The loop treats that as exhaustion and the gateway auto-resets the session. Fix: add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)`` and call it after the post-pass ``estimate_request_tokens_rough`` (which is moved up to run before the progress check instead of after it). Either a row-count reduction OR a token-count reduction now counts as progress; only when neither moves do we break out as "stuck". Fixes #39548	2026-06-22 15:49:19 +05:30
David Gutowsky	87b60ae49a	no-mistakes(review): guard token-delta status msg on actual compression in overflow handler	2026-06-22 15:23:24 +05:30
David Gutowsky	47b6b4cf85	fix #39550: detect token-only compression success Compression can materially reduce request size (tool-result pruning, in-place summarization) without reducing message count. The two compression-success checks in conversation_loop.py (413 handler and context-overflow handler) only compared len(messages) to detect success, missing token-only compression. Now re-estimates tokens after compress_context() returns and treats any >=5% reduction as a successful compression pass. Error logs also use the post-compression token count instead of the stale pre-compression estimate. Fixes: #39550	2026-06-22 15:23:24 +05:30
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill; only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool. It does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
Teknium	2b3a4f0af8	fix(agent): strip stale reasoning_content when falling back to a strict provider (#50480) * fix(agent): strip stale reasoning_content when falling back to a strict provider A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content on every assistant tool-call turn (a single space " " pad). api_messages is built once under the primary; on a mid-session fallback to a strict OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale pads were replayed verbatim and rejected with HTTP 400/422: body.messages.2.assistant.reasoning_content: Extra inputs are not permitted (input: ' '). reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never reconciled history built under a reasoning primary against a strict fallback. copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only shapes to non-pad providers. Fix both sites: when the active provider does not enforce echo-back, strip reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when switching TO a reasoning provider is preserved. Covers the Cerebras 400 from #45655 and the DeepSeek->Mistral 422 fallback report. Refs #45655. * test: update reasoning-replay tests for strict-provider stripping test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning provider so the precedence it checks is observable. Add a positive strict-provider test asserting reasoning_content is stripped on replay.	2026-06-21 18:05:07 -07:00
Teknium	84e1d31e54	refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473) The kanban-worker and kanban-orchestrator bundled skills existed only to be force-loaded into dispatcher-spawned workers, gated by environments:[kanban] so they wouldn't leak into normal CLI listings. That gating was fragile (the leak that #50443 patched) and the --skills auto-load was already best-effort: most workers ran without it because the bundled skill isn't present in profile-scoped skills dirs. Remove the skills entirely and promote their load-bearing content (workspace kinds, deliverable artifacts, created-card integrity, profile discovery) into KANBAN_GUIDANCE, which is already injected into every kanban worker's system prompt. Net result: every worker reliably gets the guidance, nothing can leak into a CLI/blank-slate session, and the gating machinery is gone. - agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE - hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe - hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card - hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text - delete skills/devops/kanban-{worker,orchestrator} - docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references - tests: update spawn-argv expectations; re-bound the guidance-size guard Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).	2026-06-21 17:06:48 -07:00
Teknium	b7a912ea45	fix(antigravity): bake in public OAuth client + default project fallback Salvage follow-up on top of @pmos69's #29474. The PR resolved the Antigravity OAuth client purely by discovering it from an installed `agy` binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without agy installed hit a hard 'client ID not available' error. Antigravity's desktop OAuth client is a public, non-confidential installed-app client (PKCE provides the security), baked into every copy of the Antigravity CLI — same posture as the gemini-cli credentials Hermes already ships in google_oauth.py. Bake it in as the final fallback (env -> discovery -> public default) and add the public default Code Assist project as the discovery fallback, matching the reference Antigravity flow. Now consumers can authenticate directly without agy installed.	2026-06-21 16:41:30 -07:00
pmos69	8baa4e9976	feat(cli): add native Antigravity OAuth provider	2026-06-21 16:41:30 -07:00
JP Lew	c11ae8261b	fix(codex): seed app-server sessions with configured cwd	2026-06-21 16:39:02 -07:00
devorun	6f0ecf37da	fix(redact): mask all Authorization schemes and x-api-key style headers Secret redaction only matched `Authorization: Bearer <token>`. Other auth headers passed through verbatim into logs, tool output, and transcripts: - `Authorization: Basic <base64>` — leaks base64(user:password) - `Authorization: token <pat>` / any non-Bearer scheme - `Proxy-Authorization: ...` - `x-api-key: <key>` (Anthropic and many providers) and `api-key`, `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with no known vendor prefix were caught by nothing A logged request or an echoed `curl -H "x-api-key: ..."` command therefore leaked live credentials. Generalize the Authorization rule to mask the credential for any scheme (and Proxy-Authorization) while preserving the header name and scheme word for debuggability, and add an api-key header rule for the single-opaque-value headers. Bearer behavior is unchanged; plain prose containing the word "authorization" (no colon-delimited value) is left untouched. Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key headers, including inside a curl command.	2026-06-21 14:08:06 -07:00
Teknium	587b5b9ac2	fix(backup): capture memory-provider state stored outside HERMES_HOME (#50325) hermes backup only walks HERMES_HOME, so memory providers that keep config/credentials in home-anchored dotdirs (honcho -> ~/.honcho, hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data across a backup/import cycle: the peer IDs, session pairings, and API keys never made it into the archive. Add an optional MemoryProvider.backup_paths() hook (default []). The active provider declares its external paths; backup resolves them from config only (no init, no network), archives the ones under the home dir into a reserved _external/ subtree encoded relative to home, and import restores them to their original location with a home-anchored traversal guard and 0600 on credential-shaped files. Paths outside home are skipped as non-portable. honcho, hindsight, and openviking override the hook. E2E-validated full backup->import cycle plus 7 new tests.	2026-06-21 12:03:46 -07:00
Teknium	e0498bd305	fix(bedrock): price Claude prompt-cache tokens in /usage (#50307) Bedrock Claude routes through the AnthropicBedrock SDK and injects cache_control, so cached tokens are always reported, but the pricing table had no cache cost fields for any Bedrock model, so /usage showed "cost unknown" on every cached session. Also, cross-region inference profiles (us./global./eu. prefixes) never matched the bare pricing keys. - Add cache_read/cache_write rates to the four Bedrock Claude rows (read 0.1x input, write 1.25x input per the Bedrock pricing page). - Normalize the cross-region prefix in the Bedrock pricing lookup, mirroring is_anthropic_bedrock_model's prefix list. Closes #50295.	2026-06-21 11:48:43 -07:00
teknium1	9e4fe32d36	fix(session): opt the background-review fork out of session finalization The background-review fork (fires ~every 10 turns) pins review_agent.session_id = agent.session_id — the parent's LIVE id — for prefix-cache parity, then calls close(). With session finalization now in close(), that would end the still-active parent session mid-conversation. Set _end_session_on_close = False on the fork so the real owner (CLI close / gateway reset / cron) finalizes the session instead. Follow-up to the #12029 fix.	2026-06-21 11:35:09 -07:00
yeyitech	b17180d950	fix(session): finalize owned SQLite session rows on AIAgent.close() Funnel session finalization through AIAgent.close() — the single terminal path every agent (CLI, gateway, subagent, cron) funnels through — so finished agents stop leaving rows with ended_at IS NULL. The biggest leak source was delegate_task subagent + background-review forks whose close() never ended their row. end_session() is first-reason-wins and no-ops on an already-ended row, so a 'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal path is never clobbered. /resume already calls reopen_session(), so finalizing-on-close does not break resumability. Temporary helper agents that rotate/share the session forward (manual compression, gateway session-hygiene) opt out via _end_session_on_close=False. Also stop the long-running gateway heartbeat once the executor is done or the session slot is rebound to a different agent, preventing a stale 'running: delegate_task' bubble from outliving its run. Closes #12029.	2026-06-21 11:35:09 -07:00
teknium1	41e0c10f7e	fix(agent): route repeated-compression warning through _emit_status (#36908) The 'Session compressed N times — accuracy may degrade' warning went through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord never saw it, unlike the two other compression warnings in the same module, which route through _emit_status (and store _compression_warning for late-bound gateway status_callback replay). Set agent._compression_warning + call agent._emit_status() for this warning too, matching the sibling pattern. _emit_status still _vprints for the CLI, so CLI output is unchanged; TUI / gateway surfaces now receive it via status_callback (and replay_compression_warning can re-deliver it once a late-bound gateway callback is wired). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-21 11:34:47 -07:00
konsisumer	3e354b61db	fix(agent): preserve copilot routed headers	2026-06-21 11:29:49 -07:00
Teknium	b6a4638b6d	fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary, silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty content. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback then transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.	2026-06-21 11:27:07 -07:00
teknium1	2f4f23fbfb	fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835) When the main provider is the Codex app-server runtime (api_mode codex_app_server), the gateway showed no verbose 'running X' tool-progress breadcrumbs on Telegram while every other provider did. The app-server session processes item/started notifications (command execution, file changes, MCP/dynamic tool calls) but never surfaced them as Hermes tool-progress events: the session was constructed without an on_event hook, so the agent's tool_progress_callback was never invoked on this route. Add _codex_note_to_tool_progress() mapping item/started → (tool_name, preview, args) for commandExecution / fileChange / mcpToolCall / dynamicToolCall, and wire an on_event hook into CodexAppServerSession that forwards mapped events to agent.tool_progress_callback('tool.started', ...), the same signature the chat_completions path uses (tool_executor.py). Non-tool items (agentMessage/reasoning) and non-item/started methods map to None and are ignored. Co-authored-by: jplew <462836+jplew@users.noreply.github.com>	2026-06-21 08:46:06 -07:00
yeyitech	8a506ed3ac	fix(auth): make load_pool() non-destructive for env-seeded credentials load_pool() is meant to be a read, but it persistently pruned env-seeded pool entries whenever the calling process's os.environ lacked the seeding var. A process without MINIMAX_API_KEY would delete the persisted env:MINIMAX_API_KEY entry from auth.json for every other process, causing auth.json to oscillate and auxiliary auto-detect to fall through to the wrong provider. env:* entries are persisted references re-hydrated from the environment on each load — a missing var means "cannot re-seed right now", not "source is gone forever". _prune_stale_seeded_entries now gates env-source removal behind prune_env_sources (default True for explicit cleanup paths); load_pool() passes prune_env_sources=False. File-backed singletons (device-code OAuth, hermes_pkce) still prune when their backing file is gone, and explicit removal via `hermes auth remove` (source suppression) is unaffected. Fixes #9331. Co-authored-by: houko <suzukaze.haduki@gmail.com>	2026-06-21 08:26:37 -07:00
teknium1	3509be7124	fix(compression): auto-compression triggers at minimum context length (#14690) The compaction threshold is max(context_length * threshold_percent, MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on large models, but degenerates at small windows: a model at exactly 64000 ctx gets max(32000, 64000) = 64000, a threshold equal to the ENTIRE window. should_compress() can then never fire, because the provider rejects the request before usage reaches 100%. Auto-compression silently never triggers for any model whose context_length <= MINIMUM_CONTEXT_LENGTH (e.g. 64K-per-slot local models). Centralize the calc in _compute_threshold_tokens(). When the floor would meet or exceed the context window, trigger at 85% of the window (_MIN_CTX_TRIGGER_RATIO): high enough that a minimum-context model uses most of its budget before compacting (compacting at the 50% percentage would waste half the small window), but below 100% so compaction actually fires before the provider rejects the request. This mirrors the existing gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at 64000) is unchanged; both call sites (__init__ and update_model) use the shared helper. Co-authored-by: soynchux <soynchuux@gmail.com> Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com> Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-21 07:53:14 -07:00
kshitij	c6a0929875	Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch fix(agent): reset stale token calibration on model switch (#23767)	2026-06-21 20:02:08 +05:30
kshitij	ed8f7898b9	Merge pull request #50136 from NousResearch/fix/context-aware-tool-budget fix(agent): scale tool-output budget to the model context window (#23767)	2026-06-21 20:01:32 +05:30
Teknium	9f67ba1b01	fix(agent): guard finalize_turn cleanup chain so it never drops the response (#50009) When a turn hit max_iterations, finalize_turn ran three unguarded cleanup steps after the model's summary: _save_trajectory (file I/O), _cleanup_task_resources (remote VM/browser teardown), and _persist_session (SQLite write). Any raise there propagated out of run_conversation, discarding the partial final_response the caller was waiting for; subprocess wrappers saw an empty stdout with no traceback (#8049). Each step is now guarded independently so one failure can't skip the others. Failures log at ERROR with a traceback and are surfaced on the result dict via cleanup_errors; the partial response is always returned. Closes #8049.	2026-06-21 07:25:42 -07:00
kshitijk4poor	1e0b3a2bcc	fix(agent): reset stale token calibration on model switch (#23767) ContextCompressor.update_model() recomputed context_length/threshold/budgets but kept the cross-call calibration state (last_real_prompt_tokens, last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens, awaiting_real_usage_after_compression, _ineffective_compression_count) from the PREVIOUS model. Those fields encode 'the provider proved this prompt fit' / 'preflight can be deferred' decisions valid only for the model that produced them. Carried across a switch to a smaller-context model, should_defer_preflight_to_real_usage() used the old model's 'it fit' history to SKIP a preflight compression the new model actually needed, sending an oversized prompt the provider rejects (#23767). update_model() now clears that state; the new model's first response repopulates it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer no longer suppresses and should_compress fires on an over-threshold estimate.	2026-06-21 17:46:58 +05:30
kshitijk4poor	1965d56219	fix(agent): scale tool-output budget to the model context window (#23767) The tool-result persistence budget was a fixed 100K chars/result and 200K chars/turn regardless of the active model. On a small-context model (e.g. a 65K-token local model switched into mid-session) a single large tool result (reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens) could by itself approach or exceed the window, forcing an oversized request that the provider rejects as "Prompt too long". - budget_config.budget_for_context_window() scales per-result/per-turn char caps to a fraction of the model window, clamped to the historical 100K/200K defaults (large models unchanged) and floored so small models stay usable. - resolve_threshold() now caps the per-tool registry value at default_result_size so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate a scaled-down budget. No-op for the default budget (both 100K). - tool_executor wires the agent's live context_length (recomputed on model switch) into all four persist/turn-budget call sites. read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical to today.	2026-06-21 17:46:38 +05:30
LeonSGP43	3463188512	fix(auth): honor anthropic credential pool oauth Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-21 16:20:50 +05:30
teknium1	14ef6312b5	fix(compression): decay protect_first_n so early turns don't fossilize (#11996) protect_first_n keeps the first N non-system messages verbatim through compaction so the original task framing survives. But it was applied on EVERY compression pass: the same early user turns were re-copied into each child session and never summarized away, so across a long, repeatedly compressed session those old messages became immortal and grew the protected head unboundedly (#11996, P1). Decay it: protect_first_n applies on the FIRST compaction only. Once the session has been compressed at least once (compression_count >= 1, or a handoff summary already exists), the early turns are captured in the summary, so _effective_protect_first_n() returns 0 and only the system prompt stays protected. The decay is read at compress_start computation time, before compression_count/_previous_summary are mutated at the end of compress(), so the first pass still protects correctly. Co-authored-by: truenorth-lj <liliangjya@gmail.com> Co-authored-by: davidvv <david.vv@icloud.com>	2026-06-21 00:06:58 -07:00
allo	bc85f6150e	docs: document per-event extra keys in shell-hook wire protocol The shell-hook stdin payload's extra object contains event-specific kwargs, but the docstring only mentioned the field without listing what each event actually puts inside it. Add a reference table covering post_tool_call, pre_tool_call, on_session_start, on_session_end, and subagent_stop — the five hook sites that emit extra keys beyond the top-level payload. Closes #49370	2026-06-20 23:23:47 -07:00
Greg DeYoung	5eb158e317	docs(hermes-agent skill): document project context files and their discovery rules Adds a new 'Project Context Files' section to the hermes-agent skill explaining the priority order and discovery rules for .hermes.md, AGENTS.md, CLAUDE.md, and .cursorrules. Specifically clarifies: - .hermes.md walks parents up to the git root (good for monorepos) - AGENTS.md / agents.md is cwd-only (portable to other agents) - The 20K cap and head+tail truncation strategy - The threat-pattern scanner behavior (blocks content, not file) - What --ignore-rules actually skips (everything) Also fixes an inaccurate docstring in agent/agent_init.py for skip_context_files — the previous text only mentioned SOUL.md, AGENTS.md, and .cursorrules, but the actual behavior (per build_context_files_prompt and the --ignore-rules CLI flag) skips all of them plus .hermes.md and CLAUDE.md. Refs: https://github.com/NousResearch/hermes-agent/issues/46775	2026-06-20 23:23:47 -07:00
teknium1	1f874dfe44	fix(compression): stop fallback summary triplicating the latest user ask When LLM summarization fails, the deterministic fallback summary rendered the latest user ask (active_task = "User asked: '<ask>'") verbatim under THREE headings — Historical Task Snapshot, Historical In-Progress State, and Historical Pending User Asks. Re-presenting an already-handled ask as unresolved in-progress/pending work made the model re-answer it AND treat the resurrected ask as the active turn, burying the genuinely-new post-compaction user message (#49307: answer repetition + new-instruction loss, P1). Keep the latest ask once, under Task Snapshot, as historical context only. The In-Progress and Pending-Asks sections now say 'Unknown / None recoverable from deterministic fallback' (consistent with the Active State / Key Decisions / Resolved Questions sections) and explicitly note the ask is historical, not outstanding. The raw turn text still appears in the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn record, not a re-labeled instruction. Note: the separate role=assistant standalone-summary regurgitation (#33256) is left as-is — that role choice is constrained by strict message alternation (user collides with a user-ending head) and is already mitigated by the summary end-marker; forcing the role would risk the alternation invariant. Co-authored-by: r266-tech <r2668940489@gmail.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-20 23:19:27 -07:00
teknium1	2f3177adf4	fix(compression): protect the summary call from mid-flight interrupts Context compression is atomic, but a gateway interrupt (an incoming user message while the agent is busy) could abort the in-flight summary call. The Codex Responses aux stream polls the thread interrupt flag and raised InterruptedError unconditionally — so compression fell back to a degraded static 'summary unavailable' marker, losing the real handoff (#23975). Add a thread-local interrupt-protection flag (aux_interrupt_protection context manager) in auxiliary_client; the Codex stream's cancellation check honors it. The compressor wraps its summary call_llm in the context manager. Timeouts still fire (a hung call must die) and all other aux tasks (vision, web_extract, title_generation, …) stay interruptible. Re-entrant, so the main-model retry recursion is safe. Co-authored-by: konsisumer <der@konsi.org>	2026-06-20 21:32:30 -07:00
teknium1	7ace96ba40	fix(compression): preserve goal, platform, and session indexing across rotation Three state-loss bugs at the compression rotation boundary, fixed together because they all live in the same ~80-line rotation block: - #33618: a persistent /goal did not follow the rotation. load_goal does a flat per-session lookup with no lineage walk, so a goal silently died when compression minted a fresh child id. Added migrate_goal_to_session() and call it after the child session is created (move-not-copy: the parent row is archived as cleared so exactly one active goal row exists). - #33906/#33907: if the child create_session raised (FK constraint, contended write), the outer handler only warned and let the agent continue on the NEW id — which has no row in state.db — producing an orphan session. Now the rotation rolls agent.session_id back to the still-indexed parent (reopening it) instead of stranding the conversation on a phantom id. - #27633: the compaction-boundary on_session_start notification omitted the platform kwarg, so context-engine plugins saw source=unknown for every message after the boundary. Forward platform (matching the initial session-start call in agent_init.py). Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com> Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com> Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-20 20:06:24 -07:00
x7peeps	4467c22c8f	fix(chat-completions): strip timestamp from messages before sending to strict providers Per-message timestamp metadata injected by _apply_persist_user_message_override leaks into the Chat Completions payload sent to the provider. Strict OpenAI-compatible providers (e.g. Fireworks-backed endpoints like OpenCode Go 'glm-5.2', Mistral, Kimi) reject this schema-foreign field with HTTP 400: Extra inputs are not permitted, field: 'messages[0].timestamp' The ChatCompletionsTransport.convert_messages already strips known internal-only fields (tool_name, _-prefixed scaffolding keys, codex_reasoning_items, etc.) — add timestamp to that list. Closes #47868	2026-06-20 17:05:17 -07:00
teknium1	5a53e0f0f4	fix(compression): abort on auth failure instead of rotating into a degraded session When the auxiliary summary call fails with an authentication/permission error (HTTP 401/403), context compression now ABORTS and preserves the session unchanged instead of rotating into a child session with a placeholder summary. Before: a 401 (invalid/blocked key, or a token pointed at the wrong inference host) fell through every transient-error check to 'return None', and because compression.abort_on_summary_failure defaults False, compress() took the static-fallback path and rotated the session anyway (messages N->N). The user landed on a fresh-but-broken session that kept failing the same way — paying for a full-context API call each turn with no useful compression. After: _generate_summary classifies 401/403 as a non-recoverable auth failure (_last_summary_auth_failure) and compress() aborts on it regardless of abort_on_summary_failure. A distinct auxiliary summary_model that 401s still retries once on the main model first (its dedicated creds may be the only broken thing); the abort only sticks when the main model itself auth-fails or the fallback also auth-fails. The existing _last_compress_aborted handling in conversation_compression.py already skips rotation and emits a warning, so no session rotation occurs. Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite flag=False, non-auth failures keep the historical fallback path, and aux-model auth failure recovers on main without aborting.	2026-06-20 11:38:21 -07:00
teknium1	f22dd8a75a	fix(agent): fail over to fallback provider on persistent auth failure (401/403) When the active provider returns a 401/403 that survives its per-provider credential-refresh attempt (revoked OAuth, blocked/expired key, or an account pinned to a dead/staging inference endpoint), the conversation loop now escalates to the configured fallback chain instead of dead-ending. Before: the generic failover dispatch fired only for {rate_limit, billing}; auth/auth_permanent fell through to 'switch providers manually' advice and never called _try_activate_fallback(). A user whose primary credential was broken kept thrashing on the same dead credential every turn — the main agent appeared 'stuck in fallback mode' while never actually failing over. This also affected auxiliary tasks (compression, vision, title-gen), since auto-resolved aux follows the main provider. After: a persistent auth failure with a configured fallback chain switches to the next provider (mirroring the rate-limit/billing failover path), guarded one-shot per attempt by TurnRetryState.auth_failover_attempted. When no fallback is configured the behavior is unchanged — it falls through to the existing terminal handling and provider-specific troubleshooting guidance. Tests: test_auth_provider_failover.py — 401/403 classify as auth, the gating condition fires only with a chain present + guard unset, the guard blocks repeats, and non-auth (500) errors do not trigger auth failover.	2026-06-20 11:38:01 -07:00
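The protect_first_n decay described in commit 14ef6312b5 can be sketched in Python as follows. This is a minimal illustration, not the repository's code. CompressionState, protected_head, and run_compaction_pass are hypothetical names, and the value of 3 is arbitrary. Only the rule itself comes from the commit message: return 0 once compression_count >= 1 or a handoff summary exists, and read that value before the pass mutates its state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressionState:
    """Hypothetical stand-in for the compressor's cross-pass state."""
    protect_first_n: int = 3                # illustrative value, not the real default
    compression_count: int = 0              # completed compaction passes
    previous_summary: Optional[str] = None  # handoff summary from an earlier pass


def effective_protect_first_n(state: CompressionState) -> int:
    """Protect the early turns on the first compaction only."""
    # After any earlier pass, the early turns already live in the summary.
    if state.compression_count >= 1 or state.previous_summary:
        return 0
    return state.protect_first_n


def protected_head(messages: list[dict], state: CompressionState) -> list[dict]:
    """Return the messages copied verbatim at the head of the new context."""
    # The system prompt, if present, is always protected.
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    return system + rest[: effective_protect_first_n(state)]


def run_compaction_pass(messages: list[dict], state: CompressionState) -> list[dict]:
    """Hypothetical pass: read the decayed value first, mutate state last."""
    head = protected_head(messages, state)
    state.compression_count += 1
    state.previous_summary = f"summary of {len(messages) - len(head)} messages"
    return head
```

Running two passes over the same transcript shows the decay. The first pass keeps the system prompt plus the first N turns, and the second keeps only the system prompt, so the early turns cannot accumulate across repeated compactions.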

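The fix in commit 4467c22c8f amounts to adding timestamp to a deny-list of internal-only message keys. The following is a minimal Python sketch that assumes messages are plain dicts. strip_internal_fields and INTERNAL_ONLY_KEYS are hypothetical names. The key set contains only the fields the commit names explicitly, since its own list ends with "etc.". This is not the actual ChatCompletionsTransport.convert_messages implementation.

```python
from typing import Any

# Partial, illustrative deny-list: the commit names tool_name and
# codex_reasoning_items (plus "_"-prefixed scaffolding keys) and adds timestamp.
INTERNAL_ONLY_KEYS = frozenset({"tool_name", "codex_reasoning_items", "timestamp"})


def strip_internal_fields(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop agent-internal metadata before sending a Chat Completions payload."""
    return [
        {
            key: value
            for key, value in message.items()
            if key not in INTERNAL_ONLY_KEYS and not key.startswith("_")
        }
        for message in messages
    ]
```

The sketch builds new dicts instead of deleting keys in place. This leaves the agent's stored history, including its timestamps, untouched, while a strict provider only sees schema-standard fields.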
1375 commits