hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
texhy	aacc6bb0a8	fix(agent): trigger preflight compression on few-but-huge sessions (#27405 ) The preflight-compression gate only ran the (expensive) token estimate when the message COUNT exceeded protect_first_n + protect_last_n + 1. A session with a handful of very large messages never tripped the count condition, so compression was never attempted and the turn eventually hit a hard context-overflow error. Add _should_run_preflight_estimate() with OR semantics: run the estimate when either the message count exceeds the protected ranges (the historical gate) OR a cheap char-based estimate already crosses the configured threshold. The downstream estimate_request_tokens_rough() stays authoritative — this is only a hint that decides whether to pay for the full estimate. Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current main: the preflight gate moved from conversation_loop.py to turn_context.py since the PR was opened, so the helper + gate are placed there; the test imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal. Closes #27405.	2026-06-25 01:20:23 +05:30
kshitijk4poor	ac822e4d36	fix(compression): abort (preserve context) on transient network summary failure (#29559 , #25585 ) When context compaction's summary generation fails, the compressor's default path (abort_on_summary_failure=False) drops the middle window and inserts a static 'summary unavailable' marker — destroying the compacted turns. #29559 reported the field impact: a Connection error at the compaction moment dropped 124->15 messages (110 lost) for a long browser-automation task; #25585 is the same failure mode (failed summary commits a destructive compaction anyway). compress() already has an EXCEPTION to the historical drop default: auth failures (401/403) ALWAYS abort and preserve the session, because rotating into a placeholder-summary child on a broken credential strands the user. A transient network/connection error is the same situation in reverse: it WILL recover, and retrying then is strictly better than discarding context for a momentary blip. Extend the always-abort carve-out to terminal connection/network failures: - new _last_summary_network_failure flag, set in _generate_summary's terminal failure branch when _is_connection_error(e) (reached only after any main-model fallback is exhausted), reset alongside the auth flag; - compress() aborts when it's set (returns messages unchanged, _last_compress_aborted=True), independent of abort_on_summary_failure; - a network-specific operator warning (distinct from the auth + config-flag messages). Scoped to connection errors only: a generic 500/400 still takes the historical fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green). Tests: network-failure detection + abort-despite-flag-false, both mutation-checked (removing the flag-set fails detection; removing the carve-out fails the abort).	2026-06-24 18:31:51 +05:30
yusekiotacode	2ee6449fe5	fix(anthropic): use platform.claude.com for OAuth token exchange Anthropic migrated the OAuth token endpoint from console.anthropic.com/v1/oauth/token (now returns HTTP 404) to platform.claude.com/v1/oauth/token. The token refresh path already iterated both hosts, but the two initial code-exchange call sites were hardcoded to the dead console host, so every new Claude OAuth login failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved no credentials. Fix the whole bug class: - Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live host for backward-compat with existing imports. - run_hermes_oauth_login_pure() (CLI flow) iterates the list, first success wins, mirroring the refresh path. - hermes_cli/web_server.py (desktop dashboard flow) imports the list and iterates it too, so the GUI login path is fixed identically. Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone), platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real Claude MAX OAuth login now succeeds end-to-end.	2026-06-23 23:59:40 -07:00
kyssta-exe	81d2dc5d0f	fix(agent): close tool-call sequence on interrupt to prevent role alternation violation (#48879 )	2026-06-23 23:52:28 -07:00
Teknium	8e7e104521	fix(cron): tell the user TUI/CLI cron jobs are local-only at create time (#51683 ) deliver=origin (or omitted) from a TUI or classic-CLI session produces a job with origin=null, because those sessions never populate the HERMES_SESSION_PLATFORM/CHAT_ID context vars that _origin_from_env reads. The scheduler then resolves no delivery target and skips delivery — the job runs and saves output to last_output, but nothing reaches the user and they only find out by polling cronjob(action='list') (#51568). This is by design (local sessions have no live-delivery channel), so the fix surfaces it instead of silently dropping the intent: - cronjob create now appends an informational notice to its result when a created job resolves to zero delivery targets and the user did not explicitly ask for deliver='local'. The check uses the scheduler's own _resolve_delivery_targets so it accounts for origin, home channels, 'all', and explicit platform targets — no false positives. - PLATFORM_HINTS gains a 'tui' entry (the TUI had none) and the 'cli' hint now states that cron jobs from these sessions are local-only and that deliver must target a gateway-connected platform to notify the user. This stops the agent promising a delivery that never happens. No scheduler/delivery behavior change; no new env var; cron isolation invariant untouched.	2026-06-23 23:27:48 -07:00
Brooklyn Nicholson	6afeea2bea	harden(pets): host-pin asset downloads + sanitize slug paths install_pet now refuses spritesheet/pet.json URLs that aren't on a petdex host (matching thumbnail_png's existing _is_petdex_host guard), so a spoofed manifest can't redirect a download at an arbitrary host. Slugs are normalized to a single path segment before indexing into pets_dir(), closing a path-traversal vector in load_pet/remove_pet/install_pet.	2026-06-23 19:13:08 -05:00
Brooklyn Nicholson	e495b33bf1	Merge remote-tracking branch 'origin/main' into bb/pets-merge # Conflicts: # hermes_cli/commands.py # tui_gateway/server.py	2026-06-23 19:05:22 -05:00
helix4u	292a456c06	fix(agent): handle concurrent tool submit shutdown	2026-06-24 02:56:56 +05:30
kshitij	9e924f79a8	Merge pull request #51539 from NousResearch/salvage/49045-toolcall-persist fix(agent): persist tool calls before turn-end flush (#49045)	2026-06-24 02:27:36 +05:30
Teknium	e32ebc6aa2	feat(skills): /learn — distill a reusable skill from anything you describe (#51506 ) Open-ended skill learning across every surface. /learn <free text> takes a description of any source — a directory, a URL, the workflow you just walked the agent through, or pasted notes — and the live agent gathers it with the tools it already has (read_file/search_files, web_extract, the conversation, the pasted text), then authors a SKILL.md via skill_manage following the house authoring standards (<=60-char description, the standard section order, Hermes-tool framing, no invented commands). No engine, no model-tool footprint, works on any terminal backend (local, Docker, remote): /learn builds a standards-guided prompt and hands it to the agent as a normal turn. - agent/learn_prompt.py: shared standards-guided prompt builder - /learn registry entry (both surfaces) + CLI handler (inject onto input queue) + gateway handler (rewrite turn, fall through, /blueprint pattern) - tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat - dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text) composes a /learn request and runs it in chat - docs (slash-commands ref + skills feature page), 11 targeted tests Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234 (dir-distillation engine); reworked to be open-ended and engine-free per review.	2026-06-23 13:51:28 -07:00
konsisumer	190b01c553	fix(agent): persist tool calls before turn-end flush Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-24 02:15:57 +05:30
brooklyn!	211ba9c7d3	feat(agent): one-shot LLM helper + llm.oneshot gateway RPC (#51261 ) A "one-shot" is a single stateless model call that runs OUTSIDE any conversation: it never touches session history, never breaks prompt caching, and returns plain text. UI surfaces need this for small generative chores — a commit message from a diff, a rename suggestion, a summary — where an agent turn would pollute the thread and hand-rolling an LLM call at every call site would be worse. - `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client plumbing (same path as title generation). Two call shapes: explicit instructions/input, or a registered `template` + `variables` (templates own the prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a `commit_message` template. Model selection inherits the live session via `main_runtime`, else the configured aux `task` backend. - `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the session's model when `session_id` resolves. Stateless by construction — no session mutation, cache untouched.	2026-06-23 08:01:50 +00:00
brooklyn!	af7b7f6322	feat(agent): expose coding-context project facts as structured data + project.facts RPC (#51259 ) Follow-up to the coding-context posture (#43316): that PR detects each repo's verify loop (manifests, package manager, exact test/lint/build commands, context files) and bakes it into the system-prompt snapshot — but only as a string, for the model. Non-prompt consumers (the desktop verify UI) had no way to read it without re-sniffing and drifting from the prompt. Split detection from rendering, keeping one source of truth: - `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured facts; `_project_facts()` now renders it into the same snapshot lines, so the prompt block stays byte-identical (cache-safe). - `project_facts_for(cwd)` resolves the workspace root (git, else marker) and returns the structured facts, or None outside a workspace. - `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP). Tests assert the structured output and that the UI-facing commands never drift from what the prompt block renders (one detector feeds both).	2026-06-23 08:00:01 +00:00
Brooklyn Nicholson	88e136448d	fix(agent): shrink anthropic-native image history Retry image-size rejections by rewriting Anthropic base64 image source blocks, not just OpenAI-style image_url parts.	2026-06-22 18:23:21 -05:00
Teknium	87c4a5ebb8	feat(background-review): aux-model selector for the self-improvement review (#49252 ) Adds auxiliary.background_review.{provider,model} (default auto = main chat model — unchanged). Set it to a different, cheaper model and the post-turn self-improvement review runs there for ~3-5x lower cost. Cache-aware by design: the main chat is warm in the prompt cache, so the default full-history replay on the main model is cheap cache reads — left exactly as-is. A different model can't reuse that cache (different key), so when (and only when) routed to a different model the fork replays a compact digest instead of the full transcript, minimising what it cold-writes on the aux model. Same model -> full replay; different model -> digest. Quality holds in benchmarks: memory capture identical, skill near-identical. Nothing changes unless you opt in by naming a different model. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-22 14:54:53 -07:00
helix4u	3972701424	fix(agent): complete final text on last turn	2026-06-22 13:57:59 -07:00
Teknium	b1b20270c4	refactor(memory): move write-mirror gating behind MemoryManager interface The success/staged gating and op-expansion for mirroring built-in memory writes to external providers lived in a standalone agent/memory_write_bridge.py helper called inline from two core call sites (tool_executor.py, agent_runtime_helpers.py). That left the mirror decision-making in the agent loop, outside the memory-provider interface. Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the loop now hands over the raw tool result + args and a metadata callback, and the manager decides whether/what to mirror. Both core call sites collapse to a single call; the orphan module is removed. No MemoryProvider ABC change. Tests rewritten as behavior tests against the manager method.	2026-06-22 07:00:42 -07:00
Hao Zhe	027cb649ef	fix(memory): fail closed on unclear write results	2026-06-22 07:00:42 -07:00
Hao Zhe	70e7132e2f	fix(openviking): gate memory writes and add viking_forget Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove. Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.	2026-06-22 07:00:42 -07:00
Francesco Bonacci	f2e37549c6	feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux) Make the computer_use toolset platform-agnostic by driving cua-driver on macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces (capability discovery, structuredContent AX tree, opaque element_token, click button enum, explicit mimeType, machine-readable manifest, structured list_windows, structured health_report), each degrading gracefully on older drivers. Adds `hermes computer-use doctor` (drives cua-driver health_report with a per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full typed wrappers for the previously-uncovered cua-driver tools plus a generic call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware system-prompt guidance (host-deterministic, cache-safe), and honors HERMES_CUA_DRIVER_CMD end-to-end. Replaces the macOS-only skills/apple/macos-computer-use skill with a cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans docs. Supersedes #44221 (Windows-enablement salvage of #30660). Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-22 06:42:30 -07:00
kshitijk4poor	623b21bf24	fix(compress): reserve output tokens in the compaction threshold (#23767 , #43547 ) The compaction trigger compared estimated input against context_length * threshold, but the provider reserves max_tokens of OUTPUT out of the same window. With a large max_tokens (e.g. 65536 on a custom provider) the usable input budget is materially smaller than the raw window, so sessions hit a provider 400 before compaction ever fired. _compute_threshold_tokens now subtracts the output reservation (context_length - max_tokens) before applying the percentage and the small-window 85% guard. max_tokens is stored on the compressor (threaded from agent.max_tokens at construction) and reused across update_model() switches; None = provider default = no reservation (full-window behavior, unchanged). Reimplemented on the current _compute_threshold_tokens surface (the inline threshold calc the original PR targeted was since refactored for the small-window #14690 fix); composes with that 85% guard on the effective budget. Credit: @kyssta-exe (#43651) — original design for the output-token reservation in the compaction threshold. Closes #43547.	2026-06-22 17:26:17 +05:30
kshitijk4poor	b2c84a1626	fix(agent): defer preflight compaction until real usage after a compaction (#23767 , #36718 ) After a compaction, the post-compression path parks last_prompt_tokens=-1 and sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens still holds the stale pre-compression value (above threshold). should_defer_ preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False' short-circuit and let preflight fire a SECOND compaction before the provider reported real post-compaction usage. Add an early-return on the awaiting flag so deferral holds for exactly one turn; update_from_response() clears it. The flag-setting half (#36718) already landed on main via the in-place compaction path (conversation_compression.py); this adds the missing should_defer guard that consumes it. Credit: - @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design - @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement Closes #36718.	2026-06-22 16:33:18 +05:30
Basil Al Shukaili	72f75f8456	fix(compressor): count tool_call envelope in tail-budget token estimate (#28053 ) The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio — context kept growing. Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere. Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop) — 209 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 16:26:56 +05:30
kshitij	aa83213c53	Merge pull request #50740 from NousResearch/salvage/preflight-token-progress Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run Details Typecheck / typecheck (apps/desktop) (push) Waiting to run Details Typecheck / typecheck (apps/shared) (push) Waiting to run Details Typecheck / typecheck (ui-tui) (push) Waiting to run Details Typecheck / typecheck (web) (push) Waiting to run Details Typecheck / desktop-build (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Has been cancelled Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Has been cancelled Details fix(agent): count tokens, not just rows, as preflight compression progress (#23767, #39548)	2026-06-22 15:58:58 +05:30
kshitij	21541ce6e9	Merge pull request #50108 from NousResearch/salvage/f4m1-anthropic-pool fix(auth): consult credential_pool in resolve_anthropic_token (#26344)	2026-06-22 15:58:01 +05:30
Brooklyn Nicholson	5342eccf12	Merge remote-tracking branch 'origin/main' into bb/pets	2026-06-22 05:25:49 -05:00
kshitijk4poor	69de0360a1	fix(agent): align preflight token-progress floor to 5% (#23767 , #39548 ) Follow-up to the salvaged preflight token-progress fix: require a material (>5%) token reduction to count as progress, matching the overflow-handler retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the 3-pass preflight loop spinning. Adds boundary + zero-token regression tests.	2026-06-22 15:51:52 +05:30
kshitijk4poor	3545d29422	refactor(auth): drop dead select() fallback in anthropic pool resolver /simplify-code QUALITY finding: the `if callable(_available_entries): ... else: pool.select()` ladder was dead for the real CredentialPool type (`_available_entries` is always a bound method) AND the select() fallback violated the helper's read-only contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True, refresh=True), which persists to auth.json and triggers a network refresh. Call _available_entries(clear_expired=False, refresh=False) directly inside the existing try/except instead. Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the read-only / null-token contract tests were mutation-checked (flipping the flags / removing the None-guard fails the respective test).	2026-06-22 15:50:26 +05:30
JackJin	b08ee8ad04	fix(agent): count tokens, not just rows, as preflight compression progress Rebased onto god-file Phase 1 refactor — preflight compression has moved from agent/conversation_loop.py to agent/turn_context.py (no semantic change in the refactor itself; the bug below was carried over verbatim). The preflight compression loop in ``turn_context.py`` uses ``len(messages) >= _orig_len`` to decide whether a compression pass has made progress. That conflates two different conditions: a true no-op (transcript materially unchanged) and effective token compression that summarises message contents but keeps the same number of rows. The second case is misread as "Cannot compress further" — the session then surfaces ``Context length exceeded`` and auto-resets even when the post-compression estimate is far below the model context window. Observed example from #39548: a Telegram session on GPT-5.5 with a 1M context dropped from ~288k → ~183k tokens (a 36% reduction) while preserving 220 messages. The loop treats that as exhaustion and the gateway auto-resets the session. Fix --- Add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)`` and call it after the post-pass ``estimate_request_tokens_rough`` (which is moved up to run before the progress check instead of after it). Either a row-count reduction OR a token-count reduction now counts as progress; only when neither moves do we break out as "stuck". Fixes #39548	2026-06-22 15:49:19 +05:30
David Gutowsky	87b60ae49a	no-mistakes(review): guard token-delta status msg on actual compression in overflow handler	2026-06-22 15:23:24 +05:30
David Gutowsky	47b6b4cf85	fix #39550 : detect token-only compression success Compression can materially reduce request size (tool-result pruning, in-place summarization) without reducing message count. The two compression-success checks in conversation_loop.py (413 handler and context-overflow handler) only compared len(messages) to detect success, missing token-only compression. Now re-estimates tokens after compress_context() returns and treats any >=5% reduction as a successful compression pass. Error logs also use the post-compression token count instead of the stale pre-compression estimate. Fixes: #39550	2026-06-22 15:23:24 +05:30
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
Teknium	2b3a4f0af8	fix(agent): strip stale reasoning_content when falling back to a strict provider (#50480 ) * fix(agent): strip stale reasoning_content when falling back to a strict provider A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content on every assistant tool-call turn (a single space " " pad). api_messages is built once under the primary; on a mid-session fallback to a strict OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale pads were replayed verbatim and rejected with HTTP 400/422: body.messages.2.assistant.reasoning_content: Extra inputs are not permitted (input: ' ') reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never reconciled history built under a reasoning primary against a strict fallback. copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only shapes to non-pad providers. Fix both sites: when the active provider does not enforce echo-back, strip reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when switching TO a reasoning provider is preserved. Covers the Cerebras 400 from #45655 and the DeepSeek->Mistral 422 fallback report. Refs #45655. * test: update reasoning-replay tests for strict-provider stripping test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning provider so the precedence it checks is observable. Add a positive strict-provider test asserting reasoning_content is stripped on replay.	2026-06-21 18:05:07 -07:00
Teknium	84e1d31e54	refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473 ) The kanban-worker and kanban-orchestrator bundled skills existed only to be force-loaded into dispatcher-spawned workers, gated by environments:[kanban] so they wouldn't leak into normal CLI listings. That gating was fragile (the leak that #50443 patched) and the --skills auto-load was already best-effort — most workers ran without it because the bundled skill isn't present in profile-scoped skills dirs. Remove the skills entirely and promote their load-bearing content (workspace kinds, deliverable artifacts, created-card integrity, profile discovery) into KANBAN_GUIDANCE, which is already injected into every kanban worker's system prompt. Net result: every worker reliably gets the guidance, nothing can leak into a CLI/blank-slate session, and the gating machinery is gone. - agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE - hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe - hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card - hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text - delete skills/devops/kanban-{worker,orchestrator} - docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references - tests: update spawn-argv expectations; re-bound the guidance-size guard Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).	2026-06-21 17:06:48 -07:00
Teknium	b7a912ea45	fix(antigravity): bake in public OAuth client + default project fallback Salvage follow-up on top of @pmos69's #29474. The PR resolved the Antigravity OAuth client purely by discovering it from an installed `agy` binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without agy installed hit a hard 'client ID not available' error. Antigravity's desktop OAuth client is a public, non-confidential installed-app client (PKCE provides the security), baked into every copy of the Antigravity CLI — same posture as the gemini-cli credentials Hermes already ships in google_oauth.py. Bake it in as the final fallback (env -> discovery -> public default) and add the public default Code Assist project as the discovery fallback, matching the reference Antigravity flow. Now consumers can authenticate directly without agy installed.	2026-06-21 16:41:30 -07:00
pmos69	8baa4e9976	feat(cli): add native Antigravity OAuth provider	2026-06-21 16:41:30 -07:00
JP Lew	c11ae8261b	fix(codex): seed app-server sessions with configured cwd	2026-06-21 16:39:02 -07:00
devorun	6f0ecf37da	fix(redact): mask all Authorization schemes and x-api-key style headers Secret redaction only matched `Authorization: Bearer <token>`. Other auth headers passed through verbatim into logs, tool output, and transcripts: - `Authorization: Basic <base64>` — leaks base64(user:password) - `Authorization: token <pat>` / any non-Bearer scheme - `Proxy-Authorization: ...` - `x-api-key: <key>` (Anthropic and many providers) and `api-key`, `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with no known vendor prefix were caught by nothing A logged request or an echoed `curl -H "x-api-key: ..."` command therefore leaked live credentials. Generalize the Authorization rule to mask the credential for any scheme (and Proxy-Authorization) while preserving the header name and scheme word for debuggability, and add an api-key header rule for the single-opaque-value headers. Bearer behavior is unchanged; plain prose containing the word "authorization" (no colon-delimited value) is left untouched. Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key headers, including inside a curl command.	2026-06-21 14:08:06 -07:00
Teknium	587b5b9ac2	fix(backup): capture memory-provider state stored outside HERMES_HOME (#50325 ) hermes backup only walks HERMES_HOME, so memory providers that keep config/credentials in home-anchored dotdirs (honcho -> ~/.honcho, hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data across a backup/import cycle — the peer IDs, session pairings, and API keys never made it into the archive. Add an optional MemoryProvider.backup_paths() hook (default []). The active provider declares its external paths; backup resolves them from config only (no init, no network), archives the ones under the home dir into a reserved _external/ subtree encoded relative to home, and import restores them to their original location with a home-anchored traversal guard and 0600 on credential-shaped files. Paths outside home are skipped as non-portable. honcho, hindsight, and openviking override the hook. E2E-validated full backup->import cycle plus 7 new tests.	2026-06-21 12:03:46 -07:00
Teknium	e0498bd305	fix(bedrock): price Claude prompt-cache tokens in /usage (#50307 ) Bedrock Claude routes through the AnthropicBedrock SDK and injects cache_control, so cached tokens are always reported — but the pricing table had no cache cost fields for any Bedrock model, so /usage showed "cost unknown" on every cached session. Also, cross-region inference profiles (us./global./eu. prefixes) never matched the bare pricing keys. - Add cache_read/cache_write rates to the four Bedrock Claude rows (read 0.1x input, write 1.25x input per the Bedrock pricing page). - Normalize the cross-region prefix in the Bedrock pricing lookup, mirroring is_anthropic_bedrock_model's prefix list. Closes #50295.	2026-06-21 11:48:43 -07:00
teknium1	9e4fe32d36	fix(session): opt the background-review fork out of session finalization The background-review fork (fires ~every 10 turns) pins review_agent.session_id = agent.session_id — the parent's LIVE id — for prefix-cache parity, then calls close(). With session finalization now in close(), that would end the still-active parent session mid-conversation. Set _end_session_on_close = False on the fork so the real owner (CLI close / gateway reset / cron) finalizes the session instead. Follow-up to the #12029 fix.	2026-06-21 11:35:09 -07:00
yeyitech	b17180d950	fix(session): finalize owned SQLite session rows on AIAgent.close() Funnel session finalization through AIAgent.close() — the single terminal path every agent (CLI, gateway, subagent, cron) funnels through — so finished agents stop leaving rows with ended_at IS NULL. The biggest leak source was delegate_task subagent + background-review forks whose close() never ended their row. end_session() is first-reason-wins and no-ops on an already-ended row, so a 'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal path is never clobbered. /resume already calls reopen_session(), so finalizing-on-close does not break resumability. Temporary helper agents that rotate/share the session forward (manual compression, gateway session-hygiene) opt out via _end_session_on_close=False. Also stop the long-running gateway heartbeat once the executor is done or the session slot is rebound to a different agent, preventing a stale 'running: delegate_task' bubble from outliving its run. Closes #12029.	2026-06-21 11:35:09 -07:00
teknium1	41e0c10f7e	fix(agent): route repeated-compression warning through _emit_status (#36908 ) The 'Session compressed N times — accuracy may degrade' warning went through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord never saw it — unlike the two other compression warnings in the same module, which route through _emit_status (and store _compression_warning for late-bound gateway status_callback replay). Set agent._compression_warning + call agent._emit_status() for this warning too, matching the sibling pattern. _emit_status still _vprints for the CLI, so CLI output is unchanged; TUI / gateway surfaces now receive it via status_callback (and replay_compression_warning can re-deliver it once a late-bound gateway callback is wired). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-21 11:34:47 -07:00
konsisumer	3e354b61db	fix(agent): preserve copilot routed headers	2026-06-21 11:29:49 -07:00
Teknium	b6a4638b6d	fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297 ) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/ whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary — silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty content. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback then transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.	2026-06-21 11:27:07 -07:00
teknium1	2f4f23fbfb	fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835 ) When the main provider is the Codex app-server runtime (api_mode codex_app_server), the gateway showed no verbose 'running X' tool-progress breadcrumbs on Telegram while every other provider did. The app-server session processes item/started notifications (command execution, file changes, MCP/dynamic tool calls) but never surfaced them as Hermes tool-progress events — the session was constructed without an on_event hook, so the agent's tool_progress_callback was never invoked on this route. Add _codex_note_to_tool_progress() mapping item/started → (tool_name, preview, args) for commandExecution / fileChange / mcpToolCall / dynamicToolCall, and wire an on_event hook into CodexAppServerSession that forwards mapped events to agent.tool_progress_callback('tool.started', ...) — the same signature the chat_completions path uses (tool_executor.py). Non-tool items (agentMessage/reasoning) and non-item/started methods map to None and are ignored. Co-authored-by: jplew <462836+jplew@users.noreply.github.com>	2026-06-21 08:46:06 -07:00
yeyitech	8a506ed3ac	fix(auth): make load_pool() non-destructive for env-seeded credentials load_pool() is meant to be a read, but it persistently pruned env-seeded pool entries whenever the calling process's os.environ lacked the seeding var. A process without MINIMAX_API_KEY would delete the persisted env:MINIMAX_API_KEY entry from auth.json for every other process, causing auth.json to oscillate and auxiliary auto-detect to fall through to the wrong provider. env:* entries are persisted references re-hydrated from the environment on each load — a missing var means "cannot re-seed right now", not "source is gone forever". _prune_stale_seeded_entries now gates env-source removal behind prune_env_sources (default True for explicit cleanup paths); load_pool() passes prune_env_sources=False. File-backed singletons (device-code OAuth, hermes_pkce) still prune when their backing file is gone, and explicit removal via `hermes auth remove` (source suppression) is unaffected. Fixes #9331. Co-authored-by: houko <suzukaze.haduki@gmail.com>	2026-06-21 08:26:37 -07:00
teknium1	3509be7124	fix(compression): auto-compression triggers at minimum context length (#14690 ) The compaction threshold is max(context_length * threshold_percent, MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on large models, but degenerates at small windows: a model at exactly 64000 ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE window. should_compress() can then never fire, because the provider rejects the request before usage reaches 100%. Auto-compression silently never triggers for any model whose context_length <= MINIMUM / threshold_percent (e.g. 64K-per-slot local models). Centralize the calc in _compute_threshold_tokens(). When the floor would meet or exceed the context window, trigger at 85% of the window (_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses most of its budget before compacting (compacting at the 50% percentage would waste half the small window), but below 100% so compaction actually fires before the provider rejects the request. This mirrors the existing gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at 64000) is unchanged; both call sites (__init__ and update_model) use the shared helper. Co-authored-by: soynchux <soynchuux@gmail.com> Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com> Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-21 07:53:14 -07:00
kshitij	c6a0929875	Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch fix(agent): reset stale token calibration on model switch (#23767)	2026-06-21 20:02:08 +05:30

1 2 3 4 5 ...

1390 commits