hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 10:52:21 +00:00

Author	SHA1	Message	Date
David Gutowsky	87b60ae49a	no-mistakes(review): guard token-delta status msg on actual compression in overflow handler	2026-06-22 15:23:24 +05:30
David Gutowsky	47b6b4cf85	fix #39550: detect token-only compression success Compression can materially reduce request size (tool-result pruning, in-place summarization) without reducing message count. The two compression-success checks in conversation_loop.py (413 handler and context-overflow handler) only compared len(messages) to detect success, missing token-only compression. Now re-estimates tokens after compress_context() returns and treats any >=5% reduction as a successful compression pass. Error logs also use the post-compression token count instead of the stale pre-compression estimate. Fixes: #39550	2026-06-22 15:23:24 +05:30
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
Teknium	2b3a4f0af8	fix(agent): strip stale reasoning_content when falling back to a strict provider (#50480 ) * fix(agent): strip stale reasoning_content when falling back to a strict provider A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content on every assistant tool-call turn (a single space " " pad). api_messages is built once under the primary; on a mid-session fallback to a strict OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale pads were replayed verbatim and rejected with HTTP 400/422: body.messages.2.assistant.reasoning_content: Extra inputs are not permitted (input: ' ') reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never reconciled history built under a reasoning primary against a strict fallback. copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only shapes to non-pad providers. Fix both sites: when the active provider does not enforce echo-back, strip reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when switching TO a reasoning provider is preserved. Covers the Cerebras 400 from #45655 and the DeepSeek->Mistral 422 fallback report. Refs #45655. * test: update reasoning-replay tests for strict-provider stripping test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning provider so the precedence it checks is observable. Add a positive strict-provider test asserting reasoning_content is stripped on replay.	2026-06-21 18:05:07 -07:00
Teknium	84e1d31e54	refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473 ) The kanban-worker and kanban-orchestrator bundled skills existed only to be force-loaded into dispatcher-spawned workers, gated by environments:[kanban] so they wouldn't leak into normal CLI listings. That gating was fragile (the leak that #50443 patched) and the --skills auto-load was already best-effort — most workers ran without it because the bundled skill isn't present in profile-scoped skills dirs. Remove the skills entirely and promote their load-bearing content (workspace kinds, deliverable artifacts, created-card integrity, profile discovery) into KANBAN_GUIDANCE, which is already injected into every kanban worker's system prompt. Net result: every worker reliably gets the guidance, nothing can leak into a CLI/blank-slate session, and the gating machinery is gone. - agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE - hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe - hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card - hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text - delete skills/devops/kanban-{worker,orchestrator} - docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references - tests: update spawn-argv expectations; re-bound the guidance-size guard Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).	2026-06-21 17:06:48 -07:00
Teknium	b7a912ea45	fix(antigravity): bake in public OAuth client + default project fallback Salvage follow-up on top of @pmos69's #29474. The PR resolved the Antigravity OAuth client purely by discovering it from an installed `agy` binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without agy installed hit a hard 'client ID not available' error. Antigravity's desktop OAuth client is a public, non-confidential installed-app client (PKCE provides the security), baked into every copy of the Antigravity CLI — same posture as the gemini-cli credentials Hermes already ships in google_oauth.py. Bake it in as the final fallback (env -> discovery -> public default) and add the public default Code Assist project as the discovery fallback, matching the reference Antigravity flow. Now consumers can authenticate directly without agy installed.	2026-06-21 16:41:30 -07:00
pmos69	8baa4e9976	feat(cli): add native Antigravity OAuth provider	2026-06-21 16:41:30 -07:00
JP Lew	c11ae8261b	fix(codex): seed app-server sessions with configured cwd	2026-06-21 16:39:02 -07:00
devorun	6f0ecf37da	fix(redact): mask all Authorization schemes and x-api-key style headers Secret redaction only matched `Authorization: Bearer <token>`. Other auth headers passed through verbatim into logs, tool output, and transcripts: - `Authorization: Basic <base64>` — leaks base64(user:password) - `Authorization: token <pat>` / any non-Bearer scheme - `Proxy-Authorization: ...` - `x-api-key: <key>` (Anthropic and many providers) and `api-key`, `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with no known vendor prefix were caught by nothing A logged request or an echoed `curl -H "x-api-key: ..."` command therefore leaked live credentials. Generalize the Authorization rule to mask the credential for any scheme (and Proxy-Authorization) while preserving the header name and scheme word for debuggability, and add an api-key header rule for the single-opaque-value headers. Bearer behavior is unchanged; plain prose containing the word "authorization" (no colon-delimited value) is left untouched. Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key headers, including inside a curl command.	2026-06-21 14:08:06 -07:00
Teknium	587b5b9ac2	fix(backup): capture memory-provider state stored outside HERMES_HOME (#50325 ) hermes backup only walks HERMES_HOME, so memory providers that keep config/credentials in home-anchored dotdirs (honcho -> ~/.honcho, hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data across a backup/import cycle — the peer IDs, session pairings, and API keys never made it into the archive. Add an optional MemoryProvider.backup_paths() hook (default []). The active provider declares its external paths; backup resolves them from config only (no init, no network), archives the ones under the home dir into a reserved _external/ subtree encoded relative to home, and import restores them to their original location with a home-anchored traversal guard and 0600 on credential-shaped files. Paths outside home are skipped as non-portable. honcho, hindsight, and openviking override the hook. E2E-validated full backup->import cycle plus 7 new tests.	2026-06-21 12:03:46 -07:00
Teknium	e0498bd305	fix(bedrock): price Claude prompt-cache tokens in /usage (#50307 ) Bedrock Claude routes through the AnthropicBedrock SDK and injects cache_control, so cached tokens are always reported — but the pricing table had no cache cost fields for any Bedrock model, so /usage showed "cost unknown" on every cached session. Also, cross-region inference profiles (us./global./eu. prefixes) never matched the bare pricing keys. - Add cache_read/cache_write rates to the four Bedrock Claude rows (read 0.1x input, write 1.25x input per the Bedrock pricing page). - Normalize the cross-region prefix in the Bedrock pricing lookup, mirroring is_anthropic_bedrock_model's prefix list. Closes #50295.	2026-06-21 11:48:43 -07:00
teknium1	9e4fe32d36	fix(session): opt the background-review fork out of session finalization The background-review fork (fires ~every 10 turns) pins review_agent.session_id = agent.session_id — the parent's LIVE id — for prefix-cache parity, then calls close(). With session finalization now in close(), that would end the still-active parent session mid-conversation. Set _end_session_on_close = False on the fork so the real owner (CLI close / gateway reset / cron) finalizes the session instead. Follow-up to the #12029 fix.	2026-06-21 11:35:09 -07:00
yeyitech	b17180d950	fix(session): finalize owned SQLite session rows on AIAgent.close() Funnel session finalization through AIAgent.close() — the single terminal path every agent (CLI, gateway, subagent, cron) funnels through — so finished agents stop leaving rows with ended_at IS NULL. The biggest leak source was delegate_task subagent + background-review forks whose close() never ended their row. end_session() is first-reason-wins and no-ops on an already-ended row, so a 'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal path is never clobbered. /resume already calls reopen_session(), so finalizing-on-close does not break resumability. Temporary helper agents that rotate/share the session forward (manual compression, gateway session-hygiene) opt out via _end_session_on_close=False. Also stop the long-running gateway heartbeat once the executor is done or the session slot is rebound to a different agent, preventing a stale 'running: delegate_task' bubble from outliving its run. Closes #12029.	2026-06-21 11:35:09 -07:00
teknium1	41e0c10f7e	fix(agent): route repeated-compression warning through _emit_status (#36908 ) The 'Session compressed N times — accuracy may degrade' warning went through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord never saw it — unlike the two other compression warnings in the same module, which route through _emit_status (and store _compression_warning for late-bound gateway status_callback replay). Set agent._compression_warning + call agent._emit_status() for this warning too, matching the sibling pattern. _emit_status still _vprints for the CLI, so CLI output is unchanged; TUI / gateway surfaces now receive it via status_callback (and replay_compression_warning can re-deliver it once a late-bound gateway callback is wired). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-21 11:34:47 -07:00
konsisumer	3e354b61db	fix(agent): preserve copilot routed headers	2026-06-21 11:29:49 -07:00
Teknium	b6a4638b6d	fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297 ) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/ whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary — silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty content. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback then transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.	2026-06-21 11:27:07 -07:00
teknium1	2f4f23fbfb	fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835 ) When the main provider is the Codex app-server runtime (api_mode codex_app_server), the gateway showed no verbose 'running X' tool-progress breadcrumbs on Telegram while every other provider did. The app-server session processes item/started notifications (command execution, file changes, MCP/dynamic tool calls) but never surfaced them as Hermes tool-progress events — the session was constructed without an on_event hook, so the agent's tool_progress_callback was never invoked on this route. Add _codex_note_to_tool_progress() mapping item/started → (tool_name, preview, args) for commandExecution / fileChange / mcpToolCall / dynamicToolCall, and wire an on_event hook into CodexAppServerSession that forwards mapped events to agent.tool_progress_callback('tool.started', ...) — the same signature the chat_completions path uses (tool_executor.py). Non-tool items (agentMessage/reasoning) and non-item/started methods map to None and are ignored. Co-authored-by: jplew <462836+jplew@users.noreply.github.com>	2026-06-21 08:46:06 -07:00
yeyitech	8a506ed3ac	fix(auth): make load_pool() non-destructive for env-seeded credentials load_pool() is meant to be a read, but it persistently pruned env-seeded pool entries whenever the calling process's os.environ lacked the seeding var. A process without MINIMAX_API_KEY would delete the persisted env:MINIMAX_API_KEY entry from auth.json for every other process, causing auth.json to oscillate and auxiliary auto-detect to fall through to the wrong provider. env:* entries are persisted references re-hydrated from the environment on each load — a missing var means "cannot re-seed right now", not "source is gone forever". _prune_stale_seeded_entries now gates env-source removal behind prune_env_sources (default True for explicit cleanup paths); load_pool() passes prune_env_sources=False. File-backed singletons (device-code OAuth, hermes_pkce) still prune when their backing file is gone, and explicit removal via `hermes auth remove` (source suppression) is unaffected. Fixes #9331. Co-authored-by: houko <suzukaze.haduki@gmail.com>	2026-06-21 08:26:37 -07:00
teknium1	3509be7124	fix(compression): auto-compression triggers at minimum context length (#14690 ) The compaction threshold is max(context_length * threshold_percent, MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on large models, but degenerates at small windows: a model at exactly 64000 ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE window. should_compress() can then never fire, because the provider rejects the request before usage reaches 100%. Auto-compression silently never triggers for any model whose context_length <= MINIMUM / threshold_percent (e.g. 64K-per-slot local models). Centralize the calc in _compute_threshold_tokens(). When the floor would meet or exceed the context window, trigger at 85% of the window (_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses most of its budget before compacting (compacting at the 50% percentage would waste half the small window), but below 100% so compaction actually fires before the provider rejects the request. This mirrors the existing gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at 64000) is unchanged; both call sites (__init__ and update_model) use the shared helper. Co-authored-by: soynchux <soynchuux@gmail.com> Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com> Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-21 07:53:14 -07:00
kshitij	c6a0929875	Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch fix(agent): reset stale token calibration on model switch (#23767)	2026-06-21 20:02:08 +05:30
kshitij	ed8f7898b9	Merge pull request #50136 from NousResearch/fix/context-aware-tool-budget fix(agent): scale tool-output budget to the model context window (#23767)	2026-06-21 20:01:32 +05:30
Teknium	9f67ba1b01	fix(agent): guard finalize_turn cleanup chain so it never drops the response (#50009 ) When a turn hit max_iterations, finalize_turn ran three unguarded cleanup steps after the model's summary — _save_trajectory (file I/O), _cleanup_task_resources (remote VM/browser teardown), and _persist_session (SQLite write). Any raise there propagated out of run_conversation, discarding the partial final_response the caller was waiting for; subprocess wrappers saw an empty stdout with no traceback (#8049). Each step is now guarded independently so one failure can't skip the others. Failures log at ERROR with a traceback and are surfaced on the result dict via cleanup_errors; the partial response is always returned. Closes #8049.	2026-06-21 07:25:42 -07:00
kshitijk4poor	1e0b3a2bcc	fix(agent): reset stale token calibration on model switch (#23767 ) ContextCompressor.update_model() recomputed context_length/threshold/budgets but kept the cross-call calibration state (last_real_prompt_tokens, last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens, awaiting_real_usage_after_compression, _ineffective_compression_count) from the PREVIOUS model. Those fields encode 'the provider proved this prompt fit' / 'preflight can be deferred' decisions valid only for the model that produced them. Carried across a switch to a smaller-context model, should_defer_preflight_to_real_usage() used the old model's 'it fit' history to SKIP a preflight compression the new model actually needed — sending an oversized prompt the provider rejects (#23767). update_model() now clears that state; the new model's first response repopulates it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer no longer suppresses and should_compress fires on an over-threshold estimate.	2026-06-21 17:46:58 +05:30
kshitijk4poor	1965d56219	fix(agent): scale tool-output budget to the model context window (#23767) The tool-result persistence budget was a fixed 100K chars/result and 200K chars/turn regardless of the active model. On a small-context model (e.g. a 65K-token local model switched into mid-session) a single large tool result (reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens) could by itself approach or exceed the window, forcing an oversized request that the provider rejects as "Prompt too long". - budget_config.budget_for_context_window() scales per-result/per-turn char caps to a fraction of the model window, clamped to the historical 100K/200K defaults (large models unchanged) and floored so small models stay usable. - resolve_threshold() now caps the per-tool registry value at default_result_size so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate a scaled-down budget. No-op for the default budget (both 100K). - tool_executor wires the agent's live context_length (recomputed on model switch) into all four persist/turn-budget call sites. read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical to today.	2026-06-21 17:46:38 +05:30
teknium1	14ef6312b5	fix(compression): decay protect_first_n so early turns don't fossilize (#11996) protect_first_n keeps the first N non-system messages verbatim through compaction so the original task framing survives. But it was applied on EVERY compression pass: the same early user turns were re-copied into each child session and never summarized away, so across a long, repeatedly-compressed session those old messages became immortal and grew the protected head unboundedly (#11996, P1). Decay it: protect_first_n applies on the FIRST compaction only. Once the session has been compressed at least once (compression_count >= 1, or a handoff summary already exists), the early turns are captured in the summary, so _effective_protect_first_n() returns 0 and only the system prompt stays protected. The decay is read at compress_start computation time, before compression_count/_previous_summary are mutated at the end of compress(), so the first pass still protects correctly. Co-authored-by: truenorth-lj <liliangjya@gmail.com> Co-authored-by: davidvv <david.vv@icloud.com>	2026-06-21 00:06:58 -07:00
allo	bc85f6150e	docs: document per-event extra keys in shell-hook wire protocol The shell-hook stdin payload's extra object contains event-specific kwargs, but the docstring only mentioned the field without listing what each event actually puts inside it. Add a reference table covering post_tool_call, pre_tool_call, on_session_start, on_session_end, and subagent_stop — the five hook sites that emit extra keys beyond the top-level payload. Closes #49370	2026-06-20 23:23:47 -07:00
Greg DeYoung	5eb158e317	docs(hermes-agent skill): document project context files and their discovery rules Adds a new 'Project Context Files' section to the hermes-agent skill explaining the priority order and discovery rules for .hermes.md, AGENTS.md, CLAUDE.md, and .cursorrules. Specifically clarifies: - .hermes.md walks parents up to the git root (good for monorepos) - AGENTS.md / agents.md is cwd-only (portable to other agents) - The 20K cap and head+tail truncation strategy - The threat-pattern scanner behavior (blocks content, not file) - What --ignore-rules actually skips (everything) Also fixes an inaccurate docstring in agent/agent_init.py for skip_context_files — the previous text only mentioned SOUL.md, AGENTS.md, and .cursorrules, but the actual behavior (per build_context_files_prompt and the --ignore-rules CLI flag) skips all of them plus .hermes.md and CLAUDE.md. Refs: https://github.com/NousResearch/hermes-agent/issues/46775	2026-06-20 23:23:47 -07:00
teknium1	1f874dfe44	fix(compression): stop fallback summary triplicating the latest user ask When LLM summarization fails, the deterministic fallback summary rendered the latest user ask (active_task = "User asked: '<ask>'") verbatim under THREE headings — Historical Task Snapshot, Historical In-Progress State, and Historical Pending User Asks. Re-presenting an already-handled ask as unresolved in-progress/pending work made the model re-answer it AND treat the resurrected ask as the active turn, burying the genuinely-new post-compaction user message (#49307: answer repetition + new-instruction loss, P1). Keep the latest ask once, under Task Snapshot, as historical context only. The In-Progress and Pending-Asks sections now say 'Unknown / None recoverable from deterministic fallback' (consistent with the Active State / Key Decisions / Resolved Questions sections) and explicitly note the ask is historical, not outstanding. The raw turn text still appears in the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn record, not a re-labeled instruction. Note: the separate role=assistant standalone-summary regurgitation (#33256) is left as-is — that role choice is constrained by strict message alternation (user collides with a user-ending head) and is already mitigated by the summary end-marker; forcing the role would risk the alternation invariant. Co-authored-by: r266-tech <r2668940489@gmail.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-20 23:19:27 -07:00
teknium1	2f3177adf4	fix(compression): protect the summary call from mid-flight interrupts Context compression is atomic, but a gateway interrupt (an incoming user message while the agent is busy) could abort the in-flight summary call. The Codex Responses aux stream polls the thread interrupt flag and raised InterruptedError unconditionally — so compression fell back to a degraded static 'summary unavailable' marker, losing the real handoff (#23975). Add a thread-local interrupt-protection flag (aux_interrupt_protection context manager) in auxiliary_client; the Codex stream's cancellation check honors it. The compressor wraps its summary call_llm in the context manager. Timeouts still fire (a hung call must die) and all other aux tasks (vision, web_extract, title_generation, …) stay interruptible. Re-entrant, so the main-model retry recursion is safe. Co-authored-by: konsisumer <der@konsi.org>	2026-06-20 21:32:30 -07:00
teknium1	7ace96ba40	fix(compression): preserve goal, platform, and session indexing across rotation Three state-loss bugs at the compression rotation boundary, fixed together because they all live in the same ~80-line rotation block: - #33618: a persistent /goal did not follow the rotation. load_goal does a flat per-session lookup with no lineage walk, so a goal silently died when compression minted a fresh child id. Added migrate_goal_to_session() and call it after the child session is created (move-not-copy: the parent row is archived as cleared so exactly one active goal row exists). - #33906/#33907: if the child create_session raised (FK constraint, contended write), the outer handler only warned and let the agent continue on the NEW id — which has no row in state.db — producing an orphan session. Now the rotation rolls agent.session_id back to the still-indexed parent (reopening it) instead of stranding the conversation on a phantom id. - #27633: the compaction-boundary on_session_start notification omitted the platform kwarg, so context-engine plugins saw source=unknown for every message after the boundary. Forward platform (matching the initial session-start call in agent_init.py). Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com> Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com> Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-20 20:06:24 -07:00
x7peeps	4467c22c8f	fix(chat-completions): strip timestamp from messages before sending to strict providers Per-message timestamp metadata injected by _apply_persist_user_message_override leaks into the Chat Completions payload sent to the provider. Strict OpenAI-compatible providers (e.g. Fireworks-backed endpoints like OpenCode Go 'glm-5.2', Mistral, Kimi) reject this schema-foreign field with HTTP 400: Extra inputs are not permitted, field: 'messages[0].timestamp' The ChatCompletionsTransport.convert_messages already strips known internal-only fields (tool_name, _-prefixed scaffolding keys, codex_reasoning_items, etc.) — add timestamp to that list. Closes #47868	2026-06-20 17:05:17 -07:00
teknium1	5a53e0f0f4	fix(compression): abort on auth failure instead of rotating into a degraded session When the auxiliary summary call fails with an authentication/permission error (HTTP 401/403), context compression now ABORTS and preserves the session unchanged instead of rotating into a child session with a placeholder summary. Before: a 401 (invalid/blocked key, or a token pointed at the wrong inference host) fell through every transient-error check to 'return None', and because compression.abort_on_summary_failure defaults False, compress() took the static-fallback path and rotated the session anyway (messages N->N). The user landed on a fresh-but-broken session that kept failing the same way — paying for a full-context API call each turn with no useful compression. After: _generate_summary classifies 401/403 as a non-recoverable auth failure (_last_summary_auth_failure) and compress() aborts on it regardless of abort_on_summary_failure. A distinct auxiliary summary_model that 401s still retries once on the main model first (its dedicated creds may be the only broken thing); the abort only sticks when the main model itself auth-fails or the fallback also auth-fails. The existing _last_compress_aborted handling in conversation_compression.py already skips rotation and emits a warning, so no session rotation occurs. Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite flag=False, non-auth failures keep the historical fallback path, and aux-model auth failure recovers on main without aborting.	2026-06-20 11:38:21 -07:00
teknium1	f22dd8a75a	fix(agent): fail over to fallback provider on persistent auth failure (401/403) When the active provider returns a 401/403 that survives its per-provider credential-refresh attempt (revoked OAuth, blocked/expired key, or an account pinned to a dead/staging inference endpoint), the conversation loop now escalates to the configured fallback chain instead of dead-ending. Before: the generic failover dispatch fired only for {rate_limit, billing}; auth/auth_permanent fell through to 'switch providers manually' advice and never called _try_activate_fallback(). A user whose primary credential was broken kept thrashing on the same dead credential every turn — the main agent appeared 'stuck in fallback mode' while never actually failing over. This also affected auxiliary tasks (compression, vision, title-gen), since auto-resolved aux follows the main provider. After: a persistent auth failure with a configured fallback chain switches to the next provider (mirroring the rate-limit/billing failover path), guarded one-shot per attempt by TurnRetryState.auth_failover_attempted. When no fallback is configured the behavior is unchanged — it falls through to the existing terminal handling and provider-specific troubleshooting guidance. Tests: test_auth_provider_failover.py — 401/403 classify as auth, the gating condition fires only with a chain present + guard unset, the guard blocks repeats, and non-auth (500) errors do not trigger auth failover.	2026-06-20 11:38:01 -07:00
kshitijk4poor	4663456996	fix(compression): in-place compaction is non-destructive (soft-archive, not delete) Teknium review: keeping one durable session id must NOT come at the cost of destroying history. The prior in-place implementation used replace_messages, which hard-DELETEs the pre-compaction turns (they also drop out of the FTS index) — same id, but the original conversation is gone with no recovery path and the summary becomes the only record. Rotation today is non-destructive (the old session's full transcript survives under the old id); in-place must match that durability contract, not weaken it. Fix: compact in place by SOFT-ARCHIVING, reusing the existing messages.active flag (the /undo soft-delete mechanic), instead of deleting: - New SessionDB.archive_and_compact(session_id, compacted): in one atomic write, UPDATE messages SET active=0 on the live turns, then insert the compacted set as fresh active=1 rows. Nothing is deleted. - The insert loop is extracted into a shared _insert_message_rows() helper so archive_and_compact and replace_messages don't duplicate the 60-line column/encoding block (extend-don't-duplicate). - Agent in-place branch calls archive_and_compact instead of replace_messages. Durability outcome (proven by test + E2E across repeated compactions): - Live context load (get_messages_as_conversation / get_messages) filters active=1, so a resume reloads ONLY the compacted set — compaction still shrinks the live session. - The pre-compaction turns stay on disk at active=0, recoverable via get_messages(include_inactive=True) / restore_rewound. - They remain FTS-searchable: the messages_fts* triggers index on INSERT and remove on DELETE only — they do NOT key on active, and active=0 is a content-preserving UPDATE. session_search still finds them. 
- Verified across TWO successive compactions: the 1st compaction's originals are still recoverable + searchable after the 2nd (answers the "no recovery path after the next compaction" concern directly). message_count now reflects the LIVE (active/compacted) count, matching the live load. replace_messages keeps its DELETE semantics (still correct for /retry, /undo) and gains a docstring note pointing compaction at the non-destructive method. Tests: test_in_place_keeps_same_session_id strengthened to assert the 8 seeded originals survive at active=0 alongside the 2 compacted rows AND stay FTS-searchable. Mutation check: swapping archive_and_compact back to a hard DELETE fails the test, so the non-destructive contract is bound. 285 hermes_state + in-place tests green; rotation/persistence/compress-command/cli suites green; ruff clean.	2026-06-20 10:57:07 -07:00
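The soft-archive approach in 4663456996 might look something like the sketch below. The method name archive_and_compact and the messages.active flag come from the commit message; the table schema, the function signatures, and the open_demo_db helper are assumptions made for illustration, and the FTS triggers are left out.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Hypothetical minimal schema; the real messages table has more columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, "
        "content TEXT, active INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def archive_and_compact(conn, session_id, compacted):
    """Mark live rows inactive, then insert the compacted set as active rows."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE messages SET active = 0 WHERE session_id = ? AND active = 1",
            (session_id,),
        )
        conn.executemany(
            "INSERT INTO messages (session_id, role, content, active) "
            "VALUES (?, ?, ?, 1)",
            [(session_id, m["role"], m["content"]) for m in compacted],
        )


def get_messages(conn, session_id, include_inactive=False):
    """Live load filters active=1; include_inactive=True also returns archived rows."""
    sql = "SELECT role, content, active FROM messages WHERE session_id = ?"
    if not include_inactive:
        sql += " AND active = 1"
    return conn.execute(sql + " ORDER BY id", (session_id,)).fetchall()
```

Because nothing is deleted, a second compaction only flips the first compacted set to active=0 as well, so every earlier generation of rows stays on disk.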
kshitijk4poor	4f9485a95d	refactor(compression): tidy in-place compaction path (simplify pass) Parallel 3-reviewer cleanup of the in-place compaction code. Findings applied: - perf: in-place mode no longer pre-flushes current-turn messages. The flush ran INSERTs that the immediately-following replace_messages(compressed) DELETE+reinsert discarded -- pure wasted writes per compaction. The current-turn tail survives via the compressor's compressed output (protect_last_n), not the flush. Verified no data loss; rotation still pre-flushes (its old session row is preserved, so the flush is real there). - quality: hoist the two shared post-write steps (update_system_prompt + _last_flushed_db_idx = 0) below the if/else -- they ran in both branches against agent.session_id. Removes the easiest divergence bug. - quality: compute the compaction-boundary locals (_old_sid, _is_boundary, _boundary_parent) ONCE instead of recomputing locals().get('old_session_id') and the "_old_sid or agent.session_id or ''" chain three times. - quality: initialize compacted_in_place up front and assign agent._last_compaction_in_place directly, dropping the fragile locals().get('compacted_in_place') reflection. - reuse: parse the in_place config flag with utils.is_truthy_value (the project's canonical truthy coerce) instead of a hand-rolled str().lower() in {...} (agent_init already imports from utils). Dropped as false positives / out of scope: gateway getattr of agent internals (established session_id pattern), dual result-dict carry (mirrors history_offset etc.), stringly-typed "compression" (codebase-wide convention, no constant). Behavior-preserving: 7 in-place tests (incl. 2 new flush-guard tests) + 26 rotation/boundary/persistence/command tests green; mutation check confirms the durable-replace guard still binds (removing replace_messages fails the test); ruff clean. Added test_in_place_skips_redundant_preflush / test_rotation_still_preflushes to guard the perf change.	2026-06-20 10:57:07 -07:00
kshitijk4poor	1fbf48d4ad	fix(compression): make in-place compaction durable + rotation-independent end-to-end Review (Codex + 3-agent parallel) found the first cut of in-place mode was incomplete: it only updated the system prompt, so the persisted transcript stayed 'full history + summary' and the next turn/resume reloaded the full history and immediately re-compacted (a loop), and every downstream layer that keyed off session-id rotation silently no-op'd. The session_id was doing double duty as the 'compaction happened' signal. This wires the whole path so removing rotation is actually complete: Agent (agent/conversation_compression.py): - In-place now DURABLY replaces the transcript: replace_messages(session_id, compressed) on the same row (the canonical store the gateway reloads from), not just update_system_prompt. Resume reloads the compacted set; no loop. - Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids cleared) so next-turn appends diff against the compacted transcript. - Expose a rotation-independent signal: agent._last_compaction_in_place, and in_place=True on the session:compress event. - Fire the compaction-boundary hooks (context-engine on_session_start, memory manager on_session_switch, reason='compression') in BOTH modes — in-place passes the same id as parent so DAG/buffer state still checkpoints. Without this, memory/context plugins miss every in-place compaction. Gateway auto-compress (gateway/run.py): - Read agent._last_compaction_in_place; set history_offset=0 on rotation OR in-place (both return the compacted set, so slicing past the pre-compaction length would drop everything). Carry compacted_in_place in the result dict. - No extra rewrite needed: the agent shares the gateway's SessionDB, so its replace_messages already updated the canonical store load_transcript reads. Manual /compress (gateway/slash_commands.py): - The throwaway /compress agent has no _session_db, so rewrite_transcript is the durable write. 
Previously gated behind 'if rotated:' which treated 'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite — making /compress a silent no-op in in-place mode. Now rewrites on rotated OR in_place; the data-loss guard still fires only for the genuine no-rotation-AND-not-in-place failure. Hygiene auto-compress already writes _compressed to the same id unconditionally (its agent has no _session_db, can't rotate) — correct for in-place, no change. Tests (tests/run_agent/test_in_place_compaction.py): - Assert the DURABLE transcript IS the compacted set after reload (get_messages_as_conversation == compacted), message_count==2, flush identity reset, and the rotation-independent signal set on in-place / unset on rotation. Rotation regression guard unchanged. Verified: 64 tests green across in-place + rotation/persistence/boundary/ concurrent/failure-sync/command/cli suites; E2E both modes (durable replace, gateway offset=0, rotation preserves old transcript); ruff clean. Still default-off.	2026-06-20 10:57:07 -07:00
kshitijk4poor	47fadc24d7	feat(compression): in-place compaction option that keeps one session id (#38763 ) Context compression today rewrites the message list AND rotates the session id — it ends the session, forks a parent_session_id child, and renumbers the title (name -> name #2). That moving identity key is the root cause of a whole bug cluster: /goal lost (#33618), pending response lost at the split (#14238), orphan sessions (#33907), TUI sid desync (#36777), FTS search gaps + duplicate sidebar entries (#45117), null continuation cwd (#42228), and title-rename dead-ends (#48989). It also forced a large defensive apparatus (compression lock, contextvar/env/ logging triple-sync, orphan finalization, gateway SessionEntry re-propagation, tip projection) whose only job is surviving a mid-conversation id change. Add a compression.in_place config flag (default False during rollout). When True, compaction rewrites the transcript and rebuilds the system prompt but keeps the SAME session_id: no end_session, no child row, no title renumber, no contextvar/logging re-sync, no memory/context-engine session-switch. The conversation keeps one durable id for life, like Claude Code / Codex. Compaction is lossy by design — the pre-compaction transcript is summarized away, not archived. The rotation path is unchanged when the flag is off (moved verbatim into an else branch). Staged rollout: this PR ships the option behind a default-off flag for live validation; a follow-up flips the default and deletes the now-redundant rotation machinery, superseding the 14 open band-aid PRs in this area. 
- hermes_cli/config.py: add compression.in_place (default False), documented - agent/agent_init.py: resolve the flag -> agent.compression_in_place - agent/conversation_compression.py: branch compress_context() on the flag - tests/run_agent/test_in_place_compaction.py: in-place invariants + rotation regression guard + config default The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no boundary data loss. Prompt-cache invariant preserved: the system-prompt rebuild is the same single sanctioned invalidation that already happens during compaction — no NEW invalidation. Message alternation preserved.	2026-06-20 10:57:07 -07:00
Sancho	c884ff64ea	fix(agent): keep system-prompt model identity in sync across provider failover The session-stable system prompt embeds Model:/Provider: identity lines, but mid-turn failover (try_activate_fallback) swaps the runtime without touching them, so a fallback model misreports itself as the primary when asked "what model are you?". rewrite_prompt_model_identity() rewrites the last occurrence of each line on _cached_system_prompt when a fallback activates (and back on restore, byte-identical so the primary's prefix cache still hits). The rewrite is never persisted to the session DB. _sync_failover_system_message() patches the in-flight api_messages[0] at all 8 failover sites so the current turn ships the corrected identity. Cache-safe: the fallback's prefix cache is cold on a model switch anyway. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-20 10:46:01 -07:00
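The "rewrite the last occurrence, and restore byte-identically" idea from c884ff64ea could be sketched as follows. The helper name rewrite_last_identity_line and the assumed "Model: <name>" line layout are hypothetical; the real rewrite_prompt_model_identity() may differ in signature and matching rules.

```python
def rewrite_last_identity_line(prompt: str, label: str, value: str) -> str:
    """Replace the LAST line starting with `label` (e.g. "Model:") with a new value.

    Assumes identity lines look like "Model: some-name"; earlier lines that
    happen to start with the same label are left untouched.
    """
    lines = prompt.split("\n")
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].startswith(label):
            lines[i] = f"{label} {value}"
            break
    return "\n".join(lines)
```

Applying the same function with the primary's name restores the original string exactly, provided the original line used the single-space layout assumed here, which is what keeps the primary's prefix cache valid after a restore.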
kshitijk4poor	a7dd98c860	fix(env): guard remaining malformed int/float env var casts with utils helpers Widen the env_float() guard from #48735 across the whole bug class: a non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd port) raised an unhandled ValueError and crashed adapter/agent init. Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to the canonical utils.env_int / utils.env_float helpers (the established house pattern), instead of duplicating per-module helpers or inline try/except: - gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT - gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL - gateway/platforms/feishu.py: dedup cache + text/media batch settings - gateway/platforms/wecom.py, discord/adapter.py: text batch delays - gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT - gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT - hermes_cli/auth.py: CODEX/XAI refresh timeouts - agent/chat_completion_helpers.py: API/stream read/stale timeouts - run_agent.py, agent/auxiliary_client.py: API + nous timeouts Sites already guarded by try/except or local helpers are left untouched. The HERMES_MAX_ITERATIONS sites are already guarded on main via _current_max_iterations(), so they are not included.	2026-06-20 14:54:36 +05:30
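A guarded environment cast of the kind a7dd98c860 describes might be written as below. The names env_int and env_float appear in the commit message, but their real signatures in utils are not shown there; the (name, default) form and the silent fallback are assumptions.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back on missing or malformed values."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return float(raw)
    except ValueError:
        # e.g. a stale .env line such as HERMES_API_TIMEOUT=abc
        return default


def env_int(name: str, default: int) -> int:
    """Integer variant; a value like "8080.0" is treated as malformed."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        return default
```

Routing every call site through one helper keeps the fallback behavior identical across modules instead of scattering inline try/except blocks.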
Teknium	cf58f1a520	feat(titles): support language-aware title generation (#45296) Make auxiliary title prompts match the user language by default, with an optional pinned `auxiliary.title_generation.language` config.	2026-06-19 17:15:52 -07:00
Gille	013f9c8750	fix(memory): log CLI shutdown hook failures Makes the CLI memory-provider shutdown path observable: log when CLI cleanup calls memory shutdown (with session id + message count), warn instead of swallowing CLI memory-shutdown exceptions, warn on on_session_end failures during agent shutdown, and raise the MemoryManager provider-hook failure log from debug to warning with a traceback. Salvaged from PR #49287 (authored by Gille / @helix4u).	2026-06-19 16:59:43 -07:00
alt-glitch	f3e967aae5	fix(mcp): round-3 polish — generation capture adjacency + gateway contract note Third review pass (Hermes subagent) declared convergence: no BLOCKING, the round-2 generation-aware publish / context-engine staging / CLI reload / ACP routing all verified correct by hand and by test. - agent_init: capture _tool_snapshot_generation immediately before the tool snapshot (was ~425 lines earlier); removes a harmless skew window so the recorded generation always matches the snapshot it describes. - gateway/run.py _execute_mcp_reload: keep preserving each cached agent's build-time enabled_toolsets EXACTLY (do NOT merge newly-connected servers like CLI/TUI do) and document WHY — gateway sessions can be deliberately locked down, and test_reload_mcp_preserves_per_agent_toolset_overrides asserts this. A reviewer suggested "parity" here; it would have violated that contract.	2026-06-19 11:57:43 -07:00
alt-glitch	88d523220f	fix(mcp): address adversarial review round 2 (stale-publish race, parity holes) Second review pass (Codex + Hermes subagent). Codex reproduced a real race with a two-thread harness; both converged on the remaining issues. - Generation-aware publish (fixes a lost-update race): two refresh callers (the late-refresh daemon and the between-turns prologue around turn 1) could each compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry generation could acquire the publish lock after a newer caller and clobber it, deleting just-landed tools. refresh_agent_mcp_tools now captures registry._generation before computing and refuses to publish a stale set; agent._tool_snapshot_generation tracks the published generation. - Context-engine routing names (_context_engine_tool_names) are now staged on a local and published atomically with the snapshot, and only claimed when this rebuild actually appended the schema — matching agent_init's dedup so a registry/plugin tool of the same name keeps its own dispatch. (Previously mutated live, before the publish lock, and on no-change refreshes.) - CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a server newly ENABLED in config mid-session wasn't picked up (TUI already re-resolved). Merge now-connected MCP server names into the override (unless the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in sync. Closes the CLI/TUI parity hole. - ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th sibling rebuild that re-injected memory tools but NOT context-engine tools and bypassed the atomic/name-diff path (inert today, fragile). - mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG (single source of truth) instead of a stale hardcoded 5.0 literal. - Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout fallback uses DEFAULT_CONFIG.	2026-06-19 11:57:43 -07:00
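The stale-publish guard from 88d523220f can be illustrated with a small model. The Registry and Agent classes below are simplified stand-ins, and publish() is a hypothetical helper; only the idea of capturing a registry generation before computing and refusing to publish an older one is taken from the commit message.

```python
import threading


class Registry:
    """Toy tool registry whose generation bumps on every change."""

    def __init__(self):
        self.generation = 0
        self.tools = set()
        self._lock = threading.Lock()

    def register(self, name: str) -> None:
        with self._lock:
            self.tools.add(name)
            self.generation += 1

    def snapshot(self):
        """Return (generation, tools) captured together under the lock."""
        with self._lock:
            return self.generation, frozenset(self.tools)


class Agent:
    def __init__(self):
        self.tools = frozenset()
        self.snapshot_generation = -1  # nothing published yet
        self.publish_lock = threading.Lock()


def publish(agent: Agent, generation: int, tools: frozenset) -> bool:
    """Publish a computed snapshot unless a newer generation already landed."""
    with agent.publish_lock:
        if generation < agent.snapshot_generation:
            return False  # stale: a slower caller must not clobber newer tools
        agent.tools = tools
        agent.snapshot_generation = generation
        return True
```

A slow caller that snapshotted before a new tool registered gets False back and leaves the agent's newer tool set intact.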
alt-glitch	b6e2a54a94	fix(mcp): address adversarial review round 1 (cache parity, gates, races) Consolidated findings from three independent reviewers (Codex, Claude Code, a Hermes subagent w/ the hermes-agent-dev skill): - BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently dropping post-build-injected memory-provider (mem0/honcho/…) and context- engine (lcm_) tools on every refresh. Now additive-preserving: re-applies the same injectors agent_init uses, staged on locals and published atomically. - Re-injection now honors the #5544 enabled_toolsets gate for context-engine tools, so a restricted-toolset platform can't get lcm_ leaked back in. - Atomic read-diff-publish under one lock: the returned `added` set and the (tools, valid_tool_names) pair are consistent even under concurrent callers (no half-swap, no TOCTOU). - background_review fork opts out (_skip_mcp_refresh) so its byte-identical tools[] cache parity with the parent is preserved. - CLI /reload-mcp routed through the shared helper (was a 4th divergent copy with the same clobber bug + missing disabled_toolsets). - Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user just enabled in config this session is picked up; automatic paths reuse the agent's build-time selection. - mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the between-turns refresh, so the startup wait is only a small turn-1 UX bump rather than a heavy dead-server latency penalty. - has_registered_mcp_tools checks registered TOOLS (not connected servers) so a zero-tool/prompt-only server doesn't make the per-turn hook fire forever. - Tests: rewrote the thread-safety test to actually exercise the write path (alternating tool sets), added the #5544-gate regression, the memory/context preservation regression, and a "callable next turn via valid_tool_names" contract; removed a dead monkeypatch line.	2026-06-19 11:57:43 -07:00
alt-glitch	3713483874	fix(mcp): refresh agent tool snapshot between turns (cache-safe late-binding) A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting after the agent's one-time tool snapshot was uncallable for the rest of the session. The merged pre-first-turn late-refresh only helps during the dead air before the user's first keystroke; once a turn starts it bails to protect the prompt cache, so a user who types before the server connects never gets the tools without a manual /reload-mcp. Refresh the snapshot in the per-turn prologue (build_turn_context), before this turn's first API call assembles tools=. This is cache-safe by construction: the refresh only ever extends a fresh request prefix at a turn boundary, never mutates the cached prefix of an in-flight turn. So late tools become callable on the user's NEXT turn automatically, with no /reload-mcp and no cache cost. - tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with no MCP servers (the common case) skip the rebuild entirely. - agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the top of the prologue when MCP servers are registered. - tests: 3 contract tests through the real build_turn_context (adds late tool; skipped when no servers; no snapshot churn when unchanged). .hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.	2026-06-19 11:57:43 -07:00
alt-glitch	990273d90a	fix(agent): accept pixel-correct image downscale when bytes grow (#48013 ) The image-too-large reactive shrink (try_shrink_image_parts_in_messages) conflated two independent constraints: it always rejected a resize whose re-encoded bytes were >= the original, even when the shrink was driven by a PIXEL-DIMENSION cap (Anthropic many-image 2000px) rather than the byte budget. Downscaled screenshot PNGs routinely re-encode LARGER in bytes, so the dimension-correct result was discarded and the image left oversized -> the provider re-rejected on retry and the session wedged forever. Fix: track which constraint triggered the shrink (bytes vs dimension) and gate the accept on the SAME axis. * dimension path: accept the result as long as it is now within max_dimension, regardless of byte size (verify via Pillow; fall back to the byte gate only when the re-encode can't be decoded). * bytes path: still require bytes to shrink, but ALSO re-check the per-side cap when it's active — _resize_image_for_vision returns a best-effort, possibly over-cap blob when it exhausts its halving budget on a very-high-aspect image, so a byte-shrink alone can leave it over the dimension cap and re-brick on retry. Extend the unshrinkable-oversized guard to the pixel axis so a partial shrink doesn't burn the one-shot retry. Single shared agent path -> fixes CLI, TUI, and gateway alike. Adds a real-Pillow runnable proof (repro_48013_image_shrink_brick.py) that reproduces the issue's per-image table (bricks 3/5 before, passes 5/5 after) plus unit invariants for the dimension and bytes accept/reject paths, partial-progress accounting, and the bytes-path still-over-cap regression surfaced by adversarial review. Closes #48013	2026-06-19 11:37:51 -07:00
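The "gate the accept on the same axis" rule from 990273d90a can be sketched as a pure function. accept_shrink and its parameters are hypothetical names; the real logic lives in try_shrink_image_parts_in_messages and uses Pillow to measure the re-encoded image. How the bytes path behaves when the result cannot be decoded is not stated in the commit message, so that branch is an assumption.

```python
from typing import Optional, Tuple


def accept_shrink(
    trigger: str,
    orig_bytes: int,
    new_bytes: int,
    new_size: Optional[Tuple[int, int]],
    max_dimension: Optional[int],
) -> bool:
    """Decide whether a re-encoded image should replace the original.

    trigger is "dimension" (pixel cap drove the shrink) or "bytes" (byte budget).
    new_size is (width, height), or None when the result could not be decoded.
    """
    bytes_shrank = new_bytes < orig_bytes
    if new_size is None:
        # Cannot verify pixels: fall back to the byte gate (assumed for both paths).
        return bytes_shrank
    within_cap = max_dimension is None or max(new_size) <= max_dimension
    if trigger == "dimension":
        return within_cap  # byte growth is acceptable on this path
    if trigger == "bytes":
        return bytes_shrank and within_cap  # also re-check the per-side cap
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Under this sketch a downscaled PNG that grew in bytes is still accepted on the dimension path, while a byte-shrunk but still over-cap image is rejected on the bytes path.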
Ben Barclay	f538470cf4	feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A) The credential gate. When multiplexing is active, a profile's secrets resolve from a context-local scope, never the process-global os.environ (which in a multiplexer may hold another profile's keys, and is inherited by every subprocess spawned with env=dict(os.environ)). - agent/secret_scope.py: get_secret() backed by a secret-scope contextvar. FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped read RAISES UnscopedSecretError instead of falling back to os.environ — a missed/new call site crashes loudly at that line rather than leaking a cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths, …) keep reading os.environ via an allowlist. load_env_file/build_profile_secret_scope parse a profile .env into an isolated dict WITHOUT mutating os.environ. Off by default => transparent os.getenv behavior. - hermes_cli/runtime_provider.py: all credential/provider/base-url reads go through _getenv -> get_secret. - agent/credential_pool.py: env fallbacks route through get_secret (the ~/.hermes/.env-first preference is preserved and already profile-correct via the home override). - tools/mcp_tool.py: MCP config interpolation resolves through get_secret, so a server's picks up the routed profile's value. - gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env reload is a no-op for credentials in multiplex mode (secrets come from the scope, not global env); _profile_runtime_scope context manager combines the HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in that scope (resolved via _resolve_profile_home_for_source) when multiplexing. Propagates into the agent worker thread for free via the existing copy_context() in _run_in_executor_with_context. 
Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove two profiles isolated, unscoped read raises, globals still read environ).	2026-06-19 07:34:15 -07:00
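A fail-closed secret lookup in the spirit of f538470cf4 could look like the sketch below. get_secret, set_multiplex_active, and UnscopedSecretError are names from the commit message, but their signatures here are assumed; the secret_scope context manager and the contents of the allowlist are hypothetical simplifications.

```python
import contextvars
import os
from contextlib import contextmanager


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read in multiplex mode with no scope installed."""


_scope = contextvars.ContextVar("secret_scope", default=None)
_multiplex_active = False
_GLOBAL_PREFIXES = ("HERMES_",)  # assumed allowlist of genuinely global vars
_GLOBAL_NAMES = {"PATH"}


def set_multiplex_active(active: bool = True) -> None:
    global _multiplex_active
    _multiplex_active = active


@contextmanager
def secret_scope(secrets: dict):
    """Install an isolated secrets dict for the current context only."""
    token = _scope.set(dict(secrets))
    try:
        yield
    finally:
        _scope.reset(token)


def get_secret(name: str, default=None):
    is_global = name in _GLOBAL_NAMES or name.startswith(_GLOBAL_PREFIXES)
    if not _multiplex_active or is_global:
        return os.environ.get(name, default)  # transparent os.getenv behavior
    scope = _scope.get()
    if scope is None:
        # Fail closed: never fall back to the process-global environment.
        raise UnscopedSecretError(f"unscoped read of {name!r} in multiplex mode")
    return scope.get(name, default)
```

Because the scope lives in a ContextVar, a worker started through contextvars.copy_context() sees the same profile's secrets without any extra plumbing.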
tt-a1i	46f9d53468	fix(agent): aggregate anthropic aux calls via stream	2026-06-19 17:32:13 +05:30
kshitij	226ec2801a	Merge pull request #48367 from kshitijk4poor/salvage-47289 fix(agent): summarize non-retryable API errors so raw HTML never leaks to delivery	2026-06-19 14:30:04 +05:30

1359 commits