hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Austin Pickett	fd324562d3	feat(desktop): add context usage breakdown popover Let users click the status bar context indicator to see how tokens are split across system prompt, tools, rules, skills, MCP, and conversation. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-29 09:18:10 -04:00
HexLab98	f1345290ed	test(auxiliary): cover NVIDIA NIM max_tokens in _build_call_kwargs	2026-06-29 18:04:39 +05:30
Teknium	dc5ef20d89	test(reasoning-floor): isolate stale-timeout floor tests from config-module reload races (#54775 ) The five _resolved_api_call_stale_timeout_base integration tests reloaded hermes_cli.config + hermes_cli.timeouts via importlib.reload to clear cached config. Under xdist that mutates module-global state shared across the worker process, so a sibling test could leave the config cache in a state that made get_provider_stale_timeout return a leaked value — intermittently failing test_reasoning_floor_applies_to_opus_4_thinking (shard 6 flake, #52217 area). Patch run_agent.get_provider_stale_timeout per-test instead: floor-path tests get None (resolver falls through to the reasoning floor / env var / default), the explicit-config test gets 60.0 (priority-1 short-circuit). Same assertions, no shared-module mutation, deterministic under parallel execution.	2026-06-29 02:42:54 -07:00
Ben Barclay	eddfecd2ce	fix(vision): cap vision_analyze fan-out concurrency process-wide A single agent turn can fan out N vision_analyze calls at once — the classic trigger is "analyze every frame of this video", where ffmpeg explodes a clip into dozens of frames and the model calls vision_analyze on each. Every call does a CPU-heavy base64-encode/resize burst AND holds a long-lived LLM stream open. The tool executor runs concurrent tool calls on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple agent sessions share one process (the dashboard runs the agent in-process), so there was no global ceiling. In prod (June 2026) a video-frame fan-out pinned a worker thread at ~100% CPU and starved the shared asyncio event loop that also serves the dashboard's /api/status liveness probe, flapping the instance to UNHEALTHY even though nothing had crashed. Add a process-global threading.BoundedSemaphore that bounds how many vision analyses run concurrently across the whole process, held across the entire analysis (image load + encode + LLM call) in the single _handle_vision_analyze chokepoint (covers both the native fast path and the legacy aux-LLM path). It is a threading semaphore, NOT asyncio: each vision call is dispatched through model_tools._run_async on a per-thread event loop, so an asyncio primitive bound to one loop cannot coordinate across them. The acquire is offloaded via run_in_executor so waiting for a slot never blocks the calling loop. Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency, or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can never be disabled into an unbounded fan-out. Tests: bounded-fan-out regression guard + a control proving it would fail without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu host, env override, and sub-1 rejection. Pre-existing handler tests updated for the now-async _handle_vision_analyze. Verified via the real registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls, peak bounded to cap).	2026-06-29 01:27:10 -07:00
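The cap described in eddfecd2ce can be sketched as follows. This is a minimal illustration, not the project's code: `resolve_max_concurrency`, `run_bounded`, and `demo_peak` are hypothetical names, only the env override is modelled (the config.yaml key is left out), and whether an override is itself clamped is not stated in the commit, so it is passed through here.

```python
import asyncio
import os
import threading


def resolve_max_concurrency(env=None, cpu_count=None):
    """Default is min(host CPUs, 4), floored at 1. Env override wins; values < 1 are ignored."""
    env = os.environ if env is None else env
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    default = max(1, min(cpus, 4))
    raw = env.get("HERMES_VISION_MAX_CONCURRENCY")
    try:
        value = int(raw) if raw is not None else default
    except ValueError:
        return default
    return value if value >= 1 else default


async def run_bounded(slots, work):
    """Hold one process-wide slot for the whole of `work` (an async callable)."""
    loop = asyncio.get_running_loop()
    # The blocking acquire runs in a worker thread so the event loop stays responsive.
    await loop.run_in_executor(None, slots.acquire)
    try:
        return await work()
    finally:
        slots.release()


def demo_peak(calls=6, cap=2):
    """Fan out `calls` fake analyses and return the peak number running at once."""
    slots = threading.BoundedSemaphore(cap)
    state = {"running": 0, "peak": 0}

    async def fake_analysis():
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stands in for image encode + LLM call
        state["running"] -= 1

    async def main():
        await asyncio.gather(*(run_bounded(slots, fake_analysis) for _ in range(calls)))

    asyncio.run(main())
    return state["peak"]
```

Because the semaphore is a `threading` primitive, the same `slots` object could be shared by calls running on different per-thread event loops, which is the property the commit message relies on.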
HexLab98	23f245eda5	test(vision): cover Ollama /api/show vision capability routing (#54511)	2026-06-28 22:52:59 -07:00
sgaofen	b481348fbc	fix(agent): stream copilot ACP chat completions	2026-06-28 22:52:51 -07:00
sgaofen	0106082d1f	fix(agent): return OpenAI-shaped copilot ACP tool calls	2026-06-28 22:52:51 -07:00
lkevincc	163562bf88	fix: normalize lmstudio base urls	2026-06-28 20:46:44 -07:00
teknium1	14204b0646	test(agent): cover .hermes.md no-git-root cwd-only behavior Regression tests for the injection fix: outside a git repo only cwd is checked (planted ancestor .hermes.md is ignored), a cwd-local .hermes.md is still found, and inside a git repo the parent walk to the git root still works.	2026-06-28 20:46:32 -07:00
Teknium	3483424aaa	fix(security): redact bare-token credentials in URL userinfo (#6396 ) (#54475 ) git remote set-url with an embedded password (https://PASSWORD@github.com) leaked the credential into agent output — the redaction engine only masked user:pass@ DB connection strings, never the colon-less bare-token userinfo form a git remote uses. Add _URL_BARE_TOKEN_RE: scheme://TOKEN@host for web/transport schemes (http/https/wss/git/ssh/ftp), 8+ char floor to skip short usernames, token class forbidding /:@ so an @ in a path/query is never treated as userinfo. Deliberately scoped to the bare-token form only. The user:pass@ colon form and query-string tokens stay passing through (#34029, 'pass web URLs through unchanged') so magic-link / OAuth round-trip skills keep working — a bare credential in userinfo is never a workflow token (those live in the query string), so masking it can't break a skill.	2026-06-28 18:52:42 -07:00
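A rough sketch of the bare-token rule in 3483424aaa. The pattern below is reconstructed from the description (scheme list, 8+ character floor, token class forbidding `/`, `:` and `@`); the real `_URL_BARE_TOKEN_RE` may differ, and the whitespace exclusion, the `***` mask and `redact_bare_token` are assumptions made for illustration.

```python
import re

# Assumed shape: scheme://TOKEN@host, TOKEN at least 8 chars with no '/', ':' or '@'.
_URL_BARE_TOKEN_RE = re.compile(r"\b(https?|wss|git|ssh|ftp)://([^/:@\s]{8,})@")


def redact_bare_token(text):
    """Mask a colon-less credential in URL userinfo; leave every other URL form alone."""
    return _URL_BARE_TOKEN_RE.sub(r"\1://***@", text)
```

Under these assumptions `https://user:pass@host` never matches (the colon ends the token before the `@`), and an `@` in a path or query string is never read as userinfo because the token cannot cross a `/`.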
Teknium	4c2961c511	fix(curator): never archive cron-referenced skills + floor use=0 pruning (#54443 ) The curator's inactivity prune archived any non-pinned agent-created skill whose activity was older than archive_after_days (90d). A skill loaded only by a cron job had its usage bumped solely when the job fired, so paused jobs, infrequent (quarterly/annual) schedules, and far-future one-shots aged their skills out from under them — the next run then failed to load the now-archived skill. - cron/jobs.py: add referenced_skill_names() returning skills used by ANY job (incl. paused/disabled). - curator.apply_automatic_transitions(): skip cron-referenced skills like pinned; add a use=0 grace floor so a never-used skill is not marked stale/archived until it is at least stale_after_days old. - LLM review pass: candidate list marks cron=yes; prompt forbids pruning cron-referenced skills and never-used skills under 30 days. Tested E2E against a real cron job + real usage records and with 4 new unit tests.	2026-06-28 15:10:21 -07:00
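The pruning rules from 4c2961c511 can be illustrated with a small decision function. Everything here is a hypothetical sketch: the record fields, the `next_state` helper and the 30-day `stale_after_days` default are assumptions; only the 90-day archive threshold and the `referenced_skill_names` idea come from the commit message.

```python
from datetime import datetime, timedelta


def referenced_skill_names(jobs):
    """Skills used by ANY job, including paused or disabled ones."""
    return {name for job in jobs for name in job.get("skills", [])}


def next_state(skill, now, cron_referenced, stale_after_days=30, archive_after_days=90):
    """Return 'active', 'stale' or 'archived' for one agent-created skill."""
    # Pinned and cron-referenced skills are never aged out.
    if skill.get("pinned") or skill["name"] in cron_referenced:
        return "active"
    age = now - skill["created_at"]
    # Grace floor: a never-used skill is left alone until it is old enough.
    if skill.get("use_count", 0) == 0 and age < timedelta(days=stale_after_days):
        return "active"
    idle = now - (skill.get("last_activity") or skill["created_at"])
    if idle >= timedelta(days=archive_after_days):
        return "archived"
    if idle >= timedelta(days=stale_after_days):
        return "stale"
    return "active"
```

With this shape, a skill loaded only by a paused quarterly job stays active no matter how long ago the job last fired.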
teknium1	091ce825fe	test(redact): fix file_read regression-guard for current-main YAML collapse The salvaged #35519 regression guard asserted that default (non-file_read) mode keeps a head/tail `ghp_S1...Pn2T` mask for a `token: <key>` line. On current main the YAML config pass (`_YAML_ASSIGN_RE`, key `token`) re-masks the already-prefix-masked value to `***`, so the assertion was stale. Switch to a bare-token context so the guard isolates what it claims (prefix-mask head/tail shape in default mode) without depending on the YAML collapse.	2026-06-28 04:13:20 -07:00
kshitijk4poor	de928bccde	fix(redact): non-reusable sentinel for prefix secrets in file reads (#35519) When security.redact_secrets is on (default), read_file/search_files/cat applied redact_sensitive_text(code_file=True) to file content, which still ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.) came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking truncated key. When an agent read that and wrote it back to config, the masked value replaced the real credential, silently breaking auth (401). Production evidence: a config.yaml found containing the exact 13-char masked GitHub PAT. The two community PRs (#35529, #35534) fixed the corruption by NOT redacting prefixes for config reads — but that exposes the user's real keys to the agent context, model, and logs (a security regression). This takes the safer route: keep redacting, but for file content emit a NON-REUSABLE sentinel. - New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis wrapper is syntactically invalid as a token so it can't be mistaken for or written back as a usable key). - New `redact_sensitive_text(file_read=True)` routes prefix matches through it (implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token` still keeps head/tail (fine for logs, never written back). - Wired the 3 file_tools.py call sites (read_file / search_files / cat) to file_read=True. Fixes both the corruption AND avoids the secret-exposure of the un-redact approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass; mutation-verified (reverting to the old mask fails the sentinel/leak tests). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com> Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com> Closes #35519. Supersedes #35529, #35534.	2026-06-28 04:13:20 -07:00
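To make the difference between the two masks in de928bccde concrete, here is a small sketch. The function names mirror the commit message but the bodies are assumptions: the 6-character head and 4-character tail are inferred from the `ghp_S1...Pn2T` example, and the prefix tuple lists only the vendors named above.

```python
_PREFIXES = ("ghp_", "sk-", "xai-")  # illustrative subset, not the full vendor list


def mask_token(token):
    """Log/display mask: keeps head and tail, so it still looks like a truncated key."""
    return f"{token[:6]}...{token[-4:]}"


def mask_token_nonreusable(token):
    """File-read mask: vendor label only, zero secret bytes, not a valid token shape."""
    prefix = next((p for p in _PREFIXES if token.startswith(p)), "")
    return f"«redacted:{prefix}…»"
```

For a token such as `"ghp_S1" + "x" * 26 + "Pn2T"`, the first function yields the 13-character `ghp_S1...Pn2T`, while the second yields `«redacted:ghp_…»`, which cannot be pasted back into a config as a working credential.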
Teknium	c1c179a239	fix(security): redact secrets in background process + foreground env-dump output (#43025 ) (#54149 ) * fix(security): redact secrets in background process + foreground env-dump output Terminal-output redaction was incomplete (#43025): - Gap 1: process(action=poll/log/wait) returned background stdout verbatim — no redaction at all. A background printenv/server/test emitting a key leaked raw to the model, session.db, and CLI display. Same for the gateway background-process watcher's completion/progress notifications. - Gap 2: the foreground terminal path hardcoded code_file=True, which skips the ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv leaked even there. Adds agent.redact.redact_terminal_output(output, command) as the single policy for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/ declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens; other commands stay on code_file=True to avoid false positives on source dumps. Wired into terminal_tool, process_registry (_handle_process boundary), and the gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved. * docs: add infographic for #43025 terminal-output redaction fix	2026-06-28 02:44:21 -07:00
teknium1	bbe1bf4045	fix(agent): stop redacting tool-call args in history; fix auth-header quote-eating Two related redaction bugs from #43083: 1. build_assistant_message redacted tool-call arguments in-memory. That dict feeds both the replayed conversation history and state.db (which is itself replayed verbatim on session resume), so the model read back its own PGPASSWORD='***' psql call and copied the placeholder, breaking every credential-dependent command on the second turn. The masking gave no real protection either — the same secret still leaks through tool OUTPUT. Remove it. Keeping secrets out of the replayable store is a separate tokenization/vault concern (security.redact_secrets still governs storage-time redaction elsewhere). 2. _AUTH_HEADER_RE's greedy \S+ credential class ate a closing quote when the token sat flush against it (Authorization: Bearer sk-.."), turning value corruption into syntax corruption (unterminated quote -> shell EOF / SyntaxError). Exclude " and ' from the token class; real credentials never contain them. Closes #43083.	2026-06-28 02:44:06 -07:00
teknium1	aa50c1ba5d	fix(prompt): repair backend probe import (get_environment never existed) The system-prompt backend probe imported a nonexistent symbol — `from tools.environments import get_environment` — which always raised ImportError: cannot import name 'get_environment'. The exception is caught and only drops the live backend description to a static fallback, so it is cosmetic, but it broke the live OS/user/cwd probe for every non-local backend (docker/singularity/modal/daytona/ssh). The real factory is `_create_environment` in tools.terminal_tool. Build the environment the same way the live terminal path does (select backend image, assemble ssh/container config from _get_env_config()), then run the probe. Note: this does NOT affect tool loading — tool selection runs each tool's check_fn and never consults this probe. Regression from #52147 (2026-06-25). Closes #53667 (probe import); the 'cronjob-only' tool-collapse symptom is not reproducible — tool selection has no probe dependency and memory's check_fn is unconditionally True.	2026-06-28 02:41:31 -07:00
Teknium	674e16e7c6	fix(redact): stop DB-connstr redaction from corrupting code output (#33801 ) (#54061 ) Secret redaction is display/output-scoped on main — write_file writes content verbatim, terminal/execute_code redact only output not the command/source. The real bug is in displayed tool OUTPUT (read_file, terminal, execute_code): _DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a multi-line block it scanned past the DSN line to the next stray '@' (a Python @decorator), replacing every intervening character — including line breaks — with *. That dropped lines and concatenated the next line onto the f-string line, making read_file output look corrupted (the file on disk was always correct). Reported in #33801. Fix: - Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so the match can never span a line break. A real DSN password never contains whitespace. This alone kills the catastrophic line-dropping. - Under code_file=True, preserve a password group that is a pure {...} brace expression — f"postgresql://{user}:{pass}@{host}" is an f-string template, not a live credential. Literal passwords are still masked. - Pass code_file=True at the terminal and execute_code output redaction call sites (file_tools already did) so code-execution output isn't corrupted by ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and private keys are still redacted. Verified E2E against the reporter's exact pydantic-settings module: file written verbatim, read_file shows the DSN f-string + @model_validator intact with zero * corruption, while a literal postgresql://admin:pw@host DSN and a real sk- key are still masked. Reported-by: koishi70 Reported-by: pfrenssen	2026-06-28 01:15:39 -07:00
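The line-dropping behaviour fixed in 674e16e7c6 can be shown with two simplified patterns. These are sketches restricted to `postgresql://`; the real `_DB_CONNSTR_RE` covers more schemes, and `redact_dsn`, the `***` mask and the brace-expression check are illustrative assumptions.

```python
import re

# Old shape: the password group may run across newlines until the next '@'.
_OLD_RE = re.compile(r"(postgresql://[^:]+:)([^@]+)(@)")
# New shape: whitespace is forbidden in both userinfo groups.
_NEW_RE = re.compile(r"(postgresql://[^:\s]+:)([^@\s]+)(@)")
_BRACE_EXPR = re.compile(r"\{[^{}]*\}")


def redact_dsn(text, pattern=_NEW_RE, code_file=False):
    """Mask DSN passwords; under code_file, keep pure {...} f-string placeholders."""
    def repl(match):
        password = match.group(2)
        if code_file and _BRACE_EXPR.fullmatch(password):
            return match.group(0)  # template, not a live credential
        return f"{match.group(1)}***{match.group(3)}"
    return pattern.sub(repl, text)


source = 'base = "postgresql://app:"\n@model_validator(mode="after")\ndef check(self): ...'
corrupted = redact_dsn(source, pattern=_OLD_RE)  # newline swallowed, lines joined
intact = redact_dsn(source)                      # no match, text unchanged
```

The old pattern treats everything up to the decorator's `@` as a password, so the line break disappears from the displayed output; the new pattern cannot match across it.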
teknium1	578e3989d4	fix(agent): route content-filter stream stalls to fallback chain (#32421 ) When a provider's output-layer safety filter (MiniMax "output new_sensitive (1027)", Azure content_filter, etc.) kills a streaming response after deltas were already sent, interruptible_streaming_api_call swallows the raw error into a finish_reason=length partial-stream stub. The conversation loop then burned 3 continuation retries against the SAME primary — re-hitting the content-deterministic filter every time — and gave up with "Response remained truncated after 3 continuation attempts", never consulting fallback_providers. Builds on @595650661's classifier change (cherry-picked) so error_classifier recognizes the filter; then: - chat_completion_helpers: run the swallowed error through error_classifier at the stub-creation point and stamp _content_filter_terminated on the stub (single source of truth — no parallel pattern list). - conversation_loop: read the tag and activate the fallback chain BEFORE burning any continuation retries; roll partial content back to the last clean turn and re-issue against the new provider (restart_with_rebuilt_messages). Plain network stalls are unaffected (only content_policy_blocked is tagged). Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the same issue via the stub-tag and loop-escalation approaches respectively. Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback fires on the first stub and the fallback provider completes the turn.	2026-06-28 01:15:21 -07:00
HexLab98	04ff4d9b54	test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702)	2026-06-27 21:22:49 -07:00
Teknium	a8c862900b	fix(tui): sanitize replay history on WebUI/TUI session resume (#29086 ) (#53939 ) A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill, interrupt, or process restart before the tool result was written) persists a dangling assistant(tool_calls) or interrupted assistant->tool tail. The messaging gateway already strips these tails before replay (the #49201 fix), but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight in as the agent's conversation_history with no cleanup. The model re-issued the unanswered call on every resume -- including after a full WebUI + Gateway restart, since the poison lives in the SessionDB, not memory -- leaving the session permanently 'thinking'. Only deleting the session recovered it. - Extract the two strippers + helper from gateway/run.py into a shared agent/replay_cleanup.py (sanitize_replay_history wraps both). - gateway/run.py re-exports under the historical private names; messaging behavior unchanged. - Both TUI cold-resume sites now sanitize the model-fed history while leaving the display transcript untouched, so the user still sees their full history. Verified E2E against a real SessionDB: dangling and interrupted tails are stripped from the model feed, healthy mid-progress tool sequences are preserved, and the display transcript is always the full raw history.	2026-06-27 20:56:49 -07:00
Teknium	d43e0cf304	fix(agent): config-driven intent-ack continuation for all api_modes (#27881 ) (#53943 ) * fix(agent): config-driven intent-ack continuation for all api_modes (#27881) The agent could end a turn after only stating intent ('I will run a health check...') without executing the announced tool call, forcing the user to re-prompt. A continuation guard that catches this and nudges the model to proceed already existed but was hard-gated to the codex_responses api_mode, so Gemini/Claude/OpenRouter turns never benefited. - New agent.intent_ack_continuation config (default 'auto' = codex-only, byte-stable for existing conversations). 'true'/model-list opts every api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape. - looks_like_codex_intermediate_ack gains require_workspace (default True). The opted-in path drops the codebase/filesystem requirement so general autonomous workflows (server ops, deploys, API calls) are caught, not just coding tasks. Future-ack + action-verb + short-content + no-prior-tool guards still apply; the 2-nudge-per-turn cap is unchanged. - Resolution centralized in intent_ack_continuation_mode (off/codex_only/all). * docs(infographic): intent-ack continuation (#27881)	2026-06-27 20:46:00 -07:00
Jack Maloney	f0de4c6a47	fix(pool): re-select from credential pool on primary runtime restore _restore_primary_runtime restored the construction-time api_key snapshot and never consulted the credential pool. After the pool rotated away from a revoked/exhausted entry mid-session, every new turn restored the dead key, re-failed instantly, burned the remaining entries, and fell through to cross-provider fallback. After restoring the snapshot, re-select the pool's current best entry and swap the live credential in via _swap_credential (which already rebuilds the OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163 base_url / OAuth-detection fixes). Falls back to the snapshot key when the pool is absent, empty, or the entry has no usable key. Salvaged from #25206 onto current main: the original targeted the pre-refactor monolithic method in run_agent.py; the logic now lives in agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead of re-inlining the client rebuild. Fixes #25205	2026-06-27 20:04:45 -07:00
Shashwat Gokhe	505bc27d8d	fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats A document attached alongside an image in the same Discord message was swept into the vision pipeline and 400'd the whole turn ("Could not process image"), and was simultaneously never surfaced to the agent as a readable file. Restores the "any file type works" contract for mixed messages and fixes the HTTP 400. Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in a PHOTO message landed in image_paths and poisoned the vision call. The document context-note path was gated on message_type == DOCUMENT, so that same doc never reached the agent at all. Now classification is per-attachment (trust each attachment's own MIME; fall back to the message-level type only when MIME is unknown), via shared _event_media_is_* helpers used by both _build_media_placeholder and the main inbound loop. The document note now fires for any non-image/audio/video attachment regardless of message-level type. Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic 400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now transcodes those to PNG via Pillow before declaring media_type, skipping cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow can't rasterize it — so it's skipped rather than transcoded. Closes #25935. Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com> Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>	2026-06-27 19:26:04 -07:00
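The per-attachment rule from 505bc27d8d (trust the attachment's own MIME type, fall back to the message-level type only when the MIME type is unknown) might look like the sketch below. `classify_attachment` and the fallback table are hypothetical; the actual `_event_media_is_*` helpers are not shown in the log.

```python
# Assumed mapping from message-level type to media kind, used only as a fallback.
_MESSAGE_TYPE_FALLBACK = {"PHOTO": "image", "VOICE": "audio", "AUDIO": "audio"}


def classify_attachment(mime, message_type):
    """Return 'image', 'audio', 'video' or 'document' for a single attachment."""
    if mime:
        for kind in ("image", "audio", "video"):
            if mime.lower().startswith(kind + "/"):
                return kind
        return "document"  # known MIME that is not media: surface as a readable file
    # MIME unknown: fall back to the message-level type.
    return _MESSAGE_TYPE_FALLBACK.get(message_type, "document")
```

A PDF attached to a PHOTO message is therefore classified as a document instead of being routed into the vision call.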
konsisumer	1ab35ba25d	fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s The Anthropic SDK clients were built without max_retries, so the SDK default (max_retries=2) retried 429/5xx with its own backoff that ignores Retry-After — double-retrying inside hermes's outer loop and burning request slots against a bucket that won't refill for minutes. Set max_retries=0 on all Anthropic/AnthropicBedrock client constructions so the outer conversation loop (which already honors Retry-After) owns retry. Also raise the Retry-After cap in the conversation loop from 120s to 600s. Anthropic Tier 1 input-token buckets reset in ~171s, so the 120s cap made hermes retry before the reset window and re-trip the limit. Refs #26293	2026-06-27 19:23:15 -07:00
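The effect of the cap change in 1ab35ba25d is simple arithmetic, sketched here with a hypothetical `retry_delay` helper. The 171 s figure is the approximate bucket reset quoted in the commit message.

```python
def retry_delay(retry_after_s, cap_s=600.0):
    """Honor the server's Retry-After, clamped to [0, cap_s] seconds."""
    return max(0.0, min(float(retry_after_s), cap_s))


# With the old 120 s cap the loop retried about 51 s before the bucket refilled.
old_wait = retry_delay(171, cap_s=120.0)
# With the 600 s cap the full Retry-After is respected.
new_wait = retry_delay(171)
```

The other half of the fix, passing `max_retries=0` when constructing the Anthropic SDK client, leaves this outer delay as the only retry policy in play.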
Chaz Dinkle	1dde7e2f2a	fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on its own schedule, so by the time Hermes notices an expired token Claude Code may have already rotated it. Re-read live credential sources first and adopt a valid token rather than POSTing a possibly-stale refresh token. Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle) on top of the keychain/file reconciliation from PR #21112 (nodejun). Adds AUTHOR_MAP entry for nodejun.	2026-06-27 19:14:43 -07:00
jun	5a5396aecb	fix(anthropic): reconcile keychain/file credentials when one is expired read_claude_code_credentials() previously returned the macOS Keychain entry as soon as one existed, even if its OAuth token was already expired. Callers then ran is_claude_code_token_valid() on the result and got False, so resolve_anthropic_token() returned None — surfacing the misleading 'No Anthropic credentials found' error even when ~/.claude/.credentials.json held a perfectly valid token. Now reads both sources and prefers the non-expired one. When both are valid (or both expired), prefers the later expiresAt so any subsequent refresh uses the freshest refresh_token. Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation cases. The existing 'keychain wins' priority test still passes because both fixtures share the same expiresAt and the tiebreaker is >=.	2026-06-27 19:14:43 -07:00
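The reconciliation order described in 5a5396aecb can be sketched as a pure function. `pick_credentials` is a hypothetical name, and the dict shape (an `expiresAt` epoch-milliseconds field) is an assumption about the credential records.

```python
def pick_credentials(keychain, file_creds, now_ms):
    """Prefer the non-expired source; on a tie in validity, prefer the later expiresAt."""
    if keychain is None or file_creds is None:
        return keychain or file_creds  # zero or one source available
    keychain_valid = keychain["expiresAt"] > now_ms
    file_valid = file_creds["expiresAt"] > now_ms
    if keychain_valid != file_valid:
        return keychain if keychain_valid else file_creds
    # Both valid or both expired: the tiebreaker is >=, so the keychain wins on equality.
    return keychain if keychain["expiresAt"] >= file_creds["expiresAt"] else file_creds
```

Choosing the later `expiresAt` even when both tokens are expired means any subsequent refresh starts from the most recently issued refresh token.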
LeonSGP43	c56b39c11e	fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted _try_openrouter() returned (None, None) whenever an OpenRouter credential pool existed but was exhausted (_select_pool_entry -> (True, None)), making the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks (compression, vision, web_extract) silently failed even with a valid env key. Now the pool-present branch only returns early when it successfully builds a client; an exhausted pool falls through to the env-var path. The final failure (pool exhausted AND no env var) still marks the provider unhealthy. Fixes #23452. Co-authored-by: ambition0802 <noreply@github.com>	2026-06-27 19:09:27 -07:00
qWaitCrypto	46e18804ad	fix(auxiliary): fall back on 401 auth errors in auto mode (#21165 ) When the primary provider returns 401 and the auth-refresh path is unavailable or fails, both call_llm() and async_call_llm() reached the should_fallback gate without _is_auth_error in the condition, so the auxiliary task (e.g. compression) was dropped silently — losing message history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in both sync and async paths, plus an 'auth error' reason branch. Auth stays a non-capacity error: it falls back in auto mode via the is_auto gate, but on an explicitly-configured provider it still respects the user's choice and raises rather than silently switching providers.	2026-06-27 19:07:04 -07:00
konsisumer	8b4c29f0f0	fix(auth): preserve concurrently-added credentials on pool rewrite	2026-06-27 19:01:37 -07:00
Teknium	3ac96d3308	fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827 ) On a MoA session, auxiliary tasks (title generation, compression, vision, …) ran through _resolve_auto with provider='moa' / model='<preset>', which sent the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client — producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible as the title-generation warning). MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the reference fan-out. _resolve_auto now resolves a 'moa' main provider to the preset's aggregator slot (its acting model) and continues Step 1 with that real provider+model, dropping the virtual moa://local base_url + placeholder key so the aggregator resolves via its own provider credentials. Mirrors the MoA context-length resolution. Verified live: a MoA turn no longer emits the 'not a valid model ID' warning. Test: tests/agent/test_auxiliary_main_first.py (19 pass).	2026-06-27 14:21:26 -07:00
Teknium	227e6c0143	fix(moa): resolve context window from the aggregator, not the 256K default (#53780 ) A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is the virtual local endpoint, so get_model_context_length() missed every probe and fell through to the 256K fallback — even when the aggregator is a 1M-context model. The acting model in MoA IS the aggregator, so resolve the context window from the aggregator slot's real provider+model. - model_metadata.get_model_context_length: when provider=='moa', resolve the preset's aggregator slot through resolve_runtime_provider and recurse with the aggregator's real provider/model/base_url. Explicit model.context_length still wins (checked first); falls through to the generic default if resolution fails. Tests: opus-gpt preset now reports 1M (the aggregator window), config override still honored.	2026-06-27 12:08:09 -07:00
konsisumer	1b6ebb24c0	fix(agent): validate OpenRouter provider sort before request dispatch	2026-06-27 11:43:08 -07:00
Teknium	190e1ffac9	fix(redact): mask passwords in lowercase/dotted config keys (#53590 ) The secret redactor only matched uppercase env-style keys ([A-Z0-9_]), so config-file assignments like spring.datasource.password=secret, app.api.key=xyz, and YAML password: secret leaked verbatim when the agent ran cat/grep on application.properties or .env files (issue #16413). Adds three case-insensitive config-key matchers that run only in a config-file context, preserving the existing #4367 (lowercase code/prose) and web-URL-passthrough carve-outs: - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export) - _YAML_ASSIGN_RE: unquoted colon config (password: value) Value capture stops at whitespace and '&' so form bodies stay pair-wise; the '://' guard keeps intentional web-URL query-param passthrough intact. Reported-by: Murtaza1211	2026-06-27 04:43:28 -07:00
teknium1	38e7bd8a08	fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status path classified these unconditionally as rate_limit with should_rotate_credential=True, so an overloaded provider exhausted the credential pool after two errors — fatal for a single-key user, who has nothing to rotate to. The credential is valid; the server is just busy. Disambiguate the 429 body against a shared _OVERLOADED_PATTERNS list and route overload language to FailoverReason.overloaded (retryable, no rotation), matching the existing 503/529 path and the message-only path (#52890). Genuine rate limits (no overload language) still rotate. Extracted the inline overloaded tuple #52890 added into the shared _OVERLOADED_PATTERNS constant so the status-code and message paths use one list. Closes #14038.	2026-06-27 04:16:54 -07:00
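A minimal sketch of the 429 disambiguation in 38e7bd8a08. `FailoverReason` and `_OVERLOADED_PATTERNS` are named in the commit message, but the enum members' values, the pattern strings and the `classify_429` helper below are illustrative assumptions.

```python
from enum import Enum


class FailoverReason(Enum):
    rate_limit = "rate_limit"
    overloaded = "overloaded"


# Hypothetical contents; the real shared list is not shown in the log.
_OVERLOADED_PATTERNS = ("overloaded", "server is busy")


def classify_429(body):
    """Return (reason, should_rotate_credential) for an HTTP 429 response body."""
    text = (body or "").lower()
    if any(pattern in text for pattern in _OVERLOADED_PATTERNS):
        # The key is fine and the server is busy: retry, but keep the credential.
        return FailoverReason.overloaded, False
    return FailoverReason.rate_limit, True
```

A single-key user hitting an overloaded provider then keeps retrying on the same credential instead of exhausting a pool that has nothing to rotate to.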
LeonSGP43	e7c013494d	fix(agent): preserve nested API error bodies	2026-06-27 04:13:53 -07:00
Bartok	864d5521ad	test(curator): join straggler curator-review thread on fixture teardown The curator_env fixture left async review threads (synchronous=False spawns a daemon 'curator-review' thread that calls save_state() on completion) running past test teardown. save_state() resolves the state path from HERMES_HOME at write time, so a straggler could write into the next test's tmp home, corrupting test_state_file_survives_corrupt_read (and others) under CI load. Join the thread on teardown while HERMES_HOME is still pinned to this test's home.	2026-06-27 03:52:52 -07:00
Bartok9	45ce35ed72	fix(agent): classify message-only 'overloaded' as server overload Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the overloaded-classification fix, with a regression test that fails without it.	2026-06-27 03:52:52 -07:00
Teknium	ec769e49d2	fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564 ) The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not use markdown as it does not render' — factually wrong. Both adapters actively convert markdown to native formatting: - whatsapp_common.format_message(): bold, ~~strike~~, # headers, links, code blocks -> WhatsApp native syntax - signal_format.markdown_to_signal(): same conversions via bodyRanges, plus '- item' / '* item' bullets -> '• ' Unicode bullets The wrong hint made the agent strip bullets and bold the adapter would have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud: markdown is auto-converted, bullet lists work, tables are not supported. Added a contract test asserting markdown-converting platforms never forbid markdown in their hint.	2026-06-27 03:46:41 -07:00
Teknium	60f58a2b95	feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552 ) The verify-on-stop guard fired too eagerly — including on doc/markdown/skill edits with nothing to verify, where it pushed a pointless /tmp verification script. Three changes: 1. Default OFF for new installs: agent.verify_on_stop defaults to false (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31. 2. One-time migration (v30 -> v31): existing installs are switched off once, but only when the value is missing or still the "auto" sentinel — an explicit true/false the user set is preserved. 3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on the code paths. The legacy "auto" sentinel is still honored when set explicitly (ON for interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env override unchanged.	2026-06-27 03:23:22 -07:00
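The migration and path filter from 60f58a2b95 could be sketched like this. Both helpers are hypothetical, the suffix and filename sets cover only the examples listed in the commit message, and placing `_config_version` at the top level of the config dict is an assumption.

```python
from pathlib import PurePosixPath

_DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt"}
_DOC_NAMES = {"LICENSE", "CHANGELOG"}


def code_paths(paths):
    """Drop documentation/prose paths; only the remainder can trigger a nudge."""
    def is_doc(path):
        p = PurePosixPath(path)
        return p.suffix.lower() in _DOC_SUFFIXES or p.name.upper() in _DOC_NAMES
    return [path for path in paths if not is_doc(path)]


def migrate_v30_to_v31(config):
    """Switch verify_on_stop off once, unless the user set an explicit true/false."""
    config = dict(config)
    agent = dict(config.get("agent", {}))
    if config.get("_config_version", 0) < 31:
        if agent.get("verify_on_stop", "auto") == "auto":  # missing or legacy sentinel
            agent["verify_on_stop"] = False
        config["_config_version"] = 31
    config["agent"] = agent
    return config
```

A turn that touched only `README.md` yields an empty list from `code_paths`, so no nudge is built even when the feature is explicitly enabled; a mixed turn keeps its code paths.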
ethernet	bcc3eb3419	fix(ci): rip out some xdist legacy stuff... how did these ever work??	2026-06-26 19:15:18 -07:00
helix4u	063fe4f6ef	fix(auxiliary): fallback on invalid provider responses	2026-06-26 13:49:46 +05:30
brooklyn!	a2b49e60b6	Merge pull request #52412 from GodsBoy/fix/verify-on-stop-messaging-surface-leak fix(agent): gate verify-on-stop nudge off for messaging surfaces	2026-06-26 02:30:08 -05:00
Moonsong	4e66bf1f80	fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608) When operator config has provider=anthropic with model.base_url pointing at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic), the auxiliary Anthropic path was unconditionally applying that override. Main-session traffic routed correctly because the main path attaches the right credential for the actual destination, but every side-channel call (memory extractors, reflection, vision, title generation, janus extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd. Gate the override on hostname == api.anthropic.com. Operators routing main through a non-Anthropic provider must use that provider's own auxiliary client; the Anthropic aux path now stays pointed at api.anthropic.com. Regression tests cover openrouter, openai, anthropic-with-path, empty, and anthropic-default-base_url cases.	2026-06-26 11:21:05 +05:30
DavidMetcalfe	27c486e3b1	feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors Wire get_reasoning_stale_timeout_floor() into both stale detectors so known reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of the upstream gateway idle-killing the socket (BrokenPipeError) before first token. Applied as max(default, floor) — never overrides explicit user config, never lowers an existing threshold. The reasoning_timeouts.py allowlist module already landed on main via #52795, so this salvage carries only the wiring + tests (the duplicate module and the stale-base MoA reverts from the original PR branch are dropped). Salvaged from #52238. Fixes #52217.	2026-06-25 22:12:06 -07:00
teknium1	4d04c652f2	fix(curator): make external-skill write guard actually fire during curation The salvaged #51875 added a background-review write guard in skill_manage that refuses mutations to skills.external_dirs skills — but it only fires when is_background_review() is true. The curator's LLM review fork ran with the default _memory_write_origin='assistant_tool', so the guard never triggered during the exact curation pass it exists to protect against (GH-47688). - Set _memory_write_origin='background_review' on the curator review fork so turn_context binds it onto the write-origin ContextVar and the guard fires. - Add a regression test asserting the fork runs under the background_review origin (the invariant linking the fork to the guard). - AUTHOR_MAP: map yu-xin-c for the salvaged commit.	2026-06-25 22:03:02 -07:00
yu-xin-c	96bc524a71	fix(curator): protect external skills from background curation	2026-06-25 22:03:02 -07:00
teknium1	6c58878e7d	fix(browser): force secret-pattern redaction on browser_type display Force redact_sensitive_text(force=True) on the browser_type text arg so recognized credentials (API keys, tokens, JWTs) are masked in tool progress, previews, callbacks, and return payloads even when the global security.redact_secrets opt-out is set — a typed credential reaching chat history is a security boundary, not log hygiene. Normal typed text matches no pattern and stays fully readable for debuggability. Tests assert the API-key-shaped secret is masked across every surface and that normal text passes through unchanged.	2026-06-25 22:02:22 -07:00
rebel	8ff426e53b	fix: redact browser typed text surfaces	2026-06-25 22:02:22 -07:00
Teknium	a4091e49f1	fix(auth): write rotated Codex/xAI pool grant through to global root (#48415) (#52760) CredentialPool._sync_device_code_entry_to_auth_store rotated single-use OAuth refresh tokens but wrote the new chain only into the active profile store. When a profile resolves a grant from the global-root fallback (read_credential_pool, #18594) and the pool then refreshes it, root was left holding a now-revoked refresh token; every other profile reading the stale root grant subsequently died with refresh_token_reused / invalid_grant once its access token expired. This is the credential-pool analog of #43589 (which fixed the non-pool xAI refresh path in _save_xai_oauth_tokens). Detect the read-from-root case (profile lacks its own providers.<id> block) BEFORE the profile save and, after it, write the rotated chain back to the global root via a best-effort, seat-belted write-through. A profile that genuinely shadows root (owns the block) is untouched; classic mode (profile == root) is a no-op; a failed root write never breaks the profile's own save. Covers openai-codex (reported), xai-oauth, and nous through the shared sync path.	2026-06-25 19:14:06 -07:00
DavidMetcalfe	865a09a610	fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice Two-part fix: Part 1 (classifier override at agent/error_classifier.py:720-738): A transport disconnect on a reasoning model — even on a large session — now routes to FailoverReason.timeout instead of context_overflow. Without this, large-session reasoning-model disconnects route to the compression branch and silently delete conversation history on a phantom context-length error. The override is strictly targeted: non-reasoning models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to context_overflow on large sessions — the existing intentional behavior for chat models whose proxy doesn't idle-kill during prefill/generation. Part 2 (new agent/thinking_timeout_guidance.py + integration at agent/conversation_loop.py:3488-3567): New is_thinking_timeout() and build_thinking_timeout_guidance() helpers. When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3, Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning) hits a transport-kill on a small session (classifier says timeout directly) or after Part 1 routes correctly (large session), the user now sees reasoning-specific guidance with three actionable workarounds in priority order: 1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900 in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s for known reasoning models; raise further if upstream is even tighter). 2. Lower reasoning_budget or set reasoning_effort: medium on this model if the provider supports it. 3. Use a smaller / faster reasoning model if the task doesn't require deep thinking. 
The new guidance takes precedence via if/elif over the existing _is_stream_drop block, so a reasoning-model user with a transport-kill message sees actionable advice instead of the misleading "try execute_code with Python's open() for large files" advice (which is correct for the unrelated large-file-write stream-drop case but actively wrong for the thinking-timeout case). Verified: - 478 tests passing across 9 directly-relevant files (49 new + 429 existing, zero regressions). - Ruff lint clean on all 4 modified/new files. - Negative test: 6 parametrized regression guards confirm non-reasoning models still route to context_overflow on large sessions; 4 parametrized gates confirm non-timeout classifier reasons never trigger the guidance; 5 parametrized cases confirm non-transport messages never trigger it. - Regression guard: new guidance message does NOT contain "execute_code" or "open()" — the misleading advice is fully replaced, not appended alongside. - Cross-vendor dual review via agy -p: - Gemini 3.5 Flash (Medium) — passed: true, zero blockers, one SHOULD-FIX (vprint block duplication — fixed by extracting detection into a helper module). - GPT-OSS 120B (Medium) — passed: true, zero blockers, two nits (test placement — adopted at tests/agent/test_thinking_timeout_guidance.py; primary-model capture — accepted as non-issue per Flash's nit). Dependency note for maintainers: This PR includes agent/reasoning_timeouts.py (the reasoning-model allowlist module from PR #52238) because the Layer 1 override is load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238 lands on main, this PR's duplicate agent/reasoning_timeouts.py should be rebased away. Either PR can land first; the other rebase is mechanical. Fixes #52271.	2026-06-25 19:00:48 -07:00
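
Illustrative note on 4e66bf1f80 (the Anthropic base_url gate): the commit message describes applying the configured base_url override only when its hostname is api.anthropic.com. The following is a minimal sketch of that check, not the repository's code. The function name resolve_aux_anthropic_base_url and the convention of returning None to mean "fall back to the default Anthropic endpoint" are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hostname the commit message names as the only valid override target.
ANTHROPIC_HOST = "api.anthropic.com"


def resolve_aux_anthropic_base_url(configured_base_url):
    """Hypothetical helper: return the override only if it targets Anthropic.

    Returns None when the override is empty or points at a foreign host,
    meaning the caller should keep the default Anthropic endpoint.
    """
    if not configured_base_url:
        return None
    # urlparse lowercases the hostname and strips any path or port.
    host = urlparse(configured_base_url).hostname
    if host == ANTHROPIC_HOST:
        return configured_base_url
    return None
```

Under these assumptions, an override such as https://openrouter.ai/api/v1 is dropped, so the Anthropic API key is never sent to a foreign host, while https://api.anthropic.com/v1 passes through unchanged.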


843 commits