hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-28 18:19:28 +00:00

Author	SHA1	Message	Date
konsisumer	d5e2fbf244	fix(agent): frame compaction handoff sections as historical context	2026-06-11 13:57:13 -07:00
teknium1	114e265737	fix(plugins): don't cache a failed discovery sweep as discovered Root-cause hardening for the stranded-empty-registry failure behind 'No web search/extract provider configured': discover_and_load() set _discovered=True before scanning, so a sweep that raised partway was swallowed by callers as a warning and every later call early-returned against an empty registry for the process lifetime. The flag now acts only as a re-entrancy guard and is reset when the sweep raises, so the next call retries discovery.	2026-06-11 12:56:44 -07:00
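A minimal sketch of the flag handling described in the commit above. The `PluginRegistry` class and the `scan` callable are hypothetical stand-ins, not the project's code. Only the behaviour of `_discovered` follows the commit message: it guards against re-entrancy during a sweep and is reset when the sweep raises.

```python
class PluginRegistry:
    """Hypothetical registry; only the flag handling mirrors the commit message."""

    def __init__(self, scan):
        self._scan = scan          # callable returning plugin names; may raise
        self._discovered = False   # re-entrancy guard, not a "sweep succeeded" cache
        self.plugins = []

    def discover_and_load(self):
        if self._discovered:
            return self.plugins
        self._discovered = True    # blocks re-entrant calls while the sweep runs
        try:
            self.plugins = list(self._scan())
        except Exception:
            self._discovered = False   # failed sweep: the next call retries
            raise
        return self.plugins


attempts = []


def flaky_scan():
    """Fails on the first sweep, succeeds on the second."""
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("sweep failed partway")
    return ["web-search"]


registry = PluginRegistry(flaky_scan)
try:
    registry.discover_and_load()
except RuntimeError:
    pass                               # caller swallows the failure as a warning
result = registry.discover_and_load()  # retried instead of returning an empty registry
```

Without the reset in the `except` branch, the second call would return early with an empty `plugins` list for the lifetime of the process.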
xxxigm	32a73010bb	test(web): cover keyless default surviving a failed plugin sweep Pins the invariant that _ensure_web_plugins_loaded registers the keyless Parallel default (and the wider bundled set) even when the general plugin discovery raises, that the direct-registration fallback honors plugins.disabled, and that it stays a no-op on the healthy path.	2026-06-11 12:56:44 -07:00
Austin Pickett	c3464ecf45	fix(discord): recover from runtime gateway task exits (#44383 ) * fix(discord): recover from runtime gateway task exits Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit recovery; the original PR was 1081 commits behind with 28 unrelated commits. A post-ready discord.py WebSocket crash left the gateway split-brained: producers stayed active while Discord stopped responding. After this fix the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error() so the existing GatewayRunner reconnect watcher replaces the dead adapter. Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy errors, invalid tokens) surface fast instead of burning the full ready timeout. Because connect() no longer waits via asyncio.wait_for on that path, test_connect_releases_token_lock_on_timeout is updated to trigger the timeout through the new helper (same lock-release contract). 3 tests pass (2 new runtime-failure tests + the updated timeout test); test_discord_connect.py and test_discord_slash_commands.py green. Co-Authored-By: ameobius <ameobius@local.host> * fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test connect() no longer uses asyncio.wait_for for the ready handshake, so test_connect_timeout_cancels_bot_task was hanging for 30s in CI. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ameobius <ameobius@local.host> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-11 15:39:01 -04:00
Dineth Hettiarachchi	020ef76cf1	fix(discord): cancel _bot_task on connect() timeout to prevent zombie client When connect() times out waiting for the Discord ready event, the background asyncio.Task running client.start() was not cancelled. discord.py's internal reconnect loop can ignore client.close() while a WebSocket handshake is in flight, so the orphaned task eventually completes and fires on_ready. A later successful reconnect then leaves two live Discord clients in the same process — each with its own on_message handler and MessageDeduplicator instance — so every @mention creates two threads because the per-adapter dedup caches cannot catch cross-client duplicates. Fix: explicitly cancel and await _bot_task in two places: 1. The asyncio.TimeoutError handler inside connect() — catches the case where the adapter's own inner wait_for fires before the gateway's outer timeout. 2. The start of disconnect() — the load-bearing path, always reached via _dispose_unused_adapter regardless of which timeout fired first. Root cause confirmed from production logs: a Jun 8 network outage caused three consecutive connect() timeouts. The first attempt's bot_task completed its handshake 4 minutes later ("Connected as") with no preceding watcher line, then the watcher's real reconnect also connected 90 seconds after that. The two clients ran continuously for 41+ hours, confirmed by the same user message appearing as two separate inbound events in two different thread IDs 357ms apart. Regression tests added to tests/gateway/test_discord_connect.py: - test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a NeverReadyBot and asserts _bot_task is None afterward - test_disconnect_cancels_running_bot_task: injects a live zombie task, calls disconnect(), and asserts the task is cancelled and the attribute cleared	2026-06-11 12:09:18 -07:00
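The cancel-and-await step from the commit above can be sketched with plain asyncio. This is an illustration under assumptions: `never_ready` and this `connect` are hypothetical stand-ins for the adapter's client task and connect path, and no discord.py API is used.

```python
import asyncio


async def never_ready():
    """Stand-in for a client task that never reaches the ready state."""
    await asyncio.Event().wait()


async def connect(timeout):
    bot_task = asyncio.create_task(never_ready())
    ready = asyncio.Event()            # never set in this sketch
    try:
        await asyncio.wait_for(ready.wait(), timeout)
    except asyncio.TimeoutError:
        bot_task.cancel()              # do not leave the task running in the background
        try:
            await bot_task             # wait until the cancellation has taken effect
        except asyncio.CancelledError:
            pass
    return bot_task


task = asyncio.run(connect(0.01))
```

Awaiting the task after `cancel()` matters: it ensures the task has finished unwinding before a later reconnect can start a second client.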
Teknium	13650ab7f8	fix(gateway): audio attachment note no longer steers the agent into punting Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file attachment context note led with "Ask the user what they'd like you to do with it", steering the model into asking instead of transcribing. Rewritten to instruct the agent to transcribe/process the file itself when the request involves its content, only asking when intent is genuinely unclear. Contract assertion added to the existing audio attachment note test.	2026-06-11 11:58:19 -07:00
xxxigm	4e9be3ee32	test(gateway): cover document context note for PDF/DOCX vs text Pin the contract for _build_document_context_note: text documents confirm the inlined content and record the path; binary documents (PDF/DOCX/XLSX/octet-stream) tell the agent to extract the text itself and never instruct it to ask the user to paste the contents.	2026-06-11 11:58:19 -07:00
Austin Pickett	ce99a81123	fix(dashboard): suppress unicode-animations postinstall during npm ci Set CI=1 in _run_npm_install_deterministic so the package's /dev/tty postinstall demo is skipped during hermes dashboard web UI builds. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-11 11:49:08 -07:00
brooklyn!	a4f179c509	fix(agent): steer GPT/Codex family to V4A for single-file edits too (#44411 ) The coding-posture brief told GPT/Codex models to use patch mode='patch' (V4A) for structured/multi-file changes but mode='replace' "for a single small swap". That second nudge points those models at a format their first-party harness never taught them. Verified against openai/codex (current main): apply_patch is the ONLY file editor in codex-rs — zero occurrences of str_replace/old_string anywhere in the repo; the grammar (core/src/tools/handlers/apply_patch.lark) is exactly the V4A dialect our patch_parser implements; the shipped model prompts (gpt_5_codex, gpt-5.2-codex, gpt-5.1-codex-max + instruction templates) explicitly say to use apply_patch "for single file edits"; and the tool is gated per model via ModelInfo.apply_patch_tool_type, i.e. OpenAI ships V4A-for-everything as model metadata. The GPT-family line now steers to mode='patch' for all edits, single-file included. The replace-family line (Claude + open-weight) is unchanged — Claude Code's FileEdit is old_string/new_string/replace_all exact string replacement (confirmed from Anthropic's shipped sdk-tools.d.ts, the only file editor in its tool union), matching our mode='replace'.	2026-06-11 17:52:52 +00:00
Teknium	cb29e8a82e	refactor(cron): rebrand Cron Recipes -> Automation Blueprints Product rename across every surface: module/file names (blueprint_catalog, tools/blueprints, blueprint_cmd), slash command /cron-recipe -> /blueprint (alias /bp), dashboard API /api/cron/blueprints, desktop deep-link hermes://blueprint/<key>, docs catalog page + extract script, and the skill frontmatter block metadata.hermes.blueprint. No behavior change.	2026-06-11 10:49:47 -07:00
Teknium	3c489fda81	fix(commands): unpin /reset from Slack priority aliases — registry hit the 50-cap CI tests the PR merged with current main, where the new /memory canonical command filled Slack's 50-slash cap: with btw/bg/reset all pinned ahead of canonicals, the last canonical (/debug) got clamped and the Telegram-parity test failed. Canonical commands must win slots over alias spellings — /new keeps its native slot and 'reset' stays reachable via /hermes reset. Also updates test_includes_aliases_as_first_class_slashes to assert the pinned-alias contract (_SLACK_PRIORITY_ALIASES survive) instead of a specific unpinned alias's survival, which was the same change-detector pattern the docstring already warned about.	2026-06-11 10:49:47 -07:00
Teknium	e8b757845d	fix(cron-recipes): pre-release hardening — honest cadences, strict slot names, surface-aware UX Review fixes for the Cron Recipes stack before release: - hydration-move: /90 in the cron minute field silently wraps to hourly (croniter-verified) — 90/120-minute options never fired at their stated cadence. Replaced with an hour-field step (0 9-17/2 * 1-5) and an interval_hours slot whose options (1/2/3h) all fire as labeled. - fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to silently create the job at the 08:00 default; now it 422s on the dashboard form and errors on the slash/deep-link paths with the valid slot list. - deliver slot: non-strict enum (options are suggestions, scheduler validates downstream) so slack/whatsapp/etc. users aren't locked out; GET /api/cron/recipes rewrites its options from cron_delivery_targets() so the dashboard form only offers configured platforms; help text no longer claims dashboard-created jobs deliver to 'the chat you set this up from' (the endpoint strips origin — they go to the home channel). - gateway: success/accept messages no longer point at /cron (cli_only); surface-aware hint instead. Conversational fill now sends the 'Setting up X — I'll ask you a couple of things…' ack before the agent turn, matching the CLI experience. - important-mail catalog entry: reference the urgency classifier by module path (python3 -m cron.scripts.classify_items) instead of baking an absolute host path into the job prompt — stale after relocation and nonexistent on remote terminal backends. cron/scripts is now a real package and ships in the wheel (pyproject packages.find). - export_recipe: interval schedules round-trip again — parse_schedule stores 'minutes' but the renderer only read 'seconds', so every interval job exported as the silent '0 9 * * *' fallback. - skills_hub install: say so when a recipe suggestion is dropped (latched dedup or pending cap) instead of printing nothing. 
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all 14 recipes fill+parse, hydration cadences via croniter, typo rejection on slash + endpoint paths, surface-aware hints, and interval export round-trip.	2026-06-11 10:49:47 -07:00
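A short illustration of why a 90-minute step cannot work in the cron minute field. The `expand_step` helper is hypothetical and handles only the `*/N` and `A-B/N` forms. It assumes the usual cron rule that a step is applied within the field's own range, and that the original schedule was a step expression such as `*/90`.

```python
def expand_step(field, low, high):
    """Expand a cron '*/N' or 'A-B/N' field into the values it matches."""
    base, _, step = field.partition("/")
    step = int(step) if step else 1
    if base == "*":
        start, end = low, high
    else:
        first, _, last = base.partition("-")
        start, end = int(first), int(last or first)
    return list(range(start, end + 1, step))


minutes = expand_step("*/90", 0, 59)    # minute field: only minute 0 matches
hours = expand_step("9-17/2", 0, 23)    # hour field from the replacement schedule
```

Here `minutes` is `[0]`, so the job fires at minute 0 of every hour rather than every 90 minutes, while `hours` is `[9, 11, 13, 15, 17]`, a genuine two-hour cadence.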
teknium1	e976faac7a	feat(cron-recipes): /cron-recipe <name> seeds a conversational fill Reworks the chat-line UX: pick a recipe by name and the agent asks you for what it needs, one question at a time, instead of forcing you to hand-type a slot=val command line. - /cron-recipe -> lists the catalog - /cron-recipe <name> -> forgiving name match (exact/prefix/substring/fuzzy; ambiguous lists candidates), then seeds the agent with a natural-language fill request built from the recipe's typed slots + schedule and prompt templates. The agent asks for each value one at a time and calls the EXISTING cronjob tool. No new tool. - /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe -> create_job) for the dashboard/docs/power user. Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user turn, never a synthetic injection): - shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe() and build_recipe_seed() are the new shared pieces. - gateway: dispatch rewrites event.text to the seed and falls through to the agent (the same pattern /steer uses). - CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop consumes it right after process_command() and runs it as the next turn. The typed-slot schema stays the single source of truth (still validates the form/inline path via fill_recipe); the agent path just renders those slots into the questions to ask. Docs updated to lead with the name-then-ask flow.	2026-06-11 10:49:47 -07:00
teknium1	1593ca5406	feat(cron): Cron Recipes — parameterized automation templates across every surface A 'recipe' is a one-place definition of an automation that every surface renders natively. The slot schema (cron/recipe_catalog.py) is the single source of truth; four renderers consume it, and all paths end at the same cron.jobs.create_job — no second job engine. Form where there's a screen, conversation where there's a chat line: - Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each recipe's typed slots as a form (time-picker, enum dropdown, free-text); submit POSTs /api/cron/recipes/instantiate which fills + creates the job. - CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's fields, or fills + creates from a pasted 'key slot=val' command. The shared handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so the agent can ask a targeted follow-up. - Docs: a generated Cron Recipes catalog page (website, .mdx + React cards) shows each recipe with a copy-paste command and a 'Send to App' button. - Desktop: a hermes:// URL scheme (Electron single-instance lock + setAsDefaultProtocolClient + open-url/second-instance) routes hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled. Typed slots (time/enum/text/weekdays) with defaults: users never type raw cron — recipes parameterize time-of-day and weekday sets and translate to cron expressions; a free-text 'schedule' slot is the full-flexibility escape hatch. Consent-first throughout: nothing schedules without an explicit submit or send. Core: - cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes, recipe_form_schema / recipe_slash_command / recipe_deeplink / recipe_catalog_entry renderers, fill_recipe (validate + translate to create_job kwargs). - hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI + gateway never drift). CommandDef + dispatch in commands.py / cli.py / gateway/run.py. 
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate (web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage, api.ts methods + types. Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue, preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller composer prefill, electron-builder protocols key). Docs: extract-cron-recipes.py generator wired into prebuild.mjs, cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry. Generated index json gitignored like skills.json. Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command handler/generator) + 5 web_server endpoint tests. E2E verified end to end: slot fill -> create_job -> persisted job with correct schedule/deliver/origin.	2026-06-11 10:49:47 -07:00
teknium1	9a09ea69fb	feat(cron): Suggested Cron Jobs — one surface for proposed automations Hermes can propose automations and let the user accept them with one tap via /suggestions, instead of making them assemble cron jobs by hand. Every proposal — wherever it originates — flows through one surface. Sources (the 'where suggestions come from'): - catalog: curated starter automations (daily briefing, important-mail monitor, weekly review, workday-start reminder) via /suggestions catalog - recipe: installing a skill that carries a metadata.hermes.recipe block registers a suggestion instead of auto-scheduling - usage / integration: reserved for the background-review detector and account-connect triggers (sources defined; emitters land next) Pieces: - cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by key (dismissed proposals never re-offered), pending cap so it can't become a nag wall. Accepting calls the existing cron.jobs.create_job — there is NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600). - cron/suggestion_catalog.py — the curated set. The important-mail monitor entry is where the old proactive-monitor poll->classify->surface engine lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE catalog automation rather than a standalone feature. - tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single translation both the direct and suggestion paths share. - hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway never drift); /suggestions [accept N|dismiss N|catalog|clear]. - Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py) + aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py). Consent-first throughout: nothing auto-schedules; acceptance is always explicit; dismissals latch. 
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here as a catalog entry and a suggestion source respectively. Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency, recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md -> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.	2026-06-11 10:49:47 -07:00
Teknium	4d6a133a9f	fix(agent): gate skill-index demotion behind the opt-in focus mode (#44387) The coding posture's names-only demotion of non-coding skill categories (#44342) applied under the default auto mode, silently changing the skill index for every user in a git repo. Index changes must be opt-in: demotion now only fires under agent.coding_context=focus, alongside the toolset collapse. auto/on leave the skill index untouched; focus semantics are unchanged (demoted, never hidden; deny-list keeps coding-adjacent and custom categories at full entries).	2026-06-11 10:00:57 -07:00
Teknium	c7bfc938d5	fix(dashboard): Config page header shows the switched profile's config.yaml path (#44374) The Config page read config_path from /api/status, which is machine-global and always reports the profile the dashboard process was started under. After switching profiles with the global switcher, the header kept showing the old profile's path (e.g. /root/.hermes/profiles/worker_1/config.yaml) even though reads/writes correctly targeted the new profile. Fix: /api/config/raw now returns the resolved path alongside the YAML (resolved inside _profile_scope, so it follows ?profile=). ConfigPage prefers that scoped path and only falls back to /api/status for old servers. ProfileKeyedRoutes already remounts the page on switch, so the header refreshes immediately.	2026-06-11 09:46:15 -07:00
yoniebans	9121834b31	fix(desktop): scope remote workspace defaults	2026-06-11 09:41:35 -07:00
yoniebans	51f47f9a97	feat(desktop): add read-only remote filesystem API	2026-06-11 09:41:35 -07:00
helix4u	e71d746820	fix(mcp): avoid false failed startup status	2026-06-11 09:01:52 -07:00
helix4u	b2043cf157	fix(tui): decode startup subprocess output as utf-8	2026-06-11 09:00:55 -07:00
helix4u	dca11b6650	fix(mcp): preserve stdio argv passthrough	2026-06-11 08:59:55 -07:00
brooklyn!	ee1a744ace	fix(agent): demote non-coding skill categories to names-only — never hide skills (#44342 ) Real-world failure with the original index pruning: under the default auto posture, an agent-created ops skill in a demoted category vanished from the prompt's skill index mid-project, and the agent silently fell back to a stale sibling skill instead. The "discovery-only" premise didn't hold — models do not reach for skills_list to rediscover what the index stops showing them, and agent-created skills are the model's accumulated project memory (runbooks, pitfalls, operating rules). Gating pruning behind the opt-in focus mode was the wrong fix too: users opening a worktree don't know the config exists, so the index-noise win would effectively never ship. Instead, the coding posture now DEMOTES non-coding categories rather than hiding them: each demoted category renders as a single names-only line ("gaming [names only]: allthemons10-ops, mc-backup") with a footer note explaining the omitted descriptions. Every skill name stays in the prompt, so memory-anchored recall ("load <name>") keeps working in every mode, while the description noise is still cut. Applies in auto/on/focus alike; the general posture demotes nothing. Deny-list semantics unchanged — unknown/custom categories and coding-adjacent ones keep full entries. API renamed to match the honest semantics: hidden_skill_categories → compact_skill_categories, build_skills_system_prompt(hidden_categories=) → compact_categories=.	2026-06-11 10:25:42 -05:00
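A sketch of the names-only demotion described in the commit above. The `render_skill_index` function, the data layout, and the full-entry line format are hypothetical. Only the demoted line format (`gaming [names only]: ...`) and the `compact_categories` name come from the commit message.

```python
def render_skill_index(skills, compact_categories):
    """Render one line per skill, or one names-only line per compact category."""
    lines = []
    for category, entries in skills.items():
        if category in compact_categories:
            # Demoted: keep every skill name, drop the descriptions.
            names = ", ".join(name for name, _description in entries)
            lines.append(f"{category} [names only]: {names}")
        else:
            # Full entries (this line format is an assumption for the sketch).
            for name, description in entries:
                lines.append(f"{category}/{name}: {description}")
    return "\n".join(lines)


skills = {
    "gaming": [("allthemons10-ops", "server runbook"), ("mc-backup", "world backups")],
    "git": [("rebase-flow", "interactive rebase checklist")],
}
index = render_skill_index(skills, compact_categories={"gaming"})
```

Every skill name is still present in `index`, so a recall such as "load mc-backup" keeps working even though the descriptions for the demoted category are gone.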
Teknium	e24c935cf3	fix(bedrock): fall back to non-streaming InvokeModel when IAM denies InvokeModelWithResponseStream (#44293 ) IAM policies scoped to bedrock:InvokeModel only (a common least-privilege setup) reject converse_stream() with AccessDeniedException. The agent loop hard-prefers streaming and the denial never matched the 'stream not supported' auto-fallback, so InvokeModel-only users looped on AccessDenied forever. - agent/bedrock_adapter.py: new is_streaming_access_denied_error() detector (ClientError code check + wrapped-SDK message match); call_converse_stream() falls back to converse() on denial. - agent/chat_completion_helpers.py: bedrock_converse streaming branch retries inline via converse() and sets _disable_streaming so later turns skip the doomed stream attempt; the chat-completions retry block also recognizes the denial for the AnthropicBedrock SDK path (message pre-check avoids importing bedrock_adapter — and its lazy boto3 install — for unrelated providers). Both paths print a one-line notice telling the user which IAM action restores streaming.	2026-06-11 07:15:30 -07:00
Austin Pickett	d0e017bac8	fix(gateway): gate oversized Telegram voice/audio before download (#44245) * fix(gateway): gate oversized Telegram voice/audio before download Adds a pre-download size check to the Telegram voice and audio inbound paths. Files that exceed _max_doc_bytes (default 20 MB) are rejected before get_file() is called, preventing silent OOM-style stalls on large uploads. A human-readable note is appended to the event text so the model can explain the limit to the user. Also extends 403 entitlement detection in recover_with_credential_pool to cover two additional cases: 'oauth authentication is currently not allowed for this organization' and Anthropic anthropic_messages-mode 403s, both of which should be treated as entitlement failures rather than transient errors. Tests: 7 new cases in test_telegram_voice_v0_regressions.py covering the size gate (accept, reject, note text) and the STT-failure notice path. Salvaged from #40487 (cryptopafi) — cherry-picked the Telegram voice policy and 403 entitlement fixes; LiveKit/Discord/uv.lock workstreams left for separate PRs. * test(gateway): drop orphaned voice tests not backed by this PR The cherry-picked test file from #40487 included 3 tests for STT-failure notice and voice-mode (_handle_voice_command 'on' -> voice_only) behavior that this PR intentionally does NOT salvage (those belong to the LiveKit/voice-policy workstreams left in #40487). They fail on both this branch and clean main because the feature code isn't present. Keep only the 2 tests backed by code actually in this PR: - test_telegram_audio_size_gate_rejects_oversized_media_before_download (covers the _telegram_media_size_allowed guard this PR adds) - test_voice_tts_is_explicit_audio_reply_opt_in (matches current main) Removed now-unused imports (MessageEvent, MessageType, AsyncMock).	2026-06-11 10:01:51 -04:00
Teknium	a09343cc96	feat(dashboard): SKILL.md editor on Skills page + attach-skill selector in cron modals (#44231 ) Headless/VPS users (dashboard-over-Tailscale, no comfortable SSH) could list/toggle/install skills and create/edit cron jobs, but not author a custom skill or link one to a cron job — the UI set WHEN a job runs, but not WHICH skill it uses. - Skills page: 'New skill' button + per-row edit pencil open a SKILL.md editor dialog (frontmatter + body, server-side validation via the same _create_skill/_edit_skill path as the agent's skill_manage tool). - New endpoints: GET /api/skills/content, POST /api/skills, PUT /api/skills/content — all profile-scoped via _profile_scope(), which now also retargets tools.skill_manager_tool's import-time SKILLS_DIR binding. - Cron page: skills multi-select in both create and edit modals (parity with hermes cron --skill / edit --add-skill); CronJobCreate gains a skills field; job cards show an attached-skills badge. update_job already accepted skills in updates. - Tests: 17 new endpoint tests (content read, create/edit validation + profile scoping + auth gate, cron skills round-trip).	2026-06-11 06:10:27 -07:00
Teknium	f456f302df	fix(gateway): refuse to write service definitions with a temp-dir HERMES_HOME (#44267 ) * fix(gateway): refuse to write service definitions with a temp-dir HERMES_HOME A test/E2E harness that exports HERMES_HOME=/tmp/... and touches any gateway service write path (install, start self-heal, restart's refresh_systemd_unit_if_needed) bakes the throwaway home into the production systemd unit / launchd plist. The gateway then restarts 'healthy' but pointed at an empty temp home — no platforms enabled, deaf to every message (live incident 2026-06-11: /tmp/hermes-e2e-41264 poisoned the unit during a PR-review E2E probe; the post-update restart produced a 7-hour zombie gateway). The existing safety belt only sniffed pytest-shaped markers (/pytest-of-, /hermes_test). Add a structural guard: _temp_home_in_service_definition() extracts HERMES_HOME from the generated systemd unit or launchd plist and refuses the write (with actionable guidance) when it resolves under tempfile.gettempdir(), /tmp, /var/tmp, or the macOS /private variants. Wired into all five write sites: systemd refresh + install, launchd refresh + install + start self-heal. * test: patch unit generator in install tests tripped by temp-home guard CI runs hermetic with HERMES_HOME under a tmp dir, so the real generate_systemd_unit() output now (correctly) trips the new temp-home write guard in three install tests. Patch the generator with synthetic non-temp content — same pattern the existing pytest-marker guard tests use.	2026-06-11 06:10:08 -07:00
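A sketch of the structural check the commit above describes: refuse a HERMES_HOME that resolves under a temporary directory. The `is_temp_home` helper is hypothetical. The root list follows the commit message, and the two `/private` entries are an assumption about which macOS variants are meant.

```python
import tempfile
from pathlib import Path

# /private/* entries are assumed spellings of the macOS variants.
TEMP_ROOTS = ["/tmp", "/var/tmp", "/private/tmp", "/private/var/tmp"]


def is_temp_home(home: str) -> bool:
    """Return True when `home` is a temp root or lives underneath one."""
    path = Path(home).resolve()
    roots = [Path(tempfile.gettempdir()).resolve()]
    roots += [Path(root) for root in TEMP_ROOTS]
    return any(path == root or root in path.parents for root in roots)


poisoned = is_temp_home("/tmp/hermes-e2e-41264")   # the path from the incident
normal = is_temp_home("/srv/hermes/home")          # hypothetical production home
```

Checking path ancestry, rather than sniffing for pytest-shaped substrings, is what lets the guard catch an arbitrary harness directory such as the one in the incident.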
Teknium	8972a151a4	feat(cli,tui): show time since last final agent response on the status bar (#44265) Adds an idle clock to the context/status bar in both the prompt_toolkit CLI and the Ink TUI: once a turn completes, a dim '✓ <elapsed>' segment shows how long the session has been idle since the last final agent response. Hidden while a turn is live (the per-prompt elapsed timer covers that) and before the first turn completes. - cli.py: track _last_turn_finished_at when the agent thread exits, surface it via _format_idle_since() in the snapshot, render in both the wide fragments path and the plain-text fallback. - ui-tui: stamp lastTurnEndedAt when busy flips false after a live turn, thread it through appStatus -> StatusRule, render via a ticking IdleSince segment sharing the duration breakpoint/width budget.	2026-06-11 06:06:19 -07:00
Teknium	9c16ca8790	fix(dashboard): normalize model assignments + confirm-modal for backup import (#44237 ) Two beta-reported dashboard bugs: 1. Models page: 'Use as -> Main model' on an analytics card sends entry.provider, which falls back to the model's VENDOR prefix (modelVendor('anthropic/claude-opus-4.6') == 'anthropic') when the session row has no billing_provider. That persisted provider: anthropic + default: anthropic/claude-opus-4.6 — a vendor-prefixed OpenRouter slug on the NATIVE Anthropic provider. New sessions then 400 against api.anthropic.com and the user reads it as 'changing models does nothing'. Unknown vendors (moonshotai, poolside, ...) were worse: a provider that can never resolve credentials. Fix: _normalize_main_model_assignment() at the single write chokepoint — maps non-provider vendor names back to the user's current aggregator (else openrouter), and runs the model through normalize_model_for_provider() so the persisted name matches the target provider's API format. Wired into both /api/model/set and the profile-scoped _write_profile_model. 2. System page: 'Restore from backup' spawns hermes import with stdin=DEVNULL, so the CLI's interactive 'Continue? [y/N]' overwrite prompt hits EOF and auto-aborts whenever a config already exists (always, when the dashboard is running). Fix: ConfirmDialog in the dashboard owns the consent, then the endpoint passes --force so the restore runs non-interactively. Validated live: dashboard on a temp HERMES_HOME, repro'd both failure modes pre-fix (vendor-slug write verified via config.yaml + tui session.create; import 'Aborted.' in action-import.log), then verified post-fix (normalized writes, modal -> --force -> restored marker file).	2026-06-11 05:07:58 -07:00
Chris	4717989c10	fix(matrix): isolate room context and restore reliable inbound dispatch (#18505 ) * fix(matrix): isolate room context and inbound dispatch * test(matrix): cover room isolation and dispatch regressions * docs(matrix): document room isolation and session scope * fix(matrix): stabilize CI requirement checks * test(matrix): isolate mautrix stubs in requirements tests * fix(matrix): port room-scoped status and resume to slash commands mixin Move Matrix /status scope output and /resume same-room guards from the pre-refactor gateway/run.py into gateway/slash_commands.py so PR #18505 foundation behavior survives the upstream god-file decomposition. Uses i18n keys for Matrix resume/status messages. Preserves upstream session.py fixes (role_authorized, DM user_id isolation). * docs(matrix): explain inbound dispatch via handle_sync loop Document why Hermes uses an explicit sync loop with handle_sync() rather than client.start(), aligning with upstream #7914 diagnostics while preserving Hermes background maintenance tasks. * fix(i18n): add Matrix resume/status keys to all locale catalogs The Matrix /resume and /status slash-command keys added in the foundation PR must exist in every supported locale file. tests/agent/test_i18n.py asserts key and placeholder parity across catalogs. Non-English locales use English strings as interim placeholders until community translators can localize them. * fix(matrix): restore gateway authz for allowed_users; honor config require_mention Revert the early MATRIX_ALLOWED_USERS gate in _on_room_message so inbound sender authorization stays in gateway authz like main. Parse require_mention from config.extra (platforms.matrix / top-level matrix yaml) with env fallback, matching thread_require_mention and fixing Forge when require_mention is set only in profile config.yaml. * fix(matrix): harden status scope and allowlisted DMs * fix(matrix): use session store lookup for resume scope	2026-06-11 07:41:43 -04:00
Teknium	73dd584995	fix(mcp): propagate HERMES_HOME override onto the MCP event loop (#44220 ) * fix(mcp): propagate HERMES_HOME override onto the MCP event loop Closes the known limit documented in #44007: tasks scheduled via run_coroutine_threadsafe are created INSIDE the MCP loop thread, so they copy that thread's context — a per-request profile scope (dashboard ?profile= endpoints, e.g. the MCP 'Test server' probe) silently vanished for anything resolving get_hermes_home() inside the coroutine. Most visible symptom: OAuth token-store paths (HERMES_HOME/mcp-tokens/) resolved against the process home instead of the selected profile, so testing an OAuth MCP cross-profile read the wrong tokens. _run_on_mcp_loop now wraps scheduled coroutines with the caller's context-local override (_wrap_with_home_override): set inside the task's own context on the loop, reset on completion — task-local, so concurrent calls carrying different scopes don't interfere, and the loop thread's default context stays untouched. No-op (coroutine passes through unwrapped) when no override is active, i.e. every non-dashboard caller. web_server's probe comment updated from 'known limit' to 'covered'. Tests: override propagation (direct + factory form), OAuth token-path resolution on the loop, loop-context cleanliness after scoped calls, no-op passthrough. 225 green across mcp_tool + unification suites. * test(mcp): concurrent different-scope calls don't interfere	2026-06-11 04:37:01 -07:00
Teknium	3edd09a46f	fix(whatsapp): restart stale bridge processes instead of silently reusing them (#44205 ) A long-lived Baileys bridge survives gateway restarts AND hermes update: connect() adopted any bridge already listening with status connected, and disconnect() only kills bridges the adapter spawned itself. Users who updated to get inbound media support kept talking to a bridge process serving months-old bridge.js — images and voice notes still arrived as placeholders with no cached file path (refs #19105 follow-up reports). Three fixes in the same stale-bridge class: - Staleness handshake: bridge.js reports a sha256 self-hash in /health (scriptHash); connect() compares it against bridge.js on disk and restarts the bridge on mismatch. Pre-handshake bridges report no hash and are treated as stale, so every existing stale bridge gets recycled exactly once on the next gateway start. - npm dep refresh: deps reinstall when package.json changes (stamp file in node_modules), not only when node_modules is missing — a Baileys pin bump now actually lands. - Cache-dir passthrough: the gateway passes profile-aware HERMES_{IMAGE,AUDIO,DOCUMENT}_CACHE_DIR to the bridge instead of the bridge hardcoding ~/.hermes/image_cache etc., fixing media paths under HERMES_HOME overrides, profiles, and the new cache/ layout.	2026-06-11 03:47:29 -07:00
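The staleness handshake in 3edd09a46f can be sketched roughly as follows, assuming the `/health` payload has already been fetched and parsed into a dict. The `scriptHash` field name comes from the commit message; the function names `script_hash` and `bridge_is_stale` are hypothetical.

```python
import hashlib
from pathlib import Path


def script_hash(path: Path) -> str:
    """sha256 of the bridge script as it currently exists on disk."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def bridge_is_stale(health: dict, script_path: Path) -> bool:
    """Decide whether a running bridge should be restarted."""
    reported = health.get("scriptHash")
    if not reported:
        # Pre-handshake bridges report no hash and are treated as stale.
        return True
    return reported != script_hash(script_path)
```

Treating a missing hash as stale is what lets every pre-handshake bridge get recycled once on the next gateway start.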
Teknium	875aa8f162	feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher (#44007 ) * feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher The dashboard becomes a machine-level management surface with one write-target selector, replacing per-profile dashboard fragmentation. Backend: - profile param (query or body) on /api/config (get/put/raw), /api/env (get/put/delete/reveal), /api/mcp/servers (list/add/remove/test/enabled), /api/mcp/catalog (list/install), /api/model/info, /api/model/set — all scoped through the existing _profile_scope() context manager - model/set restructured: expensive-model warning (await) runs before the scope; the config write runs sync inside the scope in a worker thread - MCP catalog installs + git-bootstrap entries spawn 'hermes -p <profile>' - chat PTY: ?profile= on /api/pty points the child's HERMES_HOME at the profile dir (its own gateway subprocess, config/skills/memory/state.db all profile-bound); in-process gateway attach skipped when scoped CLI launch unification: - '<profile> dashboard' routes to the machine dashboard: attach (open browser at ?profile=) when one is listening, else re-exec pinned to the default profile with --open-profile preselecting the launcher - --isolated preserves the old dedicated per-profile server behavior - start_server(initial_profile=...) appends ?profile= to the auto-open URL Frontend: - ProfileProvider + sidebar ProfileSwitcher: ONE global selector, URL- persisted (?profile=), mirrored into fetchJSON which auto-appends the param to the scoped endpoint families (explicit params win) - app-wide amber banner names the managed profile - SkillsPage's page-local selector (from the skills-scoping PR) folded into the global context — single source of truth - ChatPage threads the scope into the PTY WS URL; switching profiles remounts the terminal into a fresh scoped session Omitted profile keeps legacy behavior everywhere. 
* docs(dashboard): document machine-level multi-profile management - web-dashboard.md: 'Managing multiple profiles' section (switcher, URL deep-links, unified launch, --isolated, scoped Chat, what stays per-profile) + --isolated in the options table - profiles.md: 'From the dashboard' subsection + set-as-active vs switcher clarification - cli-commands.md: --isolated flag + profile-alias launch example * fix(dashboard): address profile-unification review findings Review findings (dev review on PR #44007): 1. HIGH — stale page state on profile switch: pages load data on mount and didn't consume the profile scope, so a page opened under profile A kept showing A's state while writes silently targeted the newly selected B. Fixed structurally: ProfileKeyedRoutes wraps the routed page tree and keys it by the selected profile, remounting every page (fresh state + refetch) on switch. ChatPage keeps its own remount (channel keyed on scopedProfile). 2. HIGH — /api/model/auxiliary read was unscoped while /api/model/set wrote scoped (Models page could show default's aux pins while editing worker's). Endpoint now takes profile + _profile_scope, added to PROFILE_SCOPED_PREFIXES, HTTPException re-raise so ghost profiles 404 instead of 500. Regression test asserts read/write symmetry with differing worker/default aux config. 3. MEDIUM — tools post-setup spawned unscoped from the profile-aware drawer. Now spawns 'hermes -p <profile> tools post-setup <key>' (same mechanism as hub installs); drawer threads its profile prop. Most hooks install machine-level artifacts where the scope is inert, but hooks reading config/env now see the drawer's HERMES_HOME. 4. LOW — ty warnings: env Optional asserts before subscript/membership, fastapi import replaced with web_server.HTTPException re-use. 298 tests green across the four affected suites; tsc -b + vite build green; aux scoping E2E-verified with real imports. * fix(dashboard): address second profile-unification review (gille) 1. 
BLOCKER — profile scope dropped on sidebar navigation: ProfileProvider derived the selection from the current URL, and nav links are bare paths, so clicking Config from /skills?profile=worker silently reset the write target. State is now the source of truth; an effect re-asserts ?profile= onto the new location after every navigation (URL stays a synchronized projection for deep links/refresh), and an incoming URL param (e.g. 'Manage skills & tools' links) still wins. 2. BLOCKER — /api/model/options unscoped while model/set wrote scoped: the picker context (current model/provider, custom providers, per-profile .env auth state) now loads inside _profile_scope; added to PROFILE_SCOPED_PREFIXES. Test: a worker-only current-model pin appears in the scoped payload and not the unscoped one. 3. BLOCKER — MCP test-server probe escaped the scope after the config read: the probe now re-enters _profile_scope inside the worker thread so env-placeholder expansion resolves against the selected profile's .env. Known limit (documented): the probe's dedicated MCP event-loop thread doesn't inherit the contextvar (OAuth token paths). Test asserts get_hermes_home() inside the probe == the worker profile dir. 4. BLOCKER — broad excepts swallowed unknown-profile 404s: /api/model/info degraded to 200-with-empty-model-info and /api/mcp/catalog to a silently-empty catalog. Both re-raise HTTPException; 404 regression tests added for info/options/catalog. Polish: scope banner clears the fixed mobile header (mt-14 lg:mt-0); --open-profile hidden via argparse.SUPPRESS (internal re-exec flag); attach-path test now asserts the opened ?profile= URL. (Stale-page-state + /api/model/auxiliary findings from this review were already fixed in `92bcd1568` — the review ran against `e600f6951`.) 35 tests in the two new suites + 274 in the adjacent ones, all green; tsc -b + vite build green; scoping E2E-verified with real imports. 
* docs(dashboard)+fix: self-review pass — Profiles page section, REST profile-param tip, body-beats-query precedence Docs: - web-dashboard.md: add the missing 'Profiles' subsection to Pages (cards, create/builder, manage-skills jump, set-as-active vs switcher distinction, editors); REST API section gets a profile-scoped-endpoints tip documenting ?profile= / body profile / 404 semantics / /api/pty - (profiles.md + cli-commands.md were already updated in `e600f6951`) Precedence fix: scoped endpoints taking BOTH a query param and a body field now resolve body.profile first. The SPA's fetchJSON injects the query param from the GLOBAL switcher; an explicit body.profile (e.g. Profile Builder flows writing into a specific new profile) is the more specific intent and must not be overridden by whatever the sidebar happens to be set to. Matches the documented 'explicit beats global' contract in api.ts. Verified: 304 tests green across the four suites; tsc -b + vite build green; docusaurus build green (only pre-existing broken-link warnings, none from this PR's pages).	2026-06-11 03:29:33 -07:00
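The body-beats-query precedence from 875aa8f162 amounts to a small resolution rule. A minimal sketch, assuming both values arrive as optional strings; `resolve_profile` is a hypothetical name, not the dashboard's actual function.

```python
from typing import Optional


def resolve_profile(query_profile: Optional[str],
                    body_profile: Optional[str]) -> Optional[str]:
    """Pick the profile a scoped endpoint should act on.

    An explicit body field is the more specific intent and wins over the
    query param injected from the global switcher. None means the profile
    was omitted, which keeps legacy (unscoped) behavior.
    """
    if body_profile:
        return body_profile
    return query_profile or None
```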
Teknium	85503dceca	Merge pull request #44038 from NousResearch/hermes/hermes-fb4ee8ce fix(cli): show quick commands in /help output	2026-06-11 03:04:30 -07:00
kshitij	955fa40062	Merge pull request #44085 from kshitijk4poor/review/pr-43754-ssh-update fix(update): avoid SSH auth for passive official checks	2026-06-11 01:12:03 -07:00
kshitij	39f40ece70	Merge pull request #44074 from kshitijk4poor/fix/archive-compressed-session-lineages-salvage fix(sessions): archive compressed conversation lineages	2026-06-11 00:24:00 -07:00
kshitijk4poor	ed2b9e43c8	fix(backup): stage SQLite snapshots beside output zip in pre-update path too The pre-update / pre-migration backup path (_write_full_zip_backup) had the same /tmp staging bug as run_backup: a small tmpfs at the default tempfile location silently drops large *.db files from the archive. Route its SQLite staging temp files to the output zip's directory as well, and add regression tests (mutation-verified) for both staging paths. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-11 12:45:40 +05:30
helix4u	cedd9b6d47	fix(update): avoid SSH auth for passive official checks	2026-06-11 12:45:07 +05:30
liuhao1024	dd40600e0a	fix(backup): stage SQLite snapshots alongside output zip and stop excluding nested hermes-agent skill dirs Two bugs in the backup routine: 1. SQLite safe-copy used tempfile.NamedTemporaryFile() which defaults to the system temp directory (/tmp). When /tmp is a small tmpfs and the database is large, the copy silently fails and the resulting zip is missing state.db, kanban.db, and response_store.db. Fix: pass dir=out_path.parent so the temp file is staged alongside the output zip on the same filesystem. 2. _EXCLUDED_DIRS contained "hermes-agent" which matched at ANY path depth, accidentally excluding the Hermes Agent skill directory at skills/autonomous-ai-agents/hermes-agent/. Fix: special-case "hermes-agent" to only match when it is the first path component (the root-level code checkout). All other excluded dir names continue to match at any depth. Regression tests added for both fixes.	2026-06-11 12:43:39 +05:30
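Both fixes in dd40600e0a can be sketched in a few lines. This is an illustration under assumptions, not the backup module itself: the contents of `ANY_DEPTH_EXCLUDES` are invented for the example, and `staging_file` and `is_excluded` are hypothetical names. Only the `dir=out_path.parent` argument and the root-only treatment of `hermes-agent` come from the commit message.

```python
import tempfile
from pathlib import Path, PurePosixPath

# Illustrative names only; the real exclusion set is not shown in the log.
ANY_DEPTH_EXCLUDES = {"node_modules", "__pycache__"}
ROOT_ONLY_EXCLUDES = {"hermes-agent"}


def staging_file(out_path: Path):
    """Stage the SQLite copy beside the output zip, not in the system temp dir."""
    return tempfile.NamedTemporaryFile(
        dir=out_path.parent, suffix=".db", delete=False
    )


def is_excluded(rel_path: str) -> bool:
    """Exclude by directory name; 'hermes-agent' only as the first component."""
    for depth, part in enumerate(PurePosixPath(rel_path).parts):
        if part in ROOT_ONLY_EXCLUDES:
            if depth == 0:
                return True
        elif part in ANY_DEPTH_EXCLUDES:
            return True
    return False
```

Staging on the same filesystem as the output zip means the copy is limited by the destination's free space rather than by a small tmpfs.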
Dan Schnurbusch	04b3f19538	fix(sessions): archive compressed conversation lineages	2026-06-11 12:31:10 +05:30
Teknium	b8e2c16579	Merge origin/main into salvage branch (resolve AUTHOR_MAP conflict)	2026-06-10 23:25:54 -07:00
teknium1	cb2c13055e	fix(gateway): scrub _HERMES_GATEWAY from POSIX detached restart watcher too Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so the CLI's self-restart loop guard silently refuses 'hermes gateway restart' and the gateway never comes back. Scrub the marker from the watcher env on the POSIX branch as well, and extend the setsid test to assert it.	2026-06-10 23:22:43 -07:00
鼬君夏纪	264ac72b67	fix(gateway,windows): preserve restart watcher env	2026-06-10 23:22:43 -07:00
Shannon Sands	fa7f24e898	Enable webhooks from dashboard page	2026-06-10 22:55:06 -07:00
Teknium	13f1efdd15	fix(gateway): collapse repeated terminal headers in consecutive tool progress blocks (#43968) When the agent runs several terminal commands back-to-back, each progress line repeated the '💻 terminal' header above its fenced code block, cluttering the progress bubble. Now only the first terminal call in a streak emits the header; subsequent consecutive terminal calls render adjacent code blocks. Any other tool (or non-block preview) resets the streak so the next terminal call gets a fresh header.	2026-06-10 22:30:27 -07:00
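The streak rule in 13f1efdd15 can be sketched as a single pass over the tool calls. This assumes each call is a `(tool_name, preview)` pair and that non-terminal tools render as one plain line; `render_progress` and that line format are hypothetical.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def render_progress(calls):
    """Render progress lines, emitting the terminal header once per streak."""
    lines = []
    in_terminal_streak = False
    for tool, preview in calls:
        if tool == "terminal":
            if not in_terminal_streak:
                lines.append("💻 terminal")  # header only for the first call
            lines.append(f"{FENCE}\n{preview}\n{FENCE}")
            in_terminal_streak = True
        else:
            lines.append(f"{tool}: {preview}")
            in_terminal_streak = False  # any other tool resets the streak
    return lines
```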
brooklyn!	975edd4140	fix(cli): omit --workspace when subpackage has its own package-lock.json (#42973 ) (#43986 ) * fix(cli): omit --workspace when subpackage has its own package-lock.json When ui-tui/ (or web/) contains its own package-lock.json, _workspace_root() returns the subpackage directory itself. Passing --workspace ui-tui in that case fails because npm cannot find a workspace named 'ui-tui' inside ui-tui/. Fix: skip the --workspace flag when npm_cwd equals the target directory, running a plain 'npm install' from the standalone project root instead. Applies the same fix to both _make_tui_argv (TUI) and _build_web_ui (web). Fixes #42973 * test(cli): fix web workspace-scope fixture + cover own-lockfile fallback (#42973) The web half of the #42977 fix broke test_npm_install_uses_workspace_web_scope, which built its fixture with no lockfile anywhere. Without a root lockfile, _workspace_root(web_dir) already returns web_dir, so the new "() if npm_cwd == web_dir" branch correctly drops --workspace and the assertion failed. Model a real workspace checkout instead: the single package-lock.json lives at the root, so --workspace web scopes the install. Also add the symmetric web regression test (web/ carrying its own lockfile => --workspace must be dropped and the install runs plainly from web_dir via npm ci), matching the TUI coverage already in test_tui_npm_install.py. --------- Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-11 05:01:25 +00:00
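A rough sketch of the `--workspace` decision in 975edd4140. The lockfile lookup below is a simplified, assumed version of `_workspace_root()` (it only checks the package directory and its parent), and `npm_install_argv` is a hypothetical name. `--workspace` itself is a real npm flag.

```python
from pathlib import Path


def workspace_root(pkg_dir: Path) -> Path:
    """Assumed lookup: the directory that owns the package-lock.json."""
    if (pkg_dir / "package-lock.json").exists():
        return pkg_dir  # subpackage carries its own lockfile
    if (pkg_dir.parent / "package-lock.json").exists():
        return pkg_dir.parent  # single lockfile at the workspace root
    return pkg_dir  # no lockfile anywhere: treat as standalone


def npm_install_argv(pkg_dir: Path):
    """Return (cwd, argv), dropping --workspace for standalone packages."""
    npm_cwd = workspace_root(pkg_dir)
    scope = () if npm_cwd == pkg_dir else ("--workspace", pkg_dir.name)
    return npm_cwd, ["npm", "install", *scope]
```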
brooklyn!	3e74f75e41	feat(agent): coding-context posture across CLI/TUI/desktop/ACP (#43316 ) * feat(agent): coding-context posture with per-model edit-format tuning Hermes detects when it's running in a coding context — an interactive surface (CLI, TUI, ACP, desktop) sitting in a code workspace (git repo or recognised project root) — and shifts into a coding posture. Outside that (chat platforms, non-workspaces) nothing changes. The posture is modelled as a frozen RuntimeMode selected from a small ContextProfile registry (coding/general). A profile is data: the toolset to collapse to, the operating brief to inject, and seams for model routing and memory. Every domain reads the same resolved object instead of re-probing git/config on its own: - System prompt — RuntimeMode.system_blocks(): an operating brief (gather context before editing, edit through tools not chat, verify with terminal, cap retry loops) plus a live git/workspace snapshot, built once and baked into the stable prompt tier so per-conversation caching is preserved. - Per-model edit-format tuning — the brief nudges each model family toward the patch mode it handles best: OpenAI/Codex toward mode='patch' (V4A multi-file diffs), Anthropic toward mode='replace' (string replacement). The model id rides on RuntimeMode; unknown families keep neutral wording. - Skill index — non-coding skill categories are pruned from the prompt's skill index (discovery-only; skills_list/skill_view still reach the full catalog, with a disclosure note). - Toolset — only under the opt-in 'focus' mode does the posture collapse to the coding toolset + enabled MCP servers; the default posture is prompt-only and never overrides configured toolsets. Activation via agent.coding_context: auto (default), focus, on, off. Subagents inherit the posture for free via toolset inheritance + the shared prompt builder. Detection is not memoized so a long-lived gateway/TUI process can't pin a stale posture across working directories. 
* feat(agent): cover new-file authoring in the coding edit-format nudge The per-model edit-format guidance only addressed editing existing code (patch mode='patch' vs 'replace'), but authoring a brand-new file — write_file, not patch — is a large fraction of real coding work and the nudge was silent on it. Surfaced when building a single-file artifact where the dominant operation was write_file and the steering offered no guidance. Both family lines now lead with "author new files with write_file; for edits to existing code prefer ...". Tests assert write_file appears in each family's brief; unknown families still get neutral wording. * docs(agent): correct memoization docstring + clarify TUI config-load asymmetry * feat(agent): sharpen the coding posture — verify-loop facts, wider edit steering, $HOME guard Tuning pass on the coding posture from dogfooding it as a harness: - Workspace snapshot now hands the model its verify loop up front: detected manifests + package manager (lockfile sniff), the exact verify commands (package.json scripts, Makefile targets, scripts/run_tests.sh, pytest config), and which context files (AGENTS.md / CLAUDE.md / .cursorrules) exist at the root. Marker-only (non-git) projects get the snapshot too instead of nothing. The "verify before claiming done" brief line was the highest-value piece in evals — this turns it from advice into an executable loop instead of making the model rediscover the test command every session. Still stat-cheap, size-guarded reads, built once at prompt time. - Edit-format steering covers the families Hermes actually serves: Gemini and open-weight coding models (DeepSeek, Qwen, Kimi, GLM, Grok, Hermes, Llama, Mistral, Devstral, MiniMax) steer to mode='replace' — their RL scaffolds use str_replace-style editors. Previously only GPT/Codex and Claude families got steering; the models Hermes users disproportionately run all fell to neutral. 
- Operating brief gains four behaviors elite harnesses encode: batch independent reads/searches in one turn; fix root causes and the bug class (sibling call paths), not the reported site; no drive-by refactors/renames/reformatting; never read, print, or commit secrets. Plus a patch-failure escalation ladder: after the same region fails twice, rewrite the enclosing function/file with write_file instead of a third patch attempt. - $HOME dotfiles guard: a git repo rooted exactly at the home directory (or a marker sitting in it, e.g. a global ~/AGENTS.md) is user config, not a code workspace — without the guard, every session anywhere under a dotfiles-managed home silently flipped to the coding posture. Real projects under such a home still detect via their own markers/repos; 'on' mode bypasses the guard.	2026-06-10 23:06:44 -05:00
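The `$HOME` dotfiles guard from 3e74f75e41 can be illustrated with a marker walk. This is a sketch under assumptions: the `MARKERS` tuple, the `find_workspace_root` name and the `guard_home` parameter are all hypothetical, and stopping the walk at the home directory is a simplification of this example. Per the commit message, `'on'` mode bypasses the guard, which corresponds to `guard_home=False` here.

```python
from pathlib import Path
from typing import Optional

# Illustrative marker list; the real detection set is not shown in the log.
MARKERS = (".git", "pyproject.toml", "package.json", "AGENTS.md")


def find_workspace_root(cwd: Path, home: Path,
                        guard_home: bool = True) -> Optional[Path]:
    """Walk upward from cwd and return the nearest directory with a marker.

    With the guard active, a marker sitting exactly in the home directory
    is treated as user config, not as a code workspace.
    """
    for candidate in (cwd, *cwd.parents):
        if guard_home and candidate == home:
            break  # assumption of this sketch: nothing at or above home counts
        if any((candidate / marker).exists() for marker in MARKERS):
            return candidate
    return None
```

Projects under a dotfiles-managed home are still found because the walk reaches their own marker before it reaches the home directory.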
Teknium	7d8d000b19	revert(cron): remove per-job profile support (PR #28124 ) (#43956 ) Fully removes the cron per-job 'profile' arg added in #28124: the cronjob tool schema field, CLI --profile flags on cron create/edit, job-record storage/validation, the scheduler's _job_profile_context wrapper, and the script-runner env override. Sequential-partition logic reverts to workdir-only. The context-local HERMES_HOME override in hermes_constants and the subprocess bridging in tools/environments/local.py are kept — they now have other consumers (dashboard multi-profile, TUI gateway).	2026-06-10 20:46:17 -07:00
teknium1	efcbbde48c	refactor: keep anthropic_content_blocks in-memory only (no state.db column) Drop the hermes_state.py column + persistence plumbing from the salvaged interleaved-thinking fix. The ordered-block channel covers the failure window in-memory (turn replayed within the live conversation loop). A session reloaded from disk after a crash falls back to reconstruction; if that replay 400s, the thinking-signature recovery (#43667) strips reasoning_details and retries — one degraded call in a rare resume path instead of a schema column. Replaces the DB-roundtrip test with a fallback-shape test.	2026-06-10 20:45:16 -07:00
RaumfahrerSpiffy	7a1eed8268	fix(anthropic): redact replayed tool inputs and broaden thinking-replay 400 recovery Two additive hardening changes on the interleaved-thinking replay path introduced by this PR's anthropic_content_blocks channel. Both are scoped to that channel's blast radius; neither changes correct behavior. 1. Replay-time tool-input re-sourcing (credential safety). The ordered-block channel captures each tool_use `input` from the RAW API response in normalize_response, which is NOT credential-redacted. The parallel tool_calls[].function.arguments IS redacted at storage time (build_assistant_message, #19798). The verbatim-replay fast path in _convert_assistant_message replayed the raw block input, so a secret a model inlined into a tool call (e.g. an Authorization header value passed inside a terminal command) would ride back onto the wire even though it is redacted everywhere else in history. Re-source tool_use input from the redacted tool_calls map by sanitized id; interleave order (the reason this channel exists) is unaffected. Adapted from #36071, which re-sources tool inputs the same way on its replay path. 2. Broaden the thinking-replay 400 classifier (defense-in-depth). error_classifier only matched "signature" + "thinking", so the frozen-block variant — "thinking ... blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response." — carried no "signature" token and fell through to a non-retryable abort. The anthropic_content_blocks channel prevents the reorder that triggers this 400 at the source, but if any future mutator reintroduces it, the turn now self-heals via the existing strip-reasoning-and-retry recovery instead of crash-looping. A negative case ensures an unrelated "cannot be modified" 400 (no "thinking") is not swept in. Mirrors the classifier broadening in #36087 and #36071. 
Tests - tests/agent/test_anthropic_thinking_block_order.py: a replay test asserting an inlined secret is redacted on the wire while interleave order is preserved. - tests/agent/test_error_classifier.py: three cases — frozen-block 400 native and via OpenRouter route to thinking_signature/retryable; an unrelated "cannot be modified" 400 does not. Both grafts verified RED (tests fail with the change reverted) then GREEN. Full adapter, transport, classifier and output-field-leak suites pass. Co-authored-by: AlexanderBFoley <92330381+AlexanderBFoley@users.noreply.github.com>	2026-06-10 20:45:16 -07:00
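The broadened classifier in 7a1eed8268 can be sketched as a substring check. The matching rules follow the commit message ("thinking" plus either "signature" or the frozen-block wording); the function name `is_thinking_replay_400` is hypothetical and the real `error_classifier` may inspect more fields.

```python
def is_thinking_replay_400(status: int, message: str) -> bool:
    """True when a 400 should route to strip-reasoning-and-retry recovery."""
    if status != 400:
        return False
    msg = message.lower()
    if "thinking" not in msg:
        # An unrelated "cannot be modified" 400 must not be swept in.
        return False
    return "signature" in msg or "cannot be modified" in msg
```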


5310 commits