hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
brooklyn!	ed962104c8	Merge pull request #52935 from NousResearch/bb/desktop-inline-rendering feat(desktop): inline rich embeds, diagrams & alerts in assistant markdown	2026-06-26 13:36:43 -05:00
Brooklyn Nicholson	db6ced4712	feat(desktop): consent gate for inline embeds (per-embed / per-service) Embeds reach out to third parties on render, so default to a placeholder that mirrors the tool-approval UX: "Load <service>" (this embed) or "Always allow <service>" (persisted). A desktop-local store ($embedMode ask|always|off + per-service allowlist) gates the fetch with zero gateway round-trip; an Appearance setting controls the global default. Local renderers (mermaid, svg, alerts) are never gated. Addresses review feedback on outbound third-party requests.	2026-06-26 13:35:01 -05:00
Teknium	2d3071f9d4	docs(moa): clarify MoA presets are selectable on every surface (CLI, hermes model, Dashboard, Desktop, TUI) (#53211)	2026-06-26 11:16:14 -07:00
Teknium	9dd56f0dfb	docs(moa): add HermesBench results to Mixture of Agents page (#53206)	2026-06-26 11:05:07 -07:00
Teknium	3d735fe156	fix(skills-hub): surface per-tap providers (NVIDIA/OpenAI/...) in runtime search (#53191) Natural-language skill search returned a short, arbitrary list and never surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes: 1. The runtime index collapses every GitHub tap into source="github", so there was no way to find or filter by provider at the CLI — the per-tap identity only existed in the docs-site catalog. 2. HermesIndexSource.search matched only name/description/tags (not the identifier or provider) and broke at the first `limit` hits in raw index order, burying the most relevant skills. `search` also defaulted to --limit 10 against an 86k-entry catalog. Changes: - GitHubSource stamps a per-tap provider label (extra.provider) on each skill via github_provider_for(); source stays "github" so dedup/floor/index-skip logic is untouched. Flows into the built index. - HermesIndexSource.search now matches identifier + provider too, and collect-then-ranks (exact > prefix > whole-word > substring) instead of break-at-limit. - --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax provider filters for browse/search (narrows merged results by provider). - search --limit default 10 -> 25; table Source column shows the provider label for github skills. Tested: 181 unit tests pass; E2E against the live runtime index confirms 'nvidia'/'cuda' searches now surface NVIDIA-provider skills and --source nvidia narrows to exactly the NVIDIA catalog.	2026-06-26 11:04:41 -07:00
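The collect-then-rank behaviour described in the commit above can be sketched roughly as follows. This is an illustration only, not the actual `HermesIndexSource.search` code: the `Skill` record, `match_rank`, and `search` names are hypothetical, and the tie-break by index position is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    # Hypothetical record shape; the real index entries may differ.
    identifier: str
    name: str
    description: str = ""
    provider: str = ""
    tags: List[str] = field(default_factory=list)


def match_rank(query: str, text: str) -> Optional[int]:
    """Return 0 (exact), 1 (prefix), 2 (whole word), 3 (substring), or None."""
    q, t = query.lower(), text.lower()
    if not q or q not in t:
        return None
    if t == q:
        return 0
    if t.startswith(q):
        return 1
    if re.search(rf"\b{re.escape(q)}\b", t):
        return 2
    return 3


def search(skills, query, limit=25, provider=None):
    """Collect every match first, rank them, and only then apply the limit."""
    scored = []
    for pos, skill in enumerate(skills):
        if provider and skill.provider.lower() != provider.lower():
            continue
        fields = [skill.identifier, skill.name, skill.provider,
                  skill.description, *skill.tags]
        ranks = [r for r in (match_rank(query, f) for f in fields) if r is not None]
        if ranks:
            # Best rank across all fields; index position keeps ties stable.
            scored.append((min(ranks), pos, skill))
    scored.sort(key=lambda item: (item[0], item[1]))
    return [skill for _, _, skill in scored[:limit]]
```

Because the limit is applied after sorting, an exact name match at the end of a large index still outranks an early substring hit, which is the failure mode the commit describes for break-at-limit.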
Teknium	d430684d7c	fix(gateway,windows): respawn gateway windowless after GUI update (#52239 ) The post-update gateway restart path relaunched the gateway with the venv's console `python.exe` (via `get_python_path()` in `_gateway_run_args_for_profile`). On Windows this leaves a terminal window open permanently: uv's `venv\Scripts\python.exe` is a launcher shim that re-execs the base console interpreter, which allocates its own conhost — and `CREATE_NO_WINDOW` cannot suppress that second window. The clean-start path (`_spawn_detached`) already dodges this by routing through `_resolve_detached_python` to use the windowless base `pythonw.exe`; the restart watcher did not. Symptom (reported on Windows 11): after an in-app GUI update, a console window for the gateway stays open and never closes. Confirmed on the reporter's box — the running gateway was `python.exe ... gateway run --replace` with a live conhost child and the foreground "Press Ctrl+C to stop" banner, born exactly at the update's "Restarting Windows gateway" log line. Fix: - Add `gateway_windows.windowless_gateway_restart_spec(run_argv)` which rewrites a console-python gateway argv into the windowless `pythonw.exe` equivalent and returns the cwd + env overlay (VIRTUAL_ENV / PYTHONPATH / HERMES_HOME) the base interpreter needs to import `hermes_cli` without the venv launcher's site config. No-op on POSIX. - `_spawn_gateway_restart_watcher` now applies that rewrite on Windows and threads cwd= / env= into the inlined respawn Popen. Covers both restart entry points (`launch_detached_profile_gateway_restart` and `launch_detached_gateway_restart_by_cmdline`). CREATE_NO_WINDOW \| DETACHED_PROCESS \| CREATE_BREAKAWAY_FROM_JOB and the breakaway-denied fallback are all preserved. 
Verified E2E on a real Windows 11 box: drove the actual watcher against a dummy old-pid; the respawned gateway came up as `pythonw.exe` (zero console python, no conhost child) and booted fully (housekeeping + kanban dispatcher started → imports resolved under the base interpreter). Tests: TestWindowlessGatewayRestartSpec (behavior) + TestGatewayDetachedWatcherWindowsFlags regression assert. Pre-existing Linux-only failures on a Windows host (SIGKILL, systemd, docker-root) confirmed identical on the bare base.	2026-06-26 17:39:46 +00:00
Teknium	bb6a4d2a57	docs(nix): mark Nix/NixOS as no longer explicitly supported (#52975) Add a deprecation banner to the top of the dedicated Nix & NixOS setup guide and consistency notes at the Nix sections of installation, updating, and the plugin-distribution guide. Nix is now best-effort only; the supported install paths are the curl|bash installer, Docker, and Windows.	2026-06-26 10:17:43 -07:00
kyssta-exe	c0568ca95f	fix(config): use read_raw_config() in migrations to prevent expanding defaults (#40821)	2026-06-26 22:40:52 +05:30
brooklyn!	5cc4009deb	Merge pull request #52828 from helix4u/fix/desktop-backend-update-indicator fix(desktop): show remote backend updates without counts	2026-06-26 11:49:07 -05:00
kshitij	5038678647	Merge pull request #53110 from NousResearch/salvage/42203-find-shell-prefer-usershell fix(terminal): prefer $SHELL over bash for background process spawning (#42203)	2026-06-26 21:11:57 +05:30
liuhao1024	d9f1f1a1de	fix(terminal): prefer $SHELL over bash for background process spawning (#42203 ) On macOS, terminal(background=true) silently failed: the process returned a session_id and exit_code=0 but the command never ran (empty stdout, no side effects). Root cause is two interacting issues: 1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")` → /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh). 2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but drops the -c argument, so the command is swallowed (exit 0, no output). Decouple _find_shell from _find_bash: _find_shell now prefers the user's configured $SHELL on POSIX (the shell they actually log in with), falling back to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash semantics. zsh handles -lic correctly even with redirected stdin. Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick). On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty, Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and asserts the shell _find_shell selects actually executes a -lic background command. Mutation-verified: reverting _find_shell to the bash alias fails the $SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a .bash_profile->exec-zsh creates no file; zsh -lic does. Closes #42203. Supersedes #42290.	2026-06-26 20:45:32 +05:30
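A minimal sketch of the shell-selection order described in the commit above, assuming a simplified signature. The function names and the injectable `is_usable` / `find_bash` parameters are hypothetical and exist only to make the sketch testable; they are not the project's `_find_shell` / `_find_bash`.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def _is_executable_file(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def _which_bash() -> Optional[str]:
    # Stand-in for the bash lookup the commit refers to as _find_bash.
    return shutil.which("bash")


def find_shell(
    env: Mapping[str, str],
    is_windows: bool,
    is_usable: Callable[[str], bool] = _is_executable_file,
    find_bash: Callable[[], Optional[str]] = _which_bash,
) -> Optional[str]:
    """Prefer the user's $SHELL on POSIX; fall back to bash otherwise."""
    if not is_windows:
        shell = (env.get("SHELL") or "").strip()
        # Only trust $SHELL when it is set and points at something runnable.
        if shell and is_usable(shell):
            return shell
    return find_bash()


def background_argv(shell: str, command: str) -> List[str]:
    # Mirrors the spawn shape quoted in the commit message.
    return [shell, "-lic", f"set +m; {command}"]
```

In real use the caller would pass `os.environ` and `os.name == "nt"`; keeping both as parameters lets the unset, missing, and empty `$SHELL` cases be exercised without touching the process environment.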
xxxigm	65be0061e0	fix(hermes): heal broken managed Node tree instead of PATH fallback When a Hermes-managed node/npm/npx shim exists but fails --version, redownload the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall back to system npm on PATH when a managed tree is present. POSIX heal probes node, npm, and npx (npm can break while node still runs).	2026-06-26 20:10:20 +05:30
xxxigm	3c5bcd3eee	test(hermes): cover broken managed npm fallback in node resolution Add POSIX runnable-probe coverage plus Windows fallback wiring that skips a managed npm.cmd when node_tool_runnable rejects it.	2026-06-26 20:10:20 +05:30
xxxigm	9274f73e48	fix(hermes): fall back when managed node/npm fails health probe A stale or partial Hermes-managed Node tree under the active HERMES_HOME can leave bin/npm behind while lib/cli.js is missing. File-existence checks alone made hermes update pick that broken npm and skip healthy system npm on PATH. Probe managed candidates with --version before preferring them.	2026-06-26 20:10:20 +05:30
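The health probe described in the commit above (run the candidate with `--version` rather than trusting that the file exists) might look roughly like this. The helper names and the 10-second timeout are assumptions for illustration, not the project's code.

```python
import subprocess
import sys
from typing import Iterable, List, Optional, Sequence


def tool_runnable(argv_prefix: Sequence[str], timeout: float = 10.0) -> bool:
    """True only if `<tool> --version` starts and exits 0 within the timeout."""
    try:
        result = subprocess.run(
            [*argv_prefix, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, bad permissions, or a hung shim all count as broken.
        return False
    return result.returncode == 0


def pick_first_runnable(candidates: Iterable[Sequence[str]]) -> Optional[List[str]]:
    """Return the first candidate command that passes the probe, else None."""
    for candidate in candidates:
        if tool_runnable(candidate):
            return list(candidate)
    return None


if __name__ == "__main__":
    # The running interpreter is a convenient always-present probe target.
    print(tool_runnable([sys.executable]))
```

Ordering the candidates as managed first, system second reproduces the fallback in this commit; the later commit in this log replaces that fallback with a re-download when a managed tree is present.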
kshitij	7b2c51152a	Merge pull request #52990 from NousResearch/salvage/52889-backup-projects-kanban fix(backup): include projects.db and kanban boards in pre-update snapshot (#52889)	2026-06-26 20:09:15 +05:30
0xDevNinja	9ef49cd78f	fix(backup): include projects.db, kanban boards, and sibling stores in pre-update snapshot (#52889 ) projects.db (per-profile project store) and kanban.db were missing from _QUICK_STATE_FILES, so the pre-update quick snapshot never backed them up. On a desktop upgrade, when the update flow removes/replaces the file and the post-update schema-init re-creates an empty one, all user-created projects, folder mappings, the active-project pointer, kanban board bindings, and tasks vanish silently — no error. Add the per-profile user-created stores to the snapshot set: - projects.db — project store - response_store.db — gateway conversation history / tool payloads (WAL) - memory_store.db — holographic memory facts/entities (WAL) - verification_evidence.db — agent verification audit trail - kanban.db — default board (back-compat <root>/kanban.db) - kanban/boards — non-default boards (<root>/kanban/boards/<slug>/kanban.db + metadata); workspaces/ and attachments/ subtrees are skipped as large + regenerable. Also: the directory-branch of create_quick_snapshot now routes *.db through the WAL-safe _safe_copy_db (SQLite backup() API), matching the top-level file path — previously a non-default board DB with an open WAL could be copied inconsistently. Salvaged from #52930 by @0xDevNinja (authorship preserved via cherry-pick). On top of the original (which covered only projects.db + the default kanban.db), this adds: non-default-board coverage, the three sibling per-profile DBs that meet the same upgrade-wipe criteria, WAL-safe directory copies, and a workspaces/attachments skip to avoid snapshot bloat (×20 retained). 8 tests, all mutation-verified; E2E verified snapshot→wipe→restore preserves all six store types on the real code path. Closes #52889. Supersedes #52930.	2026-06-26 19:23:33 +05:30
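The WAL-safe copy mentioned in the commit above relies on SQLite's online backup API. A minimal sketch using the standard library's `sqlite3.Connection.backup` follows; `safe_copy_db` and `snapshot_tree` are illustrative names, and skipping the `-wal` / `-shm` sidecar files is an assumption of this sketch, not something the commit states.

```python
import shutil
import sqlite3
from pathlib import Path

# Subtrees the commit describes as large and regenerable.
SKIP_DIRS = {"workspaces", "attachments"}


def safe_copy_db(src: Path, dst: Path) -> None:
    """Copy a SQLite database via backup() instead of a raw file copy."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(str(src))
    try:
        target = sqlite3.connect(str(dst))
        try:
            # backup() reads a consistent snapshot, including pages that
            # still live only in the write-ahead log.
            source.backup(target)
        finally:
            target.close()
    finally:
        source.close()


def snapshot_tree(src_root: Path, dst_root: Path) -> None:
    """Copy a directory, routing *.db files through safe_copy_db."""
    for path in sorted(src_root.rglob("*")):
        rel = path.relative_to(src_root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts[:-1]):
            continue
        if path.name.endswith(("-wal", "-shm")):
            continue  # assumption: sidecars are redundant after backup()
        if path.suffix == ".db":
            safe_copy_db(path, dst_root / rel)
        else:
            (dst_root / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dst_root / rel)
```

A plain file copy of a database with an open WAL can miss committed rows that have not been checkpointed yet, which is the inconsistency the commit fixes for non-default boards.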
Ben	8ab7246c45	fix(gateway): stamp drain marker with instantiation epoch so a durable-volume restart clears it (NS-570) The external-drain marker .drain_request.json is written under HERMES_HOME, which on Hermes Cloud is a persistent Fly volume (/opt/data). A begin-drain marker therefore SURVIVES the post-update machine restart. But the disruptive lifecycle actions a drain protects (auto-update / image migrate / env edit / profile change) all restart the machine — which is exactly the signal the drain is over. The freshly-restarted gateway re-read the orphaned marker on its startup reconcile and parked itself back in 'draining', refusing every new turn indefinitely (NS-570: ~52 min until manually cleared). Fix: stamp the marker with an identity of THIS container/VM instantiation (kernel boot_id + PID 1 start time, read from /proc) and treat a marker whose epoch differs from the current instantiation as absent. A deliberate restart → new PID 1 → new epoch → stale marker ignored → gateway boots 'running'. A marker written during the current instantiation (the live drain) still matches; an s6 respawn of just the gateway (PID 1/init unchanged) keeps the same epoch, so an in-flight drain is still honoured (D4a reversibility preserved). The staleness check is lenient and never fail-closed: a legacy marker with no epoch, a corrupt/contentless marker, or an environment with no /proc (epoch unavailable) all degrade to the original presence-only behaviour. NAS is untouched — it only ever POSTs begin/cancel-drain over HTTP; the marker file is purely gateway-internal IPC. The fix is entirely within gateway/drain_control.py; the watcher and the dashboard endpoint go through the same drain_requested()/write_drain_request() chokepoints and need no functional change.	2026-06-26 18:59:41 +05:30
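The staleness rule in the commit above can be sketched as two small functions: one that derives an instantiation epoch from `/proc`, and one that decides whether a marker should still be honoured. The `epoch` field name, the `boot_id:starttime` string format, and both function names are assumptions made for illustration; the real logic lives in `gateway/drain_control.py`.

```python
import json
from pathlib import Path
from typing import Optional


def current_epoch(proc: Path = Path("/proc")) -> Optional[str]:
    """Identity of this boot/container: kernel boot_id plus PID 1 start time."""
    try:
        boot_id = (proc / "sys/kernel/random/boot_id").read_text().strip()
        stat = (proc / "1/stat").read_text()
        # starttime is field 22 of /proc/<pid>/stat. Split after the ")" that
        # closes the comm field, because comm itself may contain spaces.
        start_time = stat.rsplit(")", 1)[1].split()[19]
    except (OSError, IndexError):
        return None  # no /proc (or unreadable): epoch unavailable
    return f"{boot_id}:{start_time}"


def marker_is_active(marker_text: Optional[str], epoch_now: Optional[str]) -> bool:
    """Decide whether a drain marker should be honoured."""
    if marker_text is None:
        return False  # no marker file at all
    try:
        data = json.loads(marker_text)
    except ValueError:
        return True  # corrupt or empty marker: presence-only behaviour
    marker_epoch = data.get("epoch") if isinstance(data, dict) else None
    if not marker_epoch or epoch_now is None:
        return True  # legacy marker or unknown epoch: presence-only behaviour
    # A marker from an earlier instantiation is treated as absent.
    return marker_epoch == epoch_now
```

Every uncertain case falls back to honouring the marker, matching the commit's statement that the check is lenient and degrades to the original presence-only behaviour.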
Dr1985	e3db1ef92d	fix(macos): clearly distinguish launchd supervision from detached fallback in gateway status ## Description On macOS 26.x, `launchctl bootstrap` and `launchctl kickstart` return exit code 5 ("Input/output error"), which Hermes already anticipates and handles by spawning a detached fallback process. However, the gateway status reporting is ambiguous: - `gateway status` says "Gateway service is loaded" (because `launchctl list` returns exit 0) - But `launchctl print` shows `state = not running` — launchd isn't actually supervising anything - The detached fallback PID running is invisible to the status command - Users can't tell whether auto-start at login and auto-restart on crash are available ### Root Cause Two problems in `hermes_cli/gateway.py`: 1. `_probe_launchd_service_running()` (line 1067): Determined launchd service liveness solely by `launchctl list <label>` exit code.
On macOS 26, this returns 0 even when the service is only registered but not running (output lacks a `"PID"` field). This caused `GatewayRuntimeSnapshot.service_running = True` incorrectly, which suppressed the process/service mismatch warning. 2. `launchd_status()` (line 3569): Used the same binary "loaded/not loaded" check without inspecting whether launchd actually has a PID, whether a detached fallback is running, or whether auto-start/restart are available. ### Changes `hermes_cli/gateway.py`: 1. New `_parse_launchd_pid_from_list_output()` helper — Extracts the PID from `launchctl list` output. When launchd is actively supervising, the output includes `"PID" = <number>;`. When only registered but not running, no PID field is present. 2. Fixed `_probe_launchd_service_running()` — Now requires a PID in the `launchctl list` output to confirm launchd is actually supervising. This correctly sets `service_running = False` when launchd has the service registered but `state = not running`, which triggers the existing process/service mismatch detection. 3. Reworked `launchd_status()` — Reports clearly separated information: - LaunchAgent plist currentness (stale or current) - Whether launchd is actively supervising (with PID) - Whether a detached fallback PID is running - Whether auto-start at login and auto-restart on crash are available - When launchd supervision is known to be unavailable, explains why 4. Persistent unsupported marker (`~/.hermes/.gateway-launchd-unsupported`) — Written when `_launchd_fallback_to_detached()` is called (launchd exit 5/125). Allows `launchd_status()` to explain why launchd can't supervise even when no fallback process is currently running. Cleared automatically when a future bootstrap/kickstart succeeds (e.g., after an OS update fixes the issue). 5. Updated `_print_gateway_process_mismatch()` — Distinguishes the managed detached fallback from a genuinely manual `nohup hermes gateway run`, providing accurate guidance for each case. 
### Status Output Examples Before (macOS 26, fallback active): ``` Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist ✓ Service definition matches the current Hermes install ✓ Gateway service is loaded { "Label" = "ai.hermes.gateway"; "OnDemand" = true; ... }; ``` After (macOS 26, fallback active): ``` Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist ✓ Service definition matches the current Hermes install ⚠ Gateway service is registered but launchd is not supervising it launchd cannot manage the gateway on this macOS version. ✓ Detached fallback process is running (PID 12345) Cron jobs will fire. Stop with: hermes gateway stop ⚠ Auto-start at login and auto-restart on crash are NOT available. ``` After (normal launchd supervision): ``` Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist ✓ Service definition matches the current Hermes install ✓ Gateway is supervised by launchd (PID 12345) Auto-start at login and auto-restart on crash are available. ``` ### Tests Updated 5 existing tests and added 11 new tests in `tests/hermes_cli/test_gateway_service.py`: - PID parsing from `launchctl list` output (with PID, without PID, empty, unquoted PID) - `_probe_launchd_service_running()` requires PID presence - Unsupport marker lifecycle (write, clear, persist across fallback) - Marker cleared on successful bootstrap - `launchd_status()` reporting: supervised, fallback-running, fallback-unavailable - Existing fallback tests now verify marker creation ### Related Issues - Issue #23387 (original macOS 26 launchd workaround) - Issue #42524 (this issue)	2026-06-26 16:30:30 +05:30
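The PID-based liveness check described in the commit above could be sketched as follows. The regular expression and both function names are illustrative assumptions, not the project's `_parse_launchd_pid_from_list_output()`; the sample text follows the `"PID" = <number>;` shape quoted in the commit message.

```python
import re
from typing import Optional

# Matches a line such as `"PID" = 12345;` and tolerates an unquoted key.
_PID_RE = re.compile(r'^\s*"?PID"?\s*=\s*(\d+)\s*;', re.MULTILINE)


def parse_launchd_pid(list_output: Optional[str]) -> Optional[int]:
    """Extract the supervised PID from `launchctl list <label>` output."""
    match = _PID_RE.search(list_output or "")
    return int(match.group(1)) if match else None


def launchd_supervising(returncode: int, list_output: Optional[str]) -> bool:
    """Exit 0 alone only means 'registered'; a PID means 'actually running'."""
    return returncode == 0 and parse_launchd_pid(list_output) is not None


REGISTERED_ONLY = '{\n\t"Label" = "ai.hermes.gateway";\n\t"OnDemand" = true;\n};'
SUPERVISED = '{\n\t"Label" = "ai.hermes.gateway";\n\t"PID" = 12345;\n};'
```

With these samples, `launchd_supervising(0, REGISTERED_ONLY)` is false even though the exit code is 0, which is the case the old exit-code-only probe reported as running.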
kshitij	1c832762a8	Merge pull request #52983 from kshitijk4poor/chore/author-map-dr1985 chore: add Dr1985 to AUTHOR_MAP for launchd salvage	2026-06-26 16:29:22 +05:30
kyssta-exe	07cc567dfa	fix(security): add circuit breaker for tirith crashes to prevent agent hangs (#41400)	2026-06-26 15:26:08 +05:30
brooklyn!	ca82d0accc	Merge pull request #52993 from NousResearch/bb/desktop-clarify-redesign feat(desktop): redesign the clarify prompt + fix its awaiting-input states	2026-06-26 03:57:43 -05:00
Brooklyn Nicholson	54b50037e1	fix(desktop): treat a pending prompt as paused-on-you, not working A clarify/approval/sudo/secret prompt blocks the turn on the user, but the UI treated it as an in-flight turn: the "thinking" timer kept ticking and Esc interrupted the run — discarding a question you might want to come back to. Add $activeSessionAwaitingInput (the pet's awaitingInput concept, scoped to the active session) and use it to suppress the stall indicator and disarm Esc while a prompt waits. Clear the session's prompts (and needsInput) on Stop and on turn end so a resolved/aborted turn can't leave a dead panel or a stuck "needs input" dot.	2026-06-26 03:55:34 -05:00
Brooklyn Nicholson	8559246bfb	feat(desktop): rebuild the clarify prompt to match the chat UI The inline clarify panel used its own card tokens, an animated ring, and oversized spacing — out of step with every other tool row. Rebuild it on the shared --ui-/--conversation- tokens: a compact panel, letter-key badges (A/B/C…) that double as a/b/c… shortcuts, an inline content-sizing "Other" field (CSS field-sizing — no view swap, no layout shift on focus), and a Continue button so picking an option selects rather than auto-sends. Selection lives on the letter badge alone (solid primary; outlined while Other is focused-but-empty). Also settle the panel into the standard tool block once the turn stops running, so a stopped turn no longer strands a live, unanswerable prompt.	2026-06-26 03:55:29 -05:00
kshitij	1aa458a1e6	Merge pull request #52920 from NousResearch/salvage/38798-toolset-validation fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798)	2026-06-26 14:14:55 +05:30
Brooklyn Nicholson	da0ed979fa	feat(desktop): zoomable primitive — open full, pan/zoom, copy Add a content-agnostic Zoomable primitive (useZoomPan hook + overlay viewer): click to open full-screen, wheel-zoom toward the cursor, drag to pan, toolbar zoom/reset, and an optional copy action. Wire Mermaid diagrams into it with copy-as-PNG; reusable for other inline content later.	2026-06-26 03:40:49 -05:00
kshitijk4poor	05ba5f3962	chore: add Dr1985 to AUTHOR_MAP for launchd salvage (#42567)	2026-06-26 14:09:11 +05:30
lEWFkRAD	41ede84b93	fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798 ) A config migration (or hand-edit) that leaves an invalid toolset name in `platform_toolsets` — e.g. the #38798 corruption that rewrote `hermes-cli` to the non-existent `hermes` — silently disabled all affected tools: resolve_toolset() returns [] for an unknown name, so the agent quietly lost its tools with no error, warning, or log entry and degraded to text-only replies. Surface it loudly at two points: - After migration (migrate_config): validate platform_toolsets and record/print a warning per unknown name, with a `hermes-<platform>` suggestion when that would have been valid (the exact #38798 shape). - At runtime (_get_platform_tools): if a platform was explicitly configured but every toolset name is invalid, log a warning when tools are resolved for a session — so an ALREADY-corrupted config is caught at startup, not only on the next `hermes update`. Logic lives in a new pure, side-effect-free helper (toolset_validation.py) with validate_toolset injected, so it is unit-testable without the tool registry. Note: the original v25→v26 migration that caused the corruption no longer exists (config format is now v30; no migration step rewrites toolset names). This change is the durable defense against the silent-failure mode regardless of cause, matching the issue's "Expected: log a warning". Salvaged from #39207 by @lEWFkRAD (authorship preserved via cherry-pick). Tests: 9 helper cases (incl. the #38798 corruption shape, mixed valid/invalid, zero-tools state, non-dict/scalar/non-string) + a runtime caplog test — both the helper warning and the runtime guard mutation-verified to fail without the fix. Closes #38798. Supersedes #39581 (prevent-in-v25→v26 — that path is gone), #41006 / #40208 (repair-migration for already-corrupted configs).	2026-06-26 14:07:43 +05:30
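A pure helper with an injected validator, as the commit above describes, might be shaped like this. The function name, the warning wording, and the handling of non-dict or non-string values are assumptions made for this sketch; only the `hermes-<platform>` suggestion rule comes from the commit message.

```python
from typing import Any, Callable, List


def toolset_warnings(
    platform_toolsets: Any,
    validate_toolset: Callable[[str], bool],
) -> List[str]:
    """Return one warning per unknown toolset name; never raises on odd input."""
    warnings: List[str] = []
    if not isinstance(platform_toolsets, dict):
        return warnings  # assumption: malformed sections are ignored here
    for platform, names in platform_toolsets.items():
        if isinstance(names, str):
            names = [names]
        if not isinstance(names, (list, tuple)):
            continue
        for name in names:
            if not isinstance(name, str) or validate_toolset(name):
                continue
            message = f"platform_toolsets.{platform}: unknown toolset {name!r}"
            suggestion = f"hermes-{platform}"
            # Suggest hermes-<platform> only when that name is itself valid.
            if validate_toolset(suggestion):
                message += f" (did you mean {suggestion!r}?)"
            warnings.append(message)
    return warnings
```

Passing the validator in as an argument keeps the helper free of any tool-registry import, so a test can supply something as small as `{"hermes-cli"}.__contains__`.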
Brooklyn Nicholson	e36d9862ec	feat(desktop): render embeds, fences and alerts in assistant markdown Wire the embeds module into the markdown surface: bare provider autolinks unfurl to inline embeds, ```mermaid/```svg fences route to the rich renderers, and `> [!NOTE]`-style blockquotes become alert callouts. Labeled links stay plain.	2026-06-26 03:22:14 -05:00
Brooklyn Nicholson	0c190083cd	feat(desktop): lazy embed renderers + fenced diagrams/alerts Per-kind renderers, each a lazy split chunk: plain-iframe video/maps (wheel chains to the transcript; maps gate scroll behind ⌘), the in-document blockquote-script path for X/Instagram, the dark Spotify player, and the YouTube iframe. Adds Mermaid and DOMPurify-sanitised SVG fences and GFM alert callouts, all sized to 33dvh and theme-matched to avoid white color-scheme artifacts. Main-process stamps a Referer on YouTube embed requests.	2026-06-26 03:22:08 -05:00
Brooklyn Nicholson	81ac562bf0	feat(desktop): inline embed detection + module primitives Pure, synchronous URL→descriptor matchers for YouTube, Vimeo, Instagram, Pinterest, TikTok, X, Spotify, Google Maps and OpenStreetMap, plus the shared embed primitives (error boundary, fail card, escape-html, dark-mode hook, sizing token). Declares the mermaid + dompurify deps used by the fenced renderers.	2026-06-26 03:21:59 -05:00
helix4u	063fe4f6ef	fix(auxiliary): fallback on invalid provider responses	2026-06-26 13:49:46 +05:30
teknium1	fbfccbb3ee	fix(security): align cron invisible-unicode set with install-time scanner The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17. The cron-local set was missing U+2062-U+2064 (invisible math operators) and U+2066-U+2069 (directional isolates), so a directive obfuscated with one of those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past the runtime cron gate while being caught at install time. Import the canonical set so the cron tripwire and install scanner can't drift apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged. Fixes #35075 Co-authored-by: rlaope <piyrw9754@gmail.com>	2026-06-26 01:11:11 -07:00
Shannon Sands	a0dc92450b	Split dashboard PTY reconnect tests	2026-06-26 01:06:02 -07:00
Shannon Sands	41f8126148	Reconnect dashboard PTY chat after socket drops	2026-06-26 01:06:02 -07:00
Shannon Sands	6a319f570f	Settle TUI resume scroll after hydration	2026-06-26 01:05:26 -07:00
Teknium	619dc4a561	fix(whatsapp_cloud): resolve reply-to text so the agent sees reply context (#52957) Replies on WhatsApp Cloud arrived at the agent with reply_to_id set but reply_to_text=None, so run.py never injected the "[Replying to: ...]" disambiguation prefix (it gates on reply_to_text). Meta's webhook context object carries only the quoted message's id, never its text. Index (chat_id, wamid) -> text in rich_sent_store on every inbound message and every outbound text send -- the same store that solved the identical Telegram rich-send problem -- then look up the quoted text in _build_message_event_from_cloud and populate reply_to_text plus reply_to_is_own_message, derived from context.from versus the business number.	2026-06-26 01:05:05 -07:00
Ben	19b2624404	feat(gateway): external drain trigger + accept-gating (begin/cancel + control channel) Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible quiesce-without-restart machinery NAS drives during a lifecycle action (D4a). These ship together because the endpoint, the control channel, and the gateway state machine are one coherent slice. 2.2 — control channel (gateway/drain_control.py, new): The dashboard has no HTTP path into a running gateway (guardrails: "there is NO external control channel into a running gateway"); restart/drain is driven only by markers the gateway reacts to. So begin/cancel-drain writes/removes a presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write, never-raises read; a corrupt marker reads as present-contentless → fail-safe toward quiescing). This is Q-B option A. 2.2 — gateway state machine (gateway/run.py): - _external_drain_active flag, DISTINCT from the shutdown _draining flag: this one does NOT exit the process and is fully reversible. - _enter_external_drain / _exit_external_drain: idempotent transitions that flip gateway_state→draining / →running via _update_runtime_status (preserving the live active_agents count). exit refuses to revert to running during a real shutdown or after the loop stops (shutdown wins). - _drain_control_watcher: 1s background task (modelled on _handoff_watcher) reconciling accept-state with the marker; honours a marker that survived a restart on its first tick. Registered alongside the other watchers in start. - New-turn accept gate in _handle_message, placed BEFORE the session-slot claim: when draining, refuse to START a new turn (so active_agents can only fall → no TOCTOU race), while in-flight turns finish untouched. Internal/ system events (restart-recovery replays, bg-process completions) bypass it. 2.1 — endpoint (hermes_cli/web_server.py): POST /api/gateway/drain {action: drain\|cancel}. 
Authenticated by the Task-2.0a token seam (the drain plugin registered this exact path as a token route); attributes the request to the verified token principal. Begin writes the marker, cancel removes it — the gateway process owns the actual transition. Force-override (D6) is NOT here; it maps onto the existing immediate /api/gateway/restart force path. Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next): - tests/gateway/test_external_drain_control.py — marker contract (write/clear/ read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/ loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and in-flight-not-interrupted. 15 tests. - tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/ cancel/cancel-idempotent/bad-action-400. 6 tests. - dashboard.drain_auth config section already added in 2.0b commit. All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed. Intentionally deferred: - HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a real begin-drain marker, prove the 5-point checklist a–e. - Spec-doc status flip + Phase-2 PR. Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin, token-auth, and web_server-endpoint suites green.	2026-06-26 00:47:19 -07:00
Ben	2e322466b1	feat(dashboard-auth): drain shared-bearer-secret provider plugin Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A. plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic): - DrainSecretProvider: non-interactive provider, supports_token=True. Verifies an inbound Authorization bearer token against a per-agent shared secret with hmac.compare_digest (constant-time, no timing oracle) and, on a match, vouches for the caller as the "drain-control" principal scoped to "drain". The five interactive ABC methods raise NotImplementedError; verify_session returns None (stacks harmlessly in the cookie-verify loop). - assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or below 128 bits Shannon entropy — so a weak/structured/repeated secret can never be silently accepted. Enforced both at register() (friendly skip reason) and in __init__ (raises — defence in depth). - register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a strong secret, registers the provider AND opts /api/gateway/drain into the generic token-auth seam via register_token_route(). Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET (per-agent, provisioned by NAS at deploy). Behavioural knobs only (dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline. 
Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate (strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token (match → scoped principal, wrong/empty → None, custom scope), protocol compliance, interactive-methods-raise, and register() (skip-no-secret, fail-closed-weak-secret, strong-env-secret registers + route opt-in, config scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed. Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous. Intentionally deferred: - The begin/cancel-drain endpoint handler itself — Task 2.1. - The dashboard→gateway control channel — Task 2.2. Build status: dashboard-auth + drain-plugin suites green.	2026-06-26 00:47:19 -07:00
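The entropy gate and constant-time comparison described in the commit above can be sketched as below. The thresholds (43 characters, 16 distinct characters, 128 bits) are taken from the commit message; how the real `assess_secret_strength()` estimates Shannon entropy is not stated, so the per-character-entropy-times-length estimate, the return convention, and the `verify_token` helper are assumptions.

```python
import hmac
import math
from collections import Counter
from typing import Optional


def shannon_bits(secret: str) -> float:
    """Estimated total entropy: per-character Shannon entropy times length."""
    if not secret:
        return 0.0
    n = len(secret)
    per_char = -sum((c / n) * math.log2(c / n) for c in Counter(secret).values())
    return per_char * n


def assess_secret_strength(
    secret: str,
    min_chars: int = 43,
    min_distinct: int = 16,
    min_bits: float = 128.0,
) -> Optional[str]:
    """Return None when the secret is acceptable, else a rejection reason."""
    if len(secret) < min_chars:
        return f"too short: {len(secret)} < {min_chars} characters"
    if len(set(secret)) < min_distinct:
        return f"too few distinct characters: {len(set(secret))} < {min_distinct}"
    if shannon_bits(secret) < min_bits:
        return f"entropy estimate below {min_bits:.0f} bits"
    return None


def verify_token(presented: str, expected: str) -> bool:
    """Constant-time comparison, so timing does not leak a matching prefix."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A secret produced by `secrets.token_urlsafe(32)` is 43 URL-safe characters, which lines up with the minimum length quoted in the commit.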
Ben	cb9cb6ba1c	feat(dashboard-auth): generic non-interactive API-token capability Task 2.0a of the safe-shutdown drain-coordination plan. Widens the dashboard auth framework GENERICALLY to support non-interactive (service-to-service) bearer-token auth, mirroring the existing supports_password precedent. This is a reusable capability — any future machine-credential provider plugs in without core changes (decisions.md Q-C). The drain bearer-secret plugin (Task 2.0b) is the first consumer, not the definition. - base.py: add TokenPrincipal dataclass (the token analog of Session) + supports_token capability flag + verify_token() on the ABC (default raises NotImplementedError so a misconfigured provider fails loud). Contract mirrors verify_session stacking: return None for unrecognised tokens (never raise), raise ProviderError only on a genuine backing-store outage. - registry.py: list_token_providers() — the supports_token subset, in registration order. Empty when none registered (token routes fail closed). - token_auth.py (new): route-agnostic seam. Routes opt in via register_token_route(exact path); token_auth_middleware owns the auth decision for those routes only — authenticate via stacked providers, attach request.state.token_principal + token_authenticated, pass through. 401 on missing/unrecognised token, 503 when a provider was unreachable, untouched passthrough for non-token routes. Fails closed (never open). - web_server.py: install the seam OUTERMOST (registered last → runs first). Both downstream gates (legacy auth_middleware + gated_auth_middleware) honour request.state.token_authenticated and skip enforcement, so a token-authed service request is never bounced to /login. - audit.py: TOKEN_AUTH_SUCCESS / TOKEN_AUTH_FAILURE events. 
Tests: tests/hermes_cli/test_dashboard_token_auth.py — ABC flag default, verify_token NotImplementedError, registry filter, bearer extraction (case-insensitive scheme, malformed/non-bearer → ""), provider stacking (first-match-wins, unreachable-remembered, unreachable-then-valid, buggy provider doesn't crash the gate), and the seam's passthrough/401/503/ fail-closed behaviour. 29 new tests; full dashboard-auth suite 169 passed. Intentionally deferred: - The concrete shared-bearer-secret provider plugin — Task 2.0b. - The begin/cancel-drain endpoint that registers itself as a token route — Task 2.1. Build status: dashboard-auth + plugin-hook suites green.	2026-06-26 00:47:19 -07:00
Teknium	099df3cd89	fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925) The known_c2_framework threat pattern included 'praxis' in its alternation alongside genuine offensive-security tool brands (Cobalt Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those distinctive brand names, 'praxis' is a common English word (Greek for practice/action) and a legitimate agent name, so any context file that mentioned an agent named Praxis matched at 'context' scope and the whole AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it reached the system prompt. Remove 'praxis' from the alternation and add a guard comment: every token in this list must be a distinctive tool brand, not a common word. Real C2 brands still fire.	2026-06-26 00:36:01 -07:00
teknium1	4d0dd6bd52	test(mcp): make invalid_client tests interactive under hermetic env The new _maybe_flag_poisoned_client tests built a provider via get_or_build_provider without an interactive stdin. Under the hermetic test env (no TTY, no cached tokens), the non-interactive guard in mcp_oauth_manager._make_provider raised OAuthNonInteractiveError before the provider was built, failing 6 tests in CI parity (they passed locally where stdin was a TTY). Thread monkeypatch into _provider_with_token_endpoint and present an interactive stdin, matching the sibling test_manager_builds_hermes_provider_subclass.	2026-06-26 00:35:27 -07:00
Max Hsu	075f93ad78	fix(mcp): auto-recover from invalid_client on stale OAuth client registration Fixes #36767. Two complementary recoveries for the recurring "delete three cache files and re-auth by hand" ritual when an MCP server's dynamically-registered OAuth client goes dead server-side (IdP redeploy / DB wipe / rebrand): - Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs auth-flow responses and, on a 400/401 `invalid_client` from the discovered token endpoint, backs up + deletes `<server>.client.json` and `.meta.json` and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client registration on the next flow. Conservative by construction: only dynamically-registered (non config-supplied) clients, only the token endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's `invalid_client_metadata` does not trip it); best-effort so a miss never breaks the live flow. Covers both code-exchange and refresh when the token endpoint was discovered. Tokens are preserved. - `hermes mcp reauth [<name>\|--all]`: the reporter's primary symptom — the IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK only sees a callback timeout), so it cannot be auto-detected. The new command re-auths one or ALL `auth: oauth` servers, serially: one browser flow at a time, which also fixes the startup popup storm when several servers are stale at once. Single-server reauth is factored out of `mcp login` and shared. Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint, success-response, pre-registered, and invalid_client_metadata negative guards; a bridge integration test driving the real async_auth_flow generator to prove the detection hook preserves the bidirectional asend() forwarding contract; reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh 122/122 green for the touched suites; check-windows-footguns.py and ruff clean. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 00:35:27 -07:00
Ben Barclay	6e4e5967f7	feat(relay): multi-platform-per-agent — list identity, provision-loop, N-hello, per-frame egress (Phase 1.5) (#52830 ) Cut over the agent half of Shape A (D-Q1.5a/b.1/c) to front a SET of platforms on one relay WS: - relay_platform_identities() parses GATEWAY_RELAY_PLATFORMS (list) + GATEWAY_RELAY_BOT_IDS (JSON keyed map {platform:{botId,username?}}). Cut over from the scalar GATEWAY_RELAY_PLATFORM/_BOT_ID (no fallback, D-Q1.5c). - self_provision_relay() loops one /relay/provision per platform under one gatewayId+secret, partial-failure-tolerant. - WebSocketRelayTransport takes the identity SET, sends one hello per identity (connector accumulates the advertised set), and stamps the per-frame OutboundFrame.platform + its matching advertised botId on outbound. - RelayAdapter remembers each chat's underlying source.platform (mirroring the existing guild/dm scope capture) and tags the reply's egress platform. - send_relay_policy() declares one relevance policy per fronted platform (the connector keys policy by (tenant,platform,instanceId)). Single-platform deploys are byte-identical on the wire (1-element list, no per-frame tag -> connector session-default fallback). typecheck/ruff clean; relay unit 221 pass (+10 new); all 15 cross-repo E2E drivers green vs connector origin/main.	2026-06-26 17:32:46 +10:00
brooklyn!	a2b49e60b6	Merge pull request #52412 from GodsBoy/fix/verify-on-stop-messaging-surface-leak fix(agent): gate verify-on-stop nudge off for messaging surfaces	2026-06-26 02:30:08 -05:00
kshitij	7d568293f9	Merge pull request #52891 from kshitijk4poor/salvage/52623-aux-host fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608)	2026-06-26 12:24:19 +05:30
konsisumer	3cf900eb67	fix(install): discard managed lockfile churn before stashing	2026-06-25 23:49:11 -07:00
Ben Barclay	cb7d1f68f8	fix(relay): accept is_reconnect kwarg in RelayAdapter.connect (#52911 ) The gateway reconnect watcher (gateway/run.py) recovers a platform after a fatal adapter error by building a fresh adapter and calling connect(is_reconnect=True). Every BasePlatformAdapter implements connect(*, is_reconnect: bool = False) for this — except RelayAdapter, whose connect() was bare. So the watcher's recovery path raised: TypeError: connect() got an unexpected keyword argument 'is_reconnect' Observed live on a hosted staging agent: after a fatal relay adapter error the watcher could never re-establish relay, so the shared-bot inbound never reached the gateway and Discord DMs stopped (dashboard surfaced the TypeError). Relay deliberately ignores the flag: the #46621 server-side-queue-preservation concern doesn't apply, because relay's outage buffer is the connector's durable buffer (replayed on the transport's re-handshake), not a gateway-side queue the adapter owns. Routine WS drops are already handled by the transport's own reconnect supervisor (WebSocketRelayTransport, reconnect=True); the watcher path is fatal-error recovery, and the fatal handler disconnect()s the old adapter (cancelling its supervisor) before a fresh adapter+transport is built, so there is no double-dial. Adds two regression tests (both proven red without the fix): connect(is_reconnect=True) reaches the same transport-less RuntimeError instead of TypeError, and the signature matches BasePlatformAdapter.connect.	2026-06-26 16:46:09 +10:00
brooklyn!	0f81b0d458	Merge pull request #52901 from NousResearch/bb/desktop-tui-lint-fixes style(desktop,tui): fix all lint/type/formatting issues	2026-06-26 01:07:14 -05:00
Brooklyn Nicholson	62fe9fd101	style(desktop,tui): fix all lint/type/formatting issues Bring apps/desktop and ui-tui to a clean state for typecheck, eslint, and prettier: - Run prettier across both trees (printWidth/wrap drift; prettier is not CI-enforced for these JS projects, so main had accumulated drift). - Apply eslint --fix for padding-line-between-statements and perfectionist import/export sorting. - Manual fixes for non-auto-fixable rules: - remove unused node:net import in electron/main.cjs (uses Electron net) - replace inline `typeof import(...)` annotations with top-level `import type * as EnvModule` in two ui-tui test files - scoped eslint-disable no-control-regex on intentional sentinel/ANSI regexes (mathUnicode.ts, text.ts) - resolve react-hooks/exhaustive-deps per-case: correct swapped/missing deps, collapse redundant session.* members, and justified disables on settings mount-only data-load effects to preserve run-once behavior No behavior changes; test pass/fail counts are unchanged from the main baseline.	2026-06-26 01:04:33 -05:00
Moonsong	4e66bf1f80	fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608) When operator config has provider=anthropic with model.base_url pointing at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic), the auxiliary Anthropic path was unconditionally applying that override. Main-session traffic routed correctly because the main path attaches the right credential for the actual destination, but every side-channel call (memory extractors, reflection, vision, title generation, janus extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd. Gate the override on hostname == api.anthropic.com. Operators routing main through a non-Anthropic provider must use that provider's own auxiliary client; the Anthropic aux path now stays pointed at api.anthropic.com. Regression tests cover openrouter, openai, anthropic-with-path, empty, and anthropic-default-base_url cases.	2026-06-26 11:21:05 +05:30
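
The hostname gate described in the last entry (4e66bf1f80) might look roughly like the sketch below. This is an illustration only, not the code from the commit: the function name `resolve_aux_base_url`, the constant names, and the default URL value are hypothetical; only the rule "apply the override when the hostname is api.anthropic.com, otherwise keep the Anthropic default" comes from the commit message.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed values for illustration; the real constants in the repo may differ.
ANTHROPIC_HOST = "api.anthropic.com"
DEFAULT_ANTHROPIC_BASE_URL = "https://api.anthropic.com"


def resolve_aux_base_url(configured_base_url: Optional[str]) -> str:
    """Return the base URL the auxiliary Anthropic client should use.

    The operator's override is honoured only when it points at the
    Anthropic host, so ANTHROPIC_API_KEY is never sent to a foreign host.
    """
    if not configured_base_url:
        # Empty or missing override: fall back to the default.
        return DEFAULT_ANTHROPIC_BASE_URL

    # urlparse(...).hostname is lowercased and excludes port/path;
    # it is None when the string has no scheme/netloc.
    host = urlparse(configured_base_url).hostname
    if host == ANTHROPIC_HOST:
        return configured_base_url

    # Non-Anthropic host (e.g. openrouter.ai): ignore the override.
    return DEFAULT_ANTHROPIC_BASE_URL
```

Under these assumptions, `resolve_aux_base_url("https://openrouter.ai/api/v1")` falls back to the default, while an override such as `https://api.anthropic.com/v1` is passed through unchanged, mirroring the openrouter and anthropic-with-path regression cases the commit lists.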

13053 commits