hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	27f03243a0	fix(dashboard): stop ElevenLabs voice-list 401 log spam The /api/audio/elevenlabs/voices endpoint logged a WARNING on every failure, and the desktop re-polls it on each settings open/focus — a bad/expired/scoped ELEVENLABS_API_KEY floods agent/gui logs with identical "voice list failed: HTTP Error 401" lines indefinitely. Treat 401/403 as a persistent "integration unavailable" state: return {available: false, error: "unauthorized"} with a 200 (the dropdown already handles available:false) instead of a 502, and collapse repeated identical failures to a single log line via a small re-arming latch (logs again on recovery or when the error changes). Non-auth errors keep the 502 but are throttled the same way.	2026-06-28 17:59:28 -05:00
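The re-arming latch described in 27f03243a0 can be sketched as follows. This is a minimal illustration, not the code from the commit: the class name `FailureLatch`, its method names, and the logger name are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("voice_list_sketch")  # hypothetical logger name


class FailureLatch:
    """Log a repeated failure once; re-arm on recovery or when the error changes."""

    def __init__(self) -> None:
        self._last_error: Optional[str] = None

    def failure(self, error: str) -> bool:
        """Record a failure. Returns True if a log line was emitted."""
        if error == self._last_error:
            return False  # identical to the previous failure: stay quiet
        self._last_error = error
        logger.warning("voice list failed: %s", error)
        return True

    def success(self) -> None:
        """Record a success so that the next failure is logged again."""
        self._last_error = None
```

With this shape, a desktop client that re-polls on every settings open produces one log line per distinct error rather than one per poll.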
Brooklyn Nicholson	453f134b3b	refactor(desktop): centralize remote git REST routing Keep the remote git mirror as a thin facade: route all GETs through gitGet, all mutations through gitPost, and keep consumers on desktopGit(). On the backend, route git paths through a single _git_path helper instead of repeating str(_fs_path(...)) in every endpoint. Behavior unchanged.	2026-06-28 14:37:36 -05:00
Brooklyn Nicholson	fc86e35764	feat(desktop): make the git cockpit work over a remote gateway After the folder picker fix, an added remote folder was still half-usable: the desktop's git GUI (coding-rail status, worktree lanes, review pane, branch switch, file diff) all ran Electron-local git on the USER's machine, so against a remote-gateway repo they silently degraded to empty. Mirror the whole surface over the dashboard REST API so it acts on the BACKEND repo where sessions actually run: - hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/ create-pr, file-diff, worktree add/remove, branch switch) shelling to the system git, mirroring the Electron ops' shapes. - web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as /api/fs, executor-offloaded, mutations -> 400). - apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as window.hermesDesktop.git; coding-status / review / projects / model / desktop-fs route through desktopGit() so local stays Electron, remote hits /api/git/*. Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts, review classification, diff incl. untracked all-add, stage+commit roundtrip, worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).	2026-06-28 14:26:09 -05:00
Teknium	a06d0198cd	fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190) The /api/pty handler only closed the PtyBridge in the writer loop's finally. On child EOF the reader task closes the WebSocket, but if the handler task is cancelled the instant the socket closes, the writer's finally can be skipped and the PTY fds leak (#54028), which is the FD leak the regression test guards against. Under dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted. Reap the bridge in the reader's EOF finally too (close() is idempotent), so the PTY is reaped independently of the writer-loop cancellation race. Harden the regression test to poll for teardown instead of asserting on the same tick. Was flaky on main (2/20); now 25/25.	2026-06-28 03:58:18 -07:00
Teknium	6d879d486b	fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028 ) (#54123 ) * fix(dashboard): close PTY WebSocket on child EOF to stop FD leak The /api/pty handler's reader task returns on child EOF, but the writer loop stayed blocked on ws.receive() until the browser sent a disconnect. When the browser socket is half-open (no FIN delivered — common on macOS/launchd), that disconnect never arrives, so the handler never reaches its finally and the PTY master fd + child process leak. With dashboard auto-reconnect (#52962), every dropped socket then spawns a fresh PTY on top of the orphaned one, exhausting file descriptors within hours (EMFILE / Errno 24). Fix: the reader task now closes the WebSocket in a finally when the child EOFs or the send side breaks, which unblocks ws.receive() so the existing finally runs bridge.close(). The writer loop also guards ws.receive() against the RuntimeError Starlette raises once the socket is closed. Reported by @fifteenzhang. Fixes #54028 * docs: add infographic for #54028 PTY FD leak fix	2026-06-28 02:42:21 -07:00
Teknium	fde1c8570f	fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005 ) (#54126 ) When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a full traceback for every pending connection-lost callback — 50+ identical WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not actionable: they are the expected side effect of the peer hanging up before our writes drained. Install a loop exception handler on the gateway serving loop that collapses exactly this teardown class (ConnectionResetError/ConnectionAbortedError/ BrokenPipeError originating from _call_connection_lost) to a single debug line, forwarding every other loop error to the existing/default handler unchanged so genuine loop bugs still surface. Idempotent per loop.	2026-06-28 02:35:01 -07:00
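A loop exception handler of the kind fde1c8570f describes might look roughly like this. It is a simplified sketch: it filters on exception type only, whereas the commit also checks that the error originates from `_call_connection_lost`. The function name `install_teardown_filter` and the `on_suppressed` callback are hypothetical.

```python
import asyncio

# Exception types raised when the peer hangs up before our writes drained.
_TEARDOWN_ERRORS = (ConnectionResetError, ConnectionAbortedError, BrokenPipeError)


def install_teardown_filter(loop, on_suppressed=None):
    """Swallow peer-hangup errors; forward every other loop error unchanged."""
    previous = loop.get_exception_handler()
    if getattr(previous, "_is_teardown_filter", False):
        return  # already installed on this loop: idempotent

    def handler(loop, context):
        exc = context.get("exception")
        if isinstance(exc, _TEARDOWN_ERRORS):
            if on_suppressed is not None:
                on_suppressed(exc)  # e.g. emit a single debug line
            return
        if previous is not None:
            previous(loop, context)  # keep any pre-existing custom handler
        else:
            loop.default_exception_handler(context)

    handler._is_teardown_filter = True
    loop.set_exception_handler(handler)
```

Marking the handler function itself, rather than the loop object, keeps the idempotency check independent of the loop implementation.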
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829 ) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/ docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/ popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810 ) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal. 
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
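The hidden-spawn primitive from 5db1430af9 (ORs `CREATE_NO_WINDOW` on Windows, no-op on POSIX) can be approximated as below. The name `hidden_run` is hypothetical and this is not the body of `_subprocess_compat.run`; it only shows the flag handling.

```python
import subprocess
import sys


def hidden_run(cmd, **kwargs):
    """Run a command without allocating a console window on Windows."""
    if sys.platform == "win32":
        # OR the flag in so caller-supplied creationflags are preserved.
        flags = kwargs.get("creationflags", 0)
        kwargs["creationflags"] = flags | subprocess.CREATE_NO_WINDOW
    # On POSIX the kwargs pass through untouched (creationflags is Windows-only).
    return subprocess.run(cmd, **kwargs)
```

Capturing output is orthogonal to this flag, which is the point made in 2ecca1e7d3: redirection decides where output goes, not whether a console is allocated.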
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791 ) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/ redirect/check_output exempt) and POSIX-only-program-aware (launchctl/ systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/ windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Teknium	5ab4136631	fix(webui): switch provider when Config-page model field changes (#53583 ) The dashboard Config tab's Model field is a flat string with no provider info. _denormalize_config_from_web only updated model.default and kept the stale provider, so picking an OpenRouter model while the default provider was ollama-local left provider=ollama-local and every call 404'd. When the model string actually changes, infer the serving provider — curated catalog first, then a vendor/model-slug heuristic for non-aggregator providers — and route the switch through the existing _normalize_main_model_assignment / _apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are cleared on a provider change and preserved on a same-provider re-pick. Saving an unchanged model never re-detects, so unrelated config saves keep an explicit provider. Closes #14058	2026-06-27 04:13:44 -07:00
blaryx	76af2456a2	fix(dashboard): merge PUT /api/config with existing on-disk config The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate every root-level key the YAML supports. Most visibly, `custom_providers` is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend never sends it in the PUT body. The previous full-replace save() then silently wiped the key from disk every time the user clicked anything that triggered a save. Other casualties (less visible because defaults re-mask them on load) include `agent.personalities`, `agent.reasoning_effort`, `terminal.lifetime_seconds`, etc. Fix: read the raw on-disk config and deep-merge the incoming PUT body on top of it before saving. The frontend can only overwrite what it explicitly sends; everything else is preserved verbatim. Reuses the existing `_deep_merge` helper from `hermes_cli.config`. Tests: - `test_round_trip_preserves_custom_providers` exercises the exact bug: seed config with custom_providers, GET → drop the key → PUT, assert it's still on disk. - `test_round_trip_preserves_schema_invisible_nested_keys` covers the shallow-vs-deep-merge case for nested dicts under `agent` etc. Both fail on current main; both pass with this patch.	2026-06-27 03:48:18 -07:00
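The merge-on-save behaviour in 76af2456a2 can be illustrated with a small recursive merge. This is a sketch under assumptions: `deep_merge_sketch` is a hypothetical stand-in for the repository's `_deep_merge` helper, whose exact semantics are not shown in the log, and the `max_turns` key is invented for the example.

```python
import copy


def deep_merge_sketch(base: dict, override: dict) -> dict:
    """Return base with override layered on top, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# On-disk config holds keys the dashboard schema never sends back.
on_disk = {
    "custom_providers": [{"name": "local"}],
    "agent": {"personalities": {"calm": "..."}, "max_turns": 10},
}
put_body = {"agent": {"max_turns": 20}}  # what the frontend actually sends

saved = deep_merge_sketch(on_disk, put_body)
```

A shallow merge would replace the whole `agent` dict and drop `personalities`, which is the nested case the second test in the commit covers.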
Versun	c655cdf2c1	feat(dashboard): expose cron job execution fields	2026-06-27 03:20:32 -07:00
kshitijk4poor	244a6f2ceb	fix(desktop): broken "Open setup guide" button for plugin platforms On the desktop Channels / Messaging page, the "Open setup guide" button was rendered as a bare <a href={platform.docs_url} target="_blank"> with no guard. Plugin-provided platforms (Microsoft Teams, Google Chat, Line, Raft, Yuanbao, …) ship an empty docs_url, so the anchor's href was "". In a packaged build, Electron resolves an empty href against the current document — the app's own index.html inside the asar bundle — and shell.openPath then fails with an OS "file not found" dialog. This is exactly the Windows error reported for Messaging → Teams → Open guide. Fix (3 changes): 1. fix(desktop) — Only render the "Open setup guide" button when docs_url is non-empty, and route clicks through openExternalLink so a relative/empty value can never be treated as a local bundle path. Fixes the whole class (every plugin platform), not just Teams. 2. fix(messaging) — Give the Teams platform plugin a real docs_url (Microsoft Teams setup guide) so its card shows a working button instead of nothing. 3. fix(messaging) — Give the Google Chat platform plugin a real docs_url (Google Chat setup guide) so its card shows a working button instead of nothing. Originally from #48940; folded in here because that PR's test was broken (it queried the HTTP endpoint, but google_chat is a dynamic enum member that only appears after the adapter module is imported). Test plan: - apps/desktop — new src/app/messaging/index.test.tsx: button is hidden when docs_url is empty; a real URL opens via the validated external opener (does not navigate). - apps/desktop typecheck (tsc --noEmit) clean. - backend — test_teams_messaging_metadata_links_setup_guide: the Teams catalog entry exposes the setup-guide docs_url. - backend — test_google_chat_messaging_metadata_links_setup_guide: the Google Chat catalog entry exposes the setup-guide docs_url. 
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: p-andhika <andhika.prakasiwi@gmail.com>	2026-06-27 04:34:08 +05:30
Nacho Avecilla	f509f6e598	fix(dashboard): offload PTY spawn/close off the event loop (#53227) * Fix blocking tasks on the dashboard * Remove unnecessary comments	2026-06-26 12:47:23 -07:00
Shannon Sands	41f8126148	Reconnect dashboard PTY chat after socket drops	2026-06-26 01:06:02 -07:00
Ben	19b2624404	feat(gateway): external drain trigger + accept-gating (begin/cancel + control channel) Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible quiesce-without-restart machinery NAS drives during a lifecycle action (D4a). These ship together because the endpoint, the control channel, and the gateway state machine are one coherent slice. 2.2 — control channel (gateway/drain_control.py, new): The dashboard has no HTTP path into a running gateway (guardrails: "there is NO external control channel into a running gateway"); restart/drain is driven only by markers the gateway reacts to. So begin/cancel-drain writes/removes a presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write, never-raises read; a corrupt marker reads as present-contentless → fail-safe toward quiescing). This is Q-B option A. 2.2 — gateway state machine (gateway/run.py): - _external_drain_active flag, DISTINCT from the shutdown _draining flag: this one does NOT exit the process and is fully reversible. - _enter_external_drain / _exit_external_drain: idempotent transitions that flip gateway_state→draining / →running via _update_runtime_status (preserving the live active_agents count). exit refuses to revert to running during a real shutdown or after the loop stops (shutdown wins). - _drain_control_watcher: 1s background task (modelled on _handoff_watcher) reconciling accept-state with the marker; honours a marker that survived a restart on its first tick. Registered alongside the other watchers in start. - New-turn accept gate in _handle_message, placed BEFORE the session-slot claim: when draining, refuse to START a new turn (so active_agents can only fall → no TOCTOU race), while in-flight turns finish untouched. Internal/ system events (restart-recovery replays, bg-process completions) bypass it. 2.1 — endpoint (hermes_cli/web_server.py): POST /api/gateway/drain {action: drain\|cancel}. 
Authenticated by the Task-2.0a token seam (the drain plugin registered this exact path as a token route); attributes the request to the verified token principal. Begin writes the marker, cancel removes it — the gateway process owns the actual transition. Force-override (D6) is NOT here; it maps onto the existing immediate /api/gateway/restart force path. Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next): - tests/gateway/test_external_drain_control.py — marker contract (write/clear/ read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/ loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and in-flight-not-interrupted. 15 tests. - tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/ cancel/cancel-idempotent/bad-action-400. 6 tests. - dashboard.drain_auth config section already added in 2.0b commit. All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed. Intentionally deferred: - HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a real begin-drain marker, prove the 5-point checklist a–e. - Spec-doc status flip + Phase-2 PR. Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin, token-auth, and web_server-endpoint suites green.	2026-06-26 00:47:19 -07:00
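The presence-based marker contract from 19b2624404 (atomic write, never-raises read, corrupt marker treated as present but contentless) can be sketched like this. The function names are hypothetical and the real `gateway/drain_control.py` may differ in every detail.

```python
import json
import os
import tempfile


def write_marker(path, payload):
    """Write the marker atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".marker-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp, path)


def read_marker(path):
    """Return None when absent, the payload when valid, {} when unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError:
        return None  # no marker: not draining
    except (OSError, ValueError):
        return {}  # corrupt marker still counts as present (fail-safe)
    return data if isinstance(data, dict) else {}


def clear_marker(path):
    """Remove the marker; removing an absent marker is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A watcher would then treat `read_marker(path) is not None` as the signal to quiesce, so a damaged file errs toward draining.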
Ben	cb9cb6ba1c	feat(dashboard-auth): generic non-interactive API-token capability Task 2.0a of the safe-shutdown drain-coordination plan. Widens the dashboard auth framework GENERICALLY to support non-interactive (service-to-service) bearer-token auth, mirroring the existing supports_password precedent. This is a reusable capability — any future machine-credential provider plugs in without core changes (decisions.md Q-C). The drain bearer-secret plugin (Task 2.0b) is the first consumer, not the definition. - base.py: add TokenPrincipal dataclass (the token analog of Session) + supports_token capability flag + verify_token() on the ABC (default raises NotImplementedError so a misconfigured provider fails loud). Contract mirrors verify_session stacking: return None for unrecognised tokens (never raise), raise ProviderError only on a genuine backing-store outage. - registry.py: list_token_providers() — the supports_token subset, in registration order. Empty when none registered (token routes fail closed). - token_auth.py (new): route-agnostic seam. Routes opt in via register_token_route(exact path); token_auth_middleware owns the auth decision for those routes only — authenticate via stacked providers, attach request.state.token_principal + token_authenticated, pass through. 401 on missing/unrecognised token, 503 when a provider was unreachable, untouched passthrough for non-token routes. Fails closed (never open). - web_server.py: install the seam OUTERMOST (registered last → runs first). Both downstream gates (legacy auth_middleware + gated_auth_middleware) honour request.state.token_authenticated and skip enforcement, so a token-authed service request is never bounced to /login. - audit.py: TOKEN_AUTH_SUCCESS / TOKEN_AUTH_FAILURE events. 
Tests: tests/hermes_cli/test_dashboard_token_auth.py — ABC flag default, verify_token NotImplementedError, registry filter, bearer extraction (case-insensitive scheme, malformed/non-bearer → ""), provider stacking (first-match-wins, unreachable-remembered, unreachable-then-valid, buggy provider doesn't crash the gate), and the seam's passthrough/401/503/ fail-closed behaviour. 29 new tests; full dashboard-auth suite 169 passed. Intentionally deferred: - The concrete shared-bearer-secret provider plugin — Task 2.0b. - The begin/cancel-drain endpoint that registers itself as a token route — Task 2.1. Build status: dashboard-auth + plugin-hook suites green.	2026-06-26 00:47:19 -07:00
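The bearer-extraction behaviour listed in the cb9cb6ba1c tests (case-insensitive scheme, malformed or non-bearer input yields an empty string) could be implemented along these lines. `extract_bearer` is a hypothetical name, not the function in `token_auth.py`.

```python
def extract_bearer(header_value):
    """Return the token from an Authorization header, or "" if it is not a bearer token."""
    if not header_value:
        return ""
    parts = header_value.strip().split(None, 1)  # scheme, then the rest
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return ""
    return parts[1].strip()
```

Returning an empty string instead of raising lets the middleware answer every missing or malformed header with the same 401 path.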
Brooklyn Nicholson	ff81365988	feat(desktop): in-app spot editor for the file preview pane Adds a CodeMirror 6 spot editor to the right-rail file preview so users can make quick edits in-app without leaving for an IDE. Entering edit mode is a pure in-place swap of the read view — same fixed-height header, same gutter geometry/typography (mirrors SourceView 1:1) so nothing shifts — toggled via the Edit button, a bare `e` when the pane is hovered/focused, or the tab. - Save path is transport-agnostic (writeDesktopFileText): local Electron IPC or a new hardened POST /api/fs/write-text on the dashboard server (path validation, parent-must-exist, regular-files-only, size cap, atomic temp-file + os.replace), behind the existing auth middleware. - Stale-on-disk guard re-reads before writing and offers overwrite vs discard-and-reload instead of clobbering external/agent edits. - VS Code-style modified dot on the tab; ⌘/Ctrl+S and ⌘/Ctrl+Enter save, Esc cancels; GitHub highlight style matched to the read view's Shiki theme. - Typing stays render-free (draft in a ref; dirty flips once at the boundary).	2026-06-25 19:50:25 -05:00
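The save path in ff81365988 names several hardening steps (parent must exist, regular files only, size cap, atomic temp file plus `os.replace`). A minimal sketch of those checks follows; the function name `write_text_atomic` and the 1 MB cap are assumptions, and the endpoint's path validation is omitted.

```python
import os
import tempfile


def write_text_atomic(path, text, max_bytes=1_000_000):
    """Write text to path so readers never observe a partially written file."""
    data = text.encode("utf-8")
    if len(data) > max_bytes:
        raise ValueError("content exceeds size cap")
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        raise FileNotFoundError(parent)  # parent must already exist
    if os.path.exists(path) and not os.path.isfile(path):
        raise ValueError("target is not a regular file")
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # do not leave a stray temp file behind
        raise
```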
brooklyn!	ffa3d3c811	Merge pull request #49037 from NousResearch/bb/projects-paradigm feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools	2026-06-25 17:49:05 -05:00
Gille	e7d2f0b93c	fix(windows): suppress console flashes and harden gateway restarts	2026-06-25 14:42:38 -07:00
Brooklyn Nicholson	4e023f5bc9	feat(gateway): build authoritative project tree	2026-06-25 16:40:27 -05:00
Teknium	c6575df927	feat(moa): expose MoA presets as selectable virtual models (#46081 ) * feat(moa): expose MoA presets as selectable virtual models Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools. Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses) — all references dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved, failures and the MoA-recursion guard stay isolated per reference. - Removed the old mixture_of_agents model tool and moa toolset. - Added moa as a virtual provider in the provider/model inventory. - /moa is shortcut behavior over model selection (default preset / named preset / one-shot prompt). - Dashboard + Desktop manage named presets; presets appear in model pickers. - Parallel reference fan-out in agent/moa_loop.py with regression test. * fix(moa): thread moa_config through _run_agent to _run_agent_inner The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it — _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path. Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4). 
* fix(moa): classify moa as a virtual provider in the catalog The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars — tripping two catalog invariants: - test_provider_catalog: api_key providers must expose a credential env var - test_provider_parity: every hermes-model provider must be desktop-configurable moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt — derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.	2026-06-25 13:52:06 -07:00
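The parallel reference fan-out in c6575df927 (bounded `ThreadPoolExecutor`, output order preserved, failures isolated per reference) can be sketched as follows. `fan_out` and the `("ok", ...)` / `("error", ...)` result shape are hypothetical, and each reference model is represented by a plain callable rather than by anything from `agent/moa_loop.py`.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(references, prompt, max_workers=4):
    """Call every reference with the prompt in parallel; keep input order."""

    def call(reference):
        try:
            return ("ok", reference(prompt))
        except Exception as exc:  # one failing reference must not sink the batch
            return ("error", str(exc))

    if not references:
        return []
    workers = min(max_workers, len(references))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, regardless of finish order.
        return list(pool.map(call, references))
```

The collected list is what would then be handed to the aggregator once every reference has finished.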
kshitij	c42d44cb2f	revert(plugins): restore user dashboard plugin backend API auto-import (#43719 ) (#51950 ) * Revert "refactor(security): centralize non-bundled plugin sources in one constant" This reverts commit `e2bea0abe6`. * Revert "fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)" This reverts commit `8845f3316c`.	2026-06-24 07:46:54 -07:00
yusekiotacode	2ee6449fe5	fix(anthropic): use platform.claude.com for OAuth token exchange Anthropic migrated the OAuth token endpoint from console.anthropic.com/v1/oauth/token (now returns HTTP 404) to platform.claude.com/v1/oauth/token. The token refresh path already iterated both hosts, but the two initial code-exchange call sites were hardcoded to the dead console host, so every new Claude OAuth login failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved no credentials. Fix the whole bug class: - Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live host for backward-compat with existing imports. - run_hermes_oauth_login_pure() (CLI flow) iterates the list, first success wins, mirroring the refresh path. - hermes_cli/web_server.py (desktop dashboard flow) imports the list and iterates it too, so the GUI login path is fixed identically. Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone), platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real Claude MAX OAuth login now succeeds end-to-end.	2026-06-23 23:59:40 -07:00
Teknium	050bd01b7b	fix(dashboard): serve uvicorn on SelectorEventLoop on Windows (#50641) (#51717) On Windows, start_server() served uvicorn via a bare asyncio.run(_serve()), which uses the default ProactorEventLoop. uvicorn's socket-serving stack assumes a SelectorEventLoop on win32 (uvicorn/loops/asyncio.py forces it, and uvicorn.Server.run threads config.get_loop_factory() into its runner for exactly this reason). Driving uvicorn on the proactor loop makes server.startup() bind a socket that never accepts: the dashboard and desktop backend print "Skipping web UI build" then hang forever with the port LISTENING but no TCP handshake completing. Fix is win32-scoped to keep the blast radius minimal: POSIX keeps the exact asyncio.run(_serve()) it had (its default loop is already a SelectorEventLoop / uvloop, which is what uvicorn serves on). Only on Windows do we mirror uvicorn.Server.run and run on the loop factory uvicorn picks, with a fallback to WindowsSelectorEventLoopPolicy for uvicorn < 0.36. Fixes hermes dashboard and hermes desktop (the Electron app spawns a hermes dashboard backend). The gateway symptom in the report has a separate root cause (no uvicorn) and is not addressed here.	2026-06-23 23:43:24 -07:00
uperLu	0d4cecb352	fix(cron): avoid provider package shadowing core cron	2026-06-23 23:39:22 -07:00
Eri Barrett	ba9e3a491b	feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335 ) * feat(memory): OAuth token storage and refresh for the Honcho provider * feat(memory): refresh the Honcho OAuth token in the client and session * feat(memory): zero-CLI loopback OAuth authorization flow * feat(memory): generic memory-provider OAuth connect endpoints * feat(desktop): memory-provider OAuth connect link * feat(memory): CLI OAuth sign-in with source-tagged authorize links * fix(memory): IP-literal loopback redirect and consent config_path on the authorize link * fix(memory): profile-scope the memory-provider OAuth endpoints * refactor(desktop): generic memory-provider OAuth client functions * docs(memory): trim OAuth module docstrings to the invariants * docs(memory): document OAuth connect as an optional auth method * fix(memory): send home-relative display path to consent, not the absolute path * perf(memory): cache OAuth token expiry in memory to skip the hot-path disk read * fix(memory): log OAuth refresh failures at warning, not debug * feat(memory): fall back to an OS-assigned loopback port when 8765 is taken * test(memory): cover the desktop Connect launcher, status, and provider dispatch * fix(desktop): keep the memory-provider dropdown one size regardless of connect state * fix(desktop): move the memory connect link to the description line, leaving the dropdown untouched * refactor(memory): move OAuth connect routes out of web_server into a memory-layer router * refactor(desktop): import MemoryConnect directly, drop the single-export barrel * fix(memory): launch CLI OAuth sign-in right after the auth choice, not after the wizard * fix(desktop): auto-clear the OAuth error state instead of leaving it sticky * test(honcho): isolate auth-method prompt from deployment-shape wizard tests main's wizard suite scripts the cloud prompts without the OAuth auth-method step; auto-answer it in the shared helper so the answer lists stay shape-only. 
* docs(honcho): document query-adaptive reasoning level (reasoningHeuristic) README never mentioned reasoningHeuristic and listed reasoningLevelCap as an orphaned cap with the wrong default (— vs "high"). Add the query-adaptive scaling note + the reasoningHeuristic/reasoningLevelCap rows (grouped under Dialectic & Reasoning), matching the wording already on the hosted honcho.md page, and add a pointer from the memory-providers overview. * fix(honcho): default the CLI peer prompt to the OAuth consent name The CLI runs the grant with apply_config=False, so the peerName the user just entered at consent was dropped and the wizard's 'Your name' prompt fell back to $USER. Surface it as a transient OAuthCredential.consent_peer_name (set even when config isn't merged) and seed the prompt default from it. * feat(honcho): split OAuth client_id by surface (cli=hermes-agent, desktop=hermes-desktop) resolve_endpoints now picks the client_id from the initiating surface and threads it through authorize -> token exchange -> persisted grant -> refresh, so the CLI and desktop register as distinct OAuth clients. Surface-specific env overrides (HONCHO_OAUTH_CLIENT_ID_CLI/_DESKTOP) win over the generic HONCHO_OAUTH_CLIENT_ID, which still overrides every surface. * feat(honcho): show OAuth vs API key in status; detect existing OAuth in setup status now prints 'Auth: OAuth (clientId, token valid Xm/expired)' instead of masking the OAuth access token as a generic API key; setup notes an existing OAuth grant when re-run. * docs(honcho): drop 'shared pool' wording from unified observation mode help * fix(honcho): cross-process lock around OAuth refresh to prevent grant revocation The in-process threading lock can't stop a sibling process (another profile or the desktop app sharing honcho.json) from replaying the single-use refresh token and tripping reuse-detection, which revokes the whole grant. 
Guard the read-refresh-persist section with an OS file lock on <config>.lock so only one process rotates at a time; the others re-read the freshly-persisted token. Best-effort: platforms without flock degrade to in-process serialization. * refactor(honcho): one OAuth client (hermes-agent) for all surfaces Collapse the per-surface client_id split. CLI and desktop now use a single client_id (hermes-agent); consent branding/UI still adapt via the source query param. One grant identity means no clientId-vs-refresh-token desync that could get the grant revoked. HONCHO_OAUTH_CLIENT_ID still overrides for self-hosting. * fix(honcho): per-session resolves to session_id, never remapped by title Reorder resolve_session_name so stable identifiers win over labels: gateway per-chat key first, then the per-session session_id, then the cwd map / title. A (possibly auto-generated) title can no longer remap a live per-session conversation onto a second Honcho session mid-stream — fixes the desktop, which is per-conversation via session_id. Consequence: a gateway's per-chat key now also wins over a title (titles never remap a stable id).	2026-06-22 19:16:47 -05:00
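
The cross-process lock described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: refresh_lock, refresh_if_expired, the rotate callback, and the access_token / expires_at field names are all hypothetical. Only the <config>.lock naming and the best-effort fallback when flock is unavailable come from the commit message.

```python
import contextlib
import json
import os
import time

try:
    import fcntl  # POSIX only; absent on Windows
except ImportError:
    fcntl = None


@contextlib.contextmanager
def refresh_lock(config_path):
    """Hold an exclusive OS file lock on <config>.lock (best-effort)."""
    if fcntl is None:
        # No flock on this platform: callers fall back to in-process locking.
        yield
        return
    fd = os.open(config_path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def refresh_if_expired(config_path, rotate, now=time.time):
    """Read, refresh, and persist under the lock; re-read before rotating."""
    with refresh_lock(config_path):
        with open(config_path) as f:
            grant = json.load(f)  # hypothetical shape: access_token, expires_at
        if grant["expires_at"] > now():
            return grant  # a sibling process already rotated the token
        grant = rotate(grant)  # spends the single-use refresh token
        tmp_path = config_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(grant, f)
        os.replace(tmp_path, config_path)  # atomic swap of the persisted grant
        return grant
```

The re-read inside the lock is the important step: a process that waited for the lock picks up the freshly persisted token instead of replaying the old refresh token.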
Brooklyn Nicholson	2dfcead683	feat(computer-use): make the preflight cross-platform (win/linux) The card was macOS-only. cua-driver also runs on Windows and Linux, so fold `cua-driver doctor` (cross-platform binary/health probes) into a single OS-aware `ready` signal: - macOS: ready == both TCC grants; keeps the permission rows + grant flow. - Windows/Linux: no TCC toggles, so ready == driver health, with a per-OS note (SmartScreen/UIAccess on Windows; X11/XWayland on Linux). `computer_use_status()` replaces the macOS-only `permissions_status()` and surfaces `platform`, `ready`, `can_grant`, and the doctor `checks` (non-ok ones render as warnings). CLI `permissions status`, the REST endpoint, and the desktop card all key off the one payload. Grant stays macOS-only (400 elsewhere — nothing to grant).	2026-06-22 17:48:43 -05:00
Brooklyn Nicholson	0223ea5f59	feat(computer-use): surface macOS permission preflight in the desktop Computer Use already worked through the desktop backend (the cua-driver toolset enables + installs via Settings -> Skills & Tools), but there was no in-app way to see or grant the two macOS permissions it needs, so "give a model my Mac" was tribal knowledge. The grants attach to cua-driver's OWN TCC identity (com.trycua.driver / the installed CuaDriver.app), not Hermes -- so no app entitlement is involved. cua-driver 0.5+ exposes `permissions status/grant`, which we wrap: - tools/computer_use/permissions.py: thin client over the two subcommands - hermes computer-use permissions {status,grant}: CLI parity - GET /api/tools/computer-use/status, POST .../permissions/grant: desktop REST - ComputerUsePanel: live Accessibility + Screen Recording state with a Grant button (dialog attributed to CuaDriver), shown in the expanded Computer Use toolset row. Binary install stays in the existing provider post-setup runner. Follow-ups: i18n the card copy; a "Stop driver" control (cua-driver stop) for the runaway-`serve` case.	2026-06-22 17:33:52 -05:00
Austin Pickett	2a58fee1a1	fix(api): allow dashboard updates for git checkouts in containers (#51005) Salvages #50469 by @libre-7. _dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update. Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case. Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution. Co-authored-by: libre-7 <libre-7@users.noreply.github.com>	2026-06-22 15:55:33 -04:00
kshitij	5937b95192	Merge pull request #50773 from NousResearch/salvage/43719-dashboard-plugin-rce fix(security): restrict dashboard plugin backend auto-import to bundled plugins — defense-in-depth (#43719)	2026-06-22 22:57:33 +05:30
kshitijk4poor	e2bea0abe6	refactor(security): centralize non-bundled plugin sources in one constant /simplify-code (LOW, flagged by two reviewers): the source tags 'user' / 'project' / 'bundled' were bare string literals scattered across the discovery scrub and the two mount-time refuse guards. A typo in any one site (e.g. 'users') would SILENTLY disable a security gate with no error — the exact failure mode this RCE boundary must not have. Introduce a shared module-level _NON_BUNDLED_PLUGIN_SOURCES frozenset referenced by both the discovery scrub and the (now single) mount guard, so the auto-import policy lives in one place. The two mount guards collapse into one gate that still emits the distinct per-source operator message via a map (no loss of guidance). Behavior unchanged: 39 RCE-bypass tests pass, and the constant is mutation-checked (typo'ing it fails the bypass tests). Defence-in-depth (discovery scrub + mount refuse) is retained intentionally.	2026-06-22 22:48:37 +05:30
Teknium	f1e6d39a74	feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842 ) * feat(computer_use): disable cua-driver telemetry by default, add opt-in cua-driver ships anonymous PostHog usage telemetry ENABLED by default upstream (fires cua_driver_install / cua_driver_doctor events to eu.i.posthog.com). Hermes now disables it for our users unless they explicitly opt in. - New config key `computer_use.cua_telemetry` (default false) in DEFAULT_CONFIG. - `cua_backend.cua_driver_child_env()` injects `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is disabled (the default); leaves the var untouched on opt-in so the driver uses its own default. Reads config fail-safe — any error defaults to telemetry off. - Routed every cua-driver spawn site through the policy: MCP backend (StdioServerParameters env), `cua_driver_update_check`, doctor's health_report Popen, the install.sh/install.ps1 runner, and the `--version` / status probes. - Docs: new Telemetry subsection in computer-use.md (EN). - Tests: tests/computer_use/test_cua_telemetry.py — default disables, explicit-false disables, opt-in leaves var untouched, config-failure fails safe, inherited-enabled is overridden off. Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with the var=0 the driver reports "telemetry: disabled via CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs "sending event: cua_driver_doctor". 213 computer_use + install tests green. * fix(dashboard): fold computer_use config category into agent tab The new computer_use.cua_telemetry key created a single-field dashboard config category, tripping test_no_single_field_categories (web_server's invariant that categories with <2 fields must be merged to avoid tab sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the existing onboarding/telegram single-field folds.	2026-06-22 09:57:16 -07:00
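
The child-environment policy in the commit above can be illustrated with a short sketch. The environment variable name and the computer_use.cua_telemetry key come from the commit message; build_child_env and its load_config parameter are hypothetical, and the real cua_driver_child_env signature may differ.

```python
import os
from typing import Callable, Dict, Mapping, Optional

TELEMETRY_VAR = "CUA_DRIVER_RS_TELEMETRY_ENABLED"


def build_child_env(
    load_config: Callable[[], Mapping],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the parent env and force telemetry off unless the user opted in."""
    env = dict(os.environ if base_env is None else base_env)
    try:
        section = load_config().get("computer_use") or {}
        opted_in = section.get("cua_telemetry") is True
    except Exception:
        opted_in = False  # fail safe: any config error means telemetry off
    if not opted_in:
        env[TELEMETRY_VAR] = "0"  # also overrides an inherited enabled value
    return env
```

On opt-in the variable is left exactly as inherited, so the driver applies its own default.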
Eugeniusz Gilewski	8845f3316c	fix(security): restrict dashboard plugin backend import to bundled plugins (#43719 ) Defense-in-depth for the dashboard plugin auto-import path. The web server auto-imports and mounts the Python backend (dashboard/manifest.json -> api file) of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project), not just bundled plugins. So any plugin that reaches one of those dirs gets arbitrary Python executed on the next dashboard start. NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public --insecure dashboard + open API used to git clone a malicious repo into ~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026 hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and --insecure no longer bypasses the auth gate. This change is therefore NOT closing that (now-authenticated) network path; it removes the residual 'arbitrary code executes merely because a plugin is on disk' hazard, which still applies when a plugin arrives by other means: a socially-engineered git clone, a supply-chain drop, an authenticated-but-malicious actor, or a future regression in the auth gate. Untrusted on-disk code should not auto-execute. Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and project plugins may still extend the dashboard UI via static JS/CSS, but their api Python file is never auto-imported. Two layers: _discover_dashboard_plugins scrubs api/_api_file for user/project sources (and bundled wins name conflicts so a non-bundled plugin cannot shadow a trusted backend route); _mount_plugin_api_routes re-refuses user/project at mount time. Tightens the prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only. Salvaged from #44472 (@egilewski) onto current main.	2026-06-22 17:51:37 +05:30
Shannon Sands	5dae502b86	Address email pairing review feedback	2026-06-21 22:43:57 -07:00
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
Teknium	e448b21414	feat(dashboard): interactive auth setup on no-provider non-loopback bind (#50551) When `hermes dashboard --host 0.0.0.0` is run interactively with the auth gate engaged but no DashboardAuthProvider configured, prompt to set up the bundled username/password provider on the spot (or point at `hermes dashboard register` for OAuth) instead of only emitting the fail-closed error. - main.py: `_maybe_setup_dashboard_auth_interactively()` runs before start_server. No-ops on loopback binds, when a provider is already registered, or when stdin/stdout isn't a TTY (Docker/s6, CI, piped runs) so the fail-closed SystemExit stays the backstop for unattended deploys. On the password path it writes dashboard.basic_auth.{username,password_hash,secret} to config.yaml (scrypt hash, never plaintext), then force-rediscovers plugins so the basic provider registers before the gate check. - web_server.py: fix the fail-closed hint: it told operators to set `dashboard_auth.basic.username` but the provider reads `dashboard.basic_auth`. - docs: note the interactive setup under Fail-closed semantics. No new env vars; reuses the existing dashboard.basic_auth config surface.	2026-06-21 20:21:48 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
teknium1	7726ce3040	fix(security): close hermes-0day MCP-persistence attack surface Remove the dashboard --insecure auth-bypass, add an MCP persistence guard + IOC blocklist, and raise the API-server key entropy floor. Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media instance): scanners find exposed Hermes dashboards/API servers, drive the root agent to plant a 'command: bash' MCP entry that appends an attacker SSH key to authorized_keys, which cron + startup then re-execute every tick. - dashboard: --insecure no longer disables the auth gate. should_require_auth returns True for every non-loopback bind; a public bind ALWAYS requires an auth provider (bundled password provider or OAuth). --insecure kept as a warned no-op for backward compat. Fail-closed error now points at the password provider, not at --insecure. - mcp_security: validate_mcp_server_entry now also rejects shell payloads that write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/ rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key + source IPs) anywhere in command/args/env. Runs at save AND spawn time. - api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars; warn when a network-accessible API server runs an unsandboxed local backend.	2026-06-21 19:05:27 -07:00
memosr	ed3d12a762	fix(security): fail-closed when WebSocket peer is empty in loopback mode Per @egilewski's audit on this PR (#15544), the original fix was correct but the file has been refactored since: the four endpoint-local empty-peer checks have been consolidated into _ws_client_is_allowed and _ws_client_reason, but the helpers were left fail-open ('no peer host known means allow' / 'no reason to block'). On a loopback-bound dashboard with auth disabled, an ASGI server behind a misconfigured proxy or a unix-socket transport can deliver ws.client == None or ws.client.host == ''. The helpers were treating that as 'allowed', so the loopback-only peer gate could be bypassed by anything that suppressed the client tuple in transit. All four WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap applied uniformly. Fix: * _ws_client_is_allowed: return False when client_host is empty instead of True. Only reached on loopback bind with auth disabled (auth_required=True and explicit non-loopback binds short-circuit earlier), so the fail-closed behavior is scoped to the surface that needs it. * _ws_client_reason: return a 'missing_or_empty_peer bound=...' block reason instead of None, so the dispatcher's existing reason-based rejection path picks it up and the close gets logged with a machine-parseable token for diagnosability. Behavior unchanged for: * gated mode (auth_required=True): early-returns True before the empty-peer check runs. The OAuth ticket is the auth at that point. * explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN address, always with --insecure): early-returns True before the empty-peer check runs. DNS-rebinding is still blocked by the Host/Origin guard in _ws_host_origin_is_allowed. * legitimate loopback peers (client_host == '127.0.0.1' / '::1'): not affected by the empty-peer branch.
Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py: * test_empty_client_host_rejected_in_loopback_mode * test_missing_client_object_rejected_in_loopback_mode * test_empty_client_host_reason_is_block Plus two regression guards to ensure the fix does not over-reach: * test_empty_client_host_still_allowed_in_insecure_public_mode * test_empty_client_host_still_allowed_in_gated_mode All three new fail-closed tests fail without this patch (the helpers return True / None for an empty peer) and pass with it. The 45 pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.	2026-06-21 13:33:18 -07:00
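
The fail-closed ordering described in the commit above can be sketched as a single predicate. This is an illustration only: ws_client_is_allowed and its keyword parameters are hypothetical stand-ins, and the real _ws_client_is_allowed helper may take different arguments.

```python
from typing import Optional

_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1"})


def ws_client_is_allowed(
    client_host: Optional[str],
    *,
    auth_required: bool,
    bound_to_loopback: bool,
) -> bool:
    """Peer gate for WebSocket endpoints, failing closed on an empty peer."""
    if auth_required:
        return True  # gated mode: the auth ticket is checked elsewhere
    if not bound_to_loopback:
        return True  # explicit public bind: Host/Origin guard applies instead
    if not client_host:
        return False  # missing or empty peer on a loopback bind: reject
    return client_host in _LOOPBACK_HOSTS
```

The two early returns keep the stricter check scoped to the loopback, auth-disabled surface, which matches the unchanged-behavior cases listed in the commit.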
joaomarcos	475e81dab4	fix(web_server): use run_in_executor for gateway pre-warm and drain-timeout Fixes a regression introduced by the prior approach (synchronous import hermes_cli.gateway inside _lifespan) that caused a new failure mode: the blocking import stalled the asyncio event loop before uvicorn could bind its port, pushing HERMES_DASHBOARD_READY past the desktop shell's 45 s announcement deadline and triggering a respawn loop that accumulated orphaned backend processes. Two-part fix: _lifespan: replace the blocking import with a fire-and-forget run_in_executor call (_warm_gateway_module). The import runs in a worker thread while the server socket is already open, so HERMES_DASHBOARD_READY fires without delay. get_status: replace the inline lazy import with await run_in_executor(None, _resolve_restart_drain_timeout). This is the root fix for the original 15 s socket-timeout: the blocking .pyc-compilation + Defender scan is offloaded to a thread, keeping the event loop free for every /api/status probe. After the first call the module is in sys.modules and the executor returns in microseconds. Both helpers are extracted as module-level sync functions so they can be unit-tested independently of FastAPI or uvicorn. Closes #50209 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-21 12:29:18 -07:00
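
The executor offload described in the commit above follows a standard asyncio pattern, sketched here with stand-ins. The helper _resolve_slowly simulates the blocking import with a sleep, and its return value is a placeholder, not the project's default.

```python
import asyncio
import time


def _resolve_slowly() -> float:
    """Stand-in for a blocking import plus timeout lookup."""
    time.sleep(0.05)
    return 180.0  # placeholder value


async def get_status() -> dict:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool so the loop stays free.
    timeout = await loop.run_in_executor(None, _resolve_slowly)
    return {"restart_drain_timeout": timeout}


async def main():
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    status = await get_status()
    task.cancel()
    return status, ticks
```

The ticker task keeps advancing while get_status waits, which is the property the commit relies on: other requests are served while the slow call runs in a worker thread.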
kshitijk4poor	4d7bb382b0	refactor(gateway): route all active_agents coercion through parse_active_agents; harden drain-timeout fallback Second cleanup pass (simplify-code review of the first follow-up): - write_runtime_status now clamps active_agents via parse_active_agents instead of an inline max(0, int(...)). Removes the duplicated clamp the helper's docstring acknowledged AND closes a write-side ValueError gap (a non-numeric active_agents previously raised; now degrades to 0). - hermes_cli/gateway.py draining-status line routes its active-agents count through parse_active_agents too — the third coercion site of the same persisted field, now consistent and non-raising with the two HTTP surfaces. - web_server.py /api/status: the drain-timeout resolver fallback now catches ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT (a real float) instead of a blanket 'except Exception -> None'. None would have violated the surfaced field's int/float contract and stripped NAS's poll-deadline hint silently. - Dropped a redundant 'if runtime else 0' branch (parse_active_agents already handles the empty/None case) and tightened the parse_active_agents docstring to describe the actual single-contract role (write + both reads).	2026-06-21 17:22:52 +05:30
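
A non-raising coercion of the kind the commit above describes could look like the following. The function name parse_active_agents is taken from the commit message; the body is an assumption about its behavior (clamp to non-negative, degrade to 0 on bad input), not the project's implementation.

```python
from typing import Any


def parse_active_agents(raw: Any) -> int:
    """Coerce a persisted count to a non-negative int; bad input becomes 0."""
    try:
        return max(0, int(raw))
    except (TypeError, ValueError, OverflowError):
        return 0
```

Routing the write path and both read paths through one helper means a corrupt value degrades the same way everywhere.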
kshitijk4poor	b577f25100	refactor(gateway): dedupe drain-timeout resolution + share active_agents parse Follow-up cleanups on top of the busy/idle readout (PR #50103): - web_server.py /api/status now reuses the single drain-timeout resolver hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT env -> agent.restart_drain_timeout config -> default) instead of inlining a third hand-rolled copy of that precedence chain. Also fixes a subtle divergence: the inline copy used os.environ.get() so a set-but-empty env var was treated as a value rather than falling through to config; the shared resolver .strip()s and falls through correctly. - Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces (/api/status and /health/detailed) through it, so the exposed active_agents field is consistently clamped non-negative. Previously /api/status clamped while /health/detailed exposed the raw file value, diverging on a corrupt count. - Added TestParseActiveAgents covering the shared coercion contract.	2026-06-21 17:22:52 +05:30
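
The env -> config -> default precedence chain in the commit above, including the set-but-empty fall-through, can be sketched like this. The env var and config key names come from the commit message; resolve_drain_timeout, the DEFAULT_DRAIN_TIMEOUT value, and the handling of unparseable values are assumptions made for illustration.

```python
from typing import Any, Mapping

DEFAULT_DRAIN_TIMEOUT = 60.0  # placeholder, not the project's actual default


def resolve_drain_timeout(env: Mapping[str, str], config: Mapping[str, Any]) -> float:
    """Resolve the drain timeout: env var, then config, then default."""
    raw = (env.get("HERMES_RESTART_DRAIN_TIMEOUT") or "").strip()
    if raw:  # a set-but-empty variable falls through to config
        try:
            return float(raw)
        except ValueError:
            pass  # assumption: unparseable env values also fall through
    value = (config.get("agent") or {}).get("restart_drain_timeout")
    if value is not None:
        try:
            return float(value)
        except (TypeError, ValueError):
            pass
    return DEFAULT_DRAIN_TIMEOUT
```

Stripping before the truthiness check is what separates this from a bare os.environ.get(), which would treat an empty string as a configured value.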
Ben	0ee75469d7	feat(dashboard): surface gateway busy/drainable on /api/status Give an external consumer (NAS) a trustworthy, always-reachable busy/idle readout it can poll before a disruptive lifecycle action (restart, migrate, stop, auto-update). The dashboard /api/status is the only HTTP surface guaranteed up on a hosted agent regardless of which gateway platforms are enabled, and it already reads gateway_state.json. Add to /api/status (additive, non-breaking): - active_agents — in-flight gateway-turn count (now refreshed per-turn by the companion gateway-side commit) - gateway_busy — running AND active_agents > 0 - gateway_drainable — running and live (a valid begin-drain target) - restart_drain_timeout — resolved seconds, so the consumer can size its poll deadline without out-of-band knowledge (env HERMES_RESTART_DRAIN_TIMEOUT → config agent.restart_drain_timeout → default) The busy/drainable contract is defined once in gateway.status (derive_gateway_busy / derive_gateway_drainable) and consumed by both /api/status and /health/detailed so the two surfaces can never disagree. Liveness keys off gateway_running (a live PID/health probe), NEVER gateway_updated_at — a healthy idle gateway never advances that timestamp. All derived fields degrade to safe falsy values when the gateway is down or the status file is absent/corrupt (never a spurious "busy" that would wedge the consumer). active_sessions (the 5-min DB recency heuristic the SPA reads) is left exactly as-is — new signal, new fields. Tests (behaviour contracts, not snapshots): the pure derivation contract across every running/state/count/liveness combination; /api/status integration for busy, idle-drainable, draining, down, stale-busy-file, corrupt-count, and timeout surfacing; and /health/detailed parity.	2026-06-21 17:22:52 +05:30
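
The busy/drainable contract in the commit above can be written as two pure functions. The names derive_gateway_busy and derive_gateway_drainable come from the commit message; the parameters, and in particular the "running" state string used for drainable, are hypothetical guesses at the shape of gateway_state.json.

```python
from typing import Optional


def derive_gateway_busy(gateway_running: bool, active_agents: int) -> bool:
    """Busy means the gateway is live AND has in-flight turns."""
    return bool(gateway_running) and active_agents > 0


def derive_gateway_drainable(gateway_running: bool, gateway_state: Optional[str]) -> bool:
    """Drainable means live and in a state that can begin draining.

    Assumption: "running" is the only state treated as a valid drain target.
    """
    return bool(gateway_running) and gateway_state == "running"
```

Both functions return False whenever gateway_running is False, so a stale status file with a nonzero count cannot report a dead gateway as busy.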
helix4u	c253b07380	fix(model): clear stale endpoint credentials across switches	2026-06-19 19:58:26 -07:00
teknium1	a58287afcb	Merge remote-tracking branch 'origin/main' into pr48275-rebase # Conflicts: # cron/scheduler.py	2026-06-19 07:40:29 -07:00
teknium1	1d59d2dcae	feat(desktop): resolve OAuth status for catalog-only account providers Accounts-tab cards derived from the unified provider_catalog() carry status_fn=None and had no hardcoded branch in _resolve_provider_status, so any future OAuth/account provider plugin rendered permanently logged-out. Fall through to the canonical hermes_cli.auth.get_auth_status slug dispatcher and adapt its shape, so membership AND status both auto-extend with the hermes model universe.	2026-06-19 07:26:46 -07:00
Austin Pickett	8fe7b52ebf	test(desktop): lock GUI⊇`hermes model` provider parity; surface Bedrock Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the `hermes model` universe) must be configurable on a desktop Providers tab — keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an invariant against the live endpoints so the GUI can never silently drift from the CLI again. Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock provider card. Anthropic is whitelisted as a legitimate dual-tab provider (direct API key + subscription OAuth). Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role as the override base for _build_oauth_catalog().	2026-06-19 07:26:46 -07:00
Austin Pickett	60dfa0f31b	feat(desktop): Accounts tab derives membership from unified provider catalog /api/providers/oauth now unions the explicit hand-tuned OAuth cards (_OAUTH_PROVIDER_CATALOG — bespoke flow/status/cli, plus the api-key Anthropic PKCE card and synthetic claude-code row) with every accounts-tab provider in provider_catalog(). Any OAuth/external provider in the `hermes model` universe now appears automatically, closing the drift where google-gemini-cli and copilot-acp had no Accounts card despite being CLI-configurable. Adds read-only status cards for google-gemini-cli (via existing get_gemini_oauth_auth_status) and copilot-acp (managed-by-CLI, like claude-code). DELETE handler routes through the same _build_oauth_catalog() builder. Parity test asserts the Accounts tab offers every accounts-tab catalog provider as an invariant.	2026-06-19 07:26:46 -07:00


297 commits