hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	f9b469d7de	test(web_git): assert default branch invariant, not hardcoded main CI git init defaults to master on some runners; compare branch to defaultBranch instead of pinning a branch name.	2026-06-28 16:29:52 -05:00
Brooklyn Nicholson	4e9439cc3b	fix(desktop): route composer context picking through remote-aware fs Second pass on the remote-project flow: the project dialog and git cockpit were remote-aware, but the composer's Add file/folder context picker still called the native Electron picker directly. Route it through selectDesktopPaths so remote sessions use the backend-aware picker instead of local disk paths; preserve local multi-select behavior and keep remote folder selection single because the in-app remote picker only supports one directory. Also use readDesktopFileDataUrl for image previews so an already-known backend image path can be read through /api/fs/read-data-url, and add focused coverage for backend file-diff routing plus the plain-folder git init/worktree path.	2026-06-28 14:35:23 -05:00
Brooklyn Nicholson	fc86e35764	feat(desktop): make the git cockpit work over a remote gateway After the folder picker fix, an added remote folder was still half-usable: the desktop's git GUI (coding-rail status, worktree lanes, review pane, branch switch, file diff) all ran Electron-local git on the USER's machine, so against a remote-gateway repo they silently degraded to empty. Mirror the whole surface over the dashboard REST API so it acts on the BACKEND repo where sessions actually run: - hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/ create-pr, file-diff, worktree add/remove, branch switch) shelling to the system git, mirroring the Electron ops' shapes. - web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as /api/fs, executor-offloaded, mutations -> 400). - apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as window.hermesDesktop.git; coding-status / review / projects / model / desktop-fs route through desktopGit() so local stays Electron, remote hits /api/git/*. Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts, review classification, diff incl. untracked all-add, stage+commit roundtrip, worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).	2026-06-28 14:26:09 -05:00
ygd58	3e16176ba4	fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled _get_platform_tools() applies agent.disabled_toolsets as a final override AFTER reading platform_toolsets.<platform>, so a toolset listed there stays permanently OFF no matter what the toggle write path saves. Blank Slate installs pre-populate this list with ~27 toolsets, making most of the desktop Toolsets UI un-enableable (issue #49995). Fix: _save_platform_tools() now removes any toolset the user just explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets. Toolsets the user did not touch, or that remain disabled on other platforms, are left alone -- disabled_toolsets keeps working as a cross-platform suppression list for anything not actively re-enabled. Disabling a toolset (unchecking it) does not touch disabled_toolsets at all -- only enables reconcile it. Verified end-to-end with the exact repro from the issue: Blank Slate config (disabled_toolsets=['todo','memory','browser'], cli=['file', 'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools() now resolves 'todo' as enabled while 'memory'/'browser' (untouched) remain disabled. Added 4 regression tests. Full tools_config suite: 101 passed (97 existing + 4 new), no regressions. Fixes #49995	2026-06-28 21:59:03 +05:30
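The reconcile rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `_save_platform_tools()`. The helpers `get_platform_tools_sketch` and `save_platform_tools_sketch`, and their `just_enabled` parameter, are hypothetical. The config is modeled as a plain dict holding `platform_toolsets.<platform>` and `agent.disabled_toolsets`.

```python
from __future__ import annotations

from typing import Iterable


def get_platform_tools_sketch(config: dict, platform: str) -> set[str]:
    """Resolve enabled toolsets: platform list first, disabled_toolsets as a final override."""
    listed = set(config.get("platform_toolsets", {}).get(platform, []))
    disabled = set(config.get("agent", {}).get("disabled_toolsets", []))
    return listed - disabled


def save_platform_tools_sketch(
    config: dict, platform: str, enabled: Iterable[str], just_enabled: Iterable[str]
) -> None:
    """Persist one platform's toolsets and reconcile the cross-platform suppression list.

    Only toolsets the user explicitly switched ON in this save are removed from
    agent.disabled_toolsets. Untouched entries, and disables, leave it alone.
    """
    config.setdefault("platform_toolsets", {})[platform] = sorted(set(enabled))
    turned_on = set(just_enabled)
    if not turned_on:
        return  # unchecking a toolset never edits disabled_toolsets
    agent = config.setdefault("agent", {})
    agent["disabled_toolsets"] = [
        name for name in agent.get("disabled_toolsets", []) if name not in turned_on
    ]


# Repro from the commit: Blank Slate config, then enable 'todo' for the CLI.
config = {
    "agent": {"disabled_toolsets": ["todo", "memory", "browser"]},
    "platform_toolsets": {"cli": ["file", "terminal"]},
}
save_platform_tools_sketch(config, "cli", ["file", "terminal", "todo"], just_enabled=["todo"])
print(sorted(get_platform_tools_sketch(config, "cli")))  # ['file', 'terminal', 'todo']
print(config["agent"]["disabled_toolsets"])  # ['memory', 'browser']
```

The key design point is that only an explicit enable edits `disabled_toolsets`. A save with nothing newly enabled leaves the cross-platform suppression list untouched.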
Teknium	0c2e6c0049	test: make active session cross-process race deterministic (#54248)	2026-06-28 05:49:21 -07:00
izumi0uu	c4719aa51c	fix(gateway): boot out stale launchd registration before restart bootstrap launchd restart can leave the gateway job stopped but still registered after update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a detached process. Booting the stale registration out before bootstrap keeps the launchd-managed restart path intact and locks it with a regression test. Constraint: Keep upstream-facing conventional commit style while preserving local decision context Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update Confidence: high Scope-risk: narrow Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart" Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd Not-tested: Manual macOS launchctl restart after hermes update	2026-06-28 04:17:13 -07:00
teknium1	463225caf1	fix(gateway): bypass legacy-unit prompt in non-TTY systemd install Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt not guarded by --start-now/--start-on-login — is also skipped in headless contexts. Updates systemd_install test mocks to accept the new kwarg and adds coverage for the legacy-unit-skip path.	2026-06-28 04:09:54 -07:00
liuhao1024	831d443b03	fix(gateway): honor --start-now/--start-on-login flags and support non-TTY headless installs When running `hermes gateway install` on Linux/systemd, the command unconditionally prompts with two `prompt_yes_no` questions, breaking headless installs (SSH, CI, provisioning scripts) and ignoring the existing --start-now / --start-on-login CLI flags that the Windows branch already respects. The fix mirrors the Windows path: read CLI flags first, prompt only when flags are not provided AND stdin is a TTY, and fall back to True defaults for non-TTY contexts. The argparse help strings are promoted from SUPPRESS to visible so users can discover the flags. Fixes #42065	2026-06-28 04:09:54 -07:00
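A sketch of the flag-first, TTY-gated decision described above. It assumes a hypothetical `resolve_choice` helper and an argparse parser built just for this example; the help strings are illustrative, not copied from Hermes. Using `default=None` with `store_true` is what lets the code tell an absent flag from a given one.

```python
import argparse
import sys
from typing import Callable, Optional


def build_install_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hermes gateway install")
    # default=None (not False) distinguishes "flag absent" from "flag given".
    parser.add_argument("--start-now", action="store_true", default=None,
                        help="Start the gateway service immediately after installing it.")
    parser.add_argument("--start-on-login", action="store_true", default=None,
                        help="Enable the gateway service to start automatically.")
    return parser


def resolve_choice(flag: Optional[bool], question: str,
                   prompt: Callable[[str], bool], is_tty: Optional[bool] = None) -> bool:
    """Flag first; prompt only on an interactive TTY; otherwise default to True."""
    if flag is not None:
        return flag  # explicit CLI flag wins, no prompt
    if is_tty is None:
        is_tty = sys.stdin is not None and sys.stdin.isatty()
    if is_tty:
        return prompt(question)
    return True  # headless (SSH, CI, provisioning): never block on input


args = build_install_parser().parse_args([])
# Headless with no flags: no prompt is shown and the default is True.
print(resolve_choice(args.start_now, "Start now?", prompt=lambda q: False, is_tty=False))  # True
```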
Teknium	a06d0198cd	fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190) The /api/pty handler only closed the PtyBridge in the writer loop's finally. On child EOF the reader task closes the WebSocket, but if the handler task is cancelled the instant the socket closes, the writer's finally can be skipped and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted. Reap the bridge in the reader's EOF finally too (close() is idempotent), so the PTY is reaped independently of the writer-loop cancellation race. Harden the regression test to poll for teardown instead of asserting on the same tick. Was flaky on main (2/20); now 25/25.	2026-06-28 03:58:18 -07:00
yoniebans	204a67f0c8	fix(kanban): retry write_txn on transient SQLITE_BUSY	2026-06-28 02:44:04 -07:00
yoniebans	90c1dc0493	test(kanban): cover write_txn BUSY retry (currently failing)	2026-06-28 02:44:04 -07:00
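The two Kanban commits above give only their titles, so the following is a generic sketch of retrying a SQLite write transaction on SQLITE_BUSY, not the Kanban `write_txn` itself. It assumes the caller passes the transaction body as a callable. `run_write_txn`, the backoff parameters, and the retry count are all hypothetical. It also assumes the connection is opened with `isolation_level=None` so that transactions are managed explicitly.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# SQLITE_BUSY is primary result code 5; the module constant exists on Python 3.11+.
SQLITE_BUSY = getattr(sqlite3, "SQLITE_BUSY", 5)


def _is_busy(exc: sqlite3.OperationalError) -> bool:
    code = getattr(exc, "sqlite_errorcode", None)  # Python 3.11+ only
    if code is not None and (code & 0xFF) == SQLITE_BUSY:
        return True
    return "database is locked" in str(exc).lower()


def run_write_txn(conn: sqlite3.Connection, body: Callable[[sqlite3.Connection], T],
                  attempts: int = 5, base_delay: float = 0.05,
                  sleep: Callable[[float], None] = time.sleep) -> T:
    """Run body(conn) inside BEGIN IMMEDIATE, retrying transient SQLITE_BUSY with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            try:
                result = body(conn)
                conn.commit()
                return result
            except BaseException:
                conn.rollback()  # never leave a half-open transaction behind
                raise
        except sqlite3.OperationalError as exc:
            if not _is_busy(exc) or attempt == attempts:
                raise  # real errors, or BUSY that outlived the retry budget
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


conn = sqlite3.connect(":memory:", isolation_level=None)  # we manage transactions ourselves
conn.execute("CREATE TABLE cards (title TEXT)")
state = {"calls": 0}


def add_card(c: sqlite3.Connection) -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise sqlite3.OperationalError("database is locked")  # simulate a competing writer
    c.execute("INSERT INTO cards VALUES ('triage')")
    return "inserted"


print(run_write_txn(conn, add_card, sleep=lambda s: None), state["calls"])  # inserted 3
```

Non-BUSY errors are re-raised on the first attempt, so the retry only masks lock contention, not schema or logic bugs.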
Teknium	6d879d486b	fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123) * fix(dashboard): close PTY WebSocket on child EOF to stop FD leak The /api/pty handler's reader task returns on child EOF, but the writer loop stayed blocked on ws.receive() until the browser sent a disconnect. When the browser socket is half-open (no FIN delivered — common on macOS/launchd), that disconnect never arrives, so the handler never reaches its finally and the PTY master fd + child process leak. With dashboard auto-reconnect (#52962), every dropped socket then spawns a fresh PTY on top of the orphaned one, exhausting file descriptors within hours (EMFILE / Errno 24). Fix: the reader task now closes the WebSocket in a finally when the child EOFs or the send side breaks, which unblocks ws.receive() so the existing finally runs bridge.close(). The writer loop also guards ws.receive() against the RuntimeError Starlette raises once the socket is closed. Reported by @fifteenzhang. Fixes #54028 * docs: add infographic for #54028 PTY FD leak fix	2026-06-28 02:42:21 -07:00
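The two PTY fixes (this one and #54190) reduce to two rules. First, the reader closes the WebSocket and reaps the bridge in its own `finally`. Second, `close()` is idempotent, so the writer's `finally` can safely call it again. The asyncio sketch below models that with hypothetical fakes (`FakePtyBridge`, `HalfOpenWebSocket`); it is not the `/api/pty` handler and uses no Starlette APIs.

```python
import asyncio
from typing import List


class FakePtyBridge:
    """Stand-in bridge: read() returns b"" at child EOF; close() is idempotent."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = list(chunks)
        self.closed = False
        self.close_calls = 0

    async def read(self) -> bytes:
        await asyncio.sleep(0)  # yield like a real non-blocking read would
        return self._chunks.pop(0) if self._chunks else b""

    def close(self) -> None:
        self.close_calls += 1
        if self.closed:
            return
        self.closed = True  # a real bridge would close the master fd and reap the child


class HalfOpenWebSocket:
    """Models a socket whose disconnect never arrives: receive() blocks until close()."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []
        self._closed = asyncio.Event()

    async def send_bytes(self, data: bytes) -> None:
        self.sent.append(data)

    async def receive(self) -> bytes:
        await self._closed.wait()
        raise RuntimeError("socket closed")  # in this model, a closed socket raises

    async def close(self) -> None:
        self._closed.set()


async def pump_output(bridge: FakePtyBridge, ws: HalfOpenWebSocket) -> None:
    try:
        while True:
            data = await bridge.read()
            if not data:
                break  # child EOF
            await ws.send_bytes(data)
    finally:
        try:
            await ws.close()  # unblocks the writer loop's receive()
        finally:
            bridge.close()  # reap here too, in case the handler task is cancelled


async def pty_handler(bridge: FakePtyBridge, ws: HalfOpenWebSocket) -> None:
    reader = asyncio.create_task(pump_output(bridge, ws))
    try:
        while True:
            try:
                await ws.receive()  # a real loop would forward input to the PTY
            except RuntimeError:
                break  # socket already closed by the reader
    finally:
        bridge.close()  # second call is a no-op
        await reader


async def demo():
    bridge, ws = FakePtyBridge([b"$ ", b"exit\r\n"]), HalfOpenWebSocket()
    await asyncio.wait_for(pty_handler(bridge, ws), timeout=1.0)
    return bridge, ws


bridge, ws = asyncio.run(demo())
print(ws.sent, bridge.closed, bridge.close_calls)
```

The handler returns even though the fake socket never delivers a disconnect, and `close()` is called twice but only takes effect once.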
teknium1	7c9cdad9fd	test(cli): cover Windows self-lock recovery guard + cmd-quote its hint Add two tests for the self-lock guard in _recover_from_interrupted_install: one asserting it clears the marker and skips install when hermes.exe is a process ancestor (breaking the #52378/#45542 loop), one asserting it falls through to a normal recovery install when the shim is NOT an ancestor. The guard's manual-recovery hint runs only inside the Windows branch, so quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform fallback hint at the end of the function is left POSIX-correct. Map Icather in scripts/release.py AUTHOR_MAP for the salvage.	2026-06-28 02:40:37 -07:00
PRATHAMESH75	e551da6ddb	fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart Long-lived helpers spawned indirectly by tool calls (adb, platform bridges) were left in the service cgroup after the gateway's main process exited. When the kernel rejected the deferred cgroup-wide kill with EINVAL, systemd blocked Restart=always for 6+ minutes, taking down all platforms and cron windows (#37454). Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks cgroup.procs and sends per-PID SIGKILLs — a different kernel code path than cgroup.kill, so it succeeds where the cgroup-wide write failed. KillMode=mixed is preserved so the gateway still reaps its own tool-call children before systemd intervenes (#8202). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 02:05:50 -07:00
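A sketch of the per-PID reaping idea: read `cgroup.procs` and send each PID its own SIGKILL instead of writing to `cgroup.kill`. The real helper is `gateway.cgroup_cleanup`. The function name `reap_cgroup_procs`, its injectable `kill` parameter, and the rule of skipping its own PID are assumptions made for illustration. ExecStopPost= commands typically run inside the unit's own cgroup, so a helper that killed every listed PID could otherwise kill itself.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional


def reap_cgroup_procs(cgroup_dir: str, kill: Callable[[int, int], None] = os.kill,
                      self_pid: Optional[int] = None) -> List[int]:
    """SIGKILL every PID listed in <cgroup_dir>/cgroup.procs, one at a time."""
    try:
        text = (Path(cgroup_dir) / "cgroup.procs").read_text()
    except FileNotFoundError:
        return []  # cgroup already removed: nothing left to reap
    own = os.getpid() if self_pid is None else self_pid
    killed = []
    for pid in (int(tok) for tok in text.split()):
        if pid == own:
            continue  # don't kill the cleanup helper itself
        try:
            kill(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            pass  # exited between reading cgroup.procs and the kill
    return killed
```

Because each kill is an ordinary `kill(2)` call, a PID that has already exited is skipped rather than failing the whole cleanup.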
xxxigm	093f567f0d	fix(agent,cli): surface empty-body API errors and fail oneshot exit code When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}), `_summarize_api_error` fell through to a bare `str(error)`, so users saw only "HTTP 400" with no provider detail (reported on Windows in #36109). The SDK leaves `body` empty in this case, but the httpx `response` still carries the payload in `.text`. - run_agent.py `_summarize_api_error`: when `body` is empty, fall back to `response.text` — parse a JSON `error.message`/`message` when present, else surface the raw (truncated) body. Platform-agnostic diagnostics. - hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns exit code 2 when the run is failed/partial with no usable final response, so scripts can detect LLM failures (still 0 when a response — incl. an error summary as output — is produced). Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests. NOTE: #36109's original root cause (Windows "all providers return empty 400") is not reproducible on current main (heavy provider-transport churn since v0.15.1). This change does not claim to fix that root cause — it makes any empty-body API error LEGIBLE so a future occurrence shows the real provider message instead of a bare HTTP 400. Relates to #36109 (does not close it).	2026-06-28 02:05:20 -07:00
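The empty-body fallback in `_summarize_api_error` can be sketched as follows. The function and the `FakeAPIError`/`FakeResponse` stand-ins are hypothetical. The sketch assumes only what the commit states: the SDK error exposes a (possibly empty) `body` and an httpx-style `response` whose `.text` still carries the payload. The 300-character truncation limit is an arbitrary choice.

```python
import json
from typing import Any, Optional


def _message_from(payload: Any) -> Optional[str]:
    """Pull error.message or message out of a decoded error payload."""
    if not isinstance(payload, dict):
        return None
    err = payload.get("error")
    if isinstance(err, dict) and isinstance(err.get("message"), str):
        return err["message"]
    if isinstance(payload.get("message"), str):
        return payload["message"]
    return None


def summarize_api_error(error: Exception, max_len: int = 300) -> str:
    """Prefer the SDK body; if it is empty, fall back to the raw HTTP response text."""
    message = _message_from(getattr(error, "body", None))
    if message is None:
        response = getattr(error, "response", None)
        text = (getattr(response, "text", None) or "").strip()
        if text:
            try:
                message = _message_from(json.loads(text))
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                message = None
            if message is None:
                message = text[:max_len]  # not JSON, or no message field: show raw body
    if message is None:
        return str(error)  # nothing better available
    status = getattr(error, "status_code", None)
    return f"HTTP {status}: {message}" if status is not None else message


class FakeResponse:  # minimal stand-ins for an SDK error carrying an httpx-style response
    def __init__(self, text: str) -> None:
        self.text = text


class FakeAPIError(Exception):
    def __init__(self, status: int, body: dict, text: str) -> None:
        super().__init__(f"HTTP {status}")
        self.status_code, self.body, self.response = status, body, FakeResponse(text)


print(summarize_api_error(FakeAPIError(400, {}, '{"error": {"message": "unknown model"}}')))
# HTTP 400: unknown model
```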
teknium1	c918d42d88	feat(desktop): config-driven Electron launch flags + GPU policy Adds a desktop: section to config.yaml so headless/VM users can make `hermes desktop` launch correctly without a wrapper command: - desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11) appended to every launch. Accepts a list or a shell-split string. - desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU env var the Electron app already reads. An explicit env var still wins. cmd_gui() reads these via _desktop_launch_options() and applies them. This is the config.yaml form of the capability proposed as a raw env var in #38934 (@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var. Co-authored-by: ray <86501179+1RB@users.noreply.github.com>	2026-06-27 22:26:43 -07:00
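A sketch of how a `desktop:` section could be turned into extra Electron flags and an environment override. It uses a hypothetical `sketch_desktop_launch_options` helper; the real one is `_desktop_launch_options()`. The values written to `HERMES_DESKTOP_DISABLE_GPU` (`"1"`/`"0"`) are an assumption. The commit only says the setting is bridged to that env var and that an explicit env var wins.

```python
import shlex
from typing import Any, Dict, List, Mapping, Tuple

GPU_ENV = "HERMES_DESKTOP_DISABLE_GPU"


def sketch_desktop_launch_options(desktop_cfg: Mapping[str, Any],
                                  environ: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (extra Electron flags, env overrides) from a desktop: config section."""
    raw_flags = desktop_cfg.get("electron_flags") or []
    if isinstance(raw_flags, str):
        flags = shlex.split(raw_flags)  # "--a --b='x y'" -> ['--a', '--b=x y']
    else:
        flags = [str(flag) for flag in raw_flags]

    env: Dict[str, str] = {}
    # YAML may hand us a bool (true/false) or the string "auto".
    policy = str(desktop_cfg.get("disable_gpu", "auto")).strip().lower()
    if GPU_ENV not in environ and policy in ("true", "false"):
        env[GPU_ENV] = "1" if policy == "true" else "0"  # an explicit env var always wins
    return flags, env


flags, env = sketch_desktop_launch_options(
    {"electron_flags": "--ozone-platform=x11", "disable_gpu": True}, environ={}
)
print(flags, env)  # ['--ozone-platform=x11'] {'HERMES_DESKTOP_DISABLE_GPU': '1'}
```

Leaving `auto` unmapped means the Electron app keeps whatever default logic it already has.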
Rafael Millan	54ea059919	fix: fall back to no-sandbox for desktop launch on restricted Linux hosts	2026-06-27 22:16:20 -07:00
Teknium	4626ceb747	fix(gateway): only offer system-scope gateway install to root sessions (#53975) Non-root users picking 'System service' in the setup wizard were handed a 'sudo hermes gateway install --system --run-as-user <you>' recipe that fails on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs), so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user toward a system install they shouldn't be doing from a user session. Now prompt_linux_gateway_install_scope() only offers system scope when os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to re-run as root for a boot service. The non-root branch in install_linux_gateway_from_setup becomes a defensive guard that refuses without printing any self-elevation recipe. Gated the matching deferral hint in setup.py behind root too.	2026-06-27 21:24:08 -07:00
teknium1	f54c52800a	fix(models): scope live-first picker merge to opencode aggregators only Follow-up to the salvaged #49129 commit. The original change flipped the shared generic-provider merge in provider_model_ids() to live-first unconditionally, which regressed curated-first for single providers (kimi/zai, #46309) — and the PR encoded that regression by flipping the kimi-coding and zai test assertions to expect live-first. Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set ({opencode-zen, opencode-go}); every other provider keeps curated-first. Also widen the uncapped picker + live-first sets to opencode-go, which has the same 70+ model catalog problem as opencode-zen. Restore the kimi-coding curated-first test and rewrite the merge-order test to assert the per-provider contract.	2026-06-27 21:23:25 -07:00
Afnath Ahamed	f98ffbc246	fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker	2026-06-27 21:23:25 -07:00
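The per-provider merge contract from the two commits above, sketched in Python. `_LIVE_FIRST_PICKER_PROVIDERS` uses the provider names given in the commit; `merge_picker_models` and the sample model IDs are hypothetical.

```python
from typing import Iterable, List

# Provider names from the commit; the merge function itself is a sketch.
_LIVE_FIRST_PICKER_PROVIDERS = frozenset({"opencode-zen", "opencode-go"})


def merge_picker_models(provider: str, curated: Iterable[str], live: Iterable[str]) -> List[str]:
    """Curated-first for ordinary providers, live-first for the opencode aggregators; de-duplicated."""
    curated, live = list(curated), list(live)
    first, second = (live, curated) if provider in _LIVE_FIRST_PICKER_PROVIDERS else (curated, live)
    seen = set()
    merged = []
    for model_id in first + second:
        if model_id not in seen:  # keep the first occurrence, preserving order
            seen.add(model_id)
            merged.append(model_id)
    return merged


print(merge_picker_models("kimi-coding", ["k2", "k1"], ["k3", "k2"]))  # ['k2', 'k1', 'k3']
print(merge_picker_models("opencode-zen", ["a", "b"], ["c", "a"]))  # ['c', 'a', 'b']
```

Gating on an explicit set keeps the default (curated-first) behavior for every provider not named in it.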
Teknium	3b23a984b5	feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973) Multi-agent boards leak staleness: a sibling worker's parent handoff, comment, or prior-attempt summary gets read by the next worker as live truth even when it's a day old. build_worker_context surfaced the text with (at best) a bare absolute timestamp, which an LLM reads as fact regardless of age — parent results had no timestamp at all. Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every recalled-state line and a one-line 'point-in-time snapshot, re-verify against source' frame on the parent-results section, so the worker sees when handoffs were produced and re-checks stale ones before acting.	2026-06-27 21:21:54 -07:00
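A coarse relative-age stamp of the kind described above. The bucket boundaries (under a minute is "just now", then minutes, hours, days) and the `[age] text` prefix format are assumptions; the commit only gives "just now / 18h ago / 3d ago" as examples. `relative_age` and `stamp` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def relative_age(then: datetime, now: Optional[datetime] = None) -> str:
    """Coarse age stamp: 'just now', 'Nm ago', 'Nh ago', or 'Nd ago'."""
    now = now or datetime.now(timezone.utc)
    seconds = max(0, int((now - then).total_seconds()))  # clamp clock skew to zero
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"


def stamp(line: str, produced_at: datetime, now: Optional[datetime] = None) -> str:
    return f"[{relative_age(produced_at, now)}] {line}"


now = datetime(2026, 6, 27, 21, 0, tzinfo=timezone.utc)
print(stamp("parent handoff: schema migrated", now - timedelta(hours=18), now))
# [18h ago] parent handoff: schema migrated
```

A relative stamp is harder for a model to read past than an absolute timestamp, which is the point of the change.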
kshitijk4poor	2af1678bfc	fix(auth): explicit provider intent beats stale OAuth active_provider (#29285) `resolve_provider("auto")` checked `auth.json` `active_provider` BEFORE the config.yaml `model.provider` and env-var API-key checks. So a user who was OAuth-logged-into one provider (e.g. Anthropic) but had set an explicit `model.provider` or exported an API key (e.g. `OPENAI_API_KEY`) was silently routed to the stale OAuth provider — the override was invisible and surprising. Reorder the auto-path so explicit intent wins (the order the issue asks for): 1. explicit CLI api_key/base_url 2. config.yaml `model.provider` (safety net — see below) 3. OPENAI_API_KEY / OPENROUTER_API_KEY env 4. OpenRouter credential pool 5. provider-specific API-key env vars 6. auth.json `active_provider` (OAuth) ← demoted to last-resort 7. AWS Bedrock credential chain 8. error. `active_provider` is still honored — it's just a last-resort fallback chosen only when the user expressed no other preference, instead of overriding one. The normal chat/gateway/TUI/ACP/status path already resolves config.provider upstream in `resolve_requested_provider()` before "auto" is reached, so this duplicate config check is the safety net for the lone direct caller (`main.py` `resolve_provider("auto")`) and any future bypass. Because every surface funnels through this one resolver, the fix propagates everywhere with a single edit — no sibling path re-implements precedence. Also add a one-shot WARN when resolution lands on `active_provider` while a populated `model` config dict lacks a `provider` key — surfacing the silent override the issue reported without breaking first-install. Synthesizes the two competing PRs: #29615 (LifeJiggy — config-before-auth + the silent-override framing) and #29809 (Minksgo — the env-before-auth reorder). #29809 could not be merged directly (bundled unrelated, un-opt-in cost-tagging telemetry); its reorder idea is incorporated here and credited.
Tests: tests/hermes_cli/test_provider_precedence.py — config/env beat stale OAuth, OAuth still used as last resort, explicit request short-circuits, WARN fires on silent fall-through. Full provider-resolution suites: 374 passed. Fixes #29285 Co-authored-by: LifeJiggy <141562589+LifeJiggy@users.noreply.github.com> Co-authored-by: Minksgo <153416856+Minksgo@users.noreply.github.com>	2026-06-27 19:49:02 -07:00
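The reordered auto-path above can be read as a straight sequence of early returns. In this sketch `resolve_auto_source` reports which source wins rather than a concrete provider, because the commit does not say which provider each source maps to. The function, the `NoProviderConfigured` exception, and the provider-specific variable list are hypothetical.

```python
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("provider_precedence_sketch")

GENERIC_KEY_VARS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")
PROVIDER_KEY_VARS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY")  # hypothetical subset
_warned = False


class NoProviderConfigured(RuntimeError):
    pass


def resolve_auto_source(*, cli_api_key: Optional[str] = None, cli_base_url: Optional[str] = None,
                        model_cfg: Optional[Mapping] = None, env: Optional[Mapping[str, str]] = None,
                        openrouter_pool_ready: bool = False, active_provider: Optional[str] = None,
                        bedrock_ready: bool = False) -> str:
    """Return which source wins, in the order listed in the commit (1 = highest)."""
    global _warned
    model_cfg = model_cfg or {}
    env = os.environ if env is None else env
    if cli_api_key or cli_base_url:                      # 1. explicit CLI credentials
        return "cli"
    if model_cfg.get("provider"):                        # 2. config.yaml model.provider
        return f"config:{model_cfg['provider']}"
    for var in GENERIC_KEY_VARS:                         # 3. generic env keys
        if env.get(var):
            return f"env:{var}"
    if openrouter_pool_ready:                            # 4. OpenRouter credential pool
        return "pool:openrouter"
    for var in PROVIDER_KEY_VARS:                        # 5. provider-specific env keys
        if env.get(var):
            return f"env:{var}"
    if active_provider:                                  # 6. OAuth, now a last resort
        if model_cfg and "provider" not in model_cfg and not _warned:
            _warned = True  # one-shot warning about the silent fallback
            log.warning("model config has no 'provider'; falling back to OAuth provider %r",
                        active_provider)
        return f"oauth:{active_provider}"
    if bedrock_ready:                                    # 7. AWS credential chain
        return "bedrock"
    raise NoProviderConfigured("no provider could be resolved")  # 8. error
```

With a stale OAuth login and an exported `OPENAI_API_KEY`, the env key now wins; the OAuth provider is used only when nothing above it matched.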
郝鹏宇	98488c4be4	fix(config): prevent save_config from materialising schema defaults Fixes #27354 Root cause: called during init (or by any code path that saves ) wrote injected schema defaults into config.yaml as if the user had authored them. Two fix layers: 1. now only injects when the user actually set somewhere (root or agent). A user who never set keeps it absent, so 's explicit-path detection won't treat it as user-authored. 2. gains a parameter and a new pass that removes keys matching unless those paths were explicitly present in the raw (pre-normalization) config on disk. Explicit-path detection uses on before any normalisation runs — preventing injected-in defaults from being mistaken for user-set values. All migration and edit-config call sites pass to preserve their intentional default-seeding behaviour. New helpers: - — collects leaf-key paths from a raw dict - — removes keys matching schema defaults Test coverage: 4 new regression tests (59 total, all passing).	2026-06-27 19:38:11 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/redirect/check_output exempt) and POSIX-only-program-aware (launchctl/systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
ailthrim	25ec01f79f	fix(desktop): don't purge Electron cache / mirror-retry after a late build failure `hermes desktop` / `hermes update` recover from a corrupt Electron download by purging the cached zip + re-downloading and retrying the pack, and then by falling back to a public mirror. That recovery is only meaningful when the packaged executable is MISSING — the signature of a partial/corrupt unpack. A LATE failure such as macOS code signing (#40187) leaves `Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place. Re-downloading Electron can't repair a signing failure, so the purge + slow mirror retry just grind through another identical failure before the build finally errors out. Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None` so a build that already produced the executable fails fast instead of triggering the destructive download recovery. The corrupt-download path (executable missing) is unchanged. Salvage of #42782, re-applied onto current main (the surrounding recovery was refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR was opened). Adds a regression test asserting no purge / mirror retry runs when the executable exists, and updates the existing retry/mirror tests to model the corrupt-download case (executable absent) the recovery is actually for. Related to #40187 (the residual cache-purge sub-issue; the signing failure itself is fixed by #52591).	2026-06-28 00:29:34 +05:30
teknium1	1ef19bad90	fix(model): show MoA preset picker on selection and label MoA in the banner Selecting 'Mixture of Agents' in the `hermes model` provider picker fell through silently — select_provider_and_model had no moa branch, so it just reprinted the current model/provider summary and exited. And the CLI session banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'), which is meaningless out of context. - Add _model_flow_moa: always lists the available presets (even one), then prints the full reference-models + aggregator breakdown for the selection and persists model.provider=moa / model.default=<preset> (dropping stale base_url + endpoint creds, since moa is a virtual local provider). - Wire the branch into select_provider_and_model. - build_welcome_banner takes provider; when 'moa' it renders 'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call sites pass self.provider. Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker persists the preset and clears stale base_url/api_key.	2026-06-27 11:45:07 -07:00
Teknium	27322612b4	fix(update): route loud build/installer output to update.log instead of the terminal (#53616) * fix(update): route loud build/installer output to update.log instead of the terminal hermes update flooded the terminal with the full vite asset dump, electron-builder logs, npm deprecation warnings from the desktop build, and the cua-driver installer's 'Next steps' wall. All of that is low-signal noise the user doesn't need on a successful update. - Capture the desktop --build-only subprocess (vite + electron-builder) into ~/.hermes/logs/update.log; print a one-line status, and on failure surface the last 15 lines + a pointer to the full log. - Capture the cua-driver installer's output when verbose=False (the hermes update refresh path); concise upgrade line is unchanged. - Add _log_only_write() / _run_logged_subprocess() helpers that write to the update.log handle without echoing to the terminal. The repo-root npm install keeps streaming (capture_output=False) — that is the deliberate #18840 guard so a slow postinstall download doesn't look hung. The desktop npm install is a separate Electron process with no such progress concern and is captured. * fix(update): persist full cua-driver installer output to update.log The captured cua-driver installer output was only sent to logger.debug (agent.log) on failure, so the 'Next steps' wall was lost from update.log entirely on success. Write the full captured output straight to the update.log handle (sys.stdout._log) on both success and failure, matching the desktop-build capture, so update.log keeps the complete record of everything an update did.	2026-06-27 11:43:01 -07:00
Teknium	917f6bdb00	fix(tools): let vision pick any provider+model, not just OpenRouter (#53606) * fix(tools): let vision pick any provider+model, not just OpenRouter hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY. It now offers the same any-provider surface as the model command: Auto (use main model / aggregator fallback), pick any authenticated provider + model, or a custom OpenAI-compatible endpoint. Selections persist to auxiliary.vision.{provider,model,base_url} — the keys the vision resolver already reads. Custom endpoint pins provider=custom so base_url routes correctly. Reconfigure path uses the same picker instead of re-prompting for OPENROUTER_API_KEY. * docs: add PR infographic for vision any-provider picker	2026-06-27 04:41:42 -07:00
ms-alan	16192103f4	fix(config): accept placeholder base_url in custom provider validation _normalize_custom_provider_entry() ran urlparse() on base_url and dropped any entry whose value was an un-expanded placeholder, so a caller reaching the normalizer with raw config (e.g. the Dockerized gateway path) silently skipped the provider with a 'not a valid URL' warning. Skip URL validation when the candidate contains a placeholder token — both ${ENV_VAR} env-refs and bare {region}-style templates — since those are expanded at runtime. Closes #14457	2026-06-27 04:15:27 -07:00
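A sketch of the placeholder check described above. It assumes a hypothetical `base_url_ok` helper and a regex that treats both `${ENV_VAR}` references and bare `{name}` tokens as placeholders. Requiring an http(s) scheme and a host for non-templated URLs is also an assumption about what "valid URL" means here.

```python
import re
from urllib.parse import urlparse

# ${ENV_VAR} env references and bare {region}-style template tokens.
_PLACEHOLDER = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}|\{[A-Za-z_][A-Za-z0-9_]*\}")


def base_url_ok(base_url: str) -> bool:
    """Defer validation for templated URLs; otherwise require an http(s) scheme and a host."""
    if _PLACEHOLDER.search(base_url):
        return True  # expanded at runtime, so urlparse() would judge the wrong string
    parsed = urlparse(base_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(base_url_ok("${LLM_BASE_URL}"), base_url_ok("not a url"))  # True False
```

Validation of a templated value is effectively postponed until the placeholder has been expanded.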
Teknium	5ab4136631	fix(webui): switch provider when Config-page model field changes (#53583) The dashboard Config tab's Model field is a flat string with no provider info. _denormalize_config_from_web only updated model.default and kept the stale provider, so picking an OpenRouter model while the default provider was ollama-local left provider=ollama-local and every call 404'd. When the model string actually changes, infer the serving provider — curated catalog first, then a vendor/model-slug heuristic for non-aggregator providers — and route the switch through the existing _normalize_main_model_assignment / _apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are cleared on a provider change and preserved on a same-provider re-pick. Saving an unchanged model never re-detects, so unrelated config saves keep an explicit provider. Closes #14058	2026-06-27 04:13:44 -07:00
blaryx	76af2456a2	fix(dashboard): merge PUT /api/config with existing on-disk config The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate every root-level key the YAML supports. Most visibly, `custom_providers` is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend never sends it in the PUT body. The previous full-replace save() then silently wiped the key from disk every time the user clicked anything that triggered a save. Other casualties (less visible because defaults re-mask them on load) include `agent.personalities`, `agent.reasoning_effort`, `terminal.lifetime_seconds`, etc. Fix: read the raw on-disk config and deep-merge the incoming PUT body on top of it before saving. The frontend can only overwrite what it explicitly sends; everything else is preserved verbatim. Reuses the existing `_deep_merge` helper from `hermes_cli.config`. Tests: - `test_round_trip_preserves_custom_providers` exercises the exact bug: seed config with custom_providers, GET → drop the key → PUT, assert it's still on disk. - `test_round_trip_preserves_schema_invisible_nested_keys` covers the shallow-vs-deep-merge case for nested dicts under `agent` etc. Both fail on current main; both pass with this patch.	2026-06-27 03:48:18 -07:00
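Merge-on-save in miniature: the PUT body is layered on top of the on-disk config, so keys the form never sends survive. `deep_merge` here is a hypothetical stand-in written for this example, not `hermes_cli.config._deep_merge`. The config keys other than `custom_providers` and `agent.personalities` are made up.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with incoming layered on top; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace wholesale
    return merged


on_disk = {
    "custom_providers": [{"name": "lab-vllm", "base_url": "http://10.0.0.5:8000/v1"}],
    "agent": {"personalities": {"terse": "Answer briefly."}, "max_turns": 60},
}
put_body = {"agent": {"max_turns": 90}}  # the form never sends custom_providers
saved = deep_merge(on_disk, put_body)
print(saved["custom_providers"][0]["name"], saved["agent"])
# lab-vllm {'personalities': {'terse': 'Answer briefly.'}, 'max_turns': 90}
```

One consequence of this design, at least in the sketch, is that omitting a key from the PUT body never deletes it, so removing a setting requires sending an explicit value.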
dodo-reach	ed54469d06	fix(gateway): show MoA presets in model picker	2026-06-27 03:43:38 -07:00
briandevans	17cb829991	test(moa): cover non-list/bare-dict reference_models normalization	2026-06-27 03:43:16 -07:00
Teknium	60f58a2b95	feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552) The verify-on-stop guard fired too eagerly — including on doc/markdown/skill edits with nothing to verify, where it pushed a pointless /tmp verification script. Three changes: 1. Default OFF for new installs: agent.verify_on_stop defaults to false (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31. 2. One-time migration (v30 -> v31): existing installs are switched off once, but only when the value is missing or still the "auto" sentinel — an explicit true/false the user set is preserved. 3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on the code paths. The legacy "auto" sentinel is still honored when set explicitly (ON for interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env override unchanged.	2026-06-27 03:23:22 -07:00
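The doc-only filter and the one-time migration rule can be sketched under a few assumptions. `is_doc_path`, `paths_to_verify`, and `migrate_verify_on_stop` are hypothetical names. The suffix and name sets contain only the entries the commit lists, and its own list ends in "...".

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Only the entries named in the commit; the real list is longer.
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt"}
DOC_NAMES = {"LICENSE", "CHANGELOG"}


def is_doc_path(path: str) -> bool:
    p = PurePosixPath(path)
    return p.suffix.lower() in DOC_SUFFIXES or p.name.upper() in DOC_NAMES


def paths_to_verify(edited: Iterable[str]) -> List[str]:
    """Drop documentation/prose paths; an empty result means no verify-on-stop nudge."""
    return [path for path in edited if not is_doc_path(path)]


def migrate_verify_on_stop(config: dict) -> None:
    """One-time v30 -> v31 switch: turn off only a missing or 'auto' value."""
    agent = config.setdefault("agent", {})
    if agent.get("verify_on_stop", "auto") == "auto":
        agent["verify_on_stop"] = False  # an explicit True/False is left as-is


print(paths_to_verify(["README.md", "docs/setup.rst", "LICENSE"]))  # []
print(paths_to_verify(["README.md", "agent/loop.py"]))  # ['agent/loop.py']
```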
Versun	c655cdf2c1	feat(dashboard): expose cron job execution fields	2026-06-27 03:20:32 -07:00
Teknium	d712a7fd73	fix(model-picker): surface the current custom/uncurated model in picker rows (#53457) A model selected via the CLI (e.g. /model openrouter/<uncurated-name>) was absent from every model picker — the main picker AND the MoA reference/aggregator slot pickers — because each provider row only carried its curated catalog. Inject the current model at the front of its provider's row so it is selectable and shown everywhere.	2026-06-27 00:06:34 -07:00
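A minimal sketch of the injection step, assuming picker rows are a mapping from provider name to a list of model ids. `inject_current_model` is a hypothetical helper and the model ids are placeholders.

```python
def inject_current_model(
    rows: dict[str, list[str]], provider: str, model: str
) -> dict[str, list[str]]:
    """Return a copy of `rows` with `model` selectable in its provider's row.

    A model missing from the curated catalog is prepended; one that is
    already listed stays where the catalog put it. Input lists are not
    mutated.
    """
    updated = {name: list(models) for name, models in rows.items()}
    row = updated.setdefault(provider, [])
    if model not in row:
        row.insert(0, model)
    return updated
```

Leaving an already-curated model in place is a design choice of this sketch; the commit does not say whether the real picker also moves curated models to the front.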
ethernet	bcc3eb3419	fix(ci): rip out some xdist legacy stuff... how did these ever work??	2026-06-26 19:15:18 -07:00
Nacho Avecilla	dbe734beff	fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239) * Return None instead of erroring on drain login failure * Fix login on drain * Remove login for drained endpoints flow and clean the code * chore: drop unrelated credits changes from this PR * Remove extra comments that were not really necessary	2026-06-27 10:08:13 +10:00
kshitijk4poor	7475d125d2	test(mcp): stub mcp_oauth in backgrounding test to deflake CI The backgrounding-contract test (test_prepare_agent_startup_backgrounds_blocking_mcp_for_chat) failed intermittently on loaded CI shards: it stubs tools.mcp_tool.discover_mcp_tools but NOT tools.mcp_oauth, so the background discovery thread paid the real, cold ~0.75s 'import tools.mcp_oauth' (added by this PR's _discover_mcp_tools_without_interactive_oauth) before calling the stubbed discovery. On a slow/loaded runner that import plus thread scheduling exceeded the 1.0s polling deadline, leaving calls['mcp'] == 0. Fix: stub tools.mcp_oauth with a nullcontext suppress_interactive_oauth (the same no-op production falls back to when mcp_oauth is unavailable), so the test exercises the backgrounding contract without paying an unrelated cold import in its timing window. Bumped the poll deadline 1.0s -> 3.0s as belt-and-suspenders. Production behaviour is unchanged; the import cost was always off the main thread. Verified: 5/5 pass repeatedly via scripts/run_tests.sh (per-file isolation, matching CI), ruff clean.	2026-06-27 04:59:23 +05:30
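The stubbing technique can be shown without pytest. This sketch assumes the code under test resolves `tools.mcp_oauth` at call time, here through `importlib.import_module`, which returns an entry already present in `sys.modules` instead of importing the real module. `stubbed_module` and `discover_without_interactive_oauth` are hypothetical; in a pytest suite, `monkeypatch.setitem(sys.modules, ...)` performs the same install-and-restore.

```python
import contextlib
import importlib
import sys
import types
from collections.abc import Iterator


@contextlib.contextmanager
def stubbed_module(name: str, **attrs: object) -> Iterator[types.ModuleType]:
    """Temporarily replace sys.modules[name] with a stub carrying `attrs`."""
    stub = types.ModuleType(name)
    for attr, value in attrs.items():
        setattr(stub, attr, value)
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = stub
    try:
        yield stub
    finally:
        # Restore whatever was there before (or nothing at all).
        if previous is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous


def discover_without_interactive_oauth() -> str:
    # Hypothetical caller: resolve the OAuth module lazily, then run
    # discovery inside its suppression context.
    mcp_oauth = importlib.import_module("tools.mcp_oauth")
    with mcp_oauth.suppress_interactive_oauth():
        return "discovered"
```

With the stub in place, `suppress_interactive_oauth` is `contextlib.nullcontext`, so the timed section no longer pays for a cold import.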
zapabob	e55ddc3e33	fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927) When an MCP server requires OAuth, the interactive `hermes` TUI froze on startup: background MCP discovery hit the OAuth flow, which on an interactive TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the "paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That thread competes with the TUI's own stdin reader for the same terminal, so keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL ... the tui is freezing, not respond to typing." Add a thread-local `suppress_interactive_oauth()` context manager in tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the stdin paste-thread and prompt are never created. Background discovery (hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside that context, so OAuth-requiring servers soft-skip (raise OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin. A real `hermes mcp login` on the main thread is unaffected (thread-local). Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick; resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_discovery refactor, keeping both). Verified E2E: with suppression the paste prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive soft-skip); without it the prompt shows (the freeze). Mutation-verified (removing the suppress check in _is_interactive fails the regression test). 76 tests pass, ruff clean. Closes #35927.
SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT propagate to the dedicated mcp-event-loop thread where OAuth actually runs (discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so the suppression was a NO-OP in production (the tests passed only by stubbing out the cross-thread dispatch). Converted to a contextvars.ContextVar, which asyncio copies onto the scheduled coroutine — empirically verified suppression now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path. Added a cross-thread regression test (fails on threading.local, passes on the ContextVar) so the no-op can't regress.	2026-06-27 04:59:23 +05:30
kshitijk4poor	244a6f2ceb	fix(desktop): broken "Open setup guide" button for plugin platforms On the desktop Channels / Messaging page, the "Open setup guide" button was rendered as a bare <a href={platform.docs_url} target="_blank"> with no guard. Plugin-provided platforms (Microsoft Teams, Google Chat, Line, Raft, Yuanbao, …) ship an empty docs_url, so the anchor's href was "". In a packaged build, Electron resolves an empty href against the current document — the app's own index.html inside the asar bundle — and shell.openPath then fails with an OS "file not found" dialog. This is exactly the Windows error reported for Messaging → Teams → Open guide. Fix (3 changes): 1. fix(desktop) — Only render the "Open setup guide" button when docs_url is non-empty, and route clicks through openExternalLink so a relative/empty value can never be treated as a local bundle path. Fixes the whole class (every plugin platform), not just Teams. 2. fix(messaging) — Give the Teams platform plugin a real docs_url (Microsoft Teams setup guide) so its card shows a working button instead of nothing. 3. fix(messaging) — Give the Google Chat platform plugin a real docs_url (Google Chat setup guide) so its card shows a working button instead of nothing. Originally from #48940; folded in here because that PR's test was broken (it queried the HTTP endpoint, but google_chat is a dynamic enum member that only appears after the adapter module is imported). Test plan: - apps/desktop — new src/app/messaging/index.test.tsx: button is hidden when docs_url is empty; a real URL opens via the validated external opener (does not navigate). - apps/desktop typecheck (tsc --noEmit) clean. - backend — test_teams_messaging_metadata_links_setup_guide: the Teams catalog entry exposes the setup-guide docs_url. - backend — test_google_chat_messaging_metadata_links_setup_guide: the Google Chat catalog entry exposes the setup-guide docs_url. 
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: p-andhika <andhika.prakasiwi@gmail.com>	2026-06-27 04:34:08 +05:30
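The guard in change 1 comes down to one check: only treat `docs_url` as a link when it is an absolute web URL. The desktop fix itself is TypeScript; the check is restated here in Python for illustration only. `is_openable_docs_url` is hypothetical and is not the validation inside `openExternalLink`.

```python
from typing import Optional
from urllib.parse import urlparse


def is_openable_docs_url(docs_url: Optional[str]) -> bool:
    """Return True only for absolute http(s) URLs that include a host.

    Empty or relative values return False, so the caller can hide the
    "Open setup guide" button instead of letting "" resolve against the
    app's own index.html.
    """
    if not docs_url or not docs_url.strip():
        return False
    parsed = urlparse(docs_url.strip())
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)
```

Rejecting `file:` and scheme-less values, not just the empty string, is what keeps the whole class of plugin platforms from ever being opened as a local bundle path.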
Teknium	7e101e553b	fix(moa): block the moa virtual provider as a reference or aggregator slot (#53281) A MoA preset whose reference or aggregator slot points at the moa virtual provider creates a recursive MoA tree. The runtime guards in moa_loop.py only surface this mid-turn (references silently skipped, aggregator raises). Reject it at the config chokepoint (_clean_slot) so it can never be saved, and hide it from the desktop/dashboard slot pickers so it isn't offered as a dead choice.	2026-06-26 14:42:42 -07:00
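A sketch of rejecting the virtual provider at a single validation point and filtering it out of picker choices. It assumes slots are dicts with `provider` and `model` keys, that the virtual provider's id is the string `moa`, and that "reject" means raising `ValueError`. `clean_slot` and `slot_picker_providers` are hypothetical stand-ins for `_clean_slot` and the picker code.

```python
from typing import Any

MOA_PROVIDER = "moa"  # assumed id of the virtual MoA provider


def clean_slot(slot: dict[str, Any]) -> dict[str, str]:
    """Validate one reference/aggregator slot, rejecting the moa provider."""
    provider = str(slot.get("provider") or "").strip()
    model = str(slot.get("model") or "").strip()
    if provider.lower() == MOA_PROVIDER:
        raise ValueError("MoA slots cannot use the 'moa' provider (recursive MoA tree)")
    return {"provider": provider, "model": model}


def slot_picker_providers(providers: list[str]) -> list[str]:
    """Providers offered in slot pickers, with the moa provider hidden."""
    return [p for p in providers if p.strip().lower() != MOA_PROVIDER]
```

Rejecting at save time means the mid-turn runtime guards become a backstop rather than the first place the misconfiguration shows up.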
srojk34	f0678b031e	fix(moa): tolerate non-numeric values in hand-edited MoA preset config _normalize_preset uses bare float() and int() to coerce reference_temperature, aggregator_temperature, and max_tokens from config.yaml. When a user hand-edits a non-numeric value (e.g. max_tokens: "8k" or reference_temperature: "hot"), the coercion raises ValueError. Since normalize_moa_config runs on every model-selection and MoA turn (via resolve_moa_preset), the crash is unrecoverable and blocks all MoA usage until the config is manually fixed. Replace the bare casts with _coerce_float / _coerce_int helpers that fall back to the default on TypeError/ValueError instead of raising.	2026-06-26 14:35:38 -07:00
kyssta-exe	c0568ca95f	fix(config): use read_raw_config() in migrations to prevent expanding defaults (#40821)	2026-06-26 22:40:52 +05:30
brooklyn!	5cc4009deb	Merge pull request #52828 from helix4u/fix/desktop-backend-update-indicator fix(desktop): show remote backend updates without counts	2026-06-26 11:49:07 -05:00
kshitij	7b2c51152a	Merge pull request #52990 from NousResearch/salvage/52889-backup-projects-kanban fix(backup): include projects.db and kanban boards in pre-update snapshot (#52889)	2026-06-26 20:09:15 +05:30
0xDevNinja	9ef49cd78f	fix(backup): include projects.db, kanban boards, and sibling stores in pre-update snapshot (#52889) projects.db (per-profile project store) and kanban.db were missing from _QUICK_STATE_FILES, so the pre-update quick snapshot never backed them up. On a desktop upgrade, when the update flow removes/replaces the file and the post-update schema-init re-creates an empty one, all user-created projects, folder mappings, the active-project pointer, kanban board bindings, and tasks vanish silently — no error. Add the per-profile user-created stores to the snapshot set: - projects.db — project store - response_store.db — gateway conversation history / tool payloads (WAL) - memory_store.db — holographic memory facts/entities (WAL) - verification_evidence.db — agent verification audit trail - kanban.db — default board (back-compat <root>/kanban.db) - kanban/boards — non-default boards (<root>/kanban/boards/<slug>/kanban.db + metadata); workspaces/ and attachments/ subtrees are skipped as large + regenerable. Also: the directory-branch of create_quick_snapshot now routes *.db through the WAL-safe _safe_copy_db (SQLite backup() API), matching the top-level file path — previously a non-default board DB with an open WAL could be copied inconsistently. Salvaged from #52930 by @0xDevNinja (authorship preserved via cherry-pick). On top of the original (which covered only projects.db + the default kanban.db), this adds: non-default-board coverage, the three sibling per-profile DBs that meet the same upgrade-wipe criteria, WAL-safe directory copies, and a workspaces/attachments skip to avoid snapshot bloat (×20 retained). 8 tests, all mutation-verified; E2E verified snapshot→wipe→restore preserves all six store types on the real code path. Closes #52889. Supersedes #52930.	2026-06-26 19:23:33 +05:30
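The WAL-safe directory copy can be sketched with the standard-library `sqlite3` online backup API (`Connection.backup`). `safe_copy_db` and `snapshot_tree` are hypothetical stand-ins for `_safe_copy_db` and the directory branch of `create_quick_snapshot`. Skipping the `-wal`/`-shm` sidecar files is an assumption of this sketch; it is safe because `backup()` reads through a normal connection and so already includes committed WAL content in the copied database.

```python
import shutil
import sqlite3
from pathlib import Path

SKIP_DIRS = {"workspaces", "attachments"}  # large, regenerable subtrees
SIDECAR_SUFFIXES = ("-wal", "-shm")  # folded into the .db copy by backup()


def safe_copy_db(src: Path, dst: Path) -> None:
    """Copy a SQLite database consistently, even while another connection has it open in WAL mode."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(str(src))
    try:
        target = sqlite3.connect(str(dst))
        try:
            source.backup(target)
        finally:
            target.close()
    finally:
        source.close()


def snapshot_tree(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy a store directory, routing *.db files through safe_copy_db."""
    copied: list[Path] = []
    for path in sorted(src_root.rglob("*")):
        rel = path.relative_to(src_root)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if not path.is_file() or path.name.endswith(SIDECAR_SUFFIXES):
            continue
        target = dst_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        if path.suffix == ".db":
            safe_copy_db(path, target)  # consistent copy, WAL included
        else:
            shutil.copy2(path, target)  # plain files such as board metadata
        copied.append(rel)
    return copied
```

A raw `shutil.copy2` of `kanban.db` alone could miss rows that are still only in `kanban.db-wal`; the backup API avoids that without having to checkpoint the live database.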
Dr1985	e3db1ef92d	fix(macos): clearly distinguish launchd supervision from detached fallback in gateway status ## Description On macOS 26.x, `launchctl bootstrap` and `launchctl kickstart` return exit code 5 ("Input/output error"), which Hermes already anticipates and handles by spawning a detached fallback process. However, the gateway status reporting is ambiguous: - `gateway status` says "Gateway service is loaded" (because `launchctl list` returns exit 0) - But `launchctl print` shows `state = not running` — launchd isn't actually supervising anything - The detached fallback PID running is invisible to the status command - Users can't tell whether auto-start at login and auto-restart on crash are available ### Root Cause Two problems in `hermes_cli/gateway.py`: 1. `_probe_launchd_service_running()` (line 1067): Determined launchd service liveness solely by `launchctl list <label>` exit code.
On macOS 26, this returns 0 even when the service is only registered but not running (output lacks a `"PID"` field). This caused `GatewayRuntimeSnapshot.service_running = True` incorrectly, which suppressed the process/service mismatch warning. 2. `launchd_status()` (line 3569): Used the same binary "loaded/not loaded" check without inspecting whether launchd actually has a PID, whether a detached fallback is running, or whether auto-start/restart are available. ### Changes `hermes_cli/gateway.py`: 1. New `_parse_launchd_pid_from_list_output()` helper — Extracts the PID from `launchctl list` output. When launchd is actively supervising, the output includes `"PID" = <number>;`. When only registered but not running, no PID field is present. 2. Fixed `_probe_launchd_service_running()` — Now requires a PID in the `launchctl list` output to confirm launchd is actually supervising. This correctly sets `service_running = False` when launchd has the service registered but `state = not running`, which triggers the existing process/service mismatch detection. 3. Reworked `launchd_status()` — Reports clearly separated information: - LaunchAgent plist currentness (stale or current) - Whether launchd is actively supervising (with PID) - Whether a detached fallback PID is running - Whether auto-start at login and auto-restart on crash are available - When launchd supervision is known to be unavailable, explains why 4. Persistent unsupported marker (`~/.hermes/.gateway-launchd-unsupported`) — Written when `_launchd_fallback_to_detached()` is called (launchd exit 5/125). Allows `launchd_status()` to explain why launchd can't supervise even when no fallback process is currently running. Cleared automatically when a future bootstrap/kickstart succeeds (e.g., after an OS update fixes the issue). 5. Updated `_print_gateway_process_mismatch()` — Distinguishes the managed detached fallback from a genuinely manual `nohup hermes gateway run`, providing accurate guidance for each case. 
### Status Output Examples

Before (macOS 26, fallback active):

```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway service is loaded
{
	"Label" = "ai.hermes.gateway";
	"OnDemand" = true;
	...
};
```

After (macOS 26, fallback active):

```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
⚠ Gateway service is registered but launchd is not supervising it
launchd cannot manage the gateway on this macOS version.
✓ Detached fallback process is running (PID 12345)
Cron jobs will fire. Stop with: hermes gateway stop
⚠ Auto-start at login and auto-restart on crash are NOT available.
```

After (normal launchd supervision):

```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway is supervised by launchd (PID 12345)
Auto-start at login and auto-restart on crash are available.
```

### Tests

Updated 5 existing tests and added 11 new tests in `tests/hermes_cli/test_gateway_service.py`:

- PID parsing from `launchctl list` output (with PID, without PID, empty, unquoted PID)
- `_probe_launchd_service_running()` requires PID presence
- Unsupported-marker lifecycle (write, clear, persist across fallback)
- Marker cleared on successful bootstrap
- `launchd_status()` reporting: supervised, fallback-running, fallback-unavailable
- Existing fallback tests now verify marker creation

### Related Issues

- Issue #23387 (original macOS 26 launchd workaround)
- Issue #42524 (this issue)	2026-06-26 16:30:30 +05:30
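The PID check at the heart of this change can be sketched as a regex over `launchctl list <label>` output, where a supervised service carries a `"PID" = <number>;` entry. `parse_launchd_pid` and `launchd_is_supervising` are hypothetical stand-ins for `_parse_launchd_pid_from_list_output()` and `_probe_launchd_service_running()`. Accepting an unquoted `PID` key mirrors the "unquoted PID" test case; the exact pattern is an assumption.

```python
import re
from typing import Optional

# Matches `"PID" = 12345;` and the unquoted `PID = 12345;` form, one per line.
_PID_RE = re.compile(r'^\s*"?PID"?\s*=\s*(\d+)\s*;', re.MULTILINE)


def parse_launchd_pid(list_output: str) -> Optional[int]:
    """Return the PID from `launchctl list <label>` output, or None if absent."""
    match = _PID_RE.search(list_output or "")
    return int(match.group(1)) if match else None


def launchd_is_supervising(returncode: int, list_output: str) -> bool:
    """Exit code 0 alone only means 'registered'; also require a live PID."""
    return returncode == 0 and parse_launchd_pid(list_output) is not None
```

Anchoring on the start of a line and requiring `=` right after the key keeps neighbouring keys such as `"LastExitStatus"` from being mistaken for the PID.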


1701 commits