hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-24 16:54:43 +00:00

Author	SHA1	Message	Date
Teknium	9c9d9113a8	fix(auth): auto-detect OpenRouter credential from the pool, not just env (#42263 ) resolve_provider() auto-detection only checked OPENROUTER_API_KEY/ OPENAI_API_KEY env vars, never the credential pool. A key added via `hermes auth add openrouter` (manual pool entry, no env var) was invisible: the provider failed to resolve or resolved with an empty api_key, so requests went out with no Authorization header and OpenRouter returned "HTTP 401: Missing Authentication header" while `hermes auth list` showed the credential. Closes #42130. - auth.py: check load_pool("openrouter").has_credentials() after the env check - dump.py: `debug share` shows 'openrouter set (auth pool)' instead of the misleading 'not set' when the key lives in the pool - add regression tests (pool credential auto-detects; empty pool still raises)	2026-06-08 10:01:47 -07:00
teknium1	a77efada5f	refactor(cli): extract 18 model-flow wizard functions into model_setup_flows (god-file Phase 2) Lift the 18 _model_flow_* provider-setup wizard functions out of hermes_cli/main.py into hermes_cli/model_setup_flows.py. Behavior-neutral; main.py 14050 -> 11479 LOC. select_provider_and_model (the dispatcher) STAYS in main.py and re-imports the flows via an explicit 'from hermes_cli.model_setup_flows import (...)' block, so both its bare-name calls and existing test monkeypatches targeting hermes_cli.main._model_flow_* keep resolving against main's namespace unchanged. Imports: 3 neutral deps (argparse, os, subprocess) at the module top; the 14 main.py-internal helpers the flows call (_prompt_api_key, _save_custom_provider, the reasoning-effort/stepfun/qwen helpers, _run_anthropic_oauth_flow, ...) are lazy-imported per-flow (from hermes_cli.main import ...) so the new module never imports main at module scope -> no import cycle. Repointed one source-inspection change-detector (test_setup_ollama_cloud_force_refresh) to read the module the ollama-cloud branch moved to. Validation: 6563/6563 hermes_cli tests pass; live flow-dispatch probe confirms the lazy main-internal imports resolve at runtime.	2026-06-08 09:42:44 -07:00
yoniebans	9e360681f8	feat(dashboard): return recent commits from /api/hermes/update/check Add a best-effort `commits` list (sha/summary/author/at) to the update-check response for git/pip installs that are behind upstream, so the desktop's remote update overlay can show what's changed before applying. Additive and non-breaking: existing consumers (legacy dashboard, tests using subset assertions) ignore the new field. Leaves the shared check_for_updates() int contract untouched — commits come from a separate best-effort git call.	2026-06-08 08:58:26 -07:00
paulb26	b31c6c33b2	fix(pty-bridge): terminate PTY process groups on teardown	2026-06-08 07:03:12 -07:00
kshitij	b99c6c4277	Merge #42076 : nested category plugin discovery + alias-normalized enable/disable (#41066 ) Merge #42076: nested category plugin discovery + alias-normalized enable/disable (#41066) Lands the complete nested category plugin fix: - Discovery in `hermes plugins list` (from @islam666's #41076, carried in this PR) - Alias-normalized enable/disable mutation path so nested plugins can be toggled - Fixes the #41076 base breakages (web_server 6-tuple unpack + stale test fixtures) Co-authored work: discovery by @islam666 (#41076). Closes #41066.	2026-06-08 05:47:27 -07:00
kshitijk4poor	2b89afec79	fix(plugins): alias-normalize enable/disable for nested category plugins (follow-up to #41076 ) #41076 makes `hermes plugins list` discover nested category plugins (e.g. observability/nemo_relay). This adds the missing enable/disable mutation path so those plugins can actually be toggled, and fixes two incomplete-update breakages on the #41076 base. Before: `hermes plugins enable nemo_relay` -> "Plugin 'nemo_relay' is not installed or bundled." (exit 1), because cmd_enable/cmd_disable went through _plugin_exists(), which only checked top-level plugins/<name>/. Changes: - Add _resolve_plugin_key(): resolve a bare manifest/leaf name OR a full path-derived key (observability/nemo_relay) to the canonical key the runtime loader gates on, reusing #41076's _discover_all_plugins(). A bare leaf name ambiguous across two categories resolves to None rather than silently picking one. - cmd_enable/cmd_disable resolve first, persist the canonical key, and drop any stale legacy bare-name alias so the enabled/disabled lists can't drift into a contradictory state. _plugin_exists delegates to the same resolver. - Fix #41076 base breakages: _discover_all_plugins now returns 6-tuples, but web_server._merged_plugins_hub() still unpacked 5 (ValueError on the dashboard plugins-hub endpoint) and several test_plugins_cmd_list.py fixtures were still 5-tuples. Both updated; the hub status check is now key-aware. Verified e2e on the real CLI + runtime loader (isolated HERMES_HOME): `hermes plugins enable nemo_relay` writes observability/nemo_relay to config.yaml and the loader then loads it (enabled=True, error=None); a stale bare-name alias is cleared on disable; the dashboard _merged_plugins_hub() runs without crashing. Adds resolution + enable/disable tests; full tests/hermes_cli/test_plugins_cmd* + web_server plugin tests green. Follow-up to #41076 (#41066). Branched from that PR's head.	2026-06-08 17:57:37 +05:30
floory	15c99b437f	fix(cli): set PYTHON env for node-gyp native builds on NixOS (#40690 ) * fix(cli): set PYTHON env for node-gyp native builds on NixOS node-gyp (triggered by node-pty during npm ci) looks for python3 on PATH, which fails on NixOS because python3 lives in the nix store and is not on the system PATH. Add _nixos_build_env() — a two-tier helper that detects NixOS and: 1. Fast path: hermes venv python3 (~0s) 2. Fallback: nix-shell which python3 (~2-5s) Wire it into _run_npm_install_deterministic() via a new env= parameter, then pass it through cmd_gui() and _update_node_dependencies(). Non-NixOS systems: _nixos_build_env() returns None, behavior unchanged. * fix(cli): merge _nixos_build_env() with os.environ, fix NixOS detection, add explicit return None - Critical fix: both Tier 1 (venv) and Tier 2 (nix-shell) now return {**os.environ, "PYTHON": ...} instead of {"PYTHON": ...} — subprocess.run with env= replaces the entire environment, so the old code wiped PATH and broke npm/node on NixOS entirely. - Uses re.search(r"^ID=nixos$", ...) for anchored NixOS detection instead of unanchored substring match (could match ID_LIKE=...nixos). - Removes redundant Path.exists() guard before read_text(); just catches OSError (one filesystem read instead of two). - Adds explicit return None at end of function for type-hint consistency.	2026-06-08 13:57:37 +05:30
Teknium	4d18717b6c	fix(gateway): drop --replace from systemd unit templates (#41892 ) Under systemd's Restart=always, --replace turns every restart into a self-kill loop: the new instance reads gateway.pid, kills the previous process, writes its own PID, and on the next restart the cycle repeats. A process supervisor owns the lifecycle — --replace is for manual one-shot takeovers and fights the supervisor. Remove --replace from both the system-level and user-level systemd ExecStart lines. The --replace flag stays available for manual 'hermes gateway run --replace' and on the macOS launchd fallback path (#23387), which is a deliberate manual takeover, not a supervised unit. Also drop RestartMaxDelaySec / RestartSteps from the templates — they require systemd v255+ and are silently ignored on older versions. The _strip_optional_systemd_directives normalizer stays so existing installs whose on-disk unit still carries those directives aren't flagged as outdated. Credit: reported and diagnosed by @Skippy-the-Magnificent-one (PR #37145); reimplemented here under project authorship because the original commit was authored under a non-existent email.	2026-06-08 00:20:08 -07:00
konsisumer	3714caa1b9	fix(session): follow compression continuations for transcript reads	2026-06-07 23:57:20 -07:00
teknium1	1c68f6f81f	refactor(gateway): extract kanban watcher loops into GatewayKanbanWatchersMixin (god-file Phase 3) gateway/run.py is the largest god file (20k LOC, GatewayRunner with 220 methods). This lifts the cohesive kanban-watcher cluster — _kanban_notifier_watcher, _kanban_dispatcher_watcher, _kanban_advance/unsub/rewind, _deliver_kanban_artifacts (~1,035 LOC, 6 methods) — into gateway/kanban_watchers.py as a mixin that GatewayRunner inherits. Mixin (not free functions) because the methods use only self state: inheriting keeps every self._kanban_* call site working unchanged via the MRO, making this a behavior-neutral move. The methods' lazy imports (_kb, _decomp, _load_config, Platform) travel with them; the mixin needs only stdlib + a matching logging.getLogger('gateway.run'). run.py 20187 -> 19157 LOC; GatewayRunner direct methods 220 -> 214. Behavior-neutral: gateway test suite 6582 passed / 0 failed; start() still wires both watchers via self._kanban_*; MRO resolves all 6 to the mixin. One test (corrupt-board quarantine retry) keyed its time-travel mock on the caller's filename being gateway/run.py — updated to also accept gateway/kanban_watchers.py. Establishes the mixin-extraction pattern for further GatewayRunner decomposition (the 2406-LOC _run_agent and 1164-LOC _handle_message remain, but their callback closures need a context-object redesign — deferred).	2026-06-07 23:14:18 -07:00
teknium1	1a626470ca	refactor(cli): promote 9 closure handlers to top-level + extract their parsers (god-file Phase 2 follow-up) Subcommands whose handler was a closure defined inside main() — memory, acp, tools, insights, skills, pairing, plugins, mcp, claw — have their handler promoted to a top-level function and their parser block extracted into hermes_cli/subcommands/<name>.py (build_<name>_parser, injected handler). These 9 had zero closure-over-main-locals, so promotion is a pure relocation. acp/mcp parser blocks use the shared add_accept_hooks_flag helper. main() 1798 -> 954 LOC (71% below the 3297 Phase-2 starting point); add_parser calls in main.py 89 -> 28. Deferred: sessions, computer-use, secrets handlers reference <name>_parser (for a no-subcommand print_help fallback) — left in place to avoid the _self_parser indirection; minority, low value. Behavior-neutral: all 9 subcommands' --help (incl nested subactions) byte- identical to pre-extraction (diff-verified). tests/hermes_cli/ 6519 passed / 0 failed; new test_subcommands_followup.py covers the 9 builders.	2026-06-07 22:56:23 -07:00
teknium1	568e127612	refactor(cli): extract 25 more subcommand parsers into hermes_cli/subcommands/ Batch extraction of every remaining subcommand whose handler is top-level and whose parser block is pure argparse: model, setup, postinstall, whatsapp, slack, login, logout, auth, status, webhook, hooks, doctor, security, dump, debug, backup, import, config, version, update, uninstall, dashboard, gui, logs, prompt-size. Each becomes hermes_cli/subcommands/<name>.py with build_<name>_parser() and an injected handler (no main import). dashboard also injects cmd_dashboard_register for its nested 'register' action. Behavior-neutral: all 25 subcommands' --help output (and nested subaction help) diff-verified byte-identical to pre-extraction. Two RawDescriptionHelpFormatter epilogs (debug, logs) needed their multi-line string interiors preserved at column 0 — caught by the --help diff, not compile. main() 3297 -> 1798 LOC across this PR; add_parser calls in main.py 179 -> 89. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new test_subcommands_batch.py smoke-tests all 25 builders + the dashboard two-handler case.	2026-06-07 22:18:14 -07:00
teknium1	4da45e8727	refactor(cli): extract profile + gateway/proxy parsers into hermes_cli/subcommands/ Follow-on to the cron extraction in the same Phase 2 PR. Same pattern: per-group build_<name>_parser() functions with injected handlers, no main import. - subcommands/profile.py: build_profile_parser (190-line block out of main()). - subcommands/gateway.py: build_gateway_parser (gateway + proxy, 238-line block; they shared one inline section). Imports argparse for SUPPRESS defaults. - main(): two more inline blocks become single builder calls. Behavior-neutral: 'profile [sub] --help' and 'gateway/proxy [sub] --help' byte-identical to pre-extraction (diff-verified). main() now 2723 LOC (was 3297 at Phase 2 start); add_parser calls in main.py 179 -> 141. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new builder unit tests cover subactions, aliases, dispatch, flags.	2026-06-07 22:18:14 -07:00
teknium1	b2e6053243	refactor(cli): extract hermes cron parser into hermes_cli/subcommands/ (god-file Phase 2) Phase 2 of the god-file decomposition plan. main()'s argparse tree is 179 inline add_parser calls in one 3,297-line function. This establishes the hermes_cli/subcommands/ package and extracts the first group (cron) as the proof-of-pattern: - hermes_cli/subcommands/_shared.py: shared parser helpers (add_accept_hooks_flag), re-exported from main.py for backwards compat. - hermes_cli/subcommands/cron.py: build_cron_parser(subparsers, cmd_cron=...). Handler injected so the module never imports main (cycle avoidance). - main()'s ~155-line inline cron block becomes one build_cron_parser() call. Behavior-neutral: 'hermes cron create --help' output is byte-identical to origin/main. main() 3297 -> 3143 LOC. Validation: tests/hermes_cli/ 6466 passed / 0 failed under per-file process isolation; new test_subcommands_cron.py covers subactions, aliases, options, no-agent tristate, injected dispatch, and --accept-hooks.	2026-06-07 22:18:14 -07:00
islam666	78e2101cd2	fix: reap zombie subprocesses in web_server action status and meet_bot cleanup - web_server.py: after proc.poll() returns a non-None exit code, call proc.wait() to reap the child and move the entry from _ACTION_PROCS to _ACTION_RESULTS. Previously .poll() alone left <defunct> zombies. - meet_bot.py: terminate and wait on the pcm_pump subprocess (paplay/ ffmpeg) during the finally-block teardown. Previously leaked on every normal bot exit. - tests: add test_action_status_reaps_completed_process and test_action_status_ignores_wait_failure covering both the happy path and the wait()-raises-OSError edge case. Closes #38032	2026-06-07 21:50:57 -07:00
islam666	e53b74c394	fix(dist): stop USER_OWNED_EXCLUDE from filtering nested directories The copytree ignore lambda in _copy_dist_payload applied USER_OWNED_EXCLUDE recursively at every directory depth. This caused nested directories whose names matched exclude entries (bin, logs, cache, etc.) to be silently dropped during distribution install/update. Fix: only apply USER_OWNED_EXCLUDE filtering at the root of the staged tree, matching the two-tier pattern used by _clone_all_copytree_ignore and _default_export_ignore in profiles.py. Add 5 tests covering nested bin/logs/cache preservation and top-level filtering still working. Fixes #37954	2026-06-07 21:50:57 -07:00
islam666	18c085b1a4	fix(gateway): normalize optional systemd directives in stale-check (#41119 ) On older systemd versions that don't support RestartMaxDelaySec / RestartSteps, the installed unit file has those directives silently dropped. systemd_unit_is_current() did a strict text comparison, so the unit was perpetually flagged as outdated. Fix: _strip_optional_systemd_directives() removes RestartMaxDelaySec and RestartSteps from both the installed and expected text before comparison. Units that differ only by these optional directives are now correctly considered current.	2026-06-07 21:50:57 -07:00
Shannon Sands	86e5efb0ae	Preserve Telegram onboarding fallback errors	2026-06-07 19:48:09 -07:00
Shannon Sands	ba29010902	Use httpx for Telegram onboarding worker calls	2026-06-07 19:48:09 -07:00
Teknium	2aa316ec9c	docs(windows): fix Get-Command PATH guidance to venv\Scripts\hermes.exe (#40613 ) Closes #40464. Salvaged from #40488; re-verified on main, tightened, tested. Co-authored-by: gauravsaxena1997 <gauravsaxena1997@users.noreply.github.com>	2026-06-07 18:28:23 -07:00
Teknium	6bdc4c0231	test: skip curses tests on Windows where _curses is unavailable (#40611 ) Salvaged from #40447; re-verified on main, tightened, tested. Co-authored-by: Ganesh0690 <Ganesh0690@users.noreply.github.com>	2026-06-07 18:21:03 -07:00
teknium1	16786f3bb3	feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network Desktop connected to a remote gateway can now attach images and PDFs and display agent-written images. Previously the desktop passed a LOCAL file path to image.attach; on a remote gateway that path doesn't exist, so the image was silently dropped ("skipped unreadable path") and the vision model never saw it. The reverse direction was also broken — images the agent wrote on the gateway rendered as dead links in the remote client. Gateway (tui_gateway/server.py): - image.attach_bytes: base64 byte upload written into the gateway's own images dir and queued via the existing native-image-attach pipeline. Magic-byte extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap, structured error codes. Accepts content_base64/filename (canonical) and data/ext (older-desktop aliases). - pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI and queues the pages as images; 50 MB / 25-page caps. Accepts host path or base64 upload. - Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image) so the two methods and the existing image.attach don't duplicate logic. Gateway (hermes_cli/web_server.py): - GET /api/media: returns a gateway-local image as a base64 data URL so remote clients can display it. Auth-gated like every /api route, extension allowlist + size cap, AND confined to the gateway's own media roots (images/screenshots/cache, resolved symlink-safe) so an authed caller can't read image-extension files anywhere on disk. Desktop (apps/desktop): - syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the connection mode is 'remote'; the local fast path is unchanged. - media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and markdown-text fetch images over /api/media in remote mode. Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437) into one coherent implementation, taking the strongest parts of each and adding shared-helper cleanup plus the /api/media root-confinement hardening on top. The per-profile gateway switching from #38876 is intentionally left out as a separable feature. TUI file uploads (#40492) remain a separate surface. Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop media.remote unit tests; full tui_gateway + web_server suites green (472 passed); tsc -b clean; E2E verified the full attach→disk→queue and gateway-path→data-URL display round-trip plus the out-of-root security block. Co-authored-by: Max Mitcham <maxmitcham@mac.home> Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com> Co-authored-by: Chris Cook <ccook@nvms.com> Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>	2026-06-07 10:05:53 -07:00
teknium1	76f01780f0	fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests Follow-up on the deferred-cleanup salvage (#33774): _cleanup_workspace returned early for a non-scratch ('dir'/'worktree') task and never ran the parent sweep, so a scratch parent waiting on a 'dir' child would leak its deferred workspace forever. Run the parent sweep before the early return. Adds regression tests: deferred-while-child-active, swept-after-last-child, and dir-child-unblocks-scratch-parent.	2026-06-07 09:50:44 -07:00
Teknium	9e63109522	feat(dashboard): change UI font from the theme picker, independent of theme (#41145 ) The dashboard font is now selectable from the UI, not just YAML. A new Font section in the header theme picker overrides the UI font of whatever theme is active; the choice is orthogonal to the theme and survives theme switches. Each theme keeps its own font as the default — picking "Theme default" clears the override. - web/src/themes/fonts.ts: curated font catalog (system + Google Fonts across sans/serif/mono), each with a family stack and optional webfont URL. The catalog is the only injected-font surface — no free-text URL box, so the injected <link> origins stay fixed. - web/src/themes/context.tsx: font-override state (localStorage + server), applied after theme typography so it wins; theme apply re-asserts it, and clearing re-runs theme apply to restore the theme's own font. Mono is left to the theme so code/terminal are untouched. - web/src/components/ThemeSwitcher.tsx: Font section with grouped, self- previewing font rows and a "Theme default" clear option. - hermes_cli/web_server.py: GET/PUT /api/dashboard/font persisting to config.yaml dashboard.font, with a server-side id allow-list (unknown ids coerce to the theme sentinel). - i18n + types, api client methods, tests, and docs. Validation: 6 new backend endpoint tests pass; tsc + vite build clean; live browser test confirmed pick/persist/survive-theme-switch/clear all work.	2026-06-07 03:39:01 -07:00
Teknium	0507e4630d	fix(desktop): preserve configured base_url on same-provider model switch (#41121 ) The desktop model picker calls POST /api/model/set with provider+model only (no base_url). _apply_main_model_assignment cleared model.base_url for every non-custom provider, so re-picking a Xiaomi MiMo model wiped a Token Plan endpoint (https://token-plan-*.xiaomimimo.com/v1) back to the registry default api.xiaomimimo.com — breaking valid tp- keys with 401s. Now base_url is cleared only when switching to a different provider (the stale URL belonged to the old one); same-provider re-assignment preserves it, and an explicitly supplied base_url is honored for any provider.	2026-06-07 02:48:21 -07:00
islam666	ccacfdbd6d	fix(plugins): discover nested category plugins in 'plugins list' (issue #41066 ) _discover_all_plugins() previously did a flat iterdir() scan, missing all category-namespaced plugins (web/, image_gen/, browser/, video_gen/). Now recurses up to 2 levels deep, matching PluginManager._scan_directory_level(). Also fixes _plugin_status() to check both manifest name AND path-derived key against enabled/disabled sets, so category plugins like 'web/tavily' show correct status when enabled via config.	2026-06-07 08:02:55 +00:00
kshitijk4poor	44c0c2d4ac	refactor(inventory): make force_fresh_nous_tier keyword-only + pin contract Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix Lockfile Fix / auto-fix-main (push) Waiting to run Details Nix Lockfile Fix / fix (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Has been cancelled Details uv.lock check / uv lock --check (push) Has been cancelled Details Follow-up to the salvaged perf fix. The new force_fresh_nous_tier param was inserted into list_authenticated_providers between custom_providers and max_models. Make it keyword-only (*) so a positional caller passing max_models as the 5th arg can never silently mis-bind it to the tier-refresh flag, and add a signature-contract test that fails if the keyword-only separator is later dropped. All in-repo callers already use keyword args; verified no caller breaks.	2026-06-07 00:41:13 -07:00
helix4u	eb70ab894b	fix(inventory): avoid fresh Nous tier checks in picker payloads	2026-06-07 00:41:13 -07:00
brooklyn!	846821d8c0	Merge pull request #40684 from NousResearch/bb/cron-sessions-sidebar feat(desktop): first-class cron jobs in the sidebar + dashboard scheduler	2026-06-07 00:32:25 -05:00
Teknium	fc086da8bd	fix(gateway,windows): reliability — JOB breakaway + status --deep probes + test-leak fix (#40909 ) * fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep Three coordinated fixes for the Windows gateway reliability story: 1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn The 'hermes update' triggered from the Electron Desktop GUI ran inside Electron's job object. Without breakaway, the post-update gateway watcher spawned by update — already DETACHED_PROCESS — was still reaped when Electron's job tore down, so the gateway never came back after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000) to: - hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by every helper that calls windows_detach_popen_kwargs(), including launch_detached_profile_gateway_restart() - The watcher subprocess's own respawn snippet in hermes_cli/gateway.py (inlined flags so the watcher's child respawn also breaks away) _spawn_detached() in gateway_windows.py already had the flag; this change brings the rest of the codebase to parity. 2. Per-minute supervisor Scheduled Task — Windows equivalent of systemd Restart=always Introduces hermes_cli/gateway_supervisor.py and registers it as a second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1, LIMITED rights) alongside the existing ONLOGON task. Every minute, the supervisor uses the same gateway.status.get_running_pid() probe as 'hermes gateway status' and, if no gateway is alive, calls gateway_windows._spawn_detached() (which now includes BREAKAWAY) to bring one back. Covers every crash mode, not just 'machine rebooted': taskkill, OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw startup per minute when down, one PID-existence check per minute when up. Wired into both the schtasks-success and Startup-folder-fallback install paths via _install_supervisor_best_effort(), and removed in uninstall(). Best-effort: a failing supervisor install logs a warning but doesn't roll back the primary install. 3. 'hermes gateway status --deep' shows per-probe PASS/FAIL Replaces the existing terse '--deep' output (which only printed paths) with an actual diagnostic table: [1] PID file present [2] Lock file held by a live process [3] get_running_pid() result [4] _pid_exists(pid) — OS-level liveness [5] gateway_state.json (state + age) [6] Last lifecycle event from gateway-exit-diag.log When the high-level summary disagrees with reality, the user can see exactly which signal is lying. Test-leak fix ------------- tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages monkey-patched is_linux/is_wsl/supports_systemd_services to simulate WSL but did NOT stub is_windows(). On a Windows host, the dispatcher in _gateway_command_inner takes the is_windows() branch BEFORE the WSL guidance branch, so the test invoked gateway_windows.install() for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd — the REAL user Startup folder, never sandboxed by tmp_path — pointing at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper. When pytest tore down the tmp_path, every subsequent Windows login flashed a cmd.exe window that failed to find the missing target. Stubs is_windows=False on all four affected tests: test_install_wsl_no_systemd test_start_wsl_no_systemd test_status_wsl_running_manual test_status_wsl_not_running Defense-in-depth: _build_startup_launcher() now prefixes the launcher with 'if not exist <target> exit /b 0', so any future stale Startup entry silently no-ops instead of flashing a console window. Status enhancements ------------------- - status() now reports supervisor task presence alongside the existing schtasks/Startup info, and nudges the user to reinstall if the supervisor isn't registered. - Deep mode dumps both the supervisor task name + script path. * fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes Earlier in this branch we added a per-minute schtasks-based supervisor to respawn the gateway after crashes / GUI-update SIGTERMs. The implementation flashed a brief console window on every firing, which stole window focus. We tried several variants: - cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem) - schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is actually subsystem=Console, not GUI; it respawns the real pythonw) - schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side conhost preallocation; documented Windows quirk) - XML registration with <Hidden>true> -> still flashes (<Hidden> only hides the task in the Task Scheduler UI, not the spawned window) Researched what leading projects do: - Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor. - Tailscale: real Windows Service via SCM. Session 0, no console possible. - Syncthing: --no-console flag inside the binary + Startup folder. - openclaw: VBS Run(..., 0, False) wrapper. Suppresses the window but Super User Q971162 confirms focus-steal still occurs in some cases. None of these use a per-minute polling scheduled task. The 'auto-restart on crash' responsibility belongs INSIDE the daemon (Tailscale's in-process recovery / Ollama's monitor+worker pair) OR is delegated to the Windows Service Control Manager — not Task Scheduler. So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB fix in _subprocess_compat.py (from commit `c1e5fa433`) survives — that is the real fix for problem #2 (GUI-update kills gateway): the post-update watcher in launch_detached_profile_gateway_restart() now breaks out of Electron's job object, so the gateway respawn watcher survives the GUI quit and successfully respawns the gateway. Surviving from `c1e5fa433`: * CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes #2) * Inlined breakaway flag in the watcher respawn snippet in gateway.py * hermes gateway status --deep PASS/FAIL probes (fixes #1 — visibility) * 'if not exist <target> exit /b 0' guard in _build_startup_launcher (fixes #3 — silent no-op for stale Startup entries) * tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause of #3 — pytest WSL tests no longer leak Startup entries on Win hosts) Removed in this commit: * hermes_cli/gateway_supervisor.py (entire file) * Supervisor section in hermes_cli/gateway_windows.py (~180 lines): get_supervisor_task_name, get_supervisor_script_path, _build_supervisor_cmd_script, _write_supervisor_script, _install_supervisor_task, is_supervisor_task_registered, _install_supervisor_best_effort * _install_supervisor_best_effort() calls in install() (3 spots) * supervisor cleanup block in uninstall() * supervisor display lines in status() / status(deep=True) Future direction (out of scope for this PR): the right place for Windows 'Restart=always' semantics is a real Windows Service installed via pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM auto-restart, no console window possible. That's a meaningful next-PR project, not a band-aid. Tests: 51 pass / 2 pre-existing failures in tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are TestSupportsSystemdServicesWSL cases that fail on origin/main too — unrelated to this PR).	2026-06-06 19:53:58 -07:00
Teknium	887295ba54	fix(config): preserve custom-provider models maps and metadata through v11->v12 migration (#40573 ) Salvaged from #40410; cleaned up, re-verified against main, tests added. Co-authored-by: rodboev <rodboev@users.noreply.github.com>	2026-06-06 18:43:20 -07:00
Teknium	89040e0db3	fix(secrets): fail early with clear error when bitwarden setup runs without TTY (#40571 ) Salvaged from #40280; cleaned up, re-verified against main, tests added. Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>	2026-06-06 18:36:40 -07:00
Teknium	5b43bf7d02	feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI) (#40355 ) * feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI) Adds a GUI-only uninstall path so people can remove the desktop Chat GUI while keeping the Hermes agent + their config/sessions/.env, and surfaces the three CLI uninstall modes inside the desktop app's Settings → About. CLI: - New hermes_cli/gui_uninstall.py: cross-platform discovery + removal of the desktop GUI's artifacts (source-built dist/release/node_modules + build stamp, the packaged app bundle, and the Electron userData dir) on Linux, macOS, and Windows. Never touches the agent source, venv, or user data. - `hermes uninstall --gui` removes only the Chat GUI; `--gui-summary` prints a JSON install snapshot (used by the desktop UI to gate options + detect a missing agent for a future lite client). - `hermes uninstall --yes` / `--full --yes` now run non-interactively, sharing the destructive sequence via a new _perform_uninstall() helper. The keep-data and full flows also sweep the GUI artifacts. Desktop: - electron/desktop-uninstall.cjs: pure helpers mapping each mode (gui/lite/full) to CLI flags, resolving the running app bundle per OS, and building the detached cleanup script that waits for the app to exit, runs the Python uninstall, and removes the bundle. - IPC hermes:uninstall:summary / :run, preload bridge, and types. - Settings → About "Danger zone" with the three options; agent-removing options hide when no local agent is detected. Tests: tests/hermes_cli/test_gui_uninstall.py (22 pass with the existing uninstall tests), electron/desktop-uninstall.test.cjs (17 pass, wired into test:desktop:platforms). Docs: desktop.md "Uninstalling" + cli-commands.md. * fix(desktop): tear down backend process tree before GUI uninstall (Windows lock safety) The desktop uninstall cleanup script waited only on the desktop app's own PID, but a backend grandchild (gateway / pty terminal / hermes REPL) can outlive it and keep hermes.exe + venv files mandatory-locked on Windows — making the script's rmdir half-fail and leaving a partial install, the same failure class as the self-update path's #37532. - main.cjs: runDesktopUninstall now awaits releaseBackendLock() before spawning the cleanup script — tree-kills every backend PID the desktop owns (primary + pool) via taskkill /T /F and polls the venv shim until unlocked. Extracted the shared core out of releaseBackendLockForUpdate so both the update hand-off and the uninstaller use the identical, incident-hardened teardown. No-op on macOS/Linux (no mandatory locks). - desktop-uninstall.cjs: Windows cleanup script removes the bundle via a bounded rmdir retry loop (10x, 1s) instead of a single rmdir, since Windows releases directory handles lazily even after the holding process exits. - Dropped a fragile tasklist\|findstr reap-by-path attempt; the Electron-side tree-kill-by-PID is the reliable mechanism. Tests: desktop-uninstall.test.cjs updated for the retry-loop output (17 pass). * fix(desktop): address review on GUI uninstall (venv self-delete, gates, wait-loop) Resolves @OutThisLife's review on #40355: 1. full mode now gated on agent presence (needsAgent: true). It removes the agent + user data, so on a lite client with no local agent it's hidden like lite — no more offering to remove an agent that isn't there. 2. (Finding 3, the real bug) lite/full no longer rmtree the venv from the venv's OWN python. On Windows a running python.exe is mandatory-locked, so that half-fails. New lightweight 'python -m hermes_cli.uninstall --mode X' entrypoint (stdlib-only imports) lets the desktop run agent-removing modes under the SYSTEM python (findSystemPython) with PYTHONPATH=<agentRoot>, so import hermes_cli resolves from source while the venv is torn down. Falls back to venv python + logs when no system python (gui-only unaffected). 3. Windows wait-loop is now bounded (60 tries, matching POSIX) and matches the PID as a whole space-delimited token via findstr (no substring 99->990 trap, no redundant bare find). set HERMES_HOME/PID/PYTHONPATH now quoted. 4. Renamed the misleading 'returns null for dev run' test — the dev-run safety is shouldRemoveAppBundle(isPackaged=false), which the test now asserts. Docs: note that --gui on a source checkout also sweeps node_modules/build output. Tests: 18 python + 19 desktop pass.	2026-06-06 18:22:38 -07:00
Teknium	f2e8234307	test: update non-Termux workspace-scope fixtures for #38358 fix The non-Termux web/TUI install path now scopes to --workspace <name>; update two fixtures that asserted the old unscoped install commands.	2026-06-06 18:22:20 -07:00
Teknium	7db7a9462d	fix: align test fixture arg order + add zakame to AUTHOR_MAP Conflict resolution prefixes --workspace web before --silent (preserving the Termux npm_workspace_args path); update test_cmd_update fixture to match. Add zakame@zakame.net -> zakame mapping so CI author check passes.	2026-06-06 18:22:20 -07:00
Zak B. Elep	675fb10240	fix(install): correct check_dir tautology and add --workspace web test - check_dir = npm_dir if audit_extra else npm_dir evaluated identically in both branches; change to PROJECT_ROOT if audit_extra else npm_dir so workspace-scoped audits check the workspace root's node_modules - Add test_npm_install_uses_workspace_web_scope asserting --workspace web is passed adjacently in the _build_web_ui npm install invocation	2026-06-06 18:22:20 -07:00
Zak B. Elep	4bf52022e5	fix(tui): correct --skip-build hint and add TUI workspace install test - Update the --skip-build pre-build hint in the dashboard startup path to use `npm install --workspace web && npm run build -w web` so users don't accidentally trigger a desktop rebuild by following the hint. - Add test_tui_launch_install_uses_workspace_scope to assert that the TUI launch npm install carries --workspace ui-tui, covering the call site added in the prior commit.	2026-06-06 18:22:20 -07:00
Zak B. Elep	0416f852f2	fix(tui): scope TUI launch install and fix stale hints/test - Add --workspace ui-tui to the TUI launch npm install, the one call site missed by the prior commit. Without scoping it ran from PROJECT_ROOT and still resolved apps/desktop via the apps/* glob. - Update the two manual-recovery hints in _build_web_ui (npm install failure and build failure paths) to use the scoped form `npm install --workspace web && npm run build -w web` so users following the hint don't accidentally trigger a desktop rebuild. - Update the stale test assertion in test_cmd_update.py to expect --workspace web in the _build_web_ui npm ci call, which was previously unreachable through the if-guard and left the workspace- scoping change from the prior commit unverified.	2026-06-06 18:22:20 -07:00
Brooklyn Nicholson	f491260365	Merge remote-tracking branch 'origin/main' into bb/cron-sessions-sidebar # Conflicts: # apps/desktop/src/app/cron/index.tsx	2026-06-06 16:34:23 -05:00
kshitij	ebed881d46	fix(cli): quarantine running hermes.exe during update dep-verification repair on Windows (#40409 ) The dependency-verification repair in _verify_core_dependencies_installed ran 'pip install --reinstall -e .' via _run_install_with_heartbeat directly, bypassing the Windows shim-quarantine that the primary install path performs. That reinstall rewrites the entry-point shims, and on Windows the live hermes.exe is the running process — pip can neither delete nor overwrite it. With no quarantine, the shim was left missing and 'hermes' dropped off PATH ('hermes' is not recognized... after update). Extract the rename-out-of-the-way / restore-on-failure logic into a reusable _run_quarantined_install helper and route both the primary editable installs and the --reinstall -e . repair through it. The per-package repair installs only third-party deps (never hermes-agent), so they don't touch the shims and are left untouched. Add a regression test (fails on old code, passes on new).	2026-06-06 12:50:58 -05:00
kshitij	d4a7bfd3aa	Merge pull request #29724 from bbednarski9/bbednarski/nmf-41B-nemoflow-plugin feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay	2026-06-06 10:46:41 -07:00
Brooklyn Nicholson	003110c107	fix(ci): map @TheGardenGallery email + drop unused pytest import - check-attribution: add chilltulpa@gmail.com -> TheGardenGallery to AUTHOR_MAP in scripts/release.py (new external contributor via the carried-over commits). - ty: the dashboard back-compat test imported pytest but never used it, tripping unresolved-import. Drop the dead import — tests are plain functions driving the parser via subprocess, no pytest API needed.	2026-06-06 12:43:28 -05:00
The Garden	2820d87ea5	fix(cli): tolerate stale `dashboard --tui` from old desktop shells Older Hermes desktop app shells (<= 0.15.x) spawn the backend as `hermes dashboard --no-open --tui --host ... --port ...`. The --tui flag was removed from the dashboard subcommand in `cae6b5486` (embedded chat is always on now). When a user's CLI updates past that commit but their desktop app binary has not, argparse hard-errored with 'unrecognized arguments: --tui' and exit(2). The backend died before becoming ready and the desktop GUI showed only 'Hermes couldn't start' with no actionable cause — a confusing brick for anyone whose app and CLI versions drift apart across an update. Add a hidden, deprecated, accepted-and-ignored --tui flag to the dashboard subparser so an old app shell + new CLI degrades gracefully. Hidden from --help via argparse.SUPPRESS so we don't re-advertise a removed feature. Safe to delete once the floor app version is well past 0.16.0. Adds tests/hermes_cli/test_dashboard_tui_backcompat.py pinning: the flag parses without error, stays hidden from --help, and the modern (no --tui) invocation is unaffected.	2026-06-06 12:43:28 -05:00
Brooklyn Nicholson	3e2d758816	feat(desktop): fire cron jobs from the dashboard backend The cron scheduler tick loop only ran inside `hermes gateway run`, but the desktop app spawns a `hermes dashboard` backend with no gateway — so any cron a user created in the app was saved and never fired (silently). Run a minimal scheduler ticker inside the dashboard lifespan, gated on a new HERMES_DESKTOP=1 marker the electron shell injects, so server `hermes dashboard` is unaffected. Cross-process safe via the existing cron/.tick.lock, so it never double-fires alongside a real gateway.	2026-06-06 12:42:32 -05:00
kshitijk4poor	c4c5548eb4	fix(middleware): single-use next_call guard + deepcopy-safe request copies Address the two non-blocking follow-ups from review: - next_call is now single-use per middleware frame. A second invocation raises instead of silently re-running the downstream provider/tool, so the terminal call cannot execute twice via the chain. The error surfaces through the existing handler, which preserves the first downstream result. - Request-middleware payload copies go through _safe_copy(), which falls back to a shallow dict copy when deepcopy() fails on a non-deepcopyable member (clients, callbacks, file handles) instead of aborting the pass. Adds regression coverage for both: double next_call() keeps the terminal single-run, and a non-deepcopyable (threading.Lock) request payload still runs middleware via the shallow fallback.	2026-06-06 23:07:25 +05:30
Bryan Bednarski	5abe45674d	fix(middleware): preserve translated downstream failures Track successful next_call completion separately from invocation so execution middleware that catches and translates a downstream provider/tool failure does not accidentally convert that failure into a successful None result. Also avoid wrapping BaseException from downstream execution, and document the execution middleware error semantics. Tests cover: - pre-next_call middleware failures fail open to the remaining chain - post-next_call middleware failures preserve the downstream result - translated downstream failures propagate instead of returning None - downstream BaseException is not wrapped Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>	2026-06-06 09:26:18 -07:00
Brooklyn Nicholson	3606307339	fix(gateway): use user launchd domain + Background session, detached fallback (macOS 26) Salvages the primary fix from #24275 (asdlem) and layers a last-resort fallback on top: Primary (from #24275): the real macOS 26 root cause is that `gui/<uid>` isn't reachable from non-Aqua/background sessions. Switch the launchd domain to `user/<uid>` and mark the plist valid for both Aqua and Background sessions (LimitLoadToSessionType), restoring a real supervised service. Treat exit code 125 as "job unloaded" so start/restart re-bootstrap and retry. Last resort (this PR): the #23387 reporter saw `user/<uid>` bootstrap also fail with error 5 on some hosts. When even a fresh bootstrap can't manage the domain (codes 5/125 persist), degrade to a CLI-managed detached background process instead of crashing — logs to gateway.log, PID tracked via gateway.pid so stop/status/restart keep working. Print guidance that it won't auto-start at login or auto-restart on crash. Co-authored-by: asdlem <asdlem@users.noreply.github.com>	2026-06-06 09:08:37 -07:00
Brooklyn Nicholson	59c273ba3a	fix(gateway): fall back to detached launch when launchd rejects domain (macOS 26) macOS 26+ broke launchctl management of the gui/<uid> (and user/<uid>) domains: `bootstrap` returns error 5 and `kickstart` returns error 125 ("Domain does not support specified action"), so `hermes gateway start/install/restart` crashed with a cryptic traceback (#23387). Detect these codes and degrade gracefully: launch the gateway as a CLI-managed detached background process (the documented `nohup hermes gateway run --replace` workaround), with logs to gateway.log and the PID tracked via gateway.pid so stop/status/restart keep working. Print clear guidance that the service won't auto-start at login or auto-restart on crash on this macOS version. launchd_stop also tolerates 125/5 from bootout and falls through to the PID-based kill.	2026-06-06 09:08:37 -07:00
Teknium	2bf0a6e760	feat(dashboard): full tool backend configuration in the GUI (#40418 ) Replicate the `hermes tools` configurator in the dashboard Skills → Toolsets view. Each toolset now opens a config drawer that covers the full lifecycle the CLI offers: enable/disable, pick a provider/backend, enter and save API keys, and run a provider's post-setup install hook with a live log tail. The toolset view was previously read+toggle only — the provider matrix and key-status endpoints existed but the page never called them, and there was no way to save a key or run a backend install (npm/pip/binary) from the browser. Backend: - New CLI subcommand `hermes tools post-setup <KEY>` — non-interactive, scriptable target that runs a provider's install hook (agent_browser, camofox, cua_driver, kittentts, piper, ddgs, spotify, langfuse, xai_grok). Validated against valid_post_setup_keys() so an arbitrary key can't drive _run_post_setup. - PUT /api/tools/toolsets/{name}/env — save API keys to ~/.hermes/.env via save_env_value (same store the CLI writes), validated against the toolset category's env-var allowlist; blank values skipped. - POST /api/tools/toolsets/{name}/post-setup — spawn-action that runs `hermes tools post-setup <key>`; frontend tails the log via the existing /api/actions/tools-post-setup/status. Registered in _ACTION_LOG_FILES. Frontend: - New ToolsetConfigDrawer component (provider radios, password key inputs with saved-state, get-a-key links, Run-setup + live install log). Toolset cards get a Configure button + the drawer also exposes the enable toggle. - api.ts: toggleToolset, getToolsetConfig, selectToolsetProvider, saveToolsetEnv, runToolsetPostSetup + ToolsetConfig/Provider/EnvVar/ EnvResult types. Validation: 56 admin-endpoint tests pass (10 new: env save w/ CLI parity + allowlist reject + blank-skip, post-setup spawn validation, auth gate); 232 web_server tests pass; web npm run build + eslint clean; HTTP E2E exercises save-key (CLI reads it back) and spawn+poll post-setup to exit 0.	2026-06-06 07:45:36 -07:00
Teknium	56236b16e3	feat(dashboard): rehaul Skills hub browser — connected hubs, featured, preview + security scan (#40384 ) The Browse-hub tab was a blank search box with sparse result cards (name + source + one Install button), no way to read a skill before installing, no visual security scan, and no indication it was even connected to any hubs. Backend (web_server.py): - GET /api/skills/hub/sources — lists the configured hubs (label + trust tier + GitHub rate-limit + index availability) and featured skills pulled from the centralized index (zero extra API calls), plus installed-skill provenance so the UI can mark already-installed results. - GET /api/skills/hub/preview — fetches a skill's SKILL.md text + file manifest WITHOUT installing (decodes byte-stored text, masks binaries). - GET /api/skills/hub/scan — runs the SAME quarantine + scan_skill + should_allow_install pipeline the CLI installer uses, then cleans up quarantine, returning verdict / per-finding detail / severity tally / install-policy decision. - search now returns per-source counts + timed-out sources + installed map. Frontend (SkillsPage HubBrowser): - Landing state: connected-hubs strip + featured skill grid (no more blank page). - Rich cards: trust-level color coding, source, tags, identifier, Details + Install (or Installed state). - Detail dialog: read the actual SKILL.md, on-demand visual security scan (verdict pill, severity tally, per-finding list, allow/block policy), GitHub repo link. - Search meta line: result count + timing + per-source breakdown (the 'feels slow / no feedback' complaint). Tests: 4 new endpoint test classes (sources/preview/scan + updated search shape) in test_dashboard_admin_endpoints.py.	2026-06-06 02:44:50 -07:00

1 2 3 4 5 ...

1337 commits