hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-24 16:54:43 +00:00

Author	SHA1	Message	Date
Robin Fernandes	639c1e3636	feat(sessions): add optional max session cap	2026-06-08 15:12:12 -07:00
Brooklyn Nicholson	e88116256c	fix(update): scope git fetch to target branch A bare `git fetch origin` (and `git fetch upstream`) pulls every ref. The repo carries thousands of auto-generated branches, so on any non-single-branch checkout the installer's update path and `hermes update` spend minutes downloading the full branch list — long enough to stall the desktop installer or trip the follow-up `git pull --ff-only`. Scope every update-path fetch to the branch we actually compare/merge against: - scripts/install.sh: collapse the remote to single-branch and fetch only $BRANCH on the "existing install, updating" path. - hermes_cli/main.py: fetch the resolved branch in the apply path, the --check path (upstream + origin), and the fork upstream-sync. Tracking-ref updates still happen via git's opportunistic refspec, so the later origin/<branch> rev-parse/rev-list checks are unaffected. Tests assert the apply-path fetch is branch-scoped and never bare.	2026-06-08 15:24:31 -04:00
teknium1	c78b3e1d3c	fix(auth): add Codex OAuth accounts as distinct pool entries hermes auth add openai-codex now creates an independent manual:device_code pool entry per account instead of routing through the singleton _save_codex_tokens save path, which collapsed every added account into the latest login (the second add overwrote the first account's singleton-mirrored device_code entry). This is the add-path half of #39236; PR #39243 (already on this branch) fixes the re-auth half. manual:device_code entries refresh from their own token pair (_sync_codex_entry_from_auth_store only adopts the singleton for source=="device_code"), so they need no providers.openai-codex shadow. Adding the first credential marks openai-codex active (the singleton path did this implicitly) so the setup wizard's get_active_provider() check still passes; subsequent adds leave the active provider untouched. Adds SOURCE_MANUAL_DEVICE_CODE constant and a regression test that two distinct accounts keep distinct token pairs. Updates two existing add tests to the pool-only behavior. Co-authored-by: glesperance <info@glesperance.com>	2026-06-08 11:57:03 -07:00
Ted Malone	761b744abb	fix(auth): preserve independent Codex pool entries on re-auth (#39236) The #33538 fix refreshed every credential_pool entry with source "manual:device_code" on every Codex OAuth re-auth, on the assumption that such entries were always legacy aliases of the singleton from the #33000 workaround era. That assumption is no longer true: `hermes auth add openai-codex` also produces "manual:device_code" entries for independent ChatGPT accounts, and the broad sync silently clobbered them with the latest-authenticated token pair (labels preserved, token material overwritten, status / quota readings then lie). Narrow the sync: refresh a "manual:device_code" entry only when its existing access_token matches the previous singleton access_token (true legacy alias). Entries with distinct token material represent independent accounts and are now left alone. Error markers are cleared only on entries actually rewritten, so an independent account's own 429 / 401 state survives a re-auth that targeted a different account. Tests: * New: independent acctB/acctC are not overwritten when acctA re-auths. * New: legacy singleton-alias still refreshed (preserves #33538). * New: missing previous singleton state handled (no crash, no false alias match). * New: access_token-only alias match (legacy schema without refresh_token still recognized). * New: error markers cleared only on entries actually refreshed. * Updated: existing manual-device-code sync test now covers both the legacy-alias path AND the independent-account path in one fixture. Behaviour change is zero for users with a single Codex account and zero for users whose only "manual:device_code" entry is the legacy alias of the singleton. Users with multiple independent Codex accounts added via `hermes auth add` now keep their distinct token material across re-auths. Local: 29 passed in tests/hermes_cli/test_auth_codex_provider.py, no new failures in tests/hermes_cli/ vs upstream/main baseline. Fixes #39236.	2026-06-08 11:57:03 -07:00
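The narrowed sync rule in 761b744abb can be illustrated with a minimal sketch. The `PoolEntry` dataclass and `sync_legacy_aliases` function are hypothetical names for illustration only, not the repository's code; only the "manual:device_code" source string and the access_token comparison come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolEntry:
    # Hypothetical stand-in for a credential pool entry.
    label: str
    source: str
    access_token: str
    refresh_token: Optional[str] = None
    last_error: Optional[str] = None


def sync_legacy_aliases(
    entries: List[PoolEntry],
    previous_access_token: Optional[str],
    new_access_token: str,
    new_refresh_token: Optional[str],
) -> List[str]:
    """Refresh only entries whose token mirrors the previous singleton token.

    Returns the labels of the entries that were rewritten.
    """
    rewritten: List[str] = []
    if not previous_access_token:
        # No previous singleton state: nothing can be a legacy alias.
        return rewritten
    for entry in entries:
        if entry.source != "manual:device_code":
            continue
        if entry.access_token != previous_access_token:
            # Distinct token material means an independent account; leave it alone.
            continue
        entry.access_token = new_access_token
        entry.refresh_token = new_refresh_token
        entry.last_error = None  # cleared only on entries actually rewritten
        rewritten.append(entry.label)
    return rewritten
```

In this sketch an independent account keeps both its tokens and its own error marker when a different account re-authenticates.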
xxxigm	96fd9d4979	fix(desktop): stop running Hermes.exe locking win-unpacked before Windows pack (#42100) * fix(desktop): stop running app locking win-unpacked before pack On Windows a running Hermes.exe keeps an exclusive lock on release/win-unpacked/Hermes.exe, so electron-builder's pack cannot replace it and dies with "remove ...\Hermes.exe: Access is denied" / ERR_ELECTRON_BUILDER_CANNOT_EXECUTE (before-pack hits the same EPERM cleaning the dir, and the cache-purge retry repeats the failure since the lock is still held). Before building the packaged app, terminate any process whose executable lives inside this build's release/ tree so the rebuild -- including the installer's headless --update rebuild -- can replace the binary. Scope is narrow (only exes under release/), POSIX is a no-op (it can unlink a running binary), and the final error now points Windows users at the running-app cause. * test(desktop): cover the win-unpacked lock-breaker helper Verify _stop_desktop_processes_locking_build is a no-op off-Windows, terminates only processes whose exe lives under release/ (sparing our own PID and unrelated installs), and short-circuits when no release dir exists.	2026-06-08 11:51:31 -07:00
Teknium	abcf996b1f	feat(windows): enable dashboard /chat tab via ConPTY (win_pty_bridge) + tests (#42251 ) * feat(windows): enable dashboard chat tab via ConPTY (win_pty_bridge) Add hermes_cli/win_pty_bridge.py — a pywinpty-backed drop-in for PtyBridge with the same spawn/read/write/resize/close surface — and wire it into the web_server PTY import block so Windows picks it up instead of falling back to None. pywinpty is already a declared win32 dependency (pyproject.toml). The ConPTY read path runs inside run_in_executor so the event loop is never blocked. Spawn/read/write/terminate call shapes are taken directly from tools/process_registry.py which already exercises the same pywinpty version. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: remove WSL2-only caveat for dashboard chat tab The chat pane now works on native Windows via the ConPTY bridge added in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(windows): cover ConPTY bridge + web_server platform-branched import Companion to the bridge added in the previous commits. Verified live on native Windows 11 (pywinpty 2.0.15) against `hermes dashboard`'s `/api/pty` WebSocket: the spawned `hermes --tui` (node entry.js) renders through ConPTY, resize escapes reach `setwinsize`, and closing the WS reaps both the node child and the pywinpty agent with zero orphans. tests/hermes_cli/test_win_pty_bridge.py Mirrors the layout of the existing POSIX test_pty_bridge.py: spawn/io/resize/close/env coverage against cmd.exe and python -c, plus the cross-platform fallback surface (PtyUnavailableError, the off-Windows `spawn -> raises PtyUnavailableError` guard, and the load-bearing _clamp() helper that protects setwinsize from garbage winsize values out of xterm.js). 
tests/hermes_cli/test_web_server_pty_import.py Asserts that web_server.PtyBridge resolves to WinPtyBridge on win32 and to the POSIX PtyBridge on POSIX, that PtyUnavailableError is the matching class on each side (so isinstance checks in /api/pty's spawn fallback path work), and a source-text check that pins the platform-branched import shape so a future refactor can't quietly collapse it back to a POSIX-only import. scripts/release.py AUTHOR_MAP entries so CI release-note generation can resolve both authors' plain (non-noreply) emails to their GitHub logins. Co-Authored-By: JoelJJohnson <josephjohnson.joel@gmail.com> Co-Authored-By: Nea74 <andreas@schwarz-ketsch.de> --------- Co-authored-by: JoelJJohnson <josephjohnson.joel@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Nea74 <andreas@schwarz-ketsch.de>	2026-06-08 11:32:43 -07:00
BarnacleBoy	550b72dd87	fix(cli): gate tool-rendering paths with tool_progress_mode, not quiet_mode quiet_mode was being used to suppress tool-result display when tool_progress_mode was 'off'. But quiet_mode also gates operational status messages, so users with /verbose + tool-progress off lost all status output. Adds a dedicated tool_progress_mode attribute to AIAgent; the tool_executor result-rendering path gates on tool_progress_mode != 'off'. The CLI passes its tool_progress_mode through agent setup and the tool-progress cycle command syncs it onto the live agent. Fixes #33860.	2026-06-08 11:29:53 -07:00
firefly	ae94ed1728	fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) Salvaged from #35626 (banditburai) and re-scoped after maintainers landed the parent-death watchdog (slash_worker.py) and PTY process-group teardown (pty_bridge.py) directly on main. Those pieces are intentionally NOT included here — this carries only what is still missing: - C1 disconnect reap: ws.py's `finally` only re-pointed the dead transport at stdio. `_close_sessions_for_transport` now reaps `close_on_disconnect` sessions and schedules the grace-reap for the rest, offloaded via `asyncio.to_thread` so the blocking worker.close() + DB write never stalls the uvicorn loop. - C2 create/close orphan race: `_attach_worker` stores the worker iff `_sessions.get(sid) is session` under the lock (else closes it), applied at every spawn site incl. the post-turn `_restart_slash_worker`. - Single idempotent teardown funnel: session.close, WS disconnect, the generous-TTL idle reaper, shutdown, and the WS grace-reap all reach `_close_session_by_id` → `_teardown_session`; `_finalized`/`_closed` flags make concurrent/double teardown a no-op. `_sessions_lock` upgraded to RLock. - uvicorn `ws_ping_interval/timeout=20s` so a half-open socket (reverse-proxy 524) becomes a `WebSocketDisconnect` and the C1 path runs. Plus two review-driven hardening fixes (mine): - `session.active_list` now skips `_finalized` sessions so the footer "N sessions" count reflects attachable sessions instead of only ever growing until restart (#38950). Keys on `_finalized` only, NOT the stdio sentinel, so a standalone `hermes --tui` session stays visible. - `_schedule_ws_orphan_reap._reap` pops via `_close_session_by_id` (under `_sessions_lock`) instead of `_sessions.pop` under the unrelated `_session_resume_lock` (#39591); the resume_lock now only guards the orphan re-check against `session.resume`. 
- Float env knobs (`HERMES_SLASH_WATCHDOG_*`, `HERMES_TUI_SESSION_TTL_S`) parse with a fallback helper so a malformed value can't crash the worker at import. Fixes #32377 Fixes #38950 Addresses #22855 Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com> Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-08 10:02:05 -07:00
Teknium	9c9d9113a8	fix(auth): auto-detect OpenRouter credential from the pool, not just env (#42263) resolve_provider() auto-detection only checked OPENROUTER_API_KEY/ OPENAI_API_KEY env vars, never the credential pool. A key added via `hermes auth add openrouter` (manual pool entry, no env var) was invisible: the provider failed to resolve or resolved with an empty api_key, so requests went out with no Authorization header and OpenRouter returned "HTTP 401: Missing Authentication header" while `hermes auth list` showed the credential. Closes #42130. - auth.py: check load_pool("openrouter").has_credentials() after the env check - dump.py: `debug share` shows 'openrouter set (auth pool)' instead of the misleading 'not set' when the key lives in the pool - add regression tests (pool credential auto-detects; empty pool still raises)	2026-06-08 10:01:47 -07:00
teknium1	a77efada5f	refactor(cli): extract 18 model-flow wizard functions into model_setup_flows (god-file Phase 2) Lift the 18 _model_flow_* provider-setup wizard functions out of hermes_cli/main.py into hermes_cli/model_setup_flows.py. Behavior-neutral; main.py 14050 -> 11479 LOC. select_provider_and_model (the dispatcher) STAYS in main.py and re-imports the flows via an explicit 'from hermes_cli.model_setup_flows import (...)' block, so both its bare-name calls and existing test monkeypatches targeting hermes_cli.main._model_flow_* keep resolving against main's namespace unchanged. Imports: 3 neutral deps (argparse, os, subprocess) at the module top; the 14 main.py-internal helpers the flows call (_prompt_api_key, _save_custom_provider, the reasoning-effort/stepfun/qwen helpers, _run_anthropic_oauth_flow, ...) are lazy-imported per-flow (from hermes_cli.main import ...) so the new module never imports main at module scope -> no import cycle. Repointed one source-inspection change-detector (test_setup_ollama_cloud_force_refresh) to read the module the ollama-cloud branch moved to. Validation: 6563/6563 hermes_cli tests pass; live flow-dispatch probe confirms the lazy main-internal imports resolve at runtime.	2026-06-08 09:42:44 -07:00
teknium1	094aa85c37	refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) Lift the 5 agent-construction/session-resume methods out of HermesCLI into hermes_cli/cli_agent_setup_mixin.py:CLIAgentSetupMixin. Behavior-neutral; cli.py 14139 -> 13492 LOC. Methods moved (~647 LOC): _ensure_runtime_credentials, _resolve_turn_agent_config, _init_agent, _preload_resumed_session, _display_resumed_history. All self.* calls resolve unchanged via the MRO (HermesCLI(CLIAgentSetupMixin, CLICommandsMixin)). Import split (same recipe as #41942): 2 neutral deps (sys, _escape) imported at the mixin module top; 12 cli.py-internal helpers/constants (AIAgent, ChatConsole, CLI_CONFIG, _cprint, _DIM, _RST, _accent_hex, ...) imported lazily per-method (from cli import ...) so the mixin never imports cli at module scope -> no cycle. Repointed one source-inspection change-detector (test_callable_api_key.py) to read the mixin file where the method now lives.	2026-06-08 09:41:34 -07:00
yoniebans	87ac7cac13	fix(dashboard): log update changelog against origin/main, not @{upstream} The behind-count (banner._check_via_local_git) measures HEAD..origin/main, but _recent_upstream_commits logged HEAD..@{upstream}. On a feature-branch checkout @{upstream} is the branch's own tip (0 commits), so the changelog came back empty while behind>0 — the overlay then showed generic filler instead of what changed. Pin the commit range to origin/main so count and changelog agree. Verified against a checkout 11 behind origin/main: now returns 11 commits.	2026-06-08 08:58:26 -07:00
yoniebans	9e360681f8	feat(dashboard): return recent commits from /api/hermes/update/check Add a best-effort `commits` list (sha/summary/author/at) to the update-check response for git/pip installs that are behind upstream, so the desktop's remote update overlay can show what's changed before applying. Additive and non-breaking: existing consumers (legacy dashboard, tests using subset assertions) ignore the new field. Leaves the shared check_for_updates() int contract untouched — commits come from a separate best-effort git call.	2026-06-08 08:58:26 -07:00
teknium1	cb13723f53	fix(pty-bridge): mark os.killpg/getpgid windows-footgun-ok (POSIX-only module)	2026-06-08 07:03:12 -07:00
paulb26	b31c6c33b2	fix(pty-bridge): terminate PTY process groups on teardown	2026-06-08 07:03:12 -07:00
kshitij	b99c6c4277	Merge #42076: nested category plugin discovery + alias-normalized enable/disable (#41066) Merge #42076: nested category plugin discovery + alias-normalized enable/disable (#41066) Lands the complete nested category plugin fix: - Discovery in `hermes plugins list` (from @islam666's #41076, carried in this PR) - Alias-normalized enable/disable mutation path so nested plugins can be toggled - Fixes the #41076 base breakages (web_server 6-tuple unpack + stale test fixtures) Co-authored work: discovery by @islam666 (#41076). Closes #41066.	2026-06-08 05:47:27 -07:00
kshitijk4poor	2b89afec79	fix(plugins): alias-normalize enable/disable for nested category plugins (follow-up to #41076) #41076 makes `hermes plugins list` discover nested category plugins (e.g. observability/nemo_relay). This adds the missing enable/disable mutation path so those plugins can actually be toggled, and fixes two incomplete-update breakages on the #41076 base. Before: `hermes plugins enable nemo_relay` -> "Plugin 'nemo_relay' is not installed or bundled." (exit 1), because cmd_enable/cmd_disable went through _plugin_exists(), which only checked top-level plugins/<name>/. Changes: - Add _resolve_plugin_key(): resolve a bare manifest/leaf name OR a full path-derived key (observability/nemo_relay) to the canonical key the runtime loader gates on, reusing #41076's _discover_all_plugins(). A bare leaf name ambiguous across two categories resolves to None rather than silently picking one. - cmd_enable/cmd_disable resolve first, persist the canonical key, and drop any stale legacy bare-name alias so the enabled/disabled lists can't drift into a contradictory state. _plugin_exists delegates to the same resolver. - Fix #41076 base breakages: _discover_all_plugins now returns 6-tuples, but web_server._merged_plugins_hub() still unpacked 5 (ValueError on the dashboard plugins-hub endpoint) and several test_plugins_cmd_list.py fixtures were still 5-tuples. Both updated; the hub status check is now key-aware. Verified e2e on the real CLI + runtime loader (isolated HERMES_HOME): `hermes plugins enable nemo_relay` writes observability/nemo_relay to config.yaml and the loader then loads it (enabled=True, error=None); a stale bare-name alias is cleared on disable; the dashboard _merged_plugins_hub() runs without crashing. Adds resolution + enable/disable tests; full tests/hermes_cli/test_plugins_cmd* + web_server plugin tests green. Follow-up to #41076 (#41066). Branched from that PR's head.	2026-06-08 17:57:37 +05:30
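The resolution rule described for _resolve_plugin_key() in 2b89afec79 could look roughly like the sketch below. The function name `resolve_plugin_key`, its signature, and the idea of matching on the last path segment are assumptions made for illustration; the real helper works from _discover_all_plugins() and may match on manifest names as well.

```python
from typing import Iterable, Optional


def resolve_plugin_key(name: str, discovered_keys: Iterable[str]) -> Optional[str]:
    """Map a bare leaf name or a full path-derived key to one canonical key.

    Hypothetical sketch: a full key resolves to itself, a leaf name resolves
    only when exactly one discovered key ends with it, otherwise None.
    """
    keys = list(discovered_keys)
    if name in keys:
        # Already a canonical key such as "observability/nemo_relay".
        return name
    matches = [key for key in keys if key.rsplit("/", 1)[-1] == name]
    # Ambiguous or unknown leaf names resolve to None instead of a guess.
    return matches[0] if len(matches) == 1 else None
```

Returning None for an ambiguous leaf name forces the caller to ask for the full key rather than silently toggling the wrong plugin.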
teknium1	0904bc7ea2	refactor(cli): extract 32 slash-command handlers into CLICommandsMixin (god-file Phase 4) Lift the `_handle_*_command` cluster (2,077 LOC) out of HermesCLI into hermes_cli/cli_commands_mixin.py; HermesCLI now inherits CLICommandsMixin so every self.<handler> call resolves unchanged via the MRO. Behavior-neutral. Import discipline mirrors gateway/slash_commands.py (PR #41886): neutral deps imported at the mixin module top level; cli.py-internal helpers/constants (_cprint, _ACCENT, save_config_value, ...) imported lazily inside each handler via 'from cli import ...' so the mixin never imports cli at module scope. cli.py 16215 -> 14139 LOC. One test mock repointed (cli.is_browser_debug_ready -> hermes_cli.cli_commands_mixin.is_browser_debug_ready).	2026-06-08 02:13:07 -07:00
floory	15c99b437f	fix(cli): set PYTHON env for node-gyp native builds on NixOS (#40690 ) * fix(cli): set PYTHON env for node-gyp native builds on NixOS node-gyp (triggered by node-pty during npm ci) looks for python3 on PATH, which fails on NixOS because python3 lives in the nix store and is not on the system PATH. Add _nixos_build_env() — a two-tier helper that detects NixOS and: 1. Fast path: hermes venv python3 (~0s) 2. Fallback: nix-shell which python3 (~2-5s) Wire it into _run_npm_install_deterministic() via a new env= parameter, then pass it through cmd_gui() and _update_node_dependencies(). Non-NixOS systems: _nixos_build_env() returns None, behavior unchanged. * fix(cli): merge _nixos_build_env() with os.environ, fix NixOS detection, add explicit return None - Critical fix: both Tier 1 (venv) and Tier 2 (nix-shell) now return {**os.environ, "PYTHON": ...} instead of {"PYTHON": ...} — subprocess.run with env= replaces the entire environment, so the old code wiped PATH and broke npm/node on NixOS entirely. - Uses re.search(r"^ID=nixos$", ...) for anchored NixOS detection instead of unanchored substring match (could match ID_LIKE=...nixos). - Removes redundant Path.exists() guard before read_text(); just catches OSError (one filesystem read instead of two). - Adds explicit return None at end of function for type-hint consistency.	2026-06-08 13:57:37 +05:30
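Two points from 15c99b437f lend themselves to a short sketch: `subprocess.run(..., env=...)` replaces the whole environment, so the override has to be merged with `os.environ`, and the NixOS check should match the `ID=` line exactly. The helper names `is_nixos` and `build_env` are hypothetical, and the `re.MULTILINE` flag is an assumption, since the commit message elides the remaining arguments of `re.search`.

```python
import os
import re
from typing import Dict


def is_nixos(os_release_text: str) -> bool:
    """Return True only for an exact ID=nixos line, not ID_LIKE=...nixos."""
    # re.MULTILINE (assumed here) makes ^ and $ anchor at each line.
    return re.search(r"^ID=nixos$", os_release_text, re.MULTILINE) is not None


def build_env(python_path: str) -> Dict[str, str]:
    """Merge the PYTHON override into a copy of the current environment.

    Passing only {"PYTHON": ...} as env= would drop PATH and everything else.
    """
    return {**os.environ, "PYTHON": python_path}
```

A caller would then pass the merged mapping, for example `subprocess.run(cmd, env=build_env(path))`, where `cmd` and `path` are whatever the caller already has.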
Teknium	4d18717b6c	fix(gateway): drop --replace from systemd unit templates (#41892 ) Under systemd's Restart=always, --replace turns every restart into a self-kill loop: the new instance reads gateway.pid, kills the previous process, writes its own PID, and on the next restart the cycle repeats. A process supervisor owns the lifecycle — --replace is for manual one-shot takeovers and fights the supervisor. Remove --replace from both the system-level and user-level systemd ExecStart lines. The --replace flag stays available for manual 'hermes gateway run --replace' and on the macOS launchd fallback path (#23387), which is a deliberate manual takeover, not a supervised unit. Also drop RestartMaxDelaySec / RestartSteps from the templates — they require systemd v255+ and are silently ignored on older versions. The _strip_optional_systemd_directives normalizer stays so existing installs whose on-disk unit still carries those directives aren't flagged as outdated. Credit: reported and diagnosed by @Skippy-the-Magnificent-one (PR #37145); reimplemented here under project authorship because the original commit was authored under a non-existent email.	2026-06-08 00:20:08 -07:00
konsisumer	3714caa1b9	fix(session): follow compression continuations for transcript reads	2026-06-07 23:57:20 -07:00
teknium1	1a626470ca	refactor(cli): promote 9 closure handlers to top-level + extract their parsers (god-file Phase 2 follow-up) Subcommands whose handler was a closure defined inside main() — memory, acp, tools, insights, skills, pairing, plugins, mcp, claw — have their handler promoted to a top-level function and their parser block extracted into hermes_cli/subcommands/<name>.py (build_<name>_parser, injected handler). These 9 had zero closure-over-main-locals, so promotion is a pure relocation. acp/mcp parser blocks use the shared add_accept_hooks_flag helper. main() 1798 -> 954 LOC (71% below the 3297 Phase-2 starting point); add_parser calls in main.py 89 -> 28. Deferred: sessions, computer-use, secrets handlers reference <name>_parser (for a no-subcommand print_help fallback) — left in place to avoid the _self_parser indirection; minority, low value. Behavior-neutral: all 9 subcommands' --help (incl nested subactions) byte- identical to pre-extraction (diff-verified). tests/hermes_cli/ 6519 passed / 0 failed; new test_subcommands_followup.py covers the 9 builders.	2026-06-07 22:56:23 -07:00
teknium1	568e127612	refactor(cli): extract 25 more subcommand parsers into hermes_cli/subcommands/ Batch extraction of every remaining subcommand whose handler is top-level and whose parser block is pure argparse: model, setup, postinstall, whatsapp, slack, login, logout, auth, status, webhook, hooks, doctor, security, dump, debug, backup, import, config, version, update, uninstall, dashboard, gui, logs, prompt-size. Each becomes hermes_cli/subcommands/<name>.py with build_<name>_parser() and an injected handler (no main import). dashboard also injects cmd_dashboard_register for its nested 'register' action. Behavior-neutral: all 25 subcommands' --help output (and nested subaction help) diff-verified byte-identical to pre-extraction. Two RawDescriptionHelpFormatter epilogs (debug, logs) needed their multi-line string interiors preserved at column 0 — caught by the --help diff, not compile. main() 3297 -> 1798 LOC across this PR; add_parser calls in main.py 179 -> 89. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new test_subcommands_batch.py smoke-tests all 25 builders + the dashboard two-handler case.	2026-06-07 22:18:14 -07:00
teknium1	4da45e8727	refactor(cli): extract profile + gateway/proxy parsers into hermes_cli/subcommands/ Follow-on to the cron extraction in the same Phase 2 PR. Same pattern: per-group build_<name>_parser() functions with injected handlers, no main import. - subcommands/profile.py: build_profile_parser (190-line block out of main()). - subcommands/gateway.py: build_gateway_parser (gateway + proxy, 238-line block; they shared one inline section). Imports argparse for SUPPRESS defaults. - main(): two more inline blocks become single builder calls. Behavior-neutral: 'profile [sub] --help' and 'gateway/proxy [sub] --help' byte-identical to pre-extraction (diff-verified). main() now 2723 LOC (was 3297 at Phase 2 start); add_parser calls in main.py 179 -> 141. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new builder unit tests cover subactions, aliases, dispatch, flags.	2026-06-07 22:18:14 -07:00
teknium1	b2e6053243	refactor(cli): extract hermes cron parser into hermes_cli/subcommands/ (god-file Phase 2) Phase 2 of the god-file decomposition plan. main()'s argparse tree is 179 inline add_parser calls in one 3,297-line function. This establishes the hermes_cli/subcommands/ package and extracts the first group (cron) as the proof-of-pattern: - hermes_cli/subcommands/_shared.py: shared parser helpers (add_accept_hooks_flag), re-exported from main.py for backwards compat. - hermes_cli/subcommands/cron.py: build_cron_parser(subparsers, cmd_cron=...). Handler injected so the module never imports main (cycle avoidance). - main()'s ~155-line inline cron block becomes one build_cron_parser() call. Behavior-neutral: 'hermes cron create --help' output is byte-identical to origin/main. main() 3297 -> 3143 LOC. Validation: tests/hermes_cli/ 6466 passed / 0 failed under per-file process isolation; new test_subcommands_cron.py covers subactions, aliases, options, no-agent tristate, injected dispatch, and --accept-hooks.	2026-06-07 22:18:14 -07:00
islam666	78e2101cd2	fix: reap zombie subprocesses in web_server action status and meet_bot cleanup - web_server.py: after proc.poll() returns a non-None exit code, call proc.wait() to reap the child and move the entry from _ACTION_PROCS to _ACTION_RESULTS. Previously .poll() alone left <defunct> zombies. - meet_bot.py: terminate and wait on the pcm_pump subprocess (paplay/ ffmpeg) during the finally-block teardown. Previously leaked on every normal bot exit. - tests: add test_action_status_reaps_completed_process and test_action_status_ignores_wait_failure covering both the happy path and the wait()-raises-OSError edge case. Closes #38032	2026-06-07 21:50:57 -07:00
islam666	e53b74c394	fix(dist): stop USER_OWNED_EXCLUDE from filtering nested directories The copytree ignore lambda in _copy_dist_payload applied USER_OWNED_EXCLUDE recursively at every directory depth. This caused nested directories whose names matched exclude entries (bin, logs, cache, etc.) to be silently dropped during distribution install/update. Fix: only apply USER_OWNED_EXCLUDE filtering at the root of the staged tree, matching the two-tier pattern used by _clone_all_copytree_ignore and _default_export_ignore in profiles.py. Add 5 tests covering nested bin/logs/cache preservation and top-level filtering still working. Fixes #37954	2026-06-07 21:50:57 -07:00
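The root-only filtering described in e53b74c394 can be sketched with the `ignore` callable of `shutil.copytree`, which is invoked once per directory with that directory's path and entry names. The `root_only_ignore` factory and the three-item `USER_OWNED_EXCLUDE` set below are illustrative assumptions, not the project's actual helper or exclude list.

```python
import os
import shutil
from typing import Callable, Iterable, Set

# Illustrative subset; the real exclude list lives in the project.
USER_OWNED_EXCLUDE = {"bin", "logs", "cache"}


def root_only_ignore(root: str) -> Callable[[str, Iterable[str]], Set[str]]:
    """Build a copytree ignore callable that filters only at the tree root."""
    root_abs = os.path.abspath(root)

    def _ignore(directory: str, names: Iterable[str]) -> Set[str]:
        if os.path.abspath(directory) != root_abs:
            # Nested directories named bin/logs/cache are copied as usual.
            return set()
        return {name for name in names if name in USER_OWNED_EXCLUDE}

    return _ignore
```

Usage would be `shutil.copytree(src, dst, ignore=root_only_ignore(src))`, which drops a top-level `bin` but keeps `pkg/bin`.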
islam666	f1d3afb151	fix(profiles): skip 'default' in named profiles scan to prevent duplicates When ~/.hermes/profiles/default/ exists as a directory, list_profiles() returns 'default' twice: once as the built-in default profile (~/.hermes) and once from the directory scan (~/.hermes/profiles/default). This causes the cron dashboard API (profile=all) to read the same jobs.json twice, showing every default-profile job duplicated in the UI. Fix: skip name=='default' in the named profiles loop, since it's already added as the built-in default at the top of the function. Fixes #39346	2026-06-07 21:50:57 -07:00
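A minimal sketch of the duplicate guard in f1d3afb151, assuming a simplified layout where the built-in default profile is the home directory itself and named profiles live under `profiles/`. The function `list_profile_names` is a hypothetical stand-in for list_profiles(), which returns richer objects in the real code.

```python
from pathlib import Path
from typing import List


def list_profile_names(hermes_home: Path) -> List[str]:
    """Return profile names with the built-in default listed exactly once."""
    names = ["default"]  # built-in default profile, always present
    profiles_dir = hermes_home / "profiles"
    if profiles_dir.is_dir():
        for child in sorted(profiles_dir.iterdir()):
            # Skip profiles/default: it would duplicate the built-in entry.
            if child.is_dir() and child.name != "default":
                names.append(child.name)
    return names
```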
islam666	18c085b1a4	fix(gateway): normalize optional systemd directives in stale-check (#41119) On older systemd versions that don't support RestartMaxDelaySec / RestartSteps, the installed unit file has those directives silently dropped. systemd_unit_is_current() did a strict text comparison, so the unit was perpetually flagged as outdated. Fix: _strip_optional_systemd_directives() removes RestartMaxDelaySec and RestartSteps from both the installed and expected text before comparison. Units that differ only by these optional directives are now correctly considered current.	2026-06-07 21:50:57 -07:00
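The normalize-then-compare approach from 18c085b1a4 might be sketched as follows. The function names and the line-based matching are assumptions for illustration; only the two directive names come from the commit message.

```python
OPTIONAL_DIRECTIVES = ("RestartMaxDelaySec", "RestartSteps")


def strip_optional_directives(unit_text: str) -> str:
    """Drop lines that set one of the optional directives."""
    kept = [
        line
        for line in unit_text.splitlines()
        if line.split("=", 1)[0].strip() not in OPTIONAL_DIRECTIVES
    ]
    return "\n".join(kept)


def unit_is_current(installed: str, expected: str) -> bool:
    """Compare unit texts after normalizing both sides the same way."""
    return strip_optional_directives(installed) == strip_optional_directives(expected)
```

Because both sides go through the same normalizer, units that differ only in the two optional directives compare equal, while any other difference still marks the unit as outdated.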
Shannon Sands	86e5efb0ae	Preserve Telegram onboarding fallback errors	2026-06-07 19:48:09 -07:00
Shannon Sands	ba29010902	Use httpx for Telegram onboarding worker calls	2026-06-07 19:48:09 -07:00
Teknium	1892e22acb	fix(skills): browse shows full catalog, not first 5000 (#41413) hermes skills browse capped the hermes-index source at 5000, so it surfaced ~5.4k of the ~90.7k skills the index actually carries. Raise the per-source ceiling above catalog size; browse already paginates client-side and the index is disk-cached, so no extra fetch cost.	2026-06-07 10:15:31 -07:00
teknium1	16786f3bb3	feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network Desktop connected to a remote gateway can now attach images and PDFs and display agent-written images. Previously the desktop passed a LOCAL file path to image.attach; on a remote gateway that path doesn't exist, so the image was silently dropped ("skipped unreadable path") and the vision model never saw it. The reverse direction was also broken — images the agent wrote on the gateway rendered as dead links in the remote client. Gateway (tui_gateway/server.py): - image.attach_bytes: base64 byte upload written into the gateway's own images dir and queued via the existing native-image-attach pipeline. Magic-byte extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap, structured error codes. Accepts content_base64/filename (canonical) and data/ext (older-desktop aliases). - pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI and queues the pages as images; 50 MB / 25-page caps. Accepts host path or base64 upload. - Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image) so the two methods and the existing image.attach don't duplicate logic. Gateway (hermes_cli/web_server.py): - GET /api/media: returns a gateway-local image as a base64 data URL so remote clients can display it. Auth-gated like every /api route, extension allowlist + size cap, AND confined to the gateway's own media roots (images/screenshots/cache, resolved symlink-safe) so an authed caller can't read image-extension files anywhere on disk. Desktop (apps/desktop): - syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the connection mode is 'remote'; the local fast path is unchanged. - media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and markdown-text fetch images over /api/media in remote mode. 
Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437) into one coherent implementation, taking the strongest parts of each and adding shared-helper cleanup plus the /api/media root-confinement hardening on top. The per-profile gateway switching from #38876 is intentionally left out as a separable feature. TUI file uploads (#40492) remain a separate surface. Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop media.remote unit tests; full tui_gateway + web_server suites green (472 passed); tsc -b clean; E2E verified the full attach→disk→queue and gateway-path→data-URL display round-trip plus the out-of-root security block. Co-authored-by: Max Mitcham <maxmitcham@mac.home> Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com> Co-authored-by: Chris Cook <ccook@nvms.com> Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>	2026-06-07 10:05:53 -07:00
teknium1	76f01780f0	fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests Follow-up on the deferred-cleanup salvage (#33774): _cleanup_workspace returned early for a non-scratch ('dir'/'worktree') task and never ran the parent sweep, so a scratch parent waiting on a 'dir' child would leak its deferred workspace forever. Run the parent sweep before the early return. Adds regression tests: deferred-while-child-active, swept-after-last-child, and dir-child-unblocks-scratch-parent.	2026-06-07 09:50:44 -07:00
annguyenNous	9405cd0812	fix: defer scratch workspace cleanup when task has active children (#33774) When a Kanban task with workspace_kind=scratch completes, the _cleanup_workspace() function immediately deletes the workspace directory. If the task has children linked via task_links, those children find the workspace deleted when they start. This fix adds two checks: 1. Before deleting, check if any children are still active (todo/ready/running). If so, defer cleanup. 2. After a child completes, check if parent workspace can now be cleaned up (all children terminal). Fixes NousResearch/hermes-agent#33774	2026-06-07 09:50:44 -07:00
Teknium	cb3e41e2fd	feat(onboarding): opt-in structured profile-build path on first contact (#41114 ) * feat(onboarding): opt-in structured profile-build path on first contact On a user's very first gateway message, Hermes now optionally offers to build a short profile of them — then, only with consent, gathers durable facts and persists them to the user-profile memory store (memory tool, target="user") so future sessions start already knowing who they are. Inspired by Poke's zero-input onboarding, but consent-first by design: - The agent OFFERS, never assumes. Declining stops it immediately. - Before ANY external lookup it states what it will look up and asks. - It never reads connected accounts (email/calendar) silently — the exact privacy concern that made naive implementations feel invasive. Wiring reuses existing infrastructure end-to-end: - gateway/run.py first-message hook (was a plain self-intro) now swaps in the profile-build directive when enabled and not yet offered. - agent/onboarding.py gains profile_build_mode()/profile_build_directive() + PROFILE_BUILD_FLAG, latched once via the existing onboarding.seen mechanism so the offer fires at most once per install. - config default onboarding.profile_build: "ask" (set "off" to disable). Added to an existing section, so no _config_version bump needed. No new storage layer, no new injection path, no prompt-cache impact. * fix(dashboard): fold onboarding into agent tab to avoid 1-field category onboarding.profile_build is the only schema-surfaced onboarding field (onboarding.seen is an internal latch dict), so the dashboard CONFIG_SCHEMA single-field-category invariant rejected it. Merge onboarding -> agent like the other small categories.	2026-06-07 08:36:48 -07:00
Teknium	2912d94370	fix: guard int(os.getenv()) casts against malformed env vars (#40598) A non-numeric value in env vars like HERMES_STREAM_RETRIES, HERMES_KANBAN_SPECIFY_MAX_TOKENS, GOOGLE_CHAT_MAX_BYTES, IRC_PORT, etc. raised ValueError at import/init and crashed startup. Parse them safely, falling back to the default. Unified onto the existing utils.env_int(key, default) helper for core/hermes_cli/tools modules instead of the original PR's three duplicate local helpers; plugins keep minimal inline guards (no core-utils import). All existing max()/min()/`or extra.get()` wrappers preserved. Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>	2026-06-07 06:14:24 -07:00
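For readers unfamiliar with the failure mode fixed above, here is a minimal sketch of a safe integer env-var parser. The commit names the helper `utils.env_int(key, default)`; the body below is an assumption about how such a helper could behave, not the repository's actual implementation.

```python
import os


def env_int(key: str, default: int) -> int:
    """Parse an integer env var, falling back to `default`.

    Assumed behaviour: unset, blank, and non-numeric values all yield the
    default instead of raising ValueError at import/init time.
    """
    raw = os.getenv(key)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default
```

With this shape, a call such as `env_int("HERMES_STREAM_RETRIES", 3)` returns 3 when the variable holds something like `abc`, where a bare `int(os.getenv(...))` would crash startup.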
oxngon	e2cc24e331	fix: respect Honcho env var fallback in doctor and honcho status hermes doctor and hermes honcho status warned 'Honcho config not found' whenever ~/.honcho/config.json was absent, even though HONCHO_API_KEY in .env resolves a working config via HonchoClientConfig.from_global_config() -> from_env(). Both now check hcfg.api_key/base_url before warning. Co-authored-by: oxngon <98992931+oxngon@users.noreply.github.com>	2026-06-07 05:37:02 -07:00
Teknium	9e63109522	feat(dashboard): change UI font from the theme picker, independent of theme (#41145 ) The dashboard font is now selectable from the UI, not just YAML. A new Font section in the header theme picker overrides the UI font of whatever theme is active; the choice is orthogonal to the theme and survives theme switches. Each theme keeps its own font as the default — picking "Theme default" clears the override. - web/src/themes/fonts.ts: curated font catalog (system + Google Fonts across sans/serif/mono), each with a family stack and optional webfont URL. The catalog is the only injected-font surface — no free-text URL box, so the injected <link> origins stay fixed. - web/src/themes/context.tsx: font-override state (localStorage + server), applied after theme typography so it wins; theme apply re-asserts it, and clearing re-runs theme apply to restore the theme's own font. Mono is left to the theme so code/terminal are untouched. - web/src/components/ThemeSwitcher.tsx: Font section with grouped, self- previewing font rows and a "Theme default" clear option. - hermes_cli/web_server.py: GET/PUT /api/dashboard/font persisting to config.yaml dashboard.font, with a server-side id allow-list (unknown ids coerce to the theme sentinel). - i18n + types, api client methods, tests, and docs. Validation: 6 new backend endpoint tests pass; tsc + vite build clean; live browser test confirmed pick/persist/survive-theme-switch/clear all work.	2026-06-07 03:39:01 -07:00
Teknium	0507e4630d	fix(desktop): preserve configured base_url on same-provider model switch (#41121 ) The desktop model picker calls POST /api/model/set with provider+model only (no base_url). _apply_main_model_assignment cleared model.base_url for every non-custom provider, so re-picking a Xiaomi MiMo model wiped a Token Plan endpoint (https://token-plan-*.xiaomimimo.com/v1) back to the registry default api.xiaomimimo.com — breaking valid tp- keys with 401s. Now base_url is cleared only when switching to a different provider (the stale URL belonged to the old one); same-provider re-assignment preserves it, and an explicitly supplied base_url is honored for any provider.	2026-06-07 02:48:21 -07:00
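The base_url rule from the commit above can be sketched as a small pure function. The dictionary keys and the function name `apply_model_assignment` are hypothetical; only the three rules (explicit URL wins, same provider preserves, different provider clears) come from the commit message.

```python
def apply_model_assignment(model_cfg, provider, model, base_url=None):
    """Return a new model config after a picker assignment (illustrative only)."""
    new_cfg = dict(model_cfg)
    switching_provider = model_cfg.get("provider") != provider
    new_cfg["provider"] = provider
    new_cfg["model"] = model
    if base_url:
        # An explicitly supplied base_url is honored for any provider.
        new_cfg["base_url"] = base_url
    elif switching_provider:
        # The stale URL belonged to the previous provider, so drop it.
        new_cfg.pop("base_url", None)
    # Same provider and no explicit URL: the configured base_url is kept.
    return new_cfg
```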
Teknium	ed81cfe3de	fix(cron): bound the desktop run-history query to one job (#41088 ) The cron run-history endpoint (GET /api/cron/jobs/{id}/runs, added in #40684) reused list_sessions_rich's order_by_last_active path with a leading-wildcard id_query. That routes through the recursive compression-chain CTE, which seeds from EVERY source='cron' row in the DB and runs per-row preview/last_active subqueries before filtering to one job and applying LIMIT. Work scaled with the total cron history, so a large pile made the run-history load time out before eventually populating. Cron runs are flat, never-compressed sessions with ids of the form cron_{job_id}_{ts}, so the chain machinery is pure overhead and the job binding is a true prefix, not a substring. - New SessionDB.list_cron_job_runs(): bounded [prefix, hi) id-range scan on source='cron', ordered by started_at DESC, with the same preview/last_active enrichment. No CTE, no leading-wildcard LIKE. - Add idx_sessions_source(source, id) so the range is an index scan; bump SCHEMA_VERSION 14 -> 15 (index reconciles onto existing DBs via CREATE INDEX IF NOT EXISTS on startup). - Point the endpoint at the new method. Measured on a real SessionDB with 30k cron rows: 5ms vs 85ms for the old path (16x), and the new path stays flat as the pile grows while the old one scaled with it. Verified the query plan uses idx_sessions_source_id (range scan, no full table scan), runs are correctly scoped (substring collisions like cron_xalpha_ excluded), newest-first, and paged.	2026-06-07 02:41:01 -07:00
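The bounded [prefix, hi) id-range scan described above can be sketched with the standard sqlite3 module. The table layout here is a simplified, hypothetical stand-in for the real sessions schema, the preview/last_active enrichment is omitted, and computing `hi` by bumping the last character of the prefix is an assumption about how the upper bound might be derived.

```python
import sqlite3
from typing import List, Tuple


def cron_id_range(job_id: str) -> Tuple[str, str]:
    """Return the half-open [lo, hi) id range covering ids 'cron_{job_id}_*'."""
    lo = f"cron_{job_id}_"
    # Incrementing the final character gives the smallest string that sorts
    # after every id starting with `lo` under SQLite's default BINARY collation.
    hi = lo[:-1] + chr(ord(lo[-1]) + 1)
    return lo, hi


def list_cron_job_runs(conn: sqlite3.Connection, job_id: str,
                       limit: int = 50, offset: int = 0) -> List[tuple]:
    """Newest-first runs of one cron job: range scan, no leading-wildcard LIKE."""
    lo, hi = cron_id_range(job_id)
    return conn.execute(
        "SELECT id, started_at FROM sessions "
        "WHERE source = 'cron' AND id >= ? AND id < ? "
        "ORDER BY started_at DESC LIMIT ? OFFSET ?",
        (lo, hi, limit, offset),
    ).fetchall()
```

Because both bounds are anchored at the start of the id, an index on (source, id) can serve the query as a range scan, and ids such as `cron_xalpha_...` fall outside the range rather than matching as substrings.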
Teknium	0524c9b34e	feat(compression): raise compaction trigger to 85% for gpt-5.5 on Codex OAuth (#40957 ) The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window (verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses is rejected with context_length_exceeded while ~250K succeeds; the same slug exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the default 50% trigger, auto-compaction fires at ~136K — half the usable window. Raise the trigger to 85% (~231K) on this exact route only, gated by a new compression.codex_gpt55_autoraise config flag (default true). When it fires, emit a one-time notice (CLI inline print + gateway status_callback replay) with the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's global threshold. - _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex - _compression_threshold_for_model() now provider-aware + opt-out param - config key + _config_version bump (27->28) for backfill - docs + tests (40 cases in test_arcee_trinity_overrides.py)	2026-06-07 01:40:50 -07:00
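The arithmetic in the commit above (50% of 272K is about 136K, 85% is about 231K) can be checked with a short sketch. The function names echo the commit's `_is_codex_gpt55()` and `_compression_threshold_for_model()`, but the bodies, the `startswith` family match, and the parameter names are assumptions made for illustration.

```python
CODEX_GPT55_CONTEXT = 272_000   # hard cap on the Codex OAuth route, per the commit
CODEX_GPT55_THRESHOLD = 0.85    # raised trigger for this route only


def is_codex_gpt55(provider: str, model: str) -> bool:
    """Assumed matcher: the 5.5 family, and only on provider 'openai-codex'."""
    return provider == "openai-codex" and model.startswith("gpt-5.5")


def compression_threshold(provider: str, model: str,
                          user_threshold: float = 0.50,
                          autoraise: bool = True) -> float:
    """Return the compaction trigger as a fraction of the context window."""
    if autoraise and is_codex_gpt55(provider, model):
        return CODEX_GPT55_THRESHOLD
    return user_threshold


def trigger_tokens(provider: str, model: str, context_window: int, **kwargs) -> int:
    """Token count at which auto-compaction would fire."""
    return round(context_window * compression_threshold(provider, model, **kwargs))
```

Setting `autoraise=False` models the opt-out flag, in which case the route falls back to the user's global threshold like every other provider.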
Teknium	fe0b3f2338	fix(windows): retry watcher Popen without breakaway when parent job denies it, plus regression tests for the breakaway bit (#40956 ) #40909 added `CREATE_BREAKAWAY_FROM_JOB` to `windows_detach_flags()`, which fixed the headline bug (gateway dies after Desktop GUI update and never comes back). The flag's own docstring acknowledges that restrictive parent job objects can still refuse breakaway with `ERROR_ACCESS_DENIED`, surfacing as `OSError` on the `subprocess.Popen` call: "Callers in this codebase already wrap detached spawns in try/except OSError and fall back to a cmd.exe wrapper, so the breakaway-denied case degrades gracefully rather than crashing." That's true for `_spawn_detached` in `gateway_windows.py` (the `hermes gateway start` path), which has both the breakaway bit AND a retry-without-breakaway fallback. It's NOT true for the post-update watcher path in `launch_detached_profile_gateway_restart` (`hermes_cli/gateway.py`), which only has `except OSError: return False` and gives up entirely. If a user's shell/terminal/container wraps Hermes in a breakaway-denying job, the gateway-respawn watcher silently fails to launch instead of trying again without breakaway. This PR closes that gap and adds the regression tests that were missing from the original fix. ## Changes ### `hermes_cli/_subprocess_compat.py` Adds a sibling helper `windows_detach_flags_without_breakaway()` so callers can express the fallback symbolically (via the helper) rather than coding the magic `& ~0x01000000` mask at every site. Documented on `windows_detach_flags` and `windows_detach_flags_without_breakaway` with the recommended try/except pattern. ### `hermes_cli/gateway.py::launch_detached_profile_gateway_restart` Two changes, both aligned with the canonical pattern in `gateway_windows._spawn_detached`: 1. 
The outer watcher Popen now wraps in `try/except OSError`, and on failure retries with `windows_detach_flags_without_breakaway()` (POSIX never reaches this branch — `start_new_session=True` can't raise OSError). 2. The inlined respawn payload (the `python -c` watcher) also wraps its CreateProcess in try/except OSError and retries with `_flags & ~_CREATE_BREAKAWAY_FROM_JOB` on failure. This matters because the watcher's job-object inheritance is independent of the outer process's — even if the outer Popen succeeds with breakaway, the respawned gateway might inherit a job that doesn't. ### Regression tests in `tests/tools/test_windows_native_support.py` #40909 shipped the fix without any test that the breakaway bit is present (the existing `test_windows_detach_flags_has_expected_win32_bits` asserted only the three legacy bits). Four new tests close that: - `test_windows_detach_flags_includes_breakaway_from_job` — explicit assertion that the breakaway bit is in the default bundle, with the rationale spelled out in the docstring so a future maintainer staring at this test understands why removing it would resurrect the gateway-dies-after-GUI-update bug. - `test_windows_detach_flags_without_breakaway_drops_only_that_bit` — fallback payload keeps the other three detach bits intact. - `test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway` — static-text check on the stringified watcher payload. The inlined Python program isn't reachable via normal import-time inspection because it lives in a `textwrap.dedent("""...""")` literal that gets passed to a separate `python -c` interpreter. Asserting that both `_CREATE_BREAKAWAY_FROM_JOB` (symbolic) and `0x01000000` (hex literal) appear inside the dedent block is a sufficient regression guard against accidental refactors. - `test_launch_detached_profile_gateway_restart_outer_popen_has_access_denied_fallback` — static check that this PR's fallback retry is wired up symbolically. 
Without standing up a real Windows job object that refuses breakaway, we can't trigger the OSError in a unit test; the text guard catches the case where a future refactor removes the helper import or the `& ~_CREATE_BREAKAWAY_FROM_JOB` retry. Also extends `test_windows_detach_flags_has_expected_win32_bits` to include the breakaway bit assertion and updates `test_windows_flags_zero_on_posix` to cover the new helper. ## Tests Locally on Windows: 8/8 in the `-k "detach or breakaway or popen_kwargs or launch_detached or gateway_run_update or hermes_cli_gateway"` slice pass. Broader `tests/hermes_cli/test_gateway*.py + test_windows_native_support.py`: 172 passed, 10 failed. All 10 failures are pre-existing POSIX-only tests running on a Windows host (os.geteuid, SIGKILL fallback, is_linux fixture mismatches). Stashing this PR and re-running on bare post-#40909 main reproduces all 10 identically — none are regressions. POSIX paths unchanged: `windows_detach_flags()` and `windows_detach_flags_without_breakaway()` both return 0 off Windows, `windows_detach_popen_kwargs()` still yields `{"start_new_session": True}`. ## Out of scope - The other detached-spawn site in `hermes_cli/gateway.py` (around line 3068) also uses `windows_detach_popen_kwargs()` + `except OSError`. It deserves the same fallback treatment but the codepath is different enough (not the update-flow watcher) that it warrants a separate PR with its own scrutiny. - `gateway/run.py` has Windows branches with `windows_detach_popen_kwargs` too — same reasoning. ## Context Follow-up to #40909 (merged). I had a parallel PR (#40934, closed) that duplicated the core breakaway fix; the bits unique to that PR that #40909 didn't cover are the contents of this one. Closing #40934 and opening this slimmed-down version as the focused follow-up.	2026-06-07 01:21:58 -07:00
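The retry-without-breakaway pattern from the commit above can be sketched as follows. The hex values are the documented Win32 process creation flags, but which three "legacy" bits the real `windows_detach_flags()` bundles is an assumption, and the `platform` and `popen` parameters are hypothetical additions so the sketch can be exercised on any OS.

```python
import subprocess
import sys

# Win32 process creation flags (documented values).
DETACHED_PROCESS = 0x00000008
CREATE_NEW_PROCESS_GROUP = 0x00000200
CREATE_NO_WINDOW = 0x08000000
CREATE_BREAKAWAY_FROM_JOB = 0x01000000


def windows_detach_flags(platform: str = sys.platform) -> int:
    """Detach bundle including breakaway; 0 off Windows."""
    if platform != "win32":
        return 0
    # Assumed bundle: the exact legacy bits are not listed in the commit.
    return (DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP
            | CREATE_NO_WINDOW | CREATE_BREAKAWAY_FROM_JOB)


def windows_detach_flags_without_breakaway(platform: str = sys.platform) -> int:
    """Same bundle with only the breakaway bit cleared."""
    return windows_detach_flags(platform) & ~CREATE_BREAKAWAY_FROM_JOB


def spawn_detached(argv, popen=subprocess.Popen, platform: str = sys.platform):
    """Spawn detached; on Windows retry without breakaway if the job denies it."""
    if platform != "win32":
        return popen(argv, start_new_session=True)
    try:
        return popen(argv, creationflags=windows_detach_flags(platform))
    except OSError:
        # A restrictive parent job object refused breakaway; a child that
        # stays inside the job is still better than no child at all.
        return popen(argv, creationflags=windows_detach_flags_without_breakaway(platform))
```

Injecting `popen` lets a unit test simulate the access-denied case with a fake that raises OSError whenever the breakaway bit is set, which avoids standing up a real job object.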
islam666	ccacfdbd6d	fix(plugins): discover nested category plugins in 'plugins list' (issue #41066) _discover_all_plugins() previously did a flat iterdir() scan, missing all category-namespaced plugins (web/, image_gen/, browser/, video_gen/). Now recurses up to 2 levels deep, matching PluginManager._scan_directory_level(). Also fixes _plugin_status() to check both manifest name AND path-derived key against enabled/disabled sets, so category plugins like 'web/tavily' show correct status when enabled via config.	2026-06-07 08:02:55 +00:00
kshitijk4poor	44c0c2d4ac	refactor(inventory): make force_fresh_nous_tier keyword-only + pin contract Follow-up to the salvaged perf fix. The new force_fresh_nous_tier param was inserted into list_authenticated_providers between custom_providers and max_models. Make it keyword-only (*) so a positional caller passing max_models as the 5th arg can never silently mis-bind it to the tier-refresh flag, and add a signature-contract test that fails if the keyword-only separator is later dropped. All in-repo callers already use keyword args; verified no caller breaks.	2026-06-07 00:41:13 -07:00
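A signature-contract check of the kind described above could look like the sketch below. The first three parameter names (`arg1` to `arg3`), the default values, and the stand-in body are hypothetical; only `custom_providers`, `force_fresh_nous_tier`, `max_models`, and the keyword-only separator come from the commit message.

```python
import inspect


def list_authenticated_providers(
    arg1=None,               # hypothetical placeholder
    arg2=None,               # hypothetical placeholder
    arg3=None,               # hypothetical placeholder
    custom_providers=None,
    *,                       # everything after this is keyword-only
    force_fresh_nous_tier=False,
    max_models=8,
):
    """Stand-in body; only the signature matters for this sketch."""
    return {"force_fresh_nous_tier": force_fresh_nous_tier, "max_models": max_models}


def assert_force_fresh_is_keyword_only(func) -> None:
    """Fail if the keyword-only separator is ever dropped from `func`."""
    param = inspect.signature(func).parameters["force_fresh_nous_tier"]
    assert param.kind is inspect.Parameter.KEYWORD_ONLY
```

With the separator in place, a fifth positional argument raises TypeError instead of silently binding to the tier-refresh flag.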
helix4u	eb70ab894b	fix(inventory): avoid fresh Nous tier checks in picker payloads	2026-06-07 00:41:13 -07:00
brooklyn!	846821d8c0	Merge pull request #40684 from NousResearch/bb/cron-sessions-sidebar feat(desktop): first-class cron jobs in the sidebar + dashboard scheduler	2026-06-07 00:32:25 -05:00
Teknium	fc086da8bd	fix(gateway,windows): reliability — JOB breakaway + status --deep probes + test-leak fix (#40909 ) * fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep Three coordinated fixes for the Windows gateway reliability story: 1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn The 'hermes update' triggered from the Electron Desktop GUI ran inside Electron's job object. Without breakaway, the post-update gateway watcher spawned by update — already DETACHED_PROCESS — was still reaped when Electron's job tore down, so the gateway never came back after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000) to: - hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by every helper that calls windows_detach_popen_kwargs(), including launch_detached_profile_gateway_restart() - The watcher subprocess's own respawn snippet in hermes_cli/gateway.py (inlined flags so the watcher's child respawn also breaks away) _spawn_detached() in gateway_windows.py already had the flag; this change brings the rest of the codebase to parity. 2. Per-minute supervisor Scheduled Task — Windows equivalent of systemd Restart=always Introduces hermes_cli/gateway_supervisor.py and registers it as a second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1, LIMITED rights) alongside the existing ONLOGON task. Every minute, the supervisor uses the same gateway.status.get_running_pid() probe as 'hermes gateway status' and, if no gateway is alive, calls gateway_windows._spawn_detached() (which now includes BREAKAWAY) to bring one back. Covers every crash mode, not just 'machine rebooted': taskkill, OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw startup per minute when down, one PID-existence check per minute when up. Wired into both the schtasks-success and Startup-folder-fallback install paths via _install_supervisor_best_effort(), and removed in uninstall(). 
Best-effort: a failing supervisor install logs a warning but doesn't roll back the primary install. 3. 'hermes gateway status --deep' shows per-probe PASS/FAIL Replaces the existing terse '--deep' output (which only printed paths) with an actual diagnostic table: [1] PID file present [2] Lock file held by a live process [3] get_running_pid() result [4] _pid_exists(pid) — OS-level liveness [5] gateway_state.json (state + age) [6] Last lifecycle event from gateway-exit-diag.log When the high-level summary disagrees with reality, the user can see exactly which signal is lying. Test-leak fix ------------- tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages monkey-patched is_linux/is_wsl/supports_systemd_services to simulate WSL but did NOT stub is_windows(). On a Windows host, the dispatcher in _gateway_command_inner takes the is_windows() branch BEFORE the WSL guidance branch, so the test invoked gateway_windows.install() for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd — the REAL user Startup folder, never sandboxed by tmp_path — pointing at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper. When pytest tore down the tmp_path, every subsequent Windows login flashed a cmd.exe window that failed to find the missing target. Stubs is_windows=False on all four affected tests: test_install_wsl_no_systemd test_start_wsl_no_systemd test_status_wsl_running_manual test_status_wsl_not_running Defense-in-depth: _build_startup_launcher() now prefixes the launcher with 'if not exist <target> exit /b 0', so any future stale Startup entry silently no-ops instead of flashing a console window. Status enhancements ------------------- - status() now reports supervisor task presence alongside the existing schtasks/Startup info, and nudges the user to reinstall if the supervisor isn't registered. - Deep mode dumps both the supervisor task name + script path. 
* fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes Earlier in this branch we added a per-minute schtasks-based supervisor to respawn the gateway after crashes / GUI-update SIGTERMs. The implementation flashed a brief console window on every firing, which stole window focus. We tried several variants: - cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem) - schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is actually subsystem=Console, not GUI; it respawns the real pythonw) - schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side conhost preallocation; documented Windows quirk) - XML registration with <Hidden>true> -> still flashes (<Hidden> only hides the task in the Task Scheduler UI, not the spawned window) Researched what leading projects do: - Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor. - Tailscale: real Windows Service via SCM. Session 0, no console possible. - Syncthing: --no-console flag inside the binary + Startup folder. - openclaw: VBS Run(..., 0, False) wrapper. Suppresses the window but Super User Q971162 confirms focus-steal still occurs in some cases. None of these use a per-minute polling scheduled task. The 'auto-restart on crash' responsibility belongs INSIDE the daemon (Tailscale's in-process recovery / Ollama's monitor+worker pair) OR is delegated to the Windows Service Control Manager — not Task Scheduler. So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB fix in _subprocess_compat.py (from commit `c1e5fa433`) survives — that is the real fix for problem #2 (GUI-update kills gateway): the post-update watcher in launch_detached_profile_gateway_restart() now breaks out of Electron's job object, so the gateway respawn watcher survives the GUI quit and successfully respawns the gateway. 
Surviving from `c1e5fa433`: * CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes #2) * Inlined breakaway flag in the watcher respawn snippet in gateway.py * hermes gateway status --deep PASS/FAIL probes (fixes #1 — visibility) * 'if not exist <target> exit /b 0' guard in _build_startup_launcher (fixes #3 — silent no-op for stale Startup entries) * tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause of #3 — pytest WSL tests no longer leak Startup entries on Win hosts) Removed in this commit: * hermes_cli/gateway_supervisor.py (entire file) * Supervisor section in hermes_cli/gateway_windows.py (~180 lines): get_supervisor_task_name, get_supervisor_script_path, _build_supervisor_cmd_script, _write_supervisor_script, _install_supervisor_task, is_supervisor_task_registered, _install_supervisor_best_effort * _install_supervisor_best_effort() calls in install() (3 spots) * supervisor cleanup block in uninstall() * supervisor display lines in status() / status(deep=True) Future direction (out of scope for this PR): the right place for Windows 'Restart=always' semantics is a real Windows Service installed via pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM auto-restart, no console window possible. That's a meaningful next-PR project, not a band-aid. Tests: 51 pass / 2 pre-existing failures in tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are TestSupportsSystemdServicesWSL cases that fail on origin/main too — unrelated to this PR).	2026-06-06 19:53:58 -07:00
Gille	fda66c488b	docs(kanban): clarify decomposer profile roles	2026-06-06 19:29:00 -07:00
Teknium	887295ba54	fix(config): preserve custom-provider models maps and metadata through v11->v12 migration (#40573) Salvaged from #40410; cleaned up, re-verified against main, tests added. Co-authored-by: rodboev <rodboev@users.noreply.github.com>	2026-06-06 18:43:20 -07:00

2613 commits