hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-26 17:38:36 +00:00

Author	SHA1	Message	Date
Teknium	96af61b6ef	feat(memory,skills): approve/deny gate for memory + skill writes (#38199) Adds memory.write_mode and skills.write_mode (on|off|approve), applied to both foreground turns and the background self-improvement review fork, the source of the unprompted 'wrong assumption' saves users reported. - on (default): write freely, unchanged behaviour - off: never write; the tool returns a clean disabled result - approve: don't commit. Memory foreground writes prompt inline (small, reviewable in a chat bubble); background memory writes and ALL skill writes stage to a pending store instead (a SKILL.md is too large to review inline, and a daemon thread can't block on a prompt) Review staged writes from CLI or any messaging platform: /memory pending|approve|reject|mode /skills pending|approve|reject|diff|mode Skill review respects the size asymmetry: inline you see a one-line gist; the full unified diff stays out-of-band (/skills diff, dashboard, or the staged JSON file). New: tools/write_approval.py (gate + pending store), hermes_cli/write_approval_commands.py (shared CLI+gateway handlers). Gates wired at the single entry points memory_tool() and skill_manage(), using the existing write-origin ContextVar to distinguish foreground from background_review.	2026-06-09 21:51:43 -07:00
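The three write modes above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/write_approval.py: `gate_write`, `PendingStore`, and the returned status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

VALID_MODES = ("on", "off", "approve")


@dataclass
class PendingStore:
    """Hypothetical in-memory stand-in for the pending store."""
    items: List[dict] = field(default_factory=list)

    def stage(self, kind: str, payload: str) -> int:
        self.items.append({"kind": kind, "payload": payload})
        return len(self.items) - 1


def gate_write(
    kind: str,                       # "memory" or "skill"
    payload: str,
    mode: str,                       # "on" | "off" | "approve"
    origin: str,                     # "foreground" | "background_review"
    store: PendingStore,
    commit: Callable[[str], None],
    prompt: Optional[Callable[[str], bool]] = None,
) -> str:
    if mode not in VALID_MODES:
        raise ValueError(f"unknown write_mode: {mode!r}")
    if mode == "on":
        commit(payload)
        return "written"
    if mode == "off":
        return "disabled"
    # approve: only foreground memory writes are small enough to prompt inline.
    if kind == "memory" and origin == "foreground" and prompt is not None:
        if prompt(payload):
            commit(payload)
            return "written"
        return "rejected"
    # Background memory writes and all skill writes are staged for later review.
    store.stage(kind, payload)
    return "staged"
```

Keeping the decision in one function mirrors the commit's note that the gates are wired at single entry points.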
Ben Barclay	5cf6e28a2f	fix(gateway): auto-start after container restart via planned-stop marker (#42675 ) (#43236 ) * fix(gateway): auto-start after container restart via planned-stop marker On Docker (s6-overlay), the gateway runs as a dynamically-registered s6 service. When the container stops/restarts/upgrades, s6 sends the gateway a plain SIGTERM. The shutdown path (_stop_impl) ended with an unconditional _update_runtime_status("stopped"), persisting gateway_state=stopped to the volume. container_boot.py reads that on the next boot and only auto-starts gateways whose last state was "running" (_AUTOSTART_STATES) — so after a routine `docker compose up --force-recreate` the gateway stays down and messaging channels silently go dark, with no error surfaced (issue #42675). The codebase already distinguishes intentional stops from unexpected signals via the planned-stop marker (write_planned_stop_marker / consume_planned_stop_marker_for_self): `hermes gateway stop`, systemd/launchd ExecStop, and Ctrl+C write a marker before signalling, so the handler classifies them as planned. An unmarked SIGTERM (container/s6 restart, OOM, bare kill) is signal-initiated. This wires that existing classification through to the state persist, rather than adding unreliable signal-source inference: - run.py: GatewayRunner._signal_initiated_shutdown, set in shutdown_signal_handler's unmarked-signal branch. In _stop_impl, a signal-initiated (non-restart) teardown now persists "running" instead of "stopped" — preserving the operator's run-intent and overwriting the mid-shutdown "draining" marker so _AUTOSTART_STATES matches on reboot. Operator stops and restarts persist "stopped" as before. - service_manager.py: S6ServiceManager.stop() now writes the planned-stop marker for the supervised PID (read from s6-svstat) before `s6-svc -d`, so an in-container `hermes gateway stop` is correctly classified as intentional (parity with the systemd/launchd/host stop paths, which already mark). 
Best-effort: a marker-write failure falls back to the safe signal-initiated path. Tests: shutdown persist-decision table (signal→running, operator→stopped, restart→stopped), s6 stop marker write + svstat PID parse + failure tolerance. The signal→running and s6-marker tests fail without the respective source change. Verified end-to-end against a container built from this branch: an unmarked SIGTERM to the live gateway leaves gateway_state=running (shutdown-context log confirms signal path); existing real container-restart suite still green. * docs(docker): clarify gateway autostart distinguishes operator-stop from container-kill The per-profile-supervision section described the autostart-across-restart contract as "running gateways come back, stopped stay stopped" without spelling out what records 'stopped'. That contract was the source of #42675 confusion: users expected a restart to bring the gateway back and it didn't. With the write-side fix, only an explicit `hermes gateway stop` records 'stopped'; container/s6 restart SIGTERMs (incl. image upgrades and unexpected exits) leave the state 'running' so the gateway auto-starts. Make that distinction explicit in both the multi-profile and per-profile-supervision sections. * test(docker): real-restart autostart E2E for #42675 Adds test_live_gateway_autostarts_after_real_restart_without_manual_state_stamp: a live s6-supervised gateway is killed by an actual `docker restart` SIGTERM (no manual gateway_state stamp, no planned-stop marker) and must auto-start on the next boot. Exercises the WRITE side of the fix that the existing stamp-based tests bypass. Verified to FAIL against an origin/main image (reconciler logs prior_state=stopped action=registered — the #42675 bug) and PASS against the fixed image (prior_state=running action=started).	2026-06-10 14:01:34 +10:00
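The persist-decision table from the tests (signal→running, operator→stopped, restart→stopped) can be written as a small pure function. This is a sketch under assumptions: `state_to_persist` and `should_autostart` are hypothetical names, and the autostart set is assumed to contain only "running", as the commit message describes.

```python
# Assumed contents of the autostart set, per the commit message.
AUTOSTART_STATES = frozenset({"running"})


def state_to_persist(signal_initiated: bool, restarting: bool) -> str:
    """Return the gateway_state to write at the end of shutdown.

    An unmarked signal (container/s6 restart, OOM, bare kill) that is not
    part of a restart keeps the operator's run-intent as "running".
    Operator stops and restarts persist "stopped".
    """
    if signal_initiated and not restarting:
        return "running"
    return "stopped"


def should_autostart(prior_state: str) -> bool:
    """Boot-time check: only gateways last recorded as running come back."""
    return prior_state in AUTOSTART_STATES
```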
Ben Barclay	7df3aa34b1	fix(dashboard-auth): warn when public_url override is silently rejected (#43214 ) A non-empty HERMES_DASHBOARD_PUBLIC_URL / dashboard.public_url value that fails URL validation (overwhelmingly: a missing http(s):// scheme, e.g. "hermes.domain.com") was silently discarded by resolve_public_url(), falling back to reconstructing the OAuth redirect_uri from request headers. Behind a reverse proxy that doesn't forward X-Forwarded-Proto reliably, that yields an http:// callback even though the operator explicitly set the public URL — with no signal as to why (#42780). Emit a deduplicated operator-facing WARNING (once per distinct value, since resolve_public_url runs per request) naming the offending value and the required scheme. Turns a silent footgun into a self-diagnosing one; behaviour is otherwise unchanged. Tests assert the warning fires for a scheme-less value, is deduplicated across repeated calls, and stays silent for a valid value — all three fail without the fix.	2026-06-10 12:14:57 +10:00
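A once-per-value warning of this kind might look like the sketch below. It is illustrative only: the function body, logger name, and message text are assumptions, not the project's resolve_public_url().

```python
import logging
from typing import Optional
from urllib.parse import urlsplit

logger = logging.getLogger("dashboard_auth_sketch")  # hypothetical logger name
_warned_values = set()  # values already warned about, so each warns once


def resolve_public_url(raw: Optional[str]) -> Optional[str]:
    """Return a validated public URL, or None so the caller falls back."""
    value = (raw or "").strip()
    if not value:
        return None
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        return value.rstrip("/")
    # Rejected: warn once per distinct value, since this runs per request.
    if value not in _warned_values:
        _warned_values.add(value)
        logger.warning(
            "Ignoring public URL override %r: it must start with http:// or https://",
            value,
        )
    return None
```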
brooklyn!	218452b050	fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (#43149 ) * fix(state.db): recover from malformed sqlite_master so hidden sessions reappear The corruption class behind "Desktop/Dashboard show no sessions while hundreds of session files sit on disk" is a malformed sqlite_master — most often a duplicate object row, e.g. two CREATE VIRTUAL TABLE messages_fts entries — surfacing as: sqlite3.DatabaseError: malformed database schema (messages_fts) - table messages_fts already exists SQLite parses the whole schema while preparing the FIRST statement on a connection, so on this class every statement fails before it runs: PRAGMA journal_mode (which is where SessionDB.__init__ actually trips, in apply_wal_with_fallback, BEFORE _init_schema), PRAGMA integrity_check, and even DROP TABLE. The only operations that still work are PRAGMA writable_schema=ON plus direct sqlite_master surgery. A plain FTS-index rebuild at the _init_schema layer therefore cannot reach or fix this; the canonical sessions/messages rows are intact — only the derived schema is broken. Add a dedicated recovery that operates where the failure actually happens: - hermes_state.repair_state_db_schema(): backs up the raw file first, then a least-destructive ladder — (1) de-duplicate sqlite_master keeping the lowest rowid per object (preserves the existing FTS index), escalating to (2) drop every messages_fts* schema object + VACUUM and let the next open rebuild the FTS index from messages. sessions/messages are never modified. Plus is_malformed_db_error() to discriminate this class. - SessionDB.__init__ auto-heals: on a malformed-schema open error it repairs once (process-guarded against loops / concurrent web_server opens) and reopens, so Desktop/Dashboard recover on their own instead of silently showing "no sessions". - hermes doctor --fix detects the malformed class and repairs it (reporting the recovered session count + backup name). 
- hermes sessions repair [--check-only] [--no-backup] runs on the raw file path, since SessionDB() itself cannot open a malformed DB. Supersedes #32589 and #33869: both targeted FTS corruption but gated their repair behind statements (integrity_check / SELECT / DROP TABLE) that themselves fail on this class, and neither addressed the apply_wal_with_fallback open-time failure. Credit preserved via Co-authored-by. Closes #33865. Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com> Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com> * test(state.db): cover strat-B escalation + unrepairable safe-fail paths --------- Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com> Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>	2026-06-09 18:49:08 -05:00
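A discriminator for this corruption class could be as small as the sketch below. It assumes the check keys on the "malformed database schema" text quoted in the error above; the real is_malformed_db_error() may test more than that.

```python
import sqlite3


def is_malformed_db_error(exc: BaseException) -> bool:
    """True when exc looks like the malformed-sqlite_master class.

    Assumption: matching on the message text is sufficient for this sketch.
    """
    return (
        isinstance(exc, sqlite3.DatabaseError)
        and "malformed database schema" in str(exc).lower()
    )
```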
Teknium	57c6714995	fix(models): keep curated Anthropic aliases in /model picker (#43103 ) The Anthropic picker returned the live /v1/models dump verbatim whenever credentials were configured. Anthropic's API lags newly-routed curated aliases (e.g. claude-fable-5, reachable on Anthropic before the models endpoint enumerates it), so the curated entry vanished from the picker. Merge curated _PROVIDER_MODELS["anthropic"] with the live catalog — curated first, live-only appended, deduped — mirroring the OpenAI curated-merge path. Live failure / no creds falls back to curated verbatim.	2026-06-09 14:45:19 -07:00
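The curated-first merge described above amounts to an order-preserving de-duplication. A minimal sketch, with `merge_model_lists` as a hypothetical name:

```python
from typing import Iterable, List, Optional


def merge_model_lists(curated: Iterable[str], live: Optional[Iterable[str]]) -> List[str]:
    """Curated entries first, then live-only entries, without duplicates.

    A failed or unavailable live fetch (None or empty) returns the curated
    list unchanged.
    """
    curated = list(curated)
    if not live:
        return curated
    merged: List[str] = []
    seen = set()
    for model in curated + list(live):
        if model not in seen:
            seen.add(model)
            merged.append(model)
    return merged
```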
emozilla	d7886da08c	add Fable 5 to model list for Anthropic provider	2026-06-09 15:33:42 -04:00
brooklyn!	ba44de06da	fix(install): self-heal a stuck Electron download (salvage of #42894 ) (#42998 ) * fix(install): self-heal a stuck Electron download on the desktop build The desktop build downloads Electron (~114MB) from GitHub. A corrupt cached zip, or a blocked/throttled GitHub release host (the repeating "retrying" log), hard-failed the install — and install.sh had no recovery at all while install.ps1 / `hermes desktop` only purged the cache. All three build paths now escalate on a failed `npm run pack`: GitHub → purge corrupt electron-.zip + stale -unpacked and retry → one retry via a public Electron mirror (npmmirror.com). @electron/get SHASUM-verifies the download, and a user-pinned ELECTRON_MIRROR is always respected (never overridden). Adds a bash clear_electron_build_cache()/_desktop_pack() to mirror the existing PowerShell/Python helpers. * test(install): cover the Electron mirror fallback Verify `hermes desktop` falls back to a mirror when the cache purge finds nothing, and that a user-pinned ELECTRON_MIRROR is respected (no extra attempt, not overridden). * docs(desktop): troubleshoot a stuck Electron download Document the automatic cache-purge + mirror fallback, how to pin your own ELECTRON_MIRROR, and how to clear a corrupt cached zip by hand. * docs(install): correct the Electron mirror trust framing The mirror-fallback comments and the desktop troubleshooting doc implied `@electron/get`'s SHASUM check makes the npmmirror.com download safe against tampering. It doesn't: the SHASUMS256.txt is fetched from the same mirror, so the check guards against a corrupt/partial download, not a compromised mirror. Reframe all four surfaces (install.sh, install.ps1, `hermes desktop`, and the docs) to state the trust trade-off honestly — npmmirror.com is the de-facto Electron community mirror, we only fall back to it after the canonical GitHub download fails, and a user-pinned ELECTRON_MIRROR is never overridden. No behavior change. 
--------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>	2026-06-09 18:19:14 +00:00
Teknium	f6f573ebaa	feat(plugins): install from a subdirectory within a repo (#42963 ) Support installing a plugin that lives in a subdirectory of a larger repo (docs/tests at root, plugin in a subdir) without forcing a dedicated single-plugin repo. Identifier syntax: owner/repo/path/to/plugin (shorthand + subpath) <url>.git/path/to/plugin (.git boundary on GitHub-style URLs) <url>#path/to/plugin (explicit fragment, any scheme) _resolve_git_url now returns (git_url, subdir); _install_plugin_core reads the manifest from and moves only the subdir, so root-level docs and tests no longer leak into ~/.hermes/plugins. _resolve_subdir_within guards against path traversal, missing dirs, and non-directories. Both the CLI (hermes plugins install) and the dashboard install endpoint inherit this for free since they share _install_plugin_core. Dashboard install hint + placeholder updated to advertise the subdir syntax. Co-authored-by: Austin Pickett <pickett.austin@gmail.com>	2026-06-09 13:42:51 -04:00
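The three identifier forms and the subdirectory guard could be sketched as below. The function names are hypothetical, and expanding the owner/repo shorthand to a github.com URL is an assumption made for the example.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_identifier(identifier: str) -> Tuple[str, Optional[str]]:
    """Split a plugin identifier into (git_url, subdir)."""
    if "#" in identifier:  # <url>#path/to/plugin
        url, _, sub = identifier.partition("#")
        return url, sub.strip("/") or None
    if "://" in identifier or identifier.startswith("git@"):
        head, sep, sub = identifier.partition(".git/")  # <url>.git/path/to/plugin
        if sep:
            return head + ".git", sub.strip("/") or None
        return identifier, None
    parts = identifier.strip("/").split("/")  # owner/repo[/path/to/plugin]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo, got {identifier!r}")
    # Assumption: shorthand resolves against GitHub.
    url = f"https://github.com/{parts[0]}/{parts[1]}.git"
    return url, "/".join(parts[2:]) or None


def resolve_subdir_within(root: Path, subdir: str) -> Path:
    """Resolve subdir under root, rejecting traversal and non-directories."""
    root = root.resolve()
    target = (root / subdir).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{subdir!r} escapes the repository root")
    if not target.is_dir():
        raise ValueError(f"{subdir!r} is missing or not a directory")
    return target
```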
Teknium	ff9c110d5a	feat(models): add anthropic/claude-fable-5 to openrouter + nous curated lists (#42979) Adds the model above claude-opus-4.8 in both the OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] curated picker lists used by /model and `hermes model`. Regenerated website/static/api/model-catalog.json to match.	2026-06-09 10:20:37 -07:00
Gille	c6dc2fcd21	fix(desktop): release profile backends before delete (#42613)	2026-06-09 10:52:02 -05:00
helix4u	f8adefdebf	fix(tui): apply terminal backend config before launch	2026-06-09 00:31:27 -07:00
Teknium	e687292eb4	feat(models): persist Nous recommended-models to disk; fall back on Portal failure (#42628 ) The Portal's /api/nous/recommended-models endpoint is the source of truth for which models are free/paid right now, but its result was cached in-process only. When the live fetch failed (network, parse, non-2xx), the function returned {} and the model picker silently dropped the free/paid recommendations — free models would vanish with no indication anything went wrong. Add a per-base disk cache at $HERMES_HOME/cache/nous_recommended_cache.json: a successful live fetch is persisted as last-known-good, and a failed fetch with an empty in-process cache falls back to the disk copy instead of {}. Self-heals on the next successful fetch. With no disk copy, still degrades to {} (callers already handle that). Keyed by portal base URL so staging/prod don't collide. E2E: live fetch writes disk; simulated Portal failure returns the cached free models from disk; no-disk + failure returns {}.	2026-06-09 00:03:43 -07:00
Teknium	c4066091ca	feat(models): add laguna-m.1 + nemotron-3-ultra to curated OpenRouter list (#42629) Two new free-tier slugs surfaced in /model and `hermes model`. owl-alpha was already present. Regenerated website/static/api/model-catalog.json to keep the manifest sync test green.	2026-06-08 23:05:35 -07:00
Teknium	54318c65b0	feat(models): seed model-catalog disk cache from checkout on update (#42614 ) hermes update pulls the latest repo, so the freshly-pulled website/static/api/model-catalog.json is already the newest catalog. Copy it straight over ~/.hermes/cache/model_catalog.json instead of relying on a network fetch (which can be Vercel bot-gated or hit a Portal hiccup and silently degrade the picker to a stale/short list). Adds seed_cache_from_checkout() in model_catalog.py (read shipped manifest, validate, atomic write via _write_disk_cache, reset in-process cache) and calls it from both update paths in main.py: _cmd_update_impl (git pull) and _update_via_zip (Docker/no-git). Non-fatal on missing/malformed/invalid files — the normal network refresh still applies on next picker open.	2026-06-08 22:31:06 -07:00
Ben Barclay	a46462ec65	fix(cli): persist custom --portal-url to .env on dashboard register (#42435 ) * fix(cli): persist custom --portal-url to .env on dashboard register `hermes dashboard register --portal-url <url>` resolved the custom portal for the registration request but only persisted it to .env when the var was absent AND non-default. So a user who re-registered against a different portal (e.g. switching preview deploys) silently kept the stale HERMES_DASHBOARD_PORTAL_URL, and an explicit request for the production portal was never written at all. Track whether a custom portal was explicitly supplied (--portal-url flag or HERMES_DASHBOARD_PORTAL_URL env), separately from the resolved value: - explicit custom URL -> always persist (update in place via save_env_value, which overwrites the matching key rather than appending a duplicate), even when it equals the production default; no-op when it already matches. - no custom URL supplied -> unchanged conservative behaviour: only write an inferred portal when absent and non-default; never alter an existing entry unexpectedly. save_env_value already preserves other lines/comments and dedups in place; this only changes the decision of when to call it. Adds TestCustomPortalPersistence covering all four cases. Co-authored-by: Hermes Agent <agent@nousresearch.com> * feat(cli): persist dashboard public URL from --redirect-uri on register When the user registers a publicly-exposed dashboard with --redirect-uri (the full OAuth callback, e.g. https://hermes.example.com/auth/callback), derive its origin and persist it as HERMES_DASHBOARD_PUBLIC_URL — the env var the dashboard auth layer actually consumes at serve time. dashboard_auth/routes._redirect_uri reconstructs the callback as HERMES_DASHBOARD_PUBLIC_URL + "/auth/callback" (verbatim), and dashboard_auth/prefix.resolve_public_url reads that var (then config.yaml dashboard.public_url) to decide the public origin. 
Previously --redirect-uri was sent to the portal at registration but never persisted, so the operator had to set HERMES_DASHBOARD_PUBLIC_URL by hand for the login gate to engage and the callback to round-trip. We now wire it automatically. Persist the ORIGIN (scheme://host[:port]), not the full callback path — persisting the raw redirect would double the path when the runtime appends /auth/callback. Mirrors the portal-url persistence semantics already in this PR: always write an explicitly-derived value (updating in place, no duplicate), no-op when it already matches, never written on a localhost-only install (no --redirect-uri), and skipped for a non-http(s)/malformed redirect. Verified end-to-end: cmd_dashboard_register writes the origin to .env, then resolve_public_url() reads it back and public_url + /auth/callback reconstructs exactly the originally-supplied --redirect-uri. Adds TestPublicUrlPersistence (8 cases) incl. origin-derivation, port preservation, update-in-place, no-op, no-flag, non-http skip, and both-portal-and-public-url-persisted. Co-authored-by: Hermes Agent <agent@nousresearch.com> --------- Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-09 13:56:33 +10:00
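Deriving the origin from the redirect URI is a short urllib.parse exercise. A minimal sketch, with `origin_from_redirect_uri` as a hypothetical helper name:

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_from_redirect_uri(redirect_uri: str) -> Optional[str]:
    """Return scheme://host[:port], or None for a non-http(s)/malformed URI."""
    parts = urlsplit(redirect_uri.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Drop the path so the runtime can append /auth/callback exactly once.
    return f"{parts.scheme}://{parts.netloc}"
```

Appending "/auth/callback" to the result for https://hermes.example.com/auth/callback reproduces the original URI, which is the round trip the commit verifies.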
Ben Barclay	52ae9d9f02	feat(dashboard): make `hermes dashboard register` idempotent (#42455 ) Re-running `hermes dashboard register` now updates the existing dashboard record in nous-account-service instead of creating a duplicate. The stable key is the client_id this install already persisted in HERMES_DASHBOARD_OAUTH_CLIENT_ID on a prior run: - No stored client_id -> first registration -> create a fresh client with an auto-generated name (unchanged behavior). - Stored client_id present -> re-send it as `client_id` so the portal updates that row in place. Without an explicit --name, the name is omitted so the portal-stored name isn't churned to a new random value on every re-run. - Prints "Updated dashboard" vs "Registered dashboard" based on whether the portal echoed back the same client_id. A stale/deleted id safely falls through to a fresh create server-side. Requires the matching nous-account-service change (POST /api/oauth/self-hosted-client accepting an optional client_id + optional name). Tests: 7 new TestIdempotentRerun cases (key sent, name preserved/overridden, Updated message, persisted id, stale-id fall-through, blank-id first-run); existing create-path tests unchanged (23 pass).	2026-06-09 13:19:35 +10:00
teknium1	aa424e51ac	refactor(doctor): fold custom-provider vendor-slug check into one predicate Collapse the bare-"custom" allowlist entry and the custom:<name> guard into a single provider_accepts_vendor_slug predicate so the slug-warning suppression reads as one rule instead of two scattered conditions. No behavior change.	2026-06-08 15:53:09 -07:00
helix4u	732ababa1a	fix(doctor): allow vendor slugs for named custom providers	2026-06-08 15:53:09 -07:00
Robin Fernandes	639c1e3636	feat(sessions): add optional max session cap	2026-06-08 15:12:12 -07:00
Brooklyn Nicholson	e88116256c	fix(update): scope git fetch to target branch A bare `git fetch origin` (and `git fetch upstream`) pulls every ref. The repo carries thousands of auto-generated branches, so on any non-single-branch checkout the installer's update path and `hermes update` spend minutes downloading the full branch list — long enough to stall the desktop installer or trip the follow-up `git pull --ff-only`. Scope every update-path fetch to the branch we actually compare/merge against: - scripts/install.sh: collapse the remote to single-branch and fetch only $BRANCH on the "existing install, updating" path. - hermes_cli/main.py: fetch the resolved branch in the apply path, the --check path (upstream + origin), and the fork upstream-sync. Tracking-ref updates still happen via git's opportunistic refspec, so the later origin/<branch> rev-parse/rev-list checks are unaffected. Tests assert the apply-path fetch is branch-scoped and never bare.	2026-06-08 15:24:31 -04:00
teknium1	c78b3e1d3c	fix(auth): add Codex OAuth accounts as distinct pool entries hermes auth add openai-codex now creates an independent manual:device_code pool entry per account instead of routing through the singleton _save_codex_tokens save path, which collapsed every added account into the latest login (the second add overwrote the first account's singleton-mirrored device_code entry). This is the add-path half of #39236; PR #39243 (already on this branch) fixes the re-auth half. manual:device_code entries refresh from their own token pair (_sync_codex_entry_from_auth_store only adopts the singleton for source=="device_code"), so they need no providers.openai-codex shadow. Adding the first credential marks openai-codex active (the singleton path did this implicitly) so the setup wizard's get_active_provider() check still passes; subsequent adds leave the active provider untouched. Adds SOURCE_MANUAL_DEVICE_CODE constant and a regression test that two distinct accounts keep distinct token pairs. Updates two existing add tests to the pool-only behavior. Co-authored-by: glesperance <info@glesperance.com>	2026-06-08 11:57:03 -07:00
Ted Malone	761b744abb	fix(auth): preserve independent Codex pool entries on re-auth (#39236 ) The #33538 fix refreshed every credential_pool entry with source "manual:device_code" on every Codex OAuth re-auth, on the assumption that such entries were always legacy aliases of the singleton from the #33000 workaround era. That assumption is no longer true: `hermes auth add openai-codex` also produces "manual:device_code" entries for independent ChatGPT accounts, and the broad sync silently clobbered them with the latest-authenticated token pair (labels preserved, token material overwritten, status / quota readings then lie). Narrow the sync: refresh a "manual:device_code" entry only when its existing access_token matches the previous singleton access_token (true legacy alias). Entries with distinct token material represent independent accounts and are now left alone. Error markers are cleared only on entries actually rewritten, so an independent account's own 429 / 401 state survives a re-auth that targeted a different account. Tests: * New: independent acctB/acctC are not overwritten when acctA re-auths. * New: legacy singleton-alias still refreshed (preserves #33538). * New: missing previous singleton state handled (no crash, no false alias match). * New: access_token-only alias match (legacy schema without refresh_token still recognized). * New: error markers cleared only on entries actually refreshed. * Updated: existing manual-device-code sync test now covers both the legacy-alias path AND the independent-account path in one fixture. Behaviour change is zero for users with a single Codex account and zero for users whose only "manual:device_code" entry is the legacy alias of the singleton. Users with multiple independent Codex accounts added via `hermes auth add` now keep their distinct token material across re-auths. Local: 29 passed in tests/hermes_cli/test_auth_codex_provider.py, no new failures in tests/hermes_cli/ vs upstream/main baseline. Fixes #39236.	
2026-06-08 11:57:03 -07:00
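The narrowed sync rule can be sketched as below. The entry fields (`access_token`, `last_error`) and the function name are hypothetical stand-ins for whatever the credential pool actually stores.

```python
from typing import Dict, List, Optional


def refresh_legacy_aliases(
    pool: List[Dict[str, str]],
    previous_singleton_token: Optional[str],
    new_tokens: Dict[str, str],
) -> int:
    """Refresh only manual:device_code entries that alias the old singleton.

    Returns the number of entries rewritten. Entries with distinct token
    material are independent accounts and are left untouched.
    """
    refreshed = 0
    for entry in pool:
        if entry.get("source") != "manual:device_code":
            continue
        # No previous singleton state means nothing can be a true alias.
        if not previous_singleton_token:
            continue
        if entry.get("access_token") != previous_singleton_token:
            continue
        entry.update(new_tokens)
        entry.pop("last_error", None)  # clear error markers only on rewritten entries
        refreshed += 1
    return refreshed
```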
xxxigm	96fd9d4979	fix(desktop): stop running Hermes.exe locking win-unpacked before Windows pack (#42100 ) * fix(desktop): stop running app locking win-unpacked before pack On Windows a running Hermes.exe keeps an exclusive lock on release/win-unpacked/Hermes.exe, so electron-builder's pack cannot replace it and dies with "remove ...\Hermes.exe: Access is denied" / ERR_ELECTRON_BUILDER_CANNOT_EXECUTE (before-pack hits the same EPERM cleaning the dir, and the cache-purge retry repeats the failure since the lock is still held). Before building the packaged app, terminate any process whose executable lives inside this build's release/ tree so the rebuild -- including the installer's headless --update rebuild -- can replace the binary. Scope is narrow (only exes under release/), POSIX is a no-op (it can unlink a running binary), and the final error now points Windows users at the running-app cause. * test(desktop): cover the win-unpacked lock-breaker helper Verify _stop_desktop_processes_locking_build is a no-op off-Windows, terminates only processes whose exe lives under release/ (sparing our own PID and unrelated installs), and short-circuits when no release dir exists.	2026-06-08 11:51:31 -07:00
Teknium	abcf996b1f	feat(windows): enable dashboard /chat tab via ConPTY (win_pty_bridge) + tests (#42251 ) * feat(windows): enable dashboard chat tab via ConPTY (win_pty_bridge) Add hermes_cli/win_pty_bridge.py — a pywinpty-backed drop-in for PtyBridge with the same spawn/read/write/resize/close surface — and wire it into the web_server PTY import block so Windows picks it up instead of falling back to None. pywinpty is already a declared win32 dependency (pyproject.toml). The ConPTY read path runs inside run_in_executor so the event loop is never blocked. Spawn/read/write/terminate call shapes are taken directly from tools/process_registry.py which already exercises the same pywinpty version. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: remove WSL2-only caveat for dashboard chat tab The chat pane now works on native Windows via the ConPTY bridge added in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(windows): cover ConPTY bridge + web_server platform-branched import Companion to the bridge added in the previous commits. Verified live on native Windows 11 (pywinpty 2.0.15) against `hermes dashboard`'s `/api/pty` WebSocket: the spawned `hermes --tui` (node entry.js) renders through ConPTY, resize escapes reach `setwinsize`, and closing the WS reaps both the node child and the pywinpty agent with zero orphans. tests/hermes_cli/test_win_pty_bridge.py Mirrors the layout of the existing POSIX test_pty_bridge.py: spawn/io/resize/close/env coverage against cmd.exe and python -c, plus the cross-platform fallback surface (PtyUnavailableError, the off-Windows `spawn -> raises PtyUnavailableError` guard, and the load-bearing _clamp() helper that protects setwinsize from garbage winsize values out of xterm.js). 
tests/hermes_cli/test_web_server_pty_import.py Asserts that web_server.PtyBridge resolves to WinPtyBridge on win32 and to the POSIX PtyBridge on POSIX, that PtyUnavailableError is the matching class on each side (so isinstance checks in /api/pty's spawn fallback path work), and a source-text check that pins the platform-branched import shape so a future refactor can't quietly collapse it back to a POSIX-only import. scripts/release.py AUTHOR_MAP entries so CI release-note generation can resolve both authors' plain (non-noreply) emails to their GitHub logins. Co-Authored-By: JoelJJohnson <josephjohnson.joel@gmail.com> Co-Authored-By: Nea74 <andreas@schwarz-ketsch.de> --------- Co-authored-by: JoelJJohnson <josephjohnson.joel@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Nea74 <andreas@schwarz-ketsch.de>	2026-06-08 11:32:43 -07:00
BarnacleBoy	550b72dd87	fix(cli): gate tool-rendering paths with tool_progress_mode, not quiet_mode quiet_mode was being used to suppress tool-result display when tool_progress_mode was 'off'. But quiet_mode also gates operational status messages, so users with /verbose + tool-progress off lost all status output. Adds a dedicated tool_progress_mode attribute to AIAgent; the tool_executor result-rendering path gates on tool_progress_mode != 'off'. The CLI passes its tool_progress_mode through agent setup and the tool-progress cycle command syncs it onto the live agent. Fixes #33860.	2026-06-08 11:29:53 -07:00
firefly	ae94ed1728	fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) Salvaged from #35626 (banditburai) and re-scoped after maintainers landed the parent-death watchdog (slash_worker.py) and PTY process-group teardown (pty_bridge.py) directly on main. Those pieces are intentionally NOT included here — this carries only what is still missing: - C1 disconnect reap: ws.py's `finally` only re-pointed the dead transport at stdio. `_close_sessions_for_transport` now reaps `close_on_disconnect` sessions and schedules the grace-reap for the rest, offloaded via `asyncio.to_thread` so the blocking worker.close() + DB write never stalls the uvicorn loop. - C2 create/close orphan race: `_attach_worker` stores the worker iff `_sessions.get(sid) is session` under the lock (else closes it), applied at every spawn site incl. the post-turn `_restart_slash_worker`. - Single idempotent teardown funnel: session.close, WS disconnect, the generous-TTL idle reaper, shutdown, and the WS grace-reap all reach `_close_session_by_id` → `_teardown_session`; `_finalized`/`_closed` flags make concurrent/double teardown a no-op. `_sessions_lock` upgraded to RLock. - uvicorn `ws_ping_interval/timeout=20s` so a half-open socket (reverse-proxy 524) becomes a `WebSocketDisconnect` and the C1 path runs. Plus two review-driven hardening fixes (mine): - `session.active_list` now skips `_finalized` sessions so the footer "N sessions" count reflects attachable sessions instead of only ever growing until restart (#38950). Keys on `_finalized` only, NOT the stdio sentinel, so a standalone `hermes --tui` session stays visible. - `_schedule_ws_orphan_reap._reap` pops via `_close_session_by_id` (under `_sessions_lock`) instead of `_sessions.pop` under the unrelated `_session_resume_lock` (#39591); the resume_lock now only guards the orphan re-check against `session.resume`. 
- Float env knobs (`HERMES_SLASH_WATCHDOG_*`, `HERMES_TUI_SESSION_TTL_S`) parse with a fallback helper so a malformed value can't crash the worker at import. Fixes #32377 Fixes #38950 Addresses #22855 Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com> Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-08 10:02:05 -07:00
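The last bullet of the commit above (float env knobs parsed through a fallback helper) could be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper name `env_float` and the default of 3600 seconds are hypothetical; only the variable name `HERMES_TUI_SESSION_TTL_S` comes from the commit message.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back to `default`.

    Hypothetical helper: a missing or malformed value returns the default
    instead of raising, so a bad setting cannot crash the process at import.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # Malformed value such as "abc" or "": keep the default.
        return default


# Assumed default, for illustration only.
SESSION_TTL_S = env_float("HERMES_TUI_SESSION_TTL_S", 3600.0)
```

Evaluating the knob once at module import keeps later reads cheap; the try/except is what keeps that import-time evaluation from raising.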
Teknium	9c9d9113a8	fix(auth): auto-detect OpenRouter credential from the pool, not just env (#42263) resolve_provider() auto-detection only checked OPENROUTER_API_KEY/OPENAI_API_KEY env vars, never the credential pool. A key added via `hermes auth add openrouter` (manual pool entry, no env var) was invisible: the provider failed to resolve or resolved with an empty api_key, so requests went out with no Authorization header and OpenRouter returned "HTTP 401: Missing Authentication header" while `hermes auth list` showed the credential. Closes #42130. - auth.py: check load_pool("openrouter").has_credentials() after the env check - dump.py: `debug share` shows 'openrouter set (auth pool)' instead of the misleading 'not set' when the key lives in the pool - add regression tests (pool credential auto-detects; empty pool still raises)	2026-06-08 10:01:47 -07:00
teknium1	a77efada5f	refactor(cli): extract 18 model-flow wizard functions into model_setup_flows (god-file Phase 2) Lift the 18 _model_flow_* provider-setup wizard functions out of hermes_cli/main.py into hermes_cli/model_setup_flows.py. Behavior-neutral; main.py 14050 -> 11479 LOC. select_provider_and_model (the dispatcher) STAYS in main.py and re-imports the flows via an explicit 'from hermes_cli.model_setup_flows import (...)' block, so both its bare-name calls and existing test monkeypatches targeting hermes_cli.main._model_flow_* keep resolving against main's namespace unchanged. Imports: 3 neutral deps (argparse, os, subprocess) at the module top; the 14 main.py-internal helpers the flows call (_prompt_api_key, _save_custom_provider, the reasoning-effort/stepfun/qwen helpers, _run_anthropic_oauth_flow, ...) are lazy-imported per-flow (from hermes_cli.main import ...) so the new module never imports main at module scope -> no import cycle. Repointed one source-inspection change-detector (test_setup_ollama_cloud_force_refresh) to read the module the ollama-cloud branch moved to. Validation: 6563/6563 hermes_cli tests pass; live flow-dispatch probe confirms the lazy main-internal imports resolve at runtime.	2026-06-08 09:42:44 -07:00
teknium1	094aa85c37	refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) Lift the 5 agent-construction/session-resume methods out of HermesCLI into hermes_cli/cli_agent_setup_mixin.py:CLIAgentSetupMixin. Behavior-neutral; cli.py 14139 -> 13492 LOC. Methods moved (~647 LOC): _ensure_runtime_credentials, _resolve_turn_agent_config, _init_agent, _preload_resumed_session, _display_resumed_history. All self.* calls resolve unchanged via the MRO (HermesCLI(CLIAgentSetupMixin, CLICommandsMixin)). Import split (same recipe as #41942): 2 neutral deps (sys, _escape) imported at the mixin module top; 12 cli.py-internal helpers/constants (AIAgent, ChatConsole, CLI_CONFIG, _cprint, _DIM, _RST, _accent_hex, ...) imported lazily per-method (from cli import ...) so the mixin never imports cli at module scope -> no cycle. Repointed one source-inspection change-detector (test_callable_api_key.py) to read the mixin file where the method now lives.	2026-06-08 09:41:34 -07:00
yoniebans	87ac7cac13	fix(dashboard): log update changelog against origin/main, not @{upstream} The behind-count (banner._check_via_local_git) measures HEAD..origin/main, but _recent_upstream_commits logged HEAD..@{upstream}. On a feature-branch checkout @{upstream} is the branch's own tip (0 commits), so the changelog came back empty while behind>0 — the overlay then showed generic filler instead of what changed. Pin the commit range to origin/main so count and changelog agree. Verified against a checkout 11 behind origin/main: now returns 11 commits.	2026-06-08 08:58:26 -07:00
yoniebans	9e360681f8	feat(dashboard): return recent commits from /api/hermes/update/check Add a best-effort `commits` list (sha/summary/author/at) to the update-check response for git/pip installs that are behind upstream, so the desktop's remote update overlay can show what's changed before applying. Additive and non-breaking: existing consumers (legacy dashboard, tests using subset assertions) ignore the new field. Leaves the shared check_for_updates() int contract untouched — commits come from a separate best-effort git call.	2026-06-08 08:58:26 -07:00
teknium1	cb13723f53	fix(pty-bridge): mark os.killpg/getpgid windows-footgun-ok (POSIX-only module)	2026-06-08 07:03:12 -07:00
paulb26	b31c6c33b2	fix(pty-bridge): terminate PTY process groups on teardown	2026-06-08 07:03:12 -07:00
kshitij	b99c6c4277	Merge #42076: nested category plugin discovery + alias-normalized enable/disable (#41066) Lands the complete nested category plugin fix: - Discovery in `hermes plugins list` (from @islam666's #41076, carried in this PR) - Alias-normalized enable/disable mutation path so nested plugins can be toggled - Fixes the #41076 base breakages (web_server 6-tuple unpack + stale test fixtures) Co-authored work: discovery by @islam666 (#41076). Closes #41066.	2026-06-08 05:47:27 -07:00
kshitijk4poor	2b89afec79	fix(plugins): alias-normalize enable/disable for nested category plugins (follow-up to #41076) #41076 makes `hermes plugins list` discover nested category plugins (e.g. observability/nemo_relay). This adds the missing enable/disable mutation path so those plugins can actually be toggled, and fixes two incomplete-update breakages on the #41076 base. Before: `hermes plugins enable nemo_relay` -> "Plugin 'nemo_relay' is not installed or bundled." (exit 1), because cmd_enable/cmd_disable went through _plugin_exists(), which only checked top-level plugins/<name>/. Changes: - Add _resolve_plugin_key(): resolve a bare manifest/leaf name OR a full path-derived key (observability/nemo_relay) to the canonical key the runtime loader gates on, reusing #41076's _discover_all_plugins(). A bare leaf name ambiguous across two categories resolves to None rather than silently picking one. - cmd_enable/cmd_disable resolve first, persist the canonical key, and drop any stale legacy bare-name alias so the enabled/disabled lists can't drift into a contradictory state. _plugin_exists delegates to the same resolver. - Fix #41076 base breakages: _discover_all_plugins now returns 6-tuples, but web_server._merged_plugins_hub() still unpacked 5 (ValueError on the dashboard plugins-hub endpoint) and several test_plugins_cmd_list.py fixtures were still 5-tuples. Both updated; the hub status check is now key-aware. Verified e2e on the real CLI + runtime loader (isolated HERMES_HOME): `hermes plugins enable nemo_relay` writes observability/nemo_relay to config.yaml and the loader then loads it (enabled=True, error=None); a stale bare-name alias is cleared on disable; the dashboard _merged_plugins_hub() runs without crashing. Adds resolution + enable/disable tests; full tests/hermes_cli/test_plugins_cmd* + web_server plugin tests green. Follow-up to #41076 (#41066). Branched from that PR's head.	2026-06-08 17:57:37 +05:30
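The resolution rule described in the commit above (full key wins, a unique leaf name resolves, an ambiguous leaf name yields None) might be sketched like this. The function below is a hypothetical stand-in that takes a plain list of keys; the real `_resolve_plugin_key()` is described as reusing `_discover_all_plugins()`, whose signature is not shown here.

```python
from typing import Iterable, Optional


def resolve_plugin_key(name: str, known_keys: Iterable[str]) -> Optional[str]:
    """Map a bare leaf name or a full path-derived key to one canonical key.

    Hypothetical sketch: `known_keys` stands in for whatever plugin
    discovery returns (e.g. "observability/nemo_relay").
    """
    keys = list(known_keys)
    # A full key that matches exactly is already canonical.
    if name in keys:
        return name
    # Otherwise compare against the last path component of each key.
    matches = [k for k in keys if k.rsplit("/", 1)[-1] == name]
    # Exactly one match resolves; zero or several resolve to None
    # rather than silently picking one.
    return matches[0] if len(matches) == 1 else None


keys = ["observability/nemo_relay", "tools/foo", "extras/foo"]
print(resolve_plugin_key("nemo_relay", keys))
print(resolve_plugin_key("foo", keys))
```

Returning None for the ambiguous case pushes the decision back to the caller, which can then ask for the full `category/name` key.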
teknium1	0904bc7ea2	refactor(cli): extract 32 slash-command handlers into CLICommandsMixin (god-file Phase 4) Lift the `_handle_*_command` cluster (2,077 LOC) out of HermesCLI into hermes_cli/cli_commands_mixin.py; HermesCLI now inherits CLICommandsMixin so every self.<handler> call resolves unchanged via the MRO. Behavior-neutral. Import discipline mirrors gateway/slash_commands.py (PR #41886): neutral deps imported at the mixin module top level; cli.py-internal helpers/constants (_cprint, _ACCENT, save_config_value, ...) imported lazily inside each handler via 'from cli import ...' so the mixin never imports cli at module scope. cli.py 16215 -> 14139 LOC. One test mock repointed (cli.is_browser_debug_ready -> hermes_cli.cli_commands_mixin.is_browser_debug_ready).	2026-06-08 02:13:07 -07:00
floory	15c99b437f	fix(cli): set PYTHON env for node-gyp native builds on NixOS (#40690) * fix(cli): set PYTHON env for node-gyp native builds on NixOS node-gyp (triggered by node-pty during npm ci) looks for python3 on PATH, which fails on NixOS because python3 lives in the nix store and is not on the system PATH. Add _nixos_build_env() — a two-tier helper that detects NixOS and: 1. Fast path: hermes venv python3 (~0s) 2. Fallback: nix-shell which python3 (~2-5s) Wire it into _run_npm_install_deterministic() via a new env= parameter, then pass it through cmd_gui() and _update_node_dependencies(). Non-NixOS systems: _nixos_build_env() returns None, behavior unchanged. * fix(cli): merge _nixos_build_env() with os.environ, fix NixOS detection, add explicit return None - Critical fix: both Tier 1 (venv) and Tier 2 (nix-shell) now return {**os.environ, "PYTHON": ...} instead of {"PYTHON": ...} — subprocess.run with env= replaces the entire environment, so the old code wiped PATH and broke npm/node on NixOS entirely. - Uses re.search(r"^ID=nixos$", ...) for anchored NixOS detection instead of unanchored substring match (could match ID_LIKE=...nixos). - Removes redundant Path.exists() guard before read_text(); just catches OSError (one filesystem read instead of two). - Adds explicit return None at end of function for type-hint consistency.	2026-06-08 13:57:37 +05:30
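Two points from the commit above lend themselves to a short illustration: anchored detection of `ID=nixos`, and merging `PYTHON` into a copy of `os.environ` instead of passing a one-key dict. This is a sketch under assumptions: the function names are hypothetical, and the `re.MULTILINE` flag is assumed because the commit message elides the remaining `re.search` arguments.

```python
import os
import re


def is_nixos(os_release_text: str) -> bool:
    """Return True only if a whole line reads exactly ID=nixos.

    re.MULTILINE (assumed) makes ^ and $ match at line boundaries, so
    lines such as VARIANT_ID=nixos or ID_LIKE=nixos do not count.
    """
    return re.search(r"^ID=nixos$", os_release_text, re.MULTILINE) is not None


def build_env(python_path: str) -> dict:
    """Copy the current environment and add PYTHON on top.

    subprocess.run(..., env=...) replaces the child's whole environment,
    so passing only {"PYTHON": ...} would drop PATH and everything else.
    """
    return {**os.environ, "PYTHON": python_path}


env = build_env("/path/to/venv/bin/python3")  # placeholder path
print("PYTHON" in env, len(env) > 1)
```

The resulting dict can be passed as `env=` to `subprocess.run`, which then sees both the inherited variables and the added `PYTHON`.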
Teknium	4d18717b6c	fix(gateway): drop --replace from systemd unit templates (#41892) Under systemd's Restart=always, --replace turns every restart into a self-kill loop: the new instance reads gateway.pid, kills the previous process, writes its own PID, and on the next restart the cycle repeats. A process supervisor owns the lifecycle — --replace is for manual one-shot takeovers and fights the supervisor. Remove --replace from both the system-level and user-level systemd ExecStart lines. The --replace flag stays available for manual 'hermes gateway run --replace' and on the macOS launchd fallback path (#23387), which is a deliberate manual takeover, not a supervised unit. Also drop RestartMaxDelaySec / RestartSteps from the templates — they require systemd v255+ and are silently ignored on older versions. The _strip_optional_systemd_directives normalizer stays so existing installs whose on-disk unit still carries those directives aren't flagged as outdated. Credit: reported and diagnosed by @Skippy-the-Magnificent-one (PR #37145); reimplemented here under project authorship because the original commit was authored under a non-existent email.	2026-06-08 00:20:08 -07:00
konsisumer	3714caa1b9	fix(session): follow compression continuations for transcript reads	2026-06-07 23:57:20 -07:00
teknium1	1a626470ca	refactor(cli): promote 9 closure handlers to top-level + extract their parsers (god-file Phase 2 follow-up) Subcommands whose handler was a closure defined inside main() — memory, acp, tools, insights, skills, pairing, plugins, mcp, claw — have their handler promoted to a top-level function and their parser block extracted into hermes_cli/subcommands/<name>.py (build_<name>_parser, injected handler). These 9 had zero closure-over-main-locals, so promotion is a pure relocation. acp/mcp parser blocks use the shared add_accept_hooks_flag helper. main() 1798 -> 954 LOC (71% below the 3297 Phase-2 starting point); add_parser calls in main.py 89 -> 28. Deferred: sessions, computer-use, secrets handlers reference <name>_parser (for a no-subcommand print_help fallback) — left in place to avoid the _self_parser indirection; minority, low value. Behavior-neutral: all 9 subcommands' --help (incl nested subactions) byte-identical to pre-extraction (diff-verified). tests/hermes_cli/ 6519 passed / 0 failed; new test_subcommands_followup.py covers the 9 builders.	2026-06-07 22:56:23 -07:00
teknium1	568e127612	refactor(cli): extract 25 more subcommand parsers into hermes_cli/subcommands/ Batch extraction of every remaining subcommand whose handler is top-level and whose parser block is pure argparse: model, setup, postinstall, whatsapp, slack, login, logout, auth, status, webhook, hooks, doctor, security, dump, debug, backup, import, config, version, update, uninstall, dashboard, gui, logs, prompt-size. Each becomes hermes_cli/subcommands/<name>.py with build_<name>_parser() and an injected handler (no main import). dashboard also injects cmd_dashboard_register for its nested 'register' action. Behavior-neutral: all 25 subcommands' --help output (and nested subaction help) diff-verified byte-identical to pre-extraction. Two RawDescriptionHelpFormatter epilogs (debug, logs) needed their multi-line string interiors preserved at column 0 — caught by the --help diff, not compile. main() 3297 -> 1798 LOC across this PR; add_parser calls in main.py 179 -> 89. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new test_subcommands_batch.py smoke-tests all 25 builders + the dashboard two-handler case.	2026-06-07 22:18:14 -07:00
teknium1	4da45e8727	refactor(cli): extract profile + gateway/proxy parsers into hermes_cli/subcommands/ Follow-on to the cron extraction in the same Phase 2 PR. Same pattern: per-group build_<name>_parser() functions with injected handlers, no main import. - subcommands/profile.py: build_profile_parser (190-line block out of main()). - subcommands/gateway.py: build_gateway_parser (gateway + proxy, 238-line block; they shared one inline section). Imports argparse for SUPPRESS defaults. - main(): two more inline blocks become single builder calls. Behavior-neutral: 'profile [sub] --help' and 'gateway/proxy [sub] --help' byte-identical to pre-extraction (diff-verified). main() now 2723 LOC (was 3297 at Phase 2 start); add_parser calls in main.py 179 -> 141. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new builder unit tests cover subactions, aliases, dispatch, flags.	2026-06-07 22:18:14 -07:00
teknium1	b2e6053243	refactor(cli): extract hermes cron parser into hermes_cli/subcommands/ (god-file Phase 2) Phase 2 of the god-file decomposition plan. main()'s argparse tree is 179 inline add_parser calls in one 3,297-line function. This establishes the hermes_cli/subcommands/ package and extracts the first group (cron) as the proof-of-pattern: - hermes_cli/subcommands/_shared.py: shared parser helpers (add_accept_hooks_flag), re-exported from main.py for backwards compat. - hermes_cli/subcommands/cron.py: build_cron_parser(subparsers, cmd_cron=...). Handler injected so the module never imports main (cycle avoidance). - main()'s ~155-line inline cron block becomes one build_cron_parser() call. Behavior-neutral: 'hermes cron create --help' output is byte-identical to origin/main. main() 3297 -> 3143 LOC. Validation: tests/hermes_cli/ 6466 passed / 0 failed under per-file process isolation; new test_subcommands_cron.py covers subactions, aliases, options, no-agent tristate, injected dispatch, and --accept-hooks.	2026-06-07 22:18:14 -07:00
islam666	78e2101cd2	fix: reap zombie subprocesses in web_server action status and meet_bot cleanup - web_server.py: after proc.poll() returns a non-None exit code, call proc.wait() to reap the child and move the entry from _ACTION_PROCS to _ACTION_RESULTS. Previously .poll() alone left <defunct> zombies. - meet_bot.py: terminate and wait on the pcm_pump subprocess (paplay/ffmpeg) during the finally-block teardown. Previously leaked on every normal bot exit. - tests: add test_action_status_reaps_completed_process and test_action_status_ignores_wait_failure covering both the happy path and the wait()-raises-OSError edge case. Closes #38032	2026-06-07 21:50:57 -07:00
islam666	e53b74c394	fix(dist): stop USER_OWNED_EXCLUDE from filtering nested directories The copytree ignore lambda in _copy_dist_payload applied USER_OWNED_EXCLUDE recursively at every directory depth. This caused nested directories whose names matched exclude entries (bin, logs, cache, etc.) to be silently dropped during distribution install/update. Fix: only apply USER_OWNED_EXCLUDE filtering at the root of the staged tree, matching the two-tier pattern used by _clone_all_copytree_ignore and _default_export_ignore in profiles.py. Add 5 tests covering nested bin/logs/cache preservation and top-level filtering still working. Fixes #37954	2026-06-07 21:50:57 -07:00
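The root-only filtering described in the commit above can be illustrated with a `shutil.copytree` ignore callable that excludes names only when it is called for the top directory. This is a sketch, not the project's `_copy_dist_payload`: the helper name `make_root_only_ignore` is hypothetical and the exclude set is a three-entry subset taken from the examples in the commit message.

```python
import os
import shutil
import tempfile

# Illustrative subset; the real USER_OWNED_EXCLUDE has more entries.
USER_OWNED_EXCLUDE = {"bin", "logs", "cache"}


def make_root_only_ignore(root: str):
    """Build a copytree ignore callable that filters only at `root`."""
    root = os.path.abspath(root)

    def _ignore(directory, names):
        # copytree calls this once per directory it visits.
        if os.path.abspath(directory) == root:
            return [n for n in names if n in USER_OWNED_EXCLUDE]
        return []  # nested directories are copied untouched

    return _ignore


def demo():
    """Copy a small tree; report (top-level bin copied?, nested tool copied?)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        os.makedirs(os.path.join(src, "bin"))
        os.makedirs(os.path.join(src, "pkg", "bin"))
        with open(os.path.join(src, "pkg", "bin", "tool"), "w") as fh:
            fh.write("x")
        shutil.copytree(src, dst, ignore=make_root_only_ignore(src))
        return (
            os.path.exists(os.path.join(dst, "bin")),
            os.path.exists(os.path.join(dst, "pkg", "bin", "tool")),
        )


print(demo())
```

Comparing the visited directory against the root is what separates the two tiers: the top-level `bin` is skipped, while `pkg/bin/tool` survives the copy.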
islam666	f1d3afb151	fix(profiles): skip 'default' in named profiles scan to prevent duplicates When ~/.hermes/profiles/default/ exists as a directory, list_profiles() returns 'default' twice: once as the built-in default profile (~/.hermes) and once from the directory scan (~/.hermes/profiles/default). This causes the cron dashboard API (profile=all) to read the same jobs.json twice, showing every default-profile job duplicated in the UI. Fix: skip name=='default' in the named profiles loop, since it's already added as the built-in default at the top of the function. Fixes #39346	2026-06-07 21:50:57 -07:00
islam666	18c085b1a4	fix(gateway): normalize optional systemd directives in stale-check (#41119) On older systemd versions that don't support RestartMaxDelaySec / RestartSteps, the installed unit file has those directives silently dropped. systemd_unit_is_current() did a strict text comparison, so the unit was perpetually flagged as outdated. Fix: _strip_optional_systemd_directives() removes RestartMaxDelaySec and RestartSteps from both the installed and expected text before comparison. Units that differ only by these optional directives are now correctly considered current.	2026-06-07 21:50:57 -07:00
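The normalize-then-compare approach in the commit above might look roughly like the sketch below. The function names are shortened, hypothetical stand-ins for `_strip_optional_systemd_directives()` and `systemd_unit_is_current()`, and the regex is one possible way to drop the two directive lines, not necessarily the one the project uses.

```python
import re

# Directives named in the commit message as optional on older systemd.
_OPTIONAL_DIRECTIVES = ("RestartMaxDelaySec", "RestartSteps")

# Match a whole "Name=value" line (plus its newline) for each directive.
_OPTIONAL_LINE = re.compile(
    r"^[ \t]*(?:%s)[ \t]*=.*\n?" % "|".join(_OPTIONAL_DIRECTIVES),
    re.MULTILINE,
)


def strip_optional_directives(unit_text: str) -> str:
    """Remove the optional directive lines from a unit file's text."""
    return _OPTIONAL_LINE.sub("", unit_text)


def unit_is_current(installed: str, expected: str) -> bool:
    """Compare two unit texts while ignoring the optional directives."""
    return strip_optional_directives(installed) == strip_optional_directives(expected)


expected = (
    "[Service]\nRestart=always\nRestartMaxDelaySec=60\n"
    "RestartSteps=5\nExecStart=/usr/bin/example\n"
)
installed = "[Service]\nRestart=always\nExecStart=/usr/bin/example\n"
print(unit_is_current(installed, expected))
```

Stripping both sides keeps the comparison symmetric, so it does not matter which of the two texts carries the directives.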
Shannon Sands	86e5efb0ae	Preserve Telegram onboarding fallback errors	2026-06-07 19:48:09 -07:00
Shannon Sands	ba29010902	Use httpx for Telegram onboarding worker calls	2026-06-07 19:48:09 -07:00
Teknium	1892e22acb	fix(skills): browse shows full catalog, not first 5000 (#41413) hermes skills browse capped the hermes-index source at 5000, so it surfaced ~5.4k of the ~90.7k skills the index actually carries. Raise the per-source ceiling above catalog size; browse already paginates client-side and the index is disk-cached, so no extra fetch cost.	2026-06-07 10:15:31 -07:00


2631 commits