hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-29 18:46:59 +00:00

Author	SHA1	Message	Date
phragg	696037587f	docs: phragg was here	2026-05-29 16:22:11 -04:00
emozilla	bebf1b7e01	fix(desktop): branch-pin the CLI manual-update command card The 'Update from your terminal' card (shown to CLI installs with no staged updater) hardcoded bare `hermes update` — which defaults to main and would switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for the GUI button, leaked into the card's copy text. Resolve the checkout's current branch and show `hermes update --branch <current>` for non-main checkouts; keep it bare for main so the card stays clean. Best-effort: bare fallback if branch detection fails. Matches the GUI button + installer --update contract; bare terminal/bot/TUI update paths still default to main, unchanged.	2026-05-29 02:54:30 -04:00
emozilla	3d8c285054	update test	2026-05-29 02:49:28 -04:00
emozilla	1653a04f70	fix(gui): pin /api/hermes/update to the current branch The desktop command-center 'update' action hits POST /api/hermes/update, which spawned bare `hermes update` with no --branch. cmd_update then falls back to its default (main) and checks the working tree OUT of the tracked branch — a bb/gui install silently jumped to main and lost the desktop CLI. Resolve the checkout's current branch and pass --branch <current> from this endpoint only. The engine default (main) is DELIBERATELY unchanged: bare `hermes update` from a terminal, the gateway /update bot command, and the CLI/TUI relaunch path all keep their long-standing 'update against main' contract for the existing user base. Only the GUI button is scoped to update-the-branch-you're-on. Detached HEAD / git failure falls back to the bare default.	2026-05-29 02:18:31 -04:00
emozilla	48db64c846	update test	2026-05-29 02:02:58 -04:00
emozilla	ce31ec09b9	fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error A user who installed via the CLI (irm\|iex / install.sh) then ran `hermes desktop` has no staged hermes-setup.exe, so clicking Update in-app hit resolveUpdaterBinary()=null and showed a misleading error ('re-run the Hermes installer') with a Try-again button that could never succeed — a dead loop for a perfectly valid install. Treat the no-updater case as an intentional outcome, not a failure: - main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' } (no throw, no 'error' stage) when no updater binary exists. - New 'manual' update stage + apply-state.command thread the command to the UI. - updates-overlay ManualView: a polished terminal-native card with the exact command and a copy button, framed as the correct path for a CLI user rather than an error. GUI-installer users are unaffected — hermes-setup.exe present => seamless auto-update runs as before. Zero new process orchestration; can't fail the update demo.	2026-05-29 01:53:43 -04:00
emozilla	006136c4ab	update test	2026-05-29 01:26:45 -04:00
emozilla	25488de4ba	fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign) The unpacked Hermes.exe showed the stock Electron icon + name in the taskbar because build.win.signAndEditExecutable=false disables BOTH electron-builder's signing AND its rcedit metadata/icon stamping. That flag is load-bearing: enabling it re-triggers signtool -> winCodeSign, whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end). Decouple identity-stamping from signing entirely: after npm run pack, run rcedit ourselves on the produced exe. - Add rcedit as a direct devDependency of apps/desktop (the transitive electron-winstaller copy is fragile). - apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls rcedit's named export to set icon + ProductName/FileDescription/ CompanyName. Node builds argv natively — avoids the PowerShell->exe ->JSON double-escaping that broke the app-builder rcedit path. - install.ps1 Set-DesktopExeIdentity invokes the script after the build, before shortcuts. Best-effort: failure keeps the stock icon, never fails the install. rcedit is a pure PE editor — no signtool, no winCodeSign, no symlinks. Verified locally: stamping a copy of the built Hermes.exe embeds the 32x32 icon and sets ProductName=Hermes. Also fix update-path success-screen flash: in update mode the installer hands off + exits in ~600ms, so don't route to the 'launch Hermes' success view (it flashed before the window closed).	2026-05-29 00:50:14 -04:00
emozilla	aeebe1afa7	test update	2026-05-29 00:25:16 -04:00
emozilla	71d64880d9	fix(installer): pass --branch to hermes update in the --update flow The install is a detached-HEAD checkout of a pinned commit. Without --branch, 'hermes update' fell back to its default (main) and switched the checkout to main — a divergent branch that lacks the desktop CLI command — so the update targeted the wrong branch and the rebuild stage failed with 'invalid choice: desktop'. Thread BUILD_PIN_BRANCH (the branch this installer was built against, and the same branch the desktop detected the update on) into 'hermes update --branch <b>' so update + rebuild stay on-branch.	2026-05-29 00:11:14 -04:00
emozilla	be663d36a5	test update	2026-05-29 00:06:10 -04:00
emozilla	6381e70448	feat(installer): drive in-app updates through the Tauri installer Converge update on the same principle as bootstrap: one driver owns all repo mutation. The desktop becomes a pure consumer that hands off to Hermes-Setup.exe --update instead of re-implementing git/pip in Electron. - hermes desktop --build-only: build without launching, so the installer owns the post-update launch (CLI keeps build logic single-sourced). - Installer AppMode {Install,Update} from argv; get_mode exposed to the UI. - Installer self-copies to HERMES_HOME/hermes-setup.exe on install success (no-op guard during --update re-invocation to avoid the locked-exe copy). - Installer --update flow (update.rs): wait for the desktop to release the venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other), then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses the bootstrap event channel + progress UI via a synthetic two-stage manifest. - Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip removed) -> thin handoff: spawn updater, app.quit() to free the shim. Detection (checkUpdates, commit changelog, behind-count) kept intact. - install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe (never bare 'hermes desktop', which would rebuild every launch).	2026-05-28 23:48:21 -04:00
emozilla	a4cfc8b740	feat(install.ps1): write .hermes-bootstrap-complete marker at end of install The desktop app's main.cjs resolver ladder has a 'bootstrap-needed' rung that fires when .hermes-bootstrap-complete is missing from ACTIVE_HERMES_ROOT. Pre-Hermes-Setup, this marker was written by the packaged-desktop's own bootstrap-runner.cjs at the end of its install flow. Now that Hermes-Setup.exe runs install.ps1 directly, install.ps1 needs to own the marker — otherwise the desktop sees no marker on first launch and triggers its legacy first-launch bootstrap (re-running install.ps1 from inside Electron, the exact recursion Hermes-Setup.exe was supposed to obviate). Implementation: * New Stage-BootstrapMarker (worker) → Write-BootstrapMarker (helper) * Slotted in the manifest right after platform-sdks, before the interactive configure/gateway stages, so it runs unconditionally when the install reaches the finalize phase * Schema mirrors apps/desktop/electron/main.cjs writeBootstrapMarker / isBootstrapComplete EXACTLY: {schemaVersion: 1, pinnedCommit, pinnedBranch, completedAt}. Schema version stays at 1 so old desktops that read marker files written by future install.ps1s can still parse them. * pinnedCommit comes from -Commit flag (Hermes-Setup.exe passes it) or falls back to 'git rev-parse HEAD' in InstallDir * pinnedBranch from -Branch flag, defaults to 'main' matching install.ps1's own param default Two PS-5.1 gotchas baked into comments: * The ?. null-conditional operator doesn't exist pre-PS7; use explicit if-checks on Get-Command results * Set-Content -Encoding UTF8 emits a BOM in 5.1 and Node's plain JSON.parse rejects BOM — write via .NET's UTF8Encoding(false) to produce BOM-less JSON the desktop's readJson() can parse	2026-05-28 13:31:44 -04:00
emozilla	060c4f64a8	fix(desktop): signAndEditExecutable=false to skip signtool path entirely After reading app-builder-lib/winPackager.js line 216 + 231 directly: signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits both signApp() (which signs Hermes.exe + every shouldSignFile match including bundled prebuilds) AND createTransformerForExtraFiles(). None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the winCodeSign download — that happens before they're consulted. What we lose: rcedit also runs through signAndEditResources, so disabling this drops PE metadata (file properties showing 'Hermes' / 'Nous Research' / file description). Cost is real but bounded: * Hermes.exe filename, icon, asar contents, app identity intact * Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE description) — minor downgrade * Start menu, taskbar, window title all work normally * SmartScreen will warn once (unsigned, same as before) When the cert lands, flip signAndEditExecutable back to default true, both signing AND rcedit return, PE metadata is restored. Removes the no-op sign function (build-noop-sign.cjs) since signAndEditExecutable=false prevents signtool from being invoked at all — the custom hook never gets called either.	2026-05-28 13:14:23 -04:00
emozilla	91bf5ee6b7	fix(desktop): use no-op sign function instead of sign=null VM run 6 still hit the symlink crash even with signtoolOptions.sign=null. electron-builder 26.8.1 treats null as 'use the default signtool path' rather than 'skip signing', so the winCodeSign fetch + extraction still fired for the bundled prebuild re-sign. The Electron docs (electronjs.org/docs/latest/tutorial/code-signing) make it clear signing is OPTIONAL and unsigned apps work fine — users just see SmartScreen on first launch. The electron-builder mechanism for 'don't actually sign anything' is to supply a custom sign function (via signtoolOptions.sign: '<path-to-cjs-module>') that resolves without invoking signtool. build-noop-sign.cjs is that module — a 5-line async function that returns undefined. electron-builder calls it for every binary it would have signed, gets back a resolved promise, and considers each binary 'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash. When Nous's cert arrives, replace this file with a real signing hook (@electron/windows-sign-based or a direct signtool invocation). The architecture's signing-ready and the cutover is a one-file edit.	2026-05-28 12:59:14 -04:00
emozilla	3387b8df58	fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract VM run 5 diagnosis: the pre-extract from `3b29e65c1` ran (extracted 83 files, 24MB) but produced ZERO files at the expected sentinel path '/winCodeSign-2.6.0/windows-10/x64/signtool.exe'. Cause: the .7z archive's root entries are 'windows-10/', 'darwin/', 'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with '-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at $cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory nesting wrong from the start. And then we observed: electron-builder downloads winCodeSign-2.6.0.7z under a random numeric filename ('384387955.7z') regardless of what's already extracted in the parent dir. The cache key isn't the dirname; it's content-addressed. So the pre-extract approach was doomed even if the path nesting had been right. Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's win build config. electron-builder honors this and skips the bundled- prebuild signing entirely — no signtool invocation, no winCodeSign fetch, no symlink-privilege crash. The previous failures all stemmed from electron-builder pre-signing node-pty's bundled .exes (winpty-agent.exe, OpenConsole.exe) which are already author-signed upstream; re-signing with our nonexistent cert was overwriting good sigs with nothing useful anyway. Cost: when we DO get a real cert later, we'll add it back with the sign function pointing at the cert chain. Until then, all-null is the correct config and unblocks every non-admin Windows user. Removed Initialize-ElectronBuilderCache (the dead pre-extract). Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env vars as belt-and-suspenders against a future electron-builder change that might revive cert auto-discovery.	2026-05-28 11:42:40 -04:00
emozilla	3b29e65c1b	fix(install.ps1): restore Initialize-ElectronBuilderCache (CSC env vars alone aren't enough) VM run 4 diagnosis: even with CSC_IDENTITY_AUTO_DISCOVERY=false set, electron-builder still fetches winCodeSign and signs bundled binaries. The log shows the signing happens BEFORE the cache extraction: • signing with signtool.exe ...\winpty-agent.exe • signing with signtool.exe ...\OpenConsole.exe • downloading winCodeSign-2.6.0.7z • <symlink privilege error> Cause: node-pty's bundled prebuilds are listed in apps/desktop's asarUnpack ['*/.node', '/prebuilds/']. electron-builder re-signs anything unpacked from asar, regardless of whether OUR binary gets signed. The signtool invocation needs winCodeSign on disk, which needs the .7z extracted, which hits the macOS-symlink crash on non-admin Windows. The CSC env vars I added in `d5fe46727` only kill IDENTITY DISCOVERY (so OUR Hermes.exe stays unsigned, which is fine — we have no cert). They don't prevent the toolchain fetch for the bundled-prebuild re-sign. I removed the pre-extract in `d5fe46727` thinking the env vars subsumed it; that was wrong. Both are needed. Restoring Initialize-ElectronBuilderCache verbatim from `c7e46f9f3` and keeping the CSC env vars. Wrote a clearer doc-comment at the call site explaining the two-knob interaction so future maintainers don't drop one half again.	2026-05-28 11:17:05 -04:00
emozilla	e2d69ce066	fix(install.ps1): Stage-NodeDeps cross-process $HasNode + stream npm install output to bootstrap log VM run 3 diagnosis: node-deps stage skipped on the VM (logged 'Skipping Node.js dependencies (Node not installed)') and then desktop's npm install failed with exit 1 and zero diagnostic detail. Two root causes: 1. $HasNode false-skip in Stage-NodeDeps — same cross-process bug pattern we fixed for Stage-Desktop in `c7e46f9f3`. Stage-Node ran in process A and set $script:HasNode = $true, then exited. Stage- NodeDeps ran in fresh process B (Hermes-Setup.exe -Stage NAME spawns each stage independently), where that variable doesn't exist. Re-probe via Get-Command npm instead of trusting the stale script-scope global. The previous stage already verified Node so the re-probe succeeds. 2. npm install --silent + Tee to TEMP file hid the real error. When the workspace install failed on the VM, the actual reason was buffered in $env:TEMP\hermes-npm-desktop-install-*.log and the user saw only 'exit 1'. Drop --silent so npm streams its full output, drop the TEMP-file dance — the Tauri installer's streaming sink already tees every stdout/stderr line to the rolling bootstrap-installer.log, so a side log file is dead weight that hides the very error we need. After this, the bootstrap log on a failure will contain npm's full output (deprecation warnings, ETARGET, native-module compile errors, whatever) tagged with stage=desktop, making the actual cause diagnosable instead of an opaque exit code.	2026-05-28 11:02:47 -04:00
emozilla	17edb1db2b	fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line Diagnosing the second VM failure was impossible because bootstrap-installer.log contained only the 'starting' banner. Two causes: 1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the floor under the default INFO env-filter. 2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only emitted Tauri events to the frontend; they never tee'd to the log file at all. When the failure route mounts, the Tauri event stream is the only place the script output lived, and it gets discarded. 3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event() were also Tauri-only — so even the 'which stage failed' frame never reached the log. Fixes: * emit_log() → tracing::info! * Sink callbacks tee stdout to info!, stderr to warn!, with stage label as a structured field for grep'ability * emit_event() now matches on the variant and logs each lifecycle frame at the right level: Failed → tracing::error!, others → info! Result: a failing install leaves a complete forensic trail in bootstrap-installer.log — manifest stage list, every install.ps1 stdout/stderr line tagged by stage, the stage transitions, and the final error. Same path as before so nothing the user does changes.	2026-05-28 10:48:43 -04:00
emozilla	d5fe467277	fix(install.ps1): tell electron-builder we're NOT signing instead of pre-extracting winCodeSign The previous commit (`c7e46f9f3`) worked around the winCodeSign-symlinks- on-Windows extraction crash by pre-extracting the archive ourselves with -snl + -x!darwin. That fix was correct but addressed the wrong layer. The deeper question: why was electron-builder fetching winCodeSign at all when we have no signing cert configured? Answer: electron-builder unconditionally pre-warms the toolchain assuming any build MIGHT sign. The cert auto-discovery never finds anything (we never set CSC_LINK or anything else), so the signing never happens — but the 100MB fetch of winCodeSign and its broken-on-Windows symlink extraction does. Set CSC_IDENTITY_AUTO_DISCOVERY=false (with WIN_CSC_LINK and WIN_CSC_KEY_PASSWORD also explicitly cleared as belt-and-suspenders) before invoking npm run pack, and electron-builder skips the entire winCodeSign apparatus. No download, no extraction, no privilege check. Env vars are saved/restored around the invocation so we don't leak the override into Stage-PlatformSdks etc. Net: removes the 100-line Initialize-ElectronBuilderCache helper that manually downloaded + extracted winCodeSign-2.6.0.7z. Replaced with 3 env-var assignments. The produced Hermes.exe is functionally identical — just no longer carries a code-signing-machinery dependency we never used.	2026-05-28 10:37:07 -04:00
emozilla	c7e46f9f3d	fix(install.ps1): pre-warm electron-builder winCodeSign cache + fix Stage-Desktop $HasNode false-skip Two bugs caught in the second VM end-to-end run: 1. electron-builder's winCodeSign extraction fails on grandma-class Windows boxes because the .7z archive contains macOS symlinks (darwin/10.12/lib/libcrypto.dylib and libssl.dylib pointing at versioned siblings). Creating symlinks on Windows requires SeCreateSymbolicLinkPrivilege, a per-user right that non-admin accounts don't have on stock Windows. Result: every fresh install on a non-admin user fails Stage-Desktop with a 7-Zip 'cannot create symbolic link' error, retried four times, then bails. Fix: Initialize-ElectronBuilderCache pre-extracts winCodeSign-2.6.0.7z ourselves with -snl (don't preserve symlinks, store as resolved file content) AND -x!darwin (skip the entire macOS subtree — irrelevant on Windows). Writes to electron-builder's expected cache dir before electron-builder gets a chance to try its own broken extraction. Idempotent — fast-paths via signtool.exe sentinel check. 2. Install-Desktop's first guard was 'if (-not $HasNode) skip'. $HasNode is set by Stage-Node into $script:HasNode, but in cross-process driver mode (each -Stage NAME is a fresh powershell.exe spawned by Hermes-Setup.exe), that script-scope variable from the PREVIOUS process is invisible — so the guard always fired and Install-Desktop returned in 900ms with a misleading 'Node.js not available' reason. The real npm probe below it never got to run. Fix: re-probe npm directly via Get-Command when $HasNode is empty/false, since by that point Stage-Node has already verified Node is installed and the only question is whether this process can see it on PATH (it can — installer-wide PATH update from Stage-Node).	2026-05-28 03:05:26 -04:00
emozilla	0a079f7321	fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop Three bugs found in the first VM end-to-end test: 1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the manifest came back with the 14-stage list (no desktop stage), the UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to both the manifest fetch and the per-stage runs — install.ps1 gates the desktop stage's inclusion on the flag. 2. The Success screen's Launch button silently swallowed the Tauri error when no Hermes.exe existed (e.g. Stage-Desktop was skipped). Wire the error through to inline UI with an alert callout, so the user gets actionable text ('Hermes.exe missing, run hermes desktop from a terminal') instead of an unresponsive button. 3. The Success screen tells users to run 'hermes desktop' from a terminal but the CLI only accepted 'hermes gui' — invalid choice for 'desktop'. Rename the subcommand canonically to 'desktop' with 'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets used by session-flag arg parsing + logging-mode probe so both names route to the same logic.	2026-05-28 02:42:33 -04:00
emozilla	8eedb50bce	feat(installer): Tauri bootstrap installer for first-time onboarding Hermes-Setup.exe is a small signed Rust+Tauri binary that drives scripts/install.ps1 stage-by-stage with a native UI matching the desktop's design language. Replaces the chicken-and-egg pattern of shipping a 200MB Electron app whose first launch existed only to run install.ps1. The architecture: Rust backend (src-tauri/): bootstrap.rs orchestrator -- Tauri commands, stage iteration install_script.rs resolve install.ps1 (dev checkout, cache, GitHub raw) powershell.rs spawn powershell, line-stream stdout/stderr, parse JSON events.rs BootstrapEvent types -- mirror bootstrap-runner.cjs paths.rs HERMES_HOME resolution + tracing log setup build.rs bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from 'git rev-parse HEAD' at compile time React frontend (src/): Tauri webview rendering 4 screens (welcome / progress / success / failure), driven by nanostores subscribing to the Rust event stream. Visual layer reuses the desktop's styles.css wholesale via @import so the installer and desktop never drift visually. Distribution: targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The raw target/release/Hermes-Setup.exe IS the artifact on Windows; .dmg + .app on macOS; AppImage on Linux. One file, double-click, no installer-installing-an-installer pattern. Compile-time pinning: build.rs reads 'git rev-parse HEAD' and emits cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>. bootstrap.rs's option_env!() picks these up so the binary fetches install.ps1 from the exact SHA it was tested against. CI / release builds can override via HERMES_BUILD_PIN_COMMIT env var. Windows manifest: hermes-setup.manifest declares level='asInvoker' so the productName 'Hermes Setup' doesn't trip Windows's installer- detection heuristic and refuse to launch without elevation. Also declares PerMonitorV2 DPI + UTF-8 active code page + Common Controls v6. Limitations of this initial version: * No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe ('More info -> Run anyway'). The downstream binaries it produces (Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and therefore don't carry MOTW, so they launch without SmartScreen intervention. Cert procurement tracked separately. * macOS and Linux build paths defined but untested -- Windows-only V1.	2026-05-28 02:23:13 -04:00
emozilla	80d782bc78	feat(install.ps1): add -IncludeDesktop switch + Stage-Desktop The new Hermes-Setup.exe (Tauri bootstrap installer) passes -IncludeDesktop so users who install via the GUI end up with a launchable Hermes.exe at apps/desktop/release/<os>-unpacked/. Existing flows are unchanged: * The 'irm install.ps1 \| iex' CLI one-liner omits the flag — terminal users don't need a prebuilt desktop binary; 'hermes desktop' builds on demand. * The Electron desktop's bootstrap-runner.cjs also omits the flag — rebuilding apps/desktop from inside a running Hermes.exe would try to overwrite the live binary on disk and fail. Stage-Desktop runs after Stage-NodeDeps so workspace npm is already installed when electron-builder fires. It does: 1. 'npm install' at repo root so apps/* workspaces resolve their deps (Electron itself arrives via npm here, ~150MB) 2. 'npm run pack' in apps/desktop (tsc + vite + electron-builder --dir) 3. Probes apps/desktop/release/{win-unpacked,win-arm64-unpacked}/Hermes.exe The --dir mode produces an unpacked launchable binary without an NSIS/MSI installer artifact — we don't need one because Hermes-Setup.exe spawns the unpacked binary directly via launch_hermes_desktop.	2026-05-28 02:23:13 -04:00
copilot-swe-agent[bot]	19419a47d7	Merge origin/main into bb/gui	2026-05-28 03:08:24 +00:00
brooklyn!	912e6e2274	fix(tui): suppress mouse-residue leaks during Python launcher startup (#31213 ) * fix(tui): suppress mouse-residue leaks during Python launcher startup `hermes --tui …` spends ~100–300ms inside the Python launcher (lazy imports, arg parsing, session resolution) before exec'ing the Node TUI binary. During that window stdin is still in cooked + echo mode. If a prior session left DEC mouse tracking asserted (or the user spammed mouse movement while the previous session was opening), the terminal keeps emitting `\\x1b[<…M` SGR motion reports that get echoed straight back into the user's shell scrollback as literal `^[[<…M` text and sit there above the TUI banner until the next clear. The Node side already calls `resetTerminalModes()` in `entry.tsx`, but by then the race is already lost — the bytes echoed during the Python warmup window were committed to the scrollback before Node started. Fix: write the mouse-tracking disable sequence at the very top of `hermes_cli.main`, before every heavy import. The terminal stops emitting motion events as soon as the bytes hit the wire (one TTY round-trip), shrinking the race window from hundreds of milliseconds to a few. `HERMES_TUI_NO_EARLY_DISABLE=1` opts out for diagnostics. * test(tui): drop dead _reload_main, hoist import out of patch context Addresses Copilot review on PR #31213. The tests used to import `hermes_cli.main` inside the `patch("os.write")` context, which Copilot pointed out is order-dependent: if the module is already loaded (e.g. imported by a prior test in the same process), the import is a no-op and the patch only sees the explicit `_suppress_mouse_residue_early()` call. Either way the assertion can flake when run alongside other tests. Move the import to module scope — every subprocess gets a fresh `hermes_cli.main`, whose module-level invocation is a no-op under pytest argv. Tests then exercise `_suppress_mouse_residue_early()` directly inside their own patch context. Also drop the unused `_reload_main` helper. * fix(tui): skip early mouse-disable when stdout is not a TTY Addresses Copilot review on PR #31213. `hermes --tui … >log` or CI capture pipes fd 1 away from the terminal. The disable bytes can't reach the terminal in that case but would still get written into the log file as raw CSI sequences. Guard with `os.isatty(1)` inside the existing `try/except OSError` block so the 'never break startup' contract holds. * docs(tui): rephrase 'raw cooked mode' as 'cooked + echo mode' Copilot review nit on PR #31213 — the original wording was self- contradictory. Pre-TUI stdin state is cooked + echo (kernel TTY discipline still owns the line buffer and echoes input back). The TUI switches it to raw mode later when Ink mounts.	2026-05-27 22:03:45 -05:00
emozilla	a5d418bc5b	Merge branch 'bb/gui' of github.com:NousResearch/hermes-agent into bb/gui	2026-05-27 22:43:13 -04:00
emozilla	791f4e939d	fix(setup): drop shadowing inner importlib.util re-imports _print_setup_summary and _setup_tts_provider each had 'import importlib.util' inside a try: block nested deeper in the function body. Python flips importlib to function-local for the whole scope, so earlier references in the same function (the neutts branches at lines 493 / 1109) hit UnboundLocalError before the late import can run. The top-of-module 'import importlib.util' at line 14 already covers both call sites, so dropping the redundant inner imports restores the intended behavior.	2026-05-27 22:42:24 -04:00
emozilla	7a15f0b1ac	fix(telegram): import Set for _dm_topic_chat_ids annotation self._dm_topic_chat_ids: Set[str] = {...} at line 460 references Set but only Dict, List, Optional, Any are imported from typing. The file has no 'from __future__ import annotations', so the annotation is evaluated at runtime and raises NameError on TelegramAdapter construction.	2026-05-27 22:42:16 -04:00
Ben Barclay	0927fb5584	feat(docker): auto-redirect `gateway run` to supervised mode inside s6 image Pre-s6, `docker run nousresearch/hermes-agent gateway run` was the standard invocation: gateway ran as the container's main process, tini reaped zombies, container exit code matched gateway exit code, no supervision. With s6-overlay as PID 1, the same invocation now auto-upgrades to supervised semantics — auto-restart on crash, dashboard supervised alongside (when HERMES_DASHBOARD=1 is set), multiple profile gateways under the same /init. Users get the new behavior with zero changes to their docker run command. A loud one-line breadcrumb on stderr explains the upgrade and points at the opt-out for users who genuinely want pre-s6 foreground semantics. How it works: 1. `_gateway_command_inner` (the `gateway run` handler) checks if we're inside a container with s6 as PID 1. 2. If yes, dispatches `start` to the s6 service manager (registers and starts gateway-default), then `exec sleep infinity` to keep the CMD process alive without binding container lifetime to gateway PID lifetime. The supervised gateway can flap freely; `docker stop` still tears everything down via /init stage 3. 3. If no, falls through to the existing foreground code path unchanged. Host runs of `hermes gateway run` are unaffected. Three gates make the redirect inert outside the intended scope: * `detect_service_manager() != "s6"` — host/non-s6-container runs. * `HERMES_S6_SUPERVISED_CHILD=1` env var (recursion guard) — exported by `S6ServiceManager._render_run_script` for the s6-supervised invocation itself. Without this guard, the supervised `gateway run --replace` would re-enter the redirect and recurse (run → start → run → start → ...) infinitely. * `--no-supervise` CLI flag OR `HERMES_GATEWAY_NO_SUPERVISE=1` env var — explicit user opt-out for CI smoke tests, debugging the foreground startup path, or any case wanting "CMD exit = container exit" semantics. Strict truthiness (1/true/yes, case-insensitive); typos like `=0` do NOT silently opt out. Tests: * Unit tests in tests/hermes_cli/test_gateway_s6_dispatch.py cover all five paths (host no-op, supervised fire, sentinel recursion guard, CLI flag, env var truthy + falsy). The two load-bearing gates (sentinel + opt-out) were mutation-tested by removing each gate in isolation and confirming the dedicated test fails with the expected error. * Docker harness tests in tests/docker/test_gateway_run_supervised.py cover the round trips end-to-end against a built image: redirect fires (sleep-infinity heartbeat + supervised gateway-default slot + breadcrumb), --no-supervise opt-out (foreground gateway, no want-up on the slot), HERMES_GATEWAY_NO_SUPERVISE env var works identically, recursion is impossible (≤1 supervised python gateway-run + exactly 1 sleep-infinity parented to the CMD wrapper), and HERMES_DASHBOARD=1 produces both supervised gateway and supervised dashboard. Docs: * Added a `:::tip Gateway runs supervised` admonition near the main docker.md example explaining the upgrade and pointing at the opt-out. Pre-s6 (tini-based) images still run gateway run as the foreground main process, so the note is scoped to the s6 image only. Trade-off documented in the helper docstring: container exit code under the redirect is sleep's exit code (always 0 on SIGTERM), not the gateway's. That was an explicit design call — the supervised gateway is allowed to flap without taking the container with it, which is what "supervision" means. CI users who want exit-code forwarding can pass --no-supervise.	2026-05-28 12:42:13 +10:00
emozilla	a3df95e76d	fix(tui-gateway): restore _content_display_text helper lost in main merge The May 27 merge of origin/main into bb/gui re-introduced two callers of _content_display_text (in _inflight_text and _history_to_messages) but dropped the helper definition itself, leaving an unresolved reference. NameError fires on every user message via _start_inflight_turn -> _inflight_text, taking down both the TUI and the desktop (which share this gateway backend) the moment input is dispatched. Restores the helper verbatim from main (commit `36c99af37`) -- pure structured-content text extractor, no other dependencies.	2026-05-27 22:42:09 -04:00
Brooklyn Nicholson	cbf80ff71d	fix(tui_gateway): restore _content_display_text helper Bb/gui had dropped the helper but the orchestrator code merged from main still calls it (_inflight_text, _message_preview). Re-add the definition verbatim from main so session.create / _start_inflight_turn don't crash with NameError on first prompt submit.	2026-05-27 21:32:25 -05:00
Brooklyn Nicholson	02d26981d3	Merge origin/main into bb/gui	2026-05-27 21:22:14 -05:00
teknium1	36c99af37a	test(kanban): align two tests with recent kanban hardening Two pre-existing test failures on main, both pointing at code that was hardened recently — not behaviour bugs, test expectations that fell out of date. 1. tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id `c002668ff` ("fix(kanban): add grace period to detect_crashed_workers") gates each running task behind a launch-window grace period so freshly-spawned workers whose PID isn't yet visible on /proc don't get reclaimed. The test creates a worker_env fixture moments before asserting reclamation, so the default 30s grace skips the liveness check and detect_crashed_workers returns []. Fix: set HERMES_KANBAN_CRASH_GRACE_SECONDS=0 in the test so we get the immediate-reclaim semantics the assertion expects. 2. tests/tools/test_windows_native_support.py:: TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop `ffdc937c1` ("fix(kanban): hoist zombie reaper out of dispatch_once") reshaped reap_worker_zombies to use an early-return Windows guard (\`if os.name == "nt": return []\`) instead of an inverted gate (\`if os.name != "nt":\`). Both correctly keep the waitpid loop off Windows — the early-return form is stronger because the rest of the function never runs. Fix: accept either gate pattern in the source scan. Both failures reproduce verbatim on \`origin/main\` in a clean env; neither relates to in-flight work on #33564 (the FD-leak fix). Filing this as a separate fix-it PR per green-CI-policy so the kanban CI shard stays green for downstream PRs.	2026-05-27 18:26:44 -07:00
kshitijk4poor	2d5dcfabc3	test(kanban): update dispatcher tick counter for hoisted zombie reaper The reaper hoist in the prior commit adds an extra `asyncio.to_thread(_kb.reap_worker_zombies)` call at the top of every dispatcher tick (before the per-board work). The existing `test_gateway_dispatcher_disables_corrupt_board_without_traceback` mocks `to_thread` with a 4-call cap that previously matched 2 full dispatch ticks. With the reaper hoist each tick is now 3 `to_thread` calls instead of 2, so the cap is raised to 6 to preserve the same number of dispatch ticks. The `connect == 5` assertion is unchanged. Also add the contributor's `steveonjava@gmail.com` to AUTHOR_MAP alongside `steve@steveonjava.com` so contributor-audit passes for both identities used across the salvaged commits. Salvage follow-up for PR #32857.	2026-05-27 14:31:55 -07:00
Stephen Chin	dc98314fbd	fix(kanban): skip redundant WAL pragma on already-WAL connections apply_wal_with_fallback() issued PRAGMA journal_mode=WAL on every call, including connections to DBs already in WAL mode. This triggered the WAL init code path, causing SQLite to acquire EXCLUSIVE, checkpoint, and unlink kanban.db-{wal,shm}. Other open connections received (deleted) FDs and raised sqlite3.OperationalError: disk I/O error. Add a cheap read probe (PRAGMA journal_mode, no flock/checkpoint/unlink) before the set-pragma path. If already wal, return early. The set-pragma and DELETE fallback paths are unchanged. Closes #31158. Addresses root cause that PRs #32226 and #32322 attempted via connection-sharing/caching approaches.	2026-05-27 14:31:55 -07:00
Stephen Chin	ffdc937c18	fix(kanban): hoist zombie reaper out of dispatch_once Reaper now runs at the top of every dispatcher tick regardless of per-board connect() failures. Previously the reaper sat inside dispatch_once after the kanban_db.connect() call — any EIO during connect would skip reaping for that tick, accumulating zombie workers and stale claim_lock rows. Also: reap_worker_zombies now returns the list of reaped pids (the dispatcher logs them) and a test indentation fix. Squashes three sibling commits from PR #32301 into one logical change for batch review.	2026-05-27 14:31:55 -07:00
steveonjava	99c19eb2fe	fix(kanban): add post-commit page_count invariant check to write_txn Reads header bytes 28-31 after every COMMIT and compares against actual file size. Raises sqlite3.DatabaseError on torn-extend (actual_pages < page_count). Also sets PRAGMA wal_autocheckpoint=100 in connect(). Refs: #31208 (Bug E - same file, coordinate), #30973 (wal_autocheckpoint) Refs: #30445, #30896, #30908 (corruption reports)	2026-05-27 14:31:55 -07:00
Stephen Chin	c002668ff0	fix(kanban): add grace period to detect_crashed_workers `detect_crashed_workers` calls `_pid_alive` on every `running` task whose claim is held by this host. The check can transiently return False for a freshly-spawned worker (fork → /proc-visibility lag, or reap-race between SIGCHLD and parent reaping). When a second dispatcher ticks inside that window it reclaims the task and spawns a duplicate worker. Add `DEFAULT_CRASH_GRACE_SECONDS = 30` and an `HERMES_KANBAN_CRASH_GRACE_SECONDS` env-var override. `detect_crashed_workers` skips the liveness check when `time.time() - started_at < grace`. The existing 15-minute claim TTL still reclaims genuinely-crashed workers; grace only suppresses the launch-window false positive. `HERMES_KANBAN_CRASH_GRACE_SECONDS=0` is set on the `kanban_home` fixture in `test_kanban_core_functionality.py` so existing tests that assert immediate reclaim retain pre-fix semantics. Companion to merged PR #23442 (`release_stale_claims`, closes #23025), which addressed the same multi-dispatcher race in the stale-claim path. Related: #20015 (`_pid_alive` false-negative behaviour),	2026-05-27 14:31:55 -07:00
Stephen Chin	e83252dc46	fix(kanban): preserve original exception when write_txn rollback fails When code inside a write_txn block raises an OperationalError that SQLite has already auto-rolled-back (typical for disk I/O error, database is locked, and database disk image is malformed), the explicit ROLLBACK in write_txn.__exit__ itself raises cannot rollback - no transaction is active and the secondary exception replaces the original in the traceback. Operators see a misleading error and lose the diagnostic information they need. Swallow the rollback-time OperationalError so the caller always sees the original cause. Confirmed reproducer: tests/hermes_cli/test_kanban_db.py:: test_write_txn_preserves_original_exception_when_rollback_fails	2026-05-27 14:31:55 -07:00
Stephen Chin	5c49cd0ed0	fix(state): never silently downgrade WAL to DELETE on transient EIO apply_wal_with_fallback() treated "disk i/o error" as a permanent WAL-incompatibility marker, identical to "locking protocol" (NFS) and "not authorized" (FUSE). But EIO during PRAGMA journal_mode=WAL is typically TRANSIENT — page-cache pressure, brief lock contention, recoverable storage hiccups — not a permanent filesystem property. Treating transient EIO as a permanent downgrade signal produces the mixed-journal-mode-across-processes corruption pattern: 1. Process A opens kanban.db, hits transient EIO on the WAL pragma, silently downgrades to journal_mode=DELETE. 2. Process B (no EIO) opens the same file moments later and successfully sets journal_mode=WAL. 3. A writes rollback-journal frames while B writes WAL frames. SQLite documents this as unsupported and corrupts the file: https://www.sqlite.org/wal.html ("all connections to the same database must use the same locking protocol"). This was the root cause of repeated kanban.db corruption on hosts with multiple gateway processes plus CLI invocations against the same DB (observed pattern: corruption shortly after gateway startup, after the process logged "WAL journal_mode unsupported on this filesystem (disk I/O error) — falling back to journal_mode=DELETE"). The fallback warning told the truth — fallback DID happen — but the premise ("unsupported on this filesystem") was wrong; the EIO was a one-shot event and sibling processes successfully used WAL. Fix has two layers: 1. Remove "disk i/o error" from _WAL_INCOMPAT_MARKERS. EIO now re-raises so callers can retry instead of silently corrupting the DB. The two remaining markers ("locking protocol", "not authorized") are deterministic per filesystem so they remain safe permanent-downgrade signals. 2. Belt-and-suspenders: before downgrading on ANY marker match, peek the on-disk journal mode. If the header says WAL, refuse to downgrade and re-raise the original error. This guards against any future addition to _WAL_INCOMPAT_MARKERS turning out to be transient in some environment we haven't yet seen. Tests: - tests/test_hermes_state_wal_fallback.py: * Flipped test_falls_back_on_disk_io_error → test_reraises_on_disk_io_error asserting EIO is re-raised, not silently swallowed. * Added test_does_not_downgrade_when_disk_says_wal covering the on-disk-header safety guard for the existing legitimate markers. - tests/hermes_cli/test_kanban_db.py: * test_connect_falls_back_to_delete_on_locking_protocol now uses a truly-fresh DB (instead of the kanban_home fixture which pre-inits in WAL). On NFS the very first process touching the file legitimately downgrades; on a file already in WAL the new guard correctly refuses. A standalone reproducer lives at /tmp/kanban-stress/repro_bugD_eio_wal_downgrade.py (not committed): without fix the DB silently flips from WAL to DELETE mid-process; with fix the EIO surfaces and the file stays WAL. Refs: Bug D in the kanban-corruption investigation series (Bugs A and C shipped in ebe7374f3 and e02147d5e respectively). Bug D explains every corruption incident this week including those that survived A's single-dispatcher mitigation, because every CLI invocation is a separate process whose WAL pragma can transiently fail.	2026-05-27 14:31:55 -07:00
Stephen Chin	6416dd5187	fix(kanban): harden SQLite against torn-write corruption (secure_delete + cell_size_check + synchronous=FULL) Production corruption #6 left b-tree pages with zeroed headers but intact old cell content — the Bug E pattern. This fix applies three pragma calls on every connect(): - synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume. - secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data. - cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns. All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs. Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).	2026-05-27 14:31:55 -07:00
kshitijk4poor	963d22cde6	test(install): harden uv-python-path regression test against future drift Self-review follow-ups on the salvage of #22494: W2 — Added encoding="utf-8" to read_text() calls. scripts/install.sh contains 48 em-dash ("—") characters and ~1500 non-ASCII bytes total; on Windows with cp1252 default locale, bare read_text() would raise UnicodeDecodeError. Project-wide cleanup of the other 11 similar sites across 5 install_sh test files is deferred to a separate follow-up. W3 — Bound the branch-containment check by the function body (head "resolve_install_layout() {" / tail "\n}\n") instead of by "next `return 0` after the marker". scripts/install.sh has 5 additional `return 0` statements between resolve_install_layout's first one and EOF; if a future maintainer hoists the export above another conditional with its own early-return or inserts an early-return between the marker and the export, the old assertion still passes while the export is unreachable. The body-bounded slice makes that class of regression visible. Also added more specific assertion messages and a guard for the body extraction to fail loudly if the function signature ever changes.	2026-05-27 13:55:51 -07:00
Wesley Simplicio	4efb40c325	fix(install): set world-readable uv python dirs for root FHS layout When installing as root on Linux with the default FHS layout (/usr/local/lib/hermes-agent), `uv python install` placed the managed Python under /root/.local/share/uv/python/, which non-root users cannot traverse. The shared /usr/local/bin/hermes wrapper then failed for them with "bad interpreter: Permission denied" when execing the venv python. Export UV_PYTHON_INSTALL_DIR and UV_PYTHON_BIN_DIR to /usr/local/share/uv/ in the root-FHS branch of resolve_install_layout so the managed Python is world-readable and the shared wrapper works for any user. Closes #21457	2026-05-27 13:55:51 -07:00
kshitijk4poor	0537e2600d	fix(skills): atomic lock write + drop dead _validate_category_name Self-review follow-ups on the salvage of #33177 + #33188 + #33209: W3 (real, lock_path.write_text was non-atomic AND the read path silently resets data to an empty installed dict on JSONDecodeError — a crash mid- write could nuke ALL hub provenance, not just official-optional). Switch to the same mkstemp + fsync + atomic_replace pattern that _write_manifest already uses in this module. W5 (dead code) — _validate_category_name had one caller on origin/main (install_from_quarantine), swapped to _validate_install_parent_path by #33177. Remove the now-unused definition to avoid the attractive-nuisance of contributors picking the wrong validator. Behavior preserved on the happy path; verified all 200 skills/hub tests plus the three E2E scenarios (destructive restore, backfill idempotency, adversarial nonexistent skill) still pass after both fixes.	2026-05-27 13:39:58 -07:00
wysie	ee80dfdea0	fix: preserve skill packages during curator consolidation	2026-05-27 13:39:58 -07:00
wysie	f040710d04	fix: backfill official optional skill provenance	2026-05-27 13:39:58 -07:00
wysie	a38e283395	fix: preserve nested official skill install paths	2026-05-27 13:39:58 -07:00
kshitijk4poor	53bdef5775	test(cli): regression test for hermes update fork upstream sync (#26172 ) Asserts that when hermes update runs on a fork whose local HEAD matches origin/main but commit_count == 0, the early-return path still consults _sync_with_upstream_if_needed() before printing "Already up to date!". Locks in the fix from the parent commit so the upstream-sync call cannot silently regress out of the commit_count == 0 branch.	2026-05-27 13:10:50 -07:00
Franci Penov	6f2a2f157f	fix: check upstream even when origin/main has no new commits The upstream sync logic only ran after a successful origin pull, so forks whose origin/main was already in sync with local (but behind upstream/main) would bail out with "Already up to date!" without ever checking upstream.	2026-05-27 13:10:50 -07:00

1 2 3 4 5 ...

9946 commits