Commit graph

170 commits

Author SHA1 Message Date
Brooklyn Nicholson
357db76b40 chore: linux build 2026-05-30 23:38:30 -05:00
Brooklyn Nicholson
1831eccd88 chore: uptick 2026-05-30 23:14:25 -05:00
brooklyn!
5f9e0545ca
macOS desktop: install + in-app self-update (#35607)
* fix(installer): align macOS HERMES_HOME with the rest of the stack

paths.rs computed the macOS Hermes home as ~/Library/Application Support/
hermes, but nothing else does: hermes_constants.get_hermes_home() (Python),
scripts/install.sh, and the Electron desktop's resolveHermesHome() all use
~/.hermes on macOS. The drift meant the Tauri installer wrote the install to
one directory and the desktop looked for it in another, so a fresh GUI
install never found its backend (the file's own comment warned this exact
drift would break things). Use ~/.hermes on macOS to match.

* fix(install.sh): always emit a stage result frame on failure

Stage helpers (clone_repo, install_deps, check_python, …) were written for
the monolithic flow and call `exit 1` on failure. Under `--stage`, that
terminated the process before the JSON result frame was printed, so the
installer's parse_stage_result saw "no frame" instead of a clean
{ok:false,...} contract response. Run the stage body in a subshell so an
`exit` only unwinds the subshell and the parent still emits the frame.

* feat(install.sh): auto-provision git on macOS/Linux (parity with install.ps1)

install.ps1 downloads PortableGit on Windows, but install.sh just printed a
"please install git" hint and exited — so a fresh Mac with no developer tools
(no Xcode CLT → no git) couldn't get past the clone step. check_git now tries
to install git before bailing:
  - macOS: Homebrew if present (headless), else `xcode-select --install`
    (the CLT prompt also provides the compiler some wheels need), polling for
    git to appear.
  - Linux: apt/dnf/pacman via sudo when available.
Falls back to the manual instructions only if auto-provision fails.

* feat(desktop): in-app GUI+backend self-update on macOS/Linux

On Windows the staged Hermes-Setup binary drives updates (quit → hermes
update → hermes desktop --build-only → relaunch). The mac drag-install has no
such binary, so "Update now" previously just printed `hermes update`.

Since there's no venv-shim file lock on POSIX, the desktop can drive the whole
update itself. applyUpdates now, when no staged updater exists on mac/linux:
  1. runs `hermes update --yes [--branch <current>]` (backend git pull + deps),
  2. runs `hermes desktop --build-only` (OS-aware GUI rebuild) with the
     Hermes-managed Node + venv on PATH,
  3. spawns a detached swapper that waits for this process to exit, dittos the
     freshly built Hermes.app over the running bundle, clears quarantine, and
     relaunches.
Degrades to "backend updated — restart to load the new GUI" if the rebuild
fails or there's no .app bundle to swap (dev run, Linux AppImage).

* chore: uptick
2026-05-30 22:26:08 -05:00
emozilla
baf6e07599 say 'OS appearance' instead of 'macOS appearance' 2026-05-30 20:07:18 -04:00
emozilla
4ed01f2fa4 fix(dashboard): return 404 JSON for unmatched /api paths instead of SPA HTML
The SPA catch-all (serve_spa) served index.html for any unmatched GET,
including unregistered /api/* endpoints. A missing API route therefore
came back as <!doctype html> with status 200, and JSON clients (the
desktop app's fetchJson) crashed with an opaque
'SyntaxError: Unexpected token <' instead of a clear error.

- web_server.py: unmatched /api or /api/... now returns 404 JSON
  ('No such API endpoint'); non-api paths still serve the SPA for
  client-side routing.
- main.cjs fetchJson: detect an HTML body / text/html content-type on a
  2xx response and reject with a clear message naming the URL, rather
  than a raw JSON.parse SyntaxError. Empty bodies resolve to null;
  malformed JSON reports the URL plus a snippet.
2026-05-30 20:05:49 -04:00
Brooklyn Nicholson
2e157a2154 feat(desktop): automate first-launch bootstrap on macOS/Linux
Previously a packaged macOS/Linux app with no Hermes install hit a
dead-end ("first-launch install is not yet automated -- run install.sh
manually") because install.sh lacked the staged protocol install.ps1
exposes. Now both platforms bootstrap on first launch with the same
structured, per-step progress UI as Windows.

- install.sh: add --manifest / --stage / --json / --non-interactive plus
  a stage dispatcher (prerequisites, repository, venv, python-deps,
  node-deps, path, config, setup, gateway, complete). User-input stages
  (setup, gateway) are skipped under --non-interactive; the in-app
  onboarding overlay owns API keys/model, matching the Windows flow.
  Each stage runs inside the install dir (its own process) and a new
  --commit flag pins the checkout to the build-stamp SHA.
- bootstrap-runner.cjs: drive the staged manifest/stage/JSON protocol for
  both install.ps1 (PowerShell) and install.sh (bash), selected by
  installer kind; removed the single-blob POSIX shim.
- main.cjs: drop the macOS/Linux unsupported-platform dead-end so the
  bootstrap-needed path runs the installer on every platform.
2026-05-30 01:40:53 -05:00
Brooklyn Nicholson
6e3f50a3a8 fix(desktop): resolve renderer assets relative to BASE_URL
Absolute public asset paths (/apple-touch-icon.png, /ds-assets/...) work
under the dev server but break in the packaged app, where the renderer is
loaded from file://.../index.html and a leading slash resolves to the
filesystem root -> broken onboarding provider icon and backdrop image on
macOS. Prefix these with import.meta.env.BASE_URL so they resolve next to
the bundled index.html in both dev and packaged builds.
2026-05-30 01:40:42 -05:00
Brooklyn Nicholson
5aade4bc57 fix(desktop): align fresh-install home so upgraders don't brick
Two related first-launch bugs on machines with a legacy ~/.hermes:

- install.ps1 hardcoded $HermesHome/$InstallDir to %LOCALAPPDATA%\hermes
  and ignored the HERMES_HOME the desktop passes through. The desktop
  freezes HERMES_HOME at module load and prefers a legacy ~/.hermes when
  %LOCALAPPDATA%\hermes is absent, so the installer wrote to a different
  home than the shell read -> "Could not connect to Hermes gateway". Honor
  $env:HERMES_HOME in the param defaults.

- isBootstrapComplete() trusted the marker + checkout without verifying a
  runnable venv, so an interrupted/split install spawned a dead backend
  instead of re-bootstrapping. Also require the venv python to exist.
2026-05-29 23:55:32 -05:00
Brooklyn Nicholson
8b1b9146c4 fix(desktop): stop completed-message layout shift while streaming
The assistant message action bar used `hideWhenRunning`, which unmounts it
whenever the thread is streaming. Since the bar reserves vertical space in
each completed assistant message's footer (it's invisible-until-hover via
opacity, not via mount), unmounting it collapsed every prior turn by the
bar's height — then remounting on resolve grew them back, shifting the whole
conversation (visible as "padding appears above the last user message").
Drop hideWhenRunning so the footer height is constant; the bar stays
invisible during streaming via its existing opacity/pointer-events gating.
2026-05-29 20:41:51 -05:00
Brooklyn Nicholson
815f171f37 fix(desktop): stop streaming caret from shifting layout on completion
The streaming caret (::after on the running message's last child) was an
in-flow inline-block adding ~0.78em of inline width, which could wrap the
last line mid-stream; when the caret is removed on completion the line
un-wraps and reflows — the visible post-response layout shift. Net-zero its
inline advance with a compensating negative margin so it paints at the text
end without consuming layout width.
2026-05-29 20:40:25 -05:00
Brooklyn Nicholson
5e7a7f6a38 fix(desktop): resolve PortableGit for update checks + reserve titlebar tools space
- runGit() hardcoded spawn('git'), which ENOENTs on fresh installer-driven
  Windows installs (git is PortableGit under %LOCALAPPDATA%\hermes\git, never
  on PATH) — so "Check for updates" failed with "Couldn't check for updates".
  Add resolveGitBinary() mirroring findGitBash (PortableGit → Git-for-Windows
  → PATH) and use it in runGit.
- PageSearchShell rendered a full-width search input in the titlebar row, so
  on Windows its right edge slid under the fixed top-right tools + native
  window controls. Reserve that footprint via --titlebar-tools-* vars.
2026-05-29 20:40:25 -05:00
Brooklyn Nicholson
8f29ad23c2 feat(desktop): live elapsed timer on install bootstrap steps
The first-launch install overlay showed a static "Installing" with no
motion, so long steps (notably the repo clone) looked frozen. Stamp each
stage's start time on the running transition and tick once a second so the
active step shows live elapsed (e.g. "Installing · 1:23"), plus elapsed on
the overall current-step line. Completed steps keep their final duration.
2026-05-29 20:40:25 -05:00
Brooklyn Nicholson
b86043834f Merge origin/main into bb/gui
Adopt main's web/ dashboard layout (apps/dashboard removed; web/ restored),
keep bb/gui's desktop CLI/update workspace handling, and preserve main's
mTLS/URL validation MCP changes. Dashboard backend is aligned to main with
only the intended STT provider quarantine/ElevenLabs override reapplied.
2026-05-29 20:40:08 -05:00
brooklyn!
de8fed32fd
feat(desktop): lead onboarding with Nous Portal + fix fresh-install detection (#34970)
- Feature Nous Portal as the primary onboarding card (Recommended tag,
  app logo, single pitch line); collapse other OAuth providers behind an
  "Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
  move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
  error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
  an empty credential or only implicit Bedrock/IAM, so fresh installs never
  saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
  Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
  via the /proc kernel marker.
2026-05-29 17:00:45 -05:00
emozilla
bebf1b7e01 fix(desktop): branch-pin the CLI manual-update command card
The 'Update from your terminal' card (shown to CLI installs with no staged
updater) hardcoded bare `hermes update` — which defaults to main and would
switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for
the GUI button, leaked into the card's copy text.

Resolve the checkout's current branch and show `hermes update --branch
<current>` for non-main checkouts; keep it bare for main so the card stays
clean. Best-effort: bare fallback if branch detection fails. Matches the
GUI button + installer --update contract; bare terminal/bot/TUI update
paths still default to main, unchanged.
2026-05-29 02:54:30 -04:00
emozilla
3d8c285054 update test 2026-05-29 02:49:28 -04:00
emozilla
48db64c846 update test 2026-05-29 02:02:58 -04:00
emozilla
ce31ec09b9 fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error
A user who installed via the CLI (irm|iex / install.sh) then ran
`hermes desktop` has no staged hermes-setup.exe, so clicking Update
in-app hit resolveUpdaterBinary()=null and showed a misleading error
('re-run the Hermes installer') with a Try-again button that could
never succeed — a dead loop for a perfectly valid install.

Treat the no-updater case as an intentional outcome, not a failure:
- main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' }
  (no throw, no 'error' stage) when no updater binary exists.
- New 'manual' update stage + apply-state.command thread the command to the UI.
- updates-overlay ManualView: a polished terminal-native card with the
  exact command and a copy button, framed as the correct path for a CLI
  user rather than an error.

GUI-installer users are unaffected — hermes-setup.exe present => seamless
auto-update runs as before. Zero new process orchestration; can't fail
the update demo.
2026-05-29 01:53:43 -04:00
emozilla
006136c4ab update test 2026-05-29 01:26:45 -04:00
emozilla
25488de4ba fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign)
The unpacked Hermes.exe showed the stock Electron icon + name in the
taskbar because build.win.signAndEditExecutable=false disables BOTH
electron-builder's signing AND its rcedit metadata/icon stamping. That
flag is load-bearing: enabling it re-triggers signtool -> winCodeSign,
whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end).

Decouple identity-stamping from signing entirely: after npm run pack,
run rcedit ourselves on the produced exe.
- Add rcedit as a direct devDependency of apps/desktop (the transitive
  electron-winstaller copy is fragile).
- apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls
  rcedit's named export to set icon + ProductName/FileDescription/
  CompanyName. Node builds argv natively — avoids the PowerShell->exe
  ->JSON double-escaping that broke the app-builder rcedit path.
- install.ps1 Set-DesktopExeIdentity invokes the script after the build,
  before shortcuts. Best-effort: failure keeps the stock icon, never
  fails the install. rcedit is a pure PE editor — no signtool, no
  winCodeSign, no symlinks.

Verified locally: stamping a copy of the built Hermes.exe embeds the
32x32 icon and sets ProductName=Hermes.

Also fix update-path success-screen flash: in update mode the installer
hands off + exits in ~600ms, so don't route to the 'launch Hermes'
success view (it flashed before the window closed).
2026-05-29 00:50:14 -04:00
emozilla
aeebe1afa7 test update 2026-05-29 00:25:16 -04:00
emozilla
71d64880d9 fix(installer): pass --branch to hermes update in the --update flow
The install is a detached-HEAD checkout of a pinned commit. Without
--branch, 'hermes update' fell back to its default (main) and switched
the checkout to main — a divergent branch that lacks the desktop CLI
command — so the update targeted the wrong branch and the rebuild stage
failed with 'invalid choice: desktop'.

Thread BUILD_PIN_BRANCH (the branch this installer was built against,
and the same branch the desktop detected the update on) into
'hermes update --branch <b>' so update + rebuild stay on-branch.
2026-05-29 00:11:14 -04:00
emozilla
be663d36a5 test update 2026-05-29 00:06:10 -04:00
emozilla
6381e70448 feat(installer): drive in-app updates through the Tauri installer
Converge update on the same principle as bootstrap: one driver owns all
repo mutation. The desktop becomes a pure consumer that hands off to
Hermes-Setup.exe --update instead of re-implementing git/pip in Electron.

- hermes desktop --build-only: build without launching, so the installer
  owns the post-update launch (CLI keeps build logic single-sourced).
- Installer AppMode {Install,Update} from argv; get_mode exposed to the UI.
- Installer self-copies to HERMES_HOME/hermes-setup.exe on install success
  (no-op guard during --update re-invocation to avoid the locked-exe copy).
- Installer --update flow (update.rs): wait for the desktop to release the
  venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other),
  then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses
  the bootstrap event channel + progress UI via a synthetic two-stage manifest.
- Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip
  removed) -> thin handoff: spawn updater, app.quit() to free the shim.
  Detection (checkUpdates, commit changelog, behind-count) kept intact.
- install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe
  (never bare 'hermes desktop', which would rebuild every launch).
2026-05-28 23:48:21 -04:00
emozilla
060c4f64a8 fix(desktop): signAndEditExecutable=false to skip signtool path entirely
After reading app-builder-lib/winPackager.js line 216 + 231 directly:
signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits
both signApp() (which signs Hermes.exe + every shouldSignFile match
including bundled prebuilds) AND createTransformerForExtraFiles().
None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the
winCodeSign download — that happens before they're consulted.

What we lose: rcedit also runs through signAndEditResources, so
disabling this drops PE metadata (file properties showing 'Hermes' /
'Nous Research' / file description). Cost is real but bounded:
  * Hermes.exe filename, icon, asar contents, app identity intact
  * Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE
    description) — minor downgrade
  * Start menu, taskbar, window title all work normally
  * SmartScreen will warn once (unsigned, same as before)

When the cert lands, flip signAndEditExecutable back to default true,
both signing AND rcedit return, PE metadata is restored.

Removes the no-op sign function (build-noop-sign.cjs) since
signAndEditExecutable=false prevents signtool from being invoked at
all — the custom hook never gets called either.
2026-05-28 13:14:23 -04:00
emozilla
91bf5ee6b7 fix(desktop): use no-op sign function instead of sign=null
VM run 6 still hit the symlink crash even with signtoolOptions.sign=null.
electron-builder 26.8.1 treats null as 'use the default signtool path'
rather than 'skip signing', so the winCodeSign fetch + extraction still
fired for the bundled prebuild re-sign.

The Electron docs (electronjs.org/docs/latest/tutorial/code-signing)
make it clear signing is OPTIONAL and unsigned apps work fine — users
just see SmartScreen on first launch. The electron-builder mechanism
for 'don't actually sign anything' is to supply a custom sign function
(via signtoolOptions.sign: '<path-to-cjs-module>') that resolves
without invoking signtool.

build-noop-sign.cjs is that module — a 5-line async function that
returns undefined. electron-builder calls it for every binary it would
have signed, gets back a resolved promise, and considers each binary
'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash.

When Nous's cert arrives, replace this file with a real signing hook
(@electron/windows-sign-based or a direct signtool invocation). The
architecture's signing-ready and the cutover is a one-file edit.
2026-05-28 12:59:14 -04:00
emozilla
3387b8df58 fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract
VM run 5 diagnosis: the pre-extract from 3b29e65c1 ran (extracted 83
files, 24MB) but produced ZERO files at the expected sentinel path
'/winCodeSign-2.6.0/windows-10/x64/signtool.exe'.

Cause: the .7z archive's root entries are 'windows-10/', 'darwin/',
'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with
'-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at
$cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory
nesting wrong from the start.

And then we observed: electron-builder downloads winCodeSign-2.6.0.7z
under a random numeric filename ('384387955.7z') regardless of what's
already extracted in the parent dir. The cache key isn't the dirname;
it's content-addressed. So the pre-extract approach was doomed even
if the path nesting had been right.

Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's
win build config. electron-builder honors this and skips the bundled-
prebuild signing entirely — no signtool invocation, no winCodeSign
fetch, no symlink-privilege crash. The previous failures all stemmed
from electron-builder pre-signing node-pty's bundled .exes
(winpty-agent.exe, OpenConsole.exe) which are already author-signed
upstream; re-signing with our nonexistent cert was overwriting good
sigs with nothing useful anyway.

Cost: when we DO get a real cert later, we'll add it back with the
sign function pointing at the cert chain. Until then, all-null is
the correct config and unblocks every non-admin Windows user.

Removed Initialize-ElectronBuilderCache (the dead pre-extract).
Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env
vars as belt-and-suspenders against a future electron-builder
change that might revive cert auto-discovery.
2026-05-28 11:42:40 -04:00
emozilla
17edb1db2b fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line
Diagnosing the second VM failure was impossible because bootstrap-installer.log
contained only the 'starting' banner. Two causes:

1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the
   floor under the default INFO env-filter.

2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only
   emitted Tauri events to the frontend; they never tee'd to the log file
   at all. When the failure route mounts, the Tauri event stream is the
   only place the script output lived, and it gets discarded.

3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event()
   were also Tauri-only — so even the 'which stage failed' frame never
   reached the log.

Fixes:
  * emit_log() → tracing::info!
  * Sink callbacks tee stdout to info!, stderr to warn!, with stage label
    as a structured field for grep'ability
  * emit_event() now matches on the variant and logs each lifecycle frame
    at the right level: Failed → tracing::error!, others → info!

Result: a failing install leaves a complete forensic trail in
bootstrap-installer.log — manifest stage list, every install.ps1
stdout/stderr line tagged by stage, the stage transitions, and the
final error. Same path as before so nothing the user does changes.
2026-05-28 10:48:43 -04:00
emozilla
0a079f7321 fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop
Three bugs found in the first VM end-to-end test:

1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the
   manifest came back with the 14-stage list (no desktop stage), the
   UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to
   both the manifest fetch and the per-stage runs — install.ps1 gates
   the desktop stage's inclusion on the flag.

2. The Success screen's Launch button silently swallowed the Tauri
   error when no Hermes.exe existed (e.g. Stage-Desktop was skipped).
   Wire the error through to inline UI with an alert callout, so the
   user gets actionable text ('Hermes.exe missing, run hermes desktop
   from a terminal') instead of an unresponsive button.

3. The Success screen tells users to run 'hermes desktop' from a
   terminal but the CLI only accepted 'hermes gui' — invalid choice
   for 'desktop'. Rename the subcommand canonically to 'desktop' with
   'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets
   used by session-flag arg parsing + logging-mode probe so both names
   route to the same logic.
2026-05-28 02:42:33 -04:00
emozilla
8eedb50bce feat(installer): Tauri bootstrap installer for first-time onboarding
Hermes-Setup.exe is a small signed Rust+Tauri binary that drives
scripts/install.ps1 stage-by-stage with a native UI matching the
desktop's design language. Replaces the chicken-and-egg pattern of
shipping a 200MB Electron app whose first launch existed only to
run install.ps1.

The architecture:

  Rust backend (src-tauri/):
    bootstrap.rs        orchestrator -- Tauri commands, stage iteration
    install_script.rs   resolve install.ps1 (dev checkout, cache, GitHub raw)
    powershell.rs       spawn powershell, line-stream stdout/stderr, parse JSON
    events.rs           BootstrapEvent types -- mirror bootstrap-runner.cjs
    paths.rs            HERMES_HOME resolution + tracing log setup
    build.rs            bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from
                        'git rev-parse HEAD' at compile time

  React frontend (src/):
    Tauri webview rendering 4 screens (welcome / progress / success /
    failure), driven by nanostores subscribing to the Rust event stream.
    Visual layer reuses the desktop's styles.css wholesale via @import
    so the installer and desktop never drift visually.

  Distribution:
    targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The
    raw target/release/Hermes-Setup.exe IS the artifact on Windows;
    .dmg + .app on macOS; AppImage on Linux. One file, double-click,
    no installer-installing-an-installer pattern.

  Compile-time pinning:
    build.rs reads 'git rev-parse HEAD' and emits
    cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>.
    bootstrap.rs's option_env!() picks these up so the binary fetches
    install.ps1 from the exact SHA it was tested against. CI / release
    builds can override via HERMES_BUILD_PIN_COMMIT env var.

  Windows manifest:
    hermes-setup.manifest declares level='asInvoker' so the
    productName 'Hermes Setup' doesn't trip Windows's installer-
    detection heuristic and refuse to launch without elevation.
    Also declares PerMonitorV2 DPI + UTF-8 active code page + Common
    Controls v6.

Limitations of this initial version:

  * No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe
    ('More info -> Run anyway'). The downstream binaries it produces
    (Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and
    therefore don't carry MOTW, so they launch without SmartScreen
    intervention. Cert procurement tracked separately.

  * macOS and Linux build paths defined but untested -- Windows-only V1.
2026-05-28 02:23:13 -04:00
Brooklyn Nicholson
02d26981d3 Merge origin/main into bb/gui 2026-05-27 21:22:14 -05:00
Jeffrey Quesnelle
e1338265c1 Merge origin/main into bb/gui (2026-05-24)
Bring 313 commits of upstream main into the bb/gui dashboard
refactor branch.  Eight conflicts resolved by hand, the rest
auto-merged.  One missing class (_StreamErrorEvent) restored from
main after the auto-merger dropped it.

Conflict resolutions:

  apps/dashboard/README.md          take HEAD: main's text described
                                    the pre-rename web/ layout that
                                    bb/gui refactored away.

  apps/dashboard/package.json       combine: keep HEAD's @hermes/shared
                                    workspace dep, take main's
                                    @nous-research/ui 0.16.0 bump.

  apps/dashboard/package-lock.json  regenerate via
                                    npm install --package-lock-only.
                                    Root lock also regenerated; only
                                    dashboard and apps/desktop entries
                                    moved (apps/desktop version 0.0.1 →
                                    0.0.2 to match bb/gui's
                                    package.json bump).

  apps/dashboard/src/pages/         take main (4 hunks): text-xs
    EnvPage.tsx                     replaces text-[0.65rem] per the
                                    typography rule HEAD's own README
                                    documents.

  hermes_cli/gateway.py             take main (2 hunks): Discord
                                    setup metadata moved to plugin
                                    (architectural migration); s6
                                    service-manager dispatch helpers
                                    additive.

  hermes_cli/main.py                combine (2 hunks): take main's
                                    Termux-aware
                                    _sync_bundled_skills_for_startup;
                                    combine gui + portal subcommands
                                    in the known-subcommand list.

  hermes_cli/web_server.py          mixed (10 hunks):
                                    - take main on _PUBLIC_API_PATHS
                                      (bb/gui's own test asserts the
                                      rescan endpoint must require auth)
                                    - combine WS helpers: keep HEAD's
                                      _ws_client_label + main's
                                      Host/Origin guard + composing
                                      _ws_request_is_allowed
                                    - take HEAD's debug-level broadcast
                                      drop log (matches the comment
                                      "subscriber went away mid-send")
                                    - take main's _safe_plugin_api_relpath
                                      GHSA-5qr3-c538-wm9j fix and the
                                      paired discovery-time validation
                                    - take main's {name:path} route
                                      converter for plugin visibility

  tui_gateway/server.py             take main: PR #31379's verbose-
                                    args gating supersedes HEAD's
                                    unconditional args dump on
                                    tool.start.

Post-merge restoration:

  run_agent.py                      restored class _StreamErrorEvent
                                    (40 lines, from origin/main:288).
                                    Auto-merge silently dropped it,
                                    breaking imports in
                                    agent/codex_runtime.py and three
                                    test files
                                    (test_codex_xai_oauth_recovery.py,
                                    test_streaming.py).  Restored
                                    verbatim from main.

Sanity checks:

  * git diff --check / --cached --check: clean (no stray markers)
  * ast.parse + import on all touched .py files: clean
  * targeted pytest on resolved files: 756 passed, 1 pre-existing
    Windows-curses failure unrelated to the merge
  * full pytest_parallel run: 105 files / 391 failures vs baseline
    98 files / 346.  Differential vs origin/bb/gui shows all 11
    "new" failure files come from main's added tests/code and
    reproduce identically against origin/main on the same Windows
    host (pure Windows path-separator / perms / git-bash issues
    in upstream tests, not merge regressions).  4 baseline
    failures fixed: 3 in test_codex_xai_oauth_recovery (the
    _StreamErrorEvent restoration), 1 each in test_pairing,
    test_runner_startup_failures, test_stream_consumer.
  * sentinel-token sweep on main's eight largest commits:
    every audited symbol present in the merged tree at expected
    counts (TTSProvider 61, NtfyAdapter 29, S6ServiceManager 70,
    install_bws 12, security_audit 16, register_image_gen_provider
    23, list_profile_gateways 22, DISCORD_FREE_RESPONSE_CHANNELS
    48, …).
  * byte-diff sweep: 30/30 sampled main-only-modified files
    byte-identical to origin/main; the four bb/gui-only files
    that drifted (i18n/types.ts, i18n/ru.ts, ThemeSwitcher.tsx,
    ToolCall.tsx) correctly absorbed main's web/ → apps/dashboard/
    edits through git's rename detection (main's added lines all
    present, removed lines all absent).
2026-05-25 00:39:46 -04:00
emozilla
3adb74269a bump gui version to 0.0.2 2026-05-22 21:23:58 -04:00
Brooklyn Nicholson
f6e6f00ff8 perf(desktop): useDeferredValue for streaming markdown so parses don't block input
Streamdown's per-Block parse cost grows with the live tail's length and
is unavoidable inside the block-memo pattern (industry standard, see
findings doc). The fix is to stop having that work block the main thread.

`<DeferStreamingText>` is a 12-line wrapper that reads message-part state
via `useMessagePartText`, runs it through `useDeferredValue`, and
re-publishes via assistant-ui's `<TextMessagePartProvider>`. The inner
`<StreamdownTextPrimitive>` reads the deferred value through the normal
`useMessagePartText` hook — no fork, no internal-path imports, fully on
assistant-ui's public API. React's concurrent scheduler then:

  - abandons in-flight deferred renders when a newer token arrives, so
    intermediate states get skipped under fast streams
  - deprioritises the markdown render when the main thread has urgent
    work (typing, scroll), so input stays responsive even while a
    100ms parse is queued

Streamdown already uses `useTransition` for its block-array setState;
this lifts the deferral up to the consumer boundary so it covers the
whole pipeline (preprocess → split → repair → parse → render).

A/B on the 34 MB session, 300 tokens at 50 tok/sec, markdown chunks
(four trials each, with the 33ms flush throttle on for both):

| | avgFps | p99 frame | LTs/5s | max LT | typing-while-stream p95 |
|---|---|---|---|---|---|
| pre  | 54.3 | 41 ms | 1.7 | 110 ms | ~17 ms |
| post | 58.5 | 31 ms | 2.0 | 117 ms | 14-18 ms |

Longtask count + max LT unchanged — useDeferredValue doesn't reduce
CPU, only its priority. The avgFps lift and p99 frame drop are the
proof that the existing CPU is no longer blocking 60 fps cadence. One
clean run logged MUTATIONS=0 — React skipped every intermediate text
state and only committed the final one (textbook deferred-value
behaviour).

The actually-reduce-CPU path is replacing the parser with a state
machine like Flowdown — left for a future PR; see
`apps/desktop/scripts/profile-typing-lag.md` for the full investigation.
2026-05-21 20:31:26 -05:00
Brooklyn Nicholson
7003df708c perf(desktop): floor assistant-text flush gap to 33ms for predictable batching
`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame`
only. The "at most one flush per frame" guarantee that gives you is fine
for fast streams (>~80 tok/sec) where multiple tokens arrive within a
single frame, but breaks down at typical LLM token rates (30-80 tok/sec)
where each token arrives slower than the rAF cadence and triggers its
own React commit + Streamdown markdown re-parse.

Track `lastFlushAt` and require at least 33 ms between two flushes.
React 18+ auto-batching probabilistically already collapsed some of
these, but the floor makes it deterministic.

A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):

| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |

`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms,
which is the expected signature of a deterministic floor. Doesn't fully
solve the user's perceived hitches — Streamdown's per-Block parse cost
when the last block grows past ~2 k chars is still the elephant — but
it consistently shaves the worst-case longtask and makes the streaming
cadence visibly steadier.

Also threads a matching `flushMinMs` option through the synthetic
stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs`
so the harness can A/B both regimes without spending LLM credits.

See `scripts/profile-typing-lag.md` for the full investigation.
2026-05-21 20:08:49 -05:00
Brooklyn Nicholson
ea510a7c02 perf(desktop): memoize MarkdownText plugins to stop churning Streamdown
The inline `plugins={{ math: mathPlugin, ...(isStreaming ? {} : { code }) }}`
on `<StreamdownTextPrimitive>` constructed a new object literal on every
parent render. That broke `<Streamdown>`'s outer memo and forced its
internal `rehypePlugins` / `remarkPlugins` array useMemos to rebuild,
which propagates a new identity into every `<Block>` and defeats Block's
memoization for stable historical blocks.

After memoizing on `[isStreaming]` (the only real dimension of variance),
CPU profile during a 5 s synthetic stream on the 34 MB session shows
`parser` self-time dropping out of the top 10, `compile` cut roughly in
half, and `bn$1` / `m$1` (micromark internals) leaving the top entries.

Doesn't move the visible longtask count on its own — Streamdown's
per-Block parse cost still dominates whenever the last block's content
changes — but it removes a class of unnecessary re-parses for historical
blocks during streaming. See `scripts/profile-typing-lag.md` for the
full investigation.
2026-05-21 20:08:32 -05:00
Brooklyn Nicholson
3143f79b8f perf(desktop): memo FadeText so it skips re-renders when text unchanged
FadeText is used 110+ times inside `tool-fallback.tsx` on a tool-heavy
thread. During streaming each parent re-render previously triggered the
component's `useEffect([children])`, which forced a `scrollWidth` layout
read even when the title text was unchanged. The `useResizeObserver` was
already covering the genuine resize case, so that effect was strictly
redundant work.

Drops the effect and wraps the component in `React.memo` with a custom
comparator that field-compares `className`, `fadeWidth`, and `style`,
plus identity-compares `children` (scalar fast-path; correct for JSX
nodes too since a new node should force a re-render).

Verified via temporary render counter on the 34 MB
`session_20260514_215353_fe0ac8` thread (110 FadeText instances): a
2 s synthetic stream went from ~11k FadeText render calls to 122 —
roughly one render per truly-new instance instead of one per parent
commit per instance.

Doesn't move the longtask needle on its own (Streamdown's markdown
re-parse dwarfs it) but eliminates a steady CPU floor and a class of
forced layouts during streaming. Profile-typing-lag.md documents the
full investigation, including the remaining Streamdown cost as the
real source of the perceived "5 fps moment" hitches.
2026-05-21 19:38:40 -05:00
Brooklyn Nicholson
99f2a9503c chore(desktop): synthetic-stream perf harness + scripts
Drops the React `<Profiler>` approach (no-op because Vite is currently
serving the production React build) in favor of an externally-observable
measurement stack: rAF frame intervals, `PerformanceObserver({entryTypes:
['longtask']})`, and a `MutationObserver` on the live streaming message.

Adds a synthetic stream driver — `window.__PERF_DRIVE__.stream({...})` —
that pushes tokens through the live `$messages` atom at a controlled rate,
so the assistant-ui runtime, incremental repository, and Streamdown
markdown pipeline see the same workload they'd see during a real LLM
stream, without the LLM cost.

The driver lives in `src/app/chat/perf-probe.tsx`; `main.tsx` side-imports
it under `import.meta.env.MODE !== 'production'` so it tree-shakes out of
prod builds. (Using `MODE` rather than `DEV` because our Vite setup
currently reports `DEV=false` even under `vite dev` — see the dev-build
note in `profile-typing-lag.md`.)

Scripts:
  - measure-synthetic-stream.mjs  drive synthetic + record frame/longtask/mutation
  - profile-synth-stream.mjs      CPU profile + top self-time during synthetic
  - measure-real-stream.mjs       same harness, real LLM stream
  - profile-real-stream.mjs       CPU profile bracketing the real stream window
  - eval.mjs / reload.mjs         small CDP helpers

A real-LLM measurement on Cloud Shadows (gpt-4o-mini, 39 s window) showed
12 longtasks in the same 75-127 ms range the synthetic predicted, so the
synthetic is a faithful proxy.
2026-05-21 19:38:26 -05:00
Brooklyn Nicholson
5abf89ddd1 Revert "Revert "perf(desktop): use textContent for trigger precondition""
This reverts commit 0739588f48.
2026-05-21 18:57:18 -05:00
Brooklyn Nicholson
563ad23853 Revert "Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer""
This reverts commit b7b378e3a4.
2026-05-21 18:57:18 -05:00
Brooklyn Nicholson
b7b378e3a4 Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer"
This reverts commit bff1b3261d.
2026-05-21 18:54:32 -05:00
Brooklyn Nicholson
493dd5b660 Revert "perf(desktop): cut FadeText forced layouts during streaming"
This reverts commit 88e7d7537c.
2026-05-21 18:54:24 -05:00
Brooklyn Nicholson
0739588f48 Revert "perf(desktop): use textContent for trigger precondition"
This reverts commit a6a78ff08a.
2026-05-21 18:54:24 -05:00
Brooklyn Nicholson
a6a78ff08a perf(desktop): use textContent for trigger precondition
Replace composerPlainText() call inside refreshTrigger's no-trigger
fast-bail with a textContent check. textContent is a browser-native
flat traversal; composerPlainText walks recursively with chip-aware
logic. We only need to know if @ or / appears; either way the trigger
char will be in textContent because chips contain @ in their refText.

Profile shows composerPlainText was ~18ms self over a 12s typing-during-
stream window, called from refreshTrigger on every keystroke. Most of
that was the precondition check (the trigger detection path is the
slow path but only runs when a trigger char is present).
2026-05-21 18:05:43 -05:00
Brooklyn Nicholson
e529694919 perf(desktop): rate-limit thread auto-pin during streaming
Follow-up to the Enter-jump fix. The first version did a synchronous
re-pin loop inside the on-scroll handler when the browser clamped our
`scrollTop = scrollHeight` write short of the new bottom; that gave a
tight 4 px visible jump on Enter, but during streaming the
ResizeObserver fires many times per second as content grows, and each
RO callback re-entered the pin loop. CPU profile showed
`Virtualizer.getMaxScrollOffset` climbing to 22 ms self over a typing-
during-streaming window — the sync re-pin path was paying tanstack-
virtual's recompute cost ~3× per token.

Re-architect:

- RO callback coalesces to one pin per animation frame. Streaming-rate
  RO bursts now cost the same as a single per-frame pin.
- The on-scroll programmatic-counter guard remains (it's what prevents
  the false-disarm bug when the browser clamps a write). It no longer
  does sync re-pins; the next RO/rAF will catch up.
- The useLayoutEffect on groupCount (the path that fires on user
  submit / new turn arrival) ALSO schedules one rAF pin in addition to
  the synchronous pin. This catches the case where React mounts the
  new message in a second commit (after our layout effect ran), which
  grows scrollHeight again. Two pins instead of a tight loop, paid only
  once per turn change.

Net effect on the Cloud Shadows long thread:

  enter-jump transient:   12–20 px for 1 frame (was 49 px permanent)
  CPU during stream+type: `getMaxScrollOffset` dropped out of top-5
                          self-time list
  typing-during-stream:   p50 ~10 ms paint, p99 ~20 ms (1 frame),
                          occasional 40 ms+ outliers during burst
                          token arrivals

Also adds scripts/profile-long-stream.mjs: 20-second streaming profile
with per-500ms FPS histogram + content-length tracking, so we can see
whether streaming render cost grows with message length (it doesn't —
sustained 60 fps).
2026-05-21 18:02:26 -05:00
Brooklyn Nicholson
a7e6a4fc0b perf(desktop): fix "Enter jumps up" on long threads
User reported: after pressing Enter on a long thread, the view jumps up
— the just-submitted message disappears below the fold. Confirmed via
apps/desktop/scripts/measure-jump.mjs:

  before:  distFromBottom 0 → 49.5px, sticks there permanently
  after:   distFromBottom 0 → ~0 (worst case 4px for one frame)

Root cause in useThreadScrollAnchor (thread-virtualizer.tsx):

1. The sticky-bottom logic disarmed on any scroll event where
   `scrollTop < lastTopRef.current`. That check can't distinguish a
   user scrolling up from a programmatic `pinToBottom` write that
   the browser clamped short of bottom (because content also grew in
   the same frame, so `scrollTop = scrollHeight` lands at
   `scrollHeight - clientHeight` for the OLD scrollHeight, which is
   now below the NEW scrollHeight). Result: sticky-bottom disarmed
   permanently on the user's first submit.

2. There was no synchronous pin tied to React's commit phase. By the
   time the ResizeObserver fired and re-pinned, the user had already
   seen ~50ms of "message below the fold" — visually that reads as the
   view jumping up.

Fix:

- `programmaticScrollPendingRef` counter tracks scroll events we
  expect to be ours (one per `pinToBottom` write). The scroll handler
  skips the disarm check when consuming a pending tick, keeps the
  arm bit true, and re-pins synchronously if the browser clamped us
  short of bottom. A depth cap (8) breaks runaway loops in
  pathological streaming-burst layouts.

- `useLayoutEffect` on `groupCount` increase pins BEFORE the browser
  paints, eliminating the visible ~50ms window between optimistic
  user-message insert and the RO/scroll-event chain firing.

Verified on the long Cloud Shadows thread (7-8 turns, ~11k px tall):
all three repro runs now hold within 0–4 px of bottom across the
post-Enter transition. Submit latency unchanged (paint 77–107 ms),
streaming-typing latency unchanged.

Also adds three debug harnesses:
  - measure-jump.mjs   — sample thread scroll across Enter
  - probe-thread.mjs   — dump current thread / scroll state
  - diag-jump.mjs      — intercept scrollTop + RO + mutations across Enter
2026-05-21 17:45:55 -05:00
Brooklyn Nicholson
e18c233c1e docs(desktop): correct leak-typing numbers on a real session
Re-ran the leak harness on a populated session (Phaser thread) for both
unpatched and patched builds. The original 'listener leak' was transient
warm-up cost, not a steady-state leak — both versions show 0 listener
growth/round in steady state.

The load-bearing number is forced layouts per character:
  unpatched (HEAD~2):  7.02 layouts/char
  patched   (HEAD):    2.35 layouts/char  (3× fewer)

The patches reduce per-char forced-layout work to Blink's natural floor.
Document node count and heap are flat in both builds.
2026-05-21 17:14:21 -05:00
Brooklyn Nicholson
a1b8631176 chore(desktop): drop diag scratch scripts no longer needed 2026-05-21 16:11:46 -05:00
Brooklyn Nicholson
88e7d7537c perf(desktop): cut FadeText forced layouts during streaming
The slowest user-felt path is typing into the composer while the
assistant is streaming. Profile (scripts/profile-under-stream.mjs):

  FadeText measureOverflow self time:  35.8 ms → 18.1 ms  (-50%)
  total active CPU during 7s window:   ~150 ms → ~50 ms

Two changes in src/components/ui/fade-text.tsx:

1. Drop the `useEffect([children])` that re-ran `measureOverflow`
   (reads scrollWidth + clientWidth — forced layout) on every parent
   re-render. `useResizeObserver` already fires the same callback on
   mount and whenever the host span's box size changes; that covers
   the only case where overflow state can legitimately change. The
   previous explicit useEffect was a forced-layout flush on every
   parent render, which during streaming meant every token tick.

2. Wrap the component in `memo` with a custom comparator that
   short-circuits the entire render when scalar string `children` and
   the className/fadeWidth/style props are unchanged. The hot path
   was tool-fallback's title chips being re-rendered by parent
   streaming updates even though their text was stable; memo+
   comparator skips that.

Also adds two harness scripts under apps/desktop/scripts/:
  - latency-under-stream.mjs (key→paint latency while a turn streams)
  - profile-under-stream.mjs (CPU profile while a turn streams)

Updates profile-typing-lag.md with the streaming numbers and confirms
the Enter→paint submit path is already fast (≤320ms on the populated
session; the 2s "stall after Enter" the user noticed once was a
one-time cold-start, not reproducible at the UI layer).

I'd guess the felt jank in real use is fast-burst typing during a
long-form streaming reply (code blocks + markdown lists multiply the
per-token render cost). The CPU savings here scale linearly with
token volume.
2026-05-21 16:09:44 -05:00
Brooklyn Nicholson
bff1b3261d perf(desktop): cut per-keystroke layout + listener churn in chat composer
Empirical work via CDP harnesses under apps/desktop/scripts/ (see
profile-typing-lag.md):

  jsListeners growth (per round of 200 chars + GC):
    before: +35  (verified leak — listeners stuck after 1st trigger popover use)
    after:  +0

Four narrow edits in src/app/chat/composer/index.tsx:

1. Drop the per-keystroke `editorRef.current.scrollHeight` read used to
   decide composer expansion. Replace with `draft.length > 60` heuristic;
   the existing ResizeObserver still catches edge cases. `scrollHeight`
   is a forced-layout call and was firing on every char until the first
   wrap.

2. Bucket measured composer height to 8px before writing
   `--composer-measured-height` / `--composer-surface-measured-height`
   on `documentElement`. Without this, the editor grows ~1px per char,
   setProperty fires every keystroke, computed style is invalidated tree-
   wide.

3. Remove the dead `$composerDraft` two-way sync. Nothing outside the
   composer subscribed to that atom (verified via grep). Two useEffects
   on `[draft]` were pushing draft→atom and atom→aui per keystroke for
   no consumer. Also drop the per-keystroke
   `reconcileComposerTerminalSelections` call; it was pruning stale
   labels for `terminalContextBlocksFromDraft`, but that helper already
   ignores labels not in the current submitted text, so pruning per
   keystroke was just bookkeeping.

4. `refreshTrigger` fast-bails when the draft contains neither `@` nor
   `/`. Previously `textBeforeCaret(editor)` ran on every input/keyup
   regardless; `range.toString()` inside is O(n) over draft length.

Synthetic typing latency p50/p90/p99 is similar before vs after on a
freshly-loaded session (Blink can already handle ~30cps typing into a
contentEditable on its own); the real win is the listener leak being
gone and the global computed-style invalidations dropping ~8× when the
composer is sitting at a fixed height row.

The `Enter → stall` follow-up (see profile-typing-lag.md §"Submit /
TTFT stall") is unmeasured here — needs a throwaway session because
the harness fires a real prompt. Not blocking this commit.
2026-05-21 15:45:01 -05:00