hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-19 10:02:16 +00:00

Author	SHA1	Message	Date
kshitij	832d5967f8	Merge pull request #48262 from kshitijk4poor/salvage-32445 feat(memory): improve OpenViking setup UX (salvage #32445)	2026-06-18 11:34:11 +05:30
kshitijk4poor	1153b42b24	Merge upstream/main into OpenViking setup-UX (salvage #32445) Resolves conflicts from the OpenViking churn that merged after #32445 was opened (#48042/#47662 session-switch + write hardening, #47311/#47973): - plugins/memory/openviking/__init__.py: keep both __init__ field groups (the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down). - tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new setup-validation tests and main's session-switch/concurrency tests (disjoint additions to the same region). Two fixes layered while reconciling (contributor work otherwise preserved): - Restore the merged tenant-header contract (#22414/#21232). The PR had changed _VikingClient defaults to '' and made empty account/user OMIT the tenant headers; main's contract is that empty falls back to 'default' and the X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them). Reverted the constructor to 'account or os.environ.get(..., "default")' and updated the two PR tests that asserted the omit-when-empty behavior. - Close a secret-file TOCTOU in the setup writers. _write_env_vars and _write_ovcli_config wrote the api_key/root_api_key file and chmod 0600 AFTERWARD, leaving a world-readable window on newly-created files. Added _precreate_secret_file() to create with 0600 before any secret bytes land.	2026-06-18 11:28:51 +05:30
Ben Barclay	c661634537	fix(dashboard): stream file uploads via multipart instead of base64 JSON (NS-501) (#47663) * fix(dashboard): stream file uploads via multipart instead of base64 JSON The dashboard file manager uploaded files (including backup/restore zip archives) by reading them client-side with FileReader.readAsDataURL and POSTing a base64 data URL inside a JSON body to /api/files/upload. For a large backup this (a) inflates the payload ~33%, (b) buffers the whole file plus its decoded copy in memory, and (c) reliably trips an upstream proxy body-size/timeout limit, surfacing as a 502 with the upload appearing to hang indefinitely (NS-501). Dashboard-only hosted users have no shell fallback to place the archive, so backup restore was unusable. Add a streaming multipart endpoint POST /api/files/upload-stream (UploadFile + Form) that reads the request body in 1 MiB chunks straight to a sibling temp file, enforces the existing 100 MB size cap as it streams (413 on overflow, before buffering the whole file), and atomically renames into place so a partial/aborted/over-limit upload never clobbers an existing file. The frontend api.uploadFile now sends multipart/form-data (raw bytes, no base64, browser-set boundary) and FilesPage passes the File object directly; the dead readAsDataUrl helper is removed. The legacy base64 JSON endpoint stays for backward compat. FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in by fastapi itself, so it is added to the base deps, the [web] extra, and the tool.dashboard lazy-install set (kept in sync). 
Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB, over-limit 413 without clobbering + no temp-file leak, overwrite=false conflict, forced-root traversal containment); existing base64 tests still pass; web typecheck + vite build clean; and a real uvicorn server E2E (5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a 30 MB TestClient roundtrip confirm constant-memory streaming end to end. Reported via beta (NS-501). * build(deps): regenerate uv.lock for python-multipart (NS-501) CI ran uv lock --check / uv sync --locked which failed because the python-multipart dependency add was not reflected in uv.lock. Regenerate the lockfile (resolves to 0.0.20, matching the [web] extra pin) after merging current main.	2026-06-18 15:54:32 +10:00
Ben Barclay	9c3c5da356	fix(backup): hermes import never overwrites volatile gateway runtime state (NS-501) (#48243) Importing a backup wrote every file from the zip over the target home wholesale. On a hosted instance this clobbered gateway_state.json with the source machine's last recorded run/desired state — driving the container-boot reconciler (container_boot._read_desired_state, which only auto-starts a gateway whose state is "running") off stale/foreign state and leaving the gateway stuck "starting", disconnected from the Nous portal. Add _IMPORT_SKIP_NAMES (gateway_state.json, gateway.pid, cron.pid, gateway.lock, processes.json) and skip them by basename in run_import, so both the root profile and named profiles preserve the target's own runtime state. This mirrors what container_boot._STALE_RUNTIME_FILES already sweeps on every container boot, and protects against older backups that predate the backup-side exclusions. The import summary reports which files were preserved. This is the second half of NS-501 (filed separately as NS-508): the upload 502 was fixed in #47663; this fixes the import-breaks-the-instance half.	2026-06-18 15:27:45 +10:00
Ben Barclay	4440d77bf3	fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188) The install method (docker/git/pip/...) describes the running binary, but detect_install_method() read it from $HERMES_HOME/.install_method — a shared DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME (~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with a host-side Desktop/CLI install. When a containerized gateway and a host install share one $HERMES_HOME, the home-scoped stamp is a single slot describing two installs: the published image stamps 'docker' on every boot, the host install then reads 'docker' and the in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker container"). Reinstalling the Desktop app from the DMG doesn't help because the contaminated stamp is re-read every time. Fix (option 1 — code-scoped stamp): - detect_install_method() reads <install tree>/.install_method first (next to the running code, immune to the shared data dir). It falls back to the legacy $HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when not actually containerized — so already-poisoned shared homes self-heal. - stamp_install_method() writes the code-scoped stamp. - install.sh stamps $INSTALL_DIR instead of $HERMES_HOME. - Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time (inside the immutable block); stage2-hook.sh no longer writes the home stamp and proactively removes a stale 'docker' one to heal existing shared homes. Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp honored when containerized). Unstamped installs in generic containers still fall through to git/pip (preserves the #34397 fix).	2026-06-18 14:14:41 +10:00
Ben Barclay	c276b017ad	feat(relay): connector⇄gateway channel auth + signed-HTTP inbound receiver + enroll CLI (#48147) * feat(relay): authenticate the connector⇄gateway WS channel The relay gateway may be customer-managed and internet-exposed, so the connector⇄gateway channel is itself authenticated (distinct from the platform crypto the relay path sheds). Add gateway/relay/auth.py — a Python port of the connector's HMAC token + delivery-signature schemes (relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against the connector's compiled TypeScript via cross-language test vectors. Present an Authorization bearer on the /relay WS upgrade keyed by the per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET in env or config). The connector rejects an unauthenticated/invalid/ revoked upgrade with close 4401. * feat(relay): signed-HTTP inbound delivery receiver The connector delivers normalized inbound events to a tenant's gateway over a signed HTTP POST, not the outbound /relay WS: the connector instance owning a platform socket is generally not the instance a given gateway dialed out to, so inbound targets a tenant endpoint that may load-balance across gateway instances. Add gateway/relay/inbound_receiver.py — verifies x-relay-signature / x-relay-timestamp over the EXACT raw request bytes (re-serializing would break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces) against the per-tenant delivery key verify list within a 300s replay window, then dispatches messages to handle_message and interrupts to the interrupt handler. Wire it into the adapter lifecycle (start in connect() when a delivery key + bind port are configured, tear down in disconnect(); a purely-outbound dev gateway runs without it). 
Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from the platform-symbol scan but still banned from importing platform-crypto modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib. * feat(relay): hermes gateway enroll CLI Add the gateway half of zero-touch enrollment. `hermes gateway enroll` resolves a fresh Nous Portal access token (the tenant-proving identity), POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade; the per-tenant delivery key verifies signed inbound deliveries. Refuses under is_managed() (hosted installs get the secret stamped in by the orchestrator). Added as an 'enroll' subcommand on the existing gateway subparser — not a new top-level command. * docs(relay): inbound is signed HTTP, not WS; document channel auth Fix the stale contract: §3/§5 said inbound rode the WS socket (single- instance only, predates the multi-instance socket-ownership + channel-auth model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per- gateway WS-upgrade secret, per-tenant inbound delivery key) and how they differ from the platform crypto the relay path sheds. * test(relay): update build_gateway_parser callers for cmd_gateway_enroll The enroll subcommand added cmd_gateway_enroll as a required keyword-only arg to build_gateway_parser, but two existing parser-extraction tests still called it with only cmd_gateway/cmd_proxy — failing CI with TypeError. Thread the new handler through both call sites and add a test asserting `gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.	2026-06-18 12:01:54 +10:00
Ben Barclay	fcf6cb3d73	fix(docker): supervised gateway uses --replace to take over stale holder (NS-505) (#47555) * fix(docker): supervised gateway uses --replace to take over stale holder Inside the s6 container image the per-profile gateway service rendered a bare `hermes gateway run` (no --replace). When a gateway is started OUTSIDE s6 — a stray shell `hermes gateway run`, an agent action, or the Open WebUI helper (scripts/setup_open_webui.sh) — it grabs the per-HERMES_HOME PID lock first. The supervised slot then execs the bare `gateway run`, hits the "Another gateway instance is already running" guard, exits non-zero, and s6 restarts it: a restart loop that floods the log every ~12s and never binds. The container looks up but the gateway is permanently down, and dashboard-only users (no shell) cannot recover. Render the supervised run script as `gateway run --replace` so s6 is authoritative for its slot: it reaps the stale holder via the hardened takeover path (takeover marker + SIGTERM->SIGKILL-with-confirmation + scoped-lock cleanup in gateway/run.py) and binds. This matches the systemd service path, which already builds its argv with --replace (_build_gateway_argv / 'nohup hermes gateway run --replace'), and the intent already documented in _maybe_redirect_run_to_s6_supervision. The existing HERMES_S6_SUPERVISED_CHILD sentinel still prevents the run->start->run redirect recursion. Each profile is scoped to its own HERMES_HOME and s6 guarantees one supervised instance per slot, so there is no legitimate supervised sibling for --replace to clobber. Reported via beta (NS-505): gateway.log showed PID 17907 'running (manual process)' with the guard error repeating every ~12s on v2026.6.5. Adds a regression test asserting every gateway-run exec line in the rendered script (default + named profile, both privilege branches) carries --replace, and updates the existing render-script assertion. 
* fix(ci): remove stray .venv symlink committed into repo The PR's commit accidentally tracked a .venv symlink pointing at the developer's local venv (mode 120000 -> /home/ben/nous/hermes-agent/.venv). The CI test/e2e/build jobs run `uv venv` to create .venv and failed with `failed to create directory .venv: File exists (os error 17)` because the checkout already contained the symlink. All test shards aborted in <15s during setup, before any test ran. Untrack the symlink and add a bare `.venv` entry to .gitignore (the existing `.venv/` rule only matches a directory, so a symlink slipped through).	2026-06-18 10:49:02 +10:00
Teknium	9ba4615db2	fix(dump): show commit date instead of release date in hermes debug (#48104) * feat(mcp): raise default tool-call timeout 120s -> 300s Port from openai/codex#28234. Long-running MCP tools (web fetches, sandboxed builds, deep-research servers) routinely exceed 120s, causing spurious timeout failures. Codex bumped its default MCP tool timeout from 120 to 300 for the same reason. - _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server 'timeout' config override unchanged) - update test_default_timeout assertion - document the default in mcp-config-reference.md * fix(dump): show commit date instead of release date in hermes dump The version line in `hermes dump` (the top of the /debug report) appended the package release date in parentheses, which reads like a wall-clock "generated at" timestamp and confuses support triage. Replace it with the date the HEAD commit was actually made, resolved live via `git log -1 --format=%cd --date=short`, kept next to the commit SHA. On Docker/wheel installs with no .git the date resolves to '' and the suffix is simply omitted (the baked SHA still identifies the build).	2026-06-17 16:53:42 -07:00
brooklyn!	c1f9eb0ec4	fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091) * fix(desktop): resolve electronDist dynamically + self-heal blocked installs Supersedes the static-path approach (#48081) and the install-step self-heal (#48082) with a fix that removes the whole failure class instead of chasing each symptom. Three distinct faults converged into the June desktop-build outage; this closes all three. Root cause (the part #48081 left open — "Gap B"): build.electronDist was a static relative path in apps/desktop/package.json, but npm workspace hoisting is NOT deterministic — depending on the npm version and what else is installed, npm nests the workspace-only electron devDep under apps/desktop/node_modules/electron OR hoists it to the repo root. A static path matches only one layout, so a clean install intermittently fails with "The specified electronDist does not exist". #48081 re-pointed the path at the nested layout (correct today) but electron-builder reads electronDist STATICALLY, so any future hoist change silently breaks it again — only caught by a CI invariant, never self-corrected. Fix: - scripts/run-electron-builder.cjs: resolve electron the way Node's runtime does — require.resolve("electron/package.json") walks node_modules from the desktop project upward and finds electron wherever npm actually put it. The path can never drift out of sync with the install layout again, on any OS/npm version. * dist present -> pass -c.electronDist=<abs>/dist so electron-builder reuses the unpacked runtime (keeps the #38673 fast path that dodges the 26.8.x missing-binary re-unpack bug). * dist absent -> omit electronDist; electron-builder fetches Electron itself via @electron/get honoring electronVersion + ELECTRON_MIRROR. package.json: builder script now runs the wrapper; the static build.electronDist is removed (the resolver owns it). 
- main.py / install.sh / install.ps1: on a dependency-install failure where the electron package staged but its dist is missing (electron's install.js process.exit(1) on a blocked/throttled binary download — #47266/#47917/#48021), repopulate the dist via electron's downloader (canonical, then npmmirror.com) and CONTINUE to the build instead of aborting. npm runs postinstall LAST, so the only casualty is electron/dist; bailing here is what made the pack-time mirror self-heal unreachable on a blocked network. Hard-fail only when electron never staged at all (a genuine dependency error). - The pack-time mirror fallback now retries the build even when the pre-fetch can't populate the dist: the wrapper lets electron-builder download Electron itself via the mirror, so the retry is no longer a no-op (it was, when electronDist was a static path). The exact 40.10.2 pin (already on main) keeps the third mode — the native @electron-internal/extract-zip win32 binding that 40.10.3/40.10.4 ship without a published prebuild — from recurring. Tests: - test_desktop_electron_pin.py: replace the static-path-matches-lockfile invariant with contracts that there is no hardcoded electronDist to drift, the builder script routes through the resolver, and the resolver uses Node module resolution + injects -c.electronDist. - test_gui_command.py: install-failure self-heal continues to build; genuine (electron-never-staged) install failure still hard-fails; pack retries under the mirror even when the pre-fetch is blocked. Salvages/supersedes the overlapping community work in #48003 (sitkarev), #48012 (omegazheng), #48033 (james47kjv), and #48082. 
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com> Co-authored-by: omegazheng <zheng@omegasys.eu> Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com> * fix(desktop): narrow Electron self-heal to real missing-dist failures Follow-up on #48091 to remove the remaining misdiagnosis risk from the installer/build fallback path (#46785 concern): only take the Electron repair/retry path when Electron's package files are staged and dist is actually missing/corrupt. - main.py: add _electron_pkg_staged_missing_dist() and use it to gate install failure recovery; fail fast for unrelated npm install errors. - main.py/install.sh/install.ps1: run cache purge + retry only when dist is missing; do not retry unrelated tsc/vite/build failures under an Electron-specific narrative. - install.sh/install.ps1: tighten install-stage self-heal guard to require both package.json + install.js and missing dist. - tests: add coverage that install failure hard-fails when Electron dist already exists, and update retry test to reflect the tightened recovery condition. Validation: - Python tests: 64 passed - install.sh-related tests included in the run - Real mac build on this machine: - npm ci at repo root: success - cd apps/desktop && npm run pack: success - electron-builder packaged darwin arm64 and used custom unpacked Electron dist * refactor(desktop): trim electron self-heal helpers and comments Deduplicate mirror-retry into _try_redownload_electron_dist / shell counterparts; shorten wrapper and install-script commentary without changing recovery semantics. --------- Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com> Co-authored-by: omegazheng <zheng@omegasys.eu> Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>	2026-06-17 18:48:35 -05:00
Teknium	f8098c6b6f	fix(desktop): resolve electronDist to the actual electron install location (#48081) After the June lockfile regeneration (#46652) floated electron and reshuffled npm workspace hoisting, the desktop pack fails with "The specified electronDist does not exist". apps/desktop/package.json pointed electronDist at the repo root (../../node_modules/electron/dist) while npm now installs electron nested under apps/desktop/node_modules/electron. The two contradict, so a clean install can never package the app (Windows + macOS). - electronDist -> node_modules/electron/dist (resolved relative to apps/desktop, i.e. the workspace-local install npm actually produces). - hermes_cli/main.py, scripts/install.sh, scripts/install.ps1: add a runtime electron-dir resolver that prefers apps/desktop/node_modules/electron and falls back to the root hoist, so dist checks + the mirror re-download work under either npm layout. - patch-electron-builder-mac-binary.cjs: try the workspace-local Electron.app before the root hoist in the macOS binary-restore fallback (sibling site no PR touched). - test: assert build.electronDist resolves to where the lockfile installs electron, so a future hoist change (root <-> nested) can't silently break it. Salvages the overlapping work in #48003 (sitkarev), #48012 (omegazheng), and #48033 (james47kjv). Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com> Co-authored-by: omegazheng <zheng@omegasys.eu> Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>	2026-06-17 18:08:01 -05:00
kshitij	49d7481dfb	Merge pull request #47706 from NousResearch/fix/cli-login-deprecation-graceful fix(cli): deprecated `hermes login` fails gracefully for any provider	2026-06-17 23:02:32 +05:30
teknium1	aa6f77596b	chore: add AUTHOR_MAP entry for #47904 salvage	2026-06-17 09:49:46 -07:00
definitelynotguru	eaddeaf2e6	feat(xai): add grok-composer-2.5-fast to xAI OAuth model picker The model is callable via xAI OAuth but omitted from models.dev and /v1/models listings. Merge it into the curated xAI catalog so it appears in `hermes model` without requiring a custom model name.	2026-06-17 09:49:46 -07:00
Teknium	c6c8abbadb	refactor: remove agent-callable send_message tool (#47856) * feat(mcp): raise default tool-call timeout 120s -> 300s Port from openai/codex#28234. Long-running MCP tools (web fetches, sandboxed builds, deep-research servers) routinely exceed 120s, causing spurious timeout failures. Codex bumped its default MCP tool timeout from 120 to 300 for the same reason. - _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server 'timeout' config override unchanged) - update test_default_timeout assertion - document the default in mcp-config-reference.md * refactor: remove agent-callable send_message tool The agent should not decide on its own to fire off cross-platform messages or reactions. Outbound platform messaging is handled outside the agent loop — cron delivery, the gateway kanban notifier (dashboard-toggled), and the `hermes send` CLI. Removes the model-tool registration only; the send engine in send_message_tool.py (_send_to_platform, _send_via_adapter, _parse_target_ref, per-platform _send_* helpers) is kept intact for those non-agent callers. Drops the now-empty 'messaging' toolset and its `hermes tools` toggle. Yuanbao DM guidance now points at the native yb_send_dm tool.	2026-06-17 07:11:23 -07:00
Teknium	cbfa018aef	fix(auth): retry Codex device-code login on 429 with clear rate-limit message (#47860) The OpenAI device-code login (POST auth.openai.com/.../deviceauth/usercode) had no retry or 429 handling — a transient throttle from OpenAI surfaced as a bare "Device code request returned status 429" with no guidance, reading as a hard login failure. - Retry the device-code request with capped exponential backoff (honoring Retry-After), up to 4 attempts. - On persistent 429, raise a clear AuthError tagged CODEX_RATE_LIMITED_CODE (classified transient, not a credential problem) with a wait hint. - Apply the same 429 classification to the token-exchange step (same bug class). Unrelated to PR #47399 (Responses-API cache headers); this is the OAuth device-code path in hermes_cli/auth.py.	2026-06-17 05:48:35 -07:00
Shannon Sands	674e8b098a	Fix dashboard gateway profile scoping	2026-06-17 05:40:57 -07:00
Teknium	e48803daec	fix(gateway): defer macOS launchd reload when run inside the gateway tree (#47842) When refresh_launchd_plist_if_needed() runs from inside the gateway's own launchd process tree (agent-initiated self-update via the terminal tool), a direct launchctl bootout tears down the service's process group — including the CLI doing the refresh — before the follow-up bootstrap can run. The gateway is left unloaded and KeepAlive can't revive it (#43842). Detect in-service execution via gateway.status.get_running_pid() + _is_pid_ancestor_of_current_process(), and delegate the bootout->bootstrap to a detached (start_new_session=True) helper that survives the process-group teardown. The normal out-of-tree CLI path is unchanged. Fixes #43842.	2026-06-17 05:19:21 -07:00
kshitijk4poor	a7ec334448	fix(cli): deprecated `hermes login` fails gracefully for any provider `hermes login` was removed in favor of `hermes auth` / `hermes model`, but the subparser still validated `--provider` against a hardcoded choices list (nous, openai-codex, xai-oauth). Running `hermes login --provider anthropic` therefore crashed in argparse with `invalid choice: 'anthropic'` before the deprecation handler could print the redirect to `hermes model` — so a user trying to authenticate a perfectly valid provider just saw a hard error and assumed the feature was broken rather than relocated. - Drop the restrictive `choices=` so every `--provider` value reaches the deprecation handler (which ignores the value and prints guidance). - Omit the subparser `help=` kwarg so the dead command no longer advertises itself in `hermes --help` (#24756). Avoids the `==SUPPRESS==` placeholder leak that `help=argparse.SUPPRESS` emits for a top-level subparser on 3.12+. - `hermes login [--flags]` still reaches the actionable deprecation message for old scripts/aliases; `hermes login --help` shows the redirect. Picks up the intent of the inactivity-closed #24902, rebased onto the post-refactor parser location (hermes_cli/subcommands/login.py) and extended to fix the whole bug class (any provider value), not just hiding from --help. Tests: parametrized provider acceptance + help-suppression (no SUPPRESS leak).	2026-06-17 12:55:40 +05:30
kshitijk4poor	fbaad3031a	test(cli): URL tokens must not trigger filesystem path completion Regression coverage for the keystroke-latency fix: a URL token contains "/", so the bare-slash path heuristic used to return it as a path word and run os.listdir on every keystroke. Assert _extract_path_word rejects http/https/ssh scheme tokens, that ordinary paths (incl. a bare colon) are unaffected, and that the completer never touches the filesystem for a URL under the cursor.	2026-06-17 12:33:56 +05:30
xxxigm	d1ecebcbfd	fix(desktop): re-download Electron binary via mirror when pack fails (#47266) (#47276) * fix(desktop): re-download Electron binary via mirror when pack fails (#47266) Since #38673 pinned build.electronDist to node_modules/electron/dist, electron-builder reads the Electron binary straight from there and never downloads it during `npm run pack`. That dist tree is only produced by the electron package's postinstall (install.js) during `npm ci`. When that download is blocked or throttled (GitHub's release host is unreachable in some regions), the dist is missing and the build dies with: The specified electronDist does not exist: .../node_modules/electron/dist The existing ELECTRON_MIRROR fallback in all three desktop-build paths (scripts/install.ps1, scripts/install.sh, and `hermes desktop` in hermes_cli/main.py) re-ran `npm run pack` with ELECTRON_MIRROR set — but pack never downloads Electron anymore, so the mirror was never used and the retry re-read the same missing dist. The fallback was effectively dead. Drive the mirror through electron's own downloader instead: - Add a dist-presence check + a downloader helper (Test-ElectronDist / Restore-ElectronDist, _electron_dist_ok / _restore_electron_dist, _electron_dist_ok / _redownload_electron_dist) that wipes a partial dist + the path.txt version marker (electron's install.js short-circuits on it) and re-runs `node install.js`, optionally via a mirror. - On the first retry, repopulate a missing dist from the canonical source; on the mirror retry, re-fetch through npmmirror.com, then pack. - Gate the re-download on the dist check so an unrelated build failure (tsc/vite) doesn't trigger a pointless ~200 MB refetch, and skip the final pack when the binary still can't be fetched instead of failing the same way. 
* test(desktop): cover Electron dist re-download mirror fallback (#47266) Add behavior coverage for the electronDist re-download fix: - _electron_dist_ok across linux/win32/darwin, including the partial-dist case (dir present but binary missing) that makes the pinned electronDist fail. - _redownload_electron_dist: no-op when the binary is present, bail when install.js is absent, wipe a stale dist + path.txt marker and run electron's downloader with ELECTRON_MIRROR injected, and report failure when the download still produces no binary. - `hermes desktop`: the mirror fallback now drives electron's own downloader before re-running pack, and skips the final pack entirely when the binary can't be fetched. Replaces the old mirror test that asserted the (now-fixed) dead behavior of re-running `npm run pack` with ELECTRON_MIRROR set — pack never downloads Electron under the pinned electronDist, so that retry could never help.	2026-06-16 15:40:55 -05:00
teknium1	db44af004c	test(model-picker): cover two overlapping user-defined custom providers Guards that two user-defined custom endpoints exposing an overlapping model each keep their full catalog — the dedup must never cross-filter two user-defined rows against each other.	2026-06-16 13:09:40 -07:00
teknium1	7493de7fc3	test(model-switch): cover section-3 no-auth probe; map chimpera author Salvage follow-up for PR #29575: add regression tests for the section-3 no-api_key /v1/models probe (probes bare endpoints, skips when explicit models set) and add the contributor AUTHOR_MAP entry.	2026-06-16 13:07:52 -07:00
cyb0rgk1tty	b7fa62c530	fix(inventory): keep user-defined custom providers in model dedup The #45954 model-dedup builds `user_models` from every is_user_defined row, then strips those model IDs from every row where is_aggregator(slug) is True. But is_aggregator() returns True for every `custom:*` slug, and list_authenticated_providers emits named custom providers with slug `custom:<name>` and is_user_defined=True. So a user's own custom provider is treated as an aggregator and filtered against user_models — which holds exactly its own models (the row helped build that set). Every model is removed, the row drops to zero, and the provider disappears from the model picker. Guard the dedup loop to skip is_user_defined rows: a user's configured provider is never an aggregator duplicate of itself. Built-in aggregators (openrouter, etc.) are still deduped as before. Adds a regression test.	2026-06-16 13:04:07 -07:00
kshitij	17251e865b	Merge pull request #46857 from liuhao1024/fix/model-picker-merge-live-static fix(models): merge live API results with curated static catalog in generic provider path	2026-06-16 23:30:34 +05:30
kshitijk4poor	658ac1d866	fix(models): keep curated-first ordering in live+curated merge; use pure-catalog helper in validation The generic live+curated merge (commit `630b438`) seeded the merged list from live results, demoting curated-only models below live ones. That regressed #46309, which deliberately surfaces the newest curated model (kimi-k2.7-code) FIRST in the native picker even when the live /models listing lags. Restore curated-first ordering: curated entries lead (in catalog order), live-only entries are appended for discovery. This keeps the #46850 fix (zai glm-5.2 now appears) without the kimi regression. Also switch the validate_requested_model curated fallback (commit `ee7b8a4`) from provider_model_ids() — which triggers a second, uncached live /models fetch with its own 8s timeout and may resolve different credentials than the api_key/base_url just probed — to the pure-catalog helper _model_in_provider_catalog(). Membership is checked against the shipped catalog only, with no extra network call. Tests: restore the curated-first assertion in test_kimi_coding_live_catalog_does_not_hide_curated_k2_7_code; update the new merge tests to curated-first semantics; de-circularize the validation fallback tests to patch _PROVIDER_MODELS (the real source) instead of mocking the function under test.	2026-06-16 23:25:07 +05:30
Hao Zhe	2c2ca0443b	feat(memory): improve OpenViking setup UX	2026-06-17 01:04:26 +08:00
Hao Zhe	a893d77d8d	fix(memory): separate setup option descriptions	2026-06-17 01:02:39 +08:00
Hao Zhe	7f76cf7195	fix(memory): smooth setup transition after provider selection	2026-06-17 01:02:39 +08:00
Hao Zhe	2dace37f6b	feat(memory): improve OpenViking setup UX Support linking, copying, and creating ovcli.conf during OpenViking memory setup. Make setup cancellation write nothing and cover OpenViking/Hindsight picker cancellation paths.	2026-06-17 01:02:38 +08:00
brooklyn!	c6e99ab375	Merge pull request #46959 from NousResearch/bb/composer-model-selector feat(desktop): composer model selector, per-model presets & external-provider disconnect	2026-06-16 09:55:57 -05:00
MrDiamondBallz	9a59ad73dd	fix(auth): preserve Codex pool-only rate-limit state Classify exhausted pool-only openai-codex credentials as quota/rate-limited instead of missing auth. This prevents auth status and runtime credential resolution from reporting missing credentials when a valid manual:device_code pool credential exists but is temporarily in a 429 usage-limit cooldown. Adds regression coverage for pool-only Codex auth status and runtime resolution.	2026-06-16 05:56:11 -07:00
liuhao1024	ee7b8a4672	fix(models): validate_requested_model falls back to curated catalog when live API omits model When live /v1/models responds but omits a model that exists in the curated static catalog, validate_requested_model now accepts it with a note instead of rejecting. This covers the /model slash-command path (the picker path was already fixed in the parent commit). Addresses review feedback from potatogim on #46857.	2026-06-16 16:24:11 +08:00
liuhao1024	630b43892d	fix(models): merge live API results with curated static catalog in generic provider path When a provider's live /v1/models endpoint returns a stale or incomplete list (e.g. Z.AI missing glm-5.2), the generic profile-based code path returned only the live results, silently dropping curated models. Generalize the kimi-coding merge pattern to all providers: live entries come first (provider's preferred order), then curated-only entries are appended with case-insensitive dedup. This ensures models that the live endpoint omits still appear in /model picker. Fixes #46850	2026-06-16 16:21:01 +08:00
Brooklyn Nicholson	a0ec4f52b9	feat(desktop): disconnect external (CLI-managed) providers External providers (Claude Code) store creds outside Hermes, so the disconnect API refuses them. The backend now hands the GUI a per-OS `disconnect_command` that clears the credential the same way the CLI's logout does (macOS Keychain entry + ~/.claude/.credentials.json), and the misleading "use claude setup-token" hint is corrected. Settings → Providers offers a Disconnect button for these: it confirms, leaves Settings, and runs the removal command in the embedded terminal via a new runInTerminal() (queues onto $terminalInjection; the terminal pane flushes and clears it once its session is live). The expanded list also gets its own "Other providers" header so it no longer reads as grouped under "Connected". API-managed providers keep the one-click (trash) disconnect.	2026-06-16 00:08:21 -05:00
brooklyn!	c6b0eb4de0	fix(desktop): open remote-gateway artifacts via authenticated download (#46895) On a remote gateway connection, agent-written files live on the gateway host, not the desktop's disk, so the Artifacts view's file:// hrefs failed ("Invalid external URL") and image thumbnails broke.
Make mediaExternalUrl() remote-aware in one place: in remote mode it rewrites gateway-local paths to GET /api/files/download (a new endpoint that streams the file as a Content-Disposition: attachment). The artifacts view now resolves through it, and so do the existing chat-media and generated-image callers, for free. The download endpoint stays auth-gated; auth_middleware additionally accepts the session token as a ?token= query param for this one path so a shell/browser-opened download (which can't set the session header) still authenticates — the same query-token tradeoff as the /api/pty WebSocket. It is NOT added to PUBLIC_API_PATHS. Salvages #46663 (which carried ~19k lines of CRLF noise and made the endpoint public). Reimplemented on a clean LF base with the security hole closed and tests added. Co-authored-by: qingshan89 <qs2816661685@gmail.com>	2026-06-15 23:50:19 -05:00
Gille	0441b7f19f	fix(desktop): route global remote profile REST calls (#47011) * fix(desktop): route global remote profile REST calls * fix(dashboard): scope oauth provider routes by profile * test(tui): isolate notification poller queue	2026-06-15 23:24:55 -05:00
Shannon Sands	7cd71de1f4	Simplify dashboard update detection to containers	2026-06-15 20:08:39 -07:00
Shannon Sands	b1d6a57883	Detect containerized dashboard update management	2026-06-15 20:08:39 -07:00
Shannon Sands	0b6b29a30c	Hide hosted dashboard update controls	2026-06-15 20:08:39 -07:00
xxxigm	2a08b8c86f	test(dump): cover terminal backend override reporting Verifies `hermes debug` surfaces a TERMINAL_ENV override of terminal.backend, reports the config value when no override is present, and emits no spurious note when env and config agree.	2026-06-15 12:31:23 -07:00
liuhao1024	60cc42e38b	fix(inventory): deduplicate models between user-defined and aggregator providers When a user-defined provider (e.g. litellm-proxy) and an aggregator (e.g. openrouter) both advertise the same model name, the Desktop/TUI model picker would show the model under both groups. Selecting it from the aggregator row silently set model.provider to the aggregator, breaking calls because the aggregator doesn't actually serve that model ID. Fix: after list_authenticated_providers() returns, collect all models from user-defined provider rows and filter them out of aggregator rows. Uses is_aggregator() from hermes_cli/providers.py to identify aggregators. Case-insensitive matching. Fixes #45954	2026-06-15 12:25:41 -07:00
liuhao1024	9df1a1a8de	fix(doctor): recognize nvidia as vendor-slug-accepting provider NVIDIA NIM API uses vendor-prefixed model IDs (e.g. qwen/qwen3.5-122b-a10b, nvidia/nemotron-3-super-120b-a12b). The doctor command incorrectly warns that vendor-prefixed slugs belong to aggregators like openrouter when nvidia is the configured provider. Add 'nvidia' to the providers_accepting_vendor_slugs set so doctor no longer raises false-positive warnings for valid NVIDIA NIM configurations. Fixes #35425	2026-06-15 12:24:46 -07:00
FT_IOxCS	92a456f711	fix(cli,deps): clear esbuild audit loop Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.	2026-06-15 06:18:27 -07:00
Teknium	0d82060c74	fix: harden WhatsApp target alias salvage Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.	2026-06-15 05:51:47 -07:00
Veritas-7	febdddb41a	fix(auth): refresh xAI OAuth tokens earlier	2026-06-15 05:40:23 -07:00
kshitijk4poor	497352bc4e	fix(auth): write rotated xAI OAuth tokens back to global root (#43589) The salvaged read-side fix lets a profile resolve the xAI OAuth grant from the global-root auth store when it has no own providers.xai-oauth block. But _save_xai_oauth_tokens still wrote rotated tokens only to the active profile store. Because xAI rotates the refresh_token on every refresh, a profile that reads root's grant and refreshes it left root holding a now-revoked refresh token, killing every other profile reading the stale root grant with invalid_grant once its access token expired (#43589). Detect the read-from-root case (profile lacks its own providers.xai-oauth block) and, after the profile save, write the rotated chain back to the global root too via a best-effort, TOCTOU-safe write-through that reuses _save_auth_store with an explicit target path. A profile that genuinely shadows root (has its own block) is left untouched, classic mode is a no-op, and a failed root write never breaks the profile's own save. Pairs with the read fallback in the preceding commit so the cross-profile xAI grant stays coherent in both directions.	2026-06-15 17:08:19 +05:30
Andrew Walker	f1d6f04362	fix(auth): resolve xAI OAuth credentials across profiles (cherry picked from commit `8d8b9f50e4`)	2026-06-15 17:03:35 +05:30
helix4u	dcc3216955	fix(mcp): fail fast for noninteractive oauth without tokens	2026-06-15 04:22:07 -07:00
Teknium	aca11c227e	fix(docker): skip gateway reconciliation in dashboard container (autodetect) (#46293) * fix(docker): skip per-profile gateway reconciliation in dashboard container When gateway and dashboard containers share a bind-mounted HERMES_HOME, both run the cont-init.d profile reconciliation script, which creates s6-log processes for every persisted profile. These s6-log processes in different containers race to flock() the same log-directory lock files under logs/gateways/<profile>/lock, producing repeated "s6-log: fatal: unable to lock ... Resource busy" errors and a supervision restart storm. Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py and set it in the official docker-compose.yml dashboard service so the dashboard container no longer creates per-profile gateway s6 services it never uses. * chore(release): map salvaged contributor * refactor(docker): autodetect dashboard container instead of env-var gate Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role detection. A dashboard-only container never spawns or supervises per-profile gateways, so the reconcile boot hook now skips itself when /proc/1/cmdline is the dashboard command: no operator flag to set (or forget in a hand-written manifest, which would reintroduce the s6-log flock storm this prevents). - Extract _strip_container_argv_prefix() shared by the legacy-gateway and new dashboard detectors (DRY the init/wrapper/hermes peel). - Add _is_dashboard_container(); gate reconcile main() on it. - Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml. - Tests: argv matrix for both roles + main()-level skip/reconcile proof and a regression that the removed env var is now inert. Co-authored-by: 895252509 <895252509@qq.com> --------- Co-authored-by: zhouxiang <895252509@qq.com> Co-authored-by: Ben <ben@nousresearch.com>	2026-06-15 20:51:48 +10:00
Ben Barclay	95715dcb03	fix(s6): reserved default gateway must not follow sticky active_profile (#46483) The supervised `gateway-default` s6 slot runs bare `hermes gateway run` (no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override` falls through its #22502 HERMES_HOME guard for the container root (/opt/data, whose parent is not `profiles`) and reads the sticky `active_profile` file. If the user set another profile active (e.g. via the dashboard), the reserved default gateway gets redirected into that profile, producing a duplicate gateway for the active profile and no real default gateway. The profile page and `gateway status` then correctly report default as "not running" because there genuinely isn't one. Guard step 2 (the sticky active_profile fallback) with the existing HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already exports. Supervised named-profile slots pass -p explicitly (step 1, never reaches step 2); only the bare default slot was affected. Inert outside the s6 container: the sentinel is never set elsewhere. Reported in the 'Docker & Profiles & Dashboard' support thread.	2026-06-15 05:36:20 +00:00
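
Illustrative note on 658ac1d866 (curated-first merge): the ordering rule in that commit message can be sketched as follows. This is a minimal sketch, not the repository's code. The function name `merge_curated_first` and its parameters are hypothetical, and the case-insensitive dedup is assumed from the description in 630b43892d. Curated entries lead in catalog order; live-only entries are appended for discovery.

```python
from typing import Iterable, List


def merge_curated_first(curated: Iterable[str], live: Iterable[str]) -> List[str]:
    """Hypothetical helper: curated IDs first, then live-only IDs.

    Dedup is case-insensitive; the first spelling seen wins, so a curated
    spelling takes precedence over a differently-cased live spelling.
    """
    seen = set()
    merged: List[str] = []
    for model_id in list(curated) + list(live):
        key = model_id.casefold()  # case-insensitive comparison key
        if key in seen:
            continue
        seen.add(key)
        merged.append(model_id)
    return merged


# Example: the live listing lags (no kimi-k2.7-code) and repeats a curated
# model under different casing.
curated_ids = ["kimi-k2.7-code", "glm-5.2"]
live_ids = ["GLM-5.2", "glm-5.1"]
print(merge_curated_first(curated_ids, live_ids))
```

With these inputs the curated-only model stays first and the live-only model is appended last, which is the behavior the commit message describes restoring.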

1503 commits