Commit graph

13396 commits

Author SHA1 Message Date
HexLab98
76bb8f46a0 test(cli): cover Windows console script repair (#52931)
Add unit tests for missing-shim detection and repair trigger in
_verify_console_scripts_installed.
2026-06-28 17:01:31 -05:00
HexLab98
95994bbc56 fix(windows): repair missing hermes.exe after pip install (#52931)
On Windows, uv pip install -e . can register hermes.exe in package metadata
while the launcher never lands on disk. Detect missing [project.scripts]
shims and reinstall entry points under the existing quarantine path in
hermes update and install.ps1.
2026-06-28 17:01:31 -05:00
brooklyn!
28097d9cd9
Merge pull request #54385 from NousResearch/bb/project-folder-picker-remote
feat(desktop): remote-gateway-aware folder picker + git cockpit (status, review, worktrees)
2026-06-28 16:35:57 -05:00
Teknium
e5d22ab80d
fix(daytona): quote single-upload mkdir parent path (#54440)
* fix(daytona): quote single-upload mkdir parent path

The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.

Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.

Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.

* docs(infographic): daytona quote-sync fix
2026-06-28 14:33:03 -07:00
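A minimal sketch of the quoting fix described in the commit above, assuming a helper that builds the `mkdir -p` command string. The body of `quoted_mkdir_command()` here is an illustrative guess and not the repository's implementation; only `shlex.quote` and `shlex.split` are real standard-library calls.

```python
import shlex


def quoted_mkdir_command(parent: str) -> str:
    """Build a `mkdir -p` command with the path shell-quoted (hypothetical body)."""
    # shlex.quote wraps the path in single quotes whenever it contains
    # characters the shell would otherwise interpret.
    return f"mkdir -p {shlex.quote(parent)}"


# A parent path carrying shell metacharacters (illustrative value).
unsafe_parent = "/work/a; rm -rf ~"
command = quoted_mkdir_command(unsafe_parent)

# Re-parsing the command as a POSIX shell would shows a single path argument.
parsed = shlex.split(command)
print(parsed)
```

Interpolating the path unquoted would instead produce two commands separated by `;`, which is the injection the commit closes.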
Brooklyn Nicholson
f9b469d7de test(web_git): assert default branch invariant, not hardcoded main
CI git init defaults to master on some runners; compare branch to
defaultBranch instead of pinning a branch name.
2026-06-28 16:29:52 -05:00
teknium1
c648ecdca5 fix(telegram): reject unauthorized users before event construction (#40863)
Removed/unauthorized Telegram users could inject prompt content before the
per-user auth gate fired. The adapter ran `_should_process_message`,
`_build_message_event`, and text/photo batching — and dispatched to the
runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected
the sender. Unmentioned group chatter from a removed user was also
persisted into the session transcript via `_observe_unmentioned_group_message`,
leaking into the agent's observed context independent of dispatch.

Add `_is_user_authorized_from_message()` as an intake prefilter that runs
in `_handle_text_message`, `_handle_command`, `_handle_location_message`,
and `_handle_media_message` BEFORE batching, event construction, and the
unmentioned-group observe branch. It reuses the runner's
`_is_user_authorized()` with a correctly-shaped SessionSource (group vs
forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists),
falls back to env allowlists, and only rejects when an allowlist actually
exists — unknown DMs with no allowlist still reach the pairing flow.
Channel posts authorize via `sender_chat` identity when `from_user` is
absent.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>
2026-06-28 14:25:15 -07:00
srojk34
61210097a5 fix(browser): extend private-network guard to browser_get_images
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.

Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.

5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
2026-06-28 14:25:10 -07:00
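The recheck described above can be sketched roughly as follows. `is_private_url` and `get_images` are hypothetical names, not the repository's `_current_page_private_url` or `browser_get_images`, and this sketch only classifies IP literals and `localhost`; a real guard would presumably also need to resolve hostnames.

```python
import ipaddress
from urllib.parse import urlparse


def is_private_url(url: str) -> bool:
    """Return True when the URL host is a private, loopback, or link-local IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does no DNS resolution.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


def get_images(current_url: str, images: list, guard_active: bool = True) -> dict:
    """Recheck the current page before returning image data (hypothetical shape)."""
    if guard_active and is_private_url(current_url):
        return {"success": False, "error": "blocked: private-network page"}
    return {"success": True, "images": images}
```

The point of the pattern is that the check runs at return time, so a navigation that happened inside an earlier eval is still caught.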
Brooklyn Nicholson
c7542358f2 fix(desktop): remote project picker UX and profile-scoped fs/git routing
Route FS/git REST through the active profile, mount the remote folder picker
at app root, keep the project dialog open while picking, show a first-run
blank state, flip into grouped view on create, and constrain the picker scroll
area so Select stays reachable.
2026-06-28 16:23:39 -05:00
Teknium
9a0010fd46
fix(windows): cover remaining console-flash spawn legs (#54417)
2026-06-28 13:49:08 -07:00
Teknium
b31b0b9d95
docs: reconcile docs with code across last 3 releases (#54254)
Audited the last 3 releases (v2026.5.28..main) against the docs site and
fixed code-vs-docs drift:

- slash-commands: add /moa, /prompt, /pet, /hatch, /timestamps
- cli-commands: add hermes pets / project / desktop / whatsapp-cloud +
  dashboard register; correct --insecure (now a deprecated no-op);
  add gateway migrate-legacy + enroll --wake-url + dashboard --skip-build
- environment-variables: document the remaining ~48 env vars (SimpleX,
  Photon, Teams adapter, per-platform *_ALLOW_ALL_USERS, home-channel vars,
  IRC, Brave/Krea/Notion/Linear/Airtable/Tenor keys, QQ_SANDBOX) — full
  OPTIONAL_ENV_VARS (265) now covered
- configuration: document tool_loop_guardrails, goals, prompt_caching,
  network, onboarding, dashboard config blocks
- toolsets/tools-reference + tools.md: add coding/project toolsets and
  read_terminal/project_* tools; remove the stale messaging toolset and
  send_message agent tool (removed in #47856); drop stale RL-training prose
- messaging: new IRC channel page (adapter shipped without docs) + index
  row + sidebar + env vars
- pets: document the /hatch AI generation pipeline + Nous/OpenRouter image
  backend
- web-dashboard: document the bearer-token / TokenPrincipal service auth path
- purge agent-callable send_message references across guides/features and
  the research-paper-writing skill (tool removed in #47856)

Verified: docusaurus build succeeds; all authored internal links resolve.
2026-06-28 12:47:50 -07:00
Brooklyn Nicholson
19bae1b9e0 test(desktop): assert new backend sessions carry workspace cwd
Pin the desktop-to-gateway cwd handoff: createBackendSessionForSend must pass
the current workspace cwd into session.create so the backend registers the
session cwd before the agent/tools run.
2026-06-28 14:44:28 -05:00
Brooklyn Nicholson
8d8c7111d9 refactor(desktop): keep remote fs routing inside the fs facade
Let UI callers ask for folders/files without knowing remote-picker limits:
selectDesktopPaths now normalizes remote directory selection to a single folder
inside the facade. Project creation and composer context picking no longer branch
on remote mode; they route through desktop-fs helpers just like git callers route
through desktopGit(). Behavior unchanged except remote folder context now works
through the same backend picker path.
2026-06-28 14:39:33 -05:00
Brooklyn Nicholson
453f134b3b refactor(desktop): centralize remote git REST routing
Keep the remote git mirror as a thin facade: route all GETs through gitGet,
all mutations through gitPost, and keep consumers on desktopGit(). On the
backend, route git paths through a single _git_path helper instead of repeating
str(_fs_path(...)) in every endpoint. Behavior unchanged.
2026-06-28 14:37:36 -05:00
Brooklyn Nicholson
4e9439cc3b fix(desktop): route composer context picking through remote-aware fs
Second pass on the remote-project flow: the project dialog and git cockpit were
remote-aware, but the composer's Add file/folder context picker still called the
native Electron picker directly. Route it through selectDesktopPaths so remote
sessions use the backend-aware picker instead of local disk paths; preserve local
multi-select behavior and keep remote folder selection single because the in-app
remote picker only supports one directory.

Also use readDesktopFileDataUrl for image previews so an already-known backend
image path can be read through /api/fs/read-data-url, and add focused coverage
for backend file-diff routing plus the plain-folder git init/worktree path.
2026-06-28 14:35:23 -05:00
Brooklyn Nicholson
9b71221187 fix(desktop): write project IDEA.md through the remote-aware fs path
writeProjectIdea used the local-only Electron writeTextFile, so on a remote
gateway IDEA.md never landed on the backend (where the project folder lives).
Route it through writeDesktopFileText (local Electron / POST /api/fs/write-text).
2026-06-28 14:31:58 -05:00
Brooklyn Nicholson
e4cf3a2e9d refactor(web_git): unify porcelain-v2 parsing into one walker
Collapse the two near-duplicate status parsers (_parse_status_v2 +
_iter_status_entries) into a single _walk_entries generator feeding the rail,
review list, and commit flow; share the staged predicate; hoist `import re`.
Behavior unchanged.
2026-06-28 14:29:59 -05:00
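A single-walker parser for `git status --porcelain=v2` output might look like the sketch below. `Entry`, `walk_entries`, and `is_staged` are illustrative names and do not reproduce the repository's `_walk_entries`; the field layout follows the documented porcelain v2 format without `-z`, where git may quote unusual path names.

```python
from typing import Iterator, NamedTuple


class Entry(NamedTuple):
    kind: str  # "changed", "renamed", "unmerged", or "untracked"
    xy: str    # two-character staged/unstaged status code
    path: str


def walk_entries(output: str) -> Iterator[Entry]:
    """Yield one Entry per status line, skipping headers and ignored files."""
    for line in output.split("\n"):
        if not line or line.startswith("#"):
            continue
        kind = line[0]
        if kind == "1":
            # 1 XY sub mH mI mW hH hI path
            parts = line.split(" ", 8)
            yield Entry("changed", parts[1], parts[8])
        elif kind == "2":
            # 2 XY sub mH mI mW hH hI Xscore path<TAB>origPath
            parts = line.split(" ", 9)
            yield Entry("renamed", parts[1], parts[9].split("\t", 1)[0])
        elif kind == "u":
            # u XY sub m1 m2 m3 mW h1 h2 h3 path
            parts = line.split(" ", 10)
            yield Entry("unmerged", parts[1], parts[10])
        elif kind == "?":
            yield Entry("untracked", "??", line[2:])


def is_staged(entry: Entry) -> bool:
    """Shared predicate: the index (X) column shows a change."""
    return entry.kind in {"changed", "renamed"} and entry.xy[0] != "."


sample = "\n".join([
    "# branch.head main",
    "1 M. N... 100644 100644 100644 1111111 2222222 src/app.py",
    "1 .M N... 100644 100644 100644 3333333 3333333 docs/read me.md",
    "2 R. N... 100644 100644 100644 4444444 4444444 R100 new.py\told.py",
    "? notes.txt",
])
entries = list(walk_entries(sample))
```

Feeding every consumer from one generator keeps the staged/unstaged classification in a single place.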
Brooklyn Nicholson
fc86e35764 feat(desktop): make the git cockpit work over a remote gateway
After the folder picker fix, an added remote folder was still half-usable:
the desktop's git GUI (coding-rail status, worktree lanes, review pane,
branch switch, file diff) all ran Electron-local git on the USER's machine,
so against a remote-gateway repo they silently degraded to empty.

Mirror the whole surface over the dashboard REST API so it acts on the
BACKEND repo where sessions actually run:

- hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review
  list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/
  create-pr, file-diff, worktree add/remove, branch switch) shelling to the
  system git, mirroring the Electron ops' shapes.
- web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as
  /api/fs, executor-offloaded, mutations -> 400).
- apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as
  window.hermesDesktop.git; coding-status / review / projects / model /
  desktop-fs route through desktopGit() so local stays Electron, remote hits
  /api/git/*.

Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts,
review classification, diff incl. untracked all-add, stage+commit roundtrip,
worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and
desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).
2026-06-28 14:26:09 -05:00
Brooklyn Nicholson
304f0650c4 style(desktop): tighten pickProjectFolder comment
2026-06-28 14:13:36 -05:00
Brooklyn Nicholson
4526fccdbe fix(desktop): make project "Add folder" picker remote-gateway aware
The new-project / add-folder dialog (PR #49037) picked folders via the
native Electron dialog (pickDefaultProjectDir), which only browses the
LOCAL machine. On a remote gateway that picks a path that doesn't exist
on the backend where sessions actually run.

Route pickProjectFolder() through selectDesktopPaths({directories,
multiple:false}) — the same remote-aware path the retired right-sidebar
picker used: local mode opens the native directory dialog, remote mode
browses the backend filesystem via the in-app RemoteFolderPicker. Seed
it with the backend's default cwd on remote so it opens somewhere useful.
2026-06-28 13:49:45 -05:00
brooklyn!
b699d27a4a
Merge pull request #54357 from NousResearch/bb/browser-chromium-autoinstall
feat(browser): auto-install Chromium binary on local cold-start failure
2026-06-28 12:36:22 -05:00
brooklyn!
27868e5b55
Merge pull request #54353 from NousResearch/bb/browser-first-open-timeout
fix(browser): extend first-open timeout & surface daemon errors on Linux (salvage #52575)
2026-06-28 12:32:41 -05:00
Brooklyn Nicholson
70292596ef feat(browser): auto-install Chromium binary on local cold-start failure
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.

Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
  system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process

Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
2026-06-28 12:25:15 -05:00
Brooklyn Nicholson
1ab5c3cdda refactor(browser): drop redundant sandbox-hint substring check
2026-06-28 12:14:47 -05:00
infinitycrew39
7bb8aa3bd5 test(browser): cover open timeout diagnostics and failed navigate title
Add regression tests for open-command timeout floors, sandbox bypass,
stderr capture formatting, first-navigation timeout wiring, and desktop
failed-navigate labeling.
2026-06-28 12:14:21 -05:00
infinitycrew39
a10727a555 fix(browser): extend first-open timeout and surface daemon errors
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
2026-06-28 12:14:21 -05:00
brooklyn!
23021be26e
Merge pull request #52656 from helix4u/fix-desktop-empty-resume-view
fix(desktop): retry empty resumed transcripts
2026-06-28 11:57:57 -05:00
ygd58
3e16176ba4 fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled
_get_platform_tools() applies agent.disabled_toolsets as a final
override AFTER reading platform_toolsets.<platform>, so a toolset
listed there stays permanently OFF no matter what the toggle write
path saves. Blank Slate installs pre-populate this list with ~27
toolsets, making most of the desktop Toolsets UI un-enableable
(issue #49995).

Fix: _save_platform_tools() now removes any toolset the user just
explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets.
Toolsets the user did not touch, or that remain disabled on other
platforms, are left alone -- disabled_toolsets keeps working as a
cross-platform suppression list for anything not actively re-enabled.
Disabling a toolset (unchecking it) does not touch disabled_toolsets
at all -- only enables reconcile it.

Verified end-to-end with the exact repro from the issue: Blank Slate
config (disabled_toolsets=['todo','memory','browser'], cli=['file',
'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools()
now resolves 'todo' as enabled while 'memory'/'browser' (untouched)
remain disabled.

Added 4 regression tests. Full tools_config suite: 101 passed
(97 existing + 4 new), no regressions.

Fixes #49995
2026-06-28 21:59:03 +05:30
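The reconcile rule from the commit above, sketched under simplifying assumptions: config is reduced to plain lists, and `reconcile_disabled_toolsets` and `resolve_platform_tools` are hypothetical stand-ins for `_save_platform_tools()` and `_get_platform_tools()`. The example values are the repro quoted in the commit message.

```python
def reconcile_disabled_toolsets(disabled, previously_enabled, now_enabled):
    """Drop only the toolsets the user just enabled from the disabled list."""
    newly_enabled = set(now_enabled) - set(previously_enabled)
    return [name for name in disabled if name not in newly_enabled]


def resolve_platform_tools(platform_toolsets, disabled):
    """Apply the disabled list as a final override over the platform's toolsets."""
    return [name for name in platform_toolsets if name not in disabled]


# Blank Slate repro: 'todo' is enabled via the toggle for the cli platform.
disabled = ["todo", "memory", "browser"]
before = ["file", "terminal"]
after = ["file", "terminal", "todo"]

disabled_after_save = reconcile_disabled_toolsets(disabled, before, after)
resolved = resolve_platform_tools(after, disabled_after_save)
print(disabled_after_save, resolved)
```

Unchecking a toolset produces an empty `newly_enabled` set, so the disabled list is left untouched, matching the commit's note that only enables reconcile it.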
brooklyn!
020966574d
Merge pull request #53892 from NousResearch/bb/windows-popup-spawn-legs
2026-06-28 11:16:35 -05:00
Brooklyn Nicholson
eeca59f489 fix(windows): hide remaining backend console-flash legs missed on main
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:

- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
  pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
  console python.exe and flashes a conhost the desktop backend can't suppress.
  Both backend creators put the venv site-packages on PYTHONPATH so imports
  still resolve under the base interpreter. (main's commit said this Electron
  leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
  ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
  windows_hide_flags().

All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
2026-06-28 10:19:21 -05:00
Teknium
0c2e6c0049
test: make active session cross-process race deterministic (#54248)
2026-06-28 05:49:21 -07:00
teknium1
1ffa01f35f test(windows): cover no-window backend subprocess flags
2026-06-28 05:28:45 -07:00
Teknium
cb982ad997 fix(windows): hide console-window flash on backend git/gh/wmic/bash subprocess spawns
The Windows desktop GUI runs its backend headless via pythonw.exe. Several
auxiliary subprocess sites that run inside that windowless backend spawned
console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill)
WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and
flashed a black window on screen — sometimes continuously (the dashboard
Projects-tree git probe alone fired ~118 spawns in 60s on startup).

The terminal tool, cron, browser, code_execution, and gateway-spawn paths
already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs
were missed. Wire the existing helper into them:

- tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the
  cp950 UnicodeDecodeError on CJK paths from the same site)
- agent/coding_context.py: _git (per-turn git status/log/diff)
- agent/context_references.py: _run_git + _rg_files (@file/@ref resolution)
- hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto)
- hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan
- hermes_cli/main.py: wmic stale-dashboard PID scan
- gateway/status.py: taskkill /T /F force-kill

windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on
Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls
all pass creationflags=CREATE_NO_WINDOW).

Scoped to the windowless-backend paths that cause the reported flashing. The
Electron updater-handoff leg (main.cjs windowsHide:false) and the
interactive-CLI banner probes (cli.py) are intentionally NOT touched here —
the former needs a Windows-tested change of its own, the latter runs in a
visible console anyway.

Tracking: #54220
Refs: #53178 #53631 #53781 #53957 #49602 #52982 #53424 #53053 #53016
2026-06-28 05:28:45 -07:00
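A rough sketch of the helper pattern this commit relies on. The real `windows_hide_flags()` lives in the repository and may differ; the `platform` parameter and `run_quiet` wrapper below are assumptions added for illustration. `0x08000000` is the Win32 `CREATE_NO_WINDOW` process-creation flag.

```python
import subprocess
import sys

# Win32 CREATE_NO_WINDOW, spelled out so the module also imports on POSIX,
# where subprocess does not define the constant.
CREATE_NO_WINDOW = 0x08000000


def windows_hide_flags(platform: str = sys.platform) -> int:
    """Return creation flags that suppress a console window on Windows, else 0."""
    return CREATE_NO_WINDOW if platform == "win32" else 0


def run_quiet(args: list) -> subprocess.CompletedProcess:
    """Run a console child without flashing a window (hypothetical wrapper)."""
    return subprocess.run(
        args,
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",  # tolerate undecodable bytes in child output
        creationflags=windows_hide_flags(),
    )
```

Because the helper returns `0` off Windows, passing it as `creationflags` is accepted by `subprocess` on POSIX and changes nothing there.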
teknium1
f25f235722 chore: map salvaged PR #49845 author email for AUTHOR_MAP
2026-06-28 04:47:39 -07:00
homelab-ha-agent
d05cc8f4d6 fix(mcp): skip preflight content-type probe for OAuth servers
OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token.  The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.

Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run().  The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.

Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD.  Without the guard:
  ✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
  ✓ Connected (1487ms) — 63 tools discovered

Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
  DocuSeal), but only passes when POST returns 2xx + MCP content-type.
  Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
  does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
  (making it OAuth-aware), but is labeled duplicate of #37598 and may not
  land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.

Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
  _preflight_content_type raises NonMcpEndpointError for 200 text/html
  (the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
  _preflight_content_type when auth_type=="oauth", using class-level
  monkeypatching so the gate is exercised without a live MCP transport.

23 passed  tests/tools/test_mcp_preflight_content_type.py
2026-06-28 04:47:39 -07:00
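The gate can be pictured as below. Only the `auth_type != "oauth"` clause comes from the commit message; the function name and the HTTP-scheme check are assumptions standing in for whatever the existing preflight condition in `MCPServerTask.run()` actually tests.

```python
from typing import Optional


def should_run_preflight(url: str, auth_type: Optional[str]) -> bool:
    """Decide whether to probe the endpoint's content type before connecting."""
    # Assumed pre-existing condition: only HTTP(S) transports are probed.
    is_http = url.startswith(("http://", "https://"))
    # Clause from the commit: OAuth servers skip the probe, since an
    # unauthenticated request may legitimately return an HTML login page.
    return is_http and auth_type != "oauth"
```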
liuhao1024
9d919daf44 fix(gateway): mark platform lock failure as retryable instead of permanently fatal
When a stale lock file survives a gateway crash, `acquire_scoped_lock()`
may return `(False, existing_dict)` even after detecting and deleting
the stale lock (e.g. if unlink fails or a race condition occurs).

Previously, `_acquire_platform_lock()` called
`_set_fatal_error(..., retryable=False)`, which permanently killed the
platform — the reconnect watcher never retries a non-retryable fatal
error.

Change to `retryable=True` so the platform enters the "retrying"
state and the reconnect watcher can attempt acquisition again after the
standard backoff delay.

Fixes #54167
2026-06-28 04:35:37 -07:00
teknium1
61622bb56a fix(tui): use role=user for model switch marker to avoid HTTP 400 on strict providers (#48338)
_append_model_switch_marker() appended the post-/model-switch context marker
to session history as {"role": "system"}. The cached system prompt is
prepended to the API message list (conversation_loop.py), so this marker
became a SECOND system message mid-array after prior user/assistant turns.
Strict OpenAI-compatible providers (vLLM, Qwen) reject any system message
that is not at the beginning of the array, returning HTTP 400 and killing
the conversation on the next turn.

Flip the marker to role="user" (history entry + both session-DB persist
sites), matching the existing personality-overlay marker which already uses
role="user". repair_message_sequence() then coalesces it with adjacent user
turns as needed.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Lucas Nicolas <lucas.nicolas@proton.me>
2026-06-28 04:34:55 -07:00
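To illustrate why the role matters, here is a small sketch of the ordering rule a strict provider might enforce. `first_misplaced_system` and `build_switch_marker` are hypothetical helpers, and the marker text is invented for the example.

```python
from typing import Optional


def first_misplaced_system(messages: list) -> Optional[int]:
    """Return the index of the first system message that follows a non-system one."""
    seen_non_system = False
    for index, message in enumerate(messages):
        if message["role"] != "system":
            seen_non_system = True
        elif seen_non_system:
            return index
    return None


def build_switch_marker(model: str, role: str = "user") -> dict:
    """Build the post-switch context marker (illustrative content)."""
    return {"role": role, "content": f"[model switched to {model}]"}


history = [
    {"role": "system", "content": "cached system prompt"},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
]

old_shape = history + [build_switch_marker("example-model", role="system")]
new_shape = history + [build_switch_marker("example-model")]
print(first_misplaced_system(old_shape), first_misplaced_system(new_shape))
```

With the marker as a user message, the only system message stays at the front of the array.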
Brad Hallett
376d021fee fix(desktop): force app exit after update/uninstall handoff on macOS
On macOS app.quit() closes windows but window-all-closed deliberately keeps
the process alive (Dock convention). Every detached hand-off (update swap,
relaunch, Windows bootstrap recovery, uninstall cleanup) waits for the
desktop PID to exit before replacing/removing the bundle — so the process
never dying means the script spins its full PID-wait and the user sees a
blank app, or an uninstall that appears to do nothing.

Add a module-level isQuittingForHandoff flag, set before every hand-off
app.quit(); window-all-closed then quits on all platforms when it's set.

Covers all five hand-off sites including the Linux relaunch path.
2026-06-28 04:30:14 -07:00
teknium1
e54bedd8ea docs: add infographic for #42006 launchd bootout fix
2026-06-28 04:17:13 -07:00
izumi0uu
c4719aa51c fix(gateway): boot out stale launchd registration before restart bootstrap
launchd restart can leave the gateway job stopped but still registered after
update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a
detached process. Booting the stale registration out before bootstrap keeps the
launchd-managed restart path intact and locks it with a regression test.

Constraint: Keep upstream-facing conventional commit style while preserving local decision context
Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update
Confidence: high
Scope-risk: narrow
Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart"
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd
Not-tested: Manual macOS launchctl restart after hermes update
2026-06-28 04:17:13 -07:00
Teknium
52a853f5c3
fix(test): pin monotonic clock in spinner-elapsed test to fix CI flake (#54203)
test_spinner_elapsed_format_is_fixed_width_to_reduce_wrap_jitter derived
_tool_start_time from the live time.monotonic() clock (now - 65.2 / now - 9.2).
monotonic()'s epoch is arbitrary — on a host where monotonic() < 65.2 (fresh
subprocess on a freshly-booted CI runner) the start time went negative, the
(t0 > 0) guard in _render_spinner_text() dropped the '(elapsed)' suffix, and
short.split('(',1)[1] raised IndexError: list index out of range. Deterministic
given a small clock, so it would keep flaking, not clear on rerun.

Pin time.monotonic to a fixed 1000.0 and offset _tool_start_time from it so both
the <60s and >=60s paths always render the elapsed suffix regardless of the
runner's monotonic epoch.

Pre-existing main flake (surfaced in CI test slice 1/8).
2026-06-28 04:16:25 -07:00
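The pinning technique can be sketched as follows. `render_elapsed` is a simplified, hypothetical stand-in for `_render_spinner_text()`, keeping only the `t0 > 0` guard the commit mentions; `unittest.mock.patch` is the real standard-library API.

```python
import time
from unittest import mock


def render_elapsed(tool_start_time: float) -> str:
    """Render spinner text, adding an elapsed suffix only for a positive start time."""
    if tool_start_time <= 0:
        return "working"
    elapsed = time.monotonic() - tool_start_time
    return f"working ({elapsed:5.1f}s)"


# Pin the clock so start times derived from it can never go negative,
# whatever the host's monotonic epoch happens to be.
with mock.patch("time.monotonic", return_value=1000.0):
    short = render_elapsed(1000.0 - 9.2)
    long = render_elapsed(1000.0 - 65.2)

print(short, long)
```

Deriving the start time from the live clock (`now - 65.2`) is what made the original test depend on how long the runner had been up.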
Teknium
8e356eccea
docs(readme): trim provider list to a few names plus docs link (#54169)
The README line enumerated 11 providers inline, which dilutes the point
and goes stale as providers come and go. Replace with Nous Portal,
OpenRouter, OpenAI, your own endpoint, and a 'many others' link to the
canonical AI Providers docs page that already lists them all.
2026-06-28 04:14:59 -07:00
teknium1
f22b9d3867 docs: add infographic for MCP WS discovery fix (#38945)
2026-06-28 04:14:12 -07:00
Cornna
5c2c85c545 fix(tui): start MCP discovery for websocket sessions
The desktop app and dashboard chat reach the agent through the /api/ws
JSON-RPC sidecar (tui_gateway.ws.handle_ws), NOT through
tui_gateway.entry.main() — the stdio-TUI path that spawns the background
MCP discovery thread. In the WS process discovery was therefore never
started: _make_agent only *waits* (wait_for_mcp_discovery), which no-ops
when the thread was never created, so the agent snapshotted an MCP-less
tool list. The only discovery trigger reachable was a manual /reload-mcp,
which is why tools appeared after a reload but vanished on restart.

Start the shared, idempotent, config-gated background discovery in
handle_ws right after accept() and before gateway.ready, so the first
agent build picks up already-spawning servers (and the existing
late-binding refresh handles slow ones).

Fixes #38945.
2026-06-28 04:14:12 -07:00
teknium1
091ce825fe test(redact): fix file_read regression-guard for current-main YAML collapse
The salvaged #35519 regression guard asserted that default (non-file_read)
mode keeps a head/tail `ghp_S1...Pn2T` mask for a `token: <key>` line. On
current main the YAML config pass (`_YAML_ASSIGN_RE`, key `token`) re-masks
the already-prefix-masked value to `***`, so the assertion was stale. Switch
to a bare-token context so the guard isolates what it claims (prefix-mask
head/tail shape in default mode) without depending on the YAML collapse.
2026-06-28 04:13:20 -07:00
kshitijk4poor
de928bccde fix(redact): non-reusable sentinel for prefix secrets in file reads (#35519)
When security.redact_secrets is on (default), read_file/search_files/cat
applied redact_sensitive_text(code_file=True) to file content, which still
ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.)
came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking
truncated key. When an agent read that and wrote it back to config, the masked
value replaced the real credential, silently breaking auth (401). Production
evidence: a config.yaml found containing the exact 13-char masked GitHub PAT.

The two community PRs (#35529, #35534) fixed the corruption by NOT redacting
prefixes for config reads — but that exposes the user's real keys to the agent
context, model, and logs (a security regression). This takes the safer route:
keep redacting, but for file content emit a NON-REUSABLE sentinel.

- New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor
  label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis
  wrapper is syntactically invalid as a token so it can't be mistaken for or
  written back as a usable key).
- New `redact_sensitive_text(file_read=True)` routes prefix matches through it
  (implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token`
  still keeps head/tail (fine for logs, never written back).
- Wired the 3 file_tools.py call sites (read_file / search_files / cat) to
  file_read=True.

Fixes both the corruption AND avoids the secret-exposure of the un-redact
approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default
mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass;
mutation-verified (reverting to the old mask fails the sentinel/leak tests).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com>

Closes #35519. Supersedes #35529, #35534.
2026-06-28 04:13:20 -07:00
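A simplified sketch of the two masking modes. The regex, the function names, and the `redact` wrapper are assumptions for illustration and cover only three prefixes; the sentinel and head/tail shapes follow the examples quoted in the commit message.

```python
import re

# Illustrative pattern: a known vendor prefix followed by at least 8 token characters.
_PREFIX_TOKEN_RE = re.compile(r"\b(ghp_|sk-|xai-)[A-Za-z0-9_-]{8,}")


def mask_token_head_tail(match: re.Match) -> str:
    """Log/display mode: keep a short head and tail of the token."""
    token = match.group(0)
    return f"{token[:6]}...{token[-4:]}"


def mask_token_nonreusable(match: re.Match) -> str:
    """File-read mode: keep only the vendor prefix inside an invalid-as-token wrapper."""
    return f"«redacted:{match.group(1)}…»"


def redact(text: str, file_read: bool = False) -> str:
    replacer = mask_token_nonreusable if file_read else mask_token_head_tail
    return _PREFIX_TOKEN_RE.sub(replacer, text)


# Fake token for demonstration only.
line = "token: ghp_S1abcdefghijklmnopPn2T"
print(redact(line))
print(redact(line, file_read=True))
```

The head/tail form looks like a plausible key and can be written back by mistake; the sentinel form carries no secret bytes and cannot pass for a token.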
teknium1
19cbbe304a docs: add infographic for clarify typed-replies fix
2026-06-28 04:13:19 -07:00
tymrtn
d7f655f370 fix: accept typed clarify choice replies
2026-06-28 04:13:19 -07:00
teknium1
9bb5a809b5 fix(gateway): make zombie check defensive against partial psutil stubs
The zombie status probe referenced psutil.Process/NoSuchProcess/Error
unconditionally, which raised AttributeError when psutil is a partial
stub that only defines pid_exists (as in test_windows_native_support's
fallback tests). Guard the probe so any failure to read status degrades
to the authoritative pid_exists() instead of raising.
2026-06-28 04:11:14 -07:00
MorAlekss
acca526286 fix(gateway): treat zombie PIDs as dead in _pid_exists to unblock --replace (closes #42126)
Under systemd Restart=always, the old gateway becomes a zombie (in the
process table, awaiting reap) when the replacement starts. _pid_exists()
reported the zombie as alive, so --replace waited on a PID that never
dies, then aborted with exit 1 — a silent crash loop. Standalone runs are
unaffected because nothing respawns the gateway into a zombie.

The live path is psutil.pid_exists(), which returns True for zombies, so
the check is added there (Process.status() == STATUS_ZOMBIE -> dead). The
psutil-less POSIX fallback also reads /proc/<pid>/stat (state Z) with a ps
state= fallback for macOS/BSD, before the os.kill(pid, 0) liveness probe.

Diagnosis and the /proc + ps POSIX fallback by MorAlekss (PR #44898);
extended to cover the psutil hot path so the fix applies on normal installs.

Co-authored-by: MorAlekss <mor.aleksandr@yahoo.com>
2026-06-28 04:11:14 -07:00
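The psutil-less `/proc` fallback described above might be sketched like this. The function names are hypothetical, and the sketch covers only the Linux `/proc/<pid>/stat` read, not the `ps` fallback or the psutil path.

```python
import os


def stat_state(stat_line: str) -> str:
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The comm field is parenthesised and may itself contain spaces or ')',
    # so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def is_zombie_stat(stat_line: str) -> bool:
    return stat_state(stat_line) == "Z"


def pid_is_zombie(pid: int) -> bool:
    """Return True only when /proc reports state Z; False if it cannot be read."""
    try:
        with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as handle:
            return is_zombie_stat(handle.read())
    except (OSError, IndexError):
        return False


# The current process is running, so it is not a zombie.
print(pid_is_zombie(os.getpid()))
```

A liveness check would treat a `True` result here as "dead" before falling through to the `os.kill(pid, 0)` probe.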
teknium1
463225caf1 fix(gateway): bypass legacy-unit prompt in non-TTY systemd install
Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive
flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt
not guarded by --start-now/--start-on-login — is also skipped in headless
contexts. Updates systemd_install test mocks to accept the new kwarg and
adds coverage for the legacy-unit-skip path.
2026-06-28 04:09:54 -07:00