Commit graph

4770 commits

Author SHA1 Message Date
brooklyn!
31c40c72c0
fix(desktop): stabilize project folder sessions (#37586)
* fix(desktop): stabilize project folder sessions

Keep desktop folder selection aligned with new sessions and scope TUI gateway cwd through session context so prompts and tools resolve against the selected workspace.

* fix(desktop): address review feedback on folder sessions

Snapshot sessions before iterating to avoid concurrent-mutation crashes,
optional-chain the revealLogs catch, and read console-message args from
the correct Electron event/messageDetails positions.

* fix(desktop): address second review pass on folder sessions

Sync the remembered workspace key with the cwd atom (clear on empty),
only load tree children for real directory nodes, and throttle renderer
auto-reloads so a deterministic startup crash can't loop forever.

* fix(desktop): inherit parent workspace for ephemeral agent tasks

Background and preview tasks use ephemeral ids absent from the session
map, so pass the parent session cwd into the session context explicitly
instead of clearing it back to the gateway launch dir. Also correct the
set_session_vars docstring about clear_session_vars semantics.

* fix(desktop): validate preview cwd before pinning session context

A non-empty but non-existent client cwd would pin an unusable override
and silently fall back to the launch dir. Validate once, reuse for both
the session context and the terminal override, and fall back to the
parent session workspace when invalid.

* fix(desktop): harden preview cwd normalization and adopt normalized cwd

Guard preview cwd normalization against malformed client paths so a bad
input can't fail the whole restart, and adopt the backend's normalized
config.get cwd in the no-active-session path so the persisted workspace
stays consistent with what the agent uses.
2026-06-02 20:23:09 +00:00
brooklyn!
bb0619dbce
fix(auth): align Codex OAuth persistence paths (#37517)
* fix(desktop): codex OAuth onboarding now resolves on fresh install

The desktop codex device-code worker persisted tokens with a hand-rolled
pool.add_entry(), writing only credential_pool.openai-codex. It never set
active_provider, so on a fresh install the onboarding setup.runtime_check
resolved provider "auto", couldn't detect the Codex OAuth session, and raised
"No inference provider configured" — while setup.status (which sniffs the pool)
reported configured. The disagreement surfaced as the onboarding banner
"Connected, but Hermes still cannot resolve a usable provider."

Use the canonical _save_codex_tokens() instead, matching the CLI's
`hermes auth add openai-codex` path and the Nous/MiniMax dashboard workers.
It writes the providers.openai-codex singleton (setting active_provider) and
syncs the pool.

* fix(auth): align Codex OAuth persistence paths

Ensure desktop and CLI Codex OAuth logins both write the canonical provider state so fresh installs resolve a usable runtime provider.

---------

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-06-02 12:19:44 -05:00
Austin Pickett
6d14a24b79
feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383)
* feat(dashboard): nous-blue theme, bulk sessions, schedule picker

Batch of related dashboard improvements gathered on
austin/fix/dashboard-changes:

* Nous Blue theme — faithful port of the LENS_5I overlay system onto
  the existing DashboardTheme. Lifts the foreground inversion layer to
  z-index 200 to fix the long-standing hover / loading visual artifact,
  adds an explicit swatchColors slot so the theme picker shows the
  post-inversion preview, and migrates the legacy "lens-5i" theme key
  from localStorage / API to "nous-blue" on first read.
* Theme-aware series colors: new --series-input-token /
  --series-output-token CSS vars consumed by Analytics + Models
  charts; ToolCall + ModelInfoCard switched to semantic
  --color-success for diff lines and the Tools capability badge.
* Analytics + Models headers: consolidate period selector + refresh
  next to the page title and drop the redundant period badge.
* Bulk session management — "Delete empty (N)" button + per-row
  checkboxes with shift-click range select and a bulk-delete action
  bar. Backed by SessionDB.delete_sessions() /
  delete_empty_sessions() plus POST /api/sessions/bulk-delete and
  DELETE /api/sessions/empty (registered before the templated
  /api/sessions/{session_id} family so they don't get shadowed).
  Hard cap of 500 IDs per bulk request. Full pytest coverage.
* Cron page — human-readable schedule picker (every-interval / daily
  / weekly / monthly / once / custom) replaces the raw cron
  expression input; the job list now renders "Weekly on Mon, Wed,
  Fri at 14:30" instead of "30 14 * * 1,3,5". English-only ordinals
  for monthly schedules so non-English locales don't get incorrect
  suffixes.
* example-dashboard plugin moved from plugins/ to tests/fixtures/ so
  stock installs no longer ship the demo. Tests install it
  dynamically via a pytest fixture that also reorders the FastAPI
  routes.
* i18n: 40+ new keys for the bulk-select UI and schedule
  picker/describer translated across all 16 locales.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(dashboard): dedupe memory provider picker

The memory provider <Select> lived on both /system and /plugins,
writing the same config.yaml field through two different endpoints
with no cross-page refresh. Remove the picker from /system in favor
of a read-only status row + link to /plugins, where it pairs with
the context-engine picker under "Plugin providers".

/system retains the destructive admin controls (file sizes, Reset
MEMORY.md / USER.md / all). The api.setMemoryProvider client and
PUT /api/memory/provider backend endpoint are left in place for
CLI / script callers.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(dashboard): address Copilot review on PR #37383

- Backdrop layer-stack comment claimed LENS_5I-style themes override
  --component-backdrop-bg-blend-mode to multiply, but our only
  LENS_5I-style theme (nous-blue) keeps the default difference.
  Reword to describe what the code actually does and present the
  var as a forward-looking extension hook.
- /api/sessions/bulk-delete docstring promised the response would
  echo back the list of deleted IDs, but the implementation only
  returns {ok, deleted}. Tighten the docstring to match the wire
  format; the client already knows what it asked to delete, so the
  IDs aren't needed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address copilot review on cron describe + bulk-select checkbox

- schedule.ts: restrict `describeCronExpression` to strictly 5-field cron
  expressions. The backend `parse_schedule` also accepts the 6-field
  `min hour dom month dow year` form, and humanising those by
  destructuring only the first five fields would silently drop the year
  (e.g. ``0 9 * * * 2099`` rendered as "Daily at 09:00"). 6+ field
  expressions now fall through to the raw-string fallback so the user
  sees what's actually scheduled.

- SessionsPage.tsx (SessionRow): wire the bulk-select Checkbox's
  ``onClick`` directly instead of attaching it to a parent ``<span>``
  with a no-op ``onCheckedChange``. Radix forwards onClick to the
  underlying ``<button role=checkbox>``, so the same handler now drives
  both mouse clicks (preserving shift-key state for range select) and
  keyboard activation (Space on the focused checkbox, which the browser
  synthesises as a click on the <button>). Improves a11y / keyboard UX
  without changing the controlled-selection model.

- SessionsPage.tsx: also extend ``SessionRowProps`` with the new
  ``onRename`` / ``onExport`` props introduced on main so the row's
  destructured prop types resolve after the merge.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-02 12:37:40 -04:00
ethernet
a6b6afdff4
Merge pull request #36864 from maxmilian/fix/tui-reset-terminal-input-modes-on-exit
fix(cli): reset terminal input modes on TUI exit to stop focus/mouse leaks
2026-06-02 11:30:50 -04:00
Brooklyn Nicholson
267e7fd395 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/desktop-session-list 2026-06-02 09:27:34 -05:00
Teknium
afea650e16
fix(model-picker): OpenAI shows curated models; OpenRouter no longer phantom-shows (#37404)
The model picker now matches `hermes model` for OpenAI, and OpenRouter
stops appearing as authenticated when only OPENAI_API_KEY is set.

- models.py: provider_model_ids() for the default api.openai.com endpoint
  intersects the live /v1/models dump (120+ entries incl. embeddings,
  whisper, tts, dall-e, moderation, legacy chat) with the curated agentic
  list, preserving curated order. Custom OpenAI-compatible endpoints keep
  the live list verbatim so discovery still works.
- providers.py: drop extra_env_vars=("OPENAI_API_KEY",) from the openrouter
  overlay. list_authenticated_providers reads extra_env_vars to decide
  whether a provider is authenticated, so any OpenAI user saw a phantom
  OpenRouter row. Runtime OpenRouter credential resolution still falls back
  to OPENAI_API_KEY (runtime_provider.py), independent of the overlay.
- Regression tests for both paths.
2026-06-02 06:31:37 -07:00
Teknium
195c4d2a98
feat(streaming): per-platform streaming defaults (Telegram on, Discord off) + dashboard toggles (#37303)
Streaming quality differs sharply by platform: Telegram has native animated
draft streaming (sendMessageDraft) which is smooth, while Discord/Slack only
have edit-based streaming (repeated editMessage) which visibly flickers. Ship
defaults that match reality instead of one global flag.

- hermes_cli/config.py: DEFAULT_CONFIG display.platforms now ships
  telegram.streaming=true and discord.streaming=false (was empty {}). These
  are gap-fillers — config deep-merge has user values win, so anyone who
  explicitly sets discord.streaming=true keeps it. The global
  streaming.enabled master switch still gates everything; these per-platform
  flags only take effect once streaming is on.
- Dashboard exposure comes for free: the web settings schema is generated
  from DEFAULT_CONFIG, so display.platforms.telegram.streaming and
  .discord.streaming now surface as editable boolean toggles in the UI with
  no frontend change. (Previously the per-platform tree was {} and invisible.)
- tests: pin the defaults, the resolver outcome (telegram on / discord off /
  unlisted platforms follow global), user-override-wins, and dashboard schema
  exposure.

No _config_version bump: deep-merge fills the gap for existing installs; no
value migration needed.
2026-06-02 05:52:54 -07:00
Brooklyn Nicholson
135c65093a feat(desktop): stable in-workspace ordering + No-workspace default
- Sidebar: rows within a workspace group now sort by creation time instead of
  last activity, so they stop reshuffling every time a message lands (muscle
  memory). Groups still float up by recency.
- Sessions only persist a workspace cwd when one was explicitly chosen; an
  auto-detected launch directory is no longer stamped on the row, so untargeted
  sessions group under "No workspace" instead of "desktop". The agent still
  runs in the detected directory.
2026-06-02 07:18:47 -05:00
Brooklyn Nicholson
de8bdf529d fix(desktop): keep pinned + recent sessions visible across compression
Long-running sessions auto-compress: the gateway ends the original session
and surfaces the live continuation under a new id (list_sessions_rich projects
the root forward to its tip). Two symptoms fell out of the id rotation:

- A pinned session "vanished" — the pin is stored as the pre-compression root
  id, but the sidebar only matched on the live id, so it was filtered out.
  Pins now resolve on the durable lineage-root id (`_lineage_root_id`, already
  surfaced by the projection): the sidebar indexes sessions by both ids, pin/
  unpin and reorder operate on the durable id, and `sessionPinId()` is shared
  with the Cmd+P toggle. Existing pins keep working with no migration.

- A freshly-continued session was missing from the list until you ungrouped +
  "load 50 more" — the list paginated by original start time, so an old-but-
  active conversation sat past the first page. The desktop now requests
  `order=recent` (GET /api/sessions gains an `order` param backed by the
  existing recency CTE), surfacing live continuations on the first page.
2026-06-02 07:12:05 -05:00
Ben Barclay
c10ccaaf51
feat(dashboard-auth): rotate dashboard sessions via refresh token (#37247)
* feat(dashboard-auth): rotate dashboard sessions via refresh token

The dashboard auth-code grant now issues a 24h rotating refresh token
(server side: NousResearch/nous-account-service#293). This wires up the
Hermes client half so an expired access token is transparently refreshed
instead of bouncing the user to /login every 15 minutes.

plugins/dashboard_auth/nous:
- refresh_session() now POSTs grant_type=refresh_token to Portal's token
  endpoint and returns a Session carrying the ROTATED refresh token (was
  an unconditional RefreshExpiredError under the old "no RT in V1"
  contract). The RT is sent in BOTH the request body (Portal's schema
  requires it there) and the X-Refresh-Token header (log redaction) —
  verified against the #293 preview deploy: header-only is rejected as
  invalid_request, body is accepted.
- A 400 from Portal (expired / revoked / reuse-detected) maps to
  RefreshExpiredError so the middleware forces a clean re-login; network
  errors map to ProviderError; empty RT fast-fails without a network call.
- complete_login now captures the initial refresh token Portal returns
  (forward-tolerant: empty string if a deploy omits it).
- Extracted the shared token-response handling into
  _token_response_to_session, parameterised on the 400 exception type so
  the auth-code path raises InvalidCodeError and the refresh path raises
  RefreshExpiredError.
- revoke_session stays a best-effort no-op: Portal exposes no public
  token-endpoint revocation grant (revocation is the authenticated
  /sessions UI, keyed by sessionId+userId), so logout is cookie-clearing
  and the 24h session expires on its own. Documented for a future
  revoke grant.

hermes_cli/dashboard_auth/middleware:
- On an expired/invalid access token the gate now attempts refresh via
  the session's RT BEFORE forcing re-login. On success it serves the
  request and re-sets the rotated cookies on the response (mandatory:
  Portal rotates the RT every refresh and reuse-detects, so a stale RT
  cookie would revoke the whole session on the next refresh). On
  RefreshExpiredError (or no RT) it falls through to clear-and-relogin.
- ProviderError during refresh (Portal unreachable) forces a clean
  re-login rather than 500-ing the request.
- Uses the existing REFRESH_SUCCESS / REFRESH_FAILURE audit events.

Validation:
- 176 dashboard-auth unit/integration tests pass.
- Live E2E against the #293 preview deploy: refresh_session(bad rt) ->
  RefreshExpiredError through the real token endpoint; live JWKS fetch +
  RS256 verification rejects a forged token; empty-RT fast-fail. The
  successful happy-path rotation is covered by unit tests (a live run
  needs an interactive browser OAuth round trip + registered agent:*
  client).

Depends on: NousResearch/nous-account-service#293 (server-side RT issuance).

* fix(dashboard-auth): use Portal's x-nous-refresh-token header name

The refresh-token header must match Portal's REFRESH_TOKEN_HEADER exactly
("x-nous-refresh-token"); the initial cut used "X-Refresh-Token", which
Portal silently ignores (harmless since the RT is also in the body, which
is what the schema requires — but the header redaction was a no-op).
Confirmed against the NAS token route + re-validated live against the
#293 preview deploy.

* fix(dashboard-auth): refresh session when access-token cookie has been evicted

The gated middleware bounced users to /login the instant the access-token
cookie was absent, without ever consulting the refresh token:

    at, _rt = read_session_cookies(request)
    if not at:
        return _unauth_response(...)   # bailed here

This made transparent refresh effectively dead for the common case. The
access-token cookie is set with Max-Age = access_token_expires_in (~15 min),
so a real browser EVICTS hermes_session_at the moment the token lapses while
hermes_session_rt persists (30-day Max-Age). From that point the browser
sends only the refresh-token cookie — and the old guard rejected it before
_attempt_refresh could run. The _attempt_refresh path only fired for a
present-but-invalid access token, which never happens in a browser.

Fix: only hard-bounce when NEITHER cookie is present. A request carrying
just the refresh token now skips verification (no AT to verify) and flows
into the existing refresh path, which rotates both cookies and serves the
request transparently. A dead/expired RT still raises RefreshExpiredError
and falls through to clear-and-relogin.

This failure mode escaped the original tests + manual refresh button because
both kept the access-token cookie present; only a real browser evicting the
cookie at Max-Age exposes it. Added 3 regression tests covering: AT-evicted +
RT-present (transparent refresh), no-cookies (still bounces), and RT-only with
a dead RT (clean 401, no 500).
2026-06-02 21:16:41 +10:00
Jeffrey Quesnelle
89db6c8534
Merge pull request #37283 from NousResearch/fix-toolset-provider-selection-display
fix(desktop): reflect active toolset provider in config panel
2026-06-02 04:05:52 -04:00
Teknium
787936d133
feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250)
Introduce a typed agent→gateway delivery contract so the gateway (not the
agent) decides how each streaming event is rendered per platform. Moves toward
smart-agent/smart-gateway separation while reproducing today's behavior exactly
in the base class.

- gateway/stream_events.py: typed event vocabulary (MessageChunk/Stop,
  Commentary, ToolCallChunk/Finished, LongToolHint, GatewayNotice).
- gateway/stream_dispatch.py: GatewayEventDispatcher routes events through the
  adapter; adapters can eat events they can't render (e.g. tool chrome on
  plain-text platforms).
- gateway/platforms/base.py: render_message_event + format_tool_event default
  hooks reproduce the historical emoji/preview tool formatting and consumer
  delegation 1:1; adapters override for native rendering.
- gateway/platforms/telegram.py: send_draft now applies MarkdownV2 (format_message
  + parse_mode) with a plain-text fallback on BadRequest, fixing the jarring
  raw-text→formatted shift when the draft finalizes as a real sendMessage.
- gateway/config.py: default streaming transport edit → auto. Safe globally:
  adapters without draft support report supports_draft_streaming()==False and
  transparently use edit, so only Telegram DMs gain native drafts.

Presentation-only contract — nothing rendered here is persisted to conversation
history, preserving cache/message-flow invariants.
2026-06-02 00:33:50 -07:00
Teknium
2c0d648397
fix(cron): sanitize invisible unicode in vetted skill content instead of hard-blocking (#37245)
A stray zero-width space (U+200B), BOM, or bidi control in loaded skill
markdown permanently killed any cron that loaded it. The skills-attached
assembled-prompt scan hard-blocked on any invisible-unicode char, even
though skill bodies are already install-time vetted by skills_guard.py and
the chars commonly appear in copy-pasted unicode docs / code examples.

The skills path now strips invisibles (logging the codepoints) and runs the
cleaned prompt. The raw user-prompt path (_scan_cron_prompt) keeps the hard
block — that is the actual #3968 injection surface, where a small directive
prompt with a ZWSP is a smoking gun, not prose. Stripping does not let a real
injection slip through: the directive still matches after sanitization.

_scan_cron_skill_assembled now returns (cleaned_prompt, error).
2026-06-02 00:29:44 -07:00
emozilla
134643a2fa fix(desktop): reflect active toolset provider in config panel
The toolset config panel highlighted the first keyless provider (e.g.
Nous Portal) on load instead of the provider actually written to config.
The /api/tools/toolsets/{name}/config endpoint never reported which
provider was active, so the GUI's default-expand logic fell back to
"first configured" — and keyless providers are always "configured".

Backend now annotates each provider with is_active (via the same
_is_provider_active helper the CLI 'hermes tools' picker uses) plus a
top-level active_provider summary. The panel prefers that signal before
falling back to first-configured/first.

Adds a frontend regression test (active provider is expanded on load)
and backend coverage (config reports is_active/active_provider; selecting
a provider round-trips into the next config read).
2026-06-02 03:25:46 -04:00
Teknium
0269eca7e1
test(minimax): assert M3 stale-cache guard contract, not a brittle 1M literal (#37220)
test_stale_m3_cache_dropped_and_reresolves_to_1m hardcoded
assert ctx == 1_000_000. The test re-resolves M3 through the live models.dev
registry (the seeded stale entry is dropped, so nothing short-circuits the
lookup), and models.dev now reports MiniMax-M3 at 512,000 — a change-detector
failure unrelated to any code change.

The guard's actual contract is: a stale <=204,800 catch-all value for an M3
slug must be DROPPED and re-resolved to M3's real (large) context. Both
sources satisfy that (hardcoded catalog 1,000,000; models.dev 512,000), so
assert the invariant (ctx > 204,800, stale value gone) instead of a literal
that external data can move. Renamed accordingly.

47/47 in test_minimax_provider.py pass.
2026-06-01 23:35:23 -07:00
Evi Nova
81dd43a8eb
fix(docker): preserve Docker -w workdir in main-wrapper (#35472) (#36259)
Save the original working directory before init scripts cd to
/opt/data, then restore it before exec'ing the user command, so
the container starts in the Docker -w directory instead of /opt/data.

Adds regression test verifying cwd save/restore ordering in
main-wrapper.sh.
2026-06-02 16:13:44 +10:00
Teknium
272c2f30aa
fix(kanban): kanban_create inherits the spawning worker's task workspace (#37182)
When a dispatcher-spawned worker (HERMES_KANBAN_TASK set) calls
kanban_create without an explicit workspace, the new child now inherits
the worker's own running-task workspace_kind/workspace_path instead of
defaulting to scratch. A worker editing a dir:/worktree project that
spawns a follow-up child keeps it in that project.

Orchestrators (kanban toolset, no HERMES_KANBAN_TASK) and CLI/dashboard
callers still default to scratch. An explicit workspace arg always wins.
2026-06-01 21:26:29 -07:00
Teknium
bd8e2ec1a6
feat(dashboard): complete admin panel — MCP catalog, enable/disable toggles, hook creation, system stats (#36736)
* feat(dashboard): MCP catalog + enable/disable, webhook toggle, hook create/delete, system stats

Backend for the comprehensive admin pass:
- MCP: GET /api/mcp/catalog (browse Nous-approved optional-mcps), POST
  /api/mcp/catalog/install, PUT /api/mcp/servers/{name}/enabled
- Webhooks: PUT /api/webhooks/{name}/enabled; gateway rejects disabled routes
  with 403 (hot-reloaded, no restart)
- Hooks: POST/DELETE /api/ops/hooks — create (with consent approval) + remove;
  list now reports accurate allowlist status + valid events
- System: GET /api/system/stats — OS/arch/python/cpu + psutil memory/disk/
  uptime/process, stdlib fallback

All gated by dashboard auth; secrets never returned.

* feat(dashboard): MCP catalog UI, enable/disable toggles, hook create, system stats

- McpPage: catalog section (browse Nous-approved MCPs, one-click install with
  env prompts) + per-server enable/disable toggle with gateway-restart note
- WebhooksPage: per-subscription enable/disable toggle (muted + badge when off)
- SystemPage: new Host stats section (OS/arch/python/cpu/mem/disk/uptime/load),
  shell-hook create modal + delete, 'Create backup' label
- api.ts: client methods + types for catalog, toggles, hook CRUD, system stats

* test(dashboard): cover catalog, toggles, hook CRUD, system stats, webhook toggle

Adds tests for the comprehensive pass: MCP enable/disable + catalog list +
catalog-install-unknown, hook create/delete with consent, system stats shape,
and webhook enable/disable. 26 tests total, all green.

* docs(dashboard): document the comprehensive admin pass + fresh screenshots

Updates the MCP/Webhooks/Pairing/System sections for catalog browse+install,
enable/disable toggles, hook creation, and host system stats; adds the new
endpoints to the API table; replaces the screenshots with live captures of
the rebuilt pages (real data, no dummies) including the hook-create modal.

* feat(dashboard): curator, portal status, and prompt-size/dump/migrate ops

Closes the last in-scope CLI gaps from the coverage audit:
- Curator: GET /api/curator (status), PUT /api/curator/paused, POST
  /api/curator/run (background)
- Portal: GET /api/portal (Nous auth + Tool Gateway routing, read-only)
- Diagnostics: POST /api/ops/prompt-size, /api/ops/dump, /api/ops/config-migrate
  (backgrounded, tailed via action status)

Host-bound commands (secrets/proxy/lsp/acp/computer-use/desktop/completion/
postinstall/uninstall/claw) remain CLI-only by design.

* feat(dashboard): curator + portal + diagnostics UI, tests

- SystemPage: Nous Portal status section (auth + Tool Gateway routing),
  Skill curator card (status + pause/resume + run now), and three new
  Operations buttons (prompt size, support dump, migrate config)
- api.ts: client methods + CuratorStatus/PortalStatus types
- tests: curator pause/resume, portal shape, system-stats shape, + auth-gate
  coverage for the new GET endpoints (31 tests total)

* docs(dashboard): document curator, portal, and diagnostics + refresh System screenshots

Updates the System section for the Nous Portal status, Skill curator
controls, and the new prompt-size/dump/migrate operations; adds them to the
API table; refreshes the System screenshots (now showing Portal + Curator)
and adds a dedicated curator/gateway/memory capture.

* feat(dashboard): session stats/export/prune + skills hub search endpoints

Completes the existing tabs' backend depth (audit vs CLI):
- Sessions: GET /api/sessions/stats (store stats), GET /api/sessions/{id}/export,
  POST /api/sessions/prune. /stats is registered before /{session_id} so the
  literal path isn't captured by the parameterized route.
- Skills: GET /api/skills/hub/search — parallel multi-source hub search (threaded),
  returns installable identifiers
- (rename via PATCH and cron-edit via PUT already existed; now surfaced in UI)

* feat(dashboard): complete existing tabs — sessions mgmt, skills hub browse, cron edit

Audited every existing tab against its CLI command and filled the gaps:
- Sessions: store stats bar, per-row rename + export (JSON download), and a
  prune-old-sessions control (mirrors hermes sessions rename/export/prune/stats)
- Skills: new 'Browse hub' view — search the skill hub across all sources,
  install by identifier with a live install log, and 'Update all' (mirrors
  hermes skills search/install/update)
- Cron: per-job Edit modal (pre-filled) calling updateCronJob (hermes cron edit)
- api.ts: renameSession/getSessionStats/exportSessionUrl/pruneSessions,
  updateCronJob, searchSkillsHub + types

Models tab was already comprehensive (provider+model picker, dynamic per-provider
lists, main + all 11 aux-task assignments, reset) — verified, no change needed.

* test(dashboard): cover session stats/rename/export/prune + skills hub search

Adds the route-shadowing guard for /api/sessions/stats (must not be captured
by /api/sessions/{session_id}), rename/export/prune, and the empty-query
short-circuit for hub search. 36 tests total, all green.

* docs(dashboard): document enhanced Sessions, Skills hub, and Cron edit

Sessions: stats bar, rename, export, prune (+ screenshot). Skills: new Browse
hub view for search/install/update (+ screenshot). Cron: edit action. API
table updated with the new endpoints.
2026-06-02 00:16:11 -04:00
whyhkzk
1495f0cc38
fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049) (#32407)
* fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049)

Brian's shape-based guard (#32213) catches paths that still carry the
full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side.
The inner-container case is not covered: when file tools execute inside
Docker the bind-mount strips that prefix, so the guard receives plain
/root/.hermes/… and passes through. The root:root ownership on the
divergent SOUL.md in #32049 confirms this is the primary failure mode.

Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment
when persistent=True. classify_container_mirror_target / get_container_
mirror_warning detect any write whose resolved path falls under that
prefix, using the same warning format and cross_profile=True bypass
contract as the existing guards. Chain the new guard in
_check_cross_profile_path after the two existing detectors.

* fix(file-safety): derive Docker mirror guard from task

---------

Co-authored-by: Ben <ben@nousresearch.com>
2026-06-02 14:03:37 +10:00
Stephen Chin
a5aecf26fa feat(kanban): gate notifier watcher on dispatch_in_gateway
Non-dispatch gateways no longer open per-board kanban DBs for notifier
polling. Mirrors the existing dispatcher gate (config
kanban.dispatch_in_gateway, default True; env override
HERMES_KANBAN_DISPATCH_IN_GATEWAY) so multi-gateway setups collapse to a
single process holding kanban.db file descriptors.

Salvaged from PR #31964 by @steveonjava; tests and docs trimmed during
salvage.
2026-06-01 20:30:24 -07:00
xxxigm
c35ede789f refactor(cli): normalize note and avoid blank lines in prepend helper
Adopt the cleaner handling from PR #37080: coerce/strip the note and
skip the extra newlines when the underlying message (or text part) is
empty, while keeping the safer fail-open behavior for unknown shapes.
2026-06-01 20:30:08 -07:00
xxxigm
a26a12ad07 test(cli): cover _prepend_note_to_message str/list handling
Regression coverage for the multimodal-message TypeError: note folding into
text parts, image-only insertion, empty-note passthrough, and unknown-shape
fail-open.
2026-06-01 20:30:08 -07:00
Teknium
21f55af769
fix(model-picker): stop routing OpenAI selection to OpenRouter (#37175)
The /model picker emitted a standalone slug=openai row (gated on
OPENAI_API_KEY). Selecting it ran resolve_provider_full("openai"),
which resolved the legacy providers.py alias openai->openrouter BEFORE
checking the user's own providers.openai config — silently switching
users onto OpenRouter (HTTP 401 when they have no OR key).

- model_switch.list_authenticated_providers: skip vendor names that are
  aliases to an aggregator (isolates openai->openrouter; copilot/kimi/etc.
  are real providers and unaffected). Kills the phantom picker row.
- providers.resolve_provider_full: user-config providers.<name> now wins
  over the built-in alias table, so providers.openai (api.openai.com)
  beats the alias.
- model_switch PATH A: user-config providers resolve credentials via
  their own endpoint instead of the name-based runtime resolver that
  doesn't know user-config slugs; plus a fail-loud guard for explicit
  unauthed-aggregator hops.

Verified E2E with the reporter's config (no OR key): selecting OpenAI +
gpt-4o-mini now resolves to api.openai.com instead of openrouter.ai.
2026-06-01 20:27:41 -07:00
Teknium
72e82f88c0
fix(kanban): decompose children inherit root workspace instead of forcing scratch (#37172)
decompose_triage_task hardcoded every fan-out child to workspace_kind
'scratch', ignoring the root task's workspace. A code-gen task created
with a dir:/worktree: workspace would fan out into throwaway scratch tmp
dirs (GC'd on archive), so generated code never landed in the project.

Children now inherit the root's workspace_kind + workspace_path. A child
dict may still override with its own workspace_kind/workspace_path; the
path only carries over when kinds match. Scratch roots are unchanged.
2026-06-01 20:26:57 -07:00
teknium1
fa3b06b035 refactor(telegram): generalize observed-media caching into a reusable primitive
Collapse the per-type observed-media dispatch into one platform-agnostic
cache_media_bytes() helper in gateway/platforms/base.py. Any adapter can now
hand it raw attachment bytes + a filename/MIME hint; it classifies against the
shared MIME registries, routes to the right cache_*_from_bytes helper,
sandbox-translates the path, and returns a CachedMedia with a ready
context_note(). Telegram's observed-group path shrinks to: size-gate, download,
call the helper, annotate. Also dedupes the addressed-media type ladder into
_media_message_type().

Net: contributor's Telegram-only +595 LOC becomes a +210/-32 production change,
with the reusable primitive available to Discord/Slack/Signal/etc.

Co-authored-by: Glucksberg <markuscontasul@gmail.com>
2026-06-01 20:18:41 -07:00
Glucksberg
f768e75ecf fix(telegram): cache observed group media 2026-06-01 20:18:41 -07:00
Stephen Schoettler
f24b7ed9d9 fix: make Honcho startup fail open 2026-06-01 20:13:42 -07:00
Teknium
59510d7b44
feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143)
Skills discovery surfaced ~136 of 88k skills in the CLI and gave community
skills no clickable source on the docs page. Three coupled fixes:

CLI browse:
- hermes skills browse capped at 50 because the per-source limit dict had no
  'hermes-index' key — when the centralized index is available the router
  skips external APIs and serves only the index, so the default-50 fallthrough
  silently truncated the whole hub. Add hermes-index: 5000. Browse now loads
  5367 (269 pages) instead of 136.
- Add an Identifier column + install/inspect hint to the browse table so users
  can act on what they see without a second 'search'.
- Route the TUI browse_skills() helper through parallel_search_sources so it
  inherits the same index-aware source-skip (was double-counting); expose
  identifier in its output.

Docs Skills Hub page:
- Synthesize a sourceUrl for every community skill (github tree URL, clawhub /
  skills.sh / lobehub / browse.sh detail pages), preferring the adapter's
  explicit extra.detail_url/source_url/repo_url. Expanded cards now show
  'View source' for community skills (was nothing) and keep 'View full
  documentation' for built-in/optional. 99% coverage.
- Add a Copy button on the install command.
- Add a loading state instead of flashing '0 skills / No skills found' while
  the 45MB catalog fetches.

Category cleanup:
- _guess_category fell back to tags[0] verbatim, producing ~430 junk one-off
  categories (version strings, brand names: '0.10.7 Dev', 'Doramagic Crystal').
  Now only curated buckets are accepted; unknowns fold into 'Other'. Widen the
  tag->category map so common community tags route to real buckets. 430 -> 173
  categories, top 20 all meaningful.

Tests: tests/website/test_extract_skills.py covers _source_url synthesis +
precedence and _guess_category curation (13 tests). All 27 skills-hub CLI
tests still pass. Docusaurus build verified; expanded cards confirmed in
browser for both community (View source) and built-in (View full docs).
2026-06-01 19:52:28 -07:00
Zyrixtrex
0cd5867bbb fix(whatsapp): honor dm_policy and group_policy open at the gateway 2026-06-01 19:51:21 -07:00
kyssta-exe
d4b533de4e fix: batch of small robustness/correctness fixes from @kyssta-exe
Salvages 8 distinct fixes from a batch of PRs by @kyssta-exe, reapplied
onto current main (original branches were stale) with a few refinements.

- cron(jobs.py): load_jobs() validates top-level JSON shape — a bare
  list auto-repairs into the {"jobs": [...]} dict; scalars/null raise a
  clear RuntimeError instead of an uncaught AttributeError that took
  down the whole cron subsystem (#37065, closes #36867).
- web(web_server.py): close the per-action log file handle after Popen
  so the parent stops leaking one fd per spawned action (#36843).
- web(web_server.py): DELETE /api/env returns 400 for invalid key names
  instead of a misleading 500, mirroring PUT /api/env (#36840).
- gateway(gateway.py): read /proc/<pid>/cmdline inside a with-block so
  the fd is released immediately instead of relying on GC (#36804).
- web-tools(web_tools.py): include "xai" in check_web_api_key() so a
  configured X.AI web backend reports as available (#36802).
- compression(conversation_compression.py): mark the feasibility check
  done only after it completes, and default the gate to "not checked"
  if the attribute is missing (#36803).
- completion(completion.py): replace `ls` with directory globbing in the
  generated bash/zsh/fish profile listers — handles names with spaces
  and skips non-directory entries (#36806).
- terminal-tool(terminal_tool.py): drop a duplicate `import threading`
  (#36808).
- claw(claw.py): the migrate recommendation now points at the real
  `hermes gateway stop` command instead of the non-existent
  `hermes stop` (#36795, #36796, closes #36771).
- tests: guard against a leaked HERMES_CRON_SESSION breaking gateway
  approval tests — add it to the hermetic conftest unset list (root
  cause, protects every test) and pop it in the affected test's
  setup_method (#36796).

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-01 19:51:03 -07:00
teknium1
64f7f36713 fix(mcp): make non-MCP HTTP endpoint fast-fail robust and non-retryable
Reworks the content-type preflight so a misconfigured HTTP MCP url (a web-app
root serving HTML) fails in <1s instead of hanging the full 60s connect_timeout
— and does so non-retryably, which neither original PR achieved.

- Allow-list detection (application/json, text/event-stream) instead of a
  text/html-only denylist — catches text/plain, application/xml, etc.
- New NonMcpEndpointError(ConnectionError); run() catches it in the same
  top-level fast-fail block as InvalidMcpUrlError, so it returns before the
  reconnect-backoff loop (truly non-retryable) and the probe runs once, not
  on every reconnect.
- Probe runs on its own httpx client OUTSIDE the SDK anyio task group, so the
  error propagates as itself rather than wrapped in an ExceptionGroup (the
  trap that made the in-SDK event-hook approach a no-op).
- Forwards ssl_verify + client_cert + headers; HEAD->GET fallback on 405/501;
  best-effort pass-through on missing content type, non-2xx, and network
  errors; skips SSE transport. CancelledError is never swallowed.
- Replaces the malformed test file (which never imported the real method and
  failed CI) with 21 tests driving the actual _preflight_content_type against
  a real local HTTP server, plus full run() integration verifying <1s
  non-retryable failure.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: uzunkuyruk <egitimviscara@gmail.com>
2026-06-01 19:49:50 -07:00
liuhao1024
c914e4a371 fix(mcp): fail fast on HTML content-type instead of waiting full connect_timeout
A misconfigured MCP server URL that returns text/html (e.g. pointing at
a web app root instead of an MCP endpoint) causes the MCP SDK to block
for the full connect_timeout (default 60 s) before surfacing
CancelledError.

Add a lightweight HEAD pre-flight check that detects text/html responses
in ≤5 s and raises ConnectionError with an actionable message. Non-HTML
responses, missing headers, and network errors pass through silently so
the normal MCP handshake proceeds unaffected.

Fixes #36052
2026-06-01 19:49:50 -07:00
brooklyn!
fabca0bdd8
feat(tui): single /model command + unified Sessions overlay (#37112)
* feat(tui): single /model command + unified Sessions overlay

Collapse the redundant `/provider` alias so `/model` is the only name
everywhere (it already drove the same 2-step ModelPicker in the TUI).

Merge the separate `/resume` (cold history browser) and `/sessions` (live
switcher) surfaces into one Sessions overlay reached by `/resume`,
`/sessions`, `/session`, and `/switch`. It pins a "+ new" row at the top
(always visible), lists live sessions with status, and lists resumable
history below — dispatching session.activate for live rows vs resume for
cold ones, with close/delete in place. Fixes `/session` opening an empty
live-only switcher and the hidden new-session affordance.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix(tui): address Copilot review on the Sessions overlay

- Track the armed history-delete by session id instead of row index so the
  1.5s live-status poll re-indexing rows can't redirect the second `d` to a
  different session.
- Re-add the busy-session guard to immediate `/resume <id>` and `/sessions new`
  actions (browsing the bare overlay stays allowed) so resuming/switching can't
  corrupt an in-flight turn's streaming/busy state.

* fix(tui): guard cold-resume (not live-switch/new) from the Sessions overlay

Copilot flagged that overlay actions bypassed the busy guard. Only cold
resume actually closes the current session, so only it is guarded — both
from the slash path and now from the overlay (appActions.resumeById).
Switching between live sessions and starting a `+ new` live session keep
the current session running in the background, so they stay unguarded:
that concurrency is the orchestrator's whole purpose. Also dropped the
over-broad guard on `/sessions new` for the same reason.

* fix(tui): address Copilot review (history dedup + desktop /provider)

- The 1.5s poll now re-derives the resumable list from the RAW session.list
  results (rawHistoryRef) against the current live set, so a session hidden
  while live reappears in history once it closes — instead of being lost
  until a full reload. Delete also prunes the raw ref.
- Drop the dead `/provider` entry from the desktop PICKER_OWNED_COMMANDS now
  that the alias is gone, so the desktop client no longer advertises it.

* fix(tui): surface session.list errors + keep selection stable across polls

- A garbled session.list response now surfaces an error and preserves the
  last good raw history, instead of silently blanking the resumable section.
- The 1.5s poll re-anchors the selection to the same row by session id
  (live or history) when the live list grows/shrinks, so the highlight no
  longer drifts to a different row mid-interaction.

* fix(tui): degrade session.list independently + cover overlay helpers

- Fetch active_list and session.list via Promise.allSettled so a failing
  session.list no longer rejects the whole load: live sessions still render
  and only the resumable history degrades (with an error).
- Add unit tests for the new helpers (sessionRowKindAt row ordering,
  resumableHistory dedupe, sessionsCountLabel, relativeSessionAge).

* test(tui-gateway): assert /provider alias is gone, /model remains

The CI test_complete_slash_includes_provider_alias asserted the removed
`/provider` alias still autocompleted. Flip it to lock in the removal:
`/pro` no longer offers `provider`, and `/mod` still completes `model`.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-01 22:28:36 -04:00
Zyrixtrex
f7a3509b25 fix(gateway): honor WECOM_ALLOWED_USERS in env-only WeCom DM allowlist 2026-06-01 19:20:36 -07:00
Julien Talbot
8104b20269 fix(xai): route video models by modality 2026-06-01 19:00:30 -07:00
Ben Barclay
eee32cdd52
fix(gateway): fall back to in-process heartbeat when s6 sleep is missing (#36208) (#37120)
Inside an s6 container, `gateway run` redirects to the supervised
gateway and then keeps the CMD process alive as a no-op heartbeat so
/init doesn't start stage-3 shutdown. That heartbeat is
`os.execvp("sleep", ["sleep", "infinity"])`, which does a PATH lookup
for the `sleep` binary. When PATH was empty/truncated/clobbered at that
point — e.g. after user customizations rewrote PATH, or on a minimal
image without `sleep` on PATH — the exec raised FileNotFoundError,
killing the CMD process and causing /init to tear down every service:
the container failed to start (issue #36208, a regression in the s6
image from 2026.5.28).

Wrap the exec in try/except OSError: on success it still replaces the
process with the cheap `sleep` heartbeat (no resident Python
interpreter, and the existing process-tree/recursion contract is
preserved); on failure it falls back to `_block_until_terminated()` —
a SIGTERM handler (clean 128+signum exit on `docker stop`) plus a
signal.pause() loop, which needs no external binary and so can't fail
on PATH state. A threading.Event().wait() fallback covers platforms
without signal.pause().

Keeping execvp as the primary path (rather than replacing it outright)
preserves the `sleep infinity` heartbeat that the docker integration
tests assert (test_gateway_run_supervised.py) and avoids leaving a
full Python interpreter resident for the container's lifetime.

Verified end-to-end on a built image: with execvp forced to fail,
_block_until_terminated() blocks cleanly instead of raising
FileNotFoundError; normal boot still runs the cheap `sleep infinity`
heartbeat; the 6 test_gateway_run_supervised.py integration tests pass.

Salvages the two community fixes for this issue — the fallback design
from #36221 (@Pluviobyte) and the signal.pause() heartbeat from #36267
(@karmeleon) — and adds regression tests for both the normal and
sleep-missing paths.

Co-authored-by: Pluviobyte <Pluviobyte@users.noreply.github.com>
Co-authored-by: karmeleon <karmeleon@users.noreply.github.com>

Closes #36208.
2026-06-02 11:59:27 +10:00
Trevin Chow
05022066ea feat(bluebubbles): support group mention gating 2026-06-01 18:52:05 -07:00
brooklyn!
85b65e29f0
feat(desktop): session hygiene, archive, media streaming + connecting overlay (#37099)
* feat(desktop): session hygiene, archive, media streaming + connecting overlay

Address a batch of desktop feedback:

- Stop leaking empty "Untitled" sessions: the TUI gateway pre-created a DB
  row on every session.create (i.e. every launch/draft). Persist the row
  lazily on first prompt instead, and hide message-less rows in the sidebar.
- Archive/hide sessions: new `archived` column + set_session_archived, web
  API (`?archived=` + PATCH archived), Ctrl/⌘-click and a context-menu item
  in the sidebar, and an "Archived Chats" settings panel to restore/delete.
- Videos load via a streaming `hermes-media://` protocol instead of capped,
  in-memory data URLs (16 MB limit) — bypasses the cap and supports seeking.
- Background-process completions route to the session that launched them:
  the completion event now carries session_key and each poller only consumes
  its own.
- Sidebar: "Group by workspace" toggle is always visible; each workspace
  group gets a "+" to start a session in that directory; "New agent"/"Agents"
  relabeled to "New session"/"Sessions".
- New gateway connecting overlay (ascii decode → fade out) replacing the bare
  skeleton/"starting gateway" state.

* fix(desktop): bail connecting overlay on boot error

The shownRef latch kept the connecting overlay mounted behind
BootFailureOverlay after a hard boot failure. Return null on boot.error
so the failure recovery surface fully owns the screen.

* fix(desktop): address Copilot review

- /api/sessions: validate `archived` (400 on unknown) and return `archived`
  as a JSON boolean instead of SQLite's 0/1.
- PATCH /api/sessions/{id}: 400 (not a misleading 404) when the body has no
  updatable fields; stop conflating a no-op with "not found".
- hermes-media protocol: drop `bypassCSP` — streaming only needs
  secure/standard/stream/supportFetchAPI.
- Sidebar workspace header: split the toggle and the "+" into sibling buttons
  so we no longer nest interactive elements inside a <button>.

* fix(desktop): address Copilot re-review

- hermes-media protocol: restrict streaming to an audio/video extension
  allowlist (415 otherwise) so it can't be used to read arbitrary local files.
- Connecting overlay: use z-[1200] instead of the non-standard z-1200 utility.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-01 20:41:34 -05:00
Brian D. Evans
162c7856ca
fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32213)
#32049 reports that under terminal.backend: docker, write_file / patch
calls to authoritative profile state (SOUL.md, memories, etc.) land on
the sandbox-local mirror at
``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...``
— a path the host Hermes process never reads. The tool reports success,
the user sees no behavior change, and on disk two divergent copies of
SOUL.md (or any other profile file) accumulate.

The existing classify_cross_profile_target guard does not catch this:
its parts[2] check sees "sandboxes" and returns None, and the path is
in-profile from the inner-mirror perspective so even a fixed version
would not fire.

Add a parallel sandbox-mirror classifier in agent/file_safety:

  * classify_sandbox_mirror_target() detects the
    ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts.
    Detection is path-shape only — backend-agnostic, does not require
    the file to exist, and works regardless of which HERMES_HOME resolves.
  * get_sandbox_mirror_warning() returns a model-facing warning that
    names the mirror root and the inner authoritative path the agent
    likely meant.

Wire both detectors through tools/file_tools._check_cross_profile_path
so the existing write_file and v4a patch call sites pick up the new
guard with no API change. The bypass kwarg (``cross_profile=True``)
remains shared between the two guards — same "I know what I'm doing"
escape valve after explicit user direction.

This is the defense-in-depth piece of the proposal in #32049 ("any
…/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It
catches the host-side speculation case where the agent writes a literal
sandbox-mirror path. The inner-container case (where the bind mount
strips the ``sandboxes/`` prefix from the agent's path view) is out of
scope for this surgical change — that requires either a dispatch-layer
host-side check before the container handoff, or the host-side
``profile_state`` / ``soul`` tool the issue also proposes.

Soft guard, NOT a security boundary — matches the existing
classify_cross_profile_target contract.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Co-authored-by: Ben Barclay <ben@nousresearch.com>
2026-06-02 11:29:24 +10:00
firefly
765790a216 test(weixin): regression suite for _api_post/_api_get timeout migration 2026-06-01 17:31:40 -07:00
firefly
a1f76ba7e9 fix(gateway): recover extract-stripped tool responses on all platforms (#29346)
The extract pipeline (extract_media/extract_images/extract_local_files +
directive strips) can reduce a non-empty tool-using response to empty
text_content with no deliverable attachment. The 'if text_content' send
guard then silently skips delivery: a 'response ready' log with no
'Sending response', no error, and the answer never reaches the user.

- A2: snapshot the pre-extract response; when extraction yields empty text
  and no image/local/media attachment, deliver the recovered original from
  the post-extract_media body (so a spaced MEDIA path can't leak). Applies
  on ALL platforms (supersedes the Discord-only #33842 and the unsafe
  raw-fallback #29499).
- A3: loud delivery invariant - a non-empty response that produces nothing
  deliverable logs response_delivery_dropped at ERROR; every recovery logs
  response_delivery_recovered. No silent drop survives.
- Factor a _strip_media_directives helper for the [[...]] strips; MEDIA
  stripping stays owned by extract_media, whose grammar handles spaced and
  quoted paths.
- Salvaged + de-scoped the #33842 test harness to all platforms; added
  unrecoverable-drop and no-leak regression tests.
2026-06-01 17:31:32 -07:00
firefly
8bf498c21d fix(gateway): scope final-delivery flags to turn-final segment (#29346)
A streamed preamble ("Let me search...") finalized at a tool boundary
routed through _try_fresh_final, which unconditionally set
_final_response_sent=True even though it is a NON-final segment. The
gateway then reads that flag as "final delivered" and suppresses the
genuine final answer produced on the next API call, so the user silently
gets nothing. Only reproduces with fresh_final_after_seconds > 0.

- _try_fresh_final / _send_or_edit take is_turn_final; the segment-break
  call site passes is_turn_final=got_done so only the turn-final answer
  marks final-delivered.
- _reset_segment_state clears the final-delivery flags at every tool
  boundary as defense-in-depth against any future premature setter.
- Failing-first regression + happy-path no-duplicate test.
2026-06-01 17:31:32 -07:00
kshitijk4poor
0fdab53ef0 feat(cli): ranked fuzzy search in the curses model picker
Wires the salvaged search helpers into the shared curses menu driver and
turns on type-to-filter for the CLI model pickers (the 100+ model lists
that previously required scrolling).

- Search lives in the shared `_run_curses_menu` driver behind a
  `searchable` flag + `search_labels`, so both `curses_radiolist` and
  `curses_single_select` get it without per-menu duplication. `/` opens
  the filter, BACKSPACE edits, Ctrl+U clears, ESC clears the filter then
  cancels. Returned values are always original item indices.
- `_filter_indices` RANKS matches (best-first) via a Python port of the
  TS scorer in ui-tui/src/lib/fuzzy.ts and web/src/lib/fuzzy.ts. The port
  is byte-identical in score: same per-char bonuses, prefix (+8) and
  exact (+20) bonuses, camelCase/word-boundary detection (matching on the
  lowercased target, boundary on the original case), and the -len*0.01
  length tiebreak — so the CLI, TUI, and WebUI rank results identically.
  A cross-language parity test pins the exact scores.
- `_prompt_model_selection` (the canonical picker across the model flows)
  and the custom-provider model list pass `searchable=True`.
- Split `_decode_menu_key` out of `read_menu_key` so the search loop can
  peek the raw key (catch `/`) before nav decoding.
- ESC during active search now clears the query (restores the full list)
  so a no-match filter can't strand the user; printable-key capture is
  restricted to ASCII to avoid Latin-1 mojibake.
- Update two setup-menu tests whose mock signatures predate the new
  `searchable` kwarg; add ranked-scorer + parity + state-machine tests.
2026-06-01 16:58:58 -07:00
Harish Kukreja
53f598e7a2 feat(cli): add fuzzy search helpers for curses pickers
Pure, refactor-independent helpers for type-to-filter search in the
curses single-/radio-select menus: subsequence matching, filtered-index
mapping, cursor reconciliation, scroll clamping, and an active-search
key handler, plus unit tests.

Salvaged from #22758 (the curses event loop was since refactored into a
shared driver on main, so the integration is rebuilt in a follow-up
commit; these pure helpers and their tests carry over unchanged).
2026-06-01 16:58:58 -07:00
firefly
128da68823 test(tools): characterize tool-surface TERMINAL_CWD contract (#29265)
Port PR #29365's tool-surface contract test: terminal/file/execute_code
already honor TERMINAL_CWD (out of scope for the resolver cluster). Pinning
the behavior makes the supersession of #29365 airtight and guards against a
future refactor silently regressing the workspace contract.
2026-06-01 16:55:04 -07:00
firefly
ac0cce5f3f test(agent): pin whitespace-strip and OSError-propagation in runtime_cwd
Cover the two new hardening behaviors that were unpinned: whitespace-only
TERMINAL_CWD falling through to getcwd/None, and OSError from the getcwd
fallback arm propagating to the build_environment_hints try/except guard.
2026-06-01 16:55:04 -07:00
firefly
75f478750c docs(test): correct None-semantics comment in test_runtime_cwd (discovery not skipped) 2026-06-01 16:55:04 -07:00
firefly
f90777a6b8 refactor(prompt): route context-file cwd through runtime_cwd resolver 2026-06-01 16:55:04 -07:00
firefly
c79b80a8a5 test(prompt): place cwd regression tests in TestEnvironmentHints (drop redundant docker case) 2026-06-01 16:55:04 -07:00
firefly
16047655b5 fix(prompt): show configured working directory in system prompt (closes #24882, #24969, #27383, #29265) 2026-06-01 16:55:04 -07:00