Commit graph

13028 commits

Author SHA1 Message Date
kyssta-exe
07cc567dfa fix(security): add circuit breaker for tirith crashes to prevent agent hangs (#41400) 2026-06-26 15:26:08 +05:30
brooklyn!
ca82d0accc
Merge pull request #52993 from NousResearch/bb/desktop-clarify-redesign
feat(desktop): redesign the clarify prompt + fix its awaiting-input states
2026-06-26 03:57:43 -05:00
Brooklyn Nicholson
54b50037e1 fix(desktop): treat a pending prompt as paused-on-you, not working
A clarify/approval/sudo/secret prompt blocks the turn on the user, but the UI
treated it as an in-flight turn: the "thinking" timer kept ticking and Esc
interrupted the run — discarding a question you might want to come back to. Add
$activeSessionAwaitingInput (the pet's awaitingInput concept, scoped to the
active session) and use it to suppress the stall indicator and disarm Esc while a
prompt waits. Clear the session's prompts (and needsInput) on Stop and on turn
end so a resolved/aborted turn can't leave a dead panel or a stuck "needs input"
dot.
2026-06-26 03:55:34 -05:00
Brooklyn Nicholson
8559246bfb feat(desktop): rebuild the clarify prompt to match the chat UI
The inline clarify panel used its own card tokens, an animated ring, and
oversized spacing — out of step with every other tool row. Rebuild it on the
shared --ui-*/--conversation-* tokens: a compact panel, letter-key badges
(A/B/C…) that double as a/b/c… shortcuts, an inline content-sizing "Other" field
(CSS field-sizing — no view swap, no layout shift on focus), and a Continue
button so picking an option selects rather than auto-sends. Selection lives on
the letter badge alone (solid primary; outlined while Other is focused-but-empty).
Also settle the panel into the standard tool block once the turn stops running,
so a stopped turn no longer strands a live, unanswerable prompt.
2026-06-26 03:55:29 -05:00
kshitij
1aa458a1e6
Merge pull request #52920 from NousResearch/salvage/38798-toolset-validation
fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798)
2026-06-26 14:14:55 +05:30
lEWFkRAD
41ede84b93 fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798)
A config migration (or hand-edit) that leaves an invalid toolset name in
`platform_toolsets` — e.g. the #38798 corruption that rewrote `hermes-cli` to
the non-existent `hermes` — silently disabled all affected tools:
resolve_toolset() returns [] for an unknown name, so the agent quietly lost its
tools with no error, warning, or log entry and degraded to text-only replies.

Surface it loudly at two points:
- After migration (migrate_config): validate platform_toolsets and record/print
  a warning per unknown name, with a `hermes-<platform>` suggestion when that
  would have been valid (the exact #38798 shape).
- At runtime (_get_platform_tools): if a platform was explicitly configured but
  every toolset name is invalid, log a warning when tools are resolved for a
  session — so an ALREADY-corrupted config is caught at startup, not only on the
  next `hermes update`.

Logic lives in a new pure, side-effect-free helper (toolset_validation.py) with
validate_toolset injected, so it is unit-testable without the tool registry.

Note: the original v25→v26 migration that caused the corruption no longer
exists (config format is now v30; no migration step rewrites toolset names).
This change is the durable defense against the silent-failure mode regardless
of cause, matching the issue's "Expected: log a warning".

Salvaged from #39207 by @lEWFkRAD (authorship preserved via cherry-pick).
Tests: 9 helper cases (incl. the #38798 corruption shape, mixed valid/invalid,
zero-tools state, non-dict/scalar/non-string) + a runtime caplog test — both the
helper warning and the runtime guard mutation-verified to fail without the fix.

Closes #38798. Supersedes #39581 (prevent-in-v25→v26 — that path is gone),
#41006 / #40208 (repair-migration for already-corrupted configs).
2026-06-26 14:07:43 +05:30
helix4u
063fe4f6ef fix(auxiliary): fallback on invalid provider responses 2026-06-26 13:49:46 +05:30
teknium1
fbfccbb3ee fix(security): align cron invisible-unicode set with install-time scanner
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.

Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.

Fixes #35075

Co-authored-by: rlaope <piyrw9754@gmail.com>
2026-06-26 01:11:11 -07:00
Shannon Sands
a0dc92450b Split dashboard PTY reconnect tests 2026-06-26 01:06:02 -07:00
Shannon Sands
41f8126148 Reconnect dashboard PTY chat after socket drops 2026-06-26 01:06:02 -07:00
Shannon Sands
6a319f570f Settle TUI resume scroll after hydration 2026-06-26 01:05:26 -07:00
Teknium
619dc4a561
fix(whatsapp_cloud): resolve reply-to text so the agent sees reply context (#52957)
Replies on WhatsApp Cloud arrived at the agent with reply_to_id set but
reply_to_text=None, so run.py never injected the "[Replying to: ...]"
disambiguation prefix (it gates on reply_to_text). Meta's webhook context
object carries only the quoted message's id, never its text.

Index (chat_id, wamid) -> text in rich_sent_store on every inbound message
and every outbound text send -- the same store that solved the identical
Telegram rich-send problem -- then look up the quoted text in
_build_message_event_from_cloud and populate reply_to_text plus
reply_to_is_own_message, derived from context.from versus the business
number.
2026-06-26 01:05:05 -07:00
Ben
19b2624404 feat(gateway): external drain trigger + accept-gating (begin/cancel + control channel)
Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible
quiesce-without-restart machinery NAS drives during a lifecycle action (D4a).
These ship together because the endpoint, the control channel, and the gateway
state machine are one coherent slice.

2.2 — control channel (gateway/drain_control.py, new):
The dashboard has no HTTP path into a running gateway (guardrails: "there is NO
external control channel into a running gateway"); restart/drain is driven only
by markers the gateway reacts to. So begin/cancel-drain writes/removes a
presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write,
never-raises read; a corrupt marker reads as present-contentless → fail-safe
toward quiescing). This is Q-B option A.

2.2 — gateway state machine (gateway/run.py):
- _external_drain_active flag, DISTINCT from the shutdown _draining flag: this
  one does NOT exit the process and is fully reversible.
- _enter_external_drain / _exit_external_drain: idempotent transitions that
  flip gateway_state→draining / →running via _update_runtime_status (preserving
  the live active_agents count). exit refuses to revert to running during a
  real shutdown or after the loop stops (shutdown wins).
- _drain_control_watcher: 1s background task (modelled on _handoff_watcher)
  reconciling accept-state with the marker; honours a marker that survived a
  restart on its first tick. Registered alongside the other watchers in start.
- New-turn accept gate in _handle_message, placed BEFORE the session-slot
  claim: when draining, refuse to START a new turn (so active_agents can only
  fall → no TOCTOU race), while in-flight turns finish untouched. Internal/
  system events (restart-recovery replays, bg-process completions) bypass it.

2.1 — endpoint (hermes_cli/web_server.py):
POST /api/gateway/drain {action: drain|cancel}. Authenticated by the Task-2.0a
token seam (the drain plugin registered this exact path as a token route);
attributes the request to the verified token principal. Begin writes the
marker, cancel removes it — the gateway process owns the actual transition.
Force-override (D6) is NOT here; it maps onto the existing immediate
/api/gateway/restart force path.

Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next):
- tests/gateway/test_external_drain_control.py — marker contract (write/clear/
  read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/
  loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and
  in-flight-not-interrupted. 15 tests.
- tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/
  cancel/cancel-idempotent/bad-action-400. 6 tests.
- dashboard.drain_auth config section already added in 2.0b commit.

All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed.

Intentionally deferred:
- HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a
  real begin-drain marker, prove the 5-point checklist a–e.
- Spec-doc status flip + Phase-2 PR.

Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin,
token-auth, and web_server-endpoint suites green.
2026-06-26 00:47:19 -07:00
Ben
2e322466b1 feat(dashboard-auth): drain shared-bearer-secret provider plugin
Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer
of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A.

plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic):
- DrainSecretProvider: non-interactive provider, supports_token=True. Verifies
  an inbound Authorization bearer token against a per-agent shared secret with
  hmac.compare_digest (constant-time, no timing oracle) and, on a match,
  vouches for the caller as the "drain-control" principal scoped to "drain".
  The five interactive ABC methods raise NotImplementedError; verify_session
  returns None (stacks harmlessly in the cookie-verify loop).
- assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter
  than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or
  below 128 bits Shannon entropy — so a weak/structured/repeated secret can
  never be silently accepted. Enforced both at register() (friendly skip
  reason) and in __init__ (raises — defence in depth).
- register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is
  unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a
  strong secret, registers the provider AND opts /api/gateway/drain into the
  generic token-auth seam via register_token_route().

Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET
(per-agent, provisioned by NAS at deploy). Behavioural knobs only
(dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to
DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline.

Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate
(strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token
(match → scoped principal, wrong/empty → None, custom scope), protocol
compliance, interactive-methods-raise, and register() (skip-no-secret,
fail-closed-weak-secret, strong-env-secret registers + route opt-in, config
scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed.
Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous.

Intentionally deferred:
- The begin/cancel-drain endpoint handler itself — Task 2.1.
- The dashboard→gateway control channel — Task 2.2.

Build status: dashboard-auth + drain-plugin suites green.
2026-06-26 00:47:19 -07:00
Ben
cb9cb6ba1c feat(dashboard-auth): generic non-interactive API-token capability
Task 2.0a of the safe-shutdown drain-coordination plan. Widens the dashboard
auth framework GENERICALLY to support non-interactive (service-to-service)
bearer-token auth, mirroring the existing supports_password precedent. This is
a reusable capability — any future machine-credential provider plugs in without
core changes (decisions.md Q-C). The drain bearer-secret plugin (Task 2.0b) is
the first consumer, not the definition.

- base.py: add TokenPrincipal dataclass (the token analog of Session) +
  supports_token capability flag + verify_token() on the ABC (default raises
  NotImplementedError so a misconfigured provider fails loud). Contract mirrors
  verify_session stacking: return None for unrecognised tokens (never raise),
  raise ProviderError only on a genuine backing-store outage.
- registry.py: list_token_providers() — the supports_token subset, in
  registration order. Empty when none registered (token routes fail closed).
- token_auth.py (new): route-agnostic seam. Routes opt in via
  register_token_route(exact path); token_auth_middleware owns the auth
  decision for those routes only — authenticate via stacked providers, attach
  request.state.token_principal + token_authenticated, pass through. 401 on
  missing/unrecognised token, 503 when a provider was unreachable, untouched
  passthrough for non-token routes. Fails closed (never open).
- web_server.py: install the seam OUTERMOST (registered last → runs first).
  Both downstream gates (legacy auth_middleware + gated_auth_middleware) honour
  request.state.token_authenticated and skip enforcement, so a token-authed
  service request is never bounced to /login.
- audit.py: TOKEN_AUTH_SUCCESS / TOKEN_AUTH_FAILURE events.

Tests: tests/hermes_cli/test_dashboard_token_auth.py — ABC flag default,
verify_token NotImplementedError, registry filter, bearer extraction
(case-insensitive scheme, malformed/non-bearer → ""), provider stacking
(first-match-wins, unreachable-remembered, unreachable-then-valid, buggy
provider doesn't crash the gate), and the seam's passthrough/401/503/
fail-closed behaviour. 29 new tests; full dashboard-auth suite 169 passed.

Intentionally deferred:
- The concrete shared-bearer-secret provider plugin — Task 2.0b.
- The begin/cancel-drain endpoint that registers itself as a token route —
  Task 2.1.

Build status: dashboard-auth + plugin-hook suites green.
2026-06-26 00:47:19 -07:00
Teknium
099df3cd89
fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925)
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.

Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
2026-06-26 00:36:01 -07:00
teknium1
4d0dd6bd52 test(mcp): make invalid_client tests interactive under hermetic env
The new _maybe_flag_poisoned_client tests built a provider via
get_or_build_provider without an interactive stdin. Under the hermetic
test env (no TTY, no cached tokens), the non-interactive guard in
mcp_oauth_manager._make_provider raised OAuthNonInteractiveError before
the provider was built, failing 6 tests in CI parity (they passed
locally where stdin was a TTY).

Thread monkeypatch into _provider_with_token_endpoint and present an
interactive stdin, matching the sibling test_manager_builds_hermes_provider_subclass.
2026-06-26 00:35:27 -07:00
Max Hsu
075f93ad78 fix(mcp): auto-recover from invalid_client on stale OAuth client registration
Fixes #36767.

Two complementary recoveries for the recurring "delete three cache files and
re-auth by hand" ritual when an MCP server's dynamically-registered OAuth
client goes dead server-side (IdP redeploy / DB wipe / rebrand):

- Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs
  auth-flow responses and, on a 400/401 `invalid_client` from the discovered
  token endpoint, backs up + deletes `<server>.client.json` and `.meta.json`
  and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client
  registration on the next flow. Conservative by construction: only
  dynamically-registered (non config-supplied) clients, only the token
  endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's
  `invalid_client_metadata` does not trip it); best-effort so a miss never
  breaks the live flow. Covers both code-exchange and refresh when the token
  endpoint was discovered. Tokens are preserved.

- `hermes mcp reauth [<name>|--all]`: the reporter's primary symptom — the
  IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK
  only sees a callback timeout), so it cannot be auto-detected. The new
  command re-auths one or ALL `auth: oauth` servers, serially: one browser
  flow at a time, which also fixes the startup popup storm when several
  servers are stale at once. Single-server reauth is factored out of
  `mcp login` and shared.

Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint,
success-response, pre-registered, and invalid_client_metadata negative guards;
a bridge integration test driving the real async_auth_flow generator to prove
the detection hook preserves the bidirectional asend() forwarding contract;
reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh
122/122 green for the touched suites; check-windows-footguns.py and ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 00:35:27 -07:00
Ben Barclay
6e4e5967f7
feat(relay): multi-platform-per-agent — list identity, provision-loop, N-hello, per-frame egress (Phase 1.5) (#52830)
Cut over the agent half of Shape A (D-Q1.5a/b.1/c) to front a SET of platforms on
one relay WS:

- relay_platform_identities() parses GATEWAY_RELAY_PLATFORMS (list) +
  GATEWAY_RELAY_BOT_IDS (JSON keyed map {platform:{botId,username?}}). Cut over
  from the scalar GATEWAY_RELAY_PLATFORM/_BOT_ID (no fallback, D-Q1.5c).
- self_provision_relay() loops one /relay/provision per platform under one
  gatewayId+secret, partial-failure-tolerant.
- WebSocketRelayTransport takes the identity SET, sends one hello per identity
  (connector accumulates the advertised set), and stamps the per-frame
  OutboundFrame.platform + its matching advertised botId on outbound.
- RelayAdapter remembers each chat's underlying source.platform (mirroring the
  existing guild/dm scope capture) and tags the reply's egress platform.
- send_relay_policy() declares one relevance policy per fronted platform (the
  connector keys policy by (tenant,platform,instanceId)).

Single-platform deploys are byte-identical on the wire (1-element list, no per-frame
tag -> connector session-default fallback). typecheck/ruff clean; relay unit 221 pass
(+10 new); all 15 cross-repo E2E drivers green vs connector origin/main.
2026-06-26 17:32:46 +10:00
brooklyn!
a2b49e60b6
Merge pull request #52412 from GodsBoy/fix/verify-on-stop-messaging-surface-leak
fix(agent): gate verify-on-stop nudge off for messaging surfaces
2026-06-26 02:30:08 -05:00
kshitij
7d568293f9
Merge pull request #52891 from kshitijk4poor/salvage/52623-aux-host
fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608)
2026-06-26 12:24:19 +05:30
konsisumer
3cf900eb67 fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
Ben Barclay
cb7d1f68f8
fix(relay): accept is_reconnect kwarg in RelayAdapter.connect (#52911)
The gateway reconnect watcher (gateway/run.py) recovers a platform after a
fatal adapter error by building a fresh adapter and calling
connect(is_reconnect=True). Every BasePlatformAdapter implements
connect(*, is_reconnect: bool = False) for this — except RelayAdapter, whose
connect() was bare. So the watcher's recovery path raised:

    TypeError: connect() got an unexpected keyword argument 'is_reconnect'

Observed live on a hosted staging agent: after a fatal relay adapter error the
watcher could never re-establish relay, so the shared-bot inbound never reached
the gateway and Discord DMs stopped (dashboard surfaced the TypeError).

Relay deliberately ignores the flag: the #46621 server-side-queue-preservation
concern doesn't apply, because relay's outage buffer is the connector's durable
buffer (replayed on the transport's re-handshake), not a gateway-side queue the
adapter owns. Routine WS drops are already handled by the transport's own
reconnect supervisor (WebSocketRelayTransport, reconnect=True); the watcher path
is fatal-error recovery, and the fatal handler disconnect()s the old adapter
(cancelling its supervisor) before a fresh adapter+transport is built, so there
is no double-dial.

Adds two regression tests (both proven red without the fix): connect(is_reconnect=True)
reaches the same transport-less RuntimeError instead of TypeError, and the
signature matches BasePlatformAdapter.connect.
2026-06-26 16:46:09 +10:00
brooklyn!
0f81b0d458
Merge pull request #52901 from NousResearch/bb/desktop-tui-lint-fixes
style(desktop,tui): fix all lint/type/formatting issues
2026-06-26 01:07:14 -05:00
Brooklyn Nicholson
62fe9fd101 style(desktop,tui): fix all lint/type/formatting issues
Bring apps/desktop and ui-tui to a clean state for typecheck, eslint,
and prettier:

- Run prettier across both trees (printWidth/wrap drift; prettier is not
  CI-enforced for these JS projects, so main had accumulated drift).
- Apply eslint --fix for padding-line-between-statements and perfectionist
  import/export sorting.
- Manual fixes for non-auto-fixable rules:
  - remove unused node:net import in electron/main.cjs (uses Electron net)
  - replace inline `typeof import(...)` annotations with top-level
    `import type * as EnvModule` in two ui-tui test files
  - scoped eslint-disable no-control-regex on intentional sentinel/ANSI
    regexes (mathUnicode.ts, text.ts)
  - resolve react-hooks/exhaustive-deps per-case: correct swapped/missing
    deps, collapse redundant session.* members, and justified disables on
    settings mount-only data-load effects to preserve run-once behavior

No behavior changes; test pass/fail counts are unchanged from the main
baseline.
2026-06-26 01:04:33 -05:00
Moonsong
4e66bf1f80 fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608)
When operator config has provider=anthropic with model.base_url pointing
at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic),
the auxiliary Anthropic path was unconditionally applying that override.
Main-session traffic routed correctly because the main path attaches the
right credential for the actual destination, but every side-channel call
(memory extractors, reflection, vision, title generation, janus
extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd.

Gate the override on hostname == api.anthropic.com. Operators routing main
through a non-Anthropic provider must use that provider's own auxiliary
client; the Anthropic aux path now stays pointed at api.anthropic.com.

Regression tests cover openrouter, openai, anthropic-with-path, empty, and
anthropic-default-base_url cases.
2026-06-26 11:21:05 +05:30
kshitij
7aa32ec82f
Merge pull request #52888 from kshitijk4poor/chore/author-map-tranquil-flow
chore: add Tranquil-Flow to AUTHOR_MAP for auxiliary base_url salvage
2026-06-26 11:20:54 +05:30
brooklyn!
2b86e9dae4
Merge pull request #52877 from NousResearch/bb/pet-scale-gesture
feat(desktop): Alt+wheel to scale the pet, never cropped
2026-06-26 00:45:07 -05:00
Brooklyn Nicholson
bf60bbb6c5 refactor(desktop): collapse overlay zoom-anchor math 2026-06-26 00:42:19 -05:00
kshitijk4poor
fe255ab28b chore: add Tranquil-Flow to AUTHOR_MAP for auxiliary base_url salvage (#52623) 2026-06-26 11:11:33 +05:30
Brooklyn Nicholson
7d1b72a15d feat(desktop): zoom the pet toward the cursor
Alt+wheel now scales about the pixel under the pointer instead of growing from a
corner, so the pet stays put under the cursor instead of running away. In-window
shifts its top-left; the overlay repositions its OS window (cursor-anchored on
wheel, bottom-center for slider-driven changes).
2026-06-26 00:37:45 -05:00
brooklyn!
6ba551e942
Merge pull request #52871 from NousResearch/bb/fix-tui-interrupt-queued
fix(tui-gateway): make stop interrupt queued turns
2026-06-26 00:37:13 -05:00
Brooklyn Nicholson
dd980aaba1 feat(desktop): Alt+wheel to scale the pet, never cropped
Hold Alt/Option and scroll over the mascot to resize it (same on Mac and
Windows); the modifier keeps a plain scroll passing through to the page. The
gesture drives the same `display.pet.scale` path as the settings slider.

The popped-out overlay grows its OS window to fit the pet at any scale (anchored
bottom-center) so the sprite is never clipped by the window edge, and the
in-window pet re-clamps against its actual size so growing near an edge can't
crop it. Also makes the overlay click-through per-pixel: only solid sprite
pixels (plus bubble / mail button) are interactive, transparent margins pass
clicks through.
2026-06-26 00:33:22 -05:00
Brooklyn Nicholson
594380d44a fix(tui): make stop interrupt queued desktop turns
Ensure TUI/desktop stop targets the actual conversation thread and cancels any queued next prompt, including the lazy agent-start window, so a stopped session cannot keep running or restart itself.
2026-06-26 00:31:06 -05:00
kshitij
a28b939092
Merge pull request #52678 from kshitijk4poor/salvage/52502-fuzzy-boundary
fix(fuzzy-match): preserve boundary space after whitespace-normalized match (#52491)
2026-06-26 10:59:14 +05:30
DavidMetcalfe
27c486e3b1 feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors
Wire get_reasoning_stale_timeout_floor() into both stale detectors so known
reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek
R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of
the upstream gateway idle-killing the socket (BrokenPipeError) before first
token. Applied as max(default, floor) — never overrides explicit user config,
never lowers an existing threshold.

The reasoning_timeouts.py allowlist module already landed on main via #52795,
so this salvage carries only the wiring + tests (the duplicate module and the
stale-base MoA reverts from the original PR branch are dropped).

Salvaged from #52238. Fixes #52217.
2026-06-25 22:12:06 -07:00
brooklyn!
f4c656b0a0
Merge pull request #52854 from NousResearch/bb/fix-interrupt-partial-reply
fix(interrupt): keep partial streamed reply when stopped mid-response
2026-06-26 00:04:37 -05:00
teknium1
4d04c652f2 fix(curator): make external-skill write guard actually fire during curation
The salvaged #51875 added a background-review write guard in skill_manage
that refuses mutations to skills.external_dirs skills — but it only fires
when is_background_review() is true. The curator's LLM review fork ran with
the default _memory_write_origin='assistant_tool', so the guard never
triggered during the exact curation pass it exists to protect against
(GH-47688).

- Set _memory_write_origin='background_review' on the curator review fork so
  turn_context binds it onto the write-origin ContextVar and the guard fires.
- Add a regression test asserting the fork runs under the background_review
  origin (the invariant linking the fork to the guard).
- AUTHOR_MAP: map yu-xin-c for the salvaged commit.
2026-06-25 22:03:02 -07:00
yu-xin-c
96bc524a71 fix(curator): protect external skills from background curation 2026-06-25 22:03:02 -07:00
teknium1
eed9bbeb0a chore(release): add rebel0789 to AUTHOR_MAP for salvaged PR #47308 2026-06-25 22:02:22 -07:00
teknium1
6c58878e7d fix(browser): force secret-pattern redaction on browser_type display
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.

Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.
2026-06-25 22:02:22 -07:00
rebel
8ff426e53b fix: redact browser typed text surfaces 2026-06-25 22:02:22 -07:00
brooklyn!
5add283ec8
Merge pull request #52833 from NousResearch/bb/wsl-desktop-fixes
fix(desktop): WSL2 clipboard paste, titlebar layout, HMR survival, and GPU acceleration
2026-06-25 23:55:42 -05:00
Brooklyn Nicholson
8233598e64 fix(interrupt): keep partial streamed reply when stopped mid-response
Stopping a turn while the model is streaming (stop/esc to redirect) raised
InterruptedError, set final_response to the throwaway "waiting for model
response" sentinel, and persisted messages WITHOUT the assistant text that
was already streamed to the screen. The next turn then had no record of the
half-finished reply, so the model appeared to "forget" what it just said.

Recover the on-screen text from _current_streamed_assistant_text in the
InterruptedError branch and append it as the assistant turn (and surface it
as final_response). The metadata sentinel is kept only when nothing was
streamed yet, preserving the ACP/client suppression behavior.

Completes the partial-stream recovery from 397eae5d9 (which wired the same
_current_streamed_assistant_text salvage into the connection-failure twin
but missed the user-interrupt path). The lossy handler dates to c98ee9852.
2026-06-25 23:54:20 -05:00
Brooklyn Nicholson
76074b2145 fix(desktop): transparent WCO titlebar chrome on Windows/WSLg
Use a transparent native overlay so renderer chrome shows through the min/max/close
band. Sync window pre-paint bg to the computed chrome mix.
2026-06-25 23:50:59 -05:00
Brooklyn Nicholson
3b1344c18c fix(desktop): WSL titlebar layout and WSL2 GPU acceleration
Live-measure WCO width in the renderer, drop the right rail below the titlebar
band, and re-enable GPU compositing under WSLg when /dev/dxg is present.
2026-06-25 23:50:59 -05:00
Brooklyn Nicholson
da5484b61f fix(desktop): WSL2 clipboard image paste + Linux titlebar overlay
WSLg bridges clipboard text but not images — pull host screenshots via
PowerShell. Disable titleBarOverlay on plain Linux; gate overlay width per
platform in titlebar-overlay-width.cjs.
2026-06-25 23:50:59 -05:00
Teknium
5b5c79a8ef
feat(kanban): typed block reasons + unblock-loop breaker (#52848)
* feat(kanban): typed block reasons + unblock-loop breaker

Stops the kanban blocked-task loop: a worker blocks a task, a cron
unblocks it, the worker re-blocks for the same reason, repeat forever.

block_task now takes a typed kind and a persistent block_recurrences
counter on the tasks table:

- kind=dependency routes to todo (parent-gated, auto-resumed), never
  the human 'blocked' bucket a cron would keep unblocking.
- needs_input/capability/transient/untyped land in blocked; each
  same-cause re-block after an unblock increments block_recurrences,
  and at BLOCK_RECURRENCE_LIMIT (default 2) the task routes to triage
  for a human instead of blocked.
- unblock_task no longer resets block_recurrences (the amnesia that
  let the loop run unbounded); complete_task clears it on success.

Wired through the worker kanban_block tool (new kind arg) and the
hermes kanban block --kind CLI flag, both reporting where the task
actually landed. Docs + 11 new tests; 536 existing kanban tests green.

* test(kanban): make second-block notify test use a distinct block cause

test_notifier_second_blocked_delivers blocked the same task twice with
the same (untyped) reason, which now trips the new unblock-loop breaker
and routes the second block to triage instead of blocked — so only one
'blocked' notification fired. The test's actual intent is that TWO
distinct block cycles each notify; give the two cycles different kinds
(needs_input then capability) so they're genuinely separate blocks. The
same-cause loop→triage path is covered by test_kanban_block_kinds.py.
2026-06-25 21:46:58 -07:00
teknium1
43b8ba4181 fix(telegram): preserve Bot API update queue on watcher reconnect
After a prolonged outage the in-process network-error ladder escalates to
fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter
that reconnects through the bootstrap path. That path called
start_polling(drop_pending_updates=True), discarding every update Telegram
queued during the outage — all messages sent while the bot was down were
silently lost. The in-process ladder and 409-conflict handler already passed
drop_pending_updates=False; only bootstrap did not distinguish a cold first
boot from a reconnect.

Thread an is_reconnect signal from the watcher through
_connect_adapter_with_timeout into adapter.connect(). The base
BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every
adapter inherits a tolerant signature (no per-platform breakage when the
runner forwards the kwarg). Telegram translates is_reconnect into
drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap
calls. Cold boot still drops the stale queue; a watcher reconnect preserves it.

Fixes #46621.

Co-authored-by: annguyenNous <annguyen@nousresearch.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>
2026-06-25 21:29:57 -07:00
liuhao1024
f44415e71a fix(gateway): add init-time provider fallback to _make_agent
When the primary provider raises AuthError (e.g. expired OAuth token),
_make_agent now walks the configured fallback_providers/fallback_model
chain before giving up — matching the behavior that cron/scheduler.py
and cli_agent_setup_mixin.py already have.

Fixes #47627
2026-06-25 21:21:58 -07:00