Commit graph

12819 commits

Author SHA1 Message Date
xxxigm
4aeaba6922 test(desktop): cover undefined/null attachment holes in ref helpers
Regression for the refText crash: attachmentDisplayText and
optimisticAttachmentRef must return null (not throw) when handed an
undefined/null attachment hole, so the submit path can't reproduce
"Cannot read properties of undefined (reading 'refText')".
2026-06-24 18:22:01 -07:00
xxxigm
7e2db0a140 fix(desktop): stop refText crash on undefined composer attachment holes
A session switch or draft restore can leave undefined/null holes in the
composer attachments array. AttachmentList was guarded against this in
#49624, but the sibling submit path was not: submitPromptText maps the
same array through attachmentDisplayText/optimisticAttachmentRef and
buildContextText (a.kind / a.label / a.refText), so a hole threw
"Cannot read properties of undefined (reading 'refText')" — an uncaught
renderer error that blanks the chat pane and shows "Desktop app link
offline".

Close the whole bug class:
- attachmentDisplayText / optimisticAttachmentRef no-op on a falsy
  attachment (shared chokepoint, also protects thread.tsx drop handler).
- submitPromptText filters falsy entries from the source array, and
  buildContextText filters its (possibly post-sync) input before reading
  fields.
2026-06-24 18:22:01 -07:00
helix4u
17beb55e3c fix(telegram): gate rich draft previews separately 2026-06-24 18:11:14 -07:00
Gille
284be6cc24
Merge pull request #52210 from helix4u/fix/desktop-update-progress-visibility
fix(desktop): surface update progress lines
2026-06-24 19:45:05 -05:00
brooklyn!
7157b213f5
Merge pull request #47959 from NousResearch/bb/pets-gen
Pet generation: frame-perfect hatch flow, backend picker, CPU-safe chroma, and CI-hardening
2026-06-24 19:41:34 -05:00
brooklyn!
153ad79524
Merge pull request #52201 from NousResearch/bb/desktop-shallow-update-count
fix(desktop): don't report a bogus update count for a shallow checkout
2026-06-24 19:34:02 -05:00
Brooklyn Nicholson
a05a9b0e07 test(delegate): harden heartbeat in-tool stale timing assertion
Stabilize the long-running-tool heartbeat test by patching stale thresholds inside the test and asserting the heartbeat exceeds the idle ceiling, which preserves intent while removing scheduler-sensitive assumptions that flake in CI.
2026-06-24 19:33:40 -05:00
Brooklyn Nicholson
2ea94c6c45 fix(pets): make inline generate cancel discard draft flow
Wire the sparkle generate button's cancel action to the same discard/reset path as step-2 cancel so abort semantics are consistent and always return to step 1 while retaining the prompt input.
2026-06-24 19:33:33 -05:00
brooklyn!
d635a6d507
Merge pull request #52208 from NousResearch/bb/desktop-update-steps
fix(desktop): stop the update overlay looking frozen while it works
2026-06-24 19:29:02 -05:00
brooklyn!
42e14d1089
Merge pull request #52205 from NousResearch/bb/desktop-restart-profile
fix(desktop): route gateway restart / status / update to the active profile
2026-06-24 19:28:53 -05:00
brooklyn!
b649cdee4a
Merge pull request #52203 from NousResearch/bb/update-drain-announce
fix(update): announce gateway drain waits so desktop updates don't look hung
2026-06-24 19:28:44 -05:00
Ben
538c419d2e fix(gateway): scope dashboard liveness fallback to the profile
PR #52151 hardened the runtime-status liveness check to trust a readable
live process command line over stale gateway_state.json argv, so a recycled
PID now owned by an s6 supervisor no longer counts as a running gateway.

That fix is correct but incomplete for the reported symptom: the web
dashboard showed a named profile's gateway green while
`hermes -p <name> gateway status` showed it stopped. Two further issues:

1. Cross-profile PID reuse. In per-profile Docker supervision, one profile's
   stale `gateway_state.json` can record a PID the OS later recycled onto a
   DIFFERENT profile's live gateway. That PID's command line still
   `looks_like_gateway`, so the dead profile was reported running. The
   recorded argv has its `-p <name>` selector stripped in-process by
   `_apply_profile_override`, so it cannot disambiguate; the live `/proc`
   cmdline still carries it. `get_runtime_status_running_pid` now accepts an
   `expected_home` and validates the live command line belongs to THAT
   profile (mirroring `hermes_cli.gateway._matches_current_profile`, the
   logic the CLI scan path already uses — which is why the CLI was correct).
   `_check_gateway_running` passes the enumerated profile dir.

2. The existing regression test `test_gateway_running_check_falls_back_to_
   runtime_state` used the live pytest PID with a gateway-shaped record; once
   the live cmdline became authoritative it no longer looked like a gateway.
   Updated to mock the live cmdline to the real separate-process scenario it
   describes.

The active-profile path (`get_running_pid`) is intentionally left unscoped:
it is lock-verified and any live gateway cmdline is acceptable there. Multiplex
mode is unaffected — `running` state is only ever written to a gateway's own
home, never a secondary served profile's.

Adds coverage for: cross-profile PID reuse (named + default), matching
profile cmdline (`-p`, `--profile`, explicit HERMES_HOME=), the bare default
gateway, and the unreadable-cmdline cross-platform fallback. Each new
cross-profile assertion fails without the profile scope and passes with it.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-06-25 10:25:54 +10:00
helix4u
f1617a7ebb fix(gateway): validate runtime status pid command line 2026-06-25 10:25:54 +10:00
Brooklyn Nicholson
592c462e3c refine(pets): preserve user-requested tone in generation prompts
Remove cute/chibi-biased wording from base draft variations and explicitly preserve the requested mood across base and row prompts so scary, eerie, or other non-cute concepts are honored while keeping sprite constraints.
2026-06-24 19:22:00 -05:00
Brooklyn Nicholson
9a4600c5fb fix(desktop): stop the update overlay looking frozen while it works
Two ways the update overlay read as stuck even though the update was
streaming progress underneath:

- In-app (macOS/Linux) UpdatesOverlay: runStreamedUpdate forwards every
  stdout line as a progress event with percent: null, and ingestProgress
  wrote that straight through — clobbering the milestone percents (10/60)
  so the bar fell back to indeterminate on every log line. Keep the last
  percent when a line carries null.

- Staged install/update overlay: the bar is completedCount / totalCount,
  which counts only *finished* stages, so a long first stage pinned it at
  "0 of 2" / 0% until the stage ended. Count the running stage as half a
  unit so the bar advances during the stage (the per-stage spinner already
  shows which step is live).

Both are display-only; no stage/event semantics change. (The Windows
hermes-setup Tauri progress UI in apps/bootstrap-installer has the same
counter-only-on-completion logic — parity follow-up.)
2026-06-24 19:20:38 -05:00
Brooklyn Nicholson
65b13e9dbc fix(desktop): route gateway restart / status / update to the active profile
restartGateway, getActionStatus, getStatus, updateHermes and
checkHermesUpdate all hit window.hermesDesktop.api WITHOUT spreading
profileScoped() — unlike their siblings (getModelInfo, setModelAssignment,
grantComputerUsePermissions). _apiProfile tracks the active gateway
profile, and the Electron proxy uses request.profile to pick which pooled
/ remote backend serves the call.

So for a multi-profile or global-remote user, the System-panel "Restart
gateway" (and its status poll, plus Update / status reads) targeted the
primary/default backend instead of the one they're on: the restart hit
the wrong gateway and the poll never saw the action → it looked like
restart silently failed. Single-profile users are unaffected
(profileScoped() returns {} when no profile is active).

Add ...profileScoped() to the five backend-action helpers so they follow
the active profile like the rest of the API surface.
2026-06-24 19:16:26 -05:00
AIalliAI
463bf2be25 fix(update): announce gateway drain waits so desktop updates don't look hung
On macOS, the desktop updater's stage 1 (hermes update --gateway) ends by
restarting running gateways. launchd_restart() SIGTERMs the gateway and
silently waits up to agent.restart_drain_timeout (default 180s) for the
drain; the manual profile-gateway loop waits its drain budget per gateway
the same way. Neither path prints anything before the wait, so the desktop
updater's live output goes dead for minutes right after '✓ Update
complete!' — users read it as a hung update and force-kill their gateway
processes to make it move (#44515). The systemd branch already announces
its drain ('draining (up to Ns)...'); launchd and the manual loop did not.

Print the stop/drain (with PID and budget) before the wait in both paths,
mirroring the systemd branch, and assert the message in the existing
launchd drain test.

Fixes #44515
2026-06-24 19:12:44 -05:00
briandevans
cb6edbf448 fix(desktop): skip the rev-list count when it is discarded anyway
checkUpdates() ran `git rev-list HEAD..origin/<branch> --count`
unconditionally in the parallel probe batch, even on the shallow +
no-merge-base path where resolveBehindCount() ignores the result and
falls back to a SHA compare. In the #51922 failure mode that count walks
the entire remote ancestry (thousands of commits), so the work was pure
latency on every update check for the exact case the fix targets.

Split the probes into two phases: resolve --is-shallow-repository and
merge-base first, then run rev-list --count only when shouldCountCommits
says the number is meaningful (full clone, or shallow-with-merge-base).
The shallow/no-merge-base SHA fallback is preserved unchanged.
2026-06-24 19:12:09 -05:00
briandevans
a6485bddb8 fix(desktop): don't report a bogus update count for a shallow checkout
The desktop installer clones with `--depth 1`, so a public install's local
history often shares no merge-base with the freshly fetched origin tip. In
that state `git rev-list HEAD..origin/<branch> --count` enumerates the
entire remote ancestry and returns a meaningless huge number, surfacing as
e.g. "v0.17.0 (+12104)" in the update indicator (#51922).

The official-SSH branch of checkUpdates() already sidesteps this by reporting
a binary up-to-date check (`behind: currentSha === targetSha ? 0 : 1`), and
hermes_cli/banner.py guards the identical class for the CLI banner. The
passive desktop count path was the one place the shallow guard was missing.

Detect shallow + no-merge-base up front and fall back to the same SHA-based
binary check; full clones (developers / Docker dev images) keep the exact
count path unchanged. The resolution logic lives in a pure update-count.cjs
helper so it is unit-testable without booting Electron.
2026-06-24 19:12:09 -05:00
Brooklyn Nicholson
1fe013ee16 feat(pets): polish generate flow and reduce hatch CPU pressure
Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.
2026-06-24 19:08:06 -05:00
Ben
d335164833 fix(relay): authorize relay inbound via connector-enforced upstream authz
A hosted instance fronted by the Team Gateway connector dropped EVERY relay
message as "Unauthorized user" and the agent never replied — despite the
message routing correctly through the connector to the instance.

Root cause: gateway authorization (_is_user_authorized) had no notion of
upstream-enforced authz. Platform.RELAY matches no {PLATFORM}_ALLOWED_USERS
allowlist and isn't in the HA/WEBHOOK always-authorized set, so a relay user
with no env allowlist configured hit the default-deny ("No user allowlists
configured. All unauthorized users will be denied."). The message was received,
then silently denied before reaching the agent.

This is incorrect for relay: the connector authenticates the gateway's WS with
a per-instance secret and performs owner-only author-binding resolution BEFORE
delivering. A message only reaches this gateway because the connector resolved
it to THIS instance's bound user (user_instance_binding), keyed on the author id
the connector OBSERVED off the event — never a gateway claim. The authorization
decision is already made by a trusted, authenticated upstream; there is no local
RELAY_ALLOWED_USERS allowlist to consult, and default-denying for its absence is
the bug.

Fix: add a generic BasePlatformAdapter.authorization_is_upstream capability
(default False) that the relay adapter overrides to True, plus a dedicated
trusted branch in _is_user_authorized that honors it. This is delegation to a
trusted upstream, NOT a fail-open: it fires only for an adapter that explicitly
declares the flag; every direct network-exposed adapter leaves it False and the
env-allowlist default-deny (SECURITY.md §2.6) is unchanged. Distinct from
enforces_own_access_policy, which mirrors a LOCAL config-driven allowlist —
this delegates to an authenticated upstream's decision.

Tests: behavior contract that the base defaults False, the relay adapter
declares True, a relay user (group + DM) is authorized with no env allowlist,
and crucially a non-upstream adapter with no allowlist still default-denies
(guards against the fix becoming a blanket fail-open). 6 new tests; relay +
authz + config-policy suites green (134 + 90).

Found via live staging debug of the Discord self-serve onboarding flow.
2026-06-25 10:06:21 +10:00
brooklyn!
a378b1e980
Merge pull request #52192 from NousResearch/bb/session-loop-guard
fix(desktop): let the session watchdog heal a stuck "looping" turn
2026-06-24 19:03:43 -05:00
brooklyn!
4127332f15
Merge pull request #52189 from NousResearch/bb/desktop-offline
fix(desktop): give the gateway reconnect loop an escape hatch
2026-06-24 19:03:34 -05:00
brooklyn!
70650e82a3
Merge pull request #52187 from NousResearch/bb/desktop-voice
fix(desktop): wire Ctrl+B voice, declutter voice settings, stop endless TTS hang
2026-06-24 19:03:25 -05:00
brooklyn!
9a94865552
Merge pull request #52183 from NousResearch/bb/desktop-agents-status
fix(desktop): make Agents indicator match the Spawn-tree panel
2026-06-24 19:03:11 -05:00
Brooklyn Nicholson
93192059c9 fix(desktop): let the session watchdog heal a stuck "looping" turn
The 8-minute stream-silence watchdog only removed a stuck session from
$workingSessionIds (the sidebar dot). The composer's busy state lives in
the session-state cache and was never cleared, so a hung or looping turn
that never delivered its terminal event — including an old session
re-opened while the backend still reports it "running" — stayed wedged on
"Thinking" / Stop indefinitely.

Have the watchdog notify subscribers when it force-clears a session, and
subscribe from the session-state cache to also drop that session's
busy/awaiting/needsInput flags. updateSessionState re-syncs $busy when the
healed session is the one on screen, so the composer recovers instead of
spinning forever.

Frontend-only safety net; doesn't touch the turn lifecycle. The backend
root (a stale in-memory session["running"] surviving a dead turn thread
and re-arming busy on every resume) is a separate follow-up.
2026-06-24 18:36:17 -05:00
Brooklyn Nicholson
2a75c4a8cb fix(desktop): give the gateway reconnect loop an escape hatch
When a remote gateway dropped after a healthy boot (internet loss,
sleep/wake, VPS restart), use-gateway-boot retried with backoff forever
and never surfaced an error. The renderer sat behind the fullscreen
CONNECTING overlay with gatewayState non-open and boot.error null — no
way to reach Settings, sign in again, or switch to a local gateway. To
the user the app was simply broken on connection loss.

Raise a recoverable boot error once the reconnect loop crosses
RECONNECT_ESCALATE_AFTER (6 attempts, ≈45s), so the BootFailureOverlay
(Retry / Sign in / Use local gateway) replaces the dead-end CONNECTING
screen. The loop keeps retrying underneath; the next successful reconnect
(or a manual/wake-driven one) clears the error and dismisses the overlay.

This implements the contract already specified — but never wired up — in
use-gateway-boot.test.tsx (desktop vitest isn't in CI, so the failing
"FIX:" specs went unnoticed). All 4 hook tests + the 3 connecting-overlay
tests pass.
2026-06-24 18:32:29 -05:00
Brooklyn Nicholson
8d1706ae5c fix(desktop): wire Ctrl+B voice, declutter voice settings, stop endless TTS hang
Three voice-mode papercuts in the desktop app:

1. Ctrl+B did nothing. The docs + `voice.record_key` advertise Ctrl+B to
   talk, but the desktop never bound it (only ⌘B = sidebar existed). Add a
   rebindable `composer.voice` action that toggles the voice conversation,
   defaulting to ⌃B on macOS (distinct from ⌘B; off-macOS `ctrl` folds to
   the sidebar chord, so it ships unbound there to avoid stealing it). The
   global keybind reaches the composer through a new focus-bus event.

2. The Voice settings page rendered every provider's options at once (~30
   fields). Filter to the *selected* TTS/STT provider's sub-fields; STT
   provider fields hide when STT is off. Picking "edge" now shows just the
   Edge voice, making it obvious voice chat also needs STT enabled.

3. Voice mode could hang "speaking" forever. Free Edge TTS sometimes returns
   audio that never fires `playing`/`ended`/`error`, so the playback promise
   never settled. Add a stall watchdog (rearmed on each progress tick, so
   long speech is never cut off) that rejects a stuck stream, letting the
   loop recover with a clear error.
2026-06-24 18:26:14 -05:00
Ben
41b9b7e719 test(lazy-deps): make durable-target tests network-free
CI test shard has no PyPI egress: the real 'pip install packaging==20.9'
in test_core_package_is_not_shadowed failed (the pypi.org reachability
probe passed but the actual install didn't), failing slice 2/6.

- Prove the anti-shadow invariant deterministically: synthesize a fake
  'packaging' in the durable target with a sentinel and assert the import
  still resolves to the core copy (TestCoreNeverShadowed). No network.
- Cover the install wire offline: stub subprocess and assert --target +
  --constraint are built in durable mode and absent in venv-scoped mode
  (TestInstallArgConstruction).
- Gate the genuine PyPI install behind HERMES_RUN_NETWORK_TESTS=1 (opt-in,
  skipped in CI) instead of a flaky reachability probe that doesn't predict
  install success.
2026-06-25 09:20:13 +10:00
Ben
cbd6ba1bdd fix(docker): redirect lazy installs to a durable target so opt-in backends work in the immutable image (#51136)
The published Docker image seals the agent venv (root-owned, read-only
/opt/hermes) and sets HERMES_DISABLE_LAZY_INSTALLS=1 so a runtime install
can't mutate and brick the core. But opt-in backends (Firecrawl web search,
Exa, Feishu, ...) deliberately keep their SDKs in tools/lazy_deps.py and out
of [all] (pyproject policy 2026-05-12: one quarantined release must not break
every install). The two policies collided: the SDK isn't baked in AND can't
lazy-install, so the default Firecrawl web_search/web_extract fail out of the
box in Docker (#51136), as do Exa (#49445) and Feishu (#50205).

Fix the whole class instead of baking in one backend: when
HERMES_LAZY_INSTALL_TARGET is set, lazy installs are redirected to a writable
dir on the durable /opt/data volume via `pip/uv install --target`, and that
dir is APPENDED to the end of sys.path. Because the core venv always wins
name collisions, a package installed this way can only ADD new modules — it
can never shadow, downgrade, or break a module the core ships. The worst a
bad/incompatible backend package can do is fail to import and report itself
unavailable; the agent core stays healthy. That structural guarantee is what
made it safe to seal the venv, and it is preserved here even with installs
re-enabled.

- tools/lazy_deps.py: durable-target mode — `--target` install + core-pinned
  `--constraint` file (shared deps resolve to core's versions, conflicts fail
  loudly at install time), append-only sys.path activation, ABI/Python-version
  stamp that wipes the store if an image rebuild bumps the interpreter, and a
  reworked gate so HERMES_DISABLE_LAZY_INSTALLS=1 redirects (rather than hard-
  blocks) when a target is set. security.allow_lazy_installs=false still
  disables installs in every mode.
- hermes_bootstrap.py: activate the durable target on sys.path at first import
  (before any backend imports its SDK) so packages installed on a previous run
  are importable on this run.
- Dockerfile: set HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages.
- docker/stage2-hook.sh: seed + chown the dir on the data volume.
- tests: real-install E2E proving installs land in the target, import cleanly,
  don't leak into the sealed venv, and that a core package is never shadowed;
  ABI-stamp wipe/preserve; gate matrix; Dockerfile/stage2 contract test.

Fixes #51136
2026-06-25 09:20:13 +10:00
Brooklyn Nicholson
a268dfff0a fix(desktop): make Agents indicator match the Spawn-tree panel
The status-bar "Agents" item conflated three unrelated signals — running
subagents (aggregated across all sessions), in-flight session turns, and
failed background *system* actions (gateway restarts, toolset installs,
computer-use grants via $desktopActionTasks/preview restart) — yet
clicking it opens AgentsView, which renders only subagents. A failed
gateway restart therefore showed "Agents (1 Failed)" over an empty
"No live subagents" tree. AgentsView also filtered to the active session,
so a subagent running in a background session showed "Agents N running"
with nothing in the tree (the desync reported in #49808).

Unify the scope both surfaces speak:
- AgentsView aggregates subagents across every session (salvages #49819).
- The indicator's running/failed counts come from subagents only
  (aggregated), never background system actions — those keep their own
  surfaces in settings / command center.

So "Agents (N …)" now always points at a populated Spawn tree.

Supersedes #49819. Fixes #49808.
2026-06-24 18:16:14 -05:00
liuhao1024
404b06ac4f fix(gateway): honor server retry_after in _send_with_retry for Telegram flood control (#46762)
When Telegram's sendRichMessage returns a FloodWait/RetryAfter error,
_try_send_rich() now extracts the server-provided retry_after value and
propagates it through SendResult.retry_after. The base _send_with_retry()
layer honors this value instead of using its default short exponential
backoff (~2s, ~4s), preventing the retry budget from being exhausted
against a server that demands a 25-37s wait.

Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from
gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py
since the original PR.

Closes #46762
2026-06-25 02:43:47 +05:30
kshitij
cedbb4cfa2
Merge pull request #52140 from NousResearch/salvage/47707-tool-schema-validation
fix(agent): validate context/memory tool schemas before wrapping (#47707)
2026-06-25 02:36:19 +05:30
kshitij
085096fd59
Merge pull request #52135 from NousResearch/salvage/51826-tirith-mkdtemp-oerror
fix(tools): catch mkdtemp OSError in tirith install (#51826)
2026-06-25 02:35:27 +05:30
kshitij
7d2c1f3f84
Merge pull request #52134 from NousResearch/salvage/42449-deepcopy-ctx-engine
fix(agent): deepcopy plugin context engine to prevent parent corruption on delegate_task (#42449)
2026-06-25 02:28:37 +05:30
Bartok9
710cd48fb1 fix(agent): validate context/memory tool schemas before wrapping
Closes #47707

Context engines and memory providers expose tool schemas via
get_tool_schemas(). agent_init.py wrapped each as
{"type":"function","function":_schema} without validating that
_schema carries a top-level name. A provider returning an entry already
in OpenAI tool form ({"type":"function","function":{...}}) was then
double-wrapped into a tool whose function has no name. Strict providers
(e.g. DeepSeek) reject the entire request with HTTP 400
'tools[N].function: missing field name', so one malformed schema
silently disables the whole toolset and breaks every turn. The schema
was also never added to valid_tool_names, so even lenient providers
could not call it.

Add a shared normalize_tool_schema() helper that unwraps an
already-wrapped entry and returns None for anything lacking a resolvable
string name. Wire it into the agent_init context-engine loop and all
three memory_manager surfaces (inject_memory_provider_tools,
add_provider routing index, get_all_tool_schemas), so a single bad
plugin schema is skipped with a warning instead of poisoning the
request.

Verification: 209 targeted agent/memory tests pass (incl. 9 new).
New tests assert the unwrap + skip-nameless behavior and fail without
the fix.
2026-06-25 02:17:29 +05:30
liuhao1024
dbf0797335 fix(tools): catch mkdtemp OSError in tirith install to prevent unbounded retry and temp-dir leak (#51826)
When tempfile.mkdtemp() raises OSError (e.g. disk full), the exception
propagated past the try/finally block, so _mark_install_failed() was
never called. The 24h backoff marker never engaged, causing unbounded
retry on every command -- each attempt leaked a tirith-install-* temp
directory, eventually filling /tmp completely.

Fix: wrap mkdtemp in its own try/except OSError, returning
(None, "no_space") so the caller's normal failure path (including
_mark_install_failed) executes.

Salvaged from #51831 by @liuhao1024.

Closes #51826
2026-06-25 02:13:56 +05:30
liuhao1024
8d1f6debfd fix(agent): deepcopy plugin context engine to prevent parent corruption on delegate_task (#42449)
When delegate_task spawns a child agent with a different model/provider, the
child's init_agent loaded the plugin context-engine GLOBAL singleton by
reference (`_selected_engine = _candidate`) and then called update_model() on
it with the child's (smaller) context_length. Because parent and child shared
the same object, this mutated the PARENT's compressor: e.g. DeepSeek 1M ctx
silently dropped to 204800 and the compression threshold from 200K to 40K
after any delegate_task with a different model.

Deepcopy the singleton before assigning/mutating it (agent_init.py) so the
child gets its own instance and the parent's compressor is untouched.

Salvaged from #42452 by @liuhao1024 (authorship preserved). Added a
source-pin regression test that fails if the production line reverts to the
bare alias, plus an end-to-end test driving get_plugin_context_engine() and a
StubEngine.update_model() — the original PR's tests exercised copy.deepcopy in
isolation but did not guard the actual agent_init code path.

Closes #42449. Supersedes #42469, #42474 (same one-line fix, no test).
2026-06-25 02:13:26 +05:30
kshitij
77d2b50751
Merge pull request #52118 from NousResearch/salvage/36776-ddgs-timeout
fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776)
2026-06-25 01:56:26 +05:30
kshitij
4d589b1e13
Merge pull request #52121 from NousResearch/salvage/43466-strip-cronjob-toolset
fix(delegate): strip cronjob toolset from delegated children (#43466)
2026-06-25 01:54:37 +05:30
uzunkuyruk
489b85ee1e fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776)
A single ddgs (DuckDuckGo) search could hang indefinitely and block the
shared agent loop — and therefore every platform (CLI, Telegram, Matrix...).
The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's
multi-engine retry loop has no overall cap, so a slow/rate-limited response
could spin for 20+ minutes with no output and no error.

Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap
it with future.result(timeout=_SEARCH_TIMEOUT_SECS=30). On timeout, return a
clear failure ("DuckDuckGo search timed out ... try a different provider")
instead of blocking; the pool is shut down with cancel_futures so a hung
worker is never awaited.

Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on
current main (the PR's provider.py base had diverged). Added a load-bearing
timeout regression test (the original PR only updated the fake's constructor
and had no timeout-behavior test) — mutation-verified to fail without the cap.

Closes #36776.
2026-06-25 01:45:06 +05:30
kshitijk4poor
e25b56fc64 chore: AUTHOR_MAP entry for riyas22 (PR #43687 salvage) 2026-06-25 01:39:11 +05:30
Riyasudeen Farook
1e4df599ec fix(delegate): strip cronjob toolset from delegated children (#43466)
_strip_blocked_tools used a hardcoded set missing 'cronjob'. Children
on gateway platforms could inherit the cronjob toolset, scheduling
persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS.

Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the
two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for
documentation consistency. Two regression tests lock the invariant.

Salvaged from #43687 by @riyas22. Adapted test to current main (no
'messaging' toolset exists -- send_message is intentionally not
registered as an agent tool).

Closes #43466
2026-06-25 01:37:25 +05:30
kshitij
7a79a4447c
Merge pull request #52116 from NousResearch/fix/46994-session-load-bool-iterable
fix(gateway): skip non-dict entries in session loading (#46994)
2026-06-25 01:33:36 +05:30
kshitij
8f0a12ce09
Merge pull request #52114 from NousResearch/salvage/27405-preflight-fewbig
fix(agent): trigger preflight compression on few-but-huge sessions (#27405)
2026-06-25 01:27:07 +05:30
kshitijk4poor
9c994377ed fix(gateway): skip non-dict entries in session loading (#46994)
Corrupted sessions.json entries (e.g. a bare bool where a dict is
expected) caused TypeError on 'origin' in data' which escaped the
(ValueError, KeyError) inner except and aborted loading ALL remaining
sessions, not just the corrupted one.

Two-layer fix:
- Loop level: isinstance(entry_data, dict) guard before from_dict
- from_dict: isinstance(data['origin'], dict) instead of bare truthiness
- Added TypeError to the inner except as defense-in-depth

Closes #46994
2026-06-25 01:26:13 +05:30
texhy
aacc6bb0a8 fix(agent): trigger preflight compression on few-but-huge sessions (#27405)
The preflight-compression gate only ran the (expensive) token estimate when
the message COUNT exceeded protect_first_n + protect_last_n + 1. A session
with a handful of very large messages never tripped the count condition, so
compression was never attempted and the turn eventually hit a hard
context-overflow error.

Add _should_run_preflight_estimate() with OR semantics: run the estimate when
either the message count exceeds the protected ranges (the historical gate)
OR a cheap char-based estimate already crosses the configured threshold. The
downstream estimate_request_tokens_rough() stays authoritative — this is only
a hint that decides whether to pay for the full estimate.

Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current
main: the preflight gate moved from conversation_loop.py to turn_context.py
since the PR was opened, so the helper + gate are placed there; the test
imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal.

Closes #27405.
2026-06-25 01:20:23 +05:30
kshitij
ed1fdb5b61
Merge pull request #52112 from NousResearch/revert/52053-minimum-context-floor
revert(plugins): revert minimum context floor configurable (#52053)
2026-06-25 01:11:53 +05:30
kshitijk4poor
e0272cfef2 Revert "fix(compression): make minimum context floor configurable (#31600)"
This reverts commit cae1ee44a7.
2026-06-25 01:04:44 +05:30
kshitij
59acaa972f
Merge pull request #52053 from NousResearch/salvage/31600-minimum-context-length-configurable
fix(compression): make minimum context floor configurable (#31600)
2026-06-25 01:02:52 +05:30