Commit graph

13633 commits

Author SHA1 Message Date
nightq
fa3ab2ffd0 fix: normalize tool_call_id whitespace in sanitizer
_sanitize_api_messages() compared raw tool_call_id strings without
stripping whitespace. When assistant-side IDs and tool-result IDs
diverged due to surrounding whitespace, valid tool results were treated
as orphaned and replaced with [Result unavailable] stub placeholders.

Strip whitespace in _get_tool_call_id_static() (both call_id/id paths,
dict and object) and at the two result_call_id comparison sites in
_sanitize_api_messages(). Adds regression tests for preserved-whitespace
results and orphaned-whitespace removal.

Closes #9999
2026-06-30 01:43:40 -07:00
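A minimal sketch of the normalization fa3ab2ffd0 describes, not the project's actual code. The helper names `get_tool_call_id` and `drop_orphaned_results` and the message shape are assumptions for illustration. The sketch drops orphaned results instead of substituting a stub.

```python
from typing import Any, List, Optional


def get_tool_call_id(call: Any) -> Optional[str]:
    """Return a whitespace-stripped ID from a dict or an object (call_id, then id)."""
    for key in ("call_id", "id"):
        value = call.get(key) if isinstance(call, dict) else getattr(call, key, None)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def drop_orphaned_results(messages: List[dict]) -> List[dict]:
    """Keep tool results whose normalized ID matches an assistant tool call."""
    known = set()
    for msg in messages:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                call_id = get_tool_call_id(call)
                if call_id:
                    known.add(call_id)

    kept = []
    for msg in messages:
        if msg.get("role") == "tool":
            # Compare stripped IDs so " call_1 " and "call_1" are the same call.
            result_id = (msg.get("tool_call_id") or "").strip()
            if result_id not in known:
                continue  # genuinely orphaned result
        kept.append(msg)
    return kept
```

With this shape, a result tagged `"call_1"` survives even when the assistant side recorded `" call_1 "`.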
brooklyn!
f9b619dfae
Merge pull request #55504 from NousResearch/bb/desktop-split-prompt-body
refactor(desktop): decompose use-prompt-actions (slash + submit sub-hooks)
2026-06-30 03:22:43 -05:00
brooklyn!
90f59ecdbb
Merge pull request #55501 from NousResearch/bb/desktop-split-message-stream
refactor(desktop): split use-message-stream (utils + gateway-event sub-hook)
2026-06-30 03:22:21 -05:00
Brooklyn Nicholson
7337248a4c refactor(desktop): extract submit pipeline into use-prompt-actions/submit
After the slash dispatcher, the next-largest body unit was submitPromptText —
a ~280-line submit pipeline. Lift it into a colocated useSubmitPrompt sub-hook
(use-prompt-actions/submit.ts) with a typed SubmitPromptDeps object; body moves
verbatim. SubmitTextOptions moves to utils.ts (shared by submit + submitText).

Pure restructuring, no behaviour change (full use-prompt-actions suite green).
index.ts: 1,212 -> 937.
2026-06-30 03:15:10 -05:00
Brooklyn Nicholson
51a710e57e refactor(desktop): extract gateway-event dispatcher into its own sub-hook
The remaining bulk of useMessageStream was handleGatewayEvent — a ~550-line
event-type dispatcher. Lift it into a colocated useGatewayEventHandler sub-hook
(use-message-stream/gateway-event.ts): the values it closed over (sibling
streaming callbacks + the 3 stable refs the deps array omitted + options)
become a typed GatewayEventDeps object; the dispatcher body moves verbatim.

Pure restructuring, no behaviour change (utils tests still green). index.ts:
1,120 -> 540.
2026-06-30 03:11:14 -05:00
kshitijk4poor
58d8e25e67 fix(agent): make compression lock-lease refresher tolerate transient DB blips
Follow-up hardening on the salvaged #54465 backoff persistence work.

The lease refresher's loop treated ANY falsy refresh as a permanent stop
(`if not refreshed: break`), conflating two distinct cases:
  - genuine lost-ownership (rowcount 0) — correct to stop, and
  - a one-off transient DB error (write contention that escapes
    _execute_write's retry budget) — which returned False identically.

A single transient blip therefore killed the lease for the rest of a
multi-minute compression call, silently reintroducing the exact 300s-TTL <
~361s-call expiry wedge the PR set out to fix.

Changes:
- _CompressionLockLeaseRefresher._run now tolerates a bounded run of
  consecutive failures (_MAX_CONSECUTIVE_REFRESH_FAILURES = 3) before giving
  up the lease; a recovered tick resets the counter. Worst-case extra hold is
  cap * refresh_interval, still bounded by the acquirer's TTL.
- Replace the two remaining silent `except Exception: pass` arms in the
  compression-failure-cooldown persist/clear helpers with debug logging, for
  parity with their sqlite3.Error sibling arms (a non-sqlite bug was invisible).
- Document the join(timeout=1.0) quiesce bound in stop().
- Add 3 regression tests: single-blip tolerance, persistent-failure stop at the
  cap, and refresh-raising tolerance.
2026-06-30 13:36:29 +05:30
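The bounded-failure loop from 58d8e25e67 can be sketched roughly as below. `LeaseRefresher`, its constructor arguments, and the `gave_up` flag are hypothetical stand-ins; only the cap of 3 and the reset-on-success rule come from the commit message.

```python
import threading
from typing import Callable

MAX_CONSECUTIVE_REFRESH_FAILURES = 3  # cap named in the commit message


class LeaseRefresher:
    """Hypothetical stand-in for the real refresher; illustrates the loop only."""

    def __init__(self, refresh: Callable[[], bool], interval: float) -> None:
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self.gave_up = False

    def run(self) -> None:
        failures = 0
        # Event.wait returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self._interval):
            try:
                ok = self._refresh()
            except Exception:
                ok = False  # a raising refresh counts as one failed tick
            if ok:
                failures = 0  # a recovered tick resets the counter
                continue
            failures += 1
            if failures >= MAX_CONSECUTIVE_REFRESH_FAILURES:
                self.gave_up = True
                return

    def stop(self) -> None:
        self._stop.set()
```

A single failed tick between successful ones no longer ends the loop; only three failures in a row do.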
Rod Boev
7479f26b3f fix(agent): keep unbound compressors on the fail-open path (#54465) 2026-06-30 13:36:29 +05:30
Rod Boev
6fd701acbe fix(agent): keep cooldown state on the active session (#54465) 2026-06-30 13:36:29 +05:30
Rod Boev
cafe9d9261 fix(agent): prevent stale lock leases after early compression exits (#54465) 2026-06-30 13:36:29 +05:30
Rod Boev
f2ace45286 fix(agent): release refreshed compression locks on every exit path (#54465) 2026-06-30 13:36:29 +05:30
Rod Boev
53ef954841 fix(agent): keep cooldown and lock refresh on one authority (#54465) 2026-06-30 13:36:29 +05:30
Rod Boev
f2ccb2859f fix(agent): persist compression backoff across resume (#54465) 2026-06-30 13:36:29 +05:30
brooklyn!
5edfda5088
Merge pull request #55497 from NousResearch/bb/desktop-split-session-actions
refactor(desktop): split use-session-actions into folder + utils
2026-06-30 03:05:57 -05:00
Brooklyn Nicholson
08c83d0555 refactor(desktop): extract slash dispatcher into use-prompt-actions/slash
The usePromptActions body's largest unit was executeSlashCommand — a ~530-line
`/command` dispatcher. Lift it into a colocated useSlashCommand sub-hook
(use-prompt-actions/slash.ts): the ~13 values it closed over become a typed
SlashCommandDeps object the parent passes in; the dispatcher body (and its inner
runSlash recursion) moves verbatim. SlashActionCtx (slash-only) moves with it.

Pure restructuring, no behaviour change (verified: full use-prompt-actions test
suite still green). index.ts: 1,772 -> ~1,250.
2026-06-30 03:03:45 -05:00
Teknium
643b0dc678
fix(cron): raise default pre-run script timeout from 120s to 1h (#55489)
Cron pre-run scripts were capped at 120s by default, which surprised
users running long data-collection scripts on crons (the whole point of
crons being to offload long work). Raise _DEFAULT_SCRIPT_TIMEOUT to 3600s
(1 hour).

This bounds the script only — skill/agent jobs already run on a separate
inactivity budget (HERMES_CRON_TIMEOUT, default 600s idle, 0=unlimited),
not a wall-clock cap. Scripts dispatch to a persistent thread pool and do
not hold the tick lock, so a long script doesn't starve other due jobs.

Docs clarified to make the script-vs-agent timeout distinction explicit.

env/config overrides (HERMES_CRON_SCRIPT_TIMEOUT,
cron.script_timeout_seconds) unchanged and still take precedence.
2026-06-30 01:00:39 -07:00
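A sketch of how the override chain in 643b0dc678 might resolve. The commit only says the env and config overrides take precedence over the default; checking the environment variable before the config key is an assumption, as is the `resolve_script_timeout` helper itself.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_SCRIPT_TIMEOUT = 3600  # seconds (1 hour), the new default


def resolve_script_timeout(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical resolver: env var, then config key, then the default."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_CRON_SCRIPT_TIMEOUT")
    if raw is None:
        raw = (config.get("cron") or {}).get("script_timeout_seconds")
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Missing or unparsable override: fall back to the default.
        return DEFAULT_SCRIPT_TIMEOUT
```

For example, `resolve_script_timeout({"cron": {"script_timeout_seconds": 900}}, {})` yields 900, and an empty config with no env var yields 3600.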
Brooklyn Nicholson
086343854d refactor(desktop): split use-message-stream into folder + utils
Extract the standalone gateway-event helpers (session-info patch derivation,
completion-error detection, todo-payload routing, delegate_task -> subagent
spec mapping, + the stream-flush/subagent-event constants) out of the
1,285-line hook into a colocated, tested use-message-stream/utils.ts. index.ts
keeps the stateful streaming hook and consumes the helpers.

Pure restructuring, no behaviour change; folder index keeps the import path
intact. index.ts: 1,285 -> ~1,120. Adds unit tests for the pure helpers.
2026-06-30 02:58:45 -05:00
Brooklyn Nicholson
ed47f2b4aa refactor(desktop): split use-session-actions into folder + utils
Extract the ~16 standalone helpers (message reconciliation, optimistic/resolved
session upserts, stored-session resolution, runtime-info application, error
classification) out of the 1,254-line god hook into a colocated, tested
use-session-actions/utils.ts. index.ts keeps the hook orchestrator (the
stateful action callbacks) and consumes the helpers.

Pure restructuring, no behaviour change; folder index keeps the import path
(`@/app/session/hooks/use-session-actions`) intact. index.ts: 1,254 -> ~950.
Adds unit tests for the pure helpers.
2026-06-30 02:54:46 -05:00
David Gutowsky
3a83b6bc5d fix(gateway): self-heal stale sessions.json routing at message time
Detect a routing key whose session is already ended in state.db
(end_reason set) inside get_or_create_session and drop the stale entry
instead of silently routing the message into a closed session.

Previously the only runtime cleanup of sessions.json was the startup
_prune_stale_sessions_locked (#52808/#54138), which requires a restart.
A session ended while the gateway stays alive — any path that finalizes
the DB row without clearing sessions.json — left a live routing key
pointing at a closed session. get_or_create_session never consulted
end_reason, so it returned that stale entry and every subsequent message
was silently dropped (no log, no error, no response) until the next
restart. This is the live-gateway variant of #52804/FM9, which needed an
actual gateway crash.

The guard drops the stale entry and falls through to
_recover_session_from_db, which reopens agent_close-ended rows and
resumes the SAME session_id (transcript preserved); if the row ended for
a non-recoverable reason (e.g. /new) it correctly starts a fresh
session. A warning is logged so the event is visible (the field
incident reported zero log output).

Adds tests/gateway/test_session_store_runtime_stale_guard.py covering
the _is_session_ended_in_db helper and the end-to-end routing self-heal
(recover-vs-fresh, live-entry untouched, stale-wins-over-suspended,
force_new short-circuit).

Closes #54878.

Co-authored-by: David Gutowsky <david.gutowsky@gmail.com>
2026-06-30 13:17:51 +05:30
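The routing self-heal in 3a83b6bc5d can be pictured with plain dictionaries standing in for sessions.json and state.db. Everything here is a simplified assumption: `route_message`, the dict layout, and the `"new_command"` end reason are illustrative; only the `agent_close` recoverable case is taken from the commit message.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("gateway.sketch")


def route_message(
    routing: Dict[str, str],                 # routing key -> session_id
    end_reasons: Dict[str, Optional[str]],   # session_id -> end_reason (None = live)
    key: str,
    new_session_id: Callable[[], str],
) -> str:
    session_id = routing.get(key)
    if session_id is not None and end_reasons.get(session_id) is None:
        return session_id  # live entry: left untouched

    if session_id is not None:
        reason = end_reasons[session_id]
        logger.warning("dropping stale route %s -> %s (end_reason=%s)",
                       key, session_id, reason)
        del routing[key]
        if reason == "agent_close":
            # Recoverable: reopen the row and resume the same session_id.
            end_reasons[session_id] = None
            routing[key] = session_id
            return session_id

    # No entry, or the session ended for a non-recoverable reason.
    fresh = new_session_id()
    routing[key] = fresh
    end_reasons[fresh] = None
    return fresh
```

The point of the guard is the middle branch: a stale key is never returned as-is, so a message cannot be routed into a closed session.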
brooklyn!
6763d63240
Merge pull request #55493 from NousResearch/bb/desktop-hook-folders
refactor(desktop): colocate hook/component families into scoped folders
2026-06-30 02:45:44 -05:00
Brooklyn Nicholson
fa7bce0789 refactor(desktop): colocate hook/component families into scoped folders
Single-scoped helpers/sub-files were sitting flat in shared/grab-bag dirs.
Fold each family into its own folder (index = the export, dir resolution keeps
public import paths intact), dropping the now-redundant filename prefix:

- session/hooks/use-prompt-actions.ts (+ -utils, + tests)
  -> use-prompt-actions/{index,utils}.ts (+ tests)
- components/assistant-ui/thread* + assistant/system/user message renderers
  -> assistant-ui/thread/{index,content,status,message-parts,timestamp,types,
     list,timeline,timeline-data,assistant-message,system-message,user-message,
     user-edit-composer,user-message-text} (+ tests)
- components/assistant-ui/tool-fallback(+model)/tool-approval
  -> assistant-ui/tool/{fallback,fallback-model,approval} (+ tests)

Pure move + import rewrites; no behaviour change. App-wide shared primitives
(markdown-text, directive-text, tooltip-icon-button, clarify-tool, ansi-text,
message-render-boundary) stay flat. desktop-controller intentionally left in
app/ (route root; foldering would churn ~80 relative imports for no gain).
2026-06-30 02:42:07 -05:00
kshitijk4poor
c9269fbfb6 fix(web_extract): bound stored full-text size + give concrete read_file offset
Two robustness gaps from the #54843 truncate-store path:

- _store_full_text wrote the full clean page to cache/web with no upper
  bound (path.write_text(content)); a multi-MB page → unbounded per-extract
  disk write. Cap at MAX_STORED_TEXT_CHARS (2MB, the pre-truncate-store
  refusal ceiling) with a marker when capped.
- The truncation footer told the model 'read_file ... offset=<line>' — a
  literal placeholder it had to guess. Compute the real starting line of the
  omitted middle (head line count + 1) so the first read_file lands in the gap.
2026-06-30 00:19:49 -07:00
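Both changes in c9269fbfb6 reduce to small pure functions. The sketch below assumes the cap is counted in characters, that "2MB" means 2 * 1024 * 1024, and that the head excerpt ends on a line boundary. `CAP_MARKER` and both function names are hypothetical.

```python
MAX_STORED_TEXT_CHARS = 2 * 1024 * 1024  # "2MB" per the commit; exact value assumed
CAP_MARKER = "\n[stored text capped]\n"   # hypothetical marker text


def cap_stored_text(content: str, limit: int = MAX_STORED_TEXT_CHARS) -> str:
    """Bound what gets written to the cache, flagging the cut when it happens."""
    if len(content) <= limit:
        return content
    return content[:limit] + CAP_MARKER


def omitted_start_line(head: str) -> int:
    """1-based line where the omitted middle begins: head line count + 1."""
    return len(head.splitlines()) + 1
```

If the head shown to the model is three lines long, the footer can name line 4 as the concrete `offset` instead of a placeholder.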
kshitijk4poor
c1b9de73f5 perf(context-refs): expand @-references concurrently
Multiple @-references in one message (esp. @url: refs, each a full
web_extract round-trip) were expanded in a serial `for ref in refs: await`
loop. Switch to asyncio.gather over the independent _expand_reference calls,
reassembling warnings/blocks in original positional order so output is
byte-identical to the serial path; the token-budget check is unchanged.

Generic + provider-agnostic: helps every web backend equally (exa/tavily/
firecrawl/parallel) since it's above the provider layer. RED/GREEN test:
3 url refs @ 0.2s each = 0.60s serial -> ~0.20s concurrent.
2026-06-30 00:19:49 -07:00
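The serial-to-concurrent switch in c1b9de73f5 relies on `asyncio.gather` returning results in the order of its arguments, regardless of completion order. In this sketch `expand_reference` is a stub with made-up delays, not the real `_expand_reference`.

```python
import asyncio
from typing import Dict, List

# Hypothetical latencies: later references finish first on purpose.
DELAYS: Dict[str, float] = {"@url:a": 0.03, "@url:b": 0.02, "@url:c": 0.01}


async def expand_reference(ref: str) -> str:
    """Stub expansion; stands in for a web_extract round-trip."""
    await asyncio.sleep(DELAYS[ref])
    return f"<block {ref}>"


async def expand_all(refs: List[str]) -> List[str]:
    # gather() runs the independent expansions concurrently and returns
    # results in argument order, so positional reassembly comes for free.
    return list(await asyncio.gather(*(expand_reference(r) for r in refs)))


if __name__ == "__main__":
    print(asyncio.run(expand_all(list(DELAYS))))
```

Total wall time is roughly the slowest single expansion, not the sum, while the output order matches the serial loop.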
brooklyn!
a1b6e7eadc
Merge pull request #55470 from NousResearch/bb/desktop-button-consistency
refactor(desktop): formalize row-as-button primitive (RowButton)
2026-06-30 02:19:37 -05:00
brooklyn!
8f8487b54f
Merge pull request #55468 from NousResearch/bb/desktop-icon-size-token
refactor(desktop): add iconSize token, migrate ad-hoc icon sizes onto it
2026-06-30 02:19:28 -05:00
brooklyn!
28ba01c603
Merge pull request #55459 from NousResearch/bb/desktop-split-prompt-actions
refactor(desktop): extract use-prompt-actions standalone helpers into utils
2026-06-30 02:19:12 -05:00
brooklyn!
b69c2d2fcd
Merge pull request #55456 from NousResearch/bb/desktop-split-controller
refactor(desktop): thin desktop-controller by extracting session-list actions
2026-06-30 02:19:03 -05:00
brooklyn!
116acf3821
Merge pull request #55455 from NousResearch/bb/desktop-split-composer
refactor(desktop): extract composer pure helpers into composer-utils
2026-06-30 02:18:41 -05:00
brooklyn!
61211967e1
Merge pull request #55453 from NousResearch/bb/desktop-split-sidebar
refactor(desktop): split sidebar/index.tsx god file into focused modules
2026-06-30 02:18:32 -05:00
brooklyn!
374d38f09d
Merge pull request #55451 from NousResearch/bb/desktop-split-thread
refactor(desktop): split thread.tsx god file into focused modules
2026-06-30 02:17:56 -05:00
brooklyn!
ddf0d980b6
Merge pull request #55473 from NousResearch/bb/cmdk-drag-region
fix(desktop): make ⌘K / session-switcher HUDs ignore the titlebar drag band
2026-06-30 02:11:03 -05:00
Brooklyn Nicholson
03311abe49 fix(desktop): make ⌘K / session-switcher HUDs ignore titlebar drag band
The top-center floating HUDs (command palette + session switcher) pin at
top-3, overlapping the titlebar's `[-webkit-app-region:drag]` bands. Drag
regions win hit-testing over the DOM regardless of z-index, so the top of
each surface — the search input — swallowed clicks, leaving only a ~2px
strip focusable. Add `[-webkit-app-region:no-drag]` to the shared
HUD_SURFACE so the whole surface is interactive.
2026-06-30 02:06:30 -05:00
Brooklyn Nicholson
57dd86f247 docs(desktop): tighten RowButton doc comment 2026-06-30 02:05:55 -05:00
Brooklyn Nicholson
f3cd744f5c docs(desktop): tighten iconSize doc comment 2026-06-30 02:05:31 -05:00
Brooklyn Nicholson
c2fb651c5e refactor(desktop): formalize row-as-button primitive (RowButton)
Finding 2 of the desktop UI-consistency pass. Several surfaces intentionally
make an entire row/cell the click target while hosting nested layout inside a
raw <button> (each re-justifying the pattern in a local comment). Introduce a
zero-style RowButton primitive (components/ui/row-button.tsx) that bakes in the
shared semantics — type="button" + a stable data-slot — without imposing any
styling, then migrate every genuine row-button onto it:

- app/overlays/panel.tsx
- app/artifacts/index.tsx
- app/chat/sidebar/chrome.tsx (SidebarRowBody, SidebarRowLink)
- app/settings/providers-settings.tsx
- components/desktop-onboarding-overlay.tsx (PROVIDER_ROW_CLASS rows)

Fully behavior-preserving: RowButton adds no classes, so each row keeps its
exact layout/look (verified by a unit test asserting className passthrough).

Left as-is (not row-buttons; converting would risk visual regressions): the
compact bespoke buttons in shell/statusbar-controls.tsx (STATUSBAR_ACTION_CLASS,
also a nested DropdownMenuTrigger asChild) and pet-generate/reference-chip.tsx.
2026-06-30 01:56:57 -05:00
Brooklyn Nicholson
5e51f9c689 refactor(desktop): add iconSize token and migrate ad-hoc icon sizes onto it
Finding 1 of the desktop UI-consistency pass: SVG icon sizing had four
competing conventions with no source of truth. Introduce a named icon-size
scale (iconSize.xs/sm/md/lg/xl -> size-3/3.5/4/5/6) in lib/icons.ts and migrate
the genuine icon deviants onto it:

- desktop-install-overlay.tsx: Loader2/Check/AlertTriangle/Chevron* (h-4 w-4,
  h-3.5 w-3.5 -> iconSize.md/sm)
- composer/controls.tsx, voice-activity.tsx, queue-panel.tsx: numeric size={N}
  on Tabler icons -> iconSize classes

Sizes snap to the nearest scale step; the only rendered deltas are size={11}
-> 12px (queue/stop glyphs, +1px) and AudioLines size={15} -> 14px (-1px, now
matches its sibling toolbar icons). All other migrations are exact (12/14/16px).

Out of scope (different sizing mechanisms, left untouched): non-icon h-N w-N
layout (sliders, skeletons, swatches), sprite size props (PixelEggSprite), and
Codicon font-icon sizing. Broader size-N -> token adoption is follow-up.
2026-06-30 01:50:27 -05:00
brooklyn!
d6396e6a41
Merge pull request #55449 from NousResearch/bb/verify-on-stop-auto-default
feat(agent): restore surface-aware "auto" default for verify_on_stop
2026-06-30 01:47:59 -05:00
Brooklyn Nicholson
4dbd869ab3 feat(agent): restore surface-aware "auto" default for verify_on_stop
#53552 flipped verify_on_stop to default OFF because the guard fired on
doc/markdown/skill edits and felt like noise. That doc/markdown/skill
suppression already shipped in the same change (_filter_verifiable_paths in
agent/verification_stop.py), so the original noise rationale no longer holds:
the guard already skips prose-only turns.

Restore the surface-aware "auto" default — ON for interactive coding surfaces
(CLI, TUI, desktop) and programmatic callers, OFF for conversational messaging
surfaces (Telegram, Discord, etc.) where the verification narrative would reach
a human as chat noise. The missing/unrecognized fallback in
verify_on_stop_enabled now resolves to the same surface-aware default instead of
hard OFF, so both the DEFAULT_CONFIG value and the resolver agree.

Scope: this changes the shipped default for fresh installs and configs without
an explicit verify_on_stop key. Existing configs that #53552/#54740 migrated to
an explicit `false` are respected and unchanged — this PR does not add a
force-migration of those values back to auto.
2026-06-30 01:43:08 -05:00
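The surface-aware default in 4dbd869ab3 might resolve along these lines. The surface identifiers and the `resolve_verify_on_stop` name are assumptions; the real resolver is `verify_on_stop_enabled`, whose signature is not shown in the commit.

```python
from typing import Any, Optional

# Hypothetical surface identifiers for conversational messaging platforms.
MESSAGING_SURFACES = {"telegram", "discord"}


def resolve_verify_on_stop(value: Any, surface: Optional[str]) -> bool:
    """Explicit booleans win; "auto", missing, or unrecognized values are surface-aware."""
    if isinstance(value, bool):
        return value  # an explicit true/false in the config is respected
    # ON for CLI/TUI/desktop and programmatic callers (surface=None),
    # OFF where the verification narrative would land in a chat.
    return surface not in MESSAGING_SURFACES
```

An explicit `false` written by an earlier migration therefore stays off, while a config with no key gets the surface-dependent answer.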
brooklyn!
aa3c1d6679
Merge pull request #55457 from NousResearch/bb/fix-windows-subprocess-test-flake
test: fix flaky windows no-window-flag tests vs update-check daemon
2026-06-30 01:42:48 -05:00
Brooklyn Nicholson
bde2dc1051 refactor(desktop): extract use-prompt-actions standalone helpers into utils
The usePromptActions hook is the textbook "god hook" AGENTS.md warns against.
As a first, safe slice, pull its module-level standalone helpers (no closure
over hook state) into a focused, testable use-prompt-actions-utils.ts sibling:

- error classifiers: isSessionNotFoundError, isSessionBusyError,
  isProviderSetupError, inlineErrorMessage
- session-busy retry: withSessionBusyRetry (+ its constants)
- attachment IO: base64FromDataUrl, imageFilenameFromPath,
  readImageForRemoteAttach, readFileDataUrlForAttach, friendlyRemoteAttachError
- misc: delay, isSessionIdCandidate, blobToDataUrl, renderCommandsCatalog,
  slashStatusText, appendText, visibleUserOrdinal, visibleUserIndexAtOrdinal,
  the _submitInFlight guard set, and the GatewayRequest type

Pure restructuring, no behavior change; the usePromptActions and
uploadComposerAttachment exports (and their import paths) are unchanged. Adds
unit tests for the pure helpers. use-prompt-actions.ts: 1,956 -> 1,772.
2026-06-30 01:38:58 -05:00
Brooklyn Nicholson
0ea318a7d4 test: make windows no-window-flag assertions immune to update-check daemon
These tests patch `<module>.subprocess.run`, which is the shared `subprocess`
module singleton, so the patch is process-wide. Importing `tui_gateway.server`
runs `prefetch_update_check()` at import time, spawning an unnamed daemon thread
(`Thread-N (_run)`) that shells out to `git ... origin` (`text=True, timeout=5`).
That call races the test and lands in the captured list, intermittently failing
`test_tui_gateway_fuzzy_file_listing_hides_git_windows` with either
`KeyError: 'creationflags'` (the daemon's git call has no creationflags) or a
call-count mismatch (3 git calls captured, not 2). It only reproduced under the
parallel test harness because of the extra concurrency/timing.

Filter captured calls to the distinctive argv tokens of the call under test
(`--show-toplevel`, `ls-files`, `branch --show-current`, `diff`, `rg`,
`taskkill`) and read `creationflags` via `.get`, mirroring the existing
hardening on `test_gateway_pid_scan_hides_wmic_and_powershell_windows`. The
production code is unchanged; this is a test-isolation fix.
2026-06-30 01:35:55 -05:00
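The test-isolation technique in 0ea318a7d4 amounts to filtering the captured `subprocess.run` calls before asserting. In this sketch `calls_under_test` is a hypothetical helper, the middle argv element of the daemon call is a placeholder for the elided arguments, and the flag value is that of `subprocess.CREATE_NO_WINDOW` on Windows.

```python
from typing import Any, Dict, List, Sequence, Tuple

Call = Tuple[Sequence[str], Dict[str, Any]]

CREATE_NO_WINDOW = 0x08000000  # value of subprocess.CREATE_NO_WINDOW on Windows


def calls_under_test(captured: List[Call], tokens: Sequence[str]) -> List[Call]:
    """Keep only calls whose command line contains a distinctive token."""
    return [
        (argv, kwargs)
        for argv, kwargs in captured
        if any(tok in " ".join(argv) for tok in tokens)
    ]


captured: List[Call] = [
    (["git", "rev-parse", "--show-toplevel"], {"creationflags": CREATE_NO_WINDOW}),
    # Stray call from the update-check daemon; "<elided>" is a placeholder.
    (["git", "<elided>", "origin"], {"text": True, "timeout": 5}),
    (["git", "ls-files"], {"creationflags": CREATE_NO_WINDOW}),
]

relevant = calls_under_test(captured, ["--show-toplevel", "ls-files"])
# .get avoids a KeyError if an unrelated call slips through without the flag.
flags = [kwargs.get("creationflags") for _, kwargs in relevant]
```

Joining argv before matching lets a multi-word token such as `branch --show-current` match as well.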
Brooklyn Nicholson
25c7900fb5 refactor(desktop): thin desktop-controller by extracting session-list actions
DesktopController is a route root that had grown a controller's worth of
session-list plumbing inline. Extract the cohesive fetch/paging cluster into
a focused hook and a tested pure helper, per AGENTS.md's "keep route roots
thin" guidance:

- use-session-list-actions.ts: refreshSessions / loadMoreSessions /
  loadMoreSessionsForProfile / loadMoreMessagingForPlatform / refreshCronJobs
  (plus the private cron/messaging refreshers, sessionsToKeep, and the
  excluded-source constants)
- desktop-controller-utils.ts: pure sameCronSignature helper (+ unit tests)

Pure restructuring, no behavior change. desktop-controller.tsx: 1,441 -> 1,233.
2026-06-30 01:34:12 -05:00
Brooklyn Nicholson
dd659c8d17 refactor(desktop): extract composer pure helpers into composer-utils
Pull ChatBar's module-level pure helpers, constants, and the QueueEditState
type out of the 2.3k-line composer/index.tsx into a focused, testable
composer-utils.ts sibling:

- constants: COMPOSER_STACK_BREAKPOINT_PX, COMPOSER_SINGLE_LINE_MAX_PX,
  COMPOSER_FADE_BACKGROUND, DRAFT_PERSIST_DEBOUNCE_MS
- helpers: pickPlaceholder, COMPLETION_ACTIONS, slashChipKindForItem,
  slashArgStage, slashCommandToken, cloneAttachments
- type: QueueEditState

Pure restructuring, no behavior change; adds unit tests for the slash helpers.
(The ChatBar component itself is a single tightly-coupled megacomponent; a
deeper hook-based decomposition is left for a dedicated follow-up.)
2026-06-30 01:30:52 -05:00
Brooklyn Nicholson
88e29e35bd refactor(desktop): split sidebar/index.tsx god file into focused modules
Behavior-preserving extraction of the 1,963-line ChatSidebar file into the
existing sidebar/ sibling-module convention:

- order.ts: add pure orderByIds / reconcileOrderIds / sameIds helpers (+ tests)
- reorderable-list.tsx: the generic ReorderableList + useSortableBindings DnD
  primitive
- section-states.tsx: SidebarSessionSkeletons / SidebarBlankState /
  SidebarPinnedEmptyState
- sessions-section.tsx: SidebarSectionHeader + the large SidebarSessionsSection
  renderer + its sortable row wrappers

index.tsx now holds only the ChatSidebar component (1,963 -> 1,416 lines).
2026-06-30 01:26:37 -05:00
Brooklyn Nicholson
7ff6908a59 refactor(desktop): split thread.tsx god file into focused modules
Behavior-preserving extraction of the 1,942-line thread.tsx transcript
renderer into co-located sibling modules, matching the existing flat
assistant-ui/ convention:

- thread-content.ts / thread-timestamp.ts: pure helpers (+ unit tests)
- thread-types.ts: shared RestoreMessageTarget
- thread-status.tsx: loading / stall / background-resume indicators
- thread-message-parts.tsx: reasoning + tool part components
- assistant-message.tsx, system-message.tsx, user-message.tsx,
  user-edit-composer.tsx: the message renderers

thread.tsx now holds only the Thread route component (1,942 -> 119 lines).
Also drops a dead readAloudAudio module variable (no references).
2026-06-30 01:21:55 -05:00
brooklyn!
a81c5922a2
Merge pull request #55413 from NousResearch/bb/pre-stop-hook
feat(agent): add pre_verify hook and coding guidance config
2026-06-30 01:10:08 -05:00
Brooklyn Nicholson
821d9f709f feat(agent): add configurable coding_instructions
agent.coding_instructions (a string or list) is appended to the coding brief as
its own stable system block, so users can pin project-wide workflow rules
without editing the shipped brief. Coding-posture only and cache-safe (resolved
once per session; takes effect next session). Empty by default.
2026-06-30 00:59:59 -05:00
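Since 821d9f709f accepts `agent.coding_instructions` as either a string or a list, some normalization step is implied. The sketch below is one plausible shape for it; `normalize_coding_instructions` is a hypothetical name and dropping blank entries is an assumption.

```python
from typing import Any, List


def normalize_coding_instructions(value: Any) -> List[str]:
    """Coerce a string-or-list config value into a clean list of instructions."""
    if isinstance(value, str):
        items = [value]
    elif isinstance(value, (list, tuple)):
        items = [str(item) for item in value]
    else:
        items = []  # empty by default
    return [item.strip() for item in items if item.strip()]
```

Either `"run the linter"` or `["run the linter", "keep diffs small"]` would then feed the same downstream code.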
Brooklyn Nicholson
a10113658b feat(agent): add pre_verify hook and verify-on-stop coding guidance
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.

Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
2026-06-30 00:59:29 -05:00
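The two accepted hook return shapes from a10113658b can be handled by one small parser. This is a sketch under assumptions: `continuation_message` is a hypothetical helper, and treating `attempt` as a zero-based count compared against `agent.max_verify_nudges` is a guess at the bound's semantics.

```python
from typing import Any, Optional


def continuation_message(result: Any, attempt: int, max_nudges: int) -> Optional[str]:
    """Return the nudge text if the hook asks for one more turn, else None."""
    if attempt >= max_nudges:
        return None  # bounded: never nudge past the configured cap
    if not isinstance(result, dict):
        return None
    if result.get("action") == "continue":
        return str(result.get("message", ""))
    if result.get("decision") == "block":
        # Claude-Code Stop shape, accepted as an alias.
        return str(result.get("reason", ""))
    return None
```

A hook returning `{"action": "continue", "message": "run tests"}` on attempt 0 yields `"run tests"`; the same return at the cap yields `None`.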
beardthelion
14c4a849b7 fix(kanban): make goal_mode judge gate truly fail-open
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.

Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.

Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.

Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
2026-06-29 22:20:19 -07:00
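The fail-open gate from 14c4a849b7 can be sketched as below. `may_complete` and its two callables are hypothetical; the real handler calls `judge_goal` and returns `tool_error(...)` on rejection. Only the judge call sits inside the `try`, so the verdict check cannot be swallowed by the broad except.

```python
import logging
from typing import Callable

logger = logging.getLogger("kanban.sketch")


def may_complete(judge_available: Callable[[], bool], judge: Callable[[], str]) -> bool:
    """Return True if a goal_mode completion should be allowed."""
    if not judge_available():
        # No reachable auxiliary model: a "continue" verdict would be
        # meaningless here, so fail open.
        return True
    try:
        verdict = judge()
    except Exception:
        logger.warning("goal judge failed; allowing completion", exc_info=True)
        return True
    # The verdict check lives outside the try on purpose.
    return verdict == "done"
```

Probing availability first is what separates "no judge configured" from a genuine "not done yet" verdict.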
beardthelion
b3c1b3b3f3 fix(kanban): address review feedback on goal_mode judge gate
Apply naqerl's review comments on PR #38388:

- Hoist `from hermes_cli.goals import judge_goal` to module-level
  imports so an import failure surfaces at module init, not lazily
  on the first goal-mode completion (no circular import: hermes_cli
  package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
  The verdict check and its rejection `return tool_error(...)` now
  live outside the handler, so a failure there can no longer be
  swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.

Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
2026-06-29 22:20:19 -07:00
beardthelion
0b33bc5396 fix(kanban): gate goal_mode task completion with auxiliary judge
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.

This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.

Fixes #38367

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-29 22:20:19 -07:00