Commit graph

8232 commits

Author SHA1 Message Date
Teknium
3b122cc1ac
feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578)
Surface ready tasks that nobody claims within a threshold (default
30 min) regardless of why. One identity-agnostic signal that catches:

- Operator typo'd the assignee
- Profile was deleted, leaving its tasks stranded
- External worker pool (Codex CLI lane, custom daemon) is down
- Dispatcher misconfigured (wrong board / wrong HERMES_HOME)

Today the dispatcher correctly skips these (no respawn loop, good)
but nothing surfaces the fact that operator-actionable work is
accumulating. The new `stranded_in_ready` rule does that without
requiring a manual lane registry — it reads the most recent ready-
transition event (`created` / `promoted` / `reclaimed` / `unblocked`)
and fires when (now - last_ready_ts) > threshold.

Severity escalates with age: warning at threshold, error at 2x,
critical at 6x. The cli_hint and reassign actions point operators
at the right next step.

Out of scope deliberately:
- Lane registry (#20157 closed) — this signal supersedes it.
- Pushing the diagnostic into messaging gateways — diagnostics
  are pull-only via 'hermes kanban diagnostics' for now; gateway
  push is a separate UX decision.

Tests: 10 new + 461 existing kanban tests pass. E2E verified end-
to-end via 'hermes kanban diagnostics --json' against a 2h-old
stranded task — surfaces as error severity with correct actions.
2026-05-10 21:58:44 -07:00
Teknium1
bf5b8a7d61 chore(release): map @eloklam tailnet email 2026-05-10 21:44:37 -07:00
Teknium1
b8bf2f817d fix(kanban): merge dashboard batch QOL with i18n + collapse + assignee-casing
PR #23240 was branched before main landed:
- c39168453 i18n localization (16 locales)
- a91e5a875 native <details> collapse + skip empty metadata
- 0e0ddaac8 tone down completed-run metadata panel
- b308dd7d7 preserve assignee casing in dashboard

The cherry-pick took PR's dist/index.js wholesale via -X theirs,
which dropped those features. This commit re-applies them by
hand-merging the 7 conflict regions:

1. bulk-action catch handler: keep PR's failedIds + loadBoard,
   keep main's t-in-deps for tx() i18n calls
2. Refresh button: keep main's tx(t, 'refresh', ...), add PR's
   Clear filters button with tx(t, 'clearFilters', ...)
3. Archive button: keep main's tx(t, 'archive', ...), add PR's
   priority setter with tx(t, 'priority'/'setPriority', ...)
4. Column header: keep main's colHelp i18n var, add PR's
   column-select-all checkbox
5/6. lane.tasks/column.tasks .map: keep main's t->tk rename
   (avoids shadowing the i18n t), apply tk to PR's failed/
   draggingSource props
7. Card checkbox label-wrap: keep PR's <label> structure
   (larger hit target), keep main's tx(i18n, 'selectForBulk', ...)

Adds three new i18n keys (clearFilters, priority, setPriority)
that fall back to English via tx() until translators add them
to the kanban catalog, matching the existing pattern.
2026-05-10 21:44:37 -07:00
eloklam
b60462a205 test(kanban): remove stale t.summary assertion from search test
Task.summary was never a real field; latest_summary already covers it.
Matches the haystack cleanup in commit f3015e6ab.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
3df7e30244 kanban dashboard: fix shift-click range selection, column select-all toggle, and bulk action optimistic UI
- Bug 1: shift-click now always adds the target card and sets it as the
  last-selected anchor, so range selection works even when 0 or 1 cards
  are selected.
- Bug 2: column select-all checkbox now toggles: if every card in the
  column is already selected, clicking unselects them all.
- Bug 3: applyBulk now mirrors moveSelected with optimistic UI updates
  for status moves and calls loadBoard() on catch for consistency.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
69053832e3 kanban dashboard: remove redundant t.summary from search haystack
The Task dataclass has no `summary` field; only Run carries summary.
The dashboard already searches `latest_summary` (derived from the
latest run), so `t.summary` in the client-side haystack was always
undefined and therefore redundant.

Verdict from task t_4bcac44f:
- Before batch QOL (6c7ec94d9): search only covered id, title,
  assignee, tenant.
- Batch QOL (7fd187102) correctly added body, result, latest_summary.
- `t.summary` was included but is a misleading no-op because tasks
  never expose a `summary` key — `latest_summary` already covers it.

Removes the redundant field from the haystack only.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
a88f201cd4 kanban dashboard: multi-card drag visual feedback
- When dragging a selected card while multiple cards are selected, the
  browser ghost image now shows a 'N cards' badge instead of a single card.
- All selected cards in the original column are dimmed (opacity 0.45 +
  grayscale) during the drag so the user sees the whole set is in-flight.
- Uses React state for the dragged task id; event delegation on the board
  columns container to avoid deep prop threading.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
98c499b235 kanban dashboard: fix batch QOL oracle blockers
- Preserve failedIds partial-failure highlighting after moveSelected/
  applyBulk by clearing only selectedIds/lastSelectedId instead of
  calling clearSelected() (which also wiped failedIds).
- Fix touch/native multi-drag drop stale closure by adding
  props.selectedIds and props.onMoveSelected to the hermes-kanban:drop
  useEffect dependency array.

Fixes t_5bfafb73.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
0ea234e093 feat(kanban): dashboard batch QOL upgrade
- Shift-click range selection, column select-all, select-all-visible
- Multi-card drag/drop via selectedIds + /tasks/bulk
- Expanded bulk actions: todo/ready/blocked/unblock/complete/archive,
  priority setter, reassign with reclaim_first checkbox
- Partial failure card highlight (failedIds + hermes-kanban-card--failed)
- Search expanded to body, result, latest_summary, summary
- Clear filters button + reset all filters on board switch
- Accessibility: larger checkbox hit target, tabIndex/role/aria-label,
  Enter/Space/Esc keyboard handlers
- Fix temporal-dead-zone bug: move clearSelected before moveSelected
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam
518d37f6af feat(kanban): add reclaim_first support to bulk reassign endpoint
- Extend BulkTaskBody with reclaim_first: bool = False
- In bulk_update, use kanban_db.reassign_task(..., reclaim_first=True)
  when payload.reclaim_first is set and assignee is present
- Falls back to existing assign_task behavior when reclaim_first is false

This enables the dashboard to bulk-reassign running tasks by
reclaiming their claims first, matching the single-task
/tasks/{id}/reassign endpoint behavior.
2026-05-10 21:44:37 -07:00
Teknium
a63a2b7c78
fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)
Live-tested on gemini-3-flash-preview the judge kept returning empty
or non-JSON content, tripping the consecutive-parse-failures auto-
pause. Free-form JSON output is hopeful; tool-call schemas are
enforced server-side by virtually every modern provider.

Two new tools the judge calls:

  - submit_checklist(items)  — Phase A, decompose
  - update_checklist(updates, new_items, reason) — Phase B, evaluate

Both phases now call the auxiliary client with tool_choice forcing
the right tool. read_file remains for Phase B history inspection,
with the loop exiting only when update_checklist is called or the
read budget is exhausted (at which point read_file is dropped from
the toolbox and update_checklist is forced).

Robustness:
- _call_judge_with_tool_choice falls back tool_choice forced→required→
  auto if the provider rejects a particular shape.
- If a fully-broken provider still returns content instead of a tool
  call, the legacy JSON-text parsers stay around as a last-ditch
  backstop so we never silently lose a checklist.
- _normalize_update_args replaces the JSON parser for the apply
  layer; same 1-based→0-based conversion + terminal-status filter.

Live verification: same fizzbuzz goal that was hitting 'judge model
returned unparseable output 3 turns in a row' before now terminates
in 2 turns, all 11 items marked completed with item-specific
evidence, no auto-pause. Agent log shows
'produced 11 checklist items via tool call' instead of the JSON-
parse path.

Tests: 7 new cases for the tool-call path (Phase A success, Phase B
update only, Phase B read_file→update, JSON-content backstop,
empty-text item dropping, non-terminal status filter).
2026-05-10 20:51:40 -07:00
Teknium
4a080b1d5a
fix(goals): forward standing /goal state on auto-compression session rotation (#23530)
When run_agent's _compress_context fires mid-turn it ends the parent
session in SessionDB and creates a new continuation session with a
fresh session_id. The /goal state is keyed on session_id in
state_meta ("goal:<sid>"), so without forwarding the goal silently
disappears: _get_goal_manager() rebinds for the new session_id,
load_goal() returns None, mgr.is_active() is False, and the
continuation loop dies with no user-visible signal.

Fix: in the same SessionDB transaction block that creates the
continuation session, copy state_meta[goal:<old>] →
state_meta[goal:<new>] when present. No-op when the user has no
active goal. Logged at INFO so a stuck loop is debuggable.

Tests cover the round-trip via SessionDB and the no-op path.

Affects all three run-conversation surfaces (CLI, gateway, TUI
gateway) because _compress_context is the single rotation site.
2026-05-10 20:41:53 -07:00
teknium1
68d081f570 fix(kanban): keep '--created-by' default as 'user'
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
OSV-Scanner / Scan lockfiles (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
uv.lock check / uv lock --check (push) Waiting to run
Out-of-scope behavior change in #23521 — the kanban notifier-routing fix
also flipped the 'kanban create --created-by' default from 'user' to the
active profile name. Revert to keep PR scope focused on the notifier
ownership fix; the profile-aware author default can be its own change.
2026-05-10 20:04:53 -07:00
Mike Nguyen
ba5640fa11 fix(gateway): route kanban notifications to creator profile 2026-05-10 20:04:53 -07:00
teknium1
9e005d6779 chore: AUTHOR_MAP entry for NivOO5 2026-05-10 20:02:50 -07:00
teknium1
7f90141c63 test(telegram): native-draft transport coverage + docs
Added tests/gateway/test_stream_consumer_draft.py with 11 tests
covering:
- Transport selection: auto+dm-supported -> draft; auto+group -> edit;
  explicit edit; explicit draft on unsupported adapter -> edit;
  MagicMock adapter -> edit (back-compat for the existing test suite).
- Happy path: DM stream animates draft frames with a single shared
  draft_id, then finalizes via a regular adapter.send.
- Group fallback: drafts entirely skipped in non-DM chats.
- Failure fallback: send_draft returning success=False disables drafts
  for the rest of the response.
- Draft_id lifecycle: consecutive responses use distinct ids; tool
  boundaries bump the id so post-tool text animates fresh below the
  tool-progress bubble (the openclaw #32535 leak guard).
- _already_sent contract: drafts must NOT set the flag so the gateway's
  fallback final-send still fires (drafts have no message_id).

Updated website/docs/user-guide/messaging/telegram.md with a
'Streaming transport' section explaining auto|draft|edit|off, the
DM-only constraint, and the per-response fallback behaviour.
2026-05-10 20:02:50 -07:00
NivOO5
4ed293b38e feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+)
Adds Telegram's native streaming-draft API as a streaming transport so DM
replies render with smooth animated previews as tokens arrive, dropping
the per-edit jitter of the legacy editMessageText polling path.

Adapter contract (gateway/platforms/base.py):
  - supports_draft_streaming(chat_type, metadata) -> bool. Default False.
    Telegram returns True only for DMs and only when the bound python-
    telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
  - send_draft(chat_id, draft_id, content, metadata) -> SendResult.
    Default raises NotImplementedError. Telegram delegates to PTB's
    send_message_draft. Drafts have no message_id (Bot API contract);
    SendResult.message_id is None on success.

Telegram adapter (gateway/platforms/telegram.py):
  - supports_draft_streaming gates on chat_type='dm' AND PTB capability.
  - send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads
    message_thread_id through metadata, and routes failures back as
    SendResult(success=False, error=...) so the consumer can fall back.

Stream consumer (gateway/stream_consumer.py):
  - StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off')
    and chat_type fields.
  - run() resolves _use_draft_streaming once via a probe at the top of
    the run, allocating a fresh class-wide draft_id_counter so each
    response animates as its own preview (no animation collision across
    consecutive responses to the same chat).
  - _send_or_edit gains a pre-edit branch: when drafts are active AND
    not finalizing AND no edit-path message_id is established, the
    frame routes through _send_draft_frame instead of edit_message.
    Drafts intentionally do NOT set _already_sent so the gateway's
    final sendMessage path still fires — drafts have no message_id and
    the user needs a real message in their chat history.
  - _reset_segment_state bumps the draft_id when the consumer is in
    draft mode so each text block after a tool boundary animates as a
    fresh preview below the tool-progress bubble (avoids the inter-
    tool-call leak openclaw documented in their #32535).
  - Per-response fallback: any send_draft failure (transient network,
    server reject, capability gap) flips _use_draft_streaming to False
    for the rest of the run, gracefully returning to the edit path.

Gateway config (gateway/config.py):
  - StreamingConfig.transport default flips edit -> auto. The auto path
    is identical to edit on every chat type that doesn't currently
    support drafts (groups, supergroups, forum topics, every non-
    Telegram platform), so the default is backwards-compatible for
    non-DM users.

Lifecycle model (Telegram Bot API 9.5):
  1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
  2. Repeated sendMessageDraft calls with the SAME draft_id animate
     the preview as text grows.
  3. Drafts have no message_id and cannot be edited or deleted.
  4. When the response finishes the gateway's normal sendMessage path
     delivers the final answer; the draft preview clears naturally on
     the client and the user sees a real message in their history.

Inspired by PR #3412 by @NivOO5. Re-authored against current main
(stream_consumer.py is now ~4x larger than at #3412's branch base, with
new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the
original PR didn't account for) but the design call (DM-only, edit-
fallback, transport=auto|draft|edit|off) is faithful to the original
proposal, with two improvements baked in:

  1. Per-response draft_id (monotonic counter, not a time hash) — no
     collision risk across consecutive responses on the same chat.
  2. Tool-boundary draft_id bump — prevents the inter-tool-call leak
     openclaw hit during their rollout (their #32535).

Closes #21439 (duplicate feature request).
2026-05-10 20:02:50 -07:00
Teknium
80bb5f2947 fix(achievements): use canonical X-Hermes-Session-Token header
Follow-up to TreyDong's fix: switch the auth header to
`X-Hermes-Session-Token` (the canonical pattern used by the rest of
the dashboard SPA — see `web/src/lib/api.ts` `fetchJSON()`). The
server still accepts both schemes, so the original `Authorization:
Bearer` form would also work; we standardize on X-header to match
every other dashboard fetch and only set the header when a token is
actually present.

Also add scripts/release.py AUTHOR_MAP entry for treydong.zh@gmail.com.
2026-05-10 19:41:45 -07:00
treydong
da2ed478b5 fix(achievements): inject Authorization header in plugin API calls 2026-05-10 19:41:45 -07:00
Teknium
771b8c4a36
test(conftest): plug every gateway-kill leak path (#23486)
The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg
and a narrow subset of subprocess invocations. Tests still SIGTERMed the
live gateway today (May 10) because the guard had structural holes.

Plug them all:
- subprocess: also wrap getoutput, getstatusoutput
- os.system, os.popen - completely unwrapped before
- pty.spawn - completely unwrapped before
- asyncio.create_subprocess_exec / create_subprocess_shell - bypassed
  the subprocess module entirely; now wrapped
- Subprocess command inspection now looks at the WHOLE command string,
  not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c
  'systemctl', setsid systemctl, /usr/bin/systemctl, etc.
- New process-killer block: pkill / killall / taskkill / fuser
  targeting hermes/python patterns is now refused
- os.kill PID 0 (own group) allowed; PID -1 (every process we can
  signal) refused
- subprocess.Popen wrapper preserves __class_getitem__ so third-party
  packages that use Popen[bytes] as a type annotation still import

Coverage is locked in by tests/test_live_system_guard_self_test.py -
exercises every primitive against a guaranteed-foreign PID and asserts
the guard fires. Adding a new kill primitive without updating the guard
breaks CI.

scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py
when present (developer-machine convenience), so even worktrees that
predate this commit get the protection on subsequent test runs through
the canonical wrapper.
2026-05-10 18:55:28 -07:00
Teknium
e5bce320db
fix(auxiliary): evict cached client on timeout/connection error (#23482)
A Codex auxiliary timeout closes the underlying OpenAI client (so the
streaming hang doesn't sit until the user kills the session), but the
cached wrapper kept pointing at the now-dead transport. Subsequent
auxiliary calls (compression retry, memory flush, background review,
title generation routed via provider: main) reused that closed client
and failed fast with 'Connection error' until the gateway restarted —
even though the main agent route was healthy the whole time.

Sync `_get_cached_client` had no liveness check (async did, via loop
identity), and the connection-error fallback in `call_llm` only fired
on the auto provider path, so an explicit provider — including the
common `auxiliary.compression.provider: main` shape — never evicted.

Three fixes:

* New `_evict_cached_client_instance(target)` helper that drops the
  cache entry whose stored client is target (or wraps it via
  `_real_client`, for `CodexAuxiliaryClient`).
* `_CodexCompletionsAdapter._close_client_on_timeout` evicts the
  wrapper after closing the inner OpenAI client.
* `call_llm` and `async_call_llm` evict on `_is_connection_error`
  before re-raising, regardless of whether the provider is auto.

Net effect: one timeout costs one summary attempt + the existing 30s
compressor cooldown; the next compaction rebuilds the client and
works. Non-connection errors (4xx/5xx) do not evict, so cache hits
stay stable.

Closes #23432
2026-05-10 18:55:05 -07:00
Teknium1
ae83a54be4 docs(kanban): worker lane contract page + review-required convention
Closes the architectural-pin part of #19931. Most of what that issue
asked for is already implemented (logs under kanban root, env-pinned
workspace, dispatcher routing of unknown assignees, lifecycle
ownership, structured handoff conventions). What was missing:

1. A written contract integrators can point at when adding a new
   worker lane shape, and
2. The "code-changing workers should not auto-promote success to
   done" convention.

This commit ships both as docs+convention layered on existing primitives.
No kernel changes — the kanban_complete / kanban_block / kanban_comment
surfaces already support the review-required pattern; we just hadn't
written it down or made it visible to workers.

Changes:

- `agent/prompt_builder.py::KANBAN_GUIDANCE`: append the review-required
  exception to step 5 of the lifecycle. Workers get the cue
  auto-injected into their system prompt — drop structured metadata
  into a kanban_comment first, then end with
  kanban_block(reason="review-required: <summary>") instead of
  kanban_complete when the work needs review. Total prompt size went
  from ~3000 to ~3275 chars; well under the 4096 budget enforced by
  test_kanban_guidance_size.

- `skills/devops/kanban-worker/SKILL.md`: add a worked example to the
  existing "Good summary + metadata shapes" section between the
  Coding-task and Research-task examples. Same shape as the others
  (kanban_comment with structured handoff JSON, then kanban_block with
  the human-readable reason). Plus a one-line guide on when to use
  kanban_complete vs the review-required pattern.

- `website/docs/user-guide/features/kanban-worker-lanes.md` (new): the
  integrator-facing contract. Covers the hierarchy, the three things
  every lane must provide (assignee, spawn mechanism, lifecycle
  terminator), the env vars the dispatcher injects, the
  review-required convention, the failure modes the kernel handles
  for free, and an explicit "external CLI worker lane" deferred-
  pending-concrete-asker section that links to #19931 and #19924.

- `website/sidebars.ts`: link the new page under user-guide/features.

The "specialist worker lanes for external CLI tools (Codex / Claude
Code / OpenCode)" runner is NOT shipped here. The dispatcher's
spawn_fn parameter already supports plugin-shaped extension; the
per-CLI integration work (auth, sandbox policy, exit-code mapping)
needs a concrete asker. The new docs page tells would-be integrators
the contract any such lane must satisfy.

Refs #19931
2026-05-10 18:15:52 -07:00
teknium1
666b751536 chore: AUTHOR_MAP entry for rahimsais 2026-05-10 18:09:31 -07:00
rahimsais
737314fe91 fix(telegram): normalize dm threads and retry control sends
Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id
issue (#3206):

1. _build_message_event filters DM thread_ids: only preserve thread_id
   for real topic messages (is_topic_message=True). Telegram puts
   message_thread_id on every DM that is a reply, but reply-chain ids
   route to nonexistent threads on send.

2. _send_message_with_thread_fallback helper: control sends
   (send_update_prompt, send_exec_approval / send_slash_confirm,
   send_model_picker) retry once without message_thread_id when
   Telegram returns BadRequest 'Message thread not found'. Mirrors
   the pattern PR #3390 added for the streaming send path.

Salvage notes:
- Conflict 1 (line ~4099): merged the contributor's DM is_topic_message
  filter with the existing forum General-topic default from #22423,
  preserving both behaviors.
- Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416)
  alongside the new helper. Tightened the helper's exception catch
  from bare 'Exception' to use the existing _is_bad_request_error +
  _is_thread_not_found_error helpers (line 484-496) for consistency
  with the streaming send path.
- Widened the fix to send_update_prompt (was bare self._bot.send_message,
  same bug class).

Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@
local commit author).
2026-05-10 18:09:31 -07:00
Teknium
404640a2b7
feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls

Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.

Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
  compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
  more items not fewer. Falls through to legacy freeform judge when
  decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
  a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
  A bounded read_file tool (max 5 calls per turn, restricted to that
  one file) lets the judge inspect history when the snippet is
  ambiguous. Stickiness in code: terminal items are frozen, only the
  user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
  reverts to old prompt when empty.
- Status line shows M/N done counts.

CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.

Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.

* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning

Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:

1. Off-by-one bug — the judge sees the checklist rendered with 1-based
   indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
   state.checklist as 0-based. Result: every judge update landed on
   the wrong item, evidence got attached to neighbouring rows, and
   the genuine 'first pending' item (usually #1) never got marked.
   Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
   user prompt to call out the 1-based scheme explicitly. New tests
   cover the parser conversion + an end-to-end fake-judge round-trip.

2. Conversation dump never happened — _extract_agent_messages tried
   common AIAgent attribute names (.messages, .conversation_history,
   etc.) but AIAgent doesn't expose the message list as an instance
   attribute; it lives inside run_conversation()'s scope. Result: the
   judge's read_file tool always saw history_path=unavailable. Fix:
   added an explicit messages= kwarg to evaluate_after_turn that all
   three call sites (CLI, gateway, TUI gateway) now pass directly.
   Agent-attribute extraction kept as back-compat fallback.

3. Prompt was too harsh on simple goals. The original 'be HARSH,
   default to leaving items pending' wording made the judge refuse
   to mark 'file exists' completed even after the agent ran ls,
   test -f, os.path.isfile, and find — burning the entire 8-turn
   budget on a fizzbuzz task. Softened to 'strict but not absurd'
   with explicit guidance on what counts as evidence and a directive
   not to require re-proving items already established earlier.

Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
2026-05-10 16:56:51 -07:00
teknium1
c0bbdec850 chore: AUTHOR_MAP entry for Freeman-Consulting 2026-05-10 16:21:07 -07:00
teknium1
121bbe0385 test(stream-consumer): add UTF-16 overflow regression tests for #11170
New TestUtf16OverflowDetection class covers two scenarios:
- test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds
  2200 emoji codepoints (4400 UTF-16 units) — under Telegram's
  codepoint-equivalent limit but over its UTF-16 limit. Asserts
  truncate_message was called with len_fn=utf16_len, confirming the
  consumer detected the overflow.
- test_codepoint_only_adapter_falls_back_to_len: documents that
  adapters which don't subclass BasePlatformAdapter (or test MagicMocks)
  fall back to plain len for backwards compat.

The contributor's PR shipped no tests for the UTF-16 path.
2026-05-10 16:21:07 -07:00
Aubrey Freeman III
c0da5d09a6 fix: use UTF-16 length for Telegram stream consumer message splitting
The stream consumer measured message length using Python's len() (Unicode
code points), but Telegram's actual limit is in UTF-16 code units. This
caused messages with supplementary characters (emoji, CJK, etc.) to exceed
Telegram's 4096-character limit, resulting in truncated messages with
formatting artifacts.

Changes:
- Add message_len_fn property to BasePlatformAdapter (defaults to len)
- Override in TelegramAdapter to return utf16_len
- Stream consumer uses adapter.message_len_fn for:
  - safe_limit calculation
  - overflow detection
  - truncate_message calls
  - split point calculation (via _custom_unit_to_cp)
  - fallback final send chunking

Fixes truncated messages with black square artifacts on Telegram when
the model generates responses containing multi-byte Unicode characters.
2026-05-10 16:21:07 -07:00
Teknium
c5f1f863ac
fix(cli): drive _prompt_text_input directly when off main thread (#23454)
Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the
process_loop daemon thread.  prompt_toolkit.run_in_terminal returns a
coroutine that only the main-thread event loop can drive, so calling it
from a daemon thread orphans the coroutine — the input prompt never
renders and user keystrokes leak into the composer instead of the
confirmation prompt (issue #23185).

Mirror the thread-aware guard already in _run_curses_picker: when off the
main thread, fall back to a direct input() call.  Also wrap
run_in_terminal in try/except so WSL / Warp / other emulators that
silently drop the scheduled coroutine fall back to input() too.

Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main
thread (run_in_terminal path), daemon thread (direct input fallback),
no-app, run_in_terminal-raises, and EOF handling.
2026-05-10 16:16:10 -07:00
konsisumer
62cfe79e93 fix(tools): clarify kanban_complete phantom-card retry guidance
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.

Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.

Fixes #22923
2026-05-10 16:14:43 -07:00
Keyu Yuan
2f00559d9e fix(telegram): pass source.thread_id explicitly on auto-reset notice (carve-out of #7404)
The auto-reset notice ("◐ Session automatically reset…") was being sent
with metadata=getattr(event, 'metadata', None), which can drop or
mis-route in Telegram forum topics: the event's metadata isn't
guaranteed to carry the originating thread_id, so the notice could leak
into General or another topic.

Use the existing self._thread_metadata_for_source(source) helper, which
already handles thread_id construction plus the Telegram DM topic
reply-fallback shape used everywhere else in the gateway.

Carve-out of #7404. The PR's other hunk (line 7578, queued first
response) is already redundant on main — gateway/run.py:15782 has used
_status_thread_metadata since the _thread_metadata_for_source plumbing
landed.

Closes #7355 (path B; paths A and C closed via prior salvage merges).
2026-05-10 16:12:40 -07:00
Wesley Simplicio
a2920b1762 fix(tui): right-click copies selection, only pastes when no selection
Sub-issue 5 of #22034.

Right-click on the composer always pasted from the clipboard, even when
the user had highlighted text — diverging from terminal-native behavior
(xterm/iTerm/gnome-terminal) where right-click copies an active selection
and only pastes when nothing is selected.

Extract a small pure helper, decideRightClickAction(value, range), and
route the existing onMouseDown right-click branch through it. Selection
present and non-empty -> writeClipboardText(slice). Otherwise fall back
to the existing emitPaste path.
2026-05-10 16:06:33 -07:00
Teknium1
59d3f24f10 chore: AUTHOR_MAP entry for konsisumer noreply (#23071) 2026-05-10 15:23:04 -07:00
konsisumer
88588b6159 fix(kanban): extend stale claim instead of killing live worker
Workers running slow models (e.g. kimi-k2.6) can spend longer than
DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making
no tool calls and therefore not heartbeating. release_stale_claims
previously reclaimed these healthy workers, producing the
spawn-then-immediately-reclaim loop reported in #23025.

When a stale-by-TTL claim's host-local worker PID is still alive,
extend the claim (emit a claim_extended event) rather than killing
it. enforce_max_runtime / detect_crashed_workers remain the upper
bounds for genuinely wedged or dead workers. Reclaim events now also
record claim_expires, last_heartbeat_at, worker_pid, and host_local
so operators can see why a worker was killed.
2026-05-10 15:23:04 -07:00
Teknium
3974a137c6
docs(user-stories): add 116 stories from the Hermes Discord archive (#23436)
* docs(user-stories): add 116 stories from Discord archive

Mined teknium1/nous-discord-archive for first-person user stories that match
the existing collage voice ('I run X every day', 'my family uses Hermes for
Y', 'so I built Z'). Skipped pure project pitches, Q&A, install help, and
generic announcements.

- Added 'discord' as a source in UserStoriesCollage (label + brand color)
- Added 116 entries to userStories.json (237 total, up from 121)
- Each entry links back to the discord-archive thread or channel archive file

* docs(user-stories): interleave discord stories across the full collage

Shuffle userStories.json with a fixed seed so the 116 Discord-sourced
entries are mixed evenly with the existing 121 entries instead of
appearing as a contiguous block at the end. Even distribution: 10-16
discord entries per decile across the array (ideal would be ~11).
2026-05-10 15:21:40 -07:00
Teknium
d6e1fadbf5
fix(xai): omit reasoning.effort for grok models that reject it (#23435)
xAI's Responses API returns HTTP 400 ("Model X does not support
parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-*,
grok-4-1-fast-*, grok-3, grok-4.20-0309-*, and grok-code-fast-1 — even
though those models reason natively. Hermes was unconditionally sending
`reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking
direct `--provider xai` for the entire grok-4 line.

Add a substring allowlist predicate (verified live against api.x.ai
2026-05-10) covering the only Grok families that accept the effort dial:
grok-3-mini*, grok-4.20-multi-agent*, grok-4.3*. The Responses transport
omits the `reasoning` key entirely for everything else while still
including `reasoning.encrypted_content` so we capture native reasoning
tokens.

Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709`
went from HTTP 400 to a successful reply.
2026-05-10 15:21:30 -07:00
teknium1
cc2a0c674a chore: AUTHOR_MAP entry for hrygo (黄飞虹) 2026-05-10 15:20:40 -07:00
teknium1
f9e0d60a99 test(thread-routing): handle both lark-SDK-present and absent paths
The contributor's regression test for Feishu fallback thread routing
asserted on attributes specific to the real lark SDK builder
(call_args.body, body.receive_id). In test environments without the
lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py
_build_create_message_request) returns a SimpleNamespace using
.request_body instead of .body, causing AttributeError.

Now reads via getattr fallback and also verifies receive_id_type is
'thread_id' (not 'chat_id') as a stronger contract check.
2026-05-10 15:20:40 -07:00
黄飞虹
e164a9c1ed fix(stream-consumer): preserve thread routing on overflow first-send path
When the first streamed message exceeds the platform length limit and
gets split into chunks, _send_new_chunk was called with self._message_id
(which is None on first send), dropping thread routing entirely.

Fallback to self._initial_reply_to_id so overflow chunks land in the
correct topic/thread.

Also fix a fragile test assertion that could be silently skipped.
2026-05-10 15:20:40 -07:00
hrygo
ff14666cdc fix(gateway): stream consumer first message drops thread context
Cherry-picked from PR #13077 commits:
- 5500c7d8 fix(gateway): stream consumer first message drops thread context
- e84403b9 test(gateway): add regression tests for stream consumer thread routing

Fixes: Streaming first message drops thread/topic context in Feishu group
topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id
ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and
_send_new_chunk. Also fixes Feishu _send_raw_message fallback path
(reply -> create) to use receive_id_type='thread_id' so the new message
lands in the correct topic instead of the main channel.

Authored by hrygo via PR #13077 (re-attributed from the bot-authored
salvage commit on the original branch).
2026-05-10 15:20:40 -07:00
Teknium
6636fecd47
fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420)
The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was
copying the cumulative _already_sent flag into _final_response_sent on the
done frame. _already_sent goes True on any successful prior edit (tool
progress) or on fallback-mode promotion when an edit fails — neither
proves the *current* chunked send delivered the final answer.

When the chunked send actually fails (network error, flood control), the
consumer would wrongly claim 'final delivered' and the gateway's
independent fallback delivery in run.py would be suppressed. User saw
only tool-progress bubbles and never got the answer.

Now we track per-chunk success locally: _send_new_chunk returns the new
message_id on success or returns the passed-in reply_to unchanged on
failure. If at least one returned id differs, chunks_delivered = True;
otherwise stays False, gateway fallback runs.

Adds two regression tests:
- test_split_overflow_failed_send_does_not_mark_final_sent — primes
  _already_sent=True, then makes every send fail; asserts
  _final_response_sent stays False.
- test_split_overflow_partial_send_marks_final_sent — happy path,
  asserts _final_response_sent goes True.

Note: the companion bug at the CancelledError handler (issue cited
lines 417-418) was already fixed by 3b5572ded on 2026-04-16.

Closes #10748
2026-05-10 15:13:54 -07:00
Teknium
b38b100105 chore: AUTHOR_MAP entry for jelrod27 (#21398) 2026-05-10 14:27:59 -07:00
Teknium
787e3c368c test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests
Follow-up to the previous commit's notifier behavior change. Two test fixes:

1. `tests/gateway/test_kanban_notifier.py` gains
   `test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new
   contract directly: a task that crashes, gets reclaimed, and crashes
   again notifies the user BOTH times. Before #21398 the second crash
   silently dropped because the subscription was already deleted.

2. `tests/hermes_cli/test_kanban_notify.py::
   test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]`
   is flipped. Those tests were added in the salvage of #22941 and
   asserted the OLD behavior (subscription deleted after gave_up /
   crashed / timed_out). They're now obsolete — the new contract is
   "subscription survives a non-final terminal event so retries reach
   the user." Updated docstring + asserts; the cursor-advance check is
   added to confirm the dedup mechanism still works.

The `test_notifier_unsubs_after_completed_event` test stays untouched
because `completed` IS still a terminal event that triggers unsub
(the task hits `done` status, which is handled by the `task_terminal`
branch in the notifier loop).
2026-05-10 14:27:59 -07:00
jelrod27
a96dd54872 fix: deduplicate kanban notifications for blocked/gave_up states
The kanban notifier was re-firing the same blocked/gave_up/crashed/timed_out
notifications on every 5-second tick. Root cause: after delivering a terminal
event, the notifier unsubscribed the subscription, deleting its cursor. If
the unsub failed (WAL contention, transient error), the subscription survived
with a stale cursor, and the next tick would re-deliver the same event.

Even when the unsub succeeded, the subscription was gone. If the task later
transitioned to a different state (e.g., blocked -> unblocked -> blocked
again), a new subscription would start at cursor=0, re-delivering all past
events.

Fix: stop unsubscribing on terminal event kinds. Only remove the subscription
when the task reaches a truly final status (done/archived). For blocked,
gave_up, crashed, and timed_out, the subscription stays alive and the cursor
mechanism deduplicates naturally -- events with id <= last_event_id are never
re-fetched. This makes the dedup idempotent and eliminates the re-fire bug.

The old concern about subscriptions leaking forever on blocked tasks is moot:
blocked tasks will eventually be unblocked (transitioning to ready/running)
or archived, at which point the subscription is cleaned up.
2026-05-10 14:27:59 -07:00
teknium1
04e18160ab chore: AUTHOR_MAP entry for HuangYuChuh 2026-05-10 14:22:59 -07:00
teknium1
ec1fad3449 fix(gateway): align fallback delete with sibling style + add regression tests
Follow-up to HuangYuChuh's #17384 cherry-pick:

- Use defensive getattr+logger.debug for delete_message lookup, mirroring
  the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms
  that don't implement delete_message no longer raise AttributeError; the
  failure path now logs at debug for diagnosability instead of silently
  swallowing.
- Add three regression tests in tests/gateway/test_stream_consumer.py:
  - delete_message awaited on happy-path exit with stale id
  - delete_message NOT awaited when no fallback chunks reached the user
  - no crash on adapters that lack delete_message (spec-restricted mock)
2026-05-10 14:22:59 -07:00
HuangYuChuh
4eb8479ebd fix(gateway): delete partial message after fallback send on flood control
When Telegram flood control triggers 3+ consecutive edit failures, the
stream consumer enters fallback mode and sends the complete response as
a new message. This leaves the user seeing two messages: a frozen
partial (with cursor) and the full duplicate.

After the fallback chunks are sent successfully, delete the original
partial message so the user only sees one complete response. The delete
is best-effort — if it fails (e.g. flood still active, missing
permissions), the full answer is still delivered.

Fixes #16668
2026-05-10 14:22:59 -07:00
Teknium
cdb6e5e52a
test(conftest): block tests from killing the live hermes-gateway (#23397)
The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest
runs sending SIGTERM to the developer's live gateway 5+ times in 3
days. Root cause: when a single test forgets to mock os.kill or
find_gateway_pids, the real call leaks past the hermetic HERMES_HOME
isolation — find_gateway_pids' psutil scan walks the whole machine and
returns the live gateway PID, then the unmocked os.kill delivers the
signal.

Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes,
and stop_profile_gateway code paths, install a single autouse guard in
tests/conftest.py that blocks the two primitives that actually cause
the damage:

  - os.kill rejects any PID outside the test process subtree with a
    hard RuntimeError so the offending test gets a stack trace instead
    of silently murdering the real gateway.
  - subprocess.run / Popen / call / check_call / check_output reject
    any 'systemctl <verb> hermes-gateway' invocation that would mutate
    the live unit. Read-only systemctl calls (status, show, list-units)
    still pass through.

We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids —
tests of those functions themselves need the real implementation.
Discovery without delivery is harmless; the os.kill + systemctl guards
catch the actual damage path.

Tests that legitimately need real signal delivery (e.g. PTY tests
signalling their own child) opt out via
@pytest.mark.live_system_guard_bypass.

Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce
the same 17 failures with and without this guard (all pre-existing on
main, unrelated to gateway-kill leaks). The live gateway survives the
test run that previously SIGTERMed it.
2026-05-10 13:20:27 -07:00
Mike Nguyen
6062c24fd1 ci: skip lint comment on fork PRs 2026-05-10 13:19:41 -07:00
Teknium
9c68d12079 test(kanban): cover send-exception rewind + drop noisy success log to debug
Two follow-up improvements to the previous commit's notifier dedup work.

1. Add a regression test for the send-exception rewind path. The
   contributor's PR included a test for the adapter-disconnect path
   (test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where
   adapter is None at delivery time), but not for the "adapter is
   connected, send() raises" path that fires inside the inner try/except
   at gateway/run.py:4314. The new test
   (test_kanban_notifier_rewinds_claim_on_send_exception) uses a
   FailingAdapter that always raises and confirms (a) send was actually
   attempted, (b) the claim was rewound, (c) the next call to
   unseen_events_for_sub still returns the event for retry.

2. Drop the per-delivery success log from INFO to DEBUG. A busy board
   on a multi-platform gateway can produce hundreds of these per day;
   that's gateway.log noise that obscures real warnings. Failure paths
   stay at WARNING (where you'd want to look when something's wrong)
   so we don't lose visibility into transient send issues.
2026-05-10 13:19:41 -07:00