hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-02 12:13:05 +00:00

Author	SHA1	Message	Date
0xbyt4	f6736ced81	fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages 1. Quick command exec ran in the gateway process's full environment without env sanitization or output redaction. A quick command like "env" or "printenv" would leak all API keys, OAuth tokens, and bot credentials to the messaging user. Fix: apply _sanitize_subprocess_env() before exec and redact_sensitive_text() on output before returning. 2. GatewayRunner._pending_messages was written on every interrupt (lines 1331-1334) but never read or consumed anywhere. The actual interrupt delivery uses adapter._pending_messages (a separate dict). Removed the write-only accumulation to prevent unbounded growth.	2026-05-10 22:12:23 -07:00
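The sanitize-then-redact idea in the commit above can be sketched roughly as follows. This is only an illustration: `sanitize_env`, `redact_values`, and the name pattern are hypothetical stand-ins, not the project's `_sanitize_subprocess_env()` or `redact_sensitive_text()`, whose actual rules are not shown in this log.

```python
import re

# Hypothetical pattern for credential-like variable names (an assumption).
SENSITIVE_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)


def sanitize_env(env):
    """Return a copy of env without variables whose names look like credentials."""
    return {k: v for k, v in env.items() if not SENSITIVE_NAME.search(k)}


def redact_values(text, secrets):
    """Replace every occurrence of a known secret value in text with a marker."""
    for value in secrets:
        if value:
            text = text.replace(value, "[REDACTED]")
    return text


env = {"PATH": "/usr/bin", "OPENAI_API_KEY": "sk-test-123", "BOT_TOKEN": "abc"}
clean = sanitize_env(env)
# Values of the variables that were stripped are treated as secrets to redact.
secrets = [v for k, v in env.items() if k not in clean]
safe_output = redact_values("key is sk-test-123", secrets)
```

The sanitized mapping would be what gets passed as the child process environment, and the redaction step is a second line of defense for anything that still reaches the output.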
Muhammet Eren Karakuş	4c57a5b318	feat(skills): add api-testing optional skill (#1800 ) Adds optional-skills/software-development/api-testing/SKILL.md — a single-file runbook for systematic REST/GraphQL API debugging via Hermes tools (terminal, execute_code, web_extract, delegate_task). - 60-char description; gated to platforms: [linux, macos] - Layered debug flow (connectivity → TLS → auth → format → parse → semantics) - HTTP status playbook (401/403/404/409/422/429/5xx) - Pagination, idempotency, contract validation, correlation IDs - pytest smoke template, token-redaction patterns, leak checklist - Hermes tool patterns replace generic curl/python examples Lands in optional-skills/ (not always-active skills/) so it's installed via hermes skills install official/software-development/api-testing. scripts/release.py: AUTHOR_MAP entry for erenkar950@gmail.com → eren-karakus0. Closes #1800. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-10 22:11:31 -07:00
teknium1	6c1af45b78	chore: AUTHOR_MAP entry for kjames2001 (James Huang)	2026-05-10 22:02:56 -07:00
teknium1	82352e54c4	test(telegram): regression coverage for edit overflow split-and-deliver Two new tests: - tests/gateway/test_telegram_format.py test_message_too_long_splits_into_continuations_not_silent_truncation: asserts edit_message returns success=True with continuation_message_ids populated and message_id pointing at the last continuation when content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original fail-on-overflow assertion with the split-and-deliver contract. - tests/gateway/test_stream_consumer.py TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver: asserts the consumer side updates _message_id to the latest continuation, clears _last_sent_text, and fires on_new_message when the adapter reports a split-and-deliver result.	2026-05-10 22:02:56 -07:00
kjames2001	bf1f40996f	fix(telegram): split-and-deliver oversized edits instead of silent truncation When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit, the adapter caught the BadRequest, best-effort truncated the content with '…', and returned SendResult(success=True). The stream consumer believed the full edit was delivered and never recovered, silently dropping everything past the truncation boundary on long replies. Returning failure isn't safe either — the consumer's existing fallback path can race against the next streaming tick, producing duplicate sends or gaps. Instead, the adapter now SPLITS the oversized payload across the existing message + new continuation messages, so the user always gets the full reply in correct order. How it works: 1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH, call the new _edit_overflow_split helper directly — saves a doomed round-trip + a Telegram error. 2. Reactive: if Telegram still returns 'message_too_long' after the pre-flight (e.g. parse_mode formatting inflated the payload past the limit via MarkdownV2 escapes), the same helper handles it. 3. _edit_overflow_split: - Splits via truncate_message(len_fn=utf16_len) — same chunking the non-streaming send() path uses; chunks get '(1/N)' suffixes. - Edits the original message_id with chunk 1 (with parse_mode + plain-fallback when finalize=True, mirroring the main edit path). - Sends each remaining chunk via self._bot.send_message threaded as a reply to the previous chunk so the user sees them as a contiguous block. MarkdownV2-with-plain-fallback per chunk on finalize. - Returns SendResult(success=True, message_id=<last_chunk_id>, continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the stream consumer can keep editing the most recent visible message and the gateway has full visibility into every message id. SendResult contract extension: Added optional continuation_message_ids: tuple = () field. 
When empty (the common case), behavior is unchanged. When populated, the caller knows the adapter delivered across multiple platform messages. Stream consumer integration: GatewayStreamConsumer._send_or_edit advances _message_id to the last-continuation id when it sees continuation_message_ids on a successful edit result, resets _last_sent_text (the new visible message holds only the final chunk's text), and fires on_new_message so tool-progress bubbles linearize below the new continuation rather than the original. Mirrors the openclaw #32535 inter-tool-leak guard. Composes with what just landed: - PR #23455 (UTF-16 length-aware splitting in stream consumer) prevents most overflows upstream by measuring text in UTF-16 codeunits before deciding to split. This PR is the safety net at the adapter boundary. - PR #23512 (native draft streaming, default for DM Telegram) routes DM streaming through send_draft, which has its own contract unaffected by this change. So this fix narrows in scope to the edit-based path: groups, supergroups, forum topics, every non-Telegram platform, and the per-response fallback after a draft failure. Salvage notes: - Cherry-picked from PR #19537 by @kjames2001. Original PR returned failure on overflow; this evolves to split-and-deliver so users never lose content and the consumer state stays consistent. - Dropped an unrelated model-picker hunk (line 2114-2117) that silently killed the 'X more available — type /model <name> directly' hint by hardcoding total=len(models). Not in scope. - Restored the timeout-aware retryable=not is_timeout signal in send()'s fallthrough catch block. Closes #19537.	2026-05-10 22:02:56 -07:00
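A minimal sketch of UTF-16-aware chunking, the measurement the split-and-deliver commit above relies on. `split_utf16` is a simplified, hypothetical stand-in for the repository's `truncate_message(len_fn=utf16_len)`: it adds no '(1/N)' suffixes, cuts on character boundaries only, and assumes `limit` is at least 2 so a surrogate pair always fits.

```python
def utf16_len(text):
    """Number of UTF-16 code units in text (astral characters count as 2)."""
    return len(text.encode("utf-16-le")) // 2


def split_utf16(text, limit):
    """Split text into chunks of at most `limit` UTF-16 code units each."""
    chunks, current, used = [], [], 0
    for ch in text:
        width = utf16_len(ch)
        # Start a new chunk when this character would overflow the current one.
        if current and used + width > limit:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += width
    if current:
        chunks.append("".join(current))
    return chunks


parts = split_utf16("ab\U0001F600cd", 3)
```

Measuring with `len()` instead would undercount any text containing emoji or other astral characters, which is how a payload can look short enough and still be rejected by the platform.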
Teknium	3b122cc1ac	feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578 ) Surface ready tasks that nobody claims within a threshold (default 30 min) regardless of why. One identity-agnostic signal that catches: - Operator typo'd the assignee - Profile was deleted, leaving its tasks stranded - External worker pool (Codex CLI lane, custom daemon) is down - Dispatcher misconfigured (wrong board / wrong HERMES_HOME) Today the dispatcher correctly skips these (no respawn loop, good) but nothing surfaces the fact that operator-actionable work is accumulating. The new `stranded_in_ready` rule does that without requiring a manual lane registry — it reads the most recent ready- transition event (`created` / `promoted` / `reclaimed` / `unblocked`) and fires when (now - last_ready_ts) > threshold. Severity escalates with age: warning at threshold, error at 2x, critical at 6x. The cli_hint and reassign actions point operators at the right next step. Out of scope deliberately: - Lane registry (#20157 closed) — this signal supersedes it. - Pushing the diagnostic into messaging gateways — diagnostics are pull-only via 'hermes kanban diagnostics' for now; gateway push is a separate UX decision. Tests: 10 new + 461 existing kanban tests pass. E2E verified end- to-end via 'hermes kanban diagnostics --json' against a 2h-old stranded task — surfaces as error severity with correct actions.	2026-05-10 21:58:44 -07:00
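The severity escalation described above (warning at the threshold, error at 2x, critical at 6x) can be sketched as a small function. The function name and the choice of inclusive boundaries at 2x and 6x are assumptions; the log only states that the rule fires when the age exceeds the threshold.

```python
def stranded_severity(age_seconds, threshold_seconds=30 * 60):
    """Map the age of an unclaimed ready task to a severity, or None if not stranded."""
    if age_seconds <= threshold_seconds:
        return None  # rule does not fire yet
    if age_seconds >= 6 * threshold_seconds:
        return "critical"
    if age_seconds >= 2 * threshold_seconds:
        return "error"
    return "warning"
```

With the default 30-minute threshold, a task that has been ready for 2 hours is at 4x the threshold and maps to error, which matches the end-to-end check mentioned in the commit.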
Teknium1	bf5b8a7d61	chore(release): map @eloklam tailnet email	2026-05-10 21:44:37 -07:00
Teknium1	b8bf2f817d	fix(kanban): merge dashboard batch QOL with i18n + collapse + assignee-casing PR #23240 was branched before main landed: - `c39168453` i18n localization (16 locales) - `a91e5a875` native <details> collapse + skip empty metadata - `0e0ddaac8` tone down completed-run metadata panel - `b308dd7d7` preserve assignee casing in dashboard The cherry-pick took PR's dist/index.js wholesale via -X theirs, which dropped those features. This commit re-applies them by hand-merging the 7 conflict regions: 1. bulk-action catch handler: keep PR's failedIds + loadBoard, keep main's t-in-deps for tx() i18n calls 2. Refresh button: keep main's tx(t, 'refresh', ...), add PR's Clear filters button with tx(t, 'clearFilters', ...) 3. Archive button: keep main's tx(t, 'archive', ...), add PR's priority setter with tx(t, 'priority'/'setPriority', ...) 4. Column header: keep main's colHelp i18n var, add PR's column-select-all checkbox 5/6. lane.tasks/column.tasks .map: keep main's t->tk rename (avoids shadowing the i18n t), apply tk to PR's failed/ draggingSource props 7. Card checkbox label-wrap: keep PR's <label> structure (larger hit target), keep main's tx(i18n, 'selectForBulk', ...) Adds three new i18n keys (clearFilters, priority, setPriority) that fall back to English via tx() until translators add them to the kanban catalog, matching the existing pattern.	2026-05-10 21:44:37 -07:00
eloklam	b60462a205	test(kanban): remove stale t.summary assertion from search test Task.summary was never a real field; latest_summary already covers it. Matches the haystack cleanup in commit f3015e6ab.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	3df7e30244	kanban dashboard: fix shift-click range selection, column select-all toggle, and bulk action optimistic UI - Bug 1: shift-click now always adds the target card and sets it as the last-selected anchor, so range selection works even when 0 or 1 cards are selected. - Bug 2: column select-all checkbox now toggles: if every card in the column is already selected, clicking unselects them all. - Bug 3: applyBulk now mirrors moveSelected with optimistic UI updates for status moves and calls loadBoard() on catch for consistency.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	69053832e3	kanban dashboard: remove redundant t.summary from search haystack The Task dataclass has no `summary` field; only Run carries summary. The dashboard already searches `latest_summary` (derived from the latest run), so `t.summary` in the client-side haystack was always undefined and therefore redundant. Verdict from task t_4bcac44f: - Before batch QOL (6c7ec94d9): search only covered id, title, assignee, tenant. - Batch QOL (7fd187102) correctly added body, result, latest_summary. - `t.summary` was included but is a misleading no-op because tasks never expose a `summary` key — `latest_summary` already covers it. Removes the redundant field from the haystack only.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	a88f201cd4	kanban dashboard: multi-card drag visual feedback - When dragging a selected card while multiple cards are selected, the browser ghost image now shows a 'N cards' badge instead of a single card. - All selected cards in the original column are dimmed (opacity 0.45 + grayscale) during the drag so the user sees the whole set is in-flight. - Uses React state for the dragged task id; event delegation on the board columns container to avoid deep prop threading.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	98c499b235	kanban dashboard: fix batch QOL oracle blockers - Preserve failedIds partial-failure highlighting after moveSelected/ applyBulk by clearing only selectedIds/lastSelectedId instead of calling clearSelected() (which also wiped failedIds). - Fix touch/native multi-drag drop stale closure by adding props.selectedIds and props.onMoveSelected to the hermes-kanban:drop useEffect dependency array. Fixes t_5bfafb73.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	0ea234e093	feat(kanban): dashboard batch QOL upgrade - Shift-click range selection, column select-all, select-all-visible - Multi-card drag/drop via selectedIds + /tasks/bulk - Expanded bulk actions: todo/ready/blocked/unblock/complete/archive, priority setter, reassign with reclaim_first checkbox - Partial failure card highlight (failedIds + hermes-kanban-card--failed) - Search expanded to body, result, latest_summary, summary - Clear filters button + reset all filters on board switch - Accessibility: larger checkbox hit target, tabIndex/role/aria-label, Enter/Space/Esc keyboard handlers - Fix temporal-dead-zone bug: move clearSelected before moveSelected	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	518d37f6af	feat(kanban): add reclaim_first support to bulk reassign endpoint - Extend BulkTaskBody with reclaim_first: bool = False - In bulk_update, use kanban_db.reassign_task(..., reclaim_first=True) when payload.reclaim_first is set and assignee is present - Falls back to existing assign_task behavior when reclaim_first is false This enables the dashboard to bulk-reassign running tasks by reclaiming their claims first, matching the single-task /tasks/{id}/reassign endpoint behavior.	2026-05-10 21:44:37 -07:00
Teknium	a63a2b7c78	fix(goals): force judge to use tool calls instead of JSON-text replies (#23547 ) Live-tested on gemini-3-flash-preview the judge kept returning empty or non-JSON content, tripping the consecutive-parse-failures auto- pause. Free-form JSON output is hopeful; tool-call schemas are enforced server-side by virtually every modern provider. Two new tools the judge calls: - submit_checklist(items) — Phase A, decompose - update_checklist(updates, new_items, reason) — Phase B, evaluate Both phases now call the auxiliary client with tool_choice forcing the right tool. read_file remains for Phase B history inspection, with the loop exiting only when update_checklist is called or the read budget is exhausted (at which point read_file is dropped from the toolbox and update_checklist is forced). Robustness: - _call_judge_with_tool_choice falls back tool_choice forced→required→ auto if the provider rejects a particular shape. - If a fully-broken provider still returns content instead of a tool call, the legacy JSON-text parsers stay around as a last-ditch backstop so we never silently lose a checklist. - _normalize_update_args replaces the JSON parser for the apply layer; same 1-based→0-based conversion + terminal-status filter. Live verification: same fizzbuzz goal that was hitting 'judge model returned unparseable output 3 turns in a row' before now terminates in 2 turns, all 11 items marked completed with item-specific evidence, no auto-pause. Agent log shows 'produced 11 checklist items via tool call' instead of the JSON- parse path. Tests: 7 new cases for the tool-call path (Phase A success, Phase B update only, Phase B read_file→update, JSON-content backstop, empty-text item dropping, non-terminal status filter).	2026-05-10 20:51:40 -07:00
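The forced→required→auto fallback can be sketched as a loop over progressively looser `tool_choice` shapes. This is a hedged illustration, not `_call_judge_with_tool_choice` itself: the `call` callable, the use of `ValueError` for a rejected shape, and `picky_provider` are all assumptions. The shapes follow the OpenAI-style Chat Completions convention.

```python
def call_with_tool_choice_fallback(call, tool_name):
    """Try forced, then 'required', then 'auto'; re-raise the last rejection."""
    shapes = [
        {"type": "function", "function": {"name": tool_name}},  # forced
        "required",
        "auto",
    ]
    last_error = None
    for shape in shapes:
        try:
            return call(tool_choice=shape)
        except ValueError as exc:  # assumed signal for "shape rejected"
            last_error = exc
    raise last_error


attempts = []


def picky_provider(tool_choice):
    """Fake provider that rejects the forced (dict) shape."""
    attempts.append(tool_choice)
    if isinstance(tool_choice, dict):
        raise ValueError("forced tool_choice not supported")
    return {"tool_choice_used": tool_choice}


reply = call_with_tool_choice_fallback(picky_provider, "update_checklist")
```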
Teknium	4a080b1d5a	fix(goals): forward standing /goal state on auto-compression session rotation (#23530) When run_agent's _compress_context fires mid-turn it ends the parent session in SessionDB and creates a new continuation session with a fresh session_id. The /goal state is keyed on session_id in state_meta ("goal:<sid>"), so without forwarding, the goal silently disappears: _get_goal_manager() rebinds for the new session_id, load_goal() returns None, mgr.is_active() is False, and the continuation loop dies with no user-visible signal. Fix: in the same SessionDB transaction block that creates the continuation session, copy state_meta[goal:<old>] → state_meta[goal:<new>] when present. No-op when the user has no active goal. Logged at INFO so a stuck loop is debuggable. Tests cover the round-trip via SessionDB and the no-op path. Affects all three run-conversation surfaces (CLI, gateway, TUI gateway) because _compress_context is the single rotation site.	2026-05-10 20:41:53 -07:00
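The key-forwarding step can be illustrated with a plain dict standing in for the state_meta table. `forward_goal` is a hypothetical helper; the real change runs inside the SessionDB transaction that creates the continuation session, which is not reproduced here.

```python
def forward_goal(state_meta, old_sid, new_sid):
    """Copy goal:<old_sid> to goal:<new_sid>; return False when there is no goal."""
    old_key = f"goal:{old_sid}"
    if old_key not in state_meta:
        return False  # no active goal, nothing to forward
    state_meta[f"goal:{new_sid}"] = state_meta[old_key]
    return True


meta = {"goal:s1": {"text": "ship fizzbuzz", "active": True}}
forwarded = forward_goal(meta, "s1", "s2")
skipped = forward_goal(meta, "missing", "s3")
```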
teknium1	68d081f570	fix(kanban): keep '--created-by' default as 'user' Out-of-scope behavior change in #23521: the kanban notifier-routing fix also flipped the 'kanban create --created-by' default from 'user' to the active profile name. Revert to keep PR scope focused on the notifier ownership fix; the profile-aware author default can be its own change.	2026-05-10 20:04:53 -07:00
Mike Nguyen	ba5640fa11	fix(gateway): route kanban notifications to creator profile	2026-05-10 20:04:53 -07:00
teknium1	9e005d6779	chore: AUTHOR_MAP entry for NivOO5	2026-05-10 20:02:50 -07:00
teknium1	7f90141c63	test(telegram): native-draft transport coverage + docs Added tests/gateway/test_stream_consumer_draft.py with 11 tests covering: - Transport selection: auto+dm-supported -> draft; auto+group -> edit; explicit edit; explicit draft on unsupported adapter -> edit; MagicMock adapter -> edit (back-compat for the existing test suite). - Happy path: DM stream animates draft frames with a single shared draft_id, then finalizes via a regular adapter.send. - Group fallback: drafts entirely skipped in non-DM chats. - Failure fallback: send_draft returning success=False disables drafts for the rest of the response. - Draft_id lifecycle: consecutive responses use distinct ids; tool boundaries bump the id so post-tool text animates fresh below the tool-progress bubble (the openclaw #32535 leak guard). - _already_sent contract: drafts must NOT set the flag so the gateway's fallback final-send still fires (drafts have no message_id). Updated website/docs/user-guide/messaging/telegram.md with a 'Streaming transport' section explaining auto|draft|edit|off, the DM-only constraint, and the per-response fallback behaviour.	2026-05-10 20:02:50 -07:00
NivOO5	4ed293b38e	feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+) Adds Telegram's native streaming-draft API as a streaming transport so DM replies render with smooth animated previews as tokens arrive, dropping the per-edit jitter of the legacy editMessageText polling path. Adapter contract (gateway/platforms/base.py): - supports_draft_streaming(chat_type, metadata) -> bool. Default False. Telegram returns True only for DMs and only when the bound python- telegram-bot version exposes Bot.send_message_draft (PTB 22.6+). - send_draft(chat_id, draft_id, content, metadata) -> SendResult. Default raises NotImplementedError. Telegram delegates to PTB's send_message_draft. Drafts have no message_id (Bot API contract); SendResult.message_id is None on success. Telegram adapter (gateway/platforms/telegram.py): - supports_draft_streaming gates on chat_type='dm' AND PTB capability. - send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads message_thread_id through metadata, and routes failures back as SendResult(success=False, error=...) so the consumer can fall back. Stream consumer (gateway/stream_consumer.py): - StreamConsumerConfig gains transport ('auto'\|'draft'\|'edit'\|'off') and chat_type fields. - run() resolves _use_draft_streaming once via a probe at the top of the run, allocating a fresh class-wide draft_id_counter so each response animates as its own preview (no animation collision across consecutive responses to the same chat). - _send_or_edit gains a pre-edit branch: when drafts are active AND not finalizing AND no edit-path message_id is established, the frame routes through _send_draft_frame instead of edit_message. Drafts intentionally do NOT set _already_sent so the gateway's final sendMessage path still fires — drafts have no message_id and the user needs a real message in their chat history. 
- _reset_segment_state bumps the draft_id when the consumer is in draft mode so each text block after a tool boundary animates as a fresh preview below the tool-progress bubble (avoids the inter- tool-call leak openclaw documented in their #32535). - Per-response fallback: any send_draft failure (transient network, server reject, capability gap) flips _use_draft_streaming to False for the rest of the run, gracefully returning to the edit path. Gateway config (gateway/config.py): - StreamingConfig.transport default flips edit -> auto. The auto path is identical to edit on every chat type that doesn't currently support drafts (groups, supergroups, forum topics, every non- Telegram platform), so the default is backwards-compatible for non-DM users. Lifecycle model (Telegram Bot API 9.5): 1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble. 2. Repeated sendMessageDraft calls with the SAME draft_id animate the preview as text grows. 3. Drafts have no message_id and cannot be edited or deleted. 4. When the response finishes the gateway's normal sendMessage path delivers the final answer; the draft preview clears naturally on the client and the user sees a real message in their history. Inspired by PR #3412 by @NivOO5. Re-authored against current main (stream_consumer.py is now ~4x larger than at #3412's branch base, with new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the original PR didn't account for) but the design call (DM-only, edit- fallback, transport=auto\|draft\|edit\|off) is faithful to the original proposal, with two improvements baked in: 1. Per-response draft_id (monotonic counter, not a time hash) — no collision risk across consecutive responses on the same chat. 2. Tool-boundary draft_id bump — prevents the inter-tool-call leak openclaw hit during their rollout (their #32535). Closes #21439 (duplicate feature request).	2026-05-10 20:02:50 -07:00
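The transport selection rules listed for this feature (auto on a draft-capable DM picks draft, everything else falls back to edit) can be sketched as a pure function. `resolve_transport` and its boolean `supports_draft` argument are hypothetical simplifications of the `supports_draft_streaming(chat_type, metadata)` probe, and returning "off" unchanged is an assumption since the log does not describe that branch.

```python
def resolve_transport(requested, chat_type, supports_draft):
    """Pick the effective streaming transport for one response."""
    if requested in ("off", "edit"):
        return requested
    # 'auto' and 'draft' both need a DM chat and a draft-capable adapter.
    if chat_type == "dm" and supports_draft:
        return "draft"
    return "edit"
```

Because the fallback is always edit, a default of auto behaves like the previous default on every chat type that cannot use drafts.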
Teknium	80bb5f2947	fix(achievements): use canonical X-Hermes-Session-Token header Follow-up to TreyDong's fix: switch the auth header to `X-Hermes-Session-Token` (the canonical pattern used by the rest of the dashboard SPA — see `web/src/lib/api.ts` `fetchJSON()`). The server still accepts both schemes, so the original `Authorization: Bearer` form would also work; we standardize on X-header to match every other dashboard fetch and only set the header when a token is actually present. Also add scripts/release.py AUTHOR_MAP entry for treydong.zh@gmail.com.	2026-05-10 19:41:45 -07:00
treydong	da2ed478b5	fix(achievements): inject Authorization header in plugin API calls	2026-05-10 19:41:45 -07:00
Teknium	771b8c4a36	test(conftest): plug every gateway-kill leak path (#23486 ) The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg and a narrow subset of subprocess invocations. Tests still SIGTERMed the live gateway today (May 10) because the guard had structural holes. Plug them all: - subprocess: also wrap getoutput, getstatusoutput - os.system, os.popen - completely unwrapped before - pty.spawn - completely unwrapped before - asyncio.create_subprocess_exec / create_subprocess_shell - bypassed the subprocess module entirely; now wrapped - Subprocess command inspection now looks at the WHOLE command string, not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c 'systemctl', setsid systemctl, /usr/bin/systemctl, etc. - New process-killer block: pkill / killall / taskkill / fuser targeting hermes/python patterns is now refused - os.kill PID 0 (own group) allowed; PID -1 (every process we can signal) refused - subprocess.Popen wrapper preserves __class_getitem__ so third-party packages that use Popen[bytes] as a type annotation still import Coverage is locked in by tests/test_live_system_guard_self_test.py - exercises every primitive against a guaranteed-foreign PID and asserts the guard fires. Adding a new kill primitive without updating the guard breaks CI. scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py when present (developer-machine convenience), so even worktrees that predate this commit get the protection on subsequent test runs through the canonical wrapper.	2026-05-10 18:55:28 -07:00
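Why inspecting the whole command string matters can be shown with a small matcher. This is a sketch under assumptions: `mentions_systemctl` and its regular expression are illustrative and are not the guard's actual implementation, which also covers process-killer commands and other primitives.

```python
import re

# Match 'systemctl' as a standalone word anywhere in the command text.
SYSTEMCTL = re.compile(r"(?<![\w-])systemctl(?![\w-])")


def mentions_systemctl(command):
    """Accept a shell string or an argv list and scan all of it, not just argv[0]."""
    text = command if isinstance(command, str) else " ".join(command)
    return bool(SYSTEMCTL.search(text))


def first_token_only(command):
    """The weaker check: look only at the first token."""
    tokens = command.split() if isinstance(command, str) else list(command)
    return bool(tokens) and tokens[0] == "systemctl"
```

A first-token check passes `sudo systemctl restart x` straight through, while the whole-string scan catches it along with the `bash -c` and absolute-path variants.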
Teknium	e5bce320db	fix(auxiliary): evict cached client on timeout/connection error (#23482 ) A Codex auxiliary timeout closes the underlying OpenAI client (so the streaming hang doesn't sit until the user kills the session), but the cached wrapper kept pointing at the now-dead transport. Subsequent auxiliary calls (compression retry, memory flush, background review, title generation routed via provider: main) reused that closed client and failed fast with 'Connection error' until the gateway restarted — even though the main agent route was healthy the whole time. Sync `_get_cached_client` had no liveness check (async did, via loop identity), and the connection-error fallback in `call_llm` only fired on the auto provider path, so an explicit provider — including the common `auxiliary.compression.provider: main` shape — never evicted. Three fixes: * New `_evict_cached_client_instance(target)` helper that drops the cache entry whose stored client is target (or wraps it via `_real_client`, for `CodexAuxiliaryClient`). * `_CodexCompletionsAdapter._close_client_on_timeout` evicts the wrapper after closing the inner OpenAI client. * `call_llm` and `async_call_llm` evict on `_is_connection_error` before re-raising, regardless of whether the provider is auto. Net effect: one timeout costs one summary attempt + the existing 30s compressor cooldown; the next compaction rebuilds the client and works. Non-connection errors (4xx/5xx) do not evict, so cache hits stay stable. Closes #23432	2026-05-10 18:55:05 -07:00
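The evict-on-connection-error pattern can be sketched with a tiny cache. `ClientCache` and `call_with_eviction` are hypothetical names; the sketch uses Python's built-in `ConnectionError` where the real code consults `_is_connection_error`, and it ignores the wrapper unwrapping via `_real_client`.

```python
class ClientCache:
    """Caches one client per key, built lazily by a factory."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, key):
        if key not in self._clients:
            self._clients[key] = self._factory()
        return self._clients[key]

    def evict_instance(self, target):
        """Drop every entry whose stored client is exactly `target`."""
        for key in [k for k, c in self._clients.items() if c is target]:
            del self._clients[key]


def call_with_eviction(cache, key, fn):
    """Run fn(client); evict the client on a connection error, then re-raise."""
    client = cache.get(key)
    try:
        return fn(client)
    except ConnectionError:
        cache.evict_instance(client)
        raise


def broken(client):
    raise ConnectionError("Connection error")


cache = ClientCache(factory=object)
first = cache.get("main")
try:
    call_with_eviction(cache, "main", broken)
except ConnectionError:
    pass
second = cache.get("main")  # rebuilt after eviction
```

Other exception types propagate without touching the cache, mirroring the note that 4xx/5xx errors do not evict.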
Teknium1	ae83a54be4	docs(kanban): worker lane contract page + review-required convention Closes the architectural-pin part of #19931. Most of what that issue asked for is already implemented (logs under kanban root, env-pinned workspace, dispatcher routing of unknown assignees, lifecycle ownership, structured handoff conventions). What was missing: 1. A written contract integrators can point at when adding a new worker lane shape, and 2. The "code-changing workers should not auto-promote success to done" convention. This commit ships both as docs+convention layered on existing primitives. No kernel changes — the kanban_complete / kanban_block / kanban_comment surfaces already support the review-required pattern; we just hadn't written it down or made it visible to workers. Changes: - `agent/prompt_builder.py::KANBAN_GUIDANCE`: append the review-required exception to step 5 of the lifecycle. Workers get the cue auto-injected into their system prompt — drop structured metadata into a kanban_comment first, then end with kanban_block(reason="review-required: <summary>") instead of kanban_complete when the work needs review. Total prompt size went from ~3000 to ~3275 chars; well under the 4096 budget enforced by test_kanban_guidance_size. - `skills/devops/kanban-worker/SKILL.md`: add a worked example to the existing "Good summary + metadata shapes" section between the Coding-task and Research-task examples. Same shape as the others (kanban_comment with structured handoff JSON, then kanban_block with the human-readable reason). Plus a one-line guide on when to use kanban_complete vs the review-required pattern. - `website/docs/user-guide/features/kanban-worker-lanes.md` (new): the integrator-facing contract. 
Covers the hierarchy, the three things every lane must provide (assignee, spawn mechanism, lifecycle terminator), the env vars the dispatcher injects, the review-required convention, the failure modes the kernel handles for free, and an explicit "external CLI worker lane" deferred- pending-concrete-asker section that links to #19931 and #19924. - `website/sidebars.ts`: link the new page under user-guide/features. The "specialist worker lanes for external CLI tools (Codex / Claude Code / OpenCode)" runner is NOT shipped here. The dispatcher's spawn_fn parameter already supports plugin-shaped extension; the per-CLI integration work (auth, sandbox policy, exit-code mapping) needs a concrete asker. The new docs page tells would-be integrators the contract any such lane must satisfy. Refs #19931	2026-05-10 18:15:52 -07:00
teknium1	666b751536	chore: AUTHOR_MAP entry for rahimsais	2026-05-10 18:09:31 -07:00
rahimsais	737314fe91	fix(telegram): normalize dm threads and retry control sends Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id issue (#3206): 1. _build_message_event filters DM thread_ids: only preserve thread_id for real topic messages (is_topic_message=True). Telegram puts message_thread_id on every DM that is a reply, but reply-chain ids route to nonexistent threads on send. 2. _send_message_with_thread_fallback helper: control sends (send_update_prompt, send_exec_approval / send_slash_confirm, send_model_picker) retry once without message_thread_id when Telegram returns BadRequest 'Message thread not found'. Mirrors the pattern PR #3390 added for the streaming send path. Salvage notes: - Conflict 1 (line ~4099): merged the contributor's DM is_topic_message filter with the existing forum General-topic default from #22423, preserving both behaviors. - Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416) alongside the new helper. Tightened the helper's exception catch from bare 'Exception' to use the existing _is_bad_request_error + _is_thread_not_found_error helpers (line 484-496) for consistency with the streaming send path. - Widened the fix to send_update_prompt (was bare self._bot.send_message, same bug class). Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@ local commit author).	2026-05-10 18:09:31 -07:00
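The retry-once-without-thread behaviour of the control-send helper can be sketched as below. `ThreadNotFound`, `send_with_thread_fallback`, and `fake_send` are hypothetical stand-ins; the real helper inspects Telegram's BadRequest via `_is_bad_request_error` and `_is_thread_not_found_error`.

```python
class ThreadNotFound(Exception):
    """Stand-in for Telegram's BadRequest 'Message thread not found'."""


def send_with_thread_fallback(send, chat_id, text, thread_id=None):
    """Call send(); on ThreadNotFound retry once without the thread id."""
    try:
        return send(chat_id=chat_id, text=text, message_thread_id=thread_id)
    except ThreadNotFound:
        if thread_id is None:
            raise  # nothing left to strip, surface the error
        return send(chat_id=chat_id, text=text, message_thread_id=None)


calls = []


def fake_send(chat_id, text, message_thread_id=None):
    """Fake transport that fails whenever a thread id is supplied."""
    calls.append(message_thread_id)
    if message_thread_id is not None:
        raise ThreadNotFound("Message thread not found")
    return "sent"


result = send_with_thread_fallback(fake_send, 1, "hi", thread_id=42)
```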
Teknium	404640a2b7	feat(goals): /goal checklist + /subgoal user controls (#23456 ) * feat(goals): /goal checklist + /subgoal user controls Two-phase judge for /goal — Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong. Mechanics: - GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged). - Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails. - Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo. - Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty. - Status line shows M/N done counts. CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries. Tests: 24 new cases — backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly. 
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge: 1. Off-by-one bug — the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip. 2. Conversation dump never happened — _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback. 3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find — burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier. Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.	2026-05-10 16:56:51 -07:00
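The off-by-one fix in the commit above comes down to converting the 1-based numbers the judge sees into 0-based list positions before applying updates. A minimal sketch, assuming a plain list of item strings; `render_checklist` and `judge_index_to_position` are hypothetical names, not the functions in `_parse_evaluate_response`:

```python
from typing import List, Optional


def render_checklist(items: List[str]) -> str:
    """Render items the way the judge sees them: numbered from 1."""
    return "\n".join(f"{i}. [ ] {item}" for i, item in enumerate(items, start=1))


def judge_index_to_position(judge_index: int, checklist_len: int) -> Optional[int]:
    """Convert a 1-based judge index to a 0-based list position.

    Returns None for indices that do not point at an existing item,
    so a bad judge reply cannot touch a neighbouring row.
    """
    position = judge_index - 1
    if 0 <= position < checklist_len:
        return position
    return None
```

For a two-item checklist, judge index 1 maps to position 0 and judge index 2 to position 1; indices 0 and 3 are rejected.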
teknium1	c0bbdec850	chore: AUTHOR_MAP entry for Freeman-Consulting	2026-05-10 16:21:07 -07:00
teknium1	121bbe0385	test(stream-consumer): add UTF-16 overflow regression tests for #11170 New TestUtf16OverflowDetection class covers two scenarios: - test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds 2200 emoji codepoints (4400 UTF-16 units) — under Telegram's codepoint-equivalent limit but over its UTF-16 limit. Asserts truncate_message was called with len_fn=utf16_len, confirming the consumer detected the overflow. - test_codepoint_only_adapter_falls_back_to_len: documents that adapters which don't subclass BasePlatformAdapter (or test MagicMocks) fall back to plain len for backwards compat. The contributor's PR shipped no tests for the UTF-16 path.	2026-05-10 16:21:07 -07:00
Aubrey Freeman III	c0da5d09a6	fix: use UTF-16 length for Telegram stream consumer message splitting The stream consumer measured message length using Python's len() (Unicode code points), but Telegram's actual limit is in UTF-16 code units. This caused messages with supplementary characters (emoji, CJK, etc.) to exceed Telegram's 4096-character limit, resulting in truncated messages with formatting artifacts. Changes: - Add message_len_fn property to BasePlatformAdapter (defaults to len) - Override in TelegramAdapter to return utf16_len - Stream consumer uses adapter.message_len_fn for: - safe_limit calculation - overflow detection - truncate_message calls - split point calculation (via _custom_unit_to_cp) - fallback final send chunking Fixes truncated messages with black square artifacts on Telegram when the model generates responses containing multi-byte Unicode characters.	2026-05-10 16:21:07 -07:00
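The gap between Python's `len()` and Telegram's UTF-16 limit can be illustrated with a small length function. This is a sketch of one way to count UTF-16 code units, not necessarily how the repository's `utf16_len` is written:

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units in text.

    Code points above U+FFFF (most emoji) need a surrogate pair and count as 2;
    everything in the Basic Multilingual Plane counts as 1.
    """
    # Each UTF-16 code unit is 2 bytes; "surrogatepass" tolerates lone surrogates.
    return len(text.encode("utf-16-le", errors="surrogatepass")) // 2


emoji_text = "\U0001F600" * 2200  # 2200 code points
print(len(emoji_text), utf16_len(emoji_text))  # 2200 4400
```

The 2200-emoji string matches the regression test described in the row above: 2200 code points, but 4400 UTF-16 units, which is past a 4096-unit limit. Common CJK characters such as U+4E2D sit in the Basic Multilingual Plane and count as one unit each.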
Teknium	c5f1f863ac	fix(cli): drive _prompt_text_input directly when off main thread (#23454) Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the process_loop daemon thread. prompt_toolkit.run_in_terminal returns a coroutine that only the main-thread event loop can drive, so calling it from a daemon thread orphans the coroutine — the input prompt never renders and user keystrokes leak into the composer instead of the confirmation prompt (issue #23185). Mirror the thread-aware guard already in _run_curses_picker: when off the main thread, fall back to a direct input() call. Also wrap run_in_terminal in try/except so WSL / Warp / other emulators that silently drop the scheduled coroutine fall back to input() too. Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main thread (run_in_terminal path), daemon thread (direct input fallback), no-app, run_in_terminal-raises, and EOF handling.	2026-05-10 16:16:10 -07:00
konsisumer	62cfe79e93	fix(tools): clarify kanban_complete phantom-card retry guidance When kanban_complete rejects a created_cards list as hallucinated, the task is intentionally left in-flight (the gate runs before the write txn) so the worker can retry with a corrected list or pass created_cards=[] to skip the check. The retry path already worked, but the previous error wording read like a terminal failure and workers were observed abandoning the run instead of trying again. Spell out the recovery path explicitly in the tool_error response ("Your task is still in-flight ... Retry kanban_complete with ...") and add regression coverage at both the kernel and tool layers so the retry contract — and the wording the worker depends on to discover it — is pinned. Fixes #22923	2026-05-10 16:14:43 -07:00
Keyu Yuan	2f00559d9e	fix(telegram): pass source.thread_id explicitly on auto-reset notice (carve-out of #7404) The auto-reset notice ("◐ Session automatically reset…") was being sent with metadata=getattr(event, 'metadata', None), which can drop or mis-route in Telegram forum topics: the event's metadata isn't guaranteed to carry the originating thread_id, so the notice could leak into General or another topic. Use the existing self._thread_metadata_for_source(source) helper, which already handles thread_id construction plus the Telegram DM topic reply-fallback shape used everywhere else in the gateway. Carve-out of #7404. The PR's other hunk (line 7578, queued first response) is already redundant on main — gateway/run.py:15782 has used _status_thread_metadata since the _thread_metadata_for_source plumbing landed. Closes #7355 (path B; paths A and C closed via prior salvage merges).	2026-05-10 16:12:40 -07:00
Wesley Simplicio	a2920b1762	fix(tui): right-click copies selection, only pastes when no selection Sub-issue 5 of #22034. Right-click on the composer always pasted from the clipboard, even when the user had highlighted text — diverging from terminal-native behavior (xterm/iTerm/gnome-terminal) where right-click copies an active selection and only pastes when nothing is selected. Extract a small pure helper, decideRightClickAction(value, range), and route the existing onMouseDown right-click branch through it. Selection present and non-empty -> writeClipboardText(slice). Otherwise fall back to the existing emitPaste path.	2026-05-10 16:06:33 -07:00
Teknium1	59d3f24f10	chore: AUTHOR_MAP entry for konsisumer noreply (#23071)	2026-05-10 15:23:04 -07:00
konsisumer	88588b6159	fix(kanban): extend stale claim instead of killing live worker Workers running slow models (e.g. kimi-k2.6) can spend longer than DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making no tool calls and therefore not heartbeating. release_stale_claims previously reclaimed these healthy workers, producing the spawn-then-immediately-reclaim loop reported in #23025. When a stale-by-TTL claim's host-local worker PID is still alive, extend the claim (emit a claim_extended event) rather than killing it. enforce_max_runtime / detect_crashed_workers remain the upper bounds for genuinely wedged or dead workers. Reclaim events now also record claim_expires, last_heartbeat_at, worker_pid, and host_local so operators can see why a worker was killed.	2026-05-10 15:23:04 -07:00
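The extend-instead-of-reclaim decision in the commit above can be sketched as a small pure function. This is an illustration under assumptions: the claim is modelled as a dict using the field names the commit mentions (`claim_expires`, `worker_pid`, `host_local`), and `resolve_stale_claim` plus the injected `pid_is_alive` callable are hypothetical, not the real `release_stale_claims` signature.

```python
from typing import Callable, Dict


def resolve_stale_claim(
    claim: Dict,
    now: float,
    ttl_seconds: float,
    pid_is_alive: Callable[[int], bool],
) -> str:
    """Decide what to do with a claim at time `now`.

    Returns "active", "claim_extended", or "reclaimed".
    """
    if now < claim["claim_expires"]:
        return "active"  # TTL has not elapsed yet
    # Stale by TTL: a live host-local worker gets more time instead of being killed.
    if claim["host_local"] and pid_is_alive(claim["worker_pid"]):
        claim["claim_expires"] = now + ttl_seconds
        return "claim_extended"
    return "reclaimed"
```

Passing the liveness check in as a callable keeps the decision testable without touching real processes; the upper bounds the commit mentions (`enforce_max_runtime`, `detect_crashed_workers`) are outside this sketch.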
Teknium	3974a137c6	docs(user-stories): add 116 stories from the Hermes Discord archive (#23436) * docs(user-stories): add 116 stories from Discord archive Mined teknium1/nous-discord-archive for first-person user stories that match the existing collage voice ('I run X every day', 'my family uses Hermes for Y', 'so I built Z'). Skipped pure project pitches, Q&A, install help, and generic announcements. - Added 'discord' as a source in UserStoriesCollage (label + brand color) - Added 116 entries to userStories.json (237 total, up from 121) - Each entry links back to the discord-archive thread or channel archive file * docs(user-stories): interleave discord stories across the full collage Shuffle userStories.json with a fixed seed so the 116 Discord-sourced entries are mixed evenly with the existing 121 entries instead of appearing as a contiguous block at the end. Even distribution: 10-16 discord entries per decile across the array (ideal would be ~11).	2026-05-10 15:21:40 -07:00
Teknium	d6e1fadbf5	fix(xai): omit reasoning.effort for grok models that reject it (#23435) xAI's Responses API returns HTTP 400 ("Model X does not support parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-, grok-4-1-fast-, grok-3, grok-4.20-0309-, and grok-code-fast-1 — even though those models reason natively. Hermes was unconditionally sending `reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking direct `--provider xai` for the entire grok-4 line. Add a substring allowlist predicate (verified live against api.x.ai 2026-05-10) covering the only Grok families that accept the effort dial: grok-3-mini, grok-4.20-multi-agent, grok-4.3. The Responses transport omits the `reasoning` key entirely for everything else while still including `reasoning.encrypted_content` so we capture native reasoning tokens. Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709` went from HTTP 400 to a successful reply.	2026-05-10 15:21:30 -07:00
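A substring allowlist of the kind the commit above describes might look like the following. The three family names come from the commit message; `supports_reasoning_effort`, `build_request`, and the exact payload shape (including the `include` list) are assumptions for illustration rather than Hermes's actual transport code.

```python
from typing import Any, Dict

# Grok families the commit lists as accepting the reasoning-effort dial.
EFFORT_CAPABLE_FAMILIES = ("grok-3-mini", "grok-4.20-multi-agent", "grok-4.3")


def supports_reasoning_effort(model: str) -> bool:
    """True when the model name contains one of the allowlisted family names."""
    return any(family in model for family in EFFORT_CAPABLE_FAMILIES)


def build_request(model: str, prompt: str, effort: str = "medium") -> Dict[str, Any]:
    """Build a request payload, adding `reasoning` only for allowlisted models."""
    payload: Dict[str, Any] = {
        "model": model,
        "input": prompt,
        # Assumed payload shape: request encrypted reasoning content for every model.
        "include": ["reasoning.encrypted_content"],
    }
    if supports_reasoning_effort(model):
        payload["reasoning"] = {"effort": effort}
    return payload
```

An allowlist fails safe here: an unknown Grok model gets no `reasoning` key, which avoids the HTTP 400 at the cost of not using the effort dial until the model is added.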
teknium1	cc2a0c674a	chore: AUTHOR_MAP entry for hrygo (黄飞虹)	2026-05-10 15:20:40 -07:00
teknium1	f9e0d60a99	test(thread-routing): handle both lark-SDK-present and absent paths The contributor's regression test for Feishu fallback thread routing asserted on attributes specific to the real lark SDK builder (call_args.body, body.receive_id). In test environments without the lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py _build_create_message_request) returns a SimpleNamespace using .request_body instead of .body, causing AttributeError. Now reads via getattr fallback and also verifies receive_id_type is 'thread_id' (not 'chat_id') as a stronger contract check.	2026-05-10 15:20:40 -07:00
黄飞虹	e164a9c1ed	fix(stream-consumer): preserve thread routing on overflow first-send path When the first streamed message exceeds the platform length limit and gets split into chunks, _send_new_chunk was called with self._message_id (which is None on first send), dropping thread routing entirely. Fallback to self._initial_reply_to_id so overflow chunks land in the correct topic/thread. Also fix a fragile test assertion that could be silently skipped.	2026-05-10 15:20:40 -07:00
hrygo	ff14666cdc	fix(gateway): stream consumer first message drops thread context Cherry-picked from PR #13077 commits: - 5500c7d8 fix(gateway): stream consumer first message drops thread context - e84403b9 test(gateway): add regression tests for stream consumer thread routing Fixes: Streaming first message drops thread/topic context in Feishu group topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and _send_new_chunk. Also fixes Feishu _send_raw_message fallback path (reply -> create) to use receive_id_type='thread_id' so the new message lands in the correct topic instead of the main channel. Authored by hrygo via PR #13077 (re-attributed from the bot-authored salvage commit on the original branch).	2026-05-10 15:20:40 -07:00
Teknium	6636fecd47	fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420) The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was copying the cumulative _already_sent flag into _final_response_sent on the done frame. _already_sent goes True on any successful prior edit (tool progress) or on fallback-mode promotion when an edit fails — neither proves the current chunked send delivered the final answer. When the chunked send actually fails (network error, flood control), the consumer would wrongly claim 'final delivered' and the gateway's independent fallback delivery in run.py would be suppressed. User saw only tool-progress bubbles and never got the answer. Now we track per-chunk success locally: _send_new_chunk returns the new message_id on success or returns the passed-in reply_to unchanged on failure. If at least one returned id differs, chunks_delivered = True; otherwise stays False, gateway fallback runs. Adds two regression tests: - test_split_overflow_failed_send_does_not_mark_final_sent — primes _already_sent=True, then makes every send fail; asserts _final_response_sent stays False. - test_split_overflow_partial_send_marks_final_sent — happy path, asserts _final_response_sent goes True. Note: the companion bug at the CancelledError handler (issue cited lines 417-418) was already fixed by `3b5572ded` on 2026-04-16. Closes #10748	2026-05-10 15:13:54 -07:00
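The per-chunk success tracking described in the commit above can be sketched as follows. The return convention (new id on success, the passed-in `reply_to` unchanged on failure) is taken from the commit message; `deliver_chunks` and the way the id is threaded from chunk to chunk are assumptions, not the real `_send_or_edit` body.

```python
from typing import Callable, List, Optional, Tuple


def deliver_chunks(
    chunks: List[str],
    reply_to: Optional[str],
    send_new_chunk: Callable[[str, Optional[str]], Optional[str]],
) -> Tuple[bool, Optional[str]]:
    """Send each chunk and report whether at least one actually landed.

    send_new_chunk returns a new message id on success, or the reply_to
    it was given, unchanged, on failure.
    """
    chunks_delivered = False
    for chunk in chunks:
        new_id = send_new_chunk(chunk, reply_to)
        if new_id != reply_to:
            chunks_delivered = True
            reply_to = new_id  # assumed: later chunks chain off the latest message
    return chunks_delivered, reply_to
```

Because the flag is local to this send, an earlier successful edit cannot make a failed chunked send look delivered, so a caller can still run its fallback delivery.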
Teknium	b38b100105	chore: AUTHOR_MAP entry for jelrod27 (#21398)	2026-05-10 14:27:59 -07:00
Teknium	787e3c368c	test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests Follow-up to the previous commit's notifier behavior change. Two test fixes: 1. `tests/gateway/test_kanban_notifier.py` gains `test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new contract directly: a task that crashes, gets reclaimed, and crashes again notifies the user BOTH times. Before #21398 the second crash silently dropped because the subscription was already deleted. 2. `tests/hermes_cli/test_kanban_notify.py::test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]` is flipped. Those tests were added in the salvage of #22941 and asserted the OLD behavior (subscription deleted after gave_up / crashed / timed_out). They're now obsolete — the new contract is "subscription survives a non-final terminal event so retries reach the user." Updated docstring + asserts; the cursor-advance check is added to confirm the dedup mechanism still works. The `test_notifier_unsubs_after_completed_event` test stays untouched because `completed` IS still a terminal event that triggers unsub (the task hits `done` status, which is handled by the `task_terminal` branch in the notifier loop).	2026-05-10 14:27:59 -07:00
jelrod27	a96dd54872	fix: deduplicate kanban notifications for blocked/gave_up states The kanban notifier was re-firing the same blocked/gave_up/crashed/timed_out notifications on every 5-second tick. Root cause: after delivering a terminal event, the notifier unsubscribed the subscription, deleting its cursor. If the unsub failed (WAL contention, transient error), the subscription survived with a stale cursor, and the next tick would re-deliver the same event. Even when the unsub succeeded, the subscription was gone. If the task later transitioned to a different state (e.g., blocked -> unblocked -> blocked again), a new subscription would start at cursor=0, re-delivering all past events. Fix: stop unsubscribing on terminal event kinds. Only remove the subscription when the task reaches a truly final status (done/archived). For blocked, gave_up, crashed, and timed_out, the subscription stays alive and the cursor mechanism deduplicates naturally -- events with id <= last_event_id are never re-fetched. This makes the dedup idempotent and eliminates the re-fire bug. The old concern about subscriptions leaking forever on blocked tasks is moot: blocked tasks will eventually be unblocked (transitioning to ready/running) or archived, at which point the subscription is cleaned up.	2026-05-10 14:27:59 -07:00
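The cursor-based deduplication the commit above relies on can be sketched like this. It assumes events are dicts with a monotonically increasing `id`; `Subscription` and `poll_new_events` are hypothetical names, and only `last_event_id` comes from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subscription:
    """Tracks the highest event id already delivered to the user."""
    last_event_id: int = 0


def poll_new_events(events: List[Dict], sub: Subscription) -> List[Dict]:
    """Return only events newer than the cursor, then advance the cursor."""
    fresh = [event for event in events if event["id"] > sub.last_event_id]
    if fresh:
        sub.last_event_id = max(event["id"] for event in fresh)
    return fresh
```

As long as the subscription (and so the cursor) survives, a repeated tick over the same events returns nothing, while a genuinely new crash with a higher id is still delivered. Deleting the subscription resets the cursor to 0, which is the re-delivery path the commit removes.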
teknium1	04e18160ab	chore: AUTHOR_MAP entry for HuangYuChuh	2026-05-10 14:22:59 -07:00


8029 commits