hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-06 12:52:11 +00:00

Author	SHA1	Message	Date
Teknium	7b76366552	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828) Cuts input cost for first-turn Claude requests by ~85-90% on subsequent sessions within an hour. Tools array (~13k tokens for default toolset) + stable system prefix (~5-8k tokens) get a 1h cache_control marker; the volatile suffix (memory, USER profile, timestamp, session id) sits in a separate non-cached block at the end so it doesn't poison the cross-session prefix when it changes. Provider gate: Claude on native Anthropic (incl. OAuth subscription), OpenRouter, and Nous Portal (which proxies to OpenRouter). All other providers keep today's system_and_3 layout unchanged. Layout (4 cache_control breakpoints, Anthropic max): 1. tools[-1] -> 1h (cross-session) 2. system content[0] -> 1h (cross-session, stable prefix) 3. messages[-2] -> 5m (within-session rolling) 4. messages[-1] -> 5m (within-session rolling) Within-session rolling shrinks from 3 messages to 2 to free the breakpoint budget. On Claude with realistic tool loadouts the long-lived tier carries the bulk of cross-session value anyway. System prompt is now always assembled cache-friendly: stable identity / guidance / skills / platform hints first, then session-stable context files (AGENTS.md, .cursorrules), then per-call volatile content. Old single-string callers see the same logical content (same join order), just reordered so volatile lives at the end. 
Config knobs (defaults shown): prompt_caching: cache_ttl: "5m" # rolling-window TTL (unchanged) long_lived_prefix: true # opt-out switch long_lived_ttl: "1h" # cross-session prefix TTL Live E2E (tests/agent/test_prompt_caching_live.py, gated on OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset: Call 1 (cold): cache_write=13,415 cache_read=0 Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025 Cross-session reuse: 97.09% Implementation: * agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived() + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control() preserved verbatim for the fallback path. * agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards cache_control onto each Anthropic-format tool dict. * run_agent.py: _build_system_prompt_parts() returns the 3-tier dict; _build_system_prompt() joins them (backward compatible). _supports_long_lived_anthropic_cache() policy added next to the existing _anthropic_prompt_cache_policy() (which now also recognises Nous Portal Claude — pre-existing gap fixed in passing). _build_api_kwargs() resolves tools_for_api once and propagates the marker through all four build paths (anthropic_messages, bedrock, codex_responses, profile/legacy chat completions). Long-lived flag plumbed into the runtime snapshot/restore + model-switch + fallback-promotion paths. Tests: * tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache, TestApplyAnthropicCacheControlLongLived). * tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes + a fallback-target case). * tests/agent/test_prompt_caching_live.py: new live E2E (skipif when OPENROUTER_API_KEY is unset; runs outside the hermetic suite). * Targeted suites: 327/327 pass (caching/adapter/policy/builder). 
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry, verified failing on pristine origin/main).	2026-05-11 11:14:56 -07:00
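The four-breakpoint layout described in this commit can be sketched as follows. This is a minimal illustration, not the repository's `apply_anthropic_cache_control_long_lived()`: the helper names `mark` and `apply_long_lived_layout` are hypothetical, and the sketch assumes the system prompt is already a list of text blocks (stable prefix first, volatile suffix last) and that every message's `content` is a non-empty list of content blocks.

```python
from copy import deepcopy


def mark(block: dict, ttl: str) -> None:
    """Attach an Anthropic-style cache_control marker to one block."""
    block["cache_control"] = {"type": "ephemeral", "ttl": ttl}


def apply_long_lived_layout(tools, system_blocks, messages,
                            prefix_ttl="1h", rolling_ttl="5m"):
    """Return copies of the inputs with up to four cache breakpoints set.

    Hypothetical helper: assumes system_blocks[0] is the stable prefix and
    that each message's content is a non-empty list of blocks.
    """
    tools, system_blocks, messages = map(deepcopy, (tools, system_blocks, messages))
    if tools:
        mark(tools[-1], prefix_ttl)         # breakpoint 1: end of the tools array
    if system_blocks:
        mark(system_blocks[0], prefix_ttl)  # breakpoint 2: stable system prefix
    for message in messages[-2:]:           # breakpoints 3 and 4: rolling window
        mark(message["content"][-1], rolling_ttl)
    return tools, system_blocks, messages
```

The volatile system block is deliberately left unmarked, so changes to it do not invalidate the 1h prefix.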
wuli666	111b859e49	fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482) #23482 fixed cache poisoning in the sync path: when a Codex auxiliary timeout closes the underlying OpenAI client, _evict_cached_client_instance walks CodexAuxiliaryClient wrappers via their _real_client attribute and drops the cache entry so the next aux call rebuilds. The cache key includes async_mode (see _client_cache_key), so the sync and async clients for the same provider live in two distinct entries pointing at the same underlying transport. The fix walked the sync wrapper's _real_client correctly but the async wrappers (AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient, AsyncGeminiNativeClient) never exposed _real_client at all, so the async entry survived eviction and kept handing out the poisoned client. Effect on async aux callers: one timeout now poisons every subsequent async aux call (compression, vision, session_search, title_generation) with 'Connection error' until gateway restart -- even while the sync route recovered as designed in #23482. Mirror the sync wrapper's _real_client onto each async wrapper so the existing eviction helper finds them. Three changes, one per wrapper: - AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client (the underlying OpenAI client) - AsyncAnthropicAuxiliaryClient: same shape - AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's native facade is itself the leaf; no OpenAI client beneath it) Update _evict_cached_client_instance docstring to reflect that it now covers both sync and async wrappers via the same attribute walk. Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper seeds both sync and async cache entries pointing at the same leaf and asserts both are dropped on a single eviction call. 
Verified the test fails without the wrapper changes ("async cache entry survived eviction -- wrapper is missing _real_client") and passes with them. Refs #23482, #23432	2026-05-11 11:13:20 -07:00
Teknium	1d00716754	fix(cli,tui): align CJK / wide-char markdown tables (#23863) CJK and emoji glyphs render as two terminal cells but JS String#length and the model's own padding count them as one, so any markdown table with Chinese / Japanese / Korean cells drifts right per row when a real terminal renders it. Both surfaces fix this with a display-cell width measurement (wcswidth on the Python side, stringWidth on the TUI side). Changes: - agent/markdown_tables.py: new helper. realign_markdown_tables(text) detects markdown table blocks (header + |---| divider) and rewrites the row padding using wcwidth.wcswidth so every pipe and dash lines up across rows. No-op on text without tables. - cli.py: hook the helper into _render_final_assistant_content for strip / render modes (raw passes through untouched), and into the streaming line emitter so live token-by-token rendering also produces aligned tables. A small two-buffer state machine in _emit_stream_text holds table rows until the block ends, then flushes them through the realigner so all rows pad to a single per-column width. - ui-tui/src/components/markdown.tsx: renderTable now uses stringWidth (Bun.stringWidth fast path + East-Asian-width-aware fallback, already memoised in @hermes/ink) instead of UTF-16 String#length for both column-width measurement and per-cell padding. Drops the comment that documented the bug as a deliberate limitation. Validation: - New tests/agent/test_markdown_tables.py (11): every rebuilt block shares pipe column offsets across rows for pure CJK, mixed CJK+emoji, ragged-row, and multi-table inputs. - Updated tests/cli/test_cli_markdown_rendering.py: the existing strip-mode test asserted exact whitespace; rewritten to assert the alignment contract (cell content survives + every rendered row shares pipe offsets). - New ui-tui markdown.test.ts case (1): rendered column-2 start offset is identical for the header + every body row, including the CJK row that drifted before the fix. 
- Live: hermes chat -q with the user-reported screenshot prompt now produces a perfectly aligned table on the wire (header, divider, 4 body rows including '通义千问', all pipes at identical columns).	2026-05-11 11:13:06 -07:00
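The display-cell padding idea behind this fix can be illustrated with a short sketch. It is not the repository's `realign_markdown_tables`: the commit uses `wcwidth.wcswidth`, whereas this example substitutes the standard library's `unicodedata.east_asian_width` as an approximation so it runs without third-party packages. The function names `display_width`, `pad_cell`, and `align_rows` are hypothetical, and the sketch assumes rectangular rows with no divider row.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal cells: wide/fullwidth chars count 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cell(text: str, target: int) -> str:
    """Right-pad with spaces until the cell occupies `target` display cells."""
    return text + " " * max(0, target - display_width(text))


def align_rows(rows: list[list[str]]) -> list[str]:
    """Pad every column to its widest cell, measured in display cells."""
    widths = [max(display_width(cell) for cell in col) for col in zip(*rows)]
    return [
        "| " + " | ".join(pad_cell(c, w) for c, w in zip(row, widths)) + " |"
        for row in rows
    ]
```

Padding by `len()` instead of `display_width()` would under-pad every Latin cell that shares a column with a CJK cell, which is the per-row drift the commit describes.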
Teknium	8e2eb4b511	fix(/model): surface Nous Portal models from remote catalog manifest (#23912) The /model picker for Nous Portal users was returning the in-repo _PROVIDER_MODELS["nous"] snapshot -- which only updates on Hermes releases -- instead of the remote manifest published at https://hermes-agent.nousresearch.com/docs/api/model-catalog.json. OpenRouter already pulled from the manifest via fetch_openrouter_models; "nous" was the only curated provider where the existing manifest plumbing (get_curated_nous_model_ids → get_curated_nous_models) was defined but not wired into the picker pipeline. Switch the curated build in list_authenticated_providers to use it, with the same graceful fallback to the in-repo snapshot when the manifest is unreachable. Test: tests/hermes_cli/test_model_catalog.py exercises the picker with a patched manifest and asserts the manifest's nous list reaches list_picker_providers. Falls-back-to-static path was already covered by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog.	2026-05-11 10:15:30 -07:00
zhengyuna	054f568578	fix: use TUI modal for slash confirmations	2026-05-11 10:02:03 -07:00
Teknium1	283381b1ce	fix(dashboard): validate dist exists when --skip-build is set Follow-up to PR #23824. Adds two correctness fixes on top of the contributor's salvaged commit: 1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard` passes `fatal=True` and is the primary scenario this fallback is for (issue #23817 — Windows Scheduled Task at logon). The previous gate meant the fallback never fired in the case it was designed for. 2. `--skip-build` now verifies the dist actually exists before starting the server. Without this, a misconfigured pre-build would launch the dashboard pointing at a missing dist and silently serve 404s. We now exit 1 with a clear "pre-build first: cd web && npm run build" message, and on success print which dist directory is being used. Verified end-to-end on Linux: - build fails + stale dist (fatal=True) -> fallback fires - build fails + no dist (fatal=True) -> exit 1 with stderr surfaced - build fails + stale dist (fatal=False) -> fallback fires - --skip-build + missing dist -> exit 1 with clear guidance - --skip-build + valid dist -> 'Skipping web UI build...'	2026-05-11 09:27:05 -07:00
文森.Z	a479ec01ed	fix: make web UI build output decoding robust on Windows On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning. This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows. The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics.	2026-05-11 08:14:03 -07:00
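A minimal sketch of the explicit-decoding approach this commit describes: capture subprocess output as UTF-8 with `errors="replace"` instead of relying on the process locale. The wrapper name `run_captured` and the demo child command are hypothetical; only the `subprocess.run` keyword arguments reflect the technique named in the message.

```python
import subprocess
import sys


def run_captured(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd and decode its output as UTF-8, replacing undecodable bytes."""
    return subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",   # do not inherit the locale codec (e.g. GBK)
        errors="replace",   # stray bytes become U+FFFD instead of raising
    )


# Demo child: writes a lone 0x85 byte, which is not valid UTF-8 on its own.
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'built \\x85')"]
result = run_captured(child)
```

With these arguments the undecodable byte is replaced rather than raising in the reader, so the exit code remains the only signal of build success or failure.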
Teknium	7026af4e23	fix(agent): catch ChatGPT-account Codex data-URL rejection so images are stripped instead of cascading to compression (#23602) When the user's main provider is openai-codex on the ChatGPT-account backend (https://chatgpt.com/backend-api/codex), sending a native image attachment encodes it as data:image/...base64,... in the input_image field. The OpenAI Responses API on the public endpoint accepts that, but the ChatGPT-account variant rejects it with HTTP 400: Invalid 'input[N].content[K].image_url'. Expected a valid URL, but got a value with an invalid format. Hermes' image-rejection phrase list didn't include this wording, so the error escaped the strip-and-retry branch and fell through to the generic recovery path: model fallback → context-too-large → compression cascade → auxiliary OpenRouter 402 spam (issue #23570). Add a NARROW phrase keyed on the field-path apostrophe used by the Codex Responses error format: "image_url'. expected". This matches the actual error format without false-tripping on generic 'Expected a valid URL' errors from unrelated tools (webhooks, redirect_uri, etc.). Once matched, the existing branch strips images from history, sets _vision_supported=False for the session, and retries text-only. Refs #23570 (1 of 3 image-replay improvements; persistence rewrite to store image PATHS instead of inlined base64 is a separate follow-up)	2026-05-11 07:37:22 -07:00
Teknium	3e7145e0bb	revert: roll back /goal checklist + /subgoal feature stack (#23813) * Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit `a63a2b7c78`. * Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)" This reverts commit `4a080b1d5a`. * Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)" This reverts commit `404640a2b7`.	2026-05-11 07:06:27 -07:00
fr33d3m0n	976d8e27ad	fix(approval): catch sudo with stdin/askpass/shell privilege flags Adds the only #17873 category not covered by the in-flight PRs #17962 (briandevans, reverse shell + download-execute) and #7993 (SHL0MS, credential reads + curl/wget exfiltration): sudo invocations that an LLM-driven agent can drive without TTY interaction. The agent has no TTY, so the sudo forms that succeed without human involvement are those reading the password from stdin (`-S` / `--stdin`) or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`) and list-privileges (`-a`) flags are also gated since they are privilege-relevant invocations the agent can chain after acquiring the password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell). Plain `sudo cmd` (no flag) is TTY-bound and excluded. Two patterns: 1. Direct flag: `\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)` The lazy `[^;|&\n]*?` consumes flag-arguments without spanning command separators, so `sudo -u root -S whoami` matches (a textbook offensive form that a strict `(?:\s+-[^\s]+)*` "leading flags only" pattern would have missed because `root` is a flag-value not a flag). 2. Combined short flags: `\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b` Catches packed forms like `sudo -nS id` where multiple flags share a single `-X` token. `_normalize_command_for_detection` lowercases input before pattern matching (tools/approval.py:340), so case variants of S/s and A/a collapse -- both letter-pairs are gated since each is a privilege-relevant invocation. Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all flag-order permutations including herestring source and printf-piped forms; 9 negative including TTY-bound `sudo whoami`, interactive `sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`, package install, and the `pseudosudo` word-boundary edge case). Empirical coverage: 11/11 attacks matched, 0/10 false positives. Refs: #17873 category 4. 
Adjacent: #17962 (reverse shell + download-execute), #7993 (credential reads + curl/wget exfiltration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 06:56:30 -07:00
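The two detection patterns can be exercised with a short sketch. The quantifiers used here (`*?` after the character class, `*` around `[sa]`) are an assumption inferred from the message's description of a lazy match and of packed flags such as `-nS`. The function `is_gated_sudo` is a hypothetical stand-in for the real check in tools/approval.py, and the `.lower()` call only mimics the lowercasing the message attributes to `_normalize_command_for_detection`.

```python
import re

# Assumed reading of the commit's patterns; quantifiers reconstructed from the prose.
DIRECT_FLAG = re.compile(r"\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)")
PACKED_FLAGS = re.compile(r"\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b")


def is_gated_sudo(command: str) -> bool:
    """Return True when a sudo call uses a stdin/askpass/shell-style flag."""
    normalized = command.lower()  # collapses -S/-s and -A/-a before matching
    return bool(DIRECT_FLAG.search(normalized) or PACKED_FLAGS.search(normalized))


if __name__ == "__main__":
    for cmd in ("sudo -u root -S whoami", "sudo -nS id", "sudo whoami", "man sudo"):
        print(f"{cmd!r}: {is_gated_sudo(cmd)}")
```

Because the character class excludes `;`, `|`, `&`, and newlines, a flag belonging to a later command (as in `sudo whoami; ls -s`) is not attributed to the earlier `sudo`.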
OpenClaw Agent	9520a1ccdf	fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set Fixes #9590: Block explicit sudo -S (stdin password mode) commands when the SUDO_PASSWORD environment variable is not configured. The attack vector: the LLM constructs 'echo guessedpass | sudo -S cmd' to brute-force sudo passwords, iterates based on sudo's error output ('Sorry, try again'). The existing _transform_sudo_command only injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit sudo -S must be treated as a guessing attempt. Changes: - Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when SUDO_PASSWORD is absent, anchored to command-start positions (^ ; && || | etc.) to avoid false positives on literal text - Integrate into check_all_command_guards() above yolo/mode=off so the block is unconditional (like the hardline floor) - Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass, integration with check_all_command_guards, yolo non-bypass, container backend bypass	2026-05-11 06:56:30 -07:00
kshitijk4poor	494824fb11	chore: remove unused sentinel in test_send_message_tool	2026-05-11 06:44:58 -07:00
Dominikh	379e7dd014	test(send_message): cover _check_send_message gating paths Adds a TestCheckSendMessage class with 7 focused tests pinning the four passing conditions and the failure modes: - HERMES_KANBAN_TASK grants access (the new branch) - HERMES_KANBAN_TASK short-circuits before consulting session_context or gateway.status (so workers don't depend on those import paths being healthy) - HERMES_SESSION_PLATFORM=telegram grants access - HERMES_SESSION_PLATFORM=local falls through to gateway check - is_gateway_running()=True grants access - All signals absent → False - gateway.status ImportError is swallowed → False Pinning the short-circuit (test #2) is the load-bearing one — it documents the contract that worker-side availability cannot regress to depending on gateway-side state lookups.	2026-05-11 06:44:58 -07:00
Sylw3ster	641e40c4bd	fix(kanban): restore HERMES_KANBAN_BOARD after scoped slash override	2026-05-11 06:44:58 -07:00
liuhao1024	2b3bf17dfa	fix(kanban): call kanban_block on iteration-budget exhaustion to prevent protocol violation When a kanban worker subprocess hits the iteration budget, the agent loop strips tools and asks the model for a summary. The model cannot call kanban_block itself at that point, so the process exits rc=0 without calling kanban_complete or kanban_block — a protocol violation that the dispatcher detects as a fatal error, giving up after 1 failure and stranding downstream tasks. Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK and call kanban_block with a reason describing the exhaustion. The dispatcher then sees a clean block transition instead of a protocol violation, and the task can be retried or escalated by a human. Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget exhaustion without calling kanban_complete or kanban_block #23216	2026-05-11 06:44:58 -07:00
Frowtek	f6d4f3c37d	fix(kanban): route gateway create auto-subscribe to explicit board	2026-05-11 06:44:58 -07:00
Teknium	228b7d27bd	fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms (#23597) When an auxiliary provider returns HTTP 402 (credit / payment), every subsequent compression / title-gen / session-search / vision call still re-tried it as the FIRST entry in the chain -- burning ~1 RTT to hit 402 again, then falling back. On a long Discord/LCM session that meant dozens of doomed 402s per minute (issue #23570). Add a per-process unhealthy-provider cache with a 10 min TTL. When any caller observes a payment error against a provider, the label is marked unhealthy and skipped by: * _resolve_auto Step-1 (main provider use-as-aux path) * _resolve_auto Step-2 (aggregator/fallback chain) * _try_payment_fallback (used by call_llm/acall_llm on first 402) Skip-logs are throttled to once per minute per label so a bursty session doesn't spam agent.log. Entries auto-expire so a topped-up account recovers without manual intervention. The cache is in-process only by design -- multi-profile users with different keys per profile must each hit the 402 once. Refs #23570	2026-05-10 22:43:14 -07:00
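A per-process unhealthy-provider cache with a TTL, as described in this commit, might look roughly like the sketch below. The class and function names (`UnhealthyProviderCache`, `healthy_chain`) are hypothetical and the injectable clock exists only to make the expiry testable; the 600-second default mirrors the 10 min TTL stated in the message. Log throttling is omitted.

```python
import time
from typing import Callable, Dict, List


class UnhealthyProviderCache:
    """In-process map of provider label -> expiry time (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def mark_unhealthy(self, label: str) -> None:
        """Record a payment error; the label is skipped until the TTL passes."""
        self._expires_at[label] = self._clock() + self._ttl

    def is_unhealthy(self, label: str) -> bool:
        expiry = self._expires_at.get(label)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires_at[label]  # expired: the provider gets retried
            return False
        return True


def healthy_chain(chain: List[str], cache: UnhealthyProviderCache) -> List[str]:
    """Drop providers currently marked unhealthy, preserving chain order."""
    return [label for label in chain if not cache.is_unhealthy(label)]
```

Expiring entries lazily on lookup means a topped-up account recovers on the first call after the TTL, with no background timer.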
0xbyt4	ace1c4ea8c	fix(discord): typing indicator task not cleaned up after API error When the Discord typing API call fails (rate limit, network error, 403), _typing_loop returns early but the stale task remains in _typing_tasks. Subsequent send_typing calls see the stale entry and skip, leaving no typing indicator for the rest of the agent invocation. Add finally block to _typing_loop to always remove the task from _typing_tasks on exit, whether from cancellation, error, or normal completion. This allows send_typing to create a fresh task. 3 new tests in test_discord_send.py: - Task removed after API error - Typing restartable after failure - stop_typing cleans up	2026-05-10 22:41:26 -07:00
Teknium	228a4d11ae	fix(config): warn loudly on YAML parse failure instead of silent default fallback (#23585) A YAML parse error in ~/.hermes/config.yaml caused load_config() to print one line to stdout (Warning: Failed to load config: ...) and silently fall back to DEFAULT_CONFIG, dropping every user override (auxiliary providers, fallback chain, model settings). Users only noticed when downstream behavior misbehaved -- see issue #23570 where a tab-indent error in the auxiliary section caused aux fallback to use OpenRouter (depleted) instead of the configured Codex/MiniMax chain. Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam, and re-warn after the user edits the file. Both call sites (raw read + merged load) route through the same helper. Refs #23570	2026-05-10 22:36:19 -07:00
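The dedup-on-`(path, mtime_ns, size)` behaviour can be sketched as below. The helper name `warn_config_parse_failure`, the message wording, and the module-level `_warned` set are assumptions for illustration; the real helper in the repository may differ.

```python
import logging
import os
import sys
from typing import Set, Tuple

logger = logging.getLogger(__name__)
_warned: Set[Tuple[str, int, int]] = set()  # file versions already reported


def warn_config_parse_failure(path: str, error: Exception) -> bool:
    """Warn once per file version; return True if a warning was emitted."""
    try:
        stat = os.stat(path)
        key = (path, stat.st_mtime_ns, stat.st_size)
    except OSError:
        key = (path, -1, -1)  # file vanished: still warn once for this path
    if key in _warned:
        return False
    _warned.add(key)
    message = f"Failed to parse {path}: {error}; falling back to defaults"
    logger.warning(message)                        # visible in the log file
    print(f"WARNING: {message}", file=sys.stderr)  # visible in the terminal
    return True
```

Keying on modification time and size means an edited file produces a fresh warning if it still fails to parse, while repeated loads of the same bytes stay quiet.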
Gutslabs	3af3c4eb8c	fix(misc): three small defensive fixes from PR #1974 Salvages the three substantive low-severity fixes from Gutslabs' #1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-10 22:28:01 -07:00
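The temp-file cleanup in the first fix can be illustrated with a small sketch. The function `materialize_data_url` is a hypothetical simplification of `_materialize_data_url_for_vision`; the `.bin` suffix and the strict `validate=True` decoding are assumptions made so that a corrupt payload reliably raises.

```python
import base64
import binascii
import os
import tempfile


def materialize_data_url(data_url: str) -> str:
    """Write a base64 data URL's payload to a temp file and return its path."""
    _, _, payload = data_url.partition(",")
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
    try:
        with tmp:  # closes the handle even when decoding fails
            tmp.write(base64.b64decode(payload, validate=True))
    except (binascii.Error, ValueError):
        os.unlink(tmp.name)  # do not leave the orphaned file behind
        raise
    return tmp.name
```

Since `delete=False` disables automatic removal, the `except` branch is the only place the file is cleaned up on failure; on success the caller owns the returned path.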
teknium1	edb4a2bda5	test(telegram): cover env-clamped helper + adaptive text-batch tiers - New tests/gateway/test_telegram_text_batch_perf.py: TestEnvFloatClamped — 7 tests covering default-when-unset, valid parse, garbage fallback, NaN rejection, Inf rejection, min-clamp, max-clamp. Asserts asyncio.sleep() always gets a finite number. TestAdaptiveTextBatchTiers — 4 tests covering the tier-constant invariants and the min(cap, tier_delay) composition rule. - tests/gateway/test_display_config.py: update assertions for Telegram's new tool_progress='new' default.	2026-05-10 22:22:25 -07:00
Hugo Sqr	f2e8ed2405	Add unit tests for hyperliquid skill functionality - Implement tests for normalizing perpetual markets and DEXs. - Validate JSON output for main commands including markets, candles, and review. - Ensure environment variable resolution and dotenv file reading are covered. - Test export functionality for market data with expected output structure.	2026-05-10 22:15:04 -07:00
Teknium1	28b4fe6007	test: stabilize quick-command redaction test against xdist ordering agent.redact._REDACT_ENABLED is snapshotted at import time from HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker can flip it, so test_exec_command_output_is_redacted was order-dependent. Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.	2026-05-10 22:12:23 -07:00
0xbyt4	f6736ced81	fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages 1. Quick command exec ran in the gateway process's full environment without env sanitization or output redaction. A quick command like "env" or "printenv" would leak all API keys, OAuth tokens, and bot credentials to the messaging user. Fix: apply _sanitize_subprocess_env() before exec and redact_sensitive_text() on output before returning. 2. GatewayRunner._pending_messages was written on every interrupt (lines 1331-1334) but never read or consumed anywhere. The actual interrupt delivery uses adapter._pending_messages (a separate dict). Removed the write-only accumulation to prevent unbounded growth.	2026-05-10 22:12:23 -07:00
teknium1	82352e54c4	test(telegram): regression coverage for edit overflow split-and-deliver Two new tests: - tests/gateway/test_telegram_format.py test_message_too_long_splits_into_continuations_not_silent_truncation: asserts edit_message returns success=True with continuation_message_ids populated and message_id pointing at the last continuation when content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original fail-on-overflow assertion with the split-and-deliver contract. - tests/gateway/test_stream_consumer.py TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver: asserts the consumer side updates _message_id to the latest continuation, clears _last_sent_text, and fires on_new_message when the adapter reports a split-and-deliver result.	2026-05-10 22:02:56 -07:00
Teknium	3b122cc1ac	feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578) Surface ready tasks that nobody claims within a threshold (default 30 min) regardless of why. One identity-agnostic signal that catches: - Operator typo'd the assignee - Profile was deleted, leaving its tasks stranded - External worker pool (Codex CLI lane, custom daemon) is down - Dispatcher misconfigured (wrong board / wrong HERMES_HOME) Today the dispatcher correctly skips these (no respawn loop, good) but nothing surfaces the fact that operator-actionable work is accumulating. The new `stranded_in_ready` rule does that without requiring a manual lane registry -- it reads the most recent ready-transition event (`created` / `promoted` / `reclaimed` / `unblocked`) and fires when (now - last_ready_ts) > threshold. Severity escalates with age: warning at threshold, error at 2x, critical at 6x. The cli_hint and reassign actions point operators at the right next step. Out of scope deliberately: - Lane registry (#20157 closed) -- this signal supersedes it. - Pushing the diagnostic into messaging gateways -- diagnostics are pull-only via 'hermes kanban diagnostics' for now; gateway push is a separate UX decision. Tests: 10 new + 461 existing kanban tests pass. E2E verified end-to-end via 'hermes kanban diagnostics --json' against a 2h-old stranded task -- surfaces as error severity with correct actions.	2026-05-10 21:58:44 -07:00
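The age-based escalation rule can be written as a small function. This is a sketch under stated assumptions: the name `stranded_severity` is hypothetical, and the use of strict `>` at the 2x and 6x boundaries is a guess (the message only specifies `>` for the base threshold).

```python
from typing import Optional


def stranded_severity(age_seconds: float,
                      threshold_seconds: float = 30 * 60) -> Optional[str]:
    """Map time spent in `ready` to a severity, or None under the threshold."""
    if age_seconds > 6 * threshold_seconds:
        return "critical"
    if age_seconds > 2 * threshold_seconds:
        return "error"
    if age_seconds > threshold_seconds:
        return "warning"
    return None
```

With the default 30 min threshold, a task that has been ready for 2 hours (4x) maps to `error`, which agrees with the end-to-end check reported in the commit message.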
eloklam	b60462a205	test(kanban): remove stale t.summary assertion from search test Task.summary was never a real field; latest_summary already covers it. Matches the haystack cleanup in commit f3015e6ab.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	0ea234e093	feat(kanban): dashboard batch QOL upgrade - Shift-click range selection, column select-all, select-all-visible - Multi-card drag/drop via selectedIds + /tasks/bulk - Expanded bulk actions: todo/ready/blocked/unblock/complete/archive, priority setter, reassign with reclaim_first checkbox - Partial failure card highlight (failedIds + hermes-kanban-card--failed) - Search expanded to body, result, latest_summary, summary - Clear filters button + reset all filters on board switch - Accessibility: larger checkbox hit target, tabIndex/role/aria-label, Enter/Space/Esc keyboard handlers - Fix temporal-dead-zone bug: move clearSelected before moveSelected	2026-05-10 21:44:37 -07:00
Teknium	a63a2b7c78	fix(goals): force judge to use tool calls instead of JSON-text replies (#23547) Live-tested on gemini-3-flash-preview the judge kept returning empty or non-JSON content, tripping the consecutive-parse-failures auto-pause. Free-form JSON output is hopeful; tool-call schemas are enforced server-side by virtually every modern provider. Two new tools the judge calls: - submit_checklist(items) -- Phase A, decompose - update_checklist(updates, new_items, reason) -- Phase B, evaluate Both phases now call the auxiliary client with tool_choice forcing the right tool. read_file remains for Phase B history inspection, with the loop exiting only when update_checklist is called or the read budget is exhausted (at which point read_file is dropped from the toolbox and update_checklist is forced). Robustness: - _call_judge_with_tool_choice falls back tool_choice forced→required→auto if the provider rejects a particular shape. - If a fully-broken provider still returns content instead of a tool call, the legacy JSON-text parsers stay around as a last-ditch backstop so we never silently lose a checklist. - _normalize_update_args replaces the JSON parser for the apply layer; same 1-based→0-based conversion + terminal-status filter. Live verification: same fizzbuzz goal that was hitting 'judge model returned unparseable output 3 turns in a row' before now terminates in 2 turns, all 11 items marked completed with item-specific evidence, no auto-pause. Agent log shows 'produced 11 checklist items via tool call' instead of the JSON-parse path. Tests: 7 new cases for the tool-call path (Phase A success, Phase B update only, Phase B read_file→update, JSON-content backstop, empty-text item dropping, non-terminal status filter).	2026-05-10 20:51:40 -07:00
Teknium	4a080b1d5a	fix(goals): forward standing /goal state on auto-compression session rotation (#23530) When run_agent's _compress_context fires mid-turn it ends the parent session in SessionDB and creates a new continuation session with a fresh session_id. The /goal state is keyed on session_id in state_meta ("goal:<sid>"), so without forwarding the goal silently disappears: _get_goal_manager() rebinds for the new session_id, load_goal() returns None, mgr.is_active() is False, and the continuation loop dies with no user-visible signal. Fix: in the same SessionDB transaction block that creates the continuation session, copy state_meta[goal:<old>] → state_meta[goal:<new>] when present. No-op when the user has no active goal. Logged at INFO so a stuck loop is debuggable. Tests cover the round-trip via SessionDB and the no-op path. Affects all three run-conversation surfaces (CLI, gateway, TUI gateway) because _compress_context is the single rotation site.	2026-05-10 20:41:53 -07:00
Mike Nguyen	ba5640fa11	fix(gateway): route kanban notifications to creator profile	2026-05-10 20:04:53 -07:00
teknium1	7f90141c63	test(telegram): native-draft transport coverage + docs Added tests/gateway/test_stream_consumer_draft.py with 11 tests covering: - Transport selection: auto+dm-supported -> draft; auto+group -> edit; explicit edit; explicit draft on unsupported adapter -> edit; MagicMock adapter -> edit (back-compat for the existing test suite). - Happy path: DM stream animates draft frames with a single shared draft_id, then finalizes via a regular adapter.send. - Group fallback: drafts entirely skipped in non-DM chats. - Failure fallback: send_draft returning success=False disables drafts for the rest of the response. - Draft_id lifecycle: consecutive responses use distinct ids; tool boundaries bump the id so post-tool text animates fresh below the tool-progress bubble (the openclaw #32535 leak guard). - _already_sent contract: drafts must NOT set the flag so the gateway's fallback final-send still fires (drafts have no message_id). Updated website/docs/user-guide/messaging/telegram.md with a 'Streaming transport' section explaining auto|draft|edit|off, the DM-only constraint, and the per-response fallback behaviour.	2026-05-10 20:02:50 -07:00
Teknium	771b8c4a36	test(conftest): plug every gateway-kill leak path (#23486) The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg and a narrow subset of subprocess invocations. Tests still SIGTERMed the live gateway today (May 10) because the guard had structural holes. Plug them all: - subprocess: also wrap getoutput, getstatusoutput - os.system, os.popen - completely unwrapped before - pty.spawn - completely unwrapped before - asyncio.create_subprocess_exec / create_subprocess_shell - bypassed the subprocess module entirely; now wrapped - Subprocess command inspection now looks at the WHOLE command string, not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c 'systemctl', setsid systemctl, /usr/bin/systemctl, etc. - New process-killer block: pkill / killall / taskkill / fuser targeting hermes/python patterns is now refused - os.kill PID 0 (own group) allowed; PID -1 (every process we can signal) refused - subprocess.Popen wrapper preserves __class_getitem__ so third-party packages that use Popen[bytes] as a type annotation still import Coverage is locked in by tests/test_live_system_guard_self_test.py - exercises every primitive against a guaranteed-foreign PID and asserts the guard fires. Adding a new kill primitive without updating the guard breaks CI. scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py when present (developer-machine convenience), so even worktrees that predate this commit get the protection on subsequent test runs through the canonical wrapper.	2026-05-10 18:55:28 -07:00
Teknium	e5bce320db	fix(auxiliary): evict cached client on timeout/connection error (#23482) A Codex auxiliary timeout closes the underlying OpenAI client (so the streaming hang doesn't sit until the user kills the session), but the cached wrapper kept pointing at the now-dead transport. Subsequent auxiliary calls (compression retry, memory flush, background review, title generation routed via provider: main) reused that closed client and failed fast with 'Connection error' until the gateway restarted, even though the main agent route was healthy the whole time. Sync `_get_cached_client` had no liveness check (async did, via loop identity), and the connection-error fallback in `call_llm` only fired on the auto provider path, so an explicit provider (including the common `auxiliary.compression.provider: main` shape) never evicted. Three fixes: * New `_evict_cached_client_instance(target)` helper that drops the cache entry whose stored client is target (or wraps it via `_real_client`, for `CodexAuxiliaryClient`). * `_CodexCompletionsAdapter._close_client_on_timeout` evicts the wrapper after closing the inner OpenAI client. * `call_llm` and `async_call_llm` evict on `_is_connection_error` before re-raising, regardless of whether the provider is auto. Net effect: one timeout costs one summary attempt + the existing 30s compressor cooldown; the next compaction rebuilds the client and works. Non-connection errors (4xx/5xx) do not evict, so cache hits stay stable. Closes #23432	2026-05-10 18:55:05 -07:00
rahimsais	737314fe91	fix(telegram): normalize dm threads and retry control sends Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id issue (#3206): 1. _build_message_event filters DM thread_ids: only preserve thread_id for real topic messages (is_topic_message=True). Telegram puts message_thread_id on every DM that is a reply, but reply-chain ids route to nonexistent threads on send. 2. _send_message_with_thread_fallback helper: control sends (send_update_prompt, send_exec_approval / send_slash_confirm, send_model_picker) retry once without message_thread_id when Telegram returns BadRequest 'Message thread not found'. Mirrors the pattern PR #3390 added for the streaming send path. Salvage notes: - Conflict 1 (line ~4099): merged the contributor's DM is_topic_message filter with the existing forum General-topic default from #22423, preserving both behaviors. - Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416) alongside the new helper. Tightened the helper's exception catch from bare 'Exception' to use the existing _is_bad_request_error + _is_thread_not_found_error helpers (line 484-496) for consistency with the streaming send path. - Widened the fix to send_update_prompt (was bare self._bot.send_message, same bug class). Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@ local commit author).	2026-05-10 18:09:31 -07:00
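The retry-once behaviour of the control-send helper can be sketched as below. This is a simplified synchronous illustration (the real send path is presumably async); `ThreadNotFoundError`, `send_with_thread_fallback`, and `fake_send` are hypothetical names, not the repository's code.

```python
class ThreadNotFoundError(Exception):
    """Hypothetical stand-in for BadRequest 'Message thread not found'."""


def send_with_thread_fallback(send, **kwargs):
    """Call `send`; on a thread-not-found error, retry once without the thread id."""
    try:
        return send(**kwargs)
    except ThreadNotFoundError:
        if kwargs.get("message_thread_id") is None:
            raise  # nothing to strip, so a retry would fail the same way
        retry_kwargs = {k: v for k, v in kwargs.items() if k != "message_thread_id"}
        return send(**retry_kwargs)


calls = []


def fake_send(**kwargs):
    """Test double: rejects any call that carries a message_thread_id."""
    calls.append(kwargs)
    if "message_thread_id" in kwargs:
        raise ThreadNotFoundError("Message thread not found")
    return "sent"


result = send_with_thread_fallback(
    fake_send, chat_id=1, text="hi", message_thread_id=99
)
print(result, len(calls))
```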
Teknium	404640a2b7	feat(goals): /goal checklist + /subgoal user controls (#23456) * feat(goals): /goal checklist + /subgoal user controls Two-phase judge for /goal: Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong. Mechanics: - GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged). - Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails. - Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo. - Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty. - Status line shows M/N done counts. CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries. Tests: 24 new cases covering backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly. 
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge: 1. Off-by-one bug — the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip. 2. Conversation dump never happened — _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback. 3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find — burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier. Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.	2026-05-10 16:56:51 -07:00
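The index fix in finding 1 amounts to a single conversion at the parse boundary. A minimal sketch, assuming a hypothetical helper `to_zero_based`; dropping out-of-range indices is an assumption added here for safety, not something the commit states.

```python
from typing import Optional


def to_zero_based(judge_index: int, checklist_len: int) -> Optional[int]:
    """Map the 1-based index shown to the judge onto a 0-based list index."""
    if not 1 <= judge_index <= checklist_len:
        return None  # assumed policy: ignore updates that point outside the list
    return judge_index - 1


checklist = ["create file", "write fizzbuzz", "run tests"]
idx = to_zero_based(1, len(checklist))
print(checklist[idx])  # the judge's item "1." is the first list element
```

Converting once, where the judge's reply is parsed, keeps every downstream consumer of the checklist on plain 0-based indexing.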
teknium1	121bbe0385	test(stream-consumer): add UTF-16 overflow regression tests for #11170 New TestUtf16OverflowDetection class covers two scenarios: - test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds 2200 emoji codepoints (4400 UTF-16 units) — under Telegram's codepoint-equivalent limit but over its UTF-16 limit. Asserts truncate_message was called with len_fn=utf16_len, confirming the consumer detected the overflow. - test_codepoint_only_adapter_falls_back_to_len: documents that adapters which don't subclass BasePlatformAdapter (or test MagicMocks) fall back to plain len for backwards compat. The contributor's PR shipped no tests for the UTF-16 path.	2026-05-10 16:21:07 -07:00
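The gap between codepoint count and UTF-16 length that this test exercises can be reproduced in a few lines. A minimal sketch: the function is named after the `utf16_len` mentioned in the commit, but the body is an illustrative guess rather than the repository's code, and the 4096 limit is an assumed value used only for the comparison.

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units; characters outside the BMP (most emoji) count twice."""
    # Encode without a byte-order mark; every code unit is exactly two bytes.
    return len(text.encode("utf-16-le")) // 2


LIMIT = 4096  # assumed per-message limit, for illustration only

emoji_text = "\U0001F600" * 2200
codepoints = len(emoji_text)
units = utf16_len(emoji_text)
print(codepoints, units, codepoints <= LIMIT < units)
```

A consumer that measures with plain `len` would treat this text as fitting in one message, while a UTF-16-aware measure sees that it overflows.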
Teknium	c5f1f863ac	fix(cli): drive _prompt_text_input directly when off main thread (#23454) Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the process_loop daemon thread. prompt_toolkit.run_in_terminal returns a coroutine that only the main-thread event loop can drive, so calling it from a daemon thread orphans the coroutine: the input prompt never renders and user keystrokes leak into the composer instead of the confirmation prompt (issue #23185). Mirror the thread-aware guard already in _run_curses_picker: when off the main thread, fall back to a direct input() call. Also wrap run_in_terminal in try/except so WSL / Warp / other emulators that silently drop the scheduled coroutine fall back to input() too. Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main thread (run_in_terminal path), daemon thread (direct input fallback), no-app, run_in_terminal-raises, and EOF handling.	2026-05-10 16:16:10 -07:00
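The thread-aware guard can be sketched with the standard library alone. This is an illustration under stated assumptions: `prompt_text` and its two callable parameters are hypothetical, with `run_in_terminal_prompt` standing in for whatever drives prompt_toolkit on the main thread.

```python
import threading


def on_main_thread() -> bool:
    """True when the caller is running on the interpreter's main thread."""
    return threading.current_thread() is threading.main_thread()


def prompt_text(prompt, run_in_terminal_prompt, direct_input=input):
    """Use the terminal-aware prompt on the main thread, else plain input()."""
    if on_main_thread():
        try:
            return run_in_terminal_prompt(prompt)
        except Exception:
            pass  # emulator dropped or rejected the call; fall through
    try:
        return direct_input(prompt)
    except EOFError:
        return None  # no input available, treat as "no answer"


results = []
worker = threading.Thread(
    target=lambda: results.append(
        prompt_text("Confirm? ", lambda p: "via-terminal", lambda p: "via-input")
    ),
    daemon=True,
)
worker.start()
worker.join()
print(results)
```

Run from the daemon thread above, the call never touches the terminal-aware path and goes straight to the direct input fallback.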
konsisumer	62cfe79e93	fix(tools): clarify kanban_complete phantom-card retry guidance When kanban_complete rejects a created_cards list as hallucinated, the task is intentionally left in-flight (the gate runs before the write txn) so the worker can retry with a corrected list or pass created_cards=[] to skip the check. The retry path already worked, but the previous error wording read like a terminal failure and workers were observed abandoning the run instead of trying again. Spell out the recovery path explicitly in the tool_error response ("Your task is still in-flight ... Retry kanban_complete with ...") and add regression coverage at both the kernel and tool layers so the retry contract — and the wording the worker depends on to discover it — is pinned. Fixes #22923	2026-05-10 16:14:43 -07:00
konsisumer	88588b6159	fix(kanban): extend stale claim instead of killing live worker Workers running slow models (e.g. kimi-k2.6) can spend longer than DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making no tool calls and therefore not heartbeating. release_stale_claims previously reclaimed these healthy workers, producing the spawn-then-immediately-reclaim loop reported in #23025. When a stale-by-TTL claim's host-local worker PID is still alive, extend the claim (emit a claim_extended event) rather than killing it. enforce_max_runtime / detect_crashed_workers remain the upper bounds for genuinely wedged or dead workers. Reclaim events now also record claim_expires, last_heartbeat_at, worker_pid, and host_local so operators can see why a worker was killed.	2026-05-10 15:23:04 -07:00
Teknium	d6e1fadbf5	fix(xai): omit reasoning.effort for grok models that reject it (#23435) xAI's Responses API returns HTTP 400 ("Model X does not support parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-, grok-4-1-fast-, grok-3, grok-4.20-0309-, and grok-code-fast-1, even though those models reason natively. Hermes was unconditionally sending `reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking direct `--provider xai` for the entire grok-4 line. Add a substring allowlist predicate (verified live against api.x.ai 2026-05-10) covering the only Grok families that accept the effort dial: grok-3-mini, grok-4.20-multi-agent, grok-4.3. The Responses transport omits the `reasoning` key entirely for everything else while still including `reasoning.encrypted_content` so we capture native reasoning tokens. Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709` went from HTTP 400 to a successful reply.	2026-05-10 15:21:30 -07:00
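A substring allowlist of this kind might look like the sketch below. The three family names come from the commit message; the function names and the shape of the returned request fragment are assumptions for illustration, not the actual Hermes transport code.

```python
# Families the commit lists as accepting the effort dial.
EFFORT_CAPABLE_FAMILIES = ("grok-3-mini", "grok-4.20-multi-agent", "grok-4.3")


def supports_reasoning_effort(model: str) -> bool:
    """True when the model name contains one of the allowlisted family names."""
    name = model.lower()
    return any(family in name for family in EFFORT_CAPABLE_FAMILIES)


def build_request_extras(model: str, effort: str = "medium") -> dict:
    """Hypothetical request fragment: always ask for encrypted reasoning content,
    but only send the `reasoning` key to models that accept it."""
    extras = {"include": ["reasoning.encrypted_content"]}
    if supports_reasoning_effort(model):
        extras["reasoning"] = {"effort": effort}
    return extras


for model in ("grok-4-0709", "grok-3", "grok-3-mini", "grok-4.3"):
    print(model, "reasoning" in build_request_extras(model))
```

An allowlist fails safe here: an unknown future Grok model gets no `reasoning` key rather than a request the API may reject.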
teknium1	f9e0d60a99	test(thread-routing): handle both lark-SDK-present and absent paths The contributor's regression test for Feishu fallback thread routing asserted on attributes specific to the real lark SDK builder (call_args.body, body.receive_id). In test environments without the lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py _build_create_message_request) returns a SimpleNamespace using .request_body instead of .body, causing AttributeError. Now reads via getattr fallback and also verifies receive_id_type is 'thread_id' (not 'chat_id') as a stronger contract check.	2026-05-10 15:20:40 -07:00
黄飞虹	e164a9c1ed	fix(stream-consumer): preserve thread routing on overflow first-send path When the first streamed message exceeds the platform length limit and gets split into chunks, _send_new_chunk was called with self._message_id (which is None on first send), dropping thread routing entirely. Fallback to self._initial_reply_to_id so overflow chunks land in the correct topic/thread. Also fix a fragile test assertion that could be silently skipped.	2026-05-10 15:20:40 -07:00
hrygo	ff14666cdc	fix(gateway): stream consumer first message drops thread context Cherry-picked from PR #13077 commits: - 5500c7d8 fix(gateway): stream consumer first message drops thread context - e84403b9 test(gateway): add regression tests for stream consumer thread routing Fixes: Streaming first message drops thread/topic context in Feishu group topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and _send_new_chunk. Also fixes Feishu _send_raw_message fallback path (reply -> create) to use receive_id_type='thread_id' so the new message lands in the correct topic instead of the main channel. Authored by hrygo via PR #13077 (re-attributed from the bot-authored salvage commit on the original branch).	2026-05-10 15:20:40 -07:00
Teknium	6636fecd47	fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420) The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was copying the cumulative _already_sent flag into _final_response_sent on the done frame. _already_sent goes True on any successful prior edit (tool progress) or on fallback-mode promotion when an edit fails; neither proves the current chunked send delivered the final answer. When the chunked send actually fails (network error, flood control), the consumer would wrongly claim 'final delivered' and the gateway's independent fallback delivery in run.py would be suppressed. User saw only tool-progress bubbles and never got the answer. Now we track per-chunk success locally: _send_new_chunk returns the new message_id on success or returns the passed-in reply_to unchanged on failure. If at least one returned id differs, chunks_delivered = True; otherwise stays False, gateway fallback runs. Adds two regression tests: - test_split_overflow_failed_send_does_not_mark_final_sent: primes _already_sent=True, then makes every send fail; asserts _final_response_sent stays False. - test_split_overflow_partial_send_marks_final_sent: happy path, asserts _final_response_sent goes True. Note: the companion bug at the CancelledError handler (issue cited lines 417-418) was already fixed by `3b5572ded` on 2026-04-16. Closes #10748	2026-05-10 15:13:54 -07:00
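The per-chunk tracking rule can be sketched in isolation. This is a simplified synchronous illustration (the real consumer is presumably async); `send_chunks` and the two fake senders are hypothetical, and only the contract "return a new id on success, return `reply_to` unchanged on failure" is taken from the commit message.

```python
def send_chunks(chunks, send_new_chunk, reply_to):
    """Send each chunk; report True only if at least one send returned a new id."""
    chunks_delivered = False
    for chunk in chunks:
        new_id = send_new_chunk(chunk, reply_to)
        if new_id != reply_to:
            chunks_delivered = True
            reply_to = new_id  # chain the next chunk onto the one that landed
    return chunks_delivered


def failing_sender(chunk, reply_to):
    """Every send fails, so the passed-in id comes back unchanged."""
    return reply_to


def make_working_sender():
    """Each successful send hands back a fresh message id."""
    counter = {"next": 100}

    def sender(chunk, reply_to):
        counter["next"] += 1
        return counter["next"]

    return sender


all_failed = send_chunks(["part 1", "part 2"], failing_sender, reply_to=7)
all_sent = send_chunks(["part 1", "part 2"], make_working_sender(), reply_to=7)
print(all_failed, all_sent)
```

Because the result depends only on this send loop, an earlier successful edit cannot make a failed final delivery look delivered.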
Teknium	787e3c368c	test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests Follow-up to the previous commit's notifier behavior change. Two test fixes: 1. `tests/gateway/test_kanban_notifier.py` gains `test_notifier_redelivers_same_kind_on_dispatch_cycle`, which pins the new contract directly: a task that crashes, gets reclaimed, and crashes again notifies the user BOTH times. Before #21398 the second crash silently dropped because the subscription was already deleted. 2. `tests/hermes_cli/test_kanban_notify.py::test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]` is flipped. Those tests were added in the salvage of #22941 and asserted the OLD behavior (subscription deleted after gave_up / crashed / timed_out). They're now obsolete: the new contract is "subscription survives a non-final terminal event so retries reach the user." Updated docstring + asserts; the cursor-advance check is added to confirm the dedup mechanism still works. The `test_notifier_unsubs_after_completed_event` test stays untouched because `completed` IS still a terminal event that triggers unsub (the task hits `done` status, which is handled by the `task_terminal` branch in the notifier loop).	2026-05-10 14:27:59 -07:00
teknium1	ec1fad3449	fix(gateway): align fallback delete with sibling style + add regression tests Follow-up to HuangYuChuh's #17384 cherry-pick: - Use defensive getattr+logger.debug for delete_message lookup, mirroring the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms that don't implement delete_message no longer raise AttributeError; the failure path now logs at debug for diagnosability instead of silently swallowing. - Add three regression tests in tests/gateway/test_stream_consumer.py: - delete_message awaited on happy-path exit with stale id - delete_message NOT awaited when no fallback chunks reached the user - no crash on adapters that lack delete_message (spec-restricted mock)	2026-05-10 14:22:59 -07:00
Teknium	cdb6e5e52a	test(conftest): block tests from killing the live hermes-gateway (#23397) The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest runs sending SIGTERM to the developer's live gateway 5+ times in 3 days. Root cause: when a single test forgets to mock os.kill or find_gateway_pids, the real call leaks past the hermetic HERMES_HOME isolation: find_gateway_pids' psutil scan walks the whole machine and returns the live gateway PID, then the unmocked os.kill delivers the signal. Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes, and stop_profile_gateway code paths, install a single autouse guard in tests/conftest.py that blocks the two primitives that actually cause the damage: - os.kill rejects any PID outside the test process subtree with a hard RuntimeError so the offending test gets a stack trace instead of silently murdering the real gateway. - subprocess.run / Popen / call / check_call / check_output reject any 'systemctl <verb> hermes-gateway' invocation that would mutate the live unit. Read-only systemctl calls (status, show, list-units) still pass through. We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids, because tests of those functions themselves need the real implementation. Discovery without delivery is harmless; the os.kill + systemctl guards catch the actual damage path. Tests that legitimately need real signal delivery (e.g. PTY tests signalling their own child) opt out via @pytest.mark.live_system_guard_bypass. Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce the same 17 failures with and without this guard (all pre-existing on main, unrelated to gateway-kill leaks). The live gateway survives the test run that previously SIGTERMed it.	2026-05-10 13:20:27 -07:00
Teknium	9c68d12079	test(kanban): cover send-exception rewind + drop noisy success log to debug Two follow-up improvements to the previous commit's notifier dedup work. 1. Add a regression test for the send-exception rewind path. The contributor's PR included a test for the adapter-disconnect path (test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where adapter is None at delivery time), but not for the "adapter is connected, send() raises" path that fires inside the inner try/except at gateway/run.py:4314. The new test (test_kanban_notifier_rewinds_claim_on_send_exception) uses a FailingAdapter that always raises and confirms (a) send was actually attempted, (b) the claim was rewound, (c) the next call to unseen_events_for_sub still returns the event for retry. 2. Drop the per-delivery success log from INFO to DEBUG. A busy board on a multi-platform gateway can produce hundreds of these per day; that's gateway.log noise that obscures real warnings. Failure paths stay at WARNING (where you'd want to look when something's wrong) so we don't lose visibility into transient send issues.	2026-05-10 13:19:41 -07:00
Mike Nguyen	861ce7c0b6	fix: dedupe kanban notifier delivery claims	2026-05-10 13:19:41 -07:00


3593 commits