hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

Author	SHA1	Message	Date
Teknium	228a4d11ae	fix(config): warn loudly on YAML parse failure instead of silent default fallback (#23585 ) A YAML parse error in ~/.hermes/config.yaml caused load_config() to print one line to stdout (Warning: Failed to load config: ...) and silently fall back to DEFAULT_CONFIG, dropping every user override (auxiliary providers, fallback chain, model settings). Users only noticed when downstream behavior misbehaved — see issue #23570 where a tab-indent error in the auxiliary section caused aux fallback to use OpenRouter (depleted) instead of the configured Codex/MiniMax chain. Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam, and re-warn after the user edits the file. Both call sites (raw read + merged load) route through the same helper. Refs #23570	2026-05-10 22:36:19 -07:00
Gutslabs	3af3c4eb8c	fix(misc): three small defensive fixes from PR #1974 Salvages the three substantive low-severity fixes from Gutslabs' #1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-10 22:28:01 -07:00
teknium1	edb4a2bda5	test(telegram): cover env-clamped helper + adaptive text-batch tiers - New tests/gateway/test_telegram_text_batch_perf.py: TestEnvFloatClamped — 7 tests covering default-when-unset, valid parse, garbage fallback, NaN rejection, Inf rejection, min-clamp, max-clamp. Asserts asyncio.sleep() always gets a finite number. TestAdaptiveTextBatchTiers — 4 tests covering the tier-constant invariants and the min(cap, tier_delay) composition rule. - tests/gateway/test_display_config.py: update assertions for Telegram's new tool_progress='new' default.	2026-05-10 22:22:25 -07:00
Hugo Sqr	f2e8ed2405	Add unit tests for hyperliquid skill functionality - Implement tests for normalizing perpetual markets and DEXs. - Validate JSON output for main commands including markets, candles, and review. - Ensure environment variable resolution and dotenv file reading are covered. - Test export functionality for market data with expected output structure.	2026-05-10 22:15:04 -07:00
Teknium1	28b4fe6007	test: stabilize quick-command redaction test against xdist ordering agent.redact._REDACT_ENABLED is snapshotted at import time from HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker can flip it, so test_exec_command_output_is_redacted was order-dependent. Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.	2026-05-10 22:12:23 -07:00
0xbyt4	f6736ced81	fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages 1. Quick command exec ran in the gateway process's full environment without env sanitization or output redaction. A quick command like "env" or "printenv" would leak all API keys, OAuth tokens, and bot credentials to the messaging user. Fix: apply _sanitize_subprocess_env() before exec and redact_sensitive_text() on output before returning. 2. GatewayRunner._pending_messages was written on every interrupt (lines 1331-1334) but never read or consumed anywhere. The actual interrupt delivery uses adapter._pending_messages (a separate dict). Removed the write-only accumulation to prevent unbounded growth.	2026-05-10 22:12:23 -07:00
teknium1	82352e54c4	test(telegram): regression coverage for edit overflow split-and-deliver Two new tests: - tests/gateway/test_telegram_format.py test_message_too_long_splits_into_continuations_not_silent_truncation: asserts edit_message returns success=True with continuation_message_ids populated and message_id pointing at the last continuation when content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original fail-on-overflow assertion with the split-and-deliver contract. - tests/gateway/test_stream_consumer.py TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver: asserts the consumer side updates _message_id to the latest continuation, clears _last_sent_text, and fires on_new_message when the adapter reports a split-and-deliver result.	2026-05-10 22:02:56 -07:00
Teknium	3b122cc1ac	feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578 ) Surface ready tasks that nobody claims within a threshold (default 30 min) regardless of why. One identity-agnostic signal that catches: - Operator typo'd the assignee - Profile was deleted, leaving its tasks stranded - External worker pool (Codex CLI lane, custom daemon) is down - Dispatcher misconfigured (wrong board / wrong HERMES_HOME) Today the dispatcher correctly skips these (no respawn loop, good) but nothing surfaces the fact that operator-actionable work is accumulating. The new `stranded_in_ready` rule does that without requiring a manual lane registry — it reads the most recent ready- transition event (`created` / `promoted` / `reclaimed` / `unblocked`) and fires when (now - last_ready_ts) > threshold. Severity escalates with age: warning at threshold, error at 2x, critical at 6x. The cli_hint and reassign actions point operators at the right next step. Out of scope deliberately: - Lane registry (#20157 closed) — this signal supersedes it. - Pushing the diagnostic into messaging gateways — diagnostics are pull-only via 'hermes kanban diagnostics' for now; gateway push is a separate UX decision. Tests: 10 new + 461 existing kanban tests pass. E2E verified end- to-end via 'hermes kanban diagnostics --json' against a 2h-old stranded task — surfaces as error severity with correct actions.	2026-05-10 21:58:44 -07:00
eloklam	b60462a205	test(kanban): remove stale t.summary assertion from search test Task.summary was never a real field; latest_summary already covers it. Matches the haystack cleanup in commit f3015e6ab.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	0ea234e093	feat(kanban): dashboard batch QOL upgrade - Shift-click range selection, column select-all, select-all-visible - Multi-card drag/drop via selectedIds + /tasks/bulk - Expanded bulk actions: todo/ready/blocked/unblock/complete/archive, priority setter, reassign with reclaim_first checkbox - Partial failure card highlight (failedIds + hermes-kanban-card--failed) - Search expanded to body, result, latest_summary, summary - Clear filters button + reset all filters on board switch - Accessibility: larger checkbox hit target, tabIndex/role/aria-label, Enter/Space/Esc keyboard handlers - Fix temporal-dead-zone bug: move clearSelected before moveSelected	2026-05-10 21:44:37 -07:00
Teknium	a63a2b7c78	fix(goals): force judge to use tool calls instead of JSON-text replies (#23547 ) Live-tested on gemini-3-flash-preview the judge kept returning empty or non-JSON content, tripping the consecutive-parse-failures auto- pause. Free-form JSON output is hopeful; tool-call schemas are enforced server-side by virtually every modern provider. Two new tools the judge calls: - submit_checklist(items) — Phase A, decompose - update_checklist(updates, new_items, reason) — Phase B, evaluate Both phases now call the auxiliary client with tool_choice forcing the right tool. read_file remains for Phase B history inspection, with the loop exiting only when update_checklist is called or the read budget is exhausted (at which point read_file is dropped from the toolbox and update_checklist is forced). Robustness: - _call_judge_with_tool_choice falls back tool_choice forced→required→ auto if the provider rejects a particular shape. - If a fully-broken provider still returns content instead of a tool call, the legacy JSON-text parsers stay around as a last-ditch backstop so we never silently lose a checklist. - _normalize_update_args replaces the JSON parser for the apply layer; same 1-based→0-based conversion + terminal-status filter. Live verification: same fizzbuzz goal that was hitting 'judge model returned unparseable output 3 turns in a row' before now terminates in 2 turns, all 11 items marked completed with item-specific evidence, no auto-pause. Agent log shows 'produced 11 checklist items via tool call' instead of the JSON- parse path. Tests: 7 new cases for the tool-call path (Phase A success, Phase B update only, Phase B read_file→update, JSON-content backstop, empty-text item dropping, non-terminal status filter).	2026-05-10 20:51:40 -07:00
Teknium	4a080b1d5a	fix(goals): forward standing /goal state on auto-compression session rotation (#23530 ) When run_agent's _compress_context fires mid-turn it ends the parent session in SessionDB and creates a new continuation session with a fresh session_id. The /goal state is keyed on session_id in state_meta ("goal:<sid>"), so without forwarding the goal silently disappears: _get_goal_manager() rebinds for the new session_id, load_goal() returns None, mgr.is_active() is False, and the continuation loop dies with no user-visible signal. Fix: in the same SessionDB transaction block that creates the continuation session, copy state_meta[goal:<old>] → state_meta[goal:<new>] when present. No-op when the user has no active goal. Logged at INFO so a stuck loop is debuggable. Tests cover the round-trip via SessionDB and the no-op path. Affects all three run-conversation surfaces (CLI, gateway, TUI gateway) because _compress_context is the single rotation site.	2026-05-10 20:41:53 -07:00
Mike Nguyen	ba5640fa11	fix(gateway): route kanban notifications to creator profile	2026-05-10 20:04:53 -07:00
teknium1	7f90141c63	test(telegram): native-draft transport coverage + docs Added tests/gateway/test_stream_consumer_draft.py with 11 tests covering: - Transport selection: auto+dm-supported -> draft; auto+group -> edit; explicit edit; explicit draft on unsupported adapter -> edit; MagicMock adapter -> edit (back-compat for the existing test suite). - Happy path: DM stream animates draft frames with a single shared draft_id, then finalizes via a regular adapter.send. - Group fallback: drafts entirely skipped in non-DM chats. - Failure fallback: send_draft returning success=False disables drafts for the rest of the response. - Draft_id lifecycle: consecutive responses use distinct ids; tool boundaries bump the id so post-tool text animates fresh below the tool-progress bubble (the openclaw #32535 leak guard). - _already_sent contract: drafts must NOT set the flag so the gateway's fallback final-send still fires (drafts have no message_id). Updated website/docs/user-guide/messaging/telegram.md with a 'Streaming transport' section explaining auto\|draft\|edit\|off, the DM-only constraint, and the per-response fallback behaviour.	2026-05-10 20:02:50 -07:00
Teknium	771b8c4a36	test(conftest): plug every gateway-kill leak path (#23486 ) The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg and a narrow subset of subprocess invocations. Tests still SIGTERMed the live gateway today (May 10) because the guard had structural holes. Plug them all: - subprocess: also wrap getoutput, getstatusoutput - os.system, os.popen - completely unwrapped before - pty.spawn - completely unwrapped before - asyncio.create_subprocess_exec / create_subprocess_shell - bypassed the subprocess module entirely; now wrapped - Subprocess command inspection now looks at the WHOLE command string, not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c 'systemctl', setsid systemctl, /usr/bin/systemctl, etc. - New process-killer block: pkill / killall / taskkill / fuser targeting hermes/python patterns is now refused - os.kill PID 0 (own group) allowed; PID -1 (every process we can signal) refused - subprocess.Popen wrapper preserves __class_getitem__ so third-party packages that use Popen[bytes] as a type annotation still import Coverage is locked in by tests/test_live_system_guard_self_test.py - exercises every primitive against a guaranteed-foreign PID and asserts the guard fires. Adding a new kill primitive without updating the guard breaks CI. scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py when present (developer-machine convenience), so even worktrees that predate this commit get the protection on subsequent test runs through the canonical wrapper.	2026-05-10 18:55:28 -07:00
Teknium	e5bce320db	fix(auxiliary): evict cached client on timeout/connection error (#23482 ) A Codex auxiliary timeout closes the underlying OpenAI client (so the streaming hang doesn't sit until the user kills the session), but the cached wrapper kept pointing at the now-dead transport. Subsequent auxiliary calls (compression retry, memory flush, background review, title generation routed via provider: main) reused that closed client and failed fast with 'Connection error' until the gateway restarted — even though the main agent route was healthy the whole time. Sync `_get_cached_client` had no liveness check (async did, via loop identity), and the connection-error fallback in `call_llm` only fired on the auto provider path, so an explicit provider — including the common `auxiliary.compression.provider: main` shape — never evicted. Three fixes: * New `_evict_cached_client_instance(target)` helper that drops the cache entry whose stored client is target (or wraps it via `_real_client`, for `CodexAuxiliaryClient`). * `_CodexCompletionsAdapter._close_client_on_timeout` evicts the wrapper after closing the inner OpenAI client. * `call_llm` and `async_call_llm` evict on `_is_connection_error` before re-raising, regardless of whether the provider is auto. Net effect: one timeout costs one summary attempt + the existing 30s compressor cooldown; the next compaction rebuilds the client and works. Non-connection errors (4xx/5xx) do not evict, so cache hits stay stable. Closes #23432	2026-05-10 18:55:05 -07:00
rahimsais	737314fe91	fix(telegram): normalize dm threads and retry control sends Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id issue (#3206): 1. _build_message_event filters DM thread_ids: only preserve thread_id for real topic messages (is_topic_message=True). Telegram puts message_thread_id on every DM that is a reply, but reply-chain ids route to nonexistent threads on send. 2. _send_message_with_thread_fallback helper: control sends (send_update_prompt, send_exec_approval / send_slash_confirm, send_model_picker) retry once without message_thread_id when Telegram returns BadRequest 'Message thread not found'. Mirrors the pattern PR #3390 added for the streaming send path. Salvage notes: - Conflict 1 (line ~4099): merged the contributor's DM is_topic_message filter with the existing forum General-topic default from #22423, preserving both behaviors. - Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416) alongside the new helper. Tightened the helper's exception catch from bare 'Exception' to use the existing _is_bad_request_error + _is_thread_not_found_error helpers (line 484-496) for consistency with the streaming send path. - Widened the fix to send_update_prompt (was bare self._bot.send_message, same bug class). Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@ local commit author).	2026-05-10 18:09:31 -07:00
Teknium	404640a2b7	feat(goals): /goal checklist + /subgoal user controls (#23456 ) * feat(goals): /goal checklist + /subgoal user controls Two-phase judge for /goal — Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong. Mechanics: - GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged). - Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails. - Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo. - Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty. - Status line shows M/N done counts. CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries. Tests: 24 new cases — backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly. * fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge: 1. Off-by-one bug — the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip. 2. Conversation dump never happened — _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback. 3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find — burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier. Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.	2026-05-10 16:56:51 -07:00
teknium1	121bbe0385	test(stream-consumer): add UTF-16 overflow regression tests for #11170 New TestUtf16OverflowDetection class covers two scenarios: - test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds 2200 emoji codepoints (4400 UTF-16 units) — under Telegram's codepoint-equivalent limit but over its UTF-16 limit. Asserts truncate_message was called with len_fn=utf16_len, confirming the consumer detected the overflow. - test_codepoint_only_adapter_falls_back_to_len: documents that adapters which don't subclass BasePlatformAdapter (or test MagicMocks) fall back to plain len for backwards compat. The contributor's PR shipped no tests for the UTF-16 path.	2026-05-10 16:21:07 -07:00
Teknium	c5f1f863ac	fix(cli): drive _prompt_text_input directly when off main thread (#23454 ) Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the process_loop daemon thread. prompt_toolkit.run_in_terminal returns a coroutine that only the main-thread event loop can drive, so calling it from a daemon thread orphans the coroutine — the input prompt never renders and user keystrokes leak into the composer instead of the confirmation prompt (issue #23185). Mirror the thread-aware guard already in _run_curses_picker: when off the main thread, fall back to a direct input() call. Also wrap run_in_terminal in try/except so WSL / Warp / other emulators that silently drop the scheduled coroutine fall back to input() too. Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main thread (run_in_terminal path), daemon thread (direct input fallback), no-app, run_in_terminal-raises, and EOF handling.	2026-05-10 16:16:10 -07:00
konsisumer	62cfe79e93	fix(tools): clarify kanban_complete phantom-card retry guidance When kanban_complete rejects a created_cards list as hallucinated, the task is intentionally left in-flight (the gate runs before the write txn) so the worker can retry with a corrected list or pass created_cards=[] to skip the check. The retry path already worked, but the previous error wording read like a terminal failure and workers were observed abandoning the run instead of trying again. Spell out the recovery path explicitly in the tool_error response ("Your task is still in-flight ... Retry kanban_complete with ...") and add regression coverage at both the kernel and tool layers so the retry contract — and the wording the worker depends on to discover it — is pinned. Fixes #22923	2026-05-10 16:14:43 -07:00
konsisumer	88588b6159	fix(kanban): extend stale claim instead of killing live worker Workers running slow models (e.g. kimi-k2.6) can spend longer than DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making no tool calls and therefore not heartbeating. release_stale_claims previously reclaimed these healthy workers, producing the spawn-then-immediately-reclaim loop reported in #23025. When a stale-by-TTL claim's host-local worker PID is still alive, extend the claim (emit a claim_extended event) rather than killing it. enforce_max_runtime / detect_crashed_workers remain the upper bounds for genuinely wedged or dead workers. Reclaim events now also record claim_expires, last_heartbeat_at, worker_pid, and host_local so operators can see why a worker was killed.	2026-05-10 15:23:04 -07:00
Teknium	d6e1fadbf5	fix(xai): omit reasoning.effort for grok models that reject it (#23435 ) xAI's Responses API returns HTTP 400 ("Model X does not support parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-, grok-4-1-fast-, grok-3, grok-4.20-0309-, and grok-code-fast-1 — even though those models reason natively. Hermes was unconditionally sending `reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking direct `--provider xai` for the entire grok-4 line. Add a substring allowlist predicate (verified live against api.x.ai 2026-05-10) covering the only Grok families that accept the effort dial: grok-3-mini, grok-4.20-multi-agent, grok-4.3. The Responses transport omits the `reasoning` key entirely for everything else while still including `reasoning.encrypted_content` so we capture native reasoning tokens. Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709` went from HTTP 400 to a successful reply.	2026-05-10 15:21:30 -07:00
teknium1	f9e0d60a99	test(thread-routing): handle both lark-SDK-present and absent paths The contributor's regression test for Feishu fallback thread routing asserted on attributes specific to the real lark SDK builder (call_args.body, body.receive_id). In test environments without the lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py _build_create_message_request) returns a SimpleNamespace using .request_body instead of .body, causing AttributeError. Now reads via getattr fallback and also verifies receive_id_type is 'thread_id' (not 'chat_id') as a stronger contract check.	2026-05-10 15:20:40 -07:00
黄飞虹	e164a9c1ed	fix(stream-consumer): preserve thread routing on overflow first-send path When the first streamed message exceeds the platform length limit and gets split into chunks, _send_new_chunk was called with self._message_id (which is None on first send), dropping thread routing entirely. Fallback to self._initial_reply_to_id so overflow chunks land in the correct topic/thread. Also fix a fragile test assertion that could be silently skipped.	2026-05-10 15:20:40 -07:00
hrygo	ff14666cdc	fix(gateway): stream consumer first message drops thread context Cherry-picked from PR #13077 commits: - 5500c7d8 fix(gateway): stream consumer first message drops thread context - e84403b9 test(gateway): add regression tests for stream consumer thread routing Fixes: Streaming first message drops thread/topic context in Feishu group topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and _send_new_chunk. Also fixes Feishu _send_raw_message fallback path (reply -> create) to use receive_id_type='thread_id' so the new message lands in the correct topic instead of the main channel. Authored by hrygo via PR #13077 (re-attributed from the bot-authored salvage commit on the original branch).	2026-05-10 15:20:40 -07:00
Teknium	6636fecd47	fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420 ) The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was copying the cumulative _already_sent flag into _final_response_sent on the done frame. _already_sent goes True on any successful prior edit (tool progress) or on fallback-mode promotion when an edit fails — neither proves the current chunked send delivered the final answer. When the chunked send actually fails (network error, flood control), the consumer would wrongly claim 'final delivered' and the gateway's independent fallback delivery in run.py would be suppressed. User saw only tool-progress bubbles and never got the answer. Now we track per-chunk success locally: _send_new_chunk returns the new message_id on success or returns the passed-in reply_to unchanged on failure. If at least one returned id differs, chunks_delivered = True; otherwise stays False, gateway fallback runs. Adds two regression tests: - test_split_overflow_failed_send_does_not_mark_final_sent — primes _already_sent=True, then makes every send fail; asserts _final_response_sent stays False. - test_split_overflow_partial_send_marks_final_sent — happy path, asserts _final_response_sent goes True. Note: the companion bug at the CancelledError handler (issue cited lines 417-418) was already fixed by `3b5572ded` on 2026-04-16. Closes #10748	2026-05-10 15:13:54 -07:00
Teknium	787e3c368c	test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests Follow-up to the previous commit's notifier behavior change. Two test fixes: 1. `tests/gateway/test_kanban_notifier.py` gains `test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new contract directly: a task that crashes, gets reclaimed, and crashes again notifies the user BOTH times. Before #21398 the second crash silently dropped because the subscription was already deleted. 2. `tests/hermes_cli/test_kanban_notify.py:: test_notifier_unsubs_after_abnormal_events[gave_up\|crashed\|timed_out]` is flipped. Those tests were added in the salvage of #22941 and asserted the OLD behavior (subscription deleted after gave_up / crashed / timed_out). They're now obsolete — the new contract is "subscription survives a non-final terminal event so retries reach the user." Updated docstring + asserts; the cursor-advance check is added to confirm the dedup mechanism still works. The `test_notifier_unsubs_after_completed_event` test stays untouched because `completed` IS still a terminal event that triggers unsub (the task hits `done` status, which is handled by the `task_terminal` branch in the notifier loop).	2026-05-10 14:27:59 -07:00
teknium1	ec1fad3449	fix(gateway): align fallback delete with sibling style + add regression tests Follow-up to HuangYuChuh's #17384 cherry-pick: - Use defensive getattr+logger.debug for delete_message lookup, mirroring the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms that don't implement delete_message no longer raise AttributeError; the failure path now logs at debug for diagnosability instead of silently swallowing. - Add three regression tests in tests/gateway/test_stream_consumer.py: - delete_message awaited on happy-path exit with stale id - delete_message NOT awaited when no fallback chunks reached the user - no crash on adapters that lack delete_message (spec-restricted mock)	2026-05-10 14:22:59 -07:00
Teknium	cdb6e5e52a	test(conftest): block tests from killing the live hermes-gateway (#23397 ) The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest runs sending SIGTERM to the developer's live gateway 5+ times in 3 days. Root cause: when a single test forgets to mock os.kill or find_gateway_pids, the real call leaks past the hermetic HERMES_HOME isolation — find_gateway_pids' psutil scan walks the whole machine and returns the live gateway PID, then the unmocked os.kill delivers the signal. Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes, and stop_profile_gateway code paths, install a single autouse guard in tests/conftest.py that blocks the two primitives that actually cause the damage: - os.kill rejects any PID outside the test process subtree with a hard RuntimeError so the offending test gets a stack trace instead of silently murdering the real gateway. - subprocess.run / Popen / call / check_call / check_output reject any 'systemctl <verb> hermes-gateway' invocation that would mutate the live unit. Read-only systemctl calls (status, show, list-units) still pass through. We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids — tests of those functions themselves need the real implementation. Discovery without delivery is harmless; the os.kill + systemctl guards catch the actual damage path. Tests that legitimately need real signal delivery (e.g. PTY tests signalling their own child) opt out via @pytest.mark.live_system_guard_bypass. Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce the same 17 failures with and without this guard (all pre-existing on main, unrelated to gateway-kill leaks). The live gateway survives the test run that previously SIGTERMed it.	2026-05-10 13:20:27 -07:00
Teknium	9c68d12079	test(kanban): cover send-exception rewind + drop noisy success log to debug Two follow-up improvements to the previous commit's notifier dedup work. 1. Add a regression test for the send-exception rewind path. The contributor's PR included a test for the adapter-disconnect path (test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where adapter is None at delivery time), but not for the "adapter is connected, send() raises" path that fires inside the inner try/except at gateway/run.py:4314. The new test (test_kanban_notifier_rewinds_claim_on_send_exception) uses a FailingAdapter that always raises and confirms (a) send was actually attempted, (b) the claim was rewound, (c) the next call to unseen_events_for_sub still returns the event for retry. 2. Drop the per-delivery success log from INFO to DEBUG. A busy board on a multi-platform gateway can produce hundreds of these per day; that's gateway.log noise that obscures real warnings. Failure paths stay at WARNING (where you'd want to look when something's wrong) so we don't lose visibility into transient send issues.	2026-05-10 13:19:41 -07:00
Mike Nguyen	861ce7c0b6	fix: dedupe kanban notifier delivery claims	2026-05-10 13:19:41 -07:00
teknium1	00ce5f04d9	feat(session): make /handoff actually transfer the session live Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow deferred everything to whenever a real user happened to message the target platform; this rewrites it so the gateway picks up handoffs immediately and the destination chat just starts working. State machine on sessions table replaces the boolean flag: None -> 'pending' -> 'running' -> ('completed' \| 'failed') plus handoff_error for failure reasons. CLI request_handoff / get_handoff_state / list_pending_handoffs / claim_handoff / complete_handoff / fail_handoff helpers wrap the transitions. CLI side (cli.py): /handoff <platform> validates the platform's home channel via load_gateway_config, refuses if the agent is mid-turn, flips the row to 'pending', and poll-blocks (60s) on terminal state. On 'completed' it prints the /resume hint and exits the CLI like /quit. On 'failed' or timeout it surfaces the reason and the CLI session stays intact. Gateway side (gateway/run.py): new _handoff_watcher background task scans state.db every 2s, atomically claims pending rows, and runs _process_handoff for each. _process_handoff: 1. Resolves the platform's home channel. 2. Asks the adapter for a fresh thread via the new create_handoff_thread(parent_chat_id, name) capability so the handed-off conversation gets its own scrollback. Adapters that don't support threads (or fail) return None and the watcher falls back to the home channel directly. 3. Constructs a SessionSource keyed as 'thread' when a thread was created, 'dm' otherwise, then session_store.switch_session re-binds the destination key to the CLI session_id. The full role-aware transcript replays via load_transcript on the next turn (no flat-text injection into context_prompt). 4. Forges a synthetic MessageEvent(internal=True) with the handoff notice and dispatches through _handle_message; the agent runs against the loaded transcript and adapter.send delivers the reply. 5. Marks the row 'completed' on success, 'failed' (+error) on any exception. Adapter capability (gateway/platforms/base.py): create_handoff_thread default returns None. Three overrides: - Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic so DM topics (Bot API 9.4+) and forum supergroups both work. - Discord (gateway/platforms/discord.py): parent.create_thread on text channels with a seed-message + message.create_thread fallback for permission edge cases. Skips DMs and other non-thread-capable parents. - Slack (gateway/platforms/slack.py): posts a seed message and returns its ts as the thread anchor — Slack threads are message-anchored. In thread mode, build_session_key keys the destination without user_id (thread_sessions_per_user defaults to False) so the synthetic turn and any later real-user message in the thread share the same session_key — seamless takeover without race. CommandDef stays cli_only=True (handoff is initiated from the CLI; gateway exposes /resume for the reverse direction). Removed the original PR's _handle_message_with_agent handoff hook (transcript-as-text injection into context_prompt) and the send_message_tool notification — both replaced by the watcher path. Tests rewritten around the new state machine: 13/13 pass. E2E-validated thread + no-thread paths and the failure path against real worktree imports with mocked adapters.	2026-05-10 13:06:25 -07:00
kshitijk4poor	878611a79d	feat(session): add /handoff command for cross-platform session transfer Adds /handoff <platform> CLI command that queues the current session for resume on the configured home channel of any messaging platform. CLI side: - /handoff telegram — marks session in shared DB, sends summary to the Telegram home channel via send_message - /handoff discord — same for Discord - Supports telegram, discord, slack, whatsapp, signal, matrix Gateway side: - On new session creation, checks for pending handoffs for the incoming message's platform - If found, loads the CLI session's full conversation history and injects it into the context prompt as a handoff transcript - Agent continues the conversation seamlessly Files: - hermes_state.py: handoff_pending, handoff_platform columns + helpers - cli.py: _handle_handoff_command dispatch + handler - hermes_cli/commands.py: CommandDef entry - gateway/run.py: handoff detection in _handle_message_with_agent - tests/hermes_cli/test_session_handoff.py: 8 tests	2026-05-10 13:06:25 -07:00
Teknium	a282434301	feat(gateway): per-platform admin/user split for slash commands (salvage of #4443 ) (#23373 ) * feat(gateway): per-platform admin/user split for slash commands Adds an opt-in two-list access control on top of the existing per-platform `allow_from` allowlists, scoped to slash commands only: - allow_admin_from — full slash command access - user_allowed_commands — what non-admins may run - group_allow_admin_from — same, group/channel scope - group_user_allowed_commands When `allow_admin_from` is unset for a scope, gating is disabled and every allowed user keeps full access (backward compat). Plain chat is unaffected. `/help` and `/whoami` are always reachable so users can see what they can run. Gate runs at the slash command dispatch site in gateway/run.py and uses `is_gateway_known_command()`, so it covers built-in AND plugin-registered commands through the live registry without per-feature wiring. Adds `/whoami` showing platform, scope, tier, and runnable commands. Salvage of PR #4443's permission tier work, scoped down. The full tier system, tool filtering, audit log, usage tracking, rate limiting, `/promote` flow, and persistent SQLite stores are not included here — those can be re-expanded later if needed. Co-authored-by: ReqX <mike@grossmann.at> * fix(gateway): close running-agent fast-path bypass + add coverage and central docs The slash command access gate was only applied at the cold dispatch site (line ~5921). When an agent was already running, the running-agent fast-path block (line ~5574) dispatched /restart, /stop, /new, /steer, /model, /approve, /deny, /agents, /background, /kanban, /goal, /yolo, /verbose, /footer, /help, /commands, /profile, /update directly without going through the gate — letting non-admins bypass gating just because an agent happens to be busy. Refactored the gate into _check_slash_access() and called from BOTH paths. /status remains intentionally pre-gate so users can always see session state. Also added 18 more dispatch tests covering: - Running-agent fast-path: blocks non-admin, allows admin, /status always works - Alias canonicalization (gate uses canonical name, not user alias) - Unknown / unregistered commands pass through (don't false-positive) - DM admin scope-locked when group has its own admin list - Multi-platform isolation (Discord gated, Telegram unrestricted) Docs: added Slash Command Access Control section to the central messaging index page + /whoami row in the chat commands table. Co-authored-by: ReqX <mike@grossmann.at> --------- Co-authored-by: ReqX <mike@grossmann.at>	2026-05-10 12:33:54 -07:00
guglielmofonda	845be254ec	fix(kanban): cap dispatch by running workers	2026-05-10 09:13:07 -07:00
Teknium	cede612987	feat(gateway): shutdown forensics — non-blocking diag, per-phase timing, stale-unit warning (#23285 ) When the gateway received SIGTERM, the shutdown_signal_handler ran a synchronous 'ps aux' (3s timeout) inside the asyncio event loop, then asyncio.create_task(runner.stop()). On a busy host that ate 1-3s of the teardown budget before draining could even start, and the resulting log line was a multi-line ps dump that didn't tell us who sent the signal. The shutdown path itself logged 'Stopping gateway...' and then nothing until 'Gateway stopped' — when systemd SIGKILLed mid-drain, there was no way to see which phase wedged. Changes: - New gateway/shutdown_forensics.py: * snapshot_shutdown_context(sig) — sub-millisecond /proc-only capture of signal name, parent pid+name+cmdline, INVOCATION_ID (systemd marker), loadavg_1m, TracerPid, takeover/planned-stop marker presence + whether-it-names-self. Pure stdlib, never raises. * spawn_async_diagnostic(log_path, sig) — detached subprocess with its own 'timeout 5s', start_new_session=True, writes ps auxf + pstree + dmesg to ~/.hermes/logs/gateway-shutdown-diag.log. Returns immediately, can't block the event loop or the cgroup teardown. * check_systemd_timing_alignment(drain_timeout) — reads /proc/self/cgroup for our unit, asks systemctl show for TimeoutStopUSec, returns mismatch info when the unit's stop timeout is smaller than restart_drain_timeout + 30s headroom (the case where systemd SIGKILLs mid-drain). * _parse_systemd_duration_to_us — covers '90s', '1min 30s', '500ms', '1h' style values from systemctl show. * format_context_for_log — single scannable key=value line, parent cmdline last. - gateway/run.py shutdown_signal_handler: * Replaces synchronous ps aux + ad-hoc 'hermes-related lines' filter with snapshot + detached spawn. * Always logs 'Shutdown context: signal=... parent_pid=... parent_cmdline=...' regardless of planned/unexpected so we can correlate signal source even on planned restarts. - gateway/run.py _stop_impl: * Per-phase '+X.XXs' timing for notify_active_sessions, drain (with drain_seconds, active_at_start, active_now, timed_out), post-interrupt tool kill, each adapter disconnect (Xs), all adapters disconnected, final-cleanup tool kill, SessionDB close, total teardown. - gateway/run.py start(): * Stale-unit warning at startup when the running systemd unit's TimeoutStopSec is smaller than the configured drain timeout. Points the user at 'hermes gateway service install --replace' to regenerate, or at shortening agent.restart_drain_timeout. Tests: 30 new in tests/gateway/test_shutdown_forensics.py — snapshot speed bound, signal name resolution, marker detection self-vs-other, async diag spawn doesn't block caller, systemd duration parser, and alignment check returns None outside systemd. Wider tests/gateway/ suite: 5258 passing, 3 pre-existing TTS-routing failures unchanged on main.	2026-05-10 09:01:51 -07:00
Teknium	1f5983c4c8	feat(kanban): aggregate all toolset-name typos in skills before raising Follow-up to the previous commit's toolset-vs-skill validation. The contributor's fix raises ValueError on the first toolset name found in the skills list. That works for one mistake, but agents that confuse skills with toolsets usually pass several at once (`skills=["web", "browser", "terminal"]`) — and serial-correcting one per failure round-trip wastes tokens. Collect all toolset-shaped entries first, then raise once with the full list. The error message is also slightly clearer: 'web', 'browser', 'terminal' are toolset names, not skill name(s). Put toolsets in the assignee profile's `toolsets:` config instead of per-task skills. Skills are named skill bundles (e.g. `kanban-worker`, `blogwatcher`); toolsets are runtime capabilities (e.g. `web`, `browser`, `terminal`). vs. the previous "the assignee profile's toolsets" — explicitly naming the YAML key (`toolsets:`) and giving concrete examples in both categories closes the conceptual gap that produced the bug to begin with. Adds one regression test (test_create_task_skills_lists_all_toolset_typos) covering the multi-name aggregation path. The single-typo test from the original PR still passes (the loose `match="toolset name"` matches both singular and plural forms).	2026-05-10 08:41:28 -07:00
LeonSGP43	673418dfa1	fix(kanban): reject toolset names in task skills	2026-05-10 08:41:28 -07:00
Teknium	a91e5a8759	feat(kanban-dashboard): native <details> collapse + skip empty metadata Two follow-up improvements to Tranquil-Flow's metadata-panel restyle. Both stay within the parent PR's "tone down the panel" scope. 1. Native <details>/<summary> collapse for verbose metadata. The parent PR consciously deferred this ("adding native expand/collapse would be the next step but requires UX agreement"). The default they asked for is straightforward: collapsed when the rendered JSON exceeds 300 chars (the threshold where the max-height: 8.5rem cap actually starts mattering), expanded otherwise. <details>/<summary> is the right primitive — zero JS, browser-handled state, accessible by default (keyboard-navigable, screen-reader announces the disclosure state), and survives any react-state churn for free. The OS-default disclosure marker is suppressed (list-style: none + ::-webkit-details-marker hidden) and replaced with a CSS ::before chevron that rotates 90deg on the [open] attribute, so the look is consistent across Firefox/WebKit/Blink without the double-marker that would otherwise appear on the platforms that still render the default triangle. 2. Skip rendering when metadata is an empty object. `r.metadata && ...` truthy-checks, but `{}` is truthy in JS — so a completed task with no actual metadata would render a "Metadata" labeled disclosure block containing literal `{}`. Adds an Object.keys(r.metadata).length > 0 guard so empty payloads render nothing instead of an empty disclosure stub. Tests: three new static-asset assertions covering the <details> shape, the empty-object skip, and the suppress-default-marker + animated-chevron CSS — all in `tests/plugins/test_kanban_dashboard_plugin.py`.	2026-05-10 08:30:42 -07:00
Tranquil-Flow	0e0ddaac8f	fix(kanban-dashboard): tone down completed-run metadata panel (#19548 ) Hand-rebased onto current main from PR #19980; the original branch was stale against main (~6 unrelated dashboard fixes had landed since), so applying the PR's dist files directly would have silently reverted them. The run-history panel in the task drawer rendered each completed run's `metadata` field as a `<code class="hermes-kanban-run-meta">` containing `JSON.stringify(r.metadata)` — a single unindented monoline. With `white-space: pre-wrap` and a monospace font, a writer task's metadata (changed_files paths, source URLs, generated-artifact details) wrapped into a tall block of code-ish text that filled the parent run row. The container's faint `--color-foreground 3%` background then made the whole thing read like a crash dump even though the run completed normally. Restyle and label, no interactivity changes: - Wrap the meta payload in a `.hermes-kanban-run-meta-block` sub-block with an explicit `Metadata` label (small, uppercase, muted) so the panel reads as auxiliary detail at a glance. - Pretty-print the JSON (`indent=2`) so the structure is scannable instead of a wall of monoline text. - Cap `.hermes-kanban-run-meta` at `max-height: 8.5rem; overflow: auto` so a verbose blob scrolls inside its own pane rather than swamping the run row. - Sub-block uses a thin `border-left` rule and `background: transparent` — distinct from the destructive-tinted treatment used by crashed / timed_out / blocked / spawn_failed runs higher in the same file. Tests: two new static-asset assertions in `tests/plugins/test_kanban_dashboard_plugin.py` lock in the rendered shape (the plugin ships built-only, no src/).	2026-05-10 08:30:42 -07:00
Teknium	d4b26df897	perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226 ) Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a fast path when a supervisor is alive for the current task_id. Replaces the ~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms Runtime.evaluate over the supervisor's already-connected WebSocket. Falls through to the existing agent-browser CLI path when no supervisor is running (e.g. backends without CDP, or before the first browser_navigate attaches one), so behaviour is unchanged where it can't apply. JS-side exceptions surface directly without falling through to the subprocess (the subprocess would just re-raise the same error, slower); supervisor-side failures (loop down, no session) fall through cleanly. Benchmark — 30 iterations of `1 + 1` against headless Chrome: supervisor WS mean= 0.96ms median= 0.91ms agent-browser subprocess mean=179.35ms median=167.73ms → 187x speedup mean Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5 real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome being installed). Browser test suite: 355 passed, 1 skipped.	2026-05-10 07:37:55 -07:00
Teknium	08c5b35a73	test(kanban-dashboard): pin assignee-casing static-asset regressions + AUTHOR_MAP Follow-up to the previous commit's casing fix. The original PR shipped the dist edits without test coverage. The contributor's reasoning (UI-only attributes in a pre-built JS bundle, nothing meaningful to unit-test) is fair, but a static-asset assertion catches the most likely regression vector — a future rebuild of the dist bundle that loses the attributes — at near-zero cost. Adds two regression tests in tests/plugins/test_kanban_dashboard_plugin.py: - test_dashboard_assignee_inputs_preserve_casing — reads dist/index.js and asserts autoCapitalize="none", autoCorrect="off", spellCheck=false, and textTransform="none" each appear at least twice (one per assignee input — inline triage/lane create + task-edit panel). - test_dashboard_lane_head_preserves_assignee_casing — reads dist/style.css and asserts the .hermes-kanban-lane-head rule body does NOT contain text-transform: uppercase. Locates the rule by marker so unrelated CSS churn nearby doesn't flake the test. Both follow the same shape as the existing test_dashboard_requests_default_board_explicitly static-asset guard from PR #22940's salvage. Also adds the AUTHOR_MAP entry for princepal9120's GitHub-noreply email so release notes credit the right account.	2026-05-10 07:35:01 -07:00
Teknium	40a4bfa719	test(kanban): cover task_age safe-int guards + AUTHOR_MAP entry Follow-up to the previous commit's safe-int task_age fix. The original PR shipped without test coverage. This commit adds: - test_safe_int_accepts_int_and_int_string — sanity for the well-typed path so the helper itself can't quietly start swallowing valid values. - test_safe_int_returns_none_on_corrupt_inputs — the failure modes (None, '%s', 'abc', '', '1.5', random objects). Covers both the ValueError and TypeError catch branches. - test_task_age_handles_corrupt_created_at — the headline regression: a task with created_at='%s' used to raise ValueError and turn GET /api/plugins/kanban/board into a 500. - test_task_age_handles_corrupt_started_and_completed — confirms the safe-int treatment is consistent across all three timestamp fields. - test_task_age_well_formed_task — regression that the safe path doesn't change observable output for normal data. - test_task_dict_survives_corrupt_created_at — defense in depth. Writes a corrupt row directly via SQL, reads it back through the ORM, and confirms task_age + the surrounding plugin_api guard degrade gracefully instead of crashing. Also adds the AUTHOR_MAP entry for the contributor's GitHub-noreply email so release notes credit @baocin (the commit was authored locally as `aoi <aoi@hino.local>` — re-attributed during salvage to the github noreply form).	2026-05-10 07:15:59 -07:00
Teknium	c39168453d	feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914 ) * feat(i18n): localize /model command output Reported by @tianma8888: when Chinese users run /model, the labels ("Provider:", "Context:", "_session only_", etc.) are still English. This routes the static prose through the existing i18n catalog so it follows display.language / HERMES_LANGUAGE. Changes: - locales/{en,zh,ja,de,es,fr,tr,uk}.yaml: add 17 keys under gateway.model.* covering switched/provider/context/max_output/cost/ capabilities/prompt_caching/warning/saved_global/session_only_hint/ current_label/current_tag/more_models_suffix/usage_. - gateway/run.py _handle_model_command: replace hardcoded f-strings in the picker callback, the text-list fallback, and the direct-switch confirmation block with t("gateway.model.<key>", ...). What stays English: - model IDs, provider slugs, capability strings, cost figures, and the "[Note: model was just switched...]" prepended to the model's next prompt (LLM-facing, not user-facing). - The two slightly-different session-only hints unify on a single key with the em-dash phrasing. Validation: tests/agent/test_i18n.py 27/27 passing (parity contract holds), tests/gateway/ -k 'model or i18n' 74/74 passing. feat(i18n): localize all gateway slash command outputs Expands the i18n catalog from 7 strings to 234 keys across 35 gateway slash command handlers, so non-English users see localized output for \`/profile\`, \`/status\`, \`/help\`, \`/personality\`, \`/voice\`, \`/reset\`, \`/agents\`, \`/restart\`, \`/commands\`, \`/goal\`, \`/retry\`, \`/undo\`, \`/sethome\`, \`/title\`, \`/yolo\`, \`/background\`, \`/approve\`, \`/deny\`, \`/insights\`, \`/debug\`, \`/rollback\`, \`/reasoning\`, \`/fast\`, \`/verbose\`, \`/footer\`, \`/compress\`, \`/topic\`, \`/kanban\`, \`/resume\`, \`/branch\`, \`/usage\`, \`/reload-mcp\`, \`/reload-skills\`, \`/update\`, \`/stop\` (plus the \`/model\` block already added in the previous commit). Reported by @tianma8888 — Chinese users want command output prose in their language, not just the labels we already had. Translations are hand-written for all 8 supported locales (en, zh, ja, de, es, fr, tr, uk), matching each catalog's existing style: full-width punctuation in zh, em-dashes in zh/ja/uk, French spaced colons, German noun capitalization, etc. What stays English (unchanged): - Identifiers/values: model IDs, file paths, profile names, session IDs, command flag names like --global, URLs, config keys. - Backtick code spans: \`/foo\`, \`config.yaml\`. - Log messages (logger.info/warning/error). - LLM-facing system notes prepended to next prompt (e.g. [Note: model was just switched...]). - Strings produced by external modules (gateway_help_lines, format_gateway, manual_compression_feedback) — those have their own surfaces. New shared keys for cross-handler boilerplate: - gateway.shared.session_db_unavailable (5 call sites: branch, title, resume, topic, _disable_telegram_topic_mode_for_chat) - gateway.shared.session_not_found (1 site) - gateway.shared.warn_passthrough (2 sites in /title's f"⚠️ {e}" pattern) YAML gotcha fixed: \`yolo.on\` and \`yolo.off\` were originally written unquoted, which YAML 1.1 parses as boolean True/False keys. Renamed to \`yolo.enabled\` / \`yolo.disabled\` for both safety and clarity. Test fix: tests/agent/test_i18n.py::test_t_missing_key_in_non_english_falls_back_to_english now resets the catalog cache on teardown, so the fake "foo: English Foo" locale doesn't poison the module-level cache for subsequent tests in the same xdist worker. (Without this, every gateway slash command test that shares a worker with the i18n suite would see the fake catalog.) Validation: - tests/agent/test_i18n.py: 27/27 (parity contract — every key in every locale, matching placeholder tokens). - tests/gateway/: 5077 passed, 0 failed (full gateway suite). - 180 t() call sites added across 35 handlers; 1872 catalog entries total (234 keys × 8 locales). * feat(i18n): add 8 new locales — af, ko, it, ga, zh-hant, pt, ru, hu Expands the static-message catalog from 8 → 16 languages, each with full 270-key parity against the English source-of-truth. Every locale now covers the same surface PR #22914 added: approval prompts plus all 35 gateway slash command outputs. New locales: - af Afrikaans (community ask in #21961 by @GodsBoy; PRs #21962, #21970) - ko Korean (PRs #20297 by @tmdgusya, #22285 by @project820) - it Italian (PR #20371 by @leprincep35700) - ga Irish/Gaeilge (PR #20962 by @ryanmcc09-dot) - zh-hant Traditional Chinese (PRs #20523 by @jackey8616, #13140 by @anomixer) - pt Portuguese (PRs #20443 by @pedroborges, #15737 by @carloshenriquecarniatto, #22063 by @Magaav) - ru Russian (PR #22770 by @DrMaks22) - hu Hungarian (PR #22336 by @lunasec007) Each locale uses native-quality translations matching the existing tone and conventions of the older 8 locales: - zh-hant uses 繁體 characters with TW/HK technical vocabulary (軟體 not 软件, 連線 not 连接, 設定 not 设置, 訊息 not 消息, 工作階段 not 会话, 程式 not 程序, 預設 not 默认, 伺服器 not 服务器), full-width punctuation 「：（）」. - ko uses formal 합니다체 (습니다/합니다) register throughout. - pt uses European Portuguese as baseline with neutral PT/BR vocabulary where possible. - ga uses standard An Caighdeán Oifigiúil; English loanwords retained for tech terms without good Irish equivalents (gateway, API, JSON). - All preserve {placeholder} tokens, backtick code spans, slash commands, brand names (Hermes, MCP, TTS, YOLO, OpenAI, Telegram, etc.), and emoji. Aliases added in agent/i18n.py: - af-za, Afrikaans → af - ko-kr, Korean, 한국어 → ko - it-it, italiano → it - ga-ie, Irish, Gaeilge → ga - zh-tw, zh-hk, zh-mo, traditional-chinese → zh-hant (note: zh-tw used to alias to zh; now aliases to its own zh-hant catalog) - zh-cn, zh-hans, zh-sg → zh (unchanged from before) - pt-pt, pt-br, brazilian, portuguese → pt - ru-ru, Russian, русский → ru - hu-hu, Magyar → hu The zh-tw alias re-routing is intentional: previously typing 'zh-TW' got the Simplified Chinese catalog (wrong vocabulary for Taiwan/HK users). Now those users get the proper Traditional Chinese catalog. Validation: - tests/agent/test_i18n.py: 43/43 (parity contract holds for all 16 languages × 270 keys = 4320 catalog entries, with matching placeholder tokens). - E2E alias resolution verified for all 19 alias inputs (Afrikaans, ko-KR, 한국어, italiano, Gaeilge, zh-TW, zh-HK, traditional-chinese, pt-BR, brazilian, Magyar, etc.). - tests/gateway/: 5198 passed (3 pre-existing TTS routing failures unrelated to i18n). Credit to all contributors whose PRs surfaced these language requests. Their original PRs may now be closed as superseded with credit. * feat(dashboard-i18n): add 14 web dashboard locales matching the static catalog Brings the React dashboard (web/src/) up to the same 16-language coverage the static catalog already has after the previous commits in this PR. The Translations interface is TypeScript-typed, so every new locale must provide every key — tsc -b is the parity guard. Languages added (each is a complete 429-line locale file): - af Afrikaans - ja Japanese (PR #22513 by @snuffxxx surfaced this) - de German (PR #21749 by @mag1art) - es Spanish (PR #21749) - fr French (PRs #21749, #10310 by @foXaCe) - tr Turkish - uk Ukrainian - ko Korean (PRs #21749, #18894 by @ovstng, #22285 by @project820) - it Italian - ga Irish (Gaeilge) - zh-hant Traditional Chinese (PR #13140 by @anomixer) - pt Portuguese (PRs #22063 by @Magaav, #22182 by @wesleysimplicio, #15737 by @carloshenriquecarniatto) - ru Russian (PRs #21749, #22770 by @DrMaks22) - hu Hungarian (PR #22336 by @lunasec007) Each translation covers all 15 namespaces with full key parity vs en.ts, preserves every {placeholder} token verbatim, keeps identifiers untranslated (brand names, file paths, cron expressions, code spans), translates the language.switchTo tooltip into the target language, and matches existing tone conventions (zh-hant uses TW/HK vocab; ja uses formal desu/masu; ko uses formal seumnida register; ga uses An Caighdean Oifigiuil with English loanwords for tech vocab without good Irish equivalents). Plumbing: - web/src/i18n/types.ts: Locale union expanded to all 16 codes. - web/src/i18n/context.tsx: imports all 16 catalogs; exports LOCALE_META (endonym + flag per locale); isLocale() type guard. - web/src/i18n/index.ts: re-export LOCALE_META. - web/src/components/LanguageSwitcher.tsx: replaced two-state EN-ZH toggle with a click-to-open dropdown listing all 16 languages. Note: zh-hant.ts exports zhHant (camelCase) since hyphen is invalid in a JS identifier; the canonical 'zh-hant' string keys it in TRANSLATIONS. Validation: - npx tsc -b: 0 errors. Every locale satisfies Translations. - npm run build (tsc + vite production): green, 2062 modules. - Each locale file is exactly 429 lines. Out of scope: plugin dashboards (kanban/achievements ship as prebuilt bundles with no source in repo); Docusaurus docs (separate surface); TUI (no i18n yet). * feat(plugin-i18n): localize achievements + kanban plugin dashboards across all 16 locales Brings the two shipped plugin dashboards (hermes-achievements, kanban) under the same i18n umbrella as the core dashboard PR #22914 just established. Both bundles now read user-facing strings from the host's i18n catalog via SDK.useI18n() instead of hardcoded English. ## Approach Plugin dashboards ship as prebuilt IIFE bundles in plugins/<name>/dashboard/dist/index.js — no build step, no source in repo (upstream-authored, vendored as compiled JS). Earlier contributor PRs (#22594, #22595, #18747) tried direct edits but didn't actually wire the bundles to read translations. This change does the wiring properly: 1. Each bundle gets a useI18n shim at IIFE scope: const useI18n = SDK.useI18n \|\| function () { return { t: { kanban: null }, locale: "en" }; }; Older host SDKs without useI18n still load the bundle and render English fallbacks. 2. A small tx(t, path, fallback, vars) helper resolves dotted keys under the plugin's namespace (t.kanban.* or t.achievements.) and interpolates {placeholder} tokens. 3. Every React component starts with const { t } = useI18n() and each user-visible string is wrapped in tx(t, "key", "English fallback"). Helpers called outside React components (window.prompt callers, constants used during init) take t as a parameter. 4. Top-level constants that were English dictionaries (COLUMN_LABEL, COLUMN_HELP, DESTRUCTIVE_TRANSITIONS, DIAGNOSTIC_EVENT_LABELS in kanban) become getColumnLabel(t, status)-style functions backed by FALLBACK_ dictionaries. ## Translations added Two new top-level namespaces added to the dashboard's TypeScript-typed Translations interface: - achievements: ~70 keys covering the hero, scan banner, achievement card, share dialog, stats, filters, and empty states. - kanban: ~145 keys covering the board, columns (with nested columnLabels and columnHelp sub-dicts), card detail panel, bulk-actions toolbar, dependency editor, board switcher, and diagnostic callouts. Each key is provided across all 16 supported locales: en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu. Total new translation entries: ~3,440 (215 keys × 16 locales). ## What stays English (deliberate) - API paths, CSS class names, data-* attributes, JSON keys, regex strings, URLs, file paths (~/.hermes/kanban.db, boards/_archived/). - State identifier strings used as lookup keys (triage / todo / ready / running / blocked / done / archived) — labels translate, key strings don't. - The PNG share-card text rendered to canvas in the achievements ShareDialog (HERMES AGENT watermark, UNLOCKED stamp, tier names) — these become part of a globally-shared image and stay English. - localStorage keys (hermes.kanban.selectedBoard). - Brand names (Kanban, Hermes, WebSocket, Nous Research). ## Contributor credit PR #22594 by @02356abc and PR #22595 by @02356abc supplied the en + zh kanban namespace skeleton (145 keys); used as the en source- of-truth in this commit and translated to the other 14 locales. PR #18747 by @laolaoshiren first surfaced the achievements localization request. ## Validation - npx tsc -b: 0 errors. All 16 locale .ts files satisfy the Translations type with full key parity. - npm run build (tsc + vite production build): green, 2062 modules, 1.56MB JS / 95KB CSS, ~2.5s build. - node --check on both plugin bundles: parse cleanly. - 126 tx() call sites in kanban, 46 in achievements. ## Out of scope - TUI (ui-tui/) has no i18n infrastructure yet. - Docusaurus docs (website/i18n/) — already had zh-Hans; expanding is a separate translation workstream (Thai / Korean / Hindi PRs).	2026-05-10 07:14:14 -07:00
Teknium	62b1c74cbc	fix(kanban): correct dispatcher spawn module name + PATH-first lookup Follow-up to the previous commit's contributor cherry-pick. The cherry-picked change replaced the bare ``["hermes", ...]`` spawn with ``[sys.executable, "-m", "hermes", ...]``. The intent was right (avoid PATH dependence — cron, systemd User= services, launchd jobs, and other detached dispatcher invocations routinely run with a stripped $PATH that doesn't include the venv's bin/, breaking the bare-shim spawn) but the module name is wrong: there is no top-level ``hermes`` package. The console-script entry point in pyproject.toml is ``hermes = "hermes_cli.main:main"``, and ``python -m hermes`` fails with ``No module named hermes``. The cherry-picked form would have replaced a sometimes-broken spawn with an always-broken one. This commit: - Adds ``_resolve_hermes_argv()``, mirroring ``gateway.run._resolve_hermes_bin``. Tries ``shutil.which("hermes")`` first (preferred — keeps existing ``ps`` output and log lines familiar in the common case) and falls back to ``[sys.executable, "-m", "hermes_cli.main"]`` when the shim is not on PATH. The fallback goes through the running interpreter so it's PATH-independent. Kept as a local helper rather than imported from gateway because ``hermes_cli`` sits below ``gateway`` in the dependency order. - Switches the dispatcher's ``cmd`` list to use ``_resolve_hermes_argv()``. - Adds three regression tests: ``test_resolve_hermes_argv_prefers_path_shim`` — pins the PATH-first branch so a future refactor doesn't silently flip the order. * ``test_resolve_hermes_argv_falls_back_to_module_form_when_no_path_shim`` — pins the correct module name (``hermes_cli.main``, NOT ``hermes``). Direct regression guard for the form that shipped in the original PR. * ``test_resolve_hermes_argv_module_actually_runs`` — runs the fallback invocation as a real subprocess and asserts ``--version`` works, so losing ``hermes_cli.main``'s ``__main__`` handling can't slip past the string-match test. Verified end-to-end: with the shim on PATH the resolver returns ``[/.../hermes]`` and ``--version`` works; with the shim removed the resolver returns ``[python, -m, hermes_cli.main]`` and ``--version`` still works; the original PR's ``python -m hermes`` invocation fails as expected (``No module named hermes``).	2026-05-10 07:10:47 -07:00
Teknium	5aa755e4e6	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194 ) * feat(plugins): host-owned LLM access via ctx.llm Plugins can now ask the host to run a one-shot chat or structured completion against the user's active model and auth, without ever seeing an OAuth token or API key. Closes the gap where plugins that needed bounded structured inference (receipts, CRM extraction, support classification) had to either bring their own provider keys or register a tool the agent had to call. New surface on PluginContext: - ctx.llm.complete(messages, ...) - ctx.llm.complete_structured(instructions, input, json_schema, ...) - async siblings ctx.llm.acomplete / acomplete_structured Backed by the existing auxiliary_client.call_llm pipeline — every provider, fallback chain, vision routing, and timeout policy Hermes already supports applies automatically. Trust gate (fail-closed by default): - plugins.entries.<id>.llm.allow_model_override - plugins.entries.<id>.llm.allowed_models (allowlist; '' = any) - plugins.entries.<id>.llm.allow_agent_id_override - plugins.entries.<id>.llm.allow_profile_override Embedded model@profile shorthand goes through the same gate as explicit profile=, so it can't bypass the auth-profile policy. Conflicting explicit and embedded profiles fail closed. Also lands: - plugins/plugin-llm-example/ — reference plugin that registers /receipt-extract, demonstrating image+text structured input, jsonschema validation, and the trust-gate config. - website/docs/developer-guide/plugin-llm-access.md — full API docs. - 45 unit tests covering trust gates, JSON parsing, schema validation, image encoding, async surface, and config loading. Validation: - 2628 tests pass in tests/agent/ - E2E: bundled plugin loaded with isolated HERMES_HOME, slash command produced parsed JSON via stubbed call_llm - response_format extra_body wired correctly for both json_object and json_schema modes docs(plugin-llm): rewrite quickstart and framing The quickstart now uses a meeting-notes-to-tasks example instead of a receipt extractor, and the page leads with hook-time / gateway pre-filter / scheduled-job framing rather than the OpenClaw KB/support/CRM/finance/migration enumeration that the original upstream PR used. Receipt example moved to a separate worked example link so the docs page itself doesn't echo any of the upstream framing. Also clarifies where ctx.llm fits in the broader plugin surface (table comparing register_tool / register_platform / register_hook / etc.) and what makes this lane different from auxiliary_client internals. No code change. * docs(plugin-llm): reframe as any LLM call, not just structured output The original draft leaned heavily on complete_structured() and made the chat lane (complete() / acomplete()) feel like a footnote. Restructure so: - The page title and description say 'any LLM call.' - The lead shows BOTH a plain chat call (error rewriter) AND a structured call (triage scorer) up top. - Quick start has two complete plugin examples — /tldr (chat) and /paste-to-tasks (structured). - New 'When to use which' table for choosing complete() vs complete_structured() vs the async siblings. - Trust-gate sections explicitly note 'all four methods,' and the request-shaping list calls out chat-only fields (messages) and structured-only fields (instructions, input, json_schema) alongside each other. - The 'Where this fits' section now says 'for any reason, structured or not.' The receipt-extractor reference plugin still exists under plugins/plugin-llm-example/ — but the docs page no longer treats it as the canonical surface example. It's now described as 'a third worked example, this time with image input.' No code change. * feat(plugin-llm): split provider/model into independent explicit kwargs The first cut accepted a single 'provider/model' slug on every method and split it internally. That looked clean but broke under live test: the model-override path tried to use the slug's vendor prefix as a literal Hermes provider id, which silently switched the user off their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user who routes through OpenRouter — host attempted to call the 'openai' provider directly, failed because OPENAI_API_KEY wasn't set). New shape mirrors the host's main config: ctx.llm.complete( messages=[...], provider='openrouter', # gated, optional model='openai/gpt-4o-mini', # gated, optional profile='work', # gated, optional ... ) Each is independently gated by its own allow__override flag. Granting model-override does NOT auto-grant provider-override. Allowlists are now per-axis (allowed_providers, allowed_models) matched literally against whatever string the plugin sends. Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes doesn't use that pattern anywhere else; profile= is its own kwarg. Live E2E (against real OpenRouter via Teknium's config) confirms: - zero-config call works - default-deny blocks each override with a helpful error - model-only override stays on user's active provider (the bug) - provider+model override switches cleanly - allowlist refuses non-listed entries - structured output round-trip parses + schema-validates Tests: 49 cases (up from 45); all green. Docs updated to match the new shape, including a 'most plugins never need this section' callout on the trust-gate config block. fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core Three integration fixes for the ctx.llm surface: 1. Attribution bug — result.provider and result.model now reflect what call_llm actually used, not placeholder fallbacks ('auto', 'default'). New _resolve_attribution() helper: - explicit overrides win (what the call targeted) - response.model wins for the recorded model (provider canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.) - falls back to _read_main_provider() / _read_main_model() when no override is set, so audit logs reflect the user's active main provider/model - 'auto' / 'default' only when EVERYTHING is empty Live verified: zero-config call now records provider='openrouter', model='anthropic/claude-4.7-opus-20260416' instead of provider='auto', model='default'. 2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete works from inside a registered post_tool_call callback. The docs page promised hook integration; now there's a test that exercises the lazy-import path through the real invoke_hook machinery. Two cases: traceback-rewrite hook with conditional ctx.llm.complete, and minimal hook regression for the sync-hook + sync-llm path. 3. Reference plugin moved out of core. plugins/plugin-llm-example/ is gone from hermes-agent — it now lives in the new NousResearch/hermes-example-plugins companion repo. The docs page links there. Hermes' bundled plugins should be plugins users actually run; reference / docs-companion plugins live externally. Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/ + tests/gateway/ + tests/tools/ + tests/agent/ shows 16770 passing; the 12 failures are all pre-existing on origin/main (verified by stashing this branch's changes and re-running) — kanban-boards, delegate-task, gateway-restart, tts-routing — none touch the plugin_llm surface. * chore(plugins): move all example plugins to companion repo Reference / docs-companion plugins now live exclusively in NousResearch/hermes-example-plugins, not bundled with the core repo: - example-dashboard - strike-freedom-cockpit A new fourth example, plugin-llm-async-example, was added to that repo demonstrating ctx.llm's async surface (acomplete()) with asyncio.gather() — registers /translate <lang>: <text> which fires forward translation + sentiment classifier in parallel, then a back-translation for QA. Live-tested at 2.5s for three real provider round-trips (would be ~5-6s sequential). Docs updated: - developer-guide/plugin-llm-access.md links both sync and async examples in the Reference section - user-guide/features/extending-the-dashboard.md repoints both demo sections to the companion repo with corrected install paths - user-guide/features/built-in-plugins.md drops the two demo rows - AGENTS.md notes that example plugins live in the companion repo Net: hermes-agent's plugins/ directory now contains only plugins users actually run (memory providers, dashboard tabs that ship real features, the disk-cleanup hook, platform adapters). All four demo / reference plugins live externally where they can be cloned on demand instead of inflating the core install.	2026-05-10 07:09:28 -07:00
Teknium	ae4b09ce10	test(security): broaden plugin API auth coverage + correct stale docstring Follow-up to the previous commit's middleware fix. - plugins/kanban/dashboard/plugin_api.py: rewrite the "Security note" docstring. The previous text said "/api/plugins/ is unauthenticated by design" — that's now actively wrong and dangerously misleading. New text explains that plugin routes flow through the same session-token middleware as core API routes and that --host 0.0.0.0 is safe to use on a LAN as a result. - tests/hermes_cli/test_web_server.py: extend TestPluginAPIAuth to cover the surfaces the original PR didn't pin: * test_plugin_route_allows_auth now exercises a real plugin path (/api/plugins/example/hello) instead of accepting 200 OR 404 from a maybe-loaded kanban plugin — the assertion was effectively vacuous. * test_plugin_patch_requires_auth + test_plugin_delete_requires_auth cover non-GET mutation methods in case a future regression whitelists them by accident. * test_non_kanban_plugin_route_requires_auth proves the fix is plugin-agnostic, not kanban-specific (hits hermes-achievements + a non-existent plugin namespace; both 401 before route resolution). * test_plugin_websocket_unaffected_by_http_middleware locks in that the HTTP middleware change didn't accidentally start gating WS upgrades — kanban /events still uses its own ?token= check. Plus a cosmetic blank-line cleanup.	2026-05-10 07:04:18 -07:00
liuhao1024	ec9329ec41	fix(security): require dashboard auth for plugin API routes Remove the blanket /api/plugins/* exemption from auth_middleware so plugin API routes (e.g. Kanban dashboard) require the same session token as all other /api/ endpoints. Fixes #19533	2026-05-10 07:04:18 -07:00
Teknium	7312f7f849	feat(curator): hint at `hermes curator pin` in the rename block (#23212 ) Surfaces the pin command at the moment users care about it: when a consolidation just landed against their skill library and they're looking at the umbrella name in the curator output. Previously `hermes curator pin` existed but had no discovery surface — users only learned it existed by reading docs or stumbling onto `hermes curator --help`. The hint: archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale — pruned (stale) full report: hermes curator status keep an umbrella stable: hermes curator pin document-tools Gated on having at least one consolidation that produced an umbrella. Pruned-only runs (nothing surviving to pin) skip the hint. When multiple umbrellas were produced, picks alphabetically first as a concrete example rather than listing them all. 3 new tests in tests/agent/test_curator_classification.py covering: consolidation produces hint with real umbrella name, pruned-only run omits it, multi-umbrella picks one example.	2026-05-10 06:44:53 -07:00

... 3 4 5 6 7 ...

3777 commits