hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-21 10:22:18 +00:00

Author	SHA1	Message	Date
ruangraung	8cf7df867e	fix(plugins): silence raft check_fn log spam for users without raft CLI The raft platform plugin's check_raft_requirements() logged a WARNING every time it returned False. Since check_fn is called on every load_gateway_config() (~every 10s during normal gateway operation), users who don't have the raft CLI installed get their logs flooded with no way to suppress it — hermes plugins disable doesn't work for bundled platform plugins, and platforms.raft.enabled: false doesn't gate the check_fn call. Fix: make check_raft_requirements() a silent predicate (return True/False only, no logging), matching the convention documented and used by other platform adapters (e.g. teams/adapter.py). The caller in gateway/platform_registry.py create_adapter() already emits its own warning when requirements aren't met and an adapter is actually requested — that's the correct place for a user-facing warning (fires once per connect attempt, not once per config load). Fixes #49234	2026-06-19 17:12:58 -07:00
joaomarcos	75ed07ace8	fix(gateway): break the restart loop at the source on session resume When a tool call itself restarts the gateway (docker restart, systemctl restart, and similar), the process is terminated mid-call — before the tool result is persisted and before the orderly drain rewind can run. The transcript tail is left as an assistant(tool_calls) with no matching tool answer. On resume the model re-issues the unanswered call, taking the gateway down again — an infinite loop (#49201). Source fix: _build_gateway_agent_history now strips a trailing assistant(tool_calls) block that has no tool answers (_strip_dangling_tool_call_tail), so there is nothing for the model to re-execute. This complements _strip_interrupted_tool_tails, which only handles the case where a tool result row exists with an interrupt marker. Cognitive backstop: the resume-pending system note now states that any restart command in the history already ran and must not be re-executed or verified, and the empty-message auto-resume startup turn reports recovery and asks for instructions instead of the nonsensical "address the user's NEW message" (there is no new message on that turn). Reimplements the intent of #49243 by @JoaoMarcos44 at the replay layer. Fixes #49201	2026-06-19 16:59:58 -07:00
teknium1	6504f51cd5	chore: add @hakanpak to AUTHOR_MAP for PR #49282 salvage	2026-06-19 16:59:54 -07:00
hakanpak	d45addc2f1	fix(tools): never let a model whitelist strip the prompt / source images _build_fal_payload and _build_fal_edit_payload assemble the request and then filter it down to the model's supports / edit_supports whitelist. That filter also covers prompt (and image_urls for edits), which every FAL endpoint requires. Today all model configs happen to list those keys, but a single config that omits one would silently produce a request with no prompt or no source images — a broken generation with no error. Always keep the mandatory keys regardless of the whitelist so a missing whitelist entry can only drop optional knobs, never the prompt or the images.	2026-06-19 16:59:54 -07:00
sprmn24	8ebe37f6ad	feat(desktop): notify renderer when GPU acceleration is disabled due to remote display Remote displays (RDP/SSH/X11) silently disable GPU hardware acceleration with only a console.log, leaving the user unaware that software rendering is active. Expose the detected reason over IPC and surface a dismissible banner in the renderer.	2026-06-19 16:59:47 -07:00
teknium1	64b21e50fb	fix(cli): publish agent ref to cli module so memory on_session_end fires on exit The god-file Phase 4 refactor (`094aa85c37`) moved agent construction into CLIAgentSetupMixin, which set the atexit shutdown reference with a bare `global _active_agent_ref`. After extraction that global binds the mixin module's namespace, not cli.py's. cli._run_cleanup reads cli._active_agent_ref to decide whether to fire the memory provider's on_session_end hook — and it stayed None for the whole session, so the `if _active_agent_ref:` branch was dead and on_session_end never ran on /exit. Custom memory providers silently lost end-of-session extraction. Fix: publish the reference onto the cli module explicitly (`import cli as _cli; _cli._active_agent_ref = self.agent`), using the deferred-import pattern already established in the mixin. Regression test asserts cli._active_agent_ref is populated by the mixin's publish line and guards against a relapse to the bare `global` form. The existing shutdown tests passed only because they hand-assigned the ref, which is exactly what masked this.	2026-06-19 16:59:43 -07:00
Gille	013f9c8750	fix(memory): log CLI shutdown hook failures Makes the CLI memory-provider shutdown path observable: log when CLI cleanup calls memory shutdown (with session id + message count), warn instead of swallowing CLI memory-shutdown exceptions, warn on on_session_end failures during agent shutdown, and raise the MemoryManager provider-hook failure log from debug to warning with a traceback. Salvaged from PR #49287 (authored by Gille / @helix4u).	2026-06-19 16:59:43 -07:00
teknium1	c1a0b6a5f1	style: strip trailing whitespace in cron scheduler live-adapter block Follow-up on salvaged PR #49280.	2026-06-19 16:59:38 -07:00
joaomarcos	3a6c171e9e	fix(gateway): log signal transport response and bubble cron live adapter errors	2026-06-19 16:59:38 -07:00
joaomarcos	5649b8649a	Fix silent delivery failures in Signal live adapter (#49260)	2026-06-19 16:59:38 -07:00
Teknium	5f55f0ff85	feat(teams): native send_video/send_voice/send_document attachments (#49308) Teams overrode send_image/send_image_file but not send_video, send_voice, or send_document — so when the gateway dispatched a video/voice/document reply to a Teams chat it fell through to the base-class text fallback and sent the local file path as plain text (same broken-UX class as the LINE URL-image gap in #49298). Extract the existing send_image attachment logic into a shared _send_media_attachment helper (remote URL by reference, local file as a base64 data URI, MIME guessed from the path) and route all four media kinds through it. 5 new tests cover remote-URL, local-file base64, no-app, and missing-file paths.	2026-06-19 16:20:59 -07:00
KeyArgo	1e40b21b2e	docs: clean up three stale comments from the #32848 audit (#45638 ) * docs: clean up three stale comments from the #32848 audit - tools/memory_tool.py:20 — 'read' action was intentionally removed but the docstring still listed it. Now matches the schema. - tools/fuzzy_match.py:9 — unicode_normalized was added but the chain-count docstring still said '8-strategy'. Now says '9'. - run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in. Replaced with a backfill note. Fixes #32848 (parts 3, 4, and 12) * docs(memory): also remove stray memory(action=read) references in lines 144 and 201 The original #32848 audit fix (in `6fd661d6`) only addressed line 20 (the action list in the module docstring), but the action was referenced in two other places: - tools/memory_tool.py:144 — in a class docstring, claimed 'memory(action=read)' was a way to SEE poisoned entries - tools/memory_tool.py:201 — in a user-facing warning message, told the user to 'use memory(action=read) to inspect' Since the schema on line 683 only allows add/replace/remove, both references were misleading: the first claimed a way to inspect poisoned entries that doesn't exist, the second would error out when the user followed the warning. This commit removes both references: - Line 144: '...keep the original text so the user can still SEE poisoned entries by inspecting the source files directly, and remove them — silently dropping them would hide the attack from the user.' - Line 201: '...use memory(action=remove) to delete the original. (drop the read-action reference)' Followup to the previous commit on this branch. --------- Co-authored-by: KeyArgo <keyargo@argobox.com>	2026-06-19 16:09:30 -07:00
SHL0MS	d799284b15	feat(optional-skills/creative-ideation): expand to v2.1.0 method library (#42402 ) The optional-skills copy was still the v1.0.0 constraint-dispatch skill (SKILL.md + full-prompt-library.md only). This brings it up to the current tool: a situation-routed library of 22 named ideation methods drawn from working artists, scientists, designers, and writers. SKILL.md becomes a 4-step router (extract PHASE/DOMAIN/SPECIFICITY signals → apply overrides → route phase-then-domain → resolve ambiguity), with anti-slop operating rules and an anti-default check. Adds: - 22 method files under references/methods/ — oblique-strategies (Eno/Schmidt), oulipo, scamper, lateral-provocations (de Bono), triz (Altshuller), leverage-points (Meadows), pattern-languages (Alexander), compression-progress (Schmidhuber), analogy-and-blending, pataphysics, first-principles, polya, biomimicry, volume-generation, creative-discipline, premortem-and-inversion, defamiliarization, derive-and-mapping, affinity-diagrams, jobs-to-be-done, story-skeletons, chance-and-remix. Each: when/when-not, the actual cards/principles/operators, a procedure, a worked example, anti-slop notes. - references/method-catalog.md (index + when-to-use), heuristics.md (extended decision tree), anti-slop.md (rules applied to every output), exercises.md (time-boxed exercises). - full-prompt-library.md restructured into domain-affinity sections (general / software / physical / social / lists) so the no-direction default isn't developer-biased. Frontmatter: name aligned to directory slug (creative-ideation, folding in the fix from #18084); version 2.0.0→2.1.0; platforms field preserved. Original wttdotm-derived constraint dispatch is kept as the default path. Supersedes #19295 (which targeted the pre-move skills/ path). Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>	2026-06-19 15:40:02 -07:00
Gille	a7983d5ad7	fix(dashboard): hide sidecar sessions from history (#49269) * fix(dashboard): hide sidecar sessions from history * test(dashboard): allow sidecar source in session payload	2026-06-19 18:06:38 -04:00
kshitij	1a0ef1311c	Merge pull request #49264 from kshitijk4poor/salvage-picker-persist-49176 fix(gateway): persist inline-keyboard model-picker selections by default, matching /model (#49066)	2026-06-20 02:50:54 +05:30
kshitijk4poor	2099c7b531	test(gateway): make picker-persist tests hermetic and parametrized Simplify pass on the picker-persist coverage: - Stub list_picker_providers + resolve_display_context_length so the tests no longer make real outbound HTTP calls (OpenRouter catalog + Ollama /api/show) during picker setup and confirmation rendering. Runtime drops from ~11s to ~0.4s and the tests are now deterministic. - Collapse the two positive persist cases into one parametrize over the config seed (nested-dict vs flat-string), asserting the nested-dict invariant in both. - Assert the in-memory session override is applied in the --session case, closing a 'passes for the wrong reason' gap (config untouched AND the switch still took effect). - _FakePickerResult -> types.SimpleNamespace. Mutation re-checked on the final test: both persist cases fail on pre-fix slash_commands.py; the --session case passes on both.	2026-06-20 02:46:01 +05:30
kshitijk4poor	10fea06c19	test(gateway): cover inline-keyboard model-picker persistence Add regression coverage for the picker persist fix: drive the real _handle_model_command with a fake picker-capable adapter that captures the on_model_selected callback, fire a 'tap', and assert config.yaml is written (bare /model), left untouched (--session), and that a flat-string model: is coerced to a nested dict on a tap. Mutation-checked: the persist and coercion assertions fail on pre-fix slash_commands.py and pass on the fix.	2026-06-20 02:35:02 +05:30
Evo	2fe78d1ae3	fix(gateway): persist inline-keyboard model-picker selections by default #49066 made /model text and the CLI picker persist to config.yaml by default, but the gateway (Telegram/Discord/Matrix) inline-keyboard picker callback stayed session-only. Mirror the text path's persist block so a tapped model survives across launches like a typed one.	2026-06-20 02:32:44 +05:30
kshitij	01f581d8d2	Merge pull request #49254 from kshitijk4poor/salvage-windows-managed-node-49239 fix(windows): prefer managed node for whatsapp and desktop	2026-06-20 02:18:09 +05:30
kshitijk4poor	d4e7dd609d	refactor(windows): tidy managed-node resolver helpers Behavior-preserving cleanups on the managed-node resolver: - Hoist _candidate_node_command_names() out of the inner dir loop in find_hermes_node_executable (computed once, not per directory). - Drop redundant os.environ.copy() at the two with_hermes_node_path(os.environ.copy()) sites — the helper already copies os.environ when called with no argument (verified env-equivalent). - Add reciprocal keep-in-sync comments between iter_hermes_node_dirs() (hermes_constants.py) and hermesManagedNodePathEntries() (electron main.cjs), which mirror the same platform-ordering rule across the Python/Node boundary.	2026-06-20 02:12:16 +05:30
kshitijk4poor	fcc169057d	fix(windows): prefer managed npm for hermes update desktop-rebuild gate The `hermes update` desktop-rebuild gate still used a bare `shutil.which("npm")` presence check. On a Windows box where the only working npm is the Hermes-managed npm.cmd (not on PATH), the gate would skip the desktop rebuild even though _build_web_ui / cmd_gui can now find it via find_node_executable. Route the gate through the same resolver for full bug-class coverage. Surfaced during review of #49239.	2026-06-20 02:01:24 +05:30
helix4u	7a7b56d498	fix(windows): prefer managed node for whatsapp and desktop	2026-06-20 02:00:37 +05:30
hakanpak	38f1a923af	fix(gateway): rename the Telegram topic from /title, not only auto-titles Auto-generated session titles already rename the Telegram forum topic via the title_callback path, but the /title command only wrote the session title to the database. On a Telegram topic lane the visible topic kept its auto-assigned name, so a user who ran /title to override it saw no change. Propagate the user-chosen title to the topic by calling the existing _schedule_telegram_topic_title_rename helper on a successful /title set. It already no-ops off Telegram topic lanes and when auto-rename is disabled.	2026-06-20 01:54:16 +05:30
Teknium	866f1d65c4	chore(desktop): sync package.json version fallback to 0.17.0 (#49236)	2026-06-19 12:53:35 -07:00
teknium1	2bd1977d8f	chore: release v0.17.0 (2026.6.19)	2026-06-19 12:38:31 -07:00
emozilla	40722058e5	fix(mcp): keep short-TTL HTTP sessions alive with configurable ping keepalive MCP Streamable HTTP servers that garbage-collect idle sessions on a short TTL (e.g. Unreal Engine's editor MCP, ~15s) were unusable: the keepalive was hardcoded at 180s, so the session was always dead by the time it ran, and every idle tool call then landed on an expired session and paid the full reconnect path (observed hangs of 113-143s until interrupt, bounded only by the 300s tool_timeout). Two coordinated, backward-compatible changes: - Add per-server `keepalive_interval` (config.yaml, not an env var per the contribution rubric). Default 180s — byte-identical to the old hardcoded value when unset — floored at 5s. Servers with short session TTLs set it below their TTL so the session stays warm. - Switch the keepalive probe from `list_tools()` to `ping` (the MCP base protocol liveness primitive). On large servers `list_tools` pulled ~1 MB every cycle (830 tools = 1,068,041 bytes); `ping` is ~55 bytes and works uniformly across tool/prompt/resource servers. Tool-list changes still arrive out-of-band via notifications/tools/list_changed -> _refresh_tools. `ping` is an OPTIONAL utility, so to guarantee zero regression for a tool-capable server that doesn't implement it: the first -32601 latches `_ping_unsupported` and the probe falls back to the pre-ping `list_tools` path for that connection (no reconnect loop). The latch resets on each fresh connection (_discover_tools, all transport paths) so a server that gains ping support after a reconnect is re-probed with the cheap path. Non-(-32601) ping errors propagate as genuine liveness failures. Verified end-to-end against a live Unreal MCP server (idle 22s past the ~15s TTL -> post-idle tool call returns in 0.31s, no teardown) and with a simulated ping-less tool server driving the real keepalive loop (ping once, list_tools thereafter, no reconnect). 25/25 unit tests pass. 
Note: a separate upstream defect (modelcontextprotocol/python-sdk#2604) still tears down the whole session when one tool-call POST returns 4xx; that is not addressed here.	2026-06-19 12:16:33 -07:00
kshitij	4c5217b717	Merge pull request #49207 from kshitijk4poor/fix/cron-script-env-sanitize fix(cron): sanitize env for job script subprocesses	2026-06-20 00:36:26 +05:30
Teknium	ba49fb51a5	fix(discord): hydrate channel context when replying to a message (#49212 ) * fix(discord): hydrate channel context when replying to a message Replying to a message in a free-response (non-mention, threads-off) channel previously received only the 500-char "[Replying to: ...]" snippet — the history-backfill gate fired only for mention-gated channels and threads, so a reply got no surrounding channel context. Replies now route through the same _fetch_channel_context hydration that threads use. When the user replied to a specific (often older) message, a reply-anchored window is scanned ending at that message so the agent sees the exchange around what was pointed at, even when the target sits before the self-message partition. The two windows are merged chronologically and de-duplicated by message id. Also hardens the recent-window scan to skip non-conversational status bumps before the self-message partition check, and makes author-name resolution defensive against partial/deleted authors. * fix(discord): duck-type reply-target resolution instead of isinstance(discord.Message) The e2e suite stubs the discord module, so discord.Message is a MagicMock and isinstance(_resolved, discord.Message) raises 'isinstance() arg 2 must be a type'. Any object with an int .id works as a scan anchor, so resolve the reply target by duck-typing on .id and fall back to a _Snowflake from the reference message_id.	2026-06-19 12:03:08 -07:00
kshitijk4poor	f06508836d	docs(security): enumerate cron job scripts in §2.3 credential scoping The cron-script subprocess is now sanitized alongside shell/MCP/ code-exec children; §2.3 listed only the original three. Makes the _run_job_script docstring's §2.3 citation fully accurate. Follow-up to salvaged PR #49207.	2026-06-20 00:30:42 +05:30
kshitijk4poor	8dc0b18894	refactor(cron): copy os.environ before sanitizing for subprocess Matches the env= callsite convention at the other sanitized subprocess spawns (cua_backend dict(os.environ), gateway os.environ.copy()). Functionally equivalent — _sanitize_subprocess_env never mutates its input — but avoids handing the live mapping to the helper. Follow-up to salvaged PR #49207.	2026-06-20 00:29:46 +05:30
alt-glitch	16642e2769	fix(mcp): revert ACP rebuild to original; harden generation guard CI caught 3 ACP test failures (tests/acp/test_server.py, tests/acp/test_mcp_e2e.py). Root cause: routing ACP's tool-surface rebuild through the shared refresh_agent_mcp_tools helper (added in the round-2 pass) broke a deliberate, pre-existing ACP contract: - the ACP tests assert `agent.tools is <get_tool_definitions return>` (object identity) and an exact get_tool_definitions(enabled_toolsets=[...], disabled_toolsets=..., quiet_mode=True) call signature; the shared helper list()-copies and re-derives differently, breaking identity; and - the tests use a MagicMock agent whose _tool_snapshot_generation is a mock, so the new `int < published_gen` generation guard raised TypeError and the whole ACP refresh silently failed. ACP already preserves memory-provider tools (its own inject call) and excludes context_engine, so there was no bug to fix there — only over-reach. Reverted ACP to its original rebuild. (Same lesson as the gateway path: leave call sites that carry their own tested contract alone; a reviewer's "inert today, fragile" note meant leave-it, not change-it.) Also hardened the generation guard defensively: tolerate a non-int _tool_snapshot_generation (mock / partially-built agent) instead of throwing TypeError and silently failing the refresh.	2026-06-19 11:57:43 -07:00
alt-glitch	f3e967aae5	fix(mcp): round-3 polish — generation capture adjacency + gateway contract note Third review pass (Hermes subagent) declared convergence: no BLOCKING, the round-2 generation-aware publish / context-engine staging / CLI reload / ACP routing all verified correct by hand and by test. - agent_init: capture _tool_snapshot_generation immediately before the tool snapshot (was ~425 lines earlier); removes a harmless skew window so the recorded generation always matches the snapshot it describes. - gateway/run.py _execute_mcp_reload: keep preserving each cached agent's build-time enabled_toolsets EXACTLY (do NOT merge newly-connected servers like CLI/TUI do) and document WHY — gateway sessions can be deliberately locked down, and test_reload_mcp_preserves_per_agent_toolset_overrides asserts this. A reviewer suggested "parity" here; it would have violated that contract.	2026-06-19 11:57:43 -07:00
alt-glitch	88d523220f	fix(mcp): address adversarial review round 2 (stale-publish race, parity holes) Second review pass (Codex + Hermes subagent). Codex reproduced a real race with a two-thread harness; both converged on the remaining issues. - Generation-aware publish (fixes a lost-update race): two refresh callers (the late-refresh daemon and the between-turns prologue around turn 1) could each compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry generation could acquire the publish lock after a newer caller and clobber it, deleting just-landed tools. refresh_agent_mcp_tools now captures registry._generation before computing and refuses to publish a stale set; agent._tool_snapshot_generation tracks the published generation. - Context-engine routing names (_context_engine_tool_names) are now staged on a local and published atomically with the snapshot, and only claimed when this rebuild actually appended the schema — matching agent_init's dedup so a registry/plugin tool of the same name keeps its own dispatch. (Previously mutated live, before the publish lock, and on no-change refreshes.) - CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a server newly ENABLED in config mid-session wasn't picked up (TUI already re-resolved). Merge now-connected MCP server names into the override (unless the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in sync. Closes the CLI/TUI parity hole. - ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th sibling rebuild that re-injected memory tools but NOT context-engine tools and bypassed the atomic/name-diff path (inert today, fragile). - mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG (single source of truth) instead of a stale hardcoded 5.0 literal. - Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout fallback uses DEFAULT_CONFIG.	2026-06-19 11:57:43 -07:00
alt-glitch	b6e2a54a94	fix(mcp): address adversarial review round 1 (cache parity, gates, races) Consolidated findings from three independent reviewers (Codex, Claude Code, a Hermes subagent w/ the hermes-agent-dev skill): - BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently dropping post-build-injected memory-provider (mem0/honcho/…) and context- engine (lcm_) tools on every refresh. Now additive-preserving: re-applies the same injectors agent_init uses, staged on locals and published atomically. - Re-injection now honors the #5544 enabled_toolsets gate for context-engine tools, so a restricted-toolset platform can't get lcm_ leaked back in. - Atomic read-diff-publish under one lock: the returned `added` set and the (tools, valid_tool_names) pair are consistent even under concurrent callers (no half-swap, no TOCTOU). - background_review fork opts out (_skip_mcp_refresh) so its byte-identical tools[] cache parity with the parent is preserved. - CLI /reload-mcp routed through the shared helper (was a 4th divergent copy with the same clobber bug + missing disabled_toolsets). - Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user just enabled in config this session is picked up; automatic paths reuse the agent's build-time selection. - mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the between-turns refresh, so the startup wait is only a small turn-1 UX bump rather than a heavy dead-server latency penalty. - has_registered_mcp_tools checks registered TOOLS (not connected servers) so a zero-tool/prompt-only server doesn't make the per-turn hook fire forever. - Tests: rewrote the thread-safety test to actually exercise the write path (alternating tool sets), added the #5544-gate regression, the memory/context preservation regression, and a "callable next turn via valid_tool_names" contract; removed a dead monkeypatch line.	2026-06-19 11:57:43 -07:00
alt-glitch	3713483874	fix(mcp): refresh agent tool snapshot between turns (cache-safe late-binding) A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting after the agent's one-time tool snapshot was uncallable for the rest of the session. The merged pre-first-turn late-refresh only helps during the dead air before the user's first keystroke; once a turn starts it bails to protect the prompt cache, so a user who types before the server connects never gets the tools without a manual /reload-mcp. Refresh the snapshot in the per-turn prologue (build_turn_context), before this turn's first API call assembles tools=. This is cache-safe by construction: the refresh only ever extends a fresh request prefix at a turn boundary, never mutates the cached prefix of an in-flight turn. So late tools become callable on the user's NEXT turn automatically, with no /reload-mcp and no cache cost. - tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with no MCP servers (the common case) skip the rebuild entirely. - agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the top of the prologue when MCP servers are registered. - tests: 3 contract tests through the real build_turn_context (adds late tool; skipped when no servers; no snapshot churn when unchanged). .hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.	2026-06-19 11:57:43 -07:00
alt-glitch	93d6e73028	fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway) MCP servers that connect after the agent's one-time tool snapshot were invisible for the whole session. Two root causes, fixed together: 1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers commonly take 2-6s on a cold connect, so they missed the window and their tools never entered the agent's snapshot. `thread.join(timeout)` already returns the instant discovery completes, so raising the bound costs ~0s for the common case (no MCP / fast servers) and only ever blocks for a genuinely-pending server, capped so a dead server can't freeze startup. The bound is now configurable via `mcp_discovery_timeout` (config.yaml, default 5.0s). 2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh thread), and the late-refresh detected changes by tool COUNT — missing an equal-size add/remove swap. Consolidated into one shared `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by tool NAME, mutates the agent under a lock (thread-safe), and respects the agent's own enabled/disabled toolsets. The late-binding refresh keeps its pre-first-turn cache-safety guard: it never rebuilds the tool list once a turn has started, so the cached prompt prefix is never invalidated mid-conversation. Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the name-based diff, in-place mutation, agent-scoped filtering, thread safety, and the config-driven discovery bound (incl. instant-return when nothing is pending). 75 passed across the touched areas.	2026-06-19 11:57:43 -07:00
kshitijk4poor	2d978bf44a	test(cron): make env-sanitize probe var deterministic next(iter(frozenset)) picked a different blocklist var each run (PYTHONHASHSEED-dependent), hurting reproducibility. sorted()[0] keeps the invariant-style assertion (any real blocklisted var) while making failures reproducible. Follow-up to salvaged PR #49207.	2026-06-20 00:22:55 +05:30
teknium1	746c46d610	chore: add lgalabru to AUTHOR_MAP for PR #43112 salvage	2026-06-19 11:46:25 -07:00
Ludo Galabru	239740a19e	feat(tools): MCP elicitation handler with gateway-aware approval routing Wires support for the MCP `elicitation/create` request (Python SDK 1.11+) so MCP servers can ask the user to confirm sensitive operations mid-tool-call (payment authorization, OAuth confirmation, etc.) instead of failing closed or requiring out-of-band biometrics. Behavior: - `tools/mcp_tool.py` adds `ElicitationHandler`, attached per server task and passed to `ClientSession` as `elicitation_callback`. Form-mode requests route through the existing approval system; URL-mode requests decline cleanly (out of scope for this pass). - `tools/approval.py` adds `request_elicitation_consent()`, which dispatches to whichever surface owns the active session — `_await_gateway_decision` for Telegram / Slack / etc. (so the approval prompt lands on the right platform), `prompt_dangerous_approval` for CLI / TUI. Fails closed on timeout, missing notify_cb, or exception. - The MCP tool wrapper snapshots `contextvars.copy_context()` into `MCPServerTask._pending_call_context` before each `session.call_tool` and clears it after. The recv-loop task that dispatches incoming `elicitation/create` requests does not inherit the agent task's contextvars (HERMES_SESSION_PLATFORM and friends), so without the bridge `_is_gateway_approval_context()` returns False on every gateway session and the elicitation falls through to a CLI prompt that has no TTY → fail-closed decline. The handler now reads the snapshot via its `owner` back-reference and replays it through `Context.copy().run(...)` so attribution survives the task hop. 
Tests (`tests/tools/test_mcp_elicitation.py`): - form-mode accept / decline / cancel - URL-mode declined without prompting - exception in approval system → decline - timeout in approval → cancel - context-bridge regression tests (replay observed in consent call, missing-context fallback, multiple-replay safety, owner with cleared `_pending_call_context`) Verified end-to-end against pay's MCP server on macOS: agent message arrives via Telegram, agent calls `mcp_pay_curl` against a paid endpoint, pay returns 402, ElicitationHandler routes the approval prompt back to the originating Telegram chat, user replies in TG, the curl tool signs and completes. Platforms tested: macOS 14 (darwin/arm64). No Unix-only syscalls introduced; Windows footgun checker passes on the touched files.	2026-06-19 11:46:25 -07:00
0z1-ghb	da7253215d	fix(cron): sanitize env for job script subprocesses Cron no_agent and pre-check scripts ran with the full gateway/agent environment, allowing scripts under HERMES_HOME/scripts/ to read provider credentials. Apply _sanitize_subprocess_env like terminal and MCP paths (SECURITY.md section 2.3). Add regression test asserting blocklisted provider vars are absent in the child process.	2026-06-20 00:13:11 +05:30
Teknium	26e76a75e5	feat(telegram): opt-in Online/Offline bot status indicator (#49134 ) Sets the Telegram bot's short description (the line under its name) to "Online" on gateway connect and "Offline" on clean disconnect, gated behind extra.status_indicator (off by default). Telegram bots have no presence/online dot — that's a user-account feature the Bot API doesn't expose for bots. The short description is the closest available surface, so this gives users a way to tell whether the gateway is up from the bot's profile. - New extra.status_indicator flag (+ status_online/status_offline text overrides), read in __init__ via config.extra — no config-schema change. - _set_status_indicator() helper: best-effort, swallows API errors so it never blocks connect/disconnect; truncates to Telegram's 120-char cap. - Wired Online after _mark_connected(), Offline at top of disconnect() while the bot HTTP client is still alive. - 9 unit tests + Telegram docs section. Requested by @ilTrumpista, cc @Teknium.	2026-06-19 11:38:39 -07:00
alt-glitch	990273d90a	fix(agent): accept pixel-correct image downscale when bytes grow (#48013) The image-too-large reactive shrink (try_shrink_image_parts_in_messages) conflated two independent constraints: it always rejected a resize whose re-encoded bytes were >= the original, even when the shrink was driven by a PIXEL-DIMENSION cap (Anthropic many-image 2000px) rather than the byte budget. Downscaled screenshot PNGs routinely re-encode LARGER in bytes, so the dimension-correct result was discarded and the image left oversized -> the provider re-rejected on retry and the session wedged forever. Fix: track which constraint triggered the shrink (bytes vs dimension) and gate the accept on the SAME axis. * dimension path: accept the result as long as it is now within max_dimension, regardless of byte size (verify via Pillow; fall back to the byte gate only when the re-encode can't be decoded). * bytes path: still require bytes to shrink, but ALSO re-check the per-side cap when it's active — _resize_image_for_vision returns a best-effort, possibly over-cap blob when it exhausts its halving budget on a very-high-aspect image, so a byte-shrink alone can leave it over the dimension cap and re-brick on retry. Extend the unshrinkable-oversized guard to the pixel axis so a partial shrink doesn't burn the one-shot retry. Single shared agent path -> fixes CLI, TUI, and gateway alike. Adds a real-Pillow runnable proof (repro_48013_image_shrink_brick.py) that reproduces the issue's per-image table (bricks 3/5 before, passes 5/5 after) plus unit invariants for the dimension and bytes accept/reject paths, partial-progress accounting, and the bytes-path still-over-cap regression surfaced by adversarial review. Closes #48013	2026-06-19 11:37:51 -07:00
Teknium	ac00e73688	feat(dashboard): add a reasoning-effort picker to the chat sidebar (#49141) The web dashboard only showed a read-only "Reasoning" capability badge with no way to set the effort level — unlike the desktop app, which has an effort radio in its composer model menu. This adds a picker so the two surfaces reach parity. - ReasoningPicker: a Select rendered in the chat sidebar, gated on the effective model's supports_reasoning capability (from /api/model/info). Reads/writes agent.reasoning_effort via the existing config REST endpoints (read-modify-write, the dashboard's single-key save pattern), so the value lands in the config the agent boots a fresh chat from. Options mirror the desktop: Off/Minimal/Low/Medium/High/Max. - ChatSidebar: capture supports_reasoning from the model-info fetch and render the picker; on change, show the same 'apply on /new or reload' notice the model switch uses. - reasoning-effort.ts: DOM-free helpers (normalizeEffort + options) so the node-env vitest harness can cover the resolution logic, plus tests.	2026-06-19 11:37:40 -07:00
Teknium	c06898098b	fix(cli): clear viewport on width-change resize so the status bar can't duplicate (#49120) The classic CLI status bar could appear twice after a horizontal terminal resize — two bars at two widths with two different elapsed readings. Root cause: prompt_toolkit's Application._on_resize() calls renderer.erase(), which does cursor_up(_cursor_pos.y) + erase_down() using the _cursor_pos.y cached from the LAST render at the OLD width (renderer.py:745). On a column shrink the terminal reflows the already-painted full-width chrome into extra physical rows, so the cached y undershoots: cursor_up doesn't climb past the reflowed rows and erase_down leaves the old bar stranded ABOVE the live origin. The next paint stacks a fresh bar below it. The existing post-resize suppression hides the NEW bar for ~0.35s but never erases the already-reflowed OLD one, so the ghost survives the whole window. Ctrl+L / /redraw clears it, confirming a viewport wipe is the fix. Fix: on a WIDTH change, _recover_after_resize now routes through the same recovery as Ctrl+L — _clear_prompt_toolkit_screen(rebuild_scrollback=False) (CSI 2J, visible viewport only) + _replay_output_history() — BEFORE delegating to prompt_toolkit's resize. Banner-safe: 2J never touches scrollback history (that's CSI 3J, which we don't send here), so the startup banner is preserved. Rows-only resizes skip the clear (no reflow → no ghost) to avoid an extra repaint. Tracks _last_resize_width to distinguish the two. Tests: replace the now-obsolete 'never clears on resize' assertion with two tests — rows-only resize delegates without clearing; width change clears the viewport + replays and never wipes scrollback.	2026-06-19 08:43:42 -07:00
Teknium	b266ad748c	chore(deps): npm audit fix — bump transitive undici to clear advisories (#49113) Resolves the 2 npm audit advisories (1 high, 1 moderate), both from transitive undici: - undici 6.26.0 -> 6.27.0 (high: TLS bypass / header injection / response queue poisoning class, via node-gyp + ui-tui) - jsdom's undici 7.27.2 -> 7.28.0 (moderate, via jsdom test dep) Both are in-range bumps (no --force). Lockfile also reconciled two pre-existing manifest drifts during the install: dompurify 3.4.10 -> 3.4.11 (in-range patch) and the web workspace's already-declared vitest ^4.1.5 devDep. No package.json changes. npm audit reports 0 vulnerabilities in root, ui-tui, and apps/desktop after.	2026-06-19 08:20:03 -07:00
brooklyn!	0e8b76532e	fix(desktop): rename "Restart messaging" → "Restart gateway", surface restarts in the statusbar, make logs selectable (#49094) * fix(desktop): rename "Restart messaging" -> "Restart gateway" The Command Center control restarts the whole messaging gateway, yet was labelled "Restart messaging" while the status line above it reads "Messaging gateway running/stopped". Rename the i18n key to match what it does, across all 4 locales. * feat(desktop): restart the gateway from Cmd+K, with statusbar spinner feedback Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a new Cmd+K "Restart gateway" action. While a restart is in flight the statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads "restarting…", returning to its real state on completion — driven by a $gatewayRestarting atom, not a transient toast or the generic "Agents running" counter. The helper owns its error handling so fire-and-forget callers can't leak an unhandled rejection; only a failure toasts. * fix(desktop): offer a Restart gateway action on messaging save/toggle toasts The "setup saved" and "platform enabled/disabled" toasts told users their change needs a gateway restart but left it a separate hunt. Attach a "Restart gateway" action (the shared runGatewayRestart), and reword the copy to state the pending consequence ("...takes effect after a gateway restart") now that the button carries the verb. Updated all 4 locales. * fix(desktop): make rendered logs selectable so they can be copied The global body { user-select: none } left log surfaces unselectable. Opt them back in via the existing data-selectable-text convention — at the shared LogView primitive (boot-failure + bootstrap install overlays) plus Command Center recent logs, toolset post-setup output, notification detail, and subagent stream/file lines.	2026-06-19 10:09:15 -05:00
Brooklyn Nicholson	929dbf7778	fix(desktop): make rendered logs selectable so they can be copied The global body { user-select: none } left log surfaces unselectable. Opt them back in via the existing data-selectable-text convention — at the shared LogView primitive (boot-failure + bootstrap install overlays) plus Command Center recent logs, toolset post-setup output, notification detail, and subagent stream/file lines.	2026-06-19 10:03:46 -05:00
Brooklyn Nicholson	a1639921ac	fix(desktop): offer a Restart gateway action on messaging save/toggle toasts The "setup saved" and "platform enabled/disabled" toasts told users their change needs a gateway restart but left it a separate hunt. Attach a "Restart gateway" action (the shared runGatewayRestart), and reword the copy to state the pending consequence ("...takes effect after a gateway restart") now that the button carries the verb. Updated all 4 locales.	2026-06-19 10:03:24 -05:00
Brooklyn Nicholson	553cf4f977	feat(desktop): restart the gateway from Cmd+K, with statusbar spinner feedback Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a new Cmd+K "Restart gateway" action. While a restart is in flight the statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads "restarting…", returning to its real state on completion — driven by a $gatewayRestarting atom, not a transient toast or the generic "Agents running" counter. The helper owns its error handling so fire-and-forget callers can't leak an unhandled rejection; only a failure toasts.	2026-06-19 10:02:54 -05:00
Brooklyn Nicholson	6308d3416a	fix(desktop): rename "Restart messaging" -> "Restart gateway" The Command Center control restarts the whole messaging gateway, yet was labelled "Restart messaging" while the status line above it reads "Messaging gateway running/stopped". Rename the i18n key to match what it does, across all 4 locales.	2026-06-19 10:02:21 -05:00
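
The accept rule described in 990273d90a above (gate the accept on the same axis that triggered the shrink) can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the code from the commit: `accept_shrink`, its parameters, and the `"dimension"`/`"bytes"` trigger strings are hypothetical, and the resized dimensions are passed in directly instead of being decoded with Pillow. The behaviour of the bytes path when the result cannot be decoded is not stated in the commit message; the sketch assumes the byte gate alone decides in that case.

```python
from typing import Optional, Tuple


def accept_shrink(
    trigger: str,
    original_bytes: int,
    resized_bytes: int,
    resized_size: Optional[Tuple[int, int]],
    max_dimension: Optional[int],
) -> bool:
    """Decide whether to keep a resized image (hypothetical helper).

    trigger: "dimension" or "bytes", the constraint that caused the shrink.
    resized_size: (width, height) of the re-encoded image, or None if it
        could not be decoded.
    max_dimension: per-side pixel cap, or None when no cap is active.
        A "dimension" trigger is assumed to always come with a cap.
    """
    within_cap = (
        resized_size is not None
        and max_dimension is not None
        and max(resized_size) <= max_dimension
    )
    bytes_shrank = resized_bytes < original_bytes

    if trigger == "dimension":
        if resized_size is None:
            # Pixels cannot be verified: fall back to the byte gate.
            return bytes_shrank
        # Pixel-driven shrink: byte growth is acceptable.
        return within_cap

    # Byte-driven shrink: bytes must shrink first.
    if not bytes_shrank:
        return False
    # When a cap is active and the size is known, it must hold as well.
    if max_dimension is not None and resized_size is not None:
        return within_cap
    return True
```

With this rule a downscaled PNG that grew in bytes but now fits the cap is kept on the dimension path, while a byte-smaller result that is still over the cap is rejected on the bytes path.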

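The environment sanitization described in da7253215d above can be illustrated with a small sketch. The project's own `_sanitize_subprocess_env` is not reproduced here; `sanitize_env`, `run_script`, and the suffix blocklist below are hypothetical stand-ins chosen for the example, and the real blocklist may be defined quite differently.

```python
import os
import subprocess
import sys
from typing import Dict, Iterable, Mapping, Optional, Sequence

# Illustrative blocklist only (an assumption, not the project's list).
BLOCKED_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET")


def sanitize_env(
    env: Mapping[str, str],
    blocked_suffixes: Iterable[str] = BLOCKED_SUFFIXES,
) -> Dict[str, str]:
    """Return a copy of env without variables whose names end in a blocked suffix."""
    suffixes = tuple(blocked_suffixes)
    return {k: v for k, v in env.items() if not k.upper().endswith(suffixes)}


def run_script(
    argv: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> subprocess.CompletedProcess:
    """Run a child process with a sanitized copy of the given environment."""
    base = os.environ if env is None else env
    return subprocess.run(
        list(argv),
        env=sanitize_env(base),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    parent = dict(os.environ)
    parent["EXAMPLE_API_KEY"] = "secret"  # should not reach the child
    parent["HARMLESS_FLAG"] = "1"         # should reach the child
    probe = (
        "import os; "
        "print(os.environ.get('EXAMPLE_API_KEY'), os.environ.get('HARMLESS_FLAG'))"
    )
    print(run_script([sys.executable, "-c", probe], env=parent).stdout.strip())
```

A regression test in the spirit of the one mentioned in the commit would assert that the blocklisted name is absent in the child while unrelated variables pass through.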

12216 commits