hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
BaxBit	bbf02c3224	fix(gateway): validate Svix webhook signatures (#30200)	2026-05-24 04:45:13 -07:00
Jiaming Guo	ee002e7fc5	fix(dashboard): require auth for plugin rescan (#27340)	2026-05-24 04:45:07 -07:00
Teknium	5acaeba2bb	fix(mcp): raise ImportError instead of NameError when stdio SDK missing (#31450) When the 'mcp' Python SDK isn't installed, _run_stdio leaked a bare 'NameError: name StdioServerParameters is not defined' because the top-level 'from mcp import ...' fails inside try/except ImportError, leaving the names unbound at module scope. Mirror the _MCP_HTTP_AVAILABLE gate that _run_http already had: raise a clear ImportError with install instructions instead. Fixes #30904	2026-05-24 04:44:59 -07:00
xxxigm	6cafcf9c77	test(streaming): pin partial-stream-stub finish_reason + continuation contract Three test classes lock in the #30963 fix: 1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call through the two recovery branches and asserts: - text-only partial → finish_reason="length" (the new behaviour), - mid-tool-call partial → finish_reason="stop" (unchanged on purpose). 2. TestLengthContinuationPromptBranching — pure-Python check on the branch that picks the continuation prompt by response.id. Locks the network error wording for partial-stream-stub vs. the output-length wording for everything else. 3. TestConversationLoopPartialStreamContinuation — feeds a stub + continuation pair into run_conversation, verifies the loop makes a second API call (instead of exiting with text_response(stop)), confirms the network-error continuation prompt actually reaches the model on call #2, and that final_response stitches both halves. Refs: NousResearch/hermes-agent#30963	2026-05-24 04:35:15 -07:00
xxxigm	20b3703a42	fix(conversation-loop): tailor length-continuation prompt for partial stream The length-continue path's user-facing vprint and continuation prompt both told the model "your response was truncated by the output length limit." That's a lie when the stub came from a partial-stream network error (issue #30963) — and a lie the model can detect, leading to "I wasn't truncated, I'm done" no-op responses that defeat the continuation entirely. Detect the partial-stream-stub via response.id and swap in: - vprint: "Stream interrupted by network error (finish_reason='length' on partial-stream-stub)" - prompt: "[System: The previous response was cut off by a network error mid-stream. Continue exactly where you left off. Do not restart or repeat prior text. Finish the answer directly.]" Real length truncations still see the original "truncated by output length limit" prompt — the model needs to know which class of failure it's recovering from. Same length_continue_retries=3 budget, truncated_response_parts merging, and final-response stitching infrastructure on both branches. Refs: NousResearch/hermes-agent#30963	2026-05-24 04:35:15 -07:00
xxxigm	9140be7c22	fix(streaming): emit finish_reason=length on text-only partial-stream stub When the API connection drops mid-stream after text deltas have already been delivered, chat_completion_helpers returned a stub response with finish_reason=stop. The conversation loop then classified the stub as a clean text completion (text_response(finish_reason=stop)) and exited with iteration budget remaining — even when the goal-judge verdict came back as "continue" milliseconds later (issue #30963). Switch the text-only partial-stream stub to finish_reason=length. The existing length-continuation path (length_continue_retries up to 3, "continue exactly where you left off" prompt, partial parts merged into final_response) then fires automatically: the partial assistant content is persisted, the model is asked to continue from the cut point, and the loop keeps making progress against the goal. The mid-tool-call branch keeps finish_reason=stop on purpose — its user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry rather than auto-replaying a tool call with possible side effects. #5544's "no duplicate message" contract is preserved verbatim: the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta. Refs: NousResearch/hermes-agent#30963	2026-05-24 04:35:15 -07:00
teknium1	60d20a37c9	fix(acp): only deliver final_response after streaming when transformed PR #29119 dropped the 'not streamed_message' guard unconditionally so that plugin-transformed responses (transform_llm_output hook) would reach ACP clients. That regressed test_prompt_does_not_duplicate_streamed_final_message: when no transform happened, the streamed text was re-sent as a duplicate final delivery. Tighten the condition to mirror the gateway side: deliver after streaming only when response_transformed=True. Otherwise keep the old guard. Adds test_prompt_delivers_transformed_response_after_streaming so the transformed path stays covered.	2026-05-24 04:31:13 -07:00
teknium1	26088ca669	chore: map kenyon1977@gmail.com for PR #29119 salvage	2026-05-24 04:31:13 -07:00
teknium1	b9f533af0a	test(gateway): regression for plugin-transformed response after streaming Adds a test that fails without the gateway fix, exercising the response_transformed=True branch in _finalize_response: a streamed response whose final text was modified by a transform_llm_output plugin hook must be edit_message'd in place (not duplicate-sent), with already_sent=True so the normal final-send is skipped. Also drops two minor leftovers from the salvaged PR #29119: * accumulated_text property on GatewayStreamConsumer (unused) * duplicate _response_transformed=False inside the hook try block	2026-05-24 04:31:13 -07:00
kenyonxu	5cb21e3fb5	fix(gateway): edit streamed message instead of sending duplicate when response_transformed When a transform_llm_output hook appends content after streaming, the previous fix skipped the final-send suppression which caused the full response to be sent as a NEW message (duplicate). Instead, edit the existing streamed message in-place to append the transformed content, then set already_sent=True. Added stream_consumer.message_id and .accumulated_text public properties.	2026-05-24 04:31:13 -07:00
kenyonxu	a4ceead796	fix(gateway): propagate response_transformed flag through run_sync return dict run_sync() cherry-picks fields from the run_conversation result dict into a new response dict for the gateway. response_transformed was missing from the cherry-pick list, so the gateway always saw it as False and suppressed the final send even though a transform_llm_output hook had modified the content.	2026-05-24 04:31:13 -07:00
kenyonxu	8edeebe6d7	fix: propagate response_transformed flag — plugin hook output survives streaming suppression When a transform_llm_output hook modifies final_response after streaming, the gateway was silently discarding the transformed content because streamed=True / content_delivered=True triggered the final-send suppression. Three changes: 1. conversation_loop: set `_response_transformed=True` when a transform_llm_output hook returns a non-empty string, and expose it as `response_transformed` in the result dict. 2. gateway/run: skip the final-send suppression when `response_transformed` is True — the transformed response must reach the client even if streaming already sent the original text. 3. acp_adapter/server: remove `not streamed_message` guard so final_response is always delivered (ACP path fixed separately).	2026-05-24 04:31:13 -07:00
kenyonxu	7eb6c7f489	fix(acp): deliver final_response after streaming — transform_llm_output hook now visible When streaming is active, streamed_message=True skipped the final_response update, causing plugin hooks like transform_llm_output to be silently invisible. Remove the `not streamed_message` guard so the final response (possibly transformed by plugins) is always delivered to the ACP client.	2026-05-24 04:31:13 -07:00
Teknium	197f63f454	fix(feishu): require webhook auth secret and honor config extras (#30746)	2026-05-24 04:27:28 -07:00
Teknium	bdb97b8573	fix(feishu): enforce auth and chat binding for approval buttons (#30744)	2026-05-24 04:27:17 -07:00
Teknium	485292ac7d	fix(feishu): authorize interactive exec approval callbacks (#30739)	2026-05-24 04:26:57 -07:00
Teknium	be27bfed01	security: harden API server key placeholder handling (#30738)	2026-05-24 04:25:32 -07:00
Teknium	2df2f9190b	fix(docker): keep dashboard side-process loopback by default (#30740)	2026-05-24 04:25:28 -07:00
Teknium	4ca77f1059	Harden msgraph webhook auth requirements (#30169)	2026-05-24 04:25:20 -07:00
Teknium	3e78e353d7	fix(qqbot): authorize approval button interactions by session owner (#30737)	2026-05-24 04:25:12 -07:00
Teknium	e4a1220f83	security: restrict default webhook toolset capabilities (#30745)	2026-05-24 04:24:54 -07:00
Teknium	c3caca6584	fix(gateway): remove discord role allowlist auth bypass (#30742)	2026-05-24 04:24:49 -07:00
Teknium	1f897b0dc9	fix(gateway): stop enabling dingtalk allow-all during setup (#30743)	2026-05-24 04:24:44 -07:00
Teknium	9732559864	fix(security): restrict dashboard websockets to loopback clients (#30741)	2026-05-24 04:24:40 -07:00
Teknium	bc3f1f4f34	feat(secrets/bitwarden): EU Cloud + self-hosted server URL support (#31378) Closes #31370. bws defaults to the US identity endpoint, so EU Cloud and self-hosted machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"} during 'hermes secrets bitwarden setup'. The token is valid — it's just being checked against the wrong region. Add a Bitwarden region step to the wizard between the access-token and project-list steps: Step 1 Install bws Step 2 Provide access token Step 3 Pick region <-- new (US / EU / self-hosted-custom-URL) Step 4 Pick project (now talks to the right endpoint) Step 5 Test fetch Region is stored in config.yaml as secrets.bitwarden.server_url and plumbed into every bws subprocess as BWS_SERVER_URL (project list, secret list, test fetch, and the env_loader startup pull). Also: - Non-interactive: 'hermes secrets bitwarden setup --server-url ...' - Pre-existing BWS_SERVER_URL in the shell is detected and reused - Cache key includes server_url so EU/US fetches don't collide - 'hermes secrets bitwarden status' shows the configured region - 'invalid_client' / '400 Bad Request' from bws now triggers a hint pointing at the region setting instead of looking like a bad token	2026-05-24 02:19:57 -07:00
Teknium	c9b3eeabdc	fix(cli): decouple tool_progress=verbose from global DEBUG logging (#31379) PR #`6a1aa420e` coupled `display.tool_progress: verbose` (a per-tool display toggle for full args / results / think blocks) to `self.verbose` — which controls root-logger DEBUG level. Result: setting tool_progress: verbose in config silently flipped every module in the process to DEBUG and flooded the terminal with internal logging, far beyond just full tool calls. The two concepts are separate: - `tool_progress_mode == 'verbose'` → display behavior (tool rendering) - `self.verbose` → logging behavior (root logger → DEBUG, line 9795) This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback plumbing but severs the verbose-display → debug-logging link. Changes: - cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no longer auto-True when tool_progress_mode == 'verbose'. - cli.py:_toggle_verbose — slash-cycle through tool progress modes no longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`. - cli.py:9355 — fix misleading label (drop 'and debug logs'). - tui_gateway/server.py:_make_agent — same decoupling on the TUI side (verbose_logging no longer derived from tool_progress_mode). - tests/cli/test_tool_progress_scrollback.py — invert the test that asserted the broken coupling; add coverage for explicit `--verbose` still enabling DEBUG independent of tool_progress. Live verified: - tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines - --verbose flag explicit → 32 DEBUG/INFO log lines (as expected)	2026-05-24 02:19:20 -07:00
AhmetArif0	5848174374	fix(wecom): guard flush task against cancel-delivery race to prevent message loss When asyncio.sleep() fires just before Task.cancel() is called, CPython sets _must_cancel=True but cannot cancel the already-completed sleep future, so CancelledError is delivered at the next await (handle_message) rather than at the sleep. By that point the superseded task has already popped the merged event from _pending_text_batches, so the superseding task sees an empty batch and silently drops the message. Fix: add a synchronous task-registry check between the sleep and the pop. No await between the check and the pop means no other coroutine can interleave, so the guard is race-free.	2026-05-24 01:33:40 -07:00
Teknium	1bed4e8eed	fix(gateway): drop text snippet from debounce debug log (CodeQL) CodeQL py/clear-text-logging-sensitive-data flagged the candidate-accept debug log including event.text[:60]. Log text_len instead — sufficient for debugging burst behavior without surfacing message contents. Co-authored-by: Paulo Nascimento <pnascimento9596@gmail.com>	2026-05-24 01:31:45 -07:00
Teknium	51bb8c0a9e	chore: map pnascimento9596@gmail.com for PR #31235 salvage	2026-05-24 01:31:45 -07:00
Paulo Nascimento	7abd62719b	gateway: debounce queued text follow-ups	2026-05-24 01:31:45 -07:00
AhmetArif0	21db250034	fix(wecom-callback): retry send with fresh token on errcode 40001/42001 When WeCom returns errcode=40001 (invalid credential) or 42001 (token expired), send() was returning a failure without evicting the bad token from _access_tokens. All subsequent sends then kept using the same invalid cached token until its TTL naturally expired (~7200s). Fix: on the first token-rejection errcode, evict the cache entry and retry once with a freshly fetched token. Non-token errcodes fail immediately as before. If the refreshed token also fails, the error is returned without looping further. Adds four regression tests covering: successful retry on 40001, successful retry on 42001, no retry on unrelated errcode, and clean failure when the refresh does not help.	2026-05-24 01:30:47 -07:00
Teknium	d3c167b644	fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290) * fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint Adds a soft guard so an agent running under one Hermes profile cannot silently edit a different profile's skills/plugins/cron/memories. Three layers: A. agent/file_safety.classify_cross_profile_target Classifies a write target against the active HERMES_HOME. Returns a {active_profile, target_profile, area, target_path} dict when the path lands in another profile's scoped area. PROFILE_SCOPED_AREAS = (skills, plugins, cron, memories). get_cross_profile_warning() wraps it into a model-facing error string that names both profiles, names the area, and points at the cross_profile=True bypass. Defense-in-depth, NOT a security boundary — the terminal tool runs as the same OS user and can write any of these paths directly. The guard exists to prevent confused-agent corruption, not to stop a determined attacker. SECURITY.md §3.2 (terminal-bypass posture) still applies. Wired into tools/file_tools.write_file_tool and patch_tool with a cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both advertise cross_profile so the model can pass it after explicit user direction. patch_tool extracts target paths from V4A patch bodies before checking (same shape as the existing sensitive-path check). skill_manage is already scoped to the active profile's SKILLS_DIR by construction, so no extra guard wiring is needed there. The D-side error message (below) still names other profiles when the skill exists elsewhere. B. agent/system_prompt One deterministic line near the environment-hints block names the active profile and tells the model not to modify another profile's skills/plugins/cron/memories without explicit direction. Profile name is stable for the lifetime of the AIAgent, so the line is prompt-cache-safe. D. 
tools/skill_manager_tool._skill_not_found_error Replaces the bare "Skill 'X' not found." with a message that: - names the active profile, - searches OTHER profiles' skills dirs for the same name, - names the profile(s) where the skill exists and the path, - suggests `hermes -p <name>` to switch profiles, or cross_profile=True for an explicit edit. All 5 "not found" sites in skill_manager_tool (edit, patch, delete, write_file, remove_file) now go through the helper. Reference incident (May 2026): a hermes-security profile session edited skills under both ~/.hermes/profiles/hermes-security/skills/ AND ~/.hermes/skills/ (the default profile's skills) without realizing the second path belonged to a different profile. Three of the four skill files needed manual restoration afterward. What this PR does NOT do: * No hard block. The terminal tool can still touch any of these paths with no guard — same posture as the dangerous-command approval flow. SECURITY.md §3.2 applies. * No regex sweep on terminal commands for cross-profile paths. That direction is a Skills-Guard-style arms race (cd + relative paths, base64, etc.) and would false-positive on legitimate cross-profile reads. Filed as a follow-up. * No on-disk path migration. ~/.hermes/skills/ remains the default profile's skills dir; this PR is about telling the agent about that boundary, not changing the layout. 
Tests: tests/agent/test_file_safety_cross_profile.py (16 tests) - _resolve_active_profile_name covers default/named/failure paths - classify_cross_profile_target covers all four scoped areas, both directions (default → named, named → default, named → named), non-Hermes paths, and root-level config files - get_cross_profile_warning covers in-profile no-op, cross-profile message shape, and the defense-in-depth self-documentation tests/tools/test_cross_profile_guard.py (12 tests) - write_file: in-profile allow, cross-profile block, cross_profile=True bypass, non-Hermes pass-through - patch: replace-mode block, cross_profile=True bypass, V4A patch path extraction - skill_manage: error names the other profile (single + multiple), missing-everywhere falls back to skills_list hint - system prompt: contract-level checks (both branches present, cross_profile=True mentioned, ~/.hermes/profiles/ referenced) All 207 existing tests in file_safety/file_operations/skill_manager still pass. 10 system-prompt tests still pass. E2E verified: the exact incident scenario (security profile editing default's hermes-agent-dev skill) is now blocked with the warning message; cross_profile=True unblocks. * fix(code_execution): add cross_profile to write_file/patch stubs The cross_profile kwarg added to write_file_tool/patch_tool needs to flow through the execute_code sandbox stubs in _TOOL_STUBS so the test_stubs_cover_all_schema_params drift test passes. Without this, scripts running inside execute_code couldn't pass cross_profile=True through hermes_tools.write_file(). Caught by CI on PR #31290.	2026-05-24 00:38:17 -07:00
Teknium	b207dc28b3	feat(kanban): --ids bulk promote + AUTHOR_MAP entry for #29464 Adds an --ids flag to 'hermes kanban promote' mirroring the existing block/schedule convention, so the marquee use case from issue #28822 (promote all children of a closed organizational parent in one shot) doesn't require a shell loop. Single-id JSON output stays a flat object for back-compat; bulk emits a list. Dedupes positional + --ids so the same id can't be promoted twice in one call. 5 new CLI-level tests cover bulk happy path, partial-failure exit code, JSON shapes, and dedup. Also adds the thedavidmurray noreply-email -> github-login mapping in scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP contributor-credit check.	2026-05-23 23:10:36 -07:00
David Murray	d46adad22f	feat(cli): kanban promote verb for manual todo->ready recovery Adds `hermes kanban promote <task_id>` for manual lifecycle recovery when an auto-promote daemon misses the parent-done transition (issue #28822). Refuses promotion unless every parent dep is done/archived (override with --force). Emits a `promoted_manual` audit event distinct from the automatic `promoted` kind, so audit consumers can filter human-driven from system-driven promotions. Supports --dry-run and --json for orchestration. Does not mutate assignee/claim state — the dispatcher picks the card up via its normal ready polling path. Closes #28822.	2026-05-23 23:10:36 -07:00
novax635	421ab81052	fix(cli): reuse canonical root model key normalization in load_cli_config	2026-05-23 23:08:05 -07:00
Teknium	2442a0c281	fix(background-review): allow pinned skills to be improved The post-turn background reviewer prompt listed pinned skills under 'Protected skills (DO NOT edit these)' alongside bundled and hub-installed skills, with the instruction to say 'Nothing to save.' if only protected skills needed updating. This meant the reviewer would refuse to patch a pinned skill even when the user explicitly wanted that skill improved. The underlying tool layer already gets this right: skill_manage's _pinned_guard only fires on delete; patch/edit/write_file go through on pinned skills. Curator archive/consolidation still skips pinned at the data layer (agent/curator.py), which is the correct place for that protection — pin's job is anti-deletion, not anti-improvement. Both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT now explicitly tell the reviewer that pinned skills can be patched, with rationale, so it doesn't bail out of an improvement just because the target is pinned.	2026-05-23 22:57:42 -07:00
brooklyn!	a627981a65	fix(tui): stop slash dropdown from chopping last char of /goal (#31311) Two independent bugs caused the slash-command autocomplete to render `/goal` as `/goa` (and `/gquota` as `/gquot` for that matter) in the TUI: 1. `tui_gateway/server.py` was forwarding `c.display` from prompt_toolkit's `Completion` straight into the JSON-RPC payload. prompt_toolkit normalizes `display=` into `FormattedText` (a `list` subclass), so the wire format became `[["", "/goal"]]` instead of the `string` that `CompletionItem.display` in the TUI declares. `meta` already went through `to_plain_text` — `display` did not. 2. The dropdown row in `appOverlays.tsx` used `flexDirection="row"` with the display `<Text>` and the (very long) meta `<Text>` as siblings. 
When the meta overflows the row width, Ink/Yoga shrinks the first column by one cell, lopping the trailing character off the command name. `/goal` triggers it reliably because its meta string is the longest of any built-in command (description + embedded `[text | pause | resume | clear | status]` usage hint). Wrapping the display column in `<Box flexShrink={0}>` keeps it at its natural width and lets the meta wrap or truncate instead.	2026-05-23 22:12:55 -07:00
Teknium	2666009ccc	docs: dedicated Nous Portal integration page and setup guide (#31296) If Nous Portal is the recommended way to run Hermes Agent, it deserves more than a sub-section buried under `## Inference Providers`. Add two new pages and shrink the existing providers.md section to a stub that points at them. New pages: - `website/docs/integrations/nous-portal.md` — landing page. What's in the subscription (300+ model catalog table, Tool Gateway breakdown, Nous Chat, cross-platform parity, no-dotfile-credentials). Hermes 4 recommendation note. Setup paths (fresh install, existing install, headless / SSH, profiles). Day-to-day usage (portal status / portal tools / portal open, switching models, mixing gateway with own backends, subscription management). Configuration reference. Token handling. Troubleshooting. Cross-links. Sidebar-position 1 — first entry under Integrations. - `website/docs/guides/run-hermes-with-nous-portal.md` — task script. Eight numbered steps: subscribe → setup --portal → verify with portal status → first chat → switch models → customize gateway routing → voice mode → cron/always-on. Per-step troubleshooting. 'What this gets you in plain numbers' comparison table. Sidebar position 1 — first entry under Guides & Tutorials. Existing providers.md: - Replace the 80-line `### Nous Portal` deep-dive with a 13-line stub that summarizes the value prop, lists the three CLI commands, and links to the new pages. Saves ~6KB. Other provider sections and callouts (Codex Note, Two Commands, Tool Gateway tip) preserved. Sidebar: - `integrations/nous-portal` inserted right after `integrations/index`, before `integrations/providers`. - `guides/run-hermes-with-nous-portal` inserted first in Guides & Tutorials.	2026-05-23 21:07:58 -07:00
Teknium	2b10024ee8	test(display): cover failure-suffix rendering + update scrollback test The original PR #17194 description claimed test_display_tool_preview.py but only ever shipped test_display_todo_progress.py. Add the missing coverage for the failure-suffix path: - _trim_error: whitespace strip, length cap, File-not-found path collapse - _detect_tool_failure: terminal exit codes, memory full, structured {error}/{message} extraction, malformed JSON, None result - get_cute_tool_message E2E: read_file failure, terminal exit-only, terminal stderr message, memory full, success path, no-result path Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool to reflect the new behavior: the generic '[error]' fallback in cli.py has been removed; failure suffixes now come from the result-aware _detect_tool_failure (e.g. '[exit 1]', '[File not found: x]').	2026-05-23 21:03:51 -07:00
Albert.Zhou	ffde8b7b09	feat(cli): show todo progress as done/total fraction Parse the todo_tool result summary to display completion progress in CLI tool preview lines: Read: ┊ 📋 plan 3/4 task(s) 0.5s Update: ┊ 📋 plan update 3/4 ✓ 0.5s Create: falls back to plain count when no completed tasks Falls back gracefully to the existing 'N task(s)' format when the result is missing, malformed, or has no completed items. Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current main. Co-authored-by: Albert.Zhou <albert748@gmail.com>	2026-05-23 21:03:51 -07:00
Albert.Zhou	094d732378	fix(cli): surface tool failures with specific error messages Improves the failure suffix on tool completion lines. Instead of always showing '[error]' for non-terminal failures, parse the tool's JSON result and surface the actual message: Before: ┊ 📖 read foo.py 0.1s [error] After: ┊ 📖 read foo.py 0.1s [File not found: foo.py] Before: ┊ 💻 $ ls bad 0.1s [exit 127] After: ┊ 💻 $ ls bad 0.1s [ls: cannot access 'bad'...] Adds a _trim_error helper that strips long absolute paths down to the filename and caps the suffix at 48 chars so it stays readable on narrow terminals. Threads the tool result through the tool.completed progress callback so agent/display.get_cute_tool_message can inspect it. The cli.py [error] post-suffix is removed in favor of the richer suffix _detect_tool_failure now produces directly. Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current main with the dead-code preview-length bumps dropped (tool_preview_length config already strictly caps previews, so the per-tool n= defaults are unreachable). Co-authored-by: Albert.Zhou <albert748@gmail.com>	2026-05-23 21:03:51 -07:00
honor2030	6a1aa420e7	Fix CLI verbose tool progress config fallback	2026-05-23 21:03:51 -07:00
Teknium	d97c324473	fix(terminal): warn at call time when background=true runs silently (#31289) `terminal(background=true)` without `notify_on_complete=true` or `watch_patterns` runs the process SILENTLY — the agent has no way to learn it finished short of calling `process(action='poll')` explicitly. That's correct for genuine long-lived processes (servers, watchers, daemons) but is a footgun for every bounded task (tests, builds, deploys, CI pollers, batch jobs), which is the vast majority of background uses. Hit on May 23, 2026 (PR #31231 incident): agent launched a CI-watch loop with `background=true` only. The poller ran fine, exited green 6 minutes later, agent never noticed. User had to surface 'we are green CI, you can merge.' Memory and skill docs said what to do (poll in background) but not how to receive the result. The `notify_on_complete=true` flag exists and works, but is easy to forget when bg seems sufficient on its own. Two changes here, mutually reinforcing: 1. Runtime nudge: tool result for `background=true` w/o notify or watch_patterns now includes a `hint` field explaining the silent- process failure mode and pointing at the corrective flag. Agent sees it on the same turn and self-corrects without needing the user to surface anything. Cost for legitimate server cases is one ignored read (~50 tokens); cost for forgot-notify cases is prevented blindness (potentially many turns, or a user nudge). False positives << false negatives. 2. Schema/description rewrite: top-level TERMINAL_TOOL_DESCRIPTION and the `background` field description now lead with 'Almost always pair with notify_on_complete=true' instead of presenting it as one of two equally-likely patterns. The two legitimate non-notify shapes (long-lived servers; watch_patterns mid-process signals) are still documented, but as the minority case. Tests cover all four shapes: bg-only emits hint, bg+notify doesn't, bg+watch_patterns doesn't, foreground doesn't. 
4 new tests; full suite of background/process tests stays green (160/160 across the relevant 6 test files).	2026-05-23 21:02:14 -07:00
AhmetArif0	39b8d1d313	fix(dingtalk): finalize open streaming cards before disconnect AI Card "tool progress" cards created with finalize=False were left in streaming state on DingTalk's UI after a gateway restart because disconnect() called _streaming_cards.clear() without first closing them via _close_streaming_siblings. Move the finalization loop before self._http_client.aclose() so the HTTP client is still available when the finalize requests are sent. Adds a regression test that asserts the HTTP client is alive during finalization.	2026-05-23 20:48:56 -07:00
Teknium	a7b622effc	docs(providers): move Nous Portal first, Google Gemini OAuth last (#31287) Reorder the per-provider subsections under '## Inference Providers' so Nous Portal — the recommended setup — leads the list, and Google Gemini via OAuth (which carries a policy-risk warning) drops to last position right before the '## Custom & Self-Hosted LLM Providers' section. All other provider sections keep their relative order. Pure section move; no content changes.	2026-05-23 20:46:17 -07:00
Fewmanism	83f6a83b24	fix(tui): handle images with codex app-server	2026-05-23 20:40:09 -07:00
teknium1	7ce6b504a2	fix(process_registry): use taskkill /T /F for tree-kill on Windows The Windows branch of `_terminate_host_pid` early-returned after `os.kill(pid, SIGTERM)` (which Python maps to `TerminateProcess` for the target handle only), leaving descendant processes — e.g. Chromium renderer/GPU/network helpers spawned by an `agent-browser` daemon — running on Windows even after the preceding commit fixed POSIX. The right Windows primitive is `taskkill /PID <pid> /T /F`: `/T` walks the tree, `/F` force-terminates. Same approach `gateway.status.terminate_pid(force=True)` already uses for the gateway's own shutdown path; reuse the same shape here. Why NOT extend the POSIX psutil tree-walk to Windows: 1. Windows doesn't maintain a Unix-style process tree. `psutil. Process.children(recursive=True)` walks PPID links that go stale when intermediate processes exit, so enumeration is best-effort and silently misses orphaned descendants. The whole bug we're fixing is orphaned descendants. 2. `psutil.Process.terminate()` on Windows is `TerminateProcess()` for one handle — same single-PID scope as the existing `os.kill`. The existing comment in `gateway/status.py:: terminate_pid` warns this explicitly: 'os.kill SIGTERM is not equivalent to a tree-killing hard stop' on Windows. 3. Headless Chromium has no GUI window, so the softer `taskkill /T` without `/F` (which sends WM_CLOSE) won't reach it either. `/F` is required. POSIX path is unchanged. The taskkill subprocess uses the same `creationflags=windows_hide_flags()` pattern other Windows shellouts in this codebase use. `FileNotFoundError` / `TimeoutExpired` / `OSError` fall back to bare `os.kill(SIGTERM)` as cheap insurance. Tests cover the Windows branch via the codebase's standard `monkeypatch _IS_WINDOWS` pattern (`references/windows-native- support.md`), plus POSIX tree-walk order, NoSuchProcess swallow, and the OSError fallback path. 7 new tests, all green on Linux CI.	2026-05-23 20:30:29 -07:00
Yuan Li	22f3f5a75a	fix(browser): use process-tree termination for daemon cleanup os.kill(pid, SIGTERM) only signals the parent, leaving Chromium child processes (renderer, GPU, etc.) orphaned. Reuse the existing ProcessRegistry._terminate_host_pid() helper which walks the process tree leaf-up via psutil, terminating children before the parent.	2026-05-23 20:30:29 -07:00
Teknium	72ff3e909c	docs(providers): rewrite Nous Portal section as primary recommended path (#31230) The old section sold Nous Portal as access to Hermes-4 models, which is backwards: Hermes 4 is a chat/reasoning family that's NOT recommended for Hermes Agent (per portal.nousresearch.com/info itself). The actual value prop is the 300+ frontier agentic models (Claude, GPT, Gemini, DeepSeek, etc.) plus the Tool Gateway plus Nous Chat under one subscription. Rewrite to lead with that, position the portal as the recommended way to run Hermes Agent, demote Hermes 4 to a 'note' explaining why it's not the right pick for agent workloads, and link to the manage-subscription page from setup.	2026-05-23 18:19:17 -07:00
Teknium	e42fcc5625	fix(provider): make config.yaml model.provider the single source of truth (#31222) Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER was leaking behavioral config into the .env surface, including from the gateway, which bypassed config.yaml entirely. Behavior: - gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs. Gateway now flows through resolve_runtime_provider() with no `requested` override, which reads model.provider from config.yaml first. Docs/UX (strip env var from user-facing surface): - --provider help text no longer mentions the env var - cli-config.yaml.example same - reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and the cross-reference from HERMES_INFERENCE_MODEL - reference/cli-commands.md: blank the env-var column for --provider - guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider - developer-guide/adding-providers.md, model-provider-plugin.md: reframe Internal mechanism (kept as-is): - hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env - tui_gateway/server.py reads it on TUI startup - resolve_requested_provider() / oneshot.py / cli.py still fall through to the env var as a last-resort behind config.yaml, which is what makes the TUI parent->child handoff work This stays. We just stop documenting it as a user knob. Tests: tests/gateway/test_auth_fallback.py: simplify mock to fail on first call, succeed on second; drop monkeypatch.setenv lines that no longer matter. Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying issue but proposed aligning gateway to the env var rather than removing it).	2026-05-23 18:18:41 -07:00


9358 commits