hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-04 12:33:08 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	290bf93104	fix(tui): harden Terminal.app render behavior Avoid Terminal.app paint corruption by disabling fast-echo in that terminal, sanitizing non-SGR control sequences before ANSI rendering, and defaulting Apple Terminal back to the safer 256-color path unless truecolor is explicitly requested.	2026-05-16 22:51:51 -05:00
Teknium	3b39096904	Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189 ) After context compression, the protected tail messages retain their original image parts. When those include multi-MB pasted screenshots, every subsequent API request re-ships the same base-64 blobs forever — which can push the request past provider body-size limits and wedge the session even though compression 'succeeded'. Add _strip_historical_media() to agent/context_compressor.py. After the summary is built, find the newest user message that carries an image part and replace image parts in every earlier message with a short text placeholder ('[Attached image — stripped after compression]'). The newest image-bearing user turn keeps its media so the model can still analyse what the user just sent. Handles all three multimodal shapes: - OpenAI chat.completions image_url - OpenAI Responses API input_image - Anthropic native {type: image, source: ...} Includes 27 unit tests covering the helpers and the end-to-end compress() integration, plus a manual E2E check confirming a ~4MB two-image conversation shrinks to ~2MB after compression.	2026-05-16 17:18:25 -07:00
Guillaume Meyer	5cbe0b1c4f	test(plugins): cover _discover_all_plugins recursion + cross-link loader Add a TestDiscoverAllPlugins class covering the six cases the recursive scan needs to handle: - flat plugin uses its manifest ``name:`` as the key - category-namespaced plugin keys off ``<category>/<dirname>`` even when the manifest ``name:`` is bare (regression test for the original bug — ``plugins/observability/langfuse/`` with ``name: langfuse`` must surface as ``observability/langfuse``, not ``langfuse``) - user-installed plugin overrides bundled on key collision - depth cap: anything below ``<root>/<category>/<plugin>/`` is ignored - bundled ``memory/`` and ``context_engine/`` are skipped (they have their own loaders), but user plugins under those category names are still scanned Also add an in-source comment next to the key derivation pointing at the loader's matching line (``PluginManager._parse_manifest`` in plugins.py:1027-1028), so future renames of one site flag the other. Both items raised in Copilot review on #27161. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 17:15:19 -07:00
Guillaume Meyer	21be7025c5	refactor(plugins): drop dead bundled-source guard in _discover_all_plugins The `if key in seen and source == "bundled": continue` check was unreachable: bundled is scanned before user, so `key in seen` can never be true while `source == "bundled"`. The "user overrides bundled" semantics are preserved automatically by the unconditional `seen[key] = …` on the user pass. Replaces the dead guard with a one-line comment explaining the overwrite semantics, so a future contributor adding a third source (e.g. project plugins) can see at a glance how ordering interacts with the dict-overwrite. Matches `PluginManager.discover_and_load`'s "user wins" rule. Spotted by Copilot in code review on #27161. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 17:15:19 -07:00
Guillaume Meyer	8ab8bc2f03	fix(plugins): remove unreachable hermes tools → Langfuse path The langfuse plugin is hooks-only (no toolsets), so it never appears in `hermes tools` — that menu iterates `_get_effective_configurable_toolsets()` (= `CONFIGURABLE_TOOLSETS` + plugin-registered toolsets), and "langfuse" is in neither. The `TOOL_CATEGORIES["langfuse"]` setup wizard (with its `post_setup: "langfuse"` hook that pip-installs the SDK and writes `plugins.enabled`) was reachable only when a toolset key "langfuse" got enabled, which can't happen — so it's been dead code, and the docs that promised "Setup (interactive): hermes tools → Langfuse Observability" were silently broken. Right home for that wizard is `hermes plugins` (e.g. auto-running a plugin's post-setup hook on enable), which is a generic plugin-setup mechanism worth designing properly rather than shoehorning langfuse back into `hermes tools`. Until that exists, point users at the working manual flow. Code: - Delete `TOOL_CATEGORIES["langfuse"]` (24 lines) — unreachable. - Delete the `post_setup_key == "langfuse"` branch in `_run_post_setup` (29 lines) — only caller was the deleted TOOL_CATEGORIES entry. Docs / comments (point at the manual flow + interactive `hermes plugins`): - `plugins/observability/langfuse/README.md`: collapse the two-option setup section to the single working flow. - `plugins/observability/langfuse/plugin.yaml`: update `description`. - `plugins/observability/langfuse/__init__.py`: update module docstring. - `hermes_cli/config.py`: update inline comment above the LANGFUSE_* env-var allow-list. - `website/docs/user-guide/features/built-in-plugins.md`: collapse "Setup (interactive)" + "Setup (manual)" into one accurate block. - `website/docs/reference/environment-variables.md`: update the cross-reference in the Langfuse env-vars section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 17:15:19 -07:00
Guillaume Meyer	9b82586c6b	fix(plugins): surface category-namespaced plugins in hermes plugins list `_discover_all_plugins()` in plugins_cmd.py did a flat scan of the bundled and user plugin directories — only direct children with a plugin.yaml were surfaced. Category directories like `observability/`, `image_gen/`, `platforms/`, `model-providers/`, `web/`, and `video_gen/` have no plugin.yaml of their own, so their nested plugins (`observability/langfuse`, `image_gen/openai`, etc.) never appeared in `hermes plugins list` or the interactive `hermes plugins` UI — even though the runtime loader (`PluginManager._scan_directory_level`) discovers them correctly and they do load at runtime. This broke the documented promise that bundled plugins appear in `hermes plugins list` and the interactive UI before being enabled, and made it look like `observability/langfuse` didn't exist. Refactor `_discover_all_plugins()` to mirror the loader's recursion (depth cap = 2, same skip set, user overrides bundled on key collision). Return the path-derived registry key (e.g. `observability/langfuse`) as the displayed name, matching what the user passes to `hermes plugins enable …` / writes under `plugins.enabled` in config.yaml. Also clarify the plugins docs: spell out that sub-category plugins surface by their `<category>/<plugin>` key in `hermes plugins list` / interactive UI, add an `observability/langfuse` example to the command reference, and include a nested entry in the interactive-UI mock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 17:15:19 -07:00
Teknium	29b1bd0e20	feat(cli): add `hermes send` to pipe script output to any messaging platform (#27188 ) Introduces a thin CLI wrapper around the existing send_message_tool so shell scripts, cron scripts, CI hooks, and monitoring daemons can reuse the gateway's already-configured platform credentials without reimplementing each platform's REST client. hermes send --to telegram "deploy finished" echo "RAM 92%" \| hermes send --to telegram:-1001234567890 hermes send --to discord:#ops --file report.md hermes send --to slack:#eng --subject "[CI]" --file build.log hermes send --list # all targets hermes send --list telegram # filter by platform Supports all platforms the send_message tool already does (Telegram, Discord, Slack, Signal, SMS, WhatsApp, Matrix, Feishu, DingTalk, WeCom, Weixin, Email, etc.), including threaded targets and #channel-name resolution via the channel directory. hermes_cli/send_cmd.py delegates to tools.send_message_tool.send_message_tool, which means there is zero new platform-specific code. The subcommand just: 1. Bridges ~/.hermes/.env and top-level ~/.hermes/config.yaml scalars into os.environ (same bootstrap the gateway does at startup) — required so TELEGRAM_HOME_CHANNEL and friends are visible to load_gateway_config(). 2. Resolves the message body from positional arg, --file, or piped stdin. 3. Calls the shared tool and translates its JSON result to exit codes: 0 success, 1 delivery failure, 2 usage error. No running gateway is required for bot-token platforms (Telegram, Discord, Slack, Signal, SMS, WhatsApp) — the tool hits each platform's REST API directly. Plugin platforms that rely on a live adapter connection still need the gateway running; the error message is forwarded verbatim. - New guide: website/docs/guides/pipe-script-output.md covering real-world patterns (memory watchdogs, CI hooks, cron pipes, long-running task completion pings) and the security/gateway notes. - Cross-links added from automate-with-cron.md ("no LLM? use hermes send") and developer-guide/gateway-internals.md (delivery-path section). tests/hermes_cli/test_send_cmd.py (20 tests, all green): - Happy paths: positional message, stdin, --file, --file -, --subject, --json, --quiet. - Error paths: missing --to, missing body, file not found, tool returns error payload (exit 1), tool skipped-send result (exit 0). - --list: human output, --json output, platform filter, unknown platform. - Env loader: bridges config.yaml scalars into env, does not override existing env vars, gracefully handles missing files. - Registrar contract: register_send_subparser() returns a working parser. Smoke-tested end-to-end against a live Telegram bot before commit.	2026-05-16 17:14:45 -07:00
konsisumer	33528b428d	fix(agent): reset _fallback_index at turn start even when no fallback activated In long-lived interactive sessions, _try_activate_fallback() advances _fallback_index before attempting client resolution. When resolution fails (provider not configured, etc.) the function returns False without ever setting _fallback_activated=True. _restore_primary_runtime() then skips its reset block entirely (guarded by `if not _fallback_activated`), leaving _fallback_index >= len(_fallback_chain) for all subsequent turns. The eager-fallback guard at the top of the retry loop checks `_fallback_index < len(_fallback_chain)`, so the condition fails silently and no fallback is ever attempted again for that session. Cron jobs spawn a fresh AIAgent per run and never hit this path, which is why the same fallback chain works reliably for cron but not interactive. Fix: reset _fallback_index=0 in the `not _fallback_activated` early-return branch so every new turn starts with the full chain available. Fixes #20465	2026-05-16 17:12:48 -07:00
Teknium	2b193907d6	fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184 ) xAI's Responses stream emits 'type=error' as the FIRST SSE frame when an OAuth account is unsubscribed/exhausted or rejects the encrypted-reasoning replay introduced in the May 2026 SuperGrok rollout. The SDK helper raises RuntimeError(Expected to have received response.created before error), which the caller correctly routes to _run_codex_create_stream_fallback. The fallback then opens a new stream that emits the same 'error' frame — but the fallback loop only handled {response.completed, response.incomplete, response.failed} and silently continue'd past 'error' events. Result: the loop fell off the end of the stream and raised the useless 'fallback did not emit a terminal response' RuntimeError, which the classifier marked retryable=True and looped 3x before failing with no clue what went wrong. Now: 'error' frames raise a synthesized _StreamErrorEvent with an OpenAI SDK-shaped .body so _summarize_api_error, _extract_api_error_context, _is_entitlement_failure, and classify_api_error all see the real provider message. Users on unsubscribed accounts now see 'do not have an active Grok subscription' once, not three RuntimeErrors. Verified end-to-end: classifier returns reason=auth retryable=False; entitlement detector matches even with status_code=None; summarizer returns the full xAI message. Tests: 4 new in TestCodexFallbackErrorEvent covering xAI subscription message, dict-shaped events, summarizer integration, and the empty-stream case (must still raise the original RuntimeError so 'truncated mid-flight' stays distinguishable from 'provider rejected the call').	2026-05-16 17:09:41 -07:00
Teknium	e21cb8d145	feat(status): append session recap to /status output (#27176 ) Adds a pure-local recap of recent session activity — turn counts, tools used, files touched, last user ask, last assistant reply — appended to the existing /status output. Useful when juggling multiple sessions and you want a one-glance reminder of where this one left off. Inspired by Claude Code 2.1.114's /recap, but folded into /status so we don't add a 6th info command. Pure local computation: no LLM call, no auxiliary model, no prompt-cache invalidation, instant and free. Salvage of #18587 — kept the shared hermes_cli.session_recap.build_recap helper and its 13 unit tests, dropped the /recap slash command + ACTIVE_SESSION_BYPASS_COMMANDS entry + Level-2 bypass since /status already covers both surfaces. Tailored to hermes-agent's tool vocabulary: file-editing tools (patch, write_file, read_file, skill_manage, skill_view) surface touched paths; tool-call counts highlight which classes of work drove the session. Source: https://code.claude.com/docs/en/whats-new/2026-w17	2026-05-16 16:51:42 -07:00
Teknium	226cee43d9	feat(cli): show ▶ N indicator in status bar when /background tasks are running (#27175 ) Surface live background-task count in the prompt_toolkit status bar so users can see at a glance that a /background task exists and is running — no need to ask the agent about it (the agent has no visibility into bg sessions by design). - _get_status_bar_snapshot now reports active_background_tasks from len() of the live _background_tasks dict (entries are removed in the task thread's finally block, so this reflects truly-running tasks) - Indicator shown only on medium (<76) and wide (>=76) tiers; narrow (<52) stays minimal since it's already cramped - No invalidate plumbing needed: status bar fragments are pulled via lambda on every redraw, and the bg thread already calls _app.invalidate() on exit Refs #8568	2026-05-16 16:51:29 -07:00
helix4u	6f817e1447	fix(telegram): restore DM topic typing indicator	2026-05-16 16:50:02 -07:00
Maxim Esipov	e51d74ab91	fix(codex): rotate pool on usage limit 429	2026-05-16 16:49:56 -07:00
Teknium	dffb602f37	fix(xai): drop stale X Premium+ hint from entitlement 403 surfacing (#27110 ) xAI announced on 2026-05-16 (https://x.ai/news/grok-hermes) that X Premium subscriptions now work in Hermes Agent. The hint we shipped in PR #26644 asserted the opposite ("X Premium+ does NOT include xAI API access — only standalone SuperGrok subscribers can use this provider"), which would now misdirect Premium+ users who hit any other 403 (no Grok sub at all, wrong tier, exhausted quota) into thinking they need to switch subscriptions when their sub is in fact valid. Remove _decorate_xai_entitlement_error and its two call sites in _summarize_api_error. xAI's own body text already says "Manage subscriptions at https://grok.com/?_s=usage" — surface that verbatim and let xAI's wording do the diagnosis. The _is_entitlement_failure guard (which prevents credential-pool refresh loops on entitlement 403s) and the reasoning-replay gating for xai-oauth are unrelated and untouched. Update tests to assert the body still surfaces verbatim and that no Hermes-side editorializing is appended.	2026-05-16 16:00:01 -07:00
Teknium	fb05f5d4b5	fix(mcp): validate remote URLs up-front with a clear error (#27105 ) Port from anomalyco/opencode#25019 ("fix: handle invalid mcp urls"). Previously: a typo in `config.yaml` (missing scheme, wrong scheme, empty string, non-string value) slipped past `_is_http()` and hit `httpx.URL(url)` or `streamablehttp_client(url, ...)` deep in the transport layer. That raised a generic exception which went through the reconnect-backoff loop, so a bad URL caused _MAX_INITIAL_CONNECT_RETRIES attempts with doubling backoff — about a minute of pointless retries plus an opaque error — before the server was marked failed. Now: we validate the URL once, at the top of `run()`, before entering the retry loop. A malformed URL raises `InvalidMcpUrlError` (a `ValueError` subclass) with a message that names the offending server and explains exactly what was wrong. `_ready` is set and `_error` is populated, so `start()` re-raises and the server shows up as failed in `hermes mcp list` without any backoff burn. Validation rules: - Must be a string (rejects None, dict, int) - Must be non-empty (rejects '' and whitespace-only) - Scheme must be http or https (rejects file://, ws://, stdio://) - Must have a non-empty host (rejects http:///, http://:8080) Tests (21 new cases in tests/tools/test_mcp_invalid_url.py): - TestValidUrlsAccepted: http, https, IPv6, ports, paths, query strings - TestInvalidUrlsRejected: every rejection path above + clear error text - TestErrorIsValueError: downstream code catching ValueError still works E2E verified: a misconfigured server with `url: not-a-valid-url` now fails in <0.001s with the clear error, instead of minutes of retries. Doesn't touch stdio servers (they use `command`, not `url`) — the validator only fires when `_is_http()` returns True.	2026-05-16 13:06:56 -07:00
Teknium	93e109a1d5	fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104 ) Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects two shapes that the rest of the JSON Schema ecosystem accepts: 1. $ref nodes with sibling keywords. Moonshot expands the reference before validation and then rejects the node if keys like `description`, `type`, or `default` appear alongside $ref. MCP-sourced tool schemas commonly put a `description` on $ref-typed properties so the model sees the field hint — which worked on every provider except Moonshot. 2. Tuple-style `items` arrays (positional element schemas). Moonshot's engine requires ONE schema applied to every array element. Common in tool schemas generated from Go/Protobuf that model fixed-length arrays as `[{type:number}, {type:number}]`. Repairs applied in `agent/moonshot_schema.py`: - Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only (strip every sibling). The referenced definition still carries its own description on the target node, which Moonshot accepts. - Rule 4: when `items` is a list, collapse to the first element schema (falling back to `{}` which is then filled by the generic missing-type rule). Preserves `minItems` / `maxItems` / other siblings. Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems, plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type still passes (it asserted plain $ref passes through; now it passes through as exactly `{"$ref": "..."}` which is strictly compatible). All 35 tests in test_moonshot_schema.py pass.	2026-05-16 13:02:19 -07:00
Teknium	dc3d0fe148	Port from cline/cline#10343: periodic gateway memory logging (#27102 ) Emit a grep-friendly '[MEMORY] rss=...MB ...' line in agent.log / gateway.log every N minutes (default 5) so slow leaks in the long-lived gateway process show up as a time series. Based on https://github.com/cline/cline/pull/10343 (src/standalone/memory-monitor.ts). - gateway/memory_monitor.py: new module. Daemon thread, baseline on start, final snapshot on stop. Uses resource.getrusage() (stdlib) first, falls back to psutil, disables itself with one WARNING if neither is available. - gateway/run.py: start monitor right after setup_logging() in start_gateway(); stop it in the shutdown block next to MCP teardown. - hermes_cli/config.py: logging.memory_monitor { enabled, interval_seconds } defaults under the existing logging section. - tests/gateway/test_memory_monitor.py: 10 unit tests covering format, baseline/shutdown snapshots, double-start noop, periodic timer, daemon thread invariant, and unavailable-RSS warn-and-skip path. Adapted from TypeScript/Node to Python (threading.Event-based daemon thread instead of setInterval/unref), added Python-specific gc + thread counts to the log line (handier than ext/arrayBuffers for diagnosing Python gateway leaks), and gated behind a config.yaml toggle so users can silence the periodic line if they want. No heap-snapshot-on-OOM equivalent — CPython doesn't have V8's --heapsnapshot-near-heap-limit; tracemalloc would be the Python equivalent but adds non-trivial overhead, so leaving that out.	2026-05-16 12:55:23 -07:00
Teknium	fc03c95da1	feat(cli): add /exit --delete flag to remove session on quit (#27101 ) Port from google-gemini/gemini-cli#19332. Users can now exit with '/exit --delete' (or '/quit --delete', '/exit -d') to permanently remove the current session's SQLite history plus on-disk transcripts (.json / .jsonl / request_dump_*) in one shot. Useful for privacy-sensitive workflows and one-off interactions where leaving a session recording behind is undesirable. Implementation: - New HermesCLI._delete_session_on_exit one-shot flag (defaults False). - process_command() parses --delete / -d after /exit or /quit and arms the flag. Unknown args print a hint and keep the CLI running (prevents typos like '/exit -delete' from accidentally exiting). - Shutdown path calls SessionDB.delete_session(session_id, sessions_dir=...) right after end_session() when the flag is set. That API already existed for 'hermes sessions delete' and handles both SQLite removal (orphaning child sessions so FK constraints hold) and on-disk file cleanup. - /quit CommandDef now advertises '[--delete]' in args_hint so /help and CLI autocomplete surface it. Tests: tests/cli/test_exit_delete_session.py (12 cases covering both aliases, case insensitivity, whitespace, short form, unknown-arg rejection, and registry metadata). E2E-verified with isolated HERMES_HOME: session row deleted, all three transcript/request-dump files removed, second delete_session call correctly returns False.	2026-05-16 12:51:08 -07:00
briandevans	c844d15c3d	fix(update): stream npm install output so postinstall progress is visible (#18840 ) `hermes update` ran the repo-root and ui-tui npm installs with both `--silent` and `subprocess.run(..., capture_output=True)`, which hides all output from optional postinstall scripts. The largest of those — `@askjo/camofox-browser`'s `npx camoufox-js fetch` — downloads a Firefox-fork browser binary that can take many minutes on slow connections. Because nothing was printed during that wait, the updater appeared to hang at "Updating Node.js dependencies..." and users Ctrl-C'd, sometimes leaving `node_modules` partially installed. Drop `--silent` and pass `capture_output=False` for the repo-root and ui-tui paths so npm streams its `info run …` postinstall lines straight to the terminal. Output is still mirrored to `~/.hermes/logs/update.log` by the existing `_UpdateOutputStream` wrapper, so SSH-disconnect safety is preserved. The `web/` install path is untouched — its build step is fast and does not run binary-fetching postinstalls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 12:18:55 -07:00
Teknium	05af78c53d	fix(update): make Camofox lazy-installed instead of eager (#27055 ) The `@askjo/camofox-browser` npm package was a top-level entry in the root `package.json` `dependencies` block, so `hermes update` ran its postinstall on every user, every update. That postinstall calls `npx camoufox-js fetch`, which silently downloads a ~300MB Firefox-fork browser binary from GitHub Releases — multi-minute on fast connections, and a hard block for users on slow / restricted networks (notably users in China running through a VPN). Camofox is an explicit opt-in browser backend. The runtime check in `tools/browser_tool.py` only routes through Camofox when the user has set `CAMOFOX_URL` (selected via `hermes tools` → Browser Automation → Camofox). Users who never opted in never touched the package at runtime, yet every `hermes update` paid for the binary fetch anyway. This change: * Removes `@askjo/camofox-browser` from root `package.json` dependencies (and the regenerated `package-lock.json` drops Camofox's entire transitive tree, ~2.6k lines). * Updates the Camofox `post_setup` handler in `hermes_cli/tools_config.py` to install `@askjo/camofox-browser@^1.5.2` explicitly when the user selects Camofox, and streams npm output (no `--silent`, no `capture_output`) so the ~300MB download is visible rather than appearing frozen. * Adds `tests/test_package_json_lazy_deps.py` as a regression guard so future PRs can't silently re-add Camofox (or any binary-postinstall package) to eager root dependencies. `agent-browser` stays eager — it is the default Chromium-driving backend used by every session that does not have a cloud browser provider configured, and its postinstall is small. Validation: \| \| Before \| After \| \|---\|---\|---\| \| `hermes update` time on slow network \| multi-minute hang at `→ Updating Node.js dependencies...` \| seconds (no binary fetch) \| \| Camofox opt-in install visibility \| silent, looked frozen \| streamed npm output \| \| Regression guard against re-adding \| none \| `test_package_json_lazy_deps.py` \| Tests: - `tests/test_package_json_lazy_deps.py`: 3/3 pass - `tests/tools/test_browser_camofox*`: 92/92 pass - `tests/hermes_cli/test_tools_config.py`: 66/66 pass - `tests/hermes_cli/test_cmd_update.py` + adjacent: green Reported by lulu (Discord, May 2026) — `hermes update` hangs at `→ Updating Node.js dependencies...` in China. Related: #18840, #18869.	2026-05-16 12:15:45 -07:00
Teknium	8a2b2b9f6f	docs(release): expand v0.14.0 highlights with newcomer-friendly context (#27053 ) Each highlight now gets 2-3 sentences explaining the user-facing value, not just the technical change. Targeted at someone discovering Hermes for the first time who isn't deep in the codebase.	2026-05-16 11:57:59 -07:00
Teknium	6c2406c5e1	fix(signal): read groupV2.id in envelope, fall back to legacy groupInfo (#27051 ) Port from qwibitai/nanoclaw#1962: modern Signal V2-only groups surface on dataMessage.groupV2.id, not groupInfo.groupId. signal-cli versions differ in which field they expose for V2 groups — some forward the underlying libsignal envelope verbatim (groupV2), others normalize everything into groupInfo. Without a groupV2 read, V2-only groups appear as DMs because groupInfo is undefined and the adapter misroutes them to the sender's DM session. Reads groupV2.id first, falls back to groupInfo.groupId. Also hardens chat_name extraction against non-dict groupInfo payloads (crashed with AttributeError under malformed envelopes). 6 new tests cover V2 routing, V1 legacy compatibility, V2-preferred precedence, no-group DM path, allowlist enforcement, and malformed payloads.	2026-05-16 11:53:57 -07:00
Teknium	35f25523c6	docs(tools): add video_generate / video_gen toolset to user-facing tool docs (#27050 ) The video_gen toolset and its video_generate tool shipped without user-facing reference docs. toolsets-reference.md and the dev-guide plugin page were already in, but reference/tools-reference.md had no video_gen section at all and user-guide/features/tools.md's Media row didn't list video_generate. - reference/tools-reference.md: add a video_gen section after video, including backend list (xAI Grok-Imagine, FAL.ai Veo/Pixverse/Kling), unified text-to-video / image-to-video surface note, link to the dev-guide plugin page, and the video_generate tool row. Add video_generate to the standalone-tools quick-counts line. - user-guide/features/tools.md: extend Media row with video_generate and video_analyze plus an opt-in caveat.	2026-05-16 11:53:13 -07:00
Teknium	6836987428	docs(release): rewrite v0.14.0 highlights for excitement framing (#27035 ) * chore: release v0.14.0 (2026.5.16) The Foundation Release — Hermes installs and runs anywhere now. Highlights: - Native Windows support (early beta) — PowerShell installer, native subprocess/PTY paths, ~40 follow-up Windows-only fixes - pip install hermes-agent — PyPI wheel - Cold-start wave — ~19s off hermes launch, 180x faster browser_console (CDP WS) - Supply-chain advisory checker + lazy-deps + tiered install fallback - OpenAI-compatible local proxy for OAuth providers (Claude Pro, ChatGPT Pro, SuperGrok) - Cross-session 1h Claude prompt cache (Anthropic / OpenRouter / Nous Portal) - 2 new platforms: LINE + SimpleX Chat (22 total) - Microsoft Graph foundation — Teams pipeline + webhook adapter - /handoff actually transfers sessions live - x_search first-class tool, vision_analyze pixel passthrough - LSP semantic diagnostics on every write - Unified video_generate with pluggable backends - computer_use cua-driver backend - 9 new optional skills, OpenRouter Pareto Code router, xAI Grok OAuth - 12 P0 + 50 P1 closures 808 commits · 633 PRs · 1393 files · 165k insertions · 545 issues closed · 215 contributors * docs(release): rewrite v0.14.0 highlights for excitement framing Demote Windows beta from headline; lead with SuperGrok / OAuth proxy / x_search / Microsoft Teams. Frame lazy-deps as a debloating wave that makes installs dramatically lighter. Add highlights for clickable URLs in any terminal, dangerous-command detection bypasses, ChatGPT Pro and SuperGrok via the local proxy. Tighten the summary paragraph.	2026-05-16 11:18:06 -07:00
kshitij	3034eee38e	fix(acp): replay session history before responding to session/load (#12285 follow-up) (#26957 ) Switches `_replay_session_history` from `loop.call_soon`-deferred (after the `LoadSessionResponse` is written) to `await`-inline (before the response is constructed) for both `session/load` and `session/resume`. Adds defensive try/except around the awaited call so a replay helper crash still yields a successful load response — partial transcripts are acceptable, total load failure is not. The deferral was added on May 2 in commit `19854c7cd` with the rationale "Zed only attaches streamed transcript/tool updates once the load/resume response has completed." That justification was incorrect: - Zed's current ACP integration (zed-industries/zed crates/agent_servers/src/acp.rs) explicitly registers the session-update routing entry BEFORE awaiting the loadSession RPC, with the comment: "so that any session/update notifications that arrive during the call (e.g. history replay during session/load) can find the thread." - Every other reference ACP server (Codex, Claude Code, OpenCode, Pi, agentao) replays history BEFORE responding to the load request. - The ACP spec wording ("Stream the entire conversation history back to the client via notifications") and the natural JSON-RPC reading both mean "during the request's lifetime", not "after the response resolves". Empirical reproduction (reported by Biraj on @agentclientprotocol/sdk v0.21.1): the same custom ACP client works correctly against Codex / Claude Code / OpenCode / Pi but receives 0 notifications from Hermes because it measures the per-call notification count at the moment `loadSession` resolves — which on Hermes was before the `call_soon`- scheduled replay coroutine had a chance to run. Changes: - `acp_adapter/server.py`: remove `_schedule_history_replay`; both `load_session` and `resume_session` now `await self._replay_session_history` before returning, wrapped in try/except that logs and continues on helper exceptions. - `tests/acp/test_server.py`: replace the single `test_load_session_schedules_history_replay_after_response` (which encoded the now-incorrect post-response ordering) with two tests asserting `events == ["replay", "returned"]` for load and resume. Add two regression tests confirming that a replay helper raising still yields a `LoadSessionResponse` / `ResumeSessionResponse` rather than propagating the exception out as a JSON-RPC error. Result: 240 ACP tests pass (was 238), ruff clean. Verified end-to-end: biraj's synchronous notification-counter pattern now sees 6 notifications during `loadSession` for a 5-message session, matching all other reference ACP servers. The `_fenced_text` change in `acp_adapter/tools.py` from the same May 2 commit is orthogonal and intentionally left intact — it's a separate, still-valid fix for Zed's pipe-as-table rendering. Refs #12285. Follows up #26943 (which added thought-chunk replay but kept the deferral).	2026-05-16 07:41:34 -07:00
kshitij	f3a4af9cf2	fix(acp): replay assistant reasoning as agent_thought_chunk on session/load (#12285 ) (#26943 ) Persisted assistant `reasoning_content` / `reasoning` fields are now emitted as ACP `agent_thought_chunk` notifications during `_replay_session_history`, so editor clients (Zed, etc.) rebuild collapsed Thinking panes when the user re-opens a session that used a thinking model. Ordering matches live streaming: thought precedes message text within the same assistant turn, mirroring how `reasoning_callback` deltas arrive before `stream_delta_callback` deltas in `events.py::make_thinking_cb` / `make_message_cb`. Behavior on non-reasoning histories is unchanged; the replay loop's existing text / tool_call / tool_call_update / plan emission is preserved bit-for-bit. Closes #12285. Credit: - @Yukipukii1 (#14691) — original thought-replay design via `acp.update_agent_thought_text`; the tool-call portion of that PR has since landed via #19139, but the reasoning replay is theirs. - @HenkDz (#17652 / #18578) — established the `_replay_session_history` and `_history_*` helper conventions this builds on. - @D1zzyDwarf (#16531) — also closed by this work.	2026-05-16 06:45:29 -07:00
Teknium	a91a57fa5a	chore: release v0.14.0 (2026.5.16) (#26862 ) The Foundation Release — Hermes installs and runs anywhere now. Highlights: - Native Windows support (early beta) — PowerShell installer, native subprocess/PTY paths, ~40 follow-up Windows-only fixes - pip install hermes-agent — PyPI wheel - Cold-start wave — ~19s off hermes launch, 180x faster browser_console (CDP WS) - Supply-chain advisory checker + lazy-deps + tiered install fallback - OpenAI-compatible local proxy for OAuth providers (Claude Pro, ChatGPT Pro, SuperGrok) - Cross-session 1h Claude prompt cache (Anthropic / OpenRouter / Nous Portal) - 2 new platforms: LINE + SimpleX Chat (22 total) - Microsoft Graph foundation — Teams pipeline + webhook adapter - /handoff actually transfers sessions live - x_search first-class tool, vision_analyze pixel passthrough - LSP semantic diagnostics on every write - Unified video_generate with pluggable backends - computer_use cua-driver backend - 9 new optional skills, OpenRouter Pareto Code router, xAI Grok OAuth - 12 P0 + 50 P1 closures 808 commits · 633 PRs · 1393 files · 165k insertions · 545 issues closed · 215 contributors	2026-05-16 02:58:57 -07:00
teknium1	72f94f4a7c	test(security): regression guard for OAuth PKCE state/verifier separation Two unit tests for run_hermes_oauth_login_pure(): 1. test_authorization_url_state_is_not_pkce_verifier — asserts state in the auth URL is independent from the PKCE code_verifier sent in the token exchange, and that the verifier never appears in the URL. 2. test_callback_state_mismatch_aborts — asserts the flow returns None (no token exchange) when the callback state does not match the value we generated. Negative control verified: reintroducing the `b17e5c10` vulnerable pattern (state = verifier, no callback validation) makes both tests fail. Also adds AUTHOR_MAP entry for shaun0927 (contributor of the fix).	2026-05-16 02:38:02 -07:00
JunghwanNA	345821b4a1	style: move secrets import alongside other function-level imports Group the secrets import with time and webbrowser at the top of run_hermes_oauth_login_pure(), matching the existing pattern. Drop the _secrets alias — no name conflict in this scope.	2026-05-16 02:38:02 -07:00
JunghwanNA	fcd9011f8d	fix(security): separate OAuth PKCE state from code_verifier The PKCE flow reused the code_verifier as the OAuth state parameter. Per RFC 6749 §10.12 and RFC 7636, these serve different purposes: state is an anti-CSRF token visible in the authorization URL; the code_verifier must remain secret for the token exchange. Generate an independent secrets.token_urlsafe(32) for state and validate it on callback to provide actual CSRF protection. Closes #10693	2026-05-16 02:38:02 -07:00
Teknium	585d6b6430	fix(gateway): merge rapid TEXT follow-ups during active sessions (#4469 ) (#26822 ) When the agent is running and the user sends multiple TEXT messages in rapid succession, base.py's active-session branch stored the pending event as a single-slot replacement: self._pending_messages[session_key] = event Three rapid messages A, B, C landed as: A (interrupts), B (replaces A before consumer reads), C (replaces B). Only C reached the next turn — A and B were silently dropped. This is the symptom in #4469. Route the follow-up through merge_pending_message_event(..., merge_text=True) so TEXT events accumulate into the existing pending event's text instead of clobbering it. Photo and media bursts already merged through the same helper; this just extends the merge_text path (already used by the Telegram bursty-grace branch in gateway/run.py) to all platforms. Test exercises BasePlatformAdapter.handle_message directly with the session marked active and asserts three rapid TEXT events merge to 'part two\\npart three' rather than dropping the middle message. Sanity-checked the test would fail without the fix. Credits @devorun for the original investigation and analysis in #4491 that surfaced the underlying queue handling, though their fix targeted GatewayRunner._pending_messages which is now dead state on main.	2026-05-16 02:25:41 -07:00
teknium1	374dc81c23	fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint Follow-up improvements on top of @konsisumer's cherry-picked fix for #10648: 1. Deprecation patterns required BOTH a product fingerprint ('gh-copilot') and a deprecation marker. The previous list included 'copilot-cli' and bare 'deprecation', which would false-positive on stderr from the NEW @github/copilot CLI — whose repo is literally github.com/github/copilot-cli and which legitimately surfaces those substrings in its own messages. 2. Replace the deprecation hint. The user in #10648 installed 'gh extension install github/gh-copilot' (the deprecated extension) thinking that's what ACP mode uses, when ACP actually spawns the new 'copilot' binary from '@github/copilot'. The hint now points users at the correct install command ('npm install -g @github/copilot') with the new CLI's repo URL, and demotes provider-switching to a fallback alternative. 3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the 'github-models' alias to the canonical 'copilot' provider id, matching the convention used by every other entry in the table. 4. Sharpen the 413 hint message. The free tier's ~8K cap is below the system-prompt floor, so this endpoint is fundamentally incompatible with an agentic loop — not a 'use a different URL' problem. Tests: - New parametrized false-positive coverage for the new CLI's stderr shape. - Updated assertion to require canonical 'copilot' provider mapping. - All 14 deprecation/URL tests pass.	2026-05-16 02:24:48 -07:00
konsisumer	b85b938b1f	test: add tests for copilot ACP deprecation detection and Azure URL mapping Cover the deprecation pattern matching against real gh-copilot stderr output, verify the GitHub Models Azure URL is in _URL_TO_PROVIDER, and confirm _is_github_models_base_url recognises the Azure endpoint.	2026-05-16 02:24:48 -07:00
konsisumer	4ded3ede33	fix: detect gh-copilot deprecation and improve GitHub Models 413 errors (#10648 ) Address two blocking issues when using GitHub Copilot integrations: 1. ACP mode: detect the gh-copilot CLI deprecation error from stderr and surface an actionable message with alternatives instead of hanging or showing a cryptic error. 2. GitHub Models (Azure) 413: recognize models.inference.ai.azure.com as a known GitHub Models URL, and print a targeted hint explaining the hard 8K token limit that makes this endpoint incompatible with Hermes' system prompt size.	2026-05-16 02:24:48 -07:00
kshitijk4poor	7bb97b952f	chore: add worlldz to AUTHOR_MAP for #26704 salvage	2026-05-16 02:21:17 -07:00
worlldz	d0a183cadd	fix(doctor): suppress stale direct-key issues when oauth is healthy Fixes #26693 `hermes doctor` currently promotes invalid direct API keys into the final summary even when the matching OAuth path is already healthy. That makes the setup look more broken than it really is. This change keeps the failed API Connectivity row visible but stops treating it as a blocking summary issue when a healthy OAuth fallback already exists for the same provider family. Covered cases: - Gemini OAuth + invalid direct Gemini key - MiniMax OAuth + invalid direct MiniMax key Based on #26704 by @worlldz.	2026-05-16 02:21:17 -07:00
Teknium	5f91b1a48b	feat(skills): add osint-investigation optional skill (closes #355 ) (#26729 ) * feat(skills): add osint-investigation optional skill (closes #355) Phase-1 public-records OSINT investigation framework adapted from ShinMegamiBoson/OpenPlanter (MIT). Lives in optional-skills/research/. Six data-source wiki entries (FEC, SEC EDGAR, USAspending, Senate LD, OFAC SDN, ICIJ Offshore Leaks), each following the 9-section template: summary, access, schema, coverage, cross-reference keys, data quality, acquisition, legal, references. Six stdlib-only acquisition scripts that emit normalized CSV, plus three analysis scripts: - entity_resolution.py — three-tier match (exact / fuzzy / token overlap) with explicit confidence per row - timing_analysis.py — permutation test for donation/contract timing correlation, joins through cross-links - build_findings.py — assembles structured findings.json with evidence chains pointing back to source rows Validation: full pipeline runs end-to-end on synthetic fixtures. Entity resolution found 24 cross-matches with 0 false positives on a 5-row / 4-row test set. Timing analysis on 5 donations clustered near 3 awards returned p=0.000, effect size 2.41 SD. Findings JSON correctly tags HIGH-severity timing pattern. All 9 scripts pass --help and py_compile. Docs site page auto-generated by website/scripts/generate-skill-docs.py; sidebar + catalog entries updated by the same generator. * fix(osint-investigation): live API fixes from end-to-end sweep Live-tested the skill on a real public-citizen query and found three bugs the synthetic E2E missed. All three are now fixed and re-verified. 1. FEC fetch hung on contributor name searches. The combination of two_year_transaction_period + sort=date + contributor_name puts the OpenFEC query plan on a slow path that the upstream gateway times out (25s+). Switched to min_date/max_date with no explicit sort. Renamed --candidate to --contributor (the original name was misleading: FEC searches by donor, not by candidate; --candidate is kept as a deprecated alias). Added --state filter for narrowing. 2. ICIJ Offshore Leaks reconcile endpoint returns 404. ICIJ removed the Open Refine reconciliation API. Rewrote fetch_icij_offshore.py to download the official bulk CSV ZIP (~70 MB, public, no auth) and search it locally. Cached under $HERMES_OSINT_CACHE/icij/ (default ~/.cache/hermes-osint/icij/) for 30 days, --force-refresh to refetch. Verified live: 'PUTIN' query returns 5 Panama Papers officer matches in 0.5s after first download. 3. SEC EDGAR silently returned 0 when the company-name resolver matched an individual Form 3/4/5 filer (insider trading disclosures). Now surfaces 'Resolved company X → CIK Y (Z)' on stderr, prints a filing-type histogram when the type filter wipes results, and explicitly warns when the matched CIK appears to be an individual filer rather than a corporate registrant. Bonus: _http.py was retrying 429 responses with exponential backoff plus honoring (often-missing) Retry-After headers, which compounded into multi-second hangs per page when the upstream key was over quota. Changed to fail-fast on 429 with a clear, actionable error showing the upstream's quota message. Verified: 0.3s fast-fail vs the previous 60s hang on DEMO_KEY rate-limit exhaustion. Updated SKILL.md, fec.md, and icij-offshore.md to match the new CLI flags and ICIJ bulk-cache flow. Regenerated the docusaurus page via website/scripts/generate-skill-docs.py. Live sweep results across all 6 sources for 'Dillon Rolnick, New York': - OFAC SDN: 0 matches ✓ (correctly not sanctioned) - USAspending: 0 matches ✓ (correctly not a federal contractor) - Senate LDA: 0 matches ✓ (correctly not a lobbying client) - SEC EDGAR: warns it resolved to 'Rolnick Michael' (CIK 0001845264) who is an individual Form 3 filer, not a corporate registrant - ICIJ: 0 matches ✓ (correctly not in any offshore leak) - FEC: rate-limited (DEMO_KEY); fails fast with clear quota message * feat(osint-investigation): expand to 12 sources covering identity, property, courts, archives, news Phase-2 expansion per Teknium feedback that the original 6-source skill (federal financial/regulatory only) wasn't a complete OSINT toolkit. Adds 6 more sources covering the major omissions a real investigation would reach for first. New sources (6 fetch scripts + 6 wiki entries): 1. NYC ACRIS — Real property records (deeds, mortgages, liens) via the city's Socrata API. Search by party name or property address. Joins Parties to Master to populate doc_type, dates, borough, and amount. Coverage: 5 NYC boroughs, ~70M party records, 1966-present. 2. OpenCorporates — Global corporate registry covering 130+ jurisdictions (~200M companies). Free API token at https://opencorporates.com/api_accounts/new raises the rate limit; HTML fallback works without one (limited fields). 3. CourtListener (Free Law Project) — federal + state court opinions (~10M back to colonial era) + PACER dockets via RECAP. Anonymous v4 search works; COURTLISTENER_TOKEN raises rate limits. 4. Wayback Machine CDX — historical web captures (~900B+). Used both for surveillance-of-record (when did this site change?) and as a content-recovery layer when other sources point to dead URLs. 5. Wikipedia + Wikidata — narrative bio + structured facts. Wikipedia OpenSearch for article matching, REST summary for extracts, Wikidata Action API (wbgetentities) for claims. Avoids the SPARQL Query Service which is aggressively rate-limited. 6. GDELT 2.0 DOC API — global news monitoring in 100+ languages, ~2015-present. Auto-retries with 6s backoff on the standard 1-req-per-5-sec throttle. Other changes in this commit: - SEC EDGAR no longer raises SystemExit when the company-name resolver finds no CIK; writes an empty CSV with header so the rest of a pipeline can keep moving and the warning is just on stderr. - _http.py User-Agent updated per Wikimedia policy: includes app name, version, and a 'set HERMES_OSINT_UA to identify yourself' instruction. - SKILL.md workflow now groups sources into two clusters (federal financial vs identity/property/courts/archives/news) with bash examples for each. 'When to use this skill' lists the broader set of investigation patterns the expanded sources unlock. Live sweep results on 'Dillon Rolnick, New York' across all 12 sources: ofac ✓ 0 (correctly clean) icij ✓ 0 (correctly not in any leak) usaspending ✓ 0 (correctly not a federal contractor) senate_lda ✓ 0 (correctly not a lobbying client) sec_edgar ✓ 0, warns: resolved to 'Rolnick Michael' (CIK 0001845264), individual Form 3 filer, NOT a corporate registrant fec — rate-limited (DEMO_KEY exhausted), fails fast with clear quota message nyc_acris ✓ 200 records named Rolnick across NYC; 48 records at 571 Hudson (the property the web identifies as his) opencorporates ✓ 0 (no API token configured; HTML fallback) courtlistener ✓ 0 for 'Dillon Rolnick'; 20 for 'Rolnick' generally; 5 for 'Microsoft' sanity check wayback ✓ 30 captures of nousresearch.com from 2011-present wikipedia ✓ 0 (correctly not notable enough); Bill Gates sanity returns full structured facts (occupation, employer, DOB, place of birth, country) gdelt ✓ 0 for 'Dillon Rolnick'; 5 for 'Nous Research' All 17 scripts compile clean and pass --help. Synthetic analysis pipeline regression still passes (entity_resolution 30 matches, timing p=0.000, findings 2). * feat(osint-investigation): remove FEC; DEMO_KEY rate-limits make it unreliable The FEC fetcher consistently failed the live sweep because the OpenFEC DEMO_KEY tier (40 calls/hour) exhausts on a single investigation, and the upstream returns slow-path query plans for unindexed contributor-name searches that the gateway times out. Without a real API key it's not usable; with one the user has to sign up at api.data.gov first. That's too much setup friction for a skill that should work out of the box. Removed: - scripts/fetch_fec.py - references/sources/fec.md Updated: - SKILL.md frontmatter description + tags - 'When NOT to use' now points users at https://www.fec.gov/data/ for federal donations - entity_resolution example switched from donor↔contractor to lobbying-client↔contractor (Senate LDA + USAspending pair) - timing_analysis example switched to lobbying-filings vs awards - 8 wiki entries had their 'FEC ↔ ...' cross-reference bullets removed 11 sources remain (5 federal financial + 6 identity/property/courts/ archives/news). All scripts compile, pass --help, and the synthetic analysis pipeline still passes on the new lobbying-shaped regression fixture (30 matches, p=0.000 on tight clustering, 2 findings).	2026-05-16 01:55:06 -07:00
Teknium	d725407c56	security(deps): bump aiohttp, anthropic, cryptography to CVE-fixed versions (#26830 ) Closes #10695. Picks up the still-vulnerable Python pins on current main: - aiohttp 3.13.3 -> 3.13.4 (messaging, slack, homeassistant, sms extras + lazy_deps platform.slack) — CVE-2026-34513 (DNS cache exhaustion), CVE-2026-34518 (cookie/proxy-auth leak on cross-origin redirect, relevant for the gateway since it handles OAuth tokens), CVE-2026-34519 (response reason injection), CVE-2026-34520 (null bytes in headers), CVE-2026-34525 (multiple Host headers). - anthropic 0.86.0 -> 0.87.0 (anthropic extra + lazy_deps provider.anthropic) — CVE-2026-34450 (memory tool files created mode 0o666), CVE-2026-34452 (path-traversal in async local-filesystem memory tool). Not directly exploitable since hermes-agent doesn't use the SDK's filesystem memory tool, but the SDK is bumped for hygiene. - cryptography pinned explicitly at 46.0.7 in core dependencies — CVE-2026-39892 (buffer overflow on non-contiguous buffers). Previously came in transitively via PyJWT[crypto]; the explicit floor keeps the WeCom/Weixin crypto paths from drifting below the fix. curl-cffi from the original issue is no longer in pyproject.toml or uv.lock, so no action needed there. uv.lock regenerated cleanly; only aiohttp / anthropic / cryptography moved. Credit: original issue + scoping by @shaun0927 (#10695, #10701). Floor analysis and packaging-surface audit by @gnanirahulnutakki (#10784), adapted to current main's exact-pin style. Co-authored-by: shaun0927 <shaun0927@users.noreply.github.com> Co-authored-by: Gnani Rahul Nutakki <gnanirahulnutakki@users.noreply.github.com>	2026-05-16 01:25:25 -07:00
Teknium	6ba35ec336	Inspired by Claude Code: tighten dangerous-command detection (#26829 ) Port three hardening patches from Claude Code 2.1.113's expanded deny rules to hermes' detect_dangerous_command() pattern list. 1. macOS /private/{etc,var,tmp,home} system paths /etc, /var, /tmp, /home are symlinks to /private/<name> on macOS. A write to /private/etc/sudoers works identically to /etc/sudoers but bypassed the plain /etc/ pattern check. Extracted a shared _SYSTEM_CONFIG_PATH fragment so /etc/ and the /private/ mirror stay in sync across redirect / tee / cp / mv / install / sed -i patterns. 2. killall -9 / -KILL / -SIGKILL / -s KILL / -r <regex> Parallel to the existing pkill -9 pattern. killall -9 against non-hermes processes was previously unprotected, and killall -r can sweep unrelated processes matching a regex. 3. find -execdir rm Same destructive effect as find -exec rm but ran in each match's directory. The previous pattern required a literal '-exec ' so -execdir slipped through. Guarded by 32 new test cases in 4 test classes: - TestMacOSPrivateSystemPaths (11 cases) - TestKillallKillSignals (9 cases) - TestFindExecdir (4 cases) - TestEtcPatternsUnaffectedByRefactor (6 regression guards on the existing /etc/ coverage after the _SYSTEM_CONFIG_PATH refactor) Inspiration: https://github.com/anthropics/claude-code/releases (Claude Code 2.1.113, April 17 2026 - "Enhanced deny rules" and "Dangerous path protection")	2026-05-16 01:24:25 -07:00
Teknium	395e9dd9e2	feat: add supports_parallel_tool_calls for MCP servers (#26825 ) Port from openai/codex#17667: MCP servers can now opt-in to parallel tool execution by setting supports_parallel_tool_calls: true in their config. This allows tools from the same server to run concurrently within a single tool-call batch, matching the behavior already available for built-in tools like web_search and read_file. Previously all MCP tools were forced sequential because they weren't in the _PARALLEL_SAFE_TOOLS set. Now _should_parallelize_tool_batch checks is_mcp_tool_parallel_safe() which looks up the server's config flag. Config example: mcp_servers: docs: command: "docs-server" supports_parallel_tool_calls: true Changes: - tools/mcp_tool.py: Track parallel-safe servers in _parallel_safe_servers set, populated during register_mcp_servers(). Add is_mcp_tool_parallel_safe() public API. - run_agent.py: Add _is_mcp_tool_parallel_safe() lazy-import wrapper. Update _should_parallelize_tool_batch() to check MCP tools against server config. - 11 new tests covering the feature end-to-end. - Updated MCP docs and config reference.	2026-05-16 01:04:28 -07:00
Teknium	c445f48b78	fix(delegation): honor api_mode + auto-detect anthropic_messages URLs (#26824 ) Subagent delegation hardcoded api_mode='chat_completions' for any delegation.base_url that didn't match three specific hostnames (chatgpt.com, api.anthropic.com, api.kimi.com/coding), and never read delegation.api_mode from config. Azure AI Foundry's https://foundry.services.ai.azure.com/anthropic endpoint fell through and got chat_completions, causing 404s on every delegate_task call. The main agent already handles this correctly via the shared _detect_api_mode_for_url() helper (anything ending in /anthropic → anthropic_messages); delegation reimplemented its own narrower check. Reuse the shared detector and honor an explicit delegation.api_mode when set so users can also force the transport on non-standard endpoints the URL heuristic can't classify. Fixes #10213. Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>	2026-05-16 01:00:27 -07:00
Teknium	74d0b392e7	feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth (#26763 ) * feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth Salvages tools/x_search_tool.py from the closed PR #10786 (originally by @Jaaneek) and reworks its credential resolution so the tool registers when EITHER xAI credential path is available: * XAI_API_KEY (paid xAI API key) is set in ~/.hermes/.env or the env, OR * The user is signed in via xAI Grok OAuth — SuperGrok subscription — i.e. hermes auth add xai-oauth has been run Both paths route through xAI's built-in x_search Responses tool at https://api.x.ai/v1/responses. When both credentials exist OAuth wins, matching tools/xai_http.py's existing preference order (uses SuperGrok quota instead of paid API spend). The check_fn calls resolve_xai_http_credentials() which auto-refreshes the OAuth access token if it's within the refresh skew window, so a True return means the bearer is fetchable AND non-empty. Wiring - tools/x_search_tool.py — new tool, ~370 LOC. Schema gated by check_fn, bearer resolved per-call so revoked OAuth surfaces a clean tool_error rather than an HTTP 401. - toolsets.py — "x_search" toolset def. NOT added to _HERMES_CORE_TOOLS; users opt in via hermes tools. - hermes_cli/tools_config.py — CONFIGURABLE_TOOLSETS entry + TOOL_CATEGORIES block with two provider options (OAuth + API key) sharing the existing xai_grok post_setup hook for credential bootstrap. - hermes_cli/config.py — DEFAULT_CONFIG["x_search"] with model / timeout_seconds / retries. Additive nested key; no version bump. - tests/tools/test_x_search_tool.py — 13 tests covering HTTP shape, handle validation, citation extraction, 4xx/5xx/timeout handling, and the full credential-resolution matrix (OAuth-only, API-key-only, both-set, neither-set, resolver-raises, config overrides, registry registration). - website/docs/guides/xai-grok-oauth.md — adds X Search to the direct-to-xAI tools section with off-by-default note. - website/docs/user-guide/features/tools.md — new row in the tools table. Off by default — users enable via `hermes tools` → 🐦 X (Twitter) Search. Schema only appears to the model when xAI credentials are configured. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * docs(x_search): add dedicated feature page + reference entries - website/docs/user-guide/features/x-search.md (new) — full feature walkthrough: authentication, enablement, configuration, parameters, returned fields, example, troubleshooting, see-also links. - website/docs/reference/tools-reference.md — new "x_search" toolset section with parameter docs and credential gating note. - website/docs/reference/toolsets-reference.md — new row in the toolset catalog table. - website/sidebars.ts — wires the new feature page under Media & Web, after web-search. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-16 00:58:27 -07:00
Teknium	627f8a5f1d	security: sanitize tool error strings before injecting into model context (#26823 ) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of #3838's three-feature scout). The truncated-tool-call (#1632) and empty-response-recovery (#1677, #1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation).	2026-05-16 00:57:39 -07:00
brooklyn!	70b663504f	fix(tui): keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting (#26717 ) * fix(tui): keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting TextInput's fast-echo bypass writes characters directly to stdout to avoid waiting on a React re-render for each keystroke. The hardware cursor advances by text.length cells, but Ink's cached `displayCursor` (the basis for the next frame's relative cursor-move preamble in log-update) stayed unchanged. When ANY unrelated component re-rendered between the fast-echo write and the deferred composer setCur/setParent flush — status bar timer, streaming reasoning, etc. — the next frame's preamble emitted a relative cursor move from a stale parked position and the hardware cursor parked N cells offset from the actual caret. Visible symptom: extra whitespace between the just-typed character and the cursor block, intermittent, worse on long sessions during streaming. Alt-screen was immune because frames begin with absolute CSI H. This adds a small API in @hermes/ink: - `Ink.noteExternalCursorAdvance(dx, dy?)` — bumps displayCursor if set, otherwise seeds from frontFrame.cursor so the next preamble's relative move correctly cancels the external advance. No-op on alt-screen. - `CursorAdvanceContext` + `useCursorAdvance()` hook to expose it. TextInput then calls `noteCursorAdvance(text.length)` after the fast-echo `stdout.write(text)` append, and `noteCursorAdvance(-1)` after the fast-backspace `\b \b` sequence. Tests: 4 new vitest cases pin the API contract (bumps when set, seeds from frontFrame.cursor when null, alt-screen no-op, zero-delta no-op). All 751 ui-tui tests pass; tests/test_tui_gateway_server.py (177) pass. * fix(tui): also advance cursorDeclaration so fast-echo survives deferred React state Copilot review on PR #26717 flagged a gap in the original fix: TextInput's fast-echo path defers the React `cur` state update by 16ms (perf optimization that batches re-renders during heavy typing). Inside that window, `useDeclaredCursor` still publishes a target computed from the PRE-keystroke `cur` — `cursorLayout(display, cur, columns)`. Advancing only `displayCursor` would let any unrelated re-render in that 16ms window run onRender's cursor-park branch with the stale declaration and visually undo the fast-echo's advance. The fix is symmetric: `noteExternalCursorAdvance` now bumps BOTH `displayCursor` (the log-update relative-move basis) AND, if non-null, `cursorDeclaration.relativeX/Y` (the target the cursor parks at after every frame). When React finally flushes `setCur`, `useDeclaredCursor` publishes a fresh declaration that supersedes our bumped one — exactly what we want. Adds two new vitest cases covering both halves: - active declaration advances in lock-step with displayCursor - null declaration stays null (no spurious bump) All 753 ui-tui tests pass; tests/test_tui_gateway_server.py (177) pass. Closes review threads: PRRT_kwDOPRF1G86ChKtD (textInput.tsx:1016 fast-echo append) PRRT_kwDOPRF1G86ChKtF (textInput.tsx:924 fast-backspace) PRRT_kwDOPRF1G86ChKtG (ink-cursor-advance.test.ts:57 missing coverage) * fix(tui): make fast-echo survive TextInput rerenders + alt-screen (Copilot round 2) Round 2 of PR #26717 review. Three real holes Copilot flagged after the initial cursorDeclaration bump: 1. alt-screen early-return skipped BOTH halves of the notifier. But the default TUI wraps the composer in <AlternateScreen> — that IS the production path. CSI H resets log-update's relative-move basis, but the alt-screen park branch uses absolute CUP = `rect.x + decl.relativeX`, so a stale declaration there still parks the cursor at the pre-keystroke caret. Fix: skip ONLY the displayCursor half on alt-screen; still bump cursorDeclaration. 2. TextInput's own rerender could clobber the Ink-level bump. The fast- echo path defers setCur by 16ms; if a parent state change rerenders TextInput in that window, the layout effect inside useDeclaredCursor reads the stale React `cur` state and re-publishes a declaration at the OLD column. Fix: `cursorLayout(display, curRef.current, columns)` — read the always- up-to-date ref, not the deferred state. useMemo dropped (compute is cheap, single-line wrap-text in the common case). 3. Tests bypassed the production wiring. Added two structural tests: - `still advances cursorDeclaration on alt-screen` in the Ink-level suite, asserting displayCursor stays put but the declaration advances by the delta. - `textInputCursorSourceOfTruth.test.ts` pins three structural invariants: layout reads curRef.current, never the bare `cur` state, and the fast-echo stdout.write calls remain paired with noteCursorAdvance(±N). Source-grep invariants > flaky Ink mount tests for this kind of regression. 757/757 ui-tui tests pass (+3 over round 1). type-check clean. lint introduces zero new errors on touched files. tests/test_tui_gateway_server.py (177) pass. Closes review threads: PRRT_kwDOPRF1G86ChOG2 (ink.tsx alt-screen guard) PRRT_kwDOPRF1G86ChOG9 (textInput.tsx fast-backspace rerender window) PRRT_kwDOPRF1G86ChOHC (textInput.tsx fast-append rerender window) PRRT_kwDOPRF1G86ChOHJ (alt-screen test asserts wrong invariant) PRRT_kwDOPRF1G86ChOHP (missing integration-style coverage) * fix(tui): reject fast-backspace at soft-wrap boundary (Copilot round 3) PR #26717 round 3. Copilot caught two real things: 1. `\b \b` cannot move the terminal cursor onto the previous visual row across a soft-wrap boundary. When the caret sits at visual column 0 of a wrapped row (e.g. value 'hello ' at width 6 → cursorLayout produces (line 1, col 0)), backspace would leave the physical cursor in place while the logical caret moves up to the end of the previous visual line. `noteCursorAdvance(-1)` would then feed Ink a wrong delta. Fix: `canFastBackspaceShape` now takes the composer width and rejects when `cursorLayout(value, cursor, columns).column === 0`. The fast path falls through to the normal Ink render, which correctly lays out the new caret position. The PR-description inconsistency about alt-screen is fixed in a separate gh pr edit. Adds 4 new tests in textInputFastEcho.test.ts pinning the rejection at exact-multiple wrap boundaries plus a positive control inside a wrapped line and a back-compat case where `columns` is omitted. 761/761 ui-tui tests pass. type-check / lint clean. 177/177 Python tests/test_tui_gateway_server.py pass. Closes review threads: PRRT_kwDOPRF1G86ChxE5 (textInput.tsx:933 wrap-boundary regression) * fix(tui): polish doc + tests after Copilot round 4 Three polish points Copilot raised: 1. canFastBackspaceShape doc comment overstated the legacy contract — said it conservatively rejects potential wrap boundaries when columns is omitted, but the implementation actually skips the wrap-boundary check entirely. Reworded to make the legacy behavior explicit and warn callers not to rely on protection they don't get. 2. ink-cursor-advance.test.ts rationale comment for the 'advances cursorDeclaration in lock-step' case still referenced the pre-fix `cursorLayout(display, cur, columns)` expression. Now accurately describes the current source of truth — `curRef.current` in textInput.tsx — and explains the window the bump is bridging. 3. Removed the three `__getForTest` accessors from Ink. The test file already cast the instance to inspect private state in the couple of tests that needed declaration mutation; the rest now use a small `peek(ink)` helper that does the same cast for reads. No test-only API surface ships in production. 761/761 ui-tui tests pass. type-check clean. lint introduces zero new errors on touched files. 177/177 tests/test_tui_gateway_server.py pass. Closes review threads: PRRT_kwDOPRF1G86Ch23W (canFastBackspaceShape doc accuracy) PRRT_kwDOPRF1G86Ch23f (stale test rationale) PRRT_kwDOPRF1G86Ch23p (test-only API surface in production) fix(tui): tighten doc + add dy test coverage (Copilot round 5) Two polish points from round 5: 1. canFastBackspaceShape doc had two paragraphs that conflicted — the main 'Additionally rejects when the physical cursor sits at visual column 0' was stated unconditionally, then the columns-param paragraph qualified that it only happens when columns is passed. Reworked into clear 'When supplied / When omitted' branches with a concrete example value ('hello ' returns true without columns even though it would be unsafe at width 6). No more inconsistency. 2. Added a test asserting cursorDeclaration.relativeY advances when dy is non-zero. Existing tests exercised dy on displayCursor only. Newlines in fast-echoed text don't currently hit the bypass (canFastAppendShape rejects '\n'), but dy is part of the public notifier contract and must propagate symmetrically with dx so future callers get a fully-implemented contract. 762/762 ui-tui tests pass (+1). type-check / lint / build clean. Closes review threads: PRRT_kwDOPRF1G86Ch6Sz (doc inconsistency) PRRT_kwDOPRF1G86Ch6TE (missing dy coverage on declaration) * fix(tui): doc polish (Copilot round 6) Four small but valid points: 1. textInputCursorSourceOfTruth.test.ts used bare 'fs'/'path'/'url' imports; the rest of ui-tui consistently uses the 'node:' prefix (see src/__tests__/useSessionLifecycle.test.ts, src/lib/editor.test.ts). Switched to node:fs / node:path / node:url to match convention. 2. CursorAdvanceContext.ts type-level doc described only displayCursor. The notifier intentionally also mutates the active cursorDeclaration and that's the only part that matters on alt-screen. Reworked the doc into a two-part 'updates both' summary with the alt-screen asymmetry called out explicitly. 3. use-cursor-advance.ts hook doc had the same problem. Same fix — document both pieces of state, both screen modes. 4. App.tsx onCursorAdvance prop comment was incomplete. Same fix — describe both state updates and the screen-mode asymmetry. No behavior change. 762/762 ui-tui tests pass. type-check / lint / build clean. Closes review threads (auto-resolved on PR but valid critiques): PRRT_kwDOPRF1G86Ch926 (node: prefix on built-in imports) PRRT_kwDOPRF1G86Ch92_ (use-cursor-advance.ts doc) PRRT_kwDOPRF1G86Ch93H (CursorAdvanceContext.ts type doc) PRRT_kwDOPRF1G86Ch93J (App.tsx prop comment)	2026-05-16 00:28:12 -05:00
teknium1	559c6ad94a	feat(skills): add optional pinggy-tunnel skill Zero-install localhost tunnels over SSH via Pinggy. Covers HTTP/HTTPS, TCP, TLS, access control (basic auth / bearer / IP whitelist), header manipulation (CORS, force-HTTPS), web debugger, Pro token mode, and four composite recipes (webhook receiver, MCP server exposure, local LLM endpoint share, dev-server quick-share with one-shot password). Closes #361	2026-05-15 22:15:14 -07:00
teknium1	afb97dbc53	docs: add Programmatic Integration overview (closes #360 ) Document the three protocols already available for driving hermes-agent from external programs — ACP, the TUI gateway JSON-RPC, and the OpenAI-compatible API server — with a 'which one should I use' guide and a Pi-style RPC command mapping table. Sidebar entry under Developer Guide -> Architecture.	2026-05-15 22:14:33 -07:00
Teknium	016c772e7f	feat(plugins): tool override flag for replacing built-in tools (closes #11049 ) (#26759 ) Plugins can now replace a built-in tool by passing override=True to ctx.register_tool(). Without it, the registry rejects any registration that would shadow an existing tool from a different toolset (unchanged default behavior). Unlocks the use case from #11049: drop-in replacement of browser/web backends without forking core. Composes with the existing pre_tool_call hook for runtime interception of any implementation. The override is audit-logged at INFO so it surfaces in agent.log.	2026-05-15 22:12:57 -07:00
helix4u	9c304a7f56	fix(agent): retry malformed anthropic stream parser errors	2026-05-15 22:12:42 -07:00
teknium1	53637fb17d	chore(skills/darwinian-evolver): AUTHOR_MAP + docs regen	2026-05-15 21:56:07 -07:00
teknium1	c9b32a654c	feat(skill): darwinian-evolver optional skill Thin wrapper around Imbue's darwinian_evolver (AGPL-3.0, subprocess-only). Ships a working OpenRouter driver (parrot_openrouter.py), a snapshot inspector (show_snapshot.py), and a custom-problem template. SKILL.md has 58-char description, Pitfalls sourced from actually running the loop: non-viable seed trap, Azure content filter killing runs, loop.run() being a generator, nested-pickle snapshots, and aggressive default concurrency. Salvaged from #12719 by @Bihruze — original PR shipped 12,289 LOC across 61 files (29 Python modules, FastAPI dashboard, VS Code extension, benchmark hub, marketplace, etc.) which was far beyond the scope of the underlying issue (#336). This version stays at the ~700-LOC scope that issue actually asked for. Authorship of the original effort credited via AUTHOR_MAP entry and the SKILL.md author field. Verified end-to-end: seed 'Say {{ phrase }}' (score 0.000) evolved into 'Please repeat the following phrase exactly as it is, without any modifications or additional formatting: {{ phrase }}' (score 0.750) across 3 iterations on gpt-4o-mini via OpenRouter. Co-authored-by: Bihruze <98262967+Bihruze@users.noreply.github.com>	2026-05-15 21:56:07 -07:00

1 2 3 4 5 ...

8511 commits